Random Forest Algorithm with derived Geographical Layers for Improved Classification of Remote Sensing Data

Uttam Kumar¹,²,³, Anindita Dasgupta¹, Chiranjit Mukhopadhyay², T.V. Ramachandra¹,³,⁴,*

¹Energy and Wetlands Research Group, Centre for Ecological Sciences [CES], ²Department of Management Studies, ³Centre for Sustainable Technologies (astra),
⁴Centre for infrastructure, Sustainable Transportation and Urban Planning [CiSTUP], Indian Institute of Science, Bangalore – 560012, India.
*Corresponding author: cestvr@ces.iisc.ernet.in

INTRODUCTION

Accurate classification of remote sensing (RS) data is a prerequisite for many environmental and socio-economic applications such as urban change detection, urban heat islands, etc. [1]. Satisfactory classification of RS data depends on several factors including (a) the characteristics of the study area, (b) availability of suitable RS data, (c) ancillary and ground reference data, (d) proper use of variables and classification algorithms, (e) the user’s experience with reference to the application and (f) time constraints [2]. Furthermore, diverse landscapes and terrain types have a mixture of both homogeneous and heterogeneous land cover (LC) classes and require supplemental environmental or geographical layers for improved classification accuracies. Complex landscapes commonly show a high degree of spectral heterogeneity and hence increased spectral variation [3]. For example, urban landscapes are composed of a complex mix of buildings, roads, flyovers, pavements, trees and lakes, which are sometimes smaller than the pixel size of medium spatial resolution sensors [4]. This creates mixed pixels, a common problem in residential areas where buildings, trees, lawns, concrete, and asphalt all occur within a pixel, and is often responsible for low classification accuracy. In landscapes with mountains and dense forests, problems arise due to changes in elevation, topographic differences and shadows cast by hillocks and tall trees owing to altitudinal variations, which pose a major challenge for the selection of a suitable image processing approach over a large area.

On the other hand, the availability of fine spatial resolution data such as IKONOS Multispectral (MS) and Panchromatic (PAN) provides vast opportunities in urban studies. A major advantage is the reduction of the mixed pixel problem and finer extraction of detailed information on urban entities compared to medium spatial resolution data. However, high spatial resolution data are expensive and require more time for analysis than medium spatial resolution data [2]. Moreover, they often lead to a high rate of spectral confusion due to spectral variations present within an LC class, and to poor classification performance due to the limited number of spectral bands [5]. In practice, data acquired from medium spatial resolution sensors such as Landsat TM/ETM+ or IRS LISS-III, being readily available for multiple dates, are commonly used for most landscape analyses (urban and forested terrain at a regional scale). Reducing spectral variation within the same LC class and increasing the separability of different LC types are the keys to improved LC classification [3]. In this regard, different approaches such as sub-pixel classification [6], multi-sensor data integration [7], full spectral image classification [8], expert classification, etc. have been used. Traditional per-pixel spectral classification is based only on spectral signatures and does not make use of the rich spatial information inherent in the data [5]. Therefore, deriving information from RS data together with ancillary information (acquired or derived environmental layers) would considerably improve classification accuracy.

Earlier, X. Na et al. (2010) [9] used 103 geographical layers to show improvement in LC mapping, using Landsat TM bands 1 to 5 and 7, NDVI, EVI, the first principal component (PC1) of the six Landsat TM bands as additional predictors, image texture measures (variance, homogeneity, contrast, dissimilarity and entropy) with window sizes of 3 × 3 pixels and 11 × 11 pixels, DEM, slope, and soil type. Na Xiaodong et al. (2009) [10] integrated TM data with NDVI, EVI, PC1, slope, soil types, and five texture measures (variance, homogeneity, contrast, dissimilarity and entropy) for classification of a marsh area using Classification Trees and the Maximum Likelihood classifier. Their results highlight that spectral, textural and terrain data combined with ancillary derived geographical data significantly improved the LC classification accuracy. A. Fahsi et al. (2000) [11] evaluated the contribution and quantified the effectiveness of a DEM in improving LC classification accuracies of the different classes by up to 60% using Landsat TM data over a rugged area in the Atlas Mountains, Morocco. J. A. Recio et al. (2011) [12] used historical land use (LU) and ancillary data as features in a geospatial framework for image classification and showed improvement in overall classification accuracy for each class. Masocha and Skidmore (2011) [13] used a DEM along with ASTER imagery and geo-referenced point data obtained from the field to increase the accuracy of invasive species (Lantana camara) mapping using hybrid classifiers (Neural Network (NN) and SVM classifiers along with a GIS expert system). The overall accuracy increased from 71% (kappa 0.61) to 83% (kappa 0.77) with NN and from 64% (kappa 0.52) to 76% (kappa 0.67) with SVM.
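
The accuracy figures quoted above pair an overall accuracy with a kappa coefficient. The sketch below shows how the two relate, assuming the standard Cohen's kappa definition; the cited studies do not describe their computation in this section. The function name and the 2 × 2 confusion matrix are hypothetical and are not data from any of the cited studies.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of reference samples of class i that the
    classifier assigned to class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (illustrative numbers only).
example = [[40, 10],
           [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts the agreement expected by chance, it is lower than the overall accuracy for the same confusion matrix whenever the classification is imperfect, which is the pattern seen in the figures reported above.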

G. Xian et al. (2008) [14] quantified multi-temporal urban development characteristics in Las Vegas from Landsat and ASTER data with ancillary data such as NDVI, slope, aspect and temperature. Lu and Weng (2005) [2] demonstrated urban classification using the full spectral information of Landsat ETM+ imagery in Marion County, Indiana. PCs of the ETM+ MS bands, texture, temperature and data fusion of MS and PAN were considered to improve classification accuracy. They concluded that data fusion of MS and PAN, with texture and temperature as additional layers, is useful, but that high spatial resolution also increases intra-class spectral variation, decreasing the classification accuracy. Most of the above studies have been based on a single landscape/terrain, sometimes focusing on the comparison of classification techniques, or investigating the role of a few layers in improving classification accuracy, and often using commercial data (such as IKONOS, ASTER) for LC analysis. Hence, there is a need for a study that uses free RS data (such as Landsat TM/ETM+) along with other geographical layers for LC classification in different landscape/terrain types, using an advanced classification technique that is not complex to implement and at the same time does not depend on the underlying data distribution.

The objective of this work is to investigate the role of ancillary and derived geographical layers such as vegetation indices (NDVI and EVI), elevation and derived layers (slope and aspect), texture (angular second moment, contrast, entropy and variance) and PAN band in addition to original MS bands in improving classification accuracy. The algorithm and classification strategy have been tested in three different terrain types with varying characteristics – Greater Bangalore (highly urbanised terrain), Western Ghats (forested with undulating terrain) and Western Himalaya (rugged terrain with temperate climate) for classifying Landsat ETM+ MS bands.
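
As a rough illustration of how derived layers can be appended to the original MS bands, the sketch below computes NDVI and EVI for a single pixel and stacks them with ancillary values into one feature vector. It assumes surface reflectance inputs scaled to 0 to 1 and the commonly used EVI coefficients (gain 2.5, C1 = 6, C2 = 7.5, L = 1); the paper does not state its formulas in this section, and the function names, dictionary keys and pixel values are hypothetical.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index with the commonly used default coefficients."""
    denom = nir + c1 * red - c2 * blue + soil
    return 0.0 if denom == 0 else gain * (nir - red) / denom


# Assumed band order for the six reflective Landsat ETM+ bands (1-5 and 7).
BAND_ORDER = ("blue", "green", "red", "nir", "swir1", "swir2")


def build_feature_vector(bands, ancillary):
    """Stack MS bands, vegetation indices and ancillary layers for one pixel.

    bands:     dict of reflectance values keyed by the names in BAND_ORDER
    ancillary: dict of extra layers (e.g. elevation, slope); the keys are
               sorted so every pixel yields the same column order
    """
    features = [bands[name] for name in BAND_ORDER]
    features.append(ndvi(bands["nir"], bands["red"]))
    features.append(evi(bands["nir"], bands["red"], bands["blue"]))
    features.extend(ancillary[key] for key in sorted(ancillary))
    return features


# Hypothetical vegetated pixel (illustrative values only).
pixel = {"blue": 0.05, "green": 0.08, "red": 0.10,
         "nir": 0.50, "swir1": 0.25, "swir2": 0.12}
extra = {"elevation": 920.0, "slope": 3.5}
print(build_feature_vector(pixel, extra))
```

Each pixel then contributes one row of such features to the classifier, so adding a geographical layer amounts to adding a column.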

The paper is organised as follows: Section II introduces the data and study area. Section III describes the Random Forest classification algorithm. Section IV presents the classification results and discussion, followed by conclusions in Section V.

Citation : Uttam Kumar, Anindita Dasgupta, Chiranjit Mukhopadhyay and Ramachandra T.V., 2011, Random Forest Algorithm with derived Geographical Layers for Improved Classification of Remote Sensing Data, Proceedings of the INDICON 2011, Engineering Sustainable Solutions, 16-18th December, Hyderabad, India, pp. 1-6.

* Corresponding Author :
	Dr. T.V. Ramachandra Energy & Wetlands Research Group, Centre for Ecological Sciences, Indian Institute of Science, Bangalore – 560 012, India. Tel : +91-80-2293 3099/2293 3503-extn 107, Fax : 91-80-23601428 / 23600085 / 23600683 [CES-TVR] E-mail : cestvr@ces.iisc.ernet.in, energy@ces.iisc.ernet.in, Web : http://wgbis.ces.iisc.ernet.in/energy, http://ces.iisc.ernet.in/grass
