A Random Forest Implementation For MATLAB

This document describes a study that evaluates the performance of the random forest (RF) algorithm for classifying multispectral satellite images. The study uses Ikonos and QuickBird satellite images covering both urban and rural areas. RF classification accuracy is compared with that of other algorithms, namely SVM, GAB and MLC. Preliminary results show that RF achieves about 10-11% higher accuracy than SVM and gives the best results among the compared methods. The MATLAB code implements RF classification in 9 steps, including selecting optimal parameters, constructing RF models, and estimating classifications. RF accuracy depends on parameters such as the number of trees and the number of variables, and the study tests different combinations to find optimal values.

Uploaded by

freesourcecoder

Description

The Random Forest (RF) algorithm is known to be one of the most efficient classification methods. Due to
its inherent interdisciplinary nature, it draws researchers from different backgrounds. This study
investigates the performance of the RF algorithm using multispectral satellite images with different
spatial resolutions and scene characteristics. The satellite images used include Ikonos and QuickBird
images with four multispectral bands. The Ikonos image, taken in 2003, covers a mainly urban area, whereas
the QuickBird images acquired in 2005 and 2008 cover urban and rural areas, respectively. The QuickBird
image taken in 2005 also contains noisy patterns over the Black Sea due to waves caused by windy
weather. To evaluate the performance of RF, the classification results are compared with those
obtained from the Gentle AdaBoost (GAB), Support Vector Machine (SVM) and Maximum Likelihood
Classification (MLC) algorithms. Preliminary results indicate that RF gives higher classification accuracies
than the other methods. For the Ikonos image over the urban area, RF gives 10% higher classification
accuracy than SVM, whereas GAB has the lowest classification accuracy (14% lower than RF). For the
QuickBird image of the rural area (taken in 2008), RF gives the best result of the four methods. For the
QuickBird image containing the noisy pattern, RF has around 11% higher overall accuracy than SVM.

To perform RF classification (RFC) appropriately, the MATLAB code follows the procedure below once the data set is loaded.

1. Decide the number of decision trees, for example 500.

2. Decide the candidate ratios of explanatory variables (X) used by the decision trees (the X-ratio),
for example 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8.

3. Run RFC for every candidate X-ratio and estimate the values of the objective variable (Y) for the
out-of-bag (OOB) samples.

4. Calculate the misclassification rate between actual Y and estimated Y for each candidate X-ratio.

5. Select the optimal X-ratio, that is, the one with the minimum misclassification rate.

6. Construct the RFC model with the optimal X-ratio.

7. Calculate the confusion matrix between actual Y and calculated Y for the optimal X-ratio.

8. Calculate the confusion matrix between actual Y and estimated Y of the OOB samples for the optimal X-ratio.

9. Estimate Y based on the RFC model constructed in step 6.

If training RFC takes too much time, decrease the number of decision trees.
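
The selection of the X-ratio in the procedure above can be illustrated with a small sketch. It is written in Python with only the standard library, so it is not the MATLAB code this document describes. The tiny tree learner, the function names (`grow_tree`, `oob_estimates`, `select_x_ratio`, `confusion_matrix`) and the made-up two-class data are all assumptions introduced for illustration. The sketch covers steps 1 to 5 and step 8 only, and uses 20 trees instead of 500 to keep it fast.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, m, rng):
    """Grow an unpruned tree, trying m randomly chosen variables at each node."""
    if len(set(y)) == 1:
        return y[0]  # pure node becomes a leaf
    best = None
    for f in rng.sample(range(len(X[0])), m):
        for t in sorted({row[f] for row in X})[:-1]:  # candidate thresholds
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled variable is constant: majority leaf
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], m, rng))

def predict_tree(node, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def oob_estimates(X, y, n_trees, x_ratio, seed=0):
    """Step 3: estimate Y for each sample by a vote of the trees that did not see it."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = min(p, max(1, round(x_ratio * p)))  # variables tried per split
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree([X[i] for i in bag], [y[i] for i in bag], m, rng)
        for i in set(range(n)) - set(bag):  # out-of-bag samples
            votes[i][predict_tree(tree, X[i])] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def misclassification_rate(y_true, y_est):
    """Step 4: share of samples whose OOB estimate differs from actual Y."""
    pairs = [(a, b) for a, b in zip(y_true, y_est) if b is not None]
    return sum(a != b for a, b in pairs) / len(pairs)

def select_x_ratio(X, y, n_trees, candidates):
    """Steps 3-5: return the X-ratio with the lowest OOB misclassification rate."""
    rates = {r: misclassification_rate(y, oob_estimates(X, y, n_trees, r))
             for r in candidates}
    return min(rates, key=rates.get), rates

def confusion_matrix(y_true, y_est, classes):
    """Rows are actual classes, columns are estimated classes."""
    index = {c: k for k, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for a, b in zip(y_true, y_est):
        if b is not None:
            cm[index[a]][index[b]] += 1
    return cm

# Made-up two-class data with four variables (standing in for four bands).
data_rng = random.Random(1)
X = [[data_rng.gauss(4.0 * label, 1.0) for _ in range(4)]
     for label in (0, 1) for _ in range(30)]
y = [label for label in (0, 1) for _ in range(30)]

candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # step 2
best_ratio, rates = select_x_ratio(X, y, n_trees=20, candidates=candidates)
oob_cm = confusion_matrix(y, oob_estimates(X, y, 20, best_ratio), classes=[0, 1])  # step 8
print(best_ratio, rates[best_ratio], oob_cm)
```

With only four variables, several candidate ratios round to the same number of variables per split, so they produce identical rates here; with more variables the candidates would differ. Ties are resolved in favour of the first (smallest) candidate.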
Modules

Image Classification

Image classification is the process of converting Digital Number (DN) values into meaningful land cover
information at every pixel location in the image. In other words, image classification assigns each pixel
of an image to one of several classes according to statistical decision rules in the spectral domain or
logical decision rules in the spatial domain. Decision rules in the spectral domain are based on the
spectral values of pixels, whereas decision rules in the spatial domain are based on the neighborhood
information of pixels and on spatial context such as shape, texture and pattern.

Random Forest Algorithm

Ensemble classification methods are learning algorithms that construct a set of classifiers instead of a
single classifier and then classify new data points by taking a vote of their predictions. The most
commonly used ensemble classifiers are Bagging, Boosting and RF. To initialize the RF algorithm, the user
must define two parameters, N and m, which are the number of trees to grow and the number of variables
considered when splitting each node, respectively. First, N bootstrap samples are drawn from the training
data set, each containing about 2/3 of the training data. The remaining 1/3 of the training data, called
out-of-bag (OOB) data, are used to test the error of the predictions. Then an unpruned tree is grown from
each bootstrap sample such that, at each node, m predictors are randomly selected as a subset of the
predictor variables and the best split among those variables is chosen.
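
The 2/3 and 1/3 proportions mentioned above can be checked with a short simulation. This is a minimal sketch in Python (standard library only, not the MATLAB code this document describes), assuming that each bootstrap sample draws as many records as the training set contains, with replacement. Under that assumption the expected out-of-bag fraction is (1 - 1/n)^n, which approaches 1/e, roughly 0.368. The function names and the sample size of 1000 are hypothetical.

```python
import math
import random
from collections import Counter

def bootstrap_sample(n, rng):
    """Draw n indices with replacement; indices never drawn are out-of-bag."""
    bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(bag))
    return bag, oob

def majority_vote(predictions):
    """Combine the class predictions of several trees into one label."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
n = 1000  # hypothetical number of training samples
oob_fractions = [len(bootstrap_sample(n, rng)[1]) / n for _ in range(200)]
mean_oob = sum(oob_fractions) / len(oob_fractions)

print(f"mean OOB fraction over 200 bootstrap samples: {mean_oob:.3f}")
print(f"(1 - 1/n)^n = {(1 - 1 / n) ** n:.3f}, 1/e = {math.exp(-1):.3f}")
print(majority_vote(["urban", "water", "urban"]))
```

The `majority_vote` helper shows the voting step in its simplest form: each tree contributes one predicted label and the most frequent label wins.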

Study area and data

This study is carried out using multiple high-resolution images of the city of Trabzon, Turkey, and its
vicinity, which contain both urban and rural features. The image data used include QuickBird
pan-sharpened multispectral (0.6 m) images acquired in 2005 and 2008.

Result of RF Algorithm

The classification accuracy of the RF method depends on the user-defined parameters N and m; hence, an
optimal selection of these parameters increases classification accuracy. To find the optimum values for
N and m, multiple combinations are tested and assessed to obtain more reliable thematic maps for the
study areas. For different N and m combinations, the OOB error, test accuracy, kappa and computational
time results for the training set are given in Table 1. As seen in Table 1, N = 100 and m = 2 are selected
for the Ikonos image over the urban area. For the QuickBird image taken over the urban area, N = 350 and
m = 2 are chosen, whereas N = 500 and m = 2 are selected for the QuickBird image of the rural area.
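
The accuracy measures used to compare N and m combinations can be computed from a confusion matrix. The sketch below is in Python (standard library only) and assumes that "kappa" refers to Cohen's kappa coefficient, which is common in remote sensing accuracy assessment; the document does not state this explicitly. The confusion matrix values are made up for illustration and do not come from Table 1.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Made-up 2x2 matrix: rows are actual classes, columns are estimated classes.
example_cm = [[45, 5],
              [10, 40]]

print(overall_accuracy(example_cm))  # 0.85
print(round(cohen_kappa(example_cm), 3))  # 0.7
```

For this matrix the observed agreement is 85/100 and the chance agreement is (50*55 + 50*45)/100^2 = 0.5, so kappa is (0.85 - 0.5)/(1 - 0.5) = 0.7.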
