A Random Forest Implementation For MATLAB
Description
To perform random forest classification (RFC) properly, the MATLAB code follows the procedure below
after the data set is loaded.
2. Choose candidate values for the ratio of the number of explanatory variables (X) used for the
decision trees (the X-ratio), for example 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8.
3. Run RFC for every candidate X-ratio and estimate the values of the objective variable (Y) for the
out-of-bag (OOB) samples
4. Calculate the misclassification rate between the actual Y and the estimated Y for each candidate X-ratio
5. Select the X-ratio with the lowest misclassification rate as the optimal X-ratio
7. Calculate the confusion matrix between the actual Y and the calculated Y for the optimal X-ratio
8. Calculate the confusion matrix between the actual Y and the estimated Y of the OOB samples for the
optimal X-ratio
If training RFC takes too long, decrease the number of decision trees.
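The procedure above can be sketched with TreeBagger from the Statistics and Machine Learning Toolbox.
This is a minimal illustration under stated assumptions, not the code shipped with this package: the
fisheriris data set is a stand-in, the variable names are hypothetical, the X-ratio is mapped to the
number of predictors sampled at each split, and refitting a final model with the selected ratio is an
assumption because the list does not spell that step out.

```matlab
% Sketch of the X-ratio search; assumes the Statistics and Machine Learning Toolbox.
rng(0);                                   % reproducible bootstrap samples
load fisheriris                           % stand-in data: meas (150x4), species (150x1)
X = meas;
Y = species;

numTrees = 100;                           % decrease this if training is too slow
xRatios  = 0.1:0.1:0.8;                   % candidate X-ratios from the list above
numVars  = size(X, 2);
oobErr   = zeros(size(xRatios));

for k = 1:numel(xRatios)
    % Assumption: the ratio sets how many predictors are tried at each split
    m = max(1, round(xRatios(k) * numVars));
    model = TreeBagger(numTrees, X, Y, ...
        'Method', 'classification', ...
        'OOBPrediction', 'on', ...
        'NumPredictorsToSample', m);
    yOobK = oobPredict(model);            % estimated Y for the OOB samples
    oobErr(k) = mean(~strcmp(yOobK, Y));  % misclassification rate
end

% Pick the ratio with the lowest OOB misclassification rate
[~, best] = min(oobErr);
bestRatio = xRatios(best);

% Assumed step: refit with the selected ratio before building the matrices
mBest = max(1, round(bestRatio * numVars));
finalModel = TreeBagger(numTrees, X, Y, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on', ...
    'NumPredictorsToSample', mBest);

yCalc = predict(finalModel, X);           % calculated Y on the training data
yOob  = oobPredict(finalModel);           % estimated Y on the OOB samples
cmTrain = confusionmat(Y, yCalc);         % rows: actual class, columns: predicted
cmOob   = confusionmat(Y, yOob);
```

The training-data confusion matrix is usually optimistic because every tree has seen most of those
samples, so the OOB matrix is the more honest estimate of the two.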
Modules
Image Classification
Image classification is the process of converting Digital Number (DN) values into meaningful land cover
information at every pixel location in the image. In other words, image classification assigns the pixels
of an image to classes according to statistical decision rules in the spectral domain or logical decision
rules in the spatial domain. Decision rules in the spectral domain are based on the spectral values of
pixels, whereas decision rules in the spatial domain are based on the neighborhood information of pixels
and on spatial context such as shape, texture and pattern.
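For classification in the spectral domain, each pixel is typically treated as one sample whose features
are its band values. The sketch below shows one way to arrange an image that way in MATLAB. The image
is synthetic, the sizes and variable names are hypothetical, and the threshold rule only stands in for a
trained classifier.

```matlab
% Arrange a multiband image as a pixels-by-bands matrix and map labels back.
rng(2);
rows = 4; cols = 5; bands = 4;               % hypothetical image size
img = randi([0 255], rows, cols, bands);     % synthetic DN values

% One row per pixel, one column per band (MATLAB is column-major, so
% pixel (r, c) ends up in row r + (c - 1) * rows)
pixels = reshape(img, rows * cols, bands);

% Placeholder decision rule on band 1; a trained classifier would go here
labels = double(pixels(:, 1) > 127) + 1;     % class 1 or class 2

% Fold the per-pixel labels back into image shape
classMap = reshape(labels, rows, cols);
```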
Ensemble classification methods are learning algorithms that construct a set of classifiers instead of a
single classifier, and then classify new data points by taking a vote of their predictions. The most
commonly used ensemble classifiers are Bagging, Boosting and Random Forest (RF). To initialize the RF
algorithm, the user must define two parameters: N, the number of trees to grow, and m, the number of
variables considered when splitting each node. First, N bootstrap samples are drawn with replacement from
the training data set; each contains about 2/3 of the distinct training samples. The remaining 1/3 of the
training data, called out-of-bag (OOB) data, is used to estimate the error of the predictions. Then, an
unpruned tree is grown from each bootstrap sample such that at each node m predictors are randomly
selected as a subset of the predictor variables, and the best split among those variables is chosen.
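The sampling steps described above can be illustrated in base MATLAB without any toolbox. This is a
sketch of the mechanics only, with hypothetical sizes and variable names; no trees are grown, and the
vote matrix is made up to show how per-tree predictions are combined.

```matlab
% Bootstrap sampling, OOB membership, random predictor subsets, majority vote.
rng(1);
n = 1000;                          % hypothetical number of training samples
N = 50;                            % number of trees (bootstrap samples)
inBagFrac = zeros(N, 1);

for t = 1:N
    idx = randi(n, n, 1);          % draw n indices with replacement
    inBag = false(n, 1);
    inBag(idx) = true;             % samples seen by tree t
    oobIdx = find(~inBag);         % samples left out of tree t (its OOB data)
    inBagFrac(t) = mean(inBag);
end

% Average share of distinct samples per bootstrap draw, close to 1 - exp(-1)
meanInBag = mean(inBagFrac);

% At each node, m of the p predictors are drawn at random as split candidates
p = 10; m = 3;                     % hypothetical values
candidates = randperm(p, m);

% Majority vote: rows are samples, columns are per-tree class predictions
votes = [1 2 2; 3 3 1; 2 2 2];
predicted = mode(votes, 2);        % gives [2; 3; 2]
```

Because each draw leaves out roughly one third of the samples, every training sample is OOB for a
sizeable share of the trees, which is what makes the OOB error usable as a built-in test estimate.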
This study was carried out using multiple high-resolution images over the city of Trabzon, Turkey and its
vicinity, which contains both urban and rural features. The image data used include QuickBird
pan-sharpened multispectral images (0.6 m resolution).
Result of RF Algorithm