Front Plant Sci. 2021 Jan 12;11:624273.
doi: 10.3389/fpls.2020.624273. eCollection 2020.

Application of Machine Learning Algorithms in Plant Breeding: Predicting Yield From Hyperspectral Reflectance in Soybean

Mohsen Yoosefzadeh-Najafabadi et al. Front Plant Sci. 2021.

Abstract

Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating a large number of genotypes for important agronomic traits at early growth stages. Nevertheless, using the large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs remains challenging, because it requires intensive knowledge of computational and statistical analyses. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), was evaluated for predicting soybean (Glycine max) seed yield from hyperspectral reflectance. To this end, hyperspectral reflectance data covering the full spectrum from 395 to 1005 nm were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. Recursive feature elimination (RFE) was used to reduce the dimensionality of the hyperspectral reflectance data and to select the variables with the largest importance values. The results indicated that R5 is the more informative of the two stages for measuring hyperspectral reflectance to predict seed yield. The 395 nm reflectance band was also identified as a highly ranked band for predicting soybean seed yield. Using either the full or the selected variables as inputs, the ML algorithms were evaluated individually and in combination, through the ensemble-stacking (E-S) method, for predicting soybean yield. Among the individually tested algorithms, RF performed best, with a yield classification accuracy of 84%. With RF therefore selected as the metaClassifier for the E-S method, prediction accuracy increased to 0.93 (93%) using all variables and to 0.87 (87%) using the selected variables, showing the success of E-S as an ensemble technique. This study demonstrated that soybean breeders could implement the E-S algorithm, using either the full or the selected reflectance spectra, to select high-yielding soybean genotypes from among a large number of genotypes at early growth stages.
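
The workflow summarized in the abstract (RFE-based band selection followed by ensemble stacking of MLP, SVM, and RF with an RF meta-classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the software used, so scikit-learn is assumed here, and the synthetic data, the three yield classes, the number of selected bands, and all hyperparameters are hypothetical placeholders chosen only to make the example run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for reflectance data (assumption): rows are plots,
# columns are reflectance bands, y is a hypothetical yield class label.
X, y = make_classification(
    n_samples=240, n_features=40, n_informative=8,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination: fit a random forest, drop the 5 least
# important bands, and repeat until 10 bands remain (both numbers assumed).
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=10,
    step=5,
)
selector.fit(X_train, y_train)
selected = selector.get_support(indices=True)


def build_stack():
    """Stack MLP, SVM, and RF base learners under an RF meta-classifier."""
    base = [
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
        )),
        ("svm", make_pipeline(StandardScaler(), SVC())),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
    meta = RandomForestClassifier(n_estimators=100, random_state=0)
    # cv=5: the meta-classifier is trained on out-of-fold base predictions.
    return StackingClassifier(estimators=base, final_estimator=meta, cv=5)


# Compare the stacked model on all bands versus the RFE-selected bands.
acc_full = build_stack().fit(X_train, y_train).score(X_test, y_test)
acc_selected = (
    build_stack()
    .fit(X_train[:, selected], y_train)
    .score(X_test[:, selected], y_test)
)

print("selected band indices:", selected.tolist())
print("accuracy, all bands:", round(acc_full, 3))
print("accuracy, selected bands:", round(acc_selected, 3))
```

The accuracies printed here come from synthetic data and say nothing about the values reported in the study; the sketch only shows how the full-variable and selected-variable comparisons could be set up.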

Keywords: artificial intelligence; data-driven model; ensemble methods; high-throughput phenotyping; random forest; recursive feature elimination.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. The location of the experiments in 2018 and 2019.
FIGURE 2. The distribution of soybean genotypes in each yield class.
FIGURE 3. A schematic representation of the machine learning algorithms used in this study to classify soybean yield using reflectance bands: (A) multilayer perceptron, (B) support vector machine, and (C) random forest.
FIGURE 4. The scheme of data collection and machine learning algorithm development and validation. OP, optimizing parameters; MLP, multilayer perceptron; SVM, support vector machine; RF, random forest; E–S, ensemble–stacking strategy.
FIGURE 5. The minimum, mean, and maximum values of each reflectance band measured for 245 soybean genotypes evaluated at the (A) R4 and (B) R5 growth stages in four different field environments.
FIGURE 6. The importance of the variables selected by the recursive feature elimination (RFE) strategy for soybean reflectance bands measured at the (A) R4 and (B) R5 growth stages.
FIGURE 7. The soybean yield classes versus the 395 nm reflectance band at the R5 growth stage.
FIGURE 8. The accuracy of the RF, MLP, SVM, and E–S algorithms for predicting soybean yield using full and RFE-selected variables (-VS) measured at the (A) R4 and (B) R5 growth stages in four environments. The mean performance is shown as × in each panel. MLP, multilayer perceptron; SVM, support vector machine; RF, random forest; E–S, ensemble–stacking strategy; RFE, recursive feature elimination.
FIGURE 9. The estimated values of (A) Matthews correlation coefficient (MCC), (B) precision, (C) recall, and (D) F-measure for the RF, MLP, SVM, and E–S algorithms used for predicting soybean yield from all variables and from selected variables (-VS) collected at the R5 growth stage. The mean performance is shown as × in each panel. MLP, multilayer perceptron; SVM, support vector machine; RF, random forest; E–S, ensemble–stacking strategy; RFE, recursive feature elimination.
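
For readers unfamiliar with the four metrics named in the Figure 9 caption, the sketch below computes them from their standard definitions for one class treated as "positive" against all others. The one-vs-rest treatment, the `binary_metrics` helper, and the "high"/"low" labels are assumptions made for illustration; this page does not state how the study aggregated the metrics across yield classes.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Precision, recall, F-measure, and MCC for one class versus the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall) else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical labels for eight plots (not data from the study).
y_true = ["high", "high", "high", "low", "low", "low", "low", "high"]
y_pred = ["high", "high", "low", "low", "low", "high", "low", "high"]

metrics = binary_metrics(y_true, y_pred, positive="high")
print(metrics)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful complement when yield classes are unevenly populated.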
