Abstract
A statistical challenge in analysing hyperspectral data is the multicollinearity between spectral bands. Partial least squares (PLS) has been used extensively as a dimensionality reduction technique: it constructs lower-dimensional latent variables from the spectral bands that correlate with the response variables. However, it does not take into account the grouping structure of the full spectrum, in which spectral subsets may exhibit distinct relationships with the response variables. We propose a two-step group penalized PLS regression approach. In the first step, a PLS regression is performed on each group of predictors identified by a clustering approach. In the second step, a group penalty is imposed on the latent components to select the group with the highest predictive power. When applied to simulations and a real-world observational data set, the proposed method demonstrated better prediction performance, a higher R-squared value and faster computation time than other PLS variations. Interpretations of the model performance are illustrated using the real-world example of leaf spectra used to indirectly quantify leaf traits. The method is implemented in an R package called “groupPLS”, which is accessible from github.com/jialiwang1211/groupPLS.
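The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the groupPLS implementation, and several pieces are assumptions made purely for illustration: the function names (cluster_bands, pls1_scores, group_lasso) are hypothetical, the clustering step is replaced by a simple split of contiguous bands wherever the correlation between neighbouring bands drops below a threshold, the per-group PLS uses a basic single-response NIPALS scheme, and the group penalty is a standard group lasso solved by proximal gradient descent on simulated data.

```python
import numpy as np

def cluster_bands(X, threshold=0.5):
    """Assumed stand-in for the clustering step: start a new group of
    contiguous bands wherever neighbouring bands correlate weakly."""
    corr = np.corrcoef(X, rowvar=False)
    groups, start = [], 0
    for j in range(1, X.shape[1]):
        if corr[j - 1, j] < threshold:
            groups.append(np.arange(start, j))
            start = j
    groups.append(np.arange(start, X.shape[1]))
    return groups

def pls1_scores(X, y, n_comp):
    """Step 1: latent score vectors of a single-response PLS (NIPALS)
    fitted on one group of bands."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y                      # weights: covariance with response
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to extract
            break
        w /= norm
        t = X @ w                        # latent component
        tt = t @ t
        X = X - np.outer(t, X.T @ t / tt)  # deflate predictors
        y = y - t * (t @ y) / tt           # deflate response
        scores.append(t)
    return np.column_stack(scores)

def group_lasso(blocks, y, lam, n_iter=500):
    """Step 2: group lasso on the stacked latent components, minimising
    0.5 * ||y - T b||^2 + lam * sum_g sqrt(p_g) * ||b_g|| by proximal gradient."""
    T = np.hstack(blocks)
    T = T / np.linalg.norm(T, axis=0)    # unit-norm columns
    y = y - y.mean()
    idx = np.cumsum([0] + [b.shape[1] for b in blocks])
    step = 1.0 / np.linalg.norm(T, 2) ** 2
    beta = np.zeros(T.shape[1])
    for _ in range(n_iter):
        z = beta - step * (T.T @ (T @ beta - y))
        for g in range(len(blocks)):
            zg = z[idx[g]:idx[g + 1]]
            nrm = np.linalg.norm(zg)
            shrink = 0.0
            if nrm > 0:
                shrink = max(0.0, 1.0 - step * lam * np.sqrt(zg.size) / nrm)
            beta[idx[g]:idx[g + 1]] = shrink * zg  # block soft-thresholding
    return [beta[idx[g]:idx[g + 1]] for g in range(len(blocks))]

# Simulated spectra: three blocks of 10 collinear bands, each driven by its
# own latent factor; only the second block is related to the response.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 3))
X = np.hstack([latent[:, [k]] + 0.1 * rng.normal(size=(n, 10)) for k in range(3)])
y = 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

groups = cluster_bands(X)
blocks = [pls1_scores(X[:, g], y, n_comp=2) for g in groups]
lam = 0.5 * np.linalg.norm(y - y.mean())   # illustrative penalty level
coefs = group_lasso(blocks, y, lam)
selected = [k for k, b in enumerate(coefs) if np.linalg.norm(b) > 0]
print("groups found:", len(groups), "selected groups:", selected)
```

Because the penalty acts on whole blocks of latent components, a group is either kept or dropped as a unit, which is what allows one spectral region to be singled out. In practice the penalty level would be tuned, for example by cross-validation, rather than fixed as it is here.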
Acknowledgements
The authors would like to thank Dr. Klara L Verbyla and Dr. Alexander B Zwart for their insightful discussions and comments. Dr. William Woodgate is supported by an Australian Research Council DECRA Fellowship (DE190101182). The OzFlux and SuperSite network is supported by the National Collaborative Research Infrastructure Strategy (NCRIS) through the Terrestrial Ecosystem Research Network (TERN).
Additional information
Communicated by Pierre Dutilleul.
About this article
Cite this article
Chang, L., Wang, J. & Woodgate, W. Analysing spectroscopy data using two-step group penalized partial least squares regression. Environ Ecol Stat 28, 445–467 (2021). https://doi.org/10.1007/s10651-021-00496-2
DOI: https://doi.org/10.1007/s10651-021-00496-2