Abstract
We propose a data classification method with missing data handling based on kernel partial least squares (kernel PLS) and kernel PLS discriminant analysis (kernel PLSDA). The novelty of the method is that the class variables are used to validate the imputation of missing values. Moreover, this paper is the first to apply kernel PLS to the handling and classification of missing data. By experimentally comparing the results of several classification methods with missing data handling on three open biomedical datasets (Arrhythmia, Mammographic Mass, and Pima Indians Diabetes from the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html), we found that the proposed kernel PLS plus kernel PLSDA yielded higher accuracy than the existing methods.
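To make the pipeline concrete, below is a minimal sketch of kernel PLS-DA on a toy two-class problem: classes are one-hot coded as the response matrix, a NIPALS-style dual (kernel) PLS is fitted, and the predicted class is the argmax of the regressed response. This follows the standard kernel PLS formulation in the literature, not necessarily the exact algorithm of the paper, and it omits the paper's missing-value imputation and its class-variable validation step; all function names, the RBF kernel choice, and parameter values are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def center_kernel(K, K_train=None):
    """Center a kernel matrix in feature space; if K_train is given,
    K holds test-versus-train kernel evaluations."""
    if K_train is None:
        n = K.shape[0]
        J = np.ones((n, n)) / n
        return K - J @ K - K @ J + J @ K @ J
    n = K_train.shape[0]
    Jn = np.ones((n, n)) / n
    Jt = np.ones((K.shape[0], n)) / n
    return K - Jt @ K_train - K @ Jn + Jt @ K_train @ Jn

def kernel_pls_fit(K, Y, n_components=1, n_iter=200, tol=1e-10):
    """NIPALS-style kernel PLS on a centered kernel K and centered
    response Y; returns dual coefficients B so that predictions are
    K_test_centered @ B."""
    Kd, Yd = K.copy(), Y.copy()
    n = K.shape[0]
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = Yd[:, :1].copy()
        for _ in range(n_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            q = Yd.T @ t
            u_new = Yd @ q
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T[:, a:a + 1], U[:, a:a + 1] = t, u
        P = np.eye(n) - t @ t.T          # deflate with the new score
        Kd = P @ Kd @ P
        Yd = Yd - t @ (t.T @ Yd)
    return U @ np.linalg.solve(T.T @ K @ U, T.T @ Y)

# Toy two-class problem: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)),
               rng.normal(2.0, 0.5, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
Y = np.eye(2)[y]                          # one-hot class coding (PLS-DA)
y_mean = Y.mean(axis=0)

K = rbf_kernel(X, X, gamma=0.5)
B = kernel_pls_fit(center_kernel(K), Y - y_mean, n_components=1)

Kt = center_kernel(rbf_kernel(X, X, gamma=0.5), K_train=K)
pred = np.argmax(Kt @ B + y_mean, axis=1)
acc = (pred == y).mean()
print(f"training accuracy: {acc:.2f}")
```

For a binary problem the centered response matrix has rank one, so a single latent component suffices here; multi-class problems would use more components and the same argmax decision rule.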
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2007-00559), Gyeonggi-do, and KISTI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional information
This work was done when T. T. Nguyen worked at Institut Pasteur Korea, South Korea.
Cite this article
Nguyen, T.T., Tsoy, Y. A kernel PLS based classification method with missing data handling. Stat Papers 58, 211–225 (2017). https://doi.org/10.1007/s00362-015-0694-y