Abstract
Multivariate calibration is a classic problem in analytical chemistry and has frequently been addressed with partial least squares (PLS) and artificial neural networks (ANNs) in previous work. Its distinctive characteristic is high dimensionality combined with small sample size. Here, we apply support vector regression (SVR), alongside ANNs and PLS, to the multivariate calibration problem of determining three aromatic amino acids (phenylalanine, tyrosine and tryptophan) in their mixtures by fluorescence spectroscopy. Leave-one-out results show that SVR outperforms the other methods and appears to be a good choice for this task. Furthermore, feature selection is performed for SVR to remove redundant features, and a novel algorithm named Prediction RIsk based FEature selection for support vector Regression (PRIFER) is proposed. Results on the same multivariate calibration data set show that PRIFER is a powerful tool for solving multivariate calibration problems.
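The abstract does not spell out PRIFER's criterion, but the two ingredients it names, leave-one-out evaluation of SVR and a prediction-risk-based importance score, can be illustrated generically. The sketch below uses synthetic data in place of the fluorescence spectra and scores each feature by how much the leave-one-out error rises when that feature is removed (one common "prediction risk" notion); it is a minimal illustration, not a reproduction of the authors' algorithm.

```python
# Hedged sketch: SVR with leave-one-out (LOO) evaluation and a
# prediction-risk-style feature ranking. Synthetic data stands in for
# the fluorescence spectra; PRIFER's exact criterion is not reproduced.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features = 30, 8            # small sample, many features
X = rng.normal(size=(n_samples, n_features))
# Only features 0 and 1 actually drive the response.
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

def loo_mse(X, y):
    """Leave-one-out mean squared error of an RBF-kernel SVR."""
    scores = cross_val_score(SVR(kernel="rbf", C=10.0), X, y,
                             cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    return -scores.mean()

baseline = loo_mse(X, y)

# Prediction-risk-style score: increase in LOO error when feature j
# is deleted. A larger increase marks a more important feature.
risk = {j: loo_mse(np.delete(X, j, axis=1), y) - baseline
        for j in range(n_features)}
ranked = sorted(risk, key=risk.get, reverse=True)
print("feature ranking (most important first):", ranked)
```

Dropping the lowest-risk features and refitting would give a backward-elimination loop in the same spirit as redundant-feature removal for SVR.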
Acknowledgments
Thanks to the late Professor Nian-Yi Chen for his advice on this paper. This work was supported in part by the Natural Science Foundation of China under grants no. 20503015 and 60873129, by the Shanghai Rising-Star Program under grant no. 08QA14032, and by open funding from the Institute of Systems Biology of Shanghai University.
Cite this article
Li, GZ., Meng, HH., Yang, M.Q. et al. Combining support vector regression with feature selection for multivariate calibration. Neural Comput & Applic 18, 813–820 (2009). https://doi.org/10.1007/s00521-008-0202-6