Abstract
This paper presents the results of an investigation into the use of machine learning methods for the identification of narcotics from Raman spectra. The classification of high dimensional data, such as images, gene-expression data and spectral data, poses an interesting challenge to machine learning, as the presence of high numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of Principal Component Analysis (PCA) to reduce the dimensionality of spectral data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high dimensional spectral dataset. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that the use of this PCA method can improve the performance of machine learning in the classification of high dimensional data.
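For readers unfamiliar with NIPALS, the following is a minimal sketch of how the method extracts principal components one at a time by alternating projections and deflation. It is an illustration only and not the authors' implementation: the function name `nipals_pca`, the parameters `tol` and `max_iter`, and the synthetic data standing in for spectra are all assumptions made for this example.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-8, max_iter=500):
    """Return (scores T, loadings P) for the first n_components PCs of X.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    R = np.asarray(X, dtype=float)
    R = R - R.mean(axis=0)  # mean-centre each attribute; R is the residual
    n_samples, n_features = R.shape
    T = np.zeros((n_samples, n_components))
    P = np.zeros((n_features, n_components))

    for k in range(n_components):
        # Initial score vector: the residual column with the largest variance.
        t = R[:, np.argmax(R.var(axis=0))].copy()
        if not np.any(t):
            raise ValueError("residual is zero; no further components exist")
        for _ in range(max_iter):
            p = R.T @ t / (t @ t)    # project residual onto t to get a loading
            p /= np.linalg.norm(p)   # normalise the loading to unit length
            t_new = R @ p            # project residual onto p to get a score
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k], P[:, k] = t, p
        R = R - np.outer(t, p)       # deflate: remove the extracted component
    return T, P


# Synthetic stand-in for spectra (NOT the paper's dataset):
# 30 samples x 200 attributes driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 3)) * np.array([10.0, 3.0, 1.0])
X = latent @ rng.normal(size=(3, 200)) + 0.01 * rng.normal(size=(30, 200))

T, P = nipals_pca(X, n_components=3)
print(T.shape, P.shape)  # (30, 3) (200, 3)
```

Because only the requested number of components is computed, a sketch like this avoids a full eigendecomposition of the covariance matrix, which is one plausible reading of the efficiency advantage mentioned above. The score matrix `T` would then serve as the reduced input to a classifier.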
Copyright information
© 2006 Springer-Verlag London Limited
About this paper
Cite this paper
Howley, T., Madden, M.G., O’Connell, ML., Ryder, A.G. (2006). The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data. In: Macintosh, A., Ellis, R., Allen, T. (eds) Applications and Innovations in Intelligent Systems XIII. SGAI 2005. Springer, London. https://doi.org/10.1007/1-84628-224-1_16
DOI: https://doi.org/10.1007/1-84628-224-1_16
Publisher Name: Springer, London
Print ISBN: 978-1-84628-223-2
Online ISBN: 978-1-84628-224-9
eBook Packages: Computer Science, Computer Science (R0)