The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data

Howley, Tom; Madden, Michael G.; O’Connell, Marie-Louise; Ryder, Alan G.

doi:10.1007/1-84628-224-1_16

Included in the following conference series:

International Conference on Innovative Techniques and Applications of Artificial Intelligence

Abstract

This paper presents the results of an investigation into the use of machine learning methods for the identification of narcotics from Raman spectra. The classification of spectral data and other high dimensional data, such as images and gene-expression data, poses an interesting challenge to machine learning, as the presence of large numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of Principal Component Analysis (PCA) to reduce the dimensionality of spectral data and to improve the predictive performance of several well-known machine learning methods. Experiments are carried out on a high dimensional spectral dataset. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that the use of this PCA method can improve the performance of machine learning in the classification of high dimensional data.
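The NIPALS approach mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names `nipals_pca` and `project`, the convergence tolerance, the iteration cap, and the choice of starting column are all assumptions made for illustration. The idea is to extract one component at a time by alternating between a loading estimate and a score estimate, then subtract (deflate) that component from the data before extracting the next.

```python
import math


def nipals_pca(rows, n_components, tol=1e-12, max_iter=500):
    """Illustrative NIPALS PCA sketch: returns (scores, loadings, means).

    rows         -- list of samples, each a list of floats (e.g. one spectrum)
    n_components -- number of principal components to extract
    tol, max_iter are assumed defaults, not values taken from the paper.
    """
    n, m = len(rows), len(rows[0])
    # Mean-centre each attribute (column) before extracting components.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]

    scores, loadings = [], []
    for _component in range(n_components):
        # Start the score vector t from the column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(x[j] ** 2 for x in X))
        t = [x[j0] for x in X]
        if sum(v * v for v in t) == 0.0:
            raise ValueError("no variance left to extract")
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading p = X^T t / (t^T t), normalised to unit length.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Updated score t = X p.
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component from the data.
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, means


def project(sample, loadings, means):
    """Map one sample onto the extracted components (reduced attributes)."""
    centred = [v - mu for v, mu in zip(sample, means)]
    return [sum(c * w for c, w in zip(centred, p)) for p in loadings]
```

Because components are extracted one at a time, only as many as are needed have to be computed, which is one plausible reading of the efficiency claim when a few components are kept from data with thousands of attributes. The output of `project` would then serve as the reduced attribute vector passed to a classifier.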

Author information

Authors and Affiliations

National University of Ireland, Galway, Ireland
Tom Howley, Michael G. Madden, Marie-Louise O’Connell & Alan G. Ryder

Editor information

Editors and Affiliations

Napier University, Edinburgh, EH10 5DT, UK
Ann Macintosh BSc, CEng
Stratum Management Ltd, UK
Richard Ellis BSc, MSc
Nottingham Trent University, UK
Tony Allen PhD

About this paper

Cite this paper

Howley, T., Madden, M.G., O’Connell, ML., Ryder, A.G. (2006). The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data. In: Macintosh, A., Ellis, R., Allen, T. (eds) Applications and Innovations in Intelligent Systems XIII. SGAI 2005. Springer, London. https://doi.org/10.1007/1-84628-224-1_16

DOI: https://doi.org/10.1007/1-84628-224-1_16
Publisher Name: Springer, London
Print ISBN: 978-1-84628-223-2
Online ISBN: 978-1-84628-224-9
eBook Packages: Computer Science, Computer Science (R0)
