
The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data

  • Conference paper
Applications and Innovations in Intelligent Systems XIII (SGAI 2005)

Abstract

This paper presents the results of an investigation into the use of machine learning methods for the identification of narcotics from Raman spectra. The classification of spectral data and other high-dimensional data, such as images and gene-expression data, poses an interesting challenge to machine learning, because large numbers of redundant or highly correlated attributes can seriously degrade classification accuracy. This paper investigates the use of Principal Component Analysis (PCA) to reduce the dimensionality of spectral data and to improve the predictive performance of some well-known machine learning methods. Experiments are carried out on a high-dimensional spectral dataset. These experiments employ the NIPALS (Non-Linear Iterative Partial Least Squares) PCA method, which has been used in the field of chemometrics for spectral classification and is a more efficient alternative to the widely used eigenvector decomposition approach. The experiments show that this PCA method can improve the performance of machine learning in the classification of high-dimensional data.
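To make the NIPALS idea concrete, here is a minimal Python/NumPy sketch. It assumes a data matrix with one spectrum per row and one wavenumber per column, and it is not the authors' implementation: the function name `nipals_pca` and its `tol`/`max_iter` parameters are illustrative. NIPALS extracts components one at a time by alternating between scores and loadings, then deflates the residual matrix. This means only the k leading components are ever computed, which is the usual reason it is preferred over a full eigendecomposition of the m×m covariance matrix when m (the number of spectral points) is large.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Illustrative NIPALS PCA (hypothetical helper, not from the paper).

    X: (n_samples, n_features) array, e.g. one Raman spectrum per row.
    Returns scores T (n_samples, k) and unit-length loadings P (n_features, k).
    """
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)              # mean-centre every spectral column
    n, m = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    for k in range(n_components):
        # Initialise the score vector with the highest-variance column.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)       # loading: regress columns on scores
            p /= np.linalg.norm(p)      # normalise loading to unit length
            t_new = E @ p               # scores: project samples on loading
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p
        E = E - np.outer(t, p)          # deflate: remove this component
    return T, P
```

The scores `T` would then replace the raw spectra as classifier inputs. Because NIPALS loadings are mutually orthogonal, new spectra `X_new` can be projected into the same reduced space with `(X_new - X.mean(axis=0)) @ P`, using the mean of the training data.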




Copyright information

© 2006 Springer-Verlag London Limited

About this paper

Cite this paper

Howley, T., Madden, M.G., O’Connell, ML., Ryder, A.G. (2006). The Effect of Principal Component Analysis on Machine Learning Accuracy with High Dimensional Spectral Data. In: Macintosh, A., Ellis, R., Allen, T. (eds) Applications and Innovations in Intelligent Systems XIII. SGAI 2005. Springer, London. https://doi.org/10.1007/1-84628-224-1_16


  • DOI: https://doi.org/10.1007/1-84628-224-1_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-223-2

  • Online ISBN: 978-1-84628-224-9

  • eBook Packages: Computer Science, Computer Science (R0)
