Assessing and improving the stability of chemometric models in small sample size situations

  • Original Paper
  • Analytical and Bioanalytical Chemistry

Abstract

Small sample sizes are very common in multivariate analysis. Sample sizes of 10–100 statistically independent objects (rejects from processes or loading dock analysis, or patients with a rare disease), each with hundreds of data points, lead to unstable models with poor predictive quality. Model stability is assessed by comparing models built on slightly varying training data; iterated k-fold cross-validation is used for this purpose. Aggregation stabilizes models, and the quality of the aggregated model can be assessed without calculating further models. The validation and aggregation methods investigated in this study apply to regression as well as to classification. These techniques are useful for analyzing data with large numbers of variates, e.g., spectral data such as FT-IR, Raman, UV/VIS, fluorescence, AAS, and MS. In this study, FT-IR images of tumor tissue were classified using LDA; some tissue types occur frequently, while others are very rare. Initial models were severely unstable. Aggregation stabilized the predictions, and the hit rate increased from 67% to 82%.
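The workflow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic two-class data, a nearest-centroid classifier as a plain stand-in for LDA, and a majority vote as the aggregation rule. All function names (`make_data`, `iterated_kfold`, `summarize`) and parameter values are hypothetical. Each sample is held out once per iteration, so the votes cast for it come only from models that never saw it, which is one way to estimate the aggregated hit rate without fitting further models.

```python
import random
from collections import Counter

def make_data(n_per_class=15, n_features=50, shift=0.4, seed=1):
    """Synthetic stand-in data: few samples, many variates (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT LDA): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(model, row):
    """Return the class whose centroid is closest in squared distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda c: sqdist(model[c]))

def iterated_kfold(X, y, k=5, iterations=21, seed=0):
    """Collect, per sample, its held-out prediction from every iteration."""
    rng = random.Random(seed)
    n = len(X)
    held_out = [[] for _ in range(n)]
    for _ in range(iterations):
        order = list(range(n))
        rng.shuffle(order)                    # new random split each iteration
        for fold in range(k):
            test_idx = order[fold::k]
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            model = nearest_centroid_fit([X[i] for i in train_idx],
                                         [y[i] for i in train_idx])
            for i in test_idx:
                held_out[i].append(nearest_centroid_predict(model, X[i]))
    return held_out

def summarize(held_out, y):
    """Hit rate of single models, of the majority vote, and instability."""
    n = len(y)
    total = sum(len(preds) for preds in held_out)
    single = sum(p == t for preds, t in zip(held_out, y) for p in preds) / total
    votes = [Counter(preds).most_common(1)[0][0] for preds in held_out]
    aggregated = sum(v == t for v, t in zip(votes, y)) / n
    # A sample counts as unstable if its prediction changed between iterations.
    unstable = sum(len(set(preds)) > 1 for preds in held_out) / n
    return single, aggregated, unstable

X, y = make_data()
held_out = iterated_kfold(X, y, k=5, iterations=21)
single, aggregated, unstable = summarize(held_out, y)
print(f"single-model hit rate: {single:.2f}")
print(f"aggregated hit rate:   {aggregated:.2f}")
print(f"unstable samples:      {unstable:.2f}")
```

An odd number of iterations avoids tied votes in this two-class toy setting. The numbers it prints describe the synthetic data only and are unrelated to the 67% and 82% hit rates reported for the FT-IR tumor tissue data.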

Author information

Corresponding author

Correspondence to Claudia Beleites.

About this article

Cite this article

Beleites, C., Salzer, R. Assessing and improving the stability of chemometric models in small sample size situations. Anal Bioanal Chem 390, 1261–1271 (2008). https://doi.org/10.1007/s00216-007-1818-6

  • DOI: https://doi.org/10.1007/s00216-007-1818-6
