Exploratory Data Analysis for Investigating GC-MS Biomarkers

  • Ken McGarry
  • Kim Bartlett
  • Morteza Pourfarzam
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


The detection of reliable biomarkers is a major research activity within the field of proteomics. A biomarker can be a single molecule or a set of molecules that can be used to differentiate between normal and diseased states. This paper describes our approach to developing a reliable, automated method for detecting abnormal metabolite profiles from urinary organic acids. These metabolic profiles are used to detect Inborn Errors of Metabolism (IEM) in infants; IEMs are inherited diseases resulting from alterations in genes that code for enzymes. The detection of abnormal metabolic profiles is usually accomplished through manual inspection of the chromatograms by medical experts. The chromatograms are produced by Gas Chromatography-Mass Spectrometry (GC-MS), a combined technique used to identify the presence of different substances in a given sample. Using GC-MS analysis of a patient's urine sample, the medical experts are able to identify the presence of metabolites that result from an IEM.


Keywords: Exploratory Data Analysis; Decision Tree Model; Subspace Cluster; Urinary Organic Acid; Organic Acidemia
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ken McGarry (1)
  • Kim Bartlett (1)
  • Morteza Pourfarzam (2)
  1. School of Pharmacy, City Campus, University of Sunderland, UK
  2. Royal Victoria Infirmary, Department of Clinical Biochemistry, Newcastle upon Tyne, UK
