Analyzing Feature Importance for Metabolomics Using Genetic Programming

  • Ting Hu
  • Karoliina Oksanen
  • Weidong Zhang
  • Edward Randell
  • Andrew Furey
  • Guangju Zhai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10781)


The emerging and fast-developing field of metabolomics examines the abundance of small-molecule metabolites in body fluids to study the cellular processes by which the human body responds to genetic and environmental perturbations. Because metabolism is complex, metabolites and the cellular processes they represent can be correlated and can contribute synergistically to a phenotypic status. Genetic programming (GP) offers powerful analytical tools for investigating the multifactorial causes of metabolic diseases. In this article, we analyzed a population-based metabolomics dataset on osteoarthritis (OA) and developed a Linear GP (LGP) algorithm to search for classification models that best predict the disease outcome and to identify the most important metabolic markers associated with the disease. The LGP algorithm evolved prediction models with high accuracy, especially when the search was focused on a reduced feature set that included only potentially relevant metabolites. We also identified a set of key metabolic markers that may improve our understanding of the biochemistry and pathogenesis of the disease.
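To make the idea of an LGP classifier and of feature importance more concrete, the following is a minimal Python sketch. It is not the authors' implementation. The instruction encoding, the register count `N_REGISTERS`, the logistic output mapping, and the helper names (`run_program`, `effective_features`, `feature_counts`) are all assumptions made for illustration. A program is a list of register instructions; a sample is classified from the final value of register 0; and a feature is counted as used only if it appears in an instruction that can affect that output.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical encoding: (dest, op, src1, src2).
# A source >= 0 is a calculation register; a source < 0 is an input
# feature, where -1 means feature 0, -2 means feature 1, and so on.
Instruction = Tuple[int, str, int, int]

N_REGISTERS = 4  # assumed number of calculation registers


def protected_div(a: float, b: float) -> float:
    """Division that returns the numerator when the divisor is near zero."""
    return a / b if abs(b) > 1e-9 else a


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def read(src: int, registers: List[float], features: Sequence[float]) -> float:
    """Fetch an operand from a register (src >= 0) or a feature (src < 0)."""
    return registers[src] if src >= 0 else features[-src - 1]


def run_program(program: List[Instruction], features: Sequence[float]) -> int:
    """Execute the instructions in order and classify from register 0."""
    registers = [1.0] * N_REGISTERS
    for dest, op, s1, s2 in program:
        registers[dest] = OPS[op](read(s1, registers, features),
                                  read(s2, registers, features))
    z = max(-60.0, min(60.0, registers[0]))  # clamp to avoid overflow in exp
    probability = 1.0 / (1.0 + math.exp(-z))
    return 1 if probability >= 0.5 else 0  # 1 = case, 0 = control


def effective_features(program: List[Instruction]) -> Set[int]:
    """Walk backwards from register 0 and collect features that can affect it."""
    needed = {0}
    used: Set[int] = set()
    for dest, _op, s1, s2 in reversed(program):
        if dest not in needed:
            continue  # this instruction cannot influence the output
        needed.discard(dest)
        for s in (s1, s2):
            if s >= 0:
                needed.add(s)
            else:
                used.add(-s - 1)
    return used


def feature_counts(programs: List[List[Instruction]]) -> Counter:
    """Count, per feature, how many programs effectively use it."""
    counts: Counter = Counter()
    for program in programs:
        counts.update(effective_features(program))
    return counts


# Two hand-written example programs (not evolved, purely illustrative).
program_a = [(1, "*", -1, -3), (2, "+", -2, -2), (0, "-", 1, 0)]
program_b = [(0, "+", -1, 0)]
print(run_program(program_a, [2.0, 5.0, 3.0]))
print(feature_counts([program_a, program_b]))
```

In `program_a`, the second instruction writes to register 2, which never feeds register 0, so feature 1 is not counted. Ranking features by such counts over a collection of well-performing evolved programs is one simple way to estimate importance; whether it matches the measure used in the paper is not stated in this abstract.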


Keywords: Metabolomics, Osteoarthritis, Biomarker discovery, Genetic programming, Classification



This research was supported by the Newfoundland and Labrador Research and Development Corporation (RDC) Ignite Grant 5404.1942.101 and the Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery Grant RGPIN-2016-04699 to TH. GZ acknowledges grants from the Canadian Institutes of Health Research (CIHR), the Newfoundland and Labrador Research and Development Corporation (RDC), and Memorial University. We thank all the study participants who made this study possible and all the Operation Room staff at Eastern Health General Hospital and St. Clare’s Hospital who helped collect samples.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Memorial University, St. John’s, Canada
  2. Faculty of Medicine, Memorial University, St. John’s, Canada
  3. School of Pharmaceutical Sciences, Jilin University, Jilin, China
