Abstract
The emerging and fast-developing field of metabolomics examines the abundance of small-molecule metabolites in body fluids to study the cellular processes related to how the human body responds to genetic and environmental perturbations. Considering the complexity of metabolism, metabolites and their represented cellular processes can correlate and synergistically contribute to a phenotypic status. Genetic programming (GP) provides advanced analytical instruments for the investigation of multifactorial causes of metabolic diseases. In this article, we analyzed a population-based metabolomics dataset on osteoarthritis (OA) and developed a Linear GP (LGP) algorithm to search classification models that can best predict the disease outcome, as well as to identify the most important metabolic markers associated with the disease. The LGP algorithm was able to evolve prediction models with high accuracies especially with a more focused search using a reduced feature set that only includes potentially relevant metabolites. We also identified a set of key metabolic markers that may improve our understanding of the biochemistry and pathogenesis of the disease.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Kitano, H.: Systems biology: a brief overview. Science 295(5560), 1662–1664 (2002)
Kitano, H.: Computational systems biology. Nature 420(6912), 206–210 (2002)
Ideker, T., Galitski, T., Hood, L.: A new approach to decoding life: systems biology. Annu. Rev. Genom. Hum. Genet. 2(1), 343–372 (2001)
Cusick, M.E., Klitgord, N., Vidal, M., Hill, D.E.: Interactome: gateway into systems biology. Hum. Mol. Genet. 14(suppl 2), R171–181 (2005)
Bruggeman, F.J., Westerhoff, H.V.: The nature of systems biology. Trends Microbiol. 15(1), 45–50 (2007)
Shim, S.H.: Cell imaging: an intracellular dance visualized. Nature 546, 39–40 (2017)
Wang, K., Lee, I., Carlson, G., Hood, L., Galas, D.: Systems biology and the discovery of diagnostic biomarkers. Dis. Markers 28(4), 199–207 (2010)
Butcher, E.C., Berg, E.L., Kunkel, E.J.: Systems biology in drug discovery. Nat. Biotechnol. 22(10), 1253–1259 (2004)
Li, Y., Chen, L.: Big biological data: challenges and opportunities. Genom. Proteomics Bioinf. 12(5), 187–189 (2014)
Alfieri, R., Milanesi, L.: Multi-level data integration and data mining in systems biology. In: Handbook of Research on Systems Biology Applications in Medicine, pp. 476–496. IGI Global (2009)
Sugimoto, M., Kawakami, M., Robert, M., Soga, T., Tomita, M.: Bioinformatics tools for mass spectroscopy-based metabolomic data processing and analysis. Curr. Bioinf. 7(1), 96–108 (2012)
Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24(6), 417 (1933)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Worzel, W.P., Yu, J., Almal, A.A., Chinnaiyan, A.M.: Applications of genetic programming in cancer research. Int. J. Biochem. Cell Biol. 41(2), 405–413 (2009)
Kandpal, M., Kalyan, C.M., Samavedham, L.: Genetic programming-based approach to elucidate biochemical interaction networks from data. IET Syst. Biol. 7(1), 18–25 (2013)
Gowda, G.N., Zhang, S., Gu, H., Asiago, V., Shanaiah, N., Raftery, D.: Metabolomics-based methods for early disease diagnostics. Expert Rev. Mol. Diagn. 8(5), 617–633 (2008)
WHO Scientic Group: the burden of musculoskeletal conditions at the start of the new millennium. WHO Technical Report Series 919, 218 (2003)
Reginster, J.Y.: The prevalence and burden of arthritis. Rheumatology 41, 3–6 (2004)
Zhai, G., Aref-Eshghi, E., Rahman, P., Zhang, H., Martin, G., Furey, A., Green, R.C., Sun, G.: Attempt to replicate the published osteoarthritis-associated genetic variants in the newfoundland & labrador population. J. Orthop. Rheumatol. 1(3), 5 (2014)
Hu, T., Zhang, W., Fan, Z., Sun, G., Likhodi, S., Randell, E., Zhai, G.: Metabolomics differential correlation network analysis of osteoarthritis. Pac. Symp. Biocomput. 21, 120–131 (2016)
Altman, R., Alarcon, G., Appelrouth, D., Bloch, D., Borenstein, D., Brandt, K., Brown, C., Cooke, T.D., et al.: The american college of rheumatology criteria for the classification and reporting of osteoarthritis of the hip. Arthritis Rheum. 34(5), 505–514 (1991)
Zhang, W., Likhodii, S., Aref-Eshghi, E., Zhang, Y., Harper, P.E., Randell, E., Green, R., Martin, G., Furey, A., Sun, G., Rahman, P., Zhai, G.: Relationship between blood plasma and synovial fluid metabolite concentrations in patients with osteoarthritis. J. Rheumatol. 42(5), 859–865 (2015)
Brameier, M.F., Banzhaf, W.: Linear Genetic Programming. Springer, New York (2007)
Brameier, M.F., Banzhaf, W.: A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans. Evol. Comput. 5(1), 17–26 (2001)
Guven, A.: Linear genetic programming for time-series modeling of daily flow rate. J. Earth Syst. Sci. 118(2), 137–146 (2009)
Song, D., Heywood, M.I., Zincir-Heywood, A.N.: A linear genetic programming approach to intrusion detection. In: Cantú-Paz, E. (ed.) GECCO 2003. LNCS, vol. 2724, pp. 2325–2336. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45110-2_125
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. CoRR abs/1411.1607 (2014). http://arxiv.org/abs/1411.1607
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Zhang, W., Sun, G., Likhodii, S., Liu, M., Aref-Eshghi, E., Harper, P.E., Martin, G., Furey, A., Green, R., Randell, E., Rahman, P., Zhai, G.: Metabolomic analysis of human plasma reveals that arginine is depleted in knee osteoarthritis patients. Osteoarthr. Cartil. 24, 827–834 (2016)
Zhai, G., Wang-Sattler, R., Hart, D.J., Arden, N.K., Hakim, A.J., Illig, T., Spector, T.D.: Serum branched-chain amino acid to histidine ratio: a novel metabolomic biomarker of knee osteoarthritis. Ann. Rheum. Dis. 69(6), 1227–1231 (2010)
Zhang, W., Sun, G., Likhodii, S., Aref-Eshghi, E., Harper, P.E., Randell, E., Green, R., Martin, G., Furey, A., Rahman, P., Zhai, G.: Metabolomic analysis of human synovial fluid and plasma reveals that phosphatidylcholine metabolism is associated with both osteoarthritis and diabetes mellitus. Metabolomics 12, 24 (2016)
Zhang, W., Sun, G., Aitken, D., Likhodii, S., Liu, M., Martin, G., Furey, A., Randell, E., Rahman, P., Jones, G., Zhai, G.: Lysophosphatidylcholines to phosphatidylcholines ratio predicts advanced knee osteoarthritis. Rheumatology 55(9), 1566–1574 (2016)
Zhang, W., Likhodii, S., Zhang, Y., Aref-Eshghi, E., Harper, P.E., Randell, E., Green, R., Martin, G., Furey, A., Sun, G., Rahman, P., Zhai, G.: Classification of osteoarthritis phenotypes by metabolomics analysis. BMJ Open 4, e006286 (2014)
Marcinkiewicz, J., Kontny, E.: Taurine and inflammatory diseases. Amino Acids 46(1), 7–20 (2014)
Loeser, R.F.: Aging and osteoarthritis: the role of chondrocyte senescence and aging changes in the cartilage matrix. Osteoarthr. Cartil. 17(8), 971–979 (2009)
Kontny, E., Wojtecka-ŁUkasik, E., Rell-Bakalarska, K., Dziewczopolski, W., Maśliński, W., Maślinski, S.: Impaired generation of taurine chloramine by synovial fluid neutrophils of rheumatoid arthritis patients. Amino Acids 23(4), 415–418 (2002)
Loeser, R.F., Carlson, C.S., Carlo, M.D., Cole, A.: Detection of nitrotyrosine in aging and osteoarthritic cartilage: correlation of oxidative damage with the presence of interleukin-1\(\beta \) and with chondrocyte resistance to insulin-like growth factor 1. Arthritis Rheumatol. 46(9), 2349–2357 (2002)
Forrest, C.M., Kennedy, A., Stone, T.W., Stoy, N., Darlington, L.G.: Kynurenine and neopterin levels in patients with rheumatoid arthritis and osteoporosis during drug treatment. In: Allegri, G., Costa, C.V.L., Ragazzi, E., Steinhart, H., Varesio, L. (eds.) Developments in Tryptophan and Serotonin Metabolism. AEMB, vol. 527, pp. 287–295. Springer, Boston (2003). https://doi.org/10.1007/978-1-4615-0135-0_32
Acknowledgments
This research was supported by Newfoundland and Labrador Research and Development Corporation (RDC) Ignite Grant 5404.1942.101 and the Natural Science and Engineering Research Council (NSERC) of Canada Discovery Grant RGPIN-2016-04699 to TH. GZ acknowledges grants from Canadian Institute of Health Research (CIHR), Newfoundland and Labrador Research and Development Corporation (RDC) and Memorial University. We thank all the study participants who made this study possible and all the Operation Room staff at Eastern Health General Hospital and St. Clare’s Hospital who helped for collecting samples.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Hu, T., Oksanen, K., Zhang, W., Randell, E., Furey, A., Zhai, G. (2018). Analyzing Feature Importance for Metabolomics Using Genetic Programming. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2018. Lecture Notes in Computer Science(), vol 10781. Springer, Cham. https://doi.org/10.1007/978-3-319-77553-1_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-77553-1_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77552-4
Online ISBN: 978-3-319-77553-1
eBook Packages: Computer ScienceComputer Science (R0)