Recognition of early and late stages of bladder cancer using metabolites and machine learning
Bladder cancer (BCa) is one of the most common and aggressive cancers. It is the sixth most frequently occurring cancer in men, and its incidence increases with age. The current method of BCa diagnosis involves cystoscopy and biopsy, a process that is expensive, unpleasant, and may have severe side effects. Recent growth in the power and accessibility of machine-learning software has allowed for the development of new, non-invasive diagnostic methods that do not compromise accuracy or sensitivity.
The goal of this research was to elucidate the biomarkers, including metabolites and corresponding genes, for different stages of BCa; to show their distinguishing and common features; and to create a machine-learning model for classifying the stages of BCa.
Sets of metabolites for the early and late stages, as well as those common to both stages, were analyzed using MetaboAnalyst and Ingenuity® Pathway Analysis (IPA®) software. Machine-learning methods were used to develop a binary classifier for early- and late-stage metabolites of BCa. Metabolites were quantitatively characterized using EDragon 1.0 software. The two modeling methods used were Multilayer Perceptron (MLP) and Stochastic Gradient Descent (SGD) with a logistic regression loss function.
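As a rough illustration of the second modeling method, the sketch below trains a binary classifier by stochastic gradient descent on the logistic (log) loss. It is not the authors' implementation: the two descriptor columns, the toy values, the hyperparameters, and the function names `train_sgd_logistic` and `predict` are all assumptions made up for this example, with class 0 standing in for early-stage and class 1 for late-stage metabolites.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_logistic(X, y, lr=0.1, epochs=200, l2=1e-4, seed=0):
    """Fit weights and bias by SGD on the logistic loss (hypothetical helper)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = sigmoid(score) - y[i]  # gradient of the log loss w.r.t. the score
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    # Class 1 when the predicted probability is at least 0.5.
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy data: two invented, pre-scaled chemical descriptors per metabolite.
X = [[0.2, 1.0], [0.4, 0.8], [0.1, 0.9], [0.3, 0.7],
     [0.9, 0.2], [0.8, 0.1], [1.0, 0.3], [0.7, 0.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = early stage, 1 = late stage (assumed labels)

w, b = train_sgd_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

In practice the descriptor matrix would come from the descriptor-calculation step and would have many more columns; the update rule stays the same.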
We explored metabolic pathways related to early-stage BCa (Galactose metabolism and Starch and sucrose metabolism) and to late-stage BCa (Glycine, serine, and threonine metabolism, Arginine and proline metabolism, Glycerophospholipid metabolism, and Galactose metabolism), as well as pathways common to both stages. The central metabolite affecting the largest number of cancerogenic genes (AKT, EGFR, MAPK3) in the early stage is d-glucose, while late-stage BCa is characterized by significant fold changes in several metabolites: glycerol, choline, 13(S)-hydroxyoctadecadienoic acid, and 2′-fucosyllactose. Insulin was also seen to play an important role in the late stages of BCa. The best-performing model predicted metabolite class with an accuracy of 82.54% and an area under the precision-recall curve (PRC) of 0.84 on the training set. The same model was applied to three separate sets of metabolites obtained from public sources: one set of late-stage metabolites and two sets of early-stage metabolites. The model was better at predicting early-stage metabolites, with accuracies of 72% (18/25) and 95% (19/20) on the early-stage sets and 65.45% (36/55) on the late-stage set.
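The external-set accuracies quoted above follow directly from the reported counts of correct predictions. The short sketch below reproduces that arithmetic; the set labels are placeholders chosen here, and the pooled figure across all three sets is a derived value that the source does not report.

```python
def accuracy_percent(correct, total):
    # Accuracy as the percentage of correctly classified metabolites.
    return 100.0 * correct / total


# (correct, total) counts as reported; the set names are placeholder labels.
external_sets = {
    "early set 1": (18, 25),
    "early set 2": (19, 20),
    "late set": (36, 55),
}

per_set = {
    name: round(accuracy_percent(correct, total), 2)
    for name, (correct, total) in external_sets.items()
}

# Pooled accuracy over all three sets (derived here, not stated in the source).
total_correct = sum(correct for correct, _ in external_sets.values())
total_count = sum(total for _, total in external_sets.values())
pooled = round(accuracy_percent(total_correct, total_count), 2)

for name, value in per_set.items():
    print(f"{name}: {value}%")
print(f"pooled: {pooled}%")
```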
By comparing the biomarkers present in the urine samples of BCa patients with those of normal subjects, the biomarkers associated with this cancer can be pinpointed, which in turn helps elucidate the affected metabolic pathways that are specific to different stages of the cancer. Developing a machine-learning model that incorporates metabolites and their chemical descriptors made it possible to predict the stages of BCa with considerable accuracy.
Keywords: Bladder cancer; Metabolomics; Metabolic networks; Biomarkers; Machine learning
VK and IT designed the research. EK coordinated the data collection. VK, IT and EK analyzed the data and wrote the manuscript. AZ and ER created the data set for machine learning and executed it. All authors read and approved the manuscript.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Research involving human and animal participants
This article does not contain any studies with human and/or animal participants performed by any of the authors.