Identifying Cancer Biomarkers from High-Throughput RNA Sequencing Data by Machine Learning

Zhang, Zishuang; Liu, Zhi-Ping

doi:10.1007/978-3-030-26969-2_49

Conference paper
First Online: 24 July 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11644)

Abstract

During cancer progression, the expression levels of relevant genes change significantly in tumors compared with their healthy counterparts. The discovery of specific genes that serve as biomarkers is therefore of practical significance for diagnosis and prognosis. High-throughput ‘-omic’ datasets, such as the public RNA-sequencing data generated by The Cancer Genome Atlas (TCGA) consortium, provide unprecedented resources and opportunities for deriving cancer biomarkers. Here, we explore the identification of biomarker genes in 12 types of cancer, based on how well they classify control and disease samples by machine learning. We first identify differentially expressed genes individually. We then perform feature selection by integrating recursive feature reduction and random forest classification with feature ranking. The final number of features is determined by a parsimony principle: as few features as possible while still achieving the highest classification accuracy. For each cancer, the biomarker genes are then evaluated by tenfold cross-validation with several classification algorithms. We find that the extreme learning machine achieves the best classification performance among the methods compared. Further gene enrichment analyses point to the dysfunctional and pathogenic mechanisms involving the identified biomarkers.
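The feature-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `rank` and `evaluate` callables, the `tol` parameter, and the gene labels are all hypothetical. In the paper, ranking comes from a random forest and accuracy from classification of control and disease samples; here both are replaced by toy stand-ins so the example runs with the Python standard library alone.

```python
from typing import Callable, List, Sequence, Tuple

History = List[Tuple[List[str], float]]


def recursive_feature_reduction(
    features: Sequence[str],
    rank: Callable[[List[str]], List[str]],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> History:
    """Drop the lowest-ranked feature(s) repeatedly, recording accuracy per subset."""
    current = list(features)
    history: History = []
    while current:
        history.append((list(current), evaluate(current)))
        if len(current) <= step:
            break
        ranked = rank(current)  # most important feature first
        current = ranked[: len(current) - step]
    return history


def most_parsimonious(history: History, tol: float = 0.0) -> List[str]:
    """Return the smallest subset whose accuracy is within tol of the best one."""
    best = max(acc for _, acc in history)
    candidates = [subset for subset, acc in history if acc >= best - tol]
    return min(candidates, key=len)


# Toy stand-ins (assumptions): fixed importance weights instead of a random
# forest ranking, and a made-up accuracy that depends only on how many of the
# two "informative" genes are still present in the subset.
WEIGHTS = {"GENE_A": 3, "GENE_B": 2, "GENE_C": 1, "GENE_D": 0}
INFORMATIVE = {"GENE_A", "GENE_B"}


def toy_rank(subset: List[str]) -> List[str]:
    return sorted(subset, key=lambda g: WEIGHTS[g], reverse=True)


def toy_accuracy(subset: List[str]) -> float:
    return len(INFORMATIVE & set(subset)) / len(INFORMATIVE)


history = recursive_feature_reduction(list(WEIGHTS), toy_rank, toy_accuracy)
selected = most_parsimonious(history)
print(selected)
```

In a real analysis, `evaluate` would typically return a cross-validated accuracy, and a small positive `tol` could be used to prefer a shorter gene list when the accuracy difference is negligible; both choices are assumptions of this sketch rather than details stated in the abstract.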

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (Nos. 61572287 and 61533011), the Shandong Provincial Key Research and Development Program, China (No. 2018GSF118043), the Innovation Method Fund of China (Ministry of Science and Technology of China, No. 2018IM020200), and the Program of Qilu Young Scholars of Shandong University.

Author information

Authors and Affiliations

School of Control Science and Engineering, Shandong University, Jinan, 250061, Shandong, China
Zishuang Zhang & Zhi-Ping Liu

Corresponding author

Correspondence to Zhi-Ping Liu.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Nanchang Institute of Technology, Nanchang, China
Zhi-Kai Huang

Cite this paper

Zhang, Z., Liu, ZP. (2019). Identifying Cancer Biomarkers from High-Throughput RNA Sequencing Data by Machine Learning. In: Huang, DS., Jo, KH., Huang, ZK. (eds) Intelligent Computing Theories and Application. ICIC 2019. Lecture Notes in Computer Science, vol 11644. Springer, Cham. https://doi.org/10.1007/978-3-030-26969-2_49

DOI: https://doi.org/10.1007/978-3-030-26969-2_49
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26968-5
Online ISBN: 978-3-030-26969-2
eBook Packages: Computer Science, Computer Science (R0)
