Statistical Methods for Identifying Biomarkers from miRNA Profiles of Cancers

Junghyun Namkung
Part of the Methods in Molecular Biology book series (MIMB, volume 1882)


Biomarkers play important roles in the early diagnosis and treatment planning of cancer patients, and their importance is growing. With advances in high-throughput molecular profiling technologies for various types of molecules, such as DNA, RNA, proteins, and metabolites, it is now possible to perform large-scale profiling analyses that accelerate the discovery of novel biomolecules. No single marker is sufficiently accurate for clinical use, and thus cancer biomarkers are developed in the form of multi-marker panels. Among the various types of molecular biomarkers, microRNAs (miRNAs) have recently emerged as a promising class of cancer biomarkers. MiRNAs are small noncoding RNAs that regulate gene expression. This chapter gives an overview of the process of identifying biomarker panels from miRNA profiles, focusing on statistical methods. It first introduces molecular cancer biomarkers. The Methods section then reviews the workflow from sample design through miRNA profiling.

Statistical methods for biomarker development are introduced according to three typical purposes of molecular biomarkers: tumor subtype classification, early detection, and prediction of patients' treatment response or prognosis. Example R code is also provided for selected methods.
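
For the early-detection purpose, penalized regression (listed among the keywords) can both classify samples and select a small miRNA panel. The following is a minimal Python sketch, not the chapter's R code, assuming standardized expression values and a binary cancer-versus-control label. It fits an L1-penalized (lasso) logistic regression by proximal gradient descent and reports the miRNAs with nonzero coefficients as the candidate panel. The function names, the fixed step size, the penalty value `lam`, and the miRNA labels `miR-X1`/`miR-X2` are hypothetical. In practice, one would use an established package and choose the penalty by cross-validation.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t (exactly zero inside [-t, t])."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Fit an L1-penalized logistic regression by proximal gradient descent (ISTA).

    X   : list of samples, each a list of standardized miRNA expression values
    y   : list of 0/1 labels (hypothetical coding: 1 = cancer, 0 = control)
    lam : penalty weight; larger values keep fewer miRNAs in the panel
    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean negative log-likelihood at the current estimate.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        # Plain gradient step for the intercept, proximal (soft-threshold) step for the rest.
        b0 -= step * g0
        beta = [soft_threshold(beta[j] - step * g[j], step * lam) for j in range(p)]
    return b0, beta


def selected_panel(beta, names):
    """Return the miRNAs whose coefficients survived the L1 penalty."""
    return [name for name, b in zip(names, beta) if b != 0.0]


# Toy data (hypothetical): miR-X1 separates cases from controls, miR-X2 is pure noise.
names = ["miR-X1", "miR-X2"]
X = [[2.0, 1.0], [2.0, -1.0], [1.0, 1.0], [1.0, -1.0],
     [-2.0, 1.0], [-2.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

b0, beta = lasso_logistic(X, y, lam=0.05)
print(selected_panel(beta, names))  # prints ['miR-X1']
```

Raising `lam` shrinks more coefficients to exactly zero and yields smaller panels. With a large enough penalty (for this toy data, `lam=10.0`), no miRNA is selected at all. The soft-thresholding step is what produces exact zeros, which is why the lasso doubles as a variable-selection method.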

Key words

Cancer biomarker, Biomarker identification, Penalized regression, Cox proportional hazards model, Molecular subtype, MiRNA profile

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

Junghyun Namkung, Data Analytics CoE, Data R&D Center, SK Telecom, Seoul, Korea
