Cognitive Computation, Volume 7, Issue 6, pp 652–666

Specific Biomarkers: Detection of Cancer Biomarkers Through High-Throughput Transcriptomics Data

  • Wei Du
  • Zhongbo Cao
  • Yan Wang
  • Fengfeng Zhou
  • Wei Pang
  • Xin Chen
  • Yuan Tian
  • Yanchun Liang


Cancer is a systemic disease involving dysregulated biological processes of cell proliferation, metabolism, and apoptosis. Patients with some types of cancer are known to survive longer, and some cancers are even curable if they are diagnosed and treated properly at an early stage. It is therefore essential to find biomarkers that detect these cancers early. With the rapid development of high-throughput microarray and sequencing technologies, many biomarker-based assays for early cancer diagnosis have been proposed, and some are already available on the market. Most cancer biomarkers are detected by comparing cancer samples with normal samples for a single cancer type, but most of them are not compared against samples of other cancer types. In this research, we propose a novel computational method to comprehensively detect highly accurate cancer biomarkers for different groups of cancer types, with special emphasis on detection specificity against control samples that include both samples from healthy persons and samples from other cancer types. Such biomarkers are called specific biomarkers for a given cancer group, where a group may be defined as cancers of the same type, cancers with similar survival rates, grades, or development stages, or cancers in the same human body system. The proposed algorithm is extensively evaluated across eight cancer types, and the detection performance shows that the specific biomarkers have reasonable sensitivities and very high specificities. The main contributions of this work are (a) the detection of highly specific biomarkers for eight cancer types and (b) the detection of specific biomarkers for cancers with similar survival rates. The proposed algorithm may also be used to detect specific biomarkers for cancers of given stages, grades, or body systems.
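
The evaluation idea in the abstract, measuring sensitivity on a target cancer group and specificity against controls that pool healthy samples with samples from other cancer types, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name, the toy expression values, and the single-gene threshold rule are all assumptions made purely for illustration.

```python
from typing import Sequence, Set, Tuple


def sensitivity_specificity(
    labels: Sequence[str],
    predicted_positive: Sequence[bool],
    target_group: Set[str],
) -> Tuple[float, float]:
    """Score a candidate biomarker against pooled controls.

    Samples whose label is in `target_group` count as positives; every
    other sample (healthy or another cancer type) counts as a control.
    """
    tp = fn = tn = fp = 0
    for label, predicted in zip(labels, predicted_positive):
        in_group = label in target_group
        if in_group and predicted:
            tp += 1
        elif in_group:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: one gene's expression level per sample.
labels = ["gastric", "gastric", "gastric", "gastric",
          "normal", "normal", "lung", "lung", "colon", "colon"]
expression = [8.1, 7.4, 6.9, 4.2, 3.0, 3.5, 4.0, 7.2, 2.8, 3.9]

# Assumed decision rule: call a sample positive above a fixed threshold.
threshold = 6.0
calls = [value > threshold for value in expression]

sens, spec = sensitivity_specificity(labels, calls, {"gastric"})
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Passing a larger set as `target_group`, for example several cancer types with similar survival rates, scores the same candidate as a marker for a cancer group rather than a single type.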


Keywords

Cancer; Specific biomarker; Microarray data; Multiple cancer types; Survival rate



The authors are grateful for the support of the NSFC (61272207, 61472158, 61402194), the China 973 program (2010CB732606), the Ph.D. Program Foundation of MOE of China (20120061120106), and the China Postdoctoral Science Foundation (2012M520678, 2014T70291). Computing resources were partly provided by the Dawning supercomputing clusters at SIAT CAS.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  2. Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, USA
  3. Shenzhen Institutes of Advanced Technology, Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, China
  4. School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK
