Abstract
In recent years, microarrays have been shown to be an effective tool for studying various biological processes, for example to improve our understanding of diseases such as cancer. In a typical setting, a microarray dataset can be viewed as a large matrix whose rows correspond to thousands of genes, whose columns correspond to tens of conditions (such as samples from different patients), and whose entries are expression values. Several statistical techniques have been proposed in the literature to analyze such gene expression matrices. Among them, biclustering has been shown to be one of the most effective methods for discovering gene expression patterns under various conditions. In this paper, we present a methodology that takes advantage of the homogeneously expressed genes in biclusters to construct a classifier for predicting sample class membership. Our extensive experiments on 8 real cancer microarray datasets (4 diagnostic and 4 prognostic) show that the proposed classifier achieves superior performance in both cancer diagnosis and prognosis, the latter of which was previously regarded as quite difficult. Additionally, our results demonstrate that sample classification accuracy can serve as a good subjective quality measure for different types of biclusters, and hence as a tool to extrinsically evaluate the performance of the biclustering algorithms that produce them.
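The abstract does not say how the homogeneity of a bicluster is quantified. One widely used score is the mean squared residue of Cheng and Church. For a submatrix with row (gene) set I and column (condition) set J, it is defined as H(I, J) = (1 / (|I||J|)) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2. Here a_iJ is the mean of row i over J, a_Ij is the mean of column j over I, and a_IJ is the mean of the whole submatrix. The sketch below computes this score. It is illustrative only: whether this paper relies on the mean squared residue is an assumption, and the function name `mean_squared_residue` is hypothetical.

```python
from typing import Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue H(I, J) of the bicluster matrix[rows][cols].

    Hypothetical helper for illustration; not taken from the paper.
    Lower values indicate a more homogeneous (coherent) bicluster.
    """
    if not rows or not cols:
        raise ValueError("bicluster must have at least one row and one column")

    # Extract the submatrix indexed by the gene set I (rows) and condition set J (cols).
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    # a_iJ: mean of each row over J; a_Ij: mean of each column over I; a_IJ: overall mean.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residues a_ij - a_iJ - a_Ij + a_IJ over all cells.
    total = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            residue = sub[r][c] - row_means[r] - col_means[c] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


if __name__ == "__main__":
    # An additive pattern (each row is a shifted copy of the others) has zero residue.
    expr = [[1.0, 2.0, 3.0, 9.0],
            [2.0, 3.0, 4.0, 0.0],
            [5.0, 6.0, 7.0, 4.0]]
    print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))  # coherent columns
    print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3]))  # adds a noisy column
```

In the Cheng and Church formulation, a submatrix is accepted as a bicluster when H(I, J) falls below a user-chosen threshold delta. Under that view, the genes of a low-residue bicluster are the kind of "homogeneously expressed genes" a classifier could draw on.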
Acknowledgements
The authors would like to thank Guohui Lin for his guidance on this work and Zhipeng Cai for providing the authors with a number of the microarray datasets used in this study. This research was supported by the Economic Development Board and the National Research Foundation of Singapore.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Malhotra, B., Dahlmeier, D., Nandan, N. (2014). A Biclustering-Based Classification Framework for Microarray Analysis. In: Peng, WC., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8643. Springer, Cham. https://doi.org/10.1007/978-3-319-13186-3_39
DOI: https://doi.org/10.1007/978-3-319-13186-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13185-6
Online ISBN: 978-3-319-13186-3
eBook Packages: Computer Science, Computer Science (R0)