Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing

Sehgal, Muhammad Shoaib B.; Gondal, Iqbal; Dooley, Laurence

doi:10.1007/11589990_30


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence


Abstract

Microarrays can probe thousands of genes at a time, which makes them a useful tool for a variety of applications ranging from diagnosis to drug discovery. However, the data they generate often contain multiple missing gene expression values, which affect subsequent analysis because these missing values are most often simply ignored. In this paper we analyze how accurate estimation of missing values can lead to better subsequent gene selection and class prediction. By exploiting both local/global and positive/negative correlation values, Collateral Missing Value Estimation (CMVE) demonstrates superior imputation performance compared to Bayesian Principal Component Analysis (BPCA) Impute and the K-Nearest Neighbour (KNN) algorithm when estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian cancer. CMVE also consistently outperforms the BPCA, KNN and ZeroImpute techniques in terms of classification accuracy. Imputation is followed by gene selection using a fusion of the Between-Group to Within-Group Sum of Squares and Weighted Partial Least Squares methods, with the Ridge Partial Least Squares algorithm used as the class predictor.
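To make the comparison baselines concrete, the following is a minimal sketch of two of the simpler techniques named in the abstract, ZeroImpute and KNN imputation. It is not the CMVE algorithm and not the authors' code. The function names, the genes-by-samples NaN-marked matrix layout, the RMS distance over shared samples, and the inverse-distance weighting are all assumptions chosen for illustration.

```python
import numpy as np


def zero_impute(X):
    """ZeroImpute baseline: replace every missing entry (NaN) with 0."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), 0.0, X)


def knn_impute(X, k=3):
    """KNN-style imputation on a genes x samples matrix (assumed layout).

    For each gene with missing entries, other genes are ranked by RMS
    distance over the samples both genes have observed. Each missing entry
    is then filled with an inverse-distance weighted mean of the k nearest
    genes that have a value in that sample.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    for g in np.flatnonzero(missing.any(axis=1)):
        observed = ~missing[g]
        # Rank every other gene by distance to gene g.
        ranked = []
        for h in range(X.shape[0]):
            if h == g:
                continue
            shared = observed & ~missing[h]
            if not shared.any():
                continue  # no common samples, distance undefined
            d = np.sqrt(np.mean((X[g, shared] - X[h, shared]) ** 2))
            ranked.append((float(d), h))
        ranked.sort()

        for s in np.flatnonzero(missing[g]):
            # Keep only neighbours that actually observed sample s.
            neighbours = [(d, h) for d, h in ranked if not missing[h, s]][:k]
            if not neighbours:
                continue  # nothing to borrow from, leave as NaN
            weights = np.array([1.0 / (d + 1e-8) for d, _ in neighbours])
            values = np.array([X[h, s] for _, h in neighbours])
            filled[g, s] = np.sum(weights * values) / np.sum(weights)
    return filled


if __name__ == "__main__":
    # Hypothetical toy matrix: 3 genes x 3 samples, one missing value.
    toy = np.array([[1.0, 2.0, 3.0],
                    [1.0, 2.0, np.nan],
                    [5.0, 6.0, 7.0]])
    print(zero_impute(toy))
    print(knn_impute(toy, k=1))
```

In the toy matrix the second gene matches the first on both shared samples, so with k=1 the missing entry is borrowed from the first gene, whereas ZeroImpute writes 0 regardless of the neighbouring expression profiles.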

Author information

Authors and Affiliations

Faculty of IT, Monash University, Churchill, VIC, 3842, Australia
Muhammad Shoaib B. Sehgal, Iqbal Gondal & Laurence Dooley

Editor information

Editors and Affiliations

Guangxi Normal University, College of CS and IT, Guilin, China, and University of Technology, Faculty of Engineering and Information Technology, Sydney, Australia
Shichao Zhang
Department of Electrical and Computer Systems Engineering, Monash University, 3800, Melbourne, Victoria, Australia
Ray Jarvis

About this paper

Cite this paper

Sehgal, M.S.B., Gondal, I., Dooley, L. (2005). Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_30

DOI: https://doi.org/10.1007/11589990_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)
