Missing Value Imputation Framework for Microarray Significant Gene Selection and Class Prediction

Sehgal, Muhammad Shoaib B.; Gondal, Iqbal; Dooley, Laurence

doi:10.1007/11691730_14

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3916)

Included in the following conference series:

International Workshop on Data Mining for Biomedical Applications


Abstract

Microarray data is used in a large number of applications, ranging from diagnosis to drug discovery. Such data, however, often contains multiple missing gene expression values, which are generally ignored, thus degrading the reliability of inferred results. This paper presents a robust imputation framework that estimates missing values more accurately, which in turn leads to better gene selection and class prediction. To test this premise, several missing value estimation techniques are analysed: Collateral Missing Values Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), k-Nearest Neighbour (KNN) and ZeroImpute. A combination of univariate and multiple gene selection methods, namely Between Group to Within Group Sum of Squares and Weighted Partial Least Squares, is then applied before class prediction is performed using the Ridge Partial Least Squares method. Overall, CMVE imputation consistently provided superior missing value estimation accuracy compared with the other algorithms examined, by virtue of exploiting local and global as well as positive and negative correlations between genes. All empirical results were corroborated by the two-sided Wilcoxon rank-sum statistical significance test.
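To make the pipeline in the abstract more concrete, the following is a minimal sketch of its two simplest baseline imputers (ZeroImpute and a plain k-nearest-neighbour imputer) together with the Between Group to Within Group Sum of Squares ratio used for univariate gene ranking. It is not the authors' implementation and does not reproduce CMVE, BPCA or LSImpute. The function names, the use of `None` as the missing-value marker, the root-mean-square distance over co-observed columns, and the unweighted neighbour mean are all assumptions made for illustration.

```python
import math

def zero_impute(matrix):
    """ZeroImpute baseline: replace every missing entry (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

def _distance(row_a, row_b):
    """Root-mean-square difference over columns observed in both rows."""
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a is not None and b is not None]
    if not pairs:
        return math.inf  # no shared observations, so the rows are not comparable
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))

def knn_impute(matrix, k=2):
    """Fill each missing entry with the unweighted mean of that column over
    the k nearest genes (rows) that have the column observed.
    Assumption: a simple mean is used instead of a distance-weighted one."""
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            # Fall back to zero when no usable neighbour exists.
            filled[i][j] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return filled

def bss_wss(expression, labels):
    """Between-group to within-group sum of squares ratio for one gene,
    given its expression across samples and the class label of each sample."""
    overall = sum(expression) / len(expression)
    bss = wss = 0.0
    for c in set(labels):
        group = [x for x, y in zip(expression, labels) if y == c]
        mean_c = sum(group) / len(group)
        bss += len(group) * (mean_c - overall) ** 2
        wss += sum((x - mean_c) ** 2 for x in group)
    return bss / wss if wss > 0 else math.inf

# Toy data: rows are genes, columns are samples; None marks a missing value.
genes = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [5.0, 6.0, 7.0, 8.0],
    [0.9, 1.9, 2.9, 3.9],
]
labels = [0, 0, 1, 1]  # hypothetical class label per sample

zero_filled = zero_impute(genes)
knn_filled = knn_impute(genes, k=2)
# Rank genes by BSS/WSS after imputation; larger ratios separate classes better.
scores = [bss_wss(row, labels) for row in knn_filled]
```

In this toy case the KNN imputer borrows from the two genes whose profiles are closest to the incomplete one, whereas ZeroImpute ignores all correlation structure. That difference is the kind of effect the paper evaluates downstream, through gene selection and class prediction.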



Author information

Authors and Affiliations

Faculty of IT, Monash University, Churchill, 3842, VIC, Australia
Muhammad Shoaib B. Sehgal, Iqbal Gondal & Laurence Dooley


Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Jinyan Li
The Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Intelligent Systems Centre and School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
Ah-Hwee Tan


Cite this paper

Sehgal, M.S.B., Gondal, I., Dooley, L. (2006). Missing Value Imputation Framework for Microarray Significant Gene Selection and Class Prediction. In: Li, J., Yang, Q., Tan, A.-H. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science, vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_14


DOI: https://doi.org/10.1007/11691730_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33104-9
Online ISBN: 978-3-540-33105-6
eBook Packages: Computer Science, Computer Science (R0)
