Skip to main content

Missing Value Imputation Framework for Microarray Significant Gene Selection and Class Prediction

  • Conference paper
Data Mining for Biomedical Applications (BioDM 2006)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 3916))

Included in the following conference series:

Abstract

Microarray data is used in a large number of applications ranging from diagnosis through to drug discovery. Such data however, often contains multiple missing genetic expressions which are generally ignored thus degrading the reliability of inferred results. This paper presents an innovative and robust imputation framework that more accurately estimates missing values leading subsequently to better gene selection and class prediction. To prove this premise, several missing value techniques including the Collateral Missing Values Estimation (CMVE), Bayesian Principal Component Analysis (BPCA), Least Square Impute (LSImpute), k-Nearest Neighbour (KNN) and ZeroImpute are analysed. A combination of univariate and multiple gene selection methods, namely, Between Group to within Group Sum of Squares and Weighted Partial Least Squares is then performed before applying class prediction using the Ridge Partial Least Square method. Overall, CMVE imputation consistently provided superior missing values estimation accuracy compared with the other algorithms examined, by virtue of exploiting local and global as well as positive and negative correlations between genes, with all empirical results being corroborated by the two-sided Wilcoxon Rank sum statistical significance test.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Sehgal, M.S.B., Gondal, I., Dooley, L.: Collateral Missing Value Imputation: a new robust missing value estimation algorithm for microarray data. Bioinformatics 21(10), 2417–2423 (2005)

    Article  MATH  Google Scholar 

  2. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasen-beek, M., Mesirov, J.P., Coller, H., Loh, M.L., Down-ing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lan-der, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)

    Article  Google Scholar 

  3. Bhattacharjee, A., Richards, W.G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E.F., Lander, E.S., Wong, W., Johnson, B.E., Golub, T.R., Sugarbaker, D.J., Meyerson, M.: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci., 13790–13795 (2001)

    Google Scholar 

  4. Sehgal, M.S.B., Gondal, I., Dooley, L.: A Collateral Missing Value Estimation Algorithm for DNA Microarrays. In: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), USA, pp. 377–380 (2005)

    Google Scholar 

  5. Oba, S., Sato, M.A., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian Missing Value Estimation Method for Gene Expression Profile Data. Bioinformatics 19, 2088–2096 (2003)

    Article  Google Scholar 

  6. Sehgal, M.S.B., Gondal, I., Dooley, L.: Support Vector Machine and Generalized Regression Neural Network Based Classification Fusion Models for Cancer Diagnosis. In: IEEE Hybrid Intelligent Systems (HIS) 2004, Japan, pp. 49–54 (2004)

    Google Scholar 

  7. Fort, G., Lambert-Lacroix, S.: Classification using partial least squares with penalized logistic regression. Bioinformatics 21, 1104–1111 (2005)

    Article  Google Scholar 

  8. Liu, X., Krishnan, A., Mondry, A.: An Entropy-based gene selection method for cancer classification using microarray data. BMC Bioinformatics 6, 76 (2005)

    Article  Google Scholar 

  9. Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Kallioniemi, O.P., Wilfond, B., Borg, A., Trent, J.: Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med. 344(8), 539–548 (2001)

    Article  Google Scholar 

  10. Sehgal, M.S.B., Gondal, I., Dooley, L.: Statistical Neural Networks and Support Vector Machine for the Classification of Genetic Mutations in Ovarian Cancer. In: IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) 2004, USA, pp. 140–146 (2004)

    Google Scholar 

  11. Bø, T.H., Dysvik, B., Jonassen, I.: LSimpute: Accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res. 32(3), e34 (2004)

    Article  Google Scholar 

  12. Troyanskaya, M., Cantor, G., Sherlock, P., Brown, T., Hastie, R., Tibshirani, D.: Missing Value Estimation Methods for DNA Microarrays. Bioinformatics 17, 520–525 (2001)

    Article  Google Scholar 

  13. Sehgal, M.S.B., Gondal, I., Dooley, L.: Collateral Missing Value Estimation: Robust missing value estimation for consequent microarray data processing. Lecture Notes in Artificial Intelligence (LNAI), pp. 274–283. Springer, Heidelberg (2005)

    MATH  Google Scholar 

  14. Chen, P.Y., Popovich, P.M.: Correlation: Parametric and Nonparametric Measures, 1st edn. SAGE Publications, Thousand Oaks (2002)

    Book  Google Scholar 

  15. Boulesteix, A.-L.: PLS Dimension Reduction for Classification with Microarray Data. In: Statistical Applications in Genetics and Molecular Biology, vol. 3 (2003)

    Google Scholar 

  16. Yeung, K.Y., Bumgarner, R.E., Raftery, A.E.: Bayesian Model Averaging: development of an improved multiclass, gene selection and classification tool for microarray data. Bioinformatics 21(10), 2394–2402 (2005)

    Article  Google Scholar 

  17. Zhou, X., Wang, X., Dougherty, E.R.: Gene Selection Using Logistic Regressions Based on AIC, BIC and MDL Criteria. New Mathematics and Natural Computation 1, 129–145 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  18. Sehgal, M.S.B., Gondal, I., Dooley, L.: Missing Values Imputation for DNA Microarray Data using Ranked Covariance Vectors. The International Journal of Hybrid Intelligent Systems (IJHIS) (2005) ISSN 1448-5869

    Google Scholar 

  19. Sidak, Z., Sen, P.K., Hajek, J.: Theory of Rank Tests (Probability and Mathematical Statistics). Academic Press, London (1999)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sehgal, M.S.B., Gondal, I., Dooley, L. (2006). Missing Value Imputation Framework for Microarray Significant Gene Selection and Class Prediction. In: Li, J., Yang, Q., Tan, AH. (eds) Data Mining for Biomedical Applications. BioDM 2006. Lecture Notes in Computer Science(), vol 3916. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11691730_14

Download citation

  • DOI: https://doi.org/10.1007/11691730_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33104-9

  • Online ISBN: 978-3-540-33105-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics