
Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing

  • Conference paper
AI 2005: Advances in Artificial Intelligence (AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Microarrays can probe thousands of genes at a time, which makes them a useful tool for a variety of applications ranging from diagnosis to drug discovery. However, microarray data often contain multiple missing gene expression values, and these affect subsequent analysis because the missing values are most often simply ignored. In this paper we analyze how accurate estimation of missing values can lead to better subsequent gene selection and class prediction. Collateral Missing Value Estimation (CMVE), which exploits both local/global and positive/negative correlation values, demonstrates superior imputation performance compared to Bayesian Principal Component Analysis (BPCA) Impute and the K-Nearest Neighbour (KNN) algorithm when estimating missing values in the BRCA1, BRCA2 and Sporadic genetic mutation samples present in ovarian cancer. CMVE also consistently outperforms the BPCA, KNN and ZeroImpute techniques in terms of classification accuracy. Imputation is followed by gene selection using a fusion of Between-Group to Within-Group Sum of Squares and Weighted Partial Least Squares, with the Ridge Partial Least Squares algorithm used as the class predictor.
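
To make the comparison baselines concrete, the following is a minimal sketch of ZeroImpute, a simple KNN imputation, and the between-group to within-group sum-of-squares ratio used to rank genes. It is not CMVE and not the fused gene selection described above. The function names, the root-mean-square distance over shared columns, and the zero fallback when no neighbour qualifies are all assumptions made for illustration.

```python
import numpy as np


def zero_impute(X):
    """ZeroImpute baseline: replace every missing entry (NaN) with 0."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), 0.0, X)


def knn_impute(X, k=3):
    """Simple KNN imputation over a genes x samples matrix.

    Each missing entry of a gene is filled with the mean of the same
    column over the k most similar genes. Similarity is the RMS
    difference over columns observed in both genes (an assumption).
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    rows_with_gaps = np.where(np.isnan(X).any(axis=1))[0]
    for i in rows_with_gaps:
        miss = np.isnan(X[i])
        candidates = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = ~miss & ~np.isnan(X[j])
            # A neighbour must overlap with gene i and must itself be
            # observed in every column that gene i is missing.
            if shared.sum() == 0 or np.isnan(X[j, miss]).any():
                continue
            dist = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            candidates.append((dist, j))
        candidates.sort()
        neighbours = [j for _, j in candidates[:k]]
        if neighbours:
            filled[i, miss] = X[neighbours][:, miss].mean(axis=0)
        else:
            filled[i, miss] = 0.0  # hypothetical fallback: behave like ZeroImpute
    return filled


def bss_wss(X, y):
    """Between-group to within-group sum-of-squares ratio per gene.

    X is genes x samples (fully imputed), y holds one class label per
    sample. Genes with zero within-group variance yield inf or nan.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall = X.mean(axis=1)
    bss = np.zeros(X.shape[0])
    wss = np.zeros(X.shape[0])
    for label in np.unique(y):
        Xc = X[:, y == label]
        class_mean = Xc.mean(axis=1)
        bss += Xc.shape[1] * (class_mean - overall) ** 2
        wss += ((Xc - class_mean[:, None]) ** 2).sum(axis=1)
    return bss / wss
```

In a pipeline of the kind the abstract describes, an imputation function would be applied first, and the resulting complete matrix would then be passed to the gene-ranking step before classification.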



© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sehgal, M.S.B., Gondal, I., Dooley, L. (2005). Collateral Missing Value Estimation: Robust Missing Value Estimation for Consequent Microarray Data Processing. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg.

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30462-3

  • Online ISBN: 978-3-540-31652-7

  • eBook Packages: Computer Science, Computer Science (R0)
