Estimation of Missing Values in SNP Array

  • Przemyslaw Podsiadly
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8482)


DNA microarray usage in genetics is rapidly proliferating, generating huge amounts of data. It is estimated that around 5-20% of measurements do not succeed, leading to missing values in the data destined for further analysis. Missing values reduce the reliability of subsequent microarray analysis, so there is a need for effective and efficient methods of missing value estimation.

This paper presents a method for estimating missing values in SNP microarrays using k-Nearest Neighbors (kNN) among similar individuals. The use of preliminary imputation is proposed and discussed. It is shown that introducing multiple passes of kNN improves the quality of missing value estimation.
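
The idea of preliminary imputation followed by repeated kNN passes can be sketched as below. This is only an illustrative sketch, not the implementation from the paper: the abstract does not specify the genotype encoding, the distance measure, or the voting rule, so the 0/1/2 encoding with `None` for a missing call, the mismatch-count distance, the per-SNP mode used for the preliminary fill, and the names `knn_impute` and `mode` are all assumptions.

```python
from collections import Counter


def mode(values):
    """Most common value; ties are broken by the smallest value."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def knn_impute(genotypes, k=3, passes=2):
    """Fill missing genotype calls (None) in a list of per-individual rows.

    Assumed encoding: 0, 1, 2 for the three genotypes, None for missing.
    """
    n, m = len(genotypes), len(genotypes[0])
    missing = [(i, j) for i in range(n) for j in range(m)
               if genotypes[i][j] is None]

    # Preliminary imputation (assumed rule): per-SNP mode of observed calls,
    # so that every individual can be compared on every SNP.
    filled = [row[:] for row in genotypes]
    for j in range(m):
        observed = [row[j] for row in genotypes if row[j] is not None]
        fallback = mode(observed) if observed else 0
        for i in range(n):
            if filled[i][j] is None:
                filled[i][j] = fallback

    # Each pass re-estimates the originally missing cells using the
    # estimates produced by the previous pass.
    for _ in range(passes):
        updated = [row[:] for row in filled]
        for i, j in missing:
            # Assumed distance: number of mismatching SNPs, ignoring SNP j.
            dists = sorted(
                (sum(a != b
                     for c, (a, b) in enumerate(zip(filled[i], filled[o]))
                     if c != j), o)
                for o in range(n) if o != i
            )
            neighbours = [o for _, o in dists[:k]]
            updated[i][j] = mode(filled[o][j] for o in neighbours)
        filled = updated
    return filled
```

Calling `knn_impute(rows, k=2, passes=2)` returns a new matrix and leaves `rows` unchanged. Only cells that were missing in the input are re-estimated, so observed genotypes are never overwritten.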


Keywords: microarray, bioinformatics, SNP array, missing values





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Przemyslaw Podsiadly
    • 1
  1. Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
