The Linear Factorial Smoothing for the Analysis of Incomplete Data

  • Basavanneppa Tallur
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3776)

Abstract

Huge amounts of data are generated in every field of science and technology, and the need for proper data analysis tools, and for their adaptation to ever-increasing data sizes, is increasingly crucial. Statistical exploratory data analysis techniques (such as principal component analysis, correspondence analysis, clustering and classification, among others) are very useful for discovering information, or knowledge, hidden in data, but they require the data set to be complete. In many situations the data are incomplete for various reasons. Erroneous and uncertain data may also be treated as missing, since their use may lead to incorrect results. Many research works have addressed this issue in specific applications. This paper presents a simple and efficient iterative method, based on linear factorial smoothing, for estimating the missing values in a data set. Although this work was prompted by a recurrent problem in bioinformatics, namely the analysis of gene expression data, the proposed method for missing value imputation may be useful in any area.
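
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common form of iterative factor-based imputation, not necessarily the method proposed in the paper. It assumes that "smoothing" means reconstructing the data from its leading principal factors (a truncated SVD of the centred matrix) and that only the missing cells are updated at each pass. The function name `iterative_svd_impute` and the parameters `rank`, `max_iter` and `tol` are hypothetical, chosen for illustration.

```python
import numpy as np

def iterative_svd_impute(X, rank=2, max_iter=200, tol=1e-8):
    """Fill NaN entries of X by repeated low-rank (truncated SVD) smoothing.

    Illustrative sketch only: assumes every column has at least one
    observed value, and that `rank` factors describe the data well.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each missing cell by its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        # Centre the current table and keep only the leading factors.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        smooth = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
        # Overwrite the missing cells only; observed values stay fixed.
        updated = np.where(missing, smooth, filled)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol:
            break
    return filled
```

A call such as `iterative_svd_impute(table, rank=1)` returns a complete copy of `table` in which the observed entries are unchanged. The choice of `rank` is left to the user here; how the number of factors should be chosen is not stated in the abstract.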

Keywords

Principal Component Analysis; Down Syndrome; Gene Expression Data; Correspondence Analysis; Independent Component Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Basavanneppa Tallur (1)
  1. IRISA, Université de Rennes 1, Rennes, France