Data Mining and Knowledge Discovery

Volume 27, Issue 1, pp 146–165

Visualizing dimensionality reduction of systems biology data

  • Andreas Lehrmann
  • Michael Huber
  • Aydin C. Polatkan
  • Albert Pritzkau
  • Kay Nieselt
Article

Abstract

One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after such a method has been applied, the data are projected into the new coordinate system and visualized using scatter plots or profile plots. These methods provide good results if the data have properties that become visible in the new coordinate system but were hard to detect in the original one. Often, however, a single method does not suffice to capture all important signals, so several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. The framework includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data from the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression in cells with normal karyotype and in cells with trisomies of human chromosomes 13 and 21.
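
The projection step that the abstract describes can be illustrated with a minimal sketch of principal component analysis. The code below is not the SpRay implementation: the toy matrix, the function names, and the use of power iteration to find the first component are all assumptions made for illustration. The explained variance ratio is a generic diagnostic and is not claimed to be one of the measures the framework provides.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=500, seed=0):
    """Leading eigenvector of cov, found by power iteration (unit length)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, axis):
    """Coordinate of each sample along the given axis."""
    return [sum(x * a for x, a in zip(r, axis)) for r in centered]


# Hypothetical toy "expression matrix": rows are samples, columns are genes.
# The numbers are made up; the first two genes are strongly correlated.
data = [
    [2.0, 4.1, 0.5],
    [3.0, 6.0, 0.4],
    [4.0, 7.9, 0.6],
    [5.0, 10.1, 0.5],
    [6.0, 12.0, 0.4],
]

centered = center(data)
cov = covariance(centered)
pc1 = first_component(cov)          # direction of largest variance
scores = project(centered, pc1)     # samples in the new coordinate system

# Share of the total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
pc1_variance = sum(s * s for s in scores) / (len(scores) - 1)
explained = pc1_variance / total_variance

print("PC1 loadings:", [round(x, 3) for x in pc1])
print("PC1 scores:", [round(s, 3) for s in scores])
print("Explained variance ratio:", round(explained, 3))
```

In this toy case a single component captures almost all of the variance because two of the three variables move together. Real expression data rarely behave this way, which is the situation in which the abstract argues for combining several methods.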

Keywords

Dimension reduction, Principal component analysis, Independent component analysis, Local linear embedding, Systems biology

Mathematics Subject Classification (2000)

62H25 15A18 

Copyright information

© The Author(s) 2012

Authors and Affiliations

  • Andreas Lehrmann (1)
  • Michael Huber (2)
  • Aydin C. Polatkan (1)
  • Albert Pritzkau (3)
  • Kay Nieselt (1)

  1. Center for Bioinformatics Tübingen, University of Tübingen, Tübingen, Germany
  2. Wilhelm Schickard Institute for Computer Science, University of Tübingen, Tübingen, Germany
  3. Bild- und Signalverarbeitung, University of Leipzig, Leipzig, Germany
