Data Visualisation and Exploration with Prior Knowledge

  • Martin Schroeder
  • Dan Cornford
  • Ian T. Nabney
Part of the Communications in Computer and Information Science book series (CCIS, volume 43)


Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example by finding common patterns and relationships between samples as well as between variables. Typically, visualisation methods such as principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data, and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as on a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
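To make the role of the block structure concrete, the following is a minimal sketch and not the authors' Block GTM. It assumes a single Gaussian with a known mean and a known block-structured covariance, whereas the paper fits a non-linear mixture model with EM. All function names (`block_covariance`, `conditional_mean_impute`, `rmse`, `compare_imputation`) and all parameter values are hypothetical choices made for illustration. The sketch fills missing entries with their Gaussian conditional mean and compares the root mean square error against a baseline that treats every variable as uncorrelated.

```python
import numpy as np


def block_covariance(block_sizes, within_corr, variance=1.0):
    """Block-diagonal covariance: variables inside a block share the
    correlation `within_corr`; variables in different blocks are uncorrelated."""
    dim = sum(block_sizes)
    cov = np.zeros((dim, dim))
    start = 0
    for size in block_sizes:
        block = np.full((size, size), within_corr)
        np.fill_diagonal(block, 1.0)
        cov[start:start + size, start:start + size] = variance * block
        start += size
    return cov


def conditional_mean_impute(x, mean, cov):
    """Replace NaN entries of x by their Gaussian conditional mean
    given the observed entries of the same sample."""
    missing = np.isnan(x)
    filled = x.copy()
    if not missing.any():
        return filled
    if missing.all():
        # Nothing observed: fall back to the unconditional mean.
        filled[:] = mean
        return filled
    observed = ~missing
    cov_mo = cov[np.ix_(missing, observed)]
    cov_oo = cov[np.ix_(observed, observed)]
    residual = x[observed] - mean[observed]
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, residual)
    return filled


def rmse(estimate, truth):
    """Root mean square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


def compare_imputation(seed=0, n_samples=500, missing_rate=0.2):
    """Return (RMSE with block covariance, RMSE with uncorrelated baseline)
    on synthetic data; every setting here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    cov = block_covariance([3, 3], within_corr=0.9)
    mean = np.zeros(cov.shape[0])
    data = rng.multivariate_normal(mean, cov, size=n_samples)

    # Remove entries completely at random.
    mask = rng.random(data.shape) < missing_rate
    corrupted = np.where(mask, np.nan, data)

    baseline_cov = np.eye(cov.shape[0])
    block_fill = np.array(
        [conditional_mean_impute(row, mean, cov) for row in corrupted])
    base_fill = np.array(
        [conditional_mean_impute(row, mean, baseline_cov) for row in corrupted])
    return rmse(block_fill[mask], data[mask]), rmse(base_fill[mask], data[mask])


if __name__ == "__main__":
    block_error, baseline_error = compare_imputation()
    print(f"block covariance RMSE:       {block_error:.3f}")
    print(f"uncorrelated baseline RMSE:  {baseline_error:.3f}")
```

With an uncorrelated covariance the conditional mean collapses to the unconditional mean, so the observed variables carry no information about the missing ones. The block structure lets strongly correlated variables inform each other, which is the intuition behind the imputation improvement reported in the abstract; the figures printed by this toy example are not comparable to the paper's results.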


Keywords: Root Mean Square Error; Block Structure; Expectation Maximisation Algorithm; Data Visualisation; Nonlinear Dimensionality Reduction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Martin Schroeder (1)
  • Dan Cornford (1)
  • Ian T. Nabney (1)
  1. Aston University, NCRG, Aston Triangle, Birmingham, UK
