An Accurate MDS-Based Algorithm for the Visualization of Large Multidimensional Datasets

  • Antoine Naud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4029)

Abstract

A common task in data mining is the visualization of multivariate objects on scatterplots, allowing human observers to perceive subtle inter-relations in the dataset such as outliers, groupings or other regularities. Least-squares multidimensional scaling (MDS) is a well-known family of Exploratory Data Analysis techniques that produce dissimilarity- or distance-preserving layouts in a nonlinear way. Within this framework, the paper addresses the visualization of large multidimensional datasets through MDS-based methods. An original scheme providing very accurate layouts of large datasets is introduced. It offers a compromise between computational complexity, O(N^{5/2}), and the accuracy of the solution, which makes it suitable both for the visualization of fairly large datasets and for preprocessing in pattern recognition tasks.
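
For readers unfamiliar with least-squares MDS, the following is a minimal sketch of the generic technique the abstract refers to, not the scheme introduced in the paper. It assumes the raw stress criterion (the sum over pairs of squared differences between layout distances and target dissimilarities) and minimizes it by plain gradient descent. The function names, step size, and iteration count are illustrative assumptions.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def raw_stress(layout, target):
    """Sum over pairs i < j of (layout distance - target dissimilarity)^2."""
    n = len(layout)
    return sum(
        (math.dist(layout[i], layout[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def mds_gradient_descent(target, dim=2, steps=500, rate=0.05, seed=0):
    """Minimize raw stress by gradient descent (illustrative, small N only)."""
    rng = random.Random(seed)
    n = len(target)
    # Random starting layout; steps=0 returns it unchanged.
    layout = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        # Full sweep over all ordered pairs: O(N^2) work per iteration.
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(layout[i], layout[j])
                if d < 1e-12:
                    continue  # skip coincident points to avoid division by zero
                coeff = 2.0 * (d - target[i][j]) / d
                for k in range(dim):
                    grads[i][k] += coeff * (layout[i][k] - layout[j][k])
        # Update all points at once, after the gradients are complete.
        for i in range(n):
            for k in range(dim):
                layout[i][k] -= rate * grads[i][k]
    return layout


# Usage: lay out the four corners of a unit square (given in 3-D) on a plane.
corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
target = pairwise_distances(corners)
layout = mds_gradient_descent(target)
print(raw_stress(layout, target))
```

Each iteration visits every pair of objects, so its cost grows quadratically with the number of objects N. That quadratic sweep is what makes a plain implementation like this one expensive on large datasets. The fixed step size used here is only suited to a handful of points.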

Keywords

Cluster Center; Multidimensional Scaling; Association Scheme; Basis Size; Pattern Recognition Task
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Antoine Naud (1)
  1. Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
