Text Mining, pp 63-85

A Topology-Based Approach to Visualize the Thematic Composition of Document Collections

  • Patrick Oesterling
  • Christian Heine
  • Gunther H. Weber
  • Gerik Scheuermann
Part of the Theory and Applications of Natural Language Processing book series (NLP)


The thematic composition of document collections is commonly conceptualized as clusters in high-dimensional point clouds. However, illustrating these clusters is challenging: typical visualizations such as colored projections or parallel coordinate plots suffer from feature occlusion and from noise that covers the whole visualization. We propose a method that avoids structural occlusion by using topology-based visualizations, which preserve the primary clustering features and neglect geometric properties that cannot be preserved in low-dimensional representations. Abstracting the input points as nested dense regions with individual properties, we provide the user with intuitive landscape visualizations that illustrate the high-dimensional clustering structure without occlusion.
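The idea of abstracting a point cloud as nested dense regions can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dense_region_tree`, the ball-count density estimate, and the epsilon-neighborhood graph are all assumptions chosen for brevity (the keywords suggest the chapter works with other neighborhood graphs). The sketch sweeps the points from dense to sparse and records where dense regions appear and where they merge, using a union-find structure.

```python
from math import dist


def dense_region_tree(points, radius):
    """Sweep points from dense to sparse and track connected dense regions.

    Assumptions (not from the chapter): density is the number of points in a
    ball of the given radius, and two points are neighbors if they lie within
    that radius of each other.
    Returns (peaks, merges): `peaks` lists the indices where a new dense region
    appears; `merges` lists (absorbed_peak, surviving_peak, density_level).
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= radius]
        for i in range(n)
    ]
    density = [len(nb) + 1 for nb in neighbors]  # +1 counts the point itself

    parent = {}  # union-find forest over the points visited so far

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    peaks, merges = [], []
    # Ties in density are broken by index so the result is deterministic.
    for i in sorted(range(n), key=lambda p: (-density[p], p)):
        # Roots of the regions that already touch point i. Each root is the
        # densest point of its region, because it was visited first.
        roots = sorted(
            {find(j) for j in neighbors[i] if j in parent},
            key=lambda r: (-density[r], r),
        )
        if not roots:
            parent[i] = i  # no denser neighbor: a new dense region is born
            peaks.append(i)
            continue
        survivor = roots[0]
        parent[i] = survivor
        for r in roots[1:]:
            # Point i connects two regions; the one with the lower peak is
            # absorbed, so it becomes nested in the surviving region.
            parent[r] = survivor
            merges.append((r, survivor, density[i]))
    return peaks, merges


# Two bumps on a line, linked by two sparse points (indices 5 and 6).
xs = (0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0, 1.1, 1.2, 1.3, 1.4)
peaks, merges = dense_region_tree([(x, 0.0) for x in xs], radius=0.25)
# peaks  -> [2, 9]        (one dense region per bump)
# merges -> [(9, 2, 3)]   (the regions join at density level 3)
```

The merge list describes the nesting: two separate regions at high density become one region at density level 3. A landscape-style visualization would then draw one hill per peak, with the merge level determining where the hills join. The quadratic neighbor search is only suitable for small inputs.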


Point Cloud; Linear Discriminant Analysis; Dense Region; Delaunay Triangulation; Neighborhood Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Patrick Oesterling (1)
  • Christian Heine (2)
  • Gunther H. Weber (3)
  • Gerik Scheuermann (1)

  1. Image and Signal Processing Group, Institute of Computer Science, Leipzig University, Leipzig, Germany
  2. Scientific Visualization Group, Department of Computer Science, ETH Zürich, Zürich, Switzerland
  3. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, USA
