Abstract
We present ongoing research on applying computational topology to the analysis of the structure of similarities within a collection of text documents. Our work lies at the boundary between text mining and computational topology, and we describe techniques from both disciplines. We transform text documents into the vector space model, a representation that is widely used in text mining and is also suitable for topological computations. From the point cloud representing the input data we build the flag complex and compute its homology, using discrete Morse theory, as well as its persistent homology. Because the space is high-dimensional, many difficulties arise. We describe how we tackle these problems and point out the challenges that remain to be solved.
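The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes raw term-frequency vectors, cosine distance, and a fixed distance threshold, and the function names (`tf_vectors`, `cosine_distance`, `flag_complex`, `betti_0`) are hypothetical. It builds the flag complex up to dimension 2 by brute force and counts connected components only (the 0-th Betti number); it uses neither discrete Morse theory nor persistent homology.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vectors(docs):
    """Map each document to a sparse term-frequency vector (a Counter)."""
    return [Counter(doc.lower().split()) for doc in docs]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are non-empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def flag_complex(vectors, threshold, max_dim=2):
    """Build the flag complex of the threshold graph, up to max_dim.

    A vertex set spans a simplex exactly when all of its pairs are
    within `threshold` of each other. Brute force, for tiny inputs only.
    """
    n = len(vectors)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if cosine_distance(vectors[i], vectors[j]) <= threshold
    }
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k = number of vertices in a simplex
        for s in combinations(range(n), k):
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices


def betti_0(n_vertices, simplices):
    """Count connected components using union-find over the edges."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplices:
        if len(s) == 2:
            parent[find(s[0])] = find(s[1])
    return len({find(v) for v in range(n_vertices)})


if __name__ == "__main__":
    # Toy corpus (made up for illustration): two topics.
    docs = [
        "cat dog",
        "dog cat pet",
        "pet cat",
        "stock market",
        "market stock price",
    ]
    vectors = tf_vectors(docs)
    simplices = flag_complex(vectors, threshold=0.6)
    print(len(simplices), "simplices,", betti_0(len(docs), simplices), "components")
```

On this toy corpus the three pet-related documents span a triangle and the two finance documents span an edge, so the complex has two components. The brute-force enumeration grows combinatorially with the number of documents, which is one face of the difficulties the abstract attributes to working with such data.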
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wagner, H., Dłotko, P., Mrozek, M. (2012). Computational Topology in Text Mining. In: Ferri, M., Frosini, P., Landi, C., Cerri, A., Di Fabio, B. (eds) Computational Topology in Image Context. Lecture Notes in Computer Science, vol 7309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30238-1_8
DOI: https://doi.org/10.1007/978-3-642-30238-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30237-4
Online ISBN: 978-3-642-30238-1
eBook Packages: Computer Science (R0)