Mapping Heterogeneous Textual Data: A Multidimensional Approach Based on Spatiality and Theme

  • Jacques Fize
  • Mathieu Roche
  • Maguelonne Teisseire
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11938)


In this paper, we propose a multidimensional mapping approach for heterogeneous textual data that exploits first the spatial dimension and then the thematic dimension. Building on the Spatial Textual Representation (STR) and the Geodict geographic database, the contribution presented here integrates the thematic dimension of documents. To support our proposal for mapping textual documents, we evaluate the different aspects of the process on two real corpora, one of which is highly heterogeneous.


Text mining, Spatial and thematic dimensions, Heterogeneous data



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jacques Fize (1, 2), corresponding author
  • Mathieu Roche (1, 2)
  • Maguelonne Teisseire (2)
  1. CIRAD, UMR TETIS, Montpellier, France
  2. TETIS, Univ Montpellier, AgroParisTech, CIRAD, CNRS, IRSTEA, Montpellier, France
