Abstract
In this paper, we propose a multidimensional mapping approach for heterogeneous textual data that exploits first the spatial dimension and then the thematic dimension. Building on the Spatial Textual Representation (STR) and the Geodict geographic database, our contribution integrates the thematic dimension of documents. To support our proposal, we evaluate the different aspects of the mapping process on two real corpora, one of which is highly heterogeneous.
Notes
- 1. Geodict is available at this address: http://dx.doi.org/10.18167/DVN1/MWQQOQ.
- 2. Group of morphemes or words that follow each other with a specific meaning.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fize, J., Roche, M., Teisseire, M. (2019). Mapping Heterogeneous Textual Data: A Multidimensional Approach Based on Spatiality and Theme. In: El Yacoubi, S., Bagnoli, F., Pacini, G. (eds) Internet Science. INSCI 2019. Lecture Notes in Computer Science, vol 11938. Springer, Cham. https://doi.org/10.1007/978-3-030-34770-3_25
DOI: https://doi.org/10.1007/978-3-030-34770-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34769-7
Online ISBN: 978-3-030-34770-3
eBook Packages: Computer Science, Computer Science (R0)