Abstract
Soft textual cartography is an original approach aimed at studying communities on spatially embedded and textually defined complex weighted networks. It relies on the integration of topic modeling and soft clustering procedures. These two aspects can be combined through topic distances and weighted undirected networks representing the spatial configuration; their synergy is promising for topic interpretation and geographical information retrieval. This paper proposes a unified formalism, underlining the compatibility of the two aspects, as illustrated on the textual descriptions of the municipalities of the canton of Vaud, Switzerland. It also points to possible extensions and applications of the method, potentially useful for dealing with the ever-growing amount of georeferenced textual content.
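The abstract does not spell out how topic distances and spatial weights interact. As a rough illustration only, the sketch below implements a generic soft K-means on topic-proportion vectors, with an extra term that rewards agreement with neighbours in an undirected weighted network. It is not the authors' formalism. The squared Euclidean topic distance, the inverse temperature `beta`, the smoothing strength `alpha`, and the function name `soft_spatial_clustering` are all hypothetical choices made for this example.

```python
import math
import random


def soft_spatial_clustering(theta, weights, n_groups, beta=20.0, alpha=1.0,
                            n_iter=100, seed=0):
    """Hypothetical soft clustering of topic proportions on a spatial network.

    theta    : n topic-proportion vectors (e.g. one per municipality)
    weights  : n x n symmetric, non-negative spatial weights (undirected network)
    n_groups : number of soft clusters
    beta     : inverse temperature on the topic distance (assumed)
    alpha    : strength of the neighbourhood term (assumed)
    Returns an n x n_groups matrix of membership probabilities.
    """
    n, k = len(theta), len(theta[0])
    rng = random.Random(seed)
    # Random initial memberships, normalised so each row sums to 1.
    z = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_groups)]
        total = sum(row)
        z.append([v / total for v in row])
    # Weighted degree of each unit (guarded against isolated units).
    deg = [sum(row) or 1.0 for row in weights]

    for _ in range(n_iter):
        mass = [sum(z[i][g] for i in range(n)) for g in range(n_groups)]
        # Group centroids in topic space, weighted by memberships.
        cent = [[sum(z[i][g] * theta[i][t] for i in range(n)) / max(mass[g], 1e-12)
                 for t in range(k)] for g in range(n_groups)]
        new_z = []
        for i in range(n):
            scores = []
            for g in range(n_groups):
                # Squared Euclidean topic distance to the group centroid.
                dist = sum((theta[i][t] - cent[g][t]) ** 2 for t in range(k))
                # Weighted neighbourhood average of memberships in group g.
                local = sum(weights[i][j] * z[j][g] for j in range(n)) / deg[i]
                # Log group weight, minus scaled distance, plus spatial agreement.
                scores.append(math.log(max(mass[g] / n, 1e-12))
                              - beta * dist + alpha * local)
            # Softmax with max-subtraction for numerical stability.
            top = max(scores)
            ex = [math.exp(s - top) for s in scores]
            total = sum(ex)
            new_z.append([e / total for e in ex])
        z = new_z
    return z


if __name__ == "__main__":
    # Toy data: four units, two topics, two spatially connected pairs.
    theta = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]]
    W = [[0, 1, 0, 0], [1, 0, 0.1, 0], [0, 0.1, 0, 1], [0, 0, 1, 0]]
    for row in soft_spatial_clustering(theta, W, n_groups=2):
        print([round(v, 3) for v in row])
```

Setting `alpha` to 0 reduces the sketch to ordinary soft K-means in topic space; larger values pull spatially adjacent units toward the same group, which is the kind of interplay between textual and spatial structure the abstract describes.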
Notes
1. The value of q was selected after computing several topic models with q ranging from 9 to 36. With \(q=12\), we obtained a relatively small number of interesting topics suited to illustrating our method.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Egloff, M., Ceré, R. (2018). Soft Textual Cartography Based on Topic Modeling and Clustering of Irregular, Multivariate Marked Networks. In: Cherifi, C., Cherifi, H., Karsai, M., Musolesi, M. (eds) Complex Networks & Their Applications VI. COMPLEX NETWORKS 2017. Studies in Computational Intelligence, vol 689. Springer, Cham. https://doi.org/10.1007/978-3-319-72150-7_59
DOI: https://doi.org/10.1007/978-3-319-72150-7_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72149-1
Online ISBN: 978-3-319-72150-7
eBook Packages: Engineering (R0)