Soft Textual Cartography Based on Topic Modeling and Clustering of Irregular, Multivariate Marked Networks

  • Mattia Egloff
  • Raphaël Ceré
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 689)

Abstract

Soft textual cartography is an original approach for studying communities on spatially embedded, textually defined, complex weighted networks. The approach relies on integrating topic modeling and soft clustering procedures. These two aspects can be combined using topic distances and weighted unoriented networks representing the spatial configuration; their synergy is promising for topic interpretation and geographical information retrieval. This paper proposes a unified formalism, underlining the compatibility of the two aspects, as illustrated on the textual descriptions of the municipalities of the canton of Vaud, Switzerland. It also points to possible extensions and applications of the method, potentially useful for dealing with the ever-growing amount of georeferenced textual content.

Keywords

Textual cartography, Community detection, Complex network, Topic modeling, Soft clustering, Modularity

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Language and Information Sciences, University of Lausanne, Lausanne, Switzerland
  2. Department of Geography and Sustainability, University of Lausanne, Lausanne, Switzerland
