GSP (Geo-Semantic-Parsing): Geoparsing and Geotagging with Machine Learning on Top of Linked Data

  • Marco Avvenuti
  • Stefano Cresci
  • Leonardo Nizzoli
  • Maurizio Tesconi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10843)

Abstract

Recently, user-generated content in social media has opened up alluring new possibilities for understanding the geospatial aspects of many real-world phenomena. Yet the vast majority of such content lacks explicit, structured geographic information. Here, we describe the design and implementation of GSP (Geo-Semantic-Parsing), a novel approach for associating geographic information with text documents. GSP exploits powerful machine learning algorithms on top of the rich, interconnected Linked Data in order to overcome limitations of previous state-of-the-art approaches. In detail, our technique performs semantic annotation to identify relevant tokens in the input document, traverses a sub-graph of Linked Data to extract possible geographic information related to the identified tokens, and optimizes its results by means of a Support Vector Machine classifier. We compare our results with those of 4 state-of-the-art techniques and baselines on ground-truth data from 2 evaluation datasets. Our GSP technique achieves excellent performance, with a best \(F1 = 0.91\), considerably outperforming the benchmark techniques, which achieve \(F1 \le 0.78\).
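
The three stages named in the abstract (semantic annotation, Linked Data traversal, result filtering) can be illustrated with a minimal, self-contained sketch. This is not the GSP implementation: the toy graph, the lexicon, the coordinates, and every function name below are hypothetical, and a simple confidence threshold stands in for the Support Vector Machine classifier that the paper actually uses.

```python
from collections import deque

# Hypothetical toy data (not from the paper).
# Linked Data stand-in: resource -> list of (predicate, target) edges.
TOY_GRAPH = {
    "ex:Pisa_IT": [("owl:sameAs", "geo:Pisa")],
    "geo:Pisa": [],
    "ex:Florence_IT": [("owl:sameAs", "geo:Florence")],
    "geo:Florence": [],
}
# Approximate, illustrative coordinates attached to some resources.
TOY_COORDS = {"geo:Pisa": (43.72, 10.40), "geo:Florence": (43.77, 11.25)}
# Annotator stand-in: surface form -> (resource, annotation confidence).
TOY_LEXICON = {"pisa": ("ex:Pisa_IT", 0.9), "florence": ("ex:Florence_IT", 0.3)}


def annotate(text):
    """Stage 1 (stand-in): link tokens to resources by dictionary lookup."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [(t, *TOY_LEXICON[t]) for t in tokens if t in TOY_LEXICON]


def find_coordinates(resource, max_hops=2):
    """Stage 2: breadth-first traversal of the sub-graph around a resource,
    returning the first coordinates found within max_hops edges."""
    queue = deque([(resource, 0)])
    seen = {resource}
    while queue:
        node, depth = queue.popleft()
        if node in TOY_COORDS:
            return TOY_COORDS[node]
        if depth < max_hops:
            for _predicate, target in TOY_GRAPH.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return None


def geoparse(text, min_confidence=0.5):
    """Stage 3 (placeholder): keep candidates above a confidence threshold.
    GSP uses an SVM classifier here; the threshold is only an assumption."""
    results = []
    for token, resource, confidence in annotate(text):
        coords = find_coordinates(resource)
        if coords is not None and confidence >= min_confidence:
            results.append((token, coords))
    return results
```

Calling `geoparse("Flood near Pisa, not Florence!")` on this toy data keeps the "pisa" candidate and discards "florence", whose assumed annotation confidence falls below the threshold. In a real system the final stage would be a trained classifier over richer features rather than a single cut-off.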

Keywords

Geoparsing, Machine learning, Linked data, Twitter

Notes

Acknowledgements

This research is supported in part by the EU H2020 Program under the scheme INFRAIA-1-2014-2015: Research Infrastructures, grant agreement #654024, SoBigData: Social Mining & Big Data Ecosystem.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Information Engineering, University of Pisa, Pisa, Italy
  2. Institute for Informatics and Telematics, IIT-CNR, Pisa, Italy