Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Abstract

Place name disambiguation is the task of correctly identifying a place from a set of places sharing a common name. It contributes to tasks such as knowledge extraction, query answering, geographic information retrieval, and automatic tagging. Disambiguation quality relies on the ability to correctly identify and interpret contextual clues, complicating the task for short texts. Here we propose a novel approach to the disambiguation of place names from short texts that integrates two models: entity co-occurrence and topic modeling. The first model uses Linked Data to identify related entities to improve disambiguation quality. The second model uses topic modeling to differentiate places based on the terms used to describe them. We evaluate our approach using a corpus of short texts, determine the suitable weight between models, and demonstrate that a combined model outperforms benchmark systems such as DBpedia Spotlight and Open Calais in terms of F1-score and Mean Reciprocal Rank.
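
The abstract does not state how the two models are combined or how the weight is chosen, so the following is only a minimal sketch under stated assumptions: each model is assumed to return a score per candidate place, and the two scores are assumed to be merged by linear interpolation with a single weight. The function names, the candidate identifiers, and all score values are hypothetical and are not taken from the paper. Mean Reciprocal Rank is computed by its standard definition.

```python
from typing import Dict, List


def combine_scores(cooccurrence: Dict[str, float],
                   topic: Dict[str, float],
                   weight: float) -> List[str]:
    """Rank candidates by a weighted sum of two per-candidate scores.

    ASSUMPTION: linear interpolation; the abstract does not specify
    the actual combination rule used by the authors.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    candidates = set(cooccurrence) | set(topic)
    combined = {
        c: weight * cooccurrence.get(c, 0.0)
        + (1.0 - weight) * topic.get(c, 0.0)
        for c in candidates
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(combined, key=lambda c: (-combined[c], c))


def mean_reciprocal_rank(rankings: List[List[str]], gold: List[str]) -> float:
    """Average of 1/rank of the correct place; 0 if it is not ranked."""
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(gold) if gold else 0.0


# Hypothetical scores for the ambiguous name "Washington".
cooc = {"Washington,_D.C.": 0.6, "Washington_(state)": 0.3,
        "Washington,_Pennsylvania": 0.1}
topic = {"Washington,_D.C.": 0.2, "Washington_(state)": 0.7,
         "Washington,_Pennsylvania": 0.1}

ranking_even = combine_scores(cooc, topic, weight=0.5)
ranking_cooc_heavy = combine_scores(cooc, topic, weight=0.8)
mrr = mean_reciprocal_rank([ranking_even, ranking_cooc_heavy],
                           ["Washington,_D.C.", "Washington,_D.C."])
```

In this toy setup the weight changes which candidate is ranked first, which is why a suitable weight has to be determined empirically on an evaluation corpus.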

Notes

  1. https://en.wikipedia.org/wiki/Washington.

  2. http://dbpedia.org/resource/.

  3. http://dbpedia.org/ontology/.

  4. For example via dbr:FreedomWorks dbp:headquarters dbr:Washington,_D.C.

  5. https://en.wikipedia.org/wiki/List_of_the_most_common_U.S._place_names.

  6. https://datamarket.azure.com/dataset/bing/search.

  7. https://github.com/dbpedia-spotlight/dbpedia-spotlight.

  8. https://www.textrazor.com/.

  9. http://www.opencalais.com/.

Acknowledgement

The authors would like to acknowledge partial support by the National Science Foundation (NSF) under award 1440202 EarthCube Building Blocks: Collaborative Proposal: GeoLink Leveraging Semantics and Linked Data for Data Sharing and Discovery in the Geosciences.

Author information

Corresponding author

Correspondence to Yiting Ju.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ju, Y., Adams, B., Janowicz, K., Hu, Y., Yan, B., McKenzie, G. (2016). Things and Strings: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence with Topic Modeling. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_23

  • DOI: https://doi.org/10.1007/978-3-319-49004-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49003-8

  • Online ISBN: 978-3-319-49004-5

  • eBook Packages: Computer Science, Computer Science (R0)
