
Using @Twitter Conventions to Improve #LOD-Based Named Entity Disambiguation

  • Genevieve Gorrell
  • Johann Petrak
  • Kalina Bontcheva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9088)

Abstract

State-of-the-art named entity disambiguation approaches tend to perform poorly on social media content, and on microblogs in particular. Tweets are processed individually, and the richer, microblog-specific context is largely ignored. This paper quantifies the impact on entity disambiguation performance of including readily available contextual information from URL content, hashtag definitions, and Twitter user profiles. In particular, including URL content significantly improves performance. Similarly, user profile information for @mentions improves recall by over 10% with no adverse impact on precision. We also share a new corpus of tweets, hand-annotated with DBpedia URIs with high inter-annotator agreement.
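
The kind of context expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_context`, the three lookup dictionaries, and the sample data are all hypothetical stand-ins for content that a real system would fetch (linked page text, hashtag definitions, and profile descriptions). The sketch only shows how the text available to a disambiguator might be enlarged, assuming those resources have already been retrieved.

```python
import re

# Simple patterns for the three Twitter conventions named in the abstract.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def expand_context(tweet, url_content, hashtag_definitions, user_profiles):
    """Return the tweet text followed by any extra context found for it.

    All three lookup arguments are hypothetical dictionaries:
    url_content maps a URL to the text of the linked page,
    hashtag_definitions maps a lowercase hashtag to a definition,
    user_profiles maps a lowercase screen name to a profile description.
    """
    parts = [tweet]

    urls = URL_RE.findall(tweet)
    for url in urls:
        parts.append(url_content.get(url, ""))

    # Remove URLs first so fragments such as "#section" are not read as hashtags.
    text = URL_RE.sub(" ", tweet)

    for tag in HASHTAG_RE.findall(text):
        parts.append(hashtag_definitions.get(tag.lower(), ""))

    for name in MENTION_RE.findall(text):
        parts.append(user_profiles.get(name.lower(), ""))

    # Drop empty lookups and join everything into one context string.
    return " ".join(p for p in parts if p)


# Made-up example data, for illustration only.
tweet = "Great talk by @example_lab on #LOD http://example.org/talk"
context = expand_context(
    tweet,
    url_content={"http://example.org/talk": "Slides on entity linking with DBpedia"},
    hashtag_definitions={"lod": "Linked Open Data"},
    user_profiles={"example_lab": "A hypothetical NLP research group"},
)
print(context)
```

The expanded string could then be passed to a disambiguation component in place of the bare tweet, so that candidate entities are scored against more text than the tweet alone provides.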

Keywords

Similarity Score, Link Open Data, Context Window, Datatype Property, Twitter Profile
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The authors wish to thank all volunteers from the NLP research group in Sheffield, who annotated the tweet corpus. This work was partially supported by the European Union under grant agreements No. 287863 TrendMiner and No. 610829 DecarboNet, as well as UK EPSRC grant No. EP/I004327/1.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Genevieve Gorrell (1)
  • Johann Petrak (1)
  • Kalina Bontcheva (1)

  1. Department of Computer Science, University of Sheffield, Sheffield, UK
