Entity linking of tweets based on dominant entity candidates

  • Original Article
Social Network Analysis and Mining

Abstract

Entity linking of textual content, also known as semantic annotation, has received increasing attention. Recent work in this area has focused on linking entities in text with special characteristics, such as search queries and tweets. The semantic annotation of tweets has proven especially challenging, given their informal writing style and short length. In this paper, we propose an entity linking method for tweets that is built on one primary hypothesis. We hypothesize that, although an ambiguous mention in a tweet formally has many possible entity candidates (as listed on the corresponding Wikipedia disambiguation page), only a few of these candidates are likely to be used in the context of Twitter. Based on this hypothesis, we propose a method to identify such dominant entity candidates for each ambiguous mention and use them in the annotation process. Specifically, our proposed method integrates two phases: (i) dominant entity candidate detection, which applies community detection methods to find the dominant candidates of ambiguous mentions; and (ii) named entity disambiguation, which links a tweet to Wikipedia entities by considering only the identified dominant entity candidates. Our investigations show that (1) only very few entity candidates per ambiguous mention need to be considered during disambiguation, which limits the candidate search space and hence noticeably reduces entity linking time; and (2) limiting the search space to a subset of disambiguation options not only improves execution time but also improves entity linking accuracy when the main entity candidates of each mention are mined from a temporally aligned corpus.
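The abstract describes the two phases only at a high level. As a rough illustration of how restricting disambiguation to a few dominant candidates works, here is a minimal Python sketch built on simplifying assumptions. Phase (i) is not the paper's community detection: it is swapped for a plain frequency-share threshold (the `min_share` parameter is hypothetical), computed from hypothetical (mention, entity) pairs mined from an annotated, temporally aligned tweet corpus. Phase (ii) scores the surviving candidates by Jaccard overlap between the tweet's tokens and a hypothetical bag of context words per entity. All function names and data are illustrative.

```python
from collections import Counter, defaultdict


def dominant_candidates(annotations, min_share=0.1):
    """Phase (i) stand-in: keep, for each mention, the entities that account for
    at least `min_share` of that mention's annotations in the corpus.
    (A frequency threshold, NOT the community detection used in the paper.)"""
    counts = defaultdict(Counter)
    for mention, entity in annotations:
        counts[mention.lower()][entity] += 1
    dominant = {}
    for mention, counter in counts.items():
        total = sum(counter.values())
        # Map each surviving candidate to its share of the mention's annotations.
        dominant[mention] = {
            entity: n / total for entity, n in counter.items() if n / total >= min_share
        }
    return dominant


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_mention(mention, tweet, dominant, entity_context):
    """Phase (ii) sketch: score only the dominant candidates of `mention` by
    context overlap with the tweet; ties are broken by corpus share."""
    candidates = dominant.get(mention.lower(), {})
    if not candidates:
        return None  # no known candidates for this mention
    tokens = set(tweet.lower().split())
    return max(
        candidates,
        key=lambda e: (jaccard(tokens, entity_context.get(e, set())), candidates[e]),
    )


# Hypothetical toy data: 12 corpus annotations of the mention "apple".
annotations = (
    [("apple", "Apple_Inc.")] * 8 + [("apple", "Apple")] * 3 + [("apple", "Apple_Records")]
)
entity_context = {
    "Apple_Inc.": {"iphone", "mac", "launch"},
    "Apple": {"fruit", "pie", "tree"},
}

dominant = dominant_candidates(annotations)
print(sorted(dominant["apple"]))  # ['Apple', 'Apple_Inc.'] (Apple_Records: 1/12 < 0.1)
print(link_mention("apple", "Baking an apple pie with fruit from the tree", dominant, entity_context))  # Apple
print(link_mention("apple", "New iphone launch today", dominant, entity_context))  # Apple_Inc.
```

Because the candidate set is pruned before scoring, the disambiguation step only evaluates candidates that survive phase (i), which is the intuition behind the reduced per-tweet processing time reported in the abstract.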
We show that our proposed method achieves precision and recall competitive with state-of-the-art methods on widely used gold standard datasets, while significantly reducing the processing time per tweet.
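The abstract reports precision and recall on gold standard datasets but does not state the matching criteria used. As a hedged illustration only, one common strict, micro-averaged formulation treats each annotation as a (tweet id, mention, entity) triple and counts a prediction as correct only on an exact match; benchmarking frameworks may use other, more lenient match relations. The function name and toy data below are hypothetical.

```python
def micro_prf(predicted, gold):
    """Strict micro-averaged precision, recall and F1, where an annotation counts
    as correct only if its (tweet_id, mention, entity) triple appears in the gold set."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # exact triple matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy example: one of two predictions matches one of three gold annotations.
gold = {(1, "apple", "Apple_Inc."), (1, "tim cook", "Tim_Cook"), (2, "apple", "Apple")}
predicted = {(1, "apple", "Apple_Inc."), (2, "apple", "Apple_Inc.")}
p, r, f = micro_prf(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.333 F1=0.400
```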

Fig. 1
Fig. 2

Author information

Corresponding author

Correspondence to Ebrahim Bagheri.

About this article

Cite this article

Feng, Y., Zarrinkalam, F., Bagheri, E. et al. Entity linking of tweets based on dominant entity candidates. Soc. Netw. Anal. Min. 8, 46 (2018). https://doi.org/10.1007/s13278-018-0523-0

  • DOI: https://doi.org/10.1007/s13278-018-0523-0
