Abstract
Entity linking, also known as semantic annotation, of textual content has received increasing attention. Recent works in this area have focused on entity linking on text with special characteristics such as search queries and tweets. The semantic annotation of tweets is specially proven to be challenging given the informal nature of the writing and the short length of the text. In this paper, we propose a method to perform entity linking on tweets built based on one primary hypothesis. We hypothesize that while there are formally many possible entity candidates for an ambiguous mention in a tweet, as listed on the disambiguation page of the corresponding entity on Wikipedia, there are only few entity candidates that are likely to be employed in the context of Twitter. Based on this hypothesis, we propose a method to identify such dominant entity candidates for each ambiguous mention and use them in the annotation process. Particularly, our proposed work integrates two phases (i) dominant entity candidate detection, which applies community detection methods for finding the dominant candidates of ambiguous mentions; and (ii) named entity disambiguation that links a tweet to entities in Wikipedia by only considering the identified dominant entity candidates. Our investigations show that: (1) there are only very few entity candidates for each ambiguous mention in a tweet that need to be considered when performing disambiguation. This helps us limit the candidate search space and hence noticeably reduce the entity linking time; (2) limiting the search space to only a subset of disambiguation options will not only improve entity linking execution time but will also lead to improved accuracy of the entity linking process when the main entity candidates of each mention are mined from a temporally aligned corpus. We show that our proposed method offers competitive results with the state-of-the-art methods in terms of precision and recall on widely used gold standard datasets while significantly reducing the time for processing each tweet.
Similar content being viewed by others
Notes
tagme.di.unipi.it.
References
Abel F, Gao Q, Houben G-J, Tao K (2011) Analyzing temporal dynamics in twitter profiles for personalized recommendations in the social web. In: Web Science 2011, WebSci ’11, Koblenz, Germany—June 15–17, 2011, pp. 2:1–2:8
Abel F, Gao Q, Houben G-J, Tao K (2011) Semantic enrichment of twitter posts for user profile construction on the social web. In: The semanic web: research and applications—8th extended semantic web conference, ESWC 2011, Heraklion, Crete, Greece, May 29–June 2, 2011, proceedings, Part II, pp. 375–389
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. Symp Discret Algorithms SODA 2007 2007:1027–1035
Bhatia S, Jain A (2016) Context sensitive entity linking of search queries in enterprise knowledge graphs. In: International semantic web conference, Springer, New York, pp. 50–54
Blondel VD, Guillaume J-L, Lambiotte R (2008) Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 10:P10008
Cano BAE, Rizzo G, Varga A, Rowe A, Stankovic M, Dadzie A-S (2014) Making sense of microposts (#microposts2014) named entity extraction & linking challenge. In: Proceedings of the the 4th workshop on making sense of microposts co-located with the 23rd international world wide web conference (WWW 2014), Seoul, Korea, April 7th, 2014, pp. 54–60
Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating twitter users. In: Proceedings of the 19th ACM conference on information and knowledge management, CIKM 2010, pp. 759–768
Cornolti M, Ferragina P, Ciaramita M (2013) A framework for benchmarking entity-annotation systems. In: 22nd international world wide web conference, WWW 2013, pp. 249–260
Cornolti M, Ferragina P, Ciaramita M, Rüd S, Schütze H (2016) A piggyback system for joint entity mention detection and linking in web queries. In: Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp. 567–578
Cucerzan S (2007) Large-scale named entity disambiguation based on Wikipedia data. In: Joint conference on empirical methods in natural language processing and computational natural language learning, pp. 708–716
Cuzzola J, Bagheri E (2014) Derive: finding semantic concepts with property-values from natural language text. In: International conference on computer science and software engineering, CASCON ’14, pp. 331–334
Daiber J, Jakob M, Hokamp C, Mendes PN (2013) Improving efficiency and accuracy in multilingual entity extraction. In: I-SEMANTICS 2013—9th international conference on semantic systems, pp. 121–124
Derczynski L, Maynard D, Rizzo G, van Erp M, Gorrell G, Troncy R, Petrak J, Bontcheva K (2015) Analysis of named entity recognition and linking for tweets. Inf Process Manag 51(2):32–49
Dumais ST (2004) Latent semantic analysis. Ann Rev Inf Sci Technol 38(1):188–230
Feng Y, Fani H, Bagheri E, Jovanovic J (2015) Lexical semantic relatedness for twitter analytics. In: International conference on tools with artificial intelligence 2015
Ferragina P, Scaiella U (2010) TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). In: 19th ACM conference on information and knowledge management, CIKM 2010, pp. 1625–1628
Ferrer M, Valveny E, Serratosa F, Bardají I, Bunke H (2009) Graph-based k-means clustering: a comparison of the set median versus the generalized median graph. In: Computer analysis of images and patterns, 13th international conference, CAIP 2009, Münster, Germany, September 2–4, 2009, proceedings, pp. 342–350
Gale WA, Church KW, Yarowsky D (1992) One sense per discourse. In: Proceedings of the workshop on speech and natural language, pp. 233–237
Ganea O-E, Ganea M, Lucchi A, Eickhoff C, Hofmann T (2016) Probabilistic bag-of-hyperlinks model for entity linking. In: Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp. 927–938
Gao N, Cucerzan S (2017) Entity linking to one thousand knowledge bases. In: European conference on information retrieval, Springer, New York, pp. 1–14
Gomaa WH, Fahmy AA (2013) A survey of text similarity approaches. Int J Comput Appl 68(13):13–18
Habib MB, van Keulen M (2012) Unsupervised improvement of named entity extraction in short informal context using disambiguation clues. In: Proceedings of the workshop on semantic web and information extraction (SWAIE 2012), Galway, Ireland, October 9, 2012, pp. 1–10
Habib MB, van Keulen M (2016) Twitterneed: a hybrid approach for named entity extraction and disambiguation for tweet. Nat Lang Eng 22(3):423–456
Han L, Kashyap A, Finin T, Mayfield J, Weese J (2013) Umbc ebiquity-core: semantic textual similarity systems. Proc Second Jt Conf Lex Comput Semant 1:44–52
Han X, Sun L, Zhao J (2011) Collective entity linking in web text: a graph-based method. In: Proceeding of the 34th international ACM SIGIR conference on research and development in information retrieval, pp. 765–774
Hoffart J, Yosef MA, Bordino I, Fürstenau H, Pinkal M, Spaniol M, Taneva B, Thater S, Weikum G (2011) Robust disambiguation of named entities in text. In: Proceedings of the 2011 conference on empirical methods in natural language processing, EMNLP 2011, 27–31 July 2011, John McIntyre Conference Centre, Edinburgh, UK, A meeting of SIGDAT, a Special Interest Group of the ACL, pp. 782–792
Huang J, Peng M, Wang H, Cao J, Gao W, Zhang X (2017) A probabilistic method for emerging topic tracking in microblog stream. World Wide Web 20(2):325–350
Huang H, Cao Y, Huang X, Ji H, Lin C-Y (2014) Collective tweet wikification based on semi-supervised graph regularization. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, pp. 380–390
Inches G, Carman MJ, Crestani F (2010) Statistics of online user-generated short documents. In: Advances in information retrieval, 32nd European conference on IR research, pp. 649–652
Jansen BJ, Zhang M, Sobel K, Chowdury A (2009) Twitter power: tweets as electronic word of mouth. JASIST 60(11):2169–2188
Jovanovic J, Bagheri E, Cuzzola J, Gasevic D, Jeremic Z, Bashash R (2014) Automated semantic tagging of textual content. IT Prof 16(6):38–46
Kapanipathi P, Jain P, Venkatramani C, Sheth AP (2014) User interests identification on twitter using a hierarchical knowledge base. In: The semantic web: trends and challenges—11th international conference, ESWC 2014, Anissaras, Crete, Greece, May 25–29, 2014. proceedings, pp. 99–113
Kapanipathi P, Orlandi F, Sheth AP, Passant A (2011) Personalized filtering of the twitter stream. In: Proceedings of the second workshop on semantic personalized information management: retrieval and recommendation 2011, Bonn, Germany, October 24, 2011, pp. 6–13
Kulkarni S, Singh A, Ramakrishnan G, Chakrabarti S (2009) Collective annotation of Wikipedia entities in web text. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 457–466
Lawler GF, Limic V (2010) Random walk: a modern introduction, vol 123. Cambridge University Press, Cambridge
Li Y, Tan S, Sun H, Han J, Roth D, Yan X (2016) Entity disambiguation with linkless knowledge bases. In: Proceedings of the 25th international conference on world wide web, pp. 1261–1270
Li Y, Tan S, Sun H, Han J, Roth D, Yan X (2016) Entity disambiguation with linkless knowledge bases. In: Proceedings of the 25th international conference on world wide web. International World Wide Web Conferences Steering Committee, pp. 1261–1270
Liu X, Li Y, Wu H, Zhou M, Wei F, Lu Y (2013) Entity linking for tweets. In: Proceedings of the 51st annual meeting of the association for computational linguistics, pp. 1304–1311
Mannor S, Menache I, Hoze A, Klein U (2004) Dynamic abstraction in reinforcement learning via clustering. In: Machine learning, proceedings of the twenty-first international conference (ICML 2004), Banff, Alberta, Canada, July 4–8, 2004
Massoudi K, Tsagkias M, de Rijke M, Weerkamp W (2011) Incorporating query expansion and quality indicators in searching microblog posts. In: Advances in information retrieval—33rd European conference on IR research, pp. 362–367
Meij E, Weerkamp W, de Rijke M (2012) Adding semantics to microblog posts. In: Proceedings of the fifth international conference on web search and web data mining, pp. 563–572
Mihalcea R, Csomai A (2007) Wikify!: linking documents to encyclopedic knowledge. In: ACM conference on information and knowledge management, pp. 233–242
Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the international conference on language resources and evaluation, LREC 2010, 17–23 May 2010, Valletta, Malta
Saleiro P, Eduarda MR, Soares C, Oliveira E (2017) Texrep: a text mining framework for online reputation monitoring. New Gener Comput 35(4):365–389
Santamaría C, Gonzalo J, Artiles J (2010) Wikipedia as sense inventory to improve diversity in web search results. In: Proceedings of the 48th annual meeting of the association for computational Linguistics. Association for Computational Linguistics, pp. 1357–1366
Sarmento L, Kehlenbeck A, Oliveira EC, Ungar LH (2009) An approach to web-scale named-entity disambiguation. In: Machine learning and data mining in pattern recognition, 6th international conference, MLDM 2009, Leipzig, Germany, July 23–25, 2009. Proceedings, pp. 689–703
Shen W, Wang J, Luo P, Wang M (2013) Linking named entities in tweets with knowledge base via user interest modeling. In: International conference on knowledge discovery and data mining, KDD 2013, pp. 68–76
Shirakawa M, Wang H, Song Y, Wang Z, Nakayama K, Hara T, Nishio S (2011) Entity disambiguation based on a probabilistic taxonomy. In: technical report MSR-TR-2011-125
Tran AT, Tran NK, Asmelash TH, Jäschke R (2015) Semantic annotation for microblog topics using Wikipedia temporal information. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp. 97–106
Turney PD (2008) The latent relation mapping engine: algorithm and experiments. J Artif Intell Res (JAIR) 33:615–655
Varga A, Basave AEC, Rowe M, Ciravegna F, He Y (2014) Linked knowledge sources for topic classification of microposts: a semantic graph-based approach. J Web Semant 26:36–57
Vitale D, Ferragina P, Scaiella U (2012) Classification of short texts by deploying topical annotations. In: Advances in information retrieval—34th european conference on IR research, ECIR 2012, Barcelona, Spain, April 1–5, 2012, proceedings, pp. 376–387
Yamada I, Takeda H, Takefuji Y (2015) An end-to-end entity linking approach for tweets. In: Proceedings of the the 5th workshop on making sense of microposts co-located with the 24th international world wide web conference, pp. 55–56
Yosef MA, Hoffart J, Bordino I, Spaniol M, Weikum G (2011) AIDA: an online tool for accurate disambiguation of named entities in text and tables. PVLDB 4(12):1450–1453
Zarrinkalam F, Fani H, Bagheri E, Kahani M, Du W (2015) Semantics-enabled user interest detection from twitter. In: IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, WI-IAT 2015, pp. 469–476
Zhao G, Wu J, Wang D, Li T (2016) Entity disambiguation to Wikipedia using collective ranking. Inf Process Manag 52(6):1247–1257
Zou X, Sun C, Sun Y, Liu B, Lin L (2014) Linking entities in tweets to Wikipedia knowledge base. In: Natural language processing and Chinese computing—third CCF Conference, pp. 368–378
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Feng, Y., Zarrinkalam, F., Bagheri, E. et al. Entity linking of tweets based on dominant entity candidates. Soc. Netw. Anal. Min. 8, 46 (2018). https://doi.org/10.1007/s13278-018-0523-0
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-018-0523-0