Abstract
We study the task of entity linking for Vietnamese tweets, which aims at detecting entity mentions and linking them to the corresponding entries in a given knowledge base. Unlike authored news or textual web content, tweets are noisy, irregular, and short, which makes entity linking in tweets much more challenging. We propose an approach to building an end-to-end entity linking system for Vietnamese tweets. The system consists of two stages: the first detects mentions and the second performs entity disambiguation. We create a dataset of 524 Vietnamese tweets with 1,061 mentions and evaluate the system on it, where it achieves an F1-score of 69.2%. To show that our system is language-independent, we also evaluate it on a public dataset of 562 English tweets. The experimental results show that our system achieves an F1-score of 54.5% and outperforms the state-of-the-art end-to-end entity linking methods for tweets. To the best of our knowledge, this is the first attempt to build an end-to-end entity linking system for Vietnamese tweets, and the system achieves very encouraging performance.
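The two-stage design and the F1-score reported above can be illustrated with a minimal sketch. It is not the authors' system: the toy dictionary `TOY_KB`, the function names, the first-candidate ranking rule, and the exact-match scoring criterion (a prediction counts only if both the span and the knowledge-base entry match the gold annotation) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# An annotation is (start_offset, end_offset, kb_entry_id).
Annotation = Tuple[int, int, str]

# Hypothetical toy knowledge base: surface form -> candidate entries.
TOY_KB: Dict[str, List[str]] = {
    "hanoi": ["Hanoi", "Hanoi_FC"],
    "apple": ["Apple_Inc.", "Apple_(fruit)"],
}


def detect_mentions(text: str) -> List[Tuple[int, int]]:
    """Stage 1 (assumed): find spans whose surface form is in the dictionary."""
    lowered = text.lower()
    spans = []
    for surface in TOY_KB:
        start = lowered.find(surface)
        while start != -1:
            spans.append((start, start + len(surface)))
            start = lowered.find(surface, start + 1)
    return sorted(spans)


def disambiguate(text: str, span: Tuple[int, int]) -> str:
    """Stage 2 (assumed): pick the first candidate; a real ranker goes here."""
    surface = text[span[0]:span[1]].lower()
    return TOY_KB[surface][0]


def link_entities(text: str) -> Set[Annotation]:
    """Run both stages and return the set of linked mentions."""
    return {(s, e, disambiguate(text, (s, e))) for s, e in detect_mentions(text)}


def f1_score(predicted: Set[Annotation], gold: Set[Annotation]) -> float:
    """Harmonic mean of precision and recall over exact (span, entry) matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tweet = "Apple opens a store in Hanoi"
    predicted = link_entities(tweet)
    # Hypothetical gold annotation in which "Hanoi" refers to the football club.
    gold = {(0, 5, "Apple_Inc."), (23, 28, "Hanoi_FC")}
    print(sorted(predicted))
    print(f1_score(predicted, gold))
```

Scoring over (span, entry) pairs means an error in either stage costs both precision and recall, which is one reason end-to-end scores tend to be lower than disambiguation-only scores.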
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Van, D.K., Huynh, H.M., Nguyen, H.T., Vo, V.T. (2015). Entity Linking for Vietnamese Tweets. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_48
DOI: https://doi.org/10.1007/978-3-319-11680-8_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)