Entity Linking in Enterprise Search: Combining Textual and Structural Information

Bhatia, Sumit

doi:10.1007/978-3-030-01872-6_8

Part of the book series: Unsupervised and Semi-Supervised Learning (UNSESUL)

Abstract

Fast and correct identification of named entities in queries is crucial for understanding a query and for mapping it to information in a structured knowledge base. Most existing work relies on search logs and manually curated knowledge bases for entity linking, and often involves complex graph operations that make it slow. We describe a simple yet fast and accurate probabilistic entity linking algorithm for enterprise settings, where automatically constructed, domain-specific knowledge graphs are used. In addition to the linked graph structure, the algorithm uses textual evidence from the domain-specific corpus to improve performance.

Notes

1. For domain-specific applications where the knowledge graph is constructed using automated methods, the set of input documents constitutes the background corpus. For applications that use generic, open-domain knowledge bases such as DBpedia and Wikidata, Wikipedia could be used as the background text corpus.
2. http://opennlp.apache.org/.
3. https://nlp.stanford.edu/software/CRF-NER.html.
4. https://www.ibm.com/watson/services/natural-language-understanding/.
5. The text context component can be computed with an inverted index: using the context terms as the query, the most relevant mention documents (and thus the corresponding entities) can be retrieved in a single query. Likewise, the entity context component can be computed by simply counting the number of connections between the target entities, which can be done in a single optimized SQL query.
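
The two computations described in note 5 can be illustrated with a minimal sketch. It is not the chapter's implementation: the toy mention documents, the entity identifiers, the `edges(src, dst)` table layout, and all function names are assumptions made up for illustration, and a production system would use a real search index and database rather than an in-memory dictionary and SQLite.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy data: (doc id, entity the mention refers to, mention text).
MENTION_DOCS = [
    ("d1", "E_apple_inc", "apple releases new iphone and macbook"),
    ("d2", "E_apple_fruit", "apple orchard harvest fruit juice"),
    ("d3", "E_apple_inc", "apple ceo announces iphone sales"),
]

# Hypothetical knowledge-graph edges: one row per observed connection.
EDGES = [
    ("E_apple_inc", "E_iphone"),
    ("E_apple_inc", "E_iphone"),
    ("E_apple_fruit", "E_juice"),
]


def build_inverted_index(docs):
    """Map each term to the set of mention documents containing it."""
    index = defaultdict(set)
    for doc_id, _entity, text in docs:
        for term in text.split():
            index[term].add(doc_id)
    return index


def text_context_scores(index, docs, context_terms):
    """Score entities by how many context terms their mention docs match."""
    doc_entity = {doc_id: entity for doc_id, entity, _text in docs}
    doc_hits = defaultdict(int)
    for term in context_terms:
        for doc_id in index.get(term, ()):
            doc_hits[doc_id] += 1
    scores = defaultdict(int)
    for doc_id, hits in doc_hits.items():
        scores[doc_entity[doc_id]] += hits
    return dict(scores)


def make_edge_table(edges):
    """Load the edges into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    return conn


def entity_context_counts(conn, candidates, context_entity):
    """Count connections from each candidate to the context entity in one query."""
    placeholders = ",".join("?" for _ in candidates)
    sql = (
        "SELECT src, COUNT(*) FROM edges "
        f"WHERE dst = ? AND src IN ({placeholders}) GROUP BY src"
    )
    return dict(conn.execute(sql, [context_entity, *candidates]).fetchall())
```

Here the text score is a raw count of matched terms and the entity score is a raw count of edges; how these counts would be normalized into probabilities and combined is not specified in the note, so the sketch leaves that step out.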

Author information

Authors and Affiliations

IBM Research AI, New Delhi, India
Sumit Bhatia

Authors

Sumit Bhatia

Corresponding author

Correspondence to Sumit Bhatia.

Editor information

Editors and Affiliations

Queen’s University Belfast, Northern Ireland, UK
Deepak P
Queen’s University Belfast, Northern Ireland, UK
Anna Jurek-Loughrey

About this chapter

Cite this chapter

Bhatia, S. (2019). Entity Linking in Enterprise Search: Combining Textual and Structural Information. In: P, D., Jurek-Loughrey, A. (eds) Linking and Mining Heterogeneous and Multi-view Data. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-01872-6_8

DOI: https://doi.org/10.1007/978-3-030-01872-6_8
Published: 27 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01871-9
Online ISBN: 978-3-030-01872-6
eBook Packages: Engineering, Engineering (R0)
