Term-Based Models for Entity Ranking

Balog, Krisztian

doi:10.1007/978-3-319-93935-3_3

Part of the book series: The Information Retrieval Series (INRE, volume 39)

Abstract

Ad hoc entity retrieval is the task of answering a free text query with a ranked list of entities. The main idea behind our approaches in this chapter can be summarized as follows: If textual representations can be constructed for entities, then the ranking of these representations (“entity descriptions”) becomes straightforward by building on traditional document retrieval techniques. Accordingly, the bulk of the work presented in this chapter revolves around assembling term-based entity representations from various sources, ranging from unstructured documents to structured knowledge bases. We also discuss evaluation methodology and standard test collections.
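
The abstract's central idea is that once each entity has a textual description, standard document retrieval models can rank those descriptions. The following is a minimal sketch of that idea, not the chapter's own implementation. It flattens each entity's fields into a single bag of terms and ranks the resulting descriptions by query likelihood with Dirichlet smoothing, a standard document retrieval model. The toy entities, the field names (`name`, `abstract`), the whitespace tokenizer, and the value of the smoothing parameter `mu` are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a deliberately naive tokenizer (lowercase, split on whitespace).
    return text.lower().split()


def build_description(fields: dict[str, str]) -> list[str]:
    # Single-field representation: concatenate the terms of all fields
    # (hypothetical field names) into one "entity description".
    terms: list[str] = []
    for value in fields.values():
        terms.extend(tokenize(value))
    return terms


def rank_entities(query: str, entities: dict[str, dict[str, str]],
                  mu: float = 2000.0) -> list[tuple[str, float]]:
    """Rank entities by Dirichlet-smoothed query log-likelihood."""
    descriptions = {eid: build_description(f) for eid, f in entities.items()}
    # Collection language model P(t|C), estimated over all entity descriptions.
    collection = Counter(t for terms in descriptions.values() for t in terms)
    collection_len = sum(collection.values())
    # Query terms unseen in the collection would contribute log(0); skip them.
    q_terms = [t for t in tokenize(query) if collection[t] > 0]
    scores = []
    for eid, terms in descriptions.items():
        tf = Counter(terms)
        score = 0.0
        for t in q_terms:
            p_collection = collection[t] / collection_len
            # log P(t|e) = log((c(t;e) + mu * P(t|C)) / (|e| + mu))
            score += math.log((tf[t] + mu * p_collection) / (len(terms) + mu))
        scores.append((eid, score))
    # Highest log-likelihood first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "knowledge base" with two fields per entity.
ENTITIES = {
    "University_of_Stavanger": {"name": "University of Stavanger",
                                "abstract": "A public university in Stavanger Norway"},
    "Stavanger": {"name": "Stavanger", "abstract": "A city in western Norway"},
    "Oslo": {"name": "Oslo", "abstract": "The capital of Norway"},
}

if __name__ == "__main__":
    # A small mu is assumed here because the toy collection is tiny.
    for entity_id, score in rank_entities("university stavanger", ENTITIES, mu=10.0):
        print(f"{entity_id}\t{score:.3f}")
```

On this toy data, the query "university stavanger" ranks `University_of_Stavanger` first, followed by `Stavanger` and then `Oslo`. A fielded variant that scores each field separately and combines the per-field scores is a natural refinement. It is left out here to keep the sketch short.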

Author information

Authors and Affiliations

University of Stavanger, Stavanger, Norway
Krisztian Balog

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Balog, K. (2018). Term-Based Models for Entity Ranking. In: Entity-Oriented Search. The Information Retrieval Series, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-319-93935-3_3

DOI: https://doi.org/10.1007/978-3-319-93935-3_3
Published: 03 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93933-9
Online ISBN: 978-3-319-93935-3
eBook Packages: Computer Science, Computer Science (R0)
