Abstract
Ad hoc entity retrieval is the task of answering a free-text query with a ranked list of entities. The main idea behind the approaches in this chapter is the following: if textual representations (“entity descriptions”) can be constructed for entities, then these representations can be ranked by building on traditional document retrieval techniques. Accordingly, the bulk of the work presented in this chapter revolves around assembling term-based entity representations from various sources, ranging from unstructured documents to structured knowledge bases. We also discuss evaluation methodology and standard test collections.
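The idea of ranking entity descriptions with document retrieval techniques can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy knowledge base `TOY_KB`, the helper names `build_descriptions` and `rank_entities`, the whitespace tokenizer, and the choice of BM25 with common default parameters as the scoring function. It is not the chapter's own implementation.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: entity -> list of (predicate, object) pairs.
TOY_KB = {
    "Oslo": [("label", "Oslo"), ("abstract", "capital city of Norway"),
             ("country", "Norway")],
    "Bergen": [("label", "Bergen"),
               ("abstract", "city on the west coast of Norway")],
    "Stockholm": [("label", "Stockholm"), ("abstract", "capital city of Sweden"),
                  ("country", "Sweden")],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_descriptions(kb):
    """Flatten each entity's object values into one bag of terms."""
    return {
        entity: Counter(t for _, obj in pairs for t in tokenize(obj))
        for entity, pairs in kb.items()
    }


def rank_entities(query, descriptions, k1=1.2, b=0.75):
    """Score every entity description against the query with BM25."""
    n = len(descriptions)
    avg_len = sum(sum(d.values()) for d in descriptions.values()) / n
    # Document frequency: number of descriptions containing each term.
    df = Counter(t for d in descriptions.values() for t in d)
    scores = {}
    for entity, desc in descriptions.items():
        length = sum(desc.values())
        score = 0.0
        for term in tokenize(query):
            tf = desc.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf + k1 * (1 - b + b * length / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


descriptions = build_descriptions(TOY_KB)
ranking = rank_entities("capital of norway", descriptions)
```

Here `ranking` is a list of `(entity, score)` pairs in descending score order. Once each entity has a term-based description, the ranking step treats it like any other document; the modeling effort goes into deciding which sources and fields feed that description.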
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Editor(s) (if applicable) and the Author(s)
About this chapter
Cite this chapter
Balog, K. (2018). Term-Based Models for Entity Ranking. In: Entity-Oriented Search. The Information Retrieval Series, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-319-93935-3_3
DOI: https://doi.org/10.1007/978-3-319-93935-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93933-9
Online ISBN: 978-3-319-93935-3
eBook Packages: Computer Science (R0)