On the Modeling of Entities for Ad-Hoc Entity Search in the Web of Data

  • Robert Neumayer
  • Krisztian Balog
  • Kjetil Nørvåg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7224)

Abstract

The Web of Data describes objects, entities, or “things” in terms of their attributes and their relationships, using RDF statements. There is a need to make this wealth of knowledge easily accessible by means of keyword search. Despite recent research efforts in this direction, there is a lack of understanding of how structured semantic data is best represented for text-based entity retrieval. The task we address in this paper is ad-hoc entity search: the retrieval of RDF resources that represent an entity described in the keyword query. We build upon and formalise existing entity modeling approaches within a generative language modeling framework, and compare them experimentally using a standard test collection provided by the Semantic Search Challenge evaluation series. We show that these models outperform the current state of the art in terms of retrieval effectiveness; however, this comes at the cost of abandoning a large part of the semantics behind the data. We propose a novel entity model capable of preserving the semantics associated with entities without sacrificing retrieval effectiveness.

Keywords

  • Language Model
  • Keyword Query
  • Entity Model
  • Retrieval Effectiveness
  • Normalize Discount Cumulative Gain
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Robert Neumayer (1)
  • Krisztian Balog (1)
  • Kjetil Nørvåg (1)
  1. Norwegian University of Science and Technology, Trondheim, Norway
