International Semantic Web Conference

The Semantic Web - ISWC 2015 pp 474-491 | Cite as

Improving Entity Retrieval on Structured Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)

Abstract

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which make entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x–means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billions Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state of the art approaches.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007) CrossRefGoogle Scholar
  2. 2.
    Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semantic Web Inf. Syst. 5(3), 1–22 (2009)CrossRefGoogle Scholar
  3. 3.
    Blanco, R., Cambazoglu, B.B., Mika, P., Torzec, N.: Entity recommendations in web search. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013, Part II. LNCS, vol. 8219, pp. 33–48. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  4. 4.
    Blanco, R., Halpin, H., Herzig, D.M., Mika, P., Pound, J., Thompson, H.S., Tran, D.T.: Repeatable and reliable search system evaluation using crowdsourcing. In: Proceeding of the 34th ACM SIGIR, pp. 923–932 (2011)Google Scholar
  5. 5.
    Blanco, R., Mika, P., Vigna, S.: Effective and efficient entity search in RDF data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 83–97. Springer, Heidelberg (2011) CrossRefGoogle Scholar
  6. 6.
    Demartini, G., Missen, M.M.S., Blanco, R., Zaragoza, H.: Entity summarization of news articles. In: Proceeding of the 33rd ACM SIGIR, pp. 795–796 (2010)Google Scholar
  7. 7.
    Guéret, C., Groth, P., Stadler, C., Lehmann, J.: Assessing linked data mappings using network measures. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 87–102. Springer, Heidelberg (2012) CrossRefGoogle Scholar
  8. 8.
    Halpin, H., Herzig, D.M., Mika, P., Blanco, R., Pound, J., Thompson, H.S., Duc, T.T.: Evaluating ad-hoc object retrieval. In: Proceedings of the IWEST (2010)Google Scholar
  9. 9.
    Harth, A.: Billion Triples Challenge data set (2012). Downloaded from http://km.aifb.kit.edu/projects/btc-2012/
  10. 10.
    Lv, Y. Zhai, C.: When documents are very long, bm25 fails! In: Proceedings of 34th ACM SIGIR, pp. 1103–1104. ACM, New York (2011)Google Scholar
  11. 11.
    Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008) CrossRefMATHGoogle Scholar
  12. 12.
    Mika, P., Meij, E., Zaragoza, H.: Investigating the semantic gap through query log analysis. In: Bernstein, A., Karger, D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., Thirunarayan, K. (eds.) ISWC 2009. LNCS, vol. 5823, pp. 441–455. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  13. 13.
    Neumann, T., Weikum, G.: Rdf-3x: A risc-style engine for rdf. Proc. VLDB Endow. 1(1), 647–659 (2008)CrossRefGoogle Scholar
  14. 14.
    Nie, Z., Ma, Y., Shi, S., Wen, J.-R., Ma, W.-Y.: Web object retrieval. In: Proceedings of the 16th WWW, pp. 81–90 (2007)Google Scholar
  15. 15.
    Oren, E., Delbru, R., Catasta, M., Cyganiak, R., Stenzhorn, H., Tummarello, G.: Sindice.com: a document-oriented lookup index for open linked data. IJMSO 3(1), 37–52 (2008)CrossRefGoogle Scholar
  16. 16.
    Pelleg, D., Moore, A.W., et al.: X-means: Extending k-means with efficient estimation of the number of clusters. In: ICML, pp. 727–734 (2000)Google Scholar
  17. 17.
    Pound, J., Mika, P., Zaragoza, H.: Ad-hoc object retrieval in the web of data. In: Proceedings of the 19th WWW, pp. 771–780 (2010)Google Scholar
  18. 18.
    Rajaraman, A., Ullman, J.D.: Mining of massive datasets. Cambridge University Press (2011)Google Scholar
  19. 19.
    Meusel, R., Petrovski, P., Bizer, C.: The webdatacommons microdata, rdfa and microformat dataset series. In: Mika, P., Tudorache, T., Bernstein, A., Welty, C., Knoblock, C., Vrandečić, D., Groth, P., Noy, N., Janowicz, K., Goble, C. (eds.) ISWC 2014, Part I. LNCS, vol. 8796, pp. 277–292. Springer, Heidelberg (2014) Google Scholar
  20. 20.
    Robertson, S., Zaragoza, H.: The probabilistic relevance framework: Bm25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009)CrossRefGoogle Scholar
  21. 21.
    Tonon, A., Demartini, G., Cudré-Mauroux, P.: Combining inverted indices and structured search for ad-hoc object retrieval. In: Proceedings of the 35th ACM SIGIR, pp. 125–134 (2012)Google Scholar
  22. 22.
    von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)MathSciNetCrossRefGoogle Scholar
  23. 23.
    Zhiltsov, N., Agichtein, E.: Improving entity search over linked data by modeling latent semantics. In: Proceedings of the 22nd ACM CIKM, pp. 1253–1256 (2013)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.L3S Research CenterLeibniz Universität HannoverHanoverGermany

Personalised recommendations