Entity-Based Data Source Contextualization for Searching the Web of Data

  • Andreas Wagner
  • Peter Haase
  • Achim Rettinger
  • Holger Lamm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8798)


To enable search on the Web of data, systems have to combine data from multiple sources. To fulfill user information needs effectively, however, systems must be able to “look beyond” exactly matching data sources and offer information from additional, contextual sources (data source contextualization). For this, users should be involved in the source selection process and choose which sources contribute to their search results. Previous work aims solely at source contextualization for “Web tables” and relies on schema information and simple relational entities. Addressing these shortcomings, we exploit work from the field of data mining and show how to enable Web data source contextualization. Based on a real-world use case, we built a prototype contextualization engine, which we integrated into a system for searching the Web of data. We empirically validated the effectiveness of our approach, achieving performance gains of up to 29% over the state of the art.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andreas Wagner (1, corresponding author)
  • Peter Haase (2)
  • Achim Rettinger (1)
  • Holger Lamm (2)

  1. Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. Fluid Operations, Walldorf, Germany
