A Framework for Evaluating Snippet Generation for Dataset Search

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11778)


Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user’s data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.


Keywords: Snippet generation, Dataset search, Evaluation metric



This work was supported in part by the National Key R&D Program of China under Grant 2018YFB1005100, in part by the NSFC under Grant 61572247, and in part by the SIRIUS Centre, Norwegian Research Council project number 237898. Cheng was funded by the Six Talent Peaks Program of Jiangsu Province under Grant RJFW-011.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  2. Edinburgh Research Centre, Huawei, Edinburgh, UK
  3. Department of Computing Science, University of Aberdeen, Aberdeen, UK
  4. Department of Informatics, University of Oslo, Oslo, Norway
  5. Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
