Thesaurus-Based Search in Large Heterogeneous Collections

  • Jan Wielemaker
  • Michiel Hildebrand
  • Jacco van Ossenbruggen
  • Guus Schreiber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5318)


In cultural heritage, large virtual collections are coming into existence. Such collections contain heterogeneous sets of metadata and vocabulary concepts originating from multiple sources. In the context of the E-Culture demonstrator, we have previously shown that such virtual collections can be explored effectively with keyword search and semantic clustering. In this paper we describe the design rationale of ClioPatria, an open-source system that provides APIs for scalable semantic graph search. The use of ClioPatria’s search strategies is illustrated with a realistic use case: searching for “Picasso”. We discuss the details of scalable graph search and the required OWL reasoning functionality, and show why SPARQL queries are insufficient for solving the search problem.


Keyword Search, Graph Pattern, SPARQL Query, Graph Search, Graph Exploration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jan Wielemaker (1)
  • Michiel Hildebrand (2)
  • Jacco van Ossenbruggen (2)
  • Guus Schreiber (3)
  1. Human Computer Studies (HCS), University of Amsterdam, The Netherlands
  2. CWI Amsterdam, The Netherlands
  3. VU University Amsterdam, The Netherlands
