Investigating the Semantic Gap through Query Log Analysis

  • Peter Mika
  • Edgar Meij
  • Hugo Zaragoza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5823)


In recent years, significant effort has gone into bringing large amounts of metadata online, and the success of these efforts can be seen in the impressive number of web sites exposing data in RDFa or RDF/XML. However, little is known about the extent to which this data fits the needs of ordinary web users with everyday information needs. In this paper we study what we perceive as the semantic gap between the supply of data on the Semantic Web and the needs of web users as expressed in the queries submitted to a major Web search engine. We perform our analysis at the level of both instances and ontologies. First, we look at how much data is actually relevant to Web queries and what kind of data it is. Second, we provide a generic method to extract the attributes that Web users are searching for regarding particular classes of entities. This method allows us to contrast class definitions found in Semantic Web vocabularies with the attributes of objects that users are interested in. Our findings are crucial to measuring the potential of semantic search, but also speak to the state of the Semantic Web in general.


Keywords (machine-generated, not supplied by the authors): Context Word, Semantic Search, Query Suggestion, Mean Reciprocal Rank, Query Context



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Peter Mika (1)
  • Edgar Meij (2)
  • Hugo Zaragoza (1)
  1. Yahoo Research, Barcelona, Spain
  2. ISLA, University of Amsterdam, Amsterdam
