Learning Semantic Query Suggestions

  • Edgar Meij
  • Marc Bron
  • Laura Hollink
  • Bouke Huurnink
  • Maarten de Rijke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5823)

Abstract

An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that return more useful search results than the original query, and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion, a special type of query transformation based on identifying semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. Using a manually developed test bed, we evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts, and show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.
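
The general shape of a feature-based, supervised approach to linking queries to concepts can be illustrated with a minimal sketch. It is not the authors' implementation: the three term-based features, the perceptron learner, the concept identifiers, and the training examples below are all invented for illustration, and the search history-based and concept-specific features mentioned in the abstract are omitted. The sketch generates the n-grams of a query, pairs each n-gram with each candidate concept label, extracts a feature vector per pair, and lets a learned linear classifier decide which concepts to suggest.

```python
def ngrams(query):
    """Return all contiguous word n-grams of the query, longest first."""
    words = query.lower().split()
    n = len(words)
    return [" ".join(words[i:i + k])
            for k in range(n, 0, -1)
            for i in range(n - k + 1)]


def features(ngram, query, label):
    """Toy term-based features for an (n-gram, concept label) pair."""
    q_words = query.lower().split()
    g_words = ngram.lower().split()
    l_words = label.lower().split()
    overlap = len(set(g_words) & set(l_words))
    return [
        1.0 if ngram.lower() == label.lower() else 0.0,  # exact label match
        overlap / len(l_words),                          # share of label terms matched
        len(g_words) / len(q_words),                     # share of query covered
    ]


def score(model, x):
    """Linear decision value for feature vector x."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(examples, epochs=20):
    """Fit a linear classifier on (feature_vector, 0/1 label) pairs.

    A perceptron is used only as a small stand-in for a supervised learner.
    """
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = y - (1 if score((w, b), x) > 0 else 0)
            if err:
                w = [wi + err * xi for wi, xi in zip(w, x)]
                b += err
    return w, b


def suggest(query, concepts, model):
    """Return the sorted concept identifiers the classifier accepts."""
    hits = set()
    for g in ngrams(query):
        for uri, label in concepts.items():
            if score(model, features(g, query, label)) > 0:
                hits.add(uri)
    return sorted(hits)


# Hypothetical concept labels and hand-labelled training pairs (invented).
CONCEPTS = {"dbpedia:Amsterdam": "Amsterdam",
            "dbpedia:Queen_Beatrix": "Queen Beatrix"}
TRAIN = [
    (features("amsterdam", "amsterdam canals", "Amsterdam"), 1),
    (features("canals", "amsterdam canals", "Amsterdam"), 0),
    (features("amsterdam canals", "amsterdam canals", "Amsterdam"), 0),
    (features("queen beatrix", "queen beatrix speech", "Queen Beatrix"), 1),
    (features("queen", "queen beatrix speech", "Queen Beatrix"), 0),
]
MODEL = train_perceptron(TRAIN)

if __name__ == "__main__":
    print(suggest("queen beatrix amsterdam", CONCEPTS, MODEL))
```

Treating each (n-gram, concept) pair as a separate classification instance keeps the learner generic: additional feature types can be appended to the vector returned by `features` without changing the training or suggestion code.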

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Edgar Meij (1)
  • Marc Bron (1)
  • Laura Hollink (2)
  • Bouke Huurnink (1)
  • Maarten de Rijke (1)

  1. ISLA, University of Amsterdam, Amsterdam
  2. Dept. of Computer Science, VU University Amsterdam, Amsterdam
