Abstract
An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that return more useful search results than the original query, and most popular search engines provide facilities that let users complete, specify, or reformulate their queries. We study the problem of semantic query suggestion, a special type of query transformation based on identifying semantic concepts contained in user queries. We use a feature-based approach in conjunction with supervised machine learning, augmenting term-based features with search history-based and concept-specific features. We apply our method to the task of linking queries from real-world query logs (the transaction logs of the Netherlands Institute for Sound and Vision) to the DBpedia knowledge base. Using a manually developed test bed, we evaluate the utility of different machine learning algorithms, features, and feature types in identifying semantic concepts, and we show significant improvements over an already high baseline. The resources developed for this paper, i.e., queries, human assessments, and extracted features, are available for download.
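To make the feature-based formulation concrete, the following is a minimal sketch of how candidate concepts for a query might be paired with term-based and search history-based features before being passed to a supervised learner. It is an illustration under stated assumptions, not the pipeline used in the paper: the label index `CONCEPT_LABELS`, the concept identifiers, the n-gram candidate generation, and all feature names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy label index standing in for a lookup of DBpedia concept labels.
CONCEPT_LABELS: Dict[str, str] = {
    "queen beatrix": "dbpedia:Beatrix_of_the_Netherlands",
    "beatrix": "dbpedia:Beatrix_of_the_Netherlands",
    "amsterdam": "dbpedia:Amsterdam",
}

Row = Tuple[str, str, Dict[str, float]]


def ngrams(query: str) -> List[str]:
    """Return all contiguous word n-grams of the query, longest first."""
    words = query.lower().split()
    grams = []
    for n in range(len(words), 0, -1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


def candidate_features(query: str, previous_concepts: List[str]) -> List[Row]:
    """Pair each n-gram that matches a label with its concept and a feature dict."""
    words = query.lower().split()
    normalized = " ".join(words)
    rows: List[Row] = []
    for gram in ngrams(query):
        concept = CONCEPT_LABELS.get(gram)
        if concept is None:
            continue  # this n-gram matches no known concept label
        gram_len = len(gram.split())
        features = {
            # Term-based features (illustrative names).
            "ngram_len": float(gram_len),
            "query_coverage": gram_len / len(words),
            "exact_query_match": float(gram == normalized),
            # History-based feature: was the concept linked earlier in the session?
            "seen_in_session": float(concept in previous_concepts),
        }
        rows.append((gram, concept, features))
    return rows


rows = candidate_features("queen beatrix amsterdam", ["dbpedia:Amsterdam"])
for gram, concept, features in rows:
    print(gram, concept, features)
```

Each resulting row is one (query n-gram, candidate concept) pair; with human assessments as labels, such rows could serve as training instances for any standard classifier.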
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Meij, E., Bron, M., Hollink, L., Huurnink, B., de Rijke, M. (2009). Learning Semantic Query Suggestions. In: Bernstein, A., et al. The Semantic Web - ISWC 2009. ISWC 2009. Lecture Notes in Computer Science, vol 5823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04930-9_27
DOI: https://doi.org/10.1007/978-3-642-04930-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04929-3
Online ISBN: 978-3-642-04930-9
eBook Packages: Computer Science, Computer Science (R0)