Tuning Topical Queries through Context Vocabulary Enrichment: A Corpus-Based Approach

  • Carlos M. Lorenzetti
  • Ana G. Maguitman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5333)


Context-based Web search has become an important research area, and many strategies have been proposed to reflect contextual information in search queries. Despite the success of some of these proposals, they still have serious limitations because they cannot bridge the terminology gap between the user's context description and the vocabulary of the relevant documents. This paper presents a quantitative technique to learn vocabularies useful for describing the theme of a context under analysis. The enriched vocabulary allows the formulation of search queries that identify resources with higher precision than those formulated from the initial vocabulary. Rigorous experimentation leads us to conclude that the proposed technique outperforms a baseline and other well-known query reformulation techniques.
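The abstract does not spell out how the enriched vocabulary is learned. The sketch below is only a rough illustration of the general idea of enriching a context's vocabulary from a corpus, and it is not the authors' method. It uses a simple pseudo-relevance-feedback heuristic chosen for this illustration: every term in a retrieved document is credited with that document's cosine similarity to the context description, and the highest-scoring terms not already in the context are proposed as query additions. The function names `tokenize`, `cosine` and `enrich_vocabulary` and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption for this sketch); a real
    # system would also stem words and remove stopwords.
    words = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def enrich_vocabulary(context, docs, k=3):
    """Return up to k new terms suggested for the context description.

    Hypothetical scoring rule: a term's score is the sum, over the documents
    that contain it, of each document's cosine similarity to the context.
    Terms already present in the context and zero-score terms are excluded.
    """
    ctx = Counter(tokenize(context))
    scores = Counter()
    for doc in docs:
        vec = Counter(tokenize(doc))
        sim = cosine(ctx, vec)
        for term in vec:
            if term not in ctx:
                scores[term] += sim
    # Rank by descending score, breaking ties alphabetically so the output
    # is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    docs = [
        "query expansion improves search recall",
        "context aware search engines use query logs",
        "cooking pasta recipes",
    ]
    print(enrich_vocabulary("query expansion search context", docs, k=3))
```

With this toy corpus, the script prints `['improves', 'recall', 'aware']`. The unrelated cooking document has zero similarity to the context, so it contributes no terms. A practical version would likely replace raw term counts with a weighting such as TF-IDF and run against a real search index.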


Keywords (assigned by machine, not by the authors): Semantic Similarity; Relevance Feedback; Latent Semantic Analysis; Query Term; Query Expansion


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Carlos M. Lorenzetti (1)
  • Ana G. Maguitman (1)

  1. Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento, LIDIA - Laboratorio de Investigación y Desarrollo en Inteligencia Artificial, Departamento de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur, Av. Alem 1253, (B8000CPB) Bahía Blanca, Argentina; CONICET - Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
