Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text

  • Gregory Grefenstette
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1299)


The usual approach to finding information on the WWW via existing Web browsers is to use a one- or two-word query. Browsers return a number of documents containing these words, and the user examines those documents, or their abstracts, sees how the word or words in the query are being used, and alters the initial query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over the past thirty-five years. These models were designed for longer queries and do not respond adequately to users' needs. On the other hand, recent advances in natural language processing permit the extraction of typed information that is centered on one or two words. We review a selection of this typed information and describe how it could be used to present the user with an intermediate structure that fits between their short queries and the documents found in a heterogeneous text collection such as the WWW.
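As a rough illustration of what such an intermediate structure might look like, the sketch below groups typed contexts of a one-word query (noun phrases, verb and direct object pairs) and lists the documents behind each. It is not the paper's method: the relation types, the toy data, and the function name `intermediate_structure` are all assumptions made for the example, and a real system would obtain the typed relations from a tagger or light parser.

```python
from collections import defaultdict

# Hypothetical, hand-made typed relations: (relation type, phrase, document id).
# These stand in for the output of a tagger or light parser.
TYPED_RELATIONS = [
    ("noun_phrase", "research report", "doc1"),
    ("noun_phrase", "market research", "doc2"),
    ("direct_object_of", "conduct research", "doc1"),
    ("direct_object_of", "fund research", "doc3"),
    ("noun_phrase", "weather report", "doc4"),
]


def intermediate_structure(query_word, relations):
    """Group the phrases that contain query_word by relation type."""
    grouped = defaultdict(lambda: defaultdict(set))
    for rel_type, phrase, doc_id in relations:
        if query_word in phrase.split():
            grouped[rel_type][phrase].add(doc_id)
    # Convert to plain dicts with sorted document lists for stable display.
    return {
        rel_type: {phrase: sorted(docs) for phrase, docs in phrases.items()}
        for rel_type, phrases in grouped.items()
    }


structure = intermediate_structure("research", TYPED_RELATIONS)
for rel_type, phrases in structure.items():
    print(rel_type)
    for phrase, docs in phrases.items():
        print("  ", phrase, "->", ", ".join(docs))
```

A user who typed only "research" could then pick a typed context such as "market research" instead of scanning the raw document list.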


Information Retrieval; Noun Phrase; Natural Language Processing; Direct Object; Vector Space Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Gregory Grefenstette
    • 1
  1. Rank Xerox Research Centre, Meylan, France
