What is the Role of NLP in Text Retrieval?

Karen Sparck Jones

doi:10.1007/978-94-017-2388-6_1

Part of the book series: Text, Speech and Language Technology (TLTB, volume 7)

Abstract

This paper addresses the value of linguistically-motivated indexing (LMI) for document and text retrieval. After reviewing the basic concepts involved and the assumptions on which LMI is based, namely that complex index descriptions and terms are necessary, I consider past and recent research on LMI, and specifically on automated LMI via NLP. Experiments in the first phase of research, up to the late eighties, did not demonstrate value in LMI but were very limited; the much larger tests of the nineties, with full text, have not done so either. My conclusion is that LMI is not needed for effective retrieval, but has other important roles within information-selection systems.

Author information

Authors and Affiliations

Computer Laboratory, University of Cambridge, New Museum Site, Pembroke Street, Cambridge, CB2 3QG, England
Karen Sparck Jones

Editor information

Editors and Affiliations

General Electric, Research & Development, 12301, Schenectady, NY, USA
Tomek Strzalkowski

About this chapter

Cite this chapter

Sparck Jones, K. (1999). What is the Role of NLP in Text Retrieval? In: Strzalkowski, T. (ed.) Natural Language Information Retrieval. Text, Speech and Language Technology, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2388-6_1

DOI: https://doi.org/10.1007/978-94-017-2388-6_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5209-4
Online ISBN: 978-94-017-2388-6
eBook Packages: Springer Book Archive
