Abstract
The paper discusses the traditional, and ongoing, question as to whether natural language processing (NLP) techniques, or indeed any representational techniques at all, aid in the retrieval of information, as that task is traditionally understood. The discussion is partly a response to Karen Sparck Jones’ (1999) claim that artificial intelligence, and by implication NLP, should learn from the methodology of Information Retrieval (IR), rather than vice versa, as the first sentence above implies. The issue has been made more interesting and complicated by the shift of interest from classic IR experiments with very long queries to Internet search queries, which typically consist of two highly ambiguous terms. This simple fact has changed the assumptions of the debate. Moreover, the return to statistical and empirical methods within NLP has made it less clear what an NLP technique, or even a “representational” method, is. The paper also notes the growth of “language models” within IR and the use of the term “translation” in recent years to describe a range of activities, including IR, which constitutes rather the opposite of what Sparck Jones was calling for.
References
Agichtein, E., Grishman, R., Borthwick, A., Sterling, J.: Description of the named entity system as used in MUC-7. In: Proceedings of the MUC-7 Conference, NYU (1998)
Azzam, S., Humphreys, K., Gaizauskas, R.: Using coreference chains for text summarization. In: Proc. ACL Workshop on Coreference and its Applications, Maryland (1999)
Bely, N., Borillo, A., Virbel, J., Siot-Decauville, N.: Procédures d’analyse sémantique appliquées à la documentation scientifique. Gauthier-Villars, Paris (1970)
Berger, A., Lafferty, J.: Information retrieval as statistical translation. In: SIGIR 1999 (1999)
Berger, A., et al.: Bridging the lexical chasm: statistical approaches to question answering. In: SIGIR 2000 (2000)
Bikel, D., Miller, S., Schwartz, R., Weischedel, R.: Nymble: a High-Performance Learning Name-finder. In: Proceedings of the Fifth conference on Applied Natural Language Processing (1997)
Brill, E.: Some Advances in Transformation-Based Part of Speech Tagging. In: Proceedings of the Twelfth National Conference on AI (AAAI 1994), Seattle, Washington (1994)
Brown, P.F., Cocke, J.: A Statistical Approach to Machine Translation, IBM Research Division, T.J. Watson Research Center, RC 14773 (1989)
Carberry, S., Samuel, K., Vijay-Shanker, K.: Dialogue act tagging with transformation-based learning. In: Proceedings of the COLING-ACL 1998 Conference, Montreal, Canada, vol. 2, pp. 1150–1156 (1998)
Cardie, C.: Empirical methods in information extraction. AI Magazine 18(4) (1997); Special Issue on Empirical Natural Language Processing
Chiaramella, Y., Nie, J.: A retrieval model based on an extended modal logic and its application to the RIME experimental approach. In: Proceedings of the 13th ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 25–43 (1990)
Collier, R.: Automatic Template Creation for Information Extraction. PhD thesis, University of Sheffield Computer Science Dept., UK (1998)
Cowie, J., Guthrie, L., Jin, W., Ogden, W., Pustejovsky, J., Wang, R., Wakao, T., Waterman, S., Wilks, Y.: CRL/Brandeis: The Diderot System. In: Proceedings of Tipster Text Program (Phase I), Morgan Kaufmann, San Francisco (1993)
Cunningham, H.: JAPE – a Java Annotation Patterns Engine. Technical Report, Department of Computer Science, University of Sheffield (1999)
Daelemans, W., Zavrel, J., van der Sloot, K., van den Bosch, A.: TiMBL: Tilburg memory based learner version 1.0. Technical report, ILK Technical Report 98-03 (1998)
Gaizauskas, R., Wilks, Y.: Information Extraction: beyond document retrieval. Journal of Documentation (1997)
Gardin, J.: Syntol. Rutgers Graduate School of Library Science, New Brunswick (1965)
Gollins, T., Sanderson, M.: Improving Cross Language Information Retrieval with triangulated translation. In: SIGIR 2001 (2001)
Granger, R.: FOULUP: a program that figures out meanings of words from context. In: Proc. Fifth Joint Internat. Conf. on AI (1977)
Green, B., Wolf, A., Chomsky, C., Laughery, K.: BASEBALL, an automatic question answerer. In: Proc. Western Joint Computer Conference 19, pp. 219–224 (1961)
Grefenstette, G., Hearst, M.A.: Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results. In: Weir (ed.) Statistically-based natural language programming techniques, Proc. AAAI Workshop, AAAI Press, Menlo Park (1992)
Grishman, R.: Information extraction: Techniques and challenges. In: Pazienza, M.T. (ed.) SCIE 1997. LNCS (LNAI), vol. 1299, Springer, Heidelberg (1997)
Grishman, R., Sterling, J.: Generalizing automatically generated patterns. In: Proceedings of COLING 1992 (1992)
Gross, M.: On the equivalence of models of language used in the fields of mechanical translation and information retrieval. Information Storage and Retrieval 2(1) (1964)
Hobbs, J.R.: The generic information extraction system. In: Proceedings of the Fifth Message Understanding Conference (MUC-5), pp. 87–91. Morgan Kaufmann, San Francisco (1993)
Hutchins, W.J.: Linguistic processes in the indexing and retrieval of documents. Linguistics 61 (1970)
Jeffrey, K.: What’s next in databases? ERCIM News 39 (1999), www.ercim.org
Krovetz, R.: More than one sense per discourse. NEC Princeton NJ Labs., Research Memorandum (1998)
Lehnert, W.: A Conceptual Theory of Question Answering. In: Proc. Fifth IJCAI, pp. 158–164. Kaufmann, Cambridge (1977)
Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E.: University of Massachusetts: Description of the CIRCUS system as used for MUC-4. In: Proceedings of the Fourth Message Understanding Conference MUC-4, pp. 282–288. Morgan Kaufmann, San Francisco (1992)
Lenat, D., Prakash, M., Shepherd, M.: CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. The AI Magazine 6(4) (1986)
Luhn, H.P.: A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development 1, 309–317 (1957)
Mauldin, M.: Retrieval performance in FERRET: a conceptual information retrieval system. In: SIGIR 1991 (1991)
Miller, G.A. (ed.): WordNet: An on-line Lexical Database. International Journal of Lexicography 3(4) (1990)
Morgan, R., Garigliano, R., Callaghan, P., Poria, S., Smith, M., Urbanowicz, A., Collingham, R., Costantino, M., Cooper, C.: Description of the LOLITA System as used for MUC-6. In: Proceedings of the Sixth Message Understanding Conference (MUC-6), pp. 71–86. Morgan Kaufmann, San Francisco (1995)
Muggleton, S.: Recent advances in inductive logic programming. In: Proc. 7th Ann. ACM Workshop on Comput. Learning Theory, pp. 3–11. ACM Press, New York (1994)
Muggleton, S., Cussens, J., Page, D., Srinivasan, A.: Using inductive logic programming for natural language processing. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 25–34. Springer, Heidelberg (1997); Workshop Notes on Empirical Learning of Natural Language Tasks
Pearl, J.: Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning. In: Proceedings of the Cognitive Science Society (CSS-7) (1985)
Pietrosanti, E., Graziadio, B.: Extracting Information for Business Needs. Unicom Seminar on Information Extraction, London (March 1997)
Riloff, E., Lehnert, W.: Automated dictionary construction for information extraction from text. In: Proceedings of Ninth IEEE Conference on Artificial Intelligence for Applications, pp. 93–99 (1993)
Riloff, E., Shoen, J.: Automatically acquiring conceptual patterns without an annotated corpus. In: Proceedings of the Third Workshop on Very Large Corpora (1995)
Roche, E., Schabes, Y.: Deterministic Part-of-Speech Tagging with Finite-State Transducers. Computational Linguistics 21(2), 227–254 (1995)
Salton, G.: A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART). Journal of the American Society of Information Science 23(2) (1972)
Schvaneveldt, R. (ed.): Pathfinder Networks: Theory and Applications. Ablex, Norwood (1990)
Smeaton, A., van Rijsbergen, C.: Experiments in incorporating syntactic processing of user queries into a document retrieval strategy. In: Proc. 11th. ACM SIGIR (1988)
Sparck Jones, K.: Synonymy and Semantic Classification. Edinburgh University Press, Edinburgh (1966/1986)
Sparck Jones, K.: What is the role of NLP in text retrieval. In: Strzalkowski (ed.) Natural Language Information Retrieval, Kluwer, New York (1999a)
Sparck Jones, K.: Information Retrieval and Artificial Intelligence. Artificial Intelligence Journal 114 (1999b)
Stevenson, M., Wilks, Y.: Combining Weak Knowledge Sources for Sense Disambiguation. In: Proceedings of the International Joint Conference for Artificial Intelligence (IJCAI 1999) (1999)
Strzalkowski, T., Vauthey, B.: Natural Language Processing in Automated Information Retrieval, PROTEUS Project Memorandum. Department of Computer Science, New York University (1991)
Vilain, M.: Validation of terminological inference in an information extraction task. In: Proceedings of the 1993 ARPA Human Language Workshop (1993)
Wilks, Y.: Text Searching with Templates. Cambridge Language Research Unit Memo, ML.156 (1964)
Wilks, Y.: The application of CLRU’s method of semantic analysis to information retrieval. Cambridge Language Research Unit Memo, ML.173 (1965)
Wilks, Y.: Frames, semantics and novelty. In: Metzing (ed.) Frame Conceptions and Text Understanding, de Gruyter, Berlin (1979)
Wilks, Y.: Senses and Texts. In: Ide, N. (ed.) Special issue of Computers and the Humanities (1998)
Wilks, Y., Catizone, R.: Making information extraction more adaptive. In: Pazienza, M.-T. (ed.) Proc. Information Extraction Workshop, Frascati (1999)
Winograd, T.: Understanding Natural Language (1971)
Winograd, T., Flores, A.: Understanding Computers and Cognition: A New Foundation for Design. Ablex, Norwood (1986)
Yarowsky, D.: Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. In: Proc. COLING 1992, Nantes, France (1992)
Yarowsky, D.: Unsupervised word-sense disambiguation rivalling supervised methods. In: Proc. ACL 1995 (1995)
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Wilks, Y. (2004). IR and AI: Traditions of Representation and Anti-representation in Information Processing. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_2
DOI: https://doi.org/10.1007/978-3-540-24752-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive