Information Access and Natural Language Processing: A Stimulating Dialogue

Part of the Text, Speech and Language Technology book series (TLTB, volume 36)

In this paper we examine the interplay between the needs of information seekers who must access information in large digital text collections and the techniques that natural language processing researchers have developed to support this access. In particular, we examine how language processing technologies such as question answering, single- and multi-document summarisation, and ontology-guided similar event searching can help journalists gather information from news archives when writing background to a breaking news event: the Cub Reporter scenario. Our thesis is twofold. First, investigating real-world tasks with complex information access requirements, such as the Cub Reporter scenario, stimulates researchers to look beyond existing search engine solutions and drives the development and evaluation of novel language processing techniques. Second, novel developments in language processing capabilities yield both conceptual insights into how to characterise information seeking behaviour and empirical insights from observing information seeking behaviour with new technologies.


Keywords (machine-generated, not supplied by the authors): Natural Language Processing, Information Access, Word Sense, Question Answering, Document Retrieval

Copyright information

© Springer 2007

Authors and Affiliations

  1. Department of Computer Science, University of Sheffield, Sheffield, UK