Head/Modifier Frames for Information Retrieval

  • Cornelis H. A. Koster
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2945)


We describe a principled method for representing documents by phrases abstracted into Head/Modifier (HM) pairs. First, the notion of aboutness and the characterization of full-text documents by HM pairs are discussed. Based on linguistic arguments, a taxonomy of HM pairs is derived. We briefly describe the EP4IR parser/transducer of English and present some statistics on the distribution of HM pairs in newspaper text.

Based on the HM pairs generated, a new technique to measure the accuracy of a parser is introduced, and applied to the EP4IR grammar of English. Finally we discuss the merits of HM pairs and HM trees as a document representation.


Keywords: Information Retrieval, Noun Phrase, Indexing Term, Document Representation, Verb Phrase
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Cornelis H. A. Koster (1)
  1. Computing Science Institute, University of Nijmegen, The Netherlands
