Searching Text Corpora with grep

  • Tomasz Obrębski
Conference paper
Part of the Advances in Soft Computing book series (AINSC, volume 35)


The paper presents simple methods for performing pattern searches on annotated text corpora. Elementary text processing techniques are applied, based on common text scanning tools: flex and grep. The methods properly handle ambiguous annotation as well as structured tags. For some types of queries, processing times are comparable to those attained by elaborate search engines that use indexing techniques and offer query languages of similar expressiveness.
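
To make the idea of regular-expression search over ambiguous annotation concrete, here is a minimal sketch in Python. It is not the paper's method: the annotation format (space-separated `form/TAG` tokens, with ambiguous alternatives written as `form/TAG1|TAG2`), the sample sentences, and the helper names `token_with_tag` and `search` are all assumptions made for illustration. The paper itself relies on flex and grep; Python's `re` module merely stands in for a grep-style line scan.

```python
import re

# Hypothetical annotation format (an assumption, not taken from the paper):
# one sentence per line, tokens separated by single spaces, each token
# written as form/TAG, ambiguous tokens listing alternatives as form/TAG1|TAG2.
CORPUS = [
    "the/DET old/ADJ|NOUN man/NOUN|VERB sleeps/VERB",
    "they/PRON man/VERB|NOUN the/DET boats/NOUN|VERB",
]


def token_with_tag(tag: str) -> str:
    """Build a regex for one token whose tag alternatives include `tag`."""
    return rf"[^\s/]+/(?:[A-Z]+\|)*{re.escape(tag)}(?:\|[A-Z]+)*"


def search(lines, tags):
    """Return the lines that contain consecutive tokens carrying `tags`.

    Like grep, this scans each line with a single regular expression;
    a token matches a tag if the tag is among its alternatives.
    """
    pattern = re.compile(
        r"(?<!\S)" + " ".join(token_with_tag(t) for t in tags) + r"(?!\S)"
    )
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    for hit in search(CORPUS, ["ADJ", "NOUN"]):
        print(hit)
```

Because each tag is matched against any of a token's alternatives, a query for a determiner followed by a noun in this sketch also returns the first sample line, where "old" is ambiguous between adjective and noun.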





Copyright information

© Springer 2006

Authors and Affiliations

  • Tomasz Obrębski, Adam Mickiewicz University, Poznań, Poland
