Part of the book series: Advances in Soft Computing (AINSC, volume 35)

Abstract

The paper presents simple methods for performing pattern search on annotated text corpora. Elementary text processing techniques are applied, based on common text scanning tools: flex and grep. The methods correctly handle ambiguous annotation as well as structured tags. For some types of queries, processing times are comparable to those attained by elaborate search engines that use indexing techniques and offer query languages of similar expressiveness.
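
To make the idea of regular-expression search over ambiguously annotated text concrete, here is a minimal sketch. It is not the paper's method: the corpus encoding (one sentence per line, tokens written as `form/TAG1|TAG2`), the tag names, and the helper functions `token_with_tag` and `search` are all assumptions made for illustration, and Python's `re` module stands in for flex and grep.

```python
import re

# Hypothetical encoding (an assumption, not taken from the paper):
# one sentence per line, tokens separated by single spaces, each token
# written as form/TAG1|TAG2 where '|' separates alternative tags.
CORPUS = [
    "the/DET can/NOUN|VERB rusts/VERB",
    "she/PRON can/VERB|NOUN swim/VERB",
    "a/DET big/ADJ dog/NOUN barks/VERB|NOUN",
]


def token_with_tag(tag):
    """Build a regex fragment matching one token that admits `tag`.

    The tag may appear at any position among the token's alternatives.
    """
    return r"\S+/(?:[A-Z]+\|)*" + re.escape(tag) + r"(?:\|[A-Z]+)*"


def search(tag_sequence, lines):
    """Return the lines containing consecutive tokens that admit the tags."""
    body = " ".join(token_with_tag(tag) for tag in tag_sequence)
    # Anchor on token boundaries so a match cannot start or end mid-token.
    regex = re.compile(r"(?:^| )" + body + r"(?= |$)")
    return [line for line in lines if regex.search(line)]


if __name__ == "__main__":
    # A determiner followed by a token that can be read as a noun.
    for hit in search(["DET", "NOUN"], CORPUS):
        print(hit)
```

Because every alternative tag is kept on the token itself, a single line-oriented regular expression can match any reading of an ambiguous token. The same pattern could in principle be handed to a line-based scanner, provided the corpus is stored one sentence per line.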

Copyright information

© 2006 Springer

About this paper

Cite this paper

Obrębski, T. (2006). Searching Text Corpora with grep. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33521-8_36

  • DOI: https://doi.org/10.1007/3-540-33521-8_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33520-7

  • Online ISBN: 978-3-540-33521-4

  • eBook Packages: Engineering, Engineering (R0)
