Abstract
The paper presents simple methods for performing pattern search on annotated text corpora. Elementary text processing techniques are applied, based on common text scanning tools: flex and grep. The methods make it possible to handle ambiguous annotation and structured tags properly. For some types of queries, processing times are comparable to those attained by elaborate search engines that use indexing techniques and offer query languages of similar expressiveness.
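To illustrate what a grep-style scan over ambiguously annotated text might look like, here is a minimal sketch in Python. The corpus layout (one token per line, `form<TAB>lemma:tag|lemma:tag`, with `|` separating alternative interpretations) and the names `CORPUS`, `tag_pattern` and `search` are assumptions made for this example; they are not taken from the paper, whose actual encoding and flex/grep pipeline may differ.

```python
import re

# Hypothetical annotated corpus: one token per line, "form<TAB>lemma:tag|lemma:tag".
# The "|" separates alternative (ambiguous) interpretations of the same token.
# This layout is an assumption for illustration, not the paper's format.
CORPUS = (
    "the\tthe:det\n"
    "saw\tsee:verb|saw:noun\n"
    "cuts\tcut:verb|cut:noun\n"
    "wood\twood:noun\n"
)


def tag_pattern(tag):
    """Build a regex matching a token line with at least one interpretation tagged `tag`."""
    # Skip the word form and the tab, then any number of earlier alternatives,
    # then require "lemma:tag" followed by "|" or the end of the line.
    return re.compile(
        r"^[^\t]+\t(?:[^|]*\|)*[^:|]+:" + re.escape(tag) + r"(?:\||$)"
    )


def search(corpus, tag):
    """Scan the corpus line by line, as grep would, and return the matching word forms."""
    pattern = tag_pattern(tag)
    return [
        line.split("\t", 1)[0]
        for line in corpus.splitlines()
        if pattern.search(line)
    ]


if __name__ == "__main__":
    print(search(CORPUS, "noun"))
    print(search(CORPUS, "verb"))
```

Because the ambiguity is encoded in the line itself, a single regular expression can accept a token whenever any of its alternative readings fits the query, which is the kind of behaviour the abstract attributes to the proposed methods.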
References
1. grep documentation. http://www.gnu.org/software/grep/doc/grep.html
2. Nicol, G. T. (1993) Flex: The Lexical Scanner Generator. Free Software Foundation
3. lzop documentation. http://www.lzop.org/lzop_man.php
4. m4 documentation. http://www.gnu.org/software/m4/manual/index.html
5. Obrębski, T., Stolarski, M. (2005) UAM Text Tools - A text processing toolkit for Polish. Proceedings of 2nd Language & Technology Conference, Poznań, Poland, 301–304
6. Przepiórkowski, A., Krynicki, Z. et al. (2004) A Search Tool for Corpora with Positional Tagsets and Ambiguities. Proceedings of LREC 2004, 1235–1238
7. Silberztein, M. (1993) Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris
Copyright information
© 2006 Springer
About this paper
Cite this paper
Obrębski, T. (2006). Searching Text Corpora with grep. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-33521-8_36
DOI: https://doi.org/10.1007/3-540-33521-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33520-7
Online ISBN: 978-3-540-33521-4
eBook Packages: Engineering, Engineering (R0)