Principles, Implementation Strategies, and Evaluation of a Corpus Query System

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4002)

Abstract

The last decade has seen an increase in the number of available corpus query systems. These systems generally implement a query language as well as a database model. We report on one such corpus query system, and evaluate its query language against a range of queries and criteria quoted from the literature. We show some important principles of the design of the query language, and argue for the strategy of separating what is retrieved by a linguistic query from the data retrieved in order to display or otherwise process the results, stating the needs for generality, simplicity, and modularity as reasons to prefer this strategy.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petersen, U. (2006). Principles, Implementation Strategies, and Evaluation of a Corpus Query System. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_21

  • DOI: https://doi.org/10.1007/11780885_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35467-3

  • Online ISBN: 978-3-540-35469-7

  • eBook Packages: Computer Science, Computer Science (R0)
