Linguistic Resources, Development, and Evaluation of Text and Speech Systems

Part of the book series: Text, Speech and Language Technology (TLTB, volume 37)

Over the past several decades, research and development of human language technology has been driven or hindered by the availability of data, and a number of organizations have arisen to address the demand for greater volumes of linguistic data in a wider variety of languages, with more sophisticated annotation and better quality. A great deal of the linguistic data available today results from common task technology evaluation programs that, at least as implemented in the United States, typically involve objective measures of system performance on a benchmark corpus, which are compared with human performance over the same data. Data centres play an important role by distributing and archiving data, sometimes by collecting and annotating it, and even by coordinating the efforts of other organizations in the creation of linguistic data. Data planning depends upon the purpose of the project; the linguistic resources needed; the internal and external limitations on acquiring them; the availability of data; bandwidth and distribution requirements; available funding; the limits on human annotation; the timeline; and the details of the processing pipeline, including the ability to parallelize steps or the need to serialize them. Language resource creation includes planning, creation of a specification, collection, segmentation, annotation, quality assurance, preparation for use, distribution, adjudication, refinement, and extension. In preparation for publication, shared corpora are generally associated with metadata and documented to indicate the authors and annotators of the data, the volume and types of raw material included, the percentage annotated, the annotation specification, and the quality control measures adopted.
This chapter sketches the issues involved in identifying and evaluating existing language resources and in planning, creating, validating, and distributing new language resources, especially those used for developing human language technologies, with specific examples taken from the collection and annotation of conversational telephone speech and the adjudication of corpora created to support information retrieval.

Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Cieri, C. (2007). Linguistic Resources, Development, and Evaluation of Text and Speech Systems. In: Dybkjær, L., Hemsen, H., Minker, W. (eds) Evaluation of Text and Speech Systems. Text, Speech and Language Technology, vol 37. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5817-2_8
