Skip to main content

Parallel Corpus

  • Reference work entry
Encyclopedia of Machine Learning
  • 141 Accesses

A parallel corpus (pl. corpora) is a document collection composed of two or more disjoint subsets, each written in a different language, such that documents in each subset are translations of documents in each other subset. Moreover, it is required that the translation relation is known, i.e., that given a document in one of the subset (i.e., languages), it is known what documents in the other subset are its translations. The statistical analysis of parallel corpora is at the heart of most methods for cross-language text mining.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer Science+Business Media, LLC

About this entry

Cite this entry

(2011). Parallel Corpus. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_627

Download citation

Publish with us

Policies and ethics