© 2013

Building and Using Comparable Corpora

  • Serge Sharoff
  • Reinhard Rapp
  • Pierre Zweigenbaum
  • Pascale Fung

Table of contents

  1. Front Matter
    Pages i-xii
  2. Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
    Pages 1-17
  3. Compiling and Measuring Comparable Corpora

    1. Front Matter
      Pages 19-19
    2. Antton Gurrutxaga, Igor Leturia, Xabier Saralegi, Iñaki San Vicente
      Pages 51-75
    3. Monica Lestari Paramita, David Guthrie, Evangelos Kanoulas, Rob Gaizauskas, Paul Clough, Mark Sanderson
      Pages 93-112
    4. Thomas Eckart, Uwe Quasthoff
      Pages 151-165
    5. Bin Lu, Ka Po Chow, Benjamin K. Tsou
      Pages 167-187
  4. Using Comparable Corpora

    1. Front Matter
      Pages 189-189
    2. Sanjika Hewavitharana, Stephan Vogel
      Pages 191-204
    3. Dragos Stefan Munteanu, Daniel Marcu
      Pages 205-222
    4. Louise Deléger, Bruno Cartoni, Pierre Zweigenbaum
      Pages 223-241
    5. Emmanuel Morin, Béatrice Daille, Emmanuel Prochasson
      Pages 265-284
    6. Silvia Bernardini, Adriano Ferraresi
      Pages 303-319

About this book

The 1990s saw a paradigm shift towards corpus-driven methods in NLP. In multilingual NLP (for example, machine translation and terminology mining), this meant relying on parallel corpora. However, parallel resources are relatively scarce: native speakers of any given language produce far more texts every day than are translated. This has naturally driven researchers towards comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has so far lacked a single authoritative source suitable for researchers and students coming to the field.

This volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Keywords: 68T50, 91F20, Corpus linguistics, Machine Translation, Natural Language Processing, Terminology

Editors and affiliations

  • Serge Sharoff, Centre for Translation Studies, University of Leeds, Leeds, United Kingdom
  • Reinhard Rapp, University of Mainz, Mainz, Germany
  • Pierre Zweigenbaum, Université de Paris-Sud, LIMSI-CNRS, Orsay, France
  • Pascale Fung, Electronic & Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, People's Republic of China

Reviews

“I would like to recommend ‘Building and Using Comparable …’ to those who are working with or are interested in multilingual and monolingual comparable corpora. … it is easy to say that the notion of comparable corpora was not only visionary, long-sighted, and productive. It is also easy to say that this volume remains the optimal starting point for any research or for any applications in Language Technology leveraging on comparable corpora.” (Marina Santini, February 2017)