Name: Building and Using Comparable Corpora
ISBN: 978-3-642-20128-8

Overview

Editors:

Serge Sharoff,
Reinhard Rapp,
Pierre Zweigenbaum,
Pascale Fung

Serge Sharoff
1. Centre for Translation Studies, University of Leeds, Leeds, United Kingdom
Reinhard Rapp
1. University of Mainz, Mainz, Germany
Pierre Zweigenbaum
1. Université de Paris-Sud LIMSI-CNRS, Orsay, France
Pascale Fung
1. Electronic & Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, People's Republic of China

A reference source for researchers and students coming to the field of comparable corpora
Identifies the state of the art in the field as well as future trends
Written by experts in the field
Includes supplementary material: sn.pub/extras

Table of contents (17 chapters)

Front Matter

Pages i-xii
Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora
- Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
Pages 1-17
Compiling and Measuring Comparable Corpora
1. Front Matter
  
  Pages 19-19
  
2. Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web
  
  Simon Shi, Pascale Fung
  
  Pages 21-49
3. Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making
  
  Antton Gurrutxaga, Igor Leturia, Xabier Saralegi, Iñaki San Vicente
  
  Pages 51-75
4. Statistical Comparability: Methodological Caveats
  
  Reinhard Köhler
  
  Pages 77-91
5. Methods for Collection and Evaluation of Comparable Documents
  
  Monica Lestari Paramita, David Guthrie, Evangelos Kanoulas, Rob Gaizauskas, Paul Clough, Mark Sanderson
  
  Pages 93-112
6. Measuring the Distance Between Comparable Corpora Between Languages
  
  Serge Sharoff
  
  Pages 113-130
7. Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality
  
  Bo Li, Eric Gaussier
  
  Pages 131-149
8. Statistical Corpus and Language Comparison on Comparable Corpora
  
  Thomas Eckart, Uwe Quasthoff
  
  Pages 151-165
9. Comparable Multilingual Patents as Large-Scale Parallel Corpora
  
  Bin Lu, Ka Po Chow, Benjamin K. Tsou
  
  Pages 167-187
Using Comparable Corpora
1. Front Matter
  
  Pages 189-189
  
2. Extracting Parallel Phrases from Comparable Data
  
  Sanjika Hewavitharana, Stephan Vogel
  
  Pages 191-204
3. Exploiting Comparable Corpora
  
  Dragos Stefan Munteanu, Daniel Marcu
  
  Pages 205-222
4. Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora
  
  Louise Deléger, Bruno Cartoni, Pierre Zweigenbaum
  
  Pages 223-241
5. Information Network Construction and Alignment from Automatically Acquired Comparable Corpora
  
  Heng Ji, Adam Lee, Wen-Pin Lin
  
  Pages 243-263
6. Bilingual Terminology Mining from Language for Special Purposes Comparable Corpora
  
  Emmanuel Morin, Béatrice Daille, Emmanuel Prochasson
  
  Pages 265-284
7. The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework
  
  Kyo Kageura, Takeshi Abekawa
  
  Pages 285-301
8. Old Needs, New Solutions: Comparable Corpora for Language Professionals
  
  Silvia Bernardini, Adriano Ferraresi
  
  Pages 303-319
9. Exploiting the Incomparability of Comparable Corpora for Contrastive Linguistics and Translation Studies
  
  Stella Neumann, Silvia Hansen-Schirra
  
  Pages 321-335

About this book

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field.

The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.

Reviews

“I would like to recommend ‘Building and Using Comparable … to those who are working with or are interested in multilingual and monolingual comparable corpora. … it is easy to say that the notion of comparable corpora was not only visionary, long-sighted, and productive. It is also easy to say that this volume remains the optimal starting point for any research or for any applications in Language Technology leveraging on comparable corpora.” (Marina Santini, forum.santini.se, February, 2017)

Bibliographic Information

Book Title: Building and Using Comparable Corpora
Editors: Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum, Pascale Fung
DOI: https://doi.org/10.1007/978-3-642-20128-8
Publisher: Springer Berlin, Heidelberg
eBook Packages: Computer Science, Computer Science (R0)
Copyright Information: Springer-Verlag Berlin Heidelberg 2013
Hardcover ISBN: 978-3-642-20127-1 (published 07 January 2014)
Softcover ISBN: 978-3-662-52006-2 (published 23 August 2016)
eBook ISBN: 978-3-642-20128-8 (published 13 December 2013)
Edition Number: 1
Number of Pages: XII, 335
Number of Illustrations: 56 b/w illustrations, 14 illustrations in colour
Topics: Natural Language Processing (NLP), Computational Linguistics, Information Systems Applications (incl. Internet)
