Overview
- Explains the background and basic principles of comparable and parallel corpora for Natural Language Processing
- Provides an in-depth explanation of how to build comparable corpora for Machine Translation engines
- Discusses other applications of comparable corpora, including text annotation
Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)
Table of contents (8 chapters)
About the authors
Pierre Zweigenbaum, Ph.D., FACMI, FIAHSI, is a Senior Researcher at the Interdisciplinary Laboratory for Digital Sciences (LISN, Orsay, France), a laboratory of the French National Center for Scientific Research (CNRS) and Université Paris-Saclay, where he has led the ILES Natural Language Processing group. Before CNRS he was a researcher at Paris Public Hospitals in an Inserm team. He was also a part-time professor at the National Institute for Oriental Languages and Civilizations. His research focus is Natural Language Processing, with medicine as a main application domain. He has also designed methods to acquire linguistic knowledge automatically from corpora and thesauri, to help extend monolingual and bilingual lexicons and terminologies, using parallel and comparable corpora.
Reinhard Rapp, Ph.D., is Professor of Applied Translation Studies at Magdeburg-Stendal University of Applied Sciences and is also affiliated with the University of Mainz. He has conducted EU-funded research projects at the University of Geneva, the University of Tarragona, the University of Leeds, Aix-Marseille University, the University of Mainz, and the Athena Research Center in Athens. His main research interests are in computational linguistics, translation studies and cognitive science. His publications have dealt with unsupervised language learning from text corpora, word sense disambiguation, text mining, thesaurus construction, bilingual dictionary induction from parallel and comparable corpora, and statistical and neural machine translation.
Bibliographic Information
Book Title: Building and Using Comparable Corpora for Multilingual Natural Language Processing
Authors: Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
Series Title: Synthesis Lectures on Human Language Technologies
DOI: https://doi.org/10.1007/978-3-031-31384-4
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 12
Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Hardcover ISBN: 978-3-031-31383-7 (Published: 24 August 2023)
Softcover ISBN: 978-3-031-31386-8 (Published: 24 August 2024)
eBook ISBN: 978-3-031-31384-4 (Published: 23 August 2023)
Series ISSN: 1947-4040
Series E-ISSN: 1947-4059
Edition Number: 1
Number of Pages: VIII, 133
Number of Illustrations: 17 b/w illustrations, 14 illustrations in colour
Topics: Natural Language Processing (NLP), Artificial Intelligence, Computer Applications, Computer Science, general, Computational Linguistics, Machine Learning