Dutch Parallel Corpus: A Balanced Parallel Corpus for Dutch-English and Dutch-French

  • Hans Paulussen
  • Lieve Macken
  • Willy Vandeweghe
  • Piet Desmet
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Parallel corpora are a valuable resource for researchers across a wide range of disciplines, including machine translation, computer-assisted translation, terminology extraction, computer-assisted language learning, contrastive linguistics and translation studies. Since developing a high-quality parallel corpus is time-consuming and costly, the Dutch Parallel Corpus (DPC) project aimed to create a multifunctional resource that satisfies the needs of this diverse group of disciplines.

Copyright information

© The Author(s) 2013

Open Access. This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Hans Paulussen (1)
  • Lieve Macken (2)
  • Willy Vandeweghe (2)
  • Piet Desmet (1)

  1. ITEC-IBBT KULeuven Kulak, Kortrijk, Belgium
  2. Language and Translation Technology Team (LT3), University College Ghent and Ghent University, Gent, Belgium
