Divergences in the Usage of Discourse Markers in English and Mandarin Chinese

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

Statistical machine translation (SMT) has, in recent years, improved the accuracy of automated translations. However, SMT systems often fail to deliver human-quality translations, especially for complex sentences and distant language pairs. Current SMT systems typically translate single sentences and treat their clauses in isolation, which leads to a loss of contextual information. Discourse markers (DMs) are vital contextual links between discourse segments, and this paper examines the divergences in their usage across English and Mandarin Chinese. We highlight important structural differences in composite sentences extracted from a number of parallel corpora and show examples of how popular SMT systems handle these cases. We observed numerous significant divergences, such as contextual omissions, that can lead to incoherent automatic translations. Our objective is to use these findings to guide a proposed framework that addresses divergences in DM usage in order to improve SMT output quality.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Steele, D., Specia, L. (2014). Divergences in the Usage of Discourse Markers in English and Mandarin Chinese. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_24

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science (R0)