‘Garbage Let’s Take Away’: Producing Understandable and Translatable Government Documents: A Case Study from Japan


Abstract

Government departments increasingly communicate with citizens digitally via web sites, and in many societies the linguistic diversity of these citizens is also growing. In Japan, a largely monolingual society, municipal governments now routinely address the need to provide practical and legal information to residents with limited Japanese by machine-translating their public service web sites into selected languages. Cost constraints often mean that the translation is left unedited and, as a result, may be unclear, misleading or even incomprehensible. Machine translation from Japanese is particularly challenging because of the language's structural uniqueness, but the state of the art in the field is such that poor output is a universal problem. The solution we propose draws on recent advances in controlled authoring, document structuring and machine translation evaluation. It is realised as a prototype tool that enables non-professional writers to create documents in which both the individual sentences and the overall flow are clear. The tool is designed to enhance machine-translatability into English without compromising the readability of the Japanese original. Its originality lies in providing an interactive sentence checker that is sensitive to context, namely the individual functional elements of a document template specialised for the public administration domain. Where natural Japanese sentences give poor translation results, we pre-process them internally into a form that yields acceptable machine translation output. Evaluation of the tool will target three concerns: its usability by non-professional authors; the acceptability of the Japanese document; and the comprehensibility of the English translation. We suggest that such an authoring framework could facilitate government communication with citizens in many societies beyond Japan.

Acknowledgments

This work was supported by the Research Grant Program of KDDI Foundation, Japan. The MT system J-SERVER Professional TransGateway V3 was offered by Kodensha Co. Paris’s stay in Japan to work with Miyata, Kageura and Hartley was funded by the Japanese Society for the Promotion of Science and CSIRO.

Author information

Corresponding author

Correspondence to Rei Miyata.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Miyata, R., Hartley, A., Kageura, K., Paris, C. (2015). ‘Garbage Let’s Take Away’: Producing Understandable and Translatable Government Documents: A Case Study from Japan. In: Nepal, S., Paris, C., Georgakopoulos, D. (eds) Social Media for Government Services. Springer, Cham. https://doi.org/10.1007/978-3-319-27237-5_16

  • DOI: https://doi.org/10.1007/978-3-319-27237-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27235-1

  • Online ISBN: 978-3-319-27237-5

  • eBook Packages: Computer Science, Computer Science (R0)
