‘Garbage Let’s Take Away’: Producing Understandable and Translatable Government Documents: A Case Study from Japan



Government departments increasingly communicate with citizens digitally via websites, and in many societies the linguistic diversity of these citizens is also growing. In Japan, a largely monolingual society, municipal governments now routinely address the need to provide practical and legal information to residents with limited Japanese by machine-translating their public service websites into selected languages. Cost constraints often mean that the translation is left unedited and, as a result, may be unclear, misleading or even incomprehensible. While machine translation from Japanese is particularly challenging because of the structural uniqueness of the language, the general state of the art in the field is such that poor output is a universal problem. The solution we propose draws on recent advances in controlled authoring, document structuring and machine translation evaluation. It is realised as a prototype tool that enables non-professional writers to create documents in which both the individual sentences and the overall flow are clear. The tool is designed to enhance machine-translatability into English without compromising the readability of the Japanese original. The originality of the tool lies in providing an interactive sentence checker that is sensitive to the context provided by the individual functional elements of a document template specialised for the public administration domain. Where natural Japanese sentences yield poor translations, we pre-process them internally into a form that yields acceptable machine translation output. Evaluation of the tool will target three concerns: its usability by non-professional authors; the acceptability of the Japanese document; and the comprehensibility of the English translation. We suggest that such an authoring framework could facilitate government communication with citizens in many societies beyond Japan.


Keywords: Government communication; Controlled language; Document structure; Authoring tool; Machine translation; DITA



This work was supported by the Research Grant Program of the KDDI Foundation, Japan. The MT system J-SERVER Professional TransGateway V3 was provided by Kodensha Co. Paris’s stay in Japan to work with Miyata, Kageura and Hartley was funded by the Japan Society for the Promotion of Science and CSIRO.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Graduate School of Education, The University of Tokyo, Tokyo, Japan
  2. College of Intercultural Communication, Rikkyo University, Tokyo, Japan
  3. CSIRO, Data61, Sydney, Australia
  4. University of Leeds, Leeds, UK
