Language Resources and Evaluation, Volume 40, Issue 3–4, pp 203–218

Asian language processing: current state-of-the-art

  • Chu-Ren Huang
  • Takenobu Tokunaga
  • Sophia Yat Mei Lee

Background: the challenge of Asian language processing

Asian language processing presents formidable challenges to achieving multilingualism and multiculturalism in our society. One of the first and most obvious challenges is the multitude and diversity of languages: more than 2,000 languages are listed as languages in Asia by Ethnologue (Gordon 2005), representing four major language families: Austronesian, Trans-New Guinea, Indo-European, and Sino-Tibetan. The challenge is made more formidable by the fact that, as a whole, Asian languages range from the language with the most speakers in the world (Mandarin Chinese, close to 900 million native speakers) to the more than 70 nearly extinct languages (e.g. Pazeh in Taiwan, one speaker). As a result, there are vast differences in the level of language processing capability and the number of sharable resources available for individual languages. Major Asian languages such as Mandarin Chinese, Hindi, Japanese, Korean, and Thai have benefited...





We would like to thank all the authors who submitted 74 papers on a wide range of research topics on Asian languages. We had the privilege of going through all these papers and wished that the full range of resources and topics could have been presented. We would also like to thank all the reviewers, whose prompt action and helpful comments helped us through all the submitted papers. We would like to thank AFNLP for its support of the initiative to promote Asian language processing. Various colleagues helped us process all the papers, including Dr. Sara Goggi at CNR-Italy, Dain Kaplan at Tokyo Institute of Technology, and Liwu Chen at Academia Sinica. Finally, we would like to thank four people at LRE and Springer who made this special issue possible. Without the generous support of the chief editors Nancy Ide and Nicoletta Calzolari, this volume would not have been possible. In addition, without the diligent work of both Estella La Jappon and Jenna Cataluna at Springer, we would never have been able to negotiate all the steps of publication. For this introductory chapter, we would like to thank Kathleen Ahrens, Nicoletta Calzolari, and Nancy Ide for their detailed comments. We would also like to thank Aravind Joshi, Pushpak Bhattacharyya, Benjamin T’sou, and Jun’ichi Tsujii for making their panel materials accessible to us. Any remaining errors are, of course, ours.


  1. Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison Wesley.
  2. Bhattacharyya, P. (2006). Can the availability of detailed linguistic information, say morphology, help in ameliorating the scarcity of large annotated corpora? In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some new perspectives from the east.
  3. Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.
  4. Butt, M., & King, T. (to appear). Urdu in a parallel grammar development environment. To appear in New Frontiers in Asian Language Resources. A special issue of Language Resources and Evaluation.
  5. Copestake, A., Flickinger, D., Sag, I. A., & Pollard, C. (2005). Minimal recursion semantics: An introduction. Journal of Research on Language and Computation, 3(2–3), 281–332.
  6. Fellbaum, C. (1998). WordNet: An electronic lexical database. The MIT Press.
  7. Francopoulo, G., George, M., Calzolari, N., Monachini, M., Bel, N., Pet, C., & Soria, M. (2006). Lexical markup framework (LMF). In Proceedings of LREC 2006: 5th International Conference on Language Resources and Evaluation (pp. 233–236).
  8. Gordon, R. G. J. (Ed.) (2005). Ethnologue: Languages of the World (15th ed.). SIL International.
  9. Hashimoto, S. (1984). Kokugohô Yôsetu (Elements of Japanese Grammar), Vol. II of The Complete Works of Dr. Shinkichi Hashimoto. Iwanami Syoten.
  10. Huang, C., Calzolari, N., Gangemi, A., Lenci, A., Oltramari, A., & Prévot, L. (Eds.) (to appear). Ontologies and the Lexicon. Cambridge studies in natural language processing. Cambridge: Cambridge University Press.
  11. Huang, C., Tokunaga, T., Calzolari, N., Prévot, L., Chung, S., Jiang, T., et al. (2007, January). Extending an international lexical framework for Asian languages, the case of Mandarin, Taiwanese, Cantonese, Bangla and Malay. Proceedings of the First International Workshop on Intercultural Collaboration (IWIC) (pp. 24–26). Kyoto: Kyoto University.
  12. Joshi, A. (2006). Panel: Challenges in NLP: Some New Perspectives from the East. In COLING/ACL 2006. Sydney.
  13. Karttunen, L., & McCarthy, J. (1983). A special issue on Two-level morphology introducing the KIMMO system. Texas Linguistic Forum 22.
  14. Koskenniemi, K. (1983). Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.
  15. Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4), 377–439.
  16. Kurohashi, S., & Nagao, M. (1994). A syntactic analysis method of long Japanese sentences based on the detection of conjunctive structures. Computational Linguistics, 20(4), 507–534.
  17. Nagata, M. (1996). Context-based spelling correction for Japanese OCR. In Proceedings of the 16th International Conference on Computational Linguistics (pp. 806–811).
  18. Nagata, M. (1998). Japanese OCR error correction using character shape similarity and statistical language model. In Proceedings of 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (pp. 922–928).
  19. Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. CSLI.
  20. Tokunaga, T., Sornlertlamvanich, V., Charoenporn, T., Calzolari, N., Monachini, M., Sonia, C., Huang, C., Xia, Y., Yu, H., Prevot, L., & Shirai, K. (2006). Infrastructure for standardization of Asian language resources. In COLING/ACL 2006 (pp. 827–834).
  21. T’sou, B. (2004). Chinese language processing at the dawn of the 21st century. In C.-R. Huang & W. Lenders (Eds.), Computational linguistics and beyond (pp. 189–206). Language and Linguistics.
  22. T’sou, B. (2006). Some salient linguistic differences in Asia and implications for NLP. In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some new perspectives from the East.
  23. Tsujii, J. (2006). Diversity vs. universality. In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some New Perspectives from the East.

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Chu-Ren Huang (1)
  • Takenobu Tokunaga (2)
  • Sophia Yat Mei Lee (1)

  1. Institute of Linguistics, Academia Sinica, Taipei, Taiwan
  2. Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Meguro, Tokyo, Japan