Asian language processing: current state-of-the-art

Language Resources and Evaluation

Notes

  1. These four language families, plus the Niger-Congo family in Africa, each include more than 400 languages. Other large language families in Asia include Austro-Asiatic (169), Tai-Kadai (76), Dravidian (73), and Altaic (66).

  2. Bunsetsu, often translated as base phrase, is the basic unit of Japanese text proposed by Hashimoto (1984). A bunsetsu is a written and prosodic unit which is typically composed of a root and particles and can be identified by phonological principles. The concept of bunsetsu is also adopted in Korean linguistics.

  3. Two of these languages, Filipino and Urdu, do not appear in the current issue and will be represented in the subsequent issue.

  4. The panel, entitled Challenges in NLP: Some New Perspectives from the East, covers three different issues: Jun’ichi Tsujii’s Diversity vs. Universality: Are Asian language special, Benjamin T’sou’s Some Salient Linguistic Differences in Asia and Implications for NLP, and Pushpak Bhattacharyya’s Can the availability of detailed linguistic information (for example, morphology) help in ameliorating the scarcity of large annotated corpora.

  5. Research on this issue was carried out in a project funded by NEDO and directed by Tokunaga Takenobu, as well as within the WG meetings of ISO TC37 SC4.

  6. http://www2.parc.com/isl/groups/nltt/pargram/

  7. http://www.nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html

  8. http://www.cs.ust.hk/dekai/ssst

  9. There is no clear consensus on the language family of Japanese and Korean. We follow a position popular among theoretical linguists to classify both of them in the Altaic family. Note that Ethnologue lists Japanese as a separate family, while Korean is listed as an isolate and non-affiliated language.

  10. http://www.research.microsoft.com/nlp/Projects/MindNet.aspx

References

  • Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison Wesley.

  • Bhattacharyya, P. (2006). Can the availability of detailed linguistic information, say morphology, help in ameliorating the scarcity of large annotated corpora? In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some new perspectives from the east.

  • Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263–311.

  • Butt, M., & King, T. (to appear). Urdu in a parallel grammar development environment. To appear in New Frontiers in Asian Language Resources. A special issue of Language Resources and Evaluation.

  • Copestake, A., Flickinger, D., Sag, I. A., & Pollard, C. (2005). Minimal recursion semantics: An introduction. Journal of Research on Language and Computation, 3(2–3), 281–332.

  • Fellbaum, C. (1998). WordNet: An electronic lexical database. The MIT Press.

  • Francopoulo, G., George, M., Calzolari, N., Monachini, M., Bel, N., Pet, C., & Soria, M. (2006). Lexical markup framework (LMF). In Proceedings of LREC 2006: 5th International Conference on Language Resources and Evaluation (pp. 233–236).

  • Gordon, R. G. J. (Ed.) (2005). Ethnologue: Languages of the World (15th ed.). SIL International.

  • Hashimoto, S. (1984). Kokugohô Yôsetu (Elements of Japanese Grammar), Vol. II of The Complete Works of Dr. Shinkichi Hashimoto. Iwanami Syoten.

  • Huang, C., Calzolari, N., Gangemi, A., Lenci, A., Oltramari, A., & Prévot, L. (Eds.) (to appear). Ontologies and the Lexicon. Cambridge studies in natural language processing. Cambridge: Cambridge University Press.

  • Huang, C., Tokunaga, T., Calzolari, N., Prévot, L., Chung, S., Jiang, T., et al. (2007, January). Extending an international lexical framework for Asian languages, the case of Mandarin, Taiwanese, Cantonese, Bangla and Malay. Proceedings of the First International Workshop on Intercultural Collaboration (IWIC) (pp. 24–26). Kyoto: Kyoto University.

  • Joshi, A. (2006). Panel: Challenges in NLP: Some New Perspectives from the East. In COLING/ACL 2006. Sydney.

  • Karttunen, L., & McCarthy, J. (1983). A special issue on Two-level morphology introducing the KIMMO system. Texas Linguistic Forum 22.

  • Koskenniemi, K. (1983). Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.

  • Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4), 377–439.

  • Kurohashi, S., & Nagao, M. (1994). A syntactic analysis method of long Japanese sentences based on the detection of conjunctive structures. Computational Linguistics, 20(4), 507–534.

  • Nagata, M. (1996). Context-based spelling correction for Japanese OCR. In Proceedings of the 16th International Conference on Computational Linguistics (pp. 806–811).

  • Nagata, M. (1998). Japanese OCR error correction using character shape similarity and statistical language model. In Proceedings of 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (pp. 922–928).

  • Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. CSLI.

  • Tokunaga, T., Sornlertlamvanich, V., Charoenporn, T., Calzolari, N., Monachini, M., Soria, C., Huang, C., Xia, Y., Yu, H., Prevot, L., & Shirai, K. (2006). Infrastructure for standardization of Asian language resources. In COLING/ACL 2006 (pp. 827–834).

  • T’sou, B. (2004). Chinese language processing at the dawn of the 21st century. In C.-R. Huang & W. Lenders (Eds.), Computational linguistics and beyond (pp. 189–206). Language and Linguistics.

  • T’sou, B. (2006). Some salient linguistic differences in Asia and implications for NLP. In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some new perspectives from the East.

  • Tsujii, J. (2006). Diversity vs. universality. In COLING/ACL 2006. Sydney. Panel Presentation at the Panel: Challenges in NLP: Some New Perspectives from the East.

Acknowledgements

We would like to thank all the authors who submitted 74 papers on a wide range of research topics on Asian languages. We had the privilege of going through all these papers and wished that the full range of resources and topics could have been presented. We would also like to thank all the reviewers, whose prompt action and helpful comments helped us through all the submitted papers. We would like to thank AFNLP for its support of the initiative to promote Asian language processing. Various colleagues helped us process all the papers, including Dr. Sara Goggi at CNR-Italy, Dain Kaplan at Tokyo Institute of Technology, and Liwu Chen at Academia Sinica. Finally, we would like to thank four people at LRE and Springer who made this special issue possible. Without the generous support of the chief editors Nancy Ide and Nicoletta Calzolari, this volume would not have been possible. In addition, without the diligent work of both Estella La Jappon and Jenna Cataluna at Springer, we would never have been able to negotiate all the steps of publication. For this introductory chapter, we would like to thank Kathleen Ahrens, Nicoletta Calzolari, and Nancy Ide for their detailed comments. We would also like to thank Aravind Joshi, Pushpak Bhattacharyya, Benjamin T’sou, and Jun’ichi Tsujii for making their panel materials accessible to us. Any remaining errors are, of course, ours.

Author information

Corresponding author

Correspondence to Takenobu Tokunaga.

About this article

Cite this article

Huang, CR., Tokunaga, T. & Lee, S.Y.M. Asian language processing: current state-of-the-art. Lang Resources & Evaluation 40, 203–218 (2006). https://doi.org/10.1007/s10579-007-9041-9

  • DOI: https://doi.org/10.1007/s10579-007-9041-9
