Chinese Information Processing and Its Prospects

  • Applications
  • Journal of Computer Science and Technology

Abstract

This paper presents the main progress and achievements in Chinese information processing. It focuses on six areas: Chinese syntactic analysis, Chinese semantic analysis, machine translation, information retrieval, information extraction, and speech recognition and synthesis. The important techniques of each branch, and the key problems each is likely to face in the near future, are also discussed.


Author information

Corresponding author

Correspondence to Sheng Li.

Additional information

Survey: Supported by the National Natural Science Foundation of China (Grant Nos. 60375019, 60373101 and 60575041).

Sheng Li is a professor and Ph.D. supervisor at Harbin Institute of Technology. He is a standing director of the Chinese Information Processing Society, an appraiser for the National Science Foundation, and director of the MOE-MS Key Laboratory of NLP & Speech at HIT. His research interests include machine translation, information retrieval, and natural language processing. In recent years, he has completed more than 10 projects funded by the Natural Science Foundation of China or the 863 Hi-Tech Project. He has won 4 Second Prizes and 3 Third Prizes of the Ministry Science and Technology Progress Award. He has published more than 70 academic papers in journals and conferences at home and abroad.

Tie-Jun Zhao is a professor and Ph.D. supervisor at Harbin Institute of Technology and vice director of the MOE-MS Key Laboratory of NLP & Speech at HIT. He is a member of the NLP subject committee of the Chinese Information Society, a member of the editorial board of the Journal of Chinese Information Processing, a member of the committee of the China Language Data Consortium, a member of the Harbin Expert Group on Information Security, and a senior member of the China Computer Federation. His research interests include natural language processing, content-based web information processing, and applied artificial intelligence. He has won 3 prizes of the Ministry Science & Technology Award. He has published over 60 academic papers and 2 books.

About this article

Cite this article

Li, S., Zhao, TJ. Chinese Information Processing and Its Prospects. J Comput Sci Technol 21, 838–846 (2006). https://doi.org/10.1007/s11390-006-0838-6
