Chinese Information Processing and Its Prospects

Li, Sheng; Zhao, Tie-Jun

doi:10.1007/s11390-006-0838-6

Applications
Published: September 2006

Journal of Computer Science and Technology, Volume 21, pages 838–846 (2006)

Abstract

This paper surveys the main progress and achievements in Chinese information processing. It focuses on six areas: Chinese syntactic analysis, Chinese semantic analysis, machine translation, information retrieval, information extraction, and speech recognition and synthesis. For each area, it also discusses the important techniques and the key problems likely to arise in the near future.

Author information

Authors and Affiliations

MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Harbin, 150001, P.R. China
Sheng Li & Tie-Jun Zhao

Corresponding author

Correspondence to Sheng Li.

Additional information

This survey was supported by the National Natural Science Foundation of China (Grant Nos. 60375019, 60373101 and 60575041).

Sheng Li is a professor and Ph.D. supervisor at Harbin Institute of Technology. He is a standing director of the Chinese Information Processing Society, an appraiser for the National Science Foundation, and director of the MOE-MS Key Laboratory of NLP & Speech at HIT. His research interests include machine translation, information retrieval, and natural language processing. In recent years, he has completed more than 10 projects funded by the Natural Science Foundation of China or the 863 Hi-Tech Project. He has won 4 Second Prizes and 3 Third Prizes of the Ministry Science and Technology Progress Award. He has published more than 70 academic papers in journals and conferences at home and abroad.

Tie-Jun Zhao is a professor and Ph.D. supervisor at Harbin Institute of Technology and vice director of the MOE-MS Key Laboratory of NLP & Speech at HIT. He is a member of the NLP subject committee of the Chinese Information Society, a member of the editorial board of the Journal of Chinese Information Processing, a member of the committee of the China Language Data Consortium, a member of the Harbin Expert Group on Information Security, and a senior member of the China Computer Federation. His research interests include natural language processing, content-based web information processing, and applied artificial intelligence. He has won 3 prizes of the Ministry Science & Technology Award. He has published over 60 academic papers and 2 books.

About this article

Cite this article

Li, S., Zhao, TJ. Chinese Information Processing and Its Prospects. J Comput Sci Technol 21, 838–846 (2006). https://doi.org/10.1007/s11390-006-0838-6

Received: 10 April 2006
Revised: 26 July 2006
Issue Date: September 2006
DOI: https://doi.org/10.1007/s11390-006-0838-6
