A new similarity computing method based on concept similarity in Chinese text processing

  • Jing Peng
  • DongQing Yang
  • ShiWei Tang
  • TengJiao Wang
  • Jun Gao


This paper proposes a new text similarity computing method based on concept similarity for Chinese text processing. The method first converts a text into a word vector space model and then splits each word into a set of concepts. It obtains the similarity between words by computing the inner products between their concepts, and finally computes the similarity of texts from the similarity of words. The contributions of the paper are: 1) a new formula for computing the similarity between words; 2) a new text similarity computing method based on word similarity; 3) a successful application of the method to similarity computing of Web news; and 4) validation of the method through extensive experiments.
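
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the toy `CONCEPTS` lexicon and its weights are hypothetical, word similarity is taken to be the normalized inner product (cosine) of concept vectors, and texts are combined with a soft-cosine-style measure over already segmented word lists. It is not the authors' actual method.

```python
import math
from collections import Counter

# Hypothetical concept lexicon: each word maps to weighted concepts.
# A real system would derive this from a resource such as HowNet.
CONCEPTS = {
    "car": {"vehicle": 1.0, "road": 0.5},
    "truck": {"vehicle": 1.0, "cargo": 0.5},
    "apple": {"fruit": 1.0, "food": 0.5},
}


def inner(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v.get(k, 0.0) for k, w in u.items())


def word_similarity(a, b):
    """Normalized inner product of the concept vectors of two words (assumed form)."""
    if a == b:
        return 1.0
    u, v = CONCEPTS.get(a), CONCEPTS.get(b)
    if not u or not v:
        return 0.0  # a word missing from the lexicon matches only itself
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))


def text_similarity(words_x, words_y):
    """Soft-cosine-style text similarity built on word similarity (assumed form)."""
    x, y = Counter(words_x), Counter(words_y)  # word vector space model

    def bilinear(p, q):
        # Sum of weight products, scaled by how similar each word pair is.
        return sum(
            wp * wq * word_similarity(a, b)
            for a, wp in p.items()
            for b, wq in q.items()
        )

    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom else 0.0
```

With this toy lexicon, `text_similarity(["car"], ["truck"])` is positive because the two words share the "vehicle" concept, while `text_similarity(["car"], ["apple"])` is zero because they share no concept.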


concept similarity, similarity computing, vector space, inner product space



Copyright information

© Science in China Press and Springer-Verlag GmbH 2008

Authors and Affiliations

  • Jing Peng (1, 2)
  • DongQing Yang (1)
  • ShiWei Tang (1)
  • TengJiao Wang (1)
  • Jun Gao (1)

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  2. Department of Science and Technology, Chengdu Municipal Public Security Bureau, Chengdu, China
