XML-Based Document Retrieval in Chinese Diseases Question Answering System

  • Haodong Zhang
  • Lijun Zhu
  • Shuo Xu
  • Weifeng Li
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 274)


A Chinese Diseases Question Answering System (Hestia QA) is being developed by ISTIC. As part of Hestia QA, an XML-based document retrieval and similarity calculation model is established here. Chinese texts that describe diseases are indexed and wrapped in XML tags. A query is compared with the related tags in each XML document, and the similarity is computed with a variant of the cosine similarity algorithm. The similarity between two terms is obtained with a Chinese terms semantic similarity calculation algorithm. The results indicate that the model works well. As next steps, the Chinese disease XML datasets will be analyzed at different granularity levels or dimensions, and a corpus of diseases in Chinese will be established once the automatic XML annotation software is completed.
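The tag-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it uses plain term-frequency cosine similarity rather than the paper's variant, exact term matching rather than the semantic term similarity algorithm, and whitespace tokenization (real Chinese text would first need word segmentation). The tag names `disease`, `name`, `symptom`, and `treatment`, the sample record, and the helper names `cosine` and `tag_similarity` are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical disease record; tag names and content are illustrative only.
DOC = """
<disease>
  <name>influenza</name>
  <symptom>fever cough headache</symptom>
  <treatment>rest fluids antiviral</treatment>
</disease>
"""


def cosine(a: Counter, b: Counter) -> float:
    """Plain cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def tag_similarity(xml_text: str, tag: str, query: str) -> float:
    """Compare a query against the text of one tag in an XML document."""
    root = ET.fromstring(xml_text)
    # Gather the text of every element with the requested tag name.
    text = " ".join((el.text or "") for el in root.iter(tag))
    # Whitespace tokenization is an assumption made for this sketch.
    return cosine(Counter(text.split()), Counter(query.split()))
```

Calling `tag_similarity(DOC, "symptom", "fever cough")` scores the query against the symptom text only, so a query about symptoms is not diluted by terms from unrelated tags such as `treatment`.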


Keywords: XML; Chinese Diseases QA; Chinese Terms Similarity; Cosine Similarity





Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Haodong Zhang (1, 2), corresponding author
  • Lijun Zhu (2)
  • Shuo Xu (2)
  • Weifeng Li (2)
  1. Network Center, Science and Technology Daily, Beijing, China
  2. Institute of Scientific and Technical Information, Beijing, China
