XML-Based Document Retrieval in Chinese Diseases Question Answering System
A Chinese Diseases Question Answering System (Hestia QA) is being developed by ISTIC. As part of Hestia QA, an XML-based document retrieval and similarity calculation model is established here. Texts that describe diseases in Chinese are indexed and wrapped in XML tags. The query is compared with the related tags in each XML document, and the similarity is calculated with a modified cosine similarity algorithm. A Chinese term semantic similarity algorithm is used to obtain the similarity between two terms in the system. The results show that the model works well. The Chinese disease XML datasets will be analyzed at different granularity levels or dimensions. As a next step, a corpus of diseases in Chinese will be established once the automatic XML annotation software is completed.
Keywords: XML, Chinese Diseases QA, Chinese Terms Similarity, Cosine Similarity
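The tag-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the modified cosine formula or the XML schema, so the sketch uses plain cosine similarity over term counts, and the tag names (`disease`, `symptom`, `treatment`), the sample record, and the whitespace tokenizer are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical record: the tag names and content are illustrative only,
# not the schema used by Hestia QA.
DISEASE_XML = """
<disease>
  <name>流行性感冒</name>
  <symptom>发热 咳嗽 头痛</symptom>
  <treatment>休息 多饮水</treatment>
</disease>
"""

def tokenize(text):
    # Assumption: terms are already separated by whitespace.
    # A real system would run a Chinese word segmenter here.
    return text.split()

def cosine(a, b):
    # Plain cosine similarity between two term-frequency vectors
    # (the paper's modified variant is not reproduced here).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def tag_similarity(xml_text, query, tags):
    # Score the query against the text of each requested tag separately.
    root = ET.fromstring(xml_text)
    query_vec = Counter(tokenize(query))
    scores = {}
    for tag in tags:
        node = root.find(tag)
        text = node.text if node is not None and node.text else ""
        scores[tag] = cosine(query_vec, Counter(tokenize(text)))
    return scores

# Query terms: "fever" and "headache".
scores = tag_similarity(DISEASE_XML, "发热 头痛", ["symptom", "treatment"])
```

Scoring each tag separately lets a question about symptoms be matched against the symptom text rather than the whole document. Exact term matching could be replaced by a term-to-term semantic similarity, as the abstract describes, so that near-synonyms also contribute to the score.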