Abstract
A Chinese Diseases Question Answering System (Hestia QA) is being developed by ISTIC. As part of Hestia QA, an XML-based document retrieval and similarity calculation model is established here. Texts that describe diseases in Chinese are indexed and wrapped in XML tags. A query is compared with the related tags in each XML document, and the similarity is calculated with a modified cosine similarity algorithm. A Chinese term semantic similarity algorithm is used to obtain the similarity between two terms in the system. The results show that the model works well. In the next step, the Chinese disease XML datasets will be analyzed at different granularity levels or dimensions, and a corpus of diseases in Chinese will be established once the automatic XML annotation software is completed.
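The tag-level comparison described in the abstract can be illustrated with a minimal sketch. It uses the standard cosine similarity over term-frequency vectors, not the modified variant the paper refers to, whose details are not given here. The tag names (`symptom`, `treatment`), the sample record, and the function names are hypothetical, and whitespace tokenization stands in for a real Chinese word segmenter.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical disease record; the tag names are illustrative only
# and are not taken from the paper's XML schema.
SAMPLE_XML = """
<disease>
  <name>influenza</name>
  <symptom>fever cough sore throat</symptom>
  <treatment>rest fluids antiviral drugs</treatment>
</disease>
"""


def cosine_similarity(a, b):
    """Standard cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


def score_tags(xml_text, query, tags):
    """Compare the query with the text of each selected tag."""
    root = ET.fromstring(xml_text)
    # Assumption: whitespace tokenization; Chinese text would need
    # a word segmenter before this step.
    query_vec = Counter(query.split())
    scores = {}
    for tag in tags:
        text = " ".join(el.text or "" for el in root.iter(tag))
        scores[tag] = cosine_similarity(query_vec, Counter(text.split()))
    return scores


scores = score_tags(SAMPLE_XML, "fever cough", ["symptom", "treatment"])
for tag, value in scores.items():
    print(f"{tag}: {value:.3f}")
```

Scoring each tag separately, rather than the whole document at once, lets a query about symptoms be matched against symptom text only. A term-level semantic similarity measure could replace the exact-match dot product in a fuller implementation.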
This research is supported by the National Twelfth “Five-Year Plan” Science and Technology Support Program (2011BAH10B04).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Zhu, L., Xu, S., Li, W. (2014). XML-Based Document Retrieval in Chinese Diseases Question Answering System. In: Park, J., Adeli, H., Park, N., Woungang, I. (eds) Mobile, Ubiquitous, and Intelligent Computing. Lecture Notes in Electrical Engineering, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40675-1_33
DOI: https://doi.org/10.1007/978-3-642-40675-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40674-4
Online ISBN: 978-3-642-40675-1
eBook Packages: Engineering, Engineering (R0)