XML-Based Document Retrieval in Chinese Diseases Question Answering System

Zhang, Haodong; Zhu, Lijun; Xu, Shuo; Li, Weifeng

doi:10.1007/978-3-642-40675-1_33

Conference paper

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 274)

Abstract

A Chinese Diseases Question Answering System (Hestia QA) is being developed by ISTIC. As part of Hestia QA, an XML-based document retrieval and similarity calculation model is established here. Chinese texts that describe diseases are indexed and wrapped in XML tags. The query is compared with the related tags in each XML document, and the similarity is calculated with a modified cosine similarity algorithm. A Chinese term semantic similarity algorithm is used to obtain the similarity between two terms in the system. The results show that the model works well. In future work, the Chinese disease XML datasets will be analyzed at different granularity levels or dimensions, and a corpus of diseases in Chinese will be established once the automatic XML annotation software is completed.
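
The tag-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the modified cosine measure or the term semantic similarity algorithm, so plain cosine similarity over term counts with exact term matching stands in for both. The record layout, the tag names `symptom` and `treatment`, and the helper names `cosine` and `score_tags` are assumptions made for illustration, and the text is assumed to be already segmented into whitespace-separated terms.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical disease record; tag names and content are illustrative only.
# Terms: 咳嗽 = cough, 发热 = fever, 头痛 = headache, 休息 = rest, 多喝水 = drink water.
DISEASE_XML = """
<disease name="感冒">
  <symptom>咳嗽 发热 头痛</symptom>
  <treatment>休息 多喝水</treatment>
</disease>
""".strip()


def cosine(a, b):
    """Standard cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_tags(xml_text, query_terms):
    """Return {tag name: similarity} for each child element of the record."""
    root = ET.fromstring(xml_text)
    query_vec = Counter(query_terms)
    scores = {}
    for child in root:
        # Assumption: element text is pre-segmented into whitespace-separated terms.
        tag_vec = Counter((child.text or "").split())
        scores[child.tag] = cosine(query_vec, tag_vec)
    return scores


# Query terms "cough" and "fever" scored against every tag of the record.
scores = score_tags(DISEASE_XML, ["咳嗽", "发热"])
```

Scoring each tag separately, rather than the document as a whole, is what lets a query about symptoms match the symptom section without being diluted by unrelated sections. A semantic term similarity measure would replace the exact-match dot product so that near-synonyms also contribute to the score.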

This research is funded by the National Twelfth “Five-Year Plan” Science and Technology Support Program (grant 2011BAH10B04).

Author information

Authors and Affiliations

Network Center, Science and Technology Daily, Beijing, China
Haodong Zhang
Institute of Scientific and Technical Information, Beijing, China
Haodong Zhang, Lijun Zhu, Shuo Xu & Weifeng Li

Corresponding author

Correspondence to Haodong Zhang.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Seoul University of Science and Technology (SeoulTech), Seoul, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Biomedical Informatics Neuroscience, Ohio State University Center for Biomedical Engineering, Columbus, Ohio, USA
Hojjat Adeli
Dept of Computer Education, Jeju National University Teachers College, Jeju Special Self-Governing Province, Korea, Republic of (South Korea)
Namje Park
Ryerson University Dept. Computer Science, Toronto, Ontario, Canada
Isaac Woungang

About this paper

Cite this paper

Zhang, H., Zhu, L., Xu, S., Li, W. (2014). XML-Based Document Retrieval in Chinese Diseases Question Answering System. In: Park, J., Adeli, H., Park, N., Woungang, I. (eds) Mobile, Ubiquitous, and Intelligent Computing. Lecture Notes in Electrical Engineering, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40675-1_33

DOI: https://doi.org/10.1007/978-3-642-40675-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40674-4
Online ISBN: 978-3-642-40675-1
eBook Packages: Engineering, Engineering (R0)
