“Similar query was answered earlier”: processing of patient authored text for retrieving relevant contents from health discussion forum
Online remedy finders and health-related discussion forums have become increasingly popular in recent years. Web users describe their health problems there and request suggestions from experts or other users. As a result, these forums have become a large repository of information and discussions on various health issues. An intelligent information retrieval system can help to put this repository to use in various applications. In this paper, we propose a system that, given a new post, automatically identifies existing similar forum posts. The system is based on computing the similarity between two patient-authored texts. To compute the similarity between the current post and existing posts, the system uses a hybrid strategy based on template information, topic modelling, and latent semantic indexing. The system is tested on a set of real questions collected from a homeopathy forum, abchomeopathy.com. The relevance of the posts retrieved by the system is evaluated by human experts. The evaluation results show that the precision of the system is 88.87%.
Keywords: Health information retrieval; Patient authored text; Web forum analysis; Natural language processing; Public health informatics
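To make the retrieval task in the abstract concrete, the following is a minimal sketch of ranking existing posts by similarity to a new post and scoring the result with precision. It is not the hybrid method of the paper: plain TF-IDF cosine similarity is used here as a simple stand-in, and the template, topic-modelling, and latent semantic indexing components are not reproduced. The function names, the tokenizer, the IDF smoothing, and the example posts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed weighting, not taken from the paper).
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(new_post, existing_posts, top_k=3):
    """Return (index, score) pairs for the existing posts closest to new_post."""
    vectors = tfidf_vectors(existing_posts + [new_post])
    query = vectors[-1]
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def precision(retrieved, relevant):
    """Fraction of retrieved items that were judged relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for r in retrieved if r in relevant) / len(retrieved)


# Hypothetical forum posts, invented for this example.
existing = [
    "My child has a dry cough and fever at night",
    "Severe migraine headache with nausea every morning",
    "Persistent dry cough for two weeks, worse at night",
]
ranked = rank_similar("Dry cough at night that will not go away", existing, top_k=2)
```

Here `ranked` holds the two cough-related posts, since the migraine post shares no terms with the query. In an evaluation like the one described, the `relevant` set passed to `precision` would come from human relevance judgments rather than from the system itself.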
The authors declare that they have received no funding for the current study.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.