“Similar query was answered earlier”: processing of patient authored text for retrieving relevant contents from health discussion forum

  • Research
  • Health Information Science and Systems

Abstract

Online remedy finders and health-related discussion forums have become increasingly popular in recent years. Users describe their health problems there and request suggestions from experts or other users. As a result, these forums have become a large repository of information and discussions on various health issues. An intelligent information retrieval system can help exploit this repository in a range of applications. In this paper, we propose a system that, given a new post, automatically identifies existing forum posts that are similar to it. The system is based on computing the similarity between two patient-authored texts. To compute the similarity between the new post and the existing posts, the system uses a hybrid strategy based on template information, topic modelling, and latent semantic indexing. The system is tested on a set of real questions collected from a homeopathy forum, abchomeopathy.com. The relevance of the posts retrieved by the system is evaluated by human experts. The evaluation results show that the precision of the system is 88.87%.
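To make the retrieval idea concrete, the following is a minimal sketch of ranking existing posts by their similarity to a new post. It is not the hybrid system described above: it omits the template, topic modelling, and latent semantic indexing components and substitutes plain TF-IDF vectors compared by cosine similarity. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_similar`) and the example posts are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(new_post, existing_posts, top_k=3):
    """Return up to top_k (index, score) pairs, most similar post first."""
    # The new post is vectorised together with the existing posts so that
    # all vectors share the same IDF weights.
    vectors = tfidf_vectors(existing_posts + [new_post])
    query = vectors[-1]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors[:-1])]
    # Sort by descending score; ties are broken by the original post order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Hypothetical forum posts, invented for this example.
existing_posts = [
    "I have a severe headache and nausea every morning",
    "My child has a skin rash with itching on the arms",
    "Dry cough and sore throat for two weeks",
]
new_post = "Headache with nausea since last week, mostly in the morning"

ranked = rank_similar(new_post, existing_posts, top_k=2)
for index, score in ranked:
    print(f"{score:.3f}  {existing_posts[index]}")
```

In this toy setting the headache post ranks first because it shares the most distinctive terms with the new post. A closer approximation of the described approach would project the vectors into a lower-dimensional latent space before comparing them, which this sketch does not attempt.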


Notes

  1. https://patient.info/forums.

  2. https://www.medhelp.org/forums/.

  3. https://forum.hpathy.com/forum/.

  4. https://abchomeopathy.com/forums.php.

  5. https://www.drhomeo.com/.

  6. http://nltk.org/.

  7. https://en.wikipedia.org/wiki/Cosine_similarity.

Funding

The authors declare that they have received no funding for the current study.

Author information

Corresponding author

Correspondence to Sujan Kumar Saha.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saha, S.K., Prakash, A. & Majumder, M. “Similar query was answered earlier”: processing of patient authored text for retrieving relevant contents from health discussion forum. Health Inf Sci Syst 7, 4 (2019). https://doi.org/10.1007/s13755-019-0067-3

  • DOI: https://doi.org/10.1007/s13755-019-0067-3
