An Effective Machine Learning Framework for Data Elements Extraction from the Literature of Anxiety Outcome Measures to Build Systematic Review

  • Shubhaditya Goswami
  • Sukanya Pal
  • Simon Goldsworthy
  • Tanmay Basu
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 353)


Developing a systematic review is a well-established method of collecting evidence from publications. It follows a predefined, explicit protocol to promote rigour, transparency and repeatability. The process is manual, time-consuming and requires expertise. The aim of this work is to build an effective framework that uses machine learning techniques to partially automate the systematic literature review process by extracting the required data elements of anxiety outcome measures. The proposed framework first builds a training corpus by extracting different data elements related to anxiety outcome measures from relevant publications. The publications are retrieved from Medline, EMBASE, CINAHL, AMED and PsycINFO following a set of rules defined by a research group in the United Kingdom that reviews comfort interventions in health care. The method then trains a machine learning classifier on this training corpus to extract the desired data elements from new publications. The experiments are conducted on 48 publications containing anxiety outcome measures, with the aim of automatically extracting the sentences that state the mean and standard deviation of the outcome measures of different types of interventions to lessen anxiety. The experimental results show that the proposed method using a random forest classifier achieves a recall of 100% and a precision of 83%. The method therefore extracts all required data elements, although some of the extracted sentences are not relevant.
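
The abstract does not describe the features or implementation, so the following is only a minimal sketch of the pipeline it outlines, assuming scikit-learn, TF-IDF sentence features and a hand-labelled sentence corpus. A random forest is trained on sentences labelled as stating (1) or not stating (0) a mean and standard deviation of an anxiety outcome measure. It is then used to pick out such sentences from new text, and precision and recall are computed from their standard definitions. The toy sentences and the helper names `train_extractor`, `extract_data_elements` and `precision_recall` are hypothetical and are not taken from the paper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy training corpus (NOT data from the paper): label 1 marks a
# sentence that reports the mean and standard deviation of an anxiety outcome
# measure; label 0 marks any other sentence.
TRAIN_SENTENCES = [
    "Mean STAI score was 42.1 (SD 9.3) in the intervention group.",
    "The control group had a mean anxiety score of 45.6 with a standard deviation of 10.2.",
    "State anxiety decreased from a mean of 48.0 (SD 8.5) to 39.2 (SD 7.1).",
    "Participants were recruited from a breast biopsy clinic.",
    "The study was approved by the local ethics committee.",
    "Anxiety was assessed with the State-Trait Anxiety Inventory.",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]


def train_extractor(sentences, labels):
    """Fit a sentence classifier: TF-IDF features (an assumption) + random forest."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram features
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    model.fit(sentences, labels)
    return model


def extract_data_elements(model, sentences):
    """Return the sentences the classifier predicts as stating a mean and SD."""
    predictions = model.predict(sentences)
    return [s for s, p in zip(sentences, predictions) if p == 1]


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP) and recall = TP / (TP + FN) for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    model = train_extractor(TRAIN_SENTENCES, TRAIN_LABELS)
    new_sentences = [
        "Mean anxiety fell to 35.4 (SD 6.8) in the music group.",
        "Patients received standard care before the procedure.",
    ]
    # Print the sentences selected as candidate data elements.
    print(extract_data_elements(model, new_sentences))
```

For example, `precision_recall([1, 1, 0, 0], [1, 1, 1, 0])` returns a precision of 2/3 and a recall of 1.0: every true data element is recovered (no false negatives) even though one extracted sentence is irrelevant. This is the same pattern as a 100% recall paired with an 83% precision.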


Keywords: Information extraction, NLP, Health informatics, Systematic review, Text mining, Machine learning

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Shubhaditya Goswami (1)
  • Sukanya Pal (1)
  • Simon Goldsworthy (2, 3)
  • Tanmay Basu (1)

  1. Ramakrishna Mission Vivekananda Educational and Research Institute, Belur Math, India
  2. Taunton and Somerset NHS Foundation Trust, Beacon Centre, Taunton, UK
  3. University of the West of England, Bristol, UK
