Diagnostic Knowledge Extraction from MedlinePlus: An Application for Infectious Diseases

  • Alejandro Rodríguez-González
  • Marcos Martínez-Romero
  • Roberto Costumero
  • Mark D. Wilkinson
  • Ernestina Menasalvas-Ruiz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 375)


Building accurate diagnostic decision support systems (DDSS) requires validated and precise knowledge. Typically, medical experts are the source of this knowledge, but it is not always possible to obtain all the desired information from them. Medical books and articles that describe the diagnosis of the diseases covered by a DDSS are another valuable source, but extracting this information from them is not easy either. In this paper we present the results of our research, in which we used Web scraping combined with natural language processing techniques to extract diagnostic criteria from MedlinePlus articles about infectious diseases.


Keywords: Diagnostic knowledge, Information extraction, CDSS, DDSS, NLP



Alejandro Rodríguez-González’s and Mark Wilkinson’s work is supported by the Isaac Peral Programme of the UPM. Marcos Martínez-Romero’s work has been supported by a Postdoc Fellowship from the Xunta de Galicia, Spain (ref. POS-A/2013/197).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alejandro Rodríguez-González (1, corresponding author)
  • Marcos Martínez-Romero (2)
  • Roberto Costumero (3)
  • Mark D. Wilkinson (1)
  • Ernestina Menasalvas-Ruiz (3)

  1. Universidad Politécnica de Madrid – Centro de Biotecnología y Genómica de Plantas, México, México
  2. Universidad de A Coruña – Centro IMEDIR, A Coruña, Spain
  3. Universidad Politécnica de Madrid – Centro de Biotecnología Biomédica, Madrid, Spain
