Assessing the Suitability of MeSH Ontology for Classifying Medline Documents

  • Rosalía Laza
  • Reyes Pavón
  • Miguel Reboiro-Jato
  • Florentino Fdez-Riverola
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)


Automated document classification has become an active research field because of the growing volume of biomedical information available in digital form, all of which must be catalogued and organized. In this context, text classification is usually approached with machine learning: a general inductive process automatically builds a text classifier from a set of pre-classified documents. In this work we investigate the application of a Bayesian network model to the triage of documents, each represented by the set of MeSH terms associated with it. Our results show both that Bayesian networks are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different abstraction levels.
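As a rough illustration of the idea, and not the model used in this work, the sketch below trains a Bernoulli naive Bayes classifier (the simplest Bayesian network structure, in which all terms are assumed conditionally independent given the class) on documents described by MeSH tree numbers. The abstraction level is controlled by truncating each tree number to a fixed depth. The function names `generalize`, `train`, and `predict`, the tree numbers, and the labels are all hypothetical toy choices made for this example.

```python
import math
from collections import Counter


def generalize(tree_number, level):
    """Truncate a MeSH-style tree number to at most `level` components."""
    return ".".join(tree_number.split(".")[:level])


def train(docs, labels, level):
    """Count class and term frequencies over generalized tree numbers."""
    feats = [{generalize(t, level) for t in doc} for doc in docs]
    vocab = set().union(*feats)
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in class_counts}
    for f, y in zip(feats, labels):
        term_counts[y].update(f)
    return vocab, class_counts, term_counts


def predict(model, doc, level):
    """Return the class with the highest log posterior under naive Bayes."""
    vocab, class_counts, term_counts = model
    present = {generalize(t, level) for t in doc}
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # class prior
        for term in vocab:
            # Laplace-smoothed probability that `term` occurs in class c.
            p = (term_counts[c][term] + 1) / (n_c + 2)
            score += math.log(p if term in present else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy, made-up data: each document is a list of tree numbers (assumption).
docs = [
    ["G05.360.340", "D12.776.260"],
    ["G05.360.080", "D12.776.930"],
    ["C04.557.337", "E02.319.300"],
    ["C04.588.149", "E02.760.400"],
]
labels = ["relevant", "relevant", "other", "other"]

LEVEL = 1  # coarsest abstraction level: top-level branch only
model = train(docs, labels, LEVEL)
print(predict(model, ["G05.355", "D12.125"], LEVEL))
```

At level 1 the unseen document shares the branches `G05` and `D12` with the training documents labelled relevant, so generalizing up the hierarchy lets the classifier use evidence that would be lost if only the full, more specific tree numbers were compared.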


document classification, MeSH ontology, Medline documents, Bayesian networks





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Rosalía Laza (1)
  • Reyes Pavón (1)
  • Miguel Reboiro-Jato (1)
  • Florentino Fdez-Riverola (1)
  1. ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Ourense, Spain
