Using Dictionaries for Biomedical Text Classification

  • R. Romero
  • E. L. Iglesias
  • L. Borrajo
  • C. M. Redondo Marey
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)


This paper studies the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three dictionaries (BioCreative [13], NLPBA [8], and a subset of the UniProt database [4], named Protein) and three types of classifiers (KNN, SVM, and Naive Bayes), applied to searches of the PubMed database. The dictionaries are used during the preprocessing and annotation of documents. The best results were obtained with the NLPBA and Protein dictionaries combined with the SVM classifier.
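As a rough illustration of the idea, the sketch below uses a dictionary to annotate documents (keeping only the tokens that match a dictionary term) and then trains a small multinomial Naive Bayes classifier on the annotated terms. It is not the authors' pipeline: the dictionary `TOY_DICTIONARY`, the training sentences, the class labels, and the helper names `annotate` and `NaiveBayes` are all hypothetical, chosen only to make the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy dictionary; NOT taken from BioCreative, NLPBA or UniProt.
TOY_DICTIONARY = {"p53", "brca1", "insulin", "kinase", "hemoglobin"}


def annotate(text, dictionary):
    """Lowercase and tokenize the text, keeping only tokens found in the dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t in dictionary]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over annotated terms."""

    def fit(self, docs, labels):
        self.vocab = set()
        self.term_counts = defaultdict(Counter)  # label -> term frequencies
        self.class_counts = Counter(labels)      # label -> number of documents
        for terms, label in zip(docs, labels):
            self.term_counts[label].update(terms)
            self.vocab.update(terms)
        return self

    def predict(self, terms):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.term_counts[label].values()) + len(self.vocab)
            for t in terms:
                if t in self.vocab:  # terms unseen in training are ignored
                    score += math.log((self.term_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training documents and labels, for illustration only.
train = [
    ("Mutations in p53 and BRCA1 drive tumour growth", "relevant"),
    ("The kinase phosphorylates p53 in response to damage", "relevant"),
    ("Insulin therapy outcomes in a hospital survey", "non-relevant"),
    ("Survey of hemoglobin and insulin levels in patients", "non-relevant"),
]
docs = [annotate(text, TOY_DICTIONARY) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(docs, labels)

query = annotate("BRCA1 interacts with p53", TOY_DICTIONARY)
prediction = model.predict(query)
```

Restricting the features to dictionary terms is only one possible way to use a dictionary during preprocessing; the same annotated terms could equally be fed to a KNN or SVM classifier.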


Keywords: Biomedical text mining, classification techniques, dictionaries




  1. Abi-Haidar, A., Rocha, L.M.: Biomedical article classification using an agent-based model of T-cell cross-regulation. In: Hart, E., McEwan, C., Timmis, J., Hone, A. (eds.) ICARIS 2010. LNCS, vol. 6209, pp. 237–249. Springer, Heidelberg (2010)
  2. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Machine Learning, 37–66 (1991)
  3. Ando, R.K., Dredze, M., Zhang, T.: TREC 2005 genomics track experiments at IBM Watson. In: Proceedings of TREC 2005. NIST Special Publication (2005)
  4. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O’Donovan, C., Redaschi, N., Yeh, L.S.L.: UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, 115–119 (2004)
  5. Bai, R., Wang, X., Liao, J.: Extract semantic information from WordNet to improve text classification performance. In: AST/UCMA/ISA/ACN, pp. 409–420 (2010)
  6. Boguraev, B., Briscoe, T., Carroll, J., Carter, D., Grover, C.: The derivation of a grammatically indexed lexicon from the Longman Dictionary of Contemporary English. In: Proceedings of the 25th Annual Meeting on Association for Computational Linguistics, Morristown, NJ, USA, pp. 193–200. Association for Computational Linguistics (1987)
  7. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines (2001)
  8. Collier, N., Ruch, P., Nazarenko, A. (eds.): JNLPBA 2004: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, Morristown, NJ, USA. Association for Computational Linguistics (2004)
  9. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning, 273–297 (1995)
  10. Cunningham, H., Wilks, Y., Gaizauskas, R.J.: GATE - a general architecture for text engineering (1996)
  11. Frakes, W.B., Baeza-Yates, R.A. (eds.): Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs (1992)
  12. Garner, S.R.: WEKA: The Waikato environment for knowledge analysis. In: Proc. of the New Zealand Computer Science Research Students Conference, pp. 57–64 (1995)
  13. Hirschman, L., Yeh, A., Blaschke, C., Valencia, A.: Overview of BioCreative: critical assessment of information extraction for biology. BMC Bioinformatics 6(Suppl. 1), S1 (2005)
  14. John, G., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann, San Francisco (1995)
  15. Kang, P., Cho, S.: EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006. LNCS, vol. 4232, pp. 837–846. Springer, Heidelberg (2006)
  16. Liu, Y., Scheuermann, P., Li, X., Zhu, X.: Using WordNet to disambiguate word senses for text classification. In: Proceedings of the 7th International Conference on Computational Science, Part III: ICCS 2007, pp. 781–789. Springer, Heidelberg (2007)
  17. McCrae, J., Collier, N.: Synonym set extraction from the biomedical literature by lexical pattern discovery. BMC Bioinformatics 9 (2008)
  18. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to WordNet: An on-line lexical database. Journal of Lexicography 3(4), 235–244 (1990)
  19. Settles, B.: ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics 21(14), 3191–3192 (2005)
  20. Sureka, A., Mirajkar, P.P., Teli, P.N., Agarwal, G., Bose, S.K.: Semantic based text classification of patent documents to a user-defined taxonomy. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds.) ADMA 2009. LNCS, vol. 5678, pp. 644–651. Springer, Heidelberg (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • R. Romero (1)
  • E. L. Iglesias (1)
  • L. Borrajo (1)
  • C. M. Redondo Marey (2)

  1. Univ. of Vigo, Ourense, Spain
  2. Complexo Hospitalario Universitario de Vigo, Vigo, Spain
