Abstract
This paper studies the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three dictionaries (BioCreative [13], NLPBA [8], and a subset of the UniProt database [4], named Protein) and three types of classifiers (KNN, SVM, and Naive Bayes) applied to searches of the PubMed database. The dictionaries are used during the preprocessing and annotation of documents. The best results were obtained with the NLPBA and Protein dictionaries and the SVM classifier.
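As a rough illustration of dictionary-based annotation during preprocessing, the sketch below keeps only the tokens of a document that appear in a dictionary and turns them into a count vector that a classifier such as KNN, SVM, or Naive Bayes could consume. It is one possible reading of the idea, not the authors' pipeline: the term set `TOY_DICTIONARY` and the helpers `annotate` and `to_vector` are hypothetical names introduced here, and the tokenization rule is an assumption.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; the BioCreative, NLPBA
# and Protein dictionaries used in the paper are not reproduced here.
TOY_DICTIONARY = {"p53", "insulin", "kinase", "hemoglobin"}


def annotate(text, dictionary):
    """Count the dictionary terms that occur in the text (case-insensitive)."""
    # Assumed tokenization: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tok for tok in tokens if tok in dictionary)


def to_vector(counts, dictionary):
    """Lay the counts out in a fixed (alphabetically sorted) feature order."""
    return [counts.get(term, 0) for term in sorted(dictionary)]


doc = "Insulin signalling activates a kinase; the kinase then phosphorylates p53."
counts = annotate(doc, TOY_DICTIONARY)
vector = to_vector(counts, TOY_DICTIONARY)
print(sorted(TOY_DICTIONARY))
print(vector)
```

Restricting the features to dictionary terms shrinks the vocabulary to domain-relevant entities; whether that helps a given classifier is an empirical question of the kind the paper's experiments address.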
References
1. Abi-Haidar, A., Rocha, L.M.: Biomedical article classification using an agent-based model of T-cell cross-regulation. In: Hart, E., McEwan, C., Timmis, J., Hone, A. (eds.) ICARIS 2010. LNCS, vol. 6209, pp. 237–249. Springer, Heidelberg (2010)
2. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Machine Learning, 37–66 (1991)
3. Ando, R.K., Dredze, M., Zhang, T.: TREC 2005 genomics track experiments at IBM Watson. In: Proceedings of TREC 2005. NIST Special Publication (2005)
4. Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O’Donovan, C., Redaschi, N., Yeh, L.S.L.: UniProt: the universal protein knowledgebase. Nucleic Acids Res. 32, 115–119 (2004)
5. Bai, R., Wang, X., Liao, J.: Extract semantic information from WordNet to improve text classification performance. In: AST/UCMA/ISA/ACN, pp. 409–420 (2010)
6. Boguraev, B., Briscoe, T., Carroll, J., Carter, D., Grover, C.: The derivation of a grammatically indexed lexicon from the Longman Dictionary of Contemporary English. In: Proceedings of the 25th Annual Meeting on Association for Computational Linguistics, Morristown, NJ, USA, pp. 193–200. Association for Computational Linguistics (1987)
7. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines (2001)
8. Collier, N., Ruch, P., Nazarenko, A. (eds.): JNLPBA 2004: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, Morristown, NJ, USA. Association for Computational Linguistics (2004)
9. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning, 273–297 (1995)
10. Cunningham, H., Wilks, Y., Gaizauskas, R.J.: GATE - a general architecture for text engineering (1996)
11. Frakes, W.B., Baeza-Yates, R.A. (eds.): Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs (1992)
12. Garner, S.R.: WEKA: The Waikato environment for knowledge analysis. In: Proc. of the New Zealand Computer Science Research Students Conference, pp. 57–64 (1995)
13. Hirschman, L., Yeh, A., Blaschke, C., Valencia, A.: Overview of BioCreative: critical assessment of information extraction for biology. BMC Bioinformatics 6 (Suppl. 1), S1 (2005)
14. John, G., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann, San Francisco (1995)
15. Kang, P., Cho, S.: EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems. In: King, I., Wang, J., Chan, L.-W., Wang, D. (eds.) ICONIP 2006. LNCS, vol. 4232, pp. 837–846. Springer, Heidelberg (2006)
16. Liu, Y., Scheuermann, P., Li, X., Zhu, X.: Using WordNet to disambiguate word senses for text classification. In: Proceedings of the 7th International Conference on Computational Science, Part III: ICCS 2007, pp. 781–789. Springer, Heidelberg (2007)
17. McCrae, J., Collier, N.: Synonym set extraction from the biomedical literature by lexical pattern discovery. BMC Bioinformatics 9 (2008)
18. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to WordNet: An on-line lexical database. Journal of Lexicography 3(4), 235–244 (1990)
19. Settles, B.: ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics 21(14), 3191–3192 (2005)
20. Sureka, A., Mirajkar, P.P., Teli, P.N., Agarwal, G., Bose, S.K.: Semantic based text classification of patent documents to a user-defined taxonomy. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds.) ADMA 2009. LNCS, vol. 5678, pp. 644–651. Springer, Heidelberg (2009)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Romero, R., Iglesias, E.L., Borrajo, L., Marey, C.M.R. (2011). Using Dictionaries for Biomedical Text Classification. In: Rocha, M.P., Rodríguez, J.M.C., Fdez-Riverola, F., Valencia, A. (eds) 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011). Advances in Intelligent and Soft Computing, vol 93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19914-1_47
DOI: https://doi.org/10.1007/978-3-642-19914-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19913-4
Online ISBN: 978-3-642-19914-1
eBook Packages: Engineering, Engineering (R0)