Development of a Machine Learning Framework for Biomedical Text Mining

Rodrigues, Ruben; Costa, Hugo; Rocha, Miguel

doi:10.1007/978-3-319-40126-3_5

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 477)

Included in the following conference series:

International Conference on Practical Applications of Computational Biology & Bioinformatics

Abstract

Biomedical text mining (BTM) aims to create methods for searching and structuring knowledge extracted from the biomedical literature. Named entity recognition (NER), a BTM task, seeks to identify mentions of biological entities in texts. Dictionaries, regular expressions, natural language processing and machine learning (ML) algorithms are used in this task. Over recent years, @Note2, an open-source software framework that includes user-friendly interfaces for important BTM tasks, has been developed, but it did not include ML-based methods. This work proposes BioTML, a framework that includes a number of ML-based approaches for NER, to fill the gap between @Note2 and state-of-the-art ML approaches. BioTML was integrated into @Note2 as a novel plug-in, in which Hidden Markov Models, Conditional Random Fields and Support Vector Machines were implemented to address NER tasks, working with a set of over 60 feature types used to train ML models. The implementation builds on open-source software such as MALLET, LibSVM, ClearNLP and OpenNLP. Several manually annotated corpora were used to validate BioTML. The results are promising, although there is room for improvement.
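
As an illustration of how NER is commonly framed for sequence models such as Hidden Markov Models and Conditional Random Fields, the sketch below converts annotated entity spans into per-token BIO labels and derives a few orthographic and context features for each token. It is a minimal sketch under stated assumptions, not BioTML's implementation: the function names, the feature set, the entity labels and the example sentence are all hypothetical, and the abstract does not list the framework's 60-plus feature types.

```python
def token_features(tokens, i):
    """Return a few simple features for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        # Context features: neighbouring tokens, with boundary markers.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def bio_tags(tokens, entity_spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in entity_spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for j in range(start + 1, end):
            tags[j] = f"I-{label}"          # remaining tokens of the mention
    return tags


# Hypothetical annotated sentence: one gene mention and one disease mention.
tokens = "Mutations in BRCA1 increase breast cancer risk".split()
spans = [(2, 3, "GENE"), (4, 6, "DISEASE")]

tags = bio_tags(tokens, spans)
features = [token_features(tokens, i) for i in range(len(tokens))]

for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Each (feature dictionary, tag) pair produced this way is the kind of training instance a sequence labeller or a per-token classifier consumes; a real system would use a proper tokenizer and a much richer feature set.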

Author information

Authors and Affiliations

Centre of Biological Engineering, University of Minho, Braga, Portugal
Ruben Rodrigues & Miguel Rocha
Silicolife, Lda, Braga, Portugal
Ruben Rodrigues & Hugo Costa

Corresponding author

Correspondence to Ruben Rodrigues.

Editor information

Editors and Affiliations

Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia
Mohd Saberi Mohamad
Dep. de Inf.Campus of Gualtar, Universidade do Minho, Braga, Portugal
Miguel P. Rocha
Edificio Poli.Campus Univ.o As La, Escuela Superior de Ingeniería Informáti, Ourense, Spain
Florentino Fdez-Riverola
ETS Ingeniería Info., University of Sevilla, Sevilla, Spain
Francisco J. Domínguez Mayo
Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
Juan F. De Paz

About this paper

Cite this paper

Rodrigues, R., Costa, H., Rocha, M. (2016). Development of a Machine Learning Framework for Biomedical Text Mining. In: Saberi Mohamad, M., Rocha, M., Fdez-Riverola, F., Domínguez Mayo, F., De Paz, J. (eds) 10th International Conference on Practical Applications of Computational Biology & Bioinformatics. PACBB 2016. Advances in Intelligent Systems and Computing, vol 477. Springer, Cham. https://doi.org/10.1007/978-3-319-40126-3_5

DOI: https://doi.org/10.1007/978-3-319-40126-3_5
Published: 01 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40125-6
Online ISBN: 978-3-319-40126-3
eBook Packages: Engineering, Engineering (R0)
