Development of a Machine Learning Framework for Biomedical Text Mining
- Cite this paper as:
- Rodrigues R., Costa H., Rocha M. (2016) Development of a Machine Learning Framework for Biomedical Text Mining. In: Saberi Mohamad M., Rocha M., Fdez-Riverola F., Domínguez Mayo F., De Paz J. (eds) 10th International Conference on Practical Applications of Computational Biology & Bioinformatics. Advances in Intelligent Systems and Computing, vol 477. Springer, Cham
Biomedical text mining (BTM) aims to create methods for searching and structuring knowledge extracted from biomedical literature. Named entity recognition (NER), a BTM task, seeks to identify mentions of biological entities in texts. Dictionaries, regular expressions, natural language processing and machine learning (ML) algorithms are used in this task. Over recent years, @Note2, an open-source software framework that includes user-friendly interfaces for important BTM tasks, has been developed, but it did not include ML-based methods. This work proposes BioTML, a framework comprising a number of ML-based approaches for NER, to fill the gap between @Note2 and state-of-the-art ML approaches. BioTML was integrated into @Note2 as a novel plug-in, in which Hidden Markov Models, Conditional Random Fields and Support Vector Machines were implemented to address NER tasks, using a set of over 60 feature types to train ML models. The implementation builds on open-source software such as MALLET, LibSVM, ClearNLP and OpenNLP. Several manually annotated corpora were used to validate BioTML. The results are promising, although there is room for improvement.
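To make the idea of token-level "feature types" concrete, the following is a minimal sketch of how a sentence might be turned into per-token feature maps before being passed to a sequence labeller such as a CRF. It is an illustration only: the feature names, the tokenisation rule and the functions `tokenize`, `token_features` and `sentence_features` are assumptions made for this example, not the feature set or API of BioTML.

```python
import re

def tokenize(text):
    # Hypothetical tokenisation rule: keep hyphenated alphanumerics
    # (e.g. "IL-2") together, emit other punctuation as single tokens.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)

def token_features(tokens, i):
    # Illustrative orthographic and context features for token i.
    # These are assumed examples, not the feature types used in the paper.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sentence_features(text):
    # Return the tokens and one feature dictionary per token.
    tokens = tokenize(text)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]

if __name__ == "__main__":
    toks, feats = sentence_features("IL-2 activates p53.")
    for tok, f in zip(toks, feats):
        print(tok, f)
```

Features of this kind (capitalisation, digits, hyphens, affixes, neighbouring tokens) are commonly useful for gene and protein names, which often mix letters, digits and hyphens; a trained model would then assign each token an entity label based on them.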
Keywords: Biomedical text mining, Named entity recognition, Machine learning