Sequence Labeling for Extracting Relevant Pieces of Information from Raw Text Medicine Descriptions
In Natural Language Processing, Named Entity Recognition aims to delimit and appropriately label the chunks of text that contain specific pieces of information. This paper presents the preliminary results we obtained by using a Conditional Random Fields (CRF) approach to extract information of interest from drug prescriptions. So far, our model has been trained to extract the amount of medicine, the measuring unit, the frequency of administration, the treatment duration and the condition for which the treatment is prescribed. The model was trained on a corpus of drug prescriptions that we constructed and annotated by hand. The results obtained so far indicate that the CRF model we developed performs well, achieving an F1 score of 91% on the test set.
Keywords: Named entity recognition · Conditional random fields · Drug prescriptions · Information extraction
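To make the sequence-labeling framing concrete, the sketch below shows how a hand-annotated prescription might be turned into CRF training data. Span annotations are converted into one BIO tag per token, and each token is described by a dictionary of simple features. This is a minimal illustration under stated assumptions, not the authors' pipeline. The label names (`AMOUNT`, `UNIT`, `FREQ`, `DURATION`, `CONDITION`), the English example sentence, the whitespace tokenization, the feature set and the helper names `spans_to_bio`, `token_features` and `sentence_features` are all hypothetical. A list of per-token feature dictionaries per sentence is the input shape accepted by CRFsuite wrappers such as sklearn-crfsuite.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for the five kinds of information described above.
ENTITY_TYPES = {"AMOUNT", "UNIT", "FREQ", "DURATION", "CONDITION"}

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert annotated token spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if label not in ENTITY_TYPES:
            raise ValueError(f"unknown label: {label}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # tokens inside the entity
    return tags


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hypothetical hand-crafted features for token i and its neighbours."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),      # useful for amounts and durations
        "word.suffix3": word[-3:].lower(),
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True                  # beginning of sequence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True                  # end of sequence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical annotated prescription, tokenized on whitespace.
    tokens = "take 2 tablets 3 times a day for 5 days for fever".split()
    spans = [(1, 2, "AMOUNT"), (2, 3, "UNIT"), (3, 7, "FREQ"),
             (8, 10, "DURATION"), (11, 12, "CONDITION")]
    for tok, tag in zip(tokens, spans_to_bio(tokens, spans)):
        print(f"{tok}\t{tag}")
```

A CRF trained on such pairs scores entire tag sequences rather than isolated tokens. This lets it learn, for example, that `I-FREQ` should only follow `B-FREQ` or `I-FREQ`.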
Conflict of Interest
The authors declare that they have no conflict of interest.