Numerical Eligibility Criteria in Clinical Protocols: Annotation, Automatic Detection and Interpretation

  • Vincent Claveau
  • Lucas Emanuel Silva Oliveira
  • Guillaume Bouzillé
  • Marc Cuggia
  • Claudia Maria Cabral Moro
  • Natalia Grabar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10259)


Clinical trials are fundamental for evaluating therapies and diagnostic techniques. Yet patient recruitment remains a real challenge. Eligibility criteria rely on terms but also on patient laboratory results, which are usually expressed as numerical values. Both types of information are important for patient selection. We propose to address the processing of numerical values. A set of sentences extracted from clinical trials is manually annotated by four annotators. Four categories are distinguished: C (concept), V (numerical value), U (unit), O (out position). Depending on the pair of annotators, the inter-annotator agreement on the whole annotation sequence CVU goes up to 0.78 and 0.83. Then, an automatic method using conditional random fields (CRFs) is exploited to create a supervised model for the recognition of these categories. The obtained F-measure is 0.60 for C, 0.82 for V, and 0.76 for U.
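To make the four-category scheme and the per-category F-measure concrete, here is a minimal Python sketch. It is not the authors' implementation: the example sentence, its labels, the simulated prediction, and the helper name `f_measure_per_label` are all hypothetical, and token-level scoring is an assumption, since the abstract does not say how the F-measure was computed.

```python
from collections import Counter

# Categories from the abstract: C (concept), V (numerical value), U (unit).
# "O" marks tokens outside any criterion and is not scored here.
LABELS = ("C", "V", "U")


def f_measure_per_label(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f1)} computed token by token."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong for this token
            fn[g] += 1  # gold label was missed for this token
    scores = {}
    for label in labels:
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical annotated sentence (assumed labels, not taken from the corpus).
tokens = ["Hemoglobin", "of", "at", "least", "9", "g/dL"]
gold = ["C", "O", "O", "O", "V", "U"]
# Simulated tagger output that misses the unit.
predicted = ["C", "O", "O", "O", "V", "O"]

scores = f_measure_per_label(gold, predicted)
for label in LABELS:
    p, r, f = scores[label]
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

In this toy case the concept and the value are recovered while the unit is missed, so only U is penalised. A real evaluation would aggregate the counts over the whole annotated test set.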


Natural language processing; Supervised learning; Clinical trials; Patient eligibility; Numerical criteria



This work was partly funded by the CNRS-CONFAP project FIGTEM for Franco-Brazilian collaborations and by French government support granted to the CominLabs LabEx, managed by the ANR under the Investing for the Future program, reference ANR-10-LABX-07-01.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Vincent Claveau (1)
  • Lucas Emanuel Silva Oliveira (2)
  • Guillaume Bouzillé (3)
  • Marc Cuggia (3)
  • Claudia Maria Cabral Moro (2)
  • Natalia Grabar (4)

  1. IRISA - CNRS, Rennes, France
  2. PUCPR - Pontifícia Universidade Católica do Paraná, Curitiba, Brazil
  3. INSERM/LTSI, HBD; CHU de Rennes; Université Rennes 2, Rennes, France
  4. CNRS, Univ. Lille, UMR 8163 - STL - Savoirs Textes Langage, Lille, France
