Towards Using Scientific Publications to Automatically Extract Information on Rare Diseases

  • Charles Cousyn
  • Kévin Bouchard
  • Sébastien Gaboury
  • Bruno Bouchard
Article

Abstract

A small percentage of the population is afflicted by what is called an orphan, or rare, disease. Worldwide, there are several thousand such diseases. Taken together, the individuals affected amount to up to 10% of the US population. Scientific work on these diseases is often poorly financed because of the lack of potential markets for a treatment, which leaves patients and clinicians with very limited and scattered access to vital information. To help address this issue, we present in this paper a new software tool for automating the extraction of information related to rare diseases from scientific publications. More precisely, our contribution consists of a new method for automatically extracting the symptoms of these diseases from research papers, exploiting a Named Entity Recognition (NER) algorithm based on the numerical statistic Term Frequency - Inverse Document Frequency (TF-IDF). The proposed tool has been tested using the PubMed Central (PMC) database.
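
To make the TF-IDF statistic named above concrete, the following is a minimal sketch of how terms could be scored across a small set of documents. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `tf_idf`, `top_terms`) and the three toy sentences are all assumptions made for illustration, and the weighting shown is the plain textbook form (term frequency times the logarithm of inverse document frequency), which may differ from the variant used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, invented for this example only.
documents = [
    "patients with the disease report chorea and dementia",
    "patients with the disease report fatigue and fever",
    "patients with the disease report fever and rash",
]
scores = tf_idf(documents)
for doc_scores in scores:
    print(top_terms(doc_scores, k=2))
```

Words that occur in every document receive a score of zero, while words confined to a single document rank highest. This is the property that makes the statistic a plausible signal for picking out candidate symptom terms from a collection of papers.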

Keywords

Text mining, Rare disease, Named entity recognition, Knowledge aggregation, Symptoms

Notes

Acknowledgements

This project was conducted with the financial support received from UQAC and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Supplementary material

11036_2019_1237_MOESM1_ESM.png (PNG, 142 KB)
11036_2019_1237_MOESM2_ESM.png (PNG, 296 KB)
11036_2019_1237_MOESM3_ESM.png (PNG, 749 KB)
11036_2019_1237_MOESM4_ESM.png (PNG, 84 KB)
11036_2019_1237_MOESM5_ESM.png (PNG, 68 KB)
11036_2019_1237_MOESM6_ESM.png (PNG, 54 KB)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. LIARA Laboratory, Université du Québec à Chicoutimi, Chicoutimi, Canada
