Preparing Data at the Source to Foster Interoperability across Rare Disease Resources

  • Marco Roos
  • Estrella López Martin
  • Mark D. Wilkinson
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1031)


The ability to combine heterogeneous data distributed across the globe is critically important for advancing research on rare diseases, but it presents a number of methodological, representational and automation challenges. In this scenario, biomedical ontologies are essential for enabling computers to aid in information retrieval and analysis across data collections.

This chapter presents an approach to preparing rare disease data for integration through the application of a global standard for computer-readable data and knowledge. This includes the use of common data elements, ontological codes and computer-readable data. The approach was developed under a number of domain-relevant requirements, such as controlled access to data, independence of the original sources, and the desire to combine the data sources with other computational workflows and data platforms.


Ontologies; FAIR approach; Linkable data; Data integration; Standardization; Semantic model



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Marco Roos (1)
  • Estrella López Martin (2)
  • Mark D. Wilkinson (3)
  1. BioSemantics group, Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
  2. Institute of Rare Diseases Research & Centre for Biomedical Network Research on Rare Diseases, Instituto de Salud Carlos III, Madrid, Spain
  3. Centro de Biotecnología y Genómica de Plantas UPM-INIA, Universidad Politécnica de Madrid, Madrid, Spain
