Smart Persistence and Accessibility of Genomic and Clinical Data

  • Eleonora Cappelli
  • Emanuel Weitschek
  • Fabio Cumbo
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1062)

Abstract

The continuous growth of experimental data generated by Next Generation Sequencing (NGS) machines has led to the adoption of advanced techniques to manage them intelligently. The advent of the Big Data era posed new challenges that led to the development of novel methods and tools, which were initially created to address computational science problems but can now be widely applied to biomedical data. In this work, we address two biomedical data management issues: (i) how to reduce the redundancy of genomic and clinical data, and (ii) how to make this large amount of data easily accessible. First, we propose an approach to organize genomic and clinical data that takes data redundancy into account, together with a method that saves as much space as possible by exploiting no-SQL technologies. Then, we propose design principles for organizing biomedical data and making them easily accessible through a collection of Application Programming Interfaces (APIs), which together form a flexible framework that we call OpenOmics. To prove the validity of our approach, we apply it to data extracted from the Genomic Data Commons repository. OpenOmics is free and open source, so that anyone can extend the set of provided APIs with new features to answer specific biological questions. The APIs are hosted on GitHub at https://github.com/fabio-cumbo/open-omics-api/, are publicly queryable at http://bioinformatics.iasi.cnr.it/openomics/api/routes, and are documented at https://openomics.docs.apiary.io/.
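
The redundancy issue in point (i) can be illustrated with a minimal Python sketch. It is not the OpenOmics implementation: the record layout, the field names (`gene`, `chrom`, `start`, `end`, `sample`, `value`), the placeholder data, and the helpers `split_redundant` and `rebuild` are all hypothetical, and plain dictionaries stand in for no-SQL collections. The sketch only shows the general idea that annotation repeated across samples can be stored once and referenced by key.

```python
from typing import Dict, List, Tuple

# Hypothetical flat records with placeholder values: every row repeats the
# gene annotation (chrom/start/end) although only `sample` and `value` differ.
FLAT_RECORDS: List[dict] = [
    {"gene": "GENE_A", "chrom": "chr1", "start": 100, "end": 200, "sample": "S1", "value": 12.5},
    {"gene": "GENE_A", "chrom": "chr1", "start": 100, "end": 200, "sample": "S2", "value": 8.0},
    {"gene": "GENE_B", "chrom": "chr2", "start": 300, "end": 450, "sample": "S1", "value": 3.5},
]

# Fields assumed (for this sketch) to be identical for every row of a gene.
ANNOTATION_FIELDS = ("chrom", "start", "end")


def split_redundant(records: List[dict]) -> Tuple[Dict[str, dict], List[dict]]:
    """Split flat rows into one annotation document per gene plus slim
    per-sample measurement documents that reference the gene by key."""
    annotations: Dict[str, dict] = {}
    measurements: List[dict] = []
    for rec in records:
        key = rec["gene"]
        annotation = {field: rec[field] for field in ANNOTATION_FIELDS}
        # Store the annotation only the first time the gene is seen.
        stored = annotations.setdefault(key, annotation)
        if stored != annotation:
            raise ValueError(f"conflicting annotation for {key}")
        measurements.append(
            {"gene": key, "sample": rec["sample"], "value": rec["value"]}
        )
    return annotations, measurements


def rebuild(annotations: Dict[str, dict], measurements: List[dict]) -> List[dict]:
    """Re-join measurements with their annotation to recover the flat rows."""
    return [
        {"gene": m["gene"], **annotations[m["gene"]],
         "sample": m["sample"], "value": m["value"]}
        for m in measurements
    ]


annotations, measurements = split_redundant(FLAT_RECORDS)
restored = rebuild(annotations, measurements)
```

In this toy layout the annotation for each gene is kept once regardless of how many samples reference it, and `rebuild` shows that the original rows remain recoverable on demand.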

Keywords

Biomedical data modeling · Biomedical data management · No-SQL · API · Genomic and clinical data

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Engineering, Roma Tre University, Rome, Italy
  2. Department of Engineering, Uninettuno University, Rome, Italy
  3. CIBIO Department, University of Trento, Trento, Italy
