Overview of GeCo: A Project for Exploring and Integrating Signals from the Genome

  • Stefano Ceri
  • Anna Bernasconi
  • Arif Canakoglu
  • Andrea Gulino
  • Abdulrahman Kaitoua
  • Marco Masseroli
  • Luca Nanni
  • Pietro Pinoli
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 822)


Next Generation Sequencing is a 10-year-old technology for reading DNA that produces massive amounts of genomic data and is, in turn, reshaping genomic computing. In particular, tertiary data analysis is concerned with the integration of heterogeneous regions of the genome; this is an emerging and increasingly important problem in genomic computing, because regions carry important signals, and creating new biological or clinical knowledge requires integrating these signals into meaningful messages. We focus specifically on how the GeCo project contributes to tertiary data analysis, giving an overview of the project's main results so far and describing its future scenarios.


  • Genomic computing
  • Data translation and optimization
  • Cloud computing
  • Next generation sequencing
  • Open data



This research is funded by the ERC Advanced Grant project GeCo (Data-Driven Genomic Computing), No. 693174, 2016-2021.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy
