Identifying Sequenced Eukaryotic Genomes and Transcriptomes with diArk

  • Martin Kollmar
  • Dominic Simm
Part of the Methods in Molecular Biology book series (MIMB, volume 1757)


The diArk Eukaryotic Genome Database is a manually curated and updated repository of available eukaryotic genome and transcriptome assemblies. diArk is a key resource for researchers interested in comparative eukaryotic genomics, and the entry point for browsing sequenced eukaryotes in general and for finding the species most closely related to one's own organism of interest in particular. The exponentially increasing number of sequenced species demands sophisticated search and data presentation tools. In this chapter we describe how to navigate the diArk database, keeping a first-time user in mind.

Key words

Eukaryotes, Sequenced genomes, Genome assembly, Transcriptome assembly



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  2. Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University, Göttingen, Germany
