Comparative Genomics in Drosophila

  • Martin Oti
  • Attilio Pane
  • Michael Sammeth
Part of the Methods in Molecular Biology book series (MIMB, volume 1704)


Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have contributed tremendously to unveiling the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes of 12 Drosophila species is a milestone in modern biology. It has enabled a wide range of studies, from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data produced over the past 15 years is far from fully explored.

In this chapter, we review some of the bioinformatic approaches that were developed to interrogate the genomes of the 12 Drosophila species. Starting from alignments of the entire genomic sequences, one can evaluate the degree of conservation separately for every region of the genome, which provides first hints about elements that are under purifying selection and therefore likely functional. Furthermore, the careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids the computational identification of the transcriptionally active part of the genome: first and foremost protein-coding loci, but also transcribed yet apparently noncoding regions, which were once considered “junk” DNA. Finally, the synergy between functional and comparative genomics facilitates in silico and in vivo studies of cis-acting regulatory elements, such as transcription factor binding sites, whose high degree of sequence variability usually poses greater challenges for bioinformatic approaches.
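The idea of scoring conservation region by region from an alignment can be illustrated with a minimal sketch. It is not one of the methods reviewed in the chapter: it uses plain per-column identity rather than a phylogenetic model, and the function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column score: fraction of rows carrying the most common base.

    Gaps ('-') never count as matches, so gappy columns score lower.
    This is simple identity, not a phylogenetic conservation model.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    scores = []
    for column in zip(*alignment):
        bases = [base.upper() for base in column if base != "-"]
        if not bases:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def window_means(scores, size):
    """Mean score of every full sliding window of the given size."""
    if size <= 0:
        raise ValueError("window size must be positive")
    return [
        sum(scores[i:i + size]) / size
        for i in range(len(scores) - size + 1)
    ]


# Toy alignment of three hypothetical species (rows) over four columns.
toy_alignment = ["ACGT", "ACGA", "AC-T"]
per_column = column_conservation(toy_alignment)
per_window = window_means(per_column, size=2)
```

Windows with a high mean score would be the candidates for elements under purifying selection. A realistic analysis would also account for the phylogeny and for neutral substitution rates.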

Key words

Comparative genomics; Drosophila 12 genomes project; Multiple genome alignment; Evolutionary conservation; Homology-based prediction of protein-coding genes; Noncoding RNAs; miRNAs; Transcription factor binding sites



Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  1. Institute of Biophysics Carlos Chagas Filho (IBCCF), Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
  2. Institute of Biomedical Sciences (ICB), Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
