Molecular Genetics and Genomics, Volume 294, Issue 1, pp 211–226

The Bear Giant-Skipper genome suggests genetic adaptations to living inside yucca roots

  • Qian Cong
  • Wenlin Li
  • Dominika Borek
  • Zbyszek Otwinowski
  • Nick V. Grishin (corresponding author)
Original Article


Abstract

Giant-Skippers (Megathymini) are unusual thick-bodied, moth-like butterflies whose caterpillars feed inside Yucca roots and Agave leaves. They are assigned to the subfamily Hesperiinae and are endemic to the southern, mostly desert regions of the North American continent. To shed light on the genotypic determinants of their unusual phenotypic traits, we sequenced and annotated a draft genome of the largest Giant-Skipper species, the Bear (Megathymus ursus violae). The Bear skipper genome is the least heterozygous among sequenced Lepidoptera genomes, possibly due to a much smaller population size and extensive inbreeding. This low heterozygosity helped us obtain a high-quality genome assembly with an N50 of 4.2 Mb. The ~430 Mb genome encodes about 14,000 proteins. Phylogenetic analysis supports placement of Giant-Skippers with Grass-Skippers (Hesperiinae). We find that proteins involved in odorant and taste sensing, as well as in oxidative reactions, have diverged significantly in Megathymus compared to Lerema, another Grass-Skipper. In addition, the Giant-Skipper has lost several odorant and gustatory receptors and possesses far fewer anti-oxidative enzymes (1/3–1/2 as many as other skippers). Such differences may be related to the unusual lifestyle of Giant-Skippers: they do not feed as adults, and their caterpillars feed inside Yuccas and Agaves, which provide a source of antioxidants such as polyphenols.
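For readers unfamiliar with the N50 statistic quoted for the assembly, the following is a minimal sketch of how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    # Sort scaffolds from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running sum to at least
        # half of the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical toy assembly, not data from the Bear Giant-Skipper genome.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

In the toy example the total length is 300, and the two longest scaffolds (80 + 70 = 150) already cover half of it, so the N50 is 70. A larger N50 indicates that more of the assembly sits in long scaffolds.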


Keywords

Skipper butterflies, Root borers, Comparative genomics, Antioxidants



Acknowledgements

We thank Lisa N. Kinch for suggestions and proofreading of the manuscript. We are grateful to Texas Parks and Wildlife Department (Natural Resources Program Director David H. Riskind) for the research permit #08-02Rev. Qian Cong was a Howard Hughes Medical Institute International Student Research fellow when these studies were performed. We thank Greg M. Lasley for the photograph of a live male shown in Fig. 4B.

Author contributions

QC and NVG collected the specimens; QC designed and carried out the experiments, performed the computational analyses, and drafted the manuscript; WL performed the analysis of genome quality; DB and ZO designed and supervised experimental studies; NVG directed the project and drafted sections of the manuscript. All authors wrote the manuscript.


Funding

This work was funded in part by the National Institutes of Health (GM094575 and GM127390 to NVG) and the Welch Foundation (I-1505 to NVG).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no competing interests.

Human and animal rights

This article does not contain any studies with human participants performed by any of the authors. All applicable international, national, and institutional guidelines for the care and use of animals were followed.

Availability of supporting data

See the Supplemental Information for the details of our protocols. Major scripts used in this project and intermediate results are made available at

Supplementary material

  1. Supplementary material 1: 438_2018_1494_MOESM1_ESM.xlsx (XLSX, 72 KB)
  2. Supplementary material 2: 438_2018_1494_MOESM2_ESM.xlsx (XLSX, 10 KB)
  3. Supplementary material 3: 438_2018_1494_MOESM3_ESM.xlsx (XLSX, 6328 KB)
  4. Supplementary material 4: 438_2018_1494_MOESM4_ESM.xlsx (XLSX, 77 KB)
  5. Supplementary material 5: 438_2018_1494_MOESM5_ESM.xlsx (XLSX, 47 KB)
  6. Supplementary material 6: 438_2018_1494_MOESM6_ESM.xlsx (XLSX, 56 KB)
  7. Supplementary material 7: 438_2018_1494_MOESM7_ESM.xlsx (XLSX, 89 KB)
  8. Supplementary material 8: 438_2018_1494_MOESM8_ESM.xlsx (XLSX, 55 KB)
  9. Supplementary material 9: 438_2018_1494_MOESM9_ESM.xlsx (XLSX, 50 KB)



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, USA
  2. Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, USA
