The Bear Giant-Skipper genome suggests genetic adaptations to living inside yucca roots

Abstract

Giant-Skippers (Megathymini) are unusual thick-bodied, moth-like butterflies whose caterpillars feed inside Yucca roots and Agave leaves. Giant-Skippers are attributed to the subfamily Hesperiinae and are endemic to southern and mostly desert regions of the North American continent. To shed light on the genotypic determinants of their unusual phenotypic traits, we sequenced and annotated a draft genome of the largest Giant-Skipper species, the Bear (Megathymus ursus violae). The Bear skipper genome is the least heterozygous among sequenced Lepidoptera genomes, possibly due to a much smaller population size and extensive inbreeding. This low heterozygosity helped us obtain a high-quality genome with an N50 of 4.2 Mbp. The ~430 Mb genome encodes about 14,000 proteins. Phylogenetic analysis supports placement of Giant-Skippers with Grass-Skippers (Hesperiinae). We find that proteins involved in odorant and taste sensing as well as in oxidative reactions have diverged significantly in Megathymus as compared to Lerema, another Grass-Skipper. In addition, the Giant-Skipper has lost several odorant and gustatory receptors and possesses far fewer anti-oxidative enzymes (1/3–1/2 as many as other skippers). Such differences may be related to the unusual lifestyle of Giant-Skippers: they do not feed as adults, and their caterpillars feed inside Yuccas and Agaves, which provide a source of antioxidants such as polyphenols.
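For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper; the abstract does not say whether the reported value refers to contigs or scaffolds, nor which tool was used to compute it.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence down, accumulating total length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which at least half the assembly is covered is the N50.
        if running >= half_total:
            return length


# Toy lengths for illustration only (not data from the Bear skipper assembly).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

A larger N50 means that half of the assembled bases sit in longer sequences, which is why low heterozygosity, by simplifying assembly, tends to raise it.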

Acknowledgements

We thank Lisa N. Kinch for suggestions and proofreading of the manuscript. We are grateful to Texas Parks and Wildlife Department (Natural Resources Program Director David H. Riskind) for the research permit #08-02Rev. Qian Cong was a Howard Hughes Medical Institute International Student Research fellow when these studies were performed. We thank Greg M. Lasley for the photograph of a live male shown in Fig. 4B.

Funding

This work was funded in part by the National Institutes of Health (GM094575 and GM127390 to NVG) and the Welch Foundation (I-1505 to NVG).

Author information

Contributions

QC and NVG collected the specimens; QC designed and carried out the experiments, performed the computational analyses and drafted the manuscript; WL performed the analysis of genome quality; DB and ZO designed and supervised experimental studies; NVG directed the project and drafted sections of the manuscript. All authors wrote the manuscript.

Corresponding author

Correspondence to Nick V. Grishin.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Human and animal rights

This article does not contain any studies with human participants performed by any of the authors. All applicable international, national, and institutional guidelines for the care and use of animals were followed.

Availability of supporting data

See the Supplemental Information for the details of our protocols. Major scripts used in this project and intermediate results are made available at http://prodata.swmed.edu/LepDB/.

Additional information

Communicated by S. Hohmann.

Cite this article

Cong, Q., Li, W., Borek, D. et al. The Bear Giant-Skipper genome suggests genetic adaptations to living inside yucca roots. Mol Genet Genomics 294, 211–226 (2019). https://doi.org/10.1007/s00438-018-1494-6

Keywords

  • Skipper butterflies
  • Root borers
  • Comparative genomics
  • Antioxidants