Objective

Dall’s sheep (Ovis dalli dalli) are endemic to alpine ecosystems of northwestern North America, and their populations have been declining in recent decades [1,2,3,4]. Climate change may be altering alpine plant communities and contributing to these declines. Dall’s sheep have a generalist plant diet; traditional observational methods documented them eating 110 different plant species in the Yukon Territory, Canada [5]. However, the diet of Dall’s sheep remains relatively poorly characterized, which leaves a gap in understanding how climate change is affecting plant-animal interactions in alpine ecosystems.

The level of taxonomic resolution of items consumed in a diet study greatly affects ecological analysis [6]. DNA-based tools can infer diet composition with higher resolution and reduce cost, time, and effort compared to observational, morphological, and microhistological methods [7, 8]. Specifically, DNA metabarcoding uses universal primers for multispecies identification to mass-amplify DNA barcodes by PCR; the barcodes are then read using next-generation sequencing and assigned to the appropriate taxon [9]. DNA barcoding relies on a reference database of potential diet components, which provides the capability to identify diet items to a desirable taxonomic resolution and helps ensure that all components will be detected and assigned [10]. Next-generation sequencing of DNA from fecal samples has been successfully used to characterize diets of a variety of species, including ungulates [11, 12]. However, metabarcoding has not yet been used to assess the diet of Dall’s sheep. Lack of sequence data for some arctic/alpine plants known to be grazed upon by Dall’s sheep currently limits the development and application of metabarcoding for alpine herbivore diet studies.

To improve capabilities for diet analysis of Dall’s sheep and other arctic herbivores, we used a Python script [13] to identify gaps in archived nucleotide sequence data for species known to comprise the diet of Dall’s sheep, then obtained specimens of 16 species of arctic/alpine vascular plants for which sequence information was missing or underrepresented in publicly archived databases. We then sequenced the rbcL gene of the plant chloroplast genome, which is one of the most commonly used barcoding regions for plants [9, 14].
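The gap-identification step can be illustrated with a minimal sketch. This is not the script cited as [13]: the function names, the Entrez query format, and the `min_records` threshold below which a species is flagged are all assumptions made for illustration. The sketch counts rbcL nucleotide records per species through the NCBI E-utilities `esearch` endpoint and returns the species with few or no archived records.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(species, gene="rbcL"):
    """Build an Entrez search term for one species and gene (assumed format)."""
    return f'"{species}"[Organism] AND {gene}[Gene]'


def ncbi_record_count(species, gene="rbcL"):
    """Return the number of matching nucleotide records (requires network)."""
    params = urllib.parse.urlencode(
        {"db": "nuccore", "term": build_query(species, gene), "retmode": "json"}
    )
    with urllib.request.urlopen(f"{ESEARCH_URL}?{params}") as response:
        result = json.load(response)
    return int(result["esearchresult"]["count"])


def find_gaps(species_list, count_fn=ncbi_record_count, min_records=1):
    """Return {species: count} for species with fewer than min_records records.

    count_fn is injectable so the logic can be checked without network access.
    """
    counts = {species: count_fn(species) for species in species_list}
    return {sp: n for sp, n in counts.items() if n < min_records}
```

Calling `find_gaps` with a list of diet species would return a dictionary of candidates for new sequencing; raising `min_records` is one hypothetical way to also capture species that are present but underrepresented.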

Data description

Plant tissue was obtained from herbarium specimens collected at various arctic or alpine sites across mainland Alaska (Additional file 1). DNA was extracted from the plant tissue at the U.S. Geological Survey Alaska Science Center, employing a CTAB-PVP protocol modified from Stewart and Via [15] as reported by Muñiz-Salazar et al. [16]. Extracts were quantified and shipped to the School of Environmental and Forest Sciences Genetics Lab at the University of Washington for PCR amplification and NexteraXT library preparation for sequencing. The rbcL gene region of each specimen was amplified via a two-step PCR protocol [17]: a primary amplification with tailed primers (rbcLaf + adaptor, rbcLr506 + adaptor), followed by a second round of amplification to attach NexteraXT indices. Amplicons were quantified using a Qubit 4 Fluorometer (ThermoFisher) and diluted with dH2O to the recommended starting concentration for library preparation, 0.2 ng/μL (Illumina). Tagmentation, library amplification, and clean-up steps were completed according to the NexteraXT library preparation protocol (Illumina), except that New England Biolabs AMPure XP beads were used for cleanup instead of Agencourt AMPure beads. The libraries were normalized and pooled prior to sequencing on an Illumina MiSeq platform. Samples were paired-end sequenced in a 2 × 300 bp format.
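The dilution to 0.2 ng/μL follows the standard relation C1·V1 = C2·V2. The sketch below shows that arithmetic only; the function name and the final volume passed to it are hypothetical, since the text specifies the target concentration but not the volumes used.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=0.2):
    """Return (amplicon volume, water volume) in microliters via C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    # V1 = C2 * V2 / C1
    amplicon_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    # The remainder of the final volume is diluent (dH2O).
    return amplicon_ul, final_volume_ul - amplicon_ul
```

For an amplicon measured at a given Qubit concentration, the first returned value is the volume of amplicon to pipette and the second is the volume of dH2O to add.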

Illumina sequence reads were processed in Geneious Prime 2020.2.4. Forward and reverse read files (fastq) were paired upon import, then quality trimmed with BBDuk trimmer (minimum quality 20, minimum overlap 20, minimum length 20). Sequences were normalized, then aligned and assembled using the de novo assembly tool (Geneious Prime). Assembled contigs were uploaded and annotated using BankIt, then submitted to GenBank [18] (Table 1).
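To make the trimming thresholds concrete, the following simplified sketch trims bases below a minimum Phred quality from both ends of a read and discards reads that end up shorter than a minimum length. It is an illustration only and does not reproduce BBDuk's implementation or its overlap handling; the function name and the assumption of Phred+33 quality encoding are ours.

```python
def trim_read(seq, qual, min_quality=20, min_length=20, offset=33):
    """Trim low-quality bases from both ends; return None if the read is too short.

    seq and qual are the sequence and quality strings of one FASTQ record
    (Phred+33 encoding assumed).
    """
    scores = [ord(ch) - offset for ch in qual]
    start, end = 0, len(seq)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] < min_quality:
        start += 1
    # Back up past low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]
```

Applied to each forward and reverse read before assembly, a filter of this kind removes low-confidence base calls at read ends, where quality on 2 × 300 bp runs typically declines.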

Table 1 Overview of data files for arctic plant rbcL sequencing

Limitations

The following are limitations for these data files:

  1. We sequenced one DNA extraction from each plant species.

  2. The sequencing project was funded through a grant to train new users on Illumina Nextera sequencing.