Sample acquisition
Adult female spiders were acquired from Spider Pharm (Yarnell, AZ). These spiders include the following species: Parasteatoda tepidariorum, Steatoda grossa, Latrodectus geometricus, Latrodectus hesperus, Latrodectus mactans. Female spiders were confirmed to have reached adulthood by examining the epigynum prior to dissections. Each spider was treated / housed in identical conditions and fed crickets (Vita-Bug, common brown cricket – Timberline Fisheries, Marion, IL) from the same batch/lot whenever possible.
Last feeding and environmental isolation
At least six adult female spiders of each species were fed one cricket on the same day. After 24–32 h, each spider was transferred from the feeding vial into a sterile housing container. Each spider was isolated in this sterile environment for 7–8 days without any additional feedings. The goal of this isolation procedure was to “starve” the spiders and thereby reduce the background signal from the microbial community of the spider’s prey.
Twelve crickets were also included in this study to determine background / food sourced microbial community members. Prior to the first round of spider dissections, crickets from the same batch underwent the same isolation conditions as the spiders. An additional three crickets were used during the second round of spider dissections (for second sequencing run) to determine the cricket microbiome at the time of the respective spiders’ last feeding. These cricket controls were euthanized on the same day as the last feeding and stored immediately at − 80 °C.
Aseptic spider tissue dissections
Prior to tissue dissections, forceps, wash containers/ beakers, and dissection buffer (SSC buffer) were autoclaved and sterilized. The microscope and surrounding area were cleaned with 10% bleach and 70% ethanol. After each dissection, the forceps were sterilized with a bleach wash, ethanol wash, followed by a sterile PCR water (VWR) rinse prior to the next spider dissection. Each spider underwent surface sterilization to remove possible environmental contaminants with a 10% bleach soak for 1 min followed by two separate washes in sterile PCR water for 1 min each (adapted from Brooks, A.W., et al. [8]). Three to four individual spiders of each species had the following tissues dissected in an aseptic manner [53]: venom glands [54], cephalothorax (without chelicerae), silk glands, ovary, and fat / mid-gut region.
Each tissue was rinsed with sterile PCR-grade water prior to collection in a sterile 1.5 mL microfuge tube. Three to four individual spiders of each species (whole, no dissection) were also surface sterilized prior to transfer to a sterile 1.5 mL microfuge tube. All cricket samples were transferred to 1.5 mL tubes without surface sterilization. All samples and an aliquot of SSC buffer were frozen in liquid nitrogen and then stored at − 80 °C until DNA extraction.
DNA extractions
DNA was extracted from each spider (3 per species), spider tissue (3 sets per species), cricket, and negative control (SSC buffer and reagent-only DNA extraction controls) using Qiagen’s QIAamp DNA Mini kits. We followed Qiagen’s protocol with the following specifications: liquid nitrogen to freeze samples prior to homogenizing with a motorized pestle, 1.5 h lysis (vortexing every 20–30 min), a centrifugation step for 30 s at 6000 × g prior to transferring sample lysates to their respective columns, and two elution steps (except for venom glands), each with a 5 min room temperature incubation. Prior to the DNA extractions of large whole spiders (Latrodectus species), each individual spider was divided in half with sterilized razor blades and forceps and the mass of each half was measured. DNA was extracted from both halves separately to avoid overloading the columns. Eluted DNA was combined in equal ratio based on the pre-processed weight. Each sample type had an optimal elution volume, based on the size of the tissue or whether the sample was a whole organism (spider or cricket). These elution volumes were as follows: whole spider = 400 µL, cricket = 200 µL, cephalothorax = 200 µL, venom glands = 50 µL, ovaries = 100 µL, silk glands = 100 µL, fat = 100 µL.
DNA extractions were performed aseptically, with re-directed airflow, and while wearing a facial mask to reduce the risk of contaminating the samples with exhaled bacteria. The extracted DNA was then quantified with ThermoFisher’s Quant-iT dsDNA High Sensitivity kit.
16S rRNA gene amplicon library preparation and sequencing
The standard methods for taxonomic classification of bacteria within a microbial community utilize the small ribosomal subunit (16S rRNA) gene. The 16S rRNA gene contains nine hyper-variable regions of various lengths. The variable regions with the highest confidence for identifying bacteria down to the genus and species level to date are the V1-V2 and V1-V3 regions [55, 56]. The V1-V2 region was selected for this study because it is ~ 310 bp long, an appropriate length for higher quality paired-end sequencing with the Illumina MiSeq. Furthermore, utilizing the V1-V2 target is 90% accurate for identifying bacteria at the species level and 92% accurate at the genus level [55]. Prior to commencing this study, over twenty spider samples (whole and tissue) were used to test different sets of universal polymerase chain reaction (PCR) primers that target the V1-V2 region (27F-338R) and V3-V4 region (338F-786R). The 27F (5′-AGAGTTTGATCMTGGCTCA-3′ – slightly modified from Brooks et al.) and 338R (5′-GCTGCCTCCCGTAGGAGT-3′) universal 16S primers amplified the expected ~ 310 bp V1-V2 region from > 90% of test samples [8].
The V1 and V2 variable regions of the 16S rRNA gene were amplified from the extracted DNA and mock community DNA control (ZymoBIOMICS™ Microbial Community Standard from Zymo Research) utilizing universal PCR primers, 27F and 338R [8]. PCR was completed in a two-step process (PCR-1 and PCR-2) in order to yield significant PCR product with a unique molecular barcode for each sample’s 16S amplicons [49, 57, 58]. We designed custom primers containing V1 and V2 regions following the 16S primer design protocol by Kozich and Schloss [59], where the 27F and 338R primers contain a unique 8 bp barcode on each primer, a short Linker/ Pad sequence and the appropriate Illumina adaptor sequence (i5 or i7) (see Table S1 for list of primer sequences). Preliminary data showed that two-step PCR yielded significantly better results (consistent visible bands from gel electrophoresis) than nested-PCR. These multi-step PCR processes were also compared with single step PCR, where single-step PCR resulted in inconsistent and/ or low amplification of the spider microbiome DNA samples.
Extracted DNA from each sample and all negative controls (SSC buffer, negative extraction controls, and PCR water as a non-template control) were run through one round of PCR-1 using a 12.5 µL reaction with Q5 high fidelity master-mix (New England BioLabs, Inc.) with the following cycling conditions: 98 °C for 30 s, 25 cycles of 98 °C for 30 s (denature), 50 °C for 30 s (anneal), and 72 °C for 30 s (extension), with a final extension step at 72 °C for 10 min and end hold at 4 °C. PCR-2 included 2–3 µL of PCR-1 product as template DNA. Four PCR-2 replicate 25 µL reactions, using Q5 high fidelity master-mix, were generated per sample (3 with sample PCR-1 product and 1 as a non-template control). The conditions for PCR-2 were as follows: 98 °C for 30 s, 15 cycles of 98 °C for 30 s (denature), 50 °C for 30 s (anneal), and 72 °C for 30 s (extension), with a final extension step at 72 °C for 10 min and end hold at 4 °C. The PCR-2 product replicates were combined per sample and purified with AMPure XP beads in a 1.8X bead-to-product ratio [60]. Each purified sample was then normalized to the same molar mass using Qubit Fluorometric Quantification (ThermoFisher Scientific). Two final normalized, pooled sample libraries and custom sequencing primers (Table S2) were sent to Cornell University’s Genomics Facility (according to their protocol) for two runs of paired-end sequencing (2 × 250 bp), with a 10% PhiX spike-in, on an Illumina MiSeq following the Kozich and Schloss MiSeq protocol [59]. The concentration of sequencing primers was doubled for the second round of sequencing in order to increase the number of high-quality reads.
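As an illustration of the normalization step, the sketch below converts a fluorometric dsDNA reading (ng/µL) into a molar concentration and computes the volume needed to pool an equal molar amount of each library. It assumes the conventional average of 660 g/mol per base pair of dsDNA. The function names, the example concentration, and the amplicon length are hypothetical and are not taken from the study.

```python
def dsdna_nanomolar(ng_per_ul, length_bp, mw_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/uL) to nM.

    Assumes an average molecular weight of 660 g/mol per base pair.
    """
    return ng_per_ul * 1e6 / (mw_per_bp * length_bp)


def volume_for_target(ng_per_ul, length_bp, target_fmol):
    """Volume (uL) that contains target_fmol of amplicon.

    1 nM is equal to 1 fmol/uL, so volume = amount / concentration.
    """
    return target_fmol / dsdna_nanomolar(ng_per_ul, length_bp)


# Hypothetical library: 6.6 ng/uL of a 100 bp product, pooling 50 fmol.
concentration_nm = dsdna_nanomolar(6.6, 100)
pool_volume_ul = volume_for_target(6.6, 100, 50)
print(concentration_nm, pool_volume_ul)
```

In practice the length used should be the full amplicon including barcodes, linker/pad and adaptor sequences, not only the ~ 310 bp target.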
Microbial community data analysis
Pre-processing of sequences and initial quality control
The Quantitative Insights into Microbial Ecology (QIIME) program was utilized for pre-processing sequencing reads and microbial community analyses [61]. QIIME 1 was used to add barcodes to the read files (merge_bcs_reads.py), extract barcode sequences from the reads (extract_barcodes.py), join overlapping paired-end reads (join_paired_ends.py), and lastly demultiplex the joined reads according to their respective barcodes and sample IDs (split_libraries_fastq.py). After joining the paired-end reads (un-joined reads were removed from downstream analyses), the demultiplexing script also passes reads through quality filtering (reads < Q20 were removed from the dataset). A total of 896,429 reads out of 3,578,685 passed initial quality filtering for the first run and 5,172,436 reads out of 10,211,041 passed from the second sequencing run (large percentage of reads lost to PhiX spike-in and joining-step).
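The read counts reported above can be expressed as retention rates with a short calculation. The sketch below uses only the totals stated in the text; the variable and function names are illustrative.

```python
# Reads passing initial quality filtering versus total reads, per run
# (values as reported in the text).
runs = {
    "run 1": (896_429, 3_578_685),
    "run 2": (5_172_436, 10_211_041),
}


def percent_retained(passed, total):
    """Percentage of reads that passed joining and quality filtering."""
    return 100.0 * passed / total


for name, (passed, total) in runs.items():
    print(f"{name}: {percent_retained(passed, total):.1f}% of reads retained")
```

Roughly a quarter of the reads from the first run and about half of the reads from the second run were retained, consistent with the higher yield reported after the sequencing primer concentration was doubled.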
Sequence dereplication, chimera checking, OTU picking, and taxonomy assignments with QIIME
The resulting joined, demultiplexed and high quality reads from each MiSeq run are contained in their run-specific seqs.fna output file and were imported into QIIME 2 [62]. Each set of sequences were dereplicated and de novo chimera-checked via the VSEARCH plug-in tool [63]. Chimeric reads (i.e. PCR artifacts/biases from parental strands acting as primers during PCR – hybrid 16S sequences from two different species of bacteria that artificially affect diversity estimates) were removed from the dataset to reduce the impact of PCR errors prior to Operational Taxonomic Unit (OTU) clustering and diversity analyses [8, 64]. The feature-table plug-in was then utilized to merge the two sets of resulting high quality sequences (merge-seqs option) and feature/ OTU tables (merge option) from each of the runs together for downstream analyses. The resulting feature table and sequences files were run through open-reference OTU picking with VSEARCH utilizing SILVA’s 16S QIIME formatted database (release 128–99% identity sequences) at a 99% identity threshold for clustering [63, 65,66,67]. Low abundance OTUs were filtered out from the resulting feature table, where OTUs with a frequency of less than 10 sequences across all samples were removed. Taxonomy assignments were also generated with SILVA rRNA Database (release 128–99% consensus taxonomy, 7 levels) by extracting out only the 16S V1-V2 regions that correspond to the 27F – 338R primers used for sample library preparation – truncation length of 500 bp (feature-classifier plug-in, extract-reads option) [68]. QIIME 2’s Naïve Bayes classifier was trained to these extracted V1-V2 reference sequences with the feature-classifier plug-in (fit-classifier-naive-bayes). We then utilized this V1-V2 trained classifier set to complete taxonomic assignments with the classify-sklearn feature-classifier plug-in option. 
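The low-abundance filter described above (removing OTUs with fewer than 10 sequences across all samples) can be sketched in plain Python as follows. This is a minimal illustration, not the QIIME 2 implementation; the table layout (OTU mapped to per-sample counts) and all identifiers and counts are hypothetical.

```python
def filter_low_abundance(table, min_total=10):
    """Keep OTUs whose summed count across all samples is >= min_total."""
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical feature table: OTU -> {sample: sequence count}.
toy_table = {
    "OTU_1": {"sample_A": 120, "sample_B": 45},
    "OTU_2": {"sample_A": 4, "sample_B": 5},   # total 9, removed
    "OTU_3": {"sample_A": 0, "sample_B": 10},  # total 10, kept
}

filtered = filter_low_abundance(toy_table)
print(sorted(filtered))
```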
After assigning taxonomy to the OTUs, taxa plots were generated with the taxa plug-in (barplot command) and all OTUs that were classified as unassigned, chloroplasts, and/or mitochondria were filtered out from the feature/OTU table (taxa filter-table command) and representative sequences. The feature table for each tissue sample group was summarized (feature-table summarize command), and taxa barplots were generated and reviewed via QIIME 2 View. The resulting level-3 (class) and level-6 (genus) csv files were analyzed in R with the following packages: dplyr, tidyr, stringr, and digest [69,70,71,72,73]. The OTUs that made up at least 2% relative abundance across each tissue set were used to generate barplots with ggplot2 [74].
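The 2% relative abundance cutoff can be illustrated with the sketch below. The study performed this step in R; Python is used here only for illustration. The sketch assumes that relative abundance is computed from counts pooled across all samples in a tissue set, which is one possible reading of the text, and all taxon names and counts are hypothetical.

```python
def taxa_above_threshold(table, threshold=0.02):
    """Return {taxon: relative abundance} for taxa at or above threshold.

    Relative abundance is computed from counts pooled over all samples
    in the table (an assumption, see the lead-in).
    """
    grand_total = sum(sum(counts.values()) for counts in table.values())
    kept = {}
    for taxon, counts in table.items():
        fraction = sum(counts.values()) / grand_total
        if fraction >= threshold:
            kept[taxon] = fraction
    return kept


# Hypothetical genus-level table for one tissue set.
toy_tissue_set = {
    "Genus_A": {"rep1": 90, "rep2": 80},
    "Genus_B": {"rep1": 9, "rep2": 19},
    "Genus_C": {"rep1": 1, "rep2": 1},  # 1% of the pooled total, dropped
}

abundant = taxa_above_threshold(toy_tissue_set)
print(abundant)
```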
Alignments were completed on the set of representative sequences with MAFFT (Alignment MAFFT plug-in) [75] and the unconserved, highly gapped columns in these aligned sequences were masked with the alignment plug-in, mask command [76]. A phylogenetic tree was then generated with the FastTree 2 tool (Phylogeny plug-in) using a Maximum-Likelihood method [77]. The resulting tree was then midpoint rooted (Phylogeny plug-in, midpoint-root option) and utilized for downstream beta-diversity analyses.
Mock community standard and quality assurance measures
The data from the first sequencing run underwent quality control measures (Table S3) to ensure the sample library preparation steps and sequencing performed as expected prior to moving forward with sample library preparation and second sequencing run for the majority of the spider samples. This step was also completed to determine if the OTU clustering threshold of 99% was appropriate for the data analysis pipeline, in order to reduce the potential of erroneously generating OTUs. Quality control assessments were completed by comparing the percentage of each mock community member present within the first sequencing run dataset to the theoretical/ expected mock community composition as provided by the manufacturer, Zymo Research. The preliminary results of the mock community analysis indicated that the sample library preparation method and sequencing specifications were appropriate and accurately measured the mock community composition. Furthermore, one of the whole P. tepidariorum replicate samples (note: total of 3 replicates per sample) from the first sequencing run was repeated through the library preparation process and second MiSeq run as a positive control to test the differences between runs.
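The comparison between the observed and expected mock community composition amounts to a per-taxon difference in percent abundance. A minimal sketch follows; the taxon labels, read counts, and expected percentages are placeholders and are not the manufacturer's values or the study's results.

```python
def percent_composition(counts):
    """Convert raw read counts per taxon into percentages of the total."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def deviation_from_expected(observed_counts, expected_percent):
    """Observed minus expected percentage for each expected taxon.

    Taxa expected but not observed are treated as 0% observed.
    """
    observed = percent_composition(observed_counts)
    return {
        taxon: observed.get(taxon, 0.0) - expected
        for taxon, expected in expected_percent.items()
    }


# Placeholder values for illustration only.
observed_counts = {"Taxon_A": 50, "Taxon_B": 30, "Taxon_C": 20}
expected_percent = {"Taxon_A": 40.0, "Taxon_B": 40.0, "Taxon_C": 20.0}

deviations = deviation_from_expected(observed_counts, expected_percent)
print(deviations)
```

Large positive or negative deviations for individual members would point to amplification bias or to OTUs being split or merged at the chosen clustering threshold.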
Diversity analyses
After OTU clustering, taxonomy assignments, taxonomy-based filtering, and 16S rRNA gene alignments, the feature/OTU table was rarefied based on the depths of coverage per sample type. A depth of 7032 randomized sequences per sample was selected for the core-metrics beta-diversity analyses (prior to any grouping of replicate samples) because this depth retained all but one spider tissue sample (this sample contained a very high percentage of chloroplast-related OTUs, which were filtered out in upstream data processing steps) and was supported by the alpha-diversity results from the diversity plug-in alpha-rarefaction command run on spider samples (Fig. S1A & B, depth range of 18–18,832 sequences). QIIME 2’s diversity plug-in core-metrics-phylogenetic command was run on all the filtered samples (crickets included), on each set of tissue types, and then on only the spider samples (crickets removed), in order to best resolve the resulting UniFrac distance matrix derived EMPeror PCoA plots [78]. Box-plots were generated, along with group significance testing (Kruskal-Wallis one-way analysis of variance), from each resulting alpha-diversity vector file utilizing the diversity plug-in alpha-group-significance command [79]. Statistical analyses were completed with ANCOM testing to determine significant changes in the abundances of the microbial community members between and across samples from each tissue type [80]. Sample (biological) replicates were then grouped with the feature-table group command (mean-ceiling, i.e. average frequency of each OTU across sample replicates) by base sample type: whole spiders, cephalothorax, venom glands, ovaries, silk glands and fat tissue. This produced one grouped table per tissue type (6 grouped tables in total); only these grouped tables, not individual replicate samples, were used for the beta-diversity dendrograms for phylosymbiosis analyses.
A master grouped feature table was also generated for all the spider samples, grouping each replicate sample by the mean-ceiling option. Each of these grouped feature tables was summarized (feature-table summarize command), and taxa barplots were generated and reviewed via QIIME 2 View. Each grouped tissue type feature table was then passed through the diversity beta-rarefaction command with a selected depth of coverage determined by the lowest sample sequence/feature count per each set of grouped tissue samples (Whole Spiders = 14,509, Cephalothorax = 20,799, Venom = 20,826, Ovaries = 28,629, Silk = 28,628, Fat = 17,074) for the following beta-diversity tests: Bray-Curtis, Jaccard, Unweighted UniFrac [81] and Weighted UniFrac [78]. Furthermore, a PERMANOVA test was conducted to look at pairwise differences in beta diversity (Bray-Curtis distances) between species or tissues, as defined by the conditions of the ADONIS function within QIIME (Supplemental Table 8) [82].
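Two operations in this paragraph, rarefying each sample to a fixed depth and grouping replicates by the ceiling of the mean count, can be sketched in plain Python. This is an illustration of the general technique under simple assumptions (subsampling without replacement, samples below the depth dropped), not a reproduction of the QIIME 2 code; function names and counts are hypothetical.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample a sample's OTU counts to exactly `depth` sequences.

    Sampling is without replacement. Returns None when the sample has
    fewer than `depth` sequences, i.e. the sample is dropped.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for otu in drawn:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def mean_ceiling(replicates):
    """Group replicate samples by the ceiling of the mean count per OTU."""
    otus = set().union(*replicates)  # union of all OTU identifiers
    n = len(replicates)
    return {
        otu: math.ceil(sum(rep.get(otu, 0) for rep in replicates) / n)
        for otu in otus
    }


# Hypothetical sample with 8032 sequences, rarefied to the depth of 7032.
sample = {"OTU_1": 5000, "OTU_2": 3000, "OTU_3": 32}
rarefied = rarefy(sample, 7032)
grouped = mean_ceiling([{"OTU_1": 1, "OTU_2": 4}, {"OTU_1": 2}, {"OTU_1": 2, "OTU_2": 1}])
print(sum(rarefied.values()), grouped)
```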
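For reference, the two non-phylogenetic beta-diversity measures named above can be written out directly. The sketch below computes Bray-Curtis dissimilarity from counts and Jaccard distance from presence/absence for a pair of samples; the sample contents are hypothetical, and the UniFrac measures are omitted because they additionally require the phylogenetic tree.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    numerator = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    denominator = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return numerator / denominator


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |A and B| / |A or B|."""
    present_a = {o for o, n in a.items() if n > 0}
    present_b = {o for o, n in b.items() if n > 0}
    return 1.0 - len(present_a & present_b) / len(present_a | present_b)


# Hypothetical grouped samples sharing one of three OTUs.
sample_1 = {"OTU_x": 1, "OTU_y": 1}
sample_2 = {"OTU_x": 1, "OTU_z": 1}

print(bray_curtis(sample_1, sample_2), jaccard(sample_1, sample_2))
```

Both measures range from 0 (identical communities) to 1 (no shared OTUs); Bray-Curtis is sensitive to abundance while Jaccard considers only membership.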
RNA sequencing metatranscriptome analysis
Publicly available RNA sequencing data from previous research completed by Garb et al., specific to silk, venom, and ovary glands, were acquired from NCBI for the following species: P. tepidariorum, S. grossa, L. geometricus, and L. hesperus. The collection sites for these spiders are as follows: P. tepidariorum (silk, venom, and ovary) - Cologne, Germany, out-bred with P. tepidariorum spiders from Spider Pharm/Arizona, United States; L. hesperus (silk, venom and ovary) - Riverside County, California, United States; L. geometricus (silk, venom, and ovary) - San Diego County, California, United States. S. grossa was obtained directly from Spider Pharm (similar to samples used for 16S sequencing). The following SRA files were downloaded from NCBI’s SRA database: SRR1219665, SRR1824489, SRR5131057, SRR5131058, SRR5285094, SRR5285095, SRR5285096, SRR5285099, SRR5285100, SRR5285114, SRR5285115, SRR5285118, SRR5285121, SRR5285122, SRR5285123, SRR5285135, SRR5285136, SRR5285138, SRR5285141 and SRR5285142 [21, 53, 83, 84]. Fastq files were extracted from each SRA file utilizing the SRAToolkit, fastq-dump command [85]. The extracted fastq files were run through FastQC and then the adaptors and poor quality bases were trimmed with Trimmomatic [86, 87]. The Trimmomatic parameters utilized for trimming the raw reads are as follows: crop_length = total read length – 1 (between 75 and 100 depending on library), seed_mismatches = 2, paired_end_seed_score = 30, min_adapter_length = 2, keepBothReads = true, sliding_window_size = 10, sliding_window_minimum_average_phred_score = 15, min_length_to_keep_reads = 36, trimmomatic_threads = 8 [86, 87]. Corresponding Read 1 and Read 2 sequences for each species’ silk, venom or ovary glands were concatenated, gzipped and uploaded to CosmosID for identification of any possible microbial transcripts/sequences, as CosmosID accepts and analyzes raw or processed read data [88].
CosmosID takes raw, unassembled reads and matches them to the GenBook® database, utilizing statistical and computational methods [89].
Phylosymbiosis analyses
The following analysis methods were used to determine if there is evidence of phylosymbiosis across the selected widow related spider phylogeny as described by Brooks et al. [8]. Mitochondrial COI gene sequences were obtained from a previous study, where a 428–659 bp fragment was sequenced for each spider species represented in this study [22]. The five resulting mtCOI sequences were aligned with ClustalX v2.1 in multiple alignment mode [90]. The aligned sequences were exported in Newick format and run through jModeltest (2.1.10 v20160303) to determine the best substitution model for generating a widow spider phylogenetic tree [27, 91]. The aligned sequences were then utilized to generate a Maximum-Likelihood tree with RAxML v8.2.11 (GTR + γ substitution model and 10,000 iterations) [26]. The resulting tree was viewed and rooted with the P. tepidariorum branch utilizing FigTree v1.4.3 [92]. The resulting widow phylogenetic tree and each of the tissue specific and whole spider microbial beta-diversity dendrograms were tested for congruency by utilizing Robinson-Foulds Cluster and Matching Cluster tests to compute the distances (dissimilarity) between the host phylogeny and each microbial beta-diversity tree (both trees rooted) [8, 28].
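To make the congruence test concrete, the sketch below computes a cluster-based Robinson-Foulds distance between two rooted trees as the number of clusters (sets of leaves below an internal node) found in one tree but not the other. It is a simplified illustration: the tree topologies are hypothetical, trees are encoded as nested tuples rather than Newick, the raw symmetric difference is reported without the halving or normalization that some tools apply, and the Matching Cluster metric is not implemented.

```python
def clusters(tree):
    """Non-trivial clusters of a rooted tree given as nested tuples.

    A cluster is the set of leaf names below an internal node. Single
    leaves and the full leaf set (the root) are excluded.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    found.discard(leaves(tree))  # drop the root cluster
    return found


def rf_cluster_distance(tree_1, tree_2):
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_1) ^ clusters(tree_2))


# Hypothetical topologies, for illustration only.
host_tree = ("P_tepidariorum", ("S_grossa", ("L_geometricus", ("L_hesperus", "L_mactans"))))
microbe_tree = ("P_tepidariorum", ("L_geometricus", ("S_grossa", ("L_hesperus", "L_mactans"))))

print(rf_cluster_distance(host_tree, host_tree))
print(rf_cluster_distance(host_tree, microbe_tree))
```

A distance of zero indicates that the microbial dendrogram recovers every clade of the host phylogeny, which is the pattern expected under phylosymbiosis.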