
1 Introduction

With today’s cheap Next Generation DNA sequencing (NGS) virtually all DNA variation in genomes can be readily identified, including new mutations. Such knowledge makes the breeding process more efficient. Being able to comprehensively catalogue genome-wide DNA variation at the population-scale opens the door for genomic prediction as well as for tracking genetic variation through the breeding process.

Despite the low prices, sequencing cost is still a concern when applying whole genome approaches to a large number of samples, particularly when high sequencing depth is required. An example is mutation detection in mutant M1 populations, where induced mutations are in a hemizygous, and often chimeric, state. For many genomics-supported breeding applications it is sufficient to sequence only a representative subset of the genome, which saves cost. Several approaches exist to achieve such complexity reduction. One of them is ‘target sequence capture’, a molecular biology procedure that enriches for predefined regions of the genome (targets) prior to sequencing. Probes complementary to target DNA sequences are designed at large scale and used to capture, i.e., pull out, the desired molecules from sequencing libraries, thereby enriching for target molecules. The enriched libraries are then subjected to Next Generation Sequencing (NGS), and the resulting sequencing data mostly consist of sequences representing the target regions. In the case of exome capture, those target regions are the predicted genes, the exome.

Applying target capture requires an up-front investment: it needs prior knowledge of the DNA sequence of the target regions and the production of probes. In the case of exome capture, which intends to enrich for (all) genes, the selection of target regions is based on a suitable reference genome and a genome annotation, which have to be available or generated. The number of exons in a eukaryotic genome is large, and the necessary number of probes can be in the hundreds of thousands. In human medical applications, including diagnostics, exome capture sequencing has been standard procedure for more than a decade (Choi et al. 2009). Exome capture sequencing has gained traction in plant breeding for important food crops with very large genomes such as wheat (Dong et al. 2020; Gardiner et al. 2019) and barley (Mascher et al. 2013; Russell et al. 2016), with several commercial suppliers offering competing exome capture panels and kits.

To enable cost-effective whole genome approaches in coffee breeding, we developed and provide an Exome Capture Kit for Coffea arabica in collaboration with Daicel Arbor Biosciences (Ann Arbor, MI, USA), hereafter “Arbor”. Coffea arabica is an allotetraploid whose genome is the result of a merger of C. eugenioides and C. canephora (Scalabrin et al. 2020). The design is based on a public C. arabica genome assembly and annotation (Cara_1.0, NCBI accession number GCF_003713225.1, derived from cultivar ‘Caturra red’, isolate CCC135-36), which we augmented with a public C. arabica chloroplast sequence (NCBI accession number: NC_008535.1).

This chapter details the design of Daicel Arbor Biosciences’ Exome Capture Kit, provides a step-by-step protocol for its use, and describes a validation experiment of exome capture sequencing of 41 indexed samples in a single capture experiment.

2 Materials

The main inputs to the exome capture procedure are a whole-genome DNA sequencing library outfitted with the appropriate adaptors and Arbor’s Exome Capture Kit. Additional requirements for equipment, consumables, and reagents are listed below. Most of these should already be at hand, as they will have been used when preparing the NGS library. For post-capture library amplification, Arbor recommends KAPA HiFi DNA polymerase.

2.1 The Exome Capture Kit

The main component of the Exome Capture Kit is a set of thousands of probes that are complementary to the thousands of target regions. They function as baits to fish their complementary targets from an NGS library in solution. In the case of this Arbor kit, the baits are biotinylated RNA molecules, and the target is the exome extracted from a publicly available coffee reference genome and annotation (NCBI).

2.1.1 Exome Capture Kit Design Details

Initial target intervals for probe design included the C. arabica chloroplast genome (NC_008535.1) in its entirety and all annotations containing the string “exon” found in the C. arabica genome assembly GCF_003713225.1 (https://www.ncbi.nlm.nih.gov/assembly/GCF_003713225.1).

The exonic intervals of the genome assembly were merged into non-overlapping regions representing 94.5 Mbp total exome space. The regions were padded with 50 nt on either side (i.e., 5’- and 3’-ends) and new overlaps re-merged, which resulted in 121.0 Mbp sequence space for initial probe design. Regions were divided into non-overlapping 100 nt intervals and the best 80 nt candidate probe hybridization site was chosen using Arbor’s proprietary algorithm. Candidate probe sequences with strong predicted affinity to regions outside of the target regions were removed. The final predicted retrievable space of the filtered probe set was estimated by aligning the remaining probes back to the genome (megablastn, BLAST+ version 2.6.0+, default parameters) and padding each probe hit with 200 nt on either side. Merging these regions results in 151.8 Mbp total genome space (represented in file DAB_CoffeeExomeV1_capspace.bed.gz), of which 87.2 Mbp overlap with the original exon region intervals (overlap represented in file: DAB_CoffeeExomeV1_exonspace.bed.gz). These files can be downloaded from the kit’s dedicated section on the Arbor website https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-custom-predesigned-community-panels/plants-and-fungi/.
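The merge, pad, and re-merge logic described above can be illustrated with a minimal Python sketch. It is not Arbor's design pipeline: the coordinates are hypothetical, intervals are treated as half-open BED-style ranges on a single contig, and merging directly adjacent intervals is an assumption made here for simplicity.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals on one sequence."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pad_and_merge(intervals: List[Interval], pad: int, seq_len: int) -> List[Interval]:
    """Pad each interval by `pad` nt on both sides, clip to the sequence, re-merge."""
    padded = [(max(0, s - pad), min(seq_len, e + pad)) for s, e in intervals]
    return merge_intervals(padded)


def total_length(intervals: List[Interval]) -> int:
    """Summed length of a list of non-overlapping intervals."""
    return sum(e - s for s, e in intervals)


# Hypothetical exon annotations on a single 2,000 nt contig (illustration only).
exons = [(100, 250), (200, 400), (480, 600), (1500, 1600)]
exome = merge_intervals(exons)
design_space = pad_and_merge(exome, pad=50, seq_len=2000)
print(exome, total_length(exome))                # merged exome space
print(design_space, total_length(design_space))  # padded space for probe design
```

In this toy example the 50 nt padding joins two neighbouring exon regions, which mirrors how padding increased the design space from 94.5 Mbp to 121.0 Mbp in the kit design.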

The probes were synthesized in four distinct sets: Subgenome “C” (=canephora), Subgenome “E” (=eugenioides), Subgenome “O” (“other” = unassigned contigs), and “Chlor” (=chloroplast). The probe sets can be used separately or combined as the user sees fit, depending on the application. To generate a pool of all nuclear genome probes, the “C” sub-genome module should comprise 47.4% of the pool by volume, the “E” sub-genome module 49.1%, and the “other” sub-genome module 3.5%. If the user aims to enrich the chloroplast as well, that module can comprise 0.1% of the final pool, though optimization for the tissue type might be required.

2.1.2 Availability of the Exome Capture Kit

The Coffee Exome V1 kit is available from Daicel Arbor Biosciences as part of their Community Panels series (https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-custom-predesigned-community-panels/plants-and-fungi/). The design ID is D10496CFEXM. Order inquiries should be directed to sales@arbor.daicel.com.

2.2 NGS Library Requirements

In principle, libraries prepared for Illumina short-read as well as third-generation long-read sequencing technologies can be used. This protocol describes the exome capture reaction for Illumina sequencing libraries with dual-index-barcoded Nextera-type adaptors. For different adaptors, such as ‘TruSeq’, the protocol is the same, but different blockers and universal amplification primers will be required. Please consult the respective manual from Arbor.

Input requirement:

    • 100–500 ng dsDNA in 7 µl

    • Nextera/Illumina short-read sequencing library

2.3 Equipment

  1. Heat block for 1.5 ml microfuge tubes.

  2. Thermal cycler (PCR machine) with heated lid, suitable for the desired vessel size.

  3. Qubit instrument or equivalent for fluorescence-based dsDNA quantification.

  4. Optional: Fragment analyser to establish the DNA fragment size distribution.

2.4 Consumables and Reagents (Non-standard)

  1. Coffee Exome V1 Capture Kit, Daicel Arbor Biosciences, Community Panel design ID D10496CFEXM

  2. Magnet for 1.5 ml Eppendorf tubes (e.g., DynaMag™-2, Invitrogen™, ThermoFisher #12321D)

  3. Magnet for PCR strips/tubes (e.g., DynaMag™-96 Side Magnet, Invitrogen™, ThermoFisher #12331D)

  4. KAPA HiFi HotStart ReadyMix (Roche #KK2601)

  5. Resuspension Buffer (self-prepared): 10 mM Tris-Cl, 0.05% Tween-20, pH 8.0–8.5

  6. Protein LoBind® Tube, 1.5 ml (Eppendorf #0030108116)

  7. Agencourt Ampure XP beads (Beckman Coulter, Agencourt #A63881)

  8. Qubit™ dsDNA HS Reagent (Invitrogen™, ThermoFisher #Q32851)

  9. Optional (when using manufacturer’s deprecated protocol version 4): xGen Universal Blockers-NXT Mix, Integrated DNA Technologies Inc. (IDT), Catalogue No. 1079584

2.5 PCR Primers

The universal amplification primers for post-capture amplification of the NGS library must match the respective NGS library type. This protocol uses Nextera-type/Illumina libraries.

| Name | Alias | Sequence^a |
|---|---|---|
| Seib_275 | Nextera libraries-universal-FWD | A*ATGATACGGCGACCACCGAGA |
| Seib_276 | Nextera libraries-universal-REV | C*AAGCAGAAGACGGCATACGAGA |

^a The star (*) denotes a phosphorothioate (PTO) bond.

3 Methods

Figure 1 provides an overview of the subsequent steps, their approximate duration, and required consumables and equipment.

Fig. 1
Workflow of the exome capture procedure with time estimates and required consumables. The flow diagram comprises the following steps: Illumina sequencing library, hybridisation mix setup, hybridisation, bead preparation, bead binding, washes, library resuspension, library amplification, library clean-up, library QC, and submission for sequencing.

Sequencing libraries are combined with various blockers (=Hybridisation Mix Setup) and then incubated with the baits/probes at 65 °C for the actual capture (=Hybridisation). The hybridisation is usually performed overnight. The next day, buffer and beads for the binding and washes are prepared, and the bait/target hybrid molecules are captured with beads (=Bead Binding). Four washes at 65 °C remove unbound and non-specifically bound DNA molecules (=Washes). The target molecule library is then recovered from the beads, amplified to the desired amount (=Library Resuspension, Library Amplification), and bead-cleaned for sequencing (=Library Cleanup, Library QC).

All reagents required for the actual capture and wash reactions are included in the Daicel Arbor Biosciences Kit. Reagents for resuspension, amplification, final bead clean-up, and QC will have to be provided by the user.

3.1 Hybridisation Mix Setup

The following describes the preparation of the baits and the setup of the hybridisation mix and the blockers mix. All consumables for the hybridisation are contained in the Exome Capture Kit.

3.1.1 Combining Baits

Pool the different sub-genome probe sets in representative ratios (see Note 1). The table below gives the amounts for one capture reaction; scale as required.

| Bait | Amount | Ratio in final pool (%) |
|---|---|---|
| C. canephora (“C”) | 2.61 µl | 47.4 |
| C. eugenioides (“E”) | 2.70 µl | 49.1 |
| Other (“O”) | 0.2 µl | 3.5 |
| Chloroplast (“Chlo”) | 1 µl of a 1:1000 dilution | ≤ 0.1 |
| Total | 6.5 µl | |
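The nuclear bait volumes in the table follow from the pool ratios. The short Python sketch below reproduces them, assuming 5.5 µl of nuclear baits per capture reaction (a value inferred here from the sum of the three nuclear rows, not stated explicitly by the manufacturer); the function name is hypothetical.

```python
# Relative share of each nuclear probe module in the pooled baits (from the table).
NUCLEAR_RATIOS = {"C": 0.474, "E": 0.491, "O": 0.035}


def nuclear_bait_volumes(total_ul: float, reactions: int = 1) -> dict:
    """Split `total_ul` of nuclear baits per reaction across the three modules."""
    return {
        name: round(total_ul * share * reactions, 2)
        for name, share in NUCLEAR_RATIOS.items()
    }


# Assumption: 5.5 µl nuclear baits per capture reaction.
per_reaction = nuclear_bait_volumes(5.5)
print(per_reaction)                               # volumes in µl for one capture
print(nuclear_bait_volumes(5.5, reactions=8))     # scaled to eight captures
```

At two decimal places the “O” module comes out at 0.19 µl, which the table rounds to 0.2 µl. For volumes this small, preparing the pool for several reactions at once is easier to pipette accurately.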

 

3.1.2 Set Up the Hybridisation Mix

| Component | Amount |
|---|---|
| Hyb N | 9.25 µl |
| Hyb D | 3.5 µl |
| Hyb S | 0.5 µl |
| Hyb R | 1.25 µl |
| Baits | 5.5 µl |
| Total | 20 µl |

3.1.3 Set Up the Blockers Mix

The Blockers Mix changed between versions 4 and 5 of the Arbor myBaits kit manual. Version 5 should be used; version 4 is given for backwards compatibility only.

  1. Set up the Blockers Mix (amounts are given for one capture reaction; scale as appropriate).

     | Component | Blockers Mix v5 | Blockers Mix v4 (deprecated) |
     |---|---|---|
     | Block X | 0.5 µl | – |
     | IDT blocker (see Note 2) | – | 2 µl |
     | Block O | 2.5 µl | 2.5 µl |
     | H2O (see Note 3) | 2.5 µl | 2.5 µl |
     | Total | 5.5 µl | 7 µl |

3.2 Hybridisation

During hybridisation, the probes/baits bind to the complementary molecules in the NGS library. Hybridisation is performed at 65 °C after denaturation at 95 °C. Use PCR tubes or strips and run the incubation program in a thermal cycler. Use a heated lid to minimize condensation. The hybridisation is a two-step process: blockers and library are denatured at 95 °C, and the hybridisation mix is added after the library has cooled down to 65 °C. (Amounts given are per capture reaction.)

  1. Create the incubation program in a thermal cycler.

     | Incubation program | Time |
     |---|---|
     | 95 °C | 5 min |
     | 65 °C | 5 min |
     | 65 °C | Forever |

  2. For hybridisation, combine components and incubate as per the table below.

     | Component | Amount |
     |---|---|
     | Blockers mix | 5 µl |
     | Sequencing library (100–500 ng dsDNA) | 7 µl (mix by pipetting) |
     | • Denature in thermal cycler (95 °C, 5 min) | |
     | • Let the cycler reach hybridisation temperature (65 °C) | |
     | • Equilibrate hybridisation mix in thermal cycler (65 °C, 5 min) | |
     | Add hybridisation mix to library/blocker, mix by pipetting, ~5× | 18.5 µl |
     | Total | 30.5 µl |
     | • Incubate at 65 °C for 16+ h (in practice: overnight) | |

3.3 Bead Binding and Washes

During binding, the bait-target hybrids are collected with streptavidin-coated magnetic beads and subsequently washed with warm buffer (65 °C) to remove non-target DNA. ‘Wash Buffer X’ and the beads need to be prepared before use.

3.3.1 Prepare ‘Wash Buffer X’

Amounts given are per capture reaction. Scale up if you have more than one.

| Component | Amount |
|---|---|
| Hyb S | 6.25 µl |
| H2O | 618 µl |
| Wash Buffer | 156 µl |
| Total | 780.25 µl |
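When several capture reactions are run in parallel, the per-reaction amounts can be scaled programmatically. The following Python sketch scales the Wash Buffer X recipe; the optional `overage` parameter (extra volume to compensate for pipetting loss) is an assumption added here for illustration and is not part of the manufacturer's protocol.

```python
# Per-reaction volumes in µl, copied from the Wash Buffer X table.
WASH_BUFFER_X = {"Hyb S": 6.25, "H2O": 618.0, "Wash Buffer": 156.0}


def scale_mix(recipe: dict, reactions: int, overage: float = 0.0) -> dict:
    """Scale a per-reaction recipe to `reactions` captures plus optional overage.

    `overage` is a fraction, e.g. 0.10 prepares 10% extra volume.
    """
    factor = reactions * (1.0 + overage)
    scaled = {name: round(vol * factor, 2) for name, vol in recipe.items()}
    scaled["total"] = round(sum(recipe.values()) * factor, 2)
    return scaled


single = scale_mix(WASH_BUFFER_X, reactions=1)
batch = scale_mix(WASH_BUFFER_X, reactions=4, overage=0.10)
print(single)  # one capture reaction, no overage
print(batch)   # four capture reactions with 10% extra
```

The same helper can be applied to the hybridisation mix or the blockers mix by passing the respective per-reaction table as `recipe`.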

3.3.2 Prepare Beads

  1. Aliquot 30 µl beads into a 1.5 ml protein low-bind Eppendorf tube.

  2. Pellet the beads on a magnet for 2 min.

  3. Discard the supernatant.

  4. Conduct 3 washes:

    • Add 200 µl Binding Buffer and thoroughly resuspend the beads,

    • Pellet the beads on the magnet for 2 min,

    • Remove and discard the supernatant.

  5. Resuspend the beads in 70 µl Binding Buffer.

  6. Transfer to a PCR tube/strip.

3.3.3 Bead Binding Reaction

At this point the hybridisation reaction should have been in the thermal cycler for 16+ hours and should still be in the cycler at 65 °C. In the following steps, the prepared magnetic beads are added to the hybridisation reaction, where they bind the baits.

  1. To bind the baits to the beads, combine components and incubate as per the table below.

     | Component | Amount |
     |---|---|
     | Prepared beads in PCR tube | 70 µl |
     | • Equilibrate bead aliquots in thermal cycler at 65 °C for 2 min (place them alongside the hybridisation reaction in the thermal cycler) | |
     | Transfer capture reaction(s) to the bead aliquot(s) | 30.5 µl |
     | • Mix by pipetting, ~5× | |
     | • Replace the lids | |
     | Total | ~100 µl |
     | • Incubate in thermal cycler at 65 °C for 5 min (flick/spin the tubes after 2.5 min to keep the beads suspended) | |

  2. Take the tubes out of the thermal cycler.

  3. Pellet the beads on a magnet until the solution is clear, then discard the supernatant.

  4. Immediately perform four consecutive washes with pre-warmed ‘Wash Buffer X’ (see next step: 3.3.4 Bead Washing).

3.3.4 Bead Washing

Repeat the steps below for a total of four washes. After the last wash, remove all wash buffer and proceed without delay to 3.4 Library Resuspension.

  1. Add 180 µl warmed Wash Buffer X to the beads and mix by pipetting.

  2. Incubate in the thermal cycler (65 °C, 5 min). Flick/spin the tubes after 2.5 min to keep the beads suspended.

  3. Pellet the beads on a magnet until the solution is clear, then discard the supernatant.

3.4 Library Resuspension

Add 30 µl of 10 mM Tris-Cl, 0.05% Tween-20 (pH 8.0–8.5) to the washed beads and resuspend the ‘enriched library’ by pipetting.

3.5 Library Amplification

Set up the PCR reaction mix as per below with universal primers suitable for your library type. The resuspended, ‘enriched library’ is of sufficient volume to conduct two PCRs as per Arbor protocol. Overamplification of the library should be avoided. Pooling of independent PCRs can reduce error.

3.5.1 PCR Primers

Amplification primers for Nextera libraries

| Primer | Sequence | Tm (°C) |
|---|---|---|
| Universal forward primer [i5] | AATGATACGGCGACCACCGAGA | 66.2 |
| Universal reverse primer [i7] | CAAGCAGAAGACGGCATACGAGA | 64.4 |

3.5.2 PCR Reaction Mix

| Component | Amount |
|---|---|
| H2O | 5 µl |
| KAPA HiFi HotStart ReadyMix (2×) | 25 µl |
| Universal forward primer [i5], 10 µM | 2.5 µl |
| Universal reverse primer [i7], 10 µM | 2.5 µl |
| Enriched library (on beads) | 15 µl |
| Total | 50 µl |

3.5.3 PCR Program

| Step | Temperature (°C) | Time | Cycles |
|---|---|---|---|
| 1 | 98 | 2 min | |
| 2 | 98 | 20 s | 8–14 cycles (steps 2–4) |
| 3 | 60 | 30 s | |
| 4 | 72 | Length-dependent^a | |
| 5 | 72 | 5 min | |
| 6 | 15 | Forever | |

^a Recommended elongation times (by average insert size): < 500 bp: 30 s, 500–700 bp: 45 s, > 700 bp: 1 min
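The length-dependent elongation time of step 4 can be written as a small lookup. The Python sketch below is an illustration only; the function name is hypothetical, and assigning an insert size of exactly 500 bp to the 45 s class is an assumption, since the recommendation does not state how the boundary is handled.

```python
def elongation_time_s(avg_insert_bp: float) -> int:
    """Return the elongation time in seconds for step 4 of the PCR program."""
    if avg_insert_bp < 500:
        return 30
    if avg_insert_bp <= 700:
        return 45
    return 60  # inserts longer than 700 bp


# A library with an average insert size of about 540 bp falls into the 45 s class.
print(elongation_time_s(540))
```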

3.6 Library Clean-Up

  1. Optional: Pool several PCRs.

  2. Perform at least two rounds of bead clean-up: a 1× bead clean-up, followed by a 0.7× bead clean-up. Initial clean-up and volume reduction can be more cost-effective using a column-based PCR clean-up kit (e.g., Qiagen).

3.7 Library QC and Quantification

Sequencing service providers have minimum requirements for DNA amount and quality and often specify a minimum molarity, which can be calculated from the average fragment size and the mass concentration. Determine the size distribution with a fragment analyser and the dsDNA concentration (ng/µl) by fluorescence-based quantification. Molarity can then be calculated using the formula below:

$$ \text{Molarity}\ \left(\frac{\text{nmol}}{\text{l}}\right) = \frac{\text{concentration}\ \left(\frac{\text{ng}}{\mu\text{l}}\right) \times 10^{6}}{660 \times \text{average fragment length (bp)}} $$

The formula was copied from https://bitesizebio.com/23105/quantifying-your-ngs-libraries/. Illumina has published a technical note on the quantification of Nextera Libraries of similar content: https://www.illumina.com/documents/products/technotes/technote_nextera_library_validation.pdf.
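As a worked example, the formula can be evaluated in Python. The function name is hypothetical; the input values (4 ng/µl, average fragment size of about 570 bp) are those reported for the example library in Sects. 4.1.3 and 4.1.4, and 660 g/mol is the approximate average molar mass of one base pair of dsDNA.

```python
def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Molarity in nmol/l from mass concentration and average fragment length."""
    return conc_ng_per_ul * 1e6 / (660.0 * avg_fragment_bp)


# Example library: 4 ng/µl with an average fragment size of about 570 bp.
molarity = library_molarity_nM(4.0, 570.0)
print(f"{molarity:.2f} nM")
```

For these inputs the library is at roughly 10.6 nM.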

4 Performance of the Exome Capture Kit—Example Project

To test the performance of the PBGL/Daicel Arbor Biosciences Exome Capture Kit, we performed exome capture and sequencing on an Illumina/Nextera NGS library pool of 41 DNA samples, aligned the resulting sequencing reads to the reference genome and assessed the fraction of reads that matched the exome and the coverage. We used the same reference genome and annotation that had been used to design the kit.

4.1 Example Project: Sequencing a Mutant Population (M1V1)

The work was performed at the PBG Laboratory, Seibersdorf, Austria. It entailed individual DNA isolations from 41 leaf samples derived from Coffea arabica plants grown in tissue culture, sequencing library construction for each sample (Nextera), pooling of all samples, the exome capture reaction on the pool of 41 samples, and submission of the library pool to a service provider for Illumina short-read sequencing (PE150). During library preparation, each sample received an individual molecular barcode (index), so that the sequencing reads could be assigned to the respective samples after sequencing. We aligned the raw reads (fastq files) to the Coffea arabica reference genome Cara_1.0 (NCBI assembly GCF_003713225.1) with the software bwa mem (Li and Durbin 2009). From these alignments (bam files), we evaluated the quality of the capture and enrichment with the R package TEQC (Hummel et al. 2011, 2020).

4.1.1 Input NGS Library

An Illumina DNA sequencing library pool with 41 individually indexed coffee samples was prepared following a transposase-mediated protocol (Nextera-type) as detailed in the IAEA-PBGL protocol: Library Preparation for Medium- to High-throughput DNA Sequencing on the Illumina Sequencing Platform, A Laboratory Protocol (IAEA 2022a). The library pool was size-selected with Ampure XP beads (one-sided, 0.7×) to an average insert size of ~540 bp and a lower size limit above 300 bp (Fig. 2). Seven microliters (7 µl) containing 300 ng of this Illumina/Nextera sequencing library pool were the input for the exome capture reaction.

Fig. 2
A graph of sample intensity versus size depicts a curve with 3 peaks as follows. 1. Lower, (25, 375). 2. 539, (550, 700). 3. Upper, (1500, 500). The values are approximate.

Size distribution of the input DNA sequencing library pool of 41 individually indexed coffee samples (Illumina/Nextera), assessed with ©Agilent Technologies, Inc. TapeStation, high sensitivity D1000 ScreenTape®

Fig. 3
A graph of sample intensity versus size depicts a curve with 3 peaks as follows. 1. Lower, (25, 425). 2. 566, (575, 475). 3. Upper, (1500, 500). The values are approximate.

Size distribution of the Exome Captured library as shipped to the sequencing service provider, 1/4 dilution assessed with ©Agilent Technologies, Inc. TapeStation, high sensitivity D1000 ScreenTape®

4.1.2 Exome Capture

One capture reaction was performed on this pool of 41 samples following Arbor protocol version 4: Baits (5.5 µl) were combined with the hybridisation components to give 20 µl Hybridisation Mix. Blockers (2 µl IDT Blocker, 2.5 µl Block O) were added to 7.5 µl of the Illumina library, resulting in 12 µl total. 18.5 µl of Hybridisation Mix were combined with the 12 µl library/blocker mix, and hybridisation was allowed to occur in a PCR machine for 16 h at 65 °C. The bait/library hybrids were captured with streptavidin-coated beads and washed with 1× Wash Buffer X (618 µl H2O, 156 µl wash buffer, 6.25 µl Hyb S). Beads were resuspended in 30 µl 10 mM Tris-Cl, 0.05% Tween-20, pH 8.0, and two independent enrichment PCRs (50 µl, KAPA HiFi) were performed, each with 15 µl of the bead suspension as template and 13 PCR cycles with 45 s extension time. Both PCRs were pooled (100 µl total) and subjected to PCR purification (Qiagen MinElute) and two subsequent bead clean-ups for size selection (1× and 0.7× with Ampure XP beads). Final DNA amount was assessed by fluorescence measurement (Qubit). A one in four dilution was assessed for size distribution on the Agilent TapeStation.

4.1.3 Output Exome Enriched NGS Library

DNA amount of the exome enriched library was assessed by fluorescence measurement (Qubit). A one in four dilution was assessed for size distribution on the Agilent TapeStation (Fig. 3). Average fragment size of the library was ~ 570 bp, which corresponds to an average insert size of ~ 460 bp, adaptors subtracted.

4.1.4 DNA Sequencing

The exome-enriched library, along with the list of sample indices, was submitted to a sequencing service provider for Illumina DNA sequencing PE150 (paired-end reads with 150 bp read length). We shipped 200 ng (50 µl, 4 ng/µl) and requested 400 Gbp raw data output. We received a total of 3.2 billion reads. They were fairly evenly distributed across the 41 samples (Fig. 4), with between 58 and 113 million reads per sample (median: 75 million).
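The relation between requested data output and read numbers is simple arithmetic, sketched below in Python. The sketch assumes that every read counted is a single 150 bp read (i.e., both mates of a pair are counted individually); the function names are hypothetical.

```python
READ_LENGTH_BP = 150  # PE150: each read is assumed to be 150 bp long


def reads_from_output(output_bp: float, read_length_bp: int = READ_LENGTH_BP) -> float:
    """Number of reads that make up a given raw data output."""
    return output_bp / read_length_bp


def output_from_reads(n_reads: float, read_length_bp: int = READ_LENGTH_BP) -> float:
    """Raw data output in bp for a given number of reads."""
    return n_reads * read_length_bp


requested_reads = reads_from_output(400e9)         # reads implied by the 400 Gbp request
received_gbp = output_from_reads(3.2e9) / 1e9      # output implied by 3.2 billion reads
median_sample_gbp = output_from_reads(75e6) / 1e9  # output of the median sample
print(f"{requested_reads / 1e9:.2f} billion reads requested")
print(f"{received_gbp:.0f} Gbp received, {median_sample_gbp:.2f} Gbp for the median sample")
```

Under this assumption, the 400 Gbp request corresponds to about 2.67 billion reads, and the 3.2 billion reads received correspond to about 480 Gbp.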

Fig. 4
A box plot indicates the number of reads in millions. The minimum and maximum values are 58 and 114, respectively. The lower and upper quartiles are 65 and 85, respectively. The median is 74. The values are approximate.

Requesting 400 Gbp raw data output resulted in 3.2 billion sequencing reads with fairly even distribution across the 41 samples. The median is 75 million reads

4.1.5 Analysis and Results

We aligned all 3.2 billion sequencing reads to the coffee reference genome, the same annotated reference assembly that had been used to derive the targets (Cara_1.0, NCBI accession number: GCF_003713225.1). The reads were aligned with the software bwa mem (Li and Durbin 2009) as part of our automated analysis workflow: A Software Workflow for Automated Analysis of Genome (Re-) Sequencing Projects, A Laboratory Protocol (IAEA 2022b). Software and documentation are available on PBGL’s GitHub page (https://github.com/pbgl).

Fig. 5
A box plot of the fraction of reads versus the target region depicts an increasing trend. The data are as follows. 1. On-target, 80%. 2. On-target + 100, 86%. 3. On-target + 200, 88%.

For each individual sample we assessed what fraction of the sequencing reads that align to the genome match annotated genes. Counting strictly the regions annotated as exons, we reach 80% with very little variation between samples. When the target space is extended by 100 or 200 bp on either side, this fraction increases. This is expected, because the probes are fishing molecules from a library with an average insert size of 460 bp (Fig. 3). We conclude that close to 90% of the sequencing reads match the target space.

Fig. 6
A box plot of the fraction of targets versus read coverage depicts a decreasing trend. The data are as follows. 1-fold, 91%. 5-fold, 84%. 10-fold, 75%. 20-fold, 63%. 40-fold, 45%.

We assessed what fraction of genes is covered at least 1-, 5-, 10-, 20- or 40-fold. More than 90% of annotated genes are covered at least one-fold, and ¾ of the genes are covered at least tenfold.

The on-target enrichment for each individual sample was assessed from the alignments to the reference, represented in per-sample .bam files, with the R/Bioconductor package TEQC (Hummel et al. 2011, 2020). Target definitions were the actual exons of the annotation (see Figs. 5 and 6 for results). As an example, a representative genomic region is shown in Fig. 7.
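To make the on-target metric concrete, the following Python sketch computes the fraction of reads that overlap a target, optionally after extending each target by a fixed number of bp on both sides. It is a simplified illustration and not the TEQC implementation: it handles a single reference sequence, represents reads as half-open intervals, and uses hypothetical coordinates.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single reference sequence


def pad_and_merge(targets: List[Interval], pad: int) -> List[Interval]:
    """Extend every target by `pad` bp on both sides and merge overlaps."""
    merged: List[Interval] = []
    for start, end in sorted((max(0, s - pad), e + pad) for s, e in targets):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def on_target_fraction(reads: List[Interval], targets: List[Interval], pad: int = 0) -> float:
    """Fraction of aligned reads overlapping a (padded) target by at least 1 bp."""
    merged = pad_and_merge(targets, pad)
    starts = [s for s, _ in merged]
    hits = 0
    for r_start, r_end in reads:
        # Merged targets are sorted and disjoint, so it suffices to check the
        # last target that starts before the read ends.
        i = bisect_left(starts, r_end) - 1
        if i >= 0 and merged[i][1] > r_start:
            hits += 1
    return hits / len(reads)


# Hypothetical exon targets and read alignments (illustration only).
targets = [(1000, 1200), (3000, 3300)]
reads = [(1050, 1200), (1250, 1400), (1350, 1500), (2000, 2150), (3100, 3250)]
for pad in (0, 100, 200):
    print(pad, on_target_fraction(reads, targets, pad))
```

In this toy data set the on-target fraction rises as the targets are padded, the same qualitative behaviour as reported in Fig. 5, because reads from fragments captured at exon boundaries extend into flanking sequence.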

Fig. 7
A screenshot of the Integrative Genomics Viewer window demonstrates the genomic region composed of bars of different colors.

Visualization of successful target enrichment by the Exome Capture Kit. Depicted is a representative genomic region (screenshot of the Integrative Genomics Viewer, IGV, see Note 4), showing the alignments of sequencing reads (bam files) of 41 coffee samples on the Coffea arabica reference genome. Target regions (red bars) correspond to the exons (thick blue bars) of genes (blue bars). The libraries are effectively enriched for the target regions: reads (grey bars) pile up on target regions (red bars) with very little background, i.e., non-target reads.

5 Manuals

  1. The manufacturer’s manuals for performing exome capture reactions with this kit

     myBaits, Hybridization Capture for Targeted NGS, Manual Version 4.01, April 2018, https://arborbiosci.com/wp-content/uploads/2019/08/myBaits-Manual-v4.pdf.

     myBaits, Hybridization Capture for Targeted NGS, User Manual Version 5.00, September 2020, https://arborbiosci.com/wp-content/uploads/2020/08/myBaits_v5.0_Manual.pdf.

  2. Sequencing library preparation

     The custom-indexed Nextera NGS libraries for Illumina sequencing were prepared following the PBGL protocol: Library Preparation for Medium- to High-throughput DNA Sequencing on the Illumina Sequencing Platform, A Laboratory Protocol (IAEA 2022a).

  3. Sequence read mapping

     Read mapping with the software bwa mem (see Note 5) (Li and Durbin 2009) was performed as part of PBGL's automated software workflow: A Software Workflow for Automated Analysis of Genome (Re-) Sequencing Projects, A Laboratory Protocol (IAEA 2022b).

  4. Quality assessment of the capture reactions

     TEQC: Quality control for target capture experiments, Hummel et al. (2020). DOI:10.18129/B9.bioc.TEQC, TEQC, R package version 4.18.0. https://bioconductor.org/packages/release/bioc/html/TEQC.html (Hummel et al. 2011).

6 Notes

  1. Coffea arabica is an allotetraploid of Coffea eugenioides and Coffea canephora. We developed separate probe sets for the different sub-genomes, so that they can be used independently, if desired. For use in Coffea arabica they need to be pooled in representative ratios. If the user aims to enrich the chloroplast as well, that module can comprise 0.1% of the final pool, though optimization for the tissue type might be required.

  2. xGen® Universal Blockers-NXT Mix, Catalog no. 1079584, purchased from Integrated DNA Technologies Inc. (IDT, www.idtdna.com).

  3. If the amount of DNA in the library is limiting, the H2O can be replaced with additional sequencing library.

  4. https://software.broadinstitute.org/software/igv/.

  5. https://bio-bwa.sourceforge.net/bwa.shtml.