Targeted Sequencing in Coffee with the Daicel Arbor Biosciences Exome Capture Kit

Warthmann, Norman

doi:10.1007/978-3-662-67273-0_19


Abstract

Exome capture is a molecular biology technique that, in combination with Next Generation DNA sequencing technologies (NGS), allows for selectively sequencing the predicted genes of an organism. Such capture sequencing provides a compromise between genome coverage and sequencing cost. The capture reaction is an additional step in an otherwise standard sequencing protocol, and it effectively enriches the sequencing library for DNA molecules that overlap with predicted genes (the exome). This enables genome-wide assessments while focusing on the gene space. Capture sequencing is particularly attractive in species with large genomes, where whole genome sequencing of larger numbers of samples would be cost-prohibitive at present prices. The Plant Breeding and Genetics Laboratory (PBGL) developed an Exome Capture Kit for Coffea arabica in collaboration with Daicel Arbor Biosciences (Ann Arbor, MI, USA). Use of the kit achieves eightfold enrichment, and hence an approx. eightfold reduction in sequencing cost for a whole genome assessment of Coffea arabica plants. The kit is available as a regular product from Daicel Arbor Biosciences, and this protocol describes the kit and gives detailed instructions on how to perform the capture reaction.


1 Introduction

With today’s cheap Next Generation DNA sequencing (NGS) virtually all DNA variation in genomes can be readily identified, including new mutations. Such knowledge makes the breeding process more efficient. Being able to comprehensively catalogue genome-wide DNA variation at the population-scale opens the door for genomic prediction as well as for tracking genetic variation through the breeding process.

Despite the low prices, sequencing cost is still of concern when applying whole genome approaches to a large number of samples, particularly when high sequencing depth is required. An example is mutation detection in mutant M₁ populations, where induced mutations are in a hemizygous, and often chimeric, state. For many genomics-supported breeding applications it is sufficient to sequence only a representative subset of the genome, which saves cost. Several approaches exist to achieve such complexity reduction. One of them is ‘target sequence capture’, a molecular biology procedure that enriches for predefined regions of the genome (targets) prior to sequencing. Probes complementary to target DNA sequences are designed at large scale and used to capture, i.e., pull out, the desired molecules from sequencing libraries, thereby enriching for target molecules. The enriched libraries are then subjected to Next Generation Sequencing (NGS), and the resulting sequencing data mostly consist of sequences representing the target regions. In the case of exome capture, those target regions are the predicted genes, the exome.

Applying target capture requires an up-front investment: it needs prior knowledge of the DNA sequence of the target regions and the production of probes. In the case of exome capture, which intends to enrich for (all) genes, the selection of target regions is based on a suitable reference genome and a genome annotation, which has to be available or generated. The number of exons in a eukaryotic genome is large, and the necessary number of probes can be in the hundreds of thousands. In human medical applications, including diagnostics, exome capture sequencing has been standard procedure for more than a decade (Choi et al. 2009). Exome capture sequencing has gained traction in plant breeding for important food crops with very large genomes such as wheat (Dong et al. 2020; Gardiner et al. 2019) and barley (Mascher et al. 2013; Russell et al. 2016), with several commercial suppliers offering competing exome capture panels and kits.

To enable cost-effective whole genome approaches in coffee breeding, we developed and provide an Exome Capture Kit for Coffea arabica in collaboration with Daicel Arbor Biosciences (Ann Arbor, MI, USA), hereafter “Arbor”. Coffea arabica is an allotetraploid, and its genome is the result of a merger of the C. eugenioides and C. canephora genomes (Scalabrin et al. 2020). The design is based on a public C. arabica genome assembly and annotation (Cara_1.0, NCBI accession number GCF_003713225.1, derived from cultivar ‘Caturra red’, isolate CCC135-36), which we augmented with a public C. arabica chloroplast sequence (NCBI accession number: NC_008535.1).

This chapter details the design of Daicel Arbor Biosciences’ Exome Capture Kit, provides a step-by-step protocol for its use, and describes a validation experiment of exome capture sequencing of 41 indexed samples in a single capture experiment.

2 Materials

Main inputs to the exome capture procedure are a whole-genome DNA sequencing library outfitted with respective adaptors and Arbor’s Exome Capture Kit. Additional requirements for equipment, consumables, and reagents are listed below. Most of these should already be at hand as they will have been used when preparing the NGS library. For post-capture library amplification, Arbor recommends KAPA HiFi DNA polymerase.

2.1 The Exome Capture Kit

The main component of the Exome Capture Kit is a set of thousands of probes that are complementary to the thousands of target regions. They function as baits to fish their complementary targets from an NGS library in solution. In the case of this Arbor kit, the baits are biotinylated RNA molecules, and the target is the exome extracted from a publicly available coffee reference genome and annotation (NCBI).

2.1.1 Exome Capture Kit Design Details

Initial target intervals for probe design included a C. arabica Chloroplast (NC_008535.1) in its entirety and all annotations containing the string “exon” found in the C. arabica genome assembly GCF_003713225.1 (https://www.ncbi.nlm.nih.gov/assembly/GCF_003713225.1).

The exonic intervals of the genome assembly were merged into non-overlapping regions representing 94.5 Mbp total exome space. The regions were padded with 50 nt on either side (i.e., 5′- and 3′-ends) and new overlaps re-merged, which resulted in 121.0 Mbp sequence space for initial probe design. Regions were divided into non-overlapping 100 nt intervals and the best 80 nt candidate probe hybridization site was chosen using Arbor’s proprietary algorithm. Candidate probe sequences with strong predicted affinity to regions outside of the target regions were removed. The final predicted retrievable space of the filtered probe set was estimated by aligning the remaining probes back to the genome (megablastn, BLAST+ version 2.6.0+, default parameters) and padding each probe hit with 200 nt on either side. Merging these regions results in 151.8 Mbp total genome space (represented in file DAB_CoffeeExomeV1_capspace.bed.gz), of which 87.2 Mbp overlap with the original exon region intervals (overlap represented in file: DAB_CoffeeExomeV1_exonspace.bed.gz). These files can be downloaded from the kit’s dedicated section on the Arbor website https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-custom-predesigned-community-panels/plants-and-fungi/.
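The merge, pad, and re-merge steps described above can be illustrated with a minimal Python sketch. It is not Arbor's design pipeline (the probe selection algorithm is proprietary); the interval coordinates, the contig length, and all function names are hypothetical and only serve to show how padding creates new overlaps that must be merged again.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence, BED-style


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals into non-overlapping regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous region: extend that region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def pad_intervals(intervals: List[Interval], pad: int, seq_len: int) -> List[Interval]:
    """Extend each interval by `pad` nt on both sides, clip to the sequence, re-merge."""
    padded = [(max(0, s - pad), min(seq_len, e + pad)) for s, e in intervals]
    return merge_intervals(padded)


def total_length(intervals: List[Interval]) -> int:
    """Summed length of a set of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


# Toy exon coordinates on one hypothetical 2,000-nt contig (not real annotation data).
exons = [(100, 300), (250, 400), (460, 600), (1500, 1700)]
exome = merge_intervals(exons)                              # non-overlapping exon space
design_space = pad_intervals(exome, pad=50, seq_len=2000)   # padded and re-merged

print(exome, total_length(exome))
print(design_space, total_length(design_space))
```

In this toy case the first two merged exons lie only 60 nt apart, so the 50 nt padding on both sides fuses them into one design region. The same merge function, applied to probe hits padded with 200 nt, would give the kind of estimate reported above for the retrievable capture space.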

The probes were synthesized in four distinct sets: Subgenome “C” (= canephora), Subgenome “E” (= eugenioides), Subgenome “O” (“other” = unassigned contigs), and “Chlor” (= chloroplast). The probe sets can be used separately or combined as the user sees fit, depending on the application. To generate a pool of all nuclear genome probes, the “C” sub-genome module should comprise 47.4% of the pool by volume, the “E” sub-genome module 49.1%, and the “other” sub-genome module 3.5%. If the user aims to enrich the chloroplast as well, that module can comprise a final 0.1% of the final pool, though optimization for the tissue type might be required.

2.1.2 Availability of the Exome Capture Kit

The Coffee Exome V1 kit is available from Daicel Arbor Biosciences as part of their Community Panels series (https://arborbiosci.com/genomics/targeted-sequencing/mybaits/mybaits-custom-predesigned-community-panels/plants-and-fungi/). The design ID is D10496CFEXM. Order inquiries should be directed to sales@arbor.daicel.com.

2.2 NGS Library Requirements

In principle, libraries prepared for Illumina short-read as well as 3rd-generation long-read sequencing technologies can be used. This protocol describes the exome capture reaction for Illumina sequencing libraries with dual-index-barcoded Nextera-type adaptors. For different adaptors, such as ‘TruSeq’, the protocol is the same, but different blockers and universal amplification primers will be required. Please consult the respective manual from Arbor.

Input requirement
100–500 ng dsDNA in 7 µl	Nextera/Illumina short-read sequencing library

2.3 Equipment

1. Heat block for 1.5 ml microfuge tubes.
2. Thermal cycler (PCR machine) with heated lid suitable for desired vessel size.
3. Qubit instrument or equivalent for fluorescence-based dsDNA quantification.
4. Optional: Fragment analyser to establish DNA fragment size distribution.

2.4 Consumables and Reagents (Non-standard)

1. Coffee Exome V1 Capture Kit, Daicel Arbor Biosciences, Community Panel design ID D10496CFEXM
2. Magnet for 1.5 ml Eppendorf tubes (e.g., DynaMag™-2, Invitrogen™, ThermoFisher #12321D)
3. Magnet for PCR-strips/tubes (e.g., DynaMag™-96 Side Magnet, Invitrogen™, ThermoFisher #12331D)
4. KAPA HiFi HotStart ReadyMix (Roche #KK2601)
5. Resuspension Buffer (self-prepared): 10 mM TrisCl, 0.05% Tween-20, pH 8.0–8.5
6. Protein LoBind® Tube, 1.5 ml (Eppendorf #0030108116)
7. Agencourt Ampure XP beads (Beckman Coulter, Agencourt #A63881)
8. Qubit™ dsDNA HS Reagent (Invitrogen™, ThermoFisher #Q32851)
9. Optional (when using manufacturer’s deprecated protocol version 4): xGen Universal Blockers-NXT Mix, Integrated DNA Technologies Inc. (IDT): Catalogue No. 1079584.

2.5 PCR Primers

Universal amplification primers for post-capture amplification of the NGS library must match the respective NGS library type. This protocol uses Nextera-type/Illumina libraries.

Name	Alias	Sequence^a
Seib_275	Nextera libraries-universal-FWD	A*ATGATACGGCGACCACCGAGA
Seib_276	Nextera libraries-universal-REV	C*AAGCAGAAGACGGCATACGAGA

^a The asterisk (*) denotes a phosphorothioate (PTO) bond

3 Methods

Figure 1 provides an overview of the subsequent steps, their approximate duration, and required consumables and equipment.

A flow diagram includes the following steps: Illumina sequencing library, hybridization mix setup, hybridization, bead preparation, bead binding, washes, library resuspension, library amplification, library cleanup, library QC, and submit for sequencing. — **Fig. 1**

Sequencing libraries are combined with various blockers (=Hybridisation Mix Setup) and then incubated with the baits/probes at 65 °C for the actual capture (=Hybridisation). The hybridisation is usually performed overnight. The next day, buffer and beads for the binding and washes are prepared and the bait/target hybrid molecules are captured with beads (=Bead Binding). A total of 4 washes at 65 °C remove unbound and unspecific DNA molecules (=Washes). The target molecule library is then recovered from the beads and amplified to desired amount (Library Resuspension, Library Amplification) and bead cleaned for sequencing (=Library Cleanup, Library QC).

All reagents required for the actual capture and wash reactions are included in the Daicel Arbor Biosciences Kit. Reagents for resuspension, amplification, final bead clean-up, and QC will have to be provided by the user.

3.1 Hybridisation Mix Setup

The following describes the preparation of the baits and the setup of the hybridisation mix and the blockers mix. All consumables for the hybridisation are contained in the Exome Capture Kit.

3.1.1 Combining Baits

Pool the different sub-genome probe sets in representative ratios (see Note 1). The table below gives the necessary amounts for one capture reaction; scale if required.

Bait	Amount	Ratio in final pool (%)
C. canephora (“C”)	2.61 µl	47.4
C. eugenioides (“E”)	2.70 µl	49.1
Other (“O”)	0.2 µl	3.5
Chloroplast (“Chlo”)	1 µl of a 1:1000 dilution	≤ 0.1
	6.5 µl total
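When pooling baits for several capture reactions, the per-module volumes follow directly from the ratios in the table. The short Python sketch below does this arithmetic for the nuclear modules only; it assumes 5.5 µl of pooled nuclear baits per reaction (the bait volume that enters the hybridisation mix), and the function and variable names are hypothetical. The chloroplast module is left out because it is added as a small amount of a 1:1000 dilution and may need tissue-specific optimisation.

```python
# Ratios (percent by volume) of the nuclear bait modules, as in the table above.
NUCLEAR_RATIOS_PERCENT = {"C": 47.4, "E": 49.1, "O": 3.5}

# Assumption: 5.5 µl of pooled nuclear baits are needed per capture reaction.
NUCLEAR_BAITS_PER_REACTION_UL = 5.5


def bait_pool_volumes(n_reactions: int) -> dict:
    """Return the volume (µl) of each nuclear bait module for `n_reactions` captures."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    total_ul = NUCLEAR_BAITS_PER_REACTION_UL * n_reactions
    return {
        module: round(total_ul * percent / 100.0, 2)
        for module, percent in NUCLEAR_RATIOS_PERCENT.items()
    }


one_reaction = bait_pool_volumes(1)
four_reactions = bait_pool_volumes(4)
print(one_reaction)
print(four_reactions)
```

For a single reaction the sketch returns 0.19 µl for the “O” module, which the table rounds to 0.2 µl. Such small volumes are easier to pipette accurately when the pool is prepared for several reactions at once.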

3.1.2 Set Up the Hybridisation Mix

Component	Amount
Hyb N	9.25 µl
Hyb D	3.5 µl
Hyb S	0.5 µl
Hyb R	1.25 µl
Baits	5.5 µl
	20 µl total
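For more than one capture reaction, the hybridisation mix is usually prepared as a master mix. The following Python sketch scales the recipe above; the 10% pipetting reserve (`overage`) is an illustrative assumption and not part of the protocol, and the function name is hypothetical.

```python
# Per-reaction recipe of the hybridisation mix (µl), as in the table above.
HYB_MIX_UL = {
    "Hyb N": 9.25,
    "Hyb D": 3.5,
    "Hyb S": 0.5,
    "Hyb R": 1.25,
    "Baits": 5.5,
}


def scale_mix(recipe: dict, n_reactions: int, overage: float = 0.1) -> dict:
    """Scale a per-reaction recipe to `n_reactions`, adding a fractional reserve.

    `overage` is an assumed pipetting reserve (0.1 = 10% extra volume).
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {component: round(volume * factor, 2) for component, volume in recipe.items()}


per_reaction_total = sum(HYB_MIX_UL.values())
scaled = scale_mix(HYB_MIX_UL, n_reactions=8)
print(per_reaction_total)
print(scaled, round(sum(scaled.values()), 2))
```

Note that the recipe already contains a small reserve: 20 µl are prepared per reaction, but only 18.5 µl are transferred to the library/blocker mix, so a smaller `overage` may be sufficient.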

3.1.3 Set Up the Blockers Mix

The Blockers Mix has changed between Arbor myBaits kit manuals versions v4 and v5. Version v5 should be used. Version v4 is given for backwards compatibility only.

1. Set up the Blockers Mix. (Amounts are given for one capture reaction, scale as appropriate.)

Component	Blockers Mix v5	Blockers Mix v4 (deprecated)
Block X	0.5 µl	–
IDT blocker (see Note 2)	–	2 µl
Block O	2.5 µl	2.5 µl
H₂O (see Note 3)	2.5 µl	2.5 µl
	5.5 µl total	7 µl total

3.2 Hybridisation

During hybridisation, the probes/baits bind to the complementary molecules in the NGS library. Hybridisation is performed at 65 °C after denaturation at 95 °C. Use PCR tubes or strips and run the incubation program in a thermal cycler. Use a heated lid to minimize condensation. The hybridisation is a 2-step process: blockers and library are denatured at 95 °C, and the Hybridisation mix is added after the library has cooled down to 65 °C. (Amounts given are per capture reaction.)

1. Create the incubation program in a thermal cycler.

Incubation program
Temperature	Time
95 °C	5 min
65 °C	5 min
65 °C	Forever

2. For hybridisation, combine components and incubate as per table below.

Component	Amount
Blockers mix	5 µl
Sequencing library (100–500 ng dsDNA)	7 µl (mix by pipetting)
• Denature in thermal cycler (95 °C, 5 min)
• Let the cycler reach hybridisation temperature (65 °C)
• Equilibrate Hybridisation mix in thermal cycler (65 °C, 5 min)
Add hybridisation mix to library/blocker, mix by pipetting, ~5×	18.5 µl
	30.5 µl total
• Incubate at 65 °C for 16+ h (in practice: overnight)

3.3 Bead Binding and Washes

During binding, the bait-target hybrids are collected with streptavidin-coated magnetic beads and subsequently washed with warm buffer (65 °C) to remove non-target DNA. ‘Wash buffer X’ and the beads need to be prepared before use.

3.3.1 Prepare ‘Wash Buffer X’

Amounts given are per capture reaction. Scale up if you have more than one.

Component	Amount
Hyb S	6.25 µl
H2O	618 µl
Wash Buffer	156 µl
	780.25 µl total

3.3.2 Prepare Beads

1. Aliquot 30 µl beads in a 1.5 ml protein low-bind Eppendorf tube.
2. Pellet the beads on a magnet for 2 min.
3. Discard supernatant.
4. Conduct 3 washes:
- Add 200 µl Binding Buffer and thoroughly resuspend the beads,
- Pellet the beads on the magnet for 2 min,
- Remove and discard the supernatant.
5. Resuspend beads in 70 µl binding buffer.
6. Transfer to PCR tube/strip.

3.3.3 Bead Binding Reaction

At this point the hybridisation reaction should have been in the thermal cycler for 16+ hours and still be in the cycler at 65 °C. In the steps below, the prepared magnetic beads are added to the hybridisation reaction. The beads then bind the baits.

1. For bead-binding the baits, combine components and incubate as per table below.

Component	Amount (in µl)
Prepared Beads in PCR tube	70 µl
• Equilibrate bead aliquots in thermal cycler at 65 °C for 2 min (place them alongside the hybridisation reaction in the thermal cycler)
Transfer capture reaction(s) to the bead aliquot(s)	30.5 µl
• Mix by pipetting, ~5×
• Replace the lids
	100.5 µl total
• Incubate in thermal cycler at 65 °C for 5 min (flick/spin the tubes after 2.5 min to keep beads suspended)

2. Take out from the thermal cycler.
3. Pellet the beads on a magnet until the solution is clear, discard supernatant.
4. Immediately perform 4 subsequent washes with pre-warmed ‘Wash buffer X’ (see next step: 3.3.4 Bead Washing).

3.3.4 Bead Washing

Repeat the below steps 4 times for a total of 4 washes. After the last wash, remove all wash buffer and proceed without delay to 3.4 Library Resuspension.

4 ×	• Add 180 µl warmed wash buffer X to the beads, mix by pipetting
	• Incubate in thermal cycler (65 °C, 5 min). Flick/spin the tubes after 2.5 min to keep beads suspended
	• Pellet the beads on a magnet until the solution is clear, discard supernatant

Proceed without delay with the next step: 3.4 Library Resuspension.

3.4 Library Resuspension

Add 30 µl of 10 mM Tris-Cl, 0.05% Tween-20 (pH 8.0–8.5) to the washed beads and resuspend the ‘enriched library’ by pipetting.

3.5 Library Amplification

Set up the PCR reaction mix as per below with universal primers suitable for your library type. The resuspended, ‘enriched library’ is of sufficient volume to conduct two PCRs as per Arbor protocol. Overamplification of the library should be avoided. Pooling of independent PCRs can reduce error.

3.5.1 PCR Primers

Amplification primers for Nextera libraries
Universal forward primer [i5]	AATGATACGGCGACCACCGAGA	Tm = 66.2
Universal reverse primer [i7]	CAAGCAGAAGACGGCATACGAGA	Tm = 64.4

3.5.2 PCR Reaction Mix

Component	Amount
H2O	5 µl
KAPA HiFi HotStart ReadyMix (2 × )	25 µl
Universal forward primer [i5], 10 µM	2.5 µl
Universal reverse primer [i7], 10 µM	2.5 µl
Enriched library (on beads)	15 µl
	50 µl total

3.5.3 PCR Program

Step	Temperature (°C)	Time	Cycles
1	98	2 min
2	98	20 s	8–14 cycles (steps 2–4)
3	60	30 s
4	72	Length-dependent^a
5	72	5 min
6	15	Forever

^aRecommended elongation times (by average insert size): 500 bp: 30 s, 500–700 bp: 45 s, > 700 bp: 1 min
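The length-dependent elongation time can be looked up with a small helper, sketched here in Python. The function name is hypothetical, and the handling of the class boundaries is an assumption: inserts shorter than 500 bp are treated as the shortest class, and inserts of exactly 500 bp or 700 bp are assigned to the middle class.

```python
def elongation_time_s(avg_insert_bp: float) -> int:
    """Return the elongation time in seconds for a given average insert size.

    Boundary handling is an assumption: < 500 bp -> 30 s,
    500-700 bp (inclusive) -> 45 s, > 700 bp -> 60 s.
    """
    if avg_insert_bp <= 0:
        raise ValueError("average insert size must be positive")
    if avg_insert_bp < 500:
        return 30
    if avg_insert_bp <= 700:
        return 45
    return 60


# Example: a library with an average insert size of about 540 bp.
print(elongation_time_s(540))
```

For an insert size of about 540 bp, as in the example project described in Sect. 4, this gives 45 s, which is the extension time that was used there.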

3.6 Library Clean-Up

1.
Optional: Pool several PCRs.
2.
Perform at least two rounds of bead clean-up: a 1× bead clean-up followed by a 0.7× bead clean-up. Initial clean-up and volume reduction can be more cost-effective using a column-based PCR clean-up kit (e.g., Qiagen).

3.7 Library QC and Quantification

Sequencing service providers will have minimum requirements with respect to DNA amount and quality and often require a minimum ‘molarity’, which can be calculated from average fragment size and weight. The size distribution should be determined with a Fragment Analyzer and the amount of dsDNA in ng by fluorescence-based DNA quantification. Molarity can then be calculated using the formula below:

$$ \frac{\text{concentration}\left(\frac{\text{ng}}{\mu\text{l}}\right) \times 10^{6}}{660 \times \text{average fragment length (bp)}} = \text{Molarity}\left(\frac{\text{nmol}}{\text{l}}\right) $$

The formula was copied from https://bitesizebio.com/23105/quantifying-your-ngs-libraries/. Illumina has published a technical note on the quantification of Nextera Libraries of similar content: https://www.illumina.com/documents/products/technotes/technote_nextera_library_validation.pdf.
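As a worked example, the formula can be wrapped in a short Python function. The function name is hypothetical; the constant 660 is the approximate molar mass of one base pair of dsDNA in g/mol, and the example values (4 ng/µl, ~570 bp average fragment size) are those reported for the enriched library in Sect. 4.

```python
BP_MOLAR_MASS_G_PER_MOL = 660.0  # approximate molar mass of one dsDNA base pair


def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA concentration (ng/µl) into molarity (nmol/l)."""
    if conc_ng_per_ul < 0 or avg_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS_G_PER_MOL * avg_fragment_bp)


# Example: 4 ng/µl library with an average fragment size of 570 bp.
molarity = library_molarity_nM(4.0, 570.0)
print(f"{molarity:.2f} nM")
```

With these inputs the result is approximately 10.6 nmol/l. The average fragment length should include the adaptors, since the whole library molecule contributes to the measured mass.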

4 Performance of the Exome Capture Kit—Example Project

To test the performance of the PBGL/Daicel Arbor Biosciences Exome Capture Kit, we performed exome capture and sequencing on an Illumina/Nextera NGS library pool of 41 DNA samples, aligned the resulting sequencing reads to the reference genome and assessed the fraction of reads that matched the exome and the coverage. We used the same reference genome and annotation that had been used to design the kit.

4.1 Example Project: Sequencing a Mutant Population (M₁V₁)

The work was performed at the PBG Laboratory, Seibersdorf, Austria and entailed individual DNA isolations from 41 leaf samples derived from Coffea arabica plants that had been grown in tissue culture, sequencing library construction for each sample (Nextera), pooling of all samples, performing the exome capture reaction on the pool of 41 samples, and submitting the library pool to a service provider for Illumina short-read sequencing (PE150). During library preparation, each sample received an individual molecular barcode (index), so the sequencing reads could be associated to the respective samples after DNA sequencing. We aligned the raw reads (fastq files) to the Coffea arabica reference genome Cara_1.0 (NCBI assembly GCF_003713225.1) with software bwa mem (Li and Durbin 2009). From these alignments (bam files) we evaluated the quality of the capture and enrichment with the R-package TEQC (Hummel et al. 2011, 2020).

4.1.1 Input NGS Library

An Illumina DNA sequencing library pool with 41 individually indexed coffee samples was prepared following a transposase-mediated protocol (Nextera-type) as detailed in the IAEA-PBGL protocol: Library Preparation for Medium- to High-throughput DNA Sequencing on the Illumina Sequencing Platform, A Laboratory Protocol (IAEA 2022a). The library pool was size selected with Ampure XP beads (one-sided, 0.7×) to an average insert size of ~ 540 bp and a lower size limit of above 300 bp (Fig. 2). Seven microlitres (7 µl) containing 300 ng of this Illumina/Nextera sequencing library pool were the input for the exome capture reaction.

A graph of sample intensity versus size depicts a curve with 3 peaks as follows. 1. Lower, (25, 375). 2. 539, (550, 700). 3. Upper, (1500, 500). The values are approximate. — **Fig. 2**

A graph of sample intensity versus size depicts a curve with 3 peaks as follows. 1. Lower, (25, 425). 2. 566, (575, 475). 3. Upper, (1500, 500). The values are approximate. — **Fig. 3**

4.1.2 Exome Capture

One capture reaction was performed on this pool of 41 samples following Arbor protocol version 4: Baits (5.5 µl) were combined with the hybridisation components to 20 µl Hybridisation Mix. Blockers (2 µl IDT Blocker, 2.5 µl Block O) were added to 7.5 µl of the Illumina library resulting in 12 µl total. 18.5 µl of Hybridisation mix were combined with the 12 µl library/blocker mix and hybridization was allowed to occur in a PCR machine for 16 h at 65 °C. The bait/library hybrids were captured (with streptavidin-coated beads) and washed with 1× Buffer X (618 µl H2O, 156 µl wash buffer, 6.25 µl Hyb S). Beads were resuspended in 30 µl 10 mM TrisCl, 0.05% TWEEN-20, pH 8.0, and two independent enrichment PCRs (50 µl, KAPA HiFi) were performed, each with 15 µl of the bead suspension as template, 13 PCR cycles with 45 s extension time. Both PCRs were pooled (100 µl total) and subjected to PCR purification (Qiagen MinElute) and two subsequent bead-cleanups for size selection (1 and 0.7× with Ampure XP beads). Final DNA amount was assessed by fluorescence measurement (Qubit). A one in four dilution was assessed for size distribution on the Agilent TapeStation.

4.1.3 Output Exome Enriched NGS Library

DNA amount of the exome enriched library was assessed by fluorescence measurement (Qubit). A one in four dilution was assessed for size distribution on the Agilent TapeStation (Fig. 3). Average fragment size of the library was ~ 570 bp, which corresponds to an average insert size of ~ 460 bp, adaptors subtracted.

4.1.4 DNA Sequencing

The exome-enriched library, along with the list of sample indices, was submitted to a sequencing service provider for Illumina DNA sequencing PE150 (paired-end reads with 150 bp read length). We shipped 200 ng (50 µl, 4 ng/µl) and requested 400 Gbp raw data output. We received a total of 3.2 billion reads. They were fairly well distributed across the 41 samples (Fig. 4), with between 58 and 113 million reads per sample (median: 75 million).

A box plot indicates the number of reads in millions. The minimum and maximum values are 58 and 114, respectively. The lower and upper quartiles are 65 and 85, respectively. The median is 74. The values are approximate. — **Fig. 4**
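The relation between read numbers, raw data output, and nominal sequencing depth can be checked with a few lines of Python. This is a back-of-the-envelope sketch: it assumes that the reported read numbers count individual 150 bp reads (not read pairs), it uses the 151.8 Mbp predicted capture space from Sect. 2.1.1 as the target size, and the function names and the `usable_fraction` parameter are hypothetical.

```python
READ_LENGTH_BP = 150          # PE150: each read is 150 bp long
CAPTURE_SPACE_BP = 151.8e6    # predicted retrievable space of the probe set


def yield_gbp(n_reads: float) -> float:
    """Raw sequence output in Gbp for a given number of reads."""
    return n_reads * READ_LENGTH_BP / 1e9


def nominal_depth(n_reads: float,
                  target_bp: float = CAPTURE_SPACE_BP,
                  usable_fraction: float = 1.0) -> float:
    """Average depth over the target if `usable_fraction` of all bases map to it.

    usable_fraction = 1.0 is an upper bound; real values are lower because of
    off-target reads, duplicates, and unmapped reads.
    """
    return n_reads * READ_LENGTH_BP * usable_fraction / target_bp


total_output = yield_gbp(3.2e9)       # all 41 samples together
median_depth = nominal_depth(75e6)    # median sample, upper bound
print(f"{total_output:.0f} Gbp total, up to {median_depth:.0f}x per median sample")
```

Under these assumptions, 3.2 billion reads correspond to 480 Gbp, and the median sample with 75 million reads could reach at most about 74-fold average depth over the capture space.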

4.1.5 Analysis and Results

We aligned all 3.2 billion sequencing reads to the coffee reference genome, the same annotated reference assembly that had been used to derive the targets (Cara_1.0, NCBI accession number: GCF_003713225.1). The reads were aligned with software bwa mem (Li and Durbin 2009) as part of our automated analysis workflow: A Software Workflow for Automated Analysis of Genome (Re-) Sequencing Projects, A Laboratory Protocol (IAEA 2022b). Software and documentation are available on PBGL’s github page (https://github.com/pbgl).
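For readers who want to reproduce the alignment step outside of the PBGL workflow, a per-sample call of bwa mem piped into samtools sort could be assembled as in the Python sketch below. This is not the PBGL workflow itself: the file names, sample ID, thread count, and the use of samtools for sorting are illustrative assumptions, and the reference must have been indexed with `bwa index` beforehand.

```python
import subprocess
from typing import List


def bwa_mem_cmd(reference: str, fq1: str, fq2: str, sample: str, threads: int = 4) -> List[str]:
    """Build the argument list for a paired-end bwa mem call with a read group."""
    # bwa expects the literal two-character sequence "\t" inside the -R string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group, reference, fq1, fq2]


def samtools_sort_cmd(out_bam: str, threads: int = 4) -> List[str]:
    """Build the argument list for samtools sort reading SAM from stdin."""
    return ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]


def align_sample(reference: str, fq1: str, fq2: str, sample: str, out_bam: str) -> None:
    """Run `bwa mem | samtools sort`; requires bwa and samtools on the PATH."""
    bwa = subprocess.Popen(bwa_mem_cmd(reference, fq1, fq2, sample), stdout=subprocess.PIPE)
    sort = subprocess.Popen(samtools_sort_cmd(out_bam), stdin=bwa.stdout)
    bwa.stdout.close()  # let bwa receive SIGPIPE if samtools exits early
    sort_rc = sort.wait()
    bwa_rc = bwa.wait()
    if bwa_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"alignment failed (bwa: {bwa_rc}, samtools: {sort_rc})")


# Hypothetical file names; nothing is executed here, the command is only printed.
cmd = bwa_mem_cmd("Cara_1.0.fna", "s01_R1.fastq.gz", "s01_R2.fastq.gz", sample="s01")
print(" ".join(cmd))
```

Adding the sample name as a read group at alignment time keeps the 41 indexed samples distinguishable if their alignments are later merged or analysed jointly.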

A box plot of the fraction of reads versus the target region depicts an increasing trend. The data are as follows. 1. On-target, 80%. 2. On-target + 100, 86%. 3. On-target + 200, 88%. — **Fig. 5**

A box plot of the fraction of targets versus read coverage depicts a decreasing trend. The data are as follows. 1-fold, 91%. 5-fold, 84%. 10-fold, 75%. 20-fold, 63%. 40-fold, 45%. — **Fig. 6**

The on-target enrichment for each individual sample was assessed from the alignments to the reference, represented in per-sample .bam files, with the R/Bioconductor package TEQC (Hummel et al. 2011, 2020). Target definitions were the actual exons of the annotation (see Figs. 5 and 6 for results). As an example, a representative genomic region is shown in Fig. 7.

A screenshot of the Integrative Genomics Viewer window demonstrates the genomic region composed of bars of different colors. — **Fig. 7**
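The two quality measures shown in Figs. 5 and 6 can be illustrated with a simplified Python sketch on toy data. It is not TEQC's implementation and is far too slow for real data; it only shows what is being counted. Reads and targets are hypothetical half-open intervals on a single sequence, the coverage measure is computed per target base (an assumption about what the figure summarises), and all function names are made up for this example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def overlaps(read: Interval, target: Interval, flank: int = 0) -> bool:
    """True if the read overlaps the target extended by `flank` nt on both sides."""
    return read[0] < target[1] + flank and read[1] > target[0] - flank


def on_target_fraction(reads: List[Interval], targets: List[Interval], flank: int = 0) -> float:
    """Fraction of reads that overlap at least one (optionally flanked) target."""
    if not reads:
        return 0.0
    hits = sum(any(overlaps(r, t, flank) for t in targets) for r in reads)
    return hits / len(reads)


def fraction_bases_covered(reads: List[Interval], targets: List[Interval], min_depth: int) -> float:
    """Fraction of target bases covered by at least `min_depth` reads."""
    total = covered = 0
    for t_start, t_end in targets:
        for pos in range(t_start, t_end):
            depth = sum(1 for r_start, r_end in reads if r_start <= pos < r_end)
            total += 1
            covered += depth >= min_depth
    return covered / total if total else 0.0


# Toy data: two 100-nt targets and four 150-nt reads (not real alignments).
targets = [(100, 200), (500, 600)]
reads = [(90, 240), (150, 300), (400, 490), (520, 670)]

strict = on_target_fraction(reads, targets)             # reads touching a target
flanked = on_target_fraction(reads, targets, flank=100)  # targets +/- 100 nt
cov1 = fraction_bases_covered(reads, targets, min_depth=1)
cov2 = fraction_bases_covered(reads, targets, min_depth=2)
print(strict, flanked, cov1, cov2)
```

In the toy data, three of four reads touch a target directly and the fourth falls within the 100 nt flank, which mirrors how the on-target fraction increases when flanking regions are included. Likewise, the fraction of target bases covered drops as the required depth increases.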

5 Manuals

1. The manufacturer’s manuals for performing exome capture reactions with this kit

myBaits, Hybridization Capture for Targeted NGS Manual Version 4.01 April 2018, https://arborbiosci.com/wp-content/uploads/2019/08/myBaits-Manual-v4.pdf.

myBaits, Hybridization Capture for Targeted NGS User Manual Version 5.00 September 2020, https://arborbiosci.com/wp-content/uploads/2020/08/myBaits_v5.0_Manual.pdf.

2. Sequencing library preparation

The custom-indexed Nextera NGS libraries for Illumina Sequencing were prepared following the PBGL protocol: Library Preparation for Medium- to High-throughput DNA Sequencing on the Illumina Sequencing Platform, A Laboratory Protocol (IAEA 2022a).

3. Sequence read mapping

Read mapping with software bwa mem (see Note 5) (Li and Durbin 2009) was performed as part of PBGL's automated software workflow: A Software Workflow for Automated Analysis of Genome (Re-) Sequencing Projects, A Laboratory Protocol (IAEA 2022b).

4. Quality assessment of the capture reactions

TEQC: Quality control for target capture experiments, Hummel et al. (2020). DOI:10.18129/B9.bioc.TEQC, TEQC, R package version 4.18.0. https://bioconductor.org/packages/release/bioc/html/TEQC.html (Hummel et al. 2011).

6 Notes

1. Coffea arabica is an allotetraploid of Coffea eugenioides and Coffea canephora. We developed separate probe sets for the different sub-genomes, so that they can be used independently, if desired. For use in Coffea arabica they need to be pooled in representative ratios. If the user aims to enrich the chloroplast as well, that module can comprise a final 0.1% of the final pool, though optimization for the tissue type might be required.
2. xGen® Universal Blockers-NXT Mix, Catalog no. 1079584, purchased from Integrated DNA Technologies Inc. (IDT, www.idtdna.com).
3. If the amount of DNA in the library is limiting, then H₂O can be replaced with additional sequencing library.
4. https://software.broadinstitute.org/software/igv/.
5. https://bio-bwa.sourceforge.net/bwa.shtml.

References

Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloğlu A, Ozen S, Sanjad S, Nelson-Williams C, Farhi A, Mane S, Lifton RP (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 106:19096–19101. https://doi.org/10.1073/pnas.0910672106
Dong C, Zhang L, Chen Z, Xia C, Gu Y, Wang J, Li D, Xie Z, Zhang Q, Zhang X, Gui L, Liu X, Kong X (2020) Combining a new exome capture panel with an effective varBScore algorithm accelerates BSA-based gene cloning in wheat. Front Plant Sci 11:1249. https://doi.org/10.3389/fpls.2020.01249
Gardiner L-J, Brabbs T, Akhunov A, Jordan K, Budak H, Richmond T, Singh S, Catchpole L, Akhunov E, Hall A (2019) Integrating genomic resources to present full gene and putative promoter capture probe sets for bread wheat. GigaScience 8. https://doi.org/10.1093/gigascience/giz018
Hummel M, Bonnin S, Lowy E, Roma G (2011) TEQC: an R package for quality control in target capture experiments. Bioinformatics 27:1316–1317. Oxford, England. https://doi.org/10.1093/bioinformatics/btr122
Hummel M, Bonnin S, Lowy E, Roma G (2020). TEQC: Quality control for target capture experiments. R package version 4.18.0.
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. Oxford, England. https://doi.org/10.1093/bioinformatics/btp324
Mascher M, Richmond TA, Gerhardt DJ, Himmelbach A, Clissold L, Sampath D, Ayling S, Steuernagel B, Pfeifer M, D’Ascenzo M, Akhunov ED, Hedley PE, Gonzales AM, Morrell PL, Kilian B, Blattner FR, Scholz U, Mayer KFX, Flavell AJ, Muehlbauer GJ, Waugh R, Jeddeloh JA, Stein N (2013) Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond. Plant J Cell Mol Biol. 76:494–505. https://doi.org/10.1111/tpj.12294
IAEA (2022a) Library preparation for medium- to high-throughput DNA sequencing on the illumina sequencing platform. A laboratory protocol, https://www.iaea.org/resources/manual/library-preparation-for-medium-to-high-throughput-dna-sequencing-on-the-illumina-sequencing-platform-a-laboratory-protocol
IAEA (2022b) A software workflow for automated initial analysis of high-throughput DNA sequencing project. A laboratory protocol, https://www.iaea.org/resources/manual/a-software-workflow-for-automated-initial-analysis-of-high-throughput-dna-sequencing-project-a-laboratory-protocol
Russell J, Mascher M, Dawson IK, Kyriakidis S, Calixto C, Freund F, Bayer M, Milne I, Marshall-Griffiths T, Heinen S, Hofstad A, Sharma R, Himmelbach A, Knauft M, van Zonneveld M, Brown JWS, Schmid K, Kilian B, Muehlbauer GJ, Stein N, Waugh R (2016) Exome sequencing of geographically diverse barley landraces and wild relatives gives insights into environmental adaptation. Nat Genet 48:1024–1030. https://doi.org/10.1038/ng.3612
Scalabrin S, Toniutti L, Di Gaspero G, Scaglione D, Magris G, Vidotto M, Pinosio S, Cattonaro F, Magni F, Jurman I, Cerutti M, Suggi Liverani F, Navarini L, Del Terra L, Pellegrino G, Ruosi MR, Vitulo N, Valle G, Pallavicini A, Graziosi G, Klein PE, Bentley N, Murray S, Solano W, Al Hakimi A, Schilling T, Montagnon C, Morgante M, Bertrand B (2020) A single polyploidization event at the origin of the tetraploid genome of Coffea arabica is responsible for the extremely low genetic variation in wild and cultivated germplasm. Sci Rep 10:4642. https://doi.org/10.1038/s41598-020-61216-7

Acknowledgements

This work was funded by FAO/IAEA. The Exome Capture Kit design was a contribution by Daicel Arbor Biosciences (Ann Arbor, Michigan, USA). Mr Florian Goessnitzer (IAEA) provided tissue from in vitro coffee plantlets.

Author information

Authors and Affiliations

Plant Breeding and Genetics Laboratory (PBGL), Joint FAO/IAEA Centre for Nuclear Applications in Food and Agriculture, International Atomic Energy Agency, Seibersdorf, Austria
Norman Warthmann


Corresponding author

Correspondence to Norman Warthmann.

Editor information

Editors and Affiliations

Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Centre of Nuclear Techniques in Food and Agriculture, IAEA Laboratories Seibersdorf, International Atomic Energy Agency, Vienna International Centre, Vienna, Austria
Ivan L.W. Ingelbrecht
Universidade de Lisboa, Instituto Superior de Agronomia, Lisboa, Portugal
Maria do Céu Lavado da Silva
Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Centre of Nuclear Techniq, Vienna, Austria
Joanna Jankowicz-Cieslak

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this chapter

Cite this chapter

Warthmann, N. (2023). Targeted Sequencing in Coffee with the Daicel Arbor Biosciences Exome Capture Kit. In: Ingelbrecht, I.L., Silva, M.d.C.L.d., Jankowicz-Cieslak, J. (eds) Mutation Breeding in Coffee with Special Reference to Leaf Rust. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-67273-0_19


DOI: https://doi.org/10.1007/978-3-662-67273-0_19
Published: 01 October 2023
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-67272-3
Online ISBN: 978-3-662-67273-0
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
