Non-destructive genome skimming for aquatic copepods

Copepods are important ecologically and represent a large amount of aquatic biomass in both freshwater and marine systems. Despite this, the taxonomy of copepods and other meiofauna is not well understood, hampered by tiny sizes, cryptic taxa, intraspecific polymorphisms and total specimen destruction where DNA methods are employed. In this article we highlight these issues and propose a more up-to-date approach for dealing with them. Namely, we recommend non-destructive DNA extraction methods, coupled with high-throughput sequencing (HTS). Whilst DNA yields may be low, they should still be sufficient for HTS library preparation and DNA sequencing. At the same time morphological specimens can be preserved and the crucial link between morphology and DNA sequence is maintained. This is critical for an integrative taxonomy and a fuller understanding of biodiversity patterns as well as evolutionary processes in meiofauna.

Copepods (subclass Copepoda), often called the "insects of the sea", are one of the most important and diverse groups of aquatic crustaceans on the planet in terms of total biomass. They dominate the plankton and occur in aquatic sediments from freshwater to the deep sea, as well as in ground waters, forest litter, moss, moist soils, wet packed leaves, and even Himalayan glacier lakes. Some copepods are free-living, whereas others are associated with a wide range of animals (Walter and Boxshall 2019). The orders Siphonostomatoida and Monstrilloida are exclusively parasitic (Fogel et al. 2017; Suárez-Morales 2018), whereas only some species of the orders Calanoida, Cyclopoida, Canuelloida, and Harpacticoida are parasitic or associated with a wide variety of organisms (Boxshall et al. 2016; Ho 2001; Huys 2016). The orders Platycopioida, Misophrioida, Mormonilloida, and Gelyelloida are entirely free-living (Varela and Lalana 2015). The order Calanoida is the most diverse and widely distributed group and is the dominant component of zooplankton samples (Huys and Boxshall 1991). Currently, 10,000 valid species of copepods have been recorded and described (Walter and Boxshall 2019), of which 2814 species are reported from freshwaters (Boxshall and Defaye 2008). The approximate number of valid taxa among the ten orders is depicted in Fig. 1. It has also been estimated that a large number of species remain undescribed (Humes 1994).
Copepods have tremendous ecological significance as well as commercial value. They play key roles in aquatic food webs and carbon flux (Legendre and Rivkin 2002); help control mosquito-borne diseases by consuming mosquito larvae (Marten et al. 2000); serve as prey for higher trophic levels and are therefore used in aquaculture (Zeng et al. 2018); have been considered as a possible food for human consumption (Eysteinsson et al. 2018); act as bioindicators of water quality (Annabi-Trabelsi et al. 2019); and are a good model for studying ecological changes (Grieve et al. 2017). Despite this, copepods remain a taxonomically difficult group, and accurate species identification is the basic criterion of biodiversity assessment. Cryptic species are frequently encountered among animals from terrestrial to aquatic habitats, and copepods are no exception (Gomes et al. 2015; Pulido-Santacruz et al. 2018; Ramos et al. 2019; Vakati et al. 2019). Because such species are genetically divergent but morphologically homogeneous, classical taxonomy underestimates their diversity (Lajus et al. 2015; Vakati et al. 2019). Cryptic taxa are often characterised by allopatric distribution patterns (Dodson et al. 2003; Garlitska et al. 2012); however, some cryptic taxa among benthic copepods exist sympatrically (Schizas et al. 2002; Vakati et al. 2019). Although cryptic taxa are morphologically similar, each species may differ in its ecological significance, economic value, and interactions with ecosystems (Eisenring et al. 2016), so it is crucial to identify these species accurately.
Incorrect pairing of sexes can also occur among closely related species; for example, Nannopus minutus and N. dimorphicus are morphologically different but could be misidentified through incorrect pairing of their sexes when classical taxonomy alone is used (Vakati et al. 2019). Overall, the underestimation of cryptic taxa and the inverse problem of variable species being erroneously split are both highly problematic, especially if only traditional taxonomic approaches are followed. A more relevant taxonomy is best achieved with an integrative molecular and morphological approach, and newer molecular methods (e.g. non-destructive genome skimming) provide a clear advantage (Fig. 2).
Several works have discussed the importance of integrative taxonomy for meiofauna, with reciprocal illumination based on genetic and morphological identification (Castro-Romero et al. 2016; Garraffoni et al. 2019). In the past, specimens have typically been destroyed to extract DNA for molecular analyses, so morphological data are obtained from different specimens (Di Capua et al. 2017; Karanovic et al. 2015). However, this approach is not effective when several cryptic taxa live sympatrically (Schizas et al. 2002; Vakati et al. 2019). In contrast to macrofauna, meiofauna are extremely small (50-1000 μm), so it is difficult to extract genomic DNA (gDNA) from only part of a specimen and thereby obtain genetic and morphological information from the same individual. Recently, non-destructive DNA extraction methods have been suggested as a way to deal with cryptic taxa in copepods (Cornils 2015), whereby specimens are preserved after DNA extraction for morphological studies.
The largest copepod reported so far, Pennella balaenopterae, an ectoparasite of the fin whale, grows up to 32 cm (Vecchione and Aznar 2014); the smallest, the male of Sphaeronella monothrix, a parasite of marine ostracods, reaches only 0.11 mm (Bowman and Kornicker 1967). Despite this range of sizes, most copepods are small, typically between 1 and 2 mm (Walter and Boxshall 2019). A recent study has demonstrated the usefulness of non-destructive DNA extraction, successfully identifying and describing several species of Nannopus from the Yellow Sea (Vakati et al. 2019). In this approach, specimens are first washed in distilled water and subjected to gDNA extraction in lysis buffer, and are then placed in ethanol for morphological analyses, without damage. Because of their small size, the concentration of gDNA obtained from a single copepod will be low (< 10 ng/μl), and the DNA is usually recovered in approximately 20-50 μl of buffer. Copepods have fragile exoskeletons, so specimens must be handled with care during extraction. It is not appropriate to extract gDNA from several specimens together to obtain higher concentrations, as such a heterogeneous mixture of specimens could lead to analytical discrepancies.
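The yield arithmetic implied by these figures can be illustrated with a minimal sketch. It assumes that total DNA mass is simply concentration multiplied by buffer volume, and compares the result with a library input requirement; the 50 ng value used here is the routine input amount cited later in this article, while the specimen names and measurements are hypothetical and do not come from any real extraction.

```python
# Minimal sketch: total DNA recovered from a single-specimen extract.
# All specimen names and measurements below are hypothetical illustrations.

def total_dna_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Return total DNA mass in ng: concentration (ng/ul) x volume (ul)."""
    if concentration_ng_per_ul < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_ng_per_ul * volume_ul


def enough_for_library(total_ng: float, required_input_ng: float) -> bool:
    """True if the extract meets an assumed library input requirement."""
    return total_ng >= required_input_ng


# Assumed input requirement; the article cites 50 ng as a routine amount.
REQUIRED_INPUT_NG = 50.0

# Hypothetical extracts: (concentration in ng/ul, buffer volume in ul).
extracts = {
    "specimen_A": (4.0, 25.0),
    "specimen_B": (1.5, 20.0),
}

for name, (conc, vol) in extracts.items():
    mass = total_dna_ng(conc, vol)
    if enough_for_library(mass, REQUIRED_INPUT_NG):
        verdict = "meets routine input"
    else:
        verdict = "below routine input"
    print(f"{name}: {mass:.1f} ng ({verdict})")
```

An extract falling below the routine input is not necessarily unusable, since lower-input library preparation may still be possible; the check only flags which extracts would need such handling.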
Some of the most widely used markers for molecular systematics of copepods are mtCOI, mtCYTB, 18S and 28S rDNA, and occasionally ITS2, H3, 12S, and 16S rDNA (Braga et al. 1999; Blanco-Bercial et al. 2011; Cornils and Blanco-Bercial 2013; Figueroa 2011; Hirai et al. 2013; Huys et al. 2007; Jørgensen et al. 2010; Khodami et al. 2017; Marrone et al. 2013; Marszalek et al. 2008; Thum 2004; Thum and Harrison 2009; Vakati et al. 2019; von Reumont et al. 2012; Wyngaard et al. 2010). Often universal primers do not effectively amplify both highly variable protein coding genes and conserved ribosomal genes (Cepeda et al. 2012; Lv et al. 2014), and attempting several PCR reactions with ineffective primer combinations can result in wasting gDNA extracts. As an example, we failed to amplify mtCOI even after several attempts using several universal primer combinations for two species of Nannopus, and consequently used all of the gDNA extracts with no additional specimens available (Vakati and Lee in press). An additional problem is that universal primers often amplify pseudogenes (Machida and Lin 2017; Song et al. 2008), which can lead to an overestimation of species and is highly problematic, hindering accurate estimates of biodiversity (Song et al. 2008). High-throughput sequencing (HTS) methods (e.g. genome skimming) are potentially powerful tools to overcome these issues. For example, we employed genome skimming using Illumina technology from approximately 100 ng of DNA (< 5 ng/μl), non-destructively extracted from a single specimen of Nannopus ganghwaensis (Vakati et al. 2016), a benthic harpacticoid copepod. The results yielded a complete mitochondrial genome with good coverage (Vakati et al. unpublished).
Fig. 1 An approximate number of total valid taxa amongst ten orders of Copepoda, subdivided into (i) families and subfamilies; (ii) genera and subgenera; and (iii) species and subspecies. The numbers of taxa are estimated from Walter and Boxshall (2019) and also follow Suárez-Morales (2015)
Fig. 2 Summary of gDNA extraction and high-throughput sequencing (HTS) methods applied to evolutionary and ecological studies of meiobenthic animals. Integrative methods involve genetics and morphological observations from the same specimens. Traditional methods demonstrate that results from both morphology-alone or genetics-alone can lead to confusion or discrepancies such as: (i) no guarantee that separate specimens belong to the same species in the case of cryptic taxa; (ii) molecular phylogeny will not have morphological data support and vice versa; (iii) classical taxonomy cannot accurately identify cryptic species; (iv) sometimes incorrect pairing of sexes and overestimation of species can happen with classical taxonomy; (v) phylogeny, population genetics, and DNA barcoding based on single genes are often insufficient to resolve relationships; and (vi) single genes often do not amplify effectively with 'universal' primers. Figure adapted from Vakati et al. (2019)
The term genome skimming was coined by Straub et al. (2012) and refers to shallow shotgun sequencing used to obtain genomic data from eukaryotic taxa. The method has already been applied to several terrestrial and aquatic animals, as well as plants and protists, for a range of biodiversity research questions (Miller et al. 2011; Richter et al. 2015). However, its use on meiofaunal specimens is currently uncommon, especially for non-model benthic copepods. Although DNA concentrations can be very low, HTS libraries can be prepared routinely from 50 ng of DNA, and theoretically from as little as a few picograms of DNA. Another advantage is that HTS libraries overcome the problems associated with less-effective amplification by 'universal' primers, as total DNA is sequenced without the need for specific primer combinations. Genome skimming permits the efficient sequencing of the high-copy portion of nuclear DNA as well as high-copy organellar DNA from a single specimen. This method can therefore be used to deal with cryptic taxa even when only one specimen is available.
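Why shallow sequencing still recovers high-copy regions can be illustrated with the standard mean-depth relation: depth equals the number of reads mapping to a target, multiplied by read length, divided by target size. The sketch below is only an illustration under stated assumptions. The read count, read length, mitogenome size, nuclear genome size, and fraction of mitochondrial reads are all hypothetical placeholder values, not measurements from this study or from any particular copepod.

```python
# Minimal sketch: expected mean sequencing depth in a shallow shotgun run.
# Every numeric value below is a hypothetical placeholder, not study data.

def expected_mean_depth(total_reads: int, read_length_bp: int,
                        target_size_bp: int, fraction_of_reads: float) -> float:
    """Mean depth = (reads on target x read length) / target size."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    if not 0.0 <= fraction_of_reads <= 1.0:
        raise ValueError("fraction_of_reads must lie between 0 and 1")
    return total_reads * fraction_of_reads * read_length_bp / target_size_bp


TOTAL_READS = 2_000_000        # assumed size of a shallow sequencing run
READ_LENGTH_BP = 150           # assumed read length
MITO_SIZE_BP = 15_000          # assumed mitogenome size
NUCLEAR_SIZE_BP = 500_000_000  # assumed nuclear genome size
MITO_READ_FRACTION = 0.005     # assumed share of reads from the mitogenome

mito_depth = expected_mean_depth(
    TOTAL_READS, READ_LENGTH_BP, MITO_SIZE_BP, MITO_READ_FRACTION)
nuclear_depth = expected_mean_depth(
    TOTAL_READS, READ_LENGTH_BP, NUCLEAR_SIZE_BP, 1.0 - MITO_READ_FRACTION)

print(f"mitogenome mean depth: {mito_depth:.1f}x")
print(f"nuclear mean depth:    {nuclear_depth:.3f}x")
```

Under these assumed values the mitogenome is covered roughly 100-fold while the single-copy nuclear genome stays below 1-fold, which is the sense in which skimming favours high-copy organellar and ribosomal DNA. Real proportions of organellar reads vary between taxa and tissues, so the fraction should be treated as a parameter to be estimated rather than a constant.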
Phylogenetic relationships of copepod orders have thus far been resolved based only on partial gene sequences (mtCOI, H3, 28S, and 18S rDNA), whereas genome skimming would also help to construct phylogenies based on complete mitogenomes of all orders (and larger stretches of nuclear ribosomal DNA). At present, approximately 15 copepod species have complete mitogenomes available and 5 have draft nuclear genome sequences (Jørgensen et al. 2019). Employing genome skimming techniques for meiofauna will also improve genomic databases, which would eventually help further studies in metagenomics and environmental DNA (eDNA) analyses, reducing the reliance of such studies on traditional barcode regions. By combining non-destructive DNA extraction with genome skimming, one can identify species and construct phylogenetic relationships more efficiently. This approach has wider implications for species identification, molecular ecology (metabarcoding/eDNA), DNA barcoding, phylogenomics, and population genetics, and would be a significant step forward for biodiversity research on copepods and meiofauna more broadly.

Compliance with ethical standards
Conflict of interest All authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.