Population Genomics and Biogeography of the Northern Acorn Barnacle (Semibalanus balanoides) Using Pooled Sequencing Approaches
The northern acorn barnacle (Semibalanus balanoides) is a robust system for the study of evolutionary processes in the intertidal. S. balanoides has a well-characterized ecology, a wide circumboreal distribution, and a life history characterized by tractable environmental stressors at various ecological scales. In this chapter, we discuss the development of S. balanoides as a model in ecological genomics, as well as inferences of demography and historical phylogeography. In addition, we introduce two novel genomic tools for S. balanoides: the complete mtDNA sequence and the second draft of the nuclear genome (Sbal2). Using these tools, we reanalyzed the previously described mtDNA haplotypes, a and b, and estimated genome-wide levels of variation and population structure across the North Atlantic using pooled sequencing approaches. Analyses of sequence data from older and more recent Illumina platforms revealed the effects of technical bias on estimates of population genomic metrics. We found concordant levels of nuDNA and mtDNA genetic variation with no evidence of demographic bottlenecks. We observed low genome-wide FST values across the Atlantic, suggesting a large number of ancestral polymorphisms and shared standing variation across the basin. Comparisons of genome-wide estimates of FST with those derived from a discriminant analysis of principal components uncovered population-structure-informative SNPs. This suggests the existence of latent population structure across broad scales, despite the capacity for extensive planktonic dispersal. Notably, our samples collected in Iceland displayed higher similarity to North American populations than to the rest of Europe. We hypothesize that this pattern reflects a periglacial refugium in Iceland together with a barrier to gene flow caused by the North Atlantic Current. Lastly, we discuss challenges and opportunities for the improvement of genomic tools in barnacles. Our reflections in this area are easily generalizable to most natural populations.
Keywords: Barnacles; Ecological genomics; Genome assembly; Mitochondria; Pooled sequencing; Population genetics; Semibalanus balanoides
The authors thank Kim Neil, Stephen Rong, and Alejandro Damian-Serrano for their insightful discussion of genetic variation, statistical genetics, and pipeline design, as well as for comments on the manuscript, and Dylan R. Gaddes for editorial comments that improved the manuscript. This work was made possible by Brown University through the use of the facilities of its Center for Computation and Visualization. This work was funded by NSF grant IGERT DGE-0966060 to DMR. DMR acknowledges support from NIH 2R01GM067862.
Refers to the process in which previously isolated populations begin interbreeding.
Genetic variation present in two (or more) species, subspecies, or populations that appeared prior to divergence.
A graph that can be traversed to create an assembled DNA sequence. The most commonly used assembly graph is a De Bruijn graph.
DNA or RNA sequence, typically assembled from multiple overlapping short sequence reads.
Indicates the number of times that a particular genomic region was sampled by mapped reads produced by a sequencing experiment.
Cytochrome c oxidase I, a gene encoded in mtDNA involved in the electron transport chain. This gene is commonly used in population genetic studies and species identification or DNA barcoding.
The mitochondrial DNA control region, also known as the displacement loop. It contains the sequences for the origin of replication and transcription of the mtDNA molecule. Its high rate of DNA substitution makes it suitable for analyses of closely related populations and species.
A directed graph representing overlaps between k-mers present in a set of reads. Nodes represent k-mers, and each edge connects two k-mers that overlap by k − 1 bases.
The process of stringing together overlapping DNA sequence reads to make longer DNA sequences, called contigs. Perfect genome assembly would produce 1 contig for each chromosome.
The effective number of breeding individuals in a population, equivalent to the idealized population size in which the effects of stochastic sampling on allele frequencies (i.e. genetic drift) are similar to the real population of interest.
Highly parallelized DNA sequencing that produces millions to hundreds of millions of DNA sequences of varying length (50–250 bp for the Illumina platform; 1,000 to >20,000 bp for the Pacific Biosciences (PacBio) and Oxford Nanopore platforms).
A condition in which a character is shared by a set of species or populations but was not present in their common ancestor. In DNA terminology, it may refer to the independent mutation (or back-mutation) to the same nucleotide state in two populations.
The process by which a phylogenetically informative marker is shared among species or populations in which other markers have diverged to fixation in each population.
An insertion or deletion in a DNA sequence.
A DNA sequence of length k. In genome assembly, k-mers are generated by splitting reads into smaller pieces of length k.
DNA sequences longer than 1,000 bp.
The process of identifying a subsequence or multiple subsequences in the reference genome that matches or approximately matches a read.
A length-weighted median of a set of sequences (or contigs), commonly used to summarize assembly contiguity. N50 is the length N such that 50% of all bases are contained within sequences with length greater than or equal to N.
An idealized demographic model in which all members of a population mate randomly, resulting in panmixia.
An experimental approach for the quantification of genetic variation in populations through the pooling and subsequent sequencing of multiple individuals.
An experimental approach to quantify genetic variation in populations by sampling a reduced (~10%) portion of the genome to high coverage.
A set of genomic sequences that represents the genome of a population or species. These sequences may include DNA from multiple individuals.
The first generation of the Semibalanus balanoides genome.
The second generation of the Semibalanus balanoides genome.
The introduction of sequencing artifacts by the unequal sampling of DNA sequences due to characteristics of the target sequence, such as GC content.
DNA sequences with lengths ranging from 50 to 200 bp.
A genomic variant occurring at a single-nucleotide position in genomic sequences.
Allelic variation that currently exists within populations as opposed to new variants arising by de novo mutation.
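Several of the definitions above (k-mer, De Bruijn graph, N50) describe small algorithms, and a toy sketch may help make them concrete. The Python below is an illustration only and is not the pipeline used to build Sbal2. The function names `kmers`, `de_bruijn`, and `n50` are hypothetical. The graph is assumed to follow the node-centric convention (nodes are k-mers, edges join k-mers that overlap by k − 1 bases), and N50 is assumed to follow the common convention of the largest length N such that sequences of length at least N contain at least half of all bases.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every substring of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build a toy node-centric De Bruijn graph (assumed convention).

    Nodes are k-mers; a directed edge joins two k-mers that are adjacent
    in some read, i.e. that overlap by k - 1 bases.
    """
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    # Sort successors so the output is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}


def n50(lengths):
    """Return N50: the largest length N such that sequences of length >= N
    together hold at least half of all bases."""
    if not lengths:
        raise ValueError("n50 needs at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `kmers("ATGGC", 3)` yields `ATG`, `TGG`, and `GGC`, and `n50([2, 3, 4, 5, 6])` returns 5, because the two longest sequences (6 + 5 = 11 bases) already hold more than half of the 20 bases in the set. Real assemblers add error correction, reverse complements, and graph simplification, none of which is attempted here.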