Copy number variation arising from gene conversion on the human Y chromosome

Shi, Wentao; Massaia, Andrea; Louzada, Sandra; Banerjee, Ruby; Hallast, Pille; Chen, Yuan; Bergström, Anders; Gu, Yong; Leonard, Steven; Quail, Michael A.; Ayub, Qasim; Yang, Fengtang; Tyler-Smith, Chris; Xue, Yali

doi:10.1007/s00439-017-1857-9

Original Investigation
Open access
Published: 05 December 2017

Volume 137, pages 73–83 (2018)

Human Genetics



Abstract

We describe the variation in copy number of a ~ 10 kb region overlapping the long intergenic noncoding RNA (lincRNA) gene, TTTY22, within the IR3 inverted repeat on the short arm of the human Y chromosome, leading to individuals with 0–3 copies of this region in the general population. Variation of this CNV is common, with 266 individuals having 0 copies, 943 (including the reference sequence) having 1, 23 having 2 copies, and two having 3 copies, and was validated by breakpoint PCR, fibre-FISH, and 10× Genomics Chromium linked-read sequencing in subsets of 1234 individuals from the 1000 Genomes Project. Mapping the changes in copy number to the phylogeny of these Y chromosomes previously established by the Project identified at least 20 mutational events, and investigation of flanking paralogous sequence variants showed that the mutations involved flanking sequences in 18 of these, and could extend over > 30 kb of DNA. While either gene conversion or double crossover between misaligned sister chromatids could formally explain the 0–2 copy events, gene conversion is the more likely mechanism, and these events include the longest non-allelic gene conversion reported thus far. Chromosomes with three copies of this CNV have arisen just once in our data set via another mechanism: duplication of 420 kb that places the third copy 230 kb proximal to the existing proximal copy. Our results establish gene conversion as a previously under-appreciated mechanism of generating copy number changes in humans and reveal the exceptionally large size of the conversion events that can occur.

Introduction

Copy number variation (CNV, which also refers to copy number variants) is well-documented in the human genome, affecting more nucleotides than are affected by SNP variation and contributing abundantly to phenotypic diversity (Lupski 2015; Sudmant et al. 2015a, b). CNVs arise by several mechanisms, including non-allelic homologous recombination, non-homologous end joining, microhomology-mediated break-induced replication, and retroelement insertions (Hastings et al. 2009; Carvalho and Lupski 2016). Gene conversion, the non-reciprocal transfer of genetic information from one locus or allele to another, is also well-documented in the human genome as a possible consequence of double-strand breaks and subsequent repair involving homologous regions, including the formation and resolution of a double Holliday junction during meiosis (Szostak et al. 1983). It generally involves short stretches of DNA converting between alleles or nearby paralogs, requires high sequence similarity, and occurs less frequently between different chromosomes. Allelic conversion tract lengths of up to 22 kb are known (Wang et al. 2012), while lengths for non-allelic events are shorter, although > 9 kb has been reported (Chen et al. 2007; Hallast et al. 2013). CNV and gene conversion are generally considered quite distinct processes, but here we describe a CNV whose origin is best explained by gene conversion events, linking the two.

The male-specific region of the Y chromosome offers unique opportunities for investigating both CNV and gene conversion, because (1) it is particularly tolerant of genetic variation (Poznik et al. 2016), so a wide variety of variants persist in the population and (2) the lack of recombination between different Y lineages allows the history of variants to be identified from the phylogeny (Jobling 2008; Jobling and Tyler-Smith 2017; Massaia and Xue 2017; Trombetta and Cruciani 2017). We have previously described the genetic variation in a set of 1244 diverse worldwide Y chromosomes sequenced in phase 3 of the 1000 Genomes Project, including the identification and validation of 110 CNVs (Poznik et al. 2016). The current study presents a detailed analysis of one of these, dbVar ID esv3818053 [chrY: 9,640,466–9,653,590 GRCh37 (hg19); chrY: 9,802,857–9,815,981, GRCh38 (hg38)], here designated by the more descriptive name TTTY22-CNV, because it overlaps with ~ 85% of this lincRNA gene and changes the number of its functional copies.

Results and discussion

Validation of TTTY22-CNV

TTTY22-CNV lies within the ~ 300 kb long inverted repeat 3 (IR3), which has two copies on the short arm of the Y chromosome, at approximately 6.1–6.4 and 9.4–9.7 Mb (GRCh37) (Skaletsky et al. 2003). In the reference sequence (Skaletsky et al. 2003), derived mostly from haplogroup R1b (Wei et al. 2013), the proximal copy contains an additional ~ 10 kb segment which carries most of TTTY22 (Fig. 1a). Variation in copy number of this additional segment was initially identified using Genome STRiP and validated by array-CGH (Poznik et al. 2016), revealing 0–3 copies, compared with the single copy in the reference sequence. TTTY22-CNV was further validated here by establishing a PCR assay using one pair of flanking primers which generated a 568 bp product in the absence of the TTTY22-CNV, and one pair within TTTY22 which generated a 249 bp product in its presence (Table S1). Structures matching the reference sequence with one copy of TTTY22-CNV generate both products, those with 0 copies generate only the 568 bp band, and those with 2 or 3 copies only the 249 bp band (Fig. 1c). In addition, fibre-FISH experiments using probes generated from two partially overlapping BAC clones (P1 and P2) spanning TTTY22-CNV, together with two 5 kb custom PCR-generated probes (P3 and P4, combined; Table S1) lying mainly within TTTY22-CNV are expected to show differential patterns: (1) hybridization of only the BAC clones plus a small fragment of P4 when the TTTY22-CNV is absent and (2) hybridization of all four probes when the TTTY22-CNV is present. As expected, both patterns were detected in similar proportions (7:9, Fig. 1b, Fig. S1) in samples with structures matching the reference sequence. Conversely, only the former pattern (18:0, Fig. S1) was detected in samples with 0 copies of TTTY22-CNV and only the latter pattern (0:12, Fig. S1) was observed in samples with 2 copies of TTTY22-CNV (Fig. 1b).
Samples with three copies showed the latter pattern plus a separate, additional fibre-FISH signal, and are described separately below. Finally, 10x Genomics Chromium linked-read data (Zheng et al. 2016) were available for samples with one and two copies; the sample with one copy shows uniform distributions of barcode sharing and read depth across both TTTY22-CNV locations, while the sample with two copies shows increased barcode sharing and read depth in IR3_proximal and decreased sharing and depth in IR3_distal (Fig. S2). Thus, this combination of validation approaches confirms both the predicted details of the breakpoints (PCR) and the broader context of the TTTY22-CNV location within IR3 (fibre-FISH and linked-read sequencing), so we conclude that individuals with 0, 1, or 2 copies of TTTY22-CNV can be explained by variation within the IR3 repeats.
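The logic of the two-amplicon PCR assay described above can be written as a small lookup. This is only an illustrative sketch: the function name and the wording of the returned labels are assumptions made here, while the band sizes and their interpretation are taken from the text.

```python
def interpret_pcr(band_568: bool, band_249: bool) -> str:
    """Map presence/absence of the two PCR products to a TTTY22-CNV class.

    band_568: product from the flanking primers (empty site present).
    band_249: product from the primers within TTTY22 (insert present).
    """
    if band_568 and band_249:
        # One IR3 copy carries the insert, the other is empty.
        return "1 copy (matches reference)"
    if band_568:
        # Both IR3 copies are empty.
        return "0 copies"
    if band_249:
        # No empty site remains; PCR alone cannot separate 2 from 3 copies.
        return "2 or 3 copies"
    # Neither product, e.g. the chimpanzee and gorilla controls.
    return "no product (uninformative)"


if __name__ == "__main__":
    for bands in [(True, True), (True, False), (False, True), (False, False)]:
        print(bands, "->", interpret_pcr(*bands))
```

Because the assay cannot distinguish two from three copies, the three-copy samples need the additional fibre-FISH and read-depth evidence described later.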

Population variation and phylogeny of TTTY22-CNV

In the 1234 Y chromosomes examined for copy number (ten samples were excluded because of Y-chromosomal mosaicism), 266 had 0 copies, 943 had 1, 23 had 2 copies, and two had 3 copies (Table S2). Examination of the Y-chromosomal phylogeny established for these samples using Y-SNP variation (Poznik et al. 2016) showed that, as expected, the different copy numbers of TTTY22-CNV were often clustered in branches of the phylogeny, so that the number of mutational events inferred was just 20 (Fig. 2, and see below for further details), nevertheless indicating a relatively high mutation rate compared with SNPs. IR3 (including TTTY22-CNV) is not detectable in the chimpanzee or gorilla genome sequences (the longest BLAST hit is 7.4 kb, in block B, and shares only 87.97% sequence identity with chimpanzee Y:15,046,773-15,054,094) and the TTTY22-CNV PCR primers do not detect any product in male chimpanzee or gorilla (Fig. 1c), so examination of an outgroup is not informative about the ancestral state of TTTY22-CNV. Within the human Y-chromosomal phylogeny sampled, all major haplogroups include structures matching the reference with one copy of TTTY22-CNV (Fig. 2), so we infer that the most recent common ancestor of the Y chromosomes examined probably also carried one copy.
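As a quick arithmetic check on the counts above, the following sketch recomputes the sample total and the frequency of each copy-number class. The counts come from the text; the variable names are illustrative.

```python
# Copy-number class -> number of Y chromosomes (from Table S2 as summarised in the text).
counts = {0: 266, 1: 943, 2: 23, 3: 2}

sequenced = 1244          # Y chromosomes in the phase 3 data set
excluded_mosaic = 10      # excluded because of Y-chromosomal mosaicism

total = sum(counts.values())
assert total == sequenced - excluded_mosaic  # 1234 chromosomes examined

# Frequency of each class among the chromosomes examined.
frequencies = {copies: n / total for copies, n in counts.items()}

# Chromosomes whose copy number differs from the single-copy reference.
non_reference = total - counts[1]

if __name__ == "__main__":
    for copies, freq in frequencies.items():
        print(f"{copies} copies: {counts[copies]:4d} ({freq:.1%})")
    print(f"non-reference: {non_reference} ({non_reference / total:.1%})")
```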

A gene conversion mechanism for most TTTY22-CNV variation

Since the copy number of the IR3 repeat as a whole does not vary in the samples examined (see below for a partial exception), the most likely mechanisms to generate 0, 1, or 2 copies of TTTY22-CNV variation would be gene conversion or double crossover in the IR3 sequences flanking TTTY22-CNV, either replacing the proximal copy with the distal to generate 0 copies of TTTY22-CNV, or replacing the distal copy with the proximal to generate 2 copies (Fig. 1a, lower section). These flanking sequences are 99.45% identical (99.65% for Block A; 99.13% for Block B), but show occasional variants in the reference sequence differentiating the proximal and distal IR3 copies, known as Paralogous Sequence Variants (PSVs) or Sequence Family Variants (SFVs), which are expected to be present in the population as well. The low-coverage sequences available from the 1000 Genomes Project prevented reliable de novo detection of PSVs, because first, an erroneous PSV allele call might be made in a single read and misinterpreted as a true variant, and second, PSV state would not be called at all if there was zero coverage of a particular position. In contrast, the genotypes of PSVs known in the reference sequence can be extracted reliably, since the probability of a single-nucleotide variant that matches a known PSV allele being genuine is much higher than for a novel variant, even for low-coverage sequences. By aligning the reference sequences of the distal and proximal IR3 copies, we identified 165 PSVs and counted the number of reads carrying each PSV allele in each sample, usually detecting both alleles (Table S3). In this way, sizes of mutations including TTTY22-CNV and all such TTTY22-CNV mutations throughout IR3 were estimated (Figs. 3, 4, respectively; Table S4). Missing data increase the uncertainty of these estimates and thus decrease the resolution of the mapping; low sample depth increases the probability of data being missing and thus decreases the resolution. 
The resolution of the size estimates was entirely determined by the location of the PSVs: we can only detect a mutation when a PSV is affected. Therefore, for every mutation event, we give two numbers: the maximum and minimum lengths given the information from the PSVs.
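The minimum and maximum lengths can be illustrated with a short sketch. It assumes each PSV has already been classified as converted (only one paralogous allele seen), unconverted (both alleles seen), or missing; the function name, the input layout, and the positions in the example are hypothetical and are not taken from the study's own script.

```python
from typing import Optional, Sequence, Tuple

Psv = Tuple[int, Optional[bool]]  # (position, state); state None = missing data


def tract_bounds(psvs: Sequence[Psv], region_start: int, region_end: int) -> Tuple[int, int]:
    """Return (min_len, max_len) in bp for one mutation event.

    psvs must be sorted by position. state is True for a converted PSV,
    False for an unconverted PSV, and None when there is no usable data.
    """
    converted = [pos for pos, state in psvs if state is True]
    if not converted:
        raise ValueError("no converted PSV: the event is not detectable")
    first, last = converted[0], converted[-1]

    # An unconverted PSV inside the run would imply more than one event.
    if any(state is False and first < pos < last for pos, state in psvs):
        raise ValueError("converted PSVs are not contiguous")

    # Nearest unconverted PSVs on each side limit how far the event can extend.
    left = [pos for pos, state in psvs if state is False and pos < first]
    right = [pos for pos, state in psvs if state is False and pos > last]
    left_limit = left[-1] if left else region_start - 1
    right_limit = right[0] if right else region_end + 1

    min_len = last - first + 1               # outermost converted PSVs
    max_len = right_limit - left_limit - 1   # up to, not including, the flanking PSVs
    return min_len, max_len


if __name__ == "__main__":
    example = [(100, False), (250, None), (400, True),
               (900, True), (1500, None), (2100, False)]
    print(tract_bounds(example, region_start=1, region_end=3000))
```

In this toy example the missing PSVs at 250 and 1500 do not constrain the event, so the maximum estimate stretches to the nearest informative unconverted PSVs, which is how missing data widen the interval.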

Examination of the PSV profiles adjacent to the TTTY22-CNV thus provided the minimum and maximum size estimates for the genomic region accompanying the CNV change, as shown in Fig. 3. We then mapped the CNV changes onto the phylogeny constructed using the SNPs in the same sample set (Poznik et al. 2016). We identified 12 different mutation sizes, which could be resolved into 20 mutation events using the phylogeny, since events with indistinguishable sizes sometimes occurred on independent branches of the phylogeny. The largest mutation had a minimum size of > 32 kb.

Since gene conversion and double crossover produce indistinguishable structures (Chen et al. 2007), additional factors have to be considered to distinguish between the two possibilities. It is generally reasoned that crossovers are rare, so the chance of two occurring in close proximity is low, and therefore, structures resulting from exchange of information over lengths of 10 kb or less, which have been abundantly documented on the Y chromosome (Rozen et al. 2003; Bosch et al. 2004; Hurles et al. 2004; Hallast et al. 2013), have been interpreted as resulting from gene conversion. It is notable that gene conversion events of up to 9 kb have been reported on the Y chromosome (Hallast et al. 2013), exceeding the maximum size of ~ 4 kb on other chromosomes (Dumont and Eichler 2013; Trombetta and Cruciani 2017). Nevertheless, the large size of some of the events identified here suggests that reconsideration of the possible involvement of double crossovers is merited. We know of no reported measurement of the frequencies of double crossovers on the human Y chromosome. However, some information about the frequencies of single crossovers is available. Single crossovers between sister chromatids would result in isodicentric or acentric chromosomes, which are both very rare (and evolutionarily lethal), but intra-chromatid single crossovers result in inversions, which are unlikely to affect the phenotype. Such an inversion between IR3_proximal and IR3_distal has been reported (Williams 1998; Repping et al. 2006) and its recurrence inferred within the Y-chromosomal phylogeny (Repping et al. 2006). Although recurrent, only 12 changes in orientation for this inversion were counted by Repping et al. in the phylogeny they sampled, from which they inferred a mutation rate of ≥ 2.3 × 10⁻⁴ per generation.
This phylogeny consisted of a slightly different set of haplogroups from the current study, so we constructed and examined a consensus phylogeny based on the subset of haplogroups in common between the two studies. On this consensus phylogeny, there were eight inversion events and 13 TTTY22-CNV copy number changes (Fig. 2). Thus, TTTY22-CNV mutations are more common than single intra-chromatid crossovers. Double crossover events close together should be considerably rarer than single crossovers. The known breakpoints of single crossovers are also located in a different region of IR3 (Turner et al. 2006, 2008), suggesting that the known crossovers in IR3 are both structurally independent from, and less frequent than, changes to TTTY22-CNV copy number. We, therefore, conclude that double crossovers do not provide a plausible mechanism for TTTY22-CNV copy number change, while gene conversion does, as noted for other Y loci (Hallast et al. 2013).

A distinct and more standard mechanism generating three copies of TTTY22-CNV

The two individuals with three copies of TTTY22-CNV are clustered in the phylogeny and share a common origin (Fig. 5a, b) which, in contrast to all the other TTTY22-CNV variants, cannot be accounted for by a simple gene conversion event between IR3 copies. Fibre-FISH shows the presence of a third copy of TTTY22-CNV and > 100 kb of flanking sequence (the maximum tested by this method) located ~ 230 kb away from one of the reference copies (Fig. 5d). Measurement of read depth in the two individuals carrying three copies identified ~ 420 kb extending from within IR3_proximal into the proximal flanking region with increased read depth, and analysis of the PSVs within this segment showed over-representation of proximal PSVs. For the part of IR3 shown by the fibre-FISH and read depth to be present in three copies, the proximal:distal PSV read depth ratio was 1.7 or 1.9 in the two individuals carrying three copies of TTTY22-CNV, while for the part present in two copies, the ratios were 1.3 and 1.0 (Table S5). Together, these data suggest that copy 3 originated by tandem duplication of a 420 kb region including TTTY22-CNV in IR3_proximal, from a chromosome with two copies (Fig. 5c).
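The PSV read-depth ratios above can be compared with simple expectations. The sketch below assumes that reads are sampled in proportion to copy number and that, in the triplicated part of IR3, two copies carry proximal-type PSV alleles and one carries distal-type alleles; both assumptions, and all names in the code, are made here for illustration. The observed values are those quoted from Table S5.

```python
def expected_ratio(proximal_type_copies: int, distal_type_copies: int) -> float:
    """Expected proximal:distal PSV read-depth ratio under proportional sampling."""
    if distal_type_copies <= 0:
        raise ValueError("need at least one distal-type copy to form a ratio")
    return proximal_type_copies / distal_type_copies


# Expected ratios under the tandem-duplication model (assumption of this sketch).
expected = {
    "part of IR3 present in three copies": expected_ratio(2, 1),
    "part of IR3 present in two copies": expected_ratio(1, 1),
}

# Observed ratios in the two individuals carrying three copies of TTTY22-CNV.
observed = {
    "part of IR3 present in three copies": (1.7, 1.9),
    "part of IR3 present in two copies": (1.3, 1.0),
}

if __name__ == "__main__":
    for segment, exp in expected.items():
        print(f"{segment}: expected {exp:.1f}, observed {observed[segment]}")
```

With low-coverage data some scatter around the expected values is unsurprising, so this comparison is a consistency check rather than a formal test.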

Implications of TTTY22-CNV variation

Men carrying non-reference copy numbers of TTTY22-CNV are likely to have the corresponding numbers of copies of the functional TTTY22 gene, because ~ 15% of this gene that lies outside the CNV is provided by the flanking sequence. Their high frequency, and the large numbers who sometimes share the same mutational event, most marked in an E1b sub-lineage which carries 0 copies and is very common in sub-Saharan Africa, represented 260 times in the current data set (Poznik et al. 2016), demonstrate that copy numbers 0–3 allow male lineage expansion, and are unlikely to be detrimental. Given the long-term persistence of lineages with 0–3 copies of TTTY22-CNV in the population, and the extreme drift experienced by the Y chromosome, this variation is compatible with evolutionary neutrality. Nevertheless, the location of TTTY22 in the CNV suggests that the abundance of this transcript may vary between men. The gene is transcribed primarily in testis, with a lower level in brain (The GTEx Consortium 2017) and further work is needed to investigate whether or not subtle phenotypic consequences of variation in transcript levels can be detected. Men with three copies of TTTY22-CNV carry an additional large duplication of proximal Yp sequences and have in addition three copies of TTTY23, compared with two copies (one in each IR3 repeat) in other men. This duplication has also spread in the population, so also seems unlikely to be detrimental.

Overall, our findings highlight gene conversion as an additional mechanism for generating CNV in the human genome, especially on the Y chromosome, that has previously received little attention. The link between these two processes should be further explored and considered when either is investigated.

Methods

Data sets

We used the existing Y-chromosomal sequence data and CNV calls from phase 3 of the 1000 Genomes Project, where an initial CNV call set had been made using Genome STRiP and calls validated using array-CGH (Poznik et al. 2016). We also generated new 10× Genomics Chromium libraries from two of these samples (HG01097 and NA18953) following the manufacturer’s instructions (https://www.10xgenomics.com/genome/) followed by sequencing on the Illumina HiSeq X platform with 150 bp paired-end reads to a depth of ~ 30×. The sequence data were processed using the LongRanger 1.0 software using the reference sequence GRCh37 and viewed using the Loupe 2.1.0 software from 10x Genomics. All coordinates in this paper are based on GRCh37 to make them compatible with the initial CNV calls (Poznik et al. 2016).

TTTY22-CNV validation

We designed two sets of primers to validate the presence or absence of the 10 kb insert of TTTY22-CNV (dbVar description: https://www.ncbi.nlm.nih.gov/dbvar/variants/esv3818053/#VariantGenome), one set spanning the breakpoint of the empty site and the other lying within the 10 kb insert region. Primers, PCR conditions, and predicted product sizes are described in Table S1.

Molecular combing fibre-FISH experiments were carried out as described previously (Poznik et al. 2016). Probes consisted of two BAC clones (RP11-117N22 and RP11-453C1, obtained from the clone archive resource of Wellcome Trust Sanger Institute) used to identify the genomic region of interest, as well as two custom PCR probes of ~ 5 kb each lying mainly within the insert of TTTY22-CNV (details also in Table S1), which were combined to produce a single probe to distinguish the presence of the insert from its absence. We validated the CNV calls using both PCR and fibre-FISH for the same four samples: NA19146 with 0, HG00096 with 1, NA18953 with 2, and NA19661 with 3 copies. In addition, we applied 10× Genomics Chromium to two samples: HG01097 with 1 and NA18953 with 2 copies of TTTY22-CNV.

Sequence and phylogenetic analyses

We defined blocks A and B within IR3 (https://genome.ucsc.edu/) as being separated by TTTY22-CNV, and measured their similarity between the two copies of IR3 as 99.65% for the ~ 170 kb of block A and 99.13% for the ~ 101 kb of block B.

We placed the TTTY22-CNV copy number of each sample onto the full Y-chromosomal phylogeny based on SNPs (Poznik et al. 2016) to infer the number of mutational events (deletion or duplication). To understand the relation between the TTTY22-CNV and the IR3 inversion events (Repping et al. 2006), we placed both TTTY22-CNV copy number changes and IR3 inversion events onto a simplified phylogeny as described in the main text and compared their phylogenetic locations.

To investigate gene conversion events, we first identified PSVs (Table S3) between the two copies of IR3 from the reference sequence after aligning them using Basic Local Alignment Search Tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and then counted the number of reads covering each allele of each PSV from the BAM files for each individual in the 1000 Genomes Project phase 3 data using a custom Perl script. To allow for the low sequence coverage, we established the following protocol. When six or more reads covering the same PSV were found in a sample, the chance of both IR3 copies being represented was ~ 97%, and so a ≥ 6:0 ratio of PSV alleles was taken to indicate that both IR3 copies carried the same allele, implying that a gene conversion (or double crossover) event had occurred. The length of such mutational events was inferred by extending the analysis to PSVs adjacent to the ≥ 6:0 seed, accepting ≥ 1:0 counts of the same allele as resulting from the same event.
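The ~ 97% figure follows if each read is assumed equally likely to derive from either IR3 copy: the chance that six reads all come from the same copy is 2 × (1/2)⁶, so both copies are represented with probability 1 − 2 × (1/2)⁶ ≈ 0.969. The sketch below computes this and illustrates the seed-and-extend rule. It is not the custom Perl script used in the study; the function names, the input layout, the example read counts, and the choice to stop extension at a PSV with no reads are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (position, reads with the proximal-type allele, reads with the distal-type allele)
PsvCount = Tuple[int, int, int]


def prob_both_copies_sampled(n_reads: int) -> float:
    """Probability that n reads, each equally likely to come from either
    IR3 copy, include at least one read from each copy."""
    return 1.0 - 2.0 * 0.5 ** n_reads


def fixed_allele(prox: int, dist: int, min_reads: int) -> Optional[str]:
    """Name the only allele observed if it has >= min_reads reads and the other has none."""
    if prox >= min_reads and dist == 0:
        return "proximal"
    if dist >= min_reads and prox == 0:
        return "distal"
    return None


def extend_from_seeds(counts: List[PsvCount], seed_reads: int = 6) -> List[Tuple[str, List[int]]]:
    """Find >= seed_reads:0 seeds and extend each to adjacent PSVs showing
    >= 1:0 counts of the same allele. counts must be sorted by position."""
    events: List[Tuple[str, List[int]]] = []
    used = set()
    for i, (_, prox, dist) in enumerate(counts):
        allele = fixed_allele(prox, dist, seed_reads)
        if allele is None or i in used:
            continue
        lo = hi = i
        # Extend leftwards, then rightwards, while the same single allele is seen.
        while lo > 0 and fixed_allele(counts[lo - 1][1], counts[lo - 1][2], 1) == allele:
            lo -= 1
        while hi < len(counts) - 1 and fixed_allele(counts[hi + 1][1], counts[hi + 1][2], 1) == allele:
            hi += 1
        used.update(range(lo, hi + 1))
        events.append((allele, [counts[j][0] for j in range(lo, hi + 1)]))
    return events


if __name__ == "__main__":
    print(round(prob_both_copies_sampled(6), 5))  # 0.96875
    toy = [(10, 3, 2), (20, 2, 0), (30, 7, 0), (40, 1, 0), (50, 0, 0), (60, 4, 5)]
    print(extend_from_seeds(toy))  # [('proximal', [20, 30, 40])]
```

In the toy data, the 7:0 seed at position 30 extends to the 2:0 and 1:0 neighbours and stops at the PSV showing both alleles on one side and the uncovered PSV on the other.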

Data access

All data on Y-chromosomal variation from the 1000 Genomes Project phase 3 are freely available: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. PSV read coverage is reported in Table S3.

References

Bosch E, Hurles ME, Navarro A, Jobling MA (2004) Dynamics of a human interparalog gene conversion hotspot. Genome Res 14:835–844. https://doi.org/10.1101/gr.2177404
Carvalho CM, Lupski JR (2016) Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet 17:224–238. https://doi.org/10.1038/nrg.2015.25
Chen JM, Cooper DN, Chuzhanova N, Ferec C, Patrinos GP (2007) Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet 8:762–775. https://doi.org/10.1038/nrg2193
Dumont BL, Eichler EE (2013) Signals of historical interlocus gene conversion in human segmental duplications. PLoS One 8:e75949. https://doi.org/10.1371/journal.pone.0075949
Hallast P, Balaresque P, Bowden GR, Ballereau S, Jobling MA (2013) Recombination dynamics of a human Y-chromosomal palindrome: rapid GC-biased gene conversion, multi-kilobase conversion tracts, and rare inversions. PLoS Genet 9:e1003666. https://doi.org/10.1371/journal.pgen.1003666
Hastings PJ, Lupski JR, Rosenberg SM, Ira G (2009) Mechanisms of change in gene copy number. Nat Rev Genet 10:551–564. https://doi.org/10.1038/nrg2593
Hurles ME, Willey D, Matthews L, Hussain SS (2004) Origins of chromosomal rearrangement hotspots in the human genome: evidence from the AZFa deletion hotspots. Genome Biol 5:R55. https://doi.org/10.1186/gb-2004-5-8-r55
Jobling MA (2008) Copy number variation on the human Y chromosome. Cytogenet Genome Res 123:253–262. https://doi.org/10.1159/000184715
Jobling MA, Tyler-Smith C (2017) Human Y-chromosome variation in the genome-sequencing era. Nat Rev Genet Online. https://doi.org/10.1038/nrg.2017.36
Lupski JR (2015) Structural variation mutagenesis of the human genome: impact on disease and evolution. Environ Mol Mutagen 56:419–436. https://doi.org/10.1002/em.21943
Massaia A, Xue Y (2017) Human Y chromosome copy number variation in the next generation sequencing era and beyond. Hum Genet 136:591–603
Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, Ayub Q, McCarthy SA, Narechania A, Kashin S, Chen Y, Banerjee R, Rodriguez-Flores JL, Cerezo M, Shao H, Gymrek M, Malhotra A, Louzada S, Desalle R, Ritchie GR, Cerveira E, Fitzgerald TW, Garrison E, Marcketta A, Mittelman D, Romanovitch M, Zhang C, Zheng-Bradley X, Abecasis GR, McCarroll SA, Flicek P, Underhill PA, Coin L, Zerbino DR, Yang F, Lee C, Clarke L, Auton A, Erlich Y, Handsaker RE, Genomes Project C, Bustamante CD, Tyler-Smith C (2016) Punctuated bursts in human male demography inferred from 1244 worldwide Y-chromosome sequences. Nat Genet 48:593–599. https://doi.org/10.1038/ng.3559
Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD, Pyntikova T, van der Veen F, Skaletsky H, Page DC, Rozen S (2006) High mutation rates have driven extensive structural polymorphism among human Y chromosomes. Nat Genet 38:463–467. https://doi.org/10.1038/ng1754
Rozen S, Skaletsky H, Marszalek JD, Minx PJ, Cordum HS, Waterston RH, Wilson RK, Page DC (2003) Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423:873–876. https://doi.org/10.1038/nature01723
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, Tin-Wollam A, Yang SP, Waterston RH, Wilson RK, Rozen S, Page DC (2003) The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423:825–837. https://doi.org/10.1038/nature01722
Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, Konkel MK, Malhotra A, Stutz AM, Shi X, Paolo Casale F, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJ, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HY, Jasmine Mu X, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, Genomes Project C; Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO (2015) An integrated map of structural variation in 2504 human genomes. Nature 526:75–81. https://doi.org/10.1038/nature15394
Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, Coe BP, Baker C, Nordenfelt S, Bamshad M, Jorde LB, Posukh OL, Sahakyan H, Watkins WS, Yepiskoposyan L, Abdullah MS, Bravi CM, Capelli C, Hervig T, Wee JT, Tyler-Smith C, van Driem G, Romero IG, Jha AR, Karachanak-Yankova S, Toncheva D, Comas D, Henn B, Kivisild T, Ruiz-Linares A, Sajantila A, Metspalu E, Parik J, Villems R, Starikovskaya EB, Ayodo G, Beall CM, Di Rienzo A, Hammer MF, Khusainova R, Khusnutdinova E, Klitz W, Winkler C, Labuda D, Metspalu M, Tishkoff SA, Dryomov S, Sukernik R, Patterson N, Reich D, Eichler EE (2015) Global diversity, population stratification, and selection of human copy-number variation. Science 349:aab3761. https://doi.org/10.1126/science.aab3761
Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW (1983) The double-strand-break repair model for recombination. Cell 33:25–35
The GTEx Consortium (2017) Genetic effects on gene expression across human tissues. Nature 550:201–213. https://doi.org/10.1038/nature24277
Trombetta B, Cruciani F (2017) Y chromosome palindromes and gene conversion. Hum Genet 136:605–619. https://doi.org/10.1007/s00439-017-1777-8
Turner DJ, Shendure J, Porreca G, Church G, Green P, Tyler-Smith C, Hurles ME (2006) Assaying chromosomal inversions by single-molecule haplotyping. Nat Methods 3:439–445. https://doi.org/10.1038/nmeth881
Turner DJ, Tyler-Smith C, Hurles ME (2008) Long-range, high-throughput haplotype determination via haplotype-fusion PCR and ligation haplotyping. Nucleic Acids Res 36:e82. https://doi.org/10.1093/nar/gkn373
Wang J, Fan HC, Behr B, Quake SR (2012) Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150:402–412
Williams G (1998) Mapping studies of the centromeric region of the human Y chromosome. D.Phil. Thesis, Oxford University
Zheng GX, Lau BT, Schnall-Levin M, Jarosz M, Bell JM, Hindson CM, Kyriazopoulou-Panagiotopoulou S, Masquelier DA, Merrill L, Terry JM, Mudivarti PA, Wyatt PW, Bharadwaj R, Makarewicz AJ, Li Y, Belgrader P, Price AD, Lowe AJ, Marks P, Vurens GM, Hardenbol P, Montesclaros L, Luo M, Greenfield L, Wong A, Birch DE, Short SW, Bjornson KP, Patel P, Hopmans ES, Wood C, Kaur S, Lockwood GK, Stafford D, Delaney JP, Wu I, Ordonez HS, Grimes SM, Greer S, Lee JY, Belhocine K, Giorda KM, Heaton WH, McDermott GP, Bent ZW, Meschi F, Kondov NO, Wilson R, Bernate JA, Gauby S, Kindwall A, Bermejo C, Fehr AN, Chan A, Saxonov S, Ness KD, Hindson BJ, Ji HP (2016) Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol 34:303–311. https://doi.org/10.1038/nbt.3432

Acknowledgements

We thank Ed Hollox for comments. Our work was supported by The Wellcome Trust (098051); W.S. was also supported by the State Scholarship fund (No. 201606940004) of the China Scholarship Council, and a National Natural Science Foundation of China grant (No. 31201029); and P.H. was supported by Estonian Research Council Grant PUT1036.

Author information

Qasim Ayub
Present address: School of Science, Monash University Malaysia, Jalan Lagoon Selantan, Bandar Sunway, 47500, Subang Jaya, Selangor Darul Ehsan, Malaysia
Wentao Shi, Andrea Massaia, and Sandra Louzada contributed equally to this work.

Authors and Affiliations

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
Wentao Shi, Andrea Massaia, Sandra Louzada, Ruby Banerjee, Pille Hallast, Yuan Chen, Anders Bergström, Yong Gu, Steven Leonard, Michael A. Quail, Qasim Ayub, Fengtang Yang, Chris Tyler-Smith & Yali Xue
Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 30070, China
Wentao Shi
National Heart and Lung Institute, Imperial College London, London, SW7 2AZ, UK
Andrea Massaia
Institute of Molecular and Cell Biology, University of Tartu, Tartu, 51010, Estonia
Pille Hallast

Corresponding authors

Correspondence to Chris Tyler-Smith or Yali Xue.

Ethics declarations

Ethical approval

Cell lines and DNA samples were obtained from the Coriell Institute for Medical Research (Camden, New Jersey, USA) following their ethical procedures (https://catalog.coriell.org/1/Support/FirstOrder).

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

439_2017_1857_MOESM1_ESM.pdf

Fig. S1 Combined fibre-FISH images for HG00096 (1 copy of TTTY22-CNV), NA19146 (0 copies), NA18953 (2 copies), and NA19661 (3 copies). Probes and colors are the same as in Fig. 1 (PDF 1154 kb)

439_2017_1857_MOESM2_ESM.pdf

Fig. S2 10x Genomics Chromium linked-read sequence data plots for samples with one copy and two copies of TTTY22-CNV. Each plot shows the relative density of cross-linked barcodes (white, yellow = low; orange, red = high), and the read density (green, along each axis). Left-hand panels, sample with two copies of TTTY22-CNV (NA18953); right-hand panels, sample with one copy of TTTY22-CNV (HG01097). Upper panels, part of IR3_proximal containing TTTY22-CNV and flanking sequences; lower panels, part of IR3_distal containing the empty TTTY22-CNV site and flanking sequences. The sample with one copy shows a uniform distribution of barcodes and read depth in both plots, because its structure matches the reference sequence (used for mapping reads). In contrast, the sample with two copies shows an increase in read depth and shared barcodes in IR3_proximal, because all spanning molecules map to this region, and corresponding decreases in barcode sharing in IR3_distal (PDF 1489 kb)

439_2017_1857_MOESM3_ESM.xlsx

Table S1 Primer sequences and PCR conditions used for validation of the TTTY22-CNV copy number (upper) and custom probe production for fibre-FISH (lower). (XLSX 11 kb)

439_2017_1857_MOESM4_ESM.xlsx

Table S2 Samples included in this analysis, with a summary of the published TTTY22-CNV copy number, population of origin, and Y-chromosomal haplogroup (ISOGG 31 December 2013 version) as used in Poznik et al. (2016) (XLSX 46 kb)

439_2017_1857_MOESM5_ESM.xlsx

Table S3 Number of reads covering each allele of each IR3 PSV in each sample. Cell color indicates read number, with darker colors showing more reads; separate palettes are used for the different blocks illustrated in Fig. 1. Separate sheets show all samples, and then the same information for samples classified according to TTTY22-CNV copy number (XLSX 3147 kb)

439_2017_1857_MOESM6_ESM.xlsx

Table S4 Gene conversion events between the two copies/arms of IR3 inferred from the number of reads covering each PSV (Table S3) (XLSX 375 kb)

439_2017_1857_MOESM7_ESM.xlsx

Table S5 Summary of number of reads covering IR3 PSVs in the two samples (NA19661 and NA19685) with three copies of TTTY22-CNV. Left: raw counts. Right: sum of reads for the region present in three copies (PSV-A3:A2 + PSV-B1:B129) and the region present in two copies (PSV-A4:A36). Note the increased ratio of proximal PSVs for the region present in samples with three copies, indicating partial duplication of IR3_proximal (XLSX 28 kb)

Table S6 PSV patterns in samples with one copy of TTTY22-CNV (XLSX 140 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Shi, W., Massaia, A., Louzada, S. et al. Copy number variation arising from gene conversion on the human Y chromosome. Hum Genet 137, 73–83 (2018). https://doi.org/10.1007/s00439-017-1857-9

Received: 26 September 2017
Accepted: 28 November 2017
Published: 05 December 2017
Issue Date: January 2018
DOI: https://doi.org/10.1007/s00439-017-1857-9

Introduction