An effect of large-scale deletions and duplications on transcript expression

Mielczarek, Magda; Frąszczak, Magdalena; Zielak-Steciwko, Anna E.; Nowak, Błażej; Hofman, Bartłomiej; Pierścińska, Jagoda; Kruszyński, Wojciech; Szyda, Joanna

doi:10.1007/s10142-022-00946-5

An effect of large-scale deletions and duplications on transcript expression

Original Article
Open access
Published: 23 December 2022

Volume 23, article number 19, (2023)
Cite this article

Download PDF

You have full access to this open access article

Functional & Integrative Genomics Aims and scope Submit manuscript

An effect of large-scale deletions and duplications on transcript expression

Download PDF

1575 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

Since copy number variants (CNVs) have been recognized as an important source of genetic and transcriptomic variation, we aimed to characterize the impact of CNVs located within coding, intergenic, upstream, and downstream gene regions on the expression of transcripts. Regions in which deletions occurred most often were introns, while duplications in coding regions. The transcript expression was lower for deleted coding (P = 0.008) and intronic regions (P = 1.355 × 10⁻¹⁰), but it was not changed in the case of upstream and downstream gene regions (P = 0.085). Moreover, the expression was decreased if duplication occurred in the coding region (P = 8.318 × 10⁻⁵). Furthermore, a negative correlation (r = − 0.27) between transcript length and its expression was observed. The correlation between the percent of deleted/duplicated transcript and transcript expression level was not significant for all concerned genomic regions in five out of six animals. The exceptions were deletions in coding regions (P = 0.004) and duplications in introns (P = 0.01) in one individual. CNVs in coding (deletions, duplications) and intronic (deletions) regions are important modulators of transcripts by reducing their expression level. We hypothesize that deletions imply severe consequences by interrupting genes. The negative correlation between the size of the transcript and its expression level found in this study is consistent with the hypothesis that selection favours shorter introns and a moderate number of exons in highly expressed genes. This may explain the transcript expression reduction by duplications. We did not find the correlation between the size of deletions/duplications and transcript expression level suggesting that expression is modulated by CNVs regardless of their size.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Copy number variation (CNV) has been recognized as an important source of genetic variation (Gamazon et al. 2011; Jiang et al. 2014). Many studies were conducted in mammals, particularly in humans and rodents, to link structural variation to phenotypic variation and disease susceptibility (Stankiewicz and Lupski 2010; Yalcin et al. 2011). Potentially, CNVs exhibit a high impact on phenotypes, by changing gene structure and dosage, altering gene regulation, and exposing recessive alleles (Mills et al. 2011; Geistlinger et al. 2018). In pigs, it has been found that genes related to olfaction and neurological processes (Paudel et al. 2013, 2015), coat colour (Giuffra et al. 2002), diseases (e.g. umbilical hernia; (Long et al. 2016)), and production performance (Jiang et al. 2014; Xie et al. 2016) are enriched in CNVs. However, compared to humans and model organisms, relatively few studies of CNVs have been carried out in pigs. The Database of Genomic Variants (DGVa, www.ebi.ac.uk/dgva) comprises CNVs from only four studies: nstd24 identifying 37 variants (Fadista et al. 2008), nstd44 reporting 49 variants (Ramayo-Caldas et al. 2010), nstd138 with 223,216 variants (Li et al. 2017), and nstd117 with 737 variants (Long et al. 2016). During the last few years, CNVs in swine genomes were mostly detected using aCGH approach (Fadista et al. 2008; Wang et al. 2014) or SNP arrays (Ramayo-Caldas et al. 2010; Wang et al. 2012, 2013; Chen et al. 2018). Nevertheless, the development of next-generation sequencing (NGS) technology during the past decade has created new, more accurate tools of CNV detection at a base-pair resolution (Rubin et al. 2012; Paudel et al. 2013; Esteve-Codina et al. 2013; Liu et al. 2018).

Since CNVs play an important role in genomic studies, and pigs are one of the most economically important livestock species worldwide as well as a model organism for many human diseases (Meurens et al. 2012; Wang et al. 2014), the understanding of how CNVs act on transcript expression is essential. The availability of whole-genome and whole-transcriptome sequences allows for genome-wide identification of polymorphic variants and transcript expression levels. This project aims to characterize the impact of CNVs on the expression on transcript-level resolution.

Results

CNV calling pipeline and genomic annotation results

From 93,45 to 93,84 (DNA-seq) and from 87,15 to 91,45 (RNA-seq) percent of reads survived the cleaning procedure (details in the supplementary material S1, S3 and S4). The percentage of reads aligned to the reference genome for the six Polish Landrace boars was very similar and ranged between 98.16 and 98.49% with the percent of properly paired reads ranging between 93.99 and 95.15% (supplementary material S2). The average genome coverage after alignment varied from 17 to 21. The total number of deletions per individual ranged between 551 and 730, and duplications from 619 to 693. The length of deletions varied from 300 to 530,600 bp with the average of 3939 ± 10,976 bp and of duplications from 900 to 346,200 bp with the average duplication being 12,377 ± 20,154 bp long. Deletions occurred most often in introns (465–756 deletions per individual), followed by intergenic regions (235–312), coding regions (156–259), and up-/downstream genes regions (123–194) (Fig. 1a). Duplications were most common in coding regions (463–591) and introns (196–304) followed by intergenic (169–297) and upstream/downstream gene regions (173–246) (Fig. 1b).

CNV impact on gene expression

Random transcripts were selected to check if decreased/increased expression is not caused by the animal effect. The average expression of a random transcript that does not contain a CNV in 3 out of 6 animals was equal to the average expression of a random transcript without a CNV in the other 3 out of 6 animals (P = 0.456). While comparing mean transcript expression between individuals carrying CNV and those not carrying CNV in the particular transcript, we observed that transcript expression was significantly lower when a part of a coding region (P = 0.008) or an intron (P = 1.355 × 10⁻¹⁰) was deleted. Regions located upstream or downstream genes were not significantly altered by deletions (P = 0.085). For none region, the expression was increased by duplication (P = 0.990 for coding, P = 0.846 for intronic, and P = 0.872 for upstream and downstream gene regions). On the other hand, duplications of coding regions significantly decreased their expression levels (P = 8.318 × 10⁻⁵), while having no effect on expression when occurring in other regions. Since expression was decreased by duplication, we further explored the phenomenon of an effect of transcript length on its expression level based on longissimus dorsi muscle transcriptomic data of 143 sows (PRJNA403969) (Velez-Irizarry et al. 2019). There was a negative correlation between the length of the transcript and the transcript expression level (the correlation coefficient r = − 0.27, P = 0). The level of expression decreases exponentially with increasing transcript length (Fig. 2).

No consistent pattern was found regarding the correlation between the percent of deleted or duplicated transcript and transcript expression level (expressed as a difference between average expression in animals with CNV and animals with unaltered sequence). The correlation was not significant for all concerned genomic regions in five out of six animals except deletions in coding regions (P = 0.004) and duplications in introns (P = 0.01) for sample number 5. All P-values were presented in the Fig. 3.

Discussion

CNVs modulate transcript expression

CNVs contribute to the variation of gene expression in multiple ways including gene dosage effects, gene coding region disruption and deletion, or duplication of regulatory sequences (Stranger et al. 2007). It has been shown that CNVs alter not only gene expressions but also the timing of expression throughout the life of an organism (Chaignat et al. 2011). In humans, a positive correlation between CNV and gene expression in gastric cancer was found (Cheng et al. 2012). Moreover, it has been reported that CNVs induce gene expression changes in breast tumour tissues (Kumaran et al. 2017). Copy number loss and corresponding downregulation of gene expression was also found by Zhao and Zhao (2016) in human tumour suppressor genes. Moreover, considering particular genes, deletions decreased their expression resulting in disease phenotypes, e.g., retinitis pigmentosa (deletion of an intron of the PRPF31 gene) (Ruberto et al. 2021), DiGeorge syndrome (deletions within several genes) (Sellier et al. 2014), or Fanconi anaemia (deletion in exons of the FANCA gene) (Tischkowitz et al. 2004). Results confirming downregulation of gene expression by deletions are in line with observations made in present study on pigs. We hypothesize that deletions imply severe consequences by interrupting genes not only in coding but also in intronic regions. The latter may be explained by the fact that introns modify the expression level of their host gene in many different ways (Chorev and Carmel 2012) and for variety of genes important sequences that regulate expression are situated not in the promoter but rather are located within introns (Rose 2018).

Regarding duplications of coding regions, we observed a significant reduction of transcription expression. This is consistent with Qian et al. (2010) who reported that gene expression in yeast and mammals showed a substantial decrease after duplication. The authors explained that most of the reductions are neutral in the context of expression, but some are beneficial to rebalancing gene dosage after duplication. The dosage rebalancing hypothesis was also supported by Rogozin (2014), who stated that gene copy-number variations gains had a positive or negative dosage effect depending on the tissue type or environmental conditions. It means that balancing of positive and negative dosage effects is an essential factor causing diversification of expression patterns (rebalancing of expression) of duplicated genes in the course of fixation of gene duplications. Since expression was decreased by duplication, we also explored the effect of transcript length on its expression level. Our results demonstrate negative correlation between transcript length and its expression. High expression of short genes and lower expression of longer genes were reported in the literature. For example, to minimize the cost of transcription and other molecular processes such as splicing, natural selection may favour shorter introns in highly expressed genes (Castillo-Davis et al. 2002; Urrutia and Hurst 2003). Furthermore, highly expressed genes contain a moderate number of exons and produce shorter mRNAs with shorter 3′-UTRs (Chiaromonte et al. 2003). Moreover, Brown (2021) explained that transcription is a time-dependent process and it is expected that gene expression will be inversely related to gene length. This was confirmed by the author since high expression was found only among short, not among long genes, but at the same time, it was emphasized that the level of a gene’s expression can be affected by the tissue where the gene is transcribed. The complex analysis of genes length was also extensively investigated by Lopes et al. (2021). In the context of gene expression, the author stated that for shorter genes, their association with high levels of expression is not entirely correct and there is great variability of expression values among them. Instead, it was hypothesized that longer genes tend to be associated with functions important in the early development stages, while functions of shorter genes are important throughout the whole life (e.g. related to the immune system).

In the context of livestock, Park et al. (2014) did not observe a significant impact of copy number deletions on gene expression in race horses, but Lee et al. (2021) suggested that CNV can alter gene expression and affect multiple economically important traits in cattle. Moreover, Geistlinger et al. (2018) also pointed out the role of CNVs in the modulation of bovine gene expression by observing the effect of gene dosage in several immune genes and genes involved in major skeletal muscle pathways. The authors also concluded that CNV-associated expression may be manifested at the phenotypic level. In pigs, the ear size (one of the conformation traits that distinguish breeds) is modified by the 38.7-kb long CNV in the MSRB3 gene (Chen et al. 2018).

In relation to intergenic variants, little is known about their biological function in the genome (Antúnez-Ortiz et al. 2017). According to Chaignat et al. (2011), gene expression is a complex process depending on many factors including CNVs, and they were shown not only to modify the expression of genes that map within them but also located on their flanks and sometimes those at a great distance from their boundaries. They may even have effects on the gene whose distance can be as far as 1 Mb away by interacting with the functionality of the regulatory region like disrupting the interaction of the transcriptional unit with the promoter unit or by a change in the chromatin structure (Stranger et al. 2007; Klopocki and Mundlos 2011). For example, in the long-tailed macaque genome, a popular animal model in biomedical research, the 13-kb copy number loss of the intergenic region on chromosome 19 modulated the expression levels of neighbouring genes providing a selective advantage to environmental conditions. The results were significant for the kidney and the heart but not for the liver, lung, and spleen tissues. What is interesting, deletion affects expression of 45 genes that are encoded up to 1 Mb upstream and downstream of this CNV (Heckel et al. 2015). Although specific CNVs affecting expression upstream and downstream of these CNVs are known, it is worth to remember that intergenic CNVs are likely to better reflect neutrality than gene-containing CNVs (Perry 2008). This is in line with the results of our study where we investigated the overall contribution of intergenic CNVs in transcript expression modification which was not significantly changed by them.

Conclusions

In the presented study, we investigated CNV’s impact on transcript expression and observed that deletion of coding and intronic regions reduce expression. We hypothesize that deletions imply severe consequences by interrupting genes. The significant decreasing impact of gene sequence duplication was also confirmed by the analysis of our data. Moreover, we estimated the negative correlation coefficient between the size of the transcript and its expression level. It is consistent with the hypothesis that selection favours shorter introns and a moderate number of exons in highly expressed genes. Nevertheless, we did not find a correlation between the size of deletions/duplications and transcript expression level suggesting expression is modulated by CNVs regardless of polymorphisms size.

Materials and methods

Material

The dataset comprised whole genome DNA and RNA samples of six Polish Landrace boars from longissimus dorsi muscle. For whole genome sequencing, 100 ng of DNA was used to prepare the library with a TruSeq DNA Nano (Illumina) Kit. The total number of 150-bp-long reads varied from 328,516,078 to 404,898,834. The average genome coverage ranged from 20 to 24 × per sample. In the case of transcriptomic data, libraries were produced from 50 ng of RNA fragments using KAPA RNA HyperPrep (HMR) (Roche) Library Preparation Kit. Sequencing yielded libraries with an average size of 297 million reads per sample (from 241,605,646 to 337,017,230 reads), and the read length was 100 bp. Both, DNA- and RNA-seq data were generated using Illumina NovaSeq 6000 in the paired-end module. The exact number of reads per sample can be found in the supplementary material S1. Genomic and transcriptomic data of six Polish Landrace boars are deposited in the NCBI Sequence Read Archive with BioProject Number PRJNA804745. Moreover, we used transcriptomic data of 143 females (PRJNA403969), generated using Illumina Hiseq 2500 from longissimus dorsi muscle (Velez-Irizarry et al. 2019). The whole swine transcriptome required to perform transcript expression quantification was taken from the Ensembl database release 104 (Howe et al. 2021). The animal research ethics committee approval was not required since meat samples were taken immediately after the slaughter in the slaughterhouse.

CNV detection and expression quantification pipelines

Two bioinformatics pipelines were designed. (1) The CNV detection pipeline applied for DNA-seq data consisted of (i) quality control and data filtering, (ii) alignment of reads to the reference genome, (iii) quality control and data processing after alignment, (iv) CNV detection and filtration, and (v) CNV genomic annotation. The quality control was performed using the FastQC (bioinformatics.babraham.ac.uk/projects/fastqc) and MultiQC programs (Ewels et al. 2016). The Trimmomatic (Bolger et al. 2014) software was used to trim poor quality sequences by scanning each read with a 4-base sliding window and cutting them when the average quality per base dropped below 20 (the “SLIDINGWINDOW:4:20” option). The minimum length of read after trimming was set as 60 bp (“MINLEN:60”). The alignment to the Sscrofa11.1 (the assembly version: https://ftp.ensembl.org/pub/release-102/fasta/sus_scrofa/dna/) reference genome was performed by the BWA-MEM software (Li and Durbin 2009) with default parameters. Standard, post-alignment processes included sorting and indexing of reads; PCR duplicates removing and quality control with generating an alignment report. The average genome coverage was calculated using bedtools (Quinlan and Hall 2010). Two CNV detection tools, CNVnator (Abyzov et al. 2011) and Pindel (Ye et al. 2009), were used for variant identification. The algorithm implemented into the CNVnator software is based on the comparison of genome coverage (read-depth, RD) and assumes that regions with coverage different than the genome average correspond to CNVs (Medvedev et al. 2010). The size of the comparison-bin was set to 100 bp, which according to Abyzov et al. (2011), is recommended for samples with the approximate coverage of 20 × , as it was the case in our data. As a consequence, the CNVnator software provided a detection resolution of 200 bp of upstream and downstream CNV breakpoint positions. The split-read (SR) approach, implemented into the Pindel software, identifies CNVs by searching for read pairs in which one read from a pair is aligned to the reference genome at a unique position and its mate read is aligned only partially or is aligned to different regions of the genome. The raw outputs of both programs were edited by discarding variants outside the length range 50–1,000,000 bp. The CNV selection was performed by determining variants overlapping between both programs. The length-edited output of the CNVnator was used as a baseline dataset, which was validated by the length-edited output of the Pindel to identify overlapping regions. Each variant detected by the CNVnator, which was fully covered by Pindel (± 100 bp), was classified as validated. Moreover, to exclude false positive variants being a consequence of artefacts of the reference genome, CNVs overlapping with gaps in the reference genome were filtered out. Genomic annotation was carried out using the Variant Effect Predictor software (McLaren et al. 2016) to identify CNV located within genes or upstream and downstream gene regions up to 5000 bp. The upstream variant relates to the variant located upstream of the transcript start site (TSS) and the downstream variant to the downstream of the transcript termination site (TTS). If deletion/duplication overlapped more than one region (e.g., exon and intron) of the same transcript, it was not considered in this study.

(2) The expression quantification pipeline for transcriptomic data covered:

(i) quality control and data filtering; (ii) transcript quantification and expression normalization. Based on the FastQC report, we set the parameters for data filtering. The Trimmomatic software was used for filtering of low-quality data in the same way as described in the DNA-seq based pipeline above. Additionally, Illumina adapters were removed (“ILLUMINACLIP”). Transcript quantification and expression normalization were done by the Kallisto software (Bray et al. 2016) resulting in the expression expressed as TPM (transcripts per million).

Statistical analysis

Statistical analysis was performed based on the validated CNV dataset, separately for duplications and deletions. The Kolmogorov–Lilliefors test was used to test the null hypothesis that expression data comes from a normally distributed population. The null hypothesis that expression of transcripts overlapping with CNV is equal to unaltered ones was tested using the Wilcoxon-signed rank test:

$$\mathrm W=\frac12\sum\limits_{\mathrm i=1}^{\mathrm n}\overline{\mathrm r}\left({\mathrm D}_{\mathrm i}\right)+\frac{\mathrm n(\mathrm n+1)}4$$

where $\overline{\mathrm r}\left({\mathrm D}_{\mathrm i}\right)$ denotes the rank of absolute values $\left|{\mathrm D}_{\mathrm i}\right|=\vert{\mathrm Y}_{\mathrm i}-{\mathrm X}_{\mathrm i}\vert$ with sign corresponding to the sign of the difference (${\mathrm Y}_{\mathrm i}-{\mathrm X}_{\mathrm i})$. The $\mathrm n$ is the number of transcripts overlapping deletions/duplications for at least one and not more than five (out of six) considered animals. ${\mathrm X}_{\mathrm i}$ denotes the mean TPM value for $\mathrm i$ th transcript for pigs with deletion/duplication identified as a part of this transcript, while ${\mathrm Y}_{\mathrm i}$ represented the average expression in individuals with the unaltered $\mathrm i$ th transcript. The tests were performed separately for coding, intronic, and upstream/downstream gene regions. Additionally, for each individual randomly located transcripts non-overlapping with CNVs were chosen. Since each transcript was represented by six TPM values (one per animal), we randomly divided these values into two groups and the mean expression for each group was calculated. To check if the lower/higher expression is not an effect of animal, the hypothesis that there is no difference in expression level between those two groups was tested. The whole procedure was repeated 500 times.

The Spearman rank correlation between the percentage of transcript covered by CNV and the transcript expression level was estimated and tested for being different than zero, what denotes independence:

$$\mathrm T={\mathrm R}_{\mathrm S}\sqrt{\frac{\mathrm n-2}{1-\mathrm R_{\mathrm S}^2},}$$

where,

$${\mathrm R}_{\mathrm S}=1-\frac{6\sum_{\mathrm i=1}^{\mathrm n}{({\mathrm R}_{\mathrm i}-{\mathrm S}_{\mathrm i})}^2}{\mathrm n(\mathrm n^2-1)},$$

${\mathrm R}_{\mathrm i}$ denotes the ranks of the percentage of transcript overlapping with deletion/duplication and ${\mathrm S}_{\mathrm i}$ denotes the ranks of the difference between mean TPM expression for a given transcript in a group with CNV overlapping transcript and in a group with an unaltered sequence. The null hypothesis of the test can be approximated by the t-Student distribution with $\left(\mathrm n-2\right)$ degrees of freedom. Additionally, the Spearman correlation coefficient between transcript length and the transcript expression level was estimated based on transcriptomic data of 143 individuals. The significance of the correlation coefficient was tested, using the same test as described above. The R package (R Core Team 2022) was used to perform computations and create graphics.

Data availability

Genomic and transcriptomic data of six Polish Landrace boars are deposited in the NCBI Sequence Read Archive with BioProject Number PRJNA804745. Transcriptomic data of the 143 individuals can be found under PRJNA403969 identification number.

References

Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21:974–984. https://doi.org/10.1101/gr.114876.110
Article CAS PubMed PubMed Central Google Scholar
Antúnez-Ortiz DL, Flores-Alfaro E, Burguete-García AI et al (2017) Copy number variations in candidate genes and intergenic regions affect body mass index and abdominal obesity in Mexican children. Biomed Res Int 2017:2432957. https://doi.org/10.1155/2017/2432957
Article CAS PubMed PubMed Central Google Scholar
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. https://doi.org/10.1093/bioinformatics/btu170
Article CAS PubMed PubMed Central Google Scholar
Bray NL, Pimentel H, Melsted P, Pachter L (2016) Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34:525–527. https://doi.org/10.1038/nbt.3519
Article CAS PubMed Google Scholar
Brown JC (2021) Role of gene length in control of human gene expression: chromosome-specific and tissue-specific effects. Int J Genomics 2021:8902428. https://doi.org/10.1155/2021/8902428
Article CAS PubMed PubMed Central Google Scholar
Castillo-Davis CI, Mekhedov SL, Hartl DL et al (2002) Selection for short introns in highly expressed genes. Nat Genet 31:415–418. https://doi.org/10.1038/ng940
Article CAS PubMed Google Scholar
Chaignat E, Yahya-Graison EA, Henrichsen CN et al (2011) Copy number variation modifies expression time courses. Genome Res 21:106–113. https://doi.org/10.1101/gr.112748.110
Article CAS PubMed PubMed Central Google Scholar
Chen C, Liu C, Xiong X et al (2018) Copy number variation in the MSRB3 gene enlarges porcine ear size through a mechanism involving miR-584-5p. Genet Sel Evol 50:72. https://doi.org/10.1186/s12711-018-0442-6
Article CAS PubMed PubMed Central Google Scholar
Cheng L, Wang P, Yang S et al (2012) Identification of genes with a correlation between copy number and expression in gastric cancer. BMC Med Genomics 5:14. https://doi.org/10.1186/1755-8794-5-14
Article PubMed PubMed Central Google Scholar
Chiaromonte F, Miller W, Bouhassira EE (2003) Gene length and proximity to neighbors affect genome-wide expression levels. Genome Res 13:2602–2608. https://doi.org/10.1101/gr.1169203
Article CAS PubMed PubMed Central Google Scholar
Chorev M, Carmel L (2012) The Function of Introns. Front Genet 3:55. https://doi.org/10.3389/fgene.2012.00055
Article PubMed PubMed Central Google Scholar
Esteve-Codina A, Paudel Y, Ferretti L et al (2013) Dissecting structural and nucleotide genome-wide variation in inbred Iberian pigs. BMC Genomics 14:148. https://doi.org/10.1186/1471-2164-14-148
Article CAS PubMed PubMed Central Google Scholar
Ewels P, Magnusson M, Lundin S, Käller M (2016) MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32:3047–3048. https://doi.org/10.1093/bioinformatics/btw354
Article CAS PubMed PubMed Central Google Scholar
Fadista J, Nygaard M, Holm L-E et al (2008) A snapshot of CNVs in the pig genome. PLoS One 3:e3916
Article PubMed PubMed Central Google Scholar
Gamazon ER, Nicolae DL, Cox NJ (2011) A study of CNVs as trait-associated polymorphisms and as expression quantitative trait loci. PLoS Genet 7:e1001292–e1001292. https://doi.org/10.1371/journal.pgen.1001292
Article CAS PubMed PubMed Central Google Scholar
Geistlinger L, da Silva VH, Cesar ASM et al (2018) Widespread modulation of gene expression by copy number variation in skeletal muscle. Sci Rep 8:1399. https://doi.org/10.1038/s41598-018-19782-4
Article CAS PubMed PubMed Central Google Scholar
Giuffra E, Törnsten A, Marklund S et al (2002) A large duplication associated with dominant white color in pigs originated by homologous recombination between LINE elements flanking KIT. Mamm Genome 13:569–577. https://doi.org/10.1007/s00335-002-2184-5
Article CAS PubMed Google Scholar
Heckel T, Singh A, Gschwind A, et al (2015) Chapter 4 — Genetic variations in the Macaca fascicularis genome related to biomedical research. In: Bluemel J, Korte S, Schenck E, Weinbauer GFBT-TNP in NDD and SA (eds). Academic Press, San Diego, pp 53–64
Howe KL, Achuthan P, Allen J et al (2021) Ensembl 2021. Nucleic Acids Res 49:D884–D891. https://doi.org/10.1093/nar/gkaa942
Article CAS PubMed Google Scholar
Jiang J, Wang J, Wang H et al (2014) Global copy number analyses by next generation sequencing provide insight into pig genome variation. BMC Genomics 15:593. https://doi.org/10.1186/1471-2164-15-593
Article PubMed PubMed Central Google Scholar
Klopocki E, Mundlos S (2011) Copy-number variations, noncoding sequences, and human phenotypes. Annu Rev Genomics Hum Genet 12:53–72. https://doi.org/10.1146/annurev-genom-082410-101404
Article CAS PubMed Google Scholar
Kumaran M, Cass CE, Graham K et al (2017) Germline copy number variations are associated with breast cancer risk and prognosis. Sci Rep 7:14621. https://doi.org/10.1038/s41598-017-14799-7
Article CAS PubMed PubMed Central Google Scholar
Lee Y-L, Takeda H, Costa Monteiro Moreira G et al (2021) A 12 kb multi-allelic copy number variation encompassing a GC gene enhancer is associated with mastitis resistance in dairy cattle. PLoS Genet 17:e1009331–e1009331. https://doi.org/10.1371/journal.pgen.1009331
Article CAS PubMed PubMed Central Google Scholar
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. https://doi.org/10.1093/bioinformatics/btp324
Article CAS PubMed PubMed Central Google Scholar
Li M, Chen L, Tian S et al (2017) Comprehensive variation discovery and recovery of missing sequence in the pig genome using multiple de novo assemblies. Genome Res 27:865–874. https://doi.org/10.1101/gr.207456.116
Article CAS PubMed PubMed Central Google Scholar
Liu Y, Yang X, Jing X et al (2018) Transcriptomics analysis on excellent meat quality traits of skeletal muscles of the Chinese indigenous min pig compared with the large white breed. Int J Mol Sci 19. https://doi.org/10.3390/ijms19010021
Long Y, Su Y, Ai H et al (2016) A genome-wide association study of copy number variations with umbilical hernia in swine. Anim Genet 47:298–305. https://doi.org/10.1111/age.12402
Article CAS PubMed Google Scholar
Lopes I, Altab G, Raina P, de Magalhães JP (2021) Gene size matters: an analysis of gene length in the human genome. Front Genet 12:559998. https://doi.org/10.3389/fgene.2021.559998
Article CAS PubMed PubMed Central Google Scholar
McLaren W, Gil L, Hunt SE et al (2016) The ensemble variant effect predictor. Genome Biol 17:122. https://doi.org/10.1186/s13059-016-0974-4
Article CAS PubMed PubMed Central Google Scholar
Medvedev P, Fiume M, Dzamba M et al (2010) Detecting copy number variation with mated short reads. Genome Res 20:1613–1622. https://doi.org/10.1101/gr.106344.110
Article CAS PubMed PubMed Central Google Scholar
Meurens F, Summerfield A, Nauwynck H et al (2012) The pig: a model for human infectious diseases. Trends Microbiol 20:50–57. https://doi.org/10.1016/j.tim.2011.11.002
Article CAS PubMed Google Scholar
Mills RE, Walter K, Stewart C et al (2011) Mapping copy number variation by population-scale genome sequencing. Nature 470:59–65. https://doi.org/10.1038/nature09708
Article CAS PubMed PubMed Central Google Scholar
Park K-D, Kim H, Hwang JY et al (2014) Copy number deletion has little impact on gene expression levels in racehorses. Asian-Australasian J Anim Sci 27:1345–1354. https://doi.org/10.5713/ajas.2013.13857
Article CAS Google Scholar
Paudel Y, Madsen O, Megens H-J et al (2015) Copy number variation in the speciation of pigs: a possible prominent role for olfactory receptors. BMC Genomics 16:330. https://doi.org/10.1186/s12864-015-1449-9
Article CAS PubMed PubMed Central Google Scholar
Paudel Y, Madsen O, Megens H-J et al (2013) Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication. BMC Genomics 14:449. https://doi.org/10.1186/1471-2164-14-449
Article CAS PubMed PubMed Central Google Scholar
Perry GH (2008) The evolutionary significance of copy number variation in the human genome. Cytogenet Genome Res 123:283–287. https://doi.org/10.1159/000184719
Article CAS PubMed Google Scholar
Qian W, Liao B-Y, Chang AY-F, Zhang J (2010) Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet 26:425–430. https://doi.org/10.1016/j.tig.2010.07.002
Article CAS PubMed PubMed Central Google Scholar
Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. https://doi.org/10.1093/bioinformatics/btq033
Article CAS PubMed PubMed Central Google Scholar
R Core Team (2022) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Ramayo-Caldas Y, Castelló A, Pena RN et al (2010) Copy number variation in the porcine genome inferred from a 60 k SNP BeadChip. BMC Genomics 11:593. https://doi.org/10.1186/1471-2164-11-593
Article CAS PubMed PubMed Central Google Scholar
Rogozin IB (2014) Complexity of gene expression evolution after duplication: protein dosage rebalancing. Genet Res Int 2014:516508. https://doi.org/10.1155/2014/516508
Article PubMed PubMed Central Google Scholar
Rose AB (2018) Introns as gene regulators: a brick on the accelerator. Front Genet 9:672. https://doi.org/10.3389/fgene.2018.00672
Article CAS PubMed Google Scholar
Ruberto FP, Balzano S, Namburi P et al (2021) Heterozygous deletions of noncoding parts of the PRPF31 gene cause retinitis pigmentosa via reduced gene expression. Mol vis 27:107–116
CAS PubMed PubMed Central Google Scholar
Rubin C-J, Megens H-J, Barrio AM et al (2012) Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci 109(48):19529–19536. https://doi.org/10.1073/pnas.1217149109
Article PubMed Central Google Scholar
Sellier C, Hwang VJ, Dandekar R et al (2014) Decreased DGCR8 expression and miRNA dysregulation in individuals with 22q11.2 deletion syndrome. PLoS One 9:e103884. https://doi.org/10.1371/journal.pone.0103884
Article CAS PubMed PubMed Central Google Scholar
Stankiewicz P, Lupski JR (2010) Structural variation in the human genome and its role in disease. Annu Rev Med 61:437–455. https://doi.org/10.1146/annurev-med-100708-204735
Article CAS PubMed Google Scholar
Stranger BE, Forrest MS, Dunning M et al (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315:848–853. https://doi.org/10.1126/science.1136678
Article CAS PubMed PubMed Central Google Scholar
Tischkowitz MD, Morgan NV, Grimwade D et al (2004) Deletion and reduced expression of the Fanconi anemia FANCA gene in sporadic acute myeloid leukemia. Leukemia 18:420–425. https://doi.org/10.1038/sj.leu.2403280
Article CAS PubMed Google Scholar
Urrutia AO, Hurst LD (2003) The signature of selection mediated by expression on human genes. Genome Res 13:2260–2264. https://doi.org/10.1101/gr.641103
Article CAS PubMed PubMed Central Google Scholar
Velez-Irizarry D, Casiro S, Daza KR et al (2019) Genetic control of longissimus dorsi muscle gene expression variation and joint analysis with phenotypic quantitative trait loci in pigs. BMC Genomics 20:3. https://doi.org/10.1186/s12864-018-5386-2
Article PubMed PubMed Central Google Scholar
Wang J, Jiang J, Fu W et al (2012) A genome-wide detection of copy number variations using SNP genotyping arrays in swine. BMC Genomics 13:273. https://doi.org/10.1186/1471-2164-13-273
Article CAS PubMed PubMed Central Google Scholar
Wang J, Jiang J, Wang H et al (2014) Enhancing genome-wide copy number variation identification by high density array CGH using diverse resources of pig breeds. PLoS One 9:e87571
Article PubMed PubMed Central Google Scholar
Wang J, Wang H, Jiang J et al (2013) Identification of genome-wide copy number variations among diverse pig breeds using SNP genotyping arrays. PLoS One 8:e68683
Article CAS PubMed PubMed Central Google Scholar
Xie J, Li R, Li S et al (2016) Identification of copy number variations in Xiang and Kele Pigs. PLoS One 11:e0148565
Article PubMed PubMed Central Google Scholar
Yalcin B, Wong K, Agam A et al (2011) Sequence-based characterization of structural variation in the mouse genome. Nature 477:326–329. https://doi.org/10.1038/nature10432
Article CAS PubMed PubMed Central Google Scholar
Ye K, Schulz MH, Long Q et al (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25:2865–2871. https://doi.org/10.1093/bioinformatics/btp394
Article CAS PubMed PubMed Central Google Scholar
Zhao M, Zhao Z (2016) Concordance of copy number loss and down-regulation of tumor suppressor genes: a pan-cancer study. BMC Genomics 17(Suppl 7):532. https://doi.org/10.1186/s12864-016-2904-y
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The computational power was provided by Poznan Supercomputing and Networking Centre. Muscle samples of Polish Landrace boars were kindly provided by slaughterhouse RTA SZWAGRZAK in Złoczew, Poland.

Funding

This work was supported by the Wrocław University of Environmental and Life Sciences (Poland) as the Ph.D. research program “Innowacyjny Naukowiec, no. N060/0035/20.”

Author information

Authors and Affiliations

Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
Magda Mielczarek, Magdalena Frąszczak, Anna E. Zielak-Steciwko, Błażej Nowak, Bartłomiej Hofman, Jagoda Pierścińska, Wojciech Kruszyński & Joanna Szyda

Authors

Magda Mielczarek
View author publications
You can also search for this author in PubMed Google Scholar
Magdalena Frąszczak
View author publications
You can also search for this author in PubMed Google Scholar
Anna E. Zielak-Steciwko
View author publications
You can also search for this author in PubMed Google Scholar
Błażej Nowak
View author publications
You can also search for this author in PubMed Google Scholar
Bartłomiej Hofman
View author publications
You can also search for this author in PubMed Google Scholar
Jagoda Pierścińska
View author publications
You can also search for this author in PubMed Google Scholar
Wojciech Kruszyński
View author publications
You can also search for this author in PubMed Google Scholar
Joanna Szyda
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

M. M. provided the concept and funding of the study as well as implemented both bioinformatic pipelines. The ideas standing behind statistical analyses were provided by both M. F. and J. S. Moreover, M. F., B. H., and J. P. created custom-written scripts in python, bash, and R languages which were necessary to perform the analysis. A. Z. S. and B. N. collected muscle samples. W. K. provided helpful guidance on data collection and contributed to scientific discussions. M. M., J. S., and M. F. interpreted the results and drafted the manuscript. All authors contributed to the writing of the manuscript and approved the final manuscript version.

Corresponding author

Correspondence to Magda Mielczarek.

Ethics declarations

Ethics approval

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 278 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Mielczarek, M., Frąszczak, M., Zielak-Steciwko, A.E. et al. An effect of large-scale deletions and duplications on transcript expression. Funct Integr Genomics 23, 19 (2023). https://doi.org/10.1007/s10142-022-00946-5

Download citation

Received: 26 September 2022
Revised: 14 December 2022
Accepted: 15 December 2022
Published: 23 December 2022
DOI: https://doi.org/10.1007/s10142-022-00946-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

An effect of large-scale deletions and duplications on transcript expression

Abstract

Introduction

Results

CNV calling pipeline and genomic annotation results

CNV impact on gene expression

Discussion

CNVs modulate transcript expression

Conclusions

Materials and methods

Material

CNV detection and expression quantification pipelines

Statistical analysis

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval

Conflict of interest

Additional information

Publisher's note

Supplementary Information

Supplementary file1 (PDF 278 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation