Inter-individual variation in cytosine modifications has been linked to complex traits in humans. Cytosine modification variation is partially controlled by single nucleotide polymorphisms (SNPs), known as modified cytosine quantitative trait loci (mQTL). However, little is known about the role of short tandem repeat polymorphisms (STRPs), a class of structural genetic variants, in regulating cytosine modifications. Utilizing the published data on the International HapMap Project lymphoblastoid cell lines (LCLs), we assessed the relationships between 721 STRPs and the modification levels of 283,540 autosomal CpG sites. Our findings suggest that, in contrast to the predominant cis-acting mode for SNP-based mQTL, STRPs are associated with cytosine modification levels in both cis-acting (local) and trans-acting (distant) modes. In local scans within the ±1 Mb windows of target CpGs, 21, 9, and 21 cis-acting STRP-based mQTL were detected in CEU (Caucasian residents from Utah, USA), YRI (Yoruba people from Ibadan, Nigeria), and the combined samples, respectively. In contrast, 139,420, 76,817, and 121,866 trans-acting STRP-based mQTL were identified in CEU, YRI, and the combined samples, respectively. A substantial proportion of CpG sites detected with local STRP-based mQTL were not associated with SNP-based mQTL, suggesting that STRPs represent an independent class of mQTL. Functionally, genetic variants neighboring CpG-associated STRPs are enriched with genome-wide association study (GWAS) loci for a variety of complex traits and diseases, including cancers, based on the National Human Genome Research Institute (NHGRI) GWAS Catalog. Therefore, elucidating these STRP-based mQTL in addition to SNP-based mQTL can provide novel insights into the genetic architectures of complex traits.
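The local scan described above can be illustrated with a minimal sketch. It assumes that the association between an STRP and a CpG is measured by Pearson's correlation between per-individual STRP length and CpG M-value, as the supplementary figure captions suggest. The record layout (`chrom`, `pos`, `lengths`, `m_values`), the function names, and the toy numbers are hypothetical and are not taken from the study.

```python
import math

WINDOW_BP = 1_000_000  # +/- 1 Mb local window, as stated in the abstract


def pearson(xs, ys):
    """Pearson's correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def local_scan(strps, cpgs, window=WINDOW_BP):
    """Return (strp_id, cpg_id, r) for each same-chromosome pair within the window."""
    results = []
    for strp in strps:
        for cpg in cpgs:
            same_chrom = strp["chrom"] == cpg["chrom"]
            if same_chrom and abs(strp["pos"] - cpg["pos"]) <= window:
                r = pearson(strp["lengths"], cpg["m_values"])
                results.append((strp["id"], cpg["id"], r))
    return results


# Hypothetical toy data: one STRP and three CpGs measured in five individuals.
strps = [
    {"id": "STRP_A", "chrom": "1", "pos": 5_000_000,
     "lengths": [10, 12, 12, 14, 16]},
]
cpgs = [
    # 200 kb away on the same chromosome: inside the local window
    {"id": "cg_near", "chrom": "1", "pos": 5_200_000,
     "m_values": [0.1, 0.3, 0.3, 0.5, 0.7]},
    # 4 Mb away on the same chromosome: outside the local window
    {"id": "cg_far", "chrom": "1", "pos": 9_000_000,
     "m_values": [0.7, 0.5, 0.3, 0.3, 0.1]},
    # different chromosome: never a local pair
    {"id": "cg_other", "chrom": "2", "pos": 5_000_000,
     "m_values": [0.2, 0.2, 0.4, 0.1, 0.3]},
]

local_pairs = local_scan(strps, cpgs)
for strp_id, cpg_id, r in local_pairs:
    print(strp_id, cpg_id, round(r, 3))
```

Only the pair inside the window is tested here. A real scan would also convert each correlation to a p-value and correct for the number of tests performed, and a trans-acting scan would drop the window filter and test all STRP-CpG pairs.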
This work was partially supported by grants from the National Institutes of Health: R21HG006367 (to WZ), R21CA187869 (to WZ and LH), and The Robert H. Lurie Comprehensive Cancer Center-Developmental Funds P30CA060553 (to WZ).
Supplementary material 1. Fig. 1 Pearson’s correlation coefficients (ρ) of STRPs and cytosine modifications in local scans between CEU and YRI samples. Scatter plot for Pearson’s correlations (ρ) of STRP length and M-values of local CpGs within ±1 Mb windows of STRPs for CEU and YRI. (PNG 269 kb)
Supplementary material 2. Fig. 2 QQ-plots of the observed p-values for trans-acting STRP-based mQTL. P-values are binned and displayed as hexagons. Different grey scales of each hexagon represent different counts of p-values. A total of > 200 million observed p-values from the whole-genome scan are shown. (a) CEU; (b) YRI. (PNG 204 kb)
Supplementary material 3. Fig. 3 Enrichment of GWAS loci among cis-acting STRP-based mQTL. The null distributions of the numbers of SNPs overlapped with GWAS loci are displayed as histograms. The asterisk marks the true number of SNPs overlapped with GWAS loci within different windows: (a) ±100 Kb; (b) ±500 Kb; and (c) ±1 Mb of cis-acting STRP-based mQTL (p-value < 10⁻³). (PNG 97 kb)