Exome genotyping and linkage analysis identifies two novel linked regions and replicates two others for myopia in Ashkenazi Jewish families
Myopia is one of the most common eye diseases in the world and affects 1 in 4 Americans. It is a complex disease caused by both environmental and genetic effects; the genetic effects are still not well understood. In this study, we performed genetic linkage analyses on Ashkenazi Jewish families with a strong familial history of myopia to elucidate any potential causal genes.
Sixty-four extended Ashkenazi Jewish families were previously collected from New Jersey. Genotypes from the Illumina ExomePlus array were merged with prior microsatellite linkage data from these families. Additional custom markers were added for candidate regions reported in the literature for myopia or refractive error. Myopia was defined as a mean spherical equivalent (MSE) of -1D or worse. Parametric two-point linkage analyses (using TwoPointLods) and multi-point linkage analyses (using SimWalk2) were performed, as well as collapsed haplotype pattern (CHP) analysis in SEQLinkage and association analyses with FBAT and rv-TDT.
The strongest evidence of linkage was on 1p36 (two-point LOD = 4.47), a region previously linked to refractive error (MYP14) but not myopia. Another genome-wide significant locus was found on 8q24.22 with a maximum two-point LOD score of 3.75. CHP analysis also detected the signal on 1p36, localized to the LINC00339 gene with a maximum HLOD of 3.47, as well as genome-wide significant signals on 7q36.1 and 11p15, the latter of which overlaps with the MYP7 locus.
We identified 2 novel linkage peaks for myopia on chromosomes 7 and 8 in these Ashkenazi Jewish families and replicated 2 more loci on chromosomes 1 and 11, one previously reported in refractive error but not myopia in these families and the other locus previously reported in the literature. Strong candidate genes have been identified within these linkage peaks in our families. Targeted sequencing in these regions will be necessary to definitively identify causal variants under these linkage peaks.
Keywords: Myopia, Genetic linkage, Family studies
CHP: collapsed haplotype pattern
CIDR: Center for Inherited Disease Research
CNV: copy number variation
GWAS: genome-wide association study
HLOD: heterogeneity logarithm of the odds
IBD: identical by descent
IBS: identical by state
LOD: logarithm of the odds
MAF: minor allele frequency
MSE: mean spherical equivalent
SNP: single nucleotide polymorphism
STS: sequence tagged site (microsatellite)
WES: whole exome sequence
WGS: whole genome sequence
Myopia is a common, complex trait with both genetic and environmental factors influencing risk [1, 2]. Rates of myopia have been increasing rapidly in many parts of the world, and myopia is one of the most preventable forms of blindness and imposes significant socio-economic costs. Recent genome-wide association studies (GWAS) have identified a number of loci associated with the risk of developing refractive errors [3, 4, 5, 6, 7, 8, 9] but so far few causal variants have been identified. Whole exome sequencing (WES) has been used to identify causal variants that modify the risk of developing a number of traits and diseases. Although this can be an attractive approach in phenotypes that are relatively uncommon, the challenges of identifying which variants are truly causal in a common trait like myopia are much greater. Population-based designs can be difficult to analyze and interpret for WES data and the sample size requirements can be prohibitive, especially as the cost of sequencing remains relatively high. Family-based study designs have several advantages over population-based studies, especially when focusing on rare variants: such variants may be enriched within a family even if they are rare in the population, so fewer individuals are required to retain sufficient power. This approach has been used successfully to identify genes increasing risk for pathogenic or “high” myopia (mean spherical equivalent (MSE) < −6 diopters (D)) [10, 11, 12]. Family-based linkage studies using sparse panels of genetic markers (microsatellites and common single nucleotide polymorphisms (SNPs)) have identified regions of the genome likely to be harboring high-risk rare variants contributing to non-pathogenic myopia (MSE < −1D) in highly aggregated families [13, 14, 15, 16, 17, 18, 19, 20, 21, 22] but the causal variants responsible for these results have not yet been identified.
Exome-focused arrays such as the Illumina ExomePlus array provide an inexpensive way of surveying variation in the coding regions of the genome, with content more targeted at coding variation. This study uses dense exome array genotype data to attempt to narrow in on genes with rare variants that strongly increase risk of myopia in our highly-aggregated Ashkenazi Jewish families from the Penn Family Study.
Patient recruitment and genotyping
Genotype data were available for 527 Ashkenazi Jewish individuals (64 extended families) selected due to their strong information content for linkage studies of myopia from among the 105 Ashkenazi Jewish families included in the Penn Family Study. Details of the recruitment of these families have been previously described. This study followed the tenets of the Declaration of Helsinki and informed consent was obtained from all subjects after explanation of the nature of the study and any potential consequences. This study was approved by the institutional review boards of the University of Pennsylvania and the National Human Genome Research Institute. All subjects were genotyped with the Illumina ExomePlus array by the Center for Inherited Disease Research (CIDR) at Johns Hopkins University.
CIDR standard quality control procedures were applied to the entire dataset. Blind duplicates and HapMap controls were distributed across plates for concordance checking. Cases and controls were evenly distributed across plates, but family members were kept on the same plate. Samples with suspected mixtures, unusual X and Y patterns or gender mismatch were identified and dropped before release. SNP clustering was performed on all SNPs in the project, and SNP genotypes with a genotype quality (GC) score less than 0.15 were recoded as missing genotypes. Autosomal SNPs with less than 85% call rate, cluster separation of less than 0.3 and heterozygote rate greater than 80% were dropped. Subsets of SNPs that were manually reviewed are detailed in Supplementary Methods, and details of SNPs not released due to technical failure can be found in Additional file 1: Table S1.
After receiving data from CIDR, additional quality control measures were applied. Genotype and phenotype data were combined and an additional 85 ungenotyped individuals were added to the pedigrees to complete family relationships. Detailed Mendelian error checking was performed in Sib-pair, sex discrepancies were calculated in PLINK, and samples which did not appear sufficiently matched to their recorded sex were dropped. Any unexpected duplicate samples were identified using PREST-PLUS and one of each duplicate pair was dropped. SNPs with > 1 error in blind duplicates or HapMap controls were dropped, and SNPs with > 1 Mendelian error after correction of pedigree relationships were also removed. Batch effects were tested for using a homogeneity test of minor allele frequency for each SNP on each plate compared to all other plates. We averaged these statistics over all SNPs in PLINK to determine how the plates deviated from each other. Heterozygosity rates across samples were checked and outlier samples excluded. Samples were examined for chromosomal abnormalities and problematic samples identified. Autosomal SNPs with a sex difference in allelic frequency > 0.2 or a sex difference in heterozygosity > 0.3 were also excluded. Variants monomorphic in the study were also excluded.
We did not filter SNPs based on Hardy-Weinberg equilibrium (HWE); instead, SNPs that were not in HWE were flagged. All significant and suggestive SNPs reported here were in HWE. We did initially find a single SNP at 16q22.1 that was out of HWE and had a highly significant two-point logarithm of the odds (LOD) score of 7.76. This SNP had an excess of heterozygotes (approximately 70%) and a deficit of both homozygotes. We later found that this SNP was within a known copy number variant (CNV), which is responsible for the heterozygote inflation. This SNP was removed from all analyses and is not reported as significant.
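The kind of heterozygote excess described above can be flagged with a chi-square goodness-of-fit test against Hardy-Weinberg proportions. The following is a minimal sketch, not the procedure used in this study: the function name `hwe_chisq` and the genotype counts in the usage note are hypothetical, and a 1-degree-of-freedom chi-square approximation is assumed (an exact test would be preferable for small counts).

```python
from math import erfc, sqrt


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg proportions from genotype counts.

    Returns (chi2, p_value, observed_het_fraction, expected_het_fraction).
    Illustrative helper only; name and interface are assumptions.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value, n_ab / n, 2 * p * q
```

For example, hypothetical counts of 75, 350 and 75 (70% heterozygotes at an allele frequency of 0.5) give a very large statistic, while counts of 25, 50 and 25 match the expected proportions exactly.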
Remapping and merger of the SNP and STS data sets
After cleaning, we merged the exome variant data with older microsatellite (sequence tagged site (STS)) data from previous linkage studies in the same population [13, 14, 28]. All genetic markers (SNPs and STSs) were mapped onto a common genetic map, the Rutgers Map version 3 for GRCh37. After merging, the entire data set consisted of 665 individuals from 64 extended families with 67,196 markers (399 STS) for analysis. Family-specific marker allele frequencies were estimated using a Monte-Carlo expectation maximization algorithm in Sib-pair and used in all linkage analyses.
The phenotyping has been fully described previously. Briefly, families were eligible to be included in the study if there was an index case with a spherical equivalent (SpEq) of -1D or lower and no systemic or ocular disease. All adults in the family were classified as affected or unaffected based on these same criteria. In children, a more stringent approach to classification was used in order to account for normal refractive development. Individuals between 6 and 10 years of age were classified as unaffected if their MSE in both eyes was +2D or higher, and individuals whose MSE was between +2D and -1D were designated as unknown. Individuals aged 11–20 years with a minimum MSE of +1.5D in both eyes were classified as unaffected. In this age group, individuals with an MSE between +1.5D and -1D were placed in the unknown class.
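The age-dependent classification rules above can be summarized as a small decision function. This is a minimal sketch under stated assumptions rather than the study's actual phenotyping code: the function name `classify_myopia` is hypothetical, adults are taken to be 21 years or older, children with an MSE of -1D or lower are treated as affected, children under 6 (not covered by the text) are returned as unknown, and the "both eyes" criterion is applied to the less hyperopic eye.

```python
def classify_myopia(age: float, se_right: float, se_left: float) -> str:
    """Return 'affected', 'unaffected' or 'unknown' for one individual.

    se_right / se_left are spherical equivalents in diopters.
    Thresholds follow the text; the handling of edge cases is an assumption.
    """
    mse = (se_right + se_left) / 2.0
    if mse <= -1.0:
        return "affected"  # MSE of -1D or worse
    if age >= 21:
        return "unaffected"  # adults: same -1D criterion, no unknown class
    if age < 6:
        return "unknown"  # assumption: too young to classify
    # Children need clearly hyperopic refraction in both eyes to count as
    # unaffected: +2D for ages 6-10, +1.5D for ages 11-20.
    threshold = 2.0 if age < 11 else 1.5
    if min(se_right, se_left) >= threshold:
        return "unaffected"
    return "unknown"
```

Under these assumptions a 15-year-old with plano refraction in both eyes is coded as unknown rather than unaffected, which is the conservative behavior the text describes for individuals who may still develop myopia.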
Two-point linkage analyses
Two-point linkage analyses were performed using the program TwoPointLods. This is a parametric linkage analysis program, and we assumed an autosomal dominant model with a disease allele (D) frequency of 0.0133, a penetrance of 90% and a phenocopy rate of 10% (penetrances for dd/Dd/DD = 0.1/0.9/0.9). Analysis was performed individually on each family. Cumulative LOD scores and heterogeneity LOD (HLOD) scores were then calculated across all families.
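To make the relationship between per-family LOD scores, the cumulative LOD and the HLOD concrete, the sketch below uses the standard admixture formulation, in which a proportion alpha of families is assumed to be linked. It is illustrative only and is not the TwoPointLods implementation: the function names are hypothetical, the per-family LOD scores are assumed to be finite and evaluated at one fixed recombination fraction, and alpha is maximized over a simple grid. The first function also shows the trait prevalence implied by the stated penetrance model, assuming Hardy-Weinberg genotype proportions.

```python
import math


def model_prevalence(q: float = 0.0133, penetrance=(0.1, 0.9, 0.9)) -> float:
    """Prevalence implied by disease allele frequency q and the
    dd/Dd/DD penetrances, assuming Hardy-Weinberg proportions."""
    p = 1.0 - q
    genotype_freqs = (p * p, 2 * p * q, q * q)  # dd, Dd, DD
    return sum(g * f for g, f in zip(genotype_freqs, penetrance))


def hlod(family_lods, alpha: float) -> float:
    """Admixture HLOD for a given proportion alpha of linked families."""
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def max_hlod(family_lods, steps: int = 1000):
    """Grid search over alpha in [0, 1]; returns (best_hlod, best_alpha)."""
    return max((hlod(family_lods, i / steps), i / steps) for i in range(steps + 1))
```

With alpha fixed at 1 the HLOD reduces to the cumulative LOD (the sum over families), and with alpha at 0 it is zero, so the maximized HLOD is never smaller than either. With the parameters given above, `model_prevalence()` evaluates to roughly 0.12.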
SNP pruning for linkage disequilibrium
It is well-known that including markers that are in strong linkage disequilibrium (LD) in multi-point linkage analyses that assume linkage equilibrium can cause inflation of false positive rates. Previous analyses have allowed us to determine that even multi-point linkage analyses that attempt to adjust for intermarker LD are often inaccurate for very dense marker maps, so the data were pruned. All SNPs were condensed into 1 cM bins. The SNP with the highest minor allele frequency (MAF) in the bin was chosen to then represent the bin in the multi-point analyses. We performed further LD analysis on the binned SNPs in Haploview. For any SNP-pairs with an r² value greater than 0.2, one of the SNPs in the pair was removed. Because of their high information level, no STS markers were removed in pruning analyses. Thus, after cleaning we were left with 3764 markers.
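The two pruning steps described above (one SNP per 1 cM bin, then removal of one SNP from any pair with r² above 0.2) can be sketched as follows. This is an illustration under several assumptions, not the pipeline used here: the function names and the dictionary layout of a SNP record are hypothetical, r² is approximated by the squared correlation of 0/1/2 genotype dosages rather than the haplotype-based estimate Haploview reports, and the choice of which SNP in a correlated pair to drop (the later one along the map) is arbitrary because the text does not specify it.

```python
def bin_by_cm(snps, bin_width: float = 1.0):
    """Keep the highest-MAF SNP in each (chromosome, cM bin).

    Each SNP is a dict with keys 'id', 'chrom', 'cm' and 'maf'.
    """
    best = {}
    for snp in snps:
        key = (snp["chrom"], int(snp["cm"] // bin_width))
        if key not in best or snp["maf"] > best[key]["maf"]:
            best[key] = snp
    return sorted(best.values(), key=lambda s: (s["chrom"], s["cm"]))


def r_squared(x, y) -> float:
    """Squared Pearson correlation of two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, genotypes, r2_max: float = 0.2):
    """Greedy pruning: keep a SNP only if its r^2 with every SNP already
    kept on the same chromosome is at most r2_max.

    genotypes maps SNP id to a list of 0/1/2 dosages in a fixed sample order.
    """
    kept = []
    for snp in snps:
        in_ld = any(
            other["chrom"] == snp["chrom"]
            and r_squared(genotypes[snp["id"]], genotypes[other["id"]]) > r2_max
            for other in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept
```

STS markers would simply bypass both functions in such a scheme, consistent with the decision to retain all of them.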
Multi-point linkage analyses
Collapsed haplotype pattern linkage analyses
A new approach to deal with intermarker LD without pruning is the collapsed haplotype pattern (CHP) method by Wang et al., implemented in the program SEQLinkage. This approach generates multiallelic pseudo-markers based on short haplotypes within specified genetic regions such as genes, as determined using physical positions from RefSeq for GRCh37. The advantage of this approach is that it does not require pruning as the multipoint analysis does. We then performed two-point linkage analysis of myopia with these pseudo-markers using Merlin.
We also performed two types of association analyses. The family-based association test FBAT [37, 38] was used to examine all variants across all families. We also used rv-TDT, which examines rare variants (MAF < 0.05). We chose a single trio of genotyped individuals from each extended pedigree for this analysis.
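For readers unfamiliar with trio-based tests, the classic single-variant transmission disequilibrium test (TDT) illustrates the underlying idea. Note that this is the original McNemar-type TDT and not the rv-TDT method used in this study, which aggregates rare variants within a region; the function name `classic_tdt` and the counts in the usage note are hypothetical.

```python
from math import erfc, sqrt


def classic_tdt(transmitted: int, untransmitted: int):
    """Classic TDT for one biallelic variant.

    transmitted / untransmitted count how often heterozygous parents did or
    did not pass the candidate allele to an affected child.
    Returns (chi2, p_value) using a 1-degree-of-freedom chi-square.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value
```

As a hypothetical example, 30 transmissions against 10 non-transmissions give a statistic of 10. Because only heterozygous parents are informative, using one trio per extended pedigree sharply limits the number of informative transmissions for rare alleles.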
Functional annotation and microRNA target prediction
Table: HLOD Scores for Genome-wide Significant Chromosomal Regions
Columns: Highest SNP (TP) or Gene (CHP) | Max LOD Score | Max HLOD Score | Alpha for HLOD | Max Multipoint HLOD near this location | Alpha for Multipoint HLOD | Max CHP HLOD at this location | Alpha for CHP HLOD
The association analyses using FBAT and rv-TDT found no genome-wide significant signals.
Here we report significant linkage with myopia at 1p36.12, 8q24, 7q36.1, and 11p15.1. The loci on chromosomes 1 and 11 are replications, while the loci on chromosomes 8 and 7 are novel. All of these linkage signals are cumulative effects across families. However, the families do not share identical linked haplotypes; if they did, we would have seen significant association within these regions as well. This suggests that several different causal variants may exist across the linked families, with these causal variants possibly all being in the same gene (allelic heterogeneity).
Our strongest signals occurred in the 1p36.12 region, identified as significant in both the two-point and CHP analyses. Linkage of refractive error (but not myopia) to markers on 1p36 has been reported before in this population, so the present finding is not a true replication. It now appears that this region did not exhibit significant linkage with myopia in previous analyses because there was insufficient marker information at this location to detect linkage to the binary trait of myopia affection. The 1p36.12 region contained the highest overall two-point and CHP HLOD scores, both located either in or near LINC00339, a long non-coding RNA gene known to be associated with endometriosis [46, 47, 48] but with no published role in ocular disease. CDC42, a GTPase directly downstream of LINC00339, contained three significant variants in the two-point analysis, but has not previously been implicated in myopia either. However, one of its activation targets, LAMA1, has been found to cause myopia in the presence of other phenotypes. Slightly further upstream at 1p36.2, the genes FRAP1 and PDGFRA have both been found to be associated with corneal curvature and eye size in Asian and European populations [50, 51]. Neither gene was found to be even suggestively linked to myopia in this study.
We report the discovery of a novel locus linked to myopia on 7q36.1, distinct from another known chromosome 7 locus, MYP17 [52, 53, 54], located on the opposite arm at 7p15. This locus was only detected by the CHP analyses and localized to the SCO-spondin gene (SSPO). The subcommissural organ (SCO) is one of the circumventricular organs, a set of brain structures that form the linkage between the central nervous system and the peripheral blood stream. It is one of the first differentiated brain structures to form and its function is largely unknown. SCO-spondin is a large glycoprotein from the thrombospondin family. This protein is highly expressed during CNS development and is believed to be important in cellular adhesion, axonal pathfinding and homeostasis. The Pax6 mutation known as Sey, which causes a small eye, also causes abnormalities in the SCO. Homozygous Sey/Sey mice die at birth with numerous defects, including an inability to properly form the SCO. Sey/+ mice demonstrate a mosaic of SCO cells, some of which do not express Reissner’s fiber (a fibrous aggregation of the secreted molecules of the SCO, formed by secretion of SCO-spondin), as well as other abnormalities related to normal development of this important brain region. This admittedly tenuous link to PAX6 is an intriguing addition to the complex story of myopia and its relationship to early brain development.
Another replicated locus on 11p15 was present at a suggestive level in the two-point analysis but at the genome-wide significant level in the CHP analysis. This is a true replication, as this signal has not been previously seen in this population, and it adds to the body of evidence that some genomic feature in this location appears to be actively modulating the risk of developing myopia and refractive errors. This signal overlaps with the MYP7 locus, which spans 11p13-p15.4, and there is suggestive evidence of linkage in the two-point analyses of a 3’ UTR variant in PAX6. The role of PAX6, long postulated as a potential modifier of myopia risk, remains murky, with evidence both supporting and rejecting its involvement [17, 22, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]. It remains to be seen whether repeated detection of signals in this location by multiple studies will turn out to be from PAX6 or another nearby gene. The CHP analyses of these data, by contrast, localized the signal to another gene, NCR3LG1, which is considerably upstream of PAX6 but still within the linkage region originally identified by Hammond. NCR3LG1 is a natural killer (NK) cell cytotoxicity receptor ligand, and its interaction with NKp30 results in NK activation and cell death. It interacts exclusively with NCR3 and not with other NK cell activating receptors. It has only been reported as expressed in tumor tissues. None of these facts, however, makes NCR3LG1 a particularly attractive candidate for myopia development, and there are many other candidate genes in the region that, based on biological function, may be more likely to be causal genes (Additional file 3: Table S3).
Although they did not reach genome-wide significance, it is interesting to note that several loci did meet the criteria for suggestive evidence of linkage, including 7p14 close to the MYP17 locus at 7p15 [52, 53, 54].
It is unfortunate but not surprising that none of the association analyses were able to detect associations in the regions found in the linkage analyses. Family-based association analysis relies on risk alleles being shared across families either identical by state (IBS) or identical by descent (IBD). Linkage by contrast tracks the co-segregation of haplotypes and the trait within a pedigree, but is not concerned with whether those segregating haplotypes contain alleles IBS across different families. Using a founder population such as the Orthodox Ashkenazi Jewish families in this analysis increases the likelihood that there may be shared risk alleles across linked families, but this is not guaranteed. Therefore, this result, combined with the annotation of the significantly linked variants/genes discussed above, suggests that even using the exome-targeted array, we have likely not genotyped the actual causal allele(s) and instead are only able to detect its presence via linkage to specific haplotypes in each linked family.
This study found significant linkage to myopia in Ashkenazi Jewish families at four chromosomal loci: 1p36.12, 8q24, 7q36.1, and 11p15.1. The signals at 7q and 8q were novel, while the signals at 1p and 11p are replications of previously identified signals, albeit ones where the causal genes have yet to be identified. We were able to identify several potential causal genes, including WISP1 on 8q and SSPO on 7q, though with our limited exome-based array we were unable to resolve the signal further than the chromosomal regions. We plan to perform either targeted sequencing on the regions of interest or whole genome sequencing (WGS) on the most highly linked families to unequivocally identify the causal variants that account for the linkages to myopia detected here.
The authors thank all study participants and their families. The authors acknowledge that this work utilized the computational resources of the NIH HPC Biowulf cluster.
This study was funded in part by the National Eye Institute Grant R01 EY020483 and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health.
Availability of data and materials
Data available from researchers under collaborative agreement due to restrictions of original informed consent.
DS and JEBW designed the experiment, were responsible for the acquisition of the data and helped to interpret the analysis results. CLS and AMM performed quality control and all genetic linkage analyses, interpreted the data, and drafted the manuscript. QL and ERH assisted in the analyses and interpretation of the data. CDD and DDL performed the association analyses. RYC, JBC and BAM performed the functional annotation of all the variants and were involved in the creation of any annotation tables. LP and FM performed the initial quality control on the exome array data. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
This study was approved by the institutional review boards of the University of Pennsylvania and the National Human Genome Research Institute (02-HG-N191). All study participants provided informed written consent.
Consent for publication
The authors declare that they have no competing interests.
- 5. Stambolian D, Wojciechowski R, Oexle K, Pirastu M, Li X, Raffel LJ, et al. Meta-analysis of genome-wide association studies in five cohorts reveals common variants in RBFOX1, a regulator of tissue-specific splicing, associated with refractive error. Hum Mol Genet. 2013;22(13):2754–64.
- 23. Duffy D. SIB-PAIR: A program for simple genetic analysis v1.00.beta. Queensland Institute of Medical Research; 2008. https://genepi.qimr.edu.au/staff/davidD/Sib-pair/Documents/sib-pair.html. Accessed May 2017.
- 25. Sun L. Detecting pedigree relationship errors. In: R. E, J. S, S. S, editors. Statistical Human Genetics: Methods and Protocols. New York: Humana Press; 2012. p. 25–46.
- 30. Thomas A. TwoPointLods. http://www-genepi.med.utah.edu/~alun/software/. Accessed May 2017.
- 49. Vilboux T, Malicdan MC, Chang YM, Guo J, Zerfas PM, Stephen J, et al. Cystic cerebellar dysplasia and biallelic LAMA1 mutations: a lamininopathy associated with tics, obsessive compulsive traits and myopia due to cell adhesion and migration defects. J Med Genet. 2016;53(5):318–29.
- 50. Guggenheim JA, McMahon G, Kemp JP, Akhtar S, St Pourcain B, Northstone K, et al. A genome-wide association study for corneal curvature identifies the platelet-derived growth factor receptor alpha gene as a quantitative trait locus for eye size in white Europeans. Mol Vis. 2013;19:243–53.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.