Background

Pulmonary arterial hypertension (PAH, MIM #178600) is a rare vascular lung disease characterized by increased pulmonary vascular resistance and elevated mean pulmonary arterial pressure; without treatment, it carries a grave prognosis owing to right heart failure. Although survival rates are improving with a number of recently developed treatments such as epoprostenol, the optimal use of these therapies remains unclear because patients vary in phenotype and genetic background.

Despite the complex etiology of PAH, which reflects incomplete penetrance and genetic heterogeneity, multiple genes that cause PAH have been discovered over the last few decades. BMPR2 variants have been identified not only in >70% of heritable PAH (HPAH) but also in 10–40% of idiopathic PAH (IPAH) [1–6]. Variants in other genes, including KCNK3, ACVRL1, ENG, CAV1, and members of the SMAD family, are also rare causes of PAH [7–11]. In addition, common variants in CBLN2 and KCNA5 have been reported to be associated with PAH risk in European Caucasians [12, 13].

Although identifying individuals who carry genetic variants that increase the risk of PAH offers an opportunity for earlier diagnosis and for developing therapeutic strategies, most previous studies focused only on the protein-coding regions of BMPR2, the most frequently mutated gene, using conventional methods such as Sanger sequencing [6, 14]. For patients without a BMPR2 mutation (around 30% of HPAH and 60–90% of IPAH), an approach that can screen multiple genes at once is therefore needed. Because current sequencing technologies give access to exome- and genome-wide variants at reasonable cost, unbiased screening for pathogenic variants can continue to expand the catalogue of genetic diagnoses. However, conventional segregation-based approaches such as linkage analysis lack the power to pinpoint the true pathogenic variants among these genome-wide candidates without functional evaluation, especially for diseases with phenotypic and genetic heterogeneity. In this study, we applied a gene-based association test to evaluate statistically whether rare variants are enriched in genes responsible for PAH. This strategy could be useful for evaluating, in an unbiased manner, the pathogenicity of multiple rare variants that arise from independent founder events, in Mendelian as well as common diseases.

Methods

Subjects

We consecutively enrolled five families with PAH and four individual cases with a family history of PAH into this study (Fig. 1). The patients were diagnosed at the National Hospital Organization Okayama Medical Center between 1996 and 2014. The study was approved by the Institutional Review Board of our institutes, and all participants gave written informed consent in accordance with institutional and national guidelines.

Fig. 1

Segregation of the pathogenic variants identified in nine PAH families. a Eight families carried BMPR2 variants and one family carried a KCNK3 variant. Nucleotide and amino acid changes for BMPR2 and KCNK3 are described relative to NM_001204.6 and NM_002246.2, respectively. The index patient of each family is indicated by an arrow. Subjects whose DNA was available are marked with plus signs. b All possible pathogenic variants discovered in the eight BMPR2-positive families were located upstream of or within the kinase domain. Four previously reported and four novel variants are shown in black and red letters, respectively.

Next-generation sequencing and data analysis

To characterize the genetic background of these nine PAH families comprehensively, we applied whole-exome or whole-genome sequencing to 12 patients and 5 healthy family members whose DNA was available (Fig. 1 and Additional file 1: Table S1). For exome sequencing, DNA fragments were enriched with SureSelectXT Human All Exon v4 + UTR (Agilent Technologies, Santa Clara, CA, USA) and sequenced on a SOLiD™ 5500XL sequencer (Thermo Fisher Scientific Inc., Waltham, MA, USA). Whole-genome sequencing was conducted on the Illumina HiSeq X sequencer (Illumina Inc., San Diego, CA, USA). After the sequence reads were aligned to the reference genome (NCBI Build 37) with the Burrows-Wheeler Aligner [15], downstream processing, including duplicate removal, base quality score recalibration, local realignment, variant calling, and variant quality score recalibration, was performed with GATK [16]. Variants were called together with an exome sequencing data set of 300 control samples obtained from the Human Genome Variation Database (accession ID: HGV0000004) [17]. The resulting VCF file has been deposited in the same database under accession HGV0000005.

Quality control for association study

After removing variants flagged as low quality by the GATK VariantRecalibrator, we applied additional filters to retain only high-quality variants, excluding those with low call rate (<0.9), excessive strand bias (FS > 50), high haplotype score (> 5), deviation from Hardy-Weinberg equilibrium (InbreedingCoeff > 0.3), low mapping quality of the reads (MQ < 35), an excess of reads with zero mapping quality (MQ0 > 100), mapping quality bias between reference and alternative alleles (MQRankSum < 13), low coverage per sample (DP/sample < 10), positional bias of the reads (ReadPosRankSum > 5), low quality by depth (QD < 8), and low LOD score (VQSLOD < 0). For the gene-based association analysis, we selected likely protein-damaging variants (premature termination, splice site, missense, and exonic indels) and performed the Variable Threshold (VT) test [18] implemented in Variant Association Tools [19].
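The exclusion thresholds listed above can be expressed as a small rule table. The following Python sketch is illustrative only and is not the filtering code used in the study: it assumes each site's annotations are already available in a dictionary, uses the hypothetical keys `CallRate` and `DPPerSample` for the call rate and per-sample depth (the other keys mirror GATK annotation names), copies every threshold as printed in the text (including the MQRankSum cutoff), and, as a simplifying assumption, skips a rule whose annotation is missing.

```python
import operator
from typing import Callable, Dict, List, Optional, Tuple

# A site FAILS a rule when compare(value, threshold) is True.
# Thresholds are copied as printed in the Methods text.
# "CallRate" and "DPPerSample" are hypothetical keys; the others mirror GATK annotation names.
Rule = Tuple[str, Callable[[float, float], bool], float]

EXCLUSION_RULES: List[Rule] = [
    ("CallRate", operator.lt, 0.9),
    ("FS", operator.gt, 50.0),
    ("HaplotypeScore", operator.gt, 5.0),
    ("InbreedingCoeff", operator.gt, 0.3),
    ("MQ", operator.lt, 35.0),
    ("MQ0", operator.gt, 100.0),
    ("MQRankSum", operator.lt, 13.0),
    ("DPPerSample", operator.lt, 10.0),
    ("ReadPosRankSum", operator.gt, 5.0),
    ("QD", operator.lt, 8.0),
    ("VQSLOD", operator.lt, 0.0),
]


def failed_filters(site: Dict[str, Optional[float]]) -> List[str]:
    """Return the names of the rules a site fails; an empty list means the site passes.

    Annotations that are absent or None are skipped (an assumption of this sketch).
    """
    failed = []
    for name, compare, threshold in EXCLUSION_RULES:
        value = site.get(name)
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed


# Hypothetical site: good call rate, strand bias, and mapping quality, but low QD.
example_site = {"CallRate": 0.98, "FS": 3.2, "MQ": 59.5, "QD": 6.1, "VQSLOD": 2.4}
print(failed_filters(example_site))  # ['QD']
```

Returning the list of failed rules, rather than a single boolean, makes it easy to tabulate why sites were excluded.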

Annotation and screening of pathogenic variants

All identified variants were annotated using ANNOVAR [20]. Candidate pathogenic variants were screened according to whether, and at what frequency, they were registered in public databases: dbSNP (Build 147) [21], the 1000 Genomes Project (November 2010 data release) [22], the 10Gen Data Set (version 1.04) [23], the NHLBI GO Exome Sequencing Project (ESP6500SI) [24], the Human Genetic Variation Database [17], and ClinVar [25]. For missense variants, PolyPhen-2 [26], MutationTaster [27], LRT [28], and PhyloP [29] scores were obtained from the dbNSFP database [30]. The damaging effects of splice site variants were evaluated with MaxEntScan [31] and Human Splicing Finder [32].
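The screening logic described here, together with the criteria reported for the final variants in the Results, can be sketched as a simple check. In this hypothetical Python example, a candidate is retained only if it is absent from the 300 in-house controls and from the Japanese population database (HGVD), and, for missense variants, only if at least three of the four tools cited above (PolyPhen-2, MutationTaster, LRT, PhyloP) call it damaging; the three-tool cutoff is taken from the Results. How each tool's output is turned into a damaging/benign call (for example, a PhyloP score cutoff) is not specified in the text and is left to the caller. The class and field names are illustrative assumptions, not part of ANNOVAR or dbNSFP.

```python
from dataclasses import dataclass, field
from typing import Dict

# Minimum number of in silico tools that must call a missense variant damaging
# (the "at least three" criterion reported in the Results).
MIN_DAMAGING_CALLS = 3


@dataclass
class Candidate:
    """A candidate variant; all field names are illustrative."""
    gene: str
    protein_change: str
    consequence: str               # e.g. "missense", "nonsense", "splice", "indel"
    in_controls: bool              # observed in the 300 in-house control exomes
    in_japanese_db: bool           # observed in the Human Genetic Variation Database
    damaging_calls: Dict[str, bool] = field(default_factory=dict)  # tool -> damaging?


def passes_screen(c: Candidate) -> bool:
    """Keep variants absent from both control sources; missense variants also
    need at least MIN_DAMAGING_CALLS damaging predictions."""
    if c.in_controls or c.in_japanese_db:
        return False
    if c.consequence == "missense":
        return sum(c.damaging_calls.values()) >= MIN_DAMAGING_CALLS
    return True


# Hypothetical candidates (not variants from this study).
calls = {"PolyPhen-2": True, "MutationTaster": True, "LRT": True, "PhyloP": False}
print(passes_screen(Candidate("GENE_A", "p.Xaa1Yaa", "missense", False, False, calls)))  # True
print(passes_screen(Candidate("GENE_B", "p.Xaa2fs", "indel", True, False)))              # False
```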

Results

Gene-based association study

To identify genes responsible for the pathogenesis of PAH, we applied a gene-based association test (the Variable Threshold test [18]) to the 60,367 damaging variants extracted from the nine PAH patients and the 300 control samples. These variants were located within 10,744 gene regions. Despite the small sample size, the burden association between BMPR2 and PAH was highly significant (p = 6.0 × 10−8), well below the genome-wide significance threshold (p < 2.4 × 10−6) obtained by Bonferroni correction for approximately 21,000 genes (Fig. 2). No other gene reached this threshold.
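For readers unfamiliar with the VT test, the following Python/NumPy sketch implements the statistic as described by Price et al. [18]: for every allele-frequency threshold T observed in the data, variants with frequency ≤ T are pooled into a score z(T), the maximum over thresholds is taken, and significance is assessed by permuting case/control labels. It is a simplified re-implementation for illustration, not the Variant Association Tools code used in this study (which may differ in details such as variant weighting and permutation strategy), and the function names are our own. It also computes the Bonferroni threshold quoted above, assuming a family-wise α of 0.05: 0.05/21,000 ≈ 2.4 × 10−6.

```python
import numpy as np

# Bonferroni threshold for ~21,000 genes at a family-wise alpha of 0.05.
BONFERRONI_ALPHA = 0.05 / 21_000  # about 2.4e-6


def vt_statistic(genotypes: np.ndarray, phenotype: np.ndarray) -> float:
    """Maximum pooled z-score over allele-frequency thresholds for one gene.

    genotypes: (n_individuals, n_variants) minor-allele counts (0, 1, or 2).
    phenotype: (n_individuals,) array with 1 for cases and 0 for controls.
    """
    n = genotypes.shape[0]
    freqs = genotypes.sum(axis=0) / (2.0 * n)   # allele frequency of each variant
    centred = phenotype - phenotype.mean()      # phi_j minus the mean phenotype
    scores = centred @ genotypes                # per variant: sum_j (phi_j - mean) * C_ij
    sq_counts = (genotypes ** 2).sum(axis=0)    # per variant: sum_j C_ij ** 2
    z_max = -np.inf
    for t in np.unique(freqs):                  # one candidate threshold per observed frequency
        pooled = freqs <= t
        denom = np.sqrt(sq_counts[pooled].sum())
        if denom > 0:
            z_max = max(z_max, scores[pooled].sum() / denom)
    return float(z_max)


def vt_permutation_p(genotypes, phenotype, n_perm=2000, seed=0):
    """Permutation p value: how often shuffled labels give a z_max at least as large."""
    rng = np.random.default_rng(seed)
    observed = vt_statistic(genotypes, phenotype)
    # A tiny tolerance counts exact ties despite floating-point summation order.
    hits = sum(
        vt_statistic(genotypes, rng.permutation(phenotype)) >= observed - 1e-12
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy gene (hypothetical data): 4 cases and 20 controls; variant 0 is carried by
# all 4 cases, variant 1 is a singleton carried by one control.
phenotype = np.array([1] * 4 + [0] * 20, dtype=float)
genotypes = np.zeros((24, 2))
genotypes[:4, 0] = 1
genotypes[10, 1] = 1
print(vt_statistic(genotypes, phenotype))  # 19 / (6 * sqrt(5)), about 1.416
print(vt_permutation_p(genotypes, phenotype))
```

Because a permutation p value cannot fall below 1/(n_perm + 1), resolving values near the Bonferroni threshold would require on the order of 10^6 permutations or more; dedicated tools typically use adaptive permutation schemes to keep this tractable.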

Fig. 2

Gene-based association analysis. Quantile-quantile plot of VT test p values for the nine cases and 300 controls. BMPR2 was the only gene that reached the genome-wide significance threshold after Bonferroni correction for 21,000 genes.

Identifying the pathogenic variants

The spectrum of rare variants found in BMPR2 is summarized in Table 1. Of the nine families, eight (88.9%) carried variants in this gene: four carried previously reported single-nucleotide pathogenic variants (two missense, one nonsense, and one splice site) [1, 4, 6], and four carried novel insertions/deletions (indels). One of the novel indels was a large 6.5-kilobase deletion that removes the entire exon 3 from one allele (Additional file 2: Figure S1). Two variants are suspected to be pathogenic with incomplete penetrance, because clinically unaffected family members carried the same variants as the patients (Table 1). No BMPR2 variant was observed in the remaining family, but screening for previously reported pathogenic variants [25] identified a heterozygous missense variant (p.Gly203Asp) in KCNK3. This variant has been shown to disrupt ion-channel function by patch-clamp analysis [11]. None of the pathogenic variants we identified was observed in the 300 control samples or in the public database for the Japanese population [17]. All three missense variants occurred at nucleotides highly conserved among vertebrates and were predicted to damage protein function by at least three in silico prediction programs [26–29] (Table 1).

Table 1 Summary of pathogenic variants and clinical features

Discussion

To our knowledge, this is the first report of a gene-based genome-wide association analysis of HPAH. The burden of rare variants in BMPR2 contributed significantly to disease risk (p = 6.0 × 10−8). Despite the genetic heterogeneity, the approach robustly detected the gene with a large effect on the pathogenesis of PAH. The probands of eight of the nine families harbored possible pathogenic variants in BMPR2, half of which were novel indels. One novel indel was a large 6.5-kilobase deletion spanning the entire exon 3. Another was a three-base insertion (NM_001204.6:c.1277-10_1277-9insGGG) in intron 9 (Additional file 3: Figure S2). Although we could not rule out that this insertion is unrelated to the disease, multiple splice site prediction tools strongly suggested that it creates a new splice acceptor site (Table 1, Additional file 4: Figure S3 and Additional file 5: Figure S4) [31, 32]. The remaining patient harbored a missense variant in KCNK3, the first replication of this channelopathy in the Japanese population. No two of the nine families shared the same variant, suggesting that the variants originated from independent founder events.

Patients with chronic lung diseases such as chronic obstructive pulmonary disease (COPD) and pulmonary fibrosis are prone to developing pulmonary hypertension (PH), which is categorized as Group 3 in the latest guidelines [33]. Most patients with COPD develop mild PH, but 3–5% show a further rise in mean pulmonary arterial pressure to >35 mmHg. How “severe PH-COPD”, formerly known as “out-of-proportion PH”, is induced remains unknown. Furthermore, drugs approved for PAH have not yet been approved for patients with Group 3 PH. In this study, a male patient (OM0195 in HPAH005) had been treated for COPD at another hospital. He was later diagnosed with PH and referred to our hospital. He would have been categorized as “severe PH-COPD” had none of his family members developed PAH. Because we had treated his daughter for IPAH at our hospital, we clinically diagnosed both patients with HPAH. Genetic testing revealed the same in-frame deletion (c.1443_1445delGAA) in BMPR2 in both patients. An underlying genetic predisposition might be one reason for developing “severe PH-COPD”. Given our finding and the similar morphology of vascular lesions in Group 1 and “severe PH-COPD” patients [33], PAH-approved drug treatment tailored to genetic diagnosis may be a worthwhile therapeutic strategy for such patients.

Conclusions

According to the Genetic Testing Registry at the National Institutes of Health, the panels available for clinical genetic testing for PAH do not include KCNK3, and their detection methods are limited. Because pathogenic variants of various sizes can occur within or span non-coding regions, sequencing the entire region of candidate genes is recommended to further understand the genetic factors relevant to PAH. This strategy will be essential for improving genetic diagnosis and counseling for PAH.