Characterization of phenylalanine hydroxylase gene variants and analysis of genotype–phenotype correlation in patients with phenylalanine hydroxylase deficiency from Fujian Province, Southeastern China

Background Phenylalanine hydroxylase deficiency (PAHD) is the most prevalent inherited disorder of amino acid metabolism in China. Its complex phenotype includes many variants and genotypes among different populations. Methods and results In this study, we analyzed the phenylalanine hydroxylase gene (PAH) variants in a cohort of 93 PAHD patients from Fujian Province. We also assessed genotype and phenotype correlation in patients with PAHD. A total of 44 different pathogenic variants were identified, including five novel variants. The three most prevalent variants among all patients were c.158G > A, p.(Arg53His) (18.03%), c.721C > T, p.(Arg241Cys) (14.75%), and c.728G > A, p.(Arg243Gln) (7.65%). The frequency of the c.158G > A, p.(Arg53His) variant was highest in patients with mild hyperphenylalaninemia, whereas the frequencies of the c.1197A > T, p.(Val399 =) and c.331C > T, p.(Arg111Ter) variants were highest in patients with classic phenylketonuria. The most abundant genotypes observed in PAHD patients were c.[158G > A];[728G > A], c.[158G > A];[442-1G > A], and c.[158G > A];[721C > T]. Comparing allelic phenotype values with genotypic phenotype values yielded fairly accurate predictions of phenotype, with an overall consistency rate of 85.71% for PAHD patients. Conclusions Our study identified a PAH variant spectrum in PAHD patients from Fujian Province, Southeastern China. Quantitative correlation analysis between genotype and phenotype severity is helpful for genetic counseling and management.


Introduction
Phenylalanine hydroxylase deficiency (PAHD, MIM #261600) is an autosomal recessive metabolic disorder caused by variants in the gene encoding the enzyme phenylalanine hydroxylase (PAH; EC 1.14.16.1). PAHD is the most prevalent inborn genetic defect of amino acid metabolism in China, with an average incidence of approximately 1:12,000 [1]. However, the incidence rate varies significantly across different regions, ranging from 1:30,000 to 1:2,000 [2][3][4][5]. PAH catalyzes the hydroxylation of L-phenylalanine (L-Phe), forming L-tyrosine using tetrahydrobiopterin (BH4) as a cofactor. PAH variants lead to a loss in enzyme activity and an increase in serum concentrations of phenylalanine (Phe). Abnormal accumulation of Phe in serum can damage the peripheral and central nervous systems, resulting in mental retardation, seizures, and cerebral palsy to varying degrees if left untreated [6]. The mechanisms of neurotoxicity caused by elevated phenylalanine levels include impairing the synthesis of brain catecholamines, cholesterol, and proteins; extensively disturbing neutral amino acid transportation to the brain; inhibiting pyruvate kinase; dysregulating calcium levels; and inducing mitochondrial dysfunction, oxidative stress, and inflammation [6][7][8]. Therefore, early diagnosis and treatment of PKU are essential in avoiding permanent damage. Based on the severity of the metabolic phenotype, PAHD is classified into three types: classic phenylketonuria (cPKU), mild PKU (mPKU), and mild hyperphenylalaninemia (MHP).
The human PAH gene is located on chromosome 12q23.2 and comprises 13 exons and 12 introns. To date, more than 1,200 different variants have been reported in the PAH gene of patients with PAHD and have been registered in the Phenylalanine Hydroxylase Gene Locus-Specific database (PAHdb; http://www.pahdb.mcgill.ca). The detection of PAH variants and the analysis of the correlation between the genotype and clinical phenotype are extremely valuable for genetic counseling, the selection of the most suitable therapeutic options, and prognosis prediction [9][10][11][12][13]. Although several genotype-based methods for phenotype prediction have been applied for this disease, further clarification on their predictive value is required [14][15][16][17]. In addition, the great regional and ethnic heterogeneity of PAH variants necessitates the establishment of a spectrum of PAH variants in a population-specific manner. Certain studies have reported the characteristics of PAH gene variants in different populations, including the Chinese population [2,3,17-19]. However, the spectra of PAH variants vary among the populations in different provinces of China [2,3,18]. This necessitates the establishment of a PAH gene variant spectrum among the PAHD patients in Fujian Province, Southeastern China.
In this study, we analyzed the PAH variants in a cohort of 93 PAHD patients from Fujian Province using next-generation sequencing (NGS) and Sanger sequencing with the aim of characterizing the distribution of PAH variants in this region. Additionally, we analyzed the genotype and phenotype correlation in patients with PAHD.

Study subjects
A total of 93 children (56 males and 37 females) diagnosed at the Medical Genetic Diagnosis and Therapy Center of Fujian Province Maternal and Child Health Hospital between January 2016 and September 2021 were included in this study. All patients and their parents were from the Chinese Han population.
All patients were identified through a neonatal hyperphenylalaninemia screening program. We applied tandem mass spectrometry to measure phenylalanine concentrations in dried blood samples before treatment was started. All patients had plasma Phe levels of > 120 μmol/L, and Phe:Tyr ratios of > 2. Additionally, a urinary pterin analysis and dihydropteridine reductase activity assay were performed on dried blood spot samples to exclude patients with tetrahydrobiopterin reductase deficiency. The metabolic phenotypes of the patients were classified according to their plasma Phe concentrations before treatment, and the maximum pretreatment values were applied in this study. The patients were diagnosed with cPKU (Phe ≥ 1,200 μmol/L), mPKU (Phe: 360-1,200 μmol/L), or MHP (Phe: 120-360 μmol/L) [20].
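The classification rule above can be written as a small function. The following is a minimal sketch, not the authors' software: the function name is hypothetical, and because the text does not say which class a value of exactly 360 μmol/L falls into, the sketch assumes it is assigned to mPKU.

```python
def classify_metabolic_phenotype(max_pretreatment_phe_umol_l: float) -> str:
    """Classify PAHD by the maximum pretreatment plasma Phe (in umol/L).

    Cut-offs follow the text: cPKU >= 1,200; mPKU 360-1,200; MHP 120-360.
    Assumption: a value of exactly 360 is treated as mPKU.
    """
    phe = max_pretreatment_phe_umol_l
    if phe >= 1200:
        return "cPKU"
    if phe >= 360:
        return "mPKU"
    if phe > 120:
        return "MHP"
    # Values of 120 umol/L or less do not meet the inclusion criterion.
    return "not classified"


# Hypothetical example values, not patient data from this study.
for value in (1500.0, 600.0, 200.0):
    print(value, classify_metabolic_phenotype(value))
```

Keeping the cut-offs in one place makes it easy to change the handling of the boundary values if a different convention is preferred.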

Genotype analysis
Genomic DNA was isolated from peripheral blood using the QIAamp DNA Mini Kit (Qiagen, Germany), following the manufacturer's instructions. NGS for the detection of PAH gene variants was performed by Biosan (Zhejiang, China). The basic edition panel of inherited metabolic diseases, which covers 94 genes including PAH, PTS, GCH1, QDPR, and PCBD1, was used. Briefly, the target region sequences were enriched by multiple probes, and the capture products were then purified. Thereafter, library construction, sequencing, and data analysis were carried out. The detected variants were then verified using Sanger sequencing. To determine the origin of each variant, the parents were screened for the variants detected in their offspring.
The obtained sequences were compared with the wild-type transcript of human PAH (NM_000277.3) to identify potential variants. Variant nomenclature followed the guidelines and recommendations of the Human Genome Variation Society (http://varnomen.hgvs.org/). Gene variants were classified according to the ACMG guidelines (https://clinicalgenome.org/).

De novo variants pedigrees
Paternity testing was subsequently conducted to determine the biological nature of the relationship between the patient and their parents, with the aim of confirming that the de novo variants were not inherited from parents.

Genotype-phenotype prediction
Two different algorithms were used to analyze the correlation between genotype and phenotype. First, based on the in vitro residual activity associated with the PAH variants [14], the gene variants were predicted to cause one of the three phenotypes: cPKU (21.1% ± 7.0%), mPKU (40.2% ± 7.6%), or MHP (52.1% ± 8.5%). Second, the allelic phenotype values (APVs) of the PAH variants were queried in the BIOPKU database (http://www.biopku.org). Phenotypes were predicted using genotypic phenotype values (GPVs), each of which was equal to the higher of the two allelic APVs [15]: (i) GPV = 0-2.7 for cPKU, (ii) GPV = 2.8-6.6 for mPKU, and (iii) GPV = 6.7-10 for MHP. We were unable to predict the phenotype of one variant that did not have an APV score in the BIOPKU database. Finally, frameshift, splice-site, and nonsense variants were defined as null alleles.
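The APV/GPV rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the BIOPKU implementation: the function names are hypothetical, APVs are assumed to be reported to one decimal place (so the gaps between 2.7 and 2.8 and between 6.6 and 6.7 never occur), and a genotype with a missing APV is left unpredicted.

```python
from typing import Optional


def genotypic_phenotype_value(apv_allele_1: Optional[float],
                              apv_allele_2: Optional[float]) -> Optional[float]:
    """Return the GPV, taken as the higher of the two allelic APVs.

    Returns None when either APV is unavailable (assumed handling).
    """
    if apv_allele_1 is None or apv_allele_2 is None:
        return None
    return max(apv_allele_1, apv_allele_2)


def predict_phenotype_from_gpv(gpv: Optional[float]) -> Optional[str]:
    """Map a GPV to a phenotype: 0-2.7 cPKU, 2.8-6.6 mPKU, 6.7-10 MHP."""
    if gpv is None:
        return None
    if not 0.0 <= gpv <= 10.0:
        raise ValueError("GPV must lie between 0 and 10")
    if gpv <= 2.7:
        return "cPKU"
    if gpv <= 6.6:
        return "mPKU"
    return "MHP"


# An allele with APV 9.5 paired with a hypothetical allele of APV 0.
gpv = genotypic_phenotype_value(9.5, 0.0)
print(gpv, predict_phenotype_from_gpv(gpv))
```

Because the GPV takes the higher APV, a single mild allele determines the predicted phenotype regardless of how severe the second allele is.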

Statistical analyses
All categorical data were expressed as proportions. Comparisons between the distribution frequencies of PAH variants in the different subtypes of the disorder were performed using the χ² test. All statistical analyses were implemented using the statistical software SPSS 17.0. A P value < 0.05 was considered to be statistically significant.
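For readers who want to see what such a comparison involves, the sketch below computes a Pearson χ² test of independence for a 2 × 3 table (variant present or absent across the three subtypes). It is a plain-Python illustration and not the SPSS procedure used in the study. The function names are hypothetical, the counts are invented for the example, and the closed-form P value applies only when there are 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail P value of a chi-square statistic with exactly 2 d.f."""
    return math.exp(-statistic / 2.0)


# Hypothetical allele counts: rows are variant present / absent,
# columns are cPKU, mPKU, MHP. These are not data from this study.
counts = [[5, 15, 10],
          [25, 15, 20]]
statistic, dof = chi_square_independence(counts)
p_value = p_value_two_dof(statistic)
print(statistic, dof, p_value < 0.05)
```

When expected counts are small, the χ² approximation becomes unreliable and an exact test is usually preferred.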

Genotype analysis of PAHD patients
The frequencies of variants that occurred in more than 1% of the patients were compared among the three subgroups of PAHD patients' six variants: c.158G > A, p.  Table 3).
Out of the 93 patients, 96.77% (90/93) carried biallelic variants and were either homozygous (n = 3) or compound heterozygous (n = 87). Monoallelic variants were detected in three patients, one of which presented the cPKU phenotype, whereas the other two presented the MHP phenotype.
At the amino acid level, 63 distinct combinations were detected in 90 patients (

Prediction of genotype-phenotype correlations in PAH-deficient patients
A total of 84 patients met the requirements for phenotype prediction based on the APV/GPV system analysis. As shown in Table 4, this analysis accurately predicted 56.25% (9/16) of cPKU, 77.27% (17/22) of mPKU, and

Pedigrees of patients with de novo variants
By screening the corresponding loci containing the variant in the parents, we found one patient carrying a de novo variant. Subsequently, paternity testing confirmed the biological relationship between the patient and parents. Thus, the de novo variant was further confirmed. Table 5.

Discussion
In this study, 93 patients with PAHD that had been identified via national newborn screening in the past 5 years were included.
**Mean activity is defined as the average of the activities of the two alleles, expressed as a percentage of the wild-type enzyme activity.
***The expected phenotypes were predicted from the mean of the two allelic activities. The activity cut-offs were as follows: cPKU, 21.1% ± 7.0%; mPKU, 40.2% ± 7.6%; MHP, 52.1% ± 8.5%. cPKU-mPKU means that the activity was between the cPKU and mPKU values.
relatively high frequencies in mPKU patients. In addition, variant c.158G > A, p.(Arg53His) was relatively more frequent among MHP patients, in line with the results of previous large-sample studies [3,24]. In vitro studies have reported that the residual activity of the c.158G > A, p.(Arg53His)-type PAH enzyme was equivalent to 79% of that of the wild-type [14,25].  [3,24]. Previous studies showed that the detection rate of variant p.(Arg241Cys) in mPKU and MHP patients from southern China was higher than that in other areas [3,28,29]. Interestingly, variant c.721C > T, p.(Arg241Cys) was more frequent in patients with mPKU (32.50%) and MHP (11.21%) than in the cPKU group (3.70%), suggesting that our study may reflect regional and demographic differences to a certain degree. Although Arg241 is localized close to the cofactor binding region, this amino acid is not directly involved in the interaction with the cofactor [28]. Hence, variant c.721C > T, p.(Arg241Cys) causes relatively mild structural changes.
Most of the patients (63/90) included in our study presented a unique genotype, showing that the genotypes in Fujian Province are highly heterogeneous. In addition, 93.54% (87/93) of the patients were compound heterozygotes, a higher proportion than that reported by the BIOPKU database (76%  (3.33%), which is inconsistent with an earlier finding [30]. This may be related to the high proportion of patients with MHP in this study. The high proportion of patients with MHP may also reflect regional and ethnic heterogeneity. Interestingly, two c.721C > T, p.(Arg241Cys) homozygous patients showed an MHP phenotype, which was also found in previous studies [3,21]. In addition, a total of 12 compound heterozygous variants were detected in the 13 patients with cPKU. At present, prenatal diagnosis is increasingly used to prevent the recurrence of PKU in families [31][32][33]. Therefore, these results can provide preliminary and valuable data for the prenatal diagnosis and prevention of cPKU.
Previous studies have attempted to uncover a quantitative correlation between the genotype and phenotype in PAHD patients using a series of different algorithms [15,16,[34][35][36]. APV and GPV have been identified as systems with high sensitivity and specificity for phenotypic prediction [4,15]. In this study, the APV/GPV-based prediction system adequately predicted the actual phenotype, with an overall consistency rate of 85.71% for PAHD and 100% for MHP. The APV of variant c.158G > A, p.(Arg53His) was 9.5, and the predicted phenotype was MHP. Consistent with this, we found that variant c.158G > A, p.(Arg53His) was closely associated with MHP. These findings strongly support the close correlation between the genotypes and phenotypes underlying PAHD using the APV/GPV-based prediction system. By contrast, the prediction system based on the average value of the in vitro residual activity of the two alleles did not perform well, with a prediction accuracy rate of only 65.82% for PAHD and 42.42% for mPKU. Genotypic and phenotypic inconsistencies existed mainly in patients that had compound heterozygous variants. The mechanism underlying these discrepancies requires further clarification. The co-expression of different PAH variants may lead to a residual activity that differs from the predicted activity owing to intermolecular interactions [37]. However, the accuracy of these two prediction systems in predicting PKU needs to be further improved. Therefore, further studies should optimize these systems to accurately predict phenotypes.
Certain studies show that large-scale deletion/duplication variants are related to the pathogenesis of PAHD. In a large cohort of 475 PKU families in Northwest China, 74 cases without two known pathogenic variants were analyzed for large-scale deletion/duplication variants using the multiplex ligation-dependent probe amplification (MLPA) method, and 7 large-scale deletion/duplication variants were found in 25 patients (25/475, 5.26%). In another Chinese population study, large-scale deletion/duplication variants were detected in 48 of 808 PAHD patients, accounting for 5.94%. Among 293 patients with hyperphenylalaninemia in Russia, gross deletions were found in 10 patients who carried only one PAH gene variant. In the PAHD patients from Italy, the detection rate of large-scale deletion/duplication variants was as low as 1.3% (13/759). Interestingly, only one PAH gene variant was detected in each of three patients in this study. These patients had normal pterin profiles and dihydropteridine reductase activity, and no variants were detected in PTS, GCH1, QDPR, and PCBD1. Therefore, large-scale deletion/duplication variants might be involved in the pathogenesis in these patients. Unfortunately, the patients were not further assessed using MLPA analysis.
Interestingly, pedigree analysis identified a novel de novo PAH gene variant in one case. The patient was compound heterozygous for c.[1223G > A];[722G > A]. However, the variant c.722G > A was not found at the corresponding locus in either of the parents. After confirming the biological relationship between the patient and the parents through paternity testing, we considered this to be a true de novo variant. Thus, the identification of de novo variants may enable accurate and appropriate genetic counseling to be provided to the affected families.
In conclusion, we constructed a PAH gene variant spectrum of the PAHD patients of Fujian Province and identified novel variants that broaden the PAH gene variant spectrum. Exploring and clarifying the differences in the frequencies of different PAH gene variants among patients with different sub-phenotypes may elucidate the correlation between compound heterozygosity and phenotype. In addition, we demonstrated that genotype-phenotype prediction using the APV/GPV system resulted in a higher prediction accuracy than when the results of residual enzyme activity in vitro were used, illustrating that APV/GPV prediction could be a suitable tool for genetic counseling of families with PAHD patients.

Conflict of interest
The authors declare that they have no competing interests.

Ethical approval
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Review Committee of Fujian Provincial Maternity and Child Hospital (permission no. 2017-037).

Informed consent
Written informed consent was obtained from the guardians of all participants after providing a detailed description of the purpose of the study.

Consent to publish
All authors read and approved the manuscript.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.