Background

Phenylalanine hydroxylase (PAH) deficiency is an autosomal recessive disorder that results in elevated concentrations of the amino acid phenylalanine (Phe) in the blood [1,2,3,4]. Over 1000 PAH variants exist [5], and depending on the inherited alleles, affected individuals may have very mild to pronounced elevation of Phe [4]. Phenylalanine hydroxylase catalyzes the conversion of Phe into tyrosine and is key to maintaining a stable concentration of Phe in the blood [7]. When PAH activity is decreased, blood Phe concentration increases from the typical mean of 60 μmol/L [3]. In addition, an estimated 1–2% of cases of hyperphenylalaninemia (HPA) are secondary to a deficiency in tetrahydrobiopterin (BH4), a necessary cofactor for PAH and other amino acid-metabolizing enzymes [4, 6]. Mutations in DNAJC12, a member of the heat shock co-chaperone family, have also been reported to result in HPA [8]. If left untreated, the accumulation of Phe can result in profound neurocognitive disability [2]. Early diagnosis and intervention are essential to preserve cognitive function [1, 3].

Treatment guidelines recommend initiation of treatment as early as possible upon diagnosis of PAH deficiency [3]. Treatment options include dietary and pharmaceutical management. Dietary management involves severely restricted intake of Phe (and protein)-rich foods based on each individual’s maximum Phe tolerance [9, 10], in combination with medical foods that supplement the inadequate intake of protein and other essential nutrients caused by the Phe-restricted diet. Approved pharmaceutical treatments for PAH deficiency include pegvaliase and sapropterin. Pegvaliase, a Phe-metabolizing enzyme composed of pegylated recombinant phenylalanine ammonia lyase, is approved for use only in adults (United States) and persons aged 16 years and above (Europe) who have uncontrolled blood Phe (> 600 μmol/L) with current treatment [11, 12]. Sapropterin dihydrochloride, a synthetic form of BH4, is indicated for use in children (> 1 month of age) and adults with BH4-responsive PKU in conjunction with a Phe-restricted diet [2, 13, 14].

Phenylalanine hydroxylase deficiency is classified into mild HPA, mild phenylketonuria (PKU), moderate PKU, and classical PKU based on blood Phe concentration obtained in the neonatal period (Table 1); however, concentrations determined in this period are unlikely to reflect peak untreated levels, as neonates vary in their dietary exposure to Phe before the blood sample is taken, and early treatment often precludes obtaining more definitive Phe concentrations [1].

Table 1 Current classification and treatment guidelines for PAH deficiency

Because of the severe consequences of untreated phenylalanine hydroxylase deficiency, many countries currently perform routine newborn screening for elevated blood Phe concentration [15,16,17]. Methods for measuring Phe have evolved over time with increasing accuracy, from the bacterial inhibition assay (Guthrie test) in 1963 [18] to the current state-of-the-art tandem mass spectrometry [19]. The Guthrie test has been suggested to miss as many as 1 in 25 affected newborns screened at or before 3 days of age [20].

The accumulation of data from newborn screening programs with varied screening methods employed across the world provides an opportunity to evaluate the birth prevalence of HPA and PKU at the regional and global levels. Here, we systematically review the published literature and analyze regional differences in HPA and PKU birth prevalence, overall and for various clinically relevant blood Phe concentration cutoff values used in confirmatory testing.

Methods

The protocol for this literature review was registered with PROSPERO (International prospective register of systematic reviews: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=156377; ID 156377).

Birth prevalence

For the purpose of this review and to ensure consistent methodology in calculation of birth prevalence estimates across studies, birth prevalence was defined as cases identified during newborn screening divided by the number of newborns screened. This method was most frequently described in studies reporting birth prevalence of PAH deficiency from newborn screening programs.
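The definition above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the counts in the usage note are illustrative values, not data from any included study.

```python
def birth_prevalence_per_10000(cases: int, screened: int) -> float:
    """Cases identified at newborn screening divided by newborns screened,
    scaled to 10,000 births (the unit used throughout this review)."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    if not 0 <= cases <= screened:
        raise ValueError("cases must be between 0 and screened")
    return 10_000 * cases / screened


def as_ratio(cases: int, screened: int) -> str:
    """Express the same prevalence in the '1:N' format used by some publications."""
    if cases == 0:
        return "0"
    return f"1:{round(screened / cases)}"
```

For example, 8 confirmed cases among 100,000 screened newborns (hypothetical counts) corresponds to 0.8 per 10,000 births, or 1:12500.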

Literature search

PubMed and Embase were searched using a strategy based on the PICOS (population, intervention, comparison, outcomes, study design) framework (Additional file 1: Table A-1) [21]. The search strategy included terms to identify newborns, prevalence, incidence, newborn screening, Guthrie and other tests, PKU, HPA, and PAH deficiency (Additional file 1: Table A-2 and Table A-3). No language or time limits were implemented. Animal studies, editorials, and commentaries were excluded.

Study selection

Entries retrieved from PubMed and Embase were screened in two steps (Fig. 1): in level 1 screening, two researchers independently reviewed titles and abstracts; in level 2 screening, two researchers independently reviewed full-text articles. Lack of agreement on inclusion was resolved by discussion and consensus within the research team.

Fig. 1

Study selection process. PRISMA chart modeled after Moher et al. [21]. BH4 = tetrahydrobiopterin; PAH = phenylalanine hydroxylase; PKU = phenylketonuria

In level 1 screening (Additional file 1: Table A-4), conference abstracts, studies reporting exclusively on BH4 deficiency but not on PAH deficiency, and studies that focused primarily on assay development and/or validation were excluded. Publications were eligible if the abstract or title indicated that the paper presented original research and contained numeric reports on the birth prevalence of PAH deficiency. Birth prevalence must have been reported on an unselected population (e.g., studies on institutionalized patients were not eligible) and was required to be directly measured (rather than estimated from models). When duplicate records reporting on one study were identified, only one was retained; in this circumstance, records published in English were preferred.

The following additional criteria were applied in level 2 screening (Additional file 1: Table A-5): articles were required to be written in English and birth prevalence was required to be based on confirmed cases. When two or more publications on any given region were identified, both were included if the research had been conducted by different groups, or if both the geography and time frame did not overlap. For reports with geographic and temporal overlap conducted by the same institution, the study covering the largest population was eligible.

Data extraction and quality assessment

Extracted data elements included country and region, dates of data collection, study design, assay method for screening and for case confirmation (when diagnostic methods varied among sites or over time, scoring for the estimate was based on the lowest scoring diagnostic method, per the list in Table 2), diagnosis as reported in the publication (“nominal diagnosis”), Phe concentration used as a positive cutoff value, whether patients with BH4 deficiency were included in the number of cases reported, number of newborns screened, number of cases, and reported birth prevalence. For publications that reported birth prevalence stratified by multiple variables, values for each variable were extracted separately (herein referred to as “estimates”).

Table 2 Quality assessment tool for birth prevalence estimates

Data were extracted by one researcher using a form specifically designed for this study; extracted data were verified by a second researcher. Each estimate was assessed for quality as strong, moderate, or weak in each of five scoring domains (Table 2). The quality assessment tool used in this study was based on existing tools for assessing the quality of studies that report the prevalence of conditions assessed by surveillance [22] or conditions of genetic origin [23].

Meta-analyses

To mitigate errors that may arise from using early, less reliable assays, such as the Guthrie test, only estimates derived from confirmatory diagnostic assays that were assessed as strong in the quality assessment tool (Table 2) were eligible for meta-analysis. Inclusion in the meta-analysis also required that the number of cases and the number of screened newborns were reported. For each region and Phe concentration cutoff value category, at least 2 birth prevalence estimates were required to conduct a meta-analysis. For regions and Phe concentration cutoff value categories with only one published birth prevalence estimate, the single published estimate was used to represent the region (or Phe concentration cutoff value) in the global prevalence estimates. Once the eligible estimates for each planned meta-analysis were identified, estimates with both temporal and geographic overlap were assessed, and the estimate representing the largest geographic coverage or time period was included.
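The eligibility rules in this paragraph can be sketched as a simple filter. This is a sketch under stated assumptions: the `Estimate` record, its field names, and the cutoff category labels are hypothetical, and the step that resolves temporal and geographic overlap is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Estimate:
    region: str
    cutoff_category: str           # hypothetical labels, e.g. "360", "600", "1200"
    confirmation_quality: str      # "strong", "moderate", or "weak" (Table 2 domain)
    cases: Optional[int]           # None when the publication did not report it
    screened: Optional[int]        # None when the publication did not report it


def is_eligible(e: Estimate) -> bool:
    """Strong confirmatory assay score, with both counts reported."""
    return (
        e.confirmation_quality == "strong"
        and e.cases is not None
        and e.screened is not None
    )


def group_for_meta_analysis(
    estimates: List[Estimate],
) -> Tuple[Dict[Tuple[str, str], List[Estimate]], Dict[Tuple[str, str], Estimate]]:
    """Split eligible estimates into strata that can be pooled (two or more
    estimates) and strata represented by a single published estimate."""
    strata: Dict[Tuple[str, str], List[Estimate]] = {}
    for e in estimates:
        if is_eligible(e):
            strata.setdefault((e.region, e.cutoff_category), []).append(e)
    pooled = {k: v for k, v in strata.items() if len(v) >= 2}
    single = {k: v[0] for k, v in strata.items() if len(v) == 1}
    return pooled, single
```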

Meta-analyses were performed to determine aggregated regional birth prevalence (Europe, North America, Middle East/North Africa, Latin America, Southeast Asia, and West Pacific; Additional file 1: Table A-6) and a global birth prevalence. The global birth prevalence was estimated using two approaches. A “regionally weighted” global prevalence was calculated, in which results from each region were weighted by the region’s relative numerical contribution to the total population of the regions for each analysis. For this determination, country-specific population counts were obtained from 2020 United Nations population estimates [24] and were summed within each region to determine regional totals (weights for analyses incorporating results from six regions: Europe, 0.126; Latin America, 0.097; Middle East/North Africa, 0.125; North America, 0.055; Southeast Asia, 0.303; West Pacific, 0.293). A non-regionally weighted global prevalence was also calculated for comparison to other recently published PKU global birth prevalence estimates that were not regionally weighted.
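The regional weighting can be sketched as a population-weighted mean. The weights below are the six-region values stated above; the function name is hypothetical, and renormalizing the weights over the regions present in a given analysis is an assumption about how analyses with fewer regions were handled. The sketch covers the point estimate only, not the confidence interval.

```python
from typing import Dict

# Six-region population weights as reported in the text (they sum to 0.999
# because of rounding, so the function renormalizes them).
REGION_WEIGHTS: Dict[str, float] = {
    "Europe": 0.126,
    "Latin America": 0.097,
    "Middle East/North Africa": 0.125,
    "North America": 0.055,
    "Southeast Asia": 0.303,
    "West Pacific": 0.293,
}


def regionally_weighted_prevalence(
    regional_prevalence: Dict[str, float],
    weights: Dict[str, float] = REGION_WEIGHTS,
) -> float:
    """Weighted mean of regional prevalences, renormalized over the regions
    supplied (assumption: regions without data are simply left out)."""
    missing = set(regional_prevalence) - set(weights)
    if missing:
        raise KeyError(f"no weight for regions: {sorted(missing)}")
    total_weight = sum(weights[r] for r in regional_prevalence)
    weighted_sum = sum(weights[r] * p for r, p in regional_prevalence.items())
    return weighted_sum / total_weight
```

Because Southeast Asia and the West Pacific together carry about 60% of the weight, their regional estimates dominate the weighted global figure regardless of how many newborns the contributing studies screened.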

For both regional and global birth prevalence determinations, birth prevalence was calculated and stratified by three confirmatory Phe concentration cutoff values (360 ± 100 μmol/L, 600 ± 100 μmol/L, 1200 ± 200 μmol/L). When a publication reported birth prevalence by Phe cutoff interval (e.g., separate birth prevalence values for ≥ 360 ± 100 to 600 μmol/L, ≥ 600 ± 100 μmol/L to 1200 μmol/L and ≥ 1200 ± 200 μmol/L), the sum of all values above the cutoff value was used. Finally, an unstratified meta-analysis was conducted, which additionally included estimates from studies in which Phe cutoff values were not reported, to determine overall (regionally weighted and non-regionally weighted) birth prevalence.
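The summation of interval-specific values can be sketched as a cumulative sum from the highest interval downward. The function name is hypothetical and the numbers in the usage note are illustrative only.

```python
from typing import Dict


def prevalence_at_or_above(interval_prevalence: Dict[float, float]) -> Dict[float, float]:
    """Given prevalence reported per Phe interval, keyed by each interval's
    lower bound in umol/L, return for each lower bound the summed prevalence
    of that interval and every higher interval."""
    cumulative: Dict[float, float] = {}
    running_total = 0.0
    for lower_bound in sorted(interval_prevalence, reverse=True):
        running_total += interval_prevalence[lower_bound]
        cumulative[lower_bound] = running_total
    return cumulative
```

With hypothetical interval values of 0.4, 0.3, and 0.2 per 10,000 births for the intervals starting at 360, 600, and 1200 μmol/L, the values attributed to the three cutoffs would be 0.9, 0.5, and 0.2, respectively.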

To provide appropriate weights for meta-analysis, birth prevalence estimates were transformed using the double arcsine method [25]; meta-analysis was conducted using a random-effects model with inverse variance weighting. Transformation and calculations were performed using MetaXL (version 5.3, EpiGear International). Heterogeneity was assessed using the I² statistic [26, 27].
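The following Python sketch illustrates the general technique and is not a reproduction of the MetaXL implementation. It assumes the Freeman–Tukey form of the double arcsine transformation, the DerSimonian–Laird estimator for between-study variance (the text does not name the estimator), and a simple approximate back-transformation; all function names are hypothetical.

```python
import math
from typing import List, Tuple


def double_arcsine(cases: int, n: int) -> Tuple[float, float]:
    """Freeman-Tukey double arcsine transform of cases/n and its variance."""
    t = math.asin(math.sqrt(cases / (n + 1))) + math.asin(math.sqrt((cases + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def random_effects_pool(studies: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pool (cases, screened) pairs on the transformed scale.

    Returns (pooled transformed value, tau^2, I^2 in percent).
    """
    transformed = [double_arcsine(c, n) for c, n in studies]
    ts = [t for t, _ in transformed]
    vs = [v for _, v in transformed]

    # Fixed-effect (inverse variance) mean and Cochran's Q.
    w = [1.0 / v for v in vs]
    sum_w = sum(w)
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum_w
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))
    df = len(studies) - 1

    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * ti for wi, ti in zip(w_re, ts)) / sum(w_re)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, tau2, i2


def approx_back_transform(t: float) -> float:
    """Approximate proportion from a transformed value, using t ~ 2*asin(sqrt(p)).
    This is a simplification, not necessarily the inversion MetaXL applies."""
    return math.sin(t / 2.0) ** 2
```

Multiplying the back-transformed proportion by 10,000 gives a pooled birth prevalence per 10,000 births. Two studies with identical counts yield an I² of 0, matching the interpretation given for near-identical estimates.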

Results

Literature search and review

Searches in PubMed and Embase identified 1112 entries (Fig. 1). Screening of 997 unique PubMed and Embase entries and an additional 28 publications identified from reference lists of screened entries identified 85 publications meeting eligibility criteria, resulting in 238 birth prevalence estimates (Additional file 2).

These 85 publications were published from 1964 [28] to 2019 [29] and reported on data from 1960 [30] to 2018 [29] from 59 countries. Newborn blood or urine samples for screening were taken between the first day of life [31] and age 3–8 weeks [32]; 25 publications (125 birth prevalence estimates) did not report age at screening. Phe concentration used for confirmatory testing ranged from 120 μmol/L [33] to over 2600 μmol/L [34]. Forty-three publications (135 birth prevalence estimates) did not report the cutoff value for confirmatory testing. Nominal diagnoses were inconsistent. For example, classical PKU was defined using confirmatory Phe cutoff values ranging from 726 μmol/L [35] to 1816 μmol/L [36]. Cases with BH4 deficiency were included in 5 publications (6 birth prevalence estimates) and the presence or absence of BH4 deficiency was not reported in 58 publications (186 birth prevalence estimates).

The only domains of the quality assessment tool on which > 50% of the estimates scored strong were statistical methods and study setting/source population. Sixty percent of the estimates scored moderate or weak on precision, and 53% scored moderate or weak on the method for case confirmation (Fig. 2A).

Fig. 2

a–e Quality of evidence assessments of birth prevalence estimates

Meta-analysis results

A total of 112 birth prevalence estimates (54 publications) scored strong in the quality assessment domain diagnostic method used for case confirmation and were therefore potentially eligible for meta-analysis. One publication (18 estimates) with strong scores in the diagnostic method used for case confirmation reported birth prevalence (in the format 1:8000), but did not provide the number of cases or screened newborns [37] and was not deemed eligible. No birth prevalence estimates from the African region were included in the meta-analysis, and the only estimates eligible for inclusion in Southeast Asia were from Thailand.

Birth prevalence estimates ranged from 0 (Estonia [38], Finland [39], and Thailand [40]) to 2.46 per 10,000 births (Macedonia) [41] (Table 3).

Table 3 Birth prevalence estimates scoring strong on diagnostic method for case confirmation (n = 54 publications)

Estimates from 45 publications were included in at least one meta-analysis, and the rest were excluded due to temporal and regional overlap. Meta-analysis results are summarized in Table 4 and Additional file 3: Figures A2–A5. The regionally weighted global birth prevalence of PAH deficiency (N = 44 publications, 1 estimate per publication) was 0.64 (95% confidence interval [CI] 0.53–0.75) per 10,000 births (Table 4; quality assessment results shown in Fig. 2E). The lowest regional birth prevalence was observed in Southeast Asia, with 0.03 cases per 10,000 births (95% CI 0.02–0.05); the highest was observed in the Middle East/North Africa, with 1.18 (95% CI 0.64–1.87) cases per 10,000 births.

Table 4 Meta-analysis of birth prevalence estimates stratified by region and by phenylalanine diagnostic cutoff value

Eleven publications reported birth prevalence estimates (1 estimate per publication) with a confirmatory test Phe concentration cutoff value of 360 ± 100 µmol/L. The regionally weighted global birth prevalence was 0.96 (95% CI 0.50–1.42) per 10,000 births (Table 4 and Fig. 2B). The lowest regional birth prevalence was observed in North America, with 0.49 cases per 10,000 births (95% CI 0.38–0.61), based on two publications that presented very similar results [42, 43], as reflected in the heterogeneity statistic I² value of 0. The highest birth prevalence was observed in the Middle East/North Africa, 1.60 (95% CI 1.06–2.31) per 10,000 births, based on a single estimate [44].

Ten publications (1 estimate each) reported birth prevalence estimates using a confirmatory test Phe concentration cutoff value of 600 ± 100 µmol/L. The regionally weighted global birth prevalence was 0.50 (95% CI 0.37–0.64) per 10,000 births (Table 4 and Fig. 2C) for this cutoff value.

For the 1200 ± 200 µmol/L cutoff value for a Phe concentration confirmatory test, 20 publications (1 estimate each) were eligible and the regionally weighted global birth prevalence was 0.30 (95% CI 0.20–0.40) per 10,000 births (Table 4 and Fig. 2D).

Discussion

The overall meta-analysis conducted in this systematic review provides a regionally weighted global birth prevalence of PAH deficiency of 0.64 (95% CI 0.53–0.75) per 10,000 births. It is important to weight birth prevalence estimates by region so that the global PAH deficiency birth prevalence reflects both the birth prevalence and population size of each region rather than just the inverse variance (primarily driven by the sample size) of the individual studies (as was done for the calculation of non–regionally weighted birth prevalence). The highest regional birth prevalence in the overall analysis was reported in the Middle East/North Africa, where consanguineous marriages are among the most frequent in the world, with frequencies up to 42% in Saudi Arabia [45].

Among estimates with a confirmatory test Phe concentration cutoff value of 360 ± 100 µmol/L, the regionally weighted global birth prevalence was 0.96 (95% CI 0.50–1.42) per 10,000 births. On the basis of recent European and American College of Medical Genetics and Genomics guidelines (Table 1), this would represent the population for which treatment in children is recommended. Based on the single estimate for Middle East/North Africa, the birth prevalence was again highest in this region [44].

In the meta-analyses based on Phe concentration cutoff values of 600 µmol/L and 1200 µmol/L, the regionally weighted global prevalences were 0.50 (95% CI 0.37–0.64) and 0.30 (95% CI 0.20–0.40), respectively, per 10,000 births. Regional variation in the prevalence of PAH deficiency defined by these cutoff values was observed, with higher prevalences in Europe, Latin America, North America, and the Middle East than was observed globally. In a recent analysis of global variations in PAH genotype [46], genotypes associated with classical PKU (Phe ≥ 1200 µmol/L) tended to be the most common in the Middle East.

As might be expected, in this meta-analysis we observed decreasing pooled birth prevalence as confirmatory test Phe cutoff values increased (Table 4). The decreasing prevalence we observed with increasing Phe cutoff values should be interpreted cautiously. Specifically, this finding does not necessarily reflect differences in the relative frequencies of classical, moderate, mild PKU and HPA, but rather the fact that individuals with higher Phe levels are included in the estimates with lower cutoff values (e.g., the pooled prevalence for the 360 µmol/L cutoff value includes individuals that would be diagnosed as having classical and severe PKU per Table 1). This approach was taken to ascertain the birth prevalence of all individuals whose Phe levels were within the treatable range and the impact different confirmatory Phe cutoff thresholds have on PAH deficiency birth prevalence estimates. The confidence intervals for the various Phe cutoff thresholds had substantial overlap, likely due to heterogeneity of estimates from individual studies.

As evidenced by the high I² values, heterogeneity of birth prevalence estimates was generally high, even among estimates stratified by region and Phe concentration cutoff values for case confirmation. Heterogeneity may be partly explained by random variation related to sampling, which is supported by the fact that many included studies were small (35% of the 238 estimates scored weak on precision of the prevalence estimate [Fig. 2]). Other reasons for heterogeneity include variations in age at screening and confirmatory testing, and dietary intake prior to sampling.

We found that data elements that are key to understanding the reported birth prevalence estimates were often missing: 30% of the 238 estimates scored weak on case definition (i.e., failed to provide Phe cutoff values for both screening and for case confirmation), and 66% scored moderate on this domain (failed to report on either screening or confirmatory Phe cutoff values); 11% did not report the study setting/source population or derived the information from personal communications. In addition, 126 of 238 reported birth prevalence estimates (53%) scored moderate or weak in diagnostic method used for case confirmation. Thirteen percent of the 238 estimates lacked information on the time period assessed, 3% on the assay used for screening, and 38% on the assay used for case confirmation. Although the frequency of BH4 deficiency is very low (1–2% of HPA cases) [6], it was not reported or not excluded from the reported birth prevalence estimates in 81% of the 238 birth prevalence estimates included in this review.

Substantial inconsistencies were observed in the nominal diagnoses reported, even in recent publications, with poor or inaccurate distinction between PKU, moderate PKU, classical PKU, and HPA (Additional file 2).

We have not found published papers estimating the global prevalence of PAH deficiency. However, two recently published reviews estimated the global prevalence of PKU. Shoraka et al. [47] identified studies reporting the birth prevalence of classical PKU in newborns and meta-analyzed them by region and overall (non-regionally weighted, with no stratification by case confirmation Phe cutoff value). Hillert et al. [46] used unpublished information from national screening centers and reports identified through a literature search to estimate a global prevalence of PKU in newborns. Table 5 provides a comparison of the birth prevalence estimates from our analysis with the results from the studies by Shoraka et al. [47] and Hillert et al. [46].

Table 5 Comparison of birth prevalence estimates among recent literature reviews

The largest differences between the current study and the study by Shoraka et al. were seen in Europe, the Americas, and the West Pacific regions. The similarity between the overall estimate by Shoraka et al. and the currently reported regionally weighted global birth prevalence is likely largely due to chance, as substantially different inclusion criteria and methodologies were employed in the two studies (Additional file 1: Figure A-1). Shoraka et al. excluded publications considered to have a high risk of bias as assessed using an existing 10-point checklist [48], which has some similar elements to the quality of evidence tool used in this publication. There was no requirement that cases be confirmed. The reported prevalence was described as relating to classical PKU, even though the Phe cutoff for confirmatory tests of the included studies ranged from 1.65 mg/dL (equivalent to 100 µmol/L) to 20 mg/dL (1211 µmol/L).
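The unit equivalences quoted above can be checked with a short conversion sketch. It assumes a molar mass of 165.19 g/mol for phenylalanine, a value not stated in the text; the function name is hypothetical.

```python
PHE_MOLAR_MASS_G_PER_MOL = 165.19  # assumed molar mass of phenylalanine


def mg_dl_to_umol_l(mg_per_dl: float) -> float:
    """Convert a Phe concentration from mg/dL to umol/L."""
    mg_per_l = mg_per_dl * 10.0                      # 1 dL = 0.1 L
    mmol_per_l = mg_per_l / PHE_MOLAR_MASS_G_PER_MOL  # mg divided by mg/mmol
    return mmol_per_l * 1000.0                       # mmol to umol
```

Under this assumption, 1.65 mg/dL converts to about 100 µmol/L and 20 mg/dL to about 1211 µmol/L, in agreement with the values in the text.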

The current study provides a higher estimate of the global birth prevalence of PAH deficiency than Hillert et al. Unfortunately, neither the inclusion and exclusion criteria, nor the method(s) for combining estimates from individual studies, nor the sources are fully described in that paper; the global estimate included data from countries that the study describes as lacking newborn screening programs in parts of Africa, Asia, South America, and the Caribbean [46].

The current findings confirm that regional differences exist in the birth prevalence of PAH deficiency, with higher frequencies of inheritance of this autosomal recessive disease in areas with higher frequencies of consanguineous marriages, as has also been noted by others [46, 47].

Limitations of this study include incomplete reporting of key data elements in many of the included publications. In addition, the precision of the reported prevalence was low for most of the included estimates due to small sample sizes. No articles were identified reporting on the birth prevalence of PAH deficiency in Sub-Saharan Africa and birth prevalence estimates from countries in Southeast Asia were limited, lacking representation of some of the most populous countries in the region such as India. Absence of estimates could be attributed to absence of newborn screening programs for PAH deficiency in specific countries and regions [15], or lack of published estimates from newborn screening programs meeting the inclusion criteria for this review, such as the requirement that the full-text article be written in English. Strengths of this study include the fact that only confirmed cases were included in the qualitative synthesis, and that the meta-analysis only included estimates based on higher quality confirmatory assays. In addition, meta-analyses were undertaken based on clinically relevant diagnostic cutoff values.

Conclusions

In this systematic literature review and meta-analysis, we estimated the regionally weighted global birth prevalence of PAH deficiency to be 0.64 (95% CI 0.53–0.75) per 10,000 births (overall). The estimated regionally weighted global birth prevalence among newborns with Phe level ≥ 360 ± 100 µmol/L at diagnosis, which is the population for whom treatment is recommended, was 0.96 (95% CI 0.50–1.42) per 10,000 births. Substantial regional variation was observed, with an elevated birth prevalence of this autosomal recessive disease in regions with higher frequencies of consanguineous births. Although newborn screening has been widely implemented in much of the world for decades, the precision of the estimates is limited by the unavailability of publications on large population samples. This observation underscores the need for more comprehensive and systematic data collection as well as improved standards for reporting results. Only with more widespread availability of data from newborn screening programs covering large populations will it be possible to obtain robust estimates and truly understand the magnitude of this serious and treatable condition.