Introduction

Newborn screening (NBS), an important and successful public health program, refers to the specific examination of inherited and congenital diseases that seriously threaten the health of newborns in the neonatal period [1, 2]. NBS aims to improve long-term clinical outcomes by providing interventions for the early diagnosis and treatment of these diseases before the onset of symptoms in affected newborns [3]. Since the start of NBS in 1961, new methods have been continuously introduced into NBS, including the bacterial inhibition test for phenylketonuria (PKU) screening [4], the enzyme activity test for galactosemia screening [5], and the radioimmunoassay for congenital hypothyroidism (CH) screening [6]. With the application of tandem mass spectrometry (MS/MS) in the 1990s, it was possible to screen multiple inherited metabolic diseases (IMDs) in a single assay, greatly expanding the screened diseases of NBS [7]. NBS has been widely recognized as an important measure to reduce the morbidity and mortality of neonatal diseases.

At present, MS/MS and other biochemical methodologies are the main screening methods for neonatal IMDs in China [8]. By measuring the levels of amino acids, succinylacetone and acylcarnitines in neonatal dried blood spots (DBS), MS/MS can screen dozens of IMDs in a single assay, including disorders of amino acid, fatty acid oxidation and organic acid metabolism [9, 10]. However, current screening technologies have limitations, including the limited number of diseases screened, missed detection of newborns whose biochemical changes are variable at the time of screening, difficulty in interpreting results, and the possibility of false-negative and false-positive screening results [11, 12].

Next-generation sequencing (NGS) is a high-throughput parallel sequencing technology that can analyze the sequences of millions of DNA molecules simultaneously at much lower cost and higher speed than Sanger sequencing [13]. Since its introduction, NGS has been quickly and widely adopted in both research and clinical applications. NGS makes it possible to analyze the whole human genome (whole genome sequencing, WGS) or the coding regions of all genes (whole exome sequencing, WES) at an affordable cost [14]. NGS is now widely used in the screening of neonatal genetic disorders [15]. NGS could expand the screening of genetic diseases and facilitate the early detection of genetic defects [16]. Furthermore, the application of NGS in NBS could clarify the origin and types of variation underlying genetic disorders at the molecular level, provide a basis for genetic counseling, and improve the clinical outcomes of children [17]. In the USA, the Newborn Sequencing in Genomic Medicine and Public Health (NSIGHT) consortium, funded by the National Institutes of Health (NIH), was established [18]. The BabySeq project, a part of the NSIGHT program, was a pilot randomized clinical trial based on WES, which aimed to explore the utility of NGS in genetic screening in healthy and sick newborns and to compare the clinical impacts of NGS and routine neonatal screening [19, 20]. A recent study from the BabySeq project reported NGS results for 159 newborns, covering risk of childhood-onset disease, carrier status, risk of actionable adult-onset disease, and pharmacogenomics [21]. However, NGS application in NBS is still in its infancy, and in most current applications NGS is used to sequence children with positive or suspected biochemical screening results [22]. In China, a few studies have explored genetic screening in newborns for hearing loss and other neonatal diseases [23, 24]. Nevertheless, our understanding and experience of implementing newborn genetic screening for multiple diseases are limited.

In the current study, a large-scale, multicenter prospective analysis was conducted to screen multiple genetic diseases from DBS profiles of 21,442 neonates with a customized newborn genetic screening (NBGS) panel, which has been used in a previous retrospective study [25]. A total of 75 neonatal inborn disorders and 135 genes were carefully selected to be analyzed by the NBGS panel, and the regional incidences and carrier frequencies of selected congenital diseases of these newborns in different regions of China were explored. The screening methods for major genes and pathogenic variants of genetic disorders reported in the current research could improve the detection range of NBGS and contribute to genetic counseling and clinical communication.

Methods

Study subjects

A total of 21,442 newborn samples were randomly collected from November 2020 to November 2021. The samples were collected by 12 hospitals from 6 regions, including 1907 samples from Maternal and Child Health Care Hospital of Shandong Province (SDH), 1990 samples from Jinan Maternal and Child Health Care Hospital (JNH), 2050 samples from Maternal and Child Health Care Hospital of Shanxi Province (SXH), 904 samples from Changzhi Maternal and Child Health Care Hospital (CZH), 2060 samples from Ningbo Women and Children’s Hospital (NBH), 1999 samples from Shenzhen Hospital Affiliated to University of Chinese Academy of Sciences (SZH), 1789 samples from Maternal and Child Health Care Hospital of Xinjiang Uygur Autonomous Region (XJH), 1837 samples from Maternal and Child Health Care Hospital of Ningxia Hui Autonomous Region (NXH), 2019 samples from Changsha Maternal and Child Health Care Hospital (CSH), 1874 samples from Chongqing Maternal and Child Health Care Hospital (CQH), 2013 samples from Guiyang Maternal and Child Health Care Hospital (GYH), and 1000 samples from Maternal and Child Health Care Hospital of Yunnan Province (YNH). The inclusion criteria were as follows: (1) the neonate had undergone or would undergo MS/MS screening; (2) the neonate was a Chinese singleton newborn; (3) the parents were in good health, with no history of serious acute or chronic illness and no known genetic disease, and (4) follow-up could be completed to the end of the project. The exclusion criteria were as follows: (1) the inclusion criteria were not met; (2) the parents were not Chinese; (3) the infant was older than 28 days; (4) the newborn was from a multiple pregnancy; (5) a dried blood spot with a diameter greater than 8 mm could not be obtained, and (6) the pregnancy was assisted (including in vitro fertilization and embryo transfer (IVF-ET) and intracytoplasmic sperm injection (ICSI)) or the newborn was born after preimplantation genetic screening. All the parents of the 21,442 newborns signed informed consent forms. This study was approved by the institutional review boards of all of the above hospitals, and the procedures were in accordance with the seventh revision of the Helsinki Declaration (2013).

Study design

The 21,442 newborn samples were all subjected to NBGS and conventional NBS (C-NBS). For NBGS, dried blood spots (4 × 3.2 mm) from the 21,442 samples were screened using an NBGS panel, which includes 1189 amplicons covering 2527 known variants of 135 genes associated with 75 neonatal genetic diseases [25]. For C-NBS, G6PDD screening was performed with the GSP® Neonatal G6PD fluoroimmunoassay kit (PerkinElmer, Finland), time-resolved fluoroimmunoassay (TRFIA) was used to measure thyroid-stimulating hormone (TSH) for CH screening with a GSP® Neonatal hTSH kit (PerkinElmer), and MS/MS was used to screen for IMDs. For clinical profiling, birth weight (in grams) and gestational age (GA, in weeks) were collected for all enrolled newborns. The list of 75 disorders and genes included in the NBGS panel is shown in Table S1.

Genetic screening and bioinformatic analysis

Dried blood spots collected from neonates were used to extract genomic DNA via a nucleic acid automatic extraction system (Bioer, China). NGS libraries were generated by amplifying targeted regions with an ultra multiplex PCR system based on the SLIMamp (StemLoop Inhibition Mediated amplification) method [26]. The quality of the libraries was assessed by Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). High-throughput sequencing was carried out using an Illumina NextSeq 500 according to the manufacturer’s protocol.

For base calling and raw data generation, bcl2fastq (Illumina) was used to process the raw image files. Low-quality sequencing reads were subsequently excluded, and the remaining reads were aligned to the NCBI human reference genome (hg19/GRCh37). The minor allele frequencies (MAFs) of the known variants were obtained from the 1000 Genomes Project, dbSNP and gnomAD. Public and commercial databases, such as OMIM, ClinVar and the Human Gene Mutation Database, were used for variant annotation. Bioinformatic tools, including SIFT, PolyPhen-2, MutationTaster and PROVEAN, were used for variant interpretation. The descriptions of these online bioinformatic tools and databases are shown in Table S2.

In the present study, the pathogenicity of each variant was evaluated manually according to the American College of Medical Genetics and Genomics (ACMG) variant interpretation guidelines and the updates published by ClinGen. Variants were classified into five categories: pathogenic (P), likely pathogenic (LP), variant of uncertain significance (VUS), likely benign (LB) and benign (B). The panel we used included 4 mitochondrial diseases and 131 monogenic diseases, of which the monogenic diseases were divided into three groups: (1) dominantly inherited diseases: pathogenic or likely pathogenic (P/LP) variants in genes; (2) recessively inherited diseases: biallelic P/LP variants in genes, and (3) X-linked diseases with recessive or dominant inheritance.

Statistical analysis

In the present study, the observational indicators were the positive rate and the carrier frequency of each target gene in a region, defined as the number of positive cases and the number of carriers of that gene, respectively, divided by the number of newborns enrolled in that region. Likewise, the positive MAF and the carrier MAF of a high-frequency variant in a region were defined as the number of positive cases and the number of carriers of that variant, respectively, divided by the total number of newborns tested in that region. Data were statistically analyzed using SPSS 19.0 (IBM, USA). Differences in a single index among multiple regions were assessed using the chi-square test. P < 0.05 was considered statistically significant.
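As an illustration of the regional comparison described above, the following minimal sketch computes a Pearson chi-square statistic by hand for a 2 × k table (positive versus not positive, by region). It is not the SPSS procedure used in the study. The function name `chi_square_statistic` is hypothetical, and the counts are the G6PD-positive counts that the article quotes for four regions (South, Southwest, North and Northwest China); the other two regions are omitted because their counts are not stated as integers.

```python
from typing import Sequence

# Critical value of the chi-square distribution for df = 3 at alpha = 0.05.
CRITICAL_DF3_ALPHA_005 = 7.815


def chi_square_statistic(positives: Sequence[int], totals: Sequence[int]) -> float:
    """Pearson chi-square statistic for a 2 x k table (positive vs. not positive)."""
    overall = sum(positives) / sum(totals)  # pooled positive proportion
    stat = 0.0
    for pos, n in zip(positives, totals):
        expected_pos = n * overall
        expected_neg = n * (1 - overall)
        stat += (pos - expected_pos) ** 2 / expected_pos
        stat += ((n - pos) - expected_neg) ** 2 / expected_neg
    return stat


# Illustrative input: G6PD-positive newborns / enrolled newborns in
# South, Southwest, North and Northwest China, as quoted in the article.
positives = [43, 47, 3, 2]
totals = [1999, 4887, 6851, 3626]

stat = chi_square_statistic(positives, totals)
# Four regions give (2 - 1) * (4 - 1) = 3 degrees of freedom.
significant = stat > CRITICAL_DF3_ALPHA_005
print(f"chi-square = {stat:.1f}, significant at 0.05: {significant}")
```

Comparing the statistic with a tabulated critical value avoids any dependency beyond the standard library; a statistics package would report the exact P value instead.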

Results

Distribution of the screening population and positive detection/carrier frequencies

The screening population included 21,442 newborns, who were divided into six groups according to the region of the enrolling hospital: North China (n = 6851; SDH, JNH, SXH and CZH), Northwest China (n = 3626; XJH and NXH), East China (n = 2060; NBH), Central China (n = 2019; CSH), Southwest China (n = 4887; CQH, GYH and YNH), and South China (n = 1999; SZH). Positive detection was defined according to the following standards: autosomal recessive (AR), ≥ 2 variants; autosomal dominant (AD), ≥ 1 variant; X-linked recessive (XLR), ≥ 1 variant in males or ≥ 2 variants in females; X-linked dominant (XLD), ≥ 1 variant. A carrier was defined according to the following standards: AR, 1 variant; XLR, 1 variant in a female. The overall positive detection rates of NBGS in each region ranged from 0.1% to 0.38% (G6PD variants excluded), the lowest in South China and the highest in North China (Table 1). There was no significant difference among regions in the overall carrier frequency of a single pathogenic variant.
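The positive and carrier definitions above amount to a small decision rule, sketched below. The function name `classify` and its string labels are hypothetical and not part of the study's pipeline. The rule only counts variants per gene; it does not consider phase, so two variants on the same chromosome would still be counted as positive until family verification.

```python
def classify(inheritance: str, n_variants: int, sex: str = "") -> str:
    """Return 'positive', 'carrier' or 'negative' for one gene in one newborn.

    inheritance: 'AR', 'AD', 'XLR' or 'XLD'
    n_variants:  number of P/LP variants detected in the gene
    sex:         'M' or 'F' (only needed for XLR genes)
    """
    mode = inheritance.upper()
    if n_variants <= 0:
        return "negative"
    if mode == "AR":
        # AR: >= 2 variants is positive, exactly 1 is a carrier.
        return "positive" if n_variants >= 2 else "carrier"
    if mode in ("AD", "XLD"):
        # AD and XLD: a single variant is already positive.
        return "positive"
    if mode == "XLR":
        if sex.upper() == "M":
            return "positive"  # hemizygous male
        if sex.upper() == "F":
            return "positive" if n_variants >= 2 else "carrier"
        raise ValueError("sex must be 'M' or 'F' for XLR genes")
    raise ValueError(f"unknown inheritance mode: {inheritance!r}")


print(classify("AR", 1))        # carrier
print(classify("XLR", 1, "M"))  # positive
```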

Table 1 The overall prevalence and carrier frequencies in different regions (G6PD excluded)

Regional features of positive detection rates

Several regional features of positive detection rates were observed. When a hemizygous variant in an X-linked gene, or biallelic variants in an autosomal recessive gene, were detected, the subject was considered a positive case. In the whole cohort, the top 6 genes with the most positive cases were glucose-6-phosphate dehydrogenase (G6PD) (50.37 in 10,000), phenylalanine hydroxylase (PAH) (9.79 in 10,000), gap junction beta 2 (GJB2) (3.26 in 10,000), dual oxidase 2 (DUOX2) (2.80 in 10,000), solute carrier family 22 member 5 (SLC22A5) (2.33 in 10,000), and solute carrier family 26 member 4 (SLC26A4) (1.40 in 10,000) (Fig. 1a). The X-linked incomplete dominant G6PD variations were quite common in South China but relatively rare in North China (Table 2), whereas PAH variations were most commonly identified in North China (Table S3). Overall positive detection rates were similar in Southwest and Northwest China and were not detected in Central China and South China. The positive rates of the other target genes, which showed large geographical differences, are shown in Fig. 1b.

Fig. 1 Distribution of gene variation positive rates by subgroups. a The fractions of the top 6 common gene variations in each geographical subgroup. b The positive rates of the remaining 10 most common gene variations in each subgroup, expressed per 10,000

Table 2 The positive detection rates of G6PD gene variants in different regions. a The detection rates are given per 10,000

Fig. 2 Correlation of genotype and biochemical indicators. a The difference in thyroid-stimulating hormone (TSH) between carriers of DUOX2 variants and non-carriers; b The difference in C0 (free carnitine) and CIT (citrulline) between carriers of SLC22A5 and SLC25A13 variants and non-carriers; c The difference in C5 (C5 acylcarnitine) between carriers of ACADSB variants and non-carriers

A total of 11 different G6PD pathogenic variants were observed in the current study, and their positive rates are presented in Table 2. Among them, c.1388G > A, c.1376G > T, c.95A > G, and c.1024C > T were the four most frequently observed pathogenic variants. The frequency of G6PD pathogenic variants varied among regions (P < 0.001). A total of 108 newborns positive for G6PD variants were detected by NBGS, giving a positive rate of 0.50% (108/21,442). Five of them were females (4 compound heterozygotes and 1 homozygote), and the rest were males. On G6PD enzyme activity testing, 94.44% (102/108) of these cases were confirmed as positive, 3.70% (4/108) had normal results, and 1.85% (2/108) had no feedback. The frequency of G6PD pathogenic variants showed a decreasing trend from south to north in China: it was highest in South China (146.74 in 10,000), followed by Southwest China (68.21 in 10,000), Central China (29.72 in 10,000), East China (16.18 in 10,000) and Northwest China (3.68 in 10,000), and lowest in North China (2.92 in 10,000).
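The confirmation breakdown for the 108 NBGS-positive G6PD cases can be reproduced with a few lines of arithmetic. This is a sketch only; the helper name `percentages` and the dictionary keys are hypothetical. The last value, the confirmed share among newborns with an enzyme result, is derived here from the stated counts and is not a figure reported in the article.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert category counts to percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Outcome of the G6PD enzyme activity test for the 108 NBGS-positive newborns.
g6pd_followup = {"confirmed": 102, "normal": 4, "no_feedback": 2}

shares = percentages(g6pd_followup)
print(shares)

# Derived illustration: confirmed share among the newborns who actually had
# an enzyme activity result (the two without feedback are excluded).
with_result = g6pd_followup["confirmed"] + g6pd_followup["normal"]
confirmed_among_tested = g6pd_followup["confirmed"] / with_result
print(f"{100 * confirmed_among_tested:.2f}% of {with_result} tested")
```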

All PAH pathogenic variants detected and their positive rates are presented in Table S3. The distribution of PKU among the Chinese population showed geographical differences (P < 0.001). North China had the broadest spectrum, with 18 distinct PAH variants. Of the 21 NBGS-positive cases, 52.38% (11/21) were confirmed by additional biochemical analysis. In two cases, family verification showed that the variants were located on the same chromosome, so these newborns were considered carriers. Eight cases were lost to follow-up. c.158G > A was the most prevalent variant (MAF: 1.17 in 10,000). To date, all compound heterozygotes carrying c.158G > A have had normal clinical phenotypes.

Distribution of frequent pathogenic gene and variant carrier frequencies in different regions

The carrier frequencies of the 10 most frequently carried pathogenic genes are presented in Table 3. Among them, DUOX2, PAH, GJB2, ATPase copper transporting beta (ATP7B) and SLC26A4 were the five genes with the highest carrier frequencies. Except for GJB2, the carrier frequencies of these genes differed significantly among regions. Seventy percent of the high-frequency carrier genes corresponded to high-frequency carrier variants.

Table 3 The distribution of frequent pathogenic gene carrier frequencies in different regions

The carrier frequencies of the 10 most frequently carried pathogenic variants are presented in Table 4. Among them, DUOX2 c.1588A > T, GJB2 c.235del, SLC26A4 c.919-2A > G, SLC22A5 c.1400C > G, and solute carrier family 25 member 13 (SLC25A13) c.852_855del were the five variants with the highest carrier frequencies. The carrier frequencies of DUOX2 c.1588A > T, SLC26A4 c.919-2A > G, SLC22A5 c.1400C > G, SLC25A13 c.852_855del, DUOX2 c.3329G > A, and acyl-CoA dehydrogenase short/branched chain (ACADSB) c.1165A > G differed significantly among regions.

Table 4 The distribution of frequent pathogenic variants carrier frequencies in different regions

Findings pertaining to monogenic-disease risk

In addition to G6PDD and PKU, other monogenic diseases were found in the preliminary NBGS screening, and the results are shown in Table 5. We found seven GJB2 variant-positive cases (3.26 in 10,000), all of which were verified by Sanger sequencing and clinically confirmed. Six positive cases with DUOX2 variants (2.80 in 10,000) were found, all of which were confirmed by Sanger family verification and clinical diagnosis, except for one case that had no available follow-up data. All five positive cases with SLC22A5 variants (2.33 in 10,000) were excluded as negative by Sanger family verification, and no clinical follow-up data were available. Among the three positive cases with SLC26A4 variants (1.40 in 10,000), two were confirmed by Sanger sequencing and clinical evaluation. One patient displayed a bilateral enlarged vestibular aqueduct on inner ear MRI, and the other patient had no follow-up data available. Two positive cases with SLC25A13 variants (0.93 in 10,000) were identified, one of which was ruled out as negative by Sanger sequencing, and the other was confirmed clinically. Furthermore, three positive cases with ATP7B variants (1.40 in 10,000) were identified, two of which were confirmed by Sanger sequencing and clinical evaluation, whereas no follow-up data were available for the other case. For the three positive cases with SMN1 exon 7 deletions (1.40 in 10,000), the copy numbers of SMN2 were further analyzed and found to be two, three and four, respectively. The baby with two copies of SMN2 was admitted to the ICU with a lung infection. In addition, two positive cases with alpha glucosidase (GAA) variants (0.93 in 10,000) were confirmed by Sanger sequencing. One case with MMACHC variants and one with HBB variants were identified, both of which were confirmed by Sanger sequencing and additional clinical evaluation. Moreover, the two positive cases with ATP binding cassette subfamily D member 1 (ABCD1) variants (0.93 in 10,000), the two positive cases with coagulation factor IX (F9) variants (0.93 in 10,000), and the one positive case with an iduronate 2-sulfatase (IDS) variant (0.47 in 10,000) were all confirmed by Sanger sequencing.
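The "in 10,000" figures quoted in this section follow directly from the case counts and the cohort size of 21,442. The short sketch below reproduces them; the function name `per_10000` is hypothetical, and the dictionary simply restates the case counts given in the text.

```python
COHORT_SIZE = 21_442  # newborns screened with the NBGS panel


def per_10000(cases: int, enrolled: int = COHORT_SIZE) -> float:
    """Positive rate expressed as cases per 10,000 newborns, rounded to 2 decimals."""
    return round(cases / enrolled * 10_000, 2)


# NBGS-positive case counts as stated in the text.
positive_cases = {
    "G6PD": 108, "PAH": 21, "GJB2": 7, "DUOX2": 6, "SLC22A5": 5,
    "SLC26A4": 3, "ATP7B": 3, "SMN1": 3, "SLC25A13": 2, "GAA": 2,
    "ABCD1": 2, "F9": 2, "IDS": 1,
}

rates = {gene: per_10000(n) for gene, n in positive_cases.items()}
for gene, rate in rates.items():
    print(f"{gene}: {rate:.2f} in 10,000")
```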

Table 5 Findings pertaining to monogenic-disease risk. Inh inheritance, AD autosomal dominant, AR autosomal recessive, XLR X-linked recessive, Hom homozygote, Het heterozygote, Comp het compound heterozygote, Hemi hemizygote, M male, F female, TSH thyroid-stimulating hormone, C0 free carnitine, C2/C3/C5 C2/C3/C5 acylcarnitine, GAA acid α-glucosidase. a The ratio unit is 1 in 10,000

Correlation in biochemical indicators between carriers and non-carriers

In this study, East China, Northwest China, and Southwest China were selected for the analysis of biochemical indicators. From the 10 most frequently carried pathogenic variants, DUOX2 c.1588A > T, SLC22A5 c.1400C > G, SLC25A13 c.852_855del, DUOX2 c.3329G > A, DUOXA2 c.738C > G, ACADSB c.1165A > G, and ACADSB c.655G > A were selected, corresponding to the biochemical indicators thyroid-stimulating hormone (TSH) (Fig. 2a), free carnitine (C0) and citrulline (CIT) (Fig. 2b), and C5 acylcarnitine (C5) (Fig. 2c). The deafness-related genes GJB2 and SLC26A4 and the non-C-NBS gene ATP7B were excluded. Only samples carrying a single variant were included in the biochemical indicator analysis. There were no significant differences in birth weight or gestational age between variant carriers and non-carriers in the three regions (Table S4). Although the indices were all within the normal range, the C0 level of SLC22A5 c.1400C > G carriers was significantly lower than that of non-carriers (control group), while the C5 level of ACADSB c.1165A > G carriers was significantly higher than that of controls.

Discussion

Newborn disease screening is one of the important measures in the three-level prevention of birth defects, and it can prevent serious, life-threatening health problems through early intervention [27]. At present, the NC NEXUS [28, 29] and BabySeq [30, 31] projects on newborn screening by NGS have been carried out in several places in the United States. These and other studies evaluated genetic screening methodologically by applying WES and WGS in retrospective cohort analyses [28, 30, 32]. Compared with NC NEXUS and BabySeq, the technical approach adopted in this study was easier to operate, shortened the reporting cycle, and reduced the difficulty of report interpretation. Another study [32] found that NGS technology can be used as a supplement to C-NBS, reducing the false-positive rate of screening results, resolving inconclusive results from C-NBS, and identifying pathogenic variant loci in affected individuals.

In this study, a large-scale, multicenter prospective analysis of 21,442 neonates was conducted by applying an NGS panel covering 135 genes associated with 75 neonatal inborn disorders. The study was performed using simple-to-operate and customizable multiplex PCR amplicon sequencing technology [33]. We present the positive and carrier frequencies of gene variations in different regions, illustrating the regional features in China. In our study, from these 21,442 infants, pathogenic variations were detected in 5700 infants. Among the 5700 infants, 168 cases were positive, and 5532 were pathogenic gene carriers. The 168 (0.78%) positive cases were detected by NBGS (Fig. 3), of which 164 cases were verified by Sanger family verification, and 4 were lost to follow-up. Among the 164 Sanger family verification cases, 7 were excluded as carriers because two pathogenic variants were located on the same chromosome. In addition, there were 149 clinical follow-up cases, of which 135 were confirmed, 7 had normal clinical phenotypes, and 7 were undetected. Among them, 3 cases with DUOX2 variants and 1 with SLC25A13 variants were normal in the initial clinical screening. The variants were detected by NBGS, and the clinical diagnosis was confirmed after recall examination.

Fig. 3 Summary of positively identified neonates and related genes in this study. G6PD glucose-6-phosphate dehydrogenase, PAH phenylalanine hydroxylase, GJB2 gap junction beta 2, DUOX2 dual oxidase 2, SLC22A5 solute carrier family 22 member 5, SLC26A4 solute carrier family 26 member 4, GAA acid α-glucosidase, ATP7B ATPase copper transporting beta, ABCD1 ATP binding cassette subfamily D member 1, IDS iduronate 2-sulfatase, F9 coagulation factor IX, HBB hemoglobin beta-chain, SMN1 survival of motor neuron 1

There was no significant difference among regions in the overall prevalence of the 75 diseases detected by the NBGS panel. However, individual diseases with higher incidences, such as G6PDD and PKU, differed significantly among regions [34, 35]. The prevalence of G6PDD was highest in South China (2.15%, 43/1,999) and Southwest China (0.96%, 47/4,887) and lowest in North China (0.04%, 3/6,851) and Northwest China (0.06%, 2/3,626), showing significant geographical differences consistent with existing research [34]. The prevalence of PKU detected by NBGS was 0.10% (21/21,442), with the highest prevalence in northern China. In addition, variants in three deafness-related genes, GJB3 (gap junction protein beta 3), MT-RNR1 (mitochondrially encoded 12S rRNA), and MTTL1 (mitochondrially encoded tRNA leucine 1), which were excluded from the positive reports because of controversial genotype–phenotype correlations, also had relatively high prevalences of 0.34% (72 in 21,442), 0.23% (49 in 21,442) and 0.10% (22 in 21,442), respectively. GJB3 variants are believed to cause delayed-onset deafness [24, 36]. MT-RNR1 and MTTL1 variants lead to mitochondrial hearing loss, which has variable penetrance and severity, even within families [37].

We presented the carrier frequencies of the 10 most frequently carried pathogenic genes and variants. Seventy percent of the high-frequency carrier genes corresponded to high-frequency carrier variants. Eighty percent of the high-frequency carrier genes and 60% of the high-frequency carrier variants showed clear regional differences. Although there was no significant difference in birth weight or gestational age, and although the biochemical indicators of SLC22A5 c.1400C > G and ACADSB c.1165A > G carriers were within the normal range, these indicators differed significantly from those of non-carriers. This suggests that variation at these two sites has some influence on enzyme activity. In addition, we found 13 cases of chromosomal abnormalities by multiplex PCR, of which 3 were recalled and confirmed: 2 with Klinefelter syndrome (XXY) and 1 with XX male syndrome.

In this study, we explored whether the prevalence of diseases has significant regional characteristics, which provides a theoretical basis for screening diseases in different regions. However, our data represented the neonatal genetic disease prevalence of only 12 representative hospitals in the 6 regions. The positive detection rates of the panel were estimated from variant carrier frequencies. This total was composed of monogenic diseases (41.55 in 10,000), consisting of autosomal dominant (11.36 in 10,000) and X-linked recessive disorders (30.18 in 10,000). Genetic screening could substantially reduce medical costs. Multiplex PCR technology also has the advantages of a short reporting time, easy genetic interpretation, and low cost, at only about one-fifth of the cost of WES.

In summary, we evaluated the incidence and carrier frequencies of 75 neonatal inborn disorders involving 135 genes in 21,442 newborns from different regions of China using an NBGS panel. We found that the positive detection rates and carrier frequencies of neonatal inborn disorders differed significantly among regions. These findings suggest that NBGS is a potential strategy for NBS and can serve as a supplemental tool for C-NBS methods. In addition, our data provide a theoretical basis for screening neonatal inborn disorders in different regions.