Genetic Architecture of Childhood Kidney and Urological Diseases in China

Kidney disease manifests in a wide variety of phenotypes, many of which have an important hereditary component. To delineate the genotypic and phenotypic spectrum of pediatric nephropathy, a multicenter registration system is being implemented based on the Chinese Children Genetic Kidney Disease Database (CCGKDD). In this study, patients with kidney and urological diseases were recruited from 2014 to 2020. Genetic analysis was conducted using exome sequencing for families with multiple affected individuals with nephropathy or with clinical suspicion of a genetic kidney disease owing to early onset or extrarenal features. The genetic diagnosis was confirmed in 883 of 2256 (39.1%) patients from 23 provinces in China. Phenotypic profiles showed that the primary diagnosis included steroid-resistant nephrotic syndrome (SRNS, 23.5%), glomerulonephritis (GN, 32.2%), congenital anomalies of the kidney and urinary tract (CAKUT, 21.2%), cystic renal disease (3.9%), renal calcinosis/stone (3.6%), tubulopathy (9.7%), and chronic kidney disease of unknown etiology (CKDu, 5.8%). Pathogenic variants underlying 105 monogenic disorders were identified. Ten distinct genomic disorders caused by pathogenic copy number variants (CNVs) were identified in 11 patients. The diagnostic yield differed by subgroup, and was highest in those with cystic renal disease (66.3%), followed by tubulopathy (58.4%), GN (57.7%), CKDu (43.5%), SRNS (29.2%), renal calcinosis/stone (29.3%) and CAKUT (8.6%). Reverse phenotyping permitted correct identification in 40 cases with clinical reassessment and unexpected genetic conditions. We present the results of the largest cohort of children with kidney disease in China in which diagnostic exome sequencing was performed. Our data demonstrate the utility of family-based exome sequencing, and indicate that the combined analysis of genotype and phenotype based on the national patient registry is pivotal to the genetic diagnosis of kidney disease.
Supplementary Information The online version contains supplementary material available at 10.1007/s43657-021-00014-1.

Box 1. Variant filtering strategy for identifying the potential pathogenic variants in genes known to cause kidney disease
i. Keep rare variants with a minor allele frequency (MAF) <1% in healthy control cohorts (dbSNP147, https://www.ncbi.nlm.nih.gov/projects/SNP).
ii. Keep non-synonymous variants and intronic variants located within splice sites.
iii. Apply the known-gene approach by selecting all variants detected in known kidney disease genes.17
iv. Rank the remaining variants by their predicted likelihood of being deleterious to the function of the encoded protein using PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2), SIFT (http://sift.jcvi.org/) and MutationTaster (http://www.mutationtaster.org).
v. Review the literature and consult the referring physician to determine whether the detected mutation matches the phenotype.
vi. Cross-reference with the ACMG guidelines to determine whether the variant is pathogenic, likely pathogenic or a variant of uncertain significance.
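The first four steps of Box 1 can be sketched as a simple filter-then-rank routine. This is a minimal illustration, not the pipeline used in the study: the `Variant` fields, the consequence labels, and the `deleterious_votes` count (how many of the three predictors call a variant damaging) are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical consequence labels standing in for "non-synonymous or
# splice-site"; real annotation tools use their own vocabularies.
KEPT_CONSEQUENCES = {
    "missense", "nonsense", "frameshift",
    "start_lost", "stop_lost", "splice_site",
}


@dataclass
class Variant:
    gene: str
    consequence: str
    maf: float              # minor allele frequency in controls (0.0 if unseen)
    deleterious_votes: int  # assumed: predictors (0 to 3) calling it damaging


def filter_variants(variants, known_genes, maf_cutoff=0.01):
    """Apply steps i-iii of Box 1, then rank survivors as in step iv."""
    kept = [
        v for v in variants
        if v.maf < maf_cutoff                      # step i: rare variants only
        and v.consequence in KEPT_CONSEQUENCES     # step ii: coding or splice
        and v.gene in known_genes                  # step iii: known genes
    ]
    # Step iv: variants flagged by more predictors come first.
    return sorted(kept, key=lambda v: v.deleterious_votes, reverse=True)
```

Steps v and vi (literature review, discussion with the referring physician, and ACMG classification) rely on expert judgement and are deliberately left out of the sketch.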

Box 2. Variant analysis criteria
Autosomal recessive variants
Disease-causing variants in recessive genes were considered if two alleles were found in the same individual that fulfilled at least one of the following criteria:
i) Truncating allele (stop, abrogation of start or stop, obligatory splice site, or frameshift); OR
ii) Missense mutation if a minimum of 4 of 5 of the following criteria were met:
• Continuously conserved at least among vertebrates (or beyond)
• Previously reported as disease causing or functional evidence implicating causality
• Loss of function in human allele is supported by functional data
• Non-segregation: if the allele did not segregate with the affected status in the family; or, if an unaffected family member carries the allele, consider incomplete penetrance and variable expressivity
Discussion of genotype-phenotype correlation in a panel of nephro-geneticists, followed by review of the clinical phenotype with the referring physician
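The recessive rule in Box 2 can be written as a small decision function. The sketch below is illustrative only: the `Allele` class and the allele-type labels are hypothetical, and the number of missense criteria met is passed in as a plain count because assessing each criterion requires expert review.

```python
from dataclasses import dataclass

# Hypothetical labels for the truncating allele types named in Box 2.
TRUNCATING = {
    "stop_gained", "start_lost", "stop_lost",
    "obligatory_splice", "frameshift",
}


@dataclass
class Allele:
    kind: str              # e.g. "frameshift" or "missense"
    criteria_met: int = 0  # missense criteria satisfied (0 to 5), judged manually


def allele_qualifies(allele, min_criteria=4):
    """Truncating alleles qualify outright; missense needs >= min_criteria."""
    if allele.kind in TRUNCATING:
        return True
    return allele.kind == "missense" and allele.criteria_met >= min_criteria


def recessive_call(alleles):
    """True when at least two alleles in one individual qualify.

    A homozygous variant is represented as two entries in `alleles`.
    """
    return sum(allele_qualifies(a) for a in alleles) >= 2
```

For example, `recessive_call([Allele("frameshift"), Allele("missense", 4)])` evaluates to `True`, whereas the same pair with only three missense criteria met does not qualify.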

Quality control of sequencing data
Quality control (QC) was performed at multiple stages of the analysis pipeline, including pre-cleaning, post-cleaning, post-alignment, and post-variant-calling. The average sequencing depth of the targeted regions was 98.3×, with on average 95% of the targeted bases sequenced at least 20 times.
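The two coverage figures quoted above (mean depth and the share of targeted bases covered at least 20 times) can be computed from per-base depths as follows. This is a minimal sketch; `coverage_summary` and its input format (one integer depth per targeted base) are assumptions made for illustration, not the study's QC tooling.

```python
def coverage_summary(depths, min_depth=20):
    """Return (mean depth, fraction of bases with depth >= min_depth).

    `depths` is assumed to hold one read depth per targeted base.
    """
    if not depths:
        raise ValueError("no targeted bases supplied")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)
```

In practice the per-base depths would come from an alignment-level depth report rather than an in-memory list.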

Qualitative data verification
Data quality was evaluated by relevance, completeness and accuracy.18 Quantitatively, all data in the CCGKDD (Chinese Children Genetic Kidney Disease Database, www.ccgkdd.com.cn) were entered into a computer by research assistants. Combined data entered into a single cell (e.g. dates as day/month/year) were divided so that each piece of data was entered into a separate cell. In total, 38 different data categories were analyzed (Table 2). Dates, numerical data and categorical data were entered as they were, and illegible data were given a new code. Incorrectly coded and inappropriate data were identified. Inappropriate data were those that fell outside the presumed range (e.g. '13' for month). Nominal data (name and residence) and numbers (references and birth certificate) were newly coded as legible or illegible, and residence was further categorized by local research assistants as recognized or not recognized. Completeness was evaluated by counting cells filled with data.
Complete data that were illegible, incorrectly coded, inappropriate or unrecognized were counted to assess accuracy. The data categories in the registers were qualitatively compared with those in the quarterly workload reports to determine whether these categories were relevant to diagnosis and management needs. The quarterly workload reports were presented to the council members of the "Internet Plus" Nephrology Alliance of the National Center for Children's Care, and the annual reports on data quality were presented on the website.
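The quantitative checks described above (splitting combined date cells, flagging out-of-range values, and counting filled cells) can be sketched for a single date column. The function names are hypothetical, and defining accuracy as the share of filled cells without a flaw is an assumption made for this example.

```python
def split_date(cell):
    """Split a 'day/month/year' cell into integers; None if illegible."""
    parts = [p.strip() for p in cell.split("/")]
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        return None
    day, month, year = (int(p) for p in parts)
    return day, month, year


def is_appropriate(day, month, year):
    """Range check only, e.g. a month of 13 is flagged as inappropriate."""
    return 1 <= month <= 12 and 1 <= day <= 31


def assess_column(cells):
    """Return (completeness, accuracy) for one column of date cells."""
    filled = [c for c in cells if c and c.strip()]
    flawed = 0
    for cell in filled:
        parsed = split_date(cell)
        if parsed is None or not is_appropriate(*parsed):
            flawed += 1
    completeness = len(filled) / len(cells) if cells else 0.0
    accuracy = 1 - flawed / len(filled) if filled else 0.0
    return completeness, accuracy
```

Recognition of residence names and legibility coding depend on local knowledge and manual review, so they are not modelled here.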

• Counting the total number of deliveries and comparing this with the number in a quarterly workload report (accuracy)
• Counting the complete and incomplete data (completeness)
• Counting the complete data which were illegible, wrongly coded, inappropriate and unrecognised (accuracy)

Qualitative verification
• Comparing the data categories with those in a quarterly workload report to evaluate whether the data collected satisfied management information needs (relevance)
• Examining the instructions, with input from the network management working group, to evaluate their influence on data accuracy and incompleteness