Background

Phelan-McDermid syndrome (PMS, OMIM 606232) is a rare genetic condition [1, 2], with 3216 affected individuals worldwide (1739 in the US) engaged with the Phelan-McDermid Syndrome Foundation (PMSF). Clinically, affected individuals present with global developmental delay, intellectual disability (ID), hypotonia, and severe speech impairments [3,4,5,6,7]. Autistic features and autism spectrum disorder (ASD) are common [6, 8], with 63% of cases meeting criteria for ASD [6]. Other frequent findings include seizures, gastrointestinal problems, minor dysmorphic features, gait abnormalities, structural brain abnormalities, and renal malformations [1,2,3,4, 6,7,8,9]. In addition, some individuals develop severe neuropsychiatric illness in adolescence or early adulthood, including bipolar disorder, catatonia, and regression of skills [10, 11].

PMS is caused by deletions of varying sizes in the distal long arm of chromosome 22 affecting the SHANK3 (SH3 and multiple ankyrin repeat domains 3) gene, or by pathogenic sequence variants in SHANK3 [1, 2]. SHANK3 encodes a post-synaptic scaffolding protein that plays a critical role in the development and function of excitatory synapses [12]. Haploinsufficiency of SHANK3 causes the core features of PMS [13,14,15,16]. Individuals with SHANK3 variants or interstitial deletions that exclusively disrupt SHANK3 exhibit developmental delay, moderate to profound ID, motor deficits, severely impaired speech, ASD, hypotonia, seizures, brain abnormalities and minor dysmorphic features [5, 6, 9, 15, 16]. They are also more likely to have psychiatric diagnoses such as bipolar disorder, catatonia, depression, psychotic disorders, and regression [6, 10, 11].

Most reported cases of PMS are caused by 22q13.3 deletions, which can be either terminal or interstitial, and can be associated with ring chromosomes, translocations, or inversions [1, 2, 17]. Deletions range in size from < 10 kb to > 9 Mb, and usually encompass many genes in addition to SHANK3. Previous studies have shown that the size of the deletion is positively associated with hypotonia, developmental delay, speech deficits, dysmorphic features and a number of medical comorbidities, including renal abnormalities and lymphedema [4, 6,7,8,9, 18, 19]. However, most of the studies exploring genotype-phenotype relationships in PMS have used very small samples (reviewed in [6]), with only the two most recent ones analyzing larger cohorts (n = 170 and 210) [6, 9].

Interstitial 22q13 deletions of variable size upstream of SHANK3 have been reported in a small number of individuals [20,21,22,23,24]. These individuals share features common to PMS, including developmental delay, hypotonia, language delay, behavioral problems, dysmorphic features, renal abnormalities, congenital heart defects, and abnormal neuroimaging. These findings suggest that haploinsufficiency of other genes in the 22q13 region besides SHANK3 contributes to the phenotype of PMS patients with larger deletions. Thus far, only two genes have been identified as contributing to the PMS phenotype: TCF20, involved in a syndromic neurodevelopmental disorder with ID and ASD [25, 26], and CELSR1, implicated in lymphedema [27, 28].

To explore the contribution of genes other than SHANK3 to the phenotype of individuals with larger deletions, we utilized the PMS International Registry (PMSIR), a family-driven registry founded by parents of individuals with PMS and launched in 2011 by the PMSF. The international registry includes 1623 of the 3216 patients registered with PMSF. The phenotypic information is annotated by the patients’ families, who are also asked to submit copies of the genetic tests. We hypothesized that although retrospective and not recorded in the context of a clinical evaluation, family-reported data (including genetic test results) obtained from the PMSIR could be used to conduct genotype-phenotype correlation analyses. Our results uncovered phenotypes significantly associated with larger deletions, implicating additional genes besides SHANK3 in determining the phenotypic outcome. These findings have implications not only for PMS but also for those interested in using patient or family-reported data to shed new light on other rare diseases.

Methods

PMS International Registry and PMS Data Network

Funded by the Patient-Centered Outcomes Research Institute (PCORI), the PMS Data Network (PMS_DN) project aims to advance knowledge about PMS and related conditions by integrating patient-reported outcomes, curated genetic test results, and knowledge extracted from clinical records in the centralized i2b2/tranSMART platform to facilitate patient-centered research [29]. Clinical records were not used in this study. The network promotes a culture of transparency, as well as authentic engagement and leadership of families in the network’s vision, governance, and generation of research priorities. The PMS_DN uses data entered into the online PMSIR by the patients’ families. Only families in the registry that provided specific consent to participate in the PMS_DN were included in this study.

The registry comprises 1300 patient-reported outcome items across three distinct questionnaires: a “clinical” questionnaire, based on diagnosed comorbidities, symptoms, tests, and treatments for the entire range of known pathologies and features associated with PMS; a “developmental” questionnaire, focusing on physical, motor, behavioral, cognitive, and social development; and an “adult” questionnaire aimed at patients aged 12 or older, regarding the evolution of manifestations after puberty. Only the first two questionnaires were used in this study. Families can retake the surveys any number of times, allowing for longitudinal analysis and ensuring the most complete and accurate information. The registry is available in four languages: English, Spanish, French and Italian.

In addition to phenotypic data, the registry also contains curated genetic information, including results from karyotypes, fluorescent in situ hybridization (FISH), chromosomal microarrays, and SHANK3 sequencing. The scanned paper reports of genetic test results are uploaded by families onto the platform, and manually curated by a team of trained genetic counselors, who fill in structured fields documenting the genetic abnormalities. The data are then reviewed by an expert in PMS genetics, who also interprets the pathogenicity of the variants.

All phenotypic and genetic data from the patients who consented to participate in the PMS_DN were integrated into the i2b2/tranSMART [30,31,32] data warehouse. In addition, a PMS ontology was created to tag and organize the registry data elements in a controlled vocabulary. The aim was to enable efficient user-selection of various stratifications of registry patients. This required developing an automatic data cleaning and loading pipeline (using the R language) to provide easy integration of new data from the PMSIR into the PMS_DN over time with consistent variable coding. Variables representing clinical items in the survey were categorized as being either “historical” or “evolving”. “Historical” variables are immutable (e.g., birth weight), whereas “evolving” variables change over time (e.g., current weight). When a family retakes a survey in the PMSIR, we considered a change in an “evolving” variable to be an update to the current value, and a change in a “historical” variable to be a correction of a previous mistake. All “evolving” data were stored for each time point, and the most recent entry for “historical” items was considered the most accurate (Additional file 1: Figure S1). This ensures that the most recent record has the most up-to-date “historical” information, while storing the “evolving” parameter histories.
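The handling of retaken surveys described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R pipeline: the function name `consolidate_responses`, the record layout, and the example variable names are all hypothetical.

```python
from collections import defaultdict


def consolidate_responses(submissions, historical_vars):
    """Collapse repeated survey submissions for one patient.

    submissions: list of (timestamp, {variable: value}) tuples
        (hypothetical layout; timestamps must sort chronologically).
    historical_vars: set of variable names treated as "historical".
    Returns (historical, evolving): historical maps each variable to its
    most recent value; evolving maps each variable to its full history.
    """
    historical = {}
    evolving = defaultdict(list)
    for timestamp, answers in sorted(submissions, key=lambda s: s[0]):
        for var, value in answers.items():
            if var in historical_vars:
                # A later answer is treated as a correction of an earlier one.
                historical[var] = value
            else:
                # Every time point is kept for "evolving" variables.
                evolving[var].append((timestamp, value))
    return historical, dict(evolving)
```

For example, if a family reports a birth weight twice, only the later value is retained, whereas both reported current weights are kept with their time points.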

We note that some of the families enrolled in the PMSIR have also participated in previous studies investigating genotype-phenotype relationships in PMS, in the US and in other countries [4,5,6,7,8,9, 14, 16,17,18,19].

Genetic criteria

Only deletions with coordinates determined by chromosomal microarrays were included in the study. When multiple arrays were available, we selected the one with the highest resolution for the analysis. Chromosomal coordinates were transformed to the most recent assembly, GRCh38/hg38, using the LiftOver tool from the UCSC Genome Browser. In three individuals with contiguous deletions that were separated by a gap of less than 5% of their total length, the deletions were merged into a single deletion to prevent underestimation of their effect. The present study focused exclusively on deletions affecting SHANK3; proximal copy number variants in chromosome 22q or in other chromosomes were not included in the analysis. SHANK3 variants were interpreted according to the American College of Medical Genetics and Genomics guidelines [33]. Only variants classified as pathogenic or likely pathogenic were included in the analyses.
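The merging rule for nearly contiguous deletions can be sketched as follows. This is a minimal Python illustration, not the authors' code. It assumes that "total length" means the combined length of the two neighbouring deleted segments, which is one possible reading of the text; the function name and the 0.05 default are introduced here for illustration.

```python
def merge_close_deletions(segments, max_gap_fraction=0.05):
    """Merge deleted segments that are separated by a small gap.

    segments: list of (start, end) coordinates on one chromosome, end > start.
    Two neighbouring segments are merged when the gap between them is smaller
    than max_gap_fraction of their combined deleted length (an assumed
    interpretation of "total length").
    """
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = max(0, start - prev_end)
            combined_length = (prev_end - prev_start) + (end - start)
            if gap < max_gap_fraction * combined_length:
                # Bridge the gap so the deletion is analyzed as one event.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged
```

Under this reading, two 2 Mb segments separated by 50 kb would be merged, whereas two 1 Mb segments separated by 1 Mb would remain distinct.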

In this phenome-wide association study [34], we compared the extent of deleted genetic material at the q terminus of chromosome 22 with the occurrence of each phenotypic feature. To this end, we treated patients with SHANK3 sequence variants as patients with terminal deletions extending from the starting position of SHANK3 to the end of the chromosome. Similarly, interstitial deletions of SHANK3 were extended to the end of the chromosome. Note that the two coding genes distal to SHANK3 do not contribute to the PMS phenotype: ACR, encoding acrosin, a sperm protein, and RABL2B, encoding a small Rab GTPase, are both tolerant to heterozygous loss-of-function variation in the general population (Genome Aggregation Database, gnomAD v2.1.1).
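The convention of treating every variant as a terminal deletion can be sketched as follows. This is a minimal Python illustration, not the authors' implementation. The function name and the variant labels are hypothetical, and the chromosome end and SHANK3 start are passed as parameters rather than hard-coded.

```python
def effective_terminal_deletion(kind, chrom_end, shank3_start, deletion_start=None):
    """Return (start, end, size) of the terminal deletion used for analysis.

    kind: "sequence_variant" or "deletion" (hypothetical labels).
    chrom_end: end coordinate of chromosome 22 in the chosen assembly.
    shank3_start: start coordinate of SHANK3 in the same assembly.
    deletion_start: proximal breakpoint, required when kind == "deletion".
    """
    if kind == "sequence_variant":
        # Sequence variants are modeled as a loss from SHANK3 to the telomere.
        start = shank3_start
    elif kind == "deletion":
        if deletion_start is None:
            raise ValueError("deletion_start is required for deletions")
        # Terminal and interstitial deletions alike are extended to the end.
        start = deletion_start
    else:
        raise ValueError(f"unknown variant kind: {kind!r}")
    return start, chrom_end, chrom_end - start
```

With this convention, deletion size depends only on the proximal breakpoint, so all individuals can be placed on a single scale of deleted material.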

Selection of phenotypic variables

In order to satisfy the technical requirements of the regression models, we selected for analysis physical and behavioral phenotypic features that were present in at least five individuals. To represent phenotypic outcomes, we preselected 454 out of the 1300 total family-reported outcome items available in the registry. The excluded items were those not directly referring to a symptom or a condition, such as “Does the patient have a primary cardiologist?”, “How was the genetic test paid for?”, and “Has the patient had any of the following tests: Manometry testing?”. Questions referring to treatments were also excluded. Overall, 328 phenotypes satisfied our inclusion criteria.
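The minimum-count criterion can be sketched as a simple filter. This is a minimal Python illustration that assumes binary (present/absent) answers with unanswered items marked as `None`; the function name and data layout are hypothetical and do not reflect the registry's actual schema.

```python
def phenotypes_with_enough_cases(answers_by_phenotype, min_cases=5):
    """Keep phenotypes reported as present in at least min_cases individuals.

    answers_by_phenotype: {phenotype: {patient_id: True, False, or None}},
        where None marks an unanswered item (hypothetical layout).
    """
    kept = []
    for phenotype, answers in answers_by_phenotype.items():
        # Count only explicit "present" answers; missing answers are ignored.
        n_present = sum(1 for value in answers.values() if value is True)
        if n_present >= min_cases:
            kept.append(phenotype)
    return kept
```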

Statistical analyses

We built multivariate models that associated each phenotypic outcome with deletion size, adjusting for age and gender. When the outcome was a continuous variable (birth length, weight, head circumference, or Apgar score), we used linear regression. When the outcome was binary (presence or absence of a phenotype), we used logistic regression, a method that has previously been used for deletion analyses in PMS [7, 8, 18]. For ordinal outcomes (severity of phenotype based on a four-point scale from “absent” to “always present,” and age ranges for the acquisition of developmental milestones), we used proportional odds logistic regression [35]. We used the Benjamini-Hochberg procedure to control the false discovery rate (FDR) globally, adjusting the p values for multiple hypothesis testing [36]. Age was compared using the nonparametric Mann-Whitney test, while gender and race were compared using Fisher's exact test. Statistical analyses were conducted using R version 4.0.0. For the proportional odds logistic regression, the “clm” function from the “ordinal” package was used.
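The multiple-testing adjustment can be illustrated with a short sketch of the Benjamini-Hochberg procedure. This is a plain Python illustration, not the authors' R code. It computes adjusted p values in the usual step-up form, in which the i-th smallest of m p values is multiplied by m/i and monotonicity is then enforced from the largest p value downward.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values never
    # increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A phenotype would then be called significant when its adjusted p value falls below the chosen FDR threshold (0.05 in this study).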

Ethics

The project was approved by the Harvard Medical School Institutional Review Board (HMS IRB14-2161). The PMSIR obtained continuing review approval from the Advarra (formerly Chesapeake) IRB as of January 21, 2021 (CR00182785). All the data were treated anonymously, from the integration into i2b2/tranSMART to the analysis.

Results

Study population

As of June 2020, 1132 families (661 in the US) consented to participate in the PMS_DN project. Of these, 754 patients had at least their demographic information entered into the registry, including 749 with phenotypic information from the clinical questionnaire (718 patients) and/or the developmental questionnaire (497 patients), and 491 individuals with genetic test results. We included in the analyses 401 patients with phenotypic information who had either 22q13 deletions with genomic coordinates defined by chromosomal microarray and encompassing SHANK3 (n = 350) or pathogenic or likely pathogenic sequence variants in SHANK3 (n = 51). As expected, there were no differences in the gender distribution of the participants (46.9% males, Table 1). Analysis of the demographic information (Table 1) revealed that the selected population was slightly younger and less ethnically diverse than the larger PMS_DN cohort. The majority of the individuals studied were White, which is likely a consequence of the socio-economic factors involved in the referral of individuals with ID and ASD for genetic testing. The 401 patients included in the analyses were distributed across countries as follows: United States 240, Spain 34, Australia 22, Canada 20, United Kingdom 19, France 15, Brazil 14, Italy 12, Belgium 3, Greece 3, Ireland 3, Chile 2, Russia 2, and one patient each from Colombia, Costa Rica, Israel, Mexico, Netherlands, New Zealand, Norway, Portugal, Romania, South Africa, Sweden, and United Arab Emirates. The completion rates of items in the clinical and developmental questionnaires ranged from 2% to 95%, with a mean completion rate of 69%. A total of 328 phenotypes, each present in at least five individuals, were included in the analyses.

Table 1 Sample characteristics and comparison with individuals excluded because of missing phenotypic or genetic data

Genetic findings

Figure 1 shows the 22q13.2-q13.33 region deleted in PMS and the genetic variants in 401 individuals included in the genotype-phenotype analysis. The variant types are summarized in Table 2. Deletion sizes ranged from 10 kb to 9.1 Mb, with a median deletion size of 3.7 Mb. Most deletions were terminal (97.1%), and were associated with translocations in 12.1% and with ring chromosome 22 in at least 8.5% of cases. The proportion of individuals with ring chromosome 22 is likely underestimated, given that the majority of individuals with deletions (57.1%) were diagnosed with chromosomal microarray and had not had a karyotype performed to exclude the presence of a ring chromosome. Thirteen individuals (3.7%) had mosaic deletions. SHANK3 sequence variants included frameshift, nonsense, splice-site and missense variants; the majority were truncating variants (Table 2). Among the individuals for whom parental testing was available, all deletions and sequence variants were confirmed to have occurred de novo.

Fig. 1

Genetic variants affecting SHANK3 in 401 PMS patients included in the study. Each line represents a patient; deletions are shown in red and SHANK3 sequence variants in blue. For simplicity, sequence variants are represented as overlapping the whole gene. SHANK3 is indicated by an arrow. Chromosomal coordinates are based on the GRCh38 genome build. Constrained genes intolerant to loss-of-function variants as measured by the LOEUF (loss-of-function observed/expected upper bound fraction) metric from gnomAD (v2.1.1) are indicated in orange (LOEUF < 0.2 darker orange, < 0.3 lighter orange; smaller LOEUF indicates higher constraint)

Table 2 SHANK3 genetic variation in 401 individuals with Phelan-McDermid syndrome

Genotype-phenotype analyses

We tested 328 phenotypes in 401 patients in relation to deletion size, using the multivariate models described in the Methods section. We found that 130 phenotypes were significantly associated with deletion size (adjusted p value < 0.05), suggesting that additional genes on chromosome 22q contribute to these clinical manifestations, independently or in association with SHANK3. Tables 3 and 4 show the phenotypes positively and negatively associated with deletion size, respectively. For complete analyses and figures of all the phenotypes, see Additional file 2 (Table S1) and Additional file 3 (Other supplementary figures).

Table 3 Phenotypes significantly and positively associated with deletion size
Table 4 Phenotypes significantly and negatively associated with deletion size

Phenotypes positively associated with deletion size

Major delays in nearly all gross motor milestones were strongly and linearly associated with larger deletion sizes (Fig. 2). These included Walks unassisted, Crawls on hands and knees, Sits when placed, Rolls over back to stomach, Climbs stairs standing up without help, Descends stairs without help, and Holds head up on his/her own (adjusted p values ranged from 9.27e−18 to 4.01e−5, ORs ranged from 2.13 to 5.31; p values and ORs for each phenotype are listed in Table 3). Large deletions were also associated with delays in fine motor skills (Deliberately releases object to a container; Transfers object between hands; Looks at, reaches and grasps distant objects; p values from 1.74e−3 to 4.87e−3, ORs from 1.58 to 1.66), coordination deficits (Difficulties maintaining balance and Coordination difficulties, p values 1.93e−9 and 1.35e−3, ORs 2.25 and 1.56, respectively), and difficulties with motor planning (p = 3.35e−2, OR 1.37). These delays and deficits were accompanied by a range of conditions related to low muscular tone, including floppy baby, neonatal hypotonia, neonatal hyporeflexia, neonatal feeding problems, fatigue, ‘weak muscles’, and poor reflexes (p values from 4.68e−10 to 1.46e−2, ORs from 1.17 to 1.39). Specific feeding difficulties in babies observed more frequently in individuals with larger deletions included Required special feeds, Swallowing problems, Difficulty latching onto the breast, Difficulty latching onto the bottle, Poor suck, and Failure to gain weight normally (p values from 3.51e−6 to 2.50e−2, ORs from 1.14 to 1.29), as well as the need for nasogastric tube (p = 1.59e−2, OR 1.33). Moreover, breathing difficulties at birth, requiring neonatal intensive care, were more common among carriers of larger deletions.

Fig. 2

Association between deletion size and age of acquisition of gross motor milestones. On the left: deletions (in red) and SHANK3 variants (asterisks), ordered by decreasing deletion size/position. The position of SHANK3 is indicated by a vertical line. On the right: status of each patient for developmental gross motor phenotypes, lined up with the respective genetic status. The shades of gray and black represent the age range at which the patients achieved gross motor milestones, with lighter shades indicating younger ages and darker shades indicating older ages. The color white indicates that the answer was either 'not applicable' or 'unsure'

Of note, all renal conditions in the clinical questionnaire, including Vesicoureteral reflux, Dysplastic kidney, Polycystic kidney (defined in the questionnaire as “multiple cysts present in one or both kidneys”, which may include multicystic dysplastic kidney), Hydronephrosis, Recurrent urinary tract infections, and Increased kidney size, were strongly associated with deletion size (p values from 2.32e−7 to 1.18e−2, ORs from 1.30 to 1.67). The results indicate that patients with renal malformations carried larger deletions (Fig. 3). Persistent fever related to recurrent urinary tract infections was also associated with deletion size (p = 7.60e−3, OR 1.53). In addition, primary lymphedema was reported more frequently in individuals with large deletions (p = 4.55e−3, OR 1.35) (Fig. 4). Individuals with large deletions also exhibited swelling of the extremities, likely reflecting the presence of lymphedema (Persistent swelling of the hands and feet and Persistent swelling of the legs; p values 2.94e−6 and 1.05e−4, ORs 1.40 and 1.39, respectively).

Fig. 3

Association between deletion size and renal abnormalities. On the left: deletions (in red) and SHANK3 variants (asterisks), ordered by decreasing deletion size/position. The positions of SHANK3 and CELSR1, recently implicated in renal defects, are indicated by vertical lines. On the right: status of each patient for renal conditions, lined up with the respective genetic status. The color black indicates the presence of the phenotype, gray indicates the absence of the phenotype, and white indicates missing information

Fig. 4

Association between deletion size and lymphatic-related conditions. On the left: deletions (in red) and SHANK3 variants (asterisks), ordered by decreasing deletion size/position. The positions of SHANK3 and CELSR1, recently implicated in lymphedema, are indicated by vertical lines. On the right: status of each patient for lymphatic-related conditions, lined up with the respective genetic status. The color black indicates the presence of the phenotype, gray indicates the absence of the phenotype, and white indicates missing information

In terms of cardiac abnormalities, the presence of a heart murmur (p = 3.43e−5, OR 1.32) and of congenital heart defects (p = 4.91e−3, OR 1.25) was associated with larger deletions. The reported congenital heart defects included patent ductus arteriosus (n = 15), ventricular septal defect (n = 10), ‘hole in the heart’ (n = 9), bicuspid aortic valve (n = 9), persistent left superior vena cava (n = 5), total anomalous pulmonary venous return or connection (n = 4), ‘heart valve problems’ (n = 3), coarctation of the aorta (n = 2), aortic stenosis (n = 2), enlarged aorta (n = 2), and enlarged aortic root (n = 1). Other congenital heart defects in the questionnaire (mitral valve prolapse) and cardiomyopathy were not reported in the study population.

Several dysmorphic features reported in PMS were associated with greater deletion sizes, including large fleshy hands (p = 9.47e−16, OR 1.50), sacral dimple (p = 5.05e−12, OR 1.45), dysplastic toenails (p = 1.56e−6, OR 1.24) or fingernails (p = 2.47e−3, OR 1.17), defective tooth enamel (p = 4.87e−3, OR 1.20), supernumerary teeth (p = 5.29e−3, OR 1.33), overgrowth (Too tall for age; p = 1.16e−3, OR 1.20), bushy eyebrows (p = 8.69e−3, OR 1.15), 2–3 toe syndactyly (p = 2.36e−2, OR 1.15), and diastema (Gap between two front teeth; p = 9.83e−3, OR 1.13). Recurring ingrown toenails were also more common in those with larger deletions (p = 8.65e−6, OR 1.29), probably related to toenail dysplasia. Teeth extractions were also more frequent (p = 8.94e−3, OR 1.18), possibly because of crowding and/or supernumerary teeth.

Abnormal brain magnetic resonance imaging (MRI) (p = 6.72e−5, OR 1.23) and computed tomography (CT) scans (p = 5.83e−4, OR 1.38), as well as the presence of specific abnormalities such as irregular brain ventricles, decreased gray matter, decreased myelination, and arachnoid cysts, were more likely to be found in patients with larger deletions (p values from 7.36e−4 to 2.40e−2, ORs from 1.20 to 1.32). Individuals with larger deletions also had more current seizures (p = 2.07e−2, OR 1.13) and more febrile seizures (p = 4.51e−6, OR 1.30). However, other types of seizures (e.g., absence seizures, tonic-clonic seizures, simple and complex partial seizures) were not associated with deletion size. Significant associations were observed for the need for intravenous antibiotics, chronic bronchitis, > 2 pneumonias in a year, > 2 serious sinus infections in a year, recurrent pneumonia, and placement of ear tubes (p values from 2.86e−4 to 3.98e−2, ORs from 1.16 to 1.31), suggesting that recurrent infections may be more frequent among individuals with larger deletions. Except for a mild association between deletion size and chronic aspiration (p = 1.90e−4, OR 1.26), no gastrointestinal conditions or symptoms were found to be associated with deletion size. The only ocular issue significantly associated with deletion size was strabismus (p = 1.16e−3, OR 1.19).

Phenotypes negatively associated with deletion size

Self-help skills had some of the strongest negative associations with deletion size, including Dresses without assistance, Undresses without assistance, Toilets independently, Night-time toilet trained, Eats independently, and Drinks independently (p values from 2.43e−9 to 2.75e−4, ORs from 0.327 to 0.558; p values and ORs for each phenotype are listed in Table 4). In addition, abilities related to socialization and imagination were more common among individuals with smaller deletions and SHANK3 variants: Awareness of imaginary characters, Engages in pretend play, Plays alongside others alone, Responds to others' emotions, Responds affectionately to caregivers, Hugs/kisses caregivers or dolls, Plays peek-a-boo, and Plays with peers (p values from 1.65e−5 to 3.09e−2, ORs from 0.327 to 0.750). Speech and language abilities (Number of words in a typical sentence, Verbal speech ability, and Ability to follow directions; p values from 9.41e−7 to 4.50e−3, ORs from 0.378 to 0.560) were observed more frequently in individuals with smaller deletions. Similarly, nonverbal communication skills, including Pointing and Gesturing, were used more often by individuals with smaller deletions (p values from 3.19e−3 to 4.87e−3, ORs from 0.855 to 0.860). Some speech disorders, such as Apraxia of speech, Receptive language disorder, and Expressive language disorder, were more frequently diagnosed in patients with smaller deletions (p values from 2.79e−4 to 6.31e−4, ORs from 0.759 to 0.807). Auditory processing disorder was also negatively associated with deletion size (p = 4.87e−3, OR 0.693), whereas no significant association was observed for hearing loss.

A range of psychiatric diagnoses, including autism, pervasive developmental disorder, attention deficit-hyperactivity disorder (ADHD), anxiety disorder, obsessive-compulsive disorder, and pica, were negatively associated with deletion size (p values from 4.51e−6 to 7.35e−3, ORs from 0.643 to 0.820). Attention deficit and hyperactivity symptoms were also less prevalent in individuals with larger deletions; specifically, Symptoms/diagnosis of ADD/ADHD, Acts as if driven by a motor, Runs around or climbs excessively when inappropriate, Appears to act without thinking, Difficulty playing quietly, Leaves seat when remaining seated is expected, and Difficulty sustaining attention in tasks (p values from 8.97e−8 to 4.06e−2, ORs from 0.486 to 0.721). Sleep disturbances (Difficulty falling asleep, Difficulty going back to sleep, Frequent nighttime awakenings, and Short nighttime sleep; p values from 9.40e−6 to 4.58e−2, ORs from 0.823 to 0.908) were all less frequent in individuals with larger deletions.

Interestingly, nearly all the cognitive development milestones surveyed showed significant impairment proportional to the size of the deletion. These included Understands the use of familiar objects, Imitates household activities during play, Uses functional toys appropriately, Acts out recent experiences using gestures or words, Observes actions and imitates them later, and Understands the difference between food and objects (p values from 4.51e−6 to 4.87e−3, ORs from 0.786 to 0.864). Accordingly, regression in cognitive skills was observed less frequently in individuals with larger deletions (p = 1.99e−2, OR 0.862). Furthermore, Apgar scores at 1 and 5 min were significantly lower in individuals with larger deletions (p values 1.31e−2 and 1.35e−3, ORs 0.870 and 0.901, respectively), indicating an adverse impact on newborn health. Gestational age and birth weight were not associated with deletion size.

Finally, there were no associations with any endocrine (including hypothyroidism, too tall or too short for age, or puberty milestones), orthopedic (except scoliosis and torticollis), nose and throat, vision (except strabismus), or allergy-related conditions (Additional file 2: Table S1). Responses to sounds, movement, touch, oral input, or temperature variations were not associated with deletion size. Similarly, tolerance to pain was not associated with deletion size.

Genotype-phenotype associations in individuals with sequence variants or Class I or Class II deletions

Individuals with PMS can be classified as having SHANK3 sequence variants; small Class I deletions, including SHANK3 only or SHANK3 with ARSA and/or ACR and RABL2B; or larger Class II deletions, which include all other deletions [6]. The ARSA, ACR and RABL2B genes are not sensitive to haploinsufficiency and are therefore not expected to contribute to the phenotype of PMS. Comparison of individuals with sequence variants or Class I deletions versus those with Class II deletions, shown in Table 5, confirmed our findings based on the analysis of deletion size. Kidney abnormalities, lymphedema, congenital heart disease, abnormal brain imaging, hypotonia, feeding difficulties, and certain dysmorphic features were more common in individuals with Class II deletions than in those with sequence variants or Class I deletions. Individuals with sequence variants and Class I deletions were more likely to have verbal speech, self-help skills, and psychiatric diagnoses, including ASD, ADHD, depression, bipolar disorder and anxiety. Due to the nature of the parent-reported data used in this study, the frequencies in Table 5 are not directly comparable to those reported in studies based on clinical assessment or direct review of medical records [6]. In particular, the total number of individuals considered for each characteristic listed in the table includes the cases with a "No" response, but this does not necessarily mean that the clinical feature was analyzed in all individuals. There were also many unanswered questions, raising the possibility that some parents may have prioritized questions about their children's problems. For these reasons, we opted not to perform statistical analyses on these data.
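The three-way grouping described above can be expressed as a small classification rule. This is a minimal Python sketch, not the authors' code. The function name and the input format are hypothetical, and it assumes that only the gene symbols named in the text are needed to distinguish Class I from Class II deletions.

```python
# Genes that may accompany SHANK3 in a Class I deletion, as listed in the text.
CLASS_I_OPTIONAL_GENES = {"ARSA", "ACR", "RABL2B"}


def classify_variant(kind, deleted_genes=()):
    """Assign a variant to one of the three groups compared in Table 5.

    kind: "sequence_variant" or "deletion" (hypothetical labels).
    deleted_genes: gene symbols overlapped by the deletion.
    """
    if kind == "sequence_variant":
        return "sequence variant"
    genes = set(deleted_genes)
    if "SHANK3" not in genes:
        raise ValueError("deletion does not affect SHANK3")
    # Class I: SHANK3 alone, or SHANK3 plus only ARSA, ACR and/or RABL2B.
    if genes - {"SHANK3"} <= CLASS_I_OPTIONAL_GENES:
        return "Class I"
    # Class II: every other deletion affecting SHANK3.
    return "Class II"
```

For instance, a deletion overlapping SHANK3, ACR and RABL2B would be labeled Class I, while any deletion that also removes a gene outside that short list would be labeled Class II.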

Table 5 Clinical features of individuals with SHANK3 sequence variants, Class I deletions and Class II deletions

Discussion

We demonstrated the feasibility of using registry data based on family-reported outcome questionnaires to conduct a multiphenotype-genotype association study. We further replicated findings from previous association studies, confirmed recent findings regarding renal malformations, lymphedema, and gross motor development, and suggested novel associations with congenital heart defects, neuroimaging abnormalities and recurrent infections. This is the largest study of its kind in PMS and the first to use the PMSIR. Access to the PMSIR allowed us to recruit the largest sample size (401 patients) for a genotype-phenotype association analysis in PMS, with deletion breakpoints identified by microarray or SHANK3 sequence variants. The sample sizes of previous studies of individuals with PMS were considerably smaller: 29 [19, 37], 30 [38], 32 [4], 51 [14], 67 [39], 70 [7], 71 [18], 98 [8], 170 [6], and 210 [9]. Most previous studies focused on the analysis of deletions, with only the two most recent and largest studies including a significant proportion of SHANK3 variants, between 10 and 20% [6, 9], compared to 12.7% in the current study.

The data provided by this family-driven registry differ from those obtained from medical records. Patient-driven data are rich in information that only caregivers and families can provide, and because the information is provided by the parents themselves, as a proxy for the patients, this approach allows for greater volume and specificity of registry items in many domains. Such a repository would be very difficult and costly to amass through a conventional academic institution-led prospective study. In addition, individuals with PMS can exhibit challenging or hyperactive behaviors, so it may be difficult for parents to complete extensive surveys during interviews with their physicians, and information collected in that setting is subject to recall bias. At home, parents have more time and flexibility to complete the surveys, can more easily correct the data they have entered, which makes the data more accurate, and are more likely to answer items thoroughly and to update the surveys, as they are able to refer to the child’s medical records when answering questions. We assert that the reliability of the registry information is demonstrated by the consistency of the results throughout different analyses and with previous findings. Overall, the results based on a larger sample size demonstrate the potential of analyses using data from the PMSIR to pinpoint specific phenotype associations in this syndrome.

Larger deletion sizes are consistently and proportionally associated with more severe gross motor development delays, as well as phenotypes related to low muscle tone in the early stages of development and feeding difficulties. Some of these findings have been reported previously [6,7,8, 18]. However, previous studies only reported “age at walking” [6,7,8] or “age at walking and crawling” [18], and therefore did not capture positive associations in a consistent manner for each gross motor milestone. Some of these features have also been reported in individuals with interstitial deletions not affecting SHANK3 [22,23,24] and are also present to a lesser extent and severity in PMS patients with SHANK3 sequence variants [6, 16]. The fact that these tone-related conditions were observed even in individuals with the smallest deletions, and that the severity of the features increased in proportion to the deletion size, suggests a cumulative role of multiple genes on the long arm of chromosome 22, including SHANK3.

Renal malformations, reported in 15–40% of PMS patients with deletions [4, 6, 9], are strongly associated with larger deletion sizes, with the distribution of deletion sizes hinting at a potentially causative gene (or genes) on chromosome 22q. This association has been reported in four previous studies, which analyzed all renal and urogenital abnormalities, regardless of the specific condition, in smaller samples [4, 6, 9, 19]. This finding indicates that renal malformations in PMS, which are absent (or extremely rare) in patients with SHANK3 sequence variants [6, 9, 16] but are found in some individuals with interstitial deletions not involving SHANK3 [22,23,24], are related to genes other than SHANK3 and have incomplete penetrance.

Our results also confirm previous studies showing an association of deletion size with large fleshy hands, dysplastic toenails and sacral dimple [7, 8, 18, 38], and an increased prevalence of lymphedema and swelling of the extremities [4, 9, 19]. Larger deletions were significantly associated with heart conditions, including heart murmur and congenital heart defects; each was present in 11% (45/401) of the patients. Cardiac abnormalities had been associated with larger deletions in a small study of 30 individuals with ring chromosome 22 [38], but this finding had not been replicated in other cohorts with larger samples. Similarly, recurrent ear infections were correlated with deletion size in a study of 51 individuals with 22q13 deletion syndrome [14], but not in subsequent studies with larger sample sizes. Here, we showed that larger deletions were associated with recurrent ear and sinus infections, chronic bronchitis, recurrent pneumonia, and the need for intravenous antibiotics. These new associations were identified by analyzing a much larger dataset compared to previous studies.

Additionally, larger deletion sizes were associated with abnormalities in brain MRI and CT scans, including abnormal ventricles, decreased gray matter, and arachnoid cysts, which are well documented in individuals with PMS [4, 40] but had not been previously associated with deletion size. No other neurological morphological abnormalities or symptoms were associated with larger deletion sizes, except for current seizures and febrile seizures. Specific subtypes of seizures (e.g., absence seizures, tonic‒clonic seizures, simple and complex partial seizures) were not associated with deletion size. Analyses also revealed that individuals with small (Class I) deletions or SHANK3 sequence variants had fewer febrile seizures compared to those with Class II deletions (Table 5), whereas no difference was observed for non-febrile seizures. The difference between febrile and non-febrile seizures is difficult to explain and needs to be confirmed in future studies.

We found that individuals with larger deletions were less likely to be diagnosed with ADHD and ASD, or to present with behavioral features of these disorders. As previously suggested [7], this finding might reflect the severity of the disability due to the larger deletion size “masking” autistic or ADHD manifestations. The negative association between deletion size and ADHD and hyperactivity symptoms might further be linked to the severity of low muscle tone and its association with larger deletion sizes. Social, cognitive and communication development, representing the depth and breadth of intellectual disability, were inversely associated with deletion size. The abilities most significantly impacted were those related to understanding social conventions and interacting with others, whereas individual abilities and anxiety were evenly distributed throughout the sample.

Deletion size has previously been used as a proxy for the number of deleted genes, which is linked to the severity of the genetic condition [4]. Although our findings indicate the involvement of other genes upstream of SHANK3, our approach cannot determine their identity. The proportional association of deletion size with developmental delays and severity of cognitive or social disabilities suggests a cumulative effect of multiple gene alterations. Specifically, the consistency of associations between deletion size and gross motor milestone delays, as opposed to other features such as endocrine or sensory issues, indicates that a number of haploinsufficient genes spanning the terminal 9 Mb of the long arm of chromosome 22 affect gross motor functions and muscle tone when deleted. For those phenotypes not associated with larger deletion sizes, we suggest that some may not actually be associated with PMS. This would apply to phenotypes with a prevalence in our cohort similar to that observed in the general population, as is the case for the majority of phenotypes (e.g., painful urination, night terrors, or enuresis). Phenotypes present in nearly all affected individuals could be part of the core features of PMS caused by defects in SHANK3. Haploinsufficiency of SHANK3 resulting from sequence variants is sufficient to cause intellectual disability, ASD or autistic features, severe speech deficits, hypotonia, motor skill deficits, regression, seizures, brain abnormalities, mild dysmorphic features, and feeding and gastrointestinal problems [5, 6, 9, 16]. The present data suggest that in addition to the role of SHANK3 in these phenotypes, the loss of other genes in individuals with larger deletions can also contribute to some of these manifestations. To date, other than SHANK3, only two genes with autosomal dominant inheritance and a loss-of-function mechanism have been identified in the interval deleted in PMS. TCF20, located in the 22q13.2 region and deleted in individuals with the largest terminal deletions (> 8.6 Mb), is implicated in a novel syndrome characterized by motor and language delay, ID, ASD, hypotonia, and variable dysmorphic features [25, 26]. Large terminal deletions involving TCF20 are rare, suggesting that there are other genes contributing to the phenotype that remain to be identified. CELSR1, located in the 22q13.31 region and deleted in individuals with deletions > 4.4 Mb, has recently been implicated in lymphedema [27, 28]. Interestingly, of the 22 individuals with lymphedema included in this study, 20 (91%) have a > 4.4 Mb deletion including CELSR1, lending further support to this gene playing a role in the lymphedema observed in individuals with PMS. The report of lymphedema in two individuals without loss of CELSR1 is unexpected and needs to be confirmed. More recently, CELSR1 loss of function was also associated with renal defects, including ureteropelvic junction obstruction and renal atrophy [41]. This finding suggests that loss of CELSR1 may be responsible for at least some of the kidney abnormalities in individuals with PMS with larger deletions, although loss of a more distal gene (or genes) may also increase the risk of renal defects in those with deletions that do not include CELSR1 (see Fig. 3).
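The size thresholds quoted above can be turned into a small lookup, sketched here in Python. This is only an illustration and not part of the study's analysis pipeline: the function and variable names are hypothetical, the thresholds are the approximate values given in the text (> 4.4 Mb for CELSR1, > 8.6 Mb for TCF20), and the logic assumes a simple terminal deletion. Interstitial deletions or rearranged chromosomes would need breakpoint coordinates instead of a size.

```python
# Approximate terminal-deletion sizes (in Mb) above which each gene is lost,
# as quoted in the text. Hypothetical helper for illustration only.
GENE_THRESHOLDS_MB = {"CELSR1": 4.4, "TCF20": 8.6}


def genes_lost(terminal_deletion_mb: float) -> list[str]:
    """Return the listed genes expected to be lost by a terminal deletion of this size."""
    return [
        gene
        for gene, threshold in GENE_THRESHOLDS_MB.items()
        if terminal_deletion_mb > threshold
    ]


def percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


print(genes_lost(0.1))   # []
print(genes_lost(5.0))   # ['CELSR1']
print(genes_lost(9.0))   # ['CELSR1', 'TCF20']
print(percent(20, 22))   # 91, the share of lymphedema cases with CELSR1 deleted
```

Because terminal deletions are nested, any deletion that removes TCF20 in this sketch also removes CELSR1, which is one reason the contributions of individual genes are hard to separate from deletion size alone.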

Limitations

Several limitations warrant consideration. One caveat of the current analyses is that, when used as a predictor of binary outcomes, such as the occurrence of a clearly defined somatic phenotype, deletion size can only act as a proxy for the contribution of chromosomal regions of interest and hint at potential locations of the genes involved in specific phenotypes. Comparisons between individual genes could be used to narrow down the range of genetic contributors to phenotypes. However, since the majority of genetic abnormalities studied are contiguous deletions, there is a strong correlation among the deleted genes that would require the development of specific models. We also note that not all participants diagnosed with PMS caused by 22q13.33 deletions were tested with a genome-wide array; some were only tested for copy number variants on chromosome 22q12.3-terminus using a custom microarray in a research setting [8, 18], and karyotypes were not available in many cases. Because this information was missing for a significant proportion of individuals, we did not take into account the possible contribution of additional chromosomal aberrations to the phenotype. Another limitation stems from the reporting of data by families, with the potential for error, particularly concerning complex medical terms. In the future, novel associations, such as those with congenital heart defects in the present study, would ideally be validated by asking families reporting these defects to provide relevant medical records. Looking ahead, a third source of data is available, based on natural language processing of the patients’ clinical notes, encoded as Unified Medical Language System concepts. We expect that this will provide a range of phenotypes that were not captured by the registry questionnaires and enable further detailed analyses. Finally, it is worth noting that due to missing genetic and/or phenotypic data, we were unable to analyze three-quarters of the individuals in the registry.

Conclusions

In this study, we presented a method for the curation and integration of data from a patient-driven registry and applied it to the PMSIR. We used this data source to demonstrate that deletion size is associated with the severity of delays in gross motor development and a spectrum of generalized muscle tone impairments. Furthermore, we corroborated previous studies indicating that larger deletion sizes are associated with renal defects, lymphedema, large fleshy hands, dysplastic toenails, sacral dimple, and more severe impairments in verbal ability and socio-cognitive development. We also showed novel associations, including a higher prevalence of congenital heart defects, neuroimaging abnormalities, and infections among individuals with larger deletions. Overall, the results demonstrated the feasibility of utilizing this data source to perform genotype-phenotype analyses. Our method can serve as a reference for future research projects related to this and similar data sources.