Introduction

MECP2 Duplication Syndrome, MDS (MIM# 300260), is a neurogenetic developmental disorder stemming from increased copies of the MECP2 gene. The frequency of MDS has not been studied comprehensively. A recent study from Australia reported the prevalence as 0.65/100,000 live births [1]. The most common features include hypotonia, recurrent respiratory infections, developmental delay, epilepsy, and gastrointestinal and nutritional problems.

Currently, the management of MDS is symptomatic. However, preclinical studies using antisense oligonucleotides (ASOs) demonstrated robust phenotype recovery in mouse models [2, 3]. Since disease-modifying treatments targeting the root problem are within reach, validated outcome measures for use in clinical and translational studies are needed. Toward this goal, we surveyed the caregivers of individuals with MDS to explore the most bothersome complaints [4]. Gastrointestinal symptoms, especially constipation, were among the most bothersome problems and should be considered as primary outcome measures in future interventional studies.

Gastrointestinal problems are highly prevalent in MDS and allelic Rett syndrome (RTT, MIM 312750, caused by deletions or loss of function mutations of MECP2). To explore gastrointestinal health issues in MECP2-related disorders, we generated the Gastrointestinal Health Questionnaire. The reliability and validity studies for RTT were conducted and published separately [5].

In the current study, we aimed to develop a parent-oriented, reliable, and valid Gastrointestinal Health Scale (GHS) specific to MDS that could be utilized as an outcome measure in clinical assessments and interventional studies.

Material and methods

Gastrointestinal health questionnaire development and delivery

The study protocol was reviewed and approved by the Institutional Review Board (IRB) at Baylor College of Medicine with IRB approval number H-46176. We created an online registry portal (https://mds.nrihub.org) that complies with the Health Insurance Portability and Accountability Act (HIPAA). This portal serves as a secure platform for conducting cross-sectional survey studies. All registrants were required to upload the genetic report confirming the molecular diagnosis of MDS. Our survey was promoted through the social media channels of family-based organizations. All participants provided written consent for registration in the portal, participation in the surveys, and publication of the results.

Gastrointestinal problems are common in MECP2-related disorders including MDS and Rett syndrome. The senior author (K.J.M.) developed the Gastrointestinal Health Questionnaire (GHQ) through caregiver interviews and national surveys over the past two decades, with multiple revisions based on feedback. The finalized GHQ was revised to make it comprehensive, free of overlapping questions, and understandable at the 8th-grade reading level. The GHQ consists of 55 questions on 9 factors, including General Health/Pain (5 questions), Eating/Chewing/Swallowing (9 questions), Reflux (3 questions), Gas/Bloating (5 questions), Diarrhea/Constipation (6 questions), Personality/Mood (5 questions), Medications (9 questions), Surgery (5 questions) and Parenting (8 questions). Responses were recorded on a five-point Likert scale from never to almost always, except for the surgery questions, where answers were “Yes/No”. Participants were also asked to report the relevance and importance of each question on a four-point Likert scale from not relevant/important to very relevant/important. The GHQ is a screening tool rather than a scale and investigates gastrointestinal problems broadly (e.g., both diarrhea and constipation questions were included in the GHQ). We applied the GHQ to Rett syndrome and MDS patients and published the overall gastrointestinal findings in these allelic disorders in separate articles [5, 6]. In this paper, we applied multiple statistical methods to the caregiver responses and removed irrelevant items. The resulting tool is called the “MDS-Specific Gastrointestinal Health Scale” and is intended for use as an outcome measure in clinical and translational research studies.

The survey was delivered to families between December 9, 2021, and January 20, 2022, through our secure portal. After the completion of the survey, we performed statistical analyses in two phases to tailor the GHQ specifically to MDS (Fig. 1).

Fig. 1

Flowchart of the MDS-Specific Gastrointestinal Health Scale development process. We initially surveyed MDS parents with the GHQ. We then followed the described steps to create the MDS-Specific GHS. GHQ Gastrointestinal Health Questionnaire, GHS Gastrointestinal Health Scale, MDS MECP2 Duplication Syndrome, CFA Confirmatory Factor Analysis

Phase I: item reduction/retention

For Item Reduction, we performed a stepwise item elimination/retention process including a) Confirmatory Factor Analysis, b) parent-reported item elimination/selection and c) expert opinion.

Confirmatory factor analysis on the GHQ items

Confirmatory Factor Analysis (CFA) was executed on the initial GHQ items to examine the importance of items using factor loading values as a measure and, as the first step of item reduction, to remove unrelated items from the questionnaire. A factor loading score greater than 0.500 was set as the cut-off according to Hu and Bentler’s guidelines [7]. We subsequently investigated whether the GHQ fits the CFA model by evaluating the following fit indices: Noncentrality-based Indices, Relative (Incremental) Fit Indices, and Absolute Fit Indices.

Item reduction/retention based on parent-reports

We used the fifth version of the guidelines for module development by the European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Group for the parent-based item reduction process [8]. Per the guidelines, we calculated floor effect, ceiling effect, compliance, relevance, importance, mean scores for relevance and importance, and prevalence ratio and prevalence scores for relevance and importance.

The guideline recommends the following cut-off points as decision rules for item selection: Relevance: < 25% scored 1 (although the published guideline states score “0” instead of score “1”, we contacted the guideline developers, who confirmed that the score should be “1” and that they will provide a corrigendum to the Manual); Importance: > 60% scored 3 or 4; Mean score > 1.5; Prevalence ratio > 30% or prevalence of scores 3 or 4 > 50%; Range > 2 points; No floor or ceiling effect: responses in categories 3&4 or 1&2 > 10%; and Compliance: at least 95% response to the item [8]. Applying these criteria left too few items and disrupted the structure of the survey. As suggested by the guideline, we relaxed Relevance to < 33% scored 1 and Importance to > 47% scored 3 or 4 [8]. This flexibility allowed retention of additional items and restored the model structure.
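The relevance and importance rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the ten caregiver ratings are hypothetical, and only the two criteria that were relaxed (share of score 1 for relevance, share of scores 3 or 4 for importance) are modeled.

```python
def share(scores, values):
    """Fraction of ratings that fall in the given set of values."""
    return sum(s in values for s in scores) / len(scores)


def passes_relevance_importance(relevance, importance,
                                max_not_relevant=0.25, min_important=0.60):
    """Apply the two decision rules to one item's four-point ratings."""
    not_relevant = share(relevance, {1})      # proportion scoring 1
    important = share(importance, {3, 4})     # proportion scoring 3 or 4
    return not_relevant < max_not_relevant and important > min_important


# Hypothetical ratings for a single item (1 = not relevant/important, 4 = very).
relevance = [1, 1, 1, 2, 3, 3, 4, 4, 4, 4]    # 30% scored 1
importance = [1, 2, 2, 2, 3, 3, 3, 4, 4, 2]   # 50% scored 3 or 4

strict = passes_relevance_importance(relevance, importance)
relaxed = passes_relevance_importance(relevance, importance,
                                      max_not_relevant=0.33,
                                      min_important=0.47)
print(strict, relaxed)
```

With these made-up ratings the item fails the default cut-offs but is retained under the relaxed ones, which is the kind of item the relaxation brings back.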

Item reduction/retention based on expert opinion

MDS experts (authors D.P., B.S., and K.J.M.) completed the item reduction process per EORTC guidelines. The experts were clinician investigators who evaluate and manage individuals with MDS at the Texas Children’s Hospital Rett Center, a center of excellence dedicated to MECP2 and Rett-related disorders. The experts discussed the clinical importance of each removed and added item, regardless of factor loading scores and parent-based relevance and importance scores. A consensus was reached on the final scale for further statistical evaluation (phase II).

Phase II: statistical evaluation of the MDS-specific gastrointestinal health scale for reliability and validity

Normality/sampling adequacy/power analysis

Normality

Prior to the validation and reliability analysis, normality was evaluated by the Kolmogorov–Smirnov test and the Shapiro–Wilk test. Normality was assumed when skewness values were within the range ± 2 [9] and kurtosis values were within the range ± 7 [10]. We also conducted Mardia’s Skewness Test and Mardia’s Kurtosis Test to investigate the multivariate normality of the distribution.
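The skewness and kurtosis screen can be illustrated as follows. This is a sketch only: it assumes simple moment-based estimators and excess kurtosis (kurtosis minus 3), whereas statistical packages often apply bias corrections, so values may differ slightly from software output. The function names and data are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a list of scores."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_normality_limits(values, skew_limit=2.0, kurt_limit=7.0):
    """True when |skewness| <= 2 and |excess kurtosis| <= 7."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) <= skew_limit and abs(kurt) <= kurt_limit


symmetric_item = [1, 2, 3, 4, 5]       # hypothetical, evenly spread answers
rare_item = [1] * 50 + [5]             # hypothetical, almost everyone answers 1

print(within_normality_limits(symmetric_item))
print(within_normality_limits(rare_item))
```

An item that nearly every respondent answers with the lowest category, as in the second list, is the typical pattern that falls outside these limits.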

Sampling adequacy

The Kaiser–Meyer–Olkin (KMO) Test and Bartlett’s Test of Sphericity were used to assess data suitability and sampling adequacy. The KMO test is a statistical measure of how well the data are suited to factor analysis. The test measures sampling adequacy for each variable in the model and for the entire model. Higher values mean a better fit of the data for factor analysis. A KMO value > 0.80 was considered meritorious and a value less than 0.5 was considered unacceptable [11].

Bartlett’s Test of Sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the items are uncorrelated. A statistically significant result (usually a p-value less than 0.05) shows that the correlation matrix is not an identity matrix (rejection of the null hypothesis). If the p-value from Bartlett’s Test of Sphericity is < 0.05, the dataset is suitable for data reduction techniques such as principal component analysis and factor analysis.
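The Bartlett statistic can be sketched as follows. This is an illustration rather than the authors' code: it assumes the standard formula χ² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, where R is the item correlation matrix, p the number of items and n the sample size. The function names and the toy matrix are hypothetical, and the p-value (obtained from a chi-square distribution) is omitted.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi_square, p * (p - 1) // 2


# Hypothetical two-item correlation matrix with r = 0.5 and n = 106.
chi_square, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=106)
print(round(chi_square, 3), df)
```

Under this formula an identity matrix gives a statistic of zero, and a 38-item scale has 38 × 37 / 2 = 703 degrees of freedom.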

Power analysis

We performed the power analysis according to RMSEA good fit criteria. We used three different methods for power analysis, using the “findRMSEApower”, “semPower.postHoc” and “semPower.compromise” functions in the R language. A power of 0.80 and above is widely considered a valid and acceptable value [12].

Reliability and internal consistency

Reliability and internal consistency of the scale, including factor-based internal consistency and overall internal consistency, were assessed by multiple methods including Cronbach’s alpha (value > 0.7 is considered meaningful) [13], McDonald’s Omega (value > 0.7 is considered meaningful) [13, 14], Consistent Reliability (RhoA) [15, 16], Composite Reliability (RhoC) [16, 17] and Spearman’s correlation analysis (r between 0.10–0.39 is considered a weak correlation) [18].
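As an illustration of the first of these coefficients, Cronbach’s alpha can be computed from item variances and the variance of the summed score. The sketch below assumes the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of totals); the function name and the Likert responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every respondent's score."""
    k = len(item_scores)
    totals = [sum(answers) for answers in zip(*item_scores)]  # per respondent
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical five-point Likert answers: 3 items, 5 respondents.
factor_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(factor_items)
print(round(alpha, 3), alpha > 0.7)
```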

McDonald’s omega

McDonald’s omega is a reliability coefficient metric similar to Cronbach's Alpha [19]. McDonald’s Omega measures the strength of association between items and factors, and item-specific measurement errors. This provides more reasonable estimates compared to Cronbach's Alpha in reliability assessment [14]. The values and their interpretation are similar to Cronbach’s Alpha [19].

Composite reliability (RhoC)

Composite reliability (RhoC) is one of the primary reliability coefficients that uses the factor analysis method. Values between 0.60 and 0.90 are considered acceptable ranges and higher numbers indicate better reliability with the following ranges [17]:

  • Values between 0.60 and 0.70: Acceptable,

  • Values between 0.70 and 0.90: Satisfactory to good,

  • Values above 0.90: Unacceptable, because values above 0.90, and especially above 0.95, indicate the presence of redundant items in the examined factor, which disrupts construct validity.
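The ranges listed above can be applied to a composite reliability computed from standardized factor loadings. The sketch below assumes the common formula RhoC = (Σλ)² / ((Σλ)² + Σ(1 − λ²)); the loadings, function names and boundary handling are hypothetical choices made for illustration.

```python
def composite_reliability(loadings):
    """RhoC from standardized loadings of the items in one factor."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def interpret_rho_c(rho_c):
    """Map a RhoC value onto the ranges described in the text."""
    if rho_c < 0.60:
        return "below the acceptable range"
    if rho_c < 0.70:
        return "acceptable"
    if rho_c <= 0.90:
        return "satisfactory to good"
    return "possible item redundancy"


rho_c = composite_reliability([0.7, 0.8, 0.9])   # hypothetical loadings
print(round(rho_c, 3), interpret_rho_c(rho_c))
```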

The reliability coefficient

The reliability coefficient (known as Exact Reliability or RhoA) is a relatively new method to assess the internal reliability of a scale. RhoA is usually a value between Cronbach’s alpha and composite reliability score. RhoA is an adjustment coefficient value to support the limitations of Cronbach’s alpha [15].

We further developed a new variable, a total item score, by summing all item scores. We calculated correlation coefficient values between this new variable and each factor’s item scores to assess the reliability.
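The item-total correlation described here can be sketched with a small rank-based implementation. This is an illustration under assumptions: Spearman’s coefficient is computed as the Pearson correlation of average ranks (which handles ties), and the responses and function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical responses: 3 items, 6 respondents.
items = [
    [1, 2, 2, 4, 5, 3],
    [2, 1, 3, 4, 4, 3],
    [1, 3, 2, 5, 4, 2],
]
total_score = [sum(answers) for answers in zip(*items)]   # sum of all items
item_total = [spearman(item, total_score) for item in items]
print([round(r, 2) for r in item_total])
```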

Validity studies

Indicator collinearity

Indicator collinearity was used to assess the correlation between factors and the items of each factor. The Variance Inflation Factor (VIF) is a standard measure of collinearity. VIF values of 5 or above indicate a collinearity problem. VIF values between 3 and 5 are acceptable but not ideal. VIF values less than 3 suggest the absence of overlap [16, 20].
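The VIF thresholds can be expressed compactly. The sketch assumes the usual definition VIF = 1 / (1 − R²), where R² comes from regressing one indicator on the other indicators of the same factor; the R² values and function names are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """Variance Inflation Factor for an indicator with the given R-squared."""
    return 1.0 / (1.0 - r_squared)


def interpret_vif(vif):
    """Apply the cut-offs of 3 and 5 described in the text."""
    if vif >= 5:
        return "collinearity problem"
    if vif >= 3:
        return "acceptable but not ideal"
    return "no collinearity concern"


for r_squared in (0.5, 0.75, 0.9):      # hypothetical R-squared values
    vif = vif_from_r_squared(r_squared)
    print(r_squared, round(vif, 2), interpret_vif(vif))
```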

Construct validity

We assessed the Construct Validity by calculating the Convergent Validity and Discriminant Validity.

Convergent validity

Convergent validity refers to the degree to which two measures of constructs that theoretically should be related, are in fact related [21]. In convergent validity, larger and statistically significant factor loadings mean better convergent validity. Loading values > 0.5 are acceptable values.

We further assessed convergent validity by Average Variance Extracted (AVE). If the AVE value is > 0.50, convergent validity is statistically established.
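For illustration, AVE can be computed as the mean of the squared standardized loadings of a factor’s items. The loadings and function name below are hypothetical; the 0.50 cut-off is the one stated above.

```python
def average_variance_extracted(loadings):
    """Mean of squared standardized loadings for one factor."""
    return sum(l ** 2 for l in loadings) / len(loadings)


strong_factor = [0.7, 0.8, 0.9]          # hypothetical loadings
weak_factor = [0.4, 0.5, 0.6, 0.7]       # hypothetical loadings

for name, loadings in (("strong", strong_factor), ("weak", weak_factor)):
    ave = average_variance_extracted(loadings)
    print(name, round(ave, 3), ave > 0.50)
```

A factor can fall below the AVE cut-off even when most of its items load above 0.5, because squaring penalizes moderate loadings.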

Discriminant validity

Discriminant validity tests whether concepts or measurements that are supposed to be unrelated are, in fact, unrelated [21]. It shows that constructs in the study have their own individual identity and are not too highly correlated with other constructs in the study. We assessed the discriminant validity of the GHS through the heterotrait-monotrait ratio (HTMT) of the correlations and Fornell and Larcker Criterion [22].

The HTMT ratio compares the arithmetic or geometric mean correlation among items across factors with the geometric-mean correlation among items within the same factor. The resulting HTMT values are interpreted as estimates of inter-construct correlations. Values greater than 0.90 indicate the absence of discriminant validity; thus, values less than 0.90 were considered acceptable [23].
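A minimal sketch of the HTMT ratio for two factors follows. It assumes the arithmetic-mean version: the mean of the between-factor item correlations divided by the geometric mean of the two within-factor mean correlations. The correlation matrix, item indices and function name are hypothetical, and each factor needs at least two items.

```python
import math
from itertools import combinations
from statistics import mean


def htmt(corr, items_a, items_b):
    """HTMT ratio for two factors given an item correlation matrix."""
    between = mean(corr[i][j] for i in items_a for j in items_b)
    within_a = mean(corr[i][j] for i, j in combinations(items_a, 2))
    within_b = mean(corr[i][j] for i, j in combinations(items_b, 2))
    return between / math.sqrt(within_a * within_b)


# Hypothetical correlations: items 0-1 belong to factor A, items 2-3 to factor B.
corr = [
    [1.0, 0.8, 0.3, 0.3],
    [0.8, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.7],
    [0.3, 0.3, 0.7, 1.0],
]
ratio = htmt(corr, items_a=[0, 1], items_b=[2, 3])
print(round(ratio, 3), ratio < 0.90)
```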

The Fornell and Larcker Criterion evaluates the factors in the model by comparing the square root of the AVE of each construct (diagonal) with the correlation coefficients between that construct and the others (off-diagonal) in the relevant rows and columns. For each factor, the square root of the AVE should be greater than its correlation with all other factors.
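The comparison can be sketched as a simple check. The factor names are borrowed from the questionnaire, but the AVE and correlation values are hypothetical, as is the function name.

```python
import math


def fornell_larcker_ok(ave, correlations):
    """ave: {factor: AVE}; correlations: {(factor_a, factor_b): r}.

    Returns True when every inter-factor correlation is smaller than the
    square root of the AVE of both factors involved.
    """
    for (factor_a, factor_b), r in correlations.items():
        if abs(r) >= math.sqrt(ave[factor_a]):
            return False
        if abs(r) >= math.sqrt(ave[factor_b]):
            return False
    return True


ave = {"Reflux": 0.64, "Mood": 0.49}      # square roots: 0.8 and 0.7
print(fornell_larcker_ok(ave, {("Reflux", "Mood"): 0.55}))
print(fornell_larcker_ok(ave, {("Reflux", "Mood"): 0.75}))
```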

CFA for finalized GHS

CFA is a multivariate statistical procedure that tests how well the measured items represent the number of factors. We performed CFA using the Diagonally Weighted Least Squares (DWLS) method as the estimator to test and evaluate our model’s validity and whether the data fit a hypothesized measurement model. When the assumption of multivariate normality is severely violated and/or the data are ordinal, the DWLS method provides more accurate parameter estimates [24,25,26]. We conducted CFA for the final MDS-specific GHS by calculating fit indices. We used the most common and well-known fit indices, grouped into the following major categories, to assess the construct of the model:

Noncentrality-based indices: RMSEA, CFI, RNI

  1. The root mean square error of approximation (RMSEA) shows the lack of fit per degree of freedom of the model, taking sample size into account. Values < 0.05 indicate a very good fit. Of note, RMSEA is the only fit index with a confidence interval.

  2. Comparative Fit Index (CFI) compares the sample covariance matrix with a null model. Accepted values > 0.90 mean a better fit.

  3. Relative Noncentrality Index (RNI): Accepted values are the same as CFI values.

Relative Fit Indices: IFI, TLI and NFI

  1. Bollen’s Incremental Fit Index (IFI): Values > 0.90 indicate a better fit.

  2. Tucker-Lewis Index (TLI) adjusts for the number of model parameters; the values and their interpretation are the same as for CFI.

  3. Bentler-Bonett Normed Fit Index (NFI): Values and their interpretation are the same as for CFI.

Absolute Fit Indices: Chi-square, GFI, AGFI, WRMR/SRMR

  1. Chi-square and Chi-square/df ratio (χ2/df): The Chi-squared goodness-of-fit statistic measures the overall model fit to the observed data; a significance test with p-values > 0.05 indicates a good fit. χ2/df values of < 3.0 are considered acceptable.

  2. Goodness of Fit Index (GFI): Evaluates the fit between the proposed model and the observed covariance matrix. Similar to IFI, a value > 0.95 is an acceptable value.

  3. Adjusted Goodness of Fit Index (AGFI): Corrected GFI. Values > 0.90 are considered ideal.

  4. Weighted Root Mean Square Residual (WRMR)/Standardized Root Mean Square Residual (SRMR): WRMR and SRMR measure the average differences between sample and population variances. However, SRMR is for continuous items and situations with large sample sizes. On the other hand, WRMR is for categorical items and is preferred for relatively small sample sizes. Thus, in this study, we used the WRMR fit index instead of SRMR [27,28,29]. WRMR scores between 0.90 and 1.00 are considered appropriate values [30].

Finally, we assessed convergence efficiency, that is, the number of iterations the estimation algorithm required, using R Studio. Efficient convergence means reaching an optimum solution after a few iterations. Thus, a lower number of iterations indicates a better model. Our model converged after 7 iterations, which supports the desired accuracy of our model.

Phase II structural equation modeling (SEM)

We performed Structural Equation Modeling (SEM) using DWLS as an estimator to evaluate factors affecting parenting. For SEM, we evaluated the same fit indices with their reference values that we used in CFA to confirm whether our model fits.

Phase II exploratory graph analyses (EGA)

EGA is a relatively new method to estimate the number of factors/dimensions and items with their relations to each other [31, 32]. We applied EGA to compare the final MDS-Specific GHS with the EGA’s proposed model.

All statistical analyses were conducted using multiple software programs including SPSS version 29.0, JASP version 0.14.1.0 (JASP Team, Amsterdam, Netherlands), JAMOVI version 2.3 and R Studio.

Results

Subjects

A total of 122 caregivers initially participated in the survey. After review, 106 surveys met the eligibility criteria and were included in the analysis. Sixteen surveys were excluded either because the individual with MDS was female (as females do not exhibit the classic clinical features of MDS) or because parents did not provide the required genetic report for their child. However, three of the 106 eligible surveys concerned females who had translocations to an autosome and thus presented with the classic MDS phenotype (selective X inactivation favoring the duplicated X chromosome). Of the 106 eligible surveys, responders were mothers (n = 88), fathers (n = 17), or mothers and fathers together (n = 1). Surgery questions were removed because: 1) the response was dichotomous and therefore incompatible with the model structure and 2) parental relevance and importance choices excluded these questions.

Phase I: item reduction/retention studies based on CFA, parent-reports and expert opinion

We conducted CFA to assess the importance of items and exclude nonrelevant items based on factor loading scores. This step removed 11 items and one factor (Additional file 1: Fig. S1 and Additional file 3: Table S1, column I). At the end of this step, 44 items and 8 factors remained. We examined the CFA model results with chi-square, χ2/df, TLI, GFI, RMSEA, and WRMR. All results were within the expected ranges described in the Methods section and validated the model’s structure (data not shown).

Table 1 Mean, standard deviation, skewness, kurtosis, Shapiro–Wilk, McDonald ω, Cronbach's α, Spearman Correlation analysis, and Confirmatory Factor Analysis of MECP2 Duplication Syndrome Specific Gastrointestinal Health Scale

We applied the EORTC recommended relevance (score 1 < 25%) and importance (score 3 or 4 > 60%) cut-offs for the entire GHQ (Additional file 3: Table S1). Thirty-nine out of 55 questions were eliminated with these criteria (Additional file 1: Fig. S1 and Additional file 3: Table S1, columns B and C). The remaining 16 items were too few and disrupted the survey structure. We used the flexibility option in the guidelines and relaxed the relevance criteria from < 25% to < 33% for score 1 and the importance criteria from > 60% to > 47% for scores 3 and 4 without changing other criteria (Mean, Prevalence ratio, Range and Floor effect or Ceiling effect). The relaxed criteria restored an additional 14 questions to achieve a total of 30 questions (Additional file 1: Fig. S1 and Additional file 3: Table S1, columns D and E).

The experts gathered to discuss each item reduction result, regardless of parent-based responses and CFA results. The final GHS included a total of 38 items in 7 factors. This scale is called the MDS-specific Gastrointestinal Health Scale (GHS) and underwent reliability and validity testing (Additional file 1: Fig. S1 and Additional file 3: Table S1, column K).

Phase II: reliability and validity studies

Normality, sampling adequacy and power analysis

Kolmogorov–Smirnov and Shapiro–Wilk tests revealed that the data distribution was not normal. Using a normal range of -2 to + 2 for skewness and -7 to + 7 for kurtosis, the skewness and kurtosis values of all items were within the expected ranges except for Questions 4 and 5 in the Medication factor, which were outside the range for both skewness and kurtosis (Table 1).

In the multivariate normality analysis, Mardia’s Skewness Test and Mardia’s Kurtosis Test yielded Mardia’s Coefficients of 633.825 and 1542.931, Kappa values of 11,197.575 and 2.141, and p-values of < 0.001 and 0.032, respectively.

Sampling adequacy was assessed with the KMO test. The KMO value was 0.834, which is above Kaiser’s threshold of 0.80 for a meritorious result. This result indicates strong sampling adequacy for the CFA. Bartlett’s Test of Sphericity resulted in a Chi-square of 2553.034 with 703 degrees of freedom (p-value < 0.001), which showed that our scale is suitable for factor analysis.

We calculated the power of the gastrointestinal health scale using the degrees of freedom and sample size derived from the CFA model, together with RMSEA good fit values. Power calculation using Basic Power Analysis, Post-hoc Power Analysis and Compromise Power Analysis yielded 0.999, 0.994 and 0.982, respectively, confirming the strong power of the study.

Reliability and internal consistency

Factor-Based Internal Consistency: We calculated Cronbach’s alpha, McDonald’s omega, RhoA and RhoC values for each factor to assess the reliability. All factor reliability values were over 0.700 except for the Medication factor, which confirms that the internal consistency of every factor other than Medication was very good (Table 2 and Fig. 1).

Table 2 Factor based reliability and AVE of gastrointestinal health scale

We calculated the Composite reliability (RhoC) values as a composite reliability measure. RhoC values were in the satisfactory-to-good range except for two factors (mood and parenting), which had values between 0.90 and 0.95.

The Reliability Coefficient (Exact Reliability or RhoA) values for the factors in our scale were above 0.70 except for medications (0.658). RhoA and RhoC values were higher than Cronbach’s alpha.

Overall Internal Consistency: To assess the Overall Internal Consistency, we calculated Cronbach’s alpha and McDonald’s omega values for all factor items together. Cronbach’s alpha and McDonald’s omega were 0.928 (95% confidence interval 0.907–0.946) and 0.926 (95% confidence interval 0.905–0.946), respectively, indicating excellent internal consistency.

Spearman’s Correlation Analysis: We examined the correlation between each item and the total item score (Sum of items) using Spearman’s correlation. All pairwise correlation coefficients were statistically significant [p-values mostly < 0.001 with the highest p-value of 0.027, see Table 1 for entire item values].

Validity studies

Indicator collinearity

All VIF values were under 5. VIF values were also under 3 for all items in 6 out of 7 factors; the exception was the parenting factor, in which some items had values between 3 and 5 (Additional file 4: Table S2).

Construct validity

Convergent validity, as part of construct validity, was assessed by calculating the factor loading for each item (Table 1) and the AVE value for each factor (Table 2). Factor loading values were mostly very high; the exceptions were four items with loadings between 0.34 and 0.50, which were retained in the scale based on expert opinion (Table 1). AVE values for the factors in our scale were above 0.50 except for eating-chewing-swallowing function (0.444) and medications (0.318).

Discriminant Validity: We calculated HTMT, and Fornell and Larcker Criterion scores to assess discriminant validity. All HTMT values were within the acceptable range and less than 0.90, confirming the discriminant validity of the scale (Table 3).

Table 3 HTMT: heterotrait–monotrait (ratio of correlations method)

All Fornell and Larcker values were within Fornell and Larcker Criterion for each factor, further supporting the discriminant validity of our scale (Table 4).

Table 4 Fornell and Larcker criterion

Confirmatory factor analysis

The CFA of the final MDS-specific GHS showed a good model fit based on the goodness of fit statistics. Chi-square was 708.251 with a df value of 644 (n = 106) and the p-value was 0.04. The χ2/df fit value was 1.099 (acceptable value < 3). We calculated 10 different fit indices, and eight out of nine indices were within the acceptable values including the most commonly used ones: CFI 0.997 (acceptable value > 0.85), RMSEA 0.031 [Confidence Interval 90%: 0.007 – 0.044], GFI 0.975 (acceptable value > 0.85). The only fit index that was not within the acceptable value was SRMR 0.097 (preferred value < 0.08). All fit index scores and their acceptable values are detailed in Table 5. The CFA path diagram is shown in Fig. 2.
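As a rough arithmetic check, the RMSEA point estimate can be approximated from the reported Chi-square, degrees of freedom and sample size. The sketch assumes the common formula RMSEA = sqrt(max(χ² − df, 0) / (df · (N − 1))); software using the DWLS estimator may apply scaling corrections, so exact agreement with the published value is not guaranteed. The function name is hypothetical.

```python
import math


def rmsea_point_estimate(chi_square, df, n):
    """Approximate RMSEA from a chi-square statistic, its df and sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


chi_square, df, n = 708.251, 644, 106     # values reported in the text
ratio = chi_square / df
rmsea = rmsea_point_estimate(chi_square, df, n)
print(round(ratio, 2), round(rmsea, 3))
```

Under this formula the ratio is about 1.10 and the RMSEA rounds to 0.031, in line with the reported values.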

Table 5 Fit Indices of MECP2 Duplication Syndrome Specific Gastrointestinal Health Scale

Fig. 2

Path Diagram for the GHS. Items are shown in rectangles and Factors are shown in oval shapes. Factor loading values are shown on the arrows from Factors to Items. Item Residual values are given with the numbers next to items. Factor correlation values are provided with the arrows between Factors. ECS Eating-Chewing-Swallowing, GHealth General Health, Medic Medication, Q Question, Parent Parenting

Structural equation modeling

SEM analysis revealed that three factors independently affect parenting: general health, motility and medications, with p-values of < 0.001, < 0.001 and 0.04, respectively.

Exploratory graph analysis (EGA)

We explored whether our model (CFA-based MDS-specific GHS) overlaps with the proposed EGA model. The EGA identified six factors with 37 items. Importantly, 34 out of 37 questions were present in our GHS (~ 92% overlap with the existing scale), supporting our model structure and providing further evidence that EGA should be considered as an adjunct or alternative method for exploratory factor analysis.

Discussion

In this study, we developed an MDS-specific gastrointestinal health scale (MDS-specific GHS) based on CFA, parents’ responses and experts’ opinions. The final scale included 38 items in 7 factors and covers the most bothersome gastrointestinal symptoms. The statistical studies revealed that the MDS-specific GHS is a reliable and valid tool developed based on parent reports. Thus, this survey can be used as an outcome measure of symptom severity in clinical and translational research studies. Moreover, since it is easy and quick to apply, it can serve as a screening tool for individuals with MDS in gastrointestinal clinics.

Outcome measures are tools to assess the severity of a patient’s symptoms in an objective way. Outcome measures are more valuable if patients or caregivers are involved in the tool development process [33]. Individuals with MDS were not the source of information in our surveys due to their limited or absent communication skills stemming from their profound cognitive deficits. Thus, parents/caregivers were the primary source of information.

We followed a stepwise method in our scale development. First, we conducted item reduction on the entire GHQ using CFA, EORTC guideline decision rules and expert opinion. The CFA model removed 12 items and one factor. Applying the EORTC decision rules disrupted the survey structure. Thus, we loosened the relevance and importance criteria per the EORTC guideline [8], resulting in 31 items. Finally, experts included an additional 7 items, resulting in a total of 38 items and 7 factors for the final GHS.

The power of our study was measured by three functions in the R language. The lowest score amongst them was 0.982 (Compromise Power Analysis), supporting the power of the study. Furthermore, sampling adequacy assessment (KMO and Bartlett’s Test of Sphericity) showed the suitability of the scale for factor analysis. Skewness/kurtosis values were low for two items in the Medication factor. However, these two items were included in the final scale per expert opinion.

The reliability of our study was assessed by multiple measures including Cronbach’s alpha, McDonald’s omega, composite reliability (RhoC) and exact reliability (RhoA), as opposed to many other studies, which mostly base reliability analysis on Cronbach’s alpha alone. All these reliability measures have limitations; thus, measuring reliability with multiple methods provided a more robust reliability assessment for our model. One of the important but underestimated constraints of Cronbach’s alpha is that it assumes all items’ loadings are the same in the population, thus providing lower reliability values [16]. On the other hand, very high (> 0.95) RhoC values can provide information on construct validity. Thus, Cronbach’s alpha assesses the lower bound whereas RhoC assesses the upper bound for internal consistency reliability [16]. In our scale, the Medication factor’s Cronbach’s alpha and RhoA scores were borderline low, 0.638 and 0.658, respectively (Fig. 3). This is likely due to lower skewness scores for two items in the medication factor, which experts retained in the survey due to their clinical importance. Additionally, even if these two medication items were removed from the scale, the overall Cronbach’s alpha changes were minimal (Table 1).

Fig. 3

Reliability Assessments of the GHS. Cronbach’s alpha, RhoA and RhoC values for each factor. All values are within the desired range except for the Medication factor, where Cronbach’s alpha and RhoA are below the desired value

We performed validation studies with construct validity, discriminant validity and CFA. In CFA, the p-value for the Chi-square was 0.04, whereas a p-value that is not statistically significant (> 0.05) is the indicator of good model fit. This is a commonly encountered problem in CFA, which is why fit indices were developed [34]. We calculated 10 different fit indices and all of them were within acceptable values. We used DWLS as the estimator when conducting CFA. With DWLS, WRMR is more meaningful than SRMR, and our WRMR value was also within acceptable values. We thus removed SRMR from our list of fit indices. Eventually, all of our fit indices, including the most important and commonly used ones (χ2/df, RMSEA, CFI, TLI, GFI and WRMR), were within acceptable ranges (Table 5).

Factor loading scores and AVE values were used to evaluate construct validity. Both analyses showed borderline low values in the eating/chewing/swallowing and medication factors. There are two questions in each section (questions 4 and 8 in the eating/chewing/swallowing factor and questions 6 and 7 in the Medication factor) that have low factor loading and AVE values. Lastly, another key element of validity assessment is Discriminant Validity. The Fornell–Larcker criterion has been in use as the primary criterion to assess discriminant validity. However, the HTMT criterion has become the preferred choice in recent years [16]. In our study, we calculated both the HTMT and the Fornell and Larcker Criterion, and both analyses were within expected ranges for discriminant validity. Overall, these analyses confirm the validity of our scale. Further evidence for the validity of our scale comes from EGA. The final MDS-Specific GHS (Additional file 2: Fig. S2A) and EGA’s proposed model (Additional file 2: Fig. S2B) were very similar (7 factors with 38 items versus 6 factors with 37 items) despite multiple items being reincorporated into the actual scale based on expert opinion.

SEM analysis identified general health, motility and medications as the factors affecting parenting. Our meaningfulness survey also identified motility (constipation) as one of the top concerns for which caregivers were seeking treatment, which is consistent with the SEM analysis [4].

This study had limitations based on study design. The study was conducted as an online survey, rather than an in-person interview process, which could lead to bias. The study design was cross-sectional, rather than longitudinal, which also limits the exploration of the full scope of the symptoms and their severity. This design could have caused parental bias in their relevance and importance decisions. Furthermore, validation studies of the survey ideally should be conducted longitudinally. Most of our sample population originated from the USA and Europe. This selection could cause bias in responses due to treatment preferences and the socioeconomic status of these countries. Finally, the present study was conducted during the COVID pandemic, which may have affected parental responses.

In conclusion, the MDS-Specific GHS is a valid and reliable rating scale with adequate psychometric properties to measure the gastrointestinal health of individuals with MDS. The significance of this scale lies in its development based on parent reports. It is a reliable and valid tool that is also easy to administer. This scale can serve as a valuable outcome measure in clinical trials and translational studies. Additionally, it can be utilized as a screening tool for gastrointestinal health in clinical settings.