Confirmatory factor analysis with missing data in a small sample: cognitive reserve in people with Down Syndrome

The presence of missing data and small sample sizes are very common in social and health sciences. Concurrently to present a methodology to solve the small sample size and missing data, we aim to present a definition of Cognitive Reserve for people with Down Syndrome. This population has become an appealing focus to study this concept because of the high incidence of dementia. The accidental sample comprised 35 persons with DS (16–35 years). A total of 12 variables were acquired, four of them had missing data. Two types of multiple imputation were made. Confirmatory factor analysis with Bayesian estimations was performed on the final database with non-informative priors. However, to solve the sample size problem, two additional corrections were made: first, we followed the Jiang and Yuan (2017) schema, and second, we made a Jackknife correlation correction. The estimations of the confirmatory factor analysis, as well as the global fit, are adequate. As an applied perspective, the acceptable fit of our model suggests the possibility of operationalizing the latent factor Cognitive Reserve in a simple way to measure it in the Down Syndrome population.


Introduction
In the field of social and health sciences the presence of sample anomalies is very common, specifically, samples are usally characterized by reduced sample size and missing data (Cohen 1990;Silvia et al. 2014). Obviously, these types of samples directly affect the validity of the data, the generalizing possibility of the results and the investigation quality. Moreover, most data analysis procedures are not designed for these abnormalities (Schafer and Graham 2002). However, in recent years Bayesian estimations have become popular in many scientific fields (Muthén and Asparouhov 2012;McNeish 2016;König andvan der Schoot 2018, Smid et al. 2020), mainly because of the availability of popular softwares and because it adds some advantages over frequentists estimations. Regarding these advantages, one of the most interesting features is it's flexibility to work with small samples sizes (MacNab et al. 2011;Smid et al. 2020).
In relation to missing data, multiple imputation (MI) techniques have become popular to replace the lost values with simulated data that allow the full cases analysis. These techniques are not exempt from controversy since they assume an arbitrary alteration of the original distributions and can modify some of their statistics (He 2010). There are a multitude of imputation possibilities and easy access, which implies an evident risk of erroneous selection of these techniques (Graham et al. 2006;Rhemtulla and Little 2012;Little et al. 2014;Smid et al. 2020).
Missing data is very frequent in studies with persons who have limitations to participate directly from themselves. Concurrently to present a methodology process to solve the small sample size and missing data, we aim to present a definition of Cognitive Reserve (CR) for people with Down Syndrome (DS).CR is the brain's ability to handle brain damage or disease, maintaining a stable level of function (Stern 2002(Stern , 2006(Stern , 2011(Stern , 2018. From our point of view, the study of CR concept in DS population would be of great advance to make a step forward in the dementia field. The prevalence and risk for dementia in this population is significantly different from general population or from people with Intellectual Disability (ID) of other aetiologies (Lott and Head 2019). In this population, as in general population, individual trajectories of cognitive changes are highly heterogenous, with some declining dramatically rapidly and others declining very slowly or even improving (Bull 2020;Temple et al. 2001). Given the variability that exists in DS, the implications of CR for individuals with DS deserve special attention to generate a more systematic definition (McGlinchey et al. 2014).
CR is considered to offer neuroprotective benefits against brain injury and disease by facilitating greater neural efficiency and/or enhanced capacity to develop compensatory cognitive mechanisms subsequent to the neuropathology onset (Jones et al. 2011;Stern 2002Stern , 2006Stern , 2011Stern , 2018. Although this concept has been depply studied, there are still measurement problems in order to operationalize CR by using different proxies. For instance, years of formal education is the most widely studied variable, and it is commonly used as the single proxy for the CR concept (Valenzuela and Sachdev 2006). However, CR could be also conceived as an hypothetical factor, not directly quantified by a unique measure, but as a latent construct. Therefore, nowadays CR is studied from a latent variable data analyisis approach with more proxies as leisure activities (Ruiz-Contreras et al. 2012;Suchy et al., 2011), early family environment and genetic factors (Lee 2007), intelligence quotient (IQ; Stern 2006;Tucker-Drob et al. 2009;Tuokko et al. 2003); parent socioeconomic status (Richards and Sacker 2003); education attainment (Foubert-Samier et al. 2012) and physicical activities (Gow et al. 2012).
To address the CR concept we have found some challenges related to the sample size and missing data, that we have had to solve through a suitable methodological procedure. The goal to define a concept of CR in DS population are included in a broadly study which the techniques used (e.g. aerobic capacity test) are difficult to administer in people with DS. Due to the complexity of the register techniques used, the sample obtained has been small and with enough missing data that could hinder the main study objectives.
Thus, the purposes of this paper are twofold; on the one hand, establish an heuristic for the treatment of missing data and small samples sizes, on the other, develop a measurement 1 3 model for the operational definition for CR in a DS population. As far as we know, there have been no published works in which a chained heuristic is proposed to solve both issues.

Participants
The sample was composed of 35 persons with DS between 16 and 35 years old (M = 24.4 and SD = 5.42); 25.7% were women. The sampling was accidental, and recruitment was performed through several DS care institutions in Mexico (54.3%) and Spain (45.7%). The following inclusion criteria were applied: (a) age between 16 and 35 years old and (b) formal diagnosis of DS through karyotyping. Regarding the exclusion criteria: (a) evidence of other comorbid diagnoses implying cognitive dysfunction; (b) impossibility of obtaining consent from legal tutors; and (c) presence of medication affecting cognitive functions. In order to assess the presence of other comorbid diagnoses implying cognitive dysfunction, the Dementia Screening Questionnaire for Individuals with Intellectual Disability (DSQIID) was used. It has a high internal consistency (α = 0.91; Deb et al., 2007). Thus, based on the mentioned criteria the final sample was composed of 31 participants.
The diagnostic of ID is based in the Spain government law about disability, which assigns a "handicap" percentage to every person with a disability to represent the condition severity. The disability percentage is assigned administratively based on all types of impairments (e.g., intellectual, physical, sensorial). In the present study, 58% of the sample had mild ID, 38.7% had moderate ID, 3.2% had severe ID. None of the participants had sever or profound ID. IQ was assessed using the Kaufman Brief Test of Intelligence (KBIT, Kaufman and Kaufman 1990). However, due to the no adaptation of the test for people with ID, it wasn't included in the CFA model. However, IQ had a mean in our sample (n = 21) of 46.21( SD: 10.91); with a minimum of 40 and a maximum of 80.

Instruments
These data are part of a larger protocol in which the relation between cerebral signal (fMRI) and variables connected with cognitive outcome, quality of life and physical activity is studied. In the current study, in addition of the age and ID diagnosis (Personal Conditions) described above, the variables assessed through the next scales and tests were included. For the Quality of Life assessement, the Spanish version of Personal Outcomes Scale was used (Carbó-Carreté et al. 2015). This scale aims to assess quality of life in people with ID on the basis of the eight domain quality of life model (Schalock and Verdugo 2002), which was arranged into three higher-order factors: Independence (personal development, self determination, interpersonal relations); Social participation (social inclusion, rights); Wellbeing (emotional wellbeing, physical wellbeing, material wellbeing) (Wang et al. 2010). The reliability is appropriate for the domains and, particularly, for the factors, with α values higher than 0.82.
The Cognitive Outcomes were assessed by three measurements tools: (a) the semantic fluency (number of words produced during 1 min in two conditions, things to buy in a supermarket and names of colors); (b) the phonological fluency (number of words produced beginning with the letter "p" during 1 min) and (c) administration of the Frontal Assessment Battery (FAB) with an internal consistency of α = 0.78 (Dubois et al. 2000).
In relation the Physical Activity there were four assessments: (a) the first was related to the level of practice of physical activity, through the Level of Physical Activity Scale (LPAS) (Carbó-Carreté et al. 2016). The internal consistency of this scale is α = 0.79. The next two measurements were about physical activity and sedentary level by using GT3x Actigraph accelerometers (ActiGraph, Fort Walton Beach, FL, USA) (Oviedo et al. 2017). Specifically, (b) the Total Physical Activity (Total PA) were calculated by assessing the accelerations of the device on the longitudinal axis 1 (counts/min) and (c) the Vector Magnitude (counts/min), understood as the device accelerations on the longitudinal, sagittal and frontal axis, was calculated by the square root of the sum of the squares of the three axes. Finally, (d) the Cardiorespiratory Fitness was determined by obtaining the peak oxygen consumption relative to body weigh (relative VO 2 peak; ml/kg/min).

Procedure
The applied protocol was approved by the Bioethical Committee of the Universitat de Barcelona (16/03/2017) and the data was registered between March 2018 and July 2019. Informed consent of parents or tutors in legal charge of every person with DS was obtained. Moreover, informed consent from the participants with DS was obtained. All the participants were evaluated in two register sessions by previously trained investigators. The administration sequence was the same for all the participants, and the scales referenced above were administered first to avoid fatigue bias.
For the quality of life and for the Level of Physical Scale parent's participation was accepted. For the Cognitive Outcomes and the rest of Physical Activity it was required the direct participation of the person with DS. The researchers who administered the scales and the physical tests were specific professionals trained to do this assessments. In the Mexican and Spanish administration, the same instruments were used in all the registers to guarantee the comparability of the data and avoid measurement biases.

Statistical analysis and heuristics
First, the observed distribution of the variables was studied to evaluate the impact of the missing data in every variable. Of the twelve variables studied, only 4 presented missings: ID level (11.4% of missings), Cardiorespiratory fitness (22.9% of missings), Total PA (25.7% of missings) and finally Vector Magnitude (25.7% of missings). Second, the acceptable maximum percentage for missing imputation was studied. To do so, the Little χ 2 test was applied to evaluate whether the missings followed a MCAR, which would be the more desirable pattern.
In the case of the categorical variable ID diagnostic, the prediction models based on multinomial logistic models and linear discriminant analyses were studied to identify a predictor variable combination that allowed the relevance probability to be applied to one of the four categories of ID diagnostic prediction. None of the models tested offered statistically significant results. This forced us to discard the prediction models with auxiliary variable prediction. Therefore, it was decided to use simple imputation based on linear interpolation.
For the quantitative variables, the imputations derived from the linear regression (LR) and predictive mean matching (PMM) models were obtained. In the case of LR models, the linearity between pairs of variables and the possible effects of collinearity were previously analyzed, and stepwise estimation models were chosen. For both cases, LR and PMM, a total of 10 simulated samples were obtained iterating from the initial solution, following Montenegro and Chesnut (2015), and each was analyzed to identify the best adjusted iteration.
With the aim of achieving a more exhaustive study of the imputation results, the mean SE of all imputed means was obtained from the Rubin (1976) estimation. This is a standard error (SE) estimation of the parameter mean of interest (a).
where M is the number of generated databases, s k is the SE on the K database, a k is the parameter estimation on the k database, and a is the estimated parameter mean and uses the factor (1 + 1/M) that corrects the equation because the number of databases is finite.
However, the adjustment estimation proposed by Rubin (1976), does not make a direct comparison of the mean's SE obtained in the original distribution and in the solution after the imputation mechanism. To overcome these limitations, an adjustment index associated with the SE of the original and imputed distributions was defined. The basic expressions for calculating this index, called the imputation fit index (IFI), are shown below.
where m are the different variables and k are the different iterations, EE x m(o) is the mean SE of the initial variable (observed) and EE (x � m(k) ) is the mean SE of the variable in the k imputation. By means of this index, the difference between the mean SE of an observed variable and the mean SE of the imputed variable is estimated. Obviously, depending on whether the result of this difference tends to 0, the two distributions are closer. However, when standardizing the IFI distribution, the fit values will be placed in the bottom tail of the distribution. In this way, Z IFI ≤ −1.65 was set as the reference value for the acceptance of H 0 , that is, the observed and imputed equality of distributions.
As the final part of the present work, the resulting database was submitted to CFA. Given the small sample size, Bayesian parameter estimations were made using Bayesian structural equation models (BSEM) with non-informative priors (Muthén and Asparouhov 2012;Asparouhov et al. 2015;Hoijtink and Van de Schoot 2017).
Obviously, it would be preferable to have informative priors (Smid et al. 2020), but we do not have solid empirical data to use them as informative priors as is the first time to our knowledge that CR in DS is studied. The correlation matrix used as the input was imputed using Jackknife correlation. This procedure consists of estimating Pearson's correlations for each pair of variables, using as many samples as are obtained from removing one subject from the sample at a time. From the distribution of correlations thus obtained, the average correlation and its 95% confidence interval were estimated. To integrate the correlation matrix, the lower limit value was used for all cases to obtain correlations that were as attenuated as possible and, at the same time, that had the least possible bias. Parameter estimations were made using MPlus version 8.4 and ad hoc programming in R version 3.6.3.
Given the reduced sample size, it was not feasible to study the measurement adjustment model by the usual indicators. Therefore, Jiang and Yuan's (2017) strategy was used. These authors, based on the proposals of Satorra and Bentler (1988) and Jung (2013), developed a fitting index design that is effective for multivariable normal distribution violation assumption and suitable for very small samples. Jing and Yuan's proposal (2017) is based on four fitting index extractions: T COR1 , T COR2 , T COR3 and P COR4 . All of them oscillate between [0,1], and the right settings must be assumed in values of T CORi > 0.90 and P COR4 > 0.10. They are defined as follows: where c 1 = tr(ÛΓ)∕rank(ÛΓ ) is fulfilled by rank < df. This condition indicates that it is impossible to estimate more parameters in the CFA than those determined by the sample size, n should be lower than the number of non-duplicated units in the variances-covariances matrix (Û ). Finally, (Γ) is the reproduced matrix from the usual estimated parameters in the adjustment of χ 2 . Moreover, T ML = nF ML ̂ , where F ML (̂ ) is the usual discrepancy function F ML ( ) = tr SΣ −1 ( ) − log| SΣ −1 ( )| − p.
From the previous expressions, Satorra and Bentler (1988) defined TML as follows: in which every parameter ( j ) considers the statistical adjustment of 2 1j . From this expression, two more indicators are proposed. The first is the T RML indicator in a rescaled measure: where p* is the number of different units in R and q is the number of parameters to estimate. A second indicator adjusted to the variable called T AML is defined as follows: 2 with the condition that c 2 should be between r and c 1 and the value of T COR2 should be between the values of T COR1 and T RM where p RML and p AML correspond to the significance levels associated with the statistics of the χ 2 distribution and with the degrees of freedom of the measurement model.

Imputation estimation
The lost percentage effect was studied from the statistical estimation of Little's χ 2 test to obtain a MCAR pattern. Little's test indicates a reasonable adjustment to a characteristic MCAR pattern (Little's χ 2 test = 34.024, df = 27, p > 0.05).
In the case of the categorical variable ID diagnostic, the prediction models based on multinomial logistic models and linear discriminant analyses were studied but none of the models tested offered statistically significant results (for instance, in the case of discriminant analysis, the Wilks' λ values ranged from 0.750 to 0.964).
In view of the impossibility of using multiple imputations in this variable, it was decided to use simple imputation based on linear interpolation. The results after applying simple imputation shown that the median, mode and absolute deviations median for both variables were the same.
In relation to the quantitative variables, the mean SE of all imputed means were calculated through Rubin (1976)  In all cases, the Rubin (1976) estimates are higher in the LR solution. Likewise, the distribution of Rubin's proposal does not present a standardized solution that facilitates interpretation, nor does it fix or recommend reference values from which a suitable solution is considered in a given imputation. Therefore, the imputation fit index (IFI) were calculated. An iteration process was generated by fixing a maximum of iterations (k = 10) with the PMM method. It is important to remember that, given the small sample, the number of iterations recommended (Schafer and Graham 2002) was doubled to guarantee that an adequate solution was obtained.After performing the different IFI estimations for each variable, values lower than or equal to − 1.65 were found for the 3 variables. Table 2 shows the IFI results in the 10 iterations performed. In this first step, an adequate value of IFI ( Z IFI ≤ −1, 65) was obtained from the imputations performed with 3 variables (Total PA, Vector Magnitude and cardiorespiratory fitness). In the third column of Table 2, the iterations number for an adequate solution are shown.

Confirmatory factor analysis
Once the initial data database was reconstructed, a measurement model was established to define the CR latent factor for people with DS. This model was based on the works mentioned in the introduction with some adaptations because it is addressed to a specific population. The model proposed is composed of four latent factors (personal conditions, quality of life, cognitive outcome and physical activity). Figure 1 shows these factors with their correspondent direct indicators.
As shown in the previous figure, we have four first order latent factors, each defined by the indicators in Fig. 1 and only one second-order factor. To guarantee the range condition described above, the maximum number of variables possible must be a number that allows the degrees of freedom to be equal or greater than 0. For this, the number of possible free parameters to be estimated must be known. In this case, using all the variables in Fig. 1, the free parameters would be the following: Matrix Λx = 12 factor saturations; Matrix Φ = 6 correlation coefficients (as many as correlations other than 1 among the first-order factors); Matrix Θ δ = 12 estimates of the variance of the measurement error. There are no other parameters to estimate since no correlations have been established between the variances of the measurement residuals [E (θ δij , θ δkm ) = 0)]. Therefore, the number of free parameters to estimate is 30. Given this value, the minimum number of possible variables is derived from the expression df = ½ [(p · p + 1)] − q, where p is the possible number of variables and q is the number of parameters. In these terms, the number of variables (p) must be greater than 8 to guarantee that df > 0.  Table 2 Variables to impute, better result in Z IFI and iteration in which the better result was found Step 1 indicates the first iteration round Variables (m) Z IFI Iteration (k) Step 1: Iteration's start (max k = 10) In our case, the condition is confirmed; therefore, the estimates are feasible using Bayesian estimation in accordance with what is proposed by Lee and Song (2004), Muthén and Asparouhov (2012) and Smid et al. (2020). Figure 2 shows the parameter estimation and all loading factors are statistically significant.
Regarding the correlations between factors, they are very low in all the cases. The higher value is between factor 2 and factor 3, that are, quality of life and cognitive outcome.
Finally, Table 3 shows the adjustment values of the proposed CFA model, showing both the usual indices and those proposed in the Jiang and Yuan's work (2017) as an alternative to the adjustments for small samples. The T CORR1 , T CORR2 and T CORR3 values are greater than 0.9, and P CORR4 is greater than 0.10. For all of these, we could say that the four corrections made by Jiang and Yuan (2017) demonstrate a good fit to the model. In the case of Bayesian Information Criteria (BIC) values, a lower value is preferable (Asparouhov et al. 2015).

Discussion
This study had to main objectives. The first aim of the present study was to propose a solution to overcome the problems of missing data and small sample size. MI was performed in the database, and two corrections for the small sample size were also done. The second aim of the paper was to estimate a CR model for DS people. A confirmatory factor analysis was developed and, as an applied point of view, the global fit of the model suggests the possibility of operationalizing the latent factor CR in a simple way to measure it in the Down Syndrome population.
More concretely, regarding the first aim of the study, all the variables followed MCAR distribution and two types of MI were made, LR and PMM. Once all variables used were completed, CFA with Bayesian estimations was performed. However, to solve the sample size problem, two additional corrections were made prior to the final analysis: Jiang and Obviously, we must assume that working with missing data and small sample size is not a good scenario. Even if we add some recommendations, it is not a perfect stage to work with. Therefore, what we aim to present in this paper is only valid when resampling is not viable and the population of interest is reduced. In the case of DS and considering the variables involved, this is the perfect situation to apply these techniques.
Past research on missing imputation has focused on which technique is the best or what point is acceptable to impute missing data (Rhetmulla and Little 2012; Little 2014). However, there is an important issue that directly affects these questions, the sample size. Our results focus on this aspect that affects all the decisions made from the first step. Our results illustrate several concerns about multiple imputation. Although the evaluation of imputations can be made by traditional estimations (Rubin 1976) and is widely used, this technique seems slightly weak because it does not directly compare the SE obtained in the original distribution and in the solution after multiple imputation. Moreover, the estimation does not evaluate which iteration works better. Therefore, to address these limitations, the use of our index (IFI) seems to be a good solution. This indicator allows us to know the point at which imputation generates a variable that behaves in a very similar way to the original one. Regarding the number of iterations made in the multiple imputations, it is important to note that although some authors argue that 5 iterations are sufficient to obtain good results (Montenegro and Chesnut 2015) and that the last iteration is normally the best, in the present paper we find that in many cases the best iteration is the second one. As mentioned above, the proposed IFI indicator has demonstrated to be a good tool to evaluate the quality of missing imputation.
In reference to the posterior analysis, a Bayesian CFA was conducted with non-informative priors. As noted, this analysis attempted to operationalize the concept of CR in DS.  Regarding the present analysis, two specific points should be highlighted. First, we do not recommend the use of frequentist inferential techniques because of the limited sample size. Second, the use of Bayesian estimation could generate the problem of using informative or noninformative priors (Smid et al. 2020). These authors also note that the analysis with informative priors would be much stronger and much more potent in the case of small sample size, but many times, as in our case, this information is not available. The corrections that we proposed, based on Jiang and Yuan's (2017) seem to be a good approximation to study the global fit of BSEM. In the same line, our results show the capacity of conservative strategies such as the Jackknife correlation (in particular, the use of the low confidence interval value) to estimate the R matrix as an input to BSEM. The combination of both strategies makes it possible to generate a statistical framework that is more adequate to solve the small sample problem and limitations.
As an applied point of view, and regarding the second aim of this paper, the acceptable fit of the CFA model suggests the possibility of operationalizing the latent factors of CR to measure it in the DS population. The results of the individual Bayesian parameter estimation, as well the global fit, are evidence of this possibility. As proven above, Quality of Life, Cognitive Outcome, and Physical Activity can influence and mediate the concept of CR. Moreover, although years of formal education have been the principal variable studied to operationalize this concept (Valenzuela and Sachdev 2006), it is tough to measure in ID population. As mentioned above, the perspective of conceptualizing CR as a latent construct defined with different measures seems the clearer one. Obviously, according to the real properties of the sampling, this model does not allow any extrapolation for the general population and must be used strictly as an operational definition by our sample. However, it is an empirical confirmation and therefore an optimal measure for the operationalization of CR in persons with DS, at least in our sample. As it has been mentioned above, the prevalence of dementia in this population is much higher than in other ID populations and general population, this could be an initial phase in order to study the big differences encountered in DS population regarding the appearance of dementia.
These results are not exempt from some limitations. We did not study the statistical properties of the IFI to achieve a more adjusted significance or use simulations to estimate it's sensitivity specificity. Another important limitation of this study is the use of noninformative priors in the Bayesian estimation of the CFA. Clearly, more efforts are needed to develop better-performing statistics with small sample sizes and missing data in the sample. However, in relation to the CR study, this paper provides the first approximation of this concept to the DS population and may provide a starting point to study the individual differences in the clinical and neuropathological appearance of dementias.