Development of measures for the Inner Setting constructs
The development of measures occurred in 4 phases: first, we identified constructs of interest and compiled existing measures for those constructs; second, we generated items for each construct of interest by adapting items from existing measures and developing new items, creating a set of preliminary measures; third, we pilot-tested and refined the preliminary measures; and fourth, we conducted a validation study with the refined measures. Because our goal was to develop measures of constructs that could potentially be targets for implementation interventions and could be administered feasibly within FQHCs, we chose CFIR constructs that were relevant for FQHCs, modifiable, and hypothesized to be measurable with few items.
For all the steps described above, we used a consensus development process. We made decisions about which constructs to include at a CPCRN meeting that included CPCRN investigators and other implementation science experts. We discussed each Inner Setting construct and sub-construct and chose a preliminary set of constructs based on expert opinion about their importance, changeability, and feasibility of measurement. Following the in-person meeting, the CPCRN FQHC Workgroup held two more in-person meetings and a series of teleconference discussions to make final decisions on the constructs and the other development steps described above. We ultimately selected 15 of the 37 CFIR constructs for measure development. Among these were 5 constructs within the Inner Setting domain: Culture, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources. Each CPCRN site then took the lead on searching for items for one or more constructs, and the team held weekly meetings for several months, making decisions collectively about the items chosen, as described below.
Identification and selection of items
We began our identification of the Inner Setting measures by drawing on existing surveys that had been administered in FQHCs. Specifically, we reviewed a survey created by the Association of Asian Pacific Community Health Organizations (AAPCHO) to study capacity for implementation of evidence-based interventions for cancer screening. We chose the AAPCHO survey because it was closely related to our aims and allowed us to build on previous work. This survey included the Practice Adaptive Reserve (PAR) scale, which had previously been used in the evaluation of the national Patient-Centered Medical Home Demonstration Project [17,18,19]. First, we identified items from the AAPCHO survey that matched CFIR constructs based on the construct definitions and the face validity of items; we held multiple group discussions to reach consensus on each “match.” For constructs that had no matching items in the AAPCHO survey, or had items that did not fully reflect their definitions, we conducted a literature search for other existing measures. We started with the models and frameworks included in the CFIR to see whether they referred to measures of specific constructs. We also searched the electronic databases PubMed, CINAHL, ISI Web of Science, and PsycINFO for peer-reviewed articles published in the past 15 years to identify relevant measures, using search terms such as CFIR, inner setting, implementation culture, and other construct names. In addition, we reviewed measures listed on the Grid-Enabled Measures (GEM) and Society for Implementation Research Collaboration (SIRC) websites. We then compiled all the potential measures for those constructs and held extensive discussions to select items from each. We used the following criteria for item selection: (1) items fit the CFIR definition of the construct; (2) items had been used in health-related settings (e.g., public health, healthcare, mental health, and schools) and were relevant for FQHCs or could be adapted to the FQHC setting; and (3) items fit the goals of the survey and came from published studies whose measures demonstrated some evidence of reliability (e.g., internal consistency) and validity (e.g., construct validity) in previous research.
In searching for Culture measures, we identified two sub-constructs not explicitly listed in the CFIR, Stress and Effort, which were assessed separately. We included these sub-constructs in addition to a more general measure of culture because the workgroup members believed that, while related, these constructs were likely distinct. Our final list of the Inner Setting measures therefore included 38 items measuring 7 constructs and sub-constructs: Culture Overall (CFIR construct; 9 items), Culture Stress (sub-construct based on the work of Patterson; 4 items), Culture Effort (sub-construct based on the work of Lehman; 5 items), Implementation Climate (CFIR construct; 4 items), Learning Climate (CFIR sub-construct; 4 items), Leadership Engagement (CFIR construct; 4 items), and Available Resources (CFIR sub-construct; 7 items). Definitions for each Inner Setting construct and sub-construct are provided in Table 1.
Item adaptation and survey development
The identification of measures made it clear that some constructs could be measured generally, that is, they did not need to be tied to a particular implementation effort or evidence-based approach (EBA), while others required anchoring to the specific EBA an item referred to. Selected items were adapted for the context of improving colorectal cancer (CRC) screening in FQHCs. For intervention-specific constructs, such as Implementation Climate, items were also adapted to the specific EBA for CRC screening that the FQHC was implementing (captured in another section of the survey). EBA options were selected from those recommended by the Guide to Community Preventive Services (Community Guide) for increasing CRC screening (www.thecommunityguide.org).
Additionally, since we were interested in understanding factors influencing implementation of several EBAs for increasing CRC screening, participants were first asked about the level of implementation of each Community Guide-recommended EBA and then asked questions related to CFIR constructs that were specific to the EBA being implemented. Because of constraints on the length of the survey, when a respondent indicated that the FQHC was implementing more than one EBA, subsequent questions on CFIR constructs referred to only one of the EBAs mentioned. The survey automatically inserted a single EBA using the following prioritization: provider reminders first, followed by patient reminders, one-on-one education, and provider assessment and feedback. For example, if the clinic responded that it was implementing both provider reminders and one-on-one education, the follow-up questions would insert provider reminders. As an example of item adaptation, the Implementation Climate item from Klein et al., “the program is a top priority in the company,” became “Using <EBA> to increase CRC screening rates is a top priority in the clinic” in our measure. Depending on which EBAs the clinic used, as indicated by previous answers, the question appeared online with the specific EBA inserted. Table 1 indicates whether an item was general or specific to an EBA.
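To make the piping rule concrete, the following Python sketch applies the prioritization described above. The priority order comes directly from the text; the function and variable names are ours and hypothetical, not the survey platform's actual code.

```python
# Hypothetical sketch of the survey's EBA piping logic; the priority order
# is from the text, the names are ours.
EBA_PRIORITY = [
    "provider reminders",
    "patient reminders",
    "one-on-one education",
    "provider assessment and feedback",
]

def select_anchor_eba(implemented_ebas):
    """Return the single EBA inserted into EBA-specific follow-up items."""
    for eba in EBA_PRIORITY:
        if eba in implemented_ebas:
            return eba
    return None  # clinic reported no Community Guide EBA

# A clinic implementing both provider reminders and one-on-one education
# gets follow-up items anchored to provider reminders.
assert select_anchor_eba({"one-on-one education", "provider reminders"}) == "provider reminders"
```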
Pilot testing and refinement
We programmed a web-based survey and pilot-tested it in 4 FQHCs in 2 states (WA and TX). We also sought input from leaders at individual FQHCs and at states’ Primary Care Associations (PCAs) to ensure the appropriateness of the measures for FQHC clinic staff. More specifically, we asked leaders to review the constructs for their importance and changeability, and the items for their clarity and how well they represented the constructs. We then held telephone meetings with leaders to discuss feedback. Feedback from leaders confirmed our selection of constructs and led to minor changes in the wording of some items.
Recruitment and survey administration
We used a variety of strategies to recruit FQHCs to participate in the study. While survey administration was standardized, recruitment protocols were tailored based on each participating state’s existing CPCRN partnerships with FQHCs. Five CPCRN sites (WA, SC, TX, GA, CO) partnered with their state’s PCA. In 4 of these states (WA, TX, SC, CO), the PCA emailed its member FQHCs encouraging them to participate in the survey. Five CPCRN sites that had existing relationships with FQHCs (TX, GA, CA, CO, MO) invited them to participate by contacting them directly through email, telephone calls, or in-person meetings. One state PCA (SC) also recruited participants directly at a meeting of FQHC staff members.
In most cases, one individual from each participating FQHC, usually the clinic’s medical or administrative director, was designated as the main contact. This individual was asked to complete questions about clinic characteristics and to send an introductory email with a link to the online FQHC CFIR survey to eligible staff members encouraging their participation. The online FQHC CFIR survey was programmed to allow a maximum of 10 staff from each clinic to complete the survey, with a maximum of 3 providers (physicians, nurse practitioners, and physician assistants), 3 nurses or quality improvement staff, and 4 medical assistants (non-medical administrative staff were excluded). Between January 2013 and May 2013, providers and staff at FQHC clinics located in CA, CO, GA, MO, SC, TX, and WA completed the survey. Reminder emails were sent to potential participants at 2, 4, 6, and 8 weeks post-invitation. Incentives were offered either to individuals completing the survey or to the FQHC, whichever the FQHC preferred. If the clinic chose the individual incentive, participants received $25 gift cards; FQHCs that chose the clinic incentive received $250. One FQHC declined any incentives. All study procedures were approved by the Institutional Review Boards of each CPCRN Collaborating Center as well as the Coordinating Center at the University of North Carolina at Chapel Hill and the CDC.
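As an illustration of the response caps just described, here is a minimal Python sketch; it is ours, the role labels are hypothetical, and the survey platform's actual implementation is not reported.

```python
# Per-clinic caps from the text: at most 10 respondents total, with at most
# 3 providers, 3 nurses/QI staff, and 4 medical assistants.
ROLE_CAPS = {"provider": 3, "nurse_or_qi": 3, "medical_assistant": 4}

def may_start_survey(completed_counts, role):
    """Admit a respondent only if neither the role cap nor the overall cap of 10 is reached."""
    if role not in ROLE_CAPS:
        return False  # non-medical administrative staff were excluded
    total = sum(completed_counts.get(r, 0) for r in ROLE_CAPS)
    return completed_counts.get(role, 0) < ROLE_CAPS[role] and total < 10
```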
Data analysis
We assessed descriptive statistics for clinics that responded to the clinic characteristics survey (n = 52) and demographic information from FQHC CFIR survey respondents (n = 327). We also assessed descriptive statistics for the FQHC CFIR survey measurement items. Since we collected data from individuals nested within clinics to measure clinic-level constructs, we used a series of confirmatory factor analysis (CFA) models to test factor structure. We first conducted single-level CFA models, adjusting for the nested structure of the data, for each of the following constructs: Culture Overall, Culture Stress, Culture Effort, Implementation Climate, Learning Climate, Leadership Engagement, and Available Resources. We used full information maximum likelihood estimation with robust standard errors to account for missing data and non-normality of survey items, and we adjusted for the nested structure of the data using the TYPE = COMPLEX command in Mplus. We used multiple indices to evaluate model fit, as recommended in the literature [23,24,25,26]: chi-square (non-significant value = good fit), comparative fit index (CFI; > 0.90 = adequate fit, > 0.95 = good fit), Tucker–Lewis index (TLI; > 0.90 = adequate fit, > 0.95 = good fit), standardized root mean square residual (SRMR; < 0.08 = adequate fit, < 0.05 = good fit), and root mean square error of approximation (RMSEA; < 0.08 = adequate fit, < 0.05 = good fit). We considered model adjustments if modification indices revealed substantial model improvements that were theoretically meaningful (e.g., reverse-coded items or items that referred to a specific EBA versus a general EBA).
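For illustration only, the cutoffs above can be written as a small decision helper. This Python sketch is ours (the CFA models themselves were estimated in Mplus), and the function name and the "good/adequate/poor" labels are hypothetical.

```python
# Rate each fit index against the cutoffs listed in the text.
def rate_fit(chi_sq_p, cfi, tli, srmr, rmsea):
    """Return a dict rating each index as 'good', 'adequate', or 'poor'."""
    def band(value, adequate, good, higher_is_better):
        if higher_is_better:  # CFI/TLI: larger is better
            return "good" if value > good else "adequate" if value > adequate else "poor"
        return "good" if value < good else "adequate" if value < adequate else "poor"
    return {
        "chi-square": "good" if chi_sq_p > 0.05 else "poor",  # non-significant = good fit
        "CFI": band(cfi, 0.90, 0.95, True),
        "TLI": band(tli, 0.90, 0.95, True),
        "SRMR": band(srmr, 0.08, 0.05, False),
        "RMSEA": band(rmsea, 0.08, 0.05, False),
    }
```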
We then conducted two sets of multilevel CFA models for each respective construct. Multilevel models allow for modeling the factor structure at the within-group or individual level (level 1) and the between-group or clinic level (level 2), as illustrated in Fig. 1 [27, 28]. This approach allowed us to test whether the factor structure was similar at the individual level and the clinic level, which is assumed when individual data alone are modeled to represent a higher level. In the first set of multilevel models, we allowed factor loadings at both levels to be freely estimated to test unrestricted models. We then tested a set of models in which we constrained factor loadings to be equal across levels to determine whether items loaded similarly at the individual (within-group) and clinic (between-group) levels. We compared the model fit of the constrained and unconstrained models for each respective factor using Satorra–Bentler scaled chi-square difference tests. To assess fit for multilevel models, we used the same fit indices as listed previously, including the SRMR, which is presented separately for the individual and clinic levels for each model.
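The Satorra–Bentler scaled difference test cannot be computed by simply subtracting the two robust chi-square values; it requires each model's scaling correction factor. A minimal Python sketch of the standard computation follows (ours; the analysis itself was run in Mplus), assuming t0/c0/d0 are the robust chi-square, scaling correction factor, and degrees of freedom of the constrained model, and t1/c1/d1 those of the unconstrained model.

```python
from scipy.stats import chi2

def sb_scaled_diff(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference test for nested robust-ML models."""
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)   # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd         # scaled chi-square difference statistic
    df = d0 - d1
    return trd, df, chi2.sf(trd, df)       # statistic, df, p-value
```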
To evaluate internal consistency, we computed Cronbach’s alpha for each of the scales. We also examined discriminant validity by calculating correlation coefficients for each pair of scales using individual-level data and data aggregated by clinic (to represent the clinic level). To further assess the reliability of mean scale scores aggregated at the clinic level, we computed two intraclass correlation coefficients, ICC(1) and ICC(2), using one-way random-effects ANOVA. ICC(1) estimates the proportion of variance in a measure that is explained by group membership (FQHC clinic); the larger the value of ICC(1), the greater the agreement, or shared perception, among raters within a group. ICC(2) indicates the reliability of the group-level mean scores. It varies as a function of ICC(1) and group size: the larger the value of ICC(1) and the larger the group size, the greater the value of ICC(2) and thus the more reliable the group mean score. As recommended in the literature [30, 31], we used a threshold of 0.70 to indicate a reliable group score.
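A Python sketch of ICC(1) and ICC(2) from a one-way random-effects ANOVA follows (ours; the authors used SPSS). It assumes `groups` holds one array of individual scale scores per clinic and, as a simplification for unbalanced data, uses the average group size.

```python
import numpy as np

def icc_oneway(groups):
    """ICC(1) and ICC(2) from one-way ANOVA; `groups` is a list of 1-D score arrays."""
    n = np.array([len(g) for g in groups])
    grand = np.concatenate(groups).mean()
    means = np.array([g.mean() for g in groups])
    ms_between = np.sum(n * (means - grand) ** 2) / (len(groups) - 1)
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n.sum() - len(groups))
    k = n.mean()  # average group size (simplification for unequal groups)
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between  # reliability of group means
    return icc1, icc2
```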
Finally, we computed an index of inter-rater agreement, the rWG(J), to further assess the validity of clinic-level means as measures of clinic-level constructs. The rWG(J) index indicates the degree of agreement among raters by comparing within-group variances to the variance expected under a null distribution representing no agreement. An rWG(J) score above 0.70 indicates sufficient inter-rater agreement to compute FQHC clinic-level means for clinic-level constructs. ICC(1), ICC(2), and rWG(J) statistics at the clinic level were computed for clinics with two or more respondents, so clinics with only one respondent were dropped from these analyses. We used Mplus version 7.31 for testing all CFA models. To compute Cronbach’s alpha, correlation coefficients, ICC(1), ICC(2), and rWG(J), we used SPSS version 23.
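A Python sketch of rWG(J) for a single clinic follows (ours; the authors used SPSS). It assumes the common uniform null distribution over A response options, for which the expected variance is (A² − 1)/12.

```python
import numpy as np

def rwg_j(ratings, n_options):
    """rWG(J) for one group; `ratings` is an (n_respondents x J_items) array."""
    j = ratings.shape[1]                          # number of items in the scale
    s2_mean = ratings.var(axis=0, ddof=1).mean()  # mean observed item variance
    sigma_eu = (n_options ** 2 - 1) / 12.0        # null (uniform) variance
    ratio = s2_mean / sigma_eu
    return (j * (1 - ratio)) / (j * (1 - ratio) + ratio)
```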