Background

Stakeholder engagement in research is the process of ensuring that key community health constituents are identified and involved throughout the research process as partners (as investigators, not participants). Ideally this involvement starts before project inception, so that stakeholders are able to inform study design, implementation, and interpretation of results, and to make use of the results when the study is completed [1]. There has been a call for better reporting and evaluation of engagement approaches, initiatives, and activities to advance the science of stakeholder engagement [2]. The engagement of stakeholders (e.g., patients and their families, clinicians, health systems, policy makers, community organizations, advocacy groups) in research projects has generated lessons learned and best practices. However, few methods exist for measuring the extent to which stakeholders are engaged in a research project (e.g., the quality of engagement efforts), limiting the ability to identify evidence-based approaches for stakeholder engagement [2]. This poses two major problems for advancing stakeholder-engaged research. The first is that it is difficult to compare the effectiveness of the various strategies different research teams employ to incorporate stakeholder views and input. The second is that it is difficult to determine the effect of stakeholder-engaged research practices on rates of program adoption and the success of implementation.

Currently, researchers must work from a set of case studies and ‘best practices’ recommendations (e.g., actively seeking collaboration with diverse populations, offering many opportunities to give input in a variety of formats and venues, going to where people are, being transparent and trustworthy). For instance, Holzer et al. use three case studies to demonstrate some key elements (e.g., building trust, encouraging participation, promoting uptake of findings) of successful approaches to community engagement in research [3]. However, the breadth of disciplines that undertake stakeholder engaged research impedes any kind of generalization of best practices. Furthermore, stakeholder engagement can occur at any stage of research, yet may look very different in the early stages of a research project (such as hypothesis development) as compared to the dissemination phase of a translational research project. It is impossible to gauge from the existing literature what level of engagement is necessary for a study and what types of engagement practices would be best given a particular population and research question.

Reviewers of the literature tend to suggest that community engagement practices have some positive impact on health improvement interventions for a range of health outcomes across various conditions. However, there is insufficient evidence to determine whether one particular model of community engagement is more effective than any other [4, 5]. Such reviews also note substantial variation in the effectiveness of different practices at improving interventions, without being able to determine whether any one approach consistently outperforms the rest [6, 7]. A systematic review found no evidence of impact from community engagement on population health or the quality of services, although engagement initiatives did have positive impacts on housing, crime, social capital, and community empowerment. Methodological developments are needed to enable studies of complex social interventions to provide robust evidence of population impact in relation to community engagement. With no consistent approach to measuring engagement, conducting analyses across multiple studies is ineffectual.

Current approaches to measuring stakeholder engagement focus largely on qualitative methods [8,9,10,11,12]. Despite their value for assessing engagement, these methods are difficult to scale up for large projects and produce results that are difficult to compare across studies and that do not generalize well into standard practices [13]. For these reasons, Bowen et al. called for the development of a quantitative scale, grounded in theory, that is comprehensive in evaluating all elements of engagement, is easy to use, and provides psychometric data [14]. Such a scale, the Research Engagement Survey Tool (REST), has been proposed [15, 16] and is comprehensively evaluated here. This paper examines the internal consistency (reliability) and convergent validity of the REST.

The original version of REST was developed by the evaluation team of the Program for the Elimination of Cancer Disparities (PECaD) at Siteman Cancer Center [17, 18] and pilot tested in one of its programs [13]. The original version of the REST was designed to align with 11 engagement principles selected by the PECaD’s community advisory board (Disparities Elimination Advisory Committee) based on the community based participatory research and community engagement literature [11, 19,20,21,22,23,24,25,26,27,28,29]. Subsequently, revisions to the measure have been made through a five round Delphi process [15, 16, 30] and cognitive response testing [31]. The final version examines eight engagement principles (EPs) [16], applicable along the full continuum of engagement activities [15]. The EPs are:

  1. Focus on community perspectives and determinants of health

  2. Partner input is vital

  3. Partnership sustainability to meet goals and objectives

  4. Foster co-learning, capacity building, and co-benefit for all partners

  5. Build on strengths and resources within the community or patient population

  6. Facilitate collaborative, equitable partnerships

  7. Involve all partners in the dissemination process

  8. Build and maintain trust in the partnership

Each EP is assessed using three to five items, each rated on two five-point Likert scales: quality (how well: poor, fair, good, very good, excellent) and quantity (how often: never, rarely, sometimes, often, always), each with an additional 'not applicable' option. The stem for the quantity scale is “Please rate how often the partners leading the research do each of the following” and the stem for the quality scale is “Please rate how well the partners leading the research do each of the following”. Measures exist for key constituent stakeholder groups (e.g., patients [32], community [33], community advisory boards [34], coalitions [35]); however, the REST is unique in that it is applicable to all non-academic research partners and is based on their perspective.

Ideally, such a psychometric scale has both high reliability (it consistently records the same values for a research project's stakeholder engagement regardless of who completes it) and high validity (it draws accurate conclusions about the presence and degree of stakeholder engagement) [36]. Cronbach’s alpha is a well-developed measure of reliability, characterizing how strongly the items within each EP resemble each other [37], and is robust to the sample size of surveys conducted [38]. Nunnally and Bernstein propose a value of alpha = 0.80 as a satisfactory level of reliability, beyond which further decreasing measurement error has little effect on the value of alpha [39]. Ideally, the validity of a scale would be evaluated by demonstrating that the tool produces results that agree with a ‘gold standard’ test. No gold standard exists for measuring stakeholder engagement; therefore, convergent validity (the degree to which two measures of constructs that theoretically should be related are in fact related) was used to assess construct validity by comparing the results from the REST to a number of other scales that similarly measure engagement (e.g., the Partnership Assessment In community-based Research [40]). Included in this comparison are also tools for measuring constructs that are theoretically associated with strong stakeholder engagement (e.g., trust in medical researchers and health literacy). This paper presents the analysis of reliability and construct validity of the REST measure.
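For reference, Cronbach's alpha for an engagement principle with k items takes the standard textbook form (not specific to the REST):

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```

where \(\sigma^{2}_{Y_i}\) is the variance of item i and \(\sigma^{2}_{X}\) is the variance of the total score formed by summing the k items.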

Methods

Study overview

The study was composed of four longitudinal web-based surveys conducted between July 2017 and September 2019 (see Fig. 1). The modified versions of the REST presented on sequential surveys correspond to the versions revised through a Delphi process described in detail elsewhere [16, 30]. Surveys one through three contained measures assessing dimensions of collaboration, partnership, trust in medical researchers, and engagement that were used to determine convergent validity. The fourth survey, released in January 2019, contained the final version of the REST [16, 30, 31] and asked participants to review the categories of community engagement in research and their corresponding definitions [15] and to classify their project into one of these categories: (1) outreach and education, (2) consultation, (3) cooperation, (4) collaboration, and (5) partnership.

Fig. 1

Participants completed a short screening instrument. Those screened eligible were sent a link to the survey. Participants who completed the informed consent screen by agreeing to participate were considered enrolled. Surveys were open from July 2017 (depending on release date) to September 2019.

Participants

We recruited participants (community partners in research studies) through several different methods throughout the study period (July 2017 to August 2019). Our first recruitment approach consisted of email outreach to principal investigators (PIs) in the research team’s network involved with stakeholder-engaged research and to contacts in health departments, Clinical and Translational Science Awards (CTSA) Programs, Prevention Research Centers, Transdisciplinary Research in Energetics and Cancer Centers, National Institute on Minority Health and Health Disparities Centers of Excellence, National Cancer Institute Community Networks Programs, and U.S. Department of Health & Human Services Regional Health Equity Councils. We also developed a national database of community-engaged researchers and reached out to them via email. We asked PIs of community-engaged research studies to share information about this study with their community partners. To complement email recruitment, we conducted in-person recruitment by attending local (St. Louis, MO) health fairs and local community partner meetings, posting recruitment flyers locally, and attending national conferences related to community engagement. We also used recruitment resources at Washington University in St. Louis, including the Washington University Recruitment Enhancement Core and Research Match, a national health volunteer registry created by several academic institutions and supported by the National Institutes of Health as part of the CTSA program (https://www.researchmatch.org/).

Participants were included in the study if, based on an electronic screening instrument (Additional file 1: Table S1), they were currently or previously involved in stakeholder-engaged research, were over age 18, and were willing to participate in an 18-month longitudinal study. We removed participants who screened eligible but completed the survey multiple times, provided invalid telephone numbers or zip codes, or had odd patterns in their responses (N = 95) [41, 42]. We screened 675 people, of whom 527 (78%) were eligible. Of those eligible, 487 (92%) enrolled in the study (completed informed consent). Of those enrolled, 393 (81%) completed at least one of the four surveys, while 324 (67%) completed all four surveys. See Fig. 1 for the participant recruitment and survey timeline.

Procedures

Potential participants completed an eligibility screener (Additional file 1: Table S1), answering questions pertaining to the inclusion and exclusion criteria described above either online or in person. In-person screeners were completed either on paper or via a tablet with a member of the research team present. The vast majority of eligible participants provided an email address and were emailed a personalized link to the first survey within two business days. Participants recruited in person who did not have an email address (n = 8) were provided the first survey in person, completed it, and returned it to a member of the research team. After completion of the first survey, participants were provided with a $10 gift card.

For participants completing the surveys online, surveys two through four were emailed either on the survey release date or, for participants who enrolled after the initial survey release date, within five business days of completing the previous survey. When new surveys were released, they were sent to all enrolled participants, regardless of completion status for previous surveys. Participants who enrolled after initial survey release dates and had not yet completed the previous survey were sent a link to the subsequent survey within approximately four weeks of being sent the previous survey. All online surveys, including the eligibility screener, were administered through the survey platform Qualtrics (Provo, UT).

For participants completing the surveys on paper, a member of the research team brought the subsequent surveys to meetings both attended, and participants completed the surveys and returned them to the research team. Participants received $10 for completing survey two, and an extra $5 if they completed both surveys one and two. Participants received $15 per survey for completing surveys three and four, and an extra $10 if they completed both. The Institutional Review Boards at both Washington University in St. Louis and New York University approved all portions of this project.

Measures

Research Engagement Survey Tool (REST)

The original version of REST was developed and pilot tested by the evaluation team for the Program for the Elimination of Cancer Disparities at Siteman Cancer Center [13, 17, 18]. The original version (survey one) contained 48 items corresponding to 11 engagement principles (EPs). Each EP contained three to five items that were measured on two scales: quality (how well) and quantity (how often). For the quality scale, response options were Poor, Fair, Good, Very Good, Excellent. For the quantity scale, response options were Never, Rarely, Sometimes, Often, Always.

Three additional revised versions of REST were presented sequentially on surveys two through four. Revisions were made based on a modified Delphi Panel process and cognitive interviews that have been described in detail elsewhere [16, 30, 31]. On survey four, an additional response option of ‘Not Applicable’ was added based on feedback from a Delphi panel process and cognitive interviews described elsewhere [16, 31].

Scoring REST

The REST has two scoring approaches. The first is aligned with the EPs; not applicable responses are treated as missing in the analysis, and scoring is done at the EP level and overall. EP-specific scores were calculated as the average of non-missing items, and the eight EP means were averaged to calculate the overall REST score. This scoring approach is used to examine the internal consistency and convergent validity of the REST. The second scoring approach aligns the REST with the categories of community engagement in research and provides a percentage in each of five categories: (1) outreach and education, (2) consultation, (3) cooperation, (4) collaboration, and (5) partnership [15]. This scoring approach does not provide one overall score; rather, it yields five percentages (one for each engagement level) based on the number of REST items (out of 32 total) that are scored in each category (using the scoring scheme provided in Additional file 1: Table S2) based on the survey responses.
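As a minimal sketch of the first scoring approach (not the authors' production code), assuming responses are stored with one column per REST item coded 1–5, 'not applicable' already recoded to missing, and illustrative item and EP names:

```python
import pandas as pd

def score_rest(responses: pd.DataFrame, ep_items: dict) -> pd.DataFrame:
    """EP-level and overall REST scores.

    responses: one row per participant, one column per item (1-5; NaN = missing/NA).
    ep_items:  hypothetical mapping, e.g. {"EP1": ["item_1_1", "item_1_2"], ...}.
    """
    # EP-specific score: average of the non-missing items within the EP
    ep_scores = pd.DataFrame(
        {ep: responses[items].mean(axis=1, skipna=True) for ep, items in ep_items.items()}
    )
    # Overall REST score: average of the eight EP means
    ep_scores["overall"] = ep_scores.mean(axis=1, skipna=True)
    return ep_scores
```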

To develop the second scoring approach, we reviewed each item against the definitions of the categories of engagement and identified the category for each response (Additional file 1: Table S2). For example, for item 1.4, “The focus is on cultural factors that influence health behaviors,” we classified it as follows:

  • For quality: poor = outreach & education, fair = outreach & education, good = outreach & education, very good = consultation, excellent = cooperation. This means that if a participant responds poor, fair, or good, one point is added to the outreach & education category; if the participant responds very good, one point is added to the consultation category; and if the participant responds excellent, one point is added to the cooperation category.

  • For quantity: never = outreach & education, rarely = outreach & education, sometimes = outreach & education, often = consultation, always = consultation. This means that if a participant responds never, rarely, or sometimes to item 1.4, one point is added to the outreach & education category; if the participant responds often, one point is added to the consultation category; and if the participant responds always, one point is added to the consultation category.

A similar process was followed for each item (Additional file 1: Table S2). To calculate the overall score for each survey respondent, we gave the participant a point in the category of engagement corresponding to their response for each item (Additional file 1: Table S2). For example, if a participant responded good for item 1.4 on the quality scale, they would receive one point in the outreach & education category. Then, for each survey respondent, we summed the points for each category of engagement and calculated a percentage with 32 as the denominator, since the version of the REST that participants completed had 32 items (comprehensive version). We examine the average percentages by category of engagement.
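A sketch of this second scoring approach, using only the item 1.4 quality-scale mapping from the example above; the full mapping would come from Additional file 1: Table S2, and the dictionary keys here are illustrative:

```python
# Excerpt of a hypothetical classification lookup; the complete version would
# cover all 32 items on both the quality and quantity scales (Table S2).
CATEGORY_MAP = {
    ("1.4", "quality"): {
        "poor": "outreach & education",
        "fair": "outreach & education",
        "good": "outreach & education",
        "very good": "consultation",
        "excellent": "cooperation",
    },
}

CATEGORIES = ["outreach & education", "consultation", "cooperation",
              "collaboration", "partnership"]
N_ITEMS = 32  # comprehensive version of REST

def category_percentages(responses: dict) -> dict:
    """responses: {(item, scale): response text} for one participant."""
    counts = {c: 0 for c in CATEGORIES}
    for key, answer in responses.items():
        category = CATEGORY_MAP.get(key, {}).get(answer)
        if category is not None:
            counts[category] += 1  # one point per item in its mapped category
    # Percentage of the 32 items falling in each engagement category
    return {c: 100 * n / N_ITEMS for c, n in counts.items()}
```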

Other measures

On survey one, we included measures of health literacy [43, 44], subjective numeracy [45], medical mistrust [46], trust in medical researchers [47], a survey of community engagement [34], and the Partnership Assessment In community-based Research (PAIR) [40]. The measure of medical mistrust [46] was calculated as an unweighted sum score with 12 subtracted from the total. The trust in medical researchers score [47] was calculated as a percentage of the total range. For both the medical mistrust and trust in medical researchers scores, higher values indicate more trust in medical researchers. The Kagan et al. [34] summary score was calculated similarly to the REST, as a weighted average over the three sub-sections of community involvement, relevance of research, and collaboration & communication. The Kagan et al. survey is a measure of the extent to which community advisory boards (CABs) are involved in research activities. The PAIR [40] measure was also calculated similarly, with a mean score for each dimension (communication, collaboration, evaluation/continuous improvement, benefits, and partnership) and the dimension means then averaged to create an overall score. The PAIR measure is designed to evaluate partnerships between community members and researchers. For both the Kagan et al. and PAIR measures, higher scores indicate higher engagement or a more developed partnership.
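The two score transformations described above can be sketched as follows; the minimum and maximum possible totals depend on the instruments and are left as parameters here rather than asserted:

```python
def mistrust_sum_score(item_responses):
    """Unweighted sum of item responses with 12 subtracted, as described
    for the medical mistrust measure."""
    return sum(item_responses) - 12

def percent_of_range(total, min_possible, max_possible):
    """Raw scale total expressed as a percentage of its possible range, as
    described for the trust in medical researchers score; the range bounds
    are supplied by the caller rather than assumed here."""
    return 100 * (total - min_possible) / (max_possible - min_possible)
```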

On survey two, we included the community engagement research index (CERI) [11], the trust subscale of the coalition self-assessment survey (CSAS) [35], and the community campus partnerships for health (CCPH) principles [48]. The CERI measures the level of community participation in research, the trust subscale of the CSAS examines trust among coalition members, and the CCPH principles measure collaborative partnerships between the community and academic institutions. The CERI was calculated according to Khodyakov et al. [11] by creating a summed index score over the 12 items, with higher scores indicating more engagement in research. The trust portion of the CSAS was calculated as an average score [35], with higher values indicating higher trust.

On survey three, we included the partnership self-assessment tool (PSAT) [49, 50] and the Wilder collaboration inventory [51, 52]. The PSAT includes measures of 11 dimensions: (1) synergy, (2) leadership, (3) efficiency, (4) administration & management, (5) non-financial resources, (6) financial resources, (7) decision making, (8) benefits, (9) drawbacks, (10) comparing benefits and drawbacks, and (11) satisfaction. Each dimension has several items that are averaged together to create the overall dimension score, with higher scores indicating higher levels of the dimension, except for the benefits and drawbacks scales, which are created as percentage scores, and the comparison of benefits and drawbacks, which consists of only one item [49, 50]. The Wilder collaboration inventory contains 40 total items pertaining to 20 factors of collaboration (one to three items per factor), within six overall categories of collaboration (environment, member characteristics, process/structure, communication, purpose, and resources; two to six factors per category), that are averaged to create an overall score [51, 52].

Demographic questions (age, gender, race, ethnicity, education level, region) and project description questions were presented on survey one; however, if a participant had not responded to survey one before being sent subsequent surveys, the demographic and project description questions were asked on whichever survey the participant completed first. Age was measured continuously in years, and gender was coded as male, female, or other. Race and ethnicity were asked as two separate questions but were combined into categories of Non-Hispanic/Latino(a) Black, Non-Hispanic/Latino(a) White, Hispanic, Asian, and Other/Multiracial/Unknown. Education level was coded as less than high school, high school degree or GED, some college or associates degree, college degree, or graduate degree. Region was coded as northeast, west, south, midwest, and non-state area (includes the Virgin Islands and Puerto Rico). Project description questions included the following: an open-ended description of the project and its purpose, the participant's project role, how long the participant had worked on the project, and how long the participant had collaborated with the academic/university partner.

Statistical analysis

Descriptive statistics including the mean, median, and standard deviation were calculated by item, by EP, and for the overall measure. Frequencies and percentages of ‘not applicable’ responses by item were also calculated. We calculated Cronbach’s alpha for each EP of the REST to assess internal consistency. To measure convergent validity of the REST with other similar constructs, we calculated Spearman’s correlation coefficients between the REST and the other measures (i.e., Trust in Medical Researchers, Medical Mistrust, PAIR, CERI, Wilder collaboration inventory, CSAS, PSAT, and the Kagan survey of community engagement). The addition of the ‘not applicable’ response option led to a larger number of missing responses on the version of the REST presented on survey four. We therefore conducted sensitivity analyses throughout, using the sample of only those with no missing items, and examined differences between the results. We conducted all aforementioned analyses for both the quality and quantity response scales of the REST. All statistical analyses were conducted in SAS® version 9.4.
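The analyses were conducted in SAS; for illustration only, here is a rough Python analogue of the two core computations (Cronbach's alpha per EP and a Spearman correlation for convergent validity), assuming numeric arrays with missing values coded as NaN:

```python
import numpy as np
from scipy import stats

def cronbach_alpha(items: np.ndarray) -> float:
    """items: participants x items matrix for one EP (complete cases only)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def spearman_validity(rest_scores: np.ndarray, other_scores: np.ndarray):
    """Spearman correlation between REST and a comparison measure,
    dropping participant pairs with missing values."""
    mask = ~(np.isnan(rest_scores) | np.isnan(other_scores))
    rho, p = stats.spearmanr(rest_scores[mask], other_scores[mask])
    return rho, p
```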

Results

The majority of participants were female (80%), from the Midwest region of the United States (53%), and had a college degree or higher level of education (75%). The participants were mostly either non-Hispanic/Latino(a) Black (41%) or non-Hispanic/Latino(a) White (42%), and the mean age was 42 years (Table 1).

Table 1 Demographic characteristics of participants who enrolled (n = 487) and completed survey 4 (n = 336)

REST summary and scores

EP means for the quality scale range from 3.6 to 3.8 (where 1 = poor and 5 = excellent), while means for the quantity scale range from 3.7 to 4.0 (where 1 = never and 5 = always). The mean score on the overall quality version of REST was 3.6 (95% CI 3.5, 3.7) and the overall quantity mean score was 3.9 (95% CI 3.8, 4.0) (Table 2).

Table 2 Mean (95% confidence interval) and Cronbach’s alpha for engagement principles—final version of REST

When examining how the REST aligns with the categories of stakeholder engagement in research (see Additional file 1: Table S2 for classification information), the percentages by category ranged widely, as participants in our sample were engaged in many different types of projects across the stakeholder engagement continuum. For the quality scale of the REST, percentages by category of stakeholder engagement ranged from 0 to 100% for outreach and education with a median of 6%; 0–75% for consultation with a median of 9%; 0–81% for cooperation with a median of 25%; 0–75% for collaboration with a median of 41%; and 0–22% for partnership with a median of 3%. For the quantity scale of the REST, percentages ranged from 3 to 84% for outreach and education with a median of 3%; 0–75% for consultation with a median of 9%; 0–78% for cooperation with a median of 22%; 0–75% for collaboration with a median of 47%; and 0–25% for partnership with a median of 6%.

Item-specific summary statistics are presented in Table 3. Overall, item means were typically higher for the quantity scale than for the quality scale. The median for all items was 4.0, except for item 7.3 (“All partners have the opportunity to be coauthors when the work is published.”) on the quality scale, where the median was 3.0. The item with the highest mean score on both the quantity and quality scales was item 8.4 (“All partners respect the population being served.”). The item with the lowest mean score on both scales was item 7.3 (“All partners have the opportunity to be coauthors when the work is published.”).

Table 3 Item information summary

Six items (19%) had a large proportion of not applicable responses (> 5%). Items that met this criterion were consistent across both the quality and quantity scales and included the following:

  • 1.3—“The effort incorporates factors (for example housing, transportation, food access, education, employment) that influence health status.”

  • 3.5—“All partners continue community-engaged activities beyond an initial project, activity, or study.”

  • 6.1—“Fair processes have been established to manage conflict or disagreements.”

  • 6.4—“All partners agree on ownership of data for publications and presentations.”

  • 7.3—“All partners have the opportunity to be coauthors when the work is published.”

  • 8.2—“All partners are confident that they will receive credit for their contributions to the partnership.”

Internal consistency

Results for the final comprehensive version of the REST (from survey four) showed strong internal consistency among the EPs for both the quality (Cronbach’s alpha range: 0.83–0.92) and quantity (Cronbach’s alpha range: 0.79–0.91) versions of the measure (Table 2). For EP 7 (Involve all partners in the dissemination process), results showed a slight increase in alpha if item 7.3 (All partners have the opportunity to be coauthors when the work is published) was removed: alpha increased from 0.83 to 0.84 for the quality version and from 0.79 to 0.81 for the quantity version. Given these only slight improvements, the item was retained in the comprehensive REST.

Convergent validity

The REST was significantly correlated with several of the comparison measures we used (Table 4). The REST showed statistically significant but negligible positive correlations with the Mainous trust in medical researchers scale (quantity only: r = 0.12, p = 0.03) [46], the Hall trust in medical researchers scale (quality: r = 0.18, p < 0.001; quantity: r = 0.21, p < 0.001) [47], and the CERI (quality: r = 0.19, p = 0.001; quantity: r = 0.25, p < 0.001) [11], and a negligible negative correlation with the PSAT drawbacks dimension (quality: r = − 0.21, p < 0.001; quantity: r = − 0.26, p < 0.001) [49, 50]. There was a negligible, nonsignificant correlation between the REST and each of the single-item literacy screeners, and a negligible but significant correlation between the REST and the subjective numeracy ability (quality: r = 0.11, p = 0.04; quantity: r = 0.11, p = 0.05) and preferences (quality: r = 0.12, p = 0.03; quantity: r = 0.12, p = 0.03) subscales [43,44,45].

Table 4 Comprehensive version of REST convergent validity with other measures

The REST showed a low positive correlation with the PAIR (quality: r = 0.34, p < 0.001; quantity: r = 0.44, p < 0.001) [40], the PSAT non-financial resources dimension (quality only: r = 0.47, p < 0.001), the PSAT benefits dimension (quality: r = 0.33, p < 0.001; quantity: r = 0.41, p < 0.001), and the PSAT comparing benefits and drawbacks dimension (quality: r = 0.39, p < 0.001; quantity: r = 0.42, p < 0.001) [49, 50]. REST EP8 (Build and maintain trust in the partnership) showed a low positive correlation with the trust subscale of the CSAS (quality: r = 0.40, p < 0.001; quantity: r = 0.42, p < 0.001) [35].

The REST showed a moderate correlation with the Kagan et al. measure (quality: r = 0.50, p < 0.001; quantity: r = 0.56, p < 0.001) [34] and the Wilder collaboration inventory (quality: r = 0.54, p < 0.001; quantity: r = 0.54, p < 0.001) [51, 52]. The REST also showed moderate correlations with seven dimensions of the PSAT: the synergy dimension (quality: r = 0.61, p < 0.001; quantity: r = 0.62, p < 0.001), satisfaction dimension (quality: r = 0.61, p < 0.001; quantity: r = 0.65, p < 0.001), non-financial resources dimension (quantity only: r = 0.52, p < 0.001), leadership dimension (quality: r = 0.69, p < 0.001; quantity: r = 0.69, p < 0.001), efficiency dimension (quality: r = 0.62, p < 0.001; quantity: r = 0.59, p < 0.001), administration/management dimension (quality: r = 0.63, p < 0.001; quantity: r = 0.64, p < 0.001), and decision making dimension (quality: r = 0.51, p < 0.001; quantity: r = 0.51, p < 0.001) [49, 50, 53].

While the statistically significant correlations show that the measures are related (as theoretically hypothesized), the levels of correlation were negligible, low, or moderate.

Discussion

We examined the internal consistency and construct validity of the REST. Given the lack of a gold standard measure of stakeholder engagement in research, we calculated the correlation (convergent validity) with other theoretically related constructs (e.g., partnership, collaboration, community engagement, trust, and mistrust). We found statistically significant correlations (negligible, low, moderate) with other measures theoretically associated with stakeholder engagement. However, the lack of high correlation with any of the existing measures suggests the REST is measuring a different construct (perceived stakeholder engagement in research) than these existing measures. Together the results suggest the REST is a valid (research engagement construct) and reliable (internally consistent) tool to assess research engagement of non-academic stakeholders in research. Valid and reliable tools to assess research engagement are necessary to examine the impact of stakeholder engagement on the scientific process and scientific discovery and move the field of stakeholder engagement from best practices and lessons learned to evidence-based approaches based on empirical data and rigorous scientific study designs.

Strengths, limitations, and future directions

Our study findings should be considered in the context of several limitations. First, recruitment delays caused the study design to change: we ended up recruiting throughout the entire study period rather than recruiting all participants before survey one and then consecutively releasing surveys two, three, and four to all participants at once. As a result, some participants completed the surveys closer together, while others completed them further apart. However, only 31 participants (6%) completed the surveys out of order, while 456 (94%) completed them in order, and demographic characteristics did not differ between these groups. Second, the timing of the surveys could have had an effect on those involved in ongoing projects. On survey four, we asked participants to classify the status of their project (just started, ongoing, completed). Of the 336 who completed survey four, 20 (6%) indicated that the project had just started, 174 (52%) that the project was ongoing, and 142 (42%) that the project had been completed (Table 1). Participants whose project was ongoing or had just started may have had changes in the level of engagement across the four surveys, whereas those with completed projects may not have.

Third, a large portion of participants were lost to follow-up, leading to a smaller sample size for survey four than the number of participants who completed consent. The attrition rate for this study was 31% (151 lost to follow-up by survey four), most of which was due to participants not completing any of the surveys (n = 94; 19%). Among participants who completed at least one survey (n = 393), 85% completed the final longitudinal survey. While 80% follow-up has been stated as a cut-off for acceptable loss to follow-up [54], a systematic review of longitudinal studies found an average retention rate of 74% (standard deviation = 20%), a rate that remained consistent regardless of the duration or type of study [55]. Studies on the impact of attrition in longitudinal research generally suggest that a 25–30% loss to follow-up is acceptable [56], with the impact of further attrition dependent on the extent to which the data are missing at random (acceptable results with up to 25–60% loss to follow-up) or missing not at random (bias present at 30% loss to follow-up) [57, 58].

We also had a higher percentage of missing responses on the final version of the REST (survey four), due primarily to the addition of a ‘not applicable’ response option; the number of missing responses other than “not applicable” was low across all items (Table 3). Due to attrition and missing data, some of the analyses are based on samples of size 224 (67% of the analytic sample). However, we conducted a sensitivity analysis comparing complete-case data with data including missingness and found the results to be similar. Finally, the REST is currently available only in English, and we were unable to directly estimate the time needed to complete the REST itself. We do have the time to complete the entire survey, calculated as the finish time minus the start time; however, the survey contained additional questions, and participants could start the survey, stop, and return later to complete it. Excluding those who took more than 30 min to complete, the mean time to completion for the survey was 14 min (median = 13 min); based on this, we estimate the REST takes less than 10 min to complete.

Despite these limitations, the REST and our study have several strengths. The REST was developed through a stakeholder-engaged process by a community-academic partnership (Disparities Elimination Advisory Committee) and was validated using input from stakeholders (e.g., patients and their families, clinicians, health systems, policy makers, community organizations, advocacy groups). The REST is a flexible, general tool that can be used across a variety of project types, stages, and stakeholder groups (e.g., community advisory boards, patients, community members, health departments, health care organizations) [59]. The REST is easy to administer via an online web survey and also shows potential to be completed via a paper-based survey. It is disease, demographic group (e.g., gender, race, age), and stakeholder group agnostic, allowing for use across a broad range of community-engaged research activities [59]. The REST was designed to fill a gap created by the dearth of existing measures of stakeholder engagement in research. Measures that assess the perceived level of engagement of non-academic research partners across a broad array of engagement activities, research projects, diseases, health outcomes, and populations are necessary to build an evidence base for stakeholder engagement by determining the quantity and quality of engagement necessary for successful outcomes.

Conclusions

The REST is a tool that examines how stakeholders (e.g., patients and their families, clinicians, health systems, policy makers, community organizations, advocacy groups) understand and experience their engagement in research projects. In the future, the REST should be developed and validated in languages other than English (e.g., Spanish and Mandarin). We do not believe direct translation is appropriate, but we believe we have developed an approach that can be adapted to other languages. In an implementation study, we demonstrate the ability of the REST to measure engagement across a broad array of projects with different levels of engagement [59]. The REST should also be examined longitudinally in projects over time to assess test-retest reliability and to determine how sensitive the REST is to changes in engagement. This would allow for examination of the quantity and quality of engagement necessary to move partnerships along the engagement continuum. Tools that assess stakeholder engagement are necessary for the empirical examination of the influence of engagement on the types of research, questions addressed, service improvement, scientific process, and scientific discovery.