Response Rate and Demographics
Of the 13,091 individuals contacted for this survey, 1231 responded, for a 9.4% response rate. Of the 1231 respondents, 634 answered questions 2a, 2b, and 2c in the affirmative, indicating they had submitted a proposal in the last 3 years and received review feedback. The remaining results focus on this sample of 634 participants. Over half of the responses were collected within a few hours of sending the emailed invitations to complete the survey; a second, smaller wave of responses followed a reminder email. Quantitative answers to the Usefulness and Appropriateness questions were nearly identical for these two groups (Supplementary Table 2).
Sample demographics are listed in Table 1: the majority of respondents were men, Caucasian, and academic PhDs in late career stages. They had submitted a median of 5.0 ± 0.1 applications in the last 3 years. The overall funding success rate was 40%, which did not vary significantly by gender, race, age, career stage, degree, or organization (Table 1).
Table 1 Demographics and success rates

Some differences were evident in the demographics of respondents who made a comment versus those who did not (Supplementary Table 3). Non-whites represented only 17% (95% CI [12–22%]) of the commenting respondent pool compared with 29% (95% CI [26–34%]) of the non-commenting pool (χ2(1) = 11.2, p = 0.0008, φ = 0.13). Additionally, respondents who made a comment were more likely to be older and in later career stages than non-commenting respondents (Supplementary Table 3). Other demographic variables (degree, organization, and gender) did not differ between commenting and non-commenting groups. However, only 32% (N = 67; 95% CI [26–38%]) of respondents who made a comment reported being recently funded, compared with 44% (N = 183; 95% CI [39–49%]) of respondents who did not make a comment (χ2(1) = 8.9, p = 0.0029, φ = 0.12).
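Comparisons of this kind reduce to a chi-square test of independence on a 2×2 contingency table, with the effect size φ computed from the χ2 statistic. The following is a minimal sketch of that calculation; the cell counts are hypothetical placeholders chosen to roughly match the reported percentages, not the survey's raw data.

```python
# Minimal sketch of the chi-square comparisons reported above.
# Cell counts are hypothetical placeholders, not the survey data.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: commenting vs. non-commenting respondents; columns: non-white vs. white.
table = np.array([[35, 175],    # commenting (hypothetical counts)
                  [123, 301]])  # non-commenting (hypothetical counts)

chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = table.sum()
phi = np.sqrt(chi2 / n)  # effect size for a 2x2 table

print(f"chi2({dof}) = {chi2:.1f}, p = {p:.4f}, phi = {phi:.2f}")
```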
Although the survey did not ask respondents which funding agency they had applied to, we searched the comments for mentions of the two largest US funders, NIH and NSF. Of the comments that mentioned a funding agency, 14% mentioned NSF and 86% mentioned NIH.
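A keyword tally of this sort can be done with a simple pattern match over the free-text comments; the sketch below uses a hypothetical comment list for illustration, not the actual survey responses.

```python
# Illustrative tally of funder mentions in free-text comments.
# The comments list is a hypothetical placeholder.
import re

comments = [
    "The NIH study section lacked expertise in my area.",
    "My NSF panel gave contradictory feedback.",
    "Reviewers ignored the preliminary data.",
]

nih = sum(bool(re.search(r"\bNIH\b", c)) for c in comments)
nsf = sum(bool(re.search(r"\bNSF\b", c)) for c in comments)
mentions = nih + nsf
print(f"NIH: {nih/mentions:.0%}, NSF: {nsf/mentions:.0%}")
```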
Overview of Multi-method Results of Respondent Comments and Ratings
The sections below present the results of the qualitative analysis of the comments alongside the quantitative ratings for the associated survey questions Q1–Q7 (Supplementary Table 1). The 13 quotes listed in Supplementary Table 4 were chosen as examples that captured a particular theme associated with the questions asked in the survey. It should be noted, though, that respondents' comments often had a negative valence, and respondents who commented tended to be more negative even in the quantitative portions of the survey compared with respondents who did not comment (Supplementary Table 5). We then examine how these results vary with applicant demographics, and the relationship between responses related to appropriateness and usefulness.
Appropriateness of Feedback
Overall, only 56% (95% CI [52–60%]) of respondents thought grant review feedback was well written, cohesive, and balanced. Respondents' comments identified issues with the structure and length of the feedback (Supplementary Table 4; Q1.1) and noted that reviewers often do not support their scores with comments (Supplementary Table 4; Q1.2).
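The proportion estimates and 95% CIs reported throughout (e.g., 56% [52–60%]) can be reproduced with a standard binomial interval. A minimal sketch follows, assuming 355 of 634 affirmative responses; the count is inferred from the reported percentage rather than taken from the raw data.

```python
# 95% CI for a survey proportion, e.g., 56% of 634 respondents (Q1).
# The count of 355 is inferred from the reported 56%, not from raw data.
from statsmodels.stats.proportion import proportion_confint

count, nobs = 355, 634
low, high = proportion_confint(count, nobs, alpha=0.05, method="wilson")
print(f"{count/nobs:.0%} (95% CI [{low:.0%}-{high:.0%}])")
```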
Additionally, 60% (95% CI [56–64%]) of respondents perceived grant reviewer feedback as fair and unbiased. Comments, however, identified various types of perceived bias toward specific application content, including bias against topic areas, against innovation, and against particular methodologies or models (Supplementary Table 4; Q2.1). Some comments also specifically mentioned biases against the applicants themselves (Supplementary Table 4; Q2.2 and Q2.3). Some respondents described the impact biased reviews can have, particularly in an era of low funding success rates (Supplementary Table 4; Q2.4).
In terms of reviewer qualifications, 58% (95% CI [54–62%]) of respondents judged the reviewers to have appropriate expertise to evaluate their grant application, based on the reviewer feedback they received. In their comments, respondents identified a lack of reviewer expertise for interdisciplinary proposals, clinical proposals, statistical portions of proposals, and proposals using different animal models, as well as a lack of expertise in specific areas of science (Supplementary Table 4; Q3.1 and Q3.2).
Usefulness of Feedback
Overall, only 38% (median of 3.0; 95% CI [2.9–3.1]) found the grant review feedback they received on their last grant submission to be mostly or very useful. Further, only 30% (median of 3.0; 95% CI [2.9–3.1]) thought it was mostly or very useful in improving their grantsmanship; only 35% (median of 3.0; 95% CI [2.9–3.1]) found it mostly or very useful in improving their future submissions; and only 26% (median of 3.0; 95% CI [2.9–3.1]) felt it was mostly or very useful in informing their future scientific endeavors in the proposed research area. Based on these data, the majority of applicants did not find the reviewer feedback to be highly useful.
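The medians and their CIs (e.g., median 3.0, 95% CI [2.9–3.1]) can be estimated by bootstrapping the per-respondent ratings. The sketch below uses simulated 1–5 Likert ratings, since the actual per-respondent data are not reproduced here.

```python
# Bootstrap 95% CI for the median of 1-5 Likert ratings (Q4-Q7).
# Ratings are simulated for illustration only.
import numpy as np
from scipy.stats import bootstrap

rng = np.random.default_rng(0)
ratings = rng.choice([1, 2, 3, 4, 5], size=634, p=[0.10, 0.28, 0.27, 0.25, 0.10])

res = bootstrap((ratings,), np.median, confidence_level=0.95,
                method="percentile", random_state=rng)
ci = res.confidence_interval
print(f"median = {np.median(ratings):.1f}, 95% CI [{ci.low:.1f}-{ci.high:.1f}]")
```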
Few comments specifically addressed the usefulness of the feedback for grantsmanship (Q5) or future scientific endeavors (Q7); more concerned its usefulness in improving future submissions (Q6). Some respondents remarked that they received constructive criticism that helped improve their applications (Supplementary Table 4; Q4-7.1). However, others remarked that inconsistent feedback from different sets of reviewers evaluating resubmissions reduces usefulness (Supplementary Table 4; Q4-7.2). Others commented that usefulness is hampered by a perceived lack of reviewer expertise (Supplementary Table 4; Q4-7.3). Several comments mentioned that the feedback format and the lack of suggestions for improvement limit usefulness (Supplementary Table 4; Q4-7.4). Finally, some mentioned that the usefulness of feedback was ultimately restricted by funding success rates (Supplementary Table 4; Q4-7.5).
Perceptions of Feedback and Demographics
We used multiple regression analysis to examine the effects of demographic variables on the perceived appropriateness and usefulness of feedback. As seen in Supplementary Table 6, the variance inflation factors for most of these demographic variables are low, indicating little multicollinearity among the predictors.
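Variance inflation factors can be computed by regressing each predictor on the others; the sketch below does this with statsmodels on simulated data, with hypothetical column names standing in for the survey's coded demographic variables.

```python
# Variance inflation factors for the demographic predictors.
# Data are simulated; column names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 634
df = pd.DataFrame({
    "funded": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "nonwhite": rng.integers(0, 2, n),
    "age": rng.integers(30, 70, n),
})

X = sm.add_constant(df)
# Skip the constant (index 0) when reporting VIFs.
for i, name in enumerate(X.columns[1:], start=1):
    print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```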
We first analyzed the relationships between demographic variables and the nominal (yes/no) responses related to the appropriateness of review feedback using binary logistic regression. Funding status was a significant predictor of responses to all three appropriateness questions, as indicated by the reported odds ratios (Table 2).
Table 2 Binary logistic regression of appropriateness of review feedback

For example, in the Q1 regression, funding status had an odds ratio of 1.78 (95% CI [1.25–2.53]). Thus, funded respondents were nearly twice as likely as unfunded respondents to indicate that the review feedback was well written, cohesive, and balanced; indeed, 63% (95% CI [57–69%]) of funded respondents found the feedback to be well written, cohesive, and balanced compared with 51% (95% CI [46–56%]) of unfunded respondents (χ2(1) = 9.2, p = 0.0024, φ = 0.12). Similarly, funded respondents were more likely to find the feedback fair and unbiased (Q2; Table 2): 71% (95% CI [65–77%]) of funded respondents versus 53% (95% CI [48–58%]) of unfunded respondents (χ2(1) = 18.0, p < 0.0001, φ = 0.18). Funded respondents were also more likely to perceive the reviewers as having appropriate expertise to evaluate their proposal (Q3; Table 2): 68% (95% CI [62–74%]) versus 51% (95% CI [46–56%]) of unfunded respondents (χ2(1) = 17.0, p < 0.0001, φ = 0.17).
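In a binary logistic regression of this kind, the odds ratio is the exponentiated coefficient. The sketch below fits such a model on simulated data (variable names are hypothetical, and the simulated funding effect is set near the reported OR of 1.78 purely for illustration).

```python
# Binary logistic regression of a yes/no appropriateness response (e.g., Q1)
# on funding status plus demographic covariates. Data simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 634
df = pd.DataFrame({
    "funded": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "nonwhite": rng.integers(0, 2, n),
})
# Simulate responses with a positive funding effect (log-OR ~ 0.58, OR ~ 1.78).
logit_p = -0.1 + 0.58 * df["funded"] + 0.3 * df["female"] - 0.1 * df["nonwhite"]
df["q1_positive"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

res = smf.logit("q1_positive ~ funded + female + nonwhite", data=df).fit(disp=False)
odds_ratios = np.exp(res.params)  # exponentiated coefficients
or_ci = np.exp(res.conf_int())    # 95% CI on the odds-ratio scale
print(pd.concat([odds_ratios.rename("OR"), or_ci], axis=1))
```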
No differences in perceptions of appropriateness were observed by career stage, age, organization, or degree (Table 2). However, gender and race predicted perceptions of the appropriateness of review feedback (Table 2). Women were significantly more likely than men to rate the reviewer feedback as well written, cohesive, and balanced (64% [95% CI 58–70%] versus 53% [95% CI 48–58%]; χ2(1) = 9.3, p = 0.0023, φ = 0.12). Whites were significantly more likely than non-whites to rate the feedback as fair and unbiased (64% [95% CI 60–68%] versus 49% [95% CI 41–57%]; χ2(1) = 9.2, p = 0.0024, φ = 0.12). These differences were not attributable to funding success, as success rates were similar between groups (Table 1). In terms of reviewer expertise, responses to Q3 did not vary significantly by race or gender (Table 2).
Overall, for the responses related to the appropriateness of review feedback, no thematic differences were found between the comments made by non-white versus white applicants. Similarly, no thematic differences were found between the comments made by women versus men.
We then analyzed the relationships between demographic variables and the ordinal Likert responses (1–5, where 1 is most useful) related to the usefulness of review feedback using multiple ordinal logistic regression. None of the regression models for general usefulness (Q4), usefulness in improving grantsmanship (Q5), usefulness in improving future submissions (Q6), or usefulness in informing future scientific endeavors (Q7) explained a significant proportion of the variance in the responses (Table 3). Moreover, none of the funding or demographic factors (race, gender, career stage, age, degree, or organization) were significant predictors of these responses.
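An ordinal (proportional-odds) logistic regression of this form can be fit with statsmodels' OrderedModel; the sketch below uses simulated ratings and hypothetical predictor names, so the null result it produces is an artifact of the simulation, not the survey finding.

```python
# Ordinal (proportional-odds) logistic regression of a 1-5 usefulness rating
# on funding status and demographics. Data simulated for illustration.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 634
X = pd.DataFrame({
    "funded": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "age": rng.integers(30, 70, n),
})
# 1 = most useful ... 5 = least useful, simulated independent of the predictors.
y = pd.Series(rng.choice([1, 2, 3, 4, 5], size=n), name="usefulness")

res = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(res.summary())
# Odds ratios for the predictors (the remaining params are thresholds).
print("Odds ratios:", np.exp(res.params[:len(X.columns)]).round(2).to_dict())
```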
Table 3 Ordinal logistic regression of usefulness of review feedback

Appropriateness versus Usefulness of Feedback
Perceived usefulness of review feedback may be associated with grant resubmission rates, but how perceived appropriateness of feedback relates to applicant behavior is unclear. Given that the majority of respondents did not find review feedback useful and a large minority did not find it appropriate, it is likely that applicants who do not find the feedback appropriate also do not find it useful. In fact, some comments in our survey suggested that usefulness was limited by a lack of appropriate expertise. To test this assumption, we compared respondents who found the review feedback they received to be fair and unbiased (Q2) with respondents who did not, examining each group's answers to the questions concerning the usefulness of the feedback (Q4–Q7). This comparison is presented in Table 4. Respondents who felt the feedback was biased reported significantly more negative perceptions of its usefulness than those who felt the feedback was unbiased.
Table 4 Review feedback appropriateness versus usefulness
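Comparing ordinal usefulness ratings between two groups of respondents is conventionally done with a Mann-Whitney U test. The sketch below illustrates that approach on simulated ratings, with group sizes inferred from the reported 60% who found the feedback fair; the actual test behind Table 4 may differ.

```python
# Compare 1-5 usefulness ratings (e.g., Q4) between respondents who perceived
# the feedback as fair/unbiased (Q2 = yes) and those who did not (Q2 = no).
# Ratings are simulated; group sizes are inferred from the reported 60%.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
fair_yes = rng.choice([1, 2, 3, 4, 5], size=380, p=[0.15, 0.30, 0.25, 0.20, 0.10])
fair_no = rng.choice([1, 2, 3, 4, 5], size=254, p=[0.05, 0.15, 0.25, 0.30, 0.25])

stat, p = mannwhitneyu(fair_yes, fair_no, alternative="two-sided")
print(f"U = {stat:.0f}, p = {p:.4f}")
print(f"median (fair): {np.median(fair_yes):.1f}, "
      f"median (biased): {np.median(fair_no):.1f}")
```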