Under-represented, cautious, and modest: the gender gap at European Union Politics

The gender gap pervades many core aspects of political science. This article reports that women continue to be under-represented as authors and reviewers in European Union Politics and that these differences have diminished only slightly since the second half of the 2000s. We also report that women use more cautious and modest language in their correspondence with the editorial office, but we find no evidence that this under-studied aspect of the gender gap affects the outcome of the review process. We discuss some measures European Union Politics and other journals might take to address the imbalance.


Introduction
or, more precisely, its editors, foremost among them the male co-author of this article, helped to prolong the under-representation of women in the discipline and only appointed two senior colleagues, Sara Hagemann and Heike Klüver, as associate editors in the second half of the 2010s. In 2021, ten of the 30 editorial board members were women. This is only a partial improvement on the situation in 2000, when the first issue came out. In the initial editorial board, five of 14 members were women, but only six women had been appointed to the 35-member international advisory board, which has since been abolished.
In particular, this article sets out to explore how well women are represented as (lead) authors and reviewers, what their chances are of having their manuscripts accepted for publication, and what kinds of recommendations they make. In addition to these standard research questions, we analyse whether the gender gap manifests itself in a largely neglected dimension: the language in which authors and reviewers write their memos and reports. The analysis confirms that women are still under-represented at EUP both as authors and as reviewers, but that their chances of having articles accepted and their recommendations on submitted articles are similar to those of men. However, the study also shows that women use more cautious and modest words in their correspondence with the EUP editorial office. Based on the evidence presented in this article, the journal will make a particular effort to solicit articles from female and minority scholars in the future and will edit the prose of self-congratulatory authors. We also suggest that supervisors of doctoral dissertations make their students more aware of the gender gap in academic communication and representation.

Data and methods
Our study relies on two main data sources. First, we retrieved the complete set of available submission and review data since 2006, when EUP moved to Manuscript Central, the widely used submission management system. For the sake of comparability, we include only complete submission years for which all submissions had received a final decision by the time we performed our analyses. We therefore exclude submissions received before 2007 and after 2019, so that seasonal peaks in submission activity within incomplete years cannot bias our data (see Grossman 2020, for a similar approach). We merge and structure our data so that the submission is our unit of analysis, containing all relevant information on co-authorship, the review process, and the final decision. Multiple rounds of revision are thus reflected within one submission and treated as one observation.
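As a rough illustration of how revision rounds can be collapsed into one observation per submission, the Python sketch below groups round-level records by a base manuscript ID and keeps the last decision as the final one. The ".R1"/".R2" revision suffix, the record fields, and all helper names are assumptions for illustration, not EUP's actual export format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Submission:
    """One submission, aggregating all of its revision rounds."""
    base_id: str
    first_year: int
    rounds: int = 0
    reviewers: Set[str] = field(default_factory=set)
    final_decision: Optional[str] = None


def base_manuscript_id(manuscript_id: str) -> str:
    # Assumed ID format: revisions append ".R1", ".R2", ... to the original ID
    return manuscript_id.split(".R")[0]


def collapse_rounds(records: List[dict]) -> Dict[str, Submission]:
    submissions: Dict[str, Submission] = {}
    # ISO dates sort chronologically as strings, so the last round
    # processed for a manuscript carries its final decision
    for rec in sorted(records, key=lambda r: r["submitted"]):
        key = base_manuscript_id(rec["manuscript_id"])
        if key not in submissions:
            # The year of the first round defines the submission year
            submissions[key] = Submission(base_id=key, first_year=int(rec["submitted"][:4]))
        sub = submissions[key]
        sub.rounds += 1
        sub.reviewers.update(rec["reviewers"])
        sub.final_decision = rec["decision"]
    return submissions


def in_study_window(sub: Submission, first: int = 2007, last: int = 2019) -> bool:
    # Keep only complete submission years with final decisions
    return first <= sub.first_year <= last
```

Filtering `collapse_rounds(records).values()` with `in_study_window` then yields one record per submission within the 2007 to 2019 window.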
We compiled a second dataset for a word frequency analysis of documents generated throughout the review process, including cover letters, author responses, and reviews. For this purpose, we relied on EUP's Manuscript Central portal (ScholarOne), which stores the relevant documentation for submissions of the past two years before moving it to the archive. Given this limited time window, the data do not cover the entire period from 2007 to 2019 but only the most recent submissions. We started collecting these data in July 2020 (with the earliest documentation dating back to 2018) and updated the collection in May 2021, thus broadly covering the last three submission years. We can therefore offer a recent account of gender differences in the language used by authors and reviewers and interpret our findings in the context of long-term submission and publication trends. The pool of documents is highly unequal with regard to gender: of the 676 documents in total, 72% were written by men.
One major obstacle we encountered during data collection is that the gender attribute of authors and reviewers was only added to the standard questionnaire in 2018. Although we came across records of authors' or reviewers' gender for submissions received or reviewed before 2018, this information turned out to have been assigned retrospectively by Manuscript Central if the same person had answered the gender question after 2018. We therefore determined the probable gender of authors and reviewers through the Genderize database. This approach associates first names with the probability that the name is held by a man or a woman. The Genderize algorithm is not only an established tool (Lerchenmueller et al. 2019; Lerchenmueller and Sorenson 2018), but it was also found to provide the most accurate estimates of gender (CodingNews 2015). To give an idea of the extent of missing data on the gender attribute for submitting authors: we had to determine the gender in three out of four cases (75% of all submissions).
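The assignment step can be sketched as follows: a self-reported attribute is used where it exists, and otherwise a Genderize-style lookup on the first name is accepted only if its probability clears a threshold. The lookup table is an offline stand-in for responses of the genderize.io web API, which returns fields such as "gender" and "probability"; the 0.9 threshold, the names, and the probabilities below are hypothetical, since the article does not report a cut-off.

```python
from typing import Dict, Optional, Tuple

# Offline stand-in for Genderize responses; names and values are invented
LOOKUP: Dict[str, dict] = {
    "sara": {"name": "sara", "gender": "female", "probability": 0.98, "count": 1000},
    "kim": {"name": "kim", "gender": "female", "probability": 0.62, "count": 800},
}


def resolve_gender(
    self_reported: Optional[str],
    full_name: str,
    lookup: Dict[str, dict],
    threshold: float = 0.9,  # hypothetical cut-off, not reported in the article
) -> Tuple[Optional[str], str]:
    """Return (gender, source) for one author or reviewer."""
    # Prefer the self-reported attribute (collected only from 2018 onwards)
    if self_reported in ("female", "male"):
        return self_reported, "self-reported"
    parts = full_name.split()
    if not parts:
        return None, "unresolved"
    # Otherwise impute from the first name via the lookup table
    response = lookup.get(parts[0].lower())
    if response is None or response.get("gender") is None:
        return None, "unresolved"
    if response.get("probability", 0.0) < threshold:
        return None, "below threshold"
    return response["gender"], "genderize"
```

Returning the source alongside the gender makes it easy to report, as the article does, how many cases had to be imputed rather than self-reported.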
For the first part of our analyses, we perform significance tests comparing differences in the shares of female and male authors or reviewers. For the word frequency analysis, we run simple linear regressions to assess the relationship between gender and the use of certain language categories. For that purpose, we prepared the text files from ScholarOne and extracted the relevant words with R's "tidytext" package. We identified relevant expressions along three dimensions: (a) the use of positive versus negative language; (b) the use of cautious language (i.e., the opposite of self-promotion); and (c) the use of structuring language. For the first dimension (a), we follow Weidmann et al. (2018), who restrict the set of positive and negative words proposed by Vinkers et al. (2015) for articles published in medical research to those that appear most applicable to political science, i.e., words that appeared more than five times in the abstracts of all articles published through 2014 in the three main political science journals: American Political Science Review (APSR), American Journal of Political Science (AJPS), and Journal of Politics (JOP). The second dimension (b) comprises the following expressions: "careful", "cautious"/"with caution", "initial", "preliminary", and "tentative". Finally, we rely on the following seven words to identify the use of structuring language (i.e., authors' and reviewers' precision when referring to the article under review): "title", "note", "paragraph", "page", "section", "table", and "figure". Except for the cautious language dimension (b), which we apply only to author documents, the word frequency dimensions are applied to both author documents (cover letters and author responses) and reviewer documents (reviews).
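A minimal Python analogue of the keyword step for dimension (b) and of the simple regression is sketched below. The article itself used R's "tidytext", so the tokenizer here (lowercase alphabetic tokens, exact matches only, so "carefully" is not counted) is an assumption. With a single 0/1 gender regressor, the OLS slope equals the difference between the female and male mean keyword shares, which makes the sketch easy to check by hand.

```python
import re
from statistics import mean
from typing import List, Tuple

# Keywords for dimension (b); exact token matches only (assumption)
CAUTIOUS_WORDS = {"careful", "cautious", "initial", "preliminary", "tentative"}
CAUTIOUS_BIGRAMS = {("with", "caution")}


def tokenize(text: str) -> List[str]:
    # Lowercase alphabetic tokens; punctuation and digits are dropped
    return re.findall(r"[a-z]+", text.lower())


def cautious_share(text: str) -> float:
    """Proportion of tokens in one document that are cautious keywords."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(token in CAUTIOUS_WORDS for token in tokens)
    # Count the two-word expression "with caution" as one hit
    hits += sum((a, b) in CAUTIOUS_BIGRAMS for a, b in zip(tokens, tokens[1:]))
    return hits / len(tokens)


def ols_gender_effect(shares: List[float], female: List[int]) -> Tuple[float, float]:
    """Simple OLS of keyword share on a 0/1 female dummy: share = a + b * female.

    With a binary regressor, a is the male mean and b the female-minus-male gap.
    """
    x_bar, y_bar = mean(female), mean(shares)
    sxx = sum((x - x_bar) ** 2 for x in female)
    if sxx == 0:
        raise ValueError("need documents by both women and men")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, shares))
    b = sxy / sxx
    return y_bar - b * x_bar, b
```

The same pattern applies to the structuring words of dimension (c) by swapping in that keyword set; note that purely lexical matching also counts non-hedging uses such as "initial submission".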
In addition to the word frequency analysis, we processed the second dataset with the LiAnS-Pipeline for linguistic annotations (see footnote 6). For this analysis, we had to exclude 22 files because the pipeline was unable to process their contents. We annotated the remaining files for a variety of linguistic features (see the online Appendix for a complete overview) that previous research has considered indicators of gender in language (see footnote 7). We recorded the frequency of each feature per file as a percentage (the exception being word count, which we recorded in absolute numbers). The goal was to test whether the frequency of each extracted feature differs significantly between men and women, i.e., whether men or women use significantly more or less of this feature, or whether its frequency can be predicted from gender. To this end, we estimated one linear mixed-effects regression model for each extracted feature. In each model, the measured feature was entered as the dependent variable, gender as a fixed factor (only main effects could be estimated because each model contained a single fixed factor), and an anonymized author identifier as a random effect (Baayen et al. 2008), using the R packages "lme4" (Bates et al. 2015) and "lmerTest" (Kuznetsova et al. 2017). We report the results of selected models in Table 2 alongside the shares of the associated features in the documents.
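Assuming random intercepts only (the article does not mention random slopes), each per-feature model can be written as below, where y_ij is the percentage share of the feature in document i written by author j. In lme4 notation this would correspond to a formula such as `share ~ gender + (1 | author_id)`, with `share` and `author_id` as hypothetical column names.

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{female}_j + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, the gender p-values would refer to \beta_1, while the author intercept u_j absorbs author-specific baselines, so that authors who contributed several documents do not count as independent observations.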

From submission to publication
Women submit fewer papers to academic journals than men. This broader trend also applies to political science (Breuning and Sanders 2007; Closa et al. 2020; Grossman 2020) and other disciplines (Teele and Thelen 2017), and EUP is no exception. The share of female submitting authors has come close to, but never exceeded, the 40% mark, peaking in 2011 (see Fig. 1a). Table 1 reports shares for 2007 and 2019 and the statistical significance of the difference between the two years. Comparing the share of female submitting authors over time, we find no difference (around 28% in both years). What is more, the share of all-female author teams (including single-authored submissions) decreased slightly, by two percentage points, over the same period. At the same time, we find a drop in all-male author teams of ten percentage points and a strong and significant increase in mixed author teams of twelve percentage points.

Footnote 6: The LiAnS-Pipeline is currently under development in the Computational Linguistics workgroup at the University of Konstanz and is based on the VisArgue project (see http://www.visargue.uni-konstanz.de/de/, accessed 14 July 2021).

Footnote 7: The included features are considered indicative of the more polite, less powerful, and tentative language that has been associated with women since Lakoff (1973). While the linguistic approach to gender differences in language has been re-evaluated multiple times (Eckert and McConnell-Ginet 1992; Maltz and Borker 1983), and the tentative nature of some of these features has been questioned (Holmes 1990), Lakoff's features are still widely used in analyses of this type. Additionally, this analysis includes less well-discussed linguistic categories, such as certain function words, that have exhibited significant differences in research on larger corpora (Newman et al. 2008).
Thus, while submitting authors still tend to be male, there is evidence of a growing trend towards mixed author teams. Moreover, the decline in same-sex teams is more pronounced for all-male submissions (see Fig. 1b). Do the gender differences reported for the submission stage apply to the publication stage as well? First, regarding the final decision on submissions, we observe a strong and significant decline in the acceptance rate for female submitting authors (by 27 percentage points) and an even larger one for male submitting authors (by 34 percentage points). This result suggests that reviewers and editors alike have applied increasingly strict standards of scrutiny over the years. While this finding is neither surprising nor unique given the growing number of submissions to EUP, we are interested in the past and current gender gap in acceptance rates. Unlike other studies (Closa et al. 2020; Stockemer et al. 2020), we do not find that female submitting authors are favoured over men at the publication stage. Our results suggest that female and male submitting authors are treated more or less equally with regard to the acceptance of their work. In 2007, male submitting authors were still more likely to see their work published (50%) than their female counterparts (38%). However, this gap has shrunk considerably over the years, to five percentage points in 2019.
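Table 1 reports whether shares differ significantly between 2007 and 2019, but the article does not name the test used. A pooled two-proportion z-test is one common choice and is sketched below in standard-library Python as an assumption; the counts passed to it would come from Table 1 and are not reproduced here.

```python
import math
from typing import Tuple


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float, float]:
    """Compare share x1/n1 (e.g. 2007) with share x2/n2 (e.g. 2019).

    Returns (difference in shares, z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled share under the null hypothesis of no change between years
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("shares of 0 or 1 in both years: test undefined")
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p2 - p1, z, p_value
```

Multiplying the returned difference by 100 gives a change in percentage points, the unit used throughout this section.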
Reviewers play an important role in the run-up to the final decision (i.e., the review process). The gender gap traced for the submission and publication stages in political science also applies to scholarly peer review, and does so across disciplines (Helmer et al. 2017; Squazzoni et al. 2021). We find a very similar pattern at EUP. Again, we consider both the share of reviewers by gender and the composition of reviewer teams (i.e., all reviewers assigned to the same submission). In 2007, the share of female reviewers and the share of all-female reviewer teams were almost the same (27% and 26%, respectively), and both increased by three percentage points until 2019. All-male reviewer teams still represent the largest share in 2019 (44%) but declined by eight percentage points compared with 2007 (52%). What is more, mixed reviewer teams are on the rise (plus five percentage points) and seem to partly compensate for the drop in all-male reviewer teams. While none of these changes reach statistical significance, they corroborate the observation of a slow but steady shift towards more female participation on both sides of the review process. According to König and Ropers (2021), the trend towards mixed reviewer teams reduces gender bias (i.e., the tendency of male reviewers to favour male authors and of female reviewers to favour female authors).
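The team categories used for authors and reviewers (all-female, all-male, mixed, with a single author counting as a same-sex team) can be made explicit with a small classifier. Treating empty teams or teams with any unresolved gender as "unknown" is our assumption for this sketch, not a rule stated in the article.

```python
from collections import Counter
from typing import Dict, List, Optional


def team_composition(genders: List[Optional[str]]) -> str:
    """Classify one author or reviewer team by the gender of its members."""
    # Empty teams or teams with unresolved members are "unknown" (assumption)
    if not genders or any(g is None for g in genders):
        return "unknown"
    if all(g == "female" for g in genders):
        return "all-female"  # includes single-member teams
    if all(g == "male" for g in genders):
        return "all-male"
    return "mixed"


def composition_shares(teams: List[List[Optional[str]]]) -> Dict[str, float]:
    """Share of teams in each category, e.g. for one submission year."""
    counts = Counter(team_composition(team) for team in teams)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}
```

Computing `composition_shares` separately for 2007 and 2019 gives the inputs for the year-on-year comparisons reported above.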

A matter of language?
We identify the use of language as one potential driving force behind the gender gap in academia more broadly, and in the review and publication process in particular. Put succinctly: "Men are more likely than women to call their science 'excellent'" (Johnson 2019). To test this argument, we focus on the communication between authors and reviewers (with the editors as intermediaries) throughout the peer review process at EUP. These documents comprise cover letters, author responses (to reviews), and reviews. In the following, we evaluate the results of the simple word frequency analysis together with those from the LiAnS-Pipeline for linguistic annotations. Table 2 summarizes our main results and reports the p-values obtained from both regression analyses.
First, regarding overall word count, we find that male authors on average use more words in their author responses and cover letters than female authors; however, this effect is not significant. Interestingly, the relationship is reversed, and significant, when only reviews are considered. In other words, while women tend to use fewer words to introduce their research in cover letters or to defend their work in author responses, they use significantly more words when reviewing other authors' work.
When it comes to the use of language, the analysis with the LiAnS-Pipeline reveals no significant gender differences for the majority of features. This holds even for the best-researched and documented features generally associated with women, such as hedges (i.e., expressions used to mitigate the certainty or confidence of an assertion) and adjectives (Holmes 1990; Lakoff 1973). However, previous linguistic research has focused largely on spoken and informal language, which may explain the discrepancy when analysing the more formal and professional texts of the peer review process. By contrast, a simple word frequency analysis of all documents, regardless of type, reveals that women use more structuring language, a difference that is highly significant. This effect can be observed for each individual document type as well, but remains significant only for reviews.
As expected and in line with previous research (Leaper and Robnett 2011), our simple word frequency analysis shows that women use more cautious language than men. While this effect is not significant for author responses, it is significant for cover letters and highly significant for the combined pool of text files. (Reviews were not analysed for the self-promotion gap.) What is striking is the size of the difference between the two genders: overall, women use cautious words three times as frequently as men. This large effect is not only substantively but also statistically significant. With regard to cover letters, where the effect just misses significance, it is even more pronounced. This not only confirms previous research on the less self-assured correspondence of women, but also shows the magnitude of the effect.
At first glance, the results from the LiAnS-Pipeline suggest a different picture. When only looking at author responses, women appear to use more certainty expressions (p < 0.04), while men use more verbs that were categorized as polite (p < 0.02). This appears to contradict the expectation, and previous findings, that women use more polite and more uncertain language than men. However, the two analyses differ in how they determine cautious or tentative language. The simple word frequency analysis identifies a small set of keywords (listed in the Data and methods section) as indicators of cautious language and counts the frequency of these words without wider linguistic context, thereby finding that women use a specific subset of "cautious" words more frequently than men. The LiAnS-Pipeline analysis, by contrast, attempts to identify several linguistic categories that, taken together, indicate either the presence or the absence of tentative (and more polite) language. Within these categories, it can also take into account multi-word expressions as well as the words preceding and following the expression at hand. While the significantly higher frequency of certainty expressions in women's language identified through this analysis is one potential indicator of the absence of uncertainty, it is not sufficient on its own to conclude that women use more certain and less cautious language. At the same time, since none of the other features associated with tentative and polite language exhibits significantly higher frequencies for women, this analysis does not make a case for the use of cautious or tentative language by women either. Importantly, there is no correlation between the modality expressions of certainty identified here and the aforementioned subset of "cautious" words, and no other, non-significant linguistic feature in the second analysis corresponds closely to these "cautious" words.
Therefore, the result of the second analysis in no way negates the finding that women use a specific subset of cautious words more frequently than men.
Additionally, the LiAnS-Pipeline approach analyses the use of pronouns, which the word frequency analysis does not examine. In reviews, men use significantly more first-person pronouns than women (p < 0.03). While previous research (Argamon et al. 2003) had indicated a higher use of pronouns by women, results on first-person pronouns have been inconclusive, with some studies attributing a higher frequency to women and others to men (for a detailed discussion, see Newman et al. (2008)). Nonetheless, we interpret our results as a sign that male reviewers are more inclined to talk about themselves when reviewing other people's work.

Note to Table 2: Columns 1-2 (female; male) report mean values (across all text files of one document type) for the proportion of relevant keywords in one text file; p-values indicate the significance of the gender variable in predicting the use of language in the specified dimensions.

The simple word frequency analysis further considers special issues separately. Only about 17% of special issue documents are written by women. In our subjective assessment, this highly significant difference is attributable to women declining invitations to participate in special issues. However, it is unclear whether other factors play a role, and there is no information as to why women tend to decline these invitations.
Lastly, we assess the influence of language on the acceptance of submissions. This final step, however, is hampered by the fact that our pool of documents only includes author and reviewer correspondence on articles that were ultimately accepted for publication. We are therefore unable to observe variation in the binary outcome of interest (i.e., the final decision to accept or reject). Instead, we use individual reviewer recommendations made at various stages of the review process and assess their effect on the language used in the author responses to the reviewers. We find no significant effect of reviewers' recommendations on the use of any of the language dimensions included. Authors do not seem to adjust their language significantly in response to favourable or unfavourable reviewer recommendations.

Conclusion
EUP was founded with the ambition to bring cutting-edge research on the European integration process to the forefront of political science. The journal has published articles on the gender gap in political representation in the European Union (Fortin-Rittberger and Rittberger 2014), but its editor has failed to acknowledge that the differences between female and male authors have narrowed only slightly over the past few years. The differences in correspondence with the editor have probably not changed much over time either, although we were only able to examine this prose for the past few years. Our findings are in line with past evidence on the existence and only gradual narrowing of the gender gap across disciplines, subfields, and almost all aspects of academic work: from hiring to promotion (Ginther and Kahn 2004; Moss-Racusin et al. 2012), from earnings to funding (Larivière et al. 2011; Leahey 2007), and from publication to citation (Deschouwer 2020; Dion and Mitchell 2020). While our account of linguistic gender differences in the correspondence of authors and reviewers with the editorial team points to one potential driving force, future research should examine other aspects as well, both to explain the observed imbalances and to identify further potential for improvement.
Admittedly, the persistence of this problem is not unique to EUP but an overall challenge for political science and neighbouring disciplines (see Haastrup et al.; Martinsen et al.; Verney and Boscoin in this symposium). We believe that editors, editorial boards, and scholarly organizations share the responsibility to address this issue. Potential reform measures should target the future leaders of the field: graduate students, postdocs, and non-tenured faculty members. In the education of doctoral students, supervisors and instructors need to give hands-on advice on how to write scientific prose and how to correspond with journals in a confident but not overly aggressive manner. Female and minority scholars might receive training on which kinds of words to avoid in this context. Mentorships could also be established between editorial board members and junior scholars who might not have the ideal background to develop their professional careers. Such arrangements could be supported by academic organizations such as the European Consortium for Political Research or the European Political Science Association. Both organizations already offer some training in this respect, but journal editors could serve as talent and problem spotters.