Comparisons of content and scientific quality indicators across peer-reviewed journal articles with more or less gender perspective: gender studies can do better

The field of gender studies has faced criticism for poor scholarship and methodology, both from within and outside academia. Here, we compare indicators of scientific quality across three samples of peer-reviewed journal articles with more, less and no gender perspective, on the assumption that gender studies tend to apply a gender perspective. The statements in the articles were content-analysed with respect to subject matter, their level of support in surrounding text, and other indicators of scientific quality. The higher the level of gender perspective, the lower was the scientific quality for seven out of nine indicators. Support was higher for the no gender perspective group, but did not differ across the two higher levels. We suggest that the impact of the field can be increased by implementing established research methods employed in other disciplines, especially in terms of bringing about desired social and societal change.


Introduction
Gender studies has established itself as an academic field over the last half century (Lykke et al. 2007; Thurén 2002; Pavlidou 2011). It covers a broad range of subject matters, including women's studies, feminism, masculinity studies, studies of sexual minorities (Liinason and Holm 2006; Pilcher and Whelehan 2004), sex/gender equality, queer studies, social norms (Lykke et al. 2007) and intersectionality, i.e. discrimination based on factors such as ethnicity, race, gender, age, sexual preference, disability and religion (Kantola and Nousiainen 2009; McCall 2005). This field has been associated with certain ideologies and political goals (Cronin et al. 1997; Curthoys 2014; Hoff Sommers 1995; Liinason 2011; Lykke et al. 2007) and with influences from post-modernism, relativism, and critical theory (Bergman 2000; Pilcher and Whelehan 2004; Thurén 2002). It has rapidly become established in academia in Western countries, typically with considerable support from the political establishment (Söderlund and Madison 2015, 2017). However, gender and women's studies scholars have experienced both their efforts and themselves as antagonized (Thurén 2003; Zalewski 2003; Baird 2010), their research area as charged (Bergman 2000; Friedman 1997), and their concepts as controversial (Liinason and Holm 2006; Patai 1995; Pereira 2012).
Gender studies has faced criticism on several levels. It has been accused of being nonacademic, overly ideological (Friedman 1997; Hoff Sommers 1995) and overly theoretical (Zalewski 2003), and of featuring poor methodology and scholarship. The validity of its research methods has been questioned (Pereira 2012), in particular the limited use and analysis of empirical data and the focus on abstract concepts such as "social structures" (Rothstein 1999). For example, Popova (2005) found that the most recent doctoral theses listed by the Swedish National Secretariat for Gender Research assumed the reality of the "gender power order" and were dominated by a post-modern perspective. The concept of gender has itself been criticized as logically untenable on the grounds that social and biological sex are intrinsically confounded and therefore cannot be separated (Carlsson 2001). This makes it impossible to attribute behaviour or characteristics to social factors alone, which is nonetheless common within gender studies (Patai and Koertge 1994). Another recurrent line of criticism concerns bias in the choice of questions and methods and in the interpretation of results, stemming from an underlying political and ideological agenda (Popova 2005; Sokal and Bricmont 1998; Ström 2011). Gender scholars tend to respond that the critics are ill-informed about the focus of gender studies research, which seeks not to obscure or misrepresent the truth but to question discriminating conventions and open up new viewpoints (see for example Anderson 2015; Harding 1986, 1991). There has also been a lively debate outside academia, where both the concept of gender and gender studies have been questioned and criticised, mostly in blogs and forums but also in books and reports (e.g. Billing 2012; Ström 2007; Tuininga 2016). Gender scholars have dismissed such criticism from outside academia as journalistic and selective (Friedman 1997).
Broadly, these criticisms are associated with the so-called Science Wars in the USA in the 1990s (Brown 2001; Gross and Levitt 1994; Ross 1996), in which fundamental epistemological questions were raised. The central chasm was between the alleged positivist stance of the natural sciences and a post-modernist nexus of influences on areas such as Cultural studies, Literature studies, Media studies, Feminist studies, Gender studies, and Science and Technology studies. This huge controversy is outside the scope of the present article and will not be discussed further here. However, it is notable that the Science Wars, and the bulk of the criticism against Gender studies and against post-modernist thought as it also appears in other disciplines, mainly involve academic fields that are distant from each other (Brown 2001). Indeed, the most prolific and persistent critics are found in remote fields such as philosophy (Hacking 2016), political science (Rothstein 2006, 2012; Brown 1997) and physics (Sokal 2006; Sokal and Bricmont 1998). The cross-disciplinary nature of this criticism may be one reason why the debate remains at a highly abstract level, and why the Science Wars were never resolved. Without commonly accepted quantitative indices to scrutinize, it is difficult to reach any conclusions, and this state of affairs also reduces the motivation to pursue such issues. Indeed, given its comprehensiveness, its claims, and the criticism it has faced, the field of gender studies has been subject to only very limited systematic study.
A few studies relate more directly to the criticisms reviewed above. A bibliometric study of about 1000 publications found that a gender perspective was associated with fewer citations and with publication in journals with lower impact factors (Söderlund and Madison 2015). A content analysis concerning realms of explanation found that a gender perspective was associated with a higher proportion of biased statements, defined as a preference of opinion expressed by slanted or exaggerated words or language, and of normative statements, defined as the use of value words, as well as with a focus on societal and environmental/cultural causes and explanations (Söderlund and Madison 2017). Finally, a content analysis of editorials in three gender studies journals showed that many of the editorials exhibited a feminist political agenda (Cronin et al. 1997). Beyond these, a systematic search of publication databases found no further studies of this literature (Söderlund and Madison 2017, p. 1095).
Here, we consider some fundamental aspects of scientific quality related to the criticisms against gender studies that have not been well addressed by previous research, such as the possibility of drawing inferences, the generalisability and validity of results, and accounting for limitations of the authors' own work. The rest of the introduction outlines and motivates the general design, which combines verbal content analysis with quantitative comparisons of the frequencies of certain content across publications with more or less gender perspective.
Peer-reviewed journal articles were chosen as the source of data. Because they are contingent upon the influences and approval of several scholars, they should be the type of publication most representative of the consensus in the field. The sample was limited to publications written by scholars at Swedish institutions, for two reasons. First, the population has to be constrained in such a way that a comprehensive and representative sample can be obtained. In this case, our local knowledge about institutions, literature databases, and other infrastructure is useful for obtaining information about the scholars and for identifying as large a proportion of the literature as possible. Second, the selected population should be informative with respect to the research questions. Sweden provides a more favourable environment for gender studies than most other countries, which should have facilitated productivity as well as theoretical and methodological development. There has, for example, been extensive governmental funding for gender studies (Swedish Research Council 2011a), which may be related to a faster increase in gender studies publications in Sweden than in other countries (Söderlund and Madison 2015). This monetary support was earmarked, and amounted to approximately one sixth of the total funding from the Swedish Research Council for the social sciences and the humanities during 2001-2011 (for details, see Madison 2017, p. 1094). Sweden also has a feminist political party called Fi!, a government that proclaims itself feminist (Socialdemokraterna 2016, p. 6), and six of the seven remaining parties labelling themselves feminist, and it was ranked the fourth most sex-egalitarian of 145 countries in the 2015 Global Gender Gap Report (World Economic Forum 2015, p. 8). The gender studies field has also received extra funding for so-called Centres of Excellence (Swedish Research Council 2011a).
Taken together, this should make Swedish journal publications of particularly high quality and representative of how the field develops under favourable conditions.
The statements in the articles were content-analysed with respect to subject matter, level of support for the statements, the type of relationship between variables (e.g. causal or correlational), and mention of other research and of limitations of the authors' own study. Limitations included so-called reflexivity, where the author is explicit about his or her own viewpoints and preconceptions, which is emphasized as a particular strength of gender studies (Chrisler and McCreary 2010; Dölling and Hark 2000). We reason that the content of text describing the field of study reflects how scholars think about their subject matter on a general level (Graziano and Raulin 2014; Simonton 2006).
However, assessing absolute levels of various indicators of scientific quality is of limited use for evaluating the grounds for criticism. No study is perfect, different types of questions and subject matter constrain which designs and methods can be used, and disciplines differ in how research is reported. We therefore need to compare gender studies with what is not gender studies but is nevertheless comparable in terms of questions and subject matter. This poses a profound problem, as gender studies is neither a discipline nor constrained to certain questions or subject matters, as mentioned above. It would not make sense to single out what is produced in certain departments, because the ideal is that the gender perspective should imbue each and every field of study where it is relevant (e.g., Lykke et al. 2007; Pavlidou 2011). It has even been suggested that, for this reason, gender studies cannot be studied (for a discussion see Madison and Söderlund 2016). Our solution to this problem was to postulate that a gender perspective is characteristic of gender studies publications, and to quantify the level of gender perspective for each article. This further rests on the reasonable premise that such a perspective represents a relatively coherent set of values and beliefs, epistemological and other. A high level of gender perspective should therefore be associated with individuals who hold these values and beliefs, and should be identifiable in texts written by such individuals. Under this conjecture, the highest level of gender perspective was assigned to articles written by those who self-identify as gender scholars or explicitly endorse a gender perspective, the medium level to texts that reflect these values and beliefs, and the lowest level to the remaining articles.
Our solution to the problem of comparability was to constrain the samples to a broad subject matter, namely questions related to gender and sex, which should cover a relatively large proportion of gender studies publications.
The review of the criticisms above suggests several specific hypotheses. However, because this is the first study of these particular quality indicators, a less constrained and partly exploratory approach seems more useful. We therefore hypothesize that there is no difference in any quality indicator between the levels of gender perspective, on the premise that these publications are all, across institutions, departments, and levels of gender perspective, produced and peer-reviewed by academics who have acquired scientific training. We do, however, hypothesize that the subject matter differs across levels of gender perspective, reflecting gender studies' greater focus on social and societal factors rather than factors related to individual differences. Finally, we explore interactions between content categories and the type of relationship between variables.

Method
General approach
As argued in the Introduction, the content of text that describes the field of study reflects how scholars view and reason about the concepts under study and their relations. In effect, the unit of analysis is the group of articles rather than the individual article. We therefore analysed sections labelled as background, discussion, introduction, and overview, but not sections describing the aim, material, method, or results, which are more idiosyncratic and limited by the constraints of the specific study being reported. As it turned out, a fair proportion of the articles did not even have a demarcated method or results section, nor did they comprehensively disclose such information, a tendency that increased with the level of gender perspective. The alternative approach, evaluating the design and methods employed in each article, would therefore have yielded fewer data points the higher the level of gender perspective, making comparisons difficult. Less structured articles, with theory interspersed throughout larger parts of the text, were analysed in their entirety. If a part of the text summarized other sections, it was excluded from analysis.
Four types of analyses were performed. The first was a content analysis that identified the statements used in the other three analyses and identified themes in the articles. The statements were then categorised according to their level of support, the type of relationship between variables, and relations to other theories/research and limitations with the authors' own study.

Materials
The journal articles were sampled from the Swedish Gender Studies List (SGSL), a compilation of all gender studies publications and a subpopulation of publications related to sex or gender authored by scholars active in Sweden. The SGSL includes more than 12,000 papers published in affiliation with Swedish universities between January 2000 and November 2011. A brief summary of the material collection follows; a detailed account is given in Söderlund and Madison (2015). The journal articles were either written by authors affiliated with gender studies departments or found with the keyword "gender". The selection of gender publications was based primarily on publications listed by the gender centres located at the Blekinge and Luleå polytechnics and the universities of Gothenburg, Karlstad, Linköping, Lund, Malmö, Mid Sweden, Stockholm, Södertörn, Umeå, Uppsala and Örebro; secondarily on other databases indicated by the gender centres; thirdly on searches in Web of Science (WoS); and fourthly on searches in the database KvinnSam.

Assignment of level of gender perspective
As motivated in the Introduction, articles were categorised into groups by level of gender perspective on the basis of both self-identification and subject matter, an approach inspired by Ganetz (2005). Information about all authors of each article was obtained from the homepages of the institutions, from other homepages, or from the articles themselves. The highest level, Self-identified, was defined as having at least one author who explicitly either identifies as a gender studies scholar or (1) acknowledges an uneven power relation between men and women, (2) considers sex as socially and culturally constructed, or (3) focuses on injustices and discrimination based on gender, race, ethnicity, sexuality, age, religion and disability (Kantola and Nousiainen 2009; McCall 2005). If these conditions were not fulfilled, articles were assigned to the Inferred group if their content expressed any of these three criteria, which correspond to the definition of gender studies in the largest Swedish encyclopedia: (1) power relations: "The perspective of interpretation is based on the power relationship that historically, culturally and socially has defined women's and men's roles and status in society", (2) social construction of gender: "… the society and culture are structured according to gender… this determines our experiences and knowledge and how others perceive us" and (3) intersectionality: "…how different power relations interact in the social construction of social differences…" (Nationalencyklopedin 2016, our translation). The remaining articles were assigned to the group with the lowest level of gender perspective, called Neutral.

Sample selection
Twelve journal articles were randomly selected from each of the three levels of gender perspective, according to the following procedure. Each publication in the SGSL was assigned a unique random number, and the SGSL was sorted first by type of publication and then by this number. Starting with the lowest number, the peer-reviewed articles were assigned a level of gender perspective, as described above. A small number of journal articles were excluded because they related to gender in irrelevant ways, for example in the linguistic sense. After three groups of 12 articles each had been completed, we noticed that 10 Neutral, 2 Inferred, and 1 Self-identified articles were written by scholars affiliated with departments within the faculty of medicine. Articles in the field of medicine tend to differ in several respects from those in the humanities and the social sciences, which might bias the comparison. The 10 medicine articles in the Neutral group were therefore replaced with non-medicine articles, the random selection of which happened to include only the social sciences. The medicine articles in the other two groups were, however, retained to reflect the interdisciplinary character of gender studies and the many areas in which gender perspectives are employed. Table 1 shows the distribution across the three groups and the research area of the first author.
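The ordering-and-filling logic of this sampling procedure can be sketched as follows. This is a minimal illustration under assumptions, not the authors' tooling: the record fields (`id`, `type`), the type label `"journal article"`, and the `assign_level` callback (a hypothetical stand-in for the manual assignment of gender perspective, returning `None` for excluded articles) are all invented for the example, and the later replacement of the Neutral medicine articles is not modelled.

```python
import random

GROUPS = ("Self-identified", "Inferred", "Neutral")

def draw_sample(publications, assign_level, per_group=12, seed=None):
    """Fill each gender-perspective group with `per_group` journal articles.

    publications: list of dicts with hypothetical 'id' and 'type' keys.
    assign_level: hypothetical callable returning one of GROUPS, or None for
        articles that are excluded (e.g. 'gender' in the linguistic sense).
    """
    rng = random.Random(seed)
    # Give every publication a unique random number (a shuffled rank).
    numbers = rng.sample(range(len(publications)), len(publications))
    # Sort by publication type first and by the random number second.
    ordered = sorted(zip(numbers, publications),
                     key=lambda pair: (pair[1]["type"], pair[0]))
    groups = {name: [] for name in GROUPS}
    for _, pub in ordered:  # lowest random number first within each type
        if pub["type"] != "journal article":
            continue
        level = assign_level(pub)
        if level is None or len(groups[level]) >= per_group:
            continue  # excluded, or that group is already full
        groups[level].append(pub)
        if all(len(g) == per_group for g in groups.values()):
            break
    return groups
```

Because the random numbers are unique, the sort never has to compare the publication records themselves, and passing a fixed `seed` makes a draw reproducible.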

Content analysis
The process consisted of identifying statements in the texts and categorising them by means of content analysis (Krippendorff 2013), as further described below. This method reduces the numerous words in a text to a smaller number of content categories (Weber 1990). These categories are made up of coding units which, if part of the same content category, are all expected to have the same meaning. One typically chooses a "natural" unit, in this case full sentences, and defines coding units using a syntactical distinction. As is common in content analysis, the categories were in part induced by the content of the statements and in part influenced by our knowledge of the area (Krippendorff 2013; Miles and Huberman 1994). For example, such prior knowledge suggested that societal systems, sex differences, and theories about people in general would be among the categories.
The articles were content analysed in random order, and no trend in the coded categories could be observed as a function of coding order. Categories were identified after one or two articles from each group had been coded, and were few and general in nature, consistent with the intention to give a broad overview of the article content. These categories were: (1) People Concrete: actual people, individuals or groups are mentioned, typically when referring to earlier studies or own results; (2) People Abstract: statements about non-specific people, individuals or groups, for example when mentioning theories, policies or typical/hypothetical situations; (3) Society: for example policies, governmental actions, occupations or societal discourses; and (4) Sex differences: differences between men and women, both concrete and abstract. The sex difference statements were also about people, but were nonetheless gathered in their own category because of their specific relevance to Gender studies. Sex rather than "gender" was used for this category because virtually all these statements concern differences between males and females, rather than some index of social sex such as masculinity/femininity. Finally, (5) statements about theories or matters other than the above categories, for example images, texts or measuring instruments, were categorized as Other. The categories were mutually exclusive, such that a statement could belong to only one category. All in all, 2805 statements were categorized, 500 of which were coded by two independent coders to assess coding reliability, as reported in the Results section. Both coders were aware of the objectives of the study and discussed the definitions before coding commenced, but were blind to which group each article had been assigned to.

Dependent variables
The quality indicators were the numbers of statements coded as belonging to each of eighteen categories, some predetermined and some emergent. Continuing from the five broad subject matter categories (1-5), they are numbered consecutively and comprise the level of conceptual support for the statements (6-9), the type of relationship between variables or constructs (10-13), and relations to earlier theories and results and limitations with one's own study (14-18). These variables are described in more detail below.
Level of support
Support was defined as other statements providing facts or arguments for a given statement. It was mechanically categorised by one coder into one of four levels: No support (6): the statement is not supported at all; Low (7): it is supported by a reference, an empirical fact, or a cohesive argument somewhere within the text; Medium (8): it is supported within the 10 sentences before or the 3 sentences after it; High (9): it is supported within the 3 sentences before or the 1 sentence after it. Statements regarding common knowledge, the scarcity of earlier studies, and reports of own results were ignored in this analysis, leaving 1946 of the 2805 statements.
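Because the support levels were assigned mechanically, the rule can be expressed as a small function. This is a hypothetical sketch: it assumes each statement comes with the sentence offsets of its supporting statements (negative = before, positive = after) plus a flag for support found elsewhere in the text, and that the highest applicable level wins when several supports exist. The paper specifies neither a data format nor a tie-breaking rule.

```python
from typing import Iterable

NO_SUPPORT, LOW, MEDIUM, HIGH = 6, 7, 8, 9  # category numbers used in the text

def support_level(offsets: Iterable[int], supported_elsewhere: bool = False) -> int:
    """Classify one statement into support categories 6-9.

    offsets: sentence positions of supporting statements relative to the
        statement itself (-2 = two sentences before, +1 = the next sentence).
    supported_elsewhere: True if a reference, empirical fact or cohesive
        argument supports the statement elsewhere in the text.
    """
    near = [d for d in offsets if d != 0]  # offset 0 would be the statement itself
    if any(-3 <= d <= 1 for d in near):
        return HIGH    # within 3 sentences before or 1 sentence after
    if any(-10 <= d <= 3 for d in near):
        return MEDIUM  # within 10 sentences before or 3 sentences after
    if near or supported_elsewhere:
        return LOW     # supported, but only further away in the text
    return NO_SUPPORT
```

For instance, a statement whose only support comes four sentences later falls outside both windows and is coded Low (7).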

Relationship between variables
The type of relationship between variables, including constructs or dimensions, was categorised as (10) Statement of Fact, meaning descriptive statements regardless of whether they were supported; (11) Correlation, where it is claimed that two or more variables are correlated; (12) Causality, where one variable affects or controls another; and (13) Speculation/Hypothesis, which may involve any type of relation. Specifically, Speculation/Hypothesis was defined as clearly stating that the subject matter is unsettled and has not reached academic consensus; such statements may range from wild speculation to sound hypotheses based on earlier research. Cross-sectional designs were coded as correlational because they do not address causality but do indicate associations between variables. Correlation and Causality were decided beforehand, whereas the other two categories emerged from the content analysis. Of the 2805 statements, 2754 could be coded in one of these categories; the remaining 51 stated, for example, the author's own views or methodological concerns. The categorisation was performed by one coder, although inter-rater reliability was assessed for 500 randomly selected statements categorised by two coders, as reported in the Results section.

Relations to other research and limitations with the authors' own study
This dimension was assessed by coding the statements into five categories. After excluding 97 statements noting that earlier studies were scarce or concerning commonly known facts, 2708 of the 2805 statements were coded into four of them: (14) Earlier theories, which includes previous research theories and reasoning, not necessarily tested or included in the study; (15) Earlier results, which includes explicit previous empirical findings; (16) Own results, where the present results are discussed; and (17) Comparisons, where own results are compared or contrasted with previous results and earlier studies generally. Finally, another 174 statements were coded in the fifth category, (18) Limitations with the authors' own study, which relate to the methods used, the scope of the study, the type of data, or a mention of reflexivity. In all, this dimension comprised 2882 statements, coded by one coder only because of the mechanical nature of the coding.

Statistical analysis
Differences between the groups were assessed with χ² tests comparing observed and expected frequencies, based on the absolute numbers of statements in each group. A subset of 500 statements was coded by two individuals to assess reliability. However, nine categories whose definitions were highly concrete and could be applied mechanically were coded by one individual only, to reduce the workload. These were Earlier theories, Earlier results, Comparisons, Own results, and Limitations (with the authors' own study), and the four levels of support (No, Low, Medium, and High). Coding reliability was estimated with Cohen's kappa.
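Cohen's kappa compares the observed agreement between two coders, p_o, with the agreement expected by chance from each coder's label frequencies, p_e, as kappa = (p_o - p_e) / (1 - p_e). A minimal standard-library sketch follows; the toy labels in the example are illustrative and not taken from the study.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same statements."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(coder_a)
    # Observed agreement: share of statements given the same label.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_e == 1:
        return 1.0  # degenerate case: both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

a = ["Society", "Society", "People Abstract", "Other"]
b = ["Society", "People Abstract", "People Abstract", "Other"]
print(cohens_kappa(a, b))  # p_o = 0.75, p_e = 0.3125, kappa = 7/11
```

Kappa is preferred over raw percentage agreement here because categories such as Statement of Fact dominate the data, so two coders would agree often even by chance.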

Results
The reliability of the coding of the 500 statements was assessed for the content categories (People Concrete, People Abstract, Society, Sex differences and Other) and the types of relationship (Statement of Fact, Correlation, Causality and Speculation/Hypothesis). Categories with three or fewer statements were excluded from this analysis, namely the content category Other in Correlation, Causality and Speculation/Hypothesis, and Society and Sex differences in Speculation/Hypothesis.
In addition to the analyses described, the words in the 36 articles were counted, excluding tables, figures and reference lists. Ten of the 12 Self-identified articles had between 7000 and 9000 words, two had ~4100 and ~5200 words, and the average was 7371 words. Ten of the Inferred articles had between 4700 and 9500 words, two had ~2600 and ~3200 words, and the average was 6588 words. Nine of the Neutral articles had between 4000 and 9000 words, the remaining three had ~1800, ~2500 and ~3000 words, and the average was 5552 words. The grand total was 234,130 words; subtracting 34,567 words of direct method descriptions left 199,563 words for the content analysis.
Figure 1 presents the results of the content analysis, that is, categories 1-5 as numbered in the Method section. The bars represent the percentage of statements in each content category for each of the groups Self-identified, Inferred, and Neutral, which makes the groups comparable despite their different total numbers of statements. The distribution of these statements across the three groups was not random, according to a χ² test (χ²(8, N = 2805) = 473.035, p < .05).
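The omnibus test treats the data as a groups × categories table of counts: each expected count is row total × column total / N, and df = (rows - 1)(columns - 1), which gives df = 8 for the 3 × 5 content table and df = 6 for the 3 × 4 support table. The sketch below computes the statistic, an exact p-value (the closed form used here holds for even df, which covers every test reported in this section), and adjusted standardized residuals of the kind used in the post hoc analysis. It assumes that "standardized residuals" means Agresti's adjusted residuals and that a cell is labelled H or L when |residual| > 1.96; the paper states neither, and the example counts are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def chi2_independence(table):
    """Pearson chi-square test on an r x c table (rows = groups, columns = categories)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    p = chi2_sf_even_df(stat, df) if df % 2 == 0 else None
    # Adjusted standardized residuals: roughly N(0, 1) under independence.
    adjusted = [[(table[i][j] - expected[i][j])
                 / math.sqrt(expected[i][j] * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
                 for j in range(len(col_tot))]
                for i in range(len(row_tot))]
    return stat, df, p, adjusted

def flag_cells(adjusted, critical=1.96):
    """Label cells 'H' (more than expected), 'L' (fewer than expected) or ''."""
    return [["H" if r > critical else "L" if r < -critical else "" for r in row]
            for row in adjusted]

# Hypothetical 2 x 3 table: stat = 20.0, df = 2, p = exp(-10) (about 4.5e-05).
stat, df, p, adj = chi2_independence([[10, 20, 30], [30, 20, 10]])
print(stat, df, p, flag_cells(adj))
```

A real groups × categories table such as Table A1 could be passed in the same way, with one row per gender-perspective group.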
A post hoc analysis with standardized residuals (Agresti 2012) showed that Self-identified had a lower proportion and Neutral a higher proportion of People Concrete statements than expected under the null hypothesis, and that Inferred had a higher and Self-identified a lower proportion of People Abstract statements than expected. Self-identified contained a higher, and Inferred and Neutral a lower, proportion of Society statements than expected. Sex differences were mentioned less than expected in Self-identified and more than expected in Neutral articles. These significant contrasts are indicated by the labels H and L in the figures. Table A1 in the Appendix lists the number of statements for each group and content category.
Figure 2 depicts the proportions of each level of support in the three groups. A χ² test showed that the distribution of statements differed significantly across groups (χ²(6, 1946) = 117.269, p < .05). The same post hoc analysis as for content category showed that Self-identified and Inferred had a higher proportion, and Neutral a lower proportion, of No support (6) statements than expected. Neutral also had a higher proportion of High support (9) than expected. There were no differences for the Low (7) or Medium (8) levels of support. Table A2 in the Appendix lists the number of statements for each group and level of support.
Figure 3 shows the distribution of relationships between variables across the three groups. Significant differences were found for (10) Statement of Fact (χ²(2, 1883) = 71.238, p < .01), (11) Correlation (χ²(2, 541) = 166.519, p < .01), and (13) Speculation/Hypothesis (χ²(2, 196) = 12.464, p < .01), but not for (12) Causality (χ²(2, 134) = 7.923, p > .01). The post hoc analysis showed that Self-identified had a higher proportion and Neutral a lower proportion of Statement of Fact (10) statements than expected.
Conversely, Correlation (11) and Speculation/Hypothesis (13) statements were more frequent than expected in the Neutral group and less frequent than expected in the Self-identified group. Table A3 in the Appendix lists the number of statements for each group and statement category.
Figure 4 shows the distribution of statements discussing relations to other research and Limitations with the authors' own study. χ² tests showed significant differences for (14) Earlier theories (χ²(2, 1622) = 85.745, p < .01), (15) Earlier results (χ²(2, 326) = 100.84, p < .01), (17) Comparisons (χ²(2, 175) = 28.359, p < .01) and (18) Limitations with the authors' own study (χ²(2, 174) = 115.373, p < .01), but not for (16) Own results (χ²(2, 585) = 2.733, p > .05). The post hoc analysis showed that Self-identified had a higher proportion and Neutral a lower proportion of Earlier theories (14) statements than expected. Self-identified mentioned Earlier results (15), Comparisons (17) and Limitations (18) with the authors' own study less than expected, and Neutral more than expected. In fact, seven of the Self-identified articles, five of the Inferred and one of the Neutral articles contained no discussion of limitations at all, and reflexivity was referred to in only one of the Self-identified articles. See Table A4 in the Appendix for the number of statements for each group and category.
Fig. 1 Proportions of statements for the groups Self-identified, Inferred and Neutral within the five content categories. Statistically significant differences according to residual χ² analysis are denoted with H = higher proportion than expected and L = lower proportion than expected across the groups.
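The per-category tests with two degrees of freedom appear to compare one category's counts across the three groups with the counts expected if that category were distributed in proportion to each group's total number of statements. The N of each test equals the category total, and these Ns sum to the coded totals (1622 + 326 + 585 + 175 = 2708 for categories 14-17; 1883 + 541 + 134 + 196 = 2754 for categories 10-13), which is consistent with this reading, but it is our assumption rather than something the paper states. With df = 2 the χ² upper-tail probability is exactly exp(-χ²/2), so no statistics library is needed. The group totals in the example are hypothetical.

```python
import math

def category_test(category_counts, group_totals):
    """Chi-square goodness-of-fit for one category across three groups (df = 2).

    category_counts: statements in this category in each group.
    group_totals: all coded statements in each group; sets the expected shares.
    """
    if len(category_counts) != 3 or len(group_totals) != 3:
        raise ValueError("this sketch assumes exactly three groups")
    n_category = sum(category_counts)
    n_all = sum(group_totals)
    # Expected counts if the category followed each group's share of all statements.
    expected = [n_category * total / n_all for total in group_totals]
    stat = sum((o - e) ** 2 / e for o, e in zip(category_counts, expected))
    p = math.exp(-stat / 2)  # exact chi-square tail probability for df = 2
    return stat, p, expected

stat, p, expected = category_test([30, 20, 10], [100, 100, 100])
print(stat, p, expected)  # 10.0, exp(-5) (about 0.0067), [20.0, 20.0, 20.0]
```

The same closed form shows that the Causality statistic reported for Figure 3, χ²(2, 134) = 7.923, corresponds to p ≈ .019.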
In summary, there were significant differences between levels of gender perspective for 13 of the 18 dependent variables: four of the five subject matter categories, two of the four support categories, three of the four relationship-between-variables categories, and four of the five categories for relations to other research and limitations. A higher level of gender perspective was associated with more content mentioning society than individuals, more mentioning of earlier theories, a higher proportion of stated facts, and a lower proportion of speculation/hypothesis statements. More gender perspective was also associated with less mentioning of sex differences, of results from other studies, and of limitations with the authors' own study, as well as with a higher proportion of unsupported statements. In terms of relations between variables, all three groups showed a low occurrence of statements mentioning causal relationships, but correlational relationships were more common the less the gender perspective.
Having reported the main effects of gender group, we now turn to the interaction between type of content and type of relationship between variables within each gender group. Figure 5 shows that in Self-identified the largest proportions of statements, between ~8 and 30%, are found in the relationship category Statement of Fact, and deal with all content categories except Sex differences. Smaller proportions are also found in Correlation and Speculation/Hypothesis regarding Society, at 3.7 and 2.7%, respectively. Figure 6 shows that the statements in the Inferred group are likewise mostly found in the relationship category Statement of Fact, with People Abstract having the largest proportion at ~26%, People Concrete, Society and Other each around 12%, and Sex differences ~4%. People Concrete also has a relatively large proportion of correlational statements at ~8%, and People Abstract proportions of ~2.5-5.5% in Correlation, Causality and Speculation/Hypothesis.
Figure 7 conveys a quite different pattern from Figs. 5 and 6 for several categories. The largest proportions of statements within the Neutral group are found for People Concrete and People Abstract in Statement of Fact and Correlation. Somewhat lower proportions are also found in Causality and Speculation/Hypothesis for People Abstract, at ~3-5%, and in Statement of Fact for Society, Other and Sex differences, at ~2-6%. Society and Sex differences also had proportions of ~2 and ~4%, respectively, in Correlation. There are substantial differences in the number of correlational statements across the groups overall, as shown in Fig. 3, and the interaction graphs also show how these statements shift from the content categories pertaining to people and Sex differences towards Society as the gender perspective increases. While Neutral has the highest level of correlational statements, the few correlational statements in Self-identified mostly concern society.
Fig. 2 Proportion of statements for the groups Self-identified, Inferred and Neutral within the four different levels of support. Statistically significant differences according to residual χ² analysis are denoted with H = higher proportion than expected and L = lower proportion than expected across the groups.
Fig. 3 Proportion of statements for the groups Self-identified, Inferred and Neutral within relationship between variables in the statements. Statistically significant differences according to residual χ² analysis are denoted with H = higher proportion than expected and L = lower proportion than expected across the groups.
Fig. 4 Proportion of statements for the groups Self-identified, Inferred and Neutral within the relations to other research and Limitations with the authors' own study. Statistically significant differences according to residual χ² analysis are denoted with H = higher proportion than expected and L = lower proportion than expected across the groups.
The figures also show that the high occurrence of Statement of Fact in Self-identified is mostly found in statements concerning Society, People (both Concrete and Abstract) and Other, while the comparatively few Statement of Fact statements in Neutral mostly concern people in a concrete sense. Moreover, while Self-identified statements speculate and hypothesize about societal matters, speculation and hypothesis statements in the Inferred and Neutral articles dwell mostly on people in an abstract sense.

Discussion
The purpose of the present study was to empirically evaluate the grounds for a range of criticisms that have been directed at the field of gender studies, and to provide a high-level description of the contents of the three groups of publications. The hypothesis that there are no differences in scientific quality indicators between the levels of gender perspective must be rejected. The alternative conclusion, that more gender perspective is associated with lower quality, is buttressed by a consistent dose-response relationship for seven of the nine quality indicators in the relationship categories. The level of support, however, was higher for Neutral but did not differ between the other two groups. The second hypothesis, regarding subject matter, seems to be supported. The focus on society in the Self-identified articles is consistent with the emphasis on societal systems and structures in the field (Rothstein 1999) and with previous results showing that gender perspective is predominantly concerned with societal explanations (Söderlund and Madison 2017). Furthermore, the many statements about societal issues may be related to the political equality goal of gender studies (Thurén 2002; Cronin et al. 1997), consistent with claims that gender studies is overly ideological (Friedman 1997; Hoff Sommers 1995). The lesser focus on actual or abstract people and on sex differences, compared to the Neutral group, might stem from a top-down conception of conditions for women and men, as opposed to a bottom-up conception emerging from the actions and preferences of individuals (recent relevant meta-analyses and systematic reviews include Lippa 2010; Puts et al. 2008; Stoet and Geary 2012; Su et al. 2009; Voracek et al. 2011; Voyer and Voyer 2014).

Fig. 6 Proportion of statements in Inferred with respect to content categories and type of relationship in the statements.
Moreover, the Inferred and Neutral texts speculated and hypothesized about people in an abstract sense, while the Self-identified texts dwelled mainly on societal speculation. The Neutral texts also had a higher proportion of cross-sectional sex differences (included in the Correlation category) than the other two groups. These results indicate a general epistemological difference across disciplines, where the texts written by gender scholars are more abstract, less empirical, and to a greater extent focused on societal matters. This tendency to apply a bird's-eye perspective is consistent with the criticism that gender studies rarely concerns itself with empirical data (Rothstein 1999), and that when it does, it tends to treat these as group-level phenomena, such as "attitudes", "norms", "values", or "practices". It is worth noting that such constructs may just as well be the cause as the effect of people's actions, which makes it logically impossible to assess any causal direction. This also seems to be reflected in a tendency toward higher proportions of causal statements in the Inferred and Neutral groups (albeit non-significant). In terms of relations between variables, the interaction plots exhibited a pattern where Self-identified had many statements about society and about people in an abstract sense, and a moderate number of statements about people in a concrete sense. Almost all of these were presented as statements of fact. In contrast, the Inferred and Neutral groups largely framed their frequent mention of people in a concrete sense in the context of correlations with other variables. This pattern is consistent with the criticism that the research methods employed in gender studies tend not to involve causal or even correlational interpretations (Pereira 2012). Figures 5, 6 and 7 display many more interactions, which, together with the even more detailed tables in the Appendix, constitute raw data for further analysis.
We refrain from attempting to interpret them further, because they were not part of our hypotheses.
Turning to the dependent variables that more directly tap scientific quality, we find that Causality and Own results (numbered 12 and 16 above) exhibited no significant differences across the groups, and thus appear to be dimensions on which the groups do not differ. This is further supported by the lack of any consistent trend related to gender perspective for these two variables. Correlation, Speculation/Hypothesis, Earlier results, Comparisons, and Limitations with the authors' own study (11, 13, 15, 17, and 18) are all mentioned more often the less the gender perspective, and the level of support (6-9) likewise increases as the gender perspective decreases, although it did not differ between Self-identified and Inferred. In contrast, Statement of Fact and Earlier theories (10, 14) were more frequent the more the gender perspective.
Correlation includes mention of empirical relations between variables that are not explicitly causal, such as cross-sectional designs, and exhibits substantial and highly significant differences across all three groups. Again, this is consistent with a general epistemological difference across disciplines, as argued by Rothstein (1999) and Popova (2005). A smaller trend in the same direction was observed for Speculation/Hypothesis, meaning that a higher level of gender perspective was associated with a lower proportion of explicit hypotheses (although this did not hold for the absolute difference between Inferred and Neutral).
The lower occurrence of Earlier results and Comparisons with earlier research within Self-identified, and of supported statements within Self-identified and Inferred, can be related to the claim that gender studies is non-academic (Friedman 1997; Hoff Sommers 1995). This is compounded by the less frequent mention of Limitations with the authors' own study in the Self-identified texts, and by the fact that more than half of the Self-identified articles and two fifths of the Inferred articles contained no statements about limitations at all. It is notable that reflexivity was found in only one of the 24 articles with any level of gender perspective, although reflexivity is claimed to be an especially strong virtue of gender studies texts (Chrisler and McCreary 2010; Dölling and Hark 2000). The pronouns "I", "we", "mine" and so on were common enough, and made the author visible in the text rather than hidden behind passive constructions, but were rarely extended to a discussion of the limitations of the authors' own study.
In contrast, statements related to Earlier theories were more frequent the more the gender perspective. A scientific study is typically stronger if it is based on theory, provided that a theoretical prediction is actually tested. As the Appendix shows, however, theories are mentioned frequently even in the Neutral articles, with a total of 285 statements, or an average of about 24 per article. The articles with a gender perspective mention Earlier theories 56 times per article on average, but the low frequencies of correlational statements and of comparisons with earlier results indicate that few of the theories mentioned are evaluated in the light of empirical data, if any exist, let alone systematically tested. The difference in Earlier theories therefore seems consistent with gender studies being overly theoretical, in the sense that only a very small fraction of the theoretical propositions mentioned are, or even can be, evaluated or tested.
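The per-article averages quoted above can be reproduced with simple arithmetic. The sketch below assumes that the N of 1622 reported with the χ² test for Earlier theories is the total number of such statements across all three groups of 12 articles, so that the 24 articles with a gender perspective (Self-identified and Inferred pooled) account for 1622 − 285 of them; this reading is an inference, not something stated in this section.

```python
# Assumption: 1622 (the N of the Earlier theories chi-square test) is the total
# number of Earlier theories statements across the three groups of 12 articles.
TOTAL_EARLIER_THEORIES = 1622
NEUTRAL_EARLIER_THEORIES = 285  # stated in the text
ARTICLES_PER_GROUP = 12

neutral_per_article = NEUTRAL_EARLIER_THEORIES / ARTICLES_PER_GROUP

# Self-identified and Inferred pooled: 24 articles with some gender perspective.
gender_per_article = (TOTAL_EARLIER_THEORIES - NEUTRAL_EARLIER_THEORIES) / (
    2 * ARTICLES_PER_GROUP
)

print(f"Neutral: {neutral_per_article:.2f} per article")
print(f"Gender perspective: {gender_per_article:.2f} per article")
# → Neutral: 23.75 per article
# → Gender perspective: 55.71 per article
```

Under this reading, the rounded figures in the text (about 24 and 56) follow directly, and articles with a gender perspective mention earlier theories roughly 2.3 times as often per article as the Neutral articles.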
Finally, the higher frequency of Statement of Fact might be a trivial consequence of fewer Correlation and Speculation/Hypothesis statements, since Statement of Fact tends to function as a default category in the content analysis, and its excess corresponds in magnitude to the lower frequencies of these categories. On the other hand, part of this difference could be a characteristic of gender perspective, which seems to be supported by the specific pattern evident in Figs. 5, 6 and 7, in particular its relation to Society.
Overall, these data indicate systematically lower scientific quality across several different indicators as the level of gender perspective increases. While claims to that effect have previously been based on anecdotal evidence or on observations of gender studies publications alone, this is the first time the question has been decisively assessed by comparing randomly drawn samples from an entire population (but see Söderlund and Madison 2017, for a different set of indicators).
Starting with a wide perspective, the present results raise questions about the extent to which the practices and traditions of gender studies are conducive to the purpose and goals of the field. With less mention of relations to other studies and results, and less support for statements and arguments, the field appears to focus on communicating examples of different experiences and viewpoints of certain groups of people, rather than on comprehensive models of the real world. If social and societal change is the goal, however, such models would seem essential for demonstrating the magnitude of both the problems to be solved and the side-effects that their solutions entail. In contrast, gender scholars express strong preferences for descriptive, non-experimental designs and for qualitative data and analysis (Eagly and Riger 2014; Grossman et al. 1997), and are also keen to criticize logical positivism and "traditional" science in general (e.g., Magnusson 2011). These are but a few aspects of so-called feminist epistemology, which is argued to be superior to mainstream scientific methods and to transform traditional psychology and the very conceptualizations of knowledge (Grossman et al. 1997, pp. 82-83). This aversion to what could coarsely be called traditional science is expressed amongst gender scholars in general, not only those cited here. It indicates a desire to overthrow tradition, or to establish a competing tradition. But this also entails a risk of isolation within one's own field, and contains the seeds of a spiral process, as mutual criticism raises the palisade and increases the distance to the 'other'.
It would therefore seem convenient to attribute differences in methods and practices to tradition, and to defend them on the grounds that it is unfair to compare traditions in which people have been trained differently. However, we maintain what we believe is a mainstream ideal: that methods should be determined by the questions being asked. Inasmuch as scholars address the same question, which may well happen across psychology, sociology, economics, and political science, for example, it should be appropriate to compare them according to the same standards. If the research question involves the relation between hormones and behaviour, for example, there is no reason for the chemical analyses or procedures to be substandard because the study was designed by a psychologist rather than a chemist. The logic of that argument is plain: tradition is not in principle a valid reason for applying suboptimal methods or lower scientific standards. The issue becomes more complex if we consider not merely differences in how to answer the same questions, but differences in what questions to ask. As we have seen, social and societal levels of explanation are favoured from a gender perspective, whereas factors related to individual differences are not. It is reasonable that a discipline focused on biological factors formulates questions about, for example, hormonal influences, that one focused on social interaction formulates questions about verbal and other behaviour, and so forth. But it is imperative that these diverse findings can be compared and jointly evaluated, lest they remain disconnected and hence unable to advance knowledge. The question of fairness therefore misses the point, which is that science can and should ultimately be evaluated on the basis of the knowledge it generates. A hallmark of good science is that researchers maintain an awareness and casual knowledge of the contributions from other fields to understanding the phenomena they study.
Possible limitations of the present study should be considered. First, the SGSL cannot be expected to contain every relevant publication. Although it is a population database, it is virtually impossible to find every gender studies journal article published at Swedish universities, partly because of the interdisciplinary nature of gender studies. This should not pose a problem for the present study, because it is based on differences between groups of articles, and possible missing data would affect all groups equally.
Second, it might be argued that 12 articles per group is too small a sample, because random factors might play a large role. However, the critical entity is the amount of text rather than the number of articles, because the unit of analysis was the group of articles, representing a certain level of gender perspective. This is consistent with the fact that the statements were drawn from the introduction and discussion sections, and were therefore less tied to the study at hand. Under this design, each statement was equally likely to be selected from within each group of articles, and the articles were of relatively similar length, so exceptionally long articles should not skew the results.
In conclusion, gender studies is consistent in both expressing and practising selective preferences, for example concerning methods, levels of explanation, and ways to communicate (see also Cronin et al. 1997; Söderlund and Madison 2017). The present results indicate that gender studies has positioned itself further from the scientific method than have the disciplines included for comparison. The impact of the field will, however, be attenuated if it employs methods and practices that decrease the strength of conclusions and the generalizability of results. Bibliometric studies seem to suggest that the peculiarities of the field are indeed hampering its influence and reputation (Söderlund and Madison 2015). The preference for dependent social constructs that preclude causal conclusions, such as "attitudes" and "norms", is particularly paradoxical in light of the desire for change, as knowing what causes a certain change is absolutely necessary to bring it about. It may therefore be argued that gender studies could benefit from implementing established research methods employed in other disciplines. There is no opposition, we argue, between using qualitative data, like verbal data from talk or text, and applying the power of quantitative analysis, as has been done in a modest way in the present study, for example. Likewise, there can be no opposition between analysing qualitative data and considering their reliability, validity, and representativeness, as well as causal relations amongst constructs. A similar point can be made regarding gender studies' criticism of established disciplines, not least in the way they relate to sex and sex differences. Discussion and criticism are always welcome from a scientific point of view, but they tend to be more useful and to make a greater impact if founded on a thorough understanding of that which is criticized.
Future studies should assess whether the general patterns found so far generalise to other countries and samples, extend the approach to more detailed content analyses, and perhaps also attempt to assess the argumentative foundation of the texts. If larger samples could be analysed, it would be feasible to compare publications on the basis of their individual characteristics in terms of the amount of data, scientific outcome, the type of methods, and, most importantly, the correspondence between the stated research question and the methods.
Gender studies in Sweden has received earmarked funding, outside competition with other fields, with the specific goal of increasing its quality (Swedish Research Council 2011a, b). It therefore seems particularly important to ensure that these resources attain the desired results, and that gender studies embraces the goal of striving for the highest possible level of scientific quality, which can be expected to raise its relatively low impact and citation rates (Söderlund and Madison 2015). In short, gender studies can do better.