To study the gender patterns of publication in research subfields in the journal Demography, we had to identify the gender of all of the authors and the research subfield in which each article was published (for a detailed description of the data collection and categorization procedure, see Nieberg et al. 2014). The citation information for all of the publications in Demography was drawn from the online JSTOR database and the Web of Science database, which covers the Science Citation Index Expanded, the Social Sciences Citation Index, and the Arts and Humanities Citation Index.
The key independent variable in our investigation is the gender of the author. However, the citation information obtained from Web of Science and JSTOR does not note the author’s gender. We therefore applied the program “gender.c” to the first names provided in the literature databases to identify the authors’ gender (see footnote 2). This program contains a list of first names for all European countries, for the United States, and for some Asian countries (e.g., China, India, and Japan). According to the program’s developer, the decision rules for assigning the typical gender to each name are based on assessments provided in interviews with several native speakers and experts. The program assigns a gender to each first name if it is typically given to either males or females, and it flags unisex names. If a name was coded as unisex or as not classifiable, we identified the gender of the author manually through online research. We were unable to retrieve the gender of the author in 185 cases (4 % of the full sample). These cases were omitted from our analysis. The total sample contains 2,252 articles with 4,197 authors.
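The name-based coding step described above can be illustrated with a minimal sketch. It is not the “gender.c” program itself: the small name table, the labels, and the function names are hypothetical and serve only to show the logic of assigning a typical gender, flagging unisex names, and routing unresolved names to a manual check.

```python
# Hypothetical miniature name table; the dictionary shipped with gender.c
# is far larger and is not reproduced here.
NAME_TABLE = {
    "maria": "female",
    "john": "male",
    "kim": "unisex",
}


def classify_first_name(first_name, table=NAME_TABLE):
    """Return 'female', 'male', 'unisex', or 'unclassified' for a first name."""
    key = first_name.strip().lower()
    return table.get(key, "unclassified")


def needs_manual_check(label):
    """Unisex and unclassified names are passed on to manual online research."""
    return label in {"unisex", "unclassified"}


authors = ["Maria", "John", "Kim", "Xyz"]
labels = {name: classify_first_name(name) for name in authors}
manual_queue = [name for name, label in labels.items() if needs_manual_check(label)]
```

In this sketch, `manual_queue` collects the names that the automatic step cannot resolve, which corresponds to the cases that were researched by hand.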
Categorizing articles by research area was the most challenging task in our analysis. After experimenting with several options, we decided to assign each article to a single category only (for a similar strategy, see Abramo et al. 2009; Dehdarirad et al. 2015; Dolado et al. 2012; Maliniak et al. 2013; West et al. 2013). Furthermore, we based the categorization of the papers on the dependent variable (see also Teachman et al. 1993). Because demography is an applied field of research, the classification of the papers by the dependent variable seemed straightforward and easily reproducible. However, ambiguous cases remained, especially if a paper was more theoretical or covered a broad range of topics. An alternative approach might have been to use a more objective, computer-assisted classification procedure based on keyword searches. However, a computer-based approach has disadvantages, particularly if it is based solely on a frequency count of words. For example, even if a word search showed that the term “fertility” appeared more frequently than the term “migration” in a paper, it would still be difficult to judge whether the paper was on migration or on fertility. We thus determined that a more qualitative approach that could take into consideration the chief objective of a paper (by focusing on the dependent variable) was the better option.
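The limitation of a pure frequency count can be made concrete with a small sketch. The abstract text below is invented for illustration, and the function name is hypothetical. The invented study treats migration as its outcome, yet the term “fertility” occurs more often.

```python
import re
from collections import Counter


def term_counts(text, terms):
    """Count how often each term occurs as a whole word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {term: counts[term] for term in terms}


# Invented abstract: the dependent variable is migration.
abstract = (
    "We study how fertility histories shape migration. "
    "High fertility and early fertility both delay migration."
)

counts = term_counts(abstract, ["fertility", "migration"])
most_frequent = max(counts, key=counts.get)
```

Here `most_frequent` is “fertility”, although a reader who focuses on the dependent variable would place the paper in the migration subfield.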
Before we could assign articles to subcategories, we had to create these categories. In this case as well, computer-based approaches might have been applied to generate subfields (see, e.g., Merchant 2015). We rejected this option because the core subfields in demography are well defined. Demographers generally agree that the main pillars of demographic research are fertility, mortality, migration, and methods. We therefore used these four narrow subfields as the basis for our classification procedure. After analyzing a sample of articles published in Demography, we extended this classification to include the following categories:
To help us assign each article to one category only, we developed a list of keywords for each subfield (see Table 2 in the appendix). In a pre-test we grouped the different articles based on the list of keywords alone. However, we later determined that the inclusion of additional decision criteria would improve the classification procedure (see Nieberg et al. 2014). Among the advantages of focusing on publications in Demography are that the papers tend to have a similar structure, and that most are quantitative studies with a clearly defined outcome variable. For example, an article on the effect of migration on first births would have been assigned to the subfield “fertility,” while a study on the effect of fertility on migration decisions would have been assigned to the subfield “migration.” Meanwhile, a paper that explored more than one outcome—for example, fertility and mortality processes—would have been assigned to the “other” category. Some papers focused not on a cause-effect relationship between variables, but on methodological aspects, such as improved measurement or data issues. Regardless of their content, we assigned these studies to the “methods and data” category.
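The decision rules in this paragraph can be summarized in a short sketch. It assumes that a human rater has already identified the subfield of each dependent variable and has judged whether the paper is primarily methodological. The function name, the input format, and the “unresolved” label for papers without an identifiable outcome are assumptions made for illustration.

```python
def assign_subfield(outcome_subfields, is_methodological=False):
    """Assign one article to exactly one category.

    outcome_subfields: subfields of the article's dependent variable(s),
        as judged by a rater (hypothetical input format).
    is_methodological: True if the paper focuses on measurement or data
        issues rather than on a cause-effect relationship.
    """
    # Methodological papers are grouped together regardless of their content.
    if is_methodological:
        return "methods and data"
    distinct = set(outcome_subfields)
    # A single outcome determines the subfield.
    if len(distinct) == 1:
        return next(iter(distinct))
    # Papers exploring more than one outcome go to the residual category.
    if len(distinct) > 1:
        return "other"
    # Assumption: papers without an identifiable outcome need further review.
    return "unresolved"


# Effect of migration on first births: the outcome is a fertility event.
first_births = assign_subfield(["fertility"])
# Fertility and mortality processes studied jointly.
joint_outcomes = assign_subfield(["fertility", "mortality"])
```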
This classification system is clearly subjective, and assignments may vary depending on the individual rater. To assess the reliability of the classification procedure, we instructed three independent raters to categorize a sample of the papers based on our classification rules (see footnote 4). The three raters independently categorized the same sample of around 200 abstracts. Based on the categorization of this sample, we calculated the coefficient kappa, which measures the degree of agreement of different raters (Cohen 1960). The resulting value of kappa was found to be above 0.80, which indicates an acceptable level of agreement on the classification of the publications among these three raters.
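As an illustration of the agreement measure, the sketch below computes Cohen's kappa for two raters as (p_o - p_e) / (1 - p_e), where p_o is the observed share of identical assignments and p_e is the share expected by chance from each rater's category frequencies. The text does not state how the three raters were combined into a single coefficient. Averaging the pairwise values, as done here, is one possible approach and is an assumption, as are the function names and the toy ratings.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who categorized the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must categorize the same non-empty sample.")
    n = len(rater_a)
    # Observed agreement: share of items with identical categories.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal category frequencies of each rater.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_e == 1:
        # Both raters used one identical category throughout; kappa is
        # formally undefined, and returning 1.0 is a choice of this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all pairs of raters (assumed aggregation)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy ratings for four abstracts, invented for illustration.
rater_1 = ["fertility", "fertility", "mortality", "mortality"]
rater_2 = ["fertility", "fertility", "mortality", "fertility"]
kappa_12 = cohen_kappa(rater_1, rater_2)
```

For more than two raters, Fleiss' kappa is a common alternative to pairwise averaging.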