Introduction

The debate surrounding gender inequity is more relevant than ever, with initiatives and research shedding light on imbalances across many contexts. Recently, the Nobel Prize in Economics recognized the work of Goldin, who examined the history of the gender gap over two centuries. Her research contextualized the presence of women in the labor market, addressing aspects such as social norms and institutional structures, and assessed the impact of major events such as the advent of the contraceptive pill (Goldin & Katz, 2002; Nobel Prize Outreach AB, 2023). Furthermore, the current availability of data has enhanced our capacity to detect, analyze, and understand disparities, enabling us to examine specific contexts and uncover their significance.

Academia is no exception to these broader gender disparities. Despite recent efforts to reshape the scientific landscape, gender imbalance persists within academia (Huang et al., 2020; Lariviere et al., 2013). The issue is not solely the number of women scientists (UNESCO Institute for Statistics (UIS), 2019): although women constitute the majority of Ph.D. candidates (Organisation for Economic Co-operation and Development (OECD), 2022), they remain underrepresented in leadership roles and higher-ranking positions (Fagan & Teasdale, 2021). Notably, women scientists tend to produce less research output (Zhang et al., 2021) and achieve lower impact with their work than men (Abramo et al., 2015; Chen & Seto, 2022; Joanis & Patil, 2022; Kozlowski et al., 2022; Lariviere et al., 2013; Liu et al., 2020; Mauleón & Bordons, 2006). Regarding funding and grant distribution, the results are inconclusive: some studies find that women receive less investment, whereas others find no difference (Bol et al., 2022; Bornmann et al., 2007; Chaudhary et al., 2021; Marsh et al., 2009). A recent study showed that a large proportion of women do not feel a sense of belonging in academic environments and are more likely to leave the profession as a result (Spoon et al., 2023). Moreover, women who remain in the academic system often receive lower incomes (Moss-Racusin et al., 2012; Darren & Steven, 2010) and are ultimately affected by disparities in hiring decisions (Shen, 2013).

The underrepresentation of women in the scientific system is more evident in some fields of knowledge than in others. In science, technology, engineering, and mathematics (STEM), the gap is significantly larger (Holman et al., 2018), and it is even more pronounced in certain STEM subfields, such as computer science and mathematics (Ceci et al., 2014; De Nicola & D’Agostino, 2021; Fagan & Teasdale, 2021). In contrast, there appears to be greater gender equity in human and health-related fields (Su & Rounds, 2015). The picture becomes more complex when race and ethnicity are considered in their intersection with gender, as this intersection may be related to the choice of research topics (Kozlowski et al., 2022). The reasons for women’s unequal participation across fields of knowledge remain unclear, and the debate revolves around structural problems associated with gender roles (Ceci et al., 2014; Petrongolo, 2019).

To understand and evaluate the complexity of the scientific system, we draw on the Science of Science (Zeng et al., 2017), which allows us to represent science through its conceptual, social, and intellectual structures and to interpret it from a multidisciplinary perspective (Moral-Munoz et al., 2020). Its diversity of methodological approaches allows us to delve into specific fields, understand their evolution, identify behavioral patterns, and gain deeper insight into the dynamics of knowledge production. The significance of this approach lies in its ability to uncover meaningful and interpretable insights into the various roles within science. This, in turn, enhances our understanding of the past, which is essential for comprehending the present, and provides the foundation for new projections aimed at a fairer and more transparent future. Nevertheless, due to the complexity of the scientific system, accurately quantifying and assessing knowledge transfer remains a challenge. This complexity stems from field-specific characteristics, including the distinct behaviors exhibited by different fields, which make it difficult to evaluate each phenomenon precisely (Bornmann, 2017). Addressing it requires understanding different contexts and perspectives, determining what needs to be measured, and identifying the best way to measure it (Nygaard & Bahgat, 2018; Zeng et al., 2017).

Given this background, studies within specific fields of knowledge are a pragmatic necessity for deepening our understanding, because behavioral patterns within each discipline, field, or subfield can be unique, making it crucial to understand and standardize measurement within each context (Butler, 2008; Dorta-González & Dorta-González, 2013). For instance, the Library and Information Science (LIS) field has traditionally been women-dominated (Piper & Collamer, 2001). In earlier times, women in LIS were often associated with and specialized in library education, whereas men were associated with information science or higher-status activities (Harris et al., 1985; Varlejs & Dalrymple, 1986). Today, the gender picture in LIS varies with the context of analysis. In terms of authorship, for example, women outnumber men as first authors in the USA (Thelwall & Mas-Bleda, 2020), whereas the opposite occurs in India (Parabhoi et al., 2020). Nevertheless, our understanding of this field through a gender lens remains limited. Gender-focused studies in LIS have mainly examined selected print and e-journals, exploring authorship, productivity, and citation patterns. Although the overall presence of women authors has increased over the years in both print and e-journals, women are more present in e-journals than in print ones. In e-journals, the publication and citation rates of women’s contributions have also increased over time. In print journals, by contrast, the increase was observed specifically in library science journals, whereas the presence of women in information science journals remained stable (Linsay, 2010; Lund & Shamsi, 2021; Vinay, 2021). In contrast, in the e-journals analyzed by Gul et al. (2016), men’s participation increased over the years, although citation rates did not differ by gender.
In that study, mixed-gender teams showed increasing citation rates and obtained more grants than men-only collaborations. At the national level, an earlier study of gender balance in India found patterns similar to those observed in journals: men outnumbered women authors, although, in this case, women published more in international journals than in national ones (Parabhoi et al., 2020).

Among the factors that may affect the scientific production and impact of researchers are the themes they address. To gain insights into the relationship between these characteristics, we explored the LIS field and its evolutionary dynamics, analyzing the field from 2007 to 2022. This analysis adopted a conceptual approach, including assessing the relative contribution by gender to each theme, its scientific impact, and a science mapping analysis. This approach aims to shed light on whether the themes addressed by each gender contribute to the observed disparity in their scientific impact.

The subsequent sections are organized as follows: Sect. 2 outlines the methods employed in our analysis. Then, in Sect. 3, the results obtained are presented. Sect. 4 discusses the results, addresses limitations, and suggests future research. Finally, Sect. 5 provides the conclusions of the study.

Methodology

As outlined in the introduction, our hypothesis is that the difference in the scientific output and impact of women and men could derive from a subject bias. That is, each gender might address different subfields with different citation patterns. Science mapping based on co-word networks and bibliometric indicators can be used to test whether this pattern occurs in the LIS field. To this end, we propose a methodology based on five phases: (i) Data acquisition, (ii) Data preprocessing, (iii) Data selection, (iv) Theme detection and (v) Performance analysis.

Data acquisition

To analyze the themes addressed in the LIS field from 2007 to 2022 and their gender composition, we retrieved the corpus from the Web of Science (WoS) using the following criteria:

  • Documents indexed in the Science Citation Index™ Expanded (SCI) or Social Sciences Citation Index® (SSCI).

  • Documents under the “Information Science & Library Science” category.

  • Documents produced between 2007 and 2022.

  • Documents tagged as articles or reviews.

The query was executed on January 23, 2023, and retrieved 63,843 documents. The data was downloaded using the WoS API (Velez-Estevez et al., 2023).
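The exact query string submitted to WoS is not reported. As a hedged illustration, the four criteria above can be expressed as a WoS advanced-search string using the standard field tags WC (Web of Science Category), PY (publication year) and DT (document type). The helper name `build_wos_query` and the `EDITIONS` setting are hypothetical, and the edition restriction (SCI/SSCI) is treated as a separate collection filter rather than a query clause.

```python
# Hypothetical reconstruction of the retrieval criteria as a WoS
# advanced-search string; not the query the authors actually submitted.

def build_wos_query(category: str, first_year: int, last_year: int,
                    doc_types: list) -> str:
    """Combine category, year-range and document-type criteria with AND."""
    types = " OR ".join(doc_types)
    return (f'WC=("{category}") AND PY=({first_year}-{last_year}) '
            f'AND DT=({types})')


# The SCI-Expanded / SSCI restriction is an edition filter, not a query
# clause, so it is kept as a separate (hypothetical) setting.
EDITIONS = ["SCI", "SSCI"]

query = build_wos_query("Information Science & Library Science",
                        2007, 2022, ["Article", "Review"])
print(query)
```

Keeping the criteria as parameters makes it easy to rerun the same retrieval for a different category or time window.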

Data preprocessing

Preprocessing is fundamental to the accuracy of our bibliometric analysis. In our case, there are two units of analysis: on the one hand, keywords were used to define themes and ultimately generate the science maps; on the other hand, authors’ names were used to infer gender. The keywords used are both author keywords and the KeywordsPlus generated by WoS (Garfield, 1990). Keyword preprocessing is crucial to prevent misinterpretation during the subsequent co-word analysis, and it was conducted using the open-source software SciMAT (Cobo et al., 2011, 2012). The first stage unified singular and plural forms, followed by a de-duplication step that merged terms describing the same concept (e.g., ‘Hirsch index’ and ‘h-index’; ‘NN’ and ‘Neural-Networks’). This step also identified terms with overly broad meanings, such as ‘study’, ‘analysis’, and ‘algorithms’. To ensure the quality of this process, the authors reviewed the whole set of preprocessed keywords.
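The keyword clean-up can be illustrated with a minimal sketch. This is not SciMAT’s implementation: the plural rule is deliberately naive, and the synonym map and generic-term list are hypothetical examples built from the terms mentioned above. In the study, these decisions were reviewed manually.

```python
from typing import Optional

# Hypothetical, illustrative rules; the study relied on SciMAT plus manual review.
SYNONYMS = {                     # variant -> preferred form
    "hirsch index": "h-index",
    "nn": "neural-networks",
    "neural network": "neural-networks",
}
GENERIC_TERMS = {"study", "analysis", "algorithms"}  # overly broad terms


def singularize(term: str) -> str:
    """Naive plural folding, e.g. 'libraries' -> 'library', 'journals' -> 'journal'."""
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]
    return term


def normalize_keyword(raw: str) -> Optional[str]:
    """Return the preferred form of a keyword, or None if it is too generic."""
    term = raw.strip().lower()
    if term in GENERIC_TERMS:
        return None
    if term in SYNONYMS:
        return SYNONYMS[term]
    term = singularize(term)
    if term in GENERIC_TERMS:
        return None
    return SYNONYMS.get(term, term)


def clean_keywords(keywords: list) -> list:
    """Normalize a document's keywords and drop duplicates, keeping order."""
    cleaned = []
    for kw in keywords:
        norm = normalize_keyword(kw)
        if norm is not None and norm not in cleaned:
            cleaned.append(norm)
    return cleaned


print(clean_keywords(["Hirsch index", "h-index", "Neural Networks", "NN",
                      "Academic Libraries", "Analysis"]))
# ['h-index', 'neural-networks', 'academic library']
```

A rule this simple mishandles words such as "thesis". That is one reason a manual review of the full keyword set, as the authors did, remains necessary.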

The next step consisted of identifying the gender of the authors from each author’s given name. Three datasets containing given names and gender information were used: the United States Census (Social Security Administration, 2022), the NamesDatabase repository (Leaderboard, 2022), and the Python library GenderGuesser (Pérez, 2016). To handle unisex names, we developed a disambiguation algorithm that assigns a gender to a name only when that gender is associated with the name in at least 90% of its instances. This threshold was chosen based on previous research that used authors’ first names to assign gender (Thelwall & Mas-Bleda, 2020; Thelwall et al., 2019, 2023), with the expectation that it would make the gender assignment almost always correct. It is worth noting that gender was treated as binary in this study, because other gender identities require self-identification.
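The 90% rule can be sketched as follows. The sketch assumes that the three name sources have already been merged into per-name counts of female and male records. GenderGuesser returns categorical labels rather than counts, so that merge step is hypothetical and omitted, as is the function name `assign_gender`.

```python
from typing import Optional

THRESHOLD = 0.90  # share of instances required to assign a gender


def assign_gender(female_count: int, male_count: int,
                  threshold: float = THRESHOLD) -> Optional[str]:
    """Return 'woman', 'man', or None (unknown) for one given name.

    female_count / male_count: how often the name is recorded as female /
    male in a (hypothetical) merged name-gender table.
    """
    total = female_count + male_count
    if total == 0:
        return None          # name not found in any source
    if female_count / total >= threshold:
        return "woman"
    if male_count / total >= threshold:
        return "man"
    return None              # unisex name below the threshold: left unknown


print(assign_gender(950, 50))   # woman (95% female)
print(assign_gender(30, 970))   # man   (97% male)
print(assign_gender(600, 400))  # None  (60% female, ambiguous)
```

Leaving ambiguous names unassigned, rather than picking the majority gender, trades coverage for precision. This is consistent with the stated goal that assignments be almost always correct.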

Despite the use of several sources and methods to determine the gender of each author, some names could not be associated with a gender. Because our analysis is at the document level, an additional step was needed to filter out documents in which the number of authors of unknown gender is greater than or equal to the number of men or women authors. After this filtering, the corpus contained 45,650 documents (71.5% of those retrieved in the data acquisition step).
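The document-level filter can be sketched as below. The criterion in the text admits more than one reading. This sketch assumes that a document is dropped when its unknown-gender authors are at least as numerous as its identified (women plus men) authors. The function name `keep_document` is hypothetical.

```python
from typing import Optional, Sequence


def keep_document(genders: Sequence[Optional[str]]) -> bool:
    """Return True if a document is kept for the document-level analysis.

    genders: one label per author, 'woman', 'man', or None when the
    gender could not be identified.
    """
    women = sum(1 for g in genders if g == "woman")
    men = sum(1 for g in genders if g == "man")
    unknown = sum(1 for g in genders if g is None)
    # Assumed reading: drop when unknown authors >= identified authors.
    return unknown < women + men


documents = {
    "doc1": ["woman", "man", None],  # 1 unknown vs 2 identified -> kept
    "doc2": ["man", None],           # 1 unknown vs 1 identified -> dropped
    "doc3": [None],                  # only unknown authors      -> dropped
}
kept = [doc_id for doc_id, g in documents.items() if keep_document(g)]
print(kept)  # ['doc1']
```

Under an alternative reading, where unknown authors are compared with the larger of the two gender groups, `doc1` would also be dropped. The retention rate therefore depends on this choice.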

As mentioned above, our aim is to determine the thematic differences between men and women and to analyze how these differences evolve over time. For this reason, we adopted a longitudinal framework, splitting the corpus into four consecutive periods: (i) 2007 to 2010, containing 9,845 documents; (ii) 2011 to 2014, containing 11,125 documents; (iii) 2015 to 2018, containing 12,320 documents; and (iv) 2019 to 2022, containing 12,360 documents.
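The longitudinal split is a simple mapping from publication year to period. The sketch below uses hypothetical helper names. It also checks that the period sizes reported above add up to the 45,650 documents of the filtered corpus.

```python
from collections import defaultdict
from typing import Optional

PERIODS = [(2007, 2010), (2011, 2014), (2015, 2018), (2019, 2022)]


def period_of(year: int) -> Optional[str]:
    """Label of the four-year period containing `year`, or None if outside."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def split_by_period(docs):
    """docs: iterable of (doc_id, publication_year) pairs."""
    groups = defaultdict(list)
    for doc_id, year in docs:
        label = period_of(year)
        if label is not None:
            groups[label].append(doc_id)
    return dict(groups)


# Document counts reported in the text; their sum matches the filtered corpus.
REPORTED = {"2007-2010": 9845, "2011-2014": 11125,
            "2015-2018": 12320, "2019-2022": 12360}
assert sum(REPORTED.values()) == 45650

print(split_by_period([("a", 2007), ("b", 2012), ("c", 2022), ("d", 2006)]))
# {'2007-2010': ['a'], '2011-2014': ['b'], '2019-2022': ['c']}
```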

Theme detection

After data selection, four co-word networks were built, one for each of the periods mentioned above, using the keywords of each document. In this type of network, nodes represent the keywords found in the corpus, and edges represent co-occurrence relationships, that is, pairs of keywords that appear together in the same documents (Batagelj & Cerinšek, 2013; Cobo et al., 2011).
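A co-word network for one period can be built by counting keyword occurrences and pairwise co-occurrences per document. The minimal sketch below assumes keywords have already been cleaned. It represents the network as two counters rather than a graph object: the node weights c_i and the edge weights c_ij. The function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_word_network(documents):
    """documents: list of keyword lists (one list per document).

    Returns (occurrences, co_occurrences): c_i counts documents containing
    keyword i; c_ij counts documents containing both i and j.
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))          # each pair counted once per document
        occurrences.update(unique)
        for a, b in combinations(unique, 2):    # undirected edge (a, b) with a < b
            co_occurrences[(a, b)] += 1
    return occurrences, co_occurrences


# Hypothetical toy corpus.
docs = [["h-index", "impact", "citation"], ["impact", "citation"], ["health", "impact"]]
occ, co = co_word_network(docs)
print(occ["impact"], co[("citation", "impact")])  # 3 2
```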

To ensure data consistency and minimize bias, we normalized the co-occurrence frequencies of the networks using the equivalence index (Callon et al., 1991). Subsequently, the Leiden clustering algorithm (Traag et al., 2019) was used for community detection. This process yields a collection of clusters that represent the topics or themes covered in each dataset. The algorithm was chosen for its ability to identify cohesive communities. After community detection, each theme was named after its most central keyword, that is, the most connected word in the community (Cobo et al., 2011). In our case, these central words effectively summarized the structure of the communities.
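The equivalence index normalizes each co-occurrence count by the occurrences of both keywords, e_ij = c_ij² / (c_i · c_j). The sketch below computes it on hypothetical toy counts and then names a community after its most connected keyword. It assumes that "most connected" means the largest sum of equivalence-index weights to other keywords in the same community. The Leiden step itself is not reproduced: community membership is fixed by hand here, whereas in practice it would come from a Leiden implementation such as the `leidenalg` package.

```python
from collections import Counter

# Hypothetical toy counts: keyword occurrences c_i and pair co-occurrences c_ij.
occurrences = Counter({"citation": 2, "h-index": 1, "impact": 3, "health": 1})
co_occurrences = Counter({("citation", "h-index"): 1, ("citation", "impact"): 2,
                          ("h-index", "impact"): 1, ("health", "impact"): 1})


def equivalence_index(co_occ, occ):
    """e_ij = c_ij^2 / (c_i * c_j) (Callon et al., 1991)."""
    return {(a, b): c_ab ** 2 / (occ[a] * occ[b])
            for (a, b), c_ab in co_occ.items()}


def name_theme(community, weights):
    """Name a community after the keyword with the largest total link weight
    to other keywords of the same community (assumed centrality measure)."""
    strength = {kw: 0.0 for kw in community}
    for (a, b), w in weights.items():
        if a in community and b in community:
            strength[a] += w
            strength[b] += w
    return max(sorted(strength), key=strength.get)  # sorted: stable tie-breaking


weights = equivalence_index(co_occurrences, occurrences)
print(round(weights[("citation", "impact")], 3))               # 0.667
print(name_theme({"citation", "h-index", "impact"}, weights))  # citation
```

Because e_ij reaches 1 only when two keywords always appear together, the normalization keeps very frequent keywords from dominating the clustering merely through their raw counts.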

Once the themes addressed in the LIS field had been determined for each dataset, four strategic diagrams were built. These diagrams provide a two-dimensional representation of each theme based on its centrality and density. Centrality measures the external cohesion of a theme, that is, the strength of its links to the other detected themes, while density reflects its internal cohesion, revealing how strongly its keywords are interconnected (Cobo et al., 2011). Based on these variables, themes are classified into four groups: “Motor”, “Basic and Transversal”, “Highly developed and Isolated”, and “Emerging or Declining”. Motor themes exhibit high centrality and density: they hold a central position relative to other themes, and their keywords display strong cohesion, suggesting that they are well developed and well structured. Basic and Transversal themes also show high centrality, but their internal cohesion is low; they represent general, transversal themes in the field. Highly developed and Isolated themes have low centrality and high density, implying weak ties to the rest of the research area; however, their keywords are strongly cohesive, representing specialized or peripheral themes. Finally, Emerging or Declining themes have low centrality and low density, that is, weak external and internal cohesion, and indicate either emerging or declining themes.
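The quadrant assignment can be sketched as a comparison of each theme’s centrality and density with reference values. The text does not state the cut-off. This sketch assumes the median of each measure across the themes of a period, a common convention for strategic diagrams. The theme names and values are hypothetical.

```python
from statistics import median


def classify_themes(themes):
    """themes: {name: (centrality, density)} -> {name: quadrant label}.

    Assumption: 'high' means >= the median across the themes of the period.
    """
    c_med = median(c for c, _ in themes.values())
    d_med = median(d for _, d in themes.values())
    labels = {}
    for name, (c, d) in themes.items():
        if c >= c_med and d >= d_med:
            labels[name] = "Motor"
        elif c >= c_med:
            labels[name] = "Basic and Transversal"
        elif d >= d_med:
            labels[name] = "Highly developed and Isolated"
        else:
            labels[name] = "Emerging or Declining"
    return labels


# Hypothetical values, not taken from the study.
example = {"Theme A": (0.9, 0.8), "Theme B": (0.8, 0.3),
           "Theme C": (0.1, 0.9), "Theme D": (0.2, 0.1)}
for name, label in classify_themes(example).items():
    print(name, "->", label)
```

Because the medians are computed per period, a theme’s quadrant is relative to the other themes of the same period rather than an absolute property.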

Performance analysis

In order to quantify and analyze the presence and impact of women in the detected themes, a performance analysis was conducted. Various metrics were computed, including the number of papers within the theme, sum of citations, geometric mean of citations, mean normalized citation score (MNCS) (Waltman et al., 2011), top 1% of most cited papers, percentage of documents in H-Classic (Martínez et al., 2014), percentage of women and men within each theme, and relative gender contribution rate (RGCR).
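The text does not specify how the geometric mean handles uncited papers, for which ln(0) is undefined. A common choice for citation data, assumed in this sketch, is the offset form exp(mean(ln(1 + c))) − 1. The function name is hypothetical.

```python
import math


def geometric_mean_citations(citations):
    """Offset geometric mean exp(mean(ln(1 + c))) - 1 of citation counts.

    The +1 offset (an assumption here) keeps uncited papers in the average.
    """
    if not citations:
        raise ValueError("a theme needs at least one document")
    return math.exp(sum(math.log1p(c) for c in citations) / len(citations)) - 1


print(round(geometric_mean_citations([0, 1, 3]), 6))  # 1.0
```

Compared with the arithmetic mean (about 1.33 for the same three papers), the geometric mean damps the influence of a few highly cited documents, which suits skewed citation distributions.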

The MNCS is a citation impact metric that enables comparisons across disciplines, fields, subfields, and themes. In our analysis, MNCS is computed from the citations received by the documents of each theme and the average citations in the LIS field for each year. An MNCS of 1 means that the documents of a theme receive, on average, as many citations as the LIS field; an MNCS above 1 means that they receive more (Waltman et al., 2011). The top 1% indicator is the percentage of a theme’s documents that rank among the 1% most cited documents. Its calculation relies on two components: (i) the number of most cited documents in the dataset for a given year and (ii) the number of most cited documents within each theme for the same year. The H-Classic metric is based on the h-index (Hirsch, 2005) and considers the documents in the h-core, defined as the set of papers that each have h or more citations (Martínez et al., 2014).
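The three impact indicators can be sketched on toy data, with each document represented as a (publication year, citations) pair. Several details are assumptions rather than the authors’ documented procedure:

- Field averages and top-1% thresholds are computed per publication year.
- Ties at the top-1% threshold are counted as top documents.
- H-Classic membership uses the h-core of the whole set of documents passed in, following the "h or more citations" definition above.

All function names are hypothetical.

```python
import math
from collections import defaultdict


def yearly_field_means(field_docs):
    """Average citations per publication year over the whole LIS corpus."""
    totals, counts = defaultdict(int), defaultdict(int)
    for year, cites in field_docs:
        totals[year] += cites
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in totals}


def mncs(theme_docs, field_means):
    """Mean of citations / field average of the same year (averages must be > 0)."""
    return sum(c / field_means[y] for y, c in theme_docs) / len(theme_docs)


def top1_thresholds(field_docs, share=0.01):
    """Per year, the citation count of the ceil(1%)-th most cited document."""
    by_year = defaultdict(list)
    for year, cites in field_docs:
        by_year[year].append(cites)
    thresholds = {}
    for year, cites in by_year.items():
        ranked = sorted(cites, reverse=True)
        k = max(1, math.ceil(share * len(ranked)))
        thresholds[year] = ranked[k - 1]
    return thresholds


def top1_share(theme_docs, thresholds):
    """Fraction of theme documents at or above their year's top-1% threshold."""
    return sum(c >= thresholds[y] for y, c in theme_docs) / len(theme_docs)


def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def h_classic_share(theme_docs, field_docs):
    """Fraction of theme documents in the field h-core (>= h citations)."""
    h = h_index([c for _, c in field_docs])
    return sum(c >= h for _, c in theme_docs) / len(theme_docs)


# Hypothetical toy data, not taken from the study.
field = [(2019, 10), (2019, 0), (2019, 2), (2019, 4), (2020, 6), (2020, 2)]
theme = [(2019, 10), (2019, 2), (2020, 6)]
print(mncs(theme, yearly_field_means(field)))       # 1.5
print(top1_share(theme, top1_thresholds(field)))    # 2/3 of the theme's documents
print(h_classic_share(theme, field))                # 2/3 of the theme's documents
```

Normalizing by publication year matters because older papers have had more time to accumulate citations. Without it, themes concentrated in early years would appear more influential.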

The percentage of women and men within each theme is computed from the gender assigned to the authors, enabling us to analyze representation in each case. Given the global disparity in the number of women and men researchers (Holman et al., 2018), men are expected to outnumber women in most disciplines, fields, subfields, or themes. To recognize, assess, and normalize this initial imbalance, we introduce the RGCR metric. The RGCR evaluates the presence of women authors within a specific theme relative to the presence of men in that theme, each normalized by its share in the global dataset. It indicates whether the gender contribution rate is equal to, higher than, or lower than the average participation of women and men in the LIS field. The RGCR is computed from the percentages of women and men authors in the global dataset and their respective percentages within each theme, as shown in Eq. (1):

$$\begin{aligned} RGCR = \frac{\left( \frac{X_{Women-theme}}{X_{Women-global}}\right) }{\left( \frac{Y_{Men-theme}}{Y_{Men-global}}\right) } \end{aligned} \tag{1}$$

In this equation, \(X_{Women-theme}\) represents the percentage of women authors in the specific theme, \(X_{Women-global}\) is the percentage of women authors in the global dataset, \(Y_{Men-theme}\) denotes the percentage of men authors in the theme, and \(Y_{Men-global}\) signifies the percentage of men authors in the global dataset. An RGCR equal or close to 1 indicates that the gender composition of the theme matches that of the LIS field, i.e., the theme is gender balanced. Values greater than 1 indicate that women authors are overrepresented in the theme, whereas values lower than 1 indicate underrepresentation. This approach has the advantage of assessing the presence of women authors in each theme relative to the overall presence of women in the LIS field from 2007 to 2022.
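Eq. (1) translates directly into code. The global shares in the example below (40% women and 50% men, with the remainder of unknown gender) are hypothetical values chosen for easy arithmetic, not the study’s figures. The function name `rgcr` is also illustrative.

```python
def rgcr(x_women_theme, x_women_global, y_men_theme, y_men_global):
    """Relative gender contribution rate, Eq. (1); all inputs in percent.

    Percentages need not sum to 100, since some authors have unknown gender.
    """
    return (x_women_theme / x_women_global) / (y_men_theme / y_men_global)


# Hypothetical global shares: 40% women, 50% men.
print(rgcr(40, 40, 50, 50))            # 1.0    -> same composition as the field
print(round(rgcr(50, 40, 40, 50), 4))  # 1.5625 -> women overrepresented
print(round(rgcr(20, 40, 70, 50), 4))  # 0.3571 -> women underrepresented
```

Dividing each theme share by its global share is what removes the field-wide imbalance. A theme with more men than women can therefore still reach RGCR = 1.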

To better understand the relationship between the RGCR and MNCS achieved by each theme during the analyzed periods, a set of plots was generated in which the y-axis represents the RGCR, the x-axis the MNCS, and the diameter of each circle the number of documents in each theme. Combined with the strategic diagrams, these plots provide deeper insight into the presence of women in the themes and the importance of those themes in the field, highlighting their similarities and differences.

Results

After the methodology was applied, the global dataset for 2007 to 2022 contained 45,650 documents (71.5% of the data initially retrieved) by 126,418 authors, of whom 40.85% (51,644) were women, 53.40% (67,505) were men, and 5.75% (7,269) were of unknown gender. The global dataset was divided into four consecutive periods to uncover thematic differences between women and men and how these evolve. The number of documents and the gender representation in each period are shown in Fig. 1. In what follows, we analyze each period, identifying the themes on which women and men focused and computing their performance and impact measures.

Fig. 1

Percentage of authors by gender over the analyzed periods: 2007 to 2010, 2011 to 2014, 2015 to 2018, and 2019 to 2022

From 2007 to 2010

The period from 2007 to 2010 comprised 9,845 documents, and 42.68% of their authors were women. The strategic diagram identified 21 themes, as illustrated in Figs. 2 and 3. Women were overrepresented in four of these themes, most markedly in Substance-use and Posttraumatic-stress-disorder, whereas men were overrepresented in 13 themes, most markedly in Clustering and Online-auctions. The themes with similar gender ratios were Education, Physician-order-entry, Information-retrieval and Computer-mediated-communication.

Fig. 2

Strategic diagram from 2007 to 2010, where the x-axis and y-axis represent the centrality and density, respectively, of each community. Circle sizes indicate the number of documents in each community, while their position classifies them as ‘Motor’, ‘Basic and Transversal’, ‘Emerging or Declining’, or ‘Highly developed and Isolated’

Fig. 3

Distribution of themes from 2007 to 2010 based on their impact (MNCS, x-axis) and relative gender contribution rate (RGCR, y-axis). Circle sizes indicate the number of documents in each community. Themes above RGCR = 1 are those in which women authors were overrepresented, while themes below it were overrepresented by men; themes close to RGCR = 1 are considered more gender balanced

Of the four themes overrepresented by women (i.e., with high RGCR values), three can be linked to health-related fields, namely Health, Substance-use and Posttraumatic-stress-disorder, while the fourth, E-books, is a classic LIS theme. Health had the highest productivity from 2007 to 2010 and was classified as a motor theme, underscoring its importance within the LIS field. In terms of scientific impact (Table 1), Health had an MNCS of 1.56, with around 4% of its documents among the 1% most cited papers and 5.5% in H-Classic. Meanwhile, Substance-use and Posttraumatic-stress-disorder had low productivity compared with other themes, and both were positioned as highly developed and isolated. Although both had an MNCS of around 1.5, neither reached 1% of documents among the 1% most cited papers or in H-Classic. In contrast, E-books was the theme with the lowest MNCS, and almost none of its documents were among the 1% most cited papers or in H-Classic.

Regarding the 13 themes overrepresented by men (i.e., with low RGCR values), most can be linked to STEM fields or to classical LIS themes, such as Clustering and Impact, respectively. Of these, Impact, Information-technology, Innovation, E-government and Information-retrieval had the highest productivity and were positioned in the strategic diagram as motor and/or basic and transversal themes. In terms of MNCS, all of these themes exceeded the average, with Clustering, Online-auctions, and Enterprise-systems reaching values higher than 2. Furthermore, all the themes overrepresented by men had at least 0.5% of their documents among the 1% most cited papers and in H-Classic.

Finally, it should be noted that from 2007 to 2010 the themes with the highest impact measures were those overrepresented by men. For instance, Information-technology, Impact, and Innovation were the most productive and had a high percentage of papers among the 1% most cited papers and in H-Classic, while Online-auctions and Enterprise-systems achieved the highest MNCS among the 21 identified themes.

Table 1 Performance table and impact indicators of LIS field from 2007 to 2010

From 2011 to 2014

The period from 2011 to 2014 consisted of 11,125 documents, with women accounting for 47.67% of the authors, a higher percentage than that observed from 2007 to 2010. The strategic diagram identified 20 themes (Figs. 4 and 5): six were overrepresented by women and 13 by men. Only Information-quality presented a gender rate similar to that of the global dataset (see Table 2).

Fig. 4

Strategic diagram from 2011 to 2014, where the x-axis and y-axis represent the centrality and density, respectively, of each community. Circle sizes indicate the number of documents in each community, while their position classifies them as ‘Motor’, ‘Basic and Transversal’, ‘Emerging or Declining’, or ‘Highly developed and Isolated’

Fig. 5

Distribution of themes covered from 2011 to 2014 based on their impact (MNCS, x-axis) and relative gender contribution rate (RGCR, y-axis). Circle sizes indicate the number of documents in each community. Themes above RGCR = 1 are those in which women authors were overrepresented, while themes below it were overrepresented by men; themes close to RGCR = 1 are considered more gender balanced

Table 2 Performance table and impact indicators of LIS field from 2011 to 2014

The six themes overrepresented by women are mostly related to classical LIS themes and health-related fields. Of these, Librarianship, Academic-libraries and Information-literacy had an MNCS below the field average. Librarianship stood out with an RGCR of 3.2, the highest presence of women among all themes; it also had the lowest MNCS among all identified themes and no documents among the 1% most cited papers or in H-Classic. In contrast, Caregivers, Medline and Health exceeded the average MNCS. Notably, Health was the most productive of these themes, and its documents were the most present among the 1% most cited papers and in H-Classic.

On the other hand, men were overrepresented in 13 themes, most of which can be related to STEM fields, such as Webometrics, Information-systems, Software and Nanotechnology. These themes had the lowest rates of women authors among the identified themes, and all of them had an MNCS above the average, with documents among the 1% most cited papers and in H-Classic. Impact and Innovation also stood out regarding the presence of women, with RGCR values of 0.69 and 0.65, respectively. On the one hand, Impact had the highest productivity in the period, was positioned as basic and transversal, and had at least 4% of its documents among the 1% most cited papers and in H-Classic. On the other hand, Innovation ranked third in productivity, was positioned between the basic and transversal and motor quadrants, and had the highest percentage of documents among the 1% most cited papers and in H-Classic.

From 2015 to 2018

The period from 2015 to 2018 encompassed 12,320 documents, with women accounting for 43.92% of the authors, representing an increase of 2.9% compared to the period from 2011 to 2014. Of the 21 identified themes (Figs. 6 and 7), six showed an overrepresentation of women (Table 3), 11 of men, and four had similar gender rates.

Fig. 6

Strategic diagram from 2015 to 2018, where the x-axis and y-axis represent the centrality and density, respectively, of each community. Circle sizes indicate the number of documents in each community, while their position classifies them as ‘Motor’, ‘Basic and Transversal’, ‘Emerging or Declining’, or ‘Highly developed and Isolated’

Fig. 7

Distribution of themes covered from 2015 to 2018 based on their impact (MNCS, x-axis) and relative gender contribution rate (RGCR, y-axis). Circle sizes indicate the number of documents in each community. Themes above RGCR = 1 are those in which women authors were overrepresented, while themes below it were overrepresented by men; themes close to RGCR = 1 are considered more gender balanced

Once again, the six themes overrepresented by women can be related to classical LIS themes and to human and health-related fields. In terms of MNCS, Information-literacy and HIV were below the average; Document-delivery, Physical-activities and Race were around the average; and only Health exceeded it, with an MNCS of 1.19. Health and Physical-activities were positioned as motor themes, Information-literacy as basic and transversal, and the other three as highly developed and isolated. Although Physical-activities and Race had an MNCS of around 1, less than 1% of their documents were among the 1% most cited papers or in H-Classic. Health and Information-literacy were the most productive themes after Impact, and both had the highest percentages of documents among the 1% most cited papers and in H-Classic.

Among the 11 themes overrepresented by men, most can be related to classical LIS themes or to STEM fields. Within STEM, some themes relate specifically to computer science, such as Data-curation, Natural-language-processing and Neural-networks. Except for Knowledge-organization, all of these themes had an MNCS equal to or above the average. Impact, Innovation and Information-technology were the most productive and had the highest percentages of documents among the 1% most cited papers and in H-Classic; Impact was positioned as a motor theme, and Information-technology and Innovation as basic and transversal.

Electronic-health-records, Big-data, Smartphone and Social-media were the themes with similar gender rates. Their MNCS ranged from 1.35 to 1.57, and Big-data and Social-media had the highest percentages of documents among the 1% most cited papers and in H-Classic. Big-data was positioned between the emerging and basic and transversal quadrants, reflecting the growing importance of this theme in recent years. As for Social-media, its position as basic and transversal reveals its current relevance to the LIS field.

Table 3 Performance table and impact indicators of LIS field from 2015 to 2018

From 2019 to 2022

The period from 2019 to 2022 included 12,360 documents, with women accounting for 43.72% of the authors, a relative decrease of 0.46% compared with the period from 2015 to 2018. As illustrated in Figs. 8 and 9, 22 themes were identified: six were overrepresented by women, 11 by men, and five had balanced gender ratios (Table 4).

Once again, the six themes overrepresented by women are mostly related to classical LIS themes and to human and health-related fields, such as Academic-libraries, Migrants and Physical-activity. Only Migrants had an MNCS above the average, with around 1.5% of its documents among the 1% most cited papers and 0.67% in H-Classic. As in the previous periods, Health was the second most productive theme, behind Impact. Although some of its documents were among the 1% most cited papers and in H-Classic, its MNCS was around 1, and its position shifted to between the motor and basic and transversal quadrants.

The 11 themes overrepresented by men were diverse: some are related to STEM fields, such as Blockchain and Machine-learning, and others to classical LIS themes, such as Information-and-communication-technology and Impact. All of these themes had an MNCS equal to or above the average, and at least 1% of the documents of each theme were among the 1% most cited papers. Blockchain had an MNCS of 1.87, the highest among the themes overrepresented by men, with 3% of its documents among the 1% most cited papers. The most productive themes were Impact and Innovation, with MNCS values of 0.99 and 1.53, respectively; once again, both were positioned as basic and transversal.

Fig. 8

Strategic diagram from 2019 to 2022, where the x-axis and y-axis represent the centrality and density, respectively, of each community. Circle sizes indicate the number of documents in each community, while their position classifies them as ‘Motor’, ‘Basic and Transversal’, ‘Emerging or Declining’, or ‘Highly developed and Isolated’

The five themes with similar gender ratios were more diverse: Information-retrieval, Data-quality, Organizational-communication, Transgender, and Social-media. Except for Social-media, all were positioned as highly developed and isolated in the strategic diagram. Their MNCS ranged from 0.88 to 1.44, the highest being that of Data-quality, a theme that can be related to computer science. Social-media had 5.3% and 2% of its documents among the 1% most cited papers and H-Classic, respectively, exceeding the other four themes.
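The quadrant labels used in the strategic diagrams derive from each theme's centrality (the strength of its links to other themes) and density (the strength of its internal links). A minimal sketch of that classification follows. It assumes that the quadrant boundaries are the medians of centrality and density across the themes of a period; the source does not state the threshold, so this choice, the function name, and the input values are hypothetical, and the computation of centrality and density themselves is not shown.

```python
from statistics import median


def classify_themes(themes):
    """Assign each theme to a strategic-diagram quadrant.

    `themes` maps a theme name to a (centrality, density) pair. Quadrant
    boundaries are the medians across all themes (an assumption; a mean
    threshold would be an equally plausible choice).
    """
    med_c = median(c for c, _ in themes.values())
    med_d = median(d for _, d in themes.values())
    labels = {}
    for name, (c, d) in themes.items():
        if c >= med_c and d >= med_d:
            labels[name] = "Motor"  # central and well developed
        elif c >= med_c:
            labels[name] = "Basic and Transversal"  # central, weakly developed
        elif d >= med_d:
            labels[name] = "Highly developed and Isolated"  # peripheral, specialized
        else:
            labels[name] = "Emerging and Declining"  # peripheral, weakly developed
    return labels


if __name__ == "__main__":
    # Hypothetical (centrality, density) values for four themes.
    example = {"A": (0.9, 0.9), "B": (0.9, 0.1), "C": (0.1, 0.9), "D": (0.1, 0.1)}
    print(classify_themes(example))
```

Under this scheme, a theme with high centrality whose density lies close to the median, as Health did from 2019 to 2022, sits near the boundary between the motor and basic and transversal quadrants, so a small change in the threshold can move it from one to the other.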

Fig. 9

Distribution of themes covered from 2019 to 2022 based on their impact (MNCS, x-axis) and relative gender contribution rate (RGCR, y-axis). Circle sizes indicate the number of documents in each community. The upper half displays the themes in which women authors were overrepresented, while the lower half shows themes overrepresented by men. Themes closer to the x-axis are considered more gender-balanced

Table 4 Performance table and impact indicators of LIS field from 2019 to 2022

Discussion

Previous research has highlighted disparities in the scientific impact achieved by women (Abramo et al., 2015; Chen & Seto, 2022; Joanis & Patil, 2022; Kozlowski et al., 2022; Lariviere et al., 2013; Liu et al., 2022; Mauleón & Bordons, 2006). These disparities can be attributed to several factors, including research productivity, the number of citations received, and the diversity of themes addressed. Exploring the factors potentially associated with these differences in impact is a first step towards understanding how and why these dynamics occur. To this end, we conducted a bibliometric analysis to determine whether women and men focused on different themes in the LIS field, which could be one of the aspects generating this imbalance in impact. We analyzed the evolution of the field over four periods: 2007 to 2010, 2011 to 2014, 2015 to 2018, and 2019 to 2022. Our objective was to identify the main themes within the field, assess their impact, and analyze the gender contribution rate in each theme.

Our findings indicate an increase in the presence of women over the analyzed periods, except for the period from 2019 to 2022, and show significant differences among fields and specializations in terms of gender representation and the MNCS achieved by each theme. In general, our observations are in line with previous results obtained in the LIS field for print and e-journals, which showed an increase in the presence of women over the years (Linsay, 2010; Lund & Shamsi, 2021; Parabhoi et al., 2020; Vinay, 2021). However, these previous studies did not cover the pandemic period, which makes our analysis valuable for examining LIS in this specific context. Regarding the identified themes and their characteristics, themes with higher RGCR, where women were overrepresented, tend to be classic LIS themes or related to human and health-related fields. In contrast, themes with low RGCR, where men were overrepresented, are often related to information technology and computer science, and frequently achieve higher MNCS.

A drop in the proportion of women authors was observed between 2019 and 2022, coinciding with the period marked by the COVID-19 pandemic. Earlier studies have revealed that women researchers were disproportionately affected during the pandemic, often due to increased domestic and parental care responsibilities (Lamarre et al., 2020), which may have created cumulative advantages for men researchers (Liu et al., 2022; Reichelt et al., 2021). Furthermore, previous evidence reported that health and medicine-related fields, areas predominantly occupied by women (Ceci et al., 2014; Cheryan et al., 2017), were among the most negatively affected by the pandemic (Squazzoni et al., 2021). Our results support this observation: Health was the theme most addressed by women over the analyzed periods and exhibited decreasing MNCS values and impact indicators from 2019 to 2022, suggesting its vulnerability to the pandemic’s effects.

Beyond this temporal difference, the results show a gender imbalance in the composition of themes within the LIS field. Specifically, human and health-related themes showed lower impact than STEM-related themes, suggesting that these trends may negatively affect women’s scientific performance. Furthermore, despite the increasing number of women in the field over the years, the themes they address have remained relatively constant. Our study also shows that women are predominant in themes such as Race and Librarianship. The MNCS achieved by Race is lower than the LIS average, and its position in the strategic diagram indicates that it is highly developed and isolated, that is, highly specialized. Similarly, Librarianship is the theme with the highest women contribution rate and presented the lowest MNCS among all identified themes. Both observations reveal a gender imbalance among the identified themes, suggesting that women in the LIS field address themes that are not generally associated with high impact and can be considered highly specialized.

In contrast, our findings reveal that themes related to STEM fields, such as Blockchain, Neural-network, and Machine-learning, were overrepresented by men authors. These themes presented higher MNCS values and were more present in indicators of excellence, such as the 1% most cited papers and H-Classic. This observation is consistent with previous studies showing that women authors are underrepresented in STEM fields (Thelwall & Mas-Bleda, 2020), a disparity that can be even more pronounced in specific subfields, such as mathematics, statistics, and computer science (Ceci et al., 2014; Cheryan et al., 2017). For example, Blockchain achieved the highest MNCS in the period from 2019 to 2022 and presented one of the lowest rates of women authors; despite growing awareness of gender-related issues within this subfield, it remains male-dominated (Di Vaio et al., 2023).

Furthermore, Innovation was another theme overrepresented by men authors from 2007 to 2022. This theme exhibited a high MNCS and the highest percentage of documents in indicators of excellence, and its position in the strategic diagram underscored its high relevance within LIS. Previous studies on innovation found that research conducted by diverse, gender-balanced, and larger teams tends to achieve higher citation rates (Hofstra et al., 2020; Larivière et al., 2015) and to receive more research grants (Gul et al., 2016). Nevertheless, women researchers have been shown to experience greater asymmetries relative to their co-researchers (Whittington, 2018) and to be less inclined to collaborate internationally (Aksnes et al., 2019; Kwiek & Roszka, 2021). These disparities in collaboration patterns may be related to the fact that international collaboration rates vary according to discipline and career stage. For instance, humanities-related fields tend to have lower rates of international collaboration, and senior researchers are more likely to engage in international collaboration (Aksnes et al., 2019; Kwiek & Roszka, 2021). To address the gender imbalance in the innovation landscape, some researchers advocate a gendered innovation approach, which involves an awareness of how gender roles, stereotypes, and power dynamics can influence research, design, and innovation (Jenkins et al., 2019; Schiebinger, 2021; Schiebinger & Klinge, 2018).

Considering these results, there is a need to critically examine how power dynamics related to gender roles shape the distribution of men and women across different fields. It is essential to clarify that there is no biological basis for intellectual differences between women and men (Ceci et al., 2014; De Nicola & D’Agostino, 2021). Gender imbalances can be understood from two perspectives. The individual perspective focuses on personal choices and places the burden of inequity on women, attributing the imbalance to their life decisions while dismissing differences between gender roles in society. The social and structural perspective, in contrast, considers broader factors, such as gender roles and oppression, to explain gender inequity in STEM (Miner et al., 2018). The construction of gender roles has been studied from various angles, including the creation and perpetuation of stereotypes within the education system, especially at the primary level, and through representations in social media. Textbooks and the media often portray white, cisgender men as scientists and white women as teachers (Corsbie-Massay & Wheatly, 2022; Kerkhoven et al., 2016; Mitchell & McKinnon, 2019), creating invisible barriers for women and for individuals from diverse linguistic and cultural backgrounds. The perpetuation of these stereotypes can influence women’s experience within the education system and affect their sense of belonging in specific academic and professional environments (Casad et al., 2021; Rainey et al., 2018; Spoon et al., 2023; Wright et al., 2007). This structural and systemic phenomenon offers at least a partial explanation for why women and men tend to engage in different fields, subfields, and research themes.

The implications of this social phenomenon for the scientific system are vast. Differing citation patterns and the varying impact of themes can influence individual performance. Beyond its effects on the performance of the scientific community, unequal gender representation can also significantly affect science itself. Scientific knowledge produced within this imbalanced context often reflects the gendered and racialized identities of scientists (Kozlowski et al., 2022), resulting in a biased body of knowledge that excludes certain perspectives and limits its full potential. Several studies have demonstrated that increasing diversity and collaboration among scientists can positively affect scientific impact and innovation, ultimately expanding the body of knowledge produced (Hofstra et al., 2020; Kozlowski et al., 2022; Larivière et al., 2015; Yang et al., 2022).

Although our study provides valuable insights, it has some limitations. First, our analysis relied on the Web of Science database, which has been reported to have limited coverage of human-related fields (Mongeon & Paul-Hus, 2016; Singh et al., 2021). Despite this limitation, the corpus proved sufficient for identifying research themes and revealing gender overrepresentation across them. Second, the gender identification method relies on the author’s first name and is therefore better suited to Western names. It may not reliably determine the gender of individuals from other cultural backgrounds, such as Russians, for whom gender is typically indicated by the ending of the last name. In future research, we plan to address these limitations by using other bibliographic databases and a refined methodology to increase the accuracy of gender assignment. Moreover, given the complexity of analyzing gender differences, previous studies have suggested using different metrics to avoid biases in gender analysis (Cameron et al., 2016). Self-citation is one such concern: studies have shown that it is more common among men authors, which could inflate their performance and potentially mask the actual impact of their research on the scientific community (Andersen et al., 2019; Cameron et al., 2016; King et al., 2017; Nielsen, 2016). A third limitation is that self-citations were not excluded from the calculation of the performance indicators in each community, owing to a lack of data. Excluding them would allow us to identify communities whose impact was inflated by self-citations and to examine whether these communities were predominantly represented by one gender.
These limitations underscore the need for continued research to refine methodologies and address biases, ultimately fostering a more comprehensive understanding of gender dynamics in academic research.

Conclusions

This research aimed to shed light on factors that can affect the scientific impact achieved by women. As detailed throughout this study, women and men presented different citation patterns within the LIS field, and this bias may result in part from the different themes addressed by each group. Our findings show that both women and men addressed classical LIS themes, although men were more involved in themes that achieved higher MNCS values. Meanwhile, women were overrepresented in human and health-related themes, which exhibited low MNCS values and a low percentage of documents in impact indicators, such as the 1% most cited papers and H-Classic. In contrast, men were overrepresented in STEM-related themes and were more closely linked to innovation themes. Both STEM and innovation-related themes exhibited high MNCS values and a large number of documents within the impact indicators measured. These results highlight the significance of the themes addressed by each gender for their scientific impact. They also open a debate on the societal and structural factors that may influence the selection of research fields and themes, as well as on the broader implications of gender disparity in science, emphasizing the need for more inclusive practices. Our results also reveal a slight increase in the proportion of women authors in the LIS field over the years, despite a decrease during the COVID-19 pandemic. This decline suggests that the pandemic had a disproportionate impact on women authors. To account for the gender imbalance in the dataset, we introduced the RGCR metric to measure and assess gender contributions across different themes. This study has limitations related to the bibliographic database and the gender identification method. Nevertheless, it contributes to understanding the complex relationship between research themes, gender contribution within these themes, and the scientific impact they achieved over time.