Background

Research on topics related to sustainability and health is growing at an unprecedented pace across multiple disciplines and sectors [1,2,3]. As a result, scoping reviews, systematic reviews, and bibliometric analyses have become essential tools for synthesizing evidence in this area and designing evidence-based policies [4,5,6,7]. However, traditional methods of literature review and synthesis, based on manual expert assessment, increasingly appear too time- and resource-consuming to keep pace with the expanding evidence base and keep scientists and policymakers abreast of the most recent research developments [1,2,3]. The resulting delays in literature appraisal may withhold insights that could be crucial to the advancement of the United Nations’ (UN) Sustainable Development Goals (SDGs) [8], including the goal to “ensure healthy lives and promote well-being for all at all ages” (SDG 3, “Good health and well-being”) [9]. Scientific appraisal is among the first bottlenecks in the translation of health discoveries into policy and practice. Delays in this stage can have long-term implications for global health and sustainability, especially when societies face global health emergencies such as COVID-19 [10, 11].

Indeed, the health-related, socioeconomic, and environmental impacts of the COVID-19 pandemic have reshaped priorities and underscored the primacy of health for the sustainability of contemporary human societies, spurring calls to prioritize health sustainability and shift towards a One Health perspective in science and policy [12, 13]. Since its very early stages, the COVID-19 emergency has caused a significant setback for the world’s advancement toward sustainable development, especially among the poorest countries and most vulnerable social groups [14]. As the initial emergency has subsided, the health inequalities [15,16,17], socioeconomic determinants of health [18, 19], and environmental challenges [20, 21] exposed by the pandemic have led to the reevaluation of the health-focused SDG 3 as a central goal capable of guiding holistic and coherent policies for sustainability and prompting synergistic actions in favor of multiple other sustainability goals [22]. The kind of sustainability synergies that SDG 3 can promote may occur through second-order effects – e.g., when improving the health of the working population (SDG 3) also improves the state of the economy (SDG 8) – or by requiring explicit advancement of another goal – e.g., reducing the spread of waterborne diseases (SDG 3) by improving sanitation (SDG 6) [23].

The COVID-19 pandemic has also highlighted the complex networks of connections and interdependencies between the 17 SDGs. Development goals and their targets form an interconnected and dynamic system, with numerous and varying synergies and trade-offs among them [24,25,26]. The same scientific insight or policy action may co-advance multiple SDGs simultaneously or contribute to one while hindering others. This creates a need to map SDG interdependencies over time and to ensure policy coherence in sustainability – the harmonization of policies to simultaneously address multiple SDGs, optimize resources for SDG co-advancement, and prevent harms to public health from underinformed policymaking [27,28,29].

Against this backdrop, this paper analyzes the body of global scientific research that has addressed the health-related SDG 3 in the past twenty years, and its potential to promote synergistic progress, in science and policy, on multiple other SDGs simultaneously. Specifically, we seek to answer the following research questions: (1) How is SDG 3 thematically interconnected with all other SDGs in global scientific research, and how have these interconnections changed since the turn of the millennium? (2) What specific themes and topics have emerged in research conducted at the intersection between SDG 3 and other SDGs in the last twenty years, and how can we identify the topics that are most useful to fuel progress towards multiple SDGs simultaneously? With the first question, we explore the extent to which global science has addressed topics that are relevant to both SDG 3 and other SDGs, identifying which other SDGs are consistently overlapping with research on global health and health sustainability, and how this has changed over the past twenty years. Considering the second question, we map the themes and topics that have emerged in global science around SDG 3 and other SDGs, and the network of substantive, semantic interrelationships underpinning them.

Addressing these two questions, this study makes two important contributions to research on global health, public health, sustainability, and globalization. First, we advance a novel approach to the study of interdependencies between SDGs, in which large amounts of research outputs are analyzed with Natural Language Processing techniques and network science methods to synthesize knowledge, map links among SDGs, and describe entry points for science and policy to co-advance multiple SDGs simultaneously. We apply this approach, specifically, to the interdependencies between the health-related SDG 3 and all other SDGs. Second, we identify ‘zipper themes’ in science and policy around health and sustainability. Zipper themes are topics, scientific questions or policy issues which can strengthen scientific research and suggest synergistic policies for co-advancing SDG 3 together with other sustainability goals. Each SDG is itself a broad set of themes and objectives formulated to facilitate communication and collaboration between scientists and policymakers from a wide variety of fields, policy areas, and agendas – from energy and economy to health and biodiversity. We identify more granular zipper themes – within and across SDGs – that will stimulate innovative research ideas around the health-sustainability nexus, promote a conceptual and terminological convergence in sustainability research, and guide the formulation of coherent and actionable policies to co-advance goals in multiple sustainability areas.

COVID-19, health sustainability, and SDG interdependencies

Originally conceived at the Rio+20 conference as part of the UN’s 2030 Agenda, the SDGs are a set of broad aims, specific targets, and related indicators concerning social, political, economic, and environmental sustainability and aiming to promote global cooperation for sustainable societies [8]. The third SDG, the main focus of this paper, is labelled as “Good health and well-being” and aims to “Ensure healthy lives and promote well-being for all at all ages” by 2030: its targets include reducing the global maternal mortality ratio; ending the epidemics of AIDS, tuberculosis and malaria; achieving universal health coverage; substantially reducing the prevalence of deaths and illnesses from pollution and chemical contamination; and strengthening the capacity of developing countries for management of national and global health risks.

SDG 3 is clearly linked to other SDGs, in the sense that – given the crucial health impacts of certain social, economic, and environmental factors – progress towards SDG 3 goes hand in hand with progress towards other goals, such as SDG 2 (Zero hunger: End hunger, achieve food security and improved nutrition and promote sustainable agriculture) or SDG 13 (Climate action: Take urgent action to combat climate change and its impacts). Indeed, the SDG targets and indicators are inherently interconnected by relationships of synergy, when certain actions can contribute to multiple goals at the same time; and trade-offs, when actions advancing one goal can inhibit or harm progress on another [25, 30]. This network of interdependencies is not static over time and space, but can change depending on geography (e.g., in high-income vis-à-vis low-income countries) and time [26, 29]. Mapping and evaluating SDG interdependencies is essential to ensure coordination and coherence of policies towards sustainable development at the global, national, and regional levels [28]. Previous research has quantified these SDG interconnections by analyzing covariance among SDG or target indicators over time [26, 31, 32]; or by surveying experts to qualitatively distill synergies and trade-offs from previous scientific literature [33,34,35]. While these efforts have been insightful, their approaches are constrained by cost and at times limited availability of indicator data, or by the difficulty of assessing exponentially growing volumes of literature in reasonable time via qualitative, manual expert appraisal – especially when SDG synergies and trade-offs must be evaluated at different geographic locations or scales, or during crises requiring rapid intervention.
Overcoming these constraints is important for the effective and time-sensitive translation of scientific discoveries into practice, interventions and policies for health and sustainability, particularly in times of environmental or health crises [11]. In this light, new computational methods, leveraging recent advances in machine learning and natural language processing, can complement existing ones and address some of their drawbacks [36].

In the last three years, the complex network of SDG synergies and trade-offs has been brought into sharp focus by COVID-19. The first-order effects of the pandemic on public health revealed that many countries had significant room for improvement with respect to SDG 3, especially in terms of resilience of their health systems to crises [14, 37]. With its second-order effects in the social, economic, and environmental domains, the COVID-19 crisis negatively impacted indicators of poverty (SDG 1), education (SDG 4), and unemployment (SDG 8), while it brought about a short-term reduction in greenhouse gas emissions, positively impacting climate action (SDG 13) [37]. Further exposing interconnections between different sustainability goals, COVID-19 has also emphasized the urgency of adopting a One Health perspective on issues of health and sustainability [13]. This perspective posits that the health of humans, domestic and wild animals, plants, and the environment are strictly interconnected by co-benefits and trade-offs [12, 13, 38]. Exemplifying the link between human and animal health, COVID-19 itself has been classified as a zoonotic disease due to the genetic similarities between SARS-CoV-2 and horseshoe bat coronaviruses, while natural resource consumption (relevant to SDG 12) and climate change (the focus of SDG 13) have been indicated as causes of increased rates of interaction and potential pathogen transmission between species [12]. At the same time, recent research has highlighted how the advancement of SDGs 12 (Responsible consumption and production) and 13 (Climate action) can directly influence SDG 3, both positively via improvements in air quality and other environmental determinants of health, and negatively via socio-economic trade-offs that reduce pollution and consumption, such as unemployment caused by the shuttering of coal-fired plants in nations without universal health care [23]. 
Together with a commitment to ‘health in all policies’ by the United States Centers for Disease Control and Prevention [39], some of the most vocal proponents of a One Health perspective have recently been the G7, G20, Global Health Summit, and World Health Organization [40,41,42].

Finally, the pandemic has also forced rapid innovations in data analysis [43]. While these innovations have generally been focused on the monitoring of health indicators and epidemic modeling, Natural Language Processing (NLP) has also entered the foreground with hundreds of studies seeking to analyze the fast-evolving scientific literature on COVID-19 for information retrieval and summarization, literature-based discovery, question answering, and topic modelling [36]. NLP is a subfield of computational linguistics working towards the goal of developing and ‘training’ machine learning algorithms that can ‘understand’ and unpack the nuances of human speech and written text, able to retrieve syntactic patterns and dependencies in human writing, distill key words, phrases, entities and topics from large amounts of text, and quantify similarities between documents. Sustainability scholars have begun fine-tuning these algorithms to summarize extensive and evolving bodies of sustainability-related scholarship [3, 44]. For example, addressing SDG 2 (Zero hunger), Porciello et al. (2020) recommended the wider application of NLP for mapping similarities between texts, entity-recognition, and coreference resolution, with the goal of accelerating the synthesis of large quantities of evidence in sustainability, and thereby efficiently discovering effective policies and practices for sustainable development [44]. Focusing on the climate-related SDG 13, Callaghan et al. (2021) fine-tuned the DistilBERT language model to categorize and extract specific information from 102,160 climate impact studies. They used the results to map field-wide trends in anthropogenic climate change (1951–2018) [3]. In these studies – as well as in other works of automated, NLP-based synthesis of evidence which became popular during the pandemic [36] – the goal is to examine specific bodies of literature, providing a highly detailed treatment of a single SDG domain or topic. 
In contrast, we broaden the focus to relationships among multiple SDG domains and propose a method which combines NLP and network science techniques to illuminate interdependencies between SDGs and generate insights co-advancing multiple sustainability goals together.

Methods

The methods of this study consist of two steps. First, using results from an existing machine learning method to classify scientific publications by their SDG relevance [45], we determine the frequency with which scientific research has addressed the health-related SDG 3 and each other SDG over the past twenty years, identifying SDGs that are well or poorly integrated with SDG 3 in global science. Second, we implement a method, based on topic modeling and network science, to zoom in on the actual contents of scientific research at the intersection between SDG 3 and other SDGs, to map and describe substantive themes of convergence and overlap. All analyses and visualizations were performed using the top2vec library in the Python general-purpose programming language, and the igraph and CentiServer packages in the R statistical computing software within the Visual Studio Code IDE [46,47,48,49,50].

Data

We analyze all peer-reviewed scientific articles, published between 2001 and 2020, which are indexed as relevant to SDG 3 and one or more other SDGs in Dimensions, the most exhaustive database for scientific publications [51, 52]. Data about titles and abstracts of these articles were collected using custom-built functions that create an interface between the R statistical computing environment and the Dimensions API. Each article’s relevance to each of the 17 SDGs is determined by a classification algorithm developed by Dimensions and returning a binary index of relevance to each SDG: i.e., an index classifying an article as either relevant or not relevant to each of the 17 SDGs [45]. Created by Digital Science et al. (2020), the classification algorithm was trained on a data set consisting of articles that are certainly relevant to each SDG. These articles were found with a specific keyword search query, for each SDG, of works published since 2000 (when the Millennium Development Goals, the SDGs’ precursors, were established). The keyword search queries were manually curated and informed by the UN’s SDG definitions, targets, and indicators, aiming to minimize false positive rates. Supervised NLP algorithms were then trained to make binary classification decisions (relevant/not relevant) based on the training data corresponding to each of the 17 SDGs. In the results, each article may be classified as relevant to one, multiple, or none of the 17 SDGs. The articles selected for our data were classified as relevant to SDG 3 and at least one other SDG. These are 27,928 articles, contributed by 75,665 authors. Our text data is limited to their abstracts, which overall contain 3,918,143 tokens (units of speech, e.g., words) and 64,575 unique tokens (of which 36,550 appear more than once in the corpus). Procedures for preprocessing this corpus of text are detailed in the supplementary materials.

Topics, topic networks and communities

To distill topics in the article abstracts we employ top2vec [49], a recent unsupervised machine learning approach to discovering topics in large text corpora via word and document embeddings, implemented in the top2vec Python library. This method combines the word2vec and doc2vec embedding models [53, 54], Uniform Manifold Approximation and Projection for dimension reduction (UMAP) [55], and hierarchical density-based spatial clustering of applications with noise (HDBSCAN) [56]. The word2vec model uses shallow neural networks to learn numeric representations (i.e., vectors or embeddings) of words based on collocation with other words in text. Relationships between these numeric representations approximate human understanding of semantic relationships between words: semantically related words like virus, vaccine, and epidemic appear in similar contexts and are therefore represented by similar vectors, whereas semantically dissimilar words like virus and volcano are represented by dissimilar vectors [53]. The doc2vec model inherits word2vec’s understanding of semantics and adds to it by concurrently learning numeric representations of documents (here, article abstracts), in addition to words [54]. The resulting embeddings, which provide a numeric representation of each word and abstract in the corpus, are numeric vectors of 300 dimensions, that is, each including 300 numbers. The third component of the method, UMAP, projects these high-dimensional embeddings onto fewer dimensions (fewer numbers), that is, into a low-dimensional space. Finally, HDBSCAN is used to identify dense clusters of documents that have similar embeddings in this low-dimensional space [49].
In the results, a cluster of documents (i.e., of scientific articles) represents a distinct substantive topic; and the centroid of the document embeddings in that cluster (a sort of “average” embedding of the cluster) provides a numeric representation of that entire topic (i.e., the topic embedding). Applied to our corpus of articles, the top2vec method detects a total of 197 topics (clusters). These are summarized in Table S2, including the number of articles in each topic (from 26 articles in the smallest topic to 1,277 in the largest) and a description of topic contents based on the most salient words in the topic (words with embeddings that are closest to the cluster centroid).
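As an illustration of the last step only, the following minimal Python sketch computes a topic embedding as the centroid of the document embeddings in one cluster and ranks words by closeness to it. It is not the top2vec implementation: the function names, the three-dimensional toy vectors, and the use of cosine similarity to rank words are assumptions made for the example.

```python
import math


def centroid(vectors):
    """Arithmetic mean of equal-length vectors (the 'average' embedding)."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def salient_words(topic_vec, word_vecs, k=3):
    """Return the k words whose embeddings are closest to the topic vector."""
    ranked = sorted(
        word_vecs,
        key=lambda w: cosine(topic_vec, word_vecs[w]),
        reverse=True,
    )
    return ranked[:k]


# Toy data (hypothetical): two abstracts in one cluster, three word vectors.
doc_vecs = [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]]
word_vecs = {
    "vaccine": [0.9, 0.1, 0.1],
    "volcano": [0.0, 1.0, 0.0],
    "epidemic": [0.7, 0.3, 0.1],
}

topic_vec = centroid(doc_vecs)
top_words = salient_words(topic_vec, word_vecs, k=2)
print(top_words)
```

In this toy case the two health-related words rank above the unrelated one, mirroring how the most salient words are used to describe each topic.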

We quantify semantic closeness between two topics by measuring similarity between their respective topic embeddings via cosine similarity, a popular measure of similarity between two numeric vectors. The result is a network uncovering the structure of semantic relationships between topics (see Fig. 3): each network node is a topic, and the weighted link between two nodes represents the semantic closeness between two topics (i.e., the cosine similarity between their embeddings). Considering breaks in the distribution of cosine similarity scores, a global edge filter is set on this weighted network to only retain links between two topics when their cosine similarity is higher than the 99th percentile (cosine similarity ≥ 0.279). The result is a network in which two topics are linked if they are semantically related or not linked if they are not related, and the precise cosine similarity scores between two topics can, if necessary, be disregarded. Like many social and semantic networks [57], this topic network has a “community structure”: it consists of different communities, that is, distinct groups of topics (nodes) that are more closely connected to each other (i.e., more semantically similar to each other) and more distant from all other nodes. We identify these communities using the walktrap community detection algorithm (from the R igraph package, setting the number of algorithm steps to 5) [48, 58]. This results in 19 communities – each gathering between 2 and 30 topics (Tables S1 and S2) – which reveal broad yet coherent thematic clusters in scientific research around SDG 3 and other SDGs in 2001–2020.
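The construction of the filtered topic network can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, the percentile is computed with the nearest-rank rule (the text does not state which convention was used), and edges at or above the cutoff are kept. Community detection with walktrap is not reproduced here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def percentile_nearest_rank(values, p):
    """p-th percentile by the nearest-rank rule (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def build_topic_network(topic_vecs, p=99):
    """Return (edges, isolates, cutoff) after a global percentile filter."""
    sims = {
        (a, b): cosine(topic_vecs[a], topic_vecs[b])
        for a, b in combinations(sorted(topic_vecs), 2)
    }
    cutoff = percentile_nearest_rank(list(sims.values()), p)
    # Keep only topic pairs whose similarity reaches the global cutoff.
    edges = {pair: s for pair, s in sims.items() if s >= cutoff}
    linked = {t for pair in edges for t in pair}
    # Topics left without any edge after filtering are isolates.
    isolates = sorted(set(topic_vecs) - linked)
    return edges, isolates, cutoff


# Toy topic embeddings (hypothetical). A low percentile is used because the
# toy network has only six topic pairs.
topic_vecs = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.0, 1.0],
}
edges, isolates, cutoff = build_topic_network(topic_vecs, p=80)
print(sorted(edges), isolates)
```

With a corpus of 197 topics there are 19,306 topic pairs, so a 99th-percentile filter of this kind retains roughly the top 1% of pairwise similarities.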

We use UMAP again (from the umap-learn Python library [55]) to visualize the distribution of SDGs across articles on a 2-dimensional plane – a ‘topic map’ of research around health and sustainability (Fig. 2). Each point in this map is an article, and two articles are close in space when they are topically, semantically, or lexically similar (and distant when they are dissimilar). Points (articles) are colored in accordance with the second SDG they were assigned by the Dimensions classification algorithm, in addition to SDG 3-Health (light grey points are articles assigned more than one other SDG). The approximate regions corresponding to each SDG in this topic map, as well as the overlaps between these regions, are visualized in Figure S3.

Identifying zipper themes

The network representation of the semantic relationships among topics allows the identification of zipper themes for co-advancing health and other SDGs in science and policy. We propose three methods to distill zipper themes from the topic network: network centrality, cross-community connections, and isolates. First, network centrality measures (from the CentiServer R package [50]) detect topics that occupy more central positions in the network’s structure of connections. These topics can be regarded as zipper themes: their high centrality indicates that they are relevant or connected to many other topics at the interface between health and sustainability [59,60,61]. Specifically, we use betweenness and harmonic closeness measures of centrality, as well as the Density of the Maximum Neighborhood Component (DMNC) [62,63,64]. Detailed descriptions of these centrality measures are provided in the supplementary materials. Second, cross-community connections are links between topics (nodes) that belong to separate communities in the network. Recall that 19 distinct communities of topics are discerned in the network. Connections between nodes in different communities reflect semantic relationships between topics in areas that are otherwise distant in sustainability and health research. Thus, cross-community connections point to existing zipper themes (links of semantic closeness that already exist in the current network) that bridge different research areas in the field. Finally, 12 of the topics discovered by top2vec were separated from all other nodes after imposing the global filter, becoming isolates. The articles forming these isolated topics are substantively or discursively distinct from the rest of the corpus even though (by construction of the data) they are still classified as relevant to SDG 3 and at least one other SDG.
Hence, these isolates indicate gaps in the network structure – and point to corresponding gaps in science and policy on health and sustainability: topics that are semantically distant from all others in the current network, but that further research or interventions could bridge with the “mainstream” of the health and sustainability knowledge network.
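To make the three criteria concrete, the sketch below computes harmonic closeness, lists cross-community edges, and finds isolates on a small unweighted toy network. It is a simplified illustration under stated assumptions, not the CentiServer or igraph implementation: the function names and toy data are hypothetical, harmonic closeness is left unnormalized, and betweenness and DMNC are omitted.

```python
from collections import deque


def build_adjacency(nodes, edges):
    """Undirected adjacency sets; nodes without edges map to an empty set."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def harmonic_closeness(adj):
    """Sum of 1/d(u, v) over reachable nodes v != u (unnormalized)."""
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search for shortest path lengths
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[src] = sum(1 / d for node, d in dist.items() if node != src)
    return scores


def cross_community_edges(edges, community):
    """Edges whose endpoints belong to different communities."""
    return [(a, b) for a, b in edges if community[a] != community[b]]


def find_isolates(adj):
    """Nodes with no neighbors after edge filtering."""
    return sorted(n for n, nbrs in adj.items() if not nbrs)


# Toy network (hypothetical): a path 1-2-3-4 plus an isolated topic 5.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4)]
community = {1: "x", 2: "x", 3: "y", 4: "y", 5: "z"}

adj = build_adjacency(nodes, edges)
scores = harmonic_closeness(adj)
bridges = cross_community_edges(edges, community)
isolates = find_isolates(adj)
print(scores, bridges, isolates)
```

In this toy network the two middle nodes are the most central, the edge between them is the only cross-community link, and node 5 is the only isolate.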

Results

Mapping interconnections between SDG 3 and other SDGs in global research

Figure 1 visualizes the overlap between SDG 3-Health and all other SDGs in relevant scientific literature between 2001 and 2020. Scientific research on SDG 3 most frequently intersects with literature on SDGs 16-Peace (Npub = 5,167), 11-Settlements (Npub = 4,628), 10-Inequality (Npub = 3,610), 2-Hunger (Npub = 3,243), and 4-Education (Npub = 2,764). In contrast, SDGs with the smallest intersecting literature with SDG 3 (fewer than 300 articles across all years) are 15-Terrestrial, 12-Consumption, 9-Industry, 14-Aquatic, and 17-Partnerships. Counts of publications relevant to SDG 3 and another goal have increased markedly in the last 5 to 10 years (+139% in 2015–2020, +74% in 2010–2015), most notably for 11-Settlements (+176% publications shared with SDG 3 in 2015–2020, N2020 = 1,791), 4-Education (+274% publications in the same period, N2020 = 1,314), and 16-Peace (+101% publications, N2020 = 1,268). The intersection between SDG 3 and other SDGs in global science has also grown as a proportion of the overall size of literature addressing any pair of SDGs (Fig. 1.B), particularly in the last 5 years, with SDGs 11-Settlements, 4-Education and 2-Hunger showing the highest standardized overlap (Jaccard index) with 3-Health.
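For reference, the two overlap measures reported here, the raw count of shared publications and the Jaccard index, can be computed as in the following minimal Python sketch. The publication identifiers are hypothetical toy data, and the Jaccard index is taken in its standard form, the size of the intersection divided by the size of the union of the two publication sets.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy publication IDs (hypothetical) classified as relevant to each SDG.
sdg3_pubs = {"p1", "p2", "p3", "p4"}
sdg11_pubs = {"p3", "p4", "p5"}

shared_count = len(sdg3_pubs & sdg11_pubs)   # raw overlap, as in panel A
overlap_index = jaccard(sdg3_pubs, sdg11_pubs)  # standardized, as in panel B
print(shared_count, overlap_index)
```

Because the index divides by the union, it adjusts the raw overlap for the overall volume of literature on the two goals, which is why a pair can rank differently on the two measures.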

Fig. 1

Intersection of scientific research relevant to SDG 3-Health with research relevant to each other SDG. (A) Raw count of scientific articles classified as relevant to SDG 3 and another SDG. (B) Jaccard index of the set similarity between articles relevant to SDG 3 and those relevant to each other SDG. All articles in the corpus are classified by dimensions.ai as relevant to SDG 3 and at least one other SDG.

Figure 2 presents a topic map of scientific research relevant to SDG 3-Health and at least one other SDG, based on the same corpus of articles as in Fig. 1. A strong divide is observed between SDGs in the socioeconomic domain (top-left region of the map: 1-Poverty, 2-Hunger, 4-Education, 5-Gender, 8-Economy, 10-Inequality, 16-Peace, 17-Partnerships) and environmental SDGs (bottom-right region: 7-Energy, 11-Settlements, 13-Climate, 14-Aquatic, 15-Terrestrial). However, the map also provides clues about important points of integration between the socioeconomic and environmental goals: literature relevant to SDG 6-Sanitation lies at the intersection between 2-Hunger and {14-Aquatic, 15-Terrestrial} in the map; articles on 9-Industry and 12-Consumption are situated at the intersection of those related to {8-Economy, 17-Partnerships} and {7-Energy, 11-Settlements}; the last two are also close to 13-Climate, 14-Aquatic, and 15-Terrestrial. These proximities in the map – and the adjacency between articles attached to SDGs in the same broad area (e.g., SDGs 5 and 10 on social inequalities, or the environmental SDGs 13, 14 and 15) – suggest that the method we used to distill topics and topical proximities in SDG-related science captures meaningful themes and relationships in the text corpus. This allows us, in the next section, to narrow the focus to substantive themes of overlap between research on different SDGs.

Fig. 2

Topic map of all peer-reviewed publications classified by dimensions.ai as relevant to SDG 3-Health and at least one other SDG. Each point is a publication; proximity between two points indicates topical or semantic proximity between the two corresponding publications in a UMAP 2D projection of document embeddings (see Methods). Point colors indicate the secondary SDGs (other than 3-Health) to which each article is relevant. Publications classified as relevant to three or more SDGs are in light grey.

Topics and zipper themes at the health-sustainability nexus

Figure 3 shows the network of semantic connections between topics in scientific research around SDG 3-Health and other SDGs. Highly central topics in the network are semantically connected and potentially relevant to a great number of other topics, are part of large and dense thematic regions, or bridge separate areas in research about SDG 3-Health and other sustainability goals. A consistent set of highly central topics emerges across all centrality measures, producing a stable ‘core’ of zipper themes according to this criterion (e.g., topics 1, 2, 8, 9, 20 in Figure S4 and Table S2). Substantial correlations are observed between network centrality of topics and counts of publications in each topic (from 0.16 for DMNC to 0.39 for closeness and 0.58 for betweenness centrality), suggesting that central themes are also relatively well-established in science, catalyzing larger amounts of research. Topics 2 (health-relevant targets outlined by the Millennium and Sustainable Development Goals) and 9 (reproductive health and healthcare, access to skilled birth attendants, maternal mortality rates in developing countries) are among the most central in this analysis. Notably, the publications in topic 9 (e.g., Ali & Chauhan 2020; Pulok et al. 2016; see also Table 1) [65, 66] tend to explicitly discuss reproductive health in the context of the Millennium and Sustainable Development Goals, a topic that is directly relevant to the first target of SDG 3. Additionally, central topics 1, 8, and 20 all consist of research relevant to specific targets and indicators related to health and different SDGs: public transportation and pollution in developing urban environments for topic 1 (e.g., Di Mascio et al. 2018; Jacyna et al. 2017) [67, 68]; health inequalities in developed countries for topic 8 (e.g., Leclerc et al. 2006; Stringhini et al. 2015) [69, 70]; malnutrition and child mortality in developing countries for topic 20 (e.g., Roy et al. 2018; Tariku et al. 2016) [71, 72]. These substantive contents validate our approach to literature synthesis and identification of zipper themes; they also reinforce the notion of the SDGs as a set of unifying concepts and targets within health-sustainability research.

Fig. 3

Semantic network of topics in science related to SDG 3-Health and at least one other SDG. Nodes are top2vec topics, drawn as pie charts whose colors represent the SDGs to which publications in each node (topic) are relevant, according to dimensions.ai classification. Node size represents harmonic closeness. Weighted edges are cosine similarities between the embeddings of two topics (when similarity > 99th percentile). Light-blue polygons are network communities: within-community edges are black, between-community edges are red. Network isolates are removed.

Table 1 List of example articles discussed in reference to topics of interest. Articles are categorized by method of identification (network centrality, cross-community pairing, or network isolation). Constituent topics are indicated and described

DMNC performs well as a tool for identifying topics at the intersection of wider themes in the corpus (see Table 1 and Figure S5.B). Topic 12 – health and safety in the workplace and occupational injuries (e.g., Yilmaz et al. 2016; García-Mainar & Montuenga-Gómez 2009) [73, 74] – scores the highest on this metric, closely followed by seven topics covering a range of issues, from barriers to healthcare (topic 10) to the health impacts of pollution and climate change (topic 114). Topic 12 is at the intersection of three network communities: itself assigned to a community on the relationship between technology and health (community 3 in Table S1), it is immediately adjacent to communities broadly addressing issues of health emergency response, and health policies, politics, and ethics (communities 11 and 8, respectively). For example, topic 12 is connected to topic 25 (natural disaster injury and response) in community 11, and to topics 102 (micro-entrepreneurship and small-to-medium business development) and 109 (use of communications technology to reduce health care inequalities) in community 8.

Cross-community links point to several strong, actionable areas of integration between disparate research on health and sustainability. These areas include both cases in which the same overall issue is treated from two different perspectives, and cases in which two entirely different issues are linked by a third common topic. For example, the connected topics 9 (causes of inequality in maternal healthcare utilization) and 68 (unequal childhood vaccination coverage, e.g., Hajizadeh 2018; Bobo & Hayen 2020) [75, 76] are both part of a broader literature about maternal and infant healthcare (Table 1). However, topic 9 is assigned to a network community which focuses primarily on policies, politics, and ethics, whereas topic 68 is assigned to a community focused primarily on infectious diseases. Another set of cross-community connections is observed between topics on air pollution (topics 11, 25, 36) and non-communicable diseases among infants and pregnant women (topics 48, 49), adults (topics 83, 178), and the elderly (topic 71). The cross-community link criterion also points to an interesting area of overlap between SDGs 13-Climate, 14-Aquatic, and 3-Health: the link between topics 42 (schistosomiasis treatment, a topic in community 9: infectious diseases) and 158 (schistosomiasis infection due to changes in freshwater snail habitats caused by climate change and infrastructure projects, a topic in community 10: wildlife ecology). One body of research investigates prevalence, impact, and treatment of schistosomiasis (e.g., Poggensee et al. 2005; Siza et al. 2015; M’Bra et al. 2018) [77,78,79], a neglected tropical disease caused by Schistosoma worms; while the other examines its zoonotic etiology (e.g., Pedersen et al. 2014a; 2014b) [80, 81]. Together, these separate but related research agendas offer an ideal example of a zipper theme relevant to the One Health perspective.

The strongest cross-community edges (those with cosine similarities more than 2 standard deviations above the mean) are between topics 42–158 (cos = 0.59), 6–27 (cos = 0.56), and 11–58 (cos = 0.53). Topics 42 and 158 were described above. Topics 6 (in community 10: wildlife ecology) and 27 (in community 12: environmental health) both address direct impacts of climate change on health: the former focuses on how rising temperatures result in the spread of arboviruses into temperate environments, and the latter on heat waves and heat-related deaths in traditionally temperate climates. Finally, topics 11 (in community 12: environmental health) and 58 (in community 6: non-communicable diseases) both encompass publications on the effects of ambient air pollution, with focuses on respiratory and cardiovascular health, and reproductive health, respectively. These topics are part of a broader area of research into air pollution and the etiology of non-communicable diseases.
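The selection rule described above can be illustrated with a minimal sketch. It is not the code used in this study, and it rests on assumptions that this section does not specify: the mean and standard deviation are computed over cross-community edges only, and the population standard deviation is used. The function name and the filler topics ("x0", "y0", and so on) are hypothetical; only the 42–158 edge and its similarity of 0.59 come from the text.

```python
from statistics import mean, pstdev


def strong_cross_community_edges(edges, community, n_sd=2.0):
    """Return cross-community edges with similarity above mean + n_sd * SD.

    edges: dict mapping (topic_a, topic_b) -> cosine similarity
    community: dict mapping topic -> community id
    Assumption: mean and SD are taken over cross-community edges only.
    """
    # Keep only edges whose endpoints sit in different communities.
    cross = {
        pair: w for pair, w in edges.items()
        if community[pair[0]] != community[pair[1]]
    }
    if len(cross) < 2:
        return []
    weights = list(cross.values())
    threshold = mean(weights) + n_sd * pstdev(weights)
    strong = [(a, b, w) for (a, b), w in cross.items() if w > threshold]
    # Strongest edges first.
    return sorted(strong, key=lambda e: e[2], reverse=True)


# Toy network: one edge from the text plus hypothetical filler edges.
edges = {(42, 158): 0.59}
community = {42: 9, 158: 10}
for i in range(10):
    a, b = f"x{i}", f"y{i}"
    edges[(a, b)] = 0.10            # weak cross-community edge (made up)
    community[a], community[b] = 1, 2
edges[("x0", "x1")] = 0.90          # within-community edge, ignored

result = strong_cross_community_edges(edges, community)
print(result)
```

In this toy network only the 42–158 edge clears the threshold. The strong within-community edge is excluded by construction, which reflects the focus here on links that span communities.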

The third proposed method to identify zipper themes on health and sustainability – examination of network isolates – highlights a disparate set of topics, such as the impact of HIV/AIDS on patients’ children and orphans (topic 173); CRISPR-Cas9 genome editing, the development of gene-edited pathogen-resistant crops, and CRISPR-Cas9 vaccines (topic 90); and the shipping industry and maritime logistics, including occupational safety, industry-related pollution (e.g., due to oil spills), ecotoxicology, and damage to marine organisms (topic 79). These disconnected topics point to research directions with the potential to span significant gaps in existing literature on health and sustainability.
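As a minimal illustration of the isolate criterion, the sketch below lists topics that have no edge to any other topic. It assumes the topic network is available as a list of topic identifiers and a list of retained edges. How edges are retained (for example, by thresholding cosine similarity) is not restated in this section and is left outside the sketch. The function name is hypothetical.

```python
def network_isolates(topics, edges):
    """Return the topics that are not connected to any other topic.

    topics: iterable of topic identifiers
    edges: iterable of (topic_a, topic_b) pairs retained in the network
    """
    connected = set()
    for a, b in edges:
        if a != b:  # a self-loop does not connect a topic to another topic
            connected.update((a, b))
    return sorted(t for t in topics if t not in connected)


# Toy example using topic numbers mentioned in the text:
# 42 and 158 are linked, while 79, 90, and 173 have no edges.
isolates = network_isolates([42, 158, 79, 90, 173], [(42, 158)])
print(isolates)
```

Isolates identified this way are only candidates: a topic can be disconnected because it is peripheral to the field, so each one still needs substantive review before being treated as a potential zipper theme.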

For example, in topic 173, advances in research on the children of HIV/AIDS patients and victims (e.g., Short & Goldberg 2015; Wete et al. 2019) [82, 83] – including the immediate medical consequences of parents’ illness for children’s health (e.g., HIV/AIDS infection) and its effects on long-term trajectories of child development and mental health – could have important impacts on a wide range of scientific topics at the intersection between SDG 3 and other sustainability issues. This theme connects at least three topical communities emerging in our synthesis of literature: non-communicable diseases and conditions (community 6); health policies, politics, and ethics (community 8); and infectious diseases (community 9). In topic 79, hazards, accidents, and occupational health in oil tankers and the maritime transportation industry have impacts both on the health of workers in that sector (for example, ship crews) and on local environments (e.g., Eliopoulou et al. 2012; Uğurlu et al. 2015) [84, 85]. Accidents and disasters in this sector can also have long-term effects on the health of residents in surrounding areas, underscoring significant overlap with research on non-communicable diseases and conditions (i.e., community 6) [86]. This theme is potentially relevant to three topical communities in our literature synthesis: technology, infrastructure, and workplace safety (community 3); wildlife ecology and zoonoses (community 10); and environmental health (community 12).

Discussion

This study used Natural Language Processing and network science methods to synthesize the entire corpus of scientific abstracts published in 2001–2020 on topics related to SDG 3 (Good health and well-being) and one or more other SDGs. This synthesis was motivated by two main research questions, corresponding to two aims. First, it sought to describe the degree and nature of integration between research relevant to SDG 3 and to other SDGs in global scientific literature over time. Second, it aimed to identify sets of topics, scientific questions, or policy issues – dubbed here ‘zipper themes’ – which have the potential to stimulate convergence and synergy between research and policy efforts to co-advance health (SDG 3) and other sustainability goals.

Addressing the first question and aim, we observed increasing integration between SDG 3 and most of the other goals. This growing body of inter-SDG literature underscores the need for literature reviews that focus on important points of intersection and convergence between goals. While studies on the synergies and trade-offs between all SDG targets and indicators are essential [25, 26, 34], these broader efforts often fail to identify existing or emerging research topics with the potential for a translational impact on the 2030 agenda. The methods presented here can help direct and supplement studies that aim to scrutinize overlaps between specific goals, targets, or indicators in an effort to suggest synergistic policies and practices (e.g., De Neve & Sachs 2020) [23].

Scientific integration between SDG 3 and other goals is especially developed with SDG 16-Peace (on topics such as correctional population health, bioethics, and patenting and trade of health technology: topics 7, 19, and 21 in Table S2, respectively); SDG 11-Settlements (for example, on sustainable inner city transport and air pollution: topics 1 and 25); SDG 10-Inequality (on socioeconomic health disparities and ante/postnatal care: topics 8 and 9); SDG 2-Hunger (on food insecurity in HIV-positive populations and HIV-Exposed Uninfected infants, growth disorders and stunting resulting from poor nutrition, sugar intake and obesity: topics 17, 20, 22); and SDG 4-Education (on postgraduate education in health and healthcare: topic 3). Research in these areas is producing scientific knowledge that can help co-advance multiple sustainability goals. The interconnections between SDGs 3, 11, and 16 found in this study replicate results from a previous study that observed positive covariance in their respective SDG indicators [23]. However, the same study notes that the relationship between SDG 3 and 10 is negligible [23], whereas our analysis highlights the salience of healthcare inequalities at this intersection [15, 19, 65]. Indeed, the NLP-powered analysis of unstructured text data offers opportunities for supplementary insights beyond the scope of existing indicators.

On the other hand, analogous to the divide between socioeconomic and environmental SDGs described by previous works [23, 87], we also observed that science connecting SDG 3-Health with SDGs 12-Consumption, 13-Climate, 14-Aquatic, and 15-Terrestrial is much less developed. This finding is remarkable considering the importance of this type of research – at the intersection between human health, socioeconomic issues, and environmental sustainability – for the One Health perspective, a central framework for science and policy on sustainability [27]. Future research into this persistent division is necessary if we are to address issues at the intersection of human and environmental health, including antibiotic resistance [38] and the transmission of zoonotic disease [12, 80].

Indeed, when addressing our second research question, our analysis offers insights into emergent research topics with the potential to bridge these divisions in the science of health and sustainability. Representing the relevant literature as a network of topics, we identified several ‘zipper themes’ which occupy central positions in the structure of health-sustainability research or can bridge significant gaps in this field. These are specific, actionable domains of research that cover a range of issues, from reproductive and maternal health to public transportation in developing cities, from climate change and schistosomiasis to environmental and health impacts of accidents in maritime transportation. In some cases, related bodies of literature on these themes were detected as distinct, unique topics by our models, revealing opportunities for new synergies in health and sustainability research, including within the One Health framework. As an example, insights from research on topic 158 (diffusion of schistosomiasis via freshwater snail migration) could promote scientific advances on topic 42 (treatment of schistosomiasis via praziquantel), although these are detected as separate topics in two different network communities (community 10 on wildlife ecology and zoonoses and community 9 on infectious diseases, respectively). Recognizing the interdependencies between climate and health sustainability, Pedersen et al. (2014a; 2014b) empirically forecast increases in the incidence of intestinal schistosomiasis, induced by habitat loss, that will persist until 2055 [80, 81]. More recent studies have added that biodiversity loss can have a similar impact on the transmission of zoonotic diseases [12, 88]. This recent uptake of the climate change–zoonosis zipper theme, which was already being studied nearly a decade earlier, validates our approach. Further applications could help guide research and policy to improve the preparedness of contemporary societies for future pandemics and public health emergencies.

Conclusion

For almost a decade, health and sustainability scholars have advocated for the systematic synthesis of scientific literature and evidence, recognizing that the exponential growth of research volumes in this multidisciplinary field means that an increasing number of studies, insights, and innovations risk being overlooked or ignored [1, 2, 89]. Especially in research on complex topics such as global health, health systems, health inequalities, and sustainable development, literature and evidence syntheses are needed to appreciate the different aspects of multifaceted problems, recognize knowledge gaps, and learn lessons for future interventions [2]. The method proposed here serves each of these aims. The synthesis of large bodies of evidence on sustainability and health increasingly relies on scoping review methods, which can map and distill knowledge from hundreds of articles around the same substantive topic [4, 6, 90, 91]. In comparison, the combination of NLP and network science techniques presented here can map topics, connections, and gaps in science from much larger volumes of literature (nearly 30,000 articles in our case) around a more broadly defined topical area (here, health and sustainability). While our study does not match the nuance and detail of scoping reviews, it provides important information about the state and landscape of relevant research, including main topics, existing and missing connections between them, and promising directions for future work.

Moving forward, a major challenge will be to reposition and ‘rewire’ scientific efforts on SDGs 12-Consumption, 13-Climate, 14-Aquatic, and 15-Terrestrial on the map of health-sustainability research, increasing their substantive proximity to other goals in global science [87]. This would enable the identification of novel, interdisciplinary topics of research and policy that could connect and co-advance health with socioeconomic and environmental sustainability goals. Delving into the results of the current study, we identify several pairs of research topics that, through a One Health perspective, have the potential to produce – or are already in the process of producing – research, interventions, and technologies to co-advance multiple SDGs. These research topics represent promising paths toward the sustainable development goal to “ensure healthy lives and promote well-being for all at all ages” [92]. Approaching frontier research with this mindset would be a transformational starting point for scientists, funding agencies, and donors committed to developing interdisciplinary research to promote human health and well-being along with the other SDGs.