The citation network of sustainability science can be divided into 93 clusters, where the number of nodes in each cluster varies from three (the smallest clusters) to 1,584 (the biggest cluster, #1). Papers in each cluster are strongly coupled by intra-cluster citations. Cluster size, i.e. the number of nodes in each cluster, gradually decreases until the 15th cluster, and after the 30th cluster the number becomes negligible. In the following discussion, therefore, we focus on the top 15 clusters, which cover more than 80% of the papers in the network. Figure 3 visualizes the structures of the citation networks of the top 15 clusters. In this figure we assign the same color to intra-cluster links for each cluster. When the structure of a cluster in Fig. 3 is compact and round, it means that papers in the cluster have a strong tendency to cite other papers in the same cluster. Conversely, when a cluster is stretched and spiky, the cluster is closely related to other clusters located in that direction. When two clusters are near to each other, it means the papers in these two clusters cite each other. Table 2 summarizes the contents of each cluster.
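The decision to focus on the largest clusters can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in this study: it assumes only a list of cluster sizes and reports how many of the largest clusters are needed to cover a given share of papers. The function names and the toy sizes are hypothetical and are not the data behind Fig. 4.

```python
from itertools import accumulate


def cumulative_coverage(cluster_sizes):
    """Cumulative share of papers covered by the k largest clusters, k = 1, 2, ..."""
    sizes = sorted(cluster_sizes, reverse=True)
    total = sum(sizes)
    return [running / total for running in accumulate(sizes)]


def clusters_needed(cluster_sizes, target_share):
    """Smallest k such that the k largest clusters cover at least target_share."""
    for k, share in enumerate(cumulative_coverage(cluster_sizes), start=1):
        if share >= target_share:
            return k
    return len(cluster_sizes)


# Hypothetical cluster sizes for illustration only (not the paper's data).
toy_sizes = [50, 30, 10, 5, 3, 2]
coverage = cumulative_coverage(toy_sizes)
k_for_80_percent = clusters_needed(toy_sizes, 0.8)
```

Applied to the actual cluster sizes, the same calculation would give the share of papers covered by the top 15 clusters.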
Cluster #1 is the Agriculture cluster, in which sustainable agriculture is discussed. The Agriculture cluster has 1,584 papers in it and is the biggest among the 93 clusters. It is also the oldest among the top 15 clusters. Research topics include soil erosion, soil fertility, soil resilience, nutrients, food productivity, plant biodiversity, and so forth. Cluster #2 is the Fisheries cluster, in which the sustainability of world fisheries is discussed. The United States dominates this cluster, with a large CWF. Cluster #3 is Ecological Economics, in which economic indicators of sustainability are proposed and measured. The above three clusters occupy central positions in the network because of their large volume (Fig. 3). The stretched and spiky shape of cluster #3 in Fig. 3 means that this cluster is closely connected to other clusters in the network. Cluster #4 is Forestry (agroforestry). Fertility, such as nitrogen and phosphorus content, is the main concern. Managing the competition between trees and crops for light, water, and nutrients is the key success factor for agroforestry systems. India and Brazil have high CWFs in this cluster, which reflects the importance of this research in those countries. As seen in Fig. 4, approximately half of the papers belong to the top four clusters. It is worth noting that the concept of sustainability originated in the context of sustainable yields for agriculture and renewable resources such as forests or fisheries and has subsequently been adopted as a broad slogan by the environmental movement (Lélé 1991). This historical background is a factor in the current central position of those clusters.
Cluster #5 is Forestry (tropical rain forest). Most papers in this cluster are written by authors in the US and discuss management and economic aspects of timber and non-timber forest products from tropical forests. Cluster #6 is the Business cluster, which is somewhat noisy because most papers discuss the sustainable competitive advantages of a firm. The topological position of the cluster in the citation network reflects this. Some papers clearly share the same context as the other categories, however, e.g. by linking environmental performance and economic performance. Cluster #7 is the Tourism cluster; the subject of sustainable tourism is controversial, and the management of oceans and coasts in particular is discussed. Cluster #8 is the Water cluster, in which wastewater treatment, water resource management, and the water cycle are key topics. It is noteworthy that China focuses on water research. Cluster #9 is the Forestry (biodiversity) cluster; Canada has the highest CWF and is predominant in this cluster. An important goal for research in the cluster is the conservation of biological diversity in forests.
Cluster #10 is the Urban Planning cluster, in which sustainable city and landscape planning are key topics. Social and political aspects of sustainability, for example planning and regulation, are also discussed. Cluster #11 is the Rural Sociology cluster, in which sustainability is closely associated with social issues. Key topics are agreement between the countries of the North and those of the South, rural development, local knowledge, and local food systems. Cluster #12 is the Energy cluster, which is the youngest among the top 15 clusters. In the Energy cluster no country has a CWF markedly higher than that of other countries, which means that the sustainability of energy is a common and global problem, at least for the developed countries where scientific research is active. Cluster #13 is the Health cluster, in which the sustainability of health projects is discussed. The penetration of intervention into a population and community participation in health-care programs are essential for sustaining health. Cluster #14 is the Soil cluster. Compared with the Agriculture cluster, the Soil cluster is more technology-focused. Because its journals have a high JWF, however, the detection of this cluster may result from an emphasis on regional agricultural systems or from a citation bias by which researchers in each country cite journals of their own country. Cluster #15 is the Wildlife cluster, in which the impact of commercial hunting on forest mammals is investigated. Subsistence hunting by inhabitants and the sustainability of wildlife, especially mammals threatened by game hunting, are also investigated.
In Fig. 5, we show the relative positions of these clusters to summarize the above results. We can use this image as an academic overview map of sustainability science. It is worth pointing out some implications of the map. As shown in Fig. 5, some clusters discussing related topics are located in relatively close positions. For example, the Business cluster (#6) is just above the Ecological Economics cluster (#3). The Soil cluster (#14) is in the proximity of the Agriculture cluster (#1). These proximities accord with the relatedness of topics in these clusters. The Forestry clusters (#4, #5, #9) are far from each other, however. This may reflect the diversity of topics in forest research. Agroforestry (#4) is close to Agriculture (#1), Tropical Rain Forest (#5) is near Rural Sociology (#11), and Biodiversity (#9) is near Wildlife (#15). Another view is also possible, however. These clusters (#4, #5, #9) treat similar topics, i.e. forestry and forest management. The citation gap among forestry clusters suggests the existence of a research gap, and the possibility of future collaboration among these clusters. This view might also be valid for Agriculture (#1) and Soil (#14). Papers in the Soil cluster are region-specific as shown in the main journals of Table 2; this may be because of different fields of specialization and research communities from those in the Agriculture cluster.
In the citation-based approach it is assumed that citing and cited papers have similar research topics. Citation behavior is motivated in different ways, however (MacRoberts and MacRoberts 1989), and the results therefore reflect the cognitive structure of scholars in each research domain (Kajikawa et al. 2006). In other words, the citation map must be read with these different motivations in mind, for example citations to papers on similar research topics, citations to unrelated but prominent papers, and self-citations. We therefore used NLP as a supplemental method for citation network analysis. We shall now look at the results obtained by NLP.
First, we checked the relevance of the NLP results. NLP generally extracts a large fraction of noisy terms, and this fraction increases as the number of extracted terms increases. We therefore checked the relevance of the extracted terms by comparing them with the keywords designated by the authors. We defined the precision of the result as the fraction of terms matching author keywords among all terms extracted by the NC-value method. The results are shown in Fig. 6. The precision was highest at approximately 1,000 terms and decreased as the number of extracted terms increased. In the following analysis, therefore, we focused on 1,000 terms extracted by the NC-value method and analyzed the similarity among them. We used the group average method for clustering, after pruning terms with a similarity threshold of 0.09 to reduce noise. As a result we obtained a dendrogram of 679 terms, as shown in Fig. 7. Some parts of the dendrogram are clearly divided into clusters. Because there is no common criterion for setting the threshold for statistical clustering, we manually set ad hoc criteria to recognize clusters and obtained 19 clusters, as shown in Fig. 7. Two clusters (clusters N1 and N2) consist of noisy terms and one cluster consists of generic terms (cluster M). There are clusters that can be divided at high similarity but cannot be divided at low similarity (A1–A3, E1–E2, and I1–I3). Examples of the terms included in each cluster are shown in Table 3.
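The precision check and the clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. It assumes that terms match keywords by exact comparison after lowercasing, that pruning removes any term whose similarity to every other term is below the threshold, and that group average clustering stops at a fixed similarity cut (in the study the cut was set manually). All function names, the toy terms, the toy similarity values, and the cut of 0.3 are hypothetical.

```python
def precision_at_k(ranked_terms, author_keywords, k):
    """Fraction of the first k extracted terms that match an author keyword."""
    if k <= 0:
        raise ValueError("k must be positive")
    keywords = {kw.strip().lower() for kw in author_keywords}
    top = [t.strip().lower() for t in ranked_terms[:k]]
    if not top:
        return 0.0
    return sum(1 for t in top if t in keywords) / len(top)


def prune_terms(sim, labels, threshold):
    """Assumed pruning rule: drop terms with no similarity >= threshold to any other term."""
    n = len(labels)
    keep = [i for i in range(n)
            if any(sim[i][j] >= threshold for j in range(n) if j != i)]
    return [[sim[i][j] for j in keep] for i in keep], [labels[i] for i in keep]


def group_average_clusters(sim, labels, min_similarity):
    """Group average (average linkage) clustering on a similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity and stops when that mean falls below min_similarity.
    """
    clusters = [[i] for i in range(len(labels))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (p, q) = max(
            (avg_sim(clusters[p], clusters[q]), (p, q))
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if best < min_similarity:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical toy data for illustration only (not the paper's corpus).
ranked = ["soil erosion", "sustainable agriculture", "paper", "water cycle", "result"]
keywords = ["Soil erosion", "water cycle", "agroforestry"]
p_at_2 = precision_at_k(ranked, keywords, 2)
p_at_5 = precision_at_k(ranked, keywords, 5)

terms = ["soil", "crop", "fish", "catch", "the"]
similarity = [
    [1.00, 0.60, 0.10, 0.05, 0.02],
    [0.60, 1.00, 0.10, 0.05, 0.02],
    [0.10, 0.10, 1.00, 0.50, 0.02],
    [0.05, 0.05, 0.50, 1.00, 0.02],
    [0.02, 0.02, 0.02, 0.02, 1.00],
]
pruned_sim, kept_labels = prune_terms(similarity, terms, 0.09)
groups = group_average_clusters(pruned_sim, kept_labels, 0.3)
```

In this toy example the generic term "the" is pruned because its similarity to every other term is below 0.09, and the remaining terms fall into one soil-related and one fishery-related group. A fixed cut is used here only for simplicity.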
Comparing the results obtained by NLP (Table 3) with those by citation network analysis (Table 2), we can see similar clusters. Natural resource-related clusters such as Agriculture, Fisheries, Forestry, Water, and Biodiversity are extracted by both citation network analysis and NLP. These clusters are the central research domains of sustainability science. Clusters relating to Economics (Ecological Economics and Business) are also seen in both results. But some discrepancies exist. For example, the Tourism cluster in the citation network seems to be merged into the Ecological Economics cluster (cluster D) in NLP. This is because in the Tourism cluster the focus of discussion is often on its economic aspects. In NLP we have only one Forestry cluster (cluster F) whereas in the citation network there are three Forestry-related clusters (#4, #5, #9). This suggests the existence of a common terminology for these forestry research domains.
In addition to common clusters, some new clusters can be detected by NLP. These are clusters A1–A3 (Education, Biotechnology, Medical), B (Livestock), E2 (Climate Change), I2 (Welfare), and I3 (Livelihood). Clusters A1–A3 are closely related to each other, as shown in the dendrogram (Fig. 7). These clusters are associated with the Health cluster (#13) in the citation network, which implies that education and biotechnology are mainly discussed in the context of the sustainability of health programs. Cluster E2 (Climate Change) is close to cluster E1 (Energy) in the dendrogram. Climate change cannot be detected as a distinct cluster in the citation network but appears in the dendrogram by NLP at this position. Clusters I2 (Welfare) and I3 (Livelihood) are also detected as distinct clusters.
Why do these clusters emerge? One explanation is that their terms appear in most of the clusters in the citation network but only a few times in each. Because terms such as education and welfare appear in each citation cluster in small quantities, citation network analysis cannot detect them as independent clusters. NLP nevertheless shows them as distinct clusters because these terms appear in large quantities across the entire corpus. The clusters extracted only by NLP are therefore considered to represent terms common to the clusters in the citation network. Common clusters seem to be formed around topics representing what we should sustain: agriculture, fish, water, forests, energy, biodiversity. Some of the clusters that appear only in the citation network are sub-categories such as soil and wildlife. Clusters detected only by NLP are more general and more human-rooted, for example welfare, livelihood, and education.
Finally, let us address the limitations of our research. In our approach, we collected the corpus by making a query. The results obtained by citation network analysis indicated that agriculture and fisheries occupy the largest fractions of sustainability science. On the other hand, energy, which is an unquestionably important area of research in sustainability, represents a relatively small fraction of research and is the youngest among the top 15 clusters. But we must note that usage among researchers of the term “sustainability” has been changing. Sustainability was used as a technical term in the early days but nowadays seems to be used to express the importance of global sustainability. It is plausible that clusters with a longer history (e.g. agriculture) have used “sustainability” as a technical term while the younger Energy cluster uses “sustainability” with the latter meaning. Therefore, changes in the definition of sustainability (or the usage of this word) may be behind these results. Debate on the definition and targets of sustainability will continue as a part of sustainability science.