A natural language processing model for supporting sustainable development goals: translating semantics, visualizing nexus, and connecting stakeholders

Matsui, Takanori; Suzuki, Kanoko; Ando, Kyota; Kitai, Yuya; Haga, Chihiro; Masuhara, Naoki; Kawakubo, Shun

doi:10.1007/s11625-022-01093-3

Original Article
Open access
Published: 04 February 2022

Volume 17, pages 969–985, (2022)

Sustainability Science

Takanori Matsui ORCID: orcid.org/0000-0001-9441-7664¹,
Kanoko Suzuki¹,
Kyota Ando¹,
Yuya Kitai³,
Chihiro Haga¹,
Naoki Masuhara² &
Shun Kawakubo³

Abstract

Sharing successful practices with other stakeholders is important for achieving SDGs. In this study, with a deep-learning natural language processing model, bidirectional encoder representations from transformers (BERT), the authors aimed to build (1) a classifier that enables semantic mapping of practices and issues in the SDGs context, (2) a method for visualizing the SDGs nexus based on the co-occurrence of goals, and (3) a matchmaking process between local issues and initiatives that may embody solutions. A data frame was built using documents published by official organizations and multi-labels corresponding to SDGs. A pretrained Japanese BERT model was fine-tuned on a multi-label text classification task, while nested cross-validation was conducted to optimize the hyperparameters and estimate cross-validation accuracy. A system was then developed to visualize the co-occurrence of SDGs and to couple the stakeholders by evaluating embedded vectors of local challenges and solutions. The paper concludes with a discussion of four future perspectives to improve the natural language processing system. This intelligent information system is expected to help stakeholders take action to achieve the sustainable development goals.

Introduction

The decade ending in 2030 is the Decade of Action (United Nations 2020). The year 2030 is a milestone on the way to limiting global warming to well below 1.5 °C (UNFCCC 2015) and to “living in harmony with nature” by 2050 (UNCBD 2021), so reaching these goals requires accelerating the related activities. Various platforms have been proposed and developed to support information gathering and knowledge sharing in order to promote the sustainable development goals (SDGs) (Nilsson et al. 2018). Further development of digital platforms is expected to enable transactions and innovation in the most advanced SDGs actions and research, helping to reach the stated goals (Bonina et al. 2021).

Since SDGs require multistakeholder partnerships, knowledge platforms must be established at both multiscale and multisector levels. At the global scale, the sustainable development knowledge platform (UNDESA 2021) is representative, while (Sustainable Development Solutions Network 2021) created a tracking and monitoring platform to share government sectors’ progress and maintain accountability. For the business sector, SDG compass (Global Reporting Initiative, UN Global Compact, and WBCSD 2015) provided practical information and tools, and (WBCSD 2021) also offered the SDG Essentials for Business, a learning suite for corporate SDGs activities. In the academic sector, the Higher Education Sustainability Initiative (United Nations 2012) is a networking platform for over 300 universities from around the world, and the Technology Facilitation Mechanism (UNDESA and UNOICT 2020) has a platform for sharing scientific and technological suggestions, ideas, and solutions for enhancing SDG activities. At the local scale, Local 2030 (United Nations 2017) now supports municipalities in monitoring, evaluating, and reviewing their SDGs progress, and the Voluntary Local Review Lab (Institute of Global Environment Strategy 2019) networks the municipalities that have released Voluntary Local Review (VLR) reports. In Japan, the Cabinet launched the flagship “Regional Revitalization Public–Private Partnership Platform” to promote domestic SDGs activities and revitalize local areas on a national scale (Cabinet Office Japan 2020). Japan’s Ministry of Foreign Affairs manages the “JAPAN SDGs Action platform”, which is a best-practice database of SDGs activities from all sectors (Ministry of Foreign Affairs Japan 2019). The private sector also launched an open innovation platform named “SHIP (SDGs Holistic Innovation Platform)” to share technologies and know-how (Japan Innovation Network and UNDP 2021).

Promoting local governance that is consistent with the global and national scales is very important (Oosterhof 2018) and enhances the mainstreaming of SDGs (Masuda et al. 2021). Against this background, the authors built the “Local SDGs platform”, an SDGs action support system that has operated at the local scale in Japan since 2017 (Kawakubo 2018). The platform covers 1740 municipalities in Japan and facilitates progress analysis of SDGs in each municipality by using localized SDGs indicators (Kawakubo and Murakami 2020; Cabinet Office Japan 2019). These indicators were developed by adapting the UN’s 244 SDGs indicators (UNSTATS 2017) to the Japanese context. At the same time, municipalities can use the platform to present their valuable experience as narratives. All this enables the municipalities to check and review SDGs progress quantitatively through key performance indicators and to share their solutions with their peers. Building on this history, the authors launched a new advanced SDGs communication platform, “Platform Clover” (Sustainable Transition 2021), which expands the reach beyond municipalities to all stakeholders. Platform Clover aims to be a base for SDG17 partnerships, providing bottom-up matching that incorporates a variety of goals, missions, experiences, technology, and knowledge.

Artificial intelligence (AI) technology is useful for achieving SDGs (Vinuesa et al. 2020), so AI technology will be utilized to upgrade the semantic analyzing functions of Platform Clover. Our focus will be on these core functions: (1) semantic SDG mapping, (2) SDGs interlinkages and nexus visualization, and (3) stakeholder interpretation and matchmaking. The literature review is below.

Semantic SDG mapping

People with limited knowledge of SDGs have difficulty in translating and mapping their local challenges and activities onto the broader SDGs context. A mapping support function based on AI technology should help in this area (Varshney and Mojsilovic 2019). However, this research is still ongoing. The most advanced research can be found in the Open Source SDG (OSDG) project (Pukelis et al. 2020). The OSDG developed a holistic SDGs ontology by coupling a conventional SDGs ontology and an SDGs multi-label classification system, linking a regression model and a topic model. As for machine learning studies, (Pincet et al. 2019) implemented a single-label classification task with a tree-based decision algorithm, while (Sciandra et al. 2020) employed a Gradient Boosting Decision Tree to classify SDGs-related tweets on Twitter into two classes, an information class and an action class. (Nugroho et al. 2020) used a naïve Bayes classifier to assign news articles to related SDGs, and (ElAlfy et al. 2020) classified Corporate Social Responsibility and sustainability reports with the FastText algorithm. In Japan, (Koyamada 2019) mapped the policy briefs produced by the Japanese Science Council to relevant SDGs. All this suggests that the demand for technology to link social challenges, policy, and science is quite high.

SDGs nexus visualizing

As emphasized in the preamble to the 2030 agenda, SDGs must be attained by ensuring interlinkages of SDGs and targets. However, the interlinkages among SDGs are very complex and wicked (Bowen et al. 2017), and the importance of both synergies and trade-offs in achieving global optimization has been pointed out repeatedly (Allen et al. 2018; Del Río Castro et al. 2021; Kroll et al. 2019). These interlinkages and interactions are also referred to as the “SDGs nexus” (Liu et al. 2018), and this paper uses the word “nexus” in the same sense as “interlinkages”. Visualizing the SDGs nexus enables science-based support for effective allocation and distribution of resources and a proactive design of synergies and trade-offs in policy making. For a decade, the authors have also worked on qualitative and quantitative nexus assessments at the Japanese prefectural scale (Kumazawa et al. 2009, 2014; Matsui et al. 2019; Masuhara et al. 2019). Recently, SDGs nexus research has been attracting attention as it shifts from knowledge-driven to data-driven approaches. Examples include integrated research that summarizes key papers (Scharlemann et al. 2020; Alcamo et al. 2020); empirical studies that identify SDGs interlinkages from VNR documents and statistics (Zanten and Tulder 2021; Tosun and Leininger 2017; Sebestyén et al. 2019; Bali Swain and Ranganathan 2021; Fonseca et al. 2020); model-based studies that delineate synergistic or trade-off interactions using Integrated Assessment Models (van Soest et al. 2019); text mining and network research on documents (Sebestyén et al. 2020); machine learning applications to predict SDG interlinkages (Requejo-Castro et al. 2020); causality analysis of interconnected SDG factors (Dörgő et al. 2018); and the development of a visualizer of SDG interlinkages (Zhou et al. 2019).

Connecting and matchmaking for collaboration, partnership, and cooperation

The promotion of SDG 17 (partnership for the goals) is expected to expedite the matching of challenges and problems to know-how and solutions among various stakeholders (Chon et al. 2018; Richards-Kennedy and St Brice 2018; Saric et al. 2019). However, since such matching is still at the proof-of-concept stage, research is sparse. Early studies have only examined the definition of collaboration, partnership, and cooperation in the SDGs context (Stott and Murphy 2020) and guidance on collaboration design and governance for contributing to SDGs in the business sector (Vazquez-Brust et al. 2020). In the Japanese context, (Cabinet Office Japan 2020) has conducted a manual matchmaking exercise for stakeholders, but this has proven a time-consuming task. At a practical level, the UNEP has attempted to apply association rule learning to smart matchmaking of stakeholders (International Telecommunication Union 2020). Such matching support systems expand the opportunities for all stakeholders to discover potential innovations.

Against this background, this study aims to build a natural language processing system with three functions: (1) a text classifier to map challenges and activities to the SDGs context at the goal level; (2) an interlinkage visualizer of the SDGs nexus; and (3) semantic matchmaking between local challenges and potential solutions from a variety of stakeholders.

Methodology

The comprehensive analytical framework is shown in Fig. 1. The detailed processes of (1) building the corpus database for model training, (2) initializing the natural language processing model, (3) training and validating the model, and (4) applying the model to SDGs mapping, nexus visualizing, and stakeholder matchmaking are described below.

Building SDGs corpus database for model training

Japanese documents that explicitly refer to the SDGs’ goals, targets, and indicators were collected, along with explanatory addendums. Documents from the United Nations, the Japanese government, and the private sector were collected in the same way. The 41 documents are listed in Supplementary material 1. The documents were checked manually and sentences related to the SDGs were extracted. Table 1 shows the samples. This is the initial corpus (N = 1604) and includes both text and 17-dimensional multi-label data. If a sentence is related to SDG3, 5, 10, and 15, the 17-dimensional multihot vector is [0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]. The mean numbers of characters and tokens per sentence were 1303.5 and 780.9, respectively, in the initial corpus. However, the BERT model (Devlin et al. 2019), the natural language processing model used for this research (explained below), accepts an input of at most 512 tokens. Because the mean number of tokens per sentence exceeded this limit, the sentences were divided so that the 512-token limit would not be exceeded even if every character were tokenized individually (e.g., a sentence with 1,024 characters was divided into two sentences of 512 characters each, both carrying the same 17-dimensional multihot vector). As a result, the SDGs corpus database for training increased to N = 3758.
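The label encoding and the splitting rule described above can be illustrated with a minimal sketch. The function names `multihot` and `split_text` are hypothetical and introduced only for illustration, and the splitting is assumed to be a plain cut into consecutive 512-character pieces; the source does not state where exactly the cuts were placed.

```python
def multihot(goals, n_goals=17):
    """Return a 0/1 vector with ones at the given SDG numbers (1-based)."""
    vec = [0] * n_goals
    for g in goals:
        vec[g - 1] = 1
    return vec


def split_text(text, labels, max_len=512):
    """Cut text into consecutive pieces of at most max_len characters.

    Every piece inherits a copy of the label vector of the original text.
    """
    return [(text[i:i + max_len], list(labels))
            for i in range(0, len(text), max_len)]


labels = multihot([3, 5, 10, 15])
pieces = split_text("x" * 1024, labels)  # a placeholder 1,024-character text
```

With these toy inputs, `labels` reproduces the multihot vector given in the text, and the 1,024-character placeholder yields two 512-character pieces that share that vector.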

Table 1 Samples of corpus database

Initializing the natural language processing model

The BERT model, “Bidirectional Encoder Representations from Transformers”, developed by (Devlin et al. 2019), was applied as the natural language processing model to learn the corpus. BERT can be applied to various natural language processing tasks and performed impressively on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al. 2019), which is the standard benchmark in the natural language processing research field. Many experiments have also shown the superiority of BERT over other machine learning algorithms in text classification (González-Carvajal and Garrido-Merchán 2021). The transformers library (Wolf et al. 2020) (== 3.0.2) developed by Hugging Face (Hugging Face 2016), a natural language processing suite implemented on deep-learning frameworks (JAX, PyTorch, and TensorFlow), was utilized. The Japanese BERT model pretrained on Japanese Wikipedia with the PyTorch framework (== 1.6.0) and released by Tohoku University, Japan (Inui Laboratory 2019; Suzuki 2021) was adopted. The base model of the Tohoku-BERT (cl-tohoku/bert-base-japanese-whole-word-masking) was used. The Tohoku-BERT tokenizer employs the Japanese morphological analyzer MeCab (MeCab 2006) with the ipadic dictionary (Asahara and Matsumoto 2003) (== 2.1.2) and the WordPiece algorithm (Sennrich et al. 2015). The BERT model was rebuilt for the multi-label classification task and then fine-tuned on the SDGs corpus database. The original Tohoku-BERT model architecture consists of 12 attention layers with 12 attention heads each; the input for the model is a sequence of ≤ 512 tokens and the output is a set of 768-dimensional vectors produced by the transformer encoders. Therefore, the input token length was fixed to 512 and a fully connected layer with a sigmoid activation function was added after the transformer encoder, taking the 768-dimensional vector of the CLS token as input and producing a 17-dimensional vector as output. The model was thus initialized to predict the probabilities of the input sentence belonging to each SDG.
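To make the shape of the added classification layer concrete, the following is a minimal pure-Python sketch of a fully connected layer followed by a sigmoid, mapping a 768-dimensional vector to 17 independent probabilities. It is not the authors' PyTorch implementation: the weights are randomly initialized placeholders, the input vector stands in for the CLS output of the encoder, and the names `head`, `weights`, and `bias` are assumptions made for illustration.

```python
import math
import random

HIDDEN, N_GOALS = 768, 17
rng = random.Random(0)

# Hypothetical, randomly initialized head parameters (not trained weights).
weights = [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN)] for _ in range(N_GOALS)]
bias = [0.0] * N_GOALS


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def head(cls_vector):
    """Map a 768-dimensional vector to 17 independent probabilities."""
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return [sigmoid(z) for z in logits]


# Stand-in for the encoder's CLS output; a real vector would come from BERT.
cls_vector = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
probs = head(cls_vector)
```

Because each output passes through its own sigmoid rather than a shared softmax, the 17 probabilities need not sum to one, which is what allows a sentence to be assigned to several SDGs at once.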

Training and evaluation of the model

In the process of model training, nested cross-validation (CV) (Varma and Simon 2006) was conducted to train the prediction model with the best combination of hyperparameters and to estimate the expected cross-validation loss at the same time. The inner CV (innerCV) detects the optimum combination of hyperparameters while the outer CV (outerCV) evaluates the expected classification performance. Due to time constraints, tenfold CV was set for the outerCV and fivefold CV for the innerCV. This phase involved text data augmentation of the training data frame in each outerCV and innerCV. In the text augmentation, the training data frame was copied and ten percent of the tokens in the copy were replaced by random synonyms obtained from WordNet (Miller 1995) as implemented in the nltk library (NLTK 2021). Furthermore, ten percent of the tokens in the copied data frame were then randomly deleted and the copy was merged with the original data frame.
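The augmentation step can be sketched as follows. This is only an illustration under stated assumptions: a small hand-written synonym table replaces the WordNet lookup, the ten percent rate is applied per sentence rather than over the whole data frame, and the function name `augment` and the English example sentence are hypothetical.

```python
import random


def augment(tokens, synonyms, rate=0.1, rng=None):
    """Return an augmented copy: synonym replacement, then random deletion."""
    rng = rng or random.Random(0)
    out = list(tokens)
    if not out:
        return out
    n = max(1, round(rate * len(out)))
    # Replace up to n randomly chosen tokens that have a synonym entry.
    candidates = [i for i, t in enumerate(out) if t in synonyms]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(synonyms[out[i]])
    # Delete up to n randomly chosen positions, always keeping one token.
    drop = set(rng.sample(range(len(out)), min(n, len(out) - 1)))
    return [t for i, t in enumerate(out) if i not in drop]


# Toy synonym table standing in for WordNet (an assumption of this sketch).
synonyms = {"clean": ["safe"], "people": ["citizens"]}
tokens = "cities need clean water and affordable energy for all people now".split()
augmented = augment(tokens, synonyms)
training_set = [tokens, augmented]  # the copy is merged with the original
```

The merged set holds the original sentence and its perturbed copy, so the training data roughly doubles while the labels of each copy stay those of its original.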

In the fine-tuning process, the pretrained model parameters of all attention heads in the 1st to 11th layers were frozen, and the 12 attention heads in the last (12th) layer and the final fully connected layers were set as trainable. This operation is expected to facilitate compatibility between the common sense learned from Wikipedia and the SDGs-specific context. Binary cross-entropy with logits was set as the loss function for training and Adam (Kingma and Ba 2017) as the optimization algorithm for the model parameters. The Bayesian optimization library Optuna (== 1.3.0) (Akiba et al. 2019) was used to search for the optimum combination of the batch size for training (ranging from 2² to 2⁵) and the learning rates of the transformer encoder and the fully connected layers (both ranging from 10⁻⁵ to 10⁻²). The objective function was the mean loss of each innerCV, with the numbers of trials and epochs both set to 2⁴. These search ranges and the number of trials were determined by trial and error in the pretest stage, subject to time limitations.
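For readers unfamiliar with the loss named above, the following is a minimal pure-Python sketch of binary cross-entropy computed from raw logits for one multi-label sample. It uses the standard numerically stable rearrangement of the formula; it is an illustration, not the PyTorch routine used in the study, and the function name `bce_with_logits` is chosen here for convenience.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))], s = sigmoid.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# A logit of 0 means probability 0.5, so the loss equals log(2).
loss_uncertain = bce_with_logits([0.0], [1])
# A confident, correct prediction gives a loss close to zero.
loss_confident = bce_with_logits([1000.0, -1000.0], [1, 0])
```

Each of the 17 labels contributes its own binary term, so a sentence is penalized separately for every SDG it is wrongly assigned to or wrongly excluded from.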

The optimal hyperparameters detected by the innerCV were set in each outerCV, and the expected performance was evaluated based on the aggregated performance of the outerCV. Precision, recall, and f1-score were adopted as the metrics for evaluating classification performance.

$$ {\text{Precision}}_{i} = \frac{{TP_{i} }}{{TP_{i} + FP_{i} }} \tag{1} $$

$$ {\text{Recall}}_{i} = \frac{{TP_{i} }}{{TP_{i} + FN_{i} }} \tag{2} $$

$$ F1\quad {\text{score}}_{i} = \frac{{2 \times {\text{Precision}}_{i} \times {\text{Recall}}_{i} }}{{{\text{Precision}}_{i} + {\text{Recall}}_{i} }} \tag{3} $$

where TP (true positive) and TN (true negative) are the numbers of correct predictions for positive and negative samples, respectively. Conversely, FN (false negative) and FP (false positive) are the numbers of incorrect predictions for positive and negative samples, respectively. The precision of class i is the ratio of samples predicted as class i that actually belong to that class (Eq. 1). The recall is the ratio of correctly identified samples to the total number of samples of class i (Eq. 2). The F1 score is the harmonic mean of the precision and recall of class i (Eq. 3).
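Equations 1 to 3 can be checked on a small example. The sketch below computes the three metrics for one class from multihot ground-truth and predicted rows; the function name `per_class_scores` and the three-class toy data are assumptions made for illustration, not values from the study.

```python
def per_class_scores(y_true, y_pred, i):
    """Precision, recall and F1 for class index i over multihot rows."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 0 and p[i] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[i] == 1 and p[i] == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data with three classes instead of seventeen.
y_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 0]]
precision, recall, f1 = per_class_scores(y_true, y_pred, 0)
```

For class index 0 the toy data give two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.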

Lastly, the best model was trained for SDGs mapping, nexus visualization, and stakeholder matchmaking, using the mean of the optimal hyperparameter sets obtained in each outerCV as the best hyperparameters and setting the number of epochs to 2⁵.

Application: SDGs mapping, nexus visualizing, and stakeholder matchmaking

This model was used in three applications. First, to evaluate SDGs mapping performance, an unknown text that was not used for training was input. The BERT model can produce three outputs in the prediction process: a semantic vector of the unknown text, a membership probability distribution over the SDGs, and attention weights for the tokens contributing to the classification decision. In this way, the SDGs related to the unknown text were predicted quantitatively, and the validity of the mapped SDGs and of the attention to the tokens was evaluated qualitatively by interpreting the semantic features of the unknown text.

As an application case of the text classification model, the Inventory of Business Indicators released by the SDG compass (Global Reporting Initiative, UN Global Compact, and WBCSD 2015) (N = 1479, translated into Japanese) was input, and the SDGs related to each indicator were predicted in multi-label format. The co-occurrence of predicted SDGs was analyzed and the network structure was visualized as a plausible SDGs nexus.

For an application case of stakeholder matchmaking, the stakeholder database released by (Cabinet Office Japan 2020) was used. (Cabinet Office Japan 2020) regularly holds matchmaking and networking events across the industry, government, academia, and private sectors on its platform and manually matches potential collaborations. This phase simulates matchmaking between the needs and resources of SDGs by using the semantic vectorizing function and a dimension reduction algorithm.

Results

Performance of multi-label classification

The corpus for the training was (N = 2483), and the mean, maximum, and minimum numbers of tokens per sentence were 237.8, 368, and 2, respectively. The numbers of cumulative and unique tokens were 893,739 and 12,290. The Tohoku-Japanese-BERT has a vocabulary of 32,000 tokens, so the SDGs’ semantic space was defined over 38.4% (12,290/32,000) of the vocabulary. Unknown tokens, i.e., tokens not in the vocabulary of Tohoku-Japanese-BERT, were not included in the training corpus. The distribution of the numbers of labels by SDG was not uniform, with the maximum being 1,946 for SDG 08 (decent work and economic growth) and the minimum being 773 for SDG 06 (clean water and sanitation).

The performance of the nested CV is shown in Table 2. Training took 265 h on an Nvidia Quadro GV100 32 GB graphics processing unit (GPU) with CUDA 10.1 and cuDNN 7.6.5. The overall precision, recall, and macro-f1-score were 0.95, 0.94, and 0.95, respectively, indicating high cross-validation performance. The recall and precision were over 0.90 for all SDGs, with macro f1-scores ranging from 0.92 to 0.97. The means (standard errors) of the best hyperparameters obtained by the tenfold outerCV, namely the batch size, the learning rate of the BERT encoder, and the learning rate of the fully connected layer, were 2^3.7 (0.3), 1.1 × 10⁻⁴ (0.8 × 10⁻⁴), and 1.3 × 10⁻⁴ (0.4 × 10⁻⁴), respectively.

Table 2 Performance of nested cross-validation

In summary, the classification performance can be regarded as excellent. However, these statistics indicate that richer corpora, a weighted loss to counter the imbalanced class distribution, or a larger number of Bayesian optimization trials may further improve performance.

SDGs semantic mapping

A trial of SDGs mapping by multi-label text classification and attention visualization on unknown data is shown in Fig. 2. The text in Fig. 2 (a) is an original Japanese news article published when Osaka University won an award for its policies to promote equality for sexual minorities (Osaka University Center for Gender Equality Promotion 2021), and Fig. 2 (b) is its English translation produced with Google Translate (Google 2021). This article was not included in the training corpus, so it is unknown data to the model. Figure 2 (c) shows the predicted probabilities by SDG, in multi-label format, for the input article. The red color intensity of the tokens in Fig. 2 (a) represents the attention weights that the model referenced as key tokens in the prediction process [the red highlighting in Fig. 2 (b) was added manually to the strongly highlighted tokens].

The top probability of the prediction was SDG 05 (gender equality) at 99.6%, which appears appropriate given the description. The tokens with high attention weights were [diversity, gender, LGBTQ (Lesbian, Gay, Bisexual, Transgender and Queer), sexual minority, SOGI (Sexual Orientation and Gender Identity)], and these categories are strongly connected to gender equality and diversity. The main topic was the introduction of all-gender toilets in cooperation with all of the Osaka University members, so the predictions of SDG 03 (good health and well-being), SDG 04 (quality education), SDG 06 (clean water and sanitation), and SDG 17 (partnership for the goals) also fit. The high probabilities of SDG 02 (zero hunger) and SDG 07 (affordable and clean energy) may implicitly suggest a nexus hypothesis between gender activities, reducing hunger, and renewable energy implementation. (This aspect is discussed further in the SDGs nexus section below.)

On the other hand, the process also highlights a problem specific to the Japanese language in token (19), [インクルージョン&ダイバーシティ] in Japanese in Fig. 2 (a) and [inclusion & diversity] in English in Fig. 2 (b). Japanese uses a mixture of four writing systems: Chinese characters, Hiragana, Katakana, and the Latin alphabet. The token “diversity” can be written as {Chinese character: 多様性, Hiragana: だいばーしてぃ, Katakana: ダイバーシティ, Alphabet: diversity}, all with the same meaning. In token (19), the Katakana form ダイバーシティ was divided into the subwords “ダイバー (diver)” and “シティ (sity)”. The former, “ダイバー (diver)”, was apparently interpreted as a “diver” who dives as a sport, or who works or searches for things underwater using special breathing equipment, which may explain why this article was predicted as SDG 14 (marine life) at 77.4%. This type of problem is not a matter of synonyms but a language-specific one, similar to those in languages such as Chinese or Arabic, which pose challenges for morphological analysis.

Visualization of SDGs nexus

Figure 3 is an SDGs nexus predicted by the model. First, the text classifier was applied to all indicators proposed in the Inventory of Business Indicators (N = 1429) in SDG compass (Global Reporting Initiative, UN Global Compact, and WBCSD 2015). All indicators in English were translated into Japanese manually, and the translated indicator descriptions were then input to the text classifier. The SDGs related to each indicator were predicted in multi-label format, and the predicted probabilities were then converted to 1 or 0 with a 50% threshold to obtain the multihot vectors. A 17-dimensional multihot vector, such as a prediction of [0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], yields co-occurrence relationships (in this example, SDG2 and 3, SDG2 and 5, and SDG3 and 5 co-occur). The co-occurrence among SDGs was analyzed and the SDGs nexus was visualized. The nodes in Fig. 3 represent the 17 SDGs, and node sizes are proportional to the PageRank metric (Brin and Page 1998), which is a score of the node’s influence within the network. The arcs connecting nodes in Fig. 3 are the co-occurrences between SDGs, with widths proportional to the Jaccard score (Jaccard 1912), i.e., the closeness between the two goals. The scikit-learn (== 0.22.1) (Pedregosa et al. 2011) and networkx (== 2.5.1) (Hagberg et al. 2008) libraries were used for the implementation in Python.
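The thresholding, co-occurrence, and Jaccard steps can be illustrated with a short standard-library sketch. It does not reproduce the scikit-learn and networkx calls used in the study and omits PageRank; the function names are hypothetical, the three rows are toy data, and treating a probability of exactly 0.5 as positive is an assumption, since the source does not say how ties at the threshold were handled.

```python
from itertools import combinations


def to_multihot(probs, threshold=0.5):
    """Binarize predicted probabilities (>= threshold counts as 1)."""
    return [1 if p >= threshold else 0 for p in probs]


def cooccurring_pairs(vec):
    """Pairs of SDG numbers (1-based) switched on in one multihot vector."""
    goals = [i + 1 for i, v in enumerate(vec) if v]
    return list(combinations(goals, 2))


def jaccard(rows, a, b):
    """|A and B| / |A or B| over all rows, for SDG numbers a and b."""
    both = sum(1 for r in rows if r[a - 1] and r[b - 1])
    either = sum(1 for r in rows if r[a - 1] or r[b - 1])
    return both / either if either else 0.0


example = [0, 1, 1, 0, 1] + [0] * 12        # the vector quoted in the text
pairs = cooccurring_pairs(example)           # [(2, 3), (2, 5), (3, 5)]

rows = [example,
        to_multihot([0.1, 0.9, 0.7] + [0.0] * 14),   # SDG 2 and 3
        to_multihot([0.0] * 4 + [0.8] + [0.0] * 12)]  # SDG 5 only
score_2_3 = jaccard(rows, 2, 3)
score_2_5 = jaccard(rows, 2, 5)
```

In this toy set SDG 2 and SDG 3 always appear together, giving a Jaccard score of 1, whereas SDG 2 and SDG 5 share only one of the three rows in which either appears, giving 1/3; scores of this kind would set the arc widths.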

Two major cores are observed in Fig. 3. The first core, in the top right of Fig. 3, consists of SDG06 (clean water and sanitation), SDG07 (affordable and clean energy), SDG13 (climate action), SDG11 (sustainable city and community), SDG14 (marine life), and SDG15 (life on land). The second core, in the bottom left of Fig. 3, consists of SDG01 (poverty eradication) and SDG10 (reduce inequality). In the microscopic view, the original indicators in the Inventory of Business Indicators have single labels at the target level and are assumed to be for monitoring the performance of a single objective. However, the holistic coverage and nexus of SDGs predicted by the model instead suggested the nexus of human rights and equality, the empowerment of women and girls (International Labor Organization 2020; Alarcón and Cole 2019; Dhakal 2018; Mustafa 2019; Afenyo-Agbe and Adeola 2020), and the nexus of ecosystem management and climate action (Portner et al. 2021; Chiabai et al. 2018; Liu 2016; Sarkodie and Owusu 2017). From a macroscopic view, these are the two major global challenges and their integration (Jackson and Decker Sparks 2020), and the model sheds light on the possibility for the private sector to contribute to these global challenges.

Currently, the three major approaches in SDGs nexus research are: (1) statistical, to detect correlation or causality; (2) knowledge-driven, to infer causal chains based on real-world experience; and (3) empirical, to analyze the process-based causal chain between SDGs through careful observation. This study proposes a fourth approach: one that combines multiple methods, drawing the network of SDGs from natural language with a focus on semantics and inferring the SDGs nexus by coupling data and knowledge. In particular, the model can produce individual SDGs nexus inferences according to each stakeholder’s own data frame. The inferred SDGs nexus can be expected to promote the awareness and sharing of hidden interlinkages, and potentially to foster collective work among stakeholders.

Matchmaking of stakeholders

To create a matchmaking case, the model was applied to two municipalities in eastern and western Japan and 142 potential solutions from the private sector registered in (Cabinet Office Japan 2020). The sentences describing the municipalities’ challenges and the solutions were converted to 768-dimensional vectors by the BERT model, and the cosine similarity, which is a metric for evaluating the closeness between vectors, was calculated. The model then matched the challenges and solutions. Table 3 shows the summary: column 1 is the municipality’s name; column 2 is the sentences of the municipality’s challenge in the original Japanese and in English translation; columns 3 and 4 are the closest and farthest solutions from the private sector. The histogram on the left shows the cosine similarity between the municipality and the 142 solutions. Kakegawa, a city near metropolitan Tokyo, has a challenge in providing administrative services that allow citizens to move about as little as possible, as required by the “new normal” imposed by COVID-19. The closest (highest cosine similarity) solution was a company providing audio and visual web content development, while the farthest (lowest cosine similarity) solution was a biomaterial and bioenergy refinery company. Kishiwada City, which is famous for its “Danjiri festival” (Osaka Convention and Tourism Bureau 2018), wants to promote and utilize its other natural and cultural resources. The company deemed to have the best solution packaged regional resources, created a promotion strategy, and trained tourist guides. The lowest ranked was a human development company that offered teleworking support. Both cases make sense given the “needs and seeds”, so this appears to be rational matchmaking.
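The ranking step can be sketched with cosine similarity over toy vectors. The three-dimensional vectors and solution names below are invented for illustration (real embeddings are 768-dimensional BERT outputs), and `rank_solutions` is a hypothetical helper, not part of the authors' system.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_solutions(challenge, solutions):
    """Sort solution names from most to least similar to the challenge."""
    scored = [(cosine_similarity(challenge, vec), name)
              for name, vec in solutions.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy embeddings; the names loosely echo the kinds of solutions in Table 3.
challenge = [1.0, 0.0, 1.0]
solutions = {
    "web content": [0.9, 0.1, 0.8],
    "tourism": [0.3, 0.6, 0.4],
    "bioenergy": [-0.2, 1.0, 0.1],
}
ranking = rank_solutions(challenge, solutions)
```

The first and last entries of `ranking` play the role of the closest and farthest solutions reported in columns 3 and 4 of Table 3.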

Table 3 Cases of matchmaking between municipalities and the private sector

Finally, Fig. 4 shows a visual map that supports stakeholder matchmaking. A dimension reduction algorithm was applied to convert all stakeholders’ 768-dimensional vectors to 2-dimensional vectors. The t-SNE algorithm (Maaten 2014), as implemented in the scikit-learn library (== 0.22.1) for Python, was used as the dimension reduction algorithm. The two large plots in Fig. 4 are the municipalities (Needs0 = Kakegawa and Needs1 = Kishiwada) and the small plots are the solutions. Each plot is an embedded vector in the two-dimensional space, and the color of the plot indicates the most suitable SDG as judged by the BERT model. Stakeholders can easily see, at a glance, the potential matchmaking candidates by referring to the semantic analysis and SDGs. Currently, in the (Cabinet Office Japan 2020) matchmaking event, a stakeholder presents specific needs and the other stakeholders propose solutions, and (Cabinet Office Japan 2020) manually organizes one-on-one sessions using an empirical trial-and-error approach. This approach depends strongly on the organizer’s coordination resources. Our model can provide a readable map for all stakeholders and help make the matchmaking process more transparent and reproducible. This function will be implemented in our online platform, which is under development, and we will validate its utility through practice with multiple stakeholders in the future.

Discussion

Improving the performance of the text classification and vectorization will contribute in fundamental ways to SDGs mapping and its application to nexus assessment and matchmaking tasks. There are many technical issues; for instance, the vocabulary and data size are small and the model needs to be much larger. However, four elements are essential to improving model development here.

What is the accuracy?

The “accuracy” of the prediction is itself difficult to define. The indicators in the SDG Compass (Global Reporting Initiative, UN Global Compact, and WBCSD 2015) have a single-label format at the target level. To evaluate generalization performance, the model was tested to see whether it could reproduce the single label defined by the SDG Compass. Table 4 shows the basic statistics of the indicator corpus and the prediction performance. First, there was a significant difference in text length between the training corpus (253.7 tokens/sentence) and the indicator descriptions in the SDG Compass (30.3 tokens/sentence). A short sentence has few tokens or token co-occurrences that characterize its meaning in the SDGs context. This may reduce the predictive reliability of the model, which was trained on long sentences. In fact, the mean predicted probability for the indicators was 0.1 (S.D. 0.06) overall, which is a very conservative prediction. Thus, recall, precision, and F1-score were quite low at the threshold of 0.5 used for binarization. On the other hand, ROC-AUC and PR-AUC evaluate prediction performance (ranging from 0: poor to 1: good) by varying the threshold dynamically. The ROC-AUC was fairly good at 0.697 (S.E. 0.023); however, the PR-AUC was poor at 0.17 (S.E. 0.043). This result suggests that the assessment of performance changes depending on which function we expect of the model. Whether we require the model to predict both true positives and true negatives, or to actively predict only true positives, leads to very different judgments of its performance.
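The gap between threshold-based scores and ROC-AUC can be illustrated with a small worked example. The labels and probabilities below are hypothetical, chosen only so that every probability stays well below 0.5, as in the conservative predictions described above. The helper `roc_auc` uses the pairwise-ranking form of the area under the ROC curve, which makes the result easy to check by hand.

```python
def recall_at_threshold(y_true, y_prob, threshold=0.5):
    """Fraction of true positives whose probability reaches the threshold."""
    positives = [p for t, p in zip(y_true, y_prob) if t == 1]
    return sum(p >= threshold for p in positives) / len(positives)

def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half; this pairwise form equals the area under the ROC curve.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and conservative probabilities (all below 0.5).
y_true = [1, 0, 0, 1, 0, 0]
y_prob = [0.20, 0.05, 0.08, 0.07, 0.03, 0.10]

recall = recall_at_threshold(y_true, y_prob)  # no positive reaches 0.5
auc = roc_auc(y_true, y_prob)                 # 6 of 8 positive-negative pairs ordered correctly
print(recall, auc)  # 0.0 0.75
```

In this toy case the recall at 0.5 is zero while the ranking quality is reasonable, which mirrors why a fixed threshold and a threshold-free metric can give very different impressions of the same model.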

Table 4 Corpus statistics and classification performance of Inventory of Business Indicators from SDG compass (Global Reporting Initiative, UN Global Compact, and WBCSD 2015)


Noise in human labeling is also a significant issue. In the Inventory of Business Indicators from the SDG Compass, “Average plant availability factor by energy source and by regulatory regime” is the indicator used to evaluate target 1.4: “By 2030, ensure that all men and women, in particular the poor and the vulnerable, have equal rights to economic resources, as well as access to basic services, ownership and control over land and other forms of property, inheritance, natural resources, appropriate new technology and financial services, including microfinance.” This paper’s model predicted the following probabilities for this indicator: GOAL 01 (no poverty): 0.001, GOAL 07 (affordable and clean energy): 0.796, GOAL 09 (industry, innovation and infrastructure): 0.312, GOAL 11 (sustainable cities and communities): 0.994, GOAL 13 (climate action): 0.974. Thus, while the human labeler might have imagined some kind of SDGs link between poverty and access to basic services, it appears that the model performed better. Moreover, the interpretation can change with the context surrounding the stakeholder, so right and wrong predictions are not entirely crucial. When it comes to the SDGs, the important thing is to design solutions with mature consideration of synergies and trade-offs, so the judgments of humans and AIs should complement each other.

Single-label vs. multi-label

As stated in the preamble of the 2030 Agenda (United Nations 2015), “SDGs are integrated and indivisible and balance the three dimensions of sustainable development: the economic, social, and environmental…”. Therefore, SDGs mapping should be a multi-label task. (Zhang et al. 2020a, b), using a single-label corpus (N = 606), tried to train both conventional feature-based and deep learning-based machine learning algorithms (Naïve Bayes, Support Vector Machine, Logistic regression, Convolutional Neural Network, Long Short-Term Memory, ELMo, BERT). None of the models achieved an F1-score above 0.1, irrespective of the type of algorithm. We also confirmed the reproducibility of this tendency by training our model using only a single-label corpus. The concept of “decarbonization” is obviously related to both SDG 07 (affordable and clean energy) and SDG 13 (climate action); however, in the single-label classification task, “decarbonization” can be linked only to SDG 07, to SDG 13, or to neither. This restriction severely affects the training of the self-attention layers in the BERT model. Given this, we are convinced that the text classification task in the SDGs field definitely requires a multi-label data frame for both model training and the SDGs nexus.
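The difference between the two label formats can be made concrete with a toy encoding. The sketch below assumes a common setup that the text does not spell out: a softmax output for single-label classification, where goals compete for probability mass, and an independent sigmoid per goal for multi-label classification. The logits are hypothetical values, not model outputs.

```python
import numpy as np

NUM_GOALS = 17

def multi_hot(goals, num_goals=NUM_GOALS):
    """Encode a set of 1-based SDG numbers as a 0/1 vector of length num_goals."""
    vec = np.zeros(num_goals, dtype=np.float32)
    for g in goals:
        vec[g - 1] = 1.0
    return vec

def softmax(logits):
    """Single-label view: probabilities compete and sum to 1."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sigmoid(logits):
    """Multi-label view: each goal is scored independently."""
    return 1.0 / (1.0 + np.exp(-logits))

# Multi-label target for a sentence about decarbonization: SDG 7 and SDG 13.
target = multi_hot({7, 13})

# Hypothetical logits that favour both goals equally and strongly.
logits = np.full(NUM_GOALS, -5.0)
logits[[6, 12]] = 5.0

single_label_probs = softmax(logits)
multi_label_probs = sigmoid(logits)
print(single_label_probs[6], multi_label_probs[6])
```

Under the softmax view, two equally relevant goals split the probability mass, so neither can exceed 0.5, whereas the sigmoid view can assign a high probability to both.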

Language dependency of accuracy

Our trained text classification model displayed high performance. (Guisiano and Chiky 2021) also conducted a multi-label classification task with augmented SDGs documents in English and achieved an accuracy of over 0.90, an excellent performance. However, each language poses its own difficulties in collecting documents and preprocessing the corpus, so there is little meaning in comparing accuracy among languages. (Zhang et al. 2020a, b) used ALBERT, a lite version of BERT (Lan et al. 2020), and developed a system to infer the nexus among 4005 SDGs activities in Japanese, achieving an accuracy of 0.7. They implied that sentences containing multiple languages make classification difficult. As shown by the example of “biodiversity” in the results of this research, Japanese uses many English words, so this study’s corpus included many sentences mixing Japanese and English. From a technical aspect, the Tohoku Japanese pretrained model was originally trained on the Japanese Wikipedia database, and this model divides English words into individual letters with the WordPiece algorithm, such as “SDGs” → {“S,” “D,” “G,” “s”}. The BERT model must then learn the relationships and the order of all of these letters, and the original meaning of the tokens may be lost in the self-attention processing. As (Amin et al. 2019) pointed out, better cross-lingual and cross-domain embedding alignment methods that can be transferred effectively will encourage research. These efforts are not competitive but collective, as described below.
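The character-level splitting described above follows from how WordPiece works: it greedily matches the longest vocabulary entry at each position, and falls back to shorter pieces when longer ones are missing. The toy tokenizer below illustrates this with two hypothetical vocabularies; neither is the real vocabulary of the Tohoku model, and the `##` prefix marks non-initial pieces as in the standard WordPiece convention.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens.

    Non-initial pieces carry the '##' continuation prefix.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches at this position
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical vocabularies for illustration only.
char_only_vocab = {"S", "##D", "##G", "##s"}
richer_vocab = char_only_vocab | {"SDG", "SDGs"}

split_chars = wordpiece("SDGs", char_only_vocab)
split_whole = wordpiece("SDGs", richer_vocab)
print(split_chars)  # ['S', '##D', '##G', '##s']
print(split_whole)  # ['SDGs']
```

With a vocabulary that contains only single letters for English, the word is broken into four tokens, whereas a vocabulary that includes the whole word keeps it as one token with its own embedding.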

Gigantic global model and indigenous local model

On this occasion, we attempted to build a text classification system localized in Japanese. However, the SDGs are a globally universal agenda, which must be sharable in any language. There are two alternatives. One is to develop a universal semantic processing model based on an ultra-large model such as GPT-3 (Generative Pre-trained Transformer 3) (Brown et al. 2020), fine-tuned on a gigantic corpus comprising SDGs knowledge from all over the world translated into a universal language. Global SDGs projects, such as the AI for Good project (International Telecommunication Union 2017), are expected to meet this challenge. The other alternative, as the history of Local Agenda 21 (United Nations 1992) and the promotion of Local 2030 (United Nations 2017) suggest, is that SDGs achievement may be essentially locally driven, based on an ensemble approach among stakeholders who think globally and act locally. In this approach, each regional and local community, including those whose languages are archived in the Atlas of the World’s Languages in Danger (Moseley 2010), develops its own SDGs semantic model in its original language, and these models utilize indigenous local knowledge and create an ensemble wisdom under global collaboration.

Conclusion

This study established an SDGs corpus in Japanese by extracting sentences related to SDGs with multi-label annotation. BERT, a state-of-the-art model for natural language processing, was trained with this SDGs corpus to build a text classifier that can identify the SDGs related to input sentences and also vectorize their semantics. Using the model, a nexus among SDGs was predicted from a representative indicator database, and the model’s potential applicability to matchmaking stakeholders for SDGs collaboration was shown. Finally, the model showed generally good performance, and points for further development were discussed, such as accuracy improvement and a globalization and localization strategy.

For future exploration, we will attempt to establish corpora in the six official languages of the United Nations and verify the interoperability of corpora for model learning across languages and the possibility of transferring the trained models to other languages. As a further trial, we will also attempt to design a generative model that can convert ordinary input sentences into sentences rewritten in the SDGs context. This will be supported in multiple languages, and the corpus and models will be implemented on Platform Clover for global collaboration.

References

Afenyo-Agbe EA, Adeola O (2020) Promoting gender equality and women’s empowerment through tourism in Africa: towards agenda 2030. In: Adeola O (ed) Empowering African women for sustainable development: toward achieving the united nations’ 2030 goals. Springer International Publishing, pp 121–132. https://doi.org/10.1007/978-3-030-59102-1_11
Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. [Cs, Stat]. http://arxiv.org/abs/1907.10902
Alarcón DM, Cole S (2019) No sustainability for tourism without gender equality. J Sustain Tour 27(7):903–919. https://doi.org/10.1080/09669582.2019.1588283
Alcamo J, Thompson J, Alexander A, Antoniades A, Delabre I, Dolley J, Marshall F, Menton M, Middleton J, Scharlemann JPW (2020) Analysing interactions among the sustainable development goals: findings and emerging issues from local and global studies. Sustain Sci 15(6):1561–1572. https://doi.org/10.1007/s11625-020-00875-x
Allen C, Metternicht G, Wiedmann T (2018) Initial progress in implementing the sustainable development goals (SDGs): a review of evidence from countries. Sustain Sci 13(5):1453–1467. https://doi.org/10.1007/s11625-018-0572-3
Amin S, Neumann G, Dunfield K, Vechkaeva A, Chapman KA, Wixted MK (2019) MLT-DFKI at CLEF eHealth 2019: multi-label classification of ICD-10 Codes with BERT, CLEF 2019 Working Notes. Conference and Labs of the Evaluation Forum (CLEF-2019), 10th Conference and Labs of the Evaluation Forum
Asahara M, Matsumoto Y (2003) ipadic version 2.7.0 User’s Manual. https://osdn.net/projects/naist-jdic/docs/ipadic-2.7.0-manual-en.pdf/en/1/ipadic-2.7.0-manual-en.pdf.pdf. Accessed 03 Aug 2021
Bali Swain R, Ranganathan S (2021) Modeling interlinkages between sustainable development goals using network analysis. World Dev 138:105136. https://doi.org/10.1016/j.worlddev.2020.105136
Bonina C, Koskinen K, Eaton B, Gawer A (2021) Digital platforms for development: foundations and research agenda. Inf Syst J. https://doi.org/10.1111/isj.12326
Bowen KJ, Cradock-Henry NA, Koch F, Patterson J, Häyhä T, Vogt J, Barbi F (2017) Implementing the “sustainable development goals”: towards addressing three key governance challenges—collective action, trade-offs, and accountability. Curr Opin Environ Sustain 26–27:90–96. https://doi.org/10.1016/j.cosust.2017.05.002
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Seventh International World-Wide Web Conference (WWW 1998), Brisbane, Australia. http://ilpubs.stanford.edu:8090/361/. Accessed 30 Jan 2021
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Amodei D (2020) Language models are few-shot learners. [Cs]. http://arxiv.org/abs/2005.14165
Cabinet Office Japan (2019) Regional Revitalization SDGs local indicator list 2019.04 Draft version. https://www.chisou.go.jp/tiiki/kankyo/kaigi/h30lwg1/shiryo1.pdf. Accessed 03 Aug 2021
Cabinet Office Japan (2020) SDGs for Regional Revitalization Public-Private Partnership Platform. https://future-city.go.jp/sdgs/. Accessed 03 Aug 2021
Chiabai A, Quiroga S, Martinez-Juarez P, Higgins S, Taylor T (2018) The nexus between climate change, ecosystem services and human health: towards a conceptual framework. Sci Total Environ 635:1191–1204. https://doi.org/10.1016/j.scitotenv.2018.03.323
Chon M, Roffe P, Abdel-Latif A (eds) (2018) The cambridge handbook of public-private partnerships, intellectual property governance, and sustainable development, 1st edn. Cambridge University Press. https://doi.org/10.1017/9781316809587
Del Río Castro G, González Fernández MC, Uruburu Colsa Á (2021) Unleashing the convergence amid digitalization and sustainability towards pursuing the sustainable development goals (SDGs): a holistic review. J Clean Prod 280:122204. https://doi.org/10.1016/j.jclepro.2020.122204
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. [Cs]. http://arxiv.org/abs/1810.04805
Dhakal SP (2018) Cooperative enterprises and sustainable development in post-crisis Nepal: a social responsibility perspective on women’s employment and empowerment. Entrepreneurship and the sustainable development goals, vol 8. Emerald Publishing Limited, pp 185–200
Dörgő G, Sebestyén V, Abonyi J (2018) Evaluating the interconnectedness of the sustainable development goals based on the causality analysis of sustainability indicators. Sustainability 10(10):3766. https://doi.org/10.3390/su10103766
ElAlfy A, Darwish KM, Weber O (2020) Corporations and sustainable development goals communication on social media: corporate social responsibility or just another buzzword? Sustain Dev 28(5):1418–1430. https://doi.org/10.1002/sd.2095
Fonseca LM, Domingues JP, Dima AM (2020) Mapping the sustainable development goals relationships. Sustainability 12(8):3359. https://doi.org/10.3390/su12083359
Global Reporting Initiative, UN Global Compact, and WBCSD (2015) SDGs Compass. https://sdgcompass.org/. Accessed 03 Aug 2021
González-Carvajal S, Garrido-Merchán EC (2021) Comparing BERT against traditional machine learning text classification. [Cs, Stat]. http://arxiv.org/abs/2005.13012
Google (2021). Google translate. https://translate.google.co.jp/. Accessed 03 Aug 2021
Guisiano J, Chiky R (2021) Automatic classification of multilabel texts related to sustainable development goals (SDGs). TECHENV EGC2021. https://hal.archives-ouvertes.fr/hal-03154261
Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using networkx (LA-UR-08-05495; LA-UR-08-5495). Los Alamos National Lab. (LANL), Los Alamos, NM (United States). https://www.osti.gov/biblio/960616
Hugging Face (2016) The AI community building the future. https://huggingface.co/. Accessed 03 Aug 2021
Institute of Global Environment Strategy (2019) Online Voluntary Local Review (VLR) Lab. https://www.iges.or.jp/en/projects/vlr. Accessed 03 Aug 2021
International Labor Organization (2020) Decent work. https://www.ilo.org/global/topics/decent-work/lang--en/index.htm. Accessed 03 Aug 2021
International Telecommunication Union (2017) AI for Good. https://aiforgood.itu.int/. Accessed 03 Aug 2021
International Telecommunication Union (2020) United Nations Activities on Artificial Intelligence (AI) 2020. https://www.itu.int:443/en/publications/gs/Pages/publications.aspx. Accessed 03 Aug 2021
Inui Laboratory (2019) Cl-tohoku/bert-japanese. https://github.com/cl-tohoku/bert-japanese. Accessed 03 Aug 2021
Jaccard P (1912) The distribution of the flora in the alpine zone.1. New Phytol 11(2):37–50. https://doi.org/10.1111/j.1469-8137.1912.tb05611.x
Jackson B, Decker Sparks JL (2020) Ending slavery by decarbonisation? Exploring the nexus of modern slavery, deforestation, and climate change action via REDD+. Energy Res Soc Sci 69:101610. https://doi.org/10.1016/j.erss.2020.101610
Japan Innovation Network and UNDP (2021) SHIP: SDGs Holistic Innovation Platform. https://www.sdgs-ship.com/en/. Accessed 03 Aug 2021
Kawakubo S (2018) Local SDGs Platform. https://local-sdgs.jp/?lang=en_us. Accessed 03 Aug 2021
Kawakubo S, Murakami S (2020) Development of the local SDGs platform for information sharing to contribute to achieving the SDGs. IOP Conf Ser Earth Environ Sci 588:022019. https://doi.org/10.1088/1755-1315/588/2/022019
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. [Cs]. http://arxiv.org/abs/1412.6980
Koyamada K (2019) Overview visualization of the proposals: open data of science council of Japan. Trends Sci 24(4):4_73-4_77. https://doi.org/10.5363/tits.24.4_73
Kroll C, Warchold A, Pradhan P (2019) Sustainable development goals (SDGs): are we successful in turning trade-offs into synergies? Palgrave Commun 5(1):1–11. https://doi.org/10.1057/s41599-019-0335-5
Kumazawa T, Saito O, Kozaki K, Matsui T, Mizoguchi R (2009) Toward knowledge structuring of sustainability science based on ontology engineering. Sustain Sci 4(1):99. https://doi.org/10.1007/s11625-008-0063-z
Kumazawa T, Kozaki K, Matsui T, Saito O, Ohta M, Hara K, Uwasu M, Kimura M, Mizoguchi R (2014) Initial design process of the sustainability science ontology for knowledge-sharing to support co-deliberation. Sustain Sci 9(2):173–192. https://doi.org/10.1007/s11625-013-0202-z
Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2020) ALBERT: a lite BERT for self-supervised learning of language representations. [Cs]. http://arxiv.org/abs/1909.11942
Liu Q (2016) Interlinking climate change with water-energy-food nexus and related ecosystem processes in California case studies. Ecol Process 5(1):14. https://doi.org/10.1186/s13717-016-0058-0
Liu J, Hull V, Godfray HCJ, Tilman D, Gleick P, Hoff H, Pahl-Wostl C, Xu Z, Chung MG, Sun J, Li S (2018) Nexus approaches to global sustainable development. Nat Sustain 1(9):466–476. https://doi.org/10.1038/s41893-018-0135-8
Masuda H, Okitasari M, Morita K, Katramiz T, Shimizu H, Kawakubo S, Kataoka Y (2021) SDGs mainstreaming at the local level: case studies from Japan. Sustain Sci 16(5):1539–1562. https://doi.org/10.1007/s11625-021-00977-0
Masuhara N, Iwami A, Matsui T (2019) Local initiatives and issues towards achieving sustainable development goals. Pap Environ Inf Sci. https://doi.org/10.11492/ceispapers.ceis33.0_43 (ceis33)
Matsui T, Kawawake A, Iwami A, Masuhara N, Machimura T (2019) Structure analysis of SDGs network based on nexus approach. J Jpn Soc Civil Eng Ser G 75(6):II_39-II_47. https://doi.org/10.2208/jscejer.75.6_II_39
MeCab (2006) Yet another part-of-speech and morphological analyzer. https://taku910.github.io/mecab/. Accessed 03 Aug 2021
Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
Ministry of Foreign Affairs Japan (2019) JAPAN SDGs Action platform. https://www.mofa.go.jp/policy/oda/sdgs/index.html. Accessed 03 Aug 2021
Moseley C (2010) Atlas of the World’s Languages in Danger, 3rd edn. Paris, UNESCO Publishing. Online version: http://www.unesco.org/culture/en/endangeredlanguages/atlas
Mustafa A (2019) An analysis of the nexus between female labour force participation and women’s empowerment in Bangladesh [Thesis, Brac University]. http://dspace.bracu.ac.bd/xmlui/handle/10361/12085
Nilsson M, Chisholm E, Griggs D, Howden-Chapman P, McCollum D, Messerli P, Neumann B, Stevance A-S, Visbeck M, Stafford-Smith M (2018) Mapping interactions between the sustainable development goals: lessons learned and ways forward. Sustain Sci 13(6):1489–1503. https://doi.org/10.1007/s11625-018-0604-z
NLTK (2021) Natural Language Toolkit—documentation. https://www.nltk.org/. Accessed 03 Aug 2021
Nugroho A, Widyawan, Kusumawardani SS (2020) Distributed classifier for SDGs topics in online news using RabbitMQ message broker. J Phys Conf Ser 1577:012026. https://doi.org/10.1088/1742-6596/1577/1/012026
Oosterhof PD (2018) Localizing the sustainable development goals to accelerate implementation of the 2030 agenda for sustainable development (Issue 33). Asian Development Bank. https://www.adb.org/publications/sdgs-implementation-2030-agenda-sustainable-development
Osaka Convention and Tourism Bureau (2018) Kishiwada Danjiri Festival. OSAKA-INFO. https://osaka-info.jp/en/page/kishiwada-danjiri-festival. Accessed 03 Aug 2021
Osaka University Center for Gender Equality Promotion (2021) A report, Osaka University won a gender equality prize. https://www.osaka-u.ac.jp/ja/news/topics/2020/11/1201. Accessed 03 Aug 2021
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12(85):2825–2830
Pincet A, Okabe S, Pawelczyk M (2019) Linking aid to the sustainable development goals—a machine learning approach. OECD Development Co-operation Working Papers 52. OECD Publishing. https://doi.org/10.1787/4bdaeb8c-en
Pörtner HO, Scholes RJ, Agard J, Archer E, Arneth A, Bai X, Barnes D, Burrows M, Chan L, Cheung WL, Diamond S, Donatti C, Duarte C, Eisenhauer N, Foden W, Gasalla MA, Handa C, Hickler T, Hoegh-Guldberg O, Ichii K et al (2021) IPBES-IPCC co-sponsored workshop report synopsis on biodiversity and climate change (Version 1). Zenodo. https://doi.org/10.5281/ZENODO.4782538
Pukelis L, Puig NB, Skrynik M, Stanciauskas V (2020) OSDG–open-source approach to classify text data by UN sustainable development goals (SDGs). [Cs]. http://arxiv.org/abs/2005.14569
Requejo-Castro D, Giné-Garriga R, Pérez-Foguet A (2020) Data-driven Bayesian network modelling to explore the relationships between SDG 6 and the 2030 Agenda. Sci Total Environ 710:136014. https://doi.org/10.1016/j.scitotenv.2019.136014
Richards-Kennedy S, St Brice L (2018) Knowledge brokerage, SDGs and the role of universities. Soc Econ Stud 67(4):7–35
Saric J, Blaettler D, Bonfoh B, Hostettler S, Jimenez E, Kiteme B, Koné I, Lys J-A, Masanja H, Steinger E, Upreti BR, Utzinger J, Winkler MS, Breu T (2019) Leveraging research partnerships to achieve the 2030 agenda: experiences from North-South cooperation. GAIA Ecol Perspect Sci Soc 28(2):143–150. https://doi.org/10.14512/gaia.28.2.13
Sarkodie SA, Owusu PA (2017) The relationship between carbon dioxide, crop and food production index in Ghana: by estimating the long-run elasticities and variance decomposition. Environ Eng Res 22(2):193–202. https://doi.org/10.4491/eer.2016.135
Scharlemann JPW, Brock RC, Balfour N, Brown C, Burgess ND, Guth MK, Ingram DJ, Lane R, Martin JGC, Wicander S, Kapos V (2020) Towards understanding interactions between sustainable development goals: the role of environment–human linkages. Sustain Sci 15(6):1573–1584. https://doi.org/10.1007/s11625-020-00799-6
Sciandra A, Surian A, Finos L (2020) Supervised machine learning methods to disclose action and information in “U.N. 2030 agenda” social media data. Soc Ind Res. https://doi.org/10.1007/s11205-020-02523-4
Sebestyén V, Bulla M, Rédey Á, Abonyi J (2019) Network model-based analysis of the goals, targets and indicators of sustainable development for strategic environmental assessment. J Environ Manag 238:126–135. https://doi.org/10.1016/j.jenvman.2019.02.096
Sebestyén V, Domokos E, Abonyi J (2020) Focal points for sustainable development strategies—text mining-based comparative analysis of voluntary national reviews. J Environ Manag 263:110414. https://doi.org/10.1016/j.jenvman.2020.110414
Sennrich R, Haddow B, Birch A (2015) Neural machine translation of rare words with subword units. https://arxiv.org/abs/1508.07909v5
Stott L, Murphy DF (2020) An inclusive approach to partnerships for the SDGs: using a relationship lens to explore the potential for transformational collaboration. Sustainability 12(19):7905. https://doi.org/10.3390/su12197905
Sustainable Development Solutions Network (2021) The sustainable development report. https://www.sdgindex.org/. Accessed 03 Aug 2021
Sustainable Transition (2021) Sustainable transition, platform clover. https://platform-clover.net/. Accessed 03 Aug 2021
Suzuki M (2021) Pretrained Japanese BERT Models. https://github.com/cl-tohoku/bert-japanese. Accessed 03 Aug 2021
Tosun J, Leininger J (2017) Governing the Interlinkages between the sustainable development goals: approaches to attain policy integration. Global Chall 1(9):1700036. https://doi.org/10.1002/gch2.201700036
UNDESA (2021) Sustainable development knowledge platform. https://sustainabledevelopment.un.org/index.html/. Accessed 03 Aug 2021
UNDESA and UNOICT (2020) 2030 connect. https://tfm2030connect.un.org/. Accessed 03 Aug 2021
United Nations (1992) Agenda 21 https://sustainabledevelopment.un.org/outcomedocuments/agenda21. Accessed 03 Aug 2021
United Nations (2012) Higher education sustainability initiative. https://sustainabledevelopment.un.org/sdinaction/hesi. Accessed 03 Aug 2021
United Nations (2015) Transforming our world: the 2030 agenda for sustainable development. https://sdgs.un.org/2030agenda. Accessed 03 Aug 2021
United Nations (2017) Local 2030. https://www.local2030.org/. Accessed 03 Aug 2021
United Nations (2020) Decade of action. https://www.un.org/sustainabledevelopment/decade-of-action/. Accessed 03 Aug 2021
United Nations Convention on Biological Diversity (2021) Preparations for the Post-2020 Biodiversity Framework
United Nations Framework Convention on Climate Change (2015) The Paris Agreement
UNSTATS (2017) SDG indicators. https://unstats.un.org/sdgs/indicators/indicators-list/. Accessed 03 Aug 2021
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(93):3221–3245
van Zanten JA, van Tulder R (2021) Towards nexus-based governance: defining interactions between economic activities and sustainable development goals (SDGs). Int J Sustain Dev World 28(3):210–226. https://doi.org/10.1080/13504509.2020.1768452
van Soest HL, van Vuuren DP, Hilaire J, Minx JC, Harmsen MJHM, Krey V, Popp A, Riahi K, Luderer G (2019) Analysing interactions among sustainable development goals with integrated assessment models. Global Transitions 1:210–225. https://doi.org/10.1016/j.glt.2019.10.004
Varma S, Simon R (2006) Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 7(1):91. https://doi.org/10.1186/1471-2105-7-91
Varshney KR, Mojsilovic A (2019) Open platforms for artificial intelligence for social good: common patterns as a pathway to true impact. [Cs]. http://arxiv.org/abs/1905.11519
Vazquez-Brust D, Piao RS, de Melo MFS, Yaryd RT, Carvalho M (2020) The governance of collaboration for sustainable development: exploring the “black box.” J Clean Prod 256:120260. https://doi.org/10.1016/j.jclepro.2020.120260
Vinuesa R, Azizpour H, Leite I, Balaam M, Dignum V, Domisch S, Felländer A, Langhans SD, Tegmark M, Fuso Nerini F (2020) The role of artificial intelligence in achieving the sustainable development goals. Nat Commun 11(1):233. https://doi.org/10.1038/s41467-019-14108-y
Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2019) GLUE: a multi-task benchmark and analysis platform for natural language understanding. [Cs]. http://arxiv.org/abs/1804.07461
WBCSD (2021) SDG essentials for business. https://sdgessentials.org/index.html. Accessed 03 Aug 2021
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Davison J, Shleifer S, von Platen P, Ma C, Jernite Y, Plu J, Xu C, Scao TL, Gugger S, Rush AM (2020) Hugging face’s transformers: state-of-the-art natural language processing. [Cs]. http://arxiv.org/abs/1910.03771
Zhang X, Motoki Y, Soneoka Y, Iwasawa Y, Matsuo Y (2020a) Creation of a Japanese SDGs dataset and a baseline model of classification. Proceedings of the Annual Conference of JSAI, JSAI2020, 1D3GS1305–1D3GS1305. https://doi.org/10.11517/pjsai.JSAI2020.0_1D3GS1305
Zhang X, Shiramatsu S, JIN Y, Kamiya A (2020b) Examination of correspondence estimation method between activity goal data and SDGs goals in Mission Forest, the 7th conference on Special Interest Group on Crowd Co-creation Intelligence. https://sigcci.github.io/sigcci/conf7/pdf/SIG-CCI-007-01.pdf. Accessed 30 Jan 2021
Zhou X, Moinuddin M, Li Y (2019) SDG interlinkages analysis & visualisation tool (V3.0). https://sdginterlinkages.iges.jp/. Accessed 03 Aug 2021


Acknowledgements

This work was supported by the Environment Research and Technology Development Fund (1-2104, JPMEERF20211004).

Author information

Authors and Affiliations

Division of Sustainable Energy and Environmental Engineering, Graduate School of Engineering, Osaka University, Yamadaoka 2-1, Suita, Osaka, 565-0871, Japan
Takanori Matsui, Kanoko Suzuki, Kyota Ando & Chihiro Haga
School of Human Science and Environment, University of Hyogo, Shinzaike-honcho 1-1-12, Himeji, Hyogo, 670-0092, Japan
Naoki Masuhara
Department of Architecture, Faculty of Engineering and Design, Hosei University, 2-33 Ichigayatamachi, Shinjuku, Tokyo, 162-0843, Japan
Yuya Kitai & Shun Kawakubo


Contributions

TM, SK, and NM devised the project. TM established the main conceptual ideas and proof outline. TM, KS, KA, and CH worked out almost all of the technical details and performed the corpus data collection and the coding for the machine learning implementation and the applications. YK and SK developed the indicator database. TM and KS proposed the experimental discussions with SK and NM. TM wrote the manuscript, and all members reviewed it and contributed to its finalization.

Corresponding author

Correspondence to Takanori Matsui.

Additional information

Handled by Osamu Saito, Institute for Global Environmental Strategies, Japan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLSX 17 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Matsui, T., Suzuki, K., Ando, K. et al. A natural language processing model for supporting sustainable development goals: translating semantics, visualizing nexus, and connecting stakeholders. Sustain Sci 17, 969–985 (2022). https://doi.org/10.1007/s11625-022-01093-3


Received: 16 August 2021
Accepted: 21 December 2021
Published: 04 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11625-022-01093-3
