1 Introduction

Managers, researchers, and policymakers have long sought to better understand the friction within, productivity of, and durability of interacting groups (e.g., [1, 9]). For several decades, natural language processing (NLP) researchers have attempted to quantify the relationships between rhetorical intents in conversations and their outcomes (collectively, “conversation features”) [29, 22]. While “conversational AI” (chat bots) is well established, rapidly interpreting human-to-human conversation in a manner that improves innovation, motivation, and belonging (“AI for conversation”) is in its infancy [4, 24].

Yet there is a growing need: polarization and manipulation pervade social media, public meetings, and policy tables. They have negative effects on the environment and human health and spread anti-democratic behaviors [19, 26]. Sitting at the confluence of humans with their environment, sustainability discourse is especially vulnerable to misinformation and conflict [2, 10]. At risk in all these conversations are participants’ accountability, reciprocity, and innovativeness.

The discussion disciplines are rhetorical intents that characterize the dialog acts that make up speech [27]. Our research hypothesis was that the discussion disciplines’ shares would correlate with specific outcomes. The six discussion disciplines that our research explored were Integrity, Integrity-Q, Courtesy, Inclusion, Translation, and Snarky [21]. The discussion disciplines derive from MIT's four dialog practices—voice, respect, listening, and suspension—which have been shown to improve collaboration [12, 13]. To accommodate everyday speech, we augmented the dialog practices with classifications of Translation-related, Inclusion-related, and “Snarky” rhetorical intents [22]. We also studied three conversation outcomes: Intent-to-Act, Relationship-Building, and Options-Generation, as shown in Fig. 1.

Fig. 1

Logic model for sustainability conversations modeling. Note: Raw transcripts for conversations contain dialog acts (utterances or moves), which can be coded as discussion disciplines. The shares of the discussion disciplines, by transcript, are mapped to specific conversation outcomes. We sought to model the classification of speech into discussion disciplines and, using a large corpus, to assess the relationship of their shares in a conversation to that conversation’s outcomes

We started with the town hall-like meetings required in the Maine aquaculture lease scoping process [23] and used them to train a neural net to describe sustainability-related conversation. Using a large open-source corpus, we then applied the trained model to classify the discussion disciplines and correlated the discussion discipline shares with conversations’ affective and motivational outcomes. Our novel neural network model layered Bidirectional Encoder Representations from Transformers (BERT) [6] with a Residual Network (ResNet) [11]. We show that the BERT-ResNet model outperforms BERT alone, as well as the Term Frequency-Inverse Document Frequency (TF*IDF) approach [28] and the approach of [29].

In the following sections, we describe our method and its performance. The “Conversation Data Preparation” section describes the aquaculture town hall-like meetings that we hand-coded and the large corpus of open-source data. The “Modeling and Simulation” section explains the TF*IDF and BERT models, as well as our additional ResNet layer. In the “Model Application” section, we apply the BERT-ResNet model to a large corpus of utterances and use a binary logistic regression to evaluate the impacts of the discussion disciplines’ percentages on conversation outcomes. The “Discussion” section suggests new areas for neural network research to inform and improve conversations at work and in society. Finally, we present our conclusions in the last section of this paper.

2 Conversation data preparation

We began our research by attending aquaculture lease “scoping session” meetings (LSMs), which are town hall-like gatherings of Maine aquaculture stakeholders, such as riparian landowners, harbor masters, boaters, and aquaculture farmers. LSMs are part of Maine’s aquaculture lease-approval process governed by the Maine Department of Marine Resources [16]. In each LSM, participants debate the costs and benefits of a new lease, a lease expansion, or a lease renewal, e.g., for a scallop, oyster, or kelp farm. Participants discuss boat traffic, biodiversity, noise pollution, esthetics, marine navigation, livelihoods, and food security, to name a few topics. To collect transcript data, we attended seven LSMs over Zoom in Fall 2020 and Spring 2021. We manually recorded each utterance, along with the speaker and their gender and role in the conversation. Conversation utterances were then hand-parsed into distinct “moves” (dialog acts of one or more sentences with observable, individual rhetorical intents). Single moves were then hand-coded for the six discussion disciplines, for a total of 728 moves. These transcript statistics are shown in row No. 1 of Table 1. Interviews with aquaculture farmers and other researchers helped validate the coding of the transcripts.

Table 1 Transcript and utterance counts for open-source and hand-coded aquaculture LSM (and similar) transcripts

After coding the transcripts for discussion disciplines, we manually examined the relationships between the discussion discipline percentages and the conversation outcomes: Intent-to-Act, Relationship-Building, and Options-Generation. In this manual analysis, we found correlations between Inclusion and Intent-to-Act, between Integrity-Q and Options-Generation, between Translation and Options-Generation, and between Courtesy and Relationship-Building, as presented in Fig. 2. For each quadrant, the discussion discipline percentage in the transcript is sorted left to right, least to greatest. The outcomes, listed on the horizontal axis of each graph, are Relationship-Building (RB), Options-Generation (OG), and Intent-to-Act (ITA). Snarky tended to reduce all outcomes. Integrity moves, at over 50% of the samples, tended to be spoken by the aquaculture farmer (who convened the LSM) and were associated more with information exchange than with collective outcomes, so Integrity is not shown in Fig. 2.

Fig. 2

Mapping between hand-coded discussion disciplines and outcomes: Relationship-Building (RB), Options-Generation (OG), and Intent-to-Act (ITA)

To prepare more training data for the neural network model to identify the discussion disciplines, we added further hand-coded data. We raised the total to 1,138 observations by adding to the 728 aquaculture LSM moves another 410 utterances from US National Archives transcripts, student online discussions, and professional community online discussions, as shown in row No. 2 of Table 1.

To measure patterns between the discussion disciplines and the outcomes of transcripts in the open-source data, we regressed outcomes on discussion discipline shares at the transcript level. For statistical significance with our six-independent-variable model, we needed to classify tens of thousands of utterances for discussion disciplines, calculate their percentages within transcripts, and then relate those percentages to roughly 400 or more transcript-level outcomes, a sample size sufficient for statistical significance. We obtained these utterances from 591 open-data transcripts from Cornell University’s ConvoKit [3], as shown in rows No. 3 to 9 of Table 1. These open-source transcripts, which contributed 21,051 utterances, were used first to train the TF*IDF model and then to test the hypothesis that discussion discipline percentages affect conversation outcomes. We detected the transcript-level outcomes in the open data using information retrieval from a lookup table, which we created manually from the outcomes of the LSMs. This lookup table contained 300 phrase examples correlated with two of the outcomes: Intent-to-Act (ITA) and Options-Generation (OG) (Table 2). We used lemmatization to expand the phrase examples prior to building the lookup table.

Table 2 Outcomes lookup table illustration
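For illustration, the following minimal sketch shows how a lemmatized lookup can flag outcomes in a transcript. The two phrases and the detect_outcomes helper are hypothetical stand-ins for our 300-entry table, and NLTK's WordNet lemmatizer is assumed (spaCy would serve equally well).

```python
from nltk.stem import WordNetLemmatizer  # requires nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

def lemmatize(text):
    # Lowercase and lemmatize token by token (punctuation handling omitted for brevity)
    return " ".join(lemmatizer.lemmatize(tok) for tok in text.lower().split())

# Hypothetical stand-ins for the 300-phrase outcomes lookup table
OUTCOME_PHRASES = {
    "Intent-to-Act": [lemmatize("we will move the lease")],
    "Options-Generation": [lemmatize("another option would be")],
}

def detect_outcomes(transcript_text):
    text = lemmatize(transcript_text)
    return {outcome for outcome, phrases in OUTCOME_PHRASES.items()
            if any(phrase in text for phrase in phrases)}
```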

In each transcript, the third outcome, Relationship-Building, was the percentage change in “net positivity,” defined as Courtesy counts plus Inclusion counts, minus Snarky counts. Relationship-Building was thus calculated as the net positivity of the second half of the transcript, less the net positivity of the first half, divided by the net positivity of the first half.
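As a minimal sketch, the calculation can be expressed as follows, where moves is the list of discussion-discipline labels for one transcript in speaking order, and the first half is assumed to have nonzero net positivity:

```python
def relationship_building(moves):
    """Percentage change in net positivity from the first half of a transcript to the second."""
    half = len(moves) // 2

    def net_positivity(segment):
        return (segment.count("Courtesy") + segment.count("Inclusion")
                - segment.count("Snarky"))

    first = net_positivity(moves[:half])
    second = net_positivity(moves[half:])
    return (second - first) / first  # assumes first-half net positivity != 0
```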

Figure 3 is a visualization of the data and modeling pipeline, as explained below.

Fig. 3

Pipeline for the Sustainability Conversation for Impact project. Note: Figure includes the data preparation, count of utterances or “moves,” three modeling approaches (TF*IDF, BERT, and BERT + ResNet), model evaluation against manually coded data, and statistical analysis (outcomes regressed on DD percentages, by transcript). “DD” = Discussion Discipline; “OG” = Options-Generation; “ITA” = Intent-to-Act; “RB” = Relationship-Building

3 Modeling and simulation

Based on our previous research into the discussion disciplines [22], we theorized that the moves in any conversation could be classified into the six discussion disciplines. Our initial goal was to generate a model that would categorize utterances into clusters aligned with the discussion disciplines. To do this, we used three different models, as described below.

3.1 TF*IDF-based classification model

The Term Frequency-Inverse Document Frequency (TF*IDF) model [8] was chosen for its computational transparency; it derives from the document-ranking methods used by search engines. TF*IDF begins with word embedding, a transformation of text into a numerical vector of m dimensions, where m is the number of unique elements, or “tokens” (words, phrases, word-groups, phrase-groups). Therefore, for n utterances, we obtain an n × m numerical matrix, where n is the number of rows (utterances) and m is the number of columns (tokens). Table 3 shows a simple word embedding of two utterances adapted from the aquaculture transcripts: “I’m hearing no consensus, so I recommend we bring out further options.” and “It is obvious that we need further information to reach consensus.” In the example, after stripping out punctuation, each word is associated with a unique category (“index”), and each cell takes the value of the frequency with which the word appears in the sentence [18]. This simple embedding does not capture context, and it becomes computationally inefficient for large vocabularies. To address this, we used the TF*IDF model, which relies on a fixed number of tokens (m) derived from the English dictionary.

Table 3 Illustration of word embedding for two sentences
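A minimal sketch of this count-based embedding, using scikit-learn's CountVectorizer on the two example utterances, is below; the token indices it assigns will differ from Table 3.

```python
from sklearn.feature_extraction.text import CountVectorizer

utterances = [
    "I'm hearing no consensus, so I recommend we bring out further options.",
    "It is obvious that we need further information to reach consensus.",
]
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(utterances)   # n = 2 rows (utterances), m columns (unique tokens)
print(vectorizer.get_feature_names_out())       # the token "index"
print(matrix.toarray())                         # per-utterance token counts
```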

TF*IDF in our context uses “Terms,” which are tokens, and “Documents,” which are utterances, as described above. TF*IDF starts with vectors of token counts for each utterance. For a token $t$ in utterance $d$, the weight $W_{t,d}$ is given by:

$$W_{t,d} = \text{TF}_{t,d} \, \log \left( n / \text{DF}_{t} \right)$$

where $\text{TF}_{t,d}$ is the number of occurrences of $t$ in utterance $d$, $\text{DF}_{t}$ is the number of utterances containing token $t$, $n$ is the total number of utterances in the corpus, and $m$ is the total number of tokens.

A larger $\text{DF}_{t}$ in the denominator shrinks the logarithm, down-weighting common tokens. Not surprisingly, given TF*IDF’s origins in search ranking, one wants a high occurrence (salience) of the term within an utterance (the numerator) and a low occurrence (rarity) across the corpus (the denominator) [8]. This inverse relationship means that TF*IDF gives the greatest weight to terms that dominate an utterance but are rare in the corpus as a whole.
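The weighting formula can be applied directly to a matrix of token counts; the toy values below are illustrative only.

```python
import numpy as np

counts = np.array([[2, 0, 1],       # TF_{t,d}: rows = utterances d, columns = tokens t
                   [0, 1, 1],
                   [1, 1, 0]])
n = counts.shape[0]                  # total number of utterances
df = (counts > 0).sum(axis=0)        # DF_t: utterances containing each token
weights = counts * np.log(n / df)    # W_{t,d} = TF_{t,d} * log(n / DF_t)
```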

Our TF*IDF-based model used Cornell ConvoKit’s “PromptTypesWrapper” [3] to find token similarities across utterances in the corpus. We computed in-utterance term frequency (TF) weighted by inverse document frequency (IDF) computed over the corpus. In our case, the tokens were “phrasing motifs”: we started with pairs of dependency-related words (“bi-grams”), and frequently observed pairs of bi-grams were treated as phrasing motifs [14].

We generated an n × m matrix, where m (columns) was the number of unique phrasing motifs and n (rows) was the number of unique utterances. Using phrasing motifs as the unit of counting adds back a degree of rhetorical context, because the dependency-related bi-gram tokens combine into motifs. Using frequently occurring phrasing motifs as the columns of the TF*IDF matrix also decreased the vocabulary and, thus, the sparsity of the matrix.

We fed the 21,051 utterances from the open-source transcripts into TF*IDF (see Table 1). Transcripts were selected based on their corpus similarity to the LSM conversations [17]. We then reduced the matrix using Singular Value Decomposition (SVD) and applied K-means clustering to arrive at six clusters intended to correspond to the six discussion disciplines.
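A minimal sketch of this reduce-then-cluster step is below; ConvoKit's phrasing-motif tokenization is replaced by a plain TfidfVectorizer, and the number of SVD components is an illustrative choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

def cluster_utterances(texts, n_components=100, n_clusters=6):
    """Vectorize utterances, reduce with SVD, and group into six candidate clusters."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    n_components = min(n_components, tfidf.shape[1] - 1)   # SVD rank must be < vocabulary size
    reduced = TruncatedSVD(n_components=n_components).fit_transform(tfidf)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(reduced)
```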

To turn the unsupervised K-means clusters into a supervised classifier, we fed the TF*IDF model the 728 hand-coded LSM utterances. Each cluster was assigned a discussion discipline greedily: we found the cluster in which a single discussion discipline accounted for the largest share of the hand-coded utterances, assigned that discipline to that cluster, and set both aside. We then identified the next-largest share of a discipline in a remaining cluster and set that pair aside, continuing until every cluster was labeled with a discussion discipline. Where there were conflicts, the more frequently occurring discussion disciplines were favored. For the LSM transcripts, the average percentages were Integrity (51%), Integrity-Q (15%), Courtesy (12%), Inclusion (11%), Translation (6%), and Snarky (5%). Thus, if both Integrity and Courtesy had the majority of their utterances in cluster 1, Integrity would be assigned to that cluster. This maintained a conservative approach to the scorecard, as described below. The hand-coded LSM transcripts were then used to test the model by running the LSM moves through it and validating the coding match; this entailed starting with the reduced matrix and re-running the K-means clustering.
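A sketch of this greedy assignment is below; pairs holds (cluster id, hand-coded discipline) tuples for the LSM moves, and disciplines is ordered from most to least frequent so that conflicts favor the more frequent discipline. The exact tie-breaking we used may differ in detail.

```python
from collections import Counter

def assign_clusters(pairs, disciplines):
    """Greedily map each K-means cluster to one discussion discipline."""
    counts = Counter(pairs)                      # (cluster, discipline) -> count
    totals = Counter(cluster for cluster, _ in pairs)
    candidates = []
    for cluster in totals:
        for rank, discipline in enumerate(disciplines):
            share = counts[(cluster, discipline)] / totals[cluster]
            candidates.append((share, -rank, cluster, discipline))
    assignment, used = {}, set()
    for share, _, cluster, discipline in sorted(candidates, reverse=True):
        if cluster not in assignment and discipline not in used:
            assignment[cluster] = discipline
            used.add(discipline)
    return assignment
```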

Based on this low accuracy (42%), we added a lookup (information retrieval) step. Using the 300-phrase dictionary of discussion disciplines, we used simple information retrieval (lookup) to locate and append discussion discipline labels before running the TF*IDF process. (A phrase match and append occurred for approximately 20% of the utterances.) We then parsed the utterances into phrasing motifs as usual, generated the TF*IDF matrix, and performed the SVD and clustering.

Next, to accommodate the asymmetrical distribution of the discussion disciplines, we evaluated a Poisson normalization within the SVD process. This involved taking all nonzero column values in the TF*IDF matrix, dividing them by the square root of (cell value + 1), and then subtracting the mean of the related column. SVD was then performed, and the Poisson/mean step was repeated in reverse. This was followed by the K-means clustering as usual.
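A sketch of this normalization is below, with tfidf as a dense TF*IDF matrix (rows = utterances); whether the column mean is taken over all cells or only nonzero cells is our assumption.

```python
import numpy as np

def poisson_normalize(tfidf):
    """Scale nonzero cells by 1/sqrt(cell + 1), then center each column, before SVD."""
    scaled = np.where(tfidf != 0, tfidf / np.sqrt(tfidf + 1.0), 0.0)
    return scaled - scaled.mean(axis=0, keepdims=True)
```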

The scorecard is shown in Table 4. Ultimately, we found that appending the discussion disciplines in 20% of the cases helped, but the Poisson normalization did not improve the overall performance. The TF*IDF variants’ overall accuracy did not surpass 45.2%.

Table 4 Scorecard for model performance against hand-coded utterances data: TF*IDF, BERT, and BERT + ResNet

3.2 BERT-based classification model

Bidirectional Encoder Representations from Transformers (BERT) is a neural network open-sourced by Google in 2018 and described by Devlin et al. [6]. Built on top of another of Google’s open-sourced applications, TensorFlow, BERT was pre-trained on large text corpora, including Wikipedia, and was intended for classifying speech (e.g., for sentiment analysis). A precursor of large language models such as ChatGPT, BERT uses transfer learning: a pre-trained general-purpose model is fine-tuned on new, labeled data. BERT uses neural network layers derived from self-attention within the sentence or utterance (“contextual” self-attention), combined with look-ups (“non-contextual” self-attention). For example, contextual and non-contextual elements allow BERT to recognize paraphrases [7]. Devlin et al. [6] enumerate BERT’s capabilities: word sense disambiguation, polysemy resolution (e.g., “river bank,” “rob a bank”), named entity determination, textual entailment / next sentence prediction, coreference resolution, question answering, and automatic summarization.

For our initial BERT-only model, we used an 80%/20% split of the 1,138 hand-coded utterances for training and validation. The model generated the standard base-BERT 768-dimension embedding for each utterance. This resulted in 85% accuracy, with poor performance on the less-abundant disciplines of Translation and Snarky (Table 4).
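A minimal sketch of producing the 768-dimension embeddings with the Hugging Face transformers library is below; we assume the bert-base-uncased checkpoint, and the classification head and 80/20 split are omitted. The second utterance is a hypothetical example.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

utterances = [
    "I'm hearing no consensus, so I recommend we bring out further options.",
    "Thanks for raising the navigation question.",   # hypothetical example
]
enc = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = bert(**enc)
embeddings = out.last_hidden_state[:, 0, :]   # [CLS] vector per utterance
print(embeddings.shape)                        # torch.Size([2, 768])
```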

3.3 BERT-ResNet based classification model

To improve on the discussion discipline classification performance of TF*IDF and BERT, we enhanced the BERT model with ResNet (Residual Neural Network), a deep neural network developed in 2015 for image classification [11]. Its highly convolutional architecture makes ResNet valuable for feature extraction when the training data for any class are limited. Like BERT, ResNet uses transfer learning: it starts with general-purpose pre-trained representations that can be fine-tuned for the discussion disciplines. Figure 4 contains our BERT-ResNet model sequence.

Fig. 4

BERT-ResNet model sequence. Note: Shows matrix dimensions and drop-out processes. Rectified Linear Unit (ReLU) activation functions reduced computational intensity of the back-propagation algorithm, and two dropout layers protected against overfitting

With ResNet, we initially took the BERT embeddings and transformed them into a 30 × 30 matrix, as shown on the left-hand side of Fig. 4. We used overlapping (stacked) sections of the embedding as the rows of the output matrix.
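A sketch of this transformation is below; the stride of 25 (giving 30 overlapping windows of width 30 within 768 dimensions) is our assumption, since the exact overlap is not specified above.

```python
import numpy as np

def embedding_to_matrix(embedding, size=30, stride=25):
    """Slice a 768-dimension embedding into 30 overlapping windows of width 30."""
    rows = [embedding[i * stride : i * stride + size] for i in range(size)]
    return np.stack(rows)                    # shape (30, 30)

matrix = embedding_to_matrix(np.random.rand(768))
print(matrix.shape)                           # (30, 30)
```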

A typical padding process of a ResNet model is shown in Fig. 5.

Fig. 5

Four stage padding process for ResNet model used for image recognition

Given that ResNet expects a three-channel input tensor (i.e., red–green–blue (RGB) images), the matrix was stacked on itself three times. After the stacking, a three-dimensional average pooling layer was used to downsample the input. An example of two-dimensional average pooling with a 2 × 2 filter and a (2, 2) stride is shown in Fig. 6.

Fig. 6

Illustration of pooling with ResNet. Note: Pooling using 2 × 2 filter and a (2,2) stride
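A small numeric illustration of the pooling in Fig. 6, using arbitrary values in a 4 × 4 input:

```python
import numpy as np

x = np.array([[1., 3., 2., 4.],
              [5., 7., 6., 8.],
              [4., 2., 1., 3.],
              [8., 6., 5., 7.]])
pooled = x.reshape(2, 2, 2, 2).mean(axis=(1, 3))   # average each non-overlapping 2x2 tile
print(pooled)                                       # [[4. 5.] [5. 4.]]
```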

The input was down-sampled to (3, 28, 28) (Fig. 4) and then passed to the ResNet model, whose output has a shape of (5, 5, 2048). This output was passed to a two-dimensional average pooling layer to down-sample it to a dimensionality of (1, 1, 2048), and then flattened into a dense layer of 2048 nodes. Subsequent steps reduced the size of the layer until ultimately reaching the classification layer of six sigmoid nodes. Rectified Linear Unit (ReLU) activation functions and two dropout layers were used to reduce the computational intensity of the backpropagation algorithm and to protect against overfitting, respectively. Finally, the sigmoid layer was used to detect the presence of the six discussion disciplines: each node had a sigmoid activation function varying between 0 and 1, and the node with the highest value was chosen as the “winning” discussion discipline.
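For illustration, a sketch of the layers that sit on top of the (5, 5, 2048) ResNet feature map is below; the widths of the intermediate dense layers and the dropout rates are our assumptions, while the final six sigmoid nodes follow the design above.

```python
import tensorflow as tf
from tensorflow.keras import layers

head = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 5, 2048)),             # ResNet feature map
    layers.AveragePooling2D(pool_size=(5, 5)),      # -> (1, 1, 2048)
    layers.Flatten(),                               # -> 2048 nodes
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(6, activation="sigmoid"),          # one node per discussion discipline
])
```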

The 1,135 hand-coded training samples were randomly split into training and validation sets at an 80%/20% proportion (908 training and 227 validation). The distribution was checked to make sure that the model would see all classes in both datasets.

Initially, all layers were trainable for the first five epochs (an epoch being one learning-cycle sweep through the training data), after which all of the ResNet layers were frozen (made no longer trainable) to prevent the ResNet backbone from overfitting. The remaining 45 epochs trained the average pooling and dense layers that were not part of ResNet.
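A self-contained sketch of this two-stage schedule on toy data is below; we assume a 32 × 32 × 3 input (the Keras ResNet50 application expects images at least 32 pixels on a side), randomly initialized weights, and random stand-in data, so only the freeze-then-train pattern carries over from the setup described above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

resnet = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                        input_shape=(32, 32, 3))
model = tf.keras.Sequential([
    resnet,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(6, activation="sigmoid"),
])
x = np.random.rand(64, 32, 32, 3).astype("float32")          # stand-in "images"
y = np.random.randint(0, 2, size=(64, 6)).astype("float32")  # stand-in labels

# Stage 1: all layers trainable for the first five epochs
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=5, verbose=0)

# Stage 2: freeze the ResNet backbone, re-compile, and train the remaining layers
resnet.trainable = False
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=45, verbose=0)
```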

Compared with BERT alone, at 85%, the BERT + ResNet combination improved accuracy substantially, to 95%, as shown in Table 5. The discussion discipline with the highest misclassification rate was Snarky, with an accuracy of 45%. This may be due to the complex nature of snarkiness, such as sarcasm (e.g., criticism masked as positivity, as in [15]), indirect speech, or innuendo, which can be hard to detect even for humans. In fact, in our manual analysis, indirection was a factor in hand-coding several LSMs, where participants made oblique references or used sarcasm.

Table 5 Confusion matrix for the final BERT + ResNet model

4 Model application

Using the BERT-ResNet model, which yielded the best results, we ingested the 21,051 utterances from the 591 open-source transcripts, as shown in Table 6. Table 6 shows that the open-source transcripts’ outcome distribution generally matches that of the hand-coded data, except for Relationship-Building, which was higher in the LSM data. (Recall that the data and modeling pipeline, covering both our hand-coded data and the open-source data, was presented in Fig. 3.)

Table 6 Transcript counts, outcomes, and gender, open-source v. hand-coded

With the open-source data processed by the BERT-ResNet model, we used a binary logistic regression to regress each transcript’s outcomes on that transcript’s discussion discipline percentages. The correlations among the discussion disciplines, and between the discussion disciplines and the outcomes, showed a meaningful relationship of Inclusion and Courtesy with Intent-to-Act, as was seen in the manual data. This is shown in the Pearson correlation matrix, Table 7.

Table 7 Pearson correlation for five discussion disciplines (plus Snarky) and three outcomes

Inclusion and Courtesy are both positively correlated with Intent-to-Act, while Integrity and Translation are both negatively correlated with it. The negative correlations of Integrity-Q and Translation with Options-Generation are unusual and may be due to Translation’s negative correlations with Courtesy and Inclusion. Naturally, Snarky is negatively correlated with the five other discussion disciplines.

Due to collinearity, we evaluated combinations of the variables (discussion discipline percentages and detected outcomes) to determine the binary logistic regression experiment with the most explanatory power. Tables 8 and 9 show the most successful experiment. Table 8 indicates that, with a cut value of 0.5, the discussion disciplines predict the presence of Intent-to-Act correctly 98.6% of the time (the sensitivity) and its absence correctly 29.8% of the time (the specificity), for an overall classification accuracy of 79.9%. This is an increase of 7.1 percentage points over the base case of 72.8% (always guessing “yes” for Intent-to-Act, its actual overall share in the open-source transcripts, shown in the third-to-last column in Table 5, above row No. 9). Table 9 presents the coefficients of the binary logistic regression of Intent-to-Act on the discussion disciplines; it shows positive, statistically significant explanatory power for Inclusion and Courtesy. The columns indicate the standardization process: the Betas in the first column are unstandardized, the “Wald statistic” is the square of Beta divided by its standard error, and “Exp(B)” is the odds ratio. Each odds ratio indicates the multiplicative change in the odds of a case falling into the Intent-to-Act target (an output of 1) per unit increase in a given predictor, controlling for the other predictors in the model. An Exp(B) of 1 indicates no change in the odds per unit change in the predictor; an Exp(B) greater than 1 indicates that the odds of Intent-to-Act membership increase, and an Exp(B) less than 1 indicates that they decrease.

Table 8 Classification accuracy of binary logistic regression of intent-to-act on discussion disciplines
Table 9 Binary logistic regression of Intent-to-Act conversation outcome on discussion disciplines
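As a sketch of the regression and odds-ratio computation, the code below fits the same form of model on random stand-in data (one row per transcript); statsmodels is assumed, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
disciplines = ["Integrity", "Integrity_Q", "Courtesy", "Inclusion", "Translation", "Snarky"]
df = pd.DataFrame(rng.uniform(0, 100, size=(591, 6)), columns=disciplines)   # DD percentages
df["ITA"] = rng.integers(0, 2, size=len(df))                                 # Intent-to-Act detected (0/1)

X = sm.add_constant(df[disciplines])
result = sm.Logit(df["ITA"], X).fit(disp=0)
print(result.summary())          # unstandardized Betas, standard errors, z statistics, p-values
print(np.exp(result.params))     # Exp(B): odds ratio per unit increase in each predictor
```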

Using the Exp(B) values, we can interpret the table. Each additional percentage point of Inclusion multiplies the odds of Intent-to-Act by 1.045, so a 10 percentage-point increase raises the odds by roughly 55% (1.045^10 ≈ 1.55), or about 45% under a simple linear approximation (10 × (1.045 − 1)). Likewise, each additional percentage point of Courtesy multiplies the odds by 1.034, so a 10-point increase raises the odds by roughly 40% (1.034^10 ≈ 1.40), or about 34% under the linear approximation. Snarky had a surprisingly large positive coefficient, which suggests it may be picking up omitted collinear variables, as suggested by Table 7.

As with the manual analysis of discussion disciplines and outcomes (Fig. 2), our results suggest that Inclusion (acknowledgement) has the biggest impact on Intent-to-Act. Being included or acknowledged may evoke a sense of being recognized, and thus a desire to be accountable and/or to take action. In our aquaculture data, Intent-to-Act appeared in a number of ways, such as a statement by the aquaculture farmer of an intent to move their scallop or oyster lease coordinates to reduce navigation-obstruction risks. Intent-to-Act could also be other participants’ statements that they would share information, such as their fishing methods, research outcomes, or land investment plans. Courtesy (positivity, pro-sociality) may have added to the overall sense of mutuality and conscientiousness.

5 Discussion

Our findings suggest that using a combination of BERT and ResNet for discussion discipline detection, together with a rules-based process for outcome detection, can profile conversations accurately, and that there is some evidence relating Inclusion and Courtesy to Intent-to-Act. For example, in our aquaculture domain, we can predict how an Inclusion statement like, “Please tell us your concern about protecting navigation channels,” could yield an Intent-to-Act by some participant in the conversation, such as the farmer’s intent to relocate their aquaculture lease or community members’ intent to participate in a future lease siting study. In interviews, the aquaculture farmers expressed surprise (and some comfort) about the impact of conversational rhetoric, independent of professional facilitation; they found this empowering. They felt that combining qualitative (hand-coded conversation) insights with these neural net insights could improve community members’, farmers’, and policymakers’ toolkits for reducing conflict, and thus improve the outcomes of similar conversations.

While these results are promising, we see four areas for future research. First, differences in how conversation features appear across domains and cultures can yield different model formations. For example, the precise language of Courtesy (positivity) may differ between a business community and a social community. Those differences necessitate new training data and require another test of corpus similarity for the large-corpus statistical analysis [17].

Second, our model detects the presence of any of the six discussion disciplines but does not assess their magnitude; that is, the model has no sense of how snarky, inclusive, or inquiring an utterance is. It would be beneficial to examine the boundary conditions for discussion discipline labels. Weak instances of the discussion disciplines were not measured in our study: just as a harshly Snarky move could have a more negative effect on conversation outcomes than witty sarcasm, a vehement Inclusion move might improve Intent-to-Act more than a weak Inclusion move.

Third, our open-data transcripts were divided into utterances, whereas the utterances of our training data (the LSM transcripts and our other hand-coded transcripts) were further divided into “moves” (single dialog acts, each containing one discussion discipline). When working with the open-source transcripts, the BERT-ResNet model took the most likely discussion discipline contained in each utterance. It is possible that singularly labeling a compound utterance (for example, one containing both Courtesy and Translation) could cause us to miss the nuanced impacts of each discussion discipline on outcomes. Żelasko et al. [27], suggesting that utterances consist of multiple dialog acts, recommend coding for punctuation such as commas and colons.

Finally, context-dependence may change the discussion disciplines’ meaning. For example, different language may be used in larger groups, with more or less familiarity, or in more or less hierarchical social-cultural contexts. Context also supports double entendre: signaling [25], such as gestures of generosity, and indirect speech [20], such as vague accusations of “invasive species,” appeared occasionally in the LSMs. Some approaches, such as using prompt-response designations as inputs [28] and using self-attention layers [27], attempt to address such context.

6 Conclusion

In this paper, we tested neural net models to classify six rhetorical intents in conversations, called the discussion disciplines. By combining BERT and ResNet, we achieved a 95.2% accuracy rate relative to human-coded data, surpassing pure BERT by over 10 percentage points. We applied the best model to a large, open-source corpus of transcripts to explore the relationship between discussion disciplines and outcomes, and found Inclusion and Courtesy to be significant determinants of Intent-to-Act. We suggest that incorporating discussion discipline intensity, splitting utterances into “moves,” and incorporating measures of context may improve both the accuracy of the classification and the usefulness of the model across settings and cultures. We also suggest applying our model pipeline, with new training data, to distinct language-cultures.

There is great opportunity to train a similar neural network model on other sustainability conversation scenarios, such as climate change, PFAS contamination, and water scarcity. Too often, well-intended policymakers, citizens, managers, and scientists are held back by an unawareness of their rhetorical impacts in conversations. Our hope is to assist sustainability teams, policymakers, and citizens toward conversations with productive outcomes.