1 Introduction

Natural language processing (NLP) is a machine-learning technology focusing on understanding, interpreting, and generating human languages (Allen 2003). This field has rapidly developed in the past decade, driven by the availability of large datasets, improvement in machine learning algorithms, and the growth of computational resources. A key task of NLP is emotion and sentiment mining (Itani 2017), which delves into extracting and explaining human emotions or sentiment polarity in a piece of text (e.g., Liu 2015; Tang et al. 2015). Emotion detection and sentiment analysis have been broadly applied across diverse fields. Business organizations often analyze the sentiments in customer reviews and social media comments to understand user experiences about their products or services (Pang and Lee 2004). Abdul et al. (2020) conducted sentiment analysis on textual data such as messages, posts, chats, and blogs to understand people’s opinions towards political figures, parties, or policies. Understanding the emotion and sentiment in news articles, social media, and financial reports can assist financial sectors, traders, and analysts in making more informed investment decisions (Fredrickson 2001). Sentiment and emotion analysis also holds great promise in educational research, where it is important to understand students’ sentiments and emotions expressed in teaching evaluations. For example, Qu (2021) introduced an innovative framework of aspect-based sentiment analysis, and built an interpretable model to explain the subtle patterns in student teaching evaluation.

Although both sentiment analysis and emotion detection have been commonly used in practice, they operate at various levels of textual granularity and employ diverse models. Sentiment analysis seeks to evaluate whether the affect or polarity in a textual dataset is positive, negative, or neutral. In contrast, emotion detection identifies specific human emotion types, such as anger, fear, or happiness (Munezero et al. 2014). In a recent systematic review, Nandwani and Verma (2021) discussed three levels of sentiment analysis, namely sentence-level, document-level, and aspect-level, depending on the study units of interest. There are also various emotion detection models in the literature, each attempting to capture characteristics of human emotions. For instance, the dimensional emotion model (DEM) takes a tripartite approach, describing emotions through valence, arousal, and power (Bakker et al. 2014). The categorical emotion model (CEM) assigns emotions to discrete categories such as anger, happiness, sadness, and fear (Fujimura et al. 2012). Despite the differences between sentiment analysis and emotion detection, one key facet of both is polarity detection, which focuses on discerning the polarities in a speech or text.

Polarity analysis employs deep learning-based, machine learning-based, or lexicon-based approaches as highlighted by Nandwani and Verma (2021). Deep learning-based methods utilize neural networks to learn intricate patterns in textual data such as sequential dependence, local patterns, and overall dependence (Tang et al. 2015). Machine learning-based methods, in contrast, rely on labeled datasets to train models for classifying text or speech into positive, negative, or neutral categories. A lexicon-based approach uses predefined dictionaries or lexicons that contain words or phrases associated with specific sentiments, thus providing a nuanced and interpretable approach.

In the realm of psychology, human emotions play a central role in shaping subjective well-being and behaviors (e.g., Fredrickson 2001). Positive emotions such as joy or enthusiasm can motivate individuals to pursue goals, socialize, or take on challenges (e.g., Fredrickson and Joiner 2018). Negative emotions like fear or sadness may result in defensive and aggressive behaviors or avoidance of social interactions. Conversely, different behaviors (e.g., expressing gratitude, social connections, or isolation) may also yield different emotions. Traditional methods of collecting self-reported emotional data through scales face challenges of authenticity due to factors like social considerations and subjective interpretations. In contrast, text and speech allow for a more immediate and spontaneous expression, providing a genuine representation of feelings (e.g., Medhat et al. 2014). This authenticity enhances the significance of analyzing sentiment and emotions in text and speech for understanding individuals’ true sentiments and reactions, which provides a new way to study people’s subjective well-being and interpersonal relations, such as emotion contagion. Moreover, researchers are often interested in understanding the interaction of the evolution of emotions and the formation of close relationships, for which a longitudinal study is often adopted.

Longitudinal studies are frequently employed in the social, behavioral, and educational sciences and involve the repeated collection of data from the same participants across multiple occasions (e.g., Pan et al. 2008; Zhang et al. 2015). Through the analysis of longitudinal data, researchers can simultaneously explore the changes occurring within individuals over time and the variations in these changes among individuals (e.g., McArdle and Hamagami 2003; Wang et al. 2016). Scholars have identified the advantages of growth curve models in concurrently capturing the means, variances, and covariances of the initial level and the rate of change (e.g., Ke and Wang 2015; McArdle and Nesselroade 2003; Zhang et al. 2007). Consequently, these models have gained popularity in applied research (e.g., Womack et al. 2022). In a growth curve model, the "time" variable is commonly treated as a continuous predictor, and the outcome variable is a function of both time and measurement error. When the means are assumed to be a linear function of time, the widely used linear growth curve model (LGCM) is employed (e.g., Grimm et al. 2013; Zhang et al. 2013). Alternatively, a nonlinear growth curve model, such as the logistic growth curve model, may be applied (e.g., Liu et al. 2016).

With longitudinal textual data, the current study proposes a method that integrates sentiment analysis and growth curve modeling to explore the temporal dynamics of emotional contagion during repeated conversations within dyads. Specifically, we aim to investigate how the inherent polarity in conversations between two individuals evolves as they become more familiar with each other. To incorporate the repeated conversation data into the growth curve model, we first extract the sentiment polarity embedded within each conversation. Subsequently, we fit a growth curve model to capture the evolution of these sentiments over repeated conversations. We also demonstrate various approaches for extracting conversation polarity and compare their impact on the results.

This modeling framework allows for a comprehensive understanding of sentiment fluctuations within conversations, enabling us to uncover potential patterns, variations, or trends inherent in textual data. Through the integration of sentiment analysis and growth curve modeling, our goal is to provide a nuanced and insightful perspective on how sentiments unfold and evolve. This analytical approach enhances the depth of our study and contributes to a more comprehensive understanding of the underlying dynamics shaping these interactions.

In the remaining sections of this study, we first present an empirical dataset on repeated conversations. Following this, we introduce growth curve modeling in a general framework, laying the foundation of the current study. Subsequently, we propose a model for sentiment analysis. Then with the growth curve methodology, we apply the proposed sentiment model to analyze the longitudinal conversation data. Finally, we provide a concise summary of our findings and discuss the potential directions for future research.

2 Longitudinal Conversation Data

Throughout this article, we will use the longitudinal conversation data we collected at the University of Virginia. The dataset, with which we aim to examine the dynamics of emotions and relationship information over time, lends itself well to demonstrating our proposed modeling approaches.

2.1 Participants

The participants are 118 undergraduate students with a mean age of 18.88 (SD=1.26). Among the 118 students, \(84\%\) are female and \(15\%\) are male (1 student did not provide gender); \(55\%\) of the students are white, \(32\%\) are Asian, \(4\%\) are Black/African American, and \(9\%\) are mixed race or other. Before data collection, the 118 students formed 59 random pairs of strangers.

2.2 Procedure

We recruited 118 student participants for our study via advertisements in online student groups on social media and through our university’s psychology participation pool. Students received course credit or pay (\(\$5\) per session, with a bonus of \(\$10\) if they completed all sessions) in exchange for participation. Students who signed up for credit had the option of stopping after four sessions due to credit limitations. Out of 59 dyads, 52 completed all six sessions, and seven stopped after four sessions.

In the first session, participants were randomly paired up to form dyads. Participants who indicated that they previously knew their partner were re-assigned new partners. Every dyad had a 10-minute conversation once a week for six weeks with the same partner. Therefore, for each pair of students, we collected six sessions of conversation data.

In each session, the pair of students and the research assistant met over Zoom. Before the conversation, each participant privately completed a survey asking about their emotional state. They were shown a Cartesian plane with the x-axis indicating valence (negative to positive) and the y-axis denoting arousal (low arousal to high arousal), and they marked with a dot how they felt. After both participants had completed the survey, they started the conversation while the research assistant left the Zoom room. We recorded each conversation and transcribed the text data. After 10 minutes had passed, the research assistant returned to the Zoom room and gave each participant a new survey to fill out privately. They again reported their current emotion with the same plot and how close they felt to their partner with the question, “On a scale from 1 = strangers to 100 = close friends, how close do you feel to your partner?”

The final dataset consists of two parts, with the participant ID serving as the linkage between them. One part is the numerical data about the demographic information as well as self-reported survey data. The other part is the textual data transcribed from the conversation recordings. Each dyad has up to six recordings across time. Within each recording, there are several turns where a turn in dialogue means one participant’s speech segment before the conversation shifts to the other participant.
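The two-part layout described above can be sketched as a small set of record types. This is only an illustration of the nesting (dyad, session, turn) and of the ID linkage; all class names, field names, and values below are hypothetical and do not reflect the actual file format of the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Turn:
    """One participant's speech segment before the other participant speaks."""
    speaker_id: str
    text: str


@dataclass
class Session:
    """One recorded and transcribed conversation of a dyad."""
    week: int
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Participant:
    """Numerical part: demographics and self-reports, keyed by participant ID."""
    participant_id: str
    age: Optional[float] = None
    closeness_by_week: Dict[int, float] = field(default_factory=dict)  # 1-100 scale


@dataclass
class Dyad:
    dyad_id: str
    members: List[Participant]
    sessions: List[Session] = field(default_factory=list)  # up to six per dyad


# Invented example values, for illustration only.
a = Participant("P001", closeness_by_week={1: 20.0})
b = Participant("P002", closeness_by_week={1: 35.0})
dyad = Dyad("D01", [a, b])
dyad.sessions.append(
    Session(week=1, turns=[Turn("P001", "Hi, nice to meet you."),
                           Turn("P002", "You too!")])
)

# The participant ID links a turn in the text data to its speaker's survey record.
by_id = {p.participant_id: p for p in dyad.members}
first_speaker = by_id[dyad.sessions[0].turns[0].speaker_id]
print(first_speaker.participant_id, first_speaker.closeness_by_week[1])
```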

3 Growth Curve Modeling

Growth curve modeling serves as a widely employed technique for exploring longitudinal trajectories over time (e.g., McArdle and Nesselroade 2003; Liu et al. 2016; Zhang et al. 2013). There are two prevalent forms of growth curve models: one within the structural equation modeling (SEM) framework, where growth factors (e.g., intercept and slope parameters) are treated as latent variables, and the other within the mixed-effects modeling framework, typically expressed as:

$$\begin{aligned} y_{it}&=f(t,\eta _{i})+e_{it}\\ \eta _{i}&=\beta +\varepsilon _{i} \end{aligned}$$

where \(y_{it}\) is the observation from person i at time t and \(e_{it}\) denotes the intra-individual measurement error. The latent variable \(\eta _{i}\) is a vector of growth parameters for subject i. These parameters vary between individuals to account for inter-individual differences, and their mean, \(\beta\), is often referred to as the fixed effects. The notation \(\varepsilon _{i}\) denotes the deviation of the random effects \(\eta _{i}\) from their mean.

Following the tradition of mixed-effects modeling, the intra-individual error terms \(e_{it}\) are assumed to follow a normal distribution independently: \(e_{it}\overset{iid}{\sim }N(0,\sigma _{e}^{2})\). The residual of the growth factors \(\varepsilon _{i}\) is also assumed to follow a normal distribution: \(\varepsilon _{i}\overset{iid}{\sim }N_{q}(0,{\varvec{\Psi }})\), where \({\varvec{\Psi }}\) is a \(q\times q\) matrix when there are q elements in the growth parameters vector \(\eta _{i}\).

The trajectory shape of an individual i is described by the function \(f(t,\eta _{i})\). For instance, in a linear growth curve model (LGCM), the trajectory function \(f(t,\eta _{i})\) is a linear function in terms of measurement time t, with a random intercept parameter \(L_{i}\) and a random slope parameter \(S_{i}\), specifically,

$$\begin{aligned} f(t,\eta _{i})=L_{i}+S_{i}(t-1). \end{aligned}$$

Here \(t=1,2,\cdots , T\) with a total of T measurement occasions, and \(t-1\) centers time at the first measurement occasion. Thus, the intercept \(L_{i}\) denotes the expected value of the outcome variable at the first time point for individual i.

The mean of the growth parameters \(\eta _{i}\) across all individuals is represented by \(\beta ,\) and the covariance matrix of the growth parameters is denoted by \({\varvec{\Psi }}\), and they have the following forms,

$$\begin{aligned} \eta _{i}=\begin{bmatrix}L_{i}\\ S_{i} \end{bmatrix},\hspace{1em}\beta =\begin{bmatrix}\beta _{L}\\ \beta _{S} \end{bmatrix},\hspace{1em}{\varvec{\Psi }}=\begin{bmatrix}\sigma _{L}^{2} &{} \sigma _{LS}\\ \sigma _{LS} &{} \sigma _{S}^{2} \end{bmatrix}. \end{aligned}$$
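Combining the two levels of the model with these definitions gives the reduced form of the LGCM and its model-implied moments. The derivation follows from the distributional assumptions stated above, with the added assumption that \(e_{it}\) and \(\varepsilon _{i}\) are mutually independent; \(\varepsilon _{Li}\) and \(\varepsilon _{Si}\) are introduced here to denote the two elements of \(\varepsilon _{i}\).

$$\begin{aligned} y_{it}&=\beta _{L}+\beta _{S}(t-1)+\varepsilon _{Li}+\varepsilon _{Si}(t-1)+e_{it},\\ E(y_{it})&=\beta _{L}+\beta _{S}(t-1),\\ \mathrm{Var}(y_{it})&=\sigma _{L}^{2}+2(t-1)\sigma _{LS}+(t-1)^{2}\sigma _{S}^{2}+\sigma _{e}^{2},\\ \mathrm{Cov}(y_{it},y_{it'})&=\sigma _{L}^{2}+(t+t'-2)\sigma _{LS}+(t-1)(t'-1)\sigma _{S}^{2},\hspace{1em}t\ne t'. \end{aligned}$$

At the first occasion (\(t=1\)) the mean reduces to \(\beta _{L}\) and the variance to \(\sigma _{L}^{2}+\sigma _{e}^{2}\), which is why the intercept is interpreted as the initial level.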

When fitting a growth curve model, our objective is to obtain estimates for the parameters \(\beta\) and \({\varvec{\Psi }}\) and the residual variance \(\sigma _{e}^{2}.\) A growth curve model can be fitted under both Bayesian and frequentist frameworks in most statistical software packages such as lavaan (Rosseel 2012) and Mplus (Muthén and Muthén 2017).
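To make the roles of \(\beta\), \({\varvec{\Psi }}\), and \(\sigma _{e}^{2}\) concrete, the following sketch simulates data from a linear growth curve model and recovers the fixed effects. It is written in Python with the standard library only and is not the estimation procedure used by lavaan or Mplus: averaging per-individual least-squares lines is a simple two-step stand-in that recovers \(\beta\) for balanced, complete data but does not estimate \({\varvec{\Psi }}\). The function names and all parameter values are arbitrary choices for illustration, not estimates from the study.

```python
import math
import random


def simulate_lgcm(n, T, beta_L, beta_S, var_L, var_S, cov_LS, var_e, seed=0):
    """Simulate y[i][t-1] = L_i + S_i * (t - 1) + e_it for t = 1..T."""
    rng = random.Random(seed)
    # Cholesky factor of Psi = [[var_L, cov_LS], [cov_LS, var_S]].
    a = math.sqrt(var_L)
    b = cov_LS / a
    c = math.sqrt(var_S - b * b)
    data = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        L_i = beta_L + a * z1            # random intercept
        S_i = beta_S + b * z1 + c * z2   # random slope, correlated with L_i
        data.append([L_i + S_i * (t - 1) + rng.gauss(0, math.sqrt(var_e))
                     for t in range(1, T + 1)])
    return data


def ols_line(y):
    """Least-squares intercept and slope of y on time coded 0, 1, ..., T-1."""
    T = len(y)
    x = list(range(T))
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Arbitrary illustrative parameter values, not estimates from the study.
data = simulate_lgcm(n=2000, T=6, beta_L=2.0, beta_S=-0.3,
                     var_L=1.0, var_S=0.04, cov_LS=-0.05, var_e=0.25, seed=1)
fits = [ols_line(y) for y in data]
beta_L_hat = sum(f[0] for f in fits) / len(fits)
beta_S_hat = sum(f[1] for f in fits) / len(fits)
print(round(beta_L_hat, 2), round(beta_S_hat, 2))
```

With a large simulated sample, the two averages should land close to the generating values of 2.0 and -0.3.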

4 Growth Curve Sentiment Analysis

In this section, we propose a comprehensive growth curve sentiment analysis model that integrates the principles of growth curve modeling with sentiment analysis techniques. Our goal is to construct a model capable of capturing the sentiment dynamics embedded within the longitudinal conversation data. Below we will provide a detailed discussion of the proposed model.

4.1 Model Construction

Consider a sequence of conversations denoted as \(y_{1},y_{2},\cdots ,y_{T}\) between two individuals in a dyad at times \(1,2,\cdots ,T\), respectively. Each variable \(y_{t}\) denotes the dialogue between the two students forming the dyad at time t. For clarity, Table 1 offers an illustrative example of a conversation between two students.

Table 1 A conversation between two students

The primary objective of this analysis is to incorporate the conversation at each time point as a variable within a growth curve model. However, modeling textual data directly in a statistical model is highly challenging. Consequently, our method involves a two-stage process: (1) initial data preprocessing and sentiment extraction, followed by (2) the direct modeling of sentiment using a growth curve model. Therefore, the proposed model consists of two parts. The first part involves transforming conversation/textual data, denoted as \(y_{t}\), into quantitative sentiment data, represented by \(s_{t}\). This part is essential for rendering the qualitative textual information into a form for statistical modeling. The second part applies a growth curve model to analyze the longitudinal numerical sentiment data \(s_{t}\).

To illustrate this process, Fig. 1 provides a visual representation of the proposed model, outlining the sequential flow from conversation text data to quantitative sentiments, culminating in applying the growth curve model to the longitudinal sentiment data. This model facilitates a more nuanced understanding of sentiment evolution and allows for the integration of text-derived variables within a statistical framework.

Fig. 1 Diagram of a linear growth curve model for repeated conversations

Now we discuss the proposed two-stage framework in more detail to analyze the empirical conversation data. Note again that in the first stage, we extract the sentiment in conversations, and in the second stage, we apply a growth curve model to model the longitudinal conversations. Given the complex structure of the longitudinal conversation data, after extracting sentiment from each phrase or sentence, we need to manage those sentiment scores from a dyad and prepare one score for each conversation for the growth curve analysis. Different strategies can be applied and are illustrated below.

4.2 Introduction to Sentiment Extraction

Our current study aims to model the sentiment embedded in repeated conversations and investigate how it evolves over time. As such, our focus lies in extracting the sentiment of a dialogue using lexicon-based approaches. More specifically, we concentrate on obtaining each conversation’s sentiment scores. Lexicon-based methods use sentiment lexicons or dictionaries that assign each word a sentiment score. Several lexicons are available in the \({\texttt {R}}\) package \(\texttt {lexicon}\). For instance, the sentiment lexicon \(\texttt {hash\_sentiment\_jockers\_rinker}\) assigns a score ranging from \(-\,1\) to \(+\,1\) to each of its 11,710 words, which gives a fine-grained measure of word-level sentiment. In addition, it is tailored to handle the nuances of social media language, including emotions, emojis, slang, and internet jargon, making it suitable for analyzing sentiment in informal text data. Therefore, we chose it to study the sentiments nested in the dialogues between students, which occur mainly in an informal setting.

The following is an example of the sentiment scores of individual words provided by the sentiment lexicon \(\texttt {hash\_sentiment\_jockers\_rinker}\).

figure a

In the sentiment lexicon \(\texttt {hash\_sentiment\_jockers\_rinker}\), a positive score indicates a positive sentiment: a score of 1 indicates the most positive sentiment, and a score of 0.5 suggests a moderately positive sentiment. A score of 0 typically represents a neutral sentiment. At the other end, a negative score implies a negative sentiment: a score of \(-\) 0.5 may indicate a moderately negative sentiment, and a score of \(-1\) generally represents the most negative sentiment. These scores thus cover the full range of sentiment intensity expressed by words within the lexicon.

In addition to the sentiment of individual words, a sentence’s overall sentiment is also influenced by what is known as “valence shifters” (sometimes called “amplifiers”). Valence shifters are words or phrases that can modify the perceived positivity or negativity of sentiment in a text. For example, the term “very” in the expression “very happy,” “slightly” in “slightly annoyed,” and “never” in the phrase “never disappointed” are instances of valence shifters influencing the emotional tone of words surrounding them. The data frame \(\texttt {lexicon::hash\_valence\_shifters}\) contains information on valence shifters that alter the meaning of a polarized word. Additionally, it provides an integer key for negators, helping us capture a text’s nuanced sentiment.
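The interplay between word scores and valence shifters can be illustrated with a deliberately simplified scorer. The sketch below is in Python and is not the algorithm implemented in \(\texttt {sentimentr}\), which is more elaborate; the word scores, the shifter weights, the rule of checking only the immediately preceding word, and the division by the square root of the word count are all assumptions made for illustration.

```python
import math
import re

# Hypothetical word scores in [-1, 1]; NOT taken from any published lexicon.
TOY_LEXICON = {"happy": 0.75, "annoyed": -0.5, "disappointed": -0.75}
# Hypothetical valence shifters and weights.
NEGATORS = {"never", "not"}
AMPLIFIERS = {"very": 1.8}
DEAMPLIFIERS = {"slightly": 0.2}


def toy_sentence_score(sentence):
    """Sum shifter-adjusted word scores and divide by sqrt(word count)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        score = TOY_LEXICON.get(word)
        if score is None:
            continue  # word carries no polarity in the toy lexicon
        # Simplification: only the immediately preceding word can shift valence.
        prev = words[i - 1] if i > 0 else ""
        if prev in NEGATORS:
            score = -score
        elif prev in AMPLIFIERS:
            score *= AMPLIFIERS[prev]
        elif prev in DEAMPLIFIERS:
            score *= DEAMPLIFIERS[prev]
        total += score
    return total / math.sqrt(len(words))


for s in ["I am happy.", "I am very happy.",
          "I am slightly annoyed.", "I was never disappointed."]:
    print(s, round(toy_sentence_score(s), 3))
```

In this toy setting, “very” raises the score of “happy,” “slightly” pulls “annoyed” toward zero, and “never” flips “disappointed” to a positive score.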

In the current analysis, we extract the overall sentiment of each sentence in a text using the R function \(\texttt {sentimentr::sentiment()}\). Here is an example of sentiment analysis applied to a dialogue, presenting the sentiment score for each sentence in the output. The output is structured into columns, with the identifier “element\(\_\)id” distinguishing “turns” in a dialogue. Note again that turns refer to the back-and-forth exchange between two participants, and a turn in a dialogue is one participant’s speech segment before the conversation shifts to the other participant. The column “sentence\(\_\)id” enumerates sentences within a turn, and the last two columns provide the word count and the sentiment score assigned to each sentence, respectively.

figure b

4.3 Proposed Growth Curve Sentiment Analysis

4.3.1 Overview of the Analysis

In our proposed growth curve modeling of the sentiment in repeated conversations, each conversation or dialogue serves as a unique observation or data point, with six repeated conversations for each dyad. Consequently, we must further derive the sentiment of each conversation by integrating the polarity scores from individual sentences.

Given the distinctive structure of the dialogue data, featuring multiple turns and multiple sentences within a turn, there exist several flexible approaches to operationalize the sentiment of dialogues. Figure 2 provides an illustrative example with four optional approaches integrated.

Fig. 2 Flow chart of the model fitted

We consider several methods for aggregating sentence polarity into turn polarity and, subsequently, into dialogue polarity. Specifically, we obtain the sentiment of each turn through the arithmetic mean, a weighted mean, and the score of the dominant sentence.

In each conversation, every turn is assigned a sentiment score indicating whether it conveys positive, negative, or neutral affect. When aggregating these scores to determine the overall polarity of the dialogue, we differentiate between turns expressing positive and negative sentiments for practical reasons. Positive and negative sentiments play pivotal roles in communicating underlying emotions and attitudes. Positive sentiments often denote satisfaction, approval, or enthusiasm, while negative sentiments convey dissatisfaction, disapproval, or concern. Moreover, positive and negative sentiments carry distinct effects and implications, emphasizing the importance of analyzing them separately to understand the emotional dynamics within dialogues comprehensively. The overall positive and negative sentiments of a dialogue are derived by integrating the polarities of turns expressing positive and negative sentiments, respectively.

For comparison, we will also derive dialogue sentiment by directly integrating the scores from all sentences. The subsequent sections will detail the implementation of each method. The four approaches to mining dialogue polarity are summarized in Table 2.

Table 2 Summary of the four approaches for extracting the dialogue polarity

4.3.2 Approach 1: Aggregated Turn Sentiment Using Arithmetic Mean

Given the polarity score of each sentence, we obtain the average scores within a turn, and the obtained mean score will be the polarity score of a turn. The polarity score of a turn could be negative, positive, or neutral (i.e., 0).

We obtain the total score of all the turns with positive polarity, which operationalizes the positive affect inherent in the dialogue, and we label it PA\(_{it}\) for dyad i at time t. Similarly, we obtain the total score of the turns with negative polarity, which is the overall negative polarity in the conversation, NA\(_{it}\). The spaghetti plots of the positive and negative affects are displayed in Fig. 3.

Fig. 3 Longitudinal plots of the positive affect and negative affect nested in the conversations based on Approach 1

The derived positive affect scores are always positive, with a higher score indicating a more pronounced positive affect within the conversation. In contrast, negative affect scores carry a negative sign, and a score further below zero (i.e., larger in absolute value) denotes a heightened negative affect.
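The aggregation in Approach 1 can be sketched in a few lines of Python. The rows mimic the sentence-level output described earlier (turn identifier, sentence identifier, sentiment score), but the values are invented for illustration and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sentence-level scores for one conversation:
# (element_id, sentence_id, sentiment); values are invented.
rows = [
    (1, 1, 0.25), (1, 2, 0.75),
    (2, 1, 0.0), (2, 2, -0.5),
    (3, 1, 0.0),
    (4, 1, -0.25), (4, 2, 0.75), (4, 3, 0.25),
]


def turn_means(rows):
    """Arithmetic mean of the sentence scores within each turn (element_id)."""
    by_turn = defaultdict(list)
    for element_id, _sentence_id, sentiment in rows:
        by_turn[element_id].append(sentiment)
    return {turn: sum(scores) / len(scores) for turn, scores in by_turn.items()}


def dialogue_affect(turn_scores):
    """Sum positive and negative turn scores separately (PA >= 0, NA <= 0)."""
    pa = sum(s for s in turn_scores if s > 0)
    na = sum(s for s in turn_scores if s < 0)
    return pa, na


turns = turn_means(rows)
pa, na = dialogue_affect(turns.values())
print(turns)
print(pa, na)
```

Neutral turns (score 0) contribute to neither sum, so PA and NA are driven only by polarized turns.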

To explore the temporal dynamics of positive and negative affects in the two students in a dyad, we applied a linear growth curve model to both positive and negative affect scores separately. The outcomes are detailed in Table 3.

For the linear growth curve model fitted to the positive affect, the comparative fit index (CFI) is 0.977 and the root mean square error of approximation (RMSEA) is 0.066, both pointing to a satisfactory model fit. The estimated mean values for the random intercept and random slope are 9.830 (p.value\(<0.001\)) and \(-\) 0.218 (p.value\(<0.001\)), respectively.

Similarly, when applying the linear growth curve model to the negative affect, the model exhibits a favorable fit, with a CFI of 0.977 and an RMSEA of 0.066. The estimated average intercept and average slope are \(-\) 1.324 (p.value\(<0.001\)) and \(-\) 0.063 (p.value=0.052), respectively.

Table 3 Model parameter estimates of the growth curve models in estimating the repeated positive affects and negative affects with Approach 1

According to the results of the analysis, as the two students grow more acquainted with each other, there is a decrease in the expression of positive affect, coupled with a marginally significant (p.value=0.052) increase in the intensity of negative affect in their interactions.

4.3.3 Approach 2: Aggregated Turn Sentiment Using Zero Down-weighting Mechanism

A turn typically comprises multiple sentences, some of which may carry neutral sentiment, reflected by polarity scores of zero. In Approach 1, because the arithmetic mean uses the sentiment scores of all sentences within a turn, the presence of neutral sentences can dilute the overall sentiment score. To account for the influence of neutral sentences, an average with a down-weighting mechanism is commonly employed for language data. This involves assigning a lower weight to zero sentiment scores to reduce the impact of neutral sentences. Essentially, neutral sentences are treated as having less emotional impact than polarized sentences.

For instance, in Turn \(\#\)2 of a conversation, there are three sentences with sentiment scores (0, 0.1698, 0.3784). The average with down-weighting of zeros is larger than the arithmetic mean because the neutral first sentence receives less weight.

figure c
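The weighting rule itself is not spelled out in the text. The Python sketch below assumes a rule in which each zero adds less than one unit to the denominator, namely the sum of the scores divided by the number of nonzero scores plus \(\sqrt{\log (1+\text {number of zeros})}\). To our understanding this mirrors \(\texttt {sentimentr::average\_downweighted\_zero()}\), but whether it is the exact rule used in the analysis is an assumption.

```python
import math


def average_downweighted_zero(scores):
    """Mean in which zeros add less than 1 each to the denominator.

    Assumed rule: sum(x) / (n_nonzero + sqrt(log(1 + n_zero))).
    """
    n_nonzero = sum(1 for s in scores if s != 0)
    n_zero = len(scores) - n_nonzero
    denom = n_nonzero + math.sqrt(math.log(1 + n_zero))
    return sum(scores) / denom if denom > 0 else 0.0


turn_2 = [0, 0.1698, 0.3784]            # sentence scores quoted in the text
plain = sum(turn_2) / len(turn_2)       # arithmetic mean used in Approach 1
weighted = average_downweighted_zero(turn_2)
print(round(plain, 4), round(weighted, 4))
```

Under this assumed rule the down-weighted average is about 0.19, compared with about 0.18 for the arithmetic mean, and the two coincide when a turn contains no neutral sentences.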

As in Approach 1, after obtaining the sentiment scores for individual turns, some turns exhibit positive sentiments, some display negative sentiments, and others are assigned neutral scores (zeros). Turns with positive and negative sentiments offer insights into the trust a person places in their friends. The polarity of these sentiments can illuminate the attitudes expressed during the interactions, providing valuable information for understanding the dynamics of interpersonal trust within the given conversations.

As such, we further obtain the overall positive affect as well as the overall negative affect of the conversation by summing the sentiment of the turns with positive scores and those with negative scores, separately. The total positive affect and negative affect are plotted longitudinally in Fig. 4.

Fig. 4 Longitudinal plots of the positive affect and negative affect nested in the conversations based on Approach 2

To explore the evolution of positive affect and negative affect as the two strangers gradually become more familiar with each other, we separately fit the linear growth curve model depicted in Figure 1 to the positive and negative affects. The model parameter estimates and fit indices are summarized in Table 4.

When the model is applied to the positive affect, CFI yields a value of 0.971, indicating a good fit, and RMSEA stands at 0.075, indicating an acceptable fit. The estimate for the mean parameter of the random intercepts, i.e., \(\hat{\beta }_{L}\), is 0.983 (p.value\(<0.001\)). The estimated mean parameter of the random slopes \(\hat{\beta}_{S}\) is \(-\) 0.245 (p.value\(<0.001\)). This finding suggests a significantly high initial positive affect when two strangers first encounter each other, with positive affect diminishing as familiarity between individuals increases.

When fitting the linear growth curve model to the negative affect, it exhibits an excellent fit (RMSEA = 0.000, CFI = 1.000). The estimated mean parameter of the random intercepts is \(-\) 1.417 (p.value\(<0.001\)), and the estimated mean parameter of the random slopes is \(-\) 0.072 (p.value=0.036). Because the average slope is negative, the negative affect score becomes more negative over time; that is, our findings indicate that negative affect intensifies in conversations as individuals become more acquainted.

Table 4 Model parameter estimates of the growth curve models in estimating the repeated positive affects and negative affects with Approach 2

4.3.4 Approach 3: Dominant Sentiment in Turns

An alternative approach to obtaining the sentiment score of a turn is to identify the sentiment of the dominant sentence and use it as an operational measure of the sentiment of the turn. In the current study, we use the most extreme sentence sentiment in a turn (either the largest positive score or the most negative score) as the operational measure of the sentiment of the whole turn. Once we determine the sentiment of each turn, we sum the positive and the negative scores of all turns separately; these sums operationalize the positive and negative affect of the conversation, as plotted in Fig. 5.

Fig. 5 Longitudinal plots of the positive affect and negative affect nested in a conversation based on Approach 3
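A minimal Python sketch of Approach 3 follows. It interprets the “most extreme” sentence as the one whose score has the largest absolute value, which is an assumption about the operational rule; ties between a positive and a negative score of equal magnitude are resolved by sentence order here, a choice the text does not specify. The sentence scores are invented for illustration.

```python
def dominant_score(sentence_scores):
    """Return the sentence score with the largest absolute value (0.0 if empty)."""
    return max(sentence_scores, key=abs, default=0.0)


# Invented sentence scores for three turns of one conversation.
turns = {1: [0.1, 0.6, -0.2], 2: [0.0, -0.7, 0.3], 3: [0.0]}
turn_scores = {turn: dominant_score(scores) for turn, scores in turns.items()}

pa = sum(s for s in turn_scores.values() if s > 0)   # positive affect of the dialogue
na = sum(s for s in turn_scores.values() if s < 0)   # negative affect of the dialogue
print(turn_scores, pa, na)
```

Because a single sentence determines each turn score, this rule is more sensitive to isolated strongly worded sentences than the averages used in Approaches 1 and 2.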

The model estimates for the growth curve models are detailed in Table 5. The results are akin to those observed in Approaches 1 and 2. For the analysis of the positive affect, the model has a good fit as indicated by the CFI, with a value of 0.968. However, the RMSEA stands at 0.091, suggesting an unacceptable fit. The estimate of the mean of the random slopes is \(-\) 0.255 (p.value=0.012), implying a decline in positive affect within the conversation as the two individuals become better acquainted.

When fitting the linear growth curve model to the negative affect, it achieves a perfect fit, with an RMSEA of 0 and a CFI of 1. The estimated mean parameter of the random slopes, \(\hat{\beta }_{S}\), is \(-\) 0.16 (p.value=0.003), indicating that negative affect in the conversation intensifies as individuals become more acquainted. Furthermore, the estimate of the covariance parameter \(\hat{\sigma }_{LS}\) is \(-\) 0.223 (p.value=0.056), suggesting that if there is initially minimal negative affect, the negative affect score tends to decrease at a faster pace over time, although this covariance does not reach significance at the 0.05 level.

Table 5 Model parameter estimates of the growth curve models in estimating the repeated positive affects and negative affects with Approach 3

4.3.5 Approach 4: Integrating Sentence-level Polarity to Dialogue Polarity

The preceding three approaches involve extracting the polarity of individual turns to subsequently derive the overall polarity of the dialogue. Alternatively, we can directly aggregate sentence polarity to determine dialogue polarity. In this approach, we operationalize positive affect and negative affect by summing up the scores of sentences with positive and negative polarities, respectively. Additionally, we aggregate the polarity of all sentences in the dialogue to obtain a single dialogue score using the Zero Down-weighting Mechanism. Trajectories of the resulting scores are illustrated in Fig. 6.

Fig. 6 Longitudinal plots of the positive affect, the negative affect, and the total polarity of dialogues obtained by integrating the sentence-level polarities in Approach 4
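Approach 4 skips the turn level entirely, as the following Python sketch shows. The sentence scores are invented, and the single dialogue score uses an assumed down-weighting rule (sum of scores divided by the number of nonzero scores plus the square root of the log of one plus the number of zeros); the exact rule used in the analysis is not stated in the text.

```python
import math

# Invented scores for all sentences of one dialogue; turn boundaries are ignored.
sentences = [0.25, 0.75, 0.0, -0.5, 0.0, -0.25, 0.75, 0.25]

pa = sum(s for s in sentences if s > 0)   # positive affect: sum of positive sentences
na = sum(s for s in sentences if s < 0)   # negative affect: sum of negative sentences

# Single dialogue score with zeros down-weighted (assumed rule, see above).
n_zero = sum(1 for s in sentences if s == 0)
n_nonzero = len(sentences) - n_zero
total = sum(sentences) / (n_nonzero + math.sqrt(math.log(1 + n_zero)))
print(pa, na, round(total, 3))
```

Summing over sentences rather than turns means that long turns with many polarized sentences weigh more heavily in PA and NA than they do in the turn-based approaches.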

Similarly, as before, a linear growth curve model is fitted for positive, negative, and total affects, separately. The results are illustrated in Table 6.

Table 6 Model parameter estimates of the growth curve models in estimating the repeated positive affects, negative affects, and total affects with Approach 4

When fitting the linear growth curve model to negative affect and positive affect, both exhibit an excellent fit, with an RMSEA of 0 and a CFI of 1.

For positive affect, the estimated mean parameter of the random slope, \(\hat{\beta }_{S}\), is \(-\) 0.389 (p.value=0.005), indicating a decrease in positive polarity as the two students become more acquainted. Conversely, for negative affect, the estimated mean parameter of the random slope, \(\hat{\beta }_{S}\), is \(-\) 0.410 (p.value \(<0.001\)), signifying an increase in the intensity of negative affect expressions as the friendship between the two students strengthens. Additionally, the estimate of the covariance parameter \(\hat{\sigma }_{LS}\) is \(-\) 0.410 (p.value=0.078), suggesting that if there is initially little negative affect, it tends to decrease at a faster pace over time.

When the model is fitted to the overall polarity (summing all sentence polarities), the polarity of the initial dialogue is positive, with an estimated intercept mean, \(\hat{\beta }_L=2.09\). This indicates that two strangers tend to use expressions with a positive affect in their conversation. The estimate for the slope parameter, \(\hat{\beta }_S\), is \(-\) 0.008 (p.value \(<0.001\)), suggesting a decrease in conversation polarity as two people become more familiar with each other. It is noteworthy that the estimates of the covariance parameter, \(\hat{\sigma }_{LS}\), as well as the variance parameter of the random slope, \(\hat{\sigma }_S^2\), are both 0. This implies that the slopes of the linear trajectories may be similar, and a model with a fixed slope parameter could be considered.
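To make the suggested simplification concrete, the two specifications can be written side by side. This is a sketch in assumed notation: \(y_{it}\) denotes the polarity score of dyad \(i\) at occasion \(t\) (with \(t=0\) for the initial dialogue), \(e_{it}\) is a residual, \(\sigma_L^2\) is the variance of the random intercept, and the normality of the random effects is an assumption made for illustration. Only \(\beta_L\), \(\beta_S\), \(\sigma_S^2\), and \(\sigma_{LS}\) appear in the estimates reported above.

\[
\text{Random slope:}\quad y_{it} = L_i + S_i\, t + e_{it}, \qquad
\begin{pmatrix} L_i \\ S_i \end{pmatrix} \sim N\!\left( \begin{pmatrix} \beta_L \\ \beta_S \end{pmatrix}, \begin{pmatrix} \sigma_L^2 & \sigma_{LS} \\ \sigma_{LS} & \sigma_S^2 \end{pmatrix} \right)
\]

\[
\text{Fixed slope:}\quad y_{it} = L_i + \beta_S\, t + e_{it}, \qquad L_i \sim N\!\left(\beta_L, \sigma_L^2\right)
\]

Setting \(\sigma_S^2 = 0\) in the first specification forces \(\sigma_{LS} = 0\) and makes every \(S_i\) equal to \(\beta_S\), which yields the second specification. This is why zero estimates of both parameters point toward a common slope across dyads.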

5 Discussion and Conclusion

This paper introduces novel growth curve sentiment analysis models to integrate longitudinal textual data into growth curve analysis. Given the prevalence of textual and dialogue data, we believe that the proposed framework for modeling repeated dialogues offers a distinctive perspective, providing additional insights into understanding interpersonal interactions and sentiments expressed in conversations.

Due to the unique structure of dialogue data, characterized by sentences and turns, we introduced four distinct approaches for managing sentiment scores within conversational exchanges, each tailored to handle the intricacies of turn-taking dynamics. Across these approaches, we consistently observed a notable trend: a decline in positive affect accompanied by an escalation in shared negative affect within each dyad as friendships deepened and solidified.

These findings imply that as friendships evolve and strengthen over time, a shift in affective experiences emerges within the dyadic relationship. The observed increase in shared negative affect suggests the presence of emotional contagion, wherein individuals’ emotional states are influenced by those of their friends. Furthermore, the results underscore the supportive nature of these relationships, where individuals feel empowered to express negative emotions within the safe confines of a close friendship. Understanding how friendships contribute to emotional regulation and provide a platform for mutual support holds significant potential for interventions that enhance mental health and foster resilience.

Meanwhile, we also note slight variations in the extent of the decrease in positive affect and the degree of the increase in shared negative affect, as well as differences in the significance of the observed effects among the various approaches for sentiment integration. For example, the estimated mean slope for the negative affect in Approach 1 was not statistically significant, whereas it was significant in Approach 2. In practice, we recommend applying multiple approaches and summarizing the results based on all analyses to ensure a comprehensive understanding.

We argue that sentiment analysis holds great potential in empirical applications in psychology, where people’s sentiment and affect are of primary interest to researchers. Traditionally, self-reported data based on affect scales are often used. With the popularity of online platforms for social connection, textual and dialogue data have become more common and easier to collect. Textual data provide unique information about people’s behaviors. Unlike self-reported data, which may be subject to biases and social desirability, textual data offer a more natural and unfiltered glimpse into individuals’ emotions and sentiments. The richness and spontaneity of language used in texts and dialogues can capture nuances and subtle expressions that might be challenging to elicit through structured surveys or scales. Additionally, the contextual nature of textual data allows researchers to study emotions in real-world settings, providing a more authentic representation of individuals’ emotional experiences in various situations. Overall, the advantages of textual data in capturing genuine and nuanced emotions enhance their utility in understanding human behavior and subjective experiences.

We acknowledge the considerable challenges posed by the unique nature of textual data when employing traditional statistical models such as regression analysis and structural equation modeling. Unlike traditional numerical data, the qualitative essence of textual data introduces complexities that cannot be directly accommodated by these established modeling frameworks. Consequently, there is a substantial demand for adapting existing models to effectively analyze textual data.

The sentiments embedded in textual data are implicit, necessitating robust methods for information extraction. The nuanced and contextual nature of language in textual data requires specialized techniques to capture and quantify the qualitative aspects inherent in the expressions. Conventional statistical models, designed for numerical inputs, may struggle to interpret the intricacies of textual content, making it imperative to develop methodologies that bridge the gap between the qualitative nature of text and the quantitative requirements of statistical analysis.

The current work, which analyzed sentiment scores extracted from conversation data using linear growth curve models, can be extended in several directions. The linear models fit our data well; however, in practical scenarios where change trajectories are nonlinear, polynomial or other nonlinear growth curves may be more appropriate for the sentiment scores, and conclusions should then be based on the best-fitting model. In addition, the current sample is composed mainly of female participants, which may bias the estimated conversational sentiment and reduce the external validity of the study. Future studies could explore such potential gender and cultural effects, for example by collecting data from a more gender-balanced sample. We may also consider using Large Language Models to simulate conversations between individuals of the same gender and between individuals of different genders, and then perform multi-group analyses with a larger sample size to uncover patterns and offer a deeper understanding of the effects of gender on sentiment dynamics.

Note that the sentiment analysis in this study used the aggregated information from each conversation. The dyadic information was not considered, mainly because fully investigating it would make the model too complicated: issues such as whether to treat the participants in a dyad as distinguishable or indistinguishable, temporal dependency, and unequal variances within dyads would all need to be addressed. The goal of the current study is to propose and compare approaches that integrate sentiment analysis and growth curve modeling to study how the inherent polarity in conversations between two individuals evolves over time. Although the dyadic information is important, we therefore leave its incorporation to future work.

Participant attrition is a common issue in longitudinal studies. In our current analysis, we used the full information likelihood approach, in which the missing data were ignored. In the future, techniques such as multiple imputation or Bayesian approaches could offer improved handling of missing data.

The proposed framework, along with longitudinal textual data, contributes to the exploration of “emotional contagion,” a fundamental topic in social and psychological research. Emotional contagion refers to the phenomenon where an individual’s emotions influence the feelings of others, creating a shared emotional experience within a group or community (Hatfield et al. 2014). This contagion occurs through social interactions across diverse channels, both online and offline (e.g., Barsade et al. 2014). While existing research has examined emotional contagion at different levels of relationship closeness, a significant gap exists in understanding how it evolves within a relationship. As individuals progress from being strangers to friends, a pivotal question arises: does emotional contagion between them increase or decrease with deepening familiarity? To address such questions, the proposed longitudinal sentiment analysis can be employed to extract and examine emotional contagion from the dialogues between two people.

In our present study, we explore four approaches for sentiment extraction tailored for conversation data. These methods are designed to integrate the sentiment expressed within individual sentences, thereby capturing the essence of nested conversations. Various parameters are crucial for this task, including the adjustment of word or phrase weights (e.g., via “amplifier.weight”), consideration of contextual window size for sentiment analysis (e.g., via “n.before” and “n.after”), and the weighting of adversative expressions (e.g., via “adversative.weight”). Conversational text exhibits unique characteristics such as abbreviated syntax and overlapping speech patterns resulting from interactive exchanges. Consequently, the selection of parameter values necessitates careful consideration, typically achieved through cross-validation techniques.
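As an illustration of how such parameter values might be selected, the sketch below runs a small grid search with k-fold cross-validation. Everything in it is hypothetical: the toy lexicon, the simplified scoring rule, the human-annotated reference scores, and the candidate grid are stand-ins, and the Python names `n_before`, `n_after`, and `amplifier_weight` merely mirror the dotted parameter names mentioned above. An adversative weight is omitted to keep the example short. It is not the scoring procedure used in the study.

```python
from itertools import product
from statistics import mean

# Toy dictionaries, purely illustrative.
POLARITY = {"good": 1.0, "great": 1.0, "fun": 1.0,
            "bad": -1.0, "awful": -1.0, "boring": -1.0}
AMPLIFIERS = {"very", "really"}
NEGATORS = {"not", "never"}


def score_sentence(sentence, n_before=5, n_after=2, amplifier_weight=0.8):
    """Score a sentence with a simplified valence-shifter rule (assumed form)."""
    words = sentence.lower().split()
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Context window around the polarized word.
        window = words[max(0, i - n_before):i] + words[i + 1:i + 1 + n_after]
        n_amp = sum(w in AMPLIFIERS for w in window)
        n_neg = sum(w in NEGATORS for w in window)
        sign = -1.0 if n_neg % 2 else 1.0  # an odd number of negators flips the sign
        total += sign * (1.0 + amplifier_weight * n_amp) * POLARITY[word]
    return total / len(words) ** 0.5 if words else 0.0


def mse(params, data):
    """Mean squared error between computed and reference scores."""
    return mean((score_sentence(s, **params) - y) ** 2 for s, y in data)


def cross_validate(data, grid, k=3):
    """Select parameters on the training folds and report the held-out error."""
    settings = [dict(zip(grid, values)) for values in product(*grid.values())]
    folds = [data[i::k] for i in range(k)]
    held_out = []
    for i, test_fold in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        best = min(settings, key=lambda p: mse(p, train))
        held_out.append(mse(best, test_fold))
    final = min(settings, key=lambda p: mse(p, data))
    return final, mean(held_out)


# Hypothetical sentences with human-annotated reference polarity scores.
labelled = [
    ("that was really good", 0.9), ("not good at all", -0.5),
    ("the class was boring", -0.5), ("very very bad day", -1.3),
    ("it was fun", 0.6), ("never a great idea honestly", -0.45),
]
grid = {"n_before": [1, 3], "n_after": [0, 2], "amplifier_weight": [0.4, 0.8]}
best_params, cv_error = cross_validate(labelled, grid, k=3)
```

Selecting the parameters on the training folds and evaluating them on the held-out fold gives an estimate of how well the chosen setting generalizes to unseen sentences, which matters for conversational text whose syntax differs from the written text on which default values are usually calibrated.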

The sentiment extracted through these methods serves as an operationalization of participants’ latent sentiment scores. Despite the enormous interest in sentiment analysis, the existing literature lacks a systematic exploration of the validity and reliability of various text mining methods in operationalizing sentiments. In a subsequent phase of our research, we aim to address this gap by comparing these novel approaches with traditional survey methods in discerning individual sentiments.

As discussed earlier, both sentiment analysis and emotion detection are commonly used in NLP. We focused on sentiment analysis in this study. Future research can expand the scope of our study by using emotion detection to examine specific human emotion types, such as sadness, happiness, fear, or anger. Additionally, tree-based approaches could help identify covariates that predict these emotions. Alternatively, classification methods can be used to train models specifically designed for emotion detection. These extensions would provide a more comprehensive understanding of sentiment dynamics and emotional nuances in longitudinal conversation data.