1 Introduction

According to Boyd et al. (1982), in developed countries around 75% of all psychiatric admissions are young adults with depression. Suicide, which is closely related to untreated depression, is the fourth leading cause of death among young adults (World Health Organization, 2023). Moreover, traditional survey-based depression screening may be ineffective because of patients' cognitive bias: they may not truthfully reveal their depressive condition. There is therefore a pressing need for an effective, inexpensive, and near-real-time intervention for depression in this high-risk population. Social media is very popular among young adults, who use it to share their day-to-day activities, and the availability of social media services keeps growing year by year (O'Keeffe & Clarke-Pearson, 2011). Furthermore, research has found that depressed people who are otherwise socially aloof show increased use of social media platforms to share their daily struggles, connect with others who may have experienced the same, and seek help (Gowen et al., 2012; Naslund et al., 2014, 2016). In this research we therefore focus on identifying depression symptoms in a user's social media posts as one strategy for early identification of depression. Earlier research confirms that signs of depression can be identified in the language used in social media posts (Coppersmith et al., 2015; De Choudhury & De, 2014; De Choudhury et al., 2013; Losada & Crestani, 2016; Reece et al., 2017; Rude et al., 2004; Seabrook et al., 2018; Shen et al., 2017; Trotzek et al., 2018; Yadav et al., 2020; Yazdavar et al., 2017). On this basis, linguistic features extracted from social media posts, such as n-grams, psycholinguistic and sentiment lexicons, and word and sentence embeddings, can be very useful for detecting depression, especially compared to social media features that are not language specific, such as the social network structure of depressed users and their posting behavior. The majority of this prior work focused on public social media data, i.e., Twitter and Reddit mental health forums, for user-level depression detection because such datasets are relatively easy to access (unlike Facebook and other platforms with strict privacy policies). All of this work emphasized detecting signs of depression but lacked clinical depression modelling, which requires extensive effort in building a depression symptoms detection model (Sect. 4.2). Some earlier research (Ma et al., 2017; Mowery et al., 2016; Safa et al., 2022; Tlelo-Coyotecatl et al., 2022; Yazdavar et al., 2017; Yadav et al., 2020) has focused on depression symptoms detection, but it does not attempt to create a clinician-annotated dataset and later expand it with existing state-of-the-art language models. None of the previous research attempts to curate a possible-depression candidate dataset from self-disclosed depressed users' timelines. The main motivation of this work therefore arises from the following:

1. Clinician-annotated dataset creation from depressed users' tweets: leveraging our existing datasets from self-disclosed depressed users and a trained Depression Post Detection (DPD) model (a binary model for detecting signs of depression), we curate a clinician-annotated dataset for depression symptoms. This is a more "in-situ" approach for harvesting depression-symptom posts than crawling tweets with depression-symptom keywords, as done in most of the earlier literature (Mowery et al., 2016, 2017). We call it in-situ because it respects the natural distribution of depression symptom samples found in self-disclosed depressed users' timelines. Although Yadav et al. (2020) also collected samples in situ, our clinician-annotated dataset is much bigger and its annotation is more rigorous (Sect. 5.1).

2. Gather more data that reflects clinical insight: starting from the small dataset created in (1) and a Depression Symptoms Detection (DSD) model trained on it, we iteratively harvest more data and retrain our model for the DSD task.

Our dataset, made of both clinician-annotated and harvested tweets carrying signs of depression symptoms, is, to the best of our knowledge, the largest of its kind.

2 Methodology

To achieve the goals mentioned above, we divide our depression symptoms modelling into two parts: (1) Clinician-annotated dataset curation: we first propose a process to create our annotation candidate dataset from our existing depressive tweets from self-disclosed depressed Twitter users; we then annotate this dataset with the help of a clinician, among other annotators, which achieves our first goal (Sect. 3). (2) Semi-supervised Learning (SSL): we describe how we leverage that dataset to learn our first DPD and DSD models and eventually make them robust through iterative data harvesting and retraining, i.e., SSL (McClosky et al., 2006) (Sect. 4).

3 Datasets

We create the Depression-Candidate-Tweets dataset from the timelines of depressed users in the IJCAI-2017 dataset (Shen et al., 2017), who disclosed their depression condition through a self-disclosure statement such as "I (am / was / have) been diagnosed with depression", and in the UOttawa dataset (Jamil et al., 2017), where annotators verified the users' ongoing depression episodes. We then filter it with a DPD model (discussed in Sect. 3.1) for depressive tweets and create the Depressive Tweets Repository (DTR), which is used in our SSL process to harvest in-situ tweets for depression symptoms. We also set aside a portion of the DTR for clinician annotation of depression symptoms (Fig. 3).

3.1 Clinician annotated dataset curation

In the overall DSD framework, depicted in Fig. 1, we are ultimately interested in creating a robust DPD model and a robust DSD model, initially trained on human-annotated samples and called the "DPD-Human" and "DSD-Clinician" models, as depicted in Fig. 2. The suffixes in these model names indicate the annotation source: "Human" indicates that the model leverages annotated samples from both non-clinicians and clinicians; "Clinician" indicates that the model leverages samples for which the clinician's annotation is given more weight (more explanation is provided in Sect. 3.4). At the beginning of this process, we have only a small human-annotated dataset for depression symptoms augmented with depression posts from external organizations (i.e., the D2S (Yadav et al., 2020) and DPD-Vioules (Vioulès et al., 2018) datasets), no clinician-annotated depression symptom samples, and a large dataset from self-disclosed depressed users (i.e., the IJCAI-2017 dataset). We take the following steps to create our first clinician-annotated depression symptoms dataset and the DTR, which we will later use for our SSL.

Fig. 1 DSD modelling algorithm

Fig. 2 Semi-supervised learning process at a high level

1. We start the process with the help of a DPD model, which we call the DPD Majority Voting model (DPD-MV). It consists of a group of DPD models (Farruque et al., 2019), each leveraging pre-trained word embeddings (both augmented (ATE) and depression-specific (DSE)) or sentence embeddings (USE) and further trained on a small set of human-annotated depressive tweets, together with a Zero-Shot Learning (ZSL) model (USE-SE-SSToT). The ZSL model determines the semantic similarity between a tweet and all possible depression symptom descriptors and returns the top-k corresponding labels, each with a score based on cosine distance. More details are provided in a previous paper (Farruque et al., 2021). The DPD-MV model then takes the majority vote of these models to detect depressive tweets (a minimal sketch of this voting and label-scoring scheme is given after this list).

2. We then apply DPD-MV to the set of tweets collected from depressed users' timelines (Depression-Candidate-Tweets, Fig. 3) to filter out control tweets. The resulting samples, after applying DPD-MV, are referred to as the Depression Tweet Repository (DTR). We later set aside a portion of this dataset, i.e., 1500 depressive tweets, for human annotation; we call this the DSD-Clinician-Tweets dataset. Details of the annotation process are described in Sect. 3.4.

3. We train our first DSD model using this dataset, then use that model to harvest more samples from the DTR. An outline of the DTR and DSD-Clinician-Tweets curation process is provided in Fig. 3. We describe the details of this process in Sect. 4.2 and each of its building blocks in the next sections. Table 1 describes the relevant datasets.

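The following is a minimal sketch of the two mechanisms described in step 1: majority voting over several binary DPD classifiers and cosine-distance-based label scoring for the ZSL component. The classifier objects (assumed to expose a scikit-learn-style `predict`), the embedding vectors, and the helper names are hypothetical placeholders, not the released implementations of DPD-MV or USE-SE-SSToT.

```python
import numpy as np

def dpd_majority_vote(tweet_vec, dpd_models):
    """Label a tweet as depressive (1) or control (0) by majority vote over a
    list of binary DPD classifiers (hypothetical scikit-learn-style objects)."""
    votes = [int(m.predict([tweet_vec])[0]) for m in dpd_models]
    return int(sum(votes) > len(votes) / 2)

def zsl_top_k_labels(tweet_emb, label_embs, k=3):
    """Return the top-k symptom labels for a tweet embedding, scored by cosine
    distance to each symptom-descriptor embedding (smaller = more similar).
    `label_embs` maps a label name to its descriptor embedding vector."""
    def cosine_distance(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = {lab: cosine_distance(tweet_emb, emb) for lab, emb in label_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])[:k]
```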
Fig. 3 DSD-Clinician-Tweets and DTR curation process

Table 1 Dataset

3.2 Annotation task description

Our annotation task consists of labelling a tweet with (1) one or more of 10 symptoms of depression (see the next section), (2) No Evidence of Depression (NoED), (3) Evidence of Depression (ED), or (4) Gibberish. We have 10 labels instead of the traditional nine depression symptom labels because we split the symptom "Agitation / Retardation" into two categories so that our model can learn and distinguish these labels separately, unlike previous research (Yadav et al., 2020). NoED indicates the absence of any depression symptom expressed in a tweet. ED indicates that multiple depression symptoms are expressed in a tweet in a way that makes it hard to pinpoint the specific combined symptoms. Gibberish is a tweet that is less than three words long and, as a result of crawling or data pre-processing, is incomplete, so that no meaningful context can be inferred. A sketch of this label space appears below.
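For concreteness, the full label space can be written out as below; the ten symptom names are those reported with the agreement analysis in Sect. 3.4, while the Python constant names are ours.

```python
# Ten depression symptom labels (the "Agitation / Retardation" item is split
# into two), plus the three non-symptom labels used in annotation.
SYMPTOM_LABELS = [
    "Low mood",
    "Anhedonia",
    "Weight change",
    "Change in Sleep Patterns",
    "Agitation",
    "Retardation",
    "Fatigue",
    "Feelings of Worthlessness",
    "Indecisiveness",
    "Suicidal thoughts",
]
OTHER_LABELS = [
    "No Evidence of Depression (NoED)",
    "Evidence of Depression (ED)",
    "Gibberish",
]
```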

3.3 Annotation guideline creation

To create the annotation guideline for the task, we analyze the textual descriptions of depression symptoms from most of the major depression rating scales, such as PHQ-9, CES-D, BDI, MADRS, and HAM-D (The classification of depression, 2010). We also use DSM-5 as our reference for symptom descriptions. Based on the symptom descriptions from these resources and several meetings with our clinicians, we consolidate some of the most confusing tweet samples from the DTR and map them to one or more of those depression symptoms. We then create an annotation guideline with a clear description of the clinical symptoms of depression that an annotator should look for in the tweets, followed by relevant tweet examples, including the previously noted confusing ones. We then separate a portion of 1500 samples from our DTR and provide it to the annotators along with our annotation guideline. During the annotation, we randomly assign a set of tweets multiple times to calculate test-retest reliability, and find that annotators label the repeated tweets consistently, with a test-retest reliability of 83%. Our detailed guideline description is provided in Appendix 3.

3.4 Depression symptoms annotation process

We provide a portion of 1500 tweets from the DTR for depression symptoms annotation by four annotators. Among these annotators, two have a clinical understanding of depression: one is a practicing clinician and the other has a Ph.D. in Psychiatry. Our annotation process is based on the clinical understanding of depression as outlined in our guidelines. We take a majority vote to assign a label to each tweet. In the absence of a majority, we assign a label based on the clinician's judgment, if present; otherwise, we do not assign a label to that tweet. We call this scheme Majority Voting with Clinician Preference (MVCP); a minimal sketch of the rule follows. Table 2 reports the average Cohen's kappa scores for each label and for Annotator-Annotator, Annotator-MVCP, and All pairs (i.e., the average over both of the previous schemes). Throughout the paper, by kappa score we mean Cohen's kappa score.
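A minimal sketch of the MVCP rule, assuming each tweet's annotations arrive as (annotator_id, label) pairs and that the clinician's annotator id is known; "majority" is interpreted here as a strict plurality, which is one possible reading of the scheme.

```python
from collections import Counter

def mvcp_label(annotations, clinician_id):
    """Majority Voting with Clinician Preference (MVCP).
    `annotations`: list of (annotator_id, label) pairs for one tweet.
    Returns the majority label; if there is no clear winner, falls back to the
    clinician's label when available, otherwise None (tweet left unlabelled)."""
    counts = Counter(lab for _, lab in annotations).most_common()
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]                      # clear majority winner
    clinician = [lab for aid, lab in annotations if aid == clinician_id]
    return clinician[0] if clinician else None   # clinician preference / unlabelled
```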

We observe fair to moderate kappa agreement scores (0.38–0.53) among our annotators across all labels. "Suicidal thoughts" and "Change in Sleep Patterns" are the labels with the highest inter-annotator agreement, and the agreement between each annotator and MVCP is substantial for these labels. Among the annotators, the labels ranked in descending order of agreement score are: Suicidal thoughts, Change in Sleep Patterns, Feelings of Worthlessness, Indecisiveness, Anhedonia, Retardation, Weight change, NoED, Fatigue, Low mood, Gibberish, Agitation, and ED. With MVCP, however, we find moderate to substantial agreement (0.56–0.66). Across all labels and annotators, the global inter-annotator agreement (Krippendorff's alpha) is 0.3064.
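The agreement statistics reported above can be reproduced along the following lines; this sketch assumes scikit-learn and the third-party krippendorff package, with per-label annotations binarized and the overall labels encoded as integers (one row per annotator).

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score
import krippendorff  # pip install krippendorff

def pairwise_kappa(binary_labels):
    """Average pairwise Cohen's kappa for one label.
    `binary_labels`: array of shape (n_annotators, n_tweets) with 0/1 entries."""
    pairs = combinations(range(binary_labels.shape[0]), 2)
    return np.mean([cohen_kappa_score(binary_labels[i], binary_labels[j])
                    for i, j in pairs])

def global_alpha(label_matrix):
    """Krippendorff's alpha over all annotators (rows) and tweets (columns).
    Labels are assumed to be encoded as integers, with np.nan for missing values."""
    return krippendorff.alpha(reliability_data=label_matrix,
                              level_of_measurement="nominal")
```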

Table 2 Pairwise kappa scores among annotators and MVCP for all the labels

3.5 Distribution analysis of the depression symptoms data

In this section, we analyse the symptom distribution of the D2S and DSD-Clinician-Tweets datasets. The DSD-Clinician-Tweets dataset contains 1500 tweets. We create a clean subset of it that holds clinicians' annotations and only tweets with depression symptoms, which we call DSD-Clinician-Tweets-Original (further detail is in Sect. 4.2.1). For D2S, we have 1584 tweets with different depression symptom labels. In Fig. 4, the top three most populated labels for the DSD dataset are "Agitation", "Feelings of Worthlessness", and "Low Mood". For the D2S dataset, "Suicidal Thoughts" is the most populated label, followed by "Feelings of Worthlessness" and "Low Mood", which also rank highly in DSD. We use the D2S dataset because its tweets were crawled from self-reported depressed users' timelines. Although its authors did not confirm whether these users also disclosed a depression diagnosis, they mention analysing the users' profiles to ensure that the users were going through depression. Since their annotation process is not as rigorous as ours, i.e., they did not develop an annotation guideline as described in the earlier section and their dataset may not consist entirely of self-disclosed depressed users, we had to further filter those tweets before we could use them. We therefore use DSD-Clinician-Tweets-Original for training our very first model in the SSL process, and later use that model to re-label D2S samples.

In Sect. 4.2.6, we report the distribution of the harvested data and another approach for increasing the sample size of the least populated labels.

Fig. 4 Sample distribution and ratio analysis across D2S and DSD datasets

4 Experimental setup and evaluation

Our experimental setup consists of iterative data harvesting and re-training of a DSD and a DPD model (Sect. 4.2); we observe their accuracy increase over each iteration as the initial dataset grows incrementally.

We report the results for each SSL step separately in the next sections. For the DSD task, which is a multi-class, multi-label problem, we report macro- and weighted-averaged Precision, Recall, and F1, along with label-wise Precision, Recall, and F1 scores. Macro-F1 averages the F1 scores of all labels equally, whereas weighted F1 assigns more weight to the labels with the most samples. For the DPD task, which is a binary classification problem, we report macro-averaged Precision, Recall, and F1 scores.
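Concretely, the reported multi-label scores can be computed with scikit-learn as sketched below, where y_true and y_pred are binary indicator matrices of shape (n_samples, n_labels); the variable and function names are ours.

```python
from sklearn.metrics import precision_recall_fscore_support

def dsd_scores(y_true, y_pred):
    """Macro, weighted, and per-label precision/recall/F1 for the multi-label DSD task.
    `y_true`, `y_pred`: binary indicator arrays of shape (n_samples, n_labels)."""
    macro = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
    weighted = precision_recall_fscore_support(y_true, y_pred, average="weighted", zero_division=0)
    per_label = precision_recall_fscore_support(y_true, y_pred, average=None, zero_division=0)
    return {"macro": macro[:3], "weighted": weighted[:3], "per_label": per_label[:3]}
```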

From our clinician-annotated dataset, we hold out a depression-symptom-stratified subset as the test set for the DSD task. For the DPD task, we hold out 10% of the DPD-Human train set as a test set. After each SSL step, we report accuracy on these test sets to evaluate the efficacy of that step for the DSD and DPD models, respectively (Tables 3 and 4).

Table 3 Datasets in step 1
Table 4 Model details in step 1

4.1 Data preprocessing

We perform the following preprocessing steps on all our Twitter datasets. We use NLTK (Footnote 1) for tokenizing our tweets and Ekphrasis (Footnote 2) for normalizing them; a rough sketch of these steps follows the list.

1. Lowercase each word.

2. Remove one-character words and digits.

3. Expand contractions; for example, "I've" becomes "I have".

4. Convert elongated words to their original form; for example, "Looong" becomes "Long".

5. Remove tweets with self-disclosure, i.e., any tweet containing the word "diagnosed" or "diagnosis".

6. Remove all punctuation except periods, commas, question marks, and exclamation marks.

7. Remove URLs.

8. Remove non-ASCII characters from words.

9. Remove hashtags.

10. Remove emojis.

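A rough, standalone approximation of the listed steps is given below using plain regular expressions; the paper itself relies on NLTK for tokenization and Ekphrasis for tweet normalization, so this sketch only illustrates the intent of each rule (the contraction map is abbreviated and the order of operations is our own).

```python
import re
import string

CONTRACTIONS = {"i've": "i have", "i'm": "i am", "can't": "cannot",
                "won't": "will not", "n't": " not"}  # abbreviated illustrative map
KEEP_PUNCT = {".", ",", "?", "!"}

def preprocess_tweet(text):
    """Apply the listed cleaning rules to a single tweet; returns None when the
    tweet should be dropped entirely (self-disclosure filter)."""
    text = text.lower()                                            # 1. lowercase
    if "diagnosed" in text or "diagnosis" in text:                 # 5. drop self-disclosures
        return None
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)             # 7. remove URLs
    text = re.sub(r"#\w+", " ", text)                              # 9. remove hashtags
    text = text.encode("ascii", "ignore").decode()                 # 8./10. strip non-ASCII, incl. emojis
    for short, full in CONTRACTIONS.items():                       # 3. expand contractions
        text = text.replace(short, full)
    text = re.sub(r"(\w)\1{2,}", r"\1", text)                      # 4. collapse elongations ("looong" -> "long")
    drop = "".join(p for p in string.punctuation if p not in KEEP_PUNCT)
    text = text.translate(str.maketrans("", "", drop))             # 6. keep . , ? ! only
    tokens = [t for t in text.split() if len(t) > 1 and not t.isdigit()]  # 2. drop 1-char words and digits
    return " ".join(tokens)
```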
4.2 Semi-supervised learning (SSL) framework

In our SSL framework, we iteratively perform data harvesting and retraining of our DSD model, a multi-label text classifier built on pre-trained Mental-BERT (Footnote 3); technical details of this model (i.e., the training hyper-parameters) are provided in Appendix 2. We find that the Mental-BERT-based DSD performs significantly better on the DSD task in terms of Macro-F1 and Weighted-F1 scores than base BERT models (Tables 5 and 6). In this section, we describe our SSL process step by step, the datasets used at each step, and the resulting models and/or datasets. A minimal model-construction sketch follows.
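A minimal sketch of how such a multi-label DSD classifier can be instantiated with the Hugging Face transformers library; the checkpoint ID mental/mental-bert-base-uncased is assumed to be the publicly released MentalBERT model, and the head configuration and 0.5 threshold shown here are illustrative assumptions, not the paper's exact training setup (see Appendix 2 for that).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_SYMPTOMS = 10  # the 10 depression symptom labels

tokenizer = AutoTokenizer.from_pretrained("mental/mental-bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "mental/mental-bert-base-uncased",
    num_labels=NUM_SYMPTOMS,
    problem_type="multi_label_classification",  # trains with BCE-with-logits loss
)
model.eval()

def predict_symptoms(tweets, threshold=0.5):
    """Return per-label probabilities and thresholded multi-label predictions.
    The 0.5 threshold is an assumption; the paper does not fix one here."""
    enc = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits)
    return probs, (probs >= threshold).int()
```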

Table 5 DSD-Clinician-1 (BERT based) model accuracy
Table 6 DSD-Clinician-1 (Mental-BERT based) model accuracy in step 1

All our steps are depicted in points 11–25 in Fig. 5 and described further below.

Fig. 5 Detailed SSL framework. Here, we show the interaction among our datasets and models. Datasets are shown as cylinders and models as rectangles. An arrow from a dataset to another dataset represents data subset creation; an arrow from a dataset to a model means the provision of training data for that model; and an arrow from a model to a dataset means the use of that model to harvest samples from that dataset. All the arrow heads are marked so that they can easily be referred to when describing a particular scenario in the SSL framework

4.2.1 Step 1: creating first DSD model

In this step, we focus on creating a training dataset and a test dataset selected from our clinician-annotated samples; both consist of tweets carrying at least one of the 10 depression symptoms. We use the training dataset to create our first DSD model, called DSD-Clinician-1, following the steps below.

1. We first apply MVCP to a subset of DSD-Clinician-Tweets and remove all tweets labelled "Gibberish," "Evidence of Depression" (ED), and "No Evidence of Depression" (NoED). We call the resulting dataset DSD-Clinician-Tweets-Original. Details of ED, NoED, and Gibberish are provided in Table 3.

2. We save the tweets labelled "Evidence of Depression," which we call DSD-Clinician-ED-Tweets (Arrow 8 in Fig. 5). We later use these to harvest depression-symptom-related tweets.

3. Next, we separate 70% of the tweets from the DSD-Clinician-Tweets-Original dataset to create the DSD-Clinician-Tweets-Original-Train dataset for training our first DSD model, DSD-Clinician-1; the remaining 30% are used as an SSL evaluation set, called DSD-Clinician-Tweets-Original-Test (Arrows 5 and 7 in Fig. 5). We use this evaluation set throughout our SSL process to measure whether SSL increases accuracy on the DSD task. We report the datasets created in this step in Table 3, the models in Table 4, and the accuracy scores for each label and their averages in Table 6. We also report the accuracy of the DPD-Human model in this step in Table 7.

    Table 7 DPD-Human model accuracy in step 1

4.2.2 Step 2: harvesting tweets using DSD-Clinician-1

In this step, we use the DSD-Clinician-1 model created in the previous step to harvest tweets that carry signs of depression symptoms from a set of DTR tweets that the DPD-Human model has already filtered for signs of depression; we call this dataset DSD-Harvest-Candidate-Tweets (Arrows 10 and 12 in Fig. 5). Our DPD-Human model is trained on all available human-annotated datasets, i.e., DSD-Clinician-Tweets-Original, D2S, and an equal number of control tweets from the DTR (Arrows 6 and 9 in Fig. 5; more dataset details in Table 4). We use this model to bring human insight into further filtering the DTR. In this step, we create two more datasets from DSD-Harvest-Candidate-Tweets: (1) Harvested-DSD-Tweets, which contains the tweet samples for which the model is confident, i.e., it detects one of the 10 depression symptoms; and (2) Harvested-DSD-Tweets-Less-Confident, which contains the tweet samples for which the model has no confident prediction, i.e., it does not predict any depression symptom (Table 8).
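The split into Harvested-DSD-Tweets and Harvested-DSD-Tweets-Less-Confident can be sketched as follows, given per-tweet, per-label probabilities from the DSD model; the confidence threshold is an assumption for illustration, since the paper does not state the exact cut-off.

```python
def split_harvest(tweets, probs, threshold=0.5):
    """Split candidate tweets into confidently labelled harvested samples and
    less-confident leftovers.
    `probs`: array-like of shape (n_tweets, n_labels) with per-label probabilities."""
    harvested, less_confident = [], []
    for tweet, p in zip(tweets, probs):
        labels = [i for i, score in enumerate(p) if score >= threshold]
        if labels:                       # model confidently predicts >= 1 symptom
            harvested.append((tweet, labels))
        else:                            # no confident symptom prediction
            less_confident.append(tweet)
    return harvested, less_confident
```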

Table 8 Datasets in step 2
Table 9 DSD-Clinician-1 model accuracy in step 2

4.2.3 Step 3: harvesting tweets using best ZSL Model

In this step, we use a ZSL model (USE-SE-SSToT), described in Farruque et al. (2021), to harvest tweets carrying signs of depression symptoms from DSD-Harvest-Candidate-Tweets. We chose this model because it has reasonable accuracy on the DSD task and it is fast. We also set a conservative threshold on the semantic similarity between a tweet and a label descriptor to reduce the number of false-positive tweets; a cosine-distance threshold < 1 is a reasonable choice because a cosine distance below 1 indicates higher semantic similarity. In this step, we create two datasets: (1) Only-ZSL-Pred-on-Harvested-DSD-Tweets (step 3a), which contains only the ZSL predictions on DSD-Harvest-Candidate-Tweets, and (2) ZSL-and-Harvested-DSD-Tweets (step 3b), which combines the ZSL predictions and the DSD-Clinician-1 predictions on DSD-Harvest-Candidate-Tweets. We follow steps 3a and 3b to compare whether the datasets produced through these steps improve accuracy after being used to retrain DSD-Clinician-1.
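The construction of the step 3a and 3b datasets can be sketched as below, reusing the hypothetical zsl_top_k_labels helper from Sect. 3.1; "combination" is interpreted here as a per-tweet union of the two label sets, which is one possible reading of the text.

```python
def zsl_harvest(scored_labels, max_distance=1.0):
    """Step 3a: keep ZSL labels whose cosine distance falls below the threshold.
    `scored_labels`: list of (label, cosine_distance) pairs for one tweet,
    e.g. the output of the zsl_top_k_labels sketch in Sect. 3.1."""
    return {lab for lab, dist in scored_labels if dist < max_distance}

def combine_predictions(zsl_labels, dsd_labels):
    """Step 3b: union of the ZSL labels and DSD-Clinician-1 labels for one tweet."""
    return set(zsl_labels) | set(dsd_labels)
```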

Compared to step 1 (Table 6), we achieve a 4% gain in Macro-F1 and a 5% gain in Weighted-F1 using the combined dataset in step 3b (Table 10). We achieve a 1% gain in both measures using only Harvested-DSD-Tweets in step 2 (Table 9). With ZSL only in step 3a (Table 11), we lose 3% in Macro-F1 and 15% in Weighted-F1. We also describe the produced datasets in Table 12.

Table 10 DSD-Clinician-1 model accuracy in step 3b
Table 11 DSD-Clinician-1 model accuracy in step 3a
Table 12 Datasets in step 3

4.2.4 Step 4: creating a second DSD model

Building on the previous experiments, we create our second DSD model by retraining on DSD-Clinician-Tweets-Original-Train and ZSL-and-Harvested-DSD-Tweets. This results in our second DSD model, DSD-Clinician-2 (Table 13).

Table 13 Model details in step 4

4.2.5 Step 5: creating final DSD model

In this final step, we do the following:

1. We create a combined dataset from D2S and DSD-Clinician-ED-Tweets, which we call the DSD-Less-Confident-Tweets dataset (Arrows 15, 16, 17, 20 in Fig. 5). D2S tweets are included here because that dataset was annotated externally with a weak clinical annotation guideline; we use our model to further filter it.

2. We use the DSD-Clinician-2 model and ZSL to harvest depression-symptom tweets from DSD-Less-Confident-Tweets; we call the result Harvested-DSD-from-Less-Confident-Tweets. Finally, with this harvested data and the datasets used to train DSD-Clinician-2, we create our final dataset, Final-DSD-Clinician-Tweets, and by training on it we learn our final DSD model, Final-DSD-Clinician. We also retrain our DPD-Human model to create the Final-DPD-Human model. Datasets, models, and the relevant statistics are reported in Tables 14, 15, 16 and 17. We reported the symptom distribution for our DSD-Clinician-Tweets-Original-Train dataset earlier; here we report the depression symptom distribution in the SSL-harvested datasets only (ZSL-and-Harvested-DSD-Tweets + Harvested-DSD-from-Less-Confident-Tweets) (Fig. 6). The sample size of all labels generally increases while reflecting almost the same distribution as our DSD-Clinician-Tweets-Original-Train dataset. Interestingly, data harvesting increases the sample sizes of "Feelings of Worthlessness" and "Suicidal thoughts" while still maintaining the distribution of our original clinician-annotated dataset (DSD-Clinician-Tweets-Original-Train) (Fig. 6). We also report the top-10 bi-grams for each symptom in our Final-DSD-Clinician-Tweets dataset in Table 18 (a bi-gram extraction sketch follows this list); the top bi-grams convey the concepts of each symptom.

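The per-symptom top-10 bi-grams in Table 18 can be extracted along these lines; the counting scheme (whitespace tokens over the preprocessed tweets) is an assumption.

```python
from collections import Counter, defaultdict

def top_bigrams_per_symptom(labelled_tweets, k=10):
    """`labelled_tweets`: iterable of (tweet_text, symptom_label) pairs, with
    tweets already preprocessed. Returns {symptom: [(bigram, count), ...]}."""
    counters = defaultdict(Counter)
    for text, symptom in labelled_tweets:
        tokens = text.split()
        counters[symptom].update(zip(tokens, tokens[1:]))  # consecutive token pairs
    return {sym: cnt.most_common(k) for sym, cnt in counters.items()}
```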
Fig. 6 Sample distribution in harvested dataset vs original clinician annotated dataset

Table 14 Datasets in step 5
Table 15 Model details in step 5
Table 16 Final-DSD-Clinician model accuracy in step 5
Table 17 Final-DPD-Human model accuracy in step 5
Table 18 Top-10 bi-grams for each symptom in the Final-DSD-Clinician-Tweets dataset; bolded bi-grams occur exclusively in the corresponding symptom

4.2.6 Step 6: combating low accuracy for less populated labels

Here we attempt to combat the low accuracy for labels with very small sample sizes. We analyze the co-occurrence of those labels with other labels using an association rule mining (Apriori) algorithm (Agrawal et al., 1994). The idea is to use significant co-occurring labels and artificially predict one label whenever the other occurs. For that, we analyze our small human-annotated train dataset (DSD-Clinician-Tweets-Original-Train). However, since the support and confidence of the association rules are not significant due to the small sample size, we consider all "strong" rules with non-zero support and confidence scores for those labels. The rules have the form (strong-label \(\rightarrow\) weak-label), where a weak label (such as Anhedonia, Fatigue, Indecisiveness, or Retardation) is a label for which our model achieves either a zero F1 score or very low recall, i.e., at or below chance level; these are the candidate labels whose accuracy we would like to increase. Strong labels, on the other hand, are those for which recall is at least beyond chance level. By emphasizing high recall, we intend to prevent depression symptoms from going undetected by our model. All the extracted strong rules are provided in Appendix 1, and a rule-mining sketch is given below. When we compare the sample distribution of Apriori-based harvested data and plain harvested data, we see more samples for the least populated classes (Fig. 7), which makes the classification task more sensitive to weak labels. However, with this method we do not achieve a better Macro-F1 score than our Final-DSD-Clinician model (Table 19).
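A sketch of the rule-mining step using the mlxtend implementation of Apriori; the near-zero support and confidence thresholds mirror the "non-zero support and confidence" criterion above, while the one-hot input format and the restriction to single-antecedent rules are our own assumptions.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

WEAK_LABELS = {"Anhedonia", "Fatigue", "Indecisiveness", "Retardation"}

def strong_to_weak_rules(label_onehot: pd.DataFrame):
    """`label_onehot`: one row per tweet, one boolean column per symptom label
    (e.g. DSD-Clinician-Tweets-Original-Train). Returns rules of the form
    strong-label -> weak-label with non-zero support and confidence."""
    frequent = apriori(label_onehot, min_support=1e-6, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=1e-6)
    # keep single-antecedent, single-consequent rules pointing at a weak label
    mask = rules.apply(
        lambda r: len(r["antecedents"]) == 1
        and len(r["consequents"]) == 1
        and next(iter(r["consequents"])) in WEAK_LABELS
        and next(iter(r["antecedents"])) not in WEAK_LABELS,
        axis=1,
    )
    return rules[mask][["antecedents", "consequents", "support", "confidence"]]
```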

Table 19 Final-DSD-Clinician model with applied label association rules accuracy in step 6
Fig. 7 Sample distribution in Apriori harvested dataset vs plain harvested dataset

4.2.7 Stopping criteria for SSL

The following two observations lead us to stop the SSL:

1. Our DTR consists of a total of 6077 samples and we have finally harvested 4567 samples, so for the remaining \((6077-4567)=1510\) samples neither the ZSL model nor any version of the DSD models makes a prediction. Having exhausted all depression candidate tweets from all our sources, we have no more depression symptom candidate tweets with which to continue SSL.

2. We have another very noisy dataset, IJCAI-2017-Unlabelled (Shen et al., 2017), which contains tweets from possibly depressed users, i.e., their self-disclosures contain the stem "depress" but it is not verified whether these are genuine self-disclosures of depression. Using our Final-DSD-Clinician model, we harvest \(\approx 22K\) depression symptom tweets from the \(\approx 0.4M\) depression candidate tweets identified by the Final-DPD-Human model in that dataset. We then retrain the Final-DSD-Clinician model on all previously harvested samples combined with the newly harvested \(\approx 22K\) tweets, a total of \(\approx 26K\) tweets (\(\approx 6\) times more than the samples the Final-DSD-Clinician model was trained on). However, we do not see any significant accuracy increase, so we do not proceed (Table 20).

Table 20 DSD-Clinician model trained on IJCAI-2017-Unlabelled and all the harvested datasets

5 Results analysis

Here we analyse the efficacy of our SSL framework along the following dimensions:

5.1 Dataset size increase

Through the data harvesting process, we increase our initial 377 clinician-annotated samples to 4567 samples, roughly 12 times our initial dataset. In addition, we have access to an externally collected dataset (D2S), of which we could access \(\approx 1800\) samples; our final dataset is more than double that size.

5.2 Accuracy improvement

Our Final-DSD-Clinician model has a Macro-F1 score of 45%, which is 14% higher than that of our initial model, and its Weighted-F1 score increased by 5%, from 51% to 56% (Table 21). The substantial gain in Macro-F1 indicates the efficacy of our data harvesting in increasing F1 scores across all labels. We also find that combining the DSD-Clinician-1 and ZSL models in step 3b yields higher accuracy than either alone; in particular, using only ZSL-harvested data for training is not ideal. Weighted-F1 grows slowly and does not increase after step 3b. We also find that the combined harvesting process on D2S samples helps achieve further accuracy gains in a few classes for which D2S had more samples, such as "Fatigue," "Weight Change," and "Suicidal Thoughts."

Table 21 Summary of accuracy improvements (DSD and DPD correspond to DSD-Clinician and DPD-Human models)

5.3 Linguistic components distribution

In Table 18, we see that our harvested dataset contains important clues about depression symptoms. Interestingly, some bi-grams, such as "feel like", occur under most of the labels; this signifies the frequent usage of that bi-gram in various language-based expressions of depression symptoms and shows a pattern in how people describe their depression.

5.4 Sample distribution

Compared with the original clinician-annotated dataset distribution (Fig. 6), we see similar trends in our harvested dataset, i.e., in Final-DSD-Clinician-Tweets. However, instead of "Agitation" we have somewhat more samples of "Feelings of Worthlessness", although these are not surpassed by "Suicidal thoughts" as in the D2S dataset. Moreover, "Suicidal thoughts" samples also have a strong presence, which is the result of integrating the D2S dataset into our harvesting process. Since the majority of our samples come from self-disclosed users' tweets, and we apply our DSD model trained on that data to the D2S dataset to harvest tweets, our final harvested dataset mainly reflects the symptom distribution of self-disclosed depressed users. However, D2S has some impact, resulting in more samples for the most populated labels of the final harvested dataset.

5.5 Data harvesting in the wild

We use our final model on a bigger set of very loosely related data, but we do not see any increase in accuracy, which suggests that harvesting from irrelevant data is of no use (Sect. 4.2.7).

6 Limitations

1. Our overall dataset size is still small, i.e., for some labels we have very little data for both training and testing.

2. In the iterative harvesting process we do not employ continuous human annotation or a human-in-the-loop strategy, since this process requires several such cycles and involving experts in such a framework is very expensive.

7 Conclusion

We have described a Semi-supervised Learning (SSL) framework, more specifically semi-supervised co-training, for gathering depression symptom data in situ from self-disclosed depressed users' Twitter timelines. We articulate each step of our data harvesting and model retraining process. We also discuss the integration of Zero-Shot Learning models in this process and their contribution. We show that each of these steps provides moderate to significant accuracy gains. We discuss the effect of harvesting from the samples of an externally curated dataset, and we also try harvesting samples in the wild, i.e., from a large noisy dataset, with our Final-DSD-Clinician model. In the former case, we find good improvement in the Macro-F1 score; in the latter, we see no improvement, indicating that there is room for further progress on those samples. Finally, we discuss the effect of our SSL process for curating small but distributionally relevant samples, through both the sample distribution and the bi-gram distribution for all labels.