Introduction

The COVID-19 virus was first detected on December 31, 2019, and subsequently spread rapidly across the world. The World Health Organization (WHO) declared it a pandemic on March 11, 2020. As of March 13, 2023, the virus had infected people in 216 countries, with over 759.41 million confirmed cases and over 6.87 million confirmed deaths1. In response to the pandemic, educational facilities in 190 countries were closed, and many governments issued flight bans and stay-at-home orders, affecting people worldwide2.

Prior to the COVID-19 outbreak, an estimated 380 million people worldwide, of all ages, were affected by mental health issues. Previous studies have shown that mental health problems can lead to harmful outcomes such as suicide3,4. However, these studies face two major challenges. Firstly, many individuals with mental health problems are hesitant or ashamed to seek help5. Secondly, obtaining and analyzing a large sample size of diagnosed individuals can be difficult in psychological research.

Numerous studies have investigated the economic and social impacts of COVID-196,7. In addition, various investigations have revealed that the mental health of people around the world has been greatly affected by the COVID-19 outbreak. These studies have reported higher rates of depression, anxiety, PTSD, and stress symptoms during the pandemic than before8. While stay-at-home orders and social distancing measures have been effective in preventing the spread of COVID-19, as suggested by previous research, they can also have negative effects on individuals’ mental health9,10,11.

Social media data can provide valuable insights into physical and mental health concerns. Often, social media users are unaware of changes in their own health12. Research has demonstrated that searching for information about certain health problems can reveal early-warning signs of hard-to-detect tumors13. Social media platforms have also been utilized to track outbreaks of illness and monitor regional nutrition14,15. Predictive screening algorithms have been successful in identifying signals in social media data for a variety of mental health conditions, including addiction16, suicide ideation17, depression18,19,20, Post-Traumatic Stress Disorder (PTSD)21 as well as physical ailments. However, the use of predictive health screening through social media data is still in its early stages, and significant modifications are needed to establish approaches that can effectively supplement health treatment.

Motivation

Depression is a prevalent mental health condition that affects a wide range of behaviors and communication patterns, making it a major concern among computational social scientists22,23,24. Despite its prevalence, depression is often underdiagnosed, with studies reporting that up to 45% of cases in some major metropolitan regions remain undiagnosed25. Post-traumatic stress disorder (PTSD) is less common than depression, but it is often associated with significant levels of depression26. Primary-care physicians frequently underdiagnose or undertreat PTSD27, highlighting the potential benefits of computational approaches for early screening and diagnosis of depression and PTSD12.

Given these concerns, we aim to explore how COVID-19 has impacted people’s mental well-being and determine what percentage of individuals have been affected. Moreover, our objective is to create a machine learning model that can classify individuals as having PTSD or not, based on their exposure to COVID-19, with a high degree of accuracy.

Research contributions

The present study makes the following four contributions:

  • A dataset of more than 3.96 million tweets has been constructed from users who mentioned on their Twitter timelines that they were COVID-positive at some point between March 2020 and November 2021.

  • The resulting dataset has been filtered and manually annotated following the International Statistical Classification of Diseases, 11th Revision (ICD-11)28 guidelines.

  • The proportion of users who were PTSD positive or negative was quantified based on the data filtration criteria, giving a better understanding of users’ posting behavior after they were diagnosed with COVID.

  • Finally, a machine learning based classification model has been proposed to effectively classify the tweets of users as either PTSD positive or negative.

The rest of the paper is organized as follows. Section “Post-traumatic stress disorder (PTSD)” discusses PTSD and its diagnosis along with the guidelines adopted for data filtering and annotation. Section “Literature review” surveys the state of the art on the topic, while section “Methodology” explains the proposed methodology for the study along with a brief description of our chosen classification algorithms. In section “Data extraction”, we discuss our approaches to data extraction, filtration, and annotation along with our findings based on the data. Finally, in section “Conclusion”, we summarize our findings and outline our future directions.

Post-traumatic stress disorder (PTSD)

PTSD is a type of anxiety disorder that can develop in individuals who have experienced a traumatic event, such as a car accident, war, physical, emotional, or sexual abuse, a natural disaster, or any other life-altering experience that impacts their biological or psychological state. The WHO and the American Psychiatric Association (APA) both recognize PTSD as a legitimate condition, and diagnostic criteria are provided in the International Statistical Classification of Diseases and Related Health Problems (ICD) and the Diagnostic and Statistical Manual of Mental Disorders (DSM)29.

PTSD is a conglomerate of symptoms affecting multiple domains and is described as “the complex somatic, cognitive, affective, and behavioral effects of psychological trauma”30. Given the lack of physical symptoms in most cases of PTSD and the stigma attached to mental illness, people are often diagnosed only after months of struggling with the condition. The absence of a blood test or imaging test that could diagnose PTSD right away is a further barrier to effective treatment being offered at an earlier stage. People struggling with PTSD are identified late; they mostly come to light when they start to struggle at work, have difficulties in their relationships with others, or become addicted to drugs or alcohol in an attempt to self-medicate and numb their symptoms. Once contact is made with a psychiatrist, a thorough history of the traumatic event, the symptoms related to it, and in many cases collateral history is necessary to make the right diagnosis.

A cross-sectional study carried out on nurses exposed to COVID in China found the incidence of PTSD to be 16.8%, with the highest scores in avoidance symptoms31. Our aim in this study is to shorten this lengthy process of diagnosing PTSD by recognizing, through their tweets, those who have had COVID and might be suffering from PTSD. By identifying this population and predicting that they might have PTSD, they can be offered proper evaluation and optimal treatment.

We acknowledge the complexity of post-traumatic stress disorder (PTSD) and recognize that its diagnosis extends beyond language patterns alone. Factors such as context and personal history are integral to understanding and assessing PTSD. It’s important to emphasize that the severity and impact of the traumatic incident also play a significant role in determining the presence of PTSD symptoms.

PTSD, classified as an anxiety disorder, often emerges following exposure to a traumatic event, which may involve actual or threatened death. We posit that COVID-19 presents a potential trigger for PTSD due to the profound trauma associated with the experience, coupled with the pervasive fear of mortality. Despite potential limitations, our computational analysis serves as a valuable screening tool to identify individuals at risk of developing PTSD. Notably, symptoms such as hyper-vigilance, anxiety, insomnia, flashbacks, and nightmares, consistently observed in our study, correlate with individuals’ encounters with COVID-19. Moreover, various factors, including witnessing severe illness or death, enduring prolonged isolation, fear of contagion, and uncertainty about the future, further contribute to this psychological response. While our study may not fully meet diagnostic criteria, it lays a crucial groundwork for screening populations susceptible to PTSD, thereby facilitating the development of tailored services for timely diagnosis and intervention.

Literature review

The field of mental health detection has been the focus of numerous studies utilizing various datasets and modeling techniques to develop reliable models for detecting mental health issues. In one such study by Joshi et al.32, a combination of deep learning and conventional machine learning algorithms was used to detect mental health issues through social media posts and behavioral features. The first stage of classification considered 13 behavioral features to classify users, while in the second stage a behavioral feature called DL_score was created using a word2vec model to classify tweets. The model was trained on nearly 12 million tweets for tweet classification and achieved an accuracy of 89%, with the deep learning feature extraction helping to accurately classify users as normal or non-normal while also reducing the false positive rate.

During the current pandemic, 36.6 million users posted almost 41.3 million COVID-19 related tweets in 202033. In that work, tweets containing keywords such as ‘corona’, ‘#Corona’, and ‘covid19’ were collected, together with users’ profile descriptions, to look for signs of depression. Among 2575 Twitter users, 200 were randomly selected from the set of users classified as depressed, and 86% were labeled as positive. Almost 1402 depressed users posted the selected tweets, within a three-month time span. Transformer-based models such as BERT, RoBERTa, and XLNet were applied to identify depressed users and monitor depression trends during COVID-19.

A study by Sekulic et al.34 proposed a Hierarchical Attention Network (HAN) for the detection of mental disorders. The model comprises a word sequence encoder, a word-level attention layer, a sentence encoder, and a sentence-level attention layer. Initially, users with a self-reported diagnosis of one of nine mental disorders were identified, and the model was trained on their posts, which were modeled as sentences. The HAN outperformed baseline models in detecting depression, anxiety, ADHD, and bipolar disorders, but performed inadequately for PTSD, autism, eating disorders, and schizophrenia. With the attention mechanism provided by the HAN, important words or phrases were easily identified and deemed relevant for classification.

Another study utilized lists of n-grams derived from the tweets of users diagnosed with depression or PTSD, which were used to train a classifier to rank tweets of other users as positive or negative for depression or PTSD35. The dataset consisted of tweets from 327 random Twitter users, of whom 246 reported a PTSD diagnosis and had at least 25 tweets. The tweets related to each condition were randomly selected, and the first eight million words of tweets were used in the training data. Features were selected based on their frequency of occurrence, with n-grams occurring 50 times more often in a single condition being included. This selective approach to feature selection helped to improve the results and provide greater insight into the identification of mental illness via social media posts.

To identify individuals who may be experiencing depression, a group of Twitter users who had self-reported their diagnosis of depression via tweets were selected using the Twitter streaming API36, with regular expressions and data acquisition techniques applied over a four-month period. To ensure a balanced dataset, the authors selected an equal number of positive and negative instances, representing depressive and non-depressive tweets respectively, from 600 randomly selected users. Emotion features were extracted and strength scores were assigned to create emotion-based features, while time-series analysis was applied and descriptive statistics were selected as temporal features. The resulting model achieved an accuracy of 87.27% on emotion features alone, outperforming baseline models18,37,38. With different temporal features, the accuracy improved to 89.77%, and when emotion and temporal features were combined, the accuracy increased to 91.81%. These findings suggest that basic emotions can be used to identify individuals who may be experiencing depression on Twitter.

To conduct their research, the authors utilized a widely recognized dataset in the fields of computational linguistics and clinical psychology39, which comprised three types of Twitter users: those who self-reported a diagnosis of depression, those who self-reported having PTSD, and a control group of users matched in terms of demographics40. The dataset consisted of 3000 tweets, which were manually reviewed to eliminate irrelevant information. The authors then conducted a qualitative analysis to identify instances of misclassification in their approach, discovering that some false positives arose from language displaying anger or frustration, while others were linked to music, bands, or artists associated with the positive class. The authors emphasized the limitations of such machine learning systems and the importance of not relying solely on automated classifiers to determine an individual’s mental health status on social media platforms.

Table 1 Comparison of results with previous studies.

In another study by43, the classification of mental illness from social media texts using deep learning and transfer learning was investigated. The authors aimed to develop a machine learning model to identify the presence of mental illness in text data from social media platforms. The model was trained on a dataset of social media texts annotated for mental illness and evaluated using multiple metrics. The results showed that the transfer learning approach outperformed traditional deep learning methods in terms of accuracy in classifying mental illness in social media texts. This study highlights the potential of deep learning and transfer learning for mental health screening and intervention through social media platforms.

A number of studies have been conducted using machine learning algorithms to predict PTSD and depression in various populations. For example, Reece et al. used RF algorithm to analyze 243,000 Twitter posts related to PTSD and achieved an AUC score of 0.89 in predicting the disorder12.

Another study conducted by Leightley et al. focused on identifying PTSD among military personnel in the UK by applying machine learning techniques, achieving an accuracy of 97% with RF44. Papini et al. used gradient-boosted decision trees to predict PTSD in 110 patients with the disorder and 231 trauma-exposed controls, achieving an accuracy of 78%48. Similarly, Conrad et al. applied RF using Conditional Inference (RF-CI) and the Least Absolute Shrinkage and Selection Operator (LASSO) to predict PTSD in survivors of a civil war in Uganda, with RF achieving the highest accuracy of 77.25%45.

Marmar et al. used RF to predict PTSD with an accuracy of 89.1% with an AUC of 0.954 from audio recordings of warzone-exposed veterans46. Vergyri et al. used Gaussian backend (GB), decision trees (DT), neural network (NN) classifiers, and boosting to predict PTSD from audio recordings of war veterans, obtaining an overall accuracy of 77%47.

According to42, a noteworthy investigation was conducted to detect PTSD among cancer survivors using Twitter data. The researchers utilized a convolutional neural network (CNN) to learn the representations of the input tweets containing the keywords “cancer” and “PTSD” to identify cancer survivors with PTSD. The results demonstrated that the proposed CNN was effective in detecting PTSD among cancer survivors and outperformed the baselines. The authors suggested that it is crucial to evaluate and treat PTSD in cancer survivorship care, and social media can act as an early warning system for PTSD in cancer survivors. The study emphasizes the importance of early detection and treatment of PTSD in cancer survivors.

Our research builds on earlier studies by focusing on PTSD in people who have survived COVID-19, aiming to better understand the psychological effects of the pandemic. To address the unmet mental health needs of these survivors, we have used a unique approach. As shown in Table 1, we compare our study to previous work to highlight how our investigation differs. Unlike other research that looks at various groups, we specifically analyze how COVID-19 has impacted mental health using information gathered from Twitter. Our method identified PTSD with an accuracy of 83.29%, proving to be a valuable tool in understanding how this global crisis affects mental well-being.

Methodology

After thoroughly reviewing state-of-the-art techniques, we have proposed a classification framework as shown in Fig. 1.

Figure 1
figure 1

Proposed methodology.

The stages of our proposed system are as follows: (i) data extraction and filtering, (ii) data annotation based on ICD-11 guidelines, (iii) preprocessing and splitting the data into training and test sets, (iv) feature extraction, and (v) training and evaluation of our ML model.

Data extraction

The first step of data extraction is to identify users who mentioned on Twitter that they were COVID-positive. We collected the data, including tweets, from Twitter using the official Twitter API for academic research with the search query “#Covidpositive OR #Covidsurvivor OR #CovidFree OR #CovidRecovered OR #ConqueredCovid OR #DefeatedCovid OR #OvercameCovid”. The data was collected from 01-March-2020 to 30-November-2021. We obtained 90,330 usernames that had posted tweets using any of these hashtags during this period, of which 70,646 were unique.
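The paper does not include the collection script itself; the snippet below is a minimal sketch, assuming the tweepy client for the academic-track Twitter API v2 full-archive search that was available at the time of the study. The bearer-token placeholder, page size, and field selection are illustrative assumptions rather than details of the original pipeline.

```python
# Hypothetical sketch: collecting tweets that self-report a positive COVID diagnosis
# via the Twitter API v2 full-archive search (academic access), using tweepy.
import tweepy

BEARER_TOKEN = "YOUR_ACADEMIC_BEARER_TOKEN"  # placeholder, not a real credential

QUERY = ("#Covidpositive OR #Covidsurvivor OR #CovidFree OR #CovidRecovered "
         "OR #ConqueredCovid OR #DefeatedCovid OR #OvercameCovid")

client = tweepy.Client(bearer_token=BEARER_TOKEN, wait_on_rate_limit=True)

usernames = set()
# Page through the full archive between March 2020 and November 2021.
for page in tweepy.Paginator(
        client.search_all_tweets,
        query=QUERY,
        start_time="2020-03-01T00:00:00Z",
        end_time="2021-11-30T23:59:59Z",
        expansions=["author_id"],
        user_fields=["username"],
        max_results=500):
    for user in (page.includes or {}).get("users", []):
        usernames.add(user.username)

print(f"Unique self-reporting users: {len(usernames)}")
```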

We applied the sample size (n) calculation formula provided by49, given below as Eq. (1), to the population (N) of 70,646.

$$\begin{aligned} n = \frac{N}{1 + N(e)^{2}} \end{aligned}$$
(1)

where e is the margin of error, which we chose to be 5%. By applying Eq. (1), we obtained \(\approx\) 177 users. We randomly chose 177 users from the previously extracted data and extracted the tweet timelines of these users over the aforementioned time period. We were able to extract 3,958,836 tweets (\(\approx\) 3.96 million), of which 2,155,577 (\(\approx\) 2.15 million) were in English. Furthermore, the model focused solely on the text content of the tweets for classification purposes, without relying on any demographic information. This data is visualized in Fig. 2 by both year and category.

Figure 2
figure 2

Tweets and engagement counts for English and other languages.

Table 2 Set of keywords to filter the tweets.

To filter the data, we used a set of keywords in line with the ICD-11 guidelines. The breakdown of tweets against each keyword is given in Table 2.
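As a minimal sketch of such keyword filtering, the snippet below matches tweets against a small illustrative subset of PTSD-related keywords; the full ICD-11-aligned set is the one reported in Table 2.

```python
import re

# Illustrative subset of ICD-11-aligned PTSD keywords; the full set is in Table 2.
KEYWORDS = ["flashback", "nightmare", "insomnia", "hypervigilant", "avoid", "anxiety"]

# One case-insensitive pattern with word boundaries around each keyword.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)

def matched_keywords(tweet_text: str) -> list[str]:
    """Return the PTSD-related keywords appearing in a tweet (empty list if none)."""
    return [m.lower() for m in pattern.findall(tweet_text)]

# Example: keep only tweets that mention at least one keyword.
tweets = ["Still having nightmares weeks after testing positive",
          "Lovely weather today"]
filtered = [t for t in tweets if matched_keywords(t)]
```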

To further understand the posting behavior of users about previously mentioned PTSD categories, we have visualized the flow of users, in Fig. 3, across three intervals of seven months each from our selected timeline.

Figure 3
figure 3

Flow of users across PTSD categories over time.

The intervals include data from (i) March 2020–September 2020, (ii) October 2020–April 2021, and (iii) May 2021–November 2021. We see a large fraction of users contributing to the Avoidance category, with the second largest contribution to Non PTSD Tweets, i.e., tweets that did not belong to any of the previously mentioned categories. In fact, these two remain the only categories that most users posted about, and users continued posting about either of them. The only small presence is Hyperarousal in the second interval. Notably, a large fraction of users remained in their respective categories across all intervals; however, a small number of users switched categories in the second interval and approximately the same number returned to their initial category. We can say that the majority of users were found not to have PTSD symptoms after they were diagnosed with Covid.

Figure 4
figure 4

Category wise tweets and engagement breakdown.

The total number of tweets after filtering was 89,647, as per the breakdown given in Table 2. To avoid over- or under-representation of any particular keyword, we calculated the 5th and 96th percentiles of these counts and removed the keywords whose tweet counts were too low or too high, respectively. For this data, the 5th percentile is 3.3 and the 96th percentile is 9715.92. Based on these values, we excluded counts lower than the 5th percentile (i.e., Hypervigilant) and greater than the 96th percentile (i.e., Low); the final set of 16,704 tweets was then used for the annotation of data. This breakdown of data for the categories mentioned in Table 2, after removing the outliers in the aforementioned categories, is shown in Fig. 4. Since the change affects only the two categories to which these values belong, only those two contain Clean Engagement Count and Clean Posts Count.
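A minimal sketch of this percentile-based outlier removal is shown below using NumPy; the per-keyword counts are invented for illustration, since the real breakdown is the one in Table 2.

```python
import numpy as np

# Hypothetical per-keyword tweet counts (the real breakdown is in Table 2).
keyword_counts = {"flashback": 1210, "nightmare": 4850, "hypervigilant": 2,
                  "avoid": 9500, "low": 52000, "insomnia": 3100}

counts = np.array(list(keyword_counts.values()))
lo, hi = np.percentile(counts, [5, 96])   # 5th and 96th percentile cut-offs

# Keep only keywords whose tweet counts fall inside [lo, hi].
kept = {k: c for k, c in keyword_counts.items() if lo <= c <= hi}
print(f"cut-offs: {lo:.2f} .. {hi:.2f}; kept keywords: {sorted(kept)}")
```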

Ethical considerations

We recognize the importance of maintaining privacy and sensitivity when working with mental health data, particularly when utilizing user-generated content such as tweets. To address these concerns, all tweets used in our study were anonymized to ensure the protection of user identities. No personally identifiable information was retained in our analysis and hence we cannot trace back any user after our analysis.

Data pre-processing

Once the data had been filtered, we proceeded to pre-process it in order to eliminate any extraneous noise. To this end, we undertook the following steps:

  1. 1.

    We began by selecting a Covid-PTSD category from the training data.

  2. 2.

    Next, we broke down the tweets within this category into tokens using delimiters.

  3. 3.

    After tokenizing the tweets, we sanitized them to remove any non-letter characters, such as punctuation marks, quotes, numbers, and special characters, among others.

  4. 4.

    We then eliminated all stop words, which are typically less informative, from the dataset. To do this, we utilized NLTK-based stop word lists and also generated our own stop word lists.

  5. 5.

    After implementing step four, we utilized a Porter stemmer to perform text stemming. This step is vital to reduce the dimensionality of the features, since a word can exist in multiple forms with different meanings in natural language (e.g., singular and plural). By stemming the words, we reduced them to their base form.

  6. 6.

    Steps 1-5 were repeated for both classes; a minimal code sketch of this pipeline is given after the list.
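The sketch below illustrates steps 2–5, assuming NLTK’s tokenizer, English stop-word list, and Porter stemmer; the extra domain stop words shown are illustrative placeholders rather than the lists actually built for the study.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time resource downloads (assumed available in the environment).
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STOP_WORDS |= {"rt", "amp", "via"}   # illustrative domain-specific additions
stemmer = PorterStemmer()

def preprocess(tweet: str) -> list[str]:
    """Tokenize, strip non-letter characters, remove stop words, and stem."""
    tokens = word_tokenize(tweet.lower())                       # step 2: tokenization
    tokens = [re.sub(r"[^a-z]", "", t) for t in tokens]         # step 3: keep letters only
    tokens = [t for t in tokens if t and t not in STOP_WORDS]   # step 4: stop-word removal
    return [stemmer.stem(t) for t in tokens]                    # step 5: Porter stemming

print(preprocess("Still can't sleep... nightmares every night since COVID #ptsd"))
```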

Classification algorithms

In this article, classical machine learning algorithms are used for the classification problem. All four algorithms are discussed in this section, and a minimal sketch of how they could be instantiated follows the list.

  • Support Vector Machine (SVM) is a commonly used technique for text categorization50,51. It employs multidimensional hyperplanes to accurately differentiate between different labels or classes52. SVMs are particularly useful in high-dimensional spaces, making them the most practical classifier for such scenarios. Additionally, SVMs offer fair predictive performance even with small datasets due to their relative simplicity and versatility in handling a wide range of classification problems. SVMs are widely used in brain disorder research utilizing multivoxel pattern analysis (MVPA) due to their simplicity and lower risk of overfitting. In recent times, SVMs have been applied in precision psychiatry, particularly in the diagnosis and prognosis of brain diseases like Alzheimer’s, schizophrenia, and depression53.

  • Naïve Bayes (NB) is a machine learning algorithm that utilizes probability to classify data. It calculates the likelihood of a given piece of text belonging to a particular class based on probabilities computed from the labeled training data. The classifier has been successfully employed in several studies for text classification54,55,56,57. We chose this classifier for its ease of use and superior performance in earlier studies58,59. The algorithm performs a sequence of probabilistic computations to determine the best-fitted classification for a given piece of data. Suppose X is a set of n attributes, such that \(X = \{x_{1}, x_{2}, x_{3},\ldots ,x_{n}\}\), where X represents the evidence and H represents the hypothesis that the data sample X belongs to a certain class C. The likelihood that the hypothesis H holds given the evidence X can be computed using Eq. (2). The Bayes theorem expresses this as follows:

    $$\begin{aligned} P(H|X) = \frac{P(X|H) \, P(H)}{P(X)} \end{aligned}$$
    (2)
  • K-Nearest Neighbors (KNN) is an instance-based machine learning classifier based on the concept of similarity. It measures the similarity between feature vectors using the Euclidean distance and relies on the value of K. The algorithm stores all cases and uses a similarity score to classify new examples: the similarity between a new text and the training data is calculated, the K most similar texts are chosen, and the class is assigned by a vote among these neighbors. However, when K is large, the computation required to determine the most suitable class becomes expensive60,61.

  • Random Forest (RF) is a supervised learning approach which was proposed by Ho in 199562. It involves constructing multiple decision trees that work in unison, with decision trees serving as the building blocks. During pre-processing, nodes are selected for the decision trees. A random subset of features is used to determine the best feature, and a decision tree is created based on the input vector to classify new objects. Every decision tree is used for classification, and the algorithm assigns tree votes to each class. The class with the most votes from all the decision trees in the forest is selected as the final classification. RF has many advantages over other classifiers such as SVM and NB. For example, RF can handle noisy and missing data, it is robust to overfitting and can work well with high-dimensional data58. RF can also provide information about the relative importance of the features used in classification, which is useful in feature selection and understanding the underlying data structure. In the field of text classification such as sentiment analysis, categorization of news posts, spam filtering etc., RF has been widely used with significant results58,63. Additionally, RF has been applied in feature selection for text classification by using the Gini impurity index, which measures the importance of a feature by the reduction in the impurity of the resulting classification tree62.
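The text does not report implementation details or hyper-parameters for these four classifiers; the following is a minimal sketch, assuming scikit-learn implementations with default or commonly used settings, of how each could be paired with TF-IDF features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Hyper-parameters are illustrative assumptions, not values reported in the study.
classifiers = {
    "SVM": SVC(kernel="linear"),
    "NB":  MultinomialNB(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RF":  RandomForestClassifier(n_estimators=100, random_state=42),
}

# Each model operates on TF-IDF features extracted from the preprocessed tweets.
pipelines = {name: make_pipeline(TfidfVectorizer(), clf)
             for name, clf in classifiers.items()}
```

Each pipeline can then be fitted on the training split described in section “Data annotation and metrics of evaluation” and evaluated on the held-out test split.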

Experimental evaluation

We performed our experiments on a dataset collected through the process described in section “Data extraction”. In this section, we discuss the details of annotating that data and the evaluation metrics used to assess the results of our proposed model.

Data annotation and metrics of evaluation

We used the ICD-11 criteria for diagnosing PTSD. Being infected with COVID-19 was identified as the triggering event, and we then looked for symptoms under the three core domains outlined in ICD-1128: re-experiencing, hyperarousal, and avoidance behavior. Apart from these three core domains, we also looked at other affective or mood symptoms, their impact, and the treatment availed by the population being studied. Once users were identified as “Covid Positive” according to the criteria mentioned in section “Data extraction” and their tweet timelines were extracted, the PTSD keywords listed in Table 2 were used to filter the most relevant tweets according to the ICD-11 criteria.

  • Tweets which had both their COVID-19 status as well as one of the PTSD keywords mentioned were considered as “PTSD Positive”.

  • All those tweets that mentioned PTSD keywords but in relation to any other event rather than COVID-19 were not taken into consideration and were deemed “PTSD Negative”.

In Table 3, we provide a sample of tweets for both classes.

Table 3 Classes with example Tweets.

Based on these guidelines, we annotated the dataset of 16,704 tweets, and only 1,092 of them were found to be PTSD Positive. To keep the dataset balanced, we took only 1,092 PTSD Negative tweets from the rest and used them for our classification model. To train and test our proposed model, we kept 80% of the data for training and 20% for testing.
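A minimal sketch of this class balancing and 80/20 split is shown below, assuming scikit-learn’s train_test_split; the tiny annotated list is an illustrative stand-in for the real 16,704 annotated tweets.

```python
import random
from sklearn.model_selection import train_test_split

random.seed(42)

# Illustrative stand-in for the annotated (tweet_text, label) pairs; in the study
# there were 1,092 PTSD Positive tweets and a larger pool of PTSD Negative tweets.
annotated = [("nightmares every night since covid", "positive"),
             ("flashbacks of the icu keep me awake", "positive"),
             ("avoiding the news, it all reminds me of being sick", "positive"),
             ("great run this morning", "negative"),
             ("new recipe turned out well", "negative"),
             ("watching the game tonight", "negative")]

positives = [t for t in annotated if t[1] == "positive"]
negatives = [t for t in annotated if t[1] == "negative"]

# Undersample the majority class so both classes have the same size.
minority = min(len(positives), len(negatives))
balanced = random.sample(positives, minority) + random.sample(negatives, minority)

texts = [t[0] for t in balanced]
labels = [t[1] for t in balanced]

# 80% training / 20% test split, stratified to preserve the 50/50 class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)
```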

Additionally, we computed several performance metrics to assess the efficacy of our proposed approach. These metrics include accuracy, precision, recall, and F1-score, which are frequently used in information retrieval to evaluate the effectiveness of models. As our study involves binary-label classification, we followed the metrics proposed by64. This step was necessary to validate our findings.

Given a set \(C = \{ c_1, c_2 \}\) of class labels, evaluating the performance of the classifier against each class \(c_j\) is important. For this purpose, we define the following four counts, which are commonly used in the evaluation of binary classifiers:

  • True Positives (\(TP_j\)): predicted values that are accurately classified as positive.

  • False Positives (\(FP_j\)): predicted values that are wrongly classified as positive.

  • False Negatives (\(FN_j\)): predicted values that are wrongly classified as negative.

  • True Negatives (\(TN_j\)): predicted values that are accurately classified as negative.

These metrics provide a quantitative measure of how well the classifier is performing for each class, and can be used to identify areas for improvement in the classification model.

  • \(Accuracy_j\): is defined as the proportion of properly predicted observations with class \(c_j\) to total number of observations. The mathematical formula is as follows:

    $$\begin{aligned} Accuracy_j = \frac{TP_j + TN_j}{TP_j + TN_j + FP_j + FN_j}. \end{aligned}$$
    (3)
  • \(Precision_j\): is defined as the ratio of correctly predicted observations with class \(c_j\) to the total number of observations predicted as class \(c_j\). The mathematical formula is as follows:

    $$\begin{aligned} Precision_j = \frac{TP_j}{TP_j + FP_j} \end{aligned}$$
    (4)
  • \(Recall_j\): is defined as the proportion of correctly predicted observations with class \(c_j\) to the total number of observations actually belonging to class \(c_j\). The mathematical formula is as follows:

    $$\begin{aligned} Recall_j = \frac{TP_j}{TP_j + FN_j} \end{aligned}$$
    (5)
  • \(F1-measure_j\): is the harmonic average of \(Recall_j\) and \(Precision_j\). The mathematical formula is as follows:

    $$\begin{aligned} F1\text {-}measure_j = \frac{Precision_j \times Recall_j}{Precision_j + Recall_j} \times 2. \end{aligned}$$
    (6)

For binary classification, we take the average of each metric calculated for the two classes, as sketched in code after the equations below.

Consequently,

$$\begin{aligned}{} & {} Accuracy = \frac{(Accuracy_1 + Accuracy_2)}{2} \end{aligned}$$
(7)
$$\begin{aligned}{} & {} Precision = \frac{(Precision_1 + Precision_2)}{2} \end{aligned}$$
(8)
$$\begin{aligned}{} & {} Recall = \frac{(Recall_1 + Recall_2)}{2} \end{aligned}$$
(9)
$$\begin{aligned}{} & {} F1\text {-}measure = \frac{(F1\text {-}measure_1 + F1\text {-}measure_2)}{2} \end{aligned}$$
(10)
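Equations (7)–(10) correspond to macro-averaging of the per-class metrics. A minimal sketch of computing them with scikit-learn is given below; the label vectors are illustrative rather than the study’s actual predictions.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative labels; y_test and y_pred would come from the fitted model.
y_test = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative", "positive", "positive"]

# average="macro" computes each metric per class and averages the two values,
# matching Eqs. (8)-(10); for binary classes, Eq. (7) reduces to overall accuracy.
accuracy  = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="macro")
recall    = recall_score(y_test, y_pred, average="macro")
f1        = f1_score(y_test, y_pred, average="macro")

print(f"acc={accuracy:.3f} prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")
```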

In addition to these metrics, the Area Under the Curve (AUC) is commonly used as a performance metric to evaluate the classifier’s ability to distinguish between positive and negative classes. AUC represents the area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

The equation for AUC calculation can be written as:

$$\begin{aligned} \text {AUC} = \frac{1}{2} \sum _{i=1}^{N} \left( \text {TPR}_i + \text {TPR}_{i-1} \right) \times \left( \text {FPR}_i - \text {FPR}_{i-1} \right) \end{aligned}$$
(11)

where \(\text {TPR}_i\) is the True Positive Rate at the ith threshold, \(\text {FPR}_i\) is the False Positive Rate at the ith threshold, and N is the number of thresholds.
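As a worked illustration of Eq. (11), the hypothetical helper below applies the trapezoidal rule to a set of ROC points; the points shown are illustrative and the computation is equivalent to numpy.trapz over the ROC curve.

```python
import numpy as np

def trapezoidal_auc(fpr: np.ndarray, tpr: np.ndarray) -> float:
    """Compute AUC from ROC points via the trapezoidal rule of Eq. (11).

    fpr and tpr are assumed to be sorted by increasing false positive rate."""
    return float(np.sum((tpr[1:] + tpr[:-1]) * np.diff(fpr)) / 2.0)

# Illustrative ROC points (not from the study's results).
fpr = np.array([0.0, 0.1, 0.3, 1.0])
tpr = np.array([0.0, 0.6, 0.8, 1.0])
print(trapezoidal_auc(fpr, tpr))  # same value as np.trapz(tpr, fpr)
```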

In the next section, results are reported and discussed by executing the proposed framework on our dataset.

Experiments and comparison of classifiers

Let us consider the feature patterns listed in Table 4, against which we have computed our results using the previously mentioned classifiers.

Table 4 Feature patterns with their abbreviations.
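The abbreviations in Table 4 are not expanded in the text; assuming U, B, T, and Q denote unigram, bigram, trigram, and four-gram TF-IDF features, and that a pattern such as UxBxT means combining those n-gram orders, a minimal sketch of constructing such feature sets is given below.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumed mapping from pattern names to TF-IDF n-gram ranges (an interpretation
# of Table 4, not a specification taken from the paper).
FEATURE_PATTERNS = {
    "U":       (1, 1),
    "B":       (2, 2),
    "UxB":     (1, 2),
    "BxT":     (2, 3),
    "UxBxT":   (1, 3),
    "UxBxTxQ": (1, 4),
}

def vectorizer_for(pattern: str) -> TfidfVectorizer:
    """Build a TF-IDF vectorizer for one of the feature patterns."""
    return TfidfVectorizer(ngram_range=FEATURE_PATTERNS[pattern])

# Illustrative use on a toy corpus: the U pattern yields the smallest feature space.
corpus = ["nightmares and flashbacks since covid", "watching the game tonight"]
for name in ("U", "UxB", "UxBxT"):
    X = vectorizer_for(name).fit_transform(corpus)
    print(name, X.shape)
```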

Using the above feature patterns in combination with the classifiers described in section “Classification algorithms”, we performed our experiments and evaluated the results using the metrics defined in section “Data annotation and metrics of evaluation”. In Table 5, we report our findings using the NB classifier.

Table 5 Results obtained by NB.
Table 6 Results obtained by kNN.

NB achieved its maximum accuracy of 81.86% with UxBxT as the feature pattern. It is notable that accuracy is > 81% in all combinations where U is present; otherwise, the performance declined.

In Table 6, the results computed using kNN are mentioned.

Similar to NB, kNN achieved its highest accuracies with U alone or in combination with other feature patterns, reaching accuracies > 74% in these combinations. The maximum was 76.61% with U.

Tables 7 and 8 report the findings by SVM and RF, respectively.

Table 7 Results obtained by SVM.
Table 8 Results obtained by RF.

Among all the classifiers used in this study, SVM outperformed the other three with the highest classification accuracy of 83.29%. NB comes second with 81.86%, whereas RF and kNN are third and fourth with 80.67% and 76.61% accuracy, respectively. The findings for SVM and NB are consistent with those for RF and kNN in terms of better accuracy with the feature pattern U or its cartesian product with another pattern. All classifiers performed better with U, UxB, UxBxT, or UxBxTxQ; when U was not among the feature patterns, the accuracy declined for all classifiers. While our preferred classification model reached 83.29% accuracy, it is important to note where the model misclassified tweets. As stated in the annotation guidelines we followed, tweets containing both PTSD-related keywords and information related to Covid-19 were labelled PTSD Positive, and all others PTSD Negative. We therefore encountered instances where tweets containing PTSD-related keywords were labeled PTSD Negative because they were not specifically related to Covid-19. This occasionally resulted in misclassifications, particularly false positives where such tweets were incorrectly identified as PTSD Positive.

Findings reported in Tables 5, 6, 7 and 8 are visualized in Fig. 5 for better comparison of results.

Figure 5
figure 5

Accuracy comparison of all classifiers.

The best performing algorithm, i.e., SVM, which achieved the highest accuracy of 83.29% with U, did not perform as well with B and BxT, where the accuracy declined significantly. Moreover, as reported in58,59, U incurs a low computational cost because of the smaller number of features produced by applying TF-IDF, making it the preferred choice for the final classification model.

Conclusion

In this study, we performed an analysis to understand post-COVID-19 mental health dynamics and the tweeting behavior of COVID-19 positive users. We identified them using a set of hashtags reflecting a positive diagnosis of Covid. We then extracted their Twitter timelines and performed our analysis on more than 3.96 million pieces of content produced between March 2020 and November 2021. Our findings suggest that post circulation related to the “Other Affective & Biological Symptoms related to PTSD” category is higher than for other categories. However, we noticed that a large fraction of users shifted their behavior from “Avoidance” to “Non PTSD Related” and vice versa. We used the ICD-11 guidelines to filter and annotate our tweets and developed a machine learning based classification model to classify tweets as either PTSD positive or PTSD negative. We obtained our best results with SVM on unigrams as the feature pattern, with 83.29% accuracy. We also acknowledge that our study’s concentration on English-language tweets may restrict the usefulness of our model for other languages or platforms with different modes of expression; we are taking this into account in our future research plans. In future, we aim to extend this work by (i) extending the dataset of PTSD Positive tweets, (ii) extracting all the replies/comments on them, and (iii) creating a model to effectively understand and classify the sentiments of users on those posts.