Introduction

The COVID-19 pandemic and the restrictions to curb the spread of the virus have impacted individuals, communities, and economies worldwide, and the full extent of the impact will not be known for years to come. In South Africa, the pandemic started in March 2020 with a declared state of National Disaster, followed by a total lockdown on March 27. While restricting people’s mobility to reduce the transmission of the virus was crucial, it had various negative consequences, including for mental health (Giuntella et al., 2020). These negative effects on mental health are a cause for concern since South Africans have already been enduring various stresses and challenges, as noted by the Mental State of the World Report for 2021 (Sapien Labs, 2021). According to this report, the aggregate metric of mental well-being showed that South Africa (with a score of 46 on a scale varying from −100 “distressed” to 200 “thriving”) ranked the lowest out of 34 countries in 2020 (Sapien Labs, 2021). Adding to this concern about the effect of government policy to combat the spread of the virus is South Africans’ limited access to health professionals. There is less than one health professional per 100,000 people, and this shortage is stark in times of crisis, such as the pandemic (Burns, 2011).

In this study, we follow the widely accepted proposition that mental health has a robust inverse association with subjective well-being (life satisfaction and happiness) (Lombardo et al., 2018; Fergusson et al., 2015; Rissanen et al., 2013). Layard et al. (2013) argued that when an individual is suffering from mental health problems such as depression and anxiety, it is nearly impossible to be happy and feel fulfilled in life. Given that happiness affects our productivity, predicts our future income and affects labour market performance (Piekalkiewicz, 2017; Bryson et al., 2016), it is important to study factors associated with worsening mental health, especially for groups identified as vulnerable. Our focus is on women since they experienced 4% higher levels of negative emotions (stress, sadness, worry and anger) than men from 2020 to 2021 (World Economic Forum, 2022). Additionally, women are more inclined to suffer from poorer mental health and are more likely to be depressed and anxious than men (Li et al., 2023; Smith & Mazure 2021; Zhao et al., 2020).

Furthermore, the WHO (United Nations, 2020) argued that women are one of the population groups at particular risk of the negative effects of a pandemic, especially in terms of mental health. Studying women’s mental health in South Africa is essential because more than half of the population are women (31 million) (StatsSA, 2022). Meyer et al. (2019) found that South African women were almost 1.75 times more likely to be diagnosed with a depressive disorder than men.

The literature highlights multiple tools which can be used to screen the state of an individual’s mental health, one of them being the Patient Health Questionnaire – 2 (PHQ-2). This scale consists of two items measuring the frequency of depressive feelings and anhedonia (Kroenke et al., 2003). The PHQ-2 is a screening tool for possible depression but is not used to establish a diagnosis.

This study primarily aims to investigate changes in women’s mental health during the first year of the COVID-19 pandemic using the PHQ-2 scale. A secondary aim is to determine whether women with pre-existing depressive symptoms are more vulnerable than those who did not report pre-existing depressive symptoms. Our final aim is to determine those factors associated with worsening mental health over time.

Previous studies (discussed in depth in Sections “Background on the Instruments Measuring Mental Health”–“Predisposition /Pre-Existing Depressive Symptoms in Women”) that focused on measuring women’s mental health during the pandemic primarily used cross-sectional survey data (Aba et al., 2022; Bau et al., 2022; Burn et al., 2022; Riley et al., 2021), used restricted sample groups (Aba et al., 2022; Burn et al., 2022; Racine et al., 2021; Yirmiya et al., 2021), used samples not randomly selected (Zilver et al., 2021), used small samples (Aba et al., 2022; Racine et al., 2021) or used responses about pre-pandemic outcomes based on the participants’ memory (Bau et al., 2022).

Given the aforementioned, we contribute to the literature in several ways. First, we focus on one of the most vulnerable groups, namely women, and consider how their mental health changed during the first year of COVID-19, thus an extended period from July-August 2020 to April-May 2021. Using this extended period, we can investigate mental health over a year, during which the regulations pertaining to COVID-19 and the number of infections and deaths changed. Second, when conducting our analyses, we rely not solely on the PHQ-2 scale but also on the two separate items of this scale, ‘depressive feelings’ and ‘anhedonia’. This allows for a more granular investigation and provides information on whether the PHQ-2 scale possibly conflates these two separate items or whether the COVID-19 pandemic (a health pandemic with strict regulations) created a need to disaggregate the PHQ-2 scale not apparent during non-pandemic times. Lastly, our analysis uses pooled ordered logit models and ordered logit models with fixed effects. The ordered logit model with fixed effects has the advantage over linear probability or pooled ordered logit models in that it both considers the data’s ordinal structure and accounts for time-invariant heterogeneity.

We benefit from panel data stretching from before the pandemic to thereafter. We use wave 5 (collected in 2017) of the National Income Dynamics Survey (NIDS) panel dataset and its extension, the National Income Dynamics Study – Coronavirus Rapid Mobile Survey (NIDS-CRAM) dataset, which includes five waves over a period from 2020 to 2021. The NIDS-CRAM is one of Africa’s largest and most comprehensive data collection projects that interviews the same people over time (Daniels & Casale, 2022). The data captures various aspects of the social and economic consequences of the COVID-19 pandemic and the national lockdown, including the socioeconomic profiles of individuals. The dataset allows us to create a balanced panel sample of the same women and follow them over time. Furthermore, we can do an in-depth analysis as we have the advantage of having a combined PHQ-2 scale of depressive symptoms as well as two items of this scale, ‘depressive feelings’ and ‘anhedonia’.

Contrary to expectations, our results showed that the individual scale items (‘depressive feelings’ and ‘anhedonia’) behaved differently over time. More specifically, we found that median values of the depressive symptoms, as measured by the PHQ-2 scale, significantly increased during the first period, coinciding with strict regulations and increased COVID-19 cases. However, over time, as regulations became less severe alongside the introduction of vaccines in 2021, the depressive symptoms eased. When it comes to the two separate items of the scale, we found that the median values of the item ‘depressive feelings’ remained relatively constant over the period under investigation. In contrast, the median values of ‘anhedonia’ increased considerably from July/August 2020 to November/December 2020. Furthermore, it continued to increase, albeit marginally, in the April-May 2021 period.

The two items of the PHQ-2 scale demonstrating varying patterns are surprising since items in a scale should be highly correlated when measuring the same construct. Subsequently, we tested the internal consistency of the PHQ-2 scale, and the low Cronbach’s alpha confirmed our observations. This result poses the question of whether the PHQ-2 scale is reliable during a pandemic and highlights the importance of analysing the scale items individually.

Considering these results coupled with the confirmation of the low Cronbach alpha, it seems that the two separate items measure different concepts and should be considered separately, as ‘anhedonia’ is the more severe symptom and should be the focus during significant stressful times.

When it comes to the role pre-existing depressive symptoms played, we found that having pre-existing depressive feelings was statistically significant only in the item ‘depressive feelings’ and increased the probability of women being more depressed during the investigation period.

Considering the factors significantly related to the PHQ-2 scale and the individual items, the differences are further emphasised, with different factors related to the two items. Looking at the variables likely related to the pandemic, we find that living with a person who has gone hungry increases the probability of negative outcomes across all models. Taking care of children for 13–24 h a day increased the probability of higher PHQ-2 scores and of ‘anhedonia’. Furthermore, there were distinct factors that were related only to the separate items of the scale. Wearing a mask decreased the probability of ‘depressive feelings’ among women, while living in a household where someone received a government grant decreased only ‘anhedonia’, improving women’s mental health.

Our results on the PHQ-2 scale show that more research is needed regarding the short instruments used during unprecedented times, given that we face a worsening geopolitical climate and uncertainty regarding when the world will face the next pandemic.

The rest of the paper is structured as follows. The next section briefly introduces instruments used to measure mental health, discusses relevant literature on women’s mental health during the pandemic and highlights studies utilising the NIDS and NIDS-CRAM data. Section “Data and Variables” describes the data and selected variables, and Section “Methodology” outlines the methodology. The results and analyses follow in Section “Results”, while the paper concludes in Section “Conclusions”.

Literature on Women’s Mental Health During the COVID-19 Pandemic

Background on the Instruments Measuring Mental Health

Measuring mental health is complex and involves a variety of assessment techniques. Mental health professionals often perform clinical interviews to gather individuals’ mental health history and current symptoms, and gather further information through observational and neurological assessments. Psychological questionnaires and surveys are among the most common assessment tools to evaluate aspects of an individual’s mental health (anxiety, depression, addiction, aggression, eating disorders, emotions, personality). Some examples of the instruments include the Generalised Anxiety Disorder (GAD) scale, validated by Lowe et al. (2008); the Beck Depression Inventory (BDI), developed by Beck (1961); the Positive and Negative Affect Schedule (PANAS), developed and validated by Watson et al. (1988); the Center for Epidemiologic Studies Depression Scale (CES-D), developed by Radloff (1977); and the Patient Health Questionnaire (PHQ), developed by Kroenke et al. (2001) (see Footnote 1). All the aforementioned assessment tools include questionnaires asking individuals to evaluate their symptoms using self-reported Likert scales. In the PHQ, for example, the main question is, “Over the last 2 weeks, how often have you been bothered by any of the following problems?” and the items of the scale follow the main question. The respondents choose an option of the scale based on how they feel about a statement or a question. When these scales are developed, it is important to test whether the items in a scale show internal consistency, in other words, whether the scale is a consistent measure of a concept (such as depression). Cronbach’s alpha is one way of measuring the strength of that consistency. Internal consistency is considered good when alpha coefficients are above 0.7–0.8 (Bland & Altman, 1997).
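As an illustration of the internal consistency check described above, the following minimal sketch computes Cronbach’s alpha using the standard textbook formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function name and the input layout (one list of respondent scores per item) are assumptions made for this example and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of equal-length lists, one list of respondent scores per item
    (hypothetical layout chosen for this sketch).
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items must have the same number of respondents")

    # Total scale score per respondent (sum across items).
    totals = [sum(row) for row in zip(*items)]

    # Sample variances (n - 1 denominator) of each item and of the total.
    item_variance_sum = sum(variance(scores) for scores in items)
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

For a two-item scale such as the PHQ-2, the coefficient depends only on the variances of the two items and the covariance between them, so two items that move differently across respondents produce a low alpha.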

The PHQ is a 3-page questionnaire with the corresponding modules on the common diagnoses (major depressive disorder, panic disorder, anxiety disorder, bulimia nervosa, alcohol dependence, somatoform and binge eating disorder) (Kroenke et al., 2001). This survey can be self-administered by an individual; however, it does not establish a final diagnosis (insufficient basis to initiate treatment), and further evaluation by a professional is required. A more common version of the tool is the PHQ-9, which consists of 9 items derived from the depression module of the full PHQ and is designed to screen for possible symptoms of depression.

The PHQ-9 contains items pertaining to (1) depressive feelings, (2) anhedonia, (3) trouble sleeping, (4) feeling tired, (5) change in appetite, (6) guilt or worthlessness, (7) trouble concentrating, (8) feeling restless, and (9) suicidal thoughts. The Cronbach alpha of this scale with development was 0.84 (Bisby et al., 2022). This shows that the items on the scale have high internal consistency and that the scale is a valid instrument to test for, for example, depression.

Furthermore, the significantly shorter version of the PHQ, the PHQ-2 scale, was used in this study since it was included in the NIDS-CRAM data set (see a discussion on the data in Section “Data”). It is derived from the PHQ-9 scale. The PHQ-2 scale consists of only two items measuring the frequency of (1) depressive feelings and (2) anhedonia. For each item, the response options are “not at all,” “several days,” “more than half the days,” and “nearly every day,” scored as 0, 1, 2, and 3, respectively. Thus, the PHQ-2 score can range from 0 to 6 (a further discussion on the construction of the PHQ-2 scale is provided in Section “The Outcome Variables: PHQ-2 Scale, ‘Depressive Feelings’ and ‘Anhedonia’”). The Cronbach alpha of this scale with development was 0.79 (Bisby et al., 2022). This shows that the items on the scale have a high level of internal consistency and that the scale is, therefore, a valid instrument.

Apart from the above, other studies also confirmed the reliability and validity of the PHQ-2 and PHQ-9 tools, indicating that they have sound psychometric properties: Kroenke et al. (2003) reported a Cronbach alpha of 0.92 for the PHQ-2, Lowe et al. (2005) 0.83 for the PHQ-2, Staples et al. (2019) 0.83 for the PHQ-2 and 0.85 for the PHQ-9, and Errazuriz et al. (2022) 0.75 for the PHQ-2 and 0.90 for the PHQ-9.

Over Time Comparison

Studies analysing the over time comparison of women’s mental health during and before the pandemic show no consensus.

Zilver et al. (2021) found no differences in high levels of stress, depression and anxiety levels (Hospital Anxiety and Depression Scale, Perceived Stress Scale-10) among 1,466 pregnant women during and before the pandemic in the Netherlands. However, these results could suffer from selection bias because the control group (women fearing childbirth) could have already experienced higher stress levels. On the contrary, Racine et al. (2021), controlling for time trends, identified that the mean maternal depression and anxiety scores were higher during the pandemic among 1,301 women from Canada than those collected in various waves prior to the pandemic (at the 3-year, 5-year, and 8-year time points). Even though this study analyses longitudinal data with a relatively large sample size, the authors concentrated only on a sample of mothers.

Two studies analysing multiple time points during the pandemic found an initial spike in negative mental health symptoms followed by a slow return to lower levels. Gigantesco et al. (2022) found that initially, depressive symptoms of both genders of Italian adults increased, decreased, then increased again, and finally gradually returned to above pre-lockdown levels. Ebrahimi et al. (2022) obtained higher initial levels of anxiety (GAD-7) and depressive symptoms (PHQ-9) in Norwegian women (7,851 females) at the beginning of the pandemic, followed by a sharper decrease after four months. However, the studies by Ebrahimi et al. (2022) and Gigantesco et al. (2022) did not follow the same individuals over time.

Predisposition /Pre-Existing Depressive Symptoms in Women

To the best of our knowledge, only a handful of studies control for predisposition to depression or pre-existing depressive symptoms among women to show whether a person is prone to higher levels of depressive symptoms during a crisis.

Racine et al. (2021) (as discussed in more detail in Section “Over Time Comparison”) found that a history of mental illness was not statistically significant in predicting depression scores (CES-D-10 scale) in a fixed effects model. In contrast, Aba et al. (2022) found that pregnant women with a history of psychiatric disease had a 3.92 times higher risk of developing anxiety than those without it. This result did not hold when analysing depression (history of the psychiatric disease was not significant). Idzik et al. (2021) did a correlation analysis and found that mental disorders among 452 Polish women positively correlated with anxiety, depression, irritability, and loneliness.

In terms of studying both men and women, Kim et al. (2022) found that psychiatric risk increased depression in adults in Soweto (South Africa), while the self-reported quality of life was a protective factor against depression.

Factors Associated with Women’s Mental Health During COVID-19

Existing literature does not provide definitive information about the factors contributing to women’s mental health worsening during the pandemic. There are not many studies that concentrate on women in general. For instance, Bau et al. (2022) examined containment policies and their impact on 573 women from rural Indian households. Using cross-sectional data and a small sample, they found that the pandemic resulted in dramatic income losses and increased food insecurity. A logit model indicated that women with daughters and those living in a female-headed household were more likely to have worsening feelings of depression (PHQ-9 scale), exhaustion, and feelings of safety. Furthermore, the pre-pandemic outcomes were based on the participants’ memory of their circumstances.

A large portion of the literature is focused on restricted samples of women, such as pregnant women (Racine et al., 2021; Yirmiya et al., 2021; Luo et al., 2022; Aba et al., 2022), working women (Burn et al., 2022) and homeless women (Riley et al., 2021). In general, income/finances, employment status, number of dependants, support and health concerns were found to be risk factors for symptoms of depression and anxiety in pregnant women. For working women, social distancing and reduced social contact through working from home were associated with higher depressive symptoms. Recent homelessness, unmet subsistence needs (food, clothing, housing, hygiene resources), difficulty getting care, feelings of isolation and inability to rely on others increased symptoms of depression in homeless women.

Previous NIDS-CRAM Studies

Since we use both the NIDS and NIDS-CRAM datasets in our study, it would be remiss of us not to highlight other studies that considered mental health using these datasets. We will first discuss five studies that used cross-sectional data and conclude with one study that used panel data. The reader should note that all these studies focused on the adult cohort, not only women.

Oyenubi and Kollamparambil (2020) compared the prevalence of depressive symptoms (using PHQ-2 and CES-D-10) between 2017 and 2020 (wave 2 NIDS-CRAM, four months into the pandemic), along with the distribution of variables known to be correlated with depressive symptoms. Their study employed a linear probability model regression analysis. They concluded that demographic indicators that point to a higher level of wealth (in the South African context) tend to increase the likelihood of screening positive for depressive symptoms relative to the pre-pandemic period (a shift from pro-poor in 2017).

Posel et al. (2021) investigated the effect of job loss and job furlough on mental health using wave 2 NIDS-CRAM data and employed an ordered logit model. They used a dummy variable version of the CES-D-10 scale and found it did not affect depressive symptoms (PHQ-2 scale) of employed adults before the lockdown. Oyenubi and Kollamparambil (2022) also used a dummy variable version of the 2017 CES-D-10 scale. They found that it was not statistically significant in predicting depressive symptoms (PHQ-2 scale) among groups of adults divided by their socioeconomic status (income). Their study employed a threshold model by household income level. Oyenubi, Nwosu and Kollamparambil (2022) investigated income-related inequality in depressive symptoms using all five waves of NIDS (CES-D-10) and NIDS-CRAM (PHQ-2 as a dummy variable) data. They constructed concentration indices for all waves and decomposed the income-related inequalities in well-being for wave 5 NIDS-CRAM (April – May 2021). The CES-D-10 indices were all negative and statistically significant, showing that they were consistent with the expected negative relationship between income and well-being. In contrast, the indices using PHQ-2 showed that income-related inequality in depressive symptoms had a weakened relationship. Regarding the decomposition, they found the concentration index was negative for PHQ-2 but not statistically significant. Additionally, the variables that significantly and positively contribute to inequality in PHQ-2 include COVID-19 risk perception, being White and traditional/mud-dwelling type. In contrast, years of schooling and hunger contributed negatively.

Oyenubi et al. (2022) used the first two waves of NIDS-CRAM to examine the causal effects of COVID-19 risk perceptions on adult depressive outcomes (PHQ-2). The authors used covariate balanced propensity scores and a weighted least square estimator and employed multiple regression models (OLS and logit). They found that the point prevalence of perceived COVID-19 infection risk increased between the third and fifth months of the pandemic, which corresponded with elevations in national COVID-19 infection rates. Additionally, chronic illness, unemployment, and a loss of household income between waves 1 and 2 were risk factors for worse depressive symptoms. Counterintuitively, being African was associated with lower levels of depression.

In terms of panel data, Oyenubi and Kollamparambil (2021) investigated the implications of easing lockdown restrictions on mental health (PHQ-2) using waves 2 and 3 (seven months into the pandemic) of the NIDS-CRAM data. They used the Oaxaca-Blinder decomposition to decompose the change in the proportion of respondents who screened positive for depressive symptoms between the two periods. The authors found that both the distribution and the risk of screening positive for depression increased with an easing of lockdown restrictions. Additionally, the decomposition analysis showed that most of the change was explained through the coefficient effect (returns to characteristics).

From the discussion in Sections “Background on the Instruments Measuring Mental Health”–“Factors Associated with Women’s Mental Health During COVID-19”, the existing literature on women’s mental health during the COVID-19 pandemic has several limitations we address in our study. Firstly, we utilise a combined PHQ-2 scale of depressive symptoms and analyse two items of this scale (‘depressive feelings’ and ‘anhedonia’) separately. Secondly, we analyse the mental health of an entire cohort of women, while most studies used restricted sample groups like pregnant women, working women, women experiencing homelessness or the general population. Additionally, we benefit from analysing the same individuals. Some studies used samples that were not randomly selected or had small samples. Thirdly, we have the advantage of utilising panel data, which contains a period before the pandemic and thereafter in 2020–2021 (a one-year period). Previous studies primarily used cross-sectional survey data that does not allow over time comparison. Finally, we control for the predisposition to depression or pre-existing depressive symptoms among women from the same panel data prior to the pandemic, while other studies used responses about pre-pandemic outcomes based on the participants’ memory. Moreover, this is the only study that uses an ordered logit model with fixed effects estimations that has the advantage of considering the data’s ordinal structure and accounting for time-invariant heterogeneity.

Data and Variables

Data

We utilise the National Income Dynamics Study – Coronavirus Rapid Mobile Survey dataset (NIDS-CRAM; see Footnote 2) to investigate the pandemic’s impacts on women’s mental health in South Africa, as well as wave 5 of the National Income Dynamics Study (NIDS) to determine the predisposition to depressive symptoms (NIDS-CRAM, 2021). The Southern Africa Labour and Development Research Unit (SALDRU) collected the data. NIDS-CRAM is a special Computer Assisted Telephone Interviewing (CATI) longitudinal survey focusing on the social and economic consequences of the COVID-19 pandemic and the national lockdown (Ingle et al., 2021). Participants for the NIDS-CRAM sample were drawn using a stratified sampling design from wave 5 of the National Income Dynamics Study (NIDS), which was collected in 2017 (Kerr et al., 2020). The NIDS-CRAM survey has five waves spanning over one year, starting in May 2020 and finishing in April 2021 (see Table 1).

Table 1 Data collection from the NIDS and NIDS-CRAM

Given the above, the NIDS-CRAM is unique because it is a continuation of the pre-existing NIDS survey with a subsample of adults who were re-interviewed during the different stages of lockdown to obtain information on the socio-economic impacts of the crisis in a timely manner (Daniels & Casale, 2022). It has become one of Africa’s largest and most comprehensive data collection projects.

The first wave of NIDS-CRAM surveyed 7,073 adults aged 18 years and older. Starting from wave 2, those individuals who refused to be interviewed were not re-interviewed. Naturally, individuals who passed away were excluded. The attrition between waves 1 and 2 was approximately 19%; thus, an additional top-up sample of individuals not yet selected for the survey was added from the original NIDS wave 5, resulting in 8,157 individuals per wave in total in waves 3 to 5 (see Table 1). Ingle et al. (2021) noted negative attrition in the NIDS-CRAM due to participants dropping out from the survey and coming back to participate in later waves, which complicates the calculation of weights.

Since the NIDS-CRAM sample was drawn from the NIDS wave 5, the panel weights provided in the NIDS-CRAM dataset will only estimate “a broadly representative sample of individuals from the 2017 sample” (Ingle et al., 2021). Thus, the use of weights derived from the NIDS wave 5 in the analysis of the NIDS-CRAM dataset does not lead to the construction of a nationally representative sample of South African women for the pandemic years of 2020 and 2021 (Posel et al., 2021; Kerr et al., 2020). Furthermore, Ingle et al. (2021) note that even after estimating the parameters of interest using the weights, cluster and stratum variables, the estimates from the NIDS-CRAM are overestimated compared to the national surveys with larger sample sizes. Therefore, since the weighted estimates do not depict the accurate representation of the population and introduce additional biases, we do not utilise weights in our analysis and refer to our results as sample estimates.

Unlike NIDS, NIDS-CRAM did not have a separate household-level questionnaire and only included some household-related questions due to the limits of the telephonic surveys. Furthermore, NIDS-CRAM only interviewed selected individuals in a household (NIDS interviewed all household residents) and did not track the household IDs; therefore, only individual-level analysis is possible. Moreover, some information collected in several waves was not reported in others, meaning some variables do not appear in all waves.

Comparisons between the NIDS and NIDS-CRAM datasets should be made cautiously because of some important methodological differences (Ingle et al., 2021). Firstly, NIDS performed in-person interviews, whereas NIDS-CRAM used the CATI method. Secondly, there were differences in questionnaire types where NIDS-CRAM did not have household-level questionnaires. Finally, some questions have wording differences, meaning definition and measurement variations could exist.

Selection of Variables

The Outcome Variables: PHQ-2 Scale, ‘Depressive Feelings’ and ‘Anhedonia’

To address the research questions, we consider three outcome variables: the PHQ-2 scale and the two items of the scale individually, namely, ‘depressive feelings’ and ‘anhedonia’, for a more granular level of investigation.

The NIDS dataset included a bigger selection of questions on mental health. For example, wave 5 of the NIDS dataset (surveyed in 2017) included ten questions, forming the Center for Epidemiologic Studies Short Depression Scale (CES-D-10). However, as the NIDS-CRAM is a shortened version of the NIDS dataset, the questions included on mental health are presented via the Patient Health Questionnaire (PHQ-2) with only two variables: “Over the last two weeks, have you been feeling down, depressed or hopeless?” and “Have you had little interest in doing things?” (see Footnote 3). Additionally, these questions were only asked in the NIDS-CRAM waves 2, 3 and 5 and not in waves 1 and 4; thus, waves 1 and 4 were not included in our analysis.

“Over the last two weeks, have you been feeling down, depressed or hopeless?” and “Have you had little interest in doing things?” included categories “not at all”, “several days”, “more than half the days” and “nearly every day” which we coded from 0 to 3. We then sum these two variables, creating a combined PHQ-2 scale ranging from 0 to 6, with the higher values indicating higher depressive symptoms.
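The coding and summation described above can be sketched as follows. The dictionary and function names are hypothetical and chosen for this example only; the category labels and the 0 to 3 coding follow the text.

```python
# Response categories coded 0-3, as described in the text.
RESPONSE_CODES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}


def phq2_outcomes(depressive_feelings, anhedonia):
    """Return the three outcome variables for one respondent in one wave.

    Both arguments are response labels; the result is a tuple of
    (depressive feelings code, anhedonia code, combined PHQ-2 score).
    """
    d = RESPONSE_CODES[depressive_feelings.strip().lower()]
    a = RESPONSE_CODES[anhedonia.strip().lower()]
    return d, a, d + a  # combined score ranges from 0 to 6
```

Keeping the two item codes alongside the sum makes it straightforward to model the combined scale and each item as separate outcomes.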

Selection of the Covariates

Guided by the reviewed literature (see Sections “Over Time Comparison”–“Predisposition /Pre-Existing Depressive Symptoms in Women”), the main groups of factors associated with women’s mental health are socioeconomic, household composition, health, social support, habits and COVID-specific factors. Sociodemographic information included women’s age (measured as a continuous variable), race (categorised as African and other), years of education (measured as a continuous variable), and geographical area of residence (rural, urban).

Economic factors contained women’s employment status (categorised as not economically active, unemployed, employed), receipt of any government grants (see Footnote 4; categorised as yes and no), and household income (measured as a continuous variable). Household characteristics comprised household size (measured as a continuous variable), type of dwelling (formal and informal), and access to electricity and water (measured as dichotomous variables). Food insecurity is captured by whether anyone in the household has gone hungry due to a lack of food (measured as a dichotomous variable). The NIDS-CRAM survey does not include direct questions related to social support; therefore, we use marital status (categorised as married or single) as an indicator of whether there is someone upon whom women can rely.

Regarding the NIDS-CRAM, we selected variables that are unique to this dataset, such as whether a person wore a mask in the past seven days, started washing their hands more often, and whether a woman thought she was likely to get the coronavirus (all measured as dichotomous variables). Social distancing and school closures meant children’s activities shifted to home, and women had a greater responsibility for caring for children (measured as the daily number of hours a woman had to look after children). We control for pre-existing depressive symptoms by including the “being depressed in 2017” variable derived from wave 5 of the NIDS data. Since we are interested only in whether depressive symptoms were present, and not in the exact frequency of their occurrence, we recoded “Respondent felt depressed in the past week?”, measured on a four-point scale, into a dichotomous variable with a cut-off point of ≥ 1 (1 = being depressed, 0 = not depressed). The final independent variable, “being depressed in 2017”, was copied across NIDS-CRAM waves 2, 3 and 5. It should be noted that we tested whether the variable transformation altered the results and found no difference.
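A minimal sketch of this recoding is given below. It assumes that the four-point 2017 item is coded 0 to 3, with 0 as the lowest frequency category; this coding, the field names (person_id, wave, depressed_2017) and the function names are assumptions made for illustration and are not taken from the NIDS documentation.

```python
def depressed_in_2017(item_code, cutoff=1):
    """Dichotomise the four-point 2017 item: 1 = being depressed, 0 = not depressed.

    Assumes item_code is an integer from 0 to 3 (hypothetical coding).
    """
    return 1 if item_code >= cutoff else 0


def attach_baseline(baseline_codes, panel_rows):
    """Copy the time-invariant 2017 indicator onto every person-wave row.

    baseline_codes: dict mapping person_id -> 2017 item code (0-3).
    panel_rows: list of dicts, each with at least 'person_id' and 'wave'.
    Returns new row dicts; the input rows are left unchanged.
    """
    return [
        {**row, "depressed_2017": depressed_in_2017(baseline_codes[row["person_id"]])}
        for row in panel_rows
    ]
```

Because the indicator is fixed at its 2017 value, it takes the same value in waves 2, 3 and 5 for a given woman.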

Due to the limitations of the NIDS-CRAM data (mentioned in Section “Data”), we could not include variables capturing health, habits, religious beliefs or personality. In wave 3, household income was recorded in brackets, while in the other waves, exact household income amounts were reported. Therefore, we estimated the wave 3 amounts as the midpoint between the bracket points (we also considered a median point value between brackets and found very little difference between the results). The remaining variables had less than 5% missingness, so we used the mode or the mean (as appropriate) to impute missing data (see Table S1 in Supplementary Information).
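The two data-preparation steps described above can be sketched as follows. This is a minimal illustration only: the bracket bounds, the variable names and the helper functions `bracket_midpoint` and `impute` are hypothetical, since the actual wave 3 brackets and the imputation code are not given in the text.

```python
from statistics import mean, mode

# Hypothetical bracket bounds in rand; the actual wave 3 brackets are not given in the text.
BRACKETS = {1: (0, 1000), 2: (1001, 3000), 3: (3001, 6000)}

def bracket_midpoint(code):
    """Return the midpoint of the income bracket identified by `code`."""
    lower, upper = BRACKETS[code]
    return (lower + upper) / 2

def impute(values, how):
    """Replace None with the mean (continuous) or the mode (categorical) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if how == "mean" else mode(observed)
    return [fill if v is None else v for v in values]

# Wave 3: convert bracket codes to rand amounts.
wave3_income = [bracket_midpoint(code) for code in [1, 3, 2]]
print(wave3_income)  # [500.0, 4500.5, 2000.5]

# Variables with little missingness: mean for continuous, mode for categorical.
household_size = impute([4, None, 6], how="mean")
married = impute([1, 1, None, 0], how="mode")
```

Midpoint imputation treats every household in a bracket as earning the same amount, which is why comparing it against an alternative within-bracket value, as the text reports, is a sensible check.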

Sample Description

In this section, we briefly discuss our selected sample. As was mentioned in Section “Data”, wave 1 of the NIDS-CRAM was a shortened version of wave 5 of the NIDS data, including COVID-19-related questions. The number of individuals surveyed in the NIDS-CRAM dataset varied per wave – from 5,676 to 6,130 (see Table 2). Subsequently, we excluded 421 observations where individuals did not complete the interviews (non-response). We restricted our sample to women aged 15 years and older in the baseline wave 5 of the NIDS dataset, reducing our sample size. We created a balanced panel dataset (thus, each woman appeared in each wave). Each wave included 2,415 women, giving a total of 7,245 observations in the pooled sample. Therefore, the sample consistently reflected the same women across the three waves included in our sample (waves 2, 3 and 5) with no new entrants or attrition.
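A minimal sketch of how a balanced panel of this kind can be constructed is given below. The records and the function name `balance_panel` are hypothetical; only the wave numbers (2, 3 and 5) come from the text.

```python
from collections import defaultdict

WAVES = {2, 3, 5}  # NIDS-CRAM waves used in the analysis

# Hypothetical long-format records: (person id, wave).
records = [(1, 2), (1, 3), (1, 5), (2, 2), (2, 3), (3, 2), (3, 3), (3, 5), (3, 4)]

def balance_panel(rows, waves):
    """Keep rows from the chosen waves for individuals observed in every one of them."""
    seen = defaultdict(set)
    for pid, wave in rows:
        if wave in waves:
            seen[pid].add(wave)
    complete = {pid for pid, observed in seen.items() if observed == waves}
    return [(pid, wave) for pid, wave in rows if pid in complete and wave in waves]

balanced = balance_panel(records, WAVES)
n_individuals = len({pid for pid, _ in balanced})
print(n_individuals, len(balanced))  # 2 6
```

In a balanced panel the number of pooled observations is the number of individuals times the number of waves, which matches the figures reported above: 2,415 × 3 = 7,245.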

We do not include survey weights in our analysis to generate population estimates because the available weights are benchmarked to a sample in 2017 (NIDS wave 5), which was itself not nationally representative. Furthermore, we do not use balanced panel weights as we deliberately selected only women who appeared in all the waves, including wave 5 of the NIDS dataset; this further affected the national representativeness of the sample (also see discussion in Section “Data”). Naturally, there was no attrition within our selected sample. Our intent in the analysis was to specifically analyse the longitudinal effects of the pandemic on the exact same sample of women and their mental health, stretching from pre-COVID in 2017 to peri-COVID in 2021. This avoided any distortions that might be brought about by weighting. Our second aim is to evaluate the performance of the PHQ-2 scale during a pandemic; we do not use panel weights as they can obscure the results. Therefore, we consider a model-based approach more suitable and refer to our estimates as sample estimates.

Table 2 Sample collection from the NIDS-CRAM

The demographic characteristics by wave can be found in Table S2 in the Supplementary Information. A substantial proportion of the women in our sample are African (88.2% or 2,131 individuals), with a smaller percentage being from other population groups. More than half of the women were single (56.7% in the pooled dataset), and the average age was 42 years. On average, women in our sample have 11 years of education in the pooled dataset, and 38% are employed. At the same time, 36% were unemployed (in the pooled dataset), and the rest were not economically active (26%). Many women reside in rural areas (59% in the pooled dataset) and in formal flat or house dwellings (75%), with a smaller percentage residing in traditional houses or informal dwellings. Almost 70% of women have access to water, and 94% have electricity (in the pooled dataset). On average, women live in a household with six people (over three waves). The monthly household income per person is close to R1,300 (about $72) in the pooled dataset, while almost 50% of women reside in households where someone receives a government grant. Only 79% of women (in the pooled dataset) live in a household where no one has gone hungry in the past week.

Methodology

This section explains how we measure the change in women’s mental health over time, using our three outcome variables (PHQ-2, ‘depressive feelings’ and ‘anhedonia’) during the first year of the COVID-19 pandemic. Then, it states the model used to test whether pre-existing depressive symptoms are related to any of the three variables of interest. Finally, we specify the model determining those factors associated with our three variables of interest.

Analysis of Women’s Mental Health Over Time

Our first aim is to analyse how women’s mental health has changed over time. Firstly, we use a graphical method, namely line plots, to show the change in the three variables across the three waves spanning the one-year period. Secondly, we test for significant differences between the median levels of the three variables across waves using the Kruskal-Wallis H test. It is a rank-based nonparametric test suited to ordinal dependent variables and allows for comparing more than two independent groups (Kruskal & Wallis, 1952). The null hypothesis is that there is no difference between groups. We reject the null hypothesis if the p-value is < 0.05, meaning that the median scores of two or more groups are statistically different from each other.
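To make the rank-based logic of the test concrete, the following is a minimal sketch of the Kruskal-Wallis H statistic with the usual tie correction. It is not the routine used in the study. The scores are hypothetical, the function names are illustrative, and the closed-form p-value applies only to three groups (two degrees of freedom), where the chi-squared survival function reduces to exp(−H/2). In practice, a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math
from itertools import chain

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def kruskal_wallis_three_groups(g1, g2, g3):
    """Return (H, p) for three groups; p uses the chi-squared approximation with df = 2."""
    groups = (g1, g2, g3)
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over sets of tied values.
    # (Undefined when every pooled value is identical.)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    p = math.exp(-h / 2)  # chi-squared survival function for 2 degrees of freedom
    return h, p

# Hypothetical PHQ-2 scores (0 to 6) for three waves.
wave2 = [0, 1, 0, 2, 1, 0, 3, 1]
wave3 = [1, 2, 2, 3, 1, 2, 4, 2]
wave5 = [1, 1, 2, 3, 1, 1, 4, 2]
h_stat, p_value = kruskal_wallis_three_groups(wave2, wave3, wave5)
reject_null = p_value < 0.05
```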

Subsequent to analysing women’s mental health over time, we test the internal reliability of the PHQ-2 scale using Cronbach’s alpha to investigate our findings further. Internal consistency is considered good when alpha coefficients are above 0.7–0.8 (Bland & Altman, 1997). However, because alpha is sensitive to the number of items included in the test, we also calculate Spearman-Brown coefficients as an additional measure of the scale’s reliability. As a general rule, coefficients below 0.3 indicate little correlation, 0.3 to 0.5 a low or weak correlation, 0.5 to 0.7 a moderate correlation, and 0.7 to 0.9 a high correlation.
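For a two-item scale such as the PHQ-2, both reliability measures can be computed directly from the item scores. The sketch below uses the standard formulas, Cronbach’s alpha = k/(k − 1) × (1 − Σ item variances / variance of the total score) and Spearman-Brown = 2r/(1 + r) for two items with correlation r. The item scores and function names are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondents' scores."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - sum(variance(item) for item in items) / variance(totals))

def spearman_brown(r):
    """Spearman-Brown reliability of a two-item scale with inter-item correlation r."""
    return 2 * r / (1 + r)

# Hypothetical item scores (0 to 3) for eight respondents.
depressive_feelings = [0, 1, 2, 3, 1, 0, 2, 1]
anhedonia = [1, 0, 3, 2, 0, 1, 1, 2]

alpha = cronbach_alpha([depressive_feelings, anhedonia])
sb = spearman_brown(pearson_r(depressive_feelings, anhedonia))
```

When the two items have equal variances, the two coefficients coincide; they diverge only to the extent that the item variances differ.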

Moreover, to assess the specificity of the PHQ-2 scale, one needs to correlate the scale with other instruments. We compare our results to other studies that used the PHQ-2 scale and to the Cronbach’s alphas reported during the development of the scale (see Section “Background on the Instruments Measuring Mental Health”).

Controlling for Pre-Existing Depressive Symptoms Among Women

Our second aim is to determine how women with pre-existing depressive symptoms reacted to the pandemic compared to those who did not report pre-existing depressive symptoms. Since the variable “being depressed in 2017” is invariant over time, using a fixed effects estimator is not an option as fixed effects partial out the effects of time-invariant variables (see Footnote 5). Therefore, we use a pooled ordered logit model (see Eq. 1) for each outcome variable with wave-fixed effects and cluster the standard errors at the individual level:

$${Y}_{i}^{*}=\beta \,{Depressed2017}_{i}+{X}_{i}^{{\prime }}\gamma +{\epsilon }_{i}$$
(1)

Where \({Y}_{i}^{*}\) is a mental health outcome of interest, i.e., the PHQ-2 scale, ‘depressive feelings’ or ‘anhedonia’, for individual \(i=1,\dots ,N\); \({Depressed2017}_{i}\) is a binary variable for “being depressed in 2017” with coefficient \(\beta\); \({X}_{i}^{{\prime }}\) is a set of individual-level covariates mentioned in Section “Determining the Factors Associated with Women’s Mental Health”, with coefficient vector \(\gamma\); and \({\epsilon }_{i}\) is the error term.

Determining the Factors Associated with Women’s Mental Health

Our final aim is to analyse the factors associated with women’s mental health deterioration during the COVID-19 pandemic while controlling for the time-invariant effects.

The PHQ-2 scale, ‘depressive feelings’ or ‘anhedonia’ can be introduced as a latent variable (\({y}_{it}^{*}\)) which depends linearly on a set of covariates (\({x}_{it}\)), whereas what we observed (\({y}_{it}\)) is measured on an ordinal scale. Due to the ordinal nature of the data, the ordered logit model is appropriate for the analysis (Wooldridge, 2010) and is the base model in our study. The ordered logit model (ologit) can be expressed as follows:

$${y}_{it}^{*}={x}_{it}^{{\prime }}\beta +{\epsilon }_{it}$$
(2)

Where \(i=1,\dots ,N\) indexes individuals, \(t=1,\dots ,T\) indexes time, and \({\epsilon }_{it}\) is the error term. The observed score (\({y}_{it}\)) can, therefore, be described as:

$${y}_{it}=\left\{\begin{array}{ll} 0 & if\, {y}_{it}^{*}\le 0\\ 1 & if\, 0<{y}_{it}^{*}\le {\mu }_{1}\\ 2 & if\, {\mu }_{1}<{y}_{it}^{*}\le {\mu }_{2}\\ \vdots & \\ N & if\, {\mu }_{N-1}<{y}_{it}^{*}\end{array}\right.$$
(3)

Where \({\mu }_{1},\dots ,{\mu }_{N-1}\) are externally imposed endpoints of the observable categories, and \(N\) in Eq. 3 denotes the highest observable category.
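The mapping from the latent variable to the observed category, and the category probabilities it implies under a logistic error term, can be sketched as follows. The cut points and the value of the linear index are hypothetical, and the function names are illustrative. As in Eq. 3, the first cut point is fixed at zero.

```python
import math

def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def observed_category(latent, cutpoints):
    """Observed category for a latent value: the number of cut points strictly below it."""
    return sum(latent > c for c in cutpoints)

def category_probabilities(xb, cutpoints):
    """P(y = j) for j = 0..len(cutpoints), given the linear index xb and increasing cut points."""
    cumulative = [logistic_cdf(c - xb) for c in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities

# Hypothetical cut points (0, mu_1, mu_2) defining four observed categories.
cutpoints = [0.0, 1.0, 2.0]
probs = category_probabilities(xb=0.5, cutpoints=cutpoints)
print(observed_category(1.4, cutpoints))  # 2
```

A larger linear index shifts probability mass towards the higher categories, which is how a positive coefficient in the ordered logit model should be read.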

Using panel data, which includes three waves, we employ panel estimation techniques to consider the time dimensions (\(t\)). Fixed and random effects models are general approaches to analysing panel data. To choose the most efficient estimator, we use the Hausman test.

We performed a Hausman test to compare the random and fixed effects estimators of our different outcome variables. The null hypothesis is that there is no correlation between the unique errors and the regressors in the model. The test statistic was chi-squared = 61.93 with a p-value < 0.001; thus, we reject the null hypothesis and conclude that the fixed effects ordered logit model is the preferred estimation technique (these and other diagnostic tests can be found in Table S4 in Supplementary Information).
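For reference, the Hausman statistic is commonly written in the following textbook form, which is given here as background rather than as the exact computation behind Table S4. The symbols \({\widehat{\beta }}_{FE}\) and \({\widehat{\beta }}_{RE}\) (the fixed and random effects coefficient estimates), \(\widehat{V}\) (their estimated covariance matrices) and \(k\) (the number of coefficients compared) are notation introduced for this illustration:

$$H={\left({\widehat{\beta }}_{FE}-{\widehat{\beta }}_{RE}\right)}^{\prime }{\left[\widehat{V}\left({\widehat{\beta }}_{FE}\right)-\widehat{V}\left({\widehat{\beta }}_{RE}\right)\right]}^{-1}\left({\widehat{\beta }}_{FE}-{\widehat{\beta }}_{RE}\right)$$

Under the null hypothesis, \(H\) is asymptotically chi-squared distributed with \(k\) degrees of freedom, so a large value indicates that the two sets of estimates differ systematically and that the random effects assumption is unlikely to hold.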

Note that the random effects ordered logit model specifies an additional unobservable individual-specific error term (\({a}_{i}\)) under the assumption that it is normally distributed and independent of the regressors. Fixed effects models relax these assumptions and leave this term completely unrestricted, allowing them to account for any potential endogeneity arising from time-invariant characteristics (\({a}_{i}\)).

Based on the results, we use the ordered logit model with fixed effects (feologit) in our estimations, which is expressed as follows:

$${y}_{it}^{*}={x}_{it}^{{\prime }}\beta +{\epsilon }_{it}+{a}_{i}$$
(4)

We use the feologit estimator recently developed by Baetschmann et al. (2020), which addresses the issue of inconsistent estimators. This method uses the blow-up and cluster (BUC) estimator, implying that each individual has different thresholds \(({\tau }_{it}).\) The feologit estimator replaces each of an individual’s observations with copies of itself, dichotomises each copy at a different cut-off point and applies the conditional maximum likelihood (CML) estimator with standard errors clustered at the individual level. This estimation results in more observations than the initial sample because of the created clones of the same individual (for a more detailed mathematical explanation, see Baetschmann et al., 2020).
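The “blow-up” step can be illustrated with a short sketch. It covers only the data expansion, not the conditional logit estimation or the clustering of standard errors, and it is not the feologit implementation itself. The records, field names and the function `blow_up` are hypothetical.

```python
def blow_up(observations, n_categories):
    """Expand each observation into n_categories - 1 clones, one per cut-off point."""
    clones = []
    for obs in observations:
        for cutoff in range(1, n_categories):
            clones.append({
                "pid": obs["pid"],                    # cluster identifier for the standard errors
                "wave": obs["wave"],
                "clone_group": (obs["pid"], cutoff),  # grouping unit for the conditional logit
                "d": int(obs["y"] >= cutoff),         # 1 if the outcome reaches this cut-off
            })
    return clones

# Hypothetical ordinal outcome y in {0, 1, 2, 3} for one woman observed in three waves.
panel = [
    {"pid": 1, "wave": 2, "y": 0},
    {"pid": 1, "wave": 3, "y": 2},
    {"pid": 1, "wave": 5, "y": 1},
]
expanded = blow_up(panel, n_categories=4)
print(len(panel), len(expanded))  # 3 9
```

With four outcome categories, each observation yields three clones, so the expanded dataset is three times the size of the original panel. This is why the reported number of observations in such a model exceeds the initial sample.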

Baetschmann et al. (2020) report that the BUC estimator has good properties, that the efficiency loss in finite samples is minor, and that it has fewer convergence problems than the two-step generalised method-of-moments estimator or empirical likelihood estimators.

Before the development of the feologit estimator, there was no consistent ordered logit model with a fixed effect estimation (Baetschmann et al., 2020). Therefore, researchers transformed the dependent variable into a binary response and utilised a consistent fixed effects estimator for the binary logit model (xtlogit, fe). As a robustness check, we also follow this method (see Table S3 in Supplementary Information) with a recommended transformation of the PHQ-2 scale with a cut-off point ≥ 3 (out of possible score from 0 to 6) (Kroenke et al., 2003) and the alternative cut-off point of ≥ 2 (Manea et al., 2016).
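The binary transformation used in this robustness check can be sketched as follows, assuming that each PHQ-2 item is scored from 0 to 3 so that the total ranges from 0 to 6. The function names are illustrative.

```python
def phq2_total(depressive_feelings, anhedonia):
    """Sum the two PHQ-2 items (each scored 0 to 3) into a total from 0 to 6."""
    for item in (depressive_feelings, anhedonia):
        if item not in range(4):
            raise ValueError("each PHQ-2 item is scored from 0 to 3")
    return depressive_feelings + anhedonia

def screens_positive(total, cutoff=3):
    """Return 1 if the PHQ-2 total reaches the cut-off point, 0 otherwise."""
    return int(total >= cutoff)

total = phq2_total(depressive_feelings=1, anhedonia=1)
print(screens_positive(total))            # 0 with the recommended cut-off of >= 3
print(screens_positive(total, cutoff=2))  # 1 with the alternative cut-off of >= 2
```

The example shows why the choice of cut-off matters: a respondent with a total score of 2 is classified differently under the two thresholds.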

Results

Descriptive Analysis of Women’s Mental Health Over Time

This section addresses our first aim by using graphical analysis to follow the changes in the number of women (frequencies) experiencing mental health symptoms over time (see more details in Table S2 in Supplementary Information). Additionally, we determine the significance of the differences in the median values of women’s mental health between waves using a Kruskal-Wallis H test (see Table 3).

Regarding the PHQ-2 scale (Fig. 1), we find that at the beginning of the pandemic (in wave 2), almost 50% of women screened positive for a depressive symptom, increasing to almost 58% (in wave 3) before slightly dropping to 55% (wave 5).

In contrast, when looking at the item ‘depressive feelings’, we find no drastic change in the number of women who expressed recurring depressive feelings. At the beginning of the pandemic (wave 2), it was approximately 34% of women, increasing marginally to 37% (wave 3) and then slightly decreasing to 36% (wave 5).

Furthermore, 36% of women expressed recurring symptoms of ‘anhedonia’ (see Fig. 1) at the beginning of the pandemic (wave 2), increasing to 46% (wave 3). After one year of the pandemic, it dropped to 44% (wave 5).

Fig. 1 The change over time in negative categories by the outcome variables

All three outcome variables, therefore, follow the same pattern (see Fig. 1). From wave 2 to 3, the number of women experiencing symptoms of depression (PHQ-2), ‘depressive feelings’ or ‘anhedonia’ increased. Finally, as the lockdown eased and the pandemic evolved, the number of women with any positive screening via the PHQ-2 scale or having ‘depressive feelings’ and ‘anhedonia’ decreased. This is consistent with the previous study by Gigantesco et al. (2022) and can be related to the lockdown regulations and the severity of the pandemic.

Turning to the Kruskal-Wallis H test, we test if the differences in the values of the three variables over time are significant. We start by testing for the whole period – thus comparing wave 2 to wave 5. Next, we compare each wave separately (see Table 3).

Table 3 Significance of the difference over time (a Kruskal-Wallis H test)

Regarding the PHQ-2 scale, the difference in median values between wave 2 and wave 5 is statistically significant at the 1% level (see Table 3). We also find significant differences between wave 2 and wave 3. However, the difference between wave 3 and wave 5 is no longer significant.

Additionally, the difference in median values of ‘anhedonia’ between wave 2 and wave 5 is statistically significant at the 1% level (see Table 3). We also find significant differences between wave 2 and wave 3. However, the difference between wave 3 and wave 5 is no longer significant.

In contrast, the difference in median values of the ‘depressive feelings’ outcome is not statistically significant for any comparison (see Table 3): neither between wave 2 and wave 5, nor between wave 2 and wave 3, nor between wave 3 and wave 5. This indicates that the number of women who felt depressed did not change much over time. This result agrees with the finding in Fig. 1, where, even though the frequency of ‘depressive feelings’ increased in wave 3, the increase was not drastic.

In summary, we find that the two items in the PHQ-2 scale behaved differently over time: the median values of ‘depressive feelings’ did not change significantly, while those of ‘anhedonia’ did. Given this unexpected result, we test the internal reliability (consistency) of the PHQ-2 scale using Cronbach’s alpha and the Spearman-Brown coefficient.

We find that Cronbach’s alpha is 0.49 across all waves, and the Spearman-Brown coefficient is 0.491. Compared to the studies mentioned in Section “Background on the Instruments Measuring Mental Health” (Cronbach’s alpha ranging from 0.79 to 0.92), the internal consistency of our PHQ-2 scale is low. Additionally, a previous study by Oyenubi et al. (2022) calculated Cronbach’s alpha for the PHQ-2 on the general population sample and found the coefficient to be 0.78, which indicates good internal consistency. This leads us to conclude that although the PHQ-2 scale is often used and was found to be internally consistent for the general population, it is important to note that in a pandemic, the consistency of the scales, also pertaining to specific cohorts, can be affected. The finding highlights the importance of not only the scale but also the individual items of the scale.

We further argue that likely reasons for the observation of a spike in wave 3, the significance in median values and the inconsistency in the reliability of the PHQ-2 scale are related to (i) the unprecedented nature of the COVID-19 pandemic, (ii) restrictions and regulations to curb the spread of the virus, and (iii) the disease itself.

For instance, during wave 2 (data collected from July 13 – August 13 2020), advanced level 3 lockdown restrictions were enforced. These restrictions included, for example, only takeaway restaurants being permitted to operate, and public places, gatherings, and sporting activities were all restricted. Additionally, everyone was confined to their residence from 11 pm – 4 am (President Cyril Ramaphosa, 2020). From the start of the outbreak until the last day of data collection for wave 2, South Africa had a total number of 568,919 confirmed cases, which is 444,329 cases more than at the end of wave 1 data collection (May 7 – June 27 2020) (Mathieu et al., 2020). There was a spike in the daily new cases, averaging 12,131 per day compared to 6,215 in wave 1. The total number of accumulated deaths starting from the beginning of the pandemic was around 13,496.

During wave 3 (data collected from November 2 – December 18 2020), the country endured level 1 restrictions, where most activities were allowed except for international travel (Premier Sihle Zikalala, 2020). However, in December 2020, the new Beta variant of the virus was detected in South Africa. There were 892,813 confirmed cases by the end of wave 3, with a spike in the new cases averaging 9,126 daily (Mathieu et al., 2020). The total accumulated deaths from the pandemic until the last day of data collection for wave 3 was around 24,011. Even though the restrictions were not strict, the number of people infected increased during that time (more than 50,000 active cases), which could have caused increased stress levels (President Cyril Ramaphosa, 2020), hence the significant change in the median values of the mental health outcomes between wave 2 and 3 (see Table 3).

Finally, during wave 5 (data collected from April 6 – May 11 2021), level 1 restrictions remained but were relaxed with many additional freedoms, such as gatherings of up to 250 people for outdoor events. There were 1,597,724 confirmed cases at the end of wave 5, with a relatively small number of new cases averaging 1,129 daily (Mathieu et al., 2020). The total accumulated deaths from the pandemic until the last day of data collection for wave 5 was around 54,825. The first COVID-19 vaccine by Johnson & Johnson was introduced in South Africa on February 17 2021, and the Pfizer vaccine was introduced in May 2021. We assume that the introduction of vaccines and the low number of daily cases were probable reasons for the changes between waves 3 and 5 being not significant.

Overall, the results show that South African women improved their mental health after one year of the pandemic. This is consistent with the expectation that after the initial shock, women’s mental health will return to the baseline (see Gigantesco et al., 2022).

Comparison of Depressive Symptoms among Women Depressed in 2017

Our second aim is to investigate whether being depressed in 2017 is related to women’s mental health during the pandemic. As mentioned in Section “Controlling for Pre-Existing Depressive Symptoms Among Women”, the variable “being depressed in 2017” is time-invariant; thus, the most efficient estimator is a pooled ordered logit model, clustering standard errors at the individual level for each outcome variable (Table 4).

Table 4 Pooled ordered logit model while controlling for pre-existing depressive symptoms

We find that being depressed in 2017 is statistically significant and has a positive sign only in the ‘depressive feelings’ model. It means that a predisposition to depression increases the probability of being depressed during the pandemic. The finding aligns with the previous cross-sectional research that controlled for the pre-pandemic mental health of women (see Kim et al., 2022).

Therefore, women who felt depressed prior to the pandemic experienced a greater initial shock to their mental health than those who had never experienced any depressive symptoms. This could also explain why we did not see a significant increase in the ‘depressive feelings’ over time in Fig. 1.

Factors Associated with Women’s Mental Health in South Africa

Now, we continue with our final aim of analysing the factors associated with women’s mental health deterioration (Table 5).

We interpret the fixed effects ordered logit results, which include wave-fixed effects – thus controlling for all observed and unobserved characteristics of waves (such as stricter regulations and more COVID-19 cases or deaths) and unobserved time-invariant characteristics of individuals.

From Section “Descriptive Analysis of Women’s Mental Health Over Time”, two things have become apparent. First, it seems that the COVID-19 pandemic did create a need to disaggregate the PHQ-2 scale, allowing us a more granular investigation in determining those factors associated with a worsening in ‘depressive feelings’ and ‘anhedonia’. Second, the biggest change in the outcome variables was recorded in the ‘anhedonia’. As such, we will discuss those factors associated with an increase in ‘anhedonia’ for women during the pandemic and note where these differ from those associated with ‘depressive feelings’ and the PHQ-2 scale.

Table 5 Results of the pooled ordered logit and fixed effects ordered logit models

In terms of demography, we find that African women’s probability of experiencing worsening mental health is lower than that of other race groups (see pooled ordered logit model in Table 5). This aligns with the Oyenubi et al. (2022) study. In South Africa, race and socioeconomic status are highly correlated, and earlier studies found that higher levels of depressive symptoms are related to lower socioeconomic status. However, we find a plausible explanation in the Conservation of Resources (COR) theory (Hobfoll et al., 2016), which states that psychological distress under specific conditions depends on one’s perception of the decrease in one’s resources. Therefore, it is possible that the unprecedented conditions of COVID-19 affected people with higher socioeconomic status more than those with lower status.

Compared to the other forms of dwellings (informal, traditional), living in a formal dwelling increases women’s ‘anhedonia’. This is unusual because formal dwellings usually have better characteristics and provide higher living standards. It is plausible that increasing financial insecurity and loss of employment increase stress around housing and increase the risk of eviction (Parke & Adebayo, 2021). This finding holds for both the ‘depressive feelings’ and PHQ-2 scale (see fixed effects ordered logit model in Table 5).

Apart from race and dwelling, we also find two results (although not statistically significant) worth noting. Age and household income (log) do not seem to play a role in worsening mental health across all our models. We suspect that a period of one year between the waves was not long enough to capture the real effect of growing older during the pandemic or that being older did not play a role since all age groups were susceptible to the coronavirus. Regarding household income (log), the government did not financially support households with higher income levels compared to the social support (in the form of grants) provided to households with lower income levels.

Being employed compared to not economically active (NEA) significantly increased women’s ‘anhedonia’ (Table 5). The change in the work conditions (work from home; reduction in the number of staff on site) meant that women had to endure a disproportionate share of the burden of unpaid care and domestic labour (cleaning, cooking, other chores) while still performing their duties in formal employment (EIGE, 2021; Krentz et al., 2020). These additional responsibilities reduce work performance, extend the total working hours (paid and unpaid), and increase the chances of burnout and apathy. From granular descriptive statistics by outcome variable (see Footnote 6), we observed that the biggest change in employment status, compared to NEA, was in the group of employed women rather than those actively looking for employment (unemployed).

In contrast, being unemployed compared to NEA increased the probability of the PHQ-2 scale becoming more severe among women. This result is consistent with existing research (Posel et al., 2021; Oyenubi & Kollamparambil, 2022). The pandemic increased the chances of job losses, where women’s employment declined by 4.2% between 2019 and 2020 (ILO, 2021). Being unemployed means financial and job search stress, while women who do not seek employment (NEA) might rely on other income sources.

When looking at the household size, we see that increasing the number of people in the household increases the probability of ‘anhedonia’ for women during the pandemic (Table 5). More people in the household means less personal space and more social interactions, leading to family conflicts and disagreements, especially when everyone is forced to spend more time at home (Humphreys et al., 2020). Granular descriptive statistics show that employed women lived in larger households, which, as we mentioned above, increases the amount of unpaid domestic work and leads to higher levels of exhaustion. This finding is not statistically significant in either the ‘depressive feelings’ or PHQ-2 models.

Furthermore, taking care of children for 13 to 24 h a day, compared to not taking care of children, increased women’s ‘anhedonia’. As formal and informal childcare centres and schools were closed, the burden fell on adult women to spend more hours looking after and schooling children. There are great imbalances in the gender distribution of childcare and domestic work: women did an average of 15 h more each week than men during the pandemic (Krentz et al., 2020). Moreover, an Australian study found a worrying pattern in which employed parents with children aged 5 to 11 experienced higher levels of, and stronger increases in, mental distress than unemployed parents during the pandemic (Broadway et al., 2020). This finding holds for the PHQ-2 scale but is not statistically significant in the ‘depressive feelings’ model, which aligns with the results on employment status.

Living in a household that receives a government grant decreases the probability of women having ‘anhedonia’ (Table 5). Comparable results were found in the study by Posel et al. (2021). Having a grant gives one a sense of security and the knowledge that there is money to do things, preventing households from falling into extreme poverty (Senona et al., 2021). A more granular analysis of the descriptive statistics (see Footnote 7) showed that women who lived in a household receiving a government grant were mostly unemployed or not economically active. This means they did not have the additional responsibilities associated with formal employment, thereby allowing them more free time. This finding does not hold for the ‘depressive feelings’ or PHQ-2 models.

Having someone who has gone hungry in the household increased the probability of women having ‘anhedonia’. This result is statistically significant across all models. StatsSA (2020) reported that almost 23.6% of South Africans were moderately or severely food insecure in 2019, which is proven to be stressful and can lead to higher levels of anxiety and depression (Han et al., 2022). The pandemic caused economic disruption, with many businesses shutting down as well as disruptions in the food supply chains due to travel restrictions. This could lead to food shortages and price increases, creating difficulties in accessing food.

Two factors were significant in the ‘depressive feelings’ model and the PHQ-2 scale while not statistically significant in the ‘anhedonia’ model.

An increase in the years of education increased the probability of ‘depressive feelings’ becoming more severe among women (Table 5). Women with more years of education have higher job positions and responsibilities, which can cause more pressure on their performance. Additionally, women with more years of education may consume more news and information. Due to the spread of misinformation and disinformation during the pandemic (Borges do Nascimento et al., 2022), there was an increased exposure to negative and depression-provoking information. This finding holds for the PHQ-2 scale.

Being married compared to being single decreases the probability of severe ‘depressive feelings’ among women (Table 5), consistent with the literature (see Yirmiya et al., 2021). Having someone (partner or a spouse) during a crisis is crucial as they can provide emotional support, comfort and security (Kumar et al., 2020). This finding holds for the PHQ-2 scale.

Lastly, one factor was significant in the ‘depressive feelings’ model while not statistically significant in the ‘anhedonia’ and PHQ-2 scale models. Wearing a mask decreases the probability of women having ‘depressive feelings’ (Table 5). Mask-wearing was mandatory in South Africa (Department of Health, 2022) and aimed to prevent the spread of the virus. Even though mask-wearing has brought conflicting views on its benefits and risks, individuals who believed in the protective properties of mask-wearing showed decreased anxiety and increased feelings of control over health outcomes (Schneider & Leonard, 2021). Since the vaccines were not rolled out during the first year of the pandemic (the phase 1 vaccination programme began in February 2021), mask-wearing was one of the easily available prevention measures.

Another protective behaviour, washing hands, is worth noting, although it is not statistically significant. Even though hand hygiene was strongly promoted by the WHO (2020) and the South African government, several factors may have influenced the effectiveness and significance of handwashing in our study. Firstly, some areas in South Africa lack access to clean water, and with 30% of the women in our sample living in such an environment, the non-significant result is not surprising. Moreover, compared to mask-wearing behaviour (more visible in a public setting), handwashing is a personal and more private behaviour that is harder to detect and regulate. For example, a study of the general population in Germany conducted from May 2020 to August 2021 revealed that a large portion of the adults (one-third) did not comply with hand hygiene and had no intention to change their behaviour (Lippke et al., 2022).

Additionally, a woman’s belief that she could get the coronavirus was not statistically significant in the fixed effects ordered logit model, which uses only within-individual variation, but was statistically significant in the pooled ordered logit model, indicating that a higher perceived likelihood of contracting the coronavirus is associated with a greater chance of a worsening mental health state. Studies by Oyenubi and Kollamparambil (2020), Oyenubi et al. (2021) and Oyenubi and Kollamparambil (2022) also found this result when analysing separate waves of the NIDS-CRAM data on both genders. The perceived risk of contracting COVID-19 may have mattered more in the earlier stages of the pandemic and evolved as the pandemic progressed, with the growing belief that everyone would eventually contract the coronavirus.

Conclusions

Women are a vulnerable group at particular risk of having mental health problems exacerbated by the COVID-19 pandemic. We contributed to the existing literature by analysing the factors associated with women’s mental health in a developing country in Africa at the micro-level. We followed the same women over time and controlled for pre-existing depressive symptoms. We tracked changes in mental health throughout the pandemic to identify the medium-term effects rather than the immediate effects of the pandemic. We took advantage of having a combined Patient Health Questionnaire (PHQ-2) scale of depressive symptoms as well as two separate items of this scale (‘depressive feelings’ and ‘anhedonia’).

We found that the median values of depressive symptoms, as measured by the PHQ-2 scale, increased significantly during the first period; nonetheless, these eased over time. However, when it came to the individual scale items, ‘depressive feelings’ and ‘anhedonia’, we found that, contrary to expectations, they behaved differently over time. The median values of ‘depressive feelings’ remained relatively constant, whereas the median values of ‘anhedonia’ increased considerably over the investigation period. This was unexpected since items in a scale should be highly correlated when measuring the same construct. Subsequent results from testing the internal reliability of the PHQ-2 scale (Cronbach’s α = 0.49) made it clear that the two items measure different concepts and should be considered separately to attain deeper insights.
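To make the reliability argument concrete, the following is a minimal sketch of how Cronbach’s α can be computed for a two-item scale such as the PHQ-2. It is an illustration only, not the code used in this study: the function name `cronbach_alpha` and the response vectors are hypothetical, and the responses are made-up values on the 0 to 3 range of the PHQ items rather than NIDS-CRAM data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k equal-length lists, one list of scores per scale item.
    Uses alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent (row-wise sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses (0 = not at all ... 3 = nearly every day).
depressive_feelings = [0, 1, 2, 3, 1, 0, 2, 1]
anhedonia = [1, 0, 2, 1, 3, 0, 1, 2]

alpha = cronbach_alpha([depressive_feelings, anhedonia])
print(f"Cronbach's alpha: {alpha:.2f}")
```

With two items, α depends only on how strongly the items covary relative to their variances: two identical items give α = 1, and two uncorrelated items give α = 0. A low value therefore signals that the two items do not move together closely enough to be treated as a single construct.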

Furthermore, we found that being depressed before the pandemic increased the probability of women being more depressed during the investigation period. Since the Kruskal-Wallis H test indicated that the difference in the median values of ‘anhedonia’ between wave 2 and wave 5 was statistically significant, we focused on this measure in subsequent analyses.
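The wave comparison can be illustrated with a minimal sketch of the Kruskal-Wallis H test for two groups. This is not the procedure or software used in this study: the function name `kruskal_wallis_two_groups` and the data are hypothetical, the sketch treats the two waves as independent samples, and it relies on the usual chi-square approximation with one degree of freedom, applying the standard correction for ties (which are frequent with 0 to 3 item scores).

```python
from collections import Counter
from math import erfc, sqrt


def kruskal_wallis_two_groups(x, y):
    """Tie-corrected Kruskal-Wallis H statistic and approximate p-value
    for two groups (chi-square approximation, 1 degree of freedom)."""
    counts = Counter(x + y)
    n = len(x) + len(y)
    if len(counts) < 2:
        raise ValueError("all observations are identical")
    # Assign midranks: tied values share the average of their rank positions.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = next_rank + (ties - 1) / 2
        next_rank += ties
    # H = 12 / (n (n + 1)) * sum(R_j^2 / n_j) - 3 (n + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in (x, y))
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Correction for ties.
    h /= 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(h / 2))
    return h, p_value


# Hypothetical 'anhedonia' item scores (0-3) in two waves.
wave_2 = [0, 0, 1, 1, 0, 2, 1, 0]
wave_5 = [1, 2, 2, 3, 1, 2, 3, 1]

h_stat, p_val = kruskal_wallis_two_groups(wave_2, wave_5)
print(f"H = {h_stat:.3f}, p = {p_val:.4f}")
```

A small p-value indicates that scores in one wave tend to rank higher than in the other. With more than two waves the same statistic applies, but the degrees of freedom equal the number of groups minus one, so the p-value calculation in this sketch would no longer be valid.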

Upon investigation, the factors associated with an increase in ‘anhedonia’ were being employed, having a larger household size, living in a formal dwelling, having a person who has gone hungry in the household, and taking care of children for 13–24 hours a day. In comparison, living in a household where someone received a government grant decreased ‘anhedonia’ and improved women’s mental health.

For our ‘depressive feelings’ model, we found that being more educated, living in a formal dwelling and having a person who has gone hungry in the household increased the probability of depressive feelings, which worsened women’s mental health. In contrast, having a partner and wearing a mask decreased the occurrence of depressive feelings and improved women’s mental health.

Regarding the combined PHQ-2 measure, we found that being more educated, unemployed, living in a formal dwelling, having a person who has gone hungry in the household, and taking care of children for 13–24 hours a day increased the probability of depressive symptoms. Again, having a partner decreased this probability and improved women’s mental health.

Considering the above, our results reinforce the argument that ‘anhedonia’ and ‘depressive feelings’ measure different concepts and provide information that is masked when only the combined PHQ-2 scale is considered. Further investigation is needed into the PHQ-2 scale and its use to evaluate mental health during unprecedented times.

Given our findings, there is a need for economic and social policy interventions to protect and improve women’s mental health. As the United Nations’ Policy Brief Report (2020:5) states, there is a need to “place women’s economic lives at the heart of the pandemic response and recovery plans”. Firstly, we suggest creating mental health policies to assist women during a crisis, creating special programmes protecting women with pre-existing psychological vulnerabilities and targeting areas such as job security, access to food and daycare for children. Furthermore, health insurance, paid sick and maternity leave, and social security schemes (cash transfers for women with care responsibilities) should be available beyond formal employment (United Nations, 2020:5). Effective interventions by government and non-government institutions can mitigate the negative effects of the pandemic on people (Sirgy et al., 2022).

While this study makes a valuable contribution to the existing literature on women’s mental health, it has several limitations that stem from the chosen dataset. Firstly, NIDS-CRAM used a special Computer Assisted Telephone Interviewing method, which meant that our sample was limited to people with phones. Secondly, the survey did not include household-level questionnaires and had only limited questions on personality traits, religion, and spirituality. Finally, there were gaps in the surveyed data. Some information was collected in several waves but not in others, which meant that we were unable to include some variables (for instance, health).