1 Introduction

The aim of this study is to determine attitudinal changes within the African country of Malawi during the early stages of the COVID-19 pandemic, between the months of May and June 2020. The survey data used for this analysis are panel data obtained through a mobile phone survey that is part of a global initiative supported by the World Bank.

A further aim of this study is not only to identify the attitudinal effects of COVID-19 on the Malawi population, but also to highlight propensity score methods using both single level and multilevel data. Propensity score methods are used within this study because they allow researchers to balance a dataset between a treatment and a control group, allowing inference to be made using observational, non–randomized survey data. Propensity score methods are an important methodology as they allow causal statements to be made, with causes being only things that could, in principle, be treatments (Holland 1986). Here, the effect of interest is the effect that COVID-19 has on the attitudes of the Malawi population.

In randomized studies, balance within the covariate distributions between the treated and control groups is guaranteed by design (Imbens and Rubin 2015). In observational studies, such as this Malawi study, this balance is not guaranteed. Propensity score methods are a group of strategies that are needed to balance a dataset so that a fair comparison can be made between the two months (May and June 2020).

The contribution of this study is to serve as a baseline for longitudinal studies and interrupted time–series modelling of attitudinal change, allowing this change to be tracked throughout the COVID-19 pandemic. Further analysis of the preceding months using the methods described within would provide insight into areas where extra resources, information, health care workers, and services can be applied to the Malawi population should a similar event occur.

An initial exploratory analysis highlighted the utility of propensity score methods in identifying methodological issues of the survey study. Using propensity score methods revealed a large, unbalanced dataset that showed possible bias in how the Malawi survey was undertaken, and which also contained a large amount of missing data. Missing data were imputed, and the cause of the imbalance was identified as the demographic variable District, which did not have good coverage across the two surveyed months.

Both single level and multilevel modelling methodologies are used for this study. The literature shows a high degree of confusion about the key properties of Fixed Effects (FE) and Random Effects (RE) models, see Bell et al. (2019), and testable assumptions are often ignored, Antonakis et al. (2021). Adding to this, there is a proliferation of myths associated with Multilevel Modelling (MLM) (Huang 2018). As a consequence, the MLM literature is fairly polarized towards RE modelling techniques. This study used a FE modelling approach as the research question relates to the means (Theobald 2018), that is, the overall mean shift in the study’s covariates between the months of May and June 2020.

Modelling the level (grouping) effects, referred to here as contextual effects, can result in many sub–models, see e.g. Finch et al. (2019). Contextual effect modelling is undertaken using created Feature Indicator Variables (FIV). FIV are composites of survey questions with similar attributes, drawn from the generalized household questions, that aid multilevel modelling by reducing the number of variables required for contextual modelling. A further advantage of FIV is that they allow attitudinal changes to be compared with other countries and surveys, as these FIV are composed of similar attributes.

This study fills a gap in survey analysis as, to the best of our knowledge, it is the only study that combines multilevel propensity scores and FIV, from model selection through to undertaking MLM using propensity score methods, with results that are comparable and transferable to other surveys for predicting attitudes in an event similar to COVID-19. Whilst there are many other analytical methods available, the methodology presented provides an alternative that adds to the "body–of–knowledge" about attitudinal change within a pandemic. We proceed as follows: In Sect. 2 we provide a brief description of Malawi. In Sect. 3 we provide an exploratory analysis. In Sect. 4 we discuss the theory of propensity score analysis. In Sects. 5 and 6 we present the propensity score analysis. Finally, in Sect. 7, we make our conclusions.

2 The study setting

This section provides background information about Malawi. Malawi is a landlocked country in southeast Africa. It had an estimated population of 17.4 million people in 2017 with an average annual growth rate of 2.7%, giving an estimated population of 20.4 million people by 2022, Makwero (2018). Malawi’s capital (and largest city) is Lilongwe. Its second–largest city is Blantyre, its third–largest is Mzuzu and its fourth–largest is its former capital, Zomba, Wikipedia Contributors (2021).

Malawi is among the world’s least–developed countries, Wikipedia Contributors (2021). Agriculture is a principal source of livelihood in Malawi. Around 84% of Malawians earn their livelihood directly from agriculture, which also accounts for over 90% of the country’s export earnings, Chinsinga and Chasukwa (2018).

The country’s first three COVID-19 cases were confirmed on 2 April 2020. In May 2020 there were 247 new cases, bringing the total number of confirmed COVID-19 cases to 284 with a death toll of 4, Wikipedia Contributors (2021). In June 2020 there were 940 new COVID-19 cases, bringing the total number of confirmed cases to 1224 with a death toll of 14, Wikipedia Contributors (2021).

Health services in Malawi are provided by the public, Private For Profit (PFP) and Private Not For Profit (PNFP) sectors. Malawi’s health system lacks resources and suffers from maldistribution of staff and funding between rural and urban settings and across tiers of care, Makwero (2018). Malawi also has a critical shortage of medicines within health facilities and long waiting times, coupled with negative attitudes of health workers, the cost of health care and long distances to health facilities, Munthali et al. (2019).

3 Exploratory analysis

The exploratory analysis was undertaken to gain initial insight into the Malawi survey and to guide the subsequent survey analysis. The Malawi survey included demographic responses, which suggests that the dataset may have a multilevel structure. The possible grouping effects were identified as the demographic variables of Gender, Age, Region, District, and Urban/Rural.

The demographic variables within the Malawi survey are shown at Table 1, which lists 5 variables. The Malawi survey contains a further 57 covariates that relate to attitudinal attributes of the survey participants, shown at Appendix A.

Table 1 Malawi survey demographic variables

This analysis used the R software package, https://CRAN.R-project.org/. Missing data within the Malawi survey were imputed using the R package mice, refer to van Buuren and Groothuis-Oudshoorn (2011) and Leite (2016). This allowed the analysis to be undertaken using all of the available data instead of deleting the units containing missing data. Deletion of the affected units may introduce bias into the analysis and reduce the statistical power of the study, leading to invalid conclusions, Kang (2013).

3.1 Survey dataset Re–coding

The Malawi survey contained survey responses using different Likert scales. The attitudinal variables were therefore re–coded (scaled) to the interval [0,1]. This allows the estimated treatment effect values at Table 27 to be comparable and transferable across the results and possibly to other surveys. Some survey responses required further re–numbering so that positive attitudinal responses are coded as 1 and negative attitudinal responses as 0. Examples of the re–coding are as follows:

  1. Yes, No = 1, 0.

  2. YES, NO, NOT APPLICABLE = 1, 0, 0.

  3. Very worried, Somewhat worried, Not too worried, Not worried at all = 1, 0.75, 0.5, 0.25, 0.

  4. AGREE, NEITHER AGREE NOR DISAGREE, DISAGREE = 1, 0.5, 0.
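The re–coding in examples 1 and 4 above amounts to placing ordered response categories at evenly spaced points of [0,1], with the most positive response at 1. The following is a minimal sketch of that idea, written in Python for illustration rather than in the R software used for the study. The function name is hypothetical, and even spacing is an assumption that should be checked against each survey question.

```python
def recode_to_unit_interval(labels_positive_first):
    """Map ordered response labels to evenly spaced scores in [0, 1].

    The first label (most positive attitude) is scored 1, the last is scored 0.
    """
    n = len(labels_positive_first)
    if n < 2:
        raise ValueError("need at least two response categories")
    return {label: 1 - i / (n - 1) for i, label in enumerate(labels_positive_first)}


yes_no = recode_to_unit_interval(["Yes", "No"])
agree = recode_to_unit_interval(["AGREE", "NEITHER AGREE NOR DISAGREE", "DISAGREE"])

# Categories that are not part of the ordered scale are mapped by hand,
# e.g. NOT APPLICABLE is collapsed onto the negative response (example 2).
yes_no_na = {"YES": 1.0, "NO": 0.0, "NOT APPLICABLE": 0.0}

# Re-code a short (made-up) column of raw responses.
responses = ["AGREE", "DISAGREE", "NEITHER AGREE NOR DISAGREE"]
scores = [agree[r] for r in responses]
```

Keeping the mapping as an explicit dictionary per question makes the direction of each scale easy to audit.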

3.2 Initial survey balance analysis

An exploratory analysis of the initial covariate balance was undertaken using the unmatched estimated propensity scores, Olmos and Govindasamy (2015), and the results are shown at Figs. 1 and 2. Figure 1 shows that the survey dataset was severely unbalanced. Figure 2 shows that removing the District variable greatly improves the dataset’s balance. Checking the dataset’s propensity score balance prior to matching has highlighted potential issues within the dataset.

Fig. 1 Balance of full dataset

Fig. 2 Balance with district removed

Figure 3 shows the distribution of the District variable within the Malawi survey for the months of May and June 2020. It shows that some districts are not represented in both months. Further, there may have been some bias towards surveying Lilongwe and Lilongwe City in the month of June. This has resulted in the extreme imbalance within the dataset seen at Fig. 1.

Fig. 3 Malawi survey districts (May=0 and June=1)

The bar chart at Fig. 3 suggests that, for some survey participants, the survey may have been undertaken in different districts in each month. This bias has greatly affected the ability of the analysis to obtain balance.

3.3 Created feature indicator variables

A framework of indicators, called here Feature Indicator Variables (FIV), is created for the analysis of this Malawi survey, as such indicators provide useful information to measure and interpret causes and effects (Mainguet and Baye 2006). The 7 created FIV were identified as reflecting the key attributes of the Malawi survey questions. They allow the analysis to focus on the survey’s attitudinal attributes in determining the attitudinal change. This aids the interpretation of results, as the results are grouped into attributes of similar types. It is this feature that allows the survey results to be comparable and transferable to other surveys with similarly constructed FIV.

Using all 57 attitudinal variables to determine the contextual effects for a multilevel model would require a large amount of modelling and time. The FIV reduce the 57 attitudinal survey questions to 7 categories of similar attributes. This reduction makes contextual modelling that covers the complete dataset achievable.
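The reduction from individual questions to a small number of FIV can be sketched as follows. This is an illustration only: the FIV and item names are hypothetical, and the use of a simple mean of the re–coded [0,1] items as the composite is an assumption, since the actual composition is the one given at Appendix A.

```python
from statistics import mean

# Hypothetical grouping: FIV name -> names of the re-coded [0, 1] items it combines.
FIV_ITEMS = {
    "health_concern": ["worried_illness", "worried_family"],
    "trust": ["trust_government", "trust_health_workers", "trust_media"],
}


def fiv_scores(respondent, fiv_items=FIV_ITEMS):
    """Return one composite score per FIV as the mean of its available items."""
    scores = {}
    for fiv, items in fiv_items.items():
        values = [respondent[item] for item in items if respondent.get(item) is not None]
        scores[fiv] = mean(values) if values else None
    return scores


# One made-up respondent with five re-coded answers.
respondent = {
    "worried_illness": 1.0,
    "worried_family": 0.5,
    "trust_government": 0.0,
    "trust_health_workers": 0.5,
    "trust_media": 1.0,
}
composite = fiv_scores(respondent)
```

Because every item is already on the [0,1] scale, each composite stays on that scale and can be compared across FIV.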

The created FIV are shown at Table 2, with the full listing of the FIV composition given at Appendix A. The results at Appendix A are the individual variable results, ordered into their respective FIV groups.

Table 2 Created featured indicator variables

4 Propensity score methods

In observational studies, the treated subjects often differ systematically from untreated subjects. An unbiased estimate of the average treatment effect cannot be obtained by directly comparing outcomes between the two treatment groups, Austin (2011). Rosenbaum and Rubin (1983) proposed the use of propensity score methods in observational studies for determining causal effects, based on Rubin’s causal model, Holland (1986). This allows a direct comparison to be undertaken between a treated and a non–treated group, as propensity score methods can mimic a randomized controlled trial or study, see e.g. Austin (2011).

Propensity score methods are a group of strategies, see e.g. Leite (2016), that aim to reduce selection bias by balancing the differences between the treated and untreated individuals on observed covariates in observational studies. The propensity score is defined as the probability of treatment assignment conditional on observed baseline covariates, Austin (2011). Among subjects that share the same propensity score, the distribution of observed baseline covariates will be the same between the treated and untreated, Austin (2011).

The propensity score is frequently estimated, Austin (2011), using a logistic or a probit model with exposure to the treatment as the dependent variable. The estimated propensity scores are then used in a matching method that equates or “balances” the distribution of covariates in the treated and untreated groups, see e.g. Stuart (2010). In this way, propensity score methods enable a researcher to determine the effect of treatment on a target group.
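As an illustration of the estimation step, the sketch below fits a logistic model of treatment on a single covariate by plain gradient ascent and returns the fitted probabilities as propensity scores. It is written in Python with made-up data and hypothetical function names. It is not the GLM routine used in this study, and in practice a tested statistical package should be preferred.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fit_logistic(X, z, lr=0.1, n_iter=5000):
    """Fit P(Z = 1 | x) by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows and z a list of 0/1 treatment indicators.
    Returns the coefficients with the intercept first.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, zi in zip(X, z):
            xi = [1.0] + list(row)
            err = zi - sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(p + 1):
                grad[j] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment for each covariate row."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]


# Made-up data: one re-coded [0, 1] covariate; treatment is more likely when it is high.
X = [[0.0], [0.25], [0.25], [0.5], [0.5], [0.75], [0.75], [1.0]]
z = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, z)
scores = propensity_scores(X, beta)
```

The fitted scores, not the raw covariates, are what a matching or sub–classification step would then use.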

For this Malawi study, each subject has a pair of potential outcomes, \(Y_i(0)\) and \(Y_i(1)\). However, each subject is either in the treatment group, \(Z_i=1\), or the non-treatment group, \(Z_i=0\), so only one outcome can be observed for each subject. This observed outcome is given by, Austin (2011);

$$\begin{aligned} Y_i=Z_iY_i(1)+(1-Z_i)Y_i(0) \end{aligned}$$
(1)

For each subject (Austin 2011), the effect of treatment is defined as \(Y_i(1)-Y_i(0)\). The average treatment effect (ATE) is the average effect, at the population level, of moving an entire population from untreated to treated, and is given by;

$$\begin{aligned} E[Y_i(1)-Y_i(0)] \end{aligned}$$
(2)

A measure related to the ATE (Austin 2011) is the average treatment effect for the treated (ATT). The ATT, the average effect of treatment on those subjects who received treatment, is given by;

$$\begin{aligned} E[Y_i(1)-Y_i(0) \vert Z=1] \end{aligned}$$
(3)

The ATT is the effect of the treatment on those to whom it is applied, while the ATE tells us how much the typical survey respondent gained or lost. The ATE is used when the interest is in the average treatment effect for the entire population, whereas the ATT is used when the interest is in the average treatment effect for those treated. For this study, the ATE is of primary concern. The matching method used needs to be chosen with this in mind, as some matching methods only estimate the ATT.
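The difference between the two estimands can be made concrete with a small worked example. The potential outcomes below are invented for illustration, and both are listed for every subject only so that the definitions can be evaluated directly. In real survey data only the observed outcome is available, which is why propensity score methods are needed.

```python
from statistics import mean

# Invented potential outcomes on the [0, 1] scale for six subjects.
units = [
    {"z": 1, "y0": 0.25, "y1": 0.75},
    {"z": 1, "y0": 0.50, "y1": 1.00},
    {"z": 1, "y0": 0.25, "y1": 0.75},
    {"z": 0, "y0": 0.50, "y1": 0.50},
    {"z": 0, "y0": 0.75, "y1": 0.75},
    {"z": 0, "y0": 0.25, "y1": 0.25},
]


def observed_outcome(u):
    """Observed outcome: Y = Z * Y(1) + (1 - Z) * Y(0)."""
    return u["z"] * u["y1"] + (1 - u["z"]) * u["y0"]


# ATE averages the individual effects over everyone, ATT over the treated only.
ate = mean([u["y1"] - u["y0"] for u in units])
att = mean([u["y1"] - u["y0"] for u in units if u["z"] == 1])
observed = [observed_outcome(u) for u in units]
```

In this example the treated subjects respond more strongly than the untreated would have, so the ATT (0.5) is larger than the ATE (0.25).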

5 Multilevel modelling and propensity scores

Multilevel models were developed for the analysis of hierarchically structured data. Hierarchical data are structures which consist of lower level observations that are nested within higher levels, see e.g. (Paccagnella 2006).

Social science datasets often have complex structures. Often, observations at level 1 are clustered into groups of some kind at level 2. Sometimes data can be grouped at further levels, yielding three or more levels, see e.g. (Bell et al. 2019). This Malawi study only investigates a 2–level model, in which the level 1 observations are grouped into clusters. The clusters for this study are defined by the demographic variables.

There needs to be a specific case for undertaking Multilevel Modelling (MLM). Just because observations are nested within clusters does not automatically mean MLM is required, Huang (2016). A single level model that does not account for any clustering effects may at times be the simplest and best option. Undertaking MLM when it is not required can complicate the modelling process and risks using a misspecified model. This applies equally to multilevel propensity score methods, where determining which modelling strategy to use, whether Random Effects (RE) or Fixed Effects (FE), is a critical step.

For this study, as stated earlier, only a 2–level model is considered. The possible 2–level combinations of the Malawi survey are identified at Table 3.

Table 3 2–Level Combinations

As this study uses FE, the Likelihood Ratio Test (LRT), also known as the Chi–square difference test, has been used to determine whether the contextual (clustering) effects are significant, and whether the Malawi survey needs to be modelled as a multilevel model. LRT is a hypothesis testing procedure that allows for the comparison of two nested models, (Lorah 2018).
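The mechanics of the LRT can be sketched in a few lines. The example below is a Python illustration with invented log–likelihood values and hypothetical function names. It assumes the null model omits the grouping terms and the alternative model adds them, so that the models are nested and the statistic is referred to a chi–square distribution with degrees of freedom equal to the number of added parameters.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0              # closed form for df = 2
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5   # closed form for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with a = df / 2.
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_null, loglik_alt, df_diff):
    """Return the LRT statistic 2 * (l_alt - l_null) and its chi-square p-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(stat, df_diff)


# Invented values: the alternative model adds 2 parameters for a 3-category grouping variable.
stat, p_value = likelihood_ratio_test(loglik_null=-1250.4, loglik_alt=-1243.1, df_diff=2)
```

A small p-value indicates that the grouping terms improve the fit, which is the sense in which a clustering effect is called significant here.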

Results for the FE contextual models are given at Table 4. The contextual effects have been estimated using the created Feature Indicator Variables. This significantly reduces the number of models that need to be created to determine the contextual effects.

Table 4 Fixed effects contextual modelling results

The results from Table 4 show that all of the grouping variables, apart from Age, had significant clustering effects. The clustering effects of Gender, Region and Urban/Rural have been modelled using multilevel propensity score methods at Sect. 5.2. The variable District was unsuitable for multilevel propensity score modelling due to insufficient coverage, although it does show significant clustering effects.

5.1 Single level propensity score analysis

After the creation of the FIV, we measure the key attribute results of the survey, see Figs. 2 and 5, with a single level propensity score model. A propensity score is a distance measure (Stuart 2010). Within the literature it is common to see propensity scores estimated using a Generalized Linear Model (GLM). The results of the GLM are passed to a propensity score matching process that determines the effect of treatment. There are a variety of other propensity score estimation methods available. Guidance on which method may be best suited is sparse, but whichever method produces the best balance should be considered. Matching is a method of sampling from a large reservoir of potential controls to produce a control group of modest size in which the distribution of covariates is similar to that in the treated group, see e.g. Rosenbaum and Rubin (1983).

Matching can be defined as any method that aims to equate (or balance) the distributions of covariates in the treated and control groups, Stuart (2010). A matching method that results in unbalanced samples should be rejected, and alternative methods should be tried until a well–balanced sample is obtained.

Matching can be undertaken multiple times, if required, until a balanced result is obtained, as long as the dataset has enough data to allow for this. Matching methods have straightforward diagnostics by which their performance can be assessed, Stuart (2010), as shown at Table 4. The advice when using propensity score analysis is to select the method that yields the best balance.
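One widely used balance diagnostic is the standardized mean difference of each covariate between the treated and control groups. The sketch below is a Python illustration with invented values and a hypothetical function name. The pooled standard deviation used in the denominator is one common convention, and it is not claimed to be the diagnostic reported in this study.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


# Invented covariate values for the two groups before any matching.
june = [0.5, 0.75, 1.0, 0.75]
may = [0.25, 0.5, 0.75, 0.5]
smd = standardized_mean_difference(june, may)
```

Absolute values close to zero indicate good balance. A threshold of about 0.1 is often quoted as a rule of thumb, and the same function can be re-run after matching to confirm that balance has improved.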

Common propensity score matching options, refer to Stuart (2010) and Olmos and Govindasamy (2015), can be undertaken using the R package MatchIt, Ho et al. (2018). The options, with the estimands (ATT, ATE) that each method provides, are as follows;

  1. Nearest Neighbor (Greedy) Matching (ATT),

  2. Optimal Matching (ATT),

  3. Full Matching (ATT and ATE),

  4. Inverse Probability Weighting (ATT and ATE), and

  5. Sub–Classification (ATT and ATE).

As the Malawi survey contained no sampling weights, we used the sub–classification matching method to balance the covariates, as it allows for the calculation of the ATE and resulted in a well–balanced model. Figure 4 shows the resultant propensity score balance using the sub–classification matching method.

Fig. 4 Propensity score balance using sub–classification Matching
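The way sub–classification yields an ATE estimate can be sketched as follows: units are sorted by propensity score and split into strata, the treated and control outcomes are compared within each stratum, and the stratum differences are averaged with weights proportional to stratum size. The Python code below uses invented data and a hypothetical function name. It is a simplified illustration and not the MatchIt implementation.

```python
from statistics import mean


def subclassification_ate(scores, z, y, n_strata=5):
    """ATE estimate: within-stratum difference in mean outcomes, weighted by stratum size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        treated = [y[i] for i in idx if z[i] == 1]
        control = [y[i] for i in idx if z[i] == 0]
        if not treated or not control:
            continue  # a stratum without both groups cannot contribute a comparison
        total += len(idx) * (mean(treated) - mean(control))
        used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / used


# Invented data: treated units are concentrated at high propensity scores,
# where outcomes are higher regardless of treatment.
scores = [0.20, 0.25, 0.30, 0.35, 0.60, 0.65, 0.70, 0.75]
z = [1, 0, 0, 0, 1, 1, 1, 0]
y = [0.50, 0.25, 0.25, 0.25, 1.00, 1.00, 1.00, 0.75]

naive = mean([yi for yi, zi in zip(y, z) if zi == 1]) - mean([yi for yi, zi in zip(y, z) if zi == 0])
ate = subclassification_ate(scores, z, y, n_strata=2)
```

Here the naive difference in means is 0.5, while comparing like with like inside each stratum gives 0.25, which shows how stratifying on the propensity score removes the part of the difference that is due to imbalance.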

5.2 Multilevel propensity score analysis

Further to the single level propensity score model, since we determined that the Malawi survey is clustered, see Fig. 4, we undertook multilevel propensity score modelling. Having determined that the Malawi survey has a multilevel structure, applying a FE methodology can be problematic. Analysis of multilevel survey data is mostly associated with mixed effects models, as a Google search for "Multilevel modelling" confirms. Undertaking a multilevel propensity score model using FE is not well documented.

Multilevel propensity score modelling is more labour intensive than a single level propensity score model, as each covariate needs to be modelled. There are specific packages available within R that allow for multilevel propensity score analysis. This analysis has used the package CMatching, Cannas (2019), which performs propensity score matching through the inclusion of fixed or random effects, Cannas (2019).

Multilevel propensity score methods using CMatching require that the propensity scores are estimated before implementing the matching, Cannas (2019). The CMatching package uses a GLM in estimating the propensity scores. This analysis has used the propensity scores estimated for the single level sub–classification matching process. This allowed the two modelling methods to be based on the same propensity scores.

Using the propensity scores from a single level dataset that has been balanced using a method that removes the unmatched data, for example nearest neighbor matching, allows a balanced dataset to be used within the CMatching package. Starting from a balanced dataset may be an advantage, as balance graphics such as Fig. 4 are not provided by the CMatching package.
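One generic way of respecting a grouping structure during matching is to allow a treated unit to be matched only to a control unit from the same cluster. The Python sketch below illustrates this idea with invented data and a hypothetical function name and caliper. It is a greedy simplification for illustration and is not claimed to be the algorithm applied by CMatching or the procedure used in this study.

```python
def match_within_clusters(scores, z, cluster, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without
    replacement, restricted to pairs from the same cluster.

    Returns a list of (treated_index, control_index) pairs.
    """
    pairs = []
    for c in sorted(set(cluster)):
        treated = [i for i in range(len(z)) if cluster[i] == c and z[i] == 1]
        controls = [i for i in range(len(z)) if cluster[i] == c and z[i] == 0]
        for t in treated:
            if not controls:
                break
            best = min(controls, key=lambda j: abs(scores[t] - scores[j]))
            if abs(scores[t] - scores[best]) <= caliper:
                pairs.append((t, best))
                controls.remove(best)  # each control is used at most once
    return pairs


# Invented data: six respondents in two clusters.
scores = [0.30, 0.32, 0.70, 0.31, 0.72, 0.69]
z = [1, 0, 0, 1, 0, 1]
cluster = ["Urban", "Urban", "Urban", "Rural", "Rural", "Rural"]
pairs = match_within_clusters(scores, z, cluster)
```

In this example the Rural treated respondent with score 0.31 stays unmatched: the closest control overall is Urban, and the only Rural control lies outside the caliper. This is the trade-off of within-cluster matching, which removes between-cluster differences at the cost of discarding some units.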

6 Modelling results

Two propensity score modelling methods were used for the analysis. One used an unstructured single level model and the other used multilevel propensity score methods.

Table 5 and Fig. 5 give the overall attitudinal change between the months of May and June 2020 for the single level model. Table 6 and Fig. 6 give the multilevel propensity score results based on the FIV. The multilevel propensity score results provide more detail about the grouping effects of the demographic variables and how the attitudinal change varies within the surveyed demographics of Gender, Region and Urban/Rural.

The single level propensity score method showed that the overall attitudinal results were similar between the months of May and June 2020. The multilevel propensity score model showed that there are some significant differences between the different demographics of Gender, Region and Urban/Rural.

Table 5 Summary of single level featured indicator variable ATE results

Fig. 5 Single Level Malawi survey attitudinal change summary

Table 6 Summary of multilevel featured indicator variable ATE results

Fig. 6 Multilevel Malawi survey attitudinal change summary

The individual variable results for the single level propensity score model are given at Appendix B, with the multilevel propensity score model results at Appendix C. The multilevel propensity score results give the attitudinal change for the variables grouped within Gender, Region, and Urban/Rural. This is of significant interest for prediction and for directing resources, information, health care workers, and services.

7 Conclusion

This study highlights a methodology that has been used for the analysis of a Malawi survey. It will serve as a baseline for future work on longitudinal studies and interrupted time–series modelling for determining attitudinal change. Based on the FIV, it provides information about the attitudinal change across the identified multilevel grouping effects of Gender, Region, and Urban/Rural within Malawi. It also aids prediction and the directing of resources, information, health care workers, and services should an event similar to COVID-19 occur.

This study is focused on Malawi. By using the FIV, which are based on survey questions of similar attributes, and the re–scaling of the survey responses to [0,1], along with the other modelling methods noted within, future work will allow other associated surveys to be compared with these results. Further, using the FIV has limited the number of models required and has provided a quick method of determining the contextual effects of the dataset, and hence whether multilevel modelling is required, which could then be applied to the propensity score model.

This study identifies the need to determine the contextual effects, along with a suitable fixed or random effects modelling strategy, when undertaking a survey analysis. Further, this study shows that a single level propensity score model gave the overall attitudinal change between the two months of May and June 2020 for the FIV and for each of the survey’s questions. Multilevel propensity score analysis provides more information about the attitudinal change, as it identified the attitudinal differences between the multilevel grouping effects of Gender, Region, and Urban/Rural within Malawi.

An unexpected result of this analysis is that checking propensity score balance prior to matching provided insight into survey methodology issues. Balance checking prior to matching should form part of any survey analysis.

The methodology used within this study can be extended to other (COVID-19) survey studies from different countries by using the FIV and the other modelling methods noted within. This will allow comparisons to be made, giving insight into areas where extra resources, information, and services can be applied.