Text box 1. Contributions to literature

• Surveys aiming to rapidly evaluate the impact of the COVID-19 pandemic on the population faced budget and time challenges and were dependent on existing survey infrastructure. Non-probability web surveys, which are prone to self-selection, were commonly used.

• In Belgium, the COVID-19 health surveys, 10 cross-sectional non-probability web surveys with a longitudinal component, were organized. Using diverse recruitment strategies, a substantial number of participants were reached. Yet, significant socio-demographic differences in the participant pool were found.

• Non-probability web surveys were an important information source when probability surveys were impossible. Initiatives to improve the survey infrastructure to be better prepared for future crises are important.

Background

The COVID-19 pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has had a tremendous impact on people’s lives due to the uncertainties and fears that were associated with the outbreak of the virus [1,2,3,4]. In addition, people’s lives were affected by nation-wide preventive measures adopted to reduce the transmission of the virus, including physical distancing and lockdowns during the most critical phases of the crisis. In order for decision makers to manage the outcomes of the crisis, close epidemiological monitoring was of utmost importance. Besides the need for surveillance data on the number of COVID-19 infections, hospitalizations, deaths and vaccinations, data on how the population experienced this long-lasting crisis were also crucial [2, 5, 6]. This included information on the impact of the crisis on mental health in terms of anxiety, depression, psychological distress, loneliness, etc. [1,2,3,4, 7] and on health related behaviors, e.g. physical activity, sedentary behavior and nutritional habits [8,9,10,11]. Moreover, data on the knowledge, perceptions and adherence to the preventive measures were highly valuable as the effectiveness of these measures largely depended on the compliance of the population [5, 12,13,14,15,16].

These data could be collected using population-based surveys, but the pandemic imposed some specific challenges. In the first weeks of the pandemic, surveys needed to be developed and organized rapidly to assess the impact of the most severe lockdowns [13, 14]. Related to this, the first surveys had to be organized with limited budgets since there was no time to request large funding [13]. In addition, due to physical distancing, surveys could not be administered face-to-face [13]. Lastly, it was expected that this public health crisis and the associated psychosocial effects would impact the population in the longer run. Consequently, it was advisable to monitor the population over time using multiple surveys [4]. These aspects impacted the methodological choices of population-based surveys set up during the COVID-19 pandemic. More specifically, they guided choices related to the mode of data collection, the sampling approach and the type of observational study design.

The face-to-face mode is traditionally seen as the ‘gold standard’ for data collection. An interviewer on the doorstep is effective in obtaining high participation rates, even in harder-to-reach population groups [17, 18]. Moreover, interviewers can clarify questions, probe for responses and keep participants motivated through long questionnaires [19, 20]. Nevertheless, confinement and quarantine periods made this impossible. An alternative interviewer-administered mode often considered in COVID-19 times was the telephone mode [21]. It is, however, expensive and time consuming as interviewers need to be hired and a specific infrastructure needs to be developed. The web mode has some particular advantages over these interviewer-administered modes and over the paper-and-pencil self-administered mode: 1) it is completely self-administered, which not only reduces the chance of social desirability bias but also limits the chance of spreading the virus; 2) it is, contrary to a paper-and-pencil mode, computer-assisted and therefore presents the advantage of automatic data entry and automatic branching logic; 3) costs are lower compared to the paper-and-pencil, face-to-face and telephone modes; and 4) web surveys can be developed and implemented rapidly with easy-to-use software. Researchers therefore turned to online data collection in pandemic times [6, 21,22,23,24].

Although there are many advantages to web data collection, there are certain risks too. By default, web surveys exclude people without internet access or without the skills to use the internet to complete the survey [25]. Recent data showed that 6% of the Belgian population (16-74 years old) has never used the internet and about 40% have low or no digital skills [26]. A recent Belgian study showed that web surveys are prone to low response rates, especially among the elderly, lower educated people, people with a migration background and people living alone [18]. Moreover, internet users and non-users might have a different health profile, and weighting for demographic variables does not eliminate the observed health differences [27].

Another important methodological choice, often linked to the mode of data collection, is between probability and non-probability sample surveys. In the first approach, each member of the population has a known and positive chance of being selected, which enables statistical inference; non-probability sample surveys refer to all other types of surveys [28]. Probability sampling is preferred over non-probability sampling, especially for estimating population characteristics [25, 29, 30]. However, setting up surveys in new probability samples can be an expensive and time consuming process [21]. Many COVID-19 web surveys were therefore organized in non-probability samples [23, 24]. These non-probability web surveys were either quota samples from commercial panels [12, 15] or convenience samples with self-selected participants [13, 31,32,33,34,35]. By using e-mail, website or social media announcements for convenience surveys, thousands of participants can be reached instantly [28]. Nevertheless, the validity of research findings depends more on the representativeness of the sample than on the number of participants [21].

Lastly, not all types of observational study designs are suited to monitoring the impact of the evolving situation over time. Integrating COVID-19 surveys within existing longitudinal surveys with pre-pandemic information was recommended for this [36,37,38]. In these types of surveys, the same participants were questioned on the same topic before and during the pandemic using (ideally) the same data collection mode. This design allowed causal relationships to be studied at both the individual and group level. Yet if no existing longitudinal surveys could be used, new longitudinal surveys to follow up the same individuals were considered valuable as well [4, 39, 40]. The drawback, however, is that comparisons with pre-COVID-19 survey data were hampered by the different methodological approaches used in the new surveys.

This manuscript discusses how researchers at Sciensano, the Belgian institute of health, dealt with these challenges in organizing the COVID-19 health surveys. This is a series of ten repeated online surveys that ran between April 2020 and March 2022. The objective of these surveys was to monitor the general adult population on health related topics that were relevant for policy makers and that supported them in fighting the pandemic and its effects in the medium and long term. The main objectives of this manuscript are:

  • to describe the methodology used in the COVID-19 health surveys;

  • to provide the outcomes of the COVID-19 health surveys in terms of participation and sample composition;

  • to discuss the benefits and pitfalls of the applied methodology and provide directions for future research.

Methods

Over a period of two years and therefore in different phases of the pandemic, ten online COVID-19 health surveys were organized in Belgium. The general methodology is discussed below. The elements that differed from survey to survey are presented in Table 1. Systematic methodological information about the surveys based on The Checklist for Reporting Results of Internet E-Surveys (CHERRIES) [41] can be found in Additional file 1. All ten surveys were approved by the ethical committee of the University Hospital of Ghent. Before they could take part in the survey, participants had to indicate that they lived in Belgium and were at least 18 years old. Furthermore, in all surveys participants had to provide consent to six terms and conditions including voluntary participation, confidentiality of the data and the right to withdraw at any time in accordance with the General Data Protection Regulation (GDPR) and the Declaration of Helsinki.

Table 1 COVID-19 health surveys by timing and crisis phase, recruitment, themes and participant number, Belgium 2020-2022

Timing

The first COVID-19 health survey was launched three weeks after the first restrictive measures were put in place. The subsequent surveys followed at regular time intervals. Table 1 provides more details about the timing and phases of the crisis in which the different surveys were organized. The phase of the crisis refers to the epidemiological situation and the severity of the restrictions put in place at that time. For the epidemiological situation, the number of new hospital admissions due to COVID-19 as presented on the Dashboard of Sciensano was taken into account [42]. In line with the Belgian “Coronabarometer”, we considered <65 new hospital admissions a day as low, 65-149 new hospital admissions a day as moderate and ≥150 new hospital admissions a day as high [43]. For the severity of the restrictions, the measures as presented on the official website of the government served as the reference [44]. Three distinctions were made:

  • Severe restrictions included i.a. closure of non-essential shops, bars, restaurants and schools, telework was the norm, non-essential movements and social contacts outside the household were strictly limited.

  • Moderate restrictions included i.a. non-essential shops, bars, restaurants and schools were closed or open with restrictions, telework was the norm but combined with office days, non-essential movements and social contacts outside the household, though often with restrictions, were allowed.

  • Light restrictions referred to periods where i.a. non-essential shops, bars, restaurants and schools were fully open, telework was at most a recommendation and there were no or only limited rules for non-essential movements and social contacts outside the household.
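
The classification of the epidemiological situation described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the surveys: the function name `classify_epidemiological_situation` is hypothetical, and treating non-integer values (for example a multi-day average) below 150 as moderate is an assumption.

```python
def classify_epidemiological_situation(daily_admissions: float) -> str:
    """Map new COVID-19 hospital admissions per day to low / moderate / high.

    Thresholds follow the text: <65 low, 65-149 moderate, >=150 high.
    Assumption: fractional values between 149 and 150 count as moderate.
    """
    if daily_admissions < 0:
        raise ValueError("daily_admissions must be non-negative")
    if daily_admissions < 65:
        return "low"
    if daily_admissions < 150:
        return "moderate"
    return "high"


# Hypothetical example values, chosen only to show the three categories.
for admissions in (40, 65, 149, 150):
    print(admissions, classify_epidemiological_situation(admissions))
```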

Recruitment strategy

A non-probability sampling approach was used for the COVID-19 health surveys. In crisis times, permission from the Belgian national register to draw a new probability sample could be obtained at short notice. However, this register does not contain any e-mail addresses, and consequently sampled individuals can only be invited by post, which is time and cost inefficient. Relying on a probability-based sample established prior to the pandemic was also not possible because there was no permission to re-contact participants from a previous large-scale probability survey such as the Belgian health interview survey 2018. Moreover, it was impossible to use members of a probability-based panel as these panels did not exist in Belgium when the crisis started.

The recruitment strategies can be summarized as follows:

  • River sampling: this refers to recruiting participants by putting an invitation to complete a survey on a website, a social media page, etc. where it is likely to be noticed by members of the target population [30]. Lehdonvirta et al. describe it poetically as “researchers dipping into the traffic flow of a website, catching some of the users floating by”. All COVID-19 health surveys were announced on the website and on the Twitter® and LinkedIn® pages of Sciensano. In addition, they were announced via (online) articles of national press organizations because each survey had a press release. Starting from the second survey, local community organizations, health insurance funds and senior citizens organizations were asked to share survey invitations through their websites and social media. Starting from the seventh survey, sports federations, higher education institutes and young adult organizations were also invited to spread the survey in order to attract more youngsters.

  • Snowball sampling: this refers to participants recruiting new participants from their network [45]. The name derives from the idea that the sample appears to grow like a rolling snowball. In the case of the COVID-19 health surveys, participants were asked to share the survey invitation as widely as possible among their friends, family and colleagues via e-mail and social media. In addition, Sciensano employees were asked to share the surveys among their personal contacts.

  • Recruitment of previous participants: starting from the second survey, invitation e-mails were sent to a list of previous participants who agreed in a given survey that their e-mail address could be kept for this purpose. The e-mail invitations were developed and sent using software of Tripolis®. At the time of the last COVID-19 health survey, 50423 former participants received an invitation. Inviting previous participants also made a follow-up possible. The data of people who participated in multiple waves of the COVID-19 health surveys were linked between waves. Their e-mail address, combined in some cases with some background information and the last four digits of their phone number, served for this linkage as there was no other unique identifier. Participants gave consent for this approach. For privacy reasons, the e-mail addresses were separated immediately from the datasets used for the analyses.

  • Offline recruitment: in addition to online recruitment, there was also offline recruitment for some COVID-19 health surveys. The surveys were announced during the Coronavirus press conferences organized on a regular basis by the National Crisis Center to inform the population about the epidemiological situation. Some surveys were also mentioned in offline news media.

The specific approaches used per survey can be found in Table 1. The recruitment strategies were continuously adapted in order to try to keep participation high and to attract a sample diverse in terms of socio-demographic characteristics. Materials, such as visuals, social media messages and e-mails, were developed together with internal communication experts.

In order to partly correct for bias associated with this sampling and recruitment strategy, post-stratification weights were applied in the analysis of the data. For sex, age group and province, information on the composition of the population on January 1st, 2019 as calculated by Statbel, the Belgian Statistical Office, was used. To address unequal participation by educational level, weights were adapted according to the information on educational level collected in the context of the Labor Force Survey 2018 [46]. Two educational levels were distinguished: “higher secondary education or lower” and “higher education”. For topics related to COVID-19 vaccination, specific weights were used taking into account the vaccination status of the population at the moment of the survey.
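
As an illustration of the principle behind post-stratification, the following Python sketch computes cell-based weights as the ratio of the population share to the sample share of each cell. It is a minimal sketch under stated assumptions, not the weighting procedure actually applied: the text does not specify how the margins for sex, age group, province and educational level were combined, the function name `poststratification_weights` is hypothetical, and the example uses sex only, with the approximate shares reported in the Results (about 32% males in the first two surveys versus 49% in the population).

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per respondent.

    weight = population share of the respondent's cell
             / share of that cell in the sample
    """
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


# Illustrative input only: 100 respondents, 32 male and 68 female.
sample = ["male"] * 32 + ["female"] * 68
population = {"male": 0.49, "female": 0.51}
weights = poststratification_weights(sample, population)
```

With these shares, each male respondent receives a weight of about 1.53 and each female respondent a weight of 0.75, so that the weighted sample reproduces the 49/51 split of the population. Such weights only correct for the variables used to define the cells.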

Web questionnaire

Design

All surveys were developed using LimeSurvey® version 3. This is an open source tool that makes it possible to create large-scale sophisticated surveys in a short period of time. The mean completion time of the COVID-19 health surveys ranged between 11 minutes and 20 seconds (eighth survey) and 17 minutes and 27 seconds (fifth survey). The questionnaires could be completed in the three national languages of Belgium (Dutch, French and German) and in English.

Content

The questionnaires were developed in consultation with public health experts and policy makers. Validated instruments, such as the ones included in the national health interview survey of 2018, were used as much as possible. An overview of the health themes covered in the different COVID-19 health surveys is provided in Table 1. Overall, there were five broad domains included in the COVID-19 health surveys, with the first two domains being considered as the core.

  • The indirect effects of the COVID-19 crisis on various aspects of health (mental health, social health, health related behaviors and health care consumption).

  • Preventive measures taken to reduce the number of transmissions.

  • The direct impact of the COVID-19 virus on health (contraction of COVID-19 and its consequences).

  • The indirect effects of the COVID-19 crisis on other life domains (e.g. financial and work situation).

  • Various aspects that may have influenced the above mentioned outcomes (e.g. education level, employment situation, income, presence of chronic diseases and personality characteristics).

Results

Participation & sample composition

Figure 1 displays the total number of participants per survey and gives an overview of the cumulative number of people that completed the survey per day. A participant was defined as a person who agreed with the informed consent and completed at least the questions on birth year, sex and postal code. Two general trends can be identified. Firstly, participation decreased consistently over time. The highest participation was reached in the early days of the crisis in Belgium (surveys 1 and 2). The only exception to the decreasing trend was the ninth survey, which was organized between 13 and 23 December 2021 and had more participants than the sixth to eighth surveys. A second general trend is that the majority of participants (>60%) completed the survey within two days after its launch. The only exception is the second survey, where a large share of the participants completed the survey on the third day too.

Fig. 1 Number of participants by completion day per survey, COVID-19 health surveys, Belgium 2020-2022

Table 2 shows the unweighted sample distribution of the surveys versus the population distribution in terms of sex, age group, education level and region. It also presents the distribution among participants who completed at least 5 surveys and provided consent to link their data collected in the different surveys. This group of people can be followed up longitudinally and is called the “cohort”.
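
The cohort definition above (at least five completed surveys plus consent to link the data) can be expressed as a simple filter. The Python sketch below is illustrative only; the function name `select_cohort`, the (participant, survey number) input format and the identifiers are hypothetical and do not reflect the actual data structure of the surveys.

```python
from collections import defaultdict


def select_cohort(participations, linkage_consent, min_surveys=5):
    """Return the set of participant ids that belong to the cohort.

    participations: iterable of (participant_id, survey_number) pairs
    linkage_consent: dict mapping participant_id to True/False
    """
    surveys_by_person = defaultdict(set)
    for person_id, survey_number in participations:
        # A set ensures that a survey completed twice is counted once.
        surveys_by_person[person_id].add(survey_number)
    return {
        person_id
        for person_id, surveys in surveys_by_person.items()
        if len(surveys) >= min_surveys and linkage_consent.get(person_id, False)
    }


# Hypothetical participants: p1 took 6 surveys, p2 took 3, p3 took 5 without consent.
participations = (
    [("p1", s) for s in range(1, 7)]
    + [("p2", s) for s in range(1, 4)]
    + [("p3", s) for s in range(1, 6)]
)
consent = {"p1": True, "p2": True, "p3": False}
cohort = select_cohort(participations, consent)
```

In this example only `p1` meets both conditions.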

Table 2 Unweighted sample distribution of the surveys versus population distribution, COVID-19 health surveys, Belgium 2020-2022

The distribution by sex was the least favorable in the first two surveys (with about 32% males versus 68% females). The sex balance was slightly better in the next surveys and among the cohort participants. The distribution, however, remained highly different from the distribution in the Belgian population (49% males versus 51% females). When assessing the age distributions, some general trends can be observed. In the first survey, there was, compared to the general population distribution, an underrepresentation of the youngest (18-24 years) and the two oldest age groups (65-74 years, 75+ years) and an overrepresentation of the age groups between 25 and 64 years, notably of the age group 35-44 years. Throughout the next editions of the survey, there was a decline in the number of participants from the younger age groups between 18 and 44 years. An exception to this general trend occurred in the seventh and eighth surveys, where there was an increase in young participants between 18 and 24 years. The proportion of participants between 55 and 74 years old increased throughout the surveys. In all COVID-19 health surveys, there was an underrepresentation of the youngest (18-24 years) and oldest age groups (75+ years). When comparing the distribution of the cohort participants with the distribution in the general population, we observe an underrepresentation of the youngest (18-34 years) and oldest age groups (75+ years) and an overrepresentation of the 45-74 year olds.

The education distribution remained fairly constant during all COVID-19 health surveys and was strongly biased (with about 70% highly educated people). Among the cohort participants it was even slightly more skewed, with almost 75% highly educated people. For comparison, in the general population aged 20 to 64 years only 41% has a degree of higher education. Lastly, the distribution by region remained roughly constant throughout the surveys (on average 66% participants from the Flemish Region, 9% participants from the Brussels Capital Region and 25% participants from the Walloon Region), with the only exception being the second survey, where relatively more participants from the Walloon Region (38%) and fewer participants from the Flemish Region (51%) were counted. The cohort distribution is similar to the survey distributions. When comparing these distributions to the actual distribution in the Belgian population, an overrepresentation of people from the Flemish Region (58% of the population) and an underrepresentation of people from the Walloon Region (32% of the population) can be observed.

Discussion

The early days of the COVID-19 pandemic impacted all aspects of life, including the way surveys were organized [49]. Ongoing and planned studies using face-to-face data collection needed to adjust their fieldwork [50,51,52]. New surveys aiming to rapidly evaluate the impact of the pandemic faced challenges and were dependent on the existing survey infrastructure of their country. This manuscript described the methodology of the COVID-19 health surveys, a series of 10 non-probability web surveys in Belgium aiming to monitor the general population after the onset of the pandemic. Recommendations to recruit a demographically balanced participant pool were taken into account for these surveys. Informal partnerships were set up with trustworthy organizations such as local community organizations, health insurance funds, young adults and elderly organizations, etc. to build trust among different population groups [6]. Moreover, the recruitment channels (e.g. e-mail, social media, press, etc.) and networks were diverse [6, 22, 28]. In addition, extra efforts were made for subsequent survey editions when it became clear that some population groups were underrepresented (e.g. substantial efforts were made starting from the seventh survey to attract more young adults).

Principal findings in terms of participation

At the beginning of the pandemic the number of participants was the highest; the first survey, organized within three weeks after the first restrictions were put in place, had 49334 participants and the second survey, organized two weeks later, had 42895 participants. Even though the number of participants decreased over time, it remained high: the last survey ended with 13882 participants. The participation trend does not follow the severity of the epidemiological situation, since some surveys were organized in other critical phases but nevertheless had far fewer participants than the first surveys. The declining participation may have several reasons. In general, at the beginning of the pandemic, the news and people’s own thoughts and lives were dominated by COVID-19, making the survey topic highly salient. Moreover, the first surveys were organized in strict lockdown periods, which gave people time to complete the survey. These two factors resulted in a wide dissemination of the COVID-19 health surveys by the press at the beginning of the pandemic, while the media attention decreased for later surveys. The only exception to the decreasing trend was the ninth survey, which was organized between 13 and 23 December 2021. Possible explanations might be that people had more time during the Christmas period, that the communication materials were clearer after updating them and that the survey was held in a period with a high number of infections. A declining participation trend over time is also seen in other repetitive COVID-19 surveys [13, 40].

The majority of participants of all COVID-19 health surveys were reached on the first and second day after the launch of the surveys. This indicates that the surveys were mainly shared within the first days after the launch and that people completed the survey almost immediately after viewing the link to the survey on a website, a social media page or an invitation e-mail. The only exception is the second survey, where a large share of the participants completed the survey on the third day too. Potential reasons are that the Coronavirus press conference and the national TV news mentioned the survey only on the third day and that the invitation e-mail to the previous participants was only sent on the evening of the second day.

Males participated less than females in all COVID-19 health surveys. There was an underrepresentation of the youngest (18-24 years) and oldest age groups (75+ years) in all COVID-19 health surveys. In addition, a decline in the number of young participants (18-44 years) and an increase in the number of older participants (55-74 years) could be observed over time. There were also strong educational differences with, as expected, low educated people taking part less in the surveys. People from the Walloon Region were less likely to participate in the surveys. The recruitment approach of the COVID-19 health surveys did not make it possible to obtain (demographically) balanced samples. Other types of non-probability sampling approaches, such as using paid and targeted ads on social media or retaining participants via commercial opt-in panels, succeeded better in obtaining demographically balanced samples [13, 40].

Limitations

The samples of the COVID-19 health surveys were prone to biased estimates as they relied on self-selection and excluded people without internet access or skills. This is the biggest point of criticism that non-probability web surveys receive [22, 23]. Despite the efforts made, the unweighted sample distributions of the COVID-19 health surveys remained suboptimal. Post-stratification weighting on socio-demographic factors was applied to at least partly take into account the unequal distribution of some population groups in the COVID-19 health surveys. However, weighting for these factors is not sufficient to eliminate bias in the estimates. There are also unobservable characteristics, which cannot be taken into account using weighting, that impact both the chance to participate and the outcomes of the survey [23, 31].

As a consequence, caution is needed when generalizing results derived from these types of non-probability web surveys to the general population. It is not recommended to calculate descriptive estimates such as prevalence rates from these surveys [29, 53, 54]. However, at the beginning of the pandemic there was an urgent need to have figures about the impact on the Belgian population. As there was no alternative in the form of a probability survey including people without internet access, the prevalence rates of the COVID-19 health surveys were considered informative. Inferences regarding associations between variables are generally less sensitive to sampling quality [53]. Apart from the bias associated with the sampling, bias in the estimates can also result from the self-reporting aspect. For example, there might have been an overestimation of the compliance with preventive measures as this is a socially desirable behavior [55].

Strengths

The first asset is related to the questionnaire development and content. As far as possible, all surveys included validated and frequently used instruments and scales. In addition, the surveys covered multiple health outcomes and highly relevant policy topics and contained a large set of covariates. The second major asset is the organization of a longitudinal study by re-inviting participants for subsequent editions. A large share of participants completed the COVID-19 health surveys at least five times over two years (n cohort=12599). The benefit of following up the same individuals over time is that the evolution found for certain outcomes such as mental health throughout the pandemic cannot be due to different sample compositions across different time points [7]. The third major asset was the flexibility and timeliness to include new highly relevant topics in the surveys based on the demand of policy makers. The last asset is that the participants of the COVID-19 health surveys served as a recruitment pool for other COVID-19 projects, including a qualitative study on the attitude towards vaccination.

Future prospects and recommendations

The pandemic and the associated demand for data on the well-being of citizens taught us lessons for the future of survey methodology. In order to evaluate the impact of unexpected crises, we must ensure that we can survey randomly selected individuals instead of relying on convenience samples. Non-commercial online panels with a probability-based sample established prior to the crisis are an optimal choice for this [6, 21, 23, 36, 38]. This holds especially when panelists who do not have access to the internet are provided with access so that they can participate anyway, or are offered paper response options. These studies limit self-selection bias and under-coverage bias and have valid comparison points with pre-crisis data. These types of panels did not exist in Belgium when the pandemic started, but it is important to build them into our survey infrastructure. Fortunately, initiatives are currently being taken to make up for this lack. There is, for instance, the “Belgian Health and Well-being cohort”, a cohort study initiated by Sciensano with a focus on mental health. This is the successor of the COVID-19 health surveys and the participant pool will consist of both previous participants of the COVID-19 health surveys and individuals selected from the national register. In addition to setting up large-scale panel studies, it is also relevant to always ask participants of large probability studies if they may be contacted by e-mail or postal mail in the future for follow-up research [6, 22].

The outcomes of the COVID-19 health surveys in terms of participation and sample composition indicated that certain subgroups of the population are easy to attract for survey research and remain interested in follow-up surveys, whereas for other subgroups the opposite holds. In probability surveys not organized in a COVID-19 context, participation rates also differ by socio-demographic characteristics [18, 56]. The large participation differences found in the COVID-19 health surveys made us think about using different recruitment approaches for different subgroups, especially for youngsters. After consultations with internal communication experts, we started using different recruitment channels and different recruitment materials, such as visuals for Instagram®, to reach more youngsters. Although the results were modest, experimenting with tailoring the data collection to different subgroups by using different recruitment materials, incentives or reminders instead of using a “one-method-fits-all-design” could be valuable. These types of studies have so-called adaptive or responsive survey designs [57, 58].

Conclusion

These exceptional pandemic times have underlined the importance of collecting high quality data on people's experiences via surveys. However, traditional survey methodology was challenged in many ways at the beginning of the pandemic and, therefore, non-probability web surveys became an important information source. It is up to researchers involved in survey methodology to use these challenging times to improve the surveys organized in future crises.