Background

The optimal allocation of available resources is a concern for every investigator and decision-maker when choosing a population-based study design [1]. Despite the well-known benefits of longitudinal surveys for advancing epidemiology and clinical research, retaining the full baseline sample in follow-up studies is challenging. Over time, initial participants may drop out because they have died or moved abroad, or they may simply refuse to respond to successive survey rounds, owing to loss of interest when faced with additional complex examinations and time-consuming measurements. This poor compliance and low participation rate may impact dataset quality and sample relevance.

The “Observation of Cardiovascular Risk Factors in Luxembourg” (ORISCAV-LUX) survey, conducted between November 2007 and January 2009, was the first nationwide cross-sectional survey of cardiovascular health monitoring in Luxembourg [2]. It aimed to establish baseline information on the prevalence of “traditional” cardiovascular risk factors, including obesity, hypertension, diabetes mellitus, lipid disorder, smoking and physical inactivity, among the general adult population. Complete details about the study design, sampling scheme, non-response handling and sample representativeness have been published elsewhere [2, 3]. Briefly, a total of 1432 subjects (response rate 32.2%) were successfully recruited, slightly beyond the estimated necessary sample size and the expected participation rate. The comparison of participants and non-participants in the ORISCAV-LUX survey revealed that their distribution and profiles were comparable in terms of cardiovascular morbidity indicators, including prescribed medications, hospital admission and medical measures [3].

From a public health and research perspective, health surveys need to be repeated at regular intervals to monitor trends and allow the development of coherent and effective prevention strategies. In 2016, the second wave of the ORISCAV-LUX study was initiated to follow up the same baseline sample of participants. An extended set of health indicators, new clinical examinations and self-reported information were integrated into the second round of data collection.

Reaching a suitable number of participants from the initial baseline ORISCAV-LUX sample alone proved challenging. A nationally representative sample is a prerequisite to meet public health goals. We therefore had to adapt our planning and introduce alternative solutions to our sampling scheme in order to ensure a sufficient sample size, and hence the validity of the constituted dataset and the resulting statistics. The objective of this paper is to summarize the different sampling strategies adopted in ORISCAV-LUX2, with a focus on the evaluation of population coverage and sample representativeness. Operational issues associated with the implementation of these adaptive sampling schemes are described in the Methods.

Methods

Data collection procedures

Similar to the ORISCAV-LUX baseline study [2], participation in the second wave included three main steps: filling in a self-reported questionnaire; clinical and anthropometric measurements according to standardised operating procedures; and collection of blood, urine and hair samples.

The participants in the baseline study received an invitation letter together with an information leaflet, a reply coupon and a pre-paid envelope, inviting them to take part in the second wave. The subjects who agreed to participate were asked either to fill in the online questionnaire, accessible with a unique identification code, or to request a paper version indicating their preferred language (French, German, Portuguese or English). The consenting subjects were promptly contacted by phone to schedule an appointment at the nearest study centre.

Added questionnaires

Several new questionnaires were added, including a self-administered questionnaire filled in by the participant at home and another one focusing on medical aspects, completed by the research nurse during the interview. Information on demographic and socio-economic characteristics, personal and family history and lifestyle was collected with the same tools as in the baseline study. New general health status modules were introduced, including quality of life [36-Item Short Form Health Survey (SF-36)] [4], evaluation of autonomy [Activities of Daily Living (ADL) and Instrumental Activities of Daily Living (IADL) instruments] [5], sleep habits [6], the Mini-Mental State Examination test [7], depressive symptoms [Centre for Epidemiologic Studies Depression Scale (CES-D)] [8], constipation [9], social support, women’s health, cardiovascular history, detailed personal diseases and chronic conditions, medication, vitamin and supplement intake, and a pollution-related questionnaire (see Additional files 1 and 2). An electronic version of a 174-item Food Frequency Questionnaire (e-FFQ) was also used in the second wave.

New anthropometric and clinical examinations

In addition to weight, height, and waist and hip circumferences, proximal thigh girth and body composition by bio-impedancemetry (Tanita® BC 418) were measured. Further parameters were also incorporated: cardiac function, including triple blood pressure and pulse rate measurements in sitting and supine positions, ECG and pulse wave velocity (Complior®); and physical function, including finger tapping, grip strength, balance, chair rises, walking speed and a step test using Actiheart®. Objective measures of physical activity (7-day accelerometer data from an Actigraph® accelerometer) and of mental function (five cognitive tests from the Cambridge Neuropsychological Test Automated Battery, CANTAB®) were also collected.

Sampling schemes

Original baseline sample enrolment

In December 2015, the 1432 baseline participants were re-contacted to take part in the second round, except those who had already refused (15 subjects) to take part in follow-up studies. During the intervening 9 years, the missions of the Inspectorate of Social Security (IGSS), which had provided the initial sample based on the National Insurance Registry, were reformed. This institution was no longer allowed to share nominative data and was therefore unable to update the addresses of the participants. It could, however, confirm the crude numbers of subjects who had left the country to live abroad (51) or had died (23), without a link to the identification code, yielding a total eligible sample of 1343 addresses. To avoid sending invitations to addresses that were no longer valid, an active search on the national website www.editus.lu and direct phone calls were carried out to confirm the accuracy of delivery addresses and to correct any changes.

Following this procedure, a further 134 addresses (10% of the eligible sample) could not be found and were hence categorised as “non-recovered”. The invitations were then sent to the final identified and validated sample of 1209 addresses. Of these, 353 (29.2%) refused to take part in the second wave and 158 (13.1%) never answered after three reminders. A further 13 (1.1%) were excluded during the recruitment process because they had moved abroad or because of physical disability or language incapacity. After this step, a total of 685 subjects had agreed to participate. Among them, 25 subjects (3.6%) did not attend or repeatedly cancelled their appointments, could not be enrolled before the end of the study and were hence categorised as “reluctant/non recruited”. Finally, 660 subjects were enrolled, constituting 54.6% of the invited sample (Fig. 1).
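The attrition steps described above can be retraced with simple arithmetic. The following Python sketch is an illustration only, not part of the original analysis; every count is taken from the text, and the variable names and the `pct` helper are introduced here for convenience.

```python
# Recruitment flow of the baseline sample; all counts are taken from the text.
baseline = 1432
refused_follow_up = 15   # had already refused any follow-up study
moved_abroad = 51
deceased = 23
eligible = baseline - refused_follow_up - moved_abroad - deceased

non_recovered = 134      # addresses that could not be found
invited = eligible - non_recovered

refused = 353            # refused to take part in the second wave
no_answer = 158          # never answered after three reminders
excluded = 13            # moved abroad, physical disability or language incapacity
accepted = invited - refused - no_answer - excluded

not_recruited = 25       # accepted but never attended an appointment
enrolled = accepted - not_recruited


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("eligible:", eligible, "invited:", invited)
print("accepted:", accepted, "enrolled:", enrolled)
print("enrolled as % of invited:", pct(enrolled, invited))
```

Each intermediate total (eligible, invited, accepted, enrolled) is derived from the previous one, so a transcription error in any single count would show up as a mismatch with the totals reported in the text.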

Fig. 1 The overall sample participating in the second wave from the different sampling procedures

Alternative strategies

To compensate for the drop in the initial sample size and preserve a nationally representative sample, three alternative sampling strategies were subsequently implemented to recruit a new complementary sample from:

1) The civil national registry: With the support of the Ministry of Health and in collaboration with the Government IT Centre [Centre des Technologies de l’Information de l’Etat (CTIE)], a new additional random sample of 4737 subjects was selected, allowing for a large anticipated non-participation rate. This number was calculated based on the initial sampling procedure used in the first wave [3]. According to its legal status, the CTIE is the sole institution holding nominative information about all residents of Luxembourg and is entitled to contact residents directly via a nominative mailing. In this context, short letters were sent to the selected subjects, briefly summarizing the objective of the study and asking them to send their complete address to the recruiting institute [Luxembourg Institute of Health (LIH)] if they consented. Once the invited subjects had agreed to send their personal data, via the email address dedicated to the project or via a phone call, they were registered in our databank. Thereafter, the same enrolment process began: detailed information about the study was sent, and the consenting subjects were contacted by our administrative assistant to fix an appointment at our premises. For logistical and practical reasons, the CTIE mailing was dispatched in several batches, each sent to almost 500 subjects, over a period of 6 months. Despite the considerable effort required to prepare and organise this procedure, its yield was low: the participation rate was only 5.7% (269 participants out of almost 4700 invited subjects).

2) European Health Examination Survey (EHES-LUX) list of participants: An existing address list of participants who took part in the EHES-LUX study, carried out by the LIH, was used. Out of a total of 1431 subjects invited, 455 participants were recruited for ORISCAV-LUX2, constituting a participation rate of almost 32%.

3) Volunteers: A call for volunteers was advertised through various means of communication, for example the LIH social networks (Facebook, Twitter), the ORISCAV-LUX project’s website (www.oriscav.lih.lu), the national press, the media, and outreach events for the general public. For this purpose, a study-oriented poster and leaflet were prepared in order to attract new participants. Through this pathway, a further 174 volunteers were enrolled (Fig. 1).

Between January 2016 and January 2018, a total of 1558 subjects were recruited in the second wave of the study, including 1438 participants (92.3%) with full participation, and 121 (7.7%) with partial participation. Full participation means that the participants filled in the self-reported questionnaires, attended their appointments, underwent clinical and anthropometric examinations, and provided blood, urine and hair samples. Partial participation means that the participants answered only the self-reported questionnaires, without attending the nurse interview in our study centres.
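As a quick cross-check of the recruitment pathways described above, the sketch below tallies the enrolled subjects per pathway and recomputes the participation rates. It is an illustration only; the counts come from the text, while the dictionary layout and names are assumptions made for this example. No rate is computed for volunteers because they had no invited denominator.

```python
# Enrolled and invited counts per recruitment pathway, as reported in the text.
# Volunteers responded to an open call, so there is no invited denominator.
pathways = {
    "baseline": {"enrolled": 660, "invited": 1209},
    "CTIE registry": {"enrolled": 269, "invited": 4737},
    "EHES-LUX": {"enrolled": 455, "invited": 1431},
    "volunteers": {"enrolled": 174, "invited": None},
}

# Overall second-wave sample size.
total_enrolled = sum(p["enrolled"] for p in pathways.values())

# Participation rate (%) for pathways with a known number of invitations.
participation_rate = {
    name: round(100 * p["enrolled"] / p["invited"], 1)
    for name, p in pathways.items()
    if p["invited"] is not None
}

# Contribution (%) of each pathway to the overall sample.
share_of_sample = {
    name: round(100 * p["enrolled"] / total_enrolled, 1)
    for name, p in pathways.items()
}

print("total enrolled:", total_enrolled)
for name in pathways:
    print(name, participation_rate.get(name, "n/a"), share_of_sample[name])
```

Keeping the invited and enrolled counts side by side makes it easy to see how unevenly the pathways performed relative to the effort invested in each.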

Statistical methods

Using the baseline ORISCAV-LUX sample, the socio-demographic characteristics and health profiles of participants and non-participants in the second wave were compared. Then, the distribution of subjects across the different sampling strategies was described. Finally, the overall ORISCAV-LUX2 sample was compared to the national population according to the stratification criteria (age, sex and geographical district).

Results are presented as numbers (percentages) for categorical variables and as mean ± standard deviation (SD) for continuous variables; groups were compared using the chi-squared test and one-way ANOVA, respectively. All statistical analyses were performed with Predictive Analytics Software (PASW for Windows® version 21.0, formerly SPSS Statistics Inc., Chicago, IL, USA); p < 0.05 was considered statistically significant.

Results

Based on the baseline sample (1209 subjects), Table 1 compares the demographic, socio-economic and cardiometabolic risk profiles of participants and non-participants in the ORISCAV-LUX2 study. The participants were significantly younger, with no sex-specific difference. There was a significant difference in education level (P < 0.0001): 218 participants (33%) had a university qualification vs. 95 non-participants (18%). The participants also appeared to have a better health perception (P < 0.0001); 455 participants (70.3%) self-reported good or very good health compared to 312 non-participants (58.2%).
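To illustrate the kind of chi-squared comparison reported above, the sketch below applies a Pearson chi-squared test to a 2×2 table of health perception by participation status. It is not the authors' SPSS procedure. The numerators (455 and 312) are taken from the text, but the group denominators (647 and 536) are an assumption, back-calculated here from the reported percentages (70.3% and 58.2%) because the exact numbers of valid answers are not given in this section. The function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns the test statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-squared upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Good or very good self-rated health: counts reported in the text.
good_participants = 455
good_non_participants = 312

# ASSUMPTION: denominators back-calculated from 70.3% and 58.2%, not reported.
n_participants = 647
n_non_participants = 536

chi2, p_value = chi2_2x2(
    good_participants,
    n_participants - good_participants,
    good_non_participants,
    n_non_participants - good_non_participants,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

Under these assumed denominators the p-value falls below 0.0001, which is in line with the significance level reported in the text; the exact statistic would depend on the true number of valid answers in each group.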

Table 1 Comparison of participants versus non-participants based on the baseline ORISCAV-LUX sample (1209 subjects)

With regard to selected health-related variables, in general, participants had better cardiometabolic profile compared to non-participants; in fact, prevalence of obesity (P < 0.0001), hypertension (P < 0.0001), diabetes (P = 0.007), as well as mean values of related biomarkers were significantly higher among non-participants.

Table 2 compares the overall ORISCAV-LUX2 sample (1558 subjects) according to the pathway of enrolment. In general, volunteers had a better health profile than the other groups. The groups differed significantly in terms of age, sex and prevalence of the main cardiometabolic risk factors. In the overall sample, the prevalence estimates of diabetes, hypertension and obesity were 4.2, 30 and 19%, respectively.

Table 2 Comparison of the participant’s characteristics according to the strategy of enrolment, N = 1558 subjects

To assess representativeness, the overall ORISCAV-LUX2 sample (1558 participants) was compared to the Luxembourg population (342,235 individuals, National Institute of Statistics, STATEC 2011) according to the stratification criteria: sex, age category and district of residence. Table 3 shows that the ORISCAV-LUX2 sample was representative of the population for district, but not for sex and age groups. The age difference was significant for both men and women (both P < 0.0001). Compared to the Luxembourg population, the younger (25–34 years) and older (65–79 years) age groups were under-represented, whereas middle-aged adults (45–64 years) were over-represented in the overall sample.

Table 3 Comparison of ORISCAV-LUX2 participants to the Luxembourg population by sex, age category and district of residence

Table 4 shows the completeness of individual survey elements. Data from the self-administered questionnaires were fully available; 65% were completed online and 35% on paper. Completeness for the health questionnaires, the e-FFQ and the clinical and anthropometric measurements varied between 90 and 92%. Completeness was lowest for the Actigraph® and Actiheart® measurements (76 and 65%, respectively). Blood, urine and hair samples were available for 89, 85 and 55% of the participants, respectively.

Table 4 Completeness of individual survey elements

Discussion

Principal investigators of population surveys face major challenges in managing data collection as planned. They need opportunities to adapt the design during the course of data collection in order to ensure the quality and external validity of the constituted datasets, and hence of the resulting statistics.

The present manuscript highlights the implementation of adaptive sampling schemes based on our experience in setting up the second wave of the ORISCAV-LUX survey. Enrolling the same participants nine years later proved to be a highly intricate task. Extensive efforts were required to search for and locate former participants in the baseline study. A total of 1209 addresses were identified and invited, of whom 660 subjects (55%) were successfully enrolled. It was therefore crucial to recruit additional subjects and implement alternative strategies, including random sampling and a call for volunteers, to increase the sample size and enhance national representativeness.

Consistent with most of the literature supporting the notion of “healthy participant bias” [10,11,12,13], our findings showed that baseline participants who took part in the ORISCAV-LUX2 study were generally healthier and at lower risk than those who refused to take part. However, examples of non-significant differences [14, 15] or opposite findings have also been reported [16, 17]. Likewise, the respondents to our invitations had a higher education level than the non-respondents [12, 18, 19]. Such a difference, combined with a low response rate, may imply a greater potential for biased survey estimates [20, 21]. In addition, this study confirmed differences in the socio-economic characteristics and cardiometabolic health profile of subjects enrolled via the different pathways, although the major proportion of the overall ORISCAV-LUX2 sample was randomly selected (baseline, EHES-LUX and CTIE).

An additional list of subjects’ addresses was also used in a similar German population-based study [22], with relevant conclusions. Convenience sampling is affordable, and the subjects are readily available. As confirmed by our study, people who volunteer tend to be more health conscious than others [23]. Therefore, samples based only on volunteers are unlikely to be representative of the general population, which threatens the generalisability of the study results. This small segment of volunteers could be excluded from future analyses, depending on the specific research objectives and if deemed necessary after secondary analyses.

With these corrective measures, we raised the number of participants to 1558, including 1438 subjects (92.3%) with full participation (filled in the questionnaire and attended the appointment with the research nurse). This is a major advantage for the credibility of future analyses of the ORISCAV-LUX2 dataset targeting prevalence estimates of, for example, cognitive performance, arterial stiffness and physical disability.

In observational epidemiology, in particular for studies with a follow-up design, it is important to distinguish scientific inference from population inference [24]. Goldstein et al. [24] suggested making a clear distinction between descriptive statistics, which require representative samples, and analytical statistics, which attempt to address scientific hypotheses. They argued that selecting a sample that does not represent a real population but has a high degree of heterogeneity in terms of outcome may provide much more power to investigate the hypotheses of interest. They therefore concluded that heterogeneity is desirable to enhance the effectiveness of analysis, and that this often implies using a sample that is not necessarily representative of the real population [24]. In addition, most of the etiological research on chronic diseases (including cardiovascular diseases) has come from highly selected populations with limited representativeness, for example the Framingham study [25] and the Whitehall studies in the UK [26].

Compared to the Luxembourg population, the ORISCAV-LUX2 sample was representative for district of residence, but not for sex and age, with the younger (25–34 years) and older (65–79 years) age groups being under-represented, whereas middle-aged adults (45–64 years) were over-represented. In the ORISCAV-LUX2 study, achieving high coverage and sample representativeness was the primary purpose of adopting this hybrid sampling frame as an alternative to using only the baseline sample. Interestingly, these initial analyses of the total sample showed that the prevalence estimates of diabetes, hypertension and obesity (4.2, 30 and 19%, respectively) were comparable to those reported in 2007–2008 [2]. Assuming a steady pattern, this would indicate that integrating diverse sampling strategies in the second wave has not biased our approach to assessing the trend of these conditions nine years later. Nevertheless, a number of measures will be considered in future analyses in order to ensure population inference [24]. These include post-survey adjustment of data using weighting techniques to correct for non-response bias [27], as well as using statistical models based on the characteristics of the initial respondents to ‘adjust’ subsequent analyses [24, 28].
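One common form of the post-survey weighting mentioned above is post-stratification, in which each stratum receives a weight equal to its population share divided by its sample share. The sketch below is a minimal illustration of that general technique and not the adjustment actually planned for ORISCAV-LUX2. All stratum labels and counts are hypothetical round numbers chosen for readability; they are not ORISCAV-LUX2 or STATEC data.

```python
def poststratification_weights(sample_counts, population_counts):
    """Weight per stratum = population share / sample share."""
    n = sum(sample_counts.values())
    big_n = sum(population_counts.values())
    return {
        stratum: (population_counts[stratum] / big_n) / (sample_counts[stratum] / n)
        for stratum in sample_counts
    }


# HYPOTHETICAL counts, for illustration only (not study data).
sample_counts = {"young": 150, "middle": 600, "older": 250}
population_counts = {"young": 30_000, "middle": 45_000, "older": 25_000}
cases_in_sample = {"young": 6, "middle": 120, "older": 100}

weights = poststratification_weights(sample_counts, population_counts)

# Unweighted prevalence: every participant counts equally.
n_sample = sum(sample_counts.values())
unweighted = sum(cases_in_sample.values()) / n_sample

# Weighted prevalence: each participant counts with the weight of their stratum.
weighted_cases = sum(weights[s] * cases_in_sample[s] for s in sample_counts)
weighted_n = sum(weights[s] * sample_counts[s] for s in sample_counts)
weighted = weighted_cases / weighted_n

print({s: round(w, 2) for s, w in weights.items()})
print(f"unweighted: {unweighted:.1%}, weighted: {weighted:.1%}")
```

In this hypothetical example the under-represented stratum receives a weight above 1 and the over-represented stratum a weight below 1, so the weighted estimate shifts towards what would be expected if the sample mirrored the population structure.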

It is worth noting that strict control measures were applied to ensure quality throughout the conduct of the study. Intensive efforts were made to prepare the fieldwork optimally, including training the nurses in the standard operating procedures. Several features of the survey process can affect the response rate and the type of participation (full vs. partial), such as the mode and number of contacts, the type of information given to the participants, the language of the communication documents, the length of the interview and the feedback received on examination results. While the mean time needed for a first wave appointment was less than 2 h, the duration of a second wave appointment ranged from 1 h 55 min to 6 h 15 min (mean 3 h 17 min). Based on the 1438 participants who were interviewed, the completeness of individual survey elements can be described as optimal.

Conclusion

This study represents a careful first-stage analysis of the ORISCAV-LUX2 sample, based on the available information on participants and non-participants. It stresses that special adaptive procedures in sampling design are needed to reach an optimal sample size. These procedures may provide the only practical way to obtain a sample large enough for both scientific research objectives and population inference. A central issue for the success of observational studies is to achieve an appropriate balance between adapting the initial sampling procedure during data collection and a later adjustment with sample weighting. The available ORISCAV-LUX datasets provide a relevant basis for policy-makers regarding public health monitoring and evidence-based prevention, and constitute a valuable tool for epidemiological research on cardiometabolic risk.