Background

Large cohort studies are subject to the problem of attrition. The most prominent sources of attrition are death during the follow-up period, loss of contact because of (e)migration, and failure to respond to the follow-up survey (i.e., non-response) [1, 2]. Although some causes cannot be influenced by the researcher, study design and efforts to contact the study population can modify the degree of attrition [2].

High rates of non-participation in a follow-up survey can lead to selection bias when the persons who drop out differ significantly from the participants in characteristics that are related to the outcome being studied [3, 4]. This loss of a selective group can reduce the external validity as well as the generalizability of the research findings [1, 2, 5, 6]. The success of any longitudinal study, therefore, depends upon its participants remaining in the study [2]. Assessment of information on initial participation and retention rates helps to evaluate potential selection bias when non-participation during follow-up is not random [1, 7, 8]. Furthermore, assessment of determinants of attrition may identify characteristics of participants who are least likely to respond to the follow-up survey [1]. This may aid management strategies that specifically target individuals with such characteristics and thus reduce non-response [1, 4, 6, 9]. Various population-based longitudinal cohort studies have shown that non-responders often differ from those who respond to a follow-up survey with respect to demographic, socioeconomic and health characteristics. Many factors have been investigated, though not all factors are consistently found to be significantly associated with non-response [10, 11]. However, in most studies, non-responders are more likely to be among the youngest [1–3, 12] or oldest participants [6, 8, 9], to live alone [1–4, 6, 9, 13], to be less educated [1, 4, 6, 8, 11–14], to be unemployed [2, 5, 9, 14] and to have a low income [5, 6, 11]. Non-responders are more likely to have an unhealthy lifestyle, especially being a smoker [2–4, 7, 8, 11, 13]. The general health profile of non-responders tends to be worse than that of responders [1, 4, 8, 9, 11, 13, 15, 16] and a higher prevalence of obesity is observed [3, 8, 12].

To date, studies on determinants of non-response have mainly been conducted in single population-based cohorts in which all participants were followed for the same time period. The present study, however, is based on data from almost 500,000 participants from 10 European countries, as part of the EPIC-PANACEA (European Prospective Investigation into Cancer and Nutrition-Physical Activity, Nutrition, Alcohol, Cessation of Smoking, Eating out of home And obesity) study. EPIC-PANACEA aims to investigate the determinants of obesity and body weight changes in Europe. For the purpose of EPIC-PANACEA, data from a second assessment of body weight collected several years after baseline were centralized and combined with the EPIC baseline dataset. The length of follow-up as well as the method of contacting participants (i.e. by postal surveys, directly by phone or by a request to visit a study center for physical examination) differed between the collaborating centers. This allows insight into whether non-response differs with various methods of contacting participants and diverse durations of follow-up.

The purpose of the present study was twofold. First, we investigated whether baseline demographic, socio-economic and health variables, length of follow-up and method of contacting the participants predicted non-response to an invitation for a second assessment of lifestyle factors and body weight, excluding those who were not (yet) contacted and those who either died or emigrated during follow-up. This provides insight into important determinants of non-response that can be used to enhance cohort maintenance in future studies. Second, we compared all baseline participants for whom a second body weight assessment was missing (including non-responders and (e)migrated, deceased or not yet contacted participants) with responders, to evaluate whether the population lost to follow-up formed a selective group that could cause selection bias in future analyses.

Methods

Study population at baseline

The PANACEA study is part of the large EPIC study. EPIC is an ongoing multi-center prospective cohort study, designed to investigate the relationship of nutrition and lifestyle with cancer and other chronic diseases [17]. The study is conducted in several centers in ten European countries (Denmark [Copenhagen, Aarhus], France, Germany [Potsdam, Heidelberg], Greece, Italy [Florence, Varese, Ragusa, Turin, Naples], The Netherlands [Utrecht, Doetinchem, Amsterdam/Maastricht], Norway, Spain [Asturias, Granada, Murcia, Navarra, San Sebastian], Sweden [Malmö, Umeå] and the United Kingdom [Oxford general health, Oxford health conscious, Cambridge]). In the present study, multiple centers within a country were treated as a single study center when length of follow-up and data collection methods did not differ and when coordination took place from one center. Therefore, data from multiple centers in Spain and Denmark are treated as single centers, whereas the centers from the UK, Germany, The Netherlands, Italy and Sweden are treated separately in our analyses. Norway and Greece each had a single coordinating center.

Enrolment took place between 1992 and 2000, which resulted in recruitment of 521,448 male and female participants aged 20–80 years. In many centers, participants were invited from the general population residing in a given town or geographical area. Exceptions were France, where members of the health insurance for teachers were included; the Spanish and Italian centers, a large part of which included blood donors; and Utrecht and Florence, where the cohorts included women attending the breast cancer screening program. Half of the participants recruited by the Oxford centre are ‘health conscious’ vegetarians or healthy eaters, partly recruited by contacting members of The Vegetarian Society of the UK and all surviving participants in the Oxford Vegetarian Study [18]. In France, Norway, Utrecht (The Netherlands) and Naples (Italy) only women were recruited. Participants were invited by mail (Navarra and Asturias (Spain), Ragusa (Italy), France, Germany, Netherlands, United Kingdom, Denmark, Sweden, Norway), in person (Turin (Italy)), or in person and by mail (Granada, Murcia and San Sebastian (Spain), Greece, and Florence, Naples and Varese (Italy)). Individuals who provided written informed consent were mailed standardized questionnaires on diet and on lifestyle, socio-economic and health variables. Most participants completed these questionnaires at home and were then invited to a study centre for an examination. In Spain and Ragusa (Italy), the participants received the non-dietary questionnaire by mail. The lifestyle questionnaire was self-administered when visiting the study centre, where an interviewer-administered computer-driven dietary questionnaire was also completed. Participants in Greece who were recruited in person completed an interviewer-administered questionnaire on diet and a questionnaire on lifestyle at the study centre. In Denmark and Malmö (Sweden), the participants filled in dietary questionnaires at home and lifestyle questionnaires at the study centres.

Anthropometric measurements were performed and blood samples were taken at the study centre in all EPIC centres, except that in France, Norway and the Oxford health-conscious group anthropometric measurements were self-reported by the participants. A detailed description of the data collection in each EPIC centre has been reported earlier [17].

Approval was obtained from the ethical review boards of the International Agency for Research on Cancer and from all local centers.

The EPIC-PANACEA project is designed to investigate the determinants of obesity and weight changes in Europe. From the 521,448 participants recruited initially, 23,479 participants were excluded because of missing information on dietary or lifestyle variables, unavailable information on body mass index (BMI), extreme values on anthropometry data, pregnancy or due to an extreme ratio between energy intake and energy requirement. Thus, 497,969 participants with complete baseline data on anthropometry were available for the baseline EPIC-PANACEA analyses.

Follow-up data collection

EPIC participants are followed for vital status, cause of death and disease occurrence. In most of the centers these data are obtained by regular record linkage with the exception of Greece and Potsdam (Germany) where an active follow-up is used. For updating lifestyle and anthropometric data participants were contacted a second time several years after recruitment. For the purpose of EPIC-PANACEA the second assessment of body weight was centralized and combined with the baseline dataset.

Follow-up time between the first and second anthropometry assessment differed between study centers for logistical and financial reasons, and ranged from two (in Heidelberg, Germany) to eleven (in Varese, Italy) years. Assessment was conducted through mailed questionnaires, with several exceptions. Spain and Greece contacted their participants by phone and also completed the questionnaire on the phone. Varese used a combination of postal survey and telephone interview. Cambridge (United Kingdom) and Doetinchem (i.e. a sub-cohort of the EPIC Bilthoven cohort, The Netherlands) invited their participants to come to the research center for a second measurement of anthropometry and other lifestyle factors.

In Ragusa (Italy), Turin (Italy) and Potsdam follow-up assessment is currently ongoing. Data from Ragusa and Turin are therefore not included in the present study. Potsdam provided the available data from participants who had been contacted so far. Naples (Italy) took a random sample of 700 participants of their baseline population for the second assessment round. In Doetinchem 1,101 baseline participants were not contacted for the follow-up survey. Hence, of all baseline participants, 25,355 had not (yet) been contacted for a second assessment.

Missing data on follow-up assessment of body weight

In addition to the above-mentioned reason for missing follow-up lifestyle and anthropometric data, i.e. not (yet) contacted, we defined three other reasons for missing data at follow-up, i.e. death (n = 8,226), (e)migration (n = 3,697; this information was not provided by France, Spain, Greece and Germany), and non-response to the invitation to participate in the second assessment round (n = 84,876). We defined non-responders as baseline participants who were contacted for the second assessment (alive and not (e)migrated), but who did not respond to the invitation to participate, i.e. a second assessment of body weight is not available in the EPIC-PANACEA dataset. We defined ‘persons with missing second assessment data’ as all persons for whom a second body weight assessment is missing (i.e. non-responders and deceased, (e)migrated or not yet contacted participants). Figure 1 summarizes the flow of the participants through the EPIC-PANACEA study.

Figure 1 Flow of participants through the EPIC-PANACEA study.
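The participant flow can be checked with simple arithmetic. The sketch below uses the counts reported in the text; the (e)migration count is taken as 3,697, the value implied by the reported totals of 122,154 persons with missing second assessment data and 460,691 contacted persons. Variable names are illustrative only.

```python
# Counts as reported in the text. The (e)migration count is the value
# implied by the reported totals (an assumption, see the paragraph above).
baseline = 497_969
responders = 375_815
missing = {
    "not_yet_contacted": 25_355,
    "deceased": 8_226,
    "emigrated": 3_697,
    "non_responders": 84_876,
}

# Persons with missing second assessment data (all four reasons combined).
total_missing = sum(missing.values())

# Persons who were alive, not (e)migrated, and contacted.
contacted = responders + missing["non_responders"]

print(total_missing)
print(contacted)
print(baseline - total_missing == responders)
```

If the reported counts are internally consistent, the last line prints True.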

Potential determinants of non-response

Determinants were chosen based on their importance in future weight change analyses in EPIC-PANACEA or because they were related to non-response in other studies [1–9, 12, 13]. Selected demographic and socioeconomic variables included sex (male/female), age (≤40/41-60/>60 years), marital status (married or living together/single, divorced or separated, widowed) and highest educational level (primary school or less/vocational secondary school/other secondary school/college or university). Lifestyle variables included physical activity according to a validated physical activity index based on work and leisure time activity (inactive/moderately inactive/moderately active/active) [19], smoking status (never smoker/former smoker/current smoker) and alcohol consumption (non users/light alcohol users (0–18 g/day)/moderate alcohol users (18–60 g/day)/heavy alcohol users (>60 g/day)). Health variables included BMI (<18.5/18.5-25/25-30/≥30 kg/m2), having cancer or a chronic condition or disease (hypertension, stroke, myocardial infarction, diabetes mellitus) at baseline, or having developed cancer between baseline and the invitation for the second assessment of body weight. Finally, recruitment strategy for the second assessment (postal survey, survey completed by telephone, request to visit a study center) and mean follow-up time (0–4 years, 4.1-8 years, >8 years) were taken into account.
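As an illustration of how the continuous baseline variables might be grouped into the categories listed above, a minimal sketch follows. The function names are hypothetical, and the handling of boundary values (for example, whether a BMI of exactly 25 falls in the 18.5-25 or the 25-30 class) is not stated in the text; the lower-inclusive choice here is an assumption.

```python
def bmi_category(bmi):
    """BMI class in kg/m^2; lower-inclusive boundaries are an assumption."""
    if bmi < 18.5:
        return "<18.5"
    if bmi < 25:
        return "18.5-25"
    if bmi < 30:
        return "25-30"
    return ">=30"


def age_group(age_years):
    """Age class at baseline: <=40, 41-60 or >60 years."""
    if age_years <= 40:
        return "<=40"
    if age_years <= 60:
        return "41-60"
    return ">60"


def follow_up_group(mean_years):
    """Mean follow-up class: 0-4, 4.1-8 or >8 years."""
    if mean_years <= 4:
        return "0-4"
    if mean_years <= 8:
        return "4.1-8"
    return ">8"


print(bmi_category(31.2), age_group(38), follow_up_group(9.5))
```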

Data analysis

Response rates for the second body weight assessment were calculated according to the standard definitions of the American Association for Public Opinion Research (AAPOR) [20], i.e. the number of participants with a second weight assessment (responders) divided by the sum of the responders and the participants who did not respond (non-responders), died, emigrated or were not yet contacted. Response rates were also calculated as the number of responders divided by the sum of responders and non-responders (i.e. excluding from the denominator baseline participants who were not yet approached, died or emigrated before the second weight assessment).

Baseline characteristics of responders (n = 375,815) were compared to characteristics of non-responders (n = 84,876). This information is important because it may help in defining strategies to prevent non-response. To predict probability of non-response by baseline characteristics a multivariate logistic regression model was used with response status (0 for response and 1 for non-response) as the dependent variable and all above mentioned baseline characteristics, recruitment strategy and mean follow-up time as independent variables. Odds ratios mutually adjusted for all variables and their 95% confidence intervals were calculated.
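To illustrate what mutually adjusted odds ratios with 95% confidence intervals mean, the following minimal sketch fits a logistic regression by Newton-Raphson in plain Python. It is not the SPSS procedure used in this study. The counts are hypothetical (not EPIC-PANACEA data) and are constructed so that the fitted odds ratios are 2 for the first covariate and 3 for the second; the covariate names and helper functions are assumptions made for the example.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_and_information(cells, beta):
    """Gradient and Fisher information of the grouped-data log-likelihood."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, n, events in cells:
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        for i in range(k):
            score[i] += (events - n * p) * x[i]
            for j in range(k):
                info[i][j] += n * p * (1.0 - p) * x[i] * x[j]
    return score, info


def fit_logistic(cells, n_iter=25):
    """Newton-Raphson fit; returns coefficients and their standard errors."""
    k = len(cells[0][0])
    beta = [0.0] * k
    for _ in range(n_iter):
        score, info = score_and_information(cells, beta)
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    _, info = score_and_information(cells, beta)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[j]))  # sqrt of (info^-1)[j][j]
    return beta, se


# Hypothetical counts (NOT EPIC-PANACEA data). Each cell is
# ([intercept, current_smoker, follow_up_gt_8y], n contacted, n non-responders).
cells = [
    ([1, 0, 0], 1000, 200),
    ([1, 1, 0], 600, 200),
    ([1, 0, 1], 700, 300),
    ([1, 1, 1], 500, 300),
]
names = ["intercept", "current_smoker", "follow_up_gt_8y"]
beta, se = fit_logistic(cells)

# Odds ratio and Wald 95% confidence interval for each covariate.
odds_ratios = {}
for name, b, s in zip(names[1:], beta[1:], se[1:]):
    odds_ratios[name] = (math.exp(b), math.exp(b - 1.96 * s), math.exp(b + 1.96 * s))
    print(name, [round(v, 2) for v in odds_ratios[name]])
```

Each odds ratio is adjusted for the other covariate because both enter the same model; with many categorical predictors, as in the present study, a statistical package would normally be used instead of hand-written code.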

Secondly, baseline characteristics of responders (n = 375,815) were compared to characteristics of all those persons who had missing data for the second body weight assessment either because of death, (e)migration, not (yet) contacted or non-response (n = 122,154). This information is important in future analyses to interpret whether differences between cross-sectional relations in the baseline cohort (for example between physical activity and BMI) and results from longitudinal studies (for example relations between baseline physical activity and future weight change) can be explained by selection bias. Again a multivariate logistic regression model was used with response status as the dependent variable and above mentioned baseline characteristics as independent variables.

All analyses were performed with SPSS software, version 15.0 for Windows.

Results

Response

Of all baseline participants (n = 497,969), a second assessment of weight was obtained for 375,815 persons (75.5%). When only persons who were contacted were included in the denominator (n = 460,691, excluding deceased, (e)migrated and not approached persons), the response rate was 81.6%.
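The two response rates follow directly from the reported counts. A minimal sketch of the arithmetic, with an illustrative helper function that is not part of the original analysis:

```python
def response_rate(responders, denominator):
    """Response rate as a percentage of the chosen denominator."""
    return 100.0 * responders / denominator


baseline = 497_969        # all baseline participants
responders = 375_815      # second weight assessment available
non_responders = 84_876   # contacted but did not respond

# Denominator 1: all baseline participants.
overall = response_rate(responders, baseline)

# Denominator 2: contacted persons only (responders plus non-responders).
among_contacted = response_rate(responders, responders + non_responders)

print(f"{overall:.1f}%")          # 75.5%
print(f"{among_contacted:.1f}%")  # 81.6%
```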

Table 1 shows the EPIC-PANACEA centers ranked according to the time between first and second body weight assessment. Furthermore, the distribution of centers according to the different groups of attrition (i.e. not (yet) approached, deceased, (e)migrated, second assessment missing, non-response) is shown.

Table 1 Characteristics of the EPIC-PANACEA Cohort

Baseline characteristics of non-responders

Table 2 presents mutually adjusted associations between baseline characteristics and non-response. Non-responders were more likely to be male, to be young (≤40 years), to live alone (single, divorced/separated, widowed), to be less educated (primary or vocational secondary school), to be physically inactive (i.e. sedentary job and no recreational activity), to be current smokers, to be either non-users or heavy users of alcohol, to have a chronic disease at baseline but not cancer, to have developed cancer between baseline and second weight assessment and to have either a low (<18.5 kg/m2) or a high BMI (>25, especially ≥30 kg/m2). Based on the Wald statistic, the characteristics that most strongly predicted non-response were recruitment strategy and duration of follow-up. A more active way of contacting participants, i.e. through direct telephone interview instead of a mailed questionnaire, resulted in a six-fold higher response. Non-response was more than three-fold higher if the follow-up assessment occurred after a long period, particularly more than 8 years.

Table 2 Multivariate odds ratios of non-response to a second assessment of body weight in the EPIC-PANACEA study

When comparing the characteristics of responders to those of all persons who had a missing second anthropometric assessment (deceased, (e)migrated, not yet contacted and non-responders), on average the same characteristics were related to missingness (data not shown). Participants for whom a second weight assessment was missing were more often male, young (≤40 years), living alone (single, divorced/separated, widowed), less educated (primary or secondary school as highest attained educational level), former or current smokers, alcohol abstainers and chronically diseased at baseline, had more often developed cancer between baseline and second weight assessment, and more often had either a low (<18.5 kg/m2) or a high BMI (>25, especially ≥30 kg/m2). A missing second body weight assessment was not related to heavy alcohol use or physical inactivity.

Discussion

We investigated whether baseline demographic, socio-economic and health variables differed between responders and non-responders to a second assessment of body weight in a large European cohort. Our results suggest that non-response was not random, but linked to specific characteristics of the participants at baseline. Both analyses, responders versus non-responders and responders versus all participants with a missing second body weight assessment, showed that non-responders were more often male, aged 40 years or younger, living alone, less educated and in poorer health, more often reported an unhealthy lifestyle, and more often had either a low or a high BMI. Moreover, important predictors of a high response were a short follow-up time and an active way of follow-up using personal telephone interviews.

Non-response

Overall response for the EPIC-PANACEA cohort was 81.6%, varying from 58.2% to 98.8% per center/country. One explanation for the generally high response is that all responders were already participants in the EPIC study, which assesses relations between lifestyle and chronic diseases. Usually this is a selective population with a higher interest in medical and health issues and therefore a higher motivation to participate in follow-up assessments. Follow-up time and recruitment strategy differed between the study centers, explaining part of the differences in response between the centers. A shorter time between first and second assessment was associated with a higher response, possibly because participants in centers with a short follow-up duration still felt more involved in the study. Direct approach by telephone also yielded a higher response than a postal questionnaire. This implies that efforts of researchers to contact participants by telephone may enhance participation. However, requesting participants to visit a research center places a burden on some persons, resulting in a lower response than mailed or telephone-administered questionnaires. Nevertheless, we should be cautious with these conclusions because we did not study the effects of different follow-up times and recruitment strategies within a center. Center differences may therefore also contribute to the observed differences in response between centers with different follow-up times and recruitment strategies. For example, in Spain, where participants were contacted by phone, blood donors were included, and donors might be cooperative with regard to health-related initiatives in general.

We identified several demographic and socioeconomic determinants that were independently related to non-response. The lower response among participants aged 40 years or younger may relate to work obligations and family commitments and consequently less time to take part in research. This effect of age is in agreement with some studies [1–3, 12], although others found non-response increasing with age [6, 8, 9]. Similar to what has been consistently reported in previous studies [1–4, 6, 9, 13], we found lower response among participants living alone, i.e. single, divorced/separated or widowed. They depend on their intrinsic motivation to maintain participation in the study and might lack the encouragement of a partner. Furthermore, in agreement with others [3, 13], who reported educational level as one of the most important predictors of non-response, low education was associated with lower response. Reporting a less healthy lifestyle at baseline, i.e. physical inactivity, being a smoker, and either abstaining from alcohol or heavy alcohol use, was related to higher non-response. Similar results for alcohol use were reported by Thomas et al. [11], who suggested that three-quarters of alcohol abstainers were ex-drinkers, having given up drinking because of ill-health. Participants with a chronic disease at baseline were more likely to refuse participation in the follow-up survey; participants with cancer at baseline, however, were more likely to respond. This might be due to the fact that the EPIC study was designed specifically to investigate causes of cancer and, therefore, (ex)cancer patients who had already decided to participate at baseline were still willing to contribute to this study at a later time. In contrast, participants who developed cancer during follow-up were less likely to respond to an invitation for the second assessment.

Comparing responders with persons for whom a second body weight assessment was missing (i.e. deceased, (e)migrated, not yet contacted or non-responding participants) yielded rather similar results. This may imply that future studies that assess relations between baseline characteristics and weight changes during follow-up should consider the possibility of selective non-response. However, selective non-response does not automatically imply selection bias and thus distorted effect estimates. If there is selection bias, statistical tools may help in adjusting, such as the method of 'inverse probability' [21, 22]. Several other studies that investigated the magnitude of bias due to non-response showed that bias in these relationships was negligible [3, 14].
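As a rough illustration of the idea behind inverse probability adjustment, the sketch below weights each responder by the inverse of the response probability in his or her stratum, so that strata with low response count proportionally more. This is our reading of the general technique, not a description of the methods in [21, 22]. The strata, counts and outcome values are hypothetical (not EPIC-PANACEA data), and the adjustment is only valid under the assumption that, within a stratum, non-response is unrelated to the outcome.

```python
# Hypothetical strata (NOT EPIC-PANACEA data):
# (label, n at baseline, n responders, mean weight change among responders in kg)
strata = [
    ("lower education", 1000, 600, 1.0),
    ("higher education", 1000, 900, 0.5),
]


def naive_mean(strata):
    """Mean over responders only, ignoring differential non-response."""
    total = sum(n_resp * mean for _, _, n_resp, mean in strata)
    return total / sum(n_resp for _, _, n_resp, _ in strata)


def ipw_mean(strata):
    """Mean over responders, each weighted by 1 / P(response | stratum)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for _, n_base, n_resp, mean in strata:
        weight = n_base / n_resp  # inverse of the stratum response probability
        weighted_sum += n_resp * weight * mean
        weight_total += n_resp * weight
    return weighted_sum / weight_total


print(round(naive_mean(strata), 3))
print(round(ipw_mean(strata), 3))
```

In this toy example the unweighted responder mean under-represents the stratum with the lower response, whereas the weighted mean restores each stratum to its baseline share.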

Strengths of the present study are the large samples of participants from several European centres, the use of various methods of collecting the second assessment, different time intervals between the first and the second assessment and the use of standardized and validated baseline questionnaires across centres. Some limitations should be considered when interpreting our results. First, the determinants of attrition are based on baseline data, while some lifestyle factors might fluctuate or change over time. Second, in a cohort of half a million people misclassification of exact dates of vital status or migration may occur. Consequently, in some centers the non-response group may include persons already deceased or (e)migrated but not yet registered as such. Third, the selection of the study population in each EPIC centre was largely influenced by practical considerations. Therefore, the sample was not intended to be representative of each region and investigating cross-cultural differences in non-response was not possible. Finally, we studied the response to a second assessment of anthropometric measures. In many centers, at the same time other lifestyle or nutritional data were collected. The type of information as well as the amount may also affect response.

Conclusions

In conclusion, in this large cohort study, response to a second assessment 2–11 years after baseline was reasonably high and varied between centers according to follow-up time and recruitment strategy. Non-response was more frequent in participants who were young, lived alone, were less educated, were in poorer health, reported an unhealthy lifestyle or had either a low or a high BMI. Cohort studies, especially those with long follow-up, may enhance cohort maintenance by paying extra attention to groups with the above-mentioned characteristics, who are least likely to respond, and by using an active recruitment strategy with telephone interviews.