Background

Numerous large and important cohort studies have been established overseas [1–6] and in Australia [7–9]. Many of the cohort studies undertaken are based on volunteers or clinical/convenience samples and the follow-up is based on self-report data or record linkages. The establishment of population-based, biomedical, cohort studies using a random sample is less common [1, 2, 5, 9].

Cohort studies are based on the assessment of an individual at several points in time and, by recalling or re-contacting each individual, assessing the process and transition of the individual along the disease and life-course continuum [10, 11]. As argued by Szklo [12], there are numerous, and obvious, advantages if population-based cohort studies are representative of their defined population. The translation of the data into population estimates over time enables causal relationships to be explored and the effects of age and maturation to be separated out, although the main aims are to undertake intra-group comparisons and follow changes over time [12, 13]. Increasing emphasis is being placed upon these longitudinal data to inform policy makers, health promoters and health planners. With the ageing of the population and the resultant cost pressures placed upon health systems, these data are also being used to make informed predictions about the future use of health services, mortality and morbidity patterns. Cohort studies can provide unique data that give a more detailed understanding of complex health issues and offer opportunities for life-course analysis and evaluation research.

Biomedical cohort studies are very costly and logistically difficult to administer [12, 14]. The benefits and uses of the data are compromised if bias exists. A high initial response rate, a representative initial sample and a low attrition rate are areas where effort needs to be invested to limit selection bias [15–17]. While loss to follow-up is a somewhat expected consequence of the longitudinal nature of all cohort studies, the representativeness of the initial sample, and subsequent ongoing continual assessment of representativeness, are important aspects that warrant investigation [11–13].

This paper investigates the representativeness of an initial cohort to determine if people undertaking risky behaviours were less likely to participate in a major biomedical cohort study and to study the direction and magnitude of any bias found. As argued by Grimes [11], cohort studies should be upfront in identifying and describing the potential effects of any bias and should assess similarities and dissimilarities of respondents. This paper aims to identify and describe these biases for a major cohort study established in the western and northern suburbs of Adelaide, the capital of South Australia, in which over 4000 randomly selected adults have been recruited. The overall aim of this cohort study is to follow the continuum of selected chronic diseases and associated risk factors.

Methods

The North West Adelaide Health Cohort Study (NWAHCS) recruited participants between 2000 and 2002, with a total of n = 4060 adults participating. Randomly selected telephone numbers listed in the relevant postcodes (that equated to the boundaries of the suburbs selected to be included in the study) were drawn from the most current Electronic White Pages. A letter of invitation to participate was sent to these households, followed within 10 days by a telephone call from trained health study recruiters. A randomly selected adult within the household (the person aged 18 years or older with the next birthday) was asked to participate in the study. At each appointment, the participant was given additional detailed information about the study and asked to sign consent forms for participation in the study. The information given highlighted the longitudinal nature of the study, and participants were informed that they may be invited to participate in health-related sub-studies. Prior to the study commencing, approval for the research was obtained from the North West Adelaide Health Service Ethics of Human Research Committee.
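The next-birthday rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` record, the `select_respondent` function and the treatment of 29 February birthdays are hypothetical and are not taken from the study protocol.

```python
from datetime import date
from typing import NamedTuple, Optional, Sequence


class Person(NamedTuple):
    # Hypothetical household-member record used only for this sketch.
    name: str
    birth_date: date


def age_on(birth_date: date, today: date) -> int:
    """Completed years of age on the contact date."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def next_birthday(birth_date: date, today: date) -> date:
    """First birthday falling on or after the contact date."""
    for year in (today.year, today.year + 1):
        try:
            candidate = birth_date.replace(year=year)
        except ValueError:
            # Assumption: a 29 February birthday is observed on 1 March
            # in non-leap years.
            candidate = date(year, 3, 1)
        if candidate >= today:
            return candidate
    raise AssertionError("a birthday always occurs within the next year")


def select_respondent(household: Sequence[Person], today: date) -> Optional[Person]:
    """Pick the adult (18 years or older) whose birthday comes next."""
    adults = [p for p in household if age_on(p.birth_date, today) >= 18]
    if not adults:
        return None
    return min(adults, key=lambda p: next_birthday(p.birth_date, today))
```

Because birthdays are effectively unrelated to the characteristics under study, this rule approximates a random draw of one adult per household without requiring the recruiter to enumerate and randomise household members.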

Appointments were made for participants in one of the two hospital-based clinics in the region and participants were sent an information folder that included a questionnaire with questions on chronic disease, alcohol consumption, physical activity levels, quality of life and socio-economic details (including highest education level, marital status, work status, country of birth and household income level). Age, sex, smoking status, height, weight, and ever being told they had high blood pressure or high cholesterol were asked in the recruitment telephone interview. At the clinic a range of assessments were made including taking blood (to test fasting plasma glucose, lipids, HbA1c), skin prick tests to common allergens and spirometry lung function tests.

The overall response rate of the completed telephone interview, self-completed questionnaire and clinic biomedical assessment (including blood sample) was 49.6% (69% of those interviewed). This paper assesses data associated with the respondents who completed all aspects of the study. Full details of the methodology have been previously published [18–20].

To examine the representativeness of the NWAHCS sample with regard to age, sex, area of residence and socio-economic status, a comparison was made using Australian Bureau of Statistics (ABS) Census figures. Socio-economic status was measured using the Socio-Economic Indexes for Areas, Index of Relative Socio-economic Disadvantage (SEIFA IRSD) [21].

To compare the other demographic and social characteristics of the respondents and the population estimates of key health risk factors, a comparison against a population-based survey, the South Australian Monitoring and Surveillance System (SAMSS), was undertaken. SAMSS is a representative, on-going, population household telephone interview surveillance system covering the South Australian population, based on Electronic White Pages (EWP) sampling, and has operated each month since July 2002 using a consistent methodology [22]. This involves a random sample of SA households with one person selected at random in each household according to next birthday. Trained health interviewers interview respondents using computer assisted telephone interviewing (CATI) technology and there is no replacement for non-respondents. From July 2002 to June 2004, n = 2904 adults in the NW suburbs of Adelaide were interviewed, providing a non-replacement response rate of 68.7%. To compare physical activity rates, data from the South Australian Health Monitor were used. The methodology of this CATI survey, operated three times a year, is similar to SAMSS and has been detailed elsewhere [23]. This is a separate comparable survey with a separate sample.

While the questions asked in NWAHCS and SAMSS were identical for age, sex, country of birth, household income, alcohol consumption, height and weight (to calculate body mass index (BMI)), current high blood pressure, current high cholesterol, physical activity and self-reported health status, there were slight differences in the wording of the questions for highest education level, marital status, work status and smoking status. Questions on height, weight, blood pressure and cholesterol were only asked of the second half of the respondents, although measurements in the clinic were undertaken on all participants.

All analyses were limited to data on respondents aged 18 years and over in the same geographical area to correspond to the NWAHCS sample. Data were weighted by age, sex, region and probability of selection within the household to the 2001 ABS Census data for SA to provide estimates that were representative of the region's population. The comparison for age and sex using the ABS data used both weighted and un-weighted data. Significance was tested with χ² tests in SPSS V12.0 and EpiInfo Version 6, using a 0.05 level of significance. Adjusted standardized residuals were obtained using the methods of Haberman [24] and were used to test deviations from expected values separately in each cell. Bonferroni corrections were applied for multiple testing.
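The cell-by-cell testing described above can be illustrated with a short Python sketch. It computes the Pearson χ² statistic and adjusted standardized residuals for a contingency table, then flags cells whose two-sided p-value falls below a Bonferroni-adjusted threshold. The function names and the counts in the usage example are hypothetical, and dividing the significance level by the number of cells is an assumption about how the correction might be applied; the original analysis was run in SPSS and EpiInfo.

```python
import math


def chi_square_with_residuals(table):
    """Return (chi2, residuals) for a contingency table given as a list of rows.

    Each adjusted standardized residual is
    (observed - expected) / sqrt(expected * (1 - row/n) * (1 - col/n)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            residual_row.append((observed - expected) / math.sqrt(variance))
        residuals.append(residual_row)
    return chi2, residuals


def bonferroni_flags(residuals, alpha=0.05):
    """Flag cells that remain significant after dividing alpha by the cell count."""
    n_cells = sum(len(row) for row in residuals)
    threshold = alpha / n_cells  # assumption: one test per cell
    flags = []
    for row in residuals:
        # Two-sided p-value of a standard normal deviate z is erfc(|z| / sqrt(2)).
        flags.append([math.erfc(abs(z) / math.sqrt(2)) < threshold for z in row])
    return flags


# Hypothetical counts: rows are the cohort and the comparison survey,
# columns are three categories of some characteristic.
example = [[120, 200, 80], [150, 180, 170]]
chi2, residuals = chi_square_with_residuals(example)
flags = bonferroni_flags(residuals)
```

A positive residual marks a cell with more respondents than expected under independence and a negative residual marks a cell with fewer, which is how statements such as "more likely to be in the 3rd quintile" can be read off a table.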

Results

Initial analysis using un-weighted data showed that significantly fewer younger people (< 40 years) and more older people (40+ years) were recruited into the cohort study when compared to Census data. There were no differences by sex or area of residence (Table 1). Table 2 highlights the differences by SEIFA quintiles, with study participants more likely to be in the 3rd quintile and less likely to be in the 4th quintile of relative socio-economic disadvantage.

Table 1 Age and sex comparison between 2001 Census and NWAHCS
Table 2 SEIFA Index of relative socio-economic disadvantage – comparison between 2001 Census and NWAHCS

Table 3 highlights other demographic comparisons. There were statistically significant differences by education level, with NWAHCS participants more likely to have trade, certificate or diploma qualifications and less likely to have just secondary school qualifications or to have undertaken tertiary study than participants in SAMSS. There were no statistically significant differences by marital status or work status. The NWAHCS had a statistically significantly higher proportion of people born in the United Kingdom or Ireland and a lower proportion of Australian-born. There were also differences in the household income level groups, with the NWAHCS participants more likely to be in the $40,000–80,000 bracket and less likely to be in the $80,000+ bracket.

Table 3 Demographic comparison between NWAHCS and other comparable survey

Table 4 shows the significant differences between the study participants and the comparative population for health risk factors. There was no difference by smoking status, physical activity level, general health status or the proportion with current high blood pressure or current high cholesterol. NWAHCS participants were more likely to be in the intermediate to very high alcohol risk category and less likely to be in the underweight category of BMI.

Table 4 Risk factor comparison between NWAHCS and other comparable survey

Conclusion

This cohort study has offered a unique opportunity to study chronic disease and related risk factors and to define the relationship between lifestyle and health and disease in the Australian population. Cohort studies are one of the most important tools for epidemiological investigation, but cohort studies based on random sampling are often marred by biased samples, low response rates and high loss to follow-up [12, 17]. Erroneous conclusions can be made if confounding factors are not incorporated in analytical comparisons and models [25]. Bias can be corrected provided that confounding was anticipated and confounding factors are appropriately controlled [26, 27]. This analysis has highlighted the variables that need consideration in future analyses associated with the NWAHCS.

This analysis has shown that in terms of bias associated with risk factors, the cohort participants are not dramatically unlike the community they represent. Their overall self-reported health status is the same, there is the same proportion of current smokers, their overall BMI status (except for underweight) is similar and the same proportion had current high blood pressure or high cholesterol readings. The only major difference was in terms of alcohol consumption with the cohort participants more likely to consume alcohol at an intermediate to high risk level.

In terms of demographic and social characteristics there were no differences by marital status or work status. The un-weighted comparison showed that fewer younger people and more older people were recruited into the study. NWAHCS participants were also more likely to be born in the UK or Ireland and less likely to be Australian-born. These demographic differences could be explained by the fact that NWAHCS participants knew they were being recruited into a bio-medical cohort study with an outlay of personal time and effort required. In clinical terms, this recruitment bias means we are missing out on younger persons (expected to be healthier) and gaining more older persons (expected to be unhealthy), although the weighting of the data would counteract some of this bias. The country of birth differences should not affect clinical results as Australian-born and those born in the UK or Ireland have similar heritages and social indicators. All future analyses and assumptions will take into account these differences and the fact that the study participants were more likely to have middle levels of education and to be in the middle levels of income.

The strength of this study lies in its representative nature, the large random sample and the high response rate. Although the response rate associated with the complete study involvement, including obtaining blood and other bio-medical measurements, was 49.6% (69% of people interviewed by telephone), this is high when compared to other recent, comparable Australian studies. The AusDiab study recorded a response rate of 28% and a recent pilot for a national Australian biomedical study reported obtaining blood from 23% of their sample [28]. There is a trend towards lower response rates in all types of population surveys as people protect their privacy or are overwhelmed by marketing telephone calls and mail outs. The additional commitments associated with involvement in a cohort study add to respondent burden. To overcome some of the initial bias attributable to non-response, the data were weighted by age and sex. The weights reflect unequal sample inclusion probability and compensate for differential non-response. Theoretically the weighted analyses should provide reliable population estimates of health phenomena.
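How such weights can be constructed is sketched below in Python. It is a minimal illustration, not the procedure used in the study: each respondent first receives a design weight equal to the number of adults in the household (the inverse of the within-household selection probability), and the design weights are then scaled so that each age-by-sex stratum sums to its Census count. The function name, field names and all counts are hypothetical.

```python
from collections import defaultdict


def poststratification_weights(respondents, population_counts):
    """Return one weight per respondent.

    respondents: list of dicts with keys 'stratum' and 'adults_in_household'.
    population_counts: dict mapping each stratum to its population count.
    """
    # Design weight: one adult chosen out of k has selection probability 1/k.
    design = [r["adults_in_household"] for r in respondents]

    # Sum of design weights within each stratum.
    stratum_totals = defaultdict(float)
    for r, d in zip(respondents, design):
        stratum_totals[r["stratum"]] += d

    # Scale so that the weighted stratum totals match the population counts.
    return [
        d * population_counts[r["stratum"]] / stratum_totals[r["stratum"]]
        for r, d in zip(respondents, design)
    ]


# Hypothetical respondents and Census counts for two age-by-sex strata.
sample = [
    {"stratum": ("18-39", "F"), "adults_in_household": 1},
    {"stratum": ("18-39", "F"), "adults_in_household": 3},
    {"stratum": ("40+", "M"), "adults_in_household": 2},
]
census = {("18-39", "F"): 400, ("40+", "M"): 600}
weights = poststratification_weights(sample, census)
```

An under-represented stratum, such as younger adults in this cohort, ends up with larger weights per respondent, which is the mechanism by which weighting can offset part of the age-related recruitment bias.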

To increase the initial response rate the study team implemented a range of well recognized survey techniques. These included consideration of timing of the initial phone contact, timing of phone call and technique (questionnaire length, size), training of recruitment staff, marketing, branding, and a free-call telephone number for inquiries [29]. Early in the recruitment stage, qualitative interviews were conducted with subjects unwilling to participate in the cohort study and the findings incorporated into further recruitment procedures.

Limitations to this analysis include the use of data collected using mixed modes with comparisons based on data collected by telephone and self completion. Bias is known to exist by method of collection, especially in regard to socially desirable responses [30]. An additional weakness of the study was the lack of data on non-responders. Although some data were collected on people who refused to participate in the biomedical components of the study, a comparison with people unable to be contacted (non-responders to the recruitment telephone call) was not possible due to data limitations on non-responders. The self-report nature of the data collection could also contain an element of bias and therefore be seen as a limitation. The use of biomedical data in addition to the self-reported data in this study (height, weight, and blood pressure and cholesterol measurements) will allow comparisons to be made between reported and measured variables. This analysis is planned.

Many studies have assessed the characteristics of participants and non-participants in population surveys and questionnaire-based cohort studies. Details on, in-depth analysis of, and subsequent publishing of the initial samples for cohort studies that have been initiated in the last decade, in which participants commit to clinic assessments, are few. This study has shown that for this population, people who have risk factors for ill-health were just as likely as others to participate. This is of relevance for researchers interested in establishing a bio-medical cohort study and offers positive encouragement that the huge financial and human resource costs are worthwhile.