Background

The aim of most epidemiological studies is to obtain estimates that can be generalised to a population of interest. For surveys concerned with disease prevalence, the main means to achieve this is to draw a sample that is sufficiently representative of the target population. However, few surveys have perfect response rates and any level of nonresponse can potentially lead to biased estimates of prevalence [1, 2].

In contrast, much epidemiological practice is based around the principle that representativeness is not necessarily required for reliable estimates of relative risk based on internal comparisons within study populations [3-5]. Indeed, having a greater proportion of respondents in extreme categories compared to the population of interest may often be necessary, in order to yield sufficient information about specific exposure-outcome relationships [5]. A key issue is whether there is any nonresponse bias after conditioning on the covariates included in the analysis.

Cohort studies generally require more extensive data collection than one-off surveys, as well as the provision of identifying details and a long-term commitment to follow-up. While cohort studies often focus on selected population groups (e.g. occupational groups) [6, 7] and have relatively high response rates within these groups, recent response rates to population-based cohort studies are usually below 50% [8-12]. Furthermore, cohort study participants are generally healthier and more health conscious than non-participants [3, 13-16]. Concern is often expressed about the low response rates of cohort studies, the selectiveness of the groups under study, and the generalisability of their results [17].

Direct empirical data to support the assumption that internal comparisons remain reliable, despite low response rates or highly selected study groups, is lacking. Furthermore, concerns are also expressed that elements of study design, such as sampling methods and use of postal questionnaires versus interviews, may influence the observed relationships [18]. This paper investigates whether or not cross-sectional estimates of exposure-outcome relationships are affected by survey aspects (response rate, sampling frame and mode of questionnaire administration) or the wording of questionnaire items, by comparing estimates computed from two independent studies of the same target population with divergent response rates and different designs.

Methods

The 45 and Up Study

The 45 and Up Study is a population-based cohort study of more than 260,000 men and women aged 45 years and over in New South Wales (NSW), Australia [10]. Participants were randomly selected from the database that is used to administer the national universal health insurance scheme (Medicare Australia), which has almost complete coverage of the population. Equal numbers of males and females were selected for participants less than 80 years old. Individuals aged 80 years or over and residents of rural areas were oversampled by a factor of two, males aged 80 years or over were oversampled relative to females, and all residents of remote areas were completely enumerated. Participants entered the study by completing a baseline postal questionnaire and providing written consent to have their health followed over time. The study questionnaire is available at http://www.45andUp.org.au. The survey was available only in English. The current overall response rate to the baseline questionnaire is estimated to be 17.9% [10]. The final analytic sample consisted of 44,851 men and 52,961 women joining the study up to July 2008, after excluding 125 respondents who had a missing Accessibility Remoteness Index of Australia (ARIA+) [19] score.

Post-stratification estimation weights were assigned to the 45 and Up baseline survey to adjust the sample to account for the differences in selection probabilities and response rates and give consistency with 2006 population estimates produced by the Australian Bureau of Statistics (ABS) [20]. The post-strata were formed according to sex (male or female), remoteness (major city, inner regional, outer regional or remote) and age (five year age groups from 45-85 years or ≥85 years).
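As a sketch of the weighting step described above, post-stratification assigns each stratum the ratio of its external population count to its sample count, so that weighted stratum totals match the ABS benchmarks. The stratum labels and counts below are illustrative only, not taken from the study data.

```python
def post_stratification_weights(sample_counts, population_counts):
    """Assign each post-stratum the weight population_count / sample_count,
    so that weighted stratum totals reproduce the external benchmarks."""
    weights = {}
    for stratum, n_sample in sample_counts.items():
        if n_sample == 0:
            raise ValueError(f"empty stratum: {stratum}")
        weights[stratum] = population_counts[stratum] / n_sample
    return weights

# Illustrative (sex, remoteness, age group) strata with made-up counts.
sample = {("female", "major city", "45-49"): 4000,
          ("male", "remote", "85+"): 150}
population = {("female", "major city", "45-49"): 220000,
              ("male", "remote", "85+"): 3000}
w = post_stratification_weights(sample, population)
```

Every respondent in a stratum then carries that stratum's weight when computing weighted prevalence estimates.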

The NSW Population Health Survey

The NSW Population Health Survey (PHS) is an ongoing survey on the health of people in NSW using computer assisted telephone interviewing [21]. Independent samples of NSW households with private telephones are drawn each year using random digit dialling, and one person is randomly selected to participate in the survey. Informed consent was obtained from participants by their willingness to complete the telephone interview. The survey questionnaire is available at http://www.health.nsw.gov.au/publichealth/surveys/phs.asp. The survey is administered in 6 languages. In 2006 participants were asked all survey questions and in 2007 they were asked a random subset of the survey questions. We report analyses of data for 5,766 men and 9,030 women aged 45 years or over who responded to the 2006 (n = 5,480) or 2007 (n = 9,316) PHS, with response rates of 59.3% [22] and 63.6% [23] respectively.

Weights were assigned to each year of data to adjust for the differences in the probability of selection within the household, number of residential telephone connections to the house and the varying sampling fraction between each of the 8 NSW area health services to provide estimates that were representative of the NSW population [21]. These area health services can include several remoteness categories. Post-stratification weights were also assigned according to sex (male or female) and age (five year age groups from 45-85 years or 85-110 years) using 2005 and 2007 mid-year population statistics released by the ABS for each area health service [22, 23]. After weighting, Indigenous people are slightly under-represented in the PHS sample, and Australian-born people slightly over-represented, compared to the overall NSW population [22, 23].

Questionnaire items

We obtained the original questionnaires from the 45 and Up Study and the 2006 and 2007 PHS and compared the wording of questions and response categories. We classified questionnaire items as highly comparable, moderately comparable or not comparable, based on whether the item was expected to yield identical, similar or non-comparable responses, respectively, for a given individual. Analyses focused on items considered highly or moderately comparable; items used in these analyses are compared in Additional file 1. All variables used in these analyses were derived from self-reported data except postcode (45 and Up Study only).

All analyses included all participants in both studies, unless otherwise stated. If one study only asked a sub-set of participants a question of interest then the same restriction was applied to the other study. Data are reported on falls in the past 12 months for participants aged 60 years and over, hysterectomy operation in females less than 70 years, mammography screening in the past two years for females less than 80 years and bowel screening in the past 5 years for all persons aged 50 years and over.

Questions on mammography screening and hysterectomy were only asked in the 2006 PHS and hypertension and bowel screening in the 2007 PHS.

Highly comparable questionnaire items

Remoteness was determined using the mean ARIA+ score for the postcode of the participant's residential address and categorised as major city, inner regional, outer regional or remote, according to the Australian Institute of Health and Welfare [24].

Self reported height and weight were used to calculate participants' body mass index (BMI) as weight in kilograms divided by the square of height in meters. BMI was categorised as underweight (BMI < 18.5 kg/m2), normal range (BMI 18.5-24.9 kg/m2), overweight (BMI 25.0-29.9 kg/m2) or obese (BMI ≥ 30 kg/m2) according to the World Health Organisation [25].
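The BMI derivation and WHO categorisation above can be written directly; this is a minimal sketch (the function name is ours, not from the study).

```python
def bmi_category(weight_kg, height_m):
    """Compute BMI (kg/m^2) and classify it using the WHO cut-points
    quoted in the text: <18.5 underweight, 18.5-24.9 normal range,
    25.0-29.9 overweight, >=30 obese."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return bmi, "underweight"
    if bmi < 25.0:
        return bmi, "normal range"
    if bmi < 30.0:
        return bmi, "overweight"
    return bmi, "obese"
```

With continuous BMI, the published ranges "18.5-24.9" and "25.0-29.9" correspond to the half-open intervals [18.5, 25.0) and [25.0, 30.0) used here.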

Participants were classified as having hypertension, diabetes and/or asthma if they reported that these conditions had ever been diagnosed by a doctor (both studies) or at a hospital (PHS only). Only participants who answered version two of the 45 and Up baseline questionnaire were asked if they had ever been diagnosed with asthma (n = 65,522).

Indicator variables were constructed for being born in Australia, missing all natural teeth, speaking a language other than English at home, having fallen in the past 12 months, having private health insurance (excluding Medicare) and having a hysterectomy.

Daily fruit consumption was grouped into participants who do not eat fruit, participants who eat fruit but fewer than two serves per day, and participants who eat two or more serves per day.

An indicator for females who were breast screened in the past two years was ascertained from responses to ever having a mammogram and the year of (45 and Up Study) or time interval (PHS) since their last mammogram.

Psychological distress was evaluated using the Kessler (K10) measure [26], ascertained as the sum of responses to 10 questions. If a respondent answered nine of the 10 items, the missing item was imputed as the average of the other nine responses. If a respondent answered fewer than nine items, their K10 score was set to missing. Participants with a K10 score of 22 or greater were classified as having high/very high levels of psychological distress and those with a score less than 22 as having low/moderate levels of psychological distress [27].
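The scoring and single-item imputation rule can be sketched as follows (function names are ours; item responses are assumed to be on the usual 1-5 K10 scale, with None marking a missing item):

```python
def k10_score(responses):
    """Score the K10 from 10 item responses (1-5 each; None = missing).

    - All 10 answered: the score is the sum of the items.
    - Exactly 9 answered: the missing item is imputed as the mean of the
      other nine, per the rule described in the text.
    - Fewer than 9 answered: the score is missing (None).
    """
    answered = [r for r in responses if r is not None]
    if len(answered) == 10:
        return sum(answered)
    if len(answered) == 9:
        return sum(answered) + sum(answered) / 9
    return None

def distress_level(score):
    """Dichotomise at the cut-point of 22, as in the text."""
    if score is None:
        return None
    return "high/very high" if score >= 22 else "low/moderate"
```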

Moderately comparable questionnaire items

The wording of questions across the two studies differed for household income before tax (45 and Up Study included benefits, pensions and superannuation), bowel screening (screening tests varied by study) and current smoking status (45 and Up Study participants recorded whether they were 'regular smokers' currently without a definition for regular, whereas PHS participants recorded if they 'smoke daily' where smoking was defined to include cigarettes, cigars and pipes).

The response categories varied across the two studies for highest level of educational attainment (for these analyses similar categories were constructed) and self-rated health status (the PHS had an additional response category - for these analyses the categories "poor" and "very poor" on the PHS were combined).

Analysis

Before analyses commenced, twenty exposure-outcome pairs were selected for inclusion in our analyses. These were selected on the basis of demonstrating relationships across a wide range of domains of research interest. This consisted of i) ten pairs where both the exposure and the outcome variables were highly comparable across the two studies; and ii) ten pairs where the exposure and/or outcome variables were only moderately comparable across the two studies.

Unweighted and survey weighted prevalence estimates with 95% confidence intervals (CI) were calculated for each study for all highly and moderately comparable variables used in the exposure-outcome relationship analyses. Odds ratios (ORs) were used to approximate relative risks and logistic regression analyses were used to calculate the 20 pre-determined exposure-outcome relationships for each study; separated into highly and moderately comparable ORs. In each case two sets of ORs were calculated; namely the crude OR and the OR adjusting for age, sex and remoteness since these were the sampling variables common to both surveys. Unweighted and weighted comparisons of these two types of ORs by study are presented in Additional files 2 and 3 respectively (45 and Up Study) and Additional files 4 and 5 respectively (PHS).
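For a binary exposure-outcome pair, the crude OR described above can be read off a 2x2 table; the adjusted ORs came from logistic regression (not shown here). This is a sketch using the standard Woolf confidence interval, with made-up cell counts in the test, not study data.

```python
import math

def crude_odds_ratio(a, b, c, d):
    """Crude OR and Woolf 95% CI from a 2x2 table:
        a = exposed cases,   b = exposed non-cases,
        c = unexposed cases, d = unexposed non-cases.
    SE of the log OR is sqrt(1/a + 1/b + 1/c + 1/d)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)
```

A logistic regression of the outcome on the binary exposure alone reproduces this crude OR as the exponentiated slope coefficient.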

In the figures, the squares and lines represent each OR estimate and CI, with the area of each square being proportional to the sample size used for each estimate.

Wald chi-square statistics were computed to test for differences in the log odds ratios between the two surveys for each of the 20 exposure-outcome pairs and compared to a chi-square distribution with degrees of freedom equal to the number of categories minus one in the exposure variable. Each study was modelled separately and then the Wald statistics were calculated by combining the two sets of estimated parameters, variances and covariances. Analyses were conducted with and without using sampling weights. With survey data the Wald test can be unreliable if the degrees of freedom on the estimated covariance matrix are small [28]. In this case the samples were large and the designs relatively simple.
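Because the two studies are independent, the covariance between their estimates is zero and the variances simply add. For a single-category (1 df) comparison the Wald statistic reduces to the squared difference in log ORs over the summed variances; a sketch is below (multi-category exposures need the vector form with the full covariance blocks, not shown).

```python
import math

def wald_test_two_studies(log_or1, se1, log_or2, se2):
    """Wald chi-square (1 df) for a difference in log odds ratios
    between two independent studies. The p-value uses the chi-square
    survival function, which for 1 df equals erfc(sqrt(x / 2))."""
    chi2 = (log_or1 - log_or2) ** 2 / (se1 ** 2 + se2 ** 2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```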

Analyses were carried out using SAS, version 9 [29]. This study has the approval of the University of New South Wales Ethics Committee and the NSW Population and Health Services Research Ethics Committee.

Results

The distributions of social and demographic characteristics and of health risk factors and conditions in the two studies are shown in Tables 1 and 2, respectively. Younger persons and/or those living in major cities were under-represented in both surveys, as were males in the PHS (the 45 and Up Study sample was stratified by sex). The prevalence confidence intervals were narrower in the 45 and Up Study than in the PHS, because of its larger sample size.

Table 1 Social and demographic characteristics of the 45 and Up Study and the NSW PHS populations
Table 2 Health risk factors and conditions of the 45 and Up Study and the NSW PHS populations

The weighted estimates of prevalence were similar across the two studies for variables such as age, sex, and remoteness (the variables used for weighting), country of birth, educational attainment, fruit consumption, body mass index and falls. However, the prevalence of speaking a language other than English at home and of holding private health insurance was higher in the 45 and Up Study compared to the PHS, while the prevalence of smoking, high/very high psychological distress, ever diagnosed with hypertension, ever diagnosed with diabetes and ever diagnosed with asthma was lower (Tables 1 and 2). The PHS tended to have less missing data than the 45 and Up Study, particularly for variables relating to mammography screening, K10 score and household income before tax. Prevalence estimates for self-rated health status varied across the two studies, with the proportion who reported the lowest category of self-rated health status on the 45 and Up baseline questionnaire (i.e. "poor") being similar to the proportion who reported the lowest category on the PHS (i.e. "very poor") (Table 2).

The ten exposure-outcome relationships where both the exposure and the outcome variables were highly comparable across the two studies are presented in Figure 1, with ORs adjusted for age, sex and remoteness. The observed relationships were virtually identical between the two studies. For 8 out of the 10 relationships there was no significant difference between the results from the different studies. There was borderline evidence of a difference in the risk of falling according to BMI across the two studies (Wald test P = 0.04) and minor heterogeneity in the relationship of age to high/very high psychological distress was observed (Wald test P = 0.02). Similar observations were seen when the ORs from these ten relationships were calculated using sampling weights (Additional file 6).

Figure 1

Odds ratios a by study where the exposure and outcome variables were highly comparable across the 45 and Up Study and the NSW PHS b. ARIA+, Accessibility Remoteness Index of Australia; CI, Confidence Interval; NSW, New South Wales; PHS, Population Health Survey. a Adjusting for age, sex and remoteness. b Black squares represent ORs, with the area of each square proportional to the sample size contributing to the OR; the corresponding line represents the 95% confidence interval. c P-value from Wald chi-square test, testing for a difference between the two studies for the specific exposure-outcome pair.

The ten exposure-outcome relationships where the exposure and/or outcome variables were only moderately comparable across the two studies are presented in Figure 2, with ORs adjusted for age, sex and remoteness. Each exposure-outcome pair had a similar relationship pattern for both studies, and all OR estimates were in the same direction and of similar magnitude, except when self-rated health status was the exposure variable. The relationships did not differ significantly for 4 out of 10 of the exposure-outcome associations. Significant but relatively minor differences in ORs were observed for smoking, educational attainment and pre-tax income in relation to psychological distress, private health insurance and remoteness of residence. In spite of the similarity in the shape of the relationship, substantial heterogeneity and large differences in ORs were observed for relationships with self-rated health (where the PHS had an additional response category, "very poor"). Similar observations were seen when the ORs from these ten relationships were calculated using sampling weights (Additional file 7).

Figure 2

Odds ratios a by study where either the exposure or outcome or both variables were only moderately comparable across the 45 and Up Study and the NSW PHS b. ARIA+, Accessibility Remoteness Index of Australia; CI, Confidence Interval; NSW, New South Wales; p.a., per annum; PHS, Population Health Survey. a Adjusting for age, sex and remoteness. b Black squares represent ORs, with the area of each square proportional to the sample size contributing to the OR; the corresponding line represents the 95% confidence interval. c P-value from Wald chi-square test, testing for a difference between the two studies for the specific exposure-outcome pair.

Following adjustment for age, sex and remoteness, additional weighting of the OR for age, sex and remoteness did not change any of the ORs from the 45 and Up Study materially (i.e. no changes were >10%) (Additional files 2, 3). This is because the weighting is determined by the variables used in the logistic regression. Weighting the PHS resulted in some changes to the ORs because not all variables used to determine the weighting (i.e. household size to account for the selection of a person from each selected household and the 8 area health services) were used in the logistic regression (Additional files 4, 5). Weighting did not change the general nature of the observed relationships.

Discussion

Discussions around epidemiological methods often conclude that representativeness is not necessarily required for reliable estimates of relative risk based on internal comparisons within study populations [4]. By their nature, cohort studies tend not to be directly representative of the general population; however, over time, their results have usually been shown to be both reproducible and generalisable to the larger population [6, 7]. Miettinen explains that "an empirical relation is not distorted by any manipulation of the distribution of the study base according to the elements in the occurrence relation - the determinant, the modifiers and/or confounders. For example, the empirical relation of body weight to gender does not depend on the gender distribution in the study base." [5, p. 56]

It is generally accepted that in order to produce results that are generalisable, studies should exhibit sufficient variability in the determinant and modifiers to be studied and a limited range for confounders [4, 5]. Nevertheless the possibility of bias cannot be excluded, and empirical data on how exposure-outcome relationships might vary according to the degree of nonresponse are lacking.

Nonresponse is a form of self-selection. Selection solely by the exposure or outcome variable does not bias the estimates of ORs in logistic regression [14, 30, 31] and selection solely on the basis of covariates in the logistic regression also leads to unbiased ORs. Although evidence from simulation supports the principle of generalisability [12], specific scenarios may result in significant bias if selection criteria and dependent variables are closely related [32]. In particular, biases can occur if selection depends on both the exposure and outcome [14, 17].

We found that although some prevalence estimates varied between the two studies of the same population investigated here, exposure-outcome relationships did not differ materially, where the variables used were highly comparable. This was despite major differences between the studies, including varying response rates, sampling frames and modes of administration; the PHS had a smaller proportion of missing and invalid responses due to the nature of the computer assisted telephone interviewing system and it included respondents who completed the survey in languages other than English. It was not possible to definitively separate the individual effects of sampling frame, response rate and mode of administration, since response rates and aspects of study design are closely linked [33].

We were unable to locate other empirical comparisons of relative risk estimates in independent studies with divergent response rates and different study designs that were drawn from the same target population. Indirect evidence supporting our findings comes from studies that have observed consistent ORs in study respondents and non-respondents using linked data [34] and in initial cohort study participants and participants responding to a subsequent questionnaire [35]. Two studies found only small biases in relative risk estimates due to nonresponse, in cross-sectional ORs from a cohort study relating to cardiovascular disease [31], and in cohort analyses relating to reproductive outcomes [12]. One study found consistent ORs related to smoking in respondents recruited by postal survey and those recruited through postal and telephone surveys and home visits [18].

Having established the lack of any major differences attributable to response rate and study design (including sampling frame and mode of questionnaire administration), the comparison of exposure-outcome relationships containing moderately comparable variables across the two studies can be seen as illustrating the additional effect of the specific questionnaire items used. Our findings demonstrate that an apparently minor difference in the wording of questions can significantly influence measures of prevalence and estimates of risks. This emphasizes the critical importance of maintaining the consistency of survey questions if valid comparisons are to be made and is consistent with previous studies [36-38]. Although most differences attributable to question wording resulted in minor heterogeneity, highly significant heterogeneity was evident for the question on self-rated health status, where the response categories varied across the two studies. However, despite differences between questionnaire items, the observed ORs would lead to similar conclusions regarding the nature of the exposure-outcome relationships.

One shortcoming of our study is the lack of strict gold-standard measures for the study variables. The PHS has a 40% nonresponse rate and may be subject to nonresponse bias. Under ideal circumstances we could use census data; however the Australian Census includes only very limited health data. Additionally, these findings relate to two large studies with considerable variability in the factors included in our analyses. This ensured that there were substantial numbers of participants from each study in each exposure-outcome category, and allowed for adjustment for multiple factors. Although these findings support the principle of generalisability of findings from a relatively select group of participants, it remains possible that they are less applicable to smaller, less heterogeneous studies. These findings relate to cross-sectional analyses; prospective, longitudinal analyses are less prone to the potential biases investigated here, since baseline selection cannot be influenced by outcome status.

Applying weights to survey data to calculate prevalence estimates that account for the differences in probability of selection is standard practice. However, use of sampling weights is less common when calculating relative risks from cohort study data; instead adjustments are usually made to account for potential confounders. The relative risk estimates adjusting for age, sex and remoteness from the 45 and Up Study were not altered materially by further weighting. Hence weighting did not appear to be necessary when the variables used in calculating the weights were used as covariates in the analysis. Weighting is potentially important in the PHS because of the role of household size and area health service in the weighting.

Conclusions

These findings show that broad ranges of exposure-outcome relationships estimated from two studies of the same population remained consistent regardless of the underlying response rate or mode of questionnaire administration. They provide empirical support for the basic epidemiological principle that results based on internal comparisons remain generalisable even when study subjects are drawn from a relatively select group. They emphasize the crucial importance of maintaining the consistency of question wording in order to permit comparisons between studies.