Introduction

When designing a case–control study, an important decision is the choice of the appropriate control group. The general requirement is that the control group should reflect the exposure frequency in the source population of the cases [1, 2]. Still, various practical solutions exist. Control subjects can be selected randomly from the general population, or can be partners, friends or neighbours of the patient. Another source of control subjects is the hospital in which cases are hospitalized. Each choice has advantages and disadvantages. For example, random population control subjects may be more difficult to locate and less motivated to take part in the study than patient-related control subjects such as partners, friends, neighbours or (unaffected) family members. Using patient-related individuals as control subjects carries the risk of overmatching on the study exposure because of joint exposures [3]. Population control subjects potentially have the drawback of recall bias and selective participation; their motivation to recall past events is likely to be different from that of cases [3]. Hospital control subjects are readily accessible, usually cooperative and more likely to have the same recall ability as the cases, but always raise the question of whether the exposure is unrelated to the disease leading to the hospitalization of the control [3].

Situations arise in which the investigator may consider including two or more control groups. On the one hand, the use of multiple control groups could lead to inconsistent results with the different control groups, and proper analysis may become complex [1]. On the other hand, when different types of research questions are addressed and adjustment for different variables is required, multiple control groups might be useful.

In the Multiple Environmental and Genetic Assessment of risk factors for venous thrombosis (MEGA study), a very large population-based case–control study, we initially included partners of patients as control subjects because the main focus of the study was on genetic risk factors for venous thrombosis and their interaction with environmental and lifestyle factors. It seemed unlikely that partners would select each other based on similarities in genetic risk factors for venous thrombosis.

We also expected that asking partners would make it easier to recruit control subjects with malignancies, pregnancy, or chronic diseases, which was necessary if we wanted to study these conditions in relation to the risk of venous thrombosis. However, because partners of patients are usually of the opposite sex, the age-sex distribution of the partner controls showed some peculiarities. In particular, there was only a small group of young men with venous thrombosis, while there was a relatively large group of young women with venous thrombosis (due to pregnancy and oral contraceptive use). The small group of young men yielded an even smaller control group of young female partners, which made women-specific risk factors difficult to analyze because of a relative lack of control subjects. Moreover, not all patients had a partner, so there were fewer available partners than patients, and individuals with a partner may differ from those without a partner [1]. To remedy the case–control imbalance and to boost statistical power, we included an additional population control group that would be useful for certain analyses (such as pregnancy in young women) and would increase the overall numbers for the genetic analyses (in particular for interactions), as no differences in genetic make-up between partner control subjects and population control subjects were expected.

Although the odds ratios for all studied risk factors were in the same direction and of a similar order of magnitude with the two control groups, the point estimates differed somewhat, in particular for lifestyle variables. This raised the question of how best to combine the information from the two control groups. In this paper, we describe how we proceeded.

MEGA study

Patients and partners

Between March 1999 and September 2004, we included consecutive patients with a first diagnosis of venous thrombosis. Patients were selected from the files of six large anticoagulation clinics in the Netherlands, which monitor anticoagulation treatment in all patients in a geographically well-defined area. Patients between the age of 18 and 70 with deep venous thrombosis of the leg, pulmonary embolism or a combination of these diagnoses were included. Patients with severe psychiatric problems or those unable to speak Dutch were considered ineligible for practical reasons.

During the inclusion period, partners of patients were asked to participate as control subjects. Only partner control subjects between the age of 18 and 70 without a history of deep venous thrombosis were included and the same exclusion criteria were applied as for patients.

Random digit dialling control subjects

From January 2002 until September 2004, a second control group was recruited by using the random digit dialling (RDD) method according to Waksberg [4]. RDD control subjects between the age of 18 and 70 with no recent history of deep venous thrombosis were included and the same exclusion criteria were applied as for patients. The RDD method has proved to be a suitable method that yields a control group that can be regarded as approximating a random sample of all individuals in the population [5].

For efficiency reasons, we frequency-matched the RDD control subjects by age and sex to the patients who provided a blood sample. With each telephone call we asked a specific person within a household to participate, depending on our need to fill age- and sex-specific strata (e.g. we asked for the youngest woman between 20 and 50, or the oldest man over age 50); this procedure also ensured that the first person who picked up the phone was not always the one included as control subject.

This procedure of control sampling was expensive and time-consuming; on average only three control subjects per hour were included. The response rate is known to be dependent on demographic characteristics of the target population and telephone skills of the interviewers [5]. In addition the RDD method is only useful if the vast majority of individuals live in households with a fixed (land-line) telephone. In December 2005 fixed (land-line) telephone coverage in the Netherlands was very high (96%) [6], indicating that telephone coverage was sufficient for our RDD method.

Data collection

Within a few weeks after diagnosis and registration at the anticoagulation clinics patients with venous thrombosis received a letter with information about the study and were subsequently contacted by phone. Partners of patients were also invited to participate. If patients or partners refused to participate the reason for refusal was asked for. Patients, partners and RDD control subjects received a standardized questionnaire shortly after inclusion by phone. The questionnaires included items on potential risk factors for venous thrombosis such as body weight, body height and injuries. Most questions referred to a period of 12 months prior to the index date, which was the date of venous thrombosis for patients and the date of completing the questionnaire for partners and RDD control subjects.

From March 1999 until June 2002, patients and their partners were asked to visit the anticoagulation clinic at least 3 months after withdrawal of anticoagulation, where, after an overnight fast, a blood sample was drawn. A blood sample was taken during anticoagulation therapy only when anticoagulants had been used continuously for more than 1 year. From December 1999 onwards, self-administered buccal swabs were obtained by mail when participants were unable or unwilling to provide a blood sample. From June 2002 onwards, blood draws were no longer performed in patients and their partners, and the study was restricted to DNA collection by buccal swabs sent by mail. RDD control subjects were invited for a blood draw within a few weeks after they returned the questionnaire. Within this group, buccal swabs were sent when the blood draw was refused. In the blood samples and buccal swabs, prothrombotic mutations including the factor V Leiden (G1691A) mutation were determined. A detailed description of blood collection and DNA analysis for factor V Leiden in the MEGA study has been published [7].

Different research questions, different use of control subjects

To discuss the analytic considerations that arose from having two different control groups we will describe the association of a general lifestyle risk factor (body mass index), an external risk factor (injuries), an example of a genetic risk factor (factor V Leiden mutation), and an analysis for the interaction between body mass index and the factor V Leiden mutation—all with the risk of venous thrombosis.

Results

Response rates and general characteristics

During the inclusion period, 5,961 eligible patients, 3,586 eligible partners and 4,346 eligible RDD control subjects were approached to participate. Of the patients, 4,957 (83%) were willing to participate; partners had a similar response rate (n = 2,917, 81%), and 3,000 (69%) RDD control subjects participated (Fig. 1). Furthermore, DNA was available for 86.5% of the patients, 87.2% of the partner control subjects and 67.4% of the RDD control subjects. A possible explanation for this difference may be that patients and their partners motivated each other to participate and were able to accompany each other to the location of the blood draw. General characteristics and reasons for non-response are presented in Table 1.

Fig. 1 Response rates of patients, partners and RDD control subjects

Table 1 Demographic characteristics and reasons for non-response in patients, partners and RDD control subjects

Body mass index

When we investigated the BMI distribution in patients, partners and the RDD control subjects, frequencies of overweight (BMI: 25–29 kg/m²) and obesity (BMI: ≥30 kg/m²) differed less between patients and their partners than between patients and the RDD control subjects [8]. This is most likely due to ‘assortative mating’ as well as shared lifestyle over many years in couples, resulting in lower risk estimates with the partner control group than with the RDD control group. As these partners are matched with patients (who are likely to be more obese, as obesity is a risk factor for venous thrombosis), this matching has to be considered in the statistical analysis [1]. In Table 2 the result of the matched analysis (conditional logistic regression analysis) with patient-partner pairs is presented. Risk estimates appeared to be still somewhat lower than in the analysis with the RDD control subjects (overweight, partners: OR 1.45, CI95 1.26–1.67; overweight, RDD: OR 1.83, CI95 1.63–2.05; obesity, partners: OR 1.81, CI95 1.49–2.20; obesity, RDD: OR 2.87, CI95 2.45–3.35).
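To make the matched analysis concrete: for a binary exposure and 1:1 patient-partner pairs, the conditional maximum-likelihood odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates this special case only. The function name and the pair counts are hypothetical and are not taken from the MEGA data, and the actual analysis used conditional logistic regression with adjustment.

```python
import math


def matched_pair_odds_ratio(case_only_exposed, control_only_exposed, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs with a binary exposure.

    case_only_exposed:    pairs in which only the patient is exposed
    control_only_exposed: pairs in which only the partner is exposed
    Concordant pairs carry no information and are therefore not needed.
    Returns (odds ratio, lower CI limit, upper CI limit).
    """
    if case_only_exposed <= 0 or control_only_exposed <= 0:
        raise ValueError("both discordant pair counts must be positive")
    odds_ratio = case_only_exposed / control_only_exposed
    # Standard error of the log odds ratio for discordant-pair data.
    se_log_or = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration (not study data).
or_hat, ci_low, ci_high = matched_pair_odds_ratio(300, 200)
print(f"matched OR {or_hat:.2f} (CI95 {ci_low:.2f}-{ci_high:.2f})")
```

Because only discordant pairs contribute, the effective sample size of a matched analysis can be much smaller than the number of pairs enrolled.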

Table 2 BMI as risk factor for venous thrombosis—Analyses with patients, partners and RDD control subjects

The use of the RDD control subjects in the analyses of BMI as risk factor for venous thrombosis may result in a slight overestimation of the true relative risk because of selective inclusion: there were fewer RDD control subjects with overweight than in the general Dutch population. According to data of the Central Bureau of Statistics in the Netherlands, the prevalence of overweight and obesity during the study period was 36% and 11%, respectively [9], while we found 33% and 11% in the RDD group.

To obtain an overall effect estimate with greater precision, we combined the matched and unmatched analyses by pooling the estimated odds ratios of the two analyses [10]. Because most patients were included in both the matched and the unmatched analysis, the combined analysis accounted for the correlation between the two estimated odds ratios. Table 2 presents the odds ratios of the combined analysis (overweight: OR 1.71, CI95 1.54–1.89; obesity: OR 2.45, CI95 2.14–2.80), which lay between the odds ratios obtained with the partner and the RDD control subjects.
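The idea of pooling two correlated estimates can be sketched with a generic minimum-variance weighted average on the log odds ratio scale. This is an illustration under stated assumptions, not necessarily the procedure of [10]: the standard errors are back-calculated from the reported confidence intervals, and the correlation `ASSUMED_RHO` is a hypothetical value, because the correlation between the matched and unmatched estimates is not reported here.

```python
import math

Z95 = 1.96  # normal quantile assumed to underlie the reported CI95 limits


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Back-calculate the log odds ratio and its standard error from a CI95."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def pool_correlated(est1, se1, est2, se2, rho):
    """Minimum-variance linear combination of two correlated estimates."""
    cov = rho * se1 * se2
    denom = se1**2 + se2**2 - 2 * cov
    if denom <= 0:
        raise ValueError("estimates are too strongly correlated to pool")
    weight1 = (se2**2 - cov) / denom
    pooled = weight1 * est1 + (1 - weight1) * est2
    pooled_se = math.sqrt((se1**2 * se2**2 - cov**2) / denom)
    return pooled, pooled_se


# Overweight estimates as reported in the text (partner and RDD analyses).
b_partner, se_partner = log_or_and_se(1.45, 1.26, 1.67)
b_rdd, se_rdd = log_or_and_se(1.83, 1.63, 2.05)

ASSUMED_RHO = 0.3  # hypothetical correlation, for illustration only

pooled_log_or, pooled_se = pool_correlated(
    b_partner, se_partner, b_rdd, se_rdd, ASSUMED_RHO
)
pooled_or = math.exp(pooled_log_or)
pooled_ci = (
    math.exp(pooled_log_or - Z95 * pooled_se),
    math.exp(pooled_log_or + Z95 * pooled_se),
)
print(f"pooled OR {pooled_or:.2f} (CI95 {pooled_ci[0]:.2f}-{pooled_ci[1]:.2f})")
```

With a positive correlation the more precise estimate receives more weight than under independence, so the pooled value depends on the assumed correlation and should not be expected to reproduce Table 2 exactly.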

The most likely explanation for the difference in risk estimates between partners and RDD control subjects is that the matched analysis will include adjustment for measured as well as unmeasured confounders but also for causal intermediary variables if those are related to couple formation or shared lifestyles. Because of these additional (over)adjustments the odds ratios in the partner analyses could be closer to 1 than with the RDD control subjects. However, the analysis of injuries as risk factor for venous thrombosis proved that this is not always the case.

Injuries

We also studied the effect of minor injuries, such as contusions and ankle sprains, on the risk of venous thrombosis [11]. Minor injuries can be caused by occasional events such as traffic accidents, but are also partly related to lifestyle; sports injuries, for instance, will occur more often in individuals with active lifestyles. Percentages of injuries in the weeks before the index date are presented in Fig. 2. Overall, 11.7% of patients had suffered a minor injury in the 3 months prior to the venous thrombosis, compared with 3.6% of partners and 4.8% of RDD control subjects. The odds ratios were 4.2 (CI95 2.9–6.0) with partner control subjects and 2.8 (CI95 2.3–3.6) with RDD control subjects after adjustment for various confounders, even after including sports activities, resulting in a combined estimate of 3.5 (CI95 2.8–4.3). These risk estimates indicate that adjustment for confounders and intermediates could also result in a higher risk estimate in the matched analysis than in the unmatched analysis. It is possible that RDD control subjects spend more time outdoors than partner control subjects and therefore sustain more injuries: patients might have been more sedentary (for example, patients with DVT are generally more obese), and their partners might therefore also have been more sedentary, either through assortative mating or through a shared development of habits. Another possibility is that patients influenced their partners while the questionnaire was being completed. As the patient has a serious medical problem, namely thrombosis, the partner might not want to complain about his or her relatively minor injury, resulting in a lower rate of (reported) minor injuries and therefore a higher risk estimate. Finally, active individuals may be more likely to participate as controls than others.
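As a rough check on the direction of these estimates, crude odds ratios can be computed directly from the reported injury percentages. This is only an illustrative sketch: the function names are assumptions, and because the calculation is unmatched and unadjusted it is not expected to reproduce the adjusted estimates reported above.

```python
def odds(proportion):
    """Convert a proportion (strictly between 0 and 1) to odds."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return proportion / (1 - proportion)


def crude_odds_ratio(p_cases, p_controls):
    """Unadjusted, unmatched odds ratio from two exposure proportions."""
    return odds(p_cases) / odds(p_controls)


# Proportions with a minor injury, as reported in the text.
P_PATIENTS = 0.117
P_PARTNERS = 0.036
P_RDD = 0.048

crude_or_partners = crude_odds_ratio(P_PATIENTS, P_PARTNERS)
crude_or_rdd = crude_odds_ratio(P_PATIENTS, P_RDD)
print(f"crude OR vs partners: {crude_or_partners:.2f}")
print(f"crude OR vs RDD:      {crude_or_rdd:.2f}")
```

Because fewer partners than RDD control subjects reported an injury, the crude odds ratio is larger with partner controls, which is the same ordering as in the adjusted estimates.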

Fig. 2 Percentage of injuries per week before the index date, which was the diagnosis of venous thrombosis (in patients) or completion of the questionnaire (in control subjects). Adapted from [11]

Factor V Leiden

For genetic risk factors it seemed a priori unlikely that their frequency would differ between partners and RDD control subjects. However, the prevalence of factor V Leiden is related to ethnicity [12], so one might speculate that if people chose their partner according to ethnicity, the factor V Leiden distribution in partners might differ from that in RDD control subjects. In the MEGA study most participants were of Dutch origin, so differences between RDD control subjects and partner controls in the distribution of factor V Leiden due to intra-racial partnerships were unlikely. For the RDD control subjects one might hypothesize that those with a positive family history of venous thrombosis would be more willing to give blood than those without, leading to an overestimation of the prevalence of factor V Leiden in this group. This was found not to be true, as rates of positive family history were similar in the two control groups and we found the same percentage of individuals with factor V Leiden in the partner and the RDD group. Obviously, both percentages could be an overestimation of the true prevalence, but the percentages were equal to the previously recorded prevalence of factor V Leiden in Caucasians [13].

Since both control groups had the same percentage of factor V Leiden carriers and this percentage was supported by literature, both control groups were combined as if they were a single group in an unconditional logistic regression analysis (Table 3).

Table 3 Factor V Leiden mutation (FVL) as risk factor for venous thrombosis—Analysis with patients, partners and RDD control subjects

BMI and factor V Leiden

The joint effect of overweight or obesity and the factor V Leiden mutation [8] is presented as an example of the analysis of gene-environment interaction. Since the analyses of BMI required a combination of the matched analyses when using partner control subjects and unconditional logistic regression when using the RDD control subjects [10], this approach was also used when analyzing the combined effect of BMI and the factor V Leiden mutation (Table 4). A disadvantage of using matched analyses when studying interaction is that only discordant patient-partner pairs can be included in the analyses, resulting in small numbers for those groups in which both exposures are present (e.g. obese and factor V Leiden), as can be seen in Table 4. When there are only a limited number of control subjects, it is also possible to check for interaction using a case-only analysis [14], which results in a multiplicative synergy index [1]. This calculation of a multiplicative synergy index [(1,077 × 124)/(217 × 643) = 0.96] suggested interaction at the multiplicative level and a tenfold (0.96 × 2.48 × 4.18 = 10.0) increased risk for those being obese and having the factor V Leiden mutation compared with those being lean without factor V Leiden. This estimation of the risk corresponds well with the 7.9-fold increased risk that was found when combining the conditional and unconditional regression analyses.
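The case-only arithmetic can be reproduced in a few lines. In this sketch the function and variable names are assumptions, as is the assignment of the counts 217 and 643 to the two singly exposed groups (the index is the same either way). A case-only analysis also assumes that the genetic factor and the exposure are independent in the source population.

```python
def case_only_synergy_index(neither, first_only, second_only, both):
    """Multiplicative synergy index from a 2x2 table of cases only.

    The index is the cross-product ratio of the case counts: doubly exposed
    times unexposed, divided by the product of the two singly exposed counts.
    """
    if first_only <= 0 or second_only <= 0:
        raise ValueError("singly exposed counts must be positive")
    return (neither * both) / (first_only * second_only)


# Case counts as given in the text; which of 217 and 643 belongs to which
# single exposure is an assumption and does not change the result.
synergy_index = case_only_synergy_index(
    neither=1077, first_only=217, second_only=643, both=124
)

# Single-factor odds ratios quoted in the text.
OR_SINGLE_A = 2.48
OR_SINGLE_B = 4.18

# Joint odds ratio implied by the index and the single-factor odds ratios.
joint_or = synergy_index * OR_SINGLE_A * OR_SINGLE_B
# The text multiplies with the index rounded to two decimals.
joint_or_rounded_index = round(synergy_index, 2) * OR_SINGLE_A * OR_SINGLE_B

print(f"synergy index: {synergy_index:.3f}")
print(f"joint OR (unrounded index): {joint_or:.2f}")
print(f"joint OR (index rounded to 0.96): {joint_or_rounded_index:.2f}")
```

The unrounded index gives a joint odds ratio slightly below the value obtained with the rounded index, so the "tenfold" figure in the text reflects rounding of the index before multiplication.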

Table 4 Combined effect of body mass index and the factor V Leiden (FVL) mutation on the risk of venous thrombosis

Discussion

In the MEGA study, a large population-based case–control study, we collected two different control groups, a partner control group and an RDD control group. Although for each risk factor, the odds ratios were always in the same direction and of the same order of magnitude with both control groups, they were nevertheless different. Moreover, when comparing several risk factors, it was not always the same control group that produced the lower or higher estimates, and a matched analysis did not completely annul the differences.

We presented examples of analyses of different types of research questions in which we had to make decisions during the analyses about how the results of the two control groups could be combined. For the evaluation of body mass index and the risk of minor injuries we used a method for statistically combining the control groups in the analysis, because for the partner control group a matched analysis was required, which was neither necessary nor feasible in the RDD group. For body mass index the RDD control group produced somewhat higher risk estimates than partner control subjects. However, this was not the case for all analyses: for the analysis of the risk of minor injuries the inverse was found, as RDD control subjects suffered more injuries than partner control subjects. Finally, frequencies of the factor V Leiden mutation, a genetic risk factor, were identical in both control groups and independent of lifestyles, indicating that for the analyses of this genetic risk factor we could simply combine both control groups. When studying the interaction between body mass index and the factor V Leiden mutation, the same statistical approach was used as in the body mass index analyses.

An important aim of the MEGA study was to assess the risk of venous thrombosis associated with combinations of risk factors. When studying the interaction between a risk factor specific to women and a lifestyle risk factor (e.g. the joint effect of oral contraceptive use and BMI), it is not straightforward to use a matched design with partner controls. In this case, one might intuitively believe that it is not possible to use the matched case–control design, because only women use oral contraceptives and most control subjects are of the opposite sex to their matched patients. However, after a publication from the MEGA study it was suggested that one could think of being male as just a reason for a person to be unexposed [15]. This should not lead to exclusion of men from analyses of oral contraceptive use (in the same way as it should not lead to exclusion of women who are opposed to oral contraceptives for religious or health reasons). When this idea was tried out on the MEGA data, an analysis of oral contraceptive use and travel with opposite-sex controls proved not only possible but gave more reliable results [16].

Only a small number of studies have reported their experience with multiple control groups. In 1983, Stavraky and Clarke wrote a paper that summarized their experience in using hospital and neighbourhood control subjects [17]. When testing the hypothesis that oxidative hair dyes were carcinogenic, they found lower rates of hair dye use among 314 hospital control subjects (40.5%) than among 470 neighbourhood control subjects (52.8%). Several other differences were observed. Compared with hospital control subjects, neighbourhood control subjects were older, ethnically more heterogeneous, less likely to be oral contraceptive users and more likely to be smokers. The investigators believed that most of these differences arose from different lifestyles in the relatively rural region from which the hospital control subjects were derived and in the urban region that provided the neighbourhood group. A study investigating the association between machining fluid and laryngeal cancer risk included control subjects with oral cancer as well as a stratified random sample of all deaths in a distinct geographical area as control subjects [18]. When cases (n = 888) were compared with oral cancer control subjects (n = 752), high exposure to machining fluids resulted in a 1.5-fold increased risk of laryngeal cancer. However, when cases were compared with population control subjects (n = 3,594), no increased risk was found. An explanation, besides a chance finding, may be that data quality on exposure for the cases and oral cancer control subjects differed from that of the population control subjects. These studies illustrate, paradoxically, that if only one control group had been included, unrecognized bias might have influenced the results.

Besides risk estimates, response rates may also vary between control groups. In the MEGA study partner control subjects were more willing to participate than RDD control subjects. An explanation for this difference may be that patients motivated their partners to participate. Another explanation may be that partners of non-participating patients were not counted in the non-response: if a patient refused to participate, we did not ask the patient's partner to participate. Thus a selection was made of persons who were more willing to participate.

Selection bias could have occurred if RDD control subjects were more willing to participate when they had a family member with thrombosis. However, there was no difference between the control groups with respect to a positive family history, nor between the prevalences of the factor V Leiden mutation, suggesting limited selection bias in this respect. At the start of the study we had also assumed that pregnant partners or partners with a severe disease such as cancer would be easier to recruit, because of a higher motivation, than RDD control subjects with similar characteristics. However, this was found not to be true, as RDD control subjects with a partner had even slightly higher pregnancy rates than partner controls [19], and both control groups had equal rates of cancer.

We are aware that most teaching in epidemiology emphasises the choice of one control group. We deviated from this because we believed we had good reasons to do so. Rosenbaum has suggested that it might be wise to include two control groups which differ with respect to a covariate that, though unmeasured, is known to differ substantially between the two groups [20]. Similar outcomes in two such control groups then provide evidence that imbalances in the unmeasured covariate are not responsible for treatment-vs-control differences in outcomes [20]. This was the case in our study, as all odds ratios when using the partner or the RDD control group were in the same direction. Still, some differences in the behaviour of several variables between the control groups were surprising. This gave us the opportunity to study the effects of the choice for particular control subjects: different choices for different control groups may have consequences that cannot always be anticipated beforehand. Both control groups had very similar prevalences of the FVL mutation, and were therefore equally suitable. When studying environmental or lifestyle risk factors, however, no control group gave results that differed in a predictable and systematic way from the other, and a matched analysis did not solve the problem. In the end both control groups had their own contribution.