1 Introduction

In a recent paper in this journal, Nicolaou et al. (2011) report a significant association between a common genetic variant (a single nucleotide polymorphism, or SNP) in the dopamine receptor D3 (DRD3) gene and the tendency to be an entrepreneur, in a group of 1,335 British subjects. In this candidate gene study, polymorphisms in a set of nine genes were tested for an association with the tendency to be an entrepreneur, resulting in a single significant association. The set of candidate genes consisted of five dopamine receptor genes associated with novelty or sensation seeking and four genes associated with attention deficit hyperactivity disorder (ADHD). These specific genes were selected based upon the notions that ADHD and sensation seeking are more common among entrepreneurs. The authors claim that this is the first evidence of an association between variants in a specific gene and entrepreneurship.

We tried to replicate their findings by performing an association analysis of the 18 SNPs reported in Nicolaou et al. (2011), including the significant association between a SNP in the DRD3 gene and entrepreneurship, in three much larger, independent groups of Dutch subjects from the Rotterdam Study (Hofman et al. 1991, 2009). However, we failed to replicate their finding, and, therefore, we postulate that the reported association is a false positive, probably arising from several shortcomings in the study by Nicolaou et al. (2011). We discuss these shortcomings and provide suggestions for future research.

2 Replication study

2.1 Data

Our replication study uses data from The Rotterdam Study (Hofman et al. 1991, 2010), a large population-based prospective cohort study of elderly Caucasians ongoing since 1990 in the city of Rotterdam in the Netherlands. The study started with a pilot phase in the second half of 1989. From January 1990 to September 1993, 7,983 participants were successfully recruited in the well-defined Ommoord district in Rotterdam. This formed the initial cohort called Rotterdam Study I (RS-I). The participants were all 55 years of age or over when entering the study. From February 2000 to December 2001, an additional 3,011 participants older than 55 were gathered within a second cohort and interviewed: Rotterdam Study II (RS-II). From February 2006 to December 2008, a third cohort was gathered, Rotterdam Study III (RS-III), consisting of 3,932 individuals of 45 years and older.

In RS-I, 5,974 participants have been successfully genotyped, as have 2,129 in RS-II and 2,030 in RS-III. Genotyping was performed using the Illumina 550K and 610K arrays. As the type of array differs between the candidate gene study and our replication study, not all 18 reported SNPs were readily available in the Rotterdam Study cohorts. Therefore, we imputed these SNPs from the available genotype data using MACH (Li et al. 2006, 2009).

We construct a binary variable indicating whether a subject had (1) never been self-employed or (2) been self-employed at least once during his/her complete working life (RS-I) or in his/her current or last occupation (RS-II and RS-III). For RS-I, individuals with an incomplete working life history and individuals who had never had a job are excluded from our study, except those who are classified as self-employed at least once. The rationale for this is that incomplete working life histories could “contaminate” the control group with people who were self-employed at least once. Complete SNP and self-employment data are available for 5,374 subjects (531 cases, 4,843 controls) in RS-I, 2,066 subjects (197 cases, 1,869 controls) in RS-II, and 1,925 subjects (209 cases, 1,716 controls) in RS-III. In this way, our measure of entrepreneurship is equivalent to the definition used by Nicolaou et al. (2011), i.e., “have you ever started a business in your working life.” This equivalence is confirmed by a correlation coefficient of 0.87 between the two constructs of self-employment and starting a new business (Nicolaou et al. 2008).
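The coding rule described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the argument names, and the 0/1 coding (instead of the labels 1 and 2 used in the text) are assumptions made for the example. For RS-II and RS-III, where only the current or last occupation is used, the two history arguments would simply keep their default values.

```python
from typing import Optional


def code_self_employment(ever_self_employed: bool,
                         complete_history: bool = True,
                         ever_had_job: bool = True) -> Optional[int]:
    """Return 1 for a case, 0 for a control, None if the subject is excluded."""
    if ever_self_employed:
        # Cases are kept even when the working life history is incomplete.
        return 1
    if not complete_history or not ever_had_job:
        # An incomplete history could hide a spell of self-employment,
        # so the subject cannot safely be used as a control.
        return None
    return 0


# Hypothetical subjects: (ever_self_employed, complete_history, ever_had_job)
subjects = [
    (True, False, True),
    (False, True, True),
    (False, False, True),
    (False, True, False),
]
print([code_self_employment(*s) for s in subjects])  # [1, 0, None, None]
```

Excluding uncertain non-cases rather than coding them as controls trades sample size for a cleaner control group, which is the rationale given in the text.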

2.2 Methods

Association analysis is performed for each SNP by logistic regression using the program mach2dat (Li et al. 2006, 2009), which is accessed through a web-based interface called GRIMP (Estrada et al. 2009). For each SNP, two models are estimated: model 1 including the SNP as an independent variable, and model 2 controlling for sex and possible population stratification by including the first four principal components of the genotypic covariance–variance matrix. For RS-III, a dummy for age (≥50) is included in the latter model.
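To make the two model specifications concrete, the following sketch fits a logistic regression by Newton-Raphson in plain Python. It is not mach2dat or GRIMP; the function names and the counts are hypothetical, and the principal components of model 2 are omitted for brevity, so only sex is added as a control.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logit(rows, y, n_iter=25):
    """Maximum likelihood logit coefficients; an intercept is prepended."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Hypothetical counts: (allele dosage, sex, cases, controls)
counts = [(0, 0, 30, 370), (1, 0, 28, 272), (2, 0, 10, 90),
          (0, 1, 50, 350), (1, 1, 48, 252), (2, 1, 18, 82)]
dosage, sex, y = [], [], []
for d, s, n_cases, n_controls in counts:
    dosage += [d] * (n_cases + n_controls)
    sex += [s] * (n_cases + n_controls)
    y += [1] * n_cases + [0] * n_controls

model1 = fit_logit([[d] for d in dosage], y)                 # SNP only
model2 = fit_logit([[d, s] for d, s in zip(dosage, sex)], y)  # SNP + sex
print("model 1 SNP odds ratio:", math.exp(model1[1]))
print("model 2 SNP odds ratio:", math.exp(model2[1]))
```

The sign of the SNP coefficient depends on which allele is counted in the dosage, which matters when comparing effect directions across studies.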

To adjust for multiple testing, a Bonferroni correction is applied, resulting in a per-test significance level of 0.0028 (0.05/18 tests), which corresponds to a family-wise significance level of 0.05 across all tests. However, we will argue below that this significance level is arbitrary. Several other choices of significance levels could also be justified, although this does not change our conclusions.
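The arithmetic behind the correction can be checked with a short sketch. The function names are illustrative, and the family-wise error rate formula assumes that the 18 tests are independent, which is a simplification because SNPs within a gene are correlated.

```python
def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(per_test_level: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent null tests."""
    return 1.0 - (1.0 - per_test_level) ** n_tests


level = bonferroni_level(0.05, 18)
print(round(level, 4))                           # 0.0028
print(familywise_error_rate(0.05, 18) > 0.5)     # True: uncorrected testing
print(familywise_error_rate(level, 18) <= 0.05)  # True: corrected testing
```

Without a correction, testing 18 independent null SNPs at 0.05 each would yield at least one false positive more often than not.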

2.3 Results

Tables 1, 2, and 3 show the association results for RS-I, RS-II, and RS-III, respectively, between the 18 reported SNPs and “at least once self-employment.” In RS-II and RS-III, none of the SNPs are even remotely significant in either model, while the estimation results for RS-I require more explanation.

Table 1 Association results using two logit models of at least once self-employment for RS-I
Table 2 Association results using two logit models of at least once self-employment for RS-II
Table 3 Association results using two logit models of at least once self-employment for RS-III

Nicolaou et al. (2011) report a significant association between SNP rs1486011 and the tendency to be an entrepreneur. This SNP is not significantly associated in RS-I at the chosen level of significance of 0.0028. Moreover, the negative coefficient suggests the opposite; carrying the C allele seems not to decrease the probability of being self-employed at least once, as reported by Nicolaou et al. (2011), but to increase the odds.

Further inspection of the results indicates that three SNPs within the DRD3 gene, rs1486008, rs16822416, and rs1486009, survive our Bonferroni-corrected significance level of 0.0028. However, the direction of the effects is opposite to the associations reported in the candidate gene study. Although we cannot reject the hypothesis that the DRD3 gene is associated with entrepreneurship based on these results, they do not support the effect of the G allele of SNP rs1486011 reported by Nicolaou et al. (2011).

3 Discussion

We performed an association analysis of 18 SNPs in the DRD2, DRD3, and SLC6A3 genes in three independent groups of Dutch subjects. The set of analyzed SNPs includes a SNP previously reported to be significantly associated with entrepreneurship by Nicolaou et al. (2011). Our study fails to replicate this association and, in fact, finds several other significant associations with opposite effects to those reported by Nicolaou et al. (2011).

There are several shortcomings of the candidate gene study that lead us to suspect that the reported association is a false positive and that our results should also be interpreted with care. These shortcomings are lessons learned from the era of candidate gene studies, usually pursued with ill-defined markers across genes, small samples, and/or lacking replication. Indeed, there are numerous examples of small-scale candidate gene studies that report significant associations with behavioral traits that could not be replicated. For instance, Israel et al. (2009) report an association between a variant of the OXTR gene and behavior in the dictator game. Apicella et al. (2010) fail to replicate this association. Other studies report an association between a genetic variant in the serotonin transporter gene and anxiety-related traits such as harm avoidance (Lesch et al. 1996; Vormfelde et al. 2006) that others fail to replicate (Becker et al. 2007; Lang et al. 2004). Hence, the decisive proof of a true association is replication in an independent study, a feature that the study of Nicolaou et al. (2011) lacks. Lastly, Ioannidis (2005) shows that the pre-study probability of a genetic association being true is generally extremely low, and consequently, the post-study probability is also low.
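The relation between pre-study and post-study probability can be illustrated with the bias-free formula of Ioannidis (2005), PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, 1 − β the power, and α the significance level. The sketch below is only an illustration: the function name is ours, and the values of R and the power of 0.8 are assumed for the example rather than estimated from any study.

```python
def post_study_probability(prior_odds: float, power: float, alpha: float) -> float:
    """Probability that a significant finding is true, ignoring bias.

    prior_odds is R, the pre-study odds of true to null relationships.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Assumed pre-study odds, from an even bet down to 1 in 10,000.
for prior_odds in (1.0, 0.01, 0.0001):
    ppv = post_study_probability(prior_odds, power=0.8, alpha=0.05)
    print(f"R = {prior_odds:g}: post-study probability = {ppv:.4f}")
```

When R is small, the false positives generated at rate α dominate the few true positives, so even a well-powered significant result is more likely false than true.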

With regard to the candidate gene study, first, we believe that the selection of candidates by Nicolaou et al. (2011), although seemingly sound, is largely arbitrary. The set comprises genes previously thought to be associated with novelty or sensation seeking and ADHD, characteristics that are hypothesized to be more common among entrepreneurs. Following this line of thought, there are many other candidate genes, such as the serotonin 2A and 1B receptors (HTR2A and HTR2B), dopamine and serotonin transporters (SLC6A3, SLC6A4), dopamine beta-hydroxylase (DBH), monoamine oxidase B (MAOB), and genes associated with testosterone level. Furthermore, probably more than half of all genes are related to brain function or to the expression of proteins in the brain (Sandberg et al. 2000) and could therefore be candidates. This leads to hundreds of thousands of potential candidate loci and makes the candidate gene approach infeasible for the study of complex behaviors such as entrepreneurship.

Second, the selection of SNPs within the chosen candidate genes is confined to the coding regions. Nicolaou et al. (2011) report that the SNPs from the coding regions of the nine candidate genes were selected, but a complete overview of the selected SNPs is lacking. SNPs in regulatory non-coding regions are not considered, although these could have substantial effects on a given phenotype (for an overview, see http://www.genome.gov/gwastudies).

Third, the hypothesis that dopamine receptor genes are associated with novelty or sensation seeking is itself based on mixed evidence from small-scale studies that could not always be replicated. For example, Ebstein et al. (1996) report a significant association between a variant of the DRD4 gene and novelty seeking, which could not be replicated by Malhotra et al. (1996). A recent meta-analysis by Munafo et al. (2008) concludes that the DRD4 gene may be associated with measures of novelty seeking and impulsivity, but significant evidence of publication bias was found. Finally, Verweij et al. (2010) report that the DRD4 gene is not significantly associated with the novelty seeking dimension of Cloninger’s temperament scales, although the study had 91.5% power to detect SNPs that explain 1% of the variance.

Obviously, the choice of candidate genes is limited by knowledge of the biological function of genes and their possible relationship with entrepreneurship. Recent technological advancements have enabled so-called genome-wide association studies (GWASs), which are considered hypothesis-free as no prior knowledge about gene function is needed. Instead of hypothesizing relationships between genes and a trait a priori, a GWAS systematically interrogates the entire genome for associations between genetic variants (SNPs) and a trait. In current GWASs, millions of SNPs are statistically tested for association, leading to a severe multiple testing problem. Therefore, it is conventional wisdom to apply a very stringent significance level of p < 5 × 10⁻⁸ (McCarthy et al. 2008) to each tested SNP to control the false positive rate. Despite this, GWASs have been remarkably successful in uncovering associations between common genetic variation and human traits and diseases (Hindorff et al. 2009) and are gaining interest in the social sciences (Koellinger et al. 2010; van der Loos et al. 2010).

Given that GWASs are currently the way forward in genetics research and that genome-wide data are available in the dataset of Nicolaou et al. (2011; see also http://boss.blogs.nytimes.com/2009/09/21/literally-born-entrepreneurs/), a comprehensive, hypothesis-free GWAS of entrepreneurship is an attractive alternative to the hypothesis-based candidate gene study. Obviously, the reported association would not have reached the accepted genome-wide significance level of p < 5 × 10⁻⁸. Associations often turn out to be false positives when a set of candidate genes is selected while not all relevant genes and SNPs are considered (e.g., Apicella et al. 2010; Becker et al. 2007; Israel et al. 2009; Lang et al. 2004; Lesch et al. 1996; Vormfelde et al. 2006).

4 Conclusion

We tried to replicate the significant association between a variant in the DRD3 gene and entrepreneurship reported by Nicolaou et al. (2011), using three much larger, independent groups of Dutch subjects from the Rotterdam Study, and failed to do so. In fact, we find that the reported association has an opposite, insignificant effect in our study. Moreover, we find several other associations with opposite effects among the SNPs reported by Nicolaou et al. (2011). As explained above, it is difficult to choose a level of significance. All associations would be rendered insignificant using the level of significance commonly used in the GWAS approach (p < 5 × 10⁻⁸), which is the superior method, in our view.

As another extreme, we can argue that not all 18 SNPs in our analysis are independent, but are correlated, i.e., they are in linkage disequilibrium. Consequently, the number of independent statistical tests would be less than 18, and a higher significance level could have been used. Assuming, for simplicity, that SNPs within a gene are highly correlated, we would effectively perform three independent statistical tests (one each for the DRD2, DRD3, and SLC6A3 genes), resulting in a Bonferroni-adjusted significance level of 0.0167 (0.05/3). Adopting this significance level, SNPs rs1486011 and rs3732783 would become significantly associated with entrepreneurship next to the three other SNPs reported above, but again with opposite effects to those reported by Nicolaou et al. (2011). Thus, relaxing or tightening the significance level does not change our conclusion; we fail to replicate the results of the candidate gene study, and we emphasize that a hypothesis-free GWAS in an adequately powered setting is the preferred approach.
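The sensitivity of a verdict to the chosen significance level can be sketched by comparing one p-value against the three thresholds discussed in this paper. The labels, the function name, and the p-value of 0.01 are hypothetical and do not correspond to any SNP in Tables 1, 2, or 3.

```python
# Significance levels discussed in the text.
THRESHOLDS = {
    "18 SNP-level tests": 0.05 / 18,  # about 0.0028
    "3 gene-level tests": 0.05 / 3,   # about 0.0167
    "genome-wide": 5e-8,
}


def significant_under(p_value: float) -> dict:
    """Map each threshold label to whether p_value falls below it."""
    return {name: p_value < level for name, level in THRESHOLDS.items()}


# Hypothetical p-value that lies between the two Bonferroni levels.
print(significant_under(0.01))
# {'18 SNP-level tests': False, '3 gene-level tests': True, 'genome-wide': False}
```

A result that changes status across defensible thresholds is weak evidence in either direction, which is why the direction of the effect and independent replication carry more weight in our argument than any single cut-off.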