Small Business Economics, Volume 37, Issue 3, pp 269–275

Candidate gene studies and the quest for the entrepreneurial gene

  • Matthijs J. H. M. van der Loos
  • Philipp D. Koellinger
  • Patrick J. F. Groenen
  • Cornelius A. Rietveld
  • Fernando Rivadeneira
  • Frank J. A. van Rooij
  • André G. Uitterlinden
  • Albert Hofman
  • A. Roy Thurik
Open Access Article

DOI: 10.1007/s11187-011-9339-2

Cite this article as:
van der Loos, M.J.H.M., Koellinger, P.D., Groenen, P.J.F. et al. Small Bus Econ (2011) 37: 269. doi:10.1007/s11187-011-9339-2

Abstract

Candidate gene studies of human behavior are gaining interest in economics and entrepreneurship research. Performing and interpreting these studies is not straightforward because the selection of candidates influences the interpretation of the results. As an example, Nicolaou et al. (Small Bus Econ 36:151–155, 2011) report a significant association between a common genetic variant in the DRD3 gene and the tendency to be an entrepreneur. We fail to replicate this finding using a much larger, independent dataset. In addition, we discuss the candidate gene approach and give suggestions to avoid the publication of false positives.

Keywords

Genetics, Entrepreneurship, Genome-wide association study, Candidate gene study

JEL Classifications

L26, B40

1 Introduction

In a recent paper in this journal, Nicolaou et al. (2011) report a significant association between a common genetic variant (a single nucleotide polymorphism, or SNP) in the dopamine receptor D3 (DRD3) gene and the tendency to be an entrepreneur, in a group of 1,335 British subjects. In this candidate gene study, polymorphisms in a set of nine genes were tested for an association with the tendency to be an entrepreneur, resulting in a single significant association. The set of candidate genes consisted of five dopamine receptor genes associated with novelty or sensation seeking and four genes associated with attention deficit hyperactivity disorder (ADHD). These specific genes were selected based upon the notions that ADHD and sensation seeking are more common among entrepreneurs. The authors claim that this is the first evidence of an association between variants in a specific gene and entrepreneurship.

We tried to replicate their findings by performing an association analysis of the 18 SNPs reported in Nicolaou et al. (2011), including the significant association between a SNP in the DRD3 gene and entrepreneurship, in three much larger, independent groups of Dutch subjects from the Rotterdam Study (Hofman et al. 1991, 2009). However, we failed to replicate their finding, and, therefore, we postulate that the reported association is a false positive, probably arising from several shortcomings in the study by Nicolaou et al. (2011). We discuss these shortcomings and provide suggestions for future research.

2 Replication study

2.1 Data

Our replication study uses data from the Rotterdam Study (Hofman et al. 1991, 2009), a large population-based prospective cohort study of elderly Caucasians ongoing since 1990 in the city of Rotterdam in the Netherlands. The study started with a pilot phase in the second half of 1989. From January 1990 to September 1993, 7,983 participants were recruited in the well-defined Ommoord district in Rotterdam. This formed the initial cohort, called Rotterdam Study I (RS-I). The participants were all 55 years of age or over when entering the study. From February 2000 to December 2001, an additional 3,011 participants older than 55 were recruited and interviewed as a second cohort, Rotterdam Study II (RS-II). From February 2006 to December 2008, a third cohort was recruited, Rotterdam Study III (RS-III), consisting of 3,932 individuals aged 45 years and older.

In RS-I, 5,974 participants have been successfully genotyped, compared with 2,129 in RS-II and 2,030 in RS-III. Genotyping was performed using Illumina 550K and 610K arrays. Because the type of array differs between the candidate gene study and our replication study, not all 18 reported SNPs were directly available in the Rotterdam Study cohorts. Therefore, we imputed these SNPs from the available genotype data using MACH (Li et al. 2006, 2009).

We construct a binary variable indicating whether a subject had (1) never been self-employed or (2) been self-employed at least once during his/her complete working life (RS-I) or in his/her current or last occupation (RS-II and RS-III). For RS-I, individuals with an incomplete working life history and individuals who had never had a job are excluded from our study, except those who are classified as self-employed at least once. The rationale for this is that incomplete working life histories could “contaminate” the control group with people who were self-employed at least once. Complete SNP and self-employment data are available for 5,374 subjects (531 cases, 4,843 controls) in RS-I, 2,066 subjects (197 cases, 1,869 controls) in RS-II, and 1,925 subjects (209 cases, 1,716 controls) in RS-III. In this way, our measure of entrepreneurship is equivalent to the definition used by Nicolaou et al. (2011), i.e., “have you ever started a business in your working life.” This equivalence is confirmed by a correlation coefficient of 0.87 between the two constructs of self-employment and starting a new business (Nicolaou et al. 2008).

2.2 Methods

Association analysis is performed for each SNP by logistic regression using the program mach2dat (Li et al. 2006, 2009), accessed through a web-based interface called GRIMP (Estrada et al. 2009). For each SNP, two models are estimated: model 1 includes the SNP as an independent variable, and model 2 additionally controls for sex and possible population stratification by including the first four principal components of the genotypic variance–covariance matrix. For RS-III, a dummy for age (≥50) is included in the latter model.
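The two model specifications can be illustrated with a minimal sketch. It is not the mach2dat implementation: it fits a plain logistic regression by Newton-Raphson on simulated data, and the sample size, allele frequency, effect sizes, and the names `solve` and `fit_logit` are assumptions made for illustration only. Real imputed dosages may be fractional, whereas the simulated ones here are 0, 1, or 2.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(X, y, iterations=25):
    """Fit a logistic regression by Newton-Raphson.

    Each row of X must start with 1.0 (the intercept).
    Returns (coefficients, standard errors, two-sided Wald p values).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors from the inverse of the last information matrix.
    se = [math.sqrt(solve(info, [float(i == a) for i in range(k)])[a])
          for a in range(k)]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


# Simulated data (assumption): 1,000 subjects, allele frequency 0.2,
# roughly 10% cases, and no true SNP effect.
random.seed(0)
rows_model1, rows_model2, outcome = [], [], []
for _ in range(1000):
    dosage = float((random.random() < 0.2) + (random.random() < 0.2))
    sex = float(random.random() < 0.5)
    pcs = [random.gauss(0.0, 1.0) for _ in range(4)]  # stand-ins for four PCs
    p_case = 1.0 / (1.0 + math.exp(2.2 - 0.3 * sex))
    outcome.append(1.0 if random.random() < p_case else 0.0)
    rows_model1.append([1.0, dosage])               # model 1: SNP only
    rows_model2.append([1.0, dosage, sex] + pcs)    # model 2: SNP + sex + PCs

b1, se1, p1 = fit_logit(rows_model1, outcome)
b2, se2, p2 = fit_logit(rows_model2, outcome)
print(f"model 1: beta={b1[1]:.3f}, p={p1[1]:.4f}")
print(f"model 2: beta={b2[1]:.3f}, p={p2[1]:.4f}")
```

The coefficient on the dosage column plays the role of the Beta reported per SNP, and the Wald p value the role of the reported p value. A dedicated statistics package would normally be used instead of this hand-written fit.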

To adjust for multiple testing, a Bonferroni correction (see footnote 1) is applied, resulting in a test-wise significance level of 0.0028 (0.05/18 tests), which corresponds to a family-wise significance level of 0.05 across all tests. However, we will argue below that this significance level is arbitrary. Several other choices of significance levels could also be justified, although this does not change our conclusions.
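The arithmetic behind this correction can be sketched in a few lines. The function names are hypothetical, and the family-wise error rate formula assumes independent tests, which is a simplification for SNPs located in the same gene.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Test-wise significance level for a desired family-wise level."""
    return family_alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of false rejections if every null hypothesis is true."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """P(at least one false rejection), assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


threshold = bonferroni_threshold(0.05, 18)
print(round(threshold, 4))  # 0.0028

# Without correction, 18 independent tests at 0.05 give a family-wise
# error rate of roughly 0.60; with the corrected threshold it stays below 0.05.
print(round(familywise_error_rate(0.05, 18), 2))
print(familywise_error_rate(threshold, 18) <= 0.05)
```

The same helper reproduces the example from the footnote: 100 tests at a 5% level give about 5 expected false rejections.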

2.3 Results

Tables 1, 2, and 3 show the association results for RS-I, RS-II, and RS-III, respectively, between the 18 reported SNPs and “at least once self-employment.” In RS-II and RS-III, none of the SNPs is even remotely significant in either model, while the estimation results for RS-I require more explanation.
Table 1. Association results using two logit models of at least once self-employment for RS-I

| SNP | Allele | Chromosome | Frequency | Model 1 Beta | Model 1 p value | Model 2 Beta | Model 2 p value |
|---|---|---|---|---|---|---|---|
| rs1486011 | C | 3 | 0.063 | 0.352 | 0.0056 | 0.348 | 0.0068 |
| rs393795 | T | 5 | 0.195 | 0.064 | 0.4330 | 0.046 | 0.5781 |
| rs409588 | T | 5 | 0.193 | 0.068 | 0.4021 | 0.051 | 0.5402 |
| rs456082 | G | 5 | 0.193 | 0.067 | 0.4082 | 0.050 | 0.5478 |
| rs458860 | A | 5 | 0.192 | 0.068 | 0.4005 | 0.051 | 0.5384 |
| rs460000 | T | 5 | 0.191 | 0.070 | 0.3880 | 0.053 | 0.5229 |
| rs460700 | C | 5 | 0.195 | 0.064 | 0.4314 | 0.046 | 0.5761 |
| rs463379 | C | 5 | 0.192 | 0.069 | 0.3955 | 0.051 | 0.5321 |
| rs464528 | T | 5 | 0.192 | 0.069 | 0.3972 | 0.051 | 0.5342 |
| rs250682 | C | 5 | 0.196 | 0.063 | 0.4424 | 0.045 | 0.5893 |
| rs456774 | C | 5 | 0.207 | 0.104 | 0.1918 | 0.090 | 0.2688 |
| rs1486008 | T | 3 | 0.056 | 0.374 | 0.0025 | 0.387 | 0.0020 |
| rs16822416 | A | 3 | 0.056 | 0.374 | 0.0025 | 0.388 | 0.0020 |
| rs1486009 | G | 3 | 0.056 | 0.374 | 0.0025 | 0.388 | 0.0020 |
| rs464061 | A | 5 | 0.211 | 0.043 | 0.6117 | 0.027 | 0.7542 |
| rs3732783 | C | 3 | 0.046 | 0.365 | 0.0090 | 0.384 | 0.0067 |
| rs4436578 | T | 11 | 0.886 | 0.032 | 0.7584 | 0.012 | 0.9115 |
| rs2975292 | G | 5 | 0.640 | −0.023 | 0.7326 | −0.002 | 0.9772 |

Table 2. Association results using two logit models of at least once self-employment for RS-II

| SNP | Allele | Chromosome | Frequency | Model 1 Beta | Model 1 p value | Model 2 Beta | Model 2 p value |
|---|---|---|---|---|---|---|---|
| rs1486011 | C | 3 | 0.057 | 0.020 | 0.9330 | 0.017 | 0.9420 |
| rs393795 | T | 5 | 0.203 | −0.038 | 0.7811 | −0.037 | 0.7860 |
| rs409588 | T | 5 | 0.200 | −0.038 | 0.7792 | −0.037 | 0.7852 |
| rs456082 | G | 5 | 0.200 | −0.038 | 0.7789 | −0.037 | 0.7848 |
| rs458860 | A | 5 | 0.200 | −0.038 | 0.7793 | −0.037 | 0.7854 |
| rs460000 | T | 5 | 0.199 | −0.038 | 0.7792 | −0.037 | 0.7855 |
| rs460700 | C | 5 | 0.203 | −0.038 | 0.7810 | −0.037 | 0.7859 |
| rs463379 | C | 5 | 0.200 | −0.038 | 0.7791 | −0.037 | 0.7853 |
| rs464528 | T | 5 | 0.200 | −0.038 | 0.7792 | −0.037 | 0.7853 |
| rs250682 | C | 5 | 0.203 | −0.037 | 0.7814 | −0.037 | 0.7861 |
| rs456774 | C | 5 | 0.214 | −0.011 | 0.9314 | −0.013 | 0.9241 |
| rs1486008 | T | 3 | 0.050 | −0.001 | 0.9969 | −0.009 | 0.9711 |
| rs16822416 | A | 3 | 0.050 | −0.001 | 0.9965 | −0.009 | 0.9708 |
| rs1486009 | G | 3 | 0.050 | −0.001 | 0.9966 | −0.009 | 0.9709 |
| rs464061 | A | 5 | 0.219 | −0.071 | 0.6122 | −0.072 | 0.6074 |
| rs3732783 | C | 3 | 0.041 | 0.063 | 0.8110 | 0.052 | 0.8459 |
| rs4436578 | T | 11 | 0.891 | 0.087 | 0.6143 | 0.068 | 0.6964 |
| rs2975292 | G | 5 | 0.648 | 0.056 | 0.6234 | 0.052 | 0.6495 |

Table 3. Association results using two logit models of at least once self-employment for RS-III

| SNP | Allele | Chromosome | Frequency | Model 1 Beta | Model 1 p value | Model 2 Beta | Model 2 p value |
|---|---|---|---|---|---|---|---|
| rs1486011 | C | 3 | 0.067 | −0.068 | 0.7674 | −0.040 | 0.8652 |
| rs393795 | T | 5 | 0.194 | 0.139 | 0.2745 | 0.157 | 0.2250 |
| rs409588 | T | 5 | 0.194 | 0.139 | 0.2747 | 0.156 | 0.2254 |
| rs456082 | G | 5 | 0.194 | 0.139 | 0.2746 | 0.157 | 0.2252 |
| rs458860 | A | 5 | 0.194 | 0.139 | 0.2748 | 0.156 | 0.2254 |
| rs460000 | T | 5 | 0.194 | 0.139 | 0.2751 | 0.156 | 0.2259 |
| rs460700 | C | 5 | 0.194 | 0.139 | 0.2744 | 0.157 | 0.2249 |
| rs463379 | C | 5 | 0.194 | 0.139 | 0.2749 | 0.156 | 0.2256 |
| rs464528 | T | 5 | 0.194 | 0.139 | 0.2748 | 0.156 | 0.2255 |
| rs250682 | C | 5 | 0.194 | 0.139 | 0.2750 | 0.157 | 0.2253 |
| rs456774 | C | 5 | 0.208 | 0.125 | 0.3266 | 0.145 | 0.2593 |
| rs1486008 | T | 3 | 0.059 | −0.151 | 0.5283 | −0.104 | 0.6690 |
| rs16822416 | A | 3 | 0.059 | −0.151 | 0.5284 | −0.104 | 0.6690 |
| rs1486009 | G | 3 | 0.059 | −0.151 | 0.5283 | −0.104 | 0.6690 |
| rs464061 | A | 5 | 0.214 | 0.151 | 0.2509 | 0.175 | 0.1896 |
| rs3732783 | C | 3 | 0.050 | −0.013 | 0.9583 | 0.033 | 0.8952 |
| rs4436578 | T | 11 | 0.894 | −0.086 | 0.6003 | −0.075 | 0.6519 |
| rs2975292 | G | 5 | 0.644 | −0.021 | 0.8467 | −0.053 | 0.6324 |

Nicolaou et al. (2011) report a significant association between SNP rs1486011 and the tendency to be an entrepreneur. This SNP is not significantly associated with self-employment in RS-I at the chosen significance level of 0.0028. Moreover, the positive coefficient suggests the opposite effect: carrying the C allele seems not to decrease the odds of being self-employed at least once, as reported by Nicolaou et al. (2011), but to increase them.

Further inspection of the results indicates that three SNPs within the DRD3 gene, rs1486008, rs16822416, and rs1486009, survive our Bonferroni-corrected significance level of 0.0028. However, the direction of the effects is opposite to the associations reported in the candidate gene study. Although we cannot reject the hypothesis that the DRD3 gene is associated with entrepreneurship based on these results, they do not support the effect of the G allele of SNP rs1486011 reported by Nicolaou et al. (2011).

3 Discussion

We performed an association analysis of 18 SNPs in the DRD2, DRD3, and SLC6A3 genes in three independent groups of Dutch subjects. The set of analyzed SNPs includes a SNP previously reported to be significantly associated with entrepreneurship by Nicolaou et al. (2011). Our study fails to replicate this association and, in fact, finds several other significant associations with opposite effects to those reported by Nicolaou et al. (2011).

There are several shortcomings of the candidate gene study that lead us to suspect that the reported association is a false positive and that our results should also be interpreted with care. These shortcomings are lessons learned from the era of candidate gene studies, which were usually pursued with ill-defined markers across genes, small samples, and/or no replication. Indeed, there are numerous examples of small-scale candidate gene studies that report significant associations with behavioral traits that could not be replicated. For instance, Israel et al. (2009) report an association between a variant of the OXTR gene and behavior in the dictator game. Apicella et al. (2010) fail to replicate this association. Other studies report an association between a genetic variant in the serotonin transporter gene and anxiety-related traits such as harm avoidance (Lesch et al. 1996; Vormfelde et al. 2006) that others fail to replicate (Becker et al. 2007; Lang et al. 2004). Hence, the decisive proof of a true association is replication in an independent study, a feature that the study of Nicolaou et al. (2011) lacks. Lastly, Ioannidis (2005) shows that the pre-study probability of a genetic association being true is generally extremely low, and consequently, the post-study probability is also low.
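The link between pre-study and post-study probability can be made concrete with a small sketch based on the positive predictive value formula in Ioannidis (2005), PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested association is true. The numerical inputs below are illustrative assumptions, not estimates from either study, and the function name is hypothetical.

```python
def post_study_probability(pre_study_odds, power, alpha):
    """Probability that a significant finding is true (PPV).

    pre_study_odds: ratio of true to null relationships among those tested (R)
    power:          1 - beta, the probability of detecting a true association
    alpha:          significance level used for each test
    """
    true_positives = power * pre_study_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Assumed values: 1 true association per 1,000 tested variants,
# 80% power, and a nominal significance level of 0.05.
ppv = post_study_probability(pre_study_odds=0.001, power=0.8, alpha=0.05)
print(f"post-study probability: {ppv:.3f}")
```

Under these assumed inputs, fewer than 2 in 100 significant findings would reflect a true association, which is why independent replication carries so much weight.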

With regard to the candidate gene study, first, we believe that the selection of candidates by Nicolaou et al. (2011), although seemingly sound, is largely arbitrary. The set comprises genes previously thought to be associated with novelty or sensation seeking and ADHD, characteristics that are hypothesized to be more common among entrepreneurs. Following this line of thought, there are many other candidate genes, such as the serotonin 2A and 1B receptors (HTR2A and HTR1B), dopamine and serotonin transporters (SLC6A3, SLC6A4), dopamine beta-hydroxylase (DBH), monoamine oxidase B (MAOB), and genes associated with testosterone level. Furthermore, probably more than half of all genes are related to brain function or to the expression of proteins in the brain (Sandberg et al. 2000) and could therefore be candidates. This leads to hundreds of thousands of potential candidate loci and makes the candidate gene approach infeasible for the study of complex behaviors such as entrepreneurship.

Second, the selection criteria of SNPs within the chosen candidate genes are confined to the coding regions. A complete overview of the selected SNPs is lacking, although Nicolaou et al. (2011) report that the SNPs from the coding regions of the nine candidate genes were selected. SNPs in regulatory non-coding regions are not considered, although these could have substantial effects on a given phenotype (for an overview, see http://www.genome.gov/gwastudies).

Third, the hypothesis that dopamine receptor genes are associated with novelty or sensation seeking is itself based on mixed evidence from small-scale studies that could not always be replicated. For example, Ebstein et al. (1996) report a significant association between a variant of the DRD4 gene and novelty seeking, which could not be replicated by Malhotra et al. (1996). A recent meta-analysis by Munafo et al. (2008) concludes that the DRD4 gene may be associated with measures of novelty seeking and impulsivity, but significant evidence of publication bias was found. Finally, Verweij et al. (2010) report that the DRD4 gene is not significantly associated with the novelty seeking dimension of Cloninger’s temperament scales, although the study had 91.5% power to detect SNPs that explain 1% of the variance.

Obviously, the choice of candidate genes is limited by knowledge of the biological function of genes and their possible relationship with entrepreneurship. Recent technological advancements have enabled so-called genome-wide association studies (GWASs), which are considered hypothesis-free as no prior knowledge about gene function is needed. Instead of hypothesizing relationships between genes and a trait a priori, a GWAS systematically interrogates the entire genome for associations between genetic variants (SNPs) and a trait. In current GWASs, millions of SNPs are statistically tested for association, leading to a severe multiple testing problem. Therefore, it is conventional wisdom to apply a very stringent significance level of p < 5 × 10⁻⁸ (McCarthy et al. 2008) to each tested SNP to control the false positive rate. Despite this, GWASs have been remarkably successful in uncovering associations between common genetic variation and human traits and diseases (Hindorff et al. 2009) and are gaining interest in the social sciences (Koellinger et al. 2010; van der Loos et al. 2010).

Given that GWASs are currently the way forward in genetics research and that genome-wide data are available in the dataset of Nicolaou et al. (2011; see also http://boss.blogs.nytimes.com/2009/09/21/literally-born-entrepreneurs/), a comprehensive, hypothesis-free GWAS of entrepreneurship is an attractive alternative to the hypothesis-based candidate gene study. Obviously, the reported association would not have reached the accepted genome-wide significance level of p < 5 × 10⁻⁸. Associations often turn out to be false positives when a set of candidate genes is selected while not all relevant genes and SNPs are considered (e.g., Apicella et al. 2010; Becker et al. 2007; Israel et al. 2009; Lang et al. 2004; Lesch et al. 1996; Vormfelde et al. 2006).

4 Conclusion

We tried to replicate the significant association between a variant in the DRD3 gene and entrepreneurship reported by Nicolaou et al. (2011), using three much larger, independent groups of Dutch subjects from the Rotterdam Study, and failed to do so. In fact, we find that the reported association has an opposite, insignificant effect in our study. Moreover, we find several other associations with opposite effects among the SNPs reported by Nicolaou et al. (2011). As explained above, it is difficult to choose a level of significance. All associations would be rendered insignificant using the level of significance commonly used in the GWAS approach (p < 5 × 10⁻⁸), which is the superior method, in our view.

As another extreme, we can argue that not all 18 SNPs in our analysis are independent, but are correlated, i.e., they are in linkage disequilibrium. Consequently, the number of independent statistical tests would be fewer than 18, and a less stringent (higher) significance level could have been used. Assuming, for simplicity, that SNPs within a gene are highly correlated, we would effectively perform three independent statistical tests (one each for the DRD2, DRD3, and SLC6A3 genes), resulting in a Bonferroni-adjusted significance level of 0.0167 (0.05/3). Adopting this significance level, SNPs rs1486011 and rs3732783 would become significantly associated with entrepreneurship next to the three other SNPs reported above, but again with opposite effects to those reported by Nicolaou et al. (2011). Thus, relaxing or tightening the significance level does not change our conclusion; we fail to replicate the results of the candidate gene study, and we emphasize that a hypothesis-free GWAS in an adequately powered setting is the preferred approach.
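The effect of the three significance levels discussed in this paper can be checked directly against the RS-I p values in Table 1. The sketch below copies those p values (model 1, model 2) and lists the SNPs that fall below each threshold in both models; the function and variable names are hypothetical.

```python
# RS-I p values from Table 1: SNP -> (model 1, model 2).
RS1_P_VALUES = {
    "rs1486011": (0.0056, 0.0068),
    "rs393795": (0.4330, 0.5781),
    "rs409588": (0.4021, 0.5402),
    "rs456082": (0.4082, 0.5478),
    "rs458860": (0.4005, 0.5384),
    "rs460000": (0.3880, 0.5229),
    "rs460700": (0.4314, 0.5761),
    "rs463379": (0.3955, 0.5321),
    "rs464528": (0.3972, 0.5342),
    "rs250682": (0.4424, 0.5893),
    "rs456774": (0.1918, 0.2688),
    "rs1486008": (0.0025, 0.0020),
    "rs16822416": (0.0025, 0.0020),
    "rs1486009": (0.0025, 0.0020),
    "rs464061": (0.6117, 0.7542),
    "rs3732783": (0.0090, 0.0067),
    "rs4436578": (0.7584, 0.9115),
    "rs2975292": (0.7326, 0.9772),
}


def significant(p_values, threshold):
    """SNPs whose p value is below the threshold in both models."""
    return [snp for snp, ps in p_values.items() if max(ps) < threshold]


thresholds = {
    "Bonferroni, 18 tests (0.05/18)": 0.05 / 18,
    "Bonferroni, 3 tests (0.05/3)": 0.05 / 3,
    "genome-wide (5e-8)": 5e-8,
}
for label, threshold in thresholds.items():
    print(label, significant(RS1_P_VALUES, threshold))
```

With 18 tests, three DRD3 SNPs pass; with three tests, rs1486011 and rs3732783 join them; at the genome-wide level, none remain.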

Footnotes
1. Testing multiple hypotheses will inflate the false positive rate for the entire family of tests. For example, accepting a significance level of 5% and performing 100 tests will yield 5 (100 × 0.05) expected incorrect rejections of the null hypothesis. One possible solution to keep the number of false positives at an acceptable level is the Bonferroni correction. Applying this often-used adjustment consists of dividing the desired family-wise significance level by the number of independent tests performed to obtain a test-wise significance level.

Acknowledgments

We would like to thank Dr. Tobias A. Knoch, Anis Abuseiris, Karol Estrada, Luc V. de Zeeuw, and Rob de Graaf, as well as their institutions, the Erasmus Grid Office, Erasmus MC Rotterdam, The Netherlands, and especially the national German MediGRID and Services@MediGRID part of the German D-Grid, both funded by the German Bundesministerium fuer Forschung und Technology under grants # 01 AK 803 A-H and # 01 IG 07015 G for access to their grid resources.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Matthijs J. H. M. van der Loos (1)
  • Philipp D. Koellinger (1)
  • Patrick J. F. Groenen (2)
  • Cornelius A. Rietveld (1)
  • Fernando Rivadeneira (3, 4)
  • Frank J. A. van Rooij (3)
  • André G. Uitterlinden (3, 4)
  • Albert Hofman (3)
  • A. Roy Thurik (1)

  1. Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
  2. Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
  3. Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands
  4. Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands