Human Genetics, Volume 123, Issue 5, pp 455–468

Genome-wide screen for asthma in Puerto Ricans: evidence for association with 5q23 region

Authors

  • S. Choudhry
    • Institute for Human Genetics, University of California
    • UCSF/Lung Biology Center, University of California, San Francisco
    • Department of Medicine, University of California
  • Margaret Taub
    • Department of Biostatistics, University of California
  • Rui Mei
    • Affymetrix Inc
  • José Rodriguez-Santana
    • Centro de Neumologia Pediatrica, CSP
  • William Rodriguez-Cintron
    • Veterans Caribbean Health Care System, The University of Puerto Rico School of Medicine
  • Mark D. Shriver
    • Department of Anthropology, Pennsylvania State University
  • Elad Ziv
    • Institute for Human Genetics, University of California
    • Department of Medicine, University of California
  • Neil J. Risch
    • Institute for Human Genetics, University of California
    • Department of Epidemiology and Biostatistics, University of California
    • Division of Research, Kaiser Permanente
  • Esteban González Burchard
    • Institute for Human Genetics, University of California
    • UCSF/Lung Biology Center, University of California, San Francisco
    • Department of Medicine, University of California
    • Department of Biopharmaceutical Sciences, University of California
    • Division of Research, Kaiser Permanente
Original Investigation

DOI: 10.1007/s00439-008-0495-7

Cite this article as:
Choudhry, S., Taub, M., Mei, R. et al. Hum Genet (2008) 123: 455. doi:10.1007/s00439-008-0495-7

Abstract

While the number of success stories for mapping genes associated with complex diseases using genome-wide association approaches is growing, much work remains to be done in developing methods for such studies when samples are collected from a population that may not be homogeneous. Here we report the first genome-wide association study to identify genes associated with asthma in an admixed population. We genotyped 96 Puerto Rican moderate to severe asthma cases and 88 controls, as well as 109 samples representing Puerto Rico’s founding populations, using the Affymetrix GeneChip Human Mapping 100K array set. The data from samples representing Puerto Rico’s founding populations were used to identify ancestry informative markers for admixture mapping analyses. In addition, a genome-wide association analysis using logistic regression was performed on the data. Although neither admixture mapping nor regression analysis gave any significant association with asthma after correction for multiple testing, an overlap analysis using the top-scoring SNPs from the different methods suggested chromosomal regions 5q23.3 and 13q13.3 as potential regions harboring genes for asthma in Puerto Ricans. Validation analysis of these two regions in 284 Puerto Rican asthma trios showed significant association for the 5q23.3 region. Our results provide strong evidence that the previously linked 5q23 region is associated with asthma in Puerto Ricans. The detection of causative variants in this region will require fine mapping and functional validation.

Introduction

In the USA, asthma prevalence is highest in Puerto Ricans (26%) and lowest in Mexicans (10%) (Carter-Pokras and Gergen 1993; Homa et al. 2000). This is paradoxical since both groups are considered “Hispanic” or “Latino”. Although there are many potential explanations for this observation, including environmental and socioeconomic factors, one likely explanation is that the genetic predisposition to asthma differs among subgroups within the Latino population. Latinos are admixed and share varying proportions of West African, Native American and European ancestry (Choudhry et al. 2006; Hanis et al. 1991). The mixed ancestry of Latinos provides unique opportunities in epidemiological and genetic studies and may be useful in untangling complex gene–gene and gene–environment interactions in disease susceptibility (Burchard et al. 2003; Choudhry et al. 2007).

Several recent advances in statistical methods and genotyping techniques have resulted in a paradigm shift in genetic association studies, making it possible to perform genome-wide association analysis, which does not require a priori knowledge of disease-associated genes (Hirschhorn and Daly 2005; Kennedy et al. 2003; Matsuzaki et al. 2004; Risch 2000). An alternative yet complementary approach to genome-wide association analysis is admixture mapping (Smith and O’Brien 2005). Admixture mapping is a method for localizing disease-causing genetic variants that differ in frequency across populations. The approach is best suited to admixed populations such as Latinos, who descend from a recent mix of three ancestral groups that had been geographically isolated for thousands of years. It assumes that, near a disease-causing gene, affected admixed individuals will show enhanced ancestry from the population that has the greater risk of the disease. Thus, if ancestry can be estimated along the genome for an admixed sample set, it can be used to identify disease-causing gene variants (Chakraborty and Weiss 1988; McKeigue 1997; Pfaff et al. 2001; Stephens et al. 1994). Admixture mapping requires genotyping several thousand markers (Smith et al. 2004), whereas genome-wide association requires genotyping hundreds of thousands of markers (Hinds et al. 2005). The Affymetrix GeneChip Human Mapping 100K array set can genotype 116,204 SNPs in a given individual with a single genotyping assay (Kennedy et al. 2003). The ease and cost effectiveness of the GeneChip arrays for large-scale genotyping make them far more attractive than conventional genotyping platforms for genome-wide studies.

We used the Affymetrix 100K arrays to perform genome-wide association and admixture mapping analyses to identify loci associated with asthma in Puerto Ricans. First, we selected ancestry informative markers (AIMs) from the 100K arrays by screening samples representing the Puerto Rican ancestral populations. We then performed three different analyses: (1) genome-wide association analysis testing association between individual SNPs and asthma disease status, (2) admixture mapping comparing cases and controls using the program Admixmap and (3) admixture mapping using a likelihood ratio test on locus-specific ancestry estimates determined using the program Structure. In all three analyses, we incorporated adjustments to correct for confounding due to population stratification. By combining the results from these three analytical methods, we determined a set of the most promising candidate regions for asthma in Puerto Ricans. We then selected the top SNPs from these candidate regions and genotyped them in a second sample of Puerto Rican families with asthma for validation analysis.

Materials and methods

Study participants

A total of 380 Puerto Rican subjects with asthma and 88 ethnically matched controls were included in this study. The genome-wide analyses included 96 subjects with moderate to severe asthma and 88 healthy controls. Moderate to severe asthma was defined by the subject’s baseline lung function, measured as pre-bronchodilator FEV1 (Pre-FEV1): subjects with Pre-FEV1 less than 80% of predicted were categorized as having “moderate-severe” asthma. The validation analysis included 284 Puerto Rican asthma trios (father, mother and affected child). All subjects were recruited as part of the Genetics of Asthma in Latino Americans (GALA) study. Recruitment and patient characteristics have been described in detail elsewhere (Burchard et al. 2004; Choudhry et al. 2005; Lind et al. 2003) and are summarized briefly here. Ethnicity and national origin were self-reported and ascertained using standardized questions. Puerto Rican subjects were enrolled only if both biological parents and all four biological grandparents were reported to be of Puerto Rican ethnicity. Interviews with children were conducted in the presence of parents. Eligible subjects with asthma had physician-diagnosed asthma and had experienced two or more asthma symptoms (among wheezing, coughing and shortness of breath) in the previous two years. Control subjects were screened and considered eligible to participate if they had no clinical evidence of asthma, allergies, atopy or any other allergic or pulmonary disease. All subjects (asthmatics and healthy controls) were between the ages of 8 and 40 years and were interviewed by bilingual and bicultural field workers and physicians specialized in asthma.

Genotyping using Affymetrix 100K arrays

We genotyped 37 West African, 42 European and 30 Native American samples using the Affymetrix GeneChip Human Mapping 100K array set to find AIMs relevant to Puerto Rico’s founding populations. The 37 West African samples are from individuals living in London, UK and South Carolina, USA, who are either non-admixed or have very low levels of admixture. The 42 European samples are from Coriell’s North American Caucasian panel. The Native American samples (Mayan, n = 15 and Nahua, n = 15) were recruited from villages in remote areas of Mexico. Genotyping of Puerto Rican asthma cases and controls was also performed using the Affymetrix 100K arrays. Genotyping followed standard Affymetrix protocols, and the data were processed using the Affymetrix-provided GeneChip Operating Software (GCOS) and GeneChip DNA Analysis Software (GDAS) (Affymetrix Inc., Santa Clara, CA, USA). Ten samples were run in duplicate to assess concordance between runs. The concordance rate was >99.9% and the overall genotyping success rate was >98.5%. Markers were assessed for Hardy–Weinberg equilibrium using a χ2 goodness-of-fit test. After excluding markers on the X chromosome and markers with minor allele frequency (MAF) <5%, extreme deviation from Hardy–Weinberg equilibrium (χ2 > 10) or study-wide genotyping call rates <95%, we retained 97,112 markers for further association analysis.
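The marker filters described above amount to three per-marker checks. The Python sketch below is illustrative only and is not the GCOS/GDAS pipeline used in the study; the genotype coding (0, 1, 2 copies of one allele, with -1 marking a failed call) and the function names are hypothetical, while the thresholds are the ones stated in the text. X-chromosome markers would be dropped beforehand by chromosome label.

```python
import numpy as np

def hwe_chi2(n_aa, n_ab, n_bb):
    """1-df chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    expected = np.array([p * p, 2 * p * q, q * q]) * n
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (observed - expected) ** 2 / expected, 0.0)
    return float(terms.sum())

def passes_qc(genotypes, maf_min=0.05, hwe_max=10.0, call_rate_min=0.95):
    """Return True if a marker survives the call-rate, MAF and HWE filters.

    genotypes: allele-A counts (0, 1, 2) per sample, -1 for a failed call
    (hypothetical coding used only for this sketch).
    """
    g = np.asarray(genotypes)
    called = g[g >= 0]
    if called.size == 0 or called.size / g.size < call_rate_min:
        return False                          # study-wide call rate below 95%
    n_aa, n_ab, n_bb = (int((called == k).sum()) for k in (2, 1, 0))
    p = (2 * n_aa + n_ab) / (2 * called.size)
    if min(p, 1.0 - p) < maf_min:
        return False                          # minor allele frequency below 5%
    return hwe_chi2(n_aa, n_ab, n_bb) <= hwe_max   # drop extreme HWE deviation
```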

Selection of ancestry informative markers (AIMs)

The genotype data from the parental population samples were used to identify AIMs, which were then used to perform admixture mapping and to adjust the analyses for population stratification. Since the contemporary Puerto Rican population is a mixture of three parental populations (West Africans, Europeans and Native Americans), we used an iterative process for selecting our AIMs. For each of the three possible pairs of ancestral populations, we identified markers whose allele frequency difference (δ) between the two populations was at least 0.5. From these markers, we selected a subset that was evenly distributed across the genome, with markers far enough apart to be in linkage equilibrium in the ancestral populations. These markers formed our set of 2,730 AIMs, which were then used to estimate individual and locus-specific ancestry.
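A minimal sketch of this two-step AIM selection, assuming allele frequencies for each parental sample and genetic-map positions in cM are already available. The population labels, function names and the `min_gap_cm` spacing rule are hypothetical: the text does not state how spacing was enforced, so a greedy minimum-gap filter stands in for "far enough apart to be in linkage equilibrium".

```python
from itertools import combinations

POPULATIONS = ("AFR", "EUR", "NAM")  # hypothetical labels for the three parental samples

def max_pairwise_delta(freqs):
    """Largest allele-frequency difference (delta) over the three population pairs."""
    return max(abs(freqs[a] - freqs[b]) for a, b in combinations(POPULATIONS, 2))

def select_aims(markers, delta_min=0.5, min_gap_cm=0.5):
    """Pick spaced ancestry-informative markers.

    markers: iterable of (marker_id, chromosome, position_cM, {population: allele freq}).
    Step 1 keeps markers with delta >= delta_min for at least one population pair.
    Step 2 walks each chromosome left to right and keeps a marker only if it lies
    at least min_gap_cm from the last kept marker (placeholder spacing rule).
    """
    informative = [m for m in markers if max_pairwise_delta(m[3]) >= delta_min]
    informative.sort(key=lambda m: (m[1], m[2]))
    selected, last_kept = [], {}
    for marker_id, chrom, pos_cm, _ in informative:
        if chrom not in last_kept or pos_cm - last_kept[chrom] >= min_gap_cm:
            selected.append(marker_id)
            last_kept[chrom] = pos_cm
    return selected
```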

Estimation of individual ancestry

The individual ancestry estimates (IAE) were calculated using two different programs, Admixmap (Hoggart et al. 2003, 2004) and Structure (Falush et al. 2003; Pritchard et al. 2000), and the 2,730 AIMs. The IAE from these two programs were highly correlated (r > 0.9) (results not shown). The IAE from the Structure program were used to adjust for population stratification in the 100K association analysis.

Estimation of locus specific ancestry and admixture mapping analysis

The admixture mapping analyses were performed using the panel of 2,730 AIMs described above and two different programs, Admixmap and Structure. Admixmap uses a Bayesian probability model, fitted by Markov chain Monte Carlo, to estimate locus-specific ancestry (Hoggart et al. 2004). To run Admixmap, we provided the program with the genotypes of the 2,730 AIMs for our case and control subjects, as well as for the ancestral representatives. Admixmap performs a test for association between locus-specific ancestry and disease status and returns a p value for each marker. We also estimated the ancestral proportion at each of the 2,730 AIMs using the program Structure (Montana and Pritchard 2004). At each marker, a likelihood ratio statistic was computed, testing the null hypothesis of no association between ancestry and disease status while taking each individual’s overall ancestry estimates into account. Under the null hypothesis, the statistic follows a χ2 distribution; its magnitude indicates the strength of the deviation in ancestry between cases and controls at each locus. Statistical significance was assessed using a permutation test.
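One way to express a likelihood ratio test of this kind is to compare two logistic models of disease status: one containing only the individual's genome-wide ancestry, and one that also includes locus-specific ancestry at the AIM. The sketch below does this for a single ancestry component. It is an assumption-laden illustration, not the Structure-based statistic of Montana and Pritchard (2004), which may be formulated differently, and it reports an asymptotic χ2 (1 df) p value rather than the permutation p value used in the study. Function names are hypothetical.

```python
import math
import numpy as np

def _logistic_loglik(X, y, n_iter=100, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson and return its maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def ancestry_lrt(locus_anc, global_anc, status):
    """LRT for locus-specific ancestry vs. disease, adjusting for genome-wide ancestry.

    locus_anc:  estimated ancestry proportion (one component) at this AIM, per person
    global_anc: the same person's genome-wide (individual) ancestry proportion
    status:     1 = case, 0 = control
    Returns (statistic, asymptotic p value from a chi-square with 1 df).
    """
    y = np.asarray(status, dtype=float)
    null_X = np.column_stack([np.ones(y.size), global_anc])
    full_X = np.column_stack([null_X, locus_anc])
    stat = 2.0 * (_logistic_loglik(full_X, y) - _logistic_loglik(null_X, y))
    stat = max(stat, 0.0)                          # guard against tiny negative rounding
    return stat, math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
```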

Genome-wide association analysis: regression method

We used a logistic regression model, assuming an additive genetic model, to test for association between genotype and disease status at each of the 97,112 markers on the 100K arrays. In addition to age and gender, IAE from the program Structure were included as covariates in all regression models to control the inflation of the type I error rate due to population stratification.
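The per-SNP regression can be sketched as a logistic model with the genotype coded additively (0, 1 or 2 copies of the tested allele) plus age, gender and individual ancestry as covariates, with a Wald test on the genotype coefficient. This is a minimal, self-contained Newton-Raphson fit written for illustration; the software used in the study is not specified, and the function name and argument layout are hypothetical.

```python
import math
import numpy as np

def additive_logistic_test(genotype, covariates, status, n_iter=100, tol=1e-10):
    """Wald test for one SNP in status ~ 1 + genotype + covariates (logit link).

    genotype:   0/1/2 copies of the tested allele (additive coding)
    covariates: array of shape (n, k), e.g. age, gender, individual ancestry
    status:     1 = case, 0 = control
    Returns (coefficient, standard error, z, two-sided p) for the genotype term.
    """
    y = np.asarray(status, dtype=float)
    X = np.column_stack([np.ones(y.size), genotype, covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    coef, se = float(beta[1]), math.sqrt(cov[1, 1])
    z = coef / se
    return coef, se, z, math.erfc(abs(z) / math.sqrt(2.0))
```

Running this once per marker over the retained SNPs yields unadjusted p values of the kind that feed the multiple-testing correction; the coefficient and z value it returns correspond in form to the GWA columns reported for the validation markers.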

Correction for multiple testing

The admixture mapping and genome-wide association analyses were corrected for multiple testing using the multtest package in the statistical language R. We used adjusted p values as calculated by multtest under the Benjamini & Yekutieli step-up false discovery rate (FDR) controlling procedure (Benjamini and Yekutieli 2001).
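The Benjamini–Yekutieli adjustment can be written in a few lines. In this sketch, the i-th smallest p value is multiplied by m·c(m)/i, where c(m) = 1 + 1/2 + … + 1/m, and monotonicity is enforced from the largest p value down. It is intended to match R's `p.adjust(p, method = "BY")`, but it is an illustration rather than the multtest code used in the study.

```python
def by_adjust(pvalues):
    """Benjamini-Yekutieli step-up adjusted p values.

    Controls the false discovery rate under arbitrary dependence by inflating
    the Benjamini-Hochberg adjustment with c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0                                    # also caps values at 1
    for rank in range(m, 0, -1):                         # step up from the largest p
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the 97,112 regression tests, c(m) ≈ ln(97,112) + 0.577 ≈ 12.06, so each adjusted p value is roughly 12 times its Benjamini–Hochberg counterpart, which makes this procedure considerably more conservative at this scale.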

Identification of follow-up regions by overlap analysis

To identify areas of potential interest, we used the combined strength of the different analytical methods: (1) individual SNP association, (2) admixture mapping using a case–control approach with the program Admixmap and (3) admixture mapping using a likelihood ratio test on locus-specific ancestry estimates from the program Structure. We identified regions for subsequent analysis by selecting markers that were highly ranked (based on unadjusted p value or likelihood ratio test score) in at least two of the three analytical methods.

Validation

To test the validity of the regions identified through the overlap analysis, we performed validation analysis of the most promising markers in these regions in a sample of 284 Puerto Rican asthma trios. Genotyping was performed using the fluorescence polarization (FP) method as directed by the manufacturer (PerkinElmer, Waltham, MA, USA) (Chen et al. 1999). The association of the markers with asthma disease status was tested using the transmission-disequilibrium test as implemented in the program FBAT (Laird et al. 2000).
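For intuition, the classic TDT compares how often heterozygous parents transmit a given allele to an affected child versus the alternative allele. The sketch below is the textbook McNemar form with a hypothetical function name. FBAT computes a more general family-based score statistic with its own variance estimate, so these values are not expected to reproduce the z scores reported in Table 6.

```python
import math

def tdt(transmitted, not_transmitted):
    """Classic TDT (McNemar test) for one allele.

    transmitted:     times heterozygous parents transmitted the allele to an affected child
    not_transmitted: times they transmitted the other allele
    Returns (chi-square with 1 df, signed z, two-sided p value).
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    z = (b - c) / math.sqrt(b + c)
    return z * z, z, math.erfc(abs(z) / math.sqrt(2.0))
```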

Results

Identification of AIMs

Table 1 shows the percent of markers on the Affymetrix 100K arrays informative for West African-European, West African-Native American and European-Native American ancestry at different levels of ancestry informativeness, as measured by the difference in allele frequency (delta, δ). Most markers were informative for only one pair of ancestral populations, while some were informative for more than one pair. Among the markers on the 100K arrays, more than 10,000 had δ > 0.5 for at least one of the three population pairs (Table 1). We selected a panel of 2,730 AIMs with δ > 0.5 that were evenly spaced across the genome (inter-marker distance mean: 1 cM; median: 0.79 cM; first quartile: 0.67 cM; third quartile: 1.05 cM; overall range: 0.01–26.07 cM) for estimating individual ancestry and for admixture mapping analysis of asthma (see supporting material, Table 1S).
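The spacing figures quoted above are summary statistics of the gaps between adjacent AIMs on the same chromosome. A short sketch, assuming AIM positions in cM are available (the genetic map used is not stated) and using a hypothetical function name; gaps that would span two chromosomes are discarded.

```python
import numpy as np

def spacing_summary(chroms, positions_cm):
    """Summary of gaps between consecutive AIMs, ignoring gaps that span chromosomes."""
    chroms = np.asarray(chroms)
    pos = np.asarray(positions_cm, dtype=float)
    order = np.lexsort((pos, chroms))          # sort by chromosome, then position
    chroms, pos = chroms[order], pos[order]
    gaps = np.diff(pos)[chroms[1:] == chroms[:-1]]
    q1, median, q3 = np.percentile(gaps, [25, 50, 75])
    return {"mean": float(gaps.mean()), "median": float(median),
            "q1": float(q1), "q3": float(q3),
            "min": float(gaps.min()), "max": float(gaps.max())}
```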
Table 1

Ancestry informative markers (AIMs) identified using the Affymetrix GeneChip Human Mapping 100K array set

| Marker informativeness | West African-European ancestry (no. of markers) | Percent of markers | West African-Native American ancestry (no. of markers) | Percent of markers | European-Native American ancestry (no. of markers) | Percent of markers |
|---|---|---|---|---|---|---|
| δ > 0.5 | 5781 | 4.98 | 8242 | 7.09 | 5040 | 4.34 |
| δ > 0.4 | 10810 | 9.30 | 12819 | 11.03 | 9602 | 8.26 |
| δ > 0.3 | 19292 | 16.60 | 19477 | 16.76 | 17306 | 14.89 |

Evidence of population stratification in Puerto Rican Asthma cases and controls

The estimated average European and West African ancestry was different between our Puerto Rican cases and controls. Figure 1 shows box plots demonstrating differences in estimated European and West African ancestral proportions between cases (with medians of 62.0 and 22.3%, respectively) and controls (medians of 58.1 and 26.0%, respectively). The Native American ancestry was similar between the two groups (medians of 15.7% in cases and 15.9% in controls) (Fig. 1). The summary χ2 test using 2,730 AIMs also gave significant results (p = 0.01) suggesting that there are systematic differences in ancestry between our asthma cases and controls that could cause spurious genetic associations if measures of ancestry were not included in our analyses.
Fig. 1

Boxplots showing distribution of ancestry estimates in Puerto Rican asthma cases and controls

Admixture mapping analysis

Neither of the admixture mapping analyses (Admixmap or the likelihood ratio test based on Structure output) gave any significant results after correction for multiple testing using the FDR approach (Figs. 2, 3). Before correction for multiple testing, 56 markers had an unadjusted p value of ≤0.01 in the Admixmap analysis (Table 2) and 21 markers gave a score of ≥10 in the likelihood ratio tests (Table 3).
Fig. 2

Distribution of p values for 2,730 AIMs across the chromosomes from the admixture mapping analysis using the program Admixmap

Fig. 3

Distribution of likelihood ratio test scores for 2,730 AIMs across the chromosomes based on locus-specific ancestry estimates from the program Structure

Table 2

Top markers from the admixture mapping analysis using the program Admixmap. All markers with p values ≤0.01

| rs number | Chr. | Physical location | Cytoband | Allele 1 | Allele 2 | Freq. African | Freq. European | Freq. Native American | Freq. cases | Freq. controls | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2353169 | 1 | 4967042 | 1p36.32 | C | T | 0.03 | 0.48 | 0.37 | 0.39 | 0.29 | 0.0061 |
| rs2455122 | 1 | 2971126 | 1p36.32 | C | G | 0.25 | 0.74 | 0.57 | 0.68 | 0.62 | 0.0094 |
| rs1495244 | 1 | 4012802 | 1p36.32 | C | T | 0.24 | 0.64 | 0.90 | 0.72 | 0.57 | 0.003 |
| rs3856123 | 1 | 179995675 | 1q25.3 | C | T | 0.69 | 0.01 | 0.00 | 0.18 | 0.10 | 0.0015 |
| rs3856123 | 1 | 179995675 | 1q25.3 | C | T | 0.69 | 0.01 | 0.00 | 0.18 | 0.10 | 0.0097 |
| rs4418557 | 1 | 179222342 | 1q25.3 | A | C | 1.00 | 0.74 | 0.77 | 0.75 | 0.74 | 0.0073 |
| rs10494580 | 1 | 181592018 | 1q25.3 | C | G | 0.88 | 0.25 | 0.27 | 0.32 | 0.27 | 0.0037 |
| rs10489482 | 1 | 182625996 | 1q25.3 | C | T | 0.16 | 0.67 | 0.30 | 0.56 | 0.55 | 0.0076 |
| rs6734345 | 2 | 223342750 | 2q36.1 | A | C | 0.06 | 0.19 | 0.67 | 0.29 | 0.31 | 0.0097 |
| rs1517634 | 2 | 224386024 | 2q36.1 | A | G | 0.83 | 0.83 | 0.00 | 0.78 | 0.63 | 0.0068 |
| rs7603262 | 2 | 224839029 | 2q36.1 | A | C | 0.50 | 0.95 | 0.30 | 0.74 | 0.65 | 0.0064 |
| rs583608 | 2 | 226057411 | 2q36.2 | A | G | 0.21 | 0.63 | 0.37 | 0.46 | 0.47 | 0.0079 |
| rs2216460 | 2 | 225450026 | 2q36.2 | G | T | 0.12 | 0.81 | 0.70 | 0.58 | 0.52 | 0.007 |
| rs9288614 | 2 | 227024125 | 2q36.3 | C | G | 0.25 | 0.83 | 0.93 | 0.74 | 0.78 | 0.0096 |
| rs3860546 | 3 | 76329712 | 3p12.3 | A | G | 0.50 | 0.14 | 0.20 | 0.33 | 0.22 | 0.0066 |
| rs10511040 | 3 | 76197443 | 3p12.3 | C | G | 0.22 | 0.02 | 0.47 | 0.17 | 0.16 | 0.0077 |
| rs2160836 | 3 | 179031138 | 3q26.32 | A | C | 0.14 | 0.66 | 0.53 | 0.56 | 0.55 | 0.01 |
| rs10513785 | 3 | 183071354 | 3q26.33 | C | T | 0.50 | 1.00 | 1.00 | 0.94 | 0.95 | 0.007 |
| rs1010471 | 3 | 182012005 | 3q26.33 | C | T | 0.22 | 0.57 | 0.60 | 0.52 | 0.53 | 0.0087 |
| rs754980 | 3 | 191072859 | 3q28 | A | G | 0.19 | 0.67 | 0.97 | 0.49 | 0.59 | 0.0067 |
| rs4686997 | 3 | 189860774 | 3q28 | A | G | 0.19 | 0.24 | 0.77 | 0.20 | 0.24 | 0.01 |
| rs1129586 | 3 | 192305960 | 3q28 | A | G | 0.91 | 0.59 | 0.93 | 0.71 | 0.78 | 0.0074 |
| rs6840776 | 4 | 93118148 | 4q22.1 | A | T | 0.18 | 0.26 | 0.81 | 0.22 | 0.28 | 0.0064 |
| rs2035116 | 4 | 92375810 | 4q22.1 | A | G | 0.97 | 0.62 | 0.93 | 0.76 | 0.69 | 0.0098 |
| rs7731657 | 5 | 130019535 | 5q23.3 | C | T | 0.22 | 0.10 | 0.57 | 0.26 | 0.24 | 0.0087 |
| rs715285 | 5 | 131561599 | 5q23.3 | A | G | 0.88 | 0.51 | 0.97 | 0.73 | 0.63 | 0.0047 |
| rs715285 | 5 | 131561599 | 5q23.3 | A | G | 0.88 | 0.51 | 0.97 | 0.73 | 0.63 | 0.009 |
| rs803054 | 5 | 132162193 | 5q23.3 | A | G | 0.38 | 0.93 | 0.50 | 0.69 | 0.81 | 0.0047 |
| rs803054 | 5 | 132162193 | 5q23.3 | A | G | 0.38 | 0.93 | 0.50 | 0.69 | 0.81 | 0.01 |
| rs798415 | 5 | 130769939 | 5q23.3 | A | C | 0.83 | 0.84 | 0.20 | 0.78 | 0.80 | 0.0058 |
| rs10491293 | 5 | 133066113 | 5q31.1 | C | G | 0.97 | 0.90 | 0.47 | 0.82 | 0.81 | 0.008 |
| rs1339110 | 6 | 150729545 | 6q25.1 | C | G | 0.26 | 0.02 | 0.25 | 0.13 | 0.21 | 0.0099 |
| rs359819 | 8 | 47263291 | 8q11.1 | A | G | 0.58 | 0.99 | 0.50 | 0.80 | 0.83 | 0.008 |
| rs6984609 | 8 | 49639545 | 8q11.21 | C | T | 0.58 | 0.02 | 0.17 | 0.24 | 0.26 | 0.0032 |
| rs1384830 | 8 | 51594553 | 8q11.21 | C | T | 0.78 | 0.04 | 0.23 | 0.18 | 0.28 | 0.0098 |
| rs7830743 | 8 | 48760915 | 8q11.21 | A | G | 0.67 | 1.00 | 0.30 | 0.72 | 0.86 | 0.0022 |
| rs1356752 | 8 | 50400830 | 8q11.21 | C | T | 0.42 | 0.95 | 0.97 | 0.84 | 0.80 | 0.0048 |
| rs4873102 | 8 | 48402365 | 8q11.21 | A | G | 0.97 | 0.50 | 0.70 | 0.59 | 0.57 | 0.0027 |
| rs10511666 | 9 | 18959422 | 9p22.1 | G | T | 0.67 | 0.11 | 0.07 | 0.23 | 0.13 | 0.0077 |
| rs9298791 | 9 | 18308754 | 9p22.2 | C | G | 0.06 | 0.35 | 0.83 | 0.36 | 0.37 | 0.01 |
| rs2616796 | 11 | 31152138 | 11p13 | C | G | 0.61 | 1.00 | 0.47 | 0.92 | 0.93 | 0.0049 |
| rs224603 | 11 | 31942320 | 11p13 | C | G | 0.56 | 0.26 | 0.63 | 0.35 | 0.45 | 0.0024 |
| rs1002229 | 11 | 32017316 | 11p13 | A | T | 0.47 | 0.14 | 0.67 | 0.23 | 0.32 | 0.0025 |
| rs10488788 | 11 | 29441909 | 11p14.1 | A | T | 0.19 | 0.07 | 0.47 | 0.14 | 0.13 | 0.0067 |
| rs2310798 | 11 | 30411932 | 11p14.1 | A | G | 0.64 | 0.13 | 0.07 | 0.14 | 0.30 | 0.0044 |
| rs4980804 | 12 | 132190 | 12p13.33 | C | T | 0.14 | 0.52 | 0.70 | 0.49 | 0.35 | 0.0024 |
| rs2051852 | 12 | 872280 | 12p13.33 | C | T | 0.64 | 0.63 | 0.89 | 0.74 | 0.65 | 0.0024 |
| rs1044825 | 12 | 1771746 | 12p13.33 | G | T | 0.97 | 0.43 | 0.63 | 0.63 | 0.60 | 0.0074 |
| rs1337604 | 13 | 36309215 | 13q13.3 | C | T | 0.75 | 0.43 | 0.90 | 0.56 | 0.53 | 0.0094 |
| rs9576235 | 13 | 35648706 | 13q13.3 | A | G | 0.00 | 0.33 | 0.60 | 0.18 | 0.27 | 0.0089 |
| rs2197879 | 13 | 37043188 | 13q13.3 | A | T | 0.75 | 0.29 | 0.57 | 0.40 | 0.45 | 0.0072 |
| rs9315703 | 13 | 37926845 | 13q14.11 | A | G | 0.75 | 0.56 | 0.97 | 0.54 | 0.66 | 0.006 |
| rs9315762 | 13 | 38539907 | 13q14.11 | C | T | 1.00 | 0.76 | 0.47 | 0.86 | 0.78 | 0.0086 |
| rs970254 | 15 | 33572178 | 15q14 | C | T | 0.86 | 0.29 | 0.07 | 0.36 | 0.34 | 0.0096 |
| rs893131 | 15 | 32806968 | 15q14 | A | G | 0.94 | 0.86 | 0.20 | 0.77 | 0.74 | 0.0094 |
| rs636842 | 15 | 31984672 | 15q14 | A | C | 1.00 | 0.65 | 0.53 | 0.73 | 0.68 | 0.0067 |

Table 3

Top markers from the admixture mapping analysis using the program Structure and likelihood ratio test statistic

| rs number | Chr. | Physical location | Cytoband | Allele 1 | Allele 2 | Freq. African | Freq. European | Freq. Native American | Freq. cases | Freq. controls | Test score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2034915 | 1 | 240825227 | 1q44 | C | T | 0.11 | 0.69 | 0.36 | 0.66 | 0.44 | 13.8 |
| rs1822300 | 2 | 24699320 | 2p23.3 | G | T | 0.14 | 0.62 | 0.87 | 0.54 | 0.48 | 11.2 |
| rs1374196 | 2 | 38203203 | 2p22.2 | C | T | 0.79 | 0.40 | 0.70 | 0.59 | 0.46 | 11.2 |
| rs1517634 | 2 | 224386024 | 2q36.1 | A | G | 0.83 | 0.83 | 0.00 | 0.78 | 0.63 | 12.0 |
| rs10511040 | 3 | 76197443 | 3p12.3 | C | G | 0.22 | 0.02 | 0.47 | 0.17 | 0.16 | 13.5 |
| rs1112924 | 3 | 160063925 | 3q25.32 | A | G | 0.89 | 0.20 | 0.07 | 0.36 | 0.31 | 10.9 |
| rs2593068 | 4 | 56745024 | 4q12 | C | T | 0.35 | 0.60 | 0.23 | 0.44 | 0.42 | 10.3 |
| rs4349629 | 4 | 63961550 | 4q13.1 | A | G | 0.75 | 0.28 | 0.30 | 0.44 | 0.30 | 11.8 |
| rs27761 | 5 | 14484718 | 5p15.2 | C | T | 0.92 | 0.50 | 0.37 | 0.49 | 0.49 | 12.3 |
| rs803054 | 5 | 132162193 | 5q23.3 | A | G | 0.38 | 0.93 | 0.50 | 0.69 | 0.81 | 10.8 |
| rs9329079 | 5 | 179049730 | 5q35.3 | A | G | 0.67 | 0.90 | 0.07 | 0.66 | 0.61 | 12.9 |
| rs1555082 | 6 | 21455152 | 6p22.3 | A | G | 0.09 | 0.50 | 0.63 | 0.52 | 0.53 | 10.6 |
| rs2037499 | 7 | 98924050 | 7q22.1 | C | G | 0.39 | 0.93 | 0.70 | 0.65 | 0.71 | 11.7 |
| rs7830743 | 8 | 48760915 | 8q11.21 | A | G | 0.67 | 1.00 | 0.30 | 0.72 | 0.86 | 10.4 |
| rs10504696 | 8 | 80110903 | 8q21.13 | C | T | 0.14 | 0.88 | 0.87 | 0.69 | 0.62 | 10.7 |
| rs10511666 | 9 | 18959422 | 9p22.1 | G | T | 0.67 | 0.11 | 0.07 | 0.23 | 0.13 | 15.6 |
| rs10764058 | 10 | 36094529 | 10p11.21 | G | T | 0.59 | 0.05 | 0.03 | 0.12 | 0.22 | 15.6 |
| rs2310798 | 11 | 30411932 | 11p14.1 | A | G | 0.64 | 0.13 | 0.07 | 0.14 | 0.30 | 11.1 |
| rs1863372 | 11 | 127428642 | 11q24.3 | A | C | 0.67 | 0.45 | 0.13 | 0.47 | 0.52 | 11.0 |
| rs4034627 | 12 | 126750571 | 12q24.32 | A | G | 0.78 | 0.05 | 0.03 | 0.24 | 0.18 | 11.0 |
| rs4131672 | 18 | 73378961 | 18q23 | C | G | 0.67 | 0.81 | 0.57 | 0.72 | 0.68 | 10.2 |

All markers with test score ≥10

Individual SNP analyses

Eight SNPs had p values of less than 10⁻⁴ in the regression analysis; they were located in chromosomal regions 1p22.3, 1p32.2, 4q31.1, 10q23.31, 11q14.1, 13q13.3 and 13q22.3 (Table 4). The smallest unadjusted p value was 1.3 × 10⁻⁵, for a marker in chromosomal region 4q31.1. After accounting for the 97,112 simultaneous hypothesis tests, none of the markers showed statistically significant association with disease status (Fig. 4). However, some truly associated markers may have failed to reach significance simply because of the low power of the study. To explore this possibility, we ranked the markers by the relative strength of their association in the regression and admixture mapping analyses, and used these rankings for the overlap analysis.
Table 4

Top markers from the regression analysis

| rs number | Chr. | Physical location | Cytoband | Allele 1 | Allele 2 | Freq. cases | Freq. controls | p value |
|---|---|---|---|---|---|---|---|---|
| rs1858556 | 1 | 85793742 | 1p22.3 | A | G | 0.63 | 0.80 | 0.00007 |
| rs10493777 | 1 | 85799621 | 1p22.3 | A | G | 0.36 | 0.19 | 0.00007 |
| rs1342382 | 1 | 56547306 | 1p32.2 | A | T | 0.06 | 0.20 | 0.00007 |
| rs1505863 | 4 | 140188929 | 4q31.1 | C | T | 0.65 | 0.82 | 0.00001 |
| rs1324705 | 10 | 91308184 | 10q23.31 | A | G | 0.50 | 0.67 | 0.00009 |
| rs2175961 | 11 | 81768155 | 11q14.1 | C | T | 0.06 | 0.21 | 0.00006 |
| rs2150468 | 13 | 36725888 | 13q13.3 | G | T | 0.81 | 0.91 | 0.00009 |
| rs679090 | 13 | 75345792 | 13q22.3 | C | T | 0.22 | 0.09 | 0.00003 |

All markers with p values ≤10⁻⁴

Fig. 4

Distribution of p values for 97,112 markers across the chromosomes from asthma case–control regression analysis

Overlap analysis

For the overlap analysis, we selected the highest-ranked SNPs from each of the three analytical methods: 81 SNPs with p values ≤0.001 from the regression analysis, 56 SNPs with p values ≤0.01 from the Admixmap analysis and 21 SNPs with a combined score of ≥10 from the likelihood ratio test. We then performed an overlap analysis on these SNPs and selected regions that were highly ranked by at least two of the three methods. We considered loci to be overlapping if at least two methods had highly ranked SNPs within 50 kilobases (kb) of each other. From these analyses, we identified five “overlap regions” in which the markers showed higher significance or were closer together than in the other overlap regions (Table 5).
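The 50-kb overlap rule can be sketched as a pairwise comparison of each method's list of top-ranked SNPs. The method names and data layout below are hypothetical, and the toy input uses only a few markers taken from Tables 2, 3 and 5 rather than the full lists of 81, 56 and 21 SNPs; grouping the hits into the five reported regions is not shown.

```python
from itertools import combinations

def overlapping_hits(top_hits, window_bp=50_000):
    """Find top-ranked SNPs from different methods lying within window_bp of each other.

    top_hits: {method name: [(chromosome, position_bp, rs_id), ...]}
    Returns sorted tuples (chromosome, rs_id_1, method_1, rs_id_2, method_2, distance_bp).
    """
    found = []
    for (m1, snps1), (m2, snps2) in combinations(sorted(top_hits.items()), 2):
        for chrom1, pos1, rs1 in snps1:
            for chrom2, pos2, rs2 in snps2:
                if chrom1 == chrom2 and abs(pos1 - pos2) <= window_bp:
                    found.append((chrom1, rs1, m1, rs2, m2, abs(pos1 - pos2)))
    return sorted(found)

# Toy input: a few markers from Tables 2, 3 and 5 (not the full top-SNP lists).
top_hits = {
    "admixmap": [(5, 132162193, "rs803054"), (13, 36309215, "rs1337604")],
    "lrt": [(5, 132162193, "rs803054"), (3, 160063925, "rs1112924")],
    "regression": [(13, 36309215, "rs1337604")],
}
result = overlapping_hits(top_hits)
# [(5, 'rs803054', 'admixmap', 'rs803054', 'lrt', 0),
#  (13, 'rs1337604', 'admixmap', 'rs1337604', 'regression', 0)]
```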
Table 5

The top five regions from the overlap analysis

| rs number | Chr. | Physical location | Cytoband | Allele 1 | Allele 2 | Admixmap | Test score | Regression |
|---|---|---|---|---|---|---|---|---|
| rs1517634 | 2 | 224386024 | 2q36.1 | A | G | 0.0068 | 12.0 | NS |
| rs803054 | 5 | 132162193 | 5q23.3 | A | G | 0.0047 | 10.8 | NS |
| rs10511666 | 9 | 18959422 | 9p22.1 | G | T | 0.0077 | 15.6 | 0.00097 |
| rs2310798 | 11 | 30411932 | 11p14.1 | A | G | 0.0044 | 11.1 | NS |
| rs1337604 | 13 | 36309215 | 13q13.3 | C | T | 0.0094 | NS | 0.00028 |

The last three columns give the p value from the Admixmap analysis, the test score from the likelihood ratio test on locus-specific ancestry estimates from Structure and the p value from the 100K regression analysis for each marker

NS indicates no significance for that method

To test for further significance within these candidate regions, we selected windows on either side of the overlap SNPs, and tested for enhanced significance in these windows. Windows consisted of sets of 500 markers from the 97,112 marker set for each of the five overlap regions, with markers chosen so that the overlap SNP was at the center of the window in terms of physical distance. Therefore, the window sizes varied (9–12 Mb) according to the density of the markers on the 100K arrays. We plotted the histograms of the p values from the regression analysis for the markers in these windows, as well as quantile–quantile (QQ) plots to test for deviation from the empirical p value distribution of all 97,112 markers tested. These plots indicated that the largest deviations from the empirical distribution were in the 5q23.3 and 13q13.3 regions (Fig. 5). These deviations could either be due to enhanced association between SNPs and disease status in these regions or could be due to local differences in ancestry since our regression and admixture mapping analyses were adjusted only for ancestry on a genome-wide level.
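A sketch of the window comparison: pick the 500 markers nearest to an overlap SNP on the same chromosome, then pair the window's sorted p values with the matching quantiles of all regression p values for a QQ plot on the −log10 scale. Choosing the nearest markers by base-pair distance is an assumption about how the window was centered, and the function names are hypothetical; points lying above the diagonal indicate an excess of small p values in the window.

```python
import numpy as np

def nearest_window(chroms, positions, center_chrom, center_pos, size=500):
    """Indices of the `size` markers on center_chrom closest (in bp) to center_pos."""
    chroms = np.asarray(chroms)
    positions = np.asarray(positions)
    same_chrom = np.flatnonzero(chroms == center_chrom)
    distance = np.abs(positions[same_chrom] - center_pos)
    return same_chrom[np.argsort(distance, kind="stable")[:size]]

def qq_points(window_p, all_p):
    """-log10 quantile pairs (genome-wide reference, window) for a QQ plot."""
    window_sorted = np.sort(np.asarray(window_p, dtype=float))
    probs = (np.arange(1, window_sorted.size + 1) - 0.5) / window_sorted.size
    reference = np.quantile(np.asarray(all_p, dtype=float), probs)
    return -np.log10(reference), -np.log10(window_sorted)
```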
Fig. 5

Plots showing histograms of p values for the markers from the top five regions from the overlap analysis and quantile-quantile (QQ) plots of these p values against all p values from the 100K regression analysis

Validation

To validate our findings from the overlap analysis, we selected one marker from each of the 500-marker windows in the 5q23.3 and 13q13.3 overlap regions. The markers rs1496348 and rs817737 had the best p values in the regression analysis within the 500-marker windows in the 5q23.3 and 13q13.3 overlap regions, respectively, and were therefore selected for validation. The two markers were genotyped in a sample of 284 Puerto Rican family trios with asthma. Since the initial genome-wide association analysis was performed on moderate to severe asthmatics, the transmission-disequilibrium test (TDT) was performed on all trios and also on the subset of trios with moderate to severe asthma. The TDT analysis suggested a positive association between SNP rs1496348 (5q23.3 region) and moderate to severe asthma (p value = 0.02, Table 6). Allele A of this marker was over-transmitted among trios in which the proband had moderate to severe asthma. The same allele was associated with moderate to severe asthma in the initial genome-wide association analysis, with a p value of 0.0007 (Table 6). The TDT result for SNP rs817737 in the 13q13.3 region was not significant for either asthma or moderate to severe asthma (Table 6).
Table 6

Validation analysis

| Marker/Phenotype | TDT: over-transmitted allele | TDT: no. of informative families | TDT: no. of transmissions | TDT: no. of non-transmissions | TDT: z score | TDT: p value | GWA: coeff. | GWA: z value | GWA: p value |
|---|---|---|---|---|---|---|---|---|---|
| rs1496348 (Chr. 5q23.3) | | | | | | | | | |
| Asthma | A | 152 | 195 | 109 | 0.00 | 1.00 | | | |
| Moderate to severe asthma | A | 24 | 35 | 13 | 2.27 | 0.02 | 1.02 | 3.39 | 0.0007 |
| rs817737 (Chr. 13q13.3) | | | | | | | | | |
| Asthma | T | 146 | 200 | 92 | 0.08 | 0.94 | | | |
| Moderate to severe asthma | T | 22 | 32 | 12 | 1.35 | 0.18 | 1.13 | 3.63 | 0.0003 |

The result of the TDT analyses for markers in the 5q23.3 and 13q13.3 regions. The table also shows the result of the initial genome-wide association (GWA) analysis for the same markers

Discussion

Genome-wide association studies are desirable since they do not require a priori knowledge of the genes involved in a disease or disease-related phenotypes and may also provide greater power than linkage-based methods for identifying common variants conferring modest risk (Hirschhorn and Daly 2005; Risch 2000). However, a potential downside to performing genetic association studies in recently admixed populations, such as Puerto Ricans, is the possibility for spurious associations (confounding) due to population stratification (Cardon and Palmer 2003; Devlin and Roeder 1999; Marchini et al. 2004; Ziv and Burchard 2003). To date, there have been several successful attempts at gene mapping for complex diseases, including asthma, by genome-wide association analysis, but none has done so in an admixed population (WTCCC 2007; Buch et al. 2007; Gudmundsson et al. 2007; Hafler et al. 2007; Hakonarson et al. 2007; McPherson et al. 2007; Moffatt et al. 2007; Samani et al. 2007; Saxena et al. 2007; Scott et al. 2007; Steinthorsdottir et al. 2007; Tomlinson et al. 2007; Winkelmann et al. 2007; Yeager et al. 2007; Zanke et al. 2007; Zeggini et al. 2007).

This is the first report of genome-wide association and admixture mapping analyses for asthma in Puerto Ricans. Our small sample size limited our initial efforts to identify disease-associated loci from marker-by-marker test statistics. Even so, our validation analysis indicates that we could identify and validate a region previously associated with asthma by combining the strengths of two distinct but complementary analytical methods: direct genome-wide association and admixture mapping. Direct genome-wide association analysis can be performed in any population to identify disease-associated genetic variants, both ethnic-specific and cosmopolitan. Admixture mapping, by contrast, detects genetic variants in recently admixed populations that are responsible for racial differences in disease risk. Compared with direct genome-wide association designs, admixture mapping exploits the long-range linkage disequilibrium present in recently admixed populations and therefore requires fewer markers (McKeigue 1998; Montana and Pritchard 2004). It is also more robust to allelic heterogeneity (Terwilliger and Weiss 1998). Neither our admixture mapping nor our genome-wide association analyses showed significant association after correction for multiple testing. Nevertheless, some of the top-scoring markers from the different analyses are likely true associations that did not reach statistical significance because of the study's low power. The overlap analysis, which combined the results of admixture mapping and genome-wide association analysis, identified two regions, 5q23.3 and 13q13.3, as the most promising candidates for asthma in Puerto Ricans. However, this analysis may have missed other regions that one analytical method captures better than the other.
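The overlap analysis itself is defined in the paper's Methods, not in this section. The hypothetical sketch below illustrates the general idea under several stated assumptions:

- each method yields one p value per marker for a single chromosome, in map order;
- the k top-scoring markers from each method are retained;
- markers are grouped into consecutive, non-overlapping 500-marker windows.

Windows that hold top markers from both methods are flagged. In each flagged window, the marker with the lowest regression p value is picked for follow-up, mirroring how the validation markers were chosen. The function names, k and the toy data are illustrative only.

```python
WINDOW = 500  # markers per window (assumed consecutive and non-overlapping)


def top_markers(pvalues, k):
    """Indices of the k markers with the smallest p values."""
    return set(sorted(range(len(pvalues)), key=lambda i: pvalues[i])[:k])


def overlap_windows(gwa_p, adm_p, k, window=WINDOW):
    """Map each window holding top-k hits from BOTH methods to its best GWA marker.

    gwa_p: per-marker p values from the genome-wide association regression
    adm_p: per-marker p values from admixture mapping, same markers and order
    """
    if len(gwa_p) != len(adm_p):
        raise ValueError("both analyses must score the same markers in the same order")
    gwa_windows = {i // window for i in top_markers(gwa_p, k)}
    adm_windows = {i // window for i in top_markers(adm_p, k)}
    flagged = {}
    for w in sorted(gwa_windows & adm_windows):
        start, stop = w * window, min((w + 1) * window, len(gwa_p))
        # Follow-up candidate: lowest regression p value inside the window
        flagged[w] = min(range(start, stop), key=lambda i: gwa_p[i])
    return flagged


# Toy data: 1,500 markers (three windows) with two strong hits per method
gwa = [0.5] * 1500
adm = [0.5] * 1500
gwa[100], gwa[700] = 1e-4, 1e-5
adm[900], adm[1200] = 1e-5, 1e-4
print(overlap_windows(gwa, adm, k=2))  # {1: 700}
```

Requiring agreement between methods trades sensitivity for specificity. A region that only one method captures well is not flagged.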

For a complex disease like asthma, successful identification of causative factors will require large, well-characterized samples, such as those used in other successful genome-wide association studies of complex diseases. In addition, many different factors, genetic or environmental, may contribute to the development and severity of asthma. Although every effort was made to ensure homogeneity of the phenotype, different individuals in our study may still have different genetic factors contributing to disease.

One lesson of our analysis is that association studies in admixed populations should include adjustment for ancestry. We have previously shown evidence of substantial confounding from population stratification in association studies of asthma in Latino populations (Choudhry et al. 2006). Here, we confirm those findings using a much larger set of AIMs (n = 2,730). This suggests that any association study of asthma in Latino populations should test and correct for population stratification. In addition, we have identified AIMs for Latino populations on the Affymetrix 100K arrays. These may help future investigators who intend to perform admixture mapping or genome-wide association studies in Latino populations, and they complement three recent reports of Latino admixture mapping panels (Mao et al. 2007; Price et al. 2007; Tian et al. 2007).
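AIMs are commonly chosen by the allele-frequency difference (δ) between ancestral populations. The sketch below is a hypothetical illustration of that idea, not the selection criteria used in this study. It estimates allele frequencies from 0/1/2 genotype calls in each reference population. It then keeps SNPs whose largest pairwise δ reaches an assumed threshold of 0.5. Population labels, SNP names, genotypes and the threshold are placeholders.

```python
from itertools import combinations


def allele_frequency(genotypes):
    """Frequency of the counted allele from 0/1/2 allele-count genotypes; None = missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


def select_aims(freqs_by_snp, threshold=0.5):
    """Keep SNPs whose largest pairwise allele-frequency difference (delta) >= threshold.

    freqs_by_snp: {snp_id: {population_label: allele_frequency}}, at least two populations
    Returns {snp_id: (delta, pop1, pop2)} for the retained SNPs.
    """
    aims = {}
    for snp, freqs in freqs_by_snp.items():
        delta, pop1, pop2 = max(
            (abs(freqs[a] - freqs[b]), a, b) for a, b in combinations(sorted(freqs), 2)
        )
        if delta >= threshold:
            aims[snp] = (delta, pop1, pop2)
    return aims


# Hypothetical reference genotypes for one SNP in three placeholder populations
genotypes = {
    "pop_1": [2, 2, 1, 2, None],
    "pop_2": [0, 0, 1, 0, 0],
    "pop_3": [0, 1, 0, 0, 1],
}
freqs = {"snp_x": {pop: allele_frequency(g) for pop, g in genotypes.items()}}
print(select_aims(freqs))  # snp_x is kept; its largest delta (about 0.78) is pop_1 vs pop_2
```

With three founding populations, a marker can be informative for one pair of populations but not another. That is why this sketch records which pair produced the largest δ.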

The excess of low p values, beyond what is expected by chance, in the 5q23.3 and 13q13.3 chromosomal regions (Fig. 5) may suggest the existence of asthma susceptibility loci in these regions in the Puerto Rican population. Multiple genome-wide linkage studies of asthma and related atopic phenotypes have been performed in ethnically diverse populations. Their results varied across studies and across racial and ethnic groups. Even so, regions 5q and 13q are among the few most frequently replicated regions in these studies (Koppelman et al. 2002; Ober et al. 1998, 2000; Wiltshire et al. 1998; Xu et al. 2000). Notably, these regions contain the asthma candidate genes IL4, IL13 and TGFβ, which have been studied extensively (Bartram and Speer 2004; Basehore et al. 2004; Battle et al. 2007; Camoretti-Mercado and Solway 2005; Howard et al. 2002; Li et al. 2007; Marsh et al. 1994). They also contain several new potential asthma candidate genes that warrant further investigation, including IL3, IL5, IL9, SMAD5, IRF1, FBN2 and SMAD9. Furthermore, our genome-wide association analysis suggests other regions with low p values that may be associated with asthma, including 1p22.3, 1p32.2, 4q31.1, 10q23.31, 11q14.1 and 13q22.3 (Table 4). Some of these regions have been linked with asthma in previous studies (Colilla et al. 2003; Haagerup et al. 2004; Hoffjan and Ober 2002; Huang et al. 2003; Mathias et al. 2006; Pillai et al. 2006; Postma et al. 2005). The detection of causative variants in these regions will require fine mapping and functional validation.
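One simple way to quantify an "excess of low p values" in a region is a binomial comparison. Count the markers with p below a significance level α and compare that count with the binomial expectation nα. The sketch below does this under an assumption that does not hold for dense SNP data: that markers are independent. Linkage disequilibrium between neighboring markers inflates the apparent excess, so the binomial tail is only a rough screen. The function names, the α = 0.05 default and the toy window are illustrative, and this is not necessarily the procedure behind Fig. 5.

```python
import math


def binomial_upper_tail(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(math.comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k, n + 1))


def excess_low_p(pvalues, alpha=0.05):
    """Observed vs expected count of p values below alpha in one window.

    Assumes independent markers; LD between nearby SNPs makes this anti-conservative.
    Returns (observed, expected, binomial upper-tail probability of the observed count).
    """
    n = len(pvalues)
    observed = sum(p < alpha for p in pvalues)
    return observed, n * alpha, binomial_upper_tail(observed, n, alpha)


# Hypothetical 500-marker window in which 45 markers have p < 0.05 (25 expected)
window = [0.01] * 45 + [0.5] * 455
observed, expected, tail = excess_low_p(window)
print(f"observed {observed}, expected {expected:.0f}, P(X >= observed) = {tail:.2g}")
```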

Our results provide strong evidence that the previously linked 5q23 region is associated with asthma in Puerto Ricans. By finding an association in a previously identified region, we provide a “proof of principle” for genome-wide association studies in admixed populations. Our study underscores the value of applying complementary analytical techniques to admixed populations to uncover genetic risk factors for complex traits.

Acknowledgments

The authors would like to acknowledge the families and the patients for their participation. We would also like to thank the numerous health care providers and community clinics for their support and participation in the GALA Study. Finally, we would especially like to thank Jeffrey M. Drazen, M.D., Scott Weiss, M.D., Ed Silverman, M.D., Ph.D., Homer A. Boushey, M.D., Jean G. Ford, M.D. and Dean Sheppard, M.D. for all of their efforts towards the creation of the GALA Study. This work was supported by the National Institutes of Health (HL078885), RWJ Amos Medical Faculty Development Award, NCMHD Health Disparities Scholar, Extramural Clinical Research Loan Repayment Program for Individuals from Disadvantaged Backgrounds, 2001–2003 (to EGB); American Thoracic Society “Breakthrough Opportunities in Lung Disease” (BOLD) Award and Tobacco-Related Disease Research Program New Investigator Award (15KT-0008) (to SC); and the Sandler Center for Basic Research in Asthma, the Sandler Family Supporting Foundation and the Flight Attendant Medical Research Institute (FAMRI).

Supplementary material

Supporting material: 439_2008_495_MOESM1_ESM.xls (XLS, 686 kb)

Copyright information

© Springer-Verlag 2008