Background

Stroke is the third leading cause of death in adults. The 2 basic types of stroke are ischemic stroke and hemorrhagic stroke. In ischemic stroke, the most common type, a profound disturbance of focal cerebral blood flow leads to irreversible parenchymal injury. The Siblings With Ischemic Stroke Study (SWISS) is a multicenter affected sibling pair study with the aim of identifying chromosomal regions linked to ischemic stroke by using genome-wide scanning. Family history and twin studies support the existence of genetic susceptibility to stroke [1–4]. Mendelian disorders known to be associated with an increased risk of stroke include hemoglobinopathies, dyslipoproteinemias, and cardioembolic disorders [5]. Most known Mendelian stroke disorders present in infancy, childhood, or young adulthood and collectively represent only a small proportion of all stroke cases. Several of these Mendelian disorders were recognized as unique genetic diseases because of striking phenotypic features, such as corneal opacities and angiokeratomas of the skin in Fabry disease.

Defining the genetic basis for stroke syndromes that lack striking phenotypic features is a more difficult task. Model-dependent linkage analysis has been used in large pedigrees with diseases such as cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL) [6–9]. However, traditional linkage analysis is unlikely to be the most expedient method of finding novel stroke-susceptibility genes when carrier status cannot be defined on the basis of distinctive clinical, radiographic, or laboratory features.

One popular method of identifying genetic risk factors has been the candidate gene association study, in which investigators compare rates of one or more variant polymorphisms of a candidate gene among stroke cases and stroke-free controls. Identifying risk factors depends on selecting the right candidate genes, a daunting task because the human genome harbors about 30,000 genes. A candidate gene is usually selected because the gene product might relate to pathogenesis of disease.

Numerous studies have used a candidate gene approach to define genetic risk factors for stroke, but so far results for several categories of candidate genes have been negative or conflicting. For example, because about 80% of strokes are caused by thrombotic occlusion of a blood vessel, genes related to the coagulation system would seem logical candidates for susceptibility to stroke. However, despite association of the factor V G1691A (factor V Leiden) and prothrombin (factor II) G20210A mutations with venous thromboembolism and myocardial infarction [10, 11], neither mutation is strongly associated with risk of stroke [12–18]. Although a study of British adults found elevated levels of serum homocysteine to be associated with an increased risk of stroke [19], a case-control study of a common polymorphism (methylenetetrahydrofolate reductase [MTHFR] C677T) that results in increased serum homocysteine concentrations found no difference between patients with stroke and controls in either genotype or allele frequency [20]. Because antiplatelet agents with different mechanisms of action can bring about significant reductions in stroke risk, several platelet receptor genes have been tested as candidate stroke susceptibility genes [21, 22]. To date, however, no compelling evidence for an association between any platelet receptor gene polymorphism and risk of stroke has been found.

Studies of whether genes related to myocardial infarction or arterial disease also confer stroke risk have yielded conflicting results, for example for specific genotypes of the genes for angiotensin I-converting enzyme (ACE) [23–26] or apolipoprotein [27–34]. Potential confounding factors include the effects of comorbidity and differential mortality rates [34, 35]. Although the degree of stenosis of the cervical internal carotid arteries in symptomatic patients correlates with risk of ipsilateral stroke [36], no clear conclusions can be drawn from attempts to relate carotid artery disease to specific genotypes (for example, of the paraoxonase gene PON, which is thought to protect low-density lipoprotein against oxidative modification [37], or of the endothelial nitric oxide synthase gene [38]).

Thus, results of the candidate gene approach have so far failed to define genetic risk factors for stroke. Furthermore, if important functional mutations arise in noncoding regions that are not in significant linkage disequilibrium with the site of a screened polymorphism, the association analysis may miss the true disease susceptibility locus.

A genome-wide scanning approach in sibling pairs may expedite discovery of novel risk factor genes. The basic goal of the genome-wide scan with microsatellite markers is to identify chromosomal regions linked to a disease phenotype by determining whether polymorphisms in the microsatellite markers segregate with disease within a cohort of pedigrees. Microsatellites are not functional; they are noncoding regions of DNA that allow identification of chromosomal regions held in common by members of a pedigree.

The collection of sibling pairs and analysis of the mean proportion of alleles shared identical (by descent or by state), using a highly polymorphic panel of genetic markers, has become a standard protocol for detecting linkage of a disease susceptibility locus to a chromosomal region. The technique has been applied to a broad range of other disorders, including multiple sclerosis [39–42], Alzheimer disease [43–45], type 1 diabetes [46], type 2 diabetes [47], asthma [48], and systemic lupus erythematosus [49]. Such linkage-mapping strategies offer the advantages of model independence, computational speed, and systematic identification of novel loci [39].
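As a minimal illustration of the sharing statistic described above, the Python sketch below counts the alleles two siblings share identical by state (IBS) at one microsatellite marker and averages the proportion over all pairs. The genotype encoding (pairs of fragment sizes in base pairs), the example data, and the function names are hypothetical. The sketch handles IBS only; estimating identity by descent without parental genotypes requires allele-frequency-based methods such as those in MAPMAKER/SIBS, which are not reproduced here.

```python
from collections import Counter


def ibs_shared(genotype_a, genotype_b):
    """Return the number of alleles (0, 1, or 2) two individuals share identical by state.

    Each genotype is a pair of allele labels at one marker, e.g. (148, 152).
    """
    # Multiset intersection keeps the minimum count of each allele, so a
    # homozygote (150, 150) shares only one allele with a (150, 154) heterozygote.
    common = Counter(genotype_a) & Counter(genotype_b)
    return sum(common.values())


def mean_proportion_shared(pairs):
    """Mean proportion of alleles shared IBS, accumulated over all sibling pairs at one marker."""
    total_shared = sum(ibs_shared(a, b) for a, b in pairs)
    return total_shared / (2 * len(pairs))


# Hypothetical genotypes for three concordant sibling pairs at one marker.
pairs = [
    ((148, 152), (148, 156)),  # shares 1 allele
    ((150, 150), (150, 154)),  # shares 1 allele
    ((152, 156), (152, 156)),  # shares 2 alleles
]
print(mean_proportion_shared(pairs))  # 4 / 6 = 0.666...
```

Because chance matches of common alleles also count toward IBS sharing, a pair's IBS proportion is never lower than its IBD proportion, so IBS results should not be judged against the null IBD expectation of 0.5.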

Thus, linkage analysis using a genome-wide scan may yield positive results more efficiently than testing candidate genes a few at a time. However, we do not consider these approaches to be mutually exclusive. Genome-wide linkage analysis may guide selective candidate gene evaluations within regions of importance. The goal of SWISS is to help to identify the chromosomal regions that should be searched for candidate genes.

The paucity of genome-wide scanning studies in the stroke literature to date is mainly due to theoretical and logistical factors that make such studies difficult to design. Ischemic stroke itself represents a heterogeneous phenotype. Various systems have been used to classify subtypes of ischemic stroke [50]. For example, the Trial of ORG10172 in Acute Stroke Treatment (TOAST) investigators classified stroke into large-artery atherosclerosis, cardioembolism, small-vessel occlusion, stroke of other etiology, and stroke of undetermined etiology [51]. Uncertainty exists as to whether the clinical heterogeneity of the ischemic stroke phenotype relates to heterogeneity in genetic risk factors.

Logistically, the collection of a large number of sibling pairs concordant for stroke is a daunting task. Stroke affects an elderly population and carries a modest case fatality rate. Patients may be rendered incompetent to consent to a genetics study by the stroke itself. Often, members of a sibship are separated by large geographical distances. In a preliminary study, we found that, whereas 1 in 10 patients with stroke reported having a living affected sibling, only 1 in 50 had an affected sibling living in the same city as the proband [52].

SWISS is designed to overcome these hurdles as far as possible. The purpose of this paper is to describe the SWISS protocol in detail.

Research Design and Methods

Aim

The aim of SWISS is to test the hypothesis that the human genome contains chromosomal regions associated with ischemic stroke by means of genome-wide scanning in DNA samples collected from 300 sibling pairs concordant for ischemic stroke and from 200 discordant siblings.

Definition of ischemic stroke and its subtypes

Stroke is defined according to World Health Organization criteria [53] as rapidly developing signs of a focal or global disturbance of cerebral function, with symptoms lasting 24 hours or longer or leading to death with no apparent cause other than vascular origin. Patients are classified as having an ischemic stroke if they had computed tomographic or magnetic resonance imaging of the brain done within 7 days of onset of symptoms that either identified the symptomatic cerebral infarct or failed to identify an alternative cause of the symptoms. Classification of strokes into subtypes is done according to the validated TOAST diagnostic criteria [51]. Subtype diagnosis is made on the basis of available and relevant information obtained up to 3 months after the stroke, because initial subtype diagnosis varies from final diagnosis in approximately one-third of cases [54].

Study population

Three groups of subjects will be studied: probands, concordant siblings, and discordant siblings (Fig. 1).

Figure 1

Sample study pedigree for the Siblings With Ischemic Stroke Study (SWISS). Solid symbols indicate ischemic stroke; CS, concordant sibling; and DS, discordant sibling.

Probands

Probands are adult men and women who 1) have a diagnosis of at least 1 ischemic stroke confirmed by the study neurologist, 2) report having at least 1 living full sibling with a history of stroke, and 3) have attained their 18th birthday at the time of enrollment in the study. If probands have had more than one ischemic stroke, the most recent is the proband index stroke. Probands are not excluded from the study for radiographic evidence of hemorrhagic transformation of an ischemic stroke.

Probands are not enrolled if any of the following conditions apply: 1) The index stroke is presumed to be iatrogenic – that is, onset of symptoms occurred within 48 hours after an invasive cerebrovascular or cardiovascular procedure, such as coronary artery bypass grafting, a catheter-based procedure on carotid or coronary arteries, carotid endarterectomy, heart valve surgery, or thoracic or thoracoabdominal aortic aneurysm repair. 2) The index stroke is presumed to be due to vasospasm after nontraumatic subarachnoid hemorrhage – that is, the onset of symptoms occurred within 60 days after the onset of a nontraumatic subarachnoid hemorrhage. Virtually all delayed cerebral ischemia occurs 5 to 21 days after subarachnoid hemorrhage [55, 56]. 3) The index stroke is presumed to be due to an autoimmune condition – that is, the patient has a history of brain-biopsy-proven central nervous system vasculitis. 4) The patient is known to have any of the following single-gene or mitochondrial disorders recognized by a distinctive phenotype: CADASIL, Fabry disease, homocystinuria, mitochondrial encephalopathy with lactic acidosis and stroke-like episodes (MELAS), or sickle cell anemia. Probands with these disorders are excluded because their enrollment might confound the genome scan for novel risk factors. 5) The patient had a mechanical aortic valve or a mechanical mitral valve at the time of index stroke onset. This criterion was chosen because of the high likelihood that ischemic stroke is iatrogenic in such patients. 6) The patient had untreated or actively treated bacterial endocarditis at the time of index stroke onset.
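The inclusion and exclusion rules for probands can be expressed as a single screening predicate. The following Python sketch assumes a hypothetical `ProbandScreen` record whose field names are illustrative and do not correspond to the SWISS case report forms; it also assumes that "within 48 hours" and "within 60 days" include the boundary values. It is a reading aid for the criteria, not a substitute for the study neurologist's judgment.

```python
from dataclasses import dataclass, field
from typing import Optional

# Single-gene or mitochondrial disorders named in exclusion criterion 4.
EXCLUDED_DISORDERS = frozenset(
    {"CADASIL", "Fabry disease", "homocystinuria", "MELAS", "sickle cell anemia"}
)


@dataclass
class ProbandScreen:
    """Hypothetical screening record; field names are illustrative, not the SWISS CRFs."""
    age_years: int
    ischemic_stroke_confirmed: bool          # by the study neurologist
    living_full_siblings_with_stroke: int    # as reported by the patient
    hours_after_invasive_procedure: Optional[float] = None  # procedure -> index stroke onset
    days_after_nontraumatic_sah: Optional[float] = None     # SAH onset -> index stroke onset
    biopsy_proven_cns_vasculitis: bool = False
    known_disorders: set = field(default_factory=set)
    mechanical_aortic_or_mitral_valve: bool = False
    bacterial_endocarditis_at_onset: bool = False  # untreated or actively treated
    # Hemorrhagic transformation is deliberately not a field: it does not exclude.


def proband_eligible(p: ProbandScreen) -> bool:
    # Inclusion criteria 1-3.
    if not p.ischemic_stroke_confirmed:
        return False
    if p.living_full_siblings_with_stroke < 1 or p.age_years < 18:
        return False
    # Exclusion 1: presumed iatrogenic stroke (onset within 48 h of a procedure).
    if p.hours_after_invasive_procedure is not None and p.hours_after_invasive_procedure <= 48:
        return False
    # Exclusion 2: presumed vasospasm (onset within 60 days of nontraumatic SAH).
    if p.days_after_nontraumatic_sah is not None and p.days_after_nontraumatic_sah <= 60:
        return False
    # Exclusions 3 and 4: CNS vasculitis or a listed single-gene/mitochondrial disorder.
    if p.biopsy_proven_cns_vasculitis or p.known_disorders & EXCLUDED_DISORDERS:
        return False
    # Exclusions 5 and 6: mechanical valve or bacterial endocarditis at onset.
    if p.mechanical_aortic_or_mitral_valve or p.bacterial_endocarditis_at_onset:
        return False
    return True


example = ProbandScreen(age_years=67, ischemic_stroke_confirmed=True, living_full_siblings_with_stroke=1)
print(proband_eligible(example))  # True
```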

Concordant Siblings

To be enrolled as a concordant sibling, the subject must have a full sibling enrolled as a proband in SWISS. Other eligibility criteria for concordant siblings are identical to those of probands. Both proband and concordant sibling must be at least 18 years old at the time of enrollment and both must meet the same definition of ischemic stroke. For concordant siblings, the diagnosis of ischemic stroke is verified retrospectively by the Stroke Verification Committee (SVC). This is a central, genotype-blinded committee of study-appointed neurologists (Appendix), which adjudicates the diagnosis and subtype of ischemic stroke for concordant siblings, using standardized, prespecified criteria. Although the subtype of the sibling's index ischemic stroke is determined, enrollment is not restricted to siblings with the same ischemic stroke subtype as probands.

Discordant Siblings

Inclusion criteria for discordant siblings are as follows: 1) The subject has attained his or her 18th birthday at the time of enrollment. 2) The subject has 2 or more full siblings who each have had an ischemic stroke and who are participating in the study. 3) The subject reports having no medical history of stroke or transient ischemic attack (TIA) and denies ever having had symptoms of stroke. Because a SWISS proband might erroneously believe that a sibling never had a stroke, discordance is considered verified only if the sibling can be contacted for a structured telephone interview and gives negative answers to all 8 items on the Questionnaire for Verifying Stroke-Free Status (QVSFS) (Table 1) [57, 58]. Discordant siblings are excluded if they are deemed unreliable historians in the opinion of the interviewer administering the QVSFS on the basis of global impression of moderate or severe impairment of speech, language, hearing, or memory.

Table 1 Questionnaire for Verifying Stroke-Free Status (QVSFS)

Recruitment goals

We aim to enroll at least 300 concordant sibling pairs (300 probands plus 300 concordant siblings) and 200 discordant siblings (800 total study subjects). Because it is likely that not all concordant siblings will actually participate, more than 300 probands will be enrolled to obtain DNA from 300 concordant sibling pairs.

Study procedures

Table 2 summarizes the procedures for enrolling subjects into the study and obtaining blood samples for DNA analysis.

Table 2 Summary of Study Procedures

Phase I. Enrolling Probands and Recruiting Siblings

Screening and enrollment will take place at 50 participating centers in the United States and Canada (Appendix). At each center, a study neurologist screens all patients with a possible diagnosis of ischemic stroke to identify potential SWISS probands, orders or reviews medical tests pertinent to the diagnosis and subtyping of ischemic stroke as part of routine clinical practice, and makes a new diagnosis or confirms a previous diagnosis of ischemic stroke in a potential proband. A certified study neurologist classifies the final subtype of the index stroke according to TOAST criteria [51]. To obtain certification, the neurologist reads the original manuscript describing the TOAST classification system and scores a series of patients presented in standardized clinical vignettes according to TOAST criteria. The scores are compared with reference values generated by a consensus of the SVC. The investigator receives feedback on any deviations from reference values and is required to review the TOAST classification system and retake the test.

The local coordinator or study neurologist conducts a face-to-face interview with patients who meet enrollment criteria to obtain their medical history and to explain the study. If patients agree to participate in the study, they sign and date 2 copies of the informed consent form, retaining 1 copy for themselves. The local coordinator completes the proband case report forms (CRFs), assigns a SWISS study number to the proband, forwards the proband CRFs to the Clinical Coordinating Center, and gives the proband (or surrogate) a set of study invitation letters to be sent to all of his or her living full siblings. In the letter, siblings are asked to indicate whether they are interested in participating in SWISS by completing the contact information section and sending it to the Clinical Coordinating Center. The Center assigns SWISS numbers to all siblings who provide contact information.

Phase II. Verifying Concordance and Discordance

The goal of Phase II is to confirm that phenotyping of siblings is accurate.

Discordance is confirmed in Phase IIA. The Clinical Coordinating Center contacts potentially discordant siblings who provide contact information, obtains verbal consent for a brief telephone interview, administers the QVSFS (Table 1), and obtains a standardized medical history in a structured telephone interview. Siblings who give negative answers to the QVSFS medical history items but who give a positive response to 1 or more of the review-of-symptoms items are advised to inform their primary care physician of their symptoms so that they can be evaluated accordingly. Siblings who respond positively to QVSFS item 1 advance to Phase IIB. If all of the QVSFS items are negative, the sibling is considered a verified discordant sibling. The discordant sibling CRFs are completed during the telephone interview, and the Clinical Coordinating Center sends 2 copies of the Informed Consent Form (ICF) to the verified discordant sibling, who returns 1 signed copy to the Center and retains 1 copy. Verified discordant siblings advance to Phase III of the study.
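The Phase IIA routing rules can be summarized as a small decision function. In this Python sketch, the assignment of items 1 and 2 to the medical-history section and items 3 to 8 to the review-of-symptoms section is an assumption, because Table 1 is not reproduced here; the function name and return labels are likewise hypothetical. The text does not say what happens to a sibling who answers item 1 negatively but another history item positively, so the sketch simply marks that sibling as not verified.

```python
# Assumption: items 1 and 2 are the QVSFS medical-history items and items 3-8
# are review-of-symptoms items. Adjust HISTORY_ITEMS to match the actual Table 1.
HISTORY_ITEMS = frozenset({1, 2})
ALL_ITEMS = frozenset(range(1, 9))


def triage_qvsfs(answers, history_items=HISTORY_ITEMS):
    """Route a potentially discordant sibling after the Phase IIA interview.

    answers maps QVSFS item number (1-8) to True for a positive ("yes") answer.
    """
    if set(answers) != ALL_ITEMS:
        raise ValueError("answers to all 8 QVSFS items are required")
    if answers[1]:
        return "advance to Phase IIB"  # possible concordant sibling
    if not any(answers.values()):
        return "verified discordant sibling"  # proceeds to consent and Phase III
    if not any(answers[i] for i in history_items):
        # History items negative, but at least one symptom item positive.
        return "not verified; advise primary care follow-up"
    return "not verified as discordant"


answers = {item: False for item in range(1, 9)}
print(triage_qvsfs(answers))  # verified discordant sibling
```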

Concordance is confirmed in Phase IIB. The Clinical Coordinating Center sends potentially concordant siblings a Request for Medical Records Form (RMRF) and 2 copies of the informed consent form to sign, date, and return. The RMRF is a slightly modified study-specific version of the official form used by Mayo Clinic for routine patient care. Subjects return 1 copy of the signed form in a pre-addressed, postage-paid envelope provided with the original form and retain the second copy. The Clinical Coordinating Center uses the signed form to request medical records pertaining to the sibling index stroke. The Center constructs a file of medical records in a standardized, subdivided sequence (hospital admission notes and discharge summaries; neurologic consultation notes; reports of computed tomographic and magnetic resonance imaging of the head; reports of imaging of the heart by transthoracic and transesophageal echocardiography; copies of electrocardiograms; reports of imaging of cervicocephalic vasculature by angiography using conventional, computed tomographic, or magnetic resonance techniques or by ultrasonography; and reports of blood work).

The Clinical Coordinating Center submits completed files on potentially concordant siblings to the SVC on a weekly basis. A neurologist member of the SVC reviews the files and attempts to confirm the diagnosis of ischemic stroke, using a standard stroke work-up checklist to assist with and document a systematic review of the medical records. The SVC may instruct the Clinical Coordinating Center to secure additional medical records if the initial set fails to provide sufficient evidence to confirm the diagnosis of ischemic stroke. If the SVC neurologist cannot confirm the diagnosis of stroke, the potentially concordant sibling does not advance in the study. If concordance is confirmed, the SVC neurologist classifies the TOAST stroke subtype, completes the CRFs for the concordant sibling, and forwards the forms to the Clinical Coordinating Center.

Although as many as 10% of the concordant siblings in SWISS may have a history of 2 or more strokes, the SVC confirms the diagnosis and classifies the subtype of ischemic stroke only for the most recent stroke for which there are records sufficient to confirm the diagnosis (the sibling index stroke). The verified concordant sibling then advances to Phase III.

Phase III. Acquiring Blood for Genetic Analysis

Blood samples are taken only when a study pedigree is complete, i.e., clinical data and ICF are available from 1 proband and at least 1 verified concordant sibling, with or without 1 verified discordant sibling. If the diagnosis of ischemic stroke cannot be verified for any sibling of a proband, the clinical data from that proband are saved, but no blood samples are collected. When a pedigree is complete, the Clinical Coordinating Center instructs the home health agency to collect blood samples from all pedigree members. A phlebotomist from the home health agency visits the subjects at their homes, obtains a blood sample, and ships it to the DNA Bank.
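The completeness rule that gates blood collection can be sketched as follows. The `Member` record, its fields, and the study IDs are hypothetical, and capping the sample at 1 discordant sibling per pedigree is our reading of the phrase "with or without 1 verified discordant sibling" together with the resource limit noted in the Discussion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """Hypothetical pedigree record; role is 'proband', 'concordant', or 'discordant'."""
    study_id: str
    role: str
    verified: bool         # diagnosis (or stroke-free status) verified
    consent_on_file: bool  # signed informed consent form received


def members_to_sample(pedigree: List[Member]) -> List[Member]:
    """Return the members to bleed if the pedigree is complete, else an empty list."""
    ready = [m for m in pedigree if m.verified and m.consent_on_file]
    probands = [m for m in ready if m.role == "proband"]
    concordant = [m for m in ready if m.role == "concordant"]
    discordant = [m for m in ready if m.role == "discordant"]
    if len(probands) != 1 or not concordant:
        return []  # incomplete: clinical data are kept, but no blood is collected
    # At most 1 verified discordant sibling is bled per pedigree.
    return probands + concordant + discordant[:1]


pedigree = [
    Member("P-001", "proband", True, True),
    Member("S-002", "concordant", True, True),
    Member("S-003", "discordant", True, True),
    Member("S-004", "discordant", True, True),
]
print([m.study_id for m in members_to_sample(pedigree)])  # ['P-001', 'S-002', 'S-003']
```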

Phase IV. Genome-wide Scan

The DNA Bank creates cell lines and notifies the Genetics Laboratory when 300 concordant sibling pair specimens are ready for analysis. The Genetics Laboratory then performs the genome-wide scan.

Measures of outcome

The primary outcome is the degree of linkage between the stroke phenotype and genetic markers as measured by the proportion of alleles shared by concordant sibling pairs (accumulated over all pairs at each marker).

Clinical database

For each proband, we collect name, date of birth, gender, race, home address, home phone number, e-mail address, and alternative contact information. We record the enrolling investigator's study number and the study center number to ensure accurate attribution of efforts and to make it possible to verify entries in CRFs with source documents. Data are collected on stroke risk factors and medical history, date of onset of stroke symptoms, TOAST stroke subtype, and the total number of living full siblings.

The following information is collected on all living full siblings who return sibling response letters: name, date of birth, gender, name of the proband they are related to, twin status, home address, home phone number, e-mail address, alternative contact information, and standardized risk factor and medical history.

In addition, for each concordant sibling, we record date of review of outside medical records, a stroke work-up checklist addressing medical reports reviewed by the physician member of the SVC who confirms stroke concordance, date of onset of index stroke (and of first stroke, if sibling had more than one), TOAST subtype of index stroke, and responses to all items contained in the QVSFS. For discordant siblings, we record responses to all items contained in the QVSFS.

Genotyping

Local centers receive blood shipping kits, including a Vacutainer for blood, by mail at the start of SWISS, and the Clinical Coordinating Center will restock the supply on a continuing basis. Used kits are shipped overnight to the DNA Bank for processing. Lymphoblastoid cell lines will be generated from peripheral blood leukocytes and DNA extracted using routine methods. DNA analysis will begin after the 300th concordant sibling pair is enrolled, which we anticipate to be at the end of year 4. At that time, the DNA Bank will ship at least 50 μg of DNA to the Genetics Laboratory.

At the Genetics Laboratory, the DNA will be plated onto 384-well plates for marker genotyping. The ABI Genescan/Genotyper system will be employed in semiautomated fluorescent genotyping, comparing fragment sizes to an internal standard of CEPH DNA. An ABI377 with 96 wells generates the marker data. All genotypes will be scored blind to phenotype. Two hundred thirty-seven microsatellite markers, obtained from Genethon, CHLC, and GDB (106 di-, 21 tri-, and 110 tetranucleotides), will be typed in all sibling pairs. These 237 markers have been sorted into 30 panels. We will run 92 samples per gel (with 4 lanes for controls); with an estimated ~920 samples typed for 30 panels, 300 gels will be needed. Allowing for reruns and data loss, we estimate that 400 gels will be required to complete this task and extract more than 90% of the genetic data. The average distance between adjacent markers in this panel series is 16.3 cM (1–40 cM). Average heterozygosity will be calculated. The CRI-MAP program will determine intermarker distances and will also be used to form the study-specific genetic map. A genotype database (Megabase) will be used to check the binning of alleles, convert allele sizes to whole numbers, and (where possible) to test for non-Mendelian inheritance. Megabase will store all relevant genotypic/phenotypic data and produce all files needed for statistical analysis.
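The gel estimate above follows from simple arithmetic: 96 lanes minus 4 control lanes leaves 92 sample lanes per gel, and each of the 30 marker panels must be run on every sample. A minimal Python sketch, with a hypothetical function name:

```python
import math


def gels_needed(n_samples, n_panels, lanes_per_gel=96, control_lanes=4):
    """Gels required to type every sample once with every marker panel."""
    samples_per_gel = lanes_per_gel - control_lanes  # 96 - 4 = 92 sample lanes
    gels_per_panel = math.ceil(n_samples / samples_per_gel)
    return n_panels * gels_per_panel


print(gels_needed(920, 30))  # 10 gels per panel x 30 panels = 300
```

With about 920 samples this gives 10 gels per panel and 300 gels in total. The protocol's figure of 400 gels adds roughly one third to that baseline for reruns and data loss; that margin is the authors' allowance, not something the arithmetic derives.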

Cell lines

We regard banking of samples to be a key element of this study. Collection of clinical samples is expensive and time consuming, and it is probable that progress in identifying genes involved in stroke will be incremental. For genes of smaller effect, very large sample sizes are likely to be needed. Having these resources available will ensure that future work can build effectively on the work we present here. Epstein-Barr virus-transformed lymphoblastoid cell lines will be used.

Statistical methods

For the concordant sibling pair design, the proportion of alleles shared by the concordant sibling pairs (accumulated over all pairs at that marker) is the statistic that determines evidence for linkage between the stroke phenotype and the genetic marker. If only concordant sibling pairs are collected, at most 4 distinct alleles can be observed at a marker, which is also the most that 2 parents can carry, so nonpaternity cannot be detected. The third (discordant) sibling is collected for purposes of determining potential nonpaternities in a sibship (more than 4 alleles) and for better estimating the proportion of alleles shared that are identical by state (in the absence of parents).
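The nonpaternity check rests on a counting argument: 2 parents carry at most 4 distinct alleles at an autosomal marker, so full siblings can never show more than 4 between them. The Python sketch below applies that rule to hypothetical genotypes with hypothetical function names. A flag indicates possible nonpaternity but could also reflect a mutation, a sample mix-up, or a genotyping error, and a sibship with 4 or fewer alleles is not thereby proven to consist of full siblings.

```python
def distinct_alleles(sibship_genotypes):
    """Set of distinct allele labels observed across all siblings at one marker."""
    return {allele for genotype in sibship_genotypes for allele in genotype}


def flags_possible_nonpaternity(sibship_genotypes):
    """True if the sibship shows more alleles than 2 parents could have transmitted."""
    return len(distinct_alleles(sibship_genotypes)) > 4


pair = [(148, 152), (156, 160)]  # 4 alleles: a pair alone can never be flagged
trio = pair + [(164, 148)]       # a third sibling reveals a 5th allele
print(flags_possible_nonpaternity(pair), flags_possible_nonpaternity(trio))  # False True
```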

Traditional applications of gene mapping have used families in which the trait (disease) is transmitted in a clearly Mendelian fashion. For more complex traits, the inheritance pattern does not fit a single-gene model, and methods that assume a genetic model may provide erroneous results [59]. Ischemic stroke clearly demonstrates familial aggregation, yet no single-gene model of transmission is consistent with the family data. In this project, we propose to use model-independent (relative pair) analysis, a method that is designed to detect linkage without the specification of an underlying genetic model and that is robust to contributions by environmental variation. The methods of analysis for determination of risk factor loci will mainly use the SPLINK and MAPMAKER/SIBS programs [60, 61].

Estimates of power to detect linkage

For concordant sibling pair studies, Risch [62, 63] demonstrated that the risk ratio $\lambda_R = K_R/K_P$ for a type-R relative, where $K_R$ is the recurrence risk in relatives of type R and $K_P$ is the population prevalence, can be used to model the probable modes of transmission for a complex disease. Under a given model, the value of $\lambda_R$ should decrease in a model-specific manner with each decreasing degree of unilineal relationship, and this expected value can then be contrasted with recurrence risks obtained from a set of relatives (monozygotic twins, dizygotic twins, siblings, offspring, second-degree relatives, etc.). For a single-locus model, $\lambda_R - 1$ should decrease by a factor of 2 with each degree of relationship, whereas a multiplicative model predicts $\lambda_R$ as the product of the individual locus-specific risk ratios.
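The relationships in the preceding paragraph can be written out directly. This Python sketch assumes that `degree` counts unilineal relationships starting from 1 for parent and offspring (siblings, being bilineal, fall outside the halving rule); the function names are hypothetical.

```python
from math import prod


def risk_ratio(recurrence_risk_in_relatives, population_prevalence):
    """lambda_R = K_R / K_P."""
    return recurrence_risk_in_relatives / population_prevalence


def single_locus_lambda(lambda_first_degree, degree):
    """lambda_R for a unilineal relative of the given degree under a single-locus model.

    (lambda_R - 1) halves with each additional degree of relationship.
    """
    return 1 + (lambda_first_degree - 1) / 2 ** (degree - 1)


def multiplicative_lambda(locus_lambdas):
    """Overall lambda_R under a multiplicative multilocus model: product over loci."""
    return prod(locus_lambdas)


print([single_locus_lambda(3.0, d) for d in (1, 2, 3)])  # [3.0, 2.0, 1.5]
first_degree = multiplicative_lambda([1.5, 1.5])          # 2.25
second_degree = multiplicative_lambda([single_locus_lambda(1.5, 2)] * 2)  # 1.25 * 1.25 = 1.5625
print(first_degree, second_degree)
```

In the multiplicative example, $\lambda_R - 1$ falls from 1.25 to 0.5625 between first- and second-degree relatives, faster than the drop to 0.625 that a single locus with the same first-degree risk ratio would give; contrasts of this kind are what allow recurrence risks in different relatives to discriminate between models.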

Risch [63] extended the approach of Suarez et al [64] to include any relative pair. On the basis of this formulation, the power to detect linkage can be obtained for relative (sibling) pairs. For concordant sibling pairs, assuming that the candidate locus is near a stroke susceptibility locus (θ > 0), power depends upon λs (sibling recurrence risk ratio) and λo (offspring recurrence risk ratio). If there is little dominance effect, then λs = λo, and hence the power can be computed on the basis of the sibling recurrence risk ratio alone. For other pairs of relatives, Risch [63] has shown that the single parameter λo is sufficient to specify power (together with θ, if θ > 0). Recurrence risk data in relatives are sparse for stroke. Data from the Framingham Study suggest that a reasonable estimate of λs for stroke may range from 2 to 5.
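To show how λs and λo enter the power calculation, the sketch below computes the expected identity-by-descent distribution (z0, z1, z2) for an affected sibling pair at a fully informative marker with θ = 0, using the relations z0 = 1/(4λs), z1 = λo/(2λs), and z2 = λm/(4λs), with 4λs = 1 + 2λo + λm, as we understand Risch's affected-relative-pair formulation [63]. Here λm denotes the locus-specific monozygotic-twin risk ratio; the function names are hypothetical, and the λ values are locus-specific rather than overall recurrence risk ratios.

```python
def asp_ibd_distribution(lambda_s, lambda_o=None):
    """Expected IBD sharing probabilities (z0, z1, z2) for an affected sib pair at theta = 0.

    lambda_s, lambda_o: locus-specific sibling and offspring risk ratios.
    If lambda_o is omitted, no dominance is assumed (lambda_o == lambda_s).
    """
    if lambda_o is None:
        lambda_o = lambda_s
    # Locus-specific MZ-twin risk ratio implied by 4*lambda_s = 1 + 2*lambda_o + lambda_m.
    lambda_m = 4 * lambda_s - 1 - 2 * lambda_o
    z0 = 1 / (4 * lambda_s)
    z1 = lambda_o / (2 * lambda_s)
    z2 = lambda_m / (4 * lambda_s)
    return z0, z1, z2


def expected_mean_sharing(lambda_s, lambda_o=None):
    """Expected mean proportion of alleles shared IBD: z1 / 2 + z2."""
    _, z1, z2 = asp_ibd_distribution(lambda_s, lambda_o)
    return z1 / 2 + z2


for lam in (1.0, 1.6, 2.0, 3.0):
    print(lam, asp_ibd_distribution(lam), expected_mean_sharing(lam))
```

Without linkage (λs = 1) the expected sharing is 0.5; it rises to about 0.59 at λs = 1.6 and to 0.625 at λs = 2. The deviation from 0.5 is what the concordant sibling pair statistic detects.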

We assume that the genetic markers used have polymorphic information content (PIC) equivalent to that of an equiprobable 4-allele system, yielding a PIC of about 70%. The sample size required to achieve a given power is inversely proportional to the PIC of the markers; thus, a sample of 300 concordant sibling pairs genotyped at a marker with PIC of 70% would be equivalent to a fully informative marker typed on 210 concordant pairs. Our power calculations therefore assume markers with incomplete information, and the initial analyses will comprise pairwise analyses. Application of multipoint (interval) mapping methods will further increase power [65]; formal analyses with MAPMAKER/SIBS or GeneHunter will add power by means of the multipoint method. Using this approach, we have estimated power for a set of 300 sibling pairs concordant for stroke (equivalent to 210 pairs with fully informative markers but without parents). With these estimates, we should have ample power to detect linkage between a marker and a moderately strong susceptibility locus, especially for locus-specific sibling risk ratios greater than 3.
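The figure of about 70% can be checked with the standard PIC definition of Botstein and colleagues, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The Python sketch below evaluates it for 4 equally frequent alleles and applies the proportional scaling described above to obtain the effective number of fully informative pairs; the function names are hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    heterozygosity = 1 - sum(p * p for p in freqs)
    return heterozygosity - sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))


def effective_pairs(n_pairs, pic):
    """Fully informative-marker equivalent of n_pairs typed at a marker with the given PIC."""
    return n_pairs * pic


print(polymorphic_information_content([0.25] * 4))  # 0.703125, i.e. about 70%
print(effective_pairs(300, 0.70))                     # about 210 pairs
```

Four equiprobable alleles give a PIC of 0.703125, and 300 pairs at a PIC of 0.70 correspond to about 210 fully informative pairs, matching the figures in the text.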

For a homogeneous single disease susceptibility locus, the power to detect linkage with our expected 300 concordant sibling pairs generally approaches 100% (Table 3). If stroke susceptibility is attributable to several loci, the risk becomes dependent upon the nature of the contributions (additive or epistatic), and the loci are more difficult to identify. Recent efforts utilizing analysis of genome scan data conditional on the evidence for linkage of a major susceptibility factor show promise [66]. With 300 concordant sibling pairs, we have over 70% power to detect linkage with a genome-wide significance of P = 0.00022 and a locus-specific risk ratio of λs = 1.6. As analyses of discordant sibling pairs in other diseases have demonstrated, we may have substantial power to detect linkage using this complementary analytic approach, so that the addition of even 100 discordant siblings to the concordant sibling pairs may provide additional insight on linkage.

Table 3 Power to Detect Linkage as a Function of λS, θ, and Type I Error Rate (α)
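The power figures above rest on calculations that are not shown. As a rough, single-point illustration only, the Python sketch below approximates the power of a one-sided test on mean IBD sharing with a normal approximation, assuming a fully linked locus (θ = 0), no dominance (λo = λs), and the proportional PIC scaling described earlier. `mean_test_power` is a hypothetical function, not the method used to produce Table 3, and because it ignores multipoint information it should not be expected to reproduce those values.

```python
from math import sqrt
from statistics import NormalDist


def mean_test_power(lambda_s, n_pairs, alpha, pic=1.0):
    """Approximate power of a one-sided mean IBD sharing test for affected sib pairs.

    Assumes theta = 0, no dominance (lambda_o == lambda_s), and that a marker with
    PIC < 1 behaves like n_pairs * pic fully informative pairs.
    """
    z1 = 0.5                                  # lambda_o / (2 * lambda_s) with lambda_o == lambda_s
    z2 = (2 * lambda_s - 1) / (4 * lambda_s)  # lambda_m / (4 * lambda_s), lambda_m = 2*lambda_s - 1
    # Per-pair sharing proportion takes values 0, 1/2, 1 with probabilities z0, z1, z2.
    mean_alt = z1 / 2 + z2
    var_alt = z1 / 4 + z2 - mean_alt ** 2
    var_null = 1 / 8                          # IBD distribution (1/4, 1/2, 1/4) under no linkage
    n_eff = n_pairs * pic
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Reject when the mean exceeds 0.5 + z_alpha * sqrt(var_null / n_eff).
    shift = sqrt(n_eff) * (mean_alt - 0.5) - z_alpha * sqrt(var_null)
    return NormalDist().cdf(shift / sqrt(var_alt))


print(round(mean_test_power(1.6, 300, 0.00022, pic=0.70), 2))
```

The final line evaluates the scenario quoted in the text (λs = 1.6, 300 pairs, P = 0.00022, PIC of 70%) under these simplified single-point assumptions.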

Discussion

It is clearly important to define genetic risk factor loci for stroke. Defining such loci should eventually enable us to identify prospectively those who are at high risk for the disease and to counsel and treat them based on this knowledge; in addition, these risk factor loci may help in the identification of new drug targets for effective treatment of this prevalent disease. However, special safeguards must be in place to protect the rights of subjects involved in genetic research. If genetic information is improperly safeguarded, misuse could adversely influence insurability [67] and employability [68] of subjects and could stigmatize individuals or groups [69]. For this study, we have adopted many of the practical suggestions of Merz and colleagues [70] regarding the ethical use of human tissue. For linking purposes, we use special study-specific codes, rather than medical record numbers, Social Security numbers, or an easily decoded combination of initials and birth dates. Access to linking files is restricted to an as-needed basis only, and the files are deleted when they are no longer needed. The study mandates a strictly unidirectional flow of information. This means that clinical data are used for research purposes, but research data are not used for clinical purposes. This restriction on the use of the SWISS data set is based on our recognition that unique clinical obligations accompany the provision of predictive genetic test results to individual patients [71].

Recently, there has been intense debate in the United States at the federal level over the privacy rights of pedigree members with respect to genetic research. For this study, we have adopted the position that every member of a pedigree has the right to refuse to have personal information such as name, address, and medical history recorded in a research database. This right cannot be waived by other members of the pedigree [72]. Therefore, probands or their noninvestigator surrogates invite other family members to participate. Siblings who are interested in participating voluntarily provide investigators with their name, address, and telephone number.

Identifying the genetic basis of diseases with complex modes of inheritance is a daunting task, and it is likely that a variety of approaches will be necessary to elucidate the genetic basis of human stroke. Genomics has been applied to animal models of stroke and yielded discoveries about loci that correlate with stroke risk. For example, Rubattu and colleagues [73] identified 3 major quantitative trait loci in the F2 cross of stroke-prone and spontaneously hypertensive rats. Conducting adequately powered genomics studies of stroke in humans is considerably more challenging.

The central approach of this study is to identify regions of the genome that siblings affected by stroke share identical by descent more often than expected by chance. The strength of this approach is its theoretical robustness: the methods make relatively few assumptions about the genetic architecture of the population in which the disease is studied or about disease risk. Its weaknesses are its relatively low power and the low resolution of the linkage peaks it identifies. In addition, obtaining DNA samples from clinically well-characterized members of a cohort of stroke pedigrees is logistically difficult.
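
The idea of excess sharing can be made concrete with the simple "mean test" for affected sibling pairs. Under no linkage, full siblings share 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, and 1/4. The proportion of alleles shared per pair therefore has mean 1/2 and variance 1/8. The sketch below assumes that IBD counts are known exactly at a single locus (in practice they are estimated from marker data) and uses a normal approximation. It is illustrative only, not the analysis software used in SWISS, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def asp_mean_test(ibd_counts):
    """One-sided mean test for excess IBD sharing in affected sibling pairs.

    ibd_counts: alleles shared IBD (0, 1, or 2) at one locus, one entry per
    concordant pair. Assumes IBD status is known exactly.
    """
    n = len(ibd_counts)
    pi_hat = sum(ibd_counts) / (2 * n)           # mean proportion shared IBD
    # Under no linkage the per-pair proportion has mean 1/2, variance 1/8.
    z = (pi_hat - 0.5) / math.sqrt(1 / (8 * n))
    p = 1 - NormalDist().cdf(z)                  # one-sided: excess sharing
    return pi_hat, z, p
```

Take 100 hypothetical pairs, of which 15, 50, and 35 share 0, 1, and 2 alleles. They give a mean sharing of 0.60 and z = 2√2 ≈ 2.83. A nominal single-locus P value such as this would still have to be judged against a genome-wide significance threshold.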

Although our primary analytic methods will focus on the concordant pair design, we will also use a complementary and related approach, the discordant sibling pair method. This method searches for regions of the genome that discordant siblings (one affected, one unaffected) share less often than expected by chance. Discordant siblings are generally easier to collect because, in a disease such as stroke, they are more frequent than concordant siblings. As demonstrated in the case of diabetic nephropathy [74], the discordant sibling pair approach can have greater power than the concordant sibling pair approach in certain situations. However, the discordant sibling pair approach is often less efficient than the concordant sibling pair approach because the final diagnostic status of the unaffected sibling is uncertain: discordant siblings might have subclinical (radiographic) stroke or be discordant only because of their relative youth. Furthermore, resource limitations in this study prevent the collection of blood from more than 1 discordant sibling for each concordant pair.
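
The discordant-pair statistic mirrors the concordant-pair mean test. Under linkage, mean IBD sharing in discordant pairs falls below 1/2. The sketch below tests for that deficit, assuming exactly known IBD counts at one locus and a normal approximation. It also uses a simple two-component mixture to show why an "unaffected" sibling who later has a stroke weakens the signal: such a pair behaves like a concordant pair and pulls the observed mean sharing back toward or above 1/2. The function names, the mixture model, and the sharing values in the example are hypothetical.

```python
import math
from statistics import NormalDist

def dsp_deficit_test(ibd_counts):
    """One-sided test for reduced IBD sharing in discordant sibling pairs."""
    n = len(ibd_counts)
    pi_hat = sum(ibd_counts) / (2 * n)
    # Null: proportion shared per pair has mean 1/2 and variance 1/8.
    z = (pi_hat - 0.5) / math.sqrt(1 / (8 * n))
    p = NormalDist().cdf(z)          # small when sharing falls below 1/2
    return pi_hat, z, p

def diluted_mean_sharing(pi_discordant, pi_concordant, frac_misclassified):
    """Expected mean sharing when a fraction of 'discordant' pairs are truly
    concordant (the unaffected sibling will eventually have a stroke)."""
    return ((1 - frac_misclassified) * pi_discordant
            + frac_misclassified * pi_concordant)
```

Suppose the true mean sharing is 0.45 in discordant pairs and 0.60 in concordant pairs. Misclassifying 20% of discordant pairs then raises the expected mean from 0.45 to 0.48, which shrinks the deficit the test can detect from 0.05 to 0.02.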

An alternative to the affected-relative-pair design is genome-wide association. In this design, subjects with ischemic stroke (cases), ascertained without regard to family history, are compared with nonstroke controls for the frequencies of numerous densely spaced genetic markers (single nucleotide polymorphisms, or SNPs). Although cases and controls are easier to collect than concordant sibling pairs, this approach also has limitations. First, the case-control approach is not robust to differences in environmental exposure, so heterogeneity between case and control exposures could produce spurious associations. Second, unlike in the concordant sibling pair approach, controls are "unaffected" only at the time of collection. The control sample actually represents a mixture of subjects who will never have an ischemic stroke and subjects who will eventually have one. Thus, true associations can be missed because of misclassification. Third, the number of subjects needed to reach genome-wide significance in an association study remains large, given the number of SNPs that must be genotyped to provide 10-kb to 50-kb spacing across the genome. To address the multiple-comparisons problem posed by tens to hundreds of thousands of SNPs, 10,000 to 20,000 cases and controls would need to be studied.
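
Simple arithmetic shows the scale of the multiple-comparisons problem. Assuming a haploid genome of roughly 3 × 10^9 bp, 10-kb spacing implies about 300,000 SNPs and 50-kb spacing about 60,000 SNPs. A Bonferroni correction at a family-wise α of 0.05 therefore requires per-test P values near 1.7 × 10^-7 and 8.3 × 10^-7, respectively. The sample-size function applies the standard normal-approximation formula for comparing two proportions to allele counts. The genome length, the control allele frequency, the odds ratio, the 80% power, and the treatment of each person's two alleles as independent (Hardy-Weinberg equilibrium) are all illustrative assumptions, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

GENOME_BP = 3_000_000_000  # assumption: ~3 Gb haploid human genome

def bonferroni_threshold(spacing_bp, alpha=0.05, genome_bp=GENOME_BP):
    """Number of evenly spaced SNPs and the Bonferroni per-test threshold."""
    n_snps = genome_bp // spacing_bp
    return n_snps, alpha / n_snps

def cases_per_group(p_control, odds_ratio, alpha_per_test, power=0.8):
    """Approximate individuals per group for a two-sided allelic 2 x 2 test.

    Assumes one causal SNP, a multiplicative allelic odds ratio, and two
    independent alleles per person (Hardy-Weinberg equilibrium).
    """
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)          # risk-allele freq in cases
    p_bar = (p_case + p_control) / 2
    z_a = NormalDist().inv_cdf(1 - alpha_per_test / 2)
    z_b = NormalDist().inv_cdf(power)
    alleles = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_b * math.sqrt(p_case * (1 - p_case)
                                 + p_control * (1 - p_control))) ** 2 \
        / (p_case - p_control) ** 2
    return math.ceil(alleles / 2)                 # two alleles per individual
```

Under these assumptions, a risk allele with a control frequency of 0.20 and an allelic odds ratio of 1.3 already requires a few thousand individuals per group at 10-kb spacing. Smaller effects push the requirement higher.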

Association studies in a case-control framework have a long history in research on stroke etiology. However, these studies have been problematic because, until now, they have allowed testing only of known candidate genes, and they have been plagued by false positives. With the completion of the first draft of the human genome sequence and the construction of haplotype maps, this approach may be adaptable to a whole-genome association strategy that uses prudently positioned SNPs and the DNA samples banked in SWISS.

Much of the theoretical challenge of finding genetic risk factors comes from the heterogeneity of the stroke phenotype. In SWISS, we are studying the ultimate phenotype of ischemic stroke. An alternative approach would be to study an intermediate phenotype, such as carotid atherosclerosis, in the hope that alleles underlying this phenotype play a role in the etiology of the stroke itself. Shifting the object of investigation from ischemic stroke to carotid atherosclerosis would reduce the number of subjects needed. Another advantage is that carotid atherosclerosis can be treated as a quantitative trait. However, using carotid atherosclerosis as an intermediate phenotype for ischemic stroke also has limitations. Perhaps the most important difficulty is that high-grade carotid atherosclerosis is not a prerequisite intermediate phenotype in most patients with ischemic stroke. The large-artery atherosclerotic subtype of ischemic stroke accounted for only 13% of all ischemic strokes in a recent population-based study in Bavaria, Germany [75]. Similarly, the large-artery atherosclerotic subtype accounted for only 16% of all ischemic strokes in a population-based study in Rochester, Minnesota [76]. Carotid atherosclerosis is not a mandatory intermediate phenotype in the biological process leading to ischemic stroke, and it is also itself a complex phenotype. Degree of stenosis is not the only factor affecting the symptomatology of an atherosclerotic carotid artery [77]. Moreover, important differences between intracranial and cervical arteries in the genetics of atherosclerosis may go unrecognized if the object of study is cervical carotid atherosclerosis [78].
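
Haseman-Elston regression is one standard way to analyze a quantitative trait in sibling pairs. The squared difference in trait values within each pair is regressed on the estimated proportion of alleles the pair shares IBD at a marker. A negative slope suggests linkage, because siblings who share more alleles near a trait locus tend to have more similar trait values. The minimal sketch below fits ordinary least squares and omits the significance test. The trait (for example, a carotid atherosclerosis measure such as intima-media thickness), the toy data, and the function name are hypothetical, and the sketch is not part of the SWISS analysis plan.

```python
def haseman_elston(trait_pairs, pi_hat):
    """Return (intercept, slope) from regressing squared sib-pair trait
    differences on estimated IBD proportions at one marker.

    trait_pairs: [(trait_sib1, trait_sib2), ...]
    pi_hat: estimated proportion shared IBD for each pair (0 to 1)
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]   # squared trait differences
    n = len(y)
    mx = sum(pi_hat) / n
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in pi_hat)
    sxy = sum((x - mx) * (v - my) for x, v in zip(pi_hat, y))
    slope = sxy / sxx                             # negative slope suggests linkage
    return my - slope * mx, slope
```

To complete the analysis, a one-sided test of whether the slope is below zero would be carried out at each marker.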

Our study directly addresses the heterogeneity of the ischemic stroke phenotype through the study-wide use of a standardized, validated, and widely accepted system for classifying ischemic strokes by presumed etiology. The certification process minimizes variation among physician investigators at the local centers in assigning the TOAST stroke subtype to probands. The central adjudication process minimizes variation in assigning the TOAST subtype to concordant siblings. Although defining concordance by ischemic stroke subtype may prove valuable in future studies, we regard that approach as exploratory at this stage.

One way of dealing with the logistical challenges inherent in genetic studies is to collect cases from an isolated population, such as that of Iceland [79]. However, this approach requires an integrated health care database that is not readily available within the US health care model, and the required community consent is probably not attainable within the US legal and ethical framework [80, 81]. Logistical difficulties have been mitigated by the steady evolution of multicenter clinical trials in stroke research [82]. SWISS is designed as a multicenter clinical trial, and the use of a nationwide home health agency for phlebotomy enables us to obtain blood from siblings who live far from one another. Requiring study subjects to travel to a local study center could prevent enrollment for logistical reasons. A home health agency, by contrast, can obtain blood from study subjects rendered homebound by stroke or other ailments. Thus, we hope that this study design will efficiently assemble a cohort of ischemic stroke pedigrees without invoking community consent or resorting to "cold-calling" of pedigree members.

In conclusion, we believe that the DNA samples collected in this study will be useful not only for defining regions of the genome in which stroke genes reside but also for testing candidate genes and regions for allelic association using SNPs. Alleles that predispose to disease should have a higher frequency in siblings who share the relevant chromosomal region than in siblings who do not. Because the cell lines are banked, we hope to facilitate the identification of stroke risk factor genes both by our own group and by others.
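
The allelic-association idea in the closing paragraph can be sketched as a comparison of risk-allele frequencies between two groups: siblings who share a linked region IBD and siblings who do not. The comparison uses a Pearson chi-square test on a 2 × 2 table of allele counts. This is a simplified illustration with a hypothetical function name and made-up counts. It treats alleles as independent and so ignores relatedness within pairs and families; a real analysis would likely need a method that accounts for that correlation.

```python
import math
from statistics import NormalDist

def allele_freq_contrast(risk_share, total_share, risk_noshare, total_noshare):
    """Compare risk-allele frequency in IBD-sharing vs non-sharing siblings.

    Arguments are allele counts (not people). Simplification: alleles are
    treated as independent, ignoring relatedness within pairs.
    Returns (freq_sharing, freq_nonsharing, chi2, two_sided_p).
    """
    table = [[risk_share, total_share - risk_share],
             [risk_noshare, total_noshare - risk_noshare]]
    row = [total_share, total_noshare]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = total_share + total_noshare
    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    chi2 = sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(2) for j in range(2))
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))  # chi-square with 1 df
    return risk_share / total_share, risk_noshare / total_noshare, chi2, p
```

With 60 risk alleles out of 200 in the sharing group and 40 out of 200 in the non-sharing group (frequencies of 0.30 vs 0.20), the statistic is 16/3 ≈ 5.33. Because the prediction is directional, a one-sided comparison could also be justified.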