Introduction

Birth defects are common, costly, and critical. Although individual birth defects may be rare, collectively about 3 % of US births are affected by a major structural or chromosomal birth defect [1]. Hospital costs for birth defects in the year 2004 were $2.6 billion [2]; the physical and emotional costs of birth defects, while difficult to quantify, are also high. Birth defects can lead to lifelong disability and are the leading cause of infant mortality in the USA [3].

Although genetic factors contribute to birth defect risk, environmental (i.e., non-genetic) factors influence the risk for birth defects as well. Some of these environmental risk factors are recognized as strong teratogens, such as maternal rubella infection [4] and use of the medications thalidomide or isotretinoin during pregnancy [5]. The identification of environmental teratogens provides an opportunity for prevention activities. Maternal folic acid intake during the periconceptional period is one of the most successful examples of an intervention to prevent birth defects. Randomized controlled trials demonstrated the role of folic acid in preventing neural tube defects (NTDs) [6–8], and a 26 % decrease in the birth prevalence of NTDs was observed in the USA after folic acid fortification of enriched cereal grain products, which was mandated in the late 1990s [9]. It has been estimated, however, that among pregnancies affected by birth defects, the cause is unknown for two thirds [10], and in these instances, there is no clear opportunity for prevention. Therefore, continued efforts to identify modifiable risk factors for birth defects are important.

Conducting research to identify modifiable risk factors for birth defects is difficult, however, for a variety of reasons. While some challenges are familiar to researchers across many disciplines, other issues affecting birth defect research may not be well understood by those outside of the field. This article describes several methodological challenges to the study of birth defects and ways these challenges might be addressed.

Ascertainment, Definition, and Classification of Birth Defect Cases

Case Ascertainment

Studies of modifiable risk factors for birth defects often rely on data from population-based surveillance systems to identify cases. These surveillance systems should capture birth defects occurring among multiple pregnancy outcomes (i.e., live births, stillbirths, pregnancy terminations), and studies should include cases from all of these outcomes. The distribution of pregnancy outcomes varies widely across different birth defects; the prevalence of birth defects for which termination or stillbirth is a more common pregnancy outcome can be substantially underestimated if only live births are ascertained. Results from studies based only on live births can be biased if the risk factor of interest is related to the stillbirth or termination risk in birth defect cases; in that situation, a design limited to live births will under-ascertain cases and underestimate the true association [11]. Data are likely to be of higher quality when the surveillance system uses either active case ascertainment (characterized by case finding and medical record abstraction, in addition to abstraction of passively reported cases, and clinical review) or passive case ascertainment with added clinical review and verification using administrative information (e.g., hospital discharge codes). Exclusively passive systems with limited data sources often over-ascertain some subtypes of birth defects and under-ascertain others.

The rarity of individual birth defects makes the case-control methodology an efficient option for research. However, cohorts have also been assembled through extensive, longitudinal linkages of national registries, such as the Nordic Perinatal Bereavement Cohort [12], and large pregnancy follow-up studies, such as the Norwegian Mother and Child Cohort Study (MoBa) [13]. In the USA, the Collaborative Perinatal Project has been an influential cohort, albeit with limited numbers for assessing risk factors for rare conditions such as structural birth defects [14, 15]. There have been recent attempts to simulate population-based cohorts for birth defect research in the USA using large data linkage collaborations across several health maintenance organizations, such as the Medication Exposure in Pregnancy Risk Evaluation Program (MEPREP) [16]; these linked data capture clinical information recorded in electronic medical records, as well as prescription drug claims.

Case Definition and Classification

The concepts of case definition and case classification are often interwoven and difficult to completely tease apart. A case definition is a uniform set of criteria used to decide whether an infant or fetus has a birth defect that will be included in a study; it is intended to increase the likelihood that included cases have the defect of interest. Case definitions are typically developed before a study begins and will include components such as age at diagnosis and method of diagnosis, although these may not be straightforward (e.g., which defects to include based on prenatal ultrasound only) [17]. Epidemiologic studies typically include only birth defects considered major (e.g., open spina bifida, but not spina bifida occulta [18]). Some minor defects may represent the milder end of a spectrum and thus have the same risk factors as corresponding major defects; however, there is evidence that minor defects are inconsistently ascertained [19], so exclusion criteria for minor defects should be established as part of the case definition [20]. Defects may also be excluded based on the known pathogenesis of the defect. For example, in a risk factor study for clubfoot, the case definition would typically exclude individuals with both clubfoot and spina bifida because the clubfoot is thought to be secondary to the interruption of motor signals from the spinal cord (and thus diminished in utero movement of the extremity) and is not intrinsic to the foot. In general, reliance on International Classification of Diseases (ICD) codes as the only determinant of case status can lead to inappropriate inclusion or exclusion of cases [21, 22]. These codes may not be sufficiently specific to define an appropriate case group. For example, the ICD-9 code 745.5 encompasses both atrial septal defect (ASD) and patent foramen ovale (PFO); ASD is typically considered a major birth defect, while PFO is not [23, 24]. Further, differential diagnoses in the medical record to “rule out” possible diagnoses may mistakenly be coded as final diagnoses. The use of verbatim descriptions of the diagnoses and the results of clinical tests and procedures to determine individual case status is essential in overcoming many of these challenges [25].

Case classification includes grouping defects into meaningful categories for analysis. Case classification is important in risk factor studies because the inclusion of infants with different underlying causes of their defect(s) may bias the magnitude of an observed association toward the null [26]. In some instances, defect categories may be combined, which allows for larger case groups, if it is plausible that the defects share a common etiology. However, commonly grouped defects can have associated risk factors that are not shared. For example, although neural tube defects are often analyzed as a single defect category, different risk factors have been observed for anencephaly and spina bifida, the two primary defects included in this category [27]. Classifying individual congenital heart defects can be particularly challenging due to their complexity [28]. The issue of determining when to “lump” birth defects into a single category and when to “split” them out into different categories has long been a challenge for researchers [26, 29]. Although researchers should strive for homogeneity within their case groups, ultimately each case of a birth defect will have unique characteristics, and meaningful grouping of some type must occur in order to conduct analyses. Because of the overall heterogeneity of birth defects, analyses of “all birth defects” or even analyses by body system (e.g., cardiac, gastrointestinal) are unlikely to yield informative results.

Birth defects may occur in isolation or in combination with defects in other body systems, often referred to as multiple birth defects (or multiple congenital anomalies). Case classification also involves determining if an infant has the birth defect as an isolated occurrence, as part of multiple birth defects, or as a component of a syndrome (i.e., a recognizable pattern of defects that is known or presumed to have a specific cause, such as a single-gene disorder). Previous studies have identified that the etiology of isolated defects can differ from the etiology of multiple defects [30, 31]. Simply using the number of recognized defects to assign a case into a multiple birth defect category is not valid because infants can have one primary major birth defect with one or more secondary major defects as sequence events (e.g., spina bifida with clubfoot); can have more than one defect in a given body system (e.g., multiple cardiac defects), which may or may not be appropriate to analyze separately; or can have minor birth defects. With thoughtful programming, classification by computer algorithms can be useful to identify infants with isolated birth defects or infants with some known syndromes; however, the classification of infants with multiple birth defects, which can be a substantial proportion of cases, is best done by individual case review.

In risk factor studies, infants whose defects are consistent with syndromes of known etiology (e.g., single-gene or chromosomal disorder) should probably be excluded. Identifying syndromic cases is challenging and often limited to excluding infants with existing syndrome diagnoses. Among infants with multiple defects, certain patterns may be identified (e.g., VATER association [32]), and these patterns can be assessed in relation to specific exposures, potentially leading to the identification of a previously unrecognized syndrome or phenotype (e.g., embryopathy related to mycophenolate mofetil exposure [33]). The examination of patterns among birth defect associations can help in exploring the biological plausibility of findings. A related concept is the grouping of cases (with either isolated or multiple defects) by presumed pathogenic mechanisms (e.g., vascular etiology) when testing the association of particular birth defects with exposures that have known biologic effects (e.g., nicotine and vasoconstriction) [34, 35].

The lack of a standard process for defining and classifying birth defects has made comparisons of results across different epidemiologic studies difficult. To address these issues, researchers can agree on a set of standards and analyze their combined data accordingly. A successful example of this practice is the multi-site National Birth Defects Prevention Study (NBDPS), for which clinical geneticists and experts in pediatric cardiology have developed and published a system of case classification built upon previous efforts of others [36–38] to ensure homogeneous case groups [17, 39]. The adoption of standard case definitions and methods for case review and classification will facilitate the replication of study findings in different study populations.

Exposure Assessment of Modifiable Risk Factors

Embryogenesis occurs in the first 8 weeks of pregnancy, and most birth defects develop in the first trimester [40]. Therefore, exposures during the first weeks of pregnancy are most relevant for assessing risk factors for birth defects. Accurately ascertaining early pregnancy exposures to modifiable risk factors is a fundamental challenge of birth defect research. It is difficult to identify women just before or in the first few weeks of pregnancy; in the USA, approximately half of pregnancies are unplanned [41], and many pregnancies are not recognized until the end of the first trimester [42]. Therefore, prospective data collection typically relies on health care databases. Examples of such databases include records from health care visits [43] and pharmacies [44]. Exposure data can also be collected retrospectively by asking the mother, after the pregnancy has ended, about her exposures during pregnancy.

There are many advantages of using health care databases to ascertain first trimester exposures: the data are routinely collected and thus can be more cost-efficient to use. Also, since the data are recorded prior to the diagnosis of a birth defect, these data are not subject to recall bias, which is often a concern with retrospective exposure ascertainment in a case-control study. A major challenge of using health care databases is that typically they are not designed for research but for administrative purposes; therefore, the types of information that are included may not be ideal for studies. Important information, such as smoking status, over-the-counter medication use, or herbal use, may not be available or might be inconsistently ascertained. Although health care records may contain information on prescribed or dispensed medications, they cannot confirm whether medications were taken, nor can they account for prescription medications that were taken without a prescription (e.g., borrowed or shared medication, a practice reported by over one third of women [45]).

An advantage of retrospective data collection via maternal self-report is that it allows the mother to document her actual behaviors (e.g., whether she took a prescribed medication), which is important since many pregnant women change their behaviors (e.g., stop taking prescribed medication) once they learn they are pregnant [46]. A disadvantage of retrospective data collection is the potential for recall bias if mothers of affected children report their exposures differently than mothers of unaffected children [47, 48].

Some pregnancy exposure cohorts are based on a convenience sample, such as those derived from teratogen information services (TIS). Identifying a comparison group of women for analyses of exposures in TIS cohorts is challenging, since all women in their cohort have some exposure that caused them concern. Some TIS studies have used as controls pregnant women exposed to medications believed not to be teratogenic, those exposed to medications only in the third trimester, or those with the same underlying condition but on a different medication [49]. Recent studies based on TIS cohorts, such as the MotherToBaby Pregnancy Studies conducted by the Organization of Teratology Information Specialists [50], included an expert clinician in-person examination of every baby born to a mother enrolled in the study; however, the potential for selection bias remains. Regulatory authorities encourage the creation of pregnancy exposure registries in order to detect signals of an increased risk of birth defects associated with the use of specific medications, such as antiretroviral medications [51, 52] and medications used to treat epilepsy [53]. Registry participation is often voluntary, relying on health care providers to conduct follow-up and submit data [54]. Considerations for the establishment of optimal pregnancy registry studies have been previously published [55, 56].

Widespread use of electronic medical records provides an opportunity for improved research data. Electronic medical records may allow the linkage of pregnancy information, pharmacy records, and maternal and fetal outcomes but will still be limited by the completeness and accuracy of the information entered, which can vary widely [57]. These data would be even more valuable if pregnant women were asked a brief set of standardized questions regarding their exposures immediately before and throughout their pregnancies.

Analytical Issues

Statistical Power

Perhaps the most challenging analytical issue facing birth defect researchers is that of statistical power to detect associations. Using traditional frequentist statistical methods, the power of a study to detect a true difference in the prevalence of a birth defect between two different exposure groups is dependent on several factors: the p value cut point chosen to define statistical significance (conventionally p < 0.05), the strength of the association between the exposure and the defect, and the prevalence of the exposure in the population [58]. For a given sample size, stronger associations and associations with more common exposures are easier to detect.
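The dependence of power on effect size and exposure prevalence can be illustrated with a rough calculation. The sketch below is not taken from any cited study; it assumes a case-control comparison with equal numbers of cases and controls and uses the standard normal approximation for comparing two proportions. The function names, the group size of 500, the odds ratio of 2, and the exposure prevalences of 1 % and 10 % are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def exposure_prevalence_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by the control prevalence and an odds ratio."""
    return odds_ratio * p_controls / (1 + p_controls * (odds_ratio - 1))


def approx_power(n_per_group: int, p_controls: float, odds_ratio: float,
                 alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a difference in exposure proportions
    between equal-sized case and control groups (normal approximation)."""
    norm = NormalDist()
    p_cases = exposure_prevalence_in_cases(p_controls, odds_ratio)
    p_bar = (p_cases + p_controls) / 2
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    # Standard errors of the difference under the null and under the alternative
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = ((p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / n_per_group) ** 0.5
    return norm.cdf((abs(p_cases - p_controls) - z_alpha * se_null) / se_alt)


# Hypothetical scenario: 500 cases, 500 controls, odds ratio of 2
rare_exposure_power = approx_power(500, 0.01, 2.0)
common_exposure_power = approx_power(500, 0.10, 2.0)
print(f"Exposure prevalence 1%:  power = {rare_exposure_power:.2f}")
print(f"Exposure prevalence 10%: power = {common_exposure_power:.2f}")
```

Under these assumptions, the same odds ratio is detected with high power when one in ten control mothers is exposed but with low power when only one in a hundred is, which is the pattern described above.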

Despite the concern of recall bias, case-control studies are commonly employed to study birth defects. Consider one of the most common birth defects, cleft lip with or without cleft palate, with an estimated birth prevalence of 10.9/10,000 live births [59]. Given that two thirds of US pregnancies result in a live birth [60], over 15,000 pregnant women would have to be followed from the beginning of their pregnancies in order to obtain prospective information on pregnancy exposures for 11 cases. For a rarer defect, such as encephalocele, which has a prevalence of 0.8/10,000 live births [59], almost 19,000 pregnant women would need to be followed to obtain data on 1 case. National data repositories and health care databases have allowed some prospective risk factor studies for birth defects [61–63], but even many years of nationwide data yield low statistical power for some defects. Even a case-control design cannot entirely circumvent sample size issues, as accumulating sufficient case numbers can take years, especially in population-based settings.
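The cohort sizes quoted above follow from simple arithmetic, which the short sketch below reproduces. The function name is hypothetical; the prevalence figures and the two-thirds live birth fraction are those given in the text.

```python
def pregnancies_to_follow(prevalence_per_10k_live_births: float,
                          target_cases: float,
                          live_birth_fraction: float = 2 / 3) -> float:
    """Pregnancies that must be followed from conception to expect `target_cases`
    affected live births, given the prevalence per 10,000 live births."""
    live_births_needed = target_cases / (prevalence_per_10k_live_births / 10_000)
    return live_births_needed / live_birth_fraction


# Cleft lip with or without cleft palate: 10.9 per 10,000 live births, 11 cases
cleft_lip = pregnancies_to_follow(10.9, 11)
# Encephalocele: 0.8 per 10,000 live births, 1 case
encephalocele = pregnancies_to_follow(0.8, 1)

print(f"Cleft lip, 11 cases:   about {cleft_lip:,.0f} pregnancies")
print(f"Encephalocele, 1 case: about {encephalocele:,.0f} pregnancies")
```

The calculation considers only live-born cases and ignores loss to follow-up, so it should be read as a lower bound on the effort a prospective design would require.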

In addition to individual birth defects being rare, some modifiable risk factors are relatively rare. For example, maternal pregestational diabetes is strongly associated with certain birth defects [30]; however, the prevalence of pregestational diabetes among mothers of infants not affected by a major birth defect is less than 1 % [30]. And although medication use among pregnant women is highly prevalent [48, 64, 65], the use of individual medications can be rare [66]. Similar medications can sometimes be grouped together to help increase exposure prevalence, but there is the potential for different effects even among medications within a single class [67], and this type of grouping does not provide information on which medications within a class might be better alternatives for use during pregnancy.

Multiple Testing

To conduct comprehensive assessments of particular prenatal exposures, exposures are often assessed in relation to multiple birth defect categories in a single study. This practice can lead to difficulty in interpreting results in the context of multiple testing [68]. Although publications from a given study may report individual exposures or defects, the collection of analyses amounts to many tests of statistical significance being conducted on a single sample. Using standard frequentist statistical conventions, when no true associations exist, 5 % of the associations tested are expected to be statistically significant due to chance alone. While the potential for a type I error (rejecting a true null hypothesis) due to multiple testing may be relatively easy to identify in a single publication, it is more difficult to appreciate this potential when the tests are spread across many publications, yet the potential for observing a spurious association remains.
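How quickly spurious findings accumulate can be shown with a small calculation. The sketch below assumes that every null hypothesis is true and that the tests are independent, which is a simplification because tests run on a single sample are usually correlated. The function names and the numbers of tests are hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of chance 'significant' results when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one type I error across independent tests
    (independence is an assumption of this sketch)."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 20, 100):
    print(f"{n_tests:>3} tests: "
          f"expected false positives = {expected_false_positives(n_tests):.1f}, "
          f"P(at least one) = {familywise_error_rate(n_tests):.2f}")
```

With 20 independent tests at the 0.05 level, one chance finding is expected and the probability of at least one is roughly 64 %.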

Conventional methods are available to reduce the likelihood of a type I error, including the Bonferroni method, in which the alpha level for individual statistical tests is reduced such that there is a 5 % collective probability of a type I error across all tests, rather than for each individual test. Although this method is routinely used in genetic discovery analyses, the p values and resulting confidence intervals are considered too conservative for most epidemiologic studies of non-genetic exposures and unreasonably increase the probability of a type II error (failing to reject a false null hypothesis) [47]. Further, if the tests are spread out over multiple analyses, some of which have yet to be conducted, there is no practical way to estimate how many statistical tests will ultimately be conducted on a single dataset.
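A minimal sketch of the Bonferroni adjustment follows. The p values are invented for illustration and do not come from any study; the function names are likewise hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the collective type I error at or below alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which p values remain significant after the Bonferroni adjustment."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p values from five exposure-defect comparisons
hypothetical_p_values = [0.001, 0.012, 0.03, 0.04, 0.2]
unadjusted = [p <= 0.05 for p in hypothetical_p_values]
adjusted = significant_after_bonferroni(hypothetical_p_values)

print("Significant at 0.05:      ", unadjusted)
print("Significant at 0.05 / 5:  ", adjusted)
```

In this made-up example, four comparisons are nominally significant but only one survives the adjusted threshold of 0.01, which illustrates why the method is regarded as conservative and why it raises the probability of a type II error.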

Bayesian statistical methods offer the opportunity to address the issues of both multiple testing and small numbers, through the use of prior knowledge and multilevel models [47, 69]. Bayesian methods, which are beyond the scope of this article, focus on quantifying uncertainty rather than relying on the concept of repeating hypothetical studies. Bayesian methods are increasingly being used in birth defect research, including the analyses of antihistamine medication exposures [70], antiepileptic medication exposures [71], and maternal occupation [72].

The Role of Genetics

The role of genetics in public health is to facilitate the identification of populations most susceptible to disease and the underlying biological mechanisms that are affected by modifiable risk factors. The identification of modifiable risk factors may be obscured without stratifying by genetic variation, including variation in the epigenome (heritable chemical modifications to the genome that result in changes in gene expression without changing the underlying DNA sequence) [73, 74]. However, the inclusion of a genetic component in an epidemiologic study increases its complexity. Additional factors that must be considered include specimen collection, including timing and quality; allele frequencies and effect sizes; assay design and quality; specimen and data storage; and ethical, legal, and social implications.

Some specific challenges in collecting and analyzing genetic and epigenetic data for birth defect research include small sample sizes that often preclude high-density variant studies and limit research to only the largest racial-ethnic groups, even in multi-site studies [75], and maternally mediated genetic and epigenetic effects acting on the fetus during gestation which require more complex analytical methods and, ideally, the collection of specimens during the relevant embryological period.

Blood samples are the “gold standard” for genetic material, but specimen quality must be balanced with achieving a reasonable response rate. Convenient non-invasive collection methods are preferred, especially in pediatric populations [76, 77]. The distrust of genetic research, especially among minorities, hampers sample collection and could create selection bias [76, 78–80]. Rapid advances in technology and the implementation of data sharing policies were not anticipated in the initial consent of many studies and raise the risk of identifying study participants from coded genetic data. The evolving landscape will undoubtedly yield important clues to the causes of birth defects, but it also raises additional ethical issues with pediatric populations, such as returning individual results to parents of minors, ensuring individual genetic data confidentiality, and the use of data as children reach the age of majority [81–84].

Genetic factors undoubtedly play a crucial role in the development of a number of different birth defects. The challenges of obtaining enough samples, especially in combination with environmental exposure data for outcomes as rare as individual birth defects, are not trivial. Collaborative projects such as the population-based NBDPS [85], funded by the US Centers for Disease Control and Prevention, and consortia such as GENEVA, funded by the National Human Genome Research Institute [86], are needed to answer some of these important questions.

Challenges in Translation

Demonstrating Impact

Knowledge gained from birth defect research allows women and their health care providers to make better informed decisions about exposures during pregnancy, leading to a reduction in the number of pregnancies affected by birth defects. However, demonstrating the public health impact of this research is challenging. There is no national system for reporting birth defects in the USA, and although there are birth defect surveillance systems in 41 states [87], not all systems include all births in the state. States also may differ on other critical variables such as which defect categories are included, case definitions, and case verification methods. Although data from these systems can be pooled to estimate a “national prevalence” of selected birth defects [59], these estimates have large uncertainty due to variations in surveillance methodology (e.g., active versus passive ascertainment) and differences in clinical practice (e.g., higher or lower screening rates for certain defects).

Temporal trends in the prevalence of specific birth defects are influenced by multiple factors; a reduction in the prevalence of one risk factor may be offset by the increase of another. For example, the prevalence of cigarette smoking during pregnancy, a risk factor for certain birth defects, has decreased over time, from 18.4 % in 1990 [88] to 12.3 % in 2010 [89]. However, the prevalence of obesity, another recognized risk factor for certain birth defects, has increased dramatically over time for reproductive-aged women, from 12.0 % in 1991 [90] to 35.7 % in 2009–2010 [91].

Temporal trends can also occur among subgroups of the population, which can be obscured when the entire population is considered. For example, an increase in the prevalence of gastroschisis has been observed in the USA, from 2.3 to 4.4 per 10,000 live births in 1995 and 2005, respectively; however, the absolute increase among women less than 20 years old has been markedly higher, from approximately 8 per 10,000 to 15 per 10,000 during the same time period [92].

Simulation modeling offers an opportunity to estimate the potential public health impact of reducing harmful exposures (or increasing beneficial ones), which may not be detectable using currently available surveillance data. Simulations have been used to estimate the number of cases of specific birth defects that could be prevented with the decreased use of teratogenic antiepileptic medications [93], a reduction in the prevalence of prepregnancy obesity [94], the elimination of periconceptional cigarette smoking [95], the addition of folic acid to birth control pills [96], and folic acid fortification of corn masa flour [97].

Moving Target

Despite the challenges described above, research has been successful in identifying modifiable risk factors for birth defects leading to the implementation of effective prevention strategies [98]. However, the changing landscape of maternal exposures makes continued clinical, surveillance, and research vigilance necessary. For example, new medications enter the market every year, the vast majority of which have insufficient clinical or epidemiological data to determine teratogenic risk [99, 100]; in addition, existing medications may be approved for new indications or used off label for a wider range of conditions [101]. Further, although the prevalence of traditional smoking has decreased over time, the introduction and increasing use of the e-cigarette may increase the exposure to nicotine during pregnancy [102].

Conclusion

Current research methods do not allow for the definitive assertion of the safety of an exposure during pregnancy; we can never rule out all risk for all pregnancies. The number of prenatal exposures for which we are able to quantify the associated risk for birth defects is small compared to the vast number of substances and experiences to which a pregnant woman can be exposed. However, well-conducted birth defect research provides information to pregnant women and their health care providers that allows them to reduce the risk for birth defects. In addition, this information can serve to lessen worry for women about necessary (e.g., medications) or accidental exposures during pregnancy, also a worthwhile outcome.