Introduction

Due to the ethical concerns of conducting randomized controlled trials of medications in pregnant women, well-designed observational studies have become the primary way of generating evidence on the benefits and risks of medication use in pregnant women and their offspring. Administrative claims databases are increasingly being used in observational pregnancy research. These databases offer several strengths over other more traditional epidemiological data sources, such as interview- or questionnaire-based data. On the other hand, they are created for administrative and billing purposes, so they have a number of limitations not typically seen in data collected specifically for pregnancy research or other types of electronic data sources (e.g., birth registries).

In recent years, researchers have attempted to address these limitations by linking claims databases with other data sources or supplementing them with information from primary data collection. In this article, we first briefly review data issues critical for studying medication safety in pregnancy. We then describe how adequately claims-only data sources address these issues and how “augmented pregnancy data” may be used to fill the remaining data gaps. We conclude the article with a discussion of future directions on this topic. We define augmented pregnancy data as data from (1) administrative claims databases linked to other data sources or (2) other electronic data sources that generally are thought to provide richer clinical or reproductive information compared to claims databases, such as population-based pregnancy or birth registries and electronic health records (EHRs).

Data Considerations in Medication Safety in Pregnancy Research

Assessing medication safety in pregnancy using observational data poses a number of unique challenges not seen in other types of epidemiological studies. This is because certain medications may be harmful to women or infants only if they are taken during specific gestational periods. For example, cardiac malformations originate in the first trimester, so late pregnancy exposure is not relevant when studying these outcomes. There are seven critical data components in observational studies of medication safety in pregnancy (Table 1). The validity of the study relies heavily on the ability to identify and analyze sufficiently large data sources that provide accurate and complete information on these data components.

Table 1 Data considerations in medication safety in pregnancy research

Administrative Claims Data Only and Its Application in Medication Safety in Pregnancy Research

Administrative claims data from commercial health insurance companies, government-sponsored health plans, and health maintenance organizations in the USA, Canada, and some Asia-Pacific, South American, and European countries have been used for studies of medication use and safety in pregnant women. Created primarily for administrative, financial, and reimbursement purposes, these databases include information on claims submitted by healthcare providers for payment and records of patient encounters with healthcare systems. They generally contain information in the following domains: (1) plan enrollment, e.g., beginning and end dates; (2) demographics, e.g., birth date and sex; (3) outpatient pharmacy dispensing, e.g., codes to determine the drug product, dispense date, amount dispensed, and days’ supply; (4) inpatient and outpatient medical encounters, e.g., dates of service, admission and discharge dates, and providers or facilities; (5) diagnoses recorded for each encounter, e.g., diagnosis codes recorded as International Classification of Diseases, 9th or 10th Revision, Clinical Modification (ICD-9-CM or ICD-10-CM) codes; and (6) procedures done at each encounter, e.g., procedure codes to identify surgeries or laboratory tests ordered. The databases typically have unique personal identifiers for tracking of individuals across various data domain files.

Some of these databases include data from millions of individuals, allowing researchers to conduct studies in large, well-defined insured populations. They also collect detailed information on prescription medications, diagnoses, and procedures performed. The information is generally collected prospectively and not subject to recall bias. We discuss the strengths and limitations of administrative claims data in medication safety in pregnancy research below.

Strengths of Using Administrative Claims Data in Studying Medication Safety in Pregnancy

Information to Identify Pregnancy

Recent studies have shown that administrative claims databases in the USA and Canada can accurately identify most pregnancies resulting in a live birth [1••, 2, 3]. Some studies have also shown that these databases can identify other pregnancy episodes (e.g., spontaneous abortion) [1••, 2, 3, 4••]. The ability to efficiently identify pregnancies in large, population-based cohorts is a tremendous advantage compared to identification using voluntary pregnancy registries for specific drugs, patient interviews, or teratology information services. Most of the studies that rely on these more traditional epidemiological data sources enroll a relatively small number of women who volunteer or agree to participate in the study. The time and cost to collect the data using these traditional methods are generally much greater than using administrative claims data. In addition, these studies may be more prone to selection bias if participation is more (or less) likely among those exposed to a medication who have an outcome of interest.

Information That Allows Mother-Infant Linkage

To be useful for research of medication safety in pregnancy, information from the mother and the infant must be linked to each other. For a number of administrative claims databases, researchers have determined these mother-infant linkages using a variety of methods, such as unique family identification numbers included in the health plan enrollment data and definitive or probabilistic name and address matching. The linkage rates ranged from approximately 45 to 90% [5,6,7,8]. One example of a claims database that has created these internal mother-infant linkages to conduct studies of medication use and safety in pregnancy using administrative data only is the Medicaid Analytic Extract (MAX) files, a US government database including more than one million mother-infant pairs, comprising mostly low-income or chronically disabled enrollees [5].
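As an illustration of the family-identifier approach described above, the following is a minimal sketch of a deterministic mother-infant linkage. The field names (`family_id`, `delivery_date`, `birth_date`) and the 3-day tolerance between the delivery date and the infant's birth date are assumptions made for the example; they are not taken from any specific database or from the cited studies, and real linkages often add probabilistic name and address matching.

```python
from datetime import date

def link_mothers_infants(deliveries, infants, max_gap_days=3):
    """Link each delivery to infants sharing its family ID.

    deliveries: list of dicts with 'mother_id', 'family_id', 'delivery_date'
    infants:    list of dicts with 'infant_id', 'family_id', 'birth_date'
    max_gap_days is a hypothetical tolerance between the two dates.
    Returns a list of (mother_id, infant_id) pairs.
    """
    links = []
    for d in deliveries:
        for i in infants:
            same_family = d["family_id"] == i["family_id"]
            gap = abs((i["birth_date"] - d["delivery_date"]).days)
            if same_family and gap <= max_gap_days:
                links.append((d["mother_id"], i["infant_id"]))
    return links

# Hypothetical records, for illustration only.
deliveries = [
    {"mother_id": "M1", "family_id": "F1", "delivery_date": date(2015, 3, 10)},
    {"mother_id": "M2", "family_id": "F2", "delivery_date": date(2015, 6, 1)},
]
infants = [
    {"infant_id": "I1", "family_id": "F1", "birth_date": date(2015, 3, 11)},
    {"infant_id": "I9", "family_id": "F9", "birth_date": date(2015, 6, 1)},
]

links = link_mothers_infants(deliveries, infants)
# Linkage rate: share of deliveries with at least one linked infant.
linkage_rate = len({m for m, _ in links}) / len(deliveries)
```

In this toy example only one of the two deliveries finds an infant enrolled under the same family identifier, which is one reason linkage rates in practice fall well below 100%.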

Information on Medication Exposure

Studies have shown substantial underreporting of prescription medication use using paper-based questionnaires or interviews in retrospective studies [9,10,11,12]. Electronic pharmacy dispensing data from administrative claims databases provides an advantage over patient interviews or surveys as the information is collected free from recall bias. The information is available on the exact date of dispensing along with other information, such as amount dispensed and days’ supply.

Information on Gestational Age

Pregnancy-related diagnosis and procedure codes can be used to estimate trimesters and weeks of gestation in administrative claims databases [1••, 2, 3, 13•, 14, 15••]. A number of algorithms have been validated against the gestational age information in birth certificates or medical records [2, 13•]. They have been shown to perform well in estimating gestational age and classifying prenatal exposure status, particularly for chronically used medications.
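The way dispensing records and an estimated gestational age combine to classify prenatal exposure can be sketched as follows. This is a simplified illustration, not one of the validated algorithms cited above: the assumed gestational lengths (273 days for a term birth, 245 days for a preterm birth) and the trimester boundaries (days 0 to 90, 91 to 180, and 181 to delivery) are assumptions chosen for the example.

```python
from datetime import date, timedelta

# Assumed gestational lengths in days; illustrative values only.
ASSUMED_GESTATION_DAYS = {"term": 273, "preterm": 245}

def estimate_pregnancy_start(delivery_date, birth_type="term"):
    """Back-calculate the pregnancy start date from the delivery date."""
    return delivery_date - timedelta(days=ASSUMED_GESTATION_DAYS[birth_type])

def exposed_trimesters(dispense_date, days_supply, start, delivery_date):
    """Return the trimesters overlapped by one dispensing's days' supply."""
    supply_end = dispense_date + timedelta(days=days_supply - 1)
    windows = {
        1: (start, start + timedelta(days=90)),
        2: (start + timedelta(days=91), start + timedelta(days=180)),
        3: (start + timedelta(days=181), delivery_date),
    }
    # Two date intervals overlap when each begins before the other ends.
    return [
        trimester
        for trimester, (win_start, win_end) in windows.items()
        if dispense_date <= win_end and supply_end >= win_start
    ]

delivery = date(2016, 10, 1)
start = estimate_pregnancy_start(delivery, "term")
trimesters = exposed_trimesters(date(2016, 3, 15), 30, start, delivery)
```

Because the exposure windows are anchored to the estimated start date, an error of a few weeks in gestational length shifts every window and can move a dispensing from one trimester to another.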

Information on Maternal and Birth Outcomes

The encounter files in administrative claims databases contain diagnosis and procedure codes that can be used to identify medically attended maternal and birth conditions. The validity of these codes varies by outcome [15••, 16].

Information on Confounding Factors

Administrative claims data provides information on a number of potential confounding factors, including maternal demographics, maternal conditions that lead to medical attention, and prescription drug exposures other than the drug of interest.

Information on Long-Term Follow-Up of Infants or Mothers

Follow-up of women and their children in administrative claims databases generally can continue after birth as long as they remain members of the health plan.

Limitations of Administrative Claims Data in Studying Medication Safety in Pregnancy

There are a number of limitations with using only administrative claims data for pregnancy research. A number of possibilities exist for misclassification of exposure. These databases generally do not capture information that provides more accurate estimation of pregnancy start date, such as last menstrual period and clinical or obstetric estimate of gestational age. Although claims-based gestational age algorithms have been shown to perform well on average (e.g., a difference of a few days in average gestational age), an individual woman’s gestational length can be misclassified by a considerable amount (e.g., weeks), potentially resulting in exposure misclassification [13•, 17]. While the electronic pharmacy files document that a medication was dispensed, information on whether or not the medication was taken as directed is not available. While some studies have shown that prescription fills are valid estimates of actual medication use [18], others have reported that noncompliance for some medications may be common among pregnant women, especially in the first trimester [19,20,21]. Such noncompliance may pose a major threat to the validity of medication safety in pregnancy research. Inpatient medication exposures and over-the-counter (OTC) use are often incompletely documented or not captured at all.

Misclassification of certain birth outcomes is also possible. The validity of claims-based algorithms for some birth defects may be questionable, requiring confirmation through medical record review [15••, 16]. For example, Cooper et al. showed that the positive predictive value of claims-based outcome algorithms varied between 34% for hydrocephaly and 93% for oral clefts [16]. Some of these databases may have relatively short follow-up (<1 year) of the infant and mother due to health plan turnover, limiting their ability to study long-term effects of medication use. The state Medicaid databases in the USA typically have shorter enrollment of children, while integrated delivery systems generally have longer follow-up than other commercial health plans.
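The positive predictive value mentioned above is the proportion of algorithm-identified cases that are confirmed on medical record review. A minimal sketch of the calculation follows; the counts are hypothetical and were chosen only so that the result equals 34%, and they are not the counts reported by Cooper et al.

```python
def positive_predictive_value(confirmed, flagged):
    """Share of algorithm-flagged cases confirmed by record review."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must be between 0 and flagged")
    return confirmed / flagged

# Hypothetical counts: 50 cases flagged by a claims-based algorithm,
# of which 17 are confirmed on chart review.
ppv = positive_predictive_value(confirmed=17, flagged=50)
```

A low value means that most flagged cases are false positives, which is why outcomes with poorly performing algorithms generally require chart confirmation before analysis.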

Information on certain potential confounding variables of interest is also not available or poorly documented in administrative claims data. For example, maternal characteristics including reproductive history, maternal level of education, and OTC folate use or dietary folate consumption are generally not captured. While the enrollment file may contain information on other characteristics such as race/ethnicity and the encounter files may have diagnoses or procedures related to tobacco use, alcohol use, and obesity, the information is generally incompletely recorded.

Augmented Pregnancy Data

Due to the limitations of using only administrative claims data to evaluate the safety of medication use in pregnancy, a number of researchers have linked administrative claims databases to other data sources. These data sources include other databases within healthcare organizations, registries created by various organizations, and data collected by direct contact with the mothers, children, or healthcare providers. Depending on the data source, additional or more accurate information on gestational age, selected maternal and birth outcomes, or potential confounders may be recorded. In some European and North American countries, population-based EHRs and birth registries are available for pregnancy research and can be analyzed alone. We describe examples of some of these data sources and the information available below and in Table 2.

Table 2 Examples of availability and source of information in administrative claims and augmented pregnancy databases

Electronic Health Records

Integrated healthcare delivery systems within the USA have linked administrative claims data with ambulatory or inpatient EHRs for a wide array of research studies, including studies of medication use in pregnancy [26, 27]. Several European general practice EHR databases are also available for pregnancy research [28, 29]. EHRs from healthcare providers and healthcare systems may provide information on important potential confounders not well-recorded in administrative claims data, including tobacco use, alcohol use, blood pressure, and height and weight. Information from clinical encounter data and laboratory test results may also provide data to identify potential maternal or birth outcomes of interest, such as head circumference and test result-confirmed gestational diabetes and preeclampsia. Since the data are recorded as part of clinical care and are collected prospectively, studies using EHRs are typically not subject to recall bias.

National or Regional Birth, Death, Fetal Death, and Malformation Registries

Some researchers have linked administrative health plan or prescription data to national or regional government birth registries to identify data not available or incompletely captured in claims or prescription data [4••, 22, 30,31,32]. In the USA, birth certificate data may be obtained from many state departments of public health for research purposes [22, 33]. These data sources include more accurate estimates of gestational age and estimation methods (e.g., obstetric or clinical estimates) and other maternal, paternal, and birth information that may be important for determining potential confounders (e.g., tobacco use, parity, gravidity, maternal and paternal education, race/ethnicity), or birth outcomes (e.g., birth weight). Fetal death reports provide similar information for stillbirths. In the USA, augmenting administrative claims data with birth certificate data has increased the mother-infant linkage rates or has been used as the primary linkage method for some health plans [34]. Health plans that maintain birth registries have generally been able to use the registry information to link ≥95% of deliveries/mothers identified in administrative data [34]. Healthcare organizations in the USA have also linked their administrative claims data to national or regional government death registries for epidemiological studies, including studies of medication safety in pregnancy [27, 28]. This information can be used to identify maternal and infant mortality and cause of death.

The Quebec Pregnancy Cohort in Canada combines data from four electronic databases [4••]. The Régie de l’Assurance Maladie du Québec (RAMQ) database provides administrative health insurance information on medical encounters and medication use. The Med-Echo database records all acute care hospitalizations and includes information on gestational length and birth weight (clinical database). The birth registry of the Institut de la Statistique du Québec (ISQ) provides additional demographic information on the mother, father, and infant (e.g., education level, marital status, birth weight, gestational age, parity, ethnicity) for live births and stillbirths. The Ministère de l’éducation, des loisirs et des sports du Québec (MELS) database provides information on use of specialized services at elementary schools, such as speech therapists or psychologists. Linkage of these databases allows for identification of a large pregnancy cohort with comprehensive information on medication exposures, pregnancy and birth outcomes (including longer-term outcomes), and potential confounders of interest for studies of medication safety in pregnancy.

All Nordic countries (Norway, Denmark, Sweden, Iceland, and Finland) have mandatory reporting of births to national organizations known generally as medical birth registries [35,36,37,38]. The registries began capturing comprehensive population-based data as early as 1967 (Norway) or as late as 1987 (Finland). These registries include live born infants and stillbirths occurring after 22 weeks and routinely collect information on gestational age at birth, birth weight, congenital malformations, and the reproductive history of the mother. These registries also gather some information on the type of birth and delivery characteristics and complications, but the type and quality of the information vary. Select variables from birth registries, including serious pregnancy complications and gestational age, have been validated through medical chart reviews [15••, 25•, 35, 39,40,41,42]. Medication use during pregnancy is recorded by the prenatal care provider, and may or may not include data on OTC medications. This information can be linked to national prescription registries, which are comprehensive records of pharmacy dispensing or reimbursement [43].

The Nordic countries also have registries on termination of pregnancies that record elective or therapeutic abortions occurring after 12 weeks of gestation, with varying information on the reason for the abortion. National patient registries include diagnosis and procedure codes for all contacts with the healthcare system; the birth and patient registries generally require special linkage, with a possible exception of Denmark, where the birth registry and national patient registry merged in 1995. Patient registries present an opportunity to carry out research on longer-term outcomes of medication use during pregnancy, for mothers and their offspring.

In many other European countries, similar national and regional registries exist, but reporting of births is sometimes not mandatory [44]. Most of these registries do not include personal identifiers such as national identification numbers, which may hamper correct linkage with other data sources. The EUROMediCAT project, a collaborative initiative under the European Surveillance of Congenital Anomalies (EUROCAT), includes 15 medical birth registries in 13 countries and 7 healthcare databases in 5 countries covering a population of 7.2 million births from 1995 to 2012 [45]. Within this project, national or regional congenital anomaly registry data were linked to primary care or prescription administrative databases [46]. The registries within the network collect data on live births, fetal deaths, and pregnancy terminations with congenital anomalies, including information on date of birth, gestational age, maternal age, medication use, and comorbidities.

Due to their universal healthcare systems, Canada and some European countries (including the Nordic countries) generally have very low turnover rate in their databases. As discussed above, these countries have combined multiple databases with high linkage rates for pregnancy research. These linked databases are highly advantageous for evaluating long-term effects of prenatal exposure such as effects on neurodevelopment (e.g., diagnoses of attention deficit hyperactivity disorder or autism spectrum disorders) and other conditions (e.g., childhood cancer), especially if a diagnosis is available and coded [47].

Surveys

Researchers have also supplemented administrative claims data with data collected through interviews or questionnaires of the mothers, infants, or healthcare providers. For example, healthcare organizations in the USA have collected data on pregnant women through self-administered questionnaires or interviews [48]. For the Quebec Pregnancy Cohort, self-administered questionnaire data are collected bi-annually for a random sample of pregnancies ending with a live birth [4••]. Information collected includes lifestyle factors (e.g., smoking, alcohol, and physical activity), socio-demographic information, OTC medication use, weight and height at the beginning of the pregnancy, weight gain during pregnancy, natural health product use, folic acid intake, and reproductive history including use of assisted reproduction techniques such as in vitro fertilization.

Norway and Denmark have both carried out large questionnaire-based birth cohort studies, each with more than 100,000 pregnancies [49,50,51]. The medication exposure data in these cohort studies are particularly rich, as they include both prescription and OTC drugs, as well as indications for use of each drug. Further, these studies include psychometric instruments to assess maternal mental health during pregnancy as well as children’s neurodevelopment at various ages [24, 52, 53]. Data collection for a similar cohort study, the PRegnancy and Infant DEvelopment (PRIDE) Study, is currently ongoing in The Netherlands [54]. In this cohort, the detailed data from web-based questionnaires on prescription and OTC medication use are enriched with dispensing data from pharmacies. These questionnaire-based studies are routinely linked to birth registry data, and may also be linked to patient registries and prescription drug registries.

Biological Samples

A large number of pregnancy or birth cohorts now include biobanks (http://www.birthcohorts.net/). These biological samples could be used to study the effect of underlying genetic and metabolic status on the risk of adverse pregnancy outcomes. Recently, the Quebec Pregnancy Cohort has started collecting saliva on mothers and children already present in the cohort (more than 300,000 women and 260,000 children), as well as mothers and children who will be entering the cohort in the future [4••]. Availability of biological specimens for Nordic countries varies, with Iceland and Finland both maintaining population-wide serum banks and Denmark and Sweden both storing dried blood spots from phenylketonuria (PKU) testing [••].

Discussion

Administrative claims data is increasingly being used to study medication safety in pregnancy due to its ability to identify large populations in a timely, efficient manner. Due to limitations of studies using only administrative claims data, a number of other data sources have been used to augment claims data or as a stand-alone data source. In general, augmenting administrative claims data with information from other electronic data sources or through primary data collection can help improve data completeness. However, these augmentation approaches also have their own constraints, as we discuss below.

EHRs generally record medications prescribed to the mother. As with administrative claims data, information on whether or not the medication was actually taken is not available. EHRs also have incomplete capture of OTC medication use, diagnoses, procedures, and medication prescriptions made outside the healthcare system. Although registries that capture stillbirths and elective or therapeutic abortions can provide more complete capture of all pregnancies, they may not capture early pregnancy losses given that many will occur among clinically unrecognized pregnancies. Studies that include only live births or very late pregnancy loss may sometimes systematically underestimate medication exposure and its associated risk, if women who take the medication are more likely to suffer early pregnancy loss or elect to terminate the pregnancy [55, 56].

In principle, information not available in administrative claims data could be obtained from other data sources or by contacting mothers, children, or healthcare providers. In practice, however, a typical pregnancy study can only collect such information in a subset of the study population, due to incomplete overlap in populations in various electronic data sources, the cost of primary data collection, and the ability to identify or contact the mothers, children, or their healthcare providers. Therefore, missing data may remain an issue in the augmented data. It is unlikely that the data will be missing completely at random [57]; therefore, appropriate statistical methods, such as multiple imputation [58, 59] and inverse probability weighting [60, 61], will need to be employed to adjust for missing data. An external or internal validation sample with more accurate exposure or covariate information can allow researchers to adjust effect estimates for incomplete or missing information [62,63,64,65].

Another methodological issue with augmenting administrative claims data with information from primary data collection is that women or children who participate in the study may be systematically different from those who do not, which may result in selection bias [66, 67] or affect the generalizability of the study findings. Appropriate analytic methods, such as inverse probability weighting [66], can be used to account for selection bias if factors associated with willingness to participate are measured. Finally, a single data source may not include enough pregnancies or outcomes of interest; therefore, pooling of multiple data sources for specific pregnancy studies is often needed to achieve sufficient statistical power [22, 68, 69, 70•].
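The idea behind inverse probability weighting for selective participation can be illustrated with a deliberately simple, stratified sketch. It assumes that a single measured factor (here a hypothetical `stratum` variable available for the whole claims cohort) fully explains participation, and it estimates the participation probability as the observed proportion within each stratum. Applied analyses would more commonly model this probability with regression on several covariates; the records and variable names below are hypothetical.

```python
from collections import defaultdict

def participation_weights(cohort):
    """Weight per stratum = 1 / observed participation probability."""
    totals = defaultdict(int)
    participants = defaultdict(int)
    for person in cohort:
        totals[person["stratum"]] += 1
        participants[person["stratum"]] += int(person["participated"])
    return {s: totals[s] / participants[s] for s in totals if participants[s] > 0}

def weighted_prevalence(cohort, weights, key):
    """Prevalence of a questionnaire variable, weighted among participants."""
    numerator = denominator = 0.0
    for person in cohort:
        if person["participated"]:
            w = weights[person["stratum"]]
            numerator += w * person[key]
            denominator += w
    return numerator / denominator

# Hypothetical cohort: stratum A has 6 women (2 participate, both report
# folate use); stratum B has 4 women (all participate, 1 reports folate use).
cohort = (
    [{"stratum": "A", "participated": True, "folate": 1}] * 2
    + [{"stratum": "A", "participated": False, "folate": None}] * 4
    + [{"stratum": "B", "participated": True, "folate": 1}]
    + [{"stratum": "B", "participated": True, "folate": 0}] * 3
)

weights = participation_weights(cohort)
naive = sum(p["folate"] for p in cohort if p["participated"]) / 6
adjusted = weighted_prevalence(cohort, weights, "folate")
```

In this toy cohort the unweighted prevalence among participants is 0.5, whereas weighting each participant by the inverse of her stratum's participation probability gives 0.7, because the under-represented stratum counts for more. The correction is only as good as the assumption that the measured factors capture the reasons for non-participation.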

Conclusions

In summary, administrative claims data offers a number of advantages in studying medication safety in pregnancy. Its limitations can be partially addressed by linking it with other electronic data sources. Supplementing administrative claims data with information from primary data collection may also be advantageous. However, other data sources, such as population-based birth registries, prospective cohort studies, or case-control studies, may sometimes be more appropriate for specific research questions. Regardless of data sources, rigorous assessment of data quality and completeness and use of appropriate analytic methods are always recommended in medication safety in pregnancy research.