BACKGROUND

Collectively, rare conditions affect millions of people worldwide, often causing severe physical and mental impairments and reducing quality of life.1 These disorders are challenging for the individual, family, and society, leading to considerable morbidity and mortality, disproportionate use of health care services, and significant social, emotional, and financial difficulties.2,3 Better information is needed to describe the epidemiology and the secondary health conditions associated with these disorders, monitor progress of public health interventions, develop screening strategies for the detection of new cases, inform policies to improve healthcare, and address differences or disparities in the impact on the population.4 There is currently no system or integrated approach to identify and describe persons affected by these conditions across their life-span.4 Population-based registries and surveillance systems for rare conditions require infrastructure, constant tracking, and a significant investment of resources.5 Linked population-based administrative data are a powerful, low-cost tool for public health research.6

We used linked state-level administrative data from South Carolina, a moderate-sized state (population 4.7 million) that is 64.0 % non-Hispanic White, 28.0 % non-Hispanic Black or African American, 5.3 % of Hispanic/Latino ethnicity, and 3.6 % other race/ethnicity.7 The South Carolina Budget and Control Board, Division of Research and Statistics (DRS), is a central repository for health and human service data. Data housed at DRS and utilized for this project originated from Medicaid, the State Health Plan (SHP), the Uniform-billing Hospital Discharge Dataset (HDD), the Department of Disabilities and Special Needs (DDSN), the Department of Social Services (DSS), the State Department of Education (SDE), the South Carolina Department of Health and Environmental Control Vital records, and the Department of Unemployment and Workforce. See Table 1 for a description of the data sources and variables analyzed.

Table 1 Databases Included in this Project: Available Through the South Carolina Budget and Control Board, Division of Research and Statistics (DRS)

We focused on adolescent and young-adult years, because for persons with rare conditions, this is an important transitional time from pediatric to adult health care and from living as dependents to a more independent lifestyle. We chose Fragile X syndrome (FXS), spina bifida (SB), and muscular dystrophy (MD) as examples of rare conditions.

FXS, a neurodevelopmental disorder produced by a mutation in the Fragile X Mental Retardation 1 gene on the X chromosome, is the leading inherited cause of intellectual disability.8 Impairment can range from learning difficulties to intellectual disabilities and autism or “autistic-like” behaviors. Few studies have described the adult experience of persons with FXS.9

SB is a birth defect that occurs when the neural tube fails to close properly during fetal development. The life experience of people with SB is related to the level of the neural tube defect and the presence or absence of hydrocephalus. Studies conducted over the past 25 years have shown that although survival and quality of life for those with SB have improved, young adults with SB have a lower rate of independence, employment, and educational attainment.10–16

Muscular dystrophies are a group of genetic conditions causing progressive skeletal muscle weakness. The severity, age of onset, rate of progression, complications, and prognosis of this group of diseases vary greatly.17 Duchenne muscular dystrophy is the most common form among children, with an average age at diagnosis of 5 years, loss of ambulation between 7 and 13 years, and death in early adulthood.17 Improved treatment has extended the life expectancy of these individuals, increasing the use of adult services.18,19

Here we present our methods to study adolescents and young adults with these three rare conditions. We discuss case identification in the general population and the use of novel data sources, and we provide examples of research questions that can be answered with these data. Finally, we discuss the strengths and limitations of our approach.

METHODS

Through a series of statutes and agreements, agencies and organizations entrust data to DRS while retaining access control at all times. DRS developed a series of algorithms using source-specific personal identifiers to create a global unique identifier (UID). Personal identifiers include (but are not limited to) social security number, name, date of birth, race, and gender. The data are cleaned and standardized before the generation of a UID, which, without ever being associated with any personal identifiers, is used on all subsequent episodes of services, regardless of data source or service provider. Using the UID in lieu of personal identifiers enables views of data across multiple providers while protecting confidentiality. Data usage approvals were obtained from participating organizations from which the data originated. Data limitations and quality measures are discussed in Table 1. All data linkages were performed at DRS. Non-DRS investigators were provided a de-identified data set or aggregate results, depending on data usage approval.
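The general idea of deriving a stable identifier from cleaned personal identifiers can be illustrated with a minimal sketch. This is not the DRS algorithm, which is source-specific and not described here; the key, the choice of fields, the function names `standardize` and `make_uid`, and the use of a keyed hash are all assumptions made for illustration only.

```python
import hashlib
import hmac
import unicodedata
from datetime import date

# Hypothetical secret held only by the linkage unit (assumption, not from the source).
SECRET_KEY = b"hypothetical-key-held-only-by-the-linkage-unit"


def standardize(value: str) -> str:
    """Uppercase, strip accents, and drop every non-alphanumeric character."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed.upper() if ch.isalnum())


def make_uid(ssn: str, last_name: str, first_name: str, dob: date) -> str:
    """Return a 16-character identifier derived from standardized fields.

    The same person yields the same UID regardless of how a source formats
    the identifiers; the UID itself reveals none of them.
    """
    fields = [
        standardize(ssn),
        standardize(last_name),
        standardize(first_name),
        dob.isoformat(),
    ]
    message = "|".join(fields).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


# Two sources that format the same person differently map to one UID.
uid_a = make_uid("123-45-6789", "O'Brien", "Mary", date(1990, 5, 1))
uid_b = make_uid("123456789", "obrien", "MARY", date(1990, 5, 1))
```

A deterministic scheme like this only links records whose standardized fields match exactly; a production system would also need rules for missing or conflicting identifiers, which this sketch does not attempt.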

Identification of Persons with FXS, SB, and MD

All persons aged 15 to 24 years with an ICD-9-CM primary or secondary diagnosis code for FXS, SB, or MD during the study period 2000–2010 were identified from Medicaid, HDD, and SHP claims data. Additionally, we identified cases of intellectual disability, autism spectrum disorder, paraplegia, wheelchair dependence, and progressive muscular atrophy to determine whether a diagnosis of FXS, SB, or MD had been made prior to age 15 years. See Table 2 for a list of specific and potential inclusion diagnosis codes.

Table 2 International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) Diagnosis Codes Utilized

Confirmatory variables including FXS genetic testing results from the DDSN Fragile X Registry and myelomeningocele/meningocele indicator from birth certificate records were used to further identify the cohort. If confirmatory data were not available, then at least two diagnosis codes were required for the case to remain in the final cohort. If other chronic condition codes were found in the record and were coded at a frequency greater than the condition of interest, the case was excluded. The following conditions were considered in this category: 1) cerebral palsy for SB and MD and 2) multiple sclerosis for MD. Additionally, all persons had to be enrolled in either Medicaid or SHP for at least one full year when 15 to 24 years of age. Numbers of exclusions per criterion are shown in Table 3. The target population was residents with one of the rare conditions. We investigated the types and frequencies of healthcare needs of these residents compared to healthy controls and between conditions. The purpose was to highlight the transition from adolescence to adulthood in this cohort of young people with rare conditions.
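The inclusion and exclusion rules above can be expressed as a short decision function. The sketch below is illustrative only: the ICD-9-CM prefixes are assumptions standing in for the code list in Table 2, the `Person` record and its field names are hypothetical, "one full year" is modeled as 12 enrolled months, and the competing-condition rule is assumed to apply to confirmed cases as well.

```python
from dataclasses import dataclass, field

# Illustrative ICD-9-CM prefixes with the decimal point removed (assumptions;
# the study's actual inclusion codes are listed in Table 2).
CONDITION_PREFIXES = {"FXS": ("75983",), "SB": ("741",), "MD": ("3590", "3591")}
# Competing chronic conditions: cerebral palsy (343) for SB and MD,
# multiple sclerosis (340) for MD.
COMPETING_PREFIXES = {"FXS": (), "SB": ("343",), "MD": ("343", "340")}


@dataclass
class Person:
    uid: str
    diagnosis_codes: list = field(default_factory=list)  # one entry per coded diagnosis
    confirmed: bool = False  # registry test result or birth-certificate indicator
    enrolled_months: int = 0  # months enrolled in Medicaid or SHP at ages 15 to 24


def count_matching(codes, prefixes):
    """Count diagnosis codes that start with any of the given prefixes."""
    return sum(1 for code in codes if code.startswith(prefixes))


def keep_in_cohort(person: Person, condition: str) -> bool:
    n_condition = count_matching(person.diagnosis_codes, CONDITION_PREFIXES[condition])
    n_competing = count_matching(person.diagnosis_codes, COMPETING_PREFIXES[condition])
    if n_condition == 0:
        return False  # never identified from claims
    if not person.confirmed and n_condition < 2:
        return False  # unconfirmed cases need at least two diagnosis codes
    if n_competing > n_condition:
        return False  # a competing condition is coded more often
    return person.enrolled_months >= 12  # at least one full year of enrollment


two_codes = Person("a", ["74100", "74190"], enrolled_months=12)
one_code = Person("b", ["74100"], enrolled_months=24)
one_code_confirmed = Person("c", ["74100"], confirmed=True, enrolled_months=24)
mostly_ms = Person("d", ["3591", "3591", "340", "340", "340"], enrolled_months=36)
```

Here `two_codes` and `one_code_confirmed` stay in the SB cohort, `one_code` is dropped for having a single unconfirmed code, and `mostly_ms` is dropped from the MD cohort because multiple sclerosis is coded more often than MD.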

Table 3 Number of Potential Cases Identified, Exclusion Criteria, and Number Excluded

Medical Encounters and Vital Records

Once cohorts for each condition were established, all medical claims and outpatient pharmacy encounters (regardless of diagnosis codes) from Medicaid and SHP were selected for the study period using each UID.

Emergency department visits were indicated for inpatient and outpatient encounters if a patient was admitted through the emergency department. Surgical encounters were identified by 1) flagging non-zero operating-room charges and 2) searching ICD-9-CM procedure codes for valid operating-room procedures (Healthcare Common Procedure Coding and Current Procedural Terminology codes were converted to ICD-9-CM codes using Truven Health Analytics software).

Claims for professional services provided within a facility were considered to be the same encounter of care as the facility encounter. This allowed facility and professional paid amounts to be averaged per encounter along with other variables of interest (e.g., number of procedures performed or number of different specialty types). Non-facility professional components were assigned an encounter type based on rendering provider type, location, and specialty. If more than one rendering specialty was noted, the specialty with the highest paid amount was assigned. Encounter types included, but were not limited to, inpatient, emergency department, outpatient surgery, primary care, specialty care, therapy, home health, durable medical equipment, and pharmacy. DRS staff conducted all encounter assignments. When Medicaid or the SHP was not the only payer, it is possible that professional and/or facility components were paid entirely by another source and are missing from the analyses.
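The rule for encounters with more than one rendering specialty can be sketched as follows. The function name `assign_specialty` and the input layout are hypothetical, and the sketch assumes that paid amounts are summed within a specialty before comparison; the source states only that the specialty with the highest paid amount was assigned.

```python
from collections import defaultdict


def assign_specialty(claim_lines):
    """Return the rendering specialty with the highest total paid amount.

    claim_lines: iterable of (specialty, paid_amount) pairs for one encounter.
    Ties go to the specialty that appears first; returns None if there are no lines.
    """
    totals = defaultdict(float)
    for specialty, paid in claim_lines:
        totals[specialty] += paid
    if not totals:
        return None
    return max(totals, key=totals.get)


encounter = [("neurology", 120.0), ("orthopedics", 80.0), ("orthopedics", 75.0)]
assigned = assign_specialty(encounter)
```

In this example orthopedics is assigned because its two lines total 155.0, which exceeds the single neurology line of 120.0.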

Death certificate data for 2000–2011 were extracted using each case’s UID. Age at death was calculated, and underlying cause and comorbid causes of death were selected for additional analysis.
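Age at death in completed years can be derived from the dates of birth and death, as in the sketch below. The function name and the example dates are hypothetical and are not drawn from the study data.

```python
from datetime import date
from statistics import median


def age_at_death(birth: date, death: date) -> int:
    """Completed years of age on the date of death."""
    years = death.year - birth.year
    # Subtract one year if the birthday had not yet occurred in the year of death.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years


# Hypothetical records: (date of birth, date of death).
records = [
    (date(1985, 6, 15), date(2007, 6, 14)),
    (date(1985, 6, 15), date(2007, 6, 15)),
    (date(1983, 1, 2), date(2007, 3, 1)),
]
ages = [age_at_death(birth, death) for birth, death in records]
median_age = median(ages)
```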

Education, Work History, and Social Services

SDE annually provides a file that represents a “snapshot” of students at or near the beginning of the school year20 and includes current grade enrolled. This information is available through the twelfth grade; however, completion of high school is unknown. Students receiving special education services were identified through the Education Finance Act, which contains up to ten codes. A review of all available records determined receipt of special education services, type of educational placement, and setting.

DDSN’s mission is “to serve individuals with intellectual disability and related disabilities, autism, head and spinal cord injury in the least restrictive environment possible by providing support services to these individuals and their families.”21 DDSN data housed at DRS include: 1) a residential setting file, 2) a service file, and 3) a skills file. DSS Supplemental Nutrition Assistance Program (SNAP—formerly Food Stamps) currently serves over 400,000 households and 800,000 people within the state.22 The time of receipt of SNAP benefits was calculated for each UID. Data from the Department of Unemployment and Workforce for each case from 2005–2012 were summarized and aggregate results provided to researchers. Data included number of years employed and annual salary.

RESULTS

Medical Encounters and Vital Records

In the final cohort, 125 persons (71 % male) with FXS, 695 persons (41 % male) with SB, and 220 persons (69 % male) with MD were identified. Of these 1,040 cases, 373 (35.9 %) were identified solely from the Medicaid database, 610 (58.6 %) from Medicaid and HDD and/or SHP, and 57 (5.5 %) from HDD and/or SHP but not Medicaid. Therefore, 94.5 % of the cohort was found at some point in the Medicaid database. The mean and standard deviation of visits by encounter type for each condition are shown in Table 4.

Table 4 Annual Mean Number of Visits (Standard Deviation) By Condition and Encounter Type

Of the 220 individuals with MD, 56 (25 %) died during this period. The median age at death was 22 years; most died of cardiac complications, with G71 (the ICD-10 code for MD) listed as the underlying cause of death 64 % of the time. Of the 695 individuals with SB, 49 (7.1 %) died during this period. Their median age at death was 24 years, and the most frequent cause of death was kidney failure or other renal complication. Few FXS cases were linked to death certificate data.

Education, Work History, and Social Services

Of the 1,040 individuals in the cohort, 970 (93 %) could be linked to SDE data, of whom 604 (62 %) had enrolled in the twelfth grade. Among those who were linked, the percentages were similar across the three rare-condition groups (FXS = 64 %, MD = 65 %, and SB = 61 %). Seventy-one percent of the cohort required special education assistance, with a considerably higher percentage requiring assistance in the FXS group (90 %) than in the other two groups (MD = 65 % and SB = 63 %). Five hundred and fifty-seven (54 %) cases were linked to DDSN data. The percentage linked was higher for the FXS group (80 %) than for the other two groups (MD = 54 % and SB = 49 %). During the study period, 169 individuals received food purchasing assistance. The percentage was lowest for MD (10 %) and highest for SB (18 %), with FXS at 17 %. The percentage of persons working was highest in the SB group (43 %) and lowest in the FXS group (25 %). Thirty percent of those in the MD group were employed at some point during this time period.

DISCUSSION

To better understand determinants of health and well-being among South Carolinians with rare conditions, a linked data set was constructed from health, education, and social services sources. The purpose of this manuscript is to describe the data sources, case ascertainment strategies, variables, and examples of research questions that may be answered using this data set. This extensive network of linkable data is unique to South Carolina and has required careful negotiation, attention to privacy concerns, and a commitment by stakeholders over decades; however, other states have embarked on similar projects to answer questions about the health of their citizens (e.g., 23–24). Three key principles have served DRS well over the years: 1) the data housed in the warehouse are never owned by DRS; permission to use the data must be granted by the data owner for each new request; 2) a UID is used in lieu of personal identifiers; and 3) the DRS is politically and policy neutral; it is a service entity for researchers, other state agencies, and private citizens.

Like anyone else with a major health-related condition, persons with rare conditions require frequent hospitalizations, emergency department visits, therapeutic interventions, durable medical equipment, and home health services. Thus, hospital discharge and medical claims data are useful for case identification. The most common strategy reported in the literature for identifying cohorts with FXS, SB, or MD using ICD-9-CM codes25–28 is to require at least one diagnosis code for inclusion. While there have been no specific reviews of ICD-9 code accuracy regarding FXS, SB, or MD in the United States, broader reviews of the literature reveal that unintentional coder errors caused by the limits of the clinician’s knowledge and experience with the condition, misinterpreted information from the clinical record, and data entry mistakes lead to inaccuracy in ICD coding.29–31 To control for miscoding, our inclusion criteria required two or more diagnoses of FXS, SB, or MD. The specification of at least two occurrences of a diagnosis was used in a Canadian study, which linked two surveillance systems; for spina bifida, they found an agreement rate of 64.1 %.32 Higher positive predictive values for identifying cases of amyotrophic lateral sclerosis for the National Amyotrophic Lateral Sclerosis Registry were obtained when criteria for case identification included multiple occurrences of the diagnosis code, searching for diagnosis codes in multiple years, and linking to additional sources for case confirmation.33,34

Examples of Research Questions that Can Be Answered Using the Linked Administrative Data System

This linked data system is useful for answering research questions related to health care utilization and social outcomes for people with the specified conditions. The data are currently being analyzed by our research team to: 1) Investigate the characteristics of people 15–19 and 20–24 years old with FXS, SB, or MD, including their demographics, socioeconomic status (SES), educational attainment, employment experiences, and social services received. 2) Compare rates of hospitalizations and emergency department visits for all and select causes, including ambulatory care sensitive conditions among persons with FXS, MD, or SB to a comparison group without any of these conditions.35 3) Examine the role of SES in the pattern of health care use (primary regular care vs. hospitalizations and emergency room visits) among persons with MD covered by Medicaid.36 4) Identify conditions co-occurring with FXS, including attention deficit disorder, epilepsy, autism spectrum disorder, and intellectual disability.37 Additional studies are under development.

Strengths and Limitations of Using State-Linked Data System to Study Rare Conditions

Prevalence rates per 10,000 persons for individual study years (2000, 2005, and 2010), calculated using the case identification algorithms defined in the Methods before eligibility requirement exclusions were applied, range from 1.9 to 3.3 for SB, 1.1 to 1.2 for MD, and 0.3 to 0.5 for FXS. FXS rates are slightly lower than reported prevalence rates,38–40 whereas SB and MD rates are similar to reported prevalence rates.41–43
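A prevalence rate per 10,000 persons is the number of identified cases divided by the population at risk, multiplied by 10,000. The sketch below shows the arithmetic only; the case count and denominator are hypothetical, since the counts and the exact denominator behind the rates above are not reported in this section.

```python
def prevalence_per_10000(cases: int, population: int) -> float:
    """Cases per 10,000 persons in the denominator population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 10_000 / population


# Hypothetical example: 470 cases in a population of 4,700,000.
rate = prevalence_per_10000(470, 4_700_000)
```

With these illustrative inputs the rate is 1.0 per 10,000 persons.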

The strengths of this method include the diverse types of information available, including data from medical claims, hospital discharges, education, social, and disability services. Using merged data from multiple sources, we identified persons with one of three rare conditions based on all years of medical claims data available. We used sources such as birth certificates and genetic clinic testing for case confirmation. We have multiple observations on medical encounters, education, and social services for the same person. Linking events across disparate sources and over time enables the study of life patterns in health, health care, and other domains. However, one limitation of administrative data is a lack of individual-level measures of SES. We constructed correlates of SES including food stamp receipt, high school completion, and race; these measures are often related to health care use patterns. There is variation in SES for those covered by Medicaid in South Carolina because of the Katie Beckett rule of the Tax Equity and Fiscal Responsibility Act of 1982 (Pub.L. 97–248), which ensures that children with severe disabilities are covered regardless of parents’ income. Likewise, adults with substantial disability qualify for Medicaid regardless of their parents’ income, if their own income and assets are below poverty levels. Along with 11 other states, South Carolina extends Medicaid coverage to people with disabilities who work. South Carolina also provides Medicaid to working people with a disability who earn below 250 % of the federal poverty level through Section 4733 of the Balanced Budget Act of 1997. Thus, for children and some adults with rare conditions in South Carolina, enrollment in Medicaid is not synonymous with poverty.

We recognize a number of limitations. First, due to the nature of administrative data, we have information only on persons who interacted with the state systems. Even for those for whom data are available, we do not necessarily capture all information; for example, some individuals may be dually insured in addition to Medicaid or the State Health Plan, and we have no access to billing to other payers (unless evidence of those charges was present in hospital discharge data). Second, administrative data were not collected for research purposes and do not always contain a sufficient level of detail. Finally, medical claims data are not linked with medical records or clinical data. Absent associated registries, we cannot validate our identification strategy and confirm all of the rare conditions.

Despite these limitations, linked state data systems are valuable and relatively inexpensive data sources with which to study rare conditions. We learned about care patterns, disparities in care, care expenditures, comorbid conditions, and educational and employment outcomes during transitions from adolescence to young adulthood for persons with rare conditions. Research using linked administrative databases yields important public health insights for the population with rare disorders. To construct a similar data system absent a centralized service entity like DRS, interested researchers outside of South Carolina might need to contact individual entities for data, and then manage the linking of data across those entities.