Introduction

Lyme disease, caused by the bacterium Borrelia burgdorferi, is transmitted to humans through the bite of an infected tick. Most Lyme disease cases occur in the northeastern and midwestern United States (US); however, the geographic distribution of tick vectors and the incidence of tick-borne diseases have been expanding [1]. Recent estimates suggest that Lyme disease was diagnosed and treated in approximately 476,000 persons annually between 2010 and 2018 [2]. Lyme disease is commonly categorized into two main stages: early stage, where infection is localized (e.g., an expanding erythema migrans lesion), and disseminated stage, where infection has spread beyond the initial bite location [3,4,5]. Untreated, Lyme disease typically progresses from a localized skin infection, often with systemic non-specific symptoms (e.g., fatigue, headache), to various disseminated infections. Most uncomplicated cases fully recover after antibiotic treatment [6]. Disseminated manifestations range from secondary erythema migrans rashes, acute neurological effects (e.g., facial palsy, meningitis, radiculopathy), and carditis, which usually occur weeks to months after infection, to Lyme arthritis, the most common late disseminated infection, which usually occurs months to years after infection [3]. Only 30–50% of cases recall being bitten by a tick [7], and serological tests have low sensitivity early in the disease course [6]; therefore, early diagnosis largely relies on recognition of a characteristic erythema migrans rash. However, 20–30% of cases may not have erythema migrans [7], and atypical presentations of erythema migrans are more likely to be misdiagnosed [8, 9].

It is not well understood why some patients do not develop erythema migrans, or why others develop specific disseminated manifestations. Few epidemiologic studies have examined risk factors for disseminated stage Lyme disease, compared to early disease, or risk factors for developing specific manifestations. These studies have largely evaluated relations with demographic variables (age, sex, race, ethnicity), season of diagnosis, or with specific presenting clinical signs and symptoms [10,11,12,13,14]. Most studies identified Lyme disease cases and manifestations from surveillance data [11, 13, 15, 16] or from hospital or clinic data [14, 17, 18]. A recent study used billing codes in insurance claims from high-incidence states to identify disseminated Lyme disease manifestations, including arthritis, facial palsy, carditis, complete heart block, and meningitis, and evaluated trends in age, sex, seasonality, and hospitalization [10].

Like claims data, electronic health record (EHR) data contain rich longitudinal clinical data on diagnoses, medications, and laboratory test orders; however, EHR data also contain the results of laboratory testing and narrative free text notes added by a health care provider. Most EHR studies identify clinical outcomes using structured diagnosis data, such as diagnosis or billing codes; however, natural language processing algorithms are increasingly being developed to extract valuable clinical data from narrative text notes [19]. EHR and claims-based studies of Lyme disease have largely used diagnosis data to identify Lyme disease [2, 20,21,22,23,24]. Although prior EHR studies have identified erythema migrans using keyword-based text matching algorithms [25, 26], no studies have used both diagnosis data and narrative free text data to comprehensively classify Lyme disease diagnoses by stage of disease and manifestation. Classifying Lyme disease in the EHR will allow for future research to evaluate the effectiveness of intervening on modifiable risk factors for disseminated stage diagnosis that can reduce the burden of Lyme disease.

This study had two aims. First, we evaluated the ability of diagnosis data and free text from encounter notes to classify Lyme disease by stage of disease and clinical manifestation. Second, we examined whether individual, community-level, and health care factors were associated with disseminated stage compared to early stage Lyme disease.

Methods

Study population

Geisinger is an integrated health system that provides primary, specialty, urgent, and emergency health care services at community practice clinics and hospitals in central and northeastern Pennsylvania. For this study, we started with 1,128,671 patients in the Geisinger EHR (primary care and non-primary care) between January 2012 and December 2016 whose most recent address was within Geisinger’s primary service area and surrounding counties (38 counties).

Lyme disease case identification

We identified persons diagnosed with Lyme disease by the presence of at least one diagnosis for Lyme disease, using either Epic (Verona, WI) electronic diagnosis group (EDG) names or International Classification of Diseases, Ninth and Tenth Revision, Clinical Modification codes (ICD-9-CM 088.81 and ICD-10-CM A69.2x) (Additional file 1: Table S1). Relevant EDG names were identified using an iterative keyword search reviewed by a clinician (BSS). To focus on new diagnoses, we excluded persons with diagnoses indicating a history of Lyme disease up until six months prior to the index diagnosis. The Geisinger Institutional Review Board approved this study and waived informed consent. Hereafter, we use the term “case” to describe persons who were diagnosed with Lyme disease.

Classification of Lyme disease diagnoses by clinical stage and manifestation

We categorized Lyme disease cases by clinical stage and manifestation using two sources: diagnoses associated with either encounters or medication orders, and narrative free text associated with an in-person outpatient, emergency, or urgent care encounter (Fig. 1). We classified Lyme disease cases as early localized (erythema migrans) or disseminated stage disease. Among those with disseminated stage disease, we further classified cases into five non-mutually exclusive groups: arthritis, neurological manifestations, carditis, secondary erythema migrans, and “other disseminated” manifestations. Manifestations were assigned when recorded within one day of a generic Lyme disease code.

Fig. 1
Algorithm to classify Lyme disease cases by clinical stage and manifestation. ICD International Classification of Disease, EDG electronic diagnosis group

We examined diagnoses (EDG names, ICD-10, ICD-9) for Lyme disease (Additional file 1: Table S1) and related co-diagnoses (Additional file 1: Table S2) from inpatient, outpatient, emergency, or urgent care encounters and from medication orders, and narrative free text from outpatient, emergency, or urgent care encounters within 31 days before or after the index Lyme disease diagnosis. EDG names allow for a higher level of diagnostic detail than ICD codes: they can identify all Lyme disease manifestations, including erythema migrans, which cannot be identified using ICD codes. Based on the information related to stage and manifestation, each diagnosis was assigned to one of three categories: (1) a recognized manifestation of Lyme disease (e.g., Lyme arthritis); (2) information on stage but not manifestation (e.g., “early localized Lyme infection”); or (3) no information on disease stage or manifestation (e.g., “Lyme disease”). In cases with only generic Lyme disease diagnoses, we identified co-diagnoses related to manifestations of Lyme disease (e.g., rash, arthritis, meningitis) in order to determine stage or manifestation.

To extract diagnosis information from free text, we used regular expressions in Stata (MP, Version 15.1) to match words and phrases indicating Lyme disease stage or manifestation. Key words and phrases were developed using an iterative process led by a clinician (BSS), including a review of a subset of notes from a variety of years and settings to account for common misspellings and abbreviations. We examined strings of text before and after matched keywords to exclude matches where the concept was negated (e.g., “no sign of”), temporally unrelated (e.g., “history of”), or unrelated to the subject (e.g., “husband has arthritis”). When there was evidence for both early and disseminated stage Lyme disease, cases were classified as disseminated stage under the assumption that the presenting signs and symptoms were the later disseminated manifestations.
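The keyword matching with context-based exclusion described above can be sketched as follows. This is a minimal illustration in Python rather than the Stata used in the study; the patterns, the 40-character context window, and the function name `find_manifestations` are hypothetical and do not reproduce the study's keyword lists. For brevity the sketch inspects only the text before a match, within the same sentence, whereas the study also examined text after the match.

```python
import re

# Hypothetical keyword patterns (illustrative only, not the study's lists)
MANIFESTATION_PATTERNS = {
    "erythema_migrans": re.compile(
        r"\berythema\s+migrans\b|\bbull'?s[\s-]?eye\s+rash\b", re.I),
    "arthritis": re.compile(r"\blyme\s+arthritis\b|\barthritis\b", re.I),
    "facial_palsy": re.compile(r"\b(?:facial|bell'?s)\s+palsy\b", re.I),
}

# Phrases before a match that suggest negation, a past history,
# or a subject other than the patient (assumed, simplified list)
EXCLUSION = re.compile(
    r"\b(?:no\s+(?:sign|evidence)s?\s+of|denies|without|history\s+of|h/o|"
    r"husband|wife|mother|father|son|daughter)\b",
    re.I,
)

WINDOW = 40  # characters of preceding context to inspect (assumed value)


def find_manifestations(note):
    """Return the set of manifestation labels asserted for the patient."""
    found = set()
    for label, pattern in MANIFESTATION_PATTERNS.items():
        for match in pattern.finditer(note):
            context = note[max(0, match.start() - WINDOW):match.start()]
            # Keep only the part of the context in the same sentence
            context = context.rsplit(".", 1)[-1]
            if not EXCLUSION.search(context):
                found.add(label)
    return found
```

A note such as "No sign of erythema migrans. Husband has arthritis." yields no labels under these rules, while "Pt presents with erythema migrans on left thigh." yields the erythema migrans label. Fixed windows and sentence splitting on periods are crude, which is one reason rule-based extraction of this kind needs manual validation.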

Validation of EHR algorithm for early and disseminated stage Lyme disease

We conducted a validation study to calculate the positive predictive value (PPV) of our EHR algorithm to classify Lyme disease cases as either early stage or disseminated stage. The PPV was calculated as the percentage of persons for whom the algorithm-assigned stage of Lyme disease was the same as the manual review-assigned stage of Lyme disease (reference standard). Two investigators (B.S and K.M) manually reviewed the EHR records. EHR documentation reviewed included demographics, health insurance payor, diagnoses (e.g., ICD codes and EDG names), Lyme disease serologic testing results, antibiotic medication orders, problem list, and free text clinical notes. Among Lyme disease cases that the EHR algorithm classified by stage of Lyme disease, we randomly selected a subset of 100 persons: 50 persons assigned as early stage and 50 persons assigned as disseminated stage. We stratified selection by year of diagnosis (10 persons per stage in each of the 5 study years) because patterns of documentation, including for Lyme disease, have changed in the Geisinger EHR over time. We calculated 95% confidence intervals for the PPV using the binomial distribution.
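As a worked illustration of the PPV calculation, the sketch below computes a PPV and an exact (Clopper-Pearson) binomial confidence interval in Python. The choice of the exact method is an assumption, since the text specifies only a binomial interval, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, iters=60):
    """Bisection on [0, 1] for a function f that increases through zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    # Lower limit: p at which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def ppv_with_ci(n_agree, n_reviewed, alpha=0.05):
    """PPV = share of reviewed cases where algorithm and manual review agree."""
    lower, upper = clopper_pearson(n_agree, n_reviewed, alpha)
    return n_agree / n_reviewed, lower, upper
```

For example, `ppv_with_ci(46, 50)` gives the PPV and interval for 46 agreements among 50 reviewed records; the counts here are illustrative inputs, not values taken from the study's review files.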

Other variables

We extracted individual-level covariates from the EHR, including age, sex, race, ethnicity, setting of diagnosis, and season of diagnosis. Exposure to disease-carrying ticks is most likely in the late spring and summer [7, 27,28,29], and we hypothesized that compared to early stage cases, disseminated stage diagnosis would be more likely during other seasons. Individuals were defined as having primary care contact if they had two or more outpatient primary care encounters (e.g., family practice, internal medicine, pediatrics, and obstetrics/gynecology departments) prior to Lyme disease diagnosis. In Pennsylvania, Medical Assistance (i.e., Medicaid and Children’s Health Insurance Program [CHIP]) pays for health care services for eligible individuals [30]. We considered the percentage of time an individual used Medical Assistance prior to Lyme disease diagnosis as a surrogate for household socioeconomic status (SES) [31].
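The two health care covariates defined above can be expressed as simple rules. The sketch below is a hypothetical Python illustration: the category cut points follow those used in the final regression models, but computing the percentage as days on Medical Assistance divided by days of prior observation is an assumption, as are the function names.

```python
def has_primary_care_contact(n_prior_primary_care_encounters):
    """Two or more outpatient primary care encounters before diagnosis."""
    return n_prior_primary_care_encounters >= 2


def medical_assistance_category(days_on_ma, days_observed):
    """Categorize the share of prior observation time on Medical Assistance.

    Assumes the percentage is days on Medical Assistance divided by
    days observed in the EHR before the Lyme disease diagnosis.
    """
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    pct = 100.0 * days_on_ma / days_observed
    if pct == 0:
        return "0%"
    if pct < 50:
        return ">0 to <50%"
    return "50-100%"
```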

In early stage Lyme disease, serology is not recommended due to a high probability of false negatives [32]. Blood samples drawn more than four weeks after disease onset are recommended to be tested for IgG, not IgM, because of the high risk of false positive results with IgM at this stage [33, 34]. Consistent with prior analyses of the Geisinger EHR [21], we defined a positive Lyme disease serological test as either an IgG positive test (alone, with a positive enzyme immunoassay [EIA], or with a negative EIA) or an IgM positive test with a positive EIA within 30 days of diagnosis. The vast majority of Lyme disease diagnoses with positive IgG Western blots had a positive EIA (96.4%), meeting the recommendations of the CDC diagnostic criteria (CDC, 1995). For the cases with positive IgG Western blots without EIA (3.5%) or with a negative EIA (0.1%), we considered it possible that the initial positive EIA was obtained and recorded outside of Geisinger and thus categorized these test results as seropositive. We defined evidence of appropriate treatment as a medication order for an appropriate antibiotic [5, 24] within 30 days before or after the Lyme disease diagnosis date.
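The seropositivity definition above amounts to a small decision rule, sketched here in Python. Two points are assumptions rather than statements from the text: that the 30-day window applies to IgG as well as IgM results, and that it extends both before and after the diagnosis date. The function name and the string coding of results are hypothetical.

```python
from datetime import date


def is_seropositive(igg, igm, eia, test_date, diagnosis_date, window_days=30):
    """Apply the study's stated serology rule to one set of results.

    igg, igm, eia: "positive", "negative", or None when not performed.
    """
    # Assumed: the window is symmetric and applies to all test types
    if abs((test_date - diagnosis_date).days) > window_days:
        return False
    if igg == "positive":
        # IgG positive counts alone, with a positive EIA, or with a negative EIA
        return True
    # IgM positive counts only when accompanied by a positive EIA
    return igm == "positive" and eia == "positive"
```

Under this rule, an isolated positive IgM without a positive EIA is not counted, which reflects the concern about false positive IgM results noted above.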

We used geocoded addresses to assign community variables. Most addresses (88%) were geocoded to the street address; otherwise, we assigned the zip code centroid. We used the U.S. Census Bureau categorization of urbanized areas (50,000 or more people), urban clusters (at least 2500 and fewer than 50,000 people), and rural areas (all areas not in urbanized areas or urban clusters) [35]. We hypothesized that disseminated Lyme disease may be more likely in rural areas, due to higher systemic and individual barriers to health care in rural areas compared to urban areas [36].

Statistical analysis

We first used descriptive analyses to evaluate selected individual, community, and health care variables among Lyme disease cases classified by stage and by manifestation. We evaluated how these variables differed between classified and unclassified cases, and by data source (both diagnoses and free text, diagnoses only, or free text only). To identify risk factors for disseminated disease, we conducted a case-only analysis of disseminated stage compared to early stage Lyme disease cases. With multivariable Poisson regression models using generalized estimating equations (GEE), we estimated risk ratios (RR) [37] for factors associated with disseminated stage Lyme disease (vs. early stage) and with specific disseminated manifestations (arthritis, neurological manifestations, carditis, secondary erythema migrans, or unspecified “other disseminated” manifestations vs. early stage). All models specified robust standard errors clustered within community (township, borough, or city census tract). Initial models were adjusted for a priori potential confounding variables. Final models included age (< 10, 10 to < 20, 20 to < 30, 30 to < 50, 50 to < 70 [reference], and 70+ years), sex (female, male [reference]), race (non-white, white [reference]), use of Medical Assistance (0% [reference], > 0 to < 50%, and 50–100% of prior observation time), primary care contact (yes, no [reference]), setting of diagnosis (outpatient [reference], urgent care, emergency, inpatient), season of diagnosis (winter, spring, summer [reference], fall), and urban/rural status (urban, urban cluster, or rural area [reference]). In exploratory analyses, we evaluated whether sex, Medical Assistance, or season modified relations between setting of diagnosis and Lyme disease stage by including relevant cross-product terms in separate models, fully adjusted for all covariates. We conducted statistical analyses in Stata (MP, Version 14).
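In a Poisson model with a log link, each coefficient is a log risk ratio, so the reported risk ratios and confidence intervals are obtained by exponentiating the coefficient and its Wald limits. The Python sketch below shows only this conversion step; it does not fit the GEE model, and the function names and example inputs are hypothetical.

```python
from math import exp, log


def risk_ratio_ci(beta, robust_se, z=1.96):
    """Convert a log-link coefficient and its robust SE to RR and 95% CI."""
    return exp(beta), exp(beta - z * robust_se), exp(beta + z * robust_se)


def implied_log_se(lower, upper, z=1.96):
    """Recover the standard error on the log scale from a reported CI."""
    return (log(upper) - log(lower)) / (2 * z)


# Hypothetical coefficient and robust standard error, for illustration only
rr, lower, upper = risk_ratio_ci(beta=0.18, robust_se=0.07)
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of its limits, so `implied_log_se` applied to the limits returned by `risk_ratio_ci` recovers the original standard error.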

To address potential outcome misclassification, we evaluated a model in which disseminated cases were required to have positive serology (IgG or IgM). We hypothesized that requiring a positive serology result would exclude false positive cases (e.g., due to coding errors or “rule-out” laboratory tests) but would also exclude some true cases (e.g., in which serology was conducted outside Geisinger or tested before antibodies were detectable).

Results

Lyme disease cases classified by stage and manifestation

We identified 7310 cases of Lyme disease between 2012 and 2016 that met inclusion criteria (Fig. 1). Using diagnoses and narrative free text, we classified 4530 cases (62%) as early or disseminated stage. Of the classified cases, 70% were classified as early stage disease. Of the 1379 disseminated cases, Lyme arthritis was the most common manifestation (55%), followed by neurological manifestations (34%), secondary erythema migrans (7%), and carditis (6%). Diagnoses classified as disseminated stage that did not meet criteria for arthritis, neurological effects, carditis, or secondary erythema migrans were classified as “other disseminated” manifestations (6%). In a validation sample of 50 early and 50 disseminated cases, we found that the PPV of early stage disease was 92% (95% CI 81–98%) and the PPV of disseminated stage Lyme disease was 88% (95% CI 76–95%). Lyme disease cases were distributed across the study area in central and northeastern Pennsylvania (Fig. 2).

Fig. 2
Spatial distribution of Lyme disease cases classified as early and disseminated stages of Lyme disease

Classified Lyme disease cases by source of staging information

Among all persons diagnosed with Lyme disease (n = 7310), 23% were classified using information in both diagnoses and text, 26% using information in diagnoses only, and 13% using information in text only; 38% were not classified. Overall, diagnoses classified 49% of cases, compared to 36% for free text. The percentage of cases classified increased over the study period: cases classified by stage using diagnoses increased from 43 to 50%, and cases classified using text increased from 21 to 55%.
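The breakdown by source follows from two per-case indicators: whether diagnoses carried staging information and whether free text did. The Python sketch below is a hypothetical illustration using toy data; the function names and the input layout are assumptions.

```python
from collections import Counter

CATEGORIES = ("both", "diagnoses_only", "text_only", "unclassified")


def source_category(by_diagnosis, by_text):
    """Assign one case to a source-of-staging category."""
    if by_diagnosis and by_text:
        return "both"
    if by_diagnosis:
        return "diagnoses_only"
    if by_text:
        return "text_only"
    return "unclassified"


def tabulate_sources(cases):
    """cases: iterable of (by_diagnosis, by_text) boolean pairs."""
    cases = list(cases)
    counts = Counter(source_category(d, t) for d, t in cases)
    return {c: counts.get(c, 0) / len(cases) for c in CATEGORIES}


# Toy data shaped like the reported split (not the study's records)
toy = ([(True, True)] * 23 + [(True, False)] * 26
       + [(False, True)] * 13 + [(False, False)] * 38)
shares = tabulate_sources(toy)
share_any_diagnosis = shares["both"] + shares["diagnoses_only"]
share_any_text = shares["both"] + shares["text_only"]
```

The overall share classified by each source is the sum of the "both" group and the corresponding "only" group, which is how the 49% (diagnoses) and 36% (free text) figures relate to the four-way split.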

Among Lyme disease cases classified by stage (n = 4530), we observed notable patterns of sources of staging data over time and by participant characteristics (Additional file 1: Table S3). Over the study period, the percentage of classified cases with information in diagnoses alone decreased from 60 to 21%, while cases classified by text alone increased from 19 to 27%. Cases with staging-relevant data from both diagnoses and text were younger, while cases with staging data only in text were older. The most common source of data for staging varied by setting of diagnosis, primary care contact, and time using Medical Assistance.

Cases classified by stage compared to cases not classified by stage

Lyme disease cases who were classified by stage (62%, n = 4530) differed from those who were not (38%) (Additional file 1: Table S4). Over the study period, the percentage of cases classified increased from 54 to 69%. Classified cases were younger, more likely to be diagnosed in the summer, and more likely to live in a rural area. Cases with sufficient information relevant to staging were more likely to be diagnosed in urgent care, less likely to be diagnosed in the emergency department, and more likely to have primary care contact.

Characteristics of early and disseminated stage Lyme disease cases

We observed several differences in individual, community, and healthcare characteristics across Lyme disease stages and manifestations (Table 1). The secondary erythema migrans group and the “other disseminated” manifestations group had the lowest median age, followed by arthritis, and neurological manifestations and carditis. Lyme carditis was less commonly observed in females, whereas other manifestations were more equally distributed among men and women. Lyme disease cases were most commonly diagnosed in outpatient settings; however, Lyme carditis was more evenly split between inpatient and outpatient settings. Diagnosis in summer was most common across manifestations, except for Lyme arthritis, which was diagnosed relatively equally in fall and summer. For most early (94.5%) and disseminated (82.3%) cases, an appropriate antibiotic was ordered within 30 days. Among disseminated stage cases, when serology is most likely recommended, most cases (59%) had at least one serologic test order within 30 days. Just under half of all disseminated cases had positive IgG or IgM serology (45% overall, 77% of cases with test orders).

Table 1 Selected characteristics of 3151 early stage and 1379 disseminated stage Lyme disease cases in the Geisinger electronic health record, 2012–2016

Disseminated vs. early stage Lyme disease

In a multivariable model examining risk factors for disseminated versus early stage Lyme disease, we observed several associations (risk ratio [95% confidence interval]) (Table 2). Compared to persons with 0% of time using Medical Assistance, persons who used Medical Assistance for 50% or more of the time prior to Lyme disease diagnosis had a higher risk of disseminated Lyme disease overall (1.20 [1.05, 1.37]), arthritis (1.37 [1.15, 1.64]), and neurological manifestations (1.22 [0.93, 1.60]), no difference in risk of carditis (1.04 [0.47, 2.27]), but a lower risk of secondary erythema migrans (0.50 [0.24, 1.22]) and other disseminated stage (0.15 [0.02, 1.16]); the confidence intervals for neurological manifestations, secondary erythema migrans, and other disseminated stage included the null. Individuals with primary care contact had a lower risk of disseminated disease (0.59 [0.54, 0.64]). Compared to outpatient settings, inpatient diagnosis was associated with a higher risk of disseminated stage (2.21 [1.98, 2.47]), while urgent care was associated with a lower risk of disseminated stage (0.22 [0.17, 0.29]). Emergency department diagnosis was associated with a higher risk of neurological Lyme disease (1.47 [1.07, 2.02]). A higher risk of disseminated Lyme disease, particularly Lyme arthritis, was observed in cases diagnosed in winter (4.54 [3.84, 5.37]), followed by fall (2.75 [2.35, 3.22]) and spring (2.53 [2.10, 3.04]), compared to summer.

Table 2 Adjusted associations (risk ratio, 95% confidence interval) of independent variables with Lyme disease stage (disseminated vs. early stage)

In sensitivity analyses to address possible misclassification of our Lyme disease definition, inferences were similar to the main analysis when we required disseminated cases to have positive serology (IgG or IgM, 45% of disseminated cases) (Additional file 1: Table S5), or an antibiotic order within 30 days before or after the diagnosis date (Additional file 1: Table S6). In exploratory models of effect modification, we found statistical interactions between time on Medical Assistance and setting of diagnosis (p = 0.007) and season and setting of diagnosis (p < 0.001); however, the associations were qualitatively similar to the overall model (results not shown).

Discussion

In this study, we used clinical data in diagnoses and narrative text from the Geisinger EHR to classify 62% of Lyme disease cases by disease stage and manifestation. Diagnoses, particularly the EDG names that could specifically identify all early and disseminated manifestations, were able to classify 49% of cases. With a novel rule-based text-matching algorithm, we extracted staging information from narrative free text, and classified an additional 13% of cases that could not be classified using diagnoses alone. We observed proportions of Lyme disease manifestations similar to the patterns observed in surveillance data and identified novel associations with SES- and health care-related variables. Medical Assistance, a proxy for low SES, was associated with a higher risk of disseminated disease, while primary care contact and diagnosis in the urgent care setting (compared to outpatient) were associated with a lower risk of disseminated disease. These findings inform future research to determine whether improvements in SES or healthcare access can improve timely diagnosis and treatment of Lyme disease and whether targeted interventions on these factors could prevent disseminated disease.

In this study, we used both diagnoses and narrative free text to identify stage and manifestations of Lyme disease. EDG names could identify all early and disseminated Lyme disease endpoints. With ICD-10, a provider could specify a disseminated Lyme manifestation (e.g., arthritis, meningitis, other neurologic endpoints [A69.2x]), whereas with ICD-9, a provider needed both a generic Lyme disease code (088.81) and a co-diagnosis (e.g., arthritis [711.8x]) to indicate a specific manifestation. Changes in available diagnoses could have influenced classification trends over time; however, EDG names and ICD-10 codes were available throughout the study period, while ICD-9 codes were used for the small proportion of encounters (< 10%) where they were available (inpatient and emergency encounters before 2015). Over the study period, the proportion of all Lyme disease cases with staging-relevant information in free text increased from 21 to 55%. By the end of the study, 18% of cases (27% of those classified by stage) had staging-relevant information in free text that was not available in diagnoses. We hypothesize that these temporal trends were largely the result of concurrent administrative and legal incentives to increase the volume and richness of clinical information in narrative free text. Historically, most EHR-based epidemiologic studies have identified clinical outcomes using ICD diagnosis codes alone, but extracting information from free text is increasingly common [19]. EHR studies of Lyme disease, however, have previously used narrative text only to identify erythema migrans [25, 26]. Our results suggest that EHR free text can yield valuable information on Lyme disease stage and manifestations beyond what is available from diagnoses alone.

Among Lyme disease cases that could be staged, this study classified 70% as early stage and 30% as disseminated stage. This distribution is in line with national surveillance data, where 72% of confirmed Lyme disease cases had erythema migrans and 28% had at least one disseminated manifestation, with arthritis being the most common (28%), followed by neurologic endpoints (13%) and carditis (2%) [11]. Information on manifestation was only available for 60% of confirmed surveillance cases [11], which is comparable to the 62% of cases classified in our study. Importantly, the approximately 90% PPV observed in the validation sample was consistent with a common acceptable level for validation of EHR algorithms [38]. A recent claims data study from high-incidence US states categorized only 6% of cases as disseminated stage; however, this is likely an underestimate because the authors identified disseminated cases by a clinically relevant billing code (e.g., arthritis, facial palsy) within 30 days of a generic Lyme disease diagnosis and assumed cases that did not meet these criteria were early stage [10].

Across Lyme disease manifestations, we observed patterns by age, sex, and season of diagnosis similar to those in prior studies. In national surveillance data, Lyme arthritis is more common among children and adolescents [11], while carditis is more common in young adults, especially young men [16]. Our observation of proportions of manifestations similar to those in national surveillance for Lyme disease, which is known to over-represent more severe, disseminated cases [24, 39], could suggest common sources of bias, such as less health care provider documentation of uncomplicated, less severe cases in the EHR.

In a Lyme disease vaccine clinical trial, only 2–3% of the 296 definite, possible, or asymptomatic Lyme disease cases developed disseminated manifestations [4]. In a study of 88,022 persons diagnosed with Lyme disease in claims data in high-incidence states, 2.8% of cases had arthritis, 2.7% had neurologic manifestations, and < 1% had carditis [10]. Neither study identified secondary erythema migrans. In claims data, the incidence of facial palsy was highest in males aged 10–14 years, a newly identified high risk group [10]. We found that most Lyme disease cases were diagnosed in the summer, with the exception of Lyme arthritis, which was more evenly distributed throughout the fall and winter. The strong seasonal association with disseminated Lyme disease, especially the increased risk of Lyme arthritis in winter, is in line with prior surveillance findings that arthritis is the most common manifestation among Lyme disease cases with illness onset during December to March [11].

No prior quantitative epidemiologic studies have examined the relation between SES or health care factors and risk of Lyme disease manifestations. In this study, we observed associations with Medical Assistance (MA), primary care contact, and setting of diagnosis. Eligibility for MA is determined by federal and state poverty thresholds [40], and MA is used as an indicator of low SES in EHR studies [31]. Individuals who regularly see a primary care provider may be less likely to delay care. The higher risk of disseminated disease in inpatient settings likely reflects the acute severity of some manifestations, particularly Lyme carditis, which can be fatal, or neurologic symptoms like facial nerve palsy. In prior qualitative research with Geisinger Lyme disease patients, delayed diagnosis and treatment were attributed to appraisal delays (e.g., due to symptom misattribution, intermittent symptoms, atypical or no erythema migrans), behavioral delays in seeking care (e.g., due to inadequate health insurance), and misdiagnosis in urgent care or emergency settings [41]. The protective association observed here between urgent care and disseminated disease suggests that erythema migrans can often be reliably diagnosed in urgent care. We speculate that misdiagnosis in urgent or emergency settings would be more likely with atypical erythema migrans or in cases without any rash, which we could not reliably assess in this study.

This study had some limitations. We could not account for care provided outside of Geisinger, which could be a source of missing data, although the Geisinger health system provides primary, specialty, urgent, and emergency health care services. This could explain our observations of some Lyme disease diagnoses without antibiotic treatment or disseminated stage cases without a record of positive serology, although misclassification of Lyme disease stage is also a possibility. However, sensitivity analyses restricting to persons with antibiotic orders and to disseminated diagnoses with positive serology did not affect our inferences. We were not able to classify 38% of Lyme disease cases by stage of disease. The differences in demographic and health care-related variables we observed between Lyme disease cases that could and could not be classified by stage may have influenced our results. However, the percentage of unclassified cases decreased over the study period, as more specific diagnoses were recorded and the richness of narrative free text increased. Free text notes were not available from the inpatient setting, which may have resulted in disseminated stage case misclassification, or from phone calls, which may have resulted in early stage case misclassification. While the free text algorithm accounted for common spelling errors and abbreviations and excluded simple instances of negation or diagnoses not related to the patient, extracting accurate clinical information from free text encounter notes is notoriously challenging due to nonstandard grammar, common shorthand and misspelling, and auto-generated text strings [42]. Diagnostic coding accuracy likely varies by provider characteristics and setting, with higher accuracy for inpatient diagnoses that are often updated at discharge [43]. Using EDG name diagnoses, in addition to ICD-9 and ICD-10 diagnosis codes, allowed for increased flexibility and specificity in identifying Lyme disease manifestations. Although EDG name diagnoses may have limited generalizability to non-Epic EHRs, Epic is one of the largest providers of EHRs to hospitals in the US. Geisinger’s primary care population is representative of the region’s general population in terms of age and sex [44]; however, findings from a largely rural and suburban, majority non-Hispanic white population may not be transportable to other populations.

Conclusions

This is the first study to categorize Lyme disease cases by clinical stage and manifestation using both diagnoses and narrative text data from the EHR. Methods for identifying Lyme disease cases by stage and manifestation are critical for Lyme disease epidemiology, for both surveillance and inferential analyses. We found novel evidence that lower SES was associated with higher risk of disseminated Lyme disease, while primary care contact and diagnosis in the urgent care setting were consistently associated with lower risk of disseminated manifestations. Early Lyme disease causes relatively mild symptoms, and uncomplicated cases often respond well to short courses of oral antibiotics. In contrast, disseminated Lyme disease can be severe enough to require hospitalization and, in rare cases of Lyme carditis, can be difficult to treat or even fatal [45]. Delayed diagnosis, which makes disseminated infection more likely, is also a risk factor for post-treatment Lyme disease syndrome [46, 47]. Public health interventions to prevent progression to disseminated stages of Lyme disease, especially in vulnerable groups, are necessary to reduce the substantial health care costs of Lyme disease [48].