1 Introduction

Globally, the healthcare industry has emerged as one of the primary targets of cybersecurity data breaches, with incidents increasing in size and severity [1,2,3,4,5,6]. In 2021 alone, over 45 million individuals were affected by healthcare data breach incidents in the United States (U.S.) [6]. The ramifications of cybersecurity breaches in the healthcare sector are particularly severe due to the highly sensitive nature of the data involved. Protected Health Information (PHI) encompasses a wide array of data, including medical histories, test results, insurance details, and other personal identifiers. For example, on the dark web, PHI is deemed more valuable than credit card data, enabling cybercriminals to extract as much as USD 1,000 per stolen medical record [7]. Prior literature has identified cybersecurity data breaches, as well as a lack of health information exchange capacity among healthcare providers, different interoperability hardware and software standards, limited training in telehealth utilization, and insurance coverage policies that restrict the use of telehealth, as significant obstacles to the successful deployment and use of health information technology (HIT) [8,9,10,11,12,13]. Thus, cybersecurity is crucial in protecting patient, medical, and financial data against cyberattacks. Using software and hardware systems, cybersecurity prevents unauthorized use, disclosure, destruction, alteration, or access from internal or external sources [14, 15]. Earlier research has examined the impact of cybersecurity incidents on healthcare in general [16,17,18,19,20], focusing on data breaches, ransomware, and phishing incidents [21,22,23,24]. Other studies have examined the impact of cybersecurity incidents on healthcare organizations [4, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40].
One study found that healthcare facilities face several cybersecurity challenges, including data breaches and, more recently, ransomware incidents, along with remote work security challenges, human error, insufficient security awareness, inadequate senior-level risk assessments and business continuity plans, inadequate incident response coordination, budget and resource constraints, and vulnerabilities in medical systems [41].

Recent studies have shown that hospitals face substantial recovery costs following data breaches, which are particularly burdensome due to the time-sensitive nature of their services [40]. Furthermore, hospital data breach remediation efforts have been associated with lower quality healthcare services, a longer time to perform an electrocardiogram (by up to 2.7 min), and an increase in the 30-day mortality rate from acute myocardial infarction by as much as 0.36 percentage points [42]. For example, clinical laboratories, which depend heavily on networked health information technology (HIT) systems, are acutely vulnerable to the disruptions caused by cybersecurity incidents, such as ransomware attacks that lead to significant downtime [43]. Despite the critical importance of robust cybersecurity measures and the severe impacts of data breaches, which have resulted in the theft or loss of millions of records, many hospitals still lack a comprehensive understanding of the issue [2, 15, 44, 45]. There is a pressing need to develop more effective security measures by better understanding how various types of breaches and locations impact different hospital settings.

Rural hospitals in the U.S. healthcare system encounter several notable challenges. They must provide care to rural communities despite having fewer medical staff, and they often face greater healthcare disparities than urban hospitals, along with financial limitations and lower reimbursement rates [40, 46,47,48,49,50,51,52]. Furthermore, many rural hospitals are nonprofit, typically serve a higher proportion of older adults, and deal with logistical challenges like extended emergency medical services (EMS) response times and longer patient transit times [53,54,55,56,57,58,59,60]. Studies suggest that these challenges may make rural hospitals more susceptible to cybersecurity incidents due to limited staffing and financial resources [40, 61,62,63]. Despite their critical role, there remains a significant need for empirical research on cybersecurity in rural hospitals in the United States, highlighting a gap in describing how breaches vary between urban and rural hospitals.

In this study, we examine whether there is an increase in breaches across hospitals each year, contrary to the hypothesis of a static breach rate over time, to determine whether cybersecurity threats are on the rise. We hypothesize that urban hospitals are more susceptible to breaches than rural ones, challenging the notion that hospital location does not influence breach likelihood. We seek to determine if the urban–rural distinction significantly influences breach incidence. Furthermore, we question whether the type of the breach and its location (site) within the hospital significantly predict a hospital's classification as an urban facility, challenging the assumption that these factors are irrelevant to hospital classification. Our approach aims to provide comprehensive insights into how hospital settings, time, breach type, and breach location interact in the context of hospital cybersecurity.

This paper aims to characterize how cybersecurity breaches vary between urban and rural hospitals and how they differ over time and across different types and locations of breaches within hospitals. This study is one of the first to provide insight into how different types and locations of breaches impact urban vs. rural hospitals, aiming to tailor security measures more effectively. By understanding these different impacts, hospitals can allocate their cybersecurity resources more efficiently, focusing on the most significant threats they face. Policymakers and regulatory bodies can leverage the findings to develop or refine guidelines and regulations that protect patient data, ensuring that these measures are attuned to the specific vulnerabilities of different hospital settings.

2 Methods

2.1 Data sources and sample

Our analysis is based on publicly available data breach incident reports submitted to the U.S. Department of Health and Human Services (HHS) Office for Civil Rights (OCR) [64]. Specifically, we used the OCR’s report, which includes all reported data breaches that impact 500 or more individuals at healthcare providers covered by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) between 2012 and 2021. Each breach report provides details such as the covered entity type, the year, the type, and the location of the breach. HIPAA breach notification rules require covered entities and their business associates to notify affected individuals, the HHS, and sometimes the media within 60 days of discovering a breach of unencrypted PHI affecting 500 or more individuals [65]. This study defines a data breach as an unauthorized acquisition, access, use, or disclosure of PHI that violates the information's security or privacy [66].

We systematically linked the Office for Civil Rights (OCR) data to the American Hospital Association’s (AHA) survey database [67]. This linkage enables us to ascertain each hospital's urban or rural status in our sample. The data specifically focused on community hospitals, defined by the AHA as nonfederal, short-term general, and special hospitals. This focus allowed us to examine a more homogenous group of hospitals in our sample.
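The linkage step can be illustrated with a minimal sketch in plain Python. It assumes that both sources can be reduced to records sharing a hospital key; the field names `hospital_id` and `in_msa`, and all record values, are hypothetical and are not taken from the OCR or AHA files. In practice the OCR reports identify entities by name, so a real linkage would likely need name and address matching before a join of this kind.

```python
def link_breaches_to_hospitals(breaches, hospitals):
    """Attach an urban/rural label to each breach via a shared hospital key."""
    by_id = {h["hospital_id"]: h for h in hospitals}
    linked, unmatched = [], []
    for breach in breaches:
        hospital = by_id.get(breach["hospital_id"])
        if hospital is None:
            # Keep unmatched reports for review instead of dropping them silently.
            unmatched.append(breach)
            continue
        # Rural = located outside a metropolitan statistical area (MSA).
        setting = "urban" if hospital["in_msa"] else "rural"
        linked.append({**breach, "setting": setting})
    return linked, unmatched


# Hypothetical records; identifiers and values are invented for illustration.
breaches = [
    {"hospital_id": "H001", "year": 2019, "individuals_affected": 1200},
    {"hospital_id": "H002", "year": 2020, "individuals_affected": 640},
    {"hospital_id": "H999", "year": 2021, "individuals_affected": 5100},
]
hospitals = [
    {"hospital_id": "H001", "in_msa": True},
    {"hospital_id": "H002", "in_msa": False},
]

linked, unmatched = link_breaches_to_hospitals(breaches, hospitals)
print([(b["hospital_id"], b["setting"]) for b in linked])
print("unmatched reports:", len(unmatched))
```

Returning the unmatched reports separately makes it possible to check how many breach reports fall outside the hospital sample rather than losing them in the join.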

We identified a final sample of 237 unique hospitals from the linked data, comprising 185 urban and 52 rural hospitals. The classification was based on criteria established by the Office of Management and Budget (OMB), with rural hospitals being those located outside of metropolitan statistical areas (MSAs) [68]. In instances where a hospital reported more than one breach, we included only the breach that affected the greatest number of individuals to focus on breaches with potentially the most significant impact. Our analysis identified 40 hospitals that reported multiple breaches; of these, 37 were urban, and 3 were rural. Lastly, all the data used in the analysis was anonymized to protect hospital confidentiality.
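The rule of retaining only the largest breach per hospital can be sketched as follows. This is an illustration under assumed field names (`hospital_id`, `individuals_affected`), with invented records; it is not the authors' actual pre-processing code, and the handling of ties (the first report is kept) is an assumption.

```python
def keep_largest_breach(breaches):
    """Keep, for each hospital, the single breach affecting the most individuals."""
    largest = {}
    for breach in breaches:
        key = breach["hospital_id"]
        current = largest.get(key)
        # Strict '>' means the earliest report wins when two breaches tie.
        if current is None or breach["individuals_affected"] > current["individuals_affected"]:
            largest[key] = breach
    return list(largest.values())


# Hypothetical reports: hospital H001 reported two breaches.
reports = [
    {"hospital_id": "H001", "year": 2015, "individuals_affected": 800},
    {"hospital_id": "H001", "year": 2019, "individuals_affected": 12000},
    {"hospital_id": "H002", "year": 2020, "individuals_affected": 640},
]

sample = keep_largest_breach(reports)
print(sample)
```

One consequence of this rule is that each hospital contributes exactly one breach year to the counts, so the yearly totals reflect the largest incidents rather than all reported incidents.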

2.2 Independent variables

Our independent variables included: 1) Hospital setting, which differentiates hospitals by geographical setting as either urban or rural; 2) Year of the breach, a temporal variable that allows for the analysis of trends over time; 3) Type of breach, categorized by the OCR dataset [64] as follows: (a) hacking/Information Technology (IT) incidents involving PHI; (b) improper disposal of PHI; (c) loss of PHI information; (d) other, including breaches not from a desktop, laptop, or email; (e) theft of PHI; (f) unauthorized access or disclosure of PHI; and (g) unknown type of incident; and 4) Location (or site) of the breach, defined in terms of the hospital's operational technology, with the following categories: (a) desktop computer, (b) laptop, (c) electronic medical records (EMR), (d) network server, (e) email, (f) other portable electronic devices (tablets, smartphones, external storage devices, etc.), and (g) paper or films.
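Categorical predictors such as breach type enter a regression model as indicator (dummy) columns measured against a reference level. The sketch below shows one way to build such columns in plain Python; the function name `dummy_code` and the label strings are hypothetical. The choice of hacking/IT incidents as the reference follows the comparison reported in Section 3.2, and the same pattern would apply to breach location with email as the reference.

```python
def dummy_code(values, levels, reference):
    """Return indicator columns for every level except the reference level."""
    columns = [level for level in levels if level != reference]
    rows = [[1.0 if value == column else 0.0 for column in columns] for value in values]
    return columns, rows


# Illustrative labels for the breach-type categories (exact strings are assumed).
breach_types = [
    "Hacking/IT Incident",
    "Improper Disposal",
    "Loss",
    "Other",
    "Theft",
    "Unauthorized Access/Disclosure",
]

observed = ["Theft", "Hacking/IT Incident", "Loss"]
columns, rows = dummy_code(observed, breach_types, reference="Hacking/IT Incident")
for value, row in zip(observed, rows):
    print(value, row)
```

A row of all zeros denotes the reference category, so each fitted coefficient is read as a log rate difference relative to hacking/IT incidents.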

2.3 Dependent variable

The number of breaches per year was considered the dependent (response) variable, with the total number of hospitals each year included as a log offset. This approach enables the analysis of breach incidents, facilitating the identification of trends and patterns over time across different hospital settings (urban/rural), breach submission years, types of breaches, and breach locations (site).
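For reference, a count model with a log offset can be written as below. The notation is an assumption introduced here for illustration: $Y_{st}$ is the number of breaches in setting $s$ and year $t$, and $N_t$ is the total number of hospitals in year $t$. The models for breach type and breach location would add the corresponding indicator terms.

```latex
\log \mathbb{E}[Y_{st}]
  = \log N_t + \beta_0 + \beta_1\,\mathrm{Urban}_s + \beta_2\,\mathrm{Year}_t
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[Y_{st}]}{N_t}
  = \exp\!\left(\beta_0 + \beta_1\,\mathrm{Urban}_s + \beta_2\,\mathrm{Year}_t\right)
```

Because the coefficient of $\log N_t$ is fixed at 1, the remaining coefficients describe breaches per hospital rather than raw counts.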

2.4 Analysis

A Poisson regression model was initially used to predict the number of breaches each year, considering time (i.e., year of breach), type of hospital, type of breach, and breach location as independent variables. Upon detecting significant overdispersion, as indicated by a delta deviance value significantly exceeding the degrees of freedom, we transitioned to a Quasi-Poisson model to handle the extra variation more efficiently [69, 70]. The Quasi-Poisson model was used, assuming the variance-mean relationship is linear [71]. The goodness-of-fit for the models was assessed using delta deviance and Pearson chi-squared statistics [69, 70]. The data were loaded into a Jupyter Python Notebook environment (https://jupyter.org) for data pre-processing, cleaning, integration, and filtering before conducting statistical analysis using Python and Pandas libraries [72]. The Poisson and Quasi-Poisson regression models were deemed appropriate for our hospital data breach analysis because the data consists of independent, non-negative integer counts of the number of times a data breach occurred; other studies have used Poisson models in a similar context to our research [73, 74].
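The sequence described above (fit a Poisson model, check for overdispersion, then rescale the standard errors) can be sketched in dependency-free Python. This is not the authors' code: the counts and hospital totals are invented, the function names are hypothetical, and the dispersion is estimated here as the Pearson chi-squared statistic divided by the residual degrees of freedom, which is a common convention but is assumed rather than confirmed by the text.

```python
import math


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def predict(X, offset, beta):
    """Fitted means under the log link: mu = exp(offset + X beta)."""
    return [math.exp(o + sum(b * v for b, v in zip(beta, row)))
            for row, o in zip(X, offset)]


def information(X, mu):
    """Fisher information X' diag(mu) X for the Poisson log-linear model."""
    p = len(X[0])
    return [[sum(r[j] * r[k] * m for r, m in zip(X, mu)) for k in range(p)]
            for j in range(p)]


def fit_poisson(X, y, offset, max_iter=100, tol=1e-10):
    """Newton-Raphson fit; column 0 of X is assumed to be the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    # Start the intercept at the log of the overall rate to keep steps small.
    beta[0] = math.log(sum(y) / sum(math.exp(o) for o in offset))
    for _ in range(max_iter):
        mu = predict(X, offset, beta)
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        step = solve(information(X, mu), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def quasi_poisson_summary(X, y, offset):
    """Poisson fit plus dispersion-scaled (quasi-Poisson) standard errors."""
    beta = fit_poisson(X, y, offset)
    mu = predict(X, offset, beta)
    n, p = len(y), len(beta)
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    deviance = 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                         for yi, mi in zip(y, mu))
    phi = pearson / (n - p)  # dispersion estimate; near 1 if Poisson holds
    info = information(X, mu)
    se_poisson = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        se_poisson.append(math.sqrt(solve(info, unit)[j]))
    se_quasi = [math.sqrt(phi) * s for s in se_poisson]
    return {"beta": beta, "deviance": deviance, "df": n - p, "phi": phi,
            "se_poisson": se_poisson, "se_quasi": se_quasi}


# Hypothetical yearly counts (2012-2021) and hospital totals; NOT the study data.
rural_counts = [1, 0, 2, 1, 3, 2, 4, 6, 5, 9]
urban_counts = [5, 3, 9, 12, 8, 20, 15, 30, 41, 38]
hospitals_per_year = [4600, 4620, 4650, 4640, 4660, 4700, 4690, 4710, 4720, 4730]

X, y, offset = [], [], []
for is_urban, counts in ((0.0, rural_counts), (1.0, urban_counts)):
    for t, count in enumerate(counts):
        X.append([1.0, is_urban, float(t)])  # intercept, urban flag, year index
        y.append(float(count))
        offset.append(math.log(hospitals_per_year[t]))

result = quasi_poisson_summary(X, y, offset)
print("coefficients (intercept, urban, year):", [round(b, 3) for b in result["beta"]])
print("deviance / df:", round(result["deviance"] / result["df"], 2))
print("dispersion estimate:", round(result["phi"], 2))
print("quasi-Poisson SE:", [round(s, 3) for s in result["se_quasi"]])
```

The quasi-Poisson point estimates are identical to the Poisson ones; only the standard errors change, inflated by the square root of the dispersion estimate. This is why a predictor can be significant under the Poisson model yet lose significance under the Quasi-Poisson model.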

3 Results

Table 1 presents the comprehensive dataset used in our analysis. It details data breaches across 237 distinct community hospitals, including 185 categorized as urban and 52 as rural, from 2012 to 2021. The table categorizes breaches by type, location, and breach submission year, providing a breakdown for urban and rural hospitals.

Table 1 Data breaches by type, location, and year in urban and rural community hospitals, 2012–2021

3.1 Trend analysis—urban vs. rural hospitals

The trends of data breaches over time in urban and rural hospitals, as shown by the Quasi-Poisson regression model, revealed significant predictors (Table 2). Initially, a Poisson regression model was applied; however, it exhibited overdispersion (delta deviance = 44.959, df = 17, p < 0.001), necessitating the adoption of a Quasi-Poisson regression model to account for this variability accurately. Within this model, the hospital setting was a significant predictor, with urban hospitals experiencing a higher log count of breaches than rural hospitals (β = 0.554, SE = 0.255, z = 2.170, p = 0.044). Additionally, the year was a significant positive predictor, indicating an annual increase in the log count of breaches over the study period (β = 0.174, SE = 0.040, z = 4.294, p < 0.001). These results suggest that hospital setting and time significantly contribute to the number of data breaches, with urban hospitals and more recent years associated with higher breach counts.
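Because the model uses a log link, the reported coefficients can be read on a multiplicative scale by exponentiating them. The short sketch below does this for the two coefficients reported in this section. The interval uses a normal approximation with z = 1.96, which is an assumption made here for illustration; the paper does not report confidence intervals for these coefficients.

```python
import math

# Coefficients and standard errors as reported in Section 3.1 (Table 2).
beta_urban, se_urban = 0.554, 0.255
beta_year, se_year = 0.174, 0.040


def rate_ratio(beta):
    """Exponentiate a log-link coefficient to obtain a multiplicative rate ratio."""
    return math.exp(beta)


def wald_interval(beta, se, z=1.96):
    """Approximate 95% interval on the ratio scale (normal approximation, assumed)."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


rr_urban = rate_ratio(beta_urban)
rr_year = rate_ratio(beta_year)
pct_per_year = 100.0 * (rr_year - 1.0)

print(f"urban vs. rural rate ratio: {rr_urban:.2f}")
print(f"year-over-year rate ratio: {rr_year:.2f} ({pct_per_year:.1f}% per year)")
print("approximate interval, urban:", tuple(round(v, 2) for v in wald_interval(beta_urban, se_urban)))
print("approximate interval, year:", tuple(round(v, 2) for v in wald_interval(beta_year, se_year)))
```

Read this way, the urban coefficient corresponds to a breach rate about 1.74 times that of rural hospitals, and the year coefficient to an increase of roughly 19% per year, holding the other predictor fixed.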

Table 2 Parameter estimate of the Quasi-Poisson regression model for predicting the number of breaches over time among urban and rural hospitals

Figure 1 illustrates the predicted number of breaches over time, showing a noticeable upward trend for both hospital settings (urban/rural). However, the trend for urban hospitals shows a steeper increase, suggesting that urban hospitals have been experiencing a faster growth rate in the number of breaches compared to rural hospitals.

Fig. 1 Predicted number of breaches by year and hospital setting (rural vs. urban). This plot shows the predicted number of hospital breaches from 2012 to 2021, differentiating between rural (cyan line) and urban (red line) hospitals. The shaded regions represent the 95% confidence intervals

3.2 Type of breach

The analysis revealed overdispersion in the Poisson regression model, as indicated by a delta deviance of 253.15 (df = 1, p < 0.001); thus, a Quasi-Poisson regression model was adopted (Table 3). The number of breaches increased over time (p < 0.001), with urban hospitals experiencing significantly more breaches than rural hospitals (p < 0.045). The most common type of breach was hacking/IT incidents. Compared to hacking/IT incidents, all other types of breaches were significantly less frequent, except for unauthorized access/disclosure, which did not differ significantly.

Table 3 Parameter estimate of Quasi-Poisson regression model for predicting the number of breaches over time among urban and rural hospitals with respect to the type of breach

The predicted number of data breaches by type from 2012 to 2021 (Fig. 2) showed clear temporal trends, with predicted frequencies rising over the study period. In urban hospitals, the rise was driven markedly by hacking/IT incidents and unauthorized access/disclosure breaches. Conversely, rural hospitals experienced a more moderate increase in these specific breach types. Other breach categories, such as improper disposal, loss, and theft, exhibited relatively consistent trends with lower incidence rates throughout the study period.

Fig. 2 Predicted number of data breaches by type of breach and hospital setting (rural vs. urban) from 2012 to 2021. This figure displays the predicted number of data breaches over a ten-year period, segmented by the type of breach and differentiated by hospital setting (rural versus urban). The prediction lines represent various breaches: hacking/IT incident (red), improper disposal (purple), loss (green), other (blue), theft (cyan), and unauthorized access/disclosure (orange)

3.3 Location of breaches

The results showed overdispersion in the initial Poisson regression model (delta deviance = 340.79, df = 150, p < 0.001), leading to the adoption of a Quasi-Poisson regression model (Table 4). The model indicated an increase in the number of breaches over time (p = 0.002). However, there was no significant difference in the number of breaches between urban and rural hospitals (p = 0.096); hospital setting was nonetheless retained in the model because it was significant in the Poisson regression model (i.e., significance was lost due to the inflation of standard errors in the Quasi-Poisson regression model). Email was the most frequent breach location; relative to email, all other locations had significantly fewer breaches, except network servers.

Table 4 Parameter estimate of Quasi-Poisson regression model for predicting the number of breaches over time among urban and rural hospitals with respect to breach location

The predicted number of data breaches by location revealed distinct trends from 2012 to 2021 (Fig. 3). There was a considerable increase in predicted breaches, with differences between rural and urban hospitals. Email was the most prominent breach location in both settings, showing a significant upward trend, particularly in urban hospitals. There was a steep increase in breaches related to network servers in urban hospitals, whereas the trend was less pronounced in rural hospitals. Other portable electronic devices, including tablets and smartphones, showed an increase, although less pronounced than email or network server breaches. Paper/films demonstrated a slight increase, with rural hospitals having fewer predicted breaches than urban ones. Finally, a steady increase in breaches related to EMR systems was noted, especially in urban hospitals.

Fig. 3 Predicted number of data breaches by location of breach and hospital setting (rural vs. urban) from 2012 to 2021. This figure displays the predicted number of data breaches over a ten-year period, segmented by breach location and differentiated by hospital setting (rural versus urban). The prediction lines correspond to breaches associated with Email (red), Desktop Computers (purple), Electronic Medical Records (green), Laptops (blue), Network Servers (cyan), Others (yellow), Other Portable Electronic Devices (pink), and Paper/Films (orange)

4 Discussion

Our study supports the hypothesis that data breaches are increasing annually across urban and rural hospitals. Our Quasi-Poisson regression analysis provides evidence of a significant increase in data breaches across both urban and rural hospitals during the 2012–2021 decade. This trend is quantitatively supported by our regression outputs, which show substantial yearly increases in breach incidents. In particular, urban hospitals have experienced a more pronounced escalation in data breaches, as indicated by a higher log count of breaches (β = 0.554, SE = 0.255, z = 2.170, p = 0.044). This result underscores a relatively steeper increase in urban settings than in rural hospitals. These findings underscore the constant challenges in mitigating cybersecurity incidents effectively across diverse hospital settings. Our findings expand the current literature on cybersecurity in the hospital industry, which is increasingly recognized as one of the primary targets for cybersecurity data breaches. Previous studies have documented these incidents' rising scale and severity globally [1,2,3,4,5,6]; our findings add detail on how these trends differ between urban and rural hospitals over a decade. Specifically, our study shows that urban hospitals have a higher incidence of breaches than their rural counterparts.

The type of breach analysis indicates that urban hospitals experience a higher frequency of data breaches than rural hospitals, with an increasing trend of breaches over time. Hacking/IT incidents are the most common type of breach, with other types, such as improper disposal, loss, theft, and unauthorized access/disclosure, occurring less frequently. Our results also indicate that unauthorized access/disclosure breaches occur slightly less frequently than hacking/IT breaches but not to the degree that reaches statistical significance. This result is significant because it could imply that both types of breaches are commonly reported, but that hacking/IT incidents may just be slightly more prevalent or easier to detect. The prevalence of hacking/IT incidents in urban settings likely reflects their more extensive IT infrastructures and larger operational scale, including more patients and staff [4, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]. This trend may underscore the broader adoption of HIT in urban hospitals, which, while enhancing healthcare delivery, also expands their vulnerability to hacking/IT incidents. In contrast, the type of breach in rural hospitals demonstrates a more moderate increase in the same categories of hacking/IT incidents. Although these increments are less marked than in urban hospital settings, the potential impact on rural healthcare delivery could be significant. Rural healthcare facilities often grapple with operational challenges such as limited medical staffing, smaller budgets, and less advanced IT infrastructures [53,54,55,56,57,58,59,60]. Thus, even incremental increases in hacking/IT incidents could severely strain these already resource-constrained environments, potentially hindering their ability to deliver essential healthcare services. Future research should seek to explore the impacts of hacking/IT breaches on rural healthcare delivery.

Regarding the location of breaches, email and network servers were the predominant locations for data breaches, with a notably higher incidence in urban hospitals. This finding is consistent with existing literature highlighting email as a frequent vector for security breach incidents [21,22,23,24]. Conversely, breaches involving desktop computers, electronic medical records (EMR), and laptops occurred significantly less frequently, emphasizing the particular vulnerability of email systems to cybersecurity incidents. Our analysis also shows a marked increase in breaches related to network servers in urban settings compared to rural hospitals. This pattern suggests that urban hospitals, with their complex HIT systems and more extensive data storage needs, are more vulnerable, although rural hospitals are not unaffected by such challenges. The similar levels of risk associated with network servers and email systems across both rural and urban settings highlight a crucial aspect of hospital cybersecurity: both are essential focal points of HIT operations that require robust protection. Furthermore, the higher incidences of breaches via email and network servers mark a significant departure from earlier findings in the literature, which predominantly reported breaches involving paper/films as the most common in hospital settings [31]. This shift points to an evolving landscape of cybersecurity threats, moving towards more technologically sophisticated breaches that may reflect the growing complexity of HIT infrastructures in hospital settings.

The limitations revealed by the current study set an important agenda for our future research. First, while this study aims to characterize the nature of data breaches, it reveals the need for deeper investigations into their impacts on patient safety and financial health, particularly in rural hospitals. Such research should consider the broader implications of cybersecurity incidents, enhancing our understanding of their direct and indirect effects across different hospital settings. Second, the study relies on OCR data, which only document breaches affecting 500 or more individuals, linked to the AHA Annual Survey. This could underrepresent smaller breaches that are nonetheless impactful. The accuracy and completeness of these reported data are assumed under HIPAA regulations [65]. However, compliance may vary among hospitals, which might lead to gaps in data due to inconsistent or incomplete reporting practices.

Third, our research model, primarily focusing on community hospitals in urban vs. rural settings, is limited by the available data and initial exploratory hypotheses. Expanding this model to include variables such as hospital ownership, teaching status, bed size, and system affiliation could deepen our understanding of the factors influencing cybersecurity breach vulnerabilities; extensive research with expanded datasets and more detailed data will be vital to determine the conditions under which these breaches occur. Fourth, the potential underrepresentation of breaches in rural hospitals due to smaller sample sizes or reporting biases could skew results toward urban hospitals, potentially misrepresenting the actual landscape of cybersecurity threats. Fifth, we treated the type of breach and the location (or site) as independent variables, thus assuming that cybersecurity breaches occur homogeneously across all locations. However, cybersecurity breaches could vary across different hospital settings due to diverse environmental and operational factors, so this assumption may overlook unique vulnerabilities or threats faced by different hospital settings. Therefore, further research is warranted to evaluate the differences in cybersecurity breaches in relation to their locations. This additional analysis could provide deeper insights into how environmental and operational factors at various hospital sites might influence the data breach security vulnerabilities they face.

Sixth, further research should thoroughly investigate the complexities of digital IT infrastructures within hospital settings. With their high volume of data transactions, urban hospitals may face increased risks of cybersecurity breaches due to multiple access points that introduce technical vulnerabilities and human errors [2, 15, 44, 45]. Future studies should delve into the nuances of network architecture in different hospital settings, including portable electronic devices and EMRs, and evaluate the efficacy of existing security measures for these technologies. Seventh, additional research is needed to understand how socioeconomic demographics influence cybersecurity practices and outcomes in urban and rural areas. The digital literacy of patient populations may significantly affect PHI security and privacy management. Exploring how the urban–rural digital divide impacts vulnerability to cybersecurity threats should inform data protection strategies tailored to different hospital settings. In rural hospitals, where disparities in health outcomes are well-documented, investigating the impact of limited resources and access to technology on cybersecurity breach risks may be crucial [40, 46,47,48,49,50,51,52].

5 Conclusions

This study examines the rising cybersecurity threats faced by both urban and rural hospitals, underscoring the pressing need for improved data security measures. The findings highlight that urban hospitals are more susceptible to data breaches than rural ones, suggesting a significant vulnerability linked to their complex health information technologies and larger operational scales. The study also confirms a troubling trend of increasing cybersecurity breaches year over year, emphasizing the ongoing challenges in curbing these incidents. Important limitations of this research include its reliance on breach data that only includes incidents affecting 500 or more individuals, potentially underrepresenting the total number of breaches. Furthermore, the study focused primarily on the distinction between urban and rural hospitals, leaving out other potentially influential hospital characteristics like ownership type, teaching status, and system affiliation. These factors could provide deeper insights into different hospital settings' vulnerabilities and specific needs. Future research must expand the analysis to incorporate these additional factors to describe a more comprehensive picture of the cybersecurity landscape. This would not only enhance the robustness of the findings but also help tailor cybersecurity strategies to the needs of specific hospital settings. Additionally, there is a need to explore the direct impacts of data breaches on patient care and hospital operations to further guide policy and operational decisions. By applying these findings, healthcare administrators can better allocate resources toward the most effective security measures, potentially reducing the frequency and severity of breaches. This research serves as a foundation for ongoing discussions and further studies to enhance hospital resilience against cybersecurity threats.
This research is anticipated to be valuable to healthcare administrators, practitioners, and researchers. It also aims to spark further discussion and research on cybersecurity in rural healthcare institutions, enhancing the capacity of both urban and rural hospitals to prevent, mitigate, and recover from data breaches.