1 Introduction

With ongoing urbanization and the adoption of modern lifestyles worldwide, the global burden of disease will most likely increase in the near future (Masoli et al. 2004), and will not be distributed equally across regions. There is a need for high-resolution comparisons to identify populations particularly at risk and/or susceptible to adverse environmental and access-related factors within the cities to guide policy makers by identifying local health disparities in densely populated areas. This paper will address identifying areas of health disparities and inequity based on access to healthcare and environmental factors for asthma patients in the Metropole Ruhr.

1.1 Definition and Epidemiology of Asthma

Asthma bronchiale is a noncommunicable chronic respiratory disease affecting approximately 262 million people worldwide (WHO 2021). Asthma symptoms such as wheezing, cough, chest tightness and shortness of breath are caused by chronic inflammation and narrowing of the air passages (NVL 2020; WHO 2021). Due to its high prevalence, asthma is one of the major noncommunicable chronic diseases (WHO 2021). No single universal definition of asthma in epidemiological studies has been agreed upon (Pekkanen and Pearce 1999; Toelle et al. 1992). For the purpose of this study and in accordance with the DEGS1, KiGGS Wave 2, and WiDO studies, the definition of asthma prevalence is limited to 12-month asthma prevalence with prescribed medication (Robert Koch Institute 2015, 2019; Wissenschaftliches Institut der AOK 2020).

According to GEDA, the 12-month prevalence of asthma in the adult population in Germany was 6.2% based on the participants’ self-reports, while in DEGS1, 5% of the participants reported a diagnosis of asthma. Secondary data of the statutory health insurance show a 5.9% prevalence of diagnosed asthma in adults. KiGGS Wave 2 and statutory health insurance data show similar prevalences of 4.0% and 5.1%, respectively, in children and adolescents (German National Cohort (GNC) Consortium 2014; Hoffmann 2007; Langer et al. 2020; Akmatov et al. 2018; RKI 2017; Wissenschaftliches Institut der AOK). In the last 10 years, the number of cases and the age-adjusted mortality for ICD-10 codes J46 (status asthmaticus) and J45 (asthma bronchiale) decreased in all age groups in Germany (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020). Gender disparities were found to vary among age groups. While boys are more likely to be affected than girls in childhood, this relationship reverses after puberty, leading to the assumption that sex hormones play an important role in developing asthma (Carey et al. 2007; Fuseini and Newcomb 2017).

A higher body mass index (BMI) is associated with a higher risk of asthma in adults. In children, for whom the BMI was found to be an inadequate measure, fat mass measures exhibit a similar relationship (Guibas et al. 2013).

Higher asthma prevalence can be found in groups with low socioeconomic status, while allergies, a common comorbidity potentially influencing asthma, were associated with higher socioeconomic status. Laussmann et al. (2012: 310) state that children living in rural areas or smaller cities suffered less often from asthma, which is suspected to be caused, inter alia, by increased air pollution in cities. Furthermore, exposure to traffic and congestion could contribute to triggering respiratory diseases (Nowak and Mutius 2004: 511).

1.2 Environmental Risk Factors and Beneficiaries

While allergies like rhinitis and eczema, smoking, and obesity are considered risk factors for both developing asthma and exacerbations, other factors are unique to either condition. Thus, distinguishing between exposure to risk factors for acute exacerbation of asthma and developing asthma is necessary for evaluating risk factors, beneficiaries, and disease burden. For asthma patients, experiencing symptoms, being exposed to triggers, and exacerbations can result in a lower overall quality of life as well as negative effects on social interactions, limitations of activities, and reduced productivity (Stanescu et al. 2019).

1.2.1 Air Quality

Asthma patients are particularly at risk from the negative impact of indoor and outdoor air pollutants (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020; Masoli et al. 2004). Indoor environmental allergens with a link to asthma onset and/or exacerbation include moulds, house dust mites, and chemicals, while ozone, nitrogen dioxide and PM2.5/PM10 are the most commonly studied outdoor pollutants (Guarnieri and Balmes 2014; WHO 2021). Guarnieri and Balmes (2014) state that, while direct inflammatory effects of air pollutants on airway neuroreceptors occur at very high concentrations not commonly experienced in Germany, ozone, nitrogen dioxide and PM2.5 can induce airway responsiveness and (allergic) inflammation at lower concentrations and are associated with oxidative stress. This leads to the well-founded assumption that exposure to pollutants is associated with exacerbation and onset of asthma through oxidative injury to the airways.

Short-term exposure to air pollutants is associated with an increased number of (emergency) hospitalizations. A 10 µg/m³ increase in PM2.5 was associated with a 1.5% increase in the risk of emergency admission (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020). Due to potential confounders, this relationship might not be causal for the lower concentrations commonly found in Germany.

Therefore, the national guideline recommends limiting exposure to the aforementioned air pollutants in occupational, indoor, and outdoor settings (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020).

1.3 Greenness

Studies suggest that greenness can influence asthma both negatively and positively. As a direct effect, urban green means seasonal exposure to allergens such as weed and grass pollen, associated with increased asthmatic symptoms (DellaValle et al. 2012; Lovasi et al. 2013), as well as reduced air pollutant concentrations and urban heat islands (Nieuwenhuijsen et al. 2017; Shanahan et al. 2015). While exposure to allergens can cause allergic rhinoconjunctivitis and, therefore, increase the risk of asthmatic episodes, early-life exposure is also suspected to prevent allergies and strengthen the immune system in accordance with the hygiene and environmental hypothesis (Mutius 2016; Ruokolainen et al. 2015).

An increased tree density was also found to be associated with a lower prevalence of asthma and a lower risk of asthma-caused hospitalization, although the latter was no longer significant after controlling for confounders (Lovasi et al. 2008). Increasing forest and agriculture cover within a 2–5 km range has been found to be associated with a lower risk of atopic sensitization (Ruokolainen et al. 2015).

Indirect effects of greenness on wellbeing have also been taken into consideration by multiple studies. An increase in urban green in the surrounding living area was found to be associated with a lower prevalence of obesity and excessive screen time in children (Dadvand et al. 2014).

1.4 Access to Healthcare

Timely access to relevant healthcare services influences treatment outcome, quality of care, and utilization, and inadequate access to health care has been associated with increased morbidity, hospitalization rates, and avoidable deaths in asthma patients, especially when combined with lower socioeconomic status (Bryant-Stephens 2009; CDC 2018; Evans et al. 1999; Haselkorn et al. 2008; Jones and Bentham 1997; Levy et al. 2006; Strunk et al. 2002). According to German clinical practice guidelines, disease management, therapy, symptoms, and adherence should be monitored regularly so that adjustments can be made as necessary (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020). Asthma costs and future risks of severe exacerbations are linked to the patient’s asthma control level (Luskin et al. 2014). Due to the limited capacities of specialists, basic diagnosis and pulmonary function testing such as spirometry are often performed by general practitioners. Structured treatment programs are advised to lie within the treatment scope of the general practitioner or paediatrician, unless the patient’s health status is highly unstable or asthma is classified as severe. In the case of comorbidities, consulting another specialist may be necessary. Hospitalisation may be indicated based on the severity of exacerbation(s) or severe infections affecting the respiratory system. Along with general practitioners, pharmacists are encouraged to instruct patients in inhaler and medication use and, when possible, to monitor adherence and correct use (Bundesärztekammer (BÄK), Kassenärztliche Bundesvereinigung (KBV), Arbeitsgemeinschaft der Wissenschaftlichen Medizinischen Fachgesellschaften 2020). Accessibility of these health facilities and of medication such as inhalers is, therefore, crucial for adequate treatment of asthma. Thus, improving the distribution of health services and medication is an important step towards equity of health care (Corburn and Cohen 2012; Masoli et al. 2004).

2 Data Base and Study Area

The Metropolitan Region Ruhr is a densely populated region in North Rhine-Westphalia, located in the west of Germany. In an area of 4439 km², more than 5 million inhabitants live in its four districts and eleven cities, making the Metropolitan Region Ruhr the largest conurbation in Germany. Settlements make up 29.6% of the area, and 9.5% is dedicated to traffic (Regionalverband Ruhr 2021) (Fig. 1).

Fig. 1 Study area and population density

A large number of different data sets are combined in this study (see Table 1). They are either directly associated with the final results, providing values and attributes that help to address the research goal, or they are mandatory auxiliary data sets that are crucial for transferring values from large spatial units (source zones) to smaller ones (target zones) during the disaggregation (see chapter 3.1). Almost all data sets are openly available to provide a maximum level of reproducibility, as this study not only delivers results for an assessment of environmental justice issues in the context of asthma prevalence but also provides a workflow that can be transferred to other areas of investigation. Since most official German socioeconomic data sets rely on the raster-based census data from 2011 and hence are not suitable to be combined with the more recent data sets that are part of this study, the commercial data set is a necessary exception to the open data approach. Once new census data are available, it can be replaced thanks to the universal disaggregation method adopted in this paper. The data sets’ specifications and processing are described in the respective subsections of the next chapter.

Table 1 Data sets used in this study, non-commercial data unless indicated otherwise (*)

3 Spatial Distribution Modelling

The primary goal is to assign all values that are necessary to evaluate the environmental, socioeconomic and health conditions to one large-scale spatial unit, represented in this study by the residential city blocks derived from the UrbanAtlas. This enables the comparison of different attributes, e.g. the accessibility of urban green or medical facilities, within each geometry as well as the comparison of geometries with each other, as the city block represents a homogeneous structure type within a city (Lehner and Blaschke 2019). Additionally, it makes it possible to detect clusters of similar value ranges and to deduce urban hot spots of inequality. This can then be followed by a focussed investigation of selected city blocks that includes secondary attributes to evaluate the quality of life and health in the respective zone and to analyse its structure in order to detect zones of socioeconomic and health-related deprivation (Theofilou 2013).

To achieve that objective, the whole workflow (see Fig. 2) is subdivided into three major steps. The second step is the disaggregation of (1) the socioeconomic variables (absolute number of inhabitants, age, gender, heritage, education, etc.) from PLZ8 units to city block level, and (2) the prevalence of asthmatic illnesses, expressed as the absolute number of affected people divided by the number of inhabitants per spatial unit, which is likewise transferred from county areas to the city block level. To do this, several secondary data sets are necessary (Langford 2006; see also Table 1) to create two ancillary data sets in the first step (one for each disaggregation process). These rank all target zones and allow a differentiated distribution of values from the large source zones to the smaller target zones.

Fig. 2 Workflow

As the last step, a raster-based network analysis is conducted to quantify the accessibility of urban greens on the one hand, and specific medical facilities such as hospitals, pharmacies, pneumologists and general practitioners on the other hand. The resulting distances are assigned as averages to each city block geometry.

As a result, every city block geometry has a specific value for each attribute derived from the disaggregation and the network analysis. A few of these attributes are then used in a statistical analysis to locate city blocks with strongly correlated values and to identify places with significantly high or low levels of accumulated advantages or disadvantages.

3.1 Three-Class Dasymetric Mapping

As the majority of the data sets and values introduced in chapter two are assigned to spatial units that do not match the high-resolution approach of this study, they need to be disaggregated before they are incorporated into the following analysis. A disaggregation can be conducted in many ways, all of which depend on the availability of ancillary data sets; the accuracy improves the more distinctly and suitably these data sets describe and rank the target zones (Eicher and Brewer 2001; Li et al. 2016; Li and Corcoran 2011; Moos et al. 2021).

In this study, two different data sets are disaggregated from the larger source zones (PLZ8, microm GmbH 2020, and municipalities, BKG 2020) to the smaller target zones (city blocks, European Commission and Copernicus 2020, see Fig. 3) using the method of three-class dasymetric mapping, which has been evaluated in various studies (Langford 2006; Mennis and Hultgren 2005, 2006; Moos 2020). Furthermore, this method has been implemented and evaluated by Burian et al. (2021) in the disaggregator, a tool for the ArcGIS Pro environment that is used in this study. Additionally, it can be transferred to any other source zone property, e.g. rasterized census data, which can help to update this study as soon as free and contemporary data sets are available.

Fig. 3 Source zones (black outlines) and target zones (red)

3.1.1 Ancillary Data

There are several ways to disaggregate intensive or extensive values from larger to smaller units, and none of them claims to depict the real world with its results. In fact, a disaggregation is always just an approximation of the real state and hence only tries to reconstruct a resolution that cannot be achieved by the given data (Kennedy and Kennedy 2004; Schulte 2008). However, these approaches differ considerably in the accuracy of their disaggregation results. The application of proper ancillary data sets determines the precision and usability of the values assigned to the target zones. Hence, it is crucial to incorporate ancillary data into the disaggregation process that describe and scale the target zones in terms of their potential allocation as accurately as possible. This can be done with different numbers of classes, from a single-class approach (areal interpolation, Goodchild and Lam 1980) up to the three-class dasymetric mapping approach applied in this study, which includes not only the size and distribution of the source and target zones but also the respective usage type and its scaling relative to all other zones within the area of investigation (Langford 2006; Li et al. 2016; Moos 2020).

As two different data sets are disaggregated in this study, two different ancillary data sets are prepared before the respective values are disaggregated with the disaggregator (Burian et al. 2021).

3.1.1.1 Potential Living Area

The first data set to be disaggregated is the microm data set, which contains socioeconomic values such as age distribution or educational status and the absolute number of inhabitants. As relative attributes, such as percentages of inhabitants with a certain characteristic, depend highly on the absolute number of inhabitants per spatial unit, and the absolute number of inhabitants in turn depends highly on the number of people that can live in each target zone, the ancillary data should quantify the potential living area in each target zone. With this, it is possible to distinguish all target zones (city blocks) from one another and put them into a distinct hierarchy, ranging from small houses of mixed use, which can only contain a small number of households, up to dedicated housing blocks with many floors that contain dozens of them.

To create a matching hierarchy, the target zones are first filtered with regard to their type of use: using the digital landscape model (DLM), every target zone that does not include at least one house with the smallest type of residential use, whether mixed with e.g. industry or not, is excluded from the final pool of city blocks. The remaining houses inside the city blocks are then attributed with their respective type of use (also taken from the DLM) and the number of floors (taken from the standard land values), following the approach developed and evaluated by Moos (2020). As the penultimate step, each house receives a certain factor, which is then used to calculate the absolute size of the potential living area; this in turn is aggregated to city block level.

This results in a data set that contains the city blocks as target zones with the aforementioned distinct hierarchy that refers back to footprint size, type of use and number of floors.
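
The aggregation described above can be illustrated with a minimal sketch. It assumes that the potential living area of a house is its footprint multiplied by its number of floors and a use-type factor; the factor values, the dictionary keys and the function name are hypothetical and do not reproduce the factors of Moos (2020).

```python
from collections import defaultdict

# Hypothetical weighting factors per type of use; the factors actually used
# in the study (Moos 2020) are not reproduced here.
USE_FACTOR = {"residential": 1.0, "mixed": 0.5}


def potential_living_area(buildings):
    """Sum footprint * floors * use factor per city block.

    `buildings` is an iterable of dicts with the illustrative keys
    'block_id', 'footprint_m2', 'floors' and 'use'.
    """
    totals = defaultdict(float)
    for b in buildings:
        # Types of use without a residential share contribute nothing.
        factor = USE_FACTOR.get(b["use"], 0.0)
        totals[b["block_id"]] += b["footprint_m2"] * b["floors"] * factor
    # City blocks without any potential living area leave the target pool.
    return {block: area for block, area in totals.items() if area > 0}


buildings = [
    {"block_id": "A", "footprint_m2": 100.0, "floors": 2, "use": "residential"},
    {"block_id": "A", "footprint_m2": 200.0, "floors": 4, "use": "mixed"},
    {"block_id": "B", "footprint_m2": 300.0, "floors": 1, "use": "industrial"},
]
living_area = potential_living_area(buildings)
```

In this toy example, block A receives a potential living area of 600 m², while block B is dropped because it contains no residential use.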

3.1.1.2 Regression Analysis

A logistic regression of the outcome “diagnosis of asthma within the last 12 months including medical prescription” was carried out to be used as the ancillary dataset for disaggregation of regional prevalences to building blocks. For DEGS1 (Robert Koch Institute, Department of Epidemiology and Health Monitoring 2015) and KiGGS Wave 2 (Robert Koch Institute, Department of Epidemiology and Health Monitoring 2019) data provided by the Robert Koch Institute, propensity score matching is performed prior to inclusion in the logistic regression model. The propensity score can be used to control for imbalances in the study groups, predicting the exposure of a subject without including the outcome by means of a logistic regression, to then sample controls based on similarities (Cepeda et al. 2003: 280; Rosenbaum and Rubin 1985). The matched subjects are more closely related regarding their distribution of covariates than randomly selected subjects and are therefore not independent observations (Austin 2009; Rubin and Thomas 2000).

This study uses this property to construct matched samples that are similar in covariates that are not part of the disaggregation process but are thought to influence the individual outcome while not influencing the aggregated measure per building block. This controls for differences that cannot be matched in the building block dataset, e.g. the participant’s sex, which is evenly distributed at building block level but is likely to be correlated with age and asthma in an individual. Therefore, the effect sizes of the logistic regression cannot be assumed to be free of confounding effects and are thus not to be interpreted as such. As the use of a matched test can result in a lower type-I error rate (Austin 2009), we use 1:1 propensity score matching to prepare the dataset for logistic regression. The variables included in the propensity score are overall health status, sex, west/east/Berlin, and number of persons in the household.

For DEGS1, this resulted in 7856 cases that were included in propensity score matching, with 131 missing cases. The sampling without replacement produced 17 exact matches (325 match tries, 94.769% rejection rate), and 192 fuzzy matches (match tolerance 0.4, 308 match tries, 37.662% rejection rate), while leaving 2 unmatched due to missing keys.

For KiGGS, 5,840 cases were included in propensity score matching, while 9,183 cases were missing. When sampling without replacement, 24 exact matches (686 match tries, 96.501% rejection rate), and 228 fuzzy matches were obtained (match tolerance 0.4, 662 match tries, 65.559% rejection rate), and 267 observations remained unmatched due to missing keys. A graph of propensity scores across treatment and comparison groups was examined, and common support can be assumed for both study groups. For further analysis, matched DEGS1 and KiGGS Wave 2 were combined in one data set.
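
The matching step can be sketched as a greedy 1:1 nearest-neighbour search without replacement. This is only an illustration under stated assumptions: the propensity scores are taken as given, the tolerance is interpreted as the maximum allowed score difference, and the function name, subject identifiers and scores are hypothetical; the matching routine actually used in the study may differ.

```python
def match_one_to_one(cases, controls, tolerance):
    """Greedy 1:1 matching without replacement on a propensity score.

    `cases` and `controls` map subject ids to propensity scores. Each case is
    paired with the closest still-unused control whose score differs by at
    most `tolerance`; cases without such a control remain unmatched.
    """
    available = dict(controls)
    pairs, unmatched = [], []
    for case_id, score in sorted(cases.items(), key=lambda kv: kv[1]):
        best = min(available, key=lambda c: abs(available[c] - score), default=None)
        if best is not None and abs(available[best] - score) <= tolerance:
            pairs.append((case_id, best))
            del available[best]  # without replacement: each control is used once
        else:
            unmatched.append(case_id)
    return pairs, unmatched


# Hypothetical propensity scores for three cases and three controls.
cases = {"c1": 0.30, "c2": 0.70, "c3": 0.95}
controls = {"k1": 0.32, "k2": 0.66, "k3": 0.10}
pairs, unmatched = match_one_to_one(cases, controls, tolerance=0.05)
```

With these toy scores, c1 and c2 find a control within the tolerance, whereas c3 stays unmatched because the only remaining control is too dissimilar.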

Eight hundred twenty-five cases identified during the propensity score matching were included in the analysis. The variables age group, BIK, binary CASMIN, unemployed, housing in sqm, noise pollution in the last 12 months (traffic), noise pollution in the last 12 months (industry), building type, and partner in household were identified from literature research as being relevant to the distribution of asthma patients and available as aggregated measures at building block level. The specified logistic regression including all variables resulted in an overall accuracy of observed vs. predicted values of 65.3% correct (65.2% correctly specified as absence, 65.5% as presence of asthma).

It is important to note that the model does not reflect causal relationship but is an aid to model probabilities of belonging to one class or the other. Stepwise inclusion or removal according to Wald test did not result in improved model accuracy.

To create an ancillary data set that can put all city blocks within an administrative district into an elaborate hierarchy that provides a proper disaggregation of the prevalence for asthmatic illnesses, it is necessary to collect several different values from variables and assign them to the city blocks. After the assignment, all values can be factorised, summarized and exponentially prorated, using values that are calculated via a comprehensive regression analysis.

Variables that are included into the equation of the regression analysis and their respective origin data set are shown in Table 2.

Table 2 Variables in the equation of the regression analysis

All variables are spatially joined with the city block geometries which results in a final data set that contains an averaged value for each variable in each city block geometry. For each of these geometries all values x are then multiplied with their respective regression coefficient β and summed up via the formula

$$\frac{e^{p}}{1+e^{p}}=c+\sum_{i=1}^{10}\beta_{i}x_{i}$$

resulting in a hierarchy of values per city block that depicts the combination of all factors and can be used as an ancillary data set for the follow up disaggregation of the regional prevalence data set provided by WiDO (Wissenschaftliches Institut der AOK 2020).
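
How such a score per city block can be computed is sketched below. The sketch assumes the conventional reading of a logistic model, in which the weighted sum c + Σ βᵢxᵢ forms the linear predictor p and e^p / (1 + e^p) maps it to a value between 0 and 1; the intercept, coefficients and block values are hypothetical and are not the fitted values of this study.

```python
import math


def linear_predictor(intercept, coefficients, values):
    """Return c + sum(beta_i * x_i) for one city block."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def logistic(p):
    """Return e^p / (1 + e^p), written to avoid overflow for large |p|."""
    if p >= 0:
        return 1.0 / (1.0 + math.exp(-p))
    z = math.exp(p)
    return z / (1.0 + z)


# Hypothetical intercept, coefficients and averaged block values.
p = linear_predictor(-1.0, [0.5, 0.25], [2.0, 4.0])
score = logistic(p)
```

Ranking the city blocks by this score yields a hierarchy of the kind that serves as ancillary data for the disaggregation.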

3.1.2 Disaggregation

After the ancillary data for both data sets are prepared, the disaggregation of all respective values is conducted in two runs, one for each data set. The underlying operation weights all target zones within a source zone according to their respective ancillary values and distributes the value of the source zone to all target zones in proportion to their place in the hierarchy.
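
The basic operation can be sketched as a proportional allocation. This is a simplified illustration, not the implementation of the disaggregator tool; the function name, zone identifiers and ancillary values are hypothetical.

```python
def disaggregate(source_value, ancillary):
    """Distribute a source zone value to its target zones.

    `ancillary` maps target zone ids to their ancillary values; each target
    zone receives a share proportional to its ancillary value.
    """
    total = sum(ancillary.values())
    if total == 0:
        # Without any ancillary weight, nothing can be allocated.
        return {zone: 0.0 for zone in ancillary}
    return {zone: source_value * w / total for zone, w in ancillary.items()}


# 1000 inhabitants of one source zone, three city blocks with
# hypothetical potential living areas as ancillary values.
shares = disaggregate(1000, {"block_1": 600.0, "block_2": 300.0, "block_3": 100.0})
```

As long as all target zones lie completely within the source zone, the allocated shares add up to the source zone value.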

In many cases, the boundaries of the source zones and the target zones do not coincide but intersect each other, so that the ancillary value of a target zone cannot be used in full. In these cases, the ancillary value is reduced according to the relative area of the target zone that lies within the respective source zone. Consequently, a target zone can be split into two or more parts during the disaggregation process (see Fig. 4).

Fig. 4 Schematic split up and reunited target zones

Each of the parts receives its share of the value from its respective source zone. After the disaggregation, the parts are merged again by adding up the values of all parts and assigning the sum to the final target zone geometry. As a consequence, the sum of all target zones intersecting a given source zone does not necessarily equal the exact value of that source zone, as some boundary values may come from a neighbouring zone.
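
The handling of split target zones can be illustrated as follows. The sketch assumes that the ancillary value of each part is scaled by the area fraction of the target zone lying inside the source zone and that the parts are summed afterwards; all identifiers and numbers are hypothetical.

```python
from collections import defaultdict


def disaggregate_with_splits(source_values, parts):
    """Disaggregate source values to target zones that may be split.

    `parts` is a list of (target_id, source_id, ancillary_value, area_fraction)
    tuples, one per piece of a target zone lying inside a source zone.
    """
    # Reduced ancillary weight of all parts per source zone.
    weights = defaultdict(float)
    for _, source_id, ancillary, fraction in parts:
        weights[source_id] += ancillary * fraction

    # Allocate each source value to its parts and merge the parts again.
    result = defaultdict(float)
    for target_id, source_id, ancillary, fraction in parts:
        if weights[source_id] > 0:
            share = ancillary * fraction / weights[source_id]
            result[target_id] += source_values[source_id] * share
    return dict(result)


source_values = {"S1": 100.0, "S2": 50.0}
parts = [
    ("T1", "S1", 30.0, 1.0),  # T1 lies completely in S1
    ("T2", "S1", 20.0, 0.5),  # T2 is split between S1 ...
    ("T2", "S2", 20.0, 0.5),  # ... and S2
    ("T3", "S2", 10.0, 1.0),  # T3 lies completely in S2
]
targets = disaggregate_with_splits(source_values, parts)
```

Here T1 and T2, which both intersect S1, sum to 125 rather than to the S1 value of 100, because T2 also carries a share from the neighbouring zone S2; the overall total of 150 is preserved.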

3.2 Network Analysis

As a further variable that can be queried for each city block geometry, the mean distance to several different areas or points is calculated and added to the respective city block. In this study, these are divided into two groups: the distance to urban greens (> 1 ha and > 10 ha) and the distance to pharmacies, hospitals, pneumologists and general practitioners. The definition of urban green is adopted from Grunewald et al. (2017), who, following numerous approaches, define urban green as an accessible and coherent area of at least 1 ha for recreational areas and more than 10 ha for larger urban green spaces. As the Metropole Ruhr covers a huge area and a classical network analysis, which requires a proper network data set of the whole region, would be very demanding in terms of both time and hardware, the network analysis in this study follows a raster-based approach. Besides significantly reducing the processing time, this approach can also easily include the accessibility or distance measurement of target areas, which is not a designated task in the vector-based approach (Fuglsang et al. 2011; Mulrooney et al. 2017).

As an overall preparation for all data sets, a street network data set is filtered with respect to its usability for pedestrians on the one hand and for motor vehicles on the other. After this filtering, the two resulting street networks are rasterised to provide the foundation for the subsequent network analysis.

3.2.1 Urban Green

The accuracy of a raster-based network analysis is highly dependent on the resolution of the underlying network raster data set, as the distances from and to the locations of facilities are calculated using the length through all raster cells that are crossed, in the unit of the given coordinate system. The higher the resolution of the raster network data set, the more detailed the calculated route and hence the more precise the calculated route distance.

For this analysis, the filtered street network data set is rasterised at a 10 m resolution, which implies a distance of ten meters per raster cell in the x- and y-directions and 14.1421 m in the diagonal direction. To calculate the accessibilities to urban green, the DLM data set (see Table 1) is filtered regarding the classes of green areas in urban space (grasslands, forests and other vegetation) and their respective size. After all urban green areas with common boundaries have been merged, all areas smaller than one hectare are dropped. The threshold of one hectare has been evaluated as the minimum size of an urban green space that is necessary to use it as a recreational area (Jalkanen et al. 2020; Markevych et al. 2014; Neuvonen et al. 2007).

As the following calculations rely on distances and not on travel times, there is no need to put the street network into a hierarchy. The cost for crossing each raster cell is therefore generalised and set to the uniform value ‘1’, which results in a cost path analysis that considers only the shortest distance and not the use of potentially faster paths. The subsequent path distance analysis (or distance accumulation analysis) calculates the distance from each raster cell in the network to the nearest urban green and assigns the respective value to that raster cell. All raster cells are then converted to point geometries and assigned to their respective overlying city block geometry with a mean value operation (see Fig. 5).
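
The principle of the distance accumulation on a uniform-cost network raster can be sketched with a multi-source shortest-path search. This is a plain Dijkstra illustration, not the GIS tool used in the study; it assumes 8-connected cells of 10 m, so that a diagonal step costs about 14.1421 m, and all function and variable names are hypothetical.

```python
import heapq
import math


def distance_accumulation(network, sources, cell_size=10.0):
    """Shortest network distance from every street cell to the nearest source.

    `network` is a 2D list of booleans (True = street cell) and `sources` a
    list of (row, col) cells that touch an urban green area.
    """
    rows, cols = len(network), len(network[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # outdated heap entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and network[nr][nc]:
                # 10 m for straight steps, 10 * sqrt(2) m for diagonal steps
                nd = d + cell_size * math.hypot(dr, dc)
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def block_mean(dist, cells):
    """Mean distance of the reachable street cells assigned to one city block."""
    values = [dist[r][c] for r, c in cells if math.isfinite(dist[r][c])]
    return sum(values) / len(values) if values else math.inf


network = [
    [True, True, True],
    [False, False, True],
    [False, False, True],
]
dist = distance_accumulation(network, sources=[(0, 0)])
mean_distance = block_mean(dist, [(0, 1), (0, 2)])
```

Cells that are not part of the street network keep an infinite distance and are ignored when the mean value per city block is calculated.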

Fig. 5 Schematic workflow of a network analysis for accessibilities to urban green areas > 1 ha

3.2.2 Medical Facilities

Access to emergency care is defined as the driving or walking distance to the nearest hospital, accessibility of medication as the distance to pharmacies, and access to diagnostic procedures, monitoring and routine care as the distance to general practitioners and specialists. Healthcare sites were combined from different sources. Hospital addresses were obtained from the DESTATIS German registry of hospitals and complemented by hospitals included in the open data POIs of Metropole Ruhr. Pharmacy locations were collected through the online search tool of Apotheken Umschau (https://www.apotheken-umschau.de/apothekenfinder). General practitioners and pneumologists were identified through internet research based on Google Maps (maps.google.de, search strings “Allgemeinarzt”, “General Practitioner”, “Hausarzt” for GP, “Pneumologe”, “Lungenarzt”, “Lungenfacharzt” for specialist care), as well as the specialist search on lung atlas (www.lungenatlas.de) and network severe asthma (asthma.de/expertensuche), including all health sites that were located within the study area. The dataset was cleaned by excluding health sites with matching names (similarity ≥ 90%) and addresses. Nevertheless, due to different naming conventions and multiple affiliations, not all health sites could be uniquely identified; thus, duplicates cannot be ruled out in the resulting data set. The consolidated list of health facility addresses was geocoded in R using OpenStreetMap data as street location information.
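
The cleaning rule can be sketched as follows. The similarity measure used in the study is not specified, so the sketch assumes the ratio of Python’s `difflib.SequenceMatcher` for names and an exact, case-insensitive comparison for addresses; the function names and the example sites are hypothetical.

```python
from difflib import SequenceMatcher


def is_duplicate(a, b, threshold=0.90):
    """Treat two sites as duplicates if names are >= 90% similar and addresses match."""
    name_similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_address = a["address"].lower() == b["address"].lower()
    return name_similarity >= threshold and same_address


def deduplicate(sites):
    """Keep the first occurrence of each health site."""
    kept = []
    for site in sites:
        if not any(is_duplicate(site, k) for k in kept):
            kept.append(site)
    return kept


# Hypothetical example records.
sites = [
    {"name": "Apotheke am Markt", "address": "Marktplatz 1, Essen"},
    {"name": "Apotheke am Markt.", "address": "Marktplatz 1, Essen"},
    {"name": "Lungenpraxis Nord", "address": "Hauptstrasse 5, Bochum"},
]
unique_sites = deduplicate(sites)
```

Such a rule removes near-identical spellings at the same address but cannot detect sites listed under clearly different names, which is consistent with the remaining duplicates mentioned above.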

While the capacity of hospitals can be assessed through number of beds and staff, the number of practising physicians and/or full-time equivalents cannot be determined from the data sources at hand. To address these limitations, the following analyses are based on the occurrence of one or more physicians at a given location, and the distance to the nearest health facility.

The subsequent network analysis follows the same rules as described in Chapter 3.2.1, except that the facilities are points instead of polygons. The only difference is that, to calculate the shortest distance, it can be necessary to leave the given network raster data set, as some facilities may be located inside buildings and hence do not intersect with the network. In these cases, the orthogonal line from the nearest street network segment to the facility point is included in the calculation, using the same raster values and resolution as the regular data set (10 × 10 m).

To determine classes of access to health facilities, the unique distance bands per facility type were integrated in an unsupervised classification. It was assumed that the walking distance should be the decisive factor for distances < 1000 m, while distances beyond that were covered by vehicle distance. Three distinct classes of high, medium, and low accessibility were determined for Metropole Ruhr.
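
One possible reading of the distance rule is sketched below: the walking distance is used when it is below 1000 m, and the vehicle distance otherwise. This interpretation, the function name and the example distances are assumptions; the subsequent unsupervised classification is not reproduced here.

```python
WALK_LIMIT_M = 1000.0  # threshold named in the text


def decisive_distance(walk_m, drive_m):
    """Return the walking distance below the limit, else the vehicle distance."""
    return walk_m if walk_m < WALK_LIMIT_M else drive_m


# Hypothetical distances of two city blocks to the nearest pharmacy.
near_block = decisive_distance(walk_m=400.0, drive_m=650.0)
far_block = decisive_distance(walk_m=2500.0, drive_m=3100.0)
```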

3.3 Spatial and Non-spatial Statistics

Empirical Bayesian kriging was performed for ground-measured NO2, PM10 and PM2.5 data from Geobasis.NRW (Land NRW 2022). The resulting continuous surface layers are regarded as a proxy for air pollution.

Moran’s I was calculated to determine the level of spatial autocorrelation in the modelled data set. Additionally, local bivariate relationships were calculated to determine the pattern and nature of associations at the local level. To contrast local statistics with overall trends in ANOVA and MANOVA, environmental, access, and socioeconomic variables were clustered based on data-inherent characteristics. Raster-based formats, i.e. air pollution layers, were clustered through an unsupervised ISO algorithm with a minimum class size of 3, a maximum of 5 groups, a maximum of 20 iterations, and a sampling interval of 10. Clusters of feature-based access measures were determined by multivariate unsupervised clustering; the optimal number of classes was determined by comparing pseudo-F statistics.
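For readers unfamiliar with the statistic, Global Moran’s I can be computed from a value vector and a spatial weights matrix as sketched below. This is a textbook formulation in plain Python, not the software routine used in the study. The binary neighbour weights and the four example values are invented for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix.

    I = (n / S0) * sum_ij(w_ij * d_i * d_j) / sum_i(d_i ** 2),
    where d_i is the deviation from the mean and S0 the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_term)


# Four units on a line; each unit neighbours the adjacent ones (assumed weights).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], weights)    # similar values adjoin
alternating = morans_i([1.0, 4.0, 1.0, 4.0], weights)  # dissimilar values adjoin
```

Positive values indicate that similar values tend to be neighbours, while negative values indicate a checkerboard-like arrangement.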

Fig. 6 Asthma prevalence and air pollution clusters

All reported p values are Bonferroni-corrected to account for multiple testing.
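As a brief illustration of the correction mentioned above, a Bonferroni adjustment multiplies each raw p value by the number of tests and caps the result at 1. The raw p values below are invented and the function name is hypothetical. The sketch does not reproduce the set of tests actually corrected in the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values (raw p times number of tests, capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


raw = [0.001, 0.02, 0.04]   # invented raw p values from three tests
adjusted = bonferroni(raw)  # each value is multiplied by 3
significant = [p < 0.05 for p in adjusted]
```

With three tests, only the first of these invented p values would remain below the 0.05 level after adjustment.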

4 Results

The distribution of predicted asthma prevalence varies significantly within the study area (Global Moran’s I 0.055543, expected −0.000025, z-score 82.211043, p < 0.0001). Clusters of significantly higher prevalence within a neighbourhood of 100 building blocks are found in the metropolitan areas, forming a belt around the city centres of Duisburg, Mülheim/Ruhr, Essen, Bochum, and Dortmund (west to east). Asthma prevalence in these city centres is significantly lower than in the surroundings. Most clusters of low prevalence can be found towards the northwest of the study area (see Fig. 6). Low outliers within high-prevalence clusters are more common than high outliers within low-prevalence clusters.
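The test statistic reported for Global Moran’s I can be read against the usual null model of no spatial autocorrelation. As a brief sketch, the expected value and the z-score take the standard form below. The variance term depends on the spatial weights and the chosen null assumption, neither of which is reported here, so it is left symbolic.

```latex
\mathbb{E}[I] = -\frac{1}{n-1}, \qquad
z = \frac{I - \mathbb{E}[I]}{\sqrt{\operatorname{Var}[I]}}
```

Here n denotes the number of spatial units. For a very large n, the expected value is close to zero and the variance is small, which is why a small positive index can still correspond to a large z-score.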

4.1 Accessibility of Health Facilities

Visualising the results of the network analysis regarding the combined accessibility of hospitals and pneumologists (see Fig. 7) reveals a distinct pattern. Before describing this pattern, it must be stated that regions that are close to the border of the Metropolitan Region Ruhr (< 5 km) are excluded from the analysis (indicated by the white band that fades out into the Ruhr area), since the network analysis did not incorporate facility locations outside the boundaries and hence could not consider that some inhabitants inside the Metropolitan Region Ruhr might visit specialists beyond its borders.

Fig. 7 Combined accessibilities of hospitals and pneumologists for the metropolitan region Ruhr

As expected, the accessibility of both medical facilities is very good in the city centres, especially in the major cities, which are marked by the largest blue squares. There are visible gaps, shown in yellow and red, that indicate longer distances to both facilities, especially in the outskirts of the densely populated centres. Certain small and medium-sized towns in the north-western, north-eastern and southern parts lack both facilities. In particular, the red band, whose southern part is also covered by the detail map, shows that specialists and hospitals for people with asthmatic illnesses are not equally distributed across the whole area.

A similar picture emerges in Figs. 8 and 9. As expected, all city centres have a high level of accessibility to both pharmacies and general practitioners, while the overall distribution of general practitioners is slightly less dense, as can be seen, for example, in the dominant yellow and red colours around the centres and in the northern part of the study area. Nevertheless, there are also certain small regions with a clear lack of pharmacies, although these regions are limited to a few city blocks in the surroundings of the respective city centres. Taking a broader view away from the centres, it becomes visible that in the more rural parts of the study area, such as the northern, north-western and southern parts, access to basic medical facilities is not as equally distributed as a densely populated conurbation like the Metropolitan Region Ruhr might suggest. This becomes particularly clear in the aforementioned red band shown in the lower left corner of the detail map, which results from missing hospitals and pneumologists in the local area: this region lacks all chosen facilities, not only in a few small parts but to a larger extent.

Fig. 8 Accessibility of pharmacies for the metropolitan region Ruhr

Fig. 9 Accessibility of general practitioners for the metropolitan region Ruhr

For the combined measure of access to health facilities derived from data-driven multivariate clustering, ANOVA shows significant differences between the three distinct classes at the 0.05 level, with an effect size of 0.04 (partial η2 with adjustment for potential confounding effects of purchasing power, 95% CI). Between groups 1 and 2, the mean difference (0.2572, 95% CI [0.1868, 0.3276], p < 0.05) is smaller than between groups 1 and 3 (0.9592, 95% CI [0.8406, 1.0778], p < 0.05), but it is positive in both pairwise comparisons. Both the mean prevalence and the mean absolute number of patients with asthma are significantly higher closer to the medical facilities (Group 1–Group 2: mean difference 2.430, 95% CI [1.963, 2.898], p < 0.05; Group 1–Group 3: 5.498, 95% CI [4.710, 6.286], p < 0.05; partial η2 0.01 with adjustment for potential confounding effects of purchasing power).
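To make the reported effect sizes easier to interpret, the following sketch computes η2 for a plain one-way ANOVA as the share of the total sum of squares explained by group membership. It is a simplification: the values reported above are partial η2 adjusted for purchasing power, which requires a covariate model that is not shown here. The group values are invented and the function name is hypothetical.

```python
def eta_squared(groups):
    """Eta squared for a one-way layout: SS_between / (SS_between + SS_within)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Variation of the group mean around the grand mean, weighted by size.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Variation of the observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in group)
    return ss_between / (ss_between + ss_within)


# Invented prevalence values for three accessibility classes.
groups = [[5.0, 6.0, 7.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
effect = eta_squared(groups)
```

An effect size of 0.04, as reported above, would mean that only a small share of the variation in prevalence is associated with the accessibility class, even though the group means differ significantly.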

4.2 Air Pollution

The three groups of air quality determined by data-driven clustering feature increasing NO2, while PM2.5 and PM10 feature a higher mean in group 2 than in group 3 (see Table 3).

Table 3 Air pollution cluster signatures (Means)

For all three groups depicted in Fig. 6, the prevalence means differ significantly from each other at the 0.05 level, but the measured effect size is very small (Tukey-HSD partial η2 0.004, 95% CI [0.003, 0.005], p < 0.05 with adjustment for potential confounding effects of purchasing power, η2 0.007, p < 0.05 without adjustment). Post-hoc tests show that the mean prevalence is lower in group 1 than in group 2 (−0.255, 95% CI [−0.3359, −0.1746], p < 0.05) and group 3 (−0.450, 95% CI [−0.5320, −0.3681], p < 0.05). Thus, prevalence is associated with the distinct air pollution patterns at the level of the whole study area. This relationship could not be confirmed at the local level, neither for absolute individual NO2, PM2.5 and PM10 values nor for the identified clusters, as no significant associations can be reported for local analyses within 100 building blocks.

4.3 Purchasing Power

Overall, prevalence is negatively correlated with purchasing power (−0.025, 95% CI [−0.035, −0.015], p > 0.05). At the local level, the relationship between the two variables is more complex and varies from negative linear to concave to complex. Visually, the map in Fig. 10 shows a clear spatial pattern in the colours of the bivariate legend, which represents the two variables prevalence of asthmatic diseases and purchasing power. The highly populated parts of the Metropolitan Region Ruhr in particular appear in yellow colours, representing low purchasing power and higher prevalence. In contrast, along the valley of the river Ruhr, blue colours indicate higher purchasing power and mainly lower prevalence, although prevalence is also high in a few spots.

Fig. 10 Asthma prevalence and purchasing power bivariate

The southern edge of the more yellow zone follows the so-called “socioeconomic equator” of the Ruhr area (Bogumil 2020; Kersting et al. 2009; Ziegler 2018) and thus supports its existence. The transition zone between the socioeconomic equator and the Ruhr valley is a narrow band of colours for medium values.

North-western and south-eastern rural areas are also dominated by medium values. In the south-eastern part, the city of Hagen and the valley of the river Ennepe west of Hagen show lower purchasing power and higher prevalence, just like the central Metropolitan Region Ruhr.

4.4 Access to Urban Green

Among the three classes of access to urban green, group 1 has direct access to urban green and a high mean NDVI surrounding the building block, while group 2 lies in a neighbourhood with low NDVI and a comparatively high distance to green areas. Group 3 has medium access and features medium NDVI. Mean asthma prevalence is lower in group 1 compared to groups 2 (−0.344, 95% CI [−0.432, −0.256], p < 0.05) and 3 (−0.633, 95% CI [−0.706, −0.559], p < 0.05). With an effect size of 0.011 (partial η2*), areas with similar access to urban green and medium NDVI are associated with prevalence. When examining the spatial distribution of the absolute number of patients with access to urban green (Fig. 11), clusters of low prevalence and a greater distance to urban green are found predominantly in city centres, whereas clusters of high prevalence and a large distance can be found along major motorways (see Figs. 9 and 10).

Fig. 11 Number of asthma patients and access to urban green bivariate

5 Discussion and Conclusion

As expected, the spatial distribution of prevalence exhibits neither an overall pattern for the whole study area nor uniform values. At the local level, the spatial distribution varies with purchasing power and accessibility of urban green, and the number of patients living close to a facility exceeds that in remote areas. The smaller the distance to the medical facilities, the higher the predicted prevalence. As the definition of prevalence includes only asthma cases that are diagnosed and medically treated, the population at the outskirts could suffer from underdiagnosis and/or undertreatment due to a lack of access to health facilities. In addition, one has to bear in mind that diagnosis, coding schemes, and treatment vary among physicians. Therefore, the observed prevalence based on ICD-10 codes and drug prescription does not necessarily reflect the real regional 12-month asthma prevalence (Masoli et al. 2004). At the same time, distance to urban green is lower on average in remote areas, where distance to health facilities is often greater than in city centres. Green spaces can be indirectly beneficial in preventing asthma from developing in the first place, as allergies are reported less frequently in rural areas. On the other hand, proximity to green spaces also leads to greater exposure to allergens for patients with allergy-induced or severe asthma. It has to be noted that early-life exposure to allergens as a beneficial factor is not captured for the adult population, as movement patterns have not been included in the study.

Across the study area as a whole, air quality clusters are associated with prevalence, linking higher 12-month prevalence to higher exposure to air pollutants. These findings are in line with the literature, although it has to be noted that the effect size is small and that a causal relationship can be neither confirmed nor ruled out due to the scale and uncertainty of the modelled air pollution values, as well as interdependencies with other variables and potential confounders like purchasing power. The grid of air pollution measurement locations is too coarse to establish individual local links through kriging. All three pollutants are suspected to cause exacerbations and excess (emergency) hospitalization rates, so exposure to air pollution should be monitored closely. Knowing how many inhabitants potentially suffer most from air pollution and are most likely to need healthcare as a result is therefore crucial to resource planning and should be investigated further.

Several studies have identified and verified a specific, hidden role of the A40 motorway, which crosses the Metropolitan Region Ruhr from east to west and is often called the socioeconomic equator. This is due not only to income but also to language, land prices and other factors, and this study adds a few more to that list, while acknowledging that there is no true evidence that the motorway itself is the cause (Jeworutzki et al. 2017; Kersting et al. 2009). In fact, the interplay of position and status is highly complex, and more studies are needed that ask whether the A40 is a true cause of these visible disparities or rather constitutes a notional, yet still physical, border within a highly heterogeneous area like the Metropolitan Region Ruhr.

A limitation of this study lies in the modelled nature of the spatially disaggregated measures. Dispersion and diversity will most likely be underestimated due to missing modelling parameters that are not available at the building block level. As the parameters used for modelling local asthma prevalence could not be integrated as explanatory variables to further investigate the impact of socioeconomic disparities or traffic exposure on the outcome, the analysis revealed intra-urban patterns that could be directly attributed to these variables. Nevertheless, the results of the spatial disaggregation pattern analysis are in line with most associations from population-based cross-sectional as well as epidemiological studies, and reveal potential areas of inequity among patients with asthma in a densely populated conurbation setting.