Introduction

In an era marked by rapid urbanization and growing human population (United Nations 2018), the need for effective urban biodiversity assessment methodologies has become increasingly important. Urban ecosystems, composed of a unique mosaic of natural and built environments, are home to a vast array of flora and fauna that contribute significantly to overall ecological health and human well-being (Elmqvist et al. 2013). However, these ecosystems are often under considerable stress, facing threats such as habitat fragmentation, pollution, and climate change (Grimm et al. 2008). By implementing robust and comprehensive assessment methodologies, we can better understand, monitor, and mitigate these challenges, ultimately fostering more sustainable and resilient urban landscapes. Urban biodiversity assessment methodologies not only provide valuable insights into the health of urban ecosystems but also hold great potential for informing urban planning and management decisions (Tzoulas et al. 2007). The development and application of these methodologies can guide the integration of green infrastructure, the restoration of ecological connectivity, and the enhancement of ecosystem services in urban areas (Lovell and Taylor 2013). In turn, these actions support the global movement towards sustainable urban development and contribute to achieving the United Nations' Sustainable Development Goals (SDGs) (United Nations 2015). Therefore, the significance of urban biodiversity assessment methodologies extends beyond ecological considerations, ultimately shaping the future of our cities and the well-being of their inhabitants.

According to current estimates of worldwide urban growth, 290,000 km2 of natural habitat will probably be lost to urban growth between 2000 and 2030 (Simkin et al. 2022). This includes an increase in urban land near protected areas of more than three times the previous level. It is anticipated that a large portion of this urban growth will take place in areas that are rich in biodiversity, many of which had relatively little urban development in 2000, while it is expected to cause losses in species that are endemic to the ecological region, a significant number of which are found in ecoregions where urbanization is likely to pose a significant danger (Simkin et al. 2022). A further estimate states that the main threat to about 8% of the terrestrial vertebrate species on the International Union for Conservation of Nature’s (IUCN’s) Red List of Threatened Species is urban growth (McDonald et al. 2008).

Despite the growing recognition of the importance of urban biodiversity assessment methodologies, a key challenge faced by researchers and practitioners in this field is the lack of high-resolution biodiversity data for urban environments (Fassnacht et al. 2014). Traditional biodiversity assessment methods, such as field surveys, can be resource-intensive and may not capture the spatial and temporal heterogeneity of urban ecosystems (Pett et al. 2016). Furthermore, urban environments can be characterized by unique and complex land-use patterns, making it difficult to obtain comprehensive and accurate biodiversity data at a citywide scale (Andersson et al. 2014). Local authorities are the ones to decide how to protect urban habitats and develop relevant action plans for safeguarding biodiversity in cities. The EU has formulated measures to address the global biodiversity crisis through the Convention on Biological Diversity (CBD). However, the authorities seem to remain little informed about how to measure biodiversity and what the current status of their cities is. Biodiversity is perceived as non-tangible, despite the severity of the biodiversity loss crisis, while the lack of data and easy-to-measure indicators adds to their difficulty to assess the present and future situation (Laspidou and Ziliaskopoulos 2022).

Citizen science data has emerged as a promising solution to help address these gaps and supplement urban biodiversity assessments (Theobald et al. 2015; Kullenberg and Kasperwski 2016). It can become an essential and cost-effective tool for monitoring biodiversity and promoting engagement in an adaptive management learning processes (Aceves-Bueno et al. 2015, 2017). By engaging local communities in data collection efforts, these projects can generate spatially extensive and temporally dynamic biodiversity information (Dickinson et al. 2010). Citizen science can greatly enhance the scope of species monitoring efforts, leading to valuable contributions towards conservation and ecology (Tredick et al. 2017; Brown and Williams 2019). Depending on the project’s design, it can yield high-quality data with strong inferential power, especially given the appropriate training and oversight. Along these lines, biodiversity citizen science projects have been steadily growing, and they have increasingly come to be recognized as a valuable source of data, fostering public engagement (Burgess et al. 2017; McKinley et al. 2017). However, a systematic review of studies from 2005 to 2014, addressing urban ecology of birds and butterflies, found that only 21% and 26% respectively of these studies utilized citizen science data despite the potential for such data to contribute to our understanding of species-environment relationships, breeding studies, and guild analysis (Wang Wei et al. 2016). Thus, while citizen science presents enormous potential in the realm of biodiversity assessment and conservation, improved visibility of citizen science practices, training of volunteers, and strategies to overcome biases among scientists are essential to maximize its utility and acceptance in the scientific community (Burgess et al. 2017). Conservation scientists should actively collaborate with civilian programs, offering feedback and guidance to enhance the collective contribution to biodiversity conservation (Loss et al. 2015). However, to date, citizen-science projects are not used as primary tools for conservation research, indicating a disconnect between the collected data and the scientific community. Establishing standard protocols and balancing between encouraging public participation and maintaining data quality stand as key implementation challenges (Aceves-Bueno et al. 2015; Brown and Williams 2019). Also, factors such as narrow awareness of citizen science projects among professional scientists, the unsuitability of some biodiversity sciences for citizen science, variations in data quality, and biases for particular data sources (institutional, age, or education level of data collectors) play significant roles in this discrepancy (Ratnieks et al. 2016; Burgess et al. 2017).

Specifically for urban biodiversity, the role of citizen science is a growing field of interest. By leveraging citizen-generated data, researchers can examine how urban biodiversity responds to urban intensification and various forms of urbanization (Leong and Trautwein 2019). Interestingly, this approach has revealed that urban biodiversity often reflects the taxa naturally occurring in a specific region, while highly urbanized areas tend to align less with the regional context (McDonnell and Hahs 2008; Leong and Trautwein 2019). Overall, the existing literature suggests that the use of citizen science in monitoring and understanding urban biodiversity provides a valuable resource in the development of effective conservation management strategies. Continued development of robust methodologies and increased use of citizen science data can help mitigate biodiversity loss from urbanization (Müller et al. 2013). The integration of citizen science data can enhance the ecological validity of urban biodiversity assessments, while also promoting public awareness, fostering stewardship of urban ecosystems and facilitating the development of more sustainable and resilient cityscapes (Newman et al. 2017). The volume and accessibility of this data allows researchers to assess conservation and management objectives on a wide scale, from local and national to even continental and global levels (Loss et al. 2015). To facilitate this scale bridging, study design and data collection should be strategically guided by clearly defined research questions and hypotheses from the study’s outset, subsequently contributing to greater scientific rigor (Loss et al. 2015).

The fact that citizen science biodiversity data collection requires that people are outside their home and in a natural environment could play an important role in data collection during lockdown periods. The recent COVID-19 pandemic has served as a natural experiment to evaluate the resilience and adaptability of biodiversity citizen science programs. Crimmins et al. (2021) found varied participation patterns in biodiversity-themed community science programs in the context of pandemic-related closures and restrictions. Alarmingly, certain programs showed a decrease in the number of participants and observations during this period. However, there was a notable increase in observations originating from urban areas, indicating a potential to utilize shutdown periods productively for biodiversity observation within cities (Crimmins et al. 2021).

When combined with remote sensing technologies, citizen-science data can provide a more complete and up-to-date understanding of urban biodiversity patterns and trends (Kamp et al. 2020). Remote sensing data analysis can be very helpful in enabling monitoring capabilities at various spatial and temporal scales, thus helping authorities plan for and take action towards biodiversity conservation goals (Kwong et al. 2022). Satellite data and imagery is becoming increasingly available as open data (e.g., https://copernicus.eu/). At the same time, spatially explicit species data collected via citizen-science initiatives and applications, such as the Global Biodiversity Information Facility (GBIF–https://www.gbif.org/) and other biodiversity-related Big Data, are constantly evolving and providing a variety of application options to enhance the sustainability of landscapes. Yet, while citizen science is becoming more prevalent in this area, reviews suggest that there remains untapped potential for assessing urban biodiversity through the combination of citizen science and geospatial data (Wang Wei et al. 2016). Applying geospatial analysis techniques to data that is currently available as free and open can have a significant impact on our ability to understand how anthropogenic pressures and climate change affect biodiversity, while improving our ability to predict the consequences of changes in drivers at different scales.

Addressing this potential of assessing urban biodiversity through a combination of open earth observation data and citizen-science data and building on the lack of information about habitats in cities and associated biodiversity, we present in this article a framework for assessing biodiversity in cities. We use a combination of open and free remote sensing data for habitat classification and citizen-science species occurrence data for validation. The framework initially develops an urban habitat classification scheme that uses satellite data to map different habitat types within an urban landscape; this is a result of a k-means clustering scheme that uses a series of features collected through open and public databases and it is performed at a zip code level for the Athens Metropolitan Area. Species occurrence data that have been previously collected and published through the GBIF portal (GBIF 2023) are used to map distributions of species across these habitat types. This framework is motivated by the work of Li et al. (2019), which was conducted for the part of Los Angeles County, California, USA that is highly urbanized. It is novel and has the potential to become widely applicable because it relies on public datasets. It is a framework that can be enriched with citizen-science campaigns and bioblitz events organized by cities, moving towards the creation of technical infrastructures that have also social components to ensure long-term citizen engagement (Liñán et al. 2022). As such, it can also act as a citizen engagement framework for environmental projects aiming at human behavioural change and persuasion. The assessment framework is implemented for Athens, Greece, a large Mediterranean metropolis that has limited green space and very little information available to the authorities on the current biodiversity status. The results of the framework can be used for future planning of the city. Most importantly, our research question is whether we can assess urban biodiversity in a city that has not collected any high-resolution biodiversity data with spatial and temporal reference through costly field surveys and with the involvement of experts. Even though Athens is used as a pilot case to present our “proof of concept,” the strength of this methodology is that it relies on satellite data that is open and widely available and can be easily adapted to be used in any European city; therefore, the work presented is not limited to be applied only in Athens but can be easily replicated. Through our analysis, we wish to reveal patterns that drive urban biodiversity, resulting in urban planning prioritization by the city with interventions that will drive conservation management, enhancing species abundance in urban areas.

Materials and methods

Study area

The Athens municipality has a population of 643,452 (in 2021 according to the Hellenic Statistical Authority (ELSTAT 2023) within its official limits, and a land area of 38.96 km2. The municipality is a small administrative unit of the entire urban area, the Athens Metropolitan Area (AMA), which extends over an area of 412 km2, with a population of 3,744,059, according to the most recent census of 2021 (ELSTAT 2023). A capital that lies in the southernmost country of the European mainland, it has an average annual temperature of up to 19.8 °C locally (HNMS 2022). Athens is the capital and the largest city of Greece, while the AMA has 40 municipalities, 35 of which are referred to as Greater Athens municipalities and produce more than 40% of the national GDP. The city saw a period of rapid development and modernization after the Second World War with large growth rates, resulting in the population of the urban area to more than double (Macrotrends 2023). Athens has one of the highest population densities of Europe, second only to Paris, with 16,830 inhabitants/km2 (ARS 2017). Since the early 80 s, massive construction and a rapidly growing road system led to a depletion of peri-urban green and the covering of the majority of the natural water network of the Attica plain including its two rivers. As a result, In Athens, more that 80% of its area has undergone uncontrollable cementification, making it non-water permeable with buildings, roads, pavements, infrastructures, etc.) (ARS 2017).

The Athens Resilience Strategy was launched in 2017, including the city’s Climate Adaptation Action plan, while the Municipality of Athens is updating its Climate Action Plan, in accordance with its commitments to the Paris Agreement and the Global Covenant of Mayors. Athens faces chronic urban growth issues that magnify climate change impacts; as a response, it has a strategic focus to enhance green infrastructure and support urban biodiversity, in order to best shield itself from, adapt, and build resilience to Climate Change challenges (extreme heat, flash floods and forest fires, to name a few). The Athens municipality is one of nine Case Studies in the European ARSINOE project (https://arsinoe-project.eu/) funded by the EU in the framework of the Horizon 2020 program, focusing on innovations for climate change adaptation. One of the areas that the Case Study explores is urban biodiversity and the focus is on establishing a baseline and engaging the community through citizen science initiatives to establish a species occurrence monitoring and information system that will build on the novel observation application MINKA-SDG (https://minka-sdg.org/).

Data and clustering

K-means is an unsupervised machine learning algorithm and an effective tool for the modeling and visualization of high-dimensional data. The objective of clustering is usually the grouping of similar data in order to discover underlying patterns that may not have been obvious otherwise. To achieve this objective, K-means looks for a fixed number of clusters (k) in a dataset; this is done by finding the locations of k centroids in the data set, which essentially represent the centers of the clusters. The algorithm allocates data points to the nearest cluster and keeps cluster centroids as small as possible. This way, data are mapped in space in a sense that neighboring points have similar features, thus clustering similar units together. In this work, in an effort to develop a methodology that will classify urban habitats according to the characteristics they have in relevant variables that drive biodiversity in the city, we analysed the data and focused on extracting important features. These features were used to identify which urban system properties are relevant to biodiversity, so they can be included by the city in a system that automatically classifies the city to habitats (Balasankar et al. 2021). A long list of variables was considered; in Table 1, we list the features that were chosen: they include population density as well as residential and non-residential density, Urban Land Cover and imperviousness, vegetation, road network, tree coverage, noise level and some meteorological variables such as temperature, precipitation, etc. Data granularity was at zip code level; the dataset contained a total of 494 zip codes in the municipality of Athens. All data were obtained by remote-sensing open datasets, while the units, specific source and corresponding resolution for each variable is listed in Table 1.

Table 1 Features chosen for urban habitat cluster analysis

All data was normalized using the following normalization function:

$${x}_{norm}=\frac{\left(x-\mu \right)}{\sigma }$$
(1)

where x is the feature value, μ is the mean of the feature and σ is the standard deviation of the feature. All calculations were performed with the scikit-learn library in Python (Pedregosa et al. 2011).

The data of each layer was overlaid over the geospatial layer of neighborhoods of central Athens, Greece. The data points of each gridded layer were then translated to corresponding centroids and aggregated for each neighborhood of the case study region. For layers with a resolution that was too sparce to have a corresponding centroid in each neighborhood, for example the 1 km by 1 km layer of mean precipitation, each neighborhood was assigned the value of the nearest centroid. For polygon layers, such as the green areas from the Copernicus Urban Atlas 2018, the percentage of the overlapping polygons was used as the variable, or in the case of the noise maps, the percentage of the overlapping polygons was used in conjunction with the value of each noise polygon to assign an average noise value to each neighborhood.

An important decision in this process is to decide on the number of clusters that the data should be clustered into. For this, the elbow method was used, i.e. for interpreting and validating cluster consistency, in order to find the appropriate number of clusters in a dataset. In the elbow method, the explained variance, or inertia, is plotted as a function of the number of clusters. The optimal number of clusters is where the inertia, namely the sum of squared distances of samples to their closest cluster centroid, starts to decrease in a linear fashion, or forms an “elbow”. The inertia is calculated for each cluster and then summed up to provide the elbow function. As can be seen in Fig. 1, the “elbow”, signifying the optimal number of clusters, is formed between four and six clusters. As we add more clusters, inertia decreases, but at a lower rate, making the increase in the number of clusters non-beneficial. To decide on the optimal number of clusters, clustering was performed for 4, 5 and 6 clusters and the silhouette score (s) was calculated for each, in order to evaluate the separation distance between clusters and the cohesion of each cluster. Score s is a clustering efficiency metric that quantifies how close points are in the cluster; it ranges from − 1 to 1, with a high positive value indicating that the object is well-matched to its own cluster and poorly matched to neighboring clusters. The silhouette score for the number of clusters being 4, 5 and 6 was calculated at 0.4, 0.406 and 0.407 respectively. Even though there is an improvement as the number of clusters is increased, the improvement is deemed marginal, and the optimal number of distinct clusters with satisfying clustering cohesion was decided at 4 clusters, which coincided with 4 distinct urban habitats.

Fig. 1
figure 1

Elbow method for finding the optimal number of clusters

Citizen-science species occurrence data

Recently, several studies have been based on data coming from applications that collect citizen-science species occurrence data. Examples are applications such as iNaturalist (Spear et al. 2017) that have provided a framework to quantify responses of urban biodiversity to anthropogenic pressures, particularly as the cost of professional broad-scale monitoring may be prohibitive (Callaghan et al. 2020). Urban wildlife, such as canids, have been studied using iNaturalist-generated observations, providing both cost-effective data collection and public engagement (Mueller et al. 2019). In fact, one study found that citizen science projects can generate biodiversity occurrence records at a much higher rate than traditional museum collections, particularly for species that are common or resilient in urban environments (Parker 2015; Spear et al. 2017). Another application is eBird, presented in the work by Callaghan et al. (2018) that shows that green space area is the most important predictor of bird biodiversity in urban areas, reinforcing the importance of urban green spaces for maintaining biodiversity. Additionally, projects such as the Baltimore Ecosystem Study in the USA have helped increase our understanding of the biophysical-social complex of urban ecosystems and urbanized areas (Pickett et al. 2008).

Due to the growth of citizen science projects and similar community efforts, there is an increasing availability of urban biodiversity data in the form of species occurrence data for cities (Kobori et al. 2016). As described herein, citizen science platforms are variable and are focused either on a single taxon, such as birds (e.g., eBird; https://ebird.org/home), or on groups of various organisms (e.g., iNaturalist; https://www.inaturalist.org, naturgucker https://www.naturgucker.info/, WU https://herbarium.univie.ac.at/ to name a few). These platforms communicate their data to the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/en/) in order to build a global network and a data infrastructure that provides open access to everyone on biodiversity data around the world, obtained through the involvement of the community.

For Athens, we used the observations imported in the GBIF portal, which includes 10,184 observations (1803 species) including a variety of taxa (e.g., birds, plants, insects, reptiles, mammals, gastropods, arachnids, fungi, etc.) reported between 2010 and 2023—the data set is open and available (GBIF 2023)—to detect biodiversity and species patterns across our study area. These observations were cross-mapped on the four different urban habitats identified through the clustering process.

Results and discussion

Urban habitat typology

Urban habitat categories or typologies have been developed in the past, but they were based on the different plant life forms and their adaptation to urban areas requirements (Farinha-Marques et al. 2017). In this article, we base the categorization of urban habitats on remote sensing data that are related to the built environment and associated bio-physical characteristics, rather than species found in the cities. The species are mapped to validate the habitat typology that emerges from the data and clustering analysis. Following the clustering procedure, four distinct clusters were identified with different characteristics each. To further develop this into an urban habitat typology and identify what differentiates the clusters among them, we analyzed the mean values of the features per zip code in each cluster. The mean values of the feature variables for each cluster are shown in Table 2, while a description of the clusters follows.

Table 2 Mean values of feature variables per zip code in each cluster

Low-residential cluster

According to Table 2, Cluster 1 demonstrates significant non-residential density and building height, suggesting the concentration of commercial or industrial activities. The corresponding low population density implies limited residential activity in these areas. The limited green space, potentially attributed to the spatial requirements of non-residential structures, aligns with the highest occurrence of hot days, possibly due to an urban heat island effect. Geographically, this cluster is located predominantly to the west. Based on these characteristics, we name Cluster 1 as the “Low-Residential Cluster”.

Urban green cluster

Contrastingly, Cluster 2 is characterized by a high percentage of green areas and vegetation. This could indicate a balance between urban development and green space preservation, thus resulting in limited roads and water bodies. The cluster also displays low non-residential density and height, and reduced imperviousness, indicative of predominantly low-rise residential areas and efficient water management practices. Based on these characteristics, we name Cluster 2 as the “Urban Green Cluster”.

Dense urban fabric and urban arterials cluster

These two clusters share characteristics, namely high residential density and building height, suggesting densely populated residential areas. The reduced non-residential density and height imply a predominance of residential spaces over commercial or industrial establishments. The two clusters have a lot of similarities, but we see that one of them has a significantly higher road density, as it encompasses densely populated neighborhoods in Athens with relatively narrow streets and dense structuring—named “Dense Urban Fabric.” The other cluster has a lower road density as it includes some of the large arterials of Athens, thus it is termed “Urban Arterials.”

A data correlation analysis is also performed to explore correlations among features, shown in Fig. 2. We see that Road density is highly correlated with Noise level, while green area percentage coincides with vegetation, as expected. The former also has a high negative correlation with imperviousness, which in turn is negatively correlated with vegetation. Precipitation results in cooler temperatures in the city, which explains the strong negative correlation of precipitation with mean and maximum temperature. The data correlation matrix verifies the quality of data, as all correlations are reasonable, while also shows that some urban features are highly related, as expected. All sixteen features are mapped separately for the municipality of Athens at a zip code level in Fig. 3, while Fig. 4 maps the four clusters as they were derived, without and with the species occurrence observations obtained by citizen scientists and found in GBIF.

Fig. 2
figure 2

Correlation plot of the variables used in the analysis

Fig. 3
figure 3

Values of sixteen features used for clustering analysis of Athens municipality. The features are described in Table 1 and their average value per cluster is provided in Table 2

Fig. 4
figure 4

Athens municipality—a the four clusters, corresponding to four urban habitats; b same as (a) overlaid with GBIF species occurrence data (GBIF 2023)

We further analyzed the results by conducting the Analysis of Variance (ANOVA) test to statistically validate the differences among the means of the clusters and to test the null hypothesis that the means are equal. This statistical method is particularly useful in our analysis as it allows for the simultaneous comparison of more than two groups or clusters, thus permitting the assessment of a multitude of urban characteristics for the different urban habitats across the city. Table 3 presents the sorted ANOVA scores for each characteristic within the clustered zip codes—larger ANOVA scores signify that the feature variable is important in cluster differentiation. These scores and corresponding p-values quantify the differences across clusters and test the statistical significance of these differences.

Table 3 ANOVA scores and associated p-values for all feature variables

The top scoring characteristic is Non-Residential Density with an ANOVA score of 6.89E + 02 and a corresponding p-value of zero demonstrates significant variation across clusters and rejects the null hypothesis of equal means. Similarly, Non-Residential Height, with an ANOVA score of 6.78E + 02 and a p-value of zero, shows significant differentiation across clusters. It is noteworthy that climatic attributes like Maximum Temperature and Mean Temperature also show statistically significant variations across the clusters, with p-values far below the conventional threshold of 0.05. The presence of significant variation in Green Area Percentage, Imperviousness, and Vegetation is indicative of the varied nature of land use across the clusters. On the other end of the spectrum, characteristics such as Water and Street trees per square meter show lower ANOVA scores, suggesting a more uniform distribution of these features across the city’s habitats. Notably, Street trees per square meter has a p-value greater than 0.05 (p = 0.12), which does not reject the null hypothesis, indicating no significant difference in means across the clusters for this characteristic.

Species occurrence data

In Table 4, we see the mean number of species occurrence observations per zip code in each cluster, as they were downloaded from the GBIF platform in a period between 2010 and 2023. This is by no means meant to say that we have an exact record of biodiversity and species occurrence in Athens. The GBIF (2023) dataset is only what has been recorded by citizens throughout the years. It is very possible that species occur in other locations, and it has not been recorded by citizens. We only use this information as an indication of the biodiversity across the Athens zip codes. A general overview of the types of organisms that were spotted across Athens is shown in a Sankey diagram in Fig. 5, in which species occurrence observations obtained by citizen scientists through the GBIF platform are shown for kingdoms, phyla, and classes mapped to the 4 different urban habitat clusters. We see that the number of observations of plants and animals are about equally split, with the plants being dominated by tracheophyta and fungi being found at very small numbers. Animals are split into 3 phyla: arthropoda, chordata and mollusca, with chordata dominating. Overall, a low number of biodiversity observations are found in non-green areas, which cover the largest surface area of the city. When it comes to total number of observations, the Urban Arterials cluster is the one with the lowest number of observations.

Table 4 Mean number of observations per zip code in each cluster
Fig. 5
figure 5

Species occurrence observations obtained by citizen scientists through the GBIF platform: All species observations (grey color) are distinguished for kingdoms (green color), phyla (blue), and classes (purple) and they are cross-mapped to the four different urban habitat clusters (orange)

We also performed a species frequency analysis, which includes a species count statistical analysis that is shown in Table 5. In Table 6, we include a list of species and their corresponding observation counts in a frequency decreasing order, showing all species that had over 100 observations. This is a total of 3252 observations, about a third of the total number of observations in the data set. The corresponding frequency plot is shown in Fig. 6, in which we see that the highest frequency is for species that appear only once, while a large number of species appear twice in the dataset. This is also supported by Table 5, in which we see that 25% and 50% of our data includes species that have only one or two observations, respectively, while 75% of the data include species that are observed up to 4 times. The highest number of observation counts for a single species is 301 and it is found for Ceratonia siliqua (Table 6).

Table 5 Statistics of species occurrence counts
Table 6 Frequency of species occurrence
Fig. 6
figure 6

Species occurrence frequency analysis—density plot

Clustering results matched well the reality on the ground for Athens. A large number of observations (compared to the other clusters) are found in the “Urban Green” cluster, which is also where the National Garden is located in the center of the city and where the only zip code with a water body exists (see the “water” graph in Fig. 3). Pockets of species observations can be found scattered in other locations that are characterized as belonging to the same cluster. The Urban Green cluster is indeed “Green”, since it has the highest percentage of green area and vegetation (Table 2) and has a critical amount of biodiversity observations (Table 4). No other cluster has as many observations as those of the Urban Green cluster. The Urban Arterials and Dense Urban Fabric cluster areas have a high degree of cementification and either large boulevards (Urban Arterials), or narrow streets and high buildings (Dense Urban Fabric), so they are expected to have fewer biodiversity observations and to have the conditions conducive for low species occurrence rates. A significant group of observations is found in the low-residential cluster, which is, at a first glance, surprising. The area includes the Athens Agricultural University that houses tree nurseries and other green parts. Furthermore, in this last cluster, the “Elaionas” district is found. According to the ARS (2017), this urban district is a post-industrial area that has a significant fraction of green open spaces (a result of a Presidential decree) distributed among the urban fabric, even though it is currently regulated to house various light industries. This justifies the existence of species occurrence observations in this low-residential part of Athens.

When comparing the number of observations in Athens with what is reported in the literature, e.g. for Los Angeles (LA) in the USA (Li et al. 2019), we see the following: The urban green area has the most observations in the urban environment (just like in Athens). The Urban Arterials area in Athens has about 50 times less observations than the Urban Green area, while the latter respective area in LA has only 6 times fewer observations. When the most developed areas in Athens and LA are compared (called Dense Urban Fabric in Athens), we see that it is only 4 times fewer in LA and about 100 times fewer than the Urban Green cluster in Athens. The Low-Residential area in Athens has 5 times fewer observations than the Urban Green, while in LA only 3 times fewer. We see that biodiversity in Athens outside the Urban Green cluster is very limited, but this is easily justified when the very limited percentage of green in the whole city is considered, which also exists in isolated patches and is not spread throughout the city. Green corridors, green roofs and continuity in green spaces would enhance biodiversity, but this is not the case in Athens with over 80% of its land being impervious and most open streams being covered. It is hard to compare LA and Athens, as LA has large surface areas, large lots, detached houses and is widely spread out in the urban space. On top of this, we need to consider that the observations that are listed herein are the ones captured by citizens. LA is a city that has run bioblitz events and challenges repeatedly (LAPL 2023) and is expected to have large citizen engagement in capturing biodiversity occurrences via apps like the iNaturalist. For Athens, these events are relatively new, and the citizens are less accustomed to going out and taking part in such citizen science initiatives. A limited number of them are happening in the framework of research projects, mainly with school children, thus the number of citizen science observations is expected to be limited. Efforts should be directed towards creating links of citizens with the natural environment in the city, so they can enjoy the benefits of nature to society, but also to promote their ecological identities and the feeling of “One with Nature” for current and future generations (Laspidou 2023).

Conclusions

Citizen science campaigns are a part of the methodology and promote engagement and participation of communities in obtaining biodiversity data and raising awareness about the activities and solutions that safeguard species conservation and promote green practices in the city. In this article, we focused on developing a methodology that would be relatively easy to implement and one that does not rely on locally available data that are usually not available and/or are hard to obtain. The framework uses remote-sensing data that are relevant to biodiversity to assess conditions in the municipality of Athens, Greece and to divide the city into a number of distinct urban habitats through clustering methodology. Citizen science data that is available through the GBIF platform (GBIF 2023) is used to validate the urban habitat delineation of the city. The novel methodology provides a baseline biodiversity assessment for city officials and analyzes the significance of different factors in promoting urban ecosystem integrity. City officials can then prioritize spending and urban planning initiatives that will enhance species conservation in the city. Results indicate that Athens has a low number of observations overall, which signifies a relatively weak citizen engagement in citizen science events for urban biodiversity. Impermeable surfaces dominate the city, and since green areas and vegetation seem to be the best predictors for having an “Urban Green Habitat” with species occurrences, overall urban biodiversity seems to be low and fragmented. Compared to other large cities, Athens has a significantly lower species occurrence outside the Urban Green cluster. Other than citizen awareness and engagement with urban nature, city planning should include reduction of impervious surfaces, promotion of green areas with variety of vegetation and connectivity among the green urban habitats in order to block habitat fragmentation and promote biodiversity.