Globally, geodiversity (the diversity of geological structures and processes, including rocks and minerals; geomorphology, including landforms and topography; sediments and soils, including formation processes; and hydrology, including marine, surface and subsurface waters) is under increasing pressure from anthropogenic activities and environmental change (Gray 2008a, 2013; Hjort et al. 2015; Fox et al. 2020a). Threats such as urbanisation, mining and land-use change can damage geodiversity both in situ, such as damage to landforms, and across wider spatial and temporal scales, such as the contamination of hydrological systems and changes to soil processes (Hjort et al. 2015). Activities such as tourism and recreation can cause damage to geodiversity features, for example, through the erosion and removal of material (Gray 2008a). Furthermore, environmental change such as anthropogenically induced sea-level rise, landslides and changes to weather patterns can impact geodiversity features and processes (Prosser et al. 2010; Brazier et al. 2012). These wide-reaching anthropogenic and environmental impacts emphasise the need for the wider adoption of geoconservation.

First introduced by Sharples (1993), geoconservation refers to any action intended to conserve geodiversity features, processes, sites and specimens (Gray 2018). Geoconservation includes the following: the creation of protected areas such as the UNESCO Global Geoparks Network; the promotion of education on geodiversity and its conservation; and the in situ management of sites such as the construction of physical barriers to restrict public access (Gray 2008a; Henriques et al. 2011). Though the concept of geoconservation is starting to be adopted globally, a deeper understanding of the value of the contribution of geodiversity to society would promote greater uptake of geoconservation practices (Gordon et al. 2018).

Geodiversity also has value because it plays an integral role in the delivery and maintenance of ecosystem services (ES), the benefits we receive from nature (Gray 2011; Fox et al. 2020b). First, geodiversity underpins ES, providing the foundations for all other services to occur (Parks and Mulligan 2010). Second, geodiversity actively contributes, through interactions with biodiversity, people and society, to provisioning services (e.g. food, fibre and fuels), regulating services (e.g. dispersal and dilution of pollutants) and cultural ecosystem services (CES, e.g. aesthetic views) (Fox et al. 2020b). Third, geodiversity can provide services in the absence of any interactions with biodiversity. These geosystem services provide a range of goods and benefits for society, including provisioning services (e.g. construction materials and rare-earth metals), regulating services (e.g. the regulation of thermal flows) and cultural services (e.g. religious sites and recreational activities) (Gray 2011; Van Ree and van Beukering 2016; van Ree et al. 2017; Fox et al. 2020b).

Here, we will focus on the relationship between geodiversity and the CES subcategory of recreational activities (Millennium Ecosystem Assessment 2005; Milcu et al. 2013), which provide physical health and psychological well-being benefits obtained through interactions with the natural environment (Hermes et al. 2018). In particular, we focus on hiking, a recreational activity that is generally described as the act of walking for an extended amount of time through natural or rural areas (Mitten et al. 2018). King et al. (2017) identified six pathways from which CES benefits can arise: cognitive, creative, intuitive, retrospective, regenerative and communicative. Here, we classify hiking under the regenerative pathway, which includes opportunities for recreation, leisure and tourism that provide direct restorative benefits like reducing emotional stress (King et al. 2017). We also acknowledge that some of the benefits of recreational activities can also be linked to other CES through other pathways, including accessing aesthetic views, spiritual and religious motivations, or a sense of place (Collins-Kreiner and Kliot 2017; Wilcer et al. 2019).

Some CES that geodiversity provides, such as recreational activities, may, in turn, exacerbate anthropogenic threats to geodiversity (Figueroa-Alfaro and Tang 2017). For example, iconic and geodiverse landscapes, such as the Grand Canyon, which attracts thousands of visitors annually, provide a range of CES, including serving as popular destinations for hiking. Through the provision of these services, the landscapes are at higher risk of damage from overuse and exploitation (Gray 2008a). However, quantifying the scale of the threat of human activity on geodiversity and the services it provides is difficult. First, the relationship between geodiversity and ES, and also the service being assessed, varies over the spatial scale assessed as well as across different locations (Alahuhta et al. 2018). Second, indirect damage to geodiversity, such as impacts on downstream hydrological and geomorphological processes, may not be easily assessed (Hjort et al. 2015; Fox et al. 2020b). Therefore, we need to develop a deeper understanding of the complex relationship between geodiversity, ecosystem services and anthropogenic threats.

To develop our understanding of the relationship between geodiversity, ecosystem services and anthropogenic threats over large spatial and temporal scales, there is a need for suitable datasets. However, current data collection methods for ES assessments, including geodiversity, biodiversity and social-demographic datasets, have several limitations not only at larger spatial and temporal scales but also across smaller scales (Hjort et al. 2012). Many traditional CES assessments, such as monetary assessments or social surveys, are expensive and time-intensive to implement (Tenerelli et al. 2016; Figueroa-Alfaro and Tang 2017). Furthermore, as CES vary based on the individual experiencing them, quantifying their perceived benefits and values is more difficult than for provisioning and regulating services (Daniel et al. 2012; Lee et al. 2019; Havinga et al. 2020). Here, data from social media websites provide advantages over traditional ecological assessments and social surveys, providing large spatial and temporal datasets relatively quickly and at minimal financial cost (Barve 2014; Fox et al. 2020a). These approaches can provide a more objective approach to assessing CES compared to survey data, as they show revealed preferences as opposed to stated preferences. Social media websites have therefore started to become established as a reliable source of data for a vast array of CES studies (Ghermandi and Sinclair 2019).

Data from social media sites can be used in a range of methods. Examples include using geolocated posts to assess the spatial variation in CES (Tieskens et al. 2017); using content analysis to assess human-nature interactions depicted in photographs (Richards and Tunçer 2018); and using textual analysis to better understand opinions on CES (Becken et al. 2017; Wilson et al. 2019). Previous work in the field has examined the effectiveness of a variety of different social media platforms. Flickr, an image and video hosting website, has been used to assess aesthetic ecosystem services (Van Zanten et al. 2016; Van Berkel et al. 2018) and recreational ecosystem services (Graham and Eigenbrod 2019; Mancini et al. 2019). Twitter, a microblogging site, has been used to assess urban greenspace services (Roberts 2017; Johnson et al. 2019). Reddit, a discussion-based forum, has been used to look at recreational, aesthetic and spiritual CES (Fox et al. 2021a). The platforms differ in their strengths and weaknesses for assessing CES, and these should be acknowledged and accounted for in analyses. Fox et al. (2021a) demonstrated that Flickr data is more suited to assessing spatial variation and image content analysis, while Reddit is more suited to assessing textual metadata.

Geolocated posts from social media can be analysed using species distribution modelling methods to gain an understanding of the distributions and drivers of CES. Richards and Tunçer (2018) used maximum entropy modelling (MaxEnt) to plot the potential distributions of CES from Flickr photographs. Their study assessed four criteria: distance from attractions, presence of parks, forest cover and managed vegetation cover. They found that distance from attractions had the largest contribution to the distribution of photographs of plants and animals. Walden-Schreiner et al. (2018) used MaxEnt to assess visitor distributions to national parks, including both infrastructure and environmental features as factors, highlighting that infrastructure variables such as visitor centres are more important than environmental variables. Arslan and Örücü (2020) also used MaxEnt, finding that CES distribution based on geotagged Flickr images is most influenced by roads, religious places and historical and cultural areas. As there is a range of different distribution models available, all with different predictive performance, choosing one individual model may not provide robust results, and therefore, many studies have now taken an ensemble approach by combining the outputs of multiple model algorithms (Hao et al. 2019). However, the authors are not aware of any studies that use multiple modelling algorithms to assess CES from social media data.

Here, we focus on Wales, UK, which is known for its high levels of geodiversity. This geodiversity is protected through various geoconservation instruments, for example, through statutorily regulated areas such as Geological Sites of Special Scientific Interest as well as European Geoparks, such as GeoMôn on the island of Anglesey and Fforest Fawr in the Brecon Beacons (Prosser et al. 2010). The geodiversity of Wales, in particular the high geological and landscape diversity, makes it a popular tourist destination (Burek 2012). Through funding and community engagement, geotourism and education are promoted in the iconic landscapes and areas of cultural significance in Wales (Evans et al. 2018). Previous work using social media data to assess CES in Wales (Gliozzo et al. 2016) found that views of geodiversity features, such as peaks and beaches, the presence of historic human structures and formal biodiversity protection areas, are important drivers of CES in non-urban areas. Furthermore, studies assessing social media posts have found relationships between recreational activities and geodiversity features, in particular, geomorphological and hydrological features (Van Zanten et al. 2016; Oteros-Rozas et al. 2018; Van Berkel et al. 2018; Muñoz et al. 2020). However, due to the multifaceted nature of CES, these analyses, while useful, do not enable us to identify how the use of the landscapes for CES could affect the maintenance of geodiversity.

In this paper, we aim to be the first study to apply an ensemble species distribution modelling approach to understanding what natural and human features drive the distribution of CES derived from geolocated social media data. We aim to quantify which aspects of geodiversity are most important in contributing to the distribution of hiking in Wales, UK. Furthermore, through analysis of the contents of images of hiking, we aim to understand which geodiversity features people interact with to inform focused geoconservation efforts on these potentially at-risk features.


Social Media Data

Launched in 2004, Flickr is a photograph and video hosting website, with a large and diverse user base that contributes over 25 million new uploads a day (Oteros-Rozas et al. 2018; Ding and Fan 2019). Photographs uploaded to Flickr have a range of available metadata including temporal information, in the form of the time and date the image was taken, as well as spatial information in the form of the latitude and longitude at which the image was taken (Fox et al. 2020a). The high availability of metadata from Flickr means that it has become the most widely used social media site for assessing CES (Langemeyer et al. 2018; Ghermandi and Sinclair 2019). Data from Flickr can be accessed through its Application Programming Interface (API), a computing interface that allows researchers to access the application. Here, data from Flickr was obtained using the “photosearcher” package (Fox et al. 2020a) within the R environment (R Core Team 2020). Our image search was limited to any georeferenced images falling within Wales, delimited by a shapefile (Ordnance Survey 2020), taken between the 1st of January 2010 and the 1st of January 2021, uploaded before the 1st of February 2021 and containing a given keyword in the image’s title, tags or description. To ensure that all images of hiking were captured, our keyword search included synonyms of hiking: “hike”, “hiking”, “walk”, “walking”, “trek”, “trekking”, “ramble”, “rambling”.

Not all posts returned from social media sites are useful for CES assessments. Many posts to social media sites contain images taken inside buildings or images of people, while other posts may relate to a negative experience (Fox et al. 2021b). To extract only images relating to the natural environment, the contents of images were automatically tagged using the Google Cloud Vision API (Google Cloud Vision API 2020) within the R environment using the imgrec package (Schwemmer 2019). The Google Cloud Vision API is a machine learning model which can label the contents of images with over 1 million different tags based on a pre-trained dataset. Here, we returned labels for the first 10 features detected in each image. Each label is returned with a given confidence score, scaled between 0 and 1. To select accurate labels without manual validation, we only retained labels with a confidence score of greater than 0.6 (Gosal et al. 2019). We classified the automatically generated labels into biophysical aspects of nature, such as features of biodiversity (e.g. tree or bird) or features of geodiversity (e.g. mountain or lake) and non-biophysical nature labels (e.g. building, car, person, sky) (for a full list of classifications see Supporting Information 2). To remove images that are not predominantly of human-nature interactions (e.g. photographs focused on buildings or people), we calculated the ratio of biophysical nature labels to non-biophysical nature labels. Images containing more nature labels than non-nature labels were deemed to be focused on nature and were retained for further analysis. Choosing a threshold of at least 50% biophysical nature labels provides a suitable dataset representing human-nature relationships (Fox et al. 2021b). Furthermore, to ensure that users were experiencing a benefit from the hiking experience, we carried out a textual sentiment analysis on the title, description and tags of each image.
Here, we used the AFINN dictionary (Nielsen 2011), a collection of words ranked from +5 (positive words) to −5 (negative words), to calculate the sum sentiment expressed in the textual metadata of each image. Images in which the user expresses an overall positive sentiment towards the activity were retained for further analysis.
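
The two filtering steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which relied on the imgrec package in R. The label set and the three-word lexicon are hypothetical stand-ins for the full label classification (Supporting Information 2) and the AFINN dictionary, and the function names are our own. The sketch assumes the stricter reading of the retention rule, in which nature labels must outnumber non-nature labels.

```python
import re

MIN_CONFIDENCE = 0.6  # labels scoring at or below this value are discarded

# Illustrative subset only; the full classification is far longer.
NATURE_LABELS = {"mountain", "lake", "tree", "bird"}

# Hypothetical stand-in for the AFINN dictionary (word -> score from -5 to +5).
TOY_LEXICON = {"beautiful": 3, "lovely": 3, "awful": -3}


def is_nature_focused(labels):
    """True if confident nature labels outnumber confident non-nature labels.

    `labels` is a list of (label, confidence) pairs for one image.
    """
    kept = [name for name, score in labels if score > MIN_CONFIDENCE]
    nature = sum(1 for name in kept if name in NATURE_LABELS)
    return nature > len(kept) - nature


def sentiment_sum(text, lexicon):
    """Sum the lexicon scores of all words in the text; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def keep_image(labels, text, lexicon):
    """Retain an image only if it is nature-focused and its text is positive overall."""
    return is_nature_focused(labels) and sentiment_sum(text, lexicon) > 0
```

Under these assumptions, an image labelled “mountain” and “lake” with the title “Beautiful walk” would be retained, whereas the same image titled “Awful weather on the hike” would be dropped.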

Predictor Variables

To assess the contribution of geodiversity to recreational activities, we chose spatial layers to represent geodiversity features where the relationship with CES has previously been highlighted (Van Zanten et al. 2016; Oteros-Rozas et al. 2018; Van Berkel et al. 2018), as well as features that have not been previously assessed. These are as follows: count of lakes; count of rivers; Euclidian distance to coast; range in elevation; range in slope; count of landscape types; count of bedrock types; count of geosites; and count of soil types (Table 1). Geosites were defined as any Regionally Important Geological and Geomorphological Sites (RIGS) (NRW 2021). Furthermore, as previous studies have found that infrastructure and accessibility are associated with CES distribution (Richards and Tunçer 2018; Muñoz et al. 2020), we also included two variables representing these: Euclidian distance from roads and Euclidian distance from greenspace access points. Here, greenspace access points are the entrances to any allotments or community growing spaces, bowling green, cemetery, religious grounds, golf course, other sports facility, play space, playing field, public park or garden and tennis court (Ordnance Survey 2020). Other studies have found that historic sites can also influence the CES distribution obtained from Flickr images (Gliozzo et al. 2016; Van Berkel et al. 2018), so we included the count of scheduled monuments, which are sites of archaeological importance such as burial mounds, castles and churches (NRW 2021). Furthermore, as recreation and CES can be influenced by biodiversity and areas designated for its protection, we also included the area of natural vegetation (Copernicus 2021) and distance to a protected area (Protected Planet 2021).
Natural and semi-natural vegetation was defined as any area of broad-leaved forest, coniferous forest, mixed forest, natural grassland, moors and heathland, transitional woodland-shrub, sparsely vegetated areas, inland marshes, peat bogs and salt marshes (Copernicus 2021). As recreational activities can be impacted by landscape diversity at a spatial scale of around 10 km (Graham and Eigenbrod 2019), here, predictor variables were summarised into 10 km² grid cells, e.g. elevation range within the 10 km² grid cell. To transform the individual variable rasters to a comparable scale, each of the raster maps was normalized to a 0–1 scale (Supporting information 3 and 4).
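
As a minimal illustration of the rescaling step, the Python sketch below applies min-max normalisation to the values of one variable across grid cells. The exact formula used in the study is not stated here, so min-max scaling is an assumption, as is the `invert` option, which is included only because the response curves are described with larger normalised distance values meaning closer to the feature.

```python
def min_max_normalise(values, invert=False):
    """Rescale a list of numbers to the 0-1 range.

    With invert=True the scale is flipped (1 - x), so the smallest raw
    value (e.g. the shortest distance) receives the largest score.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant layer carries no information; map every cell to 0.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


# Hypothetical elevation ranges (m) for three grid cells.
elevation_range = [120, 480, 300]
normalised = min_max_normalise(elevation_range)
```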

Table 1 Predictor variables chosen for species distribution modelling

Distribution Modelling

Within the R environment, the BIOMOD2 package (Thuiller et al. 2009, 2012) can be used to perform ensemble distribution modelling using 10 models: generalized linear model (GLM), generalized additive model (GAM), generalized boosting model (GBM), classification tree analysis (CTA), artificial neural network (ANN), surface range envelop (SRE), flexible discriminant analysis (FDA), multiple adaptive regression splines (MARS), random forest (RF), and MaxEnt. The settings for each model were set to the BIOMOD2 default (Hodd et al. 2014). The different models stem from different mathematical backgrounds, for example, CTA is a classification-based model, GLM a regression-based model and MaxEnt is a machine learning-based model (Guisan et al. 2002; Thuiller et al. 2003; Phillips et al. 2006).

Many of the models require both presence and absence data to model the distribution. However, as geotagged photographs from Flickr are a presence-only dataset, there is a need to generate pseudo-absences. The accuracy of each type of model can be influenced by the number of pseudo-absences used, with each model type having a potential optimal number. For example, Barbet-Massin et al. (2012) suggest using a large number of pseudo-absences for regression models and an equal number of pseudo-absences to presences for classification techniques. However, using the individual optimal number of pseudo-absences for each model type cannot be applied to the ensemble as the models cannot be compared in an unbiased manner unless the same data is used on all the models (Barbet-Massin et al. 2012). Čengić et al. (2020) found that the number of pseudo-absences does not have a strong effect on model performance across multiple model types and recommend choosing a fixed value. Therefore, we generated an equal number of pseudo-absences to the number of presence points. Furthermore, model performance can be influenced by the prevalence (weighting of presences/pseudo-absences). Here, the modelled prevalence was set at 0.50, which gives the presence and pseudo-absence points equal weighting in the models, as this is recommended for most model types (Barbet-Massin et al. 2012).
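
The idea of drawing as many pseudo-absences as presences can be sketched in Python as follows. This is an illustration under stated assumptions, not the BIOMOD2 procedure: it assumes presences have been reduced to hypothetical grid-cell identifiers, that pseudo-absences are drawn uniformly at random from cells without a presence, and the function name is our own.

```python
import random


def sample_pseudo_absences(presence_cells, all_cells, seed=0):
    """Draw one pseudo-absence cell per presence cell, without replacement.

    Candidates are all cells that contain no presence record. A fixed seed
    keeps the draw reproducible between runs.
    """
    presence = set(presence_cells)
    candidates = sorted(cell for cell in all_cells if cell not in presence)
    if len(presence) > len(candidates):
        raise ValueError("not enough background cells for an equal-sized sample")
    rng = random.Random(seed)
    return rng.sample(candidates, len(presence))


# Hypothetical study area of 20 cells with presences in four of them.
cells = list(range(20))
presences = [2, 5, 11, 17]
pseudo_absences = sample_pseudo_absences(presences, cells)
```

With equal numbers of presences and pseudo-absences, a prevalence of 0.50 corresponds to giving every point the same weight.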

To assess each model’s performance, the data was split into two groups, 80% for training and 20% for testing (Hodd et al. 2014). We used three common metrics to evaluate model performance: the kappa statistic, the true skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC). The kappa statistic measures the difference between the observed agreement from the model and the expected agreement on a standardized scale of −1 to +1, with 1 representing perfect agreement, 0 the agreement expected by chance, and negative values indicating agreement less than chance (Viera and Garrett 2005). The TSS assesses model performance through the model’s sensitivity (probability the model correctly classifies a presence) and specificity (probability the model correctly classifies an absence) (Allouche et al. 2006). The TSS is also on a standardized scale between −1 and +1, with higher scores indicating better model performance and scores close to or less than zero indicating that the model is no better than random (Allouche et al. 2006; Kaky et al. 2020). The receiver operating characteristic (ROC) curve is the plot of the sensitivity against (1 − specificity) across a series of cut-off points, and the AUC summarises this curve as a single number across all thresholds, with 0 representing a model where the predictions are 100% incorrect and 1 a model where the predictions are 100% correct (Lobo et al. 2008). To minimise uncertainties arising from subsampling, we carried out 10 replications of each algorithm for cross-validation, with the mean kappa statistic, TSS and AUC for the testing data for each algorithm used to assess model performance (Kaky et al. 2020). The ensemble models were then built using any model runs with kappa and TSS scores of > 0.6 and an AUC of > 0.8, the minimum acceptable standard of accuracy for the metrics (Hodd et al. 2014).
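
For readers less familiar with these metrics, the Python sketch below computes all three from first principles using their standard textbook definitions. It is illustrative only and is not the BIOMOD2 implementation. The confusion-matrix counts and prediction scores are invented for the example, and the AUC is computed through its rank interpretation: the proportion of presence/absence pairs in which the presence receives the higher score, with ties counted as one half.

```python
def kappa(tp, fn, fp, tn):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - chance) / (1 - chance)


def tss(tp, fn, fp, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def passes_thresholds(k, t, a):
    """Selection rule used for the ensemble: kappa and TSS > 0.6, AUC > 0.8."""
    return k > 0.6 and t > 0.6 and a > 0.8
```

For a hypothetical test set with 40 true presences, 10 false absences, 5 false presences and 45 true absences, both kappa and TSS equal 0.7, so a model run with these counts would pass the kappa and TSS thresholds.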

Here, we wish to assess the importance of the predictor variables for the distribution of recreational activity. However, methods for assessing variable contribution are model-specific and therefore limit the comparison between models. Instead, BIOMOD2 calculates the variable contribution to the model independent of the model algorithm (Thuiller et al. 2009). Variable importance is calculated from Pearson’s correlation between the standard predictions (fitted values) and the predictions made when the values of the variable being assessed are randomly permuted. Predictor variables are considered unimportant when the correlation is high, which indicates that there is little difference between the standard predictions and the randomly permuted predictions (Thuiller et al. 2009). Here, we assessed the variable importance based on the model algorithms with a mean kappa and TSS score of > 0.6 and an AUC of > 0.8. Furthermore, for these model algorithms, we generated response curves for the predictor variables.
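
The permutation idea can be illustrated with a short Python sketch. It is not the BIOMOD2 code. We assume the importance score is reported as one minus the correlation, so that larger values indicate more important variables, and the toy linear model and predictor names are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def variable_importance(predict, rows, column, seed=0):
    """1 - correlation between reference predictions and predictions made
    after the values of `column` have been shuffled across rows."""
    reference = [predict(row) for row in rows]
    shuffled = [row[column] for row in rows]
    random.Random(seed).shuffle(shuffled)
    permuted_rows = [{**row, column: value} for row, value in zip(rows, shuffled)]
    permuted = [predict(row) for row in permuted_rows]
    return 1 - pearson(reference, permuted)


def toy_model(row):
    """Toy stand-in for a fitted model: driven by coast proximity only."""
    return 0.8 * row["coast"] + 0.0 * row["soil_types"]


# Hypothetical normalised predictor values for eight grid cells.
rows = [{"coast": i / 7, "soil_types": (i * 3 % 8) / 7} for i in range(8)]
coast_importance = variable_importance(toy_model, rows, "coast")
soil_importance = variable_importance(toy_model, rows, "soil_types")
```

Because the toy model ignores `soil_types`, shuffling that variable leaves the predictions unchanged and its importance is effectively zero, whereas shuffling `coast` disrupts the predictions.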

Mapping Geodiversity Indices

A spatial index for geodiversity can be calculated by adding up normalised partial geodiversity indices e.g. normalised elevation range and normalised bedrock type count (Melelli et al. 2017). These geodiversity index maps can be useful indicators for informing geoconservation efforts (Melelli et al. 2017). However, mapping all geodiversity features with equal importance as each other may introduce biases and not highlight areas with high conservation value (e.g. areas of geodiversity with high CES value). Therefore, a common approach to mapping geodiversity is to weight indices based on the conservation goal (Jankowski et al. 2020). Often geodiversity partial indices are combined into themes; e.g., the count of different lakes and rivers are combined into a hydrology partial index (dos Santos et al. 2020). However, this may decrease the relative importance of each feature type to the final geodiversity index. Therefore, we create normalised partial geodiversity indices for each of the 9 geodiversity variables: count of lakes; count of rivers; Euclidian distance to coast; range in elevation; range in slope; count of landscape types; count of bedrock types; count of geosites; and count of soil types (Fig. 1).

Fig. 1 Methods for creating unweighted and weighted geodiversity indices

We made two overall geodiversity indices, one where the partial indices had equal weighting and another where we weighted the partial indices based on their mean variable contribution to the distribution model algorithms with a mean kappa and TSS score of > 0.6 and AUC > 0.8. To compare between the equally weighted geodiversity index and the weighted geodiversity index, the two geodiversity indices were reclassified into five classes defined by equal intervals: very low, low, medium, high and very high geodiversity (Pereira et al. 2013; dos Santos et al. 2020; Jankowski et al. 2020).
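
The index construction and reclassification can be sketched in Python for a single grid cell as follows. The weights shown are hypothetical and are not the values in Table 2, and whether the weights were rescaled before summation is not stated here, so the sketch simply multiplies each normalised partial index by its weight and sums the results.

```python
CLASSES = ["very low", "low", "medium", "high", "very high"]


def geodiversity_index(partials, weights=None):
    """Sum the normalised partial indices of one cell, optionally weighted.

    `partials` maps a partial-index name to its 0-1 value; with no weights
    every partial index counts equally.
    """
    if weights is None:
        weights = {name: 1.0 for name in partials}
    return sum(weights[name] * value for name, value in partials.items())


def equal_interval_class(value, lo, hi, labels=CLASSES):
    """Assign a value to one of len(labels) equal-width classes on [lo, hi]."""
    if hi == lo:
        return labels[0]
    width = (hi - lo) / len(labels)
    index = min(int((value - lo) / width), len(labels) - 1)
    return labels[index]


# Hypothetical cell with two of the nine partial indices, for brevity.
cell = {"elevation_range": 0.5, "bedrock_types": 0.25}
unweighted = geodiversity_index(cell)
weighted = geodiversity_index(cell, {"elevation_range": 0.6, "bedrock_types": 0.4})
```

The `min` call places the maximum value in the top class instead of creating a sixth class.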

Understanding Human-Nature Interactions

As well as understanding where people hike, it is also important to understand which aspects of the natural and human-made environment people interact with while hiking in these locations. Purely assessing photograph distribution does not account for what the subject of the image is (Yan et al. 2019). For example, people hiking in areas of high elevation may not be interested in topography, but instead they are in that location to take photographs of large-scale vegetation such as a forest (Aiba et al. 2019). To reduce biases introduced by overactive users, following a method similar to the photo user days metric (Wood et al. 2013), we grouped all the images a single user took on a single day as one. For each unique grouping of images by a single user on a single day, we summarised the unique labels returned by the Google Cloud Vision API (e.g. if a single user took 60 images labelled “mountain” on a single day, this would be reduced to one “mountain” label). We calculated the frequency of the labels across all users and days and ranked them based on the number of images. Furthermore, to understand which features provide a more positive hiking experience, for each label we calculated the mean textual sentiment value of the images containing that feature.
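
The grouping step can be illustrated with a minimal Python sketch. The record layout (user identifier, date string, list of labels) and the function name are assumptions made for the example.

```python
from collections import Counter, defaultdict


def label_frequency_by_user_day(photos):
    """Count each label at most once per user per day.

    `photos` is an iterable of (user_id, date, labels) records, one per image.
    """
    groups = defaultdict(set)
    for user, day, labels in photos:
        # A set collapses repeated labels within one user-day grouping.
        groups[(user, day)].update(labels)
    counts = Counter()
    for labels in groups.values():
        counts.update(labels)
    return counts


# Hypothetical records: user "a" uploads two mountain images on the same day.
photos = [
    ("a", "2019-05-01", ["mountain", "sky"]),
    ("a", "2019-05-01", ["mountain"]),
    ("b", "2019-05-01", ["mountain", "lake"]),
]
frequencies = label_frequency_by_user_day(photos)
ranked = frequencies.most_common()
```

In this example “mountain” is counted twice (once per user) even though it appears on three images.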


Social Media Data

There were 20,910 images taken in Wales between the 1st of January 2010 and the 1st of January 2021 which had the chosen hiking synonyms found within the image’s title, tags or description. Of these, 16,591 (79.34%) were images we deemed to have a focus on natural or semi-natural areas with some biophysical nature features. Of these, 4,919 (23.52% of the full hiking dataset) images had an overall positive sentiment expressed in the image’s textual metadata (Fig. 2).

Fig. 2 Distribution of hiking images in Wales; these images are classed as a CES as they have both a focus on biophysical features of nature and contain a positive textual sentiment expressed in the metadata. The inset map shows the location of Wales in comparison to the rest of the UK

Distribution Models

Model performance varied between model algorithms, with RF performing the best based on the kappa statistic, TSS and AUC, while the SRE algorithm performed the worst. Of all the models, RF, CTA, ANN and GBM had a mean kappa statistic and TSS > 0.6 and AUC > 0.8. If using the AUC metric alone, FDA, GAM, MARS and MAXENT would have also been selected (Supporting Information 3).

The order of the most important predictor variables varied with the model algorithm used, though overall the distance to the coast was consistently the most important predictor (Fig. 3; for individual models see Supporting Information 5). The other most important variables were range in elevation, range in slope, distance from a greenspace entrance and distance from a road. Count of different landscape types, bedrock types, soil types, rivers, lakes and monuments all had relatively low importance across the models. Though the biodiversity measures, area of vegetation and distance to protected area, had a relatively higher mean importance than some of the geodiversity measures, they also had a relatively low overall importance.

Fig. 3 Normalised variable importance for ANN, CTA, GBM and RF (model algorithms with a mean kappa statistic and TSS value > 0.6 and AUC value > 0.8)

Inspection of the response curves demonstrates that the probability of images of hiking decreases with increasing distance from the coast (Fig. 4; for individual model responses, see Supporting Information 6). Furthermore, all model algorithms show an increase in the probability of hiking images with larger ranges in slope and elevation. For the accessibility predictors, the probability of hiking images increases closer to a greenspace access point but also increases further away from roads. There is a small increase in the probability of hiking images in areas with more lakes and geosites. Furthermore, for biodiversity, there is some increase in probability when closer to protected areas, but there is no real change depending on how much natural or semi-natural vegetation is present. There seems to be no change in response when varying the count of rivers, count of bedrock types, count of landscape types, count of soil types or count of scheduled monuments.

Fig. 4 Mean (± 1 SD) response curves for all variables from all runs of the ANN, CTA, GBM and RF models. The normalized raster values place each variable, summarised within the 10 km² grid cells, on a comparable scale: for distance variables, larger normalized values are closer to the feature in question; for count variables, larger values indicate more different types of the feature; for range variables, larger values indicate a greater range of that feature; and for the area variable, larger values indicate a greater area covered by that feature

Geodiversity Map

The final geodiversity weightings vary based on the method chosen (Table 2). When all the geodiversity partial indices (count of lakes; count of rivers; Euclidian distance to coast; range in elevation; range in slope; count of landscape types; count of bedrock types; count of geosites; and count of soil types) are given equal weighting, there are small pockets of high geodiversity scattered across Wales (Fig. 5). In general, the two indices agree (45.41% of pixels remained unchanged), with both indices showing very high geodiversity in the Snowdonia National Park. However, when weighted by the mean variable importance for the distribution models, there is also a relatively large number of areas showing higher geodiversity, with 48.19% of pixels showing an increase. When the geodiversity index is weighted, the higher geodiversity values tend to be within the GeoMôn Geopark or the Pembrokeshire Coast National Park. However, some of the higher geodiversity areas along the coast fall outside of any of the larger protected area boundaries. Some areas experienced a decrease in the geodiversity index, with 6.40% of pixels having a decline in geodiversity when using the model weightings.

Table 2 Weightings used on the geodiversity indices

Fig. 5 Normalised geodiversity index; partial indices: count of lakes; count of rivers; Euclidian distance to coast; range in elevation; range in slope; count of landscape types; count of bedrock types; count of geosites; and count of soil types. a Partial indices unweighted; b partial indices weighted by their mean variable contribution to the ANN, CTA, GBM and RF models; c difference in weightings (weighted – unweighted); d location of national parks and GeoMôn Geopark (the Fforest Fawr Geopark is located within the boundaries of the Brecon Beacons National Park)

Human-Nature Interactions

In total, there were 1,105 different image content labels returned by the Google Cloud Vision API. While hiking, people take photographs of both geodiversity and biodiversity features (Fig. 6). For geodiversity, people more frequently photograph elements of topography and water bodies. Furthermore, images containing river landforms (e.g. “fluvial landforms of streams”) and lakes (e.g. “lake”) appear relatively frequently (in the most frequent 30 labels out of a possible 1,105). The features of biodiversity photographed most by hikers were primarily of flora, with none of the most frequently photographed features relating to animals. Furthermore, there were some images containing human features, with the most photographed features being “building” (234 images) and “people in nature” (253 images).

Fig. 6

The 30 most frequently photographed features based on the Google Vision Cloud API labels, and the mean textual sentiment of all images containing those features

There was little variation in the mean sentiment score between images relating to the 30 most frequently photographed features (Fig. 6). Images associated with “building”, “nature” and “coastal and oceanic landforms” have the largest associated mean sentiment. Images of topographic features such as “highland” and “mountain”, and those relating to rivers, such as “fluvial landforms of streams”, also have relatively high associated sentiment scores. Images containing features labelled as “grassland”, “terrestrial plant” and “grass” have the lowest mean sentiment values. We note that the mean sentiment score of most labels has a relatively large associated standard deviation.
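The per-label summary described above (label frequency, mean sentiment and its standard deviation) can be sketched as follows. This is an illustrative outline only: the input structure, the sentiment scale and the toy values are assumptions and are not taken from the study data.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Each record: (labels returned for one image, sentiment score of its text).
ImageRecord = Tuple[List[str], float]


def sentiment_by_label(images: List[ImageRecord],
                       top_n: int = 30) -> List[Tuple[str, int, float, float]]:
    """Return (label, image count, mean sentiment, standard deviation)
    for the top_n most frequent labels, most frequent first."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for labels, sentiment in images:
        for label in set(labels):  # count each label once per image
            scores[label].append(sentiment)
    # Sort by descending frequency, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda item: (-len(item[1]), item[0]))
    summary = []
    for label, values in ranked[:top_n]:
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append((label, len(values), mean(values), spread))
    return summary


# Hypothetical toy records; sentiment values are made up for illustration.
images = [
    (["mountain", "lake"], 0.6),
    (["mountain"], 0.2),
    (["lake", "building"], 0.8),
    (["mountain", "grass"], -0.1),
]

summary = sentiment_by_label(images, top_n=2)
for label, count, avg, spread in summary:
    print(f"{label}: n={count}, mean={avg:.2f}, sd={spread:.2f}")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether differences between labels are meaningful, which matters when the spread is large relative to the differences in means.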


By using big data from social media sites, we have begun to untangle the complex relationship between geodiversity and CES. First, we have highlighted which natural and human-made features are important in driving the distribution of the locations where people choose to go hiking. Second, we have demonstrated which aspects of the natural and built environment people interact with while hiking. These are key findings because they can provide information on where geoconservation efforts should be focused and which features of geodiversity should be prioritised. Because our geodiversity index maps highlight areas of high geodiversity, policy and decision-makers can combine these outputs with other sources of data (e.g. visitation rates) to identify areas where the integrity of geodiversity is potentially at risk (Bétard and Peulvast 2019). These findings could then guide decision making to ensure that geodiversity and the ES that it provides can be consumed sustainably, in line with the goals of the UNESCO Global Geopark Program (Prosser 2013).

We found that distance from the coast was the biggest driver of the distribution of hiking in Wales. The importance of the coast for the distribution of CES is consistent with other studies that used social media data to assess CES. Van Berkel et al. (2018) found that beaches were the most visited area in a coastal-inland gradient, while Ghermandi et al. (2020b) found a higher positive sentiment expressed in coastal images when compared to other landscape types. Furthermore, survey data have demonstrated that people in the UK are happiest in natural coastal environments compared to other natural and non-natural areas (MacKerron and Mourato 2013) and that coastal parks in South Africa offer more opportunities for recreational activities such as hiking (Roux et al. 2020). Our results highlight the importance of coastal regions in driving CES, while the image content and sentiment analysis revealed that many people take photographs of coastal and oceanic landforms and that these photographs have a relatively high associated sentiment value. This aligns with Ghermandi et al. (2020a), who found that coastal landforms such as lagoons play an important role in CES and tourism. Future studies should build upon this and assess coastal sites at a finer resolution to further untangle smaller scale relationships between geodiversity and CES.

Our findings also highlight that geomorphological features such as elevation and slope are important determinants of the distribution of recreation. In our study, highly varied slope and elevation increased the probability of hiking images. Other studies have found similar relationships: Van Zanten et al. (2016) found that features such as hills and mountains were the best predictors of recreational value, and Aiba et al. (2019) found that the height of mountains plays an important role in the hiking experience. However, these relationships may vary between individuals; for example, some people may find low geomorphological variation boring for hiking, while less experienced hikers may prefer flatter terrain that does not present too great a challenge (Chhetri 2015). As not all demographics are captured by social media data, with those of lower socioeconomic status often underrepresented (Oteros-Rozas et al. 2018; Hargittai 2020), these results may not be reflective of the entire population, and management decisions need to ensure that the opinions of these demographics are also represented (Graham and Eigenbrod 2019).

Previous studies have also found that water bodies such as lakes and rivers can be an important driver of recreational activities such as hiking (Oteros-Rozas et al. 2018; Schirpke et al. 2018). Here, however, we found that the numbers of lakes and rivers were less important in driving distribution than other variables. The difference in hydrological features driving recreational activities may be due to the location of the study, the demographics of users, or the scale at which we assessed the interactions. For example, Ghermandi et al. (2020a) found that locals were more likely to take images of coastal habitats than international tourists, which may account for the importance of the coast when compared to freshwater features. Furthermore, though many of the hiking images were taken close to rivers or lakes, here we assessed the relationship between geodiversity and ES at a landscape scale (10 km²). As the relationship between geodiversity and ES varies with the spatial scale assessed, the relationship between rivers and hiking may be better explained at a smaller spatial scale (Alahuhta et al. 2018).

We also found that two accessibility variables were important in determining the distribution of hiking in Wales. First, our models suggested that people are more likely to hike closer to the entrance to a greenspace. This further highlights that recreational opportunities and CES are facilitated by access to nature (Roux et al. 2020). Second, the probability of hiking images increased with distance from roads. This is consistent with other studies that find high levels of CES in areas that are not accessible by, or close to, roads (Muñoz et al. 2020). However, other social media and CES studies, and those using citizen science data (e.g. nature observation applications such as eBird and iNaturalist), have found that observations tend to be located closer to infrastructure such as roads (Jacobs and Zipf 2017; Havinga et al. 2020; Muñoz et al. 2020). This complex relationship may be due to differences in motivations for hiking (Wilcer et al. 2019); for example, novice hikers may not travel far from where they parked, whereas people seeking a tranquil experience may travel much further. These relationships with human infrastructure further demonstrate that CES are co-produced through complex human-nature interactions (Fischer and Eastwood 2016).

Other studies have often found that heritage sites are related to social media posts. For example, Gliozzo et al. (2016) found that viewpoints of historic human structures can influence the distribution of social media uploads in Wales, and Van Berkel et al. (2018) found that within coastal environments cultural heritage features were important drivers of CES. However, here, heritage sites were less important in driving CES than natural features such as geomorphology and hydrology, which is consistent with other studies (Kim et al. 2019). Our findings agree with Gliozzo et al. (2016), who suggested that distance to protected areas is relatively important in driving the distribution of CES in Wales. Furthermore, as with other studies, we also found that vegetation cover is generally not the most important driver of hiking (Aiba et al. 2019). Overall, many factors contribute to the distribution of hiking in Wales, demonstrating that CES are not just influenced by the interactions of geodiversity and biodiversity, but co-produced through complex interactions with society, built infrastructure and heritage (Fischer and Eastwood 2016; Haines-Young and Potschin 2018; Fox et al. 2020b).

As expected, there were differences between the drivers of where people go hiking (based on the location of the images) and what they photograph when there (Yan et al. 2019). Many of the images do contain features found to be important in driving CES distribution such as geomorphological features. However, some of the drivers of distribution were less represented in the image content. For example, though still being one of the most frequently photographed features (in the top 30), there were relatively few images of coastal landforms compared to other features such as trees and mountains. This may indicate that though people generally go to coastal areas, there may be other smaller-scale natural or human features that contribute to this pattern. Conversely, though scheduled monuments were not an important driver of distribution, we found that images of buildings had the highest relative sentiment score. As we filtered images to represent natural or semi-natural areas, this may suggest that the overall number of human-made features may not be important in driving the hiking experience, but rather individual historic or iconic buildings can contribute to sentimental value and provide extremely positive experiences.

We found that the most photographed feature of geodiversity was water, with many images being of rivers and lakes. There were also a relatively high number of images concerning geological and hydrological landforms, including bedrock and fluvial landforms. As well as being highly photographed, images of coastal and oceanic landforms and fluvial landforms of streams had relatively high associated sentiments. This suggests that though variety in geodiversity features (e.g. many different lakes or geological landforms) may not drive the general distribution of hiking in Wales, people appreciate and interact with individual features (e.g. a single lake or geological landform) once they are on their hike. The larger-scale distribution of hiking may therefore be driven by people choosing to hike in areas of high geomorphological variation to receive restorative benefits from physical exercise, while at the local scale, people receive additional creative benefits from hiking in areas with aesthetic views of hydrological and geological features (King et al. 2017; Oteros-Rozas et al. 2018; Schirpke et al. 2018; Wilcer et al. 2019).

As recreational activities such as hiking can cause damage to landforms and hydrological systems (Gray 2008b; Hjort et al. 2015; Wu et al. 2021), these findings have important geoconservation implications. Suitable geoconservation strategies in Wales should continue to promote hiking in a way that maximises the CES benefits while minimising damage to the geodiversity features people most interact with. For example, hiking trails could benefit from educational signage that encourages hikers not to touch at-risk landforms and to refrain from littering, preventing the contamination of hydrological systems, or from the creation of lookout points that physically restrict hikers from interacting with at-risk features while still providing views that contribute to the CES benefits (Gray 2008a).

A range of different spatially explicit methods of calculating geodiversity indices exist across the literature, from counts of different elements to specific diversity indices (Zwoliński et al. 2018). For example, some studies calculate partial indices by counting the variety of a feature (e.g. geological units and soil orders) within a grid (e.g. Hjort et al. 2012; Pereira et al. 2013), some calculate geodiversity using standard diversity measures, such as Simpson’s Diversity Index and Simpson’s Evenness Index (e.g. Benito-Calvo et al. 2009), while others apply variations of the geodiversity index introduced by Serrano and Ruiz-Flaño (2007). Since our objective was to understand what drives the distribution of where people go hiking, some of our geodiversity partial indices were calculated using count or distance and not a specific diversity measure. For example, by using distance as a partial index measure, we were able to untangle the finer importance of being close to the coastline for recreational hiking in Wales, which would not have been possible if we had calculated diversity across all hydrological features. Further work should therefore quantify how the differences in geodiversity measures affect their usefulness in differing study contexts (Zwoliński et al. 2018).
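The difference between a count-based partial index and a standard diversity measure can be illustrated with a short sketch. It uses one common form of Simpson's Diversity Index, 1 − Σp², and of Simpson's Evenness Index, (1/Σp²)/S, where p is the proportion of each class and S the number of classes; other variants exist, and the grid cells and bedrock classes below are hypothetical.

```python
from collections import Counter
from typing import List


def richness(units: List[str]) -> int:
    """Count-based partial index: number of distinct classes in a grid cell."""
    return len(set(units))


def simpson_diversity(units: List[str]) -> float:
    """Simpson's Diversity Index in the form 1 - sum(p_i ** 2)."""
    if not units:
        raise ValueError("grid cell contains no observations")
    total = len(units)
    return 1.0 - sum((c / total) ** 2 for c in Counter(units).values())


def simpson_evenness(units: List[str]) -> float:
    """Simpson's Evenness Index in the form (1 / sum(p_i ** 2)) / S."""
    if not units:
        raise ValueError("grid cell contains no observations")
    counts = Counter(units)
    total = len(units)
    dominance = sum((c / total) ** 2 for c in counts.values())
    return (1.0 / dominance) / len(counts)


# Hypothetical grid cells: each entry is the bedrock class of one sub-unit.
cell_a = ["granite"] * 8 + ["shale", "limestone"]
cell_b = ["granite"] * 4 + ["shale"] * 3 + ["limestone"] * 3

for name, cell in (("cell_a", cell_a), ("cell_b", cell_b)):
    print(name, richness(cell),
          round(simpson_diversity(cell), 2),
          round(simpson_evenness(cell), 2))
```

Both cells contain three bedrock classes, so a count-based index treats them as equal, whereas the Simpson measures score the more evenly mixed cell higher. Which behaviour is preferable depends on the question being asked, as the choice of count and distance measures in this study illustrates.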

Although the main geodiversity drivers of hiking identified here (distance to coast and topography) cannot themselves be managed, these results can still be useful for guiding future geoconservation strategies (Fox et al. 2020b). By combining information on where people go hiking with information on what they interact with when hiking, we can better inform geoconservation methods. Our results indicate that geoconservation efforts to mitigate damage from hiking in Wales may be best focused in coastal and mountainous areas, with targeted management strategies for protecting geomorphological, geological and hydrological features and landforms. Furthermore, as many hikers took photographs of flora, any geoconservation strategies should be undertaken holistically to ensure the future protection of biodiversity as well (Anderson et al. 2015; Lawler et al. 2015). Many areas of high geodiversity already fall within protected areas, primarily in the larger sites such as the Snowdonia National Park, the Pembrokeshire Coast National Park and the GeoMôn Geopark (particularly when weighted based on the distribution models), suggesting that Wales is taking good steps to protect geodiversity from damage caused by recreational activities (Prosser et al. 2010; Evans et al. 2018). However, there are several areas with high geodiversity where potential damage is not yet mitigated. These areas may benefit from geoconservation in the form of a protected area that promotes the sustainable use of geodiversity for tourism and recreation activities, in line with the goals of some UNESCO Global Geoparks (Henriques and Brilha 2017; Gordon 2018; Gray 2019). Though here we have assessed geoconservation in Wales with a focus on promoting sustainable recreation, these methods are transferable to studies of different regions and with different management goals. For example, one could use geodiversity variables to predict species distribution (Bailey et al. 2017) and use the results of the analysis to inform geoconservation management strategies that also benefit biodiversity conservation (Anderson et al. 2015; Lawler et al. 2015).


Geodiversity plays an integral role in the delivery of hiking as a CES, both driving the general trends in its distribution as well as the features of nature that people interact with while hiking. Here, we have shown that geomorphological features including the range in slope and the range in elevation and hydrological features such as distance to the coast can play an important role in determining the distribution of hiking. We also note the importance of the co-production of CES through human-nature interactions, with access to nature being key to driving recreational activity distribution. Here, both distance to roads and distance to greenspace access points contributed highly to the distribution of hiking in Wales. While hiking, people tend to interact with geomorphological, geological and hydrological landforms. Geoconservation management strategies should therefore focus on promoting hiking in a suitable manner that maximises the CES benefits received while ensuring the future of the geodiversity features that contribute to these. Future work should apply these methods to different activities, conservation goals and study sites to help inform more tailored geoconservation management plans.