Global change can be seen as degradation of natural ecosystems, which in turn is the most important reason underlying biodiversity change (Fuller et al. 2007). Climate and land use changes modify biotic communities in all kinds of environments (Sala et al. 2000; Vilmi et al. 2017), including freshwater ecosystems (Donohue et al. 2009; Heino et al. 2009). Recently, it has been estimated that decrease of biodiversity due to anthropogenic stressors is clearly higher in freshwater ecosystems than in terrestrial ecosystems (Abell 2002; Wiens 2016). The undesirable trend of biodiversity loss is associated with key stressors, such as pollution, invasive species, dams, and modification of in-stream habitats (Vörösmarty et al. 2010). The effects of these stressors are highly alarming because freshwater ecosystems cover only a small fraction of the Earth’s surface area (0.8%), but harbor a considerable (6%) proportion of all known species on Earth (Dudgeon et al. 2006).

Stream ecosystems are dependent on catchment-scale properties and processes (Hynes 1970; Allan and Castillo 2007). Thus, one could assume that a useful solution to maintain biodiversity is to establish protected areas by considering the inter-connected features of the catchment and stream environments (Ward et al. 2002; Moilanen et al. 2008; Soininen et al. 2015). Traditionally, protected areas have been designed for the protection and maintenance of biodiversity in land and marine ecosystems (IUCN 1994), although they are seldom designed for conserving freshwater biodiversity only. In some cases, rivers are used as the borders of protected areas (Abell et al. 2007). The wider use of the catchment-based conservation probably suffers from a lack of positive empirical examples, the unique position of freshwaters (e.g. stream corridors) in the landscape (Abell et al. 2007), and the absence of comprehensive and standardized knowledge about the spatial distribution of the most vital areas for biodiversity (Carrizo et al. 2017). However, it is generally understood that inclusive conservation of freshwater ecosystems requires a whole-catchment approach (Dudgeon et al. 2006).

Conserving nature’s stage (CNS) is known as a strategy to protect biodiversity by maintaining geodiversity (Beier et al. 2015). In CNS, the focus is on the abiotic ‘stage’ upon which ecological processes take place instead of using populations, species or communities as the unit for conservation planning (Beier and Brost 2010). Geodiversity refers to the variability of abiotic characteristics of Earth’s surface and subsurface, including materials such as soils, processes like erosion, and landforms such as valleys, which are relatively stable over long time periods (Anderson and Ferree 2010; Gray 2013). In running water environments, geodiversity could be considered as a driver of species distributions and ecological processes at different scales (Lawler et al. 2015). In addition, the role of geodiversity has been acknowledged for its positive association with biodiversity in terrestrial ecosystems (Anderson and Ferree 2010; Stein et al. 2014; Bailey et al. 2017; Tukiainen et al. 2017a), marine ecosystems (Kaskela et al. 2017) and recently in streams at a local-scale (Kärnä et al. 2018).

Running waters are hierarchically structured ecosystems where the distributions of species on a certain location depend on filtering processes based on climate, geology, dispersal processes, channel morphology and physical–chemical properties of local habitats (Poff 1997; Fig. 1). In streams, habitat- and reach-scale filters comprise nutrients, light, pH, stream width and current velocity (Frissel et al. 1986). In addition, abiotic drivers operating at catchment scales (e.g. land-use, soil-type and geology) are also considered important for species distributions. For example, geology, geomorphological features and land use at the catchment-scale constrain local habitat conditions (Frissel et al. 1986; Richards et al. 1996; Davies et al. 2000; Pajunen et al. 2017). The significance of abiotic variables in structuring species distribution varies with the scale and organism group. For instance, water chemical properties shape the structure of microalgal communities (e.g. diatoms) (Soininen 2007; Jyrkänkallio-Mikkola et al. 2016) and bacterial communities (Heino et al. 2014; Wang et al. 2017). Local macroinvertebrate diversity often responds to local stream features like stream width and current velocity, substrate properties and water chemistry (Malmqvist and Mäki 1994; Heino et al. 2013) and several of catchment properties, such as geology and land use (Richards et al. 1996; Sandin and Johnson 2004).

Fig. 1
figure 1

Photo credit J. Jyrkänkallio-Mikkola

Representation of the environmental features from regional-scale and catchment-scale to local-scale habitat conditions. Environmental features and processes measured at different scales can be considered as filters important for stream biodiversity patterns (Frissel et al. 1986; Poff 1997). Geodiversity at catchment-scale (soil-type richness in this example) can affect the water infiltration processes and thus water chemistry important for stream organisms (Leland and Porter 2000). The visualized climate variable is mean annual temperature across Finland.

In this study, we explored the possibility to explain species richness variation in streams using geodiversity information. In addition, we had land use data and traditionally used local-scale environmental variables as explanatory variables in our statistical models. We compiled geodiversity information for a number of boreal catchments in western Finland at 1-km2 scale and examined biodiversity variation along geodiversity and land use gradients using diatoms, bacteria and macroinvertebrate as focal organismal groups. At local-scale, we expected to find a relationship between macroinvertebrate richness and stream site variables especially related to stream size and water chemistry (Heino et al. 2003). For diatom and bacterial richness, we expected to find a clear relationship to water chemistry variables (Soininen 2007; Jyrkänkallio-Mikkola et al. 2016). Of land use variables, we expected to find effects of agricultural and artificial areas on diatom, bacterial (Leland and Porter 2000; Lear and Lewis 2009), and macroinvertebrate richness (Lenat and Crawford 1994; Sponseller et al. 2001). Microbes may be very sensitive to environmental changes due to their small size (Zeglin 2015), which could lead to changes in species richness in catchments influenced by agriculture with associated increase in nutrient levels (Allan and Castillo 2007). We also expected to find an indirect effect of geodiversity on species richness, because geological features, besides other catchment properties, control many reach-scale characteristics, such as vegetation in the riparian corridor, flow variability and water chemistry (Leland and Porter 2000; Soininen 2015). For example, subsurface properties (e.g. soil type) affect the precipitation–runoff processes in a watershed (e.g. water infiltration capabilities), resulting in changes in water chemistry and consequent variation in species richness in headwater streams (Fig. 1).

Materials and methods

Characteristics of the study area

We sampled altogether 88 boreal streams from 21 main river basins in the coastal areas of western Finland, which all drain to the Baltic Sea (Fig. 2). The location of surveyed stream sites (1 site per stream) stretched approximately 520 km in north–south direction and more than 300 km in east–west direction.

Fig. 2
figure 2

Locations of the 88 sampling sites in 21 major river basins in Finland

The landscapes in the southern parts of the study area comprise river valleys with slightly undulating topography, large coastal plains and some lakes. The southern parts of the study area are characterized by till-covered bedrock hills and structurally controlled valleys (Fogelberg and Seppälä 1986). Northern areas are mainly flat terrain, with few lakes and fine-sediment and till deposits. Topography varies more in the northernmost parts of the study area. Altitude ranges from sea level up to 200 m a.s.l. in the eastern parts. Geomorphology of the study area is characterized by glacial and glaciofluvial relief. For example, eskers cause variation in otherwise quite flat coastal landscapes. The bedrock of the study area is primarily composed of crystalline rocks, and the soils in the area are mostly ground moraine (Aro et al. 1990). In river valleys and coastal regions, there are also sand formations (e.g. dunes) and glaciofluvial deposits with sorted materials.

Biogeographically, the study area ranges from hemiboreal (where mixed forest are dominant) to middle boreal vegetation zones (where both coniferous and mixed forests occur commonly) (Ahti et al. 1968). Wetlands with different peat deposits are relatively rare in the south but are increasingly common northwards (Hämet-Ahti et al. 1988). Mean annual air temperature typically varies from over 5 °C in the southwest to 2 °C in the north (Pirinen et al. 2012). Mean annual precipitation in the study area typically ranges from 500 mm in the northwest coast to over 700 mm in the southernmost area (Pirinen et al. 2012).

The land use of the study area varies substantially. The southernmost streams are generally situated in human-dominated landscapes (e.g. agricultural areas), whereas the northern streams are typically located in forest-dominated landscapes. In addition, local environmental conditions of streams vary from near-pristine forested headwater streams to more modified streams in the catchments of intensive agriculture (Jyrkänkallio-Mikkola et al. 2017; Heino et al. 2018).

Biological sampling

To control for seasonal variation in biological communities, stream macroinvertebrate, diatom and bacterial samples were collected within 2 weeks in September 2014. September is a suitable month for sampling organisms dwelling in boreal streams because diversity is high and natural disturbances (e.g. snowmelt-caused floods) are typically less frequent than in the spring period (Heino et al. 2013). In addition, one-time snapshot sampling, if done within a narrow time window, is suitable for uncovering spatial patterns and species richness–environment relationship. However, it remains silent on temporal variation, which was beyond the focus of this study.

For macroinvertebrates, a 2-min kick sample (net mesh size 0.5 mm) covering most microhabitats in a riffle section of approximately 100 m2 was taken. Such samples contain usually more than 70% of species at a site in a given season (Mykrä et al. 2006). Macroinvertebrates and associated material were immediately preserved in ethanol in the field, and samples were taken to the laboratory for further processing and identification. Macroinvertebrates were identified to the lowest possible taxonomic level, i.e. species, species group or genus.

Diatom and bacterial samples were taken simultaneously with the macroinvertebrate sampling. At each site, 10 randomly chosen cobble-sized stones were collected from ca. 20 cm depth from different parts of the riffle site from an area covering around 20 m of the stream site length. Diatoms were collected from stones by brushing through a rubber template (5 × 5 cm in size) and the composite sample was immediately preserved in a cool and dark box. In the laboratory, diatom frustules were cleaned from organic material using wet combustion with hydrogen peroxide (30%, H2O2) and mounted in Naphrax. At least 500 frustules per sample were counted and identified to species level with a Nikon Optiphot 2 phase contrast light microscope.

Bacterial samples were wiped off from cobble sized stones using sterile pieces of foam plastics and were frozen straightaway in the field until laboratory analyses. Supplementary details of field sampling can be found from Vilmi et al. (2016) and Jyrkänkallio-Mikkola et al. (2017). In the laboratory samples were first freeze-dried and then the DNA was extracted using a PowerSoil DNA Isolation Kit (MoBio, Carlsbad, USA). PCRs were performed by Veriti Thermal Cycler (Life Technologies) The 16S rDNA region of bacteria was amplified with primers 519F 5′-CAGCMGCCGCGGTAATWC-3′ and 926trP1 5′-CCTCTCTATGGGCAGTCGGTGATCCGT CAATTCCTTTRAGTTT-3′. Unique, nine base barcode primers were used for each sample. This barcoding system allows us to identify each sample in post-sequencing analyses. The amplicons were sequenced using ion torrent semiconductor sequencing, where the total number of raw sequences was 2,708,611. The sequence library was split by samples and quality filtered based on the quality scores for each sequence. Sequences with quality scores below 25, shorter than 200 bp or longer than 1000 bp were removed. After quality control, a total of 549,548 sequences were retained, with an average sequence length of 3414 bp. The sequences were clustered as operational taxonomic units (OTUs) using the Usearch61 algorithm (Edgar 2010) with 97% sequence similarities. OTUs (97% similarity) were determined using the Quantitative Insights Into Microbial Ecology (QIIME) pipeline version 1.8.0 (Caporaso et al. 2010). The OTU dataset was rarefied to the lowest number of sequences detected (1052), because sequence numbers varied among samples. The laboratory and bioinformatics methods are presented in more detailed in Heino et al. (2015) and Jyrkänkallio-Mikkola et al. (2017).

In all statistical analysis, we used overall species richness (i.e. number of species or OTUs) for each organism group as a measure of biodiversity. We acknowledge that by using such a simple measure of biodiversity we could lose information regarding variation in species traits and phylogenetic relatedness (e.g. Heino and Tolonen 2017). However, as the number of species remains the most commonly utilized measure of biodiversity in general (e.g. Magurran 2004) and among freshwater studies in particular (e.g. Feld et al. 2009), we decided to concentrate on species richness as the response variable.

Environmental variables

In each stream site, current velocity and water depth were measured at 30 locations, and stream width was measured from 10 cross-stream transects covering the study site. Mean values of the physical variables were used in statistical analyses. Water samples for determining total phosphorus, total nitrogen and water color were collected. Electric conductivity and pH were measured with YSI-Professional Plus water quality meter (YSI Incorporated, Yellow Springs, USA). All environmental variables were collected simultaneously with biological sampling.

Catchment features and geodiversity

For each study site, upstream catchment area was determined using a digital elevation model (grid resolution 10 × 10 m, National Land Survey of Finland 2013) in ArcGIS 10.5 software. Catchment size, land use type and geodiversity were further calculated for each catchment. Land use was obtained from CORINE Land Cover data (20 × 20 m, Finnish Environment Institute 2013).

Geodiversity variables consisted of geomorphological, soil- and rock-type richness at resolution of 1-km2 grid cells (Table 1). Geomorphological, or landform, data were quantified using landform observations, GIS-based environmental variables and generalized additive modelling (Hjort and Luoto 2012). Landform observations were obtained by an expert geomorphologist who systematically examined geomorphological maps (1:50,000) and aerial photographs (~ 30 cm resolution). After that, the geomorphological distribution modelling approach was used to predict the number of landforms in each 1-km2 grid cell covering the whole Finland. There, the landform observations and GIS-based environmental variables were used in generalized additive modelling to generate geomorphological richness values for the grid cells (Hjort and Luoto 2012, 2013; Tukiainen et al. 2017a). Soil and rock-type richness were calculated by summing the number of different soil and rock types in a 1-km2 grid cell. The calculations were based on digital soil and bedrock maps, respectively (Geological Survey of Finland, GSF 2010a, b). Rock-types were classified by an expert into 16 genetically and geochemically distinct classes (Tukiainen et al. 2017a). All geodiversity measures were first calculated for the whole Finland and afterwards reduced to match with catchment boundaries in ArcGIS 10.5 environment. A more detailed description of the geodiversity data can be found from Tukiainen et al. (2017a).

Table 1 Information on geomorphological features, rock types and soil types based on which geodiversity variables were calculated

Selection of final predictor variables

We selected altogether nine predictor variables for the final analyses, of which two were land use, three were geodiversity and four were local environmental variables (Table 2). Harrel et al.’s (1996) shrinkage rule recommends including no more than n/10 predictors in the final model, which supports the strict selection of nine variables for the final analyses. The selection of predictors was based on the theoretical and empirical background of important variables for stream organisms. In addition, preliminary examinations of Pearson correlations between species richness and environmental variables were used to detect the preliminary relationship between the variables and biotic richness, and to prevent multicollinearity. To limit the number of physico-chemical variables, we selected the ones that have been found to influence biotic communities in boreal streams: pH, width, depth and velocity (Malmqvist and Mäki 1994; Mykrä et al. 2007; Lear et al. 2009; Heino et al. 2012, 2014). Furthermore, we excluded total phosphorus, total nitrogen and water color from further analysis because they had low Pearson correlations with biotic richness in our preliminary analyses. In addition, conductivity was removed because of high correlation with land use variables (see Appendix S1). We used two land use classes, artificial and agricultural areas in order to describe anthropogenic effects in the catchment area above our stream sampling point (Tonkin et al. 2016; Jyrkänkallio-Mikkola et al. 2017; Pajunen et al. 2017).

Table 2 Details of the environmental variables used in boosted regression tree analyses for examining the relationship between the environment and species richness of stream macroinvertebrates, diatoms and bacteria in western Finland (n = 88 streams)

Statistical methods

First, all predictor variables were tested for normality and transformed when necessary. For environmental variables (width and depth) and geodiversity variables, logarithmic transformations were used. For land use variables, arcsine-square root-transformation was used. Second, Pearson correlation was used to examine congruence between the biotic richness and the predictor variables (see also Appendix S1–S2).

We used boosted regression trees (BRTs) to explain the variation of species richness and to measure the relative influence of the different predictors on species richness using the package ‘gbm’ (version 2.1.1.) in R 3.1.2 (R Core Team 2017). BRTs is a non-parametric, machine learning method that can be understood as a progressive type of regression modelling (Elith et al. 2008). Machine learning methods have many benefits over traditional statistical models, such as their robustness to missing values and multicollinearity in the data (Dormann et al. 2013) and, especially in the case of our study, their ability to handle nonlinear relationships and variable interactions (Elith et al. 2008). In recent years, similar methods of decision trees have been used in many fields, including ecology (Thuiller et al. 2003; Mouchet et al. 2015; Jyrkänkallio-Mikkola et al. 2017), land-use change (Müller et al. 2013) and geodiversity–biodiversity explorations in terrestrial environment (Bailey et al. 2017; Tukiainen et al. 2017a).

BRTs consist of regression trees, which explain the deviance of a dependent variable by fitting simple models on partitions of the whole data space. Partitions are results from splitting up the data space into assemblages that are as similar as possible in terms of response and that minimize prediction errors. Afterwards, BRTs combine simple decision trees (i.e. boosting) by adding trees in a forward and stage-wise fashion to minimize the loss function of the model (Elith et al. 2008). Using the ‘gbm.step’ function allowed us to calibrate models with three parameters to specify. First, tree complexity (tc) means the model complexity in terms of allowed interactions between independent variables. Second, bag fraction (bf) separates the input data to calibration and evaluation data. Third, learning rate (lr), also known as the shrinkage parameter, can be specified. After testing, we set tc to three, bf to 0.75, lr to 0.001 and used a Gaussian error distribution for all response variables, except for bacterial richness for which the Poisson error distribution was used.

The efficiency of the models was evaluated using the percent of explained deviance [(null deviance − residual deviance)/null deviance]. To understand the effects of individual variables on species richness, the relative influence (sum up to 100%) of every predictor variable was acquired from ‘gbm.step’. We also created partial dependency plots to explore the relationship between species richness and the predictor variables (Müller et al. 2013; Mouchet et al. 2015).

Final models were validated using 10-fold cross-validation (CV). This method subsamples the data 10 times according to defined bf (0.75). This means that each time 3/4 of the data were used to parameterize the model and remained 1/4 to evaluate it. The final CV correlation is the mean correlation between the testing and training data through 10 runs. Finally, we tested the assumption of spatial independence of response and predictor variables and residuals of final BRT models with Moran’s I correlograms (Legendre and Legendre 2012). Also, to check for possible autocorrelation of predictor variables, Moran’s I correlograms were created.

All analyses were run in R (version 3.1.2) with the packages ‘gbm’ (Ridgeway 2015), ‘corrplot’ (Wei 2017) and ‘pgirmess’ (Giradoux 2018). Data transformations were conducted with IBM SPSS Statistics software (IBM Corp., Released 2016, IBM SPSS Statistics for Windows, Version 24.0. Armonk, NY: IBM Corp.).


Geographically, catchments with the highest geodiversity were located in the southern and northern parts of the study area, whereas geodiversity-poor catchments were mainly located in the central part of the study area (Fig. 3d–f). Geomorphological richness was the highest in south-western parts, and soil-type richness was the highest in the southernmost corner of the study area, although a few hotspots also occurred in the northern parts of the study area. Rock-type richness was more evenly distributed across the regions but, interestingly, in the central parts among a few closely situated catchments, rock-type richness showed considerable variation (Fig. 3f).

Fig. 3
figure 3

Study sites (n = 88) and the distribution of macroinvertebrate richness (a), diatom richness (b) and bacterial richness (c). Below, mean of geomorphological richness (d), soil-types richness (e) and rock-type richness (f) are mapped in catchment areas corresponding the studied streams

Overall, 186 macroinvertebrate taxa were found, and their richness per site varied between 12 and 45 (Fig. 3a). Diatom richness ranged between 13 and 81 per site, and 347 species were found in total. The total number of detected bacterial OTUs was 12,308, and local bacterial richness varied between 358 and 744. There was no clear pattern in the spatial distribution of species richness of macroinvertebrates, although there seemed to be more species-poor sites in the north. Diatom and bacterial richness were also quite evenly distributed. However, for diatoms, there was a clear concentration of species-poor sites in the central parts of the study area characterized by large expanses of agriculture fields.

According to the BRT models, the species richness of different organismal groups was best predicted by local environmental and land use variables. On the other hand, the predictive power of geodiversity (especially soil-type richness) was also noticeable (Fig. 4). The performance of the BRT models for macroinvertebrate (53% of explained deviance) and diatom (50%) richness were considerably higher than the performance of the model for bacterial richness (38%).

Fig. 4
figure 4

Relative contributions and partial dependence plots of predictor variables. Relative contributions of environmental, land use and geodiversity variables explaining the variation of species richness in boosted regression trees. Numerical values after a biotic group’s name show percentage of deviance explained (%) by the respective model. Partial dependence plots from boosted regression tree analyses for species richness showing the four most influential predictors for each biotic group. The graphs show the effect of a predictor variable (x-axis) on the response variable (y-axis)

All four local-scale environmental variables together explained most of the variation of macroinvertebrate richness, and stream width was the most important predictor (41.4% of explained variation). Soil-type (11.4%) and geomorphological (4.9%) richness seemed to be slightly more important than the land-use variables and rock-type richness for macroinvertebrates (Fig. 4). Macroinvertebrate richness was highest in streams wider than 2.7 m and with water depth more than 0.14 to 0.22 m (Fig. 4). In addition, macroinvertebrate richness increased clearly as soil-type richness was more than two (Fig. 4). For diatom richness, the relative contributions of different variables were more similar. Of the local environmental variables, the most important variables were pH (21.5%) and water depth (7.9%). Interestingly, according to the BRTs, soil-type richness (26.2%) appeared to be the most important predictor for variability of diatom richness. The partial response curve for diatoms showed species richness increase in two steps with increasing soil richness of catchments, being highest when soil richness was ≥ 3. Moreover, the relationships between diatom richness and pH as well as diatom richness and human land use intensity were positive.

Bacterial richness was best predicted by local environmental variables, of which current velocity was the strongest predictor (34.6%). Of the catchment-level variables, agriculture was the best predictor (5.1%). Of the geodiversity variables, rock-type richness and geomorphological richness showed only a marginal influence on the diversity of bacteria, but soil-type richness seemed to be very important. In fact, it was the second-best variable after velocity accounting for variation in bacterial richness (19.6%, Fig. 4). The effect of soil-type richness was positive. However, the response of bacterial richness to velocity over 0.3 m/s was negative, as species richness clearly decreased after that threshold level (Fig. 4).

Internal model fit and CV showed that the model for macroinvertebrate species richness performed reasonably well (values closer to 1). According to internal model fit, the success of diatom and bacterial richness models were also rather good, although CV correlation was expectably lower (Table 3).

Table 3 A summary of model evaluation (mean cross-validation correlation, CV corr.) and fit statistics (self-statistics, SS)

Diatom richness and macroinvertebrate richness (i.e. response variables) were spatially autocorrelated in categories of shortest site-to-site distances, but for both indices, correlation levels remained quite low and these effects disappeared when residuals from the BRT models were used (Fig. 5). In addition, bacterial richness showed some spatial autocorrelation at first two distance lags but disappeared when residuals from the BRT models were examined. Of the predictor variables, spatial autocorrelation was notable in pH, land use variables, geomorphological richness and soil-type. Spatial autocorrelation of these variables decreased with increasing site-to-site distances (Appendix S3).

Fig. 5
figure 5

Moran’s I correlograms showing the degree of spatial autocorrelation for biotic richness of the three organismal groups (solid lines). Significant values (p < 0.05) are marked with asterisks and represent positive autocorrelation. Dashed lines show spatial autocorrelation of residuals after fitting the BRT models for each organismal group


Freshwaters are one of the most threatened ecosystems in the world (Vörösmarty et al. 2010), and one major issue is how to choose the most valuable regions and delineate areas for conservation (Moilanen et al. 2008). As the valley rules the streams in many ways (Hynes 1970), stream biodiversity should reflect abiotic conditions and vegetation at the catchment-level. Hence, one approach to stream biodiversity conservation could be to focus on catchment-level features. In this study, our objective was to examine how the local environment and spatial environmental heterogeneity at the catchment-scale correlate with stream macroinvertebrate, diatom and bacterial richness. We particularly explored the applicability of geodiversity at the catchment-scale in explaining the species richness of these key stream-dwelling organism groups. Based on our results, the species richness of macroinvertebrates, diatoms and bacteria were best explained by local environmental variables. However, we detected clear signals of the role of soil-richness in the catchment accounting for variation in species richness, thus emphasizing the potential power of geodiversity variables for understanding and predicting variation in stream species richness.

Our results are in line with previous studies about the role of local environmental variables for stream communities (Hildrew and Giller 1994; Malmqvist and Mäki 1994; Heino et al. 2003; Soininen 2007; Wang et al. 2017). Based on the results of BRTs, the species richness–environment relationships differed among the three organismal groups. Macroinvertebrate richness was most strongly related to stream-size related variables, i.e., factors which are well-known to be important in structuring macroinvertebrate communities in boreal streams (Mykrä et al. 2007). The rather strong positive correlation between stream width and macroinvertebrate richness probably shows the ‘area effect’ whereby larger streams (i.e. stream width over 3 m in our study, Fig. 4) should present larger targets and offer more habitats for macroinvertebrates than smaller streams (Heino et al. 2003). Of the chemical variables, pH was the dominant variable explaining variation in diatom richness. Thus, it supports the general understanding that pH is one of the key factors controlling diatom richness (Soininen 2007). The increase in diatom richness at a pH threshold between 6.5 and 7.5 may indicate that pH is not optimal for most of the species below this threshold. In addition, sites with extreme pH values may have contributed to the relative importance of pH for species richness (Potapova et al. 2002). For bacterial richness, local environmental variables showed an important role in explaining species richness, corroborating previous results by Wang et al. (2017). Furthermore, we noted the relatively important role of current velocity, which may result from unstable conditions for benthic bacteria communities in fast velocities. It may also be that high velocities present lower flow heterogeneity, thus decreasing bacterial richness in streams, or that some other unmeasured components of flow (e.g. turbulent kinetic energy and turbulence intensity) may affect bacteria (Besemer et al. 2009).

Catchment-level variables (including geodiversity variables) explained a notable amount of variation in species richness (Fig. 4). One reason behind this finding can be the larger number of catchment predictors than local environmental variables in our analysis. However, these results also indicated the importance of catchment variables in shaping local habitat conditions (Allan 2004). Especially, land use variables had a clear role in accounting for variation in diatom richness. The effect of land use variables can be explained by their role as a proxy variable for water quality, particularly for pH (Varanka and Luoto 2012). This finding was reinforced by the fact that pH had a greater effect on species richness in comparison with that of land use. The contribution of land use variables was not surprising because earlier studies have obtained similar results about the joint role of land use and water quality in structuring stream communities (Tonkin et al. 2016; Pajunen et al. 2017). Jyrkänkallio-Mikkola et al. (2017) also suggested that moderate landscape perturbation by humans may increase species richness because slightly human-modified areas may support higher diatom richness than near pristine or highly-impacted areas. Hence, there may be a hump-shaped response by stream diatom richness to human land use.

Our results are congruent with previous studies about the importance of geodiversity variables in the terrestrial realm (Hjort et al. 2012; Bailey et al. 2017; Tukiainen et al. 2017a). Recently, Toivanen et al. (2019) also reported a significant positive relationship between geodiversity variables and the species richness of aquatic plants in lakes and rivers. These findings also support our present results and encourage the inclusion of geodiversity variables in species richness models of several organismal groups in both terrestrial and aquatic realms.

Of the geodiversity variables, soil-type richness had a strong positive effect on species richness of all three organismal groups. The sum of different soil-types (e.g. till, sand, clay and peat) at the catchment-scale may reflect the overall catchment properties and, consequently, water quality. This presumption is supported by a previous diatom richness study by Passy (2010), where she concluded that dissolved organic carbon and iron concentrations may increase with the proportion of wetlands in the catchment. Thus, it is possible that water quality may reflect the variation in both soil-type richness and land use variables, although soil-type richness may reveal this variation in a stronger way than traditional land cover variables. Leland and Porter (2000) found similar effects of surficial geology for diatoms but, in their study, land use was a more important variable. Potential reasons behind our findings are speculative at present. It is possible that soil richness captures the intermediate-scale variation of soils better than one-dimensional percentages of land cover variables, thus indicating the importance of nutrient-rich habitats for lotic species in general and for diatoms in particular. In addition, the land use of our study area is not as intensive as in the more populated areas in the world (Leland and Porter 2000).

It is worth noticing that spatial and temporal variation of the physical–chemical properties of natural streams by factors such as weather (e.g. variation in runoff), atmospheric deposition and chemical weathering of soils (Likens 2013) may result in individual water samples (e.g. pH) that describe water quality mainly during the time of sampling. On the contrary, we argue that soil-type-richness is more permanent (although indirect) variable to explore sources and quality of water. For instance, soil-type-richness reveals information about the different water sources (e.g. springs vs. peatlands) to streams. In addition, a catchment with high soil-richness probably includes a more diverse array of sediments, which enter the streams as runoff and, in turn, increase heterogeneity of stream bottoms and eventually offer more variable habitat conditions for benthic organisms (Beisel et al. 2000).

Rock-type richness has been shown to affect positively the biodiversity of terrestrial plants (Pausas et al. 2003; Spitale and Nascimbene 2012; Tukiainen et al. 2017a, b), but according to our results, it is not an especially important predictor for aquatic biodiversity. This is probably because of igneous and metamorphic rocks that are prevalent in the study area. They have low weathering capabilities, which create small ionic concentrations of stream waters and further minor effect for biodiversity (Allan and Castillo 2007). Additionally, weathering of bedrock is marginal because our study area is mainly covered by till and other surficial sediments. Geomorphological richness was the second-best geodiversity variable in our BRT models, suggesting that there is at least some level of variability of unique habitats created by geomorphological features (Hjort et al. 2015). Moreover, as both geomorphological richness and soil-type richness were important, it seems that spatial environmental heterogeneity at the catchment-level will support high species richness. This positive relationship has been already attested in terrestrial ecosystems (Nichols et al. 1998; Tukiainen et al. 2017a), but it has been just barely speculated for boreal headwater streams (Jyrkänkallio-Mikkola et al. 2017) as well as lakes and rivers (Toivanen et al. 2019). In the future, including topographic data as a part of geodiversity information (Parks and Mulligan 2010) could enhance our knowledge about possible relationship between topographic variations and biotic richness in stream environments.

Macroinvertebrate and diatom richness models explained variation in the data better than the model for bacterial richness. Differences in the explanatory power between these biotic groups could reflect the strong influence of selected variables, such as width and depth on macroinvertebrates richness (Hildrew and Giller 1994; Heino et al. 2003), and the effects of water chemistry (Soininen 2007) and human impact (Leland and Porter 2000) on diatom richness. The lower performance on bacterial richness could be related to the large number of OTUs (Jyrkänkallio-Mikkola et al. 2017). Moreover, the exclusion of conductivity from the list of predictor variables may have reduced the prediction ability because conductivity has been found to be an important variable for bacterial richness (Jyrkänkallio-Mikkola et al. 2017). On the other hand, we argue that by including pH and, especially, the land use variables we were able to obtain enough information from the effect of water chemistry on species richness.

The patterns we detected may also be related to the efficiency and robustness of the modelling method used. BRTs have been a popular method in recent studies that have examined the biodiversity–geodiversity relationship (Tukiainen et al. 2017a; Toivanen et al. 2019), probably because of its flexibility (Elith et al. 2008). However, as conventional statistical methods, such as generalized linear models, generalized additive modelling and variation partitioning detected a similar finding of the importance of geodiversity for biodiversity (Tukiainen et al. 2017b; Toivanen et al. 2019), it is rather safe to argue that our results are describing true natural phenomena. Nevertheless, we suggest to further investigate the linkages between abiotic and biotic nature using other statistical methods.

In this study, we found out a positive relationship between geodiversity and species richness. This supports the use of geodiversity as a proxy to explain and predict species richness at the catchment-scale. For example, as Beier and Brost (2010) underlined in their study, protecting the physical elements of the abiotic environment offers a climate-resilient way to protect landscape structure, which allows species to adapt to changing conditions as projected climate change will continue. Bringing this idea to stream environments suggests, for instance, that the catchments of high soil-type richness may offer more possibilities and resources for species to better cope with global warming compared with more homogenous catchments. In practice, more heterogeneous catchments (e.g., higher geodiversity) may work as a buffer against weather perturbations and their negative effects on organisms (Piha et al. 2007). This information would offer new perspectives for management and planning. For instance, we could use geodiversity information, in addition to biological data, to decide the most valuable catchments for conservation and restoration efforts. Alternatively, if we can fit the geodiversity data at the scale of the sub-corridor (e.g. 200 m, Sponseller et al. 2001), it would offer a tool to explore the catchment management zones, i.e., areas around a stream or a river in which there are some prohibitions and restrictions for human land use activities (Abell et al. 2007).

In conclusion, although traditionally used local habitat variables were generally the most influential variables associated with species richness, we also found that catchment-scale geodiversity, especially soil-type richness, was correlated with high species richness. In the face of ongoing global change, protecting and preserving catchment geodiversity could help to maintain stream biodiversity.