• Machine learning enables extrapolating stream metabolism, temperature, and light to the network scale.

  • GPP, ER, and NEP change predictably across the river network.

  • ER at network-scale allows for separation between allochthonous and autochthonous contributions.


Primary producers fix carbon dioxide (CO2) as organic carbon through photosynthesis, a flux known at the ecosystem level as gross primary production (GPP). Terrestrial GPP is the largest global carbon flux on Earth and drives critical ecosystem functions, foremost respiration, growth, and nutrient cycling. Terrestrial and marine GPP and ecosystem respiration (ER, as the respiration from all heterotrophic and autotrophic organisms) control land-atmosphere and ocean-atmosphere CO2 exchange with feedbacks on the world’s climate (Falkowski and others 1998; Heimann and Reichstein 2008; Beer and others 2010). The balance between GPP (positive flux) and ER (negative) flux is the net ecosystem production (NEP), which informs on the net accumulation of organic carbon and its potential trophic transfer, or on its export to other ecosystems. Understanding and predicting GPP and ER in terrestrial and marine ecosystems has been in a mainstay in global carbon cycling research and Earth system sciences (Chapin and others 2006; Goulden and others 2011; Xiao and others 2013).

Streams and rivers drain the continents in dense networks. Today we understand that the CO2 evasion fluxes from these ecosystems, including the smallest headwaters within the networks, are within the same range as ocean uptake fluxes of CO2, although in the opposite direction (Raymond and others 2013; Drake and others 2018; Horgby and others 2019b; Rocher-Ros and others 2019). Although these fluxes are becoming increasingly better constrained, the relative importance of CO2 sources within the catchment versus those from in-stream metabolism to the emitted CO2 is still often unclear (Hotchkiss and others 2015; Duvert and others 2018; Horgby and others 2019a). The proper quantification of metabolic fluxes in streams and rivers networks, and this across spatial and temporal scales, is therefore critical to pinpoint the role of these ecosystems in the global carbon cycle.

The advent of environmental sensor technology has profoundly changed the way we study stream ecosystem metabolism. Today, we are able to produce high-resolution time series of GPP, ER, and NEP at reach scale to draw conclusions on the drivers of stream ecosystem metabolic regimes (for example, Beaulieu and others 2013; Hall Jr and Beaulieu 2013; Hall Jr and others 2015; Bernhardt and others 2018). With such annual or even multiannual time series, the effects of shifts in the hydrological regime, including storm-induced disturbance and recovery dynamics, or of pulsed resource supply, for instance, can be studied (Uehlinger and Naegeli 1998; Raymond and others 2016; Demars 2019). Encompassing daily and seasonal variation of the metabolic regime, these time series are also critical to establish solid annual budgets of metabolic fluxes within streams, which are essential to assess catchment-scale carbon biogeochemistry and ecosystem fluxes (Tank and others 2018). In this context, NEP is particularly relevant as it links the terrestrial and aquatic carbon cycles (that is, the “boundless carbon cycle”, Battin and others 2009) through lateral carbon fluxes (Regnier and others 2013) and furthermore informs on the biogeochemical connectivity downstream and throughout fluvial networks (Battin and others 2008). Finally, understanding and predicting the metabolic regime of streams and rivers is also critical for ecosystem restoration at the nexus between hydrology, and both carbon and nutrient cycling (Palmer and Ruhi 2019).

Most studies on metabolic regimes refer to individual stream reaches (for example, Uehlinger and Naegeli 1998; Roberts and others 2007; Beaulieu and others 2013; Hall and others 2016; Appling and others 2018; Bernhardt and others 2018; Ulseth and others 2018), and therefore they do not allow to draw conclusions on potentially emerging properties at the scale of entire stream networks. This, however, would be essential to properly assess the relevance of streams and rivers for carbon cycling at regional and global scales. One noticeable exception is the study by Rodriguez-Castillo and others (Rodríguez-Castillo and others 2019) using spatial statistics to extrapolate ecosystem metabolism from 41 reaches over 72 hours to a stream network. While this study helps to resolve the spatial variation of stream ecosystem metabolism within a network, it does not inform on its temporal dynamics. The combination of both spatial and temporal variation (that is, the regime) of stream metabolism at the level of entire networks has been hampered by the lack of suitable modeling approaches. Resolving this issue is relevant for the proper upscaling of metabolic fluxes in time and space and to detect emerging properties of stream networks. Owing to the dendritic and other topological characteristics of stream networks, the hydrological regime and its resilience are not uniformly distributed across a given network (Botter and others 2013). Furthermore, network structure seems to affect the community structure of stream biofilms (Widder and others 2014), which orchestrate numerous ecosystem processes, including metabolism (Battin and others 2016). A recent series of theoretical studies using optimal channel networks have highlighted the potential role of stream network emerging properties for the removal of dissolved organic carbon and nitrogen, and for GPP regimes (Bertuzzo and others 2017; Helton and others 2018; Koenig and others 2019).

Machine learning is now offering novel opportunities to advance geosciences at the interface between climate and global carbon models (for example, Reichstein and others 2019), for instance, but also ecological sciences (for example, Olden and others 2008). Despite the power of machine learning to resolve complex nonlinear or multimodal relationships as they often underlie Earth’s surface and ecological processes, to date, it remains poorly used in stream biogeochemistry. Here, we propose that machine learning has the potential to extend stream ecosystem metabolism from the reach scale to the scale of entire stream networks. Over the last years, a suite of environmental predictors for stream ecosystem GPP and ER has been identified (for example, Mulholland and others 2001; Beaulieu and others 2013; Bernhardt and others 2018). Chief among them figure photosynthetic active radiation (PAR) and temperature (T). Machine learning could thus be used to extrapolate both in time and space such heterogeneous forcings (for example, streamwater temperature and light) required, for instance, to run process-based models for reach-scale metabolism (Hall and others 2016; Segatto and others 2020) to the scale of an entire stream network. Furthermore, the same procedure could be tested directly on reach-scale estimates of ecosystem metabolism to check whether available data contain enough information to explain network-scale variations in metabolic regimes.

In this study, we used Random Forests (RF, Breiman 2001) as a supervised learning technique to describe and predict patterns of metabolic regimes (that is GPP and ER) and critical environmental forcings (that is PAR and T) at the scale of an entire stream network. To do so, we used 18-month-long time series of data available for the Ybbs River network (Austria, Figure 1). Streamwater temperature, photosynthetic active radiation, and dissolved oxygen concentration were available for all 12 reach sites (Ulseth and others 2018). Furthermore, we used such data to estimate daily rates of ecosystem GPP, ER, and NEP via the single-station approach (see “Methods” section). We explicitly trained our RF model by integrating distal factors, such as vegetation type, canopy cover, hydraulic flow geometry, hydrogeomorphic properties, incident light, precipitation, and other climatic variables.

Figure 1
figure 1

Ybbs Catchment and related major spatiotemporal covariates employed in the learning process: elevation, tree cover density (TCD), dominant leaf type (DLT), and light exposition (top four panels, clockwise order). Sampled reaches (yellow dots numbered from 1 to 12) and the weather station (red dot) are shown as well. Major temporal covariates are displayed from January 2013 to June 2014 in the right-hand side of figure. High-frequency measurements of irradiation, air pressure, precipitation, and air temperature have been collected at the Lunz am See weather station. Hourly discharge displayed in the last subplot refers to the measured signal at the outlet of the catchment. Light and air temperature are shown both at hourly (thin line) and daily (thick line) timescale.

This approach allowed us to reliably establish annual regimes of water temperature, light, and metabolic fluxes (GPP, ER, and NEP) across the Ybbs River network and to quantify the relative contributions from small and large streams to these fluxes at the network scale. We were also able to quantify the annual variability of each of these fluxes across the network with the goal to assess the metabolic resilience from small to larger streams. Finally, we partitioned the allochthonous and autochthonous components of ER, which is important to assess allochthony at the network level. Our findings shed new light on stream network metabolic regimes and provide empirical evidence for long-standing theory predicting shifts of ecosystem metabolism along the stream continuum (Vannote and others 1980).


Study Area

We combined our machine learning approach with data from the Ybbs River network, which drains a 256 km2 prealpine catchment in Austria (47°48’22.9” N, 14°57’00.8° E). The stream network is shown in Figure 1 and has been delineated from a 10-m-resolution digital elevation model (DEM) (Besemer and others 2013). The climate is prealpine with an average annual precipitation of approximately 900 mm and an average temperature of approximately 7 °C (Ceola and others 2014).

The response variables that we aim to predict and extrapolate are water temperature (T), photosynthetic active radiation (PAR), GPP, and ER. Information about such variables is available simultaneously for 12 stream reaches for the period going from January 2013 to June 2014 (Figure 1). Daily metabolic rates of GPP and ER [gO2m−2d−1] have been locally inferred from high-frequency measurements of dissolved oxygen concentration using the single-station approach (Demars and others 2015). Specifically, we adopted the approach described in Segatto and others (2020) in which GPP is assumed linearly dependent on PAR and ER is constant throughout the day. Further details on the dataset can be found in Ulseth and others (2018); Segatto and others (2020), and in the Supporting Information (SI) Methods.

Random Forest

Random forest (RF) is an ensemble learning algorithm combining many classification or regression trees (CARTs, see Breiman and others 1984) which has been developed to overcome some of the limitations of single CARTs: their poor predictive power and the fact that they do not generalize well from the training data (that is, overfitting, Bramer 2007). One such improvement is bootstrap aggregating, also called bagging (Breiman 1996), which increases the stability and accuracy of CARTs and avoids overfitting. In bagging, multiple trees are trained independently on each other. Trees are created (up to a user-defined number Ntree) by drawing a random subset of training samples with replacement (bootstrap sample). About two-thirds of the sample (referred to as in-bag sample) is typically used to train the single trees. Observations in the original dataset that do not occur in a bootstrap sample are called out-of-bag observations (OOB observations) and are used to estimate the prediction error (typically the root mean square error, RMSE, in case of regression) of the tree (out- of-bag error, OOB error). Random forests (Breiman 2001) are a slightly modified version of bagged trees in which, at each node, only a small number of randomly selected variables (Mtry, typically set to one-third of the features for regression trees, Gislason and others (2006)), are made available for the split. In both cases (bootstrap aggregating and RF), the response variable is calculated averaging, in the case of regression, the predictions of all trees. RF turns out to perform very well compared to many other classifiers including popular support vector machines and neural networks and is robust against overfitting (Breiman 2001). To assess the importance of a specific predictor in a RF, the values of that variable are randomly permuted for the OOB observations, and then the modified OOB data are passed down the tree to get new predictions (Breiman 2001). The difference between the RMSEs for the modified and original out-of-bag data is a measure of the importance of the variable. Overfitting can be reduced by selecting only features that improve prediction performances of unseen observations.

Feature Variables

We defined the stream reach (that is, the channel segment between two consecutive confluences or between the head of a stream and the next confluence), and its underlying subcatchment, as the basic unit of this study. The Ybbs network consists of 292 reaches. An entry for the training of the RF can be established only if all features (or predictors, or learners, that is measurable properties of the phenomenon under study) and response variables (that is, the recorded observations of that phenomenon) are simultaneously available for the same reach and time. A full list of all features used in the RF to predict the response variables is reported and further described in Table S1 in SI Methods, while a summary is given in what follows.

Discharge was measured at the catchment closure (Figure 1), and was downscaled to every reach of the network by assuming local discharge proportional to contributing area (Segatto and others 2020). The geomorphic properties of the Ybbs network have been extensively studied by Ceola and others (2014), and we adopted the same exponents to characterize how stream width w [m] and the Gauckler Strickler’s roughness coefficient Ks [m1/3s−1], scale as a power-law function of the contributing area A [km2]. We determined the average streambed slope of each reach from the DEM and estimated the time series of mean water depths, z(t) [m], assuming rectangular reach cross sections of constant width and uniform flow conditions (Manning equation, as in Segatto and others (2020)). Few other geomorphological network properties such as distance to the outlet and total upstream length have also been added (SI Methods).

We further included spatial information on vegetation cover obtained starting from high-resolution layers of dominant leaf type (DLT), describing the presence or absence of broadleaved or coniferous trees, and tree cover density (TCD) obtained via the Copernicus Land Monitoring Service (Figure 1). Information about surface light exposition has been derived from the DEM by creating a grayscale normalized representation of the catchment exposition, with the sun’s relative position taken into account for shading the image (Figure 1). Starting from the distributed information, we derived both reach vegetation coverage (that is, the average DLT and TCD over the reach pixels) as a proxy of stream canopy shading (termed DLT Local and TCD Local) and watershed vegetation coverage (that is, the average DLT and TCD over the entire subcatchment) as a possible proxy of substrate and nutrients availability (termed DLT WS and TCD WS). The same procedure was followed also for light exposure.

We integrated climatic information using data collected at the weather station in Lunz am See (Figure 1), which included precipitation [mm] and its cumulative duration [min], air temperature [C], barometric pressure [mmHg], irradiation [W m−2], duration of sunshine [min] and some derived smoothed signals from temperature (see Figure 1 and Table S1 in SI Methods). Moreover, we included distance from the meteo-station to account for possible spatial correlation between variables measured there and in other places of the network. Time-related predictors such as flags tracking the fraction of the day and of the year, month, and season have been added to the feature library. Finally, we implemented the algorithms in Meeus (1991) to predict for each day, the apparent (refraction corrected) sunrise, noon, and sunset times in seconds from midnight and from which we characterized intra-daily expected solar temporal dynamics (see SI Methods).

Finally, to possibly account for biophysical properties not explicitly considered, or not available for all reaches, we embodied geometrical features along which such properties could cluster. Specifically, we included Euclidean (that is the subcatchment Cartesian coordinates) and network (that is distance from the outlet) geometrical features.

Model Training Procedure

We trained the RF for PAR at daily timescale (12 sites × 609 d = 7308 observations/feature; 24 features, (Table S1 in SI Methods) and the RF for T at 1 h (7308 × 24 h = 175392 observations/feature; 33 features, Table S1 in SI Methods). RFs for GPP and ER have been trained at daily scale as it corresponds to the scale of the single station estimates. Moreover, we investigated the gain in performance that can be achieved including the RF estimates of mean daily T and PAR in the feature portfolio of GPP and ER RFs. After the first set of trials to appreciate the relationship between OBB error and the number of trees Ntree, we set Ntree = 500 for daily-scale RFs and Ntree = 200 for the hourly-scale RF, whereas Mtry has been fixed, for all the following scenarios, to the customary value of one-third of the features number (see “Random Forest” section).

We devised two training setups, termed S and T, designed to probe the performance of the developed RF to extrapolate in space and time, respectively. In training S, we assessed the RF spatial prediction power by completely excluding, one at a time, a monitored reach site from the learning stage of each response variable, and evaluating model performance as the RMSE in the predicted response variable in the excluded site. This setup thus implies training 12 RFs (as many as the number of sites) for each response variable. The RFs were originally trained using all available features that were then ranked according to their cumulative OOB variable importance (see “Random Forest” section) over the 12 RFs. The optimal subset of features ensuring nonoverfitting was then selected as follows. The RFs were then retrained multiple times progressively excluding features from the least to the most important. Finally, we selected the subset of features that minimized the RMSE of the prediction of the response variables in the excluded sites (see Table S1 and Figures S1–S6 in SI Methods). When extrapolating RF results to the whole river network (292 reaches), we averaged the prediction of the 12 RFs (termed ensemble RF).

To test for the temporal prediction power, we trained a single RF per response variable in which the training dataset included the first year of the time series of all the 12 sites (January 2013–January 2014), while excluding the last six months (named training T). Model performance was evaluated as the RMSE in the excluded months of all sites together. Feature selection was then performed as described for training S (see Table S1 and Figures S7–S12 in SI Methods). Although other configurations are possible, for example simultaneously leaving out a particular site and a particular time span for training, lack of distributed data prevented the investigation of more complex scenarios as the resulting training set would not have left enough information to properly train the model.

Contrarily to the S training, we observed that RFs in the T setup achieved a low predictive error with few features variable and were not very sensitive to the further addition of features (Figures S7–S12 in SI Methods). To have a unique set of features for each response variable, we retained the predictors selected under the more demanding training S (see Table S1 in SI Methods), and using only the selected features, we repeated the whole training sequence illustrated above (that is, ranking of feature importance, re-training progressively excluding least importance features) to check for possible residual overfit (Figures S13–S22 in SI Methods) and evaluate the final predictive error.

Allochthonous vs Autochthonous Respiration

Combining a carbon removal model with the scaling theory of fractal river network, Bertuzzo and others (2017) predicted that the respiration of allochthonous (that is, coming from the terrestrial ecosystem) material should exhibit a power-law scaling with the drainage area. We combined such prediction with the RF results to provide a first approximation of the separation of the total ER into its allochthonous (ERal, that is, terrestrial deliveries to the stream network) and autochthonous (ERau, that is, of organic material produced within the network) components. Specifically, we determined the exponent of the power-law scaling via a quantile regression of daily ER values in the different reaches assuming that the lower bound represents the ERal and that any additional respiration should come from ERau. We repeated the procedure for four different seasons to capture the possible effect of temperature and delivery rates on ERal (further details in Figure 6).


RF Modeling in the Ybbs River Network

Training T consistently achieved lower predictive error than training S for all response variables (Table 1). Although the latter uses a higher fraction of data (11/12 versus 2/3), this result was expected because training S has only 11 sites to learn how to extrapolate spatial features to the whole network. In contrast, training T relies on more than 365 time points to learn to predict temporal patterns. We retained the results of training S which is deemed more demanding yet more suitable for spatial extrapolation (to which all figures and results refer). A compendium of site-by-site RF predictions for all the implemented training schemes (that is, RF for PAR, T, GPP, ER, and GPP and ER including extrapolated T and PAR under both training T and S setups) is available in SI Results in Figures S23–S30. Partial dependence plots showing the isolated effect of each predictor for each response variable are reported in SI Results in Figures S31–S34.

Table 1 RF Model Efficiency.

The trained RF correctly recognized upstream reaches with low PAR because of high tree cover density compared to the wider downstream reaches with higher PAR (Figure 2). PAR was heterogeneously distributed throughout the Ybbs River network with daily averages ranging from 5112 to 17555 lux. Modeled average annual streamwater temperatures ranged from 6 to 8°C and clearly increased downstream (Figure 2). The temporal regimes of PAR and streamwater temperature have been correctly learned. For instance, predicted daily PAR and streamwater temperature closely followed measured values in two different subcatchments (Figure 2, see SI Results for all other sampled reaches time series and related prediction errors).

Figure 2
figure 2

Random forest (RF) predictions of light (PAR) and water temperature (T). Maps in the left column show RF predictions for mean daily PAR (top) and streamwater temperature (bottom) of the 292 stream reaches composing the Ybbs river network. Plots in the right column show the comparison between the time series of measured and predicted mean daily PAR and T for two representative reach sites (a and b, see location on the maps). The comparison for the remaining sites is reported in SI Results in Figures S23 and S25. Results refer to the S training (see “Methods” section). The predicted time series reported here are derived using the RFs trained excluding the site shown (that is, site a and b, respectively). Maps show the results of the ensemble RF, that is, the average of the 12 RFs obtained excluding all sites one at a time (see “Methods” section).

Overall, the RF model reliably predicted the annual regimes of ecosystem GPP and ER and, by difference, also NEP across the Ybbs River network (Figure 3, Table 1). However, we found that including modeled daily PAR and streamwater temperature (Figure 2) did not significantly improve our RF predictions of both GPP and ER regimes. We, therefore, retained the simpler formulation that does not include them in the library of predictive features. Daily metabolism predicted by RF agreed reasonably well with GPP and ER computed from the single-station method as indicated by normalized root-mean square errors (GPP: 5.8 to 18 %; ER: 11 to 18 %) (Figure 3, see SI Results for all other sites). Furthermore, we found good agreement between reach-scale mean annual GPP and ER computed from the single-station approach and the RF predictions when the focus site was excluded from training (that is, training S) (GPP: r = 0.7, p < 0.05; ER: 0.6, p < 0.05, across the 12 streams within the Ybbs River network, Figure S35 in SI Results). The main predictors for GPP were precipitation, irradiation, sunshine duration and period of the year, air temperature and pressure, discharge, streamwater depth, position along the river network, and light exposure. The main predictors for ER were irradiation, sunshine and precipitation duration, period of the year, air temperature and pressure, discharge, slope, length, flow depth, position, and watershed tree cover density and dominant leaf type (see Table S1 in SI Methods). RF predicted increasing patterns of mean daily GPP moving from the smallest headwaters (0.2 to almost 1 g O2 m−2d−1), toward the outlet (2.7 g O2 m−2d−1, Figure 3). By contrast, mean daily ER was highest in the headwaters (up to 3.1 g O2 m−2d−1), lowest in the mainstem within the upper catchment (around 0.9 g O2 m−2d−1) and intermediate (around 1.5 g O2 m−2d−1) at the outlet of the Ybbs River network. Calculated mean daily NEP values consistently indicate heterotrophy (-2.5 to -0.5 g O2 m−2d−1) within the headwaters and autotrophy (up to 1 g O2 m−2d−1) along the mainstem, particularly toward the most downstream reaches.

Figure 3
figure 3

Random Forest (RF) predictions of network-scale metabolic regimes. Maps show RF predictions for mean daily GPP (top-left) and ER (top-right), and NEP (bottom-left) as difference between GPP and ER. Plots (bottom-right) show the comparison between the time series of estimated (via the single-station approach) and predicted (via RF) daily GPP and ER for two representative reach sites (a and b, see location on the maps). The comparison for the remaining sites is reported in SI Results in Figure S27. Results refer to the RF trained in the S setup and that does not include the predicted PAR and T in the feature library (see “Methods” section). Maps show the results of the ensemble RF, that is the average of the 12 RFs obtained excluding all sites one at a time (see “Methods” section). Plots report both the prediction of the ensemble RF (tick line) and of the RF trained excluding the site shown (thin line).

Spatiotemporal Variation of Ecosystem Metabolism at Network Scale

Integrating the reach-scale metabolic fluxes, we were able to assess ecosystem metabolic regime at the network scale and this across an entire year (Figure 4). This approach detected a conspicuous peak in GPP during spring (April, May), which transiently rendered the entire network metabolism autotrophic with a cumulative NEP of 0.1 Gg O2 (equivalent to an areal flux of ∼3 g O2 m−2d−1) during that window (Figure 4B). ER was elevated during summer, which reversed the network-scale metabolism to heterotrophy during the rest of the year. This high level of heterotrophy becomes obvious when comparing the metabolic fluxes cumulated over the entire year and across the 292 streams, with GPP, ER, and NEP averaging 326 (95% range: 173 ÷ 576), −824 (95% range: −1068 ÷ −476), and -498 (95% range: −824 ÷ 107) g O2 m−2y−1, respectively. These areal fluxes translate into network-scale annual fluxes of 0.63, −0.83, and −0.20 Gg O2 for GPP, ER and NEP, respectively (Table 2).

Figure 4
figure 4

Stream ecosystem metabolic regime at network scale. Stream reaches have been clustered in three major groups according to their Strahler’s stream order and catchment size (panel a and c). Group I (green) includes all streams with catchment size smaller than the largest first-order stream (5.6 km2), group II (orange) those smaller than the largest third-order stream (that is, from 5.6 to 36 km2) and group III (blue) all the other larger streams. Panel a displays scatter plots of reach-scale mean daily GPP, ER and NEP against drainage area, in logarithmic scale, and their piece-wise regression lines whose equations read as follows (lbp = left side; rbp = right site of the break point): GPPlbp = 0.779 − 0.027 log(A); GPPrbp = −1.264 + 0.676 log(A); ERlbp =  − 2.43 + 0.270 log(A); ERrbp = − 0.59 − 0.207 log(A); NEPlbp = − 651 + 0.240 log(A); NEPrbp =  −2.712 + 0.649 log(A). Breakpoints have been selected to minimize the overall RMSE among all possible combinations (including no breakpoints). Panel b displays the contribution of the different groups to the total, network-scale GPP and ER, expressed per unit of streambed area of the entire river network. Bottom plot of panel b shows network scale NEP per unit of river network streambed area. Top row of panel c shows the frequency distribution of the Strahler’s order of the streams belonging to the three different groups. Bottom plots show the range of variability (colored areas) and the average trend (black lines) of the reach-scale GPP, ER, and NEP of the three different groups.

Table 2 Whole River Network Annual Ecosystem Metabolism Along with the Contribution of the Three Groups Identified.

To assess the relative contributions of small and larger streams to the network metabolism, we aggregated streams into three groups according to their Strahler’s order and the related catchment size. Group I includes all streams with catchment size smaller than the largest first-order stream (5.6 km2), group II those smaller than the largest third-order stream (that is, from 5.6 to 36 km2), and group III all the other larger streams. Figure 4B shows how the annual, network scale metabolic regime is partitioned among the three groups, while Figure 4C reports the variability of the metabolic patterns among individual reaches belonging to the three different groups. Not unexpectedly, larger streams contributed more than 60% to the network annual GPP despite their smaller contribution by streambed area extent (Table 2). Smallest streams (that is, group I) contributed 26% of the total streambed area, but only 16% of the annual primary production. Interestingly, streams from all three groups contributed more evenly to the network ER (Table 2). As a result of GPP and ER patterns across the network, the smallest streams largely drove the heterotrophy (negative NEP) that we observed at the network scale. A further way to explore stream metabolism across the network is to establish scaling relationships between metabolic fluxes of stream reaches and their position along the river network, here indicated by their catchment size (Figure 4A). Annual mean GPP did not change significantly with catchment size below 18 km2, but beyond this threshold GPP significantly increased with the logarithm of catchment size. On the other hand, annual mean ER increased (that is, decreased in absolute value) with catchment size up to 47 km2 and decreased beyond this breakpoint. Finally, annual mean NEP increased (that is, became less heterotrophic) with catchment size up to a breakpoint at 13 km2, beyond which it increased at an even higher rate, eventually becoming positive toward the outlet.

It is relevant to understand the stability of ecosystem processes as a property that potentially emerges from stream networks (for example, Sabo and others 2010; Terui and others 2018). Our predictions from RF modeling allowed us to assess the temporal variability of the metabolic fluxes across the Ybbs River network (Figure 5). Measured as the coefficient of variation (CV) of the daily fluxes over one year, we found that the temporal variability of GPP increased downstream (Pearson’s r = 0.33, p < 0.01), whereas the temporal variability of ER exhibits a weaker linear downstream trend (r = 0.13, p < 0.05).

Figure 5
figure 5

Variability of GPP and ER time series. Coefficient of variation (CV) of the predicted time series of reach-scale GPP (A) and ER (B) as a function of drainage area (log-scale). Each dot represents the CV of one stream reach.

We further combined the RF modeling results and an organic carbon removal model (Bertuzzo and others 2017) to estimate how daily reach-scale ER is partitioned between the respiration of autochthonous (ERau, that is, produced as GPP within the same reach or upstream) and allochthonous (ERal, that is, carbon imported from the surrounding terrestrial ecosystem) material (Figures 6, 7). In stream ecology, allochthonous could sometimes refer to the ecosystem outside the single reach being analyzed, including the upstream stream ecosystems. We instead highlight that, as the scale of the present study is the whole network, allochthonous here refers to an origin other than the whole stream ecosystem. We found that at the scale of the entire network and over one year, 37% of the GPP is respired (ERau = 0.37 Gg O2) and that allochthonous C was the dominant source to ER (ERal = 0.46 Gg O2). Clearly, ERau mirrored the seasonal pattern of GPP, with a peak in spring and a minimum in winter, when the potential for GPP and downstream transport thereof was low. ERal peaked instead in late summer fall. At the network scale ERau was lower than ERal except during spring when a switch in the dominant energy source was observed. When breaking the network-scale fluxes down to groups of stream sites (I, II and III), ERal clearly dominated in almost all groups, whereas ERau increased downstream as did GPP. The excess of GPP that is not respired within the river network is mostly produced in group III.

Figure 6
figure 6

Separation between allochthonous (ERal) and autochthonous (ERau) ecosystem respiration. Left panels show scatter plots of extrapolated (via RF) daily ER [gO2m2d−1], during the four indicated seasons and in all 292 reaches, as a function of the reach drainage area (A [km2]). We determined the exponent of the power law scaling via a 0.05-quantile regression of daily ER values in the different reaches and days assuming that the lower bound represents the ERal and that any additional respiration should come from ERau. The ERal time series for each reach has then be derived via a spline interpolation of the four seasonal values. Example for the final separation results is shown in the right panels for one randomly selected reach (A–C) per group (I–III).

Figure 7
figure 7

Autochthonous vs allochthonous ecosystem respiration. Comparison between cumulative (mass units, GgO2) GPP, autochthonous ER and allochthonous ER, throughout the entire year and at seasonal scale. The main panel displays quantities at the river network scale, while right panels focus on the three groups of stream reaches introduced in Figure 4.


Over the last decade, the River Continuum Concept has fundamentally shaped our conceptual understanding of how ecosystem metabolism may change along the longitudinal continuum from small streams to larger rivers downstream (Vannote and others 1980; Fisher and others 2004). Despite this, we have not been able to reliably predict stream ecosystem metabolism at the scale that is relevant to carbon cycling. Clearly, this is at the scale of stream networks that drain the landscapes. Although this has been achieved with increasing accuracy for terrestrial ecosystems (Field and others 1998; Anav and others 2015), the heterogeneous and nested structure of stream and river networks has encumbered the prediction of their metabolic fluxes on an annual basis. As a consequence, we have not yet been able to assess properties of metabolic regimes potentially emerging from real-world stream networks.

Our study provides a first proof of concept that data-driven machine learning is a valuable tool to predict the annual regime of ecosystem metabolism at the level of an entire stream network. We opted for RF as a flexible algorithm, whose predictive performance can compete with the best supervised learning techniques (for example, artificial neural networks) (Liu and others 2013; Rodriguez-Galiano and others 2015). Overall, RF predicted reasonably well the annual regimes of GPP and ER. We do acknowledge, however, that our approach failed to reconstruct some temporary and localized events of ecosystem metabolic activity; this failure increased the predictive error of the individual site excluded from training. On the other hand, the annual GPP and ER fluxes agreed well with the observed ones (Figures S35 and S36 in SI Results). This highlights the advantage of our modeling approach as it allows the continuous appreciation of annual metabolic regimes that are otherwise often prone to missing data. Furthermore, our annual estimates of GPP and ER are closely bracketed by those reported from other headwater streams covering various biomes (for example, Battin and others 2008; Hoellein and others 2013; Rodríguez-Castillo and others 2019). We are therefore confident that RF modeling is a valuable tool to extrapolate metabolic regimes in every reach of an entire stream network.

The Ybbs streamwater temperature regime at the network scale was predicted by a combination of all available predictors (see SI Methods in Table S1), while PAR was best inferred using a combination of meteorological (that is, precipitation, sunshine duration, irradiation, and air temperature and pressure), hydrological (discharge, slope, reach length, and water depth) and geomorphological responses (subcatchment position, local DLT, TCD, and light exposure). Both long- and short-term measurements of ecosystem metabolism at reach scale have revealed PAR and flow-induced disturbance as the major drivers of GPP (for example, Mulholland and others 2001; Beaulieu and others 2013; Blaszczak and others 2019). Using spatial statistics, Rodriguez-Castillo and others (2019) retained channel cross-sectional area, nitrate concentration, and algal biomass as predictors for GPP at the network scale. In our study, out of a library of 35 potential predictors (that is, the “learners”), the RF model retained daily sunshine duration, season, and position within the Ybbs River network, but also discharge, water depth, air temperature, irradiation, light exposure, barometric pressure, and precipitation. In the final version of the model selected, we did not retain the modeled PAR and streamwater temperature T as predictors for GPP (and ER) because the efficiency gain was not high enough (Table 1) to justify the use of a more complicated procedure (a cascade of two RFs) and the introduction of predicted features (PAR and T) possibly prone to errors. It should be noted that all the predictors used by the RF generated to predict PAR and T were also fed to the GPP and ER RFs. Therefore, the algorithm was able to learn the complex nonlinear interactions relating the original predictors to PAR and T, and eventually to GPP and ER, without the need to provide PAR and T separately. Predictors retained for ER included, besides position within the Ybbs River network, descriptors of temperature, soil water content (as approximated by rainfall duration), precipitation and discharge, but also watershed tree cover density and type, for instance. We argue that these predictors translate into the catchment potential to produce and deliver organic matter to the streams and further into the potential of stream ecosystem metabolism. This approach allowed us to extend the library of predictors usually employed in metabolism models (see, for example, Demars and others 2015). Overall, most predictors used are relatively straightforward to extract from various repositories of spatial and meteorological data. Flow-related variables throughout the network can be obtained by combining gauging data (or alternatively hydrological modeling) with scaling relationships of hydraulic geometry (Leopold and Maddock 1953).

We stress that the scope of our study is to investigate emerging patterns of the metabolic regime specifically of the Ybbs river network. For this purpose, we gathered all possible features and let the validation step to properly select those relevant. In general, the interactions between network properties and biological predictors (for example, stream elevation, discharge, and vegetation cover) could be different for different network systems. Moreover, we included few spatial features that improved the performance of the model but are specific to this case study, for example the spatial coordinates and the network geometry. Therefore, the trained models we developed herein are not directly transferable to other river networks. However, this first case study provides a road map for the development of similar applications in other contexts. At this stage, any new application would require enough reach-scale metabolic data to be trained. However, when data for several river networks will be available, and recent technological advances suggest this goal is not that far into the future (Appling and others 2018), we envision that it will be possible to train a model to predict the metabolic regime also in ungauged catchments.

Network-Scale Ecosystem Metabolism

The GPP regime at the network scale exhibited a bimodal pattern with a conspicuous peak during spring and a reduced, second summer peak. Our findings on a real-world stream network corroborate the emerging property of network-scale GPP as recently reported in theoretical optimal channel networks (Koenig and others 2019). Indeed, the bimodal pattern found for the Ybbs River network resembles that obtained by Koenig and others (2019) under the stochastic assignment, that is, the scenario proposed to be more representative of real rivers by the authors. Dissecting this pattern into the contributions of stream reaches of different sizes to network-scale GPP, we found that downstream reaches were mostly responsible for the spring and summer peak in GPP. This pattern was clearly driven by light availability as modulated by channel geomorphology as predicted by the River Continuum Concept (Vannote and others 1980).

Our findings reveal that the annual metabolism of the Ybbs River network is heterotrophic, with the smallest headwaters contributing most to its heterotrophy. This is in line with earlier predictions on stream ecosystem metabolism (Fisher and Likens 1973; Vannote and others 1980) and studies compiling data from a wide suite of stream ecosystems (Battin and others 2008; Hoellein and others 2013). Perhaps less trivial, our findings provide first insights into the contributions of streams by the areal extent and metabolic fluxes beyond the longitudinal dimension. In this context, the disproportionate contribution of small headwater streams (that is, Group I) to ER and hence to NEP highlights their role for metabolic fluxes at the network scale. Besides the fact that headwater streams are often light-limited owing to riparian vegetation (as tree cover density), per unit area they are more tightly connected to the terrestrial environment than wider reaches further downstream. This increases the terrestrial deliveries of organic matter to these streams. Owing to elevated channel slope, bed roughness, and sediment porosity, the hydrodynamic exchange between the surface water and the streambed is typically higher in headwaters than in downstream reaches, which may enhance the respiratory breakdown of allochthonous organic carbon in these headwater streams (Battin and others 2008).

ER is largely associated with the streambed, notably with its hyporheic zone, which may explain why its annual variability (as CV) does not systematically change across the stream network. Looking more in details at the pattern shown in Figure 5b, the variability of ER is greater in reaches of intermediate size (around few km2 of drainage area) and lower for both small headwaters and large streams. From a biophysical point of view, this pattern could be explained by the dominant role of hyporheic respiration in headwaters. Indeed, the hyporheic zone has been proposed as a buffer to ecosystem processes (Boulton and others 1998). In large streams instead, the ER regime could be buffered by the availability of autochthonous biomass and damped thermal and hydrological regimes. However, it should be noted that reaches of intermediate size are also more abundant and therefore this pattern could also be induced by systematic sampling bias.

On the other hand, the downstream increase in the annual variation of GPP (Figure 5a) could be attributed to the susceptibility of GPP to changes in light- and flow-induced disturbance (for example, Uehlinger and others 1996; Uehlinger and Naegeli 1998; Bernhardt and others 2018). In fact, the annual light regime becomes more pronounced downstream as channels widen up. Our findings on the annual variability of GPP at network scale complement those that have related variations in discharge and catchment size to food chain length (Sabo and others 2010). In fact, the annual variation of GPP, and hence the basis of the green food chain, changing through a stream network may shape the phenology of herbivores and their productivity (Bernhardt and others 2018; Rüegg and others 2020). Furthermore, our findings suggest that the respiratory breakdown of organic carbon rather than its photosynthetic buildup shapes the stability of metabolic carbon fluxes at the network scale.

At the stream reach, a negative NEP typically informs on the external deliveries of organic carbon required to satisfy the ER in excess to GPP produced in that reach. It does not inform, however, on the sources of such deliveries, which either derive from the terrestrial environment (that is, allochthonous sources) or from GPP within upstream reaches. This latter would still be considered as an autochthonous source. Combining results from RF modeling with network scaling theory (Bertuzzo and others 2017), we segregated the contributions from allochthonous and autochthonous sources to ER at the network level. This allowed us to assess the relevance of terrestrial subsidies as an energetic linkage between ecosystems (sensu Polis and others 1997). GPP consistently exceeding ERau at the network level indicates that up to 45% of GPP is either exported from the network, potentially fueling metabolism further downstream, or transferred to higher trophic levels. Such a production excess was particularly pronounced during the spring peak of autotrophy in the downstream reaches. Indeed, we estimated that streams draining catchments up to 36 km2 in size (group I and II) respire on an annual basis all the organic carbon produced therein (Figure 7).

Strikingly, our findings on ERal suggest that stream networks are more heterotrophic than hitherto assumed by simply looking at NEP. In fact, while the annual network-scale NEP is -0.2 g O2 m−2d−1, the corresponding ERal is -0.46 g O2 m−2d−1 (Figure 7), the difference being due to the GPP not respired within the network. This difference becomes even more pronounced for the small streams in fall and winter (Figure 7). We interpret this finding as evidence for strong ecosystem-level allochthony in streams, in analogy to the lake allochthony (Carpenter and others 2005), and otherwise explored at the level of food webs in streams (for example, Collins and others 2016). Furthermore, these findings evoke a high retention efficiency of individual reaches. In fact, GPP fuels the heterotrophic metabolism because of the typically high bioavailability of algal exudates and the close spatial proximity of algae and heterotrophs within benthic biofilms (for example, Haack and McFeters 1982; Kaplan and Bott 1989). Our ecosystem-level observations that a large proportion of GPP is respired are indeed supported by experimental work on photoautotrophic biofilms from the Ybbs River network (Oberer Seebach) showing that benthic community respiration is sustained between 60 and 90% by organic carbon exuded from algae (Wagner and others 2017).

In this study, we have shown how network-scale applications to stream ecosystem metabolism represent the cornerstone for unveiling metabolic emerging properties, scaling relationships, and to properly assess the relative contributions of small and large streams to GPP, ER, and NEP fluxes at scales relevant for regional carbon budgets. We thus believe that this approach opens new possibilities in modeling and estimating stream ecosystem metabolism. For example, reach-scale models leveraging on the knowledge of the extrapolated forcings (for example, light and temperature) can be developed to provide a more process-based interpretation of network-scale metabolism. Moreover, climate change implications, for instance of more extreme thermal and hydrological regimes, could be more reliably tested considering the entire river continuum. Finally, we anticipate that other emerging properties for metrics like ecosystem efficiency, carbon turnover times, and carbon spiraling (Newbold and others 1982; Webster and Meyer 1997) could possibly be unveiled and tested zooming out the analysis at the scale of an entire stream network.