Univariate Time Series Forecasting of Temperature and Precipitation with a Focus on Machine Learning Algorithms: A Multiple-Case Study from Greece
Abstract
We provide contingent empirical evidence on the solutions to three problems associated with univariate time series forecasting using machine learning (ML) algorithms by conducting an extensive multiple-case study. These problems are: (a) lagged variable selection, (b) hyperparameter handling, and (c) comparison between ML and classical algorithms. The multiple-case study is composed of 50 single-case studies, which use time series of mean monthly temperature and total monthly precipitation observed in Greece. We focus on two ML algorithms, i.e. neural networks and support vector machines, while we also include four classical algorithms and a naïve benchmark in the comparisons. We apply a fixed methodology to each individual case and, subsequently, perform a cross-case synthesis to facilitate the detection of systematic patterns. We fit the models to the deseasonalized time series. We compare the one-step and multi-step ahead forecasting performance of the algorithms. Regarding the one-step ahead forecasting performance, the assessment is based on the absolute error of the forecast of the last monthly observation. For the quantification of the multi-step ahead forecasting performance we compute five metrics on the test set (the last year's monthly observations), i.e. the root mean square error, the Nash-Sutcliffe efficiency, the ratio of standard deviations, the coefficient of correlation and the index of agreement. The evidence derived from the experiments can be summarized as follows: (a) the results mostly favour using less recent lagged variables, (b) hyperparameter optimization does not necessarily lead to better forecasts, and (c) the ML and classical algorithms seem to be equally competitive.
Keywords
Neural networks; Support vector machines; Hyperparameter optimization; Lagged variable selection; Multi-step ahead forecasting; One-step ahead forecasting
1 Introduction
1.1 Background Information
Machine learning (ML) algorithms are widely used for the forecasting of univariate geophysical time series as an alternative to classical algorithms. Popular ML algorithms are the well-established Neural Networks (NN) and the Support Vector Machines (SVM), a newer entrant in most scientific fields. The latter algorithm has been presented in its current form by Cortes and Vapnik (Cortes and Vapnik 1995; see also Vapnik 1995, 1999). The large number and wide range of the relevant applications are apparent in the review papers of Maier and Dandy (2000), and Raghavendra and Deka (2014) respectively. The competence of ML algorithms in univariate time series forecasting has been empirically demonstrated in Papacharalampous et al. (2017a), and Tyralis and Papacharalampous (2017) through extensive simulation experiments.
Nevertheless, univariate time series forecasting using ML algorithms also implies the handling of specific factors that may improve or deteriorate the performance of the algorithms, i.e. the lagged variables and the hyperparameters. In contrast to the typical regression problem, in a forecasting problem the set of predictor variables is a set of lagged variables, formed using observed past values of the process to be forecasted and, consequently, holding information about the temporal dependence. Although the amount of available historical information taken into account increases when using a large number of lagged variables, the length of the fitting set concomitantly decreases; for more details, see Tyralis and Papacharalampous (2017). While there is a wide literature on applications of ML algorithms in hydrological univariate time series forecasting, mainly comprising single- or few-case studies that particularly focus on details of the model structure (e.g. Atiya et al. 1999; Guo et al. 2011; Hong 2008; Kumar et al. 2004; Moustris et al. 2011; Ouyang and Lu 2017; Sivapragasam et al. 2001; Wang et al. 2006), studies explicitly stating information concerning the variable selection issue, such as Belayneh et al. (2014), Nayak et al. (2004), Hung et al. (2009) and Yaseen et al. (2016), are fewer. Tyralis and Papacharalampous (2017) have investigated the effect of a sufficient number of lagged variable selection choices on the performance of Breiman's random forests algorithm (Breiman 2001) in one-step ahead univariate time series forecasting.
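The trade-off described above can be sketched in code. The minimal NumPy example below builds a lagged regression matrix from a univariate series; the paper itself constructs such matrices with the CasesSeries function of the rminer R package, so the function name and interface here are illustrative only.

```python
import numpy as np

def lagged_matrix(series, n_lags):
    """Build a regression matrix from a univariate series using the first
    n_lags lagged variables. Each row holds (x[t-n_lags], ..., x[t-1]) as
    predictors and x[t] as the target, so a series of length N yields
    N - n_lags fitting examples: the more lags are used, the fewer rows
    remain to fit the model on."""
    x = np.asarray(series, dtype=float)
    rows = len(x) - n_lags
    # Column i holds the values lagged by (n_lags - i) steps.
    X = np.column_stack([x[i:i + rows] for i in range(n_lags)])
    y = x[n_lags:]
    return X, y
```

For instance, a 30-value series with n_lags = 21 (the largest choice examined in the paper) leaves only 9 rows for fitting, which illustrates why large lag numbers shorten the fitting set.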
On the other hand, information on hyperparameter selection is usually emphasized in the hydrological literature (e.g. Belayneh et al. 2014; Hung et al. 2009; Koutsoyiannis et al. 2008; El-Shafie et al. 2007; Tongal and Berndtsson 2017; Valipour et al. 2013; Yu et al. 2004). An example of a hyperparameter is the number of hidden nodes within a neural network structure. Hyperparameters are distinguished from the basic parameters, because they are usually optimized or tuned with the aim of improving the performance of a ML algorithm. Hyperparameter optimization can be performed using a single validation set extracted from the fitting set or k-fold cross-validation, which involves multiple set divisions and tests. The optimal hyperparameter values are most frequently searched for heuristically, either using grid search or random search, while ML or Bayesian methods can be adopted for this task as well (Witten et al. 2017). However, non-tuned ML models are also used in hydrology (e.g. Yaseen et al. 2016). Finally, a popular problem arising when using ML forecasting algorithms is the comparison between ML and classical algorithms. This problem is mostly examined within single-case studies (e.g. Ballini et al. 2001; Koutsoyiannis et al. 2008; Tongal and Berndtsson 2017; Valipour et al. 2013; Yu et al. 2004), as is also the case for lagged variable and hyperparameter selection.
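As a minimal illustration of the single-validation-set variant of grid search mentioned above, the sketch below selects the hyperparameter value minimising the RMSE on a validation block held out from the end of the fitting set. The `fit_predict` callable is a hypothetical stand-in for any forecasting method, not part of the paper's software.

```python
import numpy as np

def grid_search_val(fit_predict, series, grid, n_val=12):
    """Heuristic grid search using a single validation set extracted from
    the fitting set (illustrative sketch). fit_predict(train, h, hp) must
    return h forecasts for the values following `train`, produced with
    hyperparameter value hp."""
    train = series[:-n_val]
    val = np.asarray(series[-n_val:], dtype=float)
    scores = {}
    for hp in grid:
        preds = np.asarray(fit_predict(train, n_val, hp), dtype=float)
        scores[hp] = float(np.sqrt(np.mean((preds - val) ** 2)))  # validation RMSE
    # Return the hyperparameter value with the lowest validation RMSE.
    return min(scores, key=scores.get), scores
```

A k-fold cross-validation variant would repeat the same scoring over several train/validation divisions and average the RMSE values before choosing.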
1.2 Main Contribution of this Study

Problem 1: Lagged variable selection in time series forecasting using ML algorithms

Research question 1: Should we select less recent lagged variables or a large number of lagged variables in time series forecasting using ML algorithms?

Problem 2: Hyperparameter selection in time series forecasting using ML algorithms

Research question 2: Does hyperparameter optimization necessarily lead to a better performance in time series forecasting using ML algorithms?

Problem 3: Comparison between ML and classical algorithms

Research question 3: Do the ML algorithms exhibit better (or worse) performance than the classical ones?
In fact, exploration is indispensable for understanding the phenomena involved in a specific problem and, therefore, it constitutes an essential part of every theory-development process.
1.3 Research Method and Implementation
We adopt the multiple-case study research method (presented in detail in Yin (2003)), which embraces the examination of more than one individual case, facilitating the observation of specific phenomena from multiple perspectives or within different contexts (Dooley 2002). For the detection of systematic patterns across the individual cases a cross-case synthesis can be performed (Larsson 1993). Given that the boundaries between the phenomena and the context are not clear (thus, it is meaningful to consider a case study design, as explained in Baxter and Jack (2008)), it is important that each individual case keeps its identity within the multiple-case study, so that one can specifically focus on it. This exploration within and across the individual cases can provide interesting insights into the phenomena under investigation, as well as a form of generalization named “contingent empirical generalization”, while retaining the immediacy of the single-case study method (Achen and Snidal 1989).
We explore the three problems summarized in Section 1.2 by conducting an extensive multiple-case study composed of 50 single-case studies, which use temperature and precipitation time series observed in Greece. We examine these two geophysical processes because they exhibit different properties, which may affect the results of the explorations differently. We focus on two ML algorithms, i.e. NN and SVM, for an analogous reason. Moreover, the explorations are conducted for both a one-step and a multi-step ahead horizon, as the corresponding forecasting attempts are not of the same difficulty. We apply a fixed methodology to each individual case. This fixed methodology provides the common basis on which to further perform a cross-case synthesis for the detection of systematic patterns across the individual cases. The latter is the novelty of our study.
2 Data and Methods
2.1 Methodology Outline
We conduct 50 single-case studies by applying a fixed methodology to each of the 50 time series presented in Section 2.2, as explained subsequently. First, we split the time series into a fitting and a test set. The latter is the last monthly observation for the one-step ahead forecasting experiments and the last year's monthly observations for the multi-step ahead forecasting experiments. Second, we fit the models to the seasonally decomposed fitting set, within the context described in Section 2.3, and make predictions corresponding to the test set. Third, we recover the seasonality in the predicted values and compare them to their corresponding observed values using the metrics of Section 2.4. Finally, we perform a cross-case synthesis to demonstrate similarities and differences between the single-case studies conducted. We present the results per category of tests, which is determined by the set {set of methods, process, forecast horizon}, and further summarize them, as discussed in Section 2.4. The sets of methods are defined in Section 2.3, while the total number of categories is 20. We place emphasis on the exploration of the three problems summarized in Section 1.2, but we also present quantitative information about the produced forecasts and search for evidence regarding a possible relationship between the forecast quality and the standard deviation (σ), coefficient of variation (cv) and Hurst parameter (H) estimates for the deseasonalized time series (available in Section 2.2). Statistical software information is summarized in the Appendix section.
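For concreteness, one common way to seasonally decompose a monthly series and later recover the seasonality in the forecasts is to subtract and re-add the twelve monthly means. The sketch below assumes this simple monthly-mean scheme, which may differ from the exact decomposition used in the paper.

```python
import numpy as np

def deseasonalize_monthly(series):
    """Remove a monthly seasonal component by subtracting each calendar
    month's mean (assumes the series starts in January with a monthly
    step). Returns the deseasonalized series and the 12 monthly means,
    which are needed to recover seasonality in the predictions."""
    x = np.asarray(series, dtype=float)
    months = np.arange(len(x)) % 12
    means = np.array([x[months == m].mean() for m in range(12)])
    return x - means[months], means

def reseasonalize(forecasts, means, start_month=0):
    """Add the stored seasonal component back to deseasonalized forecasts."""
    f = np.asarray(forecasts, dtype=float)
    months = (start_month + np.arange(len(f))) % 12
    return f + means[months]
```

In the workflow of this section, the models would be fitted to the output of `deseasonalize_monthly`, and `reseasonalize` would be applied to the predictions before computing the metrics on the test set.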
2.2 Time Series
Time series of the present study
s/n  Process  Code  Location  Station ID  Latitude  Longitude  Reference  Start  End  Length (months)
1  Temperature  temp_1  Araxos  16687001  38.20  21.40  Lawrimore et al. (2011)  Jan 1951  Dec 1980  360 
2  temp_2  Athens  16714000  37.97  23.72  Jan 1858  Dec 1975  1416  
3  temp_3  Athens  16714000  37.97  23.72  Jan 1989  Dec 2001  156  
4  temp_4  Athens  16716000  37.90  23.73  Jan 1951  Dec 2012  744  
5  temp_5  Heraklion  16754000  35.33  25.18  Jan 1950  Dec 2015  792  
6  temp_6  Kalamata  16726000  37.07  22.02  Jan 1956  Dec 2015  720  
7  temp_7  Kerkyra  16641000  39.62  19.92  Jan 1951  Dec 2016  792  
8  temp_8  Larissa  16648000  39.63  22.42  Jan 1899  Dec 2016  1416  
9  temp_9  Lemnos  16650000  39.92  25.23  Jan 1951  Dec 1998  576  
10  temp_10  Methoni  16734000  36.83  21.70  Jan 1951  Dec 1972  264  
11  temp_11  Methoni  16734000  36.83  21.70  Jan 1975  Dec 2000  312  
12  temp_12  Patra  16689000  38.25  21.73  Jan 1951  Dec 1989  468  
13  temp_13  Samos  16723000  37.70  26.92  Jan 1955  Dec 1969  180  
14  temp_14  Samos  16723000  37.70  26.92  Jan 1974  Dec 2003  360  
15  temp_15  Souda  16746000  35.48  24.12  Jan 1961  Dec 2015  660  
16  temp_16  Thessaloniki  16622000  40.52  22.97  Jan 1892  Dec 2016  1500  
17  temp_17  Thessaloniki  16622001  40.52  23.02  Jan 1961  Dec 1970  120  
18  Precipitation  prec_1  Agrinion  16672000  38.60  21.70  Peterson and Vose (1997)  Jan 1956  Dec 1987  384 
19  prec_2  Alexandroupoli  16627000  40.80  25.90  Jan 1951  Dec 1990  480  
20  prec_3  Aliartos  16674000  38.40  23.10  Jan 1907  Dec 1990  1008  
21  prec_4  Anogeia  16754001  35.30  24.90  Jan 1919  Dec 1939  252  
22  prec_5  Anogeia  16754001  35.30  24.90  Jan 1950  Dec 1979  360  
23  prec_6  Araxos  16687000  38.20  21.40  Jan 1949  Dec 2000  624  
24  prec_7  Athens  16714000  38.00  23.70  Jan 1860  Dec 1881  264  
25  prec_8  Athens  16714000  38.00  23.70  Jan 1887  Dec 2005  1428  
26  prec_9  Athens  16716000  37.90  23.70  Jan 1929  Dec 1945  204  
27  prec_10  Fragma  16715001  38.20  23.90  Jan 1926  Dec 1990  780  
28  prec_11  Heraklion  16754000  35.30  25.10  Jan 1946  Dec 1990  540  
29  prec_12  Igoumenitsa  16641001  39.50  20.30  Jan 1951  Dec 1990  480  
30  prec_13  Ioannina  16642000  39.70  20.80  Jan 1951  Dec 1990  480  
31  prec_14  Kalamata  16726000  37.00  22.10  Jan 1956  Dec 1970  180  
32  prec_15  Kalo Chorio  16756001  35.10  25.70  Jan 1950  Dec 1984  420  
33  prec_16  Kastelli  16760001  35.20  25.30  Jan 1949  Dec 1976  336  
34  prec_17  Kerkyra  16641000  39.60  19.90  Jan 1952  Dec 1996  540  
35  prec_18  Kythira  16743000  36.30  23.00  Jan 1951  Dec 1973  276  
36  prec_19  Kos  16742000  36.80  27.10  Jan 1958  Dec 1990  396  
37  prec_20  Kozani  16632000  40.30  21.80  Jan 1955  Dec 1987  396  
38  prec_21  Larissa  16648000  39.60  22.40  Jan 1951  Dec 1997  564  
39  prec_22  Lemnos  16650001  39.90  25.30  Jan 1951  Dec 2000  600  
40  prec_23  Methoni  16734000  36.80  21.70  Jan 1951  Dec 1991  492  
41  prec_24  Milos  16738000  36.70  24.50  Jan 1951  Dec 1990  480  
42  prec_25  Mytilene  16667000  39.10  26.60  Jan 1952  Dec 1990  468  
43  prec_26  Naxos  16732000  37.10  25.50  Jan 1955  Dec 1971  204  
44  prec_27  Patra  16689000  38.20  21.70  Jan 1901  Dec 1984  1008  
45  prec_28  Sitia  16757000  35.20  26.10  Jan 1960  Dec 1983  288  
46  prec_29  Skyros  16684000  38.90  24.60  Jan 1955  Dec 1987  396  
47  prec_30  Thessaloniki  16622000  40.60  23.00  Jan 1931  Dec 1997  804  
48  prec_31  Thessaloniki  16622002  40.50  22.90  Jan 1961  Dec 1970  120  
49  prec_32  Trikala  16645001  39.60  21.80  Jan 1951  Dec 1990  480  
50  prec_33  Tripoli  16710000  37.50  22.40  Jan 1951  Dec 1985  420 
Mean (μ), standard deviation (σ), coefficient of variation (cv) and Hurst parameter (H) estimates for the deseasonalized temperature time series
Time series  μ estimate (°C)  σ estimate (°C)  cv estimate  H estimate 

temp_1  17.95  1.25  0.07  0.66 
temp_2  17.86  1.93  0.11  0.67 
temp_3  18.51  1.81  0.10  0.68 
temp_4  18.70  1.62  0.09  0.65 
temp_5  18.97  1.18  0.06  0.69 
temp_6  17.90  1.42  0.08  0.74 
temp_7  17.75  1.47  0.08  0.67 
temp_8  15.91  2.75  0.17  0.64 
temp_9  16.36  2.11  0.13  0.74 
temp_10  18.24  1.07  0.06  0.59 
temp_11  17.83  1.20  0.07  0.61 
temp_12  17.71  1.41  0.08  0.69 
temp_13  18.21  1.46  0.08  0.64 
temp_14  18.38  1.64  0.09  0.64 
temp_15  18.63  1.47  0.08  0.71 
temp_16  16.21  2.59  0.16  0.67 
temp_17  16.13  2.16  0.13  0.48 
Mean (μ), standard deviation (σ), coefficient of variation (cv) and Hurst parameter (H) estimates for the deseasonalized precipitation time series
Time series  μ estimate (mm)  σ estimate (mm)  cv estimate  H estimate 

prec_1  81.09  56.61  0.70  0.47 
prec_2  46.50  37.30  0.80  0.56 
prec_3  55.52  42.14  0.76  0.53 
prec_4  93.61  78.01  0.83  0.57 
prec_5  95.62  74.42  0.78  0.48 
prec_6  57.59  43.65  0.76  0.54 
prec_7  33.44  30.45  0.91  0.56 
prec_8  32.79  29.44  0.90  0.53 
prec_9  29.65  27.87  0.94  0.53 
prec_10  47.30  37.03  0.78  0.53 
prec_11  40.02  35.27  0.88  0.50 
prec_12  88.81  66.22  0.75  0.56 
prec_13  94.36  60.85  0.64  0.57 
prec_14  66.19  45.58  0.69  0.46 
prec_15  42.12  35.65  0.85  0.50 
prec_16  60.14  47.45  0.79  0.52 
prec_17  92.53  65.00  0.70  0.56 
prec_18  47.10  39.39  0.84  0.52 
prec_19  58.63  53.36  0.91  0.57 
prec_20  43.94  32.23  0.73  0.54 
prec_21  36.46  30.90  0.85  0.54 
prec_22  40.84  36.72  0.90  0.55 
prec_23  60.59  44.00  0.73  0.50 
prec_24  35.08  32.84  0.94  0.47 
prec_25  56.00  49.39  0.88  0.51 
prec_26  27.61  22.43  0.81  0.53 
prec_27  60.23  44.64  0.74  0.52 
prec_28  40.39  35.38  0.88  0.46 
prec_29  38.55  32.86  0.85  0.56 
prec_30  37.15  27.98  0.75  0.54 
prec_31  35.24  24.94  0.71  0.55 
prec_32  62.91  47.51  0.76  0.61 
prec_33  68.45  44.77  0.65  0.47 
2.3 Forecasting Algorithms and Methods
We focus on two ML forecasting algorithms, i.e. NN and SVM. The NN algorithm is the mlp algorithm of the nnet R package (Venables and Ripley 2002), while the SVM algorithm is the ksvm algorithm of the kernlab R package (Karatzoglou et al. 2004). These algorithms implement a single-hidden-layer Multilayer Perceptron (MLP), and the Radial Basis kernel “Gaussian” function with C = 1 and epsilon = 0.1, respectively. They are applied using the CasesSeries, fit and lforecast functions of the rminer R package (Cortez 2010, 2016). We also include four classical algorithms in the comparisons, i.e. the Autoregressive order one model (AR(1)), an algorithm from the family of Autoregressive Fractionally Integrated Moving Average models (auto_ARFIMA), the exponential smoothing state space algorithm with Box-Cox transformation, ARMA errors, Trend and Seasonal components (BATS) and the Theta algorithm, as well as a naïve benchmark. The latter sets each monthly forecast equal to the corresponding monthly value of the last observed year. We apply the classical algorithms using the forecast R package (Hyndman and Khandakar 2008; Hyndman et al. 2017) and, specifically, five functions included in the latter, namely the Arima, arfima, bats, forecast and thetaf functions. The auto_ARFIMA algorithm applies the Akaike Information Criterion with a correction for finite sample sizes (AICc) for the estimation of the p, d, q values of the ARFIMA(p,d,q) model, while both the AR(1) and auto_ARFIMA algorithms implement the maximum likelihood method for the estimation of the ARMA parameters. The auto_ARFIMA algorithm accounts for the long-range dependence observed in the time series through the d parameter. The AR(1), auto_ARFIMA and BATS algorithms apply a Box-Cox transformation to the input data before fitting a model to them. All the algorithms used herein are well-grounded in the literature; thus, in their presentation we place emphasis on implementation information.
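The naïve benchmark just described admits a very short implementation. The sketch below (Python rather than the R used in the paper) forecasts each month with the same-month value of the last observed year.

```python
import numpy as np

def naive_seasonal(series, horizon=12, period=12):
    """Naïve benchmark: each monthly forecast equals the value observed
    in the same month of the last available year (one seasonal period)."""
    x = np.asarray(series, dtype=float)
    last_year = x[-period:]
    # Cycle through the last observed year for horizons beyond 12 months.
    return np.array([last_year[h % period] for h in range(horizon)])
```

With horizon=1 this reproduces the one-step ahead benchmark forecast (the value of the same month one year earlier), and with horizon=12 the twelve-step ahead one.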
Sets of methods and their main utility within this study
s/n  Set of methods  Number of included methods  Main utility 

1  {NN given a regression matrix formed using the first n lags, n = 1, 2, …, 21}  21  Exploration of Problem 1 for the NN algorithm 
2  {SVM given a regression matrix formed using the first n lags, n = 1, 2, …, 21}  21  Exploration of Problem 1 for the SVM algorithm 
3  {NN, NN*}  2  Exploration of Problem 2 for the NN algorithm 
4  {SVM, SVM*}  2  Exploration of Problem 2 for the SVM algorithm 
5  {Naïve, AR(1), auto_ARFIMA, BATS, Theta, NN, SVM}  7  Exploration of Problem 3 for the NN and SVM algorithms 
2.4 Metrics and Summary Statistics
The one-step ahead forecasting performance is assessed by computing the absolute error (AE) of the forecast, while the multi-step ahead forecasting performance is assessed by computing the RMSE, the Nash-Sutcliffe efficiency (NSE), the ratio of standard deviations (rSD), the index of agreement (d) and the coefficient of correlation (Pr). Subsequently, we provide the definitions of the latter five metrics. For these definitions we consider a time series of N values, as well as a model fitted to the first N−n values of this time series and subsequently used to make predictions corresponding to the last n values. Let x_1, x_2, …, x_n represent the last n values and f_1, f_2, …, f_n represent the forecasts; the subscript j indicates the serial number of each of the {forecast, target value} pairs formed for a specific category of tests, and x̄ denotes the mean of x_1, …, x_n.
RMSE = ((1/n) Σ_j (f_j − x_j)²)^(1/2). It can take values between 0 and +∞. The closer to 0 it is, the better the forecast.
NSE = 1 − Σ_j (f_j − x_j)² / Σ_j (x_j − x̄)². It can take values between −∞ and 1. The closer to 1 it is, the better the forecast, while NSE values above 0 indicate acceptable forecasts.
rSD = sd(f_1, …, f_n) / sd(x_1, …, x_n). It can take values between 0 and +∞. The closer to 1 it is, the better the forecast.
Pr, the Pearson coefficient of correlation between the forecasts and the target values, can take values between −1 and 1. The closer to 1 it is, the better the forecast.
d = 1 − Σ_j (f_j − x_j)² / Σ_j (|f_j − x̄| + |x_j − x̄|)². It can take values between 0 and 1. The closer to 1 it is, the better the forecast.
The slope of the linear regression of the forecasts on the target values (reported as LRC in Section 3.4) can take values between −∞ and +∞. The closer to 1 it is, the better the forecasts.
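The five multi-step ahead metrics of this section can be computed as in the following NumPy sketch. Sample standard deviations are assumed for rSD, and the paper's own computations are performed in R, so this is an illustrative re-implementation.

```python
import numpy as np

def forecast_metrics(obs, fc):
    """Multi-step ahead metrics of Section 2.4 for targets x_1..x_n (obs)
    and forecasts f_1..f_n (fc)."""
    x = np.asarray(obs, dtype=float)
    f = np.asarray(fc, dtype=float)
    err = f - x
    xbar = x.mean()
    rmse = float(np.sqrt(np.mean(err ** 2)))                       # [0, +inf), ideal 0
    nse = float(1.0 - np.sum(err ** 2) / np.sum((x - xbar) ** 2))  # (-inf, 1], ideal 1
    rsd = float(np.std(f, ddof=1) / np.std(x, ddof=1))             # [0, +inf), ideal 1
    pr = float(np.corrcoef(x, f)[0, 1])                            # [-1, 1], ideal 1
    d = float(1.0 - np.sum(err ** 2)
              / np.sum((np.abs(f - xbar) + np.abs(x - xbar)) ** 2))  # [0, 1], ideal 1
    return {"RMSE": rmse, "NSE": nse, "rSD": rsd, "Pr": pr, "d": d}
```

A perfect forecast yields RMSE = 0 and NSE = rSD = Pr = d = 1, matching the ideal values stated in the definitions.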
3 Results and Discussion
In Section 3 we present and discuss the results of our multiple-case study. We place emphasis on the qualitative presentation of the results, because of its importance in the exploration of the research questions of Section 1.2. In particular, the heatmap visualization adopted herein allows the examination of each single-case study both on its own and in comparison to the rest. Quantitative information, derived from our multiple-case study and particularly significant for the case of Greece, is also presented. Regarding this type of information, the present study could be viewed as an expansion of Moustris et al. (2011). The latter study focused on four long precipitation time series observed in Alexandroupoli, Athens, Patra and Thessaloniki (a subset of the time series examined within our multiple-case study), with the aim of presenting forecasts for the monthly maximum, minimum, mean and cumulative precipitation totals using NN methods.
3.1 Exploration of Problem 1
 (a)
There are variations in the results across the individual cases, to an extent that it is impossible to decide on a best or worst method. Therefore, no evidence is provided by the respective categories of tests that any of the compared lagged regression matrices systematically leads to better forecasts than the rest, for either the NN or the SVM algorithm.
 (b)
The heatmaps formed for the SVM algorithm are smoother in the row direction than those formed for the NN algorithm, a fact rather expected from Figs. 2 and 3. In other words, the variations within each single-case study are of small magnitude for the SVM algorithm, while they are significant for the NN algorithm.
 (c)
For the SVM algorithm there are no systematic patterns and the small variations seem to be rather random.
 (d)
For the NN algorithm, and especially for the twelve-step ahead forecasts, the left parts of the heatmaps are smoother, with no white cells. In other words, it seems more likely that the forecasts are better when using less recent lagged variables in conjunction with this algorithm.
3.2 Exploration of Problem 2
 (a)
Here as well, none of the compared methods seems to be systematically better across the individual cases examined. In other words, the results do not systematically favour either of the two tested hyperparameter selection procedures and, therefore, we can state that hyperparameter optimization does not necessarily lead to better forecasts than the use of the default values of the algorithms.
 (b)
For both ML algorithms the observed variations within each of the single-case studies are of smaller magnitude for the one-step ahead forecasts than for the twelve-step ahead ones.
 (c)
For the NN algorithm the twelve-step ahead forecasts seem to be rather better when hyperparameter optimization precedes the fitting process, while the opposite applies to the SVM algorithm.
3.3 Exploration of Problem 3
 (a)
Here as well, the results of the single-case studies vary significantly.
 (b)
The best method within a specific single-case study depends on the criterion of interest. In fact, even within a specific single-case study, we cannot decide on one best (or worst) method with respect to all the criteria simultaneously.
 (c)
Observations (a) and (b) apply equally to the ML and the classical methods. In fact, it seems that both categories can perform roughly equally well, under the same limitations.
 (d)
We observe that the Naïve benchmark, while also competent, frequently produces forecasts that differ considerably from those produced by the ML or classical algorithms.
If we further compare Figs. 11a, b and 12 with Figs. 2, 3 and 4 respectively, we observe that the performance of the NN algorithm (when given the 21 regression matrices examined in the present study) can vary more than the performance of the ML and classical methods compared here. This observation does not apply to the SVM algorithm. Finally, we note that the exploration presented in Section 3.3 and Papacharalampous et al. (2017a) effectively complement each other. In fact, the former illustrates and provides evidence on important points by presenting real-world results, while the latter confirms this evidence through large-scale simulation experiments. Both illustration and confirmation are integral parts of every theory-building process.
3.4 Additional Information
Summary statistics of the metric values computed for the temperature forecasts. The values reported for the NN and SVM algorithms are computed over all the NN and SVM methods implemented in this study, respectively
Metric  Algorithm  Minimum  Median  Maximum
AE (°C)  Naïve  0.10  1.00  2.20 
AR(1)  0.08  0.66  4.41  
auto_ARFIMA  0.02  0.88  4.22  
BATS  0.00  0.86  4.07  
Theta  0.11  1.00  3.92  
NN  0.00  0.98  5.79  
SVM  0.01  0.90  4.52  
RMSE (°C)  Naïve  0.92  1.60  2.62 
AR(1)  0.96  1.32  2.12  
auto_ARFIMA  0.74  1.28  1.95  
BATS  0.74  1.14  1.75  
Theta  0.74  1.14  1.73  
NN  0.63  1.70  6.05  
SVM  0.73  1.31  2.30  
NSE  Naïve  0.87  0.94  0.97 
AR(1)  0.89  0.96  0.97  
auto_ARFIMA  0.91  0.95  0.98  
BATS  0.93  0.96  0.99  
Theta  0.93  0.96  0.99  
NN  0.44  0.93  0.99  
SVM  0.85  0.95  0.99  
rSD  Naïve  0.87  1.01  1.18 
AR(1)  0.90  1.01  1.22  
auto_ARFIMA  0.90  1.01  1.21  
BATS  0.92  1.00  1.19  
Theta  0.92  0.99  1.19  
NN  0.89  1.01  1.24  
SVM  0.89  1.02  1.24  
Pr  Naïve  0.96  0.97  0.99 
AR(1)  0.98  0.99  0.99  
auto_ARFIMA  0.98  0.99  0.99  
BATS  0.98  0.99  0.99  
Theta  0.98  0.99  0.99  
NN  0.79  0.98  1.00  
SVM  0.97  0.99  0.99  
d  Naïve  0.97  0.98  0.99 
AR(1)  0.98  0.99  0.99  
auto_ARFIMA  0.98  0.99  1.00  
BATS  0.99  0.99  1.00  
Theta  0.98  0.99  1.00  
NN  0.86  0.98  1.00  
SVM  0.97  0.99  1.00 
Summary statistics of the metric values computed for the precipitation forecasts. The values reported for the NN and SVM algorithms are computed over all the NN and SVM methods implemented in this study, respectively
Metric  Algorithm  Minimum  Median  Maximum
AE (mm)  Naïve  0  72  239 
AR(1)  2  52  199  
auto_ARFIMA  1  45  178  
BATS  0  41  175  
Theta  2  40  178  
NN  0  51  340  
SVM  0  39  206  
RMSE (mm)  Naïve  17  52  147 
AR(1)  15  46  94  
auto_ARFIMA  16  45  105  
BATS  17  41  76  
Theta  18  41  75  
NN  17  47  588  
SVM  11  41  101  
NSE  Naïve  −13.20  −0.21  0.48 
AR(1)  −46.17  −0.90  0.64  
auto_ARFIMA  −46.17  −1.01  0.61  
BATS  −4.46  −0.35  0.69  
Theta  −5.07  −0.30  0.70  
NN  −7.55  −0.42  0.86  
SVM  −5.44  −0.44  0.76  
rSD  Naïve  0.35  1.05  3.59 
AR(1)  0.55  1.60  4.10  
auto_ARFIMA  0.56  1.55  4.10  
BATS  0.53  1.47  2.53  
Theta  0.53  1.46  2.71  
NN  0.19  1.10  2.60  
SVM  0.48  1.38  2.71  
Pr  Naïve  −0.09  0.46  0.93 
AR(1)  0.09  0.62  0.92  
auto_ARFIMA  0.09  0.62  0.93  
BATS  0.21  0.60  0.91  
Theta  0.24  0.60  0.91  
NN  −0.74  0.54  0.96  
SVM  −0.37  0.62  0.92  
d  Naïve  0.20  0.59  0.89 
AR(1)  0.17  0.70  0.89  
auto_ARFIMA  0.17  0.73  0.89  
BATS  0.46  0.73  0.90  
Theta  0.47  0.73  0.90  
NN  0.01  0.67  0.97  
SVM  0.25  0.71  0.93 
LRC values computed for each category of tests
Set of methods (see Table 4)  Process  One-step ahead forecasts (Minimum  Maximum)  Twelve-step ahead forecasts (Minimum  Maximum)
1  Temperature  0.62  0.79  0.88  0.97 
2  0.70  0.75  0.93  0.96  
3  0.69  0.70  0.94  0.94  
4  0.70  0.70  0.94  0.95  
5  0.69  0.88  0.94  0.96  
1  Precipitation  0.00  0.43  0.41  0.56 
2  0.21  0.29  0.49  0.52  
3  0.25  0.27  0.48  0.52  
4  0.25  0.29  0.49  0.51  
5  0.21  0.29  0.40  0.52 
4 Summary and Conclusions
We have examined 50 mean monthly temperature and total monthly precipitation time series observed in Greece by applying a fixed methodology to each of them and, subsequently, by performing a cross-case synthesis. The main aim of this multiple-case study is the exploration of three problems associated with univariate time series forecasting using machine learning algorithms, i.e. (a) lagged variable selection, (b) hyperparameter selection, and (c) the comparison between machine learning and classical algorithms. We also present quantitative information about the quality of the forecasts (particularly important for the case of Greece) and search for evidence regarding a possible relationship between the forecast quality and the standard deviation, coefficient of variation and Hurst parameter estimates for the deseasonalized time series (used for model fitting). We have focused on two machine learning algorithms, i.e. neural networks and support vector machines, while we have also included four classical algorithms and a naïve benchmark in the comparisons. We have assessed the one- and twelve-step ahead forecasting performance of the algorithms.
The findings suggest that forecasting methods based on the same machine learning algorithm may exhibit very different performance, to an extent mainly depending on the algorithm and the individual case. In fact, the neural networks algorithm can produce forecasts of many different qualities for a specific individual case, in contrast to the support vector machines one. The performance of the former algorithm seems to be more affected by the selected lagged variables than by the adopted hyperparameter selection procedure (use of predefined hyperparameter values or values defined after optimization). While no evidence is provided that any of the compared lagged regression matrices systematically leads to better forecasts than the rest, for either the neural networks or the support vector machines algorithm, the results mostly favour using less recent lagged variables. Furthermore, for the algorithms used in the present study hyperparameter optimization does not necessarily lead to better forecasts than the use of the default hyperparameter values of the algorithms. Regarding the comparisons performed between machine learning and classical algorithms, the results indicate that methods from both categories can perform equally well, under the same limitations. The best method depends on the case examined and the criterion of interest, and it can be either machine learning or classical. Some information of secondary importance derived from our experiments is subsequently reported. The average-case performance of the algorithms used to produce one- and twelve-step ahead monthly temperature forecasts ranges between 0.66 °C and 1.00 °C, and between 1.14 °C and 1.70 °C, in terms of absolute error and root mean square error respectively. For the monthly precipitation forecasts the respective values are 39 mm and 72 mm, and 41 mm and 52 mm.
Finally, no evidence is provided by our multiplecase study that there is any relationship between the forecast quality and the estimated parameters for the deseasonalized time series.
Acknowledgements
A previous, shorter version of the paper has been presented at the 10th World Congress of EWRA “Panta Rei”, Athens, Greece, 5-9 July 2017, under the title “Forecasting of geophysical processes using stochastic and machine learning algorithms” (Papacharalampous et al. 2017b). We thank the Scientific and Organizing Committees for selecting this research. We also thank the Guest Editor and two anonymous reviewers of Water Resources Management for the time they have devoted to our work.
References
Achen CH, Snidal D (1989) Rational deterrence theory and comparative case studies. World Polit 41(2):143–169. https://doi.org/10.2307/2010405
Atiya AF, El-Shoura SM, Shaheen SI, El-Sherif MS (1999) A comparison between neural-network forecasting techniques - case study: river flow forecasting. IEEE Trans Neural Netw 10(2):402–409. https://doi.org/10.1109/72.750569
Ballini R, Soares S, Andrade MG (2001) Multi-step-ahead monthly streamflow forecasting by a neuro-fuzzy network model. IFSA World Congress and 20th NAFIPS International Conference, pp 992–997. https://doi.org/10.1109/NAFIPS.2001.944740
Baxter P, Jack S (2008) Qualitative case study methodology: study design and implementation for novice researchers. Qual Rep 13(4):544–559
Belayneh A, Adamowski J, Khalil B, Ozga-Zielinski B (2014) Long-term SPI drought forecasting in the Awash River basin in Ethiopia using wavelet neural network and wavelet support vector regression models. J Hydrol 508:418–429. https://doi.org/10.1016/j.jhydrol.2013.10.052
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Brownrigg R, Minka TP, Deckmyn A (2017) maps: Draw Geographical Maps. R package version 3.2.0. https://CRAN.R-project.org/package=maps
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1007/BF00994018
Cortez P (2010) Data mining with neural networks and support vector machines using the R/rminer tool. In: Perner P (ed) Advances in data mining. Applications and theoretical aspects. Springer, Berlin Heidelberg, pp 572–583. https://doi.org/10.1007/978-3-642-14400-4_44
Cortez P (2016) rminer: Data Mining Classification and Regression Methods. R package version 1.4.2. https://CRAN.R-project.org/package=rminer
Dooley LM (2002) Case study research and theory building. Adv Dev Hum Resour 4(3):335–354. https://doi.org/10.1177/1523422302043007
El-Shafie A, Taha MR, Noureldin A (2007) A neuro-fuzzy model for inflow forecasting of the Nile river at Aswan high dam. Water Resour Manag 21(3):533–556. https://doi.org/10.1007/s11269-006-9027-1
Fraley C, Leisch F, Maechler M, Reisen V, Lemonte A (2012) fracdiff: Fractionally Differenced ARIMA aka ARFIMA(p,d,q) Models. R package version 1.4-2. https://CRAN.R-project.org/package=fracdiff
Guo J, Zhou J, Qin H, Zou Q, Li Q (2011) Monthly streamflow forecasting based on improved support vector machine model. Expert Syst Appl 38(10):13073–13081. https://doi.org/10.1016/j.eswa.2011.04.114
Hong WC (2008) Rainfall forecasting by technological machine learning models. Appl Math Comput 200(1):41–57. https://doi.org/10.1016/j.amc.2007.10.046
Hung NQ, Babel MS, Weesakul S, Tripathi NK (2009) An artificial neural network model for rainfall forecasting in Bangkok, Thailand. Hydrol Earth Syst Sci 13:1413–1425. https://doi.org/10.5194/hess-13-1413-2009
Hyndman RJ, Khandakar Y (2008) Automatic time series forecasting: the forecast package for R. J Stat Softw 27(3):1–22. https://doi.org/10.18637/jss.v027.i03
Hyndman RJ, O'Hara-Wild M, Bergmeir C, Razbash S, Wang E (2017) forecast: Forecasting Functions for Time Series and Linear Models. R package version 8.2. https://CRAN.R-project.org/package=forecast
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab - an S4 package for kernel methods in R. J Stat Softw 11(9):1–20. https://doi.org/10.18637/jss.v011.i09
Koutsoyiannis D, Yao H, Georgakakos A (2008) Medium-range flow prediction for the Nile: a comparison of stochastic and deterministic methods. Hydrol Sci J 53(1):142–164. https://doi.org/10.1623/hysj.53.1.142
Krause P, Boyle DP, Bäse F (2005) Comparison of different efficiency criteria for hydrological model assessment. Adv Geosci 5:89–97
Kumar DN, Raju KS, Sathish T (2004) River flow forecasting using recurrent neural networks. Water Resour Manag 18(2):143–161. https://doi.org/10.1023/B:WARM.0000024727.94701.12
Larsson R (1993) Case survey methodology: quantitative analysis of patterns across case studies. Acad Manag J 36(6):1515–1546. https://doi.org/10.2307/256820
Lawrimore JH, Menne MJ, Gleason BE, Williams CN, Wuertz DB, Vose RS, Rennie J (2011) An overview of the Global Historical Climatology Network monthly mean temperature data set, version 3. J Geophys Res Atmos 116:D19121. https://doi.org/10.1029/2011JD016187
Maier HR, Dandy GC (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ Model Softw 15(1):101–124. https://doi.org/10.1016/S1364-8152(99)00007-9
Moustris KP, Larissi IK, Nastos PT, Paliatsos AG (2011) Precipitation forecast using artificial neural networks in specific regions of Greece. Water Resour Manag 25(8):1979–1993. https://doi.org/10.1007/s11269-011-9790-5
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I: a discussion of principles. J Hydrol 10(3):282–290. https://doi.org/10.1016/0022-1694(70)90255-6
Nayak PC, Sudheer KP, Rangan DM, Ramasastri KS (2004) A neuro-fuzzy computing technique for modeling hydrological time series. J Hydrol 291(1–2):52–66. https://doi.org/10.1016/j.jhydrol.2003.12.010
Ouyang Q, Lu W (2017) Monthly rainfall forecasting using echo state networks coupled with data preprocessing methods. Water Resour Manag 32(2):659–674. https://doi.org/10.1007/s11269-017-1832-1
Papacharalampous GA, Tyralis H, Koutsoyiannis D (2017a) Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes. Preprints 2017100133. https://doi.org/10.20944/preprints201710.0133.v1
Papacharalampous GA, Tyralis H, Koutsoyiannis D (2017b) Forecasting of geophysical processes using stochastic and machine learning algorithms. Eur Water 59:161–168
Peterson TC, Vose RS (1997) An overview of the global historical climatology network temperature database. Bull Am Meteorol Soc 78(12):2837–2849. https://doi.org/10.1175/1520-0477(1997)078<2837:AOOTGH>2.0.CO;2
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Raghavendra NS, Deka PC (2014) Support vector machine applications in the field of hydrology: a review. Appl Soft Comput 19:372–386. https://doi.org/10.1016/j.asoc.2014.02.002
Sivapragasam C, Liong SY, Pasha MFK (2001) Rainfall and runoff forecasting with SSA-SVM approach. J Hydroinf 3(3):141–152
Taieb SB, Bontempi G, Atiya AF, Sorjamaa A (2012) A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst Appl 39(8):7067–7083. https://doi.org/10.1016/j.eswa.2012.01.039
Tongal H, Berndtsson R (2017) Impact of complexity on daily and multi-step forecasting of streamflow with chaotic, stochastic, and black-box models. Stoch Env Res Risk A 31(3):661–682. https://doi.org/10.1007/s00477-016-1236-4
Tyralis H (2016) HKprocess: Hurst-Kolmogorov Process. R package version 0.0-2. https://CRAN.R-project.org/package=HKprocess
Tyralis H, Koutsoyiannis D (2011) Simultaneous estimation of the parameters of the Hurst-Kolmogorov stochastic process. Stoch Env Res Risk A 25(1):21–33. https://doi.org/10.1007/s00477-010-0408-x
Tyralis H, Papacharalampous GA (2017) Variable selection in time series forecasting using random forests. Algorithms 10(4):114. https://doi.org/10.3390/a10040114
Valipour M, Banihabib ME, Behbahani SMR (2013) Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir. J Hydrol 476(7):433–441. https://doi.org/10.1016/j.jhydrol.2012.11.017
Vapnik VN (1995) The nature of statistical learning theory. Springer-Verlag, New York. https://doi.org/10.1007/978-1-4757-3264-1
Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Netw 10(5):988–999. https://doi.org/10.1109/72.788640
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer-Verlag, New York. https://doi.org/10.1007/978-0-387-21706-2
Wang W, Van Gelder PH, Vrijling JK, Ma J (2006) Forecasting daily streamflow using hybrid ANN models. J Hydrol 324(1–4):383–399. https://doi.org/10.1016/j.jhydrol.2005.09.032
Warnes GR, Bolker B, Gorjanc G, Grothendieck G, Korosec A, Lumley T, MacQueen D, Magnusson A, Rogers J, et al (2017) gdata: Various R Programming Tools for Data Manipulation. R package version 2.18.0. https://CRAN.R-project.org/package=gdata
Wickham H (2016) ggplot2. Springer International Publishing. https://doi.org/10.1007/978-3-319-24277-4
Wickham H, Chang W (2017) devtools: Tools to Make Developing R Packages Easier. R package version 1.13.4. https://CRAN.R-project.org/package=devtools
Wickham H, Henry L (2017) tidyr: Easily Tidy Data with 'spread()' and 'gather()' Functions. R package version 0.7.2. https://CRAN.R-project.org/package=tidyr
Wickham H, Hester J, Francois R, Jylänki J, Jørgensen M (2017) readr: Read Rectangular Text Data. R package version 1.1.1. https://CRAN.R-project.org/package=readr
Witten IH, Frank E, Hall MA, Pal CJ (2017) Data mining: practical machine learning tools and techniques, 4th edn. Elsevier Inc., Amsterdam. ISBN 9780128042915
Xie Y (2014) knitr: a comprehensive tool for reproducible research in R. In: Stodden V, Leisch F, Peng RD (eds) Implementing reproducible computational research. Chapman and Hall/CRC, London
Xie Y (2015) Dynamic documents with R and knitr, 2nd edn. Chapman and Hall/CRC, London
Xie Y (2017) knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.17. https://CRAN.R-project.org/package=knitr
Yaseen ZM, Allawi MF, Yousif AA, Jaafar O, Hamzah FM, El-Shafie A (2016) Non-tuned machine learning approach for hydrological time series forecasting. Neural Comput Appl 30(5):1479–1491. https://doi.org/10.1007/s00521-016-2763-0
Yin RK (2003) Case study research: design and methods, 3rd edn. Sage Publications, Inc., Thousand Oaks
Yu X, Liong SY, Babovic V (2004) EC-SVM approach for real-time hydrologic forecasting. J Hydroinf 6(3):209–223
Zambrano-Bigiarini M (2017a) hydroGOF: Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series. R package version 0.3-10. https://CRAN.R-project.org/package=hydroGOF
Zambrano-Bigiarini M (2017b) hydroTSM: Time Series Management, Analysis and Interpolation for Hydrological Modelling. R package version 0.5-1. https://github.com/hzambran/hydroTSM
Zeileis A, Grothendieck G (2005) zoo: S3 infrastructure for regular and irregular time series. J Stat Softw 14(6):1–27. https://doi.org/10.18637/jss.v014.i06