1 Introduction

The Big Data phenomenon has revolutionized the modern world and is now the hottest Data Mining topic according to polls conducted by kdnuggets.com, with the current trend expected to continue into the foreseeable future. At present there is no unified definition of Big Data; however, Shi [79] presented two definitions. For academics, Big Data is “a collection of data with complexity, diversity, heterogeneity, and high potential value that are difficult to process and analyze in reasonable time”, whilst for policy makers, Big Data is “a new type of strategic resource in the digital era and the key factor to drive innovation, which is changing the way of humans’ current production and living” ([79], p. 6). As Varian ([92], p. 24) accurately asserts, “Big data will only get bigger”. This increased availability of data, further escalated through the evolution of Big Data, is now a major concern for a large number of industries [49]. It is not entirely surprising that this increasing availability of data is causing anxiety, as is evident from the example presented by [86], where the authors state that in the modern age we generate 70 times the information stored in the Library of Congress within the first 24 hours of a newborn baby’s life. In another report, it is noted that South Korea is upgrading the data storage capabilities of its national weather information system by increasing the capacity to 9.3 petabytes [44]; such examples indicate the rate at which Big Data continues to grow. The times have truly ‘changed’, and we now live in an age where Big Data is identified as the leading edge of innovation, competition and productivity [63]. For example, digital data was expected to grow from 161 exabytes in 2006 to 2837 exabytes in 2012 and is now forecast to reach 40 trillion gigabytes by 2020 [37]. Moreover, in the year 2008 alone the world produced 14.7 exabytes of new data [14].

The emergence of Big Data is now history. What is of importance is how organizations develop the tools and means necessary for reacting to, and exploiting, the increasingly available Big Data to their advantage. In line with this, [92] notes the need to adopt powerful tools such as Data Mining techniques, which can aid in modelling the complex relationships inherent in Big Data. Moreover, the recent financial crisis has seen risk management rise to prominence in organizations, and as [83] states, companies now seek to use risk management as a tool for maximising their opportunities whilst minimising the associated threats. Herein lies the opportunity, as Big Data forecasting has the ability to improve organizational performance whilst enabling better risk management [16]. As [10] states, Big Data and predictive analysis go hand in hand in the modern age, with companies focussing on obtaining real-time forecasts using the increasingly available data.

However, not all authors agree that Big Data is a revolutionary phenomenon. Poynter [72] states that Big Data will be more insightful in simply connecting the dots than in painting a whole new picture. For [94], 2013 was the year for getting accustomed to Big Data and 2014 is the year for truly exploiting Big Data towards lucrative gains. We share and subscribe to the perception of [94] on Big Data. Accordingly, we present this review paper, which aims to: provide an informative review of the techniques utilized for forecasting with Big Data; provide a concise summary of the contributions of yesteryear; and identify challenges which need to be overcome as the world gears up to embrace and live in the presence of Big Data. In the process, we review a wide range of forecasting models which have been adopted for forecasting with Big Data. In order to give the reader a clear understanding of the history, we present the review of applications differentiated by the relevant field (i.e., economics, finance, and energy among others) and topic where appropriate. Those interested in tools which can be used for manipulating Big Data are referred to Varian ([92], Table 1, p. 5).

The remainder of this paper is organized as follows. In the following section we discuss the problems and potential behind Big Data forecasts, whilst the associated challenges are considered in Sect. 3. Section 4 provides a review of statistical and Data Mining techniques that have been evaluated for the purpose of forecasting with Big Data, and the paper ends with some conclusions in Sect. 5.

2 The ‘Problem’ and ‘Potential’ of Big Data Forecasting

There exists a widespread belief that Big Data can aid in improving forecasts provided that we can analyse it and discover hidden patterns, and [76] agree that predictions can be improved through data-driven decision making. Tucker [91] believes Big Data will soon be predicting our every move, and according to [29], Big Data is most commonly sought after for building predictive models in a world where forecasting continues to remain a vital statistical problem [46]. We then come to the question: what is the problem behind forecasting with Big Data? The simplest explanation is that traditional forecasting tools cannot handle the size, speed and complexity inherent in Big Data [62]. According to [3], this is owing to the lack of structure in these data sets and their sheer size. As a result, traditional techniques are seldom preferred for tackling Big Data [3]. Forecasting Big Data therefore poses a challenge to organizations, as further highlighted by the European Central Bank, which has dedicated an entire workshop to using Big Data for forecasting. Laney [59] was the first to discuss the importance of data volume, velocity and variety in the context of Big Data. A decade later, [26] and [73] identify these ‘3Vs’ as the three concepts which define the dimensions of Big Data.

Rey and Wells [75] believe Data Mining techniques can be exploited to help forecasting with Big Data, a view supported by [92]. However, it should be noted that in the past, Data Mining techniques have mainly been used on static data as opposed to time series (see, for example, [11, 39, 45, 58, 74]). Interestingly, [22] finds fault with Big Data for the recent financial crisis, as he believes the financial models adopted were unable to handle the huge amounts of data being input into the systems, which resulted in inaccurate forecasts.

The opportunities for gains through forecasting with Big Data are diverse. At present, there is increased research into using Big Data for obtaining accurate weather forecasts, and the initial results suggest that weather forecasts will benefit immensely [44, 55]. In fact, weather forecasting has been one of the main beneficiaries of Big Data, although forecasts remain inaccurate beyond a week [84]. The fashion industry too is exploiting Big Data forecasts, with companies such as EDITD (http://editd.com/) using data collected from social media to forecast fashion trends [53]. According to [4], the airline industry is yet another field where Big Data forecasting is crucial. An interesting success story is Netflix’s use of Big Data forecasts for decision making prior to commencing production of its own TV show ‘House of Cards’, which resulted in increased revenue for the company. The potential underlying Big Data forecasts is truly astonishing, and at times ‘scary’, as was evident in a story narrated by [27]: an irate customer walks into a ‘Target’ store in Minneapolis to complain about the store sending coupons for pregnancy products to his high-school daughter; a few weeks later the same customer apologizes to the manager as, following a discussion with his daughter, it was revealed that she was in fact pregnant [27].

3 Challenges for Forecasting with Big Data

In this section we focus mainly on the challenges which need to be overcome when forecasting with Big Data. It is imperative to note that the availability of Big Data alone does not constitute the end of problems [4]. A good example is the existence of a vast amount of data on earthquakes alongside the lack of a reliable model that can accurately predict them [84]. Some existing challenges relate to the hypotheses, testing and models utilized for Big Data forecasting [72, 84], whilst [95] identifies the lack of theory to complement Big Data as an added concern. Besides these, we have identified the following varied challenges associated with forecasting Big Data that need to be given due consideration.

3.1 Skills

The skills required for tackling the problem of forecasting with Big Data, and the availability of personnel equipped for this particular task, constitute one of the foremost challenges. As [3] states, the advanced skills required to handle Big Data are a major challenge, whilst [72] notes there is a short supply of data scientists equipped with the skills required to tackle Big Data. Thornton [90] also agrees that there is a shortage of people who can understand Big Data. In a world where academics, researchers and statisticians have relied on traditional statistical techniques for over fifty years to obtain accurate forecasts, the availability of Big Data is in itself challenging. Skupin and Agarwal [85] state that the unsuitability of traditional statistical techniques, which are meant for obtaining forecasts from traditional data, is hindering the effectiveness and application of forecasts from Big Data, and [3] shares this same concern. As the majority of statisticians are experienced in these traditional techniques, [29] point out that it is a challenge to develop the required skills for Big Data forecasting. In order to overcome this issue, it is important that higher educational institutes around the globe give due consideration to upgrading their syllabuses to incorporate the skills necessary for understanding, analysing, evaluating and forecasting with Big Data, so that the next generation of statisticians will be well equipped with the mandatory skills.

3.2 Signal and Noise

A more technical, but extremely important, challenge in Big Data forecasting is identified by [84]. He suggests that noise is distorting the signal in Big Data, and that an increasing noise-to-signal ratio is visible in Big Data. Silver’s [84] notion is further supported by [7], who point out that extracting the signal from large data sets is more complex. A majority of traditional forecasting techniques forecast both the noise and the signal, and whilst they perform relatively well on traditional data sets, the increasing noise-to-signal ratio seen in Big Data is more likely to distort the accuracy of forecasts. This suggests a need for employing and evaluating forecasting techniques which can filter out the noise in Big Data and forecast the signal alone. One such technique is singular spectrum analysis (SSA), which seeks to filter the noise from a given time series, reconstruct a new, less noisy series, and then use this reconstructed series for forecasting future data points. The superiority of SSA over traditional techniques has recently been demonstrated in a variety of fields where the signal-to-noise ratios are comparatively high relative to the much lower signal-to-noise ratio expected in Big Data (see, for example, [47, 48, 50, 81, 82]). Future research should concentrate on evaluating the applicability of such techniques for filtering the noise in Big Data to enable accurate and meaningful forecasts.
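To make the idea concrete, the following minimal sketch (Python, assuming only NumPy; the window length and rank are illustrative choices, not prescriptions) illustrates the basic SSA filtering steps described above: embed the series in a trajectory matrix, decompose it via the singular value decomposition, retain the leading components as the presumed signal, and reconstruct a filtered series by diagonal averaging.

```python
import numpy as np

def ssa_filter(series, window, rank):
    """Basic SSA filtering: embed, decompose, truncate, reconstruct."""
    n = len(series)
    k = n - window + 1
    # Embedding: build the window x k trajectory (Hankel) matrix
    X = np.column_stack([series[i:i + window] for i in range(k)])
    # Decompose the trajectory matrix via SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the leading 'rank' components as the presumed signal
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Diagonal averaging (Hankelization) back to a one-dimensional series
    filtered = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            filtered[i + j] += X_hat[i, j]
            counts[i + j] += 1
    return filtered / counts

# Toy usage: a sinusoidal signal buried in noise
rng = np.random.default_rng(0)
t = np.arange(500)
noisy = np.sin(2 * np.pi * t / 50) + rng.normal(scale=1.0, size=500)
reconstructed = ssa_filter(noisy, window=100, rank=2)
# 'reconstructed' can now be forecast with any standard model
```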

3.3 Hardware and Software

Arribas-Bel [3] was of the view that current statistical software is not able to tackle Big Data forecasting, whilst [67] notes the possible need for supercomputers to handle Big Data forecasts. Recently, [51] have developed automatic forecasting techniques which can provide output within a matter of seconds; however, their reliability in the face of Big Data is yet to be tested. Another issue directly related to hardware and software is that we have personally experienced statistical programs crashing in the face of a few thousand observations, owing to deficiencies in random access memory or the associated software. As such, it is prudent to agree that computing capabilities and the structures underlying statistical software will require enhancements in order to successfully handle the increased data input.

3.4 Statistical Significance

Lohr [60] suggests there is an increased threat of making false discoveries from Big Data. Obtaining forecasts using an appropriate technique may appear to be the major challenge, but it is not the only one. Given the sheer quantity of data that needs to be processed and forecast, with Big Data there is increased complexity in differentiating between randomness and statistically significant outcomes [28]. As such, there is an increased chance of reporting a chance occurrence as a statistically significant outcome and thereby misleading the stakeholders interested in the forecast.
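The scale of the problem is easy to demonstrate. The hedged sketch below (Python; all data is simulated) screens thousands of entirely unrelated predictors against a noise target: at a nominal 5% level, roughly 5% of them appear ‘significant’ by chance alone, and a multiple-testing correction such as the Benjamini–Hochberg procedure is needed to control the false discovery rate.

```python
import numpy as np
from scipy import stats

# With thousands of candidate predictors, some will correlate
# 'significantly' with the target purely by chance.
rng = np.random.default_rng(1)
n_obs, n_predictors = 200, 5000
target = rng.normal(size=n_obs)                       # pure noise target
predictors = rng.normal(size=(n_obs, n_predictors))   # unrelated predictors

p_values = np.array([stats.pearsonr(predictors[:, j], target)[1]
                     for j in range(n_predictors)])

print("nominally significant at 5%:", (p_values < 0.05).sum())  # roughly 250 false hits

# Benjamini-Hochberg false discovery rate control reins these back in
order = np.argsort(p_values)
bh_line = 0.05 * (np.arange(1, n_predictors + 1) / n_predictors)
passed = p_values[order] <= bh_line
n_discoveries = passed.nonzero()[0].max() + 1 if passed.any() else 0
print("significant after BH correction:", n_discoveries)
```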

3.5 Architecture of Algorithms

Data Mining techniques are suggested as important methods which could be used for forecasting with Big Data. However, these techniques have been designed to handle data of comparatively smaller sizes than Big Data. Data Mining algorithms are therefore often unable to work with data that is not loaded into main memory, and thus require the movement of Big Data between locations, which can incur increased network communication costs [52]. The architecture of the analytics needs to be redesigned so that it can handle both historical and real-time data [52], and the Lambda architecture proposed in [64] is a sound example of research currently seeking to overcome this issue. A detailed evaluation of the challenges associated with applying Data Mining techniques to Big Data (explained in the context of official statistics) can be found in [49].
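The essence of the batch/speed split behind the Lambda architecture can be conveyed with a deliberately tiny, hypothetical sketch (a conceptual toy, not the architecture of [64] itself): a complete but slow batch view is recomputed periodically over the historical data, a fast incremental speed view covers records that arrived since the last batch run, and queries merge the two.

```python
from collections import Counter

class LambdaCounter:
    """Toy batch/speed split: all names here are hypothetical."""

    def __init__(self):
        self.batch_view = Counter()   # rebuilt periodically from the master dataset
        self.speed_view = Counter()   # incremental, covers recent data only

    def batch_recompute(self, master_dataset):
        self.batch_view = Counter(master_dataset)  # slow, complete pass
        self.speed_view.clear()                    # recent data now in batch view

    def ingest(self, record):
        self.speed_view[record] += 1               # fast, incremental path

    def query(self, key):
        return self.batch_view[key] + self.speed_view[key]  # merged answer

events = ["gdp_query", "cpi_query", "gdp_query"]
store = LambdaCounter()
store.batch_recompute(events)
store.ingest("gdp_query")                          # arrives after the batch run
print(store.query("gdp_query"))                    # 3
```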

3.6 Big Data

Big Data itself is a challenge for forecasting as a result of its inherent characteristics. Firstly, Big Data evolves and changes in real time, and as such it is important that the techniques used to forecast Big Data are able to transform unstructured data into structured data [79], accurately capture these dynamic changes, and detect change points in advance. Secondly, there are challenges stemming from Big Data’s highly complex structure: as [29] point out, it is a challenge to build forecasting models that do not produce poor out-of-sample forecasts owing to the ‘over-use’ of potential predictors. Factor modelling, which is discussed in the following section, is a potential cure for this challenge, but more dedicated research is needed to overcome the issue completely.

4 Applications of Statistical and Data Mining Techniques for Big Data Forecasting

In this section we identify existing applications of statistical and Data Mining techniques for forecasting with Big Data. We have summarized these by the related field and topic (where relevant) in order to provide the reader with a more rewarding experience. At the outset, it is worth noting that [34] and [87] are closely associated with the development of econometric techniques for analysis and forecasting with Big Data.

4.1 Forecasting with Big Data in Economics

Researchers in the field of Economics have been major exploiters of Big Data for forecasting various economic variables. Camacho and Sancho [17] used a dynamic factor model (DFM) based on the methodology presented in [87] to forecast a large dataset of Spanish diffusion indexes which they describe as an exhaustive description of the Spanish economy. DFM models extend the factor models of [87] and are frequently used for forecasting with Big Data. However, [24] asserted that the use of the DFM for macroeconomic Big Data forecasting is flawed, as it is based on linear models whereas Big Data is more likely to be nonlinear. Over time, through the work of [35, 88] and [54], the DFM technique was improved, enabling it to handle Big Data more appropriately.
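As a hedged illustration of the basic workflow (not the exact specification of any study cited here), the sketch below fits a small DFM to a large panel of standardized monthly indicators using the statsmodels library in Python; the file name and the choice of two factors are hypothetical placeholders.

```python
import pandas as pd
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Hypothetical panel of monthly macroeconomic indicators (dates x series)
indicators = pd.read_csv("monthly_indicators.csv", index_col=0, parse_dates=True)
indicators = (indicators - indicators.mean()) / indicators.std()  # standardize

# A small number of latent factors summarizes the large cross-section
model = DynamicFactor(indicators, k_factors=2, factor_order=1)
results = model.fit(maxiter=1000, disp=False)

print(results.summary())
forecasts = results.forecast(steps=12)   # 12-month-ahead forecasts for all series
```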

The application of Maximum Likelihood estimation of Factor Models for Big Data forecasting has been evaluated by [25] via a simulation study, in which the authors find the approach to be effective and efficient. A seasonal AR model was used in [21] to show how Big Data from the Google search engine can be used to predict economic indicators. Gupta et al. [42] used a multivariate factor-augmented Bayesian shrinkage model on Big Data comprising 143 monthly time series to forecast employment in eight sectors of the U.S. economy. Big Data relating to various exchange rates are used to forecast the Euro, British Pound and Japanese Yen in [8], where the authors find that their proposed factor-augmented Error Correction Model (FECM) outperforms a factor-augmented VAR (FAVAR) model at accurately predicting the three major bilateral exchange rates. Bańbura et al. [5] proposed an algorithm based on Kalman filtering for large VAR and DFM models which enables conditional forecasts, and provided a scenario analysis for the European economy using 26 macroeconomic and financial indicators for the Euro area.

In what follows, we further group the applications of Data Mining and statistical techniques for forecasting with Big Data in the field of Economics into topics, based on the economic variable being forecast.

4.1.1 Gross Domestic Product (GDP)

Schumacher [77] evaluated the forecasting performance of Factor models using Static and Dynamic Principal Components and Subspace algorithms for State Space models. He finds Factor models outperforming AR models at forecasting a large panel of quarterly time series relating to German GDP. Moreover, the Subspace Factor Model and Dynamic Principal Component model are seen to outperform the Static Factor Model, but this ranking depends greatly on the correct specification of the model parameters [77]. A large factor model which uses an expectation maximization algorithm combined with Principal Components is adopted in [78] to forecast a large dataset comprising German real-time GDP data. They find the Mixed Frequency Factor model performing better than simple benchmark models, but find meagre differences in forecast accuracy between the Factor models themselves. Biau and D’Elia [12] apply the ensemble machine learning technique of Random Forests to forecast European Union GDP using large survey datasets. They find Random Forests outperforming the AR model and the forecasts from the ‘Euro zone economic outlook’, and also note that Random Forests are popular for their ability to avoid over-fitting when handling a large number of inputs. Altissimo et al. [2] use monthly accumulated Big Data along with a modified DFM to forecast medium- to long-run GDP growth in the Euro area and find that their model performs better than the Bandpass Filter in terms of fitting and change point detection. Carriero et al. [18] adopted a Bayesian Mixed Frequency model in combination with stochastic volatility for nowcasting with Big Data to obtain real-time GDP predictions for the United States. Banerjee et al. [8] used 90 monthly time series for the German economy and showed that a FECM can outperform a FAVAR model at forecasting real GDP in Germany. Kopoin et al. [57] used factor models with national and international Big Data to improve the accuracy of GDP forecasts for Canadian provinces at horizons below one year; beyond the one-year-ahead horizon, they find that relying on the provincial data alone optimizes the forecasts. Bańbura and Modugno [7] use factor models with maximum likelihood estimation on over 101 series for nowcasting GDP in the Euro area, and find that sectoral information is not mandatory for obtaining accurate GDP predictions in the Euro area.
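For readers unfamiliar with applying Random Forests to forecasting, the following hedged sketch (Python with scikit-learn; the CSV file and column names are hypothetical placeholders, not the dataset of [12]) shows the standard recipe: recast the problem as supervised regression on lagged values plus auxiliary predictors, and hold out the most recent observations as a pseudo out-of-sample test.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical quarterly data: GDP growth plus survey indicator columns
df = pd.read_csv("gdp_and_surveys.csv", index_col=0, parse_dates=True)

# Build a design matrix of lagged GDP growth alongside the survey predictors
for lag in (1, 2, 3, 4):
    df[f"gdp_lag{lag}"] = df["gdp_growth"].shift(lag)
df = df.dropna()

X = df.drop(columns="gdp_growth")
y = df["gdp_growth"]

# Keep the last 8 quarters as a pseudo out-of-sample test set
X_train, X_test = X.iloc[:-8], X.iloc[-8:]
y_train, y_test = y.iloc[:-8], y.iloc[-8:]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
print("test RMSE:", np.sqrt(np.mean((rf.predict(X_test) - y_test) ** 2)))
```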

4.1.2 Monetary Policy

In [9], a FAVAR model was used for Big Data forecasting and structural analysis in order to accurately identify the monetary policy transmission mechanism, so that the exact impact of monetary policy on the economy could be ascertained. The authors find the proposed FAVAR model outperforming the Structural VAR model by exploiting far more informative content for assessing the monetary policy transmission mechanism. De Mol et al. [23] use a Bayesian regression model with macroeconomic Big Data which includes real and nominal variables, asset prices, surveys and yield curves for forecasting the industrial production and consumer price indices. They find that the results from the Bayesian regression are highly correlated with forecasts from principal components. Alessi et al. [1] exploit a monthly panel dataset comprising 130 U.S. macroeconomic time series and four price indexes (PCE, PCE core, CPI and CPI core) for forecasting inflation and its volatility using a DFM in combination with multivariate GARCH models (DF-GARCH). They find the DF-GARCH model outperforming GARCH, AR(\(p\)) and AR(\(p\))-GARCH(1,1) models as well as other univariate and classical factor models. The work of [23] was extended in [6] to show that combining VAR with Bayesian Shrinkage can improve forecasts; the authors conclude that Bayesian VAR models are appropriate for Big panel Data. Bordoloi et al. [13] developed a DFM to forecast India’s industrial production and price level, citing the DFM’s ability to handle the many variables found in Big Data as the reason behind its selection; they found the DFM outperforming an ordinary least squares model. Figueiredo [32] exploits 368 monthly time series, which include a variety of economic variables, alongside a factor model with targeted predictors (FTP) for forecasting Brazilian inflation, and finds the FTP outperforming VAR, Bayesian VAR and a Principal Component based Factor model. Carriero et al. [20] consider Big Data relating to 52 U.S. macroeconomic time series taken from [88] along with a Bayesian Reduced Rank multivariate model for forecasting the industrial production and consumer price indices and the federal funds rate. They compare their results against models based on Rank Reduction, which include Bayesian VAR models, Multivariate Boosting and the Factor model from [87], and find that combining Rank Reduction with Shrinkage can improve the forecasts attained from Big Data. Giovanelli [41] proposes the use of kernel principal component analysis (PCA), as this enables factors to take a nonlinear relationship to the input variables, together with an Artificial Neural Network (ANN) model, on Big Data containing 259 predictors for the Euro area and 131 predictors for the U.S. economy, for forecasting the industrial production and consumer price indices. The author finds that using the kernel PCA approach for predicting nonlinear factors yields better results than the linear method, and that the ANN method reports forecasts similar to those obtained via the Factor Augmented Linear forecasting equation. In [19], a large BVAR model coupled with optimised shrinkage towards univariate AR models is used to forecast interest rates; the authors find the BVAR model showing small gains over random walk forecasts. Banerjee et al. [8] used a FECM and showed that it can outperform a FAVAR model at forecasting U.S. inflation, using a large panel of 132 U.S. macroeconomic variables, and German inflation and interest rates, using 90 monthly series for the German economy. Ouysse [70] compared Bayesian model averaging (BMA) and principal component regression (PCR) on a large panel data set for forecasting U.S. inflation and industrial production; based on the Root Mean Squared Error, the author concludes that, in general, PCR can provide more accurate forecasts than BMA. Koop [56] exploits the U.S. macroeconomic data set found in [89], which includes 168 variables, along with a BVAR model for forecasting inflation and interest rates, and finds that BVAR models can provide better forecasts than those attainable via Factor methods. Using the U.S. Treasury zero-coupon yield curve estimates, [8] showed that a FECM can outperform a FAVAR model at forecasting interest rates at different maturities.
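To illustrate the kernel PCA idea in [41] without claiming to reproduce the paper's specification, the hedged sketch below (Python with scikit-learn; the file, column names, kernel choice and number of factors are hypothetical) extracts nonlinear factors from a large predictor panel and plugs them into a simple one-step-ahead forecasting regression.

```python
import pandas as pd
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical panel: many predictors plus the target series
panel = pd.read_csv("us_predictors.csv", index_col=0, parse_dates=True)
target = panel.pop("industrial_production")

X = StandardScaler().fit_transform(panel)

# Nonlinear factors via an RBF kernel (linear PCA is the special case kernel="linear")
factors = KernelPCA(n_components=8, kernel="rbf").fit_transform(X)

# One-step-ahead forecasting equation: regress next period's target on today's factors
reg = LinearRegression().fit(factors[:-1], target.values[1:])
print("one-step-ahead forecast:", reg.predict(factors[[-1]])[0])
```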

4.2 Forecasting with Big Data in Finance

Alessi et al. [1] use their DF-GARCH model for forecasting financial asset returns using Big Data relating to transaction prices of stocks traded on the London Stock Exchange, after cleaning the data of outliers. They find the DF-GARCH model outperforming a GARCH(1,1) model, and that the full BEKK specification [31] provides better forecasts than the DCC specification [30].
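For context, the univariate GARCH(1,1) benchmark referred to above can be fitted in a few lines; the hedged sketch below uses the Python ‘arch’ package, with a hypothetical price file standing in for the actual transaction data of [1].

```python
import pandas as pd
from arch import arch_model

# Hypothetical daily closing prices; convert to percentage returns
prices = pd.read_csv("stock_prices.csv", index_col=0, parse_dates=True)["close"]
returns = 100 * prices.pct_change().dropna()

# Constant-mean GARCH(1,1) volatility model
am = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant")
res = am.fit(disp="off")

# Five-day-ahead conditional variance forecasts
print(res.forecast(horizon=5).variance.iloc[-1])
```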

4.3 Forecasting with Big Data in Population Dynamics

An imputation model based on Neural Networks was applied to the Norwegian population census data of 1990 in order to perform a population census by combining administrative data with data gathered through sample surveys [68]. A procedure based on Neural Networks was used by [36] to predict trends in Spanish economic indexes per household and censal section using the Spanish Population and Housing Census and the Family Expenditure Survey. Bayesian regression was used by [71] for predicting long-term illness in Stockport, UK, using statistics from the 1991 census. Cluster Analysis was used as a method for predicting missing data by analysing the 2007 census donor pool screening in [65]. Unlikely representations of farming operations in the initial Census mail list have been predicted using Classification Trees according to [38]. Gilary [40] reports that the US Census Bureau exploited the Decision Trees technique by combining a Stepwise regression with the classification and regression tree concept for recursive partitioning of racial classification cells. Moreover, there is evidence of Decision Trees being used to forecast survey non-respondents through the work of [66].

4.4 Forecasting with Big Data in Crime

Wu et al. [96] rely on a Kohonen Neural Network Clustering algorithm to find outliers and then forecast fraudulent behaviour in the data-intensive Chinese telecom industry, after evaluating its performance against a two-step Clustering algorithm and the K-means algorithm.

4.5 Forecasting with Big Data in Energy

Wang [93] uses Support Vector Machines as an auxiliary method, along with Neural Networks and ‘MapReduce’ technology, for forecasting Big Data originating from China’s electricity consumption, and finds that the developed prediction model provides sound portability and feasibility in terms of processing Big Data relating to electricity. Nguyen and Nabney [69] evaluate the use of the wavelet transform (WT) in combination with a variety of models such as GARCH, linear regressions, radial basis functions and multilayer perceptrons (MLP) to forecast UK gas prices and electricity demand by exploiting Big Data from the British energy markets. They find that the use of WT and adaptive models can provide considerable improvements in forecasting accuracy; the conclusion is that adaptive models combining WT with either MLP or GARCH are the optimal models for forecasting gas price and electricity demand, based on the lowest mean squared error. Fischer et al. [33] evaluate the use of Exponential Smoothing and ARIMA models in combination with a model configuration advisor to forecast energy demand using Big Data from an energy domain.
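As a hedged illustration of the preprocessing role the WT plays in such hybrid models (a generic denoising recipe, not the exact scheme of [69]), the sketch below uses the Python PyWavelets package to decompose a simulated demand series, soft-threshold the detail coefficients, and reconstruct a smoother series for a downstream forecasting model.

```python
import numpy as np
import pywt

# Simulated stand-in for an electricity demand series (weekly cycle plus noise)
rng = np.random.default_rng(0)
t = np.arange(1024)
demand = 50 + 10 * np.sin(2 * np.pi * t / 48) + rng.normal(scale=3, size=1024)

# Multilevel discrete wavelet decomposition
coeffs = pywt.wavedec(demand, wavelet="db4", level=4)

# Universal threshold estimated from the finest-scale detail coefficients
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
threshold = sigma * np.sqrt(2 * np.log(len(demand)))
coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]

denoised = pywt.waverec(coeffs, wavelet="db4")[: len(demand)]
# 'denoised' can now serve as the input series for an MLP, GARCH, or similar model
```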

4.6 Forecasting with Big Data in Environment

Sigrist et al. [80] utilize stochastic advection-diffusion partial differential equations (SPDEs) to improve precipitation forecasts for northern Switzerland using Big Data from a numerical weather prediction model. They find that following the application of the SPDE approach, the forecasts are greatly improved in comparison to the raw forecasts attained via the numerical model.

4.7 Forecasting with Big Data in Biomedical Science

Lutz and Bühlmann [61] provide theoretical evidence for the applicability of Multivariate Boosting to forecasting with Big Data. They propose a Multivariate \(L_{2}\) Boosting method for multivariate regression, which can also be applied to a VAR series, and use an application to 795 Arabidopsis thaliana genes to demonstrate the appropriateness of the proposed method.
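To convey the mechanics, the hedged sketch below implements plain componentwise \(L_{2}\) boosting for a single response in Python/NumPy; this is the univariate building block, and the multivariate method of [61] extends the same residual-fitting idea to matrix-valued responses. All data is simulated.

```python
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting: repeatedly fit the best single predictor
    to the current residuals and take a small (shrunken) step towards it."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    residual = y - intercept
    for _ in range(n_steps):
        # Base learner: per-predictor OLS slope against the residual
        betas = X.T @ residual / (X ** 2).sum(axis=0)
        scores = (X ** 2).sum(axis=0) * betas ** 2   # residual SS reduction
        j = np.argmax(scores)                        # best-fitting predictor
        coef[j] += nu * betas[j]                     # shrunken update
        residual -= nu * betas[j] * X[:, j]
    return intercept, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = 2 * X[:, 3] - X[:, 17] + rng.normal(size=300)
intercept, coef = l2_boost(X, y)
print(np.argsort(-np.abs(coef))[:2])   # should recover predictors 3 and 17
```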

4.8 Forecasting with Big Data in Media

Using data on hundreds of thousands of YouTube videos, [43] show that an ARMA model combined with Singular Value Decomposition (SVD) can be used for analyzing and forecasting video access patterns. They find that for rarely accessed videos Hierarchical Clustering can provide better forecasts, whilst for daily accessed videos PCA can provide an efficient forecast.
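A hedged sketch of the general SVD-plus-time-series idea (synthetic data throughout, not the pipeline of [43]): factorize the video-by-day access matrix, forecast the dominant temporal component with a low-order ARMA model, and map it back to per-video predictions via the loadings.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic access matrix: 1000 videos x 120 days with a shared weekly cycle
rng = np.random.default_rng(0)
n_videos, n_days = 1000, 120
daily_pattern = 100 + 20 * np.sin(2 * np.pi * np.arange(n_days) / 7)
popularity = rng.lognormal(size=n_videos)
access = np.outer(popularity, daily_pattern) + rng.normal(scale=5, size=(n_videos, n_days))

# Truncated SVD: U holds per-video loadings, rows of Vt are temporal patterns
U, s, Vt = np.linalg.svd(access, full_matrices=False)
temporal = s[0] * Vt[0]                     # dominant shared access pattern

# Forecast the shared pattern with a low-order ARMA model
fit = ARIMA(temporal, order=(2, 0, 1)).fit()
pattern_forecast = fit.forecast(steps=7)

# Per-video forecasts follow from each video's loading on the pattern
video_forecasts = np.outer(U[:, 0], pattern_forecast)
```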

5 Conclusions

Big Data will continue to grow even bigger in the years to come, and organizations which are not inclined and willing to embrace the challenges and develop and employ the requisite skills will find themselves in dire straits. In this review, which is focussed on forecasting with Big Data, we first identified several problems and outlined the potential that Big Data has to offer, and the lucrative outcomes it can generate, provided that sufficient time and effort are devoted to overcoming the identified issues. Thereafter we noted a set of key challenges which at present hinder the accuracy and effectiveness of Big Data forecasts.

In terms of the applications of statistical and Data Mining techniques for forecasting with Big Data, the past literature makes it evident that Factor models are the most common and popular tool currently used for Big Data forecasting, whilst Neural Networks and Bayesian models are two other popular choices. The review also finds the field of Economics to be the most popular field in terms of exploiting Big Data for forecasting variables of interest, with the topics of GDP and Monetary Policy receiving the majority of the attention; the fields of Population Dynamics and Energy appear to be the second and third most popular based on published research. It is evident that there remains vast scope for research into forecasting with Big Data, and that such work has the potential to yield better techniques which can enhance forecasting accuracy. For example, it would be interesting to evaluate the use of a noise filtering technique such as Multivariate Singular Spectrum Analysis for forecasting with Big Data, as this could aid in overcoming one of the major challenges at present: the increased noise distorting the signal in Big Data.

In conclusion, we wish to reinforce the necessity for, and responsibility of, higher educational institutes to incorporate modules and courses which develop the skills required to understand, analyze and forecast with Big Data using a variety of novel techniques. We believe that overcoming the skills constraint should be at the top of the list for ensuring the increased application of relevant techniques for the exploitation and attainment of accurate and lucrative forecasts from Big Data in the future.