1 Introduction

The Big Data phenomenon has revolutionized the modern world and is now the hottest Data Mining topic according to polls conducted by kdnuggets.com, with the current trend expected to continue into the foreseeable future. At present there is no unified definition of Big Data; however, Shi [79] presented two definitions. For academics, Big Data is “a collection of data with complexity, diversity, heterogeneity, and high potential value that are difficult to process and analyze in reasonable time”, whilst for policy makers, Big Data is “a new type of strategic resource in the digital era and the key factor to drive innovation, which is changing the way of humans’ current production and living” ([79], p. 6). As Varian ([92], p. 24) accurately asserts, “Big data will only get bigger”. This increased availability of data, further escalated through the evolution of Big Data, is now a major concern for a large number of industries [49]. It is not entirely surprising that this increasing availability of data is causing anxiety, as is evident from the example presented by [86], where the authors state that in the modern age we generate 70 times the information stored in the Library of Congress within the first 24 hours of a newborn baby’s life. In another report, it is noted that South Korea is upgrading the data storage capabilities of its national weather information system by increasing the capacity to 9.3 petabytes [44]; such examples indicate the rate at which Big Data continues to grow. The times have truly ‘changed’, and we now live in an age where Big Data is identified as the leading edge of innovation, competition and productivity [63]. For example, digital data was expected to grow from 161 exabytes in 2006 to 2837 exabytes in 2012 and is now forecast to reach 40 trillion gigabytes by 2020 [37]. Moreover, in the year 2008 alone the world produced 14.7 exabytes of new data [14].

The emergence of Big Data is now history. What is of importance is how organizations develop the tools and means necessary for reacting to, and exploiting, the increasingly available Big Data to their advantage. In line with this, [92] notes the need to adopt powerful tools such as Data Mining techniques, which can aid in modelling the complex relationships inherent in Big Data. Moreover, the recent financial crisis has seen risk management rise to prominence in organizations, and as [83] states, companies now seek to use risk management as a tool for maximising their opportunities whilst minimising the associated threats. Herein lies the opportunity, as Big Data forecasting has the ability to improve organizational performance whilst enabling better risk management [16]. As [10] states, Big Data and predictive analysis go hand in hand in the modern age, with companies focussing on obtaining real-time forecasts using the increasingly available data.

However, not all authors agree that Big Data is a revolutionary phenomenon. Poynter [72] states that Big Data will be more insightful in simply connecting the dots than in painting a whole new picture. For [94], 2013 was the year for getting accustomed to Big Data and 2014 is the year for truly exploiting Big Data towards lucrative gains. We share and subscribe to the perception of [94] on Big Data. Accordingly, we present this review paper, which aims to: provide an informative review of the techniques utilized for forecasting with Big Data; provide a concise summary of the contributions of yesteryear; and identify challenges which need to be overcome as the world gears up to embrace and live in the presence of Big Data. In the process, we review a wide range of forecasting models which have been adopted for forecasting with Big Data. In order to give the reader a clear understanding of the history, we present the review of applications differentiated by the relevant field (i.e., economics, finance, and energy among others) and topic where appropriate. Those interested in tools which can be used for manipulating Big Data are referred to Varian ([92], Table 1, p. 5).

The remainder of this paper is organized as follows. In the following section we discuss the problems and potential behind Big Data forecasts, whilst the associated challenges are considered in Sect. 3. Section 4 provides a review of statistical and Data Mining techniques that have been evaluated for the purpose of forecasting with Big Data, and the paper ends with some conclusions in Sect. 5.

2 The ‘Problem’ and ‘Potential’ of Big Data Forecasting

There exists a widespread belief that Big Data can aid in improving forecasts provided that we can analyse it and discover hidden patterns, and [76] agree that predictions can be improved through data-driven decision making. Tucker [91] believes Big Data will soon be predicting our every move, and according to [29], Big Data is most commonly sought after for building predictive models in a world where forecasting continues to remain a vital statistical problem [46]. We then come to the question: what is the problem behind forecasting with Big Data? The simplest explanation is that traditional forecasting tools cannot handle the size, speed and complexity inherent in Big Data [62]. According to [3], this is owing to the lack of structure in these data sets and their sheer size. As a result, traditional techniques are seldom preferred for tackling Big Data [3]. Forecasting Big Data therefore poses a challenge to organizations, as further highlighted by the European Central Bank, which has dedicated an entire workshop to using Big Data for forecasting. Laney [59] was the first to discuss the importance of data volume, velocity and variety in the context of Big Data. A decade later, [26] and [73] identify these ‘3Vs’ as the three concepts which define the dimensions of Big Data.

Rey and Wells [75] believe Data Mining techniques can be exploited to help forecasting with Big Data, a view supported by [92]. However, it should be noted that in the past, Data Mining techniques have mainly been used on static data as opposed to time series (see, for example, [11, 39, 45, 58, 74]). Interestingly, [22] finds fault with Big Data for the recent financial crisis, as he believes the financial models adopted were unable to handle the huge amounts of data being input into the systems, which resulted in inaccurate forecasts.

The opportunities for gains through forecasting with Big Data are diverse. At present, there is increased research into using Big Data for obtaining accurate weather forecasts, and the initial results suggest that weather forecasts will benefit immensely [44, 55]. In fact, weather forecasting has been one of the main beneficiaries of Big Data, although forecasts remain inaccurate beyond a week [84]. The fashion industry too is exploiting Big Data forecasts, with companies such as EDITD (http://editd.com/) using data collected from social media to forecast fashion trends [53]. According to [4], the airline industry is yet another field where Big Data forecasting is crucial. An interesting success story is Netflix’s use of Big Data forecasts for decision making prior to commencing production of its own TV show ‘House of Cards’, which resulted in increased revenue for the company. The potential underlying Big Data forecasts is truly astonishing, and at times ‘scary’, as was evident in a story narrated by [27]: an irate customer walks into a ‘Target’ store in Minneapolis to complain about the store sending coupons for pregnancy products to his high-school daughter; a few weeks later the same customer apologizes to the manager as, following a discussion with his daughter, it was revealed that she was in fact pregnant [27].

3 Challenges for Forecasting with Big Data

In this section we focus mainly on the challenges which need to be overcome when forecasting with Big Data. It is imperative to note that the availability of Big Data alone does not constitute the end of problems [4]. A good example is the existence of a vast amount of data on earthquakes alongside the lack of a reliable model that can accurately predict them [84]. Some existing challenges relate to the hypotheses, testing and models utilized for Big Data forecasting [72, 84], whilst [95] identifies the lack of theory to complement Big Data as an added concern. Besides these, we have identified the following varied challenges associated with forecasting Big Data that need to be given due consideration.

3.1 Skills

The skills required for tackling the problem of forecasting with Big Data, and the availability of personnel equipped for this particular task, constitute one of the foremost challenges. As [3] states, the advanced skills required to handle Big Data are a major challenge, whilst [72] notes there is a short supply of data scientists equipped with the skills required to tackle Big Data. Thornton [90] also agrees that there is a shortage of people who can understand Big Data. In a world where academics, researchers and statisticians have relied on traditional statistical techniques for over fifty years to obtain accurate forecasts, the availability of Big Data is in itself challenging. Skupin and Agarwal [85] state that the unsuitability of traditional statistical techniques, which are meant for obtaining forecasts from traditional data, is hindering the effectiveness and application of forecasts from Big Data, and [3] shares this same concern. As the majority of statisticians are experienced in these traditional techniques, [29] point out that it is a challenge to develop the required skills for Big Data forecasting. In order to overcome this issue, it is important that higher educational institutes around the globe give due consideration to upgrading their syllabuses to incorporate the skills necessary for understanding, analysing, evaluating and forecasting with Big Data, so that the next generation of statisticians will be well equipped with the mandatory skills.

3.2 Signal and Noise

A more technical, but extremely important, challenge in Big Data forecasting is identified by [84]. He suggests that noise is distorting the signal in Big Data, and that an increasing noise-to-signal ratio is visible in Big Data. Silver’s [84] notion is further supported by [7], who point out that extracting the signal from large data sets is more complex. A majority of traditional forecasting techniques forecast both the noise and the signal, and whilst they perform relatively well on traditional data sets, the increasing noise-to-signal ratio seen in Big Data is more likely to distort the accuracy of forecasts. This suggests a need for employing and evaluating forecasting techniques which can filter out the noise in Big Data and forecast the signal alone. One such technique is singular spectrum analysis (SSA), which seeks to filter the noise from a given time series, reconstruct a new, less noisy series, and then use this reconstructed series for forecasting future data points. The superiority of SSA over traditional techniques has recently been demonstrated in a variety of fields where the signal-to-noise ratios are comparatively high relative to the much lower signal-to-noise ratio expected in Big Data (see, for example, [47, 48, 50, 81, 82]). Future research should concentrate on evaluating the applicability of such techniques for filtering the noise in Big Data to enable accurate and meaningful forecasts.
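To make the idea concrete, the following minimal sketch (Python, assuming only NumPy; the window length and rank are illustrative choices, not prescriptions) illustrates the basic SSA filtering steps described above: embed the series in a trajectory matrix, decompose it via the singular value decomposition, retain the leading components as the presumed signal, and reconstruct a filtered series by diagonal averaging.

```python
import numpy as np

def ssa_filter(series, window, rank):
    """Basic SSA filtering: embed, decompose, truncate, reconstruct."""
    n = len(series)
    k = n - window + 1
    # Embedding: build the window x k trajectory (Hankel) matrix
    X = np.column_stack([series[i:i + window] for i in range(k)])
    # Decompose the trajectory matrix via SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the leading 'rank' components as the presumed signal
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Diagonal averaging (Hankelization) back to a one-dimensional series
    filtered = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            filtered[i + j] += X_hat[i, j]
            counts[i + j] += 1
    return filtered / counts

# Toy usage: a sinusoidal signal buried in noise
rng = np.random.default_rng(0)
t = np.arange(500)
noisy = np.sin(2 * np.pi * t / 50) + rng.normal(scale=1.0, size=500)
reconstructed = ssa_filter(noisy, window=100, rank=2)
# 'reconstructed' can now be forecast with any standard model
```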

3.3 Hardware and Software

Arribas-Bel [3] was of the view that current statistical software is not able to tackle Big Data forecasting, whilst [67] notes the possible need for supercomputers to handle Big Data forecasts. Recently, [51] have developed automatic forecasting techniques which can provide output within a matter of seconds; however, their reliability in the face of Big Data is yet to be tested. Another issue directly related to hardware and software is that we have personally experienced statistical programs crashing in the face of a few thousand observations, owing to deficiencies in random access memory or the associated software. As such, it is prudent to agree that computing capabilities and the structures underlying statistical software will require enhancements in order to successfully handle the increased data input.

3.4 Statistical Significance

Lohr [60] suggests there is an increased threat of making false discoveries from Big Data. Obtaining forecasts using an appropriate technique may appear to be the major challenge, but it is not the only one. Given the sheer quantity of data that needs to be processed and forecast, with Big Data there is increased complexity in differentiating between randomness and statistically significant outcomes [28]. As such, there is an increased chance of reporting a chance occurrence as a statistically significant outcome and thereby misleading the stakeholders interested in the forecast.
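The scale of the problem is easy to demonstrate. The hedged sketch below (Python; all data is simulated) screens thousands of entirely unrelated predictors against a noise target: at a nominal 5% level, roughly 5% of them appear ‘significant’ by chance alone, and a multiple-testing correction such as the Benjamini–Hochberg procedure is needed to control the false discovery rate.

```python
import numpy as np
from scipy import stats

# With thousands of candidate predictors, some will correlate
# 'significantly' with the target purely by chance.
rng = np.random.default_rng(1)
n_obs, n_predictors = 200, 5000
target = rng.normal(size=n_obs)                       # pure noise target
predictors = rng.normal(size=(n_obs, n_predictors))   # unrelated predictors

p_values = np.array([stats.pearsonr(predictors[:, j], target)[1]
                     for j in range(n_predictors)])

print("nominally significant at 5%:", (p_values < 0.05).sum())  # roughly 250 false hits

# Benjamini-Hochberg false discovery rate control reins these back in
order = np.argsort(p_values)
bh_line = 0.05 * (np.arange(1, n_predictors + 1) / n_predictors)
passed = p_values[order] <= bh_line
n_discoveries = passed.nonzero()[0].max() + 1 if passed.any() else 0
print("significant after BH correction:", n_discoveries)
```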

3.5 Architecture of Algorithms

Data Mining techniques are suggested as important methods which could be used for forecasting with Big Data. However, these techniques have been designed to handle data of comparatively smaller sizes than Big Data. Data Mining algorithms are therefore often unable to work with data that is not loaded into main memory, and thus require the movement of Big Data between locations, which can incur increased network communication costs [52]. The architecture of the analytics needs to be redesigned so that it can handle both historical and real-time data [52], and the Lambda architecture proposed in [64] is a sound example of research currently seeking to overcome this issue. A detailed evaluation of the challenges associated with applying Data Mining techniques to Big Data (explained in the context of official statistics) can be found in [49].
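The essence of the batch/speed split behind the Lambda architecture can be conveyed with a deliberately tiny, hypothetical sketch (a conceptual toy, not the architecture of [64] itself): a complete but slow batch view is recomputed periodically over the historical data, a fast incremental speed view covers records that arrived since the last batch run, and queries merge the two.

```python
from collections import Counter

class LambdaCounter:
    """Toy batch/speed split: all names here are hypothetical."""

    def __init__(self):
        self.batch_view = Counter()   # rebuilt periodically from the master dataset
        self.speed_view = Counter()   # incremental, covers recent data only

    def batch_recompute(self, master_dataset):
        self.batch_view = Counter(master_dataset)  # slow, complete pass
        self.speed_view.clear()                    # recent data now in batch view

    def ingest(self, record):
        self.speed_view[record] += 1               # fast, incremental path

    def query(self, key):
        return self.batch_view[key] + self.speed_view[key]  # merged answer

events = ["gdp_query", "cpi_query", "gdp_query"]
store = LambdaCounter()
store.batch_recompute(events)
store.ingest("gdp_query")                          # arrives after the batch run
print(store.query("gdp_query"))                    # 3
```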

3.6 Big Data

Big Data itself is a challenge for forecasting as a result of its inherent characteristics. Firstly, Big Data evolves and changes in real time, and as such it is important that the techniques used to forecast Big Data are able to transform unstructured data into structured data [79], accurately capture these dynamic changes, and detect change points in advance. Secondly, there are challenges stemming from Big Data’s highly complex structure: as [29] point out, it is a challenge to build forecasting models that do not produce poor out-of-sample forecasts owing to the ‘over-use’ of potential predictors. Factor modelling, which is discussed in the following section, is a potential cure for this challenge, but more dedicated research is needed to overcome the issue completely.

4 Applications of Statistical and Data Mining Techniques for Big Data Forecasting

In this section we identify existing applications of statistical and Data Mining techniques for forecasting with Big Data. We have summarized these by the related field and topic (where relevant) in order to provide the reader with a more rewarding experience. At the outset, it is worth noting that [34] and [87] are closely associated with the development of econometric techniques for analysis and forecasting with Big Data.

4.1 Forecasting with Big Data in Economics

Researchers in the field of Economics have been major exploiters of Big Data for forecasting various economic variables. Camacho and Sancho [17] used a dynamic factor model (DFM) based on the methodology presented in [87] to forecast a large dataset of Spanish diffusion indexes which they describe as an exhaustive description of the Spanish economy. DFM models extend the factor models of [87] and are frequently used for forecasting with Big Data. However, [24] asserted that the use of the DFM for macroeconomic Big Data forecasting is flawed, as it is based on linear models whereas Big Data is more likely to be nonlinear. Over time, through the work of [35, 88] and [54], the DFM technique was improved, enabling it to handle Big Data more appropriately.
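As a hedged illustration of the basic workflow (not the exact specification of any study cited here), the sketch below fits a small DFM to a large panel of standardized monthly indicators using the statsmodels library in Python; the file name and the choice of two factors are hypothetical placeholders.

```python
import pandas as pd
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Hypothetical panel of monthly macroeconomic indicators (dates x series)
indicators = pd.read_csv("monthly_indicators.csv", index_col=0, parse_dates=True)
indicators = (indicators - indicators.mean()) / indicators.std()  # standardize

# A small number of latent factors summarizes the large cross-section
model = DynamicFactor(indicators, k_factors=2, factor_order=1)
results = model.fit(maxiter=1000, disp=False)

print(results.summary())
forecasts = results.forecast(steps=12)   # 12-month-ahead forecasts for all series
```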

The application of Maximum Likelihood estimation of Factor Models for Big Data forecasting has been evaluated by [25] via a simulation study, in which the authors find the approach to be effective and efficient. A seasonal AR model was used in [21] to show how Big Data from the Google search engine can be used to predict economic indicators. Gupta et al. [42] used a multivariate factor-augmented Bayesian shrinkage model on Big Data comprising 143 monthly time series to forecast employment in eight sectors of the U.S. economy. Big Data relating to various exchange rates are used to forecast the Euro, British Pound and Japanese Yen in [8], where the authors find that their proposed factor-augmented Error Correction Model (FECM) outperforms a factor-augmented VAR (FAVAR) model at accurately predicting the three major bilateral exchange rates. Bańbura et al. [5] proposed an algorithm based on Kalman filtering for large VAR and DFM models which enables conditional forecasts, and provided a scenario analysis for the European economy using 26 macroeconomic and financial indicators for the Euro area.

In what follows, we further group the applications of Data Mining and statistical techniques for forecasting with Big Data in the field of Economics into topics, based on the economic variable being forecast.

4.1.1 Gross Domestic Product (GDP)

Schumacher [77] evaluated the forecasting performance of Factor models using Static and Dynamic Principal Components and Subspace algorithms for State Space models. He finds Factor models outperforming AR models at forecasting a large panel of quarterly time series relating to German GDP. Moreover, the Subspace Factor Model and Dynamic Principal Component model are seen to outperform the Static Factor Model, but this ranking depends greatly on the correct specification of the model parameters [77]. A large factor model which uses an expectation maximization algorithm combined with Principal Components is adopted in [78] to forecast a large dataset comprising German real-time GDP data. They find the Mixed Frequency Factor model performing better than simple benchmark models, but find meagre differences in forecast accuracy between the Factor models themselves. Biau and D’Elia [12] apply the ensemble machine learning technique of Random Forests to forecast European Union GDP using large survey datasets. They find Random Forests outperforming the AR model and the forecasts from the ‘Euro zone economic outlook’, and also note that Random Forests are popular for their ability to avoid over-fitting when handling a large number of inputs. Altissimo et al. [2] use monthly accumulated Big Data along with a modified DFM to forecast medium- to long-run GDP growth in the Euro area and find that their model performs better than the Bandpass Filter in terms of fitting and change point detection. Carriero et al. [18] adopted a Bayesian Mixed Frequency model in combination with stochastic volatility for nowcasting with Big Data to obtain real-time GDP predictions for the United States. Banerjee et al. [8] used 90 monthly time series for the German economy and showed that a FECM can outperform a FAVAR model at forecasting real GDP in Germany. Kopoin et al. [57] used factor models with national and international Big Data to improve the accuracy of GDP forecasts for Canadian provinces at horizons below one year; beyond the one-year-ahead horizon, they find that relying on the provincial data alone optimizes the forecasts. Bańbura and Modugno [7] use factor models with maximum likelihood estimation on over 101 series for nowcasting GDP in the Euro area, and find that sectoral information is not mandatory for obtaining accurate GDP predictions in the Euro area.
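For readers unfamiliar with applying Random Forests to forecasting, the following hedged sketch (Python with scikit-learn; the CSV file and column names are hypothetical placeholders, not the dataset of [12]) shows the standard recipe: recast the problem as supervised regression on lagged values plus auxiliary predictors, and hold out the most recent observations as a pseudo out-of-sample test.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical quarterly data: GDP growth plus survey indicator columns
df = pd.read_csv("gdp_and_surveys.csv", index_col=0, parse_dates=True)

# Build a design matrix of lagged GDP growth alongside the survey predictors
for lag in (1, 2, 3, 4):
    df[f"gdp_lag{lag}"] = df["gdp_growth"].shift(lag)
df = df.dropna()

X = df.drop(columns="gdp_growth")
y = df["gdp_growth"]

# Keep the last 8 quarters as a pseudo out-of-sample test set
X_train, X_test = X.iloc[:-8], X.iloc[-8:]
y_train, y_test = y.iloc[:-8], y.iloc[-8:]

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
print("test RMSE:", np.sqrt(np.mean((rf.predict(X_test) - y_test) ** 2)))
```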

4.1.2 Monetary Policy

In [9], a FAVAR model was used for Big Data forecasting and structural analysis in order to accurately identify the monetary policy transmission mechanism, so that the exact impact of monetary policy on the economy could be ascertained. The authors find the proposed FAVAR model outperforming the Structural VAR model by exploiting far more informative content for assessing the monetary policy transmission mechanism. De Mol et al. [23] use a Bayesian regression model with macroeconomic Big Data which includes real and nominal variables, asset prices, surveys and yield curves for forecasting the industrial production and consumer price indices. They find that the results from the Bayesian regression are highly correlated with forecasts from principal components. Alessi et al. [1] exploit a monthly panel dataset comprising 130 U.S. macroeconomic time series and four price indexes (PCE, PCE core, CPI and CPI core) for forecasting inflation and its volatility using a DFM in combination with multivariate GARCH models (DF-GARCH). They find the DF-GARCH model outperforming GARCH, AR(\(p\)) and AR(\(p\))-GARCH(1,1) models as well as other univariate and classical factor models. The work of [23] was extended in [6] to show that combining VAR with Bayesian Shrinkage can improve forecasts; the authors conclude that Bayesian VAR models are appropriate for Big panel Data. Bordoloi et al. [13] developed a DFM to forecast India’s industrial production and price level, citing the DFM’s ability to handle the many variables found in Big Data as the reason behind its selection; they found the DFM outperforming an ordinary least squares model. Figueiredo [32] exploits 368 monthly time series, which include a variety of economic variables, alongside a factor model with targeted predictors (FTP) for forecasting Brazilian inflation, and finds the FTP outperforming VAR, Bayesian VAR and a Principal Component based Factor model. Carriero et al. [20] consider Big Data relating to 52 U.S. macroeconomic time series taken from [88] along with a Bayesian Reduced Rank multivariate model for forecasting the industrial production and consumer price indices and the federal funds rate. They compare their results against models based on Rank Reduction, which include Bayesian VAR models, Multivariate Boosting and the Factor model from [87], and find that combining Rank Reduction with Shrinkage can improve the forecasts attained from Big Data. Giovanelli [41] proposes the use of kernel principal component analysis (PCA), as this enables factors to take a nonlinear relationship to the input variables, together with an Artificial Neural Network (ANN) model, on Big Data containing 259 predictors for the Euro area and 131 predictors for the U.S. economy, for forecasting the industrial production and consumer price indices. The author finds that using the kernel PCA approach for predicting nonlinear factors yields better results than the linear method, and that the ANN method reports forecasts similar to those obtained via the Factor Augmented Linear forecasting equation. In [19], a large BVAR model coupled with optimised shrinkage towards univariate AR models is used to forecast interest rates; the authors find the BVAR model showing small gains over random walk forecasts. Banerjee et al. [8] used a FECM and showed that it can outperform a FAVAR model at forecasting U.S. inflation, using a large panel of 132 U.S. macroeconomic variables, and German inflation and interest rates, using 90 monthly series for the German economy. Ouysse [70] compared Bayesian model averaging (BMA) and principal component regression (PCR) on a large panel data set for forecasting U.S. inflation and industrial production; based on the Root Mean Squared Error, the author concludes that, in general, PCR can provide more accurate forecasts than BMA. Koop [56] exploits the U.S. macroeconomic data set found in [89], which includes 168 variables, along with a BVAR model for forecasting inflation and interest rates, and finds that BVAR models can provide better forecasts than those attainable via Factor methods. Using the U.S. Treasury zero-coupon yield curve estimates, [8] showed that a FECM can outperform a FAVAR model at forecasting interest rates at different maturities.
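To illustrate the kernel PCA idea in [41] without claiming to reproduce the paper's specification, the hedged sketch below (Python with scikit-learn; the file, column names, kernel choice and number of factors are hypothetical) extracts nonlinear factors from a large predictor panel and plugs them into a simple one-step-ahead forecasting regression.

```python
import pandas as pd
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical panel: many predictors plus the target series
panel = pd.read_csv("us_predictors.csv", index_col=0, parse_dates=True)
target = panel.pop("industrial_production")

X = StandardScaler().fit_transform(panel)

# Nonlinear factors via an RBF kernel (linear PCA is the special case kernel="linear")
factors = KernelPCA(n_components=8, kernel="rbf").fit_transform(X)

# One-step-ahead forecasting equation: regress next period's target on today's factors
reg = LinearRegression().fit(factors[:-1], target.values[1:])
print("one-step-ahead forecast:", reg.predict(factors[[-1]])[0])
```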

4.2 Forecasting with Big Data in Finance

Alessi et al. [1] use their DF-GARCH model for forecasting financial asset returns using Big Data relating to transaction prices of stocks traded on the London Stock Exchange, after cleaning the data of outliers. They find the DF-GARCH model outperforming a GARCH(1,1) model, and that the full BEKK specification [31] provides better forecasts than the DCC specification [30].
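For context, the univariate GARCH(1,1) benchmark referred to above can be fitted in a few lines; the hedged sketch below uses the Python ‘arch’ package, with a hypothetical price file standing in for the actual transaction data of [1].

```python
import pandas as pd
from arch import arch_model

# Hypothetical daily closing prices; convert to percentage returns
prices = pd.read_csv("stock_prices.csv", index_col=0, parse_dates=True)["close"]
returns = 100 * prices.pct_change().dropna()

# Constant-mean GARCH(1,1) volatility model
am = arch_model(returns, vol="GARCH", p=1, q=1, mean="Constant")
res = am.fit(disp="off")

# Five-day-ahead conditional variance forecasts
print(res.forecast(horizon=5).variance.iloc[-1])
```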

4.3 Forecasting with Big Data in Population Dynamics

An imputation model based on Neural Networks was applied to the Norwegian population census data of 1990 in order to perform a population census by combining administrative data with data gathered through sample surveys [68]. A procedure based on Neural Networks was used by [36] to predict trends in Spanish economic indexes per household and censal section using the Spanish Population and Housing Census and the Family Expenditure Survey. Bayesian regression was used by [71] for predicting long-term illness in Stockport, UK, using statistics from the 1991 census. Cluster Analysis was used as a method for predicting missing data by analysing the 2007 census donor pool screening in [65]. Unlikely representations of farming operations in the initial Census mail list have been predicted using Classification Trees according to [38]. Gilary [40] reports that the US Census Bureau exploited the Decision Trees technique by combining a Stepwise regression with the classification and regression tree concept for recursive partitioning of racial classification cells. Moreover, there is evidence of Decision Trees being used to forecast survey non-respondents through the work of [66].

4.4 Forecasting with Big Data in Crime

Wu et al. [96] rely on a Kohonen Neural Network Clustering algorithm to find outliers and then forecast fraudulent behaviour in the data-intensive Chinese telecom industry, after evaluating its performance against a two-step Clustering algorithm and the K-means algorithm.

4.5 Forecasting with Big Data in Energy

Wang [93] uses Support Vector Machines as an auxiliary method, along with Neural Networks and ‘MapReduce’ technology, for forecasting Big Data originating from China’s electricity consumption, and finds that the developed prediction model provides sound portability and feasibility in terms of processing Big Data relating to electricity. Nguyen and Nabney [69] evaluate the use of the wavelet transform (WT) in combination with a variety of models such as GARCH, linear regressions, radial basis functions and multilayer perceptrons (MLP) to forecast UK gas prices and electricity demand by exploiting Big Data from the British energy markets. They find that the use of WT and adaptive models can provide considerable improvements in forecasting accuracy; the conclusion is that adaptive models combining WT with either MLP or GARCH are the optimal models for forecasting gas price and electricity demand, based on the lowest mean squared error. Fischer et al. [33] evaluate the use of Exponential Smoothing and ARIMA models in combination with a model configuration advisor to forecast energy demand using Big Data from an energy domain.
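As a hedged illustration of the preprocessing role the WT plays in such hybrid models (a generic denoising recipe, not the exact scheme of [69]), the sketch below uses the Python PyWavelets package to decompose a simulated demand series, soft-threshold the detail coefficients, and reconstruct a smoother series for a downstream forecasting model.

```python
import numpy as np
import pywt

# Simulated stand-in for an electricity demand series (weekly cycle plus noise)
rng = np.random.default_rng(0)
t = np.arange(1024)
demand = 50 + 10 * np.sin(2 * np.pi * t / 48) + rng.normal(scale=3, size=1024)

# Multilevel discrete wavelet decomposition
coeffs = pywt.wavedec(demand, wavelet="db4", level=4)

# Universal threshold estimated from the finest-scale detail coefficients
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
threshold = sigma * np.sqrt(2 * np.log(len(demand)))
coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]

denoised = pywt.waverec(coeffs, wavelet="db4")[: len(demand)]
# 'denoised' can now serve as the input series for an MLP, GARCH, or similar model
```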

4.6 Forecasting with Big Data in Environment

Sigrist et al. [80] utilize stochastic advection-diffusion partial differential equations (SPDEs) to improve precipitation forecasts for northern Switzerland using Big Data from a numerical weather prediction model. They find that following the application of the SPDE approach, the forecasts are greatly improved in comparison to the raw forecasts attained via the numerical model.

4.7 Forecasting with Big Data in Biomedical Science

Lutz and Bühlmann [61] provide theoretical evidence for the applicability of Multivariate Boosting to forecasting with Big Data. They propose a Multivariate \(L_{2}\) Boosting method for multivariate regression, which can also be applied to a VAR series, and use an application to 795 Arabidopsis thaliana genes to demonstrate the appropriateness of the proposed method.
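To convey the mechanics, the hedged sketch below implements plain componentwise \(L_{2}\) boosting for a single response in Python/NumPy; this is the univariate building block, and the multivariate method of [61] extends the same residual-fitting idea to matrix-valued responses. All data is simulated.

```python
import numpy as np

def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting: repeatedly fit the best single predictor
    to the current residuals and take a small (shrunken) step towards it."""
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    residual = y - intercept
    for _ in range(n_steps):
        # Base learner: per-predictor OLS slope against the residual
        betas = X.T @ residual / (X ** 2).sum(axis=0)
        scores = (X ** 2).sum(axis=0) * betas ** 2   # residual SS reduction
        j = np.argmax(scores)                        # best-fitting predictor
        coef[j] += nu * betas[j]                     # shrunken update
        residual -= nu * betas[j] * X[:, j]
    return intercept, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = 2 * X[:, 3] - X[:, 17] + rng.normal(size=300)
intercept, coef = l2_boost(X, y)
print(np.argsort(-np.abs(coef))[:2])   # should recover predictors 3 and 17
```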

4.8 Forecasting with Big Data in Media

Using data on hundreds of thousands of YouTube videos, [43] show that an ARMA model combined with Singular Value Decomposition (SVD) can be used for analyzing and forecasting video access patterns. They find that for rarely accessed videos Hierarchical Clustering can provide better forecasts, whilst for daily accessed videos PCA can provide an efficient forecast.
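A hedged sketch of the general SVD-plus-time-series idea (synthetic data throughout, not the pipeline of [43]): factorize the video-by-day access matrix, forecast the dominant temporal component with a low-order ARMA model, and map it back to per-video predictions via the loadings.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic access matrix: 1000 videos x 120 days with a shared weekly cycle
rng = np.random.default_rng(0)
n_videos, n_days = 1000, 120
daily_pattern = 100 + 20 * np.sin(2 * np.pi * np.arange(n_days) / 7)
popularity = rng.lognormal(size=n_videos)
access = np.outer(popularity, daily_pattern) + rng.normal(scale=5, size=(n_videos, n_days))

# Truncated SVD: U holds per-video loadings, rows of Vt are temporal patterns
U, s, Vt = np.linalg.svd(access, full_matrices=False)
temporal = s[0] * Vt[0]                     # dominant shared access pattern

# Forecast the shared pattern with a low-order ARMA model
fit = ARIMA(temporal, order=(2, 0, 1)).fit()
pattern_forecast = fit.forecast(steps=7)

# Per-video forecasts follow from each video's loading on the pattern
video_forecasts = np.outer(U[:, 0], pattern_forecast)
```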

5 Conclusions

Big Data will continue to grow even bigger in the years to come, and organizations which are not inclined and willing to embrace the challenges and develop and employ the requisite skills will find themselves in dire straits. In this review, which is focussed on forecasting with Big Data, we first identified several problems and outlined the potential that Big Data has to offer, and the lucrative outcomes it can generate, provided that sufficient time and effort are devoted to overcoming the identified issues. Thereafter we noted a set of key challenges which at present hinder the accuracy and effectiveness of Big Data forecasts.

In terms of the applications of statistical and Data Mining techniques for forecasting with Big Data, the past literature makes it evident that Factor models are the most common and popular tool currently used for Big Data forecasting, whilst Neural Networks and Bayesian models are two other popular choices. The review also finds the field of Economics to be the most popular field in terms of exploiting Big Data for forecasting variables of interest, with the topics of GDP and Monetary Policy receiving the majority of the attention; the fields of Population Dynamics and Energy appear to be the second and third most popular based on published research. It is evident that there remains vast scope for research into forecasting with Big Data, and that such work has the potential to yield better techniques which can enhance forecasting accuracy. For example, it would be interesting to evaluate the use of a noise filtering technique such as Multivariate Singular Spectrum Analysis for forecasting with Big Data, as this could aid in overcoming one of the major challenges at present: the increased noise distorting the signal in Big Data.

In conclusion, we wish to reinforce the necessity for, and responsibility of, higher educational institutes to incorporate modules and courses which develop the skills required to understand, analyze and forecast with Big Data using a variety of novel techniques. We believe that overcoming the skills constraint should be at the top of the list for ensuring the increased application of relevant techniques for the exploitation and attainment of accurate and lucrative forecasts from Big Data in the future.