A Comparative Study of Demand Forecasting Models for a Multi-Channel Retail Company: A Novel Hybrid Machine Learning Approach

Mitra, Arnab; Jain, Arnav; Kishore, Avinash; Kumar, Pravin

doi:10.1007/s43069-022-00166-4


Original Research
Published: 27 September 2022

Volume 3, article number 58, (2022)
Operations Research Forum

Arnab Mitra¹,
Arnav Jain¹,
Avinash Kishore¹ &
…
Pravin Kumar ORCID: orcid.org/0000-0002-3378-7087¹


Abstract

Demand forecasting has been a major concern of operational strategy for managing inventory and optimizing the customer satisfaction level. Researchers have proposed many conventional and advanced forecasting techniques, but none achieves complete accuracy. Forecasting is equally important in manufacturing and retail companies. In this study, the performances of five machine learning regression techniques, viz. random forest (RF), extreme gradient boosting (XGBoost), gradient boosting, adaptive boosting (AdaBoost), and artificial neural network (ANN) algorithms, are compared with a proposed hybrid (RF-XGBoost-LR) model for sales forecasting of a retail chain across several measures of forecasting accuracy. The weekly sales data of a US-based retail company is analysed, taking into account attributes that affect sales, such as the temperature of the region and the size of the store. It is observed that the hybrid RF-XGBoost-LR model outperformed the other models on the performance metrics considered. This study may help industry decision-makers to understand and improve their forecasting methods.


1 Introduction

In the present business world, organizations must improve their services in terms of efficiency, reliability, and availability to survive in the market. Sales forecasting and effective demand planning positively affect the performance of a supply chain [1]. The goal of demand planning is to develop a forecasting model that helps decision-makers in the areas of procurement, production, distribution, and sales. Forecasts serve as the basis for action plans carried out by various organizational units at different planning levels [2]. Managers usually predict sales based on their perceptions, intuitions, and experience. This may vary from person to person and it is very hard to continuously get reliable inputs from qualified and experienced managers. As a result, computer networks can assist in decision-making by estimating future sales. Machine learning (ML) can be utilized to create effective sales forecasting models utilizing the vast amount of data and related information [3].

ML techniques have become prevalent across various disciplines due to their ability to address the problems associated with increasingly large and complex datasets [4]. It involves complex algorithms to reveal meaningful patterns in large-scale and diverse datasets, which would be virtually impossible for a well-trained person [5]. Machine learning focuses on inductive inference inducing general models from specific empirical data [6]. In recent years, advancements in this field have been driven primarily by the creation of new algorithms as well as the ongoing burst of data at reduced computational costs [7].

ML methods empowered by predictive analysis enhance customer engagement and forecast demand with better precision and accuracy than traditional demand forecasting methods [8, 9]. ML techniques can handle complicated correlations among many causal factors with nonlinear demand patterns, thereby boosting retail chain performance [10]. The auto-regressive integrated moving average (ARIMA) and auto-regressive integrated moving average with exogenous variables (ARIMAX) approaches are the most often used predictive models for demand forecasting. Recently developed ML algorithms, such as artificial neural networks (ANN), support vector machines (SVM), and regression trees, have already outperformed the traditional methods [11]. The main objectives of the study are as follows:

To explore the machine learning models used for forecasting.
To compare the selected machine learning models and the hybrid model for sales forecasting of a US-based retail company.

In this study, several ML models are compared for retail demand forecasting. These models include random forest (RF), artificial neural network (ANN), gradient boosting (GB), adaptive boosting (AdaBoost), and extreme gradient boosting (XGBoost), and their performance is compared with that of the proposed hybrid model of RF, XGBoost, and linear regression (LR). To compare the models, three performance metrics, namely mean squared error (MSE), R² score, and mean absolute error (MAE), were considered. The advantages and limitations of the employed methodologies as well as future options for performance enhancement are explored. The historical sales data of a leading US-based multinational retail company is considered for the forecasting analysis. The company has a large number of retail stores across the globe, offering a wide range of products that meet the day-to-day demands of consumers. The sales data used for forecasting relates to various stores of the company spread across the USA.

The rest of the paper is organized as follows: Sect. 2 presents the literature review, Sect. 3 discusses the research methodology, Sect. 4 presents the case study and discusses the results, and Sect. 5 concludes the study.

2 Literature Review

Many advances and changes have been observed in the global business world during the past couple of decades. Some major factors leading to business-related uncertainties include partner activities, consumer behaviour, rival behaviour, evolving technology, and new product development [12]. Because of these uncertainties, the market is becoming complex and competitive and needs contemporary supply chain management [13]. Precise forecasting is essential for the success of supply chains [14]. On the other hand, various endogenous factors concerning the collection and application of field data can make forecasting extremely difficult, and external factors can also have a detrimental impact on forecast accuracy [15]. Effective demand forecasting can save more than 7% on the annual operating expenditures of a business [16]. Either qualitative or quantitative techniques can be employed for demand forecasting [17]. Executives’ consensus, the Delphi technique, historical analogies, and market research are used as qualitative demand forecasting techniques. These techniques rely heavily on domain experts’ subjective evaluations and lack data-driven decision models. Quantitative methods, such as regression and time-series analytics, tend to be more systematic and dependable [18]. Regression methods, in particular, are concerned with determining the causal relationships between the independent and the dependent variables [19].

The notions of Industry 4.0, ML, and artificial intelligence (AI) are now being applied as an innovative framework to supply chain analytics [20, 21]. Strategic planning, ad-hoc reporting, and end-user computation are common in business intelligence and analytics and aid in robust performance evaluation and management [22, 23]. Descriptive analytics deals with what happened in the past, diagnostic analytics with why it happened, and predictive analytics with what is likely to happen in the future. Prescriptive analytics then recommends actions, for example to mitigate harmful impacts [24, 25].

2.1 Machine Learning Methods

ML methods have garnered considerable interest from researchers and practitioners in demand forecasting [26]. But, when it comes to the context of supply chain management, these methods have not been investigated properly. Although algorithms for ML are complex, they offer a variety of distinct and flexible demand forecasting models [27]. The effectiveness of different statistical and ML methods is heavily debated, making it difficult to draw gross generalizations regarding their efficacy [28]. Under varied circumstances, each model class may outperform the others [29]. Some of the most prominent ML methods are discussed in brief in the following paragraphs which have been compared with the proposed hybrid model.

Random Forest

Punia et al. [30] introduced a hybrid forecasting method based on long short-term memory (LSTM) and random forest (RF). The model first uses an LSTM network to map the temporal characteristics of the data, and a random forest is then used to model the residuals of the LSTM network. The random forest component is of vital significance, as it provides a substantial edge in forecasting accuracy through its ability to predict sudden changes caused by holidays, promotions, etc.

XGBoost

Kang et al. [31] used an XGBoost hybrid model for tourism and trend prediction. They introduced location attributes and the time-lag effect of network search data to propose the hybrid model. The findings suggest that the spatiotemporal XGBoost composite model outperforms single forecasting approaches. There are several tunable parameters in the XGBoost algorithm, including general, booster, and learning task parameters. They adopted the tree model to concentrate on the nonlinear interactions between the Baidu index and the number of tourists. Given the general parameters, the booster parameters were adjusted according to the model chosen.

Gradient Boosting Machine

Xenochristou et al. [32] measured the influence of the spatial scale on water demand forecasting. Multiple models were trained on UK daily consumption records for different aggregations of consumption. Three different levels of spatial aggregation were created using properties’ postcodes. A gradient boosting model was trained on each of the configurations, and water consumption was predicted 1 day into the future. The results implied that the amount of spatial aggregation had a substantial influence on forecasting accuracy and that errors can be minimized by utilizing additional explanatory variables.

AdaBoost

Walker and Jiang [33] observed that forecasts using AdaBoost are more accurate and reliable than those derived via a more traditional logistic regression method. They analysed the importance of each predictor in the AdaBoost model to better understand the relative contribution of each factor to the overall predicted outcome. They also observed that AdaBoost models are not as easily interpretable as regression model coefficients.

ANN

Jahangir et al. [34] used a rough artificial neural network (R-ANN) approach to forecast plug-in electric vehicle travel behaviour (PEVs-TB) and the PEV load on the distribution network. R-ANNs can increase forecasting accuracy owing to their capacity to analyse phenomena with high uncertainty. Furthermore, two training methods are used in that paper, conventional error back propagation (CEBP) and Levenberg–Marquardt, which are specified using first- and second-order derivatives, respectively. For PEVs-TB, the results demonstrated that the Levenberg–Marquardt approach is more accurate.

It is observed that different authors report varying levels of performance for the different forecasting models, and that performance varies from case to case. It is therefore very difficult to say which model performs best. In this study, RF, ANN, GB, XGBoost, AdaBoost, and hybrid models are tested and compared with each other for the sales forecasting of a US-based retail company.

The hybrid network has been developed by the separate training of the random forest and the XGBoost model on 67% of the data. Then a new dataset was created by generating the prediction from both the models for all the data points. This new dataset was passed as input to train the linear regression model and it predicted the final sales values. Some of the most common machine learning models used in forecasting are summarized in Table 1.

Table 1 Applications of machine learning models in forecasting


2.2 Measurement of Forecasting Accuracy

Chicco et al. [65] used healthcare information to forecast rates of obesity. R² was observed to be more accurate and complete than symmetric mean absolute percentage error (SMAPE). The value of R² becomes high if the analysis accurately predicts the majority of ground truth entities for every ground truth category taking into account their dispersion. The accuracy of various machine learning/deep learning models is compared using MSE, MAE, mean absolute percentage error (MAPE), root mean squared error (RMSE), and R² metrics [64]. Tsoumakas [3] examined the present state of ML algorithms for forecasting food-purchasing habits. It covers essential design concerns for forecasting food sales, such as the temporal granularity of sales figures, the intake factors to use for predicting sales, and the depiction of the sales evaluation function. It also looks at machine learning algorithms for predicting food sales and important measures like MAE, MSE, RMSE, and MASE, for evaluating forecasting accuracy. Ala’raj et al. [66] used Covid infection data to model and forecast Covid-19 outbreaks. They utilized a modified SEIRD (Susceptible, Exposed, Infectious, Recovered, and Dead) dynamic model and ARIMA model for prediction. The model prediction accuracy was estimated by using 5 metrics: AE, MSE, MLSE (maximum likelihood sequence estimation), normalized MAE, and normalized MSE. Ramos et al. [67] examined the efficiency of phase space models and ARIMA regressors as a tool for predictions of retail sales of five different types of women’s footwear: boots, booties, flats, sandals, and shoes. RMSE, MAE, and MAPE were used to evaluate the ARIMA model.

3 Methodology

In this study, the performances of XGBoost, RF, ANN, gradient boosting, AdaBoost, and the proposed hybrid framework (RF-XGBoost-LR) are compared using several performance metrics, namely, MAE, MSE, and R² score (coefficient of determination). XGBoost, RF, gradient boosting, and AdaBoost are ensemble techniques built on top of decision trees, while ANN is a deep learning technique. The framework of a decision tree (DT) comprises a root node (topmost node), internal nodes, and leaf nodes (end nodes). Simple principles are used in DT algorithms to branch out of the root node, passing through internal nodes and eventually ending in the leaves [68]. In this work, Python 3.7.12 was utilized. For data handling, Pandas version 1.1.5 and Numpy version 1.19.5 were used. For model training, XGBoost version 0.90 and Scikit-learn library version 1.0.2 were used.

3.1 Proposed Framework

In this study, a hybrid model of RF-XGBoost-LR is proposed and its performance is compared with other individual models.

Bagging vs Boosting

The primary distinction between bagging and boosting is that the former decreases the variance in prediction by generating additional training data from the dataset, sampling with replacement to produce multi-sets of the original data, while the latter iteratively adjusts the weight of each observation based on the result of the previous model. Unlike bagging, in which every sample is equally likely to be selected when building a training dataset, boosting selects samples with unequal probability. Misclassified or inaccurately predicted samples carry a higher weight and are therefore more likely to be selected. As a result, each new model can focus on the samples that earlier models handled poorly [69].

Random Forest

RF is an ensemble technique in which the results of many regression trees are combined to generate a single prediction. The primary premise is bagging, in which a sample of training data is selected at random and fitted into a regression tree [70]. This randomly selected sample is termed a bootstrap sample and it is chosen with replacement, meaning any previously chosen data point can be chosen again. A bootstrap sample can be made by drawing N data points at random from a dataset of size N, returning each drawn point to the dataset before the next draw. On every draw, each data point has a 1/N chance of being chosen.
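The bootstrap sampling step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code; the function name `bootstrap_sample` and the toy dataset are assumptions made for the example.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    n = len(data)
    # Each draw picks any index with probability 1/n, so repeats are possible.
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)          # fixed seed only for reproducibility
data = list(range(10))           # toy dataset (hypothetical)
sample = bootstrap_sample(data, rng)

# Fraction of distinct original points that made it into the sample.
unique_fraction = len(set(sample)) / len(data)
```

Because points are drawn with replacement, some appear more than once and others not at all; for large N, roughly 63% of the distinct points are expected to appear in any one bootstrap sample.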

RF is a combination of decision tree estimators $\left\{h\left(X,\Theta_{i}\right),\ i=1,2,\ldots,k\right\}$, in which every decision tree is built using a random vector $\Theta_{i}$ that is sampled independently and with the same distribution for all the decision trees in the forest.

Once the training is complete, the result of the entire set of decision trees on sample X′ is averaged to generate predictions as shown in Eq. (1).

$$\widehat{f}=\frac{1}{k}\sum\limits_{i=1}^{k}h\left(X^{\prime},\Theta_{i}\right)$$

(1)

where $\widehat{f}$ is the final prediction and k is the number of decision trees.
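As a small illustration of Eq. (1), the averaging step can be written as follows. The three lambda functions are hypothetical stand-ins for trained regression trees, and `forest_predict` is a name chosen for this sketch rather than anything from the paper.

```python
def forest_predict(trees, x):
    """Average the predictions of all trees for one sample x, as in Eq. (1)."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical stand-ins for k = 3 trained regression trees. Each one is
# slightly off in a different direction, as trees fitted on different
# bootstrap samples would be.
trees = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 0.3,
    lambda x: 2.0 * x - 0.3,
]

prediction = forest_predict(trees, 1.5)
```

In this toy case the individual deviations cancel in the average, which is the intuition behind the variance reduction attributed to bagging.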

XGBoost

XGBoost, short for ‘extreme gradient boosting’, improves upon gradient boosting. XGBoost enhances performance and is capable of solving problems of real-world scale while using a minimum of resources [71]. XGBoost is a parallel tree model built upon the gradient boosting model. It utilizes the tree ensemble method, which is made up of a series of classification and regression trees (CART). Although XGBoost is similar to gradient-boosted decision trees (GBDT), it has distinguishing characteristics such as a second-order Taylor expansion of the loss and built-in regularization [72]. XGBoost models have the advantage of scaling effectively for different scenarios while requiring fewer resources than existing prediction models. Within XGBoost, parallel and distributed computation speeds up model learning and allows for more rapid model exploration.

Hybrid (RF-XGBoost-LR) Model

RF makes parallel decision trees which help in reducing the overfitting problem. Accuracy improves as a result of the reduction in variance. In RF, individually separate decision trees are used for each of the multiple copies of original training data. Despite its widespread popularity, random forest suffers from conceptual and practical shortcomings. Random forest adaptive learning is inherently poor in terms of minimizing training error. In particular, each tree is learned autonomously. Complementary information from other trees is not fully realized in this kind of training [73]. It results in a reduction in model performance. XGBoost combines multiple weak learners in a sequential method, which iteratively improves model performance.

XGBoost is a boosting technique. It takes advantage of parallel processing and runs the model on several CPU cores. However, it is affected by the well-known overfitting problem in boosting, which also affects multiple additive regression trees (MARTs). This problem arises because only a few trees are available early in the iteration process; as a result, those trees have a disproportionately large impact on the model [74]. Overfitting the training data degrades the model’s generalization capabilities, resulting in unreliable performance when applied to novel measurements. High-variance, low-bias estimators are common manifestations of overfitting. The extra complexity may, and frequently does, aid the model’s performance on a set of training data, but it inhibits the prediction of future points [75]. Overfitting gives an overly optimistic impression of prediction results on new data drawn from the underlying population [76].

Hybrid models are combinations of two or more single machine learning or soft computing models that achieve higher flexibility and capability than a single model. A hybrid model often comprises two components, one for prediction and one for optimizing that prediction, to achieve higher accuracy. There are two key reasons to develop a hybrid model:

(i) To eliminate the risk of an unfortunate prediction of a single forecast in some specific conditions.
(ii) To improve upon the performance of the independent models.

A hybrid model is designed to reap the advantages and overcome the shortcomings of the individual models involved [77]. In this research, a hybrid ML model is proposed that combines a random forest regressor, which is a bagging technique, with an XGBoost regressor, which is a boosting technique.

A hybrid model has been developed to overcome the shortcomings of both the models, i.e. RF and XGBoost, as shown in Fig. 1. The random forest model addresses the overfitting problem inherent to XGBoost as it can decrease the model variance without increasing the model bias. This implies that the overfitting problem may be observed in the forecast of a single regression tree, but it can be eliminated in the average forecast of multiple regression trees. Random forest model is poor in terms of reducing the training error as multiple regression trees are trained autonomously. XGBoost addresses this shortcoming of random forest by sequentially training decision trees.

In the proposed framework, RF and XGBoost models are trained separately and predictions of both the models are used as input into an LR model. The LR model processes the final output.
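The combining step can be sketched as an ordinary least-squares fit of the target on the two base-model predictions. The sketch below uses only the standard library and is not the authors' implementation: the base predictions are synthetic (the truth plus Gaussian noise), and `fit_linear_combiner` and `mse` are names assumed for this example.

```python
import random


def fit_linear_combiner(x1, x2, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    cols = [[1.0] * n, list(x1), list(x2)]          # design-matrix columns
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    aty = [sum(p * t for p, t in zip(ci, y)) for ci in cols]
    m = [row + [rhs] for row, rhs in zip(ata, aty)]  # augmented 3x4 system
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        pivot = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(3):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return [m[i][3] / m[i][i] for i in range(3)]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)
y_true = [rng.random() for _ in range(200)]
# Synthetic stand-ins for the outputs of the two trained base models.
rf_pred = [v + rng.gauss(0.0, 0.05) for v in y_true]
xgb_pred = [v + rng.gauss(0.0, 0.02) for v in y_true]

b0, b1, b2 = fit_linear_combiner(rf_pred, xgb_pred, y_true)
hybrid_pred = [b0 + b1 * a + b2 * b for a, b in zip(rf_pred, xgb_pred)]
```

On the data used for the fit, the combined prediction cannot have a higher MSE than either base prediction, because using a single base model unchanged is one of the linear combinations the fit considers. Whether that advantage carries over to unseen data has to be checked on a held-out set.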

The LR equation can be defined by Eq. (2):

$$\text{Y }=\beta +{\beta }_{1}{\text{ x}}_{1}+{\beta }_{2}{\text{ x}}_{2}+\varepsilon$$

(2)

where the final prediction is represented by Y, and the predictions from RF and XGBoost are represented by ${\text{x}}_{1}$ and ${\text{x}}_{2}$. The y-intercept is represented using $\beta$. The coefficients and error terms are represented by ${\beta }_{1}$, ${\beta }_{2}$, and $\varepsilon$ respectively. The value of ${\beta }_{1}$ is − 0.05671126, ${\beta }_{2}$ is 1.05822592, and $\beta$ is − 4.4399934967759985e-05.
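For illustration, Eq. (2) with the reported coefficient values can be evaluated directly. The function name `hybrid_prediction` and the input values are assumptions for this sketch; the inputs are taken to be on the normalized scale used in the study, and the error term is omitted because it is not part of the prediction.

```python
# Coefficients as reported in the text for Eq. (2).
BETA_0 = -4.4399934967759985e-05   # intercept
BETA_1 = -0.05671126               # weight on the RF prediction
BETA_2 = 1.05822592                # weight on the XGBoost prediction


def hybrid_prediction(rf_pred, xgb_pred):
    """Combine the two base predictions with the fitted linear model."""
    return BETA_0 + BETA_1 * rf_pred + BETA_2 * xgb_pred


# Hypothetical normalized base predictions for one store-department-week.
example = hybrid_prediction(0.5, 0.5)
```

With these values the two weights sum to about 1.0015 and the XGBoost prediction carries almost all of the weight, so when both base models agree the hybrid output stays very close to their common value.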

The Python source codes for all the four ML forecasting models and hybrid models are shown in the Appendix.

4 Case Study and Result Discussions

The case company operates as a merchandiser of consumer products. The international segment manages supercentres, supermarkets, hypermarkets, warehouse clubs, and cash and carries outside of the USA. The company was founded in 1945 and is headquartered in Bentonville, Arkansas. Among the largest retailers in the world, based in the USA, the company experiences revenue gain year over year. It operates grocery stores, supermarkets, hypermarkets, department stores, and discount stores offering commodities at the lowest prices, the strategy which defines it, in more than 25 countries across the globe. Fuel, gift cards, banking services, and other associated products such as money orders, prepaid cards, and wire transfers are all available through the company.

According to statistics, grocery prices were reduced by an average of 10–15% in markets that the company entered. It has a wide product range, which makes it a tough competitor among other companies in the same segment. Products offered range from electronics and office supplies, movies, music, and books to jewellery, baby products, furniture, and pharmacy items. It is capable of lowering grocery prices by another significant margin during promotional periods. Its strong market power over suppliers and competitors allows it to sell products at the lowest prices and helps it compete in the market.

In this research, the data of a retail company known to keep up with the demands of customers by offering a wide range of products at one stop has been used. The sales data of the company spans different regions in the USA. The data consists of weekly sales for all the 45 stores and 99 departments over 3 years. The data consists of different attributes of the store and geographic-specific information, namely store number, size of the store, department of the store, date mentioning the week, region’s average temperature, fuel price in the region, CPI (consumer price index), unemployment rate, and holiday week.

Normalization is a pre-processing step that plays a crucial role in machine learning. Normalizing aids in decreasing the learning time when the datasets are too large. Min–Max normalization transforms the original dataset into the desired interval using a linear transformation. This technique has the advantage of preserving all relations between the data points. Min–Max normalization is given by Eq. (3).

$${x}_{scaled}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$

(3)

For proportionate scaling of the data, the Min–Max scale was used at the beginning of the analysis keeping the minimum and maximum values as 0 and 1 respectively.
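A minimal sketch of Eq. (3) for a single feature column is given below. The function name `min_max_scale` and the sample values are assumptions for illustration; the handling of a constant column is a choice made for this sketch, since Eq. (3) is undefined when the maximum equals the minimum.

```python
def min_max_scale(values):
    """Linearly map values to [0, 1] using Eq. (3)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Eq. (3) would divide by zero; map a constant column to 0.0 instead.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weekly sales figures for one department.
scaled = min_max_scale([10.0, 20.0, 40.0])
```

The smallest value maps to 0, the largest to 1, and the ordering and relative spacing of the points are preserved. In a forecasting setting, the minimum and maximum are usually taken from the training portion only, so that information from the test data does not leak into the scaling.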

4.1 Performance Parameters

For the comparison of the forecasting models, mean absolute error (MAE), mean squared error (MSE), and R² value are used as discussed in the following subsections.

Mean Absolute Error

It is an average measure of the errors in a set of predictions. Since it is absolute, it ignores the sign of each error, and all individual errors are equally weighted. The calculation of MAE is straightforward, as shown in Eq. (4): the absolute values of the errors are summed to get the ‘total error’, which is then divided by the total number of observations [78].

$$MAE=\frac{1}{n}\sum\limits_{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|$$

(4)

where $y_i$ is the true value, ${\widehat{y}}_i$ is the prediction value, and n is the number of observations.
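Eq. (4) translates directly into code. The following is a small sketch with made-up numbers; the function name mirrors the metric but is otherwise an assumption of this example.

```python
def mean_absolute_error(y_true, y_pred):
    """Eq. (4): the mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted values; absolute errors are 0.5, 0.0, 2.0.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
```

The result is in the same unit as the series itself, here 2.5 / 3 ≈ 0.83.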

Mean Squared Error

It is also an average measure of the errors in a set of predictions. The squares of the errors are added together and then averaged, as shown in Eq. (5). Squaring makes the direction of the error irrelevant and weighs large errors more heavily than small ones. Since MSE is a convex quadratic function of the predictions, it has a single global minimum.

$$MSE=\frac{1}{n}\sum\limits_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}$$

(5)

where $y_i$ is the true value, ${\widehat{y}}_i$ is the prediction value, and n is the number of observations.
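The effect of squaring in Eq. (5) can be seen by comparing two sets of predictions that share the same mean absolute error. The numbers below are made up for this sketch, and the helper names are assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Eq. (5): the mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1.0, 1.0, 1.0, 1.0]
many_small = [1.5, 1.5, 1.5, 1.5]   # four errors of 0.5
one_large = [1.0, 1.0, 1.0, 3.0]    # a single error of 2.0

mae_small = mean_absolute_error(y_true, many_small)
mae_large = mean_absolute_error(y_true, one_large)
mse_small = mean_squared_error(y_true, many_small)
mse_large = mean_squared_error(y_true, one_large)
```

Both prediction sets have an MAE of 0.5, but the MSE is 0.25 for the four small errors and 1.0 for the single large one, which is why MSE is the more outlier-sensitive of the two metrics.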

R ² Score

It is also known as the coefficient of determination, which expresses the amount of variance in the dependent variable explained by a model [79]. The R² score measures how closely the data scatter around a fitted regression line. Higher R² values for similar datasets represent smaller differences between the predicted data and the true data. It typically lies on a scale of 0–1, although it can be negative for a model that fits worse than simply predicting the mean. For example, an R² value of 0.8 indicates that the variation of the independent variables explains 80% of the variance of the dependent variable being analysed. It is given by Eq. (6).

$${R}^{2}=1-\frac{S{S}_{res}}{S{S}_{total}}=1-\frac{{\sum }_{i}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{{\sum }_{i}{\left({y}_{i}-\mu \right)}^{2}}$$

(6)

where $SS_{res}$ is the sum of squares of residuals, $SS_{total}$ is the total sum of squares, $y_i$ is the true value, ${\widehat{y}}_i$ is the prediction value, and $\mu$ is the mean of the true values.
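The definition $R^2 = 1 - SS_{res}/SS_{total}$ can be sketched as follows, with three made-up prediction sets that show the typical range of the score. The function name `r2_score` and the data are assumptions for this example.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_total."""
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_total


y_true = [1.0, 2.0, 3.0, 4.0]                      # mean is 2.5, SS_total is 5.0

r2_perfect = r2_score(y_true, [1.0, 2.0, 3.0, 4.0])   # SS_res = 0
r2_mean = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])      # SS_res = SS_total
r2_reversed = r2_score(y_true, [4.0, 3.0, 2.0, 1.0])  # SS_res = 20
```

A perfect fit scores 1, always predicting the mean scores 0, and predictions that are worse than the mean give a negative value (here −3).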

R² mainly shows how well the model fits the observed values. It was also necessary to understand the errors, for which the MSE and MAE metrics were used. MSE was used to place more emphasis on outliers: because of the squaring, it weighs large errors more heavily than small ones. MAE is used to measure the prediction error in the same unit as the original series, and it indicates how large an error can be expected from the forecast on average. Using the above performance parameters (MSE, MAE, R²), all the models incorporated in this study are compared as shown in Table 2.

Table 2 Performance of the forecasting models


In the RF model, the number of estimators was set to 175 and the maximum depth to 28. In the gradient boosting model, the number of estimators was set to 125 and the maximum depth to 25. In the AdaBoost model, a decision tree with a depth of 25 was used as the base estimator. The ANN model had five hidden layers followed by an output layer, with 10, 12, 24, 12, and 10 neurons in the respective layers. The activation function for each layer was ‘relu’, and the ‘adam’ optimizer was used. The batch size was 256 and the model was trained for 500 epochs. In the XGBoost model, the number of estimators was set to 150 and the maximum depth to 25. In the RF-XGBoost-LR model, the RF and the XGBoost models were applied first and their predictions were used as input to an LR model.
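The reported settings can be collected in one place, for example as a plain Python dictionary. The values come from the paragraph above; the dictionary layout and key names are assumptions of this sketch (`n_estimators` and `max_depth` follow the naming commonly used by scikit-learn and XGBoost estimators), and any setting the text does not mention is left out rather than guessed.

```python
# Hyper-parameters as stated in the text; the structure is illustrative only.
HYPERPARAMETERS = {
    "random_forest": {"n_estimators": 175, "max_depth": 28},
    "gradient_boosting": {"n_estimators": 125, "max_depth": 25},
    "adaboost": {"base_estimator_max_depth": 25},   # decision-tree base learner
    "xgboost": {"n_estimators": 150, "max_depth": 25},
    "ann": {
        "hidden_layer_sizes": [10, 12, 24, 12, 10],
        "activation": "relu",
        "optimizer": "adam",
        "batch_size": 256,
        "epochs": 500,
    },
}

# Total number of hidden neurons in the ANN, as a quick sanity check.
ann_hidden_units = sum(HYPERPARAMETERS["ann"]["hidden_layer_sizes"])
```

Keeping the settings in a single structure makes it easier to rerun the comparison with identical configurations for every model.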

In the hyper-parameter optimization phase, determining the best configuration of an ML model is challenging. Sampling random values within the effective range of the relevant algorithm parameters may therefore yield better optimization outcomes. The output from the RF and XGBoost models is passed as input to an LR model. The LR model was used to output the final predictions because of its simplicity. If a complex model such as gradient boosting were used in the final layer, the hybrid model would be prone to overfitting. The hybrid model helps to overcome the shortcomings of both the RF and XGBoost models.

Wolpert [80] proposed stacking (also known as stacked generalization), which is an ensemble of well-performing models for their capabilities. Stacking uses a single model to combine the different predictions from multiple models. Stacked models provide the best results by using a wide range of algorithms in the first layer of design, as different algorithms identify trends and patterns differently in training data, and merging both models results in a more accurate and reliable output.

Various performance measures were utilized to compare the performance of all the models. Holistically, based on three metrics, the proposed forecasting method is found to outperform the other benchmarking methods with an R² score of 0.9551, MAE of 0.0024, and MSE of 4.7932e-05.

Figure 2 shows the comparison of the week-wise sale of all the stores and departments against the sales predicted by the hybrid (RF-XGBoost-LR) model. It is observed from Table 2 that the hybrid networks can better forecast the sales as compared to the other models since they can map the trend of the actual sales most accurately.

4.2 Academic Implications

The proposed hybrid model can be utilized to enhance supply chain-related studies and to extend research on demand forecasting. The robust performance of the proposed framework adds to its utility. Retailers, wholesalers, and other industries can use it to their benefit. However, adequate domain knowledge is essential for tailoring it to applications in different industries. Products in different sectors may have distinct properties that can be extracted and fed into the forecasting framework to improve its performance. This makes industry-specific customization a potential research topic.

4.3 Managerial Implications

The findings of the study show that the proposed hybrid model improves forecasting accuracy to a large extent compared with the individual machine learning models. Combined through linear regression, the random forest and XGBoost models jointly mitigate the problems of overfitting and training error, and hence the forecast values are very close to the actual sales values. Thus, the proposed model helps industry decision-makers to forecast more accurately, which leads to the formulation of a better marketing strategy, increased stock turnover, optimized capacity building, lower supply chain costs, and improved customer satisfaction. An accurate demand forecasting method can improve supply chain performance by eliminating the bullwhip effect and enabling proper inventory management.

5 Conclusion

In this study, a hybrid ML model combining XGBoost, RF, and LR has been proposed for real-time analysis of sales data. The model is trained on the sales data of the retail company together with its various attributes, with the aim of obtaining a more accurate model. The hybrid model is proposed to address the shortcomings of the individual RF and XGBoost models. First, the dataset was normalized and then trained and tested separately in the RF and XGBoost models. The predictions from these models were combined to create a new dataset, which was used as input to the LR model to generate the final predictions.
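The pipeline described above (normalize, fit RF and XGBoost separately, then feed their predictions into LR) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the data are synthetic, the hyperparameters are arbitrary, min-max scaling is assumed for the normalization step, and out-of-fold predictions from `cross_val_predict` are assumed for building the LR training set so that the meta-model does not see predictions made on data the base models were fitted on. If the `xgboost` package is not installed, scikit-learn's `GradientBoostingRegressor` is used as a stand-in, which is a different implementation of gradient boosting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.preprocessing import MinMaxScaler

try:
    from xgboost import XGBRegressor

    def make_booster():
        return XGBRegressor(n_estimators=50, random_state=0)
except ImportError:
    # Stand-in when xgboost is unavailable (not the same algorithm variant).
    def make_booster():
        return GradientBoostingRegressor(n_estimators=50, random_state=0)

# Synthetic stand-in for store attributes and weekly sales (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 3))
y = 2.0 * X[:, 0] + np.sin(6.0 * X[:, 1]) + 0.05 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: normalize the features (scaler fitted on the training split only).
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 2: base models, trained separately.
rf = RandomForestRegressor(n_estimators=50, random_state=0)
booster = make_booster()

# Out-of-fold base predictions form the training set of the meta-model.
meta_train = np.column_stack([
    cross_val_predict(rf, X_train_s, y_train, cv=5),
    cross_val_predict(booster, X_train_s, y_train, cv=5),
])

# Refit the base models on the full training split to predict the test split.
rf.fit(X_train_s, y_train)
booster.fit(X_train_s, y_train)
meta_test = np.column_stack([rf.predict(X_test_s), booster.predict(X_test_s)])

# Step 3: linear regression combines the two base predictions.
meta_model = LinearRegression().fit(meta_train, y_train)
hybrid_pred = meta_model.predict(meta_test)
hybrid_r2 = r2_score(y_test, hybrid_pred)
print(f"Hybrid R2 on held-out data: {hybrid_r2:.4f}")
```

The coefficients in `meta_model.coef_` indicate how strongly the final forecast leans on each base model, which can be a useful diagnostic when tailoring the framework to a new dataset.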

It is observed that combining the XGBoost model with the RF model improves accuracy owing to reduced variance and enhanced robustness to outliers, which results in improved predictive ability and less vulnerability to overfitting. Three metrics were used in this study: MAE, MSE, and R² score. The results suggest that the proposed hybrid model RF-XGBoost-LR (MAE = 0.0024, MSE = 4.7932e-05, and R² score = 0.9551) performs better than the other models, namely RF, ANN, gradient boosting, AdaBoost, and XGBoost. The R² score indicates that the model explains 95.51% of the variance in the sales data.
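The variance-reduction argument can be illustrated with a small simulation. This is a general statistical sketch, not a reproduction of the study's experiment: it assumes two predictors whose errors are independent, zero-mean, and of unit variance, a condition that real RF and XGBoost errors satisfy only partially because both models are trained on the same data.

```python
import random
import statistics

random.seed(42)
n = 10_000

# Hypothetical forecast errors of two models (independent, unit variance).
errors_a = [random.gauss(0.0, 1.0) for _ in range(n)]
errors_b = [random.gauss(0.0, 1.0) for _ in range(n)]

# Error of the equally weighted combination of the two forecasts.
errors_avg = [(a + b) / 2.0 for a, b in zip(errors_a, errors_b)]

var_a = statistics.pvariance(errors_a)
var_b = statistics.pvariance(errors_b)
var_avg = statistics.pvariance(errors_avg)

print(f"model A: {var_a:.3f}  model B: {var_b:.3f}  combined: {var_avg:.3f}")
```

For independent errors with equal variance, the variance of the averaged error is about half that of either model. Positively correlated errors reduce this gain, which is one reason a learned combination such as the LR stage may be preferred to a fixed average.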

Precise demand forecasting in an integrated commercial planning environment can be used to optimize capacity building, labour scheduling, inventory, supply chain management, etc. In the proposed hybrid model, random forest helps to overcome the overfitting problem of XGBoost, while XGBoost reduces the error by training the decision trees sequentially. Forecasting with the proposed model may improve stock availability and enhance stock allocation.

Despite these implications, the proposed hybrid model has some limitations, such as the requirement of a large training dataset and decision integration. Machine learning algorithms become more effective as the size of the training dataset expands.

Data Availability

The datasets generated during and/or analysed during the current study are available in the Kaggle.com repository (https://www.kaggle.com/competitions/walmart-recruiting-store-sales-forecasting/data).

Author information

Authors and Affiliations

Department of Mechanical Engineering, Delhi Technological University, Delhi, 110042, India
Arnab Mitra, Arnav Jain, Avinash Kishore & Pravin Kumar

Corresponding author

Correspondence to Pravin Kumar.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Python Source Code

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mitra, A., Jain, A., Kishore, A. et al. A Comparative Study of Demand Forecasting Models for a Multi-Channel Retail Company: A Novel Hybrid Machine Learning Approach. Oper. Res. Forum 3, 58 (2022). https://doi.org/10.1007/s43069-022-00166-4

Received: 26 March 2022
Accepted: 22 August 2022
Published: 27 September 2022
DOI: https://doi.org/10.1007/s43069-022-00166-4
