1 Introduction

Fast-moving consumer goods (FMCGs) are products that are typically consumed quickly and sold in short intervals and at low cost [1]. One large subcategory of FMCG is foods and beverages (F&Bs) for immediate consumption in F&B outlets such as restaurants, fast food delis and coffee shops. For outlets operating in the F&B sector, accurately predicting short-term sales and demand is a business success factor and even an enabler of environmental responsibility [2].

First, accurate sales prediction is a prerequisite to delivering appropriate service levels. Managers must ensure adequate staff during peak hours to avoid slow turn-around times and prevent lost sales from customers leaving due to queues, long waiting times or seat unavailability. During staff shortages, clients might also stay but reduce future visit frequency, leading to future lost sales. However, especially in the F&B sector, which often has tight margins, there is a need to retrench costs by avoiding overstaffing and thus balancing productivity and revenues [3]. Therefore, restaurant managers typically rely on sales predictions to create shift plans [4].

Second, accurate sales and demand prediction is important for procurement efficiency and avoidance of food waste. Many products sold in F&B outlets are perishable [5]. For example, fresh products such as salads, sushi, and French fries typically have short shelf lives and are prepared upfront in quantities corresponding to expected demand so that they can be served immediately. Consequently, if demand predictions are too high, food will be wasted, which reduces business margins and has a negative impact on the environment. For example, in 2019, the U.S. restaurant sector produced 11.4 million tons of food waste [6]. In addition, food costs represent 28% to 35% of sales in restaurants [7]. Hence, there is a strong rationale not only to avoid overstocking and overproducing food but also to avoid underproducing, which leads to dissatisfied customers and lost revenues [8].

Third, sound financial planning requires evidence and should be based on scientific best practices [9, 10]. There is a significant relationship between the good cash management of family-owned restaurants and their longevity [11]. This is particularly important, as within the first 4 years, the majority of family-owned restaurants cease operations, which often has a detrimental effect on family finances and communities [11]. Research has shown that liquidity management in small businesses is usually based on the past experience of owner-managers [12]. This is analogous to the current practice of experience-based sales forecasts in F&B outlets. As F&B sales are often cyclical but also subject to external factors such as weather or the macroeconomic climate, state-of-the-art sales forecasts should inform the financial planning of F&B businesses.

Currently, the use of modern digital technologies in the restaurant sector revolves around aspects relating to immediate customer service, such as online table bookings, taking customer orders digitally and electronic billing and payment options [13]. The use of forecasting technology for the optimization of operations and financial planning is less common, especially for smaller restaurants [14, 15]. Instead, outlet managers largely rely on manual demand predictions based on simple heuristics and their experience. It is common practice to look at the sales of the same weekday of the previous year and then, based on personal experience, adjust this value for other influencing factors such as holidays and weather forecasts [16]. Hence, there is substantial potential to improve the accuracy and use of prediction models in the F&B sector by applying research findings on algorithms suitable for sales predictions and how external data can be integrated effectively into those models [2].

However, for AI-based sales forecasting to have a tangible positive impact on F&B businesses, it is equally important to consider factors that drive the adoption of AI-based solutions. In fact, the level of digital transformation of a restaurant is strongly influenced by the personal characteristics of the entrepreneur/manager, with education, entrepreneurial motivation and ambition for growth having a positive impact on the usage of modern information technology and software to improve the business [13].

Furthermore, for F&B managers to embrace AI-based sales forecasts, there are additional important prerequisites or “soft factors”. From the literature on technology acceptance that focuses on AI, we know that trust in AI-based recommendations is a prerequisite for managers to use them in operations effectively [17]. For managers to gain trust in AI-generated recommendations, not only is the mere accuracy of the forecasts important, but so-called twinning also plays an important role; it means that the information that the AI uses is aligned with the information that is available to managers and can thus reflect decision-making processes in a business [17]. Conversely, managers may not use AI-based sales forecasts if they think that not all relevant information is considered, or they might heavily adjust the forecasts; the quality of such adjustments depends strongly on their personality [18] and is prone to biases such as overoptimism [19,20,21]. In addition, for managers and staff to embrace an AI solution, clear communication of the benefits of AI is needed, which relates to both the benefits for the business and potentially the benefits to employees in terms of an increase in work productivity [17].

Against this background, this case study describes a sales forecasting project with a fast-food restaurant in Germany that is part of an international franchise system from two angles. On the one hand, we present our technical forecasting approach, namely, the selected AI algorithm, how variables were chosen, how hyperparameters were tuned and how forecasting accuracy was evaluated. On the other hand, we also describe in detail how we worked closely with the local management to ensure that the forecasts meet the requirements of the business and are trusted.

The following section provides an overview of the specificities of the F&B domain with regard to the data and automation requirements of AI-based forecasting models. Based on this, we present a case study on how we co-developed an AI-based sales forecasting model with the franchisee and management of a fast-food restaurant.

2 Specificities of the F&B domain regarding the prediction task

In the F&B industry, demand is a key driver of sales, as prices are relatively constant and can be controlled by management. When identifying potential predictors of sales, most emphasis should be placed on determinants of customer demand for food and beverages in immediate consumption outlets.

Research in the domain has shown that F&B demand is highly cyclical and depends largely on date variables, such as the day of the week and the month [7, 22]. External variables, such as weather conditions [23, 24], holidays [25], sales promotions [26], the general macroeconomic climate [27] and local events [28], also have a strong impact on F&B demand. However, there has been a lack of integration of data sources that capture these influencing factors in forecasting models to produce automated, short-term predictions. One key challenge is that data on factors that influence demand for F&Bs are mostly not easily accessible or available in a structured format. Given the effort it takes to systematically collect such data, it is important to carefully evaluate the effort and the impact associated with incorporating such variables into demand forecasting models.

2.1 High-velocity data

Weather conditions are highly variable, and weather forecasts can change quickly. General weather patterns can be captured indirectly via monthly variables. For example, in the Northern Hemisphere, the average temperatures in July are higher than those in January. However, micro weather conditions also have a strong impact on F&B demand [7, 25]. These include atmospheric conditions such as clouds, sunshine, rainfall, wind and temperature. For example, the demand for ice cream on a particular day in June is generally higher during sunny and warm days than during rainy and cool days [29]. Reliable forecasts of such micro weather conditions are available only a few days in advance. To date, incorporating real-time weather forecasts into F&B prediction models has been challenging, primarily due to the limited availability and the lack of easy real-time access to weather data. This article shows that the emergence of weather APIs has great potential to usher in a step change in prediction accuracy for F&B demand.

2.2 High variety data

Information on local events and holidays such as festivals, parades and concerts is characterized by a high level of variety. Such information is available from local event calendars, newspaper articles, school calendars and even social media. Data sources are typically available online but are not standardized. Collecting such information manually is a time-consuming task, especially for large F&B systems that operate numerous restaurants across different geographies. This case study presents avenues to automate data collection for regional events and holidays for incorporation into predictive models. It also discusses the trade-off between the effort of capturing additional unstructured data sources and the increase in prediction accuracy of integrating such variables into a prediction model.

2.3 Temporal granularity

For F&B outlets, the time intervals for which demand needs to be predicted are shorter than those for other types of FMCGs. As demand fluctuates substantially throughout the day, F&B outlets typically operate on hourly demand forecasts for food procurement and preparation. For example, burger patties must be defrosted, potato wedges must be fried, and salads must be chopped. Equally important, hourly demand predictions are necessary for staff planning, as more service staff must be available during peak hours, such as lunch time [8]. Due to the greater temporal granularity of predictions, hourly demand forecasting models must be trained on larger datasets than those used for demand forecasts for a larger temporal unit, such as a day, week or month. In fact, time of day is an important additional predictor that might interact with other predictors, such as weekdays (e.g., demand for a salad deli at 12 pm in a business district would be higher on a Monday than a Sunday). More predictors generally lead to higher model complexity, which in turn requires larger training datasets to avoid overfitting and ensure reliable prediction results [30]. This case study trains and evaluates an extreme gradient boosting (XGBoost) prediction model to assess the impact of including semistructured and/or high-velocity external data in F&B sales forecasting approaches.

3 Case description

Producing hourly demand forecasts is an important yet often tedious task for restaurant managers. It is common practice for managers to access historical data from internal systems, review them, and enter hourly predictions into a planning system while making manual adjustments to historical values based on influencing factors that they consider relevant. For experienced managers, this is a repetitive task. For new managers, this can be a challenging task, as they lack the experience and “gut feeling” of how to estimate the impact of external factors such as weather conditions and holidays.

Against this background, we worked with a fast-food restaurant in Germany that is part of an international franchise system to improve and automate its demand forecasts. In this section, we describe our methodological approach. We will describe, in detail, the internal and external data sources we utilized to augment predictions based on several discussions with the management team. We will also briefly describe the extreme gradient boosting (XGBoost) algorithm we trained to produce automated forecasts and why it is suitable for this type of prediction task. In the results section, we will evaluate the effect of augmenting traditional forecasting approaches with new data sources. Furthermore, we will compare the prediction accuracy of our augmented model with the accuracy of manual predictions made by the experienced staff managing the restaurant.

3.1 Methodology

To prepare and structure our forecasting initiative, we implemented a five-step process. This included: first, establishing a clear definition of business requirements; second, identifying relevant predictor variables; third, capturing data and performing feature engineering; fourth, preprocessing the target variable; and fifth, outlining a rigorous protocol for model training, validation and testing. 

3.1.1 Definition of requirements

As a first step to prepare the automation and improvement of the demand forecasting process, we conducted a series of interviews with the restaurant management team. The interview process served to gather valuable insights into the current demand forecasting approach and its business implications. This formed the foundation for the requirements engineering efforts. The management team expressed a specific interest in hourly sales forecasts, given their relevance to various operational tasks such as shift planning, and prioritized this metric over predicted quantities. According to the management team, sales is a metric that adequately reflects the demand pattern, as the product mix is relatively stable and slight fluctuations are of little operational relevance. To take full advantage of automation and avoid unnecessary operational complexity resulting from more granular forecasts on a product volume level, we jointly decided to select hourly sales as the target variable.

Furthermore, hourly sales forecasts were needed for an entire month—approximately 2 to 5 days before the beginning of the respective month. In addition, every other day, short-term predictions were needed for the next 2–3 days, which should account for the latest weather forecast. Due to the location of the restaurant near a motorway leading to a popular outdoor leisure destination, the impact of weather conditions on sales was deemed very high. The manual process of regularly checking short-term weather forecasts and adjusting hourly sales predictions had been a nuisance for managers thus far. Hence, the import of weather forecasts into the prediction model should be automated.

3.1.2 Feature identification

In addition to weather, the management team identified further factors impacting demand, including local events and festivals, school holidays in the local federal state as well as in neighboring states and countries, and public holidays—all these factors affect traffic to the region and thus restaurant consumption. As the case study was conducted during the COVID-19 pandemic, which started in early 2020 in Germany, overall consumer trust was identified as a potential predictor of restaurant sales, as even after the lockdowns, many consumers were still reluctant to visit restaurants. Estimating the level and impact of consumer trust on sales had been highly challenging for the management and had not been systematically considered in manual demand predictions. Finally, sales promotions were also identified as drivers of sales to be considered in the prediction model.

3.1.3 Capturing data sources and feature engineering

Table 1 provides an overview of the data sources that we identified as suitable to collect the required information on the target and features.

Table 1 Overview of data sources

We sequentially extracted four types of data in .xlsx format for the period 01/01/2015 to 09/14/2020 from internal systems: hourly sales, restaurant opening hours, sales promotions, and, for certain weeks, manual sales forecasts produced by the managers of the restaurant.

Historical and real-time hourly weather data were obtained from OpenWeather [31] in CSV and JSON formats. In addition to a bulk purchase, we integrated an API call for 4-day weather forecasts into our Python program, requiring the restaurant’s geolocation [32], which we obtained via the Geocoding API [33]. Weather conditions (e.g., “clear”, “rain”) were converted into dummy variables.
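As an illustration, the following Python sketch shows how such an automated weather import could look. It assumes OpenWeather’s public Geocoding endpoint and the 5-day/3-hour forecast endpoint (a stand-in for the 4-day forecast product used in the project) together with their documented JSON field names; the address, API key and resulting column names are placeholders rather than the project’s actual setup.

```python
import requests
import pandas as pd

API_KEY = "YOUR_OPENWEATHER_KEY"  # placeholder, not a real key


def get_geolocation(address: str) -> tuple[float, float]:
    """Resolve the restaurant address to latitude/longitude via the Geocoding API."""
    resp = requests.get(
        "https://api.openweathermap.org/geo/1.0/direct",
        params={"q": address, "limit": 1, "appid": API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    hit = resp.json()[0]
    return hit["lat"], hit["lon"]


def get_forecast(lat: float, lon: float) -> pd.DataFrame:
    """Pull a short-term forecast and keep temperature plus the textual weather condition."""
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/forecast",
        params={"lat": lat, "lon": lon, "units": "metric", "appid": API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    records = [
        {
            "timestamp": entry["dt_txt"],
            "temperature": entry["main"]["temp"],
            "condition": entry["weather"][0]["main"].lower(),  # e.g., "clear", "rain"
        }
        for entry in resp.json()["list"]
    ]
    df = pd.DataFrame(records)
    # One-hot encode the textual weather condition into dummy variables
    return pd.get_dummies(df, columns=["condition"], prefix="wx")


lat, lon = get_geolocation("Example Street 1, 12345 Exampletown, DE")  # hypothetical address
forecast_features = get_forecast(lat, lon)
```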

School holiday dates and public holidays in the federal state and neighboring regions, impacting the restaurant’s sales, were manually gathered, as API access was restricted and the manual collection effort was limited. However, there has recently been an increasing number of school holiday APIs, many created by individuals or small nonprofit organizations, such as ferien-api.de [34], which might be considered viable alternatives. Local and national public holidays are available on several websites, such as arbeitstage.org [35], and through the schulferien.org API [36]. The data collected needed certain adjustments, detailed in Table 2, to capture the demand pattern of the restaurant, which is situated near a motorway leading to a popular beachside location.
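Because the holiday dates were collected manually, turning them into model features is essentially a matter of mapping date ranges onto the hourly timestamps. The following is a minimal sketch under that reading; the holiday periods, column names and dataframe layout are illustrative assumptions, not the project’s actual data.

```python
import pandas as pd

# Manually collected holiday periods (illustrative dates, not the project's actual data)
school_holidays = {
    "school_holiday_local": [("2019-07-15", "2019-08-27")],
    "school_holiday_neighbors": [("2019-07-01", "2019-08-13")],
}
public_holidays = ["2019-10-03", "2019-12-25", "2019-12-26"]


def add_holiday_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has a DatetimeIndex at hourly resolution; adds one dummy column per holiday type."""
    df = df.copy()
    dates = df.index.normalize()
    for column, periods in school_holidays.items():
        flag = pd.Series(False, index=df.index)
        for start, end in periods:
            flag |= (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
        df[column] = flag.astype(int)
    df["public_holiday"] = dates.isin(pd.to_datetime(public_holidays)).astype(int)
    return df
```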

Table 2 Feature definition and engineering

Relevant local events were identified through management interviews and news searches and included in the dataset. Furthermore, we created two lag-type variables to enable the model to capture potential structural increases or decreases in sales, namely, a weekly year-on-year sales growth rate and a marker for the last 30 days.
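The exact construction of these two lag-type variables is not spelled out above, so the following Python sketch shows only one plausible reading, with assumed column names and an assumed interpretation of the “last 30 days” marker as a recency flag.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """df has an hourly DatetimeIndex and a 'sales' column.

    Adds two illustrative lag-style features:
      * yoy_growth - weekly sales growth vs. roughly one year (52 weeks) earlier
      * last_30d   - flag marking observations in the most recent 30 days of the data
    Both definitions are assumptions; the paper does not state the exact formulas.
    """
    df = df.copy()
    weekly = df["sales"].resample("W").sum()
    yoy_growth = weekly.pct_change(periods=52)  # ~one year in weekly steps
    df["yoy_growth"] = yoy_growth.reindex(df.index, method="ffill")
    cutoff = df.index.max() - pd.Timedelta(days=30)
    df["last_30d"] = (df.index > cutoff).astype(int)
    return df
```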

Table 2 summarizes all model variables, including those derived from feature engineering.

3.1.4 Preprocessing of data on the target variable

The dataset provided by the restaurant comprises 5.75 years of hourly sales data from January 1st, 2015, to September 14th, 2020, inclusive. First, for analysis, we included records only when the restaurant was open, which we achieved by filtering the dataset by historical opening hour records obtained from the restaurant. Second, we excluded the time periods from May 9th until May 18th, 2016, inclusive, when the restaurant was closed for renovation, and from March 17th, 2020, until May 18th, 2020, inclusive, when the restaurant was offering takeaway only due to the COVID-19 pandemic and related lockdowns, which substantially affected sales volumes. No other outliers were identified or removed.

Third, to avoid showing actual sales numbers, which are sensitive information, we performed the following transformation: all hourly sales data were divided by the difference between the maximum and minimum hourly sales volumes observed in the dataset and then multiplied by 1000. Consequently, all observations of scaled sales are between zero and 1000, and we will refer to them as normalized sales. Figure 1 shows the normalized sales data aggregated by day. It shows that daily sales volumes are highly cyclical and display a trend to be captured by the predictor variables in the model.
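A compact Python sketch of these preprocessing steps is shown below. The dataframe layout and column names are assumptions, and the scaling step matches the description above only under the stated assumption that the minimum observed hourly sales value is (close to) zero.

```python
import pandas as pd


def preprocess_sales(sales: pd.DataFrame, opening_hours: pd.DataFrame) -> pd.DataFrame:
    """sales: hourly records with 'timestamp' and 'sales' columns.
    opening_hours: one row per 'date' with 'open_from' and 'open_until' hours (assumed layout)."""
    df = sales.copy()
    df["date"] = df["timestamp"].dt.date
    df = df.merge(opening_hours, on="date", how="inner")
    # 1) keep only hours during which the restaurant was open
    df = df[(df["timestamp"].dt.hour >= df["open_from"]) &
            (df["timestamp"].dt.hour < df["open_until"])]
    # 2) drop the renovation closure and the takeaway-only lockdown period (end dates inclusive)
    closed = (df["timestamp"] >= "2016-05-09") & (df["timestamp"] < "2016-05-19")
    lockdown = (df["timestamp"] >= "2020-03-17") & (df["timestamp"] < "2020-05-19")
    df = df[~(closed | lockdown)]
    # 3) scale hourly sales to the 0-1000 range described above
    #    (assumes the minimum observed hourly sales value is zero, as the text implies)
    sales_range = df["sales"].max() - df["sales"].min()
    df["normalized_sales"] = df["sales"] / sales_range * 1000
    return df
```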

Fig. 1
figure 1

Normalized daily sales data (note: data removed due to COVID-19-related lockdowns are displayed as a straight line)

Hourly sales data also exhibit a strong cyclical component. As seen in Figs. 2 and 3, sales increase throughout the morning, peak during lunchtime, drop slightly after lunchtime, increase again during dinnertime and then drop after dinner.

Fig. 2
figure 2

Two exemplary weeks of hourly series of normalized sales data

Fig. 3
figure 3

Average normalized sales by hour of the day and day of the week

From Fig. 3, we can also see an interaction between the hour of the day and day of week, as the sales increase in the evening is substantially higher on weekends—Sundays in particular—than on weekdays.

3.1.5 Model definition

Predicting hourly sales that follow a cyclical pattern based on past data can be defined as a machine learning task. There are two major advantages of using a machine learning approach rather than a classical time series technique, such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA). First, an ML approach is more powerful, as it allows the use of state-of-the-art supervised learning algorithms such as support vector machines (SVMs) and gradient boosted trees. Second, the ML approach is highly flexible, as it enables the integration of numerous external variables into the model [2].

Therefore, we defined our prediction task as a regression problem using XGBoost as the learning algorithm. XGBoost was introduced in 2016 [37] and has since gained massive popularity in the data science community due to its strong performance [38] and high execution speed [39] for regression and classification tasks.

XGBoost is a gradient tree boosting method that uses classification and regression trees (CART) as weak learners. It boosts their performance by building an ensemble of trees that minimizes a regularized objective function [40].

XGBoost is available as an open-source library that can be installed free of charge [41].

Our XGBoost model is defined as an ensemble Fm(x) of weak learners fi(x) that sequentially learn from the residuals of the previous ensemble, with f0(x) learning from the original training dataset described in Table 2. Mathematically, if m > 0 and m \(\in {\mathbb{N}}\), then

$$F_{m} \left( x \right) = \mathop \sum \limits_{i = 0}^{m} f_{i} \left( x \right)$$

is the ensemble at the m-th boosting round, where fm(x) learns from the residuals of Fm−1(x). fm(x) is the learner that minimizes the objective function Lm with

$$L^{m} = \mathop \sum \limits_{i = 1}^{n} l\left( {y_{i} , F_{m - 1} \left( {x_{i} } \right) + f_{m} \left( {x_{i} } \right)} \right) + \Omega \left( {f_{m} } \right)$$

and

$$\Omega \left( {f_{m} } \right) =\upgamma {\text{M}} + \frac{{\lambda \left\| w \right\|^{2} }}{2},$$

where l is a differentiable convex loss function between the i-th outcome yi and the prediction for the i-th observation made by the (m − 1)-th ensemble augmented by the new learner fm, and Ω(fm) is a function penalizing tree complexity. M is the number of leaves, and w is the vector of leaf weights. γ is the hyperparameter penalizing the number of leaves (the minimum loss reduction required for an additional split), and λ is the L2 regularization hyperparameter on the leaf weights [40].
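In the open-source xgboost Python library, γ and λ correspond to the gamma and reg_lambda arguments of the regressor. A minimal sketch of such a model definition is shown below; the hyperparameter values and the X_train/y_train names are placeholders (the tuning grid actually used is reported in the model training section).

```python
from xgboost import XGBRegressor

# Squared-error regression with the regularized objective described above;
# gamma penalizes additional leaves (γ) and reg_lambda is the L2 penalty on leaf weights (λ).
model = XGBRegressor(
    objective="reg:squarederror",
    n_estimators=200,      # number of boosting rounds (placeholder value)
    max_depth=5,           # maximum tree depth (placeholder value)
    learning_rate=0.05,    # shrinkage applied to each new learner (placeholder value)
    min_child_weight=15,   # placeholder value
    reg_alpha=1,           # L1 regularization (placeholder value)
    reg_lambda=1,          # λ in Ω(f_m) (library default shown)
    gamma=0,               # γ in Ω(f_m) (library default shown)
)
model.fit(X_train, y_train)   # X_train: feature matrix, y_train: normalized sales (placeholders)
y_pred = model.predict(X_valid)
```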

3.1.6 Model training, validation and testing protocol

To train and evaluate our forecasting model, we defined the following protocol and later proceeded according to it: we split the sales data chronologically into training (observations from 2015/01/01 to 2019/02/28 inclusive), validation (2019/03/01 to 2019/12/31) and testing (2020/02/01 to 2020/09/14) sets. (In January 2020, the results of the model validation were discussed, and the procedure of the upcoming evaluation period was defined.) Thus, we first trained our sales prediction model on the training dataset and compared our predictions with actual sales on the validation dataset. Second, we used the validation period to fine-tune our model by performing recursive feature selection, starting with a model that contained all variables and then removing variables based on performance criteria and discussions with the management. This step provided us with an enhanced model. Third, we tested the predictions of the AI algorithm by producing forecasts for several evaluation windows during which the model competed with predictions that had been produced by the management. At the end of each rolling window, the accuracy of the AI algorithm and of the manual forecasts was evaluated.
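For reference, this chronological split can be expressed as simple date filters in Python. The dataframe and column names below follow the preprocessing sketch above and are assumptions; FEATURES stands for the list of predictor columns.

```python
train = df[(df["timestamp"] >= "2015-01-01") & (df["timestamp"] < "2019-03-01")]
valid = df[(df["timestamp"] >= "2019-03-01") & (df["timestamp"] < "2020-01-01")]
test = df[(df["timestamp"] >= "2020-02-01") & (df["timestamp"] < "2020-09-15")]

X_train, y_train = train[FEATURES], train["normalized_sales"]
X_valid, y_valid = valid[FEATURES], valid["normalized_sales"]
X_test, y_test = test[FEATURES], test["normalized_sales"]
```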

A short note on forecasting time horizons: in this study, we developed a short-term and a long-term forecasting model to cater to the operational needs of the restaurant. The primary distinction between these two models lies in their treatment of weather-related variables, because weather forecasts are typically not reliable or available for a time horizon of more than a few days. The long-term forecasting model is simplified in that it retains only temperature as a meteorological variable, discarding the others. This model generates its forecasts by using historical average temperature data for specific hours across various months as an independent variable. From an operational standpoint, the restaurant management has identified short-term forecasts, based on the more complex model, as more crucial. Consequently, the subsequent sections of this paper will concentrate on short-term forecasting results, although references to the long-term forecasting model will also be included to provide a comprehensive overview of the forecasting strategy.

The results of our training, validation and testing steps are presented in the subsequent sections. For clarity, the original forecasts were absolute sales values, and thus, the models were trained on absolute values. To maintain confidentiality regarding the absolute sales volumes of the restaurant, this paper reports normalized sales data as described previously. Hence, for publication purposes, the original training, validation, and testing process was repeated in 2023 based on normalized sales data.

3.2 Model training

For model training, we used XGBoost version 1.7 in Python 3.11. To produce forecasts for the testing period, we performed fivefold cross-validation to optimize the XGBoost hyperparameters with the following search grid of distinct hyperparameters (optimal hyperparameters underlined): number of estimators [100, 200], maximum depth [4, 5], learning rate [0.05, 0.2], minimum child weight [15, 20] and regularization alpha [1, 10]. Thus, in the process of hyperparameter tuning, 2⁵ = 32 different hyperparameter configurations were evaluated.
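The paper does not state which tooling performed the search. As one way to reproduce it, scikit-learn’s GridSearchCV over the stated grid with fivefold cross-validation could look as follows; this is a sketch, and X_train/y_train follow the split sketch above.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [4, 5],
    "learning_rate": [0.05, 0.2],
    "min_child_weight": [15, 20],
    "reg_alpha": [1, 10],
}
search = GridSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,        # fivefold cross-validation, evaluating all 2^5 = 32 configurations
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)   # the tuned hyperparameter configuration
```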

3.3 Model validation

The primary objective of the validation period was to fine-tune the model through feature selection, aiming to mitigate potential issues that may arise from high dimensionality. To achieve this, we first trained an XGBoost model on all 89 features in the training dataset and then predicted normalized sales in the validation period. We then evaluated the impact of the variables on the performance of the model to decide whether they should remain in the model, be modified or be removed. This process of recursive feature selection is described below.

3.3.1 Feature importance

One commonly used approach for assessing the importance of a feature in predicting the target with the XGBoost algorithm is ‘gain’. Gain represents the cumulative information gain from tree splits on a specific feature. Figure 4 presents the normalized gain for the 12 most important variables produced by the XGBoost prediction model. The hour of the day had the greatest impact on sales, which is not surprising given the strong cyclicality of demand discussed earlier in this article. Temperature was the highest-ranking external variable and ranked 12th in terms of feature importance. To obtain a more differentiated picture of the impact of different types of internal and external variables, Fig. 4 also displays the aggregated importance of model variables based on the categorization introduced in Table 2. The most important feature categories were hour of the day and day of the week, followed by weather variables.

Fig. 4
figure 4

Normalized variable importance for the top 12 variables and for aggregated variable categories in the validation dataset

For feature selection, we iteratively removed the features with the lowest information gain from the model and evaluated the impact on the performance metrics of the model.
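In the actual project, each removal was also discussed with management, as described below, so the process was not fully automated. The following Python sketch therefore shows only the mechanical part of such a gain-based recursive elimination; the stopping threshold (min_features) and the data names are assumptions carried over from the earlier sketches.

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error


def gain_importance(model: xgb.XGBRegressor) -> dict:
    """Gain-based importance as reported by the underlying booster."""
    return model.get_booster().get_score(importance_type="gain")


def validation_rmse(features, X_train, y_train, X_valid, y_valid):
    model = xgb.XGBRegressor(objective="reg:squarederror").fit(X_train[features], y_train)
    preds = model.predict(X_valid[features])
    return float(np.sqrt(mean_squared_error(y_valid, preds))), model


def recursive_elimination(features, X_train, y_train, X_valid, y_valid, min_features=60):
    """Greedy sketch: repeatedly drop the lowest-gain feature while validation RMSE does not worsen."""
    current = list(features)
    rmse_current, model = validation_rmse(current, X_train, y_train, X_valid, y_valid)
    while len(current) > min_features:
        importance = gain_importance(model)
        # features never used in a split receive no score; treat them as zero gain
        weakest = min(current, key=lambda f: importance.get(f, 0.0))
        candidate = [f for f in current if f != weakest]
        rmse_candidate, candidate_model = validation_rmse(
            candidate, X_train, y_train, X_valid, y_valid
        )
        if rmse_candidate <= rmse_current:
            current, rmse_current, model = candidate, rmse_candidate, candidate_model
        else:
            break  # removal hurts accuracy; in the project, such cases were discussed with management
    return current
```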

3.3.2 Model performance metrics

A common way to evaluate and compare the accuracy of sales prediction algorithms is to compare their respective root mean squared error (RMSE), their mean absolute error (MAE), and, in some instances, the mean absolute percentage error (MAPE).

RMSE is the average magnitude of prediction errors measured as the square root of the average squared differences between predicted and actual hourly sales, namely,

$$RMSE = \sqrt {\frac{1}{n}\,\sum\nolimits_{{i = 1}}^{n} {\left( {y_{i} - h\left( {x_{i} } \right)} \right)^{2} } } .$$

In our case study with n = 7487 observations, yi is the actual normalized sales of observation i, xi is the corresponding feature vector and h is the predictive model.

Similarly, the MAE is the average absolute deviation of model predictions from observations in the validation dataset, namely,

$$MAE = \frac{1}{n}\sum\nolimits_{{i = 1}}^{n} {\left| {y_{i} - h\left( {x_{i} } \right)} \right|.}$$

The RMSE penalizes large errors more strongly than the MAE due to the squaring of the errors. Both measures effectively guided feature selection together with the restaurant managers; using the average normalized hourly sales of EUR 241.29 as a benchmark facilitated the discussion and interpretation of the values.

The MAPE is the average percentage deviation of model predictions versus actual hourly sales. Mathematically:

$$MAPE = \frac{1}{n}\sum\nolimits_{{i = 1}}^{n} {\left| {\frac{{y_{i} - h\left( {x_{i} } \right)}}{{y_{i} }}} \right|.}$$

The MAPE, which scales errors, is easily interpretable but flawed when actual values are zero. Furthermore, it can overemphasize small but percentagewise large deviations, which are irrelevant from an operational perspective. It was thus omitted from feature selection but calculated for model evaluation on sales over EUR 150, representing busy hours (above the 40th percentile) where prediction accuracy is critical for tasks such as staffing.
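The three metrics can be computed directly from the definitions above. The short sketch below implements them in Python, including the restriction of the MAPE to busy hours; the threshold of 150 follows the text, and the array names are placeholders.

```python
import numpy as np


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape_busy_hours(y_true, y_pred, threshold=150.0):
    """MAPE restricted to busy hours, i.e., hours with normalized sales above the threshold."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mask = y_true > threshold
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))
```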

Table 3 provides an overview of the RMSE and the MAE computed to inform the recursive feature elimination process. It shows the 25 features with the lowest feature importance in terms of ‘gain’ (in ascending order). It displays the RMSE and the MAE for the model when the respective feature is removed (all else equal). In addition, it shows both metrics for the model when this variable and all other variables with lower ‘gain’ are removed. Finally, it shows the decision on how to proceed with the variable. A variable could either be removed from the model, combined with other variables to create a new feature or, as a default case, be kept in the model. The final decision on feature selection is not a purely technical one but also considers aspects raised by the management regarding the expected reliability of the predictions of the model. For example, the removal of the dummy variable certificates term 1 would improve the RMSE in the validation set from 67.42 to 66.66 and the MAE in the validation set from 48.74 to 47.93. In fact, since this is only 1 day per year, the number of data points the model can learn from is limited for this feature. However, the management preferred to keep the variable in the model, as in their experience, demand patterns on this day were different. Overall, our feature selection process considered both technical aspects of model optimization and factors that are important for management to build trust in the forecasts produced by the model.

Table 3 RMSE and MAE to inform the recursive feature elimination process

The RMSE and the MAE of the validation dataset after feature selection are 65.23 (vs. 67.42 before) and 46.99 (vs. 48.74 before), respectively. The RMSE and the MAE in the training dataset were 44.29 and 32.27, respectively. Figure 5 shows scatterplots of the predicted vs. actual values of the training and validation datasets after feature selection. Overall, the predicted values align with the actual values, almost forming a straight line for both the training and validation datasets. Consequently, the XGBoost algorithm can produce predictions on the test dataset that are in line with actual observations. Furthermore, the scatter plots do not indicate severe overfitting.

Fig. 5
figure 5

Actual vs. predicted normalized sales for training and validation datasets

3.4 Results (model evaluation)

Based on the enhanced set of model features, we retrained our model on all available historical data starting on 2015/01/01 until 2 days prior to the start of the respective prediction period (i.e., we consider a lead time of 2 days) to predict normalized hourly sales. Predictions were made based on available information regarding the prediction period as described in Tables 1 and 2.
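This rolling retraining with a 2-day lead time could be expressed as in the sketch below, reusing the dataframe, FEATURES and best_params placeholders from the earlier sketches; the window dates shown are taken from the testing period described in this section, and the function itself is an assumption about how the retraining was organized rather than the project’s actual code.

```python
import pandas as pd
from xgboost import XGBRegressor


def forecast_window(df, features, window_start, window_end, params, lead_days=2):
    """Retrain on all history up to `lead_days` before the window, then predict the window."""
    cutoff = pd.Timestamp(window_start) - pd.Timedelta(days=lead_days)
    history = df[df["timestamp"] < cutoff]
    window = df[(df["timestamp"] >= window_start) & (df["timestamp"] <= window_end)]
    model = XGBRegressor(objective="reg:squarederror", **params)
    model.fit(history[features], history["normalized_sales"])
    return window.assign(prediction=model.predict(window[features]))


# e.g., the evaluation window from May 19th to June 1st, 2020
window_predictions = forecast_window(df, FEATURES, "2020-05-19", "2020-06-01", params=best_params)
```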

Table 4 displays the RMSE, MAE and MAPE for short-term predictions across six rolling evaluation windows, each spanning 2 weeks to 1 month. (The joint testing effort with the management had to be put on hold for several weeks in spring 2020 due to the COVID-19 pandemic. From June 16th, 2020 onward, four consecutive evaluation windows were possible).

Table 4 RMSE, MAE and MAPE of AI predictions vs. manual predictions of sales

For each evaluation window, Table 4 presents the RMSE, MAE and MAPE for the respective training period, the short-term AI predictions produced by the XGBoost algorithm, and the manual predictions made by the management team. Note that manually predicted sales are normalized in the same way as actual sales to ensure fair comparisons.

As shown in Table 4, applying the AI algorithm reduces the RMSE and the MAE in all evaluation periods compared to manual forecasts. The largest improvement can be observed in the evaluation window from the 1st until the 15th of February when the RMSE of manual predictions is 91.25 and the RMSE of AI predictions is 60.84. Thus, the RMSE is 33% lower for the AI-based predictions vs. current management practice. Similarly, the MAE of AI predictions is 42.92 vs. 61.80 of manual predictions and thus 31% lower. Additionally, the MAPE, calculated for busy restaurant hours with normalized sales over EUR 150, is 17.6% for AI predictions and 26.8% for manual predictions. Thus, the accuracy of AI predictions is 9.2 percentage points better. The smallest advantage of AI-generated sales predictions can be observed in the period between the 19th of May and the 1st of June 2020. The RMSE of AI predictions is 6.5% lower than that of manual predictions (85.35 vs. 63.88), the respective MAE is 7.9% lower (63.88 vs. 69.37) and the MAPE is 2.3 percentage points lower. The performance of the longer-term prediction models is similar and thus superior to manual forecasts.

The results show that the AI algorithm offers the largest improvement in prediction accuracy during the evaluation window before the outbreak of the COVID-19 pandemic in Germany, which entailed changes in consumption behavior. Accordingly, the evaluation window with the smallest improvement in accuracy is immediately after the lockdown. However, the results presented in Table 4 also show that during the second half of June, the AI algorithm is able to generate predictions that were substantially better than those created by the managers, with an RMSE that is 33% lower, an MAE that is 26% lower and a MAPE during busy hours that is 6.6 percentage points lower.

For illustration, Fig. 6 plots the normalized actual hourly sales vs. manually predicted sales vs. predictions generated by the trained XGBoost algorithm for two exemplary weeks in 2020. The graphs show that the AI prediction model predicts sales more accurately than the manual forecasts do; however, the manual predictions also appear to capture the cyclicality in the data well.

Fig. 6
figure 6

Two exemplary series of hourly normalized sales data, model predictions and manual forecasts during the testing period

4 Discussion and evaluation

The case study has demonstrated that the utilization of AI algorithms to forecast sales in an F&B setting yields a substantial improvement in the accuracy of predictions. Depending on the evaluation window and the accuracy metric employed, the relative improvement ranges from 6.5% to 33.3%. On average, the improvement in RMSE is larger than the improvement in MAE, indicating that the usage of AI particularly improves forecasts during periods with high demand, which is particularly relevant from an operational perspective. The XGBoost algorithm is able to capture not only the cyclicality of consumption patterns but also external influences such as weather conditions, events and holidays. Remarkably, even in the immediate aftermath of COVID-related lockdowns, XGBoost produced predictions that surpassed the accuracy of experienced restaurant managers. This achievement is particularly noteworthy because of the temporary shift in consumption patterns brought about by the pandemic, which could be observed when restaurants were allowed to resume in-house services but demand was still lower than usual. Like restaurant managers, AI had limited data to learn how to predict sales during the pandemic. Nonetheless, the variables ‘consumer trust’, ‘year-on-year sales growth’ and ‘latest 30 days’ allowed the algorithm to capture recent changes in consumption patterns. These three variables were incorporated into the model following in-depth discussions with the management that aimed to identify potential drivers and predictors of demand.

The close collaboration with management in terms of business understanding, feature definition, feature selection and model evaluation was not only a prerequisite to developing a robust model but was also crucial to building trust in AI-based forecasts.

In this context, the testing period was of particular importance. It lasted approximately half a year and was performed live, i.e., forecasts were generated upfront—in parallel to the manual forecasts created by the management. Throughout the evaluation windows, the management had the opportunity to thoroughly examine the AI-generated forecasts against the manually generated predictions. Particularly in cases where notable differences arose between the AI-generated and manual predictions, there was a sense of realization and appreciation when the model’s accuracy proved to be superior.

Apart from building trust, the seamless usability of the technology contributed to the positive reception of the AI-driven forecasts. For managers, an important consideration was how much effort producing manual forecasts required compared with using the forecasting technology. Therefore, the process of requirements definition carried significance to ensure that upon implementation, the system would be set up such that it would liberate managers from repetitive tasks so that they could dedicate their time to more value-adding activities. The evaluation period itself encapsulated this consideration: managers were provided not only with the mere hourly sales forecasts but also with essential predictors such as weather forecasts as well as holidays and the business climate indicator. Such information is easy to capture in an automated manner via APIs, while looking it up online would be a time-consuming, repetitive manual task. Additionally, to facilitate the plausibility assessment of the AI-based forecasts, managers were given the sales data from the corresponding week of the previous year, which constituted their starting point for manual predictions. This ensured that the process of collecting the data for plausibility checks was automated, too, so that it would not consume managers’ time.

The case study has also shown that not all variables considered important by management contribute equally to increasing prediction accuracy. For some external variables, collecting and integrating them into predictive models takes some effort, especially when they are only available in unstructured or semistructured formats. Thus, the incremental gain in accuracy and the additional complexity and cost of certain predictors need to be balanced. For example, our analysis has shown that the feature importance of local events for the considered restaurant is low and that there typically is no single source for collecting the dates of such events. From an efficiency standpoint, one might question whether it is worthwhile to integrate such external event variables, since there is a trade-off between an increase in prediction accuracy and the effort of data collection and integration. Indeed, on average throughout the year, there might not be a noticeable effect of integrating certain external variables into the prediction model, as such events affect sales only during a few days (in our case study, on 6 of 364 days) a year. The impact almost averages out, so it is hardly noticeable. However, from our perspective, such predictors are crucial to build trust in the AI. Restaurant managers are used to accounting for variables such as local events when making manual predictions, so the AI model is expected to consider them as well. Furthermore, on the specific day of the event, predictions will likely be different if the model is also trained on these external data. If, on such peak demand days, a prediction model clearly underestimates sales, restaurant management might lose trust in the predictive algorithm and stop using it. Similarly, school holidays in neighboring states appear to have little effect on predicted sales, while restaurant managers perceive them as important. Therefore, whether to integrate unstructured or semistructured data such as local events and regional school holidays into a prediction model should be evaluated carefully based on an analysis of their impact on prediction accuracy and their role in increasing managerial acceptance of a prediction algorithm.

Our case study has several limitations. The location of the restaurant, which is situated near a highway leading toward the beach, is quite specific. Therefore, the importance of features such as holidays and weather might be different at other locations. In other parts of the world, there might be entirely different holidays and consumption patterns to consider. Furthermore, there might be additional data sources that could improve a predictive model. For example, we considered further external predictor variables such as traffic data from mobile phone operators. However, these data were deemed prohibitively expensive to obtain and implement.

Therefore, the main purpose of this paper is to recommend a general approach for leveraging big data to improve F&B forecasting: (1) starting with management interviews to collect requirements, (2) identifying suitable external data sources, (3) capturing these data sources and performing feature engineering, (4) preprocessing the target variable, including potential data cleansing, (5) selecting and defining a suitable model or algorithm, and (6) training and evaluating the model and the importance of predictive variables based on a rigorous protocol. This case study therefore provides tangible and hands-on ideas for how to approach similar forecasting problems.

5 Conclusions

Our case study has shown that there are two major advantages of using AI models that integrate external data for sales forecasting of F&B outlets. First, there is an increase in prediction accuracy. Second, managers are relieved from repetitive tasks related to producing manual sales forecasts. As F&B outlets are often owner-operated small and medium-sized businesses (SMBs), it is particularly important to work closely with owners or managers to build trust in the technology.

To ensure that managers’ time is freed up, it is important to fully automate the forecasting process, which includes integrating the prediction model not only with external data sources via APIs but also with internal systems. For large franchise systems benefiting from scale economies, it might just be a matter of time until short-term demand forecasting becomes fully automated. For SMBs lacking resources and skills, it is questionable whether such an integration effort would pay off vs. continuing manual predictions in the near future. Thus, offering a simple solution that integrates easily with restaurant ERP and finance systems might represent an opportunity for existing restaurant software vendors or start-ups. Based on our experience and feedback from management, the most important use case is shift planning and staff allocation, where it is particularly important to correctly identify demand peaks and where the AI model has proven particularly robust. Thus, such AI algorithms might first be implemented in human resource systems and then be extended to other functions, such as procurement and finance.