1 Introduction

Revenue Management (RM) is a critical emerging approach in the food service industry for improving profitability and operational efficiency. Unlike the hotel industry, the food service industry collects little business data, so databases for customer relationship management are scarce (Moreno & Tejada, 2019). This limitation means that research analyzing consumer behavior in restaurants remains limited (Cavusoglu, 2019).

Sales tickets are the most common type of information available in restaurants. Although these data do not identify specific customer characteristics, which makes demand forecasting difficult, they provide valuable insight into customer consumption behavior and enable sales forecasting, as demonstrated in this study.

Conventional forecasting methods sometimes fail to meet the current and future challenges associated with implementing strategies to optimize restaurant management (Jiao et al., 2018). In contrast, Big Data (BD) technologies provide tools to manage and process large amounts of data (Samara et al., 2020). This allows the study of a restaurant's sales behavior, which facilitates optimal management through the implementation of strategies that increase operational and financial efficiency (Tao et al., 2020).

Sales predictive capabilities are essential in many industries, and various models have been developed for their application (Mariani et al., 2018). Several studies have adopted different models to forecast customer consumption, including linear autoregressive and nonlinear models (Tanizaki et al., 2019).

This study aims to obtain consumption predictions through unsupervised modeling to increase knowledge of customer consumption, leading to more efficient procurement of restaurant supplies. Sales forecasting also helps make short- and long-term decisions, reducing costs and increasing sales. Today, it must be supported by computer systems that can play the role of a good purchasing manager (Tsoumakas, 2019). Forecasting restaurant sales is a complex task, as several external factors, such as weather or economic conditions, influence demand (Lasek et al., 2016). The effect of meteorological factors has been demonstrated in other studies using data from a hotel restaurant and a regression model that categorizes dishes into four types (Bujisic et al., 2017).

To this end, a series of experiments has been conducted by applying BD techniques to data from 367,527 restaurant tickets issued in 2019, provided through Dynameat. The startup Dynameat aims to optimize restaurant profitability using Artificial Intelligence models based on RM and Menu Engineering strategies. Dynameat was founded before Covid-19 to offer restaurants “dynamic menu pricing” depending on demand, providing the restaurant manager with a recommendation system based on customer behavior. The selected dishes are perishable foods, which are the most critical in catering: poor forecasting of this type of food can lead to food waste and high costs for companies (Lasek et al., 2016).

The Multiple Correspondence Analysis (MCA) technique used in this study focuses on finding relationships between the variables stored in the restaurant tickets (Pouyanfar et al., 2019). To better represent the variables, Support Vector Domain Description (SVDD) is applied to generate clouds representing the relationships in a three-dimensional latent variable space (Talón-Ballestero et al., 2018). This allows us to obtain relationships between the categories (such as dishes and temperatures) using an unsupervised model without prior information about the data.

This paper introduces an innovative approach to Machine Learning (ML) based models by applying an unsupervised ML method and implementing MCA. This allows our model to employ ML in a novel way, with the potential to pioneer a new paradigm in sales forecasting.

2 Theoretical Framework and Literature Review

Restaurant sales forecasting has typically relied on intuitive techniques based on the manager's experience. However, forecasting restaurant sales is a complex process, since it is influenced by a number of factors such as weather conditions and economic factors (Lasek et al., 2016). Alternatively, ticket data can be exploited to develop sales forecasts using ML models. This process is more straightforward, as well as unbiased and dynamic, because it can adapt to changes (Tsoumakas, 2019).

Longitudinal data sets often manifest trends such as seasonality and linearity, which can be effectively managed with linear deterministic models. This set of methodologies includes models based on moving averages, presenting a wide range of variants, including the autoregressive (AR) model, the autoregressive moving average (ARMA) model, and the autoregressive integrated moving average (ARIMA) model (Lasek et al., 2016).
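As an illustration of this autoregressive family, a minimal AR(p) model fitted by least squares can be sketched in a few lines. This is a generic example, not the implementation used in the cited studies; the function names are ours.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + sum_i phi_i * y_{t-i} by least squares."""
    y = np.asarray(series, dtype=float)
    # Column i holds the lag-(i+1) values y_{t-1-i} for t = p .. n-1.
    X = np.column_stack([y[p - i - 1:len(y) - i - 1] for i in range(p)])
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    coeffs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coeffs  # [c, phi_1, ..., phi_p]

def forecast_ar(series, coeffs, steps):
    """Iteratively forecast `steps` values ahead from the end of the series."""
    p = len(coeffs) - 1
    history = list(series[-p:])
    preds = []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent value first
        pred = coeffs[0] + np.dot(coeffs[1:], lags)
        preds.append(float(pred))
        history.append(pred)
    return preds
```

ARMA and ARIMA extend this scheme with moving-average terms and differencing, respectively.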

There is a growing trend to employ supervised algorithms in food service studies using time series models. Supervised learning models use training and test data to predict a specific variable. Depending on the type of forecast variable, these models can be classified as classification or regression models (Jeong-Gil et al., 2022). Supervised models offer significant advantages, such as allowing predictions of study variables and handling larger volumes of data, but one of their drawbacks is low interpretability. These models are often referred to as black boxes, since it is unknown which variables affect the prediction (Apley & Zhu, 2020).

Therefore, the study conducted in this paper focuses on extracting the characteristics provided by the restaurant ticket data, starting from the assumptions of unsupervised models. In this work, the statistical model approach is based on dimensionality reduction.

3 Methodology

In this study, the database is built from the ticket information of a restaurant in Madrid extracted from the Point-of-Sale (POS) terminal. The company Dynameat provided us with the data and supervised access to the information from the POS systems per table. The database consists of 367,527 tickets extracted from 3 POS systems during 2019. An example of a ticket is shown in Table 1. The study uses two main variables, the daily temperatures and the dishes sold in the restaurant, associating the date on each ticket with the timestamp recorded by a temperature sensor located in the same district as the restaurant.

Table 1 Example of a ticket extracted from POS systems

In order to perform the filtering, the “family” variable has been used, which allows us to discern between different families of dishes. The filtering of perishable dishes is based on the different selected families (‘cheese’, ‘fish’, ‘meats’, ‘seafood’, ‘vegetables and mushrooms’, ‘smoked and sauces’, ‘sausages’) based on the consideration of perishable dishes in the literature (Terpstra et al., 2005).
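In pandas-like terms, this filtering step could be sketched as follows. The DataFrame layout and dish names are hypothetical; only the list of perishable families comes from the study.

```python
import pandas as pd

# Hypothetical ticket rows; real tickets come from the POS export.
tickets = pd.DataFrame({
    "dish": ["tuna tartare", "chocolate coulant", "iberico ham", "manchego"],
    "family": ["fish", "desserts", "sausages", "cheese"],
})

# Families treated as perishable, as selected in the study.
PERISHABLE_FAMILIES = {
    "cheese", "fish", "meats", "seafood",
    "vegetables and mushrooms", "smoked and sauces", "sausages",
}

# Keep only the ticket lines whose family is considered perishable.
perishable = tickets[tickets["family"].isin(PERISHABLE_FAMILIES)]
```

Here the dessert line is dropped, while the fish, sausage, and cheese lines are retained.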

In this study, the State Meteorological Agency (AEMET) database has been used to collect the temperatures recorded in the same district of Madrid where the restaurant is located (Luna Rico et al., 2008). To aggregate the two databases, both must be crossed by matching the date formats of the AEMET data against the restaurant ticket database. The data extraction was done using Python code that collects the response from the URL where the AEMET data is stored in JSON format. The subsequent step, using the two databases, is to correlate the temperature value with the dish sold on the ticket, using as a key the date on which the temperature sensor records that value and the date of the sold ticket. To achieve this, we developed a program in MATLAB that matches the formats and iterates through each of the fields, comparing each of the rows.
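The study performs the extraction in Python and the date-keyed merge in MATLAB; a Python-only sketch of the merge step, with hypothetical rows standing in for the real POS and AEMET extracts, might look like this:

```python
import pandas as pd

# Hypothetical extracts: in the study, tickets come from the POS database
# and daily temperatures from the AEMET open-data service (JSON over HTTP).
tickets = pd.DataFrame({
    "date": ["2019-01-02", "2019-01-02", "2019-01-03"],
    "dish": ["tuna tartare", "manchego", "iberico ham"],
})
temps = pd.DataFrame({
    "date": ["2019-01-02", "2019-01-03"],
    "mean_temp_c": [8.4, 6.1],
})

# Normalise both date columns to the same type before joining on them.
tickets["date"] = pd.to_datetime(tickets["date"])
temps["date"] = pd.to_datetime(temps["date"])

# Left join: every ticket line keeps its dish and gains that day's temperature.
merged = tickets.merge(temps, on="date", how="left")
```

Each ticket row now carries both the dish sold and the mean temperature of the day it was sold.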

Figure 1 shows the diagram of this study, where the first two blocks show the loading of the two mentioned databases and their preprocessing to prepare the dataset for the unsupervised model (MCA). In addition to exploring the statistical variability and the distribution of the data, the bootstrap resampling method is added. This technique is based on the idea of generating multiple data samples (with replacement) from the original sample and then calculating the statistic of interest for each of the samples (Hesterberg, 2011). In our study, the statistic of interest is the confidence interval of the weights of each of the eigenvectors obtained by MCA, where the information of each category is represented; it is then determined whether that category is statistically significant (which corresponds to the red vertical lines shown in the eigenvector block of Fig. 1).
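A simplified sketch of this bootstrap step follows, using the leading singular vector of a centred indicator matrix as a stand-in for an MCA eigenvector (the actual MCA computation applies chi-square weighting, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def leading_axis(indicator):
    """First right singular vector of the centred indicator matrix
    (a simplified stand-in for one MCA eigenvector)."""
    X = indicator - indicator.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def bootstrap_ci(indicator, n_boot=500, alpha=0.05):
    """Percentile confidence interval for each category weight on the axis."""
    ref = leading_axis(indicator)
    n = indicator.shape[0]
    samples = []
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)   # resample tickets with replacement
        v = leading_axis(indicator[rows])
        if np.dot(v, ref) < 0:              # fix the sign ambiguity of the SVD
            v = -v
        samples.append(v)
    samples = np.array(samples)
    lo = np.quantile(samples, alpha / 2, axis=0)
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)
    return lo, hi

# Toy indicator matrix: tickets x categories (1 = category present on ticket).
indicator = rng.integers(0, 2, size=(200, 5)).astype(float)
lo, hi = bootstrap_ci(indicator)
significant = (lo > 0) | (hi < 0)  # CI excluding zero -> significant category
```

Categories whose interval excludes zero correspond to the red vertical lines in Fig. 1.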

Fig. 1
Diagram of the research framework

After the bootstrap resampling, the next step is to observe the confidence volume using the SVDD method, which accumulates the bootstrap resamples considered within the hypersphere. Another tool used is the amplitude of the Gaussian kernel of the probability density, a widely used method based on the central limit theorem (Wang et al., 2019). The latent space of the categories produced by the MCA is interpreted as a probability mass function. In this context, centered samples represent larger values, indicating more significant repetition within the statistical range; conversely, less repetitive samples are not centrally located. In terms of relationships between categories, closer samples show greater affinity, while samples farther apart imply a lower association.
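As a rough illustration of the confidence-volume idea, one can enclose 95% of a bootstrap cloud in a hypersphere. This is a simplified stand-in for SVDD, which instead fits the minimal enclosing hypersphere in a kernel-induced feature space; the cloud below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def confidence_sphere(points, coverage=0.95):
    """Hypersphere (centre, radius) covering `coverage` of the bootstrap
    points - a simplified stand-in for the SVDD confidence volume."""
    centre = points.mean(axis=0)
    dists = np.linalg.norm(points - centre, axis=1)
    radius = np.quantile(dists, coverage)
    return centre, radius

# Toy bootstrap cloud for one category in the 3-D latent space.
cloud = rng.normal(loc=[0.2, -0.1, 0.4], scale=0.05, size=(1000, 3))
centre, radius = confidence_sphere(cloud)
inside = np.linalg.norm(cloud - centre, axis=1) <= radius
```

By construction, about 95% of the resampled points fall inside the sphere, which is what the plotted confidence volumes convey.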

The three-dimensional representation does not always give a clear and summarized view of the relationships between the categories. Therefore, this information is transferred to a distance table, which compares all clouds and their respective distances. This table is then modified by normalizing the distances and adding a color (based on an inverted heat map) to each cell. In the heat map, cold-colored cells represent larger distances, where categories have a lower affinity, and warm colors represent smaller distances and thus a higher affinity between categories.
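The normalisation-and-inversion step could be sketched as follows; the distances are toy values, whereas the real table compares all category clouds.

```python
import numpy as np

def inverted_heat_values(dist):
    """Normalise a distance matrix to [0, 1] and invert it, so that
    1 (warm) = smallest distance / highest affinity and
    0 (cold) = largest distance / lowest affinity."""
    d = np.asarray(dist, dtype=float)
    norm = (d - d.min()) / (d.max() - d.min())
    return 1.0 - norm

# Toy symmetric distance table between three category clouds.
dist = np.array([
    [0.0, 1.2, 3.0],
    [1.2, 0.0, 2.1],
    [3.0, 2.1, 0.0],
])
heat = inverted_heat_values(dist)
# `heat` could then be drawn, e.g., with matplotlib's imshow and a
# reversed ("inverted") colormap so warm colors mark high affinity.
```

The diagonal maps to 1 (a category is maximally affine with itself) and the largest distance maps to 0.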

4 Experiments and Results

In this case, the data included are the temperature values obtained from the average values published by AEMET and the dishes sold and invoiced in the tickets (considering perishable foods). Figure 2a shows the three principal eigenvectors together with the confidence intervals of each category, where the red vertical lines mark categories that are statistically significant at the 95% confidence level. These three principal eigenvectors are plotted in three-dimensional space, as shown in Fig. 2b, where the clouds of temperatures and dishes are denoted. Closer clouds represent a higher statistical affinity, and clouds farther from each other a lower affinity. On the one hand, the categories represented in the center of the three-dimensional representation are those with the highest representation or frequency in the data.

Fig. 2
The three principal eigenvectors (a) and the bootstrap sample representation together with the confidence volumes (b)

On the other hand, the categories farther from the center have a lower representation (frequency) in the data, following the Gaussian interpretation above. The large number of categories of both variables (dishes and temperatures) does not allow us to visualize the relationships clearly; therefore, we transferred this information to a table of distances.

We calculate a distance table using the centers of the confidence volumes of the categories. This table is a square matrix whose dimension is the total number of categories (dishes plus temperature values). Since this matrix is large and complex, we have simplified it by dividing the temperatures into 7 categories: extremely high, very high, high, medium, low, very low, and extremely low, using quantile-based cut points to decide which temperature goes into each category. In this way, all the information in the matrix is summarized in a form that is easier to understand.
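One plausible reading of this binning, sketched with seven equal-frequency quantile bands (the exact cut points used in the study are not specified), is:

```python
import numpy as np

LABELS = ["extremely low", "very low", "low", "medium",
          "high", "very high", "extremely high"]

def temperature_category(temps):
    """Assign each temperature to one of 7 labels using quantile cut points,
    so every category holds roughly the same number of days."""
    temps = np.asarray(temps, dtype=float)
    # 6 inner cut points at quantiles 1/7, 2/7, ..., 6/7.
    edges = np.quantile(temps, np.linspace(0, 1, 8)[1:-1])
    idx = np.searchsorted(edges, temps, side="right")
    return [LABELS[i] for i in idx]

temps = np.linspace(0, 35, 70)  # toy daily mean temperatures in Celsius
cats = temperature_category(temps)
```

Each day's temperature thus collapses into one of seven ordinal labels, shrinking the distance table accordingly.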

5 Conclusions

The present work has employed tickets from a restaurant in Madrid together with weather data from AEMET to identify relationships and improve sales forecasting. Moreover, this study represents an important starting point for implementing unsupervised models in the restaurant industry, since previous studies have been based on supervised models (Sakib, 2023).

Previous research has used the MCA model to study the customer profile in the hotel industry, which allows us to visualize the statistical relationships between variables (Talón-Ballestero et al., 2018). However, this study goes a step further by allowing the visualization of the relationships between dishes and temperatures in an inverted heat map, which provides a more complete and comprehensive representation of sales patterns.

These results reveal information that the restaurateur should investigate. On the one hand, the similar consumption of some dishes may indicate that customers tend to order them together, which opens the possibility of promoting their joint sale through techniques such as cross-selling. On the other hand, the presence of dishes consumed at the most extreme temperatures may indicate seasonal dishes or dishes that have been introduced to the menu for specific periods. Thus, these sales patterns provide valuable information that helps the restaurant manager make accurate sales forecasts and improve the restaurant's operational efficiency.

Although these results are promising, it is essential to consider some limitations. For example, the MCA method reveals statistically robust relationships, but its effectiveness may be affected by the size of the study samples. Despite these limitations, the model shows relationships between dishes sold regularly throughout the year. Another important limitation is that the results refer to the “dishes” and not to the restaurant inputs (all the products that make up a dish); addressing this would require restaurants to computerize the standard recipes of all their dishes, which is still a pending issue in the restaurant industry. With this information, it would be possible to forecast inputs and solve the problem of stock forecasting. If restaurants collected this information, it could open up a promising future line of research.

The proposed methodology could be applied to other variables, such as months, days of the week, and hours, to uncover consumption patterns. Another avenue is to relate sales to the composition of restaurant tables in order to understand the behavior of different customer segments (singles, couples, families, among others). Similarly, relating employees or restaurant areas to sales would make it possible to assess worker productivity. In this line of research there is still a long way to go: the sales transaction data collected by restaurants can improve their operations and product management, increase the quality of food service, support assessing the impact of promotional activities on sales, and enable RM techniques such as dynamic pricing or menu engineering to maximize profits and improve customer satisfaction.