Abstract
In this paper, we address several aspects of applying classical machine learning algorithms to a regression problem. We validate our approach by comparing the predictive power of the models on revenue data from a large Russian restaurant chain. We pay special attention to two problems: data heterogeneity and a large number of correlated features. We describe two methods for handling heterogeneity: weighting observations and estimating models on subsamples. We define a weighting function based on the Mahalanobis distance in the feature space and evaluate its predictive properties with the following methods: ordinary least squares regression, elastic net, support vector regression, and random forest.
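The observation-weighting idea can be illustrated with a minimal sketch. The abstract does not give the exact form of the weighting function, so the Gaussian kernel, the `bandwidth` parameter, the function names, and the synthetic data below are all assumptions made for illustration, not the authors' implementation. The sketch weights each training observation by its Mahalanobis distance to a target point and then fits a weighted least squares regression.

```python
import numpy as np

def mahalanobis_weights(X, x0, bandwidth=1.0):
    """Weights that decay with Mahalanobis distance from x0.

    The Gaussian kernel and `bandwidth` are assumptions, not the paper's choice.
    """
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # The pseudo-inverse tolerates strongly correlated (near-singular) features.
    cov_inv = np.linalg.pinv(cov)
    diff = X - np.asarray(x0, dtype=float)
    # Squared Mahalanobis distance of every row of X to x0.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def weighted_ols(X, y, w):
    """Weighted least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic data standing in for the (unavailable) restaurant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

x0 = X[0]                       # point for which a prediction is wanted
w = mahalanobis_weights(X, x0)  # observations similar to x0 get more weight
beta = weighted_ols(X, y, w)
prediction = beta[0] + x0 @ beta[1:]
```

The same weights could in principle be passed as sample weights to other estimators that accept them; whether that matches the setup used in the paper is not stated in the abstract.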
The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE University) in 2019–2020 (grant #19-04-048) and within the framework of the Russian Academic Excellence Project “5–100”.
Notes
1. Data are available at goo-gl.ru/5vIE.
2. Data are taken from gks.ru.
3. Data are taken from 2GIS.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gogolev, S., Ozhegov, E.M. (2020). Comparison of Machine Learning Algorithms in Restaurant Revenue Prediction. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Communications in Computer and Information Science, vol 1086. Springer, Cham. https://doi.org/10.1007/978-3-030-39575-9_4
DOI: https://doi.org/10.1007/978-3-030-39575-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39574-2
Online ISBN: 978-3-030-39575-9
eBook Packages: Computer Science, Computer Science (R0)