Abstract
Multiple linear regression quantifies the effects of a set of independent variables on a dependent variable. In clusterwise linear regression problems, data points with similar regression estimates are grouped into the same cluster, either because of a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same category should belong to the same partition. For large datasets, finding the exact solution is not possible, and many heuristics require an amount of time that increases exponentially in the number of categories. We propose variants of a gradient descent based heuristic to provide high-quality solutions within a reasonable time. The performance of our heuristics is evaluated across 1014 simulated datasets. We find that the comparative performance of the base gradient descent based heuristic is quite good, with an average percentage gap of \(0.17\%\) when the number of categories is less than 60. However, starting with a fixed initial partition and restricting cluster assignment changes to be one-directional speeds up the heuristic dramatically with a moderate decrease in solution quality, especially for datasets with multiple predictors and a large number of datasets. For example, one could generate solutions with an average percentage gap of \(2.81\%\) in one-tenth of the time for datasets with 400 categories.
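To make the problem statement concrete, the following minimal sketch evaluates one candidate solution of the category-constrained problem: every data point inherits the cluster of its category, an ordinary least squares model is fitted per cluster, and the squared errors are summed. It is not the heuristic proposed in the paper. The function name `clusterwise_sse`, the array layout, and the synthetic data are assumptions made only for illustration.

```python
import numpy as np


def clusterwise_sse(X, y, categories, assignment, n_clusters):
    """Total squared error of per-cluster OLS fits (illustrative helper).

    X: (n, p) predictors; y: (n,) response;
    categories: (n,) integer category label of each data point;
    assignment: (n_categories,) cluster index chosen for each category.
    """
    # Every data point inherits the cluster of its category, so points
    # from the same category always land in the same partition.
    point_cluster = assignment[categories]
    total = 0.0
    for k in range(n_clusters):
        mask = point_cluster == k
        if not mask.any():
            continue  # an empty cluster contributes no error
        # Add an intercept column and fit ordinary least squares.
        Xk = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(Xk, y[mask], rcond=None)
        resid = y[mask] - Xk @ beta
        total += float(resid @ resid)
    return total


# Synthetic data (assumed): 4 categories of 25 points, two true regimes.
rng = np.random.default_rng(0)
categories = np.repeat(np.arange(4), 25)
X = rng.normal(size=(100, 1))
true_slope = np.array([2.0, 2.0, -3.0, -3.0])  # categories 0,1 vs 2,3
y = true_slope[categories] * X[:, 0] + 0.1 * rng.normal(size=100)

good = np.array([0, 0, 1, 1])  # groups categories that share a regime
bad = np.array([0, 1, 0, 1])   # mixes the two regimes in each cluster
sse_good = clusterwise_sse(X, y, categories, good, n_clusters=2)
sse_bad = clusterwise_sse(X, y, categories, bad, n_clusters=2)
print(sse_good < sse_bad)
```

A search procedure, whether exact or heuristic, would look for the `assignment` vector that minimizes this quantity. With C categories and K clusters there are K^C candidate assignments, which illustrates why enumeration becomes impractical as the number of categories grows.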
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kayış, E. (2021). Designing an Efficient Gradient Descent Based Heuristic for Clusterwise Linear Regression for Large Datasets. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2020. Communications in Computer and Information Science, vol 1446. Springer, Cham. https://doi.org/10.1007/978-3-030-83014-4_8
DOI: https://doi.org/10.1007/978-3-030-83014-4_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83013-7
Online ISBN: 978-3-030-83014-4
eBook Packages: Computer Science, Computer Science (R0)