Designing an Efficient Gradient Descent Based Heuristic for Clusterwise Linear Regression for Large Datasets

Kayış, Enis

doi:10.1007/978-3-030-83014-4_8

Enis Kayış ORCID: orcid.org/0000-0001-8282-5572

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1446)

Included in the following conference series:

International Conference on Data Management Technologies and Applications

Abstract

Multiple linear regression is a method for quantifying the effects of a set of independent variables on a dependent variable. In clusterwise linear regression problems, data points with similar regression estimates are grouped into the same cluster, either to meet a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same category must belong to the same partition. For large datasets, finding the exact solution is not possible, and many heuristics require an amount of time that grows exponentially in the number of categories. We propose variants of a gradient descent based heuristic to provide high-quality solutions within a reasonable time. The performance of our heuristics is evaluated across 1014 simulated datasets. We find that the comparative performance of the base gradient descent based heuristic is quite good, with an average percentage gap of \(0.17\%\) when the number of categories is less than 60. However, starting with a fixed initial partition and restricting cluster assignment changes to be one-directional speeds up the heuristic dramatically with a moderate decrease in solution quality, especially for datasets with multiple predictors and a large number of datasets. For example, one could generate solutions with an average percentage gap of \(2.81\%\) in one-tenth of the time for datasets with 400 categories.
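To make the problem setting concrete, the following is a minimal sketch of the objective described above: each category is assigned to one cluster, every cluster gets its own least-squares fit, and the quality of a partition is the total sum of squared errors. The abstract does not spell out the heuristic's steps, so the improvement loop here is a generic one-category-at-a-time descent used purely for illustration; it is not the chapter's method. All function names, the use of an intercept term, and the definition of the percentage gap (relative excess over a reference objective value) are assumptions.

```python
import numpy as np

def cluster_sse(X, y):
    """Sum of squared residuals of an OLS fit (with intercept) on one cluster."""
    if len(y) == 0:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def total_sse(X, y, category, assignment, k):
    """Objective: total SSE when category c is placed in cluster assignment[c].

    All points of a category share a cluster, as in the problem statement.
    """
    point_cluster = assignment[category]
    return sum(
        cluster_sse(X[point_cluster == j], y[point_cluster == j])
        for j in range(k)
    )

def descent_heuristic(X, y, category, k, assignment):
    """Illustrative local search (an assumption, not the chapter's heuristic):
    move one category at a time and keep only moves that lower the objective."""
    assignment = assignment.copy()
    best = total_sse(X, y, category, assignment, k)
    improved = True
    while improved:
        improved = False
        for c in range(len(assignment)):
            for j in range(k):
                if j == assignment[c]:
                    continue
                old = assignment[c]
                assignment[c] = j
                value = total_sse(X, y, category, assignment, k)
                if value < best - 1e-12:
                    best = value          # keep the improving move
                    improved = True
                else:
                    assignment[c] = old   # undo a non-improving move
    return assignment, best

def percentage_gap(value, reference):
    """Assumed definition: relative excess over a reference objective, in percent."""
    return 100.0 * (value - reference) / reference
```

Because every accepted move strictly lowers the objective and there are finitely many partitions, the loop terminates, but only at a local optimum. Each pass refits up to k regressions per candidate move, which hints at why cheaper variants matter when the number of categories is large.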

Author information

Authors and Affiliations

Industrial Engineering Department, Ozyegin University, Istanbul, Turkey
Enis Kayış

Corresponding author

Correspondence to Enis Kayış.

Editor information

Editors and Affiliations

MODESTE/ESEO, Angers, France
Slimane Hammoudi
Fraunhofer FIT and RWTH Aachen University, Aachen, Germany
Christoph Quix
University of Coimbra, Coimbra, Portugal
Jorge Bernardino

About this paper

Cite this paper

Kayış, E. (2021). Designing an Efficient Gradient Descent Based Heuristic for Clusterwise Linear Regression for Large Datasets. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2020. Communications in Computer and Information Science, vol 1446. Springer, Cham. https://doi.org/10.1007/978-3-030-83014-4_8

DOI: https://doi.org/10.1007/978-3-030-83014-4_8
Published: 23 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83013-7
Online ISBN: 978-3-030-83014-4
eBook Packages: Computer Science, Computer Science (R0)