Designing an Efficient Gradient Descent Based Heuristic for Clusterwise Linear Regression for Large Datasets

  • Conference paper
Data Management Technologies and Applications (DATA 2020)

Abstract

Multiple linear regression quantifies the effects of a set of independent variables on a dependent variable. In clusterwise linear regression problems, data points with similar regression estimates are grouped into the same cluster, either due to a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same category must belong to the same partition. For large datasets, finding the exact solution is not possible, and many heuristics require an amount of time that grows exponentially in the number of categories. We propose variants of a gradient descent based heuristic that provide high-quality solutions within a reasonable time. The performance of our heuristics is evaluated across 1014 simulated datasets. We find that the base gradient descent based heuristic performs quite well, with an average percentage gap of 0.17% when the number of categories is less than 60. However, starting with a fixed initial partition and restricting cluster assignment changes to be one-directional speeds up the heuristic dramatically with only a moderate decrease in solution quality, especially for datasets with multiple predictors and a large number of categories. For example, one could generate solutions with an average percentage gap of 2.81% in one-tenth of the time for datasets with 400 categories.
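
Since the abstract only summarizes the heuristic, the Python sketch below illustrates the problem setting and the basic descent idea under stated assumptions: a separate ordinary least-squares model is fitted per cluster, and whole categories (never individual points) are reassigned between clusters whenever the move lowers the total squared error. The names `fit_sse`, `total_sse`, and `category_descent`, the random initial partition, and the stopping rule are illustrative choices, not the authors' exact algorithm.

```python
import numpy as np


def fit_sse(X, y):
    """Sum of squared residuals of an OLS fit (with intercept) on (X, y)."""
    if len(y) == 0:
        return 0.0
    A = np.c_[np.ones(len(y)), X]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)


def total_sse(X, y, cats, assign, k):
    """Objective: sum of per-cluster OLS errors under a category-to-cluster map."""
    point_cluster = np.array([assign[c] for c in cats])
    return sum(fit_sse(X[point_cluster == j], y[point_cluster == j]) for j in range(k))


def category_descent(X, y, cats, k, seed=0, max_passes=20):
    """Greedy descent over category-to-cluster assignments (illustrative sketch).

    Points sharing a category label always move together, so the
    same-category-same-cluster constraint holds by construction.
    """
    rng = np.random.default_rng(seed)
    labels = list(np.unique(cats))
    assign = {c: int(rng.integers(k)) for c in labels}   # random initial partition
    best = total_sse(X, y, cats, assign, k)
    for _ in range(max_passes):
        improved = False
        for c in labels:
            current = assign[c]
            for j in range(k):
                if j == current:
                    continue
                assign[c] = j                             # tentatively move category c
                obj = total_sse(X, y, cats, assign, k)
                if obj + 1e-9 < best:                     # accept an improving move
                    best, current, improved = obj, j, True
                else:
                    assign[c] = current                   # revert the move
            assign[c] = current
        if not improved:                                  # local optimum reached
            break
    return assign, best


if __name__ == "__main__":
    # Small synthetic example (hypothetical data, for illustration only).
    rng = np.random.default_rng(1)
    n, p, n_cat = 300, 2, 30
    cats = rng.integers(n_cat, size=n)
    X = rng.normal(size=(n, p))
    # Two true regimes: the first half of the categories follows one coefficient, the rest another.
    beta = np.where(cats[:, None] < n_cat // 2, 1.0, -2.0)
    y = (X * beta).sum(axis=1) + 0.1 * rng.normal(size=n)
    assign, sse = category_descent(X, y, cats, k=2)
    print("total SSE:", round(sse, 3))
```

The one-directional variant described above could, for instance, be approximated by allowing each category to leave its initial cluster at most once and never return, which shrinks the set of candidate moves per pass at the cost of a somewhat coarser search.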

Author information

Corresponding author

Correspondence to Enis Kayış.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kayış, E. (2021). Designing an Efficient Gradient Descent Based Heuristic for Clusterwise Linear Regression for Large Datasets. In: Hammoudi, S., Quix, C., Bernardino, J. (eds) Data Management Technologies and Applications. DATA 2020. Communications in Computer and Information Science, vol 1446. Springer, Cham. https://doi.org/10.1007/978-3-030-83014-4_8

  • DOI: https://doi.org/10.1007/978-3-030-83014-4_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83013-7

  • Online ISBN: 978-3-030-83014-4

  • eBook Packages: Computer Science (R0)
