
Regression Method in Data Mining: A Systematic Literature Review

  • Review article
  • Archives of Computational Methods in Engineering

Abstract

Regression is one of the most important supervised learning methods in data mining, used both to predict outcomes and to discover knowledge from data. A review of studies in this field shows that researchers' use of regression is increasing steadily. This study reviews 500 articles from about 230 reputable journals published during the twenty-first century under a single framework and discusses the status and use of regression in data mining research. The systematic framework comprises four steps: (1) examining the position of regression within data mining research and identifying how the volume of regression studies in different journals has changed over the years; (2) examining the application areas of regression research and tracing how interest in these areas has evolved over time; (3) examining the algorithms used in regression studies and identifying the most widely used algorithms and the trends in their adoption; and (4) examining the keywords used in regression research and, using the Apriori algorithm, extracting the strongest and most interesting association rules among these keywords.
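
Step 4 of the framework mines association rules among article keywords with the Apriori algorithm. As a purely illustrative sketch (not the authors' actual pipeline), the following example shows how such rules could be derived with the mlxtend library; the keyword lists, support, and confidence thresholds are hypothetical.

```python
# Illustrative sketch: Apriori-based association rules over article keywords.
# Hypothetical data and thresholds; not the reviewed corpus or the authors' settings.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each inner list plays the role of one article's keyword set.
articles = [
    ["regression tree", "random forest", "prediction"],
    ["logistic regression", "classification", "prediction"],
    ["regression tree", "boosting", "prediction"],
    ["support vector regression", "time series", "prediction"],
    ["logistic regression", "random forest", "classification"],
]

# One-hot encode the keyword "transactions" as required by mlxtend's apriori.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(articles), columns=encoder.columns_)

# Frequent keyword itemsets appearing in at least 40% of the articles.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)

# Association rules ranked by confidence; lift is also reported.
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```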

Abbreviations

RT: Regression tree

CART: Classification and regression tree

MLR: Multiple logistic regression

RF: Random forest

SVR: Support vector regression

HABES: Harmful algal bloom expert system

ANN: Artificial neural network

BRT: Boosted regression tree

ZOIB: Zero-or-one inflated beta

LLM: Logit leaf model

TPOT: Tree-based pipeline optimization tool

GBT: Gradient boosting tree

MLPNN: Multilayer perceptron neural network

TGD: Three Gorges Dam

LSSVM: Least squares support vector machine

DT: Decision tree

DNN: Dynamic neural network

PCA: Principal component analysis

CSS: Customer satisfaction survey

HPSO: Hybrid particle swarm optimization

HHCART: CART using Householder matrices

FT-SVR: Fourier transform and SVR

ROC: Receiver operating characteristic

CHAID: Chi-square automatic interaction detection

NLRA: Non-linear regression analysis

KNN: K-nearest neighbor

HTBR: Hierarchical tree-based regression

ISM-RT: Interpretative structural modeling with regression tree

GBR: Gradient boosted regression

LSTM: Long short-term memory network

ABC-LR: Artificial bee colony with logistic regression

GDP: Gross domestic product

X12-ARIMA: X12 autoregressive integrated moving average

MLP: Multilayer perceptron

MARS-GBM: MARS and gradient boosting machine

LOR: Logistic regression

LR: Linear regression

MRT: Multivariate regression tree

SVM: Support vector machine

GMDH: Group method of data handling

STR: Smooth transition regression

SPFs: Safety performance functions

LDA: Linear discriminant analysis

CT: Classification tree

MARS: Multivariate adaptive regression spline

ANFIS: Adaptive neuro-fuzzy inference system

IVTS: Interval-valued time series

XGBoost: Extreme gradient boosting

GBDT: Gradient boosted decision tree

LSSVR: Least squares support vector regression

QSAR: Quantitative structure-activity relationship

STR-tree: Smooth transition regression and CART

RBFN: Radial basis function network

HPSORTRBFN: HPSO, RT and RBFN

SDM: Species distribution model

MS-HCA: Multidimensional scaling and hierarchical cluster analysis

AUC: Area under the curve

IOT: Internet of things

DTR: Decision tree regression

GBRT: Gradient boosting regression tree

ELM: Extreme learning machine

fLogSLFN: Filtered logistic single-hidden layer feedforward neural network

RFR: Random forest regression

EMD-LSTM: Empirical mode decomposition-based LSTM

ICD-9-CM: International Classification of Diseases, 9th Revision, Clinical Modification

SO2: Sulfur dioxide

GP: Genetic programming

GBDT-LR: Gradient boosting decision tree with logistic regression
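
Many of the abbreviations above (RT, RFR, SVR, GBRT, and others) denote standard regression learners. As a minimal, purely illustrative sketch, assuming scikit-learn, synthetic data, and arbitrary hyperparameters rather than any configuration from the reviewed studies, a few of them can be fit and compared as follows.

```python
# Illustrative comparison of several regression learners named in the abbreviations:
# regression tree (RT), random forest regression (RFR), support vector regression (SVR),
# and gradient boosting regression tree (GBRT). Synthetic data; a sketch only.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for a real data mining task.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "RT (regression tree)": DecisionTreeRegressor(max_depth=5, random_state=0),
    "RFR (random forest regression)": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR (support vector regression)": SVR(kernel="rbf", C=10.0),
    "GBRT (gradient boosting regression tree)": GradientBoostingRegressor(random_state=0),
}

# Fit each model and report held-out mean squared error.
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")
```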


Author information

Corresponding author

Correspondence to Mohammad Vahid Sebt.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sebt, M.V., Sadati-Keneti, Y., Rahbari, M. et al. Regression Method in Data Mining: A Systematic Literature Review. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10088-5

