Abstract
When linear regression is used to model the functional relationship between a response variable and covariates, several distinct relationships may be present, one for each component of an underlying group structure. A flexible mixture distribution can capture a wide variety of such structures, modeling the response–covariate relationship while accounting for the components. In that spirit, we introduce a mixture regression model based on the finite mixture of generalized hyperbolic distributions and present its parameter estimation method. The flexibility of the generalized hyperbolic distribution helps identify better-fitting components, which can lead to a more meaningful functional relationship between the response variable and the covariates. In addition, we introduce an iterative component-combining procedure to aid the interpretability of the model. Results from simulated and real data analyses indicate that our method has an edge over some existing methods and that it can generate useful insights into the data set at hand for further investigation.
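As a rough illustration of the kind of model the abstract describes, a finite mixture of regressions is commonly written in the generic form below. The notation ($G$, $\pi_g$, $\boldsymbol{\beta}_g$, $\boldsymbol{\theta}_g$, $f_{\mathrm{GH}}$) is assumed here for illustration and is not taken from the article; in particular, the parameterization of the generalized hyperbolic density is left abstract.

```latex
% Generic finite mixture of regressions (illustrative notation, not the paper's).
% G: number of components; \pi_g: mixing proportions;
% \beta_g: component-specific regression coefficients;
% \theta_g: remaining parameters of the component density.
f(y \mid \mathbf{x}; \boldsymbol{\Theta})
  = \sum_{g=1}^{G} \pi_g \,
    f_{\mathrm{GH}}\!\left(y ; \mathbf{x}^{\top}\boldsymbol{\beta}_g, \boldsymbol{\theta}_g\right),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1 .

% Posterior probability that observation i belongs to component g,
% the quantity typically used to assign observations to components.
\hat{z}_{ig}
  = \frac{\pi_g \, f_{\mathrm{GH}}\!\left(y_i ; \mathbf{x}_i^{\top}\boldsymbol{\beta}_g, \boldsymbol{\theta}_g\right)}
         {\sum_{h=1}^{G} \pi_h \, f_{\mathrm{GH}}\!\left(y_i ; \mathbf{x}_i^{\top}\boldsymbol{\beta}_h, \boldsymbol{\theta}_h\right)} .
```

Under this reading, each component carries its own regression line through $\boldsymbol{\beta}_g$, which is what allows several response–covariate relationships to coexist in one data set.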
Data Availability
The data sets used in this paper are available freely online, and the reference has been provided in the manuscript.
Code Availability
Not applicable.
Funding
Dr. Ryan P. Browne is supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2018-04444).
Ethics declarations
Conflict of interest
Not applicable.
About this article
Cite this article
Kim, NH., Browne, R.P. Flexible mixture regression with the generalized hyperbolic distribution. Adv Data Anal Classif 18, 33–60 (2024). https://doi.org/10.1007/s11634-022-00532-4
DOI: https://doi.org/10.1007/s11634-022-00532-4