Abstract
Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application in a wide range of empirical settings, complex clustering-based research questions that include the contribution of the covariates set to the classification and the presence of atypical observations are often addressed ignoring the possible effects of wrong model assumptions. Hidden Markov regression models with random covariates (HMRMRCs) have been recently proposed as an improvement over the classical fixed covariates approach, allowing the covariates to contribute to the underlying clustering structure. To make the approach more flexible, when all the considered random variables are continuous, HMRMRCs are here defined focusing on three multivariate elliptical distributions: the normal (reference distribution), the t, and the contaminated normal. The latter two, heavy-tailed generalizations of the normal distribution, are introduced to protect the reference model for the occurrence of mildly atypical points and also allow us their automatic detection. Identifiability conditions are provided, EM-based algorithms are outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through Monte Carlo experiments with the aim of showing the consequences of wrong model assumptions on paramaters estimates and inferred clustering. Artificial and real data analyses are provided to investigate models behavior in presence of heterogeneity and atypical observations.
Similar content being viewed by others
References
Bartolucci F, Farcomeni A (2009) A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. J Am Stat Assoc 104:816–831
Bartolucci F, Farcomeni A, Pennoni F (2014) Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates. Test 23(3):433–465
Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1):164–171
Bernardi M, Maruotti A, Petrella L (2017) Multiple risk measures for multivariate dynamic heavy-tailed models. J Empir Financ 43:1–32
Biernacki C, Lourme A (2014) Stable and visualizable Gaussian parsimonious clustering models. Stat Comput 24(6):953–969
Croux C, Dehon C (2003) Estimators of the multiple correlation coefficient: local robustness and confidence intervals. Stat Pap 44(3):315–334
Dang UJ, Punzo A, McNicholas PD, Ingrassia S, Browne RP (2017) Multivariate response and parsimony for Gaussian cluster-weighted models. J Classif 34(1):4–34
Dannemann J, Holzmann H, Leister A (2014) Semiparametric hidden Markov models: identifiability and estimation. Wiley Interdiscip Rev Comput Stat 6(6):418–425
Hennig C (2000) Identifiablity of models for clusterwise linear regression. J Classif 17(2):273–296
Hossain A, Naik DN (1991) A comparative study on detection of influential observations in linear regression. Stat Pap 32(1):55–69
Ingrassia S, Rocci R (2007) Constrained monotone EM algorithms for finite mixture of multivariate Gaussians. Comput Stat Data Anal 51(11):5339–5351
Ingrassia S, Minotti SC, Punzo A (2014) Model-based clustering via linear cluster-weighted models. Comput Stat Data Anal 71:159–182
Lachos VH, Angolini T, Abanto-Valle CA (2011) On estimation and local influence analysis for measurement errors models under heavy-tailed distributions. Stat Pap 52(3):567–590
Leroux BG (1992) Maximum-likelihood estimation for hidden Markov models. Stoch Process Their Appl 40(1):127–143
Maronna RA (1976) Robust \({M}\)-estimators of multivariate location and scatter. Ann Stat 4(1):51–67
Martinez-Zarzoso I, Maruotti A (2013) The environmental kuznets curve: functional form, time-varying heterogeneity and outliers in a panel setting. Environmetrics 24(7):461–475
Maruotti A (2011) Mixed hidden Markov models for longitudinal data: An overview. Int Stat Rev 79(3):427–454
Maruotti A (2014) Robust fitting of hidden Markov regression models under a longitudinal setting. J Stat Comput Simul 84(8):1728–1747
Maruotti A, Punzo A (2017) Model-based time-varying clustering of multivariate longitudinal data with covariates and outliers. Comput Stat Data Anal 113:475–496
Maruotti A, Bulla J, Lagona F, Picone M, Martella F (2017) Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures. Ann Appl Stat 11(3):1617–1648
Maruotti A, Punzo A, Bagnato L (2019) Hidden Markov and semi-Markov models with multivariate leptokurtic-normal components for robust modeling of daily returns series. J Financ Econom 17(1):91–117
Mazza A, Punzo A (2017) Mixtures of multivariate contaminated normal regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0964-y
Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw 86(2):1–30
McLachlan G, Krishnan T (2007) The EM algorithm and extensions, Wiley Series in Probability and Statistics, vol 382, 2nd edn. Wiley, New York
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278
Niu X, Li P, Zhang P (2016) Testing homogeneity in a scale mixture of normal distributions. Stat Pap 57(2):499–516
Punzo A, Ingrassia S (2015) Parsimonious generalized linear Gaussian cluster-weighted models. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Studies in classification, data analysis and knowledge organization. Springer, Switzerland, pp 201–209
Punzo A, Maruotti A (2016) Clustering multivariate longitudinal observations: the contaminated Gaussian hidden Markov model. J Comput Graph Stat 25(4):1097–1116
Punzo A, McNicholas PD (2016) Parsimonious mixtures of multivariate contaminated normal distributions. Biom J 58(6):1506–1537
Punzo A, McNicholas PD (2017) Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model. J Classif 34(2):249–293
Punzo A, Ingrassia S, Maruotti A (2018a) Multivariate generalized hidden Markov regression models with random covariates: physical exercise in an elderly population. Stat Med 37(19):2797–2808
Punzo A, Mazza A, McNicholas PD (2018b) ContaminatedMixt: An R package for fitting parsimonious mixtures of multivariate contaminated normal distributions. J Stat Softw 85(10):1–25
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ritter G (2015) Robust cluster analysis and variable selection, Chapman & Hall/CRC monographs on statistics & applied probability, vol 137. CRC Press, Boca Raton
Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley Series in probability and statistics. Wiley, Hoboken
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7(1):5–40
Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649
Visser I, Raijmakers MEJ, Molenaar PCM (2000) Confidence intervals for hidden markov model parameters. Br J Math Stat Psychol 53(2):317–327
Zucchini W, MacDonald IL, Langrock R (2016) Hidden Markov models for time series: an introduction using R, monographs on statistics & applied probability, vol 150, 2nd edn. CRC Press, Boca Raton
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Punzo, A., Ingrassia, S. & Maruotti, A. Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. Stat Papers 62, 1519–1555 (2021). https://doi.org/10.1007/s00362-019-01146-3
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00362-019-01146-3