Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions

Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello

doi:10.1007/s00362-019-01146-3

Regular Article
Published: 18 November 2019

Statistical Papers, volume 62, pages 1519–1555 (2021)

Abstract

Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application in a wide range of empirical settings, complex clustering-based research questions that involve the contribution of the covariate set to the classification and the presence of atypical observations are often addressed while ignoring the possible effects of wrong model assumptions. Hidden Markov regression models with random covariates (HMRMRCs) have recently been proposed as an improvement over the classical fixed-covariates approach, allowing the covariates to contribute to the underlying clustering structure. To make the approach more flexible when all the considered random variables are continuous, HMRMRCs are here defined focusing on three multivariate elliptical distributions: the normal (reference distribution), the t, and the contaminated normal. The latter two, heavy-tailed generalizations of the normal distribution, are introduced to protect the reference model against the occurrence of mildly atypical points and also to allow their automatic detection. Identifiability conditions are provided, EM-based algorithms are outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through Monte Carlo experiments with the aim of showing the consequences of wrong model assumptions on parameter estimates and inferred clustering. Artificial and real data analyses are provided to investigate the models' behavior in the presence of heterogeneity and atypical observations.
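
To make the role of the contaminated normal more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual two-component form of the multivariate contaminated normal density, a mixture of a "good" normal with scatter matrix sigma and a "bad" normal with the same mean and inflated scatter eta * sigma. The names alpha (proportion of typical points) and eta (inflation factor, greater than 1), the parameter values, and the 0.5 cut-off are illustrative assumptions and are not given in the abstract. The sketch covers a single hidden state only and leaves out the Markov chain and the regression structure.

```python
import numpy as np


def mvn_logpdf(x, mu, sigma):
    """Log-density of a multivariate normal, evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    chol = np.linalg.cholesky(sigma)               # sigma = chol @ chol.T
    z = np.linalg.solve(chol, (x - mu).T)          # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)                  # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))   # log-determinant of sigma
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def contaminated_normal_pdf(x, mu, sigma, alpha, eta):
    """Density alpha * N(mu, sigma) + (1 - alpha) * N(mu, eta * sigma)."""
    log_good = np.log(alpha) + mvn_logpdf(x, mu, sigma)
    log_bad = np.log1p(-alpha) + mvn_logpdf(x, mu, eta * sigma)
    return np.exp(np.logaddexp(log_good, log_bad))


def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that each row of x comes from the 'good' component."""
    log_good = np.log(alpha) + mvn_logpdf(x, mu, sigma)
    log_bad = np.log1p(-alpha) + mvn_logpdf(x, mu, eta * sigma)
    return np.exp(log_good - np.logaddexp(log_good, log_bad))


if __name__ == "__main__":
    # Illustrative (assumed) parameters for one hidden state in two dimensions.
    mu = np.zeros(2)
    sigma = np.eye(2)
    points = np.array([[0.0, 0.0], [6.0, 6.0]])
    p = prob_good(points, mu, sigma, alpha=0.9, eta=10.0)
    # Assumed detection rule: flag a point as atypical when p < 0.5.
    print(p, p < 0.5)
```

Under these assumptions, a point close to the state mean receives a high probability of being typical, while a point far in the tails is attributed to the inflated component and flagged. This is one way to read the "automatic detection" mentioned above.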

Author information

Authors and Affiliations

Department of Economics and Business, University of Catania, Catania, Italy
Antonio Punzo & Salvatore Ingrassia
Department of Economics, Political Sciences and Modern Languages, LUMSA, Rome, Italy
Antonello Maruotti
Department of Mathematics, University of Bergen, Bergen, Norway
Antonello Maruotti

Corresponding author

Correspondence to Antonello Maruotti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 141 KB)

About this article

Cite this article

Punzo, A., Ingrassia, S. & Maruotti, A. Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. Stat Papers 62, 1519–1555 (2021). https://doi.org/10.1007/s00362-019-01146-3

Received: 15 July 2018
Revised: 03 August 2019
Published: 18 November 2019
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00362-019-01146-3
