Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions

  • Regular Article
  • Statistical Papers

Abstract

Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application in a wide range of empirical settings, complex clustering-based research questions that include the contribution of the covariate set to the classification and the presence of atypical observations are often addressed while ignoring the possible effects of wrong model assumptions. Hidden Markov regression models with random covariates (HMRMRCs) have recently been proposed as an improvement over the classical fixed-covariates approach, allowing the covariates to contribute to the underlying clustering structure. To make the approach more flexible when all the considered random variables are continuous, HMRMRCs are here defined focusing on three multivariate elliptical distributions: the normal (reference distribution), the t, and the contaminated normal. The latter two, heavy-tailed generalizations of the normal distribution, are introduced to protect the reference model against the occurrence of mildly atypical points and also to allow their automatic detection. Identifiability conditions are provided, EM-based algorithms are outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through Monte Carlo experiments with the aim of showing the consequences of wrong model assumptions on parameter estimates and inferred clustering. Artificial and real data analyses are provided to investigate the models' behavior in the presence of heterogeneity and atypical observations.
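
To illustrate how a contaminated normal component can both downweight and flag mildly atypical points, the sketch below uses the standard form of the multivariate contaminated normal density, a two-part mixture α·N(μ, Σ) + (1 − α)·N(μ, ηΣ) with α the proportion of typical points and η > 1 an inflation factor. It is a minimal illustration and not the authors' implementation: the function names, the 0.5 threshold, and the values of `alpha` and `eta` are assumptions made here, and the hidden Markov and regression structure of the full model is omitted.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) at each row of x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T so that sum(z**2) is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    maha = np.sum(z ** 2, axis=0)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def contaminated_normal_pdf(x, mean, cov, alpha, eta):
    """Density alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov)."""
    good = alpha * np.exp(mvn_logpdf(x, mean, cov))
    bad = (1.0 - alpha) * np.exp(mvn_logpdf(x, mean, eta * cov))
    return good + bad


def prob_good(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x comes from the non-inflated part."""
    good = alpha * np.exp(mvn_logpdf(x, mean, cov))
    return good / contaminated_normal_pdf(x, mean, cov, alpha, eta)


# Illustrative (assumed) parameter values, not estimates from the article.
mean = np.zeros(2)
cov = np.eye(2)
alpha, eta = 0.9, 10.0

points = np.array([[0.0, 0.0], [5.0, 5.0]])
p_good = prob_good(points, mean, cov, alpha, eta)

# Assumed rule for this sketch: flag a point when it is more likely to come
# from the inflated component than from the reference one.
is_atypical = p_good < 0.5
print(p_good, is_atypical)
```

With these values the point at the centre is retained as typical, while the point far from the centre is flagged. In an EM-type fit, probabilities of this kind would be computed within each hidden state rather than for a single known component.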

Author information

Corresponding author

Correspondence to Antonello Maruotti.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 141 KB)

Cite this article

Punzo, A., Ingrassia, S. & Maruotti, A. Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. Stat Papers 62, 1519–1555 (2021). https://doi.org/10.1007/s00362-019-01146-3

  • DOI: https://doi.org/10.1007/s00362-019-01146-3
