Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions

  • Sanjeena Subedi
  • Paul D. McNicholas
Regular Article


Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Both univariate and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with a discussion and suggestions for future work.
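The abstract does not spell out the variational updates, so the following is only a hedged illustration of the model being fitted, not the authors' algorithm. It is a minimal sketch, assuming the common Barndorff-Nielsen parameterization of the univariate NIG density (alpha > |beta|, delta > 0, location mu), a G-component mixture of it, and the plug-in posterior membership probabilities that a fitted mixture would be used for when clustering. All function names are hypothetical. The Bessel function K_1 is computed from the integral representation K_nu(z) = ∫_0^∞ exp(-z cosh t) cosh(nu t) dt to avoid a SciPy dependency.

```python
import math


def bessel_k_scaled(nu, z, n=400):
    """Return exp(z) * K_nu(z) for z > 0 (hypothetical helper).

    Uses the trapezoid rule on the exponentially scaled integrand
    exp(-z (cosh t - 1)) cosh(nu t) over [0, t_max]; the integrand is
    smooth and decays double-exponentially, so the rule converges fast.
    """
    if z <= 0:
        raise ValueError("z must be positive")
    t_max = math.acosh(1.0 + 60.0 / z)  # beyond t_max the integrand is ~exp(-60)
    h = t_max / n
    total = 0.5  # t = 0 term (integrand equals 1), half weight
    for i in range(1, n + 1):
        t = i * h
        weight = 0.5 if i == n else 1.0
        total += weight * math.exp(-z * (math.cosh(t) - 1.0)) * math.cosh(nu * t)
    return h * total


def nig_logpdf(x, alpha, beta, delta, mu):
    """Log-density of a univariate NIG(alpha, beta, delta, mu) (assumed parameterization)."""
    if not (delta > 0 and alpha > abs(beta)):
        raise ValueError("need delta > 0 and alpha > |beta|")
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    q = math.sqrt(delta ** 2 + (x - mu) ** 2)
    # f(x) = alpha*delta*K_1(alpha*q) / (pi*q) * exp(delta*gamma + beta*(x - mu));
    # K_1(alpha*q) = bessel_k_scaled(1, alpha*q) * exp(-alpha*q).
    return (math.log(alpha * delta / (math.pi * q))
            + math.log(bessel_k_scaled(1.0, alpha * q))
            + delta * gamma + beta * (x - mu) - alpha * q)


def nig_pdf(x, alpha, beta, delta, mu):
    return math.exp(nig_logpdf(x, alpha, beta, delta, mu))


def mixture_pdf(x, weights, params):
    """Density of a finite NIG mixture; params is a list of (alpha, beta, delta, mu)."""
    return sum(w * nig_pdf(x, *p) for w, p in zip(weights, params))


def memberships(x, weights, params):
    """Plug-in posterior membership probabilities for one observation.

    Computed with log-sum-exp for numerical stability. This is NOT the
    variational update from the paper, only how a fitted model assigns x.
    """
    logs = [math.log(w) + nig_logpdf(x, *p) for w, p in zip(weights, params)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    s = sum(unnorm)
    return [u / s for u in unnorm]


if __name__ == "__main__":
    # Hypothetical two-component mixture, for illustration only.
    weights = [0.6, 0.4]
    params = [(2.0, 0.5, 1.0, -3.0), (3.0, -1.0, 0.8, 4.0)]
    for x in (-3.0, 0.5, 4.0):
        probs = memberships(x, weights, params)
        print(x, mixture_pdf(x, weights, params), probs)
```

Working in log space keeps the membership probabilities well defined even when every component density is tiny, which matters for observations far in the tails. The exponentially scaled Bessel function serves the same purpose, since K_1 itself underflows quickly for large arguments.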


Keywords: Clustering · MNIG · NIG · Normal inverse Gaussian · Variational approximations · Variational Bayes


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Guelph, Guelph, Canada
