Model-based approach for household clustering with mixed scale variables

Advances in Data Analysis and Classification


The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target these programmes, the Ministry aims to find clusters of households with the same needs, based on the demographic characteristics and poverty conditions of each household. The available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex-design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed-scale data, and that accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.
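
The underlying variable response idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it only assumes that an ordinal observation arises by cutting a latent continuous value at fixed thresholds. The function name `latent_to_ordinal` and the cutpoint values are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def latent_to_ordinal(z, cutpoints):
    """Return the 0-based ordinal category that the latent value z falls into.

    cutpoints must be sorted in increasing order; k cutpoints define
    k + 1 ordered categories. The name and thresholds are illustrative
    assumptions, not taken from the paper.
    """
    return bisect_right(cutpoints, z)


# Hypothetical thresholds defining three ordered categories.
cutpoints = [0.0, 1.0]
latent_values = [-0.5, 0.5, 1.5]
categories = [latent_to_ordinal(z, cutpoints) for z in latent_values]
print(categories)
```

In a model of this kind the latent values would be given a mixture distribution and inferred from the data; here they are fixed by hand only to show the mapping from the latent scale to the observed ordinal scale.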






The authors are grateful for the constructive comments of a guest editor and two anonymous referees. The first author acknowledges support from Consejo Nacional de Ciencia y Tecnología, Mexico. The second author acknowledges support from Asociación Mexicana de Cultura, A. C., Mexico. The third author is also affiliated with the Collegio Carlo Alberto and acknowledges support of Grant CPDA154381/15 from the University of Padua, Italy.


Corresponding author

Correspondence to Luis Nieto-Barajas.


Cite this article

Carmona, C., Nieto-Barajas, L. & Canale, A. Model-based approach for household clustering with mixed scale variables. Adv Data Anal Classif 13, 559–583 (2019).
