On finite mixtures of Discretized Beta model for ordered responses

The paper discusses the specification of finite mixture models based on the Discretized Beta distribution for the analysis of ordered discrete responses, such as ratings and count data. The ultimate goal of the paper is to parameterize clusters of opposite and intermediate response outcomes. After a thorough discussion of model interpretation, identifiability and estimation, the proposal is illustrated by means of a case study on the probability to vote for German political parties and of a comparative discussion with the state of the art.


Motivation
In the run-up to any electoral competition, parties' share of the electoral consensus can be measured by pollsters if voting intentions are surveyed on nominal scales. A more innovative approach consists in gauging the probability to vote for each candidate as ratings on ordered scales, in order to assess the extent to which respondents' opinions hold. Similarly, marketing stakeholders prefer to survey the intention to take a certain decision in the future, rather than asking yes/no questions about respondents' likings and habits. Thus, suitable statistical modelling of ordered evaluations is advocated to characterize clusters of both extreme and intermediate response choices.
Polarization is hereafter meant as the process by which evaluations about an item converge towards one of two opposing poles of the response spectrum, in the spirit of (Apouey 2007). Possibly, a further cluster may be expected as a result of un-polarized respondents, corresponding to a concentration of responses away from the extremes: the term floatation is hereafter used to indicate this circumstance as complementary to polarization.

Rosaria Simone, rosaria.simone@unina.it, University of Naples Federico II (Università degli Studi di Napoli Federico II), Naples, Italy
A candidate model allowing to directly parameterize polarization towards the extremes is the two-component mixture of Inverse Hypergeometric distributions (mihg, (Simone and Iannario 2018)), whereas a mixture of Binomial and Discretized Beta models can be considered to analyse overall response feeling and certain symmetric response styles (caub, (Simone and Tutz 2018)). For count data, bimodality (not necessarily at the extremes of the response support) can be tackled via suitable adaptation of the (shifted) Poisson distribution (Gómez-Déniz et al. 2020) or by resorting to a two-component mixture of Conway-Maxwell-Poisson models (Sur et al. 2015).
With respect to the state of the art, the paper discusses the specification of mixture models based on the Discretized Beta distribution (Ursino 2014;Ursino and Gasparini 2018) as a flexible class of statistical models to parameterize polarization and floatation of ordered evaluations. The proposal is designed to attain broad and straightforward interpretation for marketing, psychology and socio-economic studies, as it allows to characterize opposite and intermediate response clusters. Further relevant applications include self-reported wealth or health, or Net Promoter Score type evaluations (NPS, (Reichheld 2003)) to assess the extent by which attractors outclass detractors (Capecchi and Piccolo 2017).
The paper is organized as follows: Sect. 2 recalls the baseline framework of the Discretized Beta model. The core of the paper is Sect. 3, with a detailed discussion on mixtures based on the Discretized Beta distribution to jointly model polarization and floatation of ordered evaluations; goodness-of-fit criteria and inferential aspects are described in Sects. 3.2-3.5, whereas a comparative discussion of the state of the art is delivered in Sect. 3.6. A case study is pursued in Sect. 4 to support the proposal with empirical evidence. Concluding remarks are addressed in Sect. 5. A devoted appendix supplements the presentation with a discussion on the optimal number of components for Discretized Beta mixtures and of the parameter constraints needed to prevent identifiability issues.

Discretized Beta mixtures for polarization and floatation of ordered data
Let R be a rating variable collected on a response scale with m ordered categories, say c_1 ≺ c_2 ≺ · · · ≺ c_m: the numeric scoring c_r = r will be made merely for notational convenience. Without loss of generality, assume that the scale has a positive orientation with the trait being examined.

Definition 1 For α, β ∈ R^+, let X ∼ Beta(α, β) be a Beta distributed random variable over the real interval [0, 1]. For a given m > 3, a discrete variable R, with support {1, 2, . . . , m}, is said to be distributed according to a Discretized Beta model of parameters α, β (R ∼ DB(α, β), for short) if:

\[ \Pr(R = r \mid \alpha, \beta) = \Pr\left( \frac{r-1}{m} \le X \le \frac{r}{m} \,\Big|\, \alpha, \beta \right), \qquad r = 1, \dots, m. \tag{1} \]

For notational convenience, set db(r; α, β) := Pr(R = r | α, β). This model has already been acknowledged in the literature on ordinal data analysis in view of the flexibility inherited from the underlying Beta distribution, which does not impose a predetermined shape for the latent continuous trait (Ursino 2014; Fasola and Sciandra 2015; Ursino and Gasparini 2018; Simone and Tutz 2018). Similar arguments can be advanced for the Beta-Binomial model (Morrison 1979), yet the Discretized Beta is more versatile as it can be either overdispersed or underdispersed (Ursino 2014). The uniform distribution arises as a special case when α = β = 1.

Location and shape properties of the latent Beta model imply the following features of the DB distribution (Abramowitz and Stegun 1972; Forbes et al. 2011). Given that the discretization of the latent Beta model occurs at equi-spaced intervals for a fixed m, the modal value . Thus, the following condition implies an inner mode:

- The distribution is U-shaped with two modal values at the first and at the last categories if max(α, β) < 1 and if, for the given m, parameters satisfy the following system of inequalities based on the incomplete Beta function I_x(α, β):

As a consequence, a necessary condition for a Discretized Beta model to be applied for polarization of either favourable or unfavourable responses is the constraint min(α, β) < 1. Under this circumstance, parameter α governs the polarization of the unfavourable responses: hereafter, this cluster will be referred to as the opponents' pole. If β = max(α, β) ≥ 1, the closer α is to 0, the stronger is the polarization of the opponents, with positive asymmetry increasing with growing β. Conversely, β governs the polarization of the favourable responses (say, the supporters' pole). If α = max(α, β) ≥ 1, the closer β is to 0, the higher is the probability assigned to the last category and thus the stronger is the polarization of the supporters, with negative asymmetry strengthening with growing α. A Discretized Beta model with max(α, β) < 1, instead, can be specified to account for polarization towards both the extremes (provided that (3) holds), whereas floatation between the two response endpoints can be modelled by assuming a DB(α, β) distribution with min(α, β) > 1, such that (2) holds true, given the number of categories.

Asymmetry and intensity of floatation can be measured in terms of skewness γ_1(α, β) and excess kurtosis γ_2(α, β) of the underlying Beta distribution:

\[ \gamma_1(\alpha, \beta) = \frac{2(\beta - \alpha)\sqrt{\alpha + \beta + 1}}{(\alpha + \beta + 2)\sqrt{\alpha\beta}}, \tag{4} \]

\[ \gamma_2(\alpha, \beta) = \frac{6\left[(\alpha - \beta)^2(\alpha + \beta + 1) - \alpha\beta(\alpha + \beta + 2)\right]}{\alpha\beta(\alpha + \beta + 2)(\alpha + \beta + 3)}, \]

such that γ_1(α, β) = −γ_1(β, α) and γ_2(α, β) = γ_2(β, α). However, interpretation of excess kurtosis is not straightforward for asymmetric distributions: the measure of kurtosis adjusted for skewness introduced in (Blest 2003) can be considered to overcome this issue (see (15) in Appendix 1 for details).
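As a numerical companion to the shape properties above, the following minimal Python sketch evaluates db(r; α, β) as a difference of regularized incomplete Beta functions and lists the local modes of the resulting distribution. All function names (`reg_inc_beta`, `db_pmf`, `local_modes`, `beta_skewness`, `beta_excess_kurtosis`) are illustrative assumptions and not part of the paper; the incomplete Beta function is computed with a standard continued-fraction expansion, and the two moment measures are the textbook formulas for the latent Beta distribution.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-13):
    """Continued-fraction factor of the incomplete Beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for i in range(1, max_iter + 1):
        even = i * (b - i) * x / ((a - 1.0 + 2 * i) * (a + 2 * i))
        odd = -(a + i) * (a + b + i) * x / ((a + 2 * i) * (a + 1.0 + 2 * i))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            step = d * c
            h *= step
        if abs(step - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def db_pmf(alpha, beta, m):
    """Probabilities db(r; alpha, beta), r = 1..m, as in (1)."""
    cdf = [reg_inc_beta(alpha, beta, r / m) for r in range(m + 1)]
    return [cdf[r] - cdf[r - 1] for r in range(1, m + 1)]

def local_modes(p):
    """1-based categories whose probability strictly exceeds each neighbour's."""
    m = len(p)
    return [r + 1 for r in range(m)
            if (r == 0 or p[r] > p[r - 1]) and (r == m - 1 or p[r] > p[r + 1])]

def beta_skewness(a, b):
    """Skewness of the latent Beta(a, b) distribution."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))

def beta_excess_kurtosis(a, b):
    """Excess kurtosis of the latent Beta(a, b) distribution."""
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    return num / (a * b * (a + b + 2) * (a + b + 3))
```

For example, `local_modes(db_pmf(0.5, 0.5, 9))` flags the first and the last category, whereas a choice such as `db_pmf(3, 5, 9)` yields a single inner mode; inspecting the modes in this way is only a numerical check and does not replace conditions (2) and (3).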
Some identifiability issues may arise for the polarization components in both (6) and (7), due to a Beta approximation of the latent Beta models. Appendix 2 collects all the relevant discussion and results pertaining to these topics: the present section will focus on the proposed class of mixtures, stemming from (7) under suitable parameter constraints.

The OFS mixture for polarization and floatation of ordered evaluations
In order to overcome possible identifiability issues for mixtures of DB models, the proposed strategy is to constrain β 1 = 1 and α 3 = 1 for the mixture specification (7).
Hereafter, the acronym OFS will stand for Opponent-Floatation-Supporter, and three 0-1 subscripts will indicate if each component is specified in the mixture (1) or not (0). Thus, models DB(α 1 , 1), with α 1 ∈ (0, 1), and DB(1, β 3 ), with β 3 ∈ (0, 1), will be referred to as OFS 100 and OFS 001 to indicate a DB distribution to model polarization towards the opponents' and the supporters' pole, respectively. Consequently, as a benchmark for bi-polarization towards the end-points, the proposal is to assume the following mixture specification.
Definition 2 If α_1, β_3, δ ∈ (0, 1), the OFS 101 model is defined by the mixture:

\[ \Pr(R = r \mid \alpha_1, \beta_3, \delta) = \delta \, db(r; \alpha_1, 1) + (1 - \delta) \, db(r; 1, \beta_3), \qquad r = 1, \dots, m. \tag{8} \]

The mixture of OFS 101 for polarization with an OFS 010 distribution for floatation (so that (2) holds) can be safely considered to jointly model polarization towards either one or both the extremes and possible floatation in between.
Definition 3 If the above notation prevails, the OFS 111 model is defined by:

\[ \Pr(R = r \mid \theta) = \delta_1 \, db(r; \alpha_1, 1) + (1 - \delta_1 - \delta_3) \, db(r; \alpha_2, \beta_2) + \delta_3 \, db(r; 1, \beta_3), \qquad r = 1, \dots, m, \tag{9} \]

with θ = (δ_1, δ_3, α_1, α_2, β_2, β_3).

Remark 1 With reference to the procedures outlined in Appendix 2 and unlike for (6) and (7), the Beta approximation of the latent polarization components in (9), and its combination with the latent floatation, does not correspond to an OFS 111 specification. The same arguments apply if either OFS 100 or OFS 001 are assumed for polarization. Thus, identifiability of parameters can be assumed for OFS mixture models.
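The two definitions can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the polarization components use the closed-form latent distribution functions x^α_1 (for DB(α_1, 1)) and 1 − (1 − x)^β_3 (for DB(1, β_3)), the floatation component is obtained by a simple midpoint-rule integration that is adequate only for α_2, β_2 > 1, and the assignment of the weight δ to the opponents' component in the two-component case is an assumption of the sketch.

```python
import math

def db_opponents(alpha1, m):
    """DB(alpha1, 1): the latent distribution function is x**alpha1."""
    return [(r / m) ** alpha1 - ((r - 1) / m) ** alpha1 for r in range(1, m + 1)]

def db_supporters(beta3, m):
    """DB(1, beta3): the latent distribution function is 1 - (1 - x)**beta3."""
    return [(1 - (r - 1) / m) ** beta3 - (1 - r / m) ** beta3 for r in range(1, m + 1)]

def db_floatation(alpha2, beta2, m, grid=400):
    """DB(alpha2, beta2) for alpha2, beta2 > 1 via midpoint-rule integration per bin."""
    log_c = math.lgamma(alpha2 + beta2) - math.lgamma(alpha2) - math.lgamma(beta2)

    def dens(x):
        return math.exp(log_c + (alpha2 - 1) * math.log(x) + (beta2 - 1) * math.log(1 - x))

    h = 1.0 / (m * grid)
    masses = [h * sum(dens((r - 1) / m + (j + 0.5) * h) for j in range(grid))
              for r in range(1, m + 1)]
    total = sum(masses)
    return [w / total for w in masses]  # renormalize to absorb the quadrature error

def ofs101_pmf(alpha1, beta3, delta, m):
    """Two-component mixture: opponents with weight delta, supporters with 1 - delta."""
    return [delta * o + (1 - delta) * s
            for o, s in zip(db_opponents(alpha1, m), db_supporters(beta3, m))]

def ofs111_pmf(delta1, delta3, alpha1, alpha2, beta2, beta3, m):
    """Three-component mixture; the floatation weight is 1 - delta1 - delta3."""
    delta2 = 1.0 - delta1 - delta3
    return [delta1 * o + delta2 * f + delta3 * s
            for o, f, s in zip(db_opponents(alpha1, m),
                               db_floatation(alpha2, beta2, m),
                               db_supporters(beta3, m))]
```

A call such as `ofs101_pmf(0.3, 0.4, 0.6, 9)` returns a distribution with local modes at both end-points, while `ofs111_pmf(0.3, 0.3, 0.4, 4.0, 4.0, 0.4, 9)` adds a symmetric floatation component in between.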
Both asymmetric and symmetric floatation are encompassed by the OFS 111 model (the latter under the constraint α_2 = β_2). In case the floatation component is symmetric, the superscript (s) will be used. If m is odd, a degenerate floatation component corresponds to neutrality (in case α_2 = β_2 tends to infinity), resulting in inflation in the middle of the response scale: in this case, the superscript (i) will replace (s), and the resulting OFS (i) 111 model will denote a mixture of an OFS 101 model with a degenerate distribution 1_{c=r} with mass concentrated at c = (m + 1)/2 (so that 1_{c=r} = 0 if r ≠ c, and 1_{c=r} = 1 if r = c).
Remark 2 OFS models encompass also inflated responses at the extremes of the response support. Consider, for instance, the OFS 110 model: the DB(α 1 , 1) component identifies the opponents' cluster, which is characterized by a mode at the first category and decreasing probabilities as scores increase, thus allowing to account also for scale usage diversity among opponents and for different strengths of opposition. As a limit case, the OFS 110 tends to an inflated DB model with inflation at the first category if α 1 → 0. The dual remark applies for the OFS 011 model 5 . Thus, the smoothed switch between extreme modal values and inner categories implied by the OFS approach is more general than DB models with inflation at either one of the end-points (see the example discussed in Sect. 3.6).
Remark 3 Covariate effects on model parameters can be investigated via suitable link functions. If x i , y i , u i , z i , t i are selected subjects' characteristics, a logarithmic link can be set for individual floatation parameters α 2i , β 2i > 1: provided that the constraint (2) is taken into account also conditional to covariates, whereas a logit link can be set for polarization parameters α 1 , β 3 , δ 1 , δ 3 ∈ (0, 1):

Fitting performances and model selection
Model selection within the OFS class can be performed in terms of likelihood ratio test for pairs of nested models (to compare the symmetric and asymmetric specification for floatation, for instance). More generally, fitting performance of an OFS model against competing alternatives can be assessed by resorting to information criteria: in the following, the BIC index will be considered to account also for model complexity. Standard goodness-of-fit tests relying on Pearson X 2 statistics could be performed provided that m − 1 − k > 0, if k is the number of estimable parameters. For instance, m > 7 is needed to apply this test for OFS 111 models.
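The selection criteria just mentioned can be illustrated with a short Python sketch. The function names are hypothetical, and the p value assumes the usual asymptotic chi-square reference distribution with one degree of freedom, which is the relevant case when a symmetric floatation component (α_2 = β_2) is tested against an asymmetric one.

```python
import math

def bic(loglik, k, n):
    """BIC index for a model with k estimable parameters fitted on n observations."""
    return -2.0 * loglik + k * math.log(n)

def pearson_dof(m, k):
    """Degrees of freedom of the Pearson X^2 test: m - 1 - k (must be positive)."""
    return m - 1 - k

def lrt_pvalue_1df(loglik_full, loglik_nested):
    """Likelihood ratio statistic and its chi-square(1) p value (asymptotic assumption)."""
    stat = 2.0 * (loglik_full - loglik_nested)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
```

With k = 6 parameters for the OFS 111 model, `pearson_dof(7, 6)` is 0 and `pearson_dof(8, 6)` is 1, in line with the requirement m > 7 stated above.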
The normalized Leti's dissimilarity index (Leti 1983):

\[ Diss(f, p) = \frac{1}{2} \sum_{r=1}^{m} |f_r - p_r| \]

will be considered to measure the goodness of fit of an estimated model p = p(θ) = (p_1, . . . , p_m) to the observed relative frequency distribution f = (f_1, . . . , f_m). With respect to more traditional indicators, such as the Hellinger distance H(p, q) (Gibbs and Su 2002), the Dissimilarity value is interpretable as the percentage of responses that are missed by the model 6. For this reason, it can also be exploited to check the ability of a model p, estimated on a training set, to predict the test set distribution f. With the same goal and for comparative purposes, the Kullback-Leibler divergence

\[ KL(f \,\|\, p) = \sum_{r=1}^{m} f_r \log\left( \frac{f_r}{p_r} \right) \]

will also be computed.

5 It is worth remarking that the OFS model is reversible with respect to the scale, in the sense that if
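The three indicators can be computed with a few lines of Python. The sketch below is illustrative only: function names are hypothetical, and the Hellinger distance is written with the 1/2 normalization so that it ranges in [0, 1], which is one of the conventions in use and may differ from the one adopted in the paper.

```python
import math

def dissimilarity(f, p):
    """Normalized dissimilarity: half the sum of absolute differences."""
    return 0.5 * sum(abs(fr - pr) for fr, pr in zip(f, p))

def hellinger(f, p):
    """Hellinger distance with the 1/2 normalization (assumed convention)."""
    return math.sqrt(0.5 * sum((math.sqrt(fr) - math.sqrt(pr)) ** 2
                               for fr, pr in zip(f, p)))

def kl_divergence(f, p):
    """Kullback-Leibler divergence of p from f; terms with f_r = 0 contribute 0."""
    return sum(fr * math.log(fr / pr) for fr, pr in zip(f, p) if fr > 0)
```

For f = (0.5, 0.3, 0.2) and p = (0.4, 0.4, 0.2), the dissimilarity equals 0.1 up to rounding, which coincides with 1 − Σ min(f_r, p_r): ten percent of the responses would have to be moved to obtain a perfect fit.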

Inferential issues for the OFS model
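Hereafter, the main steps of the expectation-maximization algorithm for mixtures (EM, (McLachlan and Krishnan 1997)) to perform maximum likelihood estimation of parameters are outlined for the general OFS 111 specification. For a sample of ratings r = (r_1, . . . , r_n), the complete log-likelihood of the OFS 111 model, with parameter vector θ = (δ_1, δ_3, α_1, α_2, β_2, β_3), is given by:

\[ \ell_c(\theta) = \sum_{i=1}^{n} \Big[ Z_{1i} \log\big(\delta_1 \, db(r_i; \alpha_1, 1)\big) + Z_{2i} \log\big(\delta_2 \, db(r_i; \alpha_2, \beta_2)\big) + Z_{3i} \log\big(\delta_3 \, db(r_i; 1, \beta_3)\big) \Big], \]

where δ_2 = 1 − δ_1 − δ_3 and Z_ji is a random variable with Z_ji = 1 if the i-th rating is drawn from the j-th component in the mixture, and Z_ji = 0 otherwise. If θ^(k) is the current estimate at the k-th iteration, the posterior probabilities of the i-th rating being drawn from the opponents' component DB(α_1, 1) and the supporters' component DB(1, β_3) are computed within the E-step as:

\[ \tau_{1i} = \frac{\delta_1^{(k)} \, db(r_i; \alpha_1^{(k)}, 1)}{\Pr(R = r_i \mid \theta^{(k)})}, \qquad \tau_{3i} = \frac{\delta_3^{(k)} \, db(r_i; 1, \beta_3^{(k)})}{\Pr(R = r_i \mid \theta^{(k)})}, \]

with τ_2i = 1 − τ_1i − τ_3i. In case covariate effects are not specified in the model, then one can write τ_ji = τ_jr if r_i = r, r = 1, . . . , m, j = 1, 2, 3, and the expected complete log-likelihood to be maximized at the M-step can be rewritten as:

\[ Q(\theta) = \sum_{r=1}^{m} n_r \Big[ \tau_{1r} \log\big(\delta_1 \, db(r; \alpha_1, 1)\big) + \tau_{2r} \log\big(\delta_2 \, db(r; \alpha_2, \beta_2)\big) + \tau_{3r} \log\big(\delta_3 \, db(r; 1, \beta_3)\big) \Big], \]

where (n_1, n_2, . . . , n_m) denotes the frequency distribution of the sample, and one sets τ_2r = 1 − τ_1r − τ_3r, yielding, after differentiation, the updated estimates:

\[ \delta_j^{(k+1)} = \frac{1}{n} \sum_{r=1}^{m} n_r \tau_{jr}, \quad j = 1, 3, \]

\[ \alpha_1^{(k+1)} = \arg\max_{\alpha_1} \sum_{r=1}^{m} n_r \tau_{1r} \log(db(r; \alpha_1, 1)), \qquad \beta_3^{(k+1)} = \arg\max_{\beta_3} \sum_{r=1}^{m} n_r \tau_{3r} \log(db(r; 1, \beta_3)), \]

\[ (\alpha_2^{(k+1)}, \beta_2^{(k+1)}) = \arg\max_{\alpha_2, \beta_2} \sum_{r=1}^{m} n_r \tau_{2r} \log(db(r; \alpha_2, \beta_2)). \]

6 Indeed, from the identity min(a, b) = (1/2)[(a + b) − |a − b|], holding for a, b ∈ R^+, one can write Diss(f, p) = 1 − Σ_r min(f_r, p_r).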
At each step, the updated estimates of α 1 , α 2 , β 2 , β 3 have to be obtained from numerical optimization of the corresponding functions, under the required bound constraints 7 .
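The E- and M-steps can be sketched in Python for the two-component OFS 101 case, which keeps the example short because both polarization components have closed-form probabilities. This is a minimal illustration and not the paper's implementation: function names are hypothetical, the weight δ is assigned to the opponents' component by assumption, and the bounded one-dimensional maximizations are carried out with a golden-section search chosen here for simplicity (each objective is concave in its parameter, so the search is well behaved).

```python
import math

def db_opp(alpha1, m):
    """DB(alpha1, 1) probabilities (closed form)."""
    return [(r / m) ** alpha1 - ((r - 1) / m) ** alpha1 for r in range(1, m + 1)]

def db_sup(beta3, m):
    """DB(1, beta3) probabilities (closed form)."""
    return [(1 - (r - 1) / m) ** beta3 - (1 - r / m) ** beta3 for r in range(1, m + 1)]

def golden_max(fun, lo, hi, tol=1e-7):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if fun(c) > fun(d):
            b = d  # the maximizer lies in [a, d]
        else:
            a = c  # the maximizer lies in [c, b]
    return (a + b) / 2

def loglik(counts, delta, a1, b3):
    """Observed-data log-likelihood from the frequency distribution (n_1..n_m)."""
    m = len(counts)
    return sum(n_r * math.log(delta * o + (1 - delta) * s)
               for n_r, o, s in zip(counts, db_opp(a1, m), db_sup(b3, m)))

def em_ofs101(counts, n_iter=100, lo=1e-3, hi=1 - 1e-3):
    m, n = len(counts), sum(counts)
    delta, a1, b3 = 0.5, 0.5, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability of the opponents' component per category
        p1, p3 = db_opp(a1, m), db_sup(b3, m)
        tau1 = [delta * o / (delta * o + (1 - delta) * s) for o, s in zip(p1, p3)]
        w1 = [n_r * t for n_r, t in zip(counts, tau1)]
        w3 = [n_r * (1 - t) for n_r, t in zip(counts, tau1)]
        # M-step: closed-form weight update, numerical updates for alpha1 and beta3
        delta = sum(w1) / n
        a1 = golden_max(lambda a: sum(w * math.log(q)
                                      for w, q in zip(w1, db_opp(a, m))), lo, hi)
        b3 = golden_max(lambda b: sum(w * math.log(q)
                                      for w, q in zip(w3, db_sup(b, m))), lo, hi)
    return delta, a1, b3
```

The function works on the frequency distribution rather than on the raw ratings, mirroring the rewriting of the expected complete log-likelihood in terms of (n_1, . . . , n_m). Extending the sketch to OFS 111 would require a two-dimensional constrained optimization for (α_2, β_2), which is omitted here.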

Small simulation experiment
In order to show the performance of the estimation procedure, a small simulation experiment has been carried out: for each scenario, B = 200 samples of size n were generated. Table 1 reports the mean squared error (MSE) of the sampling distribution of parameter estimators obtained over the simulation runs. The average dissimilarity between the generating model p and the estimated distribution p̂ (Diss(p, p̂)) and between the frequency distribution of the sample f and the estimated distribution (Diss(f, p̂)) is reported. Analogous simulation experiments are pursued also for OFS 101 and OFS 110 for the sake of completeness (see Tables 2 and 3). Results are satisfactory and indicate that the model is correctly specified and estimated, with efficiency improving with sample size.

Standard errors for OFS parameters
Uncertainty evaluation of parameters estimates could be performed by resorting to asymptotic information theory on the basis of the observed information matrix (see Appendix 1 for details). Potential drawbacks of this procedure may arise due to possible occurrence of numerical overflow in the approximation of the involved integrals. In this respect, numerical derivatives of the log-likelihood can be computed directly with Richardson's extrapolation method, as suggested in (Ursino and Gasparini 2018) 8 . By considering that information theory results apply only asymptotically under regularity conditions, re-sampling methods as the bootstrap (Efron 1981) can be assumed as a general practice for OFS models, allowing to obtain stable accuracy evaluations on parameter estimates even for small sample sizes. A small Monte-Carlo experiment has been pursued to compare the asymptotic performance of the different methods: for selected OFS models, n observations were sampled. For the general OFS 111 model, Table 4 reports standard errors' estimates obtained on the basis of the observed information matrix (Inf.), numerical approximation of the derivatives of the log-likelihood function with the Richardson's extrapolation method (Num.), and nonparametric bootstrap with B = 500 replicates (Boot.). The three methods are asymptotically equivalent, but for small and moderate sample sizes, the data-driven procedure Boot entails more accurate results 9 . For instance, numerical divergence for some of the integrals involved in the computation of the observed information matrix occurred for n = 500.
The same check limited to numerical and bootstrap methods is pursued for instances of OFS 110 and OFS 101 models (see Tables 5 and 6).
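The nonparametric bootstrap can be illustrated on the simplest member of the family, the one-parameter OFS 100 model DB(α_1, 1). The Python sketch below is an assumption-laden illustration rather than the procedure used for the tables: function names are hypothetical, the maximum likelihood estimate is obtained by a golden-section search on (0, 1), and the number of replicates is kept small.

```python
import math
import random

def db_opp(alpha1, m):
    """DB(alpha1, 1) probabilities (closed form)."""
    return [(r / m) ** alpha1 - ((r - 1) / m) ** alpha1 for r in range(1, m + 1)]

def golden_max(fun, lo, hi, tol=1e-7):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if fun(c) > fun(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def fit_ofs100(counts, lo=1e-3, hi=1 - 1e-3):
    """ML estimate of alpha1 from the frequency distribution of the ratings."""
    m = len(counts)
    return golden_max(lambda a: sum(n_r * math.log(q)
                                    for n_r, q in zip(counts, db_opp(a, m))), lo, hi)

def bootstrap_se(ratings, m, B=200, seed=1):
    """Standard error of the alpha1 estimate from B nonparametric bootstrap resamples."""
    rng = random.Random(seed)
    n = len(ratings)
    estimates = []
    for _ in range(B):
        resample = rng.choices(ratings, k=n)  # sampling with replacement
        counts = [resample.count(r) for r in range(1, m + 1)]
        estimates.append(fit_ofs100(counts))
    mean = sum(estimates) / B
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (B - 1))
```

The same scheme applies to richer OFS specifications once `fit_ofs100` is replaced by the corresponding EM fit, at a correspondingly higher computational cost.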

A comparative discussion with the state of the art
Like the OFS family, mihg (Simone and Iannario 2018) and caub (Simone and Tutz 2018) mixture models pursue a direct parameterization of the features of interest of the distribution, with easy interpretation and explicit location of modal values (yet the mihg does not consider floatation). In this context, a 3-component mixture of Binomial distributions could be also considered if suitable constraints are put on Binomial parameters to model polarization and floatation: its specification will not be discussed hereafter, since the Binomial model can be approximated by the DB model (Ursino 2014): see (Grilli et al. 2015) for further applications of Binomial mixtures to discrete data. The proposal of the bimodal discrete shifted Poisson model (Bi-Poiss) advanced in (Gómez-Déniz et al. 2020), instead, deals with a construction to encompass bimodal count data starting from the Poisson model, with addition of an extra dispersion parameter θ responsible for bimodality (not necessarily at the extremes of support) 10 . After truncation at m, the main drawback of the Bi-Poiss model is the lack of an explicit link between parameter values and polarization and floatation of the response: for instance, theoretical values for the modes can be obtained in terms of parameters by solving numerically nonlinear equations. In addition, the Bi-Poiss does not encompass the scenario of three response clusters as the OFS 111 model. Conversely, the Bi-Poiss model is directly applicable in case of bimodality at inner categories, whereas specification of mixtures of DB models in this case should be designed carefully for identifiability issues (see Appendix 2).
For bimodal discrete data, a two-component mixture of (truncated) Conway-Maxwell-Poisson models can be considered as well (Mix-CMP, (Sur et al. 2015)) 11. With respect to computational aspects, the M-step within the EM algorithm needs to be performed with a computationally demanding grid search since the ML solution for Mix-CMP is highly dependent on initial values. With respect to the problem under examination, the main drawback of Mix-CMP concerns identifiability, which causes several limitations on the interpretation of the response location and dispersion. Specifically, parameters are not straightforwardly interpretable in terms of polarization and floatation, unlike for the OFS family. As to fitting performances, a tentative approach to pursue a comparative analysis with the OFS family requires setting suitable parameter constraints to mitigate identifiability issues for the Mix-CMP, at the cost of a loss of flexibility. For instance, the supporters' pole can be shaped by restricting to a CMP(λ_S, ν_S) with λ_S ∈ (m − 2, m), ν_S ∈ (0, 1), whereas a CMP(λ_O, ν_O) model with λ_O, ν_O ∈ (0, 1) can be considered for the opponents' pole. Floatation could possibly be considered explicitly if a component CMP(λ_F, ν_F), λ_F, ν_F > 1, is specified in the mixture. For count data, each component should be truncated from below at the minimum observed count, and from above at the largest observed count or at the censoring threshold, whereas it should be truncated from above at m − 1 and then shifted upward by 1 in case of ratings on Likert-type scales, as argued for the Binomial component in cub mixtures (Piccolo and Simone 2019).

10 If g(x; λ) denotes the (shifted) Poisson probability function, x ∈ N, the bimodal discrete shifted Poisson model is defined by the probability mass function: . In particular, θ > 0 or θ < 0 for overdispersed or underdispersed distribution with respect to the Poisson.

11 A discrete random variable X ∼ CMP(λ, ν) has the Conway-Maxwell-Poisson distribution of parameters λ, ν if Pr(X = x) = λ^x / ((x!)^ν Z(λ, ν)), with Z(λ, ν) = Σ_{j≥0} λ^j / (j!)^ν, x ≥ 0. If ν = 1, the Poisson model is recovered, whereas ν < 1 or ν > 1 implies overdispersion or underdispersion, respectively. It is worth remarking that for data exhibiting bi-polarization and floatation, a 3-component mixture of CMP would have a higher model complexity than the OFS 111 model; similarly, the Mix-CMP would be less parsimonious than mihg and OFS 101 for U-shaped distributions and than OFS 110 or OFS 011 for bimodal data with one mode at one of the extremes.
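The truncation-and-shift device for ratings can be sketched in Python as follows. The function name is hypothetical, and the sketch relies on the standard form of the Conway-Maxwell-Poisson weights, λ^x / (x!)^ν, renormalized over the truncated support; it is meant only to show how the parameter constraints quoted above place the mode at one of the end-points.

```python
import math

def cmp_rating_pmf(lam, nu, m):
    """CMP(lam, nu) truncated above at m - 1 and shifted by 1: support {1, ..., m}."""
    # Work on the log scale: log weight of count x is x*log(lam) - nu*log(x!).
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(m)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]
```

With m = 9, a component such as `cmp_rating_pmf(8.0, 0.5, 9)` (λ_S ∈ (m − 2, m), ν_S ∈ (0, 1)) has increasing probabilities with the mode at the last category, whereas `cmp_rating_pmf(0.5, 0.5, 9)` (λ_O, ν_O ∈ (0, 1)) has decreasing probabilities with the mode at the first one.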
In order to show that the OFS family is successfully applicable also in case of (truncated) distributions of count data, Table 7 reports some performance indicators of alternative models for the Health Heritage Competition data discussed in (Sur et al. 2015) 12 . Fitting results of a unique DB(α, β) model with no parameter constraints, and of the cub mixture (Piccolo and Simone 2019), possibly allowing for inflation at the last category (cub with shelter), are also reported. The last column reports the average of the p values for the Pearson X 2 goodness-of-fit statistics, applied on each test set of a K = 30-fold cross-validation 13 based on the model estimated on the remaining K − 1 folds 14 : it follows that the OFS 111 entails very satisfactory performance.
Thus, OFS mixtures could be successfully applied to assess the efficiency of health care structures, for instance, as well as for other count data, thanks to good flexibility in both fitting and interpretation. For instance, in this case floatation covers the intermediate stays, whereas polarization should be interpreted as the predominance of short and long hospitalizations, with parameters α_1, β_3 describing the concentration of brief and lengthy stays towards the lowest and largest count, respectively. Finally, OFS mixing weights quantify how frequent short, intermediate and long hospitalizations are overall. For the example, results indicate that intermediate hospitalizations tend to be as short as possible since the floatation component is right-skewed with modal value at the second category.

12 After omitting zero counts, the observed distribution has been censored at 15, so that the observed scores {1, . . . , 15+} have frequencies (9299, 4548, 2882, 1819, 1093, 660, 474, 316, 263, 209, 145, 135, 111, 65, 479), with saturated log-likelihood l_sat = Σ_{r=1}^{15} n_r log(n_r / n) = −41180.28.

13 The R package caret has been exploited to split the data (Kuhn 2020).

14 The choice of setting K = 30 ensures that each fold has a moderate sample size of 750 observations.
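The saturated log-likelihood and the fold size quoted in the footnotes can be checked directly from the reported frequencies. The following Python lines are a small verification sketch; the variable and function names are illustrative.

```python
import math

# Frequencies of the scores 1, ..., 15+ as reported in the text
counts = [9299, 4548, 2882, 1819, 1093, 660, 474, 316,
          263, 209, 145, 135, 111, 65, 479]

def saturated_loglik(counts):
    """Log-likelihood of the saturated model: sum of n_r * log(n_r / n)."""
    n = sum(counts)
    return sum(n_r * math.log(n_r / n) for n_r in counts if n_r > 0)

n = sum(counts)                 # total sample size
l_sat = saturated_loglik(counts)
fold_size = n / 30              # average size of each of the K = 30 folds
```

The total sample size is 22498, so that each of the 30 folds contains about 750 observations, and `l_sat` agrees with the value −41180.28 reported above.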
For the subsequent case studies, fitting results of both the Bi-Poiss and the (constrained) Mix-CMP models will be reported for the sake of comparisons.

Remark 4
Noticeably, the latent Beta polarization components f(x; α_1, 1) and f(x; 1, β_3) of the OFS family are particular cases of the Kumaraswamy distribution (Jones 2009), whose density g(x; α, β) is similar to the Beta distribution in several respects, yet more tractable from the mathematical point of view. Preliminary investigations seem to indicate that mixture specification within this family would not imply the identifiability issues discussed in Appendix 2 for Beta mixtures. Thus, a mixture of two discretized Kumaraswamy distributions, one with parameters (α_1, β_1) such that min(α_1, β_1) < 1 for polarization, and one with parameters (α_2, β_2) such that min(α_2, β_2) > 1 for floatation, could be an alternative model for the problem under examination, yet it would lack a straightforward and symmetrical interpretation of parameters with respect to polarization and floatation; further, non-uniform symmetric shapes would not be encompassed.
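The coincidence between the polarization components and the Kumaraswamy family can be verified numerically. The Python sketch below uses the standard Kumaraswamy distribution function 1 − (1 − x^a)^b on [0, 1]; the function names are hypothetical and the discretization follows the same equi-spaced scheme as (1).

```python
def kumaraswamy_cdf(x, a, b):
    """Distribution function of the Kumaraswamy(a, b) law on [0, 1]."""
    return 1.0 - (1.0 - x ** a) ** b

def discretized_kumaraswamy(a, b, m):
    """Probabilities of the m equi-spaced bins under Kumaraswamy(a, b)."""
    return [kumaraswamy_cdf(r / m, a, b) - kumaraswamy_cdf((r - 1) / m, a, b)
            for r in range(1, m + 1)]

def db_opp(alpha1, m):
    """DB(alpha1, 1) probabilities (closed form)."""
    return [(r / m) ** alpha1 - ((r - 1) / m) ** alpha1 for r in range(1, m + 1)]

def db_sup(beta3, m):
    """DB(1, beta3) probabilities (closed form)."""
    return [(1 - (r - 1) / m) ** beta3 - (1 - r / m) ** beta3 for r in range(1, m + 1)]
```

Setting b = 1 gives the distribution function x^a of the latent Beta(a, 1), and setting a = 1 gives 1 − (1 − x)^b of the latent Beta(1, b), so the discretized versions coincide with the OFS 100 and OFS 001 components up to rounding error.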

A case study on the probability to vote for German Political Parties
The data analysed in the present section are taken from the GESIS ALLBUS German Social Survey (Gesis 2016). On a rating scale ranging from 1 = "very unlikely" to 10 = "very likely", respondents were asked to rate: "How likely it is that you would ever vote for this German party?". Hereafter, ratings for the four main parties (CDU, SPD, FDP, The Greens) collected in 2002 and 2008 will be considered. The last two categories have been collapsed to yield rating measurements on a scale with m = 9 categories. After list-wise omission of missing values, samples of n = 2738 and n = 3056 observations are analysed for 2002 and 2008 data, respectively. Within the OFS framework, polarization is meant as resoluteness of the opinion of opponents and supporters, whereas floatation can also be interpreted as indecision. Table 8 reports the best model for each rating variable, selected on the basis of a joint analysis of multiple criteria, including X² statistics, likelihood ratio tests for nested models and BIC values. As a general rule, the most parsimonious specification has been preferred in case of weakly significant evidence for a more complex model, if comparable satisfactory results hold for the other criteria (see Appendix 3 for details).
It follows that:

- For the Greens and the FDP, instead, evidence for the supporter pole was found only in 2008: given the positive asymmetry of the floatation component in 2002 (see Table 9) and its symmetry in 2008, it can be concluded that there has been a movement of the undecided opinions towards the supporter pole from 2002 to 2008.

Fig. 1 Polarization parameters for the best OFS mixture (see Table 8)
The parameterization of polarization and indecision accomplished via OFS mixtures makes it possible to identify if and to what extent changes have occurred in the probability to vote for German parties. Figure 1 shows estimated polarization parameters δ_1, δ_3, α_1, β_3 ∈ (0, 1) for all parties in 2002 (left panel) and in 2008 (right panel). Lower and upper bounds of 95% bootstrap confidence intervals are displayed with star symbols at the edge of the whiskers departing from the point estimates. It follows that:
- Polarization and floatation components of the voting probabilities for the CDU are overall stable from 2002 to 2008, in both intensity and size;
- For the SPD, a significant decrease is observed for both δ_3 and β_3: thus, given that no relevant variation is observed for δ_1, it can be inferred that indecision has increased, but positive evaluations have further polarized;
- For the Greens and the FDP, a significant decrease is observed in both δ_1 and α_1, indicating that the opposition pole grew in intensity but decreased in size. As a result, it can be inferred that some negative yet un-polarized evaluations have floated towards a symmetric indecision (see also Table 9).
Figure 2 provides a joint representation of estimation results for the sizes of polarization and floatation with a ternary plot of mixing weights (left), whereas a scatter plot of polarization parameters α_1, β_3 in the unit square is displayed to compare the strengths of unfavourable and favourable opinions over time (right).

Table 9 Asymmetry and adjusted kurtosis of the floatation component for the best OFS mixture (see Table 8)

Table 9 reports the chosen asymmetry measure γ_1 defined in (4) and the adjusted kurtosis value γ_2 (15) for the estimated indecision component for those parties and time points where it is not degenerate. The extent of floatation of negative opinions towards neutrality is then quantified, as is the extent to which un-polarized opinions became more homogeneous from 2002 to 2008 for both the FDP and the Greens (more for the FDP than for the Greens). The reverse circumstance is observed for the SPD, for which the neutrality component in 2002 gave way to a general yet symmetric indecision. The analysis and the proposed visualization tools for the results could be replicated conditional on covariate values (such as gender, geographical residence, etc.) to give local assessments of the polarization and floatation dimensions.
Finally, a 10-fold cross-validation is performed to check the ability of the selected best model (Table 8) to predict the rating distribution. Table 10 reports some summarizing indicators: average and 9th decile over folds of the dissimilarity index between the best model p train , estimated on the training set, and the response distribution on the test set ( f test ), are proposed as a proxy of prediction errors for the test set distribution. With the same goal and strategy, the average over folds of the Kullback-Leibler divergence is reported for candidate OFS models. Results indicate that, beyond fitting ability, the flexibility of OFS models allows to attain satisfactory predictive performance.

Table 10
Summarizing results for goodness of fit and predicting ability of the best model, over 10-fold cross-validation

Final considerations
The paper has discussed mixture specification of Discretized Beta models to explicitly parameterize polarization and floatation of discrete ordered evaluations, such as ratings and (truncated) count data. The proposal is more flexible than alternative models in both fitting performance and interpretation: for instance, the method presented in (Gómez-Déniz et al. 2020) to induce bimodality in a distribution could be applied also to the DB model, at the cost of losing the direct a-priori parameterization of polarization features afforded by the OFS models. A devoted R package for OFS implementation is under development. Further research will be tailored to the analysis of tail dependencies of polarization and floatation of different survey items with suitable copula modelling, as well as to the implementation of model-based trees to derive response profiles in terms of covariates entailing a significant effect on at least one of the model's features (see (Simone et al. 2019) for the case of cub models for rating data). A comparative analysis with mixtures of discretized Kumaraswamy distributions also deserves in-depth investigation in future research.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

For the DB model, Fig. 3 displays contour lines for the adjusted kurtosis index (left) and selected DB distributions with corresponding values of γ_1(α, β), γ_2(α, β) (right). As a benchmark, notice that γ_2 = 1.8 for the uniform distribution, whereas γ_2 → 3 if α = β → ∞.
combining DB models for polarization with a DB model for floatation. The core of the discussion is the following theorem, whose proof follows straightforwardly.
Mixture specification within the DB family should consider also the following arguments leading to a Beta approximation of Beta mixtures 16 .
For a fixed k ≥ 2, consider a mixture g(x) = Σ_{i=1}^{k} d_i f(x; α_i, β_i) of Beta densities. If μ_i1 and μ_i2 denote the first and second moments of the i-th mixture component, let

\[ \mu_1 = \sum_{i=1}^{k} d_i \mu_{i1}, \qquad \mu_2 = \sum_{i=1}^{k} d_i \mu_{i2} \]

be the first and second moment of the mixture, respectively. If s = μ_2 − μ_1² is the variance of the mixture, and h = μ_1 / (1 − μ_1), the following approximation can be derived:

For instance, assume that k = 2 and that X_1 ∼ f(α_1, β_1) is J-shaped, whereas X_2 ∼ f(α_2, β_2) is reversed J-shaped. Their mixture g(x) (with weights d_1 and 1 − d_1) can be approximated by a U-shaped Beta density f(x; α, β) with parameters (α, β) obtained as in (17). Table 11 reports some instances. The last 4 columns report the Dissimilarity index between the discretized versions of g(x) and its approximation f(x; α, β) given in (17), for varying number of categories m. Results indicate that this approximation is satisfactory: thus, an OFS 101 model could be approximated by a DB(α, β) model, which in turn can be written as a further mixture of two DB models after discretization of the representation in (16). Then, specifying DB mixtures is a challenging task, especially for small m.
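A Beta approximation of this kind can be sketched in Python by matching the first two moments of the mixture. This is an assumption of the sketch: it presumes that the approximation in (17) is of the moment-matching type suggested by the quantities s and h defined above, and the function names are hypothetical.

```python
def beta_moments(a, b):
    """First and second raw moments of a Beta(a, b) distribution."""
    mu1 = a / (a + b)
    mu2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return mu1, mu2

def moment_matched_beta(weights, params):
    """Beta(alpha, beta) with the same mean and variance as the given Beta mixture."""
    mu1 = sum(d * beta_moments(a, b)[0] for d, (a, b) in zip(weights, params))
    mu2 = sum(d * beta_moments(a, b)[1] for d, (a, b) in zip(weights, params))
    s = mu2 - mu1 ** 2            # variance of the mixture
    h = mu1 / (1 - mu1)           # equals alpha / beta for a Beta distribution
    total = mu1 * (1 - mu1) / s - 1   # alpha + beta implied by mean and variance
    alpha = mu1 * total
    beta = alpha / h
    return alpha, beta
```

For an equally weighted mixture of Beta(0.3, 1) and Beta(1, 0.4), that is, one component concentrated near 0 and one near 1, the matched parameters are both smaller than 1, so the approximating Beta density is U-shaped.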
If the constraints min(α 2 , β 2 ) > 1 and (2) are satisfied, identifiability concerns may arise depending on the goodness of the approximation Pr( ·; θ ) ≈ Pr( ·; θ ) and on the distance between parameter vectors. Similar issues may arise if max(α 1 , β 1 ) ≤ 1. Applying Theorem 1 iteratively to the underlying Beta mixture, it follows that: where (17) implies that: Thus, despite immediate interpretation of parameters, the mixture specification in (6) does not ensure identifiability.