With the approach of any electoral competition, parties’ shares of the electoral consensus can be measured by pollsters if voting intentions are surveyed on nominal scales. A more innovative approach consists in gauging the probability of voting for each candidate as ratings on ordered scales, in order to assess the extent to which respondents’ opinions hold. Similarly, marketing stakeholders prefer to survey the intention to take a certain decision in the future, rather than asking yes/no questions about respondents’ likings and habits. Thus, suitable statistical modelling of ordered evaluations is advocated to characterize clusters of both extreme and intermediate response choices.

Polarization is hereafter meant as the process by which evaluations about an item converge towards one of two opposing poles of the response spectrum, in the spirit of (Apouey 2007)Footnote 1. A further cluster may also arise from un-polarized respondents, corresponding to a concentration of responses away from the extremes: the term floatation is hereafter used to indicate this circumstance, as complementary to polarization.

A candidate model that directly parameterizes polarization towards the extremes is the two-component mixture of Inverse Hypergeometric distributions (mihg, (Simone and Iannario 2018)), whereas a mixture of Binomial and Discretized Beta models can be considered to analyse overall response feeling and certain symmetric response styles (caub, (Simone and Tutz 2018)). For count data, bimodality (not necessarily at the extremes of the response support) can be tackled via a suitable adaptation of the (shifted) Poisson distribution (Gómez-Déniz et al. 2020) or by resorting to a two-component mixture of Conway–Maxwell–Poisson models (Sur et al. 2015).

With respect to the state of the art, the paper discusses the specification of mixture models based on the Discretized Beta distribution (Ursino 2014; Ursino and Gasparini 2018) as a flexible class of statistical models to parameterize polarization and floatation of ordered evaluations. The proposal is designed to offer a broad and straightforward interpretation for marketing, psychology and socio-economic studies, as it allows one to characterize opposite and intermediate response clusters. Further relevant applications include self-reported wealth or health, or Net Promoter Score type evaluations (NPS, (Reichheld 2003)) to assess the extent to which attractors outclass detractors (Capecchi and Piccolo 2017).

The paper is organized as follows: Sect. 2 recalls the baseline framework of the Discretized Beta model. The core of the paper is Sect. 3, with a detailed discussion on mixtures based on the Discretized Beta distribution to jointly model polarization and floatation of ordered evaluations; goodness-of-fit criteria and inferential aspects are described in Sects. 3.2–3.5, whereas a comparative discussion with the state of the art is delivered in Sect. 3.6. A case study is pursued in Sect. 4 to support the proposal with empirical evidence. Concluding remarks are given in Sect. 5. A dedicated appendix supplements the presentation with a discussion on the optimal number of components for Discretized Beta mixtures and on the parameter constraints needed to prevent identifiability issues.

Discretized Beta mixtures for polarization and floatation of ordered data

Let R be a rating variable collected on a response scale with m ordered categories, say \(c_1 \prec c_2 \prec \cdots \prec c_m\): the numeric scoring \(c_r = r\) is adopted merely for notational convenience. Without loss of generality, assume that the scale is positively oriented with respect to the trait being examined.

Definition 1

For \(\alpha , \beta \in {\mathbb {R}}^{+}\), let \(X \sim Beta(\alpha ,\beta )\) be a Beta distributed random variable over the real interval [0, 1]. For a given \(m>3\), a discrete variable R, with support \(\{1,2,\dots ,m\}\), is said to be distributed according to a Discretized Beta model of parameters \(\alpha ,\beta \) (\(R \sim \text {DB}(\alpha ,\beta )\), for short) if:

$$\begin{aligned} Pr(R = r |\alpha ,\beta )\,=\, Pr\bigg ( \frac{r-1}{m} \le X \le \frac{r}{m} \big | \alpha , \beta \bigg ), \qquad r=1,\dots ,m. \end{aligned}$$
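The probabilities in Definition 1 are simply differences of the latent Beta CDF at the equi-spaced cut points \(0, 1/m, \dots , 1\). A minimal Python sketch, assuming NumPy and SciPy are available; the function name `db_pmf` is illustrative and not taken from any published package:

```python
import numpy as np
from scipy.stats import beta as beta_dist


def db_pmf(m, a, b):
    """Discretized Beta probabilities Pr(R = r | a, b) for r = 1, ..., m.

    Category r receives the Beta(a, b) mass of the interval [(r-1)/m, r/m].
    """
    if m < 2 or a <= 0 or b <= 0:
        raise ValueError("need m >= 2 and positive shape parameters")
    cuts = np.linspace(0.0, 1.0, m + 1)      # 0, 1/m, ..., 1
    cdf = beta_dist.cdf(cuts, a, b)          # latent Beta CDF at the cut points
    return np.diff(cdf)                      # interval probabilities


# With a = b = 1 the latent Beta is uniform, so each category gets 1/m.
print(db_pmf(5, 1.0, 1.0))
# A J-shaped case (a < 1 <= b): the mode is at the first category.
print(db_pmf(9, 0.3, 2.0).round(3))
```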

For notational convenience, set \(db(r; \alpha ,\beta ) := Pr(R = r |\alpha ,\beta )\). This model has already been acknowledged in the literature on ordinal data analysis in view of the flexibility inherited from the underlying Beta distribution, which does not impose a predetermined shape on the latent continuous trait (Ursino 2014; Fasola and Sciandra 2015; Ursino and Gasparini 2018; Simone and Tutz 2018). Similar arguments can be advanced for the Beta-Binomial model (Morrison 1979), yet the Discretized Beta is more versatile as it can be either overdispersed or underdispersed (Ursino 2014). The discrete uniform distribution arises as a special case when \(\alpha =\beta =1\). Location and shape properties of the latent Beta model imply the following features of the DB distribution (Abramowitz and Stegun 1972; Forbes et al. 2011). Given that the discretization of the latent Beta model occurs at equi-spaced intervals for a fixed m, the modal value Mo(R) of \(R \sim {\text {DB}}(\alpha ,\beta )\) satisfies:

  • \(Mo(R) = 1\) if \(\alpha < 1\) and \(\beta \ge 1\) or, in case \(\min (\alpha ,\beta )>1\), if \(\frac{\alpha -1}{\alpha +\beta -2}<\frac{1}{m}\);

  • \(Mo(R) = m\) if \(\alpha \ge 1\) and \(\beta <1\) or, in case \(\min (\alpha ,\beta ) > 1\), if \(\frac{\alpha -1}{\alpha +\beta -2} > 1-\frac{1}{m}\);

  • \(Mo(R) = r \in \{2,\dots ,m-1\}\) if and only if \(\min (\alpha ,\beta ) > 1\) and \(\frac{\alpha -1}{\alpha +\beta -2} \in (\frac{r-1}{m}, \frac{r}{m}]\). Thus, the following condition implies an inner mode:

    $$\begin{aligned} \frac{1}{m}< \frac{\alpha -1}{\alpha +\beta -2} < 1-\frac{1}{m}; \end{aligned}$$
  • The distribution is U-shaped with two modal values at the first and at the last categories if \(\max (\alpha ,\beta ) <1\) and if, for the given m, the parameters satisfy the following system of inequalitiesFootnote 2 based on the regularized incomplete Beta function \(I_x(\alpha ,\beta )\):

    $$\begin{aligned} {\left\{ \begin{array}{ll} 1+I_{\frac{m-2}{m}}(\alpha ,\beta ) &{}> \,2\,I_{\frac{m-1}{m}}(\alpha ,\beta );\\ 2\,I_{\frac{1}{m}}(\alpha ,\beta ) &{}> \,I_{\frac{2}{m}}(\alpha ,\beta ).\\ \end{array}\right. } \end{aligned}$$
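The modal conditions above can be checked numerically. The sketch below codes condition (2) and system (3) directly and compares them with the computed probabilities; it assumes SciPy's `scipy.special.betainc`, which returns the regularized incomplete Beta function and takes its arguments in the order `(a, b, x)`. The helper names are hypothetical.

```python
import numpy as np
from scipy.special import betainc          # regularized incomplete Beta I_x(a, b)
from scipy.stats import beta as beta_dist


def db_pmf(m, a, b):
    """Discretized Beta probabilities for r = 1, ..., m."""
    return np.diff(beta_dist.cdf(np.linspace(0.0, 1.0, m + 1), a, b))


def has_inner_mode(m, a, b):
    """Condition (2): min(a, b) > 1 and the latent Beta mode (a-1)/(a+b-2)
    lies strictly between 1/m and 1 - 1/m."""
    if min(a, b) <= 1:
        return False
    mode = (a - 1) / (a + b - 2)
    return 1 / m < mode < 1 - 1 / m


def is_u_shaped(m, a, b):
    """max(a, b) < 1 together with the two inequalities of system (3)."""
    def I(x):
        return betainc(a, b, x)            # SciPy argument order is (a, b, x)
    return bool(max(a, b) < 1
                and 1 + I((m - 2) / m) > 2 * I((m - 1) / m)   # Pr(R=m) > Pr(R=m-1)
                and 2 * I(1 / m) > I(2 / m))                  # Pr(R=1) > Pr(R=2)


m = 9
print(is_u_shaped(m, 0.5, 0.5), db_pmf(m, 0.5, 0.5).round(3))
print(has_inner_mode(m, 3.0, 4.0), int(np.argmax(db_pmf(m, 3.0, 4.0))) + 1)
```

For \(\alpha =3, \beta =4\) the latent mode is \(0.4 \in (3/9, 4/9]\), and the discretized probabilities indeed peak at category 4.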

As a consequence, a necessary condition for a Discretized Beta model to be applied for polarization of either favourable or unfavourable responses is the constraint \(\min (\alpha , \beta ) < 1\). Under this circumstance, parameter \(\alpha \) governs the polarization of the unfavourable responses: hereafter, this cluster will be referred to as opponents’ pole. If \(\beta = \max (\alpha , \beta ) \ge 1\), the closer \(\alpha \) is to 0, the stronger is the polarization of the opponents, with positive asymmetry increasing with growing \(\beta \). Conversely, \(\beta \) governs the polarization of the favourable responses (say, the supporters’ pole). If \(\alpha = \max (\alpha , \beta ) \ge 1\), the closer \(\beta \) is to 0, the higher is the probability assigned to the last category and thus the stronger is the polarization of the supporters, with negative asymmetry strengthening with growing \(\alpha \). A Discretized Beta model with \(\max (\alpha ,\beta )<1\), instead, can be specified to account for polarization towards both the extremes (provided that (3) holds), whereas floatation between the two response endpoints can be modelled by assuming a DB\((\alpha ,\beta )\) distribution with \(\min (\alpha ,\beta )>1\), such that (2) holds true, given the number of categoriesFootnote 3. Asymmetry and intensity of floatation can be measured in terms of skewness \(\gamma _1(\alpha ,\beta )\) and excess kurtosis \(\gamma _2(\alpha ,\beta )\) of the underlying Beta distributionFootnote 4:

$$\begin{aligned} \gamma _1(\alpha ,\beta )= & {} 2\frac{\beta - \alpha }{\alpha +\beta +2}\sqrt{\dfrac{\alpha +\beta +1}{\alpha \,\beta }}; \end{aligned}$$
$$\begin{aligned} \gamma _2(\alpha ,\beta )= & {} \frac{6\big (\alpha ^3 + \alpha ^2(1-2\beta ) +\beta ^2(1+\beta )-2\alpha \beta (2+\beta )\big )}{\alpha \beta \big (\alpha +\beta +2\big )\big (\alpha +\beta +3\big )}; \end{aligned}$$

such that \(\gamma _1(\alpha ,\beta ) = -\gamma _1(\beta ,\alpha )\) and \(\gamma _2(\alpha ,\beta ) = \gamma _2(\beta ,\alpha )\). However, interpretation of excess kurtosis is not straightforward for asymmetric distributions: the measure of kurtosis adjusted for skewness introduced in (Blest 2003) can be considered to overcome this issue (see (15) in Appendix 1 for details).
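As a sanity check, the skewness and excess kurtosis formulas can be coded verbatim and compared with the moments returned by SciPy's `beta.stats` (whose `'k'` moment is the excess, i.e. Fisher, kurtosis). A minimal sketch under that assumption:

```python
import numpy as np
from scipy.stats import beta as beta_dist


def beta_skewness(a, b):
    """Skewness gamma_1(a, b) of the latent Beta(a, b)."""
    return 2 * (b - a) / (a + b + 2) * np.sqrt((a + b + 1) / (a * b))


def beta_excess_kurtosis(a, b):
    """Excess kurtosis gamma_2(a, b) of the latent Beta(a, b)."""
    num = 6 * (a**3 + a**2 * (1 - 2 * b) + b**2 * (1 + b) - 2 * a * b * (2 + b))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den


# Cross-check against SciPy ('s' = skewness, 'k' = excess kurtosis).
for a, b in [(3.0, 4.0), (0.4, 1.0), (2.5, 1.5)]:
    s, k = beta_dist.stats(a, b, moments="sk")
    print(a, b, beta_skewness(a, b), float(s), beta_excess_kurtosis(a, b), float(k))
```

The symmetry relations \(\gamma _1(\alpha ,\beta ) = -\gamma _1(\beta ,\alpha )\) and \(\gamma _2(\alpha ,\beta ) = \gamma _2(\beta ,\alpha )\) can be verified the same way.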

Finite mixtures of Discretized Beta models

Given the flexibility in both shape and interpretation of the DB model, polarization and floatation in ordered data can be jointly parameterized by specifying suitable mixture distributions.

In view of the remarks in Sect. 2, if floatation can be shaped via a DB model with parameters \(\alpha _2,\beta _2 > 1\) satisfying (2) for a given m, alternative specifications are possible for the polarization effect:

  1. a unique component DB\((\alpha _1,\beta _1)\) with \(\min (\alpha _1,\beta _1) < 1\), and with \(\max (\alpha _1,\beta _1) < 1\) satisfying (3) if two opposing clusters at the extremes are present, or with \(\max (\alpha _1,\beta _1) \ge 1\) if only one pole of supporters or opponents is found, yielding the two-component mixture:

    $$\begin{aligned} Pr\big (R=r \mid \varvec{\theta }\big )= (1-\delta )\,db(r;\alpha _1,\beta _1)\,+\,\delta \,db(r;\alpha _2,\beta _2)\,,\quad r=1,\dots ,m\,; \end{aligned}$$
  2. a mixture of two J-shaped DB models, DB\((\alpha _1,\beta _1)\) and DB\((\alpha _3,\beta _3)\), yielding the 3-component mixture specification:

    $$\begin{aligned} Pr\big (R=r \mid \varvec{\theta }\big )= & {} \delta _1\,db(r;\alpha _1,\beta _1)\,+\,\delta _2\,db(r;\alpha _2,\beta _2)\,\nonumber \\&+\, \delta _3\,db(r;\alpha _3,\beta _3), \end{aligned}$$

    so that \(\delta _1+\delta _2+\delta _3=1\), and:

    • \(\alpha _1 \in (0,1)\) and \(\beta _1 \ge 1\) to shape the opponents’ pole;

    • \(\alpha _2,\beta _2 >1\) satisfy (2) to shape floatation;

    • \(\alpha _3\ge 1,\beta _3 \in (0,1)\) to shape the supporters’ pole.

Some identifiability issues may arise for the polarization components in both (6) and (7), as a consequence of the Beta approximation of their latent Beta components. Appendix 2 collects all the relevant discussion and results on these topics: the present section focuses on the proposed class of mixtures, stemming from (7) under suitable parameter constraints.

The OFS mixture for polarization and floatation of ordered evaluations

In order to overcome possible identifiability issues for mixtures of DB models, the proposed strategy is to constrain \(\beta _1=1\) and \(\alpha _3=1\) for the mixture specification (7).

Hereafter, the acronym OFS stands for Opponent-Floatation-Supporter, and three 0-1 subscripts indicate whether each component is included in the mixture (1) or not (0). Thus, models DB\((\alpha _1,1)\), with \(\alpha _1 \in (0,1)\), and DB\((1,\beta _3)\), with \(\beta _3 \in (0,1)\), will be referred to as \({\text {OFS}}_{100}\) and \({\text {OFS}}_{001}\), denoting a DB distribution for polarization towards the opponents’ and the supporters’ pole, respectively. Consequently, as a benchmark for bi-polarization towards the end-points, the following mixture specification is proposed.

Definition 2

If \(\alpha _1, \beta _3, \delta \in (0,1)\), the \({\text {OFS}}_{101}\) model is defined by the mixture:

$$\begin{aligned} Pr(R=r|\varvec{\theta }) = \delta \, db(r;\alpha _1,1) + (1-\delta )\,db(r; 1,\beta _3),\qquad r=1,\dots ,m. \end{aligned}$$

The mixture of an \({\text {OFS}}_{101}\) model for polarization with an \({\text {OFS}}_{010}\) distribution for floatation (such that (2) holds) can be safely considered to jointly model polarization towards either one or both extremes and possible floatation in between.

Definition 3

If the above notation prevails, the \({\text {OFS}}_{111}\) model is defined by:

$$\begin{aligned} Pr(R=r|\varvec{\theta }) = \delta _1\, db(r;\alpha _1,1) \,+\, \delta _2 \,db(r; \alpha _2,\beta _2) \,+\, \delta _3 \,db(r; 1,\beta _3). \end{aligned}$$

Remark 1

With reference to the procedures outlined in Appendix 2, and unlike what happens for (6) and (7), the Beta approximation of the latent polarization components in (9), and its combination with the latent floatation component, does not correspond to an \({\text {OFS}}_{111}\) specification. The same arguments apply if either \({\text {OFS}}_{100}\) or \({\text {OFS}}_{001}\) is assumed for polarization. Thus, identifiability of parameters can be assumed for OFS mixture models.

Both asymmetric and symmetric floatation are encompassed by the \({\text {OFS}}_{111}\) model (the latter under the constraint \(\alpha _2 = \beta _2\)). In case the floatation component is symmetric, the superscript (s) will be used. If m is odd, a degenerate floatation component corresponds to neutrality (as \(\alpha _2=\beta _2\) tends to infinity), resulting in inflation at the middle of the response scale: in this case, the superscript (i) replaces (s), and the resulting \({\text {OFS}}_{111}^{(i)}\) model denotes a mixture of an \({\text {OFS}}_{101}\) model with a degenerate distribution \(\mathbbm {1}_{c=r}\) with mass concentrated at \(c=\frac{m+1}{2}\) (so that \(\mathbbm {1}_{{c=r}} = 0\) if \(r \ne c\), and \(\mathbbm {1}_{{c=r}} = 1\) if \(r=c\)).
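Under the OFS constraints the polarization components have closed-form probabilities, since the Beta\((\alpha _1,1)\) CDF is \(x^{\alpha _1}\) and the Beta\((1,\beta _3)\) CDF is \(1-(1-x)^{\beta _3}\). The following sketch (function names are hypothetical) builds the \({\text {OFS}}_{111}\), \({\text {OFS}}_{101}\) and \({\text {OFS}}_{111}^{(i)}\) probability vectors from these pieces; only the floatation component needs a numerical Beta CDF.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def db_opponents(m, a1):
    """DB(a1, 1): the Beta(a1, 1) CDF is x**a1."""
    u = np.arange(m + 1) / m
    return np.diff(u ** a1)


def db_supporters(m, b3):
    """DB(1, b3): the Beta(1, b3) CDF is 1 - (1 - x)**b3."""
    u = np.arange(m + 1) / m
    return np.diff(1.0 - (1.0 - u) ** b3)


def db_floatation(m, a2, b2):
    """General DB(a2, b2) via the Beta CDF (floatation requires a2, b2 > 1)."""
    return np.diff(beta_dist.cdf(np.arange(m + 1) / m, a2, b2))


def ofs111_pmf(m, d1, d3, a1, a2, b2, b3):
    """OFS_111: weights d1 (opponents), 1 - d1 - d3 (floatation), d3 (supporters)."""
    if not (0 < a1 < 1 and 0 < b3 < 1 and a2 > 1 and b2 > 1):
        raise ValueError("shape parameters outside the OFS_111 constraints")
    if not (d1 > 0 and d3 > 0 and d1 + d3 < 1):
        raise ValueError("mixing weights must be positive and sum to less than 1")
    return (d1 * db_opponents(m, a1) + (1 - d1 - d3) * db_floatation(m, a2, b2)
            + d3 * db_supporters(m, b3))


def ofs101_pmf(m, d, a1, b3):
    """OFS_101: bi-polarization only, d on opponents and 1 - d on supporters."""
    return d * db_opponents(m, a1) + (1 - d) * db_supporters(m, b3)


def ofs111_inflated_pmf(m, d1, d3, a1, b3):
    """OFS_111^(i): degenerate floatation at the middle category (m+1)/2, m odd."""
    if m % 2 == 0:
        raise ValueError("a middle category exists only for odd m")
    neutral = np.zeros(m)
    neutral[(m + 1) // 2 - 1] = 1.0          # 0-based index of category (m+1)/2
    return (d1 * db_opponents(m, a1) + (1 - d1 - d3) * neutral
            + d3 * db_supporters(m, b3))


p = ofs111_pmf(11, 0.25, 0.4, 0.2, 3.0, 4.0, 0.6)   # parameter setting of Table 4
print(p.round(3), p.sum())
```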

Remark 2

OFS models also encompass inflated responses at the extremes of the response support. Consider, for instance, the \({\text {OFS}}_{110}\) model: the DB\((\alpha _1,1)\) component identifies the opponents’ cluster, which is characterized by a mode at the first category and decreasing probabilities as scores increase, thus also accounting for scale usage diversity among opponents and for different strengths of opposition. As a limit case, the \({\text {OFS}}_{110}\) model tends to a DB model with inflation at the first category as \(\alpha _1 \rightarrow 0\). The dual remark applies to the \({\text {OFS}}_{011}\) modelFootnote 5. Thus, the smooth switch between extreme modal values and inner categories implied by the OFS approach is more general than DB models with inflation at either one of the end-points (see the example discussed in Sect. 3.6).

Remark 3

Covariate effects on model parameters can be investigated via suitable link functions. If \(\varvec{x}_i, \varvec{y}_i, \varvec{u}_i, \varvec{w}_i, \varvec{z}_i, \varvec{t}_i\) are selected characteristics of the i-th subject, a logarithmic link can be set for the individual floatation parameters \(\alpha _{2i}, \beta _{2i} > 1\):

$$\begin{aligned} \log (\alpha _{2i})=\varvec{z}_i\, \varvec{\gamma }_2\,;\quad \log (\beta _{2i})=\varvec{u}_i\, \varvec{\eta }_2\,,\\ \end{aligned}$$

provided that the constraint (2) is also taken into account conditionally on covariates, whereas a logit link can be set for the polarization parameters \(\alpha _1, \beta _3, \delta _1, \delta _3 \in (0,1)\):

$$\begin{aligned} {\text {logit}}(\alpha _{1i})=\varvec{y}_i\, \varvec{\gamma }_1\,;\; {\text {logit}}(\beta _{3i})=\varvec{w}_i\, \varvec{\eta }_3\,;\; {\text {logit}}(\delta _{1i})=\varvec{x}_i\, \varvec{\omega }_1;\; {\text {logit}}(\delta _{3i})=\varvec{t}_i\, \varvec{\omega }_3\,.\\ \end{aligned}$$
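The links of Remark 3 map each subject’s covariates to its own OFS parameters. A minimal sketch with hypothetical design matrices and coefficients: since the log and logit links alone do not guarantee \(\alpha _{2i}, \beta _{2i} > 1\), nor \(\delta _{1i}+\delta _{3i}<1\) (needed for a non-negative floatation weight), the function only flags subjects violating these constraints; condition (2) itself is not checked here.

```python
import numpy as np
from scipy.special import expit             # inverse logit


def individual_ofs_parameters(X, Y, W, T, Z, U, omega1, omega3, gamma1, eta3, gamma2, eta2):
    """Map subject covariates (rows of each design matrix) to OFS_111 parameters."""
    params = {
        "delta1": expit(X @ omega1),          # logit(delta_1i) = x_i omega_1
        "delta3": expit(T @ omega3),          # logit(delta_3i) = t_i omega_3
        "alpha1": expit(Y @ gamma1),          # logit(alpha_1i) = y_i gamma_1
        "beta3": expit(W @ eta3),             # logit(beta_3i)  = w_i eta_3
        "alpha2": np.exp(Z @ gamma2),         # log(alpha_2i)   = z_i gamma_2
        "beta2": np.exp(U @ eta2),            # log(beta_2i)    = u_i eta_2
    }
    # Flag subjects for which the links produce inadmissible OFS_111 parameters.
    params["admissible"] = ((params["alpha2"] > 1) & (params["beta2"] > 1)
                            & (params["delta1"] + params["delta3"] < 1))
    return params


# Hypothetical example: intercept plus one standardized covariate for 3 subjects.
x = np.array([-1.0, 0.0, 1.0])
D = np.column_stack([np.ones(3), x])
par = individual_ofs_parameters(D, D, D, D, D, D,
                                omega1=np.array([-1.0, 0.2]), omega3=np.array([-0.5, -0.2]),
                                gamma1=np.array([-1.0, 0.5]), eta3=np.array([0.3, 0.1]),
                                gamma2=np.array([1.2, 0.1]), eta2=np.array([1.0, -0.1]))
print(par["admissible"])
```

In a full fit these mappings would replace the scalar parameters inside the likelihood, so the M-step would optimize over the coefficient vectors instead.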

Fitting performances and model selection

Model selection within the OFS class can be performed by likelihood ratio tests for pairs of nested models (for instance, to compare the symmetric and asymmetric specifications of floatation). More generally, the fitting performance of an OFS model against competing alternatives can be assessed by information criteria: in the following, the BIC index is considered, as it also accounts for model complexity. Standard goodness-of-fit tests based on the Pearson \(X^2\) statistic can be performed provided that \(m-1-k>0\), where k is the number of estimable parameters. For instance, \(m>7\) is needed to apply this test to \({\text {OFS}}_{111}\) models, which have \(k=6\) parameters.
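The bookkeeping behind these criteria is elementary. The sketch below lists free-parameter counts per OFS specification (our own count, read off the definitions above: mixing weights plus unconstrained shape parameters), the minimum number of categories for a Pearson \(X^2\) test, the BIC, and a likelihood ratio test for nested models such as \({\text {OFS}}_{111}^{(s)}\) within \({\text {OFS}}_{111}\).

```python
import numpy as np
from scipy.stats import chi2

# Free parameters per specification (our count from the model definitions).
N_PARAMS = {
    "OFS101": 3,        # delta, alpha1, beta3
    "OFS110": 4,        # delta1, alpha1, alpha2, beta2
    "OFS011": 4,        # delta3, alpha2, beta2, beta3
    "OFS111(i)": 4,     # delta1, delta3, alpha1, beta3 (degenerate middle mass)
    "OFS111(s)": 5,     # delta1, delta3, alpha1, beta3, alpha2 = beta2
    "OFS111": 6,        # delta1, delta3, alpha1, alpha2, beta2, beta3
}


def min_m_for_pearson(k):
    """Smallest number of categories m such that m - 1 - k > 0."""
    return k + 2


def bic(loglik, k, n):
    return -2.0 * loglik + k * np.log(n)


def lr_test(loglik_restricted, loglik_full, df):
    """Likelihood ratio statistic and chi-square p value for nested models."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2.sf(stat, df)


for name, k in N_PARAMS.items():
    print(name, k, min_m_for_pearson(k))
```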

The normalized Leti’s dissimilarity index (Leti 1983):

$$\begin{aligned} Diss(\varvec{f},\varvec{p}) = \dfrac{1}{2}\sum _{r=1}^m |f_r - p_r|,\quad Diss(\varvec{f},\varvec{p}) \in [0,1], \end{aligned}$$

will be considered to measure the goodness of fit of an estimated model \(\varvec{p} = \varvec{p}(\varvec{\theta }) = (p_1,\dots ,p_m)\) to the observed relative frequency distribution \(\varvec{f}= (f_1,\dots ,f_m)\). The index is related to more traditional indicators, such as the Hellinger distance \(H(\varvec{p},\varvec{q})\) (Gibbs and Su 2002), by the inequalities:

$$\begin{aligned} H^2(\varvec{p},\varvec{q})\, \le \, Diss(\varvec{p}, \varvec{q}) \le \sqrt{2}\,H(\varvec{p},\varvec{q}), \quad H^2(\varvec{p},\varvec{q})= \dfrac{1}{2}\sum _{r=1}^m \big (\sqrt{p_r} - \sqrt{q_r}\big )^2, \end{aligned}$$

yet, unlike such indicators, the Dissimilarity value is directly interpretable as the proportion of responses that are missed by the modelFootnote 6. For this reason, it can also be used to check the ability of a model \(\varvec{p}\), estimated on a training set, to predict the test-set distribution \(\varvec{f}\). With the same goal and for comparative purposes, the Kullback–Leibler divergence \(KL(\varvec{f}\vert \vert \varvec{p}) = \sum \limits _{r=1}^m f_r \log (\frac{f_r}{p_r})\) will also be computed.
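The three discrepancy measures can be sketched in a few lines of NumPy. The KL implementation adopts the usual convention that categories with \(f_r=0\) contribute zero; the function names and the two example distributions are hypothetical.

```python
import numpy as np


def dissimilarity(f, p):
    """Normalized dissimilarity index: half the L1 distance, in [0, 1]."""
    return 0.5 * np.abs(np.asarray(f) - np.asarray(p)).sum()


def hellinger(p, q):
    """Hellinger distance with H^2 = (1/2) * sum (sqrt(p_r) - sqrt(q_r))^2."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def kl_divergence(f, p):
    """KL(f || p); categories with f_r = 0 contribute 0 by convention."""
    f, p = np.asarray(f, float), np.asarray(p, float)
    mask = f > 0
    return float(np.sum(f[mask] * np.log(f[mask] / p[mask])))


f = np.array([0.30, 0.10, 0.10, 0.15, 0.35])   # observed relative frequencies
p = np.array([0.25, 0.15, 0.12, 0.13, 0.35])   # fitted probabilities
print(dissimilarity(f, p))   # approximately 0.07: 7% of responses are misallocated
print(hellinger(f, p), kl_divergence(f, p))
```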

Inferential issues for the OFS model

Hereafter, the main steps of the expectation–maximization algorithm for mixtures (EM, (McLachlan and Krishnan 1997)) to perform maximum likelihood estimation of parameters are outlined for the general \({\text {OFS}}_{111}\) specification.

For a sample of ratings \(\varvec{r} = (r_1,\dots ,r_n)\), the complete log-likelihood of the \({\text {OFS}}_{111}\) model, with parameter vector \(\varvec{\theta } = (\delta _1,\delta _3,\alpha _1,\alpha _2,\beta _2,\beta _3)\), is given by:

$$\begin{aligned} l_c(\varvec{\theta }; \varvec{r})&= \log (\delta _1)\,\sum \limits _{i=1}^n Z_{1i} \, +\, \log (\delta _3)\,\sum \limits _{i=1}^n Z_{3i} \, +\, \log (1-\delta _1-\delta _3)\,\sum \limits _{i=1}^n Z_{2i} \\&\quad + \, \sum \limits _{i=1}^n Z_{1i} \,\log \big (db(r_i;\alpha _1,1) \big )\,+ \,\sum \limits _{i=1}^n Z_{2i} \log \big ( db(r_i; \alpha _2,\beta _2)\big ) \\&\quad + \sum \limits _{i=1}^n Z_{3i} \,\log \big (db(r_i;1,\beta _3) \big ), \end{aligned}$$

where \(Z_{ji}\) is a latent indicator variable with \(Z_{ji} =1\) if the i-th rating is drawn from the j-th component of the mixture, and \(Z_{ji}=0\) otherwise (so that \(Z_{2i} = 1-Z_{1i}-Z_{3i}\)). Thus, if \(\varvec{\theta }^{(k)}\) is the current estimate at the k-th iteration, the posterior probabilities that the i-th rating is drawn from the opponents’ component DB\((\alpha _1,1)\) and from the supporters’ component DB\((1,\beta _3)\) are computed within the E-step as:

$$\begin{aligned}&{\mathbb {E}}[Z_{1i}|\varvec{\theta }^{(k)}] = \tau _{1i}^{(k)} = \dfrac{\delta _1^{(k)}\, db(r_i; \alpha _1^{(k)},1)}{Pr(R_i=r_i|\varvec{\theta }^{(k)})};\\&\quad {\mathbb {E}}[Z_{3i}|\varvec{\theta }^{(k)}] = \tau _{3i}^{(k)} = \dfrac{\delta _3^{(k)}\,db(r_i; 1,\beta _3^{(k)})}{Pr(R_i=r_i|\varvec{\theta }^{(k)})}, \end{aligned}$$

so that \(\tau _{2i}^{(k)} = 1- \tau _{1i}^{(k)} - \tau _{3i}^{(k)}\). If covariate effects are not included in the model, one can write \(\tau _{ji}^{(k)} = \tau _{jr}^{(k)}\) whenever \(r_i=r\), \(r=1,\dots ,m\), \(j=1,2,3\), and the expected complete log-likelihood to be maximized at the M-step can be rewritten as:

$$\begin{aligned} {\mathbb {E}}[l_c(\varvec{\theta })| \varvec{\theta }^{(k)}] = Q_1^{(k)}(\delta _1,\delta _3) + Q_2^{(k)}(\alpha _1) + Q_3^{(k)}(\alpha _2,\beta _2) + Q_4^{(k)}(\beta _3), \end{aligned}$$

where \((n_1,n_2,\dots ,n_m)\) denotes the frequency distribution of the sample, and one sets:

  • \(Q_1^{(k)}(\delta _1,\delta _3) = \log (\delta _1)\,\sum \limits _{r=1}^m n_r \tau _{1r}^{(k)}\, +\, \log (\delta _3)\,\sum \limits _{r=1}^m n_r \tau _{3r}^{(k)} \,+ \,\log (1-\delta _1-\delta _3) \sum \limits _{r=1}^m n_r \tau _{2r}^{(k)}\), yielding, after differentiation, the updated estimates:

    $$\begin{aligned} \delta _1^{(k+1)} = \dfrac{1}{n} \sum \limits _{r=1}^m n_r\,\tau _{1r}^{(k)}; \quad \delta _3^{(k+1)} = \dfrac{1}{n} \sum \limits _{r=1}^m n_r \tau _{3r}^{(k)}; \quad \delta _2^{(k+1)} = 1- \delta _1^{(k+1)}-\delta _3^{(k+1)}\,; \end{aligned}$$
  • \(Q_2^{(k)}(\alpha _1) = \sum \limits _{r=1}^m n_r\, \tau _{1r}^{(k)}\,\log (db(r;\alpha _1,1))\) and \(Q_4^{(k)}(\beta _3) = \sum \limits _{r=1}^m n_r\, \tau _{3r}^{(k)}\,\log (db(r;1,\beta _3))\);

  • \(Q_3^{(k)}(\alpha _2,\beta _2) = \sum \limits _{r=1}^m n_r\, \tau _{2r}^{(k)}\,\log (db(r;\alpha _2,\beta _2))\).

At each step, the updated estimates of \(\alpha _1, \alpha _2, \beta _2, \beta _3\) have to be obtained from numerical optimization of the corresponding functions, under the required bound constraintsFootnote 7.
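The E- and M-steps above can be sketched as follows for grouped data \((n_1,\dots ,n_m)\) without covariates. This is an illustration, not the authors’ implementation: the numerical M-step uses SciPy’s L-BFGS-B with box constraints \(\alpha _1,\beta _3 \in (0,1)\) and \(\alpha _2,\beta _2 > 1\); the upper bound 200, the relative stopping rule and the starting values are arbitrary assumptions; condition (2) is not enforced.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

EPS = 1e-6          # keeps bounded parameters away from the open-interval edges


def db_pmf(m, a, b):
    p = np.diff(beta_dist.cdf(np.linspace(0.0, 1.0, m + 1), a, b))
    return np.clip(p, 1e-300, None)          # avoid log(0)


def m_step_shapes(m, w, start, bounds, shapes):
    """Maximize sum_r w_r * log db(r; shapes(theta)) under box constraints."""
    def neg_q(theta):
        return -np.sum(w * np.log(db_pmf(m, *shapes(theta))))
    return minimize(neg_q, start, method="L-BFGS-B", bounds=bounds).x


def em_ofs111(counts, theta0, max_iter=300, tol=1e-7):
    """EM for OFS_111 on frequencies; theta0 = (d1, d3, a1, a2, b2, b3)."""
    counts = np.asarray(counts, float)
    m, n = counts.size, counts.sum()
    d1, d3, a1, a2, b2, b3 = theta0
    path = []
    for _ in range(max_iter):
        # E-step: weighted component probabilities and posteriors tau_{jr}
        comp = np.vstack([d1 * db_pmf(m, a1, 1.0),
                          (1.0 - d1 - d3) * db_pmf(m, a2, b2),
                          d3 * db_pmf(m, 1.0, b3)])
        mix = comp.sum(axis=0)
        path.append(float(np.sum(counts * np.log(mix))))
        if len(path) > 1 and abs(path[-1] - path[-2]) < tol * abs(path[-1]):
            break
        w1, w2, w3 = counts * (comp / mix)   # rows: n_r * tau_{jr}, j = 1, 2, 3
        # M-step: closed-form mixing weights ...
        d1, d3 = w1.sum() / n, w3.sum() / n
        # ... and numerical updates of the shape parameters
        a1 = m_step_shapes(m, w1, [a1], [(EPS, 1 - EPS)], lambda t: (t[0], 1.0))[0]
        b3 = m_step_shapes(m, w3, [b3], [(EPS, 1 - EPS)], lambda t: (1.0, t[0]))[0]
        a2, b2 = m_step_shapes(m, w2, [a2, b2], [(1 + EPS, 200.0)] * 2,
                               lambda t: (t[0], t[1]))
    return (d1, d3, a1, a2, b2, b3), path


def ofs111_pmf(m, d1, d3, a1, a2, b2, b3):
    return (d1 * db_pmf(m, a1, 1.0) + (1 - d1 - d3) * db_pmf(m, a2, b2)
            + d3 * db_pmf(m, 1.0, b3))


rng = np.random.default_rng(2024)
true = (0.25, 0.4, 0.2, 3.0, 4.0, 0.6)            # setting of Table 4, m = 11
counts = rng.multinomial(5000, ofs111_pmf(11, *true))
est, path = em_ofs111(counts, theta0=(0.3, 0.3, 0.5, 2.0, 2.0, 0.5))
print(np.round(est, 3), len(path))
```

Working with the frequency vector rather than individual ratings keeps each iteration at O(m) cost, which mirrors the grouped form of \(Q_1^{(k)},\dots ,Q_4^{(k)}\) above.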

Small simulation experiment

In order to assess the performance of the estimation procedure, a small simulation experiment has been carried out: for each scenario, \(B=200\) samples of size n were generated. Table 1 reports the mean squared error (MSE) of the parameter estimators over the simulation runs, together with the average dissimilarity between the generating model \(\varvec{p}\) and the estimated distribution \(\hat{\varvec{p}}\) (\({\widehat{Diss}}(\varvec{p},\hat{\varvec{p}})\)) and between the sample frequency distribution \(\varvec{f}\) and the estimated distribution (\({\widehat{Diss}}(\varvec{f},\hat{\varvec{p}})\)). Analogous simulation experiments are pursued for \({\text {OFS}}_{101}\) and \({\text {OFS}}_{110}\) for the sake of completeness (see Tables 2 and 3). Results are satisfactory and indicate that the estimation procedure performs well, with efficiency improving as the sample size grows.
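Samples for such an experiment can be generated through the latent construction itself: draw a component with probabilities \((\delta _1, \delta _2, \delta _3)\), draw the latent Beta variable, and discretize with \(r = \lceil m X \rceil \). A minimal sketch, with the fitting step (e.g. EM) omitted and the Table 4 parameter values used only as an example:

```python
import numpy as np
from scipy.stats import beta as beta_dist


def sample_ofs111(n, m, d1, d3, a1, a2, b2, b3, rng):
    """Draw n ratings: pick a component, draw its latent Beta, discretize as ceil(m*X)."""
    comp = rng.choice(3, size=n, p=[d1, 1.0 - d1 - d3, d3])
    a = np.array([a1, a2, 1.0])[comp]       # opponents DB(a1, 1), floatation DB(a2, b2),
    b = np.array([1.0, b2, b3])[comp]       # supporters DB(1, b3)
    x = rng.beta(a, b)
    return np.clip(np.ceil(m * x).astype(int), 1, m)   # X = 0 (underflow) maps to 1


def ofs111_pmf(m, d1, d3, a1, a2, b2, b3):
    def cdf(a, b):
        return beta_dist.cdf(np.linspace(0.0, 1.0, m + 1), a, b)
    return (d1 * np.diff(cdf(a1, 1.0)) + (1 - d1 - d3) * np.diff(cdf(a2, b2))
            + d3 * np.diff(cdf(1.0, b3)))


def mse(estimates, true_value):
    """Monte Carlo MSE of an estimator from its B replicates."""
    return float(np.mean((np.asarray(estimates) - true_value) ** 2))


rng = np.random.default_rng(7)
theta = dict(m=11, d1=0.25, d3=0.4, a1=0.2, a2=3.0, b2=4.0, b3=0.6)
r = sample_ofs111(200_000, rng=rng, **theta)
freq = np.bincount(r, minlength=theta["m"] + 1)[1:] / r.size
print(np.abs(freq - ofs111_pmf(**theta)).max())
```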

Table 1 MSE of the sampling estimators of \({\text {OFS}}_{111}\) parameters
Table 2 MSE of the sampling estimators of \({\text {OFS}}_{101}\) parameters
Table 3 MSE of the sampling estimators of \({\text {OFS}}_{110}\) parameters

Standard errors for OFS parameters

The uncertainty of parameter estimates could be evaluated by resorting to asymptotic likelihood theory on the basis of the observed information matrix (see Appendix 1 for details). A potential drawback of this procedure is the possible occurrence of numerical overflow in the approximation of the integrals involved. In this respect, numerical derivatives of the log-likelihood can be computed directly with Richardson’s extrapolation method, as suggested in (Ursino and Gasparini 2018)Footnote 8. Since results based on the information matrix hold only asymptotically and under regularity conditions, re-sampling methods such as the bootstrap (Efron 1981) can be adopted as general practice for OFS models, yielding stable accuracy evaluations of parameter estimates even for small sample sizes.

A small Monte Carlo experiment has been pursued to compare the performance of the different methods: for selected OFS models, n observations were sampled. For the general \({\text {OFS}}_{111}\) model, Table 4 reports standard error estimates obtained on the basis of the observed information matrix (Inf.), numerical approximation of the derivatives of the log-likelihood function with Richardson’s extrapolation method (Num.), and nonparametric bootstrap with \(B=500\) replicates (Boot.). The three methods are asymptotically equivalent, but for small and moderate sample sizes the data-driven bootstrap procedure yields more accurate resultsFootnote 9. For instance, numerical divergence occurred for some of the integrals involved in the computation of the observed information matrix for \(n=500\).
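A nonparametric bootstrap for grouped ratings can be sketched by noting that resampling n ratings with replacement is equivalent to drawing a multinomial vector from the observed relative frequencies. The sketch below uses the \({\text {OFS}}_{101}\) setting of Table 5 and, as our own choice, maximizes the likelihood directly with Nelder–Mead on the logit scale of \((\delta , \alpha _1, \beta _3)\); neither Richardson’s extrapolation nor the information matrix is implemented here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def ofs101_pmf(m, delta, a1, b3):
    u = np.arange(m + 1) / m
    opponents = np.diff(u ** a1)                     # DB(a1, 1): CDF x**a1
    supporters = np.diff(1.0 - (1.0 - u) ** b3)      # DB(1, b3): CDF 1 - (1-x)**b3
    return delta * opponents + (1.0 - delta) * supporters


def fit_ofs101(counts):
    """ML estimate of (delta, a1, b3), each in (0, 1), via a logit reparameterization."""
    counts = np.asarray(counts, float)
    m = counts.size

    def nll(z):
        p = np.clip(ofs101_pmf(m, *expit(z)), 1e-300, None)
        return -np.sum(counts * np.log(p))

    res = minimize(nll, np.zeros(3), method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-8, "maxiter": 5000})
    return expit(res.x)


def bootstrap_se(counts, B, rng):
    """Bootstrap standard errors from B multinomial resamples of the frequencies."""
    counts = np.asarray(counts)
    n = int(counts.sum())
    f = counts / n
    reps = np.array([fit_ofs101(rng.multinomial(n, f)) for _ in range(B)])
    return reps.std(axis=0, ddof=1)


rng = np.random.default_rng(11)
counts = rng.multinomial(5000, ofs101_pmf(7, 0.6, 0.4, 0.7))   # setting of Table 5
est = fit_ofs101(counts)
se = bootstrap_se(counts, B=100, rng=rng)
print(np.round(est, 3), np.round(se, 3))
```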

Table 4 Comparison of standard errors: \({\text {OFS}}_{111}\) model with \(m=11, \delta _1=0.25; \delta _3=0.4; \alpha _1=0.2; \beta _3=0.6; \alpha _2=3; \beta _2=4\)

The same check, limited to the numerical and bootstrap methods, is pursued for instances of \({\text {OFS}}_{101}\) and \({\text {OFS}}_{110}\) models (see Tables 5 and 6, respectively).

Table 5 Comparison of standard errors obtained by numerical differentiation of log-likelihood to obtain the Hessian matrix (Num.) and nonparametric bootstrap: \({\text {OFS}}_{101}\) model with \(m=7, \delta _1=0.6; \alpha _1=0.4; \beta _3=0.7\)
Table 6 Comparison of standard errors obtained by numerical differentiation of log-likelihood to obtain the Hessian matrix (Num.) and nonparametric bootstrap: \({\text {OFS}}_{110}\) model with \(m=9, \delta _1=0.6; \alpha _1=0.3; \alpha _2=4; \beta _2=1.5\)

A comparative discussion with the state of the art

Like the OFS family, the mihg (Simone and Iannario 2018) and caub (Simone and Tutz 2018) mixture models pursue a direct parameterization of the features of interest of the distribution, with easy interpretation and explicit location of modal values (yet the mihg does not consider floatation). In this context, a 3-component mixture of Binomial distributions could also be considered if suitable constraints are placed on the Binomial parameters to model polarization and floatation: its specification will not be discussed hereafter, since the Binomial model can be approximated by the DB model (Ursino 2014); see (Grilli et al. 2015) for further applications of Binomial mixtures to discrete data.

The bimodal discrete shifted Poisson model (Bi-Poiss) proposed in (Gómez-Déniz et al. 2020), instead, is built to encompass bimodal count data starting from the Poisson model, with the addition of an extra dispersion parameter \(\theta \) responsible for bimodality (not necessarily at the extremes of the support)Footnote 10. After truncation at m, the main drawback of the Bi-Poiss model is the lack of an explicit link between parameter values and polarization and floatation of the response: for instance, theoretical values of the modes can be obtained in terms of the parameters only by numerically solving nonlinear equations. In addition, unlike the \({\text {OFS}}_{111}\) model, the Bi-Poiss does not encompass the scenario of three response clusters. Conversely, the Bi-Poiss model is directly applicable in case of bimodality at inner categories, whereas mixtures of DB models for this case should be designed carefully because of identifiability issues (see Appendix 2).

For bimodal discrete data, a two-component mixture of (truncated) Conway–Maxwell–Poisson models can be considered as well (Mix-CMP, (Sur et al. 2015))Footnote 11. As to computational aspects, the M-step within the EM algorithm needs to be performed with a computationally demanding grid search, since the ML solution for Mix-CMP is highly dependent on initial values. With respect to the problem under examination, the main drawback of Mix-CMP concerns identifiability, which places several limitations on the interpretation of response location and dispersion. Specifically, parameters are not straightforwardly interpretable in terms of polarization and floatation, as they are for the OFS family. As to fitting performances, a tentative comparative analysis with the OFS family requires setting suitable parameter constraints to mitigate identifiability issues for the Mix-CMP, at the cost of flexibility. For instance, the supporters’ pole can be shaped by restricting to a \(CMP(\lambda _S,\nu _S)\) with \(\lambda _S \in (m-2,m)\), \(\nu _S \in (0,1)\), whereas a \(CMP(\lambda _O,\nu _O)\) model with \(\lambda _O,\nu _O \in (0,1)\) can be considered for the opponents’ pole. Floatation could possibly be considered explicitly if a component \(CMP(\lambda _F, \nu _F)\), with \(\lambda _F, \nu _F > 1\), is specified in the mixture. For count data, each component should be truncated from below at the minimum observed count, and from above at the largest observed count or at the censoring threshold, whereas for ratings on Likert-type scales it should be truncated from above at \(m-1\) and then shifted upward by 1, as argued for the Binomial component in cub mixtures (Piccolo and Simone 2019).
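For ratings, the truncated and shifted CMP component described above can be sketched directly: the unnormalized CMP mass \(\lambda ^y/(y!)^{\nu }\) is evaluated on \(y=0,\dots ,m-1\), normalized by a finite sum, and mapped to \(r=y+1\). A minimal sketch (the function name is hypothetical), checked against the truncated Poisson case \(\nu =1\):

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson


def shifted_truncated_cmp_pmf(m, lam, nu):
    """CMP(lam, nu) truncated to {0, ..., m-1} and shifted by 1 onto {1, ..., m}.

    After truncation the normalizing constant is a finite sum, so no series
    approximation of the CMP normalizer is needed.
    """
    y = np.arange(m)
    log_w = y * np.log(lam) - nu * gammaln(y + 1)     # log(lam**y / (y!)**nu)
    w = np.exp(log_w - log_w.max())                    # stabilize before normalizing
    return w / w.sum()                                 # entry r-1 is Pr(R = r)


m = 9
print(shifted_truncated_cmp_pmf(m, 0.5, 0.5).round(3))      # opponents-type: decreasing
print(shifted_truncated_cmp_pmf(m, m - 1.0, 0.5).round(3))  # supporters-type: increasing
```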

It is worth remarking that, for data exhibiting bi-polarization and floatation, a 3-component mixture of CMP would have a higher model complexity than the \({\text {OFS}}_{111}\) model; similarly, the Mix-CMP would be less parsimonious than mihg and \({\text {OFS}}_{101}\) for U-shaped distributions, and than \({\text {OFS}}_{110}\) or \({\text {OFS}}_{011}\) for bimodal data with one mode at one of the extremes.

To show that the OFS family is also successfully applicable to (truncated) distributions of count data, Table 7 reports some performance indicators of alternative models for the Health Heritage Competition data discussed in (Sur et al. 2015)Footnote 12. Fitting results of a single DB\((\alpha ,\beta )\) model with no parameter constraints, and of the cub mixture (Piccolo and Simone 2019), possibly allowing for inflation at the last category (cub with shelter), are also reported. The last column reports the average p value of the Pearson \(X^2\) goodness-of-fit statistic, computed on each test set of a \(K=30\)-fold cross-validationFootnote 13 on the basis of the model estimated on the remaining \(K-1\) foldsFootnote 14: the \({\text {OFS}}_{111}\) model shows very satisfactory performance.

Table 7 Models comparison for the Number of Days in Hospital dataset discussed in (Sur et al. 2015) (the two-component Mix-CMP has been fitted with the discussed constraints: \({\hat{p}}=0.95, {\hat{\lambda }}_1=0.64; {\hat{\nu }}_1=0.05; {\hat{\lambda }}_2=13; {\hat{\nu }}_2=0.88\))

Thus, OFS mixtures could be successfully applied, for instance, to assess the efficiency of health care structures, as well as to other count data, thanks to their flexibility in both fitting and interpretation. In this case, floatation covers the intermediate stays, whereas polarization should be interpreted as the predominance of short and long hospitalizations, with parameters \(\alpha _1, \beta _3\) describing the concentration of brief and lengthy stays towards the lowest and largest count, respectively. Finally, OFS mixing weights quantify how frequent short, intermediate and long hospitalizations are overall. For this example, results indicate that intermediate hospitalizations tend to be as short as possible, since the floatation component is right-skewed with modal value at the second categoryFootnote 15.

For the subsequent case studies, fitting results of both the Bi-Poiss and the (constrained) Mix-CMP models will be reported for the sake of comparisons.

Remark 4

Noticeably, the latent Beta polarization components \(f(x;\alpha _1,1)\) and \(f(x; 1,\beta _3)\) of the OFS family are particular cases of the Kumaraswamy distribution (Jones 2009), with density \(g(x; \alpha ,\beta )= \alpha \,\beta \,x^{\alpha -1}(1-x^{\alpha })^{\beta -1}\) for \(x \in [0,1]\), which is similar to the Beta distribution in several respects, yet more tractable from the mathematical point of view. Preliminary investigations seem to indicate that mixture specifications within this family would not entail the identifiability issues discussed in Appendix 2 for Beta mixtures. Thus, a mixture of two discretized Kumaraswamy distributions, one with parameters \((\alpha _1,\beta _1)\) such that \(\min (\alpha _1,\beta _1)<1\) for polarization, and one with parameters \((\alpha _2,\beta _2)\) such that \(\min (\alpha _2,\beta _2)>1\) for floatation, could be an alternative model for the problem under examination, yet its parameters would lack a straightforward and symmetrical interpretation with respect to polarization and floatation; further, non-uniform symmetric shapes would not be encompassed.
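The Kumaraswamy CDF has the closed form \(G(x;\alpha ,\beta ) = 1-(1-x^{\alpha })^{\beta }\), so its discretization needs no special functions. The sketch below (function name hypothetical) also verifies numerically that Kumaraswamy\((\alpha ,1)\) and Kumaraswamy\((1,\beta )\) yield exactly the DB\((\alpha ,1)\) and DB\((1,\beta )\) polarization components.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def discretized_kumaraswamy_pmf(m, a, b):
    """Discretize Kumaraswamy(a, b) at the cut points r/m, using F(x) = 1 - (1 - x**a)**b."""
    u = np.arange(m + 1) / m
    return np.diff(1.0 - (1.0 - u ** a) ** b)


m = 9
u = np.arange(m + 1) / m
# Beta(a, 1) and Beta(1, b) coincide with Kumaraswamy(a, 1) and Kumaraswamy(1, b).
print(np.allclose(discretized_kumaraswamy_pmf(m, 0.3, 1.0), np.diff(beta_dist.cdf(u, 0.3, 1.0))))
print(np.allclose(discretized_kumaraswamy_pmf(m, 1.0, 0.6), np.diff(beta_dist.cdf(u, 1.0, 0.6))))
```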

A case study on the probability to vote for German Political Parties

The data analysed in this section are taken from the GESIS ALLBUS German Social Survey (Gesis 2016). On a rating scale ranging from \(1=\)“very unlikely” to \(10=\)“very likely”, respondents were asked: “How likely is it that you would ever vote for this German party?”. Hereafter, ratings for the four main parties (CDU, SPD, FDP, The Greens) collected in 2002 and 2008 will be considered. The last two categories have been collapsed to yield ratings on a scale with \(m=9\) categories. After list-wise deletion of missing values, samples of \(n=2738\) and \(n=3056\) observations are analysed for the 2002 and 2008 data, respectively. Within the OFS framework, polarization is interpreted as the resoluteness of the opinions of opponents and supporters, whereas floatation can also be interpreted as indecision.
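The preprocessing described above (list-wise deletion, then merging categories 9 and 10) can be sketched in Python as follows. The column labels in `PARTIES` and the toy records are invented for illustration and are not ALLBUS variable names or data; missing answers are assumed to be coded as `None`.

```python
# Hypothetical column labels for the four party ratings (not ALLBUS variable names)
PARTIES = ("CDU", "SPD", "FDP", "Greens")
M = 9  # categories after merging the top two points of the 1-10 scale


def listwise_complete(records, items=PARTIES):
    """Keep only respondents with a valid rating for every item (list-wise deletion)."""
    return [rec for rec in records if all(rec.get(k) is not None for k in items)]


def collapse_top(rating):
    """Map a 1-10 rating to 1-9 by merging categories 9 and 10."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating out of range: {rating}")
    return 9 if rating == 10 else rating


def relative_frequencies(records, item, m=M):
    """Observed rating distribution of one item on the collapsed m-point scale."""
    counts = [0] * m
    for rec in records:
        counts[collapse_top(rec[item]) - 1] += 1
    return [c / len(records) for c in counts]


toy = [  # invented toy records; None marks a missing answer
    {"CDU": 10, "SPD": 1, "FDP": 5, "Greens": 9},
    {"CDU": 3, "SPD": None, "FDP": 2, "Greens": 7},  # dropped: SPD missing
    {"CDU": 9, "SPD": 4, "FDP": 10, "Greens": 1},
]
complete = listwise_complete(toy)
print(len(complete), relative_frequencies(complete, "CDU"))
```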

Table 8 reports the best model for each rating variable, selected on the basis of a joint analysis of multiple criteria, including \(X^2\) statistics, likelihood ratio tests for nested models and BIC values. As a general rule, the most parsimonious specification has been preferred whenever the evidence for a more complex model was only weakly significant and the other criteria gave comparably satisfactory results (see Appendix 3 for details).
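As a rough illustration of the criteria involved, the Python sketch below computes BIC, the likelihood ratio statistic with its asymptotic chi-square p-value, and a Pearson \(X^2\) statistic. It assumes that \(X^2\) denotes the usual Pearson statistic on category counts; the log-likelihood values are invented and are not taken from Table 8 or Table 12.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def lr_statistic(loglik_nested, loglik_full):
    """Likelihood ratio statistic of a nested model against a larger one."""
    return 2.0 * (loglik_full - loglik_nested)


def chi2_sf(x, df):
    """Upper-tail chi-square probability, via the series of the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= z / (s + k)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s) + math.log(total))
    return max(0.0, 1.0 - lower)


def pearson_x2(counts, probs):
    """Pearson statistic comparing observed counts with fitted category probabilities."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


# Invented log-likelihoods for two nested candidates fitted to n = 2738 ratings
n = 2738
ll_small, k_small = -5800.0, 5
ll_big, k_big = -5797.5, 6
lr = lr_statistic(ll_small, ll_big)
p_value = chi2_sf(lr, df=k_big - k_small)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
print(f"BIC small = {bic(ll_small, k_small, n):.1f}, BIC big = {bic(ll_big, k_big, n):.1f}")
```

With these invented values the LR test rejects the smaller model at the 5% level (p ≈ 0.025), while BIC is lower for the smaller model: this is the kind of weakly significant evidence for which the parsimony rule above applies. The series used for the chi-square tail is adequate for moderate statistics; for very large values a dedicated library routine is preferable.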

Table 8 Best OFS mixture (see Table 12 in Appendix 3 for details)

It follows that:

  • For the CDU, the structural components of the probability to vote changed neither in size nor in intensity from 2002 to 2008;

  • For the SPD, the neutrality component of 2002 turned into a more general, yet symmetric, indecision component;

  • For the Greens and the FDP, instead, evidence for the supporter pole was found only in 2008: given the positive asymmetry of the floatation component in 2002 (see Table 9) and its symmetry in 2008, it can be concluded that undecided opinions moved towards the supporter pole from 2002 to 2008.

The parameterization of polarization and indecision accomplished via OFS mixtures makes it possible to identify whether, and to what extent, changes have occurred in the probability to vote for German parties. Figure 1 shows the estimated polarization parameters \({\hat{\delta }}_1,{\hat{\delta }}_3,{\hat{\alpha }}_1,{\hat{\beta }}_3 \in (0,1)\) for all parties in 2002 (left panel) and in 2008 (right panel). Lower and upper bounds of 95% bootstrap confidence intervals are marked with star symbols at the ends of the whiskers extending from the point estimates. It follows that:

  • Polarization and floatation components of the voting probabilities for the CDU are overall stable from 2002 to 2008, in both intensity and size;

  • For the SPD, a significant decrease is observed for both \(\delta _3\) and \(\beta _3\): since no relevant variation is observed for \(\delta _1\), it can be inferred that indecision has increased, while positive evaluations have become more polarized;

  • For the Greens and the FDP, a significant decrease is observed in both \(\delta _1\) and \(\alpha _1\), indicating that the opposition pole grew in intensity but decreased in size. As a result, it can be inferred that some negative, yet un-polarized, evaluations have floated towards a symmetric indecision (see also Table 9).

Fig. 1 Polarization parameters for the best OFS mixture (see Table 8)

Figure 2 jointly displays the estimated sizes of polarization and floatation through a ternary plot of the mixing weights (left panel), whereas a scatter plot of the polarization parameters \(\alpha _1,\beta _3\) in the unit square compares the strength of unfavourable and favourable opinions over time (right panel).
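The left panel of Figure 2 relies on the standard barycentric mapping of three weights summing to one onto an equilateral triangle. The Python sketch below assumes a vertex assignment (opposition at bottom left, supporters at bottom right, floatation at the top) that may differ from the one used in the figure; the weights are hypothetical.

```python
import math


def ternary_xy(delta1, delta2, delta3, tol=1e-9):
    """Cartesian position of mixing weights (delta1, delta2, delta3) in an
    equilateral triangle with assumed vertices: opposition (delta1 = 1) at (0, 0),
    supporters (delta3 = 1) at (1, 0), floatation (delta2 = 1) at (0.5, sqrt(3)/2)."""
    if min(delta1, delta2, delta3) < 0 or abs(delta1 + delta2 + delta3 - 1) > tol:
        raise ValueError("mixing weights must be non-negative and sum to one")
    # Weighted average of the three vertex coordinates
    x = delta3 + 0.5 * delta2
    y = (math.sqrt(3) / 2) * delta2
    return x, y


# Hypothetical weights for one party at two time points (not the paper's estimates)
for year, w in (("t1", (0.40, 0.35, 0.25)), ("t2", (0.30, 0.50, 0.20))):
    print(year, ternary_xy(*w))
```

The right panel needs no transformation, since \((\hat{\alpha}_1,\hat{\beta}_3)\) already lie in the unit square.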

In addition, Table 9 reports the asymmetry measure \(\gamma _1\) defined in (4) and the adjusted kurtosis \(\gamma _2^{\star }\) defined in (15) for the estimated indecision component, for those parties and time points where it is not degenerate. These indices quantify the extent to which negative opinions floated towards neutrality, as well as the extent to which un-polarized opinions became more homogeneous from 2002 to 2008 for both the FDP and the Greens (more markedly for the FDP). The reverse circumstance is observed for the SPD, whose neutrality component of 2002 gave way to a general yet symmetric indecision. The analysis and the proposed visualization tools could be replicated conditionally on covariate values (such as gender, geographical residence, etc.) to give local assessments of the polarization and floatation dimensions.

Table 9 Asymmetry and adjusted kurtosis of the floatation component for the best OFS mixture (see Table 8)

Fig. 2 Visualization of estimation results for polarization parameters: probability to vote for German parties

Finally, a 10-fold cross-validation is performed to check the ability of the selected best model (Table 8) to predict the rating distribution. Table 10 reports some summary indicators: the average and the 9th decile over folds of the dissimilarity index between the best model \(\varvec{p}_\mathrm{train}^{\star }\), estimated on the training set, and the response distribution observed on the test set, \(\varvec{f}_\mathrm{test}\), are used as proxies of the prediction error for the test-set distribution. With the same goal and strategy, the average over folds of the Kullback–Leibler divergence is reported for the candidate OFS models. Results indicate that, beyond their fitting ability, the flexibility of OFS models yields satisfactory predictive performance.
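The cross-validation scheme can be sketched in Python as below, assuming the usual normalized dissimilarity index \(\mathrm{Diss}=\frac{1}{2}\sum_r |f_r - p_r|\), the divergence direction \(KL(\varvec{f}_\mathrm{test}\,\|\,\varvec{p}_\mathrm{train}^{\star})\) and linear interpolation for the 9th decile. The hypothetical `smoothed_frequencies` function is only a placeholder for fitting the best OFS model on the training folds, and the toy ratings are invented.

```python
import math
import random


def dissimilarity(f, p):
    """Normalized dissimilarity index: half the L1 distance between two pmfs."""
    return 0.5 * sum(abs(fi - pi) for fi, pi in zip(f, p))


def kl_divergence(f, p):
    """KL(f || p); cells with f_r = 0 contribute zero, p must be positive where f is."""
    return sum(fi * math.log(fi / pi) for fi, pi in zip(f, p) if fi > 0)


def empirical_pmf(sample, m):
    counts = [0] * m
    for r in sample:
        counts[r - 1] += 1
    return [c / len(sample) for c in counts]


def quantile(values, q):
    """Linear-interpolation quantile (q = 0.9 gives the 9th decile)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def kfold_summary(sample, fit, m, k=10, seed=0):
    """k-fold CV: fit on k-1 folds, compare the fitted pmf with held-out frequencies."""
    idx = list(range(len(sample)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    diss, kl = [], []
    for fold in folds:
        held_out = set(fold)
        p_train = fit([sample[i] for i in idx if i not in held_out], m)
        f_test = empirical_pmf([sample[i] for i in fold], m)
        diss.append(dissimilarity(f_test, p_train))
        kl.append(kl_divergence(f_test, p_train))
    return {"mean_diss": sum(diss) / k,
            "decile9_diss": quantile(diss, 0.9),
            "mean_kl": sum(kl) / k}


def smoothed_frequencies(train, m):
    """Hypothetical stand-in for an OFS fit: add-one smoothed relative frequencies."""
    counts = [1] * m
    for r in train:
        counts[r - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]


rng = random.Random(42)
toy = [rng.choice([1, 1, 1, 5, 5, 9, 9, 9, 9]) for _ in range(500)]  # invented, m = 9
print(kfold_summary(toy, smoothed_frequencies, m=9))
```

Smoothing keeps every fitted probability positive, so the KL divergence stays finite even when a category is empty in the training folds; a fitted OFS model has the same property whenever all its components are non-degenerate.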

Table 10 Summarizing results for goodness of fit and predicting ability of the best model, over 10-fold cross-validation

Final considerations

The paper has discussed mixtures of Discretized Beta models that explicitly parameterize polarization and floatation of discrete ordered evaluations, such as ratings and (truncated) count data. The proposal is more flexible than alternative models in both fitting performance and interpretation: for instance, the method presented in (Gómez-Déniz et al. 2020) to induce bimodality in a distribution could also be applied to the DB model, at the cost of losing the direct a priori parameterization of polarization features afforded by OFS models. A dedicated R package implementing OFS models is under development.

Further research will address the analysis of tail dependencies between polarization and floatation of different survey items through suitable copula modelling, as well as the implementation of model-based trees to derive response profiles in terms of covariates with a significant effect on at least one of the model's features (see (Cappelli et al. 2019; Simone et al. 2019) for the case of cub models for rating data). A comparative analysis with mixtures of discretized Kumaraswamy distributions also deserves in-depth investigation in future research.