Abstract
The paper proposes a method to perform diagnostics of model-based trees for preference and evaluation data on the basis of surrogate residual analysis for ordinal data models. The discussion stems from the introduction of binomial regression trees and shows how to perform local diagnostics of misspecification against alternative model extensions within the framework of mixture models with uncertainty. Three case studies concerning customer satisfaction and perceived trust in information sources illustrate the usefulness and the versatility of the proposal.
1 Introduction
Learning response patterns from preference and evaluation survey data is a crucial task to measure respondents’ perceptions and, subsequently, to understand their behavior. For instance, assessments of churn risk and of customers’ loyalty are usually pursued by asking customers to rate the extent to which they would recommend a given service or product to others. Similar investigations concern the assessment of perceptions and opinions about public policies and interventions on relevant subjects. For these data, the binomial regression model is a well-acknowledged choice since it offers a simple and effective representation of the data generating process (Allik, 2014; Grilli et al., 2015; Pinto da Costa JF. et al., 2008; Raidvee et al., 2012; Zhou & Lange, 2009). The estimable probability parameter conveys a synthesis of the latent feeling with versatile interpretation (satisfaction, preference, agreement, etc., depending on the topic under investigation). Regression techniques and standard variable selection algorithms allow one to derive relevant response profiles in terms of subjects’ characteristics. Automatic identification of these response profiles can be achieved via tree methods: in particular, the rationale of model-based binary trees (Zeileis et al., 2008) hinges on the assumption that a given model can be maintained at each partitioning level, thus performing an iterative search for significant splitting variables, both to disclose interactions among variables and to optimize model fit, on the basis of parameter instability tests. In this framework, the paper discusses the implementation of binomial regression trees for rating data. In order to account for possible framing effects on the response outcome (Hilbig, 2012), the class of CUB mixture models and their extensions (Piccolo & Simone, 2019) provides a research paradigm to assess both feeling and uncertainty of the response distribution in a parsimonious way (Tourangeau et al., 2000).
Indeed, any measure of location and shape summarizing the overall feeling should be accompanied by an assessment of the heterogeneity of the distribution and should be adjusted for possible nuisance effects blurring the underlying sentiment, such as those induced by inflated frequencies and overdispersion. Under this rationale, if the binomial model is assumed for the feeling component, different uncertainty specifications can be considered: the uniform distribution for the so-called noncontingent response style, or specific models to cope with different response styles (Gottard et al., 2016; Simone & Tutz, 2018). As for every Mixture of Experts System (Gormley & Frühwirth-Schnatter, 2019), variable selection for regression of the featuring mixture parameters is a challenging task. Recently, model-based trees have been developed for the class of CUB models (Cappelli et al., 2019; Simone et al., 2019), providing useful and innovative tools to perform automatic variable selection and to describe and classify response profiles in terms of uncertainty, feeling, and possible shelter effect, which occurs when an excess of frequency is observed at a category acting as a refuge response. Motivated by the circumstance that the assumed model might not provide an adequate fit to all response profiles, the paper contributes to the state of the art by addressing the issue of model diagnostics on the response profiles learnt from a model-based tree, with focus on the binomial case. Specifically, the surrogate residual analysis proposed in Liu and Zhang (2018) is exploited to run a local model selection for the best binomial extension that verifies a necessary condition for being correctly specified, within the class of mixture models with uncertainty.
The paper is organized as follows: binomial trees for ordered evaluation data are discussed in Section 2. Section 3 recalls some recent findings on residual analysis for ordinal regression models and provides an overview of how this approach can be exploited as a diagnostic procedure for the chosen class of models and related trees. The ideas put forth in the paper are illustrated in Section 4 on the basis of a customer satisfaction survey as an introductory example. Then, Sections 5 and 6 present two more comprehensive case studies: the analysis of perceived trust towards Television and Press, taken from the ALLBUS German General Social Survey (GESIS - Leibniz-Institut für Sozialwissenschaften, 2016), and the analysis of the satisfaction of Italian PhD holders with their doctoral studies, taken from the ISTAT survey on career placement.^{Footnote 1} An outline of future developments is given in Section 7.
2 Models and Regression Trees for Ordered Evaluation Data
Throughout the paper, assume that ordered evaluations are collected on a discrete response scale with m ordered categories, say c_{1} ≺ c_{2} ≺⋯ ≺ c_{m}: categories will be coded as integers for notational convenience only (c_{j} = j). Let R denote the response rating variable and assume m > 4 for identifiability of all the models that will be considered.
In order to parameterize the latent sentiment and possible framing effects for the response R, the setting of models on the discrete support is advocated. The (shifted) binomial distribution:
$$ b_{r}(\xi) = \Pr(R = r \mid \xi) = \binom{m-1}{r-1} \xi^{m-r} (1-\xi)^{r-1}, \quad r=1,\dots,m, $$(1)
provides a parsimonious yet effective choice for the response R, well acknowledged in the literature (Allik, 2014; Pinto da Costa JF. et al., 2008; Raidvee et al., 2012; Zhou & Lange, 2009). The estimable parameter ξ ∈ (0,1) accounts for the location and shape of the response: specifically, 1 − ξ assesses the probability that a category is preferred over the previous ones,^{Footnote 2} so that the higher the value of ξ, the more right-skewed the distribution, with modal value at one of the lower categories. Thus, if the response scale is oriented so that higher categories correspond to a stronger trait, the measure 1 − ξ is a direct indicator of the overall feeling towards the item being investigated.
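As a minimal illustration, the sketch below evaluates the shifted binomial probabilities and the feeling measure 1 − ξ in Python. It assumes the usual parameterization in which R − 1 is Binomial(m − 1, 1 − ξ), so that the maximum likelihood estimate of ξ without covariates has the closed form (m − mean)/(m − 1). The function names and the six ratings are hypothetical and only serve the example.

```python
from math import comb

def shifted_binomial_pmf(m, xi):
    """Return [Pr(R=1), ..., Pr(R=m)] for the shifted binomial with parameter xi."""
    return [comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)
            for r in range(1, m + 1)]

def xi_mle(ratings, m):
    """Closed-form ML estimate of xi without covariates: (m - mean) / (m - 1)."""
    return (m - sum(ratings) / len(ratings)) / (m - 1)

# A large xi pushes the mode towards the lower categories.
probs = shifted_binomial_pmf(m=7, xi=0.7)
modal_category = max(range(1, 8), key=lambda r: probs[r - 1])

# Hypothetical ratings on a 7-point scale: 1 - xi_hat is the feeling measure.
feeling = 1 - xi_mle([5, 6, 6, 7, 4, 6], m=7)
```

With ξ = 0.7 the mass concentrates on the first categories, whereas the mostly high hypothetical ratings yield a feeling measure close to 1.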
Adhering to the rationale supported by some experimental psychologists (Tourangeau et al., 2000), the response process of ordered evaluation data on a discrete support is deemed to be subject to framing effects (Hilbig, 2012). The resulting conceptualization achieved via the class of mixture models with uncertainty (Piccolo & Simone, 2019) involves a combination of actual feeling towards the item and uncertainty, so that, denoting by b_{r}(ξ) the shifted binomial probability of category r:
$$ \Pr(R = r \mid \pi, \xi) = \pi b_{r}(\xi) + (1-\pi) q_{r}, \quad r=1,\dots,m, $$(2)
where \(\mathbf {q} = (q_{1},\dots ,q_{m})\) is a suitable model for uncertainty contaminating the feeling model, and π ∈ (0,1] is the mixture parameter weighting feeling importance. The benchmark choice is the uniform specification encompassed by CUB models (\(q_{r}=\frac {1}{m}\) for all \(r=1,\dots ,m\)), to account for heterogeneity of responses and adjust the feeling assessment accordingly (for short, \(R \sim \) CUB(π,ξ)). In several circumstances, a category acts as a refuge option for the response choice, resulting in an inflated frequency for that category, so that uncertainty degenerates solely or partly to a shelter effect (Iannario, 2012). In this case, the feeling measurement should be corrected accordingly, by letting the contaminating distribution in (2) be:
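with \(\text {D}_{r}^{(c)}\) denoting a degenerate distribution with mass concentrated at the shelter category, say c, and δ ∈ [0,1] measuring the excess frequency at c. The resulting distribution for the observed response is a CUB with shelter effect at category c, namely (with b_{r}(ξ) the binomial probability of category r):
$$ \Pr(R = r) = \pi_{1} b_{r}(\xi) + \pi_{2} \frac{1}{m} + (1-\pi_{1}-\pi_{2}) \text{D}_{r}^{(c)}, \quad r=1,\dots,m, $$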
with shelter effect parameterized by δ = 1 − π_{1} − π_{2}, and π_{1},π_{2} ∈ (0,1) weighting the importance of the feeling component and of heterogeneity, respectively. This model encompasses the binomial with shelter (π_{2} = 0). Further uncertainty specifications are possible to encompass different alleged framing effects (Gottard et al., 2016), as well as to consider response styles to middle and extreme categories by choosing the discretized beta distribution (Ursino & Gasparini, 2018; Simone, 2022) as contaminating model (resulting in CAUB models, Combination of Adjusted Uncertainty and Binomial; see Simone and Tutz (2018)). In general, any discrete distribution with support in \(\{1,2,\dots , m\}\) (or a discretized version of a continuous distribution) that can be interpreted as a nuisance affecting feeling can be assumed for the uncertainty component, provided that it does not entail identifiability problems when mixed with the feeling component.
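The three-component structure (feeling, heterogeneity, shelter) can be made concrete with a short Python sketch. It assumes the mixture weights π_1, π_2 and δ = 1 − π_1 − π_2 described above and the shifted binomial form for the feeling component. The function name `cub_shelter_pmf` and the parameter values are hypothetical.

```python
from math import comb

def cub_shelter_pmf(m, pi1, pi2, xi, c):
    """Mixture pi1 * binomial + pi2 * uniform + (1 - pi1 - pi2) * point mass at c."""
    delta = 1.0 - pi1 - pi2
    if min(pi1, pi2) < 0 or delta < -1e-12:
        raise ValueError("weights must be non-negative and sum to at most 1")
    delta = max(delta, 0.0)            # absorb rounding error when pi1 + pi2 = 1
    probs = []
    for r in range(1, m + 1):
        feeling = comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)
        shelter = 1.0 if r == c else 0.0
        probs.append(pi1 * feeling + pi2 / m + delta * shelter)
    return probs

# Hypothetical parameters: shelter effect of size delta = 0.25 at category c = 4.
probs = cub_shelter_pmf(m=7, pi1=0.5, pi2=0.25, xi=0.5, c=4)

# Special cases: pi2 = 0 gives the binomial with shelter, delta = 0 the plain CUB.
binomial_shelter = cub_shelter_pmf(m=5, pi1=0.8, pi2=0.0, xi=0.3, c=5)
```

Writing the model through π_1 and π_2 keeps the nested cases visible: each submodel is obtained by setting one weight to zero.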
Possibly, the binomial model for feeling can be extended to account for overdispersion in terms of a beta-binomial distribution, governed by an extra parameter ϕ > 0 related to the excess variability:
so that the binomial is recovered as \(\phi \rightarrow 0\). As a result, mixtures of the beta-binomial model with the (discrete) uniform distribution or with a degenerate distribution at a fixed category can be designed to account for heterogeneity (resulting in CUBE models: Iannario (2013)) and for shelter effect (beta-binomial with shelter).
Given this choice, the implementation of model-based trees (Zeileis et al., 2008) is a natural step to characterize response profiles with synthetic information on different response features (feeling, heterogeneity, shelter, overdispersion), thus fostering their interpretation and comparative analysis.
2.1 Binomial Trees
Under the binomial model, individual response patterns can be disclosed by letting the probability parameter ξ be subject-dependent with a logit link to a covariate vector x_{i}:
$$ \text{logit}(\xi_{i}) = \gamma_{0} + \mathbf{x}_{i} \boldsymbol{\gamma}, \quad i=1,\dots,n, $$(5)
where n denotes the sample size. With respect to the intercept term γ_{0}, the regression coefficients of the vector γ measure the effects of covariates x_{i} on response feeling: this is a crucial step, especially to model multimodal rating distributions. Model selection and search for significant effects can be performed on the basis of standard likelihood inference methods. Within the model-based approach to classification and regression trees, a binomial tree is proposed here to identify response profiles for the latent feeling of evaluation data. Thus, assuming that the binomial is the maintained model, a binary partitioning algorithm iteratively splits groups of observations (corresponding to tree nodes) in terms of significant variables for the baseline feeling parameter. The algorithm can be summarized as follows:

1. At step k, consider the subset of observations corresponding to node k with n_{k} observations (if k = 1, this is called the root node): consider the binomial fit with parameter ξ^{(k)} for this sample;
2. Given a set of candidate binary splitting variables {D_{i}}, test the regression:
$$ \text{logit}(\xi_{i}^{(k)}) = \gamma_{0}^{(k)} + \gamma_{1}^{(k)} \text{D}_{i}, \quad i=1,\dots,n_{k}, $$(6)
to identify significant effects at a predetermined α level;
3. Among the candidate splitting variables entailing significant differences on response feeling, determine the one fulfilling a prespecified optimality criterion;
4. Accordingly, divide node k into two descendants according to the value of the selected splitting variable: these child nodes will be enumerated as 2k (left descendant) and 2k + 1 (right descendant). Consider the binomial fit conditional on D_{i} for these subsamples;
5. For each child node, iterate the procedure from step 1 until at least one stopping rule is met (see below).
Figure 1 displays the general partitioning step of the binomial tree that splits node k into left and right descendants, with feeling parameters that are determined according to (6) conditional to D.
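The partitioning step can be sketched in Python for a single candidate dummy. This is only an illustration under stated assumptions: the significance of γ_1 in (6) is assessed with a likelihood ratio test (the text does not fix the test), and the shifted binomial is parameterized so that R − 1 is Binomial(m − 1, 1 − ξ). With one dummy the logit regression is saturated, so the fits reduce to group-wise closed-form estimates. The function names and the sample are hypothetical.

```python
from math import comb, erfc, log, sqrt

def binom_loglik(ratings, m):
    """Maximised log-likelihood of the shifted binomial without covariates."""
    xi = (m - sum(ratings) / len(ratings)) / (m - 1)          # closed-form MLE
    return sum(log(comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1))
               for r in ratings)

def lrt_dummy_split(ratings, dummy, m):
    """Likelihood ratio test of gamma_1 = 0 for a 0/1 splitting variable."""
    left = [r for r, d in zip(ratings, dummy) if d == 0]      # node 2k
    right = [r for r, d in zip(ratings, dummy) if d == 1]     # node 2k + 1
    if not left or not right:
        raise ValueError("the dummy must define two non-empty groups")
    stat = 2 * (binom_loglik(left, m) + binom_loglik(right, m)
                - binom_loglik(ratings, m))
    stat = max(stat, 0.0)                                     # guard against rounding
    p_value = erfc(sqrt(stat / 2))                            # chi-square(1) upper tail
    return stat, p_value

# Hypothetical node: low ratings when D = 0, high ratings when D = 1.
ratings = [1, 2, 2, 1, 3, 2] * 5 + [6, 7, 5, 6, 7, 6] * 5
dummy = [0] * 30 + [1] * 30
stat, p_value = lrt_dummy_split(ratings, dummy, m=7)
split_is_significant = p_value < 0.05
```

A split would then be retained as a candidate only if `p_value` falls below the chosen α level, after which an optimality criterion ranks the surviving candidates.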
Optimality criteria to derive a partitioning rule can be implemented in the same spirit as CUB model regression trees (CUBREMOT; Cappelli et al., 2019; Simone et al., 2019). Indeed, the binomial model is nested in CUB. Specifically, a splitting criterion can be based either:

1. on improvements of the log-likelihood deviance from father to children nodes. If l(ξ^{(k)}) is the maximized log-likelihood of the binomial model at node k, then the partitioning principle selects the binary split that entails the largest (absolute) deviance between father’s and descendants’ levels:
$$ {\Delta} l^{(k)} = l(\xi^{(k)}) - \big(l(\xi^{(2k)}) + l(\xi^{(2k+1)})\big); $$(7)
2. on maximizing the (normalized) dissimilarity between the estimated distributions \(\hat {\mathbf {p}}^{(2k)}\), \(\hat {\mathbf {p}}^{(2k+1)}\) of children nodes,^{Footnote 3}
$$ \text{Diss}(\hat{\mathbf{p}}^{(2k)},\hat{\mathbf{p}}^{(2k+1)}) = \frac{1}{2}\sum\limits_{j=1}^{m} \big| \hat{p}_{j}^{(2k)} - \hat{p}_{j}^{(2k+1)} \big| \qquad \in [0,1] $$(8)
to identify the most dissimilar response profiles.
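Both splitting criteria can be computed from the same three binomial fits (father, left child, right child). The Python sketch below is a hedged illustration: it assumes covariate-free shifted binomial fits in each node, skips the preliminary significance screening, and uses hypothetical names (`fit_binomial`, `split_criteria`, `best_split`) and hypothetical data.

```python
from math import comb, log

def fit_binomial(ratings, m):
    """Return (fitted probabilities, maximised log-likelihood) of the shifted binomial."""
    xi = (m - sum(ratings) / len(ratings)) / (m - 1)          # closed-form MLE
    probs = [comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1)
             for r in range(1, m + 1)]
    return probs, sum(log(probs[r - 1]) for r in ratings)

def split_criteria(ratings, dummy, m):
    """Return (absolute deviance, dissimilarity) for the split induced by a 0/1 variable.

    The dummy is assumed to define two non-empty groups.
    """
    left = [r for r, d in zip(ratings, dummy) if d == 0]
    right = [r for r, d in zip(ratings, dummy) if d == 1]
    _, l_parent = fit_binomial(ratings, m)
    p_left, l_left = fit_binomial(left, m)
    p_right, l_right = fit_binomial(right, m)
    deviance = abs(l_parent - (l_left + l_right))
    diss = 0.5 * sum(abs(a - b) for a, b in zip(p_left, p_right))
    return deviance, diss

def best_split(ratings, candidates, m, criterion="dissimilarity"):
    """Name of the candidate dummy that maximises the chosen criterion."""
    index = 1 if criterion == "dissimilarity" else 0
    return max(candidates,
               key=lambda name: split_criteria(ratings, candidates[name], m)[index])

# Hypothetical node with one informative and one uninformative candidate.
ratings = [1, 2, 2, 1, 3, 2] * 5 + [6, 7, 5, 6, 7, 6] * 5
candidates = {"informative": [0] * 30 + [1] * 30, "noise": [0, 1] * 30}
chosen_by_diss = best_split(ratings, candidates, m=7)
chosen_by_deviance = best_split(ratings, candidates, m=7, criterion="deviance")
```

The two criteria need not agree in general: the deviance rewards improvement in fit, while the dissimilarity rewards children whose fitted distributions differ most.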
Remark 2.1
Customarily, the normalized dissimilarity index Diss(p,f) ∈ [0,1] (Leti G., 1983) is used as a goodness-of-fit indicator to compare an estimated model p and a relative frequency distribution f, as \(\text {Diss}(\mathbf {p},\mathbf {f}) = \frac {1}{2}{\sum }_{r=1}^{m} |f_{r} - p_{r}|\), so that the lower the value, the better the fitting performance of p. Given this interpretation, the dissimilarity index will be hereafter exploited also to derive a global evaluation of the proposed flexible trees. If \(t_{1}, \dots , t_{p}\) are the terminal nodes of a tree \(\mathcal {T}\), with sizes \(n_{t_{1}},\dots ,n_{t_{p}}\), then consider the following average as an overall measure of fitting performance of \(\mathcal {T}\):
$$ \text{Diss}(\mathcal{T}) = \sum\limits_{i=1}^{p} \frac{n_{t_{i}}}{n} \text{Diss}(t_{i}), $$(9)
where \(n = n_{t_{1}} + \dots + n_{t_{p}}\) and Diss(t_{i}) denotes the dissimilarity between the frequency distribution of observations in node t_{i} and the local model implied by the procedure.
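A short Python sketch of this overall measure follows, assuming that the average is weighted by the relative sizes of the terminal nodes. The function names and the two toy nodes are hypothetical.

```python
def dissimilarity(p, f):
    """Normalized dissimilarity between a fitted model p and relative frequencies f."""
    return 0.5 * sum(abs(f_r - p_r) for p_r, f_r in zip(p, f))

def tree_dissimilarity(terminal_nodes):
    """Size-weighted average of the node-level dissimilarities.

    terminal_nodes: list of (node size, fitted probabilities, observed frequencies).
    """
    n = sum(size for size, _, _ in terminal_nodes)
    return sum(size / n * dissimilarity(p, f) for size, p, f in terminal_nodes)

# Toy tree: a perfectly fitted node of size 30 and a worst-case node of size 10.
nodes = [
    (30, [0.5, 0.5], [0.5, 0.5]),
    (10, [1.0, 0.0], [0.0, 1.0]),
]
overall = tree_dissimilarity(nodes)
```

Weighting by node size prevents small, poorly fitted leaves from dominating the summary.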
Pre-pruning of a model-based tree can be performed by specifying a priori stopping rules, applied until the a priori maximum depth of the tree (namely, the number of generations descending from the root node) has been reached. Usually, a node is declared terminal (and the partitioning procedure will stop for it) if any of these stopping rules is met:

the sample size of the node is lower than a prespecified threshold to attempt any splitting procedure;

the number of observations for any of the descendants of a candidate (significant) split is lower than a given threshold.
Given a terminal node, the corresponding response profile is determined by the values of covariates corresponding to the edges connecting it to the root.
3 Residual Diagnostics for Ordinal Response Models
For models of the form \(R \sim F_{a}(r; X, \theta )\), where F_{a}(⋅) is the cumulative distribution function of the assumed model, Liu and Zhang (2018) advocate a jittering approach on the probability scale to define residuals. Briefly, a surrogate variable S is defined by conditionally sampling from a continuous uniform distribution over a subinterval of (0,1):
$$ S \mid R = r \sim \mathcal{U}\big(F_{a}(r-1; X, \theta), F_{a}(r; X, \theta)\big). $$
On this basis, the residual variable V for the fit of the assumed model can be defined as:
$$ V = S - \mathbb{E}(S \mid \eta), $$
where η denotes the available information set. One of the main results established in Liu and Zhang (2018, Section 7, Theorem 4) is that \(V \mid X \sim \mathcal {U}(-\frac {1}{2},\frac {1}{2})\) if the assumed model is correctly specified. Thus, if this necessary condition for the assumed model does not hold, misspecification is detected. The proposal of the paper is to resort to diagnostics of residuals built via the surrogate variable method to determine whether the assumed model can be maintained or whether it is misspecified, either in some model components or in neglecting mixed populations. Beyond graphical inspection of residuals, general tests for comparisons of continuous distributions (such as the Kolmogorov-Smirnov and Cramér-von Mises tests), as well as specific tests of uniformity, can be considered to perform residual diagnostics. In the following, the Quesenberry-Miller test of uniformity (Quesenberry & Miller, 1977) will be considered to test whether an assumed model verifies the necessary condition for being correctly specified. This choice is due to its best power performance against alternative tests (Quesenberry & Miller, 1977).
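The construction of surrogate residuals can be sketched in Python for the binomial case. The sketch assumes that S is drawn uniformly between the fitted cumulative probabilities F_a(r − 1) and F_a(r) of the observed category, and that the residual is V = S − 1/2. The function names, the seed and the simulated sample are hypothetical. When data come from the assumed model, the residuals should have mean close to 0 and variance close to 1/12.

```python
import random
from math import comb

def binomial_cdf(m, xi):
    """Return [F(0), F(1), ..., F(m)] for the shifted binomial, with F(0) = 0."""
    cdf = [0.0]
    for r in range(1, m + 1):
        cdf.append(cdf[-1] + comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1))
    cdf[-1] = 1.0                                   # guard against rounding
    return cdf

def surrogate_residuals(ratings, cdf, rng):
    """Draw S | R = r ~ U(F(r-1), F(r)) and return the residuals V = S - 1/2."""
    return [rng.uniform(cdf[r - 1], cdf[r]) - 0.5 for r in ratings]

rng = random.Random(2023)                           # arbitrary seed
m, xi, n = 7, 0.7, 2000
cdf = binomial_cdf(m, xi)
weights = [cdf[r] - cdf[r - 1] for r in range(1, m + 1)]
ratings = rng.choices(range(1, m + 1), weights=weights, k=n)   # data from the assumed model
residuals = surrogate_residuals(ratings, cdf, rng)

mean_v = sum(residuals) / n
var_v = sum((v - mean_v) ** 2 for v in residuals) / n
```

Because the residuals depend on random draws, two runs on the same data give different values, which is why a formal uniformity test is best summarized over several replications.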
In the next subsections, we show how to adapt this method to binomial regression trees to perform local uncertainty diagnostics. Specifically, the ultimate goal of the paper is to exploit this necessary condition to determine whether an assumed model (hereafter, the binomial) can be maintained at each step of the partitioning process, or whether a split should preferably be pruned. In the first case, local model selection for the best binomial extension that fulfils the necessary condition for being correctly specified can be pursued to improve the interpretative extent of the tree.
3.1 Model Misspecification for Missing Component
The goal of this section is to illustrate how residual analysis can be used to check for possible misspecification of the binomial fit to a sample of ratings and, in a similar way, to identify candidate extensions with uncertainty that satisfy the necessary condition for being correctly specified or that should be disregarded instead.
Let \(R \sim \) CUB(π,ξ = 0.7), with m = 7, for varying π ∈ (0,1]: some QQ plots of residuals for the binomial fit on the generated samples, with n = 1000, are shown in Fig. 2 (the reference distribution is the continuous uniform distribution on \([-\frac {1}{2},\frac {1}{2}]\)). Table 1 reports the p-values for the Quesenberry-Miller uniformity tests: results indicate that, unless the weight of the binomial component is very high, significant evidence against the correct specification of the binomial on CUB data is found.
Similar tests can be performed to identify the threshold under which misspecification of the binomial due to overdispersion is significantly detected. Assume that data are generated according to a beta-binomial model as defined in (4). Figure 3 displays the QQ plots comparing the uniform distribution over \((-\frac {1}{2},\frac {1}{2})\) with the distribution of the surrogate residuals for the binomial and the beta-binomial estimated models. Accordingly, Table 2 reports the p-values for the Quesenberry-Miller test of uniformity for the surrogate residuals: it is possible to conclude that neglected overdispersion is detected even for moderately small values of ϕ.
3.2 Neglecting Subpopulations
In order to show how model misspecification can be detected if subpopulations exist and are neglected, consider the surrogate residuals’ distribution for the binomial model with no covariate in case data are generated according to \(R_{i} \sim \text {Bin}(\xi _{i}), \text {logit}(\xi _{i}) = \gamma _{0} + \gamma _{1} \text {D}_{i}\), for a given dummy variable D_{i}, over a scale with m = 7 ordered categories. Figure 4 shows QQ plots of the distributions of the residuals (unconditional and conditional to D_{i}) when the mixed population effect is disregarded or correctly accounted for (left and right panel, respectively).
Thus, for the binomial tree derived with the procedure defined in Section 2, a model-selection procedure can be applied at each terminal node. In particular, if the binomial model verifies the necessary condition for being correctly specified, then the search for fitting improvement is pursued among those extensions that verify this condition in turn. Otherwise, the node is declared terminal and the parent split should preferably be pruned if evidence for misspecification persists even after performing all candidate splits.
Remark 2.2
Since residuals’ construction is based on random generation, the diagnostic procedure hereafter implemented for real data analysis will consider the average p-value of the chosen uniformity test over a set of replications of residuals’ generation for the estimated model. This strategy will be particularly relevant to assess binomial diagnostics at tree nodes, and it implicitly takes the uncertainty of parameter estimation into account if a large number of replications is considered.
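The averaging strategy can be sketched as follows. As a loudly stated substitution, the sketch uses the Kolmogorov-Smirnov test with an asymptotic p-value approximation instead of the Quesenberry-Miller test adopted in the paper, because the former can be written compactly without tabulated critical values. The function names, seed, sample sizes and the two simulated data sets (binomial data and CUB data with π = 0.3, both with ξ = 0.7) are hypothetical.

```python
import random
from math import comb, exp, sqrt

def binomial_cdf(m, xi):
    """Return [F(0), ..., F(m)] for the shifted binomial, with F(0) = 0."""
    cdf = [0.0]
    for r in range(1, m + 1):
        cdf.append(cdf[-1] + comb(m - 1, r - 1) * xi ** (m - r) * (1 - xi) ** (r - 1))
    cdf[-1] = 1.0
    return cdf

def ks_uniform_pvalue(residuals):
    """Approximate asymptotic KS p-value for H0: residuals ~ U(-1/2, 1/2)."""
    u = sorted(v + 0.5 for v in residuals)
    n = len(u)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.2:                                   # series unreliable, p-value is ~1
        return 1.0
    p = 2 * sum((-1) ** (j - 1) * exp(-2 * (j * lam) ** 2) for j in range(1, 101))
    return min(max(p, 0.0), 1.0)

def average_pvalue(ratings, m, n_rep, rng):
    """Average p-value over n_rep generations of surrogate residuals (binomial fit)."""
    xi_hat = (m - sum(ratings) / len(ratings)) / (m - 1)       # closed-form MLE
    cdf = binomial_cdf(m, xi_hat)
    total = 0.0
    for _ in range(n_rep):
        residuals = [rng.uniform(cdf[r - 1], cdf[r]) - 0.5 for r in ratings]
        total += ks_uniform_pvalue(residuals)
    return total / n_rep

rng = random.Random(1)                              # arbitrary seed
m, n = 7, 1000
cats = range(1, m + 1)
binom = [comb(m - 1, r - 1) * 0.7 ** (m - r) * 0.3 ** (r - 1) for r in cats]
cub = [0.3 * b + 0.7 / m for b in binom]            # CUB(pi = 0.3, xi = 0.7)
p_binom_data = average_pvalue(rng.choices(cats, weights=binom, k=n), m, 50, rng)
p_cub_data = average_pvalue(rng.choices(cats, weights=cub, k=n), m, 50, rng)
```

With a strong uniform component, the binomial fit on CUB data should yield an average p-value near zero, whereas the binomial data should not signal misspecification.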
4 Illustrative Example on Customers’ Satisfaction Survey
The ABC Annual Customer Satisfaction Survey refers to a company offering IT solutions to media and telecommunication service providers (Kenett & Salini, 2012, Chapter 2). On a Likert-type scale with m = 5 ordered categories (1 =“very low,” 5 =“very high”), customers were asked to rate their overall satisfaction, along with satisfaction with several aspects of the customer experience, including the following: the extent to which they would recommend the ABC company to a third company (recom); the extent to which they would reconsider the company for further purchases (product); the overall satisfaction with the purchased equipment (equipment); with sales and technical support (sales, technical); with purchasing support (purchase); and with pricing. Due to the small sample size obtained after omitting missing values listwise (n = 212), only small trees can be grown: thus, this dataset will be used for illustration purposes only.
Table 3 reports the average p-value for the Quesenberry-Miller test of uniformity for surrogate residuals for selected candidate models (50 random generations were considered), showing that, at the 5% level, the binomial is significantly misspecified for ratings concerning recom, equipment, technical. For equipment, the only model that fulfils the necessary condition for correct specification is the binomial with the addition of a shelter effect (at category c = 4).
It is seen that the binomial model (without covariates) cannot be maintained for recom at the fixed significance level. This circumstance may be due to missing subpopulations: indeed, the primary split of a dissimilarity binomial tree separates recom ratings provided by customers who are not satisfied with the sales support (sales ≤ 2) from those who are satisfied (sales ≥ 3), for which the binomial model can be safely assumed instead. Figure 5 displays diagnostic checks of uniformity of residuals at the root node and at left and right descendants (top row panels), along with the barplots of the frequency distributions at the nodes, with superimposition of the fitted binomial model.
Node 2 is declared terminal by the procedure due to a pre-pruning condition relative to the sample size of the node: the split that is selected for node 3, instead, cannot be accepted for the binomial tree since the (conditional) binomial is significantly misspecified for its left descendant (node 6: see Fig. 6). Thus, the tree growing procedure stops and node 3 is declared terminal as well.
Thus, one should prune this further split under the binomial tree.
As a more comprehensive example in this respect, consider the binomial tree for the overall satisfaction (satis) to disentangle local association for different aspects of the customer experience: here the deviance criterion is considered.
Table 4 reports the average p-value obtained over 50 replications of residual generation for competing models: it is seen that the binomial and all the extensions satisfy the necessary condition for being correctly specified at the given significance level. Thus, local model selection can be performed to identify the best uncertainty specification according to standard criteria.
Accordingly, Table 5 summarizes the main information for each node: estimated feeling measure \(1-\hat {\xi }\), mixing weight \(\hat {\pi }\) of the feeling component,^{Footnote 4} possible shelter effect parameter \(\hat {\delta }\) and corresponding shelter category under the best mixture model with uncertainty, as well as dissimilarity of both binomial and best model with respect to the observed frequency distribution. For each inner node of the tree, the selected split variable and split point to determine the left and right descendants are also reported. The best model is selected by jointly considering results from likelihood ratio tests for nested models (in particular, with respect to the baseline binomial), and from BIC comparisons for non-nested models. In case several models are equivalent in terms of BIC index (Burnham & Anderson, 2003), the model with the lowest dissimilarity with the observed frequencies can be chosen.
From response profiles learnt at terminal nodes (3,5,8,9), it can be claimed that:

customers’ propensity to recommend the company is the strongest indicator of overall satisfaction (indeed, node 3 refers to overall satisfaction for those who rated recom= 5);

the most influential dimension of overall satisfaction is the satisfaction for the sales support: thus, overall satisfaction can be controlled by focusing primarily on the control of this aspect of the customer experience;

a small percentage of structurally dissatisfied customers is present (measured by \(\hat {\delta }\)), stronger for respondents who are moderately satisfied for the sales support (sales= 3);

the feeling measure \((1-\hat {\xi })\), weighted for the importance \(\hat {\pi }\) of the feeling component, can be considered as an overall satisfaction indicator, and response profiles can be ranked accordingly. In the present example, customer satisfaction should be improved starting from the response profile associated with node 8 (corresponding to customers such that recom ≤ 4 and sales ≤ 2).
5 Perceived Trust Towards Press and Television
Perceived quality of products and services is related to perceived trust of users in a complex and multifaceted scheme that involves customers’ satisfaction and loyalty (Bloemer et al., 1999; Chiou & Droge, 2006; Eisingerich & Bell, 2008; Garbarino & Johnson, 1999).
With reference to the ALLBUS German General Social Survey of 2012 (GESIS - Leibniz-Institut für Sozialwissenschaften, 2016), consider the perceived trust expressed by n = 2692 respondents towards Press and Television, as institutions, collected on a rating scale with m = 7 ordered categories (1 = “no trust at all,” 7 = “a great deal of trust”), after listwise omission of missing values of the considered set of variables.
Among the available covariates used to grow the tree (including gender, employment status, German citizenship, income, and marital status), the procedure has selected:

Age_{c}: age of the respondent in ordered classes of years (1 = 18–29, 2 = 30–44, 3 = 45–59; 4 = 60–74; 5 = 75–89; 6 = more than 90);

notwork: a dummy indicating if the respondent is unemployed (notwork= 1) or employed (notwork= 0);

Internet: a dummy indicating if the respondent uses internet for private purposes (Internet= 1) or not (Internet= 0);

leftright: left-right self-placement on political orientation (semantic scale with ten categories running from extreme left (leftright= 1) to extreme right (leftright= 10));

univ: a dummy variable to indicate whether the respondent has a university education (univ= 1) or not (univ= 0);

west: a dummy variable to indicate if respondent’s residence is in the Old Federal Republic (West Berlin: west= 1) or in former German Democratic Republic (East Berlin: west= 0).
For both ratings on perceived Trust towards Press and Television, the response profiles derived from the dissimilarity binomial tree are discussed. Indeed, for latent traits like perceived trust, which are deemed to exhibit similar sentiment among the population, the dissimilarity criterion can be helpful in determining significant differences in model parameters that highlight more dissimilar response patterns. Given the orientation of the response scale, the feeling measure \(1-\hat {\xi }\) under the binomial model is a direct indicator of perceived trust.
Remark 2.3
Binomial trees are grown up to the fourth generation of descendants from the root node: as further pre-pruning rules, a minimum sample size of 250 observations per node is required to attempt a split, and a split is admissible only if each descendant contains at least 100 observations.
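These pre-pruning rules translate into two simple checks, sketched below in Python. The thresholds (250, 100, four generations) are those stated in the remark. The function names are hypothetical, and the depth of a node is recovered from the 2k / 2k + 1 numbering of descendants, with the root at k = 1.

```python
def node_depth(k):
    """Generation of node k when the children of node k are 2k and 2k + 1 (root: k = 1)."""
    return k.bit_length() - 1

def can_attempt_split(k, n_k, min_node_size=250, max_depth=4):
    """A node may be split only if it is large enough and above the maximum depth."""
    return n_k >= min_node_size and node_depth(k) < max_depth

def is_admissible_split(n_left, n_right, min_child_size=100):
    """A candidate split is admissible only if both descendants are large enough."""
    return min(n_left, n_right) >= min_child_size

# Node 12 sits in the third generation, so it may still be split into nodes 24 and 25,
# which belong to the fourth generation and are therefore terminal by construction.
split_node_12 = can_attempt_split(k=12, n_k=600)      # hypothetical node size
split_node_24 = can_attempt_split(k=24, n_k=600)
```

Under this numbering, nodes 16 to 31 form the fourth generation, which is consistent with nodes 24 and 25 appearing only as terminal nodes.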
5.1 Perceived Trust Towards Press
The binomial tree highlights the following response profiles at the terminal nodes:

Node 4: Employed residents in former GDR (west= 0);

Node 5: Unemployed residents in former GDR (west= 0);

Node 6: Residents in former Federal Republic (west= 1) that do not use internet for private purposes;

Node 14: Young residents (aged less than 29) in former Federal Republic (west= 1) that use internet for private purposes;

Node 15: Adult and elderly respondents, resident in former Federal Republic (west= 1), that use internet for private purposes.
Before commenting on results from the binomial tree, diagnostics of the assumed model need to be performed. For each node of the binomial tree, Table 6 reports the average value of the distribution of p-values for the Quesenberry-Miller test of uniformity of surrogate residuals to verify the necessary condition for correct specification of the binomial and its extensions.
For the sake of completeness, Fig. 7 displays the boxplots of the p-value distribution for terminal nodes.
Assuming the average p-value as a synthesis of the residuals’ generation procedure, it follows that the binomial can be maintained at each level: thus, local model selection for binomial extensions can be pursued to account for possible overdispersion and to identify the uncertainty source best characterizing each response profile. This selection process is based on a combined analysis of information on fitting performance related to LRT and BIC index, reported in Tables 7 and 8.
After determining the best model for each node, Table 9 reports the relevant results, indicating featuring parameters, split at the node, sample size, and dissimilarity of both the baseline binomial and the best fitting mixture extension with respect to the observed frequencies.
It turns out that the fitting performance of the feeling model improves if mixed with a shelter effect at the first category c = 1 (thus, indicating that distrust is a structural phenomenon, yet with different weights). Focusing on response profiles at terminal nodes (4, 5, 6, 14, 15), perceived trust towards Press is the lowest for unemployed people living in former East Germany: this response profile corresponds also to the strongest structural distrust (as measured by \(\hat {\delta }\)), whereas the strongest trust characterizes responses from people living in former West Germany that do not have access to Internet. Feeling differences due to age are found in former West Germany for those who have access to Internet for private use: in particular, young respondents experience a higher trust towards Press than older adults. For the sake of completeness, Fig. 8 displays observed and fitted response distributions (comparing binomial and selected best model), along with quantile-quantile plots of residuals, for the terminal nodes.
5.2 Perceived Trust for Television
The binomial tree highlights the following response profiles at the terminal nodes:

Node 4: Residents in former East Germany that do not use Internet for private purposes;

Node 5: Residents in former West Germany that do not use Internet for private purposes;

Node 7: Respondents that use Internet for private purposes, aged more than 60 years;

Node 13: Respondents that use Internet for private purposes, aged less than 60 years, with a University education;

Node 24: Respondents that use Internet for private purposes, aged less than 60 years, with no University education and with left-wing political orientation (leftright≤ 4);

Node 25: Respondents that use Internet for private purposes, aged less than 60 years, with no University education and with neutral or right-wing political orientation (leftright≥ 5).
Table 10 reports the average of the distribution of the p-values for the Quesenberry-Miller test of uniformity (500 random generations of residuals); Fig. 9 supplements the discussion with boxplots of these distributions. It follows that, at the root node, evidence for the correct specification of the binomial is quite weak: however, in light of the improvement occurring at the descending nodes 2 and 3, this circumstance can be partly due to the existence of two subpopulations. Similar remarks hold for subsequent splits of node 3 into node 6 and node 7, of node 6 into nodes 12 and 13, and finally for node 12 into nodes 24 and 25. As a result, there is sufficient evidence that the binomial can be maintained as assumed model for the baseline response generating process, possibly after accounting for mixed populations.
Table 11 reports the main results from the local model selection for the best adjustment of the binomial: it can be concluded that the lowest trust towards Television corresponds to young adults with extremely left-wing political orientation, no university degree, and Internet use for private purposes; the highest trust, instead, characterizes respondents living in former West Germany who do not use the Internet. Overall, older people trust Television more than young adults; in addition, structural distrust, as measured by \(\hat {\delta }\), is stronger for young adults than for seniors. When comparing former West and East Germany residents, it follows that the latter perceive lower trust towards Television overall, and are also subject to quite strong structural distrust.
Results indicate that different uncertainty contaminations are needed at different partitioning levels: in particular, the specification of a shelter effect (at category c = 1 for all nodes except node 24) improves the fit of either the binomial or CUB models, and a possible moderate overdispersion effect should be accounted for in certain response profiles. Thus, a fixed model-based tree would have failed to account for the diversified features of response profiles for perception of Trust for TV. In order to show the local model adjustment that is needed at the response profiles, Fig. 10 displays, for each terminal node, the Q-Q plot of residuals for the binomial and its best extension (see Table 11), as well as fitted distributions superimposed on the barplot of the observed distribution.
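To make the idea of a local adjustment concrete, the following sketch fits a binomial with a shelter effect at a given category c by maximum likelihood and compares it with the plain binomial. It is not the author's code: the parameterization R − 1 ~ Bin(m − 1, p), the use of BIC as the comparison criterion, and all function names are assumptions made for illustration.

```python
import numpy as np
from scipy import optimize, stats


def shifted_binom_pmf(m, p):
    """Probabilities of R = 1, ..., m when R - 1 ~ Bin(m - 1, p)."""
    return stats.binom.pmf(np.arange(m), m - 1, p)


def shelter_pmf(m, p, delta, c):
    """Mixture (1 - delta) * binomial + delta * point mass at category c."""
    pmf = (1.0 - delta) * shifted_binom_pmf(m, p)
    pmf[c - 1] += delta
    return pmf


def binomial_loglik(counts):
    """Maximized log-likelihood of the plain shifted binomial."""
    counts = np.asarray(counts, dtype=float)
    m = counts.size
    p_hat = np.sum(counts * np.arange(m)) / (counts.sum() * (m - 1))
    return np.sum(counts * np.log(shifted_binom_pmf(m, p_hat)))


def fit_binomial_with_shelter(counts, c):
    """Return ((p_hat, delta_hat), maximized log-likelihood)."""
    counts = np.asarray(counts, dtype=float)
    m = counts.size

    def negloglik(theta):
        p, delta = theta
        return -np.sum(counts * np.log(shelter_pmf(m, p, delta, c) + 1e-300))

    res = optimize.minimize(
        negloglik, x0=[0.5, 0.1], method="L-BFGS-B",
        bounds=[(1e-6, 1 - 1e-6), (0.0, 1 - 1e-6)],
    )
    return res.x, -res.fun


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + n_params * np.log(n_obs)
```

Running both fits on the frequency table of a terminal node and keeping the model with the lower criterion value mimics, in a simplified way, the local choice between the binomial and its shelter extension.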
5.2.1 Tree Performance
The fitting performance of the binomial tree can be assessed by computing, at each terminal node, the dissimilarity index between the observed frequency distribution and the model implied by the tree, and then averaging these values with weights given by node sample sizes, as defined in (9). Similarly, the dissimilarity between the frequency distribution and the best binomial extension can be computed to evaluate the advantage of resorting to the proposed procedure of local uncertainty diagnostics: results are reported in Table 12 and indicate a noticeable improvement in fitting performance from the binomial tree to the adjusted version.
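A minimal sketch of this weighted measure follows, assuming the usual normalized dissimilarity index Diss = ½ Σ_r |f_r − p_r| between relative frequencies f_r and fitted probabilities p_r; whether this matches definition (9) exactly is an assumption, and the function names are hypothetical.

```python
import numpy as np


def dissimilarity(freq, probs):
    """Normalized dissimilarity index between two distributions on 1..m."""
    freq = np.asarray(freq, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return 0.5 * np.sum(np.abs(freq - probs))


def tree_dissimilarity(node_freqs, node_probs, node_sizes):
    """Average of node-level dissimilarities, weighted by node sample sizes."""
    diss = [dissimilarity(f, p) for f, p in zip(node_freqs, node_probs)]
    return np.average(diss, weights=node_sizes)
```

The index lies in [0, 1] and can be read as the share of responses that would have to change category to obtain a perfect fit, so a smaller weighted average indicates a better-fitting tree.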
With respect to predictive ability, several measures are available for evaluating the performance of classification procedures, but few proposals are specifically designed for ordinal responses. Among these are the Ranked Probability Score (Gneiting & Raftery, 2007; Murphy, 1971; Simone & Piccolo, 2022), which corresponds to assigning the median of the predictive model to a new observation, and proposals that resort to the modal value of the predictive model as a prediction for a new observation, like those introduced by Ballante et al. (2022) and Cardoso & Sousa (2011). However, the modal value may provide an inadequate representation for an ordinal model, even if it is unique. In general, the rationale pursued by the proposed procedure is based on the association of a whole predictive model with a given covariate profile, rather than a single response value, so that the characterizing model parameters can be used to describe future observations from that profile. For instance, consider a sample of 500 observations from the original sample of both case studies on Perceived Trust to be used as a test set. Then, each test observation is assigned, on the basis of its covariate profile, to the node of the binomial tree into which it is classified; finally, with respect to the frequency distribution of the test observations classified in that node, prediction performance can be assessed via the dissimilarity value and the Ranked Probability Score (RPS; see Notes) with the estimated conditional model under the binomial tree and the estimated best mixture with uncertainty. Results reported in Table 13 indicate satisfactory performance.
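For readers unfamiliar with the score, the sketch below computes the RPS of a predictive distribution on categories 1, ..., m for a single observed rating, and its average over the test observations falling in a node. The unnormalized form (sum of squared differences between cumulative distributions) is an assumption; some authors divide by m − 1. Function names are hypothetical.

```python
import numpy as np


def ranked_probability_score(pmf, observed):
    """RPS of a predictive pmf on 1..m for one observed category."""
    pmf = np.asarray(pmf, dtype=float)
    m = pmf.size
    cdf = np.cumsum(pmf)
    # Cumulative distribution of a point mass at the observed category.
    obs_cdf = (np.arange(1, m + 1) >= observed).astype(float)
    return np.sum((cdf - obs_cdf) ** 2)


def mean_rps(pmf, ratings):
    """Average RPS over the test ratings classified in one node."""
    return np.mean([ranked_probability_score(pmf, r) for r in ratings])
```

Because the score compares cumulative distributions, predictions that miss the observed category by one position are penalized less than those that miss it by several, which is what makes it suited to ordinal responses.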
6 Satisfaction of Italian Ph.D. Awardees
The approach proposed in the paper takes the lead from the introduction of the binomial tree, since it is the simplest model on the discrete support that can be assumed for evaluation data to parameterize the latent feeling. However, the diagnostic procedure and the subsequent local specification adjustments can be implemented for every model-based tree with respect to extensions of the assumed model. For illustration purposes, an example on CUBREMOT is presented (Cappelli et al., 2019; Simone et al., 2019). Consider the overall satisfaction with the doctoral experience rated by Italian Ph.D. graduates who were awarded the title in 2012 and 2014, collected within the survey run by the Italian National Statistical Office (ISTAT) to investigate their satisfaction with the professional placement and the Ph.D. programme (available at https://www.istat.it/it/archivio/87536).
Satisfaction with the Ph.D. experience was rated with reference to several dimensions (quality of teaching courses, spaces and tools at disposal, and so on): ratings for overall satisfaction will be considered hereafter as the response variable. All the ratings were collected on a discrete scale with categories coded from 0 to 10: the rating scale has been subsequently modified to a scale with 8 categories due to zero-scores observed in certain categories, so that higher scores along the response scale correspond to higher levels of satisfaction.
After omitting missing values for the variables of interest, n = 3830 observations are used for the analysis. Among the available covariates used to grow the tree (including current employment status, residence, discipline of the Ph.D. program, marital status, participation in research project, and year of Ph.D. completion), the procedure has selected:

gender: a dummy indicating if the respondent is male (gender = 0) or female (gender = 1);

abroad: a dummy indicating if the respondent had any work or training experience abroad after the Ph.D. completion (abroad = 1) or not (abroad = 0);

research: a dummy indicating if the respondent currently works in the research domain (research = 1) or in other fields (research = 0);

stem: a dummy indicating whether the Ph.D. program was in STEM disciplines (stem = 1) or in different ones (stem = 0);

north: a dummy indicating if the respondent was awarded the Ph.D. title by a University located in Northern Italy (north = 1) or in a different geographical area (north = 0).
Due to a structural heterogeneity of the distribution, the binomial does not satisfy the necessary condition for being correctly specified at any of the partitioning levels of the corresponding model-based tree, not even after searching for possible neglected subpopulations. Instead, for a general CUB model, the necessary condition for being correctly specified is verified at each stage of the growth of a CUBREMOT with the deviance splitting criterion (results are not reported for brevity). Thus, CUB can be assumed as the maintained model for the response generating process. Then, at each step, the procedure selects the most significant binary split in either the feeling or the uncertainty parameter. Table 14 reports the main information concerning the nodes composing the tree with respect to the local search for the best model extension, among those verifying the necessary condition for being correctly specified (in particular, candidate models are CUBE and CUB with shelter); see Fig. 11 for a visualization of results.
First, it is worth highlighting that a CUBREMOT tree with a pre-specified shelter fixed at all partitioning levels would have entailed suboptimal descriptions of the response profiles. Once diagnostic checks have confirmed that the baseline CUB can be maintained as the assumed model, the flexible model selection that is performed locally provides, instead, more accurate descriptions of the response patterns (for instance, shelter effects are found at different levels of the scale for different response profiles). In particular, it follows that:

Ph.D. graduates from disciplines other than STEM experience lower feeling than those from STEM disciplines, especially if the current job does not involve research. For Ph.D. graduates from disciplines other than STEM, the feeling of evaluation decreases if the respondent had any work or training experience abroad after the Ph.D. completion;

Among the Ph.D. graduates working in research, women are less satisfied than men, especially if the Ph.D. program concerned disciplines other than STEM and if no experience abroad has occurred after the Ph.D. completion. In the latter case, the lower feeling towards satisfaction experienced by women is also revealed by the location of the shelter effect at a lower category than the one found for men (even if in both cases concentration is found at the center of the scale). Thus, even for neutral evaluations, women tend to express lower scores.
7 Conclusions and Further Developments
Statistical modelling of preference and evaluation data that accounts for uncertainty can be embedded in the 7th dimension of Information Quality (Kenett & Shmueli, 2014; Kenett, 2016), a paradigm for analytic research that guides scholars and stakeholders towards a qualified experience of data analysis to successfully pursue research goals and decision-making processes. Being framed within this modern debate, the paper proposes a method to perform local diagnostics of model-based trees for evaluation data, assuming the setting of mixture models with uncertainty. The discussion stems from the introduction of binomial regression trees, following the rationale of the CUBREMOT technique (Cappelli et al., 2019; Simone et al., 2019). With respect to the consolidated approach to model-based trees, which assumes a constant model specification at all partitioning levels, the local model selection for the best extension of the binomial model at tree nodes allows for a more accurate assessment of feeling. The adoption of this flexible approach is subject to the fulfilment of a necessary condition for correct specification that can be assessed through diagnostics of surrogate residuals for ordinal data models. Then, a pruning criterion could be derived for those splits where evidence for model misspecification is found. In a similar way, diagnostic checks for model-based trees could be exploited to tune the depth of the tree, or to select the best tree out of a set of alternative ones. The thread of the discussion has been the binomial model for rating data, but the proposal may be adapted to other preference models on the discrete support (as shown in Section 6 for the case of CUBREMOT).
In the end, the flexible binomial tree with uncertainty could possibly be exploited to obtain: a more precise imputation of missing values or prediction rules on the basis of the derived response profiles and their characterizing model parameters; an adjusted synthetic indicator of the latent feeling of the trait being examined. Further research will address the possibility of incorporating the proposed diagnostic methods within the partitioning process to drive the selection of the split and provide more general flexible uncertainty trees: see Banchelli (2019) for a recent proposal of flexible trees on the basis of non-nested models for count data. With respect to residual diagnostics, further methodological research will address: a comparison with alternative tests and techniques to detect model misspecification; adjustments of the splitting rule on the basis of stability tests of regression parameters, as for the model-based trees introduced in Zeileis et al. (2008); a sensitivity analysis to determine the most suitable synthetic indicator of the distribution of the p-values for the uniformity tests for residuals. Finally, the proposal could be exploited for prediction purposes with the implementation of model-based random forests for rating variables. In this setting, specific attention should be devoted to the derivation of variable importance measures due to possible indirect effects of splitting variables (Gottard et al., 2020), which could also affect the interpretation of response profiles read from a single tree.
Code Availability
The code to reproduce the analysis is available upon request from the author. A package for the R environment devoted to the implementation of model-based trees for ordinal responses based on discrete models is under development.
Availability of Data and Material
Data considered in Section 4 are available for download from the website of the book: https://www.wiley.com/enus/Modern+Analysis+of+Customer+Surveys%3A+with+Applications+using+Rp9780470971284; data considered in Section 5 are taken from the official repository https://search.gesis.org/research_data/ZA5276; data considered in Section 6 are available online from the ISTAT repository at https://www.istat.it/it/archivio/87536.
Notes
The binomial model implies a constant probability ξ of preference of a category over the subsequent one, for each pair of adjacent categories.
For a binomial tree, \( p^{(j)} \sim \text {Bin}(\xi ^{(j)})\).
Notice that \(\hat {\pi } = 1 - \hat {\delta }\) for the binomial with shelter.
The lower the RPS value, the better the prediction performance.
References
Allik, J. (2014). A mixed-binomial model for Likert-type personality measures. Frontiers in Psychology, 5.
Ballante, E., Figini, S., & Uberti, P. (2022). A new approach in model selection for ordinal target variables. Computational Statistics, 37(1), 43–56.
Banchelli, F. (2019). Flexible model-based trees for count data. In G. C. Porzio, F. Greselin, & S. Balzano (Eds.) Cladag 2019: Book of short papers. ISBN: 9788883171086: Edizioni Università di Cassino, pp. 63–66.
Bloemer, J., de Ruyter, K., & Wetzels, M. (1999). Linking perceived service quality and service loyalty: A multi-dimensional perspective. European Journal of Marketing, 33(11/12), 1082–1106.
Burnham, K. P., & Anderson, D. R. (2003). Model selection and multimodel inference: A practical information-theoretic approach, 2nd ed. New York: Springer.
Cappelli, C., Simone, R., & Di Iorio, F. (2019). CUBREMOT: A model-based tree for ordinal responses. Expert Systems with Applications, 124, 39–49.
Cardoso, J. S., & Sousa, R. (2011). Measuring the performance of ordinal classification. International Journal of Pattern Recognition and Artificial Intelligence, 25(08), 1173–1195.
Chiou, J. S., & Droge, C. (2006). Service quality, trust, specific asset investment, and expertise: Direct and indirect effects in a satisfaction-loyalty framework. Journal of the Academy of Marketing Science, 34(4), 613–627.
Eisingerich, A. B., & Bell, S. J. (2008). Perceived service quality and customer trust: Does enhancing customers’ service knowledge matter? Journal of Service Research, 10(3), 256–268.
Garbarino, E., & Johnson, M. S. (1999). The different roles of satisfaction, trust, and commitment in customer relationships. Journal of Marketing, 63(2), 70–87.
GESIS - Leibniz-Institut für Sozialwissenschaften. (2016). German General Social Survey (ALLBUS) - Cumulation 1980-2014. GESIS Data Archive, Cologne. ZA4584 Data file Version 1.0.0, https://doi.org/10.4232/1.12574.
Gneiting, T., & Raftery, A. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378.
Gottard, A., Iannario, M., & Piccolo, D. (2016). Varying uncertainty in CUB models. Advances in Data Analysis and Classification, 10(2), 225–244.
Gottard, A., Vannucci, G., & Marchetti, G. M. (2020). A note on the interpretation of tree-based regression models. Biometrical Journal, 62(6), 1564–1573.
Gormley, I. C., & Frühwirth-Schnatter, S. (2019). Mixture of experts models, Chapter 12. In S. Frühwirth-Schnatter, C. Gilles, & C. P. Robert (Eds.) Handbook of mixture analysis, 1st ed., Chapman & Hall/CRC, Handbooks of Modern Statistical Methods, https://doi.org/10.1201/9780429055911.
Grilli, L., Rampichini, C., & Varriale, R. (2015). Binomial mixture modeling of university credits. Communications in Statistics - Theory and Methods, 44(22), 4866–4879.
Iannario, M. (2012). Modelling shelter choices in a class of mixture models for ordinal responses. Statistical Methods and Applications, 21, 1–22.
Iannario, M. (2013). Modelling uncertainty and overdispersion in ordinal data. Communications in Statistics - Theory and Methods, 43, 771–786.
Hilbig, B. E. (2012). How framing statistical statements affects subjective veracity: Validation and application of a multinomial model for judgments of truth. Cognition, 125(1), 37–48.
Kenett, R. S., & Shmueli, G. (2014). On information quality. Journal of Royal Statistical Society, Series A, 177(1), 3–38.
Kenett, R. S. (2016). Information quality: The potential of data and analytics to generate knowledge. John Wiley & Sons.
Kenett, R. S., & Salini, S. (2012). Modern analysis of customer surveys with applications in R. New York: Wiley.
Leti G. (1983). Statistica descrittiva. Bologna: Il Mulino.
Liu, D., & Zhang, H. (2018). Residuals and diagnostics for ordinal regression models: A surrogate approach. Journal of the American Statistical Association, 113(522), 845–854.
Murphy, A. (1971). A note on the ranked probability score. Journal of Applied Meteorology, 10, 155–15.
Piccolo, D., & Simone, R. (2019). The class of CUB models: Statistical foundations, inferential issues and empirical evidence. Statistical Methods and Applications, 28, 389–435.
Pinto da Costa JF., Alonso H., & Cardoso JS. (2008). The unimodal model for the classification of ordinal data. Neural Networks, vol. 21, pp. 78–91. Corrigendum. In (2014). Neural Networks, vol. 59, pp. 73–75.
Quesenberry, C. P., & Miller, F. L. Jr. (1977). Power studies of some tests for uniformity. Journal of Statistical Computation and Simulation, 5(3), 169–191.
Raidvee, A., Pölder, A., & Allik, J. (2012). A new approach for assessment of mental architecture: Repeated tagging. Plos One, vol. 7, (1).
Simone, R. (2022). On finite mixtures of discretized beta model for ordered responses. TEST, 31, 828–855.
Simone, R., Cappelli, C., & Di Iorio, F. (2019). Modelling marginal ranking distributions: The uncertainty tree. Pattern Recognition Letters, 125, 278–288.
Simone, R., & Piccolo, D. (2022). On the predictability of a class of ordinal data models. In A. Balzanella, M. Bini, C. Cavicchia, & R. Verde (Eds.) Book of short papers SIS 2022, 51st scientific meeting of the italian statistical society. ISBN, 9788891932310, pp. 1053–1058: Pearson publisher.
Simone, R., & Tutz, G. (2018). Modelling uncertainty and response styles in ordinal data. Statistica Neerlandica, 72(3), 224–245.
Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response. Cambridge: Cambridge University Press.
Ursino, M., & Gasparini, M. (2018). A new parsimonious model for ordinal longitudinal data with application to subjective evaluation of a gastrointestinal disease. Statistical Methods in Medical Research, 27(5), 1376–1393.
Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17, 92–514.
Zhou, H., & Lange, K. (2009). Rating movies and rating the raters who rate them. American Statistician, 63(4), 297–307.
Funding
Open access funding provided by Università degli Studi di Napoli Federico II within the CRUI-CARE Agreement.
Ethics declarations
The paper has been prepared in compliance with ethical standards.
Conflict of Interest
The author declares no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Simone, R. Uncertainty Diagnostics of Binomial Regression Trees for Ordered Rating Data. J Classif 40, 79–105 (2023). https://doi.org/10.1007/s00357-022-09429-5
Keywords
 Ordered data
 Model-based trees
 Binomial regression
 Surrogate Residuals
 Mixture models with uncertainty