Generalized residuals and outlier detection for ordinal data with challenging data structures

Motivated by the analysis of rating data concerning perceived health status, a crucial variable in biomedical, economic and life insurance models, the paper deals with diagnostic procedures for identifying anomalous and/or influential observations in ordinal response models with challenging data structures. Deviations due to some respondents’ atypical behavior, outlying covariates and gross errors may affect the reliability of likelihood based inference, especially when non robust link functions are adopted. The present paper investigates and exploits the properties of the generalized residuals. They appear in the estimating equations of the regression coefficients and hold the remarkable characteristic of interacting with the covariates in the same fashion as the linear regression residuals. Identification of statistical units incoherent with the model can be achieved by the analysis of the residuals produced by maximum likelihood or robust M-estimation, while the inspection of the weights generated by M-estimation allows to identify influential data. Simple guidelines are proposed to this end, which disclose information on the data structure. The purpose is twofold: recognizing statistical units that deserve specific attention for their peculiar features, and being aware of the sensitivity of the fitted model to small changes in the sample. In the analysis of the self-perceived health status, extreme design points associated with incoherent responses produce highly influential observations. The diagnostic procedures identify the outliers and assess their influence.


Introduction
Decision making processes are often carried out and supported by means of the results of questionnaires, where the respondent expresses his/her evaluation/perception through a rating on a scale made up by ordered categories.From a statistical viewpoint, the ratings are ordinal responses, which may be related to the subjects' covariates through an appropriate model.The most frequently implemented models to this purpose, within the generalized linear models (GLM) framework, are the cumulative models introduced by McCullagh (1980).Here the ordinal response is assumed to be a categorization of a continuous latent variable, which depends on the covariates through a latent regression.
Anomalous data may arise in this context-eliciting a challenging data structure-because of outlying covariates or incoherent behaviors by respondents, who deliberately or unconsciously choose a wrong category, as a consequence of a satisficing aptitude, willingness to joke or fake, or a search for a shelter category (Iannario and Piccolo 2016).Furthermore, gross errors can occur.Anomalous data produce a contamination of the assumed model distribution, which can seriously affect the reliability of the Maximum Likelihood Estimators (MLE) leading to a poorly fitted model.Recent studies point out the need of procedures for outlier detection (Riani et al. 2011) or, alternatively, of a robust inferential approach (see Croux et al. (2013), Iannario et al. (2017) and Scalera et al. (2021) in the context of GLMs, and Iannario et al. (2016) for cub models (Piccolo 2003)).
The current paper deals with diagnostic procedures for ordinal response models finalized to detect atypical and influential observations.Identifying anomalous data can prevent from unreliable inferential results especially when these data are due to gross errors and are influential.On the other side, atypical data can reveal statistical units whose behavior is incoherent with the majority of the data, and therefore of interest for their challenging characteristics.They tell a different story and may be the target of ad hoc decision making and policies.
Special emphasis is placed on the generalized residuals (Gourieroux et al. 1987;Franses and Paap 2004), whose properties are investigated.They are connected with the covariates similarly to the least squares residuals in linear regression.Their magnitude allows to detect incoherent responses and to assess the influence of extreme design points.
Diagnostics is therefore pursued along two lines: outliers detection and influential data identification.The former are statistical units incoherent with the model that fits the majority of data, whereas the latter are observations with a high impact on both estimation and testing.The outliers can be identified by the analysis of the generalized residuals produced by ML-or-more effectively-by robust M-estimation, whereas the assessment of the influence of the statistical units relies on the inspection of the weights generated by M-estimation (Iannario et al. 2017).
The above diagnostic procedures are applied to the perceived health status, a valuable predictor of people's future health care use, mortality and social capital (see Palladino et al. (2016), Raina et al. (2002) and Arezzo and Giudici (2017)).The analysis is also related to biomedical aspects associated with care perspective 1 3 Generalized residuals and outlier detection for ordinal data… and lifestyle improvements, which lead to a lower mortality risk (Rappange et al. 2016).Hence the perceived health status is a key source of information, which affects many economic and health choices.Data are from the Survey of Health, Ageing and Retirement in Europe (SHARE) in 2016-17, which includes nationally representative samples of n = 68 .618 persons aged 50 and older from 16 European nations.The dependent variable is the self-reported health with responses on a 5-point scale.Useful covariates are life satisfaction, education and muscle strength (measured by an objective test).Diagnostics allow to identify anomalous responses and bad leverage points, which can alter the outcome of the tests on the relevance of the covariates.
The paper is organized as follows: the next Section provides a quick overview of the cumulative models for ordinal responses.Section 3 illustrates the properties of the generalized residuals pointing out their role in the analysis of anomalous and influential data.Section 4 briefly reviews alternative residuals.Section 5 introduces simple guidelines for outlier detection based on the residuals.Diagnostic procedures based on the residuals and the weights produced by M-estimation are illustrated in Sect.6.The proposed diagnostics is implemented on the self-perceived health status data in Sect.7. Some concluding remarks end the paper.Proofs related to the properties of the generalized residuals are reported in the Appendix, whereas simulation studies assessing the performances of the proposed procedures and alternative diagnostic methods are in the supplementary material, along with the R code for the reproducibility of the analysis.

Cumulative models for ordinal data
In cumulative models the response variable Y takes integer values between 1 and m.Its value derives from the categorization of an underlying (continuous) latent variable Y * .We have where −∞ =  0 <  1 < … <  m = +∞ are the thresholds (or cut-points) of the scale of Y * .Given the vector X i = X i1 , … X ip of covariates for the i-th statistical unit, the relationship with X i is established through the latent regression model where = ( 1 , … , p ) � is the vector of regression coefficients, i ∼ G(⋅) and G(⋅) is a distribution function.Equations ( 1) and (2) yield the cumulative probability Pr(Y ≤ j) = G( j − X ) , hence G −1 defines the link function.If G(⋅) is the stand- ardized normal distribution we have the probit link, while the logistic distribution yields the logit link, and the extreme value distribution leads to the (complementary) log-log link (Agresti 2010;Tutz 2012).Alternative links are proposed by Aranda-Ordaz (1983) and Genter and Farewell (1985).Here we focus on the probit link. (1) ) is an open subset of ℝ p+m−1 .Consider an observed random sample (y i , x i ), for i = 1, 2, … , n , where y i and x i are the values of Y i and X i .The log- likelihood function (McCullagh 1980) is , where I[ ] is the indicator function that takes value 1 if holds and 0 otherwise, and The score function is S n ( ) = ∑ n i=1 S( ;y i , x i ) where S( ;y i , , and S l ( ;y i , Iannario et al. (2017) for the analytical expressions).The MLE is the solution Here the interest focuses on the score functions for the regression coefficients k The quantities e ij ( ) in (3) are the generalized residuals where g(⋅) is the density function corresponding to the distribution function G(⋅) .The sign in ( 4) is reversed with respect to the original definition (Franses and Paap 2004, p. 123) to make e ij ( ) an increasing function of the categories for given x i (see (iii) of Sect.3).Notice that the inner sum in (3) contains only one term, the j-th one such that y i = j , since ∑ m j=1 I(y i = j) = 1.

Generalized residuals
The estimating Eq. ( 3) are analogous to the normal equations in the linear regression model, i.e.
∑ n i=1 r i x i = 0, where the r i 's are the least squares residuals and 0 is a vector of zeros.Hence the generalized residuals (hereafter abbreviated as 'residuals' when no conflict arises) in the cumulative models context correspond to the r i 's in linear models, and actually they share some of the properties of the least squares residuals.In particular (i) There is only one value per subject, i.e. e ij ( ) if y i = j; (ii) Their expectation is zero; (iii) Under the assumption that log {g(⋅)} is concave, they are monotonically increasing with respect to the observed category for a fixed value of the covariates, i.e. e ij ( ) < e ij � ( ) for j < j ′ .
In addition the residuals have the following proprieties of specific interest for ordinal response models: (3) Generalized residuals and outlier detection for ordinal data… (a) They do not depend on arbitrary scores/values attributed to the categories, (b) They satisfy the branching property (if two categories are merged the new residual is a weighted average of the initial residuals, with weights proportional to the probabilities of the merged categories).
Proofs of (ii), (iii) and (b), and the variance of the residuals are in the Appendix.
The pivotal characteristic of the e ij ( ) 's is that they depend on the covariates just as the least squares residuals do.Inspection of (3) shows that the impact of the i-th observation on the estimators depends on the product of e ij ( ) and x i .If an outlying x i is associated with a large absolute residual |e ij ( )| , the corresponding statistical unit will have a dominating role in (3) when determining the estimate of the parameter.If instead a large |e ij ( )| , produced by an anomalous response, is associated with a value of x i within the bulk of data, or viceversa an outlying x i is associated with a small |e ij ( )| , then the impact of the anomalous data in the estimation process is limited.Therefore observations characterized by an outlying x i associated with a large absolute residual are to be considered and treated as bad leverage points just as in the linear model.The analysis of the residuals, and their associations with the covariates, is essential in detecting outliers and assessing their impact on inference, as shown in the following example.
A sample of n = 200 observations has been generated from ( 5), obtain- ing the fitted model Y * = 2.54X 1 + 0.78X 2 + εi with estimated thresholds ̂ = (−3.25, −1.11, 1.03, 3.25) .Three possible contaminations of the data are con- sidered by adding an extra observation (Y c , X c 1 , X c 2 ) to the original sample.They are listed below.
It represents an anomalous response, since the val- ues of the regressors are thoroughly plausible.
The regressors identify a highly outlying design point, though-according to the model-the value of Y c is consistent with ) is less extreme than the previous (5, 5), but the response Y c = 1 is inconsistent with the covari- ates.
Table 1 shows the estimates of the regression coefficients obtained from the original data and from the three contaminated samples.The last row of Table 1 reports the distance between the estimates of the regression coefficients computed as fol- , where ̂ and ̂ c are the estimates of ( 5) obtained from the original and the contaminated sample, respectively, and V ( ̂ ) is the covariance matrix of ̂ estimated from the original sample.
Figure 1 shows the scatter plot of the residuals versus the Mahalanobis norm of the regressors.The residuals e ij ( ̂ ) are computed at ̂ .The Mahalanobis norm , where ̂ X and VX are the sample mean and the sample covariance matrix of X = (X 1 , X 2 ) evaluated on the first n observations.Contamination (i) is easily detected as an anomalous response, since ||x c || M = 1.063 is fairly small, but the residual takes the outlying value e c ( ̂ ) = −4.25 .It yields d( ̂ c , ̂ ) = 3.96.
For contamination (ii) we have ||x c || M = 6.52 with a small absolute residual e c ( ̂ ) = −0.94 .The distance d( ̂ c , ̂ ) = 0 shows that an extreme design point raises no concern if the response Y is consistent with the model.
Contamination (iii) corresponds to a fairly large design point ( ||x c || M = 3.27 ) associated with an inconsistent response, which yields a large absolute residual e c ( ̂ ) = −5.37 .The corresponding point is strongly influential with distance d( ̂ c , ̂ ) = 19.36.Generalized residuals and outlier detection for ordinal data…

Other types of residuals
Alternative residuals for ordinal response models have been proposed for various diagnostic purposes.Liu and Zhang (2018) propose the surrogate residuals based on a continuous variable S, which acts as a surrogate for the ordinal response Y, and is identically distributed to the latent variable Y * .Given Y = j , the distribution of S is obtained by truncating the distribution of X + using the interval The variable S is generated by sampling -conditionally on the observed ordinal outcomes (y 1 , … , y n )-according to a hypothetical probability model coherent with the assumed model for Y.In practice, after fitting the model, the parameters and are replaced by the corresponding estimates and the observed residual is r S i = s i − x i ̂ − ∫ dG() , where s i is the sampled value of S. The surrogate residuals allow to work on the continuous space of the latent data, rather than on the discrete space of the original data.They are mainly used for diagnostics on the mean structure, the link function, the proportional assumption and on heteroscedasticity.
Further alternative residuals are briefly reviewed in the supplementary material.

Outlier detection via residuals
A graphical inspection of the residuals may help identifying critical observations.To illustrate the idea a small experiment is carried out in the following example.
Example 2. A sample of 200 observations generated from model ( 5) is contaminated by an additional observation given by (Y c = 2, X c 1 = 1.0,X c 2 = 0.5) whose probability under the model is smaller than 0.0001, hence it is undoubtedly unlikely.The estimation of the model is repeated for 100 samples.The left panel of Fig. 2 shows the boxplots of the ML-residuals obtained from each of the contaminated samples.The boxplots exhibit an anomalous residual (approximately Example 2 suggests a procedure based on the residuals, to be routinely applied for anomalous data detection.Let Q 1 and Q 3 be the 25th and 75th percentiles of the empirical distribution of the residuals and let IQR = Q 3 − Q 1 .Residuals are considered large (in absolute value), i.e. corresponding to anomalous data, when they fall outside the interval The point to be solved is the choice of k.A good outlier detection procedure should reveal anomalous data with high probability, hence it should have a high rate of 'true positives', i.e. correctly detected anomalous statistical units.There is however the risk of erroneously classifying 'genuine' observations-those generated by the model-as outliers, producing 'false positives', and it is desirable to achieve a low rate of false positives.Unfortunately increasing k reduces both rates and , therefore the value of k derives from a compromise between the two conflicting needs of a high rate of true positives and a low rate of false positives.
Example 2 -(continued).Fig. 3 shows the Receiver Operating Characteristic (ROC) curve for k between 1.5 and 3.5, obtained from a simulation on 10 .000 samples with contaminating point (Y c = 2, X c 1 = 1.0,X c 2 = 0.5) .Notice that in each simulated sample there are n = 200 genuine observations, hence n rates i of false positives can be computed, one for each observation, as the simulated percentage of erroneous classifications.To summarize the results, in a conservative approach, we consider the worst case, i.e. max = max i∈{1,…n} i the maximum probability of labeling a genuine observation as an outlier.For k = 2.5 the (highest) rate of false positives is 0.15%-very low-while the rate of true positives is 98.5% .If k is decreased to 1.5 we get max = 2.2% while = 1.
Extensive simulation experiments carried out in the supplementary material, for various kinds and amounts of contamination, show that suitable values for k are within the interval (1.5, 2.5).k = 2.5 yields a very small rate of false positives (usually max ≤ 0.2% ), with very high rates of true positives Generalized residuals and outlier detection for ordinal data… (roughly between 0.7 and 1).For k = 1.5 max can reach 3.0% , still a pretty small probability.
In principle the same procedure can be applied also with the surrogate residuals.However, the simulation procedure generating the surrogate residuals introduces additional variability, which can inflate the rate of false positives and tarnish outliers.
Example 2-(continued).The right panel of Fig. 2 shows the boxplots of the surrogate residuals for the same samples of the left panel with the ML-residuals.The boxplots of the surrogate residuals are wider and the outlying residuals are not distinctly separated from the bulk of the other residuals.Consequently the outlier detection can be very poor.Table 2 compares, for the same samples of the ROC curve in Fig. 3, the rates of false positives and true positives obtained by the ML-residuals and by the surrogate residuals.For k = 2.5 the maximum rate of false positives by the surrogate residuals is 1.6% but the rate of true posi- tives is as low as 36.9% ,hence rather unsatisfactory.To increase up to 94.0% k should be reduced to 2.0 but than max can be as high as 12.2% , too large to be acceptable.For k = 2.5 the ML-residuals identify the anomalous observation in 98.5% of cases with a rate of false positives that does not exceed 0.2%.
Table 3 shows a supplement of the previous experiment for the case the covariates are correlated with cor(X 1 , X 2 ) = 0.7 .Dependence between covariates does not alter the diagnostic performance of ML-and surrogate residuals.6 Diagnostics via robust residuals and weights 6.1 M-estimation Iannario et al. (2017) propose an M-estimator of , which is the implicit solution where the term a( ) = E S( ;Y, X) w(Y, X; ) is required to achieve Fisher con- sistency, i.e.E{ ( ;Y, X)} = 0 (the estimator computed on the whole population yields the true value of the parameter; see Cox and Hinkley (1974, p. 287)).The weights w i = w(y i , x i ; ) in ( 6) are designed to downweight outlying observations in order to control their impact in the estimation.If w i ≡ 1 then a( ) ≡ 0 and ( ;y i , x i ) coincides with S( ;y i , x i ) , by showing the MLEs as a special case of M-estimators.
An appropriate choice of the weights ensures that the influence function of ̂ M is bounded (Hampel et al. 1986).
Since the observations with a large product �e ij ( )�‖x i ‖ M are the most influ- ential, the Huber weights w i = min 1, c∕ (e ij , x i ) are considered, where based on robust estimators of location and scale and c is a suitable tuning constant.The weights are a decreasing function of the product �e ij ( )�‖x i ‖ M , when it exceeds the tuning constant c.The value of c has to be chosen so that the loss of efficiency incurred by M-estimators, with respect to the MLEs, does not exceed a given threshold (say 5% or 10%) when there is no contamination in the data.Iannario et al. (2017) suggest that appropriate values of c, when the probit link is adopted, vary roughly between 1 and 2.
Alternative M-type estimators are proposed by Croux et al. ( 2013) with similar residuals but weights depending only on the covariates.These estimators do not take into account the interaction between the response and the covariates.
When the interest focuses on the majority of data, M-estimators provide a robust fitted model.If instead detecting anomalous and influential data is of interest, M-estimation has two further benefits.The residuals inherit the robustness of the estimators and the weights, which take simultaneously into account the magnitude of both residuals and covariates, provide an essential tool for identifying influential data.

Outlier detection via M-residuals
The procedure for outlier detection, performed with the ML-residuals in Sect. 5 can be applied also with the residuals generated by M-estimation.In case of multiple outliers masking effects may arise, which affect the MLEs and consequently spoil the diagnostic ability of the ML-residuals.The M-residuals, which rely on ( 6) Generalized residuals and outlier detection for ordinal data… robust estimators, preserve their ability in outlier detection even under a considerable amount of contamination.
Example 3.Here we consider outlier detection performed through ML-and M-residuals when a sample of n = 200 observations generated from model ( 5) is contaminated by 12 outliers, whose values are in Table 4.They include both anomalous responses and bad leverage points, and are designed to generate masking effects.
The rates of false positives and true positives, obtained by a simulation on 10 .000 samples, are displayed in Table 5.Since this experiment deals with multiple outliers, for each outlier a true positive rate can be computed.To summarize the results we consider both the average rate of true positives and the worst case given by the minimum rate of true positives.Both rates of true positives are systematically higher for the M-residuals, without appreciable increase in the rate of false positives.When the number of outliers increases the detective potential of the ML-residuals collapses, while the M-residuals maintain their effectiveness.Similar results are obtained when cor(X 1 , X 2 ) = 0.7 , as shown in Table 6.
Figure 4 compares the ROC curves based on the ML-and M-residuals when the sample is contaminated by the points of Table 4.The better performance of the M-residuals is clearly evident.For k = 2 they yield a high rate of true positives with a very low rate of false positives.
This experiment, similarly to those of the supplementary material, shows that values of k around 2 provide a good compromise between the rate of false positives and the rate of true positives produced by M-residuals in the multiple outliers case.
Example 4. Three contaminating points are added to the original dataset of Example 1.They are ) .For all of them the covariates are positive, hence the response Y c = 1 is anomalous.The three points differ for the outlyingness of the X 1 1.5 2.5 3.5 0.0 0.0 0.0 1.0 1.0 1.0 2.0 3.0 5.0 X 2 0.0 0.0 0.0 1.5 2.5 3.5 1.0 2.0 3.0 1.0 1.0 5.0 covariates.The Mahalanobis norms of the covariates are 10.016, 1.947 and 1.637 respectively.Figure 5 shows the ML-and the M-residuals versus the Mahalanobis norm of the covariates.Although all the three points are anomalous, the x i for the first one is remarkably outlying.By comparing the two panels, the MLresiduals (on the left) associated to the contaminating points appear much smaller  Generalized residuals and outlier detection for ordinal data… in absolute value than the corresponding M-residuals (on the right).In the MLcontext, the first contaminating point acts as a leverage point in linear regression.Bad leverage points can affect the model fitted by MLEs in such a way that their deviation from the tendency of the other points cannot even be seen.Hence if x i is outlying, a small ML-residual does not prevent an observation from being a bad leverage point.On the contrary, the robust M-residuals evidently signal outlying points.

Analysis of influential data via M-weights
Atypical data have a different impact on the ML-estimators according to the product �e ij ( )�‖x i ‖ M .An analysis of the weights, which are a decreasing func- tion of �e ij ( )�‖x i ‖ M , can reveal which statistical units are more influential for the MLEs (the weighting system in (6) evidently limits the impact of atypical data on the M-estimators).The smaller w i the more influential is the statistical unit, which can either be generated from the model or be an outlier.The concept is not binary: the influence of an observation can arise with various intensity.Nevertheless some suggestions for the identification and the analysis of the impact of potentially influential points are important to assess the sensitivity of the fitted model to small changes in the sample.
Example 5. To profile a guideline for the interpretation of the weights, an experiment has been carried out.100 samples generated from model ( 5) have been contaminated by the 12 influential outliers of Table 4. Figure 6 shows the boxplots of the weights associated to each statistical unit (in the 100 samples).The first 200 boxplots correspond to the n observations generated by the model and are entirely flattened on w = 1 though occasionally few of them can give rise to outlying low weights.The last 12 boxplots in the lower right corner correspond to the outliers.They are all located below 0.4, the larger ‖x i ‖ M the lower the corresponding boxplot.
The results of Example 5 (along with those of the supplementary material) suggest-as a guideline-that weights smaller than 0.4 signal potential influential data.

Analysis of self-perceived health status
Most OECD countries conduct regular health surveys that allow respondents to express opinions on different aspects of their health.A commonly asked question is: "How is your health in general?" as reported in the SHARE questionnaire.Despite the subjective nature of this question, indicators of perceived health are considered valuable predictors of people's future health care use.The self-reported health is measured on a rating scale with 5 categories ranging from Excellent = 1 to Poor = 5.
In accordance with the literature (see Borgonovi and Pokropek (2016) and Ambrasat et al. ( 2011)) the following latent regression model is considered 1   where Maxgrip is the maximum hand grip strength (evaluated on a 0 − 100 scale), Lifesat is the perceived life satisfaction measured on a scale from 0 to 10 (where 0 means completely dissatisfied and 10 completely satisfied with life) and Edu represents the years of education (which in our data range from 0 to 22).Lifesat may be handled as nominal, ordinal (see Espinosa and Hennig (2019)) or numeric; here we choose the last option by referring to Agresti (2010) and taking advantage of the large number of categories.
Nevertheless, if the model is fitted on the full dataset we get with Wald statistics t 1 = −40.790, t 2 = −86.370and t 3 = −35.060.The difference in the results is due to the occurrence of few outliers, which potentially have a large influence on the estimators and on the related standard errors.Figure 7 shows the scatter plot of the residuals versus the Mahalanobis norm , where ̃ X and ṼX are robust estimators of the location and of the covariance matrix of X based on the Minimum Covariance Determinant estimator (Rousseeuw 1984(Rousseeuw , 1985)).( 7) The R code for reproducing the results of this Section is in the supplementary material, whereas the SHARE 2017 data sample is available upon request from the Authors.The full dataset is available at http:// www.share-proje ct.org/ home0.html.

3
Generalized residuals and outlier detection for ordinal data… In Fig. 7 we observe a large absolute residual corresponding to the statistical unit 45 (with value −2.412 ) whose corresponding ||x i || M = 2.01 is on average.This observation (in the lower left part of the scatter plot) is an anomalous response by an interviewee, whose excellent perceived health status sharply conflicts with his very low grip strength.There are also the observations 30 and 90 characterized by large regressors ( ||x i || M > 4 ) whose residuals are close to zero.They are outlying covari- ates associated with responses coherent with the model, which have little influence on the estimators.These observations are generally related to persons with no life satisfaction and a fairly high level of education, who-coherently with the fitted model-indicate a poor perceived status on the rating scale.Finally there are the statistical units 58, 97, 191 characterized by a large product e ij ( )||x i || M , which are likely to be influential.The corresponding health status is excellent but they have an average or low grip strength and are entirely dissatisfied with their life.
The left panel of Fig. 8 shows the boxplot of the ML-residuals.By taking k = 2 the statistical units 97 and 58 turn out to be outliers.Removing these data from the sample yields the fitted model with ̂ = (−3.939,−3.239, −2.034, −1.134) , t 1 = −2.917, t 2 = −4.115and t 3 = −1.585 .The significance of the estimate of the regression coefficient related to education has increased.
The right panel of Fig. 8 shows the M-residuals.By considering k = 2 the points 45, 58, 97 and 191 emerge as outliers.The same result is obtained by considering the ML-residuals with k = 1.5 .It is to be noticed that M-residuals signal outlying data in a more pronounced way and clearly separate anomalous responses from bad leverage points.Removing also the statistical units 45 and 191 leads to .This outcome clearly points out that the model that fits the majority of data should include education among the explanatory variables, and it is consistent with the model fitted by the M-estimators, which is with ̂ = (−4.327,−3.555, −2.320, −1.415) , t 1 = −2.928, t 2 = −5.047and t 3 = −1.920 .This result does not imply that the outliers should be discarded.It highlights that there is a model suitable for most of data, though few statistical units display a different behavior.These observations require discussion, and should not be removed (unless they are erroneous).
The left panel of Fig. 9 shows the boxplot of the weights, which highlights that the statistical units 58, 97 and 191 are highly influential (their weights are between 0.077 and 0.084).Also the anomalous response, corresponding to the statistical units 45, has a low weight ( w i = 0.263 ).Along with these data, there are further points that have a relevant impact on estimation.They are identified also in the right panel of Fig. 9 with the scatter plot of the M-residuals versus the Mahalanobis norm of the covariates.The area delimited by the dashed lines incorporates data such that w i < 0.4 .Among these points there are the statistical units 151 with weight 0.264, 189 with weight 0.330, 9 with weight 0.377 and 157 with weight 0.364.Neither the residuals nor the covariates of these points are especially large, but it is the product �e ij ( )�‖x i ‖ M that is substantial.
Finally Fig. 10 shows the empirical influence function of the MLEs corresponding to the n statistical units, versus the robust Mahalanobis norm of the covariates, computed according to the procedure of Croux et al. (2013) and Pison and Van Aelst (2004), using the M-estimator of Sect.6.1 as robust estimator.The

Final remarks
The paper deals with diagnostics aimed at outliers detection and influential data identification in the context of ordinal response models, where an anomalous respondents' behavior and outlying covariates may affect the reliability of likelihood based inference.The properties of the generalized residuals are derived, by showing that they are naturally generated by ML-or M-estimation.Their inspection is proved to be a fundamental procedure for identifying atypical data.Instead, the assessment of the influence is obtained directly from the weights produced by M-estimation, which take simultaneously into account the magnitude of both residuals and covariates.Some simple guidelines are provided for the analysis of both residuals and weights suggesting a new methodology for diagnostics.
A case study concerning the perceived health status in elderly people is examined.Here the diagnostic procedures allow to recognize statistical units whose behavior is in contrast with the model that fits the majority of the data, and signal influential data for the fitted model.

Proofs of some proprierties and variance of the generalized residuals
Conditionally on x i we have Hence the residuals have zero expectation.Their variance, conditionally on x i , is given by Notice that the variance does not depend on the scores/values attached to the categories.
Next we prove that the residuals are monotonic with respect to the observed category, for a fixed value of the covariate (property (iii)).We assume that log{g(⋅)} 1 3 Generalized residuals and outlier detection for ordinal data… is concave, an assumption fulfilled by the probit, logit, log-log and complementary log-log link function.By the Cauchy theorem we have where * j is an intermediate point between j−1 and j .Since log{g(⋅)} is concave, its derivative is a decreasing function.It yields property (iii), If category j and j + 1 are merged, the new residual is a weighted average of e ij ( ) and e ij+1 ( ) with weights proportional to the probabilities of the merged categories (branching property (b)).It is given by Supplementary Information The online version contains supplementary material available at https:// doi.org/ 10. 1007/ s10260-023-00686-1.
Funding Open access funding provided by Università degli Studi di Napoli Federico II within the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http:// creat iveco mmons.org/ licen ses/ by/4.0/. .

Fig. 1
Fig. 1 Residuals versus the Mahalanobis norm of the covariates.The points corresponding to the contamination (i), (ii) and (iii) are indicated with a triangle, a square and a diamond respectively.For each contaminating point d( ̂ c , ̂ ) is reported

Fig. 2
Fig. 2 Boxplot of the ML-residuals (left panel) and surrogate residuals (right panel) from data contaminated by the additional point (Y = 2, X 1 = 1.0,X 2 = 0.5) whose corresponding residual is indicated by an asterisk

Fig. 3
Fig. 3 ROC curve for outlier detection-contaminating point (Y c = 2, X c 1 = 1.0,X c 2 = 0.5)-the numbers associated to the points are the values of k

Fig. 4
Fig. 4 ROC curves when the sample is contaminated by 12 anomalous observations based on ML-(solid line with circles) and M-(dashed line with squares) residuals

Fig. 6
Fig. 6 Boxplot of the weights

Fig. 7
Fig. 7 ML-residuals versus the norm of the covariates.The triangles are anomalous responses, the squares represent extreme design points, and the diamonds are potentially influential data

Table 1
Estimates of the regression coefficients from the original data and the contaminated samples in Example 1

Table 2
False positives and true positives by ML-residuals and surrogate residuals-contaminating point (Y c = 2, X c

Table 3
False positive and true positives by ML-residuals and surrogate residuals when the covariates are correlated-contaminating point (Y c = 2, X c

Table 5
False positives and true positives by ML-and M-residuals when the sample is contaminated by 12 anomalous points

Table 6
False positive and true positives by ML-and M-residuals when the sample is contaminated by 12 anomalous points and the covariates are correlated