1 Introduction

Decision making processes are often carried out and supported by means of the results of questionnaires, where the respondent expresses his/her evaluation/perception through a rating on a scale made up of ordered categories. From a statistical viewpoint, the ratings are ordinal responses, which may be related to the subjects’ covariates through an appropriate model. The models most frequently implemented for this purpose, within the generalized linear models (GLM) framework, are the cumulative models introduced by McCullagh (1980). Here the ordinal response is assumed to be a categorization of a continuous latent variable, which depends on the covariates through a latent regression.

Anomalous data may arise in this context, giving rise to a challenging data structure, because of outlying covariates or incoherent behaviors by respondents, who deliberately or unconsciously choose a wrong category as a consequence of a satisficing attitude, a willingness to joke or fake, or a search for a shelter category (Iannario and Piccolo 2016). Furthermore, gross errors can occur. Anomalous data produce a contamination of the assumed model distribution, which can seriously affect the reliability of the Maximum Likelihood Estimators (MLE), leading to a poorly fitted model. Recent studies point out the need for procedures for outlier detection (Riani et al. 2011) or, alternatively, for a robust inferential approach (see Croux et al. (2013), Iannario et al. (2017) and Scalera et al. (2021) in the context of GLMs, and Iannario et al. (2016) for cub models (Piccolo 2003)).

The current paper deals with diagnostic procedures for ordinal response models aimed at detecting atypical and influential observations. Identifying anomalous data can prevent unreliable inferential results, especially when these data are due to gross errors and are influential. On the other hand, atypical data can reveal statistical units whose behavior is incoherent with the majority of the data, and which are therefore of interest for their challenging characteristics. They tell a different story and may be the target of ad hoc decision making and policies.

Special emphasis is placed on the generalized residuals (Gourieroux et al. 1987; Franses and Paap 2004), whose properties are investigated. They are connected with the covariates similarly to the least squares residuals in linear regression. Their magnitude makes it possible to detect incoherent responses and to assess the influence of extreme design points.

Diagnostics is therefore pursued along two lines: outlier detection and influential data identification. The former are statistical units incoherent with the model that fits the majority of the data, whereas the latter are observations with a high impact on both estimation and testing. The outliers can be identified by the analysis of the generalized residuals produced by ML- or, more effectively, by robust M-estimation, whereas the assessment of the influence of the statistical units relies on the inspection of the weights generated by M-estimation (Iannario et al. 2017).

The above diagnostic procedures are applied to the perceived health status, a valuable predictor of people’s future health care use, mortality and social capital (see Palladino et al. (2016), Raina et al. (2002) and Arezzo and Giudici (2017)). The analysis is also related to biomedical aspects associated with care perspective and lifestyle improvements, which lead to a lower mortality risk (Rappange et al. 2016). Hence the perceived health status is a key source of information, which affects many economic and health choices. Data are from the Survey of Health, Ageing and Retirement in Europe (SHARE) in 2016–17, which includes nationally representative samples of \(n=68{,}618\) persons aged 50 and older from 16 European nations. The dependent variable is self-reported health, with responses on a 5-point scale. Useful covariates are life satisfaction, education and muscle strength (measured by an objective test). Diagnostics makes it possible to identify anomalous responses and bad leverage points, which can alter the outcome of the tests on the relevance of the covariates.

The paper is organized as follows: the next Section provides a quick overview of the cumulative models for ordinal responses. Section 3 illustrates the properties of the generalized residuals, pointing out their role in the analysis of anomalous and influential data. Section 4 briefly reviews alternative residuals. Section 5 introduces simple guidelines for outlier detection based on the residuals. Diagnostic procedures based on the residuals and the weights produced by M-estimation are illustrated in Sect. 6. The proposed diagnostics is implemented on the self-perceived health status data in Sect. 7. Some concluding remarks end the paper. Proofs related to the properties of the generalized residuals are reported in the Appendix, whereas simulation studies assessing the performance of the proposed procedures and alternative diagnostic methods are in the supplementary material, along with the R code for the reproducibility of the analysis.

2 Cumulative models for ordinal data

In cumulative models the response variable Y takes integer values between 1 and m. Its value derives from the categorization of an underlying (continuous) latent variable \(Y^*\). We have

$$\begin{aligned} Y=j \qquad \Longleftrightarrow \qquad \tau _{j-1} < Y^* \le \tau _{j} \qquad j=1,2,\ldots ,m, \qquad m>2, \end{aligned}$$
(1)

where \(-\infty =\tau _0< \tau _1<\ldots < \tau _m=+\infty \) are the thresholds (or cut-points) of the scale of \( Y^*\). Given the vector \(\varvec{X}_i =\left( X_{i1}, \ldots , X_{ip}\right) \) of covariates for the i-th statistical unit, the dependence of \(Y_i^*\) on \(\varvec{X}_i\) is established through the latent regression model

$$\begin{aligned} Y_i^*= \varvec{X}_i \varvec{\beta }+ \epsilon _i,\qquad i=1, 2, \ldots , n, \end{aligned}$$
(2)

where \(\varvec{\beta }=(\beta _1,\dots ,\beta _p)'\) is the vector of regression coefficients, \(\epsilon _i \sim G(\cdot )\) and \(G(\cdot )\) is a distribution function. Equations (1) and (2) yield the cumulative probability \(\Pr \left( Y \le j\right) =G(\tau _j-\varvec{X} \varvec{\beta })\), hence \(G^{-1}\) defines the link function. If \(G(\cdot )\) is the standard normal distribution we have the probit link, while the logistic distribution yields the logit link, and the extreme value distribution leads to the (complementary) log-log link (Agresti 2010; Tutz 2012). Alternative links are proposed by Aranda-Ordaz (1983) and Genter and Farewell (1985). Here we focus on the probit link.
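As an illustration, the category probabilities implied by the probit link can be computed directly from the thresholds and the linear predictor. The following is a minimal sketch (function and variable names are ours, not from the paper), assuming SciPy for the standard normal distribution function:

```python
import numpy as np
from scipy.stats import norm

def cumulative_probs(x, beta, tau):
    """Category probabilities Pr(Y = j | x), j = 1, ..., m,
    for a cumulative probit model with thresholds tau_1 < ... < tau_{m-1}."""
    eta = x @ beta                                    # linear predictor x'beta
    # G(tau_j - x'beta), padded with tau_0 = -inf and tau_m = +inf
    cdf = np.concatenate(([0.0], norm.cdf(tau - eta), [1.0]))
    return np.diff(cdf)                               # Pr(Y=j) = G(tau_j - eta) - G(tau_{j-1} - eta)

# Illustrative values (the thresholds used later in Example 1)
probs = cumulative_probs(np.array([1.0, 0.5]),
                         np.array([2.5, 0.7]),
                         np.array([-3.0, -1.0, 1.0, 3.0]))
```

By construction the m probabilities are positive and sum to one, whatever the value of the linear predictor.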

Let \(\varvec{\theta }=(\varvec{\tau }', \varvec{\beta }')'\) where \(\varvec{\tau }=(\tau _1,\dots ,\tau _{m-1})'\), then \(\varvec{\theta }\in \Omega (\varvec{\theta })\), where \(\Omega (\varvec{\theta })\) is an open subset of \(\mathbb {R}^{p+m-1}\). Consider an observed random sample \((y_i, \varvec{x}_i)\), for \(i=1,2, \ldots , n\), where \(y_i\) and \(\varvec{x}_i\) are the values of \(Y_i\) and \(\varvec{X}_i\). The log-likelihood function (McCullagh 1980) is \(\sum _{i=1}^{n}\,\ell _i(\varvec{\theta }; y_i,\varvec{x}_i)\) with individual contribution \( \ell _i(\varvec{\theta }; y_i,\varvec{x}_i)= \sum _{j=1}^m\, I[y_i=j] \log \Pr \left( Y_{i}=j\mid \varvec{x}_i\right) \), where \(I[\omega ]\) is the indicator function that takes value 1 if \(\omega \) holds and 0 otherwise, and \(\Pr \left( Y_i=j\mid \varvec{x}_i\right) =G(\tau _{j}-\varvec{x}_i\varvec{\beta })-G(\tau _{j-1}-\varvec{x}_i\varvec{\beta })\), for \(j=1,2,\ldots ,m\).
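The log-likelihood above can be sketched in code for the probit link as follows (names are ours; this is a minimal illustration, not the authors' implementation):

```python
import numpy as np
from scipy.stats import norm

def loglik(tau, beta, y, X):
    """Sample log-likelihood of a cumulative probit model.

    tau : (m-1,) increasing thresholds;  beta : (p,) coefficients
    y   : (n,) responses in {1, ..., m}; X : (n, p) design matrix
    """
    t = np.concatenate(([-np.inf], tau, [np.inf]))    # tau_0 = -inf, tau_m = +inf
    eta = X @ beta
    # Pr(Y_i = y_i | x_i) = G(tau_{y_i} - eta_i) - G(tau_{y_i - 1} - eta_i)
    p_obs = norm.cdf(t[y] - eta) - norm.cdf(t[y - 1] - eta)
    return np.log(p_obs).sum()

# One observation with eta = 0 and y = 3: Pr(Y=3) = Phi(1) - Phi(-1)
ll = loglik(np.array([-3.0, -1.0, 1.0, 3.0]), np.array([1.0]),
            np.array([3]), np.array([[0.0]]))
```

Maximizing this function over \((\varvec{\tau }, \varvec{\beta})\), subject to the ordering of the thresholds, yields the MLE.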

The score function is \(S_n(\varvec{\theta })=\sum _{i=1}^{n} S(\varvec{\theta };y_i, \varvec{x}_i)\), where \(S(\varvec{\theta };y_i, \varvec{x}_i)=\left[ S_{\tau _1}(\varvec{\theta };y_i, \varvec{x}_i), \ldots , S_{\tau _{m-1}}(\varvec{\theta };y_i, \varvec{x}_i), S_{\beta _1}(\varvec{\theta };y_i, \varvec{x}_i), \ldots ,S_{\beta _p}(\varvec{\theta }; y_i, \varvec{x}_i)\right] '\) and \(S_{\theta _l}(\varvec{\theta };y_i, \varvec{x}_i)=\partial \ell _i(\varvec{\theta }; y_i,\varvec{x}_i)/\partial \theta _l\), for \(l=1,\ldots ,m+p-1\) (see Iannario et al. (2017) for the analytical expressions). The MLE is the solution \(\hat{\varvec{\theta }}=(\hat{\varvec{\tau }}',\hat{\varvec{\beta }}')'\) of \(S_n(\hat{\varvec{\theta }})= \varvec{0}\).

Here the interest focuses on the score functions for the regression coefficients \(\beta _k\)

$$\begin{aligned} \sum _{i=1}^n S_{\beta _k}(\varvec{\theta };y_i, \varvec{x}_i)=\sum _{i=1}^n\sum _{j=1}^m I[y_i=j]e_{ij}(\varvec{\theta })x_{ik}, \qquad k=1,\ldots , p. \end{aligned}$$
(3)

The quantities \(e_{ij}(\varvec{\theta })\) in (3) are the generalized residuals

$$\begin{aligned} e_{ij}(\varvec{\theta })= - \frac{g(\tau _{j}- \varvec{x}_i\varvec{\beta })-g(\tau _{j-1}- \varvec{x}_i\varvec{\beta })}{G(\tau _{j}- \varvec{x}_i\varvec{\beta })-G(\tau _{j-1}- \varvec{x}_i\varvec{\beta })},\qquad i=1,2,\ldots ,n; \quad j=1,2,\ldots ,m , \end{aligned}$$
(4)

where \(g(\cdot )\) is the density function corresponding to the distribution function \(G(\cdot )\). The sign in (4) is reversed with respect to the original definition (Franses and Paap 2004, p. 123) to make \(e_{ij}(\varvec{\theta })\) an increasing function of the categories for given \(\varvec{x}_i\varvec{\beta }\) (see (iii) of Sect. 3). Notice that the inner sum in (3) contains only one term, the j-th one such that \(y_i=j\), since \(\sum _{j=1}^m I[y_i=j]=1\).
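For the probit link, with \(g=\phi\) and \(G=\Phi\), the residual in (4) can be computed directly. A minimal sketch (our own function names, not the paper's code):

```python
import numpy as np
from scipy.stats import norm

def generalized_residual(y, x, beta, tau):
    """Generalized residual e_{iy}(theta) of Eq. (4), probit link."""
    t = np.concatenate(([-np.inf], tau, [np.inf]))
    eta = x @ beta
    up, lo = t[y] - eta, t[y - 1] - eta           # tau_j - x'beta and tau_{j-1} - x'beta
    num = norm.pdf(up) - norm.pdf(lo)             # g(.) differences (pdf(+-inf) = 0)
    den = norm.cdf(up) - norm.cdf(lo)             # G(.) differences
    return -num / den

# Residuals for all m = 5 categories at a fixed design point
tau = np.array([-3.0, -1.0, 1.0, 3.0])
x, beta = np.array([0.5, 0.2]), np.array([2.5, 0.7])
res = np.array([generalized_residual(j, x, beta, tau) for j in range(1, 6)])
```

With log-concave \(g\) the residuals are strictly increasing in the observed category (property (iii) of Sect. 3), and their probability-weighted mean over the categories is zero (property (ii)), since the numerator differences telescope.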

3 Generalized residuals

The estimating equations (3) are analogous to the normal equations in the linear regression model, i.e. \(\sum _{i=1}^n\,r_i\varvec{x}_i=\varvec{0},\) where the \(r_i\)’s are the least squares residuals and \(\varvec{0}\) is a vector of zeros. Hence the generalized residuals (hereafter abbreviated as ‘residuals’ when no conflict arises) in the cumulative model context correspond to the \(r_i\)’s in linear models, and actually they share some of the properties of the least squares residuals. In particular

  (i) There is only one value per subject, i.e. \(e_{ij}(\varvec{\theta })\) if \(y_i=j\);

  (ii) Their expectation is zero;

  (iii) Under the assumption that \(\log \left\{ g(\cdot )\right\} \) is concave, they are monotonically increasing with respect to the observed category for a fixed value of the covariates, i.e. \(e_{ij}(\varvec{\theta }) < e_{ij'}(\varvec{\theta })\) for \(j < j'\).

In addition the residuals have the following properties of specific interest for ordinal response models:

  (a) They do not depend on arbitrary scores/values attributed to the categories;

  (b) They satisfy the branching property (if two categories are merged the new residual is a weighted average of the initial residuals, with weights proportional to the probabilities of the merged categories).

Proofs of (ii), (iii) and (b), and the variance of the residuals are in the Appendix.

The pivotal characteristic of the \(e_{ij}(\varvec{\theta })\)’s is that they depend on the covariates just as the least squares residuals do. Inspection of (3) shows that the impact of the i-th observation on the estimators depends on the product of \(e_{ij}(\varvec{\theta })\) and \(\varvec{x}_i\). If an outlying \(\varvec{x}_i\) is associated with a large absolute residual \(|e_{ij}(\varvec{\theta })|\), the corresponding statistical unit will have a dominating role in (3) when determining the estimate of the parameter. If instead a large \(|e_{ij}(\varvec{\theta })|\), produced by an anomalous response, is associated with a value of \(\varvec{x}_i\) within the bulk of the data, or, vice versa, an outlying \(\varvec{x}_i\) is associated with a small \(|e_{ij}(\varvec{\theta })|\), then the impact of the anomalous data on the estimation process is limited. Therefore observations characterized by an outlying \(\varvec{x}_i\) associated with a large absolute residual are to be considered and treated as bad leverage points, just as in the linear model. The analysis of the residuals, and of their associations with the covariates, is essential in detecting outliers and assessing their impact on inference, as shown in the following example.

Example 1. Let Y be an ordered response with \(m=5\) categories generated from the following latent regression model

$$\begin{aligned} Y^*=\beta _1 X_1 + \beta _2 X_2 +\epsilon \,, \end{aligned}$$
(5)

where \(X_1\) and \(X_2\) are mutually independent N(0, 1) random variables, \(\beta _1=2.5\), \(\beta _2=0.7\), \(\epsilon \sim N(0,1)\) and the thresholds are \((\tau _1, \tau _2, \tau _3, \tau _4)=(-3, -1, 1, 3)\). According to model (5), the response Y increases with both \(X_1\) and \(X_2\).

A sample of \(n=200\) observations has been generated from (5), obtaining the fitted model \(Y^*=2.54 X_1+ 0.78 X_2 + {\hat{\epsilon }}\) with estimated thresholds \(\hat{\varvec{\tau }}= (-3.25,-1.11, 1.03, 3.25)\). Three possible contaminations of the data are considered by adding an extra observation \((Y^c,X_1^c,X_2^c)\) to the original sample. They are listed below.

  (i) \((Y^c=1, X_1^c=1, X^c_2=0.5)\). It represents an anomalous response, since the values of the regressors are thoroughly plausible.

  (ii) \((Y^c=5, X_1^c=5,X^c_2=5)\). The regressors identify a highly outlying design point, though, according to the model, the value of \(Y^c\) is consistent with \((X_1^c,X_2^c)\).

  (iii) \((Y^c=1, X_1^c=2.5,X^c_2=2.5)\). The pair \((X_1^c,X^c_2)=(2.5,2.5)\) is less extreme than the previous (5, 5), but the response \(Y^c=1\) is inconsistent with the covariates.

Table 1 shows the estimates of the regression coefficients obtained from the original data and from the three contaminated samples. The last row of Table 1 reports the distance between the estimates of the regression coefficients, computed as \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }}) =(\hat{\varvec{\beta }}^c- \hat{\varvec{\beta }})' \hat{\varvec{V}}_\beta (\hat{\varvec{\theta }})^{-1} (\hat{\varvec{\beta }}^c- \hat{\varvec{\beta }}) \), where \(\hat{\varvec{\beta }}\) and \(\hat{\varvec{\beta }}^c\) are the estimates of \(\varvec{\beta }\) obtained from the original and the contaminated sample, respectively, and \(\hat{\varvec{V}}_\beta (\hat{\varvec{\theta }})\) is the covariance matrix of \(\hat{\varvec{\beta }}\) estimated from the original sample.
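The distance \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }})\) is a simple quadratic form and can be sketched as follows (names and illustrative numbers are ours, not taken from Table 1):

```python
import numpy as np

def beta_distance(beta_c, beta_hat, V_beta):
    """d(beta^c, beta^hat) = (beta^c - beta^hat)' V^{-1} (beta^c - beta^hat)."""
    diff = beta_c - beta_hat
    return float(diff @ np.linalg.solve(V_beta, diff))

# Illustrative: identity covariance and a shift of (3, 4) gives 3^2 + 4^2 = 25
d = beta_distance(np.array([5.54, 4.78]), np.array([2.54, 0.78]), np.eye(2))
```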

Table 1 Estimates of the regression coefficients from the original data and the contaminated samples in Example 1

Figure 1 shows the scatter plot of the residuals versus the Mahalanobis norm of the regressors. The residuals \(e_{ij}(\hat{\varvec{\theta }})\) are computed at \(\hat{\varvec{\theta }}\). The Mahalanobis norm of \(\varvec{x}_i\) is computed as \(||\varvec{x}_i||_M=\left\{ (\varvec{x}_i- \hat{\varvec{\mu }}_X)' \hat{\varvec{V}}_X^{-1}(\varvec{x}_i- \hat{\varvec{\mu }}_X)\right\} ^{1/2}\), where \(\hat{\varvec{\mu }}_X\) and \(\hat{\varvec{V}}_X\) are the sample mean and the sample covariance matrix of \(\varvec{X}=(X_1,X_2)\) evaluated on the first n observations.
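The Mahalanobis norm used in Fig. 1 can be computed for all observations at once. A minimal sketch (our own names), using the classical sample mean and covariance:

```python
import numpy as np

def mahalanobis_norm(X):
    """||x_i||_M for each row of X, based on the sample mean and covariance."""
    mu = X.mean(axis=0)
    Vinv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    # quadratic form d_i' Vinv d_i for every row, then square root
    return np.sqrt(np.einsum('ij,jk,ik->i', d, Vinv, d))

# Illustrative data: four points near the origin plus one outlying design point
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [10., 10.]])
norms = mahalanobis_norm(X)
```

The outlying row receives the largest norm, as expected.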

Fig. 1
figure 1

Residuals versus the Mahalanobis norm of the covariates. The points corresponding to the contamination (i), (ii) and (iii) are indicated with a triangle, a square and a diamond respectively. For each contaminating point \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }})\) is reported

Contamination (i) is easily detected as an anomalous response, since \(||\varvec{x}^c||_M=1.063\) is fairly small, but the residual takes the outlying value \(e_{c}(\hat{\varvec{\theta }})=-4.25\). It yields \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }})=3.96\).

For contamination (ii) we have \(||\varvec{x}^c||_M=6.52\) with a small absolute residual \(e_{c}(\hat{\varvec{\theta }})=-0.94\). The distance \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }})=0\) shows that an extreme design point raises no concern if the response Y is consistent with the model.

Contamination (iii) corresponds to a fairly large design point (\(||\varvec{x}^c||_M=3.27\)) associated with an inconsistent response, which yields a large absolute residual \(e_{c}(\hat{\varvec{\theta }})=-5.37\). The corresponding point is strongly influential with distance \(d(\hat{\varvec{\beta }}^c, \hat{\varvec{\beta }})=19.36\).

4 Other types of residuals

Alternative residuals for ordinal response models have been proposed for various diagnostic purposes. Liu and Zhang (2018) propose the surrogate residuals, based on a continuous variable S which acts as a surrogate for the ordinal response Y and is identically distributed to the latent variable \(Y^*\). Given \(Y=j\), the distribution of S is obtained by truncating the distribution of \(\varvec{X} \varvec{\beta }+\epsilon \) to the interval \((\tau _{j-1},\tau _j)\). The surrogate residual is \(R_S=S-E(S|X)= S-\varvec{X} \varvec{\beta }-\int \epsilon \, dG(\epsilon )\). The variable S is generated by sampling, conditionally on the observed ordinal outcomes \((y_1, \ldots , y_n)\), according to a hypothetical probability model coherent with the assumed model for Y. In practice, after fitting the model, the parameters \(\varvec{\tau }\) and \(\varvec{\beta }\) are replaced by the corresponding estimates and the observed residual is \(r_i^S=s_i-\varvec{x}_i \hat{\varvec{\beta }} -\int \epsilon \, dG(\epsilon )\), where \(s_i\) is the sampled value of S. The surrogate residuals make it possible to work on the continuous space of the latent data, rather than on the discrete space of the original data. They are mainly used for diagnostics on the mean structure, the link function, the proportionality assumption and heteroscedasticity.
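For the probit model, a single surrogate residual can be drawn as sketched below (our names; `scipy.stats.truncnorm` handles the truncation, and \(\int \epsilon \, dG(\epsilon )=0\) for the standard normal error):

```python
import numpy as np
from scipy.stats import truncnorm

def surrogate_residual(y, eta, tau, seed=0):
    """One surrogate residual r^S = s - eta for a cumulative probit model.

    Given Y = y, S is drawn from N(eta, 1) truncated to (tau_{y-1}, tau_y)."""
    t = np.concatenate(([-np.inf], tau, [np.inf]))
    a, b = t[y - 1] - eta, t[y] - eta     # truncation bounds, standardized by loc=eta, scale=1
    s = truncnorm.rvs(a, b, loc=eta, scale=1.0, random_state=seed)
    return s - eta

# With eta = 0 and y = 3, S lies in (tau_2, tau_3) = (-1, 1)
r = surrogate_residual(y=3, eta=0.0, tau=np.array([-3.0, -1.0, 1.0, 3.0]))
```

Note that the residual depends on the simulated draw, which is the extra source of variability discussed in Sect. 5.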

Further alternative residuals are briefly reviewed in the supplementary material.

5 Outlier detection via residuals

A graphical inspection of the residuals may help in identifying critical observations. To illustrate the idea, a small experiment is carried out in the following example.

Example 2. A sample of 200 observations generated from model (5) is contaminated by an additional observation \((Y^c=2, X_1^c=1.0, X_2^c=0.5)\), whose probability under the model is smaller than 0.0001, hence decidedly unlikely. The estimation of the model is repeated for 100 samples. The left panel of Fig. 2 shows the boxplots of the ML-residuals obtained from each of the contaminated samples. The boxplots exhibit an anomalous residual (approximately \(-4\)), clearly distant from the others, signaling an atypical observation corresponding to the contaminating point.

Fig. 2
figure 2

Boxplot of the ML-residuals (left panel) and surrogate residuals (right panel) from data contaminated by the additional point \((Y=2, X_1=1.0, X_2=0.5)\) whose corresponding residual is indicated by an asterisk

Example 2 suggests a procedure based on the residuals, to be routinely applied for anomalous data detection. Let \(Q_1\) and \(Q_3\) be the 25th and 75th percentiles of the empirical distribution of the residuals and let \(IQR=Q_3-Q_1\). Residuals are considered large (in absolute value), i.e. corresponding to anomalous data, when they fall outside the interval \(\left( Q_1- k \, IQR, Q_3 + k \, IQR \right) \).

The remaining issue is the choice of k. A good outlier detection procedure should reveal anomalous data with high probability, hence it should have a high rate \(\gamma \) of ‘true positives’, i.e. correctly detected anomalous statistical units. There is however the risk of erroneously classifying ‘genuine’ observations (those generated by the model) as outliers, producing ‘false positives’, and it is desirable to achieve a low rate \(\alpha \) of false positives. Unfortunately increasing k reduces both rates \(\gamma \) and \(\alpha \), therefore the value of k derives from a compromise between the two conflicting needs of a high rate of true positives and a low rate of false positives.
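The boxplot-type rule above can be sketched in a few lines (the function name is ours; k is the constant under discussion):

```python
import numpy as np

def flag_outliers(residuals, k=2.0):
    """Flag residuals outside (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = np.percentile(residuals, [25, 75])
    iqr = q3 - q1
    return (residuals < q1 - k * iqr) | (residuals > q3 + k * iqr)

# Illustrative data: a tight bulk of residuals plus one gross outlier at -6
res = np.concatenate([np.linspace(-1.0, 1.0, 50), [-6.0]])
flags = flag_outliers(res, k=2.0)
```

Here only the gross outlier falls outside the interval, while the bulk of the residuals is untouched.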

Example 2 (continued). Fig. 3 shows the Receiver Operating Characteristic (ROC) curve for k between 1.5 and 3.5, obtained from a simulation on \(10{,}000\) samples with contaminating point \((Y^c=2, X_1^c=1.0, X_2^c=0.5)\). Notice that in each simulated sample there are \(n=200\) genuine observations, hence n rates \(\alpha _i\) of false positives can be computed, one for each observation, as the simulated percentage of erroneous classifications. To summarize the results, in a conservative approach, we consider the worst case, i.e. \(\alpha _{max}=\max _{i \in \{1,\ldots ,n\}}\alpha _i\), the maximum probability of labeling a genuine observation as an outlier. For \(k=2.5\) the (highest) rate of false positives is \(0.15\%\), very low, while the rate of true positives is \(98.5\%\). If k is decreased to 1.5 we get \(\alpha _{max}=2.2\%\) while \(\gamma =1\).

Fig. 3
figure 3

ROC curve for outlier detection—contaminating point \((Y^c=2, X_1^c=1.0, X_2^c=0.5)\)—the numbers associated to the points are the values of k

Extensive simulation experiments carried out in the supplementary material, for various kinds and amounts of contamination, show that suitable values for k lie in the interval (1.5, 2.5). Taking \(k=2.5\) yields a very small rate of false positives (usually \(\alpha _{max}\le 0.2\%\)) with very high rates of true positives (roughly between 0.7 and 1). For \(k=1.5\), \(\alpha _{max}\) can reach \(3.0\%\), still a fairly small probability.

In principle the same procedure can also be applied with the surrogate residuals. However, the simulation procedure generating the surrogate residuals introduces additional variability, which can inflate the rate \(\alpha \) of false positives and obscure outliers.

Example 2 (continued). The right panel of Fig. 2 shows the boxplots of the surrogate residuals for the same samples as in the left panel with the ML-residuals. The boxplots of the surrogate residuals are wider and the outlying residuals are not distinctly separated from the bulk of the other residuals. Consequently outlier detection can be very poor. Table 2 compares, for the same samples as in the ROC curve in Fig. 3, the rates of false positives and true positives obtained by the ML-residuals and by the surrogate residuals. For \(k=2.5\) the maximum rate of false positives by the surrogate residuals is \(1.6\%\), but the rate of true positives is as low as \(36.9\%\), hence rather unsatisfactory. To increase \(\gamma \) up to \(94.0\%\), k should be reduced to 2.0, but then \(\alpha _{max}\) can be as high as \(12.2\%\), too large to be acceptable. For \(k=2.5\) the ML-residuals identify the anomalous observation in \(98.5\%\) of cases with a rate of false positives that does not exceed \(0.2\%\).

Table 2 False positives and true positives by ML-residuals and surrogate residuals—contaminating point \((Y^c=2, X_1^c=1.0, X_2^c=0.5)\)

Table 3 shows a supplement of the previous experiment for the case in which the covariates are correlated, with \(cor(X_1,X_2)=0.7\). Dependence between the covariates does not alter the diagnostic performance of the ML- and surrogate residuals.

Table 3 False positives and true positives by ML-residuals and surrogate residuals when the covariates are correlated—contaminating point \((Y^c=2, X_1^c=1.0, X_2^c=0.5)\)

6 Diagnostics via robust residuals and weights

6.1 M-estimation

Iannario et al. (2017) propose an M-estimator of \(\varvec{\theta }\), which is the implicit solution \(\hat{\varvec{\theta }}_M\) of

$$\begin{aligned} \sum _{i=1}^{n} \, \psi (\hat{\varvec{\theta }}_M; y_i, \varvec{x}_i)=\sum _{i=1}^{n} \, S(\hat{\varvec{\theta }}_M; y_i, \varvec{x}_i)\,w(y_i, \varvec{x}_i; \hat{\varvec{\theta }}_M)-a(\hat{\varvec{\theta }}_M)=\varvec{0}, \end{aligned}$$
(6)

where the term \(a(\varvec{\theta })=E \big \{ S(\varvec{\theta }; Y, \varvec{X})\,w(Y, \varvec{X}; \varvec{\theta }) \big \} \) is required to achieve Fisher consistency, i.e. \(E\{\psi (\varvec{\theta }; Y, \varvec{X})\}=\varvec{0}\) (the estimator computed on the whole population yields the true value of the parameter; see Cox and Hinkley (1974, p. 287)). The weights \(w_i=w(y_i, \varvec{x}_i; \varvec{\theta })\) in (6) are designed to downweight outlying observations in order to control their impact on the estimation. If \(w_i \equiv 1\) then \(a(\varvec{\theta }) \equiv 0\) and \(\psi (\varvec{\theta }; y_i, \varvec{x}_i)\) coincides with \(S(\varvec{\theta }; y_i, \varvec{x}_i)\), showing that the MLE is a special case of M-estimators. An appropriate choice of the weights ensures that the influence function of \(\hat{\varvec{\theta }}_M\) is bounded (Hampel et al. 1986).

Since the observations with a large product \(|e_{ij}(\varvec{\theta })| \Vert \varvec{x}_i\Vert _M\) are the most influential, the Huber weights \(w_i =\min \left\{ 1, c /\kappa (e_{ij}, \varvec{x}_i)\right\} \) are considered, where \(\kappa (e_{ij}, \varvec{x}_i)=\sum _{j=1}^m \,I[y_i=j] \,| e_{ij}(\varvec{\theta })| \,\Vert \varvec{x}_i\Vert _M\), the Mahalanobis norm \(||\varvec{x}_i||_M\) is based on robust estimators of location and scale, and c is a suitable tuning constant. The weights are a decreasing function of the product \(| e_{ij}(\varvec{\theta })| \Vert \varvec{x}_i \Vert _M\) when it exceeds the tuning constant c. The value of c has to be chosen so that the loss of efficiency incurred by the M-estimators, with respect to the MLEs, does not exceed a given threshold (say 5% or 10%) when there is no contamination in the data. Iannario et al. (2017) suggest that appropriate values of c, when the probit link is adopted, vary roughly between 1 and 2.
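The Huber-type weights can be sketched as follows (names are ours; `xnorm` stands for the robust Mahalanobis norms, here simply supplied as input):

```python
import numpy as np

def huber_weights(e, xnorm, c=1.5):
    """w_i = min{1, c / (|e_i| * ||x_i||_M)} for each observation."""
    kappa = np.abs(e) * xnorm
    with np.errstate(divide='ignore'):     # kappa = 0 gives c/0 = inf, hence w = 1
        return np.minimum(1.0, c / kappa)

# A well-fitted point keeps full weight; a large residual at an
# outlying design point is strongly downweighted
w = huber_weights(np.array([0.2, -5.0]), np.array([1.0, 3.0]), c=1.5)
```

The second observation, with \(|e|\,\Vert x\Vert _M = 15\), receives weight \(c/15 = 0.1\), while the first keeps weight 1.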

Alternative M-type estimators are proposed by Croux et al. (2013) with similar residuals but weights depending only on the covariates. These estimators do not take into account the interaction between the response and the covariates.

When the interest focuses on the majority of the data, M-estimators provide a robust fitted model. If instead detecting anomalous and influential data is of interest, M-estimation has two further benefits. The residuals inherit the robustness of the estimators, and the weights, which simultaneously take into account the magnitude of both residuals and covariates, provide an essential tool for identifying influential data.

6.2 Outlier detection via M-residuals

The procedure for outlier detection performed with the ML-residuals in Sect. 5 can also be applied with the residuals generated by M-estimation. In the case of multiple outliers, masking effects may arise, which affect the MLEs and consequently spoil the diagnostic ability of the ML-residuals. The M-residuals, which rely on robust estimators, preserve their ability in outlier detection even under a considerable amount of contamination.

Example 3. Here we consider outlier detection performed through ML- and M-residuals when a sample of \(n=200\) observations generated from model (5) is contaminated by 12 outliers, whose values are in Table 4. They include both anomalous responses and bad leverage points, and are designed to generate masking effects.

Table 4 Contaminating data

The rates of false positives and true positives, obtained by a simulation on \(10{,}000\) samples, are displayed in Table 5. Since this experiment deals with multiple outliers, a true positive rate can be computed for each outlier. To summarize the results we consider both the average rate of true positives and the worst case, given by the minimum rate of true positives. Both rates of true positives are systematically higher for the M-residuals, without appreciable increase in the rate of false positives. When the number of outliers increases the detection ability of the ML-residuals collapses, while the M-residuals maintain their effectiveness. Similar results are obtained when \(cor(X_1,X_2)=0.7\), as shown in Table 6.

Table 5 False positives and true positives by ML- and M-residuals when the sample is contaminated by 12 anomalous points
Table 6 False positives and true positives by ML- and M-residuals when the sample is contaminated by 12 anomalous points and the covariates are correlated

Figure 4 compares the ROC curves based on the ML- and M-residuals when the sample is contaminated by the points of Table 4. The better performance of the M-residuals is clearly evident. For \(k=2\) they yield a high rate of true positives with a very low rate of false positives.

Fig. 4
figure 4

ROC curves when the sample is contaminated by 12 anomalous observations based on ML- (solid line with circles) and M- (dashed line with squares) residuals

This experiment, similarly to those of the supplementary material, shows that values of k around 2 provide a good compromise between the rate of false positives and the rate of true positives produced by M-residuals in the multiple outliers case.

Example 4. Three contaminating points are added to the original dataset of Example 1. They are \(\left( Y^c=1, X_1^c=15.0, X_2^c=1.5\right) \), \((Y^c=1, X_1^c=3.0, X_2^c=0)\) and \((Y^c=1, X_1^c=2.5, X_2^c=0.5)\). For all of them the covariates are positive, hence the response \(Y^c=1\) is anomalous. The three points differ in the outlyingness of the covariates: the Mahalanobis norms are 10.016, 1.947 and 1.637 respectively. Figure 5 shows the ML- and the M-residuals versus the Mahalanobis norm of the covariates. Although all three points are anomalous, the \(\varvec{x}_i\) of the first one is remarkably outlying. By comparing the two panels, the ML-residuals (on the left) associated with the contaminating points appear much smaller in absolute value than the corresponding M-residuals (on the right). In the ML-context, the first contaminating point acts as a leverage point in linear regression. Bad leverage points can affect the model fitted by the MLEs in such a way that their deviation from the tendency of the other points cannot even be seen. Hence, if \(\varvec{x}_i\) is outlying, a small ML-residual does not prevent an observation from being a bad leverage point. By contrast, the robust M-residuals clearly signal the outlying points.

Fig. 5
figure 5

ML-residuals (on the left) and M-residuals (on the right) versus the norm of the covariates. The points \(\left( Y^c=1, X_1^c=15.0, X_2^c=1.5\right) \), \(\left( Y^c=1, X_1^c=3.0, X_2^c=0\right) \) and \(\left( Y^c=1, X_1^c=2.5, X_2^c=0.5\right) \) are indicated with a triangle, a square and a diamond respectively

6.3 Analysis of influential data via M-weights

Atypical data have a different impact on the ML-estimators according to the product \(|e_{ij}(\varvec{\theta })| \Vert \varvec{x}_i\Vert _M\). An analysis of the weights, which are a decreasing function of \(|e_{ij}(\varvec{\theta })| \Vert \varvec{x}_i\Vert _M\), can reveal which statistical units are more influential for the MLEs (the weighting system in (6) evidently limits the impact of atypical data on the M-estimators). The smaller \(w_i\), the more influential the statistical unit, which can either be generated from the model or be an outlier. The concept is not binary: the influence of an observation can arise with varying intensity. Nevertheless, some guidelines for the identification and analysis of the impact of potentially influential points are important for assessing the sensitivity of the fitted model to small changes in the sample.

Example 5. To profile a guideline for the interpretation of the weights, an experiment has been carried out. 100 samples generated from model (5) have been contaminated by the 12 influential outliers of Table 4. Figure 6 shows the boxplots of the weights associated with each statistical unit (over the 100 samples). The first 200 boxplots correspond to the n observations generated by the model and are entirely flattened at \(w=1\), though occasionally a few of them can give rise to outlying low weights. The last 12 boxplots in the lower right corner correspond to the outliers. They are all located below 0.4, the larger \(\Vert \varvec{x}_i\Vert _M\) the lower the corresponding boxplot.

The results of Example 5 (along with those of the supplementary material) suggest, as a guideline, that weights smaller than 0.4 signal potentially influential data.

Fig. 6

Boxplot of the weights

7 Analysis of self-perceived health status

Most OECD countries conduct regular health surveys that allow respondents to express opinions on different aspects of their health. A commonly asked question, as reported in the SHARE questionnaire, is: “How is your health in general?” Despite the subjective nature of this question, indicators of perceived health are considered valuable predictors of people’s future health care use. Self-reported health is measured on a rating scale with 5 categories ranging from \(Excellent=1\) to \(Poor=5\).

In accordance with the literature (see Borgonovi and Pokropek (2016) and Ambrasat et al. (2011)), the following latent regression model is considered

$$\begin{aligned} Y_i^*=\beta _1 \, Maxgrip_i + \beta _2 \, Lifesat_i + \beta _3 \, Edu_i + \epsilon _i ,\qquad i=1,\ldots ,n, \end{aligned}$$
(7)

where Maxgrip is the maximum hand grip strength (evaluated on a 0–100 scale), Lifesat is the perceived life satisfaction measured on a scale from 0 to 10 (where 0 means completely dissatisfied and 10 completely satisfied with life), and Edu represents the years of education (which in our data range from 0 to 22). Lifesat may be handled as nominal, ordinal (see Espinosa and Hennig (2019)) or numeric; here we choose the last option, following Agresti (2010) and taking advantage of the large number of categories.

A cumulative probit model based on (7) is estimated on a sample of \(n=200\) respondents of the SHARE data. The MLEs yield

$$\begin{aligned}Y^*_i = -\underset{(0.007)}{0.017}\ \, Maxgrip_i - \underset{(0.039)}{0.097}\ \, Lifesat_i - \underset{(0.021)}{0.028}\ \, Edu_i +{{\hat{\epsilon }}}_i, \qquad i=1,\ldots , n, \end{aligned}$$

with \({{\hat{\tau }}}_1=-3.080\), \({{\hat{\tau }}}_2=-2.445\), \({{\hat{\tau }}}_3=-1.305\), \({{\hat{\tau }}}_4=-0.444\); the corresponding standard errors are reported in parentheses. The Wald statistics for the regression coefficients are \(t_{\beta _1}= -2.650\), \(t_{\beta _2}= -2.509\) and \(t_{\beta _3}= -1.315\), indicating a non-significant coefficient for the variable Edu.
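A Wald statistic is simply the estimate divided by its standard error. The sketch below recomputes the ratios from the rounded values reported above; because the published t-values were computed on unrounded estimates, the ratios differ slightly from those in the text.

```python
# Wald statistics t = beta_hat / se(beta_hat), using the rounded
# estimates and standard errors reported in the text.
estimates = {"Maxgrip": -0.017, "Lifesat": -0.097, "Edu": -0.028}
std_errors = {"Maxgrip": 0.007, "Lifesat": 0.039, "Edu": 0.021}

wald = {name: estimates[name] / std_errors[name] for name in estimates}
for name, t in wald.items():
    # |t| < 1.96 suggests non-significance at the nominal 5% level,
    # which is the case for Edu here.
    print(f"t_{name} = {t:.3f}")
```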

Nevertheless, if the model is fitted on the full dataset, we get

$$\begin{aligned}Y^*_i = -\underset{(0.000)}{0.015}\ \, Maxgrip_i - \underset{(0.002)}{0.205}\ \, Lifesat_i - \underset{(0.001)}{0.035}\ \, Edu_i +{{\hat{\epsilon }}}_i, \qquad i=1,\ldots , n, \end{aligned}$$

with Wald statistics \(t_{\beta _1}= -40.790\), \(t_{\beta _2}= -86.370\) and \(t_{\beta _3}= -35.060\).

The difference in the results is due to the occurrence of a few outliers, which potentially have a large influence on the estimators and on the related standard errors. Figure 7 shows the scatter plot of the residuals versus the Mahalanobis norm \(||\varvec{x}_i||_M=\left\{ (\varvec{x}_i- \tilde{\varvec{\mu }}_X)' \tilde{\varvec{V}}_X^{-1}(\varvec{x}_i- \tilde{\varvec{\mu }}_X)\right\} ^{1/2}\), where \(\tilde{\varvec{\mu }}_X\) and \(\tilde{\varvec{V}}_X\) are robust estimators of the location and covariance matrix of \(\varvec{X}\) based on the Minimum Covariance Determinant estimator (Rousseeuw 1984, 1985).
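The Mahalanobis norm above can be sketched as follows. The paper plugs in Minimum Covariance Determinant (MCD) estimates of the location and covariance; since a full MCD implementation is beyond a short sketch, the classical sample mean and covariance serve as stand-ins when robust estimates are not supplied.

```python
import numpy as np

def mahalanobis_norms(X, mu=None, V=None):
    """Norms ||x_i||_M = {(x_i - mu)' V^{-1} (x_i - mu)}^{1/2}.

    The paper uses MCD estimates for mu and V; here the classical
    sample mean/covariance are a stand-in when none are supplied.
    """
    X = np.asarray(X, dtype=float)
    if mu is None:
        mu = X.mean(axis=0)
    if V is None:
        V = np.cov(X, rowvar=False)
    diff = X - mu
    # Solve V z_i = (x_i - mu) instead of forming the explicit inverse,
    # then take the row-wise inner product diff_i' z_i.
    sq = np.einsum("ij,ij->i", diff, np.linalg.solve(V, diff.T).T)
    return np.sqrt(sq)
```

In practice, robust `mu` and `V` (e.g. from an MCD routine) would be passed in, so that outlying covariate vectors do not mask themselves by inflating the classical covariance estimate.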

In Fig. 7 we observe a large absolute residual corresponding to statistical unit 45 (with value \(-2.412\)), whose \(||\varvec{x}_i||_M=2.01\) is about average. This observation (in the lower left part of the scatter plot) is an anomalous response by an interviewee whose excellent perceived health status sharply conflicts with his very low grip strength. Observations 30 and 90 are instead characterized by large regressors (\(||\varvec{x}_i||_M>4\)) with residuals close to zero. They are outlying covariates associated with responses coherent with the model, which have little influence on the estimators. These observations are generally related to persons with no life satisfaction and a fairly high level of education who, coherently with the fitted model, indicate a poor perceived health status on the rating scale. Finally, statistical units 58, 97 and 191 are characterized by a large product \(e_{ij}(\varvec{\theta })||\varvec{x}_i||_M\) and are likely to be influential. The corresponding health status is excellent, but these respondents have an average or low grip strength and are entirely dissatisfied with their life.

Fig. 7

ML-residuals versus the norm of the covariates. The triangles are anomalous responses, the squares represent extreme design points, and the diamonds are potentially influential data

The left panel of Fig. 8 shows the boxplot of the ML-residuals. Taking \(k=2\), statistical units 97 and 58 turn out to be outliers. Removing these data from the sample yields the fitted model

$$\begin{aligned}Y^*_i = -\underset{(0.007)}{0.019}\ \, Maxgrip_i -\underset{(0.042)}{0.174}\ \, Lifesat_i -\underset{(0.022)}{0.034}\ \, Edu_i +{{\hat{\epsilon }}}_i, \qquad i=1,\ldots , n, \end{aligned}$$

with \(\hat{\varvec{\tau }}=(-3.939, -3.239, -2.034, -1.134)\), \(t_{\beta _1}= -2.917\), \(t_{\beta _2}= -4.115\) and \(t_{\beta _3}= -1.585\). The significance of the regression coefficient for education has increased.
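The flagging rule behind “taking \(k=2\)” can be read as the usual boxplot whisker rule, with the whisker multiplier as the tuning constant; this is an assumption about the rule being applied (the conventional default is \(k=1.5\)).

```python
import numpy as np

def boxplot_outliers(values, k=2.0):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    A sketch of the boxplot whisker rule presumed behind 'taking k=2';
    k = 1.5 is the conventional default.
    """
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return np.flatnonzero((v < lo) | (v > hi))

# Twenty well-behaved residuals plus one gross outlier: only the
# last entry falls outside the k = 2 whiskers.
resid = np.concatenate([np.linspace(-1, 1, 20), [-10.0]])
print(boxplot_outliers(resid, k=2.0))
```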

Fig. 8

Boxplot of the ML-residuals (left panel) and M-residuals (right panel)

The right panel of Fig. 8 shows the M-residuals. With \(k=2\), points 45, 58, 97 and 191 emerge as outliers. The same result is obtained from the ML-residuals with \(k=1.5\). Notice that the M-residuals signal outlying data in a more pronounced way and clearly separate anomalous responses from bad leverage points. Also removing statistical units 45 and 191 leads to

$$\begin{aligned}Y^*_i = -\underset{(0.007)}{0.020}\ \, Maxgrip_i -\underset{(0.046)}{0.243}\ \, Lifesat_i -\underset{(0.022)}{0.050}\ \, Edu_i +{{\hat{\epsilon }}}_i, \qquad i=1,\ldots , n, \end{aligned}$$

with \(\hat{\varvec{\tau }}=(-4.863, -4.063, -2.784, -1.844)\), \(t_{\beta _1}= -3.072\), \(t_{\beta _2}= -5.317\) and \(t_{\beta _3}= -2.260\). This outcome clearly points out that the model fitting the majority of the data should include education among the explanatory variables, and it is consistent with the model fitted by the M-estimators, which is

$$\begin{aligned}Y^*_i = -\underset{(0.006)}{0.018}\ \, Maxgrip_i -\underset{(0.041)}{0.208}\ \, Lifesat_i -\underset{(0.021)}{0.041}\ \, Edu_i +{{\hat{\epsilon }}}_i, \qquad i=1,\ldots , n, \end{aligned}$$

with \(\hat{\varvec{\tau }}=(-4.327, -3.555, -2.320, -1.415)\), \(t_{\beta _1}= -2.928\), \(t_{\beta _2}= -5.047\) and \(t_{\beta _3}= -1.920\). This result does not imply that the outliers should be discarded. It highlights that there is a model suitable for most of the data, though a few statistical units display a different behavior. These observations require discussion and should not be removed (unless they are erroneous).

Fig. 9

Boxplot of M-weights (left panel) and M-residuals versus the norm of the covariates (right panel)

The left panel of Fig. 9 shows the boxplot of the weights, which highlights that statistical units 58, 97 and 191 are highly influential (their weights lie between 0.077 and 0.084). The anomalous response, corresponding to statistical unit 45, also has a low weight (\(w_i=0.263\)). Along with these data, further points have a relevant impact on the estimation. They are also identified in the right panel of Fig. 9, which shows the scatter plot of the M-residuals versus the Mahalanobis norm of the covariates. The area delimited by the dashed lines contains the data with \(w_i<0.4\): among them are statistical units 151 (weight 0.264), 189 (weight 0.330), 9 (weight 0.377) and 157 (weight 0.364). Neither the residuals nor the covariates of these points are especially large; it is the product \(|e_{ij}(\varvec{\theta })| \Vert \varvec{x}_i\Vert _M\) that is substantial.

Finally, Fig. 10 shows the empirical influence function of the MLEs for the n statistical units, versus the robust Mahalanobis norm of the covariates, computed according to the procedure of Croux et al. (2013) and Pison and Van Aelst (2004), using the M-estimator of Sect. 6.1 as robust estimator. The vertical line corresponds to the cutoff value \(\chi ^2_{p,0.975}\), where \(\chi ^2_{p,\xi }\) is the \(100\xi \)-th percentile of the \(\chi ^2_p\) distribution. The horizontal line corresponds to the cutoff for the influence measure, obtained as the empirical 95-th percentile of a set of simulated overall empirical influence functions. According to Croux et al. (2013), Part I in the plot contains the regular observations, Part II corresponds to anomalous responses, Part III to the good leverage points and Part IV to the bad leverage points. Consequently, points 97, 191 and 58 are identified as bad leverage points, and unit 45 is identified as an anomalous response. These results are in complete agreement with the previous findings. Notice, however, that the procedure leading to Fig. 10 is based on simulation and is therefore considerably heavier from a computational viewpoint than the diagnostics based on the M-residuals and weights.
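The chi-square cutoff on the leverage axis can be computed directly. The sketch below assumes \(\xi =0.975\), a common choice for robust-distance cutoffs, and \(p=3\) covariates as in model (7); it also assumes the vertical line is drawn on the norm scale, hence the square root.

```python
import numpy as np
from scipy.stats import chi2

p = 3        # covariates in model (7): Maxgrip, Lifesat, Edu
xi = 0.975   # assumed percentile for the leverage cutoff

# Squared Mahalanobis distances of p-variate normal covariates are
# approximately chi^2_p distributed, so the cutoff on the norm scale
# is the square root of the chi-square percentile.
cutoff = np.sqrt(chi2.ppf(xi, df=p))
print(f"leverage cutoff on the norm scale: {cutoff:.3f}")
```

Points to the right of this line are leverage points; crossing it with the horizontal influence cutoff reproduces the four-part classification of the plot.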

Fig. 10

Empirical influence of the observations versus the norm of the covariates

8 Final remarks

The paper deals with diagnostics aimed at outlier detection and influential data identification in the context of ordinal response models, where anomalous respondent behavior and outlying covariates may affect the reliability of likelihood-based inference. The properties of the generalized residuals are derived by showing that they are naturally generated by ML- or M-estimation; their inspection proves to be a fundamental procedure for identifying atypical data. The assessment of influence, instead, is obtained directly from the weights produced by M-estimation, which simultaneously take into account the magnitude of both residuals and covariates. Some simple guidelines are provided for the analysis of both residuals and weights, suggesting a new methodology for diagnostics.

A case study concerning the perceived health status of elderly people is examined. Here the diagnostic procedures make it possible to recognize statistical units whose behavior contrasts with the model that fits the majority of the data, and to signal data that are influential for the fitted model.