Abstract
In this paper, we consider conditional selective inference (SI) for a linear model estimated after outliers are removed from the data. To apply the conditional SI framework, it is necessary to characterize the events describing how the robust method identifies outliers. Unfortunately, existing conditional SI methods cannot be directly applied to our problem because they only handle selection events that can be represented by linear or quadratic constraints. We propose a conditional SI method for popular robust regressions, such as least-absolute-deviation regression and Huber regression, by introducing a new computational method based on a convex optimization technique called the homotopy method. We show that the proposed conditional SI method is applicable to a wide class of robust regression and outlier detection methods, and that it performs well empirically on both synthetic and real data.
Notes
Since numerical results on the Huberized Lasso were not presented in Chen and Bien (2020), we implemented it ourselves (see Appendix C in the supplementary material).
References
Allgower, E. L., Georg, K. (1993). Continuation and path following. Acta Numerica, 2, 1–63.
Andrews, D. F. (1974). A robust method for multiple linear regression. Technometrics, 16(4), 523–531.
Atkinson, A. (1986). [Influential observations, high leverage points, and outliers in linear regression]: Comment: Aspects of diagnostic regression analysis. Statistical Science, 1(3), 397–402.
Bach, F. R., Heckerman, D., Horvitz, E. (2006). Considering cost asymmetry in learning classifiers. Journal of Machine Learning Research, 7, 1713–1741.
Barrodale, I., Roberts, F. D. (1973). An improved algorithm for discrete l_1 linear approximation. SIAM Journal on Numerical Analysis, 10(5), 839–848.
Benjamini, Y., Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association, 100(469), 71–81.
Benjamini, Y., Heller, R., Yekutieli, D. (2009). Selective inference in complex research. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 367(1906), 4255–4271.
Berk, R., Brown, L., Buja, A., Zhang, K., Zhao, L., et al. (2013). Valid post-selection inference. The Annals of Statistics, 41(2), 802–837.
Best, M. J. (1996). An algorithm for the solution of the parametric quadratic programming problem. Applied mathematics and parallel computing, 57-76.
Bickel, P. J. (1973). On some analogues to linear combinations of order statistics in the linear model. The Annals of Statistics, 597–616.
Brownlee, K. A. (1965). Statistical theory and methodology in science and engineering, Vol. 150. New York: Wiley.
Chen, S., Bien, J. (2019). Valid inference corrected for outlier removal. Journal of Computational and Graphical Statistics, 1–12.
Chen, S., Bien, J. (2020). Valid inference corrected for outlier removal. Journal of Computational and Graphical Statistics, 29(2), 323–334.
Choi, Y., Taylor, J., Tibshirani, R., et al. (2017). Selecting the number of principal components: Estimation of the true rank of a noisy matrix. The Annals of Statistics, 45(6), 2590–2617.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368), 829–836.
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1), 15–18.
Das, D., Duy, V. N. L., Hanada, H., Tsuda, K., Takeuchi, I. (2022). Fast and more powerful selective inference for sparse high-order interaction model. Proceedings of AAAI conference on artificial intelligence.
Duy, V. N. L., Takeuchi, I. (2021). Parametric programming approach for more powerful and general lasso selective inference. In International conference on artificial intelligence and statistics, 901–909.
Duy, V. N. L., Iwazaki, S., Takeuchi, I. (2020a). Quantifying statistical significance of neural network representation-driven hypotheses by selective inference. arXiv preprint arXiv:2010.01823.
Duy, V. N. L., Toda, H., Sugiyama, R., Takeuchi, I. (2020b). Computing valid p-value for optimal changepoint by selective inference using dynamic programming. Advances in Neural Information Processing Systems, 33, 11356–11367.
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.
Ellenberg, J. H. (1973). The joint distribution of the standardized least squares residuals from a general linear regression. Journal of the American Statistical Association, 68(344), 941–943.
Ellenberg, J. H. (1976). Testing for a single outlier from a general linear regression. Biometrics, 32, 637–645.
Fithian, W., Taylor, J., Tibshirani, R., Tibshirani, R. (2015). Selective sequential model selection. arXiv preprint arXiv:1512.02565.
Gal, T. (1995). Postoptimal analysis, parametric programming, and related topics. Berlin: Walter de Gruyter.
Giesen, J., Jaggi, M., Laue, S. (2012a). Approximating parameterized convex optimization problems. ACM Transactions on Algorithms (TALG), 9(1), 1–17.
Giesen, J., Müller, J., Laue, S., Swiercy, S. (2012b). Approximating concavely parameterized optimization problems. Advances in Neural Information Processing Systems, 25, 2105–2113.
Harvey, A. C. (1977). A comparison of preliminary estimators for robust regression. Journal of the American Statistical Association, 72(360a), 910–913.
Hastie, T., Rosset, S., Tibshirani, R., Zhu, J. (2004a). The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5, 1391–1415.
Hastie, T., Rosset, S., Tibshirani, R. (2004b). The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5, 1391–1415.
Hill, R. W., and Holland, P. W. (1977). Two robust alternatives to least-squares regression. Journal of the American Statistical Association, 72(360a), 828–833.
Hocking, T., Vert, J.-P., Bach, F., Joulin, A. (2011). Clusterpath: An algorithm for clustering using convex fusion penalties. Proceedings of the 28th international conference on machine learning, 745–752.
Hoeting, J., Raftery, A. E., Madigan, D. (1996). A method for simultaneous variable selection and outlier identification in linear regression. Computational Statistics & Data Analysis, 22(3), 251–270.
Huber, P. J. (2004). Robust statistics, Vol. 523. Hoboken, New Jersey: John Wiley & Sons.
Huber, P. J., et al. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. The Annals of Statistics, 1(5), 799–821.
Hyun, S., Lin, K., G’Sell, M., Tibshirani, R. J. (2018). Post-selection inference for changepoint detection algorithms with application to copy number variation data. arXiv preprint arXiv:1812.03644.
Joshi, P. C. (1972). Some slippage tests of mean for a single outlier in linear regression. Biometrika, 59(1), 109–120.
Karasuyama, M., Takeuchi, I. (2010). Nonlinear regularization path for quadratic loss support vector machines. IEEE Transactions on Neural Networks, 22(10), 1613–1625.
Karasuyama, M., Harada, N., Sugiyama, M., and Takeuchi, I. (2012). Multi-parametric solution-path algorithm for instance-weighted support vector machines. Machine Learning, 88(3), 297–330.
Koenker, R. (2005). Quantile regression. Cambridge: Cambridge University Press.
Koenker, R., Bassett, G., Jr. (1978). Regression quantiles. Econometrica, 46(1), 33–50.
Lee, G., Scott, C. (2007). The one class support vector machine solution path. Proceedings of international conference on acoustics speech and signal processing 2007, II521–II524.
Lee, J. D., Sun, D. L., Sun, Y., Taylor, J. E., et al. (2016). Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3), 907–927.
Leeb, H., Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21(1), 21–59.
Leeb, H., Pötscher, B. M., et al. (2006). Can one estimate the conditional distribution of post-model-selection estimators? The Annals of Statistics, 34(5), 2554–2591.
Lockhart, R., Taylor, J., Tibshirani, R. J., Tibshirani, R. (2014). A significance test for the lasso. Annals of Statistics, 42(2), 413.
Loftus, J. R., Taylor, J. E. (2014). A significance test for forward stepwise model selection. arXiv preprint arXiv:1405.3920.
Loftus, J. R., Taylor, J. E. (2015). Selective inference in regression models with groups of variables. arXiv preprint arXiv:1511.01478.
Maronna, R. A., Martin, R. D., Yohai, V. J., et al. (2019). Robust statistics: Theory and methods (with R). Hoboken, New Jersey: John Wiley & Sons.
Murty, K. G. (1983). Linear programming. Berlin: Springer.
Ndiaye, E., Takeuchi, I. (2019). Computing full conformal prediction set with approximate homotopy. Advances in Neural Information Processing Systems, 32, 1386–1395.
Ndiaye, E., Le, T., Fercoq, O., Salmon, J., Takeuchi, I. (2019). Safe grid search with optimal complexity. In International conference on machine learning, 4771–4780.
Ogawa, K., Imamura, M., Takeuchi, I., Sugiyama, M. (2013). Infinitesimal annealing for training semi-supervised support vector machines. International conference on machine learning, 897–905.
Osborne, M. R., Presnell, B., Turlach, B. A. (2000). A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20(3), 389–403.
Pan, J.-X., Fang, K.-T. (1995). Multiple outlier detection in growth curve model with unstructured covariance matrix. Annals of the Institute of Statistical Mathematics, 47(1), 137–153.
Panigrahi, S., Taylor, J., Weinstein, A. (2016). Bayesian post-selection inference in the linear model. arXiv preprint arXiv:1605.08824, 28.
Pötscher, B. M., Schneider, U., et al. (2010). Confidence sets based on penalized maximum likelihood estimators in Gaussian regression. Electronic Journal of Statistics, 4, 334–360.
Ritter, K. (1984). On parametric linear and quadratic programming problems. Mathematical programming: Proceedings of the international congress on mathematical programming, 307–335.
Rosset, S., Zhu, J. (2007). Piecewise linear regularized solution paths. Annals of Statistics, 35, 1012–1030.
Rousseeuw, P. J., Leroy, A. M. (2005). Robust regression and outlier detection, Vol. 589. Hoboken, New Jersey: John Wiley & Sons.
She, Y., Owen, A. B. (2011). Outlier detection using nonconvex penalized regression. Journal of the American Statistical Association, 106(494), 626–639.
Shibagaki, A., Suzuki, Y., Karasuyama, M., Takeuchi, I. (2015). Regularization path of cross-validation error lower bounds. Advances in Neural Information Processing Systems, 28, 1675–1683.
Shimodaira, H., Terada, Y. (2019). Selective inference for testing trees and edges in phylogenetics. Frontiers in Ecology and Evolution, 7, 174.
Srikantan, K. (1961). Testing for the single outlier in a regression model. Sankhyā: The Indian Journal of Statistics, Series A, 251–260.
Srivastava, M. S., von Rosen, D. (1998). Outliers in multivariate regression models. Journal of Multivariate Analysis, 65(2), 195–208.
Sugiyama, K., Duy, V. N. L., Takeuchi, I. (2020). More powerful and general selective inference for stepwise feature selection using the homotopy continuation approach. arXiv preprint arXiv:2012.13545.
Sugiyama, R., Toda, H., Duy, V. N. L., Inatsu, Y., Takeuchi, I. (2021). Valid and exact statistical inference for multi-dimensional multiple change-points by selective inference. arXiv preprint arXiv:2110.08989.
Suzumura, S., Nakagawa, K., Umezu, Y., Tsuda, K., Takeuchi, I. (2017). Selective inference for sparse high-order interaction models. Proceedings of the 34th international conference on machine learning, Vol. 70, 3338–3347.
Takeuchi, I., Sugiyama, M. (2011). Target neighbor consistent feature weighting for nearest neighbor classification. Advances in Neural Information Processing Systems, 24, 576–584.
Takeuchi, I., Nomura, K., Kanamori, T. (2009). Nonparametric conditional density estimation using piecewise-linear solution path of kernel quantile regression. Neural Computation, 21(2), 539–559.
Takeuchi, I., Hongo, T., Sugiyama, M., Nakajima, S. (2013). Parametric task learning. Advances in Neural Information Processing Systems, 26, 1358–1366.
Tanizaki, K., Hashimoto, N., Inatsu, Y., Hontani, H., Takeuchi, I. (2020). Computing valid p-values for image segmentation by selective inference. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 9553–9562.
Taylor, J., Lockhart, R., Tibshirani, R. J., Tibshirani, R. (2014). Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint arXiv:1401.3889, 354.
Terada, Y., Shimodaira, H. (2017). Selective inference for the problem of regions via multiscale bootstrap. arXiv preprint arXiv:1711.00949.
Tian, X., Taylor, J., et al. (2018). Selective inference with a randomized response. The Annals of Statistics, 46(2), 679–710.
Tibshirani, R. J., Taylor, J., Lockhart, R., Tibshirani, R. (2016). Exact post-selection inference for sequential regression procedures. Journal of the American Statistical Association, 111(514), 600–620.
Tsuda, K. (2007). Entire regularization paths for graph data. Proceedings of international conference on machine learning, 2007, 919–925.
Welsch, R. E., Kuh, E. (1977). Linear regression diagnostics. Technical Report 173, National Bureau of Economic Research, Cambridge, Massachusetts.
Yamada, M., Umezu, Y., Fukumizu, K., Takeuchi, I. (2018a). Post selection inference with kernels. In International conference on artificial intelligence and statistics, 152–160.
Yamada, M., Wu, D., Tsai, Y.-H. H., Takeuchi, I., Salakhutdinov, R., Fukumizu, K. (2018b). Post selection inference with incomplete maximum mean discrepancy estimator. arXiv preprint arXiv:1802.06226.
Yang, F., Barber, R. F., Jain, P., Lafferty, J. (2016). Selective inference for group-sparse linear models. Advances in Neural Information Processing Systems, 29, 2469–2477.
Zaman, A., Rousseeuw, P. J., Orhan, M. (2001). Econometric applications of high-breakdown robust regression techniques. Economics Letters, 71(1), 1–8.
Acknowledgements
This work was partially supported by MEXT KAKENHI (20H00601, 16H06538), JST CREST (JPMJCR21D3), JST Moonshot R&D (JPMJMS2033-05), JST AIP Acceleration Research (JPMJCR21U2), NEDO (JPNP18002, JPNP20006), and the RIKEN Center for Advanced Intelligence Project. We thank the two anonymous reviewers for their constructive comments, which helped us improve the paper.
Appendix A: Proofs
Proof of Lemma 1
Let \(\rho _i(z) = y_i(z)-\varvec{x}_i^\top \varvec{\beta }\) be a parametric residual for each i. Then, (21) can be written as
Furthermore, for each i, let \(\rho _i^+(z)\) and \(\rho _i^-(z)\) be non-negative numbers with \(\rho _i(z) =\rho _i^+(z)-\rho _i^-(z)\). Since at least one of \(\rho _i^+(z)\) and \(\rho _i^-(z)\) can be taken to be zero, the following equation holds:
Therefore, \(\hat{\varvec{\beta }}^{R}(z)\) in the parametric approach can be obtained by solving the following parametric linear programming problem:
Next, we show that (26) can be expressed as (22). The \(\varvec{\beta }(z)\) in (26) can be re-written as follows:
Thus, since \({\varvec{Y}} = {\varvec{a}} + {\varvec{b}} z\), by letting
we have (22). Finally, we show piecewise linearity. It is known that the optimal value of a parametric linear programming problem of the form (22) is a (convex) piecewise-linear function of the parameter z (see, e.g., Section 8.6 in Murty (1983)). Hence, the solution path \(\hat{\varvec{\beta }}^{R}(z)\) can be represented as a piecewise-linear function of z. \(\square \)
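As a purely illustrative check of this LP reformulation (not the paper's homotopy implementation), the LAD fit at any fixed z can be computed with an off-the-shelf LP solver. The data, the grid of z values, and the `lad_fit` helper below are our own assumptions; we use `scipy.optimize.linprog` for the solve.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 12, 2
X = rng.normal(size=(n, p))
a = rng.normal(size=n)   # y(z) = a + b z: offset of the parametrized response
b = rng.normal(size=n)   # direction of the parametrization

def lad_fit(y):
    # LP reformulation of LAD regression:
    #   min 1'(rho+ + rho-)  s.t.  X beta + rho+ - rho- = y,  rho+, rho- >= 0
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Evaluating the fit on a grid of z values gives points on the
# piecewise-linear solution path of Lemma 1.
zs = np.linspace(-2.0, 2.0, 9)
path = np.array([lad_fit(a + b * z) for z in zs])
```

The homotopy method of the paper instead tracks the path exactly between breakpoints rather than re-solving the LP on a grid.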
Proof of Lemma 2
Let \(\varvec{e}(z) = \varvec{y}(z)-X^\top \varvec{\beta }\) be the parametric residual vector. Suppose that \(\varvec{u}(z)\) and \(\varvec{v}(z)\) are vectors whose i-th elements \(u(z)_i\) and \(v(z)_i\) are respectively given by
where \(e(z)_i\) is the i-th element of \(\varvec{e}(z)\). Then, the following inequality holds:
In addition, the objective function can be expressed as follows:
where \(\varvec{1}\) is the all-ones vector. Thus, \(\hat{\varvec{\beta }}^{R}(z)\) in the parametric approach can be obtained by solving the following parametric quadratic programming problem:
Therefore, noting that \({\varvec{y}} (z) = {\varvec{a}} + {\varvec{b}} z\), letting
it can be shown that (27) is the same as (23). \(\square \)
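For intuition about the Huber fit itself, it can also be computed by iteratively reweighted least squares (IRLS), a standard alternative to the QP reformulation above. The toy data, the injected outlier, and the `huber_fit` helper are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, delta = 40, 2, 1.0
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.2, size=n)
y[0] += 8.0                      # inject one gross outlier

def huber_obj(beta):
    # Huber loss: quadratic for small residuals, linear for large ones.
    e = y - X @ beta
    ae = np.abs(e)
    return np.where(ae <= delta, 0.5 * e**2, delta * (ae - 0.5 * delta)).sum()

def huber_fit(iters=200):
    # IRLS: each step solves a weighted least-squares problem whose
    # quadratic objective majorizes the Huber objective at the current point.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # start from OLS
    for _ in range(iters):
        ae = np.abs(y - X @ beta)
        w = np.where(ae <= delta, 1.0, delta / np.maximum(ae, 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta

beta_huber = huber_fit()
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because each IRLS step minimizes a majorizer of the Huber objective, the objective is non-increasing along the iterations, so the final fit is at least as good (in Huber loss) as the OLS starting point and is typically much less distorted by the gross outlier.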
Next, we consider the KKT optimality conditions of the reformulated optimization problem (21), and show that the optimal solutions are represented as piecewise-linear functions of z by showing that, when the active variables (nonzero variables at the optimal solution) do not change, the optimal solutions linearly change with z (Proposition 1). Let \(\hat{\varvec{u}}(z)\) and \(\hat{\varvec{r}} (z)\) be the vectors of optimal Lagrange multipliers. Then, the KKT conditions of (23) are given by
where \(\varvec{h}(z)=(\varvec{y}(z)\;\;-\varvec{y}(z) \;\; \varvec{0} \;\; \delta \cdot \varvec{1_n} \;\; \varvec{0})^\top \in {\mathbb {R}}^{5n}\). Furthermore, letting \({\mathcal {A}}_z=\{i \in [m] : \hat{u}_i(z)> 0\}\) and \({\mathcal {A}}_z^c=[m] \; \backslash \; {\mathcal {A}}_z\), we have
Then, the following proposition holds:
Proposition 1
Let \(S_{{\mathcal {A}}_z}\) be the submatrix of S consisting of the rows indexed by \({\mathcal {A}}_z\). Consider two real values z and \(z'\) with \(z < z'\). If \({\mathcal {A}}_z = {\mathcal {A}}_{z'}\), then we have
where \(\psi (z) \in {\mathbb {R}}^n\), \(\gamma (z) \in {\mathbb {R}}^{|{\mathcal {A}}_z|}\), \( \begin{bmatrix} \psi (z)\\ \gamma (z) \end{bmatrix} = \left[ \begin{array}{cc} P & S_{{\mathcal {A}}_z}^\top \\ S_{{\mathcal {A}}_z} & 0 \end{array} \right] ^{-1} \left[ \begin{array}{c} \varvec{0}\\ \varvec{u}_{1,{\mathcal {A}}_z} \end{array} \right] .\)
Proposition 1 means that \(\hat{\varvec{\beta }}^{R}(z)\) is a piecewise-linear function of z on the interval \([z,z^\prime ]\). Finally, we prove Proposition 1.
Proof
According to (28), we have the following linear system
By decomposing \(\varvec{h}(z)=\varvec{u}_0+\varvec{u}_1 z\), (31) can be re-written as
Letting \( \begin{bmatrix} \psi (z)\\ \gamma (z) \end{bmatrix} = \left[ \begin{array}{cc} P & S_{{\mathcal {A}}_z}^\top \\ S_{{\mathcal {A}}_z} & \varvec{0} \end{array} \right] ^{-1} \left[ \begin{array}{c} \varvec{0} \\ \varvec{u}_{1,{\mathcal {A}}_z} \end{array} \right] \) with \(\psi (z)\in {\mathbb {R}}^n\) and \(\gamma (z) \in {\mathbb {R}}^{|{\mathcal {A}}_z|}\), the result in Proposition 1 follows immediately. \(\square \)
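The block linear system in this proof can be verified numerically. In the sketch below, P, \(S_{{\mathcal {A}}_z}\), and \(\varvec{u}_{1,{\mathcal {A}}_z}\) are replaced by random stand-ins of compatible shapes (an assumption for illustration only; they are not the actual matrices of the QP).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 2
M = rng.normal(size=(n, n))
P = M @ M.T + np.eye(n)          # positive-definite quadratic term (stand-in)
S_A = rng.normal(size=(m, n))    # rows of S indexed by the active set (stand-in)
u1_A = rng.normal(size=m)        # corresponding entries of u_1 (stand-in)

# KKT block system of Proposition 1:
#   [ P     S_A^T ] [ psi   ]   [ 0    ]
#   [ S_A   0     ] [ gamma ] = [ u1_A ]
K = np.block([[P, S_A.T], [S_A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), u1_A]))
psi, gamma = sol[:n], sol[n:]
# While the active set stays fixed, the primal solution moves linearly in z:
#   beta(z') = beta(z) + (z' - z) * psi.
```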
Proof of Lemma 3
We derive the truncation region \({\mathcal {Z}}\) in the interval \([z_{t-1},z_t].\) For any \(t \in [T]\) and \(i \in [n]\), the residual \(r_i^R(z)\) in the interval \([z_{t-1},z_t]\) is given by
The condition for the truncation region is the following inequality:
This implies that
From (32) and (33), z is classified into the following four cases based on the value of \(g_{t,i}\):
If \(g_{t,i} = 0\), the truncation region is the entire interval when \(|f_{t,i}|\ge \xi \) holds and is empty otherwise. From the above, \(\mathcal {V}_{t, i}\) is derived as
where we define \([\ell , u] = \emptyset \) if \(\ell > u\). \(\square \)
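The case analysis of Lemma 3 amounts to intersecting \([z_{t-1}, z_t]\) with the solution set of \(|f_{t,i} + g_{t,i} z| \ge \xi \), assuming the residual is affine in z on the interval as in (32). A minimal sketch (the function name and example inputs are our own):

```python
def truncation_interval(f, g, xi, lo, hi):
    """Portion of [lo, hi] where |f + g*z| >= xi, mirroring the
    case analysis on the sign of g in Lemma 3."""
    if g == 0.0:
        # Condition does not depend on z: keep all or nothing.
        return [(lo, hi)] if abs(f) >= xi else []
    z1 = (-xi - f) / g        # solves f + g*z = -xi
    z2 = (xi - f) / g         # solves f + g*z = +xi
    if g > 0:
        pieces = [(lo, min(hi, z1)), (max(lo, z2), hi)]
    else:
        pieces = [(lo, min(hi, z2)), (max(lo, z1), hi)]
    # Convention from the lemma: [l, u] is empty if l > u.
    return [(l, u) for l, u in pieces if l <= u]

# e.g. |z| >= 1 on [-2, 2] keeps the two outer pieces:
print(truncation_interval(0.0, 1.0, 1.0, -2.0, 2.0))  # → [(-2.0, -1.0), (1.0, 2.0)]
```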
Proof of Lemma 4
We derive the truncation region \({\mathcal {Z}}\) in the interval \([z_{t-1},z_t]\). For any \(t \in [T]\) and \((i, i^\prime ) \in \mathcal {O}(y^{\mathrm{observed}}) \times \left( [n] \setminus \mathcal {O}(y^\mathrm{observed})\right) \), the residual \(r_i^R(z)\) in the interval \([z_{t-1},z_t]\) is given by
The condition of the truncation region is the following:
Since a direct case analysis of the absolute values is cumbersome, we instead square both sides:
where \(\alpha =g_{t,i}^2-g_{t,i'}^2\), \(\beta =2f_{t,i}g_{t,i}-2f_{t,i'}g_{t,i'}\), and \(\gamma =f_{t,i}^2-f_{t,i'}^2\). The region satisfying (35) coincides with the region satisfying (34). Next, we consider three cases according to the sign of \(\alpha \).
When \(\alpha =0\), from \(\beta z + \gamma \ge 0\), z is classified into
If \(\beta =0\), the truncation region is the entire interval when \(\gamma \ge 0\) holds and is empty otherwise. Therefore, when \(\alpha = 0\), the truncation region can be expressed as
When \(\alpha >0\), using the quadratic formula, inequality (35) is
Therefore, the truncation region can be derived using the union as
When \(\alpha <0\), using the quadratic formula, inequality (35) is
Therefore, the truncation region can be derived using the intersection as
From the above, \(\mathcal {W}_{t, i}\) is derived as
\(\square \)
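The three cases of Lemma 4 reduce to intersecting \([z_{t-1}, z_t]\) with the solution set of the quadratic inequality \(\alpha z^2 + \beta z + \gamma \ge 0\). A self-contained sketch (the function name and example inputs are our own):

```python
import math

def quad_region(alpha, beta, gamma, lo, hi):
    """Portion of [lo, hi] where alpha*z^2 + beta*z + gamma >= 0,
    following Lemma 4's cases on the sign of alpha."""
    if alpha == 0.0:
        if beta == 0.0:
            return [(lo, hi)] if gamma >= 0 else []
        z0 = -gamma / beta
        piece = (max(lo, z0), hi) if beta > 0 else (lo, min(hi, z0))
        return [piece] if piece[0] <= piece[1] else []
    disc = beta * beta - 4.0 * alpha * gamma
    if disc < 0.0:
        # No real roots: the quadratic has the sign of alpha everywhere.
        return [(lo, hi)] if alpha > 0 else []
    sq = math.sqrt(disc)
    r1, r2 = sorted([(-beta - sq) / (2.0 * alpha), (-beta + sq) / (2.0 * alpha)])
    if alpha > 0:   # union of the two outer pieces
        pieces = [(lo, min(hi, r1)), (max(lo, r2), hi)]
    else:           # intersection: only the middle piece survives
        pieces = [(max(lo, r1), min(hi, r2))]
    return [(l, u) for l, u in pieces if l <= u]

# e.g. z^2 - 1 >= 0 on [-3, 3] keeps the two outer pieces:
print(quad_region(1.0, 0.0, -1.0, -3.0, 3.0))  # → [(-3.0, -1.0), (1.0, 3.0)]
```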
Cite this article
Tsukurimichi, T., Inatsu, Y., Duy, V.N.L. et al. Conditional selective inference for robust regression and outlier detection using piecewise-linear homotopy continuation. Ann Inst Stat Math 74, 1197–1228 (2022). https://doi.org/10.1007/s10463-022-00846-2