
Shuffled Linear Regression with Outliers in Both Covariates and Responses

Published in: International Journal of Computer Vision

Abstract

This paper studies a shuffled linear regression problem. As a variant of ordinary linear regression, it requires estimating not only the regression variable, but also permutational correspondences between the covariates and responses. While existing formulations require the underlying ground-truth correspondences to be an ideal bijection such that all pieces of data should match, such a requirement barely holds in real-world applications due to either missing data or outliers. In this work, we generalize the formulation of shuffled linear regression to a broader range of conditions where only a part of the data should correspond. To this end, the effective recovery condition and NP-hardness of the proposed formulation are also studied. Moreover, we present a simple yet effective algorithm for deriving the solution. Its global convergence property and convergence rate are also analyzed in detail. Distinct tasks validate the effectiveness of our proposed formulation and the solution method.


Notes

  1. Some implementations are from ProbReg: http://probreg.readthedocs.io/, last accessed on 2022/12/14 12:14:20.

References

  • Abid, A., & Zou, J. (2018). A stochastic expectation-maximization approach to shuffled linear regression. In Proceedings of annual allerton conference on communication, control, and computing.

  • Abid, A., Poon, A., & Zou, J. (2017). Linear regression with shuffled labels. ArXiv Preprint ArXiv:1705.01342.

  • Aoki, Y., Goforth, H., Srivatsan, R.A., & Lucey, S. (2019). Pointnetlk: Robust & efficient point cloud registration using pointnet. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 7163–7172.

  • Arun, K. S., Huang, T. S., & Blostein, S. D. (1987). Least-squares fitting of two 3-d point sets. Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.1987.4767965.


  • Attouch, H., & Bolte, J. (2009). On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1), 5–16.


  • Attouch, H., Bolte, J., Redont, P., & Soubeyran, A. (2010). Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-łojasiewicz inequality. Mathematics of Operations Research, 35(2), 438–457.


  • Aubry, M., Schlickewei, U., & Cremers, D. (2011). The wave kernel signature: A quantum mechanical approach to shape analysis. In Proceedings of international conference on computer vision workshops (ICCV workshops).

  • Bell, J., & Stevens, B. (2009). A survey of known results and research areas for n-queens. Discrete Mathematics, 309(1), 1–31.


  • Birdal, T., & Simsekli, U. (2019). Probabilistic permutation synchronization using the riemannian structure of the birkhoff polytope. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 11,105–11,116.

  • Bogo, F., Romero, J., Loper, M., & Black, M.J. (2014). FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Bolte, J., Daniilidis, A., & Lewis, A. (2007). The łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4), 1205–1223.


  • Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1), 459–494.


  • Bronstein, A.M., Bronstein, M.M., & Kimmel, R. (2008). Numerical geometry of non-rigid shapes.

  • Cai, Z., Chin, T.J., Le, H., & Suter, D. (2018) Deterministic consensus maximization with biconvex programming. In Proceedings of European conference on computer vision (ECCV), pp. 685–700.

  • Campbell, D., & Petersson, L. (2015). An adaptive data representation for robust point-set registration and merging. In Proceedings of international conference on computer vision (ICCV).

  • Chetverikov, D., Svirko, D., Stepanov, D., & Krsek, P. (2002). The trimmed iterative closest point algorithm. Object Recognition Supported by User Interaction for Service Robots, 3, 545–548.


  • Chin, T. J., & Suter, D. (2017). The maximum consensus problem: Recent algorithmic advances. Synthesis Lectures on Computer Vision, 7(2), 1–194.


  • Choi, S., Kim, T., & Yu, W. (2009) Performance evaluation of RANSAC family. In Proceedings of British machine vision conference (BMVC).

  • Curless, B., & Levoy, M. (1996). A volumetric method for building complex models from range images. In Proceedings of annual conference on computer graphics and interactive techniques.

  • Date, K., & Nagi, R. (2016). Gpu-accelerated hungarian algorithms for the linear assignment problem. Parallel Computing, 57, 52–72.


  • De Menezes, D., Prata, D. M., Secchi, A. R., & Pinto, J. C. (2021). A review on robust m-estimators for regression analysis. Computers & Chemical Engineering, 147, 107254.


  • Doornik, J.A. (2011). Robust estimation using least trimmed squares. Tech. rep., Institute for Economic Modelling, Oxford Martin School, and Economics Department, University of Oxford, UK.

  • Eckart, B., Kim, K., & Jan, K. (2018). Eoe: Expected overlap estimation over unstructured point cloud data. In Proceedings of international conference on 3D vision (3DV), pp. 747–755.

  • Elhami, G., Scholefield, A., Haro, B.B., & Vetterli, M. (2017). Unlabeled sensing: Reconstruction algorithm and theoretical guarantees. In Proceedings of international conference on acoustics, speech, and signal processing (ICASSP), pp. 4566–4570.

  • Fiori, M., Sprechmann, P., Vogelstein, J., Musé, P., & Sapiro, G. (2013). Robust multimodal graph matching: Sparse coding meets graph matching. In Proceedings of conference on neural information processing systems (NIPS).

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.


  • Fogel, F., Jenatton, R., Bach, F., & d’Aspremont, A. (2013). Convex relaxations for permutation problems. In Proceedings of conference on neural information processing systems (NIPS).

  • Gao, W., & Tedrake, R. (2019). Filterreg: Robust and efficient probabilistic point-set registration using gaussian filter and twist parameterization. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Gold, S., Rangarajan, A., Lu, C. P., Pappu, S., & Mjolsness, E. (1998). New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31(8), 1019–1031.


  • Gunawardana, A., & Byrne, W. (2005). Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, 6, 2049–2073.


  • Haghighatshoar, S., & Caire, G. (2017). Signal recovery from unlabeled samples. Transactions on Signal Processing, 66(5), 1242–1257.


  • Hahnel, D., Burgard, W., Fox, D., Fishkin, K., & Philipose, M. (2004). Mapping and localization with rfid technology. In Proceedings of international conference on robotics and automation (ICRA), vol. 1, pp. 1015–1020.

  • Hampel, F. (2014). Robust inference. Statistics Reference Online.

  • Hampel, F. R. (1985). The breakdown points of the mean combined with some rejection rules. Technometrics, 27, 95–107.


  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision.

  • Hawkins, D. M. (1994). The feasible solution algorithm for least trimmed squares regression. Computational Statistics & Data Analysis, 17(2), 185–196.


  • Hsu, D.J., Shi, K., & Sun, X. (2017) Linear regression without correspondence. In Proceedings of conference on neural information processing systems (NIPS).

  • Huber, P.J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics, pp. 492–518.

  • Jia, K., Chan, T. H., Zeng, Z., Gao, S., Wang, G., Zhang, T., & Ma, Y. (2016). Roml: A robust feature correspondence approach for matching objects in a set of images. International Journal of Computer Vision, 117(2), 173–197.


  • Jiang, H., Stella, X. Y., & Martin, D. R. (2010). Linear scale and rotation invariant matching. Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1339–1355.


  • Kuhn, A., & Mayer, H. (2015). Incremental division of very large point clouds for scalable 3d surface reconstruction. In Proceedings of international conference on computer vision workshops (ICCV workshops).

  • Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.


  • Larranaga, P., Kuijpers, C. M. H., Murga, R. H., Inza, I., & Dizdarevic, S. (1999). Genetic algorithms for the travelling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13(2), 129–170.


  • Le, H., Chin, T.J., & Suter, D. (2017) An exact penalty method for locally convergent maximum consensus. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1888–1896.

  • Le, H., Chin, T. J., Eriksson, A., Do, T. T., & Suter, D. (2019). Deterministic approximate methods for maximum consensus robust fitting. Transactions on Pattern Analysis and Machine Intelligence, 43(3), 842–857.


  • Li, H., & Hartley, R. (2007). The 3d-3d registration problem revisited. In Proceedings of international conference on computer vision (ICCV).

  • Li, F., Fujiwara, K., Okura, F., & Matsushita, Y. (2021). Generalized shuffled linear regression. In Proceedings of international conference on computer vision (ICCV).

  • Lian, W., & Zhang, L. (2014). Point matching in the presence of outliers in both point sets: A concave optimization approach. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Li, J., So, A. M. C., & Ma, W. K. (2020). Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. Signal Processing Magazine, 37(5), 18–31.


  • Lowe, D.G. (1999). Object recognition from local scale-invariant features. In Proceedings of international conference on computer vision (ICCV).

  • Lubiw, A. (1981). Some np-complete problems similar to graph isomorphism. SIAM Journal on Computing, 10(1), 11–21.


  • Maciel, J., & Costeira, J. P. (2003). A global solution to sparse correspondence problems. Transactions on Pattern Analysis and Machine Intelligence, 25(2), 187–199.


  • Marques, M., Stošić, M., & Costeira, J. (2009). Subspace matching: Unique solution to point matching with geometric constraints. In Proceedings of international conference on computer vision (ICCV), pp. 1288–1294.

  • Maset, E., Arrigoni, F., & Fusiello, A. (2017). Practical and efficient multi-view matching. In Proceedings of international conference on computer vision (ICCV), pp. 4568–4576.

  • Mathias, R. (2006). The Linear Algebra a Beginning Graduate Student Ought to Know.

  • Melzi, S., Ren, J., Rodolà, E., Sharma, A., Wonka, P., & Ovsjanikov, M. (2019). Zoomout: Spectral upsampling for efficient shape correspondence. Transactions on Graphics, 38(6).

  • Mohamed, I. S., Capitanelli, A., Mastrogiovanni, F., Rovetta, S., & Zaccaria, R. (2019). A 2d laser rangefinder scans dataset of standard eur pallets. Data in brief, 24, 103837.


  • Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.


  • Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. In Proceedings of symposium on security and privacy.

  • Nejatbakhsh, A., & Varol, E. (2021). Neuron matching in c. elegans with robust approximate linear regression without correspondence. In Proceedings of winter conference on applications of computer vision (WACV).

  • Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., & Guibas, L. (2012). Functional maps: A flexible representation of maps between shapes. Transactions on Graphics, 31(4).

  • Pachauri, D., Kondor, R., & Singh, V. (2013). Solving the multi-way matching problem by permutation synchronization. In Proceedings of conference on neural information processing systems (NIPS), vol. 26.

  • Pananjady, A., Wainwright, M.J., & Courtade, T.A. (2017). Denoising linear models with permuted data. In Proceedings of international symposium on information theory (ISIT).

  • Pananjady, A., Wainwright, M. J., & Courtade, T. A. (2017). Linear regression with shuffled data: Statistical and computational limits of permutation recovery. Transactions on Information Theory, 64(5), 3286–3300.


  • Pomerleau, F., Liu, M., Colas, F., & Siegwart, R. (2012). Challenging data sets for point cloud registration algorithms. International Journal of Robotics Research, 31(14), 1705–1711.


  • Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., & Grzeszczuk, R. (2012). 3d city modeling from street-level data for augmented reality applications. In Proceedings of international conference on 3D imaging, modeling, processing, visualization & transmission, pp. 238–245.

  • Ren, J., Poulenard, A., Wonka, P., & Ovsjanikov, M. (2018). Continuous and orientation-preserving correspondences via functional maps. Transactions on Graphics, 37(6), 1–6.


  • Rousseeuw, P.J., & Leroy, A.M. (2005). Robust regression and outlier detection.

  • Rusinkiewicz, S. (2019). A symmetric objective function for icp. Transactions on Graphics, 38(4).

  • Rusu, R.B., Blodow, N., & Beetz, M. (2009). Fast point feature histograms (fpfh) for 3d registration. In Proceedings of international conference on robotics and automation (ICRA).

  • Shiratori, T., Berclaz, J., Harville, M., Shah, C., Li, T., Matsushita, Y., & Shiller, S. (2015). Efficient large-scale point cloud registration using loop closures. In Proceedings of international conference on 3D vision (3DV), pp. 232–240.

  • Slawski, M., Ben-David, E., et al. (2019). Linear regression with sparsely permuted data. Electronic Journal of Statistics, 13(1), 1–36.


  • Slawski, M., Ben-David, E., & Li, P. (2019). A two-stage approach to multivariate linear regression with sparsely mismatched data. Journal of Machine Learning Research, 21(204), 1–42.


  • Stošić, M., Marques, M., & Costeira, J. P. (2011). Convex solution of a permutation problem. Linear Algebra and its Applications, 434(1), 361–369.


  • Theiler, P., Schindler, K., et al. (2012). Automatic registration of terrestrial laser scanner point clouds using natural planar surfaces. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, 173–178.


  • Tsakiris, M., & Peng, L. (2019). Homomorphic sensing. In Proceedings of international conference on machine learning (ICML).

  • Unnikrishnan, J., Haghighatshoar, S., & Vetterli, M. (2018). Unlabeled sensing with random linear measurements. Transactions on Information Theory, 64(5), 3237–3253.


  • Vestner, M., Lähner, Z., Boyarski, A., Litany, O., Slossberg, R., Remez, T., Rodola, E., Bronstein, A., Bronstein, M., & Kimmel, R., et al. (2017). Efficient deformable shape correspondence via kernel matching. In Proceedings of international conference on 3D vision (3DV).

  • Volgenant, A. (2004). Solving the k-cardinality assignment problem by transformation. European Journal of Operational Research, 157(2), 322–331.


  • Vongkulbhisal, J., De la Torre, F., & Costeira, J. P. (2018). Discriminative optimization: Theory and applications to computer vision. Transactions on Pattern Analysis and Machine Intelligence, 41(4), 829–843.


  • Wang, F., Xue, N., Yu, J.G., & Xia, G.S. (2020). Zero-assignment constraint for graph matching with outliers. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1912–1920.

  • Xu, Y., & Yin, W. (2013). A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3), 1758–1789.


  • Xu, Y., & Yin, W. (2017). A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing, 72(2), 700–734.


  • Yadav, S.S., Lopes, P.A.C., Ilic, A., & Patra, S.K. (2019). Hungarian algorithm for subcarrier assignment problem using gpu and cuda. International Journal of Communication Systems, 32(4).

  • Yang, E., Lozano, A. C., & Aravkin, A. (2018). A general family of trimmed estimators for robust high-dimensional data analysis. Electronic Journal of Statistics, 12(2), 3519–3553.


  • Yang, H., Shi, J., & Carlone, L. (2020). Teaser: Fast and certifiable point cloud registration. Transactions on Robotics, 37(2), 314–333.


  • Zangwill, W.I. (1969). Nonlinear programming: a unified approach.

  • Zhang, H., Slawski, M., & Li, P. (2019). Permutation recovery from multiple measurement vectors in unlabeled sensing. In Proceedings of international symposium on information theory (ISIT).

  • Zhou, Q.Y., Park, J., & Koltun, V. (2016). Fast global registration. In Proceedings of European conference on computer vision (ECCV).


Acknowledgements

This work was supported by NII CRIS collaborative research program operated by NII CRIS and LINE Corporation.

Author information


Corresponding author

Correspondence to Feiran Li.

Additional information

Communicated by Federica Arrigoni.


Appendices

Appendix A: Supplementary information for the analyses of GSLR

1.1 A.1 Derivation of Corollary 1

Given a generic linear subspace \(\mathcal {V} \subset \mathbb {R}^m\) of dimension d (hereafter denoted as \(\textsf{dim}\left( \mathcal {V}\right) = d\)), and a finite set of endomorphisms \(\mathcal {T}\) of \(\mathbb {R}^m\), homomorphic sensing (Tsakiris and Peng 2019) characterizes the condition for unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\), i.e., the condition under which \(\tau _1\left( v_1\right) =\tau _2\left( v_2\right) \) implies \(v_1 = v_2\) for all \(v_1, v_2 \in \mathcal {V}\) and \(\tau _1, \tau _2 \in \mathcal {T}\).

For self-containment, let us first recall Theorem 1 and Proposition 3 of Tsakiris and Peng (2019), which can be jointly stated as the following corollary:

Corollary 2

(Unique recovery condition of homomorphic sensing) Assuming \(\forall \tau \in \mathcal {T}, \textsf{rank}\left( \tau \right) \ge 2d\), we have unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\) as long as

$$\begin{aligned} \textsf{dim}\left( \mathcal {V}\right) \le \mathcal {Q}_{\tau _1, \tau _2}, \ \forall \tau _1, \tau _2 \in \mathcal {T}, \end{aligned}$$
(16)

where \(\mathcal {Q}_{\tau _1, \tau _2}\) is an intermediate technical term dependent on \(\tau _1\) and \(\tau _2\) such that, in case \(\pi _1, \pi _2 \in \Pi ^\ddagger \) are full-rank square permutation matrices, we have

$$\begin{aligned} \mathcal {Q}_{\rho \pi _1, \rho \pi _2} \ge \frac{\textsf{rank}\left( \rho \right) }{2}, \ \forall \pi _1, \pi _2 \in \Pi ^\ddagger , \end{aligned}$$
(17)

where \(\rho \) denotes a coordinate projection matrix.

To apply Corollary 2 to GSLR, we shall consider \(\textbf{Ax}\) as a whole for some d-dimensional \(\textbf{x}\) and set \(\mathcal {V} = \{\textbf{Ax} : \textbf{x} \in \mathcal {X}\}\). Since a linear mapping does not increase dimension, we have \(\textsf{dim}\left( \mathcal {V}\right) \le d\). Furthermore, by setting \(\mathcal {T} = \Pi _k\) and the \(\rho \) in Eq. (17) to the projections that map from \(\Pi ^\ddagger \) to \(\Pi _k\), we obtain

$$\begin{aligned} \mathcal {Q}_{\tau _1, \tau _2} \ge \frac{\textsf{rank}\left( \rho \right) }{2} = \frac{k}{2}, \ \forall \tau _1, \tau _2 \in \Pi _k. \end{aligned}$$
(18)

Using Eq. (18) as a lower bound for Eq. (16), we can immediately recognize \(k \ge 2d\) as a sufficient condition for unique recovery.
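The sufficient condition \(k \ge 2d\) can be sanity-checked numerically. The sketch below is our own illustration (not part of the original analysis): on a tiny random instance with \(d = 1\) and \(k = 2\), it enumerates every k-cardinality matching between rows of \(\textbf{A}\) and entries of a shuffled \(\textbf{b}\), and verifies that, generically, only the ground-truth regression variable attains zero residual.

```python
import itertools
import numpy as np

def zero_residual_solutions(A, b, k, tol=1e-8):
    """Enumerate all k-cardinality matchings between rows of A and
    entries of b, solve least squares on each matched subset, and
    return the distinct solutions achieving (near-)zero residual."""
    n, _ = A.shape
    found = []
    for rows in itertools.combinations(range(n), k):
        for cols in itertools.permutations(range(len(b)), k):
            Ak, bk = A[list(rows)], b[list(cols)]
            x, *_ = np.linalg.lstsq(Ak, bk, rcond=None)
            if np.linalg.norm(Ak @ x - bk) < tol:
                if not any(np.allclose(x, y, atol=1e-6) for y in found):
                    found.append(x)
    return found

rng = np.random.default_rng(0)
d, n, k = 1, 4, 2                  # k >= 2d, matching Corollary 1
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = rng.permutation(A @ x_true)    # fully shuffled, noiseless responses
solutions = zero_residual_solutions(A, b, k)
```

For generic (continuous random) data, `solutions` contains exactly one element, and it coincides with `x_true`, illustrating the unique-recovery claim.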

1.2 A.2 Proof of Proposition 1

Proof of Proposition 1

Since the bijectivity will always be satisfied as long as Problem 2 is solved, we here show that if there exists an f solving Problem 2 that takes the graphs G and H defined in Proposition 1 as inputs, then this f is a linear mapping from a subset of \(\textbf{A}\) to \(\textbf{b}\). Specifically, since \(\textsf{dom}\left( f\right) \) is a finite discrete set, any f that solves Problem 2 is a continuous mapping. Furthermore, assuming that vertices are mapped in the form of \(\textbf{A}_i\) to \(\textbf{b}_i\) and \(\textbf{A}_j\) to \(\textbf{b}_j\), we have

$$\begin{aligned} \underbrace{f\left( \textbf{A}_i + \textbf{A}_j\right) }_{\text {edge in} {\tilde{G}}} = \underbrace{\textbf{b}_i + \textbf{b}_j}_{\hbox { edge in}\ H} = \underbrace{f\left( \textbf{A}_i\right) + f\left( \textbf{A}_j\right) }_{\hbox { vertices in}\ H} \end{aligned}$$
(19)

always holds w.r.t. the preimage of \(\textbf{b}\), indicating that f is additive. Combining the continuity and the additivity, we obtain that f is a linear mapping (Mathias 2006). \(\square \)
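For completeness, the last step (continuity plus additivity implies linearity) follows the classical resolution of Cauchy's functional equation: additivity alone yields homogeneity over the rationals,

$$\begin{aligned} f\left( \tfrac{n}{m} v\right) = \tfrac{n}{m} f\left( v\right) , \quad \forall n, m \in \mathbb {N}, \end{aligned}$$

and continuity extends the identity to real scalars, \(f\left( cv\right) = \lim _{q \rightarrow c, q \in \mathbb {Q}} f\left( qv\right) = cf\left( v\right) \) for all \(c \in \mathbb {R}\).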

Appendix B: Supplementary information for the convergence analyses

1.1 B.1 Proofs of the global convergence

We here show that the GSLR triplet mentioned in Sect. 4.2.2 satisfies all the conditions of Theorem 1. For self-containment, we first recall the definition of closedness of a set-valued mapping:

Definition 1

(Closedness of set-valued mapping (Gunawardana and Byrne 2005)) The closedness of a set-valued mapping \(\mathcal {F}: \mathcal {U} \rightarrow \mathcal {V}\) generalizes the notion of continuity of point-to-point mappings: \(\mathcal {F}\) is closed at a point \(u \in \mathcal {U}\) if

  • \(u_k \rightarrow u\), where \(u_k \in \mathcal {U}\),

  • \(v_k \rightarrow v\), where \(v, v_k \in \mathcal {V}\),

  • \(v_k \in \mathcal {F}(u_k)\),

imply \(v \in \mathcal {F}(u)\). Based on this, \(\mathcal {F}\) is called closed if it is closed at every point of \(\mathcal {U}\).

Now we are ready to introduce the proofs:

Proof of the compactness of \(\mathcal {Z}\) Since \(\Pi \) is discrete and finite, it is compact. Since \(\mathcal {X}\) is compact per Hypotheses H and finite unions of compact sets are compact, \(\mathcal {Z}\) is compact. \(\square \)

Proof of the closedness of \( \mathcal {H}\) For any \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \), there exist some \(\delta _1, \delta _2 \in \mathbb {R}^{+}\) that satisfy

$$\begin{aligned} \begin{aligned} \forall \textbf{P}', \ \textbf{P}' \rightarrow \textbf{P} :=\{\textbf{P}' : \textbf{P}' \in \mathcal {N}_{\delta _1}(\textbf{P}_t)\} = \{\textbf{P}_t\}, \\ \forall \textbf{x}', \ \textbf{x}' \rightarrow \textbf{x} :=\left|\mathbf {x'}, \textbf{x}_t\right|_\textbf{x} < \delta _2, \end{aligned} \end{aligned}$$
(20)

where \(\mathcal {N}_{\delta _1}(\textbf{P}_t)\) is the neighborhood of \(\textbf{P}_t\). Therefore, we can derive the differences of mapped \(\textbf{x}\) as

$$\begin{aligned} \begin{aligned}&\mathcal {H}(\mathbf {P', x'}) \setminus \mathcal {H}(\textbf{P}_t, \textbf{x}_t) = \mathcal {H}(\textbf{P}_t, \textbf{x}_t) \setminus \mathcal {H}(\mathbf {P', x'}) \\ =&\mathcal {H}(\textbf{P}_t, \textbf{x}') \setminus \mathcal {H}(\textbf{P}_t, \textbf{x}_t) \\ =&\mathop {\textrm{argmin}}\limits _{\textbf{x} \in \mathcal {X}}\Psi (\textbf{P}_t, \textbf{x}) \setminus \mathop {\textrm{argmin}}\limits _{\textbf{x} \in \mathcal {X}}\Psi (\textbf{P}_t, \textbf{x}) = \emptyset . \end{aligned} \end{aligned}$$
(21)

Eq. (21) implies that the cost matrix for the k-LAP generated by the regression variable \(\textbf{x} \in \mathcal {H}(\textbf{P, x})\) remains unchanged when the input of \(\mathcal {H}\) switches from \(\left( \textbf{P}_t, \textbf{x}_t\right) \) to \(\left( \mathbf {P'}, \mathbf {x'}\right) \), and hence so does the mapped permutation \(\textbf{P}\). Joining Eqs. (20) and (21), we conclude that \(\mathcal {H}\) is closed at \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \). \(\square \)

Proof of the continuity and monotonicity of \(\Psi \) For the continuity, since compositions of continuous mappings are continuous, we only need to prove the continuity of \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) and \(g(\textbf{P}) = \textbf{PP}^T\textbf{b}\). Since the proofs for f and g follow the same roadmap, we take f as an example. Given that f is a bilinear form over the finite-dimensional space of \(\left( \textbf{P, x}\right) \), it suffices to show that it is separately continuous w.r.t. \(\textbf{P}\) and \(\textbf{x}\). Per Hypotheses H, \(f(\textbf{x}) = \textbf{Ax}\) is continuous w.r.t. \(\textbf{x} \in \mathcal {X}\). Since \(\Pi \) is finite and discrete, \(f(\textbf{P}) = \textbf{PA}\) is also continuous w.r.t. any \(\textbf{P} \in \Pi \). Therefore, \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) is continuous and hence so is \(\Psi \).

For the monotonicity, conducting the linear regression step w.r.t. \(\textbf{x}\) gives

$$\begin{aligned} J(\textbf{P}_t, \textbf{x}_{t+1}) \le J(\textbf{P}_t, \textbf{x}_{t}), \end{aligned}$$
(22)

where the equality holds if and only if \(\textbf{x}_t\) is a critical point. Furthermore, let \(\textbf{P}_{\textrm{tmp}}\) be a temporary permutation matrix obtained by randomly removing \(k_{t+1} - k_t\) non-zero assignments from \(\textbf{P}_t\); since the cost matrix \(\textbf{D}_{ij} = \left( \textbf{A}_i\textbf{x}-\textbf{b}_j\right) ^2\) is non-negative, we obtain

$$\begin{aligned} J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}) \le J(\textbf{P}_{t}, \textbf{x}_{t+1}). \end{aligned}$$
(23)

Since \(\textbf{P}_{t+1}\) is a global optimum of the objective function w.r.t. \(\textbf{P}\) and preserves the same number of inliers as \(\textbf{P}_{\textrm{tmp}}\), we can further derive that

$$\begin{aligned} J(\textbf{P}_{t+1}, \textbf{x}_{t+1}) \le J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}), \end{aligned}$$
(24)

where the equality holds if and only if \(\textbf{P}_\textrm{tmp}\) is a critical point. Combining Eqs. (22)-(24) leads to:

$$\begin{aligned} J(\textbf{P}_{t+1}, \textbf{x}_{t+1}) \le J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}) \le J(\textbf{P}_{t}, \textbf{x}_{t}). \end{aligned}$$
(25)

Therefore, we can conclude that the objective value of Algorithm 1 is monotonically decreasing and that the equality in Eq. (25) holds if and only if \(\left( \textbf{P}_t, \textbf{x}_t\right) \) is already a critical point. \(\square \)
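The monotone decrease in Eq. (25) can be observed directly on a toy instance. The sketch below is our own minimal re-implementation of the alternation under simplifying assumptions: the k-cardinality assignment is solved exactly by brute-force enumeration (feasible only for tiny n, in place of a Hungarian-type solver), and the number of matches k is held fixed across iterations. The recorded objective values never increase, as the proof above guarantees.

```python
import itertools
import numpy as np

def k_lap(D, k):
    """Exact k-cardinality linear assignment by brute force: choose k
    (row, column) pairs, no row or column reused, minimizing total cost."""
    best_cost, best_pairs = float("inf"), None
    for rows in itertools.combinations(range(D.shape[0]), k):
        for cols in itertools.permutations(range(D.shape[1]), k):
            cost = sum(D[i, j] for i, j in zip(rows, cols))
            if cost < best_cost:
                best_cost, best_pairs = cost, list(zip(rows, cols))
    return best_pairs

def alternate(A, b, k, iters=8):
    """Alternate (i) least-squares refitting of x on the matched pairs
    and (ii) exact re-matching, recording J after each full round."""
    pairs = list(zip(range(k), range(k)))  # arbitrary initial matching
    history = []
    for _ in range(iters):
        rows, cols = zip(*pairs)
        x, *_ = np.linalg.lstsq(A[list(rows)], b[list(cols)], rcond=None)
        D = ((A @ x)[:, None] - b[None, :]) ** 2  # D_ij = (A_i x - b_j)^2
        pairs = k_lap(D, k)
        history.append(sum(D[i, j] for i, j in pairs))
    return history

rng = np.random.default_rng(1)
n, d, k = 5, 2, 3
A = rng.standard_normal((n, d))
b = rng.permutation(A @ rng.standard_normal(d)) + 0.1 * rng.standard_normal(n)
J_hist = alternate(A, b, k)
```

Each recorded value is \(J(\textbf{P}_{t+1}, \textbf{x}_{t+1})\): the least-squares step cannot increase J for a fixed matching, and the exact assignment step cannot increase it for a fixed \(\textbf{x}\), so `J_hist` is non-increasing.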

1.2 B.2 Proofs of Propositions 2 and 3

Throughout this appendix, apart from the notations described in Table 1, we also use C to denote a constant whose value may vary from line to line.

To prove Proposition 2, due to the non-smoothness of our Objective (6), we follow the concept of the KL-inequality generalized to sub-analytic sets (Bolte et al. 2007). Let us first recall the definitions for self-containment:

Definition 2

(Kurdyka-Łojasiewicz inequality) Let f be a continuous sub-analytic function (i.e., a function whose graph is a sub-analytic set) with closed domain that maps to \(\mathbb {R}\), and let \({\bar{x}} \in \textsf{dom}\left( f\right) \) be a critical point. Then there exists a concave function \(\phi \left( s\right) = Cs^{1 - \theta }\) with \(\theta \in \left[ 0, 1\right) \) and \(C > 0\) such that the following inequality holds for any x within a neighborhood of \({\bar{x}}\), under the conventions \(0^0=1\) and \(0/0=\infty /\infty =0\):

$$\begin{aligned} \phi '\left( f(x) - f({\bar{x}})\right) \textrm{dis}\left( 0, \partial f(x) \right) \ge 1, \end{aligned}$$
(26)

where \(\textrm{dis}\left( 0, \partial f(x)\right) = \inf \{ \left\| x^\star \right\| : x^\star \in \partial f(x) \}\) denotes the point-to-set distance.
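As a concrete illustration (our addition, not part of the original text), the smooth function \(f(x) = x^2\) satisfies the KL-inequality at its critical point \({\bar{x}} = 0\) with \(\theta = 1/2\) and \(C = 1\), since \(\partial f(x) = \{2x\}\):

$$\begin{aligned} \phi '\left( f(x) - f({\bar{x}})\right) \textrm{dis}\left( 0, \partial f(x)\right) = \frac{1}{2}\left( x^2\right) ^{-1/2} \cdot \left| 2x\right| = 1 \ge 1. \end{aligned}$$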

Definition 3

(Semi-X & sub-analytic sets) A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called semi-algebraic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\) such that

$$\begin{aligned} \mathcal {S} \cap \mathcal {N} = \bigcap _i^{m} \bigcup _j^{n} \left\{ \textbf{x} \in \mathcal {N} : f_{ij}(\textbf{x}) = 0, g_{ij}(\textbf{x}) \ge 0 \right\} , \end{aligned}$$

where \(f_{ij}\) and \(g_{ij}\) are polynomials and \(m, n < +\infty \).

A semi-analytic set generalizes the above definition by relaxing \(f_{ij}\) and \(g_{ij}\) to real analytic functions.

A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called sub-analytic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\), and there exists a bounded semi-analytic set \(\mathcal {Y} \subset \mathbb {R}^{d+o}\) such that \(\mathcal {S} \cap \mathcal {N}\) is the projection from \(\mathcal {Y}\) to \(\mathcal {N}\):

$$\begin{aligned} \mathcal {S} \cap \mathcal {N} = \pi (\mathcal {Y}), \end{aligned}$$

where \(\pi : \mathbb {R}^{d+o} \rightarrow \mathbb {R}^{d}\) is the projection. In summary, the class of sub-analytic sets broadens that of semi-analytic sets, which in turn broadens that of semi-algebraic sets.

We also introduce the following lemma:

Lemma 1

The set of generalized permutation matrices \(\Pi \) can be characterized by the following polynomials:

$$\begin{aligned} \begin{aligned} \textbf{P1} \le \textbf{1} \ \textrm{and} \ \textbf{P}^T\textbf{1} \le \textbf{1}, \\ \textbf{e}_i^T \textbf{D} \textbf{e}_i \left( \textbf{e}_i^T \textbf{D} \textbf{e}_i - 1\right) = 0, \forall i \in \left\lceil n \right\rfloor , \\ \textbf{e}_i^T \textbf{D} \textbf{e}_j = 0, \forall i, j \in \left\lceil n \right\rfloor , i \ne j, \\ \end{aligned} \end{aligned}$$

where \(\textbf{e}_i\) is the one-hot vector with 1 at position i, \(\textbf{D} \in \{ \textbf{PP}^T, \textbf{P}^T\textbf{P}\}\) (i.e., the last two conditions are imposed on both products), and \(\left\lceil n \right\rfloor \) denotes the set of positive integers no greater than the size of \(\textbf{D}\).

Proof of Lemma 1 This lemma extends a known property of full-rank permutation matrices, namely that a matrix is a permutation matrix if and only if it is orthogonal and doubly stochastic (Fogel et al. 2013). The latter two equations imply that the diagonal entries of \(\textbf{D}\) lie in \(\{0, 1\}\) and the off-diagonal ones equal 0. While necessity is guaranteed by the definition of generalized permutation matrices, we demonstrate sufficiency by showing that these polynomials force \(\textbf{P}\) to be binary. Specifically, assume a matrix \(\textbf{P}\) satisfying the above polynomials has entries \(\textbf{P}_{ip}, \ldots , \textbf{P}_{iq} \in \left( 0, 1\right) \) with \(1 \le p < q \le n\) in the \(i^{\textrm{th}}\) row. Then the \(pq^{\textrm{th}}\) entry of \(\textbf{D} = \textbf{P}^T\textbf{P}\) reads

$$\begin{aligned} \begin{aligned} \textbf{D}_{pq} = \sum _k \textbf{P}^T_{pk} \textbf{P}_{kq} = \sum _k \textbf{P}_{kp} \textbf{P}_{kq} \ne 0, \end{aligned} \end{aligned}$$
(27)

which contradicts the requirement that the off-diagonal entries of \(\textbf{D}\) should be 0. The case \(\textbf{D} = \textbf{PP}^T\) follows from the same argument. \(\square \)
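Lemma 1 can be sanity-checked numerically. The sketch below is our own illustration (the helper name and tolerance are arbitrary); it tests the three polynomial conditions on a candidate matrix:

```python
import numpy as np

def satisfies_lemma1(P, tol=1e-9):
    """Check the polynomial characterization of generalized permutation matrices:
    row/column sums at most 1, diagonal entries of D in {0, 1}, off-diagonals 0,
    for both D = P P^T and D = P^T P."""
    P = np.asarray(P, dtype=float)
    if np.any(P.sum(axis=1) > 1 + tol) or np.any(P.sum(axis=0) > 1 + tol):
        return False  # violates P1 <= 1 or P^T 1 <= 1
    for D in (P @ P.T, P.T @ P):
        d = np.diag(D)
        if np.any(np.abs(d * (d - 1)) > tol):   # diagonals must be 0 or 1
            return False
        if np.any(np.abs(D - np.diag(d)) > tol):  # off-diagonals must be 0
            return False
    return True

# A generalized (partial) permutation matrix: one row left unmatched.
P_ok = np.array([[0., 1., 0.],
                 [0., 0., 1.],
                 [0., 0., 0.]])

# Doubly substochastic but fractional: violates the binary condition.
P_bad = np.array([[0.5, 0.5, 0.],
                  [0.5, 0.5, 0.],
                  [0.,  0.,  1.]])
```

As the proof of Lemma 1 predicts, `P_bad` fails because the fractional entries in its first two rows produce a nonzero off-diagonal entry in \(\textbf{P}^T\textbf{P}\).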

Now we are ready to prove Proposition 2:

Proof of Proposition 2 The first property is a direct result of the fact that \(\triangledown _\textbf{x} J = 2\left( \textbf{A}^T\textbf{P}^T\textbf{PA}\textbf{x} - \textbf{A}^T\textbf{P}^T\textbf{b}\right) \) is a continuous linear mapping.

For the second property, Lemma 1 shows that \(\Pi \) is semi-algebraic (hence sub-analytic). Moreover, given that \(\mathcal {X}\) is sub-analytic by assumption, that Objective (6) is polynomial, and that finite unions and intersections preserve sub-analyticity per Definition 3, the graph of Objective (6) is sub-analytic. Combining this sub-analyticity with the continuity mentioned in Sect. 4.2.2 yields the KL inequality. \(\square \)

To prove Proposition 3, we introduce the following two lemmas:

Lemma 2

Suppose that Hypotheses H are satisfied. Then at the \(t^{\textrm{th}}\) step of Algorithm 1, the following holds:

$$\begin{aligned} \left( \triangledown _{\textbf{x}} J(\textbf{P}_{t}, \textbf{x}_t) - \triangledown _{\textbf{x}} J(\textbf{P}_{t-1}, \textbf{x}_t), \textbf{0}\right) \in \partial J(\textbf{P}_t, \textbf{x}_t ). \end{aligned}$$

Proof of Lemma 2 Per the alternating update rule of Algorithm 1 we have

$$\begin{aligned} \textbf{0} = \triangledown _\textbf{x} J\left( \textbf{P}_{t-1}, \textbf{x}_t\right) , \end{aligned}$$
(28)
$$\begin{aligned} \textbf{0} \in \partial _{\textbf{P}} J\left( \textbf{P}_{t}, \textbf{x}_t\right) . \end{aligned}$$
(29)

From Eq. (28) we can obtain

$$\begin{aligned} \begin{aligned} \triangledown _\textbf{x} J\left( \textbf{P}_{t}, \textbf{x}_t\right) - \triangledown _\textbf{x} J\left( \textbf{P}_{t-1}, \textbf{x}_t\right) \in \partial _{\textbf{x}} J\left( \textbf{P}_{t}, \textbf{x}_t\right) , \end{aligned} \end{aligned}$$
(30)

which together with Eq. (29) concludes the proof. \(\square \)
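The two update rules in Eqs. (28) and (29) can be illustrated on a toy fully-matched instance. The sketch below is our own simplification, assuming the plain objective \(J(\textbf{P}, \textbf{x}) = \Vert \textbf{b} - \textbf{PAx}\Vert ^2\) with \(\textbf{P}\) a full permutation; it solves the \(\textbf{x}\)-step in closed form and the \(\textbf{P}\)-step by exhaustive search (viable only for tiny n), and is not the paper's Algorithm 1, which additionally handles partial matchings and outliers:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = (A @ x_true)[rng.permutation(n)]          # shuffled responses

def best_permutation(A, b, x):
    """P-step (cf. Eq. 29): exact minimization over all permutations."""
    preds = A @ x
    return min(itertools.permutations(range(n)),
               key=lambda p: np.sum((b - preds[list(p)]) ** 2))

x = np.zeros(d)
energies = []
for _ in range(10):
    p = list(best_permutation(A, b, x))
    # x-step (cf. Eq. 28): closed-form least squares for fixed correspondences
    x, *_ = np.linalg.lstsq(A[p], b, rcond=None)
    energies.append(float(np.sum((b - A[p] @ x) ** 2)))
```

Because each step exactly minimizes J in one block of variables while the other is held fixed, the recorded energies are non-increasing, which is the behavior that Lemma 3 quantifies.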

Lemma 3

Suppose that Hypotheses H are satisfied. Then at the \(t^{\textrm{th}}\) step of Algorithm 1, there exists \(C \in \mathbb {R}^{+}\) such that the following holds:

$$\begin{aligned} J_{t} - J_{t+1} \ge C \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned}$$

Proof of Lemma 3 We base the proof on the quotient law of convergent sequences. In detail, let us define the sequences:

  1. Sequence of differences between energy values: \(P = \left\{ J_{t} - J_{t+1} : t \in \mathbb {N} \right\} \).

  2. Sequence of squared differences between solutions: \(Q = \left\{ \left\| \textbf{z}_t - \textbf{z}_{t+1}\right\| ^2 : t \in \mathbb {N} \right\} \).

  3. Sequence of quotients: \(S = \left\{ \frac{P_t}{Q_t} : t \in \mathbb {N} \right\} \).

Per the compactness of \(\mathcal {Z}\) and the continuity of J, both \(\textbf{z}\) and J are bounded. Moreover, per the global convergence properties mentioned in Sect. 4.2.2, for any \(t \in \mathbb {N}\) we have \(P_t, Q_t \ge 0\) and \(P_t = 0 \Leftrightarrow Q_t = 0\). Therefore, with the convention \(\frac{0}{0} = 1\), every \(s \in S\) satisfies \(s \in \mathbb {R}_{+}\), which implies that there exists some \(C \in \mathbb {R}_{+}\) such that \(\inf S \ge C\). \(\square \)

With Lemmas 2 and 3 in hand, we now can prove Proposition 3:

Proof of Proposition 3 According to Lemma 2 and the Lipschitz continuity of \( \triangledown _{\textbf{x}} J\), we have

$$\begin{aligned} \begin{aligned} \textrm{dis}\left( \textbf{0}, \partial J(\textbf{z}_t)\right)&\le \left\| \triangledown _{\textbf{x}} J(\textbf{P}_{t-1}, \textbf{x}_t) - \triangledown _{\textbf{x}} J(\textbf{P}_t, \textbf{x}_t)\right\| \\&\le L \left\| \textbf{P}_{t-1} - \textbf{P}_t\right\| \le L \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| , \end{aligned} \end{aligned}$$
(31)

where \(L \in \mathbb {R}^{+}\) is the Lipschitz constant.

Assuming the energy value \({\bar{J}}\) at the critical point satisfies \({\bar{J}} = 0\) without loss of generality, combining Eq. (31) and Eq. (26) leads to

$$\begin{aligned} \phi '(J_t) = \phi '( J_t - {\bar{J}}) \ge \frac{1}{L} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| ^{-1}. \end{aligned}$$
(32)

On the other hand, by using the concavity of \(\phi \) and Lemma 3, we can obtain

$$\begin{aligned} \begin{aligned} \phi (J_t) - \phi (J_{t+1})&\ge \phi '(J_t)\left( J_t - J_{t+1}\right) \\&\ge C \phi '(J_t) \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned} \end{aligned}$$
(33)

Taking Eq. (32) into Eq. (33) yields

$$\begin{aligned} \begin{aligned} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| \cdot \frac{L}{C} \left( \phi (J_t) - \phi (J_{t+1}) \right) \ge \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned} \end{aligned}$$
(34)

Taking the square root on both sides of Eq. (34) and applying the inequality \(a^2 + b^2 \ge 2ab\) results in

$$\begin{aligned} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| + \frac{L}{C} \left( \phi (J_t) - \phi (J_{t+1}) \right) \ge 2 \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| . \end{aligned}$$
(35)

Summing both sides of Eq. (35) over \(t \in \left[ p, +\infty \right) \) and telescoping leads to

$$\begin{aligned} \begin{aligned} \sum _{t=p}^{+\infty } \left\| \textbf{z}_{t+1} - \textbf{z}_{t}\right\| \le \left\| \textbf{z}_p - \textbf{z}_{p-1}\right\| + \frac{L}{C} \phi (J_p). \end{aligned} \end{aligned}$$
(36)
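For completeness (a detail added here, not spelled out in the original derivation), the index shift and telescoping behind this summation read as follows, writing \(c\) for the constant multiplying the \(\phi \)-difference in Eq. (35) and using \(\phi \ge 0\):

```latex
\sum_{t=p}^{+\infty} 2 \left\| \mathbf{z}_{t+1} - \mathbf{z}_t \right\|
  \le \sum_{t=p}^{+\infty} \left\| \mathbf{z}_{t-1} - \mathbf{z}_t \right\|
      + c \sum_{t=p}^{+\infty} \left( \phi(J_t) - \phi(J_{t+1}) \right)
  \le \left\| \mathbf{z}_{p-1} - \mathbf{z}_p \right\|
      + \sum_{t=p}^{+\infty} \left\| \mathbf{z}_{t+1} - \mathbf{z}_t \right\|
      + c \, \phi(J_p) .
```

Subtracting the common sum from both sides then gives Eq. (36).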

Since \(\mathcal {Z}\) is compact as mentioned in Sect. 4.2.2, there exists a B such that \(\sup \{\left\| \textbf{z}_i - \textbf{z}_j\right\| : \textbf{z}_i, \textbf{z}_j \in \mathcal {Z} \} = B < +\infty \). Hence Eq. (36) yields

$$\begin{aligned} \sum _{t=p}^{+\infty } \left\| \textbf{z}_{t+1} - \textbf{z}_{t}\right\| \le \frac{L}{C} \phi (J_p) + B < +\infty , \end{aligned}$$
(37)

which concludes the proof. \(\square \)

Cite this article

Li, F., Fujiwara, K., Okura, F. et al. Shuffled Linear Regression with Outliers in Both Covariates and Responses. Int J Comput Vis 131, 732–751 (2023). https://doi.org/10.1007/s11263-022-01709-2
