
Shuffled Linear Regression with Outliers in Both Covariates and Responses

Published in: International Journal of Computer Vision

Abstract

This paper studies a shuffled linear regression problem. As a variant of ordinary linear regression, it requires estimating not only the regression variable, but also permutational correspondences between the covariates and responses. While existing formulations require the underlying ground-truth correspondences to be an ideal bijection such that all pieces of data should match, such a requirement barely holds in real-world applications due to either missing data or outliers. In this work, we generalize the formulation of shuffled linear regression to a broader range of conditions where only a part of the data should correspond. To this end, the effective recovery condition and NP-hardness of the proposed formulation are also studied. Moreover, we present a simple yet effective algorithm for deriving the solution. Its global convergence property and convergence rate are also analyzed in detail. Distinct tasks validate the effectiveness of our proposed formulation and the solution method.


Notes

  1. Some implementations are from ProbReg: http://probreg.readthedocs.io/, last accessed on 2022/12/14 12:14:20.

References

  • Abid, A., & Zou, J. (2018). A stochastic expectation-maximization approach to shuffled linear regression. In Proceedings of annual allerton conference on communication, control, and computing.

  • Abid, A., Poon, A., & Zou, J. (2017). Linear regression with shuffled labels. ArXiv Preprint ArXiv:1705.01342.

  • Aoki, Y., Goforth, H., Srivatsan, R.A., & Lucey, S. (2019). Pointnetlk: Robust & efficient point cloud registration using pointnet. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 7163–7172.

  • Arun, K. S., Huang, T. S., & Blostein, S. D. (1987). Least-squares fitting of two 3-d point sets. Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.1987.4767965.


  • Attouch, H., & Bolte, J. (2009). On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Mathematical Programming, 116(1), 5–16.


  • Attouch, H., Bolte, J., Redont, P., & Soubeyran, A. (2010). Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the kurdyka-łojasiewicz inequality. Mathematics of Operations Research, 35(2), 438–457.


  • Aubry, M., Schlickewei, U., & Cremers, D. (2011). The wave kernel signature: A quantum mechanical approach to shape analysis. In Proceedings of international conference on computer vision workshops (ICCV workshops).

  • Bell, J., & Stevens, B. (2009). A survey of known results and research areas for n-queens. Discrete Mathematics, 309(1), 1–31.


  • Birdal, T., & Simsekli, U. (2019). Probabilistic permutation synchronization using the riemannian structure of the birkhoff polytope. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 11,105–11,116.

  • Bogo, F., Romero, J., Loper, M., & Black, M.J. (2014). FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Bolte, J., Daniilidis, A., & Lewis, A. (2007). The łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4), 1205–1223.


  • Bolte, J., Sabach, S., & Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1), 459–494.


  • Bronstein, A.M., Bronstein, M.M., & Kimmel, R. (2008). Numerical geometry of non-rigid shapes.

  • Cai, Z., Chin, T.J., Le, H., & Suter, D. (2018) Deterministic consensus maximization with biconvex programming. In Proceedings of European conference on computer vision (ECCV), pp. 685–700.

  • Campbell, D., & Petersson, L. (2015). An adaptive data representation for robust point-set registration and merging. In Proceedings of international conference on computer vision (ICCV).

  • Chetverikov, D., Svirko, D., Stepanov, D., & Krsek, P. (2002). The trimmed iterative closest point algorithm. Object Recognition Supported by User Interaction for Service Robots, 3, 545–548.


  • Chin, T. J., & Suter, D. (2017). The maximum consensus problem: Recent algorithmic advances. Synthesis Lectures on Computer Vision, 7(2), 1–194.


  • Choi, S., Kim, T., & Yu, W. (2009) Performance evaluation of RANSAC family. In Proceedings of British machine vision conference (BMVC).

  • Curless, B., & Levoy, M. (1996). A volumetric method for building complex models from range images. In Proceedings of annual conference on computer graphics and interactive techniques.

  • Date, K., & Nagi, R. (2016). Gpu-accelerated hungarian algorithms for the linear assignment problem. Parallel Computing, 57, 52–72.


  • De Menezes, D., Prata, D. M., Secchi, A. R., & Pinto, J. C. (2021). A review on robust m-estimators for regression analysis. Computers & Chemical Engineering, 147, 107254.


  • Doornik, J.A. (2011). Robust estimation using least trimmed squares. Tech. rep., Institute for Economic Modelling, Oxford Martin School, and Economics Department, University of Oxford, UK.

  • Eckart, B., Kim, K., & Jan, K. (2018). Eoe: Expected overlap estimation over unstructured point cloud data. In Proceedings of international conference on 3D vision (3DV), pp. 747–755.

  • Elhami, G., Scholefield, A., Haro, B.B., & Vetterli, M. (2017). Unlabeled sensing: Reconstruction algorithm and theoretical guarantees. In Proceedings of international conference on acoustics, speech, and signal processing (ICASSP), pp. 4566–4570.

  • Fiori, M., Sprechmann, P., Vogelstein, J., Musé, P., & Sapiro, G. (2013). Robust multimodal graph matching: Sparse coding meets graph matching. In Proceedings of conference on neural information processing systems (NIPS).

  • Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.


  • Fogel, F., Jenatton, R., Bach, F., & d’Aspremont, A. (2013). Convex relaxations for permutation problems. In Proceedings of conference on neural information processing systems (NIPS).

  • Gao, W., & Tedrake, R. (2019). Filterreg: Robust and efficient probabilistic point-set registration using gaussian filter and twist parameterization. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Gold, S., Rangarajan, A., Lu, C. P., Pappu, S., & Mjolsness, E. (1998). New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31(8), 1019–1031.


  • Gunawardana, A., & Byrne, W. (2005). Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, 6, 2049–2073.


  • Haghighatshoar, S., & Caire, G. (2017). Signal recovery from unlabeled samples. Transactions on Signal Processing, 66(5), 1242–1257.


  • Hahnel, D., Burgard, W., Fox, D., Fishkin, K., & Philipose, M. (2004). Mapping and localization with rfid technology. In Proceedings of international conference on robotics and automation (ICRA), vol. 1, pp. 1015–1020.

  • Hampel, F. (2014). Robust inference. Statistics Reference Online.

  • Hampel, F. R. (1985). The breakdown points of the mean combined with some rejection rules. Technometrics, 27, 95–107.


  • Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision.

  • Hawkins, D. M. (1994). The feasible solution algorithm for least trimmed squares regression. Computational Statistics & Data Analysis, 17(2), 185–196.


  • Hsu, D.J., Shi, K., & Sun, X. (2017) Linear regression without correspondence. In Proceedings of conference on neural information processing systems (NIPS).

  • Huber, P.J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics, pp. 492–518.

  • Jia, K., Chan, T. H., Zeng, Z., Gao, S., Wang, G., Zhang, T., & Ma, Y. (2016). Roml: A robust feature correspondence approach for matching objects in a set of images. International Journal of Computer Vision, 117(2), 173–197.


  • Jiang, H., Stella, X. Y., & Martin, D. R. (2010). Linear scale and rotation invariant matching. Transactions on Pattern Analysis and Machine Intelligence, 33(7), 1339–1355.


  • Kuhn, A., & Mayer, H. (2015). Incremental division of very large point clouds for scalable 3d surface reconstruction. In Proceedings of international conference on computer vision workshops (ICCV workshops).

  • Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.


  • Larranaga, P., Kuijpers, C. M. H., Murga, R. H., Inza, I., & Dizdarevic, S. (1999). Genetic algorithms for the travelling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13(2), 129–170.


  • Le, H., Chin, T.J., & Suter, D. (2017) An exact penalty method for locally convergent maximum consensus. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1888–1896.

  • Le, H., Chin, T. J., Eriksson, A., Do, T. T., & Suter, D. (2019). Deterministic approximate methods for maximum consensus robust fitting. Transactions on Pattern Analysis and Machine Intelligence, 43(3), 842–857.


  • Li, H., & Hartley, R. (2007). The 3d-3d registration problem revisited. In Proceedings of international conference on computer vision (ICCV).

  • Li, F., Fujiwara, K., Okura, F., & Matsushita, Y. (2021). Generalized shuffled linear regression. In Proceedings of international conference on computer vision (ICCV).

  • Lian, W., & Zhang, L. (2014). Point matching in the presence of outliers in both point sets: A concave optimization approach. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Li, J., So, A. M. C., & Ma, W. K. (2020). Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. Signal Processing Magazine, 37(5), 18–31.


  • Lowe, D.G. (1999). Object recognition from local scale-invariant features. In Proceedings of international conference on computer vision (ICCV).

  • Lubiw, A. (1981). Some np-complete problems similar to graph isomorphism. SIAM Journal on Computing, 10(1), 11–21.


  • Maciel, J., & Costeira, J. P. (2003). A global solution to sparse correspondence problems. Transactions on Pattern Analysis and Machine Intelligence, 25(2), 187–199.


  • Marques, M., Stošić, M., & Costeira, J. (2009). Subspace matching: Unique solution to point matching with geometric constraints. In Proceedings of international conference on computer vision (ICCV), pp. 1288–1294.

  • Maset, E., Arrigoni, F., & Fusiello, A. (2017). Practical and efficient multi-view matching. In Proceedings of international conference on computer vision (ICCV), pp. 4568–4576.

  • Mathias, R. (2006). The Linear Algebra a Beginning Graduate Student Ought to Know.

  • Melzi, S., Ren, J., Rodolà, E., Sharma, A., Wonka, P., & Ovsjanikov, M. (2019). Zoomout: Spectral upsampling for efficient shape correspondence. Transactions on Graphics, 38(6).

  • Mohamed, I. S., Capitanelli, A., Mastrogiovanni, F., Rovetta, S., & Zaccaria, R. (2019). A 2d laser rangefinder scans dataset of standard eur pallets. Data in brief, 24, 103837.


  • Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275.


  • Narayanan, A., & Shmatikov, V. (2008). Robust de-anonymization of large sparse datasets. In Proceedings of symposium on security and privacy.

  • Nejatbakhsh, A., & Varol, E. (2021). Neuron matching in c. elegans with robust approximate linear regression without correspondence. In Proceedings of winter conference on applications of computer vision (WACV).

  • Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., & Guibas, L. (2012). Functional maps: A flexible representation of maps between shapes. Transactions on Graphics, 31(4).

  • Pachauri, D., Kondor, R., & Singh, V. (2013). Solving the multi-way matching problem by permutation synchronization. In Proceedings of conference on neural information processing systems (NIPS), vol. 26.

  • Pananjady, A., Wainwright, M.J., & Courtade, T.A. (2017). Denoising linear models with permuted data. In Proceedings of international symposium on information theory (ISIT).

  • Pananjady, A., Wainwright, M. J., & Courtade, T. A. (2017). Linear regression with shuffled data: Statistical and computational limits of permutation recovery. Transactions on Information Theory, 64(5), 3286–3300.


  • Pomerleau, F., Liu, M., Colas, F., & Siegwart, R. (2012). Challenging data sets for point cloud registration algorithms. International Journal of Robotics Research, 31(14), 1705–1711.


  • Pylvänäinen, T., Berclaz, J., Korah, T., Hedau, V., Aanjaneya, M., & Grzeszczuk, R. (2012). 3d city modeling from street-level data for augmented reality applications. In Proceedings of international conference on 3D imaging, modeling, processing, visualization & transmission, pp. 238–245.

  • Ren, J., Poulenard, A., Wonka, P., & Ovsjanikov, M. (2018). Continuous and orientation-preserving correspondences via functional maps. Transactions on Graphics, 37(6), 1–6.


  • Rousseeuw, P.J., & Leroy, A.M. (2005). Robust regression and outlier detection.

  • Rusinkiewicz, S. (2019). A symmetric objective function for icp. Transactions on Graphics, 38(4).

  • Rusu, R.B., Blodow, N., & Beetz, M. (2009). Fast point feature histograms (fpfh) for 3d registration. In Proceedings of international conference on robotics and automation (ICRA).

  • Shiratori, T., Berclaz, J., Harville, M., Shah, C., Li, T., Matsushita, Y., & Shiller, S. (2015). Efficient large-scale point cloud registration using loop closures. In Proceedings of international conference on 3D vision (3DV), pp. 232–240.

  • Slawski, M., Ben-David, E., et al. (2019). Linear regression with sparsely permuted data. Electronic Journal of Statistics, 13(1), 1–36.


  • Slawski, M., Ben-David, E., & Li, P. (2019). A two-stage approach to multivariate linear regression with sparsely mismatched data. Journal of Machine Learning Research, 21(204), 1–42.


  • Stošić, M., Marques, M., & Costeira, J. P. (2011). Convex solution of a permutation problem. Linear Algebra and its Applications, 434(1), 361–369.


  • Theiler, P., Schindler, K., et al. (2012). Automatic registration of terrestrial laser scanner point clouds using natural planar surfaces. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 3, 173–178.


  • Tsakiris, M., & Peng, L. (2019). Homomorphic sensing. In Proceedings of international conference on machine learning (ICML).

  • Unnikrishnan, J., Haghighatshoar, S., & Vetterli, M. (2018). Unlabeled sensing with random linear measurements. Transactions on Information Theory, 64(5), 3237–3253.


  • Vestner, M., Lähner, Z., Boyarski, A., Litany, O., Slossberg, R., Remez, T., Rodola, E., Bronstein, A., Bronstein, M., & Kimmel, R., et al. (2017). Efficient deformable shape correspondence via kernel matching. In Proceedings of international conference on 3D vision (3DV).

  • Volgenant, A. (2004). Solving the k-cardinality assignment problem by transformation. European Journal of Operational Research, 157(2), 322–331.


  • Vongkulbhisal, J., De la Torre, F., & Costeira, J. P. (2018). Discriminative optimization: Theory and applications to computer vision. Transactions on Pattern Analysis and Machine Intelligence, 41(4), 829–843.


  • Wang, F., Xue, N., Yu, J.G., & Xia, G.S. (2020). Zero-assignment constraint for graph matching with outliers. In Proceedings of conference on computer vision and pattern recognition (CVPR).

  • Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. (2015). 3d shapenets: A deep representation for volumetric shapes. In Proceedings of conference on computer vision and pattern recognition (CVPR), pp. 1912–1920.

  • Xu, Y., & Yin, W. (2013). A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM Journal on Imaging Sciences, 6(3), 1758–1789.


  • Xu, Y., & Yin, W. (2017). A globally convergent algorithm for nonconvex optimization based on block coordinate update. Journal of Scientific Computing, 72(2), 700–734.


  • Yadav, S.S., Lopes, P.A.C., Ilic, A., & Patra, S.K. (2019). Hungarian algorithm for subcarrier assignment problem using gpu and cuda. International Journal of Communication Systems, 32(4).

  • Yang, E., Lozano, A. C., & Aravkin, A. (2018). A general family of trimmed estimators for robust high-dimensional data analysis. Electronic Journal of Statistics, 12(2), 3519–3553.


  • Yang, H., Shi, J., & Carlone, L. (2020). Teaser: Fast and certifiable point cloud registration. Transactions on Robotics, 37(2), 314–333.


  • Zangwill, W.I. (1969). Nonlinear programming: a unified approach.

  • Zhang, H., Slawski, M., & Li, P. (2019). Permutation recovery from multiple measurement vectors in unlabeled sensing. In Proceedings of international symposium on information theory (ISIT).

  • Zhou, Q.Y., Park, J., & Koltun, V. (2016). Fast global registration. In Proceedings of European conference on computer vision (ECCV).


Acknowledgements

This work was supported by NII CRIS collaborative research program operated by NII CRIS and LINE Corporation.

Author information


Corresponding author

Correspondence to Feiran Li.

Additional information

Communicated by Federica Arrigoni.


Appendices

Appendix A: Supplementary information for the analyses of GSLR

1.1 A.1 Derivation of Corollary 1

Given a generic linear subspace \(\mathcal {V} \subset \mathbb {R}^m\) of dimension d (hereafter denoted as \(\textsf{dim}\left( \mathcal {V}\right) = d\)), and a finite set of endomorphisms \(\mathcal {T}\) of \(\mathbb {R}^m\), homomorphic sensing (Tsakiris and Peng 2019) characterizes the condition for unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\), i.e., the condition under which \(\tau _1\left( v_1\right) =\tau _2\left( v_2\right) \) implies \(v_1 = v_2\) for all \(v_1, v_2 \in \mathcal {V}\) and \(\tau _1, \tau _2 \in \mathcal {T}\).

For self-containment, let us first recall Theorem 1 and Proposition 3 of Tsakiris and Peng (2019), which can be jointly stated as the following corollary:

Corollary 2

(Unique recovery condition of homomorphic sensing) Assuming \(\forall \tau \in \mathcal {T}, \textsf{rank}\left( \tau \right) \ge 2d\), we have unique recovery in \(\mathcal {V}\) under \(\mathcal {T}\) as long as

$$\begin{aligned} \textsf{dim}\left( \mathcal {V}\right) \le \mathcal {Q}_{\tau _1, \tau _2}, \ \forall \tau _1, \tau _2 \in \mathcal {T}, \end{aligned}$$
(16)

where \(\mathcal {Q}_{\tau _1, \tau _2}\) is an intermediate technical term dependent on \(\tau _1\) and \(\tau _2\) such that, in case \(\pi _1, \pi _2 \in \Pi ^\ddagger \) are full-rank square permutation matrices, we have

$$\begin{aligned} \mathcal {Q}_{\rho \pi _1, \rho \pi _2} \ge \frac{\textsf{rank}\left( \rho \right) }{2}, \ \forall \pi _1, \pi _2 \in \Pi ^\ddagger , \end{aligned}$$
(17)

where \(\rho \) denotes a coordinate projection matrix.

To apply Corollary 2 to GSLR, we shall consider \(\textbf{Ax}\) as a whole for some d-dimensional \(\textbf{x}\) and set \(\mathcal {V} = \{\textbf{Ax} : \textbf{x} \in \mathcal {X}\}\). Since a linear mapping does not increase dimension, we have \(\textsf{dim}\left( \mathcal {V}\right) \le d\). Furthermore, by setting \(\mathcal {T} = \Pi _k\) and the \(\rho \) in Eq. (17) to the projections that map from \(\Pi ^\ddagger \) to \(\Pi _k\), we obtain

$$\begin{aligned} \mathcal {Q}_{\tau _1, \tau _2} \ge \frac{\textsf{rank}\left( \rho \right) }{2} = \frac{k}{2}, \ \forall \tau _1, \tau _2 \in \Pi _k. \end{aligned}$$
(18)

Using Eq. (18) as a lower bound for Eq. (16), we can immediately recognize \(k \ge 2d\) as a sufficient condition for unique recovery.
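The sufficient condition \(k \ge 2d\) can be sanity-checked numerically. The sketch below is our own illustration (not part of the original analysis): on a tiny random instance with \(d = 1\) and \(k = 2\), it enumerates every k-cardinality matching between rows of \(\textbf{A}\) and entries of a shuffled \(\textbf{b}\), and verifies that, generically, only the ground-truth regression variable attains zero residual.

```python
import itertools
import numpy as np

def zero_residual_solutions(A, b, k, tol=1e-8):
    """Enumerate all k-cardinality matchings between rows of A and
    entries of b, solve least squares on each matched subset, and
    return the distinct solutions achieving (near-)zero residual."""
    n, _ = A.shape
    found = []
    for rows in itertools.combinations(range(n), k):
        for cols in itertools.permutations(range(len(b)), k):
            Ak, bk = A[list(rows)], b[list(cols)]
            x, *_ = np.linalg.lstsq(Ak, bk, rcond=None)
            if np.linalg.norm(Ak @ x - bk) < tol:
                if not any(np.allclose(x, y, atol=1e-6) for y in found):
                    found.append(x)
    return found

rng = np.random.default_rng(0)
d, n, k = 1, 4, 2                  # k >= 2d, matching Corollary 1
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = rng.permutation(A @ x_true)    # fully shuffled, noiseless responses
solutions = zero_residual_solutions(A, b, k)
```

For generic (continuous random) data, `solutions` contains exactly one element, and it coincides with `x_true`, illustrating the unique-recovery claim.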

1.2 A.2 Proof of Proposition 1

Proof of Proposition 1

Since the bijectivity will always be satisfied as long as Problem 2 is solved, we here show that if there exists an f solving Problem 2 that takes the graphs G and H defined in Proposition 1 as inputs, then this f is a linear mapping from a subset of \(\textbf{A}\) to \(\textbf{b}\). Specifically, since \(\textsf{dom}\left( f\right) \) is a finite discrete set, any f that solves Problem 2 is a continuous mapping. Furthermore, assuming that vertices are mapped in the form of \(\textbf{A}_i\) to \(\textbf{b}_i\) and \(\textbf{A}_j\) to \(\textbf{b}_j\), we have

$$\begin{aligned} \underbrace{f\left( \textbf{A}_i + \textbf{A}_j\right) }_{\text {edge in} {\tilde{G}}} = \underbrace{\textbf{b}_i + \textbf{b}_j}_{\hbox { edge in}\ H} = \underbrace{f\left( \textbf{A}_i\right) + f\left( \textbf{A}_j\right) }_{\hbox { vertices in}\ H} \end{aligned}$$
(19)

always holds w.r.t. the preimage of \(\textbf{b}\), indicating that f is additive. Combining the continuity and the additivity, we obtain that f is a linear mapping (Mathias 2006). \(\square \)
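For completeness, the last step (continuity plus additivity implies linearity) follows the classical resolution of Cauchy's functional equation: additivity alone yields homogeneity over the rationals,

$$\begin{aligned} f\left( \tfrac{n}{m} v\right) = \tfrac{n}{m} f\left( v\right) , \quad \forall n, m \in \mathbb {N}, \end{aligned}$$

and continuity extends the identity to real scalars, \(f\left( cv\right) = \lim _{q \rightarrow c, q \in \mathbb {Q}} f\left( qv\right) = cf\left( v\right) \) for all \(c \in \mathbb {R}\).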

Appendix B: Supplementary information for the convergence analyses

1.1 B.1 Proofs of the global convergence

We here show that the GSLR triplet mentioned in Sect. 4.2.2 satisfies all the conditions of Theorem 1. For self-containment, we first recall the definition of closedness of a set-valued mapping:

Definition 1

(Closedness of set-valued mapping (Gunawardana and Byrne 2005)) The closedness of a set-valued mapping \(\mathcal {F}: \mathcal {U} \rightarrow \mathcal {V}\) generalizes the notion of continuity of point-to-point mappings: \(\mathcal {F}\) is closed at a point \(u \in \mathcal {U}\) if

  • \(u_k \rightarrow u\), where \(u_k \in \mathcal {U}\),

  • \(v_k \rightarrow v\), where \(v, v_k \in \mathcal {V}\),

  • \(v_k \in \mathcal {F}(u_k)\),

imply \(v \in \mathcal {F}(u)\). Based on this, \(\mathcal {F}\) is called closed if it is closed at every point of \(\mathcal {U}\).

Now we are ready to introduce the proofs:

Proof of the compactness of \(\mathcal {Z}\) Since \(\Pi \) is discrete and finite, it is compact. Since \(\mathcal {X}\) is compact per Hypotheses H and finite unions of compact sets are compact, \(\mathcal {Z}\) is compact. \(\square \)

Proof of the closedness of \( \mathcal {H}\) For any \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \), there exist some \(\delta _1, \delta _2 \in \mathbb {R}^{+}\) that satisfy

$$\begin{aligned} \begin{aligned} \forall \textbf{P}', \ \textbf{P}' \rightarrow \textbf{P} :=\{\textbf{P}' : \textbf{P}' \in \mathcal {N}_{\delta _1}(\textbf{P}_t)\} = \{\textbf{P}_t\}, \\ \forall \textbf{x}', \ \textbf{x}' \rightarrow \textbf{x} :=\left|\mathbf {x'}, \textbf{x}_t\right|_\textbf{x} < \delta _2, \end{aligned} \end{aligned}$$
(20)

where \(\mathcal {N}_{\delta _1}(\textbf{P}_t)\) is the neighborhood of \(\textbf{P}_t\). Therefore, we can derive the differences of mapped \(\textbf{x}\) as

$$\begin{aligned} \begin{aligned}&\mathcal {H}(\mathbf {P', x'}) \setminus \mathcal {H}(\textbf{P}_t, \textbf{x}_t) = \mathcal {H}(\textbf{P}_t, \textbf{x}_t) \setminus \mathcal {H}(\mathbf {P', x'}) \\ =&\mathcal {H}(\textbf{P}_t, \textbf{x}') \setminus \mathcal {H}(\textbf{P}_t, \textbf{x}_t) \\ =&\mathop {\textrm{argmin}}\limits _{\textbf{x} \in \mathcal {X}}\Psi (\textbf{P}_t, \textbf{x}) \setminus \mathop {\textrm{argmin}}\limits _{\textbf{x} \in \mathcal {X}}\Psi (\textbf{P}_t, \textbf{x}) = \emptyset . \end{aligned} \end{aligned}$$
(21)

Eq. (21) implies that the cost matrix for the k-LAP generated by the regression variable \(\textbf{x} \in \mathcal {H}(\textbf{P, x})\) remains unchanged when the input of \(\mathcal {H}\) switches from \(\left( \textbf{P}_t, \textbf{x}_t\right) \) to \(\left( \mathbf {P'}, \mathbf {x'}\right) \), and hence so does the mapped permutation \(\textbf{P}\). Joining Eqs. (20) and (21), we conclude that \(\mathcal {H}\) is closed at \(\left( \textbf{P}_{t}, \textbf{x}_{t}\right) \). \(\square \)

Proof of the continuity and monotonicity of \(\Psi \) For the continuity, since compositions of continuous mappings are continuous, we only need to prove the continuity of \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) and \(g(\textbf{P}) = \textbf{PP}^T\textbf{b}\). Since the proofs for f and g follow the same roadmap, we take f as an example. Given that f is a bilinear form over the finite-dimensional space of \(\left( \textbf{P, x}\right) \), it suffices to show that it is separately continuous w.r.t. \(\textbf{P}\) and \(\textbf{x}\). Per Hypotheses H, \(f(\textbf{x}) = \textbf{Ax}\) is continuous w.r.t. \(\textbf{x} \in \mathcal {X}\). Since \(\Pi \) is finite and discrete, \(f(\textbf{P}) = \textbf{PA}\) is also continuous w.r.t. any \(\textbf{P} \in \Pi \). Therefore, \(f(\textbf{P}, \textbf{x}) = \textbf{PAx}\) is continuous and hence so is \(\Psi \).

For the monotonicity, conducting the linear regression step w.r.t. \(\textbf{x}\) gives

$$\begin{aligned} J(\textbf{P}_t, \textbf{x}_{t+1}) \le J(\textbf{P}_t, \textbf{x}_{t}), \end{aligned}$$
(22)

where the equality holds if and only if \(\textbf{x}_t\) is a critical point. Furthermore, let \(\textbf{P}_{\textrm{tmp}}\) be a temporary permutation matrix obtained by randomly removing \(k_{t+1} - k_t\) non-zero assignments from \(\textbf{P}_t\); since the cost matrix \(\textbf{D}_{ij} = \left( \textbf{A}_i\textbf{x}-\textbf{b}_j\right) ^2\) is non-negative, we obtain

$$\begin{aligned} J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}) \le J(\textbf{P}_{t}, \textbf{x}_{t+1}). \end{aligned}$$
(23)

Since \(\textbf{P}_{t+1}\) is a global optimum of the objective function w.r.t. \(\textbf{P}\) and preserves the same number of inliers as \(\textbf{P}_{\textrm{tmp}}\), we can further derive that

$$\begin{aligned} J(\textbf{P}_{t+1}, \textbf{x}_{t+1}) \le J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}), \end{aligned}$$
(24)

where the equality holds if and only if \(\textbf{P}_\textrm{tmp}\) is a critical point. Combining Eqs. (22)-(24) leads to:

$$\begin{aligned} J(\textbf{P}_{t+1}, \textbf{x}_{t+1}) \le J(\textbf{P}_{\textrm{tmp}}, \textbf{x}_{t+1}) \le J(\textbf{P}_{t}, \textbf{x}_{t}). \end{aligned}$$
(25)

Therefore, we can conclude that the objective value of Algorithm 1 is monotonically decreasing and that the equality in Eq. (25) holds if and only if \(\left( \textbf{P}_t, \textbf{x}_t\right) \) is already a critical point. \(\square \)
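The monotone decrease in Eq. (25) can be observed directly on a toy instance. The sketch below is our own minimal re-implementation of the alternation under simplifying assumptions: the k-cardinality assignment is solved exactly by brute-force enumeration (feasible only for tiny n, in place of a Hungarian-type solver), and the number of matches k is held fixed across iterations. The recorded objective values never increase, as the proof above guarantees.

```python
import itertools
import numpy as np

def k_lap(D, k):
    """Exact k-cardinality linear assignment by brute force: choose k
    (row, column) pairs, no row or column reused, minimizing total cost."""
    best_cost, best_pairs = float("inf"), None
    for rows in itertools.combinations(range(D.shape[0]), k):
        for cols in itertools.permutations(range(D.shape[1]), k):
            cost = sum(D[i, j] for i, j in zip(rows, cols))
            if cost < best_cost:
                best_cost, best_pairs = cost, list(zip(rows, cols))
    return best_pairs

def alternate(A, b, k, iters=8):
    """Alternate (i) least-squares refitting of x on the matched pairs
    and (ii) exact re-matching, recording J after each full round."""
    pairs = list(zip(range(k), range(k)))  # arbitrary initial matching
    history = []
    for _ in range(iters):
        rows, cols = zip(*pairs)
        x, *_ = np.linalg.lstsq(A[list(rows)], b[list(cols)], rcond=None)
        D = ((A @ x)[:, None] - b[None, :]) ** 2  # D_ij = (A_i x - b_j)^2
        pairs = k_lap(D, k)
        history.append(sum(D[i, j] for i, j in pairs))
    return history

rng = np.random.default_rng(1)
n, d, k = 5, 2, 3
A = rng.standard_normal((n, d))
b = rng.permutation(A @ rng.standard_normal(d)) + 0.1 * rng.standard_normal(n)
J_hist = alternate(A, b, k)
```

Each recorded value is \(J(\textbf{P}_{t+1}, \textbf{x}_{t+1})\): the least-squares step cannot increase J for a fixed matching, and the exact assignment step cannot increase it for a fixed \(\textbf{x}\), so `J_hist` is non-increasing.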

1.2 B.2 Proofs of Propositions 2 and 3

Throughout this appendix, apart from the notations described in Table 1, we also use C to denote a constant whose value may vary from line to line.

To prove Proposition 2, due to the non-smoothness of our Objective (6), we follow the concept of the KL-inequality generalized to sub-analytic sets (Bolte et al. 2007). Let us first recall the definitions for self-containment:

Definition 2

(Kurdyka-Łojasiewicz inequality) Let f be a continuous sub-analytic function (i.e., a function whose graph is a sub-analytic set) with closed domain that maps to \(\mathbb {R}\), and let \({\bar{x}} \in \textsf{dom}\left( f\right) \) be a critical point. Then there exists a concave function \(\phi \left( s\right) = Cs^{1 - \theta }\) with \(\theta \in \left[ 0, 1\right) \) and \(C > 0\) such that the following inequality holds for any x within a neighborhood of \({\bar{x}}\), under the conventions \(0^0=1\) and \(0/0=\infty /\infty =0\):

$$\begin{aligned} \phi '\left( f(x) - f({\bar{x}})\right) \textrm{dis}\left( 0, \partial f(x) \right) \ge 1, \end{aligned}$$
(26)

where \(\textrm{dis}\left( 0, \partial f(x)\right) = \inf \{ \left\| x^\star \right\| : x^\star \in \partial f(x) \}\) denotes the point-to-set distance.
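As a concrete illustration (our addition, not part of the original text), the smooth function \(f(x) = x^2\) satisfies the KL-inequality at its critical point \({\bar{x}} = 0\) with \(\theta = 1/2\) and \(C = 1\), since \(\partial f(x) = \{2x\}\):

$$\begin{aligned} \phi '\left( f(x) - f({\bar{x}})\right) \textrm{dis}\left( 0, \partial f(x)\right) = \frac{1}{2}\left( x^2\right) ^{-1/2} \cdot \left| 2x\right| = 1 \ge 1. \end{aligned}$$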

Definition 3

(Semi-X & sub-analytic sets) A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called semi-algebraic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\) such that

$$\begin{aligned} \mathcal {S} \cap \mathcal {N} = \bigcap _i^{m} \bigcup _j^{n} \left\{ \textbf{x} \in \mathcal {N} : f_{ij}(\textbf{x}) = 0, g_{ij}(\textbf{x}) \ge 0 \right\} , \end{aligned}$$

where \(f_{ij}\) and \(g_{ij}\) are polynomials and \(m, n < +\infty \).

A semi-analytic set generalizes the above definition by relaxing \(f_{ij}\) and \(g_{ij}\) to real analytic functions.

A subset \(\mathcal {S}\) of \(\mathbb {R}^{d}\) is called sub-analytic if every \(\textbf{x} \in \mathbb {R}^{d}\) admits a neighborhood \(\mathcal {N}\), and there exists a bounded semi-analytic set \(\mathcal {Y} \subset \mathbb {R}^{d+o}\) such that \(\mathcal {S} \cap \mathcal {N}\) is the projection from \(\mathcal {Y}\) to \(\mathcal {N}\):

$$\begin{aligned} \mathcal {S} \cap \mathcal {N} = \pi (\mathcal {Y}), \end{aligned}$$

where \(\pi : \mathbb {R}^{d+o} \rightarrow \mathbb {R}^{d}\) is the projection. In summary, the class of sub-analytic sets broadens that of semi-analytic sets, which in turn broadens that of semi-algebraic sets.

We also introduce the following lemma:

Lemma 1

The set of generalized permutation matrices \(\Pi \) can be characterized by the following polynomials:

$$\begin{aligned} \begin{aligned} \textbf{P1} \le \textbf{1} \ \textrm{and} \ \textbf{P}^T\textbf{1} \le \textbf{1}, \\ \textbf{e}_i^T \textbf{D} \textbf{e}_i \left( \textbf{e}_i^T \textbf{D} \textbf{e}_i - 1\right) = 0, \forall i \in \left\lceil n \right\rfloor , \\ \textbf{e}_i^T \textbf{D} \textbf{e}_j = 0, \forall i, j \in \left\lceil n \right\rfloor , i \ne j, \\ \end{aligned} \end{aligned}$$

where \(\textbf{e}_i\) is the one-hot vector with 1 at position i, \(\textbf{D} \in \{ \textbf{PP}^T, \textbf{P}^T\textbf{P}\}\) (i.e., the last two conditions are imposed on both products), and \(\left\lceil n \right\rfloor \) denotes the set of positive integers no greater than the size of \(\textbf{D}\).

Proof of Lemma 1 This lemma extends a known property of full-rank permutation matrices, namely that a matrix is a permutation matrix if and only if it is orthogonal and doubly stochastic (Fogel et al. 2013). The latter two equations imply that the diagonal entries of \(\textbf{D}\) lie in \(\{0, 1\}\) and the off-diagonal ones equal 0. While necessity is guaranteed by the definition of generalized permutation matrices, we demonstrate sufficiency by showing that these polynomials force \(\textbf{P}\) to be binary. Specifically, assume a matrix \(\textbf{P}\) satisfying the above polynomials has entries \(\textbf{P}_{ip}, \ldots , \textbf{P}_{iq} \in \left( 0, 1\right) \) with \(1 \le p < q \le n\) in the \(i^{\textrm{th}}\) row. Then the \(pq^{\textrm{th}}\) entry of \(\textbf{D} = \textbf{P}^T\textbf{P}\) reads

$$\begin{aligned} \begin{aligned} \textbf{D}_{pq} = \sum _k \textbf{P}^T_{pk} \textbf{P}_{kq} = \sum _k \textbf{P}_{kp} \textbf{P}_{kq} \ne 0, \end{aligned} \end{aligned}$$
(27)

which contradicts the requirement that the off-diagonal entries of \(\textbf{D}\) should be 0. The case \(\textbf{D} = \textbf{PP}^T\) follows from the same argument. \(\square \)
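Lemma 1 can be sanity-checked numerically. The sketch below is our own illustration (the helper name and tolerance are arbitrary); it tests the three polynomial conditions on a candidate matrix:

```python
import numpy as np

def satisfies_lemma1(P, tol=1e-9):
    """Check the polynomial characterization of generalized permutation matrices:
    row/column sums at most 1, diagonal entries of D in {0, 1}, off-diagonals 0,
    for both D = P P^T and D = P^T P."""
    P = np.asarray(P, dtype=float)
    if np.any(P.sum(axis=1) > 1 + tol) or np.any(P.sum(axis=0) > 1 + tol):
        return False  # violates P1 <= 1 or P^T 1 <= 1
    for D in (P @ P.T, P.T @ P):
        d = np.diag(D)
        if np.any(np.abs(d * (d - 1)) > tol):   # diagonals must be 0 or 1
            return False
        if np.any(np.abs(D - np.diag(d)) > tol):  # off-diagonals must be 0
            return False
    return True

# A generalized (partial) permutation matrix: one row left unmatched.
P_ok = np.array([[0., 1., 0.],
                 [0., 0., 1.],
                 [0., 0., 0.]])

# Doubly substochastic but fractional: violates the binary condition.
P_bad = np.array([[0.5, 0.5, 0.],
                  [0.5, 0.5, 0.],
                  [0.,  0.,  1.]])
```

As the proof of Lemma 1 predicts, `P_bad` fails because the fractional entries in its first two rows produce a nonzero off-diagonal entry in \(\textbf{P}^T\textbf{P}\).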

Now we are ready to prove Proposition 2:

Proof of Proposition 2 The first property is a direct result of the fact that \(\triangledown _\textbf{x} J = 2\left( \textbf{A}^T\textbf{P}^T\textbf{PA}\textbf{x} - \textbf{A}^T\textbf{P}^T\textbf{b}\right) \) is a continuous linear mapping.

For the second property, Lemma 1 shows that \(\Pi \) is semi-algebraic (hence sub-analytic). Moreover, given that \(\mathcal {X}\) is sub-analytic by assumption, that Objective (6) is polynomial, and that finite unions and intersections preserve sub-analyticity per Definition 3, the graph of Objective (6) is sub-analytic. Combining this sub-analyticity with the continuity mentioned in Sect. 4.2.2 yields the KL inequality. \(\square \)

To prove Proposition 3, we introduce the following two lemmas:

Lemma 2

Suppose that Hypotheses H are satisfied. Then at the \(t^{\textrm{th}}\) step of Algorithm 1, the following holds:

$$\begin{aligned} \left( \triangledown _{\textbf{x}} J(\textbf{P}_{t}, \textbf{x}_t) - \triangledown _{\textbf{x}} J(\textbf{P}_{t-1}, \textbf{x}_t), \textbf{0}\right) \in \partial J(\textbf{P}_t, \textbf{x}_t ). \end{aligned}$$

Proof of Lemma 2 Per the alternating update rule of Algorithm 1 we have

$$\begin{aligned} \textbf{0} = \triangledown _\textbf{x} J\left( \textbf{P}_{t-1}, \textbf{x}_t\right) , \end{aligned}$$
(28)
$$\begin{aligned} \textbf{0} \in \partial _{\textbf{P}} J\left( \textbf{P}_{t}, \textbf{x}_t\right) . \end{aligned}$$
(29)

From Eq. (28) we can obtain

$$\begin{aligned} \begin{aligned} \triangledown _\textbf{x} J\left( \textbf{P}_{t}, \textbf{x}_t\right) - \triangledown _\textbf{x} J\left( \textbf{P}_{t-1}, \textbf{x}_t\right) \in \partial _{\textbf{x}} J\left( \textbf{P}_{t}, \textbf{x}_t\right) , \end{aligned} \end{aligned}$$
(30)

which together with Eq. (29) concludes the proof. \(\square \)
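The two update rules in Eqs. (28) and (29) can be illustrated on a toy fully-matched instance. The sketch below is our own simplification, assuming the plain objective \(J(\textbf{P}, \textbf{x}) = \Vert \textbf{b} - \textbf{PAx}\Vert ^2\) with \(\textbf{P}\) a full permutation; it solves the \(\textbf{x}\)-step in closed form and the \(\textbf{P}\)-step by exhaustive search (viable only for tiny n), and is not the paper's Algorithm 1, which additionally handles partial matchings and outliers:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 2
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = (A @ x_true)[rng.permutation(n)]          # shuffled responses

def best_permutation(A, b, x):
    """P-step (cf. Eq. 29): exact minimization over all permutations."""
    preds = A @ x
    return min(itertools.permutations(range(n)),
               key=lambda p: np.sum((b - preds[list(p)]) ** 2))

x = np.zeros(d)
energies = []
for _ in range(10):
    p = list(best_permutation(A, b, x))
    # x-step (cf. Eq. 28): closed-form least squares for fixed correspondences
    x, *_ = np.linalg.lstsq(A[p], b, rcond=None)
    energies.append(float(np.sum((b - A[p] @ x) ** 2)))
```

Because each step exactly minimizes J in one block of variables while the other is held fixed, the recorded energies are non-increasing, which is the behavior that Lemma 3 quantifies.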

Lemma 3

Suppose that Hypotheses H are satisfied. Then at the \(t^{\textrm{th}}\) step of Algorithm 1, there exists \(C \in \mathbb {R}^{+}\) such that the following holds:

$$\begin{aligned} J_{t} - J_{t+1} \ge C \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned}$$

Proof of Lemma 3 We base the proof on the quotient law of convergent sequences. In detail, let us define the sequences:

  1. Sequence of differences between energy values: \(P = \left\{ J_{t} - J_{t+1} : t \in \mathbb {N} \right\} \).

  2. Sequence of squared differences between solutions: \(Q = \left\{ \left\| \textbf{z}_t - \textbf{z}_{t+1}\right\| ^2 : t \in \mathbb {N} \right\} \).

  3. Sequence of quotients: \(S = \left\{ \frac{P_t}{Q_t} : t \in \mathbb {N} \right\} \).

Per the compactness of \(\mathcal {Z}\) and the continuity of J, both \(\textbf{z}\) and J are bounded. Moreover, per the global convergence properties mentioned in Sect. 4.2.2, for any \(t \in \mathbb {N}\) we have \(P_t, Q_t \ge 0\) and \(P_t = 0 \Leftrightarrow Q_t = 0\). Therefore, with the convention \(\frac{0}{0} = 1\), every \(s \in S\) satisfies \(s \in \mathbb {R}_{+}\), which implies that there exists some \(C \in \mathbb {R}_{+}\) such that \(\inf S \ge C\). \(\square \)

With Lemmas 2 and 3 in hand, we now can prove Proposition 3:

Proof of Proposition 3 According to Lemma 2 and the Lipschitz continuity of \( \triangledown _{\textbf{x}} J\), we have

$$\begin{aligned} \begin{aligned} \textrm{dis}\left( \textbf{0}, \partial J(\textbf{z}_t)\right)&\le \left\| \triangledown _{\textbf{x}} J(\textbf{P}_{t-1}, \textbf{x}_t) - \triangledown _{\textbf{x}} J(\textbf{P}_t, \textbf{x}_t)\right\| \\&\le L \left\| \textbf{P}_{t-1} - \textbf{P}_t\right\| \le L \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| , \end{aligned} \end{aligned}$$
(31)

where \(L \in \mathbb {R}^{+}\) is the Lipschitz constant.

Assuming the energy value \({\bar{J}}\) at the critical point satisfies \({\bar{J}} = 0\) without loss of generality, combining Eq. (31) and Eq. (26) leads to

$$\begin{aligned} \phi '(J_t) = \phi '( J_t - {\bar{J}}) \ge \frac{1}{L} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| ^{-1}. \end{aligned}$$
(32)

On the other hand, by using the concavity of \(\phi \) and Lemma 3, we can obtain

$$\begin{aligned} \begin{aligned} \phi (J_t) - \phi (J_{t+1})&\ge \phi '(J_t)\left( J_t - J_{t+1}\right) \\&\ge C \phi '(J_t) \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned} \end{aligned}$$
(33)

Taking Eq. (32) into Eq. (33) yields

$$\begin{aligned} \begin{aligned} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| \cdot \frac{L}{C} \left( \phi (J_t) - \phi (J_{t+1}) \right) \ge \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| ^2. \end{aligned} \end{aligned}$$
(34)

Taking the square root on both sides of Eq. (34) and applying the inequality \(a^2 + b^2 \ge 2ab\) results in

$$\begin{aligned} \left\| \textbf{z}_{t-1} - \textbf{z}_t\right\| + \frac{L}{C} \left( \phi (J_t) - \phi (J_{t+1}) \right) \ge 2 \left\| \textbf{z}_{t+1} - \textbf{z}_t\right\| . \end{aligned}$$
(35)

Summing both sides of Eq. (35) over \(t \in \left[ p, +\infty \right) \) and telescoping leads to

$$\begin{aligned} \begin{aligned} \sum _{t=p}^{+\infty } \left\| \textbf{z}_{t+1} - \textbf{z}_{t}\right\| \le \left\| \textbf{z}_p - \textbf{z}_{p-1}\right\| + \frac{L}{C} \phi (J_p). \end{aligned} \end{aligned}$$
(36)
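For completeness (a detail added here, not spelled out in the original derivation), the index shift and telescoping behind this summation read as follows, writing \(c\) for the constant multiplying the \(\phi \)-difference in Eq. (35) and using \(\phi \ge 0\):

```latex
\sum_{t=p}^{+\infty} 2 \left\| \mathbf{z}_{t+1} - \mathbf{z}_t \right\|
  \le \sum_{t=p}^{+\infty} \left\| \mathbf{z}_{t-1} - \mathbf{z}_t \right\|
      + c \sum_{t=p}^{+\infty} \left( \phi(J_t) - \phi(J_{t+1}) \right)
  \le \left\| \mathbf{z}_{p-1} - \mathbf{z}_p \right\|
      + \sum_{t=p}^{+\infty} \left\| \mathbf{z}_{t+1} - \mathbf{z}_t \right\|
      + c \, \phi(J_p) .
```

Subtracting the common sum from both sides then gives Eq. (36).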

Since \(\mathcal {Z}\) is compact as mentioned in Sect. 4.2.2, there exists a B such that \(\sup \{\left\| \textbf{z}_i - \textbf{z}_j\right\| : \textbf{z}_i, \textbf{z}_j \in \mathcal {Z} \} = B < +\infty \). Hence Eq. (36) yields

$$\begin{aligned} \sum _{t=p}^{+\infty } \left\| \textbf{z}_{t+1} - \textbf{z}_{t}\right\| \le \frac{L}{C} \phi (J_p) + B < +\infty , \end{aligned}$$
(37)

which concludes the proof. \(\square \)

Cite this article

Li, F., Fujiwara, K., Okura, F. et al. Shuffled Linear Regression with Outliers in Both Covariates and Responses. Int J Comput Vis 131, 732–751 (2023). https://doi.org/10.1007/s11263-022-01709-2
