Abstract
Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges, and we are unaware of any algorithm for this problem with both theoretical guarantees and appealing computational performance. To this end, we propose and study a simple greedy local search algorithm. We prove that, under a suitable scaling of the number of mismatched pairs relative to the number of samples and features, and under certain assumptions on the covariates, our local search algorithm converges to the globally optimal solution at a linear rate in the noiseless setting.
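The abstract does not spell out the objective or the search neighborhood, so the following is only a minimal sketch under stated assumptions: the objective is taken to be the least-squares loss over a permutation and coefficients, with the coefficients profiled out through a projection matrix; a local move is a swap of two entries of the current permutation; and at most `r` indices may be moved away from the identity. The names `local_search` and `dist_to_identity` are hypothetical, and the brute-force evaluation of every swap is chosen for readability rather than speed.

```python
import numpy as np


def dist_to_identity(perm):
    """Number of indices moved by perm (assumed meaning of dist(P, I_n))."""
    return int(np.count_nonzero(perm != np.arange(perm.size)))


def local_search(X, y, r, max_iter=100, tol=1e-12):
    """Greedy swap-based local search (illustrative sketch, not the paper's code)."""
    n = X.shape[0]
    # Profiling out beta: min_beta ||z - X beta||^2 = ||H z||^2, where H
    # projects onto the orthogonal complement of the column space of X.
    Q, _ = np.linalg.qr(X)
    H = np.eye(n) - Q @ Q.T

    def objective(perm):
        residual = H @ y[perm]
        return float(residual @ residual)

    perm = np.arange(n)  # start from the identity permutation
    current = objective(perm)
    for _ in range(max_iter):
        best_val, best_swap = current, None
        for i in range(n - 1):
            for j in range(i + 1, n):
                cand = perm.copy()
                cand[i], cand[j] = cand[j], cand[i]
                if dist_to_identity(cand) > r:
                    continue  # candidate exceeds the mismatch budget
                val = objective(cand)
                if val < best_val - tol:
                    best_val, best_swap = val, (i, j)
        if best_swap is None:
            break  # no feasible swap improves the objective
        i, j = best_swap
        perm[i], perm[j] = perm[j], perm[i]
        current = best_val
    beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
    return beta, perm, current


# Noiseless toy problem with a single mismatched pair (made-up data).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true
y[[4, 17]] = y[[17, 4]]
beta_hat, perm_hat, obj = local_search(X, y, r=2)
```

In this toy setting the swap that undoes the mismatch drives the objective to numerical zero, so a single greedy step suffices. The general guarantee stated in the abstract depends on conditions that this sketch does not check.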
Keywords
- Linear regression
- Mismatched data
- Local search method
- Learning permutations
Supported by grants from the Office of Naval Research: ONR-N000141812298 (YIP) and National Science Foundation: NSF-IIS-1718258.
Notes
- 1. This permutation \(P^*\) may not satisfy \( \mathsf {dist}(P^* ,I_n) = r\), but \( \mathsf {dist}(P^* ,I_n)\) will be close to \(r\).
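The construction of \(P^*\) belongs to the main text, which is not reproduced here. One reading, assumed purely for illustration, is that \(r\) indices are chosen at random and shuffled among themselves, with \( \mathsf {dist}(P ,I_n)\) counting the indices that \(P\) moves. Under that assumption some of the chosen indices can land back on their own positions, which is why the distance can fall slightly short of \(r\). The function names below are hypothetical.

```python
import numpy as np


def dist_to_identity(perm):
    """Number of indices moved by perm (assumed meaning of dist(P, I_n))."""
    perm = np.asarray(perm)
    return int(np.count_nonzero(perm != np.arange(perm.size)))


def shuffle_r_indices(n, r, rng):
    """Permutation of 0..n-1 that shuffles r randomly chosen indices."""
    perm = np.arange(n)
    idx = rng.choice(n, size=r, replace=False)
    # Reassign the chosen positions among themselves; a chosen index may
    # map back to itself, so the result can move fewer than r indices.
    perm[idx] = rng.permutation(idx)
    return perm


rng = np.random.default_rng(1)
distances = [dist_to_identity(shuffle_r_indices(50, 10, rng)) for _ in range(200)]
print(min(distances), max(distances), sum(distances) / len(distances))
```

A uniformly random permutation has one fixed point in expectation, so under this assumed construction the average distance is about \(r - 1\).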
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mazumder, R., Wang, H. (2021). Linear Regression with Mismatched Data: A Provably Optimal Local Search Algorithm. In: Singh, M., Williamson, D.P. (eds) Integer Programming and Combinatorial Optimization. IPCO 2021. Lecture Notes in Computer Science, vol 12707. Springer, Cham. https://doi.org/10.1007/978-3-030-73879-2_31
DOI: https://doi.org/10.1007/978-3-030-73879-2_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73878-5
Online ISBN: 978-3-030-73879-2
eBook Packages: Computer Science, Computer Science (R0)