
Linear Regression with Mismatched Data: A Provably Optimal Local Search Algorithm

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12707)


Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the problem leads to computational challenges, and we are unaware of any algorithm for this problem with both theoretical guarantees and appealing computational performance. To this end, we propose and study a simple greedy local search algorithm. We prove that, under a suitable scaling of the number of mismatched pairs relative to the number of samples and features, and under certain assumptions on the covariates, our local search algorithm converges to the globally optimal solution at a linear rate in the noiseless setting.
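To make the idea of a greedy local search over permutations concrete, the following is a minimal sketch and not the algorithm as specified in the paper, whose exact formulation and constraints are not given in this abstract. It assumes the objective is the least-squares loss minimized jointly over the coefficients and a permutation of the responses, and that each step applies the single pairwise swap that most reduces this loss. The names `local_search`, `objective`, `max_swaps`, and `tol` are illustrative, and the swap budget `max_swaps` is only a crude stand-in for a bound on the number of mismatches.

```python
import numpy as np


def objective(X, y, perm):
    """Least-squares loss min over beta of ||y[perm] - X @ beta||^2 for a fixed permutation."""
    beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
    return float(np.sum((y[perm] - X @ beta) ** 2))


def local_search(X, y, max_swaps=None, tol=1e-9):
    """Greedy pairwise-swap search (illustrative sketch, assumes X has full column rank)."""
    n = len(y)
    Q, _ = np.linalg.qr(X)                 # orthonormal basis for the column space of X
    H = np.eye(n) - Q @ Q.T                # projector onto its orthogonal complement
    h = np.diag(H)
    G = h[:, None] + h[None, :] - 2.0 * H  # G[i, j] = (e_i - e_j)^T H (e_i - e_j)
    perm = np.arange(n)
    swaps = 0
    while max_swaps is None or swaps < max_swaps:
        z = y[perm]
        r = H @ z                          # residual after eliminating beta
        D = z[None, :] - z[:, None]        # D[i, j] = z_j - z_i
        R = r[:, None] - r[None, :]        # R[i, j] = r_i - r_j
        # Exact change in ||H z||^2 if entries i and j of z are swapped.
        change = 2.0 * D * R + D ** 2 * G
        i, j = np.unravel_index(np.argmin(change), change.shape)
        if change[i, j] >= -tol:           # no swap improves the loss: local optimum
            break
        perm[[i, j]] = perm[[j, i]]
        swaps += 1
    beta, *_ = np.linalg.lstsq(X, y[perm], rcond=None)
    return perm, beta
```

Calling `local_search(X, y)` returns a permutation of the response indices and the least-squares coefficients fitted to the reordered responses. Each accepted swap strictly decreases the loss, so the loop terminates, but this sketch alone gives no guarantee of reaching the global optimum; the conditions under which that holds are the subject of the paper.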


  • Linear regression
  • Mismatched data
  • Local search method
  • Learning permutations

Supported by grants from the Office of Naval Research: ONR-N000141812298 (YIP) and National Science Foundation: NSF-IIS-1718258.

  • DOI: 10.1007/978-3-030-73879-2_31
  • Chapter length: 15 pages


  1. This permutation \(P^*\) may not satisfy \( \mathsf {dist}(P^* ,I_n) = r\), but \( \mathsf {dist}(P^* ,I_n)\) will be close to \(r\).



Author information


Corresponding author

Correspondence to Rahul Mazumder.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Mazumder, R., Wang, H. (2021). Linear Regression with Mismatched Data: A Provably Optimal Local Search Algorithm. In: Singh, M., Williamson, D.P. (eds) Integer Programming and Combinatorial Optimization. IPCO 2021. Lecture Notes in Computer Science, vol 12707. Springer, Cham.


  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73878-5

  • Online ISBN: 978-3-030-73879-2

  • eBook Packages: Computer Science, Computer Science (R0)