
Near-optimal discrete optimization for experimental design: a regret minimization approach

  • Zeyuan Allen-Zhu
  • Yuanzhi Li
  • Aarti Singh
  • Yining Wang (corresponding author)
Full Length Paper Series A

Abstract

The experimental design problem concerns the selection of k points from a potentially large design pool of p-dimensional vectors, so as to maximize the statistical efficiency of regression on the selected k design points. Statistical efficiency is measured by optimality criteria, including A(verage), D(eterminant), T(race), E(igen), V(ariance) and G-optimality. Except for T-optimality, exact optimization is challenging, and for certain instances of D/E-optimality, exact or even approximate optimization is proven to be NP-hard. We propose a polynomial-time regret minimization framework that achieves a \((1+\varepsilon )\) approximation with only \(O(p/\varepsilon ^2)\) design points, for all of the optimality criteria above. In contrast, to the best of our knowledge, before our work no polynomial-time algorithm achieved a \((1+\varepsilon )\) approximation for D/E/G-optimality, and the best polynomial-time algorithm achieving a \((1+\varepsilon )\) approximation for A/V-optimality required \(k=\varOmega (p^2/\varepsilon )\) design points.
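For concreteness, all of the criteria named above are scalar functions of the information matrix \(\Sigma _S=\sum _{i\in S}x_ix_i^\top \) of the selected points. The NumPy sketch below (an illustrative example, not the paper's algorithm; all variable names are hypothetical) evaluates each criterion for a given subset S, assuming the selection is non-degenerate so that \(\Sigma _S\) is invertible.

```python
import numpy as np

def optimality_criteria(X, S):
    """Evaluate classical optimality criteria for the rows S of the n-by-p
    design pool X (illustrative sketch only). A/D/E/V/G are written in
    'loss' form (smaller is better); T is a gain to be maximized."""
    Sigma = X[S].T @ X[S]                 # p-by-p information matrix of the chosen points
    Sigma_inv = np.linalg.inv(Sigma)      # assumes the chosen points span R^p
    p = X.shape[1]
    sign, logdet = np.linalg.slogdet(Sigma)
    # x_i^T Sigma^{-1} x_i for every point in the pool (prediction variances)
    pred_var = np.einsum("ij,jk,ik->i", X, Sigma_inv, X)
    return {
        "A": np.trace(Sigma_inv) / p,     # average variance of the coefficient estimates
        "D": np.exp(-logdet / p),         # (det Sigma^{-1})^{1/p}, the volume criterion
        "T": np.trace(Sigma) / p,         # trace criterion (maximize)
        "E": np.linalg.norm(Sigma_inv, 2),# largest eigenvalue of Sigma^{-1}
        "V": pred_var.mean(),             # average prediction variance over the pool
        "G": pred_var.max(),              # worst-case prediction variance over the pool
    }

# Usage: score k = 40 randomly chosen points from a pool of 500 in dimension p = 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
S = rng.choice(500, size=40, replace=False)
print(optimality_criteria(X, S))
```

A combinatorial solver would search over subsets S to minimize one of these loss values (or maximize T); the paper's contribution is a regret-minimization rounding scheme that does so with a \((1+\varepsilon )\) guarantee using only \(O(p/\varepsilon ^2)\) points.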

Keywords

Experimental design · Spectral sparsification · Regret minimization

Mathematics Subject Classification

90C27 · 62K05

Notes

Acknowledgements

We thank Adams Wei Yu for helpful discussions regarding the implementation of the entropic mirror descent solver for the continuous (convex) relaxation problem, and Aleksandar Nikolov, Shayan Oveis Gharan, and Mohit Singh for discussions on the references. This work was supported by NSF CCF-1563918, NSF CAREER IIS-1252412, and AFRL FA87501720212.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2020

Authors and Affiliations

  • Zeyuan Allen-Zhu (1)
  • Yuanzhi Li (2)
  • Aarti Singh (2)
  • Yining Wang (3) (corresponding author)
  1. Microsoft Research Redmond, Redmond, USA
  2. Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
  3. Warrington College of Business, University of Florida, Gainesville, USA
