
Machine Learning, Volume 87, Issue 1, pp. 1–32

Online variance minimization

  • Manfred K. Warmuth
  • Dima Kuzmin

Abstract

We consider the following type of online variance minimization problem: in every trial t our algorithms get a covariance matrix \(\boldsymbol{C}^{t}\) and try to select a parameter vector \(\boldsymbol{w}^{t-1}\) such that the total variance over a sequence of trials, \(\sum_{t=1}^{T} (\boldsymbol{w}^{t-1})^{\top} \boldsymbol{C}^{t}\boldsymbol{w}^{t-1}\), is not much larger than the total variance of the best parameter vector \(\boldsymbol{u}\) chosen in hindsight. Two parameter spaces in \(\mathbb{R}^{n}\) are considered: the probability simplex and the unit sphere. The first space is associated with the problem of minimizing risk in stock portfolios, and the second leads to an online calculation of the eigenvector with minimum eigenvalue of the total covariance matrix \(\sum_{t=1}^{T} \boldsymbol{C}^{t}\). For the first parameter space we apply the Exponentiated Gradient algorithm, which is motivated by a relative entropy regularization. In the second case, the algorithm has to maintain uncertainty information over all unit directions \(\boldsymbol{u}\). For this purpose, directions are represented as dyads \(\boldsymbol{u}\boldsymbol{u}^{\top}\) and the uncertainty over all directions as a mixture of dyads, which is a density matrix. The motivating divergence for density matrices is the quantum version of the relative entropy, and the resulting algorithm is a special case of the Matrix Exponentiated Gradient algorithm. In each of the two cases we prove bounds on the additional total variance incurred by the online algorithm over the best offline parameter.
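The two multiplicative updates described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithms: the learning rate `eta`, the diagonal test covariance, and all function names are assumptions. On the simplex, the loss \(\boldsymbol{w}^{\top}\boldsymbol{C}\boldsymbol{w}\) has gradient \(2\boldsymbol{C}\boldsymbol{w}\), and the EG step multiplies each weight by the exponentiated negative gradient before renormalizing. In the density-matrix case the expected variance of a random direction drawn from \(\boldsymbol{W}\) is \(\mathrm{tr}(\boldsymbol{W}\boldsymbol{C})\), linear in \(\boldsymbol{W}\), and the Matrix EG step exponentiates \(\log\boldsymbol{W} - \eta\boldsymbol{C}\) and renormalizes to unit trace.

```python
import numpy as np

def eg_update(w, C, eta=0.5):
    """One Exponentiated Gradient step on the probability simplex.

    The per-trial loss is w^T C w, so the gradient is 2 C w; the
    multiplicative update followed by renormalization keeps w a
    probability vector.
    """
    grad = 2.0 * C @ w
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

def matrix_eg_update(W, C, eta=0.5):
    """One Matrix Exponentiated Gradient step on density matrices:
    W_new is proportional to exp(log W - eta * C), normalized to unit
    trace. Matrix log/exp are computed via eigendecompositions;
    W is assumed symmetric and strictly positive definite.
    """
    vals, vecs = np.linalg.eigh(W)
    log_W = vecs @ np.diag(np.log(vals)) @ vecs.T
    a_vals, a_vecs = np.linalg.eigh(log_W - eta * C)
    W_new = a_vecs @ np.diag(np.exp(a_vals)) @ a_vecs.T
    return W_new / np.trace(W_new)

# Toy run: with C = diag(1, 0.1), the simplex iterate drifts toward the
# low-variance coordinate, and the density matrix concentrates on the
# eigenvector with minimum eigenvalue.
C = np.diag([1.0, 0.1])
w = np.array([0.5, 0.5])
for _ in range(200):
    w = eg_update(w, C)
W = np.eye(2) / 2.0
for _ in range(50):
    W = matrix_eg_update(W, C)
```

Note a qualitative difference the sketch makes visible: the simplex loss is quadratic in \(\boldsymbol{w}\), so its minimizer can be an interior point of the simplex (here \(w = (1/11,\,10/11)\)), whereas the trace loss is linear in \(\boldsymbol{W}\), so repeated updates drive the density matrix toward the rank-one dyad of the minimum eigenvector.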

Keywords

Hedge algorithm · Weighted majority algorithm · Online learning · Expert setting · Density matrix · Matrix exponentiated gradient algorithm · Quantum relative entropy


Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. University of California, Santa Cruz, USA
  2. Google, Mountain View, USA
