PAC Learning Axis-Aligned Mixtures of Gaussians with No Separation Assumption

  • Jon Feldman
  • Rocco A. Servedio
  • Ryan O’Donnell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


We propose and analyze a new vantage point for learning mixtures of Gaussians: namely, the PAC-style model of learning probability distributions introduced by Kearns et al. [13]. Here the task is to construct a hypothesis mixture of Gaussians that is statistically indistinguishable from the actual mixture generating the data; specifically, the KL divergence between the true mixture and the hypothesis should be at most ε.
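The KL divergence between two Gaussian mixtures has no closed form. The sketch below therefore estimates it by Monte Carlo to make the success criterion concrete. It is only an illustration of the error measure, not part of the authors' algorithm, and it rests on the following assumptions:

* The direction is KL(target || hypothesis).
* A mixture is represented as a list of hypothetical (weight, means, variances) tuples.
* All function names are illustrative.

```python
import math
import random

# Hypothetical representation: a mixture is a list of (weight, means, variances)
# tuples, one per axis-aligned (diagonal-covariance) Gaussian component.

def log_density(x, mixture):
    """Log-density of the mixture at x, combined with log-sum-exp for stability."""
    logs = []
    for w, mu, var in mixture:
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        logs.append(lp)
    top = max(logs)
    return top + math.log(sum(math.exp(lv - top) for lv in logs))

def sample(mixture, rng):
    """Pick a component by weight, then draw each coordinate independently."""
    weights = [w for w, _, _ in mixture]
    _, mu, var = rng.choices(mixture, weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]

def kl_monte_carlo(p, q, num_samples=20000, seed=0):
    """Estimate KL(p || q) = E_{x ~ p}[log p(x) - log q(x)] by sampling from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = sample(p, rng)
        total += log_density(x, p) - log_density(x, q)
    return total / num_samples

target = [(0.3, [0.0, 0.0], [1.0, 1.0]), (0.7, [1.0, -2.0], [0.5, 2.0])]
hypothesis = [(0.35, [0.1, 0.0], [1.0, 1.1]), (0.65, [1.0, -1.9], [0.5, 2.0])]
print(kl_monte_carlo(target, hypothesis))
```

KL divergence is asymmetric, so swapping `p` and `q` generally changes the value. Because the estimate is random, a practical check against ε would also need to account for its sampling error.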

In this scenario, we give a poly(n/ε)-time algorithm that learns the class of mixtures of any constant number of axis-aligned Gaussians in ℝⁿ. Our algorithm makes no assumptions about the separation between the means of the Gaussians, nor does it have any dependence on the minimum mixing weight. This is in contrast to learning results known in the “clustering” model, where such assumptions are unavoidable.
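Here is what "axis-aligned" means in practice. Each component has a diagonal covariance, so a point is sampled by choosing a component according to its mixing weight and then drawing every coordinate independently. One consequence is that, for distinct coordinates i ≠ j, the cross moment E[x_i x_j] = Σ_k w_k μ_{k,i} μ_{k,j} depends only on the weights and means. The sketch below checks this identity numerically. The (weight, means, variances) representation and the function names are hypothetical, chosen for illustration.

```python
import math
import random

# Hypothetical representation: a mixture is a list of (weight, means, variances)
# tuples; each component is an axis-aligned (diagonal-covariance) Gaussian.

def sample(mixture, rng):
    """Pick a component by weight, then draw each coordinate independently."""
    weights = [w for w, _, _ in mixture]
    _, mu, var = rng.choices(mixture, weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]

def exact_cross_moment(mixture, i, j):
    """E[x_i * x_j] for i != j. Within a component x_i and x_j are independent,
    so each component contributes w * mu_i * mu_j and the variances drop out."""
    if i == j:
        raise ValueError("cross moments need distinct coordinates")
    return sum(w * mu[i] * mu[j] for w, mu, _ in mixture)

def empirical_cross_moment(points, i, j):
    """Sample average of x_i * x_j."""
    return sum(p[i] * p[j] for p in points) / len(points)

mixture = [
    (0.4, [1.0, 2.0, 0.0], [1.0, 3.0, 0.5]),
    (0.6, [-1.0, 0.5, 2.0], [2.0, 1.0, 1.0]),
]
rng = random.Random(0)
points = [sample(mixture, rng) for _ in range(50000)]
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(i, j, exact_cross_moment(mixture, i, j), empirical_cross_moment(points, i, j))
```

The empirical averages should track the exact values closely. The diagonal moments E[x_i²], by contrast, also involve the variances.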

Our algorithm relies on the method of moments, and a subalgorithm developed in [9] for a discrete mixture-learning problem.
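The following is a generic illustration of the method of moments. It is not the authors' algorithm, which additionally relies on the subalgorithm of [9]. The sketch does two things:

1. It computes the raw moments of a one-dimensional Gaussian mixture in closed form.
2. It scores a candidate mixture by how closely its moments match the empirical moments of the data.

The function names, the use of moments up to order 4, and the unweighted squared-error score are all assumptions made for illustration.

```python
import random

def gaussian_raw_moments(mu, var, max_order):
    """E[X^k] for k = 0..max_order when X ~ N(mu, var), via the recursion
    M_k = mu * M_{k-1} + (k - 1) * var * M_{k-2}, with M_0 = 1 and M_1 = mu."""
    m = [1.0, mu]
    for k in range(2, max_order + 1):
        m.append(mu * m[k - 1] + (k - 1) * var * m[k - 2])
    return m[: max_order + 1]

def mixture_raw_moments(mixture, max_order):
    """Raw moments of a 1-D mixture: the weight-averaged component moments."""
    total = [0.0] * (max_order + 1)
    for w, mu, var in mixture:
        for k, mk in enumerate(gaussian_raw_moments(mu, var, max_order)):
            total[k] += w * mk
    return total

def empirical_raw_moments(xs, max_order):
    """Sample averages of x^k for k = 0..max_order."""
    n = len(xs)
    return [sum(x ** k for x in xs) / n for k in range(max_order + 1)]

def moment_discrepancy(mixture, xs, max_order=4):
    """Unweighted squared gap between model and empirical moments of order 1..max_order."""
    model = mixture_raw_moments(mixture, max_order)
    data = empirical_raw_moments(xs, max_order)
    return sum((a - b) ** 2 for a, b in zip(model[1:], data[1:]))

# Hypothetical 1-D example: components are (weight, mean, variance).
true_mix = [(0.5, -1.0, 1.0), (0.5, 2.0, 0.5)]
wrong_mix = [(0.5, 0.0, 1.0), (0.5, 1.0, 0.5)]
rng = random.Random(1)
xs = []
for _ in range(20000):
    _, mu, var = rng.choices(true_mix, weights=[c[0] for c in true_mix])[0]
    xs.append(rng.gauss(mu, var ** 0.5))
print(moment_discrepancy(true_mix, xs), moment_discrepancy(wrong_mix, xs))
```

In this example the true parameters should score far lower than the wrong candidate. A real moment-based estimator would search over candidates rather than compare two fixed ones.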




References

  1. Achlioptas, D., McSherry, F.: On spectral learning of mixtures of distributions. In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 458–469. Springer, Heidelberg (2005)
  2. Arora, S., Kannan, R.: Learning mixtures of arbitrary Gaussians. In: Proceedings of the 33rd Symposium on Theory of Computing, pp. 247–257 (2001)
  3. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, Chichester (1991)
  4. Cryan, M., Goldberg, L., Goldberg, P.: Evolutionary trees can be learned in polynomial time in the two-state general Markov model. SIAM Journal on Computing 31(2), 375–397 (2002)
  5. Dasgupta, S.: Learning mixtures of Gaussians. In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, pp. 634–644 (1999)
  6. Dasgupta, S., Schulman, L.: A two-round variant of EM for Gaussian mixtures. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 143–151 (2000)
  7. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. Ser. B 39, 1–38 (1977)
  8. Feldman, J., O’Donnell, R., Servedio, R.: PAC learning mixtures of Gaussians with no separation assumption. Available at:
  9. Feldman, J., O’Donnell, R., Servedio, R.: Learning mixtures of product distributions over discrete domains. In: Proceedings of the 46th IEEE FOCS, pp. 501–510 (2005)
  10. Freund, Y., Kearns, M., Ron, D., Rubinfeld, R., Schapire, R., Sellie, L.: Efficient learning of typical finite automata from random walks. Information and Computation 138(1), 23–48 (1997)
  11. Freund, Y., Mansour, Y.: Estimating a mixture of two product distributions. In: Proceedings of the 12th Annual COLT, pp. 183–192 (1999)
  12. Kannan, R., Salmasian, H., Vempala, S.S.: The spectral method for general mixture models. In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS, vol. 3559, pp. 444–457. Springer, Heidelberg (2005)
  13. Kearns, M., Mansour, Y., Ron, D., Rubinfeld, R., Schapire, R., Sellie, L.: On the learnability of discrete distributions. In: Proceedings of the 26th STOC, pp. 273–282 (1994)
  14. Lindsay, B.: Mixture models: theory, geometry and applications. Institute for Mathematical Statistics (1995)
  15. Naor, M.: Evaluation may be easier than generation. In: Proceedings of the 28th Symposium on Theory of Computing (STOC), pp. 74–83 (1996)
  16. Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical analysis of finite mixture distributions. Wiley & Sons, Chichester (1985)
  17. Valiant, L.: A theory of the learnable. Communications of the ACM 27(11), 1134–1142 (1984)
  18. Vempala, S., Wang, G.: A spectral algorithm for learning mixtures of distributions. In: Proceedings of the 43rd IEEE FOCS, pp. 113–122 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jon Feldman (Google)
  • Rocco A. Servedio (Columbia University)
  • Ryan O’Donnell (Microsoft Research)
