Multiple Pass Streaming Algorithms for Learning Mixtures of Distributions in \({\mathbb R}^d\)

  • Conference paper
Algorithmic Learning Theory (ALT 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4754)

Abstract

We present a multiple pass streaming algorithm for learning the density function of a mixture of k uniform distributions over rectangles (cells) in \({\mathbb R}^d\), for any d > 0. Our learning model is as follows: samples drawn according to the mixture are placed in arbitrary order in a data stream that may only be accessed sequentially by an algorithm with a very limited random access memory space. Our algorithm makes 2ℓ + 1 passes, for any ℓ > 0, and requires memory at most \(\tilde O(\epsilon^{-2/\ell}k^2d^4+(2k)^d)\). This exhibits a strong tradeoff between passes and memory: a few more passes significantly lower the memory requirements, thus trading one of the two most important resources in streaming computation for the other. Chang and Kannan [1] first considered this problem for d = 1, 2.
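To get a feel for the tradeoff stated in the abstract, the following minimal sketch evaluates the leading term \(\epsilon^{-2/\ell}k^2d^4+(2k)^d\) of the memory bound for a few values of ℓ. It is an illustration only, not part of the paper: the function names are hypothetical, the hidden constant is assumed to be 1, and the polylogarithmic factors absorbed by \(\tilde O\) are ignored.

```python
def passes_used(ell: int) -> int:
    """Number of passes over the stream stated in the abstract: 2*ell + 1."""
    if ell < 1:
        raise ValueError("ell must be a positive integer")
    return 2 * ell + 1


def memory_leading_term(eps: float, k: int, d: int, ell: int) -> float:
    """Leading term eps^(-2/ell) * k^2 * d^4 + (2k)^d of the memory bound.

    Assumptions (not from the paper): constant factor 1, and all
    polylogarithmic factors hidden by the O-tilde notation are dropped.
    """
    if ell < 1:
        raise ValueError("ell must be a positive integer")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps is expected to lie in (0, 1)")
    return eps ** (-2.0 / ell) * k**2 * d**4 + (2 * k) ** d


if __name__ == "__main__":
    # Example parameters chosen purely for illustration.
    eps, k, d = 0.01, 2, 2
    for ell in (1, 2, 4, 8):
        print(
            f"ell={ell}: passes={passes_used(ell)}, "
            f"leading term ~ {memory_leading_term(eps, k, d, ell):.0f}"
        )
```

Only the first summand depends on ℓ, so additional passes reduce the \(\epsilon\)-dependent part of the bound while the \((2k)^d\) term stays fixed.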

Our learning algorithm is especially appropriate for situations where massive data sets of samples are available, but practical computation with such large inputs requires very restricted models of computation.

References

  1. Chang, K., Kannan, R.: The space complexity of pass-efficient algorithms for clustering. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1157–1166 (2006)

  2. Munro, J.I., Paterson, M.: Selection and sorting with limited storage. Theoretical Computer Science 12, 315–323 (1980)

  3. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. Journal of Computer and System Sciences 58, 137–147 (1999)

  4. Indyk, P.: Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the Association for Computing Machinery 53, 307–323 (2006)

  5. Arora, S., Kannan, R.: Learning mixtures of separated nonspherical Gaussians. Annals of Applied Probability 15, 69–92 (2005)

  6. Dasgupta, S.: Learning mixtures of Gaussians. In: Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, pp. 634–644. IEEE Computer Society Press, Los Alamitos (1999)

  7. Kannan, R., Salmasian, H., Vempala, S.: The spectral method for general mixture models. In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 444–457. Springer, Heidelberg (2005)

  8. Vempala, S., Wang, G.: A spectral algorithm for learning mixtures of distributions. Journal of Computer and System Sciences 68, 841–860 (2004)

  9. Dasgupta, A., Hopcroft, J.E., Kleinberg, J.M., Sandler, M.: On learning mixtures of heavy-tailed distributions. In: Proceedings of the 46th IEEE Symposium on Foundations of Computer Science, pp. 491–500. IEEE Computer Society Press, Los Alamitos (2005)

  10. Gilbert, A.C., Guha, S., Indyk, P., Kotidis, Y., Muthukrishnan, S., Strauss, M.: Fast, small-space algorithms for approximate histogram maintenance. In: Proceedings of the 34th Annual ACM Symposium on the Theory of Computing, pp. 389–398. ACM Press, New York (2002)

  11. Thaper, N., Guha, S., Indyk, P., Koudas, N.: Dynamic multidimensional histograms. In: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pp. 428–439. ACM Press, New York, NY, USA (2002)

  12. Guha, S., McGregor, A., Venkatasubramanian, S.: Streaming and sublinear approximation of entropy and information distances. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 733–742 (2006)

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chang, K.L. (2007). Multiple Pass Streaming Algorithms for Learning Mixtures of Distributions in \({\mathbb R}^d\). In: Hutter, M., Servedio, R.A., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science, vol 4754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75225-7_19

  • DOI: https://doi.org/10.1007/978-3-540-75225-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75224-0

  • Online ISBN: 978-3-540-75225-7

  • eBook Packages: Computer Science, Computer Science (R0)
