Distributed Training

Abstract

Discovering long-tail topics from massive data usually requires a large number of topics and a large-scale vocabulary. Training such large-scale topic models on a single machine, however, runs into bottlenecks in both computation and storage, so distributed training mechanisms are needed. In this chapter, we introduce distributed computing architectures in Sect. 7.1, distributed sampling algorithms in Sect. 7.2, and distributed variational inference in Sect. 7.3.
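To make the data-parallel idea behind distributed sampling concrete, the sketch below simulates an approximate distributed collapsed Gibbs sampler for LDA: documents are partitioned across workers, each worker runs a local Gibbs sweep against its own copy of the global topic-word counts, and the count deltas are merged once per sweep. This is only an illustrative sketch on a synthetic corpus, not the implementation described in this chapter; the function names (init_counts, gibbs_sweep), the priors ALPHA and BETA, and the choice to simulate workers sequentially in one process are assumptions made for brevity.

    # Illustrative sketch only: data-parallel collapsed Gibbs sampling for LDA.
    # Workers are simulated sequentially; in a real cluster each partition
    # would live on a separate machine and the merge step would be a
    # reduce/synchronization over the network.
    import numpy as np

    rng = np.random.default_rng(0)
    V, K, ALPHA, BETA = 50, 5, 0.1, 0.01   # vocabulary size, topics, Dirichlet priors

    def init_counts(docs):
        """Randomly assign topics and build doc-topic / topic-word count matrices."""
        z = [rng.integers(0, K, size=len(doc)) for doc in docs]
        ndk = np.zeros((len(docs), K))
        nkw = np.zeros((K, V))
        for d, (doc, zd) in enumerate(zip(docs, z)):
            for w, k in zip(doc, zd):
                ndk[d, k] += 1
                nkw[k, w] += 1
        return z, ndk, nkw

    def gibbs_sweep(docs, z, ndk, nkw):
        """One collapsed Gibbs sweep over a local document partition.

        ndk is local to the partition; nkw is this worker's copy of the
        global topic-word counts and is updated in place.
        """
        nk = nkw.sum(axis=1)
        for d, (doc, zd) in enumerate(zip(docs, z)):
            for i, w in enumerate(doc):
                k = zd[i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + ALPHA) * (nkw[:, w] + BETA) / (nk + V * BETA)
                k = rng.choice(K, p=p / p.sum())
                zd[i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    # Synthetic corpus, partitioned across four simulated workers.
    corpus = [rng.integers(0, V, size=rng.integers(5, 15)) for _ in range(40)]
    partitions = [corpus[i::4] for i in range(4)]

    states, global_nkw = [], np.zeros((K, V))
    for part in partitions:
        z, ndk, nkw = init_counts(part)
        states.append((z, ndk))
        global_nkw += nkw                      # global counts = sum of local counts

    for it in range(20):
        deltas = []
        for part, (z, ndk) in zip(partitions, states):
            local = global_nkw.copy()          # worker copies the global counts
            gibbs_sweep(part, z, ndk, local)   # samples against the (stale) copy
            deltas.append(local - global_nkw)  # only the local changes are sent back
        global_nkw += sum(deltas)              # synchronize once per sweep

    print("words per topic after training:", global_nkw.sum(axis=1))

Because every token belongs to exactly one partition, merging the per-worker deltas recovers counts that are consistent with the current topic assignments; the approximation relative to single-machine Gibbs sampling lies only in each worker sampling against counts that are slightly stale within a sweep.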
