Abstract
Discovering long-tail topics from massive data typically requires both a large number of topics and a large vocabulary. However, training such large-scale topic models on a single machine runs into bottlenecks in computational efficiency and data storage, making distributed training mechanisms for topic models necessary. In this chapter, we introduce distributed computing architectures in Sect. 7.1, followed by distributed sampling algorithms in Sect. 7.2 and distributed variational inference in Sect. 7.3.
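To make the data-parallel idea behind distributed sampling concrete, below is a minimal sketch in the spirit of AD-LDA-style collapsed Gibbs sampling: documents are partitioned across workers, each worker samples topic assignments against a stale snapshot of the global topic-word counts, and the workers' count deltas are merged at the end of each round. This is an illustrative single-process simulation, not the book's reference implementation; all variable names, the toy corpus, and the synchronous merge scheme are assumptions made for the example.

```python
# Illustrative sketch of data-parallel collapsed Gibbs sampling for LDA
# (AD-LDA style). Workers are simulated in one process; a real system would
# run them on separate machines and merge deltas via a parameter server or
# all-reduce. All names and sizes here are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
V, K, ALPHA, BETA = 50, 5, 0.1, 0.01  # vocabulary size, topics, Dirichlet priors

def sample_shard(docs, assigns, doc_topic, topic_word, topic_sum):
    """One collapsed-Gibbs sweep over a single worker's document shard."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = assigns[d][i]
            # Remove the token's current topic assignment from all counts.
            doc_topic[d, k] -= 1
            topic_word[k, w] -= 1
            topic_sum[k] -= 1
            # Collapsed Gibbs conditional:
            #   p(k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (doc_topic[d] + ALPHA) * (topic_word[:, w] + BETA) \
                / (topic_sum + V * BETA)
            k = rng.choice(K, p=p / p.sum())
            assigns[d][i] = k
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_sum[k] += 1

# Toy corpus, partitioned across P simulated workers.
P = 4
corpus = [rng.integers(V, size=int(rng.integers(5, 20))).tolist()
          for _ in range(40)]
shards = [corpus[p::P] for p in range(P)]

# Global topic-word counts; each worker keeps its own doc-topic counts.
global_tw = np.zeros((K, V), dtype=np.int64)
states = []
for shard in shards:
    assigns = [[int(rng.integers(K)) for _ in doc] for doc in shard]
    dt = np.zeros((len(shard), K), dtype=np.int64)
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            dt[d, assigns[d][i]] += 1
            global_tw[assigns[d][i], w] += 1
    states.append((shard, assigns, dt))

for _ in range(20):  # synchronous training rounds
    deltas = []
    for shard, assigns, dt in states:
        local_tw = global_tw.copy()     # worker samples against a stale snapshot
        sample_shard(shard, assigns, dt, local_tw, local_tw.sum(axis=1))
        deltas.append(local_tw - global_tw)   # net changes made by this worker
    global_tw += np.sum(deltas, axis=0)       # merge deltas into the global state
```

Because each worker samples against a stale snapshot of the global counts, this is an approximation to exact sequential Gibbs sampling; the trade-off is what allows the document shards to be processed in parallel.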