Abstract
Discovering long-tail topics from massive data typically requires both a large number of topics and a large vocabulary. However, training such large-scale topic models on a single machine runs into bottlenecks in computational efficiency and data storage, making distributed training mechanisms for topic models necessary. In this chapter, we introduce distributed computing architectures in Sect. 7.1, followed by distributed sampling algorithms in Sect. 7.2 and distributed variational inference in Sect. 7.3.
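To make the data-parallel idea behind distributed sampling concrete, below is a minimal sketch in the spirit of AD-LDA-style collapsed Gibbs sampling: documents are partitioned across workers, each worker samples topic assignments against a stale snapshot of the global topic-word counts, and the workers' count deltas are merged at the end of each round. This is an illustrative single-process simulation, not the book's reference implementation; all variable names, the toy corpus, and the synchronous merge scheme are assumptions made for the example.

```python
# Illustrative sketch of data-parallel collapsed Gibbs sampling for LDA
# (AD-LDA style). Workers are simulated in one process; a real system would
# run them on separate machines and merge deltas via a parameter server or
# all-reduce. All names and sizes here are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
V, K, ALPHA, BETA = 50, 5, 0.1, 0.01  # vocabulary size, topics, Dirichlet priors

def sample_shard(docs, assigns, doc_topic, topic_word, topic_sum):
    """One collapsed-Gibbs sweep over a single worker's document shard."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = assigns[d][i]
            # Remove the token's current topic assignment from all counts.
            doc_topic[d, k] -= 1
            topic_word[k, w] -= 1
            topic_sum[k] -= 1
            # Collapsed Gibbs conditional:
            #   p(k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (doc_topic[d] + ALPHA) * (topic_word[:, w] + BETA) \
                / (topic_sum + V * BETA)
            k = rng.choice(K, p=p / p.sum())
            assigns[d][i] = k
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_sum[k] += 1

# Toy corpus, partitioned across P simulated workers.
P = 4
corpus = [rng.integers(V, size=int(rng.integers(5, 20))).tolist()
          for _ in range(40)]
shards = [corpus[p::P] for p in range(P)]

# Global topic-word counts; each worker keeps its own doc-topic counts.
global_tw = np.zeros((K, V), dtype=np.int64)
states = []
for shard in shards:
    assigns = [[int(rng.integers(K)) for _ in doc] for doc in shard]
    dt = np.zeros((len(shard), K), dtype=np.int64)
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            dt[d, assigns[d][i]] += 1
            global_tw[assigns[d][i], w] += 1
    states.append((shard, assigns, dt))

for _ in range(20):  # synchronous training rounds
    deltas = []
    for shard, assigns, dt in states:
        local_tw = global_tw.copy()     # worker samples against a stale snapshot
        sample_shard(shard, assigns, dt, local_tw, local_tw.sum(axis=1))
        deltas.append(local_tw - global_tw)   # net changes made by this worker
    global_tw += np.sum(deltas, axis=0)       # merge deltas into the global state
```

Because each worker samples against a stale snapshot of the global counts, this is an approximation to exact sequential Gibbs sampling; the trade-off is what allows the document shards to be processed in parallel.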