Frontiers of Computer Science, Volume 7, Issue 4, pp 526–535

Online belief propagation algorithm for probabilistic latent semantic analysis

  • Yun Ye
  • Shengrong Gong
  • Chunping Liu
  • Jia Zeng
  • Ning Jia
  • Yi Zhang
Research Article

DOI: 10.1007/s11704-013-2360-7

Cite this article as:
Ye, Y., Gong, S., Liu, C. et al. Front. Comput. Sci. (2013) 7: 526. doi:10.1007/s11704-013-2360-7

Abstract

Probabilistic latent semantic analysis (PLSA) is a topic model for text documents that has been widely used in text mining, computer vision, computational biology, and other fields. For batch PLSA inference algorithms, the required memory grows linearly with the data size, which makes massive data streams very difficult to handle. To process big data streams, we propose an online belief propagation (OBP) algorithm based on an improved factor graph representation for PLSA. The factor graph representation makes it possible to apply the classic belief propagation (BP) algorithm to PLSA. OBP splits the data stream into a set of small segments and uses the parameters estimated from previous segments to compute the gradient descent update for the current segment. Because OBP removes each segment from memory after processing it, it is memory-efficient for big data streams. We examine the performance of OBP on four document data sets and demonstrate that OBP is competitive with online expectation maximization (OEM) for PLSA in both speed and accuracy, and can also give a more accurate account of topic evolution. Experiments on massive data streams from Baidu further confirm the effectiveness of the OBP algorithm.
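
The segment-by-segment processing described in the abstract can be illustrated with a minimal sketch. It is not the OBP message-passing update from the paper: it swaps in a generic stepwise online-EM-style update for PLSA, only to show how parameters estimated from earlier segments are reused for the current one and how each segment can be discarded afterwards. All names (`online_plsa`, `process_segment`, `rho`, `smoothing`) and the step-size schedule are assumptions made for illustration.

```python
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def process_segment(segment, phi, num_topics, inner_iters=5):
    """Expected topic-word counts for one segment, with phi held fixed.

    Each document is a dict mapping word id -> count (an assumed format).
    """
    vocab_size = len(phi[0])
    counts = [[0.0] * vocab_size for _ in range(num_topics)]
    for doc in segment:
        if not doc:
            continue  # skip empty documents
        # Local document-topic proportions, re-estimated per document.
        theta = [1.0 / num_topics] * num_topics
        for _ in range(inner_iters):
            new_theta = [0.0] * num_topics
            for w, c in doc.items():
                post = normalize([theta[k] * phi[k][w] for k in range(num_topics)])
                for k in range(num_topics):
                    new_theta[k] += c * post[k]
            theta = normalize(new_theta)
        # Accumulate expected counts using the final local estimate.
        for w, c in doc.items():
            post = normalize([theta[k] * phi[k][w] for k in range(num_topics)])
            for k in range(num_topics):
                counts[k][w] += c * post[k]
    return counts


def online_plsa(stream, num_topics, vocab_size, seed=0, smoothing=1e-3):
    """Consume an iterable of segments once and return topic-word distributions."""
    rng = random.Random(seed)
    stats = [[rng.random() + 1.0 for _ in range(vocab_size)]
             for _ in range(num_topics)]
    phi = [normalize(row) for row in stats]
    for t, segment in enumerate(stream, start=1):
        rho = 1.0 / (t + 1)  # hypothetical decaying step size, not from the paper
        counts = process_segment(segment, phi, num_topics)
        for k in range(num_topics):
            # Blend statistics from earlier segments with the current segment.
            stats[k] = [(1 - rho) * s + rho * c
                        for s, c in zip(stats[k], counts[k])]
            phi[k] = normalize([s + smoothing for s in stats[k]])
        # Nothing from `segment` is kept past this point.
    return phi
```

If `stream` is a generator, only one segment is held in memory at a time, so memory use depends on the segment size and the size of `phi` rather than on the total length of the stream.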

Keywords

probabilistic latent semantic analysis, topic models, expectation maximization, belief propagation

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yun Ye (1, 2)
  • Shengrong Gong (1)
  • Chunping Liu (1)
  • Jia Zeng (1)
  • Ning Jia (2)
  • Yi Zhang (2)
  1. School of Computer Science & Technology, Soochow University, Suzhou, China
  2. Feng Chao Revenue, Baidu Online Network Technology Co., LTD, Beijing, China