Algorithmic Aspects in Information and Management

Volume 5564 of the series Lecture Notes in Computer Science, pp. 301-314

PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications

  • Yi Wang (Google Beijing Research)
  • Hongjie Bai (Google Beijing Research)
  • Matt Stanton (Computer Science, CMU)
  • Wen-Yen Chen (Google Beijing Research)
  • Edward Y. Chang (Google Beijing Research)

Abstract

This paper presents PLDA, our parallel implementation of Latent Dirichlet Allocation on MPI and MapReduce. PLDA smooths out storage and computation bottlenecks and provides fault recovery for lengthy distributed computations. We show that PLDA can be applied to large, real-world applications and achieves good scalability. We have released MPI-PLDA as open source at http://code.google.com/p/plda under the Apache License.