Efficient Probabilistic Latent Semantic Analysis through Parallelization

  • Conference paper
Information Retrieval Technology (AIRS 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5839)

Abstract

Probabilistic latent semantic analysis (PLSA) is considered an effective technique for information retrieval, but it has one notable drawback: its heavy consumption of computing resources, in both execution time and internal memory. This drawback limits the practical application of the technique to document collections of modest size.

In this paper, we look into the practice of implementing PLSA with the aim of improving its efficiency without changing its output. Recently, Hong et al. [2008] have shown how the execution time of PLSA can be improved by employing OpenMP for shared memory parallelization. We extend their work by also studying the effects of using it in combination with the Message Passing Interface (MPI) for distributed memory parallelization. We show how a more careful implementation of PLSA reduces execution time and memory costs by applying our method to several text collections commonly used in the literature.
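To make the idea of parallelizing PLSA without changing its output more concrete, the following is a minimal sketch of one EM iteration in which the documents are split into shards. It is not the authors' implementation: the function names (`init_params`, `partial_stats`, `em_step`), the round-robin sharding, and the dictionary-based count format are all assumptions made for illustration. The shards are processed sequentially here; in a real system each shard could plausibly be given to an OpenMP thread or an MPI rank, with the summation of partial results playing the role of a reduction.

```python
import random


def init_params(n_docs, n_words, n_topics, seed=0):
    """Random, normalized starting values for P(z|d) and P(w|z)."""
    rng = random.Random(seed)

    def normalized_rows(rows, cols):
        out = []
        for _ in range(rows):
            row = [rng.random() + 0.1 for _ in range(cols)]
            total = sum(row)
            out.append([v / total for v in row])
        return out

    p_z_d = normalized_rows(n_docs, n_topics)   # one row per document
    p_w_z = normalized_rows(n_topics, n_words)  # one row per topic
    return p_z_d, p_w_z


def partial_stats(counts, doc_ids, p_z_d, p_w_z):
    """E-step and local accumulation for one shard of documents.

    counts[d] maps word id -> n(d, w); every document is assumed non-empty.
    Returns expected counts n(d, w) * P(z|d, w) summed over the shard.
    Shards cover disjoint documents, so they are independent of each other.
    """
    n_topics, n_words = len(p_w_z), len(p_w_z[0])
    word_topic = [[0.0] * n_words for _ in range(n_topics)]
    doc_topic = {}
    for d in doc_ids:
        acc = [0.0] * n_topics
        for w, n_dw in counts[d].items():
            joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
            norm = sum(joint)
            for z in range(n_topics):
                r = n_dw * joint[z] / norm  # n(d, w) * P(z|d, w)
                word_topic[z][w] += r
                acc[z] += r
        doc_topic[d] = acc
    return word_topic, doc_topic


def em_step(counts, p_z_d, p_w_z, n_shards=1):
    """One EM iteration; n_shards only changes how the work is divided."""
    n_docs = len(counts)
    n_topics, n_words = len(p_w_z), len(p_w_z[0])
    # Hypothetical round-robin split of documents across workers.
    shards = [range(s, n_docs, n_shards) for s in range(n_shards)]
    results = [partial_stats(counts, sh, p_z_d, p_w_z) for sh in shards]

    # Reduction: sum the per-shard word/topic statistics.
    word_topic = [[0.0] * n_words for _ in range(n_topics)]
    new_p_z_d = [None] * n_docs
    for wt, dt in results:
        for z in range(n_topics):
            for w in range(n_words):
                word_topic[z][w] += wt[z][w]
        # M-step for P(z|d): each document belongs to exactly one shard.
        for d, acc in dt.items():
            total = sum(acc)
            new_p_z_d[d] = [v / total for v in acc]

    # M-step for P(w|z) from the reduced statistics.
    new_p_w_z = []
    for z in range(n_topics):
        total = sum(word_topic[z])
        new_p_w_z.append([v / total for v in word_topic[z]])
    return new_p_z_d, new_p_w_z
```

Calling `em_step` with different values of `n_shards` on the same inputs should give the same parameters up to floating-point rounding, since only the order of summation changes. The sketch leaves out the memory-saving measures the abstract alludes to, as the abstract does not describe them.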

References

  • Chang, J.-M., Su, E.C.-Y., Lo, A., Chiu, H.-S., Sung, T.-Y., Hsu, W.-L.: PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins 72(2), 693–710 (2008)

  • Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6), 391–407 (1990)

  • Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39(1), 1–38 (1977)

  • Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: Goals, concept, and design of a next generation MPI implementation. In: Kranzlmüller, D., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2004. LNCS, vol. 3241, pp. 97–104. Springer, Heidelberg (2004), http://www.open-mpi.org/

  • Hanselmann, M., Kirchner, M., Renard, B.Y., Amstalden, E.R., Glunde, K., Heeren, R.M.A., Hamprecht, F.A.: Concise representation of mass spectrometry images by probabilistic latent semantic analysis. Analytical Chemistry (November 2008), ISSN 1520-6882

  • Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42(1–2), 177–196 (2001)

  • Hofmann, T.: Probabilistic latent semantic indexing. In: Proc. 22nd ACM International Conference on Research and Development in Information Retrieval (SIGIR), pp. 50–57. ACM Press, New York (1999)

  • Hong, C., Chen, W., Zheng, W., Shan, J., Chen, Y., Zhang, Y.: Parallelization and characterization of probabilistic latent semantic analysis. In: Proc. 37th International Conference on Parallel Processing, pp. 628–635 (2008)

  • Kim, Y.-S., Chang, J.-H., Zhang, B.-T.: An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (eds.) PAKDD 2003. LNCS (LNAI), vol. 2637, pp. 111–116. Springer, Heidelberg (2003)

  • Mamitsuka, H.: Hierarchical latent knowledge analysis for co-occurrence data. In: Proc. 20th International Conference on Machine Learning, pp. 504–511 (2003)

  • Message Passing Interface Forum: MPI: A message-passing interface standard, version 2.1 (June 2008), http://www.mpi-forum.org/docs/docs.html

  • OpenMP Architecture Review Board: OpenMP application programming interface, version 3.0 (May 2008), http://openmp.org/wp/openmp-specifications/

  • Owens, J.D., Houston, M., Luebke, D., Stone, J.E., Philips, J.C.: GPU computing. Proc. IEEE 96(5), 879–899 (2008), http://gpgpu.org/

  • Park, L.A.F., Ramamohanarao, K.: Efficient storage and retrieval of probabilistic latent semantic information for information retrieval. The VLDB Journal 18(1), 141–155 (2009)

  • Quinn, M.J.: Parallel Programming in C with MPI and OpenMP. McGraw-Hill, New York (2003)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wan, R., Anh, V.N., Mamitsuka, H. (2009). Efficient Probabilistic Latent Semantic Analysis through Parallelization. In: Lee, G.G., et al. Information Retrieval Technology. AIRS 2009. Lecture Notes in Computer Science, vol 5839. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04769-5_38

  • DOI: https://doi.org/10.1007/978-3-642-04769-5_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04768-8

  • Online ISBN: 978-3-642-04769-5

  • eBook Packages: Computer Science (R0)