LSA-PTM: A Propagation-Based Topic Model Using Latent Semantic Analysis on Heterogeneous Information Networks

  • Conference paper
Web-Age Information Management (WAIM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7923)

Abstract

Topic modeling on information networks is important for data analysis. Although many advanced techniques exist for this task, few of them consider either heterogeneous information networks or the readability of the discovered topics. In this paper, we study the problem of topic modeling on heterogeneous information networks and put forward LSA-PTM. LSA-PTM first extracts meaningful frequent phrases from the documents of a heterogeneous information network. Latent semantic analysis (LSA) is then conducted on these phrases to obtain the inherent topics of the documents. Next, we introduce a topic propagation method that propagates the topics obtained by LSA over the heterogeneous information network via the links between different objects, which optimizes the topics and identifies clusters of multi-typed objects simultaneously. To make the topics more understandable, a topic description is calculated for each discovered topic. We apply LSA-PTM to real data, and the experimental results demonstrate its effectiveness.
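The abstract does not give the model's equations, so the following is only a minimal sketch of the two core steps it names: LSA by truncated singular value decomposition of a phrase-by-document matrix, followed by propagation of the document topic vectors over links to other objects. The function names `lsa_topics` and `propagate`, the blending weight `alpha`, and the averaging update rule are assumptions made for illustration; they are not the update rules of LSA-PTM.

```python
import numpy as np


def lsa_topics(phrase_doc, k):
    """LSA via truncated SVD of a phrase-by-document count matrix.

    Returns (phrase_topic, doc_topic), each with k columns.
    """
    A = np.asarray(phrase_doc, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    phrase_topic = U[:, :k] * s[:k]   # one row per phrase
    doc_topic = Vt[:k, :].T * s[:k]   # one row per document
    return phrase_topic, doc_topic


def propagate(doc_topic, doc_obj, alpha=0.5, iters=10):
    """Hypothetical propagation step (an assumption, not the paper's rule).

    doc_obj[i, j] = 1 if document i is linked to object j (e.g. an author).
    Each object takes the mean topic vector of its linked documents; each
    document is then blended with the mean of its linked objects.
    """
    D0 = np.asarray(doc_topic, dtype=float)
    L = np.asarray(doc_obj, dtype=float)
    # Row-normalised link matrices; isolated nodes keep a divisor of 1.
    obj_from_doc = L.T / np.maximum(L.sum(axis=0), 1.0)[:, None]
    doc_from_obj = L / np.maximum(L.sum(axis=1), 1.0)[:, None]
    D = D0.copy()
    O = obj_from_doc @ D
    for _ in range(iters):
        D = alpha * D0 + (1.0 - alpha) * (doc_from_obj @ O)
        O = obj_from_doc @ D
    return D, O


if __name__ == "__main__":
    # Toy data: 4 phrases x 4 documents, two obvious groups of documents.
    phrase_doc = [[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 2, 1],
                  [0, 0, 1, 2]]
    # Documents 0 and 1 share object 0; documents 2 and 3 share object 1.
    doc_obj = [[1, 0], [1, 0], [0, 1], [0, 1]]
    _, doc_topic = lsa_topics(phrase_doc, k=2)
    docs, objs = propagate(doc_topic, doc_obj)
    print(np.round(docs, 3))
    print(np.round(objs, 3))
```

In this toy setting, objects linked to the same group of documents end up with that group's topic vector, which is the sense in which propagation can cluster multi-typed objects alongside the documents.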

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, Q., Peng, Z., Jiang, F., Li, Q. (2013). LSA-PTM: A Propagation-Based Topic Model Using Latent Semantic Analysis on Heterogeneous Information Networks. In: Wang, J., Xiong, H., Ishikawa, Y., Xu, J., Zhou, J. (eds) Web-Age Information Management. WAIM 2013. Lecture Notes in Computer Science, vol 7923. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38562-9_2

  • DOI: https://doi.org/10.1007/978-3-642-38562-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38561-2

  • Online ISBN: 978-3-642-38562-9

  • eBook Packages: Computer Science, Computer Science (R0)
