A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data

Annals of Data Science

Abstract

With the rapid development of Internet technologies and sensing techniques, it has become much easier to acquire data from different sources at different dates and times. However, computing the correlation of such heterogeneous data remains a major challenge for data mining and information retrieval. Here, the features of a data point obtained from one source are called a view, and the multiview features all describe the same data point. In this paper, the hidden correlation between two-view features is proposed as the basis for constructing a Heterogeneous (multiview) Topic Model (HTM). In particular, a probabilistic topic model is used for each view, because generative models usually provide much richer features when handling high-dimensional data such as text. However, most existing probabilistic topic models, such as latent Dirichlet allocation, require the form of the probability distribution to be known. To avoid this limitation, the HTM is reduced to a non-negative matrix tri-factorization problem with certain constraints, so that the proposed approach can be used with an arbitrary model.
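To make the term concrete, the following is a minimal sketch of generic non-negative matrix tri-factorization, which approximates a non-negative matrix X by a product F S Gᵀ of three non-negative factors using standard multiplicative updates. It is not the HTM itself: the abstract does not state the model's constraints or how the two views map onto the factors, so the function name `nmtf`, the ranks `k1` and `k2`, the squared-error objective, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: X (n x m) ~ F (n x k1) @ S (k1 x k2) @ G.T (k2 x m).

    Hypothetical helper, not the paper's algorithm. Minimizes the
    Frobenius reconstruction error with multiplicative updates, which
    keep every factor non-negative when X is non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialization.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps

    errors = []
    for _ in range(n_iter):
        # Update each factor in turn while holding the other two fixed.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Synthetic non-negative data with an exact tri-factor structure (illustrative only).
rng = np.random.default_rng(42)
X = rng.random((30, 3)) @ rng.random((3, 4)) @ rng.random((4, 20))

F, S, G, errors = nmtf(X, k1=3, k2=4)
print("first / last reconstruction error:", errors[0], errors[-1])
```

In this generic form, the middle factor S couples the row factors in F with the column factors in G, which is why tri-factorization is a natural fit for relating two sets of latent components. Any constraints specific to the HTM would have to be added on top of these unconstrained updates.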

Notes

  1. Available at http://www.cs.nyu.edu/~roweis/data.html.

Author information

Corresponding author

Correspondence to Mingmin Chi.

Cite this article

Shen, J., Chi, M. A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data. Ann. Data. Sci. 5, 9–19 (2018). https://doi.org/10.1007/s40745-017-0135-y

  • DOI: https://doi.org/10.1007/s40745-017-0135-y
