A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data

Shen, Jinsheng; Chi, Mingmin

doi:10.1007/s40745-017-0135-y

Published: 20 February 2018

Annals of Data Science, Volume 5, pages 9–19 (2018)

Jinsheng Shen¹ & Mingmin Chi¹

Abstract

With the rapid development of Internet technologies and sensing techniques, it has become much easier to acquire data from different sources at different dates and times. However, computing the correlation among such heterogeneous data remains a major challenge for data mining and information retrieval. Here, the features of a data point obtained from one source are called a view, and the multiview features together describe the same data point. In this paper, the hidden correlation between the features of two views is used to construct a Heterogeneous (multiview) Topic Model (HTM). In particular, a probabilistic topic model is used for each view because generative models usually provide much richer features when handling high-dimensional data such as text. Nevertheless, most existing probabilistic topic models, such as latent Dirichlet allocation, require the form of the probability distribution to be known. To avoid this limitation, the HTM is reduced to solving a non-negative matrix tri-factorization problem with certain constraints, so that the proposed approach can be used with an arbitrary model.
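
The abstract does not state the HTM objective or its constraints, so the following is only a generic illustration of non-negative matrix tri-factorization, not the authors' method. It assumes a non-negative matrix X (for instance, co-occurrence counts between the features of two views) and approximates X ≈ F S Gᵀ using standard unconstrained multiplicative updates for the Frobenius loss. The function name `nmtf`, the ranks `k1` and `k2`, and the toy data are all hypothetical.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n x m) as F @ S @ G.T.

    F (n x k1) and G (m x k2) play the role of per-view factor matrices,
    and S (k1 x k2) couples the two sets of factors. This is a plain,
    unconstrained sketch; the constraints used by HTM are NOT reproduced.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialisation keeps the updates well defined.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates derived from the gradient of ||X - F S G^T||_F^2.
        # Each factor is multiplied by (negative gradient part) / (positive part),
        # which preserves non-negativity.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy data (hypothetical): a matrix that has an exact non-negative
# tri-factorization with inner ranks 3 and 4.
rng = np.random.default_rng(42)
X = rng.random((30, 3)) @ rng.random((3, 4)) @ rng.random((4, 20))

F, S, G, errors = nmtf(X, k1=3, k2=4)
print(f"error after 1 iteration: {errors[0]:.4f}")
print(f"error after {len(errors)} iterations: {errors[-1]:.4f}")
```

In this sketch the middle factor S is what links the two sets of latent factors, which is one plausible reading of how a tri-factorization can express correlation between two views; the paper's actual formulation may differ.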

Notes

Available at http://www.cs.nyu.edu/~roweis/data.html.

Author information

Authors and Affiliations

School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan University, Shanghai, China
Jinsheng Shen & Mingmin Chi

Corresponding author

Correspondence to Mingmin Chi.

About this article

Cite this article

Shen, J., Chi, M. A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data. Ann. Data. Sci. 5, 9–19 (2018). https://doi.org/10.1007/s40745-017-0135-y

Received: 02 November 2017
Revised: 04 November 2017
Accepted: 17 November 2017
Published: 20 February 2018
Issue Date: March 2018
DOI: https://doi.org/10.1007/s40745-017-0135-y
