Word network topic model: a simple but general solution for short and imbalanced texts

Zuo, Yuan; Zhao, Jichang; Xu, Ke

doi:10.1007/s10115-015-0882-z

Regular Paper
Published: 23 September 2015

Volume 48, pages 379–398, (2016)

Knowledge and Information Systems

Yuan Zuo¹,
Jichang Zhao² &
Ke Xu¹


Abstract

Short text has become the prevalent format for information on the Internet, especially with the development of online social media. Although the sophisticated signals carried by short texts make them a promising source for topic modeling, their extreme sparsity and imbalance pose unprecedented challenges to conventional topic models such as LDA and its variants. Aiming at a simple but general solution for topic modeling in short texts, we present a word co-occurrence network-based model named WNTM that tackles sparsity and imbalance simultaneously. Unlike previous approaches, WNTM models the distribution over topics for each word instead of learning topics for each document, which enhances the semantic density of the data space without introducing much additional time or space complexity. Meanwhile, the rich contextual information preserved in the word–word space makes the model sensitive enough to identify rare topics with convincing quality. Furthermore, employing the same Gibbs sampling as LDA makes WNTM easy to extend to various application scenarios. Extensive validation on both short and normal texts shows that WNTM outperforms the baseline methods. We also demonstrate its potential for precisely discovering newly emerging topics or unexpected events in Weibo at very early stages.
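The abstract does not spell out how the word network is built, so the following is only a minimal sketch of the general idea, under stated assumptions: words that co-occur within a sliding window are linked, and each word is then represented by a pseudo-document made of its neighbours, repeated by edge weight. The function names `build_word_network` and `build_pseudo_documents`, the window size, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_word_network(docs, window=10):
    """Count co-occurrences of distinct words inside a sliding window.

    Assumption: an undirected, weighted edge per word pair; the window
    size of 10 is an illustrative default, not a value from the abstract.
    """
    edges = Counter()
    for tokens in docs:
        # A text shorter than the window is covered by a single span.
        for start in range(max(1, len(tokens) - window + 1)):
            span = tokens[start:start + window]
            for u, v in combinations(span, 2):
                if u != v:
                    edges[tuple(sorted((u, v)))] += 1
    return edges


def build_pseudo_documents(edges):
    """Represent each word by the list of its neighbours in the network.

    Assumption: a neighbour is repeated once per unit of edge weight, so
    the pseudo-document reflects how often the two words co-occur.
    """
    pseudo = defaultdict(list)
    for (u, v), weight in edges.items():
        pseudo[u].extend([v] * weight)
        pseudo[v].extend([u] * weight)
    return dict(pseudo)


# Toy corpus of tokenised short texts (made up for illustration).
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "fruit", "juice"],
    ["iphone", "launch", "event"],
]
edges = build_word_network(docs)
pseudo = build_pseudo_documents(edges)
print(sorted(pseudo["apple"]))
```

Under these assumptions, the pseudo-documents are much denser than the original short texts, and any standard LDA Gibbs sampler could be run on them to obtain a topic distribution per word rather than per document.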



Acknowledgments

This work was supported by NSFC (Grant Nos. 71501005 and 61421003) and the fund of the State Key Lab of Software Development Environment (Grant No. SKLSDE-2015ZX-05).

Author information

Authors and Affiliations

State Key Lab of Software Development Environment, Beihang University, Beijing, China
Yuan Zuo & Ke Xu
School of Economics and Management, Beihang University, Beijing, China
Jichang Zhao


Corresponding author

Correspondence to Ke Xu.


About this article

Cite this article

Zuo, Y., Zhao, J. & Xu, K. Word network topic model: a simple but general solution for short and imbalanced texts. Knowl Inf Syst 48, 379–398 (2016). https://doi.org/10.1007/s10115-015-0882-z


Received: 18 December 2014
Revised: 23 June 2015
Accepted: 24 August 2015
Published: 23 September 2015
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10115-015-0882-z
