Frontiers of Computer Science, Volume 11, Issue 5, pp 852–862

Ranking and tagging bursty features in text streams with context language models

  • Wayne Xin Zhao
  • Chen Liu
  • Ji-Rong Wen
  • Xiaoming Li
Research Article

Abstract

Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, one based on sentence-level context and one based on document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by their newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
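
As an illustration only, the idea of a context language model at two granularities can be sketched as follows. The abstract does not give the estimation formulas, so this sketch assumes a plain unigram model with add-alpha smoothing over the words that co-occur with a feature; the function name `context_model`, the parameter `alpha`, and the toy corpus are hypothetical and are not taken from the article.

```python
from collections import Counter


def context_model(feature, units, alpha=1.0):
    """Estimate a smoothed unigram model from the text units that mention `feature`.

    `units` is a list of token lists: pass sentences for a sentence-level
    context, or whole documents for a document-level context.
    Assumption: add-alpha smoothing; the article may use a different estimator.
    """
    # Vocabulary: every token in the collection except the feature itself.
    vocab = {tok for unit in units for tok in unit if tok != feature}
    counts = Counter()
    for unit in units:
        if feature in unit:
            # Count only words that co-occur with the feature in this unit.
            counts.update(tok for tok in unit if tok != feature)
    total = sum(counts.values()) + alpha * len(vocab)
    return {tok: (counts[tok] + alpha) / total for tok in vocab}


# Toy corpus (hypothetical): each document is a list of tokenized sentences.
docs = [
    [["earthquake", "hits", "city"], ["rescue", "teams", "arrive"]],
    [["markets", "fall", "sharply"]],
]
sentences = [s for d in docs for s in d]
documents = [[tok for s in d for tok in s] for d in docs]

sent_model = context_model("earthquake", sentences)
doc_model = context_model("earthquake", documents)
```

In this toy setting the document-level model gives more weight to "rescue" than the sentence-level model does, because "rescue" appears in the same document as "earthquake" but not in the same sentence.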

Keywords

bursty features, bursty features ranking, bursty feature tagging, context modeling

Notes

Acknowledgements

The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China (Grant No. 61502502), the National Basic Research Program (973 Program) of China (2014CB340403), Beijing Natural Science Foundation (4162032), and the Open Fund of Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, North China University of Technology, China.

Supplementary material

11704_2016_5144_MOESM1_ESM.ppt (229 KB)

Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Wayne Xin Zhao (1, 2), email author
  • Chen Liu (3)
  • Ji-Rong Wen (1, 2)
  • Xiaoming Li (4)

  1. School of Information, Renmin University of China, Beijing, China
  2. Beijing Key Laboratory of Big Data Management and Analysis Methods, Renmin University of China, Beijing, China
  3. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data, Beijing, China
  4. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
