WebKey: a graph-based method for event detection in web news

Journal of Intelligent Information Systems

Abstract

With the rapid, large-scale publishing of news over the Internet, there is a surge of interest in detecting underlying hot events from online news streams. There are two main challenges in event detection: accuracy and scalability. In this paper, we propose a fast and efficient method to detect events in news websites. First, we identify bursty terms, which suddenly appear in a large number of news documents. Then, we construct a novel co-occurrence graph between terms, in which nodes and edges are weighted based on important features such as click and document frequency within burst intervals. Finally, a weighted community detection algorithm is used to cluster terms and find events. We also propose a couple of techniques to reduce the size of the graph. Our evaluations show that the proposed method yields much higher precision and recall than past methods, improving their harmonic mean by at least 40%. Moreover, it reduces running time and memory usage by a factor of at least 2.
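The three stages named in the abstract (bursty term identification, co-occurrence graph construction, and term clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. The burst test is a simple ratio of document-frequency rates against a baseline period, edges are weighted by co-occurrence counts only (click features are omitted), and connected components over thresholded edges stand in for the weighted community detection algorithm. All function names, parameters, thresholds, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bursty_terms(window_docs, baseline_docs, min_ratio=1.5, min_df=2):
    """Return terms whose document-frequency rate in the window exceeds the baseline."""
    window_df = Counter(t for doc in window_docs for t in set(doc))
    base_df = Counter(t for doc in baseline_docs for t in set(doc))
    bursty = set()
    for term, df in window_df.items():
        window_rate = df / len(window_docs)
        # Add-one smoothing so terms unseen in the baseline do not divide by zero.
        base_rate = (base_df[term] + 1) / (len(baseline_docs) + 1)
        if df >= min_df and window_rate / base_rate >= min_ratio:
            bursty.add(term)
    return bursty


def cooccurrence_graph(window_docs, terms):
    """Edge weight = number of window documents that contain both terms."""
    weights = Counter()
    for doc in window_docs:
        present = sorted(set(doc) & terms)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights


def cluster_terms(terms, weights, min_weight=2):
    """Group terms by connected components over edges with weight >= min_weight.

    Assumption: this is a simple stand-in for weighted community detection.
    """
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for term in sorted(terms):
        if term in seen:
            continue
        stack, component = [term], set()
        seen.add(term)
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


def detect_events(window_docs, baseline_docs):
    """Run the three stages and keep clusters with more than one term."""
    terms = bursty_terms(window_docs, baseline_docs)
    weights = cooccurrence_graph(window_docs, terms)
    return [c for c in cluster_terms(terms, weights) if len(c) > 1]


# Toy data: each document is a list of already tokenized terms.
baseline_docs = [
    ["market", "stocks"],
    ["weather", "rain"],
    ["market", "weather"],
    ["sports", "match"],
]
window_docs = [
    ["earthquake", "tehran", "market"],
    ["earthquake", "tehran", "rescue"],
    ["election", "vote", "market"],
    ["election", "vote", "candidate"],
    ["earthquake", "rescue"],
]
events = detect_events(window_docs, baseline_docs)
```

On this toy input, `events` holds two term clusters, one for the earthquake story and one for the election story. The term "market" is dropped because it is equally common in the baseline period. A real system would replace the thresholded components with a weighted community detection step and would also prune the graph, as the abstract describes.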

Notes

  1. https://github.com/abosamoor/polyglot

  2. https://redis.io

  3. http://parsijoo.ir

Author information

Corresponding author

Correspondence to Sajjad Zarifzadeh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rasouli, E., Zarifzadeh, S. & Rafsanjani, A.J. WebKey: a graph-based method for event detection in web news. J Intell Inf Syst 54, 585–604 (2020). https://doi.org/10.1007/s10844-019-00576-7

  • DOI: https://doi.org/10.1007/s10844-019-00576-7
