Real-time event detection and classification in social text steam using embedding

Cluster Computing

Abstract

Taming data remains a significant challenge in online social networks. These networks are rapidly becoming a primary source from which users seek information in the form of events. Rich informational data can be extracted from social platforms such as Twitter text streams to gain direct insight into enduring topics and to classify them by similarity. To address event detection and classification, we model events as clusters that evolve over time. Because conventional clustering algorithms cannot process data streams, a fast yet robust method is required. This work therefore performs quick comparisons of data arriving from social streams using a twin network, known as a Siamese network, which detects novel events through clustering by comparing content-dependent features. We also train on a dataset derived from social text streams from Twitter and other sources, in which an embedding maps every word to a real-valued vector. This vector representation of words supports the event classification task. Finally, we compare the proposed technique with existing methods, and the results of several experiments indicate the efficacy of the proposed scheme.
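The idea of treating events as evolving clusters over embedded text can be illustrated with a minimal sketch. It is not the authors' method: plain cosine similarity stands in for the learned Siamese comparison, and the word vectors in `TOY_EMBEDDINGS`, the `StreamClusterer` class, and the 0.8 threshold are all hypothetical, chosen only for illustration. Each incoming message is embedded by averaging its word vectors, then either joins the most similar existing cluster or starts a new one, which here plays the role of a novel event.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
TOY_EMBEDDINGS = {
    "earthquake": (0.9, 0.1, 0.0),
    "tremor": (0.8, 0.2, 0.0),
    "quake": (0.85, 0.15, 0.0),
    "election": (0.0, 0.9, 0.1),
    "vote": (0.1, 0.8, 0.1),
    "ballot": (0.05, 0.85, 0.1),
    "concert": (0.0, 0.1, 0.9),
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class StreamClusterer:
    """Single-pass clustering: one centroid per candidate event (assumed design)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off, not from the paper
        self.centroids = []
        self.sizes = []

    def add(self, text):
        """Return the cluster index for a message, or None if it cannot be embedded."""
        vec = embed(text)
        if vec is None:
            return None
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: fold the message into the running mean of that cluster.
            n = self.sizes[best]
            self.centroids[best] = tuple(
                (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vec)
            )
            self.sizes[best] = n + 1
            return best
        # Otherwise treat the message as the start of a novel event.
        self.centroids.append(vec)
        self.sizes.append(1)
        return len(self.centroids) - 1


stream = [
    "earthquake tremor downtown",
    "election vote today",
    "quake felt again",
    "ballot count election",
    "concert tonight",
]
clusterer = StreamClusterer(threshold=0.8)
labels = [clusterer.add(message) for message in stream]
print(labels)  # prints [0, 1, 0, 1, 2]
```

Because each message is compared only against one centroid per cluster, the cost per message grows with the number of active events rather than with the length of the stream, which is the property a streaming setting needs. A trained Siamese network would replace `cosine` with a learned similarity score.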

Data availability

Enquiries about data availability should be directed to the authors.

Funding

The authors have not disclosed any funding.

Author information

Corresponding author

Correspondence to Daya Sagar Gupta.

Ethics declarations

Conflict of interest

The authors have not disclosed any competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, T., Kumari, M. & Gupta, D.S. Real-time event detection and classification in social text steam using embedding. Cluster Comput 25, 3799–3817 (2022). https://doi.org/10.1007/s10586-022-03610-6

  • DOI: https://doi.org/10.1007/s10586-022-03610-6
