Abstract
Taming data remains a significant challenge in online social networks. These networks are rapidly becoming a primary source from which users seek information about events. Rich informational data can be extracted from social platforms such as Twitter text streams to gain direct insight into enduring topics and to classify them by similarity. To address the research problems of event detection and classification, we model events as clusters that evolve over time. Because conventional clustering algorithms cannot process data streams, a fast yet robust method is required. This work therefore performs quick comparisons of incoming social-stream data using a twin network known as the Siamese network, which detects novel events through clustering by comparing content-dependent features. We also train on a dataset derived from social text streams from Twitter and other sources, in which an embedding maps every word to a real-valued vector. This vector representation of words supports the downstream task of event classification. Finally, we compare the proposed technique with existing methods; the results of several experiments indicate the efficacy of the proposed scheme.
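The idea of comparing incoming posts with a shared (twin) encoder and treating a poor match as a novel event can be illustrated with a minimal sketch. This is not the trained model described above: the toy embedding table, the mean-of-word-vectors encoder, the exp(-L1 distance) similarity, the 0.6 threshold, and the names `encode`, `similarity`, and `StreamClusterer` are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy embedding table (hand-made values, not trained vectors).
EMBEDDINGS = {
    "earthquake": [1.0, 0.0, 0.0],
    "quake": [0.9, 0.1, 0.0],
    "tremor": [0.8, 0.2, 0.0],
    "match": [0.0, 1.0, 0.0],
    "goal": [0.0, 0.9, 0.1],
    "football": [0.1, 0.8, 0.1],
}
DIM = 3


def encode(text):
    """Shared encoder: mean of the known word vectors in the text.

    The same function is applied to both sides of every comparison,
    which stands in for the weight sharing of a twin network.
    """
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def similarity(a, b):
    """exp(-L1 distance), in (0, 1]; 1.0 means identical encodings."""
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


class StreamClusterer:
    """Assigns each post to the closest cluster or opens a new one."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cut-off for "same event"
        self.centroids = []
        self.counts = []

    def assign(self, text):
        """Return (cluster_id, is_novel) for one incoming post."""
        vec = encode(text)
        best_id, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = similarity(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim
        if best_id >= 0 and best_sim >= self.threshold:
            # Update the running mean so the cluster evolves over time.
            n = self.counts[best_id]
            self.centroids[best_id] = [
                (c * n + v) / (n + 1)
                for c, v in zip(self.centroids[best_id], vec)
            ]
            self.counts[best_id] = n + 1
            return best_id, False
        # No cluster is similar enough: treat the post as a novel event.
        self.centroids.append(vec)
        self.counts.append(1)
        return len(self.centroids) - 1, True


if __name__ == "__main__":
    clusterer = StreamClusterer()
    for post in ["earthquake tremor", "quake tremor",
                 "football goal", "match goal"]:
        print(post, clusterer.assign(post))
```

In this sketch a single pass over the stream suffices, since each post is compared only against the current cluster centroids; a trained twin network would replace `encode` while leaving the assignment loop unchanged.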
Data availability
Enquiries about data availability should be directed to the authors.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors have not disclosed any competing interests.
About this article
Cite this article
Singh, T., Kumari, M. & Gupta, D.S. Real-time event detection and classification in social text steam using embedding. Cluster Comput 25, 3799–3817 (2022). https://doi.org/10.1007/s10586-022-03610-6
DOI: https://doi.org/10.1007/s10586-022-03610-6