Abstract
Intelligent event analysis based on big data classifies monitoring events by analyzing the alarm information they produce in an operation and maintenance platform, so that processing solutions can be recommended automatically from the event knowledge base. However, few methods currently exist for classifying monitoring events. To address this, our method uses the BERT model and the Jieba word segmentation tool to extract keywords from the event information in the training data, transform the keywords into word vectors, and generate an event representation vector for each event. We then pre-classify the training data with a clustering algorithm and a similarity measure to obtain information about each cluster, and we establish the relationship between clusters and event classes from the pre-classification results and the classification labels of the training data. Finally, we process and analyze new monitoring events that appear in the operation and maintenance platform and classify them according to the trained model. In addition, our method can retrain the model periodically on data added dynamically to the monitoring event database, which optimizes classification performance over time. Experiments on real-life datasets validate the effectiveness of the proposed method.
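The clustering and classification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy three-dimensional vectors stand in for the BERT-derived event representation vectors, the greedy threshold-based clustering is a simple substitute for whichever clustering algorithm the paper uses, and every function name, label, and the 0.8 threshold is hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def leader_cluster(vectors, threshold=0.8):
    """Greedy pre-classification (an assumed stand-in for the paper's clustering):
    join the most similar cluster if its centroid similarity reaches `threshold`,
    otherwise open a new cluster."""
    members = []    # members[k] holds the indices of the vectors in cluster k
    centroids = []  # centroids[k] is the mean vector of cluster k
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            centroids[best] = mean_vector([vectors[j] for j in members[best]])
        else:
            members.append([i])
            centroids.append(list(vec))
    return members, centroids

def label_clusters(members, labels):
    """Map each cluster to the majority label among its training members."""
    return [Counter(labels[i] for i in idx).most_common(1)[0][0] for idx in members]

def classify(vec, centroids, cluster_labels):
    """Give a new event the label of its most similar cluster centroid."""
    best = max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
    return cluster_labels[best]

# Toy event vectors and made-up labels (hypothetical data, for illustration only).
train_vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
train_labels = ["disk", "disk", "network", "network"]

members, centroids = leader_cluster(train_vectors)
cluster_labels = label_clusters(members, train_labels)
prediction = classify([0.95, 0.05, 0.0], centroids, cluster_labels)
```

In this sketch, periodic retraining would amount to rerunning `leader_cluster` and `label_clusters` on the enlarged training set.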
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, J., Gao, J., Fan, Y., Cheng, Y., Zhu, P., Cheng, D. (2022). Improving Events Classification with Latent Space Clustering-Based Similarities. In: Meng, X., Xuan, Q., Yang, Y., Yue, Y., Zhang, ZK. (eds) Big Data and Social Computing. BDSC 2022. Communications in Computer and Information Science, vol 1640. Springer, Singapore. https://doi.org/10.1007/978-981-19-7532-5_6
DOI: https://doi.org/10.1007/978-981-19-7532-5_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7531-8
Online ISBN: 978-981-19-7532-5
eBook Packages: Computer Science, Computer Science (R0)