Improving Events Classification with Latent Space Clustering-Based Similarities

Wu, Jiaxuan; Gao, Jianghao; Fan, Yongdan; Cheng, Yuanjie; Zhu, Peng; Cheng, Dawei

doi:10.1007/978-981-19-7532-5_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1640)

Included in the following conference series:

China National Conference on Big Data and Social Computing

Abstract

Big-data-based intelligent event analysis classifies monitoring events by analyzing the alarm information they generate in an operation and maintenance platform, so that processing solutions can be recommended automatically from an event knowledge base. However, few methods currently exist for classifying monitoring events. To address this gap, our method uses the BERT model and the Jieba word segmentation tool to extract keywords from the event information in the training data, transform the keywords into word vectors, and generate a representation vector for each event. We then pre-classify the training data using a clustering algorithm and similarity measures to obtain information about each cluster. Based on the pre-classification results and the classification labels of the training data, we establish the relationship between clusters and event classes. Finally, we process new monitoring events that appear in the operation and maintenance platform and classify them according to the trained model. In addition, the model can be retrained periodically on data dynamically added to the monitoring event database to improve classification performance. Experiments on real-life datasets validate the effectiveness of the proposed method.
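
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several stated assumptions: a whitespace tokenizer with a small stopword list stands in for Jieba keyword extraction, normalized bag-of-words counts stand in for BERT-derived event vectors, and a simple leader-style threshold clustering stands in for the clustering algorithm, which the abstract does not name. All function names, the threshold value, and the sample events and labels are hypothetical.

```python
import math
from collections import Counter

STOPWORDS = {"on", "is", "the"}  # hypothetical stand-in for keyword extraction


def keywords(text):
    """Stand-in for Jieba segmentation plus keyword extraction."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def embed(text, vocab):
    """Stand-in for a BERT-based event vector: unit-length keyword counts."""
    vec = [0.0] * len(vocab)
    for tok in keywords(text):
        if tok in vocab:  # keywords unseen in training are ignored
            vec[vocab[tok]] += 1.0
    return normalize(vec)


def cosine(a, b):
    # Inputs are unit-length (or all-zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def cluster(vectors, threshold):
    """Leader-style clustering: join the most similar cluster or start a new one."""
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            members[best].append(i)
            group = members[best]
            mean = [sum(vectors[j][d] for j in group) / len(group)
                    for d in range(len(vec))]
            centroids[best] = normalize(mean)
        else:
            centroids.append(vec)
            members.append([i])
    return centroids, members


def fit(texts, labels, threshold=0.5):
    """Pre-classify by clustering, then map each cluster to its majority label."""
    vocab = {}
    for text in texts:
        for tok in keywords(text):
            vocab.setdefault(tok, len(vocab))
    vectors = [embed(t, vocab) for t in texts]
    centroids, members = cluster(vectors, threshold)
    cluster_labels = [Counter(labels[i] for i in m).most_common(1)[0][0]
                      for m in members]
    return vocab, centroids, cluster_labels


def classify(text, model):
    """Assign a new event the label of its most similar cluster centroid."""
    vocab, centroids, cluster_labels = model
    vec = embed(text, vocab)
    best = max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
    return cluster_labels[best]


# Hypothetical training events and labels, for illustration only.
train_texts = [
    "disk usage above 90 percent on db server",
    "disk usage above 95 percent on app server",
    "cpu load high on web server",
    "cpu load high on cache server",
    "network link down on switch",
    "network link down on router",
]
train_labels = ["storage", "storage", "compute", "compute", "network", "network"]

model = fit(train_texts, train_labels)
print(classify("disk usage above 92 percent on mail server", model))
```

In this sketch, the periodic retraining mentioned in the abstract would amount to calling `fit` again on the enlarged set of labeled events. A real system would replace `keywords` and `embed` with actual Jieba and BERT calls.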

Author information

Authors and Affiliations

Collaborative Innovation Center of Internet Finance Safety, Tongji University, Shanghai, 201804, China
Jiaxuan Wu, Peng Zhu & Dawei Cheng
Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
Jiaxuan Wu & Dawei Cheng
System Security Department, Shanghai Financial Futures Information Technology Co., Ltd., Shanghai, 200122, China
Jianghao Gao, Yongdan Fan & Yuanjie Cheng

Corresponding author

Correspondence to Jianghao Gao.

Editor information

Editors and Affiliations

Renmin University of China, Beijing, China
Xiaofeng Meng
Zhejiang University of Technology, Hangzhou, China
Qi Xuan
Zhejiang University, Hangzhou, China
Yang Yang
Shenzhen University, Shenzhen, China
Yang Yue
Hangzhou Normal University, Hangzhou, China
Zi-Ke Zhang

About this paper

Cite this paper

Wu, J., Gao, J., Fan, Y., Cheng, Y., Zhu, P., Cheng, D. (2022). Improving Events Classification with Latent Space Clustering-Based Similarities. In: Meng, X., Xuan, Q., Yang, Y., Yue, Y., Zhang, ZK. (eds) Big Data and Social Computing. BDSC 2022. Communications in Computer and Information Science, vol 1640. Springer, Singapore. https://doi.org/10.1007/978-981-19-7532-5_6

DOI: https://doi.org/10.1007/978-981-19-7532-5_6
Published: 07 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7531-8
Online ISBN: 978-981-19-7532-5
eBook Packages: Computer Science, Computer Science (R0)
