Abstract
The Internet Water Army (IWA) poses a serious threat to cyber security, and recognizing IWA accounts accurately has become a challenging research problem. Most existing work uses behavioral analysis to distinguish IWA from non-IWA users. These approaches fall into two main categories: direct computation methods and learning-based methods. Direct computation methods mainly rely on web crawlers and build a multidimensional feature vector to detect IWA accounts. However, they ignore behavior rules that depend on time sequence and judge user behavior from the feature vector alone, so their results are not very accurate and their recognition rate needs to be improved. Learning-based methods mainly rely on clustering. However, these clustering approaches require the number of clusters to be fixed in advance, and an unsuitable choice directly leads to overfitting or underfitting of the model. In this paper we propose a sequential pattern approach based on the Dirichlet process mixture model (DPMM) for IWA identification. First, we analyze the behavior of potential IWA users and derive a feature vector of user behavior. Second, we use the DPMM to obtain effective and accurate clustering results. Finally, we use sequential pattern mining algorithms to detect water army accounts. Clustering experiments on datasets collected from the Tianya forum show promising results.
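The final step, sequential pattern mining, can be illustrated with a minimal GSP-style sketch. It is not the authors' implementation: the function names, the action labels, and the support threshold are hypothetical. It is also simplified to single-item elements and omits the time constraints that the full GSP algorithm supports. In the pipeline described above, the items would presumably be the behavior cluster labels produced by the DPMM step.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def gsp(sequences, min_support):
    """Level-wise (GSP-style) mining of frequent sequential patterns.

    Returns a dict mapping each frequent pattern (a tuple) to its support,
    i.e. the number of input sequences that contain it.
    """
    frequent = {}

    # Level 1: frequent single items.
    items = sorted({item for seq in sequences for item in seq})
    current = []
    for item in items:
        support = sum(1 for seq in sequences if item in seq)
        if support >= min_support:
            frequent[(item,)] = support
            current.append((item,))

    # Level k+1: join two frequent k-patterns p and q when p without its
    # first item equals q without its last item, then count support.
    while current:
        candidates = {
            p + (q[-1],)
            for p in current
            for q in current
            if p[1:] == q[:-1]
        }
        current = []
        for cand in sorted(candidates):
            support = sum(1 for seq in sequences if is_subsequence(cand, seq))
            if support >= min_support:
                frequent[cand] = support
                current.append(cand)

    return frequent


# Hypothetical per-user action sequences (illustrative data only).
user_sequences = [
    ["login", "post", "post", "logout"],
    ["login", "post", "logout"],
    ["login", "browse", "logout"],
]
patterns = gsp(user_sequences, min_support=2)
```

With this toy input, `patterns` contains the pattern `("login", "post", "logout")` because two of the three sequences contain it, while `("post", "post")` is discarded because only one sequence supports it.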
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, D., Li, Q., Hu, Y., Niu, W., Tan, J., Guo, L. (2014). An Approach to Detect the Internet Water Army via Dirichlet Process Mixture Model Based GSP Algorithm. In: Batten, L., Li, G., Niu, W., Warren, M. (eds) Applications and Techniques in Information Security. ATIS 2014. Communications in Computer and Information Science, vol 490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45670-5_9
DOI: https://doi.org/10.1007/978-3-662-45670-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45669-9
Online ISBN: 978-3-662-45670-5
eBook Packages: Computer Science, Computer Science (R0)