Abstract
Discovering information in web log data is an active research area in data mining. Two common objectives are to obtain useful information about web users' behavior and to analyze the structure of web sites. In this paper, we propose a novel approach for generating web sequential patterns from web log data using a gap-constrained method. The mining process has two steps. First, the raw web log data is pre-processed by removing irrelevant or redundant items, grouping the records of the same user, and transforming the log into a set of tuples (sequence identifier, sequence) constrained by visiting time. Second, web access patterns, which are closed sequential patterns with gap constraints, are generated by the Gap-BIDE algorithm using two parameters: a minimum support threshold and a gap constraint. The experiments use a data set derived from http://www.vtsns.edu.rs/maja/, which was introduced in [1]. The results show that sequential pattern mining of web log data, as presented in this paper, reveals information about the navigational behavior of web users, and that the order of the obtained patterns can guide a more reasonable design of the web page structure.
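The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the Gap-BIDE algorithm and does not mine closed patterns; it only shows how log records might be turned into (sequence identifier, sequence) tuples and how the support of one candidate pattern could be counted under a gap constraint. The record layout, the 1800-second session timeout, the function names, and the reading of the gap as "number of pages skipped between two consecutive matched pages" are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, already-cleaned records: (user, timestamp in seconds, page).
LOG = [
    ("u1", 0, "/home"), ("u1", 30, "/news"), ("u1", 60, "/courses"),
    ("u2", 10, "/home"), ("u2", 50, "/courses"),
    ("u1", 5000, "/home"), ("u1", 5040, "/contact"),
]

def build_sequences(records, timeout=1800):
    """Group records by user and split each user's visits into sequences
    whenever two consecutive visits are more than `timeout` seconds apart.
    Returns a list of (sequence identifier, sequence) tuples."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))
    sequences = []
    for visits in by_user.values():
        current, last_ts = [], None
        for ts, page in visits:
            if last_ts is not None and ts - last_ts > timeout:
                sequences.append((len(sequences) + 1, current))
                current = []
            current.append(page)
            last_ts = ts
        sequences.append((len(sequences) + 1, current))
    return sequences

def occurs(pattern, seq, min_gap, max_gap):
    """True if `pattern` appears in `seq` in order, with between min_gap and
    max_gap pages skipped between every pair of consecutive matched pages."""
    if not pattern:
        return True

    def match(p_idx, pos):
        # `pos` is the index in seq where pattern[p_idx - 1] was matched.
        if p_idx == len(pattern):
            return True
        lo = pos + 1 + min_gap
        hi = min(pos + 1 + max_gap, len(seq) - 1)
        return any(seq[i] == pattern[p_idx] and match(p_idx + 1, i)
                   for i in range(lo, hi + 1))

    return any(seq[i] == pattern[0] and match(1, i) for i in range(len(seq)))

def support(pattern, sequences, min_gap, max_gap):
    """Number of sequences that contain `pattern` under the gap constraint."""
    return sum(occurs(pattern, seq, min_gap, max_gap) for _, seq in sequences)

seqs = build_sequences(LOG)
print(support(["/home", "/courses"], seqs, 0, 0))  # no page may be skipped
print(support(["/home", "/courses"], seqs, 0, 1))  # at most one page skipped
```

On this toy log, the pattern /home followed by /courses is supported by one sequence when no page may be skipped and by two sequences when one intermediate page is allowed. A pattern would be reported as frequent when its support reaches the minimum support threshold.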
References
Dimitrijević, M., Bošnjak, Z., Subotica, S.: Discovering Interesting Association Rules in the Web Log Usage Data. Interdisciplinary Journal of Information, Knowledge, and Management 5, 191–207 (2010)
Grace, L.K.J., Maheswari, V., Nagamalai, D.: Analysis of Web Logs and Web User in Web Mining. International Journal of Network Security & Its Applications (IJNSA) 3(1) (2011)
Saxena, K., Shukla, R.: Significant Interval and Frequent Pattern Discovery in Web Log Data. International Journal of Computer Science Issues, IJCSI 7(1(3)) (2010)
Suresh, K., Paul, S.: Distributed Linear Programming for Weblog Data using Mining Techniques in Distributed Environment. International Journal of Computer Applications (0975-8887) 11(7) (2010)
Wang, Y., Le, J., Huang, D.: A Method for Privacy Preserving Mining of Association Rules Based on Web Usage Mining. In: 2010 International Conference on Web Information Systems and Mining, WISM, vol. 1, pp. 33–37. IEEE Computer Society, Washington, DC (2010)
Wei, C., Sen, W., Yuan, Z., Chang, C.L.: Algorithm of mining sequential patterns for web personalization services. In: ACM SIGMIS Database, vol. 40. ACM, New York (2009)
Zhu, J., Wu, H., Gao, G.: An Efficient Method of Web Sequential Pattern Mining Based on Session Filter and Transaction Identification. Journal of Networks (JNW) 5(9), 1017–1024 (2010) ISSN 1796-2056
Santini, M.: Cross-Testing a Genre Classification Model for the Web. Genres on the Web, Part 3 42, 87–128 (2011)
Rho, J.J., Moon, B.J., Kim, Y.J., Yang, D.H.: Internet Customer Segmentation Using Web Log Data. Journal of Business & Economics Research (JBER) 2(11) (2004)
Kejžar, N., Černe, S.K., Batagelj, V.: Network Analysis of Works on Clustering and Classification from Web of Science. In: Proceedings of the 11th IFCS, Part 3, pp. 525–536 (2010)
Xu, G., Zong, Y., Dolog, P., Zhang, Y.: Co-clustering Analysis of Weblogs Using Bipartite Spectral Projection Approach. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010. LNCS, vol. 6278, pp. 398–407. Springer, Heidelberg (2010)
Makanju, A.A.O., Zincir-Heywood, A.N., Milios, E.E.: Clustering Event Logs Using Iterative Partitioning. In: Proceeding of KDD 2009, pp. 1255–1263 (2009)
Wang, J., Mo, Y., Huang, B., Wen, J., He, L.: Web Search Results Clustering Based on a Novel Suffix Tree Structure. In: Rong, C., Jaatun, M.G., Sandnes, F.E., Yang, L.T., Ma, J. (eds.) ATC 2008. LNCS, vol. 5060, pp. 540–554. Springer, Heidelberg (2008)
Chen, J., Cook, T.: Mining contiguous sequential patterns from web logs. In: Proceedings of WWW, pp. 1177–1178 (2007)
Iváncsy, R., Vajk, I.: Frequent Pattern Mining in Web Log Data. Acta Polytechnica Hungarica 3(1), 77–90 (2006)
Mabroukeh, N.R., Ezeife, C.I.: A taxonomy of sequential pattern mining algorithms. ACM Computing Surveys (CSUR) 43(1) (2010)
Shivaprasad, G., Subbareddy, N.V., Acharya, U.D.: Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey. In: AIP Conference, vol. 1324, pp. 319–323 (2010)
Grace, L.K.J., Maheswari, V., Nagamalai, D.: Web Log Data Analysis and Mining. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds.) CCSIT 2011, Part III. Communications in Computer and Information Science (CCIS), vol. 133, pp. 459–469. Springer, Heidelberg (2011)
Li, C., Wang, J.: Efficiently Mining Closed Subsequences with gap constraints. In: Proceeding of 2008 SIAM International Conference on Data Mining, pp. 313–322 (2008)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, X., Li, M., Lee, D.G., Kim, K.D., Ryu, K.H. (2012). Application of Closed Gap-Constrained Sequential Pattern Mining in Web Log Data. In: Zeng, D. (eds) Advances in Control and Communication. Lecture Notes in Electrical Engineering, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-26007-0_80
DOI: https://doi.org/10.1007/978-3-642-26007-0_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-26006-3
Online ISBN: 978-3-642-26007-0
eBook Packages: Engineering, Engineering (R0)