Efficient Web Log Mining and Navigational Prediction with EHPSO and Scaled Markov Model

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 33)

Abstract

Web log mining is an important part of web usage mining. It helps retrieve important, hidden information from web server logs, which can be used to tune websites and increase the capabilities of web servers. In this paper we propose an enhanced methodology for the web log mining process and online navigational prediction that aims to improve the accuracy and stability of all web log mining stages. First, we introduce some improvements to the preprocessing stage. Second, we propose a refined approach for user identification and a time-based heuristic approach for session identification. Third, we propose an efficient hierarchical particle swarm optimization clustering algorithm (EHPSO) to find similarity-based user sessions, which reduces the complexity of the Markov model. Finally, we suggest a Markov model for online navigational prediction, and we also propose an improved popularity and similarity based page rank algorithm (IPSPR) to resolve the Markov model's ambiguous results.
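
As a rough illustration of two stages named above, time-based session identification and Markov-model next-page prediction, the following is a minimal sketch and not the paper's implementation. It assumes a 30-minute inactivity timeout (a common heuristic, not stated in the abstract), a first-order Markov model, and log records already reduced to (user, timestamp in seconds, page) tuples. All function names are hypothetical. It does not implement EHPSO or IPSPR; when several next pages tie, it simply returns all of them, which is the kind of ambiguous result the abstract says IPSPR is meant to resolve.

```python
from collections import Counter, defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed threshold, not taken from the paper


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))

    sessions = []
    for visits in by_user.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def build_transitions(sessions):
    """Count first-order page-to-page transitions across all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return counts


def predict_next(counts, page):
    """Return (candidates, probability) for the most likely next page(s).

    More than one candidate means the model is ambiguous for this page.
    """
    followers = counts.get(page)
    if not followers:
        return [], 0.0
    total = sum(followers.values())
    best = max(followers.values())
    candidates = sorted(p for p, c in followers.items() if c == best)
    return candidates, best / total
```

In this sketch, calling `split_sessions` on the cleaned log, passing the result to `build_transitions`, and then querying `predict_next` with the current page gives the most frequent follower page and its estimated transition probability. Clustering sessions first, as the abstract proposes, would mean building one such transition table per cluster rather than one for the whole log.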

Keywords

Data clustering; Web log mining; PSO; Markov model; Data preprocessing

Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, India
  2. Department of Computer Science and Engineering, Shaheed Udham Singh College of Engineering and Technology, Tangory, India
