Frequent Sequence Mining in Web Log Data

  • Paweł Weichbroth
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 659)

Abstract

The amount of information available even on a single web server can be huge, and the number of visitors (users) often reaches six digits or more. Users vary in gender, age and education, and consequently their information needs differ. Moreover, they subconsciously expect to receive more relevant content after visiting the first few pages. This kind of problem belongs to the domain of information filtering, understood as a method for delivering relevant information. To solve it, different sources of unstructured or structured data can be used, one of the latter type being web server log data. Server-side logging gathers valuable data on the requests that users send for the resources shared on a particular web site. In this paper, we introduce the Apriori-like FWP algorithm for frequent sequence mining in web log data. The discovered sequences represent reconstructed navigation paths across shared web pages that were followed by at least a defined minimum number of users. Such knowledge can be used primarily for content recommendation, as well as in cross-marketing strategies and email promotion campaigns.
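To make the idea of frequent navigation paths concrete, here is a minimal Python sketch of a generic Apriori-style, level-wise search. It is not the FWP algorithm itself, whose details the abstract does not give. It assumes that user sessions have already been reconstructed from the log, that a sequence means a contiguous run of page requests, and that support is the number of distinct users whose path contains the sequence. The names `contains`, `frequent_sequences`, `sessions` and `min_users` are hypothetical.

```python
from collections import defaultdict


def contains(path, seq):
    """Return True if seq occurs in path as a contiguous run of pages."""
    n = len(seq)
    return any(tuple(path[i:i + n]) == seq for i in range(len(path) - n + 1))


def frequent_sequences(sessions, min_users):
    """Level-wise (Apriori-style) search for frequent contiguous page sequences.

    sessions:  dict mapping a user id to that user's ordered list of pages
               (assumed to be reconstructed from the server log beforehand).
    min_users: minimum number of distinct users that must follow a sequence.
    """
    if min_users < 1:
        raise ValueError("min_users must be at least 1")

    # Level 1: collect the distinct users who requested each page.
    users_per_page = defaultdict(set)
    for user, path in sessions.items():
        for page in path:
            users_per_page[(page,)].add(user)
    frequent = {s: len(u) for s, u in users_per_page.items() if len(u) >= min_users}
    result = dict(frequent)

    while frequent:
        # Join step: a (k+1)-candidate is built from two frequent k-sequences
        # where the suffix of the first equals the prefix of the second.
        candidates = {a + b[-1:] for a in frequent for b in frequent
                      if a[1:] == b[:-1]}
        # Count step: keep candidates followed by enough distinct users.
        frequent = {}
        for cand in candidates:
            support = sum(1 for path in sessions.values() if contains(path, cand))
            if support >= min_users:
                frequent[cand] = support
        result.update(frequent)
    return result


# Illustrative, made-up sessions (not data from the paper).
sessions = {
    "u1": ["/home", "/products", "/cart"],
    "u2": ["/home", "/products", "/contact"],
    "u3": ["/products", "/cart"],
}
patterns = frequent_sequences(sessions, min_users=2)
for seq, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" -> ".join(seq), support)
```

The join step relies on the Apriori property: a contiguous sequence can only be frequent if both its prefix and its suffix of length k are frequent, so longer candidates are generated only from sequences that already met the minimum.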

Keywords

Sequence Mining Web Usage 

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Applied Informatics in Management, Gdansk University of Technology, Gdansk, Poland