Knowledge and Information Systems, Volume 10, Issue 4, pp 515–528

Mining user access patterns with traversal constraint for predicting web page requests

  • Mei-Ling Shyu
  • Choochart Haruechaiyasak
  • Shu-Ching Chen
Short Paper

Abstract

The recent growth in HyperText Transfer Protocol (HTTP) traffic on the World Wide Web (WWW) has generated an enormous volume of log records in Web server databases. Applying Web mining techniques to these server log records can uncover potentially useful patterns and reveal how users access a Web site. In this paper, we propose a new two-step approach to mining user access patterns for predicting Web page requests. First, the Minimum Reaching Distance (MRD) algorithm is applied to find the distances between Web pages. Second, association rule mining is applied to form a set of predictive rules, and the MRD information is used to prune the resulting rules. Experimental results on a real Web data set show that our approach outperforms the existing Markov-model approach in precision, recall, and reduction of user browsing time.
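
The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the details, so several points are assumptions: MRD is taken to be the fewest hyperlink clicks from one page to another in the site's link graph (computed by breadth-first search), rules are limited to a single antecedent page, and a rule A -> B is pruned when B cannot be reached from A within `max_distance` clicks. The function names, thresholds, and toy data are all hypothetical.

```python
from collections import Counter, deque
from itertools import permutations


def minimum_reaching_distance(links, source):
    """Assumed MRD: fewest clicks from `source` to every reachable page (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def mine_pair_rules(sessions, min_support, min_confidence):
    """Mine single-antecedent rules A -> B from user sessions.

    support(A -> B)    = fraction of sessions containing both A and B
    confidence(A -> B) = sessions containing A and B / sessions containing A
    """
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)
        page_count.update(pages)
        pair_count.update(permutations(sorted(pages), 2))
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / page_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def prune_by_mrd(rules, links, max_distance):
    """Keep only rules whose consequent is within `max_distance` clicks (assumed criterion)."""
    kept = {}
    cache = {}
    for (a, b), stats in rules.items():
        if a not in cache:
            cache[a] = minimum_reaching_distance(links, a)
        d = cache[a].get(b)
        if d is not None and d <= max_distance:
            kept[(a, b)] = stats
    return kept


# Hypothetical toy site structure and access sessions.
links = {"home": ["news", "about"], "news": ["story"], "about": [], "story": ["home"]}
sessions = [
    ["home", "news", "story"],
    ["home", "news"],
    ["home", "about"],
    ["news", "story"],
]

rules = mine_pair_rules(sessions, min_support=0.5, min_confidence=0.6)
pruned = prune_by_mrd(rules, links, max_distance=1)
print(sorted(pruned))
```

In this toy setting the mining step yields four rules, and the distance constraint discards the two whose consequent is more than one click away from the antecedent. A real system would mine longer antecedents and tune the thresholds on held-out log data.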

Keywords

  • Web usage mining
  • Association rule mining
  • Mining user access patterns

Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  • Mei-Ling Shyu (1)
  • Choochart Haruechaiyasak (2)
  • Shu-Ching Chen (3)
  1. Department of Electrical and Computer Engineering, University of Miami, Coral Gables, USA
  2. Information Research and Development Division (RDI), National Electronics and Computer Technology Center (NECTEC), Pathumthani, Thailand
  3. Distributed Multimedia Information System Laboratory, School of Computer Science, Florida International University, Miami, USA
