Abstract
Similarity discovery has become one of the most important research streams in the web usage mining community in recent years. The knowledge obtained can be used in many applications, such as predicting users' preferences, optimizing web cache organization, and improving the quality of web document pre-fetching. This paper presents an approach that mines evolving web sessions to cluster web users and establish similarities among web documents. The results are then applied to a Similarity-aware Web Content Management system, facilitating offline building of similarity-aware web caches and online updating of sub-caches and cache content similarity profiles. An agent-based web document pre-fetching mechanism is also developed to support similarity-aware caching, further reducing bandwidth consumption and network latency and thereby improving web access performance.
Keywords
- Web usage mining
- web caching
- similarity discovery
- web document pre-fetching
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiao, J. (2008). Mining Evolving Web Sessions and Clustering Dynamic Web Documents for Similarity-Aware Web Content Management. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2008. Lecture Notes in Computer Science, vol 5139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88192-6_11
DOI: https://doi.org/10.1007/978-3-540-88192-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88191-9
Online ISBN: 978-3-540-88192-6
eBook Packages: Computer Science, Computer Science (R0)