Weighted-Frequent Itemset Refinement Methodology (W-FIRM) of Usage Clusters

Dixit, Veer Sain; Bhatia, Shveta Kundra; Kaur, Sarabjeet

doi:10.1007/978-3-319-09156-3_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8583)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Because of information overload on the Internet, a large number of systems have been developed to extract user behavior. This paper presents the mining of Frequent Itemsets and the refinement of usage clusters for web-based applications. It considers the particular case in which a cluster contains an abundance of sessions, which leads to a very large number of uninteresting recommendations for the user. To address this problem, clusters are refined on the basis of Weighted Frequent Itemsets, which in turn yields refined clusters of improved quality. In the proposed work, Frequent Itemsets are sets of web pages that occur together in more sessions than a given threshold, known as the minimum support. The motivation for adopting Frequent Itemsets for refinement is the need for dimensionality reduction. Experimental results show that the cluster quality obtained with the proposed approach is better than that of existing approaches (DBS, 2011 and HITS, 2010). The refined clusters can then be used in a number of applications, such as Web Personalization, improvement of Web Site Structure, Analysis of Users’ Online Behavior, and the services of a Recommender System.
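The definition of a Frequent Itemset given in the abstract can be illustrated with a minimal sketch. This is not the authors' W-FIRM implementation: the function name `frequent_itemsets`, the `max_size` cap, the brute-force enumeration, and the toy sessions are all assumptions made for illustration, and no page weighting is applied because the abstract does not specify the weighting scheme.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sessions, min_support, max_size=3):
    """Count page sets per session and keep those above min_support.

    sessions    : list of sessions, each a list of page identifiers
    min_support : absolute session-count threshold (hypothetical choice;
                  a relative threshold would work the same way)
    max_size    : largest itemset size to enumerate (assumption, keeps
                  the brute-force enumeration small)
    """
    counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # a page counts once per session
        for size in range(1, max_size + 1):
            for itemset in combinations(pages, size):
                counts[frozenset(itemset)] += 1
    # "more than a given threshold": strictly greater than min_support
    return {s: c for s, c in counts.items() if c > min_support}


# Toy usage data (invented for this example)
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "about"],
    ["products", "cart"],
]
result = frequent_itemsets(sessions, min_support=1)
for itemset, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

With these toy sessions, the page pair {home, products} appears in two sessions and is therefore frequent for a minimum support of 1, while {about} appears only once and is dropped. Exhaustive enumeration is exponential in session length; practical miners such as Apriori or FP-growth prune the search instead.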

Author information

Authors and Affiliations

Department of Computer Science, Atma Ram Sanatan Dharma College, University of Delhi, New Delhi, India
Veer Sain Dixit
Department of Computer Science, Research Scholar, University of Delhi, New Delhi, India
Shveta Kundra Bhatia
Department of Computer Science, Indraprastha College for Women, University of Delhi, New Delhi, India
Sarabjeet Kaur

Editor information

Editors and Affiliations

School of Engineering, University of Basilicata, 85100, Potenza, Italy
Beniamino Murgante
Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
Sanjay Misra
Department of Production and Systems, University of Minho, 4710-057, Braga, Portugal
Ana Maria A. C. Rocha
DICAR, Politecnico di Bari, 70125, Bari, Italy
Carmelo Torre
University of Minho, Braga, Portugal
Jorge Gustavo Rocha & Maria Irene Falcão
Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, 813-8503, Higashi-ku, Fukuoka, Japan
Bernady O. Apduhan
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli, 1, 06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Dixit, V.S., Bhatia, S.K., Kaur, S. (2014). Weighted-Frequent Itemset Refinement Methodology (W-FIRM) of Usage Clusters. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8583. Springer, Cham. https://doi.org/10.1007/978-3-319-09156-3_3

DOI: https://doi.org/10.1007/978-3-319-09156-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09155-6
Online ISBN: 978-3-319-09156-3
eBook Packages: Computer Science, Computer Science (R0)
