Interval Set Clustering of Web Users with Rough K-Means
Data collection and analysis in web mining face certain unique challenges. For a variety of reasons inherent in web browsing and web logging, the likelihood of bad or incomplete data is higher than in conventional applications, and the analytical techniques used in web mining need to accommodate such data. Fuzzy and rough sets provide the ability to deal with incomplete and approximate information. Fuzzy set theory has been shown to be useful in three important aspects of web and data mining, namely clustering, association, and sequential analysis. There is also increasing interest in research on clustering based on rough set theory. Clustering is an important part of web mining that involves finding natural groupings of web resources or web users. Researchers have pointed out some important differences between clustering in conventional applications and clustering in web mining. For example, the clusters and associations in web mining do not necessarily have crisp boundaries. As a result, researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have used genetic algorithms based on rough set theory for clustering. However, clustering based on genetic algorithms may not be able to handle the large amount of data typical of a web mining application. This paper proposes a variation of the K-means clustering algorithm based on properties of rough sets. The proposed algorithm represents clusters as interval or rough sets. The paper also describes the design of an experiment, including data collection and the clustering process. The experiment is used to create interval set representations of clusters of web visitors.
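The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common formulation of rough K-means, in which each cluster is an interval set given by a lower bound (objects that definitely belong) and an upper bound (objects that possibly belong). The function name `rough_kmeans`, the distance-ratio `threshold`, the weight `w_lower`, and the fallback rules for empty bounds are all assumptions made for illustration, not values or rules taken from the paper.

```python
import math


def _mean(points, indices):
    """Component-wise mean of the points selected by `indices`."""
    dim = len(points[0])
    return [sum(points[i][d] for i in indices) / len(indices) for d in range(dim)]


def rough_kmeans(points, centroids, w_lower=0.7, threshold=1.3, iterations=20):
    """Illustrative rough K-means (assumed formulation, not the paper's code).

    Returns (lower, upper, centroids), where lower[j] and upper[j] are lists
    of point indices forming the interval-set representation of cluster j.
    """
    k = len(centroids)
    centroids = [list(c) for c in centroids]
    lower, upper = [], []
    for _ in range(iterations):
        lower = [[] for _ in range(k)]
        upper = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            nearest = min(range(k), key=dists.__getitem__)
            # Other clusters that are almost as close as the nearest one.
            close = [j for j in range(k)
                     if j != nearest and dists[j] <= threshold * dists[nearest]]
            if close:
                # Ambiguous object: upper bounds only, no lower bound.
                for j in [nearest] + close:
                    upper[j].append(idx)
            else:
                # Unambiguous object: lower bound (and hence upper bound).
                lower[nearest].append(idx)
                upper[nearest].append(idx)
        new_centroids = []
        for j in range(k):
            boundary = [i for i in upper[j] if i not in lower[j]]
            if lower[j] and boundary:
                lo, bd = _mean(points, lower[j]), _mean(points, boundary)
                new_centroids.append(
                    [w_lower * a + (1 - w_lower) * b for a, b in zip(lo, bd)])
            elif lower[j]:
                new_centroids.append(_mean(points, lower[j]))
            elif boundary:
                new_centroids.append(_mean(points, boundary))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return lower, upper, centroids
```

Under these assumptions, a call such as `rough_kmeans(points, [(0, 0), (10, 10)])` on two well-separated groups plus one object roughly midway between them would place the midway object in both upper bounds and in neither lower bound, which is the kind of non-crisp boundary the abstract describes for web users.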
Journal of Intelligent Information Systems
Volume 23, Issue 1, pp. 5-16
- Kluwer Academic Publishers
- interval sets
- K-means algorithm
- rough sets
- unsupervised learning
- web mining