Abstract
We propose a new data model for Web document representation based on granular computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering relies on two levels of knowledge granularity: document and term. This can produce clustering results that are "falsely relevant". In our approach, Web documents are represented at multiple levels of knowledge granularity. Knowledge granularity with sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in data. With granularity computation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a sounder basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient.
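The abstract does not say how EVSM is constructed, so the following is only a minimal sketch of the general idea of representing a document at more than one granularity level. It assumes paragraphs as the intermediate level, whitespace tokenisation, and raw term frequencies. The names `represent` and `best_paragraph_similarity` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenisation is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def represent(paragraphs):
    """Hypothetical multi-level representation: one vector for the whole
    document plus one vector per paragraph (not the authors' EVSM definition)."""
    return {
        "document": tf_vector(" ".join(paragraphs)),
        "paragraphs": [tf_vector(p) for p in paragraphs],
    }


def best_paragraph_similarity(rep_a, rep_b):
    """Highest cosine similarity over all paragraph pairs of two documents."""
    return max(
        (cosine(p, q) for p in rep_a["paragraphs"] for q in rep_b["paragraphs"]),
        default=0.0,
    )


# Two toy documents that share every term but never within the same paragraph.
doc_a = represent(["apple banana", "river stone"])
doc_b = represent(["apple river", "banana stone"])

doc_level = cosine(doc_a["document"], doc_b["document"])
para_level = best_paragraph_similarity(doc_a, doc_b)
print(f"document level: {doc_level:.2f}, paragraph level: {para_level:.2f}")
```

In this toy case the two documents look identical at the document/term level, while no pair of paragraphs shares more than half of its terms. This is one plausible reading of how a purely two-level representation could yield "falsely relevant" matches that a finer granularity level would expose.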
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, F., Zhang, S. (2006). Clustering Web Documents Based on Knowledge Granularity. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_9
DOI: https://doi.org/10.1007/11610113_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)