Clustering Web Documents Based on Knowledge Granularity

Huang, Faliang; Zhang, Shichao

doi:10.1007/11610113_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

We propose a new data model for Web document representation based on granular computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering relies on two levels of knowledge granularity: document and term. This can lead to clustering results that are “false relevant”. In our approach, Web documents are represented at multiple levels of knowledge granularity. Knowledge granularity with sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a more reliable basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient.
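
The contrast between two-level and finer-grained representations can be illustrated with a small sketch. This is not the EVSM definition from the paper, which is not reproduced here; it only assumes, for illustration, that a paragraph is one intermediate granularity level between document and term. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies for a piece of text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def document_level_similarity(doc_a, doc_b):
    """Two-level view (document, term): paragraphs are merged into one vector."""
    return cosine(term_vector(" ".join(doc_a)), term_vector(" ".join(doc_b)))


def paragraph_level_similarity(doc_a, doc_b):
    """Assumed finer view: best cosine similarity over all paragraph pairs."""
    return max(
        cosine(term_vector(p), term_vector(q)) for p in doc_a for q in doc_b
    )


# Hypothetical documents, each given as a list of paragraphs.
# They share every term, but no paragraph of one matches a paragraph of the other.
doc_a = ["jaguar habitat", "engine repair"]
doc_b = ["jaguar engine", "habitat repair"]

doc_sim = document_level_similarity(doc_a, doc_b)
par_sim = paragraph_level_similarity(doc_a, doc_b)
print(f"document-level: {doc_sim:.2f}, best paragraph pair: {par_sim:.2f}")
```

In this toy case the two documents look identical at the document level, while no pair of paragraphs shares more than half of its terms. That gap is one plausible reading of the “false relevant” effect the abstract attributes to two-level representations.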

Author information

Authors and Affiliations

Faculty of Software, Fujian Normal University, Fuzhou, 350007, China
Faliang Huang
Department of Computer Science, Guangxi Normal University, Guilin, 541004, China
Shichao Zhang

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Huang, F., Zhang, S. (2006). Clustering Web Documents Based on Knowledge Granularity. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_9

DOI: https://doi.org/10.1007/11610113_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
