Clustering Web Documents Based on Knowledge Granularity

  • Faliang Huang
  • Shichao Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3841)


We propose a new data model for Web document representation based on granular computing, named the Expanded Vector Space Model (EVSM). Traditional Web document clustering is based on two levels of knowledge granularity: document and term. This can produce clustering results that are “falsely relevant”. In our approach, Web documents are represented at multiple levels of knowledge granularity. Knowledge granularity expressed in sufficiently conceptual sentences helps knowledge engineers understand valuable relations hidden in data. With granularity calculation, data can be processed more efficiently and effectively, and knowledge engineers can handle the same dataset at different knowledge levels. This provides a sounder basis for interpreting the results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient.
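The difference between a two-level view (document and term) and a finer-grained view can be illustrated with a small sketch. This is not the paper's EVSM definition, which the abstract does not spell out. It assumes, purely for illustration, that the intermediate granule is the paragraph and that "false relevance" means two documents sharing the same terms without any closely matching passage. The function names and the toy documents are hypothetical.

```python
from collections import Counter
import math


def term_vector(text):
    """Bag-of-words term counts for a piece of text (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_level_similarity(paras_a, paras_b):
    """Two-level view: each document is collapsed into a single bag of terms."""
    return cosine(term_vector(" ".join(paras_a)), term_vector(" ".join(paras_b)))


def paragraph_level_similarity(paras_a, paras_b):
    """Assumed finer-grained view: best cosine match over all paragraph pairs."""
    return max(
        (cosine(term_vector(p), term_vector(q)) for p in paras_a for q in paras_b),
        default=0.0,
    )


# Hypothetical toy documents: identical vocabulary, different paragraph content.
doc_a = ["apple pie recipe", "stock market report"]
doc_b = ["apple stock report", "pie market recipe"]

print(document_level_similarity(doc_a, doc_b))
print(paragraph_level_similarity(doc_a, doc_b))
```

On these toy inputs the document-level similarity is 1.0, because both documents contain exactly the same six terms, while the best paragraph-level match is only about 0.67. A clustering method that sees only the document and term levels would treat the two documents as identical, whereas the extra level exposes that no pair of paragraphs actually matches closely.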





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Faliang Huang (1)
  • Shichao Zhang (2)
  1. Faculty of Software, Fujian Normal University, Fuzhou, China
  2. Department of Computer Science, Guangxi Normal University, Guilin, China
