Using Online Self-Adaptive Clustering to Group Web Documents

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 107)

Abstract

This chapter proposes an approach that uses online self-adaptive, incremental clustering to automatically group relevant Web-based documents. The proposed online self-adaptive classifier has a learning capability: it improves its result online with each new document that is accessed. The approach is also characterized by low complexity. The chapter reports the results of research on the development of a novel clustering method that is suitable for real-time implementation. The method is based on evolution principles and tries to address a limitation of existing clustering algorithms, which cannot cope with high-dimensional datasets in an online mode. This evolution-inspired, nature-inspired approach introduces the concept of potential values, which describe the fitness of a new sample (a Web document) to be the prototype of a new cluster. The potential is computed without storing each previously encountered document, yet it takes into account the contextual similarity density between all previous documents in a recursive and thus computationally efficient way. The chapter also examines the clustering of documents by contextual similarity, using extracted keywords represented in a vector space model.
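
The recursive potential calculation can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below makes several assumptions: a Cauchy-type potential, P_k = 1 / (1 + mean squared Euclidean distance to all earlier documents), keyword vectors normalized to unit length, and a hypothetical class name `RecursivePotential`. It is meant only to show how a density-like score can be updated from a few running sums without storing past documents, and it is not the chapter's actual implementation.

```python
import math


class RecursivePotential:
    """Hypothetical helper: potential of each new document vector.

    Assumption (not from the source): potential is
    1 / (1 + mean squared Euclidean distance to all earlier documents).
    """

    def __init__(self, dim):
        self.count = 0               # documents absorbed so far
        self.sum_vec = [0.0] * dim   # running sum of earlier (normalized) vectors
        self.sum_sq = 0.0            # running sum of their squared norms

    def update(self, vec):
        """Return the potential of vec against earlier documents, then absorb it."""
        norm = math.sqrt(sum(x * x for x in vec))
        # Assumption: unit-length vectors, so distance reflects keyword direction.
        z = [x / norm for x in vec] if norm > 0 else list(vec)
        sq = sum(x * x for x in z)
        if self.count == 0:
            potential = 1.0  # first document has no neighbours yet
        else:
            dot = sum(a * b for a, b in zip(z, self.sum_vec))
            # Mean of ||z - z_i||^2 over earlier documents, from running sums only.
            mean_sq_dist = sq - 2.0 * dot / self.count + self.sum_sq / self.count
            potential = 1.0 / (1.0 + max(0.0, mean_sq_dist))
        self.count += 1
        self.sum_vec = [s + x for s, x in zip(self.sum_vec, z)]
        self.sum_sq += sq
        return potential


if __name__ == "__main__":
    tracker = RecursivePotential(dim=3)
    for doc in ([2, 0, 0], [1, 0, 0], [0, 3, 0]):
        print(tracker.update(doc))
```

Under these assumptions each update costs time and memory proportional to the number of keywords, independent of how many documents have already been seen. A document whose potential exceeds that of the existing prototypes would be a candidate for starting a new cluster; the exact rule used in the chapter is not stated in the abstract.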

Keywords

Clustering; Information retrieval; Self-adaptive on-line clustering; Contextual similarity

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. SYM Global Limited, Manchester, UK
