Abstract
This paper presents an integration of a novel document vector representation technique and a novel Growing Self Organizing Process. In this approach, each document is represented as a low-dimensional vector composed of the indices and weights derived from its keywords. An index-based similarity calculation method is applied to this low-dimensional feature space, and the growing self organizing process is modified to work with the new feature representation model. Initial experiments show that this integration is more efficient than state-of-the-art Self Organizing Map based text clustering techniques while preserving the same level of accuracy.
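The abstract does not give the weighting scheme or the similarity formula, so the following is only a minimal sketch of the general idea, under stated assumptions: keywords are taken to be the `top_k` most frequent terms, weights are L2-normalised term counts, and similarity is a dot product over shared indices. The function names `build_vocabulary`, `to_index_weight_vector` and `index_similarity` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct term an integer index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for term in doc.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab


def to_index_weight_vector(doc, vocab, top_k=3):
    """Represent a document by at most top_k (index, weight) pairs.

    Assumption: keywords are the most frequent terms and weights are
    L2-normalised counts. The paper may use a different scheme.
    """
    counts = Counter(t for t in doc.lower().split() if t in vocab)
    top = counts.most_common(top_k)
    norm = math.sqrt(sum(c * c for _, c in top)) or 1.0
    return {vocab[t]: c / norm for t, c in top}


def index_similarity(a, b):
    """Dot product computed only over indices present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[i] for i, w in a.items() if i in b)


docs = [
    "map map neuron grid",
    "map neuron neuron text",
    "football goal goal",
]
vocab = build_vocabulary(docs)
vecs = [to_index_weight_vector(d, vocab) for d in docs]
print(index_similarity(vecs[0], vecs[1]))  # shared keywords, positive score
print(index_similarity(vecs[0], vecs[2]))  # no shared keywords
```

In this sketch the cost of comparing two documents depends on the number of stored keywords rather than on the vocabulary size, which is one plausible reading of where the reported efficiency gain would come from.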
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Matharage, S., Alahakoon, D., Rajapakse, J., Huang, P. (2011). Fast Growing Self Organizing Map for Text Clustering. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24958-7_48
DOI: https://doi.org/10.1007/978-3-642-24958-7_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24957-0
Online ISBN: 978-3-642-24958-7
eBook Packages: Computer Science, Computer Science (R0)