Neural Computing and Applications, Volume 19, Issue 4, pp 531–542

Cluster identification and separation in the growing self-organizing map: application in protein sequence classification

  • Norashikin Ahmad
  • Damminda Alahakoon
  • Rowena Chau
Original Article

Abstract

The growing self-organizing map (GSOM) was introduced as an improvement on the self-organizing map (SOM) algorithm for clustering and knowledge discovery. Unlike the traditional SOM, the GSOM has a dynamic structure that allows nodes to grow, reflecting the knowledge discovered from the input data as learning progresses. The spread factor (SF) parameter of the GSOM controls the spread of the map, giving the analyst the flexibility to examine clusters at different granularities. Although the GSOM has been applied in various areas and has proven effective in knowledge discovery tasks, no comprehensive study has examined the effect of the spread factor value on cluster formation and separation. The aim of this paper is therefore to investigate the effect of the spread factor value on cluster separation in the GSOM. We use the simple k-means algorithm to identify clusters in the GSOM. Using the Davies–Bouldin index, we obtain the clusters formed at different spread factor values and analyze them. We show that clusters can be more clearly separated when the spread factor value is increased. Hierarchical clusters can then be constructed by mapping the GSOM clusters obtained at different spread factor values.
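
As an illustration of the cluster identification step described in the abstract, the following is a minimal sketch, not the authors' implementation, of running k-means for several values of k and keeping the partition with the lowest Davies–Bouldin index. The name `node_weights` is a hypothetical stand-in for the weight vectors of a trained GSOM; here it is filled with toy 2-D data. To keep the run reproducible, the sketch swaps random k-means initialization for a deterministic farthest-point initialization, and it assumes Euclidean distance throughout.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with farthest-point initialization (an assumption
    made for reproducibility; plain k-means usually starts at random)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index: lower values mean compact, well-separated clusters."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        scatter.append(sum(dist(p, centroids[c]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Toy stand-in for GSOM node weight vectors: three small groups in 2-D.
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
offsets = [(-0.4, 0.0), (0.4, 0.0), (0.0, -0.4), (0.0, 0.4), (0.0, 0.0)]
node_weights = [[cx + dx, cy + dy] for cx, cy in centers for dx, dy in offsets]

# Score each candidate k and keep the one with the lowest index.
scores = {k: davies_bouldin(node_weights, *kmeans(node_weights, k))
          for k in range(2, 6)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the index is lowest at k = 3, which matches the three groups used to build it. In a study like the one described here, the same selection would be repeated on maps grown at different spread factor values and the resulting partitions compared.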

Keywords

  • Cluster identification
  • Cluster separation
  • Unsupervised neural networks
  • Dynamic self-organizing map
  • Protein sequence classification

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  • Norashikin Ahmad (1)
  • Damminda Alahakoon (1)
  • Rowena Chau (1)
  1. Clayton School of Information Technology, Monash University, Clayton, Australia
