Abstract
The interdisciplinary research presented in this study is based on a novel approach to clustering tasks and to visualizing the internal structure of high-dimensional data sets. Following normalization, a pre-processing step performs dimensionality reduction on a high-dimensional data set using an unsupervised neural architecture known as cooperative maximum likelihood Hebbian learning (CMLHL), which is characterized by its capability to preserve a degree of global ordering in the data. Subsequently, the self-organising map (SOM) is applied as a topology-preserving architecture for two-dimensional visualization of the internal structure of such data sets. This research studies the joint performance of these two neural models and their capability to preserve some global ordering. Their effectiveness is demonstrated through a case study on a real-life, highly complex, high-dimensional spectroscopic data set characterized by its lack of reproducibility. The data under analysis are taken from an X-ray spectroscopic analysis of a rose window in a famous ancient Gothic cathedral in Spain. The main aim of this study is to classify each sample by its date and place of origin, so as to facilitate the restoration of these and other historical stained glass windows. Thus, having ascertained each sample’s chemical composition and degree of conservation, this technique contributes to identifying the different areas and periods in which the stained glass panels were produced. The combined method proposed in this study is compared with a classical statistical model that uses principal component analysis (PCA) as a pre-processing step, and with other unsupervised models such as maximum likelihood Hebbian learning (MLHL) and the application of the SOM without a pre-processing step. In the final case, a comparison of the convergence processes was performed to examine the efficacy of the CMLHL/SOM combined model.
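The three steps named in the abstract (normalization, CMLHL projection, SOM visualization) can be illustrated with a minimal sketch. This is not the authors' implementation: the update rules follow the commonly described CMLHL formulation (feed-forward, rectified lateral interaction, feedback residual, Hebbian update with exponent p), and every function name, hyperparameter value (p, eta, tau, grid size), the identity lateral matrix, and the synthetic input are assumptions chosen only for illustration.

```python
import numpy as np

def normalize(X):
    """Step 1: scale each feature to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - X.mean(axis=0)) / std

def cmlhl_fit(X, n_out=2, p=1.5, eta=0.01, tau=0.1, epochs=20, seed=0):
    """Step 2: simplified CMLHL-style training (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_out, X.shape[1]))
    A = np.eye(n_out)    # assumed lateral connection matrix
    b = np.zeros(n_out)  # assumed bias
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = W @ x                                   # feed-forward
            y = np.maximum(0.0, y + tau * (b - A @ y))  # rectified lateral step
            e = x - W.T @ y                             # feedback residual
            # Hebbian update: eta * y_i * sign(e_j) * |e_j|^(p-1)
            W += eta * np.outer(y, np.sign(e) * np.abs(e) ** (p - 1))
    return W

def som_fit(Y, rows=5, cols=5, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Step 3: minimal online SOM on a rectangular grid."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    codebook = rng.normal(size=(rows * cols, Y.shape[1]))
    n_steps, step = epochs * len(Y), 0
    for _ in range(epochs):
        for y in Y[rng.permutation(len(Y))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood
            bmu = np.argmin(np.linalg.norm(codebook - y, axis=1))
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            codebook += lr * h[:, None] * (y - codebook)
            step += 1
    return codebook, grid

def som_map(Y, codebook, grid):
    """Return the (row, col) grid cell of the best-matching unit per sample."""
    dists = np.linalg.norm(Y[:, None, :] - codebook[None, :, :], axis=2)
    return grid[np.argmin(dists, axis=1)].astype(int)

def run_pipeline(X):
    """Normalize, project with the CMLHL-style weights, then map onto the SOM."""
    Xn = normalize(X)
    W = cmlhl_fit(Xn)
    Y = Xn @ W.T  # low-dimensional projection
    codebook, grid = som_fit(Y)
    return W, som_map(Y, codebook, grid)

if __name__ == "__main__":
    # Synthetic stand-in for spectroscopic samples (60 samples, 8 features).
    X_demo = np.random.default_rng(1).normal(size=(60, 8))
    W_demo, cells_demo = run_pipeline(X_demo)
    print(W_demo.shape, cells_demo[:5])
```

Replacing `cmlhl_fit` with a PCA projection, or passing the normalized data straight to `som_fit`, would give rough analogues of the baselines that the abstract mentions, under the same caveats.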
Acknowledgments
This research was supported by projects TIN2010-21272-C02-01 from the Spanish Ministry of Science and Innovation and BU006A08 of the JCyL. The authors would also like to thank the manufacturer of components for vehicle interiors, Grupo Antolin Ingeniería, S.A., which provided support through MAGNO 2008 – 1028 – CENIT, funded by the Spanish Ministry of Science and Innovation.
Cite this article
Corchado, E., Perez, J.C. A three-step unsupervised neural model for visualizing high complex dimensional spectroscopic data sets. Pattern Anal Applic 14, 207–218 (2011). https://doi.org/10.1007/s10044-010-0187-5
DOI: https://doi.org/10.1007/s10044-010-0187-5