Pattern Analysis and Applications, Volume 14, Issue 2, pp 207–218

A three-step unsupervised neural model for visualizing high complex dimensional spectroscopic data sets

  • Emilio Corchado
  • Juan C. Perez
Industrial and Commercial Application


The interdisciplinary research presented in this study is based on a novel approach to clustering tasks and the visualization of the internal structure of high-dimensional data sets. Following normalization, a pre-processing step performs dimensionality reduction on a high-dimensional data set, using an unsupervised neural architecture known as cooperative maximum likelihood Hebbian learning (CMLHL), which is characterized by its capability to preserve a degree of global ordering in the data. Subsequently, the self-organising map (SOM) is applied as a topology-preserving architecture for two-dimensional visualization of the internal structure of such data sets. This research studies the joint performance of these two neural models and their capability to preserve some global ordering. Their effectiveness is demonstrated through a case study on a real-life, highly complex, high-dimensional spectroscopic data set characterized by its lack of reproducibility. The data under analysis are taken from an X-ray spectroscopic analysis of a rose window in a famous ancient Gothic Spanish cathedral. The main aim of this study is to classify each sample by its date and place of origin, so as to facilitate the restoration of these and other historical stained glass windows. Thus, having ascertained the sample’s chemical composition and degree of conservation, this technique contributes to identifying the different areas and periods in which the stained glass panels were produced. The combined method proposed in this study is compared with a classical statistical model that uses principal component analysis (PCA) as a pre-processing step, and with some other unsupervised models such as maximum likelihood Hebbian learning (MLHL) and the application of the SOM without a pre-processing step. In the final case, a comparison of the convergence processes was performed to examine the efficacy of the CMLHL/SOM combined model.
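The three-step structure described above (normalization, dimensionality reduction, SOM visualization) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it uses PCA, the classical baseline named in the abstract, as the pre-processing step instead of CMLHL, and all function names, grid sizes, learning parameters and the synthetic data are assumptions introduced for the example.

```python
import numpy as np


def zscore(X):
    """Step 1: normalize each feature to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (X - X.mean(axis=0)) / std


def pca_project(X, n_components):
    """Step 2 (baseline stand-in for CMLHL): project onto leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def best_matching_unit(W, x):
    """Return the (row, col) grid index of the SOM unit closest to x."""
    dist = np.linalg.norm(W - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(X, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Step 3: train a rectangular SOM with a Gaussian neighbourhood.

    Returns the weight array of shape (rows, cols, n_features).
    The decay schedules below are arbitrary choices for this sketch.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(rows, cols, X.shape[1]))
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood width
        x = X[rng.integers(len(X))]             # pick one random sample
        bmu = best_matching_unit(W, x)
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)  # squared grid distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        W += lr * h[..., None] * (x - W)        # pull neighbourhood towards the sample
    return W


# Synthetic stand-in data (hypothetical): 90 samples, 20 features, three groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, size=(30, 20)) for c in (-3.0, 0.0, 3.0)])

Z = pca_project(zscore(X), n_components=3)          # reduced representation
som = train_som(Z)                                  # two-dimensional map
positions = [best_matching_unit(som, z) for z in Z]  # grid cell of each sample
```

Plotting `positions` on the map grid, coloured by a known sample attribute, gives the kind of two-dimensional view of internal structure that the study examines. Replacing `pca_project` with a CMLHL projection would follow the same interface.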


Unsupervised learning; Projection methods; Topology-preserving mapping; Visualization; Spectroscopic analysis



This research was supported by projects TIN2010-21272-C02-01 from the Spanish Ministry of Science and Innovation and BU006A08 of the JCyL. The authors would also like to thank the manufacturer of components for vehicle interiors, Grupo Antolin Ingeniería, S.A., which provided support through MAGNO 2008 – 1028 – CENIT, funded by the Spanish Ministry of Science and Innovation.



Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. Departamento de Informática y Automática, University of Salamanca, Salamanca, Spain
  2. Department of Civil Engineering, University of Burgos, Burgos, Spain
