Abstract
Multilayer perceptron networks are designed to solve supervised learning problems in which a set of labeled training feature vectors is known. The resulting model allows us to infer adequate labels for unseen input vectors. Traditionally, the optimal model is the one that minimizes the error between the known labels and the labels inferred by the model. Training yields the weights that achieve the most adequate labels, and it involves a search process that is usually driven by gradient descent on the error. In this work, we propose replacing the known labels with a set of labels induced by a validity index. The validity index measures the adequacy of the model relative only to the intrinsic structures and relationships of the set of feature vectors, not to previously known labels. Since, in general, such an index is not guaranteed to be differentiable, we resort to heuristic optimization techniques. Our proposal results in an unsupervised learning approach for multilayer perceptron networks that infers the best model relative to labels derived from a validity index, which uncovers the hidden relationships of an unlabeled dataset.
References
Ahalt SC, Krishnamurthy AK, Chen P, Melton DE (1990) Competitive learning algorithms for vector quantization. Neural Netw 3(3):277–290
Aldana-Bobadilla E, Alfaro-Pérez C (2015) Finding the optimal sample based on Shannon’s entropy and genetic algorithms. Springer, Cham, pp 353–363
Aldana-Bobadilla E, Kuri-Morales A (2015) A clustering method based on the maximum entropy principle. Entropy 17(1):175
Bação F, Lobo V, Painho M (2005) Self-organizing maps as substitutes for k-means clustering. In: International conference on computational science, pp 476–483. Springer
Battiti R, Tecchiolli G (1994) The reactive Tabu search. ORSA J Comput 6(2):126–140
Brooks SP, Morgan BJT (1995) Optimization using simulated annealing. The Statistician 44:241–257
Burkardt J (2009) K-means clustering. In: Virginia Tech, Advanced research computing, Interdisciplinary Center for Applied Mathematics
Cavazos T (2000) Using self-organizing maps to investigate extreme climate events: an application to wintertime precipitation in the Balkans. J Clim 13(10):1718–1732
Chen Y, Qin B, Liu T, Liu Y, Li S (2010) The comparison of SOM and k-means for text clustering. Comput Inf Sci 3(2):268
Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224–227
Dollhopf SL, Hashsham SA, Tiedje JM (2001) Interpreting 16S rDNA T-RFLP data: application of self-organizing maps and principal component analysis to describe community dynamics and convergence. Microb Ecol 42(4):495–505
Dorigo M, Birattari M, Stützle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39
Dréo J et al (2006) Metaheuristics for hard optimization: methods and case studies. Springer, Berlin
Dua D, Karra Taniskidou E (2017) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml
Dunn JC (1974) Well-separated clusters and optimal fuzzy partitions, vol 4. Taylor & Francis, Abingdon
Eick CF, Zeidat N, Zhao Z (2004) Supervised clustering-algorithms and benefits. In: Tools with artificial intelligence. 16th IEEE international conference on artificial intelligence, pp 774–776. IEEE
Gendreau M, Potvin J-Y (2010) Handbook of metaheuristics, vol 2. Springer, New York
Geritz SAH, Mesze G, Metz JAJ et al (1998) Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree. Evol Ecol 12(1):35–57
Glover F (1989) Tabu search—part I. ORSA J Comput 1(3):190–206
Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99
Gray RM (2011) Entropy and information theory. Springer, Berlin
Grefenstette JJ (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16(1):122–128
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques, vol 17. Springer, Berlin
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat 28(1):100–108
Haykin SS (2009) Neural networks and learning machines, vol 3. Pearson, Upper Saddle River
Hewitson BC, Crane RG (2002) Self-organizing maps: applications to synoptic climatology. Clim Res 22(1):13–26
Honkela T, Kaski S, Lagus K, Kohonen T (1997) WEBSOM—self-organizing maps of document collections. Proc WSOM 97:4–6
Jiang H, Liu Y, Zheng L (2010) Design and simulation of simulated annealing algorithm with harmony search. In: International conference in swarm intelligence. Springer, Berlin, pp 454–460
Kennedy J (2011) Particle swarm optimization. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Berlin, pp 760–766
Kim J-H, Myung H (1997) Evolutionary programming techniques for constrained optimization problems. IEEE Trans Evol Comput 1(2):129–140
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
Knops ZF, Maintz JBA, Viergever MA, Pluim JPW (2004) Registration using segment intensity remapping and mutual information. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 805–812
Kohonen T, Somervuo P (1998) Self-organizing maps of symbol strings. Neurocomputing 21(1–3):19–30
Kovacs F, Ivancsy R (2006) A novel cluster validity index: variance of the nearest neighbor distance. WSEAS Trans Comput 5:477–483
Koza JR (1999) Genetic programming III: Darwinian invention and problem solving, vol 3. Morgan Kaufmann, Burlington
Koziel S, Yang X-S (2011) Computational optimization, methods and algorithms, vol 356. Springer, Berlin
Kuri-Morales AF (2014) The best neural network architecture. In: Mexican international conference on artificial intelligence, pp 72–84. Springer
Kuri-Morales AF (2015) Categorical encoding with neural networks and genetic algorithms. In: WSEAS proceedings of the 6th international conference on applied informatics and computing theory, pp 167–175
Kuri-Morales AF, Gutiérrez-García J (2002) Penalty function methods for constrained optimization with genetic algorithms: a statistical analysis. In: Mexican international conference on artificial intelligence, pp 108–117. Springer
Kuri-Morales A (2016) Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network. Soft Comput 21:1–13
Kuri-Morales A, Aldana-Bobadilla E (2013) The best genetic algorithm I. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 1–15
Kuri-Morales AF, Aldana-Bobadilla E, López-Peña I (2013) The best genetic algorithm II. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 16–29
Kuri-Morales A, Quezada-Villegas C (1998) A universal eclectic genetic algorithm for constrained optimization. In: Proceedings of the 6th European congress on intelligent techniques and soft computing, vol 1, pp 518–522
Lei JZ, Ghorbani A (2004) Network intrusion detection using an improved competitive learning neural network. In: Second annual conference on communication networks and services research, 2004. Proceedings, pp 190–197. IEEE
Lobo FJ, Lima CF, Michalewicz Z (2007) Parameter setting in evolutionary algorithms, vol 54. Springer, Berlin
MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297. Oakland, CA, USA
Mingoti SA, Lima JO (2006) Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms. Eur J Oper Res 174(3):1742–1759
Mitchell MF (1998) An introduction to genetic algorithms. MIT Press, Cambridge
Mitchell M, Forrest S, Holland JH (1992) The royal road for genetic algorithms: fitness landscapes and GA performance. In: Proceedings of the first European conference on artificial life, pp 245–254
Molga M, Smutnicki C (2005) Test functions for optimization needs. http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf
Pohlheim H (2012) GEATBX®—the genetic and evolutionary algorithm toolbox for MATLAB®, 2007. http://www.geatbx.com/. Accessed 24 June 2016
Powers DM (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Bioinfo Publications
Ritter H, Kohonen T (1989) Self-organizing semantic maps. Biol Cybern 61(4):241–254
Rudolph G (1994) Convergence analysis of canonical genetic algorithms. IEEE Trans Neural Netw 5(1):96–101
Spears WM, De Jong KA, Bäck T, Fogel DB, De Garis H (1993) An overview of evolutionary computation. In: European conference on machine learning, pp 442–459
Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
Talbi E-G (2009) Metaheuristics: from design to implementation, vol 74. Wiley, New York
Ultsch A (2007) Emergence in self-organizing feature maps. University Library of Bielefeld
Van Hulle MM (2012) Self-organizing maps. In: Rozenberg G, Bäck T, Kok JN (eds) Handbook of natural computing. Springer, Berlin, pp 585–622
White DJ, Anandalingam G (1993) A penalty function approach for solving bi-level linear programs. J Global Optim 3(4):397–419
Yeniay Ö (2005) Penalty function methods for constrained optimization with genetic algorithms. Math Comput Appl 10(1):45–56
Zha H, He X, Ding C, Gu M, Simon HD (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems, pp 1057–1064
Acknowledgements
The authors acknowledge the support of Consejo Nacional de Ciencia y Tecnología (CONACYT), Asociación Mexicana de Cultura, A.C., and Centro de Investigación y Estudios Avanzados (CINVESTAV).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Experimental process for setting heuristic parameters
An important element in the performance of any heuristic is the way in which its parameters are set. Many attempts to solve this problem have been reported (Lobo et al. 2007; Grefenstette 1986; Storn and Price 1997; Talbi 2009). Most of them depend strongly on the properties of the problem at hand, which prevents the suggested optimal values from being generalized. We therefore conducted an experimental process to determine the most suitable parameter setting for the characteristics of our problem. For a heuristic H, a parameter setting can be defined as an m-tuple of the form \(\tau =[p_1,p_2,\ldots ,p_{m}]\) that includes the parameters \(p_i\) necessary to execute H (e.g., population size, crossover and mutation probabilities). Given a heuristic H, the experimental process is summarized as follows:
1. The domain of each parameter \(p_i\) is defined in advance.
2. A random set of possible instances of \(\tau \) is systematically generated (about 300 different instances). This set is denoted as \({\mathcal {I}}\).
3. From \({\mathcal {I}}\), an instance \(\tau _i\) is randomly chosen.
4. A set of 32 clustering problems (from the reservoir described in “Appendix B”) is solved via Algorithm 1 using H with the parameters \(\tau _i\). The average over the solutions obtained with \(\tau _i\) is taken as the performance value \(F_i\).
5. Steps 3 and 4 are repeated until \(i=300\).
6. The instance that achieves the best value of \(F_i\) is selected as the most suitable setting.
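The selection procedure above amounts to a random search over parameter settings. The following Python sketch illustrates it under stated assumptions: `run_heuristic` is a hypothetical callable standing in for solving one clustering problem with H and returning the value of a minimized index, every parameter is treated as a continuous value drawn uniformly from a `(low, high)` domain, and all function and parameter names are illustrative, not taken from the paper.

```python
import random
from statistics import mean


def sample_settings(domains, n, rng):
    """Draw n random settings tau from per-parameter (low, high) domains.

    Assumption: every parameter is continuous; integer parameters such as
    population size would need rounding before use.
    """
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in domains.items()}
        for _ in range(n)
    ]


def select_setting(run_heuristic, problems, domains, n_instances=300, seed=0):
    """Return (best_tau, best_F), the setting with the lowest average objective.

    run_heuristic(problem, tau) is a hypothetical stand-in for running H on
    one clustering problem and returning the final value of a minimized index.
    """
    rng = random.Random(seed)
    candidates = sample_settings(domains, n_instances, rng)  # step 2
    rng.shuffle(candidates)                                  # step 3: random order
    best_tau, best_f = None, float("inf")
    for tau in candidates:                                   # steps 3 to 5
        f = mean(run_heuristic(p, tau) for p in problems)    # step 4: F_i
        if f < best_f:                                       # step 6: keep the best
            best_tau, best_f = tau, f
    return best_tau, best_f
```

A call might look like `select_setting(run_heuristic, problems, {"crossover": (0.5, 1.0), "mutation": (0.0, 0.2)})`, where `problems` holds the 32 clustering problems. Evaluating each candidate exactly once matches the six steps; the convergence snapshots described afterwards are not modeled here.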
The above process was executed with EGA using the indices SD, DB, DD. We reformulated every index as an objective function to be minimized whose range is the interval [0, 1]. While EGA was running, we took snapshots of the value of the index being optimized every 50 generations. This allowed us to track the performance value F throughout the adaptive process. Fig. 7 shows the value of F obtained by the top 5 instances of \(\tau \) over 500 iterations.
The best instance of \(\tau \) is the one with the fastest convergence and the best value of F. The values of this instance are shown in Table 1.
Appendix B: Generating clustering problems
The relative performance of a clustering method must refer to its ability, compared with other methods, to solve the same set of problems. The performance can be generalized as long as these problems represent a random sample of all clustering problems in a wide numerical space. We followed a systematic process to generate numerical datasets in this space. Each dataset contains elements grouped into k clusters, which are generated via a set of parametric functions as follows:
Let k, \(\aleph _i\) and \({\mathbb {F}}\) be the number of clusters, the size of the ith cluster and a set of generator functions, respectively. A cluster is a set of d-dimensional vectors generated as follows:
1. From \({\mathbb {F}}\), a subset of functions \(f_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is randomly chosen (\(i=1,2,\ldots ,d\)).
2. A vector of the form \(\mathbf {v}=[f_1(x_1),f_2(x_2),\ldots ,f_{d}(x_d)]\) is generated. The values of \(x_i\) are drawn randomly from the domain of \(f_i\).
3. Step 2 is repeated until \(\aleph _i\) vectors have been obtained.
This process is repeated until k clusters have been obtained. Set \({\mathbb {F}}\) includes the functions reported in Pohlheim (2012) and Molga and Smutnicki (2005). The generated reservoir includes clustering problems with \(k = 2,3,\ldots ,20\). The process was implemented in Java, and the resulting data were stored in a relational database (MySQL).
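The generation process can be sketched as follows. This is a minimal Python illustration, not the authors' Java implementation. The generator set `GENERATORS` is a hypothetical stand-in for \({\mathbb {F}}\), with simple functions and made-up domains instead of the published test functions. Drawing the d functions with replacement and sampling \(x_i\) uniformly are also assumptions.

```python
import math
import random

# Hypothetical generator set F: (function, (domain_low, domain_high)) pairs.
# These simple functions and domains are placeholders for illustration only.
GENERATORS = [
    (math.sin, (-math.pi, math.pi)),
    (lambda x: x * x, (-2.0, 2.0)),
    (math.exp, (-1.0, 1.0)),
    (lambda x: 3.0 * x + 1.0, (0.0, 5.0)),
]


def generate_cluster(size, dim, rng):
    """Generate `size` vectors v = [f_1(x_1), ..., f_d(x_d)] for one cluster."""
    # Step 1: choose one generator per dimension (with replacement, an assumption).
    funcs = [rng.choice(GENERATORS) for _ in range(dim)]
    # Steps 2 and 3: draw each x_i uniformly from the domain of f_i, `size` times.
    return [
        [f(rng.uniform(lo, hi)) for f, (lo, hi) in funcs]
        for _ in range(size)
    ]


def generate_problem(k, sizes, dim, seed=0):
    """Return a list of (vector, cluster_id) pairs for k clusters."""
    rng = random.Random(seed)
    data = []
    for cluster_id in range(k):
        for v in generate_cluster(sizes[cluster_id], dim, rng):
            data.append((v, cluster_id))
    return data
```

For example, `generate_problem(3, [10, 20, 5], dim=4)` yields 35 labeled four-dimensional vectors. The cluster ids are kept only so that a clustering result can be scored afterwards; a clustering method would see the vectors alone.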
Cite this article
Aldana-Bobadilla, E., Kuri-Morales, A., Lopez-Arevalo, I. et al. An unsupervised learning approach for multilayer perceptron networks. Soft Comput 23, 11001–11013 (2019). https://doi.org/10.1007/s00500-018-3655-2
DOI: https://doi.org/10.1007/s00500-018-3655-2