An unsupervised learning approach for multilayer perceptron networks

Learning driven by validity indices

Abstract

Multilayer perceptron networks were designed to solve supervised learning problems, in which there is a set of known, labeled training feature vectors. The resulting model allows us to infer adequate labels for unknown input vectors. Traditionally, the optimal model is the one that minimizes the error between the known labels and the labels inferred by the model; training searches for the weights that achieve this, a search usually guided by gradient descent on the error. In this work, we propose to replace the known labels with labels induced by a validity index. A validity index measures the adequacy of a model relative only to the intrinsic structures and relationships of the set of feature vectors, not to previously known labels. Since, in general, there is no guarantee that such an index is differentiable, we resort to heuristic optimization techniques. Our proposal results in an unsupervised learning approach for multilayer perceptron networks that infers the best model relative to labels derived from such a validity index, uncovering the hidden relationships of an unlabeled dataset.


References

  • Ahalt SC, Krishnamurthy AK, Chen P, Melton DE (1990) Competitive learning algorithms for vector quantization. Neural Netw 3(3):277–290

  • Aldana-Bobadilla E, Alfaro-Pérez C (2015) Finding the optimal sample based on Shannon’s entropy and genetic algorithms. Springer, Cham, pp 353–363

  • Aldana-Bobadilla E, Kuri-Morales A (2015) A clustering method based on the maximum entropy principle. Entropy 17(1):175

  • Bação F, Lobo V, Painho M (2005) Self-organizing maps as substitutes for k-means clustering. In: International conference on computational science, pp 476–483. Springer

  • Battiti R, Tecchiolli G (1994) The reactive Tabu search. ORSA J Comput 6(2):126–140

  • Brooks SP, Morgan BJT (1995) Optimization using simulated annealing. The Statistician 44:241–257

  • Burkardt J (2009) K-means clustering. In: Virginia Tech, Advanced research computing, Interdisciplinary Center for Applied Mathematics

  • Cavazos T (2000) Using self-organizing maps to investigate extreme climate events: an application to wintertime precipitation in the Balkans. J Clim 13(10):1718–1732

  • Chen Y, Qin B, Liu T, Liu Y, Li S (2010) The comparison of SOM and k-means for text clustering. Comput Inf Sci 3(2):268

  • Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224–227

  • Dollhopf SL, Hashsham SA, Tiedje JM (2001) Interpreting 16S rDNA T-RFLP data: application of self-organizing maps and principal component analysis to describe community dynamics and convergence. Microb Ecol 42(4):495–505

  • Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39

  • Dréo J et al (2006) Metaheuristics for hard optimization: methods and case studies. Springer, Berlin

  • Dua D, Karra Taniskidou E (2017) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml

  • Dunn JC (1974) Well-separated clusters and optimal fuzzy partitions, vol 4. Taylor & Francis, Abingdon

  • Eick CF, Zeidat N, Zhao Z (2004) Supervised clustering-algorithms and benefits. In: Tools with artificial intelligence. 16th IEEE international conference on artificial intelligence, pp 774–776. IEEE

  • Gendreau M, Jean-Yves P (2010) Handbook of metaheuristics, vol 2. Springer, New York

  • Geritz SAH, Mesze G, Metz JAJ et al (1998) Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree. Evol Ecol 12(1):35–57

  • Glover F (1989) Tabu search—part I. ORSA J Comput 1(3):190–206

  • Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99

  • Gray RM (2011) Entropy and information theory. Springer, Berlin

  • Grefenstette JJ (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16(1):122–128

  • Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314

  • Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques, vol 17. Springer, Berlin

  • Hartigan JA, Wong MA (1979) Algorithm as 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat 28(1):100–108

  • Haykin SS (2009) Neural networks and learning machines, vol 3. Pearson, Upper Saddle River

  • Hewitson BC, Crane RG (2002) Self-organizing maps: applications to synoptic climatology. Clim Res 22(1):13–26

  • Honkela T, Kaski S, Lagus K, Kohonen T (1997) WEBSOM—self-organizing maps of document collections. Proc WSOM 97:4–6

  • Jiang H, Liu Y, Zheng L (2010) Design and simulation of simulated annealing algorithm with harmony search. In: International conference in swarm intelligence. Springer, Berlin, pp 454–460

  • Kennedy J (2011) Particle swarm optimization. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Berlin, pp 760–766

  • Kim J-H, Myung H (1997) Evolutionary programming techniques for constrained optimization problems. IEEE Trans Evol Comput 1(2):129–140

  • Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680

  • Knops ZF, Maintz JBA, Viergever MA, Pluim JPW (2004) Registration using segment intensity remapping and mutual information. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 805–812

  • Kohonen T, Somervuo P (1998) Self-organizing maps of symbol strings. Neurocomputing 21(1–3):19–30

  • Kovacs F, Ivancsy R (2006) A novel cluster validity index: variance of the nearest neighbor distance. WSEAS Trans Comput 5:477–483

  • Koza JR (1999) Genetic programming III: Darwinian invention and problem solving, vol 3. Morgan Kaufmann, Burlington

  • Koziel S, Xin-She Y (2011) Computational optimization, methods and algorithms, vol 356. Springer, Berlin

  • Kuri-Morales AF (2014) The best neural network architecture. In: Mexican international conference on artificial intelligence, pp 72–84. Springer

  • Kuri-Morales AF (2015) Categorical encoding with neural networks and genetic algorithms. In: WSEAS proceedings of the 6th international conference on applied informatics and computing theory, pp 167–175

  • Kuri-Morales AF, Gutiérrez-García J (2002) Penalty function methods for constrained optimization with genetic algorithms: a statistical analysis. In: Mexican international conference on artificial intelligence, pp 108–117. Springer

  • Kuri-Morales A (2016) Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network. Soft Comput 21:1–13

  • Kuri-Morales A, Aldana-Bobadilla E (2013) The best genetic algorithm I. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 1–15

  • Kuri-Morales AF, Aldana-Bobadilla E, López-Peña I (2013) The best genetic algorithm II. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 16–29

  • Kuri-Morales A, Quezada-Villegas C (1998) A universal eclectic genetic algorithm for constrained optimization. In: Proceedings of the 6th European congress on intelligent techniques and soft computing, vol 1, pp 518–522

  • Lei JZ, Ghorbani A (2004) Network intrusion detection using an improved competitive learning neural network. In: Second annual conference on communication networks and services research, 2004. Proceedings, pp 190–197. IEEE

  • Lobo FJ, Lima CF, Michalewicz Z (2007) Parameter setting in evolutionary algorithms, vol 54. Springer, Berlin

  • MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297. Oakland, CA, USA

  • Mingoti SA, Lima JO (2006) Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms. Eur J Oper Res 174(3):1742–1759

  • Mitchell MF (1998) An introduction to genetic algorithms. MIT Press, Cambridge

  • Mitchell M, Forrest S, Holland JH (1992) The royal road for genetic algorithms: fitness landscapes and GA performance. In: Proceedings of the first European conference on artificial life, pp 245–254

  • Molga M, Smutnicki C (2005) Test functions for optimization needs. http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf

  • Pohlheim H (2012) GEATBX®—the genetic and evolutionary algorithm toolbox for MATLAB®, 2007. http://www.geatbx.com/. Accessed 24 June 2016

  • Powers DM (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Bioinfo Publications

  • Ritter H, Kohonen T (1989) Self-organizing semantic maps. Biol Cybern 61(4):241–254

  • Rudolph G (1994) Convergence analysis of canonical genetic algorithms. IEEE Trans Neural Netw 5(1):96–101

  • Spears WM, De Jong KA, Bäck T, Fogel DB, De Garis H (1993) An overview of evolutionary computation. In: European conference on machine learning, pp 442–459

  • Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359

  • Talbi E-G (2009) Metaheuristics: from design to implementation, vol 74. Wiley, New York

  • Ultsch A (2007) Emergence in self-organizing feature maps. University Library of Bielefeld

  • Van Hulle MM (2012) Self-organizing maps. In: Rozenberg G, Bäck T, Kok JN (eds) Handbook of natural computing. Springer, Berlin, pp 585–622

  • White DJ, Anandalingam G (1993) A penalty function approach for solving bi-level linear programs. J Global Optim 3(4):397–419

  • Yeniay Ö (2005) Penalty function methods for constrained optimization with genetic algorithms. Math Comput Appl 10(1):45–56

  • Zha H, He X, Ding C, Gu M, Simon HD (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems, pp 1057–1064

Acknowledgements

The authors acknowledge the support of Consejo Nacional de Ciencia y Tecnología (CONACYT), Asociación Mexicana de Cultura, A.C., and Centro de Investigación y Estudios Avanzados (CINVESTAV).

Author information

Corresponding author

Correspondence to Edwin Aldana-Bobadilla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

Appendices

Appendix A: Experimental process for setting heuristic parameters

An important element in the performance of any heuristic is the way its parameters are set. Many attempts to solve this problem have been reported (Lobo et al. 2007; Grefenstette 1986; Storn and Price 1997; Talbi 2009). Most of them depend strongly on properties of the problem at hand, which prevents generalization of the suggested optimal values. We therefore conducted an experimental process to determine the parameter setting best suited to the characteristics of our problem. A parameter setting can be defined as an m-tuple of the form \(\tau =[p_1,p_2,\ldots ,p_{m}]\) comprising the parameters \(p_i\) needed to execute H (e.g., population size, crossover and mutation probabilities). Given a heuristic H, the experimental process is summarized as follows:

  1. The domain for each parameter \(p_i\) is defined in advance.

  2. A random set of possible instances of \(\tau \) is systematically generated (about 300 different instances). This set is denoted \({\mathcal {I}}\).

  3. From \({\mathcal {I}}\), an instance \(\tau _i\) is randomly chosen.

  4. A set of 32 clustering problems (from the reservoir described in “Appendix B”) is solved via Algorithm 1 using H with the parameters \(\tau _i\). From the solutions obtained with \(\tau _i\), an average value is computed and taken as the performance value \(F_i\).

  5. Steps 3–4 are repeated until \(i=300\).

  6. The instance that achieves the best value of \(F_i\) is selected as the most suitable setting.
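The random-search procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names, the parameter domains, and the synthetic `evaluate` function (standing in for the average validity-index value over the 32 clustering problems) are all assumptions.

```java
import java.util.Random;

public class ParameterTuning {

    // A parameter setting tau = [population size, crossover prob., mutation prob.].
    record ParamSetting(int populationSize, double crossoverProb, double mutationProb) {}

    // Placeholder performance measure (lower is better). In the paper this would be
    // the average index value obtained by solving 32 clustering problems with tau.
    static double evaluate(ParamSetting tau) {
        return Math.abs(tau.crossoverProb() - 0.9)
                + Math.abs(tau.mutationProb() - 0.05)
                + Math.abs(tau.populationSize() - 100) / 1000.0;
    }

    static ParamSetting bestSetting(int trials, long seed) {
        Random rng = new Random(seed);
        ParamSetting best = null;
        double bestF = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trials; i++) {
            // Steps 2-3: draw a random instance tau_i from the parameter domains
            // (the domains below are illustrative).
            ParamSetting tau = new ParamSetting(
                    20 + rng.nextInt(181),             // population size in [20, 200]
                    0.5 + 0.5 * rng.nextDouble(),      // crossover probability in [0.5, 1.0]
                    0.001 + 0.099 * rng.nextDouble()); // mutation probability in [0.001, 0.1]
            // Step 4: score the setting.
            double f = evaluate(tau);
            // Step 6: keep the best-performing instance seen so far.
            if (f < bestF) { bestF = f; best = tau; }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best setting: " + bestSetting(300, 42L));
    }
}
```

With 300 trials this is exactly the budget the appendix describes; the best instance returned plays the role of the setting reported in Table 1.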

The above process was executed with EGA using the indices SD, DB, and DD. We reformulated every index as an objective function to be minimized whose range is the interval [0, 1]. While EGA runs, we take a snapshot of the value of the index being optimized every 50 generations, which yields the performance value F throughout the adaptive process. Figure 7 shows the value of F obtained by the top 5 instances of \(\tau \) over 500 iterations.
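As an illustration of recasting a validity index as a bounded minimization objective, the sketch below computes the Davies–Bouldin index (Davies and Bouldin 1979) of a partition and maps it into [0, 1) via x/(1+x). The mapping is our assumption for the sketch; the paper does not specify how each index was normalized.

```java
import java.util.List;

public class DBIndex {
    // Euclidean distance between two points.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Davies-Bouldin index: lower values mean more compact, better-separated clusters.
    static double daviesBouldin(List<List<double[]>> clusters) {
        int k = clusters.size();
        double[][] centroids = new double[k][];
        double[] scatter = new double[k];
        for (int i = 0; i < k; i++) {
            List<double[]> c = clusters.get(i);
            int d = c.get(0).length;
            centroids[i] = new double[d];
            for (double[] p : c)
                for (int j = 0; j < d; j++) centroids[i][j] += p[j] / c.size();
            for (double[] p : c) scatter[i] += dist(p, centroids[i]) / c.size();
        }
        double sum = 0;
        for (int i = 0; i < k; i++) {
            double worst = 0;   // worst similarity ratio of cluster i to any other cluster
            for (int j = 0; j < k; j++) {
                if (i == j) continue;
                worst = Math.max(worst,
                        (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j]));
            }
            sum += worst;
        }
        return sum / k;
    }

    // Map the unbounded index into [0, 1) so it can serve as a minimization
    // objective with a fixed range (normalization choice is an assumption).
    static double boundedObjective(List<List<double[]>> clusters) {
        double db = daviesBouldin(clusters);
        return db / (1.0 + db);
    }

    public static void main(String[] args) {
        List<List<double[]>> clusters = List.of(
                List.of(new double[]{0, 0}, new double[]{0, 1}),
                List.of(new double[]{10, 10}, new double[]{10, 11}));
        System.out.println(boundedObjective(clusters)); // small value: tight, well-separated clusters
    }
}
```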

The best instance of \(\tau \) is the one with the fastest convergence and the best value of F. The values of this instance are shown in Table 1.

Appendix B: Generating clustering problems

The relative performance of a clustering method must refer to its ability, compared with other methods, to solve the same set of problems. Generalizing performance claims is possible only insofar as these problems represent a random sample of all clustering problems in a wide numerical space. We followed a systematic process to generate numerical datasets in this space. Each dataset contains elements grouped into k clusters, generated via a set of parametric functions as follows:

Let k, \(\aleph _i\), and \({\mathbb {F}}\) be the number of clusters, the size of the ith cluster, and a set of generator functions, respectively. A cluster is a set of d-dimensional vectors generated as follows:

  1. From \({\mathbb {F}}\), a subset of functions \(f_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\) (\(i=1,2,\ldots ,d\)) is randomly chosen.

  2. A vector of the form \(\mathbf {v}=[f_1(x_1),f_2(x_2),\ldots ,f_{d}(x_d)]\) is generated, where each \(x_i\) is drawn randomly from the domain of \(f_i\).

  3. Step 2 is repeated until \(\aleph _i\) vectors have been obtained.

This process is executed until k clusters have been obtained. The set \({\mathbb {F}}\) includes the functions reported in Pohlheim (2012) and Molga and Smutnicki (2005). The resulting reservoir includes clustering problems with \(k = 2,3,\ldots ,20\). The above process was implemented in Java, and the resulting data were stored in a relational database (MySQL).
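The generator described above can be sketched as follows. The specific function set F and the sampling interval for the arguments are illustrative assumptions; the paper draws its generator functions from the benchmark collections of Pohlheim (2012) and Molga and Smutnicki (2005).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class ClusterGenerator {

    // Illustrative stand-in for the generator set F.
    static final List<DoubleUnaryOperator> F = List.of(
            Math::sin, Math::cos, x -> x * x, Math::exp, Math::abs);

    // Generate one cluster of `size` d-dimensional vectors:
    // step 1 picks d generator functions, step 2 evaluates them at random
    // arguments, step 3 repeats until `size` vectors are produced.
    static List<double[]> generateCluster(int d, int size, Random rng) {
        List<DoubleUnaryOperator> fs = new ArrayList<>();
        for (int i = 0; i < d; i++) fs.add(F.get(rng.nextInt(F.size())));
        List<double[]> cluster = new ArrayList<>();
        for (int n = 0; n < size; n++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++) {
                double x = -5 + 10 * rng.nextDouble(); // random argument in [-5, 5] (assumed domain)
                v[i] = fs.get(i).applyAsDouble(x);
            }
            cluster.add(v);
        }
        return cluster;
    }

    // A full clustering problem is k such clusters.
    static List<List<double[]>> generateProblem(int k, int d, int size, Random rng) {
        List<List<double[]>> problem = new ArrayList<>();
        for (int i = 0; i < k; i++) problem.add(generateCluster(d, size, rng));
        return problem;
    }

    public static void main(String[] args) {
        List<List<double[]>> p = generateProblem(3, 2, 50, new Random(7));
        System.out.println(p.size() + " clusters of " + p.get(0).size() + " vectors each");
    }
}
```

Repeating `generateProblem` for \(k = 2,\ldots,20\) and storing the results would populate a reservoir of the kind the appendix describes.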

About this article

Cite this article

Aldana-Bobadilla, E., Kuri-Morales, A., Lopez-Arevalo, I. et al. An unsupervised learning approach for multilayer perceptron networks. Soft Comput 23, 11001–11013 (2019). https://doi.org/10.1007/s00500-018-3655-2

Keywords

  • Neural networks
  • Clustering
  • Unsupervised learning