An unsupervised learning approach for multilayer perceptron networks

Learning driven by validity indices

Abstract

Multilayer perceptron networks were designed to solve supervised learning problems, in which there is a set of known, labeled training feature vectors. The resulting model allows us to infer adequate labels for unknown input vectors. Traditionally, the optimal model is the one that minimizes the error between the known labels and the labels inferred by the model; training searches for the weights that achieve this, a search usually guided by gradient descent on the error. In this work, we propose to replace the known labels with labels induced by a validity index. A validity index measures the adequacy of a model relative only to the intrinsic structures and relationships of the set of feature vectors, not to previously known labels. Since, in general, there is no guarantee that such an index is differentiable, we resort to heuristic optimization techniques. Our proposal results in an unsupervised learning approach for multilayer perceptron networks that infers the best model relative to labels derived from such a validity index, uncovering the hidden relationships of an unlabeled dataset.


References

  • Ahalt SC, Krishnamurthy AK, Chen P, Melton DE (1990) Competitive learning algorithms for vector quantization. Neural Netw 3(3):277–290

  • Aldana-Bobadilla E, Alfaro-Pérez C (2015) Finding the optimal sample based on Shannon’s entropy and genetic algorithms. Springer, Cham, pp 353–363

  • Aldana-Bobadilla E, Kuri-Morales A (2015) A clustering method based on the maximum entropy principle. Entropy 17(1):175

  • Bação F, Lobo V, Painho M (2005) Self-organizing maps as substitutes for k-means clustering. In: International conference on computational science, pp 476–483. Springer

  • Battiti R, Tecchiolli G (1994) The reactive Tabu search. ORSA J Comput 6(2):126–140

  • Brooks SP, Morgan BJT (1995) Optimization using simulated annealing. The Statistician 44:241–257

  • Burkardt J (2009) K-means clustering. In: Virginia Tech, Advanced research computing, Interdisciplinary Center for Applied Mathematics

  • Cavazos T (2000) Using self-organizing maps to investigate extreme climate events: an application to wintertime precipitation in the Balkans. J Clim 13(10):1718–1732

  • Chen Y, Qin B, Liu T, Liu Y, Li S (2010) The comparison of SOM and k-means for text clustering. Comput Inf Sci 3(2):268

  • Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224–227

  • Dollhopf SL, Hashsham SA, Tiedje JM (2001) Interpreting 16S rDNA T-RFLP data: application of self-organizing maps and principal component analysis to describe community dynamics and convergence. Microb Ecol 42(4):495–505

  • Dorigo M, Birattari M, Stutzle T (2006) Ant colony optimization. IEEE Comput Intell Mag 1(4):28–39

  • Dréo J et al (2006) Metaheuristics for hard optimization: methods and case studies. Springer, Berlin

  • Dua D, Karra Taniskidou E (2017) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml

  • Dunn JC (1974) Well-separated clusters and optimal fuzzy partitions, vol 4. Taylor & Francis, Abingdon

  • Eick CF, Zeidat N, Zhao Z (2004) Supervised clustering-algorithms and benefits. In: Tools with artificial intelligence. 16th IEEE international conference on artificial intelligence, pp 774–776. IEEE

  • Gendreau M, Jean-Yves P (2010) Handbook of metaheuristics, vol 2. Springer, New York

  • Geritz SAH, Mesze G, Metz JAJ et al (1998) Evolutionarily singular strategies and the adaptive growth and branching of the evolutionary tree. Evol Ecol 12(1):35–57

  • Glover F (1989) Tabu search—part I. ORSA J Comput 1(3):190–206

  • Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99

  • Gray RM (2011) Entropy and information theory. Springer, Berlin

  • Grefenstette JJ (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16(1):122–128

  • Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314

  • Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques, vol 17. Springer, Berlin

  • Hartigan JA, Wong MA (1979) Algorithm as 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat 28(1):100–108

  • Haykin SS (2009) Neural networks and learning machines, vol 3. Pearson, Upper Saddle River

  • Hewitson BC, Crane RG (2002) Self-organizing maps: applications to synoptic climatology. Clim Res 22(1):13–26

  • Honkela T, Kaski S, Lagus K, Kohonen T (1997) WEBSOM—self-organizing maps of document collections. Proc WSOM 97:4–6

  • Jiang H, Liu Y, Zheng L (2010) Design and simulation of simulated annealing algorithm with harmony search. In: International conference in swarm intelligence. Springer, Berlin, pp 454–460

  • Kennedy J (2011) Particle swarm optimization. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Berlin, pp 760–766

  • Kim J-H, Myung H (1997) Evolutionary programming techniques for constrained optimization problems. IEEE Trans Evol Comput 1(2):129–140

  • Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680

  • Knops ZF, Maintz JBA, Viergever MA, Pluim JPW (2004) Registration using segment intensity remapping and mutual information. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, pp 805–812

  • Kohonen T, Somervuo P (1998) Self-organizing maps of symbol strings. Neurocomputing 21(1–3):19–30

  • Kovacs F, Ivancsy R (2006) A novel cluster validity index: variance of the nearest neighbor distance. WSEAS Trans Comput 5:477–483

  • Koza JR (1999) Genetic programming III: Darwinian invention and problem solving, vol 3. Morgan Kaufmann, Burlington

  • Koziel S, Xin-She Y (2011) Computational optimization, methods and algorithms, vol 356. Springer, Berlin

  • Kuri-Morales AF (2014) The best neural network architecture. In: Mexican international conference on artificial intelligence, pp 72–84. Springer

  • Kuri-Morales AF (2015) Categorical encoding with neural networks and genetic algorithms. In: WSEAS proceedings of the 6th international conference on applied informatics and computing theory, pp 167–175

  • Kuri-Morales AF, Gutiérrez-García J (2002) Penalty function methods for constrained optimization with genetic algorithms: a statistical analysis. In: Mexican international conference on artificial intelligence, pp 108–117. Springer

  • Kuri-Morales A (2016) Closed determination of the number of neurons in the hidden layer of a multi-layered perceptron network. Soft Comput 21:1–13

  • Kuri-Morales A, Aldana-Bobadilla E (2013) The best genetic algorithm I. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 1–15

  • Kuri-Morales AF, Aldana-Bobadilla E, López-Peña I (2013) The best genetic algorithm II. In: Gelbukh FCA, González M (eds) Advances in soft computing and its applications. Springer, Berlin, pp 16–29

  • Kuri-Morales A, Quezada-Villegas C (1998) A universal eclectic genetic algorithm for constrained optimization. In: Proceedings of the 6th European congress on intelligent techniques and soft computing, vol 1, pp 518–522

  • Lei JZ, Ghorbani A (2004) Network intrusion detection using an improved competitive learning neural network. In: Second annual conference on communication networks and services research, 2004. Proceedings, pp 190–197. IEEE

  • Lobo FJ, Lima CF, Michalewicz Z (2007) Parameter setting in evolutionary algorithms, vol 54. Springer, Berlin

  • MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–297. Oakland, CA, USA

  • Mingoti SA, Lima JO (2006) Comparing SOM neural network with fuzzy c-means, k-means and traditional hierarchical clustering algorithms. Eur J Oper Res 174(3):1742–1759

  • Mitchell MF (1998) An introduction to genetic algorithms. MIT Press, Cambridge

  • Mitchell M, Forrest S, Holland JH (1992) The royal road for genetic algorithms: fitness landscapes and GA performance. In: Proceedings of the first European conference on artificial life, pp 245–254

  • Molga M, Smutnicki C (2005) Test functions for optimization needs. http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf

  • Pohlheim H (2012) GEATBX®—the genetic and evolutionary algorithm toolbox for MATLAB®, 2007. http://www.geatbx.com/. Accessed 24 June 2016

  • Powers DM (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. Bioinfo Publications

  • Ritter H, Kohonen T (1989) Self-organizing semantic maps. Biol Cybern 61(4):241–254

  • Rudolph G (1994) Convergence analysis of canonical genetic algorithms. IEEE Trans Neural Netw 5(1):96–101

  • Spears WM, De Jong KA, Bäck T, Fogel DB, De Garis H (1993) An overview of evolutionary computation. In: European conference on machine learning, pp 442–459

  • Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359

  • Talbi E-G (2009) Metaheuristics: from design to implementation, vol 74. Wiley, New York

  • Ultsch A (2007) Emergence in self-organizing feature maps. University Library of Bielefeld

  • Van Hulle MM (2012) Self-organizing maps. In: Rozenberg G, Bäck T, Kok JN (eds) Handbook of natural computing. Springer, Berlin, pp 585–622

  • White DJ, Anandalingam G (1993) A penalty function approach for solving bi-level linear programs. J Global Optim 3(4):397–419

  • Yeniay Ö (2005) Penalty function methods for constrained optimization with genetic algorithms. Math Comput Appl 10(1):45–56

  • Zha H, He X, Ding C, Gu M, Simon HD (2002) Spectral relaxation for k-means clustering. In: Advances in neural information processing systems, pp 1057–1064

Acknowledgements

The authors acknowledge the support of Consejo Nacional de Ciencia y Tecnología (CONACYT), Asociación Mexicana de Cultura, A.C., and Centro de Investigación y Estudios Avanzados (CINVESTAV).

Author information

Corresponding author

Correspondence to Edwin Aldana-Bobadilla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

Appendices

Appendix A: Experimental process for setting heuristic parameters

An important element in the performance of any heuristic is the way its parameters are set. Many attempts to solve this problem have been reported (Lobo et al. 2007; Grefenstette 1986; Storn and Price 1997; Talbi 2009). Most of them depend strongly on properties of the problem at hand, which prevents generalization of the suggested optimal values. We therefore conducted an experimental process to determine the parameter setting best suited to the characteristics of our problem. A parameter setting can be defined as an m-tuple of the form \(\tau =[p_1,p_2,\ldots ,p_{m}]\) comprising the parameters \(p_i\) needed to execute H (e.g., population size, crossover and mutation probabilities). Given a heuristic H, the experimental process is summarized as follows:

  1. The domain for each parameter \(p_i\) is defined in advance.

  2. A random set of possible instances of \(\tau \) is systematically generated (about 300 different instances). This set is denoted \({\mathcal {I}}\).

  3. From \({\mathcal {I}}\), an instance \(\tau _i\) is randomly chosen.

  4. A set of 32 clustering problems (from the reservoir described in “Appendix B”) is solved via Algorithm 1 using H with the parameters \(\tau _i\). From the solutions obtained with \(\tau _i\), an average value is computed and taken as the performance value \(F_i\).

  5. Steps 3–4 are repeated until \(i=300\).

  6. The instance that achieves the best value of \(F_i\) is selected as the most suitable setting.
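The random-search procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names, the parameter domains, and the synthetic `evaluate` function (standing in for the average validity-index value over the 32 clustering problems) are all assumptions.

```java
import java.util.Random;

public class ParameterTuning {

    // A parameter setting tau = [population size, crossover prob., mutation prob.].
    record ParamSetting(int populationSize, double crossoverProb, double mutationProb) {}

    // Placeholder performance measure (lower is better). In the paper this would be
    // the average index value obtained by solving 32 clustering problems with tau.
    static double evaluate(ParamSetting tau) {
        return Math.abs(tau.crossoverProb() - 0.9)
                + Math.abs(tau.mutationProb() - 0.05)
                + Math.abs(tau.populationSize() - 100) / 1000.0;
    }

    static ParamSetting bestSetting(int trials, long seed) {
        Random rng = new Random(seed);
        ParamSetting best = null;
        double bestF = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trials; i++) {
            // Steps 2-3: draw a random instance tau_i from the parameter domains
            // (the domains below are illustrative).
            ParamSetting tau = new ParamSetting(
                    20 + rng.nextInt(181),             // population size in [20, 200]
                    0.5 + 0.5 * rng.nextDouble(),      // crossover probability in [0.5, 1.0]
                    0.001 + 0.099 * rng.nextDouble()); // mutation probability in [0.001, 0.1]
            // Step 4: score the setting.
            double f = evaluate(tau);
            // Step 6: keep the best-performing instance seen so far.
            if (f < bestF) { bestF = f; best = tau; }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best setting: " + bestSetting(300, 42L));
    }
}
```

With 300 trials this is exactly the budget the appendix describes; the best instance returned plays the role of the setting reported in Table 1.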

The above process was executed with EGA using the indices SD, DB, and DD. We reformulated every index as an objective function to be minimized whose range is the interval [0, 1]. While EGA runs, we take a snapshot of the value of the index being optimized every 50 generations, which yields the performance value F throughout the adaptive process. Figure 7 shows the value of F obtained by the top 5 instances of \(\tau \) over 500 iterations.
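As an illustration of recasting a validity index as a bounded minimization objective, the sketch below computes the Davies–Bouldin index (Davies and Bouldin 1979) of a partition and maps it into [0, 1) via x/(1+x). The mapping is our assumption for the sketch; the paper does not specify how each index was normalized.

```java
import java.util.List;

public class DBIndex {
    // Euclidean distance between two points.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Davies-Bouldin index: lower values mean more compact, better-separated clusters.
    static double daviesBouldin(List<List<double[]>> clusters) {
        int k = clusters.size();
        double[][] centroids = new double[k][];
        double[] scatter = new double[k];
        for (int i = 0; i < k; i++) {
            List<double[]> c = clusters.get(i);
            int d = c.get(0).length;
            centroids[i] = new double[d];
            for (double[] p : c)
                for (int j = 0; j < d; j++) centroids[i][j] += p[j] / c.size();
            for (double[] p : c) scatter[i] += dist(p, centroids[i]) / c.size();
        }
        double sum = 0;
        for (int i = 0; i < k; i++) {
            double worst = 0;   // worst similarity ratio of cluster i to any other cluster
            for (int j = 0; j < k; j++) {
                if (i == j) continue;
                worst = Math.max(worst,
                        (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j]));
            }
            sum += worst;
        }
        return sum / k;
    }

    // Map the unbounded index into [0, 1) so it can serve as a minimization
    // objective with a fixed range (normalization choice is an assumption).
    static double boundedObjective(List<List<double[]>> clusters) {
        double db = daviesBouldin(clusters);
        return db / (1.0 + db);
    }

    public static void main(String[] args) {
        List<List<double[]>> clusters = List.of(
                List.of(new double[]{0, 0}, new double[]{0, 1}),
                List.of(new double[]{10, 10}, new double[]{10, 11}));
        System.out.println(boundedObjective(clusters)); // small value: tight, well-separated clusters
    }
}
```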

The best instance of \(\tau \) is the one with the fastest convergence and the best value of F. The values of this instance are shown in Table 1.

Appendix B: Generating clustering problems

The relative performance of a clustering method must refer to its ability, compared with other methods, to solve the same set of problems. Generalizing performance claims is possible only insofar as these problems represent a random sample of all clustering problems in a wide numerical space. We followed a systematic process to generate numerical datasets in this space. Each dataset contains elements grouped into k clusters, generated via a set of parametric functions as follows:

Let k, \(\aleph _i\), and \({\mathbb {F}}\) be the number of clusters, the size of the ith cluster, and a set of generator functions, respectively. A cluster is a set of d-dimensional vectors generated as follows:

  1. From \({\mathbb {F}}\), a subset of functions \(f_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\) (\(i=1,2,\ldots ,d\)) is randomly chosen.

  2. A vector of the form \(\mathbf {v}=[f_1(x_1),f_2(x_2),\ldots ,f_{d}(x_d)]\) is generated, where each \(x_i\) is drawn randomly from the domain of \(f_i\).

  3. Step 2 is repeated until \(\aleph _i\) vectors have been obtained.

This process is executed until k clusters have been obtained. The set \({\mathbb {F}}\) includes the functions reported in Pohlheim (2012) and Molga and Smutnicki (2005). The resulting reservoir includes clustering problems with \(k = 2,3,\ldots ,20\). The above process was implemented in Java, and the resulting data were stored in a relational database (MySQL).
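The generator described above can be sketched as follows. The specific function set F and the sampling interval for the arguments are illustrative assumptions; the paper draws its generator functions from the benchmark collections of Pohlheim (2012) and Molga and Smutnicki (2005).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

public class ClusterGenerator {

    // Illustrative stand-in for the generator set F.
    static final List<DoubleUnaryOperator> F = List.of(
            Math::sin, Math::cos, x -> x * x, Math::exp, Math::abs);

    // Generate one cluster of `size` d-dimensional vectors:
    // step 1 picks d generator functions, step 2 evaluates them at random
    // arguments, step 3 repeats until `size` vectors are produced.
    static List<double[]> generateCluster(int d, int size, Random rng) {
        List<DoubleUnaryOperator> fs = new ArrayList<>();
        for (int i = 0; i < d; i++) fs.add(F.get(rng.nextInt(F.size())));
        List<double[]> cluster = new ArrayList<>();
        for (int n = 0; n < size; n++) {
            double[] v = new double[d];
            for (int i = 0; i < d; i++) {
                double x = -5 + 10 * rng.nextDouble(); // random argument in [-5, 5] (assumed domain)
                v[i] = fs.get(i).applyAsDouble(x);
            }
            cluster.add(v);
        }
        return cluster;
    }

    // A full clustering problem is k such clusters.
    static List<List<double[]>> generateProblem(int k, int d, int size, Random rng) {
        List<List<double[]>> problem = new ArrayList<>();
        for (int i = 0; i < k; i++) problem.add(generateCluster(d, size, rng));
        return problem;
    }

    public static void main(String[] args) {
        List<List<double[]>> p = generateProblem(3, 2, 50, new Random(7));
        System.out.println(p.size() + " clusters of " + p.get(0).size() + " vectors each");
    }
}
```

Repeating `generateProblem` for \(k = 2,\ldots,20\) and storing the results would populate a reservoir of the kind the appendix describes.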

About this article

Cite this article

Aldana-Bobadilla, E., Kuri-Morales, A., Lopez-Arevalo, I. et al. An unsupervised learning approach for multilayer perceptron networks. Soft Comput 23, 11001–11013 (2019). https://doi.org/10.1007/s00500-018-3655-2

Keywords

  • Neural networks
  • Clustering
  • Unsupervised learning