## Abstract

In this paper, we present a comparative study of two neural network models on a very simple type of classification problem in 2D. The first model is a deep neural network and the second is a dendrite morphological neuron. The metrics compared are training time, classification accuracy and number of learning parameters. We also compare the decision boundaries generated by both models. The experiments show that the dendrite morphological neurons surpass the deep neural networks by a wide margin, achieving higher accuracies with fewer parameters. From this, we raise the hypothesis that deep learning networks can be improved by adding morphological neurons.

## 1 Introduction

In the area of Artificial Intelligence there is a great diversity of algorithms for pattern classification. One of the most important is the Multi-Layer Perceptron (MLP), which through a training process adjusts the hyperplanes of each neuron in each layer to separate the classes of a dataset [22, 25, 26]. The training is often based on gradient descent and back-propagation [22]. Since its appearance in 1961 [25], this model has been widely used in the area of pattern recognition. However, there are other classification algorithms, such as Dendrite Morphological Neurons (DMNs), whose training algorithm is completely different from back-propagation [22]: they do not try to approximate a hyperplane through an iterative training process that analyzes each sample of the training set. Instead, this type of neuron analyzes the samples as a complete set and, based on lattice operations, generates hyperboxes. In this way, DMNs are able to classify the different classes of the training set with a higher classification rate.

The success of Deep Neural Networks (DNN) is well known for recognizing objects in images [12] and speech in audio [9]. The mathematical operations employed in these neurons remain the same as those of an MLP [22]: sums, multiplications and some well-known non-linear functions. Furthermore, convolutions are used for reducing the number of learning parameters [13]. So, the novelty of the last 10 years has focused on more computing power, more layers, more data and dropout [14, 23]. The last two help to avoid the overfitting in deep models that previously prevented MLPs from giving better results than Support Vector Machines (SVM) [3]. It is important to note that these developments are not related to the mathematical structure. This leads us to ask whether there are other mathematical operations that can improve the recognition performance. In this paper, we start a research project in that direction. In particular, we compare DNNs with DMNs for a specific type of problem: multi-class spirals with several loops in 2D. Even though this classification problem is artificial, it is useful for studying the essential properties of the two models. A first analysis was published in [27]; here we extend the analysis to deeper models and more classes. As classification tools, both models are compared in terms of classification percentage and training time, which depend directly on the number of parameters that constitute the model.

The rest of the paper is organized as follows. Section 2 provides a brief description of works that have proposed a different mathematical structure from the mainstream of neural networks. Sections 3 and 4 present the architecture of DNNs and DMNs, respectively. Section 5 discusses the experimental results. Then, in Sect. 6 we give our conclusions and future work.

## 2 Previous Work

Currently, there are few studies aimed at improving the mathematical structure of deep neural networks. However, before the term “deep learning” was born, we could find several papers with interesting proposals. Pessoa and Maragos [18] combined linear with rank filters. This architecture has shown that it can recognize digits in images, generating similar or better results compared to classical MLPs in shorter training times. Ivakhnenko [11] proposed a multilayer network of polynomials to approximate the decision boundary for classification problems. This was the first deep learning model published in the literature. Durbin and Rumelhart [4] introduced product units into neural networks. These units add complexity to the model in order to use fewer layers. Other mathematical structures have been proposed, such as higher-order neural networks (NNs) [5], sigma-pi NNs [8], second-order NNs [17], functionally expanded NNs [10], wavelet NNs [29] and Bayesian NNs [16]. Glorot et al. investigated more effective ways of training very deep neural networks using ReLUs as activation functions, achieving results comparable to the state of the art [6]. Bengio [2] argues that in order to learn complex functions through training by gradient descent, it is necessary to use deep architectures. In [1], Bengio et al. also analyze and consider alternatives to training by standard gradient descent, due to the trade-off between efficient learning and latching on information. In this paper, we compare the performance of the DNNs with that of the DMNs to show some limitations of the DNNs and how morphological operations could improve deep learning.

## 3 Deep Neural Networks

“A deep learning architecture is a multilayered stack of simple modules with multiple non-linear layers” [14] (usually between 5 and 20 layers). Each layer contains \(n_{i}\) modules, where *i* is the layer number, and each module is a neuron with some activation function such as sigmoid or tanh. So an MLP, and its generalization a DNN, is defined by a set of neurons divided into layers: an input layer, one or more intermediate layers and an output layer. Thus, the DNN architectures constructed to classify the datasets are neural networks with *i* intermediate layers and \(n_{i}\) neurons per layer, where the numbers of neurons \(n_{i-1}\) and \(n_{i}\) in consecutive layers are not necessarily the same. In our experiments we used the Rectified Linear Unit (ReLU), due to its better results in DNNs according to [6, 14, 15], so that a neuron is defined by:

\[ f\left( x\right) =\max \left( 0,\,w^{T}x\right) \]

where *x* is the input vector of *N* dimensions and *w* is the weight vector that multiplies the input vector. In the output layer, the activation function is replaced by a softmax, which is commonly used to predict the probabilities associated with a multinoulli distribution [7], and which is defined by

\[ \mathrm{softmax}\left( z\right) _{j}=\frac{\exp \left( z_{j}\right) }{\sum _{k}\exp \left( z_{k}\right) } \]

where *z* is the vector of inputs to the output layer units.

The general DNN architecture is shown in Fig. 1. It is also common practice to vary the number of neurons contained in each layer of the DNN. The training method used for the DNN is Nesterov gradient descent with a mini-batch size of 64 and a momentum of 0.9, which helps to achieve a more stable and faster convergence.
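
The two activation functions described in this section can be illustrated with a minimal Python sketch. It assumes a bias-free neuron (only *x* and *w* are named in the text) and uses the standard softmax definition; the function names `relu_neuron` and `softmax` are illustrative and do not come from the original implementation.

```python
import math


def relu_neuron(x, w):
    """Output of a single ReLU neuron: max(0, w . x).

    Assumption: no bias term, since the text only names x and w.
    """
    pre_activation = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, pre_activation)


def softmax(z):
    """Softmax over a list of output-layer scores.

    The maximum is subtracted before exponentiating for numerical
    stability; this does not change the result.
    """
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(relu_neuron([1.0, 2.0], [1.0, 1.0]))
    print(softmax([1.0, 2.0, 3.0]))
```

The softmax outputs are non-negative and sum to one, so they can be read as class probabilities for the multi-class spiral problem.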

## 4 Dendrite Morphological Neurons

A DMN segments the input space into hyperboxes of *N* dimensions. The output *y* of a neuron is a scalar given by

\[ y=\arg \max _{k}\left( \max _{n}\left( d_{n,k}\right) \right) \]

where *n* is the dendrite number, *k* is the class number, and \(d_{n,k}\) is the scalar output of a dendrite given by

\[ d_{n,k}=\min \left( \min \left( x-w_{min},\,w_{max}-x\right) \right) \]

where *x* is the input vector, and \(w_{min}\) and \(w_{max}\) are dendrite weight vectors. The min operations together check whether *x* is inside the hyperbox that has \(w_{min}\) and \(w_{max}\) as its extreme points (see Fig. 2). If \(d_{n,k}>0\), *x* is inside the hyperbox; if \(d_{n,k}=0\), *x* is somewhere on the hyperbox boundary; otherwise, it is outside. A good property of DMNs is that they can create complex non-linear decision boundaries that separate classes with only one neuron [20, 21]. The reader can consult [28] for more information.
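
The hyperbox test can be sketched in a few lines of Python. The sketch assumes that a dendrite response is the minimum, over all dimensions, of `min(x - w_min, w_max - x)`, and that the neuron outputs the class of the dendrite with the largest response. The function names and the `(class, w_min, w_max)` tuple layout are hypothetical choices for illustration.

```python
def dendrite_output(x, w_min, w_max):
    """Response of one dendrite (hyperbox) to the input x.

    Positive: x is inside the box; zero: x is on the boundary;
    negative: x is outside.
    """
    return min(min(xi - lo, hi - xi) for xi, lo, hi in zip(x, w_min, w_max))


def dmn_classify(x, dendrites):
    """Return the class of the dendrite with the largest response.

    dendrites: list of (class_label, w_min, w_max) tuples
    (a hypothetical layout chosen for this sketch).
    """
    best_class = None
    best_response = float("-inf")
    for class_label, w_min, w_max in dendrites:
        response = dendrite_output(x, w_min, w_max)
        if response > best_response:
            best_class, best_response = class_label, response
    return best_class


if __name__ == "__main__":
    boxes = [
        (0, [0.0, 0.0], [1.0, 1.0]),  # class 0 occupies the unit square
        (1, [2.0, 0.0], [3.0, 1.0]),  # class 1 occupies a box to its right
    ]
    print(dmn_classify([0.5, 0.5], boxes))
    print(dmn_classify([2.5, 0.5], boxes))
```

Because the class is chosen by the largest response, a point that lies outside every hyperbox is still assigned to the class of the nearest box in this sketch.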

The training goal is to determine the number of hyperboxes and their weights needed to classify an input pattern. The regularized divide and conquer training method [28] consists of only two steps. The algorithm begins by opening an initial hyperbox \(H_{0}\) that encloses all the samples with a margin distance *M* with respect to each side of \(H_{0}\), for better noise tolerance. Next, the divide and conquer strategy is executed recursively. The algorithm chooses a training sample *x* and generates a sub-hyperbox \(H_{sub}\) around it. It then extracts from \(\left( X,T\right) \) the samples \(\left( X_{H_{sub}},T_{H_{sub}}\right) \) that are enclosed in \(H_{sub}\), where *X* is the set of training samples represented as a matrix \(X\in \mathfrak {R}^{N\times Q_{train}}\), \(Q_{train}\) is the number of training samples, and the target class of each sample is contained in the vector \(T\in \mathfrak {R}^{1\times Q_{train}}\). The recursion divides \(H_{0}\) until the error rate \(E_{\%}\) in the hyperbox *H* is less than or equal to the hyper-parameter \(E_{0}\). The error rate is defined as \(E_{\%}=\frac{\left| X_{mode}\right| }{\left| X\right| }\), where \(X_{mode}\) is the set of samples of the most repeated training class [19]. At the end of the recursion process, the deepest hyperbox is assigned to the ruling class, which is set to the statistical mode of *T*. The recursive closing procedure is executed by appending all generated sub-hyperboxes with their corresponding classes. Hyperboxes with a common hyperface are joined. A complete description of this training method can be found in [24, 28].
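
Three building blocks of this procedure can be sketched in Python: opening the initial hyperbox, extracting the samples enclosed by a hyperbox, and picking the ruling class. This is not the full recursive algorithm of [28]. It assumes the margin *M* is an absolute distance added on every side, and all function names are hypothetical.

```python
from collections import Counter


def initial_hyperbox(X, margin):
    """Open a box enclosing every sample, enlarged by `margin` per side.

    X: list of N-dimensional samples (tuples or lists).
    Assumption: the margin is an absolute distance, not a ratio.
    """
    dims = range(len(X[0]))
    w_min = [min(x[d] for x in X) - margin for d in dims]
    w_max = [max(x[d] for x in X) + margin for d in dims]
    return w_min, w_max


def samples_inside(X, T, w_min, w_max):
    """Extract the samples and targets enclosed by the box."""
    X_sub, T_sub = [], []
    for x, t in zip(X, T):
        if all(lo <= xi <= hi for xi, lo, hi in zip(x, w_min, w_max)):
            X_sub.append(x)
            T_sub.append(t)
    return X_sub, T_sub


def ruling_class(T):
    """Statistical mode of the target labels inside a box."""
    return Counter(T).most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.2, 0.1), (2.0, 1.0)]
    T = [0, 0, 1]
    print(initial_hyperbox(X, 0.5))
    print(samples_inside(X, T, [-0.5, -0.5], [0.5, 0.5]))
    print(ruling_class(T))
```

A recursive trainer would call `samples_inside` on each sub-hyperbox and stop dividing once the stopping criterion on \(E_{\%}\) is met, then label the box with `ruling_class`.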

## 5 Experiments

The experiments were designed with the aim of comparing the performance of the two neural networks, taking as a starting point the same training set. The aspects evaluated are the classification accuracy in the validation set, the training time, the number of parameters necessary for the network to correctly classify the training set, and the decision boundaries.

### 5.1 Spiral Datasets

The training sets consist of synthetic data designed to test the ability of the two types of neural networks to separate entangled classes; that is, the synthetic data are generated with a high degree of entanglement and a low degree of overlap between classes. For this purpose, each generated spiral dataset consists of 1 to 5 classes wrapped around one another, and the number of turns varies between 1 and 10. The training sets are depicted in Fig. 3, and their composition is given in Table 1.
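
The exact spiral generator is not given in the text, so the following Python sketch is only one plausible construction. It assumes Archimedean spirals (radius growing linearly with the angle), one arm per class with the arms evenly rotated around the origin, and optional Gaussian noise. The function name `make_spirals` and all of its parameters are hypothetical.

```python
import math
import random


def make_spirals(n_classes, n_loops, points_per_class=200, noise=0.0, seed=0):
    """Generate interleaved 2D spirals, one arm per class.

    Assumptions (not taken from the paper): Archimedean arms with
    radius in (0, 1], arms rotated by 2*pi/n_classes from each other,
    and isotropic Gaussian noise with standard deviation `noise`.
    """
    rng = random.Random(seed)
    X, T = [], []
    for k in range(n_classes):
        offset = 2.0 * math.pi * k / n_classes
        for i in range(points_per_class):
            t = (i + 1) / points_per_class        # position along the arm
            angle = 2.0 * math.pi * n_loops * t   # n_loops full turns
            x = t * math.cos(angle + offset) + rng.gauss(0.0, noise)
            y = t * math.sin(angle + offset) + rng.gauss(0.0, noise)
            X.append((x, y))
            T.append(k)
    return X, T


if __name__ == "__main__":
    X, T = make_spirals(n_classes=3, n_loops=2, points_per_class=5)
    for point, label in zip(X, T):
        print(label, point)
```

Increasing `n_loops` tightens the arms around one another, which raises the entanglement between classes while keeping their overlap low as long as `noise` is small.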

### 5.2 Experimental Results for DNNs

To classify the patterns presented in Sect. 5.1, the DNN architecture is varied in both the number of hidden layers and the number of neurons per layer, while the hyper-parameters are fixed to the following values: learning rate of 0.1, Nesterov momentum of 0.9 and batch size of 64. These hyper-parameter values were obtained by performing classification tests in which the learning rate was varied in the range \(\left[ 0.001,1\right] \) with increments of 0.01. Table 2 summarizes the resulting architectures applied to each training set: the column “Dataset” specifies the number of the training set used, column \(N_{p}\) specifies the number of parameters in the neural network model, column \(T_{a}\) specifies the classification percentage on the training set, column \(V_{a}\) shows the classification percentage on the validation set obtained by that neural network model, and column \(T_{t}\) shows the total training and validation time. Figure 5 shows the classification accuracies of each neural network for each number of classes and number of loops of the training sets; the DMN models obtain better results than the DNN models.
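
To make the parameter count \(N_{p}\) concrete, the sketch below counts the learnable parameters of a fully connected network and of a set of hyperboxes. How the paper counts parameters is not stated, so the inclusion of biases is an assumption (switchable through `use_bias`), and both function names are hypothetical. The hyperbox count follows from each dendrite holding two *N*-dimensional weight vectors, \(w_{min}\) and \(w_{max}\).

```python
def count_dnn_parameters(layer_sizes, use_bias=True):
    """Learnable parameters of a fully connected network.

    layer_sizes: neurons per layer, input and output included,
    for example [2, 10, 10, 3] for a 2D input and 3 classes.
    Assumption: one bias per neuron when use_bias is True.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out      # weights between consecutive layers
        if use_bias:
            total += n_out         # one bias per neuron
    return total


def count_dmn_parameters(n_hyperboxes, n_dims):
    """Each hyperbox stores w_min and w_max, both of n_dims entries."""
    return 2 * n_hyperboxes * n_dims


if __name__ == "__main__":
    print(count_dnn_parameters([2, 10, 10, 3]))
    print(count_dmn_parameters(n_hyperboxes=4, n_dims=2))
```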

### 5.3 Experimental Results for DMNs

As in Sect. 5.2, Table 3 presents the architecture of the DMN; the first column shows the number of the training set used and the third column \(G_{i}\) shows the generalization index of the DMN.

### 5.4 Decision Boundaries

This section compares the decision boundaries generated by the two types of neural network architectures (DNN and DMN) on the same training sets specified in Sect. 5.1. As we observe, the nature of each algorithm is very different: one generates approximations by hyperplanes and the other by hyperboxes, which yield similar results. However, for the specific dataset used, we can observe that hyperboxes of variable size model the training set better, with a higher classification rate and fewer parameters in the DMN model. These results can be observed in Fig. 4. Each pair of images, grouped by column, shows the decision boundary generated by the DMN (top) and by the DNN (bottom). As can be seen in column (b), the decision boundaries generated by the DMN (column (b), top) are better defined than those generated by the DNN (column (b), bottom).

## 6 Conclusion and Future Work

Linear filters with non-linear activation functions (and back-propagation) are today the workhorses of the neural network community. This leads us to ask the questions: Are there other mathematical structures that produce better results for some problems? What advantages would they have? The motivation of this research is to answer these questions. In this paper, we compare DNNs and DMNs on a very simple 2D classification problem: multi-class spirals with an increasing number of loops. We show that the performance of the DMNs surpasses that of the DNNs, with higher accuracies and fewer learning parameters. Of course, these results are limited to spiral-like problems, which we specifically designed to test the separation ability of the two neural architectures. The DMN training time is clearly longer than the DNN training time, but the classification rate is not compromised; that is, the DNNs can be trained in a shorter time, but their validation accuracy is much lower than that obtained by the DMNs.

We conclude that this result is due to the nature of both algorithms. The hyperboxes of DMNs make better models for these types of datasets because the divide and conquer training is based on a geometrical interpretation of the whole data and refines the model at each recursion step, while training based on gradient descent is a search method in a dark environment, guided only by partial dataset information and local information about the cost function. From this, we raise the hypothesis that deep learning networks can be improved by adding morphological neurons. This is a consideration for future research.

## References

1. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. **5**(2), 157–166 (1994)
2. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. **2**(1), 1–127 (2009)
3. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. **20**(3), 273–297 (1995)
4. Durbin, R., Rumelhart, D.E.: Product units: a computationally powerful and biologically plausible extension to backpropagation networks. Neural Comput. **1**(1), 133–142 (1989)
5. Giles, C.L., Maxwell, T.: Learning, invariance, and generalization in high-order neural networks. Appl. Opt. **26**(23), 4972–4978 (1987)
6. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Gordon, G.J., Dunson, D.B., Dudik, M. (eds.), AISTATS, vol. 15. JMLR Proceedings, pp. 315–323 (2011). JMLR.org
7. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
8. Gurney, K.N.: Training nets of hardware realizable sigma-pi units. Neural Networks **5**(2), 289–303 (1992)
9. Hinton, G., Deng, L., Dong, Y., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. **29**(6), 82–97 (2012)
10. Hussain, A.: A new neural network structure for temporal signal processing. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1997, Munich, Germany, 21–24 April, pp. 3341–3344 (1997)
11. Ivakhnenko, A.G.: Polynomial theory of complex systems. IEEE Trans. Syst. Man Cybern. **SMC–1**(4), 364–378 (1971)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
13. LeCun, Y., Bengio, Y.: The handbook of brain theory and neural networks. In: Convolutional Networks for Images, Speech, and Time Series, pp. 255–258. MIT Press, Cambridge (1998)
14. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature **521**(7553), 436–444 (2015)
15. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient backprop. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998). doi:10.1007/3-540-49430-8_2
16. MacKay, D.J.C.: A practical bayesian framework for backpropagation networks. Neural Comput. **4**(3), 448–472 (1992)
17. Milenkovic, Z., Obradovic, S., Litovski, V.: Annealing based dynamic learning in second-order neural networks. In: IEEE International Conference on Neural Networks, vol. 1, pp. 458–463. IEEE (1996)
18. Pessoa, L.F.C., Maragos, P.: Neural networks with hybrid morphological/rank/linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recogn. **33**(6), 945–960 (2000)
19. Ritter, G.X., Iancu, L., Urcid, G.: Morphological perceptrons with dendritic structure. In: The 12th IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2003, St. Louis, Missouri, USA, 25–28 May 2003, pp. 1296–1301 (2003)
20. Ritter, G.X., Urcid, G.: Lattice algebra approach to single-neuron computation. IEEE Trans. Neural Networks **14**(2), 282–295 (2003)
21. Ritter, G.X., Urcid, G.: Learning in lattice neural networks that employ dendritic computing. In: Kaburlasos, V.G., Ritter, G.X. (eds.) Computational Intelligence Based on Lattice Theory. SCI, vol. 67, pp. 25–44. Springer, Heidelberg (2007)
22. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Parallel distributed processing: Explorations in the microstructure of cognition. In: Learning Internal Representations by Error Propagation, vol. 1, pp. 318–362. MIT Press, Cambridge (1986)
23. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Networks **61**, 85–117 (2015)
24. Sossa, H., Guevara, E.: Efficient training for dendrite morphological neural networks. Neurocomputing **131**, 132–142 (2014)
25. Van Der Malsburg, C.: Frank Rosenblatt: principles of neurodynamics: perceptrons and the theory of brain mechanisms. In: Palm, G., Aertsen, A. (eds.) Brain Theory. Springer, Heidelberg (1986)
26. Wasserman, P.D., Schwartz, T.J.: Neural networks. II. What are they and why is everybody so interested in them now? IEEE Expert **3**(1), 10–15 (1988)
27. Zamora, E., Sossa, H.: Dendrite morphological neurons trained by stochastic gradient descent. In: IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8, December 2016
28. Zamora, E., Sossa, H.: Regularized divide and conquer training for dendrite morphological neurons. In: Mechatronics and Robotics Service: Theory and Applications, Mexican Mechatronics Association, November 2016
29. Zhang, Q., Benveniste, A.: Wavelet networks. IEEE Trans. Neural Networks **3**(6), 889–898 (1992)

## Acknowledgments

E. Zamora and H. Sossa would like to acknowledge the support provided by UPIITA-IPN and CIC-IPN in carrying out this research. This work was economically supported by SIP-IPN (grant numbers 20170836 and 20170693), and CONACYT grant number 65 (Frontiers of Science). G. Hernández acknowledges CONACYT for the scholarship granted towards pursuing his PhD studies.

## Copyright information

© 2017 Springer International Publishing AG

## About this paper

### Cite this paper

Hernández, G., Zamora, E., Sossa, H. (2017). Comparing Deep and Dendrite Neural Networks: A Case Study. In: Carrasco-Ochoa, J., Martínez-Trinidad, J., Olvera-López, J. (eds) Pattern Recognition. MCPR 2017. Lecture Notes in Computer Science, vol 10267. Springer, Cham. https://doi.org/10.1007/978-3-319-59226-8_4

DOI: https://doi.org/10.1007/978-3-319-59226-8_4

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-59225-1

Online ISBN: 978-3-319-59226-8

eBook Packages: Computer Science, Computer Science (R0)