1 Introduction

Nowadays, research on artificial neural networks (ANNs) is challenging, and ANNs are an emerging part of artificial intelligence (AI) [1,2,3,4,5,6]. An ANN is a technique that attempts to simulate the function of the human brain and implement it in machine intelligence. The human brain is essentially a network of neurons, synapses, axons, dendrites and other structures. The neurons (i.e., information processors) are interconnected through dendrites. Neurons and dendrites meet to form synapses, which are the roadways for passing information. A neuron receives electrochemical impulses through its synapses. If the total impulse received by the neuron exceeds a certain threshold value, the neuron fires and sends another impulse down its axon to other neurons. Synapses create the connections between an axon and other neurons. Therefore, a neuron receives a set of input impulses and emits another impulse, which depends on the total input it receives and its activation level. Information is processed through a large number of such neurons. Essentially, an ANN is a graph with a set of nodes and arcs [6]. A generalized view of the network structure and the neuron model is as follows (Fig. 1).

Fig. 1

A simple neuron model

Here, y is the output of the neuron and is defined as follows.

$$y = f\left( {\sum\limits_{i = 1}^{n} {w_{i} x_{i} } - \theta } \right)$$
(1)

where wi is the weight of input signal i, xi is input signal i (1 ≤ i ≤ n), θ is the threshold level and f(∙) is a non-linear function.

Each input is multiplied by a weight, which is analogous to a synaptic strength. The activation value is the weighted sum of the inputs ( ∑ wixi), and it is compared with a given threshold value. If this sum exceeds the threshold, the threshold element produces an output signal by applying the activation function to \((\sum\nolimits_{i = 1}^{n} {w_{i} x_{i} } - \theta )\). Note that the threshold function may be a sigmoid or a hyperbolic tangent function.
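To make the neuron model of Eq. (1) concrete, the following minimal Python sketch evaluates y = f(∑ wixi − θ) with a unit-step activation; the step function and the numeric weights, inputs and threshold are illustrative assumptions, not values taken from this paper.

def step(z):
    # Unit-step activation: the neuron fires (1) only when the net input is positive.
    return 1 if z > 0 else 0

def neuron_output(weights, inputs, theta, f=step):
    # Eq. (1): weighted sum of the inputs, shifted by the threshold, passed through f.
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return f(net)

# Illustrative example: two inputs with equal weights and a threshold of 0.5.
print(neuron_output(weights=[0.6, 0.6], inputs=[1, 0], theta=0.5))  # 1 (fires)
print(neuron_output(weights=[0.6, 0.6], inputs=[0, 0], theta=0.5))  # 0 (does not fire)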

The primary contribution of this paper is stated as follows.

  • A two-layer perceptron with a periodic non-monotonous activation function (referred to as PP) is developed to compute any Boolean function.

  • An efficient learning algorithm is proposed, and PP is compared with the multilayer perceptron. It is observed that PP gives better results than the multilayer perceptron.

  • PP is tested on realistic problems, namely the XOR function and the parity problem.

The paper is structured as follows. Section 2 discusses the background work. Section 3 presents the two-layer perceptron with a non-monotonous activation function. Section 4 presents the results and discussion. Finally, the conclusion is given in Sect. 5.

2 Related works

Figure 2 gives the structure of a neural network (NN), which is represented by a set of nodes and a set of arrows. The structure consists of three layers: the input, hidden and output layers. For the NN to function, the weights are first initialized, and the network is then made to learn using learning methods and rules [2,3,4,5,6,7,8,9,10,11,12,13,14,15]. The connection weights are adjusted during training; when training is completed, the weights are fixed at their final values. Note that learning in an NN corresponds to parameter changes, analogous to synaptic changes in the brain or nervous system. There are various learning rules for NNs, such as the simple Hebbian rule, the delta rule and the generalized delta rule.

Fig. 2

Neural information processor

The two popular modes of learning are supervised and unsupervised. A number of well-known NN models have been built, such as the perceptron, the multilayer perceptron, the adaptive resonance theory network and the Boltzmann machine. The ANN has gained immense popularity as a useful tool in modeling and simulation. It is a mathematical model inspired by the working of the biological brain and borrows heavily from the literature on brain and memory modeling. It comes in many flavors, but the most popular is the backpropagation model, which has become almost synonymous with the ANN. The most significant use of a backpropagation ANN is to create an approximate model of a system whose response to a large set of stimuli is known, without having to construct a mathematical model of any particular kind. A system that takes an N-dimensional stimulus vector X and produces an M-dimensional response vector Y yields a set of P stimulus–response pairs of the form {X, Y}. To model this system, we create a backpropagation network with an unknown weight attached to each connection and a non-linear transformation function on the internal nodes. Each internal node works on the following model.

$${\text{Output}} = f\left( {{\text{sum}}\,{\text{of}}\,{\text{inputs}}} \right)$$
(2)

where f is a highly non-linear function; the most popular choice is the sigmoid function. The weights are estimated in an iterative stage called training. The training problem is stated as follows.

Given a function y′ = g(W, X), where X is a stimulus vector, y′ is the response vector and W is the set of weights assigned to the internal connections, the task is to find the weights W that minimize the least-squares error (Er) over the P training pairs, as follows.

$$E(W) = Er = \sum\limits_{k = 1}^{p} {\sum\limits_{j = 1}^{M} {(y_{k,j}^{{\prime }} - y_{k,j} )^{2} } }$$
(3)
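As a small illustration of Eq. (3), the following sketch accumulates the squared differences between predicted and target responses over P stimulus–response pairs with M outputs each; the array values are purely illustrative.

def least_squares_error(y_pred, y_true):
    # E(W) of Eq. (3): sum of squared differences over all P pairs and all M outputs.
    return sum((yp - yt) ** 2
               for pred_row, true_row in zip(y_pred, y_true)
               for yp, yt in zip(pred_row, true_row))

# Illustrative P = 3 stimulus-response pairs with M = 2 outputs each.
y_true = [[0, 1], [1, 0], [1, 1]]
y_pred = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.7]]
print(least_squares_error(y_pred, y_true))  # approximately 0.2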

In Fig. 3, the inputs are denoted as {v1, v2,…, vn} and the weights are denoted as {w1, w2,…, wn}.

Fig. 3

A simple (McCulloch–Pitts) neuron

The total input of the neuron is calculated as follows.

$$x = \sum\limits_{i = 1}^{n} {v_{i} w_{i} }$$
(4)

Taking the threshold into account, the effective input of the neuron is as follows.

$$x = \sum\limits_{i = 1}^{n} {v_{i} w_{i} } - \theta$$
(5)

where θ is the threshold associated with this neuron. In addition, a transfer function f(x) provides a discrete or continuous output, as follows.

$$f(x) = \left\{ {\begin{array}{*{20}c} 0 & {{\text{if }}x \le 0} \\ 1 & {{\text{if }}x > 0} \\ \end{array} } \right\}$$
(6)

A perceptron using the following transfer function is referred to as Rosenblatt’s perceptron [7].

$$f(x) = \frac{1}{{(1 + e^{ - x} )}}$$
(7)
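The following sketch contrasts the discrete output of Eq. (6) with the continuous (sigmoid) output of Eq. (7), both applied to the thresholded input of Eq. (5); the numeric inputs, weights and threshold are illustrative assumptions.

import math

def net_input(v, w, theta):
    # Eq. (5): x = sum(v_i * w_i) - theta.
    return sum(vi * wi for vi, wi in zip(v, w)) - theta

def discrete_output(x):
    # Eq. (6): 0 if x <= 0, 1 if x > 0.
    return 1 if x > 0 else 0

def sigmoid_output(x):
    # Eq. (7): 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

x = net_input(v=[1, 1], w=[0.4, 0.3], theta=0.5)   # net input of 0.2
print(discrete_output(x))                           # 1
print(round(sigmoid_output(x), 3))                  # approximately 0.55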

Hornik et al. [16] have stated that a perceptron with a sufficiently large number of hidden units can approximate any type of function. However, finding an optimal solution remains a crucial problem, as addressed by Hinton [17]. Brady [18] has used a periodic activation function to study the convergence of a learning algorithm. Gioiello et al. [19] have used a multilayer perceptron for handwriting classification. Filliatre and Racca [20] have studied the PP for speech synthesis. Many related works have been presented in [21,22,23,24,25]. Hu et al. [21] have used two distributions, namely the Cauchy and Laplace distributions, and one error function, namely the Gaussian, to generate novel activation functions. Moreover, they have compared three functions, namely the sigmoid, hyperbolic tangent and normal distribution functions. Fawaz et al. [23] have focused on binary neural networks and presented the usefulness of quantum amplitude amplification. Godfrey [25] has stated that most works in the literature rely on one or two activation functions throughout the network. Consequently, they have studied various heterogeneous activation functions and their possible applications.

3 Two-layer perceptron with non-monotonous activation function

Let Ni be the ith neuron, receiving input signals {s1, s2,…, sn}. Let Ii be the total stimulus (input) and Oi be the output, which are expressed mathematically as follows.

$$I_{i} = \sum\limits_{j = 1}^{n} {s_{j} } \,\,{\text{and}}\,\,O_{i} = f\left( {I_{i} } \right)$$
(8)

where f denotes the activation function (Fig. 4).

Fig. 4

Output signal of the input given to the neuron

Let us consider three neurons (Fig. 5), with two neurons in the first layer and one neuron in the second layer. The inputs and outputs of the neurons are binary (i.e., 0 or 1). The activations of the two first-layer neurons, N1 and N2, are set equal to their excitations, that is, O1 = I1 and O2 = I2. The activity of neuron N3 is given as follows.

Fig. 5

The network of PP with three neurons

$$O_{3} = C_{r} \left( {W_{1,3} O_{1} + W_{2,3} O_{2} } \right)$$
(9)

where Cr is the crenel function, which is defined as follows.

$$C_{r} (x) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if }}T_{1} \le x \le T_{2} } \hfill \\ 0 \hfill & {\text{Otherwise}} \hfill \\ \end{array} } \right\}$$
(10)

where T1 and T2 are the lower and upper thresholds of the crenel function.

The crenel function of the PP is given as follows (Fig. 6).

Fig. 6

The crenel function of PP

The weights W1,3 and W2,3 can be taken as follows.

$$W_{1,3} = W_{2,3} = \frac{{T_{1} + T_{2} }}{2}$$
(11)

The XOR function calculation by PP is shown in Table 1.

Table 1 XOR function calculation by PP
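To illustrate how the three-neuron network of Fig. 5 realizes the XOR truth table of Table 1, the following minimal sketch applies the crenel function of Eq. (10) and the weight assignment of Eq. (11); the particular thresholds T1 and T2 below are illustrative assumptions (any 0 < T1 ≤ T2 reproduces the same truth table).

def crenel(x, t1, t2):
    # Eq. (10): 1 if T1 <= x <= T2, 0 otherwise.
    return 1 if t1 <= x <= t2 else 0

def pp_output(o1, o2, t1, t2):
    # Eq. (9) with the weights of Eq. (11): W13 = W23 = (T1 + T2) / 2.
    w = (t1 + t2) / 2
    return crenel(w * o1 + w * o2, t1, t2)

T1, T2 = 0.5, 1.5          # illustrative thresholds
for o1 in (0, 1):
    for o2 in (0, 1):
        print(o1, o2, pp_output(o1, o2, T1, T2))
# Prints 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0, i.e., the XOR function.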

We can adopt the following rule for changing the periodicity of the activation function.

Let \(f_{r}^{k} (x)\) be the activation function with period 2k, where \(f_{r}^{k} (x) = f_{r} (\frac{x}{k})\) with weight matrix Wk. Based on these facts, the following theorem is proved.
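As an illustration of this rescaling rule, the sketch below assumes (for this example only) that f_r is a period-2 activation obtained by repeating the crenel of Eq. (10) every two units, and derives f_r^k by dividing the argument by k.

def f_r(x, t1=0.5, t2=1.5):
    # Assumed period-2 activation: the crenel of Eq. (10) repeated every 2 units.
    return 1 if t1 <= (x % 2.0) <= t2 else 0

def f_r_k(x, k):
    # Activation with period 2k obtained by rescaling the argument: f_r^k(x) = f_r(x / k).
    return f_r(x / k)

print(f_r(1.0), f_r(3.0))                # 1 1: values 2 units apart agree (period 2)
print(f_r_k(2.0, k=2), f_r_k(6.0, k=2))  # 1 1: values 4 units apart agree (period 4)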

Theorem 1

Every Boolean function can be computed using a periodic perceptron with three neurons.

Proof

Consider the network of the PP with three neurons as shown in Fig. 5, where the output neuron (i.e., N3) is equipped with the activation function fr. To compute the Boolean function \(\phi\), the following values need to be realized.

$$\phi (0,0) = f_{r} (0)$$
(12)
$$\phi (1,0) = f_{r} (W_{1,3} )$$
(13)
$$\phi (0,1) = f_{r} (W_{2,3} )$$
(14)
$$\phi (1,1) = f_{r} (W_{1,3} + W_{2,3} )$$
(15)

Here, r is determined as \(\left\{ {\begin{array}{*{20}c} {r = 0} & {{\text{if }}\phi (0,0) = 0} \\ {r = 1} & {\text{Otherwise}} \\ \end{array} } \right\}\).□

The weights W1,3 and W2,3 are selected randomly so as to satisfy Eqs. (13) and (14). As fr is periodic (with period 2), there exists an interval of length l around W1,3 in which Eqs. (13) and (14) are fulfilled. If [x1, x2] is the interval in which W1,3 satisfies Eq. (13) and [x3, x4] is the interval in which W2,3 satisfies Eq. (14), then, as W1,3 and W2,3 sweep through these intervals, their sum sweeps through an interval of length 2 ([x1 + x3, x2 + x4]) in which fr (W1,3 + W2,3) equals 0 and 1 alternately.

The learning algorithm for PP is shown in Table 2; it is adapted from the delta learning method.

Table 2 Learning algorithm for PP
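Table 2 itself is not reproduced here. As a point of reference, the sketch below shows the generic delta-style (perceptron) update on which such an algorithm can be based, applied to a standard step-activation neuron learning the AND function; the learning rate, initialization and stopping rule are illustrative assumptions, and the PP-specific elements of Table 2 are not modeled.

def step(x):
    return 1 if x > 0 else 0

def train_delta(samples, lr=0.1, epochs=100):
    # Generic delta-style update: move each weight in proportion to the output error.
    # samples is a list of ((v1, v2), target) pairs; all constants are illustrative.
    w = [0.0, 0.0]
    theta = 0.0
    for _ in range(epochs):
        total_error = 0
        for (v1, v2), target in samples:
            out = step(w[0] * v1 + w[1] * v2 - theta)
            err = target - out
            w[0] += lr * err * v1
            w[1] += lr * err * v2
            theta -= lr * err          # the threshold moves opposite to the weights
            total_error += abs(err)
        if total_error == 0:           # stop once every sample is reproduced correctly
            break
    return w, theta

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_delta(and_samples))        # weights and threshold realizing the AND function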

4 Results and discussion

In this section, the performance of the learning algorithm for PP is evaluated on the task of computing Boolean functions. The algorithm is tested on the XOR function and the parity problem.

  1. The XOR function: In this problem, we train the PP with three neurons and compute the XOR function.

  2. The parity problem: Let A be a set of n-bit vectors. The set is split into A0 and A1, where A0 contains the vectors with an odd number of 0’s and A1 contains the others (see the sketch below).
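A minimal sketch of this split, assuming an illustrative bit-width of n = 3, is as follows.

from itertools import product

def parity_split(n):
    # Partition all n-bit vectors into A0 (odd number of 0's) and A1 (the rest).
    a0, a1 = [], []
    for bits in product((0, 1), repeat=n):
        (a0 if bits.count(0) % 2 == 1 else a1).append(bits)
    return a0, a1

a0, a1 = parity_split(3)
print(len(a0), len(a1))   # 4 4
print(a0[0])              # (0, 0, 0): three 0's, an odd count, so it belongs to A0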

At each time instant t, let a be the learning rate and r the correction factor, where 0 ≤ r ≤ 1. The test results for the activation function with the delta learning rule are shown in Table 3. The periodic perceptron is used in such a way that the remaining hidden layer gives the same output. The algorithm is efficient at finding the Boolean function.

Table 3 Test results

5 Conclusion

In this paper, we have shown that a two-layer perceptron with a periodic non-monotonous activation function can compute any Boolean function. An efficient learning algorithm for the periodic perceptron has been proposed and tested on two realistic problems, namely the XOR function and the parity problem. The performance of PP has been compared with the multilayer perceptron, and it has been observed that PP gives better results than the multilayer perceptron. In future, this work can be extended by incorporating deep neural network (DNN) and/or convolutional neural network (CNN) concepts to analyze the error.