
Global stability of fuzzy cognitive maps


Complex systems can be effectively modelled by fuzzy cognitive maps. Fuzzy cognitive maps (FCMs) are network-based models, where the connections in the network represent causal relations. The conclusion about the system is based on the limit of the iteratively applied updating process. This iteration may or may not reach an equilibrium state (fixed point). Moreover, if the model is globally asymptotically stable, then this fixed point is unique and the iteration converges to it from every initial state. In some FCM models, global stability is the required property, but in many FCM applications the preferred scenario is not global stability but multiple fixed points. Global stability bounds are useful in both cases: they may give a hint about which parameter set should be preferred or avoided. In this article, we present novel conditions for the global asymptotic stability of FCMs, i.e. conditions under which the iteration leads to the same point from every initial vector. Furthermore, we show that the results presented here outperform the results known from the current literature.


In [1], Axelrod used a directed graph to describe the connections between political elites. This modelling technique was extended by Kosko [23, 24], who introduced Fuzzy Cognitive Maps (FCMs) by representing the strength of the causal connections with values from the \([-1,1]\) interval. The nodes of the graph represent the main subsystems or system variables, while the weighted, directed edges express the causal knowledge [39]. Over the years, this modelling method proved to be very efficient in the representation of complex multicomponent systems, especially when the exact mathematical description was unknown, extremely complicated and thus difficult to deal with, or influenced by uncertain information. Successful applications of FCMs show a very diverse, colourful picture, including, but not limited to, social sciences [5], economic problems [9, 26], educational applications [18], various decision-making problems and risk analysis [10, 34, 37], waste management [4, 12], medical problems [33], and time series modelling and analysis [16]. The diversity of the fields where FCMs have been applied with success clearly demonstrates the flexibility and performance of this modelling paradigm. In the FCM terminology, the nodes of the weighted, directed graph are usually called ‘concepts’. These represent special characteristics or subsystems of the modelled system. Activation values are numbers from the unit interval (but sometimes the \([-1,1]\) interval is applied) assigned to the concepts to describe their state. The initial activation vector usually changes rapidly in the simulation. A simulation always ends with one of three possible outcomes [41]: (i) the value of the activation vector stabilizes, i.e. the iteration arrives at a fixed point (FP); (ii) the iteration reaches a limit cycle, i.e. a set of activation vectors appears repeatedly in a specific order; or (iii) the system shows no stable or regular behaviour, which is usually called chaotic in the FCM literature.

In most decision-making applications, ‘what-if’ questions are answered with the help of FCMs and simulations [20]. The simulation is started with a specific scenario (expressing the assumed, studied circumstances), and in the best case the simulation leads to an FP. In these cases, the effect and usefulness of a decision can be easily analysed. Limit cycles express a continuously changing state of the system, but at least these states are known and can be examined. Chaotic behaviour, however, should be avoided in most application areas.

FCM models may have a single FP or even multiple FPs; moreover, fixed points are not always stable. In some systems, it is important to know all possible stable states, e.g. in the case of a safety-critical system, where a significant amount of investment may be damaged or people may even be injured. The FPs of an FCM are usually explored empirically by a series of simulations [15], performed with a large number of different scenarios. Unfortunately, the minimum number of scenarios required is not known exactly. As Dickerson and Kosko pointed out in [8], the state space of an FCM contains attractor regions, and all simulations started from any point of the same region lead to the same FP. Thus, theoretically, one scenario per FP region would be enough, but initially the boundaries of those regions are unknown to the decision makers. In order to explore all FPs with high confidence, a tremendous number of scenarios have to be evaluated. Here, a practical problem arises: although the computational power required to perform a single simulation is not significant (it involves only a matrix multiplication and a relatively simple threshold function evaluation; moreover, FCMs usually converge to FPs quickly), a high number of repeated simulations may need very long execution times. The situation is no better if one wants to find scenarios leading to a specific stable state (e.g. to be aware of which initial system configurations should be avoided). If this goal-oriented decision support problem is solved with a population-based evolutionary algorithm, like in [19], it also requires many repeated simulations and long running times. Despite the tremendous effort, the dynamic behaviour of FCMs can never be mapped with 100% reliability by these empirical approaches; therefore, there is an obvious need for a faster and more reliable analytical method.
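The empirical exploration described above can be sketched in a few lines (an illustrative sketch only: the sigmoid threshold with uniform \(\lambda\), the rounding tolerance and all names are our own choices, not part of the cited works):

```python
import numpy as np

def simulate(A0, W, lam=1.0, max_steps=500, tol=1e-8):
    """Iterate A(k) = f(W A(k-1)) until convergence; return the FP or None."""
    A = np.asarray(A0, dtype=float)
    for _ in range(max_steps):
        A_next = 1.0 / (1.0 + np.exp(-lam * (W @ A)))
        if np.max(np.abs(A_next - A)) < tol:
            return A_next
        A = A_next
    return None  # no fixed point reached (limit cycle or chaotic behaviour)

def explore_fixed_points(W, lam, n_scenarios=200, decimals=4, seed=0):
    """Run simulations from many random scenarios; collect the distinct FPs."""
    rng = np.random.default_rng(seed)
    found = set()
    for _ in range(n_scenarios):
        A = simulate(rng.uniform(0.0, 1.0, W.shape[0]), W, lam)
        if A is not None:
            found.add(tuple(np.round(A, decimals)))
    return found
```

Even hundreds of scenarios give no guarantee that every attractor region has been hit, which is exactly the limitation discussed above.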

Unfortunately, it is not easy to provide an analytical method to determine the FPs themselves, or at least their number, or to analyse the stability of FCMs in general, even though they are similar to neural networks [6]. Conditions expressed by the weights of connections between FCM concepts that guarantee the existence and uniqueness of the FPs were first introduced by Boutalis et al. [2, 3] for the special case when the steepness parameter value of the so-called sigmoid threshold function is \(\lambda = 1\). In [13], the authors generalized the results of [3] for arbitrary steepness parameters.

Some aspects of the problem of fixed points were also discussed by Knight et al. [21]. Stability issues of FCMs were also investigated by Lee and Kwon [28], where the authors presented an analytical condition for global exponential stability of FCMs based on the Lyapunov method. Moreover, they applied the theoretical results in clinical decision making in [27]. Luo et al. [29] studied the algebraic dynamics of k-valued fuzzy cognitive maps.

This paper addresses the same problem: the global stability of fuzzy cognitive maps. Naturally, the question of the significance and applicability of global asymptotic stability arises. If the FCM is globally asymptotically stable, then any arbitrary initial stimulus leads to the same, unique fixed point. There are applications where this property is useful. For example, in Section 4 of [21], the authors investigated a large and diverse industrial area called the Humber region (UK). Sixteen key concepts (bio-based energy production, by-products, competitiveness, etc.) and 27 weighted, directed connections were considered in the model, based on the stakeholders’ opinions. The ranking of the importance of the factors was based on the activation values at the fixed point of the corresponding FCM. For the ranking to be unique, the authors required that the FCM have a unique, stable fixed point. To ensure the global stability of the fixed point, they used the mathematical results presented in the same article (in Sect. 5, we compare their mathematical findings with the results presented in this paper).

Nevertheless, in the majority of applications, a unique fixed point is not the desired scenario. Although global stability is not the required or preferred feature of the FCM in these cases, it is important to know which parameter sets lead to this disadvantageous property. It is somewhat similar to diabetes or high blood pressure: we want to avoid them, so we have to know everything (or at least a lot) about their causes. To summarize: we definitely do not state that global stability is always an advantageous feature. Although it may be rather a curse than a blessing in certain cases, this feature potentially comes with FCM models, so we have to explore its causes, even if only to be able to avoid it.

In this paper, we present some novel results on the global stability of FCMs. Moreover, we show that these results are better than the previous stability bounds known from the literature. Additionally, we show that the weight-independent condition for global stability (reported earlier in the literature) can be improved using our results. This new weight-independent condition is not only better than the previous one, but is also extremely simple. Finally, as a side result, we point out that a recent result regarding the existence and uniqueness of fixed points of FCMs is not valid.

The paper is organized as follows. In Sect. 2, we review the basic notions of fuzzy cognitive maps. Section 3 summarizes the basic mathematical concepts applied in the investigation of the problem of fixed points. Section 4 presents new theoretical achievements for globally asymptotically stable fixed points of FCMs, providing a better upper bound for the parameter of the threshold function. Moreover, conditions related to the structure of the FCM are also presented. In Sect. 5, we compare the results of the current paper with other authors’ findings, pointing out that the approach of Sect. 4 gives a better upper bound for the parameter of the threshold function. The results of the paper are summarized in Sect. 6. In Appendix 1, we present the proof of Theorem 6; in Appendix 2, we discuss the validity of a recent result.

Basic notions of fuzzy cognitive maps

From the mathematical point of view, an FCM contains the following components: a weighted, directed graph expressing the causal relations between the concepts, and the updating rule, including the transformation function, which squashes the weighted sum of activation values into the allowed range (usually [0, 1], but sometimes \([-1,1]\)) [41]. In graph theory, the adjacency matrix contains all the information about the connections in the graph. If we deal with a weighted, directed graph, then the matrix W containing the weights (\(w_{ij} \in W\)) of the connections (and zeros where there are no causal connections) stores the causalities of the model. The nonnegative number \(|w_{ij}|\) describes the strength of influence of concept \(C_j\) on concept \(C_i\); moreover, if \(w_{ij}>0\), then a positive change in the activation value of \(C_j\) causes a positive change in the activation value of \(C_i\); if \(w_{ij}<0\), then a positive change causes a negative change. The weighted sum of the incoming activation values is transformed into the required range. The transformation is computed by a threshold function. Well-known discrete threshold functions are the bivalent and trivalent functions, while in the continuous case various sigmoid-like functions are used. In most cases, FCM users choose the sigmoid function, see Eq. (1).

$$\begin{aligned} f(x) = \frac{1}{1+e^{-\lambda x}} \end{aligned}$$

The steepness parameter \(\lambda >0\) controls the speed of transition from low values (close to zero) to high values (close to one). If \(\lambda\) is small, then the function is close to a linear function; if \(\lambda\) is large, then the function is similar to the Heaviside function.
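For illustration, the behaviour described above can be checked directly (a minimal sketch; the chosen \(\lambda\) values are arbitrary):

```python
import numpy as np

def sigmoid(x, lam):
    """Sigmoid threshold function of Eq. (1): f(x) = 1 / (1 + exp(-lam * x))."""
    return 1.0 / (1.0 + np.exp(-lam * x))

# f(0) = 0.5 for any lam; a larger lam gives a steeper transition.
print(sigmoid(0.0, 5.0))    # 0.5
print(sigmoid(1.0, 0.1))    # close to 0.5: nearly linear regime
print(sigmoid(1.0, 50.0))   # close to 1: nearly a step (Heaviside) function
```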

FCM simulation starts with a vector of initial activation values \(A(0)=[A_1(0),\ldots , A_n(0)]^{\mathrm{T}}\). In each simulation step, the activation vector is re-calculated according to the updating rule. The simulation ends when (i) the activation vector is stabilized; (ii) the number of iteration steps reaches the prescribed maximum. In some applications, the updating rule contains self-feedback, but in some other cases self-feedback is not preferred. The general form of the updating rule is

$$\begin{aligned} A_i(k) = f_i\left( \sum _{j=1,j\ne i}^n w_{ij}A_j(k-1) + d_i A_i(k-1)\right) . \end{aligned}$$

Here, \(A_i(k)\) is the activation value of concept \(C_i\) at simulation step k, \(f_i\) is the threshold function applied at concept \(C_i\), \(w_{ij}\) is the weight of the causal edge from \(C_j\) to \(C_i\), and \(d_i\) is the strength of the self-feedback. If \(d_i=0\), then there is no self-feedback, as in the first FCM models. Although self-feedbacks were not allowed in Kosko’s original FCMs and are also avoided in some applications, they may be useful in specific cases. Without self-feedbacks, the activation value of a concept is defined by other concepts only. In some cases, however, this is not realistic. Consider a car: its speed depends not only on, e.g. the current position of the gas pedal, but also on the speed of the car at the previous moment. In this example, the current velocity is not independent of the speed measured at the previous time step, and the driver can also influence the speed by pushing the gas pedal. Many other, similar examples can be given, where a concept has some kind of ‘memory’. The intensity of that memory can be expressed by the weight of the self-feedback (\(d_i\)). The theoretical background of self-feedbacks has already been laid [11, 40], and several real-life examples of their application can be found as well [7, 22, 38].

If self-feedback is built into the weight matrix (the \(d_i\)s go into the diagonal, i.e. \(w_{ii}=d_i\)), then the updating rule becomes:

$$\begin{aligned} A_i(k) = f_i \left( \sum _{j=1}^n w_{ij}A_j(k-1) \right) . \end{aligned}$$

In the present paper, we use this type of W, so in our terminology the weight matrix already contains the possible self-feedback, i.e. if self-feedback is applied, then the diagonal of W contains the feedback weights (\(d_i\)s); if not, then the diagonal of W contains zeros.
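With this convention, one synchronous simulation step can be sketched as follows (an illustrative sketch; the sigmoid threshold with uniform \(\lambda\), the function name and the example matrix are our own choices):

```python
import numpy as np

def fcm_step(A, W, lam=1.0):
    """One synchronous update A(k) = f(W A(k-1)); the diagonal of W
    holds the self-feedback weights d_i (zeros if there is no feedback)."""
    return 1.0 / (1.0 + np.exp(-lam * (W @ A)))

# Two concepts, with self-feedback d_1 = 0.5 on the first concept:
W = np.array([[0.5, -0.3],
              [0.4,  0.0]])
A = np.array([0.8, 0.2])
print(fcm_step(A, W))   # the next activation vector, entries in (0, 1)
```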

Mathematical tools

In this section, we briefly summarize the mathematical notions and tools applied in Sects. 4 and 5. For more detailed and precise information about fixed points and fixed point theorems, we refer to [35], while for linear algebra and matrix analysis, see [17]. A fixed point of a function G is a point of the state space that G maps to itself: \(G(x^*)=x^*\). Fixed point \(x^*\) is locally asymptotically stable if, starting the iteration at an arbitrary point close enough to \(x^*\), the iteration converges to \(x^*\). If the iteration converges to \(x^*\) for every initial value, then \(x^*\) is a globally asymptotically stable fixed point.

According to Brouwer’s theorem (see [35], pp. 296–299), every continuous function which maps a convex, bounded and closed set \(K \subset \mathbb {R}^n\) to itself has at least one fixed point. Consequently, this theorem ensures the existence of at least one fixed point for any fuzzy cognitive map with a continuous threshold function. Since the FCM reasoning is based on the limit of the iteration, we may wonder under what conditions this limit exists. If, by the application of the iteration rule, consecutive activation vectors get closer and closer to each other and their difference tends to the zero vector, then the iteration converges to a certain point. In this case, points of the state space get closer to each other under the application of a certain function. This property can be formalized as follows (see [36], page 220):

‘Let \((X,d)\) be a metric space, with metric d. If \(\varphi\) maps X into X and if there is a number \(c<1\) such that

$$\begin{aligned} d\left( \varphi (x),\varphi (y)\right) \le c d(x,y) \end{aligned}$$

for all \(x,y \in X\), then \(\varphi\) is said to be a contraction of X into X.’

The famous contraction mapping theorem (a.k.a. Banach’s fixed point theorem, see [36], pp. 220–221 or [35], pp. 236–237) states that if a mapping is a contraction over a nonempty complete metric space, then it has exactly one fixed point. The proof of this statement tells more than the theorem: this fixed point can be found as a limit of the iteration \(x_{n+1} = G( x_{n})\), starting from an arbitrary point in the state space. Since the iteration converges to this unique equilibrium point from any initial values, this fixed point is asymptotically stable in the global sense.
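A one-dimensional toy example (not an FCM; the choice of \(\varphi\) is our own) illustrates the theorem: \(\varphi (x)=\frac{1}{2}\cos (x)\) is a contraction on \(\mathbb {R}\) with \(c=1/2\), so the iteration reaches the same fixed point from any starting value:

```python
import math

def iterate(phi, x0, steps=100):
    """Apply x_{n+1} = phi(x_n) repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = phi(x)
    return x

phi = lambda x: 0.5 * math.cos(x)   # |phi'(x)| <= 0.5 < 1: a contraction
x_star_a = iterate(phi, -10.0)
x_star_b = iterate(phi, 7.5)
print(abs(x_star_a - x_star_b))     # ~0: both runs hit the same fixed point
```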

In the results presented in Sect. 4, we prove the global asymptotic stability of the unique fixed point using the contraction mapping theorem. The theorems and proofs require basic knowledge of some notions from linear algebra, such as matrix norms, the spectral radius, and the relations and inequalities between them. For these facts, we refer to [17]. However, there are two theorems which will be applied in Sect. 4, and which should be mentioned here. (Below, \(\rho (M)\) denotes the spectral radius of matrix M):

Theorem 1

(see [17], page 349) Let \(M \in \mathbb {R}^{n\times n}\) and \(\varepsilon >0\) be given. There is a matrix norm \(\Vert \cdot \Vert\) such that \(\rho (M) \le \Vert M \Vert \le \rho (M)+\varepsilon\).

Theorem 2

(see [17], page 373) If \(\Vert *\Vert _m\) is a matrix norm, then there is a vector norm \(\Vert * \Vert _v\) that is compatible with it (i.e. \(\Vert Mx \Vert _v \le \Vert M \Vert _m \cdot \Vert x \Vert _v\)).

Conditions for the global stability of fuzzy cognitive maps

In this section, we prove two theorems for the global asymptotical stability of fixed points of FCMs. Consider again the updating rule of an FCM:

$$\begin{aligned} A_i(k) = f_i \left( \sum _{j=1,j\ne i}^n w_{ij}A_j(k-1) + d_i A_i(k-1)\right) \end{aligned}$$

Let us introduce the mapping \(G :\mathbb {R}^n \rightarrow \mathbb {R}^n\) generating the next concept vector from the preceding one. Then the mapping acts coordinate-wise:

$$\begin{aligned} A(k+1)= \left[ \begin{array}{c} A_1(k+1) \\ \vdots \\ A_n(k+1) \end{array} \right] = \left[ \begin{array}{c} f_1(w_1A(k)) \\ \vdots \\ f_n(w_n A(k)) \end{array} \right] = G(A(k)), \end{aligned}$$

where \(f_i\) is the transformation function assigned to the ith concept and \(w_i=(w_{i1},\ldots ,w_{in})\), \(w_{ij} \in W\). We know from Banach’s theorem that a contraction has a unique fixed point, and that this point can be determined by an iteration starting from any point of the space. Since the FCM reasoning is based on an iteration, the application of this theorem is straightforward. Although we cannot compute the unique fixed point analytically from the given parameters (W and the \(f_i\)s), we are able to state conditions under which the examined mapping G is a contraction. As we have seen from its definition, the notion of contraction requires a distance metric. This metric can be generated by a vector norm \(\Vert \cdot \Vert _v\) (we do not specify this norm; it can be an arbitrary vector norm): the distance of two concept vectors is defined as the norm of their difference. Using this vector norm, we can define a matrix norm (a.k.a. induced matrix norm or natural matrix norm) as \({ \Vert M \Vert _* = \sup \left\{ \frac{ \Vert M x \Vert _v}{ \Vert x \Vert _v} :x \in \mathbb {R}^n, \, x \ne \underline{0} \right\} }\). Besides matrix and vector norms, we are going to use the maximum value of the derivative of the threshold function. In general, a sigmoid-like threshold function is a monotone increasing, differentiable function with finite limits at negative and positive infinity. Let f(x) be a threshold function of this type; then the maximal value of its derivative is finite, let us denote it by K. This K can then serve as a Lipschitz constant: \(\left| f(x)-f(y) \right| \le K \cdot \left| x-y \right|\) (Fig. 1).

Fig. 1

The sigmoid threshold function with parameter \(\lambda =5\) and \(\lambda =3\) (top) and their bell-shaped derivatives (bottom). The maximum value of the derivative is \(\lambda /4\), i.e. 1.25 and 0.75, respectively
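The \(\lambda /4\) maximum can be verified numerically, using the identity \(f^{\prime }(x)=\lambda f(x)(1-f(x))\) (a quick numerical check, not part of the original derivation):

```python
import numpy as np

lam = 5.0
x = np.linspace(-10.0, 10.0, 100001)   # symmetric grid containing x = 0
f = 1.0 / (1.0 + np.exp(-lam * x))
deriv = lam * f * (1.0 - f)            # f'(x) = lam * f(x) * (1 - f(x))
print(deriv.max())                     # lam / 4 = 1.25, attained at x = 0
```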

Theorem 3

Consider a fuzzy cognitive map (FCM) with weight matrix W. Moreover, let \(K_i\) be the maximum of the derivative of the threshold function \(f_i\) applied at the ith concept. If the inequality

$$\begin{aligned} \left\| {\mathrm {diag}}(K_i)\cdot W \right\| _* <1 \end{aligned}$$

holds with an induced matrix norm \(\Vert \cdot \Vert _*\), then the fuzzy cognitive map has the same fixed point for every initial activation vector.

Proof Consider the mapping



$$\begin{aligned} G(A)=\left[ f_1(w_1A),f_2(w_2A),\ldots ,f_n(w_nA)\right] ^{\mathrm{T}} \end{aligned}$$

where \(w_i =(w_{i1},\ldots ,w_{in})\). Moreover, let \(\Vert \cdot \Vert _v\) be the vector norm generating matrix norm \(\Vert \cdot \Vert _*\). In the following, we give an upper estimation of the value of \(\left\| G(A)-G(A^{\prime })\right\| _v\):

$$\begin{aligned}&\left\| G(A)-G(A^{\prime })\right\| _v \\&\quad =\left\| f_1(w_1A) - f_1(w_1A^{\prime }), \ldots , f_n(w_nA) - f_n(w_nA^{\prime })\right\| _v \end{aligned}$$

From the Mean Value Theorem, we have

$$\begin{aligned} \left| f_i(w_iA)-f_i(w_iA^{\prime }) \right| \le K_i\left| w_iA-w_iA^{\prime }\right| . \end{aligned}$$

So we have

$$\begin{aligned}&\left\| G(A)-G(A^{\prime })\right\| _v \end{aligned}$$
$$\begin{aligned}&\quad \le \left\| \left[ K_1 (w_1A-w_1A^{\prime }), \ldots , K_n( w_nA-w_nA^{\prime }) \right] ^T \right\| _v \end{aligned}$$
$$\begin{aligned}&\quad = \left\| {\mathrm {diag}}(K_i)W(A-A^{\prime }) \right\| _v \end{aligned}$$
$$\begin{aligned}&\quad = \frac{ \left\| {\mathrm {diag}}(K_i) W(A-A^{\prime }) \right\| _v}{ \left\| A-A^{\prime } \right\| _v} \cdot \left\| A-A^{\prime } \right\| _v \end{aligned}$$
$$\begin{aligned}&\quad \le \left\| {\mathrm {diag}}(K_i) W \right\| _* \cdot \left\| A-A^{\prime } \right\| _v, \end{aligned}$$

where the last row is the consequence of the definition of matrix norm \(\Vert \cdot \Vert _*\). If \(\left\| {\mathrm {diag}}(K_i) W \right\| _* <1\), then G is a contraction mapping. It means that starting from any arbitrary initial values, the repetitive application of the FCM updating rule leads to the same equilibrium point. \(\square\)

In Theorem 3, we did not specify the matrix norm (it can be, for example, \(\Vert \cdot \Vert _2\)), nor the threshold function \(f_i\), nor the maximum value \(K_i\) of its derivative. If we choose them appropriately, we get some additional interesting results:

  • If the threshold function is the most widely used one (i.e. \(f_i(x)=\frac{1}{1+e^{-\lambda _i x}}\)), then \(K_i = \frac{\lambda _i}{4}\) and the condition turns to \(\left\| {\mathrm {diag}}(\lambda _i) W \right\| _* <4\).

  • If the threshold function is the same sigmoid function for all the concepts, i.e. \(f_i(x)=f(x)=\frac{1}{1+e^{-\lambda x}}\), then \(K_i = \frac{\lambda }{4}\), so the condition reduces to \(\Vert W \Vert _* < \frac{4}{\lambda }\). In other words, if parameter \(\lambda <4/\Vert W \Vert _*\), then every initial stimulus leads to the unique fixed point. It is clear that if \(\Vert W \Vert _*\) is smaller, then global stability is ensured for a larger set of possible values of \(\lambda\).

  • Weighted in-degree and weighted out-degree are widely used descriptive measures of networks. If the matrix norm is the 1-norm or the \(\infty\)-norm, then the condition for global stability can be expressed by the weighted in-degree and weighted out-degree, similarly as was done for input-output FCMs in [14]. Nevertheless, although weighted in-degree and out-degree are very useful for the descriptive analysis of networks, the convergence conditions expressed by them can easily be outperformed by other matrix norms.
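The condition of Theorem 3 can be checked directly with the standard induced norms (a sketch; the function name and the example weight matrix are our own, and `ord` values 1, 2 and `np.inf` give the induced 1-, 2- and \(\infty\)-norms):

```python
import numpy as np

def theorem3_condition(W, lams, ord=2):
    """True if ||diag(K_i) W|| < 1 for sigmoid thresholds, where K_i = lam_i / 4."""
    M = np.diag(np.asarray(lams, dtype=float) / 4.0) @ W
    return np.linalg.norm(M, ord) < 1.0

W = np.array([[0.0,  0.6],
              [-0.5, 0.2]])
# lam_i = 2: row sums of |diag(0.5) W| are 0.3 and 0.35, both below 1
print(theorem3_condition(W, [2.0, 2.0], ord=np.inf))   # True
# lam_i = 20: the scaled matrix has infinity-norm 3.5, so the test fails
print(theorem3_condition(W, [20.0, 20.0], ord=np.inf)) # False
```

Note that a failing check does not disprove stability; Theorem 3 is only a sufficient condition.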

Based on the relation between the spectral radius and induced matrix norms, we show a better condition for the global asymptotic stability of FCMs (\(\rho (\cdot )\) denotes the spectral radius of the matrix).

Theorem 4

Consider a fuzzy cognitive map (FCM) with weight matrix W. Moreover, let \(K_i\) be the maximum of the derivative of the threshold function \(f_i\) applied at the ith concept. If

$$\begin{aligned} \rho \left( {\mathrm {diag}}(K_i)\cdot W \right) <1, \end{aligned}$$

then the fuzzy cognitive map has the same fixed point for every initial activation vector.


Proof We have already seen that if \(\left\| {\mathrm {diag}}(K_i)\cdot W \right\| _* <1\) holds with some matrix norm, then the mapping generating the iteration is a contraction. Moreover, if

$$\begin{aligned} \rho ({\mathrm {diag}}(K_i)\cdot W) <1 \end{aligned}$$

then Theorem 1 ensures the existence of a matrix norm (let us denote it by \(\Vert \cdot \Vert _M\)) such that

$$\begin{aligned} \left\| {\mathrm {diag}}(K_i)W \right\| _M <1 \end{aligned}$$

Given this matrix norm, Theorem 2 ensures the existence of a compatible vector norm to this matrix norm (let us denote it by \(\Vert \cdot \Vert _v\)). If we measure the distance of the concept vectors with this norm, then we have

$$\begin{aligned} \left\| G(A)-G(A^{\prime })\right\| _v&\le \left\| {\mathrm {diag}}(K_i)W (A-A^{\prime }) \right\| _v \end{aligned}$$
$$\begin{aligned}&\quad\quad\quad\le \left\| {\mathrm {diag}}(K_i)W \right\| _M \cdot \left\| A-A^{\prime } \right\| _v \end{aligned}$$

According to Eq. (18), the coefficient of \(\left\| A-A^{\prime } \right\| _v\) is less than one. It means that using distance metric \(d(x,y)=\left\| x-y \right\| _v\), mapping G is a contraction. Similar to the previous theorem, it means that starting the iteration from anywhere in the state space, the iterative FCM updating leads to the same equilibrium point. \(\square\)

Since the spectral radius is the infimum of the induced matrix norms, the condition using \(\rho (W)\) is better than the conditions provided by matrix norms.

In a special case, when the threshold function is the sigmoid function and the slope parameter is the same for every concept (i.e. \(\lambda _1=\lambda _2=\ldots =\lambda _n=\lambda\)), the condition simplifies to

$$\begin{aligned} \rho \left( W \right) <\frac{4}{\lambda } \end{aligned}$$
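For a uniform sigmoid parameter, this condition is straightforward to evaluate (a sketch; the function name and the example matrix are our own):

```python
import numpy as np

def globally_stable(W, lam):
    """Sufficient condition of Theorem 4 with uniform lambda: rho(W) < 4 / lam."""
    rho = max(abs(np.linalg.eigvals(W)))
    return rho < 4.0 / lam

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # eigenvalues +1 and -1, so rho(W) = 1
print(globally_stable(W, 3.0))        # True:  1 < 4/3
print(globally_stable(W, 5.0))        # False: 1 is not below 4/5
```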

Remark 1

In Theorems 3 and 4, we assumed the differentiability of the continuous threshold function. Differentiability is an advantageous property in learning the weights of the FCM; nevertheless, it is not a necessity. Theoretically, one may choose other continuous threshold function, for example a continuous piecewise linear function. On the other hand, one may recognize that in the proof of Theorem 3, we used the maximal value of the derivative as a Lipschitz constant. Consequently, Theorems 3 and 4 are valid for every Lipschitz continuous threshold function, with the modification that \(K_i\) is the Lipschitz constant belonging to threshold function \(f_i\).

Comparison of the results with previous results

In this section, we briefly summarize previous theoretical research carried out on the problem of fixed points of FCMs, and compare these results to the results presented in the previous section.

Comparison with the results of Boutalis et al.

To the best of our knowledge, the first theoretical study discussing the existence and uniqueness of fixed points of FCMs was given by Boutalis et al. [2, 3]. They investigated the case when the transformation function is \(f(x)=1/(1+e^{-x})\), i.e. parameter \(\lambda\) equals one. They arrived at the conclusion that if the following inequality

$$\begin{aligned} \left( \sum _{i=1}^n \Vert w_i\Vert ^2 \right) ^{1/2} <4 \end{aligned}$$

holds, then the FCM has a unique fixed point and the iteration starting from an arbitrary initial activation vector eventually converges to this point (on the left-hand side, \(\Vert w_i\Vert = \sqrt{ w_{i1}^2 + w_{i2}^2 +\ldots + w_{in}^2}\)). Note that the expression in Eq. (22) is just the Frobenius norm of the weight matrix W. Their findings were generalized for sigmoid FCMs equipped with an arbitrary positive parameter \(\lambda\) in [13]. Namely, it was proved that if \(\Vert W \Vert _F < 4/\lambda\), then the iteration process of the FCM leads to the unique equilibrium point, independently of the initial activation values. The well-known inequalities between different matrix norms (and the spectral radius) provide an easy way to compare the results above with the results presented in Sect. 4. We can find a matrix norm \(\Vert \cdot \Vert _*\) such that \(\Vert W \Vert _* \le \Vert W \Vert _F\), thus global convergence to a unique fixed point is proved for a larger set of possible values of parameter \(\lambda\). For example, we may choose the operator norm (\(\Vert \cdot \Vert _2\)), or in some cases the infinity norm (\(\Vert W \Vert _\infty\)) or the taxicab norm (\(\Vert W \Vert _1\)). Furthermore, we may take the spectral radius, since \(\rho (W) \le \Vert W \Vert _F\), thus \(4/\rho (W) \ge 4/\Vert W \Vert _F\). In Kottas et al. [25], the authors attempted to extend the results of [3]. We discuss this issue in Appendix 2.
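The chain \(\rho (W) \le \Vert W \Vert _2 \le \Vert W \Vert _F\) behind this comparison is easy to observe numerically (an illustration on an arbitrarily chosen random W):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(-1.0, 1.0, size=(6, 6))

rho  = max(abs(np.linalg.eigvals(W)))   # spectral radius
spec = np.linalg.norm(W, 2)             # induced 2-norm (largest singular value)
fro  = np.linalg.norm(W, 'fro')         # Frobenius norm

print(rho <= spec <= fro)   # hence 4/rho >= 4/||W||_2 >= 4/||W||_F
```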

Comparison with the findings of Knight et al.

In [21] Knight, Lloyd and Penn stated two theorems (Theorem 3.1 and 3.2) regarding the possible number of fixed points of fuzzy cognitive maps. Theorem 3.1 of [21] states that if parameter \(\lambda \ge 0\) of the sigmoid function is small enough then there is a unique fixed point, that is linearly stable. Conversely, the theorem also states that if \(\lambda \ge 0\) is large enough there can be multiple fixed points. In Theorem 3.2 of [21], they clarify the notion of small enough. We cite this theorem literally:

Theorem 3.2 of [21]: ‘For \(W \in \mathbb {R}^{n\times n}\) given, the sigmoid FCM has a unique fixed point for all \(\lambda\) such that \(0 \le \lambda \le \overline{\lambda }(n)\), this fix point is stable. \(\overline{\lambda }(n)\) satisfies

$$\begin{aligned} \left( 1- \frac{\overline{\lambda }(n)}{4} \right) ^n-\sum _{i=1}^n b_i C_i^n \left( \frac{\overline{\lambda }(n)}{4} \right) ^i=0 \end{aligned}$$

where \(C_i^n\) are the binomial coefficients, and \(b_i\) is given by the recursion relation \(b_i=ib_{i-1}+(-1)^i, \quad b_0=1\)’.

Both of the theorems of [21] deal with FCMs equipped with the same parameter of \(\lambda\) for all of the concepts. Moreover, the weight matrix W was not taken into consideration in the theorems (it is not a fault, but a possible loss of information). From our results it follows that Theorem 3.2 of [21] can be improved. First we prove that for \(\lambda < 4/n\) (n is the number of concepts of the FCM), the FCM has exactly one fixed point. Then we show that this extremely simple bound (4/n) is better than the bound provided by Eq. (23).

Theorem 5

Consider a sigmoid FCM with weight matrix \(W\in \mathbb {R}^{n \times n}\) and sigmoid parameter \(\lambda\). If \(0 \le \lambda < \frac{4}{n}\) then the FCM has a unique, globally asymptotically stable fixed point.


Proof The statement is an immediate consequence of Theorem 3. If

$$\begin{aligned} \left\| {\mathrm {diag}}(\lambda _i)\cdot W \right\| _2 <4 \end{aligned}$$

then there is exactly one fixed point. Since in the current case \(\lambda _1=\lambda _2=\ldots =\lambda _n=\lambda\), this becomes

$$\begin{aligned} \lambda \cdot \left\| W \right\| _2 <4 \end{aligned}$$

which implies

$$\begin{aligned} \lambda < \frac{4}{\left\| W \right\| _2} \end{aligned}$$

Moreover, we do know that \(\Vert \cdot \Vert _2 \le \Vert \cdot \Vert _F\). Thus, if \(\lambda < \frac{4}{\left\| W \right\| _F}\) holds, then \(\lambda < \frac{4}{\left\| W \right\| _2}\) holds, too. In weight matrix W, all of the entries are between \(-1\) and 1. Furthermore, by its definition, the Frobenius norm is \(\Vert W \Vert _F= \sqrt{ \sum _{i,j} w_{ij}^2 }\), so an upper estimation on the Frobenius norm of the weight matrix W is:

$$\begin{aligned} \left\| W \right\| _F \le n \end{aligned}$$

Since \(\frac{4}{n} \le \frac{4}{\left\| W \right\| _F}\), this completes the proof. \(\square\)

If self-feedbacks are not allowed, then we have \({ \left\| W \right\| _F \le \sqrt{n(n-1)} }\), resulting in the bound \(\lambda < \dfrac{4}{\sqrt{n(n-1)}}\), which is slightly higher than \(\dfrac{4}{n}\).

One may think that a better bound can be obtained by using another norm or the spectral radius of W, but this is not the case. Since the entries of W lie between \(-1\) and 1 (and in this case we have no further information about the weights), the inequality \(\rho (W) \le n\) holds (with equality in the extreme case when \(w_{ij}=1\) for every i, j), so the spectral radius leads to the same upper bound; consequently, no matrix norm yields a better bound.

If we have more information about the weights, then this upper bound \(\frac{4}{n}\) can be improved. For example, if \(W \in \mathbb {R}^{n\times n}\) has exactly k nonzero entries (i.e. the FCM has exactly k connections with nonzero weights), then \(\Vert W\Vert _F \le \sqrt{k} \le n\), and thus \(\frac{4}{\sqrt{k}} \ge \frac{4}{n}\).

Theorem 5 improves Theorem 3.2 of [21], since it ensures the uniqueness of the fixed point for a larger set of values of \(\lambda\). This can be observed in Fig. 2, and the next theorem shows that the inequality \(\overline{\lambda }(n) \le \frac{4}{n}\) holds for every FCM (with equality for \(n=1,2\)).

Fig. 2

Proven upper bounds on parameter \(\lambda\) for global stability, \(\overline{\lambda }(n)\) (blue bullet) and 4/n (orange cross) versus number of concepts (n). The values of 4/n are slightly higher than the values of \(\overline{\lambda }(n)\) (color figure online)

Theorem 6

Let \(\overline{\lambda }(n)\) be defined as in Eq. (23). Then the inequality

$$\begin{aligned} \overline{\lambda }(n) \le \frac{4}{n} \end{aligned}$$

holds for every \(n\ge 1\).


Proof

See Appendix 1. \(\square\)

Comparison with the result of Lee and Kwon

Lee and Kwon examined the problem of equilibrium points via Lyapunov stability analysis [28]. In their approach, the inference rule is described by the following equation (we have changed the notation for convenience):

$$\begin{aligned} A_i(k)=f\left( r_1 A_i(k-1) + r_2\sum _{j=1,j \ne i}^n w_{ij}A_j(k-1) \right) \end{aligned}$$

Here, f is the sigmoid function, and its parameter \(\lambda\) is the same for every concept. They proved the following criterion for global exponential stability (see [28]):

‘If the inequality

$$\begin{aligned} 0< \lambda < \frac{4}{r_1 + r_2 \Vert W^* \Vert _2} \end{aligned}$$

holds, then the equilibrium point of the corresponding FCM is globally exponentially stable’.

We note here that in their approach the weight matrix does not contain the self-feedback; the self-feedback is expressed by the term \(r_1\). For this reason, this weight matrix is denoted here by \(W^*\).

To compare their result with the findings of Sect. 4, we start with a lower bound on the denominator of Eq. (30):

$$\begin{aligned} r_1 + r_2 \left\| W^* \right\| _2&= \left\| {\mathrm {diag}}(r_1) \right\| _2 + r_2 \Vert W^* \Vert _2 \\&=r_2\left\| {\mathrm {diag}}\left( r_1/r_2\right) \right\| _2 + r_2 \Vert W^* \Vert _2 \\&\ge r_2\left\| {\mathrm {diag}}\left( r_1/r_2\right) + W^* \right\| _2 \end{aligned}$$

In our approach, \(r_2=1\) and \(r_1=d_i\), so we get

$$\begin{aligned} r_2\Vert W^* + {\mathrm {diag}}(r_1/r_2) \Vert _2&=\Vert W^* + {\mathrm {diag}}(d_i) \Vert _2 = \Vert W \Vert _2 \\&\le \Vert W^* \Vert _2 + \Vert {\mathrm {diag}}(d_i) \Vert _2 \end{aligned}$$

where W stands for the weight matrix including self-feedback (\(d_i\)s in the diagonal). Based on the inequality above, we get that

$$\begin{aligned} \frac{4}{ r_1 + r_2 \left\| W^* \right\| _2 } \le \frac{4 }{ \Vert W \Vert _2} \end{aligned}$$

The last inequality ensures that the bound for parameter \(\lambda\) provided in Sect. 4 is better than the bound given in [28].

Example and ordering of the bounds

As we have seen, the various approaches provide different bounds with the common property that if the steepness parameter (\(\lambda\)) of the sigmoid function is less than a number computed from some parameters of the model, then the FCM iteration rule produces the same equilibrium point for every initial activation vector. Based on the properties of matrix norms and the spectral radius, a simple ordering of the proven bounds can be given (\(\overline{\lambda }\) refers to the bound by Knight et al. [21]):

$$\begin{aligned} \overline{\lambda } \le \frac{4}{n} \le \frac{4}{ \Vert W \Vert _F} \le \frac{4}{ \Vert W \Vert _2} \le \frac{4}{ \rho (W)} \end{aligned}$$

The following toy example illustrates the different performance of these bounds. Consider a fuzzy cognitive map with weight matrix W:

$$\begin{aligned} W=\left( \begin{array}{rrrrr} 0 &{} -1 &{} 0.5 &{} 0 &{} 0 \\ 0 &{} 0 &{} -0.5 &{} 0.5 &{} -0.5 \\ -1 &{} 1 &{} 0 &{} -0.5 &{} 0 \\ 0 &{} 1 &{} -1 &{} 0 &{} -0.5 \\ -1 &{} 0 &{} 1 &{} -0.5 &{} 0 \end{array} \right) \end{aligned}$$

The threshold function is the same for all concepts, \({ f(x)=\frac{1}{1+e^{-\lambda x}} }\). Table 1 shows the upper bounds on parameter \(\lambda\) with proven global stability, provided by the different methods (i.e. if the value of \(\lambda\) is less than the given number, then the FCM is globally asymptotically stable). We can observe that the method applying the spectral radius gives the best result. Numerical experiments show that the unique fixed point loses its global stability at about \(\lambda \approx 3.6708\).

Table 1 Comparison of different bounds on \(\lambda\), using weight matrix W (Eq. 37)
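The entries of Table 1 can be reproduced numerically. The sketch below (illustrative, not part of the original study) computes the four bounds of the ordering above for this weight matrix using NumPy:

```python
import numpy as np

# Toy weight matrix W from the example above
W = np.array([
    [ 0, -1,  0.5,  0,    0  ],
    [ 0,  0, -0.5,  0.5, -0.5],
    [-1,  1,  0,   -0.5,  0  ],
    [ 0,  1, -1,    0,   -0.5],
    [-1,  0,  1,   -0.5,  0  ],
])
n = W.shape[0]

# The four bounds of the ordering, from most to least conservative
bounds = {
    "4/n":        4 / n,
    "4/||W||_F":  4 / np.linalg.norm(W, "fro"),
    "4/||W||_2":  4 / np.linalg.norm(W, 2),
    "4/rho(W)":   4 / np.max(np.abs(np.linalg.eigvals(W))),
}
for name, value in bounds.items():
    print(f"{name:10s} {value:.4f}")
```

Each bound guarantees global asymptotic stability for every \(\lambda\) below it; the dictionary is ordered exactly as the chain of inequalities above.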


Fuzzy cognitive map-based reasoning relies on the behaviour of repeated application of the updating rule, i.e. it depends on the behaviour of an iteration. This iteration may or may not converge to an equilibrium point (fixed point). If the iteration converges to a fixed point, then this fixed point (and the whole FCM model) may or may not be globally asymptotically stable. Moreover, if the model is globally asymptotically stable, then this fixed point is unique and the iteration arrives at this point from every initial state. In other words, if the model is globally asymptotically stable, then the system reaches the same equilibrium point, regardless of the initial stimulus.

It has been previously known from the literature that, in the case of sigmoidal threshold functions, this property is related to the value of the steepness parameter \(\lambda\). Namely, if the value of \(\lambda\) is small (i.e. the transition from values close to zero to values close to one is not drastic), then the FCM has the global stability property. Moreover, it has also been clear that this property is influenced by the structure of the weighted connections of the network (i.e. by the weight matrix W).

In this paper, several novel analytical conditions have been presented for the global stability of fuzzy cognitive maps. These conditions involve the usual parameters of the model, namely the weight matrix and the parameter of the threshold function. Compared to the existing results in the literature, these conditions are simpler and give a more efficient upper bound on the parameter of the threshold function. Moreover, the corresponding matrix norms and the spectral radius can easily be determined by free mathematical software.

The results presented in this paper can be used in at least two different ways. In some applications, a unique fixed point is a required property of the model: different initial stimuli should lead to the same equilibrium state. On the other hand, there are applications (for example, pattern recognition) where the FCM should have more than one equilibrium point. In other words, in the first case global stability is a required property, while in the second case globally stable models should be avoided. These simple analytical results help FCM users decide about some model parameters before evaluating the full model, decreasing the number of trial-and-error simulations.


  1. Axelrod R (1976) Structure of decision: the cognitive maps of political elites. Princeton University Press, Princeton

  2. Boutalis Y, Kottas T, Christodoulou M (2008) On the existence and uniqueness of solutions for the concept values in fuzzy cognitive maps. In: 2008 47th IEEE conference on decision and control, pp 98–104.

  3. Boutalis Y, Kottas TL, Christodoulou M (2009) Adaptive estimation of fuzzy cognitive maps with proven stability and parameter convergence. IEEE Trans Fuzzy Syst 17(4):874–889

  4. Buruzs A, Hatwágner MF, Kóczy LT (2014) Fuzzy cognitive maps and bacterial evolutionary algorithm approach to integrated waste management systems. J Adv Comput Intell Intell Inform 18(4):538–548

  5. Carvalho JP (2010) On the semantics and the use of fuzzy cognitive maps in social sciences. In: WCCI 2010 IEEE World Congress on Computational Intelligence, pp 2456–2461

  6. Carvalho JP, Tomé J (2002) Issues on the stability of fuzzy cognitive maps and rule-based fuzzy cognitive maps. In: Fuzzy Information Processing Society, 2002. Proceedings. NAFIPS. 2002 annual meeting of the North American. IEEE, pp 105–110

  7. Chen Y, Mazlack LJ, Minai AA, Lu LJ (2015) Inferring causal networks using fuzzy cognitive maps and evolutionary algorithms with application to gene regulatory network reconstruction. Appl Soft Comput 37:667–679

  8. Dickerson JA, Kosko B (1994) Virtual worlds as fuzzy cognitive maps. Presence Teleoperators Virtual Environ 3(2):173–189

  9. Ferreira FAF, Jalali MS, Ferreira JJM, Stankevičienė J, Marques CSE (2016) Understanding the dynamics behind bank branch service quality in Portugal: pursuing a holistic view using fuzzy cognitive mapping. Serv Bus 10(3):469–487.

  10. Fons S, Achari G, Ross T (2004) A fuzzy cognitive mapping analysis of the impacts of an eco-industrial park. J Intell Fuzzy Syst 15(2):75–88

  11. Groumpos PP (2010) Fuzzy cognitive maps: basic theories and their application to complex systems. In: Fuzzy cognitive maps. Springer, pp 1–22

  12. Hanan D, Burnley S, Cooke D (2013) A multi-criteria decision analysis assessment of waste paper management options. Waste Manag 33:566–573

  13. Harmati IÁ, Hatwágner MF, Kóczy LT (2018) On the existence and uniqueness of fixed points of fuzzy cognitive maps. In: International conference on information processing and management of uncertainty in knowledge-based systems. Springer, pp 490–500

  14. Harmati IÁ, Kóczy LT (2020) On the convergence of input-output fuzzy cognitive maps. In: International joint conference on rough sets. Springer, pp 449–461

  15. Hatwagner MF, Vastag G, Niskanen VA, Kóczy LT (2018) Improved behavioral analysis of fuzzy cognitive map models. In: International conference on artificial intelligence and soft computing. Springer, pp 630–641

  16. Homenda W, Jastrzebska A, Pedrycz W (2014) Computer information systems and industrial management: 13th IFIP TC8 International Conference, CISIM 2014, Ho Chi Minh City, Vietnam, November 5–7, 2014. Proceedings, chap. Time series modeling with fuzzy cognitive maps: simplification strategies. Springer, Berlin, pp 409–420

  17. Horn RA, Johnson CR (2013) Matrix analysis, 2nd edn. Cambridge University Press, New York

  18. Hossain S, Brooks L (2008) Fuzzy cognitive map modelling educational software adoption. Comput Educ 51(4):1569–1588

  19. Khan MS, Khor S, Chong A (2004) Fuzzy cognitive maps with genetic algorithm for goal-oriented decision support. Int J Uncertain Fuzziness Knowl Based Syst 12(supp02):31–42

  20. Khan MS, Quaddus M (2004) Group decision support using fuzzy cognitive maps for causal reasoning. Group Decis Negot 13(5):463–480.

  21. Knight CJ, Lloyd DJ, Penn AS (2014) Linear and sigmoidal fuzzy cognitive maps: an analysis of fixed points. Appl Soft Comput 15:193–202

  22. Kok K (2009) The potential of fuzzy cognitive maps for semi-quantitative scenario development, with an example from Brazil. Glob Environ Change 19(1):122–133

  23. Kosko B (1986) Fuzzy cognitive maps. Int J Man Mach Stud 24:65–75

  24. Kosko B (1992) Neural networks and fuzzy systems. Prentice-Hall, Upper Saddle River

  25. Kottas TL, Boutalis Y, Christodoulou M (2012) Bi-linear adaptive estimation of fuzzy cognitive networks. Appl Soft Comput 12(12):3736–3756

  26. Koulouriotis DE, Diakoulakis IE, Emiris DM (2001) A fuzzy cognitive map-based stock market model: synthesis, analysis and experimental results. In: 10th IEEE international conference on fuzzy systems (Cat. No. 01CH37297), vol 1, pp 465–468.

  27. Lee IK, Kim HS, Cho H (2012) Design of activation functions for inference of fuzzy cognitive maps: application to clinical decision making in diagnosis of pulmonary infection. Healthc Inform Res 18(2):105–114

  28. Lee IK, Kwon SH (2010) Design of sigmoid activation functions for fuzzy cognitive maps via Lyapunov stability analysis. IEICE Trans Inf Syst 93(10):2883–2886

  29. Luo C, Song X, Zheng Y (2020) Algebraic dynamics of k-valued fuzzy cognitive maps and its stabilization. Knowl Based Syst 209:106424

  30. Nawa NE, Furuhashi T (1998) A study on the effect of transfer of genes for the bacterial evolutionary algorithm. In: 1998 second international conference on knowledge-based intelligent electronic systems, 1998. Proceedings KES’98, vol 3. IEEE, pp 585–590

  31. Nawa NE, Furuhashi T (1999) Fuzzy system parameters discovery by bacterial evolutionary algorithm. IEEE Trans Fuzzy Syst 7(5):608–616

  32. Nawa NE, Hashiyama T, Furuhashi T, Uchikawa Y (1997) A study on fuzzy rules discovery using pseudo-bacterial genetic algorithm with adaptive operator. In: IEEE International Conference on Evolutionary Computation. IEEE, pp 589–593

  33. Nápoles G, Grau I, Bello R, Grau R (2014) Two-steps learning of fuzzy cognitive maps for prediction and knowledge discovery on the HIV-1 drug resistance. Expert Syst Appl 41(3):821–830

  34. Papageorgiou E, Kontogianni A (2012) Using fuzzy cognitive mapping in environmental decision making and management: a methodological primer and an application. INTECH Open Access Publisher

  35. Pathak HK (2018) An introduction to nonlinear analysis and fixed point theory. Springer

  36. Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York

  37. Salmeron JL (2010) Fuzzy cognitive maps-based it projects risks scenarios. In: Fuzzy cognitive maps. Springer, pp 201–215

  38. Salmeron JL (2012) Fuzzy cognitive maps for artificial emotions forecasting. Appl Soft Comput 12(12):3704–3710

  39. Stylios CD, Groumpos PP (2004) Modeling complex systems using fuzzy cognitive maps. IEEE Trans Syst Man Cybern Part A Syst Hum 34(1):155–162

  40. Stylios CD, Groumpos PP et al (1999) Mathematical formulation of fuzzy cognitive maps. In: Proceedings of the 7th Mediterranean Conference on Control and Automation, pp 2251–2261

  41. Tsadiras AK (2008) Comparing the inference capabilities of binary, trivalent and sigmoid fuzzy cognitive maps. Inf Sci 178(20):3880–3894


This research was supported by National Research, Development and Innovation Office (NKFIH) K124055.


Open access funding provided by Széchenyi István University (SZE).

Author information


Corresponding author

Correspondence to István Á. Harmati.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



Appendix 1

Proof of Theorem 6:

First we simplify the expression given in Theorem 3.2 of [21], writing \(x=\frac{\overline{\lambda }(n)}{4}\) for brevity.

$$\begin{aligned} \left( 1- x\right) ^n-\sum _{i=1}^n b_i C_i^n x^i&=\sum _{k=0}^n (-1)^k\cdot x^k \cdot \left( {\begin{array}{c}n\\ k\end{array}}\right) - \sum _{i=1}^n b_i \left( {\begin{array}{c}n\\ i\end{array}}\right) x^i \\&=1+\sum _{k=1}^n (-1)^k\cdot x^k \cdot \left( {\begin{array}{c}n\\ k\end{array}}\right) - \sum _{i=1}^n b_i \left( {\begin{array}{c}n\\ i\end{array}}\right) x^i \\&=1+\sum _{i=1}^n x^i \left( (-1)^i\left( {\begin{array}{c}n\\ i\end{array}}\right) - b_i \left( {\begin{array}{c}n\\ i\end{array}}\right) \right) \end{aligned}$$

Using that \(b_i=ib_{i-1}+(-1)^i\), we get

$$\begin{aligned} \left( 1- x\right) ^n-\sum _{i=1}^n b_i C_i^n x^i =1+\sum _{i=1}^n x^i \left( -i\cdot b_{i-1}\left( {\begin{array}{c}n\\ i\end{array}}\right) \right) \end{aligned}$$

which means that \(\frac{\overline{\lambda }(n)}{4}\) is a root of the polynomial

$$\begin{aligned} p_n(x)=\sum _{i=1}^n i\cdot b_{i-1}\left( {\begin{array}{c}n\\ i\end{array}}\right) x^i -1. \end{aligned}$$

The first few values of the recursive sequence \(b_i\) are:

$$\begin{aligned} b_0=1, \quad b_1=0, \quad b_2= 1, \quad b_3=2, \quad b_4=9, \quad b_5=44 \end{aligned}$$

The polynomials \(p_n(x)\), the values of \(\overline{\lambda }(n)\) and 4/n for \(n=1,\ldots ,5\) are the following:

n \(p_n(x)\) \(\overline{\lambda }(n)\) 4/n
1 \(x-1\) 4 4
2 \(2x-1\) 2 2
3 \(3x^3+3x-1\) 1.2199 1.3333
4 \(8x^4+12x^3+4x-1\) 0.8624 1
5 \(45x^5+40x^4+30x^3+5x-1\) 0.6624 0.8
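The table values can be reproduced with a few lines of Python (a sketch; the function name `lam_bar` is ours, not from [21]). Since \(p_n(x)\) is strictly increasing for \(x>0\) and \(p_n(0)=-1\), a simple bisection finds its unique positive root:

```python
from math import comb

def lam_bar(n, iters=100):
    """Bound of Knight et al.: 4*x, where x is the unique positive root
    of p_n(x) = sum_i i*b_{i-1}*C(n,i)*x^i - 1."""
    b = [1]                                 # b_0 = 1
    for i in range(1, n + 1):
        b.append(i * b[-1] + (-1) ** i)     # b_i = i*b_{i-1} + (-1)^i

    def p(x):
        return sum(i * b[i - 1] * comb(n, i) * x ** i
                   for i in range(1, n + 1)) - 1

    lo, hi = 0.0, 2.0                       # p(0) = -1 < 0 <= p(2)
    for _ in range(iters):                  # bisection
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 4 * hi
```

For \(n=1,\ldots ,5\) this returns the values of \(\overline{\lambda }(n)\) listed above, and for every tested n the result stays below 4/n.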

As can be seen, the upper bound provided by Theorem 5 is slightly higher than the one given by the recursion in [21]. The following proof shows that this is true for every value of n. The main ideas of the proof are listed below:

  • Since the polynomial \(p_n(x)\) has positive coefficients (except for the constant term), it is strictly increasing for \(x>0\).

  • \(\frac{\overline{\lambda }(n)}{4}\) is a positive root of polynomial \(p_n(x)\), so \(p_n\left( \frac{\overline{\lambda }(n)}{4} \right) =0\).

  • This implies that if \(p_n\left( \frac{4/n}{4} \right) > 0\) then \(\frac{4}{n} > \overline{\lambda }(n)\).

  • So we have to show that \(p_n\left( 1/n \right) > 0\).

We have seen (and simple computations show) that for \(n=1\) and \(n=2\), \(\overline{\lambda }(n)=4/n\). Let us consider the case \(n\ge 3\); we have to prove that

$$\begin{aligned} \sum _{i=1}^n i\cdot b_{i-1}\left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i -1 >0 \qquad {\text {for}}\,{\text {every}}\;n\ge 3. \end{aligned}$$

The sequence \(b_i\) is monotone increasing for \(i\ge 1\), which can easily be proved by induction. Moreover, the value of \(i\cdot b_{i-1}\) is greater than or equal to 3 for every \(i \ge 3\):

$$\begin{aligned} i\cdot b_{i-1} \ge i\cdot b_2 \ge 3 \cdot b_2=3. \end{aligned}$$

We give a lower estimate of \(\sum _{i=1}^n i\cdot b_{i-1}\left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i\) by replacing the value of \(i\cdot b_{i-1}\) with 3 for \(i\ge 3\) and taking the exact values for \(i=1,2\):

$$\begin{aligned} \sum _{i=1}^n i\cdot b_{i-1}\left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i&\ge 3 \cdot \sum _{i=1}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i -2 \cdot \left( {\begin{array}{c}n\\ 1\end{array}}\right) \left( \frac{1}{n} \right) ^1 - 3 \cdot \left( {\begin{array}{c}n\\ 2\end{array}}\right) \left( \frac{1}{n} \right) ^2 \\&=3 \cdot \sum _{i=1}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i -2 - 3 \cdot \frac{n-1}{2n} \\&=3 \cdot \sum _{i=0}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i -5 - 3 \cdot \frac{n-1}{2n} \end{aligned}$$

Now we will prove that the last expression is greater than one:

$$\begin{aligned}&3 \cdot \sum _{i=0}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i -5 - 3 \cdot \frac{n-1}{2n}> 1 \\&\quad \Longleftrightarrow \quad 3 \cdot \sum _{i=0}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) \left( \frac{1}{n} \right) ^i> 6 + 3 \cdot \frac{n-1}{2n} \\&\quad \Longleftrightarrow \quad \left( 1+ \frac{1}{n} \right) ^n > 2.5 - \frac{1}{2n} \end{aligned}$$

which is true for every \(n\ge 3\).
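This last inequality can also be sanity-checked numerically over a finite range (a quick sketch; the analytic argument of course covers all \(n \ge 3\)):

```python
# (1 + 1/n)^n increases towards e ~ 2.718, while 2.5 - 1/(2n) increases
# towards 2.5, so the small values of n are the critical ones to check.
for n in range(3, 1001):
    assert (1 + 1 / n) ** n > 2.5 - 1 / (2 * n), n
```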

We have thus proved that this lower estimate of the sum is greater than one, which implies that \(p_n( 1/n) > 0\) for every \(n\ge 3\). As a consequence, we can conclude that \(\frac{4}{n} \ge \overline{\lambda }(n)\), with equality if and only if \(n=1\) or \(n=2\).

Appendix 2

As a kind of generalization of the findings of [3], the authors of [25] stated the theorem below about the number of equilibrium points (limits of the iteration) of a FCM (Theorem 2 of [25]), quoted literally:

‘There is one and only one solution for any concept value \(A_i\) of any FCM where the sigmoid function \(f=1/(1+e^{-c_lx})\) is used, if:

$$\begin{aligned} \left( \sum _{i=1}^n \left( c_{l_i}\ell _i \Vert w_i\Vert \right) ^2\right) ^{1/2} <1 \end{aligned}$$

where \(w_i\) is the ith row of matrix W, \(\Vert w_i\Vert\) is the \(L_2\) norm of \(w_i\), \(\ell _i\) is the inclination of function f equal to \(\ell _i=\frac{c_{l_i} }{ e^{c_{l_i}w_iA} }f^2(c_{l_i}w_i \cdot A)\), and \(c_{l_i}\) is the \(c_l\) factor of function f corresponding to \(A_i\) concept’.

In the following, we show that this theorem is not valid in its current form. First we give an upper estimate of the formula on the left-hand side of Eq. (55); then we provide numerical counterexamples, showing that the requirements stated in the theorem are fulfilled, but the FCM does not converge to a unique fixed point.

The so-called inclination parameter (or the slope) \(\ell _i=\frac{c_{l_i} }{ e^{c_{l_i}w_iA} }f^2(c_{l_i}w_i \cdot A)\) equals the derivative of the sigmoid function at point \(w_i\cdot A\):

$$\begin{aligned} \left( \frac{1}{1+e^{-c_{l_i} x} }\right) '&=\frac{ c_{l_i} e^{-c_{l_i} x } }{ \left( 1+e^{-c_{l_i} x}\right) ^2 }\\&=\frac{c_{l_i} }{ e^{c_{l_i}x} }\left( \frac{1}{1+e^{-c_{l_i} x} }\right) ^2 =\frac{c_{l_i} }{ e^{c_{l_i}x} }f^2(x) \end{aligned}$$

On the other hand, it is easy to check that

$$\begin{aligned} \left( \frac{1}{1+e^{-c_{l_i} x} }\right) '=\frac{ c_{l_i} e^{-c_{l_i} x } }{ \left( 1+e^{-c_{l_i} x}\right) ^2 }= c_{l_i} \cdot f(x) (1-f(x)) \end{aligned}$$

The maximum value of the derivative of the sigmoid function is \(c_{l_i}/4\) (attained at \(f(x)=0.5\), i.e. at \(x=0\)). Using this value, we can give an upper estimate of the left-hand side of the inequality in the theorem (here \(\Vert w_i\Vert\) is the Euclidean norm of the ith row of W, i.e. \(\Vert w_i\Vert =\left( \sum _{j=1}^n w_{ij}^2 \right) ^{1/2}\)):

$$\begin{aligned} \left( \sum _{i=1}^n \left( c_{l_i}\ell _i \Vert w_i\Vert \right) ^2\right) ^{1/2}&\le \left( \sum _{i=1}^n \left( \frac{c_{l_i}^2}{4} \Vert w_i\Vert \right) ^2\right) ^{1/2} \\&= \frac{1}{4}\left( \sum _{i=1}^n \left( c_{l_i}^2 \Vert w_i\Vert \right) ^2\right) ^{1/2} \end{aligned}$$

If \(c_{l_i}=\lambda\) for every i, it becomes:

$$\begin{aligned} \frac{1}{4}\left( \sum _{i=1}^n \left( c_{l_i}^2 \Vert w_i\Vert \right) ^2\right) ^{1/2}&= \frac{1}{4}\left( \sum _{i=1}^n \left( \lambda ^2 \Vert w_i\Vert \right) ^2\right) ^{1/2} \\&= \frac{\lambda ^2}{4}\left( \sum _{i=1}^n \Vert w_i\Vert ^2\right) ^{1/2} \end{aligned}$$

In the following, we show that there exist FCMs such that the conditions of Theorem 2 of [25] are fulfilled, but the iteration process does not lead to a unique fixed point (equilibrium point). First we show a counterexample with a limit cycle, then another counterexample with multiple fixed points. Although these are artificial counterexamples, not real-life scenarios, they show that the statement of [25] is not valid.

We applied the Bacterial Evolutionary Algorithm (BEA) [30] to find an FCM model (more specifically, a connection matrix and a parameter \(\lambda\)) that behaves in the assumed way, in order to confirm our hypothesis. BEA is a member of the well-known family of evolutionary algorithms. It is a global optimizer, which can be used if an approximate solution is acceptable. The algorithm can handle multi-modal, non-continuous, nonlinear or high-dimensional problems, but the original goal of Nawa and Furuhashi, the researchers who suggested this algorithm, was to optimize the parameters of fuzzy systems [31, 32]. BEA mimics the process of the evolution of bacteria, which explains its name. Like other evolutionary algorithms, BEA works with a collection of possible solutions, called the population of ‘bacteria’. The algorithm develops consecutive generations of populations until the termination criterion is fulfilled; in the simplest case, this can be a limit on the number of generations or on the objective value of the best bacterium. The last population, or at least its best bacteria, can be considered as the result. The current population is based on the previous one and is created by two main operators: bacterial mutation and gene transfer. The former explores the search space by random modifications of the bacteria, while the latter tries to combine the genetic information of better bacteria with worse ones, in the hope that this increases the objective (fitness) value of the worse bacteria. In other words, gene transfer performs the exploitation of the genetic data.

Bacterial mutation optimizes the bacteria one by one. First, it creates K clones of each bacterium, then it iterates over the genes of the clones in random order and modifies them. The modification is made randomly, too, and the clones are evaluated after every single gene modification. If the best clone outperforms the original bacterium, the original is replaced by this clone. At the end, all clones are dropped.

The gene transfer operator divides the population into two equally sized parts. The part containing the bacteria with better objective values is called the ‘superior half’; the other part is the ‘inferior half’. The operator chooses a bacterium from the inferior half T times and overwrites some of its randomly selected genes with the corresponding genes of another bacterium from the superior half. The modified bacterium must be re-evaluated, and the two halves must be determined again; after a successful modification, the bacterium has a chance to move to the superior half.
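The two operators described above can be sketched as follows (a simplified, illustrative reconstruction, not the authors' implementation; the gene range \([-1,1]\), the clone count `n_clones` and the transfer count `n_transfers` are our assumptions):

```python
import random

def bacterial_mutation(pop, objective, n_clones=3):
    """Improve each bacterium in turn: mutate the genes of its clones in
    random order, keeping a modification only if the best clone beats the
    best variant found so far."""
    new_pop = []
    for bact in pop:
        best = list(bact)
        clones = [list(bact) for _ in range(n_clones)]
        for g in random.sample(range(len(bact)), len(bact)):
            for clone in clones:
                clone[g] = random.uniform(-1, 1)   # random modification
            best = list(max(clones + [best], key=objective))
            for clone in clones:                   # propagate the accepted gene
                clone[g] = best[g]
        new_pop.append(best)                       # the clones are dropped
    return new_pop

def gene_transfer(pop, objective, n_transfers=4):
    """Copy random genes of superior-half bacteria into randomly chosen
    inferior-half bacteria, re-ranking the population after each transfer."""
    pop = sorted(pop, key=objective, reverse=True)
    half = len(pop) // 2
    for _ in range(n_transfers):
        src = random.choice(pop[:half])            # 'superior half'
        dst = random.choice(pop[half:])            # 'inferior half'
        g = random.randrange(len(dst))
        dst[g] = src[g]                            # overwrite one gene
        pop.sort(key=objective, reverse=True)      # re-rank the halves
    return pop
```

One generation applies bacterial mutation to every bacterium and then a number of gene transfers; since mutation never discards the best variant of a bacterium, the best objective value in the population cannot decrease.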

The authors created a computer program based on the BEA expounded above to find a connection matrix with the given properties. The bacteria represent different connection matrices. The first population is generated randomly; the following generations are created by the operators of BEA, using the objective values of the bacteria. The calculation of this value has several steps. First of all, simulations have to be performed in order to explore the behaviour of the model and to distinguish fixed-point attractors (FP), limit cycles and chaotic behaviour. A hundred-element set of initial state vectors, also called ‘scenarios’, is generated randomly when the program starts, and this set is used in all simulations.

Finally, numerous counterexamples were generated; the one below was chosen as a simple demonstration of our statement. One may observe that the non-diagonal elements of the weight matrix have the same absolute value. This is not an essential feature of the counterexamples; we simply chose a clear-cut example for demonstration.

Consider the weight matrix \(W_1 \in \mathbb {R}^{7\times 7}\) below, which has only zero elements in its diagonal. This means that the iteration rule of the corresponding FCM does not contain self-feedback.

$$\begin{aligned} W_1=\left( \begin{array}{ccccccc} 0 &{} -0.9 &{} 0.9&{} 0.9&{} -0.9&{} 0.9&{} -0.9\\ -0.9 &{} 0 &{} 0.9 &{} 0.9 &{} -0.9 &{} 0.9 &{} -0.9\\ 0.9 &{} 0.9 &{} 0 &{} -0.9&{} 0.9 &{} -0.9&{} 0.9\\ 0.9&{} 0.9&{} -0.9&{} 0&{} 0.9 &{} -0.9 &{} 0.9\\ -0.9 &{} -0.9&{} 0.9&{} 0.9&{} 0&{} 0.9&{} -0.9\\ 0.9 &{} 0.9 &{}-0.9&{} -0.9&{} 0.9&{} 0&{} 0.9\\ -0.9 &{} -0.9 &{} 0.9&{} 0.9&{} -0.9&{} 0.9&{} 0 \end{array} \right) \end{aligned}$$

If \(\lambda =0.8\), then \(\frac{\lambda ^2}{4}\left( \sum _{i=1}^n \Vert w_i\Vert ^2\right) ^{1/2}=0.9332\), which means that this scenario meets the condition in Eq. (55). Contrary to the statement of Theorem 2 of [25], the iteration produces a limit cycle, not an equilibrium point (the initial activation vector was \(A(0)=[1\, 1\, 1\, 1\, 1\, 1\, 1]^{\mathrm{T}}\)). This limit cycle has two elements:

$$\begin{aligned} LC_1 =\left[ \begin{array}{ccccccc} 0.39897 \\ 0.39897 \\ 0.78157\\ 0.78157\\ 0.39897\\ 0.78157\\ 0.39897 \end{array} \right] \qquad LC_2 =\left[ \begin{array}{ccccccc} 0.69559 \\ 0.69559 \\ 0.50590 \\ 0.50590\\ 0.69559\\ 0.50590\\ 0.69559 \end{array} \right] \end{aligned}$$
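The limit cycle can be reproduced in a few lines of NumPy (a sketch; we assume the plain update rule \(A(k)=f(W_1 A(k-1))\) with \(f(x)=1/(1+e^{-\lambda x})\), which matches the reported cycle values):

```python
import numpy as np

lam = 0.8
# W1 from above: zero diagonal, off-diagonal entries +/- 0.9
W1 = 0.9 * np.array([
    [ 0, -1,  1,  1, -1,  1, -1],
    [-1,  0,  1,  1, -1,  1, -1],
    [ 1,  1,  0, -1,  1, -1,  1],
    [ 1,  1, -1,  0,  1, -1,  1],
    [-1, -1,  1,  1,  0,  1, -1],
    [ 1,  1, -1, -1,  1,  0,  1],
    [-1, -1,  1,  1, -1,  1,  0],
], dtype=float)

def step(A):
    # A(k) = f(W1 A(k-1)),  f(x) = 1 / (1 + exp(-lam * x))
    return 1.0 / (1.0 + np.exp(-lam * (W1 @ A)))

A = np.ones(7)               # A(0) = [1 1 1 1 1 1 1]^T
for _ in range(1000):        # let the transient die out
    A = step(A)
```

After the transient, the state alternates between the two vectors above: applying `step` twice returns the state to itself, while a single step moves it substantially, so this is a period-two limit cycle and not a fixed point.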

Now let us see an example with multiple fixed points. The weight matrix is

$$\begin{aligned} W_2=\left( \begin{array}{cccccccc} 0 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1 \\ -1 &{} 0 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 1 \\ 1 &{} -1 &{} 0 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1\\ -1 &{} 1 &{} -1 &{} 0 &{} -1 &{} 1 &{} -1 &{} 1 \\ 1 &{} -1 &{} 1 &{} -1 &{} 0 &{} -1 &{} 1 &{} -1 \\ -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 0 &{} -1 &{} 1 \\ 1 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 0 &{} -1 \\ -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 1 &{} -1 &{} 0 \end{array} \right) \end{aligned}$$

If \(\lambda =0.7\), then \(\frac{\lambda ^2}{4}\left( \sum _{i=1}^n \Vert w_i\Vert ^2\right) ^{1/2}=0.9167\), which means that this scenario also meets the condition in Eq. (55). On the other hand, if the initial activation vector is \(A^{(0)}=[0,0,0,0,0,0,0,0]^{\mathrm{T}}\), then the iteration converges to \(FP_1\); if \(A^{(0)}=[1,0,0,0,0,0,0,0]^{\mathrm{T}}\), then the iteration converges to \(FP_2\); and for \(A^{(0)}=[0,0,0,0,0,0,0,1]^{\mathrm{T}}\) the limit is \(FP_3\) (results are rounded to four decimals):

$$\begin{aligned} FP_1 =\left[ \begin{array}{ccccccc} 0.4260 \\ 0.4260 \\ 0.4260\\ 0.4260\\ 0.4260\\ 0.4260\\ 0.4260 \\ 0.4260 \end{array} \right] \, FP_2 =\left[ \begin{array}{ccccccc} 0.7847 \\ 0.1266 \\ 0.7847 \\ 0.1266 \\ 0.7847 \\ 0.1266 \\ 0.7847 \\ 0.1266 \\ \end{array} \right] \, FP_3 =\left[ \begin{array}{ccccccc} 0.1266 \\ 0.7847 \\ 0.1266 \\ 0.7847 \\ 0.1266 \\ 0.7847 \\ 0.1266 \\ 0.7847 \end{array} \right] \end{aligned}$$
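Again, a short NumPy sketch (assuming the same update rule \(A(k)=f(W_2 A(k-1))\) as above) reproduces the three fixed points:

```python
import numpy as np

lam = 0.7
# W2: zero diagonal, w_ij = (-1)^(i+j) otherwise (the alternating pattern above)
W2 = np.array([[0.0 if i == j else (-1.0) ** (i + j)
                for j in range(8)] for i in range(8)])

def step(A):
    # A(k) = f(W2 A(k-1)),  f(x) = 1 / (1 + exp(-lam * x))
    return 1.0 / (1.0 + np.exp(-lam * (W2 @ A)))

def limit(A, iters=100):
    for _ in range(iters):
        A = step(A)
    return A

fp1 = limit(np.zeros(8))    # approx. 0.4260 in every component
fp2 = limit(np.eye(8)[0])   # approx. alternating 0.7847 / 0.1266
fp3 = limit(np.eye(8)[7])   # approx. alternating 0.1266 / 0.7847
```

Three different initial vectors lead to three different equilibria, so the FCM cannot be globally asymptotically stable.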

A rigorous mathematical analysis leads to the conclusion that the weakness of the statement of [25] comes from applying the contraction property with local values of the derivative, where a global upper bound should have been used. Moreover, an algebraic mistake also occurred in the derivation of the result, which caused the parameter of the sigmoid function (\(c_{l_i}\)) to appear twice.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit



Cite this article

Harmati, I.Á., Hatwágner, M.F. & Kóczy, L.T. Global stability of fuzzy cognitive maps. Neural Comput & Applic (2021).

Keywords
  • Fuzzy cognitive map (FCM)
  • Stability
  • Fixed point
  • Convergence of fuzzy cognitive map
  • Bacterial evolutionary algorithm