Linearizing layers of radial binary classifiers with movable centers
Abstract
Ranked layers of binary classifiers are used for the linearization of learning sets composed of multivariate feature vectors. After transformation by a ranked layer, each learning set can be separated by a hyperplane from the sum of the other learning sets. Ranked layers can be designed, among others, from radial binary classifiers. This work elaborates on designing ranked layers from radial binary classifiers with movable centers.
Keywords
Ranked layers, Linear separability, Radial binary classifiers, Movable centers
1 Introduction
Learning sets in classification problems contain examples of objects assigned to particular categories (classes). Objects are typically represented in a standardized manner by multivariate feature vectors of the same dimension. Binary classifiers transform feature vectors into numbers equal to one or zero. Classifiers can be designed based on learning data sets according to various pattern recognition methods [1, 2].
A layer of binary classifiers aggregates input data sets if many feature vectors are transformed into the same output vector with binary components. The aggregation is separable if and only if only feature vectors belonging to the same class are aggregated into a single output vector. Ranked layers allow learning sets to be aggregated in a linearly separable manner [3]. This means that, after transformation by the ranked layer, each of the learning sets can be separated from the sum of the other learning sets by a hyperplane. Linearly separable aggregation plays a special role in pattern recognition methods based on neural network models. In particular, this concept is important in the perceptron model based on formal neurons [4].
The linear separability of learning sets is also important in support vector machines (SVM), one of the most popular methods in data mining [5, 6]. An essential part of SVM methods is the induction of linear separability through kernel functions. Selecting appropriate kernel functions remains an open and difficult problem in many practical applications of support vector machines. Ranked layers can be treated as a useful alternative to the kernel function technique in the SVM context.
A family of K disjoint learning sets can always be transformed into K linearly separable sets by a ranked layer of formal neurons, as proved in [7]. This result was extended to ranked layers of arbitrary binary classifiers in [3]. A procedure for designing ranked layers from radial binary classifiers was proposed in [8]. An extension of this procedure to radial binary classifiers with movable centers is discussed in this work.
2 Separable and linearly separable learning sets
Let us assume that each object \(O_j\) \((j=1,\ldots,m)\) is represented in a standard manner by a feature vector \({\mathbf{x}}_j[n]=[x_{j1},\ldots,x_{jn}]^T\) belonging to the n-dimensional feature space \(F[n]\) (\({\mathbf{x}}_j[n]\in F[n]\)). Each feature vector \({\mathbf{x}}_j[n]\) can be treated as a point of the feature space \(F[n]\). The components \(x_{ji}\) of the feature vector \({\mathbf{x}}_j[n]\) are expected to be numerical results of n standardized examinations of a given object \(O_j\) related to particular features \(x_i\) \((i=1,\ldots,n)\) (\(x_{ji}\in \{0,1\}\) or \(x_{ji}\in R\)). In practice, we can often assume that the feature space \(F[n]\) is equal to the n-dimensional real space \(R^n\) (\(F[n]=R^n\)).
The learning set \(C_k\) contains \(m_k\) feature vectors \({\mathbf{x}}_j[n]\) assigned to the \(k\)th category \(\omega _k\). The assignment of feature vectors \({\mathbf{x}}_j[n]\) to particular categories \(\omega _k\) can be seen as additional knowledge in the classification problem [1].
Definition 1
Definition 2
The feature vector \({\mathbf{x}}_j[n]\) is situated on the positive side of the hyperplane \(H({\mathbf{w}}_k[n],\theta _k)\) (3) if and only if \({\mathbf{w}}_k[n]^T{\mathbf{x}}_j[n]>\theta _k\). Similarly, the vector \({\mathbf{x}}_j[n]\) is situated on the negative side of \(H({\mathbf{w}}_k[n],\theta _k)\) if and only if \({\mathbf{w}}_k[n]^T{\mathbf{x}}_j[n]<\theta _k\).
Definition 3
If the inequalities (4) hold, then all vectors \({\mathbf{x}}_j[n]\) from learning set \(C_k\) are situated on the positive side of hyperplane \(H({\mathbf{w}}_k[n],\theta _k)\) (3) and all vectors \({\mathbf{x}}_j[n]\) from the remaining sets \(C_i\) are situated on the negative side of this hyperplane.
3 Radial binary classifiers
The Euclidean distance function (7) is used to design radial classifiers.
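For illustration, a minimal Python sketch of a radial binary classifier \(RC({\mathbf{w}}_i[n],\rho _i)\) follows. Since the defining equations (5)-(7) are not reproduced here, the open-ball convention (output 1 strictly inside the ball, 0 otherwise) and the function names are our assumptions:

```python
import numpy as np

def radial_classifier(center, rho):
    """Radial binary classifier RC(center, rho): returns a function that
    maps a feature vector x to 1 if x lies inside the open Euclidean
    ball B(center, rho) and to 0 otherwise (open-ball convention assumed)."""
    center = np.asarray(center, dtype=float)

    def rc(x):
        # Euclidean distance between x and the classifier's center
        return int(np.linalg.norm(np.asarray(x, dtype=float) - center) < rho)

    return rc

rc = radial_classifier([0.0, 0.0], 1.5)
print(rc([1.0, 1.0]))  # distance sqrt(2) ≈ 1.414 < 1.5, so 1
print(rc([2.0, 0.0]))  # distance 2.0 >= 1.5, so 0
```

The closure form mirrors the fact that each classifier in a layer is fully determined by its center and radius.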
4 Layers of radial binary classifiers
Definition 4
Definition 5
Each linearly separable (12) layer of binary classifiers \(RC({\mathbf{w}}_i[n],\rho _i)\) (6) is also a separable layer (10).
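To make the notion of aggregation by a layer tangible, here is a small Python sketch (the helper name `layer_transform` and the toy threshold classifiers are ours): a layer maps each feature vector to a binary output vector with one component per classifier, and distinct inputs may aggregate onto the same output vector.

```python
import numpy as np

def layer_transform(classifiers, X):
    """Transform each feature vector in X by a layer of binary
    classifiers: one 0/1 output component per classifier."""
    return np.array([[clf(x) for clf in classifiers] for x in X])

# Two toy binary classifiers (coordinate thresholds; illustrative only).
clfs = [lambda x: int(x[0] > 0), lambda x: int(x[1] > 0)]
out = layer_transform(clfs, [[1, -1], [2, -3], [-1, 1]])
print(out)  # the first two inputs aggregate onto the same output vector [1, 0]
```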
5 Designing ranked layers of radial binary classifiers
Definition 6
The open Euclidean ball \(B_j({\mathbf{x}}_j[n],\rho _j)\) (13) is homogeneous with respect to the learning sets \(C_k\) (1) if all feature vectors \({\mathbf{x}}_j[n]\) it contains belong to only one of these sets. The ball \(B_j({\mathbf{x}}_j[n],\rho _j)\) is not homogeneous if it contains feature vectors from more than one learning set \(C_k\) (1).
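The homogeneity test of Definition 6 can be sketched in a few lines of Python (function and parameter names are our assumptions): the open ball is homogeneous when the vectors it contains carry at most one class label.

```python
import numpy as np

def is_homogeneous(center, rho, X, labels):
    """True iff the open Euclidean ball B(center, rho) contains feature
    vectors of at most one class from the labelled set (X, labels)."""
    X = np.asarray(X, dtype=float)
    inside = np.linalg.norm(X - np.asarray(center, dtype=float), axis=1) < rho
    return len({labels[j] for j in np.flatnonzero(inside)}) <= 1

X = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
labels = [0, 0, 1]
print(is_homogeneous([0.0, 0.0], 1.0, X, labels))  # True: only class 0 inside
print(is_homogeneous([0.0, 0.0], 4.0, X, labels))  # False: both classes inside
```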

Stage 1. (Initialization)
Put \(l = 1\) and define the sets \(D_k(l)\): \((\forall k\in \{1,\ldots ,K\})\ D_k(1)=C_k\) (1).
Stage 2. (Optimal homogeneous ball \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) (13))
Find the parameters \(k^*\), \(j^*\) and \(\rho _{j^*}\) of the reduced data set \(D_{k^*}(l)\) and of the optimal homogeneous ball \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) (13). The parameter \(k^*\) (\(k^*\in \{1,\ldots ,K\}\)) defines the index \(k(l)\) of the data set \(D_{k^*}(l)\) reduced during the \(l\)th step:
$$k(l)=k^* \quad (18)$$
The parameters \(j^*\) and \(\rho _{j^*}\) define the reducing ball \(B_l({\mathbf{x}}_{j(l)}[n],\rho _{j(l)})\) (13) during the \(l\)th step:
$$j(l)=j^* \quad (19)$$
and
$$\rho _{j(l)}(l)=\rho _{j^*} \quad (20)$$
Stage 3. (Reduction of the set \(D_{k^*}(l)\))
Remove the feature vectors \({\mathbf{x}}_j[n]\) contained in the optimal ball \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) (13):
$$D_{k^*}(l+1)=D_{k^*}(l)\setminus \{{\mathbf{x}}_j[n]:{\mathbf{x}}_j[n]\in B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\ (13)\}$$
$${\text{and }} (\forall k\in \{1,\ldots ,K\},\ k\ne k^*)\ D_k(l+1)=D_k(l) \quad (21)$$
Stage 4. (Stop criterion)
if all data sets \(D_k(l+1)\) are empty, then stop
else increase the index \(l\) by one (\(l\rightarrow l+1\)) and go to Stage 2.
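Under our reading of Stages 1-4, the procedure can be sketched as a greedy loop. This is a simplification with assumed details: the radius of each candidate ball is taken as the distance from its center to the nearest feature vector of another class (the largest homogeneous open ball), and the learning sets are assumed separable.

```python
import numpy as np

def ranked_layer(X, y):
    """Greedy sketch of the ranked layer design procedure: repeatedly
    pick the homogeneous ball, centered at a remaining feature vector,
    that covers the most remaining vectors of its own class (Stage 2),
    remove those vectors (Stage 3), and stop when every set is empty
    (Stage 4).  Separable learning sets are assumed."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    remaining = np.ones(len(X), dtype=bool)
    layer = []  # one (center, radius, class) triple per step l
    while remaining.any():
        best = None
        for j in np.flatnonzero(remaining):
            other = y != y[j]
            # largest open ball centered at x_j excluding all other classes
            rho = np.linalg.norm(X[other] - X[j], axis=1).min() if other.any() else np.inf
            inside = np.linalg.norm(X - X[j], axis=1) < rho
            covered = np.flatnonzero(remaining & (y == y[j]) & inside)
            if best is None or len(covered) > len(best[3]):
                best = (X[j], rho, y[j], covered)
        center, rho, cls, covered = best
        layer.append((center, rho, cls))
        remaining[covered] = False
    return layer

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
y = [0, 0, 0, 1, 1]
print(len(ranked_layer(X, y)))  # 2 balls suffice here, i.e. L = K (cf. Lemma 1)
```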
Remark 1
Each radial binary classifier \(RC({\mathbf{w}}_i[n],\rho _i)\) (5) added to the layer in accordance with the procedure (17) reduces (21) the data set \(D_{k^*}(l)\) by at least one feature vector \({\mathbf{x}}_{j^*}[n]\).
It can be proved on the basis of Remark 1 that if the learning sets \(C_k\) (1) are separable (2), then the procedure stops after a finite number \(L\) of steps. The following lemma results [8]:
Lemma 1
The minimal number \(L=K\) of radial binary classifiers \(RC({\mathbf{x}}_{j(l)}[n],\rho _{j(l)})\) (5) appears in the ranked layer when whole learning sets \(C_k\) (1) are reduced (21) in successive steps \(l\). The maximal number \(L=m\) of elements appears in the ranked layer when only single elements \({\mathbf{x}}_{j}[n]\) are reduced in successive steps \(l\).
Theorem 1
Proof
The arguments formulated in works [3] and [7] have been used in the above proof of Theorem 1.
The procedure of ranked layer designing (17) allows one to generate a sequence of optimal homogeneous balls \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) (13). The procedure (17) is stopped once each feature vector \({\mathbf{x}}_j[n]\) is located in an optimal ball \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\).
6 Radial binary classifiers with movable centers
The procedure of ranked layer designing (17) involves the search (Stage 2) for the optimal homogeneous balls \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) (13). Each optimal ball \(B_{j^*}({\mathbf{x}}_{j^*}[n],\rho _{j^*})\) should be distinguished by a large number \(M_{j^*}\) (16) of feature vectors \({\mathbf{x}}_j[n]\) from one of the \(K\) learning sets \(C_k\) (1).
Remark 2
Remark 3
The ball \(B_{j'}({\mathbf{x}}_{j'}[n],\rho _{j'})\) (13) with center at the point \({\mathbf{c}}_{j'}[n]\) (\({\mathbf{c}}_{j'}[n]={\mathbf{x}}_{j'}[n]\)) and radius \(\rho _{j'}\) (28) contains the maximal number \(M_{j'}\) of feature vectors \({\mathbf{x}}_j[n]\) among all homogeneous balls \(B_{j'}({\mathbf{x}}_{j'}[n],\rho _{j'})\) (13) centered at this point.
7 The procedure of displacements based on averaging
Working supposition: If the coefficient \(K\) in the enlarged ball \(B_j({\mathbf{c}}_j[n],K \rho _{j'})\) (31) is not excessively large, then the homogeneous ball \(B_1({\mathbf{m}}_k(1), \rho _{j(1)})\) (33) obtained at the end (36) of the procedure contains no fewer elements \({\mathbf{x}}_j[n]\) than the initial homogeneous ball \(B_j({\mathbf{c}}_j[n],\rho _{j'})\) (13).
The enlarged ball \(B_j({\mathbf{c}}_j[n],K \rho _j)\) (31) was used in the above description of the procedure. In a more general formulation, this procedure can be started from any heterogeneous subset of feature vectors \({\mathbf{x}}_j[n]\) (1).
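A single pass of the averaging-based displacement can be sketched as follows. The details are our assumptions, since the numbered equations (31)-(36) are not reproduced here: the new center is taken as the mean of the target-class vectors inside the ball enlarged by the factor \(K\), and the new radius is then re-shrunk to the largest homogeneous value, i.e. the distance to the nearest foreign vector.

```python
import numpy as np

def averaging_displacement(center, rho, K, X, y, target):
    """One averaging step: enlarge B(center, rho) by the factor K, move
    the center to the mean of the target-class vectors inside the
    enlarged ball, and return the largest radius keeping the new ball
    homogeneous (distance to the nearest vector of another class)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    d = np.linalg.norm(X - np.asarray(center, dtype=float), axis=1)
    own = (d < K * rho) & (y == target)
    new_center = X[own].mean(axis=0)
    d_new = np.linalg.norm(X - new_center, axis=1)
    foreign = y != target
    new_rho = d_new[foreign].min() if foreign.any() else np.inf
    return new_center, new_rho

X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
y = [0, 0, 1]
c, r = averaging_displacement([0.0, 0.0], 1.0, 3.0, X, y, target=0)
print(c, r)  # center moves to [1. 0.], radius grows to 9.0
```

The recentering step is analogous to one iteration of the \(k\)-means center update, restricted to the enlarged ball.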
8 Procedures of radial displacements
Lemma 2
If each feature vector \({\mathbf{x}}_j[n]\) (1) situated on the positive side (39) of tangent hyperplane \(H({\mathbf{x}}_{j(a)}[n]; {\mathbf{x}}_{j(b)}[n])\) (43) belongs to the same learning set \(C_k\) as central vector \({\mathbf{x}}_{j(a)}[n]\) (\({\mathbf{x}}_{j(a)}[n]\in C_k\)) of the homogeneous ball \(B({\mathbf{x}}_{j(a)}[n];{\mathbf{x}}_{j(b)}[n])\) (37), then the enlarged ball \(B({\mathbf{x}}_{\alpha }[n]; {\mathbf{x}}_{j(b)}[n])\) (42) is homogeneous for an arbitrarily large value of parameter \(\alpha\) (\(\alpha \ge 1\)).
Lemma 2 can be proved by geometric considerations. The lemma can be reformulated as below by using condition (45).
Lemma 3
If each feature vector \({\mathbf{x}}_j[n]\) (1), which does not belong to set \(C_k\), fulfills the condition (45), then the enlarged ball \(B({\mathbf{x}}_{\alpha }[n];{\mathbf{x}}_{j(b)}[n])\) (42) is homogeneous for arbitrarily large values of parameter \(\alpha\) (\(\alpha \ge 1\)).
Remark 4
If the parameter \(\beta\) is greater than a certain threshold \(\beta _t\) (\(\beta _t\ge 0\)), then the relation (44) is fulfilled and the enlarged ball \(B({\mathbf{x}}_{\alpha }[n];{\mathbf{x}}_{j(b)}[n])\) (42) is homogeneous for arbitrarily large values of the parameter \(\alpha\) (\(\alpha \ge 1\)) (Lemma 2).
Enlargement of the homogeneous ball \(B({\mathbf{x}}_{j(a)}[n]; {\mathbf{x}}_{j(b)}[n])\) (37) is aimed at increasing the number \(M_j\) of feature vectors \({\mathbf{x}}_j[n]\) from the learning set \(C_k\) (1) contained in this ball. Shifting (46) of the tangent hyperplane \(H({\mathbf{x}}_{j(a)}[n];{\mathbf{x}}_{j(b)}[n])\) (38) is also done for this purpose.
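The radial enlargement of Lemma 2 can be sketched as follows, under our reading of the ball \(B({\mathbf{x}}_{\alpha }[n];{\mathbf{x}}_{j(b)}[n])\) (42): the center slides from \({\mathbf{x}}_{j(a)}[n]\) away from the tangent point \({\mathbf{x}}_{j(b)}[n]\) along the line through both points, and the radius grows so that the ball stays tangent at \({\mathbf{x}}_{j(b)}[n]\). The parameterization is an assumption on our part.

```python
import numpy as np

def enlarged_ball(x_a, x_b, alpha):
    """Radially enlarged ball B(x_alpha; x_b), sketched: the center
    x_alpha moves from the original center x_a away from the tangent
    point x_b along the line through both points, and the radius grows
    so that the boundary still passes through x_b.  alpha = 1 recovers
    the original ball; alpha >= 1 enlarges it."""
    x_a, x_b = np.asarray(x_a, dtype=float), np.asarray(x_b, dtype=float)
    x_alpha = x_b + alpha * (x_a - x_b)
    radius = alpha * np.linalg.norm(x_a - x_b)  # tangency at x_b preserved
    return x_alpha, radius

c, r = enlarged_ball([1.0, 0.0], [0.0, 0.0], alpha=3.0)
print(c, r)  # center [3. 0.], radius 3.0; boundary still passes through x_b
```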
9 Strategies for designing linearizing layers
Remark 5
In accordance with Designing postulate I (14), the set \(R_{k(l)}(l-1)\) (49) should be the greatest. This means that, in the context of radial binary classifiers, the optimal ball \(B_l({\mathbf{x}}_l[n], \rho _l)\) (13) with center \({\mathbf{x}}_l[n]\) and radius \(\rho _l\) should contain the greatest number of elements \({\mathbf{x}}_j[n]\) of the reduced learning set \(C_{k(l)}(l-1)\). The postulate (14) is an example of a greedy strategy aimed at designing a ranked layer with a great power of generalization. The procedure (17) of ranked layer designing includes Designing postulate I (14) within Stage 2. Designing postulate II (26) is somewhat more general than Designing postulate I (14). Postulate II (26) can lead beyond the greedy strategy, but efficient computational procedures for it are so far lacking.
Both the procedure of displacements based on averaging and the procedure of radial displacements of the homogeneous ball \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13) can be used to obtain a ranked layer with a great power of generalization. The two procedures can be applied alternatively to particular balls \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13). This means that for some ball \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13), the displacement procedure based on averaging will produce the best results, while for a different ball \(B_{j'}({\mathbf{x}}_{j'}[n],\rho _{j'})\) (13), better results can be achieved using the radial displacement procedure. Typically, the best result means a modified ball, for example the homogeneous ball \(B({\mathbf{x}}_{\alpha }[n]; {\mathbf{x}}_{j(b)}[n])\) (42), containing a large number of elements \({\mathbf{x}}_j[n]\) of one of the sets \(D_{k(l)}(l)\) (49).
A key issue remains. Which of the homogeneous balls \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13) should be subjected to individual procedures of displacements (30)? A variety of strategies can be proposed for the selection of one or more homogeneous balls \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13) and the appropriate technique to modify these balls. However, this issue requires further study.
10 Experimental results
To demonstrate the particular steps of ranked layer designing, the results of four experiments are presented. The first experiment was performed on artificial data sets with normal distributions. In the second experiment, data sets with a ring structure were used. The third experiment was carried out on the well-known and well-understood Iris data set [1], and finally three data sets from the UCI repository were chosen.
10.1 Experiment 1
In the first experiment, the procedure of radial displacements (25) of the homogeneous balls \(B_j({\mathbf{x}}_j[n], \rho _j)\) (13) was used. This procedure was applied to learning sets generated in accordance with the normal model [9]. Objects belonging to two categories were randomly generated from populations with normal distributions with mean vectors \(\mu _1=[0,0]\) and \(\mu _2=[3,1]\), respectively, and the same covariance matrix \(\Sigma _1=\Sigma _2\), with the variance of the first feature equal to 2.4, the variance of the second equal to 2.0, and the correlation coefficient equal to 0.9.
In the second step, the initial \(B_2\) ball is centered at the point [4.11, 0.26], with a radius of 2.14. The final center in the second step is \({\mathbf{c}}_2\) = [23.71, −21.87], and the radius is enlarged to \(\rho _2\) = 31.27. The number of correctly classified objects in category 2 increases from 61 to 99.
The center of the final \(B_3\) ball is \({\mathbf{c}}_3\) = [102.33, 148.19], while the initial center is situated at [2.33, 2.46]. Using the movable centers procedure, the radius of the final \(B_3\) ball increases from 5.48 to \(\rho _3\) = 180.47. With both the initial and the final settings, there are 17 correctly classified category 1 objects.
In the fourth step, the initial center [−2.01, −2.67] is displaced to the point \({\mathbf{c}}_4\) = [−102.01, −52.14]. The initial radius equaled 1.3833 and was enlarged to \(\rho _4\) = 112.699. Twelve category 1 observations are correctly classified.
In the last step, the one remaining object from the second category is classified using the \(B_5\) ball with center \({\mathbf{c}}_5\) = [−0.77, −2.06] and radius 1.
10.2 Experiment 2
In the first step, the center of the homogeneous ball \(B_1\) is located at the point \({\mathbf{c}}_1\) = [−0.09, 0.05] with the radius equal to \(\rho _1\) = 0.06. Forty-two inner ring observations can be correctly classified using this classifier. In the second step, the homogeneous ball \(B_1({\mathbf{c}}_1[n], \rho _1)\) (13) is enlarged to the heterogeneous ball \(B_1({\mathbf{c}}_1[n],K\rho _1)\), with coefficient K greater than one; \(K=2\) is assumed. Inside the ball, there are 66 inner ring objects and 74 second category objects. The center is displaced by averaging the feature vectors inside the ball; the new center is moved to the point \({\mathbf{c}}_2\) = [−0.09, 0.03]. The center correction is analogous to that in the \(k\)-means method. In the last step, the radius is decreased to \(\rho _2\) = 0.08. Finally, all 66 objects forming the inner ring are correctly classified. The remaining 115 outer ring objects are correctly classified using the radial classifier \(B_3({\mathbf{c}}_3[n],\rho _3)\) with center \({\mathbf{c}}_3\) = [−0.10, 0.02] and radius \(\rho _3\) = 0.15.
10.3 Experiment 3
Results for the Iris data set (\({\mathbf{c}}_i\): center of the \(B_i\) ball, \(\rho _i\): radius of the \(B_i\) ball, \(m_i\): number of objects classified by the \(B_i\) ball)
Step \(i\)  Ball center \({\mathbf{c}}_{i}\)  Radius \(\rho _i\)  \(m_i\)  Category 
1  (5.1, 103.5, −158.6, −89.8)  211.136  50  Iris setosa 
2  (9.7, 3.1, −1.5, −2.5)  8.398  48  Iris versicolor 
3  (7.5, 3.7, 6.4, 2.7)  2.460  44  Iris virginica 
4  (−95.1, −67.5, −25.5, −8.3)  127.215  6  Iris virginica 
5  (6.0, 2.7, 5.1, 1.6)  0.625  2  Iris versicolor 
The results are presented in Table 1. Five steps were needed to classify the objects belonging to the three classes. In the first step, the whole Iris setosa category was perfectly classified by the \(B_1\) ball with center \({\mathbf{c}}_1\) = [5.1, 103.5, −158.6, −89.8] and the enlarged radius \(\rho _1\) = 211.14. In the second step, 48 objects belonging to the Iris versicolor category were classified by the \(B_2\) ball with center \({\mathbf{c}}_2\) = [9.7, 3.1, −1.5, −2.5] and radius \(\rho _2\) = 8.39. In the next two steps, 44 and 6 objects belonging to the Iris virginica category were classified by the \(B_3\) and \(B_4\) balls (\({\mathbf{c}}_3\) = [7.5, 3.7, 6.4, 2.7], \(\rho _3\) = 2.46 and \({\mathbf{c}}_4\) = [−95.1, −67.5, −25.5, −8.3], \(\rho _4\) = 127.22). In the last step, the two remaining objects belonging to the Iris versicolor category were correctly classified by the \(B_5\) ball (\({\mathbf{c}}_5\) = [6.0, 2.7, 5.1, 1.6], \(\rho _5\) = 0.63).
10.4 Experiment 4
In the last experiment, data sets from the UCI repository were chosen. The first data set (Yeast) contains data on protein localization sites in yeast, based on several biostatistical tests. The number of objects is 1484, and each object is described by eight numerical attributes and the class label (ten classes). The objective of the second data set (E. coli) is similar: to predict the cellular localization sites of proteins. The data set contains 336 instances described by seven numerical attributes and the class label. There are eight classes. The third chosen data set is BreastTissue. It presents electrical impedance measurements in samples of freshly excised breast tissue. One hundred and six instances, nine numerical attributes and the class label are available.
Results for the chosen data sets (m: number of objects, n: number of attributes, K: number of classes, \(Q_{RLRBC}\): accuracy for the ranked layers of radial binary classifiers approach, \(Q_{SVMRBF}\): accuracy for the SVM with RBF kernel approach)
Data set  m  n  K  \(Q_{RLRBC}\)  \(Q_{SVMRBF}\) 
Yeast  1484  8  10  0.51  0.56 
E. coli  336  7  8  0.79  0.76 
BreastTissue  106  9  6  0.45  0.54 
In the case of the Yeast data set, the accuracy for the ranked layers of the radial binary classifiers method, as well as for the movable centers approach, was 0.51. The data set is complex and the number of classes is high. The unevenly distributed classes and the lack of separability between objects from different classes account for the low accuracy. The best result (Q = 0.56) was obtained using the SVM with RBF kernel approach.
The accuracy for the ranked layers of the radial binary classifiers for the E. coli data set was 0.79. The movable centers approach gave slightly better results (Q = 0.80), which was the highest accuracy among the compared methods. An accuracy of 0.76 was obtained for the SVM with RBF kernel approach.
For the BreastTissue data set, the accuracy for the ranked layers of the radial binary classifiers was \(Q = 0.45\), while for the SVM with RBF kernel approach, it was \(Q = 0.54\).
The method is new and is still being researched, both to improve its quality and to determine the scope of its applicability. In our opinion, these results are encouraging for further work on optimizing the strategy of ranked layer designing.
11 Concluding remarks
The ranked layer of binary classifiers allows one to transform separable learning sets into sets that are linearly separable. The problem of learning set linearization is important, for example, in the context of support vector machine (SVM) techniques [5]. The linearization of learning sets in the SVM approach, done through a search for appropriate kernel functions, is not always successful.
The procedure of designing ranked layers from formal neurons was first described in [7]. In this approach, the ranked layer was designed using hyperplanes in the feature space. The basis exchange algorithms, which are similar to linear programming, allow one to find optimal hyperplane parameters efficiently, even in the case of large multidimensional data sets.
A computationally straightforward procedure for building ranked layers from optimal homogeneous balls was described in [8]. This procedure is based on an exhaustive examination of homogeneous balls centered at all feature vectors contained in the learning sets.
An extension of the procedure of ranked layer designing using radial binary classifiers with movable centers has been proposed and discussed in this work. In particular, center movements based on averaging and radial displacements of the open homogeneous balls were proposed and examined. There are still many problems with this approach, but the results achieved so far are encouraging for further research and applications.
References
1. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
2. Fukunaga K (1972) Introduction to statistical pattern recognition. Academic Press, Waltham
3. Bobrowski L (2011) Induction of linear separability through the ranked layers of binary classifiers. In: Engineering applications of neural networks, IFIP advances in information and communication technology. Springer, Berlin, pp 69–77
4. Rosenblatt F (1962) Principles of neurodynamics. Spartan Books, Washington
5. Vapnik VN (1998) Statistical learning theory. Wiley, New York
6. Hand DJ, Mannila H (2001) Principles of data mining. MIT Press, Cambridge
7. Bobrowski L (1991) Design of piecewise linear classifiers from formal neurons by some basis exchange technique. Pattern Recognit 24(9):863–870
8. Bobrowski L, Topczewska M (2013) Separable linearization of learning sets by ranked layer of radial binary classifiers. In: Burduk R et al (eds) Proceedings of the 8th international conference on computer recognition systems CORES 2013, AISC 226. Springer, Switzerland, pp 131–140
9. Johnson RA, Wichern DW (1991) Applied multivariate statistical analysis. Prentice-Hall, Englewood Cliffs
10. Bobrowski L (2007) Almost separable data aggregation by layers of formal neurons. In: 13th international conference KDS'2007: Knowledge-Dialogue-Solution, Varna, June 18–24, 2007. International Journal on Information Theories and Applications, Sofia, pp 34–41
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.