# Influence of graphical weights’ interpretation and filtration algorithms on generalization ability of neural networks applied to digit recognition


## Abstract

In this paper, a method for the graphical interpretation of single-layer network weights is introduced. It is shown that the network parameters can be converted to an image whose pixels are the particular weight elements. For this purpose, a weight-to-pixel conversion formula is used. Moreover, a new weight modification method is proposed: the weight coefficients are computed on the basis of pixel values to which image filtration algorithms have been applied. The approach is applied to the weights of three types of models: a single-layer network, a two-layer backpropagation network and a hybrid network. The performance of the models is then compared on two independent data sets. The experiments show that adjusting the weights to the new values decreases the test error compared to the error obtained for the initial set of weights.

### Keywords

Weights · Neural network · Filtration · Digit recognition

## 1 Introduction

The character recognition problem has enjoyed great attention for a few decades. Back in 1972, a way of automatically recognizing handwritten characters was already described [1]. In the mid-eighties, a method of "learning" character sets and various feature extraction techniques were proposed [2]. Machine learning techniques, such as neural networks, have played a very important role in this domain [3]. In 1990, the application of a backpropagation neural network to the recognition of handwritten US Postal Service zip-codes was presented [4]. The input data differed significantly in writing style, character size, overlapping numerals, postmarks, horizontal bars and marks on the envelope, which made the recognition process more difficult. Even so, the performance on zip-code digits was 92% recognition, 1% substitution and 7% rejects [4]. In the field of character classification, research has not focused only on offline handwriting recognition, which is performed after the process of writing. Effort has also been devoted to online, i.e. dynamic, handwriting recognition, in which the machine recognizes the characters while the user writes [5]. Transducers (converters, e.g. a tablet) were used for this purpose, but the process was strongly dependent on the power of contemporary computers. It is necessary to emphasize that neural networks are not the only models that have been used in handwritten pattern classification. Other methods of computational intelligence have also been applied. There are many contributions that present the use of distance classifiers [6, 7, 8], support vector machines [8, 9] or decision trees [10].

In spite of the fact that the task of character recognition has been thoroughly explored, it still attracts a lot of researchers nowadays. A great number of scientists still apply neural networks for this purpose. The recognition of subcontinental languages, e.g. Chinese letters [11, 12], Persian fonts [13] or Indian numeral optical characters [8, 14, 15], receives increasing attention. For these particular cases, backpropagation neural networks, particle swarm optimization neural networks, single-layer perceptrons and probabilistic neural networks were used. A considerable amount of work has been done on benchmarking Arabic digits (e.g. CENPARMI released by Concordia University, CEDAR released by CEDAR-SUNY Buffalo or MNIST extracted from the NIST database), where various neural models (multilayer perceptrons, radial basis function networks, learning vector quantization networks and polynomial networks) were tested against state-of-the-art machine learning techniques such as nearest neighbor classifiers, naive Bayes, rule-based learning or support vector machines [16, 17, 18].

In this work, the concept of the graphical interpretation of single-layer neural network weights is proposed. The model is designed to classify all digits; thus, it is equipped with 10 neurons, each responsible for the recognition of a single numeral. Once the training process of the network is completed, it is shown that it is possible to convert the weights to pixel values in order to transform the model parameters into images. On the basis of the fact that the network's weights can be regarded as an image, filtration algorithms are applied to the pixels obtained from the weight values. The filtered pixels then serve for new weight computation. The idea is tested on two data sets using three types of models: a single-layer network, a two-layer backpropagation network and a hybrid network, by comparing the efficiency of the networks with the weights computed from the filtered images against the performance of the models having the original set of parameters. For computational purposes, all the models, image transformations and filtration algorithms were hard-coded in the authors' software.

The main motivation for this research lies in the intention of understanding how the neural model "perceives" input data, how it responds to image filtration and whether it is capable of generalizing to unknown examples.

The paper is organized as follows. Section 2 describes the handwritten digit data sets used for recognition. In Sect. 3, the neural networks employed for classification are briefly described. Section 4 highlights the graphical interpretation of the single-layer network weights. Later on, in Sect. 5, the filtration algorithms applied to image pixels and the new weight modification method are discussed. Section 6 verifies the performance of the neural networks in two digit classification tasks. Finally, Sect. 7 presents the conclusions.

## 2 Input data sets

Two digit databases are considered in this work. The first set represents numerals entered by means of a Wacom CTE-440/S graphics tablet. Its working area covered the A6 format (127.6 × 92.8 mm). The device resolution reached 2,000 dpi (787 lines/cm). Input patterns were entered with a wireless pencil lead. The total input data included 1,000 handwritten digits (\(0,1,\ldots ,9\)), which were converted to 30 × 40 size. For the sake of unification of all digit patterns, several transformation operations were carried out. Initially, all the digits needed to be rescaled [19], since their size differed while writing on the tablet. Then, each image underwent binarization [20] to convert it from color to a two-level (black-and-white) representation. Additionally, in order to limit the information of characters written with thick lines and to extract the parts representing the relevant elements of the images, the patterns were thinned using a skeletonization algorithm [21, 22]. Finally, since the placement of the digits within the frame of the device's screen varied, all the characters had to be centered for proper representation in the 30 × 40 pixel pattern.

The second set was the MNIST database [7], a subset of a larger set available from the National Institute of Standards and Technology (NIST). It consisted of 60,000 training examples and 10,000 test examples. The digits were size-normalized and centered in a fixed-size 28 × 28 image. The images contained gray levels as a result of the anti-aliasing technique used by the normalization algorithm. The regular 28 × 28 database, along with the content description and performance results for some computational intelligence methods, is available at [23].

## 3 Neural networks used in digit recognition

Three types of neural networks were analyzed in the research: a single-layer network, a two-layer backpropagation network and a hybrid network. All the models are briefly discussed in the following subsections.

### 3.1 Single-layer network

The single-layer network processed input patterns \(\mathbf{x}_{i}=[x_{i1},\ldots ,x_{in}]\) for \(i=1,\ldots ,l, \) where *l* is the total number of data set examples. All the examples, through the weight vector \(\mathbf{w1}_{j}=[w1_{j1},\ldots ,w1_{jn}], \) were connected to the *j*-th neuron of the output layer for \(j=1,\ldots ,m, \) where *m* = 10 is the number of classified digits (\(0,1,\ldots ,9\)). Each output neuron computed the weighted sum of the input signals, which was fed forward to get activated according to the formula:

$$a1_{ji}=f\left(\sum _{k=1}^{n}w1_{jk}x_{ik}+b1_{j}\right) \quad (1)$$

where \(b1_{j}\) is the bias and \(f(\cdot )\) is the log-sigmoid transfer function [24]. The output of each of the 10 neurons belongs to the interval (0, 1); therefore, in order to recognize the digit at a single presentation, the highest output activation was determined and the pattern was classified to the class corresponding to the strongest signal, represented by the *j*-th neuron. The network was trained using the standard Widrow-Hoff gradient descent learning algorithm [25], which relied on the following weights' update:

$$w1_{jk}\leftarrow w1_{jk}+\eta \left(t_{ji}-a1_{ji}\right)f^{\prime }\left(\sum _{k=1}^{n}w1_{jk}x_{ik}+b1_{j}\right)x_{ik} \quad (2)$$

where \(t_{ji}\) is an element of the target vector \(\mathbf{t}_{i}\) given for input signal \(\mathbf{x}_{i}, \eta \) is a positive learning rate and \(f^{\prime } \) is the derivative of the transfer function.
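As a sketch, the forward pass (1) and the Widrow-Hoff update (2) can be written as follows. This is an illustrative NumPy implementation, not the authors' software; the function names and the single-pattern update form are assumptions:

```python
import numpy as np

def logsig(z):
    # log-sigmoid transfer function f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, b1, x):
    # activations a1_j of the m = 10 output neurons for one pattern x (eq. 1)
    return logsig(W1 @ x + b1)

def widrow_hoff_step(W1, b1, x, t, eta=0.6):
    # one gradient-descent update for a single pattern (eq. 2);
    # for the log-sigmoid, f'(net) = a * (1 - a)
    a = forward(W1, b1, x)
    delta = eta * (t - a) * a * (1.0 - a)
    W1 = W1 + np.outer(delta, x)
    b1 = b1 + delta
    return W1, b1
```

A pattern is then classified to the class of the neuron with the highest activation, i.e. `np.argmax(forward(W1, b1, x))`.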

### 3.2 Two-layer backpropagation network

The two-layer backpropagation network consisted of a hidden layer of \(j=1,\ldots ,m\) neurons connected to *n* input elements, and an output layer of \(o=1,\ldots ,p=10\) units, each responsible for classification of a particular digit. All output neurons were fired using the log-sigmoid transfer function *f* with the weighted sum of the hidden layer signals as the argument:

$$a2_{oi}=f\left(\sum _{j=1}^{m}w2_{oj}a1_{ji}+b2_{o}\right) \quad (3)$$

where \(w2_{oj}\) is an element of the second layer weight vector \(\mathbf{w2}_{o}=[w2_{o1},\ldots ,w2_{om}], \ a1_{ji}\) is defined in (1) and \(b2_{o}\) is the second layer bias. As in the case of the single-layer network, the highest output activation determined the class of a classified digit. The model was trained using the standard backpropagation algorithm, which amounted to the following updates of the weights of the second and first layer, with error terms

$$\delta 2_{oi}=\left(t_{oi}-a2_{oi}\right)f^{\prime }\left(\sum _{j=1}^{m}w2_{oj}a1_{ji}+b2_{o}\right),\qquad \delta 1_{ji}=\left(\sum _{o=1}^{p}\delta 2_{oi}w2_{oj}\right)f^{\prime }\left(\sum _{k=1}^{n}w1_{jk}x_{ik}+b1_{j}\right) \quad (4)$$

$$w2_{oj}\leftarrow w2_{oj}+\eta \,\delta 2_{oi}\,a1_{ji},\qquad w1_{jk}\leftarrow w1_{jk}+\eta \,\delta 1_{ji}\,x_{ik} \quad (5)$$
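A minimal sketch of one backpropagation step for this two-layer architecture, under the same assumptions as before (illustrative NumPy code, single-pattern update, hypothetical function names):

```python
import numpy as np

def logsig(z):
    # log-sigmoid transfer function
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W1, b1, W2, b2, x, t, eta=0.5):
    # forward pass: hidden activations a1 (eq. 1) and outputs a2 (eq. 3)
    a1 = logsig(W1 @ x + b1)
    a2 = logsig(W2 @ a1 + b2)
    # output-layer error term, using f'(net) = a * (1 - a)
    d2 = (t - a2) * a2 * (1.0 - a2)
    # hidden-layer error term, backpropagated through the second layer
    d1 = (W2.T @ d2) * a1 * (1.0 - a1)
    # gradient-descent updates of both layers
    W2 = W2 + eta * np.outer(d2, a1)
    b2 = b2 + eta * d2
    W1 = W1 + eta * np.outer(d1, x)
    b1 = b1 + eta * d1
    return W1, b1, W2, b2
```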

### 3.3 Hybrid network

In the hybrid network, the first (Kohonen) layer computed, for each input pattern, the distance between the pattern and the weight vector of every neuron; the smallest \(d_{j}\) "distance" determined the winner-neuron along with its neighborhood. All the neurons in the layer were activated by means of the radial basis transfer function [27], where \(n1_{wi}\) is the activation of the winner-neuron. In the output layer, each neuron calculated the weighted sum of the \(a1_{ji}\) signals, which was fed as the argument to the log-sigmoid activation function (3). The Kohonen layer was trained using the neural gas algorithm [28]. The second layer was trained by means of the standard gradient descent learning algorithm.

## 4 Graphical interpretation of single-layer network’s weights

The initial idea behind the interpretation of the weights of the single-layer network was to understand how the particular neurons of the artificial model "see" their coefficients. Since the raw input consists of 30 × 40 elements for the tablet data and 28 × 28 elements for the NIST database, connected to *m* = 10 neurons, one may ponder whether it is possible to generate pictures of the same size showing the weight values computed for each neuron after the training process. Can the weights be perceived as an image of a digit?

In order to find the answer, it is necessary to provide a method of converting the weights (a set of real numbers) into values that can correspond to pixels of some brightness. Once such a transformation is determined, one can represent the set of calculated pixels in an image of the corresponding resolution. In this section, such a weight visualization is proposed. The following example highlights the idea.

Consider a network with *n* = 3 inputs and *m* outputs and assume the weights for the *j*-th neuron: \(w_{j1}=0.3, \) \(w_{j2}=-0.1\) and \(w_{j3}=0.7, \) where \(j=1,\ldots ,m. \) Then, let us find the maximum and minimum of these coefficients, \(w_{j\max }=0.7\) and \(w_{j\min }=-0.1, \) in this case. Furthermore, let us convert the weight values using the following normalization:

$$p_{jk}=\left\lfloor 255\,\frac{w_{jk}-w_{j\min }}{w_{j\max }-w_{j\min }}\right\rfloor \quad (8)$$

where \(\lfloor x\rfloor \) denotes the integer part of *x*. Now, for this particular set of weights, \(p_{jk}\) is called a pixel and formula (8) generates the image with the pixel values \(p_{j1}=127, \) \(p_{j2}=0\) and \(p_{j3}=255\) in the gray scale. Such an image has a color palette describing each pixel using a single byte. In this case, an image consisting of a gray, a black and a white pixel is given; thus, \(p_{jk}\) defines the brightness in the gray scale.

Figure 1 presents the graphical interpretation of the weights of the single-layer network with *m* = 10 neurons. In the upper row, 28 × 28 images of the weights for the NIST database are shown. At the bottom, the pictures of the weights for the tablet data are presented. In both cases, the network was trained using the Widrow-Hoff gradient descent algorithm (2) with η = 0.6. The first top and bottom left images in Fig. 1 represent the weight values for the first neuron, which learnt to recognize digit 0. The weights of the remaining neurons, treated as the pixels of an image, also resemble the appropriate digits from the input data, i.e. \(1,2,3,\ldots ,9. \)

As shown, the neural network parameters determined after the training process do not have to be treated only as numbers that allow the model to classify input examples. The weights can also be interpreted as images and, as was shown in this section, such an interpretation can illustrate the effect of the network's training and the way the artificial model "understands" digit recognition.

## 5 Image filtration and the weight adjustment

As shown in Sect. 4, the neural network weights can be interpreted in a graphical form. Each appropriately normalized coefficient is then treated as an element of the image seen by the neuron. After the model training process, this image resembles a digit to a large degree. However, the picture is not as "perfect" as the original input pattern. For example, the neuron weights obtained from the NIST digits, illustrated in Fig. 1 in the form of pixels, are blurry. The weights calculated from the tablet numerals, as an image, do not have a strong signal (white pixels) but are more distinct. For these reasons, high- and low-pass filtration algorithms were applied to the pixels computed from the set of optimal weights found within the network training on the tablet and NIST data sets. On the basis of the filtered pixels, a neural network weight modification was introduced. The following subsections describe the filtration algorithms and the method of adjusting the weight parameters on the basis of the new pixel values.

### 5.1 Filtration algorithms

Three 3 × 3 masks were considered: a high-pass mask (HP3), a low-pass mask (LP3) and a Gaussian low-pass mask (LPG), defined in (9), (10) and (11), respectively. The filtration was performed by convolving each weight image with a mask: every new pixel value was computed as the mask-weighted sum of the source image pixel \(p_{jk}\) and its eight neighbors, divided by the normalization parameter *s*:

$$s=\sum _{x=-1}^{1}\sum _{y=-1}^{1}M_{x,y}=M_{-1,-1}+M_{0,-1}+M_{1,-1}+M_{-1,0}+M_{0,0}+M_{1,0}+M_{-1,1}+M_{0,1}+M_{1,1}$$

where \(M_{x,y}\) denotes an element of the 3 × 3 mask defined in (9), (10) or (11).
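A sketch of such a 3 × 3 mask filtration is given below. This is illustrative code: the averaging LP3-style mask shown here and the border handling (edge pixels left unchanged) are assumptions, since the paper's exact masks are given in its equations (9)-(11):

```python
import numpy as np

# an assumed low-pass averaging mask in the spirit of LP3; here s = 9
LP3 = np.ones((3, 3))

def filter_image(pixels, mask):
    # convolve the image with a 3x3 mask and divide by the normalization
    # parameter s (the sum of the mask elements); s = 1 is substituted
    # when the mask elements sum to zero, as for high-pass masks
    img = np.asarray(pixels, dtype=float)
    s = float(mask.sum())
    if s == 0.0:
        s = 1.0
    out = img.copy()
    h, w = img.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            region = img[r - 1:r + 2, c - 1:c + 2]
            out[r, c] = (region * mask).sum() / s
    # keep the result in the 8-bit gray-scale range
    return np.clip(np.rint(out), 0, 255).astype(int)
```

With the averaging mask, a single bright pixel is spread over its neighborhood, which is exactly the blurring effect discussed for the low-pass filters.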

### 5.2 Weights adjustment

The filtered pixels were mapped back to the weight values by inverting normalization (8):

$$w_{jk}^{(new)}=w_{j\min }+\frac{p_{jk}^{(new)}}{255}\left(w_{j\max }-w_{j\min }\right) \quad (13)$$

where \(p_{jk}^{(new)}\) is defined in (12). Depending on the mask applied, the pixels change their intensity, getting sharper or more blurry, which makes the weights in the image behave in a similar way. Now, if one updates the neural network weights by means of mapping (13), the model's classification ability should change. The next section presents the test error comparison between the analyzed networks with original and modified sets of weights in the classification of the tablet and NIST data sets.
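Assuming the weight update is the inverse of normalization (8), the mapping can be sketched as follows (illustrative code; the function name and the use of the original weights' minimum and maximum per neuron are assumptions):

```python
import numpy as np

def pixels_to_weights(p_new, w_orig):
    # map filtered pixels back to weight values:
    # w_jk_new = w_jmin + p_jk_new / 255 * (w_jmax - w_jmin),
    # with w_jmin, w_jmax taken from the neuron's original weights
    w = np.asarray(w_orig, dtype=float)
    w_min, w_max = w.min(), w.max()
    return w_min + np.asarray(p_new, dtype=float) / 255.0 * (w_max - w_min)
```

Unfiltered pixels map back to the original weights up to the quantization error introduced by the integer rounding in formula (8).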

## 6 The performance of neural networks on digit data sets

In this part of the work, a comparative efficiency analysis was conducted for the single-layer network, the two-layer backpropagation network and the hybrid network in the classification of the tablet and NIST data sets. The comparison was carried out by measuring two indicators of the performance: the test error, calculated by the models with the weights obtained after the training process, and the test error after filtration, determined by the networks with the weights updated according to mapping (13). Both factors were computed on test sets disjoint from the patterns used in the training process: 200 numerals for the tablet data set and 10,000 digits for the NIST database. The errors were measured as functions of the network training parameters. The two following subsections highlight the results received on each data set. Afterward, a short summary is given.

### 6.1 Tablet data set

For the tablet data set, the following observations were made:

- Gaussian filtration (LPG mask) provided the lowest test error for the single-layer network (10.425%), the two-layer backpropagation network (9.468%) and the hybrid network (17.234%);
- the lowest overall test error (9.468%) was achieved by the two-layer backpropagation network with the set of weights modified by the Gaussian filtration algorithm;
- the highest reduction in the error rate equalled 2.925%; it was found when applying Gaussian filtration to the weights of the backpropagation network;
- all filtration algorithms decreased the hybrid network's 19.894% test error, by margins of 1.276% (HP3), 2.021% (LP3) and 2.659% (LPG);
- HP3 mask filtration increased the test error by 0.106% and 0.851% for the single-layer and backpropagation network, respectively.

The lowest percentage of test error (Test) and test errors after filtration (HP3, LP3, LPG) found for the single-layer network, two-layer backpropagation network and hybrid network in tablet digits classification (minimum error values, in %):

| Model | Test | HP3 | LP3 | LPG |
|---|---|---|---|---|
| Single-layer network | 12.127 | 12.234 | 11.117 | 10.425 |
| Backpropagation network | 12.394 | 13.245 | 10.585 | 9.468 |
| Hybrid network | 19.894 | 18.617 | 17.872 | 17.234 |

### 6.2 NIST database

For the NIST database, the following observations were made:

- the HP3 and LP3 masks decreased the test set error for each neural network;
- all filtration algorithms decreased the hybrid network's 18.174% test error, by margins of 0.424% (HP3), 0.382% (LP3) and 0.286% (LPG);
- LP3 mask filtration applied to the backpropagation network weights provided the lowest test error among all classifiers;
- the Gaussian filtration algorithm made the test error increase for the single-layer network by 0.111%.

The lowest percentage of test error (Test) and test errors after filtration (HP3, LP3, LPG) recorded for the single-layer network, two-layer backpropagation network and hybrid network in NIST database pattern recognition (minimum error values, in %):

| Model | Test | HP3 | LP3 | LPG |
|---|---|---|---|---|
| Single-layer network | 9.376 | 9.172 | 9.294 | 9.488 |
| Backpropagation network | 9.078 | 9.068 | 8.966 | 9.078 |
| Hybrid network | 18.174 | 17.750 | 17.792 | 17.888 |

### 6.3 Summary

For the NIST data set, the HP3 mask applied to the weights obtained from training made the filtration test error lower than the error determined on the unfiltered coefficients for the single-layer network and the hybrid network. LPG filtration, as shown in Figs. 2 and 3, provided worse results here for the single- and two-layer networks, respectively. This can be justified by the fact that the set of weights, viewed as an image, was already blurry (Fig. 1). On the other hand, for the hybrid network (Fig. 4), all filtration algorithms decreased the test error rate for each number of hidden neurons.

## 7 Conclusion

In this article, a method of single-layer neural network weight interpretation was proposed. The network was designed to recognize digits from the range \(0,1,\ldots ,9; \) therefore, it was built of 10 neurons, each recognizing a single numeral. It was shown that, after the training of the model, it is possible to transform the weight values to image pixels. The idea was tested on two data sets: 1,000 digits of 30 × 40 resolution entered by means of the graphics tablet, and the 60,000 digits of the NIST web page database with 28 × 28 size. Furthermore, high- and low-pass filtration algorithms were applied to the pixels computed from the weight values. On the basis of the filtered pixels, the weights of the network were adjusted. This approach was then verified on three types of models: the single-layer network, the two-layer backpropagation network and the hybrid network, by comparing the performance of the models having the weights computed from the filtered images with that of the models having the coefficients obtained after the training process. The analyses were carried out on both data sets. The results presented in the work showed that, in both data classification cases, the filtration algorithms decreased the test error calculated by the networks with the weights set to the values determined after the training process. In particular, in tablet data recognition, the use of the LPG mask (Gaussian low-pass filter) provided the lowest test error for the single-layer network (10.425%), the two-layer backpropagation network (9.468%) and the hybrid network (17.234%). Moreover, this filtration algorithm applied to the weights of the backpropagation network reduced the test error rate by a margin of 2.925%, which yielded the lowest test error among all models (9.468%).

The improvement in the test error rate obtained by the considered networks for NIST digit recognition was not that large, though. The highest reduction of this indicator (0.424%) was obtained when applying high-pass filtration (the HP3 mask) to the weights of the hybrid network. The application of both the HP3 and LP3 masks admittedly decreased the test error value, but the gain was subtle. This can be explained by the fact that this particular data set is a web database to which no image preprocessing was applied. In contrast, the tablet data set images, before being fed as the input to the networks, underwent the skeletonization process that extracted the shape of the pattern digits.

The entire process of computing the pixels from the optimal network weights, applying a filtration algorithm to the calculated pixels and, finally, updating the weights on the basis of the filtered pixels can increase the generalization ability of the neural network. However, it is important to add that such an improvement is found only if an appropriate image filtration is applied; sometimes, this may even amount to a trial and error approach. Moreover, some data preprocessing has to be performed, since the images to be classified usually contain a lot of information that can mislead the network in the process of generalization.

## Acknowledgments

This research was partially supported by Rzeszow University of Technology Grant No. U-8255/DS and Grant NN 514 705540 from the National Science Centre. The authors are grateful for the valuable comments of the anonymous reviewer. All the remarks significantly improved the quality of the manuscript.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.