Influence of graphical weights’ interpretation and filtration algorithms on generalization ability of neural networks applied to digit recognition
In this paper, a method for the graphical interpretation of single-layer network weights is introduced. It is shown that the network parameters can be converted into an image whose individual elements are the pixels. For this purpose, a weight-to-pixel conversion formula is used. Moreover, a new weight modification method is proposed. The weight coefficients are recomputed on the basis of the pixel values to which image filtration algorithms have been applied. The approach is applied to the weights of three types of models: a single-layer network, a two-layer backpropagation network and a hybrid network. The performance of the models is then compared on two independent data sets. The experiments show that adjusting the weights to the new values decreases the test error compared with the error obtained for the initial set of weights.
Keywords: Weights · Neural network · Filtration · Digit recognition
The character recognition problem has enjoyed great attention for several decades. Back in 1972, a way of automatically recognizing handwritten characters had already been described. In the mid-eighties, methods of "learning" character sets and various feature extraction techniques were proposed. Machine learning techniques, such as neural networks, played a very important role in this domain. In 1990, the application of a backpropagation neural network to the recognition of handwritten US Postal Service zip-codes was presented. The input data differed significantly in writing style, character size, overlapping numerals, postmarks, horizontal bars and marks on the envelope, which made the recognition process more difficult. Even so, the performance on zip-code digits was 92% recognition, 1% substitution and 7% rejects. In the field of character classification, research did not focus only on offline handwriting recognition, which is performed after the process of writing. Effort was also devoted to online, i.e. dynamic, handwriting recognition, in which the machine recognizes the characters while the user writes. Transducers (converters, e.g. a tablet) were used for this purpose, but the process was strongly dependent on the power of contemporary computers. It is necessary to emphasize that neural networks are not the only models that have been used in handwritten pattern classification. Other methods of computational intelligence have also been applied. There are many contributions that present the use of distance classifiers [6, 7, 8], support vector machines [9, 8] or decision trees.
In spite of the fact that the task of character recognition has been thoroughly explored, it still attracts many researchers nowadays. A great number of scientists still apply neural networks for this purpose. The recognition of subcontinental languages, e.g. Chinese letters [11, 12], Persian fonts or Indian numeral optical characters [8, 14, 15], receives increasing attention. For these particular cases, backpropagation neural networks, particle swarm optimization neural networks, single-layer perceptrons and probabilistic neural networks were used. A considerable amount of work has also been done on benchmarking Arabic digits (e.g. CENPARMI released by Concordia University, CEDAR released by CEDAR-SUNY Buffalo, or MNIST extracted from the NIST database), where various neural models (multilayer perceptrons, radial basis function networks, learning vector quantization networks and polynomial networks) were tested against state-of-the-art machine learning techniques such as nearest neighbor classifiers, naive Bayes, rule-based learning or support vector machines [16, 17, 18].
In this work, the concept of the graphical interpretation of single-layer neural network weights is proposed. The model is designed to classify all digits; thus, it is equipped with 10 neurons, where each element is responsible for the recognition of a single numeral. Once the training process of the network is completed, it is shown that it is possible to convert the weights into pixel values in order to transform the model parameters into images. On the basis of the fact that the network's weights can be regarded as an image, filtration algorithms are applied to the pixels obtained from the weight values. The filtered pixels then serve for the computation of new weights. The idea is tested on two data sets using three types of models: a single-layer network, a two-layer backpropagation network and a hybrid network, by comparing the efficiency of the networks whose weights are computed from the filtered images with the performance of the models having the original set of parameters. For computational purposes, all the models, image transformations and filtration algorithms were hard-coded in the authors' software.
The main motivation for this research lies in the intention of understanding how the neural model "perceives" input data, how it responds to image filtration and whether it is capable of generalizing to unknown examples.
The paper is organized as follows. Section 2 describes the handwritten digit data sets used for recognition. In Sect. 3, the neural networks employed for the classification are briefly described. Section 4 highlights the graphical interpretation of the single-layer network weights. Later on, in Sect. 5, the filtration algorithms applied to the image pixels and the new weight modification method are discussed. Section 6 verifies the performance of the neural networks on the two digit classification tasks. Finally, Sect. 7 presents the conclusions.
2 Input data sets
Two digit databases are considered in this work. The first set contains numerals entered by means of a Wacom CTE-440/S graphics tablet. Its working area covered the A6 paper format (127.6 × 92.8 mm). The device resolution reached 2,000 dpi (787 lines/cm). Input patterns were entered with a cordless pen. The total input data included 1,000 handwritten digits (\(0,1,\ldots ,9\)), which, in turn, were converted to a 30 × 40 pixel size. For the sake of unifying all digit patterns, some necessary transformation operations were carried out. Initially, all the digits had to be rescaled, since their size differed while writing on the tablet. Then, each image underwent binarization to convert it from a color image into a two-level (black-and-white) one. Additionally, in order to limit the information from characters written with thick lines and to extract the parts representing the relevant elements of the images, the patterns were thinned using a skeletonization algorithm. Finally, because the placement of the digits within the frame of the device's screen varied, all the characters had to be centered for proper representation in the 30 × 40 pixel pattern.
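Two of the preprocessing steps above, binarization and centering, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the threshold value, the simple bounding-box centering, and the omission of rescaling and skeletonization are all simplifying assumptions.

```python
import numpy as np

def binarize(img, threshold=128):
    """Turn a grayscale image into a 0/1 pattern (dark strokes become 1).
    The threshold of 128 is an assumed, illustrative value."""
    return (np.asarray(img) < threshold).astype(np.uint8)

def center_pattern(binary, out_h=40, out_w=30):
    """Center the stroke's bounding box inside a fixed-size frame,
    e.g. the 30 x 40 pixel pattern used for the tablet digits."""
    ys, xs = np.nonzero(binary)
    crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    out = np.zeros((out_h, out_w), dtype=np.uint8)
    top, left = (out_h - h) // 2, (out_w - w) // 2
    out[top:top + h, left:left + w] = crop
    return out
```

In a complete pipeline, rescaling would run before this step so the crop always fits the target frame, and skeletonization would thin the strokes afterwards.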
The second set was the MNIST database, a subset of a larger set available from the National Institute of Standards and Technology (NIST). It consisted of 60,000 training examples and 10,000 test examples. The digits were size-normalized and centered in a fixed-size 28 × 28 image. The images contain gray levels as a result of the anti-aliasing technique used by the normalization algorithm. The regular 28 × 28 database, along with a content description and performance results for several computational intelligence methods, is available online.
3 Neural networks used in digit recognition
Three types of neural networks were analyzed in the research: a single-layer network, a two-layer backpropagation network and a hybrid network. All the models are briefly discussed in the following subsections.
3.1 Single-layer network
3.2 Two-layer backpropagation network
3.3 Hybrid network
4 Graphical interpretation of single-layer network’s weights
The initial idea behind the interpretation of the weights of the single-layer network was to understand how the particular neurons of the artificial model "see" their coefficients. Since the raw input consists of 30 × 40 elements for the tablet data and 28 × 28 elements for the NIST database, connected to m = 10 neurons, one may ponder whether it is possible to generate pictures of the same size showing the weight values computed for each neuron after the training process. Can the weights be perceived as an image of a digit?
In order to find the answer, it is necessary to provide a method of converting the weights (a set of real numbers) into values that can correspond to pixels of some brightness. Once such a transformation is determined, one can represent the set of calculated pixels as an image of the same resolution as the input. In this section, such a weight visualization is proposed. The following example highlights the idea.
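One natural form for such a weight-to-pixel transformation is min-max scaling of a neuron's weight vector into the 8-bit brightness range, followed by a reshape to the input resolution. The paper's exact conversion formula is not reproduced here, so the scaling below is an assumed, illustrative choice.

```python
import numpy as np

def weights_to_pixels(w, height, width):
    """Map one neuron's weight vector to 8-bit pixel brightness.
    Min-max scaling to [0, 255] is an assumption, not the paper's formula."""
    w = np.asarray(w, dtype=float)
    lo, hi = w.min(), w.max()
    # A constant weight vector maps to an all-black image.
    pix = np.zeros_like(w) if hi == lo else (w - lo) / (hi - lo) * 255.0
    return np.round(pix).astype(np.uint8).reshape(height, width)
```

For the tablet data each of the 10 neurons would yield a 40 × 30 picture (1,200 weights), and for the MNIST data a 28 × 28 picture (784 weights), which can then be rendered and inspected like any grayscale digit image.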
As shown, the neural network parameters determined after the training process do not have to be treated only as numbers that allow the model to classify input examples. The weights can also be interpreted as images and, as was shown in this section, such an interpretation can illustrate the effect of the network's training and the way the artificial model "understands" digit recognition.
5 Image filtration and the weight adjustment
As shown in Sect. 4, the neural network weights can be interpreted in a graphical form. Each appropriately normalized coefficient is then treated as an element of the image seen by the neuron. After the model training process, this image resembles a digit to a large degree. However, the picture is not as "perfect" as the original input pattern. For example, the neuron weights obtained from the NIST digits, illustrated in Fig. 1 in the form of pixels, are blurry. The weights calculated from the tablet numerals, viewed as an image, do not have a strong signal (white pixels) but are more distinct. For these reasons, high- and low-pass filtration algorithms were applied to the pixels computed from the set of optimal weights found during the network training on the tablet and NIST data sets. On the basis of the filtered pixels, a neural network weight modification was introduced. The following subsections describe the filtration algorithms and the method of adjusting the weight parameters on the basis of the new pixel values.
5.1 Filtration algorithms
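A 3 × 3 mask filtration of this kind can be sketched as a windowed convolution over the weight image. The kernel coefficients below are standard textbook choices for a sharpening high-pass mask, an averaging low-pass mask and a Gaussian mask; the paper's exact HP3, LP3 and LPG coefficients may differ, so treat these as assumptions.

```python
import numpy as np

# Assumed, textbook-style 3x3 kernels (not necessarily the paper's exact masks).
HP3 = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]], dtype=float)  # high-pass
LP3 = np.full((3, 3), 1.0 / 9.0)                                        # averaging low-pass
LPG = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0   # Gaussian low-pass

def filter3x3(image, mask):
    """Convolve an image with a 3x3 mask, keeping the original size
    (edge replication at the border) and clipping to the 8-bit range."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + 3, x:x + 3] * mask)
    return np.clip(out, 0, 255)
```

Since each kernel's coefficients sum to 1, a uniform image passes through unchanged; the low-pass masks smooth noisy pixels, while the high-pass mask emphasizes edges in the weight image.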
5.2 Weights adjustment
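Once the filtered pixels are available, the weights can be recovered by inverting whatever weight-to-pixel scaling produced the image. The paper's mapping (13) is not reproduced here; the sketch below simply inverts a min-max scaling into the original weight range, which is an assumed stand-in for that mapping.

```python
import numpy as np

def pixels_to_weights(pixels, w_min, w_max):
    """Map filtered 8-bit pixel values back into the weight range
    [w_min, w_max] by inverting min-max scaling (an assumed mapping,
    standing in for the paper's formula (13))."""
    p = np.asarray(pixels, dtype=float).ravel()
    return w_min + p / 255.0 * (w_max - w_min)
```

Under this sketch, the full adjustment loop is: scale the trained weights to pixels, filter the pixel image, then map the filtered pixels back through the inverse scaling to obtain the updated weight vector for each neuron.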
6 The performance of neural networks on digit data sets
In this part of the work, a comparative efficiency analysis was conducted for the single-layer network, the two-layer backpropagation network and the hybrid network in the classification of the tablet and NIST data sets. The comparison was carried out by measuring two performance indicators: the test error, calculated by the models with the weights obtained after the training process, and the test error after filtration, determined by the networks with the weights updated according to the mapping (13). Both factors were computed on a test set disjoint from the patterns used in the training process: 200 numerals for the tablet data set and 10,000 digits for the NIST database. The errors were measured as a function of the network training parameters. The two following subsections highlight the results obtained on each data set. Afterward, a short summary is added.
6.1 Tablet data set
- Gaussian filtration (LPG mask) provided the lowest test error for the single-layer network (10.425%), the two-layer backpropagation network (9.468%) and the hybrid network (17.234%),
- the lowest overall test error (9.468%) was achieved by the two-layer backpropagation network with the set of weights modified by the Gaussian filtration algorithm,
- the highest reduction in the error rate, 2.925%, was found when applying Gaussian filtration to the weights of the backpropagation network,
- all filtration algorithms decreased the 19.894% test error of the hybrid network, by margins of 1.276% (HP3), 2.021% (LP3) and 2.659% (LPG),
- HP3 mask filtration increased the test error by 0.106% and 0.851% for the single-layer and backpropagation networks, respectively.
The lowest percentage of test error (Test) and test errors after filtration (HP3, LP3, LPG) found for single-layer network, two-layer backpropagation network and hybrid network in tablet digits classification
Minimum error values [%]
6.2 NIST database
- the HP3 and LP3 masks decreased the test set error for each neural network,
- all filtration algorithms decreased the 18.174% test error of the hybrid network, by margins of 0.424% (HP3), 0.382% (LP3) and 0.286% (LPG),
- LP3 mask filtration applied to the backpropagation network weights provided the lowest test error among all classifiers,
- the Gaussian filtration algorithm increased the test error for the single-layer network by 0.111%.
The lowest percentage of test error (Test) and test errors after filtration (HP3, LP3, LPG) recorded for single-layer network, two-layer backpropagation network and hybrid network in NIST database pattern recognition
Minimum error values [%]
However, the HP3 mask applied to the weights obtained from training on the NIST digits made the filtration test error lower than the error determined on the unfiltered coefficients for the single-layer network and the hybrid network. LPG filtration, as shown in Figs. 2 and 3, provided worse results here for the single- and two-layer networks, respectively. This can be explained by the fact that the set of weights, viewed as an image, was blurry (Fig. 1). On the other hand, for the hybrid network (Fig. 4), all filtration algorithms decreased the test error rate for every number of hidden neurons.
In this article, a method of interpreting single-layer neural network weights was proposed. The network was designed to recognize the digits \(0,1,\ldots ,9\); therefore, it was built of 10 neurons, each unit recognizing a single numeral. It was shown that, after training the model, it is possible to transform the weight values into image pixels. The idea was tested on two data sets: 1,000 digits of 30 × 40 resolution entered by means of the graphics tablet, and the 60,000-example MNIST database with digits of 28 × 28 size. Furthermore, high- and low-pass filtration algorithms were applied to the pixels computed from the weight values. On the basis of the filtered pixels, the weights of the network were adjusted. This approach was then verified on three types of models: the single-layer network, the two-layer backpropagation network and the hybrid network, by comparing the performance of the models having the weights computed from the filtered images against those with the coefficients obtained after the training process. The analyses were carried out on both data sets. The results presented in this work showed that, in both classification cases, the filtration algorithms decreased the test error calculated by the networks with the weights set to the values determined after the training process. In particular, in tablet data recognition, the use of the LPG mask (Gaussian low-pass filter) provided the lowest test error for the single-layer network (10.425%), the two-layer backpropagation network (9.468%) and the hybrid network (17.234%). Moreover, this filtration algorithm, applied to the weights of the backpropagation network, reduced the test error rate by a margin of 2.925%, which yielded the lowest test error among all models (9.468%).
The improvement in the test error rate obtained by the considered networks for NIST digit recognition was not that large, though. The highest reduction of this indicator (0.424%) was obtained when applying high-pass filtration (the HP3 mask) to the weights of the hybrid network. The application of both the HP3 and LP3 masks admittedly decreased the test error value, but the gain was subtle. This can be explained by the fact that this particular data set is a web-distributed database to which no additional image preprocessing was applied. In contrast, the tablet data set images, before being fed as input to the networks, underwent the skeletonization process that extracted the shape of the pattern digits.
The entire process of computing the pixels from the optimal network weights, applying a filtration algorithm to the calculated pixels and, finally, updating the weights on the basis of the filtered pixels can increase the generalization ability of a neural network. However, it is important to add that such an improvement can be achieved only if an appropriate image filtration is applied; choosing one may even amount to a trial-and-error approach. Moreover, some data preprocessing has to be performed, since the images to be classified usually contain a lot of information that misleads the network in the process of generalization.
This research was partially supported by Rzeszow University of Technology Grant No. U-8255/DS and Grant No. NN 514 705540 from the National Science Centre. The authors are grateful for the valuable comments of the anonymous reviewer. All the remarks significantly improved the quality of the manuscript.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.