# Rough Deep Belief Network - Application to Incomplete Handwritten Digits Pattern Classification

## Abstract

Rough deep belief networks (RDBNs) are a new modification of the well-known deep belief networks. Thanks to elements adopted from Pawlak’s rough set theory, RDBNs are suited to processing incomplete patterns. In this paper we present the results of adapting this class of networks to the classification of handwritten digits. The pattern samples used in the learning and working processes are randomly corrupted, which allows us to study the robustness of the classifier at various levels of incompleteness.

### Keywords

Deep belief network · Rough set · Missing features

## 1 Introduction

The restricted Boltzmann machine (RBM) [7, 27] is a sophisticated type of neural network that can model probability distributions and is applied to filtering, image recognition, and modelling [4]. A deep belief network (DBN) [2, 9] is a structure composed of stacked RBMs. Like other computational intelligence systems, DBNs process real data, which often contain imperfections such as noise, inexactness, uncertainty and incompleteness. The easiest way to use such data is some form of preprocessing. In the case of incompleteness there are two general approaches, imputation and marginalization. Both can take into account the class of incompleteness, for example MCAR (Missing Completely At Random), MAR (Missing At Random) or MNAR (Missing Not At Random) [19]. An interesting way to process data with a variable set of available input features is the rough set theory proposed by Pawlak [25, 26]. It defines the approximation of a set in the form of a pair of sets, called a rough set, consisting of the lower and the upper approximation. The quality of the approximation depends on the usefulness of the available knowledge. The theory has been extended by rough fuzzy sets, fuzzy rough sets [5, 6], covering rough sets [30, 31] and others. It allows us to extend various types of fuzzy systems [3, 12, 13, 14, 21, 22], the nearest neighbor classifier [23], decision trees [20] and others [2, 24] to work with missing data. The resulting systems have been called rough fuzzy systems, rough k-NN classifiers, etc. In some solutions, missing values are replaced by an appropriate interval which can cover the whole domain of a feature (MCAR) or a part of it (MAR, MNAR). The answer of such a system is represented as an interval or, in the case of classification, as information about assignment to one of the three regions defined in rough set theory, i.e. the positive, boundary and negative region.
This means that, using the available input information, the classifier can decide that the object being classified definitely belongs to a class (positive region), definitely does not belong to it (negative region), or that the input information is insufficient to make a decision (boundary region). It also allows us to start the classification process with a limited description of the classified object and to complement it until the answer is either positive or negative.
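The three regions follow directly from Pawlak’s lower and upper approximations. A minimal illustrative sketch (not from the paper; the function name and the toy partition are ours), where the indiscernibility relation is given as a partition of the universe into blocks:

```python
def rough_approximations(blocks, X):
    """Compute the lower and upper approximation of a target set X.

    blocks: partition of the universe induced by the indiscernibility relation
    X: the target set of objects
    """
    # lower approximation: union of blocks entirely contained in X
    lower = set().union(*[b for b in blocks if b <= X])
    # upper approximation: union of blocks that intersect X
    upper = set().union(*[b for b in blocks if b & X])
    return lower, upper

blocks = [{1, 2}, {3, 4}, {5}]
X = {1, 2, 3}
lower, upper = rough_approximations(blocks, X)
# lower == {1, 2}: positive region (certainly in X)
# upper == {1, 2, 3, 4}: upper - lower == {3, 4} is the boundary region;
# everything outside the upper approximation is the negative region
```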

In this paper we introduce the rough deep belief network (RDBN), which is created in a similar way. It is a structure composed of rough restricted Boltzmann machines (RRBMs) capable of processing information in the form of intervals as well as incomplete data. It should be noted that the vast majority of network-like architectures are suitable for various parallel implementations. They can be realized using many signal processors connected by a dedicated serial bus [1] or on multicore CPU architectures [28, 29]. Nowadays, networks are even implemented in structures made of single molecules [17], for example distributed in a mesoporous silica matrix [15, 16]. The RDBN, like other rough hybrids, answers “unknown” when the input information is too incomplete to give a credible answer. In the same situation, other classifiers give an answer with a low level of credibility, which is frequently incorrect.

The paper is organized as follows. Section 2 introduces the architecture of the DBN, and Sect. 3 describes the RDBN. Section 4 describes the MNIST database of handwritten digits used in testing, and Sect. 5 presents the obtained results. Section 6 summarizes the work.

## 2 Deep Belief Network Architecture

An RBM consists of a layer of stochastic hidden units *h* that are fully connected in an undirected model to a set of stochastic visible units *v*, as shown in Fig. 1. The RBM (*l*) defines the following joint distribution:

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)},$$

where \(v_i\) and \(h_j\) are the binary states of visible unit *i* and hidden unit *j*, \(b_{vi}\), \(b_{hj}\) are their biases and \(w_{ij}\) is the weight between them. The network assigns a probability to every possible pair of visible and hidden vectors via the following energy function:

$$E(v, h) = -\sum_i b_{vi} v_i - \sum_j b_{hj} h_j - \sum_{i,j} v_i h_j w_{ij}.$$

The *partition function* *Z* is given by summing over all possible pairs of visible and hidden vectors:

$$Z = \sum_{v, h} e^{-E(v, h)}.$$

Given a visible vector *v*, the binary state of hidden unit *j* is set to 1 with probability:

$$p(h_j = 1 \mid v) = \sigma\Big(b_{hj} + \sum_i v_i w_{ij}\Big),$$

where \(\sigma(x) = 1/(1 + e^{-x})\) is the logistic function. Analogously, the state of visible unit *i* can be set to 1 with probability:

$$p(v_i = 1 \mid h) = \sigma\Big(b_{vi} + \sum_j h_j w_{ij}\Big).$$

The classification output of the DBN is computed by an extra layer:

$$y^{(L)}_{j} = \mathrm{softmax}\Big(\sum_i w^{(L)}_{ij} h^{(L)}_{i} + b^{(L)}_{j}\Big),$$

where *L* is an additional layer, \(y^{(L)}_{j}\) is the output of the network, \(w^{(L)}_{ij}\) and \(b^{(L)}_{j}\) are the weight and bias of the extra layer (their initial values are set to 0), and \(h^{(L)}_{i}\) is the value obtained from the last RBM layer of the DBN. The softmax function is calculated as follows:

$$\mathrm{softmax}(x_j) = \frac{e^{x_j}}{\sum_k e^{x_k}}.$$
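The conditional probabilities above are all an RBM needs for the standard contrastive-divergence (CD-1) training step [7]. A minimal NumPy sketch, assuming binary units and batched row vectors (variable names are ours, not the paper’s):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM. v0: batch of visible vectors, shape (n, n_v)."""
    # p(h_j = 1 | v) = sigma(b_hj + sum_i v_i w_ij)
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sample hidden states
    # p(v_i = 1 | h) = sigma(b_vi + sum_j h_j w_ij): one-step reconstruction
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # gradient approximation: <v h>_data - <v h>_reconstruction
    dW = (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    return (W + lr * dW,
            b_v + lr * (v0 - p_v1).mean(axis=0),
            b_h + lr * (p_h0 - p_h1).mean(axis=0))
```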

## 3 Rough Deep Belief Network

The output of the *j*-th neuron in the hidden layer of the lower RBM is denoted by \(\underline{h}_j(t)\), and by \(\overline{h}_j(t)\) in the case of the upper RBM. They are derived with the probability described by the non-linear output of the neurons as follows:

The common weights \(w_{ij}\), the biases in the hidden layers \(b_{\mathrm {h}j}(t)\) and in the visible layers \(b_{\mathrm {v}i}(t)\) are corrected using correction values \({\varDelta }w_{ij}\), \({\varDelta }b_{\mathrm {h}j}(t)\) and \({\varDelta }b_{\mathrm {v}i}(t)\), which come from both the upper and the lower RBM.

where *L* is an additional layer, \(\underline{y}^{(L)}_{j}\) and \(\overline{y}^{(L)}_{j}\) are the outputs of the network, and \(\underline{s}^{(L)}_{i}(t)\) and \(\overline{s}^{(L)}_{i}(t)\) are the values obtained from the last RBM of the DBN. The softmax function is calculated as in Sect. 2.
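One way to picture the paired lower/upper propagation is standard interval arithmetic over a shared weight matrix: a missing input is widened to the interval \([0, 1]\), and each bound pairs the lower input with positive weights and the upper input with negative weights. This is an illustrative simplification of ours, not the paper’s exact RRBM equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interval_hidden(v_lo, v_hi, W, b_h):
    """Propagate interval-valued inputs through one shared-weight sigmoid layer.

    v_lo, v_hi: lower/upper bounds of the visible vector (missing pixel -> [0, 1]).
    Returns lower/upper bounds of the hidden activations.
    """
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    # interval arithmetic: lower bound uses v_lo on positive weights,
    # v_hi on negative weights; the upper bound is symmetric
    a_lo = v_lo @ W_pos + v_hi @ W_neg + b_h
    a_hi = v_hi @ W_pos + v_lo @ W_neg + b_h
    return sigmoid(a_lo), sigmoid(a_hi)  # sigmoid is monotone, so bounds are preserved
```

When no feature is missing (`v_lo == v_hi`), both bounds coincide and the layer reduces to an ordinary RBM hidden layer.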

## 4 The MNIST Database of Handwritten Digits

In our work we used the MNIST database, which contains samples of handwritten digits. The samples are commonly used for testing machine learning and pattern recognition techniques and their implementations. The database was created from NIST’s databases and is divided into 60,000 training samples and 10,000 test samples. Each sample is a 28 by 28 gray-scale image representing a single handwritten digit. All digits have been scaled down to fit a 20 by 20 bounding box while preserving their aspect ratio, and positioned so that the center of mass of the pixels is at the center of the 28 by 28 image.

The files containing labels have a very similar format, but instead of image data they contain a single byte value for each image. The values range from 0 to 9 and indicate which digit the corresponding image represents. Labels are stored in the same order as the images.

## 5 Implementation and Experimental Results

For the purpose of our study, the solution has been implemented in Matlab, which allowed us to compare the results with the implementation presented by Karpathy [10, 11]. In all our tests we used 6000 samples of handwritten digits from the data set described in Sect. 4.

A tested image is given a classification when the lower and upper systems provide the same answer. If the answers differ, the RDBN informs us that it does not recognize the digit. In this way the system refrains from making a possibly incorrect classification and reduces the total number of mistakes. The resulting information is thereby more reliable and more accurate. Details of how digit recognition is performed are shown in Algorithm 1.
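The agreement rule can be sketched in a few lines, assuming the lower and upper systems each return a vector of class scores (the function name is ours):

```python
def rdbn_classify(y_lower, y_upper):
    """Return the class index if the lower and upper systems agree,
    otherwise the answer "unknown"."""
    c_lo = max(range(len(y_lower)), key=y_lower.__getitem__)  # argmax of lower output
    c_hi = max(range(len(y_upper)), key=y_upper.__getitem__)  # argmax of upper output
    return c_lo if c_lo == c_hi else "unknown"
```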

The system is able to correctly classify most images when little information is missing. When the amount of missing information exceeds \(25\,\%\), the system starts being unable to correctly classify the majority of samples and the number of “unknown” answers increases. Unknown answers represent a different type of classification and are therefore not counted in the diagrams of correct and incorrect answers. Thanks to extending the range of possible answers with the response “unknown”, the total number of mistakes of the RDBN system is significantly reduced, as shown in Fig. 4. The results show that the RDBN system provides noticeably fewer incorrect answers than the DBN. A comparison of correct classification between the RDBN and the DBN is shown in Fig. 5. The RDBN gives a similar number of correctly classified digits as the DBN, which proves a comparable level of efficiency.

Some of the samples used for testing are particularly difficult to classify due to irregularities of handwriting. Randomly removing information can turn a digit into a different one or make it unrecognizable. Figure 6 shows tested samples with \(5\,\%\) of information randomly removed for which the RDBN gave an incorrect answer. Figure 7 shows samples which the system could not classify and for which it instead answered “unknown”.
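The MCAR-style corruption used in the tests can be reproduced with a simple random mask; a sketch under our own naming, with missing pixels marked as `NaN`:

```python
import numpy as np

def corrupt_mcar(image, frac, rng=np.random.default_rng(0)):
    """Mark a random fraction of pixels as missing (MCAR).

    Returns the corrupted image and a boolean mask of the removed pixels."""
    mask = rng.random(image.shape) < frac     # each pixel removed independently
    corrupted = image.astype(float).copy()
    corrupted[mask] = np.nan                  # missing values to be handled by the RDBN
    return corrupted, mask
```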

## 6 Conclusions and Future Work

In this paper we examined the rough deep belief network as a system for the recognition of handwritten digits in samples with missing values. The investigation was conducted for various levels of missing input information in order to evaluate the robustness of the classifier. The obtained results confirm again that rough set theory is useful for extending traditional computational intelligence systems. The digits were recognized even with a rather high level of missing pixels. An indisputable advantage of the RDBN and other systems extended using rough set theory is the possibility to use incomplete information also in the development (e.g. learning) phase. The next step in the investigation is to use the RDBN with data containing other forms of imperfection, for example patterns with erroneous values and noise.

### References

- 1. Bilski, J.: Momentum modification of the RLS algorithms. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 151–157. Springer, Heidelberg (2004)
- 2. Chu, J.L., Krzyzak, A.: The recognition of partially occluded objects with support vector machines and convolutional neural networks and deep belief networks. J. Artif. Intell. Soft Comput. Res. 4(1), 5–19 (2014)
- 3. Cpałka, K., Nowicki, R., Rutkowski, L.: Rough-neuro-fuzzy systems for classification. In: The First IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007) (2007)
- 4. Dourlens, S., Ramdane-Cherif, A.: Modeling & understanding environment using semantic agents. J. Artif. Intell. Soft Comput. Res. 1(4), 301–314 (2011)
- 5. Dubois, D., Prade, H.: Rough fuzzy sets and fuzzy rough sets. Int. J. Gen. Syst. 17(2–3), 191–209 (1990)
- 6. Dubois, D., Prade, H.: Putting rough sets and fuzzy sets together. In: Słowiński, R. (ed.) Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, pp. 203–232. Kluwer, Dordrecht (1992)
- 7. Hinton, G.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
- 8. Hinton, G.: A practical guide to training restricted Boltzmann machines. Momentum 9(1), 926 (2010)
- 9. Hinton, G., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
- 10. Karpathy, A.: Code for training restricted Boltzmann machines (RBM) and deep belief networks in MATLAB. https://code.google.com/p/matrbm/
- 11. Karpathy, A.: CPSC 540 project: Restricted Boltzmann machines
- 12. Korytkowski, M., Nowicki, R., Rutkowski, L., Scherer, R.: AdaBoost ensemble of DCOG rough-neuro-fuzzy systems. In: Jędrzejowicz, P., Nguyen, N.T., Hoang, K. (eds.) ICCCI 2011, Part I. LNCS, vol. 6922, pp. 62–71. Springer, Heidelberg (2011)
- 13. Korytkowski, M., Nowicki, R., Scherer, R.: Neuro-fuzzy rough classifier ensemble. In: Alippi, C., Polycarpou, M., Panayiotou, C., Ellinas, G. (eds.) ICANN 2009, Part I. LNCS, vol. 5768, pp. 817–823. Springer, Heidelberg (2009)
- 14. Korytkowski, M., Nowicki, R., Scherer, R., Rutkowski, L.: Ensemble of rough-neuro-fuzzy systems for classification with missing features. In: Proceedings of the World Congress on Computational Intelligence 2008, pp. 1745–1750 (2008)
- 15. Laskowski, L., Laskowska, M.: Functionalization of SBA-15 mesoporous silica by Cu-phosphonate units: probing of synthesis route. J. Solid State Chem. 220, 221–226 (2014)
- 16. Laskowski, L., Laskowska, M., Balanda, M., Fitta, M., Kwiatkowska, J., Dzilinski, K., Karczmarska, A.: Mesoporous silica SBA-15 functionalized by nickel-phosphonic units: Raman and magnetic analysis. Microporous Mesoporous Mater. 200, 253–259 (2014)
- 17. Laskowski, Ł., Laskowska, M., Jelonkiewicz, J., Boullanger, A.: Spin-glass implementation of a Hopfield neural structure. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2014, Part I. LNCS, vol. 8467, pp. 89–96. Springer, Heidelberg (2014)
- 18. Le Roux, N., Bengio, Y.: Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput. 20(6), 1631–1649 (2008)
- 19. Little, R., Rubin, D.: Statistical Analysis with Missing Data. Wiley, New York (1987)
- 20. Nowak, B.A., Nowicki, R.K., Mleczko, W.K.: A new method of improving classification accuracy of decision tree in case of incomplete samples. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2013, Part I. LNCS, vol. 7894, pp. 448–458. Springer, Heidelberg (2013)
- 21. Nowicki, R.: Rough-neuro-fuzzy structures for classification with missing data. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39(6), 1334–1347 (2009)
- 22. Nowicki, R.: On combining neuro-fuzzy architectures with the rough set theory to solve classification problems with incomplete data. IEEE Trans. Knowl. Data Eng. 20(9), 1239–1253 (2008)
- 23. Nowicki, R.K., Nowak, B.A., Woźniak, M.: Rough k nearest neighbours for classification in the case of missing input data. In: Proceedings of the 9th International Conference on Knowledge, Information and Creativity Support Systems, Limassol, Cyprus, pp. 196–207, November 2014
- 24. Pawlak, M.: Kernel classification rules from missing data. IEEE Trans. Inf. Theory 39, 979–988 (1993)
- 25. Pawlak, Z.: Rough sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
- 26. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht (1991)
- 27. Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory. In: Rumelhart, D.E., McClelland, J.L. (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, pp. 194–281. MIT Press, Cambridge (1986)
- 28. Staff, C.I., Reinders, J.: Parallel Programming and Optimization with Intel® Xeon Phi™ Coprocessors: Handbook on the Development and Optimization of Parallel Applications for Intel® Xeon Coprocessors and Intel® Xeon Phi™ Coprocessors. Colfax International, Sunnyvale (2013)
- 29. Szustak, L., Rojek, K., Gepner, P.: Using Intel Xeon Phi coprocessor to accelerate computations in MPDATA algorithm. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2013, Part I. LNCS, vol. 8384, pp. 582–592. Springer, Heidelberg (2014)
- 30. Zhu, W., Wang, F.Y.: Reduction and axiomization of covering generalized rough sets. Inform. Sci. 152, 217–230 (2003)
- 31. Zhu, W., Wang, F.Y.: On three types of covering-based rough sets. IEEE Trans. Knowl. Data Eng. 19(8), 1131–1144 (2007)