# A Neuromorphic Architecture for Context Aware Text Image Recognition

## Abstract

Although existing optical character recognition (OCR) tools can achieve excellent performance in text image detection and pattern recognition, they usually require a clean input image, and most of them do not perform well when the image is partially occluded or smudged. Humans can tolerate much worse image quality during reading because perception errors are corrected using knowledge of word- and sentence-level context. In this paper, we present a brain-inspired information processing framework for context-aware Intelligent Text Recognition (ITR) and its acceleration using memristor-based crossbar arrays. The ITR system (ITRS) has a bottom layer of massively parallel Brain-State-in-a-Box (BSB) engines that give fuzzy pattern matching results and an upper layer of statistical-inference-based error correction. Optimizations on each layer of the framework are introduced to improve system performance. A parallel architecture is presented that incorporates the memristor crossbar array to accelerate the pattern matching. Compared to a traditional multicore microprocessor, the accelerator has the potential to provide tremendous area and power savings and more than 8,000 times speedup.

### Keywords

Neuromorphic, Text recognition, Memristor crossbar array

## 1 Introduction

Military planning, battlefield situation awareness, and strategic reasoning rely heavily on knowledge of the local situation and an understanding of different cultures. A rich source of such knowledge is presented as natural-language text. Autonomous and intelligent recognition of printed or handwritten text images is one of the key capabilities for achieving situational awareness. Although generally effective, conventional *Optical Character Recognition (OCR)* tools and pattern recognition techniques usually have difficulty recognizing images that are noisy, partially occluded, incomplete due to damage to the printing material, or obscured by marks or stamps.

However, such tasks are not too difficult for humans, as the errors in image recognition will be corrected later using semantic and syntactic context. Most human cognitive procedures involve two interleaved steps, sensing and association. Together, they provide higher accuracy.

Computing models have been developed for performing cognitive functions on raw input signals such as images and audio. One representative area in this category is the associative neural network model, which is typically used for pattern recognition. We generally say that this kind of model performs the “sensing” function. In the other category, models and algorithms are researched to operate on concept-level objects, assuming that they have already been “recognized” or extracted from raw inputs. In a recent development, the cogent confabulation model was used for sentence completion [1, 2]. Trained on a large corpus of literature, the confabulation algorithm has demonstrated the capability of completing a sentence (given a few starting words) based on conditional probabilities among words and phrases. We refer to these algorithms as “association” models. This brain-inspired signal processing flow can be applied to many applications. A proof-of-concept prototype of the context-aware *Intelligent Text Recognition System (ITRS)* was developed on a high performance computing cluster [3]. The lower layer of the ITRS performs pattern matching of the input image using a simple nonlinear auto-associative neural network model called *Brain-State-in-a-Box (BSB)* [4]. It matches the input image against the stored alphabet. A race model is introduced that gives fuzzy pattern matching results. Multiple matching patterns may be found for one input character image, which is referred to as ambiguity. The upper layer of the ITRS performs information association using the cogent confabulation model [1]. It enhances those BSB outputs that have strong correlations in the context of words and sentences and suppresses those BSB outputs that are weakly related. In this way, it selects the characters that form the most meaningful words and sentences.

Both BSB and confabulation models are connection based artificial neural networks, where *weight matrices* are used to represent synapses between neurons and their operation can be transformed into matrix–vector multiplication(s). Hardware realizations of neural networks require a large volume of memory and are associated with high cost if built with digital circuits [5].

The memristor has been discovered as a promising device for massively parallel, large-scale neuromorphic systems. A memristor can “remember” the total electric charge/flux that has ever flowed through it [6], which is analogous to the behavior of synapses among neurons. Moreover, memristor-based memories can achieve a very high integration density of 100 Gbits/cm², a few times higher than flash memory technologies [7]. Due to these properties, the memristor crossbar, which employs a memristor at each intersection of horizontal and vertical metal wires, has been proposed to facilitate weight matrix storage and matrix–vector multiplication.

In this paper, we present the brain inspired information processing framework and its acceleration using memristor crossbar array. The remainder of the paper is organized as follows. In Section II, we discuss some related neuromorphic works while in Section III we introduce the basics of models used for sensing and association in the ITRS system. Section IV describes the overall system model and the algorithms in different layers. Section V gives the details of hardware acceleration using memristor crossbar array. The experimental results and discussions are presented in Section VI. Section VII summarizes the work.

## 2 Related Works

During recent years, neuromorphic computing has become an important research area. The research works range from applications to hardware implementations.

In [22], Voorhies et al. introduced a uniquely structured Bayesian learning network that combines measures across spatial and temporal scales on various visual features to detect small, rare events in far-field video streams. However, this structured Bayesian learning network does not easily fit applications like text recognition and completion. The authors of [23] proposed a sophisticated method based on spiking neuromorphic systems with event-driven contrastive-divergence-trained restricted Boltzmann machines, and applied the model to recognize MNIST handwritten digit images. However, their application is limited to the pattern matching layer and does not go beyond it. Schmuker et al. [24] demonstrated brain-like processing using a spiking neural network that achieves classification of generic multidimensional data. No specific application, however, is discussed in that work; it only provides a proof-of-concept design of analog electronic microcircuits that mimic the behavior of neurons for real-world computing tasks.

Many existing neuromorphic computing research efforts concentrate on pattern matching applications such as video detection or character recognition. Very few of them study the function of probabilistic inference in neuromorphic computing. Some works also focus on developing general hardware architectures for neuromorphic computing. For example, IBM’s TrueNorth [25] is an efficient, scalable and flexible non-von Neumann architecture that integrates 1 million programmable spiking neurons and 256 million configurable synapses. The hardware is suited to many applications such as multi-object detection and classification. Other novel architectures utilize emerging device technologies such as the memristor crossbar or *phase change memory* (*PCM*). The authors of [18] attempt to implement data storage using memristors, and [26] describes a memristor-based neuromorphic circuit capable of learning that is tolerant of error. Suri et al. [27] demonstrate a unique energy-efficient methodology that uses PCM as synapses in ultra-dense large-scale neuromorphic systems, and show the extraction of complex visual patterns from real-world data using PCM synapses in a 2-layer spiking neural network.

To the best of our knowledge, our proposed architecture is the first that covers both the pattern matching layer and the probabilistic inference layer in neuromorphic computing. Neither the implementation on a state-of-the-art multicore processor nor the projected acceleration using memristor crossbar arrays has been addressed in previous works.

## 3 Background

### 3.1 Neural Network and BSB Model

The BSB model is an auto-associative, nonlinear neural network. Its operation consists of two steps: *training* and *recall*. The mathematical model of the BSB recall function can be represented as:

\( \mathbf{x}\left(t+1\right)=S\left(\alpha \cdot \mathbf{A}\mathbf{x}(t)+\beta \cdot \mathbf{x}(t)\right) \)  (1)

where **x** is an *N*-dimensional real vector and **A** is an *N*-by-*N* connection matrix, which is trained using the extended Delta rule. **Ax**(*t*) is a matrix–vector multiplication, which is the main function of the recall operation. *α* is a scalar constant feedback factor and *β* is an inhibition decay constant. *S*() is the “squash” function defined as follows:

\( S(y)=\left\{\begin{array}{ll}1, & y\ge 1\\ {}y, & -1<y<1\\ {}-1, & y\le -1\end{array}\right. \)

For a given input pattern **x**(0), the recall function computes (1) iteratively until *convergence*, that is, when all entries of **x**(*t* + 1) are either ‘1’ or ‘−1’ [14].

Algorithm 1. BSB training algorithm using Delta rule.

*Step 0.* Initialize weights (zero or small random values).

Initialize learning rate *α*.

*Step 1.* Randomly select one prototype pattern *γ*^{(k)} ∈ *B*^{n}, *k* = 1, …, *m*, where *B*^{n} is the *n*-dimensional binary space {−1, 1}^{n}.

Set the target output to the external input prototype pattern *γ*^{(k)}: *t*_{i} = *γ*_{i}^{(k)}.

*Step 2.* Compute net inputs: \( {y}_{i{n}_i}={\sum}_j{\gamma}_j{w}_{ji} \)

(Each net input is a combination of weighted signals received from all units.)

*Step 3.* Each unit determines its activation (output signal):

\( {y}_i=S\left({y}_{i{n}_i}\right)=\left\{\begin{array}{ccccc}\hfill 1,\hfill & \hfill \hfill & \hfill {y}_{in}\hfill & \hfill \ge \hfill & \hfill 1\hfill \\ {}\hfill {y}_{in},\hfill & \hfill -1\hfill & \hfill <{y}_{in}\hfill & \hfill <\hfill & \hfill 1\hfill \\ {}\hfill -1,\hfill & \hfill \hfill & \hfill {y}_{in}\hfill & \hfill \le \hfill & \hfill -1\hfill \end{array}\right. \)

*Step 4.* Update weights: *Δw*_{ij} = *α*⋅(*t*_{j} − *y*_{j})⋅*γ*_{i}.

*Step 5.* Repeat *Steps 1-4* until the condition |*t*(*i*) − *y*(*i*)| < *θ* is satisfied in *m* consecutive iterations.

The most fundamental BSB training algorithm is given in Algorithm 1, which is based on the extended Delta rule [8]. It aims at finding the weights that minimize the squared error between a target output pattern and the input prototype pattern.
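As a minimal illustration only (not the authors' implementation), the Delta-rule training of Algorithm 1 and the recall iteration of Eq. (1) can be sketched in NumPy. The pattern dimension, learning rate, and epoch count below are placeholder choices, and patterns are visited in a fixed order rather than randomly:

```python
import numpy as np

def squash(y):
    """Piecewise-linear 'squash' S(): clip every entry into [-1, 1]."""
    return np.clip(y, -1.0, 1.0)

def train_bsb(patterns, lr=0.1, epochs=500):
    """Algorithm 1 sketch (extended Delta rule). patterns: rows are +/-1 prototypes."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for _ in range(epochs):
        for gamma in patterns:
            y = squash(W @ gamma)                 # Steps 2-3: net input and activation
            W += lr * np.outer(gamma - y, gamma)  # Step 4: dW[i,j] = lr*(t_i - y_i)*gamma_j
    return W

def bsb_recall(W, x0, alpha=1.0, beta=1.0, max_iter=100):
    """Iterate x(t+1) = S(alpha*W*x(t) + beta*x(t)) until every entry saturates."""
    x = x0.copy()
    for it in range(1, max_iter + 1):
        x = squash(alpha * (W @ x) + beta * x)
        if np.all(np.abs(x) == 1.0):              # converged: all entries are +1 or -1
            return x, it
    return x, max_iter
```

With two orthogonal prototypes, a scaled-down (noisy) version of a stored pattern is pulled back to the stored ±1 vector within a few iterations.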

### 3.2 Cogent Confabulation

Inspired by the human cognitive process, cogent confabulation [1] mimics human information processing, including Hebbian learning, the correlation of conceptual symbols, and the recall action of the brain. Based on this theory, the cognitive information process consists of two steps: learning and recall. The confabulation model represents an observation using a set of features. These features construct the basic dimensions that describe the world of the application. Different observed attributes of a feature are referred to as *symbols*. The set of symbols used to describe the same feature forms a *lexicon*, and the symbols in a lexicon are mutually exclusive.

In the learning process, matrices storing the posterior probabilities between neurons of two features are captured and referred to as *knowledge links* *(KL)*. A KL stores weighted directed edges from symbols in a source lexicon to symbols in a target lexicon. The (*i*, *j*)th entry of a KL, quantified as the conditional probability *P*(*s*_{i}|*t*_{j}), represents the Hebbian plasticity of the synapse between the *i*th symbol in source lexicon *s* and the *j*th symbol in target lexicon *t*. The knowledge links are constructed during the learning process by extracting and associating features from the inputs, and the collection of all knowledge links in the model forms its *knowledge base (KB)*.

During recall, the input is a noisy observation of the target. In this observation, certain features are observed with great ambiguity; therefore multiple symbols are assigned to the corresponding lexicons. The goal of the recall process is to resolve the ambiguity and select the set of symbols with maximum likelihood using the statistical information obtained during the learning process. This is achieved using a procedure similar to the integrate-and-fire mechanism in biological neural systems. Each neuron in a target lexicon receives an excitation from neurons of other lexicons through KLs, which is the weighted sum of its incoming excitatory synapses. Among neurons in the same lexicon, those that are least excited are suppressed and the rest fire and become excitatory inputs to other neurons. Their firing strengths are normalized and proportional to their excitation levels. As neurons are gradually suppressed, eventually only the neuron with the highest excitation remains firing in each lexicon, and the ambiguity is thus resolved.

Let *l* denote a lexicon, *F*_{l} denote the set of lexicons that have knowledge links going into lexicon *l*, and *S*_{l} denote the set of symbols that belong to lexicon *l*. The excitation of a symbol *t* in lexicon *l* is calculated by summing up all incoming knowledge links:

\( el(t)={\sum}_{k\in {F}_l}{\sum}_{s\in {S}_k} el(s)\cdot \left[\ln \left(\frac{P\left(s|t\right)}{p_0}\right)+B\right] \)

where *el*(*s*) is the excitation level of the source symbol *s*. The parameter *p*_{0} is the smallest meaningful value of *P*(*s*_{i}|*t*_{j}). The parameter *B* is a positive global constant called the *bandgap*. The purpose of introducing *B* in the function is to ensure that a symbol receiving *N* active knowledge links will always have a higher excitation level than a symbol receiving (*N* − 1) active knowledge links, regardless of their strength. As we can see, the excitation level of a symbol is actually its log-likelihood given the observed attributes in other lexicons.
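The bandgap effect can be made concrete with a small sketch (an illustration under assumed data structures, not the authors' code): a dictionary of knowledge-link probabilities stands in for the KB, and links weaker than `p0` are treated as inactive.

```python
import math

def excitation(target_symbol, active_sources, kl_prob, p0=1e-4, bandgap=100.0):
    """Sum the log-likelihood contributions from all active source symbols.

    active_sources: {source_symbol: its excitation level el(s)}
    kl_prob:        {(source_symbol, target_symbol): P(s | t)}
    Links with probability below p0 contribute nothing (inactive).
    """
    total = 0.0
    for s, el_s in active_sources.items():
        p = kl_prob.get((s, target_symbol), 0.0)
        if p >= p0:
            # The bandgap B guarantees that N active links always outscore N-1
            # active links, regardless of the individual link strengths.
            total += el_s * (math.log(p / p0) + bandgap)
    return total
```

With a large bandgap, a target receiving two minimal-strength links outscores a target receiving one very strong link, exactly the property described above.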

## 4 System Architecture

### 4.1 Overview of the ITRS

Figure 1 shows an example of using the ITRS to read texts that have been occluded. The BSB algorithm recognizes text images with its best effort. The word level confabulation provides all possible words that can be formed based on the recognized characters while the sentence level confabulation finds the combination among those words that gives the most meaningful sentence.

### 4.2 Character Level Image Recognition

The region extraction operates at the page level. In this stage pages are broken down to paragraphs. The line extraction operates at paragraph level, which extracts the text lines from a paragraph. The line correction is the next step that corrects all deformations due to warping and rotation. Characters are then extracted and scaled in order to remove perspective distortion. Correct order of text lines in paragraph and correct order of paragraphs in a page are determined in line ordering and paragraph ordering stages. Each character image is labeled with these orders and sent to BSB model for pattern recognition.

We designed a new “racing” algorithm for BSB recalls to implement the multi-answer character recognition process. Let *S* denote the set of characters that we want to recognize. Without loss of generality, assume the size of *S* is 52, which is the number of upper and lower case characters in the English alphabet. We also assume that for each character, there are *M* typical variations in terms of different fonts, styles and sizes. In terms of pattern recognition, there is a total of 52 × *M* patterns to remember during training and to recognize during recall.

A separate BSB model is built for each character in *S*. Therefore there is a set of 52 BSB models, and each BSB model is trained with all variations of one character. The multi-answer implementation uses the BSB model’s convergence speed to represent the similarity between an input image and the stored pattern. An input image is compared against each of the 52 BSB models, and therefore triggers 52 recall processes. The number of iterations that each recall process takes to converge is recorded. We then pick the characters of the *K* “fastest” converging processes as the final output to the word confabulation model. Figure 3 gives an example of how the racing mechanism works.
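The race can be sketched in Python as an illustration (the real BSB engines run in parallel hardware; `_recall_iters` below is a stripped-down recall loop and the per-character weight matrices are hypothetical placeholders):

```python
import numpy as np

def _recall_iters(W, x0, max_iter=100):
    """Minimal BSB recall: count iterations until all entries saturate to +/-1."""
    x = x0.copy()
    for it in range(1, max_iter + 1):
        x = np.clip(W @ x + x, -1.0, 1.0)
        if np.all(np.abs(x) == 1.0):
            return it
    return max_iter

def racing_recognition(image_vec, bsb_models, k=3):
    """Run one recall per character model and keep the K fastest-converging
    characters; fewer iterations means a closer match to the stored pattern."""
    order = sorted((_recall_iters(W, image_vec), ch)
                   for ch, W in bsb_models.items())
    return [ch for _, ch in order[:k]]
```

A model whose attractor matches the input saturates in one or two iterations, while a mismatched model may never converge and is ranked last.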

### 4.3 Word Level Confabulation

The combinations of letter candidates are validated against the trie. For example, consider the word “dog” with candidates [d o b] [o a g] [g a y] for its three letter positions. Word confabulation traverses the trie using these candidates to search for valid words present in the trie. The valid words are pushed onto a stack; in this example, they would be: dog, day, boy, bag. Since the letter candidates are passed with their relative confidence levels, the confidence level of each word is the product of the confidence levels of the letters it contains.
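This traversal is straightforward to sketch; the code below is an illustrative reimplementation (a nested-dict trie with a `"$"` end-of-word marker is an assumption, not the paper's data structure):

```python
def build_trie(words):
    """Nested-dict trie; the key '$' marks the end of a stored word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def valid_words(candidates, trie):
    """candidates: one list of (letter, confidence) pairs per position.
    Returns {word: confidence} for every candidate combination found in the
    trie; a word's confidence is the product of its letters' confidences."""
    results = {}
    def walk(pos, node, word, conf):
        if pos == len(candidates):
            if "$" in node:
                results[word] = conf
            return
        for ch, c in candidates[pos]:
            if ch in node:              # prune invalid prefixes immediately
                walk(pos + 1, node[ch], word + ch, conf * c)
    walk(0, trie, "", 1.0)
    return results
```

Pruning at the first invalid prefix is what keeps the search cheap: of the 27 letter combinations in the “dog” example, only the paths that stay inside the trie are ever expanded.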

### 4.4 Sentence Level Confabulation

The sentence level confabulation model defines three levels of lexicons. The first and second level lexicons represent single words and pairs of adjacent words, while the third level lexicons represent the *parts-of-speech* (POS) tags of the corresponding words. During recall, the word and word-pair symbols corresponding to the outputs from word level confabulation are set as active, and all POS tag symbols are also set as active. If a lexicon has more than one active symbol, it is said to have ambiguity. The goal of sentence confabulation is to resolve the ambiguity iteratively through a recall procedure similar to belief propagation and finally form a meaningful sentence. The general confabulation recall algorithm is described in Algorithm 2.

As Algorithm 2 shows, for each lexicon that has multiple symbols activated, we calculate the *excitation level* of each activated symbol. The *N* highest excited symbols in the lexicon are kept active. These symbols further excite the symbols in other ambiguous lexicons. This procedure continues until the activated symbols in all lexicons no longer change. If convergence cannot be reached after a given number of iterations, the procedure is forced to converge. The value of *N* is then reduced by 1 and the above procedure is repeated. Finally *N* reaches 1, which means only one active symbol remains in each lexicon and the ambiguity is eliminated in all lexicons.

Algorithm 2. Confabulation recall algorithm

**for each** *known lexicon*

*set symbol to be active*

**end for**

**for** *N from MAX_AMBIGUOUS downto 1*

*converged = false;*

*iteration_count = 0;*

**while** *not converged*

**for each** *unknown lexicon*

**for each** *symbol associated to the lexicon*

*calculate the excitation level of the symbol;*

**end for**

*select N highest excited symbols and set them to be active;*

**end for**

*iteration_count++;*

**if** *activated set does not change since last iteration or iteration_count >= MAX_ITERATION*

*converged = true;*

**end if**

**end while**

**end for**

*Lexicons that have only one symbol candidate are denoted as known lexicons; others are unknown lexicons.
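For concreteness, Algorithm 2 can be condensed into a short Python sketch. This is an illustration under assumed interfaces: lexicons are dictionaries of candidate sets, and `excite` is a stand-in for the knowledge-base excitation computation.

```python
def confabulation_recall(lexicons, excite, max_ambiguous=4, max_iteration=20):
    """Algorithm 2 sketch. lexicons: {name: set of active symbols};
    excite(symbol, lexicons) -> excitation level of that symbol.
    Known lexicons (a single candidate) stay fixed; ambiguity shrinks as N
    is reduced from max_ambiguous down to 1."""
    for n in range(max_ambiguous, 0, -1):
        converged, count = False, 0
        while not converged:
            changed = False
            for name, symbols in lexicons.items():
                if len(symbols) <= 1:
                    continue                           # known lexicon: skip
                ranked = sorted(symbols,
                                key=lambda s: excite(s, lexicons),
                                reverse=True)
                keep = set(ranked[:n])                 # keep N highest excited
                if keep != symbols:
                    lexicons[name] = keep
                    changed = True
            count += 1
            converged = (not changed) or count >= max_iteration
    return lexicons
```

When the loop exits, every lexicon holds exactly one symbol, mirroring the forced convergence described above.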

### 4.5 Improving Sentence Confabulation

We improve the sentence confabulation model by weighting each knowledge link using *mutual information* (MI) [9]. The mutual information of two random variables is a measure of the variables’ mutual dependence, calculated as:

\( I\left(A;B\right)={\sum}_{a\in A}{\sum}_{b\in B}p\left(a,b\right)\log \frac{p\left(a,b\right)}{p(a)\,p(b)} \)

where *A* is the source lexicon and *a* represents symbols in *A*; *B* is the target lexicon and *b* represents symbols in *B*; *p*(*a*, *b*) is the joint probability of symbols *a* and *b*; and *p*(*a*) and *p*(*b*) are the marginal probabilities of symbols *a* and *b*, respectively. *I*(*A*; *B*) is nonnegative, and its value increases as the correlation between the symbols in lexicons *A* and *B* gets stronger. We define the weight of the KL (i.e., *w*_{kl}) from *A* to *B* as a positive linear function of the MI of *A* and *B*.

The overall excitation then combines the language information from the confabulation layer with the image information from the BSB layer to give the final excitation of a symbol *t* as follows,

\( el(t)=\alpha \cdot {P}_{BSB}(t)+\beta \cdot {P}_{conf}(t) \)  (5)

where *P*_{conf}(*t*) is the normalized confabulation excitation of *t*.

In (5), variable *P*_{BSB}(*t*) is the excitation to *t* from the BSB layer, which is calculated as: \( {P}_{BSB}(t)=\frac{1/\left({N}_{BSB}(t)-{N}_{min}\right)}{{\displaystyle {\sum}_t}1/\left({N}_{BSB}(t)-{N}_{min}\right)} \), where *N*_{BSB}(*t*) is the BSB convergence speed of *t*, *N*_{min} is the minimum convergence number that is possible for BSB engines, α and β are coefficients that adjust the weight of BSB (i.e., image) information and confabulation (i.e., language) information, *α* + *β* = 1. In general, we should increase the value of *α* and decrease the value of *β* when the image has high quality and vice versa.
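The image-side term can be sketched as follows (an illustration only; the small epsilon guard against a zero denominator when a recall converges at exactly *N*_{min} iterations, and the fixed `alpha`/`beta` defaults, are my assumptions):

```python
def p_bsb(conv_iters, n_min):
    """Normalized image evidence: faster BSB convergence -> higher score.
    conv_iters: {candidate: iterations the BSB recall took}; n_min: the
    minimum convergence count possible for the BSB engines."""
    inv = {t: 1.0 / max(n - n_min, 1e-9) for t, n in conv_iters.items()}
    total = sum(inv.values())
    return {t: v / total for t, v in inv.items()}

def combined_excitation(p_img, p_lang, alpha=0.7, beta=0.3):
    """Weighted blend of image (BSB) and language (confabulation) evidence,
    with alpha + beta = 1; low-quality images call for a smaller alpha."""
    return {t: alpha * p_img[t] + beta * p_lang.get(t, 0.0) for t in p_img}
```

The normalization makes the candidate scores comparable across characters regardless of how many recall iterations the engines actually ran.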

## 5 Hardware Acceleration of BSB Recall

### 5.1 Memristor and Crossbar Array

The memristance of the device is determined by the position of the doping front:

\( M(p)=p\cdot {R}_L+\left(1-p\right)\cdot {R}_H \)

where *p* (0 ≤ *p* ≤ 1) is the relative doping front position, which is the ratio of the doping front position over the total thickness of the TiO_{2} thin-film, and *R*_{L} and *R*_{H} respectively denote the *low resistance state* (LRS) and the *high resistance state* (HRS) of the memristor. The velocity of the doping front movement *v*(*t*), driven by the voltage *V*(*t*) applied across the memristor, can be expressed as:

\( v(t)=\frac{dp}{dt}={\mu}_v\cdot \frac{R_L}{h^2}\cdot \frac{V(t)}{M(p)} \)

where *μ*_{v} is the equivalent mobility of dopants, *h* is the total thickness of the thin film, and *M*(*p*) is the total memristance when the relative doping front position is *p*. In general, a certain energy (or threshold voltage) is required to enable a state change in a memristive device. When the electrical excitation through a memristor is greater than the threshold voltage, i.e., *V*(*t*) > *V*_{th}, the memristance changes (in training); otherwise, the memristor behaves like a resistor.

### 5.2 Matrix Multiplication Using Memristor Crossbar

In the *N*-by-*N* memristor crossbar array illustrated in Fig. 5 for matrix computation, a set of input voltages **V**_{I}^{T} = {*V*_{I,1}, *V*_{I,2}, …, *V*_{I,N}} is applied on the *word-lines* (WL) of the array, and the current through each *bit-line* (BL) is collected by measuring the voltage across a sensing resistor. The same sensing resistors are used on all BLs, with resistance *r*_{s}, or conductance *g*_{s} = 1/*r*_{s}. The output voltage vector is **V**_{O}^{T} = {*V*_{O,1}, *V*_{O,2}, …, *V*_{O,N}}. Assume the memristor sitting on the connection between WL_{i} and BL_{j} has a memristance of *m*_{i,j}, with corresponding conductance *g*_{i,j} = 1/*m*_{i,j}. Then, the relation between the input and output voltages can be represented by:

\( {\mathbf{V}}_O=\mathbf{C}{\mathbf{V}}_I \)  (8)

where **C** can be represented by the memristors’ conductances and the load resistors as:

\( {c}_{i,j}=\frac{g_{i,j}}{g_s+{\sum}_{k=1}^N{g}_{k,j}} \)  (9)

Please note that some non-iterative neuromorphic hardware uses the output currents **I**_{O} as output signals. Since the BSB algorithm discussed in this work is an iterative network, we take **V**_{O} as output signals, which can be directly fed back to inputs for the next iteration without extra design cost.

Equation (8) indicates that a trained memristor crossbar array can be used to construct the weight matrix **C**, and transfer the input vector **V**_{I} to the output vector **V**_{O}. However, **C** is not a direct one-to-one mapping of the conductance matrix **G**, as indicated in Eq. (9). Though we can use a numerical iteration method to obtain the exact mathematical solution of **G**, it is too complex and hence impractical when frequent updates are needed.

Assume that each conductance *g*_{i,j} ∈ **G** satisfies *g*_{min} ≤ *g*_{i,j} ≤ *g*_{max}, where *g*_{min} and *g*_{max} respectively represent the minimum and the maximum conductance of all the memristors in the crossbar array. Thus, a simpler and faster approximate solution to the mapping problem is defined as:

\( {g}_{i,j}={c}_{i,j}\cdot \left({g}_{max}-{g}_{min}\right)+{g}_{min} \)  (10)

With the proposed fast approximation function (10), the memristor crossbar array performs as a decayed matrix *Ĉ* between the input and output voltage signals, where *ĉ*_{i,j} = *c*_{i,j} ⋅ *g*_{max}/*g*_{s}.
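Numerically, the approximation behaves well whenever the sensing conductance dominates the column sums. The sketch below illustrates this (an idealized model with placeholder device values, not a circuit simulation): weights are mapped into the device range, and the crossbar read-out is compared against the decayed matrix prediction.

```python
import numpy as np

def map_weights_to_conductance(C, g_min, g_max):
    """Eq. (10)-style fast mapping: scale weights c_ij in [0, 1] linearly
    into the achievable device conductance range [g_min, g_max]."""
    return C * (g_max - g_min) + g_min

def crossbar_output(G, v_in, g_s):
    """Idealized crossbar read-out: bit-line j sums the currents g_ij * V_i,
    and the sensing resistor (conductance g_s) converts the sum of currents
    back into a voltage (per Eq. (9))."""
    return (G.T @ v_in) / (g_s + G.sum(axis=0))

# Placeholder device parameters for illustration
C = np.array([[0.2, 0.8],
              [0.5, 0.1]])
G = map_weights_to_conductance(C, g_min=1e-6, g_max=1e-3)
v_in = np.array([1.0, -0.5])
v_out = crossbar_output(G, v_in, g_s=1.0)
decayed = (1e-3 / 1.0) * (C.T @ v_in)   # prediction: c_ij * g_max / g_s
```

With `g_s` three orders of magnitude above `g_max`, the read-out matches the decayed matrix prediction to within the mapping error introduced by `g_min`.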

### 5.3 Training Memristor Crossbars in BSB Model

A software-generated weight matrix can be mapped to the memristor crossbar arrays under the assumption that every memristor in the crossbar can be perfectly programmed to the required resistance value. However, the traditional crossbar programming method faces accuracy and efficiency limitations due to the existence of sneak paths [11]. Although some recent works improve the write/read ability of memristor crossbars by leveraging the device nonlinearity [11], the controllability of analog state programming is still limited. Instead of preparing the memristor crossbars with open-loop writing operations, we propose a *closed-loop training method* that iteratively tunes the entire memristor crossbar to the target state. This technique is based on a modification of the software training algorithm.

In the BSB model, each weight *w*_{ij} corresponds to the analog state of the memristor at the cross-point of the *i*th row and the *j*th column of a crossbar array. A weight update Δ*w*_{ij} involves multiplying three analog variables: *α*, *t*_{j} − *y*_{j}, and *x*_{i}. Though these variables are available in the training scheme design, the hardware implementation needed to obtain their product demands unaffordably high computation resources. Thus, we simplify the weight updating function, trading off convergence speed, as:

\( \Delta {w}_{ij}=\alpha \cdot sign\left({t}_j-{y}_j\right)\cdot sign\left({x}_i\right) \)

Here, *sign*(*t*_{j}–*y*_{j}) and *sign*(*x*_{i}) are the polarities of *t*_{j} − *y*_{j} and *x*_{i}, respectively. *sign*(*t*_{j} − *y*_{j})⋅*sign*(*x*_{i}) represents the *direction* of the weight change.

The simplification minimizes the circuit design complexity meanwhile ensuring the weight change in the same direction as that of the Delta rule.
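The two update rules can be compared directly in a small sketch (illustrative only; the learning rate and step size are arbitrary): the sign-based rule discards the magnitudes but moves every selected weight in the same direction as the full Delta rule.

```python
import numpy as np

def delta_rule_update(W, x, t, y, lr):
    """Software Delta rule: dW[i,j] = lr * (t_j - y_j) * x_i
    (a product of three analog quantities)."""
    return W + lr * np.outer(x, t - y)

def sign_rule_update(W, x, t, y, step):
    """Hardware-friendly variant: only the direction
    sign(t_j - y_j) * sign(x_i) is kept, and each selected memristor is
    nudged by a fixed programming step."""
    return W + step * np.outer(np.sign(x), np.sign(t - y))
```

Because both rules share the sign pattern of `outer(x, t - y)`, the simplified update always descends in the same direction, only more slowly, which is the stated trade-off.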

### 5.4 Transformation of BSB Recall Matrix

A memristor is a physical device with conductance *g* > 0. Therefore, all elements in matrix **C** must be positive as shown in (9). However, in the original BSB recall model, *a*_{i,j}∈**A** can be either positive or negative. An alternative solution is moving the whole **A** into the positive domain. Since the output **x**(*t* + 1) will be used as input signal in the next iteration, a biasing scheme at **x**(*t* + 1) is needed to cancel out the shift induced by the modified **A**. The biasing scheme involves a vector operation since the shift is determined by **x**(*t*).

To avoid a bias adjustment of **A** in the physical mapping and to leverage the high integration density of the memristor crossbar, we propose to split the positive and negative elements of **A** into two matrices **A**^{+} and **A**^{−} as:

\( {a}_{i,j}^{+}=\left\{\begin{array}{ll}{a}_{i,j}, & {a}_{i,j}>0\\ {}0, & {a}_{i,j}\le 0\end{array}\right.\kern2em {a}_{i,j}^{-}=\left\{\begin{array}{ll}0, & {a}_{i,j}\ge 0\\ {}-{a}_{i,j}, & {a}_{i,j}<0\end{array}\right. \)

As such, with *α* = *β* = 1, the recall function becomes:

\( \mathbf{x}\left(t+1\right)=S\left({\mathbf{A}}^{+}\mathbf{x}(t)-{\mathbf{A}}^{-}\mathbf{x}(t)+\mathbf{x}(t)\right) \)

Thus, **A**^{+} and **A**^{−} can be mapped to two memristor crossbar arrays **M**_{1} and **M**_{2} in decayed versions **Â**^{+} and **Â**^{−}, respectively, by following (10).
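The split and the resulting two-crossbar recall step can be verified numerically; the sketch below is a software check of the algebra (the clip stands in for the squash function), not a model of the analog circuits.

```python
import numpy as np

def split_weights(A):
    """Split A into non-negative A+ and A- so that A = A+ - A-; each half can
    then be mapped onto its own crossbar, where all conductances are positive."""
    A_pos = np.where(A > 0, A, 0.0)
    A_neg = np.where(A < 0, -A, 0.0)
    return A_pos, A_neg

def recall_step_two_crossbars(A_pos, A_neg, x):
    """One recall iteration with alpha = beta = 1:
    x(t+1) = S(A+ x - A- x + x), S realized as a clip into [-1, 1]."""
    return np.clip(A_pos @ x - A_neg @ x + x, -1.0, 1.0)
```

Subtracting the two crossbar outputs reproduces the signed matrix-vector product exactly, which is why no bias cancellation is needed.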

## 6 Experimental Results

In this section, we present several independent experiments carried out on different layers of the ITRS system. Each experiment is specifically designed to demonstrate our improvements on that particular layer over the previous works. Their configuration and results are discussed in detail in the following sections. We also report the accuracy and confidence level of the entire ITRS system when applied to recognize document images with different qualities. At the end, we demonstrate the recall quality of memristor crossbar array based BSB, and analyze its performance and cost.

### 6.1 Performance Improvement in Word Confabulation Layer

Quality of input images.

| Image sets | Scanned Clean | Scanned Occluded | Camera Occluded |
| --- | --- | --- | --- |
| Avg. SNR | 5.1204 | 4.3756 | 3.8116 |
| Avg. PSNR | 8.095 | 7.3515 | 6.7904 |

Improvement in the word confabulation layer.

| Word Confabulation Time (sec) | Clean image | Scan Occluded | Camera Occluded |
| --- | --- | --- | --- |
| Original implementation | 310 | 2997 | 2483 |
| New implementation | 0.3 | 1.28 | 1.71 |

### 6.2 Performance Improvement in Sentence Confabulation Layer

As the most important layer of ITRS system, more optimizations are proposed on sentence confabulation layer. In order to focus only on the performance of sentence confabulation, for all experiments in this subsection, we set α and β in Eq. (5) to 0 and 1 respectively, in order to decouple the image information from sentence confabulation. We will discuss the impact of parameters α and β in the next subsection.

Comparison of the non-circular and circular models.

| | Non-circular | Circular | Improvement (%) |
| --- | --- | --- | --- |
| Training time (sec) | 489180 | 144540 | 70.45 |
| Recall time (sec) | 6317.22 | 5207.83 | 17.56 |
| Accuracy | 54.95 % | 68.13 % | 23.99 |

We also reduced the initialization time of sentence confabulation by loading the knowledge base in parallel. The sentence confabulation knowledge base is more than 7 GB, and loading it sequentially takes more than 83.9 s. A multi-thread implementation that loads the knowledge base in parallel reduces the initialization time to 31 s, a 2.7× speedup.

Integrating POS tags into the confabulation model significantly improves the sentence confabulation accuracy [15]. To evaluate the impact, the tag-assisted confabulation method is compared with no-tag confabulation at various noise levels. In this experiment, we randomly select input character images and add 3-pixel wide horizontal strikes. The noise level percentage is the ratio of characters in the text that carry a 3-pixel wide horizontal strike. Since the original character size is 15 × 15 pixels, a 3-pixel wide strike is almost equivalent to a 20 % distortion.

The next set of experiments is to show the impact of weighting knowledge link of sentence level confabulation using mutual information between the source and target lexicons.

We also evaluate the sentence confabulation model on the *Microsoft Research Sentence Completion (MRSC)* challenge. The MRSC challenge is intended to stimulate research into language modeling techniques that are sensitive to overall sentence coherence [20]. The challenge consists of fill-in-the-blank questions similar to those widely used in the Scholastic Aptitude Test. Due to limited time, we use the partial training set provided by the MRSC project to train our confabulation model, and we run the recall function with and without weighted knowledge links to fill in the blank words for the 1,040 tests in the challenge. Figure 7 shows the recall accuracy of the two confabulation models. For each model, the bandgap is varied from 1 to 1000. As we can see, when the bandgap value is 10 or less, assigning weights to KLs provides little improvement. However, when the bandgap value exceeds 100, assigning weights to KLs brings visible benefits: it improves accuracy by about 4 %. The recall accuracy saturates after the bandgap exceeds 100. We also observe that, without weighted KLs, changing the bandgap value has almost no impact on the recall accuracy. Please note that in this experiment the condition is equivalent to words being completely covered, so sentence level confabulation cannot get any clues from word confabulation. Moreover, since we train on an incomplete training set to save time, some words appearing in the tests are not stored in the dictionary. An unrecognized word will never be recalled correctly by the confabulation model, so training on the complete training set would further increase sentence accuracy. The same testing set was evaluated in [21]; our weighted confabulation model gives a recall accuracy of 48.30 %, slightly higher than the 47 % accuracy of the *recurrent neural network (RNN)* model. Please note that the RNN model identifies the missing word from the list of candidates by evaluating the probability of the sentence each candidate would produce. Therefore, it has to create a sentence for each combination of the candidates and calculate its probability. The complexity of the RNN approach is an exponential function of the number of missing words, while the complexity of the confabulation model is a linear function of the number of missing words.

### 6.3 Performance Improvement of Overall ITR System

To evaluate the impact of weighting image and language information, we assign *w*_{kl} as 1 and vary *α* from 1 to 0 in steps of 0.1, with *β* varying from 0 to 1 accordingly. In this experiment, we run the complete ITRS to show the overall performance.

*α*and

*β*control the weight of image information and language knowledge. Adjusting the value of α and β affects the accuracy of ITRs. Figure 8 shows how the word accuracy changes as we vary the value of α and β. In this experiment, we take three sets of images as input, scanned clean images, scanned occluded images and occluded images taken by camera. As we can see, completely ignore either the image inform or language information will lead to poor accuracy. Furthermore, for a clean image, we can rely more on the image information, and the best quality recognition happens when α and β are set at (0.9, 0.1); while for a low quality image, we should rely more on language information, and the best quality recognition happens when α and β are set at (0.7, 0.3).
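A minimal sketch of this weighting scheme (the dictionaries and function name are hypothetical illustrations; the real system combines BSB excitations with confabulation knowledge-link excitations):

```python
# Hypothetical sketch of the alpha/beta weighting described above.
# image_excitation and language_excitation map candidate words to
# their (normalized) excitation levels from the two sources.

def combined_excitation(image_excitation, language_excitation, alpha, beta):
    # alpha weights the BSB image evidence, beta weights the
    # sentence-level language knowledge; the experiment sweeps
    # alpha from 1 to 0 and beta from 0 to 1 in steps of 0.1.
    words = set(image_excitation) | set(language_excitation)
    return {w: alpha * image_excitation.get(w, 0.0)
               + beta * language_excitation.get(w, 0.0)
            for w in words}

img = {"cat": 0.9, "cot": 0.8}    # image evidence is ambiguous
lang = {"cat": 0.7, "cot": 0.1}   # context strongly favors "cat"
scores = combined_excitation(img, lang, alpha=0.7, beta=0.3)
best = max(scores, key=scores.get)
print(best)  # cat
```

With (α, β) = (0.7, 0.3), the language context breaks the near-tie left by the occluded image evidence.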

We further assign a confidence level to the words recognized by the ITRS. The confidence level is calculated as the normalized excitation difference between the selected candidate and its final competitor in the last round of confabulation, \( c\left({t}_1\right)= \min\left[1,\ \frac{I\left({t}_1\right)-I\left({t}_2\right)}{I\left({t}_2\right)}\right] \), where *t*_{1} is the selected word and *t*_{2} is its only competitor in the last round of confabulation. Under this definition, 100 % confidence means that there was only one candidate for the lexicon, while 0 % confidence means that the excitation levels of the two remaining candidates are the same; in that case, the program simply chooses the first candidate.
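The confidence computation follows directly from the formula above (the lone-candidate convention for the corner case is our assumption):

```python
def confidence(i1, i2):
    # c(t1) = min(1, (I(t1) - I(t2)) / I(t2)), where i1 is the
    # excitation of the selected word t1 and i2 that of its only
    # remaining competitor t2 in the last confabulation round.
    if i2 == 0:
        return 1.0  # assumed convention: no real competitor left
    return min(1.0, (i1 - i2) / i2)

print(confidence(5.0, 5.0))   # 0.0 -> tie, zero confidence
print(confidence(12.0, 5.0))  # 1.0 -> capped at 100 %
print(confidence(6.0, 5.0))   # 0.2 -> 20 % confidence
```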

Confidence level of scanned occluded images.

File Name | Test-1 | Test-2 | Test-3 | Test-4 | Total |
---|---|---|---|---|---|
Total Words | 761 | 737 | 745 | 613 | 2856 |
Total Right Words | 731 | 703 | 700 | 575 | 2709 |
Total Wrong Words | 30 | 34 | 45 | 38 | 147 |
Average Confidence of Right Words (%) | 88.86 | 85.66 | 88.63 | 88.96 | 87.99 |
Average Confidence of Wrong Words (%) | 16.83 | 20.87 | 15.37 | 24.53 | 19.31 |
Total Average Confidence (%) | 86.03 | 82.67 | 84.20 | 84.96 | 84.46 |
Total Accuracy (%) | 96.06 | 95.39 | 93.96 | 93.80 | 94.85 |

Comparison between ITRS and Tesseract.

Input quality | Scanned clean images | Scanned images with occlusions | Camera images with occlusions |
---|---|---|---|
Tesseract | 100 % | 93.1 % | 88.6 % |
ITRS (default) | 97.6 % | 93.5 % | 90.9 % |
ITRS (best) | 99.0 % | 94.8 % | 91.9 % |

Please note that, unlike Tesseract, which recognizes words and sentences solely based on image information, the ITRS cannot guarantee the recognition of any word that is not in its dictionary. This is because known words will always receive higher excitation than unknown words during sentence confabulation, which is analogous to the human cognition process. If we exclude all proper nouns, such as the names of characters and locations, the word-level accuracy of the ITRS increases further.

### 6.4 Performance Evaluation on Memristor Based BSB Circuit

The robustness of the BSB recall circuit was analyzed based on Monte-Carlo simulations at the component level. Memristor device parameters are taken from [10]. We tested 26 BSB circuits corresponding to the 26 lower case letters from “a” to “z”. The character imaging data was taken from [17]. Each character image consists of 16 × 16 points and can be converted to a (−1, +1) binary vector with *n* = 256. Accordingly, each BSB recall matrix has a dimension of 256 × 256. The training set of each character consists of 20 prototype patterns representing different size-font-style combinations. In each test, we created 500 design samples for each BSB circuit and ran 13,000 Monte-Carlo simulations. We use the *probability of failed recognitions (P*_{F}*)* to measure the performance of a BSB circuit.
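For intuition, the recall dynamics these circuits implement can be sketched in a few lines (a toy 4-dimensional version with assumed parameter names; the actual circuits realize 256 × 256 matrices trained on the font prototypes, following the BSB model of [4, 8]):

```python
import numpy as np

# Toy Brain-state-in-a-box (BSB) recall sketch. Parameter names
# (alpha, lam, iters) are ours; the circuit implements the same
# saturating linear feedback loop.
def bsb_recall(A, x0, alpha=0.25, lam=1.0, iters=50):
    x = x0.astype(float).copy()
    for _ in range(iters):
        # Linear feedback followed by the "box" saturation to [-1, 1]
        x = np.clip(alpha * (A @ x) + lam * x, -1.0, 1.0)
    return x

# Store one 4-dim pattern p with a simple outer-product (Hebbian) rule
p = np.array([1.0, -1.0, 1.0, -1.0])
A = np.outer(p, p) / p.size

noisy = np.array([1.0, -1.0, -0.3, -1.0])  # third element corrupted
out = bsb_recall(A, noisy)
print(np.sign(out))  # [ 1. -1.  1. -1.] -> stored pattern recovered
```

Each iteration pulls the state toward the nearest stored attractor, which is why a partially occluded character image still converges to a clean prototype.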

We evaluate the *P*_{F} of each input character pattern without considering any noise sources (“Ideal”) and under the scenario including all the process variations and signal fluctuations (“All noises”). In the figure, “within 3” stands for the failure-rate test in which the correct answer is within the top 3 recognized patterns, and “1st hit” stands for the failure-rate test in which the first recognized pattern is the correct answer.

The simulations show that the performance degradation induced by process variations and signal fluctuations has a constant impact on all of the BSB circuits in the “within 3” case. When processing a perfect image under ideal conditions, no BSB circuit fails and hence *P*_{F} = 0. After including all static and dynamic noise, *P*_{F} (within 3) ranges from 1 to 7 % for different input characters. When the number of random point defects in the input images increases to 30, the range of *P*_{F} (within 3) increases from 0–10 % under ideal conditions to 4–16 % after including the noise sources. When considering only the “1st hit” case, the *P*_{F} of most characters, both “Ideal” and “All noises”, increases dramatically as the defect count goes to 30, implying that input image defects rather than noise dominate the failure rate. Only a few characters, such as “j” and “l”, are more sensitive to noise than to defects, as they suffer from high failure rates even without input pattern defects.

Besides accuracy, the BSB model emphasizes computation speed, since ambiguity can be eliminated at the word confabulation level. We created a Verilog-A memristor model by adopting the device parameters from [18] and scaling them to the 65 nm node based on the resistance and device-area relation given in [19]. To achieve high speed and a small form factor, we adopt a flash analog-to-digital converter (ADC) and a current-steering digital-to-analog converter (DAC) [28] in our design. For a more detailed design of the peripheral circuitry of the crossbar array, please refer to [29].
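Conceptually, the crossbar performs the matrix-vector multiplication of the BSB recall in the analog domain. A software sketch of one common signed-matrix mapping (our illustration, not the authors' circuit; conductance bounds are placeholder values):

```python
import numpy as np

# Sketch of analog matrix-vector multiplication on a memristor
# crossbar. A signed matrix A is split across two non-negative
# conductance arrays G_pos and G_neg; the DAC drives rows with
# voltages proportional to x, each column wire sums currents
# (Kirchhoff's current law), and the ADC digitizes the
# differential column currents.
def crossbar_mvm(A, x, g_min=1e-6, g_max=1e-4):
    scale = np.abs(A).max()
    G_pos = np.where(A > 0,  A, 0.0) / scale * (g_max - g_min) + g_min
    G_neg = np.where(A < 0, -A, 0.0) / scale * (g_max - g_min) + g_min
    i_pos = G_pos @ x  # column currents of the positive array
    i_neg = G_neg @ x  # column currents of the negative array
    # The shared g_min offset cancels in the differential readout
    return (i_pos - i_neg) / (g_max - g_min) * scale

A = np.array([[1.0, -2.0], [0.5, 0.25]])
x = np.array([0.2, -0.1])
print(np.allclose(crossbar_mvm(A, x), A @ x))  # True
```

Each memristor's conductance encodes one matrix element, so an n × n multiply completes in a single analog cycle regardless of n, which is the source of the speedup reported below.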

*Neuromorphic Computing Accelerator (NCA)* with size 256 × 256. 93 NCAs are used, and each of them implements one BSB model. Table 6 gives the area, power consumption, and performance estimation of the accelerator. The processing time is estimated as the time needed to complete one unit workload of BSB computation, which is to check a set of 96 images. In the same table, we also list the power consumption, area, and performance of an Intel Xeon Sandy Bridge-EP processor as a reference. As we can see, the memristor based neuromorphic computing accelerator provides tremendous reductions from every perspective.

Comparison of memristor and Xeon processor.

Implementation | Processing time | Area (mm²) | Power consumption |
---|---|---|---|
Memristor crossbar | 60 μs | 151 | 875 mW |
Xeon processor | 0.5 s | 435 | 183 W |
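As a cross-check, the headline numbers quoted in the conclusions follow directly from these table entries:

\[ \frac{0.5\ \mathrm{s}}{60\ \mu\mathrm{s}} \approx 8{,}333\times \ \text{speedup}, \qquad \frac{151\ \mathrm{mm}^2}{435\ \mathrm{mm}^2} \approx \frac{1}{3}\ \text{area}, \qquad \frac{875\ \mathrm{mW}}{183\ \mathrm{W}} \approx 0.48\,\%\ \text{power}. \]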

With the scaling of memristor devices, the programming energy will be further reduced [30, 31]. For example, Pi et al. demonstrated crosspoint arrays of memristive devices with a lateral dimension of 8 nm [31]. The 8 nm device arrays required a programming current of 600 pA and needed only 3 nanowatts to power the operation. Moreover, memristance is inversely proportional to the device area; thus, memristance will increase as device sizes shrink, resulting in lower operation power consumption of the crossbar array.

## 7 Conclusions

This paper presents our work on neuromorphic computing and its performance optimization. A brain-inspired information processing framework is developed that performs document image recognition using pattern matching and statistical information association. The framework has outstanding noise resistance and is capable of recognizing words and sentences from highly damaged images at high accuracy. With optimization at each layer of the framework, both local and global accuracy are increased. The detailed structure of a memristor crossbar array based neuromorphic accelerator is described. When applied to implement the pattern matching layer of the text recognition system, the memristor based BSB recall circuit shows high resilience to process variations and signal fluctuations, and the NCA based on the memristor crossbar array provides more than 8,000× speedup over the Intel Xeon processor. The area and power consumption of the NCA are only 1/3 and 0.5 % of those of a Xeon processor, respectively.

### References

1. Hecht-Nielsen, R. (2007). *Confabulation theory: the mechanism of thought*. Springer.
2. Qiu, Q., Wu, Q., Burns, D., Moore, M., Bishop, M., Pino, R., & Linderman, R. (2011). Confabulation based sentence completion for machine reading. *Proc. of IEEE Symposium Series on Computational Intelligence*.
3. Qiu, Q., Wu, Q., Bishop, M., Pino, R. E., & Linderman, R. W. (2013). A parallel neuromorphic text recognition system and its implementation on a heterogeneous high performance computing cluster. *IEEE Transactions on Computers, 62*(5).
4. Anderson, J. A. (1995). *An introduction to neural networks*. The MIT Press.
5. Partzsch, J., & Schuffny, R. (2011). Analyzing the scaling of connectivity in neuromorphic hardware and in models of neural networks. *IEEE Transactions on Neural Networks, 22*(6), 919–935.
6. Chua, L. (2011). Resistance switching memories are memristors. *Applied Physics A: Materials Science & Processing, 102*(4), 765–783.
7. Ho, Y., Huang, G. M., & Li, P. (2009). Nonvolatile memristor memory: device characteristics and design implications. *International Conference on Computer-Aided Design (ICCAD)*, pp. 485–490.
8. Anderson, J., Silverstein, J., Ritz, S., & Jones, R. (1977). Distinctive features, categorical perception, and probability learning: some applications of a neural model. *Psychological Review, 84*(5), 413.
9. Yang, Z. R., & Zwolinski, M. (2001). Mutual information theory for adaptive mixture models. *IEEE Transactions on Pattern Analysis and Machine Intelligence, 23*, 396–403.
10. Strukov, D. B., Snider, G. S., Stewart, D. R., & Williams, R. S. (2008). The missing memristor found. *Nature, 453*, 80–83.
11. Heittmann, A., & Noll, T. G. (2012). Limits of writing multivalued resistances in passive nano-electronic crossbars used in neuromorphic circuits. *ACM Great Lakes Symposium on VLSI (GLSVLSI)*, pp. 227–232.
12. Ramacher, U., & Malsburg, C. V. D. (2010). *On the construction of artificial brains*. Springer.
13. Hasegawa, T., Ohno, T., Terabe, K., Tsuruoka, T., Nakayama, T., Gimzewski, J. K., & Aono, M. (2010). Learning abilities achieved by a single solid-state atomic switch. *Advanced Materials, 22*(16), 1831–1834.
14. Ahmed, K., Qiu, Q., Malani, P., & Tamhankar, M. (2014). Accelerating pattern matching in neuromorphic intelligent text recognition system using Intel Xeon Phi Coprocessor. *Proc. International Joint Conference on Neural Networks (IJCNN)*.
15. Yang, F., Qiu, Q., Bishop, M., & Wu, Q. (2012). Tag-assisted sentence confabulation for intelligent text recognition. *Proc. of Computational Intelligence for Security and Defense Applications (CISDA)*.
16. Li, Z., & Qiu, Q. (2014). Completion and parsing Chinese sentences using cogent confabulation. *Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI)*.
17. Wu, Q., Bishop, M., Pino, R., Linderman, R., & Qiu, Q. (2011). A multi-answer character recognition method and its implementation on a high-performance computing cluster. *3rd International Conference on Future Computational Technologies and Applications*, pp. 7–13.
18. Kim, K.-H., Gaba, S., Wheeler, D., Cruz-Albrecht, J. M., Hussain, T., Srinivasa, N., & Lu, W. (2011). A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. *Nano Letters, 12*(1), 389–395.
19. Choi, B. J., Chen, A. B., Yang, X., & Chen, I.-W. (2011). Purely electronic switching with high uniformity, resistance tunability, and good retention in Pt-dispersed SiO2 thin films for ReRAM. *Advanced Materials, 23*(33), 3847–3852.
20. Zweig, G., & Burges, C. J. C. (2012). A challenge set for advancing language modeling. *Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT*, pp. 29–36. Association for Computational Linguistics.
21. Li, B., Zhou, E., Huang, B., Duan, J., Wang, Y., Xu, N., Zhang, J., & Yang, H. (2014). Large scale recurrent neural network on GPU. *2014 International Joint Conference on Neural Networks (IJCNN)*, pp. 4062–4069.
22. Voorhies, R. C., Elazary, L., & Itti, L. (2012). Neuromorphic Bayesian surprise for far-range event detection. *IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS)*, pp. 1–6, 18–21 Sept. 2012.
23. Neftci, E., Das, S., Pedroni, B., Kreutz-Delgado, K., & Cauwenberghs, G. (2014). Event-driven contrastive divergence for spiking neuromorphic systems. *Frontiers in Neuroscience, 7*, 272.
24. Schmuker, M., Pfeil, T., & Nawrot, M. P. (2014). A neuromorphic network for generic multivariate data classification. *Proceedings of the National Academy of Sciences, 111*(6), 2081–2086.
25. Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., & Modha, D. S. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. *Science, 345*(6197), 668–673.
26. Yakopcic, C., Hasan, R., & Taha, T. M. (2014). Tolerance to defective memristors in a neuromorphic learning circuit. *IEEE National Aerospace and Electronics Conference (NAECON 2014)*, pp. 243–249.
27. Bichler, O., Suri, M., Querlioz, D., Vuillaume, D., DeSalvo, B., & Gamrat, C. (2012). Visual pattern extraction using energy-efficient “2-PCM synapse” neuromorphic architecture. *IEEE Transactions on Electron Devices, 59*(8), 2206–2214.
28. Gustavsson, M., Wikner, J. J., & Tan, N. (2000). *CMOS data converters for communications*. Springer Science & Business Media.
29. Liu, X., Mao, M., Liu, B., Li, H., Chen, Y., Li, B., & Wang, Y. (2015). RENO: a high-efficient reconfigurable neuromorphic computing accelerator design. *Proc. of Design Automation Conference*.
30. Zhirnov, V. V., Meade, R., Cavin, R. K., & Sandhu, G. (2011). Scaling limits of resistive memories. *Nanotechnology, 22*, 25.
31. Pi, S., Lin, P., & Xia, Q. (2013). Cross point arrays of 8 nm × 38 nm memristive devices fabricated with nanoimprint lithography. *Journal of Vacuum Science and Technology B, 31*, 06FA02.