Journal of Signal Processing Systems, Volume 84, Issue 3, pp 355–369

A Neuromorphic Architecture for Context Aware Text Image Recognition

  • Qinru Qiu
  • Zhe Li
  • Khadeer Ahmed
  • Wei Liu
  • Syed Faisal Habib
  • Hai (Helen) Li
  • Miao Hu

Abstract

Although existing optical character recognition (OCR) tools can achieve excellent performance in text image detection and pattern recognition, they usually require a clean input image. Most of them do not perform well when the image is partially occluded or smudged. Humans are able to tolerate much worse image quality during reading because perception errors can be corrected using word-level and sentence-level context. In this paper, we present a brain-inspired information processing framework for a context-aware Intelligent Text Recognition System (ITRS) and its acceleration using a memristor-based crossbar array. The ITRS has a bottom layer of massively parallel Brain-State-in-a-Box (BSB) engines that give fuzzy pattern matching results and an upper layer of error correction based on statistical inference. Optimizations on each layer of the framework are introduced to improve system performance. A parallel architecture is presented that incorporates the memristor crossbar array to accelerate the pattern matching. Compared to a traditional multicore microprocessor, the accelerator has the potential to provide tremendous area and power savings and a speedup of more than 8,000 times.

Keywords

Neuromorphic · Text recognition · Memristor crossbar array

1 Introduction

Military planning, battlefield situation awareness, and strategic reasoning rely heavily on the knowledge of the local situation and the understanding of different cultures. A rich source of such knowledge is presented as natural-language text. Autonomous and intelligent recognition of printed or handwritten text images is one of the key features to achieve situational awareness. Although generally effective, conventional Optical Character Recognition (OCR) tools and pattern recognition techniques usually have difficulty recognizing images that are noisy, partially occluded, or incomplete due to damage to the printing material, or that are obscured by marks or stamps.

However, such tasks are not too difficult for humans, as the errors in image recognition will be corrected later using semantic and syntactic context. Most human cognitive procedures involve two interleaved steps, sensing and association. Together, they provide higher accuracy.

Computing models have been developed for performing cognitive functions on raw input signals such as image and audio. One representative area in this category is the associative neural network model, which is typically used for pattern recognition. We generally say that this kind of model performs the “sensing” function. In the other category, models and algorithms are researched to operate on concept-level objects, assuming that they have already been “recognized” or extracted from raw inputs. In a recent development, the cogent confabulation model was used for sentence completion [1, 2]. Trained on a large corpus of literature, the confabulation algorithm has demonstrated the capability of completing a sentence (given a few starting words) based on conditional probabilities among the words and phrases. We refer to these algorithms as the “association” models. The brain-inspired signal processing flow could be applied to many applications. A proof-of-concept prototype of a context-aware Intelligent Text Recognition System (ITRS) has been developed on a high performance computing cluster [3]. The lower layer of the ITRS performs pattern matching of the input image using a simple non-linear auto-associative neural network model called Brain-State-in-a-Box (BSB) [4]. It matches the input image with the stored alphabet. A race model is introduced that gives fuzzy results of pattern matching. Multiple matching patterns will be found for one input character image, which is referred to as ambiguity. The upper layer of the ITRS performs information association using the cogent confabulation model [1]. It enhances those BSB outputs that have strong correlations in the context of word and sentence and suppresses those BSB outputs that are weakly related. In this way, it selects characters that form the most meaningful words and sentences.

Both BSB and confabulation models are connection based artificial neural networks, where weight matrices are used to represent synapses between neurons and their operation can be transformed into matrix–vector multiplication(s). Hardware realizations of neural networks require a large volume of memory and are associated with high cost if built with digital circuits [5].

The memristor has emerged as a promising device for massively parallel, large-scale neuromorphic systems. A memristor can “remember” the total electric charge/flux that has ever flowed through it [6], which is analogous to the behavior of synapses among neurons. Moreover, memristor-based memories can achieve a very high integration density of 100 Gbits/cm², a few times higher than flash memory technologies [7]. Due to these properties, the memristor crossbar, which employs a memristor at each intersection of horizontal and vertical metal wires, has been proposed to facilitate weight matrix storage and matrix–vector multiplication.

In this paper, we present the brain-inspired information processing framework and its acceleration using a memristor crossbar array. The remainder of the paper is organized as follows. In Section 2, we discuss some related neuromorphic works, while in Section 3 we introduce the basics of the models used for sensing and association in the ITRS. Section 4 describes the overall system model and the algorithms in different layers. Section 5 gives the details of hardware acceleration using the memristor crossbar array. The experimental results and discussions are presented in Section 6. Section 7 summarizes the work.

2 Related Works

During recent years, neuromorphic computing has become an important research area. The research works range from applications to hardware implementations.

In [22], Voorhies et al. introduced a uniquely structured Bayesian learning network with a combined measure across spatial and temporal scales on various visual features to detect small, rare events in far-field video streams. However, this structured Bayesian learning network does not easily fit applications like text recognition and completion. The authors of [23] proposed a sophisticated method based on spiking neuromorphic systems with event-driven contrastive divergence trained Restricted Boltzmann Machines and applied the model to recognize images of MNIST hand-written digits. However, their application is limited to the pattern matching layer and does not go beyond it. Schmuker et al. [24] demonstrate brain-like processing using a spiking neuron network, which achieves classification of generic multidimensional data. No specific application, however, is discussed in that work. It only provides a proof-of-concept design of analog electronic microcircuits that mimic the behavior of neurons for real-world computing tasks.

Much existing neuromorphic computing research concentrates on pattern matching applications such as video detection or character recognition. Very little of it studies the function of probabilistic inference in neuromorphic computing. Some works also focus on developing general hardware architectures for neuromorphic computing. For example, IBM’s TrueNorth [25] is an efficient, scalable and flexible non-von Neumann architecture, which integrates 1 million programmable spiking neurons and 256 million configurable synapses. The hardware is suited to many applications such as multi-object detection and classification. Other novel architectures utilize emerging device technologies such as memristor crossbars or phase change memory (PCM). The authors of [18] attempt to implement data storage using memristors, and [26] describes a memristor-based neuromorphic circuit that is capable of learning and tolerant of error. Suri et al. [27] demonstrate a unique energy-efficient methodology that uses PCM as synapses in ultra-dense large-scale neuromorphic systems. They also demonstrate complex visual pattern extraction from real-world data using PCM synapses in a 2-layer spiking neural network.

To the best of our knowledge, our proposed architecture is the first that covers both the pattern matching layer and the probabilistic inference layer in neuromorphic computing. Neither the implementation on a state-of-the-art multicore processor nor the projected acceleration using a memristor crossbar array has been addressed in previous works.

3 Background

3.1 Neural Network and BSB Model

The BSB model is an auto-associative, nonlinear, energy minimizing neural network. A common application of the BSB model is to recognize a pattern from a given noisy version. It can also be used as a pattern recognizer that employs a smooth nearness measure and generates smooth decision boundaries. It has two main operations: training and recall. The mathematical model of BSB recall function can be represented as:
$$ \boldsymbol{x}\left(t+1\right)=S\left(\alpha \cdot \boldsymbol{Ax}(t)+\beta \cdot \boldsymbol{x}(t)\right) $$
(1)
where x is an N dimensional real vector and A is an N-by-N connection matrix, which is trained using the extended Delta rule. Ax(t) is a matrix–vector multiplication, which is the main function of the recall operation. α is a scalar constant feedback factor. β is an inhibition decay constant. S() is the “squash” function defined as follows:
$$ S(y)=\left\{\begin{array}{ll}1,\hfill & y\ge 1\hfill \\ {}y,\hfill & -1<y<1\hfill \\ {}-1,\hfill & y\le -1\hfill \end{array}\right. $$
(2)

For a given input pattern x(0), the recall function computes (1) iteratively until convergence, that is, when all entries of x(t + 1) are either ‘1’ or ‘−1’[14].
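The recall iteration of Eqs. (1) and (2) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the 4-dimensional pattern, and the outer-product connection matrix are assumptions made for the example (the matrix is not trained with the Delta rule).

```python
def squash(y):
    """Piecewise-linear squash function S(y) from Eq. (2)."""
    return max(-1.0, min(1.0, y))


def bsb_recall(A, x0, alpha=1.0, beta=1.0, max_iters=100):
    """Iterate Eq. (1) until every entry of x is +1 or -1.

    Returns the final state and the number of iterations used.
    """
    x = list(x0)
    n = len(x)
    for t in range(1, max_iters + 1):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [squash(alpha * Ax[i] + beta * x[i]) for i in range(n)]
        if all(abs(v) == 1.0 for v in x):
            return x, t
    return x, max_iters


# Hypothetical example: A is a scaled outer product of one stored pattern
# (an assumption for illustration, not a Delta-rule-trained matrix).
pattern = [1.0, -1.0, 1.0, -1.0]
A = [[0.25 * p_i * p_j for p_j in pattern] for p_i in pattern]
noisy = [0.6, -0.2, 0.3, -0.5]
state, iters = bsb_recall(A, noisy)
```

In this toy case the noisy input is pulled to the stored vertex, and the iteration count is the quantity that the racing mechanism of the ITRS later uses as a similarity measure.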

Algorithm 1. BSB training algorithm using Delta rule.

Step 0. Initialize weights (zero or small random values).

Initialize learning rate α.

Step 1. Randomly select one prototype pattern \( {\gamma}^{(k)}\in {B}^n \), k = 1, ..., m, where \( {B}^n \) is the n-dimensional binary space \( {\left\{-1,1\right\}}^n \).

Set the target output to the external input prototype pattern \( {\gamma}^{(k)} \): \( {t}_i={\gamma}_i \).

Step 2. Compute net inputs: \( {y}_{\mathrm{in},i}={\sum}_j{\gamma}_j{w}_{ji} \)

(Each net input is a combination of weighted signals received from all units.)

Step 3. Each unit determines its activation (output signal):

\( {y}_i=S\left({y}_{\mathrm{in},i}\right)=\left\{\begin{array}{ll}1,\hfill & {y}_{\mathrm{in},i}\ge 1\hfill \\ {}{y}_{\mathrm{in},i},\hfill & -1<{y}_{\mathrm{in},i}<1\hfill \\ {}-1,\hfill & {y}_{\mathrm{in},i}\le -1\hfill \end{array}\right. \)

Step 4. Update weights: \( \Delta {w}_{ij}=\alpha \cdot \left({t}_j-{y}_j\right)\cdot {\gamma}_i \).

Step 5. Repeat Steps 1–4 until the condition |t(i) − y(i)| < θ is satisfied in m consecutive iterations.

The most fundamental BSB training algorithm is given in Algorithm 1, which is based on the extended Delta rule [8]. It aims at finding the weights that minimize the squared error between the network output and the target output pattern, which in this auto-associative setting is the input prototype pattern itself.
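The steps of Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, learning rate, and threshold are illustrative, and the prototypes are visited in round-robin order rather than at random, so that m consecutive small-error iterations are guaranteed to cover every prototype.

```python
def squash(y):
    """Squash function S(y): clip to the interval [-1, 1]."""
    return max(-1.0, min(1.0, y))


def train_bsb(prototypes, lr=0.05, theta=0.01, max_steps=10000):
    """Delta-rule training; W[i][j] is the weight from unit i to unit j."""
    n, m = len(prototypes[0]), len(prototypes)
    W = [[0.0] * n for _ in range(n)]               # Step 0: zero weights
    streak = 0
    for step in range(max_steps):
        gamma = prototypes[step % m]                # Step 1 (round-robin assumption)
        y = [squash(sum(gamma[j] * W[j][i] for j in range(n)))
             for i in range(n)]                     # Steps 2-3
        err = [gamma[j] - y[j] for j in range(n)]   # t_j - y_j with t = gamma
        for i in range(n):                          # Step 4
            for j in range(n):
                W[i][j] += lr * err[j] * gamma[i]
        # Step 5: count consecutive iterations whose error is below theta
        streak = streak + 1 if max(abs(e) for e in err) < theta else 0
        if streak >= m:
            break
    return W


def recall_once(W, x):
    """One feed-forward pass, used here only to check the trained weights."""
    n = len(x)
    return [squash(sum(x[j] * W[j][i] for j in range(n))) for i in range(n)]


# Two hypothetical 4-dimensional prototypes (orthogonal, for a clean example).
protos = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]
W = train_bsb(protos)
```

After training, `recall_once(W, p)` reproduces each prototype `p` to within the threshold θ in this toy setting.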

3.2 Cogent Confabulation

Inspired by the human cognitive process, cogent confabulation [1] mimics human information processing, including Hebbian learning, correlation of conceptual symbols and the recall action of the brain. According to the theory, the cognitive information process consists of two steps: learning and recall. The confabulation model represents an observation using a set of features. These features construct the basic dimensions that describe the world of applications. Different observed attributes of a feature are referred to as symbols. The set of symbols used to describe the same feature forms a lexicon, and the symbols in a lexicon are mutually exclusive.

In the learning process, matrices storing posterior probabilities between neurons of two features are captured and referred to as knowledge links (KLs). A KL stores weighted directed edges from symbols in a source lexicon to symbols in a target lexicon. The (i, j)th entry of a KL, quantified as the conditional probability P(si | tj), represents the Hebbian plasticity of the synapse between the ith symbol in source lexicon s and the jth symbol in target lexicon t. The knowledge links are constructed during the learning process by extracting and associating features from the inputs, and the collection of all knowledge links in the model forms its knowledge base (KB).

During recall, the input is a noisy observation of the target. In this observation, certain features are observed with great ambiguity, so multiple symbols are assigned to the corresponding lexicons. The goal of the recall process is to resolve the ambiguity and select the set of symbols with maximum likelihood using the statistical information obtained during the learning process. This is achieved using a procedure similar to the integrate-and-fire mechanism in biological neural systems. Each neuron in a target lexicon receives an excitation from neurons of other lexicons through KLs, which is the weighted sum of its incoming excitatory synapses. Among neurons in the same lexicon, those that are least excited will be suppressed and the rest will fire and become excitatory inputs of other neurons. Their firing strengths are normalized and proportional to their excitation levels. As neurons are gradually suppressed, eventually only the neuron with the highest excitation remains firing in each lexicon, and the ambiguity is thus resolved.

Let l denote a lexicon, Fl denote the set of lexicons that have knowledge links going into lexicon l, and Sl denote the set of symbols that belong to lexicon l. The excitation of a symbol t in lexicon l is calculated by summing up all incoming knowledge links:
$$ el(t)={\displaystyle \sum_{k\in {F}_l}}\left[{\displaystyle \sum_{s\in {S}_k}}el(s) \ln \left(\frac{P\left(s|t\right)}{p_0}\right)+B\right],t\in {S}_l $$
(3)
where el(s) is the excitation level of the source symbol s. The parameter p0 is the smallest meaningful value of P(si | tj). The parameter B is a positive global constant called the bandgap. The purpose of introducing B in the function is to ensure that a symbol receiving N active knowledge links will always have a higher excitation level than a symbol receiving (N − 1) active knowledge links, regardless of their strength. As we can see, the excitation level of a symbol is actually its log-likelihood given the observed attributes in other lexicons.
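A small sketch of the excitation calculation in Eq. (3) is given below. The data layout, function name, and numeric values are assumptions for illustration. The sketch adds the bandgap B only for knowledge links that contribute at least one term, which is one reading of the "active knowledge links" remark; Eq. (3) as printed adds B for every lexicon in F_l.

```python
import math


def excitation(target, active, links, p0, bandgap):
    """Excitation level of `target` following Eq. (3).

    active: {source_lexicon: {symbol s: excitation level el(s)}}
    links:  {source_lexicon: {(s, t): P(s|t)}} for links into the target lexicon
    """
    total = 0.0
    for lexicon, symbols in active.items():
        inner = 0.0
        link_is_active = False
        for s, el_s in symbols.items():
            p = links[lexicon].get((s, target), 0.0)
            if p >= p0:                      # ignore values below p0
                inner += el_s * math.log(p / p0)
                link_is_active = True
        if link_is_active:                   # assumption: B only for active links
            total += inner + bandgap
    return total


# Hypothetical knowledge links and active symbols.
links = {
    "left": {("the", "dog"): 0.5, ("the", "dig"): 0.002},
    "right": {("runs", "dig"): 0.002},
}
active = {"left": {"the": 1.0}, "right": {"runs": 1.0}}
e_dog = excitation("dog", active, links, p0=0.001, bandgap=100.0)
e_dig = excitation("dig", active, links, p0=0.001, bandgap=100.0)
```

Here "dig" receives two weak links and "dog" one strong link, so with a large bandgap "dig" ends up with the higher excitation, which is the behavior that B is introduced to guarantee.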

4 System Architecture

4.1 Overview of the ITRS

The ITRS is divided into three layers as shown in Fig. 1. The input of the system is a text image. The first layer is character recognition based on BSB models. It recalls the stored patterns of the English alphabet that match the input image. If there is noise in the image, multiple matched patterns may be found. The ambiguity can be removed by considering the word-level and sentence-level context, which is achieved by the statistical information association in the second and third layers, where words and sentences are formed using cogent confabulation models.
Figure 1

Overall architecture of the models and algorithmic flow.

Figure 1 shows an example of using the ITRS to read texts that have been occluded. The BSB algorithm recognizes text images with its best effort. The word level confabulation provides all possible words that can be formed based on the recognized characters while the sentence level confabulation finds the combination among those words that gives the most meaningful sentence.

4.2 Character Level Image Recognition

The initial image processing consists of six major steps performed in sequence. These steps correct the distortion and extract characters for further pattern recognition. To optimize performance, these stages are designed as a pipeline, as shown in Fig. 2.
Figure 2

Image processing pipeline.

The region extraction operates at the page level. In this stage, pages are broken down into paragraphs. The line extraction operates at the paragraph level and extracts the text lines from a paragraph. The line correction is the next step, which corrects all deformations due to warping and rotation. Characters are then extracted and scaled in order to remove perspective distortion. The correct order of text lines within a paragraph and of paragraphs within a page is determined in the line ordering and paragraph ordering stages. Each character image is labeled with these orders and sent to the BSB model for pattern recognition.

We designed a new “racing” algorithm for BSB recalls to implement the multi-answer character recognition process. Let S denote the set of characters that we want to recognize. Without loss of generality, assume the size of S is 52, which is the number of upper and lower case characters in the English alphabet. We also assume that for each character, there are M typical variations in terms of different fonts, styles and sizes. In terms of pattern recognition, there is a total of 52 × M patterns to remember during training and to recognize during recall.

A BSB model is trained for each character in S. Therefore, there is a set of 52 BSB models, and each BSB model is trained for all variations of one character. The multi-answer implementation utilizes the BSB model’s convergence speed to represent the similarity between an input image and the stored pattern. An input image is compared against each one of the 52 BSB models; therefore it triggers 52 recall processes. The number of iterations that each recall process takes to converge is recorded. We then pick the characters of the K fastest-converging processes as the final output to the word confabulation model. Figure 3 gives an example of how the racing mechanism works.
Figure 3

Example of the “racing” mechanism based on the BSB model. A hand-written “t” is compared against each model storing the patterns of one character in S, which initiates 52 recall processes. The K fastest-converging processes are selected, and their corresponding characters are output as candidates to the next level, i.e., word confabulation.
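The racing mechanism can be illustrated with a toy sketch. Everything concrete here is an assumption made for the example: three 4-dimensional "character" models instead of 52, one stored pattern per model, and scaled outer-product matrices standing in for trained BSB models.

```python
def squash(y):
    return max(-1.0, min(1.0, y))


def recall_iterations(A, x0, max_iters=50):
    """Run BSB recall (alpha = beta = 1) and return the iteration count."""
    x, n = list(x0), len(x0)
    for t in range(1, max_iters + 1):
        x = [squash(sum(A[i][j] * x[j] for j in range(n)) + x[i])
             for i in range(n)]
        if all(abs(v) == 1.0 for v in x):
            return t
    return max_iters


def race(models, image, k):
    """Return the k labels whose models converge in the fewest iterations."""
    counts = {label: recall_iterations(A, image) for label, A in models.items()}
    return sorted(counts, key=counts.get)[:k]


def outer_model(p):
    # Hypothetical stand-in for a trained model: scaled outer product p p^T.
    return [[a * b / len(p) for b in p] for a in p]


models = {
    "t": outer_model([1.0, 1.0, -1.0, -1.0]),
    "f": outer_model([1.0, -1.0, 1.0, -1.0]),
    "l": outer_model([1.0, 1.0, 1.0, 1.0]),
}
image = [0.6, 0.3, -0.4, 0.1]    # toy input that is closest to the "t" pattern
candidates = race(models, image, k=2)
```

The model whose stored pattern best matches the input saturates first, so its label heads the candidate list passed on to word confabulation.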

4.3 Word Level Confabulation

Word level confabulation interfaces between the BSB layer and sentence confabulation. It collects ambiguous letter candidates from the BSB layer and generates valid combinations that form meaningful words. The dictionary database is loaded as a trie data structure during initialization. An example of the trie data structure is shown on the right of Fig. 4.
Figure 4

Trie data structure used in word confabulation.

The combinations based on letter candidates are validated against the trie. For example, consider the word “dog”, whose candidates for each letter position are [d o b] [o a g] [g a y]. Word confabulation traverses the trie using these candidates to search for the valid words present in the trie. The valid words are pushed onto a stack. In this example, the valid words would be: dog, day, boy, bag. Since the letter candidates are passed with their relative confidence levels, the confidence level of each word is the product of the confidence levels of the letters it contains.
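The trie traversal described above can be sketched as follows. The nested-dictionary trie, the small word list, and the confidence values are assumptions for illustration only; the source does not specify the dictionary or the data layout.

```python
def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a valid word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def confabulate_word(trie, candidates):
    """candidates: one {letter: confidence} dict per letter position.

    Returns {word: confidence}, where a word's confidence is the product
    of the confidences of its letters. Invalid prefixes are pruned early.
    """
    results = {}
    stack = [(trie, 0, "", 1.0)]       # (node, position, prefix, confidence)
    while stack:
        node, pos, prefix, conf = stack.pop()
        if pos == len(candidates):
            if "$" in node:
                results[prefix] = conf
            continue
        for letter, c in candidates[pos].items():
            child = node.get(letter)
            if child is not None:      # prune: no dictionary word has this prefix
                stack.append((child, pos + 1, prefix + letter, conf * c))
    return results


# Hypothetical dictionary and letter confidences for the "dog" example.
trie = build_trie(["dog", "day", "boy", "bag", "cat", "dig", "do"])
letters = [
    {"d": 0.5, "o": 0.3, "b": 0.2},
    {"o": 0.5, "a": 0.3, "g": 0.2},
    {"g": 0.5, "a": 0.3, "y": 0.2},
]
words = confabulate_word(trie, letters)
```

Because a branch is abandoned as soon as its prefix leaves the trie, only a small fraction of the 27 letter combinations in this example is ever expanded.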

4.4 Sentence Level Confabulation

The sentence level confabulation model defines three levels of lexicons. The first and second level lexicons represent single words and pairs of adjacent words, while the third level lexicons represent the part-of-speech (POS) tags of the corresponding words. During recall, the word and word-pair symbols corresponding to the outputs of word level confabulation are set as active, and all POS tag symbols are also set as active. If a lexicon has more than one active symbol, it is said to have ambiguity. The goal of sentence confabulation is to resolve the ambiguity iteratively through a recall procedure similar to belief propagation and finally form a meaningful sentence. The general confabulation recall algorithm is described in Algorithm 2.

As Algorithm 2 shows, for each lexicon that has multiple symbols activated, we calculate the excitation level of each activated symbol. The N most highly excited symbols in this lexicon are kept active. These symbols will further excite the symbols in other ambiguous lexicons. This procedure continues until the activated symbols in all lexicons no longer change. If convergence cannot be reached after a given number of iterations, we force the procedure to converge. The value of N is then reduced by 1 and the above procedure is repeated. Finally, N is reduced to 1, which means there is only one active symbol in each lexicon, and the ambiguity is eliminated in all lexicons.

Algorithm 2. Confabulation recall algorithm

for each known lexicon*
    set symbol to be active
end for
for N from MAX_AMBIGUOUS downto 1
    converged = false;
    iteration_count = 0;
    while not converged
        for each unknown lexicon
            for each symbol associated to the lexicon
                calculate the excitation level of the symbol;
            end for
            select N highest excited symbols and set them to be active;
        end for
        iteration_count++;
        if activated set does not change since last iteration or iteration_count >= MAX_ITERATION
            converged = true;
        end if
    end while
end for

*Lexicons that have only one symbol candidate are denoted as known lexicons; all others are unknown lexicons.
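The control flow of Algorithm 2 can be sketched in Python as below. The excitation function is passed in as a callback, and the toy bigram scores used to drive it are purely hypothetical; the sketch ranks only the currently active symbols of each unknown lexicon, following the description in the text.

```python
def confabulation_recall(lexicons, excite, max_ambiguous, max_iteration=10):
    """lexicons: {name: list of candidate symbols}.

    excite(name, symbol, active) -> excitation level (hypothetical callback).
    Returns {name: list of active symbols}, one symbol per lexicon at the end.
    """
    active = {name: list(syms) for name, syms in lexicons.items()}
    unknown = [name for name, syms in lexicons.items() if len(syms) > 1]
    for n in range(max_ambiguous, 0, -1):
        for _ in range(max_iteration):          # forced stop after max_iteration
            previous = {name: set(active[name]) for name in unknown}
            for name in unknown:
                ranked = sorted(active[name],
                                key=lambda s: excite(name, s, active),
                                reverse=True)
                active[name] = ranked[:n]       # keep the n most excited symbols
            if all(set(active[name]) == previous[name] for name in unknown):
                break                           # activated sets did not change
    return active


# Hypothetical scores standing in for the knowledge-link excitation.
score = {("the", "dog"): 3.0, ("the", "dig"): 1.0, ("the", "dag"): 0.5}


def toy_excite(name, symbol, active):
    return sum(score.get((other, symbol), 0.0)
               for lex, syms in active.items() if lex != name
               for other in syms)


result = confabulation_recall(
    {"w1": ["the"], "w2": ["dig", "dog", "dag"]}, toy_excite, max_ambiguous=2)
```

In this toy run the ambiguous second lexicon is narrowed from three candidates to two and then to the single symbol "dog".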

4.5 Improving Sentence Confabulation

In sentence confabulation, the excitation level of a candidate is the weighted sum of excitation levels of active symbols in other lexicons. Intuitively, however, different source lexicons do not contribute equally to a target lexicon. For example, the lexicon right next to an unknown word obviously gives more information in determining the unknown word than the lexicon that is five words away. Thus the significance of a KL can be expressed as a weight, quantified by the mutual information (MI) [9]. The mutual information of two random variables is a measure of the variables’ mutual dependence, calculated as
$$ I\left(A;B\right)={\displaystyle \sum_{b\in B}}{\displaystyle \sum_{a\in A}}p\left(a,b\right) \log \left(\frac{p\left(a,b\right)}{p(a)p(b)}\right) $$
(4)
where A is the source lexicon and a represents symbols in A; B is the target lexicon and b represents symbols in B. p(a, b) is the joint probability of symbols a and b; p(a) and p(b) are the marginal probabilities of symbols a and b, respectively. I(A; B) is nonnegative. The value of I(A; B) increases as the correlation between symbols in lexicons A and B gets stronger. We define the weight of the KL from A to B (i.e., wkl) as a positive linear function of the MI of A and B.
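As a worked illustration of Eq. (4), the sketch below computes the mutual information from a joint probability table. The function name and the two toy distributions are assumptions for the example; the natural logarithm is used, and the coefficients of the linear map from MI to wkl are not given in the text, so they are not modeled.

```python
import math


def mutual_information(joint):
    """I(A;B) from a joint distribution {(a, b): p(a, b)} following Eq. (4)."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p       # marginal p(a)
        p_b[b] = p_b.get(b, 0.0) + p       # marginal p(b)
    return sum(p * math.log(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


# Two hypothetical lexicon pairs: one independent, one perfectly correlated.
independent = {("x", "u"): 0.25, ("x", "v"): 0.25,
               ("y", "u"): 0.25, ("y", "v"): 0.25}
correlated = {("x", "u"): 0.5, ("y", "v"): 0.5}
mi_independent = mutual_information(independent)
mi_correlated = mutual_information(correlated)
```

The independent pair yields zero and the perfectly correlated pair yields ln 2, so a knowledge link between strongly correlated lexicons would receive the larger weight.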
The sentence confabulation model in Algorithm 2 considers all initial symbols equally likely. In reality, we know from the given image that some words are more likely than others. To incorporate the image information into sentence confabulation, we consider the BSB convergence speed during the confabulation process, and modify the excitation level calculation of a word symbol t as follows,
$$ el(t)=\alpha {P}_{BSB}(t)+\beta {\displaystyle \sum_{k\in {F}_l}}\left[{w}_{kl}{\displaystyle \sum_{s\in {S}_k}}el(s) \ln \left(\frac{P\left(s|t\right)}{p_0}\right)+B\right] $$
(5)

In (5), the variable PBSB(t) is the excitation to t from the BSB layer, which is calculated as \( {P}_{BSB}(t)=\frac{1/\left({N}_{BSB}(t)-{N}_{min}\right)}{{\displaystyle {\sum}_t}1/\left({N}_{BSB}(t)-{N}_{min}\right)} \), where NBSB(t) is the BSB convergence speed of t, measured as the number of recall iterations, and Nmin is the minimum number of iterations in which a BSB engine can converge. The coefficients α and β adjust the weights of the BSB (i.e., image) information and the confabulation (i.e., language) information, with α + β = 1. In general, we should increase α and decrease β when the image has high quality, and vice versa.
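The BSB term of Eq. (5) can be illustrated with a short sketch. The function names and iteration counts are hypothetical, and the sketch assumes that every N_BSB(t) is strictly greater than N_min so that no denominator is zero.

```python
def bsb_prior(iterations, n_min):
    """P_BSB(t) for each candidate t from its BSB iteration count N_BSB(t).

    Assumes every count is strictly greater than n_min.
    """
    inv = {t: 1.0 / (n - n_min) for t, n in iterations.items()}
    total = sum(inv.values())
    return {t: v / total for t, v in inv.items()}


def combined_excitation(p_bsb, language_term, alpha):
    """Eq. (5) with beta = 1 - alpha; language_term is the weighted KL sum."""
    return alpha * p_bsb + (1.0 - alpha) * language_term


# Hypothetical iteration counts for three candidate words.
prior = bsb_prior({"dog": 3, "dig": 5, "dag": 9}, n_min=1)
el_dog = combined_excitation(prior["dog"], language_term=2.0, alpha=0.7)
```

Candidates whose BSB recall converged in fewer iterations receive a larger share of the normalized image evidence, and α controls how much that evidence counts relative to the language model.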

5 Hardware Acceleration of BSB Recall

5.1 Memristor and Crossbar Array

In 2008, HP Labs demonstrated the first memristive device, in which the memristive effect was achieved by moving the doping front within a TiO2 thin film [10]. The overall memristance can be expressed as:
$$ M(p)=p\cdot {R}_H+\left(1-p\right)\cdot {R}_L $$
(6)
where p (0 ≤ p ≤ 1) is the relative doping front position, which is the ratio of doping front position over the total thickness of the TiO2 thin-film, RL and RH respectively denote the low resistance state (LRS) and the high resistance state (HRS) of the memristor. The velocity of doping front movement v(t), driven by the voltage applied across the memristor V(t), can be expressed as:
$$ v(t)=\frac{dp(t)}{dt}={\mu}_v\cdot \frac{R_L}{h^2}\cdot \frac{V(t)}{M(p)} $$
(7)
where μv is the equivalent mobility of dopants, h is the total thickness of the thin film, and M(p) is the total memristance when the relative doping front position is p. In general, a certain energy (or threshold voltage) is required to enable the state change in a memristive device. When the electrical excitation through a memristor is greater than the threshold voltage, i.e., V(t) > Vth, the memristance changes (in training). Otherwise, a memristor behaves like a resistor.
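Equations (6) and (7), together with the threshold behavior, can be sketched as a forward-Euler update of the doping front position. All device parameters below are illustrative values chosen for the example, not values reported in this paper, and applying the threshold to the magnitude of the voltage is an assumption.

```python
def memristance(p, r_low, r_high):
    """Eq. (6): M(p) = p * R_H + (1 - p) * R_L."""
    return p * r_high + (1.0 - p) * r_low


def step_doping_front(p, v, dt, mu_v, h, r_low, r_high, v_th):
    """One forward-Euler step of Eq. (7).

    Below the threshold the state is frozen and the device acts as a resistor.
    """
    if abs(v) <= v_th:                     # assumption: threshold on |V|
        return p
    dp = mu_v * r_low / (h * h) * v / memristance(p, r_low, r_high) * dt
    return min(1.0, max(0.0, p + dp))      # keep p inside [0, 1]


# Illustrative (hypothetical) parameters in SI units.
params = dict(mu_v=1e-14, h=1e-8, r_low=100.0, r_high=16000.0, v_th=1.0)
p_read = step_doping_front(0.5, v=0.5, dt=1e-6, **params)    # below threshold
p_write = step_doping_front(0.5, v=2.0, dt=1e-6, **params)   # above threshold
```

A sub-threshold voltage leaves the state untouched, which is what allows the crossbar to be read during recall without disturbing the trained weights.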
The crossbar array illustrated in Fig. 5 is a typical structure of memristor-based memories. It employs a memristor device at each intersection of horizontal and vertical metal wires without any selectors [11]. The memristor crossbar array is naturally attractive for implementing the connection matrix in neural networks because it can provide a large number of signal connections within a small footprint and compute the weighted combination of input signals [12, 13].
Figure 5

A memristor crossbar array.

5.2 Matrix Multiplication Using Memristor Crossbar

In order to use the N-by-N memristor crossbar array illustrated in Fig. 5 for matrix computation, a set of input voltages \( {\boldsymbol{V}}_I^T=\left\{{V}_{I,1},{V}_{I,2},\dots, {V}_{I,N}\right\} \) is applied on the word-lines (WL) of the array, and the current through each bit-line (BL) is collected by measuring the voltage across a sensing resistor. The same sensing resistors are used on all BLs, with resistance rs, or conductance gs = 1/rs. The output voltage vector is \( {\boldsymbol{V}}_O^T=\left\{{V}_{O,1},{V}_{O,2},\dots, {V}_{O,N}\right\} \). Assume the memristor sitting on the connection between WLi and BLj has a memristance of mi,j, so that the corresponding conductance is gi,j = 1/mi,j. Then, the relation between the input and output voltages can be represented by:
$$ {\boldsymbol{V}}_O=\boldsymbol{C}{\boldsymbol{V}}_I $$
(8)
Here, C can be represented by the memristors’ conductance and the load resistors as:
$$ \boldsymbol{C}=\boldsymbol{D}{\boldsymbol{G}}^T=\mathrm{diag}\left({d}_1,\cdots, {d}_N\right){\left[\begin{array}{ccc}\hfill {g}_{1,1}\hfill & \hfill \cdots \hfill & \hfill {g}_{1,N}\hfill \\ {}\hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill {g}_{N,1}\hfill & \hfill \cdots \hfill & \hfill {g}_{N,N}\hfill \end{array}\right]}^T $$
(9)
where \( {d}_i=1/\left({g}_s+{\displaystyle \sum_{k=1}^N}{g}_{k,i}\right) \).

Please note that some non-iterative neuromorphic hardware uses the output currents IO as output signals. Since the BSB algorithm discussed in this work is an iterative network, we take VO as output signals, which can be directly fed back to inputs for the next iteration without extra design cost.

Equation (8) indicates that a trained memristor crossbar array can be used to construct the weight matrix C, and transfer the input vector VI to the output vector VO. However, C is not a direct one-to-one mapping of conductance matrix G as indicated in Eq. (9). Though we can use a numerical iteration method to obtain the exact mathematical solution of G, it is too complex and hence impractical when frequent updates are needed.
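The relation between the conductance matrix G and the effective matrix C can be sketched numerically. The sketch derives each row of C from Kirchhoff's current law at one bit-line, whose sensing conductance g_s ties it to ground; the function names and the small conductance values (in arbitrary units) are assumptions for illustration.

```python
def crossbar_matrix(G, g_s):
    """Return C such that V_O = C V_I, with G[i][j] between WL i and BL j."""
    n = len(G)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):                       # one output voltage per bit-line
        d_j = 1.0 / (g_s + sum(G[i][j] for i in range(n)))
        for i in range(n):
            C[j][i] = d_j * G[i][j]          # entry (j, i) of D * G^T
    return C


def crossbar_output(G, g_s, v_in):
    """Output voltages on the bit-lines for the given word-line voltages."""
    C = crossbar_matrix(G, g_s)
    return [sum(c * v for c, v in zip(row, v_in)) for row in C]


# Hypothetical 2-by-2 crossbar.
G = [[1.0, 2.0], [3.0, 4.0]]
v_out = crossbar_output(G, g_s=6.0, v_in=[1.0, 1.0])
```

Each row of C is normalized by the total conductance attached to its bit-line, which is why C is not a one-to-one image of G.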

For simplification, assume gi,j ∈ G satisfies gmin ≤ gi,j ≤ gmax, where gmin and gmax respectively represent the minimum and the maximum conductance of all the memristors in the crossbar array. Thus, a simpler and faster approximate solution to the mapping problem is defined as:
$$ {g}_{j,i}={c}_{i,j}\cdot \left({g}_{max}-{g}_{min}\right)+{g}_{min} $$
(10)

With the proposed fast approximation function (10), the memristor crossbar array performs as a decayed matrix Ĉ between the input and output voltage signals, where ĉi,j = ci,j ⋅ gmax/gs.
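The fast mapping of Eq. (10) amounts to a transpose plus a linear rescaling into the available conductance range. In the sketch below the function name and the numeric values are hypothetical, and the entries of C are assumed to lie in [0, 1] so that every mapped conductance stays between g_min and g_max.

```python
def map_weights_to_conductance(C, g_min, g_max):
    """Eq. (10): G[j][i] = C[i][j] * (g_max - g_min) + g_min.

    Assumes 0 <= C[i][j] <= 1 so that g_min <= G[j][i] <= g_max.
    """
    n = len(C)
    return [[C[i][j] * (g_max - g_min) + g_min for i in range(n)]
            for j in range(n)]


# Hypothetical weight matrix and conductance range (arbitrary units).
C = [[0.0, 1.0], [0.5, 0.25]]
G = map_weights_to_conductance(C, g_min=1.0, g_max=5.0)
```

Note the index swap between C[i][j] and G[j][i], which mirrors the transpose in the relation between C and G.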

5.3 Training Memristor Crossbars in BSB Model

A software-generated weight matrix can be mapped to the memristor crossbar arrays based on the assumption that every memristor in the crossbar can be perfectly programmed to the required resistance value. However, the traditional crossbar programming method faces accuracy and efficiency limitations due to the existence of sneak paths [11]. Although some recent works improve the write/read ability of memristor crossbars by leveraging the device nonlinearity [11], the controllability of analog state programming is still limited. Instead of preparing the memristor crossbars with open-loop writing operations, we propose a closed-loop training method that iteratively tunes the entire memristor crossbar to the target state. This technique is based on a modification of the software training algorithm.

Let’s use the Delta rule in Algorithm 1 as an example. A weight wij corresponds to the analog state of the memristor at the cross-point of the ith row and the jth column in a crossbar array. A weight update Δwij involves multiplying three analog variables: α, tj − yj, and xi. Though these variables are available in the training scheme design, a hardware implementation of their multiplication demands unaffordably high computation resources. Thus, we simplify the weight updating function, trading off convergence speed, as:
$$ \varDelta {w}_{ij}=\alpha \cdot sign\left({t}_j-{y}_j\right)\cdot sign\left({x}_i\right) $$
(11)

Here, sign(tj − yj) and sign(xi) are the polarities of tj − yj and xi, respectively. sign(tj − yj)⋅sign(xi) represents the direction of the weight change.

The simplification minimizes the circuit design complexity while ensuring that the weight changes in the same direction as under the Delta rule.
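The simplified update of Eq. (11) can be sketched in software as follows. The function names and the sample vectors are hypothetical; the sketch only illustrates that each weight moves by a fixed step α in the direction given by the two polarities.

```python
def sign(v):
    """Polarity of v: +1, 0, or -1."""
    return (v > 0) - (v < 0)


def sign_update(W, x, t, y, alpha):
    """Apply Eq. (11) in place: W[i][j] += alpha * sign(t_j - y_j) * sign(x_i)."""
    for i in range(len(x)):
        for j in range(len(t)):
            W[i][j] += alpha * sign(t[j] - y[j]) * sign(x[i])
    return W


# Hypothetical input, target, and output vectors.
W = [[0.0, 0.0], [0.0, 0.0]]
sign_update(W, x=[0.5, -0.2], t=[1.0, -1.0], y=[0.3, -1.0], alpha=0.1)
```

Only the column whose output still differs from its target is changed, and the sign of each change matches the sign of the full Delta-rule product α(tj − yj)xi.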

5.4 Transformation of BSB Recall Matrix

A memristor is a physical device with conductance g > 0. Therefore, all elements in matrix C must be positive, as shown in (9). However, in the original BSB recall model, ai,j ∈ A can be either positive or negative. One possible solution is to shift the whole A into the positive domain. Since the output x(t + 1) will be used as the input signal in the next iteration, a biasing scheme at x(t + 1) is then needed to cancel out the shift induced by the modified A. The biasing scheme involves a vector operation since the shift is determined by x(t).

To better maintain the meaning of the matrix A in the physical mapping and to leverage the high integration density of the memristor crossbar, we propose to split the positive and negative elements of A into two matrices A+ and A− as:
$$ {a}_{i,j}^{+}=\left\{\begin{array}{ll}{a}_{i,j},\hfill & if\ {a}_{i,j}>0\hfill \\ {}0,\hfill & if\ {a}_{i,j}\le 0\hfill \end{array}\right.\kern1em \mathrm{and}\kern1em {a}_{i,j}^{-}=\left\{\begin{array}{ll}0,\hfill & if\ {a}_{i,j}>0\hfill \\ {}-{a}_{i,j},\hfill & if\ {a}_{i,j}\le 0\hfill \end{array}\right. $$
(12)
As such, (1) becomes
$$ \boldsymbol{x}\left(t+1\right)=S\left({\boldsymbol{A}}^{+}\boldsymbol{x}(t)-{\boldsymbol{A}}^{-}\cdot \boldsymbol{x}(t)+x(t)\right) $$
(13)
where we set α = β = 1. Thus, A+ and A− can be mapped to two memristor crossbar arrays M1 and M2 in decayed versions Â+ and Â−, respectively, by following (10).
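The split of Eq. (12) and the recall step of Eq. (13) can be checked with a small sketch. The function names and the 2-by-2 matrix are assumptions for illustration, and the decay introduced by the physical mapping is ignored here.

```python
def squash(y):
    return max(-1.0, min(1.0, y))


def split_matrix(A):
    """Eq. (12): A = A_plus - A_minus, with both parts non-negative."""
    A_plus = [[a if a > 0 else 0.0 for a in row] for row in A]
    # Using a < 0 instead of a <= 0 gives the same values and avoids -0.0.
    A_minus = [[-a if a < 0 else 0.0 for a in row] for row in A]
    return A_plus, A_minus


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def recall_step_split(A_plus, A_minus, x):
    """Eq. (13): x(t+1) = S(A_plus x - A_minus x + x)."""
    pos, neg = matvec(A_plus, x), matvec(A_minus, x)
    return [squash(p - n + v) for p, n, v in zip(pos, neg, x)]


# Hypothetical connection matrix and state vector.
A = [[0.5, -0.25], [-0.25, 0.5]]
x = [0.4, -0.8]
A_plus, A_minus = split_matrix(A)
x_next = recall_step_split(A_plus, A_minus, x)
```

The two non-negative matrices reproduce the result of the original signed recall step, which is what makes the two-crossbar mapping possible.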

6 Experimental Results

In this section, we present several independent experiments carried out on different layers of the ITRS system. Each experiment is specifically designed to demonstrate our improvements on that particular layer over the previous works. Their configuration and results are discussed in detail in the following sections. We also report the accuracy and confidence level of the entire ITRS system when applied to recognize document images with different qualities. At the end, we demonstrate the recall quality of memristor crossbar array based BSB, and analyze its performance and cost.

6.1 Performance Improvement in Word Confabulation Layer

The dictionary was originally stored in a hash table. The new implementation stores it in a trie, which significantly reduces the search time for checking character combinations against the dictionary. Three sets of images with different qualities are used as inputs. The first set consists of clean scanned document images; the second set consists of scanned document images with 10 % of characters completely occluded; and the third set consists of camera images with the same amount of occlusion. Each set consists of 8 document images. The average Signal-to-Noise Ratio (SNR) and average Peak Signal-to-Noise Ratio (PSNR) of the images in each set are given in Table 1. The clean images have the highest quality, while the occluded camera images have the lowest.
Table 1

Quality of Input Images.

| Image sets | Scanned Clean | Scanned Occluded | Camera Occluded |
|---|---|---|---|
| Avg. SNR | 5.1204 | 4.3756 | 3.8116 |
| Avg. PSNR | 8.095 | 7.3515 | 6.7904 |

Table 2 compares the word confabulation time of the original implementation with that of the new implementation when processing input images of different qualities. A lower-quality input image leads to higher ambiguity in pattern matching. As the number of letter candidates increases, the complexity of the original word confabulation implementation increases exponentially, because it has to check all combinations of the letter candidates. The new implementation has much lower complexity because it prunes many invalid combinations in advance. Furthermore, the hash-table-based dictionary storage in the original implementation has very poor memory locality, which degrades the performance even more.
Table 2

Improvement in word confabulation layer.

| Word Confabulation Time (sec) | Clean image | Scan Occluded | Camera Occluded |
|---|---|---|---|
| Original implementation | 310 | 2997 | 2483 |
| New implementation | 0.3 | 1.28 | 1.71 |
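The pruning effect of a trie can be illustrated with a short Python sketch. This is not the authors' implementation; the dictionary, the letter candidates, and the function names are hypothetical. The point is that a branch is abandoned as soon as no dictionary word starts with the current prefix, so most letter combinations are never enumerated.

```python
END = ""  # end-of-word marker; never collides with a single-letter key


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def valid_words(trie, candidates):
    """Return dictionary words consistent with per-position letter candidates.

    candidates[i] is an iterable of letters proposed for position i.
    """
    results = []

    def walk(node, pos, prefix):
        if pos == len(candidates):
            if END in node:
                results.append(prefix)
            return
        for ch in candidates[pos]:
            child = node.get(ch)
            if child is not None:  # prune: no word has prefix + ch
                walk(child, pos + 1, prefix + ch)

    walk(trie, 0, "")
    return sorted(results)


trie = build_trie(["cat", "car", "cot", "dog"])
print(valid_words(trie, ["ce", "ao", "trl"]))
```

In this toy case there are 2 × 2 × 3 = 12 letter combinations, but the walk never descends below the candidate "e" because no dictionary word starts with it.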

6.2 Performance Improvement in Sentence Confabulation Layer

The sentence confabulation layer is the most important layer of the ITRS, so more optimizations are proposed for it. To focus only on the performance of sentence confabulation, for all experiments in this subsection we set α and β in Eq. (5) to 0 and 1, respectively, which decouples the image information from sentence confabulation. We discuss the impact of the parameters α and β in the next subsection.

The original sentence confabulation model maintains a separate knowledge link for each pair of source and target lexicons, which generates redundancy. A new implementation called the circular model is proposed in [16] to merge all knowledge links between source and target lexicons that have the same relative position. For example, the knowledge links between all pairs of adjacent lexicons are merged into one. The new implementation not only reduces training and recall time, but also improves the accuracy of sentence completion. In this experiment, we completely cover a random number of words in a sentence, so that all words in the dictionary are taken as candidates for the missing words. Table 3 shows that the circular model gives a 23.99 % relative improvement in accuracy, with 70.45 % less training time and 17.56 % less recall time.
Table 3

Comparison of non-circular and circular model.

|  | Non-circular | Circular | Improvement (%) |
|---|---|---|---|
| Training time (sec) | 489180 | 144540 | 70.45 % |
| Recall time (sec) | 6317.22 | 5207.83 | 17.56 % |
| Accuracy | 54.95 % | 68.13 % | 23.99 % |
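The improvement column of Table 3 can be reproduced from the other two columns if each entry is read as a change relative to the non-circular value (our reading of the table, not stated explicitly in the text):

$$ \frac{489180-144540}{489180}\approx 70.45\,\%,\qquad \frac{6317.22-5207.83}{6317.22}\approx 17.56\,\%,\qquad \frac{68.13-54.95}{54.95}\approx 23.99\,\% $$

The accuracy gain is therefore relative; in absolute terms, accuracy rises by 13.18 percentage points.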

We also reduced the initialization time of sentence confabulation by loading the knowledge base in parallel. The sentence confabulation knowledge base is larger than 7 GB. Loading it sequentially takes more than 83.9 s. A multi-threaded implementation that loads the knowledge base in parallel reduces the initialization time to 31 s, a 2.7× speedup.
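A minimal Python sketch of parallel loading is given below. It assumes the knowledge base is stored as several independent partition files, which the text does not state; the file layout, function names, and worker count are hypothetical, and the paper's own multi-threaded implementation is not shown in the source.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(path):
    """Read one knowledge-base partition file into memory."""
    with open(path, "rb") as f:
        return f.read()


def load_knowledge_base(paths, workers=4):
    """Load all partitions concurrently; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_partition, paths))
```

Threads are a reasonable fit here because file reads release Python's global interpreter lock, so the reads can overlap. The achievable speedup still depends on the storage device and on how evenly the partitions are sized.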

Integrating the POS tag in the confabulation model significantly improves the sentence confabulation accuracy [15]. To evaluate the impact, the tag-assisted confabulation method is compared with no-tag confabulation at various noise levels. In this experiment, we randomly select input character images and add 3-pixel-wide horizontal strikes. The noise level is the percentage of characters in the text that carry a 3-pixel-wide horizontal strike. Since each original character is 15 × 15 pixels, a 3-pixel-wide strike is equivalent to almost 20 % distortion.
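The noise injection described above can be sketched in Python as follows. The vertical position of the strike, the pixel value used for it, and the function names are assumptions; the text only specifies the strike width and the fraction of characters affected.

```python
import random


def add_horizontal_strike(image, width=3, value=1, rng=random):
    """Return a copy of `image` (a list of pixel rows) with `width`
    consecutive rows overwritten by `value`. The row position is random."""
    top = rng.randrange(len(image) - width + 1)
    struck = [row[:] for row in image]
    for r in range(top, top + width):
        struck[r] = [value] * len(struck[r])
    return struck


def corrupt_text(char_images, noise_level, width=3, seed=0):
    """Strike a `noise_level` fraction (0..1) of the character images."""
    rng = random.Random(seed)
    n_noisy = round(noise_level * len(char_images))
    noisy = set(rng.sample(range(len(char_images)), n_noisy))
    return [
        add_horizontal_strike(img, width, rng=rng) if i in noisy else img
        for i, img in enumerate(char_images)
    ]
```

On a 15 × 15 character, a 3-row strike overwrites 45 of 225 pixels, which is where the figure of roughly 20 % distortion comes from.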

Figure 6 shows that no-tag sentence confabulation quickly collapses as the noise level increases. This is because each test sentence contains on average 28 characters, and we only consider a sentence correct if all of its characters are correct. The noise at the character level is compounded into character-level and word-level ambiguity. Without semantic information, which provides an overall structure for each sentence, the success rate is expected to drop exponentially as the noise level increases. Tag-assisted confabulation shows clear improvements over no-tag confabulation at all noise levels. The improvement is minor at low noise levels, but significant at high noise levels. Overall, tag-assisted confabulation improves the success rate by 33 % on average.
Figure 6

Accuracy of sentence confabulation with/without POS tag.
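The expected exponential drop can be made concrete with a simplified model. Assume, purely for illustration, that each character is recognized correctly with the same probability p, independently of the others. A sentence of 28 characters is then fully correct with probability

$$ P_{\mathrm{sentence}} = p^{28}, \qquad 0.99^{28}\approx 0.75, \qquad 0.95^{28}\approx 0.24 $$

so even a small loss in per-character accuracy removes most fully correct sentences. The independence assumption and the values of p are ours and are not measurements from the paper.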

The next set of experiments shows the impact of weighting the knowledge links (KLs) of sentence-level confabulation using the mutual information between the source and target lexicons.

This experiment is based on the Microsoft Research Sentence Completion (MRSC) challenge, which is intended to stimulate research into language modeling techniques that are sensitive to overall sentence coherence [20]. The challenge consists of fill-in-the-blank questions similar to those widely used in the Scholastic Aptitude Test. Due to limited time, we train our confabulation model on only part of the training set provided by the MRSC project. We then run the recall function of the sentence confabulation model, with and without weighted KLs, to fill in the blanks for the 1,040 tests in the challenge. Figure 7 shows the recall accuracy of the two confabulation models. For each model, the bandgap is varied from 1 to 1000. When the bandgap value is 10 or less, assigning weights to KLs provides little improvement. However, when the bandgap value exceeds 100, assigning weights to KLs brings visible benefits: it improves accuracy by about 4 %. The recall accuracy saturates after the bandgap exceeds 100. We also observe that, without weighted KLs, changing the bandgap value has almost no impact on the recall accuracy.

Note that the condition in this experiment is equivalent to the words being completely covered, so sentence-level confabulation cannot get any clue from word confabulation. In addition, because we train on an incomplete training set to save time, some words that appear in the tests are not stored in the dictionary. A word that is not in the dictionary can never be recalled correctly by the confabulation model, so training on the complete training set should increase the sentence accuracy. The same test set was evaluated in [21]; our weighted confabulation model gives a recall accuracy of 48.30 %, slightly higher than the 47 % accuracy of the recurrent neural network (RNN) model. The RNN model identifies the missing word from the list of candidates by evaluating the probability of the sentence that each candidate would make. Therefore, it has to create a sentence for each combination of the candidates and calculate its probability. The complexity of the RNN is an exponential function of the number of missing words, while the complexity of the confabulation model is a linear function of the number of missing words.
Figure 7

Comparison of accuracy for weighted/non-weighted KL model with different bandgap value.
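The complexity argument can be stated compactly. Suppose, as an illustrative assumption, that a sentence has m missing words and each has k candidates. Scoring every candidate combination as a full sentence, as described for the RNN, requires

$$ N_{\mathrm{RNN}} = k^{m} $$

sentence evaluations, whereas the work done by the confabulation model grows in proportion to m for a fixed k. For example, with k = 5 and m = 3, the combination-based approach scores 125 sentences. The symbols k, m and N are introduced here only for illustration.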

6.3 Performance Improvement of Overall ITR System

To evaluate the impact of weighting image and language information, we set \( w_{kl} \) to 1, vary α from 1 to 0 in steps of 0.1, and vary β from 0 to 1 in steps of 0.1. In this experiment, we run the complete ITRS to show the overall performance.

As shown in (5), the excitation level of a word in the sentence confabulation layer is a weighted sum of two components. One represents the likelihood of the word based on the input image; the other represents the likelihood of the word based on the language context. The parameters α and β control the weights of the image information and the language knowledge. Adjusting the values of α and β affects the accuracy of the ITRS. Figure 8 shows how the word accuracy changes as we vary α and β. In this experiment, we take three sets of images as input: scanned clean images, scanned occluded images, and occluded images taken by camera. Completely ignoring either the image information or the language information leads to poor accuracy. Furthermore, for a clean image we can rely more on the image information, and the best recognition quality is obtained when (α, β) is set to (0.9, 0.1). For a low-quality image we should rely more on language information, and the best recognition quality is obtained when (α, β) is set to (0.7, 0.3).
Figure 8

Adjusting the weight of image and language information affects the accuracy of ITRS.
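The role of α and β can be illustrated with a toy Python sketch. It captures only the weighted-sum idea described in the text, not the full form of Eq. (5); the candidate words, their scores, and the function names are made up.

```python
def combined_excitation(image_score, language_score, alpha, beta):
    """Weighted sum of image-based and language-based likelihoods."""
    return alpha * image_score + beta * language_score


def pick_word(candidates, alpha, beta):
    """Pick the candidate with the highest combined excitation.

    candidates maps each word to (image_score, language_score).
    """
    return max(
        candidates,
        key=lambda w: combined_excitation(*candidates[w], alpha, beta),
    )


# Hypothetical scores: the image favors "cat", the context favors "car".
candidates = {"cat": (0.9, 0.2), "car": (0.6, 0.8)}
for alpha, beta in [(1.0, 0.0), (0.7, 0.3), (0.5, 0.5), (0.0, 1.0)]:
    print(alpha, beta, pick_word(candidates, alpha, beta))
```

With these made-up scores the selected word switches from "cat" to "car" as weight moves from the image term to the language term, which is the kind of trade-off that Figure 8 explores.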

We further assign a confidence level to the words recognized by the ITRS. The confidence level is calculated as the normalized excitation difference between the selected candidate and its final competitor in the last round of confabulation, \( c\left({t}_1\right)=\min\left[1,\ \frac{I\left({t}_1\right)-I\left({t}_2\right)}{I\left({t}_2\right)}\right] \), where t1 is the selected word and t2 is its only competitor in the last round of confabulation. Under this definition, 100 % confidence means that there was only one candidate for the lexicon, while 0 % confidence means that the excitation levels of the two remaining candidates are the same; in that case, the program simply chooses the first candidate.
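The confidence definition translates directly into a few lines of Python. The function name and the handling of the single-candidate case (passing no competitor) are our own conventions, and the sketch assumes a positive competitor excitation.

```python
def confidence(i_selected, i_competitor=None):
    """Confidence c(t1) = min(1, (I(t1) - I(t2)) / I(t2)).

    i_competitor=None models a lexicon with a single candidate,
    which the text treats as 100 % confidence.
    """
    if i_competitor is None:
        return 1.0
    if i_competitor <= 0:
        raise ValueError("competitor excitation must be positive")
    return min(1.0, (i_selected - i_competitor) / i_competitor)


print(confidence(3.0, 2.0))  # selected word is 50 % above its competitor
print(confidence(2.0, 2.0))  # tie between the two remaining candidates
```

Note that the cap at 1 also applies whenever the selected word has at least twice the excitation of its competitor.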

Table 4 shows the recall results for scanned occluded images as an example. Correctly recalled words have around 90 % confidence, compared with around 20 % for wrongly recalled words. The overall average confidence is high, around 85 %, which indicates that the ITRS can effectively eliminate the ambiguity among multiple candidates in most cases and achieve high accuracy.
Table 4

Confidence level of scanned occluded images.

| File Name | Test-1 | Test-2 | Test-3 | Test-4 | Total |
|---|---|---|---|---|---|
| Total Words | 761 | 737 | 745 | 613 | 2856 |
| Total Right Words | 731 | 703 | 700 | 575 | 2709 |
| Total Wrong Words | 30 | 34 | 45 | 38 | 147 |
| Average Confidence of right words (%) | 88.86 | 85.66 | 88.63 | 88.96 | 87.99 |
| Average Confidence of wrong words (%) | 16.83 | 20.87 | 15.37 | 24.53 | 19.31 |
| Total Average Confidence (%) | 86.03 | 82.67 | 84.20 | 84.96 | 84.46 |
| Total Accuracy (%) | 96.06 | 95.39 | 93.96 | 93.80 | 94.85 |
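The derived rows of Table 4 can be recomputed from the counts and per-class confidences. The short Python check below assumes that the total average confidence is the count-weighted mean of the right-word and wrong-word confidences; this reading is ours, but it reproduces the published values to within rounding.

```python
# (right words, wrong words, avg confidence right %, avg confidence wrong %)
table4 = {
    "Test-1": (731, 30, 88.86, 16.83),
    "Test-2": (703, 34, 85.66, 20.87),
    "Test-3": (700, 45, 88.63, 15.37),
    "Test-4": (575, 38, 88.96, 24.53),
    "Total": (2709, 147, 87.99, 19.31),
}


def accuracy(right, wrong):
    """Word accuracy in percent."""
    return 100.0 * right / (right + wrong)


def overall_confidence(right, wrong, conf_right, conf_wrong):
    """Count-weighted mean confidence in percent."""
    return (right * conf_right + wrong * conf_wrong) / (right + wrong)


for name, (right, wrong, c_right, c_wrong) in table4.items():
    print(name,
          round(accuracy(right, wrong), 2),
          round(overall_confidence(right, wrong, c_right, c_wrong), 2))
```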

In the last experiment, we compare the accuracy of ITRS with that of Tesseract on the same three sets of test images. Developed initially at HP Labs and now at Google, Tesseract is claimed to be the most accurate open source OCR engine available. The word accuracies of both ITRS and Tesseract are given in Table 5. As the image quality decreases, the accuracy of Tesseract degrades rapidly, while the performance of ITRS is more robust. Although Tesseract produces perfect recognition on clean images, ITRS is more reliable for noisy, low-quality images.
Table 5

Comparison between ITRS and Tesseract.

| Input quality | Scanned clean images | Scanned images with occlusions | Camera images with occlusions |
|---|---|---|---|
| Tesseract | 100 % | 93.1 % | 88.6 % |
| ITRS (default) | 97.6 % | 93.5 % | 90.9 % |
| ITRS (best) | 99.0 % | 94.8 % | 91.9 % |

Please note that, unlike Tesseract, which recognizes words and sentences solely based on image information, ITRS cannot guarantee the recognition of any word that is not in its dictionary. This is because known words always receive higher excitation than unknown words during sentence confabulation, which is analogous to the human cognition process. If we exclude all proper nouns, such as the names of characters and locations, the word-level accuracy of ITRS can be further increased.

6.4 Performance Evaluation on Memristor Based BSB Circuit

The robustness of the BSB recall circuit was analyzed based on Monte-Carlo simulations at the component level. Memristor device parameters are taken from [10]. We tested 26 BSB circuits corresponding to the 26 lower case letters from “a” to “z”. The character imaging data was taken from [17]. Each character image consists of 16 × 16 points and can be converted to a (−1, +1) binary vector with n = 256. Accordingly, each BSB recall matrix has a dimension of 256 × 256. The training set of each character consists of 20 prototype patterns representing different size-font-style combinations. In each test, we created 500 design samples for each BSB circuit and ran 13,000 Monte-Carlo simulations. We use the probability of failed recognitions (PF) to measure the performance of a BSB circuit.
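The conversion from a 16 × 16 character image to a BSB input vector can be sketched as follows. Which pixel value maps to +1 is an assumption, as is the row-by-row flattening order; the function name is illustrative.

```python
def to_bipolar(image):
    """Flatten a binary image (rows of 0/1 pixels) into a (-1, +1) vector.

    Assumes a set pixel maps to +1 and an unset pixel to -1.
    """
    return [1 if px else -1 for row in image for px in row]


# Made-up 16 x 16 image containing a single vertical stroke.
image = [[1 if col == 8 else 0 for col in range(16)] for _ in range(16)]
x = to_bipolar(image)
print(len(x))
```

A vector of length n = 256 is what makes each BSB recall matrix 256 × 256.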

Figure 9 compares the PF of each input character pattern without considering any noise sources (“Ideal”) and under the scenario including all the process variations and signal fluctuations (“All noises”). In the figure, “within 3” denotes the failure rate when a test counts as successful if the correct answer is among the top 3 recognized patterns, and “1st hit” denotes the failure rate when the first recognized pattern must be the correct answer.
Figure 9

PF for each character pattern.
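The two failure metrics can be computed from ranked recognition results as in the Python sketch below. The data layout (one best-first candidate list per Monte-Carlo trial) and the function name are assumptions made for illustration.

```python
def failure_rates(ranked_outputs, truth, k=3):
    """Return (PF for "1st hit", PF for "within k").

    ranked_outputs holds one candidate list per trial, ordered best first.
    """
    n = len(ranked_outputs)
    first_miss = sum(1 for r in ranked_outputs if not r or r[0] != truth)
    topk_miss = sum(1 for r in ranked_outputs if truth not in r[:k])
    return first_miss / n, topk_miss / n


# Four made-up trials for the character "a".
trials = [["a", "o", "e"], ["o", "a", "e"], ["o", "e", "c", "a"], ["a"]]
print(failure_rates(trials, "a"))
```

By construction the "within 3" failure rate can never exceed the "1st hit" failure rate, since a correct first hit is also within the top 3.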

The simulation shows that the performance degradation induced by process variations and signal fluctuations has a constant impact on all of the BSB circuits in the “within 3” case. When processing a perfect image under ideal conditions, no BSB circuit fails, and hence PF = 0. After including all static and dynamic noise, PF (within 3) ranges from 1 to 7 % for different input characters. When the number of random point defects in the input images is increased to 30, the range of PF (within 3) increases from 0–10 % under ideal conditions to 4–16 % after including the noise sources. When considering only the “1st hit” case, the PF of most characters, under both “Ideal” and “All noises”, increases dramatically as the number of defects goes to 30, implying that the input image defects rather than the noise dominate the failure rate. Only a few characters, such as “j” and “l”, are more sensitive to noise than to defects, as they suffer from high failure rates even without input pattern defects.

Besides accuracy, the BSB model places more emphasis on computation speed, because ambiguity can be eliminated at the word confabulation level. We created a Verilog-A memristor model by adopting the device parameters from [18] and scaling them to the 65 nm node based on the relation between resistance and device area given in [19]. To achieve high speed and a small form factor, we adopt a flash analog-to-digital converter (ADC) and a current-steering digital-to-analog converter (DAC) [28] in our design. For a more detailed design of the peripheral circuitry of the crossbar array, please refer to [29].

We implemented the layout and schematic of a 64 × 64 memristor crossbar array in the Cadence Virtuoso environment and extracted its area. The delay and power consumption of the crossbar array are obtained through simulation. The area, delay and power consumption of the peripheral circuits (e.g., AD/DA converters, op-amps, etc.) are estimated using published data [28]. We then scale the results to obtain an estimate for a Neuromorphic Computing Accelerator (NCA) of size 256 × 256. 93 NCAs are used, and each of them implements one BSB model. Table 6 gives the area, power consumption and performance estimates of the accelerator. The processing time is estimated as the time needed to complete one unit workload of BSB computation, which is to check a set of 96 images. In the same table, we also list the power consumption, area and performance of an Intel Xeon Sandy Bridge-EP processor as a reference. The memristor-based neuromorphic computing accelerator substantially reduces processing time, area, and power consumption.
Table 6

Comparison of memristor and Xeon processor.

| Implementations | Processing time | Area (mm²) | Power consumption |
|---|---|---|---|
| Memristor crossbar | 60 μs | 151 | 875 mW |
| Xeon processor | 0.5 s | 435 | 183 W |
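The headline ratios follow directly from the entries of Table 6:

$$ \frac{0.5\ \mathrm{s}}{60\ \mu\mathrm{s}}\approx 8333,\qquad \frac{151\ \mathrm{mm}^2}{435\ \mathrm{mm}^2}\approx 0.35,\qquad \frac{0.875\ \mathrm{W}}{183\ \mathrm{W}}\approx 0.48\,\% $$

That is, the accelerator is more than 8,000 times faster, occupies roughly one third of the area, and draws about 0.5 % of the power of the Xeon processor.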

With the scaling of memristor devices, the programming energy will be further reduced [30, 31]. For example, Pi et al. demonstrated cross-point arrays of memristive devices with a lateral dimension of 8 nm [31]. The 8 nm device arrays required a programming current of 600 pA and only 3 nanowatts of power to operate. Moreover, memristance is inversely proportional to the device area. Thus, memristance will increase as device sizes shrink, resulting in lower operating power consumption of the crossbar array.
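The last step of this argument can be written out under a simple assumption: a device with memristance M is operated at a fixed voltage V. Its power dissipation is then

$$ P=\frac{V^{2}}{M},\qquad M\propto \frac{1}{\mathrm{Area}}\ \Rightarrow\ P\propto \mathrm{Area} $$

so, at constant voltage, shrinking the device area lowers the power drawn by each cross point. The fixed-voltage assumption and the symbols V, M and P are introduced here for illustration only.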

7 Conclusions

This paper presents our work on optimizing and accelerating a neuromorphic computing system for text recognition. A brain-inspired information processing framework is developed that performs document image recognition using pattern matching and statistical information association. The framework has outstanding noise resistance and is capable of recognizing words and sentences from highly damaged images with high accuracy. With optimizations on each layer of the framework, both local and global accuracy are increased. The detailed structure of a memristor crossbar array based neuromorphic accelerator is described. When applied to the pattern matching layer of the text recognition system, the memristor-based BSB recall circuit has high resilience to process variations and signal fluctuations, and the NCA based on the memristor crossbar array provides a speedup of more than 8,000× over the Intel Xeon processor. The area and power consumption of the NCA are only 1/3 and 0.5 % of those of a Xeon processor, respectively.

References

  1. Hecht-Nielsen, R. (2007). Confabulation theory: the mechanism of thought. Springer.
  2. Qiu, Q., Wu, Q., Burns, D., Moore, M., Bishop, M., Pino, R., Linderman, R. (2011). Confabulation based sentence completion for machine reading. Proc. of IEEE Symposium Series on Computational Intelligence.
  3. Qinru Qiu, Q., Wu, M., Bishop, R. P., Linderman, R. W. (2013). A parallel neuromorphic text recognition system and its implementation on a heterogeneous high performance computing cluster. IEEE Transactions on Computers, 62(5).
  4. Anderson, J. A. (1995). An introduction to neural networks. The MIT Press.
  5. Partzsch, J., & Schuffny, R. (2011). Analyzing the scaling of connectivity in neuromorphic hardware and in models of neural networks. IEEE Transactions on Neural Networks, 22(6), 919–935.
  6. Chua, L. (2011). Resistance switching memories are memristors. Applied Physics A: Materials Science & Processing, 102(4), 765–783.
  7. Ho, Y., Huang, G. M., Li, P. (2009). Nonvolatile memristor memory: device characteristics and design implications. International Conference on Computer-Aided Design (ICCAD), pp. 485–490.
  8. Anderson, J., Silverstein, J., Ritz, S., & Jones, R. (1977). Distinctive features, categorical perception, and probability learning: some applications of a neural model. Psychological Review, 84(5), 413.
  9. Yang, Z. R., & Zwolinski, M. (2001). Mutual information theory for adaptive mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 396–403.
  10. Strukov, D. B., Snider, G. S., Stewart, D. R., & Williams, R. S. (2008). The missing memristor found. Nature, 453, 80–83.
  11. Heittmann, A., & Noll, T. G. (2012). Limits of writing multivalued resistances in passive nano-electronic crossbars used in neuromorphic circuits. ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 227–232.
  12. Ramacher, U., Malsburg, C. V. D. (2010). On the construction of artificial brains. Springer.
  13. Hasegawa, T., Ohno, T., Terabe, K., Tsuruoka, T., Nakayama, T., Gimzewski, J. K., & Aono, M. (2010). Learning abilities achieved by a single solid-state atomic switch. Advanced Materials, 22(16), 1831–1834.
  14. Ahmed, K., Qinru Qiu, P., Malani, M. T. (2014). Accelerating pattern matching in neuromorphic intelligent text recognition system using Intel Xeon Phi Coprocessor. Proc. International Joint Conference on Neural Networks (IJCNN).
  15. Yang, F., Qinru Qiu, M. B., Wu, Q. (2012). Tag-assisted sentence confabulation for intelligent text recognition. Proc. of Computational Intelligence for Security and Defense Applications (CISDA).
  16. Li, Z., Qinru Qiu. (2014). Completion and parsing Chinese sentences using cogent confabulation. Proceedings of IEEE Symposium Series on Computational Intelligence (SSCI).
  17. Wu, Q., Bishop, M., Pino, R., Linderman, R., Qiu, Q. (2011). A multi-answer character recognition method and its implementation on a high-performance computing cluster. 3rd International Conference on Future Computational Technologies and Applications, pp. 7–13.
  18. Kim, K.-H., Gaba, S., Wheeler, D., Cruz-Albrecht, J. M., Hussain, T., Srinivasa, N., & Lu, W. (2011). A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Letters, 12(1), 389–395.
  19. Choi, B. J., Chen, A. B., Yang, X., & Chen, I.-W. (2011). Purely electronic switching with high uniformity, resistance tunability, and good retention in Pt-dispersed SiO2 thin films for ReRAM. Advanced Materials, 23(33), 3847–3852.
  20. Zweig, G., Burges, C. J. C. (2012). A challenge set for advancing language modeling. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pp. 29–36. Association for Computational Linguistics.
  21. Li, B., Zhou, E., Huang, B., Duan, J., Wang, Y., Xu, N., Zhang, J., Yang, H. (2014). Large scale recurrent neural network on GPU. 2014 International Joint Conference on Neural Networks (IJCNN), pp. 4062–4069.
  22. Voorhies, R. C., Elazary, L., Itti, L. (2012). Neuromorphic Bayesian surprise for far-range event detection. IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), pp. 1–6, 18–21 Sept. 2012.
  23. Neftci, E., Das, S., Pedroni, B., Kreutz-Delgado, K., & Cauwenberghs, G. (2014). Event-driven contrastive divergence for spiking neuromorphic systems. Frontiers in Neuroscience, 7, 272.
  24. Schmuker, M., Pfeil, T., & Nawrot, M. P. (2014). A neuromorphic network for generic multivariate data classification. Proceedings of the National Academy of Sciences, 111(6), 2081–2086.
  25. Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., & Modha, D. S. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197), 668–673.
  26. Yakopcic, C., Hasan, R., Taha, T. M. (2014). Tolerance to defective memristors in a neuromorphic learning circuit. NAECON 2014 - IEEE National Aerospace and Electronics Conference, pp. 243–249. IEEE.
  27. Bichler, O., Suri, M., Querlioz, D., Vuillaume, D., DeSalvo, B., & Gamrat, C. (2012). Visual pattern extraction using energy-efficient “2-PCM synapse” neuromorphic architecture. IEEE Transactions on Electron Devices, 59(8), 2206–2214.
  28. Gustavsson, M., Wikner, J. J., & Tan, N. (2000). CMOS data converters for communications. Springer Science & Business Media.
  29. Liu, X., Mao, M., Liu, B., Li, H., Chen, Y., Li, B., Wang, Y. (2015). RENO: A High-efficient Reconfigurable Neuromorphic Computing Accelerator Design. Proc. of Design Automation Conference.
  30. Zhirnov, V. V., Meade, R., Cavin, R. K., & Sandhu, G. (2011). Scaling limits of resistive memories. Nanotechnology, 22, 25.
  31. Pi, S., Lin, P., & Xia, Q. (2013). Cross point arrays of 8 nm × 8 nm memristive devices fabricated with nanoimprint lithography. Journal of Vacuum Science and Technology, B31, 06FA02.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, USA
  2. Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA
