Fundamental methodological issues of syntactic pattern recognition
Abstract
Fundamental open problems at the frontiers of syntactic pattern recognition are discussed in the paper. Methodological considerations on crucial issues in the areas of string- and graph-grammar-based syntactic methods are made. As a result, recommendations concerning an enhancement of context-free grammars, as well as the construction of parsable and inducible classes of graph grammars, are formulated.
Keywords
Syntactic pattern recognition · Formal language · Graph grammar

1 Introduction
Representing a pattern as a structure in the form of a string, tree or graph, and a set of such structures as a formal language, is the main idea of syntactic pattern recognition [6, 24, 27, 42, 55], which is one of the main approaches in the area of machine recognition. Such a language is generated with a formal grammar, and the analysis and recognition of an unknown structure is performed with a formal automaton. If patterns are complex, they are defined in a hierarchical way. Thus, at the bottom of the hierarchy we use elementary patterns to build simple substructures. (These elementary patterns are called primitives and they are represented with symbols of the language alphabet.) Then, using such simple substructures, we construct more complex substructures, and so on.
Syntactic pattern recognition prevails over "standard" pattern recognition approaches (probabilistic, discriminant function-based, NN, etc.) when the patterns considered are characterized better by structural features than by feature vectors. What is more, using this approach we can make not only a classification (in the sense of ascribing a pattern to a predefined category) but also a (structural) interpretation of an unknown pattern. Therefore, for structurally-oriented recognition problems such as character recognition, speech recognition, scene analysis, chemical and biological structure analysis, texture analysis, fingerprint recognition and geophysics, the syntactic approach was applied successfully from its beginnings in the early 1960s through the following two decades. The rapid development of syntactic methods has slowed down since the 1990s, and experts in this area (see e.g. [26]) have found the approach stagnating.
Methodological considerations on the issues which have an impact on the further development of syntactic methods are made in this paper. Firstly, however, the key open problems constituting the frontiers of this research area should be identified. It can easily be noticed in the literature concerning syntactic pattern recognition [6, 24, 27, 42, 55] that in the field of string-based models many efficient methods have been developed for structural patterns that can be generated with regular or context-free grammars. On the other hand, if a set of patterns cannot be represented with context-free languages, i.e. it is of a context-sensitive nature, then defining an efficient recognition method is difficult. This results from the non-polynomial time complexity of automata analyzing context-sensitive languages. Therefore, defining string grammars that generate languages with a polynomial membership problem and are stronger than context-free grammars seems to be still the key open problem in this area.
If a pattern is structurally complex, a linear, string-like description is very often too weak for its representation. Then, a graph representation is usually used. This means that one should use a graph grammar for the generation of a set of patterns and a graph automaton (parser) for its analysis. Unfortunately, the problem of parsing non-trivial graph languages is PSPACE-complete or NP-complete [4, 51, 56]. Therefore, defining graph grammars generating languages with a polynomial membership problem is the second crucial open problem in syntactic pattern recognition.
Before we consider the two key open problems identified above in Sects. 3 and 4, respectively, we try to formulate in Sect. 2 some general methodological recommendations concerning research in syntactic pattern recognition. Our considerations are based on twenty years of research experience in both string-based and graph-based syntactic pattern recognition [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 33, 35]. Hopefully, our recommendations concerning the methodological aspects of research in syntactic pattern recognition will launch a discussion on the prospects and limitations of the future development of this field.
2 General remarks on the syntactic pattern recognition model
As we have mentioned in the previous section, a grammar (a pattern generator) and an automaton (a pattern recognizer/analyzer) are the basic formalisms of syntactic pattern recognition. For most applications of the theory of formal languages, including programming languages, compiler construction, etc., these formalisms are sufficient, since a grammar is defined by a designer on the basis of a well-defined syntax of the language.
I. A syntactic pattern recognition model should be complete. That is, it should consist of the following three components: a grammar, an efficient syntax analyzer, and a grammatical inference algorithm of polynomial complexity.
An abstract/generalized representation of a pattern as a structure defined with predefined primitives is a strength of the syntactic approach, since it is analogous to recognition based on predefined perceptual concepts made by a human being. On the other hand, such a generalization of phenomena (images) performed by a computer system can be too rough, because of the fuzzy/vague nature of real-world phenomena. Therefore, a symbolic-representation-based syntactic pattern recognition scheme is often "enhanced" in order to handle the problem of the fuzziness of real-world phenomena, as well as the problem of noise/distortion appearing at the stage of image acquisition [24].
In the first approach we define transformations corresponding to distortions of the strings representing patterns. There are three kinds of such distortions. A substitution error consists in an occurrence of a terminal symbol a instead of b in a string, which is usually the result of a misrecognition of a primitive. Deletion or insertion errors appear when some terminal symbol is missing from a phrase, or when a certain symbol occurs where it should not, respectively. These two errors usually result from segmentation errors. Having determined all the possible errors, we expand the grammar generating "ideal" patterns by adding productions corresponding to the error transformations. Now, we can use a parser which computes a distance between an analyzed string x and a proper string y (i.e. a string belonging to the underlying language). Such a parser is called a minimum-distance error-correcting parser, MDECP [2]. This distance can be computed simply as the smallest number of error transformations required to obtain the string x from the string y. If we ascribe various costs (weights) to the various error transformations, a weighted distance can be calculated.
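For plain strings, the minimum-distance measure described above reduces to a weighted edit distance over the three error transformations. The following sketch (an illustration of the distance computation only, not of the MDECP itself; the weight parameters are hypothetical) counts the cheapest sequence of substitution, deletion and insertion errors turning a model string y into an observed string x:

```python
def error_distance(x, y, w_sub=1, w_del=1, w_ins=1):
    """Smallest total cost of substitution/deletion/insertion errors
    turning the model string y into the observed string x."""
    m, n = len(y), len(x)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del            # delete all of y's prefix
    for j in range(1, n + 1):
        d[0][j] = j * w_ins            # insert all of x's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i-1][j-1] + (0 if y[i-1] == x[j-1] else w_sub)
            d[i][j] = min(sub, d[i-1][j] + w_del, d[i][j-1] + w_ins)
    return d[m][n]
```

With unit weights this yields the plain minimum number of error transformations; assigning different weights gives the weighted distance mentioned above.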
If errors resulting from the preprocessing phases are more "subtle" than the differences between symbolic (category-based) primitives, attributed grammars are applied [36]. In such an approach, attributes which characterize features of primitives in detail (e.g. numeric features) are used. Productions of an attributed grammar contain a syntactic part (corresponding to "standard" productions of non-attributed grammars) and a "semantic" part, called a semantic rule. Such a rule allows one to evaluate the attributes of certain symbols appearing in the production in terms of the attributes of other symbols. A distance between an analyzed pattern and the language consisting of model ("ideal") patterns can be computed during parsing not only on the basis of structural distortions, but also with the help of the vectors of attributes.
The third approach to the syntax analysis of noisy patterns can be used if one is able to observe that some patterns occur more frequently than others. Such a phenomenon can be noticed, for example, during a process of grammatical inference performed on the basis of a sample of patterns. In such a case, the frequencies of occurrence of patterns can be used for estimating their probabilities. As a result, a stochastic grammar can be defined [23]. In such a grammar, probabilities are assigned to productions, so during a derivation the probability of a generated pattern can be computed. A corresponding parser, called a maximum-likelihood error-correcting parser, MLECP, additionally evaluates the probability with which an unknown pattern belongs to the underlying language.
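The probability computation during a stochastic derivation can be sketched as follows. The two-production grammar below is a hypothetical illustration (not taken from the paper); the probability of a derivation is simply the product of the probabilities of the productions applied:

```python
from functools import reduce

# Hypothetical stochastic grammar fragment: each production carries a
# probability; production probabilities for the same left-hand side sum to 1.
productions = {
    1: ("S", "aS", 0.7),   # S -> a S  with probability 0.7
    2: ("S", "b",  0.3),   # S -> b    with probability 0.3
}

def derivation_probability(applied):
    """Probability of a derivation, given the sequence of production labels
    applied: the product of the probabilities of those productions."""
    return reduce(lambda p, label: p * productions[label][2], applied, 1.0)
```

For instance, the derivation S => aS => aaS => aab uses productions 1, 1, 2 and has probability 0.7 · 0.7 · 0.3.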
After a brief presentation of the main approaches to a problem of a fuzzy/vague nature of realworld phenomena, we can formulate the second methodological remark concerning a pattern recognition model.
II. If a syntactic pattern recognition model is to be used for a classification/interpretation of realworld objects or phenomena ^{2}, it should be enhanced with a mechanism allowing one to handle a problem of their fuzzy/vague nature. Errorcorrecting parsing, attributed grammars and stochastic grammars are typical enhancement mechanisms applied in such a case.
Decision-theoretic classification methods make use of a generic pattern representation in the form of a feature vector. In consequence, they are all-purpose in the sense that they can be applied in various application areas. On the contrary, when developing a syntactic model, we define a representation which is adequate (so, specific) for a given application area, i.e. for the nature of the patterns occurring in this area [6, 24, 27, 42, 55]. The form of a structural representation determines, in turn, the form (type) of the formal grammar which is the basis for the construction of the model. The generative power of a formal grammar is its fundamental characterization. In order to define it formally, we first introduce basic notions. We do this in a general way, i.e. we do not fix a grammar structure (like the quadruple structure of standard Chomsky grammars), since the structure varies among the grammars considered in this paper.
If \(\Upsigma\) is a set of any symbols, then \(\Upsigma^{*}\) (Kleene star) denotes the set of all strings that can be constructed by concatenating symbols of \(\Upsigma\), including the empty string (empty word), denoted by λ. A language L is a subset of \(\Upsigma^{*}\).
Let G be a grammar, and let the components of G be denoted in the following way. V is a set of symbols (the alphabet). \(\Upsigma \subset V\) is the set of terminal symbols, i.e. symbols that occur in the words of a language generated with G. P is a set of productions (rules) used to generate the language. A production is denoted by \(\gamma \longrightarrow \delta\), \(\gamma, \delta \in V^{*}\), which means that a substring γ can be replaced by a substring δ. \(N = V \setminus \Upsigma\) is the set of nonterminal symbols. Nonterminal symbols are auxiliary symbols used in the process of deriving language words with the help of productions. (They play a role similar to variable symbols in mathematics.) They do not occur in the words of a language generated with G. (The language contains only terminal symbols.) \(S \in N\) is the starting symbol.
An application of a production to a string \(\alpha \in V^{*}\) that results in obtaining a string \(\beta \in V^{*}\) is called a derivation step, denoted \(\alpha \Longrightarrow \beta. \) Thus, for defining a production (rule) we use a symbol \(\longrightarrow, \) whereas for denoting its application a symbol \(\Longrightarrow\) is used. A sequence of derivation steps (including the empty sequence) is denoted with \(\mathop\Rightarrow\limits^{*}\).
A language generated with G is a set \(L(G) = \{\alpha \mid S \mathop\Rightarrow\limits^{*} \alpha, \ \alpha \in \Upsigma^{*}\}\).
Let X denote a type of formal grammars. A class X of languages is a set \({\cal L}(X) = \{L \mid \exists G \ \hbox{of the type} \ X : L = L(G)\}\), i.e. it is the set containing all the languages L that can be generated with some grammar G of the type X. We say that grammars of a type X have greater generative power than grammars of a type Y if \({\cal L}(Y) \subsetneq {\cal L}(X)\).
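The definition of L(G) as the set of terminal strings derivable from S can be illustrated by a breadth-first enumeration of derivations. The toy grammar below (a hypothetical two-production context-free grammar for {a^n b^n | n ≥ 1}, not one from the paper) is enumerated up to a length bound:

```python
from collections import deque

# Toy CFG for {a^n b^n | n >= 1}: S -> aSb | ab.  Nonterminals are
# uppercase, terminals lowercase, so a terminal-only form is .islower().
PRODUCTIONS = [("S", "aSb"), ("S", "ab")]

def language_up_to(max_len):
    """All words of L(G) of length <= max_len, by BFS over derivations."""
    words, queue, seen = set(), deque(["S"]), {"S"}
    while queue:
        form = queue.popleft()
        if len(form) > max_len:
            continue                      # every production grows the form
        if form.islower():
            words.add(form)               # terminal string: a word of L(G)
            continue
        for lhs, rhs in PRODUCTIONS:      # one derivation step
            i = form.find(lhs)
            if i >= 0:
                nxt = form[:i] + rhs + form[i+1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return words
```

The pruning by `max_len` is sound here only because every production of this particular grammar lengthens the sentential form.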
In general, the greater the generative power of a grammar, the greater the computational complexity of the corresponding automaton. Moreover, as the generative power of a grammar grows, constructing an efficient inference algorithm becomes even more difficult than defining an efficient automaton. Summing up these considerations, we can propose the following methodological principle.
III. Any syntactic pattern recognition method should be constructed for a specific problem of a strictly-defined application area, and with the use of Ockham's Razor principle with respect to the generative power of the underlying grammar. That is, the grammar should be of the smallest generative power that is still sufficient to generate all the possible patterns.
3 Enhanced string context-free grammars
In the introduction we identified the issue of enhancing the generative power of context-free grammars as one of the most important key open problems in syntactic pattern recognition. In this section we discuss it in more detail.
3.1 Survey of models
In this section we present and discuss certain types of enhanced context-free grammars. Such grammars are required to generate all the context-free languages and also certain context-sensitive languages^{3}. There are many taxonomies and characterizations of enhanced CFGs. In the theory of formal languages, Dassow and Păun [8, 9] have defined a taxonomy for enhanced CFGs, called here regulated rewriting (controlled) grammars, that is of great importance for studying the formal properties of such grammars. In the field of Natural Language Processing (NLP), various types of enhanced CFGs, which are convenient for solving crucial problems in this area, have been defined within the class of so-called mildly context-sensitive grammars, MCSGs [57]. We will try to analyze the important types of grammars from the point of view of syntactic pattern recognition. In particular, we will have in mind the first methodological recommendation formulated in the previous section, that is, the possibility of constructing an efficient parser and a polynomial inference algorithm.
In order to enhance a context-free grammar, we should equip it with an ability to control the derivation process. In the standard (Chomskyan) paradigm this can be done either by including certain derivation control operators in grammar productions or by defining a derivation control mechanism separated from the productions. We begin with the first approach. The indexed grammar [1], introduced by Aho in 1968, was the first type of grammar developed within this approach. Let us define it formally.
Definition 1

[..] represents a stack of indices, a string in I^{*},

[i..] represents a stack of indices where i ∈ I is the top element of the stack.
 1.
If \(A \longrightarrow X_1 \ldots X_k\) is a production of type (1), then \(\beta A\delta\gamma \Rightarrow \beta X_1\delta_1 \ldots X_k\delta_k\gamma\), where \(\delta_j = \delta\) if \(X_j \in N\) and \(\delta_j = \lambda\) if \(X_j \in \Upsigma\).
 2.
If \(A[..] \longrightarrow B[i..]\) is a production of type (2), then \(\beta A\delta\gamma \Rightarrow \beta Bi\delta\gamma\).
 3.
If \(A[i..] \longrightarrow [..] X_1 \ldots X_k\) is a production of type (3), then \(\beta Ai\delta\gamma \Rightarrow \beta X_1\delta_1 \ldots X_k\delta_k\gamma\), where \(\delta_j = \delta\) if \(X_j \in N\) and \(\delta_j = \lambda\) if \(X_j \in \Upsigma\).
A symbol \(\mathop\Rightarrow\limits^{k}\) denotes an application of the kth production.

d is a straight line segment,

l is left 60° “turn”,

r is right 60° “turn”.
Let us describe the derivation in an intuitive way. We begin by putting an index i on the stack with the help of the first production. The more indices i we put on the stack, the more complex the structure we obtain. We start the proper generation of a structure by applying the second production. Productions 3 and 4 generate the primitives r and l (at the same time they remove indices i from the stacks). The effect of the first application of production 3 is shown in Fig. 3c. The effect of the first application of production 4 is shown in Fig. 3d. Productions 5 and 6 generate the primitive d (cf. Fig. 3e). The final effect of the derivation is shown in Fig. 3f.
To obtain a more complex structure than the one shown in Fig. 3f, the first production should be applied three times (cf. Fig. 3g). If we apply the first production four times, a still more complex structure is obtained (cf. Fig. 3h), etc.
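The iterative development of the arrowhead can be sketched as plain string rewriting over the primitives d, l, r. The production shapes below follow the description of the programmed-grammar version later in the paper (each A develops into A l B l A, each B into B r A r B, and finally A and B become d); the specific rule strings are our reading of that scheme, not a literal copy of the grammar:

```python
# Nonterminal rewriting rules for the Sierpinski tiling arrowhead;
# terminals l, r (60-degree turns) are left unchanged by the rewriting.
RULES = {"A": "AlBlA", "B": "BrArB"}

def arrowhead(iterations):
    """String of primitives after the given number of development iterations."""
    form = "A"
    for _ in range(iterations):
        form = "".join(RULES.get(symbol, symbol) for symbol in form)
    # final step: replace the remaining nonterminals with the primitive d
    return form.replace("A", "d").replace("B", "d")
```

Each iteration triples the number of segment primitives d, which matches the growth of the structures in Fig. 3f-h.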
As one can see, additional stacks of indices assigned to nonterminals are used in indexed grammars. If a production is applied to a nonterminal with a stack, then all nonterminals on the right-hand side of the production receive copies of this stack. Such a mechanism allows us to reflect contextual dependencies during a derivation. Neither a polynomial parser nor a polynomial grammatical inference algorithm has been defined for indexed grammars.
Let us notice that some additional syntactic constructs (brackets [ ], indices) occur in both grammar productions and non-final phrases of a derivation, apart from terminal and nonterminal symbols. These constructs do not occur in the words of the generated language, and they play the role of operators controlling a derivation. An occurrence of such operators is typical for the mildly context-sensitive grammars (MCSGs) [57] used in NLP and mentioned above. Mildly context-sensitive languages (MCSLs) fulfill the following properties. MCSLs contain the context-free languages and certain languages with context dependencies (\(L_{1} = \{a^{n}b^{n}c^{n} \mid n \geq 0\}\), \(L_{2} = \{a^{n}b^{m}c^{n}d^{m} \mid n,m \geq 0\}\), \(L_{3} = \{ww \mid w \in \{a,b\}^{*}\}\)). Their membership problem is solvable in deterministic polynomial time. MCSLs have the linear growth property (if the strings of a language are ordered in a sequence according to their length, then two successive lengths do not differ by arbitrarily large amounts). The best known MCSGs include linear indexed grammars, head grammars, and combinatory categorial grammars^{4}.
Now, we briefly characterize the MCSGs mentioned above. Let us start with linear indexed grammars (LIGs), introduced by Gazdar [25]. A LIG differs from an indexed grammar in the form of its productions. In a LIG at most one nonterminal in each production receives the stack of indices. (In indexed grammars all nonterminals receive copies of the stack.) Let us introduce the following definition.
Definition 2
Head grammars (HGs) were introduced by Pollard in 1984 [46]. They are defined in the following way.
Definition 3
Head grammars differ from context-free grammars in containing a distinguished symbol "\(\uparrow\)" in each string. This symbol corresponds to the head of the string. The nonterminals of a head grammar derive headed strings, i.e. pairs of terminal strings (u, v), which we denote \(u \uparrow v\). There are two types of operations that can be performed using the head. The first is concatenation \(C_{i,n}\). It joins n head-divided words in order and keeps the head of the i-th component as the head of the result: \(C_{i,n}(u_1 \uparrow v_1, \ldots, u_i \uparrow v_i, \ldots, u_n \uparrow v_n) = u_1 v_1 \ldots u_i \uparrow v_i \ldots u_n v_n\). The second operation is wrapping W, which inserts one word into another based on the head position: \(W(u_1 \uparrow v_1, u_2 \uparrow v_2) = u_1 u_2 \uparrow v_2 v_1\).
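The two head operations can be sketched directly on pairs (u, v), with the head sitting between the two components (a representation choice of ours, matching the formulas above):

```python
def wrap(a, b):
    """W((u1, v1), (u2, v2)) = (u1 u2, v2 v1): b is inserted at a's head."""
    (u1, v1), (u2, v2) = a, b
    return (u1 + u2, v2 + v1)

def concat(parts, i):
    """C_{i,n}: concatenate n headed strings in order, keeping the head of
    the i-th (1-based) component as the head of the result."""
    left = "".join(u + v for u, v in parts[:i-1])   # components before i, flattened
    u_i, v_i = parts[i-1]
    right = "".join(u + v for u, v in parts[i:])    # components after i, flattened
    return (left + u_i, v_i + right)
```

For example, wrapping (a↑b) around (c↑d) gives ac↑db, showing how one word is inserted into another at the head position.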
Combinatory categorial grammars (CCGs) were introduced by Steedman in 1987 [53]. Let us introduce their definition.
Definition 4
 1.
forward application: \( (x / y) \, y \rightarrow x\)
 2.
backward application: \(y \ (x \backslash y) \rightarrow x\)
 3.
generalized forward composition for some \(n \geq 1\): \((x / y) \ (\ldots(y |_{1} z_1) |_{2} \ldots |_{n} z_n) \rightarrow (\ldots(x |_{1} z_1) |_{2} \ldots |_{n} z_n)\)
 4.
generalized backward composition for some \(n \geq 1\): \((\ldots(y |_{1} z_1) |_{2} \ldots |_{n} z_n) \ (x \backslash y) \rightarrow (\ldots(x |_{1} z_1) |_{2} \ldots |_{n} z_n)\)
Derivations in a CCG involve the use of the combinatory rules in R (instead of the productions of a "common" formal grammar). Let the "derives" relation be defined as: α c β ⇒ α c_1 c_2 β if R contains a combinatory rule that has c_1 c_2 → c as an instance, and α and β are strings of categories. Then the string language generated by a CCG is defined as \(L(G) = \{a_1 \ldots a_n \mid S \Rightarrow \cdots \Rightarrow c_1 \ldots c_n, \ c_i \in f(a_i), \ a_i \in \Upsigma \cup \{\lambda\}, \ 1 \leq i \leq n\}\).
 rule r1:

\((x^S / T) (T \backslash A / T \backslash B) \rightarrow (x^S \backslash A / T \backslash B)\)
 rule r2:

\((A / D) (x^S \backslash A) \rightarrow (x^S / D)\)
 rule r3:

\((x^S / y) y \rightarrow x^S\)
 rule r4:

y(x ^{ S }\y) → x ^{ S }
 f1:

\(f(a) = \{ (A / D) \}\)
 f2:

f(b) = { B }
 f3:

f(d) = {D}
 f4:

\(f(c) = \{(T \backslash A / T \backslash B) \}\)
 f5:

f(λ) = { (S/T), T }
The derivation is made by applying rules 3, 2, 4, and 1, and then f(λ), f(a), f(b), f(c), f(d).
As in the case of the mildly context-sensitive grammars applied in the field of NLP and presented above, derivation control operators included in productions have recently been used in two types of enhanced context-free grammars introduced by Okhotin in the theory of formal languages. These grammars allow one to specify such theoretical operations over sets of languages as intersection, negation, etc. Let us consider the following definition [40].
Definition 5
Intuitively speaking, a rule in a conjunctive grammar specifies that every string which satisfies each of the conditions α _{ i } is generated by A.
In the grammar G the nonterminal A generates any number of a symbols, while F generates strings with equal numbers of b symbols and c symbols (b^n c^n). On the other hand, the nonterminal G generates strings with equal numbers of a symbols and b symbols (a^n b^n), while C generates strings with any number of c symbols. By taking the conjunction of the languages associated with AF and GC (since S → AF & GC), the grammar generates the language L(G) = {a^n b^n c^n, n > 0}.
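The conjunction can be sketched as a plain intersection test: AF describes strings of the shape a* b^n c^n and GC describes strings of the shape a^n b^n c*, and a string is in L(G) exactly when it satisfies both. The regular-expression decomposition below is our illustrative device, not the grammar's own mechanism:

```python
import re

SHAPE = re.compile(r"(a*)(b*)(c*)")   # blocks of a's, b's, c's in order

def in_af(s):
    """Language of AF: a* b^n c^n (A gives a's, F gives b^n c^n)."""
    m = SHAPE.fullmatch(s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_gc(s):
    """Language of GC: a^n b^n c* (G gives a^n b^n, C gives c's)."""
    m = SHAPE.fullmatch(s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_language(s):
    """Membership in L(G) = {a^n b^n c^n, n > 0}: the conjunction of both."""
    return len(s) > 0 and in_af(s) and in_gc(s)
```

Neither component language is context-sensitive on its own; only their conjunction yields the non-context-free language a^n b^n c^n.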
Boolean grammars, defined by Okhotin in 2004 [41], are more general than conjunctive grammars. Additionally, a negation operator can be used in productions, which makes it possible to express every Boolean operation over sets of languages. Both conjunctive and Boolean grammars generate all context-free languages and a subset of the context-sensitive languages. Polynomial parsers have been defined for both classes. Grammatical inference has not been investigated, because of the theoretical objectives of this research (enhancing the generative power of CFGs so as to express logical operations over sets of underlying context-free languages).
After presenting types of grammars with derivation control operators included in productions, let us introduce grammars with a separated control mechanism, i.e. a mechanism that is not "hidden" in the left- or right-hand sides of productions. Such a methodology is used in programmed grammars, introduced by Rosenkrantz in 1969 [48].
Definition 6
A derivation is defined as follows. The production labelled (1) is applied first. If it is possible to apply a production (r), then after its application the next production is chosen from its success field U. Otherwise, the next production is chosen from its failure field W.
Now, we use once more the example of the Sierpinski Triangle (see Fig. 3a) to show the considerable descriptive power of a programmed grammar. Let us define a programmed grammar G which generates the Sierpinski tiling arrowhead (generated with an indexed grammar at the beginning of this section).
Let us describe the derivation in an intuitive way. The successive iterations of the development of subsequent structures are "programmed" in the grammar G. We start by applying the first production (see Fig. 4a). Secondly, all the nonterminals indexed with n (i.e. A_n and B_n) are replaced with nonterminals indexed with o (i.e. A_o and B_o) with the help of productions 2 and 3 (see Fig. 4b). Then, each nonterminal indexed with o is developed into a substructure B_n r A_n r B_n or A_n l B_n l A_n with the help of productions 4 or 5, respectively (see Fig. 4c, d). At this moment, we can replace all the nonterminals with the terminal d with the help of productions 6 and 7 (cf. Fig. 4e, f), finishing the generation, or we can begin the next iteration, starting from the form shown in Fig. 4d.
The static control mechanism of programmed grammars (success and failure fields contain fixed indices of productions) has been extended in DPLL(k) grammars (Dynamically Programmed LL(k) grammars) [19]. Instead of success and failure fields, every production is equipped with a control tape. The head of a tape can write/read indices of productions, and it can move. A derivation is made according to the content of the tape. We introduce the following definition.
Definition 7
A derivation for dynamically programmed grammars is defined in the following way. Apart from testing whether L_i occurs in the derived sentential form, the predicate of applicability of a production p_i is checked. If it is true, then L_i is replaced with R_i, and actions over the derivation control tapes of certain productions are performed. A derivation control tape of a production corresponds to the success field of programmed grammars. The difference is that, whereas in common programmed grammars this field is fixed at the moment of defining the grammar, in dynamically programmed grammars this "field" is filled dynamically with labels of productions during a derivation, with the help of the set of actions A_i.
In order to construct a polynomial syntax analyzer for dynamically programmed grammars, restrictions forcing a deterministic derivation and limiting "recursive steps" have been imposed in the following way.
Definition 8
Let \(G = (V, \Upsigma, O, P, S)\) be a dynamically programmed context-free grammar, let First_k(x) denote the set of all k-length terminal prefixes of strings derivable from x in the grammar G^{5}, and let \(\mathop \Rightarrow \limits_{{core}}^{*}\) denote a sequence of derivation steps consisting in applying only production cores. The grammar G is called a Dynamically Programmed LL(k) grammar, DPLL(k), if the following two conditions are fulfilled.
(2) For the grammar G there exists a certain number ξ such that for any left-hand derivation \(S \mathop \Rightarrow \limits^{*} wA\alpha \mathop \Rightarrow \limits^{\pi} w\beta\alpha\) (where \(w \in \Upsigma^*\), \(A \in N\), \(\alpha, \beta \in V^*\), and π is the string of indices of the productions applied) fulfilling the condition |π| ≥ ξ, the first symbol of βα is a terminal one.
The first condition is analogous to the constraint put on a context-free grammar by the definition of the well-known LL(k) grammars [49] in order to make a derivation deterministic by checking the first k symbols of the right-hand sides of productions. However, in a DPLL(k) grammar there can be more than one production generating wx from wAα, but the predicate of applicability is fulfilled for only one of them at this derivational step. With such a definition, left-hand recursion can occur. Therefore, the number of "recursive" steps is limited by the second condition.
Label | μ           | Core       | Actions
1     | TRUE        | S → aAbBcC | ∅
2     | TRUE        | A → aA     | add(4,4); add(6,6);
3     | TRUE        | A → λ      | add(4,5); add(6,7);
4     | read(4) = 4 | B → bB     | move(4);
5     | read(4) = 5 | B → λ      | move(4);
6     | read(6) = 6 | C → cC     | move(6);
7     | read(6) = 7 | C → λ      | move(6);
Production | Sentence derived | DCL_4 | DCL_6
—          | S                |       |
1          | aAbBcC           |       |
2          | aaAbBcC          | 4     | 6
2          | aaaAbBcC         | 44    | 66
3          | aaabBcC          | 445   | 667
4          | aaabbBcC         | #45   | 667
4          | aaabbbBcC        | ##5   | 667
5          | aaabbbcC         | ###_  | 667
6          | aaabbbccC        | ###_  | #67
6          | aaabbbcccC       | ###_  | ##7
7          | aaabbbccc        | ###_  | ###_
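The control-tape mechanism behind the derivation tabulated above can be sketched as follows. While production 2 expands A, it appends labels to the tapes of productions 4 and 6, which later force B and C to be expanded the same number of times, yielding a^n b^n c^n. This is an illustrative simulation of the bookkeeping, not the parser itself (the tapes are modeled as simple queues):

```python
from collections import deque

def derive(k):
    """Simulate the tabulated derivation: production 1, then production 2
    applied k times, production 3, then the b- and c-phases driven by the
    control tapes.  Result: a^(k+1) b^(k+1) c^(k+1)."""
    tape4, tape6 = deque(), deque()
    part_a, part_b, part_c = "a", "b", "c"   # production 1: S -> aAbBcC
    for _ in range(k):                       # production 2: A -> aA
        part_a += "a"
        tape4.append(4)                      # add(4,4)
        tape6.append(6)                      # add(6,6)
    tape4.append(5)                          # production 3: add(4,5)
    tape6.append(7)                          # production 3: add(6,7)
    while tape4.popleft() == 4:              # production 4: B -> bB
        part_b += "b"                        # (label 5 ends the B-phase)
    while tape6.popleft() == 6:              # production 6: C -> cC
        part_c += "c"                        # (label 7 ends the C-phase)
    return part_a + part_b + part_c
```

With k = 2 the simulation reproduces the sentence aaabbbccc derived in the table.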
The descriptive power of DPLL(k) grammars has been increased in their generalized version, called GDPLL(k), which allows one to generate such "complex" languages as e.g. \(L(G)=\{a^{2^{n}} \mid n>0\}\) [35]. The parsing algorithm for both DPLL(k) and GDPLL(k) grammars is of O(n) computational complexity, and the grammatical inference algorithm is also of polynomial complexity, \(O(m^{3} \cdot n^{3})\), where m is the sample size and n is the maximum length of a string in the sample [34].
All the formalisms discussed above can be characterized as string grammars belonging to the Chomskyan paradigm, i.e. they are "enhanced" versions of Chomsky's context-free grammars generating certain context-sensitive languages. Below we present some models going beyond this paradigm which are also, in our opinion, worth considering in the context of syntactic pattern recognition.
A hybrid syntactic-structural model based on augmented regular expressions, AREs, was introduced in 1997 by Alquezar and Sanfeliu [3]. Intuitively speaking, an ARE = (R, V, T, L) is defined by a regular expression R in which the stars are replaced by natural-valued variables, called star variables V, and these variables are related through a finite number of linear equations L. Additionally, a star tree T determines the structure of the nesting of parentheses in the expression. The analysis of a string s in order to verify its membership in a language L(ARE) is performed in two steps. Firstly, by parsing s with a finite-state automaton corresponding to R, a verification of the general structure of s is made. Secondly, if this verification is positive, the fulfillment of the constraints L resulting from the parsing (i.e. an evaluation of the finite linear relations between star variables) is tested. With augmented regular expressions, a considerable subclass of context-sensitive languages can be represented. A syntactic-structural analysis performed according to this scheme is of polynomial complexity. A learning method has been defined for this model as well.
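The two-step ARE analysis can be sketched on a toy example. Take the regular expression (a*)(b*) with star variables v1, v2 and the single linear constraint v1 = v2, which describes {a^n b^n}; this particular expression and constraint are our hypothetical illustration of the scheme:

```python
import re

def are_accepts(s):
    """Two-step ARE-style analysis for the toy ARE ((a*)(b*), v1 = v2)."""
    # Step 1: structural check by the finite-state automaton for R
    m = re.fullmatch(r"(a*)(b*)", s)
    if not m:
        return False
    # Step 2: evaluate the star variables and test the linear constraint
    v1, v2 = len(m.group(1)), len(m.group(2))
    return v1 == v2
```

The automaton alone accepts the regular superset a*b*; it is the linear constraint on the star variables that cuts it down to the context-sensitive-style language a^n b^n.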
In the field of Natural Language Processing, Joshi, Levy and Takahashi defined in 1975 [31, 32] the tree adjoining grammar, TAG, belonging to the class of mildly context-sensitive grammars (MCSGs). A TAG generates labelled trees by the application of operations of two kinds over initial trees and auxiliary trees. These operations are: substitution, which attaches a tree to a substitution node (a nonterminal leaf node marked with a special symbol) of a tree derived from an initial tree, and adjunction, which inserts an auxiliary tree into an internal node of a tree. The string language generated with a TAG is defined as the set of frontiers of the generated trees. Thus, a tree adjoining grammar generates a kind of derivation trees of the strings belonging to a mildly context-sensitive language. In fact, it is a tree-rewriting system, not a string-rewriting system. Parsing for TAGs is of polynomial complexity.
Contextual grammars, CGs, also used in the NLP area, were introduced by Marcus in 1969 [37]. CGs go beyond the Chomskyan paradigm of string rewriting. An operation of inserting words into derived phrases according to contextual dependencies is used here instead of Chomskyan productions involving nonterminal symbols. The insertion operation is performed with contexts, which are pairs of words connected with sets of words called selectors. During a derivation, the elements of contexts are "wrapped around" the associated elements of selectors, called selector elements. Contextual grammars of various types are incomparable with the grammars of the Chomsky hierarchy, which are the "standard" formal model in syntactic pattern recognition. Nevertheless, some context-sensitive languages can be generated by CGs. They are also worth considering in the context of syntactic pattern recognition, since for some classes of CGs polynomial parsing algorithms have been defined (e.g. [28]), as well as a polynomial algorithm of grammatical inference [38].
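The wrapping of contexts around selector elements can be sketched on a toy contextual grammar of our own choosing: axiom ab, the single context (a, b), and the balanced words a^n b^n themselves as the selector, which generates {a^n b^n | n ≥ 1}:

```python
AXIOM = "ab"
CONTEXT = ("a", "b")     # the pair of words wrapped around a selector element

def selector(w):
    """Hypothetical selector: the balanced words a^n b^n themselves."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def derive_steps(k):
    """Apply k derivation steps: each step wraps the context around an
    occurrence of a selector element (here, the whole current word)."""
    w = AXIOM
    for _ in range(k):
        assert selector(w)           # the context may only wrap selector elements
        w = CONTEXT[0] + w + CONTEXT[1]
    return w
```

Note that no nonterminal ever appears: every intermediate word is itself a word of the language, which is the essential departure from Chomskyan rewriting.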
3.2 Methodological remarks on string-based syntactic pattern recognition
General characteristics | Type of grammars | Parsing | Inference
Chomskyan string grammars with derivation control operators included in productions | Indexed [1] | non-polynomial | undefined
 | Linear indexed [25] | polynomial | undefined
 | Head [46] | polynomial | undefined
 | Combinatory categorial [53] | polynomial | undefined
 | Conjunctive [40] | polynomial | undefined
 | Boolean [41] | polynomial | undefined
Chomskyan string grammars with a separated derivation control mechanism | Programmed [48] | non-polynomial | undefined
 | DPLL(k) [19] | polynomial | polynomial
Other approaches | ARE [3] | polynomial | non-polynomial
 | | polynomial | undefined
 | Marcus CG [37] | polynomial | polynomial
As one can see, the definition of a grammatical inference algorithm is the main problem here. In most enhanced models, such as indexed grammars, head grammars, combinatory grammars and conjunctive/Boolean grammars, a derivation control mechanism is (cleverly) “hidden” in grammar productions. It is implemented with syntax constructs, such as stacks, heads or operators, that do not occur in the words of the language. Let us notice that the main idea of the standard inference methods (i.e. those used for regular and context-free languages) consists in looking for similarities among sample strings. The alphabet of an inferred grammar contains only the terminal symbols that occur in the language sample, and nonterminals are introduced in a (relatively) simple way as “abstraction classes” for certain substrings. The only “operator” used is simple concatenation, and this operator is, in a way, “visible” in a derived word. However, if grammar productions contain operators that disappear during a derivation and do not occur in a derived word, the grammatical inference problem becomes very difficult. The reason is that the syntax of sample words does not deliver any information on the history of obtaining these words with such operators. It is hardly likely that algorithms reconstructing such a history can be of polynomial complexity. On the other hand, for Chomskyan string grammars with a separated derivation control mechanism (DPLL(k)), a polynomial grammatical inference algorithm has been defined [34].
At the same time, for some mathematical linguistics models that go beyond the “standard” Chomskyan string grammar approach, inference algorithms have been defined. Thus, such “unconventional” approaches are worth considering as candidates for a theoretical basis of syntactic pattern recognition methods.
Let us summarize our considerations on enhanced string-based syntactic pattern recognition models and parsing/inference issues with the following methodological recommendation.
IV. A possibility of defining a grammatical inference algorithm is a key issue in constructing an effective syntactic pattern recognition system. Defining the control mechanism enhancing a grammar as a separate element makes the development of an efficient grammatical inference algorithm easier than “hiding” this mechanism in the left- or right-hand sides of productions with the help of additional syntactic operators. At the same time, the possibility of applying models going beyond the standard Chomskyan string grammar paradigm in the field of syntactic pattern recognition is worth studying.
Now, we sum up the second methodological recommendation introduced in Sect. 2, i.e. the possibility of enhancing a syntactic pattern recognition model with error-correcting parsing, adding attributes to a grammar, or adding probability information to a grammar. Theoretical considerations verified by practical application experience [5, 24] show that such an enhancement does not cause any problems when constructing a parsing algorithm for syntactic pattern recognition. The use of attributed or stochastic grammars does not make the construction of an inference algorithm more difficult than in the case of “pure” grammars either [5, 24]. However, if an error-correcting parsing scheme is used, an expanded grammar has to be defined, as we have discussed in Sect. 2. Usually, it is a human being who decides which structures are “proper” and which are distortions of “proper” structures. Therefore, using such a scheme would hinder solving the grammatical inference problem. In fact, this problem belongs to Artificial Intelligence rather than to the pattern recognition area.
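The stochastic enhancement mentioned above can be illustrated with a minimal sketch: each production carries a probability, a derivation is scored by the product of the probabilities of the productions it uses, and classification picks the class grammar giving the higher score. The two class grammars below are hypothetical.

```python
# A minimal sketch of the stochastic enhancement: every production of a
# stochastic grammar carries a probability, and a derivation is scored by
# the product of the probabilities of the productions it uses.
from math import prod

def derivation_probability(productions, used):
    """productions: {(lhs, rhs): p}; used: the productions applied in order."""
    return prod(productions[rule] for rule in used)

# Two hypothetical class grammars over the rules S -> aS | b
# (per left-hand side, the probabilities sum to 1).
class_a = {("S", "aS"): 0.7, ("S", "b"): 0.3}
class_b = {("S", "aS"): 0.2, ("S", "b"): 0.8}

# One derivation of the string "aab": S -> aS -> aaS -> aab.
used = [("S", "aS"), ("S", "aS"), ("S", "b")]
p_a = derivation_probability(class_a, used)   # 0.7 * 0.7 * 0.3
p_b = derivation_probability(class_b, used)   # 0.2 * 0.2 * 0.8
best = "A" if p_a > p_b else "B"              # the string is ascribed to class A
```

An attributed grammar would extend the same scheme by attaching attribute values and semantic rules to productions instead of probabilities, without changing the parsing machinery.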
4 Parsable graph languages in syntactic pattern recognition
Although graph grammars have been widely used for image representation and synthesis since the late 1960s and the early 1970s (e.g. [44, 45]), when they were first proposed for this purpose, their application in the area of syntactic pattern recognition has been relatively rare. The possibility of their use in this area depends strongly on a balance between a generative power sufficient for representing a class of complex “multidimensional” patterns and parsing efficiency. In this section we consider this problem and try to formulate certain recommendations concerning its possible solutions.
4.1 Survey of research into graph-based syntactic pattern recognition
Although the first graph automata were proposed in the late 1960s, only a few graph grammar-based syntactic pattern recognition models have been presented in the last 40 years^{6}. Web automata were proposed by Rosenfeld and Milgram in 1972 [47]. An efficient parser for expansive graph grammars was constructed by Shi and Fu in 1983 [50]. In 1990 two parsing algorithms for plex grammars were defined independently by Bunke and Haller [5], and Peng, Yamamoto and Aoki [43]. In the early 1990s two parsing methods for relational grammars were proposed independently by Wittenburg [58], and Ferruci, Tortora, Tucci and Vitiello [11]. An efficient, O(n^{2}), parser for the ETPL(k) subclass of the well-known edNLC graph grammars [30] was constructed in 1993 [17, 18]. A parsing method for reserved graph grammars was proposed by Zhang, Zhang and Cao in 2001 [59].
4.2 Methodological remarks on graph-based syntactic pattern recognition
Constructing a parsing algorithm for graph languages is a much more difficult task than for string languages. There are two reasons for this difficulty. First of all, a graph structure is unordered by its nature, whereas a linear order is defined by a string structure^{7}. During parsing, however, successive pieces of the analyzed structure (subwords in the case of strings, subgraphs in the case of graphs), called here handles, are torn off repeatedly in order to be matched against predefined structures (defined on the basis of the right-hand sides of grammar productions) that are stored in the parser memory. The question of what the next piece is can be answered easily when some kind of ordering is determined for the analyzed structure. If a graph structure lacks any order, this question resolves itself into the problem of subgraph isomorphism, which is NP-complete.
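The difficulty of tearing off a handle from an unordered structure can be made concrete with a naive backtracking matcher: it explores node mappings exhaustively, which is exactly the subgraph isomorphism problem and exponential in the worst case. The encoding of graphs as sets of two-element frozensets is our own.

```python
# Why handle extraction is hard for graphs: without an ordering, matching
# a candidate handle against the analysed graph is subgraph isomorphism.
# A naive backtracking matcher (exponential in the worst case):

def subgraph_match(pattern_edges, host_edges, p_nodes, h_nodes, mapping=None):
    """Find an injective node mapping under which every pattern edge
    occurs in the host graph; edges are sets of frozensets {u, v}."""
    mapping = mapping or {}
    if len(mapping) == len(p_nodes):
        return mapping
    p = next(n for n in p_nodes if n not in mapping)   # next unmapped node
    for h in h_nodes:
        if h in mapping.values():                      # keep the map injective
            continue
        trial = {**mapping, p: h}
        ok = all(
            frozenset({trial[u], trial[v]}) in host_edges
            for e in pattern_edges
            for (u, v) in [tuple(e)]
            if u in trial and v in trial               # check only mapped edges
        )
        if ok:
            result = subgraph_match(pattern_edges, host_edges,
                                    p_nodes, h_nodes, trial)
            if result:
                return result
    return None                                        # backtrack
```

For instance, a triangle handle is found inside a square with one diagonal, but only after trying candidate assignments node by node.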
There is, however, a second reason that makes constructing a graph parser very difficult. In the case of string grammars we know how (or rather where) to glue/embed the right-hand side of a production into the structure transformed during a derivational step; this results from the uniform, rigid structure of strings. In the case of graph grammars, however, we have to specify explicitly how to embed the right-hand side graph in the rest graph. Such a specification is made with the help of the third component of a production, i.e. the embedding transformation. The embedding transformation allows one to modify a derived graph structure. On the other hand, it acts at the border between the left- (right-) hand sides of the production and their context, i.e. its behaviour is “context-sensitive”-like.
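The role of the embedding transformation can be sketched as follows. This is a simplified, NLC-style variant (the embedding is keyed only by the neighbour's label, whereas edNLC embeddings also inspect edge labels and directions), and all node and label names are chosen for illustration.

```python
# A sketch of why graph productions need a third component: after replacing
# node `old` by the right-hand-side graph, the embedding transformation
# says which new nodes reconnect to the former neighbours of `old`.

def apply_production(graph, old, rhs_nodes, rhs_edges, embedding):
    """graph: (labels, edges), labels a dict, edges a set of frozensets.
    embedding: {neighbour_label: iterable of rhs node names to connect}."""
    labels, edges = graph
    neighbours = {v for e in edges if old in e for v in e if v != old}
    labels = {n: l for n, l in labels.items() if n != old}   # drop old node
    labels.update(rhs_nodes)                                 # add rhs nodes
    edges = {e for e in edges if old not in e} | set(rhs_edges)
    for nb in neighbours:                                    # re-embed the rhs
        for new in embedding.get(labels[nb], ()):
            edges.add(frozenset({new, nb}))
    return labels, edges
```

Changing only the embedding dictionary changes which edges survive the rewriting step, which is precisely the “context-sensitive”-like behaviour discussed above.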
These two “features” of graph grammars account for their great generative power. On the other hand, they result in a hard (i.e. PSPACE-complete or NP-complete) membership problem for the classes of graph grammars interesting from the application point of view, which was shown at the beginning of the 1980s independently by Brandenburg [4], Slisenko [51] and Turan [56]. Therefore, when constructing a graph parser, either the generative power of a class of grammars is decreased by imposing certain restrictions, or one defines a specific class of graph languages with a polynomial membership problem (as, e.g., in the case of expansive grammars [50]). Summing up, we can define the following methodological recommendation.
V. Constructing a syntactic pattern recognition model based on a defined class of graph grammars, one should focus primarily on a polynomial complexity of the membership problem for the languages generated by this class.
Now, let us consider the second issue related to constructing an effective syntactic pattern recognition model, i.e. an inference algorithm. In the case of graph grammars applied to syntactic pattern recognition, an efficient (polynomial) inference algorithm [21] has been defined only for the parsable ETPL(k) grammars [17, 18]. As we have mentioned previously, the grammatical inference problem is much more complex and difficult than the parsing problem. Analyzing both the parsing algorithm and the inference algorithm for ETPL(k) grammars, one can easily see that the model has been constructed with the help of analogies to string grammars. In particular, the deterministic properties of a derivation of ETPL(k) graph languages have been obtained by analogy with the well-known deterministic LL(k) string grammars [49]. Let us now consider this analogy.
Now, let us look at Fig. 5b, which illustrates the basic idea of ETPL(k) graph grammars. In this case, the main objective has also been to impose a condition on edNLC graph grammars that makes their derivations deterministic. Thus, we have demanded an unambiguous choice of a production during a (leftmost) derivation. For a string LL(k) grammar such unambiguity concerns the k-length prefix of a word (i.e. the k-length handle). In the case of graphs it should concern a subgraph of a terminal graph. Such a subgraph contains a node a having an index (i) that determines the place of a production application, and its k successors \(a_1, \ldots, a_k\). Such a subgraph is called the k-successors handle. If for every derivational step in a grammar we can choose a production in an unambiguous way on the basis of an analysis of the k-successors handle, then we say that the grammar has the property of an unambiguous choice of a production with respect to the k-successors handle in a (leftmost) derivation.
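The string-grammar half of this analogy can be sketched with a toy LL(1)-style parser: the (nonterminal, lookahead) pair selects at most one production, just as the k-successors handle selects at most one production in an ETPL(k) derivation. The grammar S -> aS | b is a made-up example.

```python
# Deterministic top-down parsing: the next input symbol (the 1-length
# handle, k = 1) picks the production unambiguously from a table.

def choose(nonterminal, lookahead, table):
    """At most one production fits the (nonterminal, lookahead) pair."""
    return table.get((nonterminal, lookahead))

def parse(word, table, start="S"):
    stack, i = [start], 0
    while stack:
        top = stack.pop()
        if top.islower():                      # terminal: must match input
            if i >= len(word) or word[i] != top:
                return False
            i += 1
        else:                                  # nonterminal: table-driven choice
            rhs = choose(top, word[i] if i < len(word) else "$", table)
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
    return i == len(word)

# Parse table for S -> aS | b, i.e. the language a^n b.
table = {("S", "a"): ["a", "S"], ("S", "b"): ["b"]}
```

ETPL(k) grammars transfer exactly this discipline to graphs: the lookahead window becomes the k-successors subgraph of the rewritten node.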
Similarly, when defining a general scheme of inference for an ETPL(k) grammar, the author has made use of analogies with the well-known formal derivatives method used for inferring string grammars [24]. Summing up, looking for analogies in the area of string languages seems to be a good methodological technique when doing research in the area of graph languages. Thus, let us formulate the following methodological recommendation.
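The formal derivatives idea from the string case can be sketched as follows: the derivative D_a(S) = { x : ax ∈ S } is computed for each symbol, and the distinct derivatives reachable from a finite sample play the role of nonterminals of the inferred grammar. This is only the skeleton of the method described in [24].

```python
# Formal derivatives over a finite sample of strings.

def derivative(sample, symbol):
    """D_symbol(sample) = { x : symbol + x is in sample }."""
    return frozenset(w[1:] for w in sample if w and w[0] == symbol)

def distinct_derivatives(sample, alphabet):
    """Collect all distinct non-empty derivatives reachable from the
    sample set; for a finite sample these are the states/nonterminals
    of the inferred (canonical) grammar."""
    seen, todo = set(), [frozenset(sample)]
    while todo:
        s = todo.pop()
        if s in seen or not s:
            continue
        seen.add(s)
        for a in alphabet:
            todo.append(derivative(s, a))
    return seen
```

For the sample {ab, aab}, the reachable derivatives are {ab, aab}, {b, ab}, {b} and {ε}, so the inferred grammar would have four nonterminals.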
VI. When constructing a graph parser and a graph grammar inference algorithm, one should look for analogous constructs and mechanisms in the theory of string languages.
5 Conclusions
The theory of formal languages (mathematical linguistics) constitutes a formal basis for various research areas in computer science. The specificity of an area determines the methodological principles that should be followed during research; in other words, different principles are valid for different areas. The general principles concerning the application of mathematical linguistics formalisms in the syntactic pattern recognition area have been formulated in the paper.
These principles can be summarized as follows. A syntactic pattern recognition model should include not only an efficient syntax analyzer, but also a grammatical inference algorithm of polynomial complexity. If the model is to be used for the recognition of real-world objects or phenomena, then it should be enhanced with such techniques as error-correcting parsing, attributed grammars or stochastic grammars. In order to ensure the computational efficiency of the model, the type of grammar should be of the smallest generative power that is still sufficient to generate all the possible patterns. For context-free grammars enhanced with a separated derivation control mechanism, the construction of a grammatical inference algorithm is easier than for grammars with a control mechanism “hidden” in productions with the help of additional syntactic operators. A polynomial complexity of the membership problem is a key issue for graph grammar-based pattern recognition. In order to define syntax analysis and grammatical inference algorithms for graph grammar-based methods, one should look for analogous algorithms for string grammars.
Let us remember that these principles are not necessarily valid in other research areas that make use of the formalisms of mathematical linguistics, such as Natural Language Processing or compiler design.
The syntactic pattern recognition paradigm is primarily an approach in the machine recognition area. However, it also relates strongly to other fields of computer science, such as Artificial Intelligence (the problem of pattern understanding, see e.g. [54]) and the construction of secret sharing techniques (see e.g. [39]). Therefore, the set of methodological principles presented in the paper will be extended in the future with new ones connected with AI paradigms.
Footnotes
 1.
On the other hand, a parsing algorithm should be as efficient as possible, especially if the recognition performed by the system is to be made in real time.
 2.
Sometimes a syntactic pattern recognition scheme is used for analyzing objects or systems that are artefacts, such as a particle physics detector system (see e.g. [19]). Then, an enhancement of a syntactic model can be unnecessary.
 3.
One can enhance classes of grammars other than CFGs with the mechanisms discussed in the paper. Nevertheless, in syntactic pattern recognition we are interested primarily in enhancing CFGs.
 4.
Tree adjoining grammars are the fourth well-known class of MCSGs but, in fact, they are tree (not string) grammars.
 5.
First_{k}(x) was introduced for the LL(k) subclass of CFGs by Rosenkrantz and Stearns [49].
 6.
In this survey, we present those parsing models that have been applied in the area of syntactic pattern recognition.
 7.
In the case of tree structures we have at least a partial ordering.
References
 1. Aho AV (1968) Indexed grammars—an extension of context-free grammars. J Assoc Comput Mach 15:647–671
 2. Aho AV, Peterson TG (1972) A minimum distance error-correcting parser for context-free languages. SIAM J Comput 1:305–312
 3. Alquezar R, Sanfeliu A (1997) Recognition and learning of a class of context-sensitive languages described by augmented regular expressions. Pattern Recogn 30:163–182
 4. Brandenburg FJ (1983) On the complexity of the membership problem of graph grammars. In: Proceedings of the international workshop on graph-theoretic concepts in computer science WG'83, June 16–18, 1983, Osnabrück, Germany. Trauner Verlag, pp 40–49
 5. Bunke HO, Haller B (1990) A parser for context free plex grammars. Lecture Notes Comput Sci 411:136–150
 6. Bunke HO, Sanfeliu A (eds) (1990) Syntactic and structural pattern recognition—theory and applications. World Scientific, Singapore
 7. Castaño JM (2003) GIGs: restricted context-sensitive descriptive power in bounded polynomial time. In: Proceedings of the conference on intelligent text processing and computational linguistics (CICLing '03)
 8. Dassow J, Păun G (1990) Regulated rewriting in formal language theory. Springer, New York
 9. Dassow J (2004) Grammars with regulated rewriting. In: Martin-Vide C, Mitrana V, Păun G (eds) Formal languages and applications. Springer, Berlin
 10. van Eijck J (2008) Sequentially indexed grammars. J Logic Comput 18(2):205–228
 11. Ferruci F, Tortora G, Tucci M, Vitiello G (1994) A predictive parser for visual languages specified by relational grammars. In: Proceedings of the IEEE symposium on visual languages VL'94, pp 245–252
 12. Flasiński M (1988) Parsing of edNLC-graph grammars for scene analysis. Pattern Recogn 21:623–629
 13. Flasiński M (1989) Characteristics of edNLC-graph grammars for syntactic pattern recognition. Comput Vision Graphics Image Process 47:1–21
 14. Flasiński M (1990) Distorted pattern analysis with the help of Node Label Controlled graph languages. Pattern Recogn 23:765–774
 15. Flasiński M (1991) Some notes on a problem of constructing the best matched graph. Pattern Recogn 24:1223–1224
 16. Flasiński M, Lewicki G (1991) The convergent method of constructing polynomial discriminant functions for pattern recognition. Pattern Recogn 24:1009–1015
 17. Flasiński M (1993) On the parsing of deterministic graph languages for syntactic pattern recognition. Pattern Recogn 26:1–16
 18. Flasiński M (1998) Power properties of NLC graph grammars with a polynomial membership problem. Theoret Comput Sci 201:189–231
 19. Flasiński M, Jurek J (1999) Dynamically programmed automata for quasi context sensitive languages as a tool for inference support in pattern recognition-based real-time control expert systems. Pattern Recogn 32:671–690
 20. Flasiński M, Jurek J (2006) On the analysis of fuzzy string patterns with the help of extended and stochastic GDPLL(k) grammar. Fundam Inf 71:1–14
 21. Flasiński M (2007) Inference of parsable graph grammars for syntactic pattern recognition. Fundam Inf 80:379–413
 22. Flasiński M, Myśliński S (2010) On the use of graph parsing for recognition of isolated hand postures of Polish Sign Language. Pattern Recogn 43:2249–2264
 23. Fu KS, Swain PH (1971) Stochastic programmed grammars for syntactic pattern recognition. Pattern Recogn 4:83–100
 24. Fu KS (1982) Syntactic pattern recognition and applications. Prentice Hall, Englewood Cliffs
 25. Gazdar G (1988) Applicability of indexed grammars to natural languages. In: Reyle U, Rohrer C (eds) Natural language parsing and linguistic theories. D. Reidel, Englewood Cliffs, NJ
 26. Goldfarb L (2004) Pattern representation and the future of pattern recognition. In: Proceedings of the ICPR 2004 workshop, August 22, 2004, Cambridge
 27. Gonzales RC, Thomason MG (1978) Syntactic pattern recognition: an introduction. Addison-Wesley, Reading
 28. Harbusch K (2003) An efficient online parser for contextual grammars with at most context-free selectors. Lecture Notes Comput Sci 2588:168–179
 29. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22:4–37
 30. Janssens D, Rozenberg G (1980) On the structure of node-label-controlled graph languages. Inf Sci 20:191–216
 31. Joshi AK, Levy LS, Takahashi M (1975) Tree adjunct grammars. J Comput Syst Sci 10:136–163
 32. Joshi AK (1985) How much context-sensitivity is necessary for characterizing structural descriptions—tree adjoining grammars. In: Dowty D et al (eds) Natural language processing—theoretical, computational and psychological perspectives. Cambridge University Press, New York
 33. Jurek J (2000) On the linear computational complexity of the parser for quasi context sensitive languages. Pattern Recognit Lett 21:179–187
 34. Jurek J (2004) Towards grammatical inferencing of GDPLL(k) grammars for applications in syntactic pattern recognition-based expert systems. Lecture Notes Comput Sci 3070:604–609
 35. Jurek J (2005) Recent developments of the syntactic pattern recognition model based on quasi-context sensitive languages. Pattern Recognit Lett 26:1011–1018
 36. Knuth D (1971) Semantics of context-free languages. Math Syst Theory 2:127–145
 37. Marcus S (1969) Contextual grammars. Rev Roum Math Pures Appl 14:1525–1534
 38. Oates T, Armstrong T, Becerra-Bonache L, Atamas M (2006) Inferring grammars for mildly context sensitive languages in polynomial time. Lecture Notes Comput Sci 4201:137–147
 39. Ogiela MR, Ogiela U (2012) DNA-like linguistic secret sharing for strategic information systems. Int J Inf Manag 32:175–181
 40. Okhotin A (2001) Conjunctive grammars. J Autom Lang Comb 6:519–535
 41. Okhotin A (2004) Boolean grammars. Inf Comput 194:19–48
 42. Pavlidis T (1977) Structural pattern recognition. Springer, New York
 43. Peng KJ, Yamamoto T, Aoki Y (1990) A new parsing scheme for plex grammars. Pattern Recogn 23:393–402
 44. Pfaltz JL, Rosenfeld A (1969) Web grammars. In: Proceedings of the first international conference on artificial intelligence, Washington DC, pp 609–619
 45. Pfaltz JL (1972) Web grammars and picture description. Comput Graph Image Process 1:193–220
 46. Pollard C (1984) Generalized phrase structure grammars, head grammars, and natural language. PhD thesis, Stanford University, CA
 47. Rosenfeld A, Milgram DL (1972) Web automata and web grammars. Mach Intell 7:307–324
 48. Rosenkrantz DJ (1969) Programmed grammars and classes of formal languages. J Assoc Comput Mach 16:107–131
 49. Rosenkrantz DJ, Stearns RE (1970) Properties of deterministic top-down grammars. Inf Control 17:226–256
 50. Shi QY, Fu KS (1983) Parsing and translation of attributed expansive graph languages for scene analysis. IEEE Trans Pattern Anal Mach Intell 5:472–485
 51. Slisenko AO (1982) Context-free grammars as a tool for describing polynomial-time subclasses of hard problems. Inform Proc Lett 14:52–56
 52. Staudacher P (1993) New frontiers beyond context-freeness: DI-grammars and DI-automata. In: Proceedings of the sixth conference of the European chapter of the Association for Computational Linguistics (EACL '93)
 53. Steedman M (1987) Combinatory grammars and parasitic gaps. Nat Lang Ling Theory 5:403–439
 54. Tadeusiewicz R, Ogiela MR (2004) Medical image understanding technology. Springer, Berlin Heidelberg New York
 55. Tanaka E (1995) Theoretical aspects of syntactic pattern recognition. Pattern Recogn 28:1053–1061
 56. Turan G (1982) On the complexity of graph grammars. Report of the Automata Theory Research Group, Szeged
 57. Vijay-Shanker K, Weir DJ (1994) The equivalence of four extensions of context-free grammars. Math Syst Theory 27:511–546
 58. Wittenburg K (1992) Earley-style parsing for relational grammars. In: Proceedings of the IEEE symposium on visual languages VL'92, pp 192–199
 59. Zhang DQ, Zhang K, Cao J (2001) A context-sensitive graph grammar formalism for the specification of visual languages. Comput J 44:186–200
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.