
1 Introduction

Extending the theory of formal (string) languages to two dimensions is a very interesting and challenging task. Our motivations are mainly theoretical but, since formal language theory has had a very significant impact on several applications, we expect that results on two-dimensional languages will be exploited in practical fields such as image processing, pattern recognition and matching.

A two-dimensional word, or picture, is a rectangular array of symbols taken from a finite alphabet \(\varSigma \); a two-dimensional language is thus a subset of \(\varSigma ^{**}\). The notion of finite-state recognizability can be transferred to a two-dimensional (2D) world in different ways (e.g. [10, 15, 17, 19, 22, 23, 24]). A crucial difference from string language theory is that in two dimensions many problems become undecidable, and even for finite-state recognizability we lose the equivalence between determinism and non-determinism [2, 6, 17].

In the theoretical study of formal string languages, string codes have always been a relevant subject of research, also because of their applications to practical problems (see [14] for complete references). An important and easy-to-construct class of string codes is that of prefix codes. Recall that a set S of strings is called prefix if no word in S is a (left-)prefix of another one. Any prefix set of words is also a code, referred to as a prefix code. The notion of code can be intuitively and naturally transposed to two-dimensional objects by exploiting the notion of unique tiling decomposition. Several attempts at developing a formal theory of two-dimensional codes have been made by using polyominoes (connected two-dimensional figures, not necessarily rectangular). Unfortunately, most of the published results show that in the 2D context we lose important properties. In [13] D. Beauquier and M. Nivat proved that the problem whether a finite set of polyominoes is a code is undecidable, and that the same result holds also for dominoes. Codes of other variants of polyominoes, including bricks (i.e. labelled polyominoes) and pictures, are also studied in [1, 16, 18, 20, 21], where further undecidability results are proved.

In [4, 7], a new definition of picture code was introduced by referring to the operation of tiling star as defined in [24]; the tiling star of a set X is the set \(X^{**}\) of all pictures that are tilable (in the polyominoes style) by elements of X. Then, X is a code if any picture in \(X^{**}\) is tilable in a unique way. Unfortunately, it is again not decidable whether a finite language of pictures is a code. The aim was then to find decidable subclasses of picture codes. To this end, two definitions of prefix code of pictures have been proposed by associating with the pictures a preferred scanning direction from the top-left corner towards the bottom-right one. Note that, moving to the 2D setting, the main concern is that if we delete a “prefix” from a picture (i.e. delete a rectangular portion starting at the top-left corner) the remaining part is not in general a picture itself. As a consequence, the proof techniques for string codes fail when transposed to two dimensions. Further generalizations to 2D of classes of string codes are presented in [8, 11, 12].

A first definition of two-dimensional prefix code is proposed in [4, 7]. It is based on a special kind of polyominoes that have a straight top border. A smaller class, referred to as the class of strong prefix sets, was then proposed in [3, 9]; it is defined in a simpler way, is easier to manage and more robust, and preserves all positive features of the first definition. To prevent the decoding of a picture message from starting in two different ways, no prefix-overlapping pictures are admitted in a strong prefix set. More precisely, no two pictures in the set may coincide on their common top-left part. Finite strong prefix sets are a decidable family of picture codes with a simple polynomial decoding algorithm. The results in [5, 9] show a recursive procedure to construct all finite maximal strong prefix codes of pictures, starting from the “singleton” pictures containing only one alphabet symbol. The construction extends the literal representation of prefix codes of strings (cf. [14]). It is the starting point for most considerations in this paper.

All the mentioned results on two-dimensional codes concern finite codes, except for some first examples of infinite codes of pictures in [8], in the framework of the deciphering delay. Here, attention is devoted to infinite strong prefix codes. We present a recursive definition of a family of languages based on iterated extensions. We prove that all languages defined by iterated extensions are maximal strong prefix codes. Moreover, we show that, vice versa, any maximal strong prefix code can be obtained by iterated extensions. We also investigate the measure of such codes by associating a probability with each letter of the alphabet. We prove that, as in the string case, the measure of a two-dimensional strong prefix code is less than or equal to one. Nevertheless, we show that there exist infinite maximal strong prefix codes whose measure is strictly less than one, and we discuss the reason for this difference from the string case.

2 Preliminaries

We recall some definitions about two-dimensional languages (see [17]). A picture over a finite alphabet \(\varSigma \) is a two-dimensional rectangular array of elements of \(\varSigma \). Given a picture p, \(|p|_{row}\) and \(|p|_{col}\) denote the number of rows and columns, respectively, while \(size(p)=\left( |p|_{row},|p|_{col}\right) \) and \(area(p)=|p|_{row}\times |p|_{col}\) denote the picture size and area, respectively. We also consider the empty pictures, that is, all pictures of size (m, 0) or (0, n). The set of all pictures over \(\varSigma \) of fixed size (m, n) is denoted by \(\varSigma ^{m,n}\). The set of all pictures over \(\varSigma \) is denoted by \(\varSigma ^{**}\) while \(\varSigma ^{++}\) refers to the set \(\varSigma ^{**}\) without the empty pictures. A two-dimensional language (or picture language) over \(\varSigma \) is a subset of \(\varSigma ^{**}\). Any string on \(\varSigma \) can be viewed as a one-row picture in \(\varSigma ^{**}\). With a slight abuse of notation, in the sequel, \(\varSigma \) will sometimes denote \(\varSigma ^{1,1}\), and a symbol \(a\in \varSigma \) the corresponding picture in \(\varSigma ^{1,1}\).

In order to locate a position in a picture, it is necessary to put the picture in a reference system. The set of coordinates \(dom(p)=\{1, 2, \ldots , |p|_{row}\}\times \{1, 2, \ldots , |p|_{col}\}\) is referred to as the domain of a picture p. We let \(p(i, j)\) denote the symbol in p at coordinates (i, j). We assume the top-left corner of the picture to be at position (1, 1), and fix the scanning direction for a picture from the top-left corner toward the bottom-right one.

A subdomain of dom(p) is a set d of the form \(\{i, i+1, \ldots , i'\}\times \{j, j+1, \ldots ,j'\}\), where \(1\le i\le i'\le |p|_{row},\ 1\le j\le j'\le |p|_{col}\), also specified by the pair \([(i, j), ({i'}, {j'})]\). The portion of p corresponding to positions in subdomain \([(i, j), ({i'}, {j'})]\) is denoted by \(p[(i, j), ({i'}, {j'})]\). Then a picture x is a subpicture of p if \(x=p[(i, j), ({i'}, {j'})]\), for some \(1\le i\le i'\le |p|_{row},\ 1\le j\le j'\le |p|_{col}\). Prefixes of pictures are special subpictures. Given pictures x, p, with \(|x|_{row}\le |p|_{row}\) and \(|x|_{col}\le |p|_{col}\), picture x is a prefix of p, denoted \(x\unlhd p\), if x is a subpicture of p corresponding to its top-left portion, i.e. if \(x= p[(1, 1), ({|x|_{row}, |x|_{col}})]\).
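As an illustration of these definitions, here is a minimal Python sketch. It assumes that a picture is stored as a tuple of equal-length strings, one string per row; this encoding and the helper names `size`, `subpicture` and `is_prefix` are chosen here for illustration and are not part of the original formalism. Coordinates are 1-based and inclusive, as in the text.

```python
from typing import Tuple

# Assumed encoding: one string per row, all rows of equal length.
Picture = Tuple[str, ...]


def size(p: Picture) -> Tuple[int, int]:
    """Return (number of rows, number of columns) of a non-empty picture."""
    return len(p), len(p[0])


def subpicture(p: Picture, top_left, bottom_right) -> Picture:
    """Return p[(i, j), (i2, j2)] with 1-based inclusive coordinates."""
    (i, j), (i2, j2) = top_left, bottom_right
    return tuple(row[j - 1:j2] for row in p[i - 1:i2])


def is_prefix(x: Picture, p: Picture) -> bool:
    """True if x coincides with the top-left portion of p."""
    (xr, xc), (pr, pc) = size(x), size(p)
    if xr > pr or xc > pc:
        return False
    return subpicture(p, (1, 1), (xr, xc)) == x


p = ("abb", "bab", "bba")
print(subpicture(p, (2, 2), (3, 3)))   # bottom-right 2x2 portion of p
print(is_prefix(("ab", "ba"), p))      # top-left 2x2 portion of p
print(is_prefix(("ab", "bb"), p))      # differs from p at position (2, 2)
```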

Dealing with pictures, two concatenation products are classically defined. Let \(p, q\in \varSigma ^{**}\) be pictures of size (m, n) and \((m', n')\), respectively. The column and the row concatenation of p and q are defined by horizontally and vertically juxtaposing p and q. They are partial operations, defined only if \(m=m'\) and if \(n=n'\), respectively. These operations can be extended to define row- and column-concatenations, and row- and column-stars on languages. We consider another interesting star operation for picture languages, as introduced by D. Simplot in [24], the tiling star. The idea is to compose pictures in some way to cover a rectangular area as, for example, in the following figures.

figure a

The tiling star of X, denoted by \(X^{**}\), is the set that contains all the empty pictures together with all the non-empty pictures p whose domain can be partitioned in disjoint subdomains \(\{d_1,d_2,\ldots , d_k\}\) such that any subpicture \(p_h\) of p associated with the subdomain \(d_h\) belongs to X, for all \(h=1,..., k\).

Then \(X^{++}\) denotes the set \(X^{**}\) without the empty pictures. In the sequel, if \(p\in X^{++}\), we say that p is tilable in X while the partition \(t= \{d_1,d_2,\ldots , d_k\}\) of dom(p), together with the corresponding pictures \(\{p_1,p_2,\ldots , p_k\}\), is called a tiling decomposition of p in X.

3 Two-Dimensional Codes

Let us recall the definitions of codes and strong prefix codes of pictures given in [3, 4, 7, 9], together with some examples. Let \(\varSigma \) be a finite alphabet. \(X\subseteq \varSigma ^{++}\) is a code iff any \(p\in \varSigma ^{++}\) has at most one tiling decomposition in X.
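To make the notion of tiling decomposition concrete, the following Python sketch counts the tiling decompositions of one given picture in a finite set X by backtracking. It is only an illustration under stated assumptions: pictures are encoded as tuples of equal-length strings, and the function name `count_tilings` is hypothetical. It relies on the observation that, scanning positions row by row, the first uncovered position must be the top-left corner of the tile that covers it. Note that it tests a single picture; it does not decide whether X is a code.

```python
from typing import Iterable, Tuple

Picture = Tuple[str, ...]  # assumed encoding: one string per row


def count_tilings(p: Picture, X: Iterable[Picture]) -> int:
    """Count the tiling decompositions of picture p in the finite set X."""
    rows, cols = len(p), len(p[0])
    tiles = list(set(X))
    covered = [[False] * cols for _ in range(rows)]

    def first_free():
        # First uncovered position in row-major order, or None.
        for i in range(rows):
            for j in range(cols):
                if not covered[i][j]:
                    return i, j
        return None

    def fits(t, i, j):
        # Tile t can be placed with its top-left corner at (i, j).
        tr, tc = len(t), len(t[0])
        if i + tr > rows or j + tc > cols:
            return False
        return all(
            not covered[i + a][j + b] and p[i + a][j + b] == t[a][b]
            for a in range(tr)
            for b in range(tc)
        )

    def mark(t, i, j, value):
        for a in range(len(t)):
            for b in range(len(t[0])):
                covered[i + a][j + b] = value

    def solve():
        cell = first_free()
        if cell is None:
            return 1  # every position is covered: one decomposition found
        i, j = cell
        total = 0
        for t in tiles:
            if fits(t, i, j):
                mark(t, i, j, True)
                total += solve()
                mark(t, i, j, False)
        return total

    return solve()


# Two dominoes tile the 2x2 all-a picture in two different ways,
# so this set is not a code.
print(count_tilings(("aa", "aa"), {("aa",), ("a", "a")}))
```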

Example 1

Let \(\varSigma =\{ a, b\}\) be the alphabet and let . It is easy to see that X is a code. Any picture \(p\in X^{++}\) can be decomposed starting at top-left-corner and checking the subpicture p[(1, 1), (2, 2)]; it can be univocally decomposed in X. Then, proceed similarly for the next contiguous subpictures of size (2, 2).

Example 2

Let . Notice that no picture in X is prefix of another picture in X (see definition in Sect. 2). Nevertheless, X is not a code. Indeed, picture has the two following different tiling decompositions in X: and  .

Taking inspiration from the very remarkable family of prefix codes of strings, let us introduce strong prefix codes, defined in [3, 9]. The idea is that, given a strong prefix set of pictures \(X\subset \varSigma ^{++}\), each picture in \(\varSigma ^{++}\) can “start” with at most one of the pictures in X.

Definition 3

Let \(p, q\in \varSigma ^{++}\). Pictures p and q prefix-overlap if for any \((i,j)\in \mathrm {dom}(p)\cap \mathrm {dom}(q)\), \(p(i,j)=q(i,j)\). Moreover pictures p and q strictly prefix-overlap if they prefix-overlap, but neither \(p\unlhd q\) nor \(q\unlhd p\).

For example, in the following figure, pictures p and q strictly prefix-overlap:

figure b

Definition 4

Let \(X\subseteq \varSigma ^{++}\). X is strong prefix if for any pictures p, q in X with \(p \ne q\), p and q do not prefix-overlap.
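Definitions 3 and 4 translate directly into a pairwise test on a finite set. The sketch below is a hedged illustration, assuming pictures are tuples of equal-length strings; the function names are hypothetical. Since two prefix-overlapping pictures agree on their common domain, one is a prefix of the other exactly when its domain is contained in the other's, which is what `fits_inside` checks.

```python
from itertools import combinations


def prefix_overlap(p, q):
    """True if p and q agree on every position of dom(p) ∩ dom(q)."""
    rows = min(len(p), len(q))
    cols = min(len(p[0]), len(q[0]))
    return all(p[i][:cols] == q[i][:cols] for i in range(rows))


def fits_inside(p, q):
    """True if dom(p) is contained in dom(q)."""
    return len(p) <= len(q) and len(p[0]) <= len(q[0])


def strictly_prefix_overlap(p, q):
    """Prefix-overlap where neither picture is a prefix of the other."""
    return (prefix_overlap(p, q)
            and not fits_inside(p, q)
            and not fits_inside(q, p))


def is_strong_prefix(X):
    """True if no two distinct pictures of the finite set X prefix-overlap."""
    return not any(prefix_overlap(p, q) for p, q in combinations(set(X), 2))


X = {("a",), ("bb", "ba"), ("bb", "bb")}
print(is_strong_prefix(X))                 # no pair agrees on its common domain
print(is_strong_prefix(X | {("b",)}))      # ("b",) is a prefix of ("bb", "ba")
print(strictly_prefix_overlap(("ab", "ba"), ("abb",)))
```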

Example 5

The following language \(X\) is strong prefix; no two pictures in \(X\) prefix-overlap.

figure c

Definition 6

A strong prefix set \(X \subseteq \varSigma ^{++}\) is maximal strong prefix over \(\varSigma \) if it is not properly contained in any other strong prefix set over \(\varSigma \); that is, \(X \subseteq Y \subseteq \varSigma ^{++}\) and Y strong prefix imply \(X=Y\).

The results in [5, 9] prove that finite strong prefix codes have a recursive structure and describe an effective procedure to construct all (maximal) finite strong prefix codes of pictures, starting from the “singleton” pictures containing only one alphabet symbol. The construction in some sense extends the literal representation of prefix codes of strings and is based on the notion of extensions of a picture. The set of extensions of a picture p to some bigger size (m, n) is the set of all pictures of fixed size (m, n) obtained by adding some columns to the right and some rows to the bottom of p, filled with all possible combinations of alphabet symbols.

Let us fix an order between pairs of integers. We write \((m,n)<(m',n')\) if \(m\le m'\), \(n\le n'\), and \((m,n)\ne (m',n')\).

Definition 7

Let \(\varSigma \) be an alphabet, \(p \in \varSigma ^{++}\), and let m, n be positive integers with \(size(p)<(m,n)\). The set of extensions of p to size (m, n) is \(E_{(m,n)}(p)=\{ q \in \varSigma ^{m,n}\ | \ q[(1,1), (|p|_{row},|p|_{col})]=p \}\).
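The set of extensions can be enumerated directly from Definition 7. The following sketch assumes pictures are tuples of equal-length strings and that the alphabet is given as a string of single-character symbols; the function name `extensions` is hypothetical. Every position outside dom(p) is filled in all possible ways, so the result has \(|\varSigma |^{mn-area(p)}\) elements.

```python
from itertools import product


def extensions(p, m, n, alphabet):
    """Return E_(m,n)(p): all pictures of size (m, n) having p as a prefix."""
    r, c = len(p), len(p[0])
    if r > m or c > n or (r, c) == (m, n):
        raise ValueError("need size(p) < (m, n)")
    # Positions (0-based) of the target picture that lie outside dom(p).
    free = [(i, j) for i in range(m) for j in range(n) if i >= r or j >= c]
    result = set()
    for fill in product(sorted(set(alphabet)), repeat=len(free)):
        # Start from p padded with placeholders, then fill the free cells.
        grid = [list(p[i]) + [""] * (n - c) if i < r else [""] * n
                for i in range(m)]
        for (i, j), symbol in zip(free, fill):
            grid[i][j] = symbol
        result.add(tuple("".join(row) for row in grid))
    return result


ext = extensions(("b",), 2, 2, "ab")
print(len(ext))        # 2 ** (4 - 1) pictures of size (2, 2) starting with b
print(sorted(ext)[:3])
```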

In [9] the finite maximal strong prefix codes are characterized as follows.

Proposition 8

\(X\subseteq \varSigma ^{++}\) is a finite maximal strong prefix code if and only if there exists a finite sequence of picture languages over \(\varSigma \), \(X_1, X_2, \ldots , X_k\), such that \(X_1=\varSigma \), \(X=X_k\), and for \(i=1, \ldots , k-1\), \(X_{i+1}=( X_i \setminus \{p_i\} )\cup E_{(m_i, n_i)}(p_i)\), for some \(p_i \in X_i\), \(m_i,n_i \ge 0\).
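The construction of Proposition 8 can be replayed on a small instance. The sketch below is an illustration only, assuming pictures are tuples of equal-length strings; all function names are hypothetical. It performs two replacement steps starting from \(\varSigma =\{a,b\}\) and then checks strong prefixness and maximality by brute force. The maximality test inspects only pictures no larger than the largest size (M, N) occurring in X: a bigger picture prefix-overlaps an element of X exactly when its top-left truncation to at most M rows and N columns does, and it overlaps that truncation whenever the truncation itself is in X.

```python
from itertools import combinations, product


def extensions(p, m, n, alphabet):
    """All pictures of size (m, n) having p as top-left portion."""
    r, c = len(p), len(p[0])
    free = [(i, j) for i in range(m) for j in range(n) if i >= r or j >= c]
    out = set()
    for fill in product(alphabet, repeat=len(free)):
        grid = [list(p[i]) + [""] * (n - c) if i < r else [""] * n
                for i in range(m)]
        for (i, j), s in zip(free, fill):
            grid[i][j] = s
        out.add(tuple("".join(row) for row in grid))
    return out


def prefix_overlap(p, q):
    rows, cols = min(len(p), len(q)), min(len(p[0]), len(q[0]))
    return all(p[i][:cols] == q[i][:cols] for i in range(rows))


def is_strong_prefix(X):
    return not any(prefix_overlap(p, q) for p, q in combinations(X, 2))


def all_pictures(max_rows, max_cols, alphabet):
    """Every non-empty picture with at most max_rows rows and max_cols columns."""
    for m in range(1, max_rows + 1):
        for n in range(1, max_cols + 1):
            for cells in product(alphabet, repeat=m * n):
                yield tuple("".join(cells[i * n:(i + 1) * n]) for i in range(m))


def is_maximal(X, alphabet):
    """Brute-force maximality test for a finite strong prefix set X."""
    M = max(len(x) for x in X)
    N = max(len(x[0]) for x in X)
    return all(p in X or any(prefix_overlap(p, x) for x in X)
               for p in all_pictures(M, N, alphabet))


def extend_step(X, p, m, n, alphabet):
    """One step of the construction: replace p by all its extensions."""
    return (X - {p}) | extensions(p, m, n, alphabet)


sigma = "ab"
X1 = {("a",), ("b",)}
X2 = extend_step(X1, ("b",), 2, 2, sigma)
X3 = extend_step(X2, ("a",), 1, 2, sigma)
print(len(X3), is_strong_prefix(X3), is_maximal(X3, sigma))
```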

4 Infinite Strong Prefix Codes

In this section we consider the strong prefix codes introduced in [3, 9] and recalled in Sect. 3. We define a construction for infinite maximal strong prefix codes that provides an interesting insight into their structure.

We first observe that, as in the one-dimensional case, any strong prefix code of pictures can be embedded into a maximal one. This result allows us to concentrate our attention on the infinite strong prefix codes that are maximal.

Proposition 9

Any strong prefix code \(X\subseteq \varSigma ^{++}\) is contained in some maximal strong prefix code over \(\varSigma \).

The proof is similar to the corresponding one in the one dimensional case (Proposition 1.5 in [14]). It considers, given a strong prefix code X, a chain of strong prefix codes containing X, ordered by set inclusion, and uses the remark that, in view of Zorn’s lemma, this chain admits a least upper bound. We omit here all the details.

The following is a simple example of an infinite picture language that is a strong prefix code.

Example 10

Let X be the language of square pictures over \(\varSigma =\{a,b\}\) that contain b in all positions apart from the bottom-right corner, where symbol a occurs.

figure d

X is an infinite strong prefix code. Furthermore, X is not maximal strong prefix. Indeed, consider, for example, the picture ; it is easy to see that \(X \cup \{ p\}\) is still strong prefix.

Note that X can be viewed as a generalization to 2D of the well-known infinite code of strings \(S=\{ b^na \mid n \ge 0 \}\).

The following example provides a maximal strong prefix code.

Example 11

The language \(X_{\infty }\) contains all square pictures p over \(\varSigma =\{ a, b\}\) such that, if p has size (n, n), its prefix of size \((n-1,n-1)\) contains only b’s, while there is at least one a in the bottom row or in the rightmost column. Then,

figure e

The language \(X_{\infty }\) is an infinite maximal strong prefix code. It is immediate to see that it is strong prefix. Indeed, by definition, no picture in \(X_{\infty }\) is prefix of another picture in \(X_{\infty }\) and this implies, since they are all square pictures, that no pair of pictures in \(X_{\infty }\) can prefix-overlap.

To prove the maximality, consider a picture \(p\in \varSigma ^{++}\setminus X_{\infty }\). If \(p(1,1)=a\), then \(a \in X_{\infty }\) is a prefix of p. Assume therefore that \(p(1,1)=b\). Two cases arise: either \(p \in \{ b\}^{++}\) or not. In the first case p is a prefix of an infinite number of pictures of \(X_{\infty }\). In the second case, let \(b^{k,k}\) be the prefix of p with maximal k, and \(k< |p|_{row}, |p|_{col}\). Then there exists a picture \(q\in X_{\infty }\) of size \((k+1,k+1)\), that is a prefix of p. In all cases \(X_{\infty }\cup \{ p\}\) is not strong prefix.

Note that the language \(X_{\infty }\) of the previous example contains the language X of Example 10 (as already noted, X is not maximal strong prefix).

The language \(X_{\infty }\) can be viewed inside a more general family that is obtained by means of iterated extensions; the definition takes as starting point the construction of finite maximal strong prefix codes recalled in Proposition 8.

We use the notion of extension of a picture (see Definition 7) to define infinite languages that will turn out to be maximal strong prefix codes. We first give an informal description. The idea is to construct a language X as an infinite union of sets \(X_k\). We start from the initial set \(Y_0=\varSigma \) of all pictures of size (1, 1). Then we partition \(Y_0=X_1 \cup A_0\) where \(X_1\) is added to X, while the pictures in \(A_0\) will be extended to get a set of pictures of bigger size. Let \(Y_1\) be the union of all possible extensions of pictures \(p\in A_0\) to a size (m(p), n(p)) that depends on p. Again we partition \(Y_1=X_2 \cup A_1\) and again we add \(X_2\) to X and take all pictures in \(A_1\) for new extensions to produce the set \(Y_2\). And so on. A further condition ensures that whenever a picture \(p\in Y_k\) is not chosen to belong to \(X_{k+1}\) (i.e. p stays in \(A_k\) to be extended and put in \(Y_{k+1}\)), then in some future step one of its extensions will certainly be added to some \(X_i\). This condition will be crucial in the proof of maximality of Proposition 15. The formal definition follows.

Definition 12

Let \(\varSigma \) be a finite alphabet. A language \(X\subseteq \varSigma ^{++}\) is generated by iterated extensions on \(\varSigma \) if \(X=\cup _{k\ge 1} X_k\) where, for any \(k\ge 0\),

  (1) \(Y_0=\varSigma \);

  (2) \(A_k \subseteq Y_{k}\), \(X_{k+1}=Y_k \setminus A_{k}\);

  (3) \(Y_{k+1}=\bigcup _{p \in A_k}E_{(m(p),n(p))}( p )\), for some \((m(p),n(p)) > size(p)\);

  (4) for any \(p\in A_k\), there exist \(h>k\) and some extension q of p, with \(q\in X_h\).

The family of all languages generated by iterated extensions on \(\varSigma \) will be denoted by \(\mathcal {I}(\varSigma )\), or simply \(\mathcal {I}\), when no ambiguity is possible.
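The first levels produced by conditions (1) to (3) can be computed mechanically once the rule selecting \(A_k\) and the target sizes are fixed. The following Python sketch is a hedged illustration: pictures are assumed to be tuples of equal-length strings, and the names `iterated_extensions`, `keep_extending` and `new_size` are hypothetical parameters introduced here. Condition (4) concerns all future steps and cannot be verified by a finite computation, so the sketch does not check it. The usage example applies the rule of the language \(X_{\infty }\) of Example 11: only the all-b square is extended, each time by one row and one column.

```python
from itertools import product


def extensions(p, m, n, alphabet):
    """All pictures of size (m, n) having p as top-left portion."""
    r, c = len(p), len(p[0])
    free = [(i, j) for i in range(m) for j in range(n) if i >= r or j >= c]
    out = set()
    for fill in product(alphabet, repeat=len(free)):
        grid = [list(p[i]) + [""] * (n - c) if i < r else [""] * n
                for i in range(m)]
        for (i, j), s in zip(free, fill):
            grid[i][j] = s
        out.add(tuple("".join(row) for row in grid))
    return out


def iterated_extensions(alphabet, keep_extending, new_size, steps):
    """Return [X_1, ..., X_steps] built according to conditions (1)-(3).

    keep_extending(p) decides whether p goes into A_k;
    new_size(p) returns the size (m(p), n(p)) to which p is extended.
    """
    Y = {(s,) for s in alphabet}          # Y_0 = Sigma
    levels = []
    for _ in range(steps):
        A = {p for p in Y if keep_extending(p)}
        levels.append(Y - A)              # X_{k+1} = Y_k \ A_k
        Y = set()
        for p in A:                       # Y_{k+1} = union of extensions
            m, n = new_size(p)
            Y |= extensions(p, m, n, alphabet)
    return levels


def all_b(p):
    return all(symbol == "b" for row in p for symbol in row)


def one_more_row_and_column(p):
    return len(p) + 1, len(p[0]) + 1


levels = iterated_extensions("ab", all_b, one_more_row_and_column, 4)
print([len(X) for X in levels])
```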

Example 13

The language \(X_{\infty }\) introduced in Example 11 is in \(\mathcal I\). In fact, \(X_{\infty }=\cup _{k\ge 1} X_k\), where \(Y_0=\varSigma \), \(A_0=\{ b\}\), and for any \(k\ge 1\), \(X_k=Y_{k-1}\setminus A_{k-1}\), with \(A_{k-1}=\{ p_k\}\) where \(p_k\) is the picture of size (k, k) composed of all b’s, and \(Y_{k-1}=E_{(k,k)}( p_{k-1} )\).

Many different and involved languages can be defined by using Definition 12. One only has to fix the rule to “extract” the set \(X_{k+1}\) from \(Y_k\) and the criterion to choose the size of the extensions of the pictures in \(A_k\). Consider, as an example, the following language.

Example 14

Use iterated extensions on \(Y_0 =\{a,b\}\) and take \(X_1=\{a\}\) (\(A_0=\{ b\}\)) and \(Y_1=E_{(2,2)}( b )\). For any \(k\ge 1\), put in \(X_{k+1}\) those pictures of \(Y_k\) that have the first column equal to the last one. The remaining pictures \(p\in Y_k\) (actually pictures of the set \(A_k\)) are extended in two different ways. If the last row of p contains an even number of a, add a row to p; if it contains an odd number of a, add a column. This will generate the next set \(Y_{k+1}\) containing pictures of many different sizes. Here below, we calculate some of the pictures.

Note that in Definition 12 if, for some \(k\ge 0\), \(A_k=\emptyset \), then \(Y_{k+1}=\emptyset \), hence \(X_{h}=\emptyset \) for every \(h\ge k+2\), and the language X is finite. This is the only case in which X can be finite. Otherwise, if \(A_k\ne \emptyset \) for every \(k\ge 0\), then condition (4) in Definition 12 guarantees that the language is infinite. Moreover, we will see in the next proposition that condition (4) is also crucial in proving the maximality of the obtained language.

On the other hand, observe that for some \(k\ge 0\), it can hold that \(A_k=Y_k\), that is \(X_{k+1}=\emptyset \) (without forcing the finiteness of the language).

The next proposition shows that any language generated by iterated extensions is a maximal strong prefix code.

Proposition 15

Any set \(X\in \mathcal{I}(\varSigma )\) is a maximal strong prefix code over \(\varSigma \).

Proof

Let \(X\in \mathcal{I}(\varSigma )\). First of all, let us show by induction that, for any \(h\ge 1\), \(\big ( \bigcup _{i=1 \ldots h} X_i \big ) \cup A_{h-1}\) is a finite maximal strong prefix code. In the base case, \(h=1\), we have \(X_1 \cup A_{0}= Y_0=\varSigma \) and this is a maximal strong prefix code. Inductively, suppose that the set \(Z= \bigcup _{i=1 \ldots h-1} X_i \cup A_{h-2}\) is a finite maximal strong prefix code. Note that the set \( \bigcup _{i=1 \ldots h} X_i \cup A_{h-1}\) can be obtained from Z by replacing each \(p \in A_{h-2} \subseteq Z\) with the set of all its extensions to some bigger size. Hence, it is a finite maximal strong prefix code (see the characterization in Proposition 8).

To show that X is a strong prefix code consider two pictures \(p, q \in X\) and suppose \(p \in X_h\), \(q \in X_k\) and \(h \ge k\). Then \(p, q \in \bigcup _{i=1 \ldots h} X_i\) and, since \(\bigcup _{i=1 \ldots h} X_i \cup A_{h-1}\) is a strong prefix code, p and q cannot prefix-overlap.

Now, let us show that X is a maximal strong prefix code. Suppose by contradiction that there exists a picture \(p\in \varSigma ^{++}\setminus X\) such that \(X \cup \{ p\}\) is strong prefix. Let \(size(p)=(m,n)\) and set \(K= \max \{k \mid \forall x \in \bigcup _{i=1 \ldots k} X_i,\ |x|_{row} \le m \text{ and } |x|_{col} \le n\}\). Consider the set \(T = \bigcup _{i=1 \ldots K} X_i \cup A_{K-1}\). We have \(p \notin \bigcup _{i=1 \ldots K} X_i\) (since \(p \notin X\)) and \(p \notin A_{K-1}\) (since, if \(p \in A_{K-1}\), then, by condition (4), there exists an extension of p in X, against \(X \cup \{ p\}\) strong prefix). Therefore, \(p \notin T\). Let us show that \(T \cup \{ p\}\) is a strong prefix code, against the maximality of T. Note that p cannot prefix-overlap a picture in \(\bigcup _{i=1 \ldots K} X_i\), since \(X \cup \{ p\}\) is strong prefix. Furthermore, p cannot strictly prefix-overlap a picture in \(A_{K-1}\). Indeed, any \(q \in A_{K-1}\) has a size less than size(p); hence, p and q could prefix-overlap only if q is a prefix of p. This would imply that there exist some \(K'>K\) and some \(p' \in X_{K'} \subseteq X\), \(p'\) extension of q, such that \(p'\) and p prefix-overlap, against \(X \cup \{ p\}\) strong prefix. We can conclude that \(T \cup \{ p\}\) is a strong prefix code, against the maximality of T.    \(\square \)

We now show the converse of Proposition 15, i.e. that any maximal infinite strong prefix code can be obtained by iterated extensions.

Proposition 16

If X is a maximal strong prefix code over \(\varSigma \) then \(X\in \mathcal{I}(\varSigma )\).

Proof

(Sketch). Let \(Y_0=\varSigma \), \(X_1=X\cap \varSigma \) and \(A_0=Y_0\setminus X_1\). The proof is sketched only in the case \(\varSigma =\{a,b\}\) and \(X_1=\{ b\}\); a similar proof can be used in the other cases. Let us show how to construct the sets \(X_2\), \(X_3\), \(\ldots \), and so on.

Let \(r_a=\min \{ |p|_{row} \mid p \in X \text{ and } a \unlhd p\}\) and \(c_a=\min \{ |p|_{col} \mid p \in X \text{ and } a \unlhd p\}\). Clearly \((r_a, c_a) \ne (1,1)\). Set \(Y_1=E_{(r_a,c_a)}(a)\) and \(X_2=X \cap Y_1\); then \(A_1=Y_1\setminus X_2\). Note that it could be \(X_2=\emptyset \). Observe that the pictures in \(X \setminus (X_1 \cup X_2)\) must be the extensions of some pictures in \(A_1\). Indeed, they cannot have a size smaller than the elements in \(A_1\) (for the choice of \(r_a\) and \(c_a\)); moreover, \(A_1\) contains all pictures in \(\varSigma ^{r_a,c_a}\), except those pictures that are in \(X_1 \cup X_2\) (whose extensions cannot be in X, since it is strong prefix). Subsequently, for any \(t_1 \in A_1\), at least one extension of \(t_1\) is in X, otherwise the set \(X \cup \{t_1\}\) would be strong prefix, against the maximality of X.

For any \(q \in A_1\), let \(r_q=\min \{ |p|_{row} \mid p \in X \text{ and } q \unlhd p\}\) and \(c_q=\min \{ |p|_{col} \mid p \in X \text{ and } q \unlhd p\}\). Clearly, \(r_q > |q|_{row}\) or \(c_q> |q|_{col}\). Set \(Y_2= \bigcup _{q \in A_1} E_{(r_q,c_q)}(q)\), \(X_3=X \cap Y_2\) and \(A_2=Y_2\setminus X_3\). Again, for any \(t_2 \in A_2\), at least one extension of \(t_2\) is in X, otherwise the set \(X \cup \{t_2\}\) would be strong prefix. Iterating this scheme, one obtains all the subsequent \(X_k\) such that \(X=\cup _{k\ge 1} X_k\).    \(\square \)

The results in the two previous propositions can be summarized in the following theorem which gives a characterization of maximal strong prefix codes of pictures. It holds both for finite and infinite codes.

Theorem 17

Let \(X \subseteq \varSigma ^{++}\). X is a maximal strong prefix code over \(\varSigma \) if and only if \(X\in \mathcal{I}(\varSigma )\).

5 Measure of Two-Dimensional Languages and Codes

Some important results on codes of strings deal with the notion of measure (cf. [14]). A probability is assigned to each symbol of the alphabet and the probability of a string is the product of the probabilities of its letters. Then, the measure of a language is simply the sum of the probabilities of its strings. A major result states that the measure of a string code is always less than or equal to 1, whereas a thin string code is maximal if and only if its measure is 1. Roughly speaking, a set of strings is not a code if there are “too many too short strings”. In this section, we consider the measure of infinite strong prefix codes of pictures as introduced in [5].

Definition 18

Let \(\varSigma \) be an alphabet and \(\pi \) be a probability distribution on \(\varSigma \). The probability of a picture \(p\in \varSigma ^{++}\) with \(size(p)=(m,n)\) is defined as \(\pi (p)= \prod _{1\le i\le m,\ 1\le j\le n} \pi (p(i,j)).\) The measure of a language \(X\subseteq \varSigma ^{++}\) relative to \(\pi \) is \(\mu _{\pi }(X) = \sum _{p\in X} \pi (p).\)

Particular interest is devoted to the uniform distribution, which associates to every symbol a in the alphabet \(\varSigma \) of cardinality k, the probability \(\pi _u(a)=\frac{1}{k}\). Then, the uniform probability of a picture \(p\in \varSigma ^{++}\) is \(\pi _u(p)=\frac{1}{k^{area(p)}}\). The uniform measure of a language \(X\subseteq \varSigma ^{++}\), is \(\mu _u(X)=\sum _{p\in X}^{} \pi _u(p)\).
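For finite languages, the measure can be computed exactly. The sketch below is an illustration under stated assumptions: pictures are tuples of equal-length strings, a distribution is a dictionary from symbols to `Fraction` values, and the names `probability` and `measure` are hypothetical. The first language is obtained from \(\varSigma =\{a,b\}\) by replacing b with its extensions to size (2, 2); the second one collects the three smallest pictures of Example 10.

```python
from fractions import Fraction
from itertools import product


def probability(p, pi):
    """Product of the probabilities of all symbols occurring in picture p."""
    result = Fraction(1)
    for row in p:
        for symbol in row:
            result *= pi[symbol]
    return result


def measure(X, pi):
    """Measure of a finite language X relative to the distribution pi."""
    return sum((probability(p, pi) for p in X), Fraction(0))


uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
skewed = {"a": Fraction(1, 3), "b": Fraction(2, 3)}

# {a} together with all pictures of size (2, 2) whose top-left symbol is b.
X = {("a",)} | {("b" + x, y + z) for x, y, z in product("ab", repeat=3)}
print(measure(X, uniform), measure(X, skewed))

# Squares filled with b except for an a in the bottom-right corner.
S = {("a",), ("bb", "ba"), ("bbb", "bbb", "bba")}
print(measure(S, uniform))
```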

Example 19

Let \(\varSigma =\{ a, b\}\) and consider language on \(\varSigma \). Its uniform measure is \(\mu _u(X)=5/8 < 1\). In general for any probability distribution \(\pi (a)=p\), \(\pi (b)=1-p\), \(0<p<1\), then \(\mu _\pi (X)=p^3-p+1 < 1\). Note that X is a code.

A main result in [5] shows that for any finite strong prefix code \(X\subseteq \varSigma ^{++}\) and measure \(\mu \), we have that \(\mu (X) \le 1\). Moreover, \(\mu (X)= 1\) if and only if X is a finite maximal strong prefix code. We show that without the finiteness hypothesis the scenario is different. Consistently with the intuitive relation between code and measure, we first prove the following result.

Theorem 20

Let \(X\subseteq \varSigma ^{++}\) be a maximal strong prefix code and \(\mu \) be a measure. Then \(\mu (X) \le 1\).

Proof

By Theorem 17, and following the notation of Definition 12, X is the union of some languages \(X_i\), for \(i\ge 1\). Since the languages \(X_i\)’s are pairwise disjoint, taking \(s_n=\sum _{i=1}^n \mu (X_i)\), we can write \(\mu (X)=\lim _{n\rightarrow \infty } s_n\). Consider now, for any \(n\ge 1\), the sets \(Z_n= \bigcup _{i=1 \ldots n} X_i \cup A_{n-1}\). For any \(n\ge 1\), \(Z_n\) is a finite maximal strong prefix code (as shown in the proof of Proposition 15) and therefore \(\mu (Z_n)=1\). Hence, \(s_n\le \mu (Z_n)=1\). Finally, \(\mu (X)=\lim s_n\le 1\).    \(\square \)

The measure of infinite maximal strong prefix codes does not behave as the measure of the finite ones. To show this, we propose another example.

Example 21

Consider the language \(Z_{\infty }\) over \(\varSigma =\{ a, b\}\) that contains the size (1, 1) picture with a and all square pictures p that have symbol b in the top-left position and in all positions of the bottom row and of the rightmost column. Moreover all square prefixes of p should have at least one a in their bottom row or last column.

figure f

The language \(Z_{\infty }\) can be obtained following Definition 12: \(Z_{\infty } =\cup _{i\ge 1} X_i\), where \(Y_0=\varSigma \), \(A_0=\{ b\}\), \(X_i=Y_{i-1}\setminus A_{i-1}\) for any \(i\ge 1\) and, for any \(i\ge 2\), \(Y_{i-1}=\bigcup _{p \in A_{i-2}}E_{(i,i)}( p )\) and \(A_{i-1}=\{ p \in Y_{i-1} \mid p \text{ has at least one } a \text{ in the last row or column}\}\). Then, by Theorem 17, \(Z_{\infty }\) is a maximal strong prefix code.

Proposition 22

There exist maximal strong prefix codes whose measure is strictly less than 1.

Proof

Consider the language \(Z_{\infty }\) together with the languages \(X_i\), \(Y_{i-1}\), \(A_{i-1}\) resulting from the associated iterated extensions as defined in Example 21. Let us calculate the uniform measure of \(Z_{\infty }\). Since the languages \(X_i\) are pairwise disjoint, \(\mu (Z_{\infty })= \sum _{i\ge 1} \mu (X_i)\). We have:

  • \(\mu (X_1)= 1/2\),

  • \(\mu (X_2)= 1/ 2^4\), and

  • \(\mu (X_i)= \frac{(2^3 -1) (2^5 -1) \cdots (2^{2(i-1)-1}-1) }{2^{1+3+5+\cdots + (2i-1)}}\), for any \(i\ge 3\).

Recall that \(X_i\subseteq \varSigma ^{i,i}\) and \(1+3+5+\cdots + (2i-1) = i^2\). Then, for any \(i\ge 3\),

\(\mu (X_i)\le \frac{2^3 2^5 \cdots 2^{2(i-1)-1} }{2^{1+3+5+\cdots + (2i-1)}} = \frac{1}{2^{1+ (2i-1)} } = \frac{1}{2^{2i} }= \frac{1}{4^i} \).

Hence, \(\mu ( Z_{\infty }) \le 1/2 + 1/2^4 + \sum _{i=3}^{\infty } \frac{1}{4^i} = 1/2 + \sum _{i=2}^{\infty } (\frac{1}{4})^i = 1/2 + \sum _{i=0}^{\infty } (\frac{1}{4})^i - 1- 1/4= 4/3 - 3/4= 7/12\). This shows that \(\mu (Z_{\infty }) < 1\).    \(\square \)
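The bound can also be checked numerically with exact rational arithmetic. The following sketch simply evaluates the closed formulas for \(\mu (X_i)\) given in the proof; the function name `mu_level` is hypothetical. It compares each term with \(1/4^i\) for \(i\ge 3\) and compares a partial sum with the bound 7/12. Since the neglected tail is itself bounded by \(\sum _{i>20} 1/4^i\), the partial sum is already very close to \(\mu (Z_{\infty })\).

```python
from fractions import Fraction


def mu_level(i):
    """Uniform measure of X_i for the language Z_infinity, per the proof."""
    if i == 1:
        return Fraction(1, 2)
    numerator = 1
    # Product (2^3 - 1)(2^5 - 1)...(2^(2(i-1)-1) - 1); empty when i == 2.
    for j in range(2, i):
        numerator *= 2 ** (2 * j - 1) - 1
    return Fraction(numerator, 2 ** (i * i))


partial = sum((mu_level(i) for i in range(1, 21)), Fraction(0))
print(float(partial), float(Fraction(7, 12)))
```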

The next proposition characterizes the maximal strong prefix codes that have measure equal to 1, in terms of the measure of the languages involved in their construction by iterated extensions.

Proposition 23

Let \(X \in \mathcal{I}(\varSigma )\) and let \(A_{n}\), for any \(n\ge 0\), be the corresponding languages. The measure of X is equal to 1 if and only if \(\lim _{n\rightarrow \infty } \mu (A_n)=0\).

Proof

Let \(X_i\), \(Y_{i-1}\), \(A_{i-1}\), for any \(i\ge 1\), be the languages involved in the iterated extensions for X as in Definition 12.

Since the languages \(X_i\) are pairwise disjoint, \(\mu (X)=\lim _{n\rightarrow \infty } s_n\), where \(s_n=\sum _{i=1}^n \mu (X_i)\). Observe that, for any \(i\ge 1\), \(\mu (X_i)=\mu (Y_{i-1}) - \mu (A_{i-1})\) and \(\mu (Y_i) = \mu (A_{i-1})\), since \(Y_i\) contains all the extensions of all the pictures in \(A_{i-1}\). Therefore, the sum telescopes: \(s_n= ( \mu (Y_0) - \mu (A_0) ) + ( \mu (Y_1) - \mu (A_1) ) + \cdots + ( \mu (Y_{n-1}) - \mu (A_{n-1}) ) = \mu (Y_0) - \mu (A_{n-1}) =1 - \mu (A_{n-1})\). Finally, \(\mu (X)=\lim _{n\rightarrow \infty } s_n = 1 - \lim _{n\rightarrow \infty } \mu (A_n)\). Hence, \(\mu (X)=1\) if and only if \(\lim _{n\rightarrow \infty } \mu (A_n)=0\).    \(\square \)

As an application of the previous proposition we prove the following.

Proposition 24

There exist maximal strong prefix codes whose measure is exactly 1.

Proof

We consider the language \(X_{\infty }\) as in Example 13 and we show that the uniform measure \(\mu (X_{\infty })=1\). Following the construction by iterated extensions, each set \(A_n\) contains a single picture p of size \((n+1,n+1)\); hence \(\mu (A_n)= 1 / 2^{(n+1)^2}\) and \(\lim _{n\rightarrow \infty }1 / 2^{(n+1)^2}=0\). By applying Proposition 23 we complete the proof.    \(\square \)
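The convergence to 1 can be observed on the partial sums. In the construction of \(X_{\infty }\), the set \(Y_{i-1}\) consists of the \(2^{2i-1}\) extensions of the all-b square of size \((i-1,i-1)\) to size (i, i), and \(X_i\) contains all of them except the all-b one, so that \(\mu (X_i)=(2^{2i-1}-1)/2^{i^2}\) under the uniform measure (for \(i=1\) this gives \(\mu (\{a\})=1/2\)). The sketch below, with hypothetical function names, checks with exact arithmetic that the partial sums equal \(1-1/2^{n^2}\).

```python
from fractions import Fraction


def mu_level(i):
    """Uniform measure of X_i for the language X_infinity."""
    return Fraction(2 ** (2 * i - 1) - 1, 2 ** (i * i))


def partial_sum(n):
    """s_n = mu(X_1) + ... + mu(X_n)."""
    return sum((mu_level(i) for i in range(1, n + 1)), Fraction(0))


for n in range(1, 6):
    # The gap 1 - s_n is the measure of the single all-b square of size (n, n).
    print(n, partial_sum(n), 1 - partial_sum(n))
```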

We conclude the paper by observing that the proofs of Propositions 22 and 24 are based on two languages \(X_{\infty }\) and \(Z_{\infty }\) that have a somehow complementary structure with respect to the definition by iterated extensions. Starting from \(Y_0=\{a,b\}\), for both languages we take \(X_1=\{a\}\) and \(Y_1=E_{(2,2)}( b )\). At each step i we use the same criterion to partition the respective current sets \(Y_i\) (on one side, the only picture with all b’s in the bottom row and in the rightmost column and, on the other side, all the remaining ones). Nevertheless, for \(X_{\infty }\) this single picture is put in the set \(A_i\) to be extended, while for \(Z_{\infty }\) it is the only picture that is kept in the code. The difference in the cardinality of the two sides of the partition accounts for the substantial discrepancy in the calculation of the measure.