Generating, Sampling and Counting Subclasses of Regular Tree Languages

Antonopoulos, Timos; Geerts, Floris; Martens, Wim; Neven, Frank

doi:10.1007/s00224-012-9428-x

Published: 26 October 2012

Volume 52, pages 542–585, (2013)

Theory of Computing Systems

Timos Antonopoulos¹,
Floris Geerts²,
Wim Martens³ &
…
Frank Neven¹


Abstract

To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms to generate a corpus of XSDs uniformly at random, as well as a similarity measure to compare how closely the generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e., taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to the uniform generation of single-type EDTDs.


Notes

http://www.maplesoft.com/support/help/Maple/view.aspx?path=combstruct.
The translation algorithm in Lemma 7 of [20] only claims quadratic time, but it uses a different definition of single-type EDTDs. In the definition there, the set of types in an EDTD is always of the form $\{a^{i}\mid a\in\Sigma, i\in\Delta\}$ and, for each such type, $\mu(a^{i})=a$. For the definition we use in this paper, the translation can be easily adapted to run in linear time.

References

1. Albert, J., Giammarresi, D., Wood, D.: Normal form algorithms for extended context free grammars. Theor. Comput. Sci. 267(1–2), 35–47 (2001)
2. Almeida, M., Moreira, N., Reis, R.: Enumeration and generation with a string automata representation. Theor. Comput. Sci. 387(2), 93–102 (2007)
3. Barbosa, D., Mendelzon, A.O., Keenleyside, J., Lyons, K.A.: ToXgene: a template-based data generator for XML. In: International Symposium on Management of Data (SIGMOD), p. 616 (2002)
4. Bassino, F., David, J., Nicaud, C.: Enumeration and random generation of possibly incomplete deterministic automata. Pure Math. Appl. 19(2–3), 1–16 (2008)
5. Bassino, F., Nicaud, C.: Enumeration and random generation of accessible automata. Theor. Comput. Sci. 381(1–3), 86–104 (2007)
6. Bertoni, A., Goldwurm, M., Sabadini, N.: The complexity of computing the number of strings of given length in context-free languages. Theor. Comput. Sci. 86(2), 325–342 (1991)
7. Bex, G.J., Gelade, W., Martens, W., Neven, F.: Simplifying XML schema: effortless handling of nondeterministic regular expressions. In: International Symposium on Management of Data (SIGMOD), pp. 731–744 (2009)
8. Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning deterministic regular expressions for the inference of schemas from XML data. In: International World Wide Web Conference (WWW), pp. 825–834 (2008)
9. Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning deterministic regular expressions for the inference of schemas from XML data. ACM Trans. Web 4(4) (2010)
10. Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Transactions on Database Systems (2010)
11. Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: International Conference on Very Large Data Bases (VLDB), pp. 998–1009 (2007)
12. Björklund, H., Martens, W.: The tractability frontier for NFA minimization. In: International Colloquium on Automata, Languages and Programming (ICALP), pp. 27–38 (2008)
13. Brüggemann-Klein, A.: Regular expressions into finite automata. In: Latin American Symposium on Theoretical Informatics (LATIN), pp. 87–98 (1992)
14. Brüggemann-Klein, A., Murata, M., Wood, D.: Regular tree and regular hedge languages over unranked alphabets: version 1, April 3. Technical report HKUST-TCSC-2001-0, The Hongkong University of Science and Technology (2001)
15. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 142(2), 182–206 (1998)
16. Cohen, S., Kimelfeld, B., Sagiv, Y.: Incorporating constraints in probabilistic XML. ACM Trans. Database Syst. 34(3), 1–45 (2009)
17. Cohen, S., Kimelfeld, B., Sagiv, Y.: Running tree automata on probabilistic XML. In: International Symposium on Principles of Database Systems (PODS), pp. 227–236 (2009)
18. Flajolet, P., Zimmermann, P., Van Cutsem, B.: A calculus for the random generation of labelled combinatorial structures. Theor. Comput. Sci. 132(2), 1–35 (1994)
19. Gelade, W., Idziaszek, T., Martens, W., Neven, F.: Simplifying XML schema: single-type approximations of regular tree languages. In: International Symposium on Principles of Database Systems (PODS) (2010)
20. Gelade, W., Neven, F.: Succinctness of pattern-based schema languages for XML. J. Comput. Syst. Sci. 77(3), 505–519 (2011)
21. Gore, V., Jerrum, M., Kannan, S., Sweedyk, Z., Mahaney, S.R.: A quasi-polynomial-time algorithm for sampling words from a context-free language. Inf. Comput. 134(1), 59–74 (1997)
22. Héam, P.-C., Nicaud, C., Schmitz, S.: Random generation of deterministic tree (walking) automata. In: International Conference on Implementation and Application of Automata (CIAA), pp. 115–124 (2009)
23. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 3rd edn. Addison-Wesley, Reading (2007)
24. Kannan, S., Sweedyk, Z., Mahaney, S.R.: Counting and random generation of strings in regular languages. In: ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 551–557 (1995)
25. Martens, W., Neven, F., Schwentick, T.: Simple off the shelf abstractions of XML schema. SIGMOD Rec. 36(3), 15–22 (2007)
26. Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)
27. Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)
28. Martens, W., Niehren, J.: On the minimization of XML Schemas and tree automata for unranked trees. J. Comput. Syst. Sci. 73(4), 550–583 (2007)
29. Meyer, A.R., Fischer, M.J.: Economy of description by automata, grammars, and formal systems. In: FOCS, pp. 188–191. IEEE, New York (1971)
30. Murata, M., Lee, D., Mani, M., Kawaguchi, K.: Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Technol. 5(4), 660–704 (2005)
31. Nijenhuis, A., Wilf, H.: Combinatorial Algorithms. Academic Press, San Diego (1979)
32. Seidl, H.: Deciding equivalence of finite tree automata. SIAM J. Comput. 19(3), 424–437 (1990)


Author information

Authors and Affiliations

Hasselt University and Transnational University of Limburg, Hasselt, Belgium
Timos Antonopoulos & Frank Neven
University of Antwerp, Antwerp, Belgium
Floris Geerts
Universität Bayreuth, Bayreuth, Germany
Wim Martens

Corresponding author

Correspondence to Timos Antonopoulos.

Additional information

We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under the FET-Open grant agreement FOX, number FP7-ICT-233599.

Supported by grant number MA 4938/2–1 from the Deutsche Forschungsgemeinschaft (Emmy Noether Nachwuchsgruppe).

Appendix: Proof of Lemma 6.15

We show that the specification given in Fig. 3 is combinatorially isomorphic to the class of strings of length (n+1)⋅ℓ satisfying rules (A1)–(A4) given in Sect. 6.

Lemma 6.15

For $\ell,k,n,j',j,n_0,\ldots,n_{\ell-1}\in\mathbb{N}$, m≥ℓ, and $\overline{W}^{m}$ a valid partition w.r.t. k for m, the class $\mathcal {S}_{m}^{(j',j)}[\overline{W}^{m}]$ defined by the specification in Fig. 3 is combinatorially isomorphic to the class of strings of length (n+1)⋅ℓ satisfying rules (A1)–(A4), with a prefix $s_0\ldots s_{j'\cdot\ell+j}$, where exactly the states [1,m] occur in the string at positions up to position j′⋅ℓ+j, and for each i∈[0,ℓ−1] and q∈[0,j′⋅ℓ+j], it holds that $\{s_q \mid q=i \ (\mathrm{mod}\ \ell)\}=W_i$.

Proof

We first count the number of valid extensions of such a given prefix (Lemma A.1) and then show that this number coincides with the number of objects in the corresponding class (Lemma A.2).

We need the following notation. Let Σ be an alphabet of size ℓ. For ℓ and k as above, and $m,j',j,n,n_0,\ldots,n_{\ell-1}\in\mathbb{N}$, let $N(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})$ be defined inductively as follows:

$$ \begin{array}{l@{\quad }l} N \bigl(m,j'\ell+j,n, n_0,\ldots,n_{\ell-1}\bigr) = 0 & \mbox{if}\ \exists i\in[0,\ell-1]\mbox{ s.t. }n_i>k,\\[3pt] N \bigl(m,j'\ell+j,n, n_0,\ldots,n_{\ell-1}\bigr) = 0& \mbox{if}\ j'> m,\\[3pt] N \bigl(m,j'\ell+j,n, n_0,\ldots,n_{\ell-1}\bigr) = 0& \mbox{if}\ m> n,\\[3pt] N \bigl(m,j'\ell+j,n, n_0,\ldots,n_{\ell-1}\bigr) = 0& \mbox{if}\ m\neq\sum _{i=0}^{\ell-1} n_i.\\ \end{array} $$

(A.1)

$$ N(n, n\ell+j,n,n_0,\ldots,n_{\ell-1}) = \prod_{i=j+1}^{\ell-1} n_{i}, $$

(A.2)

and for j′≤m−1 and j<ℓ,

$$ N \bigl(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\bigr)=N_1+N_2, $$

(A.3)

and for j′=m and j<ℓ,

$$ N \bigl(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\bigr)=N_3, $$

(A.4)

where:

$$ \begin{array}{rcl} N_1&=&\displaystyle (n_0\cdot\ldots\cdot n_{\ell-1})\cdot N\bigl(m, (j'+1)\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\bigr), \\[6pt] N_2&=&\displaystyle \sum_{i=1}^{\ell} \Biggl(\Biggl(\prod_{i'=1}^{i-1} n_{j+i' \ (\mathrm{mod}\ \ell)}\Biggr)\cdot N \bigl(m+1,j'\ell+j+i,n,n_0,\ldots ,n_{j+i\ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1}\bigr)\Biggr),\\[6pt] N_3&=&\displaystyle \sum_{i=1}^{\ell-j-1} \Biggl(\Biggl(\prod_{i'=1}^{i-1} n_{j+i' \ (\mathrm{mod}\ \ell)}\Biggr)\cdot N \bigl(m+1,j'\ell+j+i,n,n_0,\ldots,n_{j+i\ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1}\bigr)\Biggr). \end{array} $$

Informally, and in a similar manner as above, the number of strings satisfying rules (A1)–(A4) where exactly the states in [1,m] appear at the positions up to and including position j′⋅ℓ+j is equal to the sum of the numbers of possible strings where m+1 appears for the first time in one of the positions between j′⋅ℓ+j+1 and (m+1)⋅ℓ−1. In equation (A.3), $N_1$ covers the case where no new state m+1 is introduced between the positions j′⋅ℓ+j+1 and (j′+1)⋅ℓ+j, and $N_2$ covers the cases where the state m+1 is introduced at some position J between the positions j′⋅ℓ+j+1 and (j′+1)⋅ℓ+j. In the case where j′=m, since $N(m,(j'+1)\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})=0$ by the definition of the base case, which also complies with rule (A2), the equation is different in order to take into account only the remaining positions until position (m+1)⋅ℓ−1. This is reflected in (A.4).
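The recurrence (A.1)–(A.4) can be evaluated directly by memoized recursion. The following is a minimal sketch in Python and is not part of the original article: the names `make_counter`, `N` and `new_state_terms` are hypothetical, the position j′⋅ℓ+j is passed as a single integer `J`, ℓ is taken to be the length of the tuple `counts` = (n_0,…,n_{ℓ−1}), and the base case (A.2) is assumed to take precedence over (A.4) when m=n and j′=n.

```python
from functools import lru_cache
from math import prod


def make_counter(k):
    """Return a memoized evaluator of N(m, J, n, counts) for a fixed bound k."""

    @lru_cache(maxsize=None)
    def N(m, J, n, counts):
        # counts must be a tuple (n_0, ..., n_{ell-1}) so that it is hashable.
        ell = len(counts)
        jp, j = divmod(J, ell)  # J = j' * ell + j with 0 <= j < ell

        # (A.1): the four cases in which the value is zero.
        if any(c > k for c in counts) or jp > m or m > n or m != sum(counts):
            return 0

        # (A.2): base case (assumed here to take precedence over (A.4)).
        if m == n and jp == n:
            return prod(counts[j + 1:])

        def new_state_terms(last):
            # Sum over the offsets i = 1..last at which state m+1 first appears.
            total = 0
            for i in range(1, last + 1):
                # Choices for the i-1 positions that reuse known states.
                reuse = prod(counts[(j + t) % ell] for t in range(1, i))
                idx = (j + i) % ell
                bumped = counts[:idx] + (counts[idx] + 1,) + counts[idx + 1:]
                total += reuse * N(m + 1, J + i, n, bumped)
            return total

        if jp == m:
            return new_state_terms(ell - j - 1)       # (A.4): N_3
        n1 = prod(counts) * N(m, J + ell, n, counts)   # (A.3): N_1
        return n1 + new_state_terms(ell)               # (A.3): N_1 + N_2

    return N
```

Under these assumptions, for ℓ=1, n=2 and k=2 the call `make_counter(2)(1, 0, 2, (1,))` evaluates to 2, while with k=1 the same call evaluates to 0, because the only non-zero branch would require n_0=2>k.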

The next lemma states that the function N(⋅) defined above correctly counts the strings satisfying rules (A1)–(A4).

Lemma A.1

For $\ell,k,m,n,j',j,n_0,\ldots,n_{\ell-1}\in\mathbb{N}$ with 1≤m≤n, the number $N(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})$ is the number of strings of length (n+1)⋅ℓ satisfying rules (A1)–(A4), with a prefix $s_0\ldots s_{j'\cdot\ell+j}$, where exactly the states in [1,m] occur up to the position j′⋅ℓ+j, and for each i∈[0,ℓ−1] and p∈[0,j′⋅ℓ+j], it holds that $|\{s_p \mid p=i \ (\mathrm{mod}\ \ell)\}|=n_i$.

Proof

We proceed by inverse induction on m to show that the statement above holds. For the base case, let m=n. Notice that if m=n+1, then $N(n+1,J,n,n_0,\ldots,n_{\ell-1})=0$ for all values of the other parameters. We show that $N(n,J,n,n_0,\ldots,n_{\ell-1})$ is the number of strings described by the lemma. For J≥(n+1)⋅ℓ, $N(n,J,n,n_0,\ldots,n_{\ell-1})=0$, as required by the rules. We proceed by inverse induction on J≤(n+1)⋅ℓ−1.

If J=n⋅ℓ+j for some j∈[0,ℓ−1], then

$$ N(n,n\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})=\prod_{i=j+1}^{\ell-1} n_{i}, $$

by (A.2), which is the correct number according to the rules (A1)–(A2).

For the inductive case, suppose that for all r>J′ for some J′<n⋅ℓ, $N(n,r,n,n_0,\ldots,n_{\ell-1})$, for all $n_i$ with $n=\sum_{i=0}^{\ell-1}n_{i}$, is the correct number of strings. Consider the number $N(n,J',n,n_0,\ldots,n_{\ell-1})$. If $n_i>k$ for some i∈[0,ℓ−1], then this number is equal to 0, which complies with rule (A3). Suppose then that $n_i\leq k$ for all i∈[0,ℓ−1], and let $J'=J'_{0}\cdot\ell+J'_{1}$, for $J'_{0},J'_{1}\in\mathbb{N}$ and $J'_{0}$ maximal. Notice that, since all states [1,n] occur at a position up to position J′, no state can appear to the right of position J′ that has not already appeared at or to the left of position J′. Therefore, by rules (A3) and (A4), each position to the right of position J′ can only be occupied by states that have already appeared before that position and, furthermore, have appeared at positions associated with the appropriate label. Consider, therefore, the next ℓ positions, starting with $J'_{0}\cdot\ell+J'_{1}+1$. For this position there are $n_{J'_{1}+1 \ (\mathrm{mod}\ \ell)}$ possible values for the string to comply with rules (A1)–(A4). Similarly, for the position $J'_{0}\cdot\ell+J'_{1}+2$ there are $n_{J'_{1}+2 \ (\mathrm{mod}\ \ell)}$ possible values, and so on, until position $J'_{0}\cdot\ell+J'_{1}+\ell=(J'_{0}+1)\cdot\ell+J'_{1}$, for which there are $n_{J'_{1}}$ possible values. By the inductive hypothesis, $N(n,J'_{0}\cdot\ell+J'_{1}+\ell,n,n_{0},\ldots,n_{\ell-1})$ is the number of possible strings with a prefix $s_{0}\ldots s_{J'_{0}\cdot\ell+J'_{1}+\ell}$ that satisfy rules (A1)–(A4) and in which exactly the states [1,n] appear up to position $J'_{0}\cdot\ell+J'_{1}+\ell$. Therefore, $N(n,J'_{0}\cdot\ell+J'_{1},n,n_{0},\ldots,n_{\ell-1})=n_{0}\cdot\ldots \cdot n_{\ell-1}\cdot N(n,(J'_{0}+1)\cdot\ell+J'_{1},n,n_{0},\ldots,n_{\ell-1})$, which is also what $N_1$ is defined to be according to (A.3). Notice that $N_2=0$, according to (A.1), which complies with the fact that no new state can appear to the right of J′.
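Unrolling this step yields a closed form for the case m=n. The following is a sketch derived from (A.2) and (A.3), not a formula stated in this form in the proof; it assumes $n_i\leq k$ for all i, $n=\sum_{i=0}^{\ell-1}n_i$, and $J'_0\leq n$:

$$ N\bigl(n,J'_{0}\cdot\ell+J'_{1},n,n_0,\ldots,n_{\ell-1}\bigr) = (n_0\cdot\ldots\cdot n_{\ell-1})^{\,n-J'_{0}}\cdot\prod_{i=J'_{1}+1}^{\ell-1} n_{i}. $$

Each of the $n-J'_{0}$ applications of (A.3) contributes one factor $n_0\cdot\ldots\cdot n_{\ell-1}$ (with the term for newly introduced states equal to 0), and the remaining product is the base case (A.2) at position $n\cdot\ell+J'_{1}$.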

Suppose then that for some M<n and all m>M, the number $N(m,J,n,n_0,\ldots,n_{\ell-1})$, for all $n_i$ such that $m=\sum_{i=0}^{\ell-1}n_{i}$, is the correct number for all J∈ℕ, and consider the value of $N(M,J',n,n_0,\ldots,n_{\ell-1})$ for the different values of J′∈ℕ. We show that this is the correct number of strings. First, for J′≥(M+1)⋅ℓ, the number $N(M,J',n,n_0,\ldots,n_{\ell-1})$ is equal to 0, which complies with rule (A2). We proceed by inverse induction on J′<(M+1)⋅ℓ. For the base cases, suppose $J'=M\cdot\ell+J'_{1}$, for $J'_{1}\in[0,\ell-1]$. Then, $N(M,M\cdot\ell+J'_{1},n,n_{0},\ldots,n_{\ell-1})$ is determined by (A.4). By rule (A2), the number of strings where exactly the states [1,M] appear at positions up to $M\cdot\ell+J'_{1}$ is equal to the sum of the numbers of strings where state M+1 appears for the first time at one of the next $\ell-J'_{1}-1$ positions, and this is the number given by (A.4).

For the inductive hypothesis, suppose that $N(M,r,n,n_0,\ldots,n_{\ell-1})$, for all $n_i$ such that $M=\sum_{i=0}^{\ell-1}n_{i}$, is the correct number of strings for all r>J′ for some J′<M⋅ℓ. Consider $N(M,J',n,n_0,\ldots,n_{\ell-1})$. If $n_i>k$ for some i∈[0,ℓ−1], then this number is equal to 0, which complies with rule (A3).

Otherwise, the possible strings with n states that satisfy rules (A1)–(A4) and have prefix $s_0\cdots s_{J'}$, where exactly the states [1,M] occur at positions up to position J′, are the following. Either M+1 does not appear in any of the ℓ positions to the right of J′, or it appears in at least one of them. For the first case, the number of possible strings is $n_0\cdot\ldots\cdot n_{\ell-1}\cdot N(M,J'+\ell,n,n_0,\ldots,n_{\ell-1})$, where, by the inductive hypothesis, $N(M,J'+\ell,n,n_0,\ldots,n_{\ell-1})$ is the correct number of the appropriate strings. This is the number given by the term $N_1$ of (A.3). For the second case, let $J'=J_0\cdot\ell+J_1$ and let us consider each of the ℓ positions to the right of position J′ at which the state M+1 can appear for the first time. Suppose that this position is J′+i for i∈[1,ℓ]. Then, the number of possible strings complying with rules (A1)–(A4) is the following. For position J′+1 there are $n_{J_{1}+1 \ (\mathrm{mod}\ \ell)}$ possible values, for position J′+2 there are $n_{J_{1}+2 \ (\mathrm{mod}\ \ell)}$ possible values, and so on until the position J′+i, which is labeled by M+1. The number of allowed strings with prefix $s_0\cdots s_{J'+i}$, and where the states [1,M+1] appear at positions up to position J′+i, is given by $N(M+1,J'+i,n,n_{0},\ldots,n_{J_{1}+i \ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1})$, by the inductive hypothesis. Considering all possible positions where the new state can appear, we get a sum equal to the term $N_2$ of (A.3). Notice that any string counted in one of the terms of this sum is not counted in any other term of this sum. Therefore, $N(M,J',n,n_0,\ldots,n_{\ell-1})$ is equal to $N_1+N_2$, which is what is described by (A.3). □

We next show that N(⋅) also correctly counts the number of objects in the specification given in Fig. 3.

Lemma A.2

Let $\ell,k,m,n,j',j,n_0,\ldots,n_{\ell-1}\in\mathbb{N}$, ℓ<m. The number of objects in $\mathcal {S}_{m}^{(j',j)}[\overline {W}^{m}]$ is equal to $N(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})$, where for each i∈[0,ℓ−1], $n_i=|W_i|$.

Proof

First, for m>n, $N(m,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})=0$ and there are no objects in $\mathcal {S}_{m}^{(j',j)}[\overline{W}^{m}]$. We proceed by reverse induction on m≤n to show that the statement holds.

Suppose first that m=n. Then for all j′>m=n, $N(n,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})=0$, and $\mathcal {S}_{n}^{(j',j)}[\overline{W}^{n}]$ has no objects. We then show that the statement holds by reverse induction on j′≤m=n.

For the base cases, suppose that j′=n and j∈[0,ℓ−1]. Then $\mathcal{S}_{n}^{(n,j)}[\overline{W}^{n}]:= \prod_{i=j+1}^{\ell-1} \mathcal{W}_{i}$, and $N(n, n\cdot\ell+j,n,n_{0},\ldots,n_{\ell-1}) = \prod _{i=j+1}^{\ell-1} n_{i}$. For each i in [0,ℓ−1], the number of objects in $\mathcal {W}_{i}$ is equal to $n_i$, and hence the statement holds for j′=n.

Next, assume that the statement holds for m=n and j′>J′ for some J′<n, and consider the case where j′=J′ and j∈[0,ℓ−1]. The class $\mathcal{S}_{n}^{(J',j)}[\overline{W}^{n}]$ is given by $Q_1+Q_2$, and $N(n,J'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})$ is given by $N_1+N_2$. Consider first $Q_1$ and $N_1$. The class $Q_1$ is defined as

$$Q_1= \Biggl(\,\prod_{i=j+1}^{\ell+j}\mathcal{W}_{i \ (\mathrm {mod}\ \ell)}\Biggr)\times \mathcal{S}_{n}^{(J'+1,j)} \bigl[\overline{W}^{n}\bigr], $$

and for m=n,

$$N_1=(n_0\cdot\ldots\cdot n_{\ell-1})\cdot N\big (n, \bigl(J'+1\bigr)\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\big). $$

From the inductive hypothesis, we may conclude that the number of objects in $Q_1$ is equal to $N_1$ for m=n, j′=J′ and j∈[0,ℓ−1].
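Spelled out, this step only uses that the number of objects in a product of classes is the product of the numbers of objects in its factors, and that the indices i (mod ℓ) for i∈[j+1,ℓ+j] run through each element of [0,ℓ−1] exactly once. Writing |⋅| for the number of objects in a class (a notation assumed here for illustration), the computation can be sketched as:

$$ |Q_1| = \Biggl(\,\prod_{i=j+1}^{\ell+j}\bigl|\mathcal{W}_{i \ (\mathrm{mod}\ \ell)}\bigr|\Biggr)\cdot \bigl|\mathcal{S}_{n}^{(J'+1,j)}\bigl[\overline{W}^{n}\bigr]\bigr| = (n_0\cdot\ldots\cdot n_{\ell-1})\cdot N\bigl(n,(J'+1)\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\bigr) = N_1, $$

where the second equality applies the inductive hypothesis to $\mathcal{S}_{n}^{(J'+1,j)}[\overline{W}^{n}]$.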

Similarly, for m=n and j′=J′, the class $Q_2$ is empty by the equation

$$Q_2:= \sum_{i=1}^{\ell} \Biggl( \prod_{i'=1}^{i-1} \mathcal{W}_{j+i' \ (\mathrm {mod}\ \ell)}\Biggr)\times \mathcal {Z}_{n+1}\times\mathcal{S}_{n+1}^{(J',j+i)} \bigl[\overline{W}^{n}_{{n+1},j+i \ (\mathrm {mod}\ \ell)}\bigr] $$

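and $N_2$ is defined to be equal to

$$N_2=\sum_{i=1}^{\ell} \Biggl(\Biggl(\prod_{i'=1}^{i-1} n_{j+i' \ (\mathrm{mod}\ \ell)}\Biggr)\cdot N\bigl(n+1,J'\ell+j+i,n,n_0,\ldots,n_{j+i\ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1}\bigr)\Biggr). $$

The class $Q_2$ has no objects, whereas every term $N(n+1,\cdot)$ in this sum is equal to 0 by (A.1). Hence, the number of objects in $Q_1+Q_2$ is equal to $N_1+N_2$ for m=n and j′=J′, and hence the statement holds.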

Suppose next that the statement holds for all m>M for some M<n, and consider the case where m=M. For all j′>M, $N(M,j'\cdot\ell+j,n,n_0,\ldots,n_{\ell-1})=0$, and $\mathcal {S}_{M}^{(j',j)}[\overline{W}^{M}]$ has no objects. We then show that the statement holds by reverse induction on j′≤M.

For the base cases, let j′=M, and j∈[0,ℓ−1]. Then, we have that the class $\mathcal{S}_{M}^{(j',j)}[\overline{W}^{M}]$ is defined by

$$Q_3:= \sum_{i=1}^{\ell-j-1} \Biggl( \prod_{i'=1}^{i-1} \mathcal{W}_{j+i' \ (\mathrm {mod}\ \ell)}\Biggr) \times \mathcal {Z}_{M+1}\times\mathcal{S}_{M+1}^{(j',j+i)} \bigl[\overline{W}^{M}_{{M+1},j+i \ (\mathrm {mod}\ \ell)}\bigr], $$

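whose number of elements is equal to

$$N_3=\sum_{i=1}^{\ell-j-1} \Biggl(\Biggl(\prod_{i'=1}^{i-1} n_{j+i' \ (\mathrm{mod}\ \ell)}\Biggr)\cdot N\bigl(M+1,j'\ell+j+i,n,n_0,\ldots,n_{j+i\ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1}\bigr)\Biggr), $$

by the inductive hypothesis. Finally, assume that the statement holds for all j′>J′ for some J′<M, and consider the case where j′=J′. The class $\mathcal{S}_{M}^{(J',j)}[\overline{W}^{M}]$ is defined by the equation $Q_1+Q_2$. Since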

$$Q_1:= \Biggl(\prod_{i=j+1}^{\ell+j}\mathcal{W}_{i \ (\mathrm {mod}\ \ell)}\Biggr)\times \mathcal{S}_{M}^{(J'+1,j)} \bigl[\overline{W}^{M}\bigr], $$

its number of elements is equal to

$$N_1=\Big(n_0\cdot\ldots\cdot n_{\ell-1}\Big)\cdot N\big (M,(J'+1)\cdot\ell+j,n,n_0,\ldots,n_{\ell-1}\big), $$

by the inductive hypothesis. Similarly, the number of elements in $Q_2$ defined by

$$Q_2=\sum_{i=1}^{\ell} \Biggl( \prod_{i'=1}^{i-1} \mathcal{W}_{j+i' \ (\mathrm {mod}\ \ell)}\Biggr)\times \mathcal {Z}_{M+1}\times\mathcal{S}_{M+1}^{(J',j+i)}[\overline{W}^{M}_{{M+1},j+i \ (\mathrm {mod}\ \ell)}] $$

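is equal to

$$N_2=\sum_{i=1}^{\ell} \Biggl(\Biggl(\prod_{i'=1}^{i-1} n_{j+i' \ (\mathrm{mod}\ \ell)}\Biggr)\cdot N\bigl(M+1,J'\ell+j+i,n,n_0,\ldots,n_{j+i\ (\mathrm{mod}\ \ell)}+1,\ldots,n_{\ell-1}\bigr)\Biggr), $$

by the inductive hypothesis. □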

Since N(⋅) correctly counts both the strings and the corresponding objects in the specification, Lemma 6.15 follows.

About this article

Cite this article

Antonopoulos, T., Geerts, F., Martens, W. et al. Generating, Sampling and Counting Subclasses of Regular Tree Languages. Theory Comput Syst 52, 542–585 (2013). https://doi.org/10.1007/s00224-012-9428-x

Issue Date: April 2013
DOI: https://doi.org/10.1007/s00224-012-9428-x
