my-server
← Wiki

Regular tree grammar

In theoretical computer science and formal language theory, a regular tree grammar is a formal grammar that describes a set of directed trees, or terms. A regular word grammar can be seen as a special kind of regular tree grammar, describing a set of single-path trees.

Definition

A regular tree grammar G is defined by the tuple G = (N, Σ, Z, P), where:

  • N is a finite set of nonterminals,
  • Σ is a ranked alphabet (i.e., an alphabet whose symbols have an associated arity) disjoint from N,
  • Z is the starting nonterminal, with , and
  • P is a finite set of productions of the form A → t, with , and , where T<sub>Σ</sub>(N) is the associated term algebra, i.e. the set of all trees composed from symbols in according to their arities, where nonterminals are considered nullary.

Derivation of trees

The grammar G implicitly defines a set of trees: any tree that can be derived from Z using the rule set P is said to be described by G. This set of trees is known as the language of G. Formally, the relation ⇒<sub>G</sub> on the set T<sub>Σ</sub>(N) is defined as follows:

A tree can be derived in a single step into a tree (in short: t<sub>1</sub> ⇒<sub>G</sub> t<sub>2</sub>), if there is a context S and a production such that:

  • t<sub>1</sub> = S[A], and
  • t<sub>2</sub> = S[t].

Here, a context means a tree with exactly one hole in it; if S is such a context, S[t] denotes the result of filling the tree t into the hole of S.

The tree language generated by G is the language .

Here, T<sub>Σ</sub> denotes the set of all trees composed from symbols of Σ, while ⇒<sub>G*</sub> denotes successive applications of ⇒<sub>G</sub>.

A language generated by some regular tree grammar is called a regular tree language.

Examples

Let G<sub>1</sub> = (N<sub>1</sub>,Σ<sub>1</sub>,Z<sub>1</sub>,P<sub>1</sub>), where

  • N<sub>1</sub> = {Bool, BList } is our set of nonterminals,
  • Σ<sub>1</sub> = { true, false, nil, cons(.,.) } is our ranked alphabet, arities indicated by dummy arguments (i.e. the symbol cons has arity 2),
  • Z<sub>1</sub> = BList is our starting nonterminal, and
  • the set P<sub>1</sub> consists of the following productions:
  • Bool → false
  • Bool → true
  • BList → nil
  • BList → cons(Bool,BList)

An example derivation from the grammar G<sub>1</sub> is

BList ⇒ cons(Bool,BList) ⇒ cons(false,cons(Bool,BList)) ⇒ cons(false,cons(true,nil)).

The image shows the corresponding derivation tree; it is a tree of trees (main picture), whereas a derivation tree in word grammars is a tree of strings (upper left table).

The tree language generated by G<sub>1</sub> is the set of all finite lists of boolean values, that is, L(G<sub>1</sub>) happens to equal T<sub>Σ1</sub>. The grammar G<sub>1</sub> corresponds to the algebraic data type declarations (in the Standard ML programming language):

Every member of L(G<sub>1</sub>) corresponds to a Standard-ML value of type BList.

For another example, let , using the nonterminal set and the alphabet from above, but extending the production set by P<sub>2</sub>, consisting of the following productions:

  • BList → cons(true,BList)
  • BList → cons(false,BList)

The language L(G<sub>2</sub>) is the set of all finite lists of boolean values that contain true at least once. The set L(G<sub>2</sub>) has no datatype counterpart in Standard ML, nor in any other functional language. It is a proper subset of L(G<sub>1</sub>). The above example term happens to be in L(G<sub>2</sub>), too, as the following derivation shows:

BList ⇒ cons(false,BList) ⇒ cons(false,cons(true,BList)) ⇒ cons(false,cons(true,nil)).

Language properties

If L<sub>1</sub>, L<sub>2</sub> both are regular tree languages, then the tree sets , and L<sub>1</sub> \ L<sub>2</sub> are also regular tree languages, and it is decidable whether , and whether L<sub>1</sub> = L<sub>2</sub>.

Alternative characterizations and relation to other formal languages

Applications

Applications of regular tree grammars include:

See also

References

Further reading

  • Regular tree grammars were already described in 1968 by:
  • A book devoted to tree grammars is:
  • Algorithms on regular tree grammars are discussed from an efficiency-oriented view in:
  • Given a mapping from trees to weights, Donald Knuth's generalization of Dijkstra's shortest-path algorithm can be applied to a regular tree grammar to compute for each nonterminal the minimum weight of a derivable tree. Based on this information, it is straightforward to enumerate its language in increasing weight order. In particular, any nonterminal with infinite minimum weight produces the empty language. See:
  • Regular tree automata have been generalized to admit equality tests between sibling nodes in trees. See:
  • Allowing equality tests between deeper nodes leads to undecidability. See: