Ambiguity Detection Of RELAX Grammars
KAWAGUCHI Kohsuke, (email@example.com)
Sun Microsystems, Inc.
May 24, 2001
In this paper, an ambiguity detection algorithm for grammars written in RELAX, a schema language
for XML, is presented. Ambiguous schemata cause troubles to those applications which process document
instances written in them. This algorithm is a sound as well as complete algorithm to determine the
ambiguity of a grammar.
The proposed algorithm is an ambiguity detection algorithm for general hedge regular grammar, with appli-
cation to RELAX  in mind. RELAX is a grammar-based schema language for XML.
Modern schema languages, like RELAX and TREX , are powerful in terms of expressiveness, compared
to older schema languages like DTD. For example, they allow non-deterministic content models, tag name
overloading, more powerful datatyping, and so on . However, at the same time, this expressive power
causes some problems. One of these problems is the ambiguity. Consider the following RELAX grammar:
<ref label="foo" occurs="+" />
<attribute name="id" type="ID" />
<element name="ref" type="IDREF" />
<attribute name="id" type="string" />
<element name="ref" type="string" />
and let us see what happens when the following XML instance is validated by the above grammar.
<foo id="a" ref="b" />
<foo id="a" ref="a" />
<foo id="b" ref="a" />
If the first foo is considered as ID/IDREF pair, then the second foo must be string/string pair, and the
third must be ID/IDREF pair. Another possible interpretation is that all foo are string/string pair. And
other interpretations are also possible.
Intuitively, to enforce ID/IDREF constraints in such a grammar is considerably harder than doing the
same thing for DTD. Several schema languages, including XML Schema , avoids this problem by sticking