The specification of the DATR theory file format is the same as a definition of the syntax of the DATR language. Several BNF formulations have been proposed in the literature; the one below has been used in a number of implementations, including Zdatr 2.0.
The input file to zdatrtok should conform to the following BNF:
<theory> ::= <atomexceptions> <theory> |
<nodeexceptions> <theory> |
<variablelist> <theory> |
<sentence> <theory> |
<sentence>
<atomexceptions>::= # atom <itemlist> .
<nodeexceptions>::= # node <itemlist> .
<itemlist> ::= <item> | <item> <itemlist>
<item> ::= <char_string> | ' <any_string> '
<variablelist>::= #vars <variabledefinition> .
<variabledefinition> ::= <variablename> : <list_of_atoms>
<list_of_atoms>::= <atom> <list_of_atoms> | <atom>
<sentence> ::= <node> : <eqseq> .
<eqseq> ::= <equation> | <equation> <eqseq>
<equation> ::= <lhs> <rhs>
<lhs> ::= < <atomseq> >
<rhs> ::= <extrhs> | <defrhs>
<extrhs> ::= = <atomval>
<atomval> ::= <atomseq>
<atomseq> ::= <epsilon> | <atom> <atomseq>
<defrhs> ::= == <descval>
<descval> ::= <descseq>
<descseq> ::= <epsilon> | <desc> <descseq>
<desc> ::= <atom> | <pointer>
<pointer> ::= " <spec> " | <spec>
<spec> ::= <node> : <descpath> | <node> | <descpath>
<descpath> ::= < <descseq> <plop> >
<plop> ::= . | <epsilon>
<node> ::= <upper_case> <char_string>
<atom> ::= <not_upper_case> <char_string> | ' <any_string> '
<variablename> ::= $ <char> <char_string>
<char_string> ::= <epsilon> | <char> <char_string>
<any_string> ::= <epsilon> | <any> <any_string>
<res_char> ::= [!:<>"='.%]
<char> ::= ! [0...256] | [ASCII(33)...ASCII(127)]-[<res_char>]
<any> ::= [ASCII(9)...ASCII(127)]-[']
<upper_case> ::= [A-Z]
<not_upper_case> ::= [<char>] - [<upper_case>]
The category <epsilon> stands for ε, the empty string.
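The lexical rules for <node>, <atom> and <variablename> can be illustrated with a small Python sketch. It is not taken from Zdatr: the function name `classify` and the error handling are assumptions made for illustration, and the sketch ignores `# atom` and `# node` exception declarations, which can override the default classification.

```python
# Reserved characters, copied from the <res_char> rule of the grammar.
RESERVED = set('!:<>"=\'.%')


def classify(token: str) -> str:
    """Classify one token as 'node', 'atom' or 'variable' (hypothetical helper)."""
    if not token:
        raise ValueError("empty token")
    # Quoted atom: ' <any_string> '
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return "atom"
    # Unquoted tokens must not contain reserved characters.
    if any(ch in RESERVED for ch in token):
        raise ValueError(f"reserved character in token: {token!r}")
    # <variablename> ::= $ <char> <char_string>
    if token[0] == "$" and len(token) >= 2:
        return "variable"
    # <node> ::= <upper_case> <char_string>, with <upper_case> ::= [A-Z]
    if "A" <= token[0] <= "Z":
        return "node"
    # Everything else starts with a non-upper-case character and is an atom.
    return "atom"
```

For example, `classify("Verb")` gives `'node'`, `classify("love")` gives `'atom'`, and `classify("$vowel")` gives `'variable'`.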
Comments are allowed: everything between the comment marker '%' and the next linefeed is ignored.
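A minimal sketch of comment removal in Python follows. It rests on two assumptions that the text above does not state explicitly: a '%' inside a quoted atom is treated as an ordinary character, and the linefeed that ends a comment is kept so that line numbers stay unchanged. The name `strip_comments` is hypothetical and does not refer to any Zdatr function.

```python
def strip_comments(text: str) -> str:
    """Remove '%' comments up to (not including) the next linefeed."""
    out = []
    in_quote = False    # inside a '...' quoted atom (assumed to protect '%')
    in_comment = False  # between '%' and the next linefeed
    for ch in text:
        if in_comment:
            if ch == "\n":
                in_comment = False
                out.append(ch)  # keep the linefeed to preserve line numbers
            continue
        if ch == "'":
            in_quote = not in_quote
        elif ch == "%" and not in_quote:
            in_comment = True
            continue
        out.append(ch)
    return "".join(out)
```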
Quoted atoms can contain the following escape sequences:
| Escape sequence | Meaning |
|---|---|
| \xH | character represented by a sequence of hexadecimal digits, thus \xA or \x0A is newline (\n) |
| \nnn | character represented by three octal digits, thus \012 is also newline (\n) |
| \b | backspace |
| \e | escape character |
| \f | form-feed |
| \n | newline |
| \r | carriage return |
| \t | horizontal tab |
| \v | vertical tab |
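The escape sequences listed above can be decoded along the following lines. This is a Python sketch under stated assumptions, not the Zdatr implementation: the name `decode_escapes` is hypothetical, a hexadecimal escape is assumed to consume all hexadecimal digits that follow it (as in C), and a backslash followed by anything not in the table is left unchanged.

```python
import re

# Single-letter escapes from the table; \e is the ASCII escape character (27).
SIMPLE = {"b": "\b", "e": "\x1b", "f": "\f", "n": "\n",
          "r": "\r", "t": "\t", "v": "\v"}

# \x + hex digits (assumed greedy), three octal digits, or a single letter.
ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]+)|([0-7]{3})|([befnrtv]))")


def decode_escapes(body: str) -> str:
    """Decode escape sequences in the body of a quoted atom (quotes removed)."""
    def replace(match: re.Match) -> str:
        hex_digits, octal_digits, letter = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if octal_digits is not None:
            return chr(int(octal_digits, 8))
        return SIMPLE[letter]
    return ESCAPE.sub(replace, body)
```

Because of the greedy assumption, a hexadecimal escape directly followed by a literal hexadecimal digit is ambiguous in this sketch; the three-digit octal form avoids that problem.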
Any number of WHITESPACE characters may appear between the # and each of the three tags (atom, node, vars). Note that a declaration continues up to the next non-quoted dot; dots within quoted symbols are ignored.
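The rule for where a declaration ends can be sketched in Python as follows. The function name `read_declaration` and its return value are assumptions chosen for illustration; the sketch only shows the whitespace handling after # and the search for the first dot outside single quotes.

```python
import re

# '#', any amount of whitespace, then one of the three tags.
DECL_START = re.compile(r"#\s*(atom|node|vars)\b")


def read_declaration(text: str):
    """Return (tag, body, rest) for a declaration at the start of text, else None."""
    match = DECL_START.match(text)
    if match is None:
        return None
    in_quote = False
    for i in range(match.end(), len(text)):
        ch = text[i]
        if ch == "'":
            in_quote = not in_quote
        elif ch == "." and not in_quote:
            # First non-quoted dot: the declaration ends here.
            return match.group(1), text[match.end():i].strip(), text[i + 1:]
    raise ValueError("declaration is not terminated by a dot")
```

With the input `#   atom 'e.g.' Foo . Rest`, the dots inside `'e.g.'` are skipped, so the body is `'e.g.' Foo` and the remaining text is ` Rest`.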
Although the tokenizer recognizes many errors and reports them together with line number, description and context, the properties of the parsing algorithm mean that an error message can be misleading. Even then, the reported line number gives the correct location of the error.
Sample theory and demo files are included in this distribution.
The following changes to the previous BNF have been made: