
Theory file

 

The specification of the DATR theory file is, in effect, a definition of the syntax of the DATR language. Several BNF formulations have been proposed in the literature; the one given here has been used in a number of implementations and is the one used in Zdatr 2.0.

The input file to zdatrtok should conform to the following BNF:

  <theory>      ::= <atomexceptions> <theory> |
                    <nodeexceptions> <theory> |
                    <variablelist> <theory> |
                    <sentence> <theory> |
                    <sentence>

  <atomexceptions> ::= # atom <itemlist> .
  <nodeexceptions> ::= # node <itemlist> .
  <itemlist>    ::= <item> | <item> <itemlist>
  <item>        ::= <char_string> | ' <any_string> '

  <variablelist> ::= #vars <variabledefinition> .
  <variabledefinition> ::= <variablename> : <list_of_atoms>
  <list_of_atoms> ::= <atom> <list_of_atoms> | <atom>
 
  <sentence>     ::=  <node> : <eqseq> .
  <eqseq>        ::=  <equation> | <equation> <eqseq>
  <equation>     ::=  <lhs> <rhs>

  <lhs>          ::=  < <atomseq> >
  <rhs>          ::=  <extrhs> | <defrhs>

  <extrhs>       ::=  = <atomval>
  <atomval>      ::=  <atomseq>
  <atomseq>      ::=  <epsilon> | <atom> <atomseq>

  <defrhs>       ::=  == <descval>
  <descval>      ::=  <descseq>
  <descseq>      ::=  <epsilon> | <desc> <descseq>

  <desc>         ::=  <atom> | <pointer>
  <pointer>      ::=  " <spec> " | <spec>
  <spec>         ::=  <node> : <descpath> | <node> | <descpath>
  <descpath>     ::=  < <descseq> <plop> >
  <plop>         ::=  . | <epsilon>

  <node>         ::=  <upper_case> <char_string>
  <atom>         ::=  <not_upper_case> <char_string> | ' <any_string> '
  <variablename> ::=  $ <char> <char_string>

  <char_string>  ::=  <epsilon> | <char> <char_string>
  <any_string>   ::=  <epsilon> | <any> <any_string>

  <res_char>     ::=  [!:<>"='.%] 
  <char>         ::=  ! [0...256] | [ASCII(33)...ASCII(127)]-[<res_char>]
  <any>          ::=  [ASCII(9)...ASCII(127)]-[']
  <upper_case>   ::=  [A-Z]
  <not_upper_case> ::= [<char>] - [<upper_case>]
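The lexical rules at the end of the grammar can be illustrated with a small Python sketch. It is not part of Zdatr, and the names is_plain_char, decode_word and category are hypothetical. It handles unquoted words only (quoted atoms are always atoms) and rests on two assumptions: '!' makes the following character literal, as the <char> rule suggests, and a word is classified by its first raw character, so under this reading an escaped initial such as !Foo counts as an atom rather than a node. It also assumes that a word of two or more characters starting with $ is read as a <variablename> rather than an <atom>.

```python
# Reserved characters from <res_char>; they cannot appear unescaped in a word.
RESERVED = set('!:<>"=\'.%')


def is_plain_char(c: str) -> bool:
    """True if c is an unescaped <char>: ASCII 33..127 minus <res_char>."""
    return 33 <= ord(c) <= 127 and c not in RESERVED


def decode_word(raw: str) -> str:
    """Resolve '!' escapes in an unquoted word and reject non-<char> input."""
    out, i = [], 0
    while i < len(raw):
        c = raw[i]
        if c == '!':
            if i + 1 == len(raw):
                raise ValueError("dangling '!' escape")
            out.append(raw[i + 1])  # '!' makes the next character literal
            i += 2
        elif is_plain_char(c):
            out.append(c)
            i += 1
        else:
            raise ValueError(f"{c!r} is not a <char>")
    return ''.join(out)


def category(raw: str) -> str:
    """Classify an unquoted word as NODE, VAR or ATOM by its first raw character."""
    if not raw:
        raise ValueError("empty word")
    word = decode_word(raw)
    if 'A' <= raw[0] <= 'Z':
        return "NODE"  # <node> ::= <upper_case> <char_string>
    if raw[0] == '$' and len(word) > 1:
        return "VAR"   # <variablename> ::= $ <char> <char_string> (assumed precedence)
    return "ATOM"      # <atom> ::= <not_upper_case> <char_string>
```

Under this sketch, category("Noun") gives "NODE", category("plural") gives "ATOM", and category("a:b") raises ValueError because the colon is reserved.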

The category <epsilon> stands for ε, the empty string.

Comments are allowed: everything from the comment marker '%' up to the next linefeed is ignored.
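A minimal Python sketch of comment removal follows; strip_comments is a hypothetical name, not part of zdatrtok. It assumes (this is not stated in the specification) that a '%' inside a quoted atom, or escaped as '!%', is literal text rather than a comment marker. The linefeed itself is kept so that line numbers in later error messages are unaffected.

```python
def strip_comments(text: str) -> str:
    """Remove each comment from an unquoted '%' up to the next linefeed.

    Assumptions: '%' inside a quoted atom or escaped as '!%' is literal,
    and the linefeed itself is kept so line numbers stay unchanged.
    """
    out = []
    in_quote = False
    i = 0
    while i < len(text):
        c = text[i]
        if c == '!' and not in_quote and i + 1 < len(text):
            out.append(text[i:i + 2])  # escaped character, copied verbatim
            i += 2
            continue
        if c == "'":
            in_quote = not in_quote
        elif c == '%' and not in_quote:
            newline = text.find('\n', i)
            if newline == -1:
                break          # the comment runs to the end of the text
            i = newline        # resume at the linefeed, which is kept
            continue
        out.append(c)
        i += 1
    return ''.join(out)
```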

Quoted atoms can contain the following escape sequences:

  Escape sequence   Meaning
  \xH               character represented by a sequence of hexadecimal digits;
                    thus \xA or \x0A is newline (\n)
  \nnn              character represented by three octal digits;
                    thus \012 is also newline (\n)
  \b                backspace
  \e                escape character
  \f                form-feed
  \n                newline
  \r                carriage return
  \t                horizontal tab
  \v                vertical tab
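The escape sequences listed above can be sketched as a Python decoder for the body of a quoted atom; decode_escapes is a hypothetical name. Two details are assumptions rather than part of the specification: \x is taken to consume one or two hexadecimal digits (enough for both \xA and \x0A), and an unlisted escape such as \q is left unchanged.

```python
import re

# Single-letter escapes from the escape-sequence table.
SIMPLE = {'b': '\b', 'e': '\x1b', 'f': '\f', 'n': '\n',
          'r': '\r', 't': '\t', 'v': '\v'}

# \x + 1-2 hex digits (assumed), \ + exactly three octal digits, or \ + any char.
ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]{1,2})|([0-7]{3})|(.))", re.DOTALL)


def decode_escapes(body: str) -> str:
    """Decode escape sequences in the body of a quoted atom."""
    def replace(m: re.Match) -> str:
        hex_digits, octal, other = m.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if octal is not None:
            return chr(int(octal, 8))
        # Assumption: an unlisted escape such as \q is kept unchanged.
        return SIMPLE.get(other, m.group(0))
    return ESCAPE.sub(replace, body)
```

The octal alternative is tried before the catch-all, so \012 decodes to a newline while \n alone maps through the single-letter table.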

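Atoms that do not satisfy the standard definition of <atom> (for example, atoms beginning with an upper-case letter) can be declared with the # atom tag. Likewise, nodes that do not satisfy the definition of <node>, such as names beginning with a lower-case letter, can be declared with the # node tag.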

Any number of whitespace characters may appear between the # and each of the three tags (atom, node and vars). Note that a declaration continues up to the next unquoted dot; dots within quoted symbols do not end it.
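The declaration rules can be sketched in Python as follows. parse_declaration and exception_sets are hypothetical names; the sketch assumes the declaration starts at a '#', that items are separated by whitespace, and that '!' makes the next character literal (so x!.y is a single item). Quoted items may contain dots, and the first unquoted dot ends the declaration. For # vars declarations the items are simply returned raw.

```python
from typing import List, Set, Tuple


def parse_declaration(text: str, pos: int = 0) -> Tuple[str, List[str], int]:
    """Parse '# tag item ... .' starting at text[pos], which must be '#'.

    Returns (tag, items, index just past the terminating dot).
    """
    if text[pos:pos + 1] != '#':
        raise ValueError("a declaration must start with '#'")
    items: List[str] = []
    buf: List[str] = []

    def flush() -> None:
        if buf:
            items.append(''.join(buf))
            buf.clear()

    i = pos + 1
    while i < len(text):
        c = text[i]
        if c == "'":                  # quoted item: dots inside do not count
            end = text.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quoted item")
            flush()
            items.append(text[i + 1:end])
            i = end + 1
        elif c == '!' and i + 1 < len(text):
            buf.append(text[i + 1])   # '!' makes the next character literal
            i += 2
        elif c == '.':                # first unquoted dot ends the declaration
            flush()
            if not items:
                raise ValueError("declaration has no tag")
            return items[0], items[1:], i + 1
        elif c.isspace():             # whitespace separates the tag and items
            flush()
            i += 1
        else:
            buf.append(c)
            i += 1
    raise ValueError("declaration not terminated by an unquoted '.'")


def exception_sets(decls: List[str]) -> Tuple[Set[str], Set[str]]:
    """Collect items from '# atom' and '# node' declarations."""
    atoms: Set[str] = set()
    nodes: Set[str] = set()
    for decl in decls:
        tag, items, _ = parse_declaration(decl)
        if tag == 'atom':
            atoms.update(items)
        elif tag == 'node':
            nodes.update(items)
    return atoms, nodes
```

For example, parsing "#   atom Foo 'a.b' x!.y . rest" yields the tag "atom" and the items Foo, a.b and x.y, and stops just before " rest".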

Although the tokenizer recognizes many errors and reports each one with its line number, a description and the surrounding context, the properties of the parsing algorithm mean that some error descriptions may be misleading. The reported line number nevertheless gives the correct location of the error.

Sample theory and demo files are included in this distribution.

The following changes have been made to the previous BNF:

  1. Optional parentheses enclosing RHS sequences have been removed; parentheses are no longer reserved characters.
  2. The dot ".", representing the path lop or cut (path extension blocking) operator, is included.
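To show where the lop operator sits in the grammar, here is a small recursive-descent recognizer for the <descpath> rule in Python. It is a sketch, not Zdatr's parser: it takes a list of already-separated tokens (a hypothetical input format), treats any token beginning with A-Z as a <node> and any other non-punctuation token as an <atom>, and records whether a path ends in the lop ".". Path, Pointer, DescParser and parse_descpath are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Path:
    descs: list   # elements are atoms (str) or Pointer objects
    lop: bool     # True if the path ends with the '.' lop operator


@dataclass
class Pointer:
    node: Optional[str]    # None for a bare <descpath>
    path: Optional[Path]   # None for a bare <node>
    quoted: bool           # True for the " <spec> " form


def is_node(tok: str) -> bool:
    return 'A' <= tok[:1] <= 'Z'


class DescParser:
    """Recursive-descent recognizer for <descpath> over a list of tokens."""

    def __init__(self, tokens: List[str]) -> None:
        self.toks = tokens
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def descpath(self) -> Path:
        # <descpath> ::= < <descseq> <plop> >   with   <plop> ::= . | <epsilon>
        self.expect('<')
        descs = self.descseq()
        lop = self.peek() == '.'
        if lop:
            self.i += 1
        self.expect('>')
        return Path(descs, lop)

    def descseq(self) -> list:
        # <descseq> ::= <epsilon> | <desc> <descseq>
        descs = []
        while self.peek() not in (None, '>', '.'):
            descs.append(self.desc())
        return descs

    def desc(self) -> Union[str, Pointer]:
        # <desc> ::= <atom> | <pointer>;  <pointer> ::= " <spec> " | <spec>
        tok = self.peek()
        if tok == '"':
            self.i += 1
            ptr = self.spec(quoted=True)
            self.expect('"')
            return ptr
        if tok == '<' or is_node(tok):
            return self.spec(quoted=False)
        if tok in (':', '=', '=='):
            raise SyntaxError(f"unexpected {tok!r}")
        self.i += 1
        return tok  # a plain <atom>

    def spec(self, quoted: bool) -> Pointer:
        # <spec> ::= <node> : <descpath> | <node> | <descpath>
        if self.peek() == '<':
            return Pointer(None, self.descpath(), quoted)
        node = self.peek()
        if node is None or not is_node(node):
            raise SyntaxError(f"expected a node, got {node!r}")
        self.i += 1
        if self.peek() == ':':
            self.i += 1
            return Pointer(node, self.descpath(), quoted)
        return Pointer(node, None, quoted)


def parse_descpath(tokens: List[str]) -> Path:
    parser = DescParser(tokens)
    path = parser.descpath()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens after the path")
    return path
```

For example, the token list < a b . > yields Path(descs=['a', 'b'], lop=True), while < a b > yields the same path with lop=False.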


© Dafydd Gibbon Sun Sep 13 17:17:46 MET DST 1998