next up previous
Next: LISP implementation of a Up: 13.11.2001: Grammatik: prozedurale Aspekte Previous: Processing languages and grammars

Parsing as search - search parameters (advanced topic)

Recognition and parsing can be understood as a search process, in which a space of states of the parsing algorithm is traversed until a solution is found. There are various strategies for traversing the grammar and the input string:

Top-down vs. bottom-up

A top-down parser starts with the input signal and expands it (and all non-terminal symbols in the strings resulting from this) by replacing the non-terminal symbols with the right hand side of a rule whose left hand side matches the non-terminal symbol until a string of terminal symbols is derived which matches the input string.

A bottom-up parser starts with the input string and reduces substrings which match the right hand side of grammar rules to the left hand side symbol, until the start symbol is reached.

A left corner parser is a particular kind of bottom up parser in which the current input symbol is matched with rules in which the leftmost element on the right hand side and the rest of the symbols in the rule are used in top-down fashion.

Directional (left-right vs. right-left) vs. bidirectional (island, head-first)

A directional parser is one in which the parse procedure starts with the leftmost (left-right) or rightmost (right-left) symbol in the input string. A bidirectional parser is one in which any symbol may be taken first and right or left neighbouring symbols may be analysed subsequently.

Deterministic vs. non-deterministic

A deterministic parser is one in which the grammar rule can be taken at any one time is uniquely determined. A non-deterministic parser is one in which there may be a choice of grammar rules which may be taken at any one time (i.e. there is structural ambiguity), and there is no criterion in the input string for selecting which should be taken.

Stack (depth-first; backtracking; recursive descent) vs. queue (breadth-first)

In a non-deterministic parser, an extrinsic strategy is required for deciding in which order to try optional rules. In a stack-based parser, one option at a time is tried and completely analysed; if it fails, backtracking (usually to the nearest alternative) occurs, and further tries are made with the remaining options. In a queue parser, options are placed at the end of a queue while the current rule is exhaustively handled, when all options at the same depth of analysis are tried together before further analysis at deeper levels takes place.

List vs. table (chart, well-formed substring table)

Stack and queue parsers use lists (either stacks or queues) as their basic data structures. However, both strategies have drawbacks. Stack parsers with backtracking destroy any structures they have built up when an option fails; they may need to re-build the same structures later on when another option works. This makes them extremely - in fact exponentially - time-consuming in the worst case (i.e. the length of the input string n is the exponent of some constant which depends on the grammar, tex2html_wrap_inline945).

Queue parsers build up structures for all options at the same time, they may duplicate the same structures for different options. This makes them extremely - in fact exponentially - space-consuming (memory consuming) in the worst case, again dependent on the length of the input string).

In a chart parser (well-formed substring table parser), structures once built are saved and recovered when the relevant substring is re-traversed. This results in enormous time-saving in relation to the stack parser and space-saving in relation to the queue parser. In fact, the complexity is not exponential, but in the worst case cubic, i.e. dependent on tex2html_wrap_inline947, which is a much smaller quantity.

Lookahead (LRk; accessibility tables

Two types of lookahead are used to make parsers more efficient:

LRk:
Parsers may have a lookahead of k positions in the input string.
Accessibility (reachability) table:
A table may be pre-compiled from the grammar in which it is known which terminal symbols may be reached from which nononterminal symbols (or vice versa for bottom-up parsers).

There is a well-known optimised grammar format for context-free grammars, Greibach Normal Form (GNF), in which in all rules the first symbol on the right hand side is a terminal symbol, i.e. effectively the reachability table is implicit in the grammar. For any context-free grammar, a GNF grammar may be compiled.

There is another well-known optimised grammar format for context-free grammars, Chomsky Normal Form (CNF), in which in all rules the right side may contain only either a maximum of 2 non-terminal symbols or just one terminal symbol.


next up previous
Next: LISP implementation of a Up: 13.11.2001: Grammatik: prozedurale Aspekte Previous: Processing languages and grammars

Dafydd Gibbon, Wed Feb 12 10:50:41 MET 2003 Automatically generated, links may change - update every session.