Recognition and parsing can be understood as a search process, in which a space of states of the parsing algorithm is traversed until a solution is found. There are various strategies for traversing the grammar and the input string:
A top-down parser starts with the input signal and expands it (and all non-terminal symbols in the strings resulting from this) by replacing the non-terminal symbols with the right hand side of a rule whose left hand side matches the non-terminal symbol until a string of terminal symbols is derived which matches the input string.
A bottom-up parser starts with the input string and reduces substrings which match the right hand side of grammar rules to the left hand side symbol, until the start symbol is reached.
A left corner parser is a particular kind of bottom up parser in which the current input symbol is matched with rules in which the leftmost element on the right hand side and the rest of the symbols in the rule are used in top-down fashion.
A directional parser is one in which the parse procedure starts with the leftmost (left-right) or rightmost (right-left) symbol in the input string. A bidirectional parser is one in which any symbol may be taken first and right or left neighbouring symbols may be analysed subsequently.
A deterministic parser is one in which the grammar rule can be taken at any one time is uniquely determined. A non-deterministic parser is one in which there may be a choice of grammar rules which may be taken at any one time (i.e. there is structural ambiguity), and there is no criterion in the input string for selecting which should be taken.
In a non-deterministic parser, an extrinsic strategy is required for deciding in which order to try optional rules. In a stack-based parser, one option at a time is tried and completely analysed; if it fails, backtracking (usually to the nearest alternative) occurs, and further tries are made with the remaining options. In a queue parser, options are placed at the end of a queue while the current rule is exhaustively handled, when all options at the same depth of analysis are tried together before further analysis at deeper levels takes place.
Stack and queue parsers use lists (either stacks or queues) as their basic
data structures.
However, both strategies have drawbacks.
Stack parsers with backtracking destroy any structures they have built up
when an option fails; they may need to re-build the same structures later
on when another option works. This makes them extremely
- in fact exponentially - time-consuming in the worst case
(i.e. the length of the input string n is the exponent of
some constant which depends on the grammar,
).
Queue parsers build up structures for all options at the same time, they may duplicate the same structures for different options. This makes them extremely - in fact exponentially - space-consuming (memory consuming) in the worst case, again dependent on the length of the input string).
In a chart parser (well-formed substring table parser), structures once built
are saved and recovered when the relevant substring is re-traversed. This
results in enormous time-saving in relation to the stack parser and
space-saving in relation to the queue parser. In fact, the complexity is
not exponential, but in the worst case cubic, i.e. dependent on
,
which is a much smaller quantity.
Two types of lookahead are used to make parsers more efficient:
There is a well-known optimised grammar format for context-free grammars, Greibach Normal Form (GNF), in which in all rules the first symbol on the right hand side is a terminal symbol, i.e. effectively the reachability table is implicit in the grammar. For any context-free grammar, a GNF grammar may be compiled.
There is another well-known optimised grammar format for context-free grammars, Chomsky Normal Form (CNF), in which in all rules the right side may contain only either a maximum of 2 non-terminal symbols or just one terminal symbol.