Grammar Classification

This is the Chomsky formal grammar classification in CamelCase

  • RecursivelyEnumerableLanguage
  • RecursiveLanguage
  • ContextSensitiveLanguage
  • MildlyContextSensitiveLanguage
  • ContextFreeLanguage
  • RegularLanguage

Recursive descent parser

Left to right

Calls a recursive function with a slice of the input and a symbol, trying all productions, until a terminal symbol is reached. At that point, the input is compared with the terminal symbol. If there is a match, the previous recursion receives a positive result. If the first call with the initial symbol receives a valid result, the input is accepted and a parse tree is returned.

Weighted

This is a small improvement over the left to right algorithm; in each production the higher score symbol is parsed first. With this optimization an invalid response will be processed faster, although the algorithm is generally slower because it has to consider multiple hypothesis simultaneously.

Because the first element processed is not the first from the left, an arbitrary number of tokens might be consumed later.

This algorithm stores every possible combination and it deletes unsuitable alternatives.

The reason this algorithm is better in the general case is because it delays the call to subgrammars and passes smaller slices of the input, reducing the total amount of computation

LR/SLR

Lr0 is a top up parser. From an algorithmic point of view, top up parsers are generally more complex than bottom down, but in pydsl the gap is bigger.

Lexer and parser

The first necessary distinction is the lexer. A lexer usually uses a regular grammar recognizer that parses an input into tokens. The list of available tokens is predefined.

True recursivity

What makes the implementation harder is that in pydsl the lexers can call subgrammars. There are no substantial changes in the parser.

See also:

http://nedbatchelder.com/text/python-parsers.html