I'm working on a new feature to compare information according to an arbitrary grammar/alphabet.


Alphabet diff is similar to standard unix diff: using Longest Common Substring after calling the Lexer/Scanner. To some extend, this is an encoding aware diff.


BNF Grammars generate a parse tree during the parsing process. Besides that, some functions generates an AST from the Parse tree. These trees can be compared using a tree diff algorithm like Zhang-Shasha

However, the general case is that there are other ways of parsing a string that doesn't generate any parse tree. The solution I have so far is to use per grammar diff implementations, but mostly for partially defined grammars. Partially defined grammars are grammars which contains some unknown symbols. E.g.:

      list -> open [ unk (unk )*] close

unk symbol means that the symbol is unknown for the grammar. This way of generating diffs is grammar specific and at a the same time doesn't require to create a diff algorithm for an infinite number of grammars

general diff branch
general diff issue
unknown grammar branch