I’m working on a new feature to compare information according to an arbitrary grammar/alphabet.
Alphabet diff is similar to standard unix diff: using Longest Common Substring after calling the Lexer/Scanner. To some extend, this is an encoding aware diff.
BNF Grammars generate a parse tree during the parsing process. Besides that, some functions generates an AST from the Parse tree. These trees can be compared using a tree diff algorithm like Zhang-Shasha.
However, the general case is that there are other ways of parsing a string that doesn’t generate any parse tree. The solution I have so far is to use per grammar diff implementations, but mostly for partially defined grammars. Partially defined grammars are grammars which contains some unknown symbols. E.g.:
list -> open [ unk ,unk )*] close
unk symbol means that the symbol is unknown for the grammar. This way of generating diffs is grammar specific and at a the same time doesn’t require to create a diff algorithm for an infinite number of grammars general diff branch general diff issue unknown grammar branch