Like many other IT startups, we use MongoDB at iwoca. MongoDB is a database from the NoSQL family. It stores documents in BSON format (a slightly enhanced binary version of JSON) without enforcing any structure. It works very well with dynamic languages like Python: BSON objects translate trivially into dictionaries and vice versa. But for us MongoDB is hardly a silver bullet. The lack of constraints means that, over time, different programs read and write the same documents without following any shared rules. When two programs disagree about what a document means, one of them can leave it in a state that is “wrong” for the other.

Stricter access interfaces to the database can help mitigate this problem. Another option is to reduce the number of programs that can modify the database. A third option is to check the documents externally, and pydsl can help with that.

Grammar definition

MongoDB offers a query language that can double as a grammar definition. With a bit of effort, pydsl can use this language to describe what a valid document looks like:

{"id":{"$type":"cstring"},
"amount":{"$type":"integer"},
"status":{"$or":["status1","status2","status3"]}}

Checking documents

pydsl includes a checker class for grammar definitions written in the MongoDB query language. It tests one document against a given spec.
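To make the idea concrete, here is a minimal pure-Python sketch of what checking a document against such a spec could look like. It is not pydsl's implementation: the `check` and `check_value` functions and the `TYPE_NAMES` mapping are hypothetical, and the reading of `$type` and `$or` is an assumption based on the example spec above.

```python
# Hypothetical mapping from the type names used in the spec to Python types.
TYPE_NAMES = {"cstring": str, "integer": int}


def check_value(rule, value):
    """Check one value against one rule from the spec."""
    if isinstance(rule, dict) and "$type" in rule:
        expected = TYPE_NAMES[rule["$type"]]
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, expected) and not isinstance(value, bool)
    if isinstance(rule, dict) and "$or" in rule:
        # Assumption: the value must match at least one listed alternative.
        return any(check_value(alt, value) for alt in rule["$or"])
    # Anything else is treated as a literal that must match exactly.
    return value == rule


def check(spec, document):
    """Return True if every field named in the spec is present and valid."""
    for field, rule in spec.items():
        if field not in document:
            return False
        if not check_value(rule, document[field]):
            return False
    return True


spec = {
    "id": {"$type": "cstring"},
    "amount": {"$type": "integer"},
    "status": {"$or": ["status1", "status2", "status3"]},
}

good = {"id": "a1", "amount": 10, "status": "status2"}
bad = {"id": "a1", "amount": "10", "status": "status2"}  # amount is a string

print(check(spec, good))
print(check(spec, bad))
```

In this sketch, fields that the spec does not mention are ignored, which matches how a query only constrains the fields it names.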

Guessing documents

Given a set of specs, the guesser returns a list of all the specs that validate the document. pydsl provides both a Guesser class and a binary.
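The guessing step can be sketched in a few lines of plain Python. This only illustrates the idea and is not pydsl's Guesser: the `guess` function and the two toy checkers are hypothetical, and each checker is assumed to be a callable that returns True when it accepts a document.

```python
def guess(document, checkers):
    """Return the names of all checkers that accept the document."""
    return [name for name, accepts in checkers.items() if accepts(document)]


# Two toy checkers standing in for real grammar definitions.
checkers = {
    "payment": lambda doc: isinstance(doc.get("amount"), int),
    "customer": lambda doc: isinstance(doc.get("name"), str),
}

print(guess({"amount": 5}, checkers))
print(guess({"amount": 5, "name": "Ann"}, checkers))
print(guess({}, checkers))
```

An empty result flags a document that no known spec accepts, and more than one match suggests that the specs overlap.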

### A practical example: check all the elements of a collection

1. Create a directory and store all your grammar definitions there

definitions/def1.py:

spec = {"id": {"$type": "integer"},
        "info": {"$type": "cstring"},
        "valid_until": {"$type": "cstring"},
        "valid_from": {"$type": "cstring"},
        "response": {"$type": "cstring"}}
iclass = "MongoDict"

2. Instantiate the library with the definitions

from pydsl.Memory.Storage.Directory.Grammar import GrammarDirStorage
library = GrammarDirStorage("definitions/")

3. Load all your bson objects

import bson  # ships with pymongo

# path is the location of your BSON dump, without the ".bson" extension
with open(path + ".bson", "rb") as dump:
    documents = bson.decode_all(dump.read())

4. Load and call the guesser class

from pydsl.Guess import Guesser
guess = Guesser([library])
for x in documents:
    # all the specs in the library that validate this document
    validgrammars = guess(x)

## Update on 11/9/2012

Following up on the MongoDB verification post: I recently discovered JSON Schema and a Python implementation written by Julian Berman. It is more verbose than the Mongo query syntax, but it is also more powerful. I have implemented a pydsl Checker using Julian Berman’s library. Hopefully a few extra features will be added soon. JSON Schema is a very interesting idea… I’m wondering if JSON could do a good job as an AST representation.
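For comparison, here is a rough sketch of how the id/amount/status definition from the grammar section might look as a JSON Schema. The mapping from "cstring" and "integer" to JSON Schema types is my assumption, and the `is_valid` helper is hypothetical; it calls the third-party jsonschema package directly and does not show the pydsl Checker.

```python
schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "amount": {"type": "integer"},
        "status": {"enum": ["status1", "status2", "status3"]},
    },
    # Unlike "properties", "required" is what forces the fields to be present.
    "required": ["id", "amount", "status"],
}


def is_valid(document):
    """Validate a document with the jsonschema package (pip install jsonschema)."""
    import jsonschema  # imported lazily so the schema can be read without it

    try:
        jsonschema.validate(instance=document, schema=schema)
    except jsonschema.ValidationError:
        return False
    return True
```

The extra verbosity is visible here: types, allowed values and required fields are each declared separately, but that separation is also what allows optional fields and nested objects to be expressed.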