Python pickle context
In this post I explain a limitation I ran into while working with pickle, how dill addresses part of it, and my thoughts on the shortcut of relying on libraries to serialise objects into a binary representation.
In Python, when serialising an instance of a custom class, pickle
requires the definition of the class to be available at load time.
Loading a pickle from code where the definition is not available fails. For example:
## saving
import pickle

class A:
    pass

with open('/tmp/test_pickle.pickle', 'wb') as f:
    pickle.dump(A(), f)

## loading (different file)
import pickle

with open('/tmp/test_pickle.pickle', 'rb') as f:
    obj = pickle.load(f)
Running the loading script raises this error:
Traceback (most recent call last):
File "...", line X, in <module>
obj = pickle.load(f)
AttributeError: Can't get attribute 'A' on <module '__main__' from '.../load_pickle.py'>
This happens because the original definition of A is not available to the loader: pickle stores a reference to the class (its module and name), not the class itself.
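To make the mechanism concrete, here is a minimal sketch that reproduces the failure inside a single process. The module name `pickle_demo_module` is made up for the illustration, and registering it by hand in `sys.modules` is only a stand-in for "a file that defines A"; it is not how real code would normally be organised.

```python
import pickle
import sys
import types

# Hypothetical module, created by hand so the example fits in one file.
demo = types.ModuleType("pickle_demo_module")
sys.modules["pickle_demo_module"] = demo


class A:
    pass


# Make pickle treat A as if it were defined in the demo module.
A.__module__ = "pickle_demo_module"
A.__qualname__ = "A"
demo.A = A

payload = pickle.dumps(A())

# The payload names the module and the class; it does not contain the class body.
print(b"pickle_demo_module" in payload)

# Simulate a loader that does not have the definition.
del demo.A
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)

# Once the definition is back, the same bytes load without problems.
demo.A = A
restored = pickle.loads(payload)
print(isinstance(restored, A))
```

The practical fix that follows from this is to keep the class in a module that both the saving and the loading code can import, instead of defining it in the script that happens to be `__main__`.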
Back in 2023 I was exploring mlem, a (now defunct) library to facilitate model deployments. It used dill to overcome this limitation by:
serializing and de-serializing Python objects to the majority of the built-in Python types
dill handles the serialisation example in [1] fine. However, when
serialising with dill instead of pickle:
import dill as pickle
and slightly changing the class definition provided in [1]:
VAR = 1

class A:
    @property
    def prop(self):
        global VAR
        return VAR
loading fails with the following error:
Traceback (most recent call last):
File "/home/nesaro/load_pickle.py", line 3, in <module>
obj = pickle.load(f)
AttributeError: Can't get attribute 'A' on <module '__main__' from '/home/nesaro/load_pickle.py'>
This is caused by limits on how much of the surrounding state dill's
default recursion collects.
In
https://stackoverflow.com/questions/53342955/serialize-a-python-method-with-global-variables-by-dill
dill's author says that the recurse flag can fix it.
Libraries like dill, and the various pickle methods that let
pickle load any class
(https://docs.python.org/3.8/library/pickle.html#pickle-inst),
are shortcuts to avoid writing code that deals with the lifetime of the data in the program. They are particularly useful for models, because the models themselves are generally blobs that are hard to reason about, so it makes sense to use something automated.
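As a small illustration of the pickle methods mentioned above, the sketch below uses `__getstate__` and `__setstate__` (both described in the linked section of the pickle documentation) to decide which attributes get saved. The `Counter` class and its attributes are hypothetical and exist only for this example.

```python
import pickle
import threading


class Counter:
    """Hypothetical class with one value worth saving and one that is not."""

    def __init__(self, count=0):
        self.count = count
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop what should not outlive the process.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then rebuild the transient ones.
        self.__dict__.update(state)
        self._lock = threading.Lock()


original = Counter(count=3)
restored = pickle.loads(pickle.dumps(original))
print(restored.count)
```

Note that these hooks only control the instance state. The class itself still has to be importable when loading, so they do not remove the limitation described at the start of the post.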
The trade-off is that these libraries are limited in scope and in
the types of objects they support, hence the need for libraries like
dill, which are only a partial fix.
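For contrast, this is roughly what "writing the code yourself" can look like for a small object. It is only a sketch under my own assumptions: the `Settings` class, the JSON layout, and the `SCHEMA_VERSION` tag are invented for the example, and the approach fits plain data much better than opaque model blobs.

```python
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = 1  # hypothetical version tag for the stored format


@dataclass
class Settings:
    name: str
    threshold: float


def dump_settings(settings):
    # Store plain data plus a version, not a reference to the class.
    return json.dumps({"version": SCHEMA_VERSION, "data": asdict(settings)})


def load_settings(text):
    payload = json.loads(text)
    # The loader decides explicitly what to do with formats it does not know.
    if payload["version"] != SCHEMA_VERSION:
        raise ValueError(f"unsupported version: {payload['version']}")
    return Settings(**payload["data"])


text = dump_settings(Settings(name="demo", threshold=0.5))
restored = load_settings(text)
print(restored)
```

The price is that every new field or type needs code, which is exactly the work that pickle and dill let you skip.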
- tags:#python