In this post I am going to explain a limitation I found while working with pickle, introduce dill, and share my thoughts on using libraries as a shortcut for serialising objects into a binary representation.

In Python, when serialising an instance of a custom class, pickle requires the definition of the class to be available when the data is loaded. In the following example, an object is saved from one script and loaded from a different one where the class is not defined:

## saving
class A: pass
import pickle
with open('/tmp/test_pickle.pickle', 'wb') as f:
    pickle.dump(A(), f)
## loading (different file)
import pickle
with open('/tmp/test_pickle.pickle', 'rb') as f:
    obj = pickle.load(f)

Loading fails with this error:

Traceback (most recent call last):
  File "...", line X, in <module>
    obj = pickle.load(f)
AttributeError: Can't get attribute 'A' on <module '__main__' from '.../load_pickle.py'>

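This is caused by the original definition of A not being available to the loader: pickle does not store the class itself, only a reference to it (its module and name), which it looks up again at load time.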
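A minimal sketch of this by-reference behaviour follows. To avoid depending on how the script is run, it assumes a throwaway module with the hypothetical name `demo_models`, created on the fly and registered in `sys.modules`; deleting `A` from it simulates a loader where the definition is missing.

```python
import pickle
import sys
import types

# Hypothetical module created on the fly; it stands in for the saving script.
mod = types.ModuleType("demo_models")
exec("class A:\n    pass\n", mod.__dict__)
sys.modules["demo_models"] = mod

payload = pickle.dumps(mod.A())

# The pickle refers to the class by module and name, not by its body.
print(b"demo_models" in payload)  # True

# Simulate a loader where the class definition is not available.
del mod.A
load_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)  # AttributeError
```

This mirrors the traceback above: the module the pickle points to has no attribute `A` when the data is loaded.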

Back in 2023, while exploring mlem (a now-defunct library to facilitate model deployments), I saw that it used dill to overcome this limitation. dill extends pickle for:

serializing and de-serializing Python objects to the majority of the built-in Python types

dill handles the first example fine. However, if we serialise with dill instead of pickle:

import dill as pickle

and slightly change the class definition from the first example:

VAR = 1  # module-level global read by the property
class A:
    @property
    def prop(self):
        global VAR
        return VAR

loading it fails again, this time after saving with dill:

Traceback (most recent call last):
  File "/home/nesaro/load_pickle.py", line 3, in <module>
    obj = pickle.load(f)
AttributeError: Can't get attribute 'A' on <module '__main__' from '/home/nesaro/load_pickle.py'>

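This is caused by limits on how much dill collects with its default settings, which do not recursively trace the globals that the pickled functions refer to. In https://stackoverflow.com/questions/53342955/serialize-a-python-method-with-global-variables-by-dill, dill's author says that the `recurse` flag (passed as `dill.dump(obj, f, recurse=True)`, or set globally with `dill.settings['recurse'] = True`) can fix it.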

Libraries like dill, or the various pickle methods that let pickle load any class (https://docs.python.org/3.8/library/pickle.html#pickle-inst), are shortcuts to avoid writing code to deal with the lifetime of the data in the program. They are particularly useful for models, because the models themselves are generally opaque blobs that are hard to reason about, so it makes sense to use something automated.
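As a sketch of one of the hooks described in that section, `__reduce__` lets a class decide what pickle records. In this minimal example the class `Model` and its fields are hypothetical; it reduces itself to a call to the built-in `dict`, so the loader no longer needs `Model` at all, at the cost of getting a plain dictionary back instead of a `Model`.

```python
import pickle


class Model:
    """Hypothetical stand-in for a trained model."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling the built-in dict()
        # with a plain mapping, so unpickling only needs builtins.dict.
        return (dict, ({"weights": self.weights, "bias": self.bias},))


payload = pickle.dumps(Model([0.5, -1.0], 0.1))

# Drop the class to mimic a loader that does not have the definition.
del Model
restored = pickle.loads(payload)
print(restored)  # {'weights': [0.5, -1.0], 'bias': 0.1}
```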

The trade-off is that these libraries are limited in scope and in the types of objects they support, hence the need for libraries like dill, which are themselves only a partial fix.
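For contrast, the alternative to these shortcuts is writing the serialisation code by hand. This is a minimal sketch, not the approach of any particular library: it assumes a hypothetical `Model` holding a list of weights and a bias, and a hypothetical `FORMAT_VERSION` tag. It stores plain data as versioned JSON, so the saved file depends on a documented format rather than on a class definition.

```python
import json
from dataclasses import asdict, dataclass

FORMAT_VERSION = 1  # hypothetical version tag for the on-disk format


@dataclass
class Model:
    """Hypothetical model whose state is a list of weights and a bias."""

    weights: list
    bias: float

    def to_json(self) -> str:
        # Store plain data plus a version, so old files can be migrated later.
        return json.dumps({"version": FORMAT_VERSION, "model": asdict(self)})

    @classmethod
    def from_json(cls, text: str) -> "Model":
        data = json.loads(text)
        if data["version"] != FORMAT_VERSION:
            raise ValueError(f"unsupported format version: {data['version']}")
        return cls(**data["model"])


saved = Model(weights=[0.5, -1.0], bias=0.1).to_json()
print(saved)  # {"version": 1, "model": {"weights": [0.5, -1.0], "bias": 0.1}}
print(Model.from_json(saved))  # Model(weights=[0.5, -1.0], bias=0.1)
```

Any program can read this JSON without importing `Model`, and the version check is where the lifetime of the data is handled explicitly; the price is that this code must be written and maintained by hand for every type.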