Serialization and Persistence
Parameterized objects are declarative, explicitly defining a set of values for their parameters. This set of values constitutes the (parameter) state of the object, and this state can be saved (“serialized”), transmitted (if appropriate), and restored (“deserialized”) in various ways, so that object state can be sent from one Python session to another, restored from disk, configured using a text file, and so on.
Param offers several independent serialization mechanisms for a Parameterized object, each used for very different purposes:
Pickle: creates a Python pickle file containing not just the Parameters, but potentially any other state of the object. A pickle file is not human readable, and is not always portable between different Python versions, but it is highly complete, capturing both parameter values and non-Parameter attributes of an object. Useful for saving the entire state of a complex object and restoring it. All objects used in pickling need to be restorable, which puts some restrictions on Parameter values (e.g. requiring named functions, not lambdas).
JSON: captures the state as a JSON text string. Currently and probably always limited in what can be represented, but human readable and easily exchanged with other languages. Useful for sending over a network connection, saving simple state to disk for restoring later, etc.
script_repr: generates a string representation in the form of Python code that, when executed, will instantiate Parameterized objects having similar state. Useful for capturing the current state in a compact, human-readable form suitable for manual editing to create a Python file. Not all Parameters will have values representable in this way (e.g. functions defined in the current namespace will not show their function definition), but this representation is generally a reasonable human-readable starting point for hand editing.
Pickling Parameterized objects
Param supports Python’s native pickle serialization format. Pickling converts a Python object into a binary stream of bytes that can be stored on disk, and unpickling converts a previously pickled byte stream into an instantiated Python object in the same or a new Python session. Pickling does not capture the actual Python source code or bytecode for functions or classes; instead, it assumes you will have the same Python source tree available for importing those definitions during unpickling and only stores the fully qualified path to those definitions. Thus pickling requires that you use named functions defined in separate importable modules rather than lambdas (unnamed functions) or other objects whose code is defined only in the main namespace or in a non-importable python script.
Apart from such limitations, pickling is the most rich and fully featured serialization option, capable of capturing the full state of an object even beyond its Parameter values. Pickling is also inherently the least portable option, because it does include all the details of this internal state. The resulting .pkl files are not human readable and are not normally usable outside of Python or even across Python versions in some cases. Pickling is thus most useful for “snapshots” (e.g. for checkpoint-and-restore support) for a particular software installation, rather than for exporting, archiving, or configuration. See the comparison with JSON to help understand some of the tradeoffs involved in using pickles.
Using pickling
Let’s look at an example of pickling and unpickling a Parameterized object:
import param, pickle, time
from param.parameterized import default_label_formatter
class A(param.Parameterized):
    n = param.Number(39)
    l = param.List(["a","b"])
    o = param.ClassSelector(class_=param.Parameterized)

    def __init__(self, **params):
        super(A,self).__init__(**params)
        self.timestamp = time.time()

a = A(n=5, l=[1,"e",[2]], o=default_label_formatter.instance())
a, a.timestamp
(A(l=[1, 'e', [2]], n=5, name='A00003', o=default_label_formatter(capitalize=True, name='default_label_formatter00002', overrides={}, replace_underscores=True)),
1719265167.2200813)
Here we created a Parameterized object a containing another Parameterized object nested in parameter o, with state in self.timestamp and not just in the Parameter values. To save this state to a file on disk, we can do a pickle “dump” and then delete the object so that we are sure it’s no longer around:
with open('data.pickle', 'wb') as f:
    pickle.dump(a, f)
del a
To reload the state of a from disk, we do a pickle “load”:
import pickle
with open('data.pickle', 'rb') as f:
    a = pickle.load(f)
a, a.timestamp
(A(l=[1, 'e', [2]], n=5, name='A00003', o=default_label_formatter(capitalize=True, name='default_label_formatter00002', overrides={}, replace_underscores=True)),
1719265167.2200813)
As you can see, it restored not just the Parameter values, but the timestamp (stored in the object’s dictionary) as well.
Here we are depending on the class definition of A actually being in memory. If we delete that definition and try to unpickle the object again, it will fail:
del A
with param.exceptions_summarized():
    with open('data.pickle', 'rb') as f:
        a = pickle.load(f)
AttributeError: Can't get attribute 'A' on <module '__main__'>
Notice how the pickle has stored the fact that class A is defined in the main namespace, but because __main__ is not an importable module, unpickling fails. Had A been defined in a module available for importing, unpickling would have succeeded here even if A had never previously been loaded.
To use pickling in practice, you’ll need to ensure that all functions and classes are named (not lambdas) and defined in some importable module, not just inline here in a notebook or script or command prompt. Even so, pickling can be very useful as a way to save and restore state of complex Parameterized objects.
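The named-versus-lambda restriction can be seen with the standard library alone. The following is a minimal sketch, not Param-specific: the lambda `double` and the choice of `math.sqrt` as the importable function are illustrative assumptions.

```python
import math
import pickle

# A function defined in an importable module is pickled by reference:
# only its qualified name is stored, not its code.
restored = pickle.loads(pickle.dumps(math.sqrt))
named_ok = restored is math.sqrt

# A lambda has no importable name, so pickle has nothing to refer to.
double = lambda x: 2 * x
try:
    pickle.dumps(double)
    lambda_ok = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_ok = False
    print(f"could not pickle lambda: {type(err).__name__}")

print(named_ok, lambda_ok)
```

The same reasoning applies to any callable stored as a Parameter value: if it can be imported by name in the unpickling session, it is likely to be safe.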
Pickling limitations and workarounds
As you develop a module using Param, you’ll need to pay attention to a few technical issues if you want to support pickling:
Callable parameter values: If you provide any param.Callable, param.HookList, or other parameters that can accept callable objects to your users, you will need to warn them that none of those can be set to unnamed (lambda) functions or to one-off functions defined in the main namespace if they want to use pickling. Of course, you can accept such values during initial development when you may not care about pickling, but once things are working, move the one-off function to a proper importable module and then it will be safe to use as a picklable value. One way to make this work smoothly is to create param.ParameterizedFunction objects or other “function object” classes (classes whose instances are callable like functions but which may have state and are fully picklable); see e.g. the numbergen module for examples.
Skipping Parameters that should not be pickled: In some cases, you may not want the value of a given Parameter to be pickled and restored even while other state is being serialized. For instance, a Parameter whose value is set to a particular file path might cause errors if that path is restored when the pickle is loaded on a different system or once the file no longer exists. To cover such rare but potentially important cases, the Parameter can be defined with pickle_default_value=False (normally True), so that the instantaneous value is usable but won’t be saved and restored with pickle.
Customizing setting and getting state: You may find that your Parameter or Parameterized objects have other state that you need to handle specially, whether that’s to save and restore data that isn’t otherwise picklable, or to ignore state that should not be pickled. For instance, if your object’s dictionary contains some object that doesn’t support pickling, then you can add code to omit that or to serialize it in some special way that allows it to be restored, e.g. by extracting a state dictionary from it and then restoring it from the dictionary later. See the pickle docs for the __getstate__ and __setstate__ methods that you can implement on your Parameter or Parameterized objects to override this behavior. Be sure to call super(YourClass,self).__setstate__(state) or the __getstate__ equivalent so that you also store parameters and dictionary values as usual, if desired.
Loading old pickle files: If you use pickles extensively, you may find yourself wanting to support pickle files generated by an older version of your own code, even though your code has since changed (with renamed modules, classes, or parameters, or options that are no longer supported, etc.). By default, unpickling will raise an exception if it finds information in your pickle file that does not match the current Python source code, but it is possible to add custom handling to translate old definitions to match current code, discard no-longer-used options, map from a previous approach into the current approach, etc. You can use __getstate__ and __setstate__ on your top-level object or on specific other classes to do just about anything like this, though it can get complicated to reason about. Best practice is to store the module version number or other suitable identifier as an attribute or Parameter on the top-level object to declare what version of the code was used to create the file, and you can then read this identifier later to determine whether you need to apply such conversions on reloading.
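The state-customization and versioning ideas above can be sketched as follows. To stay dependency-free this uses a plain Python class rather than a Parameterized one; the class `Session`, its attributes, and the version-1 attribute name `history` are all hypothetical. In a Parameterized subclass you would additionally call the superclass `__getstate__`/`__setstate__` as described above.

```python
import pickle

class Session:
    """Hypothetical class with one unpicklable attribute and a version tag."""
    FORMAT_VERSION = 2

    def __init__(self, path):
        self.path = path
        self.log = []                  # assumed to have been called `history` in version 1
        self._handle = lambda: None    # stands in for state that cannot be pickled

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_handle", None)     # omit the unpicklable attribute
        state["format_version"] = self.FORMAT_VERSION
        return state

    def __setstate__(self, state):
        # Pickles written before the version tag existed are treated as version 1.
        version = state.pop("format_version", 1)
        if version < 2 and "history" in state:
            state["log"] = state.pop("history")   # translate the old attribute name
        self.__dict__.update(state)
        self._handle = lambda: None    # rebuild the omitted state

s = Session("data.txt")
s.log.append("opened")
s2 = pickle.loads(pickle.dumps(s))
print(s2.path, s2.log)

# Simulate restoring state saved by the assumed version-1 layout.
old = Session.__new__(Session)
old.__setstate__({"path": "old.txt", "history": ["v1 entry"]})
print(old.log)
```

Storing the version inside the pickled state, rather than inferring it from which attributes happen to be present, keeps each later migration step explicit.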
Serializing with JSON
JSON is a human-readable string representation for nested dictionaries of key-value pairs. Compared to pickle, JSON is a much more limited representation, using a fixed set of types mapped to string values, and not natively supporting Python-specific types like tuples or custom Python objects. However, it is widely accepted across computer languages, and because it is human readable and editable and omits the detailed internal state of objects (unlike pickle), JSON works well as an interchange or configuration format.
Param’s JSON support is currently fairly limited, with support for serializing and deserializing individual (not nested) Parameterized objects. It is currently primarily used for synchronizing state “across the wire”, e.g. between multiple apps running on different machines that communicate changes to shared state (e.g. for a remote GUI), but as proposed in issue #520 it could be extended to be a general configuration and specification mechanism by adding conventions for specifying a Parameterized type for an object and its nested objects.
To see how it currently works, let’s start with a Parameterized object containing Parameters of different types:
import param, datetime, pandas as pd
df = pd.DataFrame({'A':[1,2,3], 'B':[1.1,2.2,3.3]})
simple_list = [1]
class P(param.Parameterized):
    a = param.Integer(default=5, doc='Int', bounds=(2,30), inclusive_bounds=(True, False))
    e = param.List([1,2,3], class_=int)
    g = param.Date(default=datetime.datetime.now())
    l = param.Range(default=(1.1,2.3), bounds=(1,3))
    m = param.String(default='baz', allow_None=True)
    s = param.DataFrame(default=df, columns=(1,4), rows=(2,5))

p = P(a=29)
p
/tmp/ipykernel_3062/2877059514.py:9: ParamDeprecationWarning: The 'class_' attribute on 'List' is deprecated. Use instead 'item_type'
e = param.List([1,2,3], class_=int)
P(a=29, e=[1, 2, 3], g=datetime.datetime(2024, 6, 24, 21, 39, 27, 439336), l=(1.1, 2.3), m='baz', name='P00004', s= A B
0 1 1.1
1 2 2.2
2 3 3.3)
To serialize this Parameterized object to a JSON string, call .param.serialize_parameters() on it:
s = p.param.serialize_parameters()
s
'{"name": "P00004", "a": 29, "e": [1, 2, 3], "g": "2024-06-24T21:39:27.439336", "l": [1.1, 2.3], "m": "baz", "s": [{"A": 1, "B": 1.1}, {"A": 2, "B": 2.2}, {"A": 3, "B": 3.3}]}'
Notice that the serialization includes not just the values set specifically on this instance (a=29), but also all the default values inherited from the class definition.
You can easily select only a subset to serialize, if you wish:
p.param.serialize_parameters(subset=['a','m'])
'{"a": 29, "m": "baz"}'
The JSON string can be saved to disk, sent via a network connection, stored in a database, or used in any other way suitable for a string.
Once you are ready to deserialize the string into a Parameterized object, you’ll need to know the class it came from (here P) and can then call its .param.deserialize_parameters method to get parameter values to use in P’s constructor:
p2 = P(**P.param.deserialize_parameters(s))
p2
P(a=29, e=[1, 2, 3], g=datetime.datetime(2024, 6, 24, 21, 39, 27, 439336), l=(1.1, 2.3), m='baz', name='P00004', s= A B
0 1 1.1
1 2 2.2
2 3 3.3)
As you can see, we have successfully serialized our original object p and deserialized it into a new object p2, which could be in a different Python process on a different machine or at a different date.
JSON limitations and workarounds
To see the limitations on Param’s JSON support, let’s look at how it works in more detail. Because the result of serialization (s above) is a valid JSON string, we can use the json library to unpack it without any knowledge of what Parameterized class it came from:
import json
dj = json.loads(s)
dj
{'name': 'P00004',
'a': 29,
'e': [1, 2, 3],
'g': '2024-06-24T21:39:27.439336',
'l': [1.1, 2.3],
'm': 'baz',
's': [{'A': 1, 'B': 1.1}, {'A': 2, 'B': 2.2}, {'A': 3, 'B': 3.3}]}
The result is a Python dictionary of name:value pairs, some of which you can recognize as the original type (e.g. a=29), others that have changed type (e.g. l=(1.1,2.3), now a list, or s=pd.DataFrame({'A':[1,2,3], 'B':[1.1,2.2,3.3]}), now a list of dicts), and others that are still a string encoding of that type (e.g. g=datetime.datetime(...)). If you try to pass this dictionary to your Parameterized constructor, any such value will be rejected as invalid by the corresponding Parameter:
with param.exceptions_summarized():
    P(**dj)
ValueError: Date parameter 'P.g' only takes datetime and date types, not <class 'str'>.
That’s why instead of simply json.loads(s), we do P.param.deserialize_parameters(s), which uses the knowledge that P.l is a tuple parameter to convert the resulting list [1.1, 2.3] into a Python tuple (1.1, 2.3) as required for such a parameter:
print(dj['l'])
print(p2.l)
[1.1, 2.3]
(1.1, 2.3)
Similarly, parameters of type param.Array will unpack the list representation into a NumPy array, param.DataFrame unpacks the list of dicts into a Pandas DataFrame, etc. So, the encoding for your Parameterized object will always be standard JSON, but to deserialize it fully into a Parameterized, you’ll need to know the class it came from, or Param will not know that the list it finds was originally a tuple, dataframe, etc.
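The underlying loss of type information is a property of JSON itself and can be reproduced without Param. The sketch below uses a hand-written `converters` table as a stand-in for the knowledge that the Parameter declarations provide; it illustrates the idea and is not Param's actual implementation.

```python
import datetime
import json

state = {
    "l": (1.1, 2.3),
    "g": datetime.datetime(2024, 6, 24, 21, 39, 27),
}

# JSON has no tuple or datetime type, so both are flattened on the way out.
encoded = json.dumps({"l": state["l"], "g": state["g"].isoformat()})
decoded = json.loads(encoded)
print(type(decoded["l"]).__name__, type(decoded["g"]).__name__)

# Only per-key type knowledge (here a hypothetical lookup table) can undo that.
converters = {"l": tuple, "g": datetime.datetime.fromisoformat}
restored = {key: converters[key](value) for key, value in decoded.items()}
print(restored == state)
```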
For this reason, any Parameter that itself contains a Parameterized object cannot be JSON deserialized: even if we knew the declared class (e.g. for param.ClassSelector(class_=param.Number)), the value could be an instance of some subclass of that class. Because the class name is not currently stored in the JSON serialization, there is no way to restore it. Thus there is currently no support for JSON serializing or deserializing nested Parameterized objects.
We do expect to add support for nested objects using something like the convention for datetime objects; see issue #520.
JSON Schemas
If you want to use your JSON representation in a separate process where Param is not available or perhaps in a different language altogether, Param can provide a JSON schema that specifies what type you are expecting for each Parameter. The schema for a given Parameterized can be obtained using the schema method:
p.param.schema()
{'name': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
'description': 'String identifier for this object.',
'title': 'Name'},
'a': {'type': 'integer',
'minimum': 2,
'exclusiveMaximum': 30,
'description': 'Int',
'title': 'A'},
'e': {'type': 'array', 'items': {'type': 'integer'}, 'title': 'E'},
'g': {'type': 'string', 'format': 'date-time', 'title': 'G'},
'l': {'type': 'array',
'minItems': 2,
'maxItems': 2,
'additionalItems': {'type': 'number', 'minimum': 1, 'maximum': 3},
'title': 'L'},
'm': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'title': 'M'},
's': {'type': 'array',
'items': {'type': 'object', 'minItems': 1, 'maxItems': 4},
'minItems': 2,
'maxItems': 5,
'title': 'S'}}
Once you have the schema, you can validate that a given JSON string matches the schema, i.e. that all values included therein match the constraints listed in the schema:
from jsonschema import validate
d = json.loads(s)
full_schema = {"type" : "object", "properties" : p.param.schema()}
validate(instance=d, schema=full_schema)
If one of the parameter values fails to match the provided schema, you’ll get an exception:
d2 = d.copy()
d2['a']='astring'
with param.exceptions_summarized():
    validate(instance=d2, schema=full_schema)
ValidationError: 'astring' is not of type 'integer'
Failed validating 'type' in schema['properties']['a']:
{'description': 'Int',
'exclusiveMaximum': 30,
'minimum': 2,
'title': 'A',
'type': 'integer'}
On instance['a']:
'astring'
The .param.schema() call accepts the same subset argument as .param.serialize_parameters(), letting you generate a schema for, and so check, only a subset of the parameters if appropriate.
You can also supply a safe=True argument that checks that all parameter values are guaranteed to be serializable and follow the given schema. This lets you detect if there are any containers or parameters whose type is not fully specified:
with param.exceptions_summarized():
    full2 = {"type" : "object", "properties" : p.param.schema(safe=True)}
    validate(instance=d, schema=full2)
UnsafeserializableException: DataFrame is not guaranteed to be safe for serialization as the column dtypes are unknown
script_repr
Parameterized objects can be constructed through a series of interactive actions, either in a GUI or command line, or as the result of automated scripts and object-construction functions. Any parameter values can also be changed at any moment once that object has been created. If you want to capture the resulting Parameterized object with any such additions and changes, you can use the param.script_repr() function. script_repr returns a representation of that object and all nested Parameterized or other supported objects as Python code that can recreate the full object later. This approach lets you go flexibly from an interactive or indirect way of creating or modifying objects, to being able to recreate that specific object again for later use. Programs with a GUI can use script_repr() as a way of exporting a runnable version of what a user created interactively in the GUI.
For example, let’s construct a Parameterized object p containing Parameters whose values are themselves Parameterized objects with their own Parameters:
import param
class Q(param.Parameterized):
    a = param.Number(39, bounds=(0,50))
    b = param.String("str")

class P(param.Parameterized):
    c = param.ClassSelector(default=Q(), class_=Q)
    d = param.ClassSelector(default=param.Parameterized(), class_=param.Parameterized)
    e = param.Range((0,1))

q = Q(b="new")
p=P(c=q, e=(2,3))
p
P(c=Q(a=39, b='new', name='Q00048'), d=Parameterized(name='Parameterized00051'), e=(2, 3), name='P00049')
We can get a script representation for this object by calling script_repr(p):
print(param.script_repr(p))
import param.parameterized
import __main__
import param
__main__.P(c=__main__.Q(b='new'),
d=param.parameterized.Parameterized(),
e=(2,3))
As you can see, this representation encodes the fact that P was defined in the main namespace, generated inside this notebook. As you might expect, this representation has the same limitation as for pickle: only classes that are in importable modules will be runnable, so you’ll need to save the source code for your classes in a proper Python module if you want the resulting script to be runnable. But once you have done that, you can use script_repr to get a runnable version of your Parameterized object no matter how you created it, whether it was by selecting options in a GUI, adding items via a loop in a script, and so on.
script_repr limitations and workarounds
Apart from making sure your functions and classes are all defined in their own importable modules, there are various considerations and limitations to keep in mind if you want to support using script_repr.
Normally, script_repr prints only parameter values that have changed from their defaults; it is designed to generate a script as close as is practical to one that a user would have typed to create the given object. If you want a record of the complete set of parameter values, including all defaults, you can enable that behavior:
import param.parameterized
param.parameterized.script_repr_suppress_defaults=False
The resulting output is then suitable for archiving the full parameter state of that object, even if some default later gets changed in the source code. Note that Param is not able to detect all cases where a default value is unchanged, e.g. for Parameters with instantiate=True, which will always be treated as changed since each instance has a copy of that Parameter value independent of the original default value.
You can control script_repr with keyword arguments:
imports=[]: If desired, a list of imports that can be built up over multiple script_repr calls to collect a full set of imports required for a script. Useful with show_imports=False except on the last script_repr call. Can be an empty list or a list containing some hard-coded imports needed.
prefix="\n ": Optional prefix to use before a nested object.
qualify=True: Whether the class’s path will be included (e.g. “a.b.C()”), otherwise only the class will appear (“C()”).
unknown_value=None: Determines what to do where a representation cannot be generated for something required to recreate the object. Such things include non-parameter positional and keyword arguments, and certain values of parameters (e.g. some random state objects). Supplying an unknown_value of None causes unrepresentable things to be silently ignored. If unknown_value is a string, that string will appear in place of any unrepresentable things. If unknown_value is False, an Exception will be raised if an unrepresentable value is encountered.
separator="\n": Separator to use between parameters.
show_imports=True: Whether to include import statements in the output.
The script_repr behavior for a particular type, whether it’s a Parameterized object or not, can be overridden to provide any functionality needed. Such overrides are stored in param.parameterized.script_repr_reg, which already contains handling for list and tuple containers, various objects with random state, functions, and modules. See examples in param.parameterized.