Did you ever face a `TypeError: Object of type ndarray is not JSON serializable`, a `TypeError: Object of type int64 is not JSON serializable`, or something along those lines? I certainly did! This happens because you are trying to serialize a numpy datatype using the default `JSONEncoder` included in the Python standard library.
I frequently use pandas and numpy in my projects, and I have to store the results in a NoSQL database. The results are usually nested dictionaries generated dynamically by the algorithm. Initially I was explicitly casting the numpy datatypes to pure Python datatypes before converting to a JSON string. But that quickly became cumbersome, and the code started to look... let's say not so clean. So I started looking for a better solution. I found a Stack Overflow answer [1] and a linked GitHub discussion [2] that suggested using a custom `JSONEncoder` to handle the numpy numeric and array datatypes.
While `np.number` and `np.ndarray` objects raise a `TypeError` when you try to serialize them, `np.nan` and `np.inf` are gracefully serialized as `NaN` and `Infinity` respectively. But according to the JSON specification [3], `NaN` and `Infinity` are not valid JSON values. So if we want a valid JSON string that we can eventually save into a NoSQL database, we need to handle these values as well. Pure Python `float('inf')` and `float('nan')` face the same issue. Let's see how we can handle all these datatypes.
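To make both failure modes concrete, here is a small sketch that reproduces them with the standard library encoder. The helper name `try_dumps` and the sample dictionaries are made up for illustration.

```python
import json

import numpy as np


def try_dumps(value):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# numpy integers and arrays are rejected outright.
int_result = try_dumps({"count": np.int64(3)})
array_result = try_dumps({"values": np.array([1, 2])})

# nan and inf are accepted, but the output is not valid JSON.
special_result = try_dumps({"a": np.nan, "b": np.inf, "c": float("-inf")})

print(int_result)
print(array_result)
print(special_result)  # {"a": NaN, "b": Infinity, "c": -Infinity}
```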
> To use a custom `JSONEncoder` subclass (e.g. one that overrides the `.default()` method to serialize additional types), specify it with the `cls` kwarg; otherwise `JSONEncoder` is used.
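This is from the docstring of the `dumps` function in the `json` module. If we have a custom datatype that is not serializable by the default `JSONEncoder`, we can create a custom encoder class to define how to serialize that datatype. This custom class inherits from `JSONEncoder` and overrides its `default` method. The encoder calls `default` only for objects it does not already know how to serialize; the method should return a serializable object or defer to the base implementation, which raises a `TypeError`.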
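Before getting to numpy, the mechanism can be sketched with a standard library type. The class name `DateEncoder` and the sample data are hypothetical; only `json.JSONEncoder`, its `default` method and the `cls` argument come from the `json` module.

```python
import json
from datetime import date


class DateEncoder(json.JSONEncoder):
    """Illustrative encoder that turns date objects into ISO 8601 strings."""

    def default(self, obj):
        if isinstance(obj, date):
            return obj.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


encoded = json.dumps({"released": date(2024, 1, 15)}, cls=DateEncoder)
print(encoded)  # {"released": "2024-01-15"}
```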
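Here are the numpy datatypes we want the encoder to handle and the Python datatypes they closely resemble. One caveat: `numpy.float64` subclasses Python's `float`, so the default `JSONEncoder` already accepts it; it is listed here for completeness.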
Numpy Datatype | Pure Python Datatype |
---|---|
numpy.int8 | int |
numpy.int16 | int |
numpy.int32 | int |
numpy.int64 | int |
numpy.float16 | float |
numpy.float32 | float |
numpy.float64 | float |
numpy.ndarray | list |
We can create a custom encoder class that converts the numpy datatypes to the corresponding Python datatypes.
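A minimal sketch of such an encoder could look like the following. The class name `NumpyEncoder` and the sample `result` dictionary are my own choices for illustration; the type checks rely on the abstract numpy types `np.integer` and `np.floating`, which cover the sized integer and float types in the table.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Convert numpy scalars and arrays to plain Python values."""

    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


result = {"count": np.int64(3), "mean": np.float32(0.5), "values": np.arange(3)}
encoded = json.dumps(result, cls=NumpyEncoder)
print(encoded)  # {"count": 3, "mean": 0.5, "values": [0, 1, 2]}
```

Because `default` is also applied to nested values, the same encoder works for arrays and scalars buried inside nested dictionaries and lists.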
This still serializes `np.nan` and `np.inf` as `NaN` and `Infinity` respectively. The `dumps` function has an `allow_nan` argument that can be set to `False`, but this just raises a `ValueError` if the input contains `nan` or `inf`.
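The difference between the two `allow_nan` settings can be checked with a short sketch; the `payload` dictionary and the variable names are illustrative.

```python
import json

import numpy as np

payload = {"score": np.nan}

# Default behaviour (allow_nan=True): emits the non-standard NaN token.
lenient = json.dumps(payload)
print(lenient)  # {"score": NaN}

# Strict behaviour: the encoder refuses instead of substituting a value.
try:
    json.dumps(payload, allow_nan=False)
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"ValueError: {exc}")
```

So `allow_nan=False` is useful for detecting the problem, but it does not produce a valid JSON string on its own.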
simplejson is an externally maintained JSON encoder/decoder with a similar interface [4]. Install it with `pip install simplejson`.
The `dumps` function in simplejson has an `ignore_nan` argument that can be set to `True` to serialize `nan`, `inf` and `-inf` as `null`, which is a valid JSON value. So let's swap out the `json` module for simplejson and see how it works.
This will output a valid JSON string.
We have seen how to create a custom `JSONEncoder` to handle numpy datatypes and how to serialize `nan`, `inf` and `-inf` as `null` using simplejson. This is a simple solution to the problem of serializing numpy datatypes, and it keeps the calling code clean and readable because the explicit type casts disappear.
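Note that the conversion is one-way: the numpy datatypes are not restored when you load the JSON string back with `json.loads`. You will have to explicitly convert the Python datatypes back to numpy datatypes.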
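As a rough sketch of that manual conversion, assuming a stored document shaped like the hypothetical `stored` string below: `json.loads` returns plain `int`, `list` and `None` values, and the caller decides which numpy types to rebuild.

```python
import json

import numpy as np

# Hypothetical document as it might come back from the database.
stored = '{"count": 3, "values": [1.0, null, 2.5]}'
decoded = json.loads(stored)

# Rebuild numpy types explicitly; null is mapped back to nan here.
count = np.int64(decoded["count"])
values = np.array(
    [np.nan if v is None else v for v in decoded["values"]],
    dtype=np.float64,
)

print(type(decoded["values"]), type(values))
```

Mapping `null` back to `nan` is a choice, not a faithful inverse: once `nan`, `inf` and `-inf` have all been written as `null`, the original value can no longer be told apart.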