JSONEncoder for numpy datatypes

Have you ever faced "TypeError: Object of type ndarray is not JSON serializable", "TypeError: Object of type int64 is not JSON serializable" or something along those lines? I certainly have! This happens because you are trying to serialize a numpy datatype using the default JSONEncoder included in the Python standard library.

I frequently use pandas and numpy in my projects, and I have to store the results in a NoSQL database. The results are usually nested dictionaries generated dynamically by the algorithm. Initially, I was explicitly casting the numpy datatypes to pure Python datatypes before converting to a JSON string. But that quickly became cumbersome, and the code started to look... let's say not so clean. So I started looking for a better solution. I found a Stack Overflow answer [1] and a linked GitHub discussion [2] that suggested using a custom JSONEncoder to handle the numpy numeric and array datatypes.

While most np.number scalars (for example np.int64 and np.float32) and np.ndarray objects raise a TypeError when you try to serialize them, np.nan and np.inf are plain Python floats and are serialized as NaN and Infinity, respectively. However, according to the JSON specification [3], NaN and Infinity are not valid JSON values. So if we want a valid JSON string that we can eventually save into a NoSQL database, we need to handle these values as well. Pure Python float('inf') and float('nan') have the same issue. Let's see how we can handle all of these datatypes.

A quick look at the stdlib documentation.

To use a custom JSONEncoder subclass (e.g. one that overrides the .default() method to serialize additional types), specify it with the cls kwarg; otherwise JSONEncoder is used.

This is from the docstring of the dumps function in the json module. If we have a custom datatype that the default JSONEncoder cannot serialize, we can create a custom encoder class that defines how to serialize it. This class subclasses json.JSONEncoder and overrides its default() method. The encoder calls default() for any object it does not know how to serialize itself; the method should either return a serializable object or raise a TypeError.
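As a minimal illustration of that hook, independent of numpy, here is a hypothetical SetEncoder (not part of the original post) that serializes Python sets, which the stdlib encoder rejects, as sorted lists. It shows both halves of the contract: return something serializable for the types you know, and defer to the base class, which raises TypeError, for everything else.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical example encoder: serializes sets as sorted lists."""

    def default(self, o):
        # Only called for objects the base encoder cannot serialize itself.
        if isinstance(o, set):
            # Sorting keeps the output deterministic.
            return sorted(o)
        # Anything else: defer to the base class, which raises TypeError.
        return super().default(o)


encoded = json.dumps({"tags": {"numpy", "json"}}, cls=SetEncoder)
print(encoded)  # {"tags": ["json", "numpy"]}
```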

Let's see how we can apply this to the numpy datatypes.

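Here are the numpy datatypes covered in this post and the Python datatypes they closely resemble. All of them except numpy.float64 are rejected by the default JSONEncoder; numpy.float64 is a subclass of Python float, so it happens to serialize without help.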

Table 1: Numpy datatypes and their pure Python counterparts

Numpy datatype  | Pure Python datatype
numpy.int8      | int
numpy.int16     | int
numpy.int32     | int
numpy.int64     | int
numpy.float16   | float
numpy.float32   | float
numpy.float64   | float
numpy.ndarray   | list
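To see which of the types in Table 1 the stdlib encoder actually rejects, a quick probe like the following can be run. It calls json.dumps on one sample value per type and collects the names that raise TypeError; the sample values are arbitrary choices for illustration. With NumPy on CPython, numpy.float64 is the one type expected to pass, since it subclasses float.

```python
import json

import numpy as np

# One arbitrary sample value for each datatype listed in Table 1.
candidates = {
    "int8": np.int8(1),
    "int16": np.int16(1),
    "int32": np.int32(1),
    "int64": np.int64(1),
    "float16": np.float16(1.5),
    "float32": np.float32(1.5),
    "float64": np.float64(1.5),
    "ndarray": np.array([1, 2, 3]),
}

# Record which ones the stdlib encoder refuses to serialize.
rejected = []
for name, value in candidates.items():
    try:
        json.dumps(value)
    except TypeError:
        rejected.append(name)

print(rejected)  # ['int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'ndarray']
```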

We can create a custom encoder class that will convert the numpy datatypes to the corresponding python datatypes.

numpy_encoder.py
import json
import numpy as np
 
 
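class NumpyEncoder(json.JSONEncoder):
    """
    This encoder can be used to convert incompatible numpy data types
    to types compatible with json.dumps()
    Use like json.dumps(output, cls=NumpyEncoder)
    """
    def default(self, o):
        # numpy integer scalars (int8, int16, int32, int64, ...) -> int
        if isinstance(o, np.integer):
            return int(o)

        # numpy floating scalars (float16, float32, float64, ...) -> float
        elif isinstance(o, np.floating):
            return float(o)

        # arrays -> (nested) lists of native Python scalars
        elif isinstance(o, np.ndarray):
            return o.tolist()

        # anything else: let the base class raise the usual TypeError
        return super().default(o)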

Usage

import json
import numpy as np

from numpy_encoder import NumpyEncoder


data = {
    'int': np.int64(42),
    'float': np.float64(3.14),
    'array': np.array([1, 2, 3, 4, 5])
}

try:
    json_data = json.dumps(data)  # This raises a TypeError (int64 is not serializable)
except TypeError as exc:
    print(exc)

json_data = json.dumps(data, cls=NumpyEncoder)  # This works
print(json_data)
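The usage example above only has numpy values at the top level. Because the encoder walks dicts and lists itself and calls default() for every value it cannot handle, numpy values nested inside containers are converted too, and tolist() turns multi-dimensional arrays into nested lists. One gap worth knowing about is np.bool_, which is neither np.integer nor np.floating, so NumpyEncoder as written would raise TypeError for it. The sketch below is the same encoder with a hypothetical extra branch for np.bool_; that branch is an extension, not part of the original post.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Same encoder as above, plus a hypothetical branch for np.bool_."""

    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.bool_):
            # Hypothetical extension: np.bool_ is not an np.integer,
            # so without this branch it would fall through to TypeError.
            return bool(o)
        if isinstance(o, np.ndarray):
            # tolist() converts every element to a Python scalar, so a
            # 2-D array becomes a list of lists.
            return o.tolist()
        return super().default(o)


result = {
    "scores": {"mean": np.float32(0.5), "count": np.int32(3)},
    "matrix": np.arange(4).reshape(2, 2),
    "converged": np.bool_(True),
}
encoded = json.dumps(result, cls=NumpyEncoder)
print(encoded)
# {"scores": {"mean": 0.5, "count": 3}, "matrix": [[0, 1], [2, 3]], "converged": true}
```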

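This still serializes np.nan and np.inf as NaN and Infinity, respectively. Overriding default() cannot change that: nan and inf are ordinary Python floats, which the encoder serializes itself without ever calling default(). The dumps function has an allow_nan argument that can be set to False, but this only makes it raise a ValueError if the input contains nan or inf; it does not replace them with a valid value.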
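The stdlib behaviour described above can be checked directly. The NullingEncoder class below is hypothetical and exists only to show that even a default() that returns None for everything is never consulted for floats.

```python
import json

values = [float("nan"), float("inf"), -float("inf")]

# Default behaviour: JavaScript-style tokens, which are not valid JSON.
print(json.dumps(values))  # [NaN, Infinity, -Infinity]

# allow_nan=False rejects these values instead of replacing them.
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    print("ValueError:", exc)


class NullingEncoder(json.JSONEncoder):
    """Hypothetical encoder whose default() maps every unknown object to None."""

    def default(self, o):
        return None


# default() is never called for floats, so NaN/Infinity are still emitted.
print(json.dumps(values, cls=NullingEncoder))  # [NaN, Infinity, -Infinity]
```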

simplejson to the rescue

pip install simplejson

simplejson is an externally maintained JSON encoder/decoder with an interface similar to the stdlib json module [4]. Its dumps function has an ignore_nan argument which, when set to True, serializes nan, inf and -inf as null, which is a valid JSON value. So let's swap out the json module for simplejson and see how it works.

numpy_encoder.py
import simplejson as json
import numpy as np
 
 
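class NumpyEncoder(json.JSONEncoder):
    """
    This encoder can be used to convert incompatible data types
    to types compatible with json.dumps()
    Use like json.dumps(output, ignore_nan=True, cls=NumpyEncoder)
    """
    def default(self, o):
        # numpy integer scalars -> int
        if isinstance(o, np.integer):
            return int(o)

        # numpy floating scalars -> float (a nan/inf result is then
        # turned into null by ignore_nan=True)
        elif isinstance(o, np.floating):
            return float(o)

        # arrays -> (nested) lists of native Python scalars
        elif isinstance(o, np.ndarray):
            return o.tolist()

        # anything else: let the base class raise the usual TypeError
        return super().default(o)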

Usage

import simplejson as json
import numpy as np

from numpy_encoder import NumpyEncoder
 
 
data = {
    'int': np.int64(42),
    'float': np.float64(3.14),
    'array': np.array([1, 2, 3, 4, 5]),
    'nan': np.nan,
    'inf': np.inf,
    '-inf': -np.inf
}
 
json_data = json.dumps(data, ignore_nan=True, cls=NumpyEncoder)
print(json_data)

This will output a valid JSON string (pretty-printed here for readability; print() shows it on a single line):

{
    "int": 42,
    "float": 3.14,
    "array": [1, 2, 3, 4, 5],
    "nan": null,
    "inf": null,
    "-inf": null
}
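One trade-off of ignore_nan=True is that it is lossy: nan, inf and -inf all become the same null, and decoding returns None for each, so the original values cannot be told apart or restored (and a genuine None in the input would look identical). A minimal check, decoding the single-line form of the output above with the stdlib json module:

```python
import json

# Single-line form of the JSON string printed by the simplejson example.
encoded = '{"int": 42, "float": 3.14, "array": [1, 2, 3, 4, 5], "nan": null, "inf": null, "-inf": null}'

decoded = json.loads(encoded)
print(decoded["nan"], decoded["inf"], decoded["-inf"])  # None None None

# Keys whose values were not finite numbers are now indistinguishable.
missing = [key for key, value in decoded.items() if value is None]
print(missing)  # ['nan', 'inf', '-inf']
```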

Conclusion

We have seen how to create a custom JSONEncoder to handle numpy datatypes and how to serialize nan, inf and -inf as null using simplejson. It is a simple solution that removes the need for manual type casting scattered throughout the code, which keeps the code cleaner and more readable.

Caveats

Footnotes

  1. https://stackoverflow.com/a/49677241/11030653

  2. https://github.com/mpld3/mpld3/issues/434#issuecomment-340255689

  3. https://www.json.org/

  4. https://simplejson.readthedocs.io/en/latest/