Checking the digital paperwork
A final, technically interesting piece of the Centaman integration is the JSON document validation we applied to make sure the data we receive from the Centaman API has the structure and content we expect.
Because the Australian Museum website relies so heavily on a full understanding of the Centaman data, it is important for the code we write to be very clear about which JSON data fields we are using and how we are using them. An excellent way to do that is to run each document through a validation process that checks which fields are present and whether they have the kinds of values we expect, and complains loudly if something doesn't look right.
We did exactly this for the Australian Museum website using the Cerberus validation library for Python. Whenever the site fetches a document from the Centaman API, it validates that document before doing anything else with it: before the document is saved to the website database, before it is processed, and before any of its values are shown on a webpage.
This early validation step gives us a chance to enforce various rules:
- We can make sure that any fields we absolutely require are present in the document.
- We can enforce rules about the kinds of values we expect. For example, we can ensure that some fields have a specific value type such as a number or timestamp if we need to sort them, while we can accept any textual value at all for other fields if we merely display them.
Guaranteeing that any incoming documents comply with these rules leads to some nice benefits:
- If a document breaks any of our rules, we receive an error report telling us exactly what was in the document and why it was rejected. This makes it much quicker and easier to find and fix any mistaken assumptions we have about the Centaman API.
- Document validation errors tend to be clearer and easier to debug than the errors that show up later when code accidentally misuses data from within a document.
- If the API result documents change over time in a way we don't expect, we will find out about it straight away rather than the next time that document happens to be (mis)used by the website code.
In addition to these code-quality benefits, the validation specifications also act as extra developer-readable documentation about what various API result documents contain and which parts are important to us. This allows developers to work on the site without needing a full understanding of the Centaman API, because they can easily see which parts of the API data we do or don't use.
Cerberus works by accepting validation rules expressed as schema documents. Although this schema format is very powerful and flexible, we found it could be fiddly to write and difficult to read. So instead of writing these schemas directly, we wrote a set of helper functions that build the schema for us, with a focus on enforcing the kinds of rules and data checks we needed. With these helper functions we can express a validation schema that looks very similar to the real document it validates.
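To give a feel for the general idea, the sketch below shows one way such helpers might work: each returns a fragment of an ordinary Cerberus schema, so the composed result can be passed to `cerberus.Validator` as usual. Every helper name here (`required`, `text`, `number`, `nested`, `list_of`) and every field name is hypothetical, and the helpers actually used on the site may look quite different.

```python
def text():
    """Rule for a field that may hold any textual value."""
    return {"type": "string"}


def number():
    """Rule for a numeric field (integer or float)."""
    return {"type": "number"}


def required(rule):
    """Return a copy of a rule that also marks the field as required."""
    return dict(rule, required=True)


def nested(**fields):
    """Rule for a sub-document whose fields follow the given rules."""
    return {"type": "dict", "schema": fields}


def list_of(rule):
    """Rule for a list in which every item follows the given rule."""
    return {"type": "list", "schema": rule}


# The schema now mirrors the shape of the document it describes.
# Field names are invented for illustration.
member_schema = {
    "MemberCode": required(number()),
    "FirstName": text(),
    "Address": nested(Street=text(), Postcode=required(text())),
    "Memberships": list_of(nested(TypeName=text(), Cost=number())),
}
```

The equivalent hand-written schema would spell out every `"type"`, `"required"` and nested `"schema"` key, which is where the raw format starts to become hard to read.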
Here is an example validation schema for the Centaman Member document type, built using our helper functions from a validation module: