As we continue our series on the Art Gallery of South Australia’s (AGSA’s) new website and digital transformation process, we turn our focus in this post to our work on integrating AGSA’s collection data from EMu, one of the world's leading collection management systems.
AGSA stores its collection data in EMu. This includes works of art, artists, images, physical locations, taxonomies, cultural and linguistic backgrounds, and much more.
What were we aiming to do?
We were tasked with finding the best way to display data and images from the AGSA collection to the public. The result is the recently launched AGSA Collection Search, which has more than 9,400 works published, out of a total collection almost five times as large.
Here are some of the principles that guided our design of this system:
- A search interface should allow the public to investigate the collection and discover both works and artists.
- CMS users should be able to link to collection records and reuse collection images when editing pages.
- Changes to collection data should be displayed to the public as soon as possible.
- Gallery staff should be able to use EMu to control whether records are displayed by the CMS, as well as how much of their content will be displayed.
EMu integration: high-level overview
Integrating a system as complex as EMu into a CMS involves a large number of moving parts, with multiple systems communicating with one another in a set sequence. For the sake of brevity, let’s look at just one of those processes: retrieving an image from EMu and adding it to AGSA’s CMS.
First, an automated task connects to EMu and fetches any new or modified records. These records pass through a pipeline, which includes processing of high-resolution images. At the final stage, these newly processed records and images are uploaded to the CMS.
After the CMS has received uploads, it begins to examine the new records and looks for those that are flagged as ready for public display. Those that are permitted for display are then used to create derivative data sets that are ready for use within the CMS’s front-end, search engine and admin user interface.
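The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the production code: the `Record` class, its field names and the `publish` flag are all hypothetical stand-ins for whatever the real systems use.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    """A simplified stand-in for a collection record (all fields hypothetical)."""
    record_id: int
    modified: datetime   # last modification time in the source system
    publish: bool        # assumed "ready for public display" flag
    data: dict


def changed_since(records, last_sync):
    """Sync step: keep only records created or modified after the last sync."""
    return [r for r in records if r.modified > last_sync]


def publishable(records):
    """CMS step: keep only records flagged for public display."""
    return [r for r in records if r.publish]


def build_search_documents(records):
    """CMS step: derive a flat data set suitable for a search index."""
    return [{"id": r.record_id, "title": r.data.get("title", "")} for r in records]


records = [
    Record(1, datetime(2019, 5, 1), True, {"title": "Work A"}),
    Record(2, datetime(2019, 5, 3), False, {"title": "Work B"}),
    Record(3, datetime(2019, 4, 1), True, {"title": "Work C"}),
]
last_sync = datetime(2019, 4, 15)

fresh = changed_since(records, last_sync)           # records 1 and 2
docs = build_search_documents(publishable(fresh))   # record 2 is not flagged
print(docs)  # [{'id': 1, 'title': 'Work A'}]
```

Keeping the "what changed?" question separate from the "what may be shown?" question means each step can be tested and adjusted on its own.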
As with any bespoke implementation, the integration of EMu threw up some interesting challenges. We at the IC are renowned for tackling gnarly technical implementations and finding ways to integrate systems that don’t normally work together. IC cofounder Alastair Weakley has been known to say on many occasions that “we like interesting problems”. Here are some of the interesting problems we encountered when working on this integration.
Reading data from EMu, via IMu
EMu provides the IMu module, which exposes a low-level socket-based API that allows EMu’s data to be queried and read. As the vendor did not have a high-level Python library available, we needed to implement the functionality that would enable us to connect to IMu, run queries and fetch results.
The vendor does provide IMu libraries in other languages, as well as some limited technical documentation that covers high-level overviews and basic use cases. Unfortunately, this left us with the task of reverse-engineering IMu’s transport mechanisms and messaging protocols. Using their Perl library allowed us to send messages to IMu, while we used Wireshark to capture the network payloads. With the payloads easily inspectable, we could quickly reverse-engineer the API and build out a system that enabled us to pull data from EMu.
The fruits of this work have been open sourced to assist any other developers encountering similar problems.
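To give a flavour of what a socket-level client involves, here is a minimal request/response sketch in Python. It is not our open-sourced library and it does not reproduce IMu’s actual wire format: the newline-delimited JSON framing, the `JsonLineClient` name and the message fields are assumptions made purely for illustration.

```python
import json
import socket


class JsonLineClient:
    """Sends one JSON message per line and reads one JSON reply per line.

    The framing (newline-delimited JSON) is an assumption for this sketch,
    not a description of any vendor protocol.
    """

    def __init__(self, sock):
        self._sock = sock
        # A buffered reader lets us read exactly one line per reply.
        self._reader = sock.makefile("rb")

    def request(self, payload):
        # json.dumps escapes newlines inside strings, so each message
        # occupies exactly one line on the wire.
        self._sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")
        line = self._reader.readline()
        if not line:
            raise ConnectionError("connection closed before a reply arrived")
        return json.loads(line.decode("utf-8"))


if __name__ == "__main__":
    # Demonstrate with a connected pair of local sockets instead of a server.
    client_sock, server_sock = socket.socketpair()
    server_sock.sendall(b'{"status": "ok"}\n')   # canned reply
    client = JsonLineClient(client_sock)
    print(client.request({"op": "search", "terms": "landscape"}))
    client_sock.close()
    server_sock.close()
```

Capturing real traffic, as we did with Wireshark, is what tells you how the actual protocol frames its messages; a client like this is then mostly a matter of matching that framing.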
Preparing AGSA’s collection data for display
AGSA’s EMu system is maintained behind a virtual private network (VPN) that does not allow direct connections from the public, let alone the CMS.
To connect to EMu we needed to run within the gallery’s VPN, so we set up a periodic task on a sync server that consumes EMu’s data and then pushes it to our CMS.
Our initial implementation involved some data preparation on the sync server, so that the CMS would receive display-ready data. However, we quickly realised that any changes to the publishing or display behaviour would necessitate changes to both the sync server and the CMS. So we refactored both systems so that they maintain an intermediary representation that closely resembles EMu’s structure.
This new representation enabled us to rapidly push data to the CMS, which now acts as the single source of truth for our display logic. This simplified architecture allows us to quickly iterate on changes to the codebase, which helps us respond to the changing needs of the gallery.
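As a rough sketch of why this helps, imagine the CMS receiving records that mirror the source structure and applying all display rules itself. The field names, the `display_level` setting and the rules table below are hypothetical; they simply show display logic living in one place.

```python
# A raw record as it might arrive from the sync server, mirroring the
# source system's structure. All field names here are hypothetical.
RAW_RECORD = {
    "record_id": 101,
    "title": "Untitled landscape",
    "creator": "Unknown artist",
    "location": "Store 3, Bay 12",
    "display_level": "summary",  # assumed staff-controlled setting
}

# Display rules are defined only in the CMS, so changing them requires
# no change to the sync server.
FIELDS_BY_LEVEL = {
    "hidden": [],
    "summary": ["title"],
    "full": ["title", "creator"],
}


def to_display(raw):
    """Return the public view of a raw record, or None if it is hidden."""
    level = raw.get("display_level", "hidden")
    fields = FIELDS_BY_LEVEL.get(level, [])
    if not fields:
        return None
    return {name: raw[name] for name in fields if name in raw}


print(to_display(RAW_RECORD))  # {'title': 'Untitled landscape'}
```

Under this arrangement, showing an extra field at the "summary" level is a one-line change to `FIELDS_BY_LEVEL` in the CMS, and the sync server keeps pushing the same raw records as before.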
Transforming large multimedia assets
The gallery uses high-resolution TIFF images for works of art, events, artists and more. These images are often measured in the hundreds of megabytes and hence are not directly viable for use on the web.
To ensure that we could display these images without visible degradation, we built an asset pipeline optimised for asset size. The pipeline consumes the source TIFFs and outputs full-sized, colour-corrected JPEGs that are typically 1-5% of the size of the source.
Thanks to the heavy compression in the JPEG format, these files are far smaller yet still contain the entire source image, and are immediately ready for use in the web environment. We will be writing in more detail about this in an upcoming post.
More to follow: image editing, zoom, CRM integration
Look out for our upcoming posts on image editing and the other integrations we undertook – including a bespoke CRM integration – as part of AGSA’s new site launch. In the meantime, check out AGSA’s Online Collection Search, and stay tuned for our next instalments by signing up to the IC’s newsletter or following us on Twitter.