Introduction
In recent years, cultural institutions have made significant strides in digital preservation, focusing primarily on the safeguarding of digital assets such as images, documents, videos, and audio recordings. While these efforts are commendable and vital, they often overlook an equally critical component of digital heritage: metadata.
Digital preservation is not merely about securing digital files. It is about preserving the context, meaning, and relationships that give those files significance. In collections management, metadata performs this essential role. It describes, organises, and links the objects in a collection to people, places, events, and themes. Without it, a digital image is just a digital file—separated from its meaning and provenance. Yet the preservation of metadata, the very structure of knowledge, is often ignored or treated as an afterthought in digital preservation strategies.
Why Metadata Matters
A robust digital preservation policy must include the preservation of metadata stored in Collections Management Systems (CMS). This is not simply a matter of regular backups or exportable reports. It is about ensuring that the metadata is accessible, readable, and reusable even if the original CMS interface becomes obsolete or unavailable. Institutions must be able to extract the data from their databases in a way that is meaningful, consistent, and complete.
Metadata not only supports discovery and interpretation, but also ensures the authenticity, integrity, and longevity of digital assets. Without metadata, even the most carefully preserved digital images completely lose their value. They become contextless files with no historical or scholarly relevance.
Lessons from Migration: Real-World Challenges
In practice, however, this ideal is rarely achieved. Having migrated data into our Qi Collections Management System from a wide range of competitors' systems (and yours is probably one of them), we have consistently found that the underlying database structures in these systems are opaque and illogical. Tables and fields are frequently named with internal or abbreviated codes that do not reflect their purpose or contents. We have seen some truly horrific data structures in the back ends of the databases we have salvaged data from. For example:
- Table and field names written in Swiss German dialect, coupled with fields that conflate multiple data types (names, dates, and locations all jumbled into a single column), making any systematic extraction or interpretation nearly impossible
- A central table storing all records as a single XML-formatted string within one column—rendering standard querying impossible—with values encoded using obscure, undocumented two-letter acronyms that bear no semantic relationship to the data they represent
- Relationship data between multiple tables stored as obscure numeric codes with no supporting documentation—making it nearly impossible to determine what the source and target entities are or how they relate, even when access to the database is granted
This obscurity turns migrations into forensic investigations. Understanding the schema requires proprietary knowledge, vendor support, or laborious trial and error. As a result, vital data may be misinterpreted, mangled, or even lost altogether. We have encountered cases where object titles, creator attributions, or acquisition histories were at risk of being corrupted during migration due to unclear or misleading table and field names.
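To make this concrete, the fragment below is a purely hypothetical sketch of the kind of opaque structure described above. Every table name, column name, and code in it is invented for illustration and taken from no specific vendor, and it assumes a PostgreSQL-style database with XML functions available.

```sql
-- Hypothetical legacy structure: cryptic names, one XML blob per record,
-- undocumented numeric codes. Invented for illustration only.
CREATE TABLE t_objdat (
    rid    INTEGER PRIMARY KEY,  -- internal record id
    xdat   TEXT,                 -- the entire record serialised as an XML string
    bzg    TEXT,                 -- mixed field: names, dates and places in one column
    fk_typ INTEGER               -- numeric relationship code with no documentation
);

-- Even retrieving a title becomes string surgery rather than a normal query
-- (PostgreSQL-style xpath; assumes the blob is well-formed and that "ti" means "title"):
SELECT (xpath('//fl[@cd="ti"]/text()', xdat::xml))[1]::text AS title
FROM t_objdat;
```

Nothing in such a structure tells you what "ti", "bzg" or "fk_typ" mean; that knowledge lives only with the vendor.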
When a CMS reaches its end of life, or an institution decides to switch platforms, the metadata it contains should be easily extractable by anyone with access to the database, without having to rely on the incumbent vendor.
If the structure is inherently unintelligible, the process becomes one of reverse engineering—time-consuming, error-prone, and often incomplete.
Qi's Transparent Approach
By contrast, Qi has been developed with long-term preservation and accessibility in mind. Its database uses table and field names that are descriptive and self-explanatory. A table of "actors" (i.e. people and organisations who "act" on objects) is called exactly that. A field for the title of an object is named title, within a table called object. A table relating people to objects is named object_actor_xrefs, with fields named object_id and actor_id.
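As a minimal sketch of what that transparency means in practice, the query below joins objects to the people associated with them using only the names mentioned above; the primary-key columns (id), the exact name of the actors table, and its name column are illustrative assumptions rather than a definitive description of Qi's schema.

```sql
-- Querying a self-describing schema: table and field names read like documentation.
-- object, title, object_actor_xrefs, object_id and actor_id follow the article;
-- the id and name columns (and the exact "actors" table name) are assumptions.
SELECT o.title, a.name AS actor_name
FROM object o
JOIN object_actor_xrefs x ON x.object_id = o.id
JOIN actors a ON a.id = x.actor_id
ORDER BY o.title;
```

Anyone with basic SQL and access to the database could write this query without a schema manual or vendor support.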
All of this seems logical, and it should be something we can take for granted, correct? Yet it marks a significant departure from most legacy systems. And the bizarre thing is that most other CMS vendors still sell systems with such an illogical data structure.
Qi's design principle is not only beneficial for everyday use by technical teams and developers; it is essential for digital preservation. Should the Qi interface ever become inaccessible, or should an institution simply decide to migrate away from it, the data can still be accessed, interpreted, and repurposed directly from the database.
The database schema is documentation in itself.
The Archival Standard for Metadata Structures
This is the crux of the matter: metadata must outlive the software used to manage it. Just as we expect digital image files to be stored in standardised, open formats (such as TIFF or JPEG), we must demand the same of metadata structures. Open, transparent, and logical database schemas are the metadata equivalent of archival-quality file formats.
This shift in mindset requires cultural institutions to become more proactive and technically literate in their CMS procurement processes. During vendor evaluations, institutions should request access to the database schema and assess its clarity. If tables and fields are cryptically named, if they rely on undocumented joins or hardcoded relationships, alarm bells should ring.
A Checklist for Sustainable Metadata Preservation
When selecting or auditing a CMS, institutions should consider the following questions:
- Are the table and field names descriptive and intuitive?
- Can metadata be exported in open, non-proprietary formats (e.g., CSV, XML, JSON)?
- Is the database schema well-documented and accessible?
- Can the database be queried directly using standard tools (e.g., SQL)?
- In the event of interface failure, can the data still be meaningfully interpreted?
The answers to these questions can make the difference between a resilient, future-proof system and one that locks metadata into a black box.
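To make the last two questions concrete, here is a minimal export sketch. It assumes a PostgreSQL-style database and the illustrative schema sketched earlier; the column selection and output path are hypothetical, and any transparent, well-named schema would support the same approach.

```sql
-- A vendor-independent export to an open format, using standard SQL plus
-- PostgreSQL's COPY. Schema, columns and file path are illustrative assumptions.
COPY (
    SELECT o.title, a.name AS actor_name
    FROM object o
    JOIN object_actor_xrefs x ON x.object_id = o.id
    JOIN actors a ON a.id = x.actor_id
) TO '/tmp/objects_with_actors.csv' WITH (FORMAT csv, HEADER true);
```

With psql, the same export can be run client-side via \copy; the point is not the specific tool but that a clearly structured database makes such an export trivial to write and verify.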
Conclusion: Preserving Meaning Alongside Media
In an era where sustainability, openness, and long-term thinking are rightly prioritised, we can no longer afford to treat metadata as a secondary concern. Digital preservation must extend to metadata in all its forms: descriptive, structural, technical, and administrative. This is not just about good database design; it is about ensuring that the meaning and richness of cultural collections endure beyond the lifespan of any single software product.
The future of heritage depends not only on the images and recordings we preserve but also on the metadata that makes them intelligible. As stewards of memory, we must ensure both are preserved with equal care.
Preserving metadata is preserving meaning. And without meaning, we preserve nothing at all.