In data management, a data contract is an agreement between data producers and data consumers. It contains a detailed schema that links business (the logical representation of the data) and technology (its physical implementation). A data contract also describes advanced metadata, such as data quality rules, service-level agreements (SLAs), and behavior. Data contracts can take several forms, but YAML is very common.
The Linux Foundation project Bitol has published a data contract standard called Open Data Contract Standard (ODCS). Its current version is 3.0.2.
In December 2021, Andrew Jones at GoCardless wrote about how they were using Data Contracts, and in October 2022 wrote about their implementation.
In August 2022, Jean-Georges Perrin published a popular reference article on the PayPal Technology Blog describing the use of data contracts in a Data Mesh implementation. In May 2023, PayPal open-sourced its Data Contract Template.
In June 2023, Andrew Jones published Driving Data Quality with Data Contracts: A comprehensive guide to building reliable, trusted, and effective data platforms, which is, up to now, the only published book on this topic.
In November 2023, Bitol, a Linux Foundation project, released the first version of ODCS (Open Data Contract Standard), a compatible fork of the PayPal template.
In September 2024, Ronald Angel at Miro wrote about their implementation of data contracts.
In October 2024, Bitol released ODCS v3.0.0 with enhanced support for data quality.
The Apache 2.0-licensed Bitol project divides data contracts into several sections.
Data contracts are gaining popularity as Data Products are gaining traction.
Usually, a data contract is created by one data producer for one or many data consumers.
A data contract is designed to be enhanced iteratively. Data engineers can start with the few elements in the header and the schema. Over time, data engineers and owners can add more information, like data quality and SLA.
Most data contracts are implemented using a YAML file, which is both human- and computer-readable and language-agnostic.
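The iterative approach described above can be sketched in a few lines of Python. This is an illustration only: the field names (`schema`, `physical_name`, `quality`, `sla`, `freshness_hours`) and the `validate_record` helper are hypothetical and do not follow ODCS or any other published standard. The contract is held as a plain dictionary here; in practice the same structure would typically be stored in a YAML file.

```python
# Hypothetical field names for illustration; they do not follow ODCS
# or any other published standard.

# Iteration 1: start with only a header and a schema.
contract = {
    "name": "orders",
    "version": "0.1.0",
    "schema": {
        # logical field name -> physical column name and type
        "order_id": {"physical_name": "ord_id", "type": "string", "required": True},
        "amount": {"physical_name": "amt", "type": "number", "required": True},
    },
}

# Iteration 2: later, add data quality rules and an SLA.
contract["version"] = "0.2.0"
contract["quality"] = [{"field": "amount", "rule": "min", "value": 0}]
contract["sla"] = {"freshness_hours": 24}

# Mapping from the contract's type names to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def validate_record(record, contract):
    """Return a list of violations for one record keyed by logical field names."""
    errors = []
    # Schema checks: presence and type of each declared field.
    for field, spec in contract["schema"].items():
        if field not in record:
            if spec.get("required"):
                errors.append(f"{field}: missing required field")
            continue
        if not isinstance(record[field], PYTHON_TYPES[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}")
    # Quality checks: only the "min" rule is supported in this sketch.
    for rule in contract.get("quality", []):
        value = record.get(rule["field"])
        if rule["rule"] == "min" and isinstance(value, (int, float)) and value < rule["value"]:
            errors.append(f"{rule['field']}: below minimum {rule['value']}")
    return errors
```

A consumer could call `validate_record({"order_id": "A1", "amount": -1}, contract)` to get the list of violations for that record. Because the quality and SLA sections are optional keys, a contract from the first iteration still validates against the schema alone.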
The symbol for a data contract is either an equilateral triangle (rotated 90°), symbolizing schema, business meaning, and SLAs, or a file icon.