Repository: datacontract/datacontract-specification
Branch: main
Commit: 145852c67604
Files: 55
Total size: 1.7 MB
Directory structure:
gitextract_glrnu_dz/
├── .github/
│   ├── validate-examples
│   └── workflows/
│       └── ci.yaml
├── .gitignore
├── CHANGELOG.md
├── CNAME
├── LICENSE
├── README.md
├── _config.yml
├── _layouts/
│   └── default.html
├── datacontract.init.yaml
├── datacontract.schema.json
├── definition.schema.json
├── diagrams/
│   ├── automation.drawio
│   ├── datacontract.drawio
│   └── favicon.drawio
├── examples/
│   ├── covid-cases/
│   │   ├── datacontract.html
│   │   └── datacontract.yaml
│   ├── datacontract.html
│   ├── generate-catalog
│   ├── index.html
│   ├── muellimperium/
│   │   ├── data.csv
│   │   ├── datacontract.html
│   │   └── datacontract.yaml
│   ├── orders-latest/
│   │   ├── datacontract.html
│   │   └── datacontract.yaml
│   ├── orders-latest-nested/
│   │   ├── datacontract.html
│   │   └── datacontract.yaml
│   ├── time-example/
│   │   ├── datacontract.html
│   │   └── datacontract.yaml
│   └── variant-json-example/
│       └── datacontract.yaml
├── gen-openapi-yaml
├── versions/
│   ├── 0.9.0/
│   │   ├── README.md
│   │   ├── datacontract.init.yaml
│   │   └── datacontract.schema.json
│   ├── 0.9.1/
│   │   ├── README.md
│   │   ├── datacontract.init.yaml
│   │   └── datacontract.schema.json
│   ├── 0.9.2/
│   │   ├── README.md
│   │   ├── datacontract.init.yaml
│   │   └── datacontract.schema.json
│   ├── 0.9.3/
│   │   ├── README.md
│   │   ├── datacontract.init.yaml
│   │   ├── datacontract.schema.json
│   │   └── definition.schema.json
│   ├── 1.1.0/
│   │   ├── README.md
│   │   ├── datacontract.init.yaml
│   │   ├── datacontract.schema.json
│   │   └── definition.schema.json
│   ├── 1.2.0/
│   │   ├── datacontract.init.yaml
│   │   ├── datacontract.schema.json
│   │   └── definition.schema.json
│   └── 1.2.1/
│       ├── datacontract.init.yaml
│       ├── datacontract.schema.json
│       └── definition.schema.json
└── workshop.md
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/validate-examples
================================================
#!/bin/bash
set -ex
#function datacontract() {
# docker run --rm -v "${PWD}:/home/datacontract" --platform linux/amd64 datacontract/cli:latest "$@"
#}
datacontract --version
SCHEMA=datacontract.schema.json
awk '/^```yaml$/{flag=1; next} /^```$/{print ""; flag=0; exit} flag' README.md > datacontract-from-readme.yaml
datacontract lint datacontract-from-readme.yaml --schema $SCHEMA
datacontract test --examples datacontract-from-readme.yaml --schema $SCHEMA
# Compare with example?
datacontract lint examples/orders-latest/datacontract.yaml --schema $SCHEMA
datacontract test --examples examples/orders-latest/datacontract.yaml --schema $SCHEMA
datacontract lint examples/orders-latest-nested/datacontract.yaml --schema $SCHEMA
datacontract test --examples examples/orders-latest-nested/datacontract.yaml --schema $SCHEMA || true # examples are not nested
datacontract lint examples/covid-cases/datacontract.yaml --schema $SCHEMA
datacontract test --examples examples/covid-cases/datacontract.yaml --schema $SCHEMA || true
================================================
FILE: .github/workflows/ci.yaml
================================================
on:
  push:
  pull_request:
  workflow_call:
name: CI
jobs:
  test:
    if: false # skip as the example structure has changed with v1.1.0
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.11
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install datacontract-cli[all]
          datacontract --version
      - name: Validate examples
        run: .github/validate-examples
================================================
FILE: .gitignore
================================================
.idea/
*.bkp
datacontract.schema.openapi-format.*
.soda/
datacontract-from-readme.yaml
.duckdb/
================================================
FILE: CHANGELOG.md
================================================
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.2.1] - 2025-09-24
### Added
- Support for data quality metrics that align with ODCS 3.1
### Changed
- Replaced threshold operators mustBeGreaterThanOrEqualTo with mustBeGreaterOrEqualTo and mustBeLessThanOrEqualTo with mustBeLessOrEqualTo to align with ODCS 3.1, even if it feels wrong...
## [1.2.0] - 2025-07-05
### Added
- Support for `models.additionalFields` to define if additional fields (columns) are allowed or not in the physical server ([#99](https://github.com/datacontract/datacontract-specification/pull/99))
- Add `time` data type ([#123](https://github.com/datacontract/datacontract-specification/issues/123))
- Added `variant` data type ([#113](https://github.com/datacontract/datacontract-specification/issues/113))
- Added `json` data types ([#112](https://github.com/datacontract/datacontract-specification/issues/112))
### Changed
- `server.type` changed from enum to simple string to support custom types ([#107](https://github.com/datacontract/datacontract-specification/pull/107))
## [1.1.0] - 2024-10-30
### Added
- Data quality on model and field level ([#55](https://github.com/datacontract/datacontract-specification/issues/55))
- Lineage support ([#90](https://github.com/datacontract/datacontract-specification/issues/90))
- Field and definition `examples` as array of any type, instead of `example` as a single value ([#29](https://github.com/datacontract/datacontract-specification/issues/29))
- Support for server-specific data types as config map ([#63](https://github.com/datacontract/datacontract-specification/issues/63))
- AWS Glue Catalog server support
- sftp server support
- info.status field
- oracle server support
- field.title attribute
- model.title attribute
- AWS Kinesis Data Streams server support
- field.links attribute
- Trino support
- Field `type: map` support with properties `keys` and `values`
- Definitions: `fields`, for type `object`, `record`, and `struct`
- Field `field.primaryKey` (Replaces `field.primary`)
- Field `model.primaryKey` to describe a composite primary key
- Add Redshift server properties `clusterIdentifier`, `endpoint`, `host` and `port`.
### Removed
- `definitions.domain` removed (use a hierarchical structure instead)
- `definitions.name` removed (use a hierarchical structure instead)
- `quality` on top-level removed
- `examples` on top-level removed
- `schema` removed in favor of encoding any physical schema configuration in the `model` using the `config` map at the field level and supporting import/export ([#21](https://github.com/datacontract/datacontract-specification/issues/21)).
### Deprecated
- `field.primary` (use `field.primaryKey` instead)
## [0.9.3] - 2024-03-06
### Added
- Service levels as a top level `servicelevels` element
- pubsub server support
- primary key and relationship support via `field.primary` and `field.references` attributes
- databricks server support improved
## [0.9.2] - 2024-01-04
### Added
- Format and validation attributes to fields in models and definitions
- Postgres support
- Databricks support
## [0.9.1] - 2023-11-19
### Added
- A logical data model (#13), mainly to simplify editor support with a defined schema, easier to detect breaking changes, and better Databricks support.
- Definitions (#14) for reusable semantic definitions within one data contract or across data contracts.
### Removed
- Property `info.dataProduct` as data products should define which data contracts they implement.
- Property `info.outputPort` as data products should define which data contracts they implement.
These removals are not considered breaking changes, as these attributes are now treated as specification extensions.
## [0.9.0] - 2023-09-12
First public release.
================================================
FILE: CNAME
================================================
datacontract-specification.com
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2023 Data Mesh Architecture
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Data Contract Specification
<a href="https://github.com/datacontract/datacontract-specification">
<img alt="Stars" src="https://img.shields.io/github/stars/datacontract/datacontract-specification" /></a>
<a href="https://datacontract.com/slack" rel="nofollow"><img src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" alt="Slack Status" data-canonical-src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" style="max-width: 100%;"></a>
> **Deprecation Notice**
> With the release of the [Open Data Contract Standard v3.1.0](https://github.com/bitol-io/open-data-contract-standard), we deprecate the Data Contract Specification in line with our commitment to focus on a single industry standard for data contracts. We have actively contributed to the Open Data Contract Standard in the TSC and will continue to support it.<br><br>
> If you are using Data Contract Specification, we recommend [migrating to the Open Data Contract Standard](#migration) within the next few months.<br>
> The Data Contract Specification will be supported in Data Contract CLI and Entropy Data until the end of 2026.

Data contracts bring data providers and data consumers together.
A _data contract_ is a document that defines the structure, format, semantics, quality, and terms of use for exchanging data between a data provider and their consumers.
Think of an API, but for data.
A data contract is implemented by a data product or other data technologies, even legacy data warehouses.
Data contracts can also be used for the input port to specify the expectations of data dependencies and verify given guarantees.
The _data contract specification_ defines a YAML format to describe attributes of provided data sets.
It is data platform neutral and can be used with any data platform, such as AWS S3, Google BigQuery, Azure, Databricks, and Snowflake.
The data contract specification is an open initiative to define a common data contract format.
It follows [OpenAPI](https://www.openapis.org/) and [AsyncAPI](https://www.asyncapi.com/) conventions.
If you haven't adopted a YAML format yet, we recommend starting directly with the [Open Data Contract Standard](https://github.com/bitol-io/open-data-contract-standard). It is considered the conceptual successor to this specification.
Data contracts come into play when data is exchanged between different teams or organizational units, such as in a [data mesh architecture](https://www.datamesh-architecture.com/).
First and foremost, data contracts are a communication tool to express a common understanding of how data should be structured and interpreted.
They make semantic and quality expectations explicit.
They are often created collaboratively in [workshops](./workshop.md) together with data providers and data consumers.
Later in development and production, they also serve as the basis for code generation, testing, schema validations, quality checks, monitoring, access control, and computational governance policies.
The specification comes along with the [Data Contract CLI](https://github.com/datacontract/datacontract-cli), an open-source tool to develop, validate, and enforce data contracts.
> _Note: The term "data contract" refers to a specification that is usually owned by the data provider and thus does not align with a "contract" in a legal sense as a mutual agreement between two parties.
> The term "contract" may be somewhat misleading, but it is how it is used by the industry.
> The mutual agreement between one data provider and one data consumer is the "data usage agreement" that refers to a data contract.
> Data usage agreements have a defined lifecycle, start/end date, and help the data provider to track who accesses their data and for which purposes._
Version
---
1.2.1 ([Changelog](CHANGELOG.md))
Example
---
View in [Data Contract Catalog](https://datacontract.com/examples/index.html)
```yaml
dataContractSpecification: 1.2.1
id: orders-latest
info:
  title: Orders Latest
  version: 2.0.0
  description: |
    Successful customer orders in the webshop.
    All orders since 2020-01-01.
    Orders with their line items are in their current state (no history included).
  owner: Checkout Team
  status: active
  contact:
    name: John Doe (Data Product Owner)
    url: https://teams.microsoft.com/l/channel/example/checkout
servers:
  production:
    type: s3
    environment: prod
    location: s3://datacontract-example-orders-latest/v2/{model}/*.json
    format: json
    delimiter: new_line
    description: "One folder per model. One file per day."
    roles:
      - name: analyst_us
        description: Access to the data for US region
      - name: analyst_cn
        description: Access to the data for China region
terms:
  usage: |
    Data can be used for reports, analytics and machine learning use cases.
    Orders may be linked and joined with other tables.
  limitations: |
    Not suitable for real-time use cases.
    Data may not be used to identify individual customers.
    Max data processing per day: 10 TiB
  policies:
    - name: privacy-policy
      url: https://example.com/privacy-policy
    - name: license
      description: External data is licensed under agreement 1234.
      url: https://example.com/license/1234
  billing: 5000 USD per month
  noticePeriod: P3M
models:
  orders:
    description: One record per order. Includes cancelled and deleted orders.
    type: table
    fields:
      order_id:
        $ref: '#/definitions/order_id'
        required: true
        unique: true
        primaryKey: true
      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        examples:
          - "2024-09-09T08:30:00Z"
        tags: ["business-timestamp"]
      order_total:
        description: Total amount in the smallest monetary unit (e.g., cents).
        type: long
        required: true
        examples:
          - 9999
        quality:
          - type: sql
            description: 95% of all order total values are expected to be between 10 and 499 EUR.
            query: |
              SELECT quantile_cont(order_total, 0.95) AS percentile_95
              FROM orders
            mustBeBetween: [1000, 49900]
      customer_id:
        description: Unique identifier for the customer.
        type: text
        minLength: 10
        maxLength: 20
      customer_email_address:
        description: The email address, as entered by the customer.
        type: text
        format: email
        required: true
        pii: true
        classification: sensitive
        quality:
          - type: text
            description: The email address is not verified and may be invalid.
        lineage:
          inputFields:
            - namespace: com.example.service.checkout
              name: checkout_db.orders
              field: email_address
      processed_timestamp:
        description: The timestamp when the record was processed by the data platform.
        type: timestamp
        required: true
        config:
          jsonType: string
          jsonFormat: date-time
    quality:
      - type: sql
        description: The maximum duration between two orders should be less than 3600 seconds
        query: |
          SELECT MAX(duration) AS max_duration
          FROM (
            SELECT EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp) OVER (ORDER BY order_timestamp))) AS duration
            FROM orders
          )
        mustBeLessThan: 3600
      - type: sql
        description: Row Count
        query: |
          SELECT count(*) as row_count
          FROM orders
        mustBeGreaterThan: 5
    examples:
      - |
        order_id,order_timestamp,order_total,customer_id,customer_email_address,processed_timestamp
        "1001","2030-09-09T08:30:00Z",2500,"1000000001","mary.taylor82@example.com","2030-09-09T08:31:00Z"
        "1002","2030-09-08T15:45:00Z",1800,"1000000002","michael.miller83@example.com","2030-09-09T08:31:00Z"
        "1003","2030-09-07T12:15:00Z",3200,"1000000003","michael.smith5@example.com","2030-09-09T08:31:00Z"
        "1004","2030-09-06T19:20:00Z",1500,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z"
        "1005","2030-09-05T10:10:00Z",4200,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z"
        "1006","2030-09-04T14:55:00Z",2800,"1000000005","john.davis28@example.com","2030-09-09T08:31:00Z"
        "1007","2030-09-03T21:05:00Z",1900,"1000000006","linda.brown67@example.com","2030-09-09T08:31:00Z"
        "1008","2030-09-02T17:40:00Z",3600,"1000000007","patricia.smith40@example.com","2030-09-09T08:31:00Z"
        "1009","2030-09-01T09:25:00Z",3100,"1000000008","linda.wilson43@example.com","2030-09-09T08:31:00Z"
        "1010","2030-08-31T22:50:00Z",2700,"1000000009","mary.smith98@example.com","2030-09-09T08:31:00Z"
  line_items:
    description: A single article that is part of an order.
    type: table
    fields:
      line_item_id:
        type: text
        description: Primary key of the line_items table
        required: true
      order_id:
        $ref: '#/definitions/order_id'
        references: orders.order_id
      sku:
        description: The purchased article number
        $ref: '#/definitions/sku'
    primaryKey: ["order_id", "line_item_id"]
    examples:
      - |
        line_item_id,order_id,sku
        "LI-1","1001","5901234123457"
        "LI-2","1001","4001234567890"
        "LI-3","1002","5901234123457"
        "LI-4","1002","2001234567893"
        "LI-5","1003","4001234567890"
        "LI-6","1003","5001234567892"
        "LI-7","1004","5901234123457"
        "LI-8","1005","2001234567893"
        "LI-9","1005","5001234567892"
        "LI-10","1005","6001234567891"
definitions:
  order_id:
    title: Order ID
    type: text
    format: uuid
    description: An internal ID that identifies an order in the online shop.
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders
  sku:
    title: Stock Keeping Unit
    type: text
    pattern: ^[A-Za-z0-9]{8,14}$
    examples:
      - "96385074"
    description: |
      A Stock Keeping Unit (SKU) is an internal unique identifier for an article.
      It is typically associated with an article's barcode, such as the EAN/GTIN.
    links:
      wikipedia: https://en.wikipedia.org/wiki/Stock_keeping_unit
    tags:
      - inventory
servicelevels:
  availability:
    description: The server is available during support hours
    percentage: 99.9%
  retention:
    description: Data is retained for one year
    period: P1Y
    unlimited: false
  latency:
    description: Data is available within 25 hours after the order was placed
    threshold: 25h
    sourceTimestampField: orders.order_timestamp
    processedTimestampField: orders.processed_timestamp
  freshness:
    description: The age of the youngest row in a table.
    threshold: 25h
    timestampField: orders.order_timestamp
  frequency:
    description: Data is delivered once a day
    type: batch # or streaming
    interval: daily # for batch: either interval or cron
    cron: 0 0 * * * # for batch: either cron or interval
  support:
    description: The data is available during typical business hours at headquarters
    time: 9am to 5pm in EST on business days
    responseTime: 1h
  backup:
    description: Data is backed up once a week, every Sunday at 0:00 UTC.
    interval: weekly
    cron: 0 0 * * 0
    recoveryTime: 24 hours
    recoveryPoint: 1 week
tags:
  - checkout
  - orders
  - s3
links:
  datacontractCli: https://cli.datacontract.com
```
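
The `quality` entries of type `sql` in the example pair a query with a threshold operator such as `mustBeBetween` or `mustBeLessThan`. As a minimal sketch of how a query result could be compared against such thresholds, the following Python snippet covers only the operators that appear in this README and in the changelog. The function name `check_thresholds` is hypothetical, and the inclusive bounds for `mustBeBetween` are an assumption of this sketch, not a statement about how any particular tool evaluates them.

```python
from typing import Any, Callable, Dict

# Threshold operators named in the example and in the 1.2.1 changelog.
# ASSUMPTION: mustBeBetween treats both bounds as inclusive.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "mustBeGreaterThan": lambda value, threshold: value > threshold,
    "mustBeGreaterOrEqualTo": lambda value, threshold: value >= threshold,
    "mustBeLessThan": lambda value, threshold: value < threshold,
    "mustBeLessOrEqualTo": lambda value, threshold: value <= threshold,
    "mustBeBetween": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def check_thresholds(quality: dict, value: Any) -> Dict[str, bool]:
    """Return {operator: passed} for each known operator in a quality entry."""
    return {
        operator: compare(value, quality[operator])
        for operator, compare in OPERATORS.items()
        if operator in quality
    }


# The row-count check from the example, evaluated against a query result of 10.
row_count_check = {"type": "sql", "description": "Row Count", "mustBeGreaterThan": 5}
print(check_thresholds(row_count_check, 10))  # {'mustBeGreaterThan': True}
```

Keys that are not threshold operators (`type`, `description`, `query`) are ignored, so a quality entry can be passed in unchanged.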
Migration
---
To migrate from the Data Contract Specification to the Open Data Contract Standard, you can use the [Data Contract CLI](https://github.com/datacontract/datacontract-cli):
```
uv tool install --python python3.11 --upgrade 'datacontract-cli[all]'
datacontract export --format odcs --output odcs.yaml datacontract.yaml
```
You can now continue to work with the _odcs.yaml_ file.
Data Contract CLI
---
The [Data Contract CLI](https://cli.datacontract.com) is a command line tool and Python library to lint, test, import and export data contracts (supporting Data Contract Specification and ODCS).
Here is a short example of how to verify that your actual dataset matches the data contract:
```bash
pip3 install "datacontract-cli[all]"
datacontract test https://datacontract.com/examples/orders-latest/datacontract.yaml
```
or, if you prefer Docker:
```bash
docker run datacontract/cli test https://datacontract.com/examples/orders-latest/datacontract.yaml
```
The Data Contract contains all required information to verify data:
- The _servers_ block has the connection details to the actual data set.
- The _models_ define the syntax, formats, and constraints.
- The _quality_ attributes define further quality checks.
The Data Contract CLI chooses the appropriate engine, formulates test cases, connects to the server, and executes the tests, based on the server type.
More information and configuration options on [cli.datacontract.com](https://cli.datacontract.com).
Specification
---

- [Data Contract Object](#data-contract-object)
- [Info Object](#info-object)
- [Contact Object](#contact-object)
- [Server Object](#server-object)
- [Terms Object](#terms-object)
- [Model Object](#model-object)
- [Field Object](#field-object)
- [Definition Object](#definition-object)
- [Service Levels Object](#service-levels-object)
- [Quality Object](#quality-object)
- [Lineage Object](#lineage-object)
- [Data Types](#data-types)
- [Specification Extensions](#specification-extensions)
[JSON Schema](https://github.com/datacontract/datacontract-specification/blob/main/datacontract.schema.json) of the Data Contract Specification.
### Data Contract Object
This is the root document.
It is _RECOMMENDED_ that the root document be named: `datacontract.yaml`.
| Field | Type | Description |
|---------------------------|--------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| dataContractSpecification | `string` | REQUIRED. Specifies the Data Contract Specification being used. |
| id | `string` | REQUIRED. An organization-wide unique technical identifier, such as a UUID, URN, slug, string, or number |
| info | [Info Object](#info-object) | REQUIRED. Specifies the metadata of the data contract. May be used by tooling. |
| servers | Map[`string`, [Server Object](#server-object)] | Specifies the servers of the data contract. |
| terms | [Terms Object](#terms-object) | Specifies the terms and conditions of the data contract. |
| models | Map[`string`, [Model Object](#model-object)] | Specifies the logical data model. |
| definitions | Map[`string`, [Definition Object](#definition-object)] | Specifies definitions. |
| servicelevels | [Service Levels Object](#service-levels-object) | Specifies the service level of the provided data |
| links | Map[`string`, `string`] | Additional external documentation links. |
| tags | Array of `string` | Custom metadata to provide additional context. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
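
The table marks three root fields as REQUIRED. As a rough illustration, and not a replacement for validating against the JSON Schema, the sketch below checks an already-parsed data contract (a plain `dict`) for those fields. The helper name `missing_root_fields` is hypothetical.

```python
# Root fields marked REQUIRED in the Data Contract Object table.
REQUIRED_ROOT_FIELDS = ("dataContractSpecification", "id", "info")


def missing_root_fields(contract: dict) -> list[str]:
    """Return the REQUIRED root fields that are absent or empty."""
    return [
        name
        for name in REQUIRED_ROOT_FIELDS
        if contract.get(name) in (None, "")
    ]


contract = {
    "dataContractSpecification": "1.2.1",
    "id": "orders-latest",
    "info": {"title": "Orders Latest", "version": "2.0.0"},
}
print(missing_root_fields(contract))  # []
```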
### Info Object
Metadata and life cycle information about the data contract.
| Field | Type | Description |
|-------------|-----------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| title | `string` | REQUIRED. The title of the data contract. |
| version | `string` | REQUIRED. The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version). |
| status | `string` | The status of the data contract. Can be `proposed`, `in development`, `active`, `deprecated`, `retired`. |
| description | `string` | A description of the data contract. |
| owner | `string` | The owner or team responsible for managing the data contract and providing the data. |
| contact | [Contact Object](#contact-object) | Contact information for the data contract. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
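
The `status` values listed above can be checked in a few lines. The following sketch reports missing REQUIRED fields and a status outside the listed values. It assumes the list is meant to be exhaustive, which the table does not state explicitly, and the function name `info_problems` is hypothetical.

```python
# Status values listed in the Info Object table.
# ASSUMPTION: any other value is treated as a problem by this sketch.
KNOWN_STATUSES = ("proposed", "in development", "active", "deprecated", "retired")


def info_problems(info: dict) -> list[str]:
    """Return human-readable problems found in an Info Object."""
    problems = [
        f"missing required field: {name}"
        for name in ("title", "version")
        if not info.get(name)
    ]
    status = info.get("status")
    if status is not None and status not in KNOWN_STATUSES:
        problems.append(f"unexpected status: {status!r}")
    return problems


print(info_problems({"title": "Orders Latest", "version": "2.0.0", "status": "active"}))  # []
```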
### Contact Object
Contact information for the data contract.
| Field | Type | Description |
|-------|----------|-------------------------------------------------------------------------------------------------------|
| name | `string` | The identifying name of the contact person/organization. |
| url | `string` | The URL pointing to the contact information. This _MUST_ be in the form of a URL. |
| email | `string` | The email address of the contact person/organization. This _MUST_ be in the form of an email address. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
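
The `url` and `email` fields _MUST_ have the form of a URL and an email address, respectively. A minimal sketch of such a check is shown below. The email pattern is deliberately loose and is an assumption of this example, as is the helper name `contact_problems`; the specification does not prescribe a particular validation routine.

```python
import re
from urllib.parse import urlsplit

# ASSUMPTION: a loose "something@domain.tld" shape is sufficient for a sketch.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def contact_problems(contact: dict) -> list[str]:
    """Return problems with the optional url and email fields of a Contact Object."""
    problems = []
    url = contact.get("url")
    if url is not None:
        parts = urlsplit(url)
        # An absolute URL needs both a scheme and a network location.
        if not (parts.scheme and parts.netloc):
            problems.append(f"url is not an absolute URL: {url!r}")
    email = contact.get("email")
    if email is not None and not EMAIL_PATTERN.match(email):
        problems.append(f"email is not an email address: {email!r}")
    return problems


contact = {
    "name": "John Doe (Data Product Owner)",
    "url": "https://teams.microsoft.com/l/channel/example/checkout",
}
print(contact_problems(contact))  # []
```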
### Server Object
The available fields depend on the server `type`.
| Field | Type | Description |
|-------------|----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | `string` | REQUIRED. The type of the data product technology that implements the data contract. Well-known server types are: `bigquery`, `clickhouse`, `s3`, `glue`, `redshift`, `azure`, `sqlserver`, `snowflake`, `databricks`, `postgres`, `oracle`, `kafka`, `pubsub`, `sftp`, `kinesis`, `trino`, `local` |
| description | `string` | An optional string describing the server. |
| environment | `string` | An optional string describing the environment, e.g., prod, sit, stg. |
| roles | Array of [Server Role Object](#server-role-object) | An optional array of roles that are available and can be requested to access the server for role-based access control. E.g. separate roles for different regions or sensitive data. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
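
Because `type` is a plain string (an enum until version 1.2.0), tooling may want to distinguish the well-known types listed above from custom ones. The sketch below does exactly that; the function name `classify_server_type` and the labels `"well-known"` and `"custom"` are hypothetical.

```python
# Well-known server types as listed in the Server Object table.
WELL_KNOWN_SERVER_TYPES = frozenset({
    "bigquery", "clickhouse", "s3", "glue", "redshift", "azure",
    "sqlserver", "snowflake", "databricks", "postgres", "oracle",
    "kafka", "pubsub", "sftp", "kinesis", "trino", "local",
})


def classify_server_type(server: dict) -> str:
    """Return 'well-known' or 'custom' for a Server Object's type."""
    server_type = server.get("type")
    if not isinstance(server_type, str) or not server_type:
        raise ValueError("server.type is REQUIRED and must be a non-empty string")
    return "well-known" if server_type in WELL_KNOWN_SERVER_TYPES else "custom"


print(classify_server_type({"type": "s3", "format": "json"}))  # well-known
```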
#### BigQuery Server Object
| Field | Type | Description |
|---------|----------|-----------------------|
| type | `string` | `bigquery` |
| project | `string` | The GCP project name. |
| dataset | `string` | |
#### S3 Server Object
| Field | Type | Description |
|-------------|----------|-------------------------------------------------------------------------------------------------------------------------|
| type | `string` | `s3` |
| location | `string` | S3 URL, starting with `s3://` |
| endpointUrl | `string` | The server endpoint for S3-compatible servers, such as MinIO or Google Cloud Storage, e.g., `https://minio.example.com` |
| format | `string` | Format of files, such as `parquet`, `delta`, `json`, `csv` |
| delimiter | `string` | (Only for format = `json`), how multiple json documents are delimited within one file, e.g., `new_line`, `array` |
Example (AWS S3):
```yaml
servers:
  production:
    type: s3
    location: s3://acme-orders-prod/orders/
    format: json
    delimiter: new_line
```
Example (MinIO):
```yaml
servers:
  minio:
    type: s3
    endpointUrl: http://localhost:9000
    location: s3://my-bucket/path/
    format: delta
```
Example (Google Cloud Storage):
```yaml
servers:
  gcs:
    type: s3
    endpointUrl: https://storage.googleapis.com
    location: s3://my-bucket/path/*/*/*/*/*.parquet
    format: parquet
```
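
The README example at the top uses a `{model}` placeholder in the S3 `location` ("One folder per model"). As a sketch of how a consumer might resolve such a location for one model and split it into bucket and key pattern, consider the snippet below. Replacing `{model}` with the model name is an assumption based on that example, and `resolve_s3_location` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def resolve_s3_location(location: str, model: str) -> tuple[str, str]:
    """Substitute {model} and split an s3:// location into (bucket, key pattern)."""
    if not location.startswith("s3://"):
        raise ValueError(f"S3 location must start with s3://, got {location!r}")
    # ASSUMPTION: {model} stands for the model name, as in the README example.
    resolved = location.replace("{model}", model)
    parts = urlsplit(resolved)
    return parts.netloc, parts.path.lstrip("/")


bucket, key_pattern = resolve_s3_location(
    "s3://datacontract-example-orders-latest/v2/{model}/*.json", "orders"
)
print(bucket)       # datacontract-example-orders-latest
print(key_pattern)  # v2/orders/*.json
```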
#### Redshift Server Object
| Field | Type | Description |
|-------------------|----------|---------------------------------------------------------------------------------------------------------------------|
| type | `string` | `redshift` |
| account | `string` | |
| database | `string` | |
| schema | `string` | |
| clusterIdentifier | `string` | Identifier of the cluster. <br /> Example: `analytics-cluster` |
| host | `string` | Host of the cluster. <br /> Example: `analytics-cluster.example.eu-west-1.redshift.amazonaws.com` |
| port | `number` | Port of the cluster. <br /> Example: `5439` |
| endpoint | `string` | Endpoint of the cluster <br /> Example: `analytics-cluster.example.eu-west-1.redshift.amazonaws.com:5439/analytics` |
Example, specifying an endpoint:
```yaml
servers:
  analytics:
    type: redshift
    account: '123456789012'
    database: analytics
    schema: analytics
    endpoint: analytics-cluster.example.eu-west-1.redshift.amazonaws.com:5439/analytics
```
Example, specifying the cluster identifier:
```yaml
servers:
  analytics:
    type: redshift
    account: '123456789012'
    database: analytics
    schema: analytics
    clusterIdentifier: analytics-cluster
```
Example, specifying the cluster host:
```yaml
servers:
  analytics:
    type: redshift
    account: '123456789012'
    database: analytics
    schema: analytics
    host: analytics-cluster.example.eu-west-1.redshift.amazonaws.com
    port: 5439
```
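
The `endpoint` form bundles host, port, and database into one string (`host:port/database`), while the `host`/`port` form keeps them separate. The following sketch splits an endpoint into its parts so both forms can be handled uniformly. The helper name `parse_redshift_endpoint` is hypothetical, and the `host:port/database` layout is inferred from the example value in the table.

```python
from urllib.parse import urlsplit


def parse_redshift_endpoint(endpoint: str) -> dict:
    """Split 'host:port/database' into its components."""
    # Prefixing '//' lets urlsplit treat the string as a network location.
    parts = urlsplit("//" + endpoint)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


print(parse_redshift_endpoint(
    "analytics-cluster.example.eu-west-1.redshift.amazonaws.com:5439/analytics"
))
# {'host': 'analytics-cluster.example.eu-west-1.redshift.amazonaws.com', 'port': 5439, 'database': 'analytics'}
```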
#### Azure Server Object
| Field | Type | Description |
|----------------|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | `string` | `azure` |
| storageAccount | `string` | The storage account name that contains the files |
| location | `string` | Path to Azure Blob Storage or Azure Data Lake Storage (ADLS) in the storage account; supports globs. Starts with `az://` or `abfss://`.<br> The recommended pattern is `abfss://<container_name>/<path>`. Examples: `az://my_storage_account_name.blob.core.windows.net/my_container/path/*.parquet` or `abfss://my_container_name/path/*.parquet` |
| format | `string` | Format of files, such as `parquet`, `json`, `csv` |
| delimiter | `string` | (Only for format = `json`), how multiple json documents are delimited within one file, e.g., `new_line`, `array` |
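
The two `location` patterns place the container in different positions: after the host for `az://`, and directly after the scheme for the recommended `abfss://` pattern. The sketch below extracts container and path for exactly the two example forms shown in the table. Other Azure URL layouts are not handled, and `parse_azure_location` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_azure_location(location: str) -> dict:
    """Extract container and path from the two documented location patterns."""
    parts = urlsplit(location)
    if parts.scheme == "az":
        # az://<storage_account>.blob.core.windows.net/<container>/<path>
        container, _, path = parts.path.lstrip("/").partition("/")
        account = parts.netloc.split(".")[0]
    elif parts.scheme == "abfss":
        # abfss://<container_name>/<path>; the account comes from storageAccount.
        container, path, account = parts.netloc, parts.path.lstrip("/"), None
    else:
        raise ValueError(f"location must start with az:// or abfss://, got {location!r}")
    return {"storageAccount": account, "container": container, "path": path}


print(parse_azure_location("abfss://my_container_name/path/*.parquet"))
# {'storageAccount': None, 'container': 'my_container_name', 'path': 'path/*.parquet'}
```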
#### SQL-Server Server Object
| Field | Type | Description |
|----------|-----------|--------------------------------------------------------------------------|
| type | `string` | `sqlserver` |
| host | `string` | The host to the database server |
| port | `integer` | The port to the database server, default: `1433` |
| database | `string` | The name of the database, e.g., `database`. |
| schema | `string` | The name of the schema in the database, e.g., `dbo`. |
| driver | `string` | The name of the supported driver, e.g., `ODBC Driver 18 for SQL Server`. |
#### Snowflake Server Object
| Field | Type | Description |
|----------|----------|-------------|
| type | `string` | `snowflake` |
| account | `string` | |
| database | `string` | |
| schema | `string` | |
#### Databricks Server Object
| Field | Type | Description |
|---------|----------|---------------------------------------------------------------------|
| type | `string` | `databricks` |
| host | `string` | The Databricks host, e.g., `dbc-abcdefgh-1234.cloud.databricks.com` |
| catalog | `string` | The name of the Hive or Unity catalog |
| schema | `string` | The schema name in the catalog |
#### Postgres Server Object
| Field | Type | Description |
|----------|-----------|---------------------------------------------------------|
| type | `string` | `postgres` |
| host | `string` | The host to the database server |
| port | `integer` | The port to the database server |
| database | `string` | The name of the database, e.g., `postgres`. |
| schema | `string` | The name of the schema in the database, e.g., `public`. |
#### Oracle Server Object
| Field | Type | Description |
|-------------|-----------|---------------------------------|
| type | `string` | `oracle` |
| host | `string` | The host to the oracle server |
| port | `integer` | The port to the oracle server |
| serviceName | `string` | The name of the service |
#### Kafka Server Object
| Field | Type | Description |
|--------|----------|---------------------------------------------------------------------------|
| type | `string` | `kafka` |
| host | `string` | The bootstrap server of the kafka cluster. |
| topic | `string` | The topic name. |
| format | `string` | The format of the message. Examples: json, avro, protobuf. Default: json. |
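
Example (a minimal sketch; the server key, bootstrap server address, and topic name are placeholders):

```yaml
servers:
  production:                      # hypothetical server name
    type: kafka
    host: broker.example.com:9092  # placeholder bootstrap server
    topic: orders                  # placeholder topic name
    format: avro                   # json is the default if omitted
```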
#### Pub/Sub Server Object
| Field | Type | Description |
|---------|----------|-----------------------|
| type | `string` | `pubsub` |
| project | `string` | The GCP project name. |
| topic | `string` | The topic name. |
#### SFTP Server Object
| Field | Type | Description |
|-----------|----------|------------------------------------------------------------------------------------------------------------------|
| type | `string` | `sftp` |
| location | `string` | SFTP URL, starting with `sftp://` |
| format | `string` | Format of files, such as `parquet`, `delta`, `json`, `csv` |
| delimiter | `string` | (Only for format = `json`), how multiple json documents are delimited within one file, e.g., `new_line`, `array` |
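
Example (a minimal sketch; the server key and the SFTP URL are placeholders). The `delimiter` field is included because the format is `json`:

```yaml
servers:
  partner_export:            # hypothetical server name
    type: sftp
    location: sftp://sftp.example.com/exports/orders.json  # placeholder URL
    format: json
    delimiter: new_line      # one JSON document per line
```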
#### AWS Kinesis Data Streams Server Object
| Field | Type | Description |
|--------|----------|---------------------------------------------------------------------------|
| type | `string` | `kinesis` |
| stream | `string` | The name of the Kinesis data stream. |
| region | `string` | AWS region, e.g., `eu-west-1`. |
| format | `string` | The format of the records. Examples: json, avro, protobuf. |
#### Trino Server Object
| Field | Type | Description |
|----------|-----------|-----------------------------------------------------------|
| type | `string` | `trino` |
| host | `string` | The Trino host |
| port | `integer` | The Trino port |
| catalog | `string` | The name of the catalog, e.g., `my_catalog`. |
| schema | `string` | The name of the schema in the catalog, e.g., `my_schema`. |
#### Local Server Object
| Field | Type | Description |
|--------|----------|-------------------------------------------------------------------------------------|
| type | `string` | `local` |
| path | `string` | The relative or absolute path to the data file(s), such as `./folder/data.parquet`. |
| format | `string` | The format of the file(s), such as `parquet`, `delta`, `csv`, or `json`. |
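
Example (a minimal sketch; the server key `local` is a placeholder, and the path reuses the sample value from the table above):

```yaml
servers:
  local:                     # hypothetical server name
    type: local
    path: ./folder/data.parquet
    format: parquet
```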
#### Server Role Object
| Field | Type | Description |
|-------------|----------|--------------------------------------------------------------|
| name | `string` | Name of the role |
| description | `string` | A description of the role and what access the role provides. |
### Terms Object
The terms and conditions of the data contract.
| Field | Type | Description |
|--------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| usage | `string` | The usage describes the way the data is expected to be used. Can contain business and technical information. |
| limitations | `string` | The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for. |
| policies | Array of [Policy Object](#policy-object) | A list of policies, licenses, standards, that are applicable for this data contract and that must be acknowledged by data consumers. |
| billing | `string` | The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use. |
| noticePeriod | `string` | The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., `P3M` for a period of three months. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Policy Object
| Field | Type | Description |
|-------------|----------|-----------------------------------|
| name | `string` | Name of the policy. |
| description | `string` | A description of the policy. |
| url | `string` | A URL that refers to the policy. |
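
Example (a minimal sketch of a Terms Object with two Policy Objects, assuming it is placed under a top-level `terms` key; all texts and URLs are placeholders):

```yaml
terms:
  usage: Data can be used for reports, analytics, and machine learning use cases.
  limitations: Not suitable for real-time use cases. Data may not be used to identify individual customers.
  policies:
    - name: privacy-policy                       # placeholder policy
      url: https://example.com/privacy-policy
    - name: license                              # placeholder policy
      description: External data is licensed under a separate agreement.
      url: https://example.com/license
  billing: 5000 USD per month
  noticePeriod: P3M                              # ISO-8601 period: three months
```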
### Model Object
The Model Object describes the structure and semantics of a data model, such as tables, views, or structured files.
The name of the data model (table name) is defined by the key that refers to this Model Object.
| Field | Type | Description |
|------------------|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| type | `string` | The type of the model. Examples: `table`, `view`, `object`. Default: `table`. |
| description | `string` | An optional string describing the data model. |
| title | `string` | An optional string for the title of the data model. Especially useful if the name of the model is cryptic or contains abbreviations. |
| fields | Map[`string`, [Field Object](#field-object)] | The fields (e.g. columns) of the data model. |
| primaryKey | Array of `string` | If the primary key is a compound key, list the field names that constitute the primary key. Alternative to field-level `primaryKey`. |
| quality | Array of [Quality Object](#quality-object) | Specifies the quality attributes on model level. |
| examples | Array of `Any` | Specifies example data sets for the model. |
| additionalFields | `boolean` | Specifies whether the model can have additional fields that are not defined in the contract. Default: `false`. |
| config | [Config Object](#config-object) | Any additional key-value pairs that might be useful for further tooling. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
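
Example (a minimal sketch of a model with a compound primary key, assuming models are declared under a top-level `models` key; the model and field names are placeholders):

```yaml
models:
  line_items:                # the key is the model (table) name
    type: table
    title: Order Line Items
    description: One row per product in an order.
    primaryKey:              # compound key, alternative to field-level primaryKey
      - order_id
      - line_item_id
    additionalFields: false  # fields not listed here are not allowed
    fields:
      order_id:
        type: string
        required: true
      line_item_id:
        type: string
        required: true
```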
### Field Object
The Field Object describes one field (column, property, nested field) of a data model.
| Field | Type | Description |
|------------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the semantics of the data in this field. |
| type | [Data Type](#data-types) | The logical data type of the field. |
| title | `string` | An optional string providing a human readable name for the field. Especially useful if the field name is cryptic or contains abbreviations. |
| enum | array of `string` | A value must be equal to one of the elements in this array value. Only evaluated if the value is not null. |
| required | `boolean` | Indicates whether this field must contain a value and may not be null. Default: `false` |
| primaryKey | `boolean` | Indicates whether this field is a primary key. Default: `false` |
| references | `string` | The reference to a field in another model, similar to a foreign key relationship. E.g., use `orders.order_id` to reference the `order_id` field of the model `orders`. |
| unique | `boolean` | Indicates whether the value must be unique within the model. Default: `false` |
| format | `string` | `email`: A value must be compliant with [RFC 5321, section 4.1.2](https://www.rfc-editor.org/info/rfc5321).<br>`uri`: A value must be compliant with [RFC 3986](https://www.rfc-editor.org/info/rfc3986).<br>`uuid`: A value must be compliant with [RFC 4122](https://www.rfc-editor.org/info/rfc4122). Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| precision | `number` | The maximum number of digits in a number. Only applies to numeric values. Defaults to 38. |
| scale | `number` | The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0. |
| minLength | `number` | The length of a value must be greater than or equal to this value. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| maxLength | `number` | The length of a value must be less than or equal to this value. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| pattern | `string` | A value must be valid according to the [ECMA-262](https://262.ecma-international.org/5.1/) regular expression dialect. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| minimum | `number` | A numeric value must be greater than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values. |
| exclusiveMinimum | `number` | A numeric value must be greater than this value. Only evaluated if the value is not null. Only applies to numeric values. |
| maximum | `number` | A numeric value must be less than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values. |
| exclusiveMaximum | `number` | A numeric value must be less than this value. Only evaluated if the value is not null. Only applies to numeric values. |
| ~~example~~ | `string` | DEPRECATED, use examples. An example value. |
| examples | Array of Any | A list of example values. |
| pii | `boolean` | Indicates whether this field contains Personally Identifiable Information (PII). |
| classification | `string` | The data class defining the sensitivity level for this field, according to the organization's classification scheme. Examples may be: `sensitive`, `restricted`, `internal`, `public`. |
| tags | Array of `string` | Custom metadata to provide additional context. |
| links | Map[`string`,`string`] | Additional external documentation links. |
| $ref | `string` | A reference URI to a definition in the specification, internally or externally. Properties will be inherited from the definition. |
| fields | Map[`string`, [Field Object](#field-object)] | The nested fields (e.g. columns) of the object, record, or struct. Use only when type is `object`, `record`, or `struct`. |
| items | [Field Object](#field-object) | The type of the elements in the array. Use only when type is `array`. |
| keys | [Field Object](#field-object) | Describes the key structure of a map. Defaults to `type: string` if a map is defined as type. Not all server types support different key types. Use only when type is `map`. |
| values | [Field Object](#field-object) | Describes the value structure of a map. Use only when type is `map`. |
| quality | Array of [Quality Object](#quality-object) | Specifies the quality attributes on field level. |
| lineage | [Lineage Object](#lineage-object) | Provides information about where the data comes from. |
| config | [Config Object](#config-object) | Any additional key-value pairs that might be useful for further tooling. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
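
Example (a minimal sketch combining several field-level constraints with nested types; the model and field names are placeholders, and the types `decimal`, `integer`, `array`, `object`, and `map` are assumed to be available as listed under [Data Types](#data-types)):

```yaml
models:
  orders:
    fields:
      order_id:
        type: string
        title: Order ID
        description: Unique identifier of the order.
        primaryKey: true
        format: uuid
        examples:
          - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
      customer_email:
        type: string
        format: email
        required: true
        pii: true
        classification: sensitive
      order_total:
        type: decimal
        precision: 10        # at most 10 digits in total
        scale: 2             # at most 2 decimal places
        minimum: 0
      status:
        type: string
        enum:
          - open
          - shipped
          - cancelled
      attributes:
        type: map            # keys default to type string
        values:
          type: string
      line_items:
        type: array
        items:
          type: object
          fields:
            sku:
              type: string
              pattern: "^[A-Z0-9]{8}$"
            quantity:
              type: integer
              minimum: 1
```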
### Definition Object
The Definition Object includes clear and concise explanations of the syntax, semantics, and classification of a business object in a given domain.
It serves as a reference for a common understanding of terminology, to ensure consistent usage, and to identify joinable fields.
Model fields can refer to definitions using the `$ref` field to link to existing definitions and avoid duplicate documentation.
| Field | Type | Description |
|------------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | [Data Type](#data-types) | REQUIRED. The logical data type |
| title | `string` | The business name of this definition. |
| description | `string` | Clear and concise explanations related to the domain |
| enum | array of `string` | A value must be equal to one of the elements in this array value. Only evaluated if the value is not null. |
| format | `string` | `email`: A value must be compliant with [RFC 5321, section 4.1.2](https://www.rfc-editor.org/info/rfc5321).<br>`uri`: A value must be compliant with [RFC 3986](https://www.rfc-editor.org/info/rfc3986).<br>`uuid`: A value must be compliant with [RFC 4122](https://www.rfc-editor.org/info/rfc4122). Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| precision | `number` | The maximum number of digits in a number. Only applies to numeric values. Defaults to 38. |
| scale | `number` | The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0. |
| minLength | `number` | The length of a value must be greater than or equal to this value. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| maxLength | `number` | The length of a value must be less than or equal to this value. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| pattern | `string` | A value must be valid according to the [ECMA-262](https://262.ecma-international.org/5.1/) regular expression dialect. Only evaluated if the value is not null. Only applies to Unicode character sequence types (`string`, `text`, `varchar`). |
| minimum | `number` | A numeric value must be greater than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values. |
| exclusiveMinimum | `number` | A numeric value must be greater than this value. Only evaluated if the value is not null. Only applies to numeric values. |
| maximum | `number` | A numeric value must be less than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values. |
| exclusiveMaximum | `number` | A numeric value must be less than this value. Only evaluated if the value is not null. Only applies to numeric values. |
| examples | Array of Any | A list of example values. |
| pii | `boolean` | Indicates whether this field contains Personally Identifiable Information (PII). |
| classification | `string` | The data class defining the sensitivity level for this field, according to the organization's classification scheme. |
| tags | Array of `string` | Custom metadata to provide additional context. |
| links | Map[`string`, `string`] | Additional external documentation links. |
| fields | Map[`string`, [Field Object](#field-object)] | The nested fields (e.g. columns) of the object, record, or struct. Use only when type is `object`, `record`, or `struct`. |
| items | [Field Object](#field-object) | The type of the elements in the array. Use only when type is `array`. |
| keys | [Field Object](#field-object) | Describes the key structure of a map. Defaults to `type: string` if a map is defined as type. Not all server types support different key types. Use only when type is `map`. |
| values | [Field Object](#field-object) | Describes the value structure of a map. Use only when type is `map`. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
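
Example (a minimal sketch of a definition that is reused by a model field; it assumes definitions are declared under a top-level `definitions` key and referenced internally with a `$ref` of the form `#/definitions/<name>`; all names and values are placeholders):

```yaml
definitions:
  order_id:                  # hypothetical definition name
    type: string
    title: Order ID
    description: An internal ID that identifies an order in the online shop.
    format: uuid
    examples:
      - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2
    pii: true
    classification: restricted
    tags:
      - orders

models:
  orders:
    fields:
      order_id:
        $ref: '#/definitions/order_id'   # inherits the properties defined above
```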
### Service Levels Object
A service level is defined as an agreed-upon, measurable level of performance for the provided data.
The Data Contract Specification defines well-known service levels.
This list can be extended with custom service levels.
One can either describe each service level informally using the `description` field, or make use of the predefined fields for automation support, e.g., via the [Data Contract CLI](https://cli.datacontract.com).
| Field | Type | Description |
|--------------|-----------------------------------------------|-------------------------------------------------------------------------|
| availability | [Availability Object](#availability-object) | The promised uptime of the system that provides the data |
| retention | [Retention Object](#retention-object) | The period for which data will be available. |
| latency | [Latency Object](#latency-object) | The maximum amount of time from the source to its destination. |
| freshness | [Freshness Object](#freshness-object) | The maximum age of the youngest entry. |
| frequency | [Frequency Object](#frequency-object) | The update frequency. |
| support | [Support Object](#support-object) | The times when support is provided. |
| backup | [Backup Object](#backup-object) | The details about data backup procedures. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Availability Object
Availability refers to the promise or guarantee by the service provider about the uptime of the system that provides the data.
| Field | Type | Description |
|-------------|----------|--------------------------------------------------------------------------------|
| description | `string` | An optional string describing the availability service level. |
| percentage | `string` | An optional string describing the guaranteed uptime in percent (e.g., `99.9%`) |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Retention Object
Retention covers the period for which data will be available.
| Field | Type | Description |
|----------------|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the retention service level. |
| period | `string` | An optional period of time for which data is available. Supported formats: Simple duration (e.g., `1 year`, `30d`) and ISO 8601 duration (e.g., `P1Y`). |
| unlimited | `boolean` | An optional indicator that data is kept forever. |
| timestampField | `string` | An optional reference to the field that contains the timestamp that the period refers to. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Latency Object
Latency refers to the maximum amount of time from the source to its destination.
An example is the maximum duration from the moment an order is recorded in the e-commerce shop until it is available in the orders table in the data analytics platform. This includes the waiting time until the next batch run starts and the processing time of the pipeline.
| Field | Type | Description |
|-------------------------|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the latency service level. |
| threshold | `string` | An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: Simple duration (e.g., `24 hours`, `5s`) and ISO 8601 duration (e.g., `PT24H`). |
| sourceTimestampField | `string` | An optional reference to the field that contains the timestamp when the data was provided at the source. |
| processedTimestampField | `string` | An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Freshness Object
Freshness refers to the maximum age of the youngest entry.
| Field | Type | Description |
|-------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the freshness service level. |
| threshold | `string` | An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., `24 hours`, `5s`) and ISO 8601 duration (e.g., `PT24H`). |
| timestampField | `string` | An optional reference to the field that contains the timestamp that the threshold refers to. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Frequency Object
Frequency describes how often data is updated.
| Field | Type | Description |
|-------------|----------|-----------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the frequency service level. |
| type | `string` | An optional type of data processing. Typical values are `batch`, `micro-batching`, `streaming`, `manual`. |
| interval | `string` | Optional. Only for batch: How often the pipeline is triggered, e.g., `daily`. |
| cron | `string` | Optional. Only for batch: A cron expression that defines when the pipeline is triggered, e.g., `0 0 * * *`. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Support Object
Support describes the times when support will be available for contact.
| Field | Type | Description |
|--------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the support service level. |
| time | `string` | An optional string describing the times when support will be available for contact such as `24/7` or `business hours only`. |
| responseTime | `string` | An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with. |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
#### Backup Object
Backup specifies details about data backup procedures.
| Field | Type | Description |
|---------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| description | `string` | An optional string describing the backup service level. |
| interval | `string` | An optional interval that defines how often data will be backed up, e.g., `daily`. |
| cron | `string` | An optional cron expression when data will be backed up, e.g., `0 0 * * *`. |
| recoveryTime | `string` | An optional Recovery Time Objective (RTO): the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., `4 hours`, `24 hours`). |
| recoveryPoint | `string` | An optional Recovery Point Objective (RPO): the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., `4 hours`, `24 hours`). |
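
Example (a minimal sketch covering all predefined service levels, assuming they are placed under a top-level `servicelevels` key; the field references follow the `model.field` form used for `references`, and all values are placeholders):

```yaml
servicelevels:
  availability:
    description: The server is available during support hours.
    percentage: "99.9%"
  retention:
    description: Data is retained for one year.
    period: P1Y
    unlimited: false
    timestampField: orders.order_timestamp       # placeholder field reference
  latency:
    description: Data is available within 25 hours after the order was placed.
    threshold: 25h
    sourceTimestampField: orders.order_timestamp
    processedTimestampField: orders.processed_timestamp
  freshness:
    description: The age of the youngest row in a table.
    threshold: 25h
    timestampField: orders.order_timestamp
  frequency:
    description: Data is delivered once a day.
    type: batch
    interval: daily
    cron: "0 0 * * *"                            # every day at midnight
  support:
    description: The data is supported during business hours.
    time: business hours only
    responseTime: 1h
  backup:
    description: Data is backed up once a week.
    interval: weekly
    cron: "0 0 * * 0"                            # every Sunday at midnight
    recoveryTime: 24 hours
    recoveryPoint: 1 week
```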
### Quality Object
The quality object defines quality attributes.
Quality attributes are checks that can be applied to the data to ensure its quality.
Data can be verified by executing these checks through a data quality engine.
Quality attributes can be:
- A text in natural language that describes the quality of the data.
- A predefined metric from the library of commonly used metrics.
- An individual SQL query that returns a single value that can be compared.
- Engine-specific types: Pre-defined quality checks, as defined by data quality libraries. Currently, the engines `soda` and `great-expectations` are supported.
A quality object can be specified on the field level and on the model level.
The top-level quality object is deprecated.
#### Description Text
A description in natural language that defines the expected quality of the data.
This is useful to express requirements or expectations when discussing the data contract with stakeholders.
Later in the development process, these might be translated into an executable check (such as `sql`).
It can also be used as a prompt to check the data with an AI engine.
| Field | Type | Description |
|-------------|----------|--------------------------------------------------------------------|
| type | `string` | `text` |
| description | `string` | A plain text describing the quality attribute in natural language. |
Example:
```yaml
models:
  my_table:
    fields:
      account_iban:
        quality:
          - type: text
            description: Must be a valid IBAN. Must not be empty.
```
#### SQL
An individual SQL query that returns a single number that can be compared with a threshold. The SQL query must be in the SQL dialect of the provided server.
> __Note:__ Establish a secure development process and use read-only connections, as the misuse of SQL queries can lead to SQL injection attacks.
| Field | Type | Description |
|----------------------------|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | `string` | `sql` |
| description | `string` | A plain text describing the quality of the data. |
| query | `string` | A SQL query that returns a single number to compare with the threshold. |
| dialect | `string` | The SQL dialect that is used for the query. Should be compatible with the server type. Examples: `postgres`, `spark`, `bigquery`, `snowflake`, `duckdb`, ... |
| mustBe | `integer` | The threshold to check the return value of the query |
| mustNotBe | `integer` | The threshold to check the return value of the query |
| mustBeGreaterThan | `integer` | The threshold to check the return value of the query |
| mustBeGreaterThanOrEqualTo | `integer` | The threshold to check the return value of the query |
| mustBeLessThan | `integer` | The threshold to check the return value of the query |
| mustBeLessThanOrEqualTo | `integer` | The threshold to check the return value of the query |
| mustBeBetween | array of two integers | The threshold to check the return value of the query. Boundaries are inclusive. |
| mustNotBeBetween | array of two integers | The threshold to check the return value of the query. Boundaries are inclusive. |
In the query the following placeholders can be used:
| Placeholder | Description |
|-------------|----------------------------------------------------------------------------------------|
| `{model}` | The name of the model that is checked. |
| `{table}` | Alias for `{model}`. |
| `{field}` | The name of the field that is checked (only if the quality is defined on field-level). |
| `{column}` | Alias for `{field}`. |
Example:
```yaml
models:
  orders:
    quality:
      - type: sql
        description: The maximum duration between two orders must be less than 3600 seconds
        query: |
          SELECT MAX(duration) AS max_duration
          FROM (
            SELECT EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp) OVER (ORDER BY order_timestamp))) AS duration
            FROM {model}
          )
        mustBeLessThan: 3600
```
SQL queries allow powerful checks for custom business logic.
A SQL query should not run longer than 10 minutes.
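
Example with placeholders on field level (a minimal sketch; the model name, field name, and the `decimal` type are placeholders chosen for illustration). Here `{model}` resolves to the model name and `{field}` to the field on which the quality attribute is defined:

```yaml
models:
  orders:
    fields:
      order_total:
        type: decimal
        quality:
          - type: sql
            description: No order may have a negative total.
            query: |
              SELECT COUNT(*) AS negative_totals
              FROM {model}
              WHERE {field} < 0
            mustBe: 0
```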
#### Library / Metrics
A set of predefined metrics commonly used in data quality checks, designed to be compatible with all major data quality engines. This simplifies the work for data engineers by eliminating the need to manually write SQL queries.
These metrics are aligned with ODCS 3.1.
| Field | Type | Description |
|------------------------|-----------------------|----------------------------------------------------------------------------------|
| type | `string` | `library` (can be omitted, if `metric` is defined) |
| metric | `string` | `nullValues`, `missingValues`, `invalidValues`, `duplicateValues`, or `rowCount` |
| arguments | `object` | Some metrics require additional arguments |
| description | `string` | A plain text describing the quality of the data. |
| mustBe | `integer` | The threshold to check the return value of the query |
| mustNotBe | `integer` | The threshold to check the return value of the query |
| mustBeGreaterThan | `integer` | The threshold to check the return value of the query |
| mustBeGreaterOrEqualTo | `integer` | The threshold to check the return value of the query |
| mustBeLessThan | `integer` | The threshold to check the return value of the query |
| mustBeLessOrEqualTo | `integer` | The threshold to check the return value of the query |
| mustBeBetween | array of two integers | The threshold to check the return value of the query. Boundaries are inclusive. |
| mustNotBeBetween | array of two integers | The threshold to check the return value of the query. Boundaries are inclusive. |
| unit | `string` | `rows` (default) or `percent` |
Metrics:
| Metric | Level | Description | Arguments | Arguments Example |
|--------|--------|----------------------------------------------------------------|------------------------------------------------------------------|----------------------------------------------------------------------|
| `nullValues` | Property | Counts null values in a column/field | None | |
| `missingValues` | Property | Counts values considered as missing (empty strings, N/A, etc.) | `missingValues`: Array of values considered missing | `missingValues: [null, '', 'N/A']` |
| `invalidValues` | Property | Counts values that don't match valid criteria | `validValues`: Array of valid values<br>`pattern`: Regex pattern | `validValues: ['pounds', 'kg']`<br>`pattern: '^[A-Z]{2}[0-9]{2}...'` |
| `duplicateValues` | Property | Counts duplicate values in a column | None | |
| `duplicateValues` | Schema | Counts duplicate values across multiple columns | `properties`: Array of property names | `properties: ['tenant_id', 'order_id']` |
| `rowCount` | Schema | Counts total number of rows in a table/object store | None | |
Example:
```yaml
properties:
  - name: email_address
    quality:
      - metric: missingValues
        arguments:
          missingValues: [null, '', 'N/A', 'n/a']
        mustBeLessThan: 5
        unit: percent # rows (default) or percent
```
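
Example on model level (a minimal sketch, assuming that metrics with level `Schema` are placed in the `quality` list of a model; the model name, property names, and thresholds are placeholders):

```yaml
models:
  orders:
    quality:
      - metric: duplicateValues
        description: The combination of tenant and order must be unique.
        arguments:
          properties: [tenant_id, order_id]
        mustBe: 0
      - metric: rowCount
        description: The table must not be empty.
        mustBeGreaterThan: 0
```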
#### Custom
You can define custom quality attributes that are specific to a data quality engine.
#### Custom (Engine: Soda)
Soda has a number of predefined quality [checks](https://docs.soda.io/soda/data-contracts-checks.html) that can be referenced as quality attributes.
Soda checks can be applied on model and field level.
> __Note:__ The Soda data contract check reference is experimental and may change in the future. It is currently supported only for Postgres, Snowflake, and Spark (Databricks).
| Field | Type | Description |
|---------------|----------|-----------------------------------------------------------------------------------------------------------------------------|
| type | `string` | `custom` |
| description | `string` | Optional. A plain text describing the quality attribute in natural language. |
| engine | `string` | `soda` |
| implementation | `object` | A check type as defined in the [Data contract check reference](https://docs.soda.io/soda/data-contracts-checks.html) |
See the [Data contract check reference](https://docs.soda.io/soda/data-contracts-checks.html) for all possible types and configuration values.
Example:
```yaml
models:
  my_table:
    fields:
      order_id:
        type: string
        quality:
          - type: custom
            description: This is a check on field level
            engine: soda
            implementation:
              type: no_duplicate_values
      carrier:
        type: string
      shipment_number:
        type: string
    quality:
      - type: custom
        description: This is a check on model level
        engine: soda
        implementation:
          type: duplicate_percent
          columns:
            - carrier
            - shipment_number
          must_be_less_than: 1.0
      - type: custom
        description: This is a check on model level
        engine: soda
        implementation:
          type: row_count
          must_be_greater_than: 500000
```
#### Custom (Engine: Great Expectations)
Quality attributes can be defined as Great Expectations [Expectations](https://greatexpectations.io/expectations/).
Expectations are applied at the model level.
| Field | Type | Description |
|---------------|----------|-----------------------------------------------------------------------------------------------------|
| description | `string` | Optional. A plain text describing the quality attribute in natural language. |
| engine | `string` | `great-expectations` |
| implementation | `object` | An expectation type as listed in [Expectation](https://greatexpectations.io/expectations/) as YAML. |
Example:
```yaml
models:
  my_table:
    quality:
      - type: custom
        engine: great-expectations
        implementation:
          expectation_type: expect_table_row_count_to_be_between
          kwargs:
            min_value: 10000
            max_value: 50000
          meta:
            notes: "This expectation is crucial to avoid processing datasets that are too small or too large."
      - type: custom
        engine: great-expectations
        description: "Check that passenger_count values are between 1 and 6."
        implementation:
          expectation_type: expect_column_values_to_be_between
          kwargs:
            column: passenger_count
            max_value: 6
            min_value: 1
            mostly: 1.0
            strict_max: false
            strict_min: false
          meta:
            tags:
              - business-critical
              - range_check
```
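The `kwargs` of the second expectation can be read as a small predicate. The following sketch is plain Python and does not call the Great Expectations library; the function name `values_between` is hypothetical, and it assumes that null values are ignored and that `mostly` is the minimum fraction of non-null values that must fall inside the range.

```python
from typing import Optional, Sequence


def values_between(
    values: Sequence[Optional[float]],
    min_value: float,
    max_value: float,
    mostly: float = 1.0,
    strict_min: bool = False,
    strict_max: bool = False,
) -> bool:
    """Check that at least `mostly` of the non-null values lie within the given range."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return True  # nothing to check

    def in_range(v: float) -> bool:
        lower_ok = v > min_value if strict_min else v >= min_value
        upper_ok = v < max_value if strict_max else v <= max_value
        return lower_ok and upper_ok

    passing = sum(1 for v in non_null if in_range(v))
    return passing / len(non_null) >= mostly


# Same parameters as in the YAML example: inclusive bounds 1..6, every value must pass.
passenger_counts = [1, 2, 6, None, 4]
ok = values_between(passenger_counts, min_value=1, max_value=6, mostly=1.0)
```

Lowering `mostly` (for example to `0.95`) would tolerate a small share of outliers instead of failing on the first one.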
### Lineage Object
Field-level lineage provides optional, fine-grained information about where the data comes from and how it was transformed.
The lineage object is based on the OpenLineage [Column Level Lineage Dataset Facet](https://openlineage.io/docs/spec/facets/dataset-facets/column_lineage_facet) to describe the input fields.
| Field | Type | Description |
|-------------|---------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| inputFields | Array of [InputField Object](#inputfield-object) | The input fields refer to specific fields, columns, or data points from source systems or other data contracts that feed into a particular transformation, calculation, or final result. |
#### InputField Object
| Field | Type | Description |
|-----------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| namespace | `string` | The input dataset namespace, such as the name of the source system or the domain of another data contract. Examples: `com.example.crm`, `checkout`, `snowflake://{account name}`. [More on namespace](https://openlineage.io/blog/whats-in-a-namespace/#namespaces-in-the-spec) |
| name | `string` | The input dataset name, such as a reference to a data contract, a fully qualified table name, a Kafka topic. |
| field | `string` | The input field name, such as the field in an upstream data contract, a table column or a JSON Path. |
| transformations | Array of [Transformation Object](#transformation-object) | Optional. This describes how the input field data was used to generate the final result. |
#### Transformation Object
| Field | Type | Description |
|-------------|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | `string` | Indicates how direct the relationship is, e.g., in a query. Allowed values are: `DIRECT` and `INDIRECT`. |
| subtype | `string` | Optional. Contains more specific information about the transformation.<br>Allowed values for type `DIRECT`: `IDENTITY`, `TRANSFORMATION`, `AGGREGATION`.<br>Allowed values for type `INDIRECT`: `JOIN`, `GROUP_BY`, `FILTER`, `SORT`, `WINDOW`, `CONDITIONAL`. |
| description | `string` | Optional. A string representation of the transformation applied. |
| masking | `boolean` | Optional. Boolean value indicating if the input value was obfuscated during the transformation. |
Example:
```yaml
models:
  orders:
    fields:
      order_id:
        type: string
        lineage:
          inputFields:
            - namespace: com.example.service.checkout
              name: checkout_db.orders
              field: order_id
              transformations:
                - type: DIRECT
                  subtype: IDENTITY
                  description: The order ID from the checkout order
            - namespace: com.example.service.checkout
              name: checkout_db.orders
              field: order_timestamp
              transformations:
                - type: INDIRECT
                  subtype: SORT
      customer_email_address_hash:
        type: string
        lineage:
          inputFields:
            - namespace: com.example.service.checkout
              name: checkout_db.orders
              field: email_address
              transformations:
                - type: DIRECT
                  subtype: TRANSFORMATION
                  description: The email address from the checkout order, hashed with SHA-256
                  masking: true
```
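The allowed `type` and `subtype` combinations from the Transformation Object table can be checked mechanically. The sketch below is one possible way to do that in Python; the function name `validate_transformation` and the error message wording are hypothetical and not part of the specification or any tool.

```python
from typing import Any, Dict, List

# Allowed subtypes per type, taken from the Transformation Object table.
ALLOWED_SUBTYPES = {
    "DIRECT": {"IDENTITY", "TRANSFORMATION", "AGGREGATION"},
    "INDIRECT": {"JOIN", "GROUP_BY", "FILTER", "SORT", "WINDOW", "CONDITIONAL"},
}


def validate_transformation(transformation: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a single transformation entry."""
    errors: List[str] = []
    kind = transformation.get("type")
    if kind not in ALLOWED_SUBTYPES:
        errors.append(f"type must be DIRECT or INDIRECT, got {kind!r}")
        return errors
    subtype = transformation.get("subtype")  # optional
    if subtype is not None and subtype not in ALLOWED_SUBTYPES[kind]:
        errors.append(f"subtype {subtype!r} is not allowed for type {kind}")
    masking = transformation.get("masking")  # optional
    if masking is not None and not isinstance(masking, bool):
        errors.append("masking must be a boolean")
    return errors


valid = validate_transformation({"type": "DIRECT", "subtype": "IDENTITY"})
invalid = validate_transformation({"type": "DIRECT", "subtype": "SORT"})
```

Here `valid` is an empty list, while `invalid` contains one message because `SORT` is only listed for `INDIRECT` transformations.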
### Config Object
The config field can be used to set additional metadata that may be used by tools, e.g. to define a namespace for code generation, specify physical data types, toggle tests, etc.
A config field can be added with any name. The value can be null, a primitive, an array or an object.
For developer experience, a list of well-known field names is maintained here, as these fields are used in the Data Contract CLI:
| Field | Type | Description |
|-----------------|----------|----------------------------------------------------------------------------------------------------------------|
| avroNamespace | `string` | (Only on model level) The namespace to use when importing and exporting the data model from / to Apache Avro. |
| avroType | `string` | (Only on field level) Specify the field type to use when exporting the data model to Apache Avro. |
| avroLogicalType | `string` | (Only on field level) Specify the logical field type to use when exporting the data model to Apache Avro. |
| bigqueryType | `string` | (Only on field level) Specify the physical column type that is used in a BigQuery table, e.g., `NUMERIC(5, 2)` |
| snowflakeType | `string` | (Only on field level) Specify the physical column type that is used in a Snowflake table, e.g., `TIMESTAMP_LTZ` |
| redshiftType | `string` | (Only on field level) Specify the physical column type that is used in a Redshift table, e.g., `SMALLINT` |
| sqlserverType | `string` | (Only on field level) Specify the physical column type that is used in a SQL Server table, e.g., `DATETIME2` |
| databricksType | `string` | (Only on field level) Specify the physical column type that is used in a Databricks table |
| glueType | `string` | (Only on field level) Specify the physical column type that is used in an AWS Glue Data Catalog table |
This object _MAY_ be extended with [Specification Extensions](#specification-extensions).
Example:
```yaml
models:
  orders:
    config:
      avroNamespace: "my.namespace"
    fields:
      my_field_1:
        description: Example for AVRO with Timestamp (millisecond precision)
        type: timestamp
        config:
          avroType: long
          avroLogicalType: timestamp-millis
          snowflakeType: timestamp_tz
```
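As an illustration of how a tool might consume these well-known config fields, the following Python sketch looks up a physical column type for a given server type. It is not taken from the Data Contract CLI: the function name `physical_type`, the `CONFIG_KEYS` mapping, and the fallback to the logical `type` are assumptions made for this example.

```python
from typing import Any, Dict, Optional

# Server type -> well-known config field (field names from the table above).
CONFIG_KEYS = {
    "bigquery": "bigqueryType",
    "snowflake": "snowflakeType",
    "redshift": "redshiftType",
    "sqlserver": "sqlserverType",
    "databricks": "databricksType",
    "glue": "glueType",
}


def physical_type(field: Dict[str, Any], server_type: str) -> Optional[str]:
    """Prefer an explicit config override; otherwise fall back to the logical type."""
    config = field.get("config") or {}
    key = CONFIG_KEYS.get(server_type)
    if key is not None and isinstance(config.get(key), str):
        return config[key]
    # Assumption: a real exporter would map the logical type to a dialect-specific type here.
    return field.get("type")


# The field from the YAML example, as a parsed dictionary.
my_field_1 = {
    "type": "timestamp",
    "config": {
        "avroType": "long",
        "avroLogicalType": "timestamp-millis",
        "snowflakeType": "timestamp_tz",
    },
}
snowflake_type = physical_type(my_field_1, "snowflake")
bigquery_type = physical_type(my_field_1, "bigquery")
```

For this field, the Snowflake lookup uses the explicit `snowflakeType` override, while the BigQuery lookup falls back to the logical type because no `bigqueryType` is configured.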
### Data Types
The following data types are supported for model fields and definitions:
- Unicode character sequence: `string`, `text`, `varchar`
- Any numeric type, either integers or floating point numbers: `number`, `decimal`, `numeric`
- 32-bit signed integer: `int`, `integer`
- 64-bit signed integer: `long`, `bigint`
- Single precision (32-bit) IEEE 754 floating-point number: `float`
- Double precision (64-bit) IEEE 754 floating-point number: `double`
- Boolean (true/false) value: `boolean`
- Timestamp with timezone: `timestamp`, `timestamp_tz`
- Timestamp with no timezone: `timestamp_ntz`
- Date with no time information: `date`
- Time with no date information: `time`
- Array: `array`
- Map: `map` (may not be supported by some server types)
- Sequence of 8-bit unsigned bytes: `bytes`
- Complex type: `object`, `record`, `struct`
- Semi-structured data: `variant` (may not be supported by some server types)
- JSON data: `json` (may not be supported by some server types)
- No value: `null`
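Several entries in the list above group multiple type names under one description. A tool that wants to treat those names uniformly could normalize them first. The Python sketch below shows one way to do this; treating the first name of each group as the canonical one, and the names `TYPE_GROUPS` and `canonical_type`, are assumptions for illustration and are not defined by the specification.

```python
# Groups of type names that share one description in the list above.
TYPE_GROUPS = {
    "string": ("string", "text", "varchar"),
    "number": ("number", "decimal", "numeric"),
    "int": ("int", "integer"),
    "long": ("long", "bigint"),
    "timestamp": ("timestamp", "timestamp_tz"),
    "object": ("object", "record", "struct"),
}
# Type names that are listed on their own.
SINGLE_TYPES = (
    "float", "double", "boolean", "timestamp_ntz", "date", "time",
    "array", "map", "bytes", "variant", "json", "null",
)

# Lookup table: every supported type name -> its canonical name.
CANONICAL = {alias: name for name, aliases in TYPE_GROUPS.items() for alias in aliases}
CANONICAL.update({name: name for name in SINGLE_TYPES})


def canonical_type(type_name: str) -> str:
    """Map a supported type name to its canonical name, or raise ValueError."""
    try:
        return CANONICAL[type_name]
    except KeyError:
        raise ValueError(f"unsupported data type: {type_name!r}") from None


normalized = [canonical_type(t) for t in ("varchar", "bigint", "struct", "date")]
```

Unknown names raise a `ValueError`, which makes typos in a `type` value visible early.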
### Specification Extensions
While the Data Contract Specification tries to accommodate most use cases, additional data can be added to extend the specification at certain points.
A custom field can be added with any name. The value can be null, a primitive, an array or an object.
Tooling
---
- [Data Contract CLI](https://github.com/datacontract/datacontract-cli) is an open-source CLI tool to help you create, develop, and maintain your data contracts.
- [Data Contract Manager](https://www.datamesh-manager.com/) is a commercial tool to manage data contracts. It includes a data contract catalog, a Web-Editor, and a request and approval workflow to automate access to data products for a full enterprise data marketplace.
- [Data Contract GPT](https://gpt.datacontract.com) is a custom GPT that can help you write data contracts.
- [Data Contract Editor](https://editor.datacontract.com) is an open-source editor for data contracts, including a live HTML preview.
Code Completion
---
The [JSON Schema](https://datacontract.com/datacontract.schema.json) of the current data contract specification is registered in [Schema Store](https://www.schemastore.org/), which brings code completion and syntax checks for all major IDEs.
IntelliJ comes with a built-in YAML plugin which will show you autocompletions.
For VS Code, we recommend installing the [YAML](https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml) extension.
No additional configuration is required.
Autocompletion is then enabled for files following these patterns:
```
datacontract.yaml
datacontract.yml
*-datacontract.yaml
*-datacontract.yml
*.datacontract.yaml
*.datacontract.yml
datacontract-*.yaml
datacontract-*.yml
**/datacontract/*.yml
**/datacontract/*.yaml
**/datacontracts/*.yml
**/datacontracts/*.yaml
```
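To check whether a given file would be picked up, the patterns above can be approximated in a few lines of Python. This is a simplified sketch, not the matching logic used by Schema Store or any IDE: the function name `is_data_contract_file` is hypothetical, and the `**/datacontract/*` style patterns are reduced to "a YAML file directly inside a folder named `datacontract` or `datacontracts`".

```python
import fnmatch
import posixpath

# File-name patterns copied from the list above.
NAME_PATTERNS = [
    "datacontract.yaml",
    "datacontract.yml",
    "*-datacontract.yaml",
    "*-datacontract.yml",
    "*.datacontract.yaml",
    "*.datacontract.yml",
    "datacontract-*.yaml",
    "datacontract-*.yml",
]
# Simplified stand-in for the `**/datacontract/*` and `**/datacontracts/*` patterns.
CONTRACT_DIRS = {"datacontract", "datacontracts"}
YAML_SUFFIXES = (".yml", ".yaml")


def is_data_contract_file(path: str) -> bool:
    """Return True if a forward-slash path matches one of the patterns."""
    name = posixpath.basename(path)
    parent = posixpath.basename(posixpath.dirname(path))
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in NAME_PATTERNS):
        return True
    return parent in CONTRACT_DIRS and name.endswith(YAML_SUFFIXES)


matches = [
    path
    for path in ("orders-datacontract.yml", "contracts/datacontracts/orders.yaml", "orders.yaml")
    if is_data_contract_file(path)
]
```

A plain `orders.yaml` outside a `datacontract` or `datacontracts` folder does not match, so it needs to be renamed or moved to get autocompletion.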
Authors
---
The Data Contract Specification was originally created by [Jochen Christ](https://www.linkedin.com/in/jochenchrist/) and [Dr. Simon Harrer](https://www.linkedin.com/in/simonharrer/), and is currently maintained by them.
Contributing
---
Contributions are welcome! Please open an issue or a pull request.
License
---
[MIT License](LICENSE)
<a href="https://github.com/datacontract/datacontract-specification/" class="github-corner" aria-label="View source on GitHub"><svg width="80" height="80" viewBox="0 0 250 250" style="fill:#151513; color:#fff; position: absolute; top: 0; border: 0; right: 0;" aria-hidden="true"><path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path><path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2" fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path><path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z" fill="currentColor" class="octo-body"></path></svg></a><style>.github-corner:hover .octo-arm{animation:octocat-wave 560ms ease-in-out}@keyframes octocat-wave{0%,100%{transform:rotate(0)}20%,60%{transform:rotate(-25deg)}40%,80%{transform:rotate(10deg)}}@media (max-width:500px){.github-corner:hover .octo-arm{animation:none}.github-corner .octo-arm{animation:octocat-wave 560ms ease-in-out}}</style>
================================================
FILE: _config.yml
================================================
plugins:
- jekyll-sitemap
name: Data Contract Specification
title: null
description: Data contracts bring data providers and data consumers together.
================================================
FILE: _layouts/default.html
================================================
<!DOCTYPE html>
<html lang="{{ site.lang | default: "en-US" }}">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta property="og:image" content="https://datacontract.com/images/datacontract-preview.png" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:image" content="https://datacontract.com/images/datacontract-preview.png" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" type="image/png" href="images/favicon.png">
{% seo %}
<link rel="stylesheet" href="{{ "/assets/css/style.css?v=" | append: site.github.build_revision | relative_url }}">
<style>
.footer {
text-align: center;
margin-bottom: 1rem;
}
.footer-logo {
width: 150px;
}
</style>
</head>
<body>
<div class="container-lg px-3 my-5 markdown-body">
{% if site.title and site.title != page.title %}
<h1><a href="{{ "/" | absolute_url }}">{{ site.title }}</a></h1>
{% endif %}
{{ content }}
{% if site.github.private != true and site.github.license %}
<div class="footer border-top border-gray-light mt-5 pt-3 text-right text-gray">
This site is open source. {% github_edit_link "Improve this page" %}.
</div>
{% endif %}
</div>
<footer class="footer">
<p style="margin-top: 2em;">
<a href="https://www.innoq.com">
<img src="/images/supported-by-innoq--petrol-apricot.svg" class="footer-logo" />
</a>
</p>
<p>
<a href="https://www.innoq.com/en/impressum/">Legal Notice</a>
|
<a href="https://www.innoq.com/en/datenschutz/">Privacy</a>
</p>
</footer>
<script src="{{ "assets/javascript/anchor-js/anchor.min.js" | relative_url }}"></script>
<script>anchors.add();</script>
{% if site.google_analytics %}
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', '{{ site.google_analytics }}', 'auto');
ga('send', 'pageview');
</script>
{% endif %}
<!-- 100% privacy friendly analytics -->
<script async defer src="https://scripts.simpleanalyticscdn.com/latest.js"></script>
<noscript><img src="https://queue.simpleanalyticscdn.com/noscript.gif" alt="" referrerpolicy="no-referrer-when-downgrade" /></noscript>
</body>
</html>
================================================
FILE: datacontract.init.yaml
================================================
dataContractSpecification: 1.2.1
id: my-data-contract-id
info:
  title: My Data Contract
  version: 0.0.1
#  description:
#  owner:
#  contact:
#    name:
#    url:
#    email:
### servers
#servers:
#  production:
#    type: s3
#    location: s3://
#    format: parquet
#    delimiter: new_line
### terms
#terms:
#  usage:
#  limitations:
#  billing:
#  noticePeriod:
### models
# models:
#   my_model:
#     description:
#     type:
#     fields:
#       my_field:
#         type:
#         description:
### definitions
# definitions:
#   my_field:
#     domain:
#     name:
#     title:
#     type:
#     description:
#     example:
#     pii:
#     classification:
### servicelevels
#servicelevels:
#  availability:
#    description: The server is available during support hours
#    percentage: 99.9%
#  retention:
#    description: Data is retained for one year
#    period: P1Y
#    unlimited: false
#  latency:
#    description: Data is available within 25 hours after the order was placed
#    threshold: 25h
#    sourceTimestampField: orders.order_timestamp
#    processedTimestampField: orders.processed_timestamp
#  freshness:
#    description: The age of the youngest row in a table.
#    threshold: 25h
#    timestampField: orders.order_timestamp
#  frequency:
#    description: Data is delivered once a day
#    type: batch # or streaming
#    interval: daily # for batch: use either interval or cron
#    cron: 0 0 * * * # for batch: use either cron or interval
#  support:
#    description: The data is available during typical business hours at headquarters
#    time: 9am to 5pm in EST on business days
#    responseTime: 1h
#  backup:
#    description: Data is backed up once a week, every Sunday at 0:00 UTC.
#    interval: weekly
#    cron: 0 0 * * 0
#    recoveryTime: 24 hours
#    recoveryPoint: 1 week
================================================
FILE: datacontract.schema.json
================================================
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "DataContractSpecification",
"properties": {
"dataContractSpecification": {
"type": "string",
"title": "DataContractSpecificationVersion",
"enum": [
"1.2.1",
"1.2.0",
"1.1.0",
"0.9.3",
"0.9.2",
"0.9.1",
"0.9.0"
],
"description": "Specifies the Data Contract Specification being used."
},
"id": {
"type": "string",
"description": "Specifies the identifier of the data contract."
},
"info": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title of the data contract."
},
"version": {
"type": "string",
"description": "The version of the data contract document (which is distinct from the Data Contract Specification version or the Data Product implementation version)."
},
"status": {
"type": "string",
"description": "The status of the data contract. Can be proposed, in development, active, deprecated, retired.",
"examples": [
"proposed",
"in development",
"active",
"deprecated",
"retired"
]
},
"description": {
"type": "string",
"description": "A description of the data contract."
},
"owner": {
"type": "string",
"description": "The owner or team responsible for managing the data contract and providing the data."
},
"contact": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The identifying name of the contact person/organization."
},
"url": {
"type": "string",
"format": "uri",
"description": "The URL pointing to the contact information. This MUST be in the form of a URL."
},
"email": {
"type": "string",
"format": "email",
"description": "The email address of the contact person/organization. This MUST be in the form of an email address."
}
},
"description": "Contact information for the data contract.",
"additionalProperties": true
}
},
"additionalProperties": true,
"required": [
"title",
"version"
],
"description": "Metadata and life cycle information about the data contract."
},
"servers": {
"type": "object",
"description": "Information about the servers.",
"additionalProperties": {
"$ref": "#/$defs/BaseServer",
"allOf": [
{
"if": {
"properties": {
"type": {
"const": "bigquery"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/BigQueryServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "postgres"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/PostgresServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "s3"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/S3Server"
}
},
{
"if": {
"properties": {
"type": {
"const": "sftp"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/SftpServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "redshift"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/RedshiftServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "azure"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/AzureServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "sqlserver"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/SqlserverServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "snowflake"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/SnowflakeServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "databricks"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/DatabricksServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "dataframe"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/DataframeServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "glue"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/GlueServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "oracle"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/OracleServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "kafka"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/KafkaServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "pubsub"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/PubSubServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "kinesis"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/KinesisDataStreamsServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "trino"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/TrinoServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "clickhouse"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/ClickhouseServer"
}
},
{
"if": {
"properties": {
"type": {
"const": "local"
}
},
"required": [
"type"
]
},
"then": {
"$ref": "#/$defs/LocalServer"
}
}
]
}
},
"terms": {
"type": "object",
"description": "The terms and conditions of the data contract.",
"properties": {
"usage": {
"type": "string",
"description": "The usage describes the way the data is expected to be used. Can contain business and technical information."
},
"limitations": {
"type": "string",
"description": "The limitations describe the restrictions on how the data can be used, can be technical or restrictions on what the data may not be used for."
},
"policies": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"description": "The type of the policy.",
"examples": [
"privacy",
"security",
"retention",
"compliance"
]
},
"description": {
"type": "string",
"description": "A description of the policy."
},
"url": {
"type": "string",
"format": "uri",
"description": "A URL to the policy document."
}
},
"additionalProperties": true
},
"description": "A list of policies that apply to the data, such as privacy, security, retention, or compliance policies."
},
"billing": {
"type": "string",
"description": "The billing describes the pricing model for using the data, such as whether it's free, having a monthly fee, or metered pay-per-use."
},
"noticePeriod": {
"type": "string",
"description": "The period of time that must be given by either party to terminate or modify a data usage agreement. Uses ISO-8601 period format, e.g., 'P3M' for a period of three months."
}
},
"additionalProperties": true
},
"models": {
"description": "Specifies the logical data model. Use the model's name (e.g., the table name) as the key.",
"type": "object",
"minProperties": 1,
"propertyNames": {
"pattern": "^[a-zA-Z0-9_-]+$"
},
"additionalProperties": {
"type": "object",
"title": "Model",
"properties": {
"description": {
"type": "string"
},
"type": {
"description": "The type of the model. Examples: table, view, object. Default: table.",
"type": "string",
"title": "ModelType",
"default": "table",
"enum": [
"table",
"view",
"object"
]
},
"title": {
"type": "string",
"description": "An optional string providing a human readable name for the model. Especially useful if the model name is cryptic or contains abbreviations.",
"examples": [
"Purchase Orders",
"Air Shipments"
]
},
"fields": {
"description": "Specifies a field in the data model. Use the field name (e.g., the column name) as the key.",
"type": "object",
"additionalProperties": {
"type": "object",
"title": "Field",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the semantic of the data in this field."
},
"title": {
"type": "string",
"description": "An optional string providing a human readable name for the field. Especially useful if the field name is cryptic or contains abbreviations."
},
"type": {
"$ref": "#/$defs/FieldType"
},
"required": {
"type": "boolean",
"default": false,
"description": "An indication, if this field must contain a value and may not be null."
},
"fields": {
"description": "The nested fields (e.g. columns) of the object, record, or struct.",
"type": "object",
"additionalProperties": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
}
},
"items": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"keys": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"values": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"primary": {
"type": "boolean",
"deprecationMessage": "Use the primaryKey field instead."
},
"primaryKey": {
"type": "boolean",
"default": false,
"description": "If this field is a primary key."
},
"references": {
"type": "string",
"description": "The reference to a field in another model. E.g. use 'orders.order_id' to reference the order_id field of the model orders. Think of defining a foreign key relationship.",
"examples": [
"orders.order_id",
"model.nested_field.field"
]
},
"unique": {
"type": "boolean",
"default": false,
"description": "An indication, if the value must be unique within the model."
},
"enum": {
"type": "array",
"items": {
"type": "string"
},
"uniqueItems": true,
"description": "A value must be equal to one of the elements in this array value. Only evaluated if the value is not null."
},
"minLength": {
"type": "integer",
"description": "The length of the value must be greater than or equal to this value. Only applies to string types."
},
"maxLength": {
"type": "integer",
"description": "The length of the value must be less than or equal to this value. Only applies to string types."
},
"format": {
"type": "string",
"description": "A specific format the value must comply with (e.g., 'email', 'uri', 'uuid').",
"examples": [
"email",
"uri",
"uuid"
]
},
"precision": {
"type": "number",
"examples": [
38
],
"description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
},
"scale": {
"type": "number",
"examples": [
0
],
"description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
},
"pattern": {
"type": "string",
"description": "A regular expression the value must match. Only applies to string types.",
"examples": [
"^[a-zA-Z0-9_-]+$"
]
},
"minimum": {
"type": "number",
"description": "A numeric value must be greater than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values."
},
"exclusiveMinimum": {
"type": "number",
"description": "A numeric value must be greater than this value. Only evaluated if the value is not null. Only applies to numeric values."
},
"maximum": {
"type": "number",
"description": "A numeric value must be less than or equal to this value. Only evaluated if the value is not null. Only applies to numeric values."
},
"exclusiveMaximum": {
"type": "number",
"description": "A numeric value must be less than this value. Only evaluated if the value is not null. Only applies to numeric values."
},
"example": {
"type": "string",
"description": "An example value for this field.",
"deprecationMessage": "Use the examples field instead."
},
"examples": {
"type": "array",
"description": "A list of example values for this field."
},
"pii": {
"type": "boolean",
"description": "An indication of whether this field contains Personally Identifiable Information (PII)."
},
"classification": {
"type": "string",
"description": "The data class defining the sensitivity level for this field, according to the organization's classification scheme.",
"examples": [
"sensitive",
"restricted",
"internal",
"public"
]
},
"tags": {
"type": "array",
"items": {
"type": "string"
},
"description": "Custom metadata to provide additional context."
},
"links": {
"type": "object",
"description": "Links to external resources.",
"minProperties": 1,
"propertyNames": {
"pattern": "^[a-zA-Z0-9_-]+$"
},
"additionalProperties": {
"type": "string",
"title": "Link",
"description": "A URL to an external resource.",
"format": "uri",
"examples": [
"https://example.com"
]
}
},
"$ref": {
"type": "string",
"description": "A reference URI to a definition in the specification, internally or externally. Properties will be inherited from the definition."
},
"quality": {
"type": "array",
"items": {
"$ref": "#/$defs/Quality"
}
},
"lineage": {
"$ref": "#/$defs/Lineage"
},
"config": {
"type": "object",
"description": "Additional metadata for field configuration.",
"additionalProperties": {
"type": [
"string",
"number",
"boolean",
"object",
"array",
"null"
]
},
"properties": {
"avroType": {
"type": "string",
"description": "Specify the field type to use when exporting the data model to Apache Avro."
},
"avroLogicalType": {
"type": "string",
"description": "Specify the logical field type to use when exporting the data model to Apache Avro."
},
"bigqueryType": {
"type": "string",
"description": "Specify the physical column type that is used in a BigQuery table, e.g., `NUMERIC(5, 2)`."
},
"snowflakeType": {
"type": "string",
"description": "Specify the physical column type that is used in a Snowflake table, e.g., `TIMESTAMP_LTZ`."
},
"redshiftType": {
"type": "string",
"description": "Specify the physical column type that is used in a Redshift table, e.g., `SMALLINT`."
},
"sqlserverType": {
"type": "string",
"description": "Specify the physical column type that is used in a SQL Server table, e.g., `DATETIME2`."
},
"databricksType": {
"type": "string",
"description": "Specify the physical column type that is used in a Databricks Unity Catalog table."
},
"glueType": {
"type": "string",
"description": "Specify the physical column type that is used in an AWS Glue Data Catalog table."
}
}
}
}
}
},
"primaryKey": {
"type": "array",
"items": {
"type": "string"
},
"description": "The compound primary key of the model."
},
"quality": {
"type": "array",
"items": {
"$ref": "#/$defs/Quality"
}
},
"examples": {
"type": "array"
},
"additionalFields": {
"type": "boolean",
"description": "Specifies whether the model can have additional fields that are not defined in the contract.",
"default": false
},
"config": {
"type": "object",
"description": "Additional metadata for model configuration.",
"additionalProperties": {
"type": [
"string",
"number",
"boolean",
"object",
"array",
"null"
]
},
"properties": {
"avroNamespace": {
"type": "string",
"description": "The namespace to use when importing and exporting the data model from / to Apache Avro."
}
}
}
}
}
},
"definitions": {
"description": "Clear and concise explanations of the syntax, semantics, and classification of business objects in a given domain.",
"type": "object",
"propertyNames": {
"pattern": "^[a-zA-Z0-9/_-]+$"
},
"additionalProperties": {
"type": "object",
"title": "Definition",
"properties": {
"domain": {
"type": "string",
"description": "The domain in which this definition is valid.",
"default": "global",
"deprecationMessage": "This field is deprecated. Encode the domain into the ID using slashes."
},
"name": {
"type": "string",
"description": "The technical name of this definition.",
"deprecationMessage": "This field is deprecated. Encode the name into the ID using slashes."
},
"title": {
"type": "string",
"description": "The business name of this definition."
},
"description": {
"type": "string",
"description": "Clear and concise explanations related to the domain."
},
"type": {
"$ref": "#/$defs/FieldType"
},
"fields": {
"description": "The nested fields (e.g. columns) of the object, record, or struct.",
"type": "object",
"additionalProperties": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
}
},
"items": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"keys": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"values": {
"$ref": "#/properties/models/additionalProperties/properties/fields/additionalProperties"
},
"minLength": {
"type": "integer",
"description": "A value must be greater than or equal to this value. Applies only to string types."
},
"maxLength": {
"type": "integer",
"description": "A value must be less than or equal to this value. Applies only to string types."
},
"format": {
"type": "string",
"description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
},
"precision": {
"type": "integer",
"examples": [
38
],
"description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
},
"scale": {
"type": "integer",
"examples": [
0
],
"description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
},
"pattern": {
"type": "string",
"description": "A regular expression pattern the value must match. Applies only to string types."
},
"minimum": {
"type": "number",
"description": "A value of a number must greater than, or equal to, the value of this. Only evaluated if the value is not null. Only applies to numeric values."
},
"exclusiveMinimum": {
"type": "number",
"description": "A value of a number must greater than the value of this. Only evaluated if the value is not null. Only applies to numeric values."
},
"maximum": {
"type": "number",
"description": "A value of a number must less than, or equal to, the value of this. Only evaluated if the value is not null. Only applies to numeric values."
},
"exclusiveMaximum": {
"type": "number",
"description": "A value of a number must less than the value of this. Only evaluated if the value is not null. Only applies to numeric values."
},
"example": {
"type": "string",
"description": "An example value.",
"deprecationMessage": "Use the examples field instead."
},
"examples": {
"type": "array",
"description": "Example value."
},
"pii": {
"type": "boolean",
"description": "Indicates if the field contains Personal Identifiable Information (PII)."
},
"classification": {
"type": "string",
"description": "The data class defining the sensitivity level for this field."
},
"tags": {
"type": "array",
"items": {
"type": "string"
},
"description": "Custom metadata to provide additional context."
},
"links": {
"type": "object",
"description": "Links to external resources.",
"minProperties": 1,
"propertyNames": {
"pattern": "^[a-zA-Z0-9_-]+$"
},
"additionalProperties": {
"type": "string",
"title": "Link",
"description": "A URL to an external resource.",
"format": "uri",
"examples": [
"https://example.com"
]
}
}
},
"required": [
"type"
]
}
},
"servicelevels": {
"type": "object",
"description": "Specifies the service level agreements for the provided data, including availability, data retention policies, latency requirements, data freshness, update frequency, support availability, and backup policies.",
"properties": {
"availability": {
"type": "object",
"description": "Availability refers to the promise or guarantee by the service provider about the uptime of the system that provides the data.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the availability service level.",
"example": "The server is available during support hours"
},
"percentage": {
"type": "string",
"description": "An optional string describing the guaranteed uptime in percent (e.g., `99.9%`)",
"pattern": "^\\d+(\\.\\d+)?%$",
"example": "99.9%"
}
}
},
"retention": {
"type": "object",
"description": "Retention covers the period how long data will be available.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the retention service level.",
"example": "Data is retained for one year."
},
"period": {
"type": "string",
"description": "An optional period of time, how long data is available. Supported formats: Simple duration (e.g., `1 year`, `30d`) and ISO 8601 duration (e.g, `P1Y`).",
"example": "P1Y"
},
"unlimited": {
"type": "boolean",
"description": "An optional indicator that data is kept forever.",
"example": false
},
"timestampField": {
"type": "string",
"description": "An optional reference to the field that contains the timestamp that the period refers to.",
"example": "orders.order_timestamp"
}
}
},
"latency": {
"type": "object",
"description": "Latency refers to the maximum amount of time from the source to its destination.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the latency service level.",
"example": "Data is available within 25 hours after the order was placed."
},
"threshold": {
"type": "string",
"description": "An optional maximum duration between the source timestamp and the processed timestamp. Supported formats: Simple duration (e.g., `24 hours`, `5s`) and ISO 8601 duration (e.g, `PT24H`).",
"example": "25h"
},
"sourceTimestampField": {
"type": "string",
"description": "An optional reference to the field that contains the timestamp when the data was provided at the source.",
"example": "orders.order_timestamp"
},
"processedTimestampField": {
"type": "string",
"description": "An optional reference to the field that contains the processing timestamp, which denotes when the data is made available to consumers of this data contract.",
"example": "orders.processed_timestamp"
}
}
},
"freshness": {
"type": "object",
"description": "The maximum age of the youngest row in a table.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the freshness service level.",
"example": "The age of the youngest row in a table is within 25 hours."
},
"threshold": {
"type": "string",
"description": "An optional maximum age of the youngest entry. Supported formats: Simple duration (e.g., `24 hours`, `5s`) and ISO 8601 duration (e.g., `PT24H`).",
"example": "25h"
},
"timestampField": {
"type": "string",
"description": "An optional reference to the field that contains the timestamp that the threshold refers to.",
"example": "orders.order_timestamp"
}
}
},
"frequency": {
"type": "object",
"description": "Frequency describes how often data is updated.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the frequency service level.",
"example": "Data is delivered once a day."
},
"type": {
"type": "string",
"enum": [
"batch",
"micro-batching",
"streaming",
"manual"
],
"description": "The method of data processing.",
"example": "batch"
},
"interval": {
"type": "string",
"description": "Optional. Only for batch: How often the pipeline is triggered, e.g., `daily`.",
"example": "daily"
},
"cron": {
"type": "string",
"description": "Optional. Only for batch: A cron expression when the pipelines is triggered. E.g., `0 0 * * *`.",
"example": "0 0 * * *"
}
}
},
"support": {
"type": "object",
"description": "Support describes the times when support will be available for contact.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the support service level.",
"example": "The data is available during typical business hours at headquarters."
},
"time": {
"type": "string",
"description": "An optional string describing the times when support will be available for contact such as `24/7` or `business hours only`.",
"example": "9am to 5pm in EST on business days"
},
"responseTime": {
"type": "string",
"description": "An optional string describing the time it takes for the support team to acknowledge a request. This does not mean the issue will be resolved immediately, but it assures users that their request has been received and will be dealt with.",
"example": "24 hours"
}
}
},
"backup": {
"type": "object",
"description": "Backup specifies details about data backup procedures.",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the backup service level.",
"example": "Data is backed up once a week, every Sunday at 0:00 UTC."
},
"interval": {
"type": "string",
"description": "An optional interval that defines how often data will be backed up, e.g., `daily`.",
"example": "weekly"
},
"cron": {
"type": "string",
"description": "An optional cron expression when data will be backed up, e.g., `0 0 * * *`.",
"example": "0 0 * * 0"
},
"recoveryTime": {
"type": "string",
"description": "An optional Recovery Time Objective (RTO) specifies the maximum amount of time allowed to restore data from a backup after a failure or loss event (e.g., 4 hours, 24 hours).",
"example": "24 hours"
},
"recoveryPoint": {
"type": "string",
"description": "An optional Recovery Point Objective (RPO) defines the maximum acceptable age of files that must be recovered from backup storage for normal operations to resume after a disaster or data loss event. This essentially measures how much data you can afford to lose, measured in time (e.g., 4 hours, 24 hours).",
"example": "1 week"
}
}
}
}
},
"links": {
"type": "object",
"description": "Links to external resources.",
"minProperties": 1,
"propertyNames": {
"pattern": "^[a-zA-Z0-9_-]+$"
},
"additionalProperties": {
"type": "string",
"title": "Link",
"description": "A URL to an external resource.",
"format": "uri",
"examples": [
"https://example.com"
]
}
},
"tags": {
"type": "array",
"items": {
"type": "string",
"description": "Tags to facilitate searching and filtering.",
"examples": [
"databricks",
"pii",
"sensitive"
]
},
"description": "Tags to facilitate searching and filtering."
}
},
"required": [
"dataContractSpecification",
"id",
"info"
],
"$defs": {
"FieldType": {
"type": "string",
"title": "FieldType",
"description": "The logical data type of the field.",
"enum": [
"number",
"decimal",
"numeric",
"int",
"integer",
"long",
"bigint",
"float",
"double",
"string",
"text",
"varchar",
"boolean",
"timestamp",
"timestamp_tz",
"timestamp_ntz",
"date",
"time",
"array",
"map",
"object",
"record",
"struct",
"bytes",
"variant",
"json",
"null"
]
},
"BaseServer": {
"type": "object",
"properties": {
"description": {
"type": "string",
"description": "An optional string describing the servers."
},
"environment": {
"type": "string",
"description": "The environment in which the servers are running. Examples: prod, sit, stg."
},
"type": {
"type": "string",
"description": "The type of the data product technology that implements the data contract.",
"examples": [
"azure",
"bigquery",
"BigQuery",
"clickhouse",
"databricks",
"dataframe",
"glue",
"kafka",
"kinesis",
"local",
"oracle",
"postgres",
"pubsub",
"redshift",
"sftp",
"sqlserver",
"snowflake",
"s3",
"trino"
]
},
"roles": {
"description": " An optional array of roles that are available and can be requested to access the server for role-based access control. E.g. separate roles for different regions or sensitive data.",
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the role."
},
"description": {
"type": "string",
"description": "A description of the role and what access the role provides."
}
},
"required": [
"name"
]
}
}
},
"additionalProperties": true,
"required": [
"type"
]
},
"BigQueryServer": {
"type": "object",
"title": "BigQueryServer",
"properties": {
"project": {
"type": "string",
"description": "The GCP project name."
},
"dataset": {
"type": "string",
"description": "The GCP dataset name."
}
},
"required": [
"project",
"dataset"
]
},
"S3Server": {
"type": "object",
"title": "S3Server",
"properties": {
"location": {
"type": "string",
"format": "uri",
"description": "S3 URL, starting with `s3://`",
"examples": [
"s3://datacontract-example-orders-latest/data/{model}/*.json"
]
},
"endpointUrl": {
"type": "string",
"format": "uri",
"description": "The server endpoint for S3-compatible servers.",
"examples": [
"https://minio.example.com"
]
},
"format": {
"type": "string",
"enum": [
"parquet",
"delta",
"json",
"csv"
],
"description": "File format."
},
"delimiter": {
"type": "string",
"enum": [
"new_line",
"array"
],
"description": "Only for format = json. How multiple json documents are delimited within one file"
}
},
"required": [
"location"
]
},
"SftpServer": {
"type": "object",
"title": "SftpServer",
"properties": {
"location": {
"type": "string",
"format": "uri",
"pattern": "^sftp://.*",
"description": "SFTP URL, starting with `sftp://`",
"examples": [
"sftp://123.123.12.123/{model}/*.json"
]
},
"format": {
"type": "string",
"enum": [
"parquet",
"delta",
"json",
"csv"
],
"description": "File format."
},
"delimiter": {
"type": "string",
"enum": [
"new_line",
"array"
],
"description": "Only for format = json. How multiple json documents are delimited within one file"
}
},
"required": [
"location"
]
},
"RedshiftServer": {
"type": "object",
"title": "RedshiftServer",
"properties": {
"account": {
"type": "string",
"description": "An optional string describing the server."
},
"host": {
"type": "string",
"description": "An optional string describing the host name."
},
"database": {
"type": "string",
"description": "An optional string describing the server."
},
"schema": {
"type": "string",
"description": "An optional string describing the server."
},
"clusterIdentifier": {
"type": "string",
"description": "An optional string describing the cluster's identifier.",
"examples": [
"redshift-prod-eu",
"analytics-cluster"
]
},
"port": {
"type": "integer",
"description": "An optional string describing the cluster's port.",
"examples": [
5439
]
},
"endpoint": {
"type": "string",
"description": "An optional string describing the cluster's endpoint.",
"examples": [
"analytics-cluster.example.eu-west-1.redshift.amazonaws.com:5439/analytics"
]
}
},
"additionalProperties": true,
"required": [
"account",
"database",
"schema"
]
},
"AzureServer": {
"type": "object",
"title": "AzureServer",
"properties": {
"location": {
"type": "string",
"format": "uri",
"description": "Path to Azure Blob Storage or Azure Data Lake Storage (ADLS), supports globs. Recommended pattern is 'abfss://<container_name>/<path>'",
"examples": [
"abfss://my_container_name/path",
"abfss://my_container_name/path/*.json",
"az://my_storage_account_name.blob.core.windows.net/my_container/path/*.parquet",
"abfss://my_storage_account_name.dfs.core.windows.net/my_container_name/path/*.parquet"
]
},
"format": {
"type": "string",
"enum": [
"parquet",
"delta",
"json",
"csv"
],
"description": "File format."
},
"delimiter": {
"type": "string",
"enum": [
"new_line",
"array"
],
"description": "Only for format = json. How multiple json documents are delimited within one file"
}
},
"required": [
"location",
"format"
]
},
"SqlserverServer": {
"type": "object",
"title": "SqlserverServer",
"properties": {
"host": {
"type": "string",
"description": "The host to the database server",
"examples": [
"localhost"
]
},
"port": {
"type": "integer",
"description": "The port to the database server.",
"default": 1433,
"examples": [
1433
]
},
"database": {
"type": "string",
"description": "The name of the database.",
"examples": [
"database"
]
},
"schema": {
"type": "string",
"description": "The name of the schema in the database.",
"examples": [
"dbo"
]
}
},
"required": [
"host",
"database",
"schema"
]
},
"SnowflakeServer": {
"type": "object",
"title": "SnowflakeServer",
"properties": {
"account": {
"type": "string",
"description": "An optional string describing the server."
},
"database": {
"type": "string",
"description": "An optional string describing the server."
},
"schema": {
"type": "string",
"description": "An optional string describing the server."
}
},
"required": [
"account",
"database",
"schema"
]
},
"DatabricksServer": {
"type": "object",
"title": "DatabricksServer",
"properties": {
"host": {
"type": "string",
"description": "The Databricks host",
"examples": [
"dbc-abcdefgh-1234.cloud.databricks.com"
]
},
"catalog": {
"type": "string",
"description": "The name of the Hive or Unity catalog"
},
"schema": {
"type": "string",
"description": "The schema name in the catalog"
}
},
"required": [
"catalog",
"schema"
]
},
"DataframeServer": {
"type": "object",
"title": "DataframeServer",
"required": [
"type"
]
},
"GlueServer": {
"type": "object",
"title": "GlueServer",
"properties": {
"account": {
"type": "string",
"description": "The AWS Glue account",
"examples": [
"1234-5678-9012"
]
},
"database": {
"type": "string",
"description": "The AWS Glue database name",
"examples": [
"my_database"
]
},
"location": {
"type": "string",
"format": "uri",
"description": "The AWS S3 path. Must be in the form of a URL.",
"examples": [
"s3://datacontract-example-orders-latest/data/{model}"
]
},
"format": {
"type": "string",
"description": "The format of the files",
"examples": [
"parquet",
"csv",
"json",
"delta"
]
}
},
"required": [
"account",
"database"
]
},
"PostgresServer": {
"type": "object",
"title": "PostgresServer",
"properties": {
"host": {
"type": "string",
"description": "The host to the database server",
"examples": [
"localhost"
]
},
"port": {
"type": "integer",
"description": "The port to the database server."
},
"database": {
"type": "string",
"description": "The name of the database.",
"examples": [
"postgres"
]
},
"schema": {
"type": "string",
"description": "The name of the schema in the database.",
"examples": [
"public"
]
}
},
"required": [
"host",
"port",
"database",
"schema"
]
},
"OracleServer": {
"type": "object",
"title": "OracleServer",
"properties": {
"host": {
"type": "string",
"description": "The host to the oracle server",
"examples": [
"localhost"
]
},
"port": {
"type": "integer",
"description": "The port to the oracle server.",
"examples": [
1523
]
},
"serviceName": {
"type": "string",
"description": "The name of the service.",
"examples": [
"service"
]
}
},
"required": [
"host",
"port",
"serviceName"
]
},
"KafkaServer": {
"type": "object",
"title": "KafkaServer",
"description": "Kafka Server",
"properties": {
"host": {
"type": "string",
"description": "The bootstrap server of the kafka cluster."
},
"topic": {
"type": "string",
"description": "The topic name."
},
"format": {
"type": "string",
"description": "The format of the message. Examples: json, avro, protobuf.",
"default": "json"
}
},
"required": [
"host",
"topic"
]
},
"PubSubServer": {
"type": "object",
"title": "PubSubServer",
"properties": {
"project": {
"type": "string",
"description": "The GCP project name."
},
"topic": {
"type": "string",
"description": "The topic name."
}
},
"required": [
"project",
"topic"
]
},
"KinesisDataStreamsServer": {
"type": "object",
"title": "KinesisDataStreamsServer",
"description": "Kinesis Data Streams Server",
"properties": {
"stream": {
"type": "string",
"description": "The name of the Kinesis data stream."
},
"region": {
"type": "string",
"description": "AWS region.",
"examples": [
"eu-west-1"
]
},
"format": {
"type": "string",
"description": "The format of the record",
"examples": [
"json",
"avro",
"protobuf"
]
}
},
"required": [
"stream"
]
},
"TrinoServer": {
"type": "object",
"title": "TrinoServer",
"properties": {
"host": {
"type": "string",
"description": "The Trino host URL.",
"examples": [
"localhost"
]
},
"port": {
"type": "integer",
"description": "The Trino port."
},
"catalog": {
"type": "string",
"description": "The name of the catalog.",
"examples": [
"hive"
]
},
"schema": {
"type": "string",
"description": "The name of the schema in the database.",
"examples": [
"my_schema"
]
}
},
"required": [
"host",
"port",
"catalog",
"schema"
]
},
"ClickhouseServer": {
"type": "object",
"title": "ClickhouseServer",
"properties": {
"host": {
"type": "string",
"description": "The host to the database server",
"examples": [
"localhost"
]
},
"port": {
"type": "integer",
"description": "The port to the database server."
},
"database": {
"type": "string",
"description": "The name of the database.",
"examples": [
"postgres"
]
}
},
"required": [
"host",
"port",
"database"
]
},
"LocalServer": {
"type": "object",
"title": "LocalServer",
"properties": {
"path": {
"type": "string",
"description": "The relative or absolute path to the data file(s).",
"examples": [
"./folder/data.parquet",
"./folder/*.parquet"
]
},
"format": {
"type": "string",
"description": "The format of the file(s)",
"examples": [
"json",
"parquet",
"delta",
"csv"
]
}
},
"required": [
"path",
"format"
]
},
"Quality": {
"allOf": [
{
"type": "object",
"properties": {
"type": {
"type": "string",
"description": "The type of quality check",
"enum": [
"text",
"library",
"sql",
"custom"
]
},
"description": {
"type": "string",
"description": "A plain text describing the quality attribute in natural language."
}
}
},
{
"if": {
"properties": {
"type": {
"const": "text"
}
}
},
"then": {
"required": [
"description"
]
}
},
{
"if": {
"properties": {
"type": {
"const": "sql"
}
}
},
"then": {
"properties": {
"query": {
"type": "string",
"description": "A SQL query that returns a single number to compare with the threshold."
},
"dialect": {
"type": "string",
"description": "The SQL dialect that is used for the query. Should be compatible to the server.type.",
"examples": [
"athena",
"bigquery",
"redshift",
"snowflake",
"trino",
"postgres",
"oracle"
]
},
"mustBe": {
"type": "number"
},
"mustNotBe": {
"type": "number"
},
"mustBeGreaterThan": {
"type": "number"
},
"mustBeGreaterOrEqualTo": {
"type": "number"
},
"mustBeGreaterThanOrEqualTo": {
"type": "number",
"deprecated": true
},
"mustBeLessThan": {
"type": "number"
},
"mustBeLessThanOrEqualTo": {
"type": "number",
"deprecated": true
},
"mustBeLessOrEqualTo": {
"type": "number"
},
"mustBeBetween": {
"type": "array",
"items": {
"type": "number"
},
"minItems": 2,
"maxItems": 2
},
"mustNotBeBetween": {
"type": "array",
"items": {
"type": "number"
},
"minItems": 2,
"maxItems": 2
}
},
"required": [
"query"
]
}
},
{
"if": {
"properties": {
"type": {
"const": "library"
}
}
},
"then": {
"properties": {
"metric": {
"type": "string",
"description": "The DataQualityLibrary metric to use for the quality check.",
"examples": ["nullValues", "missingValues", "invalidValues", "duplicateValues", "rowCount"]
},
"rule": {
"type": "string",
"deprecated": true,
"description": "Deprecated. Use metric instead"
},
"arguments": {
"type": "object",
"description": "Additional metric-specific parameters for the quality check.",
"additionalProperties": {
"type": ["string", "number", "boolean", "array", "object"]
}
},
"mustBe": {
"description": "Must be equal to the value to be valid. When using numbers, it is equivalent to '='."
},
"mustNotBe": {
"description": "Must not be equal to the value to be valid. When using numbers, it is equivalent to '!='."
},
"mustBeGreaterThan": {
"type": "number",
"description": "Must be greater than the value to be valid. It is equivalent to '>'."
},
"mustBeGreaterOrEqualTo": {
"type": "number",
"description": "Must be greater than or equal to the value to be valid. It is equivalent to '>='."
},
"mustBeLessThan": {
"type": "number",
"description": "Must be less than the value to be valid. It is equivalent to '<'."
},
"mustBeLessOrEqualTo": {
"type": "number",
"description": "Must be less than or equal to the value to be valid. It is equivalent to '<='."
},
"mustBeBetween": {
"type": "array",
"description": "Must be between the two numbers to be valid. Smallest number first in the array.",
"minItems": 2,
"maxItems": 2,
"uniqueItems": true,
"items": {
"type": "number"
}
},
"mustNotBeBetween": {
"type": "array",
"description": "Must not be between the two numbers to be valid. Smallest number first in the array.",
"minItems": 2,
"maxItems": 2,
"uniqueItems": true,
"items": {
"type": "number"
}
}
},
"required": [
"metric"
]
}
},
{
"if": {
"properties": {
"type": {
"const": "custom"
}
}
},
"then": {
"properties": {
"description": {
"type": "string",
"description": "A plain text describing the quality attribute in natural language."
},
"engine": {
"type": "string",
"examples": [
"soda",
"great-expectations"
],
"description": "The engine used for custom quality checks."
},
"implementation": {
"type": [
"object",
"array",
"string"
],
"description": "Engine-specific quality checks and expectations."
}
},
"required": [
"engine"
]
}
}
]
},
"Lineage": {
"type": "object",
"properties": {
"inputFields": {
"type": "array",
"items": {
"type": "object",
"properties": {
"namespace": {
"type": "string",
"description": "The input dataset namespace"
},
"name": {
"type": "string",
"description": "The input dataset name"
},
"field": {
"type": "string",
"description": "The input field"
},
"transformations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"type": {
"description": "The type of the transformation. Allowed values are: DIRECT, INDIRECT",
"type": "string"
},
"subtype": {
"type": "string",
"description": "The subtype of the transformation"
},
"description": {
"type": "string",
"description": "a string representation of the transformation applied"
},
"masking": {
"type": "boolean",
"description": "is transformation masking the data or not"
}
},
"required": [
"type"
],
"additionalProperties": true
}
}
},
"additionalProperties": true,
"required": [
"namespace",
"name",
"field"
]
}
},
"transformationDescription": {
"type": "string",
"description": "a string representation of the transformation applied",
"deprecated": true
},
"transformationType": {
"type": "string",
"description": "IDENTITY|MASKED reflects a clearly defined behavior. IDENTITY: exact same as input; MASKED: no original data available (like a hash of PII for example)",
"deprecated": true
}
},
"additionalProperties": true,
"required": [
"inputFields"
]
}
}
}
================================================
FILE: definition.schema.json
================================================
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"description": "Clear and concise explanations of syntax, semantic, and classification of business objects in a given domain.",
"properties": {
"id": {
"type": "string",
"description": "A unique identifier for this definition. Encode the domain into the ID, separated by slashes.",
"examples": [
"checkout/order_id"
]
},
"title": {
"type": "string",
"description": "The business name of this definition."
},
"description": {
"type": "string",
"description": "Clear and concise explanations related to the domain."
},
"type": {
"type": "string",
"description": "The logical data type."
},
"minLength": {
"type": "integer",
"description": "A value must be greater than or equal to this value. Applies only to string types."
},
"maxLength": {
"type": "integer",
"description": "A value must be less than or equal to this value. Applies only to string types."
},
"format": {
"type": "string",
"description": "Specific format requirements for the value (e.g., 'email', 'uri', 'uuid')."
},
"precision": {
"type": "integer",
"examples": [
38
],
"description": "The maximum number of digits in a number. Only applies to numeric values. Defaults to 38."
},
"scale": {
"type": "integer",
"examples": [
0
],
"description": "The maximum number of decimal places in a number. Only applies to numeric values. Defaults to 0."
},
"pattern": {
"type": "string",
"description": "A regular expression pattern the value must match. Applies only to string types."
},
"example": {
"type": "string",
"description": "An example value for this field.",
"deprecationMessage": "Use the examples field instead."
},
"examples": {
"type": "array",
"description": "A examples value for this field."
},
"pii": {
"type": "boolean",
"description": "Indicates if the field contains Personal Identifiable Information (PII)."
},
"classification": {
"type": "string",
"description": "The data class defining the sensitivity level for this field."
},
"tags": {
"type": "array",
"items": {
"type": "string"
},
"description": "Custom metadata to provide additional context."
},
"links": {
"type": "object",
"description": "Links to external resources.",
"minProperties": 1,
"propertyNames": {
"pattern": "^[a-zA-Z0-9_-]+$"
},
"additionalProperties": {
"type": "string",
"title": "Link",
"description": "A URL to an external resource.",
"format": "uri",
"examples": [
"https://example.com"
]
}
}
},
"required": [
"type"
]
}
================================================
FILE: diagrams/automation.drawio
================================================
<mxfile host="Electron" modified="2024-10-26T19:15:16.643Z" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/21.5.1 Chrome/112.0.5615.204 Electron/24.6.0 Safari/537.36" etag="7o40z_2iALpzpzZdF8zj" version="21.5.1" type="device">
<diagram name="datacontractcli-v2" id="tp5WBm8LkCMx9FwTQ9-I">
<mxGraphModel dx="760" dy="500" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
<root>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-0" />
<mxCell id="AiJdHj6Q9A8rBBn62U_L-1" parent="AiJdHj6Q9A8rBBn62U_L-0" />
<mxCell id="AiJdHj6Q9A8rBBn62U_L-2" value="" style="rounded=0;whiteSpace=wrap;html=1;strokeColor=none;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="10" y="230" width="680" height="340" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-3" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;arcSize=0;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="190" y="280" width="317" height="166.32" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-4" value="" style="endArrow=classic;html=1;rounded=0;" parent="AiJdHj6Q9A8rBBn62U_L-1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="513" y="365" as="sourcePoint" />
<mxPoint x="555" y="365" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-5" value="<span style="font-size: 11px;">export</span>" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];fontFamily=Architects Daughter;fontSource=https%3A%2F%2Ffonts.googleapis.com%2Fcss%3Ffamily%3DArchitects%2BDaughter;fontSize=11;" parent="AiJdHj6Q9A8rBBn62U_L-4" vertex="1" connectable="0">
<mxGeometry x="-0.475" y="1" relative="1" as="geometry">
<mxPoint x="9" y="-15" as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-8" value="" style="endArrow=classic;html=1;rounded=0;startArrow=none;" parent="AiJdHj6Q9A8rBBn62U_L-1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="347" y="450" as="sourcePoint" />
<mxPoint x="347" y="480" as="targetPoint" />
<Array as="points">
<mxPoint x="347" y="470" />
</Array>
</mxGeometry>
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-9" value="test" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];fontSize=11;fontFamily=Architects Daughter;" parent="AiJdHj6Q9A8rBBn62U_L-8" vertex="1" connectable="0">
<mxGeometry x="-0.175" relative="1" as="geometry">
<mxPoint x="21" y="2" as="offset" />
</mxGeometry>
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-10" value="" style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=data:image/svg+xml,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PGc+PHBhdGggZmlsbC1ydWxlPSJldmVub2RkIiBmaWxsPSIjMDIwMjAyIiBkPSJNMjEuMTQgNi45NGEuNzcuNzcgMCAwIDAtLjQ4LjJjLTEtLjgtMi4xNC0uMjUtMyAuNzJBMTYuMTIgMTYuMTIgMCAwIDAgMTYuMTUgMTBhLjI4LjI4IDAgMCAwIC40LjMyYy4wOCAwIDEuMzktMS4yMSAxLjQ3LTEuMjguNzMtLjY4IDEuMzUtMS40MiAxLjgxLTEuNDhzLjMyIDAgLjQxLjA5YTE5IDE5IDAgMCAwLTIuMDcgMi44MWMwIC4xMi0uMzEgMCAxLjUzIDEuNzFsLjI3LjI0QzE5IDEzLjU3IDE2IDE2LjczIDE1Ljc0IDE3YTIuNjYgMi42NiAwIDAgMS0xIC40MyAzLjcyIDMuNzIgMCAwIDEtLjIyLTEuNTJMMTcuNyAxMmEuMy4zIDAgMCAwLS40Ny0uMzdjLS43Mi44OC0zLjQ1IDMuMzktMy42OSAzLjkxYTQgNCAwIDAgMCAuMjggMi42MiAxIDEgMCAwIDAgLjkzLjMyIDQuMjYgNC4yNiAwIDAgMCAxLjc2LS42NWMuNi0uNjIgMy44Ni00Ljc5IDQuNjUtNS4zMWEyMCAyMCAwIDAgMCAxLjc3LTEuNjQgNC42NyA0LjY3IDAgMCAwIDEtMS4zN2MuNS0xLjU3LTEuODItMi42Ni0yLjc5LTIuNTdabTEuNjYgMi4zM2ExMi45IDEyLjkgMCAwIDEtMS41OCAxLjg4Yy0uOTEgMS0uNjUuNzQtMSAuNDYtMS41Mi0xLjI2LTEuMzMtMS0xLjI3LTEuMjFsMi0yLjE2Yy4zNC0uMzgtLjA2LS40NS43LS4xNS40NC4xNiAxLjQzLjYzIDEuMTUgMS4xOFoiLz48cGF0aCBmaWxsLXJ1bGU9ImV2ZW5vZGQiIGZpbGw9IiMwMjAyMDIiIGQ9Ik03LjIgMjIuNTNjLTEuNDgtLjI1LTEuNDgtMi4xMS0xLjYxLTMuNTNhNDAgNDAgMCAwIDEgLjE1LTYuNTdjLjE4LTIuNDYuNi0yLjI1LjU5LTUuNTEgMC0xLjczLS4zMy00LjA2LTEuNzktNWE0MC4yOSA0MC4yOSAwIDAgMSA2LjI0LjA4YzIuNjQuMTggNS43MS0uMTQgNi45MS42M0MxOS4yMyAzLjYzIDE4LjEyIDYgMTkgNmEuMzIuMzIgMCAwIDAgLjMxLS4zNmMtLjA2LTEuMzEuMjQtMi40NC0uODEtMy40NEE1LjgyIDUuODIgMCAwIDAgMTQuNzYgMWE3NS40NiA3NS40NiAwIDAgMC04LjU3IDBDNS43NyAxIDEuNiAxLjUgMSAyLjA4QTMuMTcgMy4xNyAwIDAgMCAwIDQuNSA0LjYxIDQuNjEgMCAwIDAgLjcyIDdhMS45MiAxLjkyIDAgMCAwIC45My42MmMxIC4zMSAzLjI0LjI1IDIuODYtLjU1QzQuNCA2LjggMy4xOSA3IDMuMTEgN2EyLjM1IDIuMzUgMCAwIDEtMS43Ny0uNTYgMy43NyAzLjc3IDAgMCAxLS40OC0yIDIuMzEgMi4zMSAwIDAgMSAuNzYtMS43MyAzLjU0IDMuNTQgMCAwIDEgMS4yNy0uMzFjMS4xMSAwIDEuNjQuNjYgMiAxLjczYTE0LjEzIDE0LjEzIDAgMCAxIC4zIDQuNDY
gMzMgMzMgMCAwIDAtLjU1IDdjMCAxLjQzLjE5IDUgLjg0IDYuMjJhMi4xIDIuMSAwIDAgMCAxLjY3IDEuMjQuMy4zIDAgMCAwIC4wNS0uNTJaIi8+PHBhdGggZmlsbC1ydWxlPSJldmVub2RkIiBmaWxsPSIjMDIwMjAyIiBkPSJNMjEuODUgMTkuNTRhOCA4IDAgMCAxLTEuNzEtMy4xOS4yOS4yOSAwIDEgMC0uNTcuMTVjLjc1IDMuMzMgMi4xOCA0IDIuNzkgNC43OS0uMzQuMDgtMTMuMjYgMS4xLTE0IDFhLjMzLjMzIDAgMSAwLS4wOS42NmMxLjYxLjI2IDEwLjU5LS4yMiAxMy4xLS40OWEzIDMgMCAwIDAgMi0uNjVjLjYzLS42Ny0uNzEtMS40Ny0xLjUyLTIuMjdaIi8+PHBhdGggZmlsbC1ydWxlPSJldmVub2RkIiBmaWxsPSIjMGM2ZmZmIiBkPSJNOS42NyAxNi44NmMuMzkgMCAuNDYuODQuNSAxLjI2IDAgLjEzLS4wNy43OC4zNS45NWEuNTEuNTEgMCAwIDAgLjQ1IDBjMS42My0uODUuOTMtLjE4IDEuNDYtLjE4LjY4IDAgLjU2LTEuMzktMS4yNS0xIDAtMS4yNC0uMzQtMi4yOS0xLjU5LTIuMmEyLjc1IDIuNzUgMCAwIDAtMi4xOCAxLjgxIDMuOTIgMy45MiAwIDAgMC0uMzQgMS4wNWMuMTEgMS44NC44Ni0xLjY4IDIuNi0xLjY5WiIvPjxwYXRoIGZpbGwtcnVsZT0iZXZlbm9kZCIgZmlsbD0iIzBjNmZmZiIgZD0iTTguNTcgOC40MWM1LjI4LjE5IDIuMzIgMCA2LjA4IDBhLjM0LjM0IDAgMCAwIC4zNS0uMzUuNDcuNDcgMCAwIDAtLjM5LS4zN2MtLjI4LS4wNi0yLjEtLjMtMi45NC0uMzNhMTcuODQgMTcuODQgMCAwIDAtMy44NC40OC4zMS4zMSAwIDAgMCAuMi41M2MuMDkuMDIuNDguMDQuNTQuMDRaIi8+PHBhdGggZmlsbC1ydWxlPSJldmVub2RkIiBmaWxsPSIjMGM2ZmZmIiBkPSJNMTMuMjIgMTEuNjRjLTUuMi0uMTUtNS40OS4xMi01LjU3LjM0YS4zLjMgMCAwIDAgLjE3LjM4IDEuMjMgMS4yMyAwIDAgMCAuNDQuMWMuODQuMDYgNy4yOC4zOCA3LjI0LS40Mi0uMDMtLjU4LTEuMzctLjMzLTIuMjgtLjRaIi8+PC9nPjwvc3ZnPg==;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="469" y="288.5" width="30" height="30" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-11" value="Data Contract CLI" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontStyle=1;fontSize=23;fontFamily=Architects Daughter;fontSource=https%3A%2F%2Ffonts.googleapis.com%2Fcss%3Ffamily%3DArchitects%2BDaughter;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="190" y="243" width="315" height="30" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-12" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="241" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-13" value="<font style="font-size: 8px;" data-font-src="https://fonts.googleapis.com/css?family=Architects+Daughter" face="Architects Daughter">BigQuery</font>" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontFamily=Courier New;fontSize=8;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="240" y="517" width="50" height="13" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-14" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="405" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-16" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="295" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-20" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="350" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-23" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="459" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-24" value="<font style="font-size: 8px;" data-font-src="https://fonts.googleapis.com/css?family=Architects+Daughter" face="Architects Daughter">Kafka</font>" style="text;html=1;strokeColor=none;fillColor=none;align=center;verticalAlign=middle;whiteSpace=wrap;rounded=0;fontFamily=Courier New;fontSize=8;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="457" y="518" width="50" height="13" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-25" value="" style="shape=image;verticalLabelPosition=bottom;labelBackgroundColor=default;verticalAlign=top;aspect=fixed;imageAspect=0;image=https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Apache_kafka.svg/1200px-Apache_kafka.svg.png;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="473.62" y="490.38" width="16.77" height="27.25" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-26" value="" style="rounded=1;whiteSpace=wrap;html=1;fontFamily=Courier New;fontSize=14;" parent="AiJdHj6Q9A8rBBn62U_L-1" vertex="1">
<mxGeometry x="187" y="485" width="48" height="50" as="geometry" />
</mxCell>
<mxCell id="AiJdHj6Q9A8rBBn62U_L-27" value="<font style="font-size: 8px;" data-font-src="https://fonts.googleapi
Condensed preview of 55 files, each showing path, character count, and a content snippet. The full structured content is 1,805K chars.
[
{
"path": ".github/validate-examples",
"chars": 1035,
"preview": "#!/bin/bash\n\nset -ex\n\n#function datacontract() {\n# docker run --rm -v \"${PWD}:/home/datacontract\" --platform linux/am"
},
{
"path": ".github/workflows/ci.yaml",
"chars": 566,
"preview": "on:\n push:\n pull_request:\n workflow_call:\n\nname: CI\njobs:\n test:\n if: false # skip as the example structure has c"
},
{
"path": ".gitignore",
"chars": 96,
"preview": ".idea/\n*.bkp\ndatacontract.schema.openapi-format.*\n.soda/\ndatacontract-from-readme.yaml\n.duckdb/\n"
},
{
"path": "CHANGELOG.md",
"chars": 3984,
"preview": "# Changelog\n\nAll notable changes to this project will be documented in this file.\n\nThe format is based on [Keep a Change"
},
{
"path": "CNAME",
"chars": 30,
"preview": "datacontract-specification.com"
},
{
"path": "LICENSE",
"chars": 1079,
"preview": "MIT License\n\nCopyright (c) 2023 Data Mesh Architecture\n\nPermission is hereby granted, free of charge, to any person obta"
},
{
"path": "README.md",
"chars": 108809,
"preview": "# Data Contract Specification \n\n<a href=\"https://github.com/datacontract/datacontract-specification\">\n <img alt=\"Star"
},
{
"path": "_config.yml",
"chars": 152,
"preview": "plugins:\n - jekyll-sitemap\nname: Data Contract Specification\ntitle: null\ndescription: Data contracts bring data provide"
},
{
"path": "_layouts/default.html",
"chars": 2707,
"preview": "<!DOCTYPE html>\n<html lang=\"{{ site.lang | default: \"en-US\" }}\">\n <head>\n <meta charset=\"UTF-8\">\n <meta http-equi"
},
{
"path": "datacontract.init.yaml",
"chars": 1828,
"preview": "dataContractSpecification: 1.2.1\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "datacontract.schema.json",
"chars": 64938,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "definition.schema.json",
"chars": 2976,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"description\": \"Clear and concise explan"
},
{
"path": "diagrams/automation.drawio",
"chars": 306864,
"preview": "<mxfile host=\"Electron\" modified=\"2024-10-26T19:15:16.643Z\" agent=\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Apple"
},
{
"path": "diagrams/datacontract.drawio",
"chars": 196395,
"preview": "<mxfile host=\"Electron\" modified=\"2023-09-19T08:15:52.509Z\" agent=\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Apple"
},
{
"path": "diagrams/favicon.drawio",
"chars": 4200,
"preview": "<mxfile host=\"drawio-plugin\" modified=\"2024-03-12T19:44:32.908Z\" agent=\"5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWeb"
},
{
"path": "examples/covid-cases/datacontract.html",
"chars": 47208,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/covid-cases/datacontract.yaml",
"chars": 1741,
"preview": "dataContractSpecification: 0.9.3\nid: covid_cases\ninfo:\n title: COVID-19 cases\n description: Johns Hopkins University C"
},
{
"path": "examples/datacontract.html",
"chars": 47435,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/generate-catalog",
"chars": 54,
"preview": "datacontract catalog --files \"**/*.yaml\" --output \".\"\n"
},
{
"path": "examples/index.html",
"chars": 53157,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/muellimperium/data.csv",
"chars": 272,
"preview": "Pluto,residual_waste,2021-01-09\nPluto,bio_waste,2021-01-02\nPluto,paper,2021-01-11\nPluto,plastic,2021-01-12\nPluto,bulky_w"
},
{
"path": "examples/muellimperium/datacontract.html",
"chars": 43685,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/muellimperium/datacontract.yaml",
"chars": 1411,
"preview": "dataContractSpecification: 0.9.3\nid: muellimperium-exchange-format\ninfo:\n title: Muellimperium Exchange Format\n versio"
},
{
"path": "examples/orders-latest/datacontract.html",
"chars": 88599,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/orders-latest/datacontract.yaml",
"chars": 7770,
"preview": "dataContractSpecification: 1.2.0\nid: orders-latest\ninfo:\n title: Orders Latest\n version: 2.0.0\n description: |\n Su"
},
{
"path": "examples/orders-latest-nested/datacontract.html",
"chars": 66241,
"preview": "<!doctype html>\n<html class=\"h-full bg-gray-100\" lang=\"en\">\n<head>\n <title>Data Contract</title>\n <meta charset=\"UTF-8"
},
{
"path": "examples/orders-latest-nested/datacontract.yaml",
"chars": 3482,
"preview": "dataContractSpecification: 0.9.3\nid: urn:orders-latest-nested\ninfo:\n title: Orders Latest (Nested)\n version: 1.0.0\n d"
},
{
"path": "examples/time-example/datacontract.html",
"chars": 12782,
"preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width"
},
{
"path": "examples/time-example/datacontract.yaml",
"chars": 3680,
"preview": "dataContractSpecification: 1.2.1\nid: time-demo\ninfo:\n title: Time Data Type Example\n version: 1.0.0\n description: |\n "
},
{
"path": "examples/variant-json-example/datacontract.yaml",
"chars": 2592,
"preview": "dataContractSpecification: 1.2.1\nid: variant-json-demo\ninfo:\n title: Variant and JSON Data Types Example\n version: 1.0"
},
{
"path": "gen-openapi-yaml",
"chars": 556,
"preview": "#!/bin/bash\n\n# INSTALL BEFORE\n# npm install -g @openapi-contrib/json-schema-to-openapi-schema\n# brew install yq\n\njson-sc"
},
{
"path": "versions/0.9.0/README.md",
"chars": 30582,
"preview": "# Data Contract Specification\n\n\n\nData contracts bring data providers and dat"
},
{
"path": "versions/0.9.0/datacontract.init.yaml",
"chars": 4436,
"preview": "dataContractSpecification: 0.9.0\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/0.9.0/datacontract.schema.json",
"chars": 11235,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"properties\": {\n \"dataContractSpecifi"
},
{
"path": "versions/0.9.1/README.md",
"chars": 36265,
"preview": "# Data Contract Specification\n\n\n\nData contracts bring data providers and dat"
},
{
"path": "versions/0.9.1/datacontract.init.yaml",
"chars": 4747,
"preview": "dataContractSpecification: 0.9.1\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/0.9.1/datacontract.schema.json",
"chars": 13788,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"properties\": {\n \"dataContractSpecifi"
},
{
"path": "versions/0.9.2/README.md",
"chars": 57395,
"preview": "# Data Contract Specification\n\n<a href=\"https://github.com/datacontract/datacontract-specification\">\n <img alt=\"Stars"
},
{
"path": "versions/0.9.2/datacontract.init.yaml",
"chars": 988,
"preview": "dataContractSpecification: 0.9.2\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/0.9.2/datacontract.schema.json",
"chars": 24338,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "versions/0.9.3/README.md",
"chars": 94475,
"preview": "# Data Contract Specification\n\n<a href=\"https://github.com/datacontract/datacontract-specification\">\n <img alt=\"Stars"
},
{
"path": "versions/0.9.3/datacontract.init.yaml",
"chars": 2134,
"preview": "dataContractSpecification: 0.9.3\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/0.9.3/datacontract.schema.json",
"chars": 56866,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "versions/0.9.3/definition.schema.json",
"chars": 2838,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"description\": \"Clear and concise explan"
},
{
"path": "versions/1.1.0/README.md",
"chars": 102314,
"preview": "# Data Contract Specification\n\n<a href=\"https://github.com/datacontract/datacontract-specification\">\n <img alt=\"Stars"
},
{
"path": "versions/1.1.0/datacontract.init.yaml",
"chars": 1827,
"preview": "dataContractSpecification: 1.1.0\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/1.1.0/datacontract.schema.json",
"chars": 62791,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "versions/1.1.0/definition.schema.json",
"chars": 2976,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"description\": \"Clear and concise explan"
},
{
"path": "versions/1.2.0/datacontract.init.yaml",
"chars": 1828,
"preview": "dataContractSpecification: 1.2.0\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/1.2.0/datacontract.schema.json",
"chars": 64162,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "versions/1.2.0/definition.schema.json",
"chars": 2976,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"description\": \"Clear and concise explan"
},
{
"path": "versions/1.2.1/datacontract.init.yaml",
"chars": 1828,
"preview": "dataContractSpecification: 1.2.0\nid: my-data-contract-id\ninfo:\n title: My Data Contract\n version: 0.0.1\n# description"
},
{
"path": "versions/1.2.1/datacontract.schema.json",
"chars": 64938,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"title\": \"DataContractSpecification\",\n "
},
{
"path": "versions/1.2.1/definition.schema.json",
"chars": 2976,
"preview": "{\n \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n \"type\": \"object\",\n \"description\": \"Clear and concise explan"
},
{
"path": "workshop.md",
"chars": 5378,
"preview": "# Data Contract Workshop\n\nBring data producers and consumers together to define data contracts in a facilitated workshop"
}
]