Data Architecture
Equinor's data architecture for OSDU is a work in progress. This page describes current thinking and design principles.
Key reference documents:
Continuous Enrichment
Our approach is built on the principle of Continuous Enrichment: reduce ingestion friction, make data available early, create quality transparency, and adhere to OSDU standards.
Reduce ingestion friction
We prefer to ingest "poor quality" data rather than block ingestion. Users should be able to see all data and comment on its quality, rather than be unaware that the data exists. To keep ingestion unblocked, we auto-generate Master Data and Reference Data records where needed to satisfy referential integrity checks.
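A minimal sketch of auto-generating placeholder records, to illustrate the idea rather than describe the actual ingestion code. The function name `build_stub_records`, the record layout, the `origin` tag, and the ID format used in the usage note are all assumptions made for this example.

```python
def build_stub_records(referenced_ids, known_ids):
    """Return placeholder records for every referenced ID that does not exist yet.

    `known_ids` is updated in place so that a repeated reference yields one stub.
    """
    stubs = []
    for ref_id in referenced_ids:
        if ref_id in known_ids:
            continue  # the referenced record already exists, nothing to generate
        stubs.append({
            "id": ref_id,
            # Use the last ID segment as a provisional name (assumed convention).
            "data": {"Name": ref_id.rsplit(":", 1)[-1]},
            # Hypothetical tag so quality KPIs can tell stubs from curated records.
            "tags": {"origin": "auto-generated"},
        })
        known_ids.add(ref_id)
    return stubs
```

For example, calling `build_stub_records(["ns:master-data--Wellbore:WB1"], known_ids)` with an empty `known_ids` set returns a single stub tagged `auto-generated`, which lets the referring record pass a referential integrity check while remaining clearly marked for later curation.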
Extract-Load-Transform (ELT)
We adopt ELT, not ETL: source data is ingested in its raw format, as close to the original as possible. This preserves the full fidelity of the source and allows more data to be extracted later without going back to the source system.
Early availability
We lift only the minimum data needed into OSDU Well-Known Schemas (WKS). Mappings that are well known and low-friction are added up front; records can be expanded later.
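The combination of ELT and minimal lifting can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: the raw payload is assumed to be JSON, and the mapping, the source field names, and the `derived_from` / `unmapped_source_fields` keys are hypothetical.

```python
import json

# Hypothetical mapping from raw source field names to WKS property names.
MINIMAL_MAPPING = {"WELL_NAME": "FacilityName", "UWI": "FacilityID"}


def lift_minimal(raw_id: str, raw_payload: bytes, mapping: dict) -> dict:
    """Transform step: map only the well-known fields and leave the rest in raw."""
    source = json.loads(raw_payload)
    data = {
        wks_name: source[src_name]
        for src_name, wks_name in mapping.items()
        if src_name in source
    }
    return {
        "data": data,
        # Pointer back to the untouched raw record (lineage).
        "derived_from": raw_id,
        # Fields still only available in raw; candidates for later enrichment.
        "unmapped_source_fields": sorted(set(source) - set(mapping)),
    }
```

Because the raw payload is stored unchanged, extending `MINIMAL_MAPPING` later and re-running the transform is enough to expand the records; nothing has to be fetched from the source system again.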
Quality transparency
End users need insight to make informed choices about data:
- Full lineage of data records
- KPI measures: asset coverage, element coverage (based on WKS), usage of recommended vs auto-generated Master/Reference Data records
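Two of these KPI measures could be computed roughly as below. The exact definitions are assumptions made for illustration: element coverage is taken as the fraction of a chosen list of WKS properties that hold a non-empty value, and the auto-generated share relies on a hypothetical `origin` tag on referenced records. Asset coverage is not modeled here.

```python
EMPTY_VALUES = (None, "", [], {})


def element_coverage(record_data: dict, wks_properties: list) -> float:
    """Fraction of the listed WKS properties that hold a non-empty value."""
    if not wks_properties:
        return 0.0
    populated = sum(
        1 for name in wks_properties if record_data.get(name) not in EMPTY_VALUES
    )
    return populated / len(wks_properties)


def auto_generated_share(referenced_records: list) -> float:
    """Fraction of referenced Master/Reference Data records that are auto-generated."""
    if not referenced_records:
        return 0.0
    auto = sum(
        1
        for rec in referenced_records
        if rec.get("tags", {}).get("origin") == "auto-generated"  # assumed tag
    )
    return auto / len(referenced_records)
```

A numeric value of `0` counts as populated in this sketch, since it is a real value rather than a missing one.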
OSDU Standards vs Raw data
End users should use OSDU Well-Known Schemas, but in some scenarios they need access to the raw schemas and files. For example, when the WKS lacks the fidelity needed, the user must pass through to the raw schema.
Flow of data
End-user requirements
When an end user has a requirement, the first step is to validate whether the data has a home in an OSDU WKS. If not, there are two options:
- Extend the WKS: propose the extension to the Data Definitions sub-committee (this can be time-consuming)
- Use extension properties or local schemas: useful for quick validation or Equinor-specific needs
Warning
Extension properties are not indexed. If you need to search on these properties, create a custom schema instead. In addition, always keep the original data accessible from within OSDU (ELT principle).
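The placement decision described above can be summarized in a small sketch. The enum and function names are hypothetical, and the sketch does not model the option of proposing a WKS extension to the sub-committee.

```python
from enum import Enum


class Placement(Enum):
    WKS = "use the existing WKS property"
    CUSTOM_SCHEMA = "create a custom (local) schema"
    EXTENSION_PROPERTIES = "use extension properties"


def choose_placement(has_wks_home: bool, must_be_searchable: bool) -> Placement:
    """Pick where a new data element should live."""
    if has_wks_home:
        return Placement.WKS
    if must_be_searchable:
        # Extension properties are not indexed, so searchable data needs a schema.
        return Placement.CUSTOM_SCHEMA
    return Placement.EXTENSION_PROPERTIES
```

For instance, `choose_placement(has_wks_home=False, must_be_searchable=True)` returns `Placement.CUSTOM_SCHEMA`.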
Raw data
As a general rule, we want immutability:
- Store a full copy of raw data in OSDU unless there are specific reasons not to (e.g., size)
- Store data outside the source system in a way that ensures the original data is not modified
Raw data may have an associated custom schema that either includes all fields or contains a subset of the fields together with a reference to stored copies of the remaining data.
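One way to picture the "subset plus reference" variant, and to check immutability, is the sketch below. It is an assumption-laden illustration: the record layout, the `raw_copy` key, and the use of a SHA-256 checksum to detect modification are choices made for this example, not a description of the actual implementation.

```python
import hashlib


def make_raw_record(payload: bytes, subset: dict, storage_location: str) -> dict:
    """Build a raw record holding a subset of fields and a pointer to the full copy."""
    return {
        "data": dict(subset),
        "raw_copy": {
            "location": storage_location,  # where the full original is stored
            "sha256": hashlib.sha256(payload).hexdigest(),  # fingerprint at ingestion
        },
    }


def is_unmodified(record: dict, payload: bytes) -> bool:
    """Check that a stored copy still matches the fingerprint taken at ingestion."""
    return hashlib.sha256(payload).hexdigest() == record["raw_copy"]["sha256"]
```

Recording a checksum at ingestion time gives a cheap way to verify later that the stored copy is still the original, wherever the copy physically lives.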