Introduction to Data Change Management (DCM)
Data schemas evolve over time, on both the production and consumption sides. While traditional schema management tools work well for transactional databases, they fall short in addressing the unique challenges of analytics data:
- Event-based and often immutable data
- Decoupled producers and consumers
- Different evolution speeds between data production and consumption
Moose's Data Change Management (DCM) is designed to tackle these analytics-specific challenges: it lets you manage how multiple versions of your Data Model coexist in production.
Data change management is in pre-alpha. We are releasing it to get feedback on its direction. We intend to polish it up in the coming weeks and months using the community's feedback.
Core Concepts
Data Change Management (DCM) in Moose enables multiple versions of your Data Model to operate simultaneously. This lets you evolve your data schema without disrupting existing data producers or consumers with breaking changes.
DCM builds on decades of software development best practice, where versioned APIs are a common pattern. Just as you can consume an API at version 1.0, 1.1, or 1.2, you can consume a Data Model at version 1.0, 1.1, or 1.2.
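To make the analogy concrete, here is a minimal TypeScript sketch of two producers writing to two coexisting versions of the same Data Model. The host, port, versioned ingest paths, and field names are assumptions for illustration, not Moose's exact endpoint scheme.

```ts
// Illustrative only: the versioned ingest paths below are assumptions.
// An older producer keeps posting to version 0.0 of the Data Model...
await fetch("http://localhost:4000/ingest/UserActivity/0.0", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    eventId: "e1",
    timestamp: "2024-01-01T00:00:00Z",
    activity: "Login",
  }),
});

// ...while an updated producer posts to version 0.1. Both versions
// coexist, each backed by its own infrastructure.
await fetch("http://localhost:4000/ingest/UserActivity/0.1", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    eventId: "e2",
    timestamp: new Date().toISOString(),
    activity: "login",
    userId: "u42", // a field that only exists in the newer version
  }),
});
```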
Versioning
Moose uses a versioning system to manage different iterations of your data model:
- The latest version corresponds to the most recent commit on the `main` branch.
- Previous versions are tracked in the `moose.config.toml` file (see the sketch after this list).
- Each version has its own set of infrastructure (tables, ingestion endpoints, topics).
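As a rough sketch, the version history in `moose.config.toml` might be recorded along these lines; the key names and commit hashes below are illustrative assumptions, not the file's exact schema.

```toml
# Illustrative only: key names are assumptions, not the exact schema.
# Each previous Data Model version maps to the commit that defines it.
[supported_old_versions]
"0.0" = "a1b2c3d"
"0.1" = "e4f5a6b"
```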
Continuous Migrations
To ensure data consistency across versions, Moose implements continuous migrations:
- Data ingested into older versions is automatically migrated to the latest version.
- This process runs in near real-time within the database.
- Migrations are defined as functions in your programming language, so you can customize the transformation as needed (see the sketch after this list).
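As a hedged sketch, a migration between two versions might be a pure function that maps a record in the old shape to a record in the new one. The type names, fields, and export convention here are assumptions for illustration, not Moose's exact API.

```ts
// Illustrative migration from version 0.0 to 0.1 of UserActivity.
// Shapes and the export convention are assumptions, not exact Moose APIs.
interface UserActivityV0 {
  eventId: string;
  timestamp: string; // ISO 8601 string in the old version
  activity: string;
}

interface UserActivityV1 {
  eventId: string;
  timestamp: Date;   // parsed Date in the new version
  activity: string;
  userId?: string;   // new optional field, absent from old data
}

// Pure record-to-record transform; conceptually, this is what runs in
// near real-time against data ingested into the older version.
export default function migrate(old: UserActivityV0): UserActivityV1 {
  return {
    eventId: old.eventId,
    timestamp: new Date(old.timestamp),
    activity: old.activity.toLowerCase(), // example normalization
    // userId stays undefined: the old version never captured it
  };
}
```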
Initial Data Migrations
When a new version is created:
- An initial data migration populates the new version with data from the previous version.
- This process temporarily pauses related infrastructure to prevent data duplication.
- Once complete, normal data flow resumes (the sketch below illustrates the sequence).
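Conceptually, the initial migration follows a pause/backfill/resume ordering. The TypeScript sketch below only illustrates that ordering; every name in it (the `Db` and `Infra` types, the table names, the SQL functions) is a hypothetical stand-in, not a Moose API.

```ts
// Hypothetical sketch of the initial-migration sequence.
type Db = { query(sql: string): Promise<void> };
type Infra = { pause(): Promise<void>; resume(): Promise<void> };

async function initialMigration(db: Db, oldIngest: Infra): Promise<void> {
  // 1. Pause the previous version's ingestion so no rows arrive
  //    mid-backfill and get copied twice (once here, once by the
  //    continuous migration).
  await oldIngest.pause();

  // 2. Backfill: copy existing rows into the new version's table,
  //    applying the same transform the continuous migration uses.
  //    (Table names and SQL functions are illustrative.)
  await db.query(`
    INSERT INTO UserActivity_0_1
    SELECT eventId, parseDateTimeBestEffort(timestamp), lower(activity)
    FROM UserActivity_0_0
  `);

  // 3. Resume normal flow; from here the continuous migration keeps
  //    the new version current in near real-time.
  await oldIngest.resume();
}
```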