Introduction to Data Change Management (DCM)
Data schemas evolve over time, on both the production and consumption sides. While traditional schema management tools work well for transactional databases, they fall short in addressing the unique challenges of analytics data:
- Event-based and often immutable data
- Decoupled producers and consumers
- Different evolution speeds between data production and consumption
Moose's Data Change Management (DCM) is designed to tackle these analytics-specific challenges: it lets you manage how multiple versions of your Data Model coexist in production.
Data change management is in pre-alpha. We are releasing it to get feedback on its direction. We intend to polish it up in the coming weeks and months using the community's feedback.
Core Concepts
Data Change Management (DCM) in Moose enables multiple versions of your Data Model to operate simultaneously. This lets you evolve your data schema without disrupting existing data producers or consumers with breaking changes.
DCM builds on decades of software development best practice, where versioned APIs are a common pattern. Just as you can consume an API at version 1.0, 1.1, or 1.2, you can consume a Data Model at version 1.0, 1.1, or 1.2.
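To make the analogy concrete, here is a minimal TypeScript sketch of two producers writing to two coexisting versions of the same Data Model. The host, port, versioned ingest paths, and field names are assumptions for illustration, not Moose's exact endpoint scheme.

```ts
// Illustrative only: the versioned ingest paths below are assumptions.
// An older producer keeps posting to version 0.0 of the Data Model...
await fetch("http://localhost:4000/ingest/UserActivity/0.0", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    eventId: "e1",
    timestamp: "2024-01-01T00:00:00Z",
    activity: "Login",
  }),
});

// ...while an updated producer posts to version 0.1. Both versions
// coexist, each backed by its own infrastructure.
await fetch("http://localhost:4000/ingest/UserActivity/0.1", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    eventId: "e2",
    timestamp: new Date().toISOString(),
    activity: "login",
    userId: "u42", // a field that only exists in the newer version
  }),
});
```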
Versioning
Moose uses a versioning system to manage different iterations of your data model:
- The latest version corresponds to the most recent commit on the `main` branch.
- Previous versions are tracked in the `moose.config.toml` file (see the sketch after this list).
- Each version has its own set of infrastructure (tables, ingestion endpoints, topics).
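As a rough sketch, the version history in `moose.config.toml` might be recorded along these lines; the key names and commit hashes below are illustrative assumptions, not the file's exact schema.

```toml
# Illustrative only: key names are assumptions, not the exact schema.
# Each previous Data Model version maps to the commit that defines it.
[supported_old_versions]
"0.0" = "a1b2c3d"
"0.1" = "e4f5a6b"
```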
Continuous Migrations
To ensure data consistency across versions, Moose implements continuous migrations:
- Data ingested into older versions is automatically migrated to the latest version.
- This process runs in near real-time within the database.
- Migrations are defined as functions in your programming language, so you can customize the transformation as needed (see the sketch after this list).
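As a hedged sketch, a migration between two versions might be a pure function that maps a record in the old shape to a record in the new one. The type names, fields, and export convention here are assumptions for illustration, not Moose's exact API.

```ts
// Illustrative migration from version 0.0 to 0.1 of UserActivity.
// Shapes and the export convention are assumptions, not exact Moose APIs.
interface UserActivityV0 {
  eventId: string;
  timestamp: string; // ISO 8601 string in the old version
  activity: string;
}

interface UserActivityV1 {
  eventId: string;
  timestamp: Date;   // parsed Date in the new version
  activity: string;
  userId?: string;   // new optional field, absent from old data
}

// Pure record-to-record transform; conceptually, this is what runs in
// near real-time against data ingested into the older version.
export default function migrate(old: UserActivityV0): UserActivityV1 {
  return {
    eventId: old.eventId,
    timestamp: new Date(old.timestamp),
    activity: old.activity.toLowerCase(), // example normalization
    // userId stays undefined: the old version never captured it
  };
}
```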
Initial Data Migrations
When a new version is created:
- An initial data migration populates the new version with data from the previous version.
- This process temporarily pauses related infrastructure to prevent data duplication.
- Once complete, normal data flow resumes (the sketch below illustrates the sequence).
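Conceptually, the initial migration follows a pause/backfill/resume ordering. The TypeScript sketch below only illustrates that ordering; every name in it (the `Db` and `Infra` types, the table names, the SQL functions) is a hypothetical stand-in, not a Moose API.

```ts
// Hypothetical sketch of the initial-migration sequence.
type Db = { query(sql: string): Promise<void> };
type Infra = { pause(): Promise<void>; resume(): Promise<void> };

async function initialMigration(db: Db, oldIngest: Infra): Promise<void> {
  // 1. Pause the previous version's ingestion so no rows arrive
  //    mid-backfill and get copied twice (once here, once by the
  //    continuous migration).
  await oldIngest.pause();

  // 2. Backfill: copy existing rows into the new version's table,
  //    applying the same transform the continuous migration uses.
  //    (Table names and SQL functions are illustrative.)
  await db.query(`
    INSERT INTO UserActivity_0_1
    SELECT eventId, parseDateTimeBestEffort(timestamp), lower(activity)
    FROM UserActivity_0_0
  `);

  // 3. Resume normal flow; from here the continuous migration keeps
  //    the new version current in near real-time.
  await oldIngest.resume();
}
```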