Setting up Multiple Versions of a Data Model
This guide demonstrates how to create a new version of a Data Model in a Moose project and how to generate a migration function that moves data from the old version to the new one.
Recall from the Data Change Management Core Concepts that Moose supports operating multiple versions of a Data Model in parallel, so that when you make a change to a Data Model, you can continue to operate on the old version of the Data Model while the new version is being ingested and migrated.
Also recall that the current state of the files in your Moose application's git repository corresponds to the latest version of your Data Model.
If you don't have a Moose project with a Data Model, follow the Quick Start guide to create a new Moose project for this guide.
Change a Data Model
Select a Data Model in your datamodels folder and change one of its fields. You can change a field's type or name, or remove a field and add a new one with a different name or type.
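As a rough illustration of the kinds of changes described above, the sketch below shows a rename and a type change side by side. The PageView model and its fields are hypothetical (they are not part of the boilerplate project), and Key is declared locally as a stand-in for Moose's Key type so that the snippet compiles on its own. In a real project you would edit the existing interface in place rather than keep both shapes in one file.

```typescript
// Local stand-in for Moose's Key type (assumption, so the sketch is self-contained).
type Key<T> = T;

// Hypothetical model before the change.
interface PageViewBefore {
  viewId: Key<string>;
  url: string;
  durationMs: string; // stored as a string
}

// The same hypothetical model after the change.
interface PageViewAfter {
  viewId: Key<string>;
  path: string; // renamed from `url`
  durationMs: number; // type changed from string to number
}

// Maps an old record to the new shape, handling each change explicitly.
function toAfter(old: PageViewBefore): PageViewAfter {
  return {
    viewId: old.viewId,
    path: old.url,
    durationMs: Number(old.durationMs),
  };
}

const example = toAfter({ viewId: "v1", url: "/home", durationMs: "1200" });
console.log(example);
```

Each kind of change (rename, type change, removal) implies a different mapping from old records to new ones, which is what the migration function later in this guide expresses.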
If you have a dev server running, stop it (CTRL+C in the terminal) before proceeding.
Example: Defining a new version of the UserActivity Data Model
In the boilerplate project created in the Quick Start guide, the UserActivity Data Model is defined as follows:
export interface UserActivity {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
  activity: string;
}
To change the UserActivity Data Model and generate a new version of it, we will delete the activity field. The new version of the Data Model will be as follows:
export interface UserActivity {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
}
Bumping the Version of the Data Model
Run the following command to bump the version of the Data Model:
npx moose-cli bump-version
This command bumps the version in your package.json file and records in the moose.config.toml file which git commit corresponds to the previous version:
[supported_old_versions]
"0.0" = "263297f"
Generate the Continuous Migration Function
Run the following command:
npx moose-cli generate migrations
Streaming Function Migration
This command generates a special migration Streaming Function in the /functions directory.
- UserActivity_migrate__0_0__0_1.ts
Inside this file, you will see the following:
export default function migrate(event: UserActivity_0_0): UserActivity_0_1 {
  return {
    eventId: event.eventId,
    timestamp: event.timestamp,
    userId: event.userId,
  };
}
The setup of this function looks like a regular Streaming Function:
- The old version (UserActivity_0_0) is the input type.
- The new version (UserActivity_0_1) is the output type.
- The function body is the migration logic.
Migration functions are unique because they are initially executed on all the data in the table for the previous version. This is the Initial Data Load part of the migration.
Like regular Streaming Functions, this migration function will be executed for any new data that is ingested into the old Data Model version in order to migrate it to the new version. This is the Continuous part of the migration.
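The two phases described above can be sketched in plain TypeScript. This is only a simulation under stated assumptions: the arrays oldTable and newTable and the ingestOld helper are hypothetical stand-ins for the versioned tables and the ingestion path, and Key is declared locally in place of Moose's Key type. Only the migrate function mirrors the generated file.

```typescript
// Local stand-in for Moose's Key type (assumption).
type Key<T> = T;

interface UserActivity_0_0 {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
  activity: string;
}

interface UserActivity_0_1 {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
}

// Same logic as the generated migration function.
function migrate(event: UserActivity_0_0): UserActivity_0_1 {
  return {
    eventId: event.eventId,
    timestamp: event.timestamp,
    userId: event.userId,
  };
}

// Hypothetical in-memory stand-ins for the two versioned tables.
const oldTable: UserActivity_0_0[] = [
  { eventId: "a", userId: "1", timestamp: new Date("2024-01-01T00:00:00Z"), activity: "click" },
  { eventId: "b", userId: "2", timestamp: new Date("2024-01-02T00:00:00Z"), activity: "view" },
];
const newTable: UserActivity_0_1[] = [];

// Initial Data Load: run the migration once over every row already stored
// in the previous version.
for (const row of oldTable) {
  newTable.push(migrate(row));
}

// Continuous: every event later ingested into the old version is stored
// there and also migrated to the new version.
function ingestOld(event: UserActivity_0_0): void {
  oldTable.push(event);
  newTable.push(migrate(event));
}

ingestOld({ eventId: "c", userId: "3", timestamp: new Date("2024-01-03T00:00:00Z"), activity: "click" });
console.log(newTable.map((row) => row.eventId));
```

In this simulation both tables end up with the same events, and no row in the new table carries the activity field.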
Inspect the Resulting State
First start your dev server:
npx moose-cli dev
Next, open a new terminal window and run the following command:
npx moose-cli ls
Here you will be able to see the additional tables automatically created for you: UserActivity_0_1 is one of them.
If you are following this guide with a different Data Model in your own project, you should see YOUR_DATA_MODEL_NAME_0_1 in the list of tables.
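The table names above appear to follow a simple pattern: the Data Model name, an underscore, and the version with its dots replaced by underscores. The helper below is a hypothetical sketch of that pattern, inferred only from the names shown in this guide; it is not a Moose API.

```typescript
// Hypothetical helper (not part of Moose): derives the versioned table name
// from a Data Model name and a version string such as "0.1".
function versionedTableName(model: string, version: string): string {
  return `${model}_${version.split(".").join("_")}`;
}

console.log(versionedTableName("UserActivity", "0.0"));
console.log(versionedTableName("UserActivity", "0.1"));
```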
Run the following in your terminal to send some data to version _0_0 of your Data Model:
curl -v -X POST \
-H "Content-Type: application/json" \
-d "{\"eventId\": \"1\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"1\", \"activity\": \"click\"}" \
http://localhost:4000/ingest/UserActivity/0.0
Note that this payload still contains the activity field, which we deleted in the new version.
Query the Data Model
In your DB explorer, execute the following query:
SELECT * FROM local.UserActivity_0_0 LIMIT 50;
You should see the event we just added there.
Validate the Continuous Migration Output
Because the continuous migration runs behind the scenes, you can also see the data inside local.UserActivity_0_1:
SELECT * FROM local.UserActivity_0_1 LIMIT 50;
This should return the same record, but without the activity column.
The data was automatically migrated to the new table. This is a trivial example, but the same mechanism applies to more complex migrations.
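As an illustration of what a less trivial migration body might look like, the sketch below assumes a different, hypothetical new version in which the free-text activity field is replaced by a constrained activityType field instead of being deleted. The UserActivityNext interface, the ActivityType union, and the toActivityType helper are all invented for this example, and Key is a local stand-in for Moose's Key type.

```typescript
// Local stand-in for Moose's Key type (assumption).
type Key<T> = T;

interface UserActivity_0_0 {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
  activity: string;
}

// Hypothetical alternative new version: `activity` becomes a constrained type.
type ActivityType = "click" | "view" | "other";

interface UserActivityNext {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
  activityType: ActivityType;
}

// Normalizes free-text activity values into the constrained set.
function toActivityType(raw: string): ActivityType {
  const normalized = raw.trim().toLowerCase();
  if (normalized === "click") return "click";
  if (normalized === "view") return "view";
  return "other";
}

// Same shape as a generated migration function, with hand-edited logic.
function migrate(event: UserActivity_0_0): UserActivityNext {
  return {
    eventId: event.eventId,
    timestamp: event.timestamp,
    userId: event.userId,
    activityType: toActivityType(event.activity),
  };
}

const migrated = migrate({
  eventId: "1",
  userId: "1",
  timestamp: new Date("2024-01-01T00:00:00Z"),
  activity: " Click ",
});
console.log(migrated.activityType);
```

The input and output types still describe the old and new versions; only the function body carries the extra transformation.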
Sending Data to the New Data Model Version
Run the following curl command to send sample data to the new version of the UserActivity Data Model (UserActivity_0_1):
curl -v -X POST \
-H "Content-Type: application/json" \
-d "{\"eventId\": \"2\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"2\"}" \
http://localhost:4000/ingest/UserActivity/0.1
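If you prefer to send events from application code rather than curl, the request can be assembled as in the sketch below. It only builds the request and does not send it. The URL, headers, and timestamp format are taken from the curl commands in this guide; the names buildIngestRequest, formatTimestamp, and IngestRequest are hypothetical helpers, not part of Moose.

```typescript
// Payload for the new version (0.1) of the Data Model, as sent over HTTP.
interface UserActivityPayload {
  eventId: string;
  userId: string;
  timestamp: string; // "YYYY-MM-DD HH:MM:SS", matching the curl example
}

// Zero-pads a number to two digits.
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// Formats a Date in local time, like `date '+%Y-%m-%d %H:%M:%S'`.
function formatTimestamp(d: Date): string {
  const datePart = `${d.getFullYear()}-${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
  const timePart = `${pad2(d.getHours())}:${pad2(d.getMinutes())}:${pad2(d.getSeconds())}`;
  return `${datePart} ${timePart}`;
}

interface IngestRequest {
  url: string;
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

// Hypothetical helper: builds the same request the curl command sends.
function buildIngestRequest(version: string, payload: UserActivityPayload): IngestRequest {
  return {
    url: `http://localhost:4000/ingest/UserActivity/${version}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

const request = buildIngestRequest("0.1", {
  eventId: "2",
  userId: "2",
  timestamp: formatTimestamp(new Date()),
});
console.log(request.url);
console.log(request.body);
```

With the dev server running, the resulting url, method, headers, and body can be handed to any HTTP client to perform the actual POST.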
This setup enables you to keep old models alive with historical data as you migrate your infrastructure to produce and consume data on the new data model.
Once all your data is migrated, you can remove the old version, and Moose will clean up the infrastructure that belonged to it.
To summarize:
- Data Change Management allows you to version your data infrastructure alongside your Data Models.
- As you update your Data Models, Moose will automate the creation of data infrastructure, and allow you to run the old data infrastructure alongside the new data infrastructure, keeping the old data flowing through to the new Data Models with migrations.
- Moose generates these migrations, but you can edit the generated function to fit your requirements.
- This allows you to treat your data infrastructure and your Data Models as you do your code.