Setting up Multiple Versions of a Data Model
The following guide will demonstrate how to create a new version of a Data Model in a Moose project and how to generate a migration function to migrate data from the old version to the new version.
Recall from the Data Change Management Core Concepts that Moose supports operating multiple versions of a Data Model in parallel: when you make a change to a Data Model, you can continue to operate on the old version while data for the new version is being ingested and migrated.
Additionally, recall that the current state of the files in your Moose application's git repository corresponds to the latest version of your Data Model.
If you don't have a Moose project with a Data Model yet, follow the Quick Start guide to create one before continuing.
Change a Data Model
Select a Data Model in your datamodels folder and make a change to one of its fields. This can be a change to the field's type or name, or you can remove a field and add a new one with a different name or type.
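For instance, a rename plus a type change might look like the following sketch (the PageView model and its fields are purely illustrative, and the @514labs/moose-lib import path for Key is assumed):
import { Key } from "@514labs/moose-lib"; // assumed import path for the Key type

export interface PageView {
  eventId: Key<string>; // unchanged primary key
  accountId: string;    // renamed from a previous userId field
  timestamp: string;    // type changed from Date to string
}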
If you have a dev server running, stop it (CTRL+C in the terminal) before proceeding.
Example: Defining a new version of the UserActivity Data Model
In the boilerplate project created in the Quick Start guide, the UserActivity Data Model is defined as follows:
export interface UserActivity {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
  activity: string;
}
To change the UserActivity Data Model and generate a new version of it, we will delete the activity field. The new version of the Data Model will be as follows:
export interface UserActivity {
  eventId: Key<string>;
  userId: string;
  timestamp: Date;
}
Bumping the Version of the Data Model
Run the following command to bump the version of the Data Model:
npx moose-cli bump-version
This command will bump the version in your package.json file and add an entry to your moose.config.toml file that maps the previous version to the git commit containing it:
[supported_old_versions]
"0.0" = "263297f"
Generate the Continuous Migration Function
Run the following command:
npx moose-cli generate migrations
Streaming Function Migration
This command will generate a special migration Streaming Function in the /functions directory:
- UserActivity_migrate__0_0__0_1.ts
Inside this file, you will see the following:
export default function migrate(event: UserActivity_0_0): UserActivity_0_1 {
  return {
    eventId: event.eventId,
    timestamp: event.timestamp,
    userId: event.userId,
  };
}
The setup of this function looks like a regular Streaming Function:
- The old version (UserActivity_0_0) is the input type.
- The new version (UserActivity_0_1) is the output type.
- The function body is the migration logic.
Migration functions are unique because they are initially executed on all the data in the table for the previous version. This is the Initial Data Load part of the migration.
Like regular Streaming Functions, this migration function will be executed for any new data that is ingested into the old Data Model version in order to migrate it to the new version. This is the Continuous part of the migration.
Inspect the Resulting State
First start your dev server:
npx moose-cli dev
Next, open a new terminal window and run the following command:
npx moose-cli ls
Here you will be able to see the additional tables automatically created for you: UserActivity_0_1 is one of them.
If you are following this guide with a different Data Model in your own project, you should see YOUR_DATA_MODEL_NAME_0_1 in the list of tables.
Run the following in your terminal to send some data to version _0_0 of your data model:
curl -v -X POST \
-H "Content-Type: application/json" \
-d "{\"eventId\": \"1\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"1\", \"activity\": \"click\"}" \
http://localhost:4000/ingest/UserActivity/0.0
Notice that the payload still contains the activity field that we deleted in the new version.
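If you prefer to send the event from code instead of curl, a minimal TypeScript sketch might look like the following (it assumes Node 18+ for the global fetch API, a dev server on localhost:4000, and that the ingest endpoint also accepts ISO 8601 timestamps):
// Sketch: send the same sample event to the version 0.0 ingest endpoint.
async function sendSampleEvent() {
  const res = await fetch("http://localhost:4000/ingest/UserActivity/0.0", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      eventId: "1",
      userId: "1",
      timestamp: new Date().toISOString(), // assumes ISO timestamps are accepted
      activity: "click",
    }),
  });
  console.log(res.status); // expect a success status if the event was accepted
}

sendSampleEvent();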
Query the Data Model
In your DB explorer, execute the following query:
SELECT * FROM local.UserActivity_0_0 LIMIT 50;
You should see the event we just added there.
Validate the Continuous Migration Output
The continuous migration running behind the scenes means you can also see the data inside local.UserActivity_0_1:
SELECT * FROM local.UserActivity_0_1 LIMIT 50;
This should return the same record, but without the activity column.
The data was automatically migrated to the new table. This is a trivial example, but the same feature holds true for more complex migrations.
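For example, you could edit the generated migration function to transform values as they move between versions. The sketch below is an assumption about how such a customization might look (it is not what Moose generates); it trims whitespace from userId while still dropping the activity field:
export default function migrate(event: UserActivity_0_0): UserActivity_0_1 {
  return {
    eventId: event.eventId,
    timestamp: event.timestamp,
    userId: event.userId.trim(), // example transformation applied during migration
  };
}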
Sending Data to the New Data Model Version
Run the following curl
command to send sample data to the new version of the UserActivity
Data Model (UserActivity_0_1
):
curl -v -X POST \
-H "Content-Type: application/json" \
-d "{\"eventId\": \"2\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"2\"}" \
http://localhost:4000/ingest/UserActivity/0.1
This setup enables you to keep old models alive with historical data as you migrate your infrastructure to produce and consume data on the new data model.
Once all your data is migrated, you can remove the old version, and Moose will clean up the corresponding infrastructure.
To summarize:
- Data Change Management allows you to version your data infrastructure alongside your Data Models.
- As you update your Data Models, Moose automates the creation of the corresponding data infrastructure and lets you run the old infrastructure alongside the new, keeping old data flowing through to the new Data Models via migrations.
- These migrations are generated by Moose, but you can edit them to fit your requirements.
- This allows you to treat your data infrastructure and your Data Models as you do your code.