Define your First Data Model

Define a Data Model to Ingest Data from the GitHub Webhook


Now you’re ready to define your first Moose primitive: a Data Model. This primitive is where you specify the schema of the data you will use in your application. Let's create a RawStarEvent Data Model for the data we will receive from the GitHub webhook.

Initialize a Data Model from Sample Data via the CLI

Instead of writing your Data Model by hand, let’s use a Moose CLI command that automatically generates a Data Model by analyzing a set of sample data.
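To build intuition for what "generating a Data Model from sample data" involves, here is a simplified, hypothetical sketch of inferring a TypeScript interface from a single JSON sample. It is not Moose's actual implementation. The function names `inferType` and `inferInterface` and the sample object are made up for illustration. Unlike the real generator, the sketch does not choose a `Key` field and does not detect optional fields.

```typescript
// Hypothetical sketch: infer a TypeScript interface from one JSON sample.
// This is NOT how Moose implements `data-model init`; it only illustrates the idea.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function inferType(value: Json, indent = ""): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Treat the first element as representative; empty arrays become unknown[].
    const [first] = value;
    return first !== undefined ? `${inferType(first, indent)}[]` : "unknown[]";
  }
  if (typeof value === "object") {
    // Nested objects become inline object types, indented one level deeper.
    const inner = indent + "  ";
    const fields = Object.entries(value).map(
      ([key, v]) => `${inner}${key}: ${inferType(v, inner)};`,
    );
    return `{\n${fields.join("\n")}\n${indent}}`;
  }
  // Remaining primitives map directly: "string" | "number" | "boolean".
  return typeof value;
}

function inferInterface(name: string, sample: { [key: string]: Json }): string {
  return `export interface ${name} ${inferType(sample)}`;
}

// Hypothetical sample, trimmed to a few fields.
const sample = { action: "created", starred_at: "2024-05-01T12:00:00Z", sender: { login: "octocat" } };
console.log(inferInterface("RawStarEvent", sample));
```

A real generator would typically look at many samples, so that it can tell when a field is sometimes missing (and should be optional) and can pick a sorting key. This one-sample version leaves both out to keep the recursion easy to follow.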

Add Sample Data to your Project

Create a new file named sample-data.json in the root folder of your project. Inside this file, paste in the following sample data and save your changes:

    Initialize a RawStarEvent Data Model via the CLI

    MooseTip:

    Start a new terminal session and make sure to navigate back to your project directory.

    Terminal
    npx moose-cli data-model init RawStarEvent -s sample-data.json

    This command generates a new file named RawStarEvent.ts and places it in the datamodels folder of your project.

    Open and Inspect RawStarEvent.ts

    Open RawStarEvent.ts in your IDE and inspect the Data Model code that Moose generated from the sample data.

    Note a few interesting fields:

    datamodels/RawStarEvent.ts
    import { Key } from "@514labs/moose-lib";

    export interface RawStarEvent {
      action: Key<string>;
      organization: {
        // ... more data fields
      };
      // ... more data fields
      sender: {
        // ... more data fields
      };
      starred_at: string;
    }

    • action: Indicates whether a star was created or deleted for this event.
    • starred_at: For created actions, this field represents the timestamp when the star was added.
    • sender: An object containing information about the user who starred or un-starred the repository. There are two fields within this nested object that we want to ingest:
      • sender.login: the username of the starrer
      • sender.repos_url: the URL we can use to look up the starrer's public repositories
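The following sketch pulls the fields described above out of an incoming event. It assumes a trimmed-down, hypothetical event shape (`StarEventFields`) and a hypothetical output type (`StarSummary`). The generated RawStarEvent.ts has many more fields and types `starred_at` as a plain string. Here `starred_at` is typed as optional and nullable, on the assumption that it carries no timestamp for deleted actions.

```typescript
// Hypothetical, trimmed-down event shape containing only the fields discussed above.
interface StarEventFields {
  action: string; // "created" or "deleted"
  starred_at?: string | null; // assumed to be absent or null when a star is deleted
  sender: {
    login: string; // username of the starrer
    repos_url: string; // URL listing the starrer's public repositories
  };
}

// Hypothetical flattened record holding just the values we care about.
interface StarSummary {
  action: string;
  starredAt: string | null;
  username: string;
  reposUrl: string;
}

function summarizeStarEvent(event: StarEventFields): StarSummary {
  return {
    action: event.action,
    // Only "created" events carry a meaningful star timestamp.
    starredAt: event.action === "created" ? event.starred_at ?? null : null,
    username: event.sender.login,
    reposUrl: event.sender.repos_url,
  };
}

// Hypothetical example event.
const example: StarEventFields = {
  action: "created",
  starred_at: "2024-05-01T12:00:00Z",
  sender: { login: "octocat", repos_url: "https://api.github.com/users/octocat/repos" },
};
console.log(summarizeStarEvent(example));
```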
    Data Modeling Tips:
    • Each Data Model must designate a field as the Key type. This special type, imported from @514labs/moose-lib, specifies the sorting key for your ClickHouse table.
    • The optional (?) operator marks a field as nullable (i.e., the field may be missing from the data you want to ingest).
    • Only exported interfaces are picked up as Data Models, and Moose spins up the corresponding infrastructure for each of them.
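The three tips can be seen together in one small, hypothetical file. To keep the snippet runnable without installing the package, it declares a local stand-in `type Key<T> = T`. That alias is an assumption made for illustration only. In a real Moose project you would import `Key` from @514labs/moose-lib, as the generated file does. The names `Sender`, `StarEventSketch`, and `starTimestamp` are also hypothetical.

```typescript
// Local stand-in so this sketch runs on its own (an assumption, not the library's definition).
// In a Moose project, use instead: import { Key } from "@514labs/moose-lib";
type Key<T> = T;

// Not exported: per the tips above, Moose would not treat this as a Data Model.
interface Sender {
  login: string;
  repos_url: string;
}

// Exported: this is the kind of interface Moose picks up as a Data Model.
export interface StarEventSketch {
  action: Key<string>; // designated Key field (sorting key for the ClickHouse table)
  starred_at?: string; // optional: may be missing from an incoming event
  sender: Sender;
}

// Code that reads an optional field has to handle the missing case explicitly.
function starTimestamp(event: StarEventSketch): string {
  return event.starred_at ?? "(not provided)";
}

console.log(starTimestamp({ action: "deleted", sender: { login: "octocat", repos_url: "https://api.github.com/users/octocat/repos" } }));
```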

    Verify Moose Configured the Infrastructure

    Your dev server should have automatically picked up on the new Data Model and configured the necessary infrastructure to ingest webhook events with this schema.

    Run this command in your terminal to verify:

    Terminal
    npx moose-cli ls

    Congrats!

    With just a single terminal command, you've generated a Moose Data Model. From this Data Model, Moose automatically generated:

    • An API route to ingest RawStarEvent data (ingest/RawStarEvent)
    • A Redpanda streaming topic to queue and process the data
    • A ClickHouse database table to store these events (RawStarEvent_0_0)
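To try out the ingest route, you could POST a JSON event to it. The hypothetical helper below builds that request without sending it, so it can run anywhere. The base URL `http://localhost:4000` is an assumption about where your local dev server listens. Use whatever host and port your dev server actually reports. The `buildIngestRequest` name and the `IngestRequest` shape are illustrative, not part of the Moose API.

```typescript
// Hypothetical helper: builds (but does not send) a POST request for a Moose ingest route.
// Assumption: the dev server listens on http://localhost:4000; override baseUrl if not.
interface IngestRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

function buildIngestRequest(
  dataModel: string,
  payload: unknown,
  baseUrl = "http://localhost:4000",
): IngestRequest {
  return {
    // The route name mirrors the ingest/<DataModel> path listed above.
    url: `${baseUrl}/ingest/${dataModel}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

const req = buildIngestRequest("RawStarEvent", { action: "created", sender: { login: "octocat" } });
console.log(req.url); // http://localhost:4000/ingest/RawStarEvent
```

On Node 18 or later, you could send the request with `await fetch(req.url, req.init)`. If the dev server accepts it, the event should then flow through the streaming topic into the RawStarEvent_0_0 table.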