Define a Data Model to Ingest Data from the GitHub Webhook
Now you're ready to define your first Moose primitive: a Data Model. A Data Model is where you specify the schema of the data your application uses. Let's create a RawStarEvent Data Model for the data we will receive from the GitHub webhook.
Initialize a Data Model from Sample Data via the CLI
Instead of writing your Data Model by hand, let's use the Moose CLI, which can generate a Data Model automatically by analyzing a set of sample data.
Add Sample Data to your Project
Create a new file named sample-data.json in the root folder of your project.
Inside this file, paste in the following sample data and save your changes:
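Once you've saved the file, you can optionally sanity-check it before handing it to the CLI. The script below is a minimal sketch, not part of Moose: it assumes the file holds either one star webhook payload or a list of them, and that each payload should carry the top-level keys action, starred_at, repository, and sender (organization is left out because GitHub only includes it for organization-owned repositories). The function name check_sample_data is hypothetical.

```python
import json
from pathlib import Path

# Assumption: top-level keys of a GitHub "star" webhook payload used by this tutorial.
EXPECTED_FIELDS = {"action", "starred_at", "repository", "sender"}


def check_sample_data(path: str) -> list[str]:
    """Hypothetical helper: return a list of problems (an empty list means the file looks OK)."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except OSError as exc:
        return [f"cannot read file: {exc}"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    # Accept either a single event object or a list of event objects.
    records = data if isinstance(data, list) else [data]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i} is not a JSON object")
            continue
        missing = EXPECTED_FIELDS - set(record)
        if missing:
            problems.append(f"record {i} is missing: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    issues = check_sample_data("sample-data.json")
    print("\n".join(issues) if issues else "sample-data.json looks good")
```

Note that a key whose value is null (such as starred_at on a deleted event) still counts as present; only absent keys are reported.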
Initialize a RawStarEvent Data Model via the CLI
Open a new terminal session (leave your dev server running in the original one) and navigate back to your project directory. Then run the command for your language:
TypeScript:
npx moose-cli data-model init RawStarEvent -s sample-data.json
Python:
moose-cli data-model init RawStarEvent -s sample-data.json
This command generates a new file, RawStarEvent.ts (TypeScript) or RawStarEvent.py (Python), and places it in the datamodels folder of your project.
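How the CLI infers types from sample data isn't described here. As a rough illustration only (this is a toy, not Moose's implementation), the sketch below shows one way a generator could derive field types from sample records, and why a field that is null or absent in some records, like starred_at, ends up as Optional. The names infer_fields and _type_name are hypothetical, and the sketch handles only flat, top-level fields.

```python
from typing import Any


def _type_name(value: Any) -> str:
    """Map a parsed JSON value to a Python type name (bool is checked before int)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "dict"
    return "Any"


def infer_fields(records: list[dict[str, Any]]) -> dict[str, str]:
    """Toy inference: map each top-level field to a type annotation string."""
    all_keys = {key for record in records for key in record}
    result = {}
    for key in sorted(all_keys):
        values = [record.get(key) for record in records]
        seen = {_type_name(v) for v in values if v is not None}
        # One consistent non-null type wins; mixed or all-null values fall back to Any.
        base = seen.pop() if len(seen) == 1 else "Any"
        # A field that is null or missing in at least one record becomes Optional.
        nullable = any(v is None for v in values)
        result[key] = f"Optional[{base}]" if nullable else base
    return result
```

For example, a "created" event with a timestamp plus a "deleted" event with a null starred_at yields {"action": "str", "starred_at": "Optional[str]"}.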
Open and Inspect RawStarEvent.ts (or RawStarEvent.py)
Open the generated file in your IDE to inspect the Data Model code Moose derived from your sample data.
Note a couple of interesting fields in the generated TypeScript interface (abbreviated here):
import { Key } from "@514labs/moose-lib";

export interface RawStarEvent {
  action: Key<string>;
  organization: {
    // more data fields ...
  };
  // more data fields ...
  sender: {
    // ...
  };
  starred_at?: string; // null for "deleted" events, so marked optional
}
In Python, the generated file defines a dataclass for each nested object. Scroll to the bottom of the file to see the RawStarEvent Data Model definition. It is decorated with @moose_data_model, which Moose requires in order to recognize it as a Data Model:
from dataclasses import dataclass
from typing import Optional

from moose_lib import Key, moose_data_model


# Nested objects: plain dataclasses, not decorated with @moose_data_model.
@dataclass
class Owner:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: str
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool


@dataclass
class License:
    key: str
    name: str
    spdx_id: str
    url: str
    node_id: str


@dataclass
class Repository:
    id: int
    node_id: str
    name: str
    full_name: str
    private: bool
    owner: Owner
    html_url: str
    description: str
    fork: bool
    url: str
    forks_url: str
    keys_url: str
    collaborators_url: str
    teams_url: str
    hooks_url: str
    issue_events_url: str
    events_url: str
    assignees_url: str
    branches_url: str
    tags_url: str
    blobs_url: str
    git_tags_url: str
    git_refs_url: str
    trees_url: str
    statuses_url: str
    languages_url: str
    stargazers_url: str
    contributors_url: str
    subscribers_url: str
    subscription_url: str
    commits_url: str
    git_commits_url: str
    comments_url: str
    issue_comment_url: str
    contents_url: str
    compare_url: str
    merges_url: str
    archive_url: str
    downloads_url: str
    issues_url: str
    pulls_url: str
    milestones_url: str
    notifications_url: str
    labels_url: str
    releases_url: str
    deployments_url: str
    created_at: str
    updated_at: str
    pushed_at: str
    git_url: str
    ssh_url: str
    clone_url: str
    svn_url: str
    homepage: str
    size: int
    stargazers_count: int
    watchers_count: int
    language: str
    has_issues: bool
    has_projects: bool
    has_downloads: bool
    has_wiki: bool
    has_pages: bool
    has_discussions: bool
    forks_count: int
    mirror_url: Optional[str]
    archived: bool
    disabled: bool
    open_issues_count: int
    license: License
    allow_forking: bool
    is_template: bool
    web_commit_signoff_required: bool
    topics: list[str]
    visibility: str
    forks: int
    open_issues: int
    watchers: int
    default_branch: str


@dataclass
class Organization:
    login: str
    id: int
    node_id: str
    url: str
    repos_url: str
    events_url: str
    hooks_url: str
    issues_url: str
    members_url: str
    public_members_url: str
    avatar_url: str
    description: str


@dataclass
class Sender:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: str
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool


# Top-level Data Model: decorated so Moose picks it up and provisions infrastructure.
@moose_data_model
@dataclass
class RawStarEvent:
    action: Key[str]
    starred_at: Optional[str]
    repository: Repository
    organization: Organization
    sender: Sender
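Moose deserializes incoming events itself, but if you want to experiment locally with nested dataclasses like these (for example, in a unit test), a small recursive converter is one way to turn a parsed JSON dict into instances. This is a sketch under stated assumptions: the classes below are a trimmed, hypothetical stand-in for the generated model with the moose_lib decorator and Key type removed so the snippet runs without Moose installed, and from_dict is a hypothetical helper, not a moose_lib API.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class Sender:
    login: str
    repos_url: str


@dataclass
class Repository:
    id: int
    full_name: str


@dataclass
class StarEvent:  # hypothetical trimmed stand-in for RawStarEvent (no moose_lib)
    action: str
    starred_at: Optional[str]
    repository: Repository
    sender: Sender


def from_dict(cls, data: dict[str, Any]):
    """Recursively build a dataclass from a parsed JSON dict.

    Extra keys in `data` are ignored; missing keys become None (no validation).
    """
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        value = data.get(f.name)
        field_type = hints[f.name]
        # Unwrap Optional[X] (i.e. Union[X, None]) to X so nested dataclasses are detected.
        if get_origin(field_type) is Union:
            non_none = [a for a in get_args(field_type) if a is not type(None)]
            if len(non_none) == 1:
                field_type = non_none[0]
        if value is not None and is_dataclass(field_type):
            value = from_dict(field_type, value)
        kwargs[f.name] = value
    return cls(**kwargs)
```

Because unknown keys are skipped, you can pass a full webhook payload and only the fields declared on the dataclasses are kept.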
- action: Indicates whether a star was created or deleted for this event.
- starred_at: For created actions, the timestamp when the star was added. GitHub sends null here for deleted actions, which is why the field is optional.
- sender: An object containing information about the user who starred or un-starred the repository. There are two fields within this nested object that we want to ingest:
  - sender.login: the username of the starrer
  - sender.repos_url: the URL we can use to look up the starrer's public repositories
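To make the field semantics above concrete, here is a minimal sketch that pulls just those four values out of a raw star payload (already parsed into a dict). The function summarize_star_event and the StarSummary type are hypothetical names for illustration; the only assumptions about the payload are the ones stated above (action is created or deleted, starred_at may be null, and sender carries login and repos_url).

```python
from typing import Any, Optional, TypedDict


class StarSummary(TypedDict):
    action: str
    starred_at: Optional[str]
    login: str
    repos_url: str


def summarize_star_event(payload: dict[str, Any]) -> StarSummary:
    """Hypothetical helper: keep only the fields this tutorial cares about."""
    action = payload["action"]
    if action not in ("created", "deleted"):
        raise ValueError(f"unexpected action: {action!r}")
    sender = payload["sender"]
    return {
        "action": action,
        # Only populated for "created" events; .get() turns a null or missing value into None.
        "starred_at": payload.get("starred_at"),
        "login": sender["login"],
        "repos_url": sender["repos_url"],
    }
```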
- Each Data Model must designate a field as the Key type. This special type, imported from @514labs/moose-lib (TypeScript) or moose_lib (Python), specifies the sorting key for your ClickHouse table.
- The optional operator (?) in TypeScript, or the Optional[T] type in Python, marks a field as nullable (i.e., the field may be missing or null in the data you want to ingest).
- In TypeScript, only exported interfaces are picked up as Data Models and spin up corresponding Moose infrastructure.
- In Python, only dataclasses decorated with @moose_data_model (imported from moose_lib) are picked up as Data Models and spin up corresponding Moose infrastructure.
- In Python, nested objects are defined as their own @dataclass but without the @moose_data_model decorator.
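The rule that only decorated dataclasses become Data Models follows a common marker-decorator pattern. The toy sketch below illustrates that pattern in isolation; it is not how moose_lib is implemented, and REGISTRY, data_model, and describe_models are hypothetical names. The decorator records the classes it is applied to, so a framework can later enumerate them, while undecorated nested dataclasses are never registered.

```python
from dataclasses import dataclass, fields

# Hypothetical registry standing in for however a framework discovers models.
REGISTRY: dict[str, type] = {}


def data_model(cls: type) -> type:
    """Toy marker decorator: remember the class and return it unchanged."""
    REGISTRY[cls.__name__] = cls
    return cls


@dataclass
class Sender:  # nested object: plain dataclass, so it is not registered
    login: str


@data_model  # applied after @dataclass, mirroring the decorator order above
@dataclass
class StarEvent:  # top-level model: registered
    action: str
    sender: Sender


def describe_models() -> dict[str, list[str]]:
    """List each registered model with its field names, as a framework might when provisioning."""
    return {name: [f.name for f in fields(cls)] for name, cls in REGISTRY.items()}
```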
Verify That Moose Configured the Infrastructure
Your dev server should have automatically picked up the new Data Model and configured the infrastructure needed to ingest webhook events with this schema.
Run this command in your terminal to verify:
TypeScript:
npx moose-cli ls
Python:
moose-cli ls
With a single terminal command, you've generated a Moose Data Model. From this Data Model, Moose in turn automatically provisioned:
- An API route to ingest RawStarEvent data (ingest/RawStarEvent)
- A Redpanda streaming topic to queue and process the data
- A ClickHouse database table to store these events (RawStarEvent_0_0)
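To exercise the new ingest route by hand, you could POST your sample events to it. The sketch below uses only the Python standard library and rests on stated assumptions: that the local dev server listens on http://localhost:4000, that the route accepts one JSON event per POST, and that sample-data.json holds one event or a list of events. build_ingest_request and send_sample_data are hypothetical helper names.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: the local Moose dev server's address; adjust if yours differs.
BASE_URL = "http://localhost:4000"


def build_ingest_request(event: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request that sends one RawStarEvent record to the ingest route."""
    return urllib.request.Request(
        url=f"{base_url}/ingest/RawStarEvent",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_sample_data(path: str = "sample-data.json") -> list[int]:
    """POST every event in the sample file (one object or a list) and return the HTTP statuses."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    statuses = []
    for event in data if isinstance(data, list) else [data]:
        with urllib.request.urlopen(build_ingest_request(event)) as resp:
            statuses.append(resp.status)
    return statuses
```

With the dev server running, call send_sample_data() from your project root; request construction is kept separate from sending so it can be checked without a live server.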