Data Modeling
Moose Data Models allow you to define the schema and shape of your data and automatically provision ingest APIs, streaming topics, and OLAP-backed storage tables—all from simple type definitions.
import { Key } from "@514labs/moose-lib";
export interface MyDataModel {
primaryKey: Key<string>;
someString: string;
someNumber: number;
someBoolean: boolean;
someDate: Date;
someArray: string[];
}
from moose_lib import Key, moose_data_model
from dataclasses import dataclass
from datetime import datetime
@moose_data_model
@dataclass
class MyDataModel:
    primary_key: Key[str]
    some_string: str
    some_number: int
    some_boolean: bool
    some_date: datetime
    some_list: list[str]
Quick Start
Data Models are defined in the /datamodels directory of your Moose application. This folder is automatically created when you initialize a new Moose application.
To create a new Data Model, either create a new file in the /datamodels directory or run the following command:
npx moose-cli data-model init MyDataModel --sample <PATH_TO_SAMPLE_DATA>
The CLI will create a new file named after the Data Model in the /datamodels directory:
- MyDataModel.ts
In a Python project, run the command without npx:
moose-cli data-model init MyDataModel --sample <PATH_TO_SAMPLE_DATA>
The CLI will create a new file named after the Data Model in the /datamodels directory:
- MyDataModel.py
If you have a .json or .csv file with sample data representing the structure of the data you want to ingest, you can use the --sample flag. The CLI will then infer the Data Model schema from the provided file and create the Data Model for you.
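The CLI's actual inference rules are not documented here. As a rough illustration of the idea only, the sketch below maps each top-level key of one JSON sample record to a Python type annotation. The functions `infer_type` and `infer_schema` are hypothetical, and treating `null` as `Optional[str]` and nested objects as `dict` are simplifying assumptions (a real generator would emit a nested dataclass, since `dict` is not a supported field type).

```python
import json
from typing import Any


def infer_type(value: Any) -> str:
    """Return a Python annotation (as a string) for one decoded JSON value."""
    if value is None:
        return "Optional[str]"  # assumption: null carries no type information
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, list):
        # Assume arrays are homogeneous; fall back to list[str] when empty
        return f"list[{infer_type(value[0])}]" if value else "list[str]"
    if isinstance(value, dict):
        return "dict"  # placeholder: a real generator would emit a nested dataclass
    raise TypeError(f"unexpected JSON value: {value!r}")


def infer_schema(sample_json: str) -> dict[str, str]:
    """Map each top-level key of a JSON object to an inferred annotation."""
    record = json.loads(sample_json)
    return {key: infer_type(value) for key, value in record.items()}


sample = """{
  "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
  "example_number": 123,
  "example_boolean": true,
  "example_array": [1, 2, 3]
}"""
print(infer_schema(sample))
# {'example_UUID': 'str', 'example_number': 'int', 'example_boolean': 'bool', 'example_array': 'list[int]'}
```

Note that a sample alone cannot say which field should be the Key, so you would still mark the primary key by hand.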
Data Model Definition
Data Models are written as TypeScript interfaces. To have Moose recognize your Data Model, you must export it from a .ts file. For example:
import { Key } from "@514labs/moose-lib";
export interface MyFirstDataModel {
primaryKey: Key<string>;
someString: string;
someNumber: number;
someBoolean: boolean;
someDate: Date;
someArray: string[];
someOptionalString?: string; // Optional fields are marked with a `?`
}
You can define multiple data models in the same file. Each exported interface is automatically recognized by Moose.
In Python, Data Models are defined as dataclasses using the @dataclass decorator. To ensure Moose automatically detects and registers the dataclass as a Data Model, apply the @moose_data_model decorator to your dataclass.
Import the @moose_data_model decorator from the moose_lib package.
from moose_lib import Key, moose_data_model
from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime
@moose_data_model
@dataclass
class MyFirstDataModel:
    primary_key: Key[str]
    some_string: str
    some_number: int
    some_boolean: bool
    some_date: datetime
    some_array: List[str]
    some_optional_string: Optional[str]
You can define multiple data models in the same file. Each dataclass decorated with @moose_data_model is automatically recognized by Moose.
Basic Usage and Examples
Below you will find several examples showing the different ways you can define Data Models.
Basic Data Model
{
"example_UUID": "123e4567-e89b-12d3-a456-426614174000",
"example_string": "string",
"example_number": 123,
"example_boolean": true,
"example_array": [1, 2, 3]
}
import { Key } from "@514labs/moose-lib";
export interface BasicDataModel {
example_UUID: Key<string>;
example_string: string;
example_number: number;
example_boolean: boolean;
example_array: number[];
}
from dataclasses import dataclass
from moose_lib import Key, moose_data_model
@moose_data_model
@dataclass
class BasicDataModel:
    example_UUID: Key[str]
    example_string: str
    example_number: int
    example_boolean: bool
    example_array: list[int]
Optional Fields
[
{
"example_UUID": "123e4567-e89b-12d3-a456-426614174000",
"example_string": "string",
"example_number": 123,
"example_boolean": true,
"example_array": [1, 2, 3],
"example_optional_string": "optional"
},
{
"example_UUID": "123e4567-e89b-12d3-a456-426614174000",
"example_string": "string",
"example_number": 123,
"example_boolean": true,
"example_array": [1, 2, 3]
}
]
import { Key } from "@514labs/moose-lib";
export interface DataModelWithOptionalField {
example_UUID: Key<string>;
example_string: string;
example_number: number;
example_boolean: boolean;
example_array: number[];
example_optional_string?: string; // Use the `?` operator to mark a field as optional
}
from dataclasses import dataclass
from typing import Optional
from moose_lib import Key, moose_data_model
@moose_data_model
@dataclass
class DataModelWithOptionalField:
    example_UUID: Key[str]
    example_string: str
    example_number: int
    example_boolean: bool
    example_array: list[int]
    example_optional_string: Optional[str] # Use the Optional type
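If you also build these records as dataclass instances in your own Python code (for example in tests), keep in mind that a plain dataclass still requires every constructor argument unless the field has a default. The sketch below uses a standalone copy of the model, with `str` standing in for `Key[str]` so it runs without `moose_lib`, and gives the optional field a `= None` default (the composition example on this page also uses such a default). How Moose itself validates a missing key at ingest time is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionalFieldRecord:
    example_UUID: str  # stands in for Key[str] so this runs without moose_lib
    example_string: str
    example_number: int
    example_boolean: bool
    example_array: list[int]
    # The default lets records that omit this key still be constructed
    example_optional_string: Optional[str] = None


records = [
    {"example_UUID": "123e4567-e89b-12d3-a456-426614174000", "example_string": "string",
     "example_number": 123, "example_boolean": True, "example_array": [1, 2, 3],
     "example_optional_string": "optional"},
    {"example_UUID": "123e4567-e89b-12d3-a456-426614174000", "example_string": "string",
     "example_number": 123, "example_boolean": True, "example_array": [1, 2, 3]},
]
parsed = [OptionalFieldRecord(**record) for record in records]
print([p.example_optional_string for p in parsed])  # ['optional', None]
```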
Nested Fields
{
"example_UUID": "123e4567-e89b-12d3-a456-426614174000",
"example_string": "string",
"example_number": 123,
"example_boolean": true,
"example_array": [1, 2, 3],
"example_nested_object": {
"example_nested_number": 456,
"example_nested_boolean": true,
"example_nested_array": [4, 5, 6]
}
}
import { Key } from "@514labs/moose-lib";
// Define the nested object interface separately
interface NestedObject {
example_nested_number: number;
example_nested_boolean: boolean;
example_nested_array: number[];
}
export interface DataModelWithNestedObject {
example_UUID: Key<string>;
example_string: string;
example_number: number;
example_boolean: boolean;
example_array: number[];
example_nested_object: NestedObject; // Reference nested object interface
}
from moose_lib import Key, moose_data_model
from dataclasses import dataclass
@dataclass
class ExampleNestedObject:
    example_nested_number: int
    example_nested_boolean: bool
    example_nested_array: list[int]
@moose_data_model # Only register the outer dataclass
@dataclass
class DataModelWithNestedObject:
    example_UUID: Key[str]
    example_string: str
    example_number: int
    example_boolean: bool
    example_array: list[int]
    example_nested_object: ExampleNestedObject # Use the nested dataclass instead of a dict
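Because the nested object arrives as a JSON object, Python code that consumes these records has to turn the inner dict into the nested dataclass itself; `Model(**payload)` alone would leave it as a dict. A minimal recursive helper could look like the sketch below. `from_dict` is a hypothetical name, not part of moose_lib, `str` stands in for `Key[str]`, and only a few fields are kept for brevity.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


@dataclass
class ExampleNestedObject:
    example_nested_number: int
    example_nested_boolean: bool
    example_nested_array: list[int]


@dataclass
class DataModelWithNestedObject:
    example_UUID: str  # stands in for Key[str] so this runs without moose_lib
    example_number: int
    example_nested_object: ExampleNestedObject


def from_dict(cls: type, data: dict[str, Any]) -> Any:
    """Build a dataclass instance, recursing into fields whose type is a dataclass."""
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        # f.type is the real class here because annotations are not strings
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


payload = {
    "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
    "example_number": 123,
    "example_nested_object": {
        "example_nested_number": 456,
        "example_nested_boolean": True,
        "example_nested_array": [4, 5, 6],
    },
}
record = from_dict(DataModelWithNestedObject, payload)
print(record.example_nested_object.example_nested_number)  # 456
```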
Using Enum in Data Models
Moose supports the use of Enums in Data Models:
import { Key } from "@514labs/moose-lib";
enum MyEnum {
VALUE_1 = "value1",
VALUE_2 = "value2",
}
export interface DataModelWithEnum {
example_UUID: Key<string>;
example_enum: MyEnum;
}
from moose_lib import Key, moose_data_model
from dataclasses import dataclass
from enum import Enum
class MyEnum(Enum):
    VALUE_1 = "value1"
    VALUE_2 = "value2"
@moose_data_model
@dataclass
class DataModelWithEnum:
    example_UUID: Key[str]
    example_enum: MyEnum
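In plain Python, an Enum member can be looked up from its value, and unknown values are rejected with a `ValueError`. The sketch below is independent of Moose and assumes, for illustration, that records carry the enum's value strings (such as "value1"); `parse_enum` is a hypothetical helper.

```python
from enum import Enum


class MyEnum(Enum):
    VALUE_1 = "value1"
    VALUE_2 = "value2"


def parse_enum(raw: str) -> MyEnum:
    """Look up a member by its value; Enum raises ValueError for unknown values."""
    return MyEnum(raw)


print(parse_enum("value1"))  # MyEnum.VALUE_1
try:
    parse_enum("value3")
except ValueError as err:
    print("rejected:", err)
```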
Inheritance
Moose supports the use of inheritance in Data Models. This allows you to create a base Data Model and extend it with additional fields:
import { Key } from "@514labs/moose-lib";
export interface BaseDataModel {
example_UUID: Key<string>;
}
export interface ExtendedDataModel extends BaseDataModel {
example_string: string;
}
// Schema of ExtendedDataModel:
// {
// "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
// "example_string": "string"
// }
Inheritance is not yet supported for Python Data Models. Instead, you can use composition to achieve similar functionality. Support for inheritance is in the works.
from dataclasses import dataclass
from moose_lib import moose_data_model, Key
from typing import Optional
@dataclass
class BaseData:
    # The fields that might have otherwise lived in a base class
    id: Key[str]
    created_at: str
@moose_data_model
@dataclass
class CompositeDataModel:
    # Use composition by embedding BaseData
    base_data: BaseData
    # The "extended" fields
    user_name: str
    email: Optional[str] = None
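One consequence of composition is that the record shape differs from the TypeScript inheritance example: the base fields sit under a nested base_data object instead of at the top level. `dataclasses.asdict` shows this on a standalone copy of the classes, with `str` in place of `Key[str]` so it runs without `moose_lib`. How Moose lays out nested fields in storage is not covered by this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BaseData:
    id: str  # stands in for Key[str] so this runs without moose_lib
    created_at: str


@dataclass
class CompositeDataModel:
    base_data: BaseData
    user_name: str
    email: Optional[str] = None


record = CompositeDataModel(
    base_data=BaseData(id="abc-123", created_at="2024-01-01T00:00:00Z"),
    user_name="ada",
)
# asdict recurses into the embedded dataclass, producing a nested dict
print(asdict(record))
# {'base_data': {'id': 'abc-123', 'created_at': '2024-01-01T00:00:00Z'}, 'user_name': 'ada', 'email': None}
```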
Field Types
The schema defines the names and data types of the properties in your Data Model. Moose uses this schema to automatically set up the data infrastructure, ensuring that only data conforming to the schema can be ingested, buffered, and stored.
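Moose generates this validation for you. To make the idea concrete, here is a simplified, hypothetical checker (`conforms` and `matches` are not moose_lib functions) that tests a decoded JSON payload against a dataclass's annotations. It covers only `str`, `int`, `float`, `bool`, `list[T]`, and `Optional[T]`; it does not handle `datetime`, nested dataclasses, or enums, and Moose's actual rules may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Union, get_args, get_origin


def matches(value: Any, expected: Any) -> bool:
    """Return True if a decoded JSON value fits the annotation `expected`."""
    origin = get_origin(expected)
    if origin is Union:  # Optional[T] is Union[T, None]
        return any(matches(value, arg) for arg in get_args(expected))
    if origin is list:
        (item_type,) = get_args(expected)
        return isinstance(value, list) and all(matches(v, item_type) for v in value)
    if expected is type(None):
        return value is None
    if expected in (int, float):
        # JSON true/false decode to bool, which subclasses int, so reject bools.
        # A float field also accepts JSON integers such as 3.
        allowed = (int,) if expected is int else (int, float)
        return isinstance(value, allowed) and not isinstance(value, bool)
    return isinstance(value, expected)


def conforms(payload: dict[str, Any], cls: type) -> bool:
    """Check that payload has only known keys and every field's value fits its type."""
    expected_types = {f.name: f.type for f in fields(cls)}
    if not set(payload) <= set(expected_types):
        return False  # payload contains a key the model does not define
    # A missing key is treated as None, so only Optional fields may be omitted
    return all(matches(payload.get(name), t) for name, t in expected_types.items())


@dataclass
class Reading:
    sensor_id: str  # stands in for Key[str] so this runs without moose_lib
    value: float
    ok: bool
    tags: list[str]
    note: Optional[str]
```

For example, `conforms({"sensor_id": "s1", "value": 3, "ok": True, "tags": ["a"]}, Reading)` is true, while the same payload with `"value": "3"` is rejected.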
Key Type
The Key[T: (str, int)] (Python) or Key<T> (TypeScript) type is specific to Moose and is used to define a primary key in the OLAP storage table for your Data Model. If your Data Model requires a composite key, you can apply the Key type to multiple columns.
If you do not specify a Key type, you must set up a DataModelConfig that specifies the order_by_fields. Learn more in the configuration section below.
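The rule above can be restated as a small decision function. This is a hypothetical sketch of the documented behavior, not Moose's implementation: `resolve_sort_key` and its parameters are illustrative names, and giving explicit order_by_fields precedence over Key columns is an assumption.

```python
from typing import Optional, Sequence


def resolve_sort_key(
    key_fields: Sequence[str], order_by_fields: Optional[Sequence[str]] = None
) -> list[str]:
    """Pick the columns that would order the storage table.

    Assumption: explicit order_by_fields take precedence over Key columns.
    """
    if order_by_fields:
        return list(order_by_fields)
    if key_fields:
        return list(key_fields)  # one Key column, or several for a composite key
    raise ValueError("no Key field: set order_by_fields in a DataModelConfig")


print(resolve_sort_key(["tenant_id", "event_id"]))  # ['tenant_id', 'event_id']
```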
Supported Field Types
The tables below show the supported field types for your Data Model and how Moose stores them in ClickHouse:
ClickHouse | TypeScript | Supported by Moose |
---|---|---|
String | string | ✅ |
Boolean | boolean | ✅ |
Int64 | number | ✅ |
Int256 | BigInt | ❌ |
Float64 | number | ✅ |
Decimal | number | ✅ |
DateTime | Date | ✅ |
Json | Object | ✅ |
Bytes | bytes | ❌ |
Enum | Enum | ✅ |
Array | Array | ✅ |
Nullable | optional field (?) | ✅ |
All TypeScript number types are mapped to Float64.
ClickHouse | Python | Supported by Moose |
---|---|---|
String | str | ✅ |
Boolean | bool | ✅ |
Int64 | int | ✅ |
Int256 | int | ❌ |
Float64 | float | ✅ |
Decimal | float | ✅ |
DateTime | datetime.datetime | ✅ |
Json | dict | ❌ |
Bytes | bytes | ❌ |
Array | list[T] | ✅ |
Nullable | Optional[T] | ✅ |
Moose does not support using dict types for Data Models. Instead, use nested dataclasses to define your Data Model.
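The Python table can be read as a type-mapping function. The sketch below is a hypothetical restatement of its supported rows (`to_clickhouse_type` is not a moose_lib function). It uses the table's labels, writes parameterized types in ClickHouse syntax such as `Array(Int64)` and `Nullable(String)`, and picks `Float64` for `float` even though the table also lists `Decimal`. The exact column types Moose emits may differ.

```python
from datetime import datetime
from typing import Any, Optional, Union, get_args, get_origin

# Supported scalar rows from the Python table (labels as listed there)
SCALARS: dict[type, str] = {
    str: "String",
    bool: "Boolean",
    int: "Int64",
    float: "Float64",  # the table also lists Decimal for float
    datetime: "DateTime",
}


def to_clickhouse_type(annotation: Any) -> str:
    """Translate a supported Python annotation into a ClickHouse type name."""
    origin = get_origin(annotation)
    if origin is Union:
        # Only Optional[T] (a union of T and None) is supported
        args = [a for a in get_args(annotation) if a is not type(None)]
        if len(args) != 1:
            raise TypeError(f"only Optional[T] unions are supported: {annotation}")
        return f"Nullable({to_clickhouse_type(args[0])})"
    if origin is list:
        (item,) = get_args(annotation)
        return f"Array({to_clickhouse_type(item)})"
    if origin is dict or annotation is dict:
        raise TypeError("dict is not supported; use a nested dataclass")
    if annotation in SCALARS:
        return SCALARS[annotation]
    raise TypeError(f"unsupported annotation: {annotation}")


print(to_clickhouse_type(Optional[datetime]))  # Nullable(DateTime)
```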
Ingestion and Storage Configuration
Moose provides sensible defaults for ingesting and storing your data, but you can override these behaviors by exporting a DataModelConfig (TypeScript) or passing one to the @moose_data_model decorator (Python).
Default Configuration
When you do not create a custom DataModelConfig, Moose will use the following defaults:
- Ingestion expects a single JSON object per request.
- A ClickHouse table is automatically created with no explicit ORDER BY fields (the Data Model is defined with a Key).
- Deduplication is disabled.
import { Key, DataModelConfig, IngestionFormat } from "@514labs/moose-lib";
export interface DefaultDataModel {
id: Key<string>;
value: number;
timestamp: Date;
}
// If you omit this config entirely, Moose will still behave the same way as just exporting the DefaultDataModel interface:
// - single JSON ingestion
// - storage enabled
// - deduplication disabled
// - no ORDER BY fields
export const DefaultDataModelConfig: DataModelConfig<DefaultDataModel> = {
ingestion: {
format: IngestionFormat.JSON,
},
storage: {
enabled: true,
deduplicate: false,
},
};
from dataclasses import dataclass
from datetime import datetime
from moose_lib import Key, moose_data_model, DataModelConfig, IngestionConfig, IngestionFormat, StorageConfig
# If you omit this config entirely, Moose will still behave the same way as just defining the
# DefaultDataModel dataclass without any config arguments supplied to the @moose_data_model decorator:
# - single JSON ingestion
# - storage enabled
# - no ORDER BY fields
default_config = DataModelConfig(
ingestion=IngestionConfig(
format=IngestionFormat.JSON
),
storage=StorageConfig(enabled=True)
)
# default_config is only needed if you want to customize ingestion or storage
@moose_data_model(default_config)
@dataclass
class DefaultDataModel:
    id: Key[str]
    value: float
    timestamp: datetime
Enabling Batch Ingestion
Set the ingestion format to IngestionFormat.JSON_ARRAY to enable batch ingestion:
import {
Key,
DataModelConfig,
IngestionFormat
} from "@514labs/moose-lib";
export interface BatchDataModel {
batchId: Key<string>;
metric: number;
}
export const BatchDataModelConfig: DataModelConfig<BatchDataModel> = {
ingestion: {
format: IngestionFormat.JSON_ARRAY // accept an array of JSON objects in the request
}
};
from dataclasses import dataclass
from moose_lib import (
Key,
moose_data_model,
DataModelConfig,
IngestionConfig,
IngestionFormat
)
@moose_data_model(
DataModelConfig(
ingestion=IngestionConfig(
format=IngestionFormat.JSON_ARRAY
)
)
)
@dataclass
class BatchDataModel:
    batch_id: Key[str]
    metric: float
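The two formats differ only in the shape of the request body. A small, hypothetical helper makes that explicit: `build_body` is an illustrative name, and the format names are plain strings here rather than the moose_lib `IngestionFormat` enum. Sending the body to the ingest endpoint is out of scope.

```python
import json
from typing import Any


def build_body(records: list[dict[str, Any]], fmt: str) -> str:
    """Serialize records for an ingest request in the given format."""
    if fmt == "JSON":
        # Single-object format: exactly one record per request
        if len(records) != 1:
            raise ValueError("JSON format sends one object per request")
        return json.dumps(records[0])
    if fmt == "JSON_ARRAY":
        # Batch format: one request carries an array of objects
        return json.dumps(records)
    raise ValueError(f"unknown format: {fmt}")


batch = [{"batch_id": "b1", "metric": 1.5}, {"batch_id": "b2", "metric": 2.0}]
print(build_body(batch, "JSON_ARRAY"))
# [{"batch_id": "b1", "metric": 1.5}, {"batch_id": "b2", "metric": 2.0}]
```

With the default JSON format, the same two records would need two separate requests.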
Disabling Storage
If you do not want to persist the data in OLAP storage, disable storage with the storage parameter. This prevents Moose from creating a table in ClickHouse, but the ingestion endpoint and streaming topic are still created. This is useful for ephemeral or streaming-only use cases:
import { Key, DataModelConfig, StorageConfig } from "@514labs/moose-lib";
export interface NoStorageModel {
id: Key<string>;
data: string;
}
export const NoStorageConfig: DataModelConfig<NoStorageModel> = {
storage: {
enabled: false // no table creation in OLAP storage
}
};
from dataclasses import dataclass
from moose_lib import (
Key,
moose_data_model,
DataModelConfig,
StorageConfig
)
no_storage_config = DataModelConfig(
storage=StorageConfig(enabled=False) # no table creation in OLAP storage
)
@moose_data_model(no_storage_config)
@dataclass
class NoStorageModel:
    id: Key[str]
    data: str
Setting Up Deduplication
Use the deduplicate parameter to enable deduplication. This removes rows from the table that share the same order_by_fields values.
You must use the order_by_fields parameter to specify the columns that will be used for deduplication. Moose preserves the most recently inserted row, using the order_by_fields as the sort key.
import { DataModelConfig, StorageConfig } from "@514labs/moose-lib";
export interface DeduplicatedModel {
timestamp: Date;
data: string;
}
export const DeduplicatedModelConfig: DataModelConfig<DeduplicatedModel> = {
storage: {
enabled: true,
deduplicate: true, // uses the ReplacingMergeTree ClickHouse engine, which keeps the most recently inserted row for each set of order_by_fields values
order_by_fields: ["timestamp"] // deduplication is based on the timestamp column
}
};
from dataclasses import dataclass
from datetime import datetime
from moose_lib import (
moose_data_model,
DataModelConfig,
StorageConfig
)
deduplicated_config = DataModelConfig(
storage=StorageConfig(
enabled=True,
deduplicate=True,
order_by_fields=["timestamp"]
)
)
@moose_data_model(deduplicated_config)
@dataclass
class DeduplicatedModel:
    timestamp: datetime
    data: str
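ClickHouse's ReplacingMergeTree, when no version column is configured, keeps the last inserted row for each distinct sorting-key value within a partition. It does this during background merges, so duplicates can remain visible until a merge runs; queries that need deduplicated results immediately can use FINAL. The pure-Python simulation below mimics only the end state after a merge; `merge_replacing` is a hypothetical name.

```python
from typing import Any


def merge_replacing(rows: list[dict[str, Any]], order_by_fields: list[str]) -> list[dict[str, Any]]:
    """Keep the last-inserted row for each distinct sorting-key value.

    Mimics the post-merge state of a ReplacingMergeTree without a version column.
    """
    latest: dict[tuple, dict[str, Any]] = {}
    for row in rows:  # rows are given in insertion order
        key = tuple(row[field] for field in order_by_fields)
        latest[key] = row  # a later insert replaces the earlier one
    # Return the surviving rows ordered by the sorting key
    return [latest[key] for key in sorted(latest)]


inserted = [
    {"timestamp": "2024-01-01T00:00:00", "data": "first"},
    {"timestamp": "2024-01-01T00:00:01", "data": "other"},
    {"timestamp": "2024-01-01T00:00:00", "data": "second"},  # same key as the first row
]
print(merge_replacing(inserted, ["timestamp"]))
# [{'timestamp': '2024-01-01T00:00:00', 'data': 'second'}, {'timestamp': '2024-01-01T00:00:01', 'data': 'other'}]
```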
Specifying ORDER BY Fields
When you want to optimize queries on specific columns, set order_by_fields. If you don't have a Key, specifying order_by_fields is required so that ingested data is stored with a sort key (index).
import { Key, DataModelConfig, StorageConfig } from "@514labs/moose-lib";
export interface OrderedModel {
primaryKey: Key<string>;
value: number;
createdAt: Date;
}
export const OrderedModelConfig: DataModelConfig<OrderedModel> = {
storage: {
enabled: true,
order_by_fields: ["createdAt"] // optimize queries by creation date
}
};
from dataclasses import dataclass
from datetime import datetime
from moose_lib import (
Key,
moose_data_model,
DataModelConfig,
StorageConfig
)
ordered_config = DataModelConfig(
storage=StorageConfig(
enabled=True,
order_by_fields=["some_date", "some_string"] # optimize queries by some_date and some_string
)
)
@moose_data_model(ordered_config)
@dataclass
class OrderedModel:
    some_string: str
    some_number: float
    some_date: datetime
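Because order_by_fields are plain strings, a typo, or a TypeScript-style name such as createdAt in a Python model that uses snake_case, is easy to miss. A hypothetical pre-flight check (not part of moose_lib) can compare the names against the dataclass's fields:

```python
from dataclasses import dataclass, fields
from datetime import datetime


def check_order_by(cls: type, order_by_fields: list[str]) -> None:
    """Raise ValueError if any order_by field is not a field of the dataclass."""
    known = {f.name for f in fields(cls)}
    unknown = [name for name in order_by_fields if name not in known]
    if unknown:
        raise ValueError(f"{cls.__name__} has no field(s): {', '.join(unknown)}")


@dataclass
class OrderedModel:
    some_string: str
    some_number: float
    some_date: datetime


check_order_by(OrderedModel, ["some_date", "some_string"])  # passes silently
```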
Combining Customizations
You can mix ingestion and storage settings in one config. Below is a complete example that:
- Accepts batch JSON.
- Disables storage (e.g., an ephemeral or streaming-only use case).
- Includes an explicit ORDER BY for the scenario where you may enable storage later.
import {
Key,
DataModelConfig,
IngestionFormat,
StorageConfig,
IngestionConfig,
} from "@514labs/moose-lib";
export interface HybridModel {
id: string;
score: number;
eventTime: Date;
}
export const HybridConfig: DataModelConfig<HybridModel> = {
ingestion: {
format: IngestionFormat.JSON_ARRAY
},
storage: {
enabled: false, // disable table creation for now ...
order_by_fields: ["id"] // ... but keep an ORDER BY ready for when storage is enabled
}
};
from dataclasses import dataclass
from datetime import datetime
from moose_lib import (
Key,
moose_data_model,
DataModelConfig,
IngestionConfig,
IngestionFormat,
StorageConfig
)
hybrid_config = DataModelConfig(
ingestion=IngestionConfig(format=IngestionFormat.JSON_ARRAY),
storage=StorageConfig(
enabled=False, # table creation off for now ...
order_by_fields=["id"] # ... but keep an ORDER BY ready for when storage is enabled
)
)
@moose_data_model(hybrid_config)
@dataclass
class HybridModel:
    id: str
    score: float
    event_time: datetime