Data Modeling

Moose Data Models let you define the schema and shape of your data once. From those type definitions, Moose automatically provisions ingest APIs, streaming topics, and OLAP-backed storage tables.

datamodels/MyDataModel.ts
import { Key } from "@514labs/moose-lib";
 
export interface MyDataModel {
  primaryKey: Key<string>;
  someString: string;
  someNumber: number;
  someBoolean: boolean;
  someDate: Date;
  someArray: string[];
}

Quick Start

Data Models are defined in the /datamodels directory of your Moose application. This folder is automatically created when you initialize a new Moose application.

To create a new Data Model, either create a new file in the /datamodels directory or run the following command:

Terminal
npx moose-cli data-model init MyDataModel --sample <PATH_TO_SAMPLE_DATA>

The CLI will create a new file in the /datamodels directory with the name of the Data Model:

    • MyDataModel.ts

    Automatically Infer Data Model

    If you have a .json or .csv file with sample data representing the structure of the data you want to ingest, you can use the --sample flag. The CLI will then infer the Data Model schema from the provided file and create the Data Model for you.
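
    To make the idea of schema inference concrete, here is a minimal sketch of how field types could be derived from one sample JSON record. It is not the Moose CLI's implementation: inferFieldType and inferInterface are hypothetical helpers, and the real CLI may infer types differently (for example, when deciding which field becomes the Key).

```typescript
// Hypothetical illustration only: this is not how the Moose CLI is implemented.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Map a single JSON value to a TypeScript type name.
function inferFieldType(value: JsonValue): string {
  if (value === null) return "unknown";
  if (Array.isArray(value)) {
    // Use the first element as a representative; empty arrays stay unknown.
    const first = value[0];
    return first === undefined ? "unknown[]" : `${inferFieldType(first)}[]`;
  }
  if (typeof value === "object") return "object";
  return typeof value; // "string", "number", or "boolean"
}

// Render an interface declaration from one sample record.
function inferInterface(
  name: string,
  sample: { [key: string]: JsonValue }
): string {
  const fields = Object.entries(sample).map(
    ([field, value]) => `  ${field}: ${inferFieldType(value)};`
  );
  return [`export interface ${name} {`, ...fields, "}"].join("\n");
}

const sample = {
  example_string: "string",
  example_number: 123,
  example_boolean: true,
  example_array: [1, 2, 3],
};

const inferred = inferInterface("InferredModel", sample);
console.log(inferred);
```

    A single record cannot reveal which fields are optional, which is one reason to review any inferred Data Model before relying on it.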


    Data Model Definition

    Data Models are written as TypeScript interfaces. For Moose to recognize a Data Model, you must export it from a .ts file. For example:

    datamodels/models.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface MyFirstDataModel {
      primaryKey: Key<string>;
      someString: string;
      someNumber: number;
      someBoolean: boolean; 
      someDate: Date;
      someArray: string[];
      someOptionalString?: string; // Optional fields are marked with a `?`
    }
    Note

    You can define multiple data models in the same file. Each exported interface is automatically recognized by Moose.


    Basic Usage and Examples

    Below you will find several examples showing the different ways you can define Data Models.

    Basic Data Model

    sample_data.json
    {
      "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
      "example_string": "string",
      "example_number": 123,
      "example_boolean": true,
      "example_array": [1, 2, 3]
    }
    datamodels/models.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface BasicDataModel {
      example_UUID: Key<string>;
      example_string: string;
      example_number: number;
      example_boolean: boolean;
      example_array: number[];
    }

    Optional Fields

    sample.json
    [
      {
        "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
        "example_string": "string",
        "example_number": 123,
        "example_boolean": true,
        "example_array": [1, 2, 3],
        "example_optional_string": "optional"
      },
      {
        "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
        "example_string": "string",
        "example_number": 123,
        "example_boolean": true,
        "example_array": [1, 2, 3]
      }
    ]
    datamodels/models.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface DataModelWithOptionalField {
      example_UUID: Key<string>;
      example_string: string;
      example_number: number;
      example_boolean: boolean;
      example_array: number[];
      example_optional_string?: string; // Use the `?` operator to mark a field as optional
    }
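
    As a quick check, both records from sample.json type-check against this interface. The sketch below is self-contained, so it declares a local stand-in for Key (assumed here to behave like its underlying type at compile time); in a Moose project you would import Key from @514labs/moose-lib instead. describeOptional is a hypothetical helper.

```typescript
// Local stand-in so the snippet compiles on its own (assumption: Key<T> is
// structurally the same as T at compile time).
type Key<T> = T;

interface DataModelWithOptionalField {
  example_UUID: Key<string>;
  example_string: string;
  example_number: number;
  example_boolean: boolean;
  example_array: number[];
  example_optional_string?: string;
}

// First record from sample.json: the optional field is present.
const withOptional: DataModelWithOptionalField = {
  example_UUID: "123e4567-e89b-12d3-a456-426614174000",
  example_string: "string",
  example_number: 123,
  example_boolean: true,
  example_array: [1, 2, 3],
  example_optional_string: "optional",
};

// Second record: the optional field is omitted, and the type still accepts it.
const withoutOptional: DataModelWithOptionalField = {
  example_UUID: "123e4567-e89b-12d3-a456-426614174000",
  example_string: "string",
  example_number: 123,
  example_boolean: true,
  example_array: [1, 2, 3],
};

// Optional fields read as `string | undefined`, so supply a fallback.
function describeOptional(record: DataModelWithOptionalField): string {
  return record.example_optional_string ?? "(not provided)";
}

console.log(describeOptional(withOptional)); // "optional"
console.log(describeOptional(withoutOptional)); // "(not provided)"
```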

    Nested Fields

    sample.json
    {
      "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
      "example_string": "string",
      "example_number": 123,
      "example_boolean": true,
      "example_array": [1, 2, 3],
      "example_nested_object": {
        "example_nested_number": 456,
        "example_nested_boolean": true,
        "example_nested_array": [4, 5, 6]
      }
    }
    datamodels/models.ts
    import { Key } from "@514labs/moose-lib";
     
    // Define the nested object interface separately
    interface NestedObject {
      example_nested_number: number;
      example_nested_boolean: boolean;
      example_nested_array: number[];
    }
     
    export interface DataModelWithNestedObject {
      example_UUID: Key<string>;
      example_string: string;
      example_number: number;
      example_boolean: boolean;
      example_array: number[]; 
      example_nested_object: NestedObject; // Reference nested object interface
    }

    Using Enum in Data Models

    Moose supports the use of Enums in Data Models:

    datamodels/DataModelWithEnum.ts
    import { Key } from "@514labs/moose-lib";
     
    enum MyEnum {
      VALUE_1 = "value1",
      VALUE_2 = "value2",
    }
     
    export interface DataModelWithEnum {
      example_UUID: Key<string>;
      example_enum: MyEnum;
    }
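
    Because MyEnum is a string enum, the value TypeScript serializes into JSON is the string literal ("value1"), not the member name (VALUE_1). The sketch below shows this, along with a runtime membership check you could apply to untrusted input; isMyEnum is a hypothetical helper and not part of moose-lib.

```typescript
enum MyEnum {
  VALUE_1 = "value1",
  VALUE_2 = "value2",
}

// Hypothetical helper (not part of moose-lib): narrow an arbitrary string to MyEnum.
function isMyEnum(value: string): value is MyEnum {
  return (Object.values(MyEnum) as string[]).includes(value);
}

// JSON.stringify emits the enum's string value, not the member name.
const payload = JSON.stringify({
  example_UUID: "123e4567-e89b-12d3-a456-426614174000",
  example_enum: MyEnum.VALUE_1,
});
console.log(payload);

console.log(isMyEnum("value2")); // true
console.log(isMyEnum("VALUE_2")); // false: member names are not valid values
```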

    Inheritance

    Moose supports the use of inheritance in Data Models. This allows you to create a base Data Model and extend it with additional fields:

    datamodels/models.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface BaseDataModel {
      example_UUID: Key<string>;
    }
     
    export interface ExtendedDataModel extends BaseDataModel {
      example_string: string;
    }
     
    // Example record matching ExtendedDataModel:
    // {
    //   "example_UUID": "123e4567-e89b-12d3-a456-426614174000",
    //   "example_string": "string"
    // }

    Field Types

    The schema defines the names and data types of the properties in your Data Model. Moose uses this schema to automatically set up the data infrastructure, ensuring that only data conforming to the schema can be ingested, buffered, and stored.

    Key Type

    The Key<T> type is specific to Moose and is used to define a primary key in the OLAP storage table for your Data Model. If your Data Model requires a composite key, you can apply the Key type to multiple columns.
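
    For instance, a composite key might look like the sketch below. The model and its field names are hypothetical, and Key is declared locally only so the snippet compiles on its own (assuming Key<T> behaves like T at compile time); in a Moose project you would import Key from @514labs/moose-lib and export the interface from a file in /datamodels.

```typescript
// Local stand-in for the Key type from "@514labs/moose-lib" (assumption).
type Key<T> = T;

// Hypothetical Data Model with a composite key made of two columns.
interface PageViewEvent {
  tenantId: Key<string>; // first key column
  eventId: Key<string>; // second key column
  url: string;
  viewedAt: Date;
}

const pageView: PageViewEvent = {
  tenantId: "tenant-1",
  eventId: "event-42",
  url: "/pricing",
  viewedAt: new Date("2024-01-01T00:00:00Z"),
};

// Both key columns together identify the record.
const compositeKey = `${pageView.tenantId}/${pageView.eventId}`;
console.log(compositeKey); // "tenant-1/event-42"
```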

    Key Type is Required

    If you do not specify a Key type, you must set up a DataModelConfig to specify the properties that will be used for the order_by_fields. Learn more in the configuration section below.

    Supported Field Types

    The table below shows the supported field types for your Data Model and the mapping for how Moose will store them in ClickHouse:

    ClickHouse    TypeScript    Moose
    String        string
    Boolean       boolean
    Int64         number
    Int256        BigInt
    Float64       number
    Decimal       number
    DateTime      Date
    Json          Object
    Bytes         bytes
    Enum          Enum
    Array         Array
    nullable      nullable

    Disclaimer

    All TypeScript number types are mapped to Float64 in ClickHouse.
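
    One practical consequence: a Float64 represents integers exactly only up to 2^53 - 1 (Number.MAX_SAFE_INTEGER), so large 64-bit identifiers can lose precision when modeled as number. The sketch below demonstrates the effect in plain TypeScript, independent of Moose; isExactInteger is a hypothetical helper.

```typescript
// Largest integer a JavaScript number (an IEEE 754 double, i.e. Float64) stores exactly.
const maxSafe = Number.MAX_SAFE_INTEGER; // 9007199254740991 (2^53 - 1)

// Beyond that point, distinct integers can collapse to the same value.
const collapsed = maxSafe + 1 === maxSafe + 2;
console.log(maxSafe, collapsed); // 9007199254740991 true

// Decimal fractions are approximations as well.
const sumIsExact = 0.1 + 0.2 === 0.3;
console.log(sumIsExact); // false

// Guard for values that must stay exact, such as IDs carried as numbers.
function isExactInteger(value: number): boolean {
  return Number.isSafeInteger(value);
}

console.log(isExactInteger(123)); // true
console.log(isExactInteger(2 ** 53)); // false
```

    If exact large identifiers matter for your data, carrying them as string fields is one way to sidestep the issue.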


    Ingestion and Storage Configuration

    Moose provides sensible defaults for ingesting and storing your data, but you can override these behaviors by exporting a DataModelConfig.

    Default Configuration

    When you do not create a custom DataModelConfig, Moose will use the following defaults:

    • Ingestion expects a single JSON object per request.
    • A ClickHouse table is created automatically with no explicit ORDER BY fields, provided the Data Model is defined with a Key.
    • Deduplication is disabled.
    datamodels/DefaultDataModel.ts
    import { Key, DataModelConfig, IngestionFormat } from "@514labs/moose-lib";
     
    export interface DefaultDataModel {
      id: Key<string>;
      value: number;
      timestamp: Date;
    }
     
    // If you omit this config entirely, Moose will still behave the same way as just exporting the DefaultDataModel interface:
    // - single JSON ingestion
    // - storage enabled
    // - deduplication disabled
    // - no ORDER BY fields
    export const DefaultDataModelConfig: DataModelConfig<DefaultDataModel> = {
      ingestion: {
        format: IngestionFormat.JSON,
      },
      storage: {
        enabled: true,
        deduplicate: false,
      },
    };
     

    Enabling Batch Ingestion

    Set ingestion.format to IngestionFormat.JSON_ARRAY to enable batch ingestion:

    datamodels/models.ts
    import {
      Key,
      DataModelConfig,
      IngestionFormat
    } from "@514labs/moose-lib";
     
    export interface BatchDataModel {
      batchId: Key<string>;
      metric: number;
    }
     
    export const BatchDataModelConfig: DataModelConfig<BatchDataModel> = {
      ingestion: {
        format: IngestionFormat.JSON_ARRAY // accept an array of JSON objects in the request
      }
    };
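
    With JSON_ARRAY, the request body is a JSON array of records rather than a single object. The sketch below only builds request bodies: toBatchBodies and the batch size are hypothetical, and how each body is sent (URL, HTTP client) depends on your deployment and is not shown.

```typescript
interface BatchDataModel {
  batchId: string; // Key<string> in the Data Model; plain string here so the snippet stands alone
  metric: number;
}

// Hypothetical helper: split records into chunks and serialize each chunk as a JSON array.
function toBatchBodies(records: BatchDataModel[], batchSize: number): string[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const bodies: string[] = [];
  for (let start = 0; start < records.length; start += batchSize) {
    bodies.push(JSON.stringify(records.slice(start, start + batchSize)));
  }
  return bodies;
}

const records: BatchDataModel[] = [
  { batchId: "a", metric: 1 },
  { batchId: "b", metric: 2 },
  { batchId: "c", metric: 3 },
];

// Three records with a batch size of two produce two request bodies.
const bodies = toBatchBodies(records, 2);
console.log(bodies);
```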

    Disabling Storage

    If you do not want to persist the data in OLAP storage, disable storage with the storage parameter. This will prevent Moose from creating a table in ClickHouse, but the ingestion endpoint and streaming topic will still be created. This is useful for ephemeral or streaming-only use cases:

    datamodels/NoStorageModel.ts
    import { Key, DataModelConfig } from "@514labs/moose-lib";
     
    export interface NoStorageModel {
      id: Key<string>;
      data: string;
    }
     
    export const NoStorageConfig: DataModelConfig<NoStorageModel> = {
      storage: {
        enabled: false // no table creation in OLAP storage
      }
    };

    Setting Up Deduplication

    Use the deduplicate parameter to enable deduplication. Rows that share the same values for the order_by_fields are treated as duplicates and removed from the table.

    Important

    You must set the order_by_fields parameter to specify the columns used for deduplication. Among rows with identical values in those columns, Moose keeps the most recently inserted row.

    datamodels/DeduplicatedModel.ts
    import { DataModelConfig } from "@514labs/moose-lib";
     
    export interface DeduplicatedModel {
      timestamp: Date;
      data: string;
    }
     
    export const DeduplicatedModelConfig: DataModelConfig<DeduplicatedModel> = {
      storage: { 
        enabled: true, 
        deduplicate: true, // uses the ClickHouse ReplacingMergeTree engine, which keeps the latest row among rows with the same order_by_fields values
        order_by_fields: ["timestamp"] // rows with the same timestamp are treated as duplicates
      }
    };
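
    To illustrate the intended outcome, the following sketch simulates deduplication in memory: rows that share the same value for the sort key collapse to the most recently inserted one. This is a simplified model, not ClickHouse's implementation; ReplacingMergeTree removes duplicates during background merges, so duplicates can remain visible for some time after insert. The deduplicate function here is a hypothetical helper.

```typescript
interface DeduplicatedModel {
  timestamp: Date;
  data: string;
}

// Hypothetical helper: keep the last inserted row for each distinct sort-key value.
function deduplicate<T>(rows: T[], sortKey: (row: T) => string): T[] {
  const latest = new Map<string, T>();
  for (const row of rows) {
    // A later insert overwrites an earlier one with the same key.
    latest.set(sortKey(row), row);
  }
  return Array.from(latest.values());
}

const rows: DeduplicatedModel[] = [
  { timestamp: new Date("2024-01-01T00:00:00Z"), data: "first" },
  { timestamp: new Date("2024-01-01T00:00:00Z"), data: "second" }, // same timestamp, inserted later
  { timestamp: new Date("2024-01-02T00:00:00Z"), data: "third" },
];

const deduped = deduplicate(rows, (row) => row.timestamp.toISOString());
console.log(deduped.map((row) => row.data)); // [ "second", "third" ]
```

    Note that with timestamp as the only ORDER BY field, two distinct events recorded at the same instant count as duplicates, so choose the fields with that in mind.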

    Specifying ORDER BY Fields

    When you want to optimize queries on specific columns, set order_by_fields. If your Data Model has no Key, specifying order_by_fields is required so that data is not stored without any indexing.

    datamodels/OrderedModel.ts
    import { Key, DataModelConfig } from "@514labs/moose-lib";
     
    export interface OrderedModel {
      primaryKey: Key<string>;
      value: number;
      createdAt: Date;
    }
     
    export const OrderedModelConfig: DataModelConfig<OrderedModel> = {
      storage: {
        enabled: true,
        order_by_fields: ["createdAt"] // optimize queries by creation date
      }
    };
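
    The benefit of an ORDER BY column can be pictured with a simplified in-memory analogy: when rows are kept sorted by createdAt, a time-range filter can locate its starting point with a binary search instead of scanning every row. ClickHouse's actual storage and indexing are more involved; lowerBound and rowsSince are hypothetical helpers used only for illustration.

```typescript
interface OrderedRow {
  value: number;
  createdAt: Date;
}

// Index of the first row with createdAt >= from. Rows must be sorted by createdAt.
function lowerBound(rows: OrderedRow[], from: Date): number {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (rows[mid].createdAt.getTime() < from.getTime()) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Return all rows created at or after `from` without a full scan.
function rowsSince(rows: OrderedRow[], from: Date): OrderedRow[] {
  return rows.slice(lowerBound(rows, from));
}

const sortedRows: OrderedRow[] = [
  { value: 1, createdAt: new Date("2024-01-01T00:00:00Z") },
  { value: 2, createdAt: new Date("2024-01-02T00:00:00Z") },
  { value: 3, createdAt: new Date("2024-01-03T00:00:00Z") },
];

const recent = rowsSince(sortedRows, new Date("2024-01-02T00:00:00Z"));
console.log(recent.map((row) => row.value)); // [ 2, 3 ]
```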

    Combining Customizations

    You can mix ingestion and storage settings in one config. Below is a complete example that:

    • Accepts batch JSON.
    • Disables storage (e.g., ephemeral or streaming-only use case).
    • Includes an explicit ORDER BY for the scenario where you may enable storage later.
    datamodels/models.ts
    import { DataModelConfig, IngestionFormat } from "@514labs/moose-lib";
     
    export interface HybridModel {
      id: string;
      score: number;
      eventTime: Date;
    }
     
    export const HybridConfig: DataModelConfig<HybridModel> = {
      ingestion: {
        format: IngestionFormat.JSON_ARRAY
      },
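      storage: {
        enabled: false,         // disable table creation now ...
        order_by_fields: ["eventTime"] // ... but keep an ORDER BY ready in case storage is enabled later (eventTime is an illustrative choice)
      }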
    };