Ingesting Data

Once you've defined a Data Model, Moose can ingest data from any source whose payloads match the Data Model's schema. Moose automatically spins up an API endpoint for each Data Model; send an HTTP POST request to that endpoint with a JSON body that matches the schema.

For each Data Model, an ingestion API endpoint is spun up at:

http://localhost:4000/ingest/<DataModelName>

For example, if you have a Data Model named Foo, the ingestion endpoint will be http://localhost:4000/ingest/Foo.

app/datamodels/Foo.ts
import { Key } from "@514labs/moose-lib";
 
export interface Foo {
  primaryKey: Key<string>;
  timestamp: number;
  optionalText?: string;
}

You can ingest data from any HTTP-compatible data source, including webhooks, scripts, and any existing application that can make an HTTP POST request.
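
As a minimal sketch of such a script, the TypeScript below builds the endpoint URL for a Data Model and POSTs one record to it. It assumes Node 18+ (for the built-in fetch), the default dev server address shown above, and a Foo record whose primaryKey is treated as a plain string. The names ingestUrl, ingestRecord, and FooRecord are hypothetical helpers for illustration, not part of @514labs/moose-lib.

```typescript
// Hypothetical helpers for illustration; not part of @514labs/moose-lib.
const MOOSE_BASE_URL = "http://localhost:4000"; // default dev server address from this page

// Plain-object shape of the Foo Data Model (Key<string> is treated as string here).
interface FooRecord {
  primaryKey: string;
  timestamp: number;
  optionalText?: string;
}

// Build the ingestion endpoint URL for a Data Model name.
function ingestUrl(dataModelName: string, baseUrl: string = MOOSE_BASE_URL): string {
  return `${baseUrl}/ingest/${encodeURIComponent(dataModelName)}`;
}

// POST a single record as JSON and throw if the server answers with a non-2xx status.
async function ingestRecord(dataModelName: string, record: FooRecord): Promise<void> {
  const response = await fetch(ingestUrl(dataModelName), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(record),
  });
  if (!response.ok) {
    throw new Error(`Ingest to ${dataModelName} failed with status ${response.status}`);
  }
}
```

With the dev server running, a call such as ingestRecord("Foo", { primaryKey: "abc", timestamp: Date.now() }) would send one row to http://localhost:4000/ingest/Foo.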

OpenAPI Specification

Moose generates and automatically updates an OpenAPI specification for all API endpoints in your project. The specification lives in the openapi.yaml file inside the .moose folder at the root of your project. It describes the available endpoints and their request and response formats, and it includes example data for each Data Model schema so you can get started quickly.

Accessing OpenAPI Documentation

Locating the File in Your Project

To view and utilize the OpenAPI documentation:

  1. Locate the .moose folder in your project. It will be in the root of your project.
  2. Open the openapi.yaml file in your preferred OpenAPI viewer or editor.
  3. Use tools like Swagger UI or Postman to explore and test the API endpoints.

    If you're using VSCode, consider installing the OpenAPI (Swagger) Editor extension.

    Accessing via URL

    The file is also automatically hosted on your local development server at http://localhost:5001/openapi.yaml. You can use this URL to access the OpenAPI spec from your browser.
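
As a rough sketch, a script can download the spec from that URL and list the ingestion endpoints it describes. The helper names below are hypothetical, and the line-based matching assumes each path appears in the YAML on its own line as an indented key such as /ingest/Foo: (a real YAML parser would be more robust). It also assumes Node 18+ for the built-in fetch.

```typescript
// Hypothetical helpers for illustration; not part of Moose.
const OPENAPI_URL = "http://localhost:5001/openapi.yaml"; // URL given on this page

// Extract path keys such as "/ingest/Foo" from raw OpenAPI YAML text.
// Assumption: each path sits on its own line as an indented key ending in ":".
function listIngestPaths(specText: string): string[] {
  const paths: string[] = [];
  for (const line of specText.split("\n")) {
    const match = line.match(/^\s+["']?(\/ingest\/[^"':\s]+)["']?:\s*$/);
    const path = match?.[1];
    if (path) {
      paths.push(path);
    }
  }
  return paths;
}

// Download the spec from the local dev server and return its ingest paths.
async function fetchIngestPaths(url: string = OPENAPI_URL): Promise<string[]> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Could not fetch OpenAPI spec: status ${response.status}`);
  }
  return listIngestPaths(await response.text());
}
```

Keeping the parsing in listIngestPaths separate from the download makes it possible to check the matching logic against a saved copy of the file without a running server.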

    Using OpenAPI UI to Ingest Data

    The OpenAPI spec includes example data for each Data Model schema. In Swagger UI, click the Try it out button on an endpoint to build a request from that example data.

    Ingesting Data with Moose Workflows

    Moose Workflows are a powerful way to perform ETL jobs on data. You can use the ingestion API to load data into your Data Models from your workflow tasks.

    Quickstart Example

    Here is an example of an ingestion workflow that generates 1000 rows of data and ingests them into the Foo Data Model:

    app/datamodels/Foo.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface Foo {
      primaryKey: Key<string>;
      timestamp: number;
      optionalText?: string;
    }

    Generate the Workflow

    Run the following command:

    Terminal
    npx moose-cli workflow init ExampleIngestWorkflow --tasks "ingest"

    This generates a new directory called ExampleIngestWorkflow in the app/scripts directory, which contains the 1.ingest.ts file.

    Define the Ingestion Task Logic

    Paste the following code into the 1.ingest.ts file:

    app/scripts/ExampleIngestWorkflow/1.ingest.ts
    import { TaskDefinition, TaskFunction } from "@514labs/moose-lib";
    import { Foo } from "../../datamodels/Foo";
    import fs from "fs";
    import { randomUUID } from "crypto";
     
    // Generate 1000 rows of random Foo data and ingest it into the Foo Data Model
    const ingest: TaskFunction = async (input: any) => {
     
      // Read the Unix word list
      const unixWords = fs.readFileSync("/usr/share/dict/words", "utf8").split("\n");
     
      // Get a recent timestamp within the last n_days
      const getRecentTimestamp = (n_days: number) => {
        const millisecondsInDays = n_days * 24 * 60 * 60 * 1000;
        const intervalStartDate = Date.now() - millisecondsInDays;
        return intervalStartDate + Math.random() * millisecondsInDays;
      };
     
      // Get a random word from the word list
      const getRandomWord = (words: string[]) => {
        return words[Math.floor(Math.random() * words.length)];
      };
     
      // Generate 1000 rows of random Foo data and ingest it into the Foo Data Model
      for (let i = 0; i < 1000; i++) {
     
        const foo: Foo = {
          primaryKey: randomUUID(),
          timestamp: getRecentTimestamp(365),
          optionalText: Math.random() < 0.5 ? getRandomWord(unixWords) : undefined,
        };
     
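        const response = await fetch("http://localhost:4000/ingest/Foo", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(foo),
        });
        // Stop early if the server rejects a row
        if (!response.ok) {
          throw new Error(`Ingest failed with status ${response.status}`);
        }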
      }
     
      return {
        task: "ingest",
        data: "success",
      };
    };
     
    export default function createTask() {
      return {
        task: ingest,
      } as TaskDefinition;
    }
     

    Run the Workflow

    To run the workflow, run the following command:

    Terminal
    npx moose-cli workflow run ExampleIngestWorkflow

    Example Snippets

    app/datamodels/Foo.ts
    import { Key } from "@514labs/moose-lib";
     
    export interface Foo {
      primaryKey: Key<string>;
      timestamp: number;
      optionalText?: string;
    }

    Terminal
    curl -X POST \
      -H "Content-Type: application/json" \
      -d '{"primaryKey": "exampleKey", "timestamp": 1672531200, "optionalText": "hello"}' \
      http://localhost:4000/ingest/Foo

    Ensure Your Development Server is Running

    The dev server runs on localhost:4000 by default. Start it by running moose-cli dev in your terminal.

    Ingestion SDK

    With the OpenAPI spec, you can use tools like the OpenAPI Generator to generate an SDK for your Data Model.

    If you follow their installation instructions, you can generate an SDK for your Data Model in any language that the OpenAPI Generator supports.

    Assuming you are running the OpenAPI Generator CLI from the root of your Moose project, the following command will generate an SDK for your Moose project in the moose_sdk directory:

    Terminal
    openapi-generator-cli generate -i .moose/openapi.yaml -g python -o ./moose_sdk