Batch Load Historical Star Data

In this section, we'll load historical star data from GitHub's API to complement our real-time star events. This will give us a complete picture of who has starred our repository over time.

Create Data Models and Functions

First, we need to create a data model for historical star data and a streaming function to process it.

Create HistoricalStargazer Data Model

Create a new file in your datamodels directory:

datamodels/HistoricalStargazer.ts
import { Key, DataModelConfig, IngestionFormat } from "@514labs/moose-lib";

// Configure the data model to accept JSON arrays for batch ingestion
export const HistoricalStargazerConfig: DataModelConfig<HistoricalStargazer> = {
  ingestion: {
    format: IngestionFormat.JSON_ARRAY, // Accept an array of records in one request
  },
};

export interface HistoricalStargazer {
  starred_at: Key<Date>; // When the user starred the repository
  login: string; // GitHub username of the stargazer
  avatar_url: string;
  repos_url: string; // GitHub API URL listing the user's public repositories
}
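With `IngestionFormat.JSON_ARRAY`, the ingestion endpoint accepts a JSON array of records in a single request body instead of one object per request. The sketch below only illustrates what such a batch payload looks like. It mirrors the `HistoricalStargazer` fields in a local type (so it runs without `@514labs/moose-lib`), and all sample values are hypothetical.

```typescript
// Local mirror of the HistoricalStargazer model (without Moose's Key<> wrapper)
// so this sketch runs standalone.
interface HistoricalStargazerRecord {
  starred_at: string; // ISO 8601 timestamp, as GitHub returns it
  login: string;
  avatar_url: string;
  repos_url: string;
}

// Hypothetical sample records; real values come from GitHub's stargazers API.
const batch: HistoricalStargazerRecord[] = [
  {
    starred_at: "2024-01-15T10:30:00Z",
    login: "octocat",
    avatar_url: "https://example.com/avatars/octocat.png",
    repos_url: "https://api.github.com/users/octocat/repos",
  },
  {
    starred_at: "2024-02-03T08:05:00Z",
    login: "hubot",
    avatar_url: "https://example.com/avatars/hubot.png",
    repos_url: "https://api.github.com/users/hubot/repos",
  },
];

// A JSON_ARRAY request body is the whole batch serialized as one array.
const payload: string = JSON.stringify(batch);
console.log(payload);
```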

Create Streaming Function

Warning:

This section assumes you've already created the StargazerProjectInfo data model from the previous section. If you haven't done this yet, please refer back to the Process Real-Time Events section to create it first.

Create a streaming function to transform historical stargazer data into StargazerProjectInfo records:

functions/HistoricalStargazer__StargazerProjectInfo.ts
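import { HistoricalStargazer } from "datamodels/HistoricalStargazer";
import { StargazerProjectInfo } from "datamodels/StargazerProjectInfo";

// Fans out one historical stargazer into one StargazerProjectInfo record
// per public repository returned by the stargazer's repos_url.
export default async function run(
  source: HistoricalStargazer
): Promise<StargazerProjectInfo[] | null> {
  const repositories = await callGitHubAPI(source.repos_url);

  // On failures (e.g. rate limiting) GitHub does not return an array of repos.
  // Returning null (allowed by the return type) emits no records for this stargazer.
  if (!Array.isArray(repositories)) {
    return null;
  }

  return repositories.map((repo: any) => ({
    starred_at: new Date(source.starred_at),
    stargazerName: source.login,
    repoName: repo.name,
    repoFullName: repo.full_name,
    description: repo.description,
    repoUrl: repo.html_url,
    repoStars: repo.stargazers_count,
    repoWatchers: repo.watchers_count,
    language: repo.language || "Multiple Languages",
    repoSizeKb: repo.size,
    createdAt: new Date(repo.created_at),
    updatedAt: new Date(repo.updated_at),
  }));
}

// Fetches a GitHub API URL; returns null instead of parsing an error response.
async function callGitHubAPI(url: string): Promise<any> {
  const response = await fetch(url);
  if (!response.ok) {
    console.error(`GitHub API request failed with status ${response.status}: ${url}`);
    return null;
  }
  return response.json();
}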
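Because `run` calls GitHub directly, it is hard to test without network access. One option, sketched here as an assumption rather than as part of the tutorial, is to move the pure field mapping into its own function (`toProjectInfo`, a hypothetical name) and feed it stub repository data. The local types mirror only the fields the streaming function reads; the real `StargazerProjectInfo` model is the one from the previous section. The sketch also makes the fan-out visible: one stargazer with N repositories produces N rows. The stub values are hypothetical.

```typescript
// Minimal local mirror of the HistoricalStargazer fields used here.
interface HistoricalStargazerInput {
  starred_at: string;
  login: string;
  avatar_url: string;
  repos_url: string;
}

// Subset of the fields GitHub returns for each repository.
interface GitHubRepo {
  name: string;
  full_name: string;
  description: string | null;
  html_url: string;
  stargazers_count: number;
  watchers_count: number;
  language: string | null;
  size: number; // repository size in KB
  created_at: string;
  updated_at: string;
}

// Pure mapping with the same field logic as run(): one row per repository.
function toProjectInfo(source: HistoricalStargazerInput, repos: GitHubRepo[]) {
  return repos.map((repo) => ({
    starred_at: new Date(source.starred_at),
    stargazerName: source.login,
    repoName: repo.name,
    repoFullName: repo.full_name,
    description: repo.description,
    repoUrl: repo.html_url,
    repoStars: repo.stargazers_count,
    repoWatchers: repo.watchers_count,
    language: repo.language || "Multiple Languages",
    repoSizeKb: repo.size,
    createdAt: new Date(repo.created_at),
    updatedAt: new Date(repo.updated_at),
  }));
}

// Hypothetical stub data: no network access needed.
const stubStargazer: HistoricalStargazerInput = {
  starred_at: "2024-01-15T10:30:00Z",
  login: "octocat",
  avatar_url: "https://example.com/avatars/octocat.png",
  repos_url: "https://api.github.com/users/octocat/repos",
};

const stubRepos: GitHubRepo[] = [
  {
    name: "hello-world", full_name: "octocat/hello-world", description: null,
    html_url: "https://github.com/octocat/hello-world", stargazers_count: 10,
    watchers_count: 10, language: null, size: 1,
    created_at: "2020-01-01T00:00:00Z", updated_at: "2024-01-01T00:00:00Z",
  },
  {
    name: "spoon-knife", full_name: "octocat/spoon-knife", description: "Demo",
    html_url: "https://github.com/octocat/spoon-knife", stargazers_count: 5,
    watchers_count: 5, language: "HTML", size: 2,
    created_at: "2021-01-01T00:00:00Z", updated_at: "2024-02-01T00:00:00Z",
  },
];

const rows = toProjectInfo(stubStargazer, stubRepos);
console.log(rows.map((r) => `${r.repoFullName}: ${r.language}`));
```

`run` could then call `toProjectInfo(source, repositories)` after the fetch, keeping the network call as the only untested piece.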

Create the Ingest Script

Create a new file for the ingest script. We recommend creating a new workflows directory in your app folder to host your ingest scripts.

Terminal
mkdir -p app/workflows
touch app/workflows/ingest_stargazers.ts

Add the following code to your new file:

app/workflows/ingest_stargazers.ts
// Coming soon! For now, please use the Python version

The ingest script (currently available in the Python version of this guide):

  1. Fetches historical stargazer data from GitHub's API using pagination
  2. Processes all stargazers into a batch
  3. Sends the entire batch to your Moose application in a single request
  4. Includes error handling and progress reporting
  5. Uses environment variables for configuration
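Until the TypeScript version is published, the steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions: Node.js 18+ (for the global `fetch`), the environment variables below already loaded into `process.env`, and a Moose ingestion endpoint at `${MOOSE_API_HOST}/ingest/HistoricalStargazer`. On the GitHub side it requests the `application/vnd.github.star+json` media type, which makes the stargazers endpoint include each `starred_at` timestamp. Helper names such as `fetchAllStargazers` and `sendBatch` are hypothetical.

```typescript
// app/workflows/ingest_stargazers.ts (sketch)

interface HistoricalStargazer {
  starred_at: string;
  login: string;
  avatar_url: string;
  repos_url: string;
}

// Shape of one item when requesting application/vnd.github.star+json.
interface GitHubStarItem {
  starred_at: string;
  user: { login: string; avatar_url: string; repos_url: string };
}

const PER_PAGE = 100; // GitHub's maximum page size

function stargazersUrl(owner: string, repo: string, page: number): string {
  return `https://api.github.com/repos/${owner}/${repo}/stargazers?per_page=${PER_PAGE}&page=${page}`;
}

function toHistoricalStargazer(item: GitHubStarItem): HistoricalStargazer {
  return {
    starred_at: item.starred_at,
    login: item.user.login,
    avatar_url: item.user.avatar_url,
    repos_url: item.user.repos_url,
  };
}

// Steps 1-2: walk the pages until a short page, collecting one batch.
async function fetchAllStargazers(owner: string, repo: string, token: string): Promise<HistoricalStargazer[]> {
  const batch: HistoricalStargazer[] = [];
  for (let page = 1; ; page++) {
    const response = await fetch(stargazersUrl(owner, repo, page), {
      headers: { Accept: "application/vnd.github.star+json", Authorization: `Bearer ${token}` },
    });
    if (!response.ok) {
      throw new Error(`GitHub API error ${response.status} on page ${page}`);
    }
    const items = (await response.json()) as GitHubStarItem[];
    batch.push(...items.map(toHistoricalStargazer));
    console.log(`Fetched page ${page} (${batch.length} stargazers so far)`);
    if (items.length < PER_PAGE) break; // last page reached
  }
  return batch;
}

// Step 3: send the entire batch as one JSON array request.
async function sendBatch(host: string, batch: HistoricalStargazer[]): Promise<void> {
  const response = await fetch(`${host}/ingest/HistoricalStargazer`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
  if (!response.ok) {
    throw new Error(`Moose ingest error ${response.status}: ${await response.text()}`);
  }
}

async function main(token: string, owner: string, repo: string, host: string): Promise<void> {
  const batch = await fetchAllStargazers(owner, repo, token);
  await sendBatch(host, batch);
  console.log(`Completed: Ingested ${batch.length} stargazers`);
}

// Only run when configured, so the helpers can be imported or tested offline.
const { GITHUB_ACCESS_TOKEN, GITHUB_OWNER, GITHUB_REPO, MOOSE_API_HOST } = process.env;
if (GITHUB_ACCESS_TOKEN && GITHUB_OWNER && GITHUB_REPO) {
  main(GITHUB_ACCESS_TOKEN, GITHUB_OWNER, GITHUB_REPO, MOOSE_API_HOST ?? "http://localhost:4000").catch((err) => {
    console.error("Ingest failed:", err);
    process.exitCode = 1;
  });
} else {
  console.error("Set GITHUB_ACCESS_TOKEN, GITHUB_OWNER and GITHUB_REPO to run the ingest.");
}
```

Pagination stops at the first page with fewer than `per_page` items, so a repository whose star count is an exact multiple of 100 costs one extra, empty request. Collecting everything before sending keeps the Moose side to a single request, matching the batch behavior described above.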

Configure Environment Variables

Create a .env file in your project root with your GitHub credentials and the address of your Moose API:

.env
GITHUB_ACCESS_TOKEN=your_github_token
GITHUB_OWNER=your_github_repo_owner
GITHUB_REPO=your_github_repo
MOOSE_API_HOST=http://localhost:4000

Warning:

Replace the placeholder values with your repository owner, repository name, and a valid GitHub Personal Access Token. Keep your token secure and never commit it to version control: store it only in your .env file, and make sure .env is listed in your .gitignore.
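`ts-node` does not read `.env` files on its own, so the variables above have to be loaded before the ingest script reads `process.env`. Common options are the `dotenv` package or Node's `--env-file` flag (Node 20.6+). If you prefer no extra dependency, the sketch below is a deliberately simplified loader; the `parseDotEnv` and `loadDotEnv` names are hypothetical. It handles `KEY=VALUE` lines, `#` comments, and matching surrounding quotes, but not multi-line values or variable expansion.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Parses simple KEY=VALUE lines. Blank lines and # comments are skipped,
// and matching single or double quotes around a value are stripped.
function parseDotEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quoted =
      value.length >= 2 && (value[0] === '"' || value[0] === "'") && value[value.length - 1] === value[0];
    if (quoted) value = value.slice(1, -1);
    result[key] = value;
  }
  return result;
}

// Copies parsed values into process.env without overriding variables
// that are already set in the shell.
function loadDotEnv(path = ".env"): void {
  if (!existsSync(path)) return;
  const parsed = parseDotEnv(readFileSync(path, "utf8"));
  for (const [key, value] of Object.entries(parsed)) {
    if (process.env[key] === undefined) process.env[key] = value;
  }
}

// Call once at the top of the ingest script, before reading process.env.
loadDotEnv();
```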

Run the Batch Load Script

The ingest script will:

  1. Fetch all historical stargazers from GitHub's API
  2. Send all stargazer records to your Moose application's HistoricalStargazer ingestion endpoint as a single batch

Moose then runs the HistoricalStargazer__StargazerProjectInfo streaming function on each ingested record, which populates the StargazerProjectInfo table.

Run the script:

Terminal
ts-node app/workflows/ingest_stargazers.ts

You should see output like:

Completed: Ingested 36 stargazers

Verify the Data

Query your StargazerProjectInfo table to see the historical data:

SELECT
    stargazerName,
    starred_at,
    COUNT(DISTINCT repoName) AS num_repos
FROM local.StargazerProjectInfo_0_0
GROUP BY stargazerName, starred_at
ORDER BY starred_at DESC
LIMIT 5;
Congrats!

You now have a complete dataset of both historical and real-time star events! This data will be used in the next sections to analyze trends and patterns in your repository's stargazers.