Aggregate Processed Events with Blocks

Aggregate Programming Languages with Moose Blocks

Viewing typescript

switch to python

To analyze which programming languages are most popular among your stargazers, we need to aggregate data from the StargazerProjectInfo table. This table contains details about each repository owned by your stargazers, but the data is spread across multiple rows and columns, making it hard to glean insights directly.

We can use Moose Blocks to create a Materialized View that pre-aggregates this data for us. This way, we can efficiently query the most popular programming languages among all stargazers.

What are Blocks?

Blocks allow you to execute SQL statements against your database to set up or manage database objects. This includes creating views or materialized views to aggregate, join, or group data from various tables.

Create a New Block: TopProgrammingLanguages

Initialize the Block with Moose CLI

In your terminal, run:

Terminal
npx moose-cli block init TopProgrammingLanguages

This command generates a new file TopProgrammingLanguages.ts in the /blocks directory:

      • TopProgrammingLanguages.ts
  • Inspect the Generated File

    Open TopProgrammingLanguages.ts. You should see:

    app/blocks/TopProgrammingLanguages.ts
    import { Blocks } from "@514labs/moose-lib";
     
    export default {
      setup: [
        // Array of SQL statements for setting up database resources
      ],
      teardown: [
        // Array of SQL statements for tearing down resources
      ],
    } as Blocks;
    Understanding Blocks
    • Blocks: Imported from @514labs/moose-lib, it provides type safety for the setup and teardown arrays.
    • Default Export: An object conforming to the Blocks type.
    • setup: An array of SQL statements executed to set up database resources when you start your dev server.
    • teardown: An array of SQL statements executed to clean up resources when you stop your dev server.

    Implement the Materialized View

    We'll create a materialized view that aggregates programming language data from StargazerProjectInfo_0_0. This view will automatically update as new data arrives, providing real-time insights into the most popular programming languages among your stargazers.

    Here's how to implement it:

    app/blocks/TopProgrammingLanguages.ts
    import {
      Blocks,
      createAggregation,
      dropAggregation,
      ClickHouseEngines,
    } from "@514labs/moose-lib";
     
    const MV_NAME = "TopProgrammingLanguages_MV";
    const TABLE_NAME = "TopProgrammingLanguages";
     
    const createTableOptions = {
      name: TABLE_NAME,
      columns: {
        language: "String",
        total_projects: "AggregateFunction(count, UInt64)",
        total_repo_size_kb: "AggregateFunction(sum, Float64)",
        average_repo_size_kb: "AggregateFunction(avg, Float64)",
      },
      engine: ClickHouseEngines.AggregatingMergeTree,
      orderBy: "language",
    };
     
    const selectQuery = `
    SELECT
      language,
      countState(*) AS total_projects,
      sumState(repoSizeKb) AS total_repo_size_kb,
      avgState(repoSizeKb) AS average_repo_size_kb
    FROM StargazerProjectInfo_0_0
    GROUP BY language
    `;
     
    export default {
      teardown: [...dropAggregation({ viewName: MV_NAME, tableName: TABLE_NAME })],
      setup: [
        ...createAggregation({
          materializedViewName: MV_NAME,
          tableCreateOptions: createTableOptions,
          select: selectQuery,
        }),
      ],
    } as Blocks;

    Explanation:

    • Define the Aggregated Table: We specify createTableOptions for the TopProgrammingLanguages table, which will store aggregated data such as the total number of projects and total repository size per programming language.

    • Write the Aggregation Query: The selectQuery aggregates data from StargazerProjectInfo_0_0, grouping by language and calculating counts and sums.

    • Set Up and Tear Down: In the exported Blocks object, we use createAggregation in setup to establish the materialized view and table. The teardown uses dropAggregation to remove them when they are no longer needed.

    • AggregatingMergeTree Engine: We use this engine for the table to optimize aggregation operations. It maintains and updates aggregation states efficiently as new data is inserted.

    Save and Apply the Changes

    Save TopProgrammingLanguages.ts. Moose will automatically apply the changes, creating the new table and materialized view in your database.

    Verify the Materialized View

    Open your ClickHouse database explorer and check that the TopProgrammingLanguages table and TopProgrammingLanguages_MV materialized view have been created. Query the TopProgrammingLanguages table to see the aggregated data:

    SELECT
      language,
      countMerge(total_projects) AS total_projects,
      sumMerge(total_repo_size_kb) AS total_repo_size_kb,
      avgMerge(average_repo_size_kb) AS average_repo_size_kb
    FROM TopProgrammingLanguages
    ORDER BY total_projects DESC
    Understanding the Query

    In AggregatingMergeTree tables, countState and sumState store intermediate states (partial counts or sums). To get final results, countMerge and sumMerge combine these states into total counts or sums.

    Congrats!

    You've successfully aggregated programming language data! To recap:

    • You initialized a new Block using the Moose CLI helper.
    • You implemented a materialized view that aggregates data from StargazerProjectInfo_0_0.
    • The setup automates the creation of the view and table, and Moose manages the execution of these SQL statements.

    Your data is now pre-aggregated and ready for analysis or integration into applications.