Aggregate Processed Events with Blocks

Aggregate Programming Languages with Moose Blocks

Viewing typescript

switch to python

To analyze which programming languages are most popular among your stargazers, we need to aggregate data from the StargazerProjectInfo table. This table contains details about each repository owned by your stargazers, but the data is spread across multiple rows and columns, making it hard to glean insights directly.

We can use Moose Blocks to create a Materialized View that pre-aggregates this data for us. This way, we can efficiently query the most popular programming languages among all stargazers.

What are Blocks?

Blocks allow you to execute SQL statements against your database to set up or manage database objects. This includes creating views or materialized views to aggregate, join, or group data from various tables.

Create a New Block: TopLanguages

Initialize the Block with Moose CLI

In your terminal, run:

Terminal
npx moose-cli block init TopLanguages

This command generates a new file TopLanguages.ts in the /blocks directory:

      • TopLanguages.ts
  • Inspect the Generated File

    Open TopLanguages.ts. You should see:

    app/blocks/TopLanguages.ts
    import { Blocks } from "@514labs/moose-lib";
     
    export default {
      setup: [
        // Array of SQL statements for setting up database resources
      ],
      teardown: [
        // Array of SQL statements for tearing down resources
      ],
    } as Blocks;
    Understanding Blocks
    • Blocks: Imported from @514labs/moose-lib, it provides type safety for the setup and teardown arrays.
    • Default Export: An object conforming to the Blocks type.
    • setup: An array of SQL statements executed to set up database resources when you start your dev server.
    • teardown: An array of SQL statements executed to clean up resources when you stop your dev server.

    Implement the Materialized View

    We'll create a materialized view that aggregates programming language data from StargazerProjectInfo_0_0. This view will automatically update as new data arrives, providing real-time insights into the most popular programming languages among your stargazers.

    Here's how to implement it:

    app/blocks/TopLanguages.ts
    import {
      Blocks,
      createAggregation,
      dropAggregation,
      ClickHouseEngines,
    } from "@514labs/moose-lib";
     
    const MV_NAME = "TopLanguages_MV";
    const TABLE_NAME = "TopLanguages";
     
    const TABLE_OPTIONS = {
      name: TABLE_NAME,
      columns: {
        language: "String",
        total_projects: "AggregateFunction(count, UInt64)",
        total_repo_size_kb: "AggregateFunction(sum, Float64)",
        average_repo_size_kb: "AggregateFunction(avg, Float64)",
      },
      engine: ClickHouseEngines.AggregatingMergeTree,
      orderBy: "language",
    };
     
    const QUERY = `
    SELECT
      language,
      countState(*) AS total_projects,
      sumState(repoSizeKb) AS total_repo_size_kb,
      avgState(repoSizeKb) AS average_repo_size_kb
    FROM StargazerProjectInfo_0_0
    GROUP BY language
    `;
     
    export default {
      teardown: [...dropAggregation({ viewName: MV_NAME, tableName: TABLE_NAME })],
      setup: [
        ...createAggregation({
          materializedViewName: MV_NAME,
          tableCreateOptions: TABLE_OPTIONS,
          select: QUERY,
        }),
      ],
    } as Blocks;

    Explanation:

    • TABLE_OPTIONS defines the setup and schema for the TopLanguages table, which will store the QUERY results.
    • QUERY aggregates data from StargazerProjectInfo_0_0, grouping by language and calculating counts and sums.
    • The Blocks object uses create_aggregation in setup to establish the materialized view and table. The teardown uses drop_aggregation to remove them when they are no longer needed.
    • We choose the ClickHouseEngines.AggregatingMergeTree engine for the table to optimize aggregation operations. It maintains and updates aggregation states efficiently as new data is inserted.

    Save and Apply the Changes

    Save TopLanguages.ts. Moose will automatically apply the changes, creating the new table and materialized view in your database.

    Verify the Materialized View

    Open your ClickHouse database explorer and check that the TopLanguages table and TopLanguages_MV materialized view have been created. Query the TopLanguages table to see the aggregated data:

    SELECT
      language,
      countMerge(total_projects) AS total_projects,
      sumMerge(total_repo_size_kb) AS total_repo_size_kb,
      avgMerge(average_repo_size_kb) AS average_repo_size_kb
    FROM TopLanguages
    ORDER BY total_projects DESC
    Understanding the Query

    In AggregatingMergeTree tables, countState and sumState store intermediate states (partial counts or sums). To get final results, countMerge and sumMerge combine these states into total counts or sums.

    Congrats!

    You've successfully aggregated programming language data! To recap:

    • You initialized a new Block using the Moose CLI helper.
    • You implemented a materialized view that aggregates data from StargazerProjectInfo_0_0.
    • The setup automates the creation of the view and table, and Moose manages the execution of these SQL statements.

    Your data is now pre-aggregated and ready for analysis or integration into applications.