Aggregate Programming Languages with Moose Blocks
Viewing typescript
switch to python
To analyze which programming languages are most popular among your stargazers, we need to aggregate data from the StargazerProjectInfo
table. This table contains details about each repository owned by your stargazers, but the data is spread across multiple rows and columns, making it hard to glean insights directly.
We can use Moose Blocks to create a Materialized View that pre-aggregates this data for us. This way, we can efficiently query the most popular programming languages among all stargazers.
Blocks allow you to execute SQL statements against your database to set up or manage database objects. This includes creating views or materialized views to aggregate, join, or group data from various tables.
Create a New Block: TopProgrammingLanguages
Initialize the Block with Moose CLI
In your terminal, run:
npx moose-cli block init TopProgrammingLanguages
This command generates a new file TopProgrammingLanguages.ts
in the /blocks
directory:
- TopProgrammingLanguages.ts
moose-cli block init TopProgrammingLanguages
This command generates a new file TopProgrammingLanguages.py
in the /blocks
directory:
- TopProgrammingLanguages.py
Inspect the Generated File
Open TopProgrammingLanguages.ts
TopProgrammingLanguages.py
. You should see:
import { Blocks } from "@514labs/moose-lib";
export default {
setup: [
// Array of SQL statements for setting up database resources
],
teardown: [
// Array of SQL statements for tearing down resources
],
} as Blocks;
from moose_lib import (
AggregationCreateOptions,
AggregationDropOptions,
Blocks,
ClickHouseEngines,
TableCreateOptions,
create_aggregation,
drop_aggregation,
)
teardown_queries = []
setup_queries = []
block = Blocks(teardown=teardown_queries, setup=setup_queries)
- Blocks: Imported from
@514labs/moose-lib
moose_lib
, it provides type safety for thesetup
andteardown
arrays.
- Default Export: An object conforming to the
Blocks
type.
block
: Is an instance of theBlocks
class.
setup
: An array of SQL statements executed to set up database resources when you start your dev server.teardown
: An array of SQL statements executed to clean up resources when you stop your dev server.
Implement the Materialized View
We'll create a materialized view that aggregates programming language data from StargazerProjectInfo_0_0
. This view will automatically update as new data arrives, providing real-time insights into the most popular programming languages among your stargazers.
Here's how to implement it:
import {
Blocks,
createAggregation,
dropAggregation,
ClickHouseEngines,
} from "@514labs/moose-lib";
const MV_NAME = "TopProgrammingLanguages_MV";
const TABLE_NAME = "TopProgrammingLanguages";
const createTableOptions = {
name: TABLE_NAME,
columns: {
language: "String",
total_projects: "AggregateFunction(count, UInt64)",
total_repo_size_kb: "AggregateFunction(sum, Float64)",
average_repo_size_kb: "AggregateFunction(avg, Float64)",
},
engine: ClickHouseEngines.AggregatingMergeTree,
orderBy: "language",
};
const selectQuery = `
SELECT
language,
countState(*) AS total_projects,
sumState(repoSizeKb) AS total_repo_size_kb,
avgState(repoSizeKb) AS average_repo_size_kb
FROM StargazerProjectInfo_0_0
GROUP BY language
`;
export default {
teardown: [...dropAggregation({ viewName: MV_NAME, tableName: TABLE_NAME })],
setup: [
...createAggregation({
materializedViewName: MV_NAME,
tableCreateOptions: createTableOptions,
select: selectQuery,
}),
],
} as Blocks;
from moose_lib import (
AggregationCreateOptions,
AggregationDropOptions,
Blocks,
ClickHouseEngines,
TableCreateOptions,
create_aggregation,
drop_aggregation,
)
mv_name = "TopProgrammingLanguagesMV"
table_name = "TopProgrammingLanguages"
create_table_options = TableCreateOptions(
name=table_name,
columns={
"language": "String",
"total_projects": "AggregateFunction(count, Int64)",
"total_repo_size_kb": "AggregateFunction(sum, Int64)",
"avg_repo_size_kb": "AggregateFunction(avg, Int64)"
},
engine=ClickHouseEngines.AggregatingMergeTree,
order_by="language",
)
select_query = f'''
SELECT
language,
countState(*) AS total_projects,
sumState(repo_size_kb) AS total_repo_size_kb,
avgState(repo_size_kb) AS avg_repo_size_kb
FROM
StargazerRepositoryInfo_0_0
GROUP BY
language
'''
teardown_queries = drop_aggregation(AggregationDropOptions(view_name=mv_name, table_name=table_name))
setup_queries = create_aggregation(AggregationCreateOptions(
materialized_view_name=mv_name,
select=select_query,
table_create_options=create_table_options,
))
block = Blocks(teardown=teardown_queries, setup=setup_queries)
Explanation:
-
Define the Aggregated Table: We specify
createTableOptions
create_table_options
for theTopProgrammingLanguages
table, which will store aggregated data such as the total number of projects and total repository size per programming language. -
Write the Aggregation Query: The
selectQuery
select_query
aggregates data fromStargazerProjectInfo_0_0
, grouping bylanguage
and calculating counts and sums. -
Set Up and Tear Down: In the exported
Blocks
objectBlocks()
instance, we usecreateAggregation
create_aggregation
insetup
to establish the materialized view and table. Theteardown
usesdropAggregation
drop_aggregation
to remove them when they are no longer needed. -
AggregatingMergeTree Engine: We use this engine for the table to optimize aggregation operations. It maintains and updates aggregation states efficiently as new data is inserted.
Save and Apply the Changes
Save TopProgrammingLanguages.ts
TopProgrammingLanguages.py
. Moose will automatically apply the changes, creating the new table and materialized view in your database.
Verify the Materialized View
Open your ClickHouse database explorer and check that the TopProgrammingLanguages
table and TopProgrammingLanguages_MV
materialized view have been created. Query the TopProgrammingLanguages
table to see the aggregated data:
SELECT
language,
countMerge(total_projects) AS total_projects,
sumMerge(total_repo_size_kb) AS total_repo_size_kb,
avgMerge(average_repo_size_kb) AS average_repo_size_kb
FROM TopProgrammingLanguages
ORDER BY total_projects DESC
In AggregatingMergeTree tables, countState
and sumState
store intermediate states (partial counts or sums). To get final results, countMerge
and sumMerge
combine these states into total counts or sums.
You've successfully aggregated programming language data! To recap:
- You initialized a new Block using the Moose CLI helper.
- You implemented a materialized view that aggregates data from
StargazerProjectInfo_0_0
. - The setup automates the creation of the view and table, and Moose manages the execution of these SQL statements.
Your data is now pre-aggregated and ready for analysis or integration into applications.