Processing Events With Streaming Functions
In the /datamodels/models.ts file, you will find the PageViewProcessed data model:
export interface PageViewProcessed extends PageViewRaw {
  hostname: string;
  device_vendor: string;
  device_type: string;
  device_model: string;
  browser_name: string;
  browser_version: string;
  os_name: string;
  os_version: string;
}
In this model, the user_agent field from the PageViewRaw model is expanded into several fields that provide more structured information about the user's device, browser, and operating system. Because PageViewProcessed extends PageViewRaw, the original fields are carried over as well.
We achieve this transformation using the Streaming Functions primitive. You can see the transformation code in the PageViewRaw__PageViewProcessed.ts file located in the /functions folder of your Moose project. Here is an example of what the code might look like:
import { PageViewRaw, PageViewProcessed } from "../datamodels/models";
import { UAParser } from "ua-parser-js";

export default function run(event: PageViewRaw): PageViewProcessed {
  // Process the user_agent to extract device, browser, and OS details
  const { browser, device, os } = UAParser(event.user_agent);

  // Extract the hostname from the URL
  const hostname = new URL(event.href).hostname;

  return {
    ...event,
    hostname: hostname,
    device_vendor: device.vendor || "",
    device_type: device.type || "desktop",
    device_model: device.model || "",
    browser_name: browser.name || "",
    browser_version: browser.version || "",
    os_name: os.name || "",
    os_version: os.version || "",
  };
}
Here's a breakdown of how this works:
- Function Definition: The run() function takes in an event of type PageViewRaw.
- User Agent Parsing: The function processes the user_agent property to extract details about the device, browser, and operating system. It uses the ua-parser-js library to parse the user agent string.
- Hostname Extraction: It extracts the hostname from the event URL.
- Return Processed Event: The function returns a new PageViewProcessed event object with the extracted details.
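The hostname step relies on the standard URL constructor, which throws a TypeError when it is given a string that is not an absolute URL. If your raw events might carry malformed href values, a defensive wrapper is one option. The sketch below is only an illustration: safeHostname is a hypothetical helper name, not part of Moose or the example project, and returning an empty string on failure is an assumed convention that mirrors the "" fallbacks used for the other fields.

```typescript
// Hypothetical helper (not part of Moose): extract a hostname without
// letting a malformed URL throw inside the streaming function.
function safeHostname(href: string): string {
  try {
    return new URL(href).hostname;
  } catch {
    // new URL() throws a TypeError for strings that are not absolute URLs.
    // Assumption: an empty string is an acceptable "unknown" value here.
    return "";
  }
}
```

Inside run(), the hostname line could then read const hostname = safeHostname(event.href); whether silently defaulting is preferable to failing loudly depends on how you want bad records to surface.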
When you save this function in the functions folder, Moose automatically listens for new PageViewRaw events. Upon receiving a new record, Moose runs the run function to process the data and publishes the resulting PageViewProcessed event to the appropriate streaming topic. Finally, Moose stores this processed data in the PageViewProcessed table in ClickHouse, making it available for querying and analysis.
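As a mental model, the flow described above amounts to applying the function to each incoming record and forwarding the result. The sketch below is a conceptual, in-memory illustration only and does not reflect how Moose is implemented internally: processBatch, Transform, RawEvent, ProcessedEvent, and addHostname are all hypothetical names, and the record shapes are simplified stand-ins for the real data models.

```typescript
// Conceptual sketch only: NOT Moose's internal implementation.
// It models "for each source record, run the function and forward the result".
type Transform<S, D> = (event: S) => D;

function processBatch<S, D>(records: S[], run: Transform<S, D>): D[] {
  const out: D[] = [];
  for (const record of records) {
    out.push(run(record)); // one output record per input record
  }
  return out;
}

// Hypothetical, simplified record shapes used only for this illustration.
interface RawEvent {
  href: string;
}
interface ProcessedEvent extends RawEvent {
  hostname: string;
}

// A stripped-down transform that only derives the hostname.
const addHostname: Transform<RawEvent, ProcessedEvent> = (event) => ({
  ...event,
  hostname: new URL(event.href).hostname,
});

const processed = processBatch(
  [{ href: "https://example.com/a" }, { href: "https://docs.example.com/b" }],
  addHostname,
);
```

In the real project, Moose takes care of the listening, publishing, and storage steps; you only supply the transformation function.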
You can validate that the provided streaming function has already run on the PageViewRaw data you ingested by checking your terminal. You should see a message indicating that the function processed the data, along with the number of messages processed.
You can also verify the processed data by querying the PageViewProcessed table in your database explorer.
Important Note
The naming convention of your filename is critical. In order for Moose to identify your streaming functions, the file you define your function in must follow this convention:
<SOURCE_DATAMODEL_NAME>__<DESTINATION_DATAMODEL_NAME>
- SOURCE_DATAMODEL_NAME: The pre-processed data model that you want to transform via the function.
- DESTINATION_DATAMODEL_NAME: The model for the return type of transformed data.
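To make the convention concrete, here is a minimal sketch of how a filename such as PageViewRaw__PageViewProcessed.ts splits into its two parts. parseStreamingFunctionFilename is a hypothetical helper written for illustration; Moose performs its own file discovery, and the exact rules it applies (for example, how it treats names containing extra double underscores) are an assumption here.

```typescript
// Hypothetical helper for illustration; not part of Moose.
interface StreamingFunctionName {
  source: string;
  destination: string;
}

function parseStreamingFunctionFilename(
  filename: string,
): StreamingFunctionName | null {
  // Drop a trailing extension such as ".ts" before splitting.
  const base = filename.replace(/\.[^./]+$/, "");
  const parts = base.split("__");
  const source = parts[0] ?? "";
  const destination = parts[1] ?? "";
  // Assumption: exactly two non-empty names around one double underscore.
  if (parts.length !== 2 || source === "" || destination === "") {
    return null;
  }
  return { source, destination };
}
```

Under these assumptions, PageViewRaw__PageViewProcessed.ts yields PageViewRaw as the source and PageViewProcessed as the destination, while a name with a single underscore separator is rejected.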
By adhering to this convention, Moose ensures that your functions are correctly recognized and executed. For more advanced details and additional capabilities of streaming functions, refer to the Streaming Functions documentation.