Collector service
How to send data in near real-time using the Collector service.
In this scenario, you deploy an Evidently Collector service for near real-time monitoring.
Evidently Collector is a service that allows you to collect online events into batches, create Reports or TestSuites over batches of data, and save them as snapshots to your Workspace.
You will need to POST the predictions from the ML service to the Evidently Collector service. You can POST each prediction individually or send them in batches. The Evidently Collector service computes monitoring snapshots asynchronously, based on the provided configuration.
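As a minimal sketch of the request side, the snippet below builds a JSON POST for one prediction and for a small batch, using only the Python standard library. The collector address, the collector name, the `/<collector_id>/data` path, and the payload layout are assumptions made for illustration; verify them against your deployed collector before relying on them.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your deployment.
COLLECTOR_URL = "http://localhost:8001"  # hypothetical collector address
COLLECTOR_ID = "default"                 # hypothetical collector name


def build_event_request(rows, base_url=COLLECTOR_URL, collector_id=COLLECTOR_ID):
    """Build (but do not send) a POST request carrying a list of row dicts."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{collector_id}/data",  # assumed endpoint layout
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One prediction per request...
single = build_event_request([{"feature_a": 0.3, "prediction": 1}])

# ...or several rows batched into one request.
batch = build_event_request(
    [
        {"feature_a": 0.3, "prediction": 1},
        {"feature_a": 0.9, "prediction": 0},
    ]
)

# To actually send the request: urllib.request.urlopen(batch, timeout=5)
```

Batching reduces the number of HTTP calls at the cost of a short delay before the rows reach the collector.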
You can also pass the path to the optional reference dataset.
If you receive delayed ground truth, you can later compute and log the model quality to the same Project. You can run it as a separate process or a batch job.
Refer to this example:
Before sending events, you must configure the collector and start the service.
You can choose either of the two options:
Create configuration via code, save it to a JSON file, and run the service using it.
Run the service first and create configuration via API.
The collector service can simultaneously run multiple “collectors” that compute and save snapshots to different Workspaces or Projects. Each one is represented by a CollectorConfig object.
CollectorConfig Object
You can configure the following parameters:
Parameter | Type | Description |
---|---|---|
| | Defines when to create a new snapshot from the current batch. |
| | Configures the contents of the snapshot: |
| Optional[str] | Local path to a .parquet file with the reference dataset. |
| bool | Defines whether to cache reference data or re-read it each time. |
| str | URL where the Evidently UI Service runs and snapshots will be saved to. For Evidently Cloud, use |
| Optional[str] | Evidently UI Service secrets. |
| str | ID of the project to save snapshots to. |
You can create a ReportConfig object from Report or TestSuite objects. You must run them first so that all Metrics and Tests are collected (including when you use Presets or Test/Metric generators).
Currently, there are two options available:
IntervalTrigger: triggers the snapshot calculation at set intervals (in seconds).
RowsCountTrigger: triggers the snapshot calculation when a specific row count is reached.
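To make the difference between the two trigger types concrete, here is a small standard-library sketch of the semantics described above. The class names, the `is_ready` method, and the injectable clock are hypothetical and only mirror the behavior in plain Python; this is not Evidently's implementation.

```python
import time


class IntervalTriggerSketch:
    """Fires once at least `interval` seconds have passed since the last firing."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self._clock = clock          # injectable so the logic can be tested
        self._last_fired = clock()

    def is_ready(self, buffered_rows):
        now = self._clock()
        if now - self._last_fired >= self.interval:
            self._last_fired = now   # restart the interval after firing
            return True
        return False


class RowsCountTriggerSketch:
    """Fires once the buffer holds at least `rows_count` rows."""

    def __init__(self, rows_count):
        self.rows_count = rows_count

    def is_ready(self, buffered_rows):
        return buffered_rows >= self.rows_count


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
every_five_seconds = IntervalTriggerSketch(interval=5, clock=lambda: fake_now[0])
every_hundred_rows = RowsCountTriggerSketch(rows_count=100)

fake_now[0] = 5.0
interval_fired = every_five_seconds.is_ready(buffered_rows=10)
count_fired = every_hundred_rows.is_ready(buffered_rows=10)
```

An interval-based trigger gives snapshots at a predictable cadence regardless of traffic, while a row-count trigger gives snapshots computed over batches of similar size.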
Note: we are also working on CronTrigger and other triggers. Would you like to see additional scenarios? Please open a GitHub issue with your suggestions.
You can define the configuration and save it as a JSON file. Example:
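The general pattern of defining several named collectors in code and persisting them to one JSON file can be sketched with the standard library alone. The dataclass, its field names, and the file layout below are illustrative assumptions loosely following the parameter table; they are not the library's schema, and the resulting file is not meant to be loaded by the collector service.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass
class CollectorConfigSketch:
    """Illustrative stand-in for one collector's settings (field names assumed)."""

    trigger: dict                         # e.g. {"type": "interval", "seconds": 60}
    report_config: dict                   # placeholder for the snapshot contents
    project_id: str
    api_url: str
    api_secret: Optional[str] = None
    reference_path: Optional[str] = None  # optional .parquet reference dataset
    cache_reference: bool = True


def save_collectors(collectors: Dict[str, CollectorConfigSketch], path: Path) -> None:
    """Write all collectors, keyed by name, to a single JSON file."""
    payload = {name: asdict(cfg) for name, cfg in collectors.items()}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_collectors(path: Path) -> Dict[str, CollectorConfigSketch]:
    """Read the JSON file back into config objects."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {name: CollectorConfigSketch(**fields) for name, fields in raw.items()}


collectors = {
    "main": CollectorConfigSketch(
        trigger={"type": "interval", "seconds": 60},
        report_config={},
        project_id="hypothetical-project-id",
        api_url="http://localhost:8000",  # assumed UI service address
    )
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "collector_config.json"
    save_collectors(collectors, config_path)
    restored = load_collectors(config_path)
```

Keying the file by collector name keeps each collector's Project and trigger settings independent, which matches the idea of running several collectors in one service.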
Then, run the following command:
First, run the collector service:
Then, use the CollectorClient to add a new collector config:
To specify the path to the reference dataset:
To send events from your ML service:
To send data with curl:
Example:
This is how it looks in the Terminal.
Sending data:
The data is received by the collector service: