Tutorial - Tracing
How to capture LLM inputs and outputs and view them in Evidently Cloud.
In this tutorial, you will learn how to set up tracing for an LLM application to collect its inputs and outputs and view the collected traces in Evidently Cloud. You can later run evaluations on the resulting datasets.
You will use the following tools:
Tracely: An open-source library based on OpenTelemetry to track events in your LLM application.
Evidently: An open-source library to run LLM evaluations and interact with Evidently Cloud.
Evidently Cloud: A web platform to view traces and run evaluations.
OpenAI: Used to simulate an LLM application.
You will go through the following steps:
Install libraries
Set up and initialize tracing
Create a simple question-answer LLM function
Collect and send traces to Evidently Cloud
(Optional) Download the resulting dataset to run local evals
To complete the tutorial, use the provided code snippets or run a sample notebook.
Jupyter notebook:
Or click to open in Colab.
If you're having problems or getting stuck, reach out on Discord.
1. Installation
Install the necessary libraries: evidently, tracely, and openai (for example, with pip install evidently tracely openai).
Import the required modules, including init_tracing and trace_event from tracely.
Optional: import the extra modules needed to load the traced dataset back into Python and run evals.
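If an import fails, it helps to first confirm which packages are missing. The sketch below uses only the standard library (importlib.util.find_spec). The split into required and optional modules mirrors the steps of this tutorial and is an assumption, not an Evidently requirement.

```python
import importlib.util
from typing import Iterable, List

# Modules used for tracing and the main evaluation steps of this tutorial.
REQUIRED = ["evidently", "tracely", "openai"]
# Only needed for the optional local steps (working with the dataset in pandas).
OPTIONAL = ["pandas"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(REQUIRED) returns an empty list once the installation step has succeeded.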
2. Get the API keys
Obtain your API keys from Evidently Cloud and OpenAI.
Evidently Cloud: Create an account and set up an Organization and Team. Get the API key from the Token page. (Check the step-by-step instructions if you need help.)
OpenAI: Get your API key from OpenAI (Token page).
Set your API keys:
It is recommended to pass the keys as environment variables. See the OpenAI docs for best practices.
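Here is a minimal sketch of loading both keys from environment variables rather than pasting them into the notebook. OPENAI_API_KEY is the variable the OpenAI Python client reads by default. EVIDENTLY_API_KEY is a hypothetical name chosen for this example.

```python
import os


def get_api_key(env_var: str) -> str:
    """Read an API key from the environment and fail early if it is missing or empty."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the tutorial.")
    return key
```

Usage: my_token = get_api_key("EVIDENTLY_API_KEY") and openai_api_key = get_api_key("OPENAI_API_KEY"). A clear error at startup is easier to debug than an authentication failure later in the run.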
3. Configure tracing
Set up configuration details:
The configuration parameters are:
address: the destination backend that stores the collected traces. In this case, it is Evidently Cloud.
project_id: the ID of the Evidently Project. Go to the Projects page, open the selected Project, and copy its ID.
dataset_name: identifies the resulting tracing dataset. All traces sent with the same dataset name are grouped into a single dataset.
Initialize tracing:
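The configuration and initialization steps can be sketched as follows. The keyword names passed to init_tracing (address, api_key, project_id, export_name) are based on the Tracely examples; check them against the version you install. The helper functions and the EVIDENTLY_API_KEY variable are hypothetical.

```python
import os

# Destination backend for the traces: Evidently Cloud.
ADDRESS = "https://app.evidently.cloud/"


def tracing_config(project_id: str, dataset_name: str) -> dict:
    """Collect the tracing settings into keyword arguments for tracely.init_tracing."""
    return {
        "address": ADDRESS,
        # Hypothetical variable name for the Evidently Cloud token.
        "api_key": os.environ.get("EVIDENTLY_API_KEY", ""),
        # Copied from the Projects page in Evidently Cloud.
        "project_id": project_id,
        # Traces sent with the same name are grouped into one dataset.
        "export_name": dataset_name,
    }


def start_tracing(config: dict) -> None:
    """Initialize Tracely so that decorated functions export their traces."""
    # Imported lazily so tracing_config can be used without tracely installed.
    from tracely import init_tracing

    init_tracing(**config)
```

Usage: start_tracing(tracing_config("YOUR_PROJECT_ID", "YOUR_TRACING_DATASET_NAME")). Call it once, before any traced function runs.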
4. Trace a simple LLM app
You will now create a simple function that sends a list of questions to the LLM and gets the responses.
Initialize the OpenAI client with the API key:
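Here is a minimal sketch of creating the client with the openai package (1.x API). When api_key is None, the client falls back to the OPENAI_API_KEY environment variable. The helper name is hypothetical.

```python
from typing import Optional


def make_openai_client(api_key: Optional[str] = None):
    """Create an OpenAI client from an explicit key or from OPENAI_API_KEY."""
    # Imported lazily so this module loads even without the openai package.
    import openai

    return openai.OpenAI(api_key=api_key)
```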
Define the list of questions to answer:
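The questions are up to you. The list below is an illustrative placeholder, not the set used in the original notebook.

```python
# Illustrative questions; replace them with inputs relevant to your application.
question_list = [
    "What is LLM observability?",
    "How is MLOps different from LLMOps?",
    "What is an LLM prompt?",
]
```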
Create a template for the questions you will pass to the LLM.
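One way to write such a template is a plain Python string with a named placeholder that str.format fills in. The wording and the {question} placeholder are assumptions for this sketch.

```python
# Prompt template with a named placeholder for the question text (hypothetical wording).
QUESTION_PROMPT = (
    "Please answer the following question clearly, with a structured response "
    "and a short conclusion at the end.\n"
    "Question: {question}"
)


def build_prompt(template: str, question: str) -> str:
    """Fill the template with a single question."""
    return template.format(question=question)
```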
Use the @trace_event() decorator from Tracely to trace the execution of the function. This captures input arguments and outputs, sending the trace data to Evidently Cloud.
Loop through the list of questions and call the traced function pseudo_assistant to get responses while Tracely captures all relevant data.
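You can wrap the loop in a small hypothetical helper that takes the answering function as an argument, which makes it easy to test. The pause between calls is an optional choice to stay within API rate limits.

```python
import time
from typing import Callable, List


def answer_all(questions: List[str], answer: Callable[[str], str], delay_s: float = 1.0) -> List[str]:
    """Call `answer` once per question, pausing `delay_s` seconds after each call."""
    responses = []
    for question in questions:
        responses.append(answer(question))
        time.sleep(delay_s)
    return responses
```

With the traced function defined, a call could look like answer_all(question_list, lambda q: pseudo_assistant(prompt=template, question=q)), where template is your prompt template. Each call to pseudo_assistant produces one trace.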
5. View traces
Go to Evidently Cloud, open Datasets in the left menu, and select the traces you just sent. It might take a few moments until OpenAI processes all the inputs.
You can now view, sort, export, and work with the traced dataset. You can run evaluations on this dataset both in the Cloud and locally.
6. Load the dataset
This step is optional. To access your traced dataset locally, for example to run evaluations, load it from Evidently Cloud.
Connect to the Evidently Cloud workspace:
Specify the dataset ID. You can copy it from the dataset page in the UI.
Load the dataset to pandas:
Preview the dataset with traced_data.head().
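Connecting and loading can be sketched as a single helper. The import path evidently.ui.workspace.cloud, the CloudWorkspace(token=..., url=...) constructor, and load_dataset(dataset_id=...) reflect the Evidently releases this tutorial was written for. Newer releases reorganized the workspace API, so check the reference for your installed version. The helper name and the EVIDENTLY_API_KEY variable are hypothetical.

```python
import os
from typing import Optional


def load_traced_dataset(dataset_id: str, token: Optional[str] = None):
    """Connect to Evidently Cloud and return the traced dataset as a pandas DataFrame."""
    # Import path used by older Evidently releases; it may differ in newer versions.
    from evidently.ui.workspace.cloud import CloudWorkspace

    ws = CloudWorkspace(
        token=token or os.environ["EVIDENTLY_API_KEY"],  # hypothetical variable name
        url="https://app.evidently.cloud",
    )
    return ws.load_dataset(dataset_id=dataset_id)
```

Usage: traced_data = load_traced_dataset("YOUR_DATASET_ID"), then traced_data.head().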
7. Run an evaluation
You can run evaluations on this dataset using the Evidently Python library. You can generate Reports to view them locally or send them to Evidently Cloud. For example, let's evaluate the length and sentiment of the responses, and whether they include the word "Certainly".
Define the evaluations:
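Here is a sketch of the evaluation definition using the legacy Report and TextEvals API (evidently.report, evidently.metric_preset) from the Evidently releases this tutorial targets. Newer releases replaced these modules, so adjust the imports to your version. The response column name is a parameter because it depends on how Tracely names the exported output. Check traced_data.columns for the actual name.

```python
def build_evals_report(response_column: str):
    """Build a Report that scores each response for sentiment, length, and the word "Certainly"."""
    # Legacy Evidently imports; newer releases moved these APIs.
    from evidently.report import Report
    from evidently.metric_preset import TextEvals
    from evidently.descriptors import IncludesWords, Sentiment, TextLength

    return Report(metrics=[
        TextEvals(column_name=response_column, descriptors=[
            Sentiment(display_name="Response sentiment"),
            TextLength(display_name="Response length"),
            # Assumed to match "Certainly" regardless of case; verify for your version.
            IncludesWords(words_list=["certainly"], display_name="Says 'Certainly'"),
        ]),
    ])
```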
Run the Report on the traced_data:
Send the results to Evidently Cloud:
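Running the Report and sending it can be sketched as one hypothetical helper that accepts the Report, the DataFrame, and the workspace object. reference_data=None means there is no reference dataset to compare against. include_data=True is assumed to upload the scored rows along with the summary, which lets you browse individual responses in the UI.

```python
def run_and_upload(report, traced_data, workspace, project_id: str):
    """Compute the Report on the traced data and send it to Evidently Cloud."""
    # No reference dataset: the evals only score the current (traced) data.
    report.run(reference_data=None, current_data=traced_data)
    # Also upload the row-level scores so they can be explored in the UI.
    workspace.add_report(project_id, report, include_data=True)
    return report
```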
To explore the evaluation results, go to Evidently Cloud, enter your Project and navigate to "Reports" in the left menu.
You can view and browse the results, for example, to find the longest responses or all responses that say "Certainly".
To view the evals locally, run evals_report for the Report and evals_report.datasets().current for the Dataset with added scores.
What's next?
Check the complete LLM evaluation tutorial for more details: how to run other evaluation methods, including LLM-as-a-judge, or test for specific conditions.
Need help? Ask in our Discord community.