Evidently LLM Quickstart
LLM evaluation "Hello world."
You can run this example in Colab or any Python environment.
1. Installation
Install the Evidently Python library.
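A minimal sketch of the install step, assuming the library is published on PyPI under the name `evidently`: run `pip install evidently` in a terminal, or `!pip install evidently` in a Colab cell. The helper below uses only the standard library to confirm that the install worked; `installed_version` is an illustrative name, not part of Evidently.

```python
from importlib import metadata


def installed_version(package="evidently"):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("evidently is not installed; run: pip install evidently")
else:
    print(f"evidently {version} is installed")
```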
Import the necessary components:
Optional. Import the components to send evaluation results to Evidently Cloud:
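The required and optional imports above might look like the sketch below. The module paths are an assumption: they follow the layout of Evidently 0.4.x releases, which match the Team-based workflow in this guide, and other versions may organize modules differently. The `try`/`except` guard and the `EVIDENTLY_AVAILABLE` flag are illustrative additions so that a missing or mismatched install produces a readable message.

```python
try:
    import pandas as pd

    # Core components (ASSUMPTION: Evidently 0.4.x module layout).
    from evidently.report import Report
    from evidently.metric_preset import TextEvals
    from evidently.descriptors import Sentiment, TextLength

    # Optional: only needed to send results to Evidently Cloud.
    from evidently.ui.workspace.cloud import CloudWorkspace

    EVIDENTLY_AVAILABLE = True
except ImportError as err:
    # Hypothetical flag, not part of Evidently; it records that imports failed.
    EVIDENTLY_AVAILABLE = False
    print(f"Import failed ({err}); check the installed Evidently version.")
```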
2. Import the toy dataset
Import a toy dataset with e-commerce reviews. It contains a "Review_Text" column. You will take 100 rows to analyze.
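One way to load such a dataset is sketched below. The source is an assumption, since this guide does not name it: the code pulls the "Womens-E-Commerce-Clothing-Reviews" dataset from OpenML through scikit-learn, which needs network access on the first call. `load_reviews` is an illustrative helper name, and keeping the first 100 rows is one simple way to take a sample.

```python
def load_reviews(n_rows=100):
    """Return the first `n_rows` reviews as a pandas DataFrame."""
    # Imported lazily so that defining this helper does not need scikit-learn.
    from sklearn import datasets

    # ASSUMPTION: dataset name and version on OpenML; not stated in this guide.
    reviews_data = datasets.fetch_openml(
        name="Womens-E-Commerce-Clothing-Reviews",
        version=2,
        as_frame=True,
    )
    # Keep a small slice so the evaluation runs quickly.
    return reviews_data.frame.head(n_rows)
```

Calling `reviews = load_reviews()` would then give a DataFrame whose "Review_Text" column feeds the evaluation in the next step.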
3. Run your first eval
Run a few basic evaluations for all texts in the "Review_Text" column:
text sentiment (measured on a scale from -1 for negative to 1 for positive)
text length (returns the number of characters in each text)
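The two evaluations listed above can be sketched as a single Report. This assumes the Evidently 0.4.x API (`Report`, the `TextEvals` preset, and the `Sentiment` and `TextLength` descriptors); `build_text_evals_report` is an illustrative wrapper, not a library function.

```python
def build_text_evals_report(data, column_name="Review_Text"):
    """Run sentiment and text-length descriptors over one text column."""
    # Lazy imports (ASSUMPTION: Evidently 0.4.x module layout).
    from evidently.report import Report
    from evidently.metric_preset import TextEvals
    from evidently.descriptors import Sentiment, TextLength

    report = Report(metrics=[
        TextEvals(
            column_name=column_name,
            descriptors=[Sentiment(), TextLength()],
        ),
    ])
    # No reference dataset here: only the current batch of texts is evaluated.
    report.run(reference_data=None, current_data=data)
    return report
```

With a DataFrame named `reviews`, the call would be `text_evals_report = build_text_evals_report(reviews)`.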
There are 20+ built-in evals to choose from. You can also create custom ones, including LLM-as-a-judge. We call the result of each such evaluation a descriptor.
View a Report in Python:
You will see the summary results: the distribution of length and sentiment for all evaluated texts.
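Viewing the Report can be sketched as follows, assuming the 0.4.x `Report` object exposes `show(mode="inline")` for notebooks and `save_html(path)` for a standalone file. `view_report` is an illustrative helper that picks between the two.

```python
def view_report(report, html_path=None):
    """Render the Report inline, or save it as a standalone HTML file."""
    if html_path is not None:
        # Useful outside notebooks: open the saved file in a browser.
        report.save_html(html_path)
        return html_path
    # In Colab or Jupyter, the returned object displays as the cell output.
    return report.show(mode="inline")
```

In a notebook, `view_report(text_evals_report)` as the last line of a cell displays the summary; `view_report(text_evals_report, "report.html")` writes a file instead.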
4. Send results to Evidently Cloud
To record and monitor evaluations over time, send them to Evidently Cloud.
Sign up. Create an Evidently Cloud account and your Organization.
Add a Team. Click Teams in the left menu. Create a Team, then copy and save the Team ID. (Team page).
Get your API token. Click the Key icon in the left menu, then generate and save the token. (Token page).
Connect to Evidently Cloud. Pass your API token to connect from your Python environment.
Create a Project. Create a new Project inside your Team, adding your title and description:
Upload the Report to the Project. Send the evaluation results:
View the Report. Go to Evidently Cloud. Open your Project and head to "Reports" in the left menu. (Cloud home).
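The connect, create, and upload steps above can be sketched as follows. This assumes the Evidently 0.4.x `CloudWorkspace` client with its `create_project` and `add_report` methods. The helper names, the `EVIDENTLY_API_TOKEN` environment variable, and the default URL are illustrative choices, not something this guide specifies.

```python
import os


def connect_to_cloud(token=None, url="https://app.evidently.cloud"):
    """Open a connection to Evidently Cloud using an API token."""
    # Lazy import (ASSUMPTION: Evidently 0.4.x module layout).
    from evidently.ui.workspace.cloud import CloudWorkspace

    # Hypothetical variable name; it keeps the token out of the notebook.
    token = token or os.environ["EVIDENTLY_API_TOKEN"]
    return CloudWorkspace(token=token, url=url)


def upload_report(ws, report, team_id, title, description):
    """Create a Project inside a Team and send one Report to it."""
    project = ws.create_project(title, team_id=team_id)
    project.description = description
    project.save()  # persist the description on the new Project
    ws.add_report(project.id, report)
    return project
```

A typical call would be `ws = connect_to_cloud()` followed by `upload_report(ws, text_evals_report, "YOUR_TEAM_ID", "My test project", "My project description")`, where the Team ID, title, and description are placeholders.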
5. Get a dashboard
Go to the "Dashboard" tab and enter "Edit" mode. Add a new tab and select the "Descriptors" template.
You'll see a set of panels that show Sentiment and Text Length with a single data point. As you log ongoing evaluation results, you can track trends and set up alerts.
Want to see more?
Check out a more in-depth tutorial to learn key workflows. It covers using LLM-as-a-judge, running conditional test suites, monitoring results over time, and more.
Tutorial - LLM Evaluation