Text Evals
TL;DR: You can explore and compare text datasets.
Report: for visual analysis or metrics export, use the TextEvals preset.
Text Evals Report
To visually explore the descriptive properties of text data, create a new Report object and include the TextEvals preset for the column that contains the text data.
It's best to define your own set of descriptors by passing them as a list to the TextEvals preset. For more details, see how descriptors work.
If you don’t specify descriptors, the Preset will use default statistics.
Code example
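A minimal sketch of how such a Report might be built, assuming the Evidently 0.4.x import paths (`evidently.report`, `evidently.metric_preset`, `evidently.descriptors`); other releases may use different paths. The helper name `build_text_evals_report` and the column name `Review_Text` used below are hypothetical.

```python
def build_text_evals_report(column_name, use_custom_descriptors=False):
    """Build (but do not run) a Report that contains the TextEvals preset.

    Assumption: Evidently 0.4.x import paths; newer releases may differ.
    """
    # Imported inside the function so the sketch can be loaded
    # even in an environment where Evidently is not installed.
    from evidently.report import Report
    from evidently.metric_preset import TextEvals
    from evidently.descriptors import Sentiment, SentenceCount, TextLength

    if use_custom_descriptors:
        # Pass your own set of descriptors as a list.
        preset = TextEvals(
            column_name=column_name,
            descriptors=[TextLength(), Sentiment(), SentenceCount()],
        )
    else:
        # No descriptors specified: the preset uses its default statistics.
        preset = TextEvals(column_name=column_name)

    return Report(metrics=[preset])
```

With two pandas DataFrames `ref` and `cur`, usage might look like `report = build_text_evals_report("Review_Text")` followed by `report.run(reference_data=ref, current_data=cur)`.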
Note that to calculate some text-related metrics, you may also need to import additional libraries.
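As an illustration, descriptors such as sentiment and out-of-vocabulary words typically rely on NLTK corpora. The sketch below assumes the resources `words`, `wordnet`, `omw-1.4` and `vader_lexicon` are the ones needed; check this against the descriptors you actually use. The helper name `download_nltk_resources` is hypothetical.

```python
# Assumed list of NLTK resources used by text descriptors.
NLTK_RESOURCES = ("words", "wordnet", "omw-1.4", "vader_lexicon")


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Download each NLTK resource and report whether the download succeeded."""
    # Imported inside the function so the sketch loads without NLTK installed.
    import nltk

    # nltk.download returns True on success and False otherwise.
    return {name: nltk.download(name) for name in resources}
```

Calling `download_nltk_resources()` once per environment, before running the Report, should be enough; it needs network access.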
Data Requirements
You can pass one or two datasets. Evidently will compute descriptors both for the current production data and the reference data. If you pass a single dataset, there will be no comparison.
To run this preset, you must have text columns in your dataset. Additional features and prediction/target are optional. Pass them if you want to analyze the correlations with text descriptors.
Column mapping. Specify the columns that contain text features in column mapping.
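The data requirements above can be sketched as follows, assuming the Evidently 0.4.x API (`ColumnMapping` with a `text_features` list, and `Report.run` taking `reference_data`, `current_data` and `column_mapping`). The helper name `run_text_evals` and the default column name `Review_Text` are hypothetical.

```python
def run_text_evals(current, reference=None, text_column="Review_Text"):
    """Run the TextEvals preset on one or two pandas DataFrames.

    Assumption: Evidently 0.4.x import paths; newer releases may differ.
    """
    # Imported inside the function so the sketch loads without Evidently installed.
    from evidently import ColumnMapping
    from evidently.report import Report
    from evidently.metric_preset import TextEvals

    # Tell Evidently which columns hold raw text.
    column_mapping = ColumnMapping(text_features=[text_column])

    report = Report(metrics=[TextEvals(column_name=text_column)])
    # With reference=None only the current data is described (no comparison).
    report.run(
        reference_data=reference,
        current_data=current,
        column_mapping=column_mapping,
    )
    return report
```

Passing only `current` produces a single-dataset report; passing both DataFrames produces side-by-side descriptor distributions.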
How it looks
The report includes 5 components. All plots are interactive.
Aggregated visuals in plots. Starting from v0.3.2, all visuals in the Evidently Reports are aggregated by default. This helps decrease the load time and report size for larger datasets. If you work with smaller datasets or samples, you can pass an option to generate plots with raw data. You can choose whether to enable it based on the size of your dataset.
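A sketch of switching to raw-data plots, assuming the `options={"render": {"raw_data": True}}` form accepted by `Report` in Evidently 0.4.x. The names `RAW_DATA_OPTIONS` and `build_raw_data_report` are hypothetical.

```python
# Assumed render option that disables aggregation in plots.
RAW_DATA_OPTIONS = {"render": {"raw_data": True}}


def build_raw_data_report(column_name):
    """Build a TextEvals Report whose plots use raw (non-aggregated) data."""
    # Imported inside the function so the sketch loads without Evidently installed.
    from evidently.report import Report
    from evidently.metric_preset import TextEvals

    return Report(
        metrics=[TextEvals(column_name=column_name)],
        options=RAW_DATA_OPTIONS,
    )
```

Raw-data plots are best kept for small datasets or samples, since they increase the report size.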
Text Descriptors Distribution
The report generates several features that describe different text properties and shows the distributions of these text descriptors.
Text length
Non-letter characters
Out-of-vocabulary words
Sentiment
Shows the distribution of text sentiment (-1 negative to 1 positive).
Sentence Count
Shows the sentence count.
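To make the idea of a descriptor concrete, here is a plain-Python sketch of three of them. These are simplified definitions chosen for illustration, not Evidently's implementation; the function names are hypothetical. Sentiment and out-of-vocabulary words are left out because they need external language resources.

```python
import re


def text_length(text):
    """Number of characters in the text."""
    return len(text)


def non_letter_percentage(text):
    """Share of characters (in percent) that are neither letters nor whitespace."""
    if not text:
        return 0.0
    non_letters = sum(1 for ch in text if not ch.isalpha() and not ch.isspace())
    return 100.0 * non_letters / len(text)


def sentence_count(text):
    """Number of non-empty segments separated by '.', '!' or '?'."""
    segments = re.split(r"[.!?]+", text)
    return sum(1 for segment in segments if segment.strip())


def describe(texts):
    """Compute one row of descriptors per input text."""
    return [
        {
            "text_length": text_length(t),
            "non_letter_percentage": non_letter_percentage(t),
            "sentence_count": sentence_count(t),
        }
        for t in texts
    ]
```

Each descriptor maps a text to a single number, which is why its values can be plotted as a distribution and compared between the current and reference data.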
Metrics output
You can also get the report output as a JSON or a Python dictionary.
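A short sketch of both export forms, assuming the Report object exposes `as_dict()` and `json()` as in Evidently 0.4.x. The helper name `export_report` is hypothetical.

```python
import json


def export_report(report):
    """Return the report output as a Python dict and as a JSON string."""
    as_dictionary = report.as_dict()  # nested Python dictionary
    as_json = report.json()           # JSON-formatted string
    # Parse the string to make sure it is valid JSON before passing it on.
    json.loads(as_json)
    return as_dictionary, as_json
```

The dictionary form is convenient for pulling individual metric values in Python, while the JSON string can be logged or sent to another system.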
Report customization
You can choose your own descriptors.
You can use a different color schema for the report.
You can create a different report or test suite from scratch, taking this one as an inspiration.
Examples
Head to the example how-to notebook to see the Text Overview preset in use, along with other metrics and tests for text data.