How to run evaluations for text data.
Generate toy data. Create a small dataset with questions and answers to run the evals on. Note that some descriptors, like OOVWordsPercentage(), may require nltk dictionaries.
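A minimal sketch of such a dataset, together with the nltk downloads that are commonly needed (the rows and the extra "rating" metadata column are purely illustrative):

```python
import pandas as pd
import nltk

# Toy question-answer data with an illustrative numeric "rating" metadata column.
data = [
    ["What is the chemical symbol for gold?", "The chemical symbol for gold is Au.", 5],
    ["Can you reset my password?", "Sorry, I cannot help with account operations.", 2],
    ["What is the capital of France?", "The capital of France is Paris.", 5],
]
eval_df = pd.DataFrame(data, columns=["question", "answer", "rating"])

# Dictionaries used by nltk-based descriptors such as OOVWordsPercentage()
# and Sentiment(); the exact set you need may vary.
nltk.download("words")
nltk.download("wordnet")
nltk.download("omw-1.4")
nltk.download("vader_lexicon")
```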
Create a Dataset object and add descriptors to the selected columns (in this case, the “answer” column) using add_descriptors.
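A sketch, assuming the Dataset.from_pandas constructor and the text_columns argument of DataDefinition; Sentiment and TextLength stand in for whichever descriptors you want to compute:

```python
from evidently import Dataset, DataDefinition
from evidently.descriptors import Sentiment, TextLength

# Wrap the dataframe into an Evidently Dataset and declare the text columns.
# Declare other column types here as well if needed (e.g. numerical columns).
eval_dataset = Dataset.from_pandas(
    eval_df,
    data_definition=DataDefinition(text_columns=["question", "answer"]),
)

# Compute row-level descriptors over the "answer" column.
eval_dataset.add_descriptors(descriptors=[
    Sentiment("answer", alias="Sentiment"),
    TextLength("answer", alias="Length"),
])
```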
To summarize the results, use the TextEvals Preset, then configure and run the Report for the eval_dataset.
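A sketch of doing so, assuming report.run(current, reference) with None as the reference dataset:

```python
from evidently import Report
from evidently.presets import TextEvals

report = Report([TextEvals()])
my_eval = report.run(eval_dataset, None)  # no reference dataset

my_eval  # renders the Report in a notebook; export options vary by version
```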
You can pass an alias to each Descriptor to make it easier to reference. This name shows up in visualizations and column headers. It’s especially handy if you’re using checks like regular expressions with word lists, where the auto-generated title could get very long.
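For example (the alias value here is arbitrary):

```python
from evidently.descriptors import OOVWordsPercentage

eval_dataset.add_descriptors(descriptors=[
    # The alias becomes the column header in the output and in visualizations.
    OOVWordsPercentage("answer", alias="OOV share"),
])
```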
For the Contains Descriptor, add the list of items to look for.
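A sketch, assuming the items and mode arguments of Contains; the phrases are illustrative:

```python
from evidently.descriptors import Contains

eval_dataset.add_descriptors(descriptors=[
    # True if the answer contains any of the listed phrases.
    Contains("answer", items=["sorry", "cannot help"], mode="any", alias="Denials"),
])
```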
For descriptors that return a score (like TextLength or Sentiment), use the tests argument to set conditions. Each test adds a new column with a True/False result. You can use conditions like gte (greater than or equal), lte (less than or equal), and eq (equal). Check the full list here.
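A sketch, assuming the gte/lte conditions are imported from evidently.tests; the thresholds are arbitrary:

```python
from evidently.descriptors import Sentiment, TextLength
from evidently.tests import gte, lte

eval_dataset.add_descriptors(descriptors=[
    # Each test adds a True/False column next to the descriptor value.
    TextLength("answer", alias="Length check", tests=[lte(200)]),
    Sentiment("answer", alias="Sentiment check", tests=[gte(0)]),
])
```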
You can preview the results with eval_dataset.as_dataframe().
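For example, to inspect the computed columns:

```python
# One row per input, with the original columns plus a column per descriptor
# and per test result.
eval_dataset.as_dataframe()
```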
Use TestSummary to combine multiple tests into one or more summary columns. For example, you can add a summary column that returns True only if all tests pass.
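A sketch, assuming the success_all flag of TestSummary (flag names may differ across versions); the LLM-judged DeclineLLMEval is left commented out since it requires an external LLM:

```python
from evidently.descriptors import Sentiment, TextLength, TestSummary
from evidently.tests import gte, lte

eval_dataset.add_descriptors(descriptors=[
    TextLength("answer", alias="Length test", tests=[lte(200)]),
    Sentiment("answer", alias="Sentiment test", tests=[gte(0)]),
    # True only if every test defined above passed for the row.
    TestSummary(success_all=True, alias="All tests passed"),
    # DeclineLLMEval("answer"),  # added after TestSummary, so not counted
])
```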
Note that TestSummary will only consider tests added before it in the list of descriptors, so it would not include DeclineLLMEval in the example.

Use ColumnTest to apply checks to any column, even ones not generated by descriptors. This is useful for working with metadata or precomputed values.
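A sketch, assuming ColumnTest takes a column name and a condition; the "rating" column is the metadata column from the toy data above:

```python
from evidently.descriptors import ColumnTest
from evidently.tests import gte

eval_dataset.add_descriptors(descriptors=[
    # Pass/fail check on an existing (non-descriptor) column.
    ColumnTest("rating", gte(4)),
])
```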
So far, we used the TextEvals preset. It’s the simplest and most useful way to summarize evaluation results. However, you can also create custom reports using different metric combinations for more control.
Imports. Import the components you’ll need:
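For example (import paths as in recent Evidently releases; adjust to your version):

```python
from evidently import Report
from evidently.presets import TextEvals
from evidently.metrics import MeanValue, MaxValue, QuantileValue
```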
You can limit TextEvals to specific descriptors in your dataset. This makes your report more focused and lightweight.
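A sketch, assuming TextEvals accepts a columns argument listing descriptor aliases:

```python
# Only include the listed descriptor columns in the summary.
report = Report([
    TextEvals(columns=["Length", "Sentiment"]),
])
my_eval = report.run(eval_dataset, None)
```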
TextEvals internally uses the ValueStats Metric for each descriptor. To customize the Report, you can reference specific descriptors and use metrics like MeanValue or MaxValue.
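A sketch, assuming column-level metrics take a column argument that references the descriptor alias:

```python
# Build a custom Report from individual column metrics instead of the preset.
report = Report([
    MeanValue(column="Length"),
    MaxValue(column="Length"),
    MeanValue(column="Sentiment"),
])
my_eval = report.run(eval_dataset, None)
```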
Available metrics include MeanValue, MaxValue, QuantileValue, OutRangeValueCount, and CategoryCount.