Add a custom text descriptor
How to add custom text descriptors.
You can implement custom row-level evaluations for text data and then use them like any other descriptor across Metrics and Tests. A custom descriptor can take either a single column or two columns as input.
Note that if you want to use LLM-based evaluations, you can write custom prompts using LLM judge templates.
Code example
Refer to the how-to example: Custom descriptors.
Imports:
Single column descriptor
You can create a custom descriptor that will take a single column from your dataset and run a certain evaluation for each row.
Implement your evaluation as a Python function. It will take a pandas Series as input and return a transformed Series.
Here, the is_empty_string_callable function takes a column of strings and returns an "EMPTY" or "NON EMPTY" outcome for each row.
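A minimal sketch of such a function, using pandas only. The sample responses are hypothetical and serve only to illustrate the input and output shapes.

```python
import pandas as pd


def is_empty_string_callable(val1: pd.Series) -> pd.Series:
    """Label each value "EMPTY" if it is an empty string, else "NON EMPTY"."""
    return pd.Series(
        ["EMPTY" if val == "" else "NON EMPTY" for val in val1],
        index=val1.index,  # keep the input index so rows stay aligned
    )


# Hypothetical sample data, for illustration only.
responses = pd.Series(["Hello!", "", "Thanks for asking."])
labels = is_empty_string_callable(responses)
print(labels.tolist())
```

Returning a Series with the same index as the input keeps each outcome attached to the row it was computed from.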
Create a custom descriptor. Create an instance of the CustomColumnEval class to wrap the evaluation logic into an object that you can later apply to a specific column in your dataset.
Where:
func: Callable[[pd.Series], pd.Series] is a function that returns a transformed pandas Series.
display_name: str is the new descriptor's name that will appear in Reports and Test Suites.
feature_type is the type of descriptor that the function returns (cat for categorical, num for numerical).
Apply the new descriptor. To create a Report with the new descriptor, pass it as the column_name to the ColumnSummaryMetric. This will compute the new descriptor for all rows in the specified column and summarize its distribution:
Run the Report on your df dataframe as usual:
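The single-column steps might fit together roughly as follows. This is a sketch under assumptions, not a definitive implementation. The import paths, the .on() call that binds the descriptor to a column, and the reference_data / current_data arguments are assumed from a legacy Evidently layout and may differ in your installed version. The helper build_empty_string_report, the "response" column, and the sample data are hypothetical.

```python
import pandas as pd


def is_empty_string_callable(val1: pd.Series) -> pd.Series:
    """Label each value "EMPTY" if it is an empty string, else "NON EMPTY"."""
    return pd.Series(
        ["EMPTY" if val == "" else "NON EMPTY" for val in val1],
        index=val1.index,
    )


def build_empty_string_report(df: pd.DataFrame, column: str = "response"):
    """Hypothetical helper: wrap the function in a descriptor and summarize it."""
    # Assumed import paths (legacy Evidently layout). They are local to the
    # helper so they are easy to adjust for the version you have installed.
    from evidently.descriptors import CustomColumnEval
    from evidently.metrics import ColumnSummaryMetric
    from evidently.report import Report

    empty_string = CustomColumnEval(
        func=is_empty_string_callable,
        feature_type="cat",  # the function returns categorical labels
        display_name="Empty response",
    )
    # Assumption: .on(column) points the descriptor at a specific column.
    report = Report(
        metrics=[ColumnSummaryMetric(column_name=empty_string.on(column))]
    )
    report.run(reference_data=None, current_data=df)
    return report


# Hypothetical dataset with a single text column.
df = pd.DataFrame({"response": ["Hello!", "", "Thanks for asking."]})
```

With Evidently installed, calling build_empty_string_report(df) would return the Report object, which you can then display or export as you would any other Report.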
Double column descriptor
You can create a custom descriptor that takes two columns from your dataset and runs a certain evaluation for each row. This is useful, for example, for pairwise evaluators.
Implement your evaluation as a Python function. Here, the exact_match_callable function takes two columns and checks whether each pair of values is the same, returning "MATCH" if they are equal and "MISMATCH" if they are not.
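A minimal sketch of such a pairwise function, using pandas only. The two sample Series are hypothetical. Values are compared position by position, and the result reuses the index of the first column.

```python
import pandas as pd


def exact_match_callable(val1: pd.Series, val2: pd.Series) -> pd.Series:
    """Label each row "MATCH" when the two values are equal, else "MISMATCH"."""
    return pd.Series(
        ["MATCH" if a == b else "MISMATCH" for a, b in zip(val1, val2)],
        index=val1.index,  # outcomes follow the first column's index
    )


# Hypothetical sample data, for illustration only.
answers = pd.Series(["Paris", "4", "blue"])
targets = pd.Series(["Paris", "four", "blue"])
outcomes = exact_match_callable(answers, targets)
print(outcomes.tolist())
```

The comparison is strict equality, so differences in case or whitespace count as a mismatch. Normalize the strings inside the function if you need a looser check.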
Create a custom descriptor. Create an instance of the CustomPairColumnEval class to wrap the evaluation logic into an object that you can later use to process two named columns in a dataset.
Where:
func: Callable[[pd.Series, pd.Series], pd.Series] is a function that returns a transformed pandas Series after evaluating two columns.
first_column: str is the name of the first column to be passed into the function.
second_column: str is the name of the second column to be passed into the function.
display_name: str is the new descriptor's name that will appear in Reports and Test Suites.
feature_type is the type of descriptor that the function returns (cat for categorical, num for numerical).
Apply the new descriptor. To create a Report with the new descriptor, pass it as the column_name to the ColumnSummaryMetric. This will compute the new descriptor for all rows in the dataset and summarize its distribution:
Run the Report on your df dataframe as usual:
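The two-column steps might be wired together roughly as follows. Treat this as a sketch under assumptions rather than the library's confirmed API. The import paths, the as_column() call that turns the descriptor into a column reference, and the reference_data / current_data arguments are assumed from a legacy Evidently layout. The helper build_exact_match_report, the "response" and "target" column names, and the sample data are hypothetical.

```python
import pandas as pd


def exact_match_callable(val1: pd.Series, val2: pd.Series) -> pd.Series:
    """Label each row "MATCH" when the two values are equal, else "MISMATCH"."""
    return pd.Series(
        ["MATCH" if a == b else "MISMATCH" for a, b in zip(val1, val2)],
        index=val1.index,
    )


def build_exact_match_report(
    df: pd.DataFrame,
    first_column: str = "response",
    second_column: str = "target",
):
    """Hypothetical helper: build a pair descriptor and summarize it."""
    # Assumed import paths (legacy Evidently layout). They are local to the
    # helper so they are easy to adjust for the version you have installed.
    from evidently.descriptors import CustomPairColumnEval
    from evidently.metrics import ColumnSummaryMetric
    from evidently.report import Report

    exact_match = CustomPairColumnEval(
        func=exact_match_callable,
        first_column=first_column,
        second_column=second_column,
        feature_type="cat",  # the function returns categorical labels
        display_name="Exact match between response and target",
    )
    # Assumption: as_column() exposes the pair descriptor as a column reference.
    report = Report(
        metrics=[ColumnSummaryMetric(column_name=exact_match.as_column())]
    )
    report.run(reference_data=None, current_data=df)
    return report


# Hypothetical dataset with the two columns to compare.
df = pd.DataFrame(
    {"response": ["Paris", "4", "blue"], "target": ["Paris", "four", "blue"]}
)
```

With Evidently installed, calling build_exact_match_report(df) would return the Report object. Unlike the single-column case, the column names are fixed when the descriptor is created, so nothing further needs to be bound when it is passed to the metric.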