Metrics to evaluate a RAG system.
ContextQualityLLMEval.
ContextRelevance metric.
LLM-based methods (CorrectnessLLMEval) and non-LLM methods like semantic similarity and BERTScore. Let’s run all three at once, but we’d recommend choosing one.
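To build intuition for what a non-LLM similarity score measures, here is a minimal sketch that compares a response to a reference answer using cosine similarity. It is a simplification: it uses plain word counts where a semantic similarity check would use embeddings from a model, so it captures word overlap rather than meaning. The function names and sample sentences are made up for illustration and are not part of Evidently.

```python
import math
import re
from collections import Counter


def token_counts(text: str) -> Counter:
    # Lowercase and keep alphanumeric tokens only.
    # Assumption: word counts stand in for model embeddings here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = token_counts(a), token_counts(b)
    # Dot product over the tokens the two texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical reference answer and two candidate responses.
reference = "The capital of France is Paris."
close_answer = "Paris is the capital of France."
off_answer = "Bananas are rich in potassium."

close_score = cosine_similarity(reference, close_answer)
off_score = cosine_similarity(reference, off_answer)
print(round(close_score, 3), round(off_score, 3))
```

The first pair contains exactly the same words, so its score is close to 1. The unrelated answer shares no words with the reference and scores 0. An embedding-based score would additionally reward paraphrases that use different words.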
Use Faithfulness to detect whether the response contradicts or is unfaithful to the context.
From synthetic_df, you create an Evidently dataset object and select descriptors by listing them.
Use as_dict() for a Python dictionary output.
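Conceptually, listing descriptors means applying a set of row-level scoring functions to every row and storing each result as a new column. The sketch below imitates that pattern in plain Python. It does not use Evidently's API, and the column names (Question, Context, Response), the two scoring functions, and the evaluate helper are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


# Hypothetical row-level descriptors: each takes one row and returns a score.
def response_length(row: Row) -> int:
    return len(row["Response"])


def context_overlap(row: Row) -> float:
    """Share of response words that also appear in the context."""
    response_words = set(row["Response"].lower().split())
    context_words = set(row["Context"].lower().split())
    if not response_words:
        return 0.0
    return len(response_words & context_words) / len(response_words)


def evaluate(rows: List[Row], descriptors: List[Callable]) -> List[Dict[str, object]]:
    # Copy each row and add one new column per listed descriptor,
    # named after the descriptor function.
    return [
        {**row, **{d.__name__: d(row) for d in descriptors}}
        for row in rows
    ]


# Hypothetical single-row dataset.
rows = [
    {
        "Question": "What is the capital of France?",
        "Context": "paris is the capital of france",
        "Response": "paris",
    }
]

# Choosing descriptors is just a matter of listing them.
scored = evaluate(rows, descriptors=[response_length, context_overlap])
print(scored[0])
```

Each scored row here is already a plain Python dictionary, so it can be inspected or serialized directly. Adding or removing a check only means changing the list passed as descriptors.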