# RAG evaluation dataset

> Synthetic data for RAG.

Retrieval-Augmented Generation (RAG) systems retrieve relevant information from a knowledge base before generating a response. To evaluate them effectively, you need a test dataset that reflects what the system *should* know.

Instead of manually creating test cases, you can generate them directly from your knowledge source, ensuring accurate and relevant ground truth data.

## Create a RAG test dataset

You can generate a ground truth RAG dataset from your data source.

### 1. Create a Project

In the Evidently UI, start a new Project or open an existing one.

* Navigate to “Datasets” in the left menu.
* Click “Generate” and select the “RAG” option.

<img src="https://mintcdn.com/evi/8OEti_y2YYYC9e0v/images/synthetic/synthetic_data_select_method.png?fit=max&auto=format&n=8OEti_y2YYYC9e0v&q=85&s=3fea853e4837f812a269046e5ee53437" alt="" width="2448" height="1354" data-path="images/synthetic/synthetic_data_select_method.png" />

### 2. Upload your knowledge base

Select a file containing the information your AI system retrieves from. Supported formats: Markdown (.md), CSV, TXT, and PDF.

<img src="https://mintcdn.com/evi/DLHmQMW9F8KXZznS/images/synthetic/synthetic_data_inputs_example_upload.png?fit=max&auto=format&n=DLHmQMW9F8KXZznS&q=85&s=bbf44a8362fa1380b3ebdf1b8c742412" alt="" width="2462" height="1526" data-path="images/synthetic/synthetic_data_inputs_example_upload.png" />

Simply drop the file, then:

* Choose the number of inputs to generate.
* Choose whether to include the context used to generate each answer.

<img src="https://mintcdn.com/evi/DLHmQMW9F8KXZznS/images/synthetic/synthetic_data_inputs_example_upload2.png?fit=max&auto=format&n=DLHmQMW9F8KXZznS&q=85&s=60dac0ae0936cd6e8adf99b65b872100" alt="" width="2432" height="1442" data-path="images/synthetic/synthetic_data_inputs_example_upload2.png" />

The system automatically extracts relevant facts from your data source and generates user-like questions about it, each paired with a ground truth answer.

<Info>
  Note that it may take some time to process the dataset. Limits apply on the free plan.
</Info>

### 3. Review the test cases

You can preview and refine the generated dataset.

<img src="https://mintcdn.com/evi/DLHmQMW9F8KXZznS/images/synthetic/synthetic_data_rag_example_result.png?fit=max&auto=format&n=DLHmQMW9F8KXZznS&q=85&s=01ac8d3ba7286bb4d7f0d432a6fd6940" alt="" width="2880" height="1654" data-path="images/synthetic/synthetic_data_rag_example_result.png" />

You can:

* Use “More like this” to add more variations.
* Drop rows that aren’t relevant.
* Manually edit questions or responses.

### 4. Save the dataset

Once you are finished, save the dataset. You can download it as a CSV file, or load it through the Python API by its dataset ID, and use it in your evaluations.
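
If you take the CSV route, the downloaded file can be read with the Python standard library alone. The snippet below is a minimal sketch, not the Evidently API: the column names `question` and `answer` are assumptions for illustration, so check the header row of your own export and adjust them. It also skips rows with an empty question or answer, which can be a useful guard after manual edits.

```python
import csv
import io

# Assumed column names (hypothetical): check the header row of your exported CSV.
QUESTION_COL = "question"
ANSWER_COL = "answer"


def load_test_cases(csv_file):
    """Read (question, ground truth answer) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    cases = []
    for row in reader:
        question = (row.get(QUESTION_COL) or "").strip()
        answer = (row.get(ANSWER_COL) or "").strip()
        # Skip rows that are missing a question or an answer.
        if question and answer:
            cases.append({"question": question, "expected": answer})
    return cases


# Small inline sample standing in for the downloaded file (illustrative data only).
sample = io.StringIO(
    "question,answer\n"
    'What file formats are supported?,"Markdown, CSV, TXT and PDF."\n'
    "Is there a free plan?,\n"
)
cases = load_test_cases(sample)
print(len(cases), "usable test case(s)")
```

With a real export, you would open the file instead of using the inline sample, for example `open("rag_dataset.csv", newline="", encoding="utf-8")`, where the file name is a placeholder. Each resulting item holds a question to send to your RAG system and the expected answer to compare its response against.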

<Info>
  **Dataset API.** How to work with [Evidently datasets](/docs/platform/datasets_overview).
</Info>
