- Bypass safety protections and generate harmful responses.
- Trick the model into revealing sensitive or unintended information.
- Exploit edge cases to evaluate system robustness.
Create an adversarial test dataset
You can configure your own adversarial dataset.

1. Create a Project
In the Evidently UI, start a new Project or open an existing one.
- Navigate to “Datasets” in the left menu.
- Click “Generate” and select the “Adversarial testing” option.

2. Select a test scenario
Choose a predefined adversarial scenario:
- Harmful content (e.g., profanity, toxicity, illegal advice).
- Forbidden topics (e.g., financial, legal, medical queries).
- Brand image (eliciting negative feedback on a company or product).
- Competition (comparisons with competitor products).
- Offers and promises (attempting to get AI to make commitments).
- Hijacking (out-of-scope questions unrelated to the intended purpose).
- Prompt leakage (extracting system instructions or hidden prompts).
3. Configure the dataset
After selecting a scenario:
- Provide an optional dataset name and description. (This applies if you export each dataset separately.)
- Set the number of inputs to generate.


4. Generate the data
You can choose to:
- Combine multiple scenarios into a single dataset. If you select multiple categories (e.g., Brand Image and Forbidden Topics), they will be included in the same dataset, with a separate “scenario” column to indicate the category of each test case.
- Export each scenario separately. Generate individual datasets for each selected test type.
Once the dataset is generated, you can:
- Open and edit each dataset as needed.
- Download it as a CSV file.
- Access it via the Python API using the dataset ID (see the sketch below).
For more details, see the Dataset API documentation on how to work with Evidently datasets.
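
For reference, here is a minimal sketch of pulling a generated dataset through the Python API. It assumes an Evidently Cloud workspace and a recent Evidently version with the `CloudWorkspace` client; the token, URL, and dataset ID are placeholders, and exact import paths or method names may differ in your installed version, so treat this as illustrative rather than exact.

```python
from evidently.ui.workspace import CloudWorkspace

# Connect to Evidently Cloud (token and URL are placeholders — use your own).
ws = CloudWorkspace(
    token="YOUR_API_TOKEN",
    url="https://app.evidently.cloud",
)

# Load the generated adversarial dataset by the ID copied from the UI.
dataset = ws.load_dataset(dataset_id="YOUR_DATASET_ID")

# In recent Evidently versions the returned Dataset can be converted to pandas
# for inspection or editing; older versions may return a DataFrame directly.
df = dataset.as_dataframe()
print(df.head())
```

From here you can review or edit the generated test inputs in pandas, or feed them into your application and log the responses back for evaluation.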