> [!IMPORTANT]
> Giskard v3 is a fresh rewrite designed for dynamic, multi-turn testing of AI agents. This release drops heavy dependencies for better efficiency while introducing a more powerful AI vulnerability scanner and enhanced RAG evaluation capabilities. For now, the vulnerability scanner and RAG evaluation still rely on Giskard v2. Giskard v2 remains available but is no longer actively maintained. Follow progress: read the v3 Announcement · Roadmap.
```
pip install giskard
```

Requires Python 3.12+.
Giskard is an open-source Python library for testing and evaluating agentic systems. The v3 architecture is a modular set of focused packages, each carrying only the dependencies it needs, built from scratch to wrap anything: an LLM, a black-box agent, or a multi-step pipeline.
| Status | Package | Description |
|---|---|---|
| Alpha | `giskard-checks` | Testing & evaluation: scenario API, built-in checks, LLM-as-judge |
| In progress | `giskard-scan` | Agent vulnerability scanner: red teaming, prompt injection, data leakage (successor of v2 Scan) |
| Planned | `giskard-rag` | RAG evaluation & synthetic data generation (successor of v2 RAGET) |
```
pip install giskard-checks
```

Giskard Checks is a lightweight library for creating evaluations (evals) that test LLM-based systems, from simple assertions to LLM-as-judge assessments. Unlike traditional unit tests, evals are designed for non-deterministic outputs where the same input can produce different valid responses.
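To see why exact-match unit tests break down here, a toy sketch in plain Python (`difflib` stands in for a real semantic-similarity check; nothing Giskard-specific is assumed):

```python
from difflib import SequenceMatcher


def roughly_matches(a: str, b: str, threshold: float = 0.6) -> bool:
    # A fuzzy comparison: exact equality would reject reworded answers,
    # while a similarity threshold tolerates them.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# Two differently worded but equally valid answers pass; an unrelated one fails.
print(roughly_matches("The capital of France is Paris.",
                      "Paris is the capital of France."))  # True
print(roughly_matches("The capital of France is Paris.", "Berlin."))  # False
```

Real evals replace the toy similarity ratio with checks that understand meaning, such as embeddings or an LLM judge.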
Use Giskard Checks to:
- Catch regressions: verify your system still behaves correctly after changes
- Validate RAG quality: check if answers are grounded in retrieved context
- Enforce safety rules: ensure outputs conform to your content policies
- Evaluate multi-turn agents: test full conversations, not just single exchanges
Built-in evals include string matching, comparisons, regex, semantic similarity, and LLM-as-judge checks (`Groundedness`, `Conformity`, `LLMJudge`).
```python
from openai import OpenAI

from giskard.checks import Scenario, Groundedness

client = OpenAI()


def get_answer(inputs: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": inputs}],
    )
    return response.choices[0].message.content


scenario = (
    Scenario("test_dynamic_output")
    .interact(
        inputs="What is the capital of France?",
        outputs=get_answer,
    )
    .check(
        Groundedness(
            name="answer is grounded",
            answer_key="trace.last.outputs",
            context="France is a country in Western Europe. Its capital is Paris.",
        )
    )
)

result = await scenario.run()
result.print_report()
```

The `run()` method is async. In a script, wrap it with `asyncio.run()`. See the full docs for `Suite`, `LLMJudge`, multi-turn scenarios, and more.
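The wrapping pattern looks like this; a minimal sketch where a hypothetical stand-in coroutine takes the place of a real `Scenario.run()`:

```python
import asyncio


# Hypothetical stand-in for Scenario.run(); any awaitable works the same way.
async def run_scenario() -> str:
    await asyncio.sleep(0)  # yield control, as a real async run would
    return "all checks passed"


# In a plain script (no event loop already running), asyncio.run()
# creates a loop, drives the coroutine to completion, and returns its result.
result = asyncio.run(run_scenario())
print(result)  # all checks passed
```

Inside an already-async context (a notebook cell with top-level `await`, or another coroutine), `await` the call directly instead of using `asyncio.run()`.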
Giskard v2 included Scan (automatic vulnerability detection) and RAGET (RAG evaluation test set generation) for both ML models and LLM applications. These features are not yet available in v3.
```
pip install "giskard[llm]>2,<3"
```

### Scan: automatically detect performance, bias & security issues
Wrap your model and run the scan:
```python
import giskard
import pandas as pd


# Replace my_llm_chain with your actual LLM chain or model inference logic
def model_predict(df: pd.DataFrame):
    """The function takes a DataFrame and must return a list of outputs (one per row)."""
    return [my_llm_chain.run({"query": question}) for question in df["question"]]


giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="My LLM Application",
    description="A question answering assistant",
    feature_names=["question"],
)

scan_results = giskard.scan(giskard_model)
display(scan_results)
```

### RAGET: generate evaluation datasets for RAG applications
Automatically generate questions, reference answers, and context from your knowledge base:
```python
import pandas as pd

from giskard.rag import generate_testset, KnowledgeBase

# Load your knowledge base documents
df = pd.read_csv("path/to/your/knowledge_base.csv")
knowledge_base = KnowledgeBase.from_pandas(df, columns=["column_1", "column_2"])

testset = generate_testset(
    knowledge_base,
    num_questions=60,
    language='en',
    agent_description="A customer support chatbot for company X",
)
```

We welcome contributions from the AI community! Read this guide to get started, and join our thriving community on Discord.
Follow the progress and share feedback: v3 Announcement · Roadmap
⭐ Leave us a star, it helps the project to get discovered by others and keeps us motivated to build awesome open-source tools! ⭐
❤️ If you find our work useful, please consider sponsoring us on GitHub. With a monthly sponsoring, you can get a sponsor badge, display your company in this readme, and get your bug reports prioritized. We also offer one-time sponsoring if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.
We thank the following companies that sponsor our project with monthly donations:




