RAG Evaluator

A local-only RAG (Retrieval-Augmented Generation) evaluation system with a web-based UI that leverages DeepEval for evaluation metrics.

Note: This project is being built with Claude Code, Anthropic's AI-powered coding assistant.

Overview

RAG Evaluator provides a comprehensive framework for evaluating RAG systems. It runs entirely on your local machine with SQLite for data persistence, requiring no cloud infrastructure or external services.

Key Features

  • DeepEval Integration: Evaluate RAG responses using industry-standard metrics

    • Answer Relevancy
    • Faithfulness
    • Contextual Relevancy/Precision/Recall
    • Hallucination Detection
    • And more
  • Web-Based Dashboard: React UI for managing evaluations

    • Configure RAG systems to test
    • Create and organize test cases
    • View detailed results and metrics
    • Track trends over time
  • LLM Provider Support: Configurable LLM providers via Web UI

    • OpenAI: GPT-4, GPT-4-turbo, GPT-3.5-turbo
    • Ollama: Llama2, Mistral, CodeLlama, and other local models
    • Integrated via LangChain (langchain-openai, langchain-ollama)
  • MCP Server Support: Extensible integrations via Model Context Protocol

    • RAG System MCP for querying systems under test
    • Vector DB MCP for Chroma database access
    • LLM Provider MCP for OpenAI and Ollama interactions
  • Skills: Specialized workflows for common tasks

    • Report generation
    • Data visualization
    • Comparison analysis
  • Local-First Architecture: All data stays on your machine

    • SQLite database for persistence
    • No external dependencies required
    • Privacy-focused design
    • No authentication required - designed for single-user local development
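
To make the DeepEval metrics above concrete, here is a minimal sketch of how a RAG response is typically scored with DeepEval's test-case and metric API. The question, answer, retrieval context, and threshold are all illustrative placeholders, and by default DeepEval's metrics call an OpenAI model, so OPENAI_API_KEY must be set (or a custom evaluation model configured) before running it.

```python
# Illustrative DeepEval usage; inputs and thresholds are placeholders.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# An LLMTestCase bundles the query, the RAG system's answer, and the
# retrieved context the answer should be grounded in.
test_case = LLMTestCase(
    input="What does the warranty cover?",
    actual_output="The warranty covers manufacturing defects for two years.",
    retrieval_context=[
        "Our warranty covers manufacturing defects for two years from purchase.",
    ],
)

# Each metric scores the test case; a score below the threshold fails it.
metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.7),
]

evaluate(test_cases=[test_case], metrics=metrics)
```

Contextual precision/recall and hallucination detection follow the same pattern with their corresponding metric classes.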

Technology Stack

Layer          Technologies
-------------  ----------------------------------------------------
Frontend       React 18+, TypeScript, Tailwind CSS, Recharts
Backend        FastAPI, SQLAlchemy, Pydantic
Database       SQLite
Vector DB      Chroma (via langchain-chroma)
Evaluation     DeepEval, LangChain
LLM Providers  OpenAI (langchain-openai), Ollama (langchain-ollama)
Integrations   MCP (Model Context Protocol)

Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • uv (Python package manager) - Install from astral.sh/uv

Installation

# Clone the repository
git clone <repository-url>
cd rag-evaluator

# Set up the backend with uv
cd backend
uv sync                    # Creates virtual environment and installs dependencies

# Start the backend
uv run uvicorn app.main:app --reload --port 8000

# In a new terminal, set up the frontend
cd frontend
npm install
npm run dev

Configure LLM Provider

Create a .env file in the backend directory:

# For OpenAI
LLM_PROVIDER=openai
LLM_MODEL=gpt-4
OPENAI_API_KEY=your-openai-api-key

# For Ollama (local LLM)
LLM_PROVIDER=ollama
LLM_MODEL=llama2
OLLAMA_BASE_URL=http://localhost:11434

You can also configure the LLM provider through the Web UI under System Configuration > LLM Provider Settings.
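
The provider selection above can be sketched as a small settings loader. This is a hypothetical stdlib-only illustration of how the backend might map those environment variables to a config object, not the project's actual code (which uses Pydantic):

```python
# Hypothetical sketch of provider selection from the .env variables above.
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class LLMConfig:
    provider: str
    model: str
    api_key: Optional[str] = None   # used by the OpenAI provider
    base_url: Optional[str] = None  # used by the Ollama provider


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    provider = env.get("LLM_PROVIDER", "openai").lower()
    model = env.get("LLM_MODEL", "gpt-4")
    if provider == "openai":
        return LLMConfig(provider, model, api_key=env.get("OPENAI_API_KEY"))
    if provider == "ollama":
        return LLMConfig(
            provider,
            model,
            base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        )
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
```

For example, load_llm_config({"LLM_PROVIDER": "ollama", "LLM_MODEL": "llama2"}) falls back to the default Ollama base URL shown in the .env example.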

Access the Application

Once both servers are running:

  • Backend API: http://localhost:8000 (FastAPI serves interactive API docs at http://localhost:8000/docs by default)
  • Frontend: npm run dev prints the local URL of the dev server in your terminal

Project Structure

rag-evaluator/
├── backend/
│   ├── app/
│   │   ├── api/v1/          # API endpoints
│   │   ├── core/            # Configuration, database
│   │   ├── models/          # SQLAlchemy models
│   │   ├── schemas/         # Pydantic schemas
│   │   └── services/        # Business logic
│   ├── skills/              # Skill definitions
│   └── mcp_servers/         # MCP server implementations
├── frontend/
│   ├── src/
│   │   ├── components/      # React components
│   │   ├── hooks/           # Custom hooks
│   │   ├── services/        # API clients
│   │   └── types/           # TypeScript types
│   └── public/
├── docs/                    # Architecture documentation
└── data/                    # SQLite database (created at runtime)

Documentation

Architecture documentation is available in the docs/ directory.

Security Note

This application is designed for local development only. It does not include authentication or authorization features. The application:

  • Binds to localhost (127.0.0.1) only
  • Stores API keys in plain text in .env files
  • Trusts the local user completely
  • Relies on OS-level file permissions for security

Do not expose this application to a network or deploy it to a shared environment.

License

MIT
