TJ-Neary/AI_Eval
A comprehensive LLM evaluation framework that compares local and cloud models with hardware-aware benchmarking. It evaluates models across code generation, document analysis, and structured output using pass@k, LLM-as-Judge, and RAG metrics (see the pass@k sketch below). Supports Ollama, Google Gemini, Anthropic, and OpenAI.
Stars: —
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Mar 06, 2026
Commits (30d): 0
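For context on the metrics listed above: pass@k is usually computed with the unbiased estimator from Chen et al. (2021). A minimal Python sketch of that estimator follows; it illustrates the metric itself and is not taken from this repository.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n generations passes, given that c of them passed."""
    if n - c < k:
        # fewer than k failures exist, so every k-subset contains a pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per task, 3 pass the unit tests
print(pass_at_k(10, 3, 1))   # 0.3
print(pass_at_k(10, 3, 5))   # ~0.9167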
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/TJ-Neary/AI_Eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
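The same endpoint can be called from Python instead of curl. A minimal sketch using requests; the shape of the JSON payload is not documented here, so inspect it before relying on any field names:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/rag/TJ-Neary/AI_Eval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()   # 4xx/5xx responses (e.g. rate limiting) raise here
data = resp.json()        # payload schema is undocumented here; print and inspect it
print(data)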
Higher-rated alternatives
- FastBuilderAI/memory: FastMemory is a topological representation of text data using concepts as the primary input. It...
- syncreus/syncreus-eval: Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent...
- bevinkatti/rag-harness: ⚡ CLI to Evaluate and Compare RAG systems with RAGAS-style scoring
- verifywise-ai/verifywise-eval-action: GitHub Action & Python SDK to evaluate LLMs in CI/CD — gate PRs on correctness, faithfulness,...
- masaakisakamoto/memory-os: Deterministic continuity for AI systems. Detect and repair inconsistencies across sessions — not...