🐢 Open-Source Evaluation & Testing library for LLM Agents
Updated Feb 26, 2026 · Python
Agentic testing for agentic codebases
Deliver safe & effective language models
MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
QA Skills Directory: QA Skills is a curated directory of testing-specific skills for AI coding agents (Claude Code, Cursor, Copilot, etc.).
GPT4Go: AI-Powered Test Case Generation for Golang 🧪
A 52-week journey from QA/SDET to GenAI testing: learning in public with weekly mini-projects, code, and honest documentation of struggles and wins.
AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, and Copilot. Test Flutter, React Native, iOS, Android, Web, Electron, Tauri, KMP, and .NET MAUI apps, all from natural language.
A Python library for verifying code properties using natural language assertions.
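👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Skip most of the manual bounding-box drawing and labeling, and quickly run automated, zero-code tests of computer vision (CV) AI image-recognition algorithms: pedestrian detection, animal and plant classification, face recognition, OCR license-plate recognition, rotation correction, dance pose estimation, matting and segmentation, and more. You can also download test reports and export training and test datasets with one click.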
Eval framework. Define what's correct, test against it, get results.
Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the Discord: https://discord.gg/ssd4S37WNW
Automated testing for Model Context Protocol servers. Ship MCP Servers with confidence.
Turn plain English into Robot Framework files with AI. No dependencies, no hassle: just validated, ready-to-run tests.
Statistical evaluation framework for AI agents
A professional collection of AI prompts for QA (Quality Assurance) professionals, designed to help test engineers and QA teams work more efficiently throughout the software testing lifecycle.
Ship evals before you ship features.
Practice exercises for the book "Basiswissen KI-Testen"
4-stage evaluation framework for testing Claude Code plugin component triggering. Validates that skills, agents, and commands activate correctly, using programmatic detection and LLM judgment.
🚀 First multimodal AI-powered visual testing plugin for Claude Code. AI that can SEE your UI! 10x faster frontend development with closed-loop testing, browser automation, and Claude 4.5 Sonnet vision.