libeval
libeval - RAG evaluation system. Evaluator orchestrates quality assessment using LLM-as-judge patterns. CriteriaEvaluator scores responses against rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator analyzes execution traces. EvalStore persists results. Use for automated quality testing, RAG pipeline evaluation, and agent performance testing.
Install Skill
Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.
Install globally (user-level):

```
npx skillhub install majiayu000/claude-skill-registry/libeval
```

Install in current project:

```
npx skillhub install majiayu000/claude-skill-registry/libeval --project
```

Suggested path: `~/.claude/skills/libeval/`
SKILL.md Content
---
name: libeval
description: >
  libeval - RAG evaluation system. Evaluator orchestrates quality assessment
  using LLM-as-judge patterns. CriteriaEvaluator scores responses against
  rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator
  analyzes execution traces. EvalStore persists results. Use for automated
  quality testing, RAG pipeline evaluation, and agent performance testing
---
# libeval Skill
## When to Use
- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time
## Key Concepts
**Evaluator**: Main orchestrator that runs test cases through the agent and
collects metrics.
**CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined
criteria and rubrics.
**RecallEvaluator**: Measures how well the retrieval system returns relevant
documents; the sketch after this list shows the underlying recall metric.
**TraceEvaluator**: Analyzes execution traces for performance and correctness.
**EvalStore**: Persists evaluation results.
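The metric behind RecallEvaluator is standard: recall is the fraction of
relevant documents that actually appear in the retrieved set. A minimal,
self-contained sketch of that computation (the `recallAtK` helper is
illustrative, not part of the libeval API):

```javascript
// Recall = |retrieved ∩ relevant| / |relevant|
// Illustrative helper only; not part of the libeval API.
function recallAtK(retrievedIds, relevantIds, k) {
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = [...relevantIds].filter((id) => topK.has(id)).length;
  return relevantIds.size === 0 ? 0 : hits / relevantIds.size;
}

const recall = recallAtK(["d1", "d3", "d7"], new Set(["d1", "d2"]), 3);
console.log(recall); // 0.5 — one of two relevant documents was retrieved
```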
## Usage Patterns
### Pattern 1: Run evaluation suite
```javascript
import { Evaluator } from "@copilot-ld/libeval";

// Orchestrates the full suite: runs each test case through the
// agent and collects the resulting metrics.
const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```
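Pattern 1 assumes a `testCases` collection, but its exact schema is not shown
in this skill. A hypothetical shape for illustration only (`input` and
`expected` are assumed field names, not libeval's documented schema):

```javascript
// Hypothetical test-case shape — field names are assumptions.
const testCases = [
  {
    input: "How do I rotate my API key?",
    expected: "Describes the key-rotation steps on the settings page",
  },
];
```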
### Pattern 2: Criteria-based evaluation
```javascript
import { CriteriaEvaluator } from "@copilot-ld/libeval";

// LLM-as-judge: the supplied LLM client scores the response
// against the rubric's criteria.
const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```
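The rubric passed to `evaluate` is likewise not specified here; a plausible
sketch, where the criteria names and weights are pure assumptions:

```javascript
// Hypothetical rubric — structure and field names are assumptions.
const rubric = {
  criteria: [
    { name: "groundedness", description: "Claims are supported by retrieved context", weight: 0.5 },
    { name: "completeness", description: "Answers every part of the question", weight: 0.5 },
  ],
};
```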
## Integration
Configured via `config/eval.yml`. Run via `make eval`. Uses libllm for
LLM-as-judge.
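The contents of `config/eval.yml` are not shown in this skill; a hypothetical
sketch of what such a file could look like (every key below is an assumption,
not the real schema):

```yaml
# Hypothetical eval config — all keys are illustrative.
evaluators:
  - criteria
  - recall
test_cases: eval/cases.yml
judge_model: gpt-4o
```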