libeval

libeval - RAG evaluation system. Evaluator orchestrates quality assessment using LLM-as-judge patterns. CriteriaEvaluator scores responses against rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator analyzes execution traces. EvalStore persists results. Use for automated quality testing, RAG pipeline evaluation, and agent performance testing.

@majiayu000
MIT · 2/22/2026

Install Skill

Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.

Install globally (user-level):

npx skillhub install majiayu000/claude-skill-registry/libeval

Install in current project:

npx skillhub install majiayu000/claude-skill-registry/libeval --project

Suggested path: ~/.claude/skills/libeval/

SKILL.md Content

---
name: libeval
description: >
  libeval - RAG evaluation system. Evaluator orchestrates quality assessment
  using LLM-as-judge patterns. CriteriaEvaluator scores responses against
  rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator
  analyzes execution traces. EvalStore persists results. Use for automated
  quality testing, RAG pipeline evaluation, and agent performance testing
---

# libeval Skill

## When to Use

- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time

## Key Concepts

**Evaluator**: Main orchestrator that runs test cases through the agent and
collects metrics.

**CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined
criteria and rubrics.

**RecallEvaluator**: Measures how well the retrieval system returns relevant
documents.
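
The metric itself is simple to state. Below is a minimal sketch of recall@k in plain JavaScript — it illustrates what a recall evaluator measures, and is not the `@copilot-ld/libeval` API; `recallAtK`, `retrieved`, and `relevant` are hypothetical names:

```javascript
// Recall@k: the fraction of relevant documents that appear in the
// top-k retrieved results. Illustrative only — not the libeval API.
function recallAtK(retrieved, relevant, k) {
  const topK = new Set(retrieved.slice(0, k));
  const hits = relevant.filter((id) => topK.has(id)).length;
  return relevant.length === 0 ? 0 : hits / relevant.length;
}

// Example: 2 of the 3 relevant docs appear in the top-4 retrieved.
console.log(recallAtK(["d1", "d7", "d3", "d9"], ["d1", "d3", "d5"], 4));
// → 0.6666666666666666
```

Precision@k is the mirror image (hits divided by `k`); reporting both shows whether the retriever is missing documents or returning noise.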

**TraceEvaluator**: Analyzes execution traces for performance and correctness.

## Usage Patterns

### Pattern 1: Run evaluation suite

```javascript
import { Evaluator } from "@copilot-ld/libeval";

const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```

### Pattern 2: Criteria-based evaluation

```javascript
import { CriteriaEvaluator } from "@copilot-ld/libeval";

const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```
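
The shape of the `rubric` argument is not documented in this listing. One plausible shape, and a naive way to combine per-criterion judge scores into a single value, is sketched below — the criterion names, weights, and `aggregate` helper are all hypothetical, not the libeval API:

```javascript
// Hypothetical rubric: named criteria with weights; an LLM judge would
// score each criterion in [0, 1]. Illustrative only — not the libeval API.
const rubric = {
  criteria: [
    { name: "groundedness", weight: 0.5 },
    { name: "completeness", weight: 0.3 },
    { name: "style", weight: 0.2 },
  ],
};

// Weighted average of per-criterion scores; missing scores count as 0.
function aggregate(rubric, scores) {
  let total = 0;
  let weightSum = 0;
  for (const { name, weight } of rubric.criteria) {
    total += weight * (scores[name] ?? 0);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

console.log(aggregate(rubric, { groundedness: 1, completeness: 0.5, style: 0 }));
```

Whatever the real rubric shape is, keeping criteria explicit and weighted like this makes judge scores comparable across runs.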

## Integration

Configured via `config/eval.yml`. Run via `make eval`. Uses libllm for
LLM-as-judge.
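
The contents of `config/eval.yml` are not shown in this listing. A purely hypothetical sketch of what such a config might contain, for orientation only (the real schema is defined by libeval):

```yaml
# Hypothetical eval.yml — illustrative shape only, not the actual schema.
judge:
  model: gpt-4o        # model used by libllm for LLM-as-judge scoring
criteria:
  - name: groundedness
    weight: 0.5
  - name: completeness
    weight: 0.3
cases: test/cases/    # directory of test cases fed to the Evaluator
```

Check the repository's actual `config/eval.yml` for the supported keys before relying on any of these names.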