libeval

Verified

libeval - RAG evaluation system. Evaluator orchestrates quality assessment using LLM-as-judge patterns. CriteriaEvaluator scores responses against rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator analyzes execution traces. EvalStore persists results. Use for automated quality testing, RAG pipeline evaluation, and agent performance testing

@majiayu000
MIT · 1404/12/3

Install Skill

Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee security. Review the source code before installing.

Global install (user-level):

npx skillhub install majiayu000/claude-skill-registry/libeval

Install in current project:

npx skillhub install majiayu000/claude-skill-registry/libeval --project

Suggested path: ~/.claude/skills/libeval/

SKILL.md contents

---
name: libeval
description: >
  libeval - RAG evaluation system. Evaluator orchestrates quality assessment
  using LLM-as-judge patterns. CriteriaEvaluator scores responses against
  rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator
  analyzes execution traces. EvalStore persists results. Use for automated
  quality testing, RAG pipeline evaluation, and agent performance testing
---

# libeval Skill

## When to Use

- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time

## Key Concepts

**Evaluator**: Main orchestrator that runs test cases through the agent and
collects metrics.

**CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined
criteria and rubrics.

**RecallEvaluator**: Measures how well the retrieval system returns relevant
documents.

**TraceEvaluator**: Analyzes execution traces for performance and correctness.

## Usage Patterns

### Pattern 1: Run evaluation suite

```javascript
import { Evaluator } from "@copilot-ld/libeval";

const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```

### Pattern 2: Criteria-based evaluation

```javascript
import { CriteriaEvaluator } from "@copilot-ld/libeval";

const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```
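The shape of `rubric` is not shown in this skill. One plausible shape, as a plain object of weighted criteria, is sketched below; the field names here are illustrative assumptions, not libeval's documented schema:

```javascript
// Hypothetical rubric shape -- check libeval's own docs for the real schema.
const rubric = {
  criteria: [
    {
      name: "groundedness",
      description: "Claims are supported by the retrieved context",
      weight: 2,
    },
    {
      name: "completeness",
      description: "Answers every part of the question",
      weight: 1,
    },
  ],
};
```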

## Integration

Configured via `config/eval.yml`. Run with `make eval`. Uses libllm for
LLM-as-judge scoring.
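The keys of `config/eval.yml` are not documented in this skill; a hypothetical sketch of what such a file might contain (all keys are assumptions, not libeval's actual schema):

```yaml
# config/eval.yml -- illustrative only; consult libeval's docs for real keys.
model: gpt-4o          # judge model used by CriteriaEvaluator
test_cases: eval/cases # directory of test case definitions
store: eval/results    # where EvalStore persists run output
```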