agent-tui Pass

Main Agents: Do NOT use this skill directly. If you need to test the TUI, invoke the `tui_tester` subagent. Drive terminal UI (TUI) applications programmatically for testing, automation, and inspection. Use when: automating CLI/TUI interactions, regression testing terminal apps, or verifying interactive behavior. Also use when: user asks "what is agent-tui", "what does agent-tui do", "demo agent-tui", "show me agent-tui", "how does agent-tui work", or wants to see it in action.

104.5k
stars
0
downloads
1
views

// Install Skill

Install Skill

Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.

Install globally (user-level):

npx skillhub install google-gemini/gemini-cli/agent-tui

Install in current project:

npx skillhub install google-gemini/gemini-cli/agent-tui --project

Suggested path: ~/.claude/skills/agent-tui/

SKILL.md Content

---
name: agent-tui
description: >
  Main Agents: Do NOT use this skill directly. If you need to test the TUI, invoke the `tui_tester` subagent.
  Drive terminal UI (TUI) applications programmatically for testing, automation, and inspection.
  Use when: automating CLI/TUI interactions, regression testing terminal apps, or verifying interactive behavior.
  Also use when: user asks "what is agent-tui", "what does agent-tui do", "demo agent-tui", "show me agent-tui", "how does agent-tui work", or wants to see it in action.
---

## 🚨 CRITICAL: macOS Daemon Workaround & Gemini CLI Usage 🚨

When using `agent-tui` in this macOS environment, the default background daemonization process crashes, causing `Connection refused (os error 61)` errors. 

**You MUST start the daemon manually shielded from TTY hangups before running any `agent-tui` commands.** Using `nohup` is insufficient; you must use `tmux` to provide a fully isolated pseudo-terminal. 

To support parallel runs, **only restart the daemon if it is not currently running:**

```bash
# Check if daemon is alive, start it in tmux if it is not
if ! agent-tui sessions >/dev/null 2>&1; then
  tmux kill-session -t agent-tui 2>/dev/null || true
  agent-tui daemon stop 2>/dev/null || true
  rm -f /tmp/agent-tui*
  tmux new-session -d -s agent-tui 'agent-tui daemon start --foreground > /tmp/agent-tui-daemon.log 2>&1'
  sleep 1
fi
```

### Session ID vs PID (Crucial for Reconnection)

When `agent-tui run` returns JSON, it includes both a `session_id` and a `pid`. The `pid` is purely informational (the OS process ID of the child command). You **do not** use the `pid` to reconnect or issue commands. You must always use the `session_id` (e.g., `--session <id>`). 

If the daemon crashes (`os error 61`), the pseudo-terminal is destroyed. Even if the child `pid` survives as an orphan, you cannot reconnect to it. You must restart the daemon using the workaround above and start a completely new session.

### Testing the Gemini CLI

When testing the Gemini CLI with `agent-tui`, there are several strict requirements to ensure deterministic and accurate behavior:

1. **Build Before Running**: `agent-tui` runs the built JS files, not TypeScript. You **MUST** run `npm run build` or `npm run build:all` after making code changes and before launching the CLI with `agent-tui`.
2. **Bypass Trust Modals**: Always pass `GEMINI_CLI_TRUST_WORKSPACE=true` in the environment. If you don't, any new project-level agents or extensions will trigger a full-screen "Acknowledge and Enable" modal. This modal steals focus, swallows automation keystrokes, and causes `agent-tui wait` commands to time out.
3. **Isolated Environments**: If you need to test without real user credentials or existing agents interfering, isolate the global settings using `GEMINI_CLI_HOME=<some-test-dir>`.
4. **Testing State Deltas (e.g., Reloads)**: If you are testing features that report deltas (e.g., `/agents reload` outputting "1 new local subagent"), you **MUST**:
   - Start the CLI *first* so it establishes its baseline registry.
   - Use a separate shell command (outside of `agent-tui`) to write the new agent `.md`/`.toml` file.
   - Use `agent-tui type` and `press` to trigger the `/agents reload` command inside the running session.
   - (If you add the files before starting the CLI, they become part of the baseline and won't trigger the delta logic).

```bash
# Example: Standard isolated run (sandboxed config + bypass trust modals)
env GEMINI_CLI_TRUST_WORKSPACE=true GEMINI_CLI_HOME=test-gemini-home agent-tui run -d "$(pwd)" node packages/cli/dist/index.js
```

# Terminal Automation Mastery

## Prerequisites

- **Supported OS**: macOS or Linux (Windows not supported yet).
- **Verify install**:

```bash
agent-tui --version
```

If not installed, use one of:

```bash
# Recommended: one-line install (macOS/Linux)
curl -fsSL https://raw.githubusercontent.com/pproenca/agent-tui/master/install.sh | sh
```

```bash
# Package manager
npm i -g agent-tui
pnpm add -g agent-tui
bun add -g agent-tui
```

```bash
# Build from source
cargo install --git https://github.com/pproenca/agent-tui.git --path cli/crates/agent-tui
```

If you used the install script, ensure `~/.local/bin` is on your PATH.

## Philosophy: Why Terminal Automation Is Different

Terminal UIs are **stateless from the observer's perspective**. Unlike web browsers with a persistent DOM, terminal automation works with a constantly-refreshed character grid. This fundamental difference shapes everything:

| Web Automation | Terminal Automation |
|----------------|---------------------|
| DOM persists across interactions | Screen buffer is redrawn constantly |
| Selectors are stable | Text positions may shift |
| Query once, act many times | Must re-verify before EVERY action |
| Network events signal completion | Must detect visual stability |

**The Core Insight**: agent-tui gives you vision without memory. Each screenshot is a fresh observation. Previous state means nothing after the UI changes. This isn't a limitation—it's the nature of terminal interaction.

## Mental Model: The Feedback Loop

Think of terminal automation as a **closed-loop control system**:

```
    ┌──────────────────────────────────────────────┐
    │                                              │
    ▼                                              │
OBSERVE ──► DECIDE ──► ACT ──► WAIT ──► VERIFY ───┘
   │                                        │
   │                                        │
   └─────── NEVER skip ◄────────────────────┘
```

**Each phase is mandatory.** Skipping verification is the #1 cause of flaky automation.

### The "Fresh Eyes" Principle

Every time you need to interact with the UI:

1. **Take a fresh screenshot** — your previous one is now stale
2. **Locate your target visually** — text positions may have changed
3. **Verify the state** — the UI may have changed unexpectedly
4. **Act only when stable** — animations and loading states cause failures

This feels slower, but it's the only reliable approach. Optimistic reuse of stale state causes intermittent failures that are painful to debug.

## Critical Rules (Non-Negotiable)

> **RULE 1: Atomic Execution (No Pipelining)**
> You are FORBIDDEN from chaining commands with `&&` (e.g., `type "x" && press Enter && wait`). Modals or UI updates can intercept your keystrokes. You MUST execute one atomic action, wait, screenshot, and verify before taking the next action in a new turn.

> **RULE 2: Re-snapshot after EVERY action**
> The UI state is invalidated by any change. Always take a fresh screenshot before acting again.

> **RULE 3: Never act on unstable UI**
> If the UI is animating, loading, or transitioning, `wait --stable` first. Acting during transitions because race conditions.

> **RULE 4: Verify before claiming success**
> Use `wait "expected text" --assert` to confirm outcomes. Don't assume an action worked—prove it.

> **RULE 5: Error Recovery**
> If a `wait` command times out, DO NOT blindly restart or kill the session. Execute `screenshot` to visually diagnose what unexpected UI element (modal, error dialog, lost focus) intercepted the flow.

> **RULE 6: Clean up sessions**
> Always end with `agent-tui kill`. Orphaned sessions consume resources and can interfere with future runs.

## Decision Framework

### Which Screenshot Mode?

Use `screenshot --format json` when parsing automation output, or plain `screenshot` for human readable text.

### How to Wait?

```
What are you waiting for?
│
├─► Specific text to appear
│   └─► `wait "text" --assert` (fails if not found)
│
├─► Specific text to disappear
│   └─► `wait "text" --gone --assert`
│
├─► UI to stop changing (animations, loading)
│   └─► `wait --stable`
│
└─► Multiple conditions
    └─► Chain waits sequentially
```

### How to Act?

```
What do you need to do?
│
├─► Type text into the terminal
│   └─► `type "text"`
│
├─► Send keyboard shortcuts/navigation
│   └─► `press Ctrl+C` or `press ArrowDown Enter`
```

## Core Workflow

The canonical automation loop:

```bash
# 1. START: Launch the TUI app
agent-tui run <command> [-- args...]

# 2. OBSERVE: Get current UI state
agent-tui screenshot --format json

# 3. DECIDE: Based on text, determine next action
# (This happens in your head/code)

# 4. ACT: Execute the action
agent-tui type "text"
agent-tui press Enter

# 5. WAIT: Synchronize with UI changes
agent-tui wait "Expected" --assert    # or wait --stable

# 6. VERIFY: Confirm the outcome (often combined with step 5)
# If verification fails, handle the error

# 7. REPEAT: Go back to step 2 until done

# 8. CLEANUP: Always clean up
agent-tui kill
```

## Anti-Patterns (What NOT to Do)

### ❌ Acting During Animation/Loading

```bash
# WRONG: Acting immediately on dynamic UI
agent-tui run my-app
agent-tui screenshot --format json    # UI might still be loading!
agent-tui type "value"                # ❌ Might miss the input field

# RIGHT: Wait for stability first
agent-tui run my-app
agent-tui wait --stable               # Let UI settle
agent-tui screenshot --format json    # Now it's reliable
agent-tui type "value"
```

### ❌ Assuming Success Without Verification

```bash
# WRONG: Assuming the type worked
agent-tui type "value"
agent-tui press Enter
# ...proceed as if success...       # ❌ What if it failed silently?

# RIGHT: Verify the outcome
agent-tui type "value"
agent-tui press Enter
agent-tui wait "Success" --assert    # ✓ Proves the action worked
```

### ❌ Skipping Cleanup

```bash
# WRONG: Forgetting to kill the session
agent-tui run my-app
# ...do stuff...
# script ends                        # ❌ Session left running!

# RIGHT: Always clean up
agent-tui run my-app
# ...do stuff...
agent-tui kill                       # ✓ Clean exit
```

## Before You Start: Clarify Requirements

Before automating any TUI, gather this information:

1. **Command**: What exactly to run? (`my-app --flag` or `npm start`?)
2. **Success criteria**: What text/state indicates success?
3. **Input sequence**: What keystrokes/data to enter, in what order?
4. **Safety**: Is it safe to submit forms, delete data, etc.?
5. **Auth**: Does it need login? Test credentials?
6. **Live preview**: Does the user want to watch? (`agent-tui live start --open`)

If any of these are unclear, ask before running.

## Demo Mode: Showing What agent-tui Can Do

When a user asks what agent-tui is, wants a demo, or asks "show me how it works":

1. **Don't explain—demonstrate.** Actions speak louder than words.
2. **Use the live preview** so they can watch in real-time.
3. **Run `top`**—it's universal and shows dynamic real-time updates.

**Quick demo trigger phrases:**
- "What is agent-tui?" / "What does agent-tui do?"
- "Demo agent-tui" / "Show me agent-tui"
- "How does agent-tui work?" / "See it in action"

## Failure Recovery

| Symptom | Diagnosis | Solution |
|---------|-----------|----------|
| "Text not found" | Stale view or text moved | Re-snapshot, locate text again |
| Wait times out | UI didn't reach expected state | Check screenshot, verify expectations |
| "Daemon not running" | Daemon crashed or not started | `agent-tui daemon start` |
| Unexpected layout | Wrong terminal size | `agent-tui resize --cols 120 --rows 40` |
| Session unresponsive | App crashed or hung | `agent-tui kill`, then re-run |
| Repeated failures | Something fundamentally wrong | Stop after 3-5 attempts, ask user |

## Self-Discovery: Use --help

You don't need to memorize every flag. The CLI is self-documenting:

```bash
agent-tui --help                     # List all commands
agent-tui run --help                 # Options for 'run'
agent-tui screenshot --help          # Options for 'screenshot'
agent-tui wait --help                # Options for 'wait'
```

**When in doubt, ask the CLI.** This skill teaches *when* and *why* to use commands. For exact flags and syntax, `--help` is authoritative.

## Quick Reference

```bash
# Start app
agent-tui run <cmd> [-- args]        # Launch TUI under control

# Observe
agent-tui screenshot                  # Plain text view
agent-tui screenshot --format json    # Machine-readable output

# Act
agent-tui press Enter                 # Press key(s)
agent-tui press Ctrl+C                # Keyboard shortcuts
agent-tui type "text"                 # Type text

# Wait/Verify
agent-tui wait "text" --assert        # Wait for text, fail if not found
agent-tui wait "text" --gone --assert # Wait for text to disappear
agent-tui wait --stable               # Wait for UI to stop changing

# Manage
agent-tui sessions                    # List active sessions
agent-tui live start --open           # Start live preview
agent-tui kill                        # End current session
```