audio-transcriber
Transcribe audio files using Groq's Whisper API (fast, cloud-based). Use when the user sends voice messages, audio files (ogg, mp3, wav, m4a, etc.), or asks for speech-to-text transcription. Requires GROQ_API_KEY environment variable.
Install Skill
Skills are third-party code from public GitHub repositories. SkillHub scans for known malicious patterns but cannot guarantee safety. Review the source code before installing.
Install globally (user-level):
npx skillhub install openclaw/skills/audio-transcriber
Install in current project:
npx skillhub install openclaw/skills/audio-transcriber --project
Suggested path: ~/.claude/skills/audio-transcriber/
AI Review
Instruction Quality: 55
Description Precision: 65
Usefulness: 65
Technical Soundness: 65
Scored 62: practical transcription skill with a real script. Good description with clear triggers and a dependency note. Straightforward workflow that saves setup time. Could be improved with error-handling documentation.
SKILL.md Content
---
name: audio-transcriber
description: Transcribe audio files using Groq's Whisper API (fast, cloud-based). Use when the user sends voice messages, audio files (ogg, mp3, wav, m4a, etc.), or asks for speech-to-text transcription. Requires GROQ_API_KEY environment variable.
---
# Audio Transcriber
## Overview
This skill enables fast audio transcription using Groq's Whisper API. Transcription happens in the cloud via Groq's infrastructure, providing significantly faster results than local Whisper models.
## Quick Start
When a user sends an audio file or voice message:
1. Ensure GROQ_API_KEY is set in the environment
2. Run the transcription script: `python3 scripts/transcribe.py /path/to/audio.ogg`
3. Return the transcribed text to the user
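If the agent drives the skill from Python instead of a shell, the three steps above can be wrapped in a small helper. This is a minimal sketch, not part of the skill: `SCRIPT` assumes the suggested install path plus `scripts/transcribe.py`, the helper names `build_command` and `transcribe_file` are hypothetical, and it assumes `transcribe.py` prints the transcript to stdout and exits with a non-zero status on failure.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumed location: the suggested install path; adjust if the skill lives elsewhere.
SCRIPT = Path.home() / ".claude" / "skills" / "audio-transcriber" / "scripts" / "transcribe.py"


def build_command(audio_path: str, script: Path = SCRIPT) -> List[str]:
    """Return the argv list that runs transcribe.py on one audio file."""
    return [sys.executable, str(script), audio_path]


def transcribe_file(audio_path: str, script: Path = SCRIPT) -> str:
    """Step 1: check the key. Step 2: run the script. Step 3: return its text."""
    if not os.environ.get("GROQ_API_KEY"):
        raise RuntimeError("GROQ_API_KEY environment variable not set")
    result = subprocess.run(
        build_command(audio_path, script),
        capture_output=True,  # collect stdout/stderr instead of echoing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit (assumed failure signal)
    )
    return result.stdout.strip()
```

Using `sys.executable` runs the script with the same interpreter as the caller, which avoids depending on a `python3` entry in `PATH`.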
## Usage
### Basic Transcription
```bash
export GROQ_API_KEY="your-key-here"
python3 /path/to/audio-transcriber/scripts/transcribe.py /path/to/audio.ogg
```
The script:
- Accepts common audio formats (ogg, mp3, wav, m4a, etc.); anything other than WAV needs ffmpeg
- Automatically converts to WAV (16kHz, mono) using ffmpeg (if available)
- Sends to Groq's Whisper API for transcription
- Outputs plain text to stdout
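The "send to Groq" and "output plain text" steps can be sketched with the Python standard library alone. This is an illustration, not the contents of `transcribe.py`: it assumes Groq's OpenAI-compatible endpoint `https://api.groq.com/openai/v1/audio/transcriptions`, a multipart form with `model` and `file` fields, and a JSON response with the transcript under `"text"`. Confirm these against Groq's API reference; `build_multipart`, `transcribe_wav` and the User-Agent string are hypothetical.

```python
import json
import os
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple

# Assumed endpoint: Groq's OpenAI-compatible transcription route.
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_multipart(fields: Dict[str, str], file_name: str, file_bytes: bytes) -> Tuple[bytes, str]:
    """Encode plain form fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex  # random, so it practically never appears in the payload
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    chunks.append(file_header.encode() + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe_wav(path: str, model: str = "whisper-large-v3") -> str:
    """POST one audio file to Groq and return the transcript as plain text."""
    audio = Path(path)
    body, content_type = build_multipart({"model": model}, audio.name, audio.read_bytes())
    request = urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": content_type,
            # Explicit User-Agent: some API front-ends reject urllib's default one.
            "User-Agent": "audio-transcriber-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text"].strip()
```

Calling `print(transcribe_wav("clip.wav"))` would then print the transcript to stdout, as the script does.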
### Supported Audio Formats
- **Voice messages**: OGG (Telegram, Signal, etc.)
- **Common formats**: MP3, WAV, M4A, FLAC, AAC
- **Container formats**: The script handles conversion automatically if ffmpeg is installed
- **Without ffmpeg**: Only WAV files are supported
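The rule in this list (WAV passes straight through, everything else needs ffmpeg) can be expressed as a small check. This sketch assumes the decision is made by file extension; the real script may instead inspect the file, and `needs_conversion` and `ensure_supported` are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Optional


def needs_conversion(path: str) -> bool:
    """WAV goes straight to the API; any other extension needs ffmpeg first."""
    return Path(path).suffix.lower() != ".wav"


def ensure_supported(path: str, ffmpeg_available: Optional[bool] = None) -> bool:
    """Return True if `path` must be converted; raise if it cannot be.

    When `ffmpeg_available` is None, look for an ffmpeg binary on PATH.
    """
    if ffmpeg_available is None:
        ffmpeg_available = shutil.which("ffmpeg") is not None
    convert = needs_conversion(path)
    if convert and not ffmpeg_available:
        raise RuntimeError(f"ffmpeg not found: cannot convert {Path(path).name}; only WAV works without it")
    return convert
```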
## Setup Requirements
Configure the following before using the skill:
### 1. Groq API Key
Get an API key from https://console.groq.com/
Set as environment variable:
```bash
export GROQ_API_KEY="your-key-here"
```
To persist the setting, add it to your shell profile (~/.zshrc for zsh, ~/.bashrc for bash). The example below targets zsh:
```bash
echo 'export GROQ_API_KEY="your-key-here"' >> ~/.zshrc
```
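A script can validate the key before making any request. In this sketch the hypothetical helper `get_api_key` raises the same message listed under Troubleshooting and returns the key without logging it; stripping surrounding whitespace is an added assumption, meant to tolerate a stray space or newline pasted in with the key.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GROQ_API_KEY from `env` (default: the process environment)."""
    source = os.environ if env is None else env
    key = source.get("GROQ_API_KEY", "").strip()  # tolerate a stray space or newline
    if not key:
        # Same wording as the Troubleshooting entry for a missing key.
        raise RuntimeError("GROQ_API_KEY environment variable not set")
    return key
```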
### 2. ffmpeg (recommended)
```bash
brew install ffmpeg
```
The command above uses Homebrew on macOS; on Debian or Ubuntu, use `sudo apt install ffmpeg`. Without ffmpeg, only WAV files work: ffmpeg converts other formats to WAV before they are sent to Groq.
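The conversion step amounts to a single ffmpeg call that resamples to 16 kHz and downmixes to mono, the target format named under Usage. The exact flags `transcribe.py` uses are not shown in this document, so the ones below are an assumption, and `ffmpeg_wav_command` and `convert_to_wav` are hypothetical helpers.

```python
import os
import shutil
import subprocess
import tempfile
from typing import List


def ffmpeg_wav_command(src: str, dst: str) -> List[str]:
    """Build the ffmpeg argv that writes `src` as 16 kHz mono WAV to `dst`."""
    return [
        "ffmpeg",
        "-y",                  # overwrite dst (the temp file already exists)
        "-loglevel", "error",  # keep ffmpeg quiet unless something fails
        "-i", src,             # input: any format ffmpeg can decode
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to one channel
        dst,                   # .wav extension selects the WAV container
    ]


def convert_to_wav(src: str) -> str:
    """Convert `src` to a temporary WAV file and return its path.

    The caller should delete the returned file after uploading it.
    """
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found")
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # ffmpeg reopens the path itself
    try:
        subprocess.run(ffmpeg_wav_command(src, dst), check=True)
    except subprocess.CalledProcessError:
        os.remove(dst)  # do not leave an empty temp file behind on failure
        raise
    return dst
```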
## Resources
### scripts/transcribe.py
Main transcription script that:
- Validates GROQ_API_KEY environment variable
- Checks for ffmpeg (optional but recommended)
- Converts audio to WAV format if needed
- Sends to Groq's Whisper API (whisper-large-v3 model)
- Extracts and outputs plain text
Run it directly from the command line or through the agent's exec tool.
## Performance Notes
- **Speed**: Much faster than local Whisper (typically <1 second for short messages)
- **Model**: Uses whisper-large-v3 via Groq API (high accuracy)
- **Latency**: Cloud-based, depends on internet connection
- **Cost**: Groq offers free tier; check current pricing for usage limits
- **Accuracy**: Excellent for general speech; handles:
  - Multiple accents and dialects
  - Multiple speakers (moderately well)
  - Noisy environments
  - Technical jargon
## Troubleshooting
### "GROQ_API_KEY environment variable not set"
```bash
export GROQ_API_KEY="your-key-here"
```
### "ffmpeg not found"
```bash
brew install ffmpeg
```
### API errors
- Check that your Groq API key is valid
- Verify that you have remaining quota on your Groq account
- Check internet connectivity
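To turn a failed request into one of these hints, a caller can branch on the HTTP status. This sketch assumes Groq signals an invalid key with HTTP 401 and rate or quota limits with HTTP 429 (the standard meanings of those codes); check Groq's error documentation for the full list. `explain_api_error` is a hypothetical helper built on `urllib.error`.

```python
import urllib.error


def explain_api_error(err: Exception) -> str:
    """Map a failed Groq request to the matching troubleshooting hint."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(err, urllib.error.HTTPError):
        if err.code == 401:
            return "Invalid API key: check that GROQ_API_KEY is correct"
        if err.code == 429:
            return "Rate limit or quota reached: check your Groq account limits"
        return f"Groq API returned HTTP {err.code}"
    if isinstance(err, urllib.error.URLError):
        return f"Network problem: check internet connectivity ({err.reason})"
    return f"Unexpected error: {err}"
```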
## Security Note
Never commit the GROQ_API_KEY to version control. Use environment variables or a secure secrets manager.