
Give Your LLM Vision
A Picture is Worth a Thousand Tokens
Use Vision|Pipe to capture and narrate screenshots.
or install via command line:
Apple Silicon today. Windows on the roadmap.
View on GitHub →
Stop Working Blind
You’re working with an AI and need to show it what’s on your screen. You describe it in words. It misunderstands. You describe again. Repeat.
Or you need to walk your AI through a workflow that spans five different screens. So you take Screenshot 1. Paste it. Describe what it connects to. Take Screenshot 2. Switch back. Paste again. Re-establish the context you just lost. Repeat.
Or you’re not a developer at all. You see exactly what’s wrong — but describing a visual problem in a Jira ticket is hopeless. So you schedule a call. The developer takes notes. Some of it gets lost. They brief Claude Code from memory. The fix misses the point.
The gap between what you see and what your AI understands is costing the whole team — not just the developers.
Show Your LLM What You Mean
Capture a sequence of screens. Narrate as you go — your voice transcribes on-device, anchored to the screenshot you were looking at when you said it. Hit Copy & Send. A structured markdown LLM Spec lands on your clipboard, ready for any LLM.
No uploads. No integrations. No UI sprawl.
Just the Unix philosophy applied to AI vision: do one thing, do it well, compose it with everything else.
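The anchoring described above, where each spoken segment attaches to the screenshot that was on screen when it was said, can be pictured as a timestamp lookup. The Python sketch below is illustrative only: `Segment`, `anchor_segments`, and the use of seconds-since-session-start are assumptions made for this example, not Vision|Pipe's actual implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase (hypothetical shape for this sketch)."""
    start: float  # seconds since the session began
    text: str


def anchor_segments(capture_times: list[float], segments: list[Segment]) -> dict[int, list[str]]:
    """Group transcript segments under the most recent capture.

    capture_times must be sorted ascending. Anything spoken before the
    first capture is attached to capture 0 in this sketch.
    """
    anchored: dict[int, list[str]] = {i: [] for i in range(len(capture_times))}
    if not capture_times:
        return anchored
    for seg in segments:
        # Index of the last capture taken at or before the segment started.
        index = max(bisect_right(capture_times, seg.start) - 1, 0)
        anchored[index].append(seg.text)
    return anchored
```

A binary search keeps the lookup cheap even for long sessions, and the rule "latest capture at or before the words" is one plausible reading of "the screenshot you were looking at when you said it".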
Every other tool
Capture → Upload image → Switch to LLM → Type context → Submit
Vision|Pipe
Capture(s) + Narrate → Copy & Send → Paste → done
Not just screenshots. A full narrated LLM Spec
Every other tool puts an image on your clipboard.
Vision|Pipe produces a structured markdown document — timestamped, narrated, and organized the same way a senior engineer would hand off a bug report. Claude Code reads it with its native Read tool. No extra prompting. No re-explaining the context you already spoke.
# Vision|Pipe Session — 2026-05-06 10:21:00 UTC
Duration: 2m 14s
---
## Screenshot 1 — Chrome · github.com/visionpipe/issues/42

**Caption:** Auth failure on production
**Narration:** "The API is returning 403 on this endpoint
even though the token looks valid in the response headers
here. I've been seeing this since the deploy this morning..."
---
## Screenshot 2 — Visual Studio Code · src/auth.ts (line 84)

**Caption:** Token generation logic
**Narration:** "...and here's where we generate the token.
The expiry is set to 24 hours. The scope parameter is
hardcoded but I wonder if that's the issue — it was
changed in the last PR."
---
## Closing Narration
"Both screenshots show the same flow. I think the scope
is wrong. Can you check the token generation against the
endpoint's expected claims and propose a fix?"
Your LLM, whether Claude Code or OpenAI Codex, gets the full sequence rather than a fragment. Your narration travels with the screenshot it describes. The ask is explicit. The AI can act.
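To make the shape of that document concrete, here is a small Python sketch that renders a session into the same layout as the sample. It is a guess at the structure, not Vision|Pipe's code: the `Capture` and `Session` classes, their field names, and `render_spec` are all hypothetical and only mirror what the sample shows.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

DASH = "\u2014"  # the dash character used in the sample headings


@dataclass
class Capture:
    """One screenshot card; field names are assumptions for this sketch."""
    app: str        # e.g. "Chrome"
    context: str    # e.g. a URL or a file path
    caption: str
    narration: str


@dataclass
class Session:
    started_at: datetime
    duration_seconds: int
    captures: list[Capture] = field(default_factory=list)
    closing_narration: str = ""


def format_duration(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s"


def render_spec(session: Session) -> str:
    """Render a session as markdown shaped like the sample above."""
    stamp = session.started_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    lines = [
        f"# Vision|Pipe Session {DASH} {stamp}",
        f"Duration: {format_duration(session.duration_seconds)}",
        "---",
    ]
    for index, cap in enumerate(session.captures, start=1):
        lines += [
            f"## Screenshot {index} {DASH} {cap.app} · {cap.context}",
            "",
            f"**Caption:** {cap.caption}",
            f'**Narration:** "{cap.narration}"',
            "---",
        ]
    if session.closing_narration:
        lines += ["## Closing Narration", f'"{session.closing_narration}"']
    return "\n".join(lines) + "\n"
```

Putting the closing narration last means the reader, human or LLM, reaches the explicit ask only after seeing every screenshot and its context.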
Turn Screenshots into Markdown Specs Your LLM Wants
Five steps from hotkey to handoff.
Capture any part of your screen
Press the capture hotkey, which you can rebind in Settings.
Give your LLM context on the screenshot
Type or speak instructions to your LLM about the screenshot. Vision|Pipe transcribes on-device in real time, anchoring your words to the screenshot in front of you.
Take more screenshots to build a story for your LLM
Hit your hotkey again without stopping the session. Each new capture becomes its own card with its own segment of narration. Add as many screenshots as you'd like.
Copy and Share with your LLM
A structured markdown bundle is written to disk and copied to your clipboard. Drag it into Claude Code or paste it anywhere. Your AI has everything — screenshots, transcripts, context, and a clear ask.
Your LLM will love you
Your LLM now has vision and can clearly see everything you're trying to communicate.
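The "written to disk and copied to your clipboard" step can be sketched in a few lines of Python. This is an assumption-laden illustration, not the app's code: the sessions root `~/Pictures/VisionPipe/` comes from this page's feature list, but the per-session folder name, the `spec.md` file name, and both function names are hypothetical. The clipboard helper relies on the standard macOS `pbcopy` command and simply reports failure where that command is missing.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

# Root directory as stated on this page; everything beneath it is assumed.
DEFAULT_ROOT = Path.home() / "Pictures" / "VisionPipe"


def write_bundle(markdown: str, started_at: datetime, root: Path = DEFAULT_ROOT) -> Path:
    """Write the spec into a per-session folder and return the file path."""
    session_dir = root / started_at.strftime("session-%Y%m%d-%H%M%S")
    session_dir.mkdir(parents=True, exist_ok=True)
    spec_path = session_dir / "spec.md"
    spec_path.write_text(markdown, encoding="utf-8")
    return spec_path


def copy_to_clipboard(text: str) -> bool:
    """Copy text with macOS pbcopy; return False if pbcopy is unavailable."""
    pbcopy = shutil.which("pbcopy")
    if pbcopy is None:
        return False
    subprocess.run([pbcopy], input=text.encode("utf-8"), check=True)
    return True
```

Writing the file first and copying second means a failed clipboard call never loses the session: the bundle is already on disk.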
Captures What You Mean, Not Just What You See
Every other screenshot tool captures pixels. Vision|Pipe captures intent.
Speak It
Narrate continuously across the entire session. As you capture each screenshot, Vision|Pipe transcribes your voice in real time — anchoring your words to the screenshot you were looking at the moment you said them. No editing. No re-typing. Your intent is preserved, sequenced, and delivered with the images it describes.
Transcription runs on-device via Apple Speech. No audio leaves your machine.
"This dropdown is rendering below the viewport on Safari — why?"
Caption It
Each screenshot gets a short editable caption — a name that anchors it in your bundle and gives your AI an instant index of what each frame contains.
Token generation logic — line 84
Draw It
Circle the problem. Highlight the element. Draw an arrow. A lightweight markup layer is in development — it briefly shipped earlier and is being rebuilt as part of the annotation system.
Give Your Developers Vision
Coming soon
Developers have AI coding agents that can build, fix, and refactor, but only when given precise context. That context usually lives in someone else’s head.
The PM who walked through the broken checkout flow. The designer who spotted the misaligned component. The QA tester who can reproduce the crash every time.
They all see exactly what needs to happen. Until now, there was no good way to get that vision into the hands of the AI that can act on it. Vision|Pipe is closing that gap.
Product Manager
Records a 3-screenshot walkthrough of a broken user flow with voice-narrated context on what needs to change and why.
What arrives in the LLM
A structured markdown document with every screen, every word, and a clear ask — Claude Code reads it like a spec.
Designer
Captures a UI mismatch across two screens and narrates the expected behavior.
What arrives in the LLM
Side-by-side screenshots with timestamped narration — the AI sees the gap immediately.
QA / CS Team
Walks through a reproducible bug step by step with voice commentary on each screen.
What arrives in the LLM
A sequenced, narrated session that gives the developer and the AI the exact reproduction path.
Share It. Ship It.
Coming soon
When the session is ready, one button uploads it to the cloud and generates a private link.
Send it to a developer. Drop it in Slack. Paste it in a ticket. The recipient opens the link, previews the session in their browser — screenshots, transcripts, narration, and metadata — then drags the markdown LLM Spec straight into Claude Code.
No files to manage. No context to reconstruct. The LLM Spec is intact, structured, and ready to act on.
Record your session
screenshots + narration
Save to Cloud
one click, private
Share the link
developer → Claude Code → done
Get notified when Cloud Share ships.
We’ll only email you about the launch. No marketing list.
Your LLM Gets the Full Picture
Vision|Pipe doesn’t just send a screenshot. It also sends the context the image was captured in (display, window, application, browser, and system), automatically appended to every clipboard payload.
Spatial & Display
Window & Application
Browser Context
System
Captured via macOS Accessibility API. Windows UI Automation on the roadmap.
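As a rough picture of how that context might ride along with the payload, the Python sketch below groups metadata under the four categories named above and appends it to a markdown string. Only the category names come from this page. The `CaptureMetadata` class, the free-form key/value pairs inside each category, and both function names are hypothetical, since the page does not list the individual fields.

```python
from dataclasses import dataclass, field

# Category titles as listed on this page, keyed by hypothetical attribute names.
SECTION_TITLES = {
    "spatial_display": "Spatial & Display",
    "window_application": "Window & Application",
    "browser_context": "Browser Context",
    "system": "System",
}


@dataclass
class CaptureMetadata:
    """Metadata grouped by category; the keys inside each dict are assumed."""
    spatial_display: dict[str, str] = field(default_factory=dict)
    window_application: dict[str, str] = field(default_factory=dict)
    browser_context: dict[str, str] = field(default_factory=dict)
    system: dict[str, str] = field(default_factory=dict)


def render_metadata(meta: CaptureMetadata) -> str:
    """Render non-empty categories as bold titles followed by bullet lines."""
    lines: list[str] = []
    for attr, title in SECTION_TITLES.items():
        values = getattr(meta, attr)
        if not values:
            continue  # skip categories with nothing captured
        lines.append(f"**{title}**")
        lines.extend(f"- {key}: {value}" for key, value in values.items())
    return "\n".join(lines)


def append_metadata(payload: str, meta: CaptureMetadata) -> str:
    """Append the metadata block to a markdown payload, if there is any."""
    block = render_metadata(meta)
    if not block:
        return payload
    return f"{payload.rstrip()}\n\n{block}\n"
```

Skipping empty categories keeps the payload short when, for example, the capture did not come from a browser.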
Every Other Tool Was Built for Humans
Vision|Pipe was built for your AI — and the team around it.
| Tool | Built For | LLM-Native | Annotate at Capture | Rich Metadata | Markdown Deliverable | Team Sharing (soon) |
|---|---|---|---|---|---|---|
| Playwright | Programmatic browser automation | Partial | | | | |
| Loom / Zight / CleanShot X | Sharing with humans | | Post-capture only | Partial | | |
| Snagit | Documentation & tutorials | | Post-capture only | | | |
| macOS Screenshot | General capture | | | | | |
| Vision\|Pipe | Piping visual context into LLMs | Yes | Voice + caption | Yes | Yes | soon |
“If Playwright gives your test suite vision, Vision|Pipe gives you vision — and the rest of your team a way to share it.”
Built the Right Way
Tauri
Lightweight and secure — not Electron. Minimal memory footprint.
Rust
Systems-level metadata capture, performance, and reliability.
Apple Speech
On-device transcription via SFSpeechRecognizer. No audio leaves your machine.
Built in the Open
Vision|Pipe is source-available and community-driven. The code is visible, forkable, and we welcome pull requests.
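# Fork the repo, then create a branch for your change
git checkout -b feature/your-feature
git commit -am 'Add your feature'
# Push the branch to your fork
git push origin feature/your-feature
# Open a Pull Request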
Questions? Open an issue or reach out on X @Vision_Pipe.
Stop Pasting Screenshots. Start Delivering LLM Specs.
Free for personal use. Open for contributions. Built for everyone who can see what needs to change.
Everything It Does. Nothing It Doesn’t.
Shipping today
Multi-screenshot sessions
String captures together with continuous narration; one shareable bundle.
Real-time on-device transcription
Talk while you capture; Apple Speech anchors transcripts to each screenshot. No audio leaves your machine.
Per-segment re-record
Fix any single piece of narration without losing the rest.
Two view modes
Interleaved (cards + inline narration) or split (cards left, transcript right).
Markdown LLM Spec output
A timestamped, structured markdown file written to disk and copied to the clipboard, optimized for Claude Code, GPT, Gemini, and other LLMs.
HistoryHub
In-app browser for past sessions; reopen, copy, or show in Finder.
Editable captions
Name each screenshot inline; captions travel into the markdown bundle.
Scrolling capture
Capture content that extends beyond the visible viewport.
Configurable hotkeys
Rebind via Settings.
Rich metadata
Spatial, window, browser, and system context bundled automatically.
Local-first persistence
Sessions live on disk in ~/Pictures/VisionPipe/.
Open source
See exactly what you're running.
LLM-agnostic
Works with any AI that accepts images and markdown.
Lightweight
Tauri + Rust; minimal CPU and memory footprint.
Coming soon
Cloud session sharing
Upload to a private link; recipient previews in browser or downloads.
Drawing & annotation
Markup layer returning as part of the annotation rebuild.
On-device WhisperKit
Opt-in alternative to Apple Speech.
Windows support
Tauri code base supports cross-compilation; Apple Silicon ships today.