Give Your LLM Vision

A Picture is Worth a Thousand Tokens

Use Vision|Pipe to capture and narrate screenshots.

or install via command line:

Apple Silicon today. Windows on the roadmap.

Stop Working Blind

You’re working with an AI and need to show it what’s on your screen. You describe it in words. It misunderstands. You describe again. Repeat.

Or you need to walk your AI through a workflow that spans five different screens. So you take Screenshot 1. Paste it. Describe what it connects to. Take Screenshot 2. Switch back. Paste again. Re-establish the context you just lost. Repeat.

Or you’re not a developer at all. You see exactly what’s wrong — but describing a visual problem in a Jira ticket is hopeless. So you schedule a call. The developer takes notes. Some of it gets lost. They brief Claude Code from memory. The fix misses the point.

The gap between what you see and what your AI understands is costing the whole team — not just the developers.

Show Your LLM What You Mean

Capture a sequence of screens. Narrate as you go — your voice transcribes on-device, anchored to the screenshot you were looking at when you said it. Hit Copy & Send. A structured markdown LLM Spec lands on your clipboard, ready for any LLM.

No uploads. No integrations. No UI sprawl.

Just the Unix philosophy applied to AI vision: do one thing, do it well, compose it with everything else.

Every other tool

Capture

↓

Upload image

↓

Switch to LLM

↓

Type context

↓

Submit

Vision|Pipe

Capture(s) + Narrate

↓

Copy & Send

↓

Paste → done

Not just screenshots. A full narrated LLM Spec

Every other tool puts an image on your clipboard.

Vision|Pipe produces a structured markdown document: timestamped, narrated, and organized the same way a senior engineer would hand off a bug report. Claude Code reads it with its native Read tool. No extra prompting. No re-explaining context you have already narrated.

VisionPipe-Spec-for-My-App-May-6-2026-10-21-AM-UTC.md

# Vision|Pipe Session — 2026-05-06 10:21:00 UTC
Duration: 2m 14s

---

## Screenshot 1 — Chrome · github.com/visionpipe/issues/42
![screenshot](./screenshot-1.png)

**Caption:** Auth failure on production

**Narration:** "The API is returning 403 on this endpoint
even though the token looks valid in the response headers
here. I've been seeing this since the deploy this morning..."

---

## Screenshot 2 — Visual Studio Code · src/auth.ts (line 84)
![screenshot](./screenshot-2.png)

**Caption:** Token generation logic

**Narration:** "...and here's where we generate the token.
The expiry is set to 24 hours. The scope parameter is
hardcoded but I wonder if that's the issue — it was
changed in the last PR."

---

## Closing Narration
"Both screenshots show the same flow. I think the scope
is wrong. Can you check the token generation against the
endpoint's expected claims and propose a fix?"

Your LLM, whether Claude Code or OpenAI Codex, gets the full sequence, not a fragment. Your narration travels with the screenshot it describes. The ask is explicit. The AI can act.
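
For readers who want to process a spec in their own scripts, here is a minimal Python sketch that splits a document like the sample above into one record per screenshot. It assumes the layout shown in the sample (sections separated by `---`, a `## Screenshot N` heading, an image link, and `**Caption:**` / `**Narration:**` fields). The `Shot` class and `parse_spec` function are hypothetical names for illustration, not part of Vision|Pipe.

```python
import re
from dataclasses import dataclass


@dataclass
class Shot:
    number: int
    source: str      # text after the dash in the heading (app and location)
    image: str       # relative path taken from the image link
    caption: str
    narration: str


def parse_spec(markdown: str) -> list[Shot]:
    """Split a spec into per-screenshot records (layout assumed from the sample)."""
    shots = []
    # Sections are separated by a horizontal rule on its own line.
    for section in re.split(r"^---[ \t]*$", markdown, flags=re.MULTILINE):
        # \u2014 is the long dash used in the sample's headings.
        heading = re.search(r"^## Screenshot (\d+) \u2014 (.+)$", section, re.MULTILINE)
        if not heading:
            continue  # session header or closing narration, not a screenshot
        image = re.search(r"!\[[^\]]*\]\(([^)]+)\)", section)
        caption = re.search(r"\*\*Caption:\*\*\s*(.+)", section)
        # Assumes the narration itself contains no double quotes.
        narration = re.search(r'\*\*Narration:\*\*\s*"(.*?)"', section, re.DOTALL)
        shots.append(Shot(
            number=int(heading.group(1)),
            source=heading.group(2).strip(),
            image=image.group(1) if image else "",
            caption=caption.group(1).strip() if caption else "",
            # Narration wraps across lines in the sample; rejoin it.
            narration=" ".join(narration.group(1).split()) if narration else "",
        ))
    return shots
```

Calling `parse_spec` on the text of a saved spec returns the screenshots in document order, which is enough to build an index of captions or to check that every referenced image exists on disk.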

Turn Screenshots into Markdown Specs Your LLM Wants

Five steps from hotkey to handoff.

Capture any part of your screen

Optionally, trigger the capture with a configurable hotkey.

Give your LLM context on the screenshot

Type or speak instructions to your LLM about the screenshot. Vision|Pipe transcribes on-device in real time, anchoring your words to the screenshot in front of you.

Take more screenshots to build a story for your LLM

Hit your hotkey again without stopping the session. Each new capture becomes its own card with its own segment of narration. Add as many screenshots as you'd like.

Copy and Share with your LLM

A structured markdown bundle is written to disk and copied to your clipboard. Drag it into Claude Code or paste it anywhere. Your AI has everything — screenshots, transcripts, context, and a clear ask.

Your LLM will love you

Your LLM now has vision and can clearly see everything you're trying to communicate.
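
To make the shape of that bundle concrete, the following Python sketch assembles a markdown document from a list of cards, mirroring the layout of the sample spec shown earlier. It is an illustration under stated assumptions: `Card` and `render_spec` are hypothetical names, and the real output may contain fields that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Card:
    source: str      # e.g. the app and location shown in the heading
    image: str       # relative path to the saved screenshot
    caption: str
    narration: str


def render_spec(title: str, cards: list[Card], closing: str = "") -> str:
    """Render cards as markdown sections separated by horizontal rules."""
    parts = [f"# {title}"]
    for i, card in enumerate(cards, start=1):
        parts.append("\n".join([
            # \u2014 is the long dash used in the sample's headings.
            f"## Screenshot {i} \u2014 {card.source}",
            f"![screenshot]({card.image})",
            "",
            f"**Caption:** {card.caption}",
            "",
            f'**Narration:** "{card.narration}"',
        ]))
    if closing:
        parts.append(f'## Closing Narration\n"{closing}"')
    return "\n\n---\n\n".join(parts) + "\n"
```

Because each card becomes its own section, adding another capture only appends another section. The narration never has to be re-threaded by hand.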

Captures What You Mean, Not Just What You See

Every other screenshot tool captures pixels. Vision|Pipe captures intent.

Speak It

Narrate continuously across the entire session. As you capture each screenshot, Vision|Pipe transcribes your voice in real time — anchoring your words to the screenshot you were looking at the moment you said them. No editing. No re-typing. Your intent is preserved, sequenced, and delivered with the images it describes.

Transcription runs on-device via Apple Speech. No audio leaves your machine.
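
How Vision|Pipe anchors narration internally is not described here. One plausible approach is to attach each transcript segment to the most recent capture by timestamp. The Python sketch below shows that idea; `anchor_narration` and its inputs (capture times and segment start times, in seconds) are assumptions made for illustration.

```python
from bisect import bisect_right


def anchor_narration(capture_times, segments):
    """Group transcript segments under the most recent capture.

    capture_times: sorted list of capture timestamps, in seconds.
    segments: list of (start_time_seconds, text) pairs from a transcriber.
    Returns one narration string per capture, in capture order.
    """
    if not capture_times:
        return []
    buckets = [[] for _ in capture_times]
    for start, text in segments:
        # Index of the last capture taken at or before this segment started.
        idx = bisect_right(capture_times, start) - 1
        if idx < 0:
            idx = 0  # speech before the first capture goes to the first card
        buckets[idx].append(text)
    return [" ".join(texts) for texts in buckets]
```

With captures at 0 s and 10 s, a segment starting at 1 s lands on the first card and segments starting at 11.5 s and 12 s land on the second.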

"This dropdown is rendering below the viewport on Safari — why?"

Caption It

Each screenshot gets a short editable caption — a name that anchors it in your bundle and gives your AI an instant index of what each frame contains.

Token generation logic — line 84

Coming soon

Draw It

Circle the problem. Highlight the element. Draw an arrow. A lightweight markup layer is in development — it briefly shipped earlier and is being rebuilt as part of the annotation system.

Give Your Developers Vision

Coming soon

Developers have AI coding agents that can build, fix, and refactor — but only when given precise context. That context usually lives in someone else’s head.

The PM who walked through the broken checkout flow. The designer who spotted the misaligned component. The QA tester who can reproduce the crash every time.

They all see exactly what needs to happen. Until now, there was no good way to get that vision into the hands of the AI that can act on it. Vision|Pipe is closing that gap.

Product Manager

Records a 3-screenshot walkthrough of a broken user flow with voice-narrated context on what needs to change and why.

What arrives in the LLM

A structured markdown document with every screen, every word, and a clear ask — Claude Code reads it like a spec.

Designer

Captures a UI mismatch across two screens and narrates the expected behavior.

What arrives in the LLM

Side-by-side screenshots with timestamped narration — the AI sees the gap immediately.

QA / CS Team

Walks through a reproducible bug step by step with voice commentary on each screen.

What arrives in the LLM

A sequenced, narrated session that gives the developer and the AI the exact reproduction path.

Share It. Ship It.

Coming soon

When the session is ready, one button uploads it to the cloud and generates a private link.

Send it to a developer. Drop it in Slack. Paste it in a ticket. The recipient opens the link, previews the session in their browser — screenshots, transcripts, narration, and metadata — then drags the markdown LLM Spec straight into Claude Code.

No files to manage. No context to reconstruct. The LLM Spec is intact, structured, and ready to act on.

Step 1

Record your session

screenshots + narration

Step 2

Save to Cloud

one click, private

Step 3

Share the link

developer → Claude Code → done

Get notified when Cloud Share ships.

We’ll only email you about the launch. No marketing list.

Your LLM Gets the Full Picture

Vision|Pipe doesn’t just send a screenshot. It sends the complete context of where the image was captured and what it was captured from, automatically appended to every clipboard payload.

Spatial & Display

Capture region: x: 240, y: 180

Capture dimensions: 1200 × 800 px

Screen resolution: 2560 × 1600

DPI / scale factor: 2x (Retina)

Monitor: LG UltraFine 5K

Color profile: Display P3

Window & Application

Active application: Visual Studio Code

Window title: visionpipe — README.md

Window size: 1440 × 900

Window state: Windowed

Process ID: PID 4821

Browser Context

Browser: Chrome 124.0

Active tab URL: github.com/visionpipe

Page title: Vision|Pipe — GitHub

Viewport: 1440 × 789

System

Timestamp: 2026-04-11T14:32:01Z

Operating system: macOS 15.3.2

Hostname: colins-macbook-pro

Screen count: 2

Captured via macOS Accessibility API. Windows UI Automation on the roadmap.
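
As a rough illustration of how a consumer might model a subset of these fields, the Python sketch below defines a small record and renders it as a markdown list. The field names, the `CaptureMetadata` class, and the `to_markdown` helper are hypothetical. The point conversion assumes the capture dimensions are physical pixels on a scaled display, which the page does not state.

```python
from dataclasses import dataclass, asdict


@dataclass
class CaptureMetadata:
    app: str
    window_title: str
    region_x: int
    region_y: int
    width_px: int
    height_px: int
    scale_factor: float
    timestamp: str  # ISO 8601, UTC

    def size_in_points(self) -> tuple[float, float]:
        # Assumption: pixel dimensions are physical pixels on a scaled display.
        return (self.width_px / self.scale_factor,
                self.height_px / self.scale_factor)


def to_markdown(meta: CaptureMetadata) -> str:
    """Render the record as a markdown bullet list, one field per line."""
    lines = ["## Capture Metadata"]
    for key, value in asdict(meta).items():
        label = key.replace("_", " ").capitalize()
        lines.append(f"- {label}: {value}")
    return "\n".join(lines)
```

Under that assumption, a 1200 × 800 px capture at a 2x scale factor corresponds to 600 × 400 points.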

Every Other Tool Was Built for Humans

Vision|Pipe was built for your AI — and the team around it.

| Tool | Built For | Annotate at Capture | Rich Metadata | Team Sharing (soon) |
| --- | --- | --- | --- | --- |
| Playwright | Programmatic browser automation | | Partial | |
| Loom / Zight / CleanShot X | Sharing with humans | Post-capture only | | Partial |
| Snagit | Documentation & tutorials | Post-capture only | | |
| macOS Screenshot | General capture | | | |
| Vision\|Pipe | Piping visual context into LLMs | Voice + caption | Yes | soon |

“If Playwright gives your test suite vision, Vision|Pipe gives you vision — and the rest of your team a way to share it.”

Built the Right Way

Tauri

Lightweight and secure — not Electron. Minimal memory footprint.

Rust

Systems-level metadata capture, performance, and reliability.

Apple Speech

On-device transcription via SFSpeechRecognizer. No audio leaves your machine.

Built in the Open

Vision|Pipe is source-available and community-driven. The code is visible, forkable, and we welcome pull requests.

# Fork the repo, then create a branch for your change
git checkout -b feature/your-feature
git commit -am 'Add your feature'
git push origin feature/your-feature
# Open a Pull Request

View on GitHub · Contributing Guide

Questions? Open an issue or reach out on X @Vision_Pipe.

Stop Pasting Screenshots. Start Delivering LLM Specs.

Free for personal use. Open for contributions. Built for everyone who can see what needs to change.

Download for Mac

or install via command line:

Apple Silicon today. Windows on the roadmap.

Everything It Does. Nothing It Doesn’t.

Shipping today

Multi-screenshot sessions

String captures together with continuous narration; one shareable bundle.

Real-time on-device transcription

Talk while you capture; Apple Speech anchors transcripts to each screenshot. No audio leaves your machine.

Per-segment re-record

Fix any single piece of narration without losing the rest.

Two view modes

Interleaved (cards + inline narration) or split (cards left, transcript right).

Markdown LLM Spec output

A timestamped, structured markdown file written to disk and copied to the clipboard, optimized for Claude Code, GPT, Gemini, etc.

HistoryHub

In-app browser for past sessions; reopen, copy, or show in Finder.

Editable captions

Name each screenshot inline; captions travel into the markdown bundle.

Scrolling capture

Capture content that extends beyond the visible viewport.

Configurable hotkeys

Rebind via Settings.

Rich metadata

Spatial, window, browser, and system context bundled automatically.

Local-first persistence

Sessions live on disk in ~/Pictures/VisionPipe/.

Open source

See exactly what you're running.

LLM-agnostic

Works with any AI that accepts images and markdown.

Lightweight

Tauri + Rust; minimal CPU and memory footprint.

Coming soon

Cloud session sharing

Upload to a private link; recipient previews in browser or downloads.

Drawing & annotation

Markup layer returning as part of the annotation rebuild.

On-device WhisperKit

Opt-in alternative to Apple Speech.

Windows support

The Tauri codebase supports cross-compilation; Apple Silicon ships today.
