With a single AI coding agent, the current tooling is fine. Open a terminal, start Claude Code or qalcode2, type your task, and wait. The agent works; you come back when it's done.
The problem starts when you run ten.
Ten parallel agents — each doing a different task across ten projects — is increasingly normal for developers building with AI. And ten parallel agents in tmux is chaos. You don't know which one is blocked waiting for your input. You can't tell at a glance which one just failed. Switching between them means remembering which window number maps to which project. A permission prompt fires silently in window 7 while you're looking at window 3, and stays blocked for an hour.
gmux is the solution to that problem. It's a gesture and voice-controlled shell layer that wraps tmux and gives you a mission-control view of every AI agent running on your machine — with live state indicators, gesture navigation, voice command routing, and a phone remote for when you're away from the keyboard.
The core problem in detail
Modern AI coding agents like Claude Code, qalcode2, and opencode run in a terminal. They output their state — thinking, running tools, waiting, permission needed — as text inside a tmux pane. But tmux itself is oblivious to that state. It just shows rectangles of text. There's no traffic light. No way to see from outside the pane that agent 7 needs you right now.
The conventional workaround is to scan every window manually. Prefix+1, look, nothing. Prefix+2, look, nothing. Prefix+7, oh — permission prompt, been waiting an hour. This gets worse as agent count grows.
gmux solves this by reading state directly from the AI agents' own HTTP APIs (no screen scraping, no pattern matching on terminal output), and presenting it as a colour-coded status bar across every tmux window:
┌─ gmux session ──────────────────────────────────────────────┐
│ 2:◉ doofing 5/7 3:● AI_diary 6/6 4:! face-tr 0/3 │
│ 5:◉ gmux 2/5 6:● knowledge 3/4 7:○ fish │
└──────────────────────────────────────────────────────────────┘
◉ green = working ● red = waiting ! orange = permission needed
○ dim = idle shell ✗ = error
The numbers (5/7, 6/6) are live todo progress pulled from qalcode2's API — how many tasks in the current session are done out of how many.
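As a concrete sketch, here is how a pane_status.py-style formatter might turn one agent's state into a status-bar entry. The symbol table matches the legend above, and `#[fg=…]` is standard tmux status-line colour syntax, but the function name, field layout, and colour choices are illustrative assumptions, not gmux's actual code:

```python
# Hypothetical mapping from agent state to the status-bar symbols above.
STATE_SYMBOLS = {
    "working": ("◉", "green"),
    "waiting": ("●", "red"),
    "permission": ("!", "orange"),
    "idle": ("○", "colour240"),  # dim grey
    "error": ("✗", "red"),
}

def format_pane(window: int, name: str, state: str, done: int, total: int) -> str:
    """Render one window entry like '2:◉ doofing 5/7' with tmux colour codes."""
    symbol, colour = STATE_SYMBOLS.get(state, ("?", "default"))
    progress = f" {done}/{total}" if total else ""
    return f"#[fg={colour}]{window}:{symbol}#[default] {name[:9]}{progress}"
```

The project name is truncated to nine characters so a row of six agents still fits in one status line.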
This alone is useful. But gmux goes further.
The three layers
gmux is built in three independent layers that can each run without the others.
Layer 1 · the terminal stack (gmux core)
The Python backend that runs as a daemon, watching all tmux panes:
- monitor.py — polls every qalcode2 instance's HTTP API and writes live state to /tmp/gmux-pane-state.json every 2 seconds
- pane_status.py — formats that state as a tmux status bar string with colour codes and emoji
- bridge.py — WebSocket hub on :8767 and HTTP on :8768, routes voice commands and phone remote commands to the right tmux pane
- session_restore.py — on startup, reads the saved session config and relaunches every agent in its correct project directory
This layer has no dependency on the others: if the Tauri app crashes, the terminal keeps working.
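A minimal sketch of what one monitor.py polling pass might look like. The endpoint path `/api/session` and the `state`/`todos` field names are assumptions for illustration, not the real qalcode2 API:

```python
import json
import urllib.request

def summarise(data: dict) -> dict:
    """Reduce a raw agent API payload to the fields the status bar needs.
    Field names here are assumed, not the real qalcode2 schema."""
    todos = data.get("todos", [])
    return {
        "state": data.get("state", "idle"),
        "todo_done": sum(1 for t in todos if t.get("done")),
        "todo_total": len(todos),
    }

def poll_once(ports: dict[int, int], path: str = "/tmp/gmux-pane-state.json") -> None:
    """One polling pass: query every known agent port, write the merged state file
    that pane_status.py and the UIs read."""
    state = {}
    for pane, port in ports.items():
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/api/session",
                                        timeout=1) as r:
                state[str(pane)] = summarise(json.load(r))
        except OSError:
            state[str(pane)] = {"state": "error"}
    with open(path, "w") as f:
        json.dump(state, f)
```

Writing to a single JSON file keeps the consumers decoupled: the status bar, the Tauri app, and the phone remote can all read the same snapshot without talking to the agents directly.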
Layer 2 · the desktop app (gmux-ui)
A Tauri (Rust + WebKit) desktop application that embeds a full tmux terminal inside a native window. Instead of a floating overlay on top of a separate terminal, it is the terminal:
- xterm.js + PTY — tmux renders inside the Tauri window via a real PTY connection. Every keystroke routes through Rust to tmux.
- Agent sidebar — a panel showing all agent panes with live state indicators, todo progress bars, and a sort order that puts ! permission needed first, then ● waiting, then ◉ working.
- Gesture canvas — a transparent layer on top of the terminal where MediaPipe hand tracking draws landmark overlays and interprets gestures.
- Tab bar — each tmux window is a tab showing the project name, AI state dot, and X/Y todo count.
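The sidebar's sort order follows directly from the priority described above. A sketch of that ordering, with the priority values and field names chosen here for illustration:

```python
# Lower value sorts first: anything blocked on a human floats to the top.
PRIORITY = {"permission": 0, "waiting": 1, "working": 2, "error": 3, "idle": 4}

def sidebar_order(panes: list[dict]) -> list[dict]:
    """Sort agent panes so permission prompts come first, then waiting,
    then working; ties break by window number."""
    return sorted(panes, key=lambda p: (PRIORITY.get(p["state"], 5),
                                        p.get("window", 0)))
```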
The data flow:
OpenCode/qalcode2 instances (each on a random HTTP port)
↓ SSE stream per pane
monitor.py → /tmp/gmux-pane-state.json → HTTP :8769/api/state
↓
Tauri app (polls /tmp/*.json) OR Browser (polls :8769)
↓
ui/v3/index.html — renders agent grid, gesture overlay, chat panel
Layer 3 · the phone remote
A mobile-optimised web dashboard at http://yourip:8768, accessible from a phone via Tailscale or local wifi. It shows the same agent state cards, accepts voice input (speak to a specific agent pane), and handles permission approve/deny. Volume keys work: Vol↓ jumps to the next waiting agent, Vol↑ is push-to-talk.
Gesture control
The gesture vocabulary was designed to feel natural rather than arbitrary. Gestures are split between hands — left hand controls navigation and voice, right hand controls the terminal and cursor.
| Hand | Gesture | Action |
|---|---|---|
| Right | Swipe left/right | Switch tmux window |
| Right | Pinch + drag | Scroll |
| Right | Pinch + release | Click/select |
| Left | Point (index finger) | Toggle voice listening |
| Left | Thumbs up | Approve permission |
| Left | Thumbs down | Reject permission |
| Left | Three fingers | Jump to next waiting (●) agent |
| Both | Open palms apart | New tmux window |
The system runs in two modes. In passive mode (default when you're typing), gesture detection has a higher confidence threshold and swipes are blocked — you don't accidentally switch panes while gesturing mid-sentence. In active mode (triggered by holding an open palm for 1.5 seconds), all gestures are active for deliberate navigation.
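The passive/active switch can be sketched as a small state machine. The 1.5-second hold comes from the text above; the class shape and method names are assumptions, and the text doesn't specify how active mode ends, so this sketch only covers entering it:

```python
HOLD_SECONDS = 1.5  # open-palm hold needed to enter active mode

class GestureMode:
    """Minimal sketch of the passive→active transition described above."""

    def __init__(self) -> None:
        self.active = False
        self._palm_since = None  # timestamp when the open palm first appeared

    def on_frame(self, palm_open: bool, now: float) -> bool:
        """Feed one frame of hand-tracking output; returns True in active mode."""
        if palm_open:
            if self._palm_since is None:
                self._palm_since = now
            elif now - self._palm_since >= HOLD_SECONDS:
                self.active = True
        else:
            # Palm dropped before the hold completed: reset the timer.
            self._palm_since = None
        return self.active
```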
Camera sharing is handled via v4l2loopback: a virtual camera device at /dev/video2 is
fed from the real webcam by a background ffmpeg process. The gesture engine and browser apps both
read from the virtual device — no "camera in use" conflicts.
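Under the hood, the bridge is a plain ffmpeg copy from the real device to the loopback device. A sketch of how that process might be launched, assuming v4l2loopback is already loaded (e.g. `modprobe v4l2loopback video_nr=2`); the exact flags gmux uses may differ:

```python
import subprocess

def bridge_cmd(real_dev: str = "/dev/video0",
               virtual_dev: str = "/dev/video2") -> list[str]:
    """Build the ffmpeg invocation that copies frames from the real webcam
    into the v4l2loopback virtual device (standard ffmpeg/v4l2 usage)."""
    return ["ffmpeg", "-f", "v4l2", "-i", real_dev,
            "-f", "v4l2", "-codec:v", "rawvideo", virtual_dev]

def start_bridge() -> subprocess.Popen:
    """Launch the camera bridge in the background."""
    return subprocess.Popen(bridge_cmd(),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
```

Because every consumer opens /dev/video2 instead of the physical camera, any number of readers can attach without the exclusive-access errors a raw webcam would raise.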
Voice commands
Voice runs in two modes depending on what's available:
- faster-whisper (local, offline) via WebSocket at :8770 — runs on-device with no API key, and uses an AMD ROCm GPU if available.
- Web Speech API (browser native) — works in Chrome/Brave with no setup. gmux falls back to this if the voice daemon isn't running.
The voice vocabulary is two-tier. Navigation commands are handled by gmux itself (never reach the AI):
"next window" / "previous window" → tmux window switch
"accept" / "always" / "deny" → permission response
"new window" → tmux new-window
Everything else is routed as text input to the focused AI pane — effectively typing for you. Say the agent's name first to route to a specific pane: "kalarc, explain the architecture of the voice router."
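The two-tier routing above can be sketched as a single dispatch function. The command table mirrors the vocabulary listed; the return-value shapes and the comma-based name prefix parsing are illustrative assumptions:

```python
# Tier 1: navigation commands handled by gmux itself, never sent to an AI.
NAV_COMMANDS = {
    "next window": ["tmux", "next-window"],
    "previous window": ["tmux", "previous-window"],
    "new window": ["tmux", "new-window"],
}
PERMISSION_WORDS = {"accept", "always", "deny"}

def route(utterance: str, focused_pane: str) -> tuple[str, object]:
    """Decide where a transcribed utterance goes: tmux, a permission
    response, or text input to an agent pane."""
    text = utterance.strip().lower()
    if text in NAV_COMMANDS:
        return ("tmux", NAV_COMMANDS[text])
    if text in PERMISSION_WORDS:
        return ("permission", text)
    # Tier 2: "agentname, do something" routes to that pane by name.
    if "," in utterance:
        name, rest = utterance.split(",", 1)
        return ("agent", (name.strip(), rest.strip()))
    # No name prefix: type into whichever pane currently has focus.
    return ("agent", (focused_pane, utterance.strip()))
```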
What makes this different
Most "AI terminal" tools are single-agent. Warp, Cursor, even the new DeepSeek-TUI — they all assume one agent, one session, one task at a time. That's fine for light use. It doesn't scale.
The tools that do handle multiple agents (Multica, LangGraph, AutoGen) operate at the orchestration layer — they assign tasks and track progress. They don't manage the actual terminal experience. They hand tasks to agents and get results back. There's no concept of "I need to look at window 7 right now because the permission light is on."
gmux sits at a different layer entirely — the interaction layer. It doesn't replace orchestration tools like Multica; it runs beneath them, handling the moment-to-moment experience of a human working alongside a fleet of AI agents.
| Tool | Role | What it manages |
|---|---|---|
| Multica | Orchestration | Which agent gets which task |
| qalcode2/opencode | Execution | Running the actual AI coding |
| gmux | Interaction | The human's experience of all of it |
Nothing else in this space has gesture control, voice routing, and live AI state awareness combined. That combination is the moat.
Current status (May 2026)
The terminal stack (Layer 1) is fully working. Live state detection, colour-coded status bar, session restore, phone remote — all shipped and available via pip install gmux, and paru -S gmux on Arch/CachyOS.
The Tauri desktop app (Layer 2) has working PTY, agent sidebar, and live data flow (14 panes across 5 sessions verified). Gesture overlay is partial — MediaPipe loads but the model currently requires a CDN fetch on first run. Voice is not yet wired in the Tauri app; installer packaging is paused until the app runs end-to-end reliably.
The site is live at gmux.ai with an early access email list.
gmux is an open project. MIT licensed.
Install: pip install gmux or paru -S gmux