Research · 01 May 2026 · ~10 min

A gesture-aware AI terminal multiplexer.

A technical overview of what gmux is, what problem it solves, and how it works.

When you run a single AI coding agent, the current tooling is fine. Open a terminal, start Claude Code or qalcode2, type your task, wait. The agent works, you come back when it's done.

The problem starts when you run ten.

Ten parallel agents — each doing a different task across ten projects — is increasingly normal for developers building with AI. And ten parallel agents in tmux is chaos. You don't know which one is blocked waiting for your input. You can't tell at a glance which one just failed. Switching between them means remembering which window number maps to which project. A permission prompt fires silently in window 7 while you're looking at window 3, and stays blocked for an hour.

gmux is the solution to that problem. It's a gesture- and voice-controlled shell layer that wraps tmux and gives you a mission-control view of every AI agent running on your machine, with live state indicators, gesture navigation, voice command routing, and a phone remote for when you're away from the keyboard.

The core problem in detail

Modern AI coding agents like Claude Code, qalcode2, and opencode run in a terminal. They output their state — thinking, running tools, waiting, permission needed — as text inside a tmux pane. But tmux itself is oblivious to that state. It just shows rectangles of text. There's no traffic light. No way to see from outside the pane that agent 7 needs you right now.

The conventional workaround is to scan every window manually. Prefix+1, look, nothing. Prefix+2, look, nothing. Prefix+7, oh — permission prompt, been waiting an hour. This gets worse as agent count grows.

gmux solves this by reading state directly from the AI agents' own HTTP APIs (no screen scraping, no pattern matching on terminal output), and presenting it as a colour-coded status bar across every tmux window:

┌─ gmux session ──────────────────────────────────────────────┐
│  2:◉ doofing 5/7  3:● AI_diary 6/6  4:! face-tr 0/3       │
│  5:◉ gmux 2/5     6:● knowledge 3/4  7:○ fish              │
└──────────────────────────────────────────────────────────────┘
◉ green = working    ● red = waiting    ! orange = permission needed
○ dim = idle shell   ✗ = error

The numbers (5/7, 6/6) are live todo progress pulled from qalcode2's API — how many tasks in the current session are done out of how many.
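For a concrete picture of how little machinery this needs, here is a minimal sketch of rendering such a segment from the state file the daemon maintains (see the data flow below) and handing it to tmux. The file path and tmux commands are real; the JSON schema and status names are assumptions for illustration.

import json
import subprocess

# per-status icons, coloured with tmux's real #[fg=...] format syntax;
# the status names themselves are assumed
ICONS = {
    "working": "#[fg=green]◉",
    "waiting": "#[fg=red]●",
    "permission": "#[fg=colour208]!",
    "idle": "#[fg=colour240]○",
    "error": "#[fg=red]✗",
}

def render_status(state_path="/tmp/gmux-pane-state.json"):
    with open(state_path) as f:
        panes = json.load(f)  # assumed: a list of per-pane dicts
    parts = []
    for p in panes:
        icon = ICONS.get(p["status"], "?")
        parts.append(f"{p['window']}:{icon} {p['name']} "
                     f"{p['done']}/{p['total']}#[default]")
    # hand the rendered segment to tmux's status line
    subprocess.run(["tmux", "set", "-g", "status-left", "  ".join(parts)],
                   check=True)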

This alone is useful. But gmux goes further.

The three layers

gmux is built in three independent layers that can each run without the others.

Layer 1 · the terminal stack (gmux core)

A Python backend that runs as a daemon, watching all tmux panes. It reads each agent's state over the agent's own HTTP API, writes the shared state file, drives the colour-coded status bar, and serves the phone remote.
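As a rough sketch of what that daemon loop might look like (the tmux calls are real; the per-agent endpoint path, port discovery, and JSON fields are assumptions):

import json
import subprocess
import time
import urllib.request

STATE_PATH = "/tmp/gmux-pane-state.json"

def list_panes():
    # enumerate every pane in every session via tmux's format strings
    out = subprocess.check_output(
        ["tmux", "list-panes", "-a", "-F",
         "#{session_name}\t#{window_index}\t#{pane_id}"], text=True)
    return [line.split("\t") for line in out.splitlines()]

def poll_agent(port):
    # assumed endpoint name; each agent exposes its own HTTP API
    with urllib.request.urlopen(
            f"http://127.0.0.1:{port}/api/state", timeout=1) as resp:
        return json.load(resp)

def run(agent_ports):  # {pane_id: port}, discovered elsewhere
    while True:
        snapshot = []
        for session, window, pane_id in list_panes():
            entry = {"session": session, "window": int(window),
                     "pane": pane_id, "status": "idle"}
            if pane_id in agent_ports:
                try:
                    entry.update(poll_agent(agent_ports[pane_id]))
                except OSError:
                    entry["status"] = "error"
            snapshot.append(entry)
        with open(STATE_PATH, "w") as f:
            json.dump(snapshot, f)
        time.sleep(1)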

This layer runs independently of the others: if the Tauri app crashes, the terminal keeps working.

Layer 2 · the desktop app (gmux-ui)

A Tauri (Rust + WebKit) desktop application that embeds a full tmux terminal inside a native window. Instead of a floating overlay on top of a separate terminal, it is the terminal.

The data flow:

OpenCode/qalcode2 instances (each on a random HTTP port)
          ↓  SSE stream per pane
    monitor.py  →  /tmp/gmux-pane-state.json  →  HTTP :8769/api/state
          ↓
    Tauri app (polls /tmp/*.json) OR Browser (polls :8769)
          ↓
    ui/v3/index.html — renders agent grid, gesture overlay, chat panel
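The "SSE stream per pane" step is plain HTTP: each agent pushes a long-lived response of data: lines that monitor.py consumes. A minimal consumer might look like the following; the /event path and payload shape are assumptions.

import json
import urllib.request

def follow_agent(port, on_event):
    # server-sent events: a long-lived HTTP response of "data: ..." lines
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/event",
        headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("data:"):
                on_event(json.loads(line[len("data:"):]))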

Layer 3 · the phone remote

A mobile-optimised web dashboard at http://yourip:8768, accessible from your phone over Tailscale or local wifi. It shows the same agent state cards, accepts voice input (speak to a specific agent pane), and handles permission approve/deny. Volume keys work: Vol↓ jumps to the next waiting agent, Vol↑ is push-to-talk.
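The serving side is small enough to sketch in full. This is an illustrative handler for the /api/state route from the diagram above, using only the standard library; the phone dashboard at :8768 serves its HTML plus the same data in the same way.

import http.server

class StateHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/state":
            # serve the snapshot the monitor daemon keeps on disk
            with open("/tmp/gmux-pane-state.json", "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

http.server.HTTPServer(("0.0.0.0", 8769), StateHandler).serve_forever()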

Gesture control

The gesture vocabulary was designed to feel natural rather than arbitrary. Gestures are split between hands — left hand controls navigation and voice, right hand controls the terminal and cursor.

Hand   Gesture               Action
Right  Swipe left/right      Switch tmux window
Right  Pinch + drag          Scroll
Right  Pinch + release       Click/select
Left   Point (index finger)  Toggle voice listening
Left   Thumbs up             Approve permission
Left   Thumbs down           Reject permission
Left   Three fingers         Jump to next waiting (●) agent
Both   Open palms apart      New tmux window

The system runs in two modes. In passive mode (the default while you're typing), gesture detection uses a higher confidence threshold and swipes are blocked, so you don't accidentally switch panes while gesturing mid-sentence. In active mode (triggered by holding an open palm for 1.5 seconds), the full gesture vocabulary is enabled for deliberate navigation.
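A sketch of that gating logic, assuming a classifier that emits (gesture, confidence) pairs. The thresholds here are invented; only the 1.5-second palm hold and the blocked-swipe rule come from the design above.

import time

PASSIVE_CONF = 0.92  # assumed threshold values
ACTIVE_CONF = 0.75
PALM_HOLD_SECS = 1.5

class ModeGate:
    def __init__(self):
        self.active = False
        self.palm_since = None

    def filter(self, gesture, confidence):
        # holding an open palm for 1.5 s switches to active mode
        if gesture == "open_palm":
            if self.palm_since is None:
                self.palm_since = time.monotonic()
            elif time.monotonic() - self.palm_since >= PALM_HOLD_SECS:
                self.active = True
            return None
        self.palm_since = None

        threshold = ACTIVE_CONF if self.active else PASSIVE_CONF
        if confidence < threshold:
            return None
        if not self.active and gesture in ("swipe_left", "swipe_right"):
            return None  # swipes are blocked in passive mode
        return gesture   # safe to dispatch to the action table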

Camera sharing is handled via v4l2loopback: a virtual camera device at /dev/video2 is fed from the real webcam by a background ffmpeg process. The gesture engine and browser apps both read from the virtual device — no "camera in use" conflicts.
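Setting that up by hand looks roughly like the following. The module options are real v4l2loopback options, but device numbers and pixel formats vary per machine.

# load the loopback module so /dev/video2 exists
sudo modprobe v4l2loopback video_nr=2 card_label=gmux-cam exclusive_caps=1
# mirror the real webcam into the virtual device in the background
ffmpeg -f v4l2 -i /dev/video0 -vf format=yuv420p -f v4l2 /dev/video2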

Voice commands

Voice runs in two modes depending on what's available:

  1. faster-whisper via WebSocket at :8770: runs on-device with no API key, works fully offline, and can use an AMD GPU via ROCm when available (see the sketch after this list).
  2. Web Speech API (browser native) — Chrome/Brave, no setup needed. Falls back to this if the voice daemon isn't running.
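The core of such a voice daemon can be sketched briefly. The faster-whisper calls below are the library's real API; the framing (one binary WAV blob per utterance, a JSON transcript back) is an assumption.

import asyncio
import io
import json

import websockets  # pip install websockets
from faster_whisper import WhisperModel

# "small" is an assumed model size; device="auto" picks a GPU when present
model = WhisperModel("small", device="auto", compute_type="int8")

async def handle(ws):
    async for blob in ws:  # assumed: one binary WAV blob per utterance
        segments, _info = model.transcribe(io.BytesIO(blob))
        text = "".join(seg.text for seg in segments)
        await ws.send(json.dumps({"text": text.strip()}))

async def main():
    async with websockets.serve(handle, "127.0.0.1", 8770):
        await asyncio.Future()  # run forever

asyncio.run(main())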

The voice vocabulary is two-tier. Navigation commands are handled by gmux itself and never reach the AI:

"next window" / "previous window" → tmux window switch
"accept" / "always" / "deny" → permission response
"new window" → tmux new-window

Everything else is routed as text input to the focused AI pane — effectively typing for you. Say the agent's name first to route to a specific pane: "kalarc, explain the architecture of the voice router."
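A sketch of that routing rule. The tmux commands are real; the name lookup is an assumption, and permission responses ("accept"/"always"/"deny") would dispatch through the agents' APIs in the same way.

import subprocess

# tier 1: handled by gmux itself, never reaches the AI
NAV = {
    "next window": ["tmux", "next-window"],
    "previous window": ["tmux", "previous-window"],
    "new window": ["tmux", "new-window"],
}

def route(utterance, panes_by_name, focused_pane):
    text = utterance.strip().lower()
    if text in NAV:
        subprocess.run(NAV[text], check=True)
        return
    # "kalarc, explain ..." routes to the named agent's pane
    target = focused_pane
    name, _, rest = utterance.partition(",")
    if rest and name.strip().lower() in panes_by_name:
        target = panes_by_name[name.strip().lower()]
        utterance = rest.strip()
    # tier 2: type the text into the pane and press Enter
    subprocess.run(["tmux", "send-keys", "-t", target, utterance, "Enter"],
                   check=True)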

What makes this different

Most "AI terminal" tools are single-agent. Warp, Cursor, even the new DeepSeek-TUI — they all assume one agent, one session, one task at a time. That's fine for light use. It doesn't scale.

The tools that do handle multiple agents (Multica, LangGraph, AutoGen) operate at the orchestration layer — they assign tasks and track progress. They don't manage the actual terminal experience. They hand tasks to agents and get results back. There's no concept of "I need to look at window 7 right now because the permission light is on."

gmux sits at a different layer entirely — the interaction layer. It doesn't replace orchestration tools like Multica; it runs beneath them, handling the moment-to-moment experience of a human working alongside a fleet of AI agents.

Tool               Role           What it manages
Multica            Orchestration  Which agent gets which task
qalcode2/opencode  Execution      Running the actual AI coding
gmux               Interaction    The human's experience of all of it

Nothing else in this space has gesture control, voice routing, and live AI state awareness combined. That combination is the moat.

Current status (May 2026)

The terminal stack (Layer 1) is fully working. Live state detection, colour-coded status bar, session restore, phone remote — all shipped and available via pip install gmux and paru -S gmux on Arch/CachyOS.

The Tauri desktop app (Layer 2) has working PTY, agent sidebar, and live data flow (14 panes across 5 sessions verified). Gesture overlay is partial: MediaPipe loads but the model currently requires a CDN fetch on first run. Voice is not yet wired into the Tauri app; installer packaging is paused until the app runs end-to-end reliably.

The project site is live at gmux.ai with an early access email list.

gmux is an open project. MIT licensed.
Install: pip install gmux or paru -S gmux
