Research · 01 May 2026 · ~10 min

A gesture-aware AI terminal multiplexer.

A technical overview of what gmux is, what problem it solves, and how it works.

When you run a single AI coding agent, the current tooling is fine. Open a terminal, start Claude Code or qalcode2, type your task, wait. The agent works, you come back when it's done.

The problem starts when you run ten.

Ten parallel agents — each doing a different task across ten projects — is increasingly normal for developers building with AI. And ten parallel agents in tmux is chaos. You don't know which one is blocked waiting for your input. You can't tell at a glance which one just failed. Switching between them means remembering which window number maps to which project. A permission prompt fires silently in window 7 while you're looking at window 3, and stays blocked for an hour.

gmux is the solution to that problem. It's a gesture- and voice-controlled shell layer that wraps tmux and gives you a mission-control view of every AI agent running on your machine, with live state indicators, gesture navigation, voice command routing, and a phone remote for when you're away from the keyboard.

The core problem in detail

Modern AI coding agents like Claude Code, qalcode2, and opencode run in a terminal. They output their state — thinking, running tools, waiting, permission needed — as text inside a tmux pane. But tmux itself is oblivious to that state. It just shows rectangles of text. There's no traffic light. No way to see from outside the pane that agent 7 needs you right now.

The conventional workaround is to scan every window manually. Prefix+1, look, nothing. Prefix+2, look, nothing. Prefix+7, oh — permission prompt, been waiting an hour. This gets worse as agent count grows.

gmux solves this by reading state directly from the AI agents' own HTTP APIs (no screen scraping, no pattern matching on terminal output), and presenting it as a colour-coded status bar across every tmux window:

┌─ gmux session ──────────────────────────────────────────────┐
│  2:◉ doofing 5/7  3:● AI_diary 6/6  4:! face-tr 0/3       │
│  5:◉ gmux 2/5     6:● knowledge 3/4  7:○ fish              │
└──────────────────────────────────────────────────────────────┘
◉ green = working    ● red = waiting    ! orange = permission needed
○ dim = idle shell   ✗ = error

The numbers (5/7, 6/6) are live todo progress pulled from qalcode2's API — how many tasks in the current session are done out of how many.
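For a concrete picture of how little machinery this needs, here is a minimal sketch of rendering such a segment from the state file the daemon maintains (see the data flow below) and handing it to tmux. The file path and tmux commands are real; the JSON schema and status names are assumptions for illustration.

import json
import subprocess

# per-status icons, coloured with tmux's real #[fg=...] format syntax;
# the status names themselves are assumed
ICONS = {
    "working": "#[fg=green]◉",
    "waiting": "#[fg=red]●",
    "permission": "#[fg=colour208]!",
    "idle": "#[fg=colour240]○",
    "error": "#[fg=red]✗",
}

def render_status(state_path="/tmp/gmux-pane-state.json"):
    with open(state_path) as f:
        panes = json.load(f)  # assumed: a list of per-pane dicts
    parts = []
    for p in panes:
        icon = ICONS.get(p["status"], "?")
        parts.append(f"{p['window']}:{icon} {p['name']} "
                     f"{p['done']}/{p['total']}#[default]")
    # hand the rendered segment to tmux's status line
    subprocess.run(["tmux", "set", "-g", "status-left", "  ".join(parts)],
                   check=True)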

This alone is useful. But gmux goes further.

The three layers

gmux is built in three independent layers that can each run without the others.

Layer 1 · the terminal stack (gmux core)

A Python backend that runs as a daemon, watching all tmux panes. It reads each agent's state over the agent's own HTTP API, writes the shared state file, drives the colour-coded status bar, and serves the phone remote.
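As a rough sketch of what that daemon loop might look like (the tmux calls are real; the per-agent endpoint path, port discovery, and JSON fields are assumptions):

import json
import subprocess
import time
import urllib.request

STATE_PATH = "/tmp/gmux-pane-state.json"

def list_panes():
    # enumerate every pane in every session via tmux's format strings
    out = subprocess.check_output(
        ["tmux", "list-panes", "-a", "-F",
         "#{session_name}\t#{window_index}\t#{pane_id}"], text=True)
    return [line.split("\t") for line in out.splitlines()]

def poll_agent(port):
    # assumed endpoint name; each agent exposes its own HTTP API
    with urllib.request.urlopen(
            f"http://127.0.0.1:{port}/api/state", timeout=1) as resp:
        return json.load(resp)

def run(agent_ports):  # {pane_id: port}, discovered elsewhere
    while True:
        snapshot = []
        for session, window, pane_id in list_panes():
            entry = {"session": session, "window": int(window),
                     "pane": pane_id, "status": "idle"}
            if pane_id in agent_ports:
                try:
                    entry.update(poll_agent(agent_ports[pane_id]))
                except OSError:
                    entry["status"] = "error"
            snapshot.append(entry)
        with open(STATE_PATH, "w") as f:
            json.dump(snapshot, f)
        time.sleep(1)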

This layer runs independently of the others: if the Tauri app crashes, the terminal keeps working.

Layer 2 · the desktop app (gmux-ui)

A Tauri (Rust + WebKit) desktop application that embeds a full tmux terminal inside a native window. Instead of a floating overlay on top of a separate terminal, it is the terminal.

The data flow:

OpenCode/qalcode2 instances (each on a random HTTP port)
          ↓  SSE stream per pane
    monitor.py  →  /tmp/gmux-pane-state.json  →  HTTP :8769/api/state
          ↓
    Tauri app (polls /tmp/*.json) OR Browser (polls :8769)
          ↓
    ui/v3/index.html — renders agent grid, gesture overlay, chat panel
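The "SSE stream per pane" step is plain HTTP: each agent pushes a long-lived response of data: lines that monitor.py consumes. A minimal consumer might look like the following; the /event path and payload shape are assumptions.

import json
import urllib.request

def follow_agent(port, on_event):
    # server-sent events: a long-lived HTTP response of "data: ..." lines
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/event",
        headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("data:"):
                on_event(json.loads(line[len("data:"):]))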

Layer 3 · the phone remote

A mobile-optimised web dashboard at http://yourip:8768, accessible from your phone over Tailscale or local wifi. It shows the same agent state cards, accepts voice input (speak to a specific agent pane), and handles permission approve/deny. Volume keys work: Vol↓ jumps to the next waiting agent, Vol↑ is push-to-talk.
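The serving side is small enough to sketch in full. This is an illustrative handler for the /api/state route from the diagram above, using only the standard library; the phone dashboard at :8768 serves its HTML plus the same data in the same way.

import http.server

class StateHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/state":
            # serve the snapshot the monitor daemon keeps on disk
            with open("/tmp/gmux-pane-state.json", "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

http.server.HTTPServer(("0.0.0.0", 8769), StateHandler).serve_forever()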

Gesture control

The gesture vocabulary was designed to feel natural rather than arbitrary. Gestures are split between hands — left hand controls navigation and voice, right hand controls the terminal and cursor.

Hand   Gesture               Action
Right  Swipe left/right      Switch tmux window
Right  Pinch + drag          Scroll
Right  Pinch + release       Click/select
Left   Point (index finger)  Toggle voice listening
Left   Thumbs up             Approve permission
Left   Thumbs down           Reject permission
Left   Three fingers         Jump to next waiting (●) agent
Both   Open palms apart      New tmux window

The system runs in two modes. In passive mode (the default while you're typing), gesture detection uses a higher confidence threshold and swipes are blocked, so you don't accidentally switch panes while gesturing mid-sentence. In active mode (triggered by holding an open palm for 1.5 seconds), the full gesture vocabulary is enabled for deliberate navigation.
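A sketch of that gating logic, assuming a classifier that emits (gesture, confidence) pairs. The thresholds here are invented; only the 1.5-second palm hold and the blocked-swipe rule come from the design above.

import time

PASSIVE_CONF = 0.92  # assumed threshold values
ACTIVE_CONF = 0.75
PALM_HOLD_SECS = 1.5

class ModeGate:
    def __init__(self):
        self.active = False
        self.palm_since = None

    def filter(self, gesture, confidence):
        # holding an open palm for 1.5 s switches to active mode
        if gesture == "open_palm":
            if self.palm_since is None:
                self.palm_since = time.monotonic()
            elif time.monotonic() - self.palm_since >= PALM_HOLD_SECS:
                self.active = True
            return None
        self.palm_since = None

        threshold = ACTIVE_CONF if self.active else PASSIVE_CONF
        if confidence < threshold:
            return None
        if not self.active and gesture in ("swipe_left", "swipe_right"):
            return None  # swipes are blocked in passive mode
        return gesture   # safe to dispatch to the action table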

Camera sharing is handled via v4l2loopback: a virtual camera device at /dev/video2 is fed from the real webcam by a background ffmpeg process. The gesture engine and browser apps both read from the virtual device — no "camera in use" conflicts.
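Setting that up by hand looks roughly like the following. The module options are real v4l2loopback options, but device numbers and pixel formats vary per machine.

# load the loopback module so /dev/video2 exists
sudo modprobe v4l2loopback video_nr=2 card_label=gmux-cam exclusive_caps=1
# mirror the real webcam into the virtual device in the background
ffmpeg -f v4l2 -i /dev/video0 -vf format=yuv420p -f v4l2 /dev/video2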

Voice commands

Voice runs in two modes depending on what's available:

  1. faster-whisper via WebSocket at :8770: runs on-device with no API key, works fully offline, and can use an AMD GPU via ROCm when available (see the sketch after this list).
  2. Web Speech API (browser native) — Chrome/Brave, no setup needed. Falls back to this if the voice daemon isn't running.
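The core of such a voice daemon can be sketched briefly. The faster-whisper calls below are the library's real API; the framing (one binary WAV blob per utterance, a JSON transcript back) is an assumption.

import asyncio
import io
import json

import websockets  # pip install websockets
from faster_whisper import WhisperModel

# "small" is an assumed model size; device="auto" picks a GPU when present
model = WhisperModel("small", device="auto", compute_type="int8")

async def handle(ws):
    async for blob in ws:  # assumed: one binary WAV blob per utterance
        segments, _info = model.transcribe(io.BytesIO(blob))
        text = "".join(seg.text for seg in segments)
        await ws.send(json.dumps({"text": text.strip()}))

async def main():
    async with websockets.serve(handle, "127.0.0.1", 8770):
        await asyncio.Future()  # run forever

asyncio.run(main())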

The voice vocabulary is two-tier. Navigation commands are handled by gmux itself and never reach the AI:

"next window" / "previous window" → tmux window switch
"accept" / "always" / "deny" → permission response
"new window" → tmux new-window

Everything else is routed as text input to the focused AI pane — effectively typing for you. Say the agent's name first to route to a specific pane: "kalarc, explain the architecture of the voice router."
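A sketch of that routing rule. The tmux commands are real; the name lookup is an assumption, and permission responses ("accept"/"always"/"deny") would dispatch through the agents' APIs in the same way.

import subprocess

# tier 1: handled by gmux itself, never reaches the AI
NAV = {
    "next window": ["tmux", "next-window"],
    "previous window": ["tmux", "previous-window"],
    "new window": ["tmux", "new-window"],
}

def route(utterance, panes_by_name, focused_pane):
    text = utterance.strip().lower()
    if text in NAV:
        subprocess.run(NAV[text], check=True)
        return
    # "kalarc, explain ..." routes to the named agent's pane
    target = focused_pane
    name, _, rest = utterance.partition(",")
    if rest and name.strip().lower() in panes_by_name:
        target = panes_by_name[name.strip().lower()]
        utterance = rest.strip()
    # tier 2: type the text into the pane and press Enter
    subprocess.run(["tmux", "send-keys", "-t", target, utterance, "Enter"],
                   check=True)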

What makes this different

Most "AI terminal" tools are single-agent. Warp, Cursor, even the new DeepSeek-TUI — they all assume one agent, one session, one task at a time. That's fine for light use. It doesn't scale.

The tools that do handle multiple agents (Multica, LangGraph, AutoGen) operate at the orchestration layer — they assign tasks and track progress. They don't manage the actual terminal experience. They hand tasks to agents and get results back. There's no concept of "I need to look at window 7 right now because the permission light is on."

gmux sits at a different layer entirely — the interaction layer. It doesn't replace orchestration tools like Multica; it runs beneath them, handling the moment-to-moment experience of a human working alongside a fleet of AI agents.

Tool               Role           What it manages
Multica            Orchestration  Which agent gets which task
qalcode2/opencode  Execution      Running the actual AI coding
gmux               Interaction    The human's experience of all of it

Nothing else in this space has gesture control, voice routing, and live AI state awareness combined. That combination is the moat.

Current status (May 2026)

The terminal stack (Layer 1) is fully working. Live state detection, colour-coded status bar, session restore, phone remote — all shipped and available via pip install gmux and paru -S gmux on Arch/CachyOS.

The Tauri desktop app (Layer 2) has working PTY, agent sidebar, and live data flow (14 panes across 5 sessions verified). Gesture overlay is partial: MediaPipe loads but the model currently requires a CDN fetch on first run. Voice is not yet wired into the Tauri app; installer packaging is paused until the app runs end-to-end reliably.

The project site is live at gmux.ai with an early access email list.

gmux is an open project. MIT licensed.
Install: pip install gmux or paru -S gmux
