The starting point
The problem was embarrassingly simple to state and annoyingly hard to solve. By early 2026, running multiple AI coding agents in parallel had become normal. Not two. Ten, sometimes more. Each one in a tmux window, each one working on a different piece of a different project.
The problem: you can't see which ones need you.
Agent 7 has been sitting on a permission prompt for an hour. Agent 3 finished its task and is waiting for the next one. Agent 5 errored out silently. You have no idea, because tmux shows you text boxes and nothing else.
The obvious fix — check each window manually — doesn't scale. The slightly less obvious fix — pattern-match the terminal output to detect agent state — works until it doesn't (model outputs a spinner character, state flips to "working"). The correct fix required reading the agent's own API.
What already existed
Before a single line of gmux-specific code was written, every component of the solution was already sitting in MASTER_PROJECTS/:
- qalarc-claude-voice had a complete wake word → TTS → voice response → tmux routing pipeline. The wake word was "Qalarc" and it routed to a single Claude session, but the plumbing was done.
- voice-terminal could take spoken words and inject them as keystrokes into any focused terminal pane. Three modes: dictate, chat, speak.
- parralax_tracking was already running MediaPipe Face Mesh in the browser and driving interactive canvas art with head position. Not hands, but the same library.
- ai-orchestrator had a four-agent pipeline running. Multi-agent wasn't a future thing; it was already working.
The first version of gmux was therefore: link these together, rename the wake word, add hand detection to the face tracking code, write a launcher script. An afternoon's work, maybe.
That was not what happened.
The first real problem · state detection
The quick approach to detecting AI state was terminal output parsing. Read each tmux pane, look for
❯ (the qalcode2 prompt — means waiting), look for spinner characters like
⠋⠙⠹ (means working), look for "Continue?" (means permission). Write the results to a
JSON file. Display it.
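The appeal of the scraping approach is how little code it takes. A minimal sketch of the heuristic (the glyphs and state names follow the text above; the function name and exact rules are illustrative):

```python
# Glyphs the scraper keyed on: braille spinner frames, the qalcode2
# prompt character, and the permission question. All visual artifacts.
SPINNERS = set("⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏")

def classify_pane(text: str) -> str:
    """Guess agent state from raw captured pane text (the fragile way)."""
    tail = text.rstrip().splitlines()[-1] if text.strip() else ""
    if "Continue?" in text:
        return "permission"
    if any(ch in tail for ch in SPINNERS):
        return "working"
    if "❯" in tail:
        return "waiting"
    return "unknown"

print(classify_pane("⠙ running tests"))  # working
print(classify_pane("all done\n❯"))      # waiting
```

Ten readable lines, which is exactly why it was the first attempt — and exactly the problem: every branch keys on characters the model itself is free to emit.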
This worked. Until it didn't. A model outputting a long response with a ❯ in it would
flip the indicator to "waiting" mid-stream. A code block with a spinner character meant the pane was
"working" when it was actually idle. The approach was inherently fragile — you're guessing at state
from visual artifacts rather than reading truth.
qalcode2 exposes /session/status and a Server-Sent Events endpoint at /event.
Every state transition — agent started thinking, agent finished, permission prompt fired, agent
responded — comes through as a structured event. Subscribing to that stream instead of pattern-matching
terminal output made state detection exact and instant.
monitor.py became an SSE subscriber rather than a terminal scraper. One subscription per
running qalcode2 instance. State written to /tmp/gmux-pane-state.json on every event.
This architecture has held.
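The subscriber side can be sketched with standard SSE framing (`data:` lines accumulate, a blank line dispatches the event). The event payload shape here is an assumption — the text doesn't show qalcode2's actual fields — but the parse-then-snapshot structure matches what monitor.py does:

```python
import json

STATE_FILE = "/tmp/gmux-pane-state.json"

def sse_events(lines):
    """Parse an iterable of text lines as Server-Sent Events.

    Standard SSE framing: `data:` lines accumulate, a blank line ends
    the event. Yields each event's JSON payload, decoded.
    """
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            yield json.loads("\n".join(buf))
            buf = []

def write_state(pane_states, path=STATE_FILE):
    # One JSON snapshot per event: the same file the status bar reads.
    with open(path, "w") as f:
        json.dump(pane_states, f)

# Hypothetical event payloads -- real qalcode2 fields may differ.
feed = [
    'data: {"pane": "3", "state": "working"}', "",
    'data: {"pane": "3", "state": "permission"}', "",
]
states = {}
for ev in sse_events(feed):
    states[ev["pane"]] = ev["state"]
print(states)  # {'3': 'permission'}
```

In the real daemon the `feed` would be the line stream of an open HTTP connection to each instance's /event endpoint, one subscription per instance.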
The overlay problem
The first visual layer attempt was a transparent floating window — the "ghost" approach, documented
in gmux/archive/gmux-ghost-option-a/. A chromeless, click-through browser window,
transparent background, gesture canvas drawn over it. Run your terminal normally; the overlay floats
above it.
Wayland killed this idea.
On X11, xdotool getwindowgeometry tells you exactly where any window is on screen.
A floating overlay can align its tap targets with the terminal's pane borders because it knows the
pixel coordinates.
Wayland has no equivalent. Windows are isolated. An overlay cannot know where the terminal is. You can guess based on desktop dimensions and hope, but a 1-pixel error means gesturing at pane 3 selects pane 2. At different DPI settings or window sizes, the error compounds. The architecture requires certainty and Wayland provides none.
The decision was logged formally in DECISIONS.md: Option A (transparent overlay) is
abandoned. Build the terminal host (Option B) instead.
Option B · Tauri as the terminal
If the visual layer can't float above the terminal, it has to be the terminal.
Tauri is a Rust framework that creates native desktop windows with a WebKit WebView inside. The plan:
use Rust's portable-pty crate to spawn tmux inside the Tauri window via a real PTY, pipe
the output to xterm.js in the WebView, and layer the gesture canvas on top of xterm.js.
This solves the overlay problem cleanly. The Tauri app owns the layout exactly — the gesture canvas is a DOM element layered over the terminal element, at known pixel offsets. No guessing. Tap a point in the gesture overlay and the math to find the corresponding tmux pane is exact.
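The "exact math" is ordinary integer arithmetic once the app knows the cell size and tmux's pane rectangles. A sketch, assuming pane geometry parsed from `tmux list-panes -F '#{pane_id} #{pane_left} #{pane_top} #{pane_right} #{pane_bottom}'` (the cell dimensions and example layout below are invented for illustration):

```python
def pane_at_pixel(x, y, cell_w, cell_h, panes):
    """Map an overlay pixel to a tmux pane.

    `panes` is a list of (pane_id, left, top, right, bottom) in cell
    coordinates. Exact, not guessed: the gesture canvas and the
    terminal element share one coordinate space inside the Tauri app.
    """
    col, row = int(x // cell_w), int(y // cell_h)
    for pane_id, left, top, right, bottom in panes:
        if left <= col <= right and top <= row <= bottom:
            return pane_id
    return None

# Two side-by-side panes in a 160x40-cell window, 9x18 px cells.
panes = [("%0", 0, 0, 79, 39), ("%1", 80, 0, 159, 39)]
print(pane_at_pixel(100, 200, 9, 18, panes))   # %0
print(pane_at_pixel(1000, 200, 9, 18, panes))  # %1
```

This is the computation Wayland made impossible for a floating overlay: without the terminal's pixel origin, `x` and `y` are meaningless.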
It also sidesteps the Web Speech API problem: webkit2gtk on Linux doesn't implement the Web Speech API at all. Instead, the Tauri app runs a Python sidecar process with faster-whisper for offline speech recognition: local, more accurate, and with no network requirement.
The implementation uses portable-pty (Rust) to spawn tmux new-session -A -s gmux.
PTY output becomes a Tauri event pty-data that xterm.js renders. xterm.js keystrokes call
invoke('pty_write') which routes through Rust back to the PTY. The terminal is real —
not a screenshot, not a widget approximation, not a VTE embed. Full ANSI support, full terminfo,
everything tmux supports.
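The same round-trip — spawn under a real PTY, read raw bytes out — can be sketched with Python's stdlib `pty` module (Linux/macOS). A short-lived command stands in for `tmux new-session -A -s gmux` so the sketch is self-contained; the Rust side does the equivalent with portable-pty:

```python
import os
import pty
import select

def run_in_pty(argv, timeout=5.0):
    """Spawn argv under a real PTY and collect its output bytes."""
    pid, fd = pty.fork()
    if pid == 0:                      # child: exec inside the PTY
        os.execvp(argv[0], argv)
    chunks = []
    while True:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break
        try:
            data = os.read(fd, 4096)
        except OSError:               # PTY closed: child exited
            break
        if not data:
            break
        chunks.append(data)
    os.close(fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)

out = run_in_pty(["printf", "hello from a pty"])
print(out)  # b'hello from a pty'
```

In the app this loop never ends: each chunk becomes a `pty-data` event for xterm.js, and writes from `pty_write` go back through the same file descriptor.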
The camera problem
Two processes want the webcam simultaneously: the gesture engine and any browser apps (video calls, Brave's WebRTC for demos). Most camera drivers don't allow multiple readers.
The fix was v4l2loopback — a kernel module that creates a virtual camera device. A background ffmpeg
process reads the real webcam (/dev/video0) exclusively and writes to the virtual device
(/dev/video2). Everything that wants camera access reads from the virtual device.
As many simultaneous readers as needed, no conflicts.
That rule — nothing opens /dev/video0 directly except the ffmpeg relay — is now enforced in three places in the codebase: main.js, gmux.py, and gesture/engine.py all check which device to use and refuse to touch the real camera.
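The guard itself is tiny and worth centralising. A sketch of what each of those three call sites checks (function name and error message are illustrative, not the actual gmux code):

```python
REAL_CAMERA = "/dev/video0"      # read exclusively by the ffmpeg relay
VIRTUAL_CAMERA = "/dev/video2"   # v4l2loopback device everyone else uses

def camera_device(requested: str = VIRTUAL_CAMERA) -> str:
    """Return the device a gmux component is allowed to open.

    Opening the real camera would steal the single reader slot from
    the ffmpeg relay and break every other consumer at once.
    """
    if requested == REAL_CAMERA:
        raise PermissionError(
            f"{REAL_CAMERA} is reserved for the ffmpeg relay; "
            f"read {VIRTUAL_CAMERA} instead"
        )
    return requested

print(camera_device())  # /dev/video2
```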
The port collision
Voice was planned to run on a local WebSocket on :8765. When it came time to wire the
voice daemon, port 8765 was already occupied — by aria-phone, a separate uvicorn server
from a different project. The voice daemon didn't start.
This is logged in DECISIONS.md as an open issue: the bridge ports are :8767
(WS), :8768 (HTTP phone), :8769 (gmux_receiver for qalcode2 push). Port
:8765 is marked "DO NOT USE — occupied by aria-phone." The voice daemon needs to be
moved to :8770, which was done in
gmux-system/backend/voice/gmux_voice_daemon.py, but the Tauri app's voice connection
hasn't been updated to match yet.
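One way to stop this class of bug from recurring is to make the port map executable rather than documentation. A sketch, assuming the service names are fair labels for the ports listed in DECISIONS.md (the registry itself is hypothetical):

```python
import socket

# Port map from DECISIONS.md; :8765 is occupied by aria-phone.
PORTS = {
    "bridge_ws": 8767,
    "bridge_http_phone": 8768,
    "gmux_receiver": 8769,
    "voice_daemon": 8770,   # moved off :8765
}
FORBIDDEN = {8765: "aria-phone"}

def claim_port(name: str) -> int:
    """Look up a service port, refuse the forbidden list, and fail
    fast with a clear error if something else already holds it."""
    port = PORTS[name]
    if port in FORBIDDEN:
        raise RuntimeError(f"port {port} is occupied by {FORBIDDEN[port]}")
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))   # raises if already taken
    finally:
        probe.close()
    return port

print(claim_port("voice_daemon"))
```

Had the voice daemon and the Tauri app both pulled their port from a table like this, they could not have drifted apart.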
The installer decision
An installer was written. It checks all dependencies (node, rust, python 3.11, bun, tmux), installs
Python requirements, installs Node packages in app/, downloads the MediaPipe hand
landmark model into models/, writes a systemd user service for the monitor daemon, and
writes a .desktop entry. It works.
On May 12, 2026, the decision was made to freeze it.
The reasoning: an installer that installs something that doesn't run cleanly is worse than no
installer. The installer is the first impression. If gmux-ui launches and the sidebar
is empty, or the PTY doesn't connect, or the voice toggle does nothing — that's the experience that
gets reported, not "well the installation script ran."
Five things need to work before the installer makes sense:
- ./scripts/launch.sh opens the Tauri app cleanly on a fresh shell
- The status sidebar shows live pane state
- Spawning an agent from the UI creates a real tmux window with opencode
- Permission approve/reject works against a live session
- Voice connects and transcribes into the UI
Three of the five are working. Voice has the port issue. The qalcode2 push patch (which eliminates the 2-second polling lag in favour of instant push events) isn't applied yet. When all five are green, the installer resumes and packaging makes sense.
The qalcode2 overlap realisation
A useful clarification that took a while to crystallise: many of gmux's headline features are already inside qalcode2 itself. The AI state detection — working, waiting, permission — qalcode2 generates that and shows it in its own UI. Todo tracking is native to qalcode2. Session management is handled by qalcode2.
gmux doesn't replace any of that. It reads it, and presents it across all simultaneously running qalcode2 instances in one view. The framing that resolved the confusion:
qalcode2 = single-agent executor
gmux = multi-agent interaction layer
Every feature in gmux that seems to duplicate qalcode2 is actually just gmux surfacing qalcode2's data at a different scale — across 10 panes instead of 1. The status bar, the todo progress, the session management — these are qalcode2 features made visible for the whole workspace.
The features that are genuinely gmux-only: gesture control, voice routing, phone remote, cross-pane status bar, RAM/process visibility. Those don't exist in qalcode2 because qalcode2 is one pane.
What's working now
As of May 2026, the Python terminal stack is fully working and published:
pip install gmux && gmux --status-only
That command, right now, gives you live AI state in your tmux status bar. It reads qalcode2's API, shows agent states in colour, shows todo progress per window, and handles session restore. No camera required.
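The rendering step is straightforward: read the snapshot the monitor writes and emit a tmux-formatted string. A sketch — the state names follow the earlier sections, but the JSON shape and colour choices are assumptions, not the published gmux code:

```python
import json

# tmux colour fragments per state; state names follow the article,
# the snapshot's JSON shape is an assumption.
COLOURS = {
    "working": "#[fg=green]",
    "waiting": "#[fg=yellow]",
    "permission": "#[fg=red]",
}

def status_line(path="/tmp/gmux-pane-state.json"):
    """Render the pane-state snapshot as one tmux status string."""
    try:
        with open(path) as f:
            states = json.load(f)
    except FileNotFoundError:
        return "#[fg=grey]gmux: no state"
    return " ".join(
        f"{COLOURS.get(state, '')}{pane}:{state}#[default]"
        for pane, state in sorted(states.items())
    )

# tmux would consume something like this via command substitution,
# e.g.:  set -g status-right '#(gmux --status-only)'
print(status_line())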
The Tauri app (gmux-system/) has working PTY, agent sidebar with 14 panes live, and real
data flow confirmed. Gesture overlay is loading but requires the MediaPipe model (currently a CDN
fetch). Voice has the port conflict. The app isn't installer-ready but it runs.
What's next
The short list:
- Fix the voice port — move the connection from :8765 to :8770 in the Tauri app
- Apply the qalcode2 push patch — eliminates the 2-second polling lag and enables instant permission prompts in the sidebar
- Test MediaPipe model bundling — avoid the CDN fetch by including the 7.5MB model file in the release
- Pass all five installer criteria
- Ship the installer
- Make the GitHub repo public — the current timing, with terminal AI tools trending hard, is a good window
The longer item: decide whether the multi-pane orchestration should eventually fold into qalcode2 as a "workspace" mode, or stay as a separate product. The gesture/voice/phone-remote angle makes it clearly distinct. The status bar angle is borderline. That question gets easier to answer once real users are using it.
The code is at gmux.ai.
Install the working parts: pip install gmux.