Research · 02 May 2026 · ~15 min

Every implementation we tried — what worked, what didn't.

gmux didn't arrive fully formed. It went through several distinct implementation approaches as the problem became clearer and the constraints of Linux desktop development asserted themselves. Each version solved something the previous one couldn't.

Implementation 0 · the assembled prototype (April 2026)

Status · concept only

Before a single gmux repo existed, the pieces were already there — scattered across MASTER_PROJECTS/.

The prototype vision was to symlink these into a unified gmux-core/ directory and call it done. The gesture layer just needed hands added to the face tracking. The voice layer just needed the wake word renamed. The multi-agent backend was already running.

This worked as a proof of concept but not as a product. There was no status bar. No live AI state detection. No unified config. No way to install it. The pieces existed but weren't integrated.

What it established: the core insight that all the technology already existed — the work was integration and UX, not invention.

Implementation 1 · Python terminal stack (gmux core)

Status · shipped · PyPI + AUR

Install: pip install gmux or paru -S gmux

The first real implementation was pure Python with no desktop window — just a daemon that watches tmux and augments it.

Architecture

monitor.py          ← Polls all qalcode2 HTTP APIs, writes state JSON
pane_status.py      ← Formats state as tmux status-bar string
bridge.py           ← WS :8767 + HTTP :8768 hub
session_restore.py  ← Saves and relaunches sessions on restart
gmux_receiver.py    ← Receives push events from qalcode2
jump_red.py         ← tmux keybinding: jump to next waiting pane
tui.py              ← Textual dashboard, service toggles, agent launcher

State detection

The key decision was reading qalcode2's HTTP API rather than pattern-matching terminal output. The earlier approach — scanning tmux pane content for telltale patterns: a prompt marker (prompt), spinning cursors (working), Continue? (y/n) (permission) — worked but was fragile. A model output that happened to contain one of those patterns would flip the state indicator, and the wrong font could break spinner detection.

qalcode2 exposes /session/status (polling) and /event (SSE stream). monitor.py subscribes to the SSE stream for every running instance, getting state transitions pushed instantly rather than inferred. This is why the status bar shows the correct state in real time rather than lagging behind the visual.
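
A minimal sketch of that subscription in Python. The /event endpoint comes from qalcode2 as described above; the requests client, the port, and the payload shape are assumptions, not monitor.py's actual code:

import json
import requests  # assumed HTTP client; only the /event endpoint is from qalcode2

def watch_instance(base_url):
    """Yield state transitions pushed over a qalcode2 instance's SSE stream."""
    with requests.get(f"{base_url}/event", stream=True, timeout=None) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                # Payload shape is an assumption; monitor.py folds it into state JSON.
                yield json.loads(line[len("data:"):].strip())

for event in watch_instance("http://127.0.0.1:9000"):  # hypothetical port
    print(event)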

State icons

Icon  Colour     Meaning
◉     Green      AI actively working — streaming or running tools
●     Red        AI idle — waiting for your next message
!     Orange     Permission needed
✓     Blue       Just finished, not yet acknowledged
○     Grey       Pane exists, no AI started
      Dim        Plain shell
      Red blink  Error

Commands

gmux                  # Full mode: gesture + voice + status
gmux --status-only    # Just the status bar — no camera, no mic
gmux --no-gesture     # Voice + status, camera free
gmux tui              # Interactive Textual dashboard
gmux restore          # Re-launch all agents from saved session
gmux restore --check  # Preview what would be restored
gmux status           # Quick pane state dump

What this version gets right

Live state detection from qalcode2's API instead of fragile pattern matching. A zero-friction entry point (pip or AUR, no new window to adopt) that augments the terminal you already use. Session restore that relaunches every agent after a restart.

What this version lacks

The terminal stack has no visual pane for the gesture overlay. Point at a tmux pane with your hand and nothing can see you — the terminal doesn't have a camera feed. The gesture engine runs in a separate browser window, creating a fragmented experience.

Implementation 2 · browser overlay (Option A — abandoned)

Status · parked

The first attempt at a visual layer was a transparent floating window that would sit on top of whatever terminal was already running. The idea: a chromeless browser window, transparent background, with the gesture canvas drawn over it. You'd run your normal terminal, and the gesture overlay would float above it.

Why it failed

Wayland. On X11, you can query the position and size of any window via xdotool or xwininfo. A floating overlay can align itself with the terminal's pane borders because it can ask the display server "where exactly is that terminal window, down to the pixel?"
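
On X11 that query is a one-liner (the window name here is hypothetical):

xdotool search --name "Alacritty" getwindowgeometry   # prints position + size for each match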

Wayland has no such API. Windows don't know about each other. A floating overlay on Wayland can't know where the terminal panes are, so it can't align gesture tap targets with the actual tmux windows. A 1px difference means tapping pane 3 selects pane 2.

Web Speech API. The voice layer needed speech recognition that ran locally (not sending audio to Google). The plan was to use WebKit's Web Speech API with a local engine, but webkit2gtk on Linux does not implement the Web Speech API at all — the API exists in the DOM, yet every call silently fails.

The fundamental problem. A floating overlay doesn't own the terminal. It guesses where the terminal is. At any window size, DPI, or scroll position, that guess degrades. The architecture is permanently fragile.

What this version established: A clear requirement — the visual layer needs to own the terminal, not float above it. This led directly to Option B.

Implementation 3 · Tauri desktop app (gmux-ui / gmux-system)

Status · in progress · backend working, Tauri app pre-release

Launch: ./scripts/launch.sh

The correct solution to the overlay problem was to make the terminal window itself the Tauri app. Instead of floating above a terminal, Tauri becomes the terminal.

Architecture

Tauri (Rust + WebKit native window)
├── xterm.js                          ← Terminal emulator
│     └── PTY (portable-pty Rust crate)  ← Connected to tmux
├── Gesture Canvas                    ← Transparent layer over terminal
│     └── MediaPipe (hand tracking)   ← Reads /dev/video2
├── Agent Sidebar                     ← Live state for all panes
│     └── Rust lib.rs → HTTP :8769    ← Reads monitor.py state
└── Tab Bar                           ← Per-window state + todo progress

PTY implementation

The terminal works through a real PTY (pseudoterminal), not a terminal emulator widget. Rust spawns tmux new-session -A -s gmux via the portable-pty crate and forwards PTY output to the WebView as a pty-data Tauri event, which xterm.js renders. Keystrokes in xterm.js call invoke('pty_write') → Rust → PTY → tmux. This is a real terminal — not a screenshot, not a VTE widget, not an approximation.

The advantage over a VTE-based terminal widget: a canvas element can be layered on top of xterm.js for the gesture overlay. Native terminal widgets don't support arbitrary DOM children.
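
The same loop can be sketched with Python's standard pty module — illustrative only (no raw mode, no resize handling); gmux's real path is the portable-pty crate plus Tauri events described above:

import os
import pty
import select
import subprocess
import sys

# Spawn tmux attached to a real pseudoterminal, the same pattern portable-pty uses in Rust.
master_fd, slave_fd = pty.openpty()
proc = subprocess.Popen(
    ["tmux", "new-session", "-A", "-s", "gmux"],
    stdin=slave_fd, stdout=slave_fd, stderr=slave_fd,
)
os.close(slave_fd)

while proc.poll() is None:
    ready, _, _ = select.select([master_fd, sys.stdin], [], [], 0.1)
    if master_fd in ready:
        # In gmux this chunk is emitted as the pty-data event for xterm.js to render.
        sys.stdout.buffer.write(os.read(master_fd, 4096))
        sys.stdout.flush()
    if sys.stdin in ready:
        # In gmux this is the invoke('pty_write') path from keystrokes.
        os.write(master_fd, os.read(sys.stdin.fileno(), 4096))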

Gesture canvas

A <canvas> element is positioned over the xterm.js div with pointer-events: none, so clicks pass through to the terminal. MediaPipe's hand tracking runs in the WebView, drawing landmark overlays and skeleton lines as the user's hand moves. Gesture events bubble up to the sidebar and tab bar — a pinch selects the agent card under the fingertip, a swipe switches the tmux window.

Agent sidebar

The sidebar shows all 14 panes (in the live test environment) sorted by urgency:

! Permission needed   → sort first (blocks progress)
● Waiting input       → sort second (needs attention)
◉ Working             → sort third (fine, just running)
✓ Done                → sort fourth
○ Idle                → sort last

Each card shows: project name, state indicator, todo progress bar, and an action button (Approve/Reject for permission state, Open Chat otherwise). The sidebar updates from the same SSE stream as the status bar — sub-second latency.
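
The ordering reduces to a sort key over state names — a hypothetical Python rendering (the pane dict fields are assumptions, not gmux's actual schema):

# Lower rank sorts first; state names mirror the sidebar legend above.
URGENCY = {"permission": 0, "waiting": 1, "working": 2, "done": 3, "idle": 4}

def sort_panes(panes):
    """Order pane cards so blocked agents surface at the top of the sidebar."""
    return sorted(panes, key=lambda p: URGENCY.get(p["state"], len(URGENCY)))

panes = [{"name": "gmux-ui", "state": "working"}, {"name": "docs", "state": "permission"}]
print([p["name"] for p in sort_panes(panes)])  # ['docs', 'gmux-ui']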

The data source hierarchy

The UI uses a three-tier fallback so it works in any environment:

  1. Tauri events — fastest path. Rust reads /tmp/gmux-pane-state.json and emits a gmux-state event to the WebView.
  2. HTTP + SSE from :8769 — if running in a browser without Tauri, polls the monitor daemon directly. An SSE stream is available for real-time updates.
  3. Mock data — falls back to realistic simulated data if neither is available. The demo at gmux.ai/demo/ uses this mode.

Camera architecture

Two camera devices are in play:

/dev/video0  →  Real webcam (exclusive, cam-broker only)
                     ↓
              ffmpeg (gmux-cam-broker.service)
                     ↓
/dev/video2  →  Virtual loopback (v4l2loopback)
                     ├── gmux gesture engine
                     ├── Brave / Chrome (WebRTC)
                     └── Python gesture engine (when Tauri not running)

The rule is strict: nothing reads /dev/video0 except the cam-broker. Everything else reads /dev/video2. This eliminates "camera already in use" errors entirely.
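
The broker pattern itself is small: load the loopback module, then run one ffmpeg relay. A sketch of the idea — the exact flags inside gmux-cam-broker.service aren't shown in this article, so treat these as assumptions:

sudo modprobe v4l2loopback video_nr=2 card_label="gmux-cam"           # creates /dev/video2
ffmpeg -f v4l2 -i /dev/video0 -f v4l2 -vcodec rawvideo -pix_fmt yuv420p /dev/video2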

What's working (May 2026)

Feature                               Status
PTY → xterm.js terminal               ✅ Working
Window switching                      ✅ Working
Agent sidebar (14 panes, 5 sessions)  ✅ Working
Tab todo counts                       ✅ Working
Bridge WS :8767                       ✅ Working
Live data from :8769                  ✅ Working
Resize without flicker                ✅ Fixed
Gesture overlay canvas                ⚠️ Partial
MediaPipe hand model                  ⚠️ CDN fetch on first run
Voice daemon                          ❌ Port conflict (:8765 taken by aria-phone)
qalcode2 push patch                   ❌ Not yet applied
Installer                             ⏸️ Paused (waiting for stable Tauri app)

The installer pause decision

On May 12, 2026, a deliberate decision was logged to stop installer work until the Tauri app passes five criteria:

  1. ./scripts/launch.sh opens the Tauri app cleanly on a fresh shell
  2. The status sidebar shows live pane state from :8769
  3. Spawning an agent via the UI actually creates a new tmux window + opencode
  4. Permission approve/reject from the UI works against a real session
  5. Voice connects and transcribes into the UI

Until those five are green, packaging something that doesn't run reliably is premature. The installer exists — it checks deps, installs Python requirements, downloads the MediaPipe model, writes systemd units and a .desktop entry — but it's frozen until the app itself is stable.

Implementation 4 · gmux-brain (memory layer)

Status · built, not yet wired into opencode.json

gmux-brain is not a separate UI or terminal layer — it's the intelligence layer that makes every agent pane smarter before it starts.

The problem it solves

Without memory, every qalcode2 session starts cold. The agent doesn't know the codebase architecture. It doesn't know decisions made in previous sessions. It doesn't know what the other 13 agents running alongside it are currently doing. The developer has to re-explain context repeatedly, or paste it in manually, or waste the first 10 exchanges getting the agent up to speed.

gmux-brain injects ~600 tokens of structured context into each new agent pane automatically, drawn from three sources:

Source      Technology                     Answers
Structural  Graphify (AST + NetworkX)      What calls what, god nodes, community structure, architecture
Episodic    MemPalace (ChromaDB + SQLite)  Why we made decision X, when Y was changed, what was agreed
Workspace   gmux native SSE                What other agents are doing right now
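
How those three sources might combine into the injected block — a stubbed sketch, not gmux-brain's actual code (all three helpers are placeholders, and the 4-chars-per-token budget is a rough rule of thumb):

# Stubs standing in for the real Graphify, MemPalace, and gmux SSE calls.
def graphify_summary(project):  return f"[structure] call graph summary for {project}"
def mempalace_recall(project):  return f"[episodic] recent decisions for {project}"
def workspace_state():          return "[workspace] 13 other agents running"

def build_context(project, budget_chars=2400):  # ~600 tokens at ~4 chars/token
    """Assemble the context block injected into a new agent pane."""
    parts = [graphify_summary(project), mempalace_recall(project), workspace_state()]
    return "\n\n".join(parts)[:budget_chars]

print(build_context("gmux-ui"))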

The MCP server

gmux-brain exposes a single MCP endpoint that the opencode instance in each pane can call:

{
  "mcpServers": {
    "gmux-brain": {
      "type": "stdio",
      "command": "python3",
      "args": ["/home/fivelidz/projects/gmux-brain/src/router.py"]
    }
  }
}

Available tools: brain_query, brain_context, brain_graph_query, brain_memory_search, brain_memory_add, brain_kg_add, brain_kg_query, brain_status.

Query routing without an LLM

The router dispatches queries to the right memory layer using keyword matching, not an LLM call. This is intentional — routing should be fast and free, not a round-trip to a model:

Query contains                                                  Routes to
"what calls", "god nodes", "architecture", "class", "function"  Graphify
"why did we", "when was", "who decided", "history"              MemPalace
"other agents", "pane status", "workspace"                      gmux state
Ambiguous                                                       Both
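
A minimal sketch of that kind of keyword router in Python — the table above supplies the keywords; the function shape and layer names are assumptions about router.py, not its actual code:

ROUTES = {
    "graphify":  ("what calls", "god nodes", "architecture", "class", "function"),
    "mempalace": ("why did we", "when was", "who decided", "history"),
    "gmux":      ("other agents", "pane status", "workspace"),
}

def route(query):
    """Pick memory layers by substring match — no LLM round-trip."""
    q = query.lower()
    hits = [layer for layer, keys in ROUTES.items() if any(k in q for k in keys)]
    return hits or ["graphify", "mempalace"]  # ambiguous → query both memory layers

print(route("why did we switch to SSE?"))  # ['mempalace']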

Implementation 5 · gmux.ai (the product face)

Status · live

The landing page is a single HTML file with zero dependencies, deployed on Cloudflare Pages. The vote/email backend is a Cloudflare Worker with KV storage.

The counter

The interest counter on the landing page isn't a raw click count. It uses a formula:

display = real × 5 + floor(log(real+1) × 2.3) + (real×11+3)%4

This produces a number that grows monotonically with real interest, renders identically for the same input (no jitter between page loads), and avoids looking like a raw multiple thanks to the log and modulo terms.
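
Transcribed into Python (assuming natural log, which the formula above doesn't specify):

import math

def display_count(real):
    """Interest counter shown on gmux.ai, from the formula above."""
    return real * 5 + math.floor(math.log(real + 1) * 2.3) + (real * 11 + 3) % 4

print(display_count(10))  # 10 real clicks → 56 displayed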

The audience

The early interest data from the Cloudflare Worker tells a consistent story: a local audience reached through the developer's existing network. The product hasn't been posted anywhere public (no HN, no GitHub, no Reddit). Broadening distribution is the next step after the Tauri app stabilises.

The demo

A live demo exists at gmux.ai/demo/ — not yet linked from the main page. It uses mock data to show the v3 UI running in the browser without needing a real gmux backend. The gesture controls and visual layout work; agent state is simulated.

Comparing the implementations

Version                What it is                            Status                 Unique value
Assembled prototype    Symlinked MASTER_PROJECTS             Concept only           Showed existing pieces could combine
Python terminal stack  Daemon + tmux status bar              ✅ Shipped (PyPI/AUR)  Zero-friction entry point, no window needed
Browser overlay        Floating transparent window           ❌ Abandoned           Proved Wayland requires owning the terminal
Tauri desktop app      Terminal host with sidebar + gesture  🔄 Pre-release         Correct architecture, gesture canvas, real PTY
gmux-brain             MCP memory router                     ⚠️ Built, not wired    600-token context injection per agent pane
gmux.ai                Landing + email + demo                ✅ Live                Product identity, early audience

The honest state of things

The Python terminal stack is real and working. If you want live AI state detection in your tmux status bar today, pip install gmux && gmux --status-only does it.

The Tauri app is close. PTY, sidebar, and live data are working. Voice and gesture aren't fully wired. The installer is paused on purpose — shipping a bad install experience is worse than shipping nothing. The five criteria for resuming installer work are clear and measurable.

gmux-brain is an interesting idea sitting idle. Wiring graphify + kalarc-memory into it and registering it in opencode.json would be a high-value afternoon's work.

The terminal AI agent space is moving fast. DeepSeek-TUI gained 21,752 GitHub stars in one week in May 2026 — showing the audience is real and hungry. gmux's combination of gesture + voice + phone remote + live AI state is genuinely novel. The window to be first is open.

All implementations: MIT licensed.
Core terminal stack: pip install gmux | paru -S gmux

Next · The devlog → ← Back to overview