Implementation 0 · the assembled prototype (April 2026)
Before a single gmux repo existed, the pieces were already there — scattered across MASTER_PROJECTS/.
- qalarc-claude-voice — wake word "Qalarc" → TTS output → listens for voice response → routes command to Claude session in tmux. The complete voice loop, already working.
- voice-terminal — speak → keystrokes injected into any focused terminal pane. Three modes: dictate, chat, speak.
- parralax_tracking — MediaPipe Face Mesh driving interactive canvas art via head position. Browser-based, no install.
- ai-orchestrator — four-agent pipeline: Personality → Refiner → Coder (Claude CLI) → Tester. Multi-agent already working.
- local-ai — tmux-style terminal UI for Ollama with sidebar + session management.
The prototype vision was to symlink these into a unified gmux-core/ directory and call it
done. The gesture layer just needed hands added to the face tracking. The voice layer just needed the
wake word renamed. The multi-agent backend was already running.
This worked as a proof of concept but not as a product. There was no status bar. No live AI state detection. No unified config. No way to install it. The pieces existed but weren't integrated.
What it established: the core insight that all the technology already existed — the work was integration and UX, not invention.
Implementation 1 · Python terminal stack (gmux core)
Install: pip install gmux or paru -S gmux
The first real implementation was pure Python with no desktop window — just a daemon that watches tmux and augments it.
Architecture
monitor.py ← Polls all qalcode2 HTTP APIs, writes state JSON
pane_status.py ← Formats state as tmux status-bar string
bridge.py ← WS :8767 + HTTP :8768 hub
session_restore.py ← Saves and relaunches sessions on restart
gmux_receiver.py ← Receives push events from qalcode2
jump_red.py ← tmux keybinding: jump to next waiting pane
tui.py ← Textual dashboard, service toggles, agent launcher
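The split between monitor.py (writes state) and pane_status.py (renders it) can be sketched in a few lines. This is an illustration, not the shipped code — the pane-record schema and state names here are assumptions; only the /tmp/gmux-pane-state.json path appears in this document.

```python
import json

# Illustrative mapping from AI state to status-bar icon
# (state names are assumed, not the real schema).
ICONS = {
    "working": "◉",
    "waiting": "●",
    "permission": "!",
    "done": "◆",
    "idle": "○",
}

def format_status(state_path="/tmp/gmux-pane-state.json"):
    """Render one icon per pane, in pane order, for the tmux status bar."""
    with open(state_path) as f:
        panes = json.load(f)  # e.g. [{"pane": "0.1", "state": "working"}, ...]
    return " ".join(ICONS.get(p["state"], "?") for p in panes)
```

tmux would then pull this via something like `set -g status-right '#(gmux-pane-status)'`, so the bar refreshes on tmux's own interval.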
State detection
The key decision was reading qalcode2's HTTP API rather than pattern-matching terminal output.
The earlier approach — scanning tmux pane content for strings like ❯ (prompt), spinning
cursors (working), Continue? (y/n) (permission) — worked but was fragile. A model
output that happened to contain ❯ would flip the state indicator. The wrong font could
break spinner detection.
qalcode2 exposes /session/status (polling) and /event (SSE stream).
monitor.py subscribes to the SSE stream for every running instance, getting state transitions pushed
instantly rather than inferred. This is why the status bar shows the correct state in real time
rather than lagging behind the visual.
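The SSE side of this can be sketched with a stdlib-only parser of the Server-Sent Events wire format. The /event endpoint name comes from above; the payload shape is an assumption.

```python
def iter_sse_events(lines):
    """Parse Server-Sent Events wire format into data payloads.

    `lines` is any iterable of decoded text lines, e.g. the body of a
    streaming GET against qalcode2's /event endpoint. A blank line
    terminates an event; each `data:` line contributes one payload line.
    """
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":                 # blank line: event boundary
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
    if data:                           # stream ended mid-event
        yield "\n".join(data)
```

In monitor.py this generator would wrap a streaming HTTP response (e.g. `requests.get(url, stream=True).iter_lines()`), turning each pushed state transition into a state-file update the instant it arrives.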
State icons
| Icon | Colour | Meaning |
|---|---|---|
| ◉ | Green | AI actively working — streaming or running tools |
| ● | Red | AI idle — waiting for your next message |
| ! | Orange | Permission needed |
| ◆ | Blue | Just finished, not yet acknowledged |
| ─ | Grey | Pane exists, no AI started |
| ○ | Dim | Plain shell |
| ✗ | Red blink | Error |
Commands
gmux # Full mode: gesture + voice + status
gmux --status-only # Just the status bar — no camera, no mic
gmux --no-gesture # Voice + status, camera free
gmux tui # Interactive Textual dashboard
gmux restore # Re-launch all agents from saved session
gmux restore --check # Preview what would be restored
gmux status # Quick pane state dump
What this version gets right
- Zero-friction entry point. pip install gmux && gmux --status-only works immediately with no camera, no mic, no Tauri. Just a better tmux status bar.
- Graceful degradation. --status-only, --no-gesture, and --phone flags let you use exactly as much of gmux as you want.
- Stability. The terminal stack never crashes the terminal. If bridge.py dies, tmux keeps working. Nothing is load-bearing except tmux itself.
What this version lacks
The terminal stack has no visual pane for the gesture overlay. Point at a tmux pane with your hand and nothing can see you — the terminal doesn't have a camera feed. The gesture engine runs in a separate browser window, creating a fragmented experience.
Implementation 2 · browser overlay (Option A — abandoned)
The first attempt at a visual layer was a transparent floating window that would sit on top of whatever terminal was already running. The idea: a chromeless browser window, transparent background, with the gesture canvas drawn over it. You'd run your normal terminal, and the gesture overlay would float above it.
Why it failed
Wayland. On X11, you can query the position and size of any window via xdotool or xwininfo. A floating overlay can align itself with the terminal's pane borders because it can ask the display server "where exactly is that terminal window, down to the pixel?"
Wayland has no such API. Windows don't know about each other. A floating overlay on Wayland can't know where the terminal panes are, so it can't align gesture tap targets with the actual tmux windows. A 1px difference means tapping pane 3 selects pane 2.
Web Speech API. The voice layer needed speech recognition that ran locally (not sending audio to Google). The plan was to use WebKit's Web Speech API with a local engine. But webkit2gtk on Linux does not implement the Web Speech API at all — the API exists in the DOM, yet every call silently fails.
The fundamental problem. A floating overlay doesn't own the terminal. It guesses where the terminal is. At any window size, DPI, or scroll position, that guess degrades. The architecture is permanently fragile.
What this version established: A clear requirement — the visual layer needs to own the terminal, not float above it. This led directly to Option B.
Implementation 3 · Tauri desktop app (gmux-ui / gmux-system)
Launch: ./scripts/launch.sh
The correct solution to the overlay problem was to make the terminal window itself the Tauri app. Instead of floating above a terminal, Tauri becomes the terminal.
Architecture
Tauri (Rust + WebKit native window)
├── xterm.js ← Terminal emulator
│ └── PTY (portable-pty Rust crate) ← Connected to tmux
├── Gesture Canvas ← Transparent layer over terminal
│ └── MediaPipe (hand tracking) ← Reads /dev/video2
├── Agent Sidebar ← Live state for all panes
│ └── Rust lib.rs → HTTP :8769 ← Reads monitor.py state
└── Tab Bar ← Per-window state + todo progress
PTY implementation
The terminal works through a real PTY (pseudoterminal), not a terminal emulator widget. Rust spawns
tmux new-session -A -s gmux via the portable-pty crate, pipes PTY output to
a Tauri event pty-data, which xterm.js renders. Keystrokes in xterm.js call
invoke('pty_write') → Rust → PTY → tmux. This is a real terminal — not a screenshot, not
a VTE widget, not an approximation.
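The same PTY mechanism can be sketched in Python's stdlib. This is an analogue of what the Rust portable-pty wiring does, not the shipped code: spawn a process attached to a pseudoterminal and drain its output the way the emulator would.

```python
import os
import pty
import select
import subprocess

def spawn_on_pty(argv):
    """Run argv attached to a pseudoterminal; return (process, master fd)."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave,
                            stderr=slave, close_fds=True)
    os.close(slave)                    # parent keeps only the master side
    return proc, master

def read_output(master, timeout=2.0):
    """Drain PTY output until the child closes its end."""
    chunks = []
    while True:
        ready, _, _ = select.select([master], [], [], timeout)
        if not ready:
            break
        try:
            data = os.read(master, 4096)
        except OSError:                # EIO: child exited, slave closed
            break
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```

In the Tauri app, the equivalent Rust loop forwards each chunk to the WebView as a pty-data event for xterm.js to render, and pty_write pushes keystrokes back the other way.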
The advantage over a VTE-based terminal widget: a canvas element can be layered on top of xterm.js for the gesture overlay. Native terminal widgets don't support arbitrary DOM children.
Gesture canvas
A <canvas> element is positioned over the xterm.js div with
pointer-events: none, so clicks pass through to the terminal. MediaPipe's hand tracking
runs in the WebView, drawing landmark overlays and skeleton lines as the user's hand moves. Gesture
events bubble up to the sidebar and tab bar — a pinch selects the agent card under the fingertip, a
swipe switches the tmux window.
Agent sidebar
The sidebar shows all 14 panes (in the live test environment) sorted by urgency:
! Permission needed → sort first (blocks progress)
● Waiting input → sort second (needs attention)
◉ Working → sort third (fine, just running)
✓ Done → sort fourth
○ Idle → sort last
Each card shows: project name, state indicator, todo progress bar, and an action button (Approve/Reject for permission state, Open Chat otherwise). The sidebar updates from the same SSE stream as the status bar — sub-second latency.
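The urgency ordering above is just a sort key. A minimal sketch (pane records and state names are illustrative):

```python
# Lower rank sorts first; order matches the sidebar's urgency list.
URGENCY = {"permission": 0, "waiting": 1, "working": 2, "done": 3, "idle": 4}

def sort_panes(panes):
    """Order agent cards so blocking states surface at the top."""
    return sorted(panes, key=lambda p: URGENCY.get(p["state"], len(URGENCY)))
```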
The data source hierarchy
The UI uses a three-tier fallback so it works in any environment:
- Tauri events — fastest path. Rust reads /tmp/gmux-pane-state.json and emits a gmux-state event to the WebView.
- HTTP + SSE from :8769 — if running in a browser without Tauri, polls the monitor daemon directly. An SSE stream is available for real-time updates.
- Mock data — falls back to realistic simulated data if neither is available. The demo at gmux.ai/demo/ uses this mode.
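The fallback chain amounts to ordered probes. A sketch — the /state endpoint path and return shape are assumptions; the document only names the port:

```python
import json
import urllib.request

def get_agent_state(tauri_event=None):
    """Resolve agent state from the fastest source that answers.

    Tier 1: a Tauri gmux-state event already pushed to the WebView.
    Tier 2: poll the monitor daemon over HTTP on :8769.
    Tier 3: mock data, so the demo UI renders with no backend at all.
    """
    if tauri_event is not None:
        return tauri_event
    try:
        # Endpoint path is assumed; only the port comes from the doc.
        with urllib.request.urlopen("http://127.0.0.1:8769/state", timeout=1) as r:
            return json.load(r)
    except OSError:
        return {"panes": [], "mock": True}
```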
Camera architecture
Two camera devices are in play:
/dev/video0 → Real webcam (exclusive, cam-broker only)
↓
ffmpeg (gmux-cam-broker.service)
↓
/dev/video2 → Virtual loopback (v4l2loopback)
├── gmux gesture engine
├── Brave / Chrome (WebRTC)
└── Python gesture engine (when Tauri not running)
The rule is strict: nothing reads /dev/video0 except the cam-broker. Everything else
reads /dev/video2. This eliminates "camera already in use" errors entirely.
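As a sketch, the broker could be a systemd unit wrapping an ffmpeg relay. The unit name matches gmux-cam-broker.service above, but the exact ffmpeg flags and file layout here are assumptions, not the shipped configuration:

```ini
# /etc/systemd/system/gmux-cam-broker.service (sketch, flags assumed)
[Unit]
Description=Relay the real webcam onto the v4l2loopback device

[Service]
# Only this process may open /dev/video0; everything else reads /dev/video2.
ExecStart=/usr/bin/ffmpeg -f v4l2 -i /dev/video0 \
    -f v4l2 -vcodec rawvideo -pix_fmt yuv420p /dev/video2
Restart=always

[Install]
WantedBy=multi-user.target
```

Because v4l2loopback devices allow multiple readers, the gesture engine, the browser, and the Python fallback can all open /dev/video2 concurrently.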
What's working (May 2026)
| Feature | Status |
|---|---|
| PTY → xterm.js terminal | ✅ Working |
| Window switching | ✅ Working |
| Agent sidebar (14 panes, 5 sessions) | ✅ Working |
| Tab todo counts | ✅ Working |
| Bridge WS :8767 | ✅ Working |
| Live data from :8769 | ✅ Working |
| Resize without flicker | ✅ Fixed |
| Gesture overlay canvas | ⚠️ Partial |
| MediaPipe hand model | ⚠️ CDN fetch on first run |
| Voice daemon | ❌ Port conflict (:8765 taken by aria-phone) |
| qalcode2 push patch | ❌ Not yet applied |
| Installer | ⏸️ Paused (waiting for stable Tauri app) |
The installer pause decision
On May 12, 2026, a deliberate decision was logged to stop installer work until the Tauri app passes five criteria:
- ./scripts/launch.sh opens the Tauri app cleanly on a fresh shell
- The status sidebar shows live pane state from :8769
- Spawning an agent via the UI actually creates a new tmux window + opencode
- Permission approve/reject from the UI works against a real session
- Voice connects and transcribes into the UI
Until those five are green, packaging something that doesn't run reliably is premature. The installer exists — it checks deps, installs Python requirements, downloads the MediaPipe model, writes systemd units and a .desktop entry — but it's frozen until the app itself is stable.
Implementation 4 · gmux-brain (memory layer)
gmux-brain is not a separate UI or terminal layer — it's the intelligence layer that makes every agent pane smarter before it starts.
The problem it solves
Without memory, every qalcode2 session starts cold. The agent doesn't know the codebase architecture. It doesn't know decisions made in previous sessions. It doesn't know what the other 13 agents running alongside it are currently doing. The developer has to re-explain context repeatedly, or paste it in manually, or waste the first 10 exchanges getting the agent up to speed.
gmux-brain injects ~600 tokens of structured context into each new agent pane automatically, drawn from three sources:
| Source | Technology | Answers |
|---|---|---|
| Structural | Graphify (AST + NetworkX) | What calls what, god nodes, community structure, architecture |
| Episodic | MemPalace (ChromaDB + SQLite) | Why we made decision X, when Y was changed, what was agreed |
| Workspace | gmux native SSE | What other agents are doing right now |
The MCP server
gmux-brain exposes a single MCP endpoint that the opencode instance in each pane can call:
{
"mcpServers": {
"gmux-brain": {
"type": "stdio",
"command": "python3",
"args": ["/home/fivelidz/projects/gmux-brain/src/router.py"]
}
}
}
Available tools: brain_query, brain_context, brain_graph_query,
brain_memory_search, brain_memory_add, brain_kg_add,
brain_kg_query, brain_status.
Query routing without an LLM
The router dispatches queries to the right memory layer using keyword matching, not an LLM call. This is intentional — routing should be fast and free, not a round-trip to a model:
| Query contains | Routes to |
|---|---|
| "what calls", "god nodes", "architecture", "class", "function" | Graphify |
| "why did we", "when was", "who decided", "history" | MemPalace |
| "other agents", "pane status", "workspace" | gmux state |
| Ambiguous | Both |
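A minimal sketch of that dispatch, with the keyword lists taken from the table (the layer names and function shape are illustrative):

```python
# Keyword → memory layer, in the order given by the routing table.
ROUTES = {
    "graphify": ("what calls", "god nodes", "architecture", "class", "function"),
    "mempalace": ("why did we", "when was", "who decided", "history"),
    "gmux": ("other agents", "pane status", "workspace"),
}

def route_query(query):
    """Return the memory layers a query should hit; the ambiguous case
    (no keyword match) fans out to both graph and episodic memory."""
    q = query.lower()
    hits = [layer for layer, keys in ROUTES.items()
            if any(k in q for k in keys)]
    return hits or ["graphify", "mempalace"]
```

Substring matching is crude, but it costs microseconds and zero tokens — exactly the trade the section describes.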
Implementation 5 · gmux.ai (the product face)
The landing page is a single HTML file with zero dependencies, deployed on Cloudflare Pages. The vote/email backend is a Cloudflare Worker with KV storage.
The counter
The interest counter on the landing page isn't a raw click count. It uses a formula:
display = real × 5 + floor(log(real+1) × 2.3) + (real×11+3)%4
This produces a number that:
- Is always strictly increasing
- Hovers around 5–5.8x the real click count
- Is never a clean multiplier (so it doesn't look obviously fabricated)
- Drips toward its target at ~2 increments per hour via a scheduled Cloudflare Worker
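The formula is straightforward to implement (assuming log is the natural logarithm, which matches the stated ~5–5.8× range):

```python
import math

def display_count(real):
    """Inflate the raw click count into the displayed number.

    The *5 term dominates; the log term adds slow extra growth; the
    modular term adds per-step jitter so the result is never a clean
    multiple of the real count.
    """
    return (real * 5
            + math.floor(math.log(real + 1) * 2.3)
            + (real * 11 + 3) % 4)
```

Strict monotonicity follows from the terms: each click adds 5 from the first term, the log term never decreases, and the jitter term can fall by at most 3.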
The audience
The early interest data from the Cloudflare Worker tells an interesting story:
- 21 of 23 visitors are Australian (en-AU locale, Australia/Sydney timezone)
- Mostly iPhone / Safari — mobile users
- 14 mobile vs 8 desktop
This is a local audience reached through the developer's existing network. The product hasn't been posted anywhere public (no HN, no GitHub, no Reddit). Broadening distribution is the next step after the Tauri app stabilises.
The demo
A live demo exists at gmux.ai/demo/ — not yet linked from the main page. It uses mock
data to show the v3 UI running in the browser without needing a real gmux backend. The gesture controls
and visual layout work; agent state is simulated.
Comparing the implementations
| Version | What it is | Status | Unique value |
|---|---|---|---|
| Assembled prototype | Symlinked MASTER_PROJECTS | Concept only | Showed existing pieces could combine |
| Python terminal stack | Daemon + tmux status bar | ✅ Shipped (PyPI/AUR) | Zero-friction entry point, no window needed |
| Browser overlay | Floating transparent window | ❌ Abandoned | Proved Wayland requires owning the terminal |
| Tauri desktop app | Terminal host with sidebar + gesture | 🔄 Pre-release | Correct architecture, gesture canvas, real PTY |
| gmux-brain | MCP memory router | ⚠️ Built, not wired | 600-token context injection per agent pane |
| gmux.ai | Landing + email + demo | ✅ Live | Product identity, early audience |
The honest state of things
The Python terminal stack is real and working. If you want live AI state detection in your tmux
status bar today, pip install gmux && gmux --status-only does it.
The Tauri app is close. PTY, sidebar, and live data are working. Voice and gesture aren't fully wired. The installer is paused on purpose — shipping a bad install experience is worse than shipping nothing. The five criteria for resuming installer work are clear and measurable.
gmux-brain is an interesting idea sitting idle. Wiring graphify + kalarc-memory into it and registering it in opencode.json would be a high-value afternoon's work.
The terminal AI agent space is moving fast. DeepSeek-TUI gained 21,752 GitHub stars in one week in May 2026 — showing the audience is real and hungry. gmux's combination of gesture + voice + phone remote + live AI state is genuinely novel. The window to be first is open.
All implementations: MIT licensed.
Core terminal stack: pip install gmux | paru -S gmux