Running ten coding agents at once is already normal. Knowing which one needs you right now isn't.
Open a terminal today and the question isn't "which file should I edit?"
It's "which of my ten agents is waiting for me to say yes, and which one is quietly burning my API budget on the wrong thing?"
tmux is wonderful. It can hold all ten of them. But every pane looks the same.
Black background. Spinner. Maybe a prompt. You alt-tab through them like checking on a row of kettles,
hoping you catch the one that boiled over before it spills.
gmux is the traffic light layer on top of tmux.
A coloured dot per agent, updated live. Green when it's working. Honey when it's waiting for you.
Clay when it's asking permission to do something risky. You glance, you know, you move on.
working · agent is running tools or streaming
waiting · agent is idle, expecting your next message
permission · agent wants to run something — needs you
done · agent finished, you haven't looked yet
idle · plain shell, no agent here
Figure 01 — the state legend, read in a glance
Not screen-scraping. Live API.
Most "agent monitors" you've seen just pattern-match the terminal output: looking for a spinner character,
a prompt symbol, the word "error". It works most of the time and breaks the rest.
gmux reads state from the agent's own HTTP API. OpenCode and qalcode2 already expose one per session.
Server-sent events stream the status the moment it changes — no polling lag, no false positives.
When an agent goes from working to needs permission, the dot changes within the same
second the prompt appears in its pane.
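To make the shape of that wiring concrete, here is a minimal sketch (not gmux's actual code), assuming the agent exposes a per-session SSE endpoint and that the coloured dot is stored as a tmux pane option. The URL, the JSON event shape, and the option name are all illustrative.

```python
# Sketch: follow one agent's SSE status stream and mirror it onto its tmux pane.
# Endpoint URL, event shape, and the @agent_status option name are assumptions.
import json
import subprocess

import requests


def watch_agent(session_url: str, pane_id: str) -> None:
    with requests.get(session_url, stream=True, timeout=None) as resp:
        for raw in resp.iter_lines():
            if not raw.startswith(b"data:"):
                continue  # skip SSE keep-alives and comments
            event = json.loads(raw[len(b"data:"):])
            status = event.get("status", "unknown")  # working / waiting / permission / done
            # Tag the pane; a status-bar format string can read the option back out.
            subprocess.run(
                ["tmux", "set-option", "-p", "-t", pane_id, "@agent_status", status],
                check=False,
            )


watch_agent("http://127.0.0.1:4096/session/demo/events", "%3")
```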
[Figure mockup] claude-1 · 4/6, gpt-4o · 2/4, deploy · 3/3, sonnet · 1/3, research · 5/5, haiku · 2/5, fish · —, gemini · 0/2
Figure 02 — eight agents, one calm view
What you actually do differently.
With the traffic light in front of you, a few small habits shift. You stop alt-tabbing.
You stop wondering whether the silent pane is thinking or hung. You start treating the row of dots
as a single signal: is anything red? If no, you keep writing. If yes, one keystroke takes you
to the agent that needs you, and one gesture sends you back.
The promise is small: you should never have to ask a terminal "what are you doing?" again.
Everything else in gmux — the gestures, the voice, the phone, the memory layer —
is built on top of this one observation. Once you can see the fleet,
every other interaction stops being a guess.
The terminal stays alive when the gestures crash. The phone stays alive when the desktop sleeps.
Drawing the seams carefully is most of the work.
The first version of gmux was one big process that did everything: tmux orchestration, hand tracking,
voice transcription, phone bridge. When any one piece misbehaved, the whole thing went down,
and so did the work happening in the terminals it was supposed to be helping with.
That was the wrong shape. gmux now sits in three layers that fail independently.
Anything you care about lives in the bottom one; everything else can crash without you noticing.
Layer 01
Terminal · always running
tmux, your agents, the Python services that watch them, the WebSocket bridge that broadcasts state. This layer is the foundation. If everything else dies, your sessions, your work, your AI conversations all survive — exactly as if gmux had never started.
Layer 02
Desktop UI · the window with eyes
The Tauri app with the xterm.js terminal, the gesture canvas, the agent sidebar. Reads from the bottom layer. If the camera glitches or MediaPipe gets confused, you close the window, reopen it, and everything below it is untouched.
Layer 03
Phone · the remote control
A web app served straight from your machine. Same data source, different surface. Lose signal in the lift, the phone reconnects when you come back up. Close the tab, reopen, pick up where you left off.
Figure 03 — three layers, three crash domains
Why we picked Tauri, not Electron.
The desktop app is a Tauri build. Five-megabyte binary, native WebView, real PTY through a tiny Rust shim,
no Node runtime shipped to your laptop. It opens in under a second on the kind of mid-spec Linux box
that AI coding actually happens on, and it doesn't squat 400 MB of RAM just to render a window.
The tradeoff: WebKit on Linux doesn't have every API a Chromium build would. So voice runs through a
separate Python sidecar instead of the Web Speech API, which turned out to be a win — local
Whisper is more accurate, works offline, and doesn't make a network call every time you say something.
Why the camera goes through a broker.
The hardest bug in the early builds was simple: open gmux, your video call breaks.
The webcam is exclusive on Linux — one process owns it at a time. So gmux runs an ffmpeg
broker as a background service. Real camera goes in one end, a virtual camera (via v4l2loopback)
comes out the other. Anyone can read from the virtual device simultaneously: gmux, Brave, OBS, Zoom.
The same broker means the camera-on light goes on once when you log in and stays on,
which is honestly the most polite thing it can do.
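The broker itself is not much more than an ffmpeg relay. A minimal sketch, assuming /dev/video0 is the physical camera and /dev/video10 is the v4l2loopback device; the paths and the loopback module options will vary per machine.

```python
# Sketch: relay the exclusive physical webcam into a shared v4l2loopback device so
# gmux, the browser, OBS, and Zoom can all read it at once. Device paths are assumptions.
import subprocess

REAL_CAM = "/dev/video0"       # the one-owner-at-a-time physical device
VIRTUAL_CAM = "/dev/video10"   # created beforehand with: sudo modprobe v4l2loopback video_nr=10

broker = subprocess.Popen([
    "ffmpeg",
    "-f", "v4l2", "-i", REAL_CAM,    # read frames from the real camera
    "-vf", "format=yuv420p",         # a pixel format most readers accept
    "-f", "v4l2", VIRTUAL_CAM,       # write them to the shared virtual camera
])
broker.wait()  # in gmux this runs as a long-lived background service instead
```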
A good architecture isn't the one where everything works. It's the one where you can tell exactly what broke.
Most of the design notes for gmux read like that quote in different costumes. The terminal layer doesn't
know the desktop layer exists. The desktop layer doesn't depend on the phone. Each one publishes its
state, each one subscribes to what it needs, and none of them care if the others come and go.
We tried thirty gestures and kept eight. The survivors all answer the same question:
would you do this naturally, without thinking?
Hand-tracking demos look great in a thirty-second clip and feel terrible after thirty minutes.
Your arm gets tired. The gesture you used to scroll is the same one you used to wave at your colleague,
and now the page is at the bottom. By the end of the afternoon you'd rather use the keyboard.
The cure isn't more gestures. It's a small vocabulary that splits between two hands so that
accidental motions on one side never trigger an action on the other.
Your dominant hand navigates. Your other hand issues commands.
Move with the right, decide with the left.
Left · command
Thumbs up · approve permission, or send the voice draft
Point up · toggle voice listening on or off
Three fingers · jump to the next agent waiting on you
Open palm · open agent chat for the focused pane
Right · navigate
Pinch & drag · scroll, or slide to switch window
Point & hold · dwell-select the pane under your finger
Pinch release · click — confirm the dwell selection
Two-hand spread · change how many agent columns you see
Figure 04 — the eight gestures that survived testing
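The split is easy to picture as a dispatch table keyed by hand, which is roughly what the gesture engine does. The labels and action names below are illustrative, not gmux's internal identifiers.

```python
# Sketch: the eight-gesture vocabulary as a per-hand dispatch table.
# Gesture labels and action names are illustrative, not gmux internals.
COMMAND_HAND = {                 # left hand: decide
    "thumbs_up": "approve_or_send_voice_draft",
    "point_up": "toggle_voice_listening",
    "three_fingers": "jump_to_next_waiting_agent",
    "open_palm": "open_agent_chat",
}

NAVIGATE_HAND = {                # right hand: move
    "pinch_drag": "scroll_or_switch_window",
    "point_hold": "dwell_select_pane",
    "pinch_release": "confirm_dwell_selection",
    "two_hand_spread": "resize_agent_columns",
}


def dispatch(hand: str, gesture: str) -> str | None:
    """An action fires only when the gesture arrives on the hand it belongs to."""
    table = COMMAND_HAND if hand == "left" else NAVIGATE_HAND
    return table.get(gesture)
```

The point of the table is the asymmetry: a stray wave on the navigation hand can never approve a permission prompt.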
The passive mode that saved everything.
Gestures only fire when you're being deliberate. When you're typing — the most common state —
gmux runs in a high-threshold passive mode. Swipes are blocked, gentle motions ignored.
Hold an open palm steady for 1.5 seconds and it switches into active mode, where every gesture
is live.
It sounds small. In practice it's the single feature that took the system from annoying after fifteen minutes
to quietly running for hours.
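In code, the switch is a small dwell timer. A sketch, assuming normalised palm coordinates arrive per frame; the 1.5-second hold comes from the text, the steadiness tolerance is an assumption.

```python
# Sketch: passive mode until an open palm has been held roughly still for 1.5 s.
# The drift tolerance and the frame-by-frame interface are assumptions.
import time


class ModeGate:
    HOLD_SECONDS = 1.5    # from the text
    MAX_DRIFT = 0.03      # normalised movement still counted as "steady" (assumption)

    def __init__(self) -> None:
        self.active = False
        self._anchor: tuple[float, float] | None = None
        self._since: float | None = None

    def update(self, gesture: str, palm_xy: tuple[float, float]) -> bool:
        """Feed one frame of tracking; returns True while gestures may fire."""
        if gesture != "open_palm":
            self._anchor = self._since = None      # dwell broken, stay in the current mode
            return self.active

        if self._anchor is None or self._drifted(palm_xy):
            self._anchor, self._since = palm_xy, time.monotonic()   # (re)start the dwell
        elif time.monotonic() - self._since >= self.HOLD_SECONDS:
            self.active = True                     # palm held steady long enough

        return self.active

    def _drifted(self, palm_xy: tuple[float, float]) -> bool:
        dx = abs(palm_xy[0] - self._anchor[0])
        dy = abs(palm_xy[1] - self._anchor[1])
        return max(dx, dy) > self.MAX_DRIFT
```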
Voice, when the keyboard is a long way away.
Voice and gesture are partners, not alternatives. Stand at a projector across the room and there's no keyboard
to reach for. You'd rather just say "approve it" or "next red" or
"claude, write the test for that".
A wake word — kalarc by default, configurable — opens the channel.
Transcription happens on-device with faster-whisper, about 400ms end-to-end.
Anything that looks like a navigation command goes to tmux: "next window", "split", "previous".
Anything else gets typed into whichever agent is focused.
A voice command is just a faster way of doing what your fingers were going to do anyway.
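The routing itself stays simple. A sketch of the split between navigation phrases and everything else, assuming the transcript has already had the wake word stripped; the phrase table and the choice of tmux commands are illustrative.

```python
# Sketch: route a transcript either to tmux navigation or into the focused pane.
# The phrase table and the exact tmux commands used here are assumptions.
import subprocess

NAVIGATION = {
    "next window": ["next-window"],
    "previous": ["previous-window"],
    "split": ["split-window", "-h"],
}


def route(transcript: str) -> None:
    phrase = transcript.strip().lower()
    if phrase in NAVIGATION:
        subprocess.run(["tmux", *NAVIGATION[phrase]], check=False)
    else:
        # Anything else gets typed into whichever pane currently has focus.
        subprocess.run(["tmux", "send-keys", transcript, "Enter"], check=False)
```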
Every agent on your desk, live, from a bus seat. Approve permissions with a thumb,
push-to-talk to any agent, watch progress while the kettle boils.
A specific feeling: you've kicked off three long agent runs before leaving the house.
Halfway to the café one of them hits a permission prompt — "can I install this package?" —
and just sits there. The other two finished an hour ago. You don't know any of this because
you're not at your desk.
The fix isn't a cloud product. It's that the desktop you've already paid for has every piece
of information you need; you just can't see it from outside the house. gmux turns your
own machine into the back-end and your phone into the dashboard.
[Phone mockup] gmux · 5 agents: claude · auth (working), deploy (needs you), gpt · tests (waiting), research (done), haiku · docs (idle), with a push-to-talk button underneath
Approve from anywhere
The permission prompt that froze your agent shows up on your phone. One tap to allow, one tap to deny. You don't need to be home.
Push to talk
Hold the mic, speak. The transcript goes to whichever agent you've selected on the phone — and gets answered by the model that pane is wired to.
Volume keys cycle agents
Phone in your pocket, screen off — volume down moves to the next pane, volume up is push-to-talk. The page exposes them through the standard media-session API.
Figure 05 — the phone PWA, live from your own machine
Not a cloud account. Your own URL.
The phone UI is just a web page that your desktop serves. No login. No third-party service.
On a home network it's reachable directly; over the open internet, gmux is happy to live behind
Tailscale or a Cloudflare Tunnel — same data, same UI, encrypted end-to-end without any of the
secrets leaving your machine.
If your laptop is off, the dashboard is off. That's the right tradeoff.
Nothing about your agents, your code, your conversations sits on someone else's server waiting
to be subpoenaed or leaked.
The carousel sorts by who needs you.
Open the PWA and the agents are ordered by urgency: anything asking for permission first,
then anything waiting on you, then the ones still working, then the ones already finished.
The top of the list is always where your attention should land. Scroll down only if you're curious.
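The ordering is nothing clever, just a sort key over the same states the legend uses. A sketch:

```python
# Sketch: order the carousel by who needs you, using the state names from the legend.
URGENCY = {"permission": 0, "waiting": 1, "working": 2, "done": 3, "idle": 4}


def carousel_order(agents: list[dict]) -> list[dict]:
    """Permission requests first, then panes waiting on you, then everything else."""
    return sorted(agents, key=lambda a: URGENCY.get(a["state"], len(URGENCY)))
```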
The phone isn't a smaller terminal. It's a smaller, calmer view of the one you already have.
Every agent pane starts amnesiac by default. gmux-brain gives each one about six hundred tokens
of context — structural, episodic, workspace — before it touches a single file.
Most days the worst part of working with coding agents is the re-explanation.
You spent forty minutes yesterday talking your agent through the architecture, the conventions,
why you rejected the obvious approach. Today you open a fresh pane and it's all gone.
The agent grep-searches blindly through the same files, proposes the same first solution
you already vetoed, and reaches the wrong conclusion in a slightly different way.
gmux-brain is the memory layer that fixes this.
It's not one new database — it's a router that combines three kinds of memory that already exist,
and hands them to every agent pane as inline context the moment it starts.
Structural memory
What the code actually is.
A live graph of what calls what. The "god nodes" everything depends on. The clusters that form
natural modules. Built by a tree-sitter pass over your repo, refreshed every git commit.
asks: "what calls the voice router?" answers: 12 references across 4 files, two of them in tests, one in a dead branch.
Episodic memory
What you decided, and when.
Past sessions, conversations, design notes, the half-finished thought from last Thursday.
Stored as searchable text and as dated facts in a small knowledge graph.
asks: "why are we using faster-whisper?" answers: chosen 2026-02-10, kokoro rejected — 400ms latency was too high.
Workspace memory
What the other agents are doing right now.
Live state from every other pane. Which agent is blocked. Which one just finished and is waiting
to hand off. The cross-pane context that turns ten parallel agents into a real team.
asks: "is anyone else touching the auth module?" answers: pane 3 finished it 14 minutes ago and is waiting for tests.
Figure 06 — three memory layers, one MCP endpoint
One endpoint. The agent doesn't need to know how it works.
The agents in each pane connect to a single MCP endpoint called brain_query.
Ask it a question and a small router decides which layer to consult. Keywords like
"what calls" or "god nodes" route to the structural graph. "Why did we"
or "when was" route to episodic memory. "What's the other pane doing" hits the live
workspace. Ambiguous questions get all three.
No new database, no new LLM. The routing is plain keyword matching — fast, free, and easy to debug
when it gets something wrong.
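A sketch of that router, with illustrative keyword lists; the real set is presumably longer.

```python
# Sketch: the brain_query router. Plain keyword matching, no model call, easy to debug.
# The keyword lists here are illustrative, not the shipped set.
STRUCTURAL = ("what calls", "god node", "depends on", "imports")
EPISODIC = ("why did we", "when was", "last time", "decided")
WORKSPACE = ("other pane", "anyone else", "right now", "currently")


def route_query(question: str) -> set[str]:
    q = question.lower()
    layers = set()
    if any(k in q for k in STRUCTURAL):
        layers.add("structural")
    if any(k in q for k in EPISODIC):
        layers.add("episodic")
    if any(k in q for k in WORKSPACE):
        layers.add("workspace")
    return layers or {"structural", "episodic", "workspace"}   # ambiguous: ask all three
```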
Six hundred tokens, automatic.
When a pane opens on a project, gmux-brain assembles a context block of about 600 tokens
and hands it to the agent before the first message: a one-page summary from the code graph,
the most relevant decisions from past sessions, and a snapshot of what the rest of the workspace
is doing.
The agent doesn't ask for it. The user doesn't write it. It's just there, the same way a new
teammate gets a handover doc instead of a blank page.
Without gmux-brain, an agent starts with zero tokens of context. With it, every agent starts as if it had been there last week.
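The assembly step is equally plain. A sketch, assuming three stand-in inputs for the graph summary, the past decisions, and the workspace snapshot; the chars-divided-by-four token estimate and the section headings are rough placeholders.

```python
# Sketch: build the ~600-token handover block a new pane gets before its first message.
# The inputs are stand-ins; len(text) // 4 is only a rough token estimate.
BUDGET_TOKENS = 600


def rough_tokens(text: str) -> int:
    return len(text) // 4


def handover(code_summary: str, decisions: list[str], workspace: str) -> str:
    sections = [
        "Code map:\n" + code_summary,
        "Recent decisions:\n" + "\n".join(decisions),
        "Workspace right now:\n" + workspace,
    ]
    block, used = [], 0
    for section in sections:
        cost = rough_tokens(section)
        if used + cost > BUDGET_TOKENS:
            section = section[: (BUDGET_TOKENS - used) * 4]   # trim the overflowing section
            cost = BUDGET_TOKENS - used
        block.append(section)
        used += cost
        if used >= BUDGET_TOKENS:
            break
    return "\n\n".join(block)
```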
gmux is a calm UI layer on top of tmux. Your config still works, your muscle memory still works,
your sessions still survive. We just added eyes, hands, and a phone.
Every few months someone proposes replacing tmux. Rust rewrite. New protocol. Modern terminal.
Every one of those projects spends two years rebuilding a fraction of what tmux already does well,
and the migration cost is borne by people who didn't ask for any of it.
gmux refuses to be that. tmux is the foundation. Your existing config file is honoured.
Your prefix key is whatever you set it to. Your status bar, your keybindings, your tmux-resurrect
setup, your .tmux.conf tweaks from 2019 — none of it changes.
Without gmux
tmux on its own
Black panes. Spinner or prompt. Alt-tab between them looking for the one that needs you.
Permissions block silently. Voice and gestures aren't part of the conversation.
With gmux
The same tmux, with eyes
Coloured dots per pane. A sidebar of agents sorted by who needs you. Gestures to switch.
Voice to direct. A phone to approve from. tmux underneath, untouched.
What we deliberately didn't build.
A new terminal emulator. xterm.js renders the panes inside the desktop app; tmux still does the work below it.
A cloud account. Your data, your machine. Tailscale or a tunnel if you want it remote.
A custom model. Bring your own — Claude, GPT, Gemini, DeepSeek, local Ollama. gmux watches them; it doesn't replace them.
Glasses, headsets, or VR. Webcam gestures and a microphone are enough for the next twelve months. AR glasses come later, when the hardware stops being awkward.
An IDE. If you want one, your editor is one tmux pane away. gmux doesn't have opinions about it.
What's actually shipped versus what's still being built.
The honest state of things: the status bar, the sidebar, the gesture engine, the voice daemon, the phone PWA,
session restore, the camera broker — all working. The Tauri desktop wrapper that ties them
into a single window is the piece we're polishing now. The installer comes after that, because
packaging something that doesn't run cleanly is premature.
If you want to follow along, the email signup on the homepage will tell you when each milestone lands.
We've capped the early access invites so the first round of users gets real attention rather than
a Discord channel and a shrug.
Calm is a design choice. It's not the absence of features — it's the discipline to add only the ones that don't ask for your attention.
Multiple cameras. Scheduled nudges. The next twelve months.
Two things on the near-horizon: a wider gesture field built from more than one webcam,
and agents that wake themselves up at the right moment to check on long-running work.
Everything in pieces 01 through 06 is either shipped or in active development. This piece is different.
It's the public roadmap — what's coming next, in roughly the order we plan to build it.
Some of it is sketched, some of it is prototyped, none of it is finished.
We're publishing it here because the people most likely to use gmux are also the people most likely
to have an opinion about what should come first. If any of these matters to you,
let us know on the interest form and it moves up the list.
Now
What's already working
Status bar, agent sidebar, gesture engine, voice daemon, phone PWA, camera broker, session restore, gmux-brain memory layer.
Next
Polished Tauri desktop · one-command install
A single binary that ties the running backend into a window you can open from your launcher. AUR package, .deb, .AppImage.
Soon
Multi-camera gesture field
Two or more webcams stitched into a single tracking volume. Wider arm range, fewer dead zones, projector-friendly.
Soon
Scheduled prompts & self-triggered check-ins
Agents that wake themselves up at the right moment — to test, to ask, to summarise, to escalate when a long-running task drifts.
Later
AR-glasses gestures
The same gesture vocabulary, hands-free, with display-glass hardware as it becomes properly usable.
Figure 07 — the rough order of what's coming next
Multi-camera · a wider field of view.
Single-camera gesture tracking has a soft ceiling. Stretch your arm fully to one side and your hand
drifts off the visible frame. Step back from the desk to think and the camera loses you entirely.
For a projector or shared-screen setup — two or three metres back from the wall — one webcam isn't enough.
The fix is geometrically simple and operationally annoying. Use two cameras:
one above the screen for desk work, one on the side or ceiling for projector and standing use.
Both feed into MediaPipe; the gmux gesture engine fuses the two coordinate systems into a single
tracking volume so a hand that exits one camera enters the other without a glitch.
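Since this is roadmap rather than shipped code, take the following only as one plausible fusion rule: trust whichever camera sees the hand more confidently, and map its normalised coordinates into a shared span using per-camera calibration. The calibration numbers are invented.

```python
# Sketch of one plausible fusion rule for the multi-camera roadmap item.
# Calibration offsets/scales are invented; the real engine would measure them.
CALIBRATION = {
    "desk_cam": {"offset": 0.0, "scale": 0.6},   # covers the left part of the volume
    "side_cam": {"offset": 0.5, "scale": 0.5},   # overlaps and extends to the right
}


def fuse(detections: dict[str, dict]) -> tuple[float, float] | None:
    """detections: camera name -> {"x": 0..1, "y": 0..1, "confidence": 0..1}."""
    seen = {cam: d for cam, d in detections.items() if d["confidence"] > 0.5}
    if not seen:
        return None                               # neither camera has the hand
    cam, d = max(seen.items(), key=lambda item: item[1]["confidence"])
    cal = CALIBRATION[cam]
    return cal["offset"] + d["x"] * cal["scale"], d["y"]
```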
01 · Single camera
Hand stays inside one cone. Step sideways or back, and tracking quietly drops.
02 · Two cameras, fused
Two cones overlap in the middle. Hands tracked across the full sweep — desk, projector, room.
Figure 08 — one camera versus two, same person, same room
Practically, this also means any old USB webcam earns its keep. You probably have one in a drawer.
A second cheap camera mounted off-axis turns out to be one of the highest-leverage upgrades to a gesture system
— far more impactful than a more expensive single camera.
What changes for the user: the same gesture vocabulary, but you can stand up and walk away from the desk
without losing tracking. Two people can stand in front of a wall-projected gmux and both have working hands.
And on a desk, the awkward "my hand drifted off the right side" failure mode just stops happening.
Scheduled prompts · agents that wake themselves up.
The other half of this piece is about time. Right now, every conversation with an agent is a conversation
you start. You ask, it answers. If the agent's last reply was "I'll let you know when the tests finish,"
that promise is empty — agents don't have a clock, and they certainly don't have your calendar.
Scheduled prompts fix this. You can attach a small set of timed and event-driven triggers
to any pane. When the trigger fires, gmux types a prompt into the agent on your behalf and the answer
surfaces in the sidebar like any other change of state. The agent comes back to you when there's something
worth saying — not because you remembered to check.
Timed
"Every twenty minutes, summarise."
Recurring or one-shot. Cron-style if you want, plain English otherwise. The agent self-prompts at the interval and gives you a rolling status.
Idle
"If quiet for 10 minutes, check in."
The agent has been silent. Maybe stuck, maybe finished, maybe waiting for itself. A gentle self-prompt asks: what's the status? The reply goes into the sidebar.
File change
"When this file changes, review it."
Wired to the filesystem. The agent gets prompted the moment another pane saves the file it's responsible for — automatic code review without you asking.
Command finished
"When tests finish, interpret the result."
A long-running command exits. Whatever it printed is handed to the agent with a prompt: "what does this mean, and what should we do?"
Cross-pane
"When pane 3 finishes, hand off to pane 5."
Pane 3 finishes the auth module. Pane 5 wakes up with that context already in its prompt: "the auth module is done, please write tests for it."
Escalation
"If this agent burns $5, tell me."
Budget, time, or token thresholds. Cross the line and the agent posts a notification to your phone before it spends another dollar.
Figure 09 — six kinds of trigger, all wired through the same scheduler
How it'll actually work.
A trigger is a small piece of YAML or a one-line UI entry, attached to a pane. When it fires, the gmux
bridge types a prompt into that pane's agent — same path that the phone, the voice daemon, and your
own keystrokes already use. There's no special privileged channel; a scheduled prompt is just a normal prompt
that you happen not to be typing in person.
Everything visible to a human is visible in the sidebar: when a trigger fired, what it sent, what the agent
replied. You can edit, pause, or delete any trigger at any time. None of it runs in the cloud — the scheduler
is the same Python daemon that already runs the rest of the workspace.
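To make the shape concrete, here is a sketch of a single timed trigger and the loop that delivers it, assuming the prompt is typed into the pane over the same tmux path as everything else. The YAML keys are a guess at a plausible schema, not a published one.

```python
# Sketch: one timed trigger and a tiny scheduler loop that delivers it through tmux.
# The YAML keys are a guess at a plausible shape, not gmux's published schema.
import subprocess
import time

import yaml

TRIGGER = yaml.safe_load("""
pane: "%3"
every_minutes: 20
prompt: "Summarise what you have done since the last check-in."
""")


def run_scheduler() -> None:
    interval = TRIGGER["every_minutes"] * 60
    while True:
        time.sleep(interval)
        # Same delivery path as the phone, the voice daemon, and your own keystrokes:
        # the scheduled prompt is simply typed into the target pane.
        subprocess.run(
            ["tmux", "send-keys", "-t", TRIGGER["pane"], TRIGGER["prompt"], "Enter"],
            check=False,
        )
```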
Without scheduled prompts, an agent forgets you exist the moment it finishes a reply. With them, the agent has a clock — and the discipline to use it.
What we won't do (and why).
Triggers don't auto-approve permissions. A scheduled prompt can ask the agent to do something, but if the agent then needs to write to disk or hit the network, the permission prompt still comes to you. Automation never silently grows new powers.
Triggers don't fan out across machines. Everything is local. If your laptop is asleep, your triggers wait. We're not building a distributed task queue.
Triggers don't replace a human's judgment about what's important. The default is conservative — minutes, not seconds — because the failure mode of a too-eager agent is a much worse experience than a too-quiet one.
Help us pick the order.
Multi-camera and scheduled prompts are roughly tied for "the next big thing." We'd build both, but order matters
when there's a small team and a long list. If you've read this far, you have a preference. Sign up on the
interest form below, and if either feature is the reason you'd use gmux, say so — that's the signal that decides.