Field Notes

Long reads from the workshop on gmux.

What we're building, why, and what we're learning as we go. Honest about what works, honest about what doesn't.

01 · The pitch

The traffic light for your AI fleet.

Running ten agents in parallel is already normal. Knowing which one needs you right now isn't. Here's what we built to fix it.

Read piece
02 · The architecture

Three layers that fail independently.

The terminal stays alive when the gestures crash. The phone stays alive when the desktop sleeps. A look at how the seams are drawn.

Read piece
03 · The gesture vocabulary

One hand to navigate, one hand to command.

We tried thirty gestures, kept eight. The ones that survived all answer the same question: would you do this naturally, without thinking?

Read piece
04 · The phone is the remote

Pocket-sized mission control.

Every agent on your desk, live, from a bus seat. Approve permissions with a thumb, push-to-talk to any agent, watch progress while the kettle boils.

Read piece
05 · The memory layer

Agents that remember the last session.

Every agent pane starts amnesiac. gmux-brain gives each one ~600 tokens of structural, episodic and workspace context before it touches a file.

Read piece
06 · What it isn't

Not a tmux replacement.

gmux is a calm UI layer on top of tmux. Your config still works, your muscle memory still works, your sessions still survive. We just added eyes.

Read piece
07 · Coming soon

Multiple cameras. Scheduled nudges. The next twelve months.

Two cameras for a wider gesture field. Agents that wake themselves up to check on long-running work. The roadmap we're building toward.

Read piece
01 Why gmux exists

The traffic light for your AI fleet.

Running ten coding agents at once is already normal. Knowing which one needs you right now isn't.

Open a terminal today and the question isn't "which file should I edit?" It's "which of my ten agents is waiting for me to say yes, and which one is quietly burning my API budget on the wrong thing?"

tmux is wonderful. It can hold all ten of them. But every pane looks the same. Black background. Spinner. Maybe a prompt. You alt-tab through them like checking on a row of kettles, hoping you catch the one that boiled over before it spills.

gmux is the traffic light layer on top of tmux. A coloured dot per agent, updated live. Green when it's working. Honey when it's waiting for you. Clay when it's asking permission to do something risky. You glance, you know, you move on.

working · agent is running tools or streaming
waiting · agent is idle, expecting your next message
permission · agent wants to run something — needs you
done · agent finished, you haven't looked yet
idle · plain shell, no agent here
Figure 01 — the state legend, read in a glance

Not screen-scraping. Live API.

Most "agent monitors" you've seen just pattern-match the terminal output: looking for a spinner character, a prompt symbol, the word "error". It works most of the time and breaks the rest.

gmux reads state from the agent's own HTTP API. OpenCode and qalcode2 already expose one per session. Server-sent events stream the status the moment it changes — no polling lag, no false positives. When an agent goes from working to needs permission, the dot changes within the same second the prompt appears in its pane.
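In sketch form, the consuming side of that stream is tiny. The event payload below is illustrative — each agent's real API differs — but the pattern holds: one `data:` line per state change, mapped straight to a dot colour (the colours for done and idle are our own placeholders; the piece only names green, honey, and clay).

```python
import json

# Status → dot colour. Green/honey/clay are from the piece;
# the last two are illustrative.
DOT = {"working": "green", "waiting": "honey", "permission": "clay",
       "done": "blue", "idle": "grey"}

def parse_event(line):
    """Turn one SSE `data:` line into a (status, dot colour) pair,
    or None for comments, heartbeats, and blank keep-alive lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    event = json.loads(line[len("data:"):])
    status = event.get("status", "idle")
    return status, DOT.get(status, "grey")

def watch(stream):
    """Yield state changes from an iterable of SSE lines — in gmux this
    would be the HTTP response streaming from the agent's own API."""
    for raw in stream:
        parsed = parse_event(raw)
        if parsed:
            yield parsed
```

No polling loop anywhere: the dot changes when a line arrives, and not before.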

claude-1 4/6 · gpt-4o 2/4 · deploy 3/3 · sonnet 1/3 · research 5/5 · haiku 2/5 · fish · gemini 0/2
Figure 02 — eight agents, one calm view

What you actually do differently.

With the traffic light in front of you, a few small habits shift. You stop alt-tabbing. You stop wondering whether the silent pane is thinking or hung. You start treating the row of dots as a single signal: is anything red? If no, you keep writing. If yes, one keystroke takes you to the agent that needs you, and one gesture sends you back.

The promise is small: you should never have to ask a terminal "what are you doing?" again.

Everything else in gmux — the gestures, the voice, the phone, the memory layer — is built on top of this one observation. Once you can see the fleet, every other interaction stops being a guess.

I want this Read piece 02 →
02 How it's put together

Three layers that fail independently.

The terminal stays alive when the gestures crash. The phone stays alive when the desktop sleeps. Drawing the seams carefully is most of the work.

The first version of gmux was one big process that did everything: tmux orchestration, hand tracking, voice transcription, phone bridge. When any one piece misbehaved, the whole thing went down, and so did the work happening in the terminals it was supposed to be helping with.

That was the wrong shape. gmux now sits in three layers that fail independently. Anything you care about lives in the bottom one; everything else can crash without you noticing.

Layer 01

Terminal · always running

tmux, your agents, the Python services that watch them, the WebSocket bridge that broadcasts state. This layer is the foundation. If everything else dies, your sessions, your work, your AI conversations all survive — exactly as if gmux had never started.

Layer 02

Desktop UI · the window with eyes

The Tauri app with the xterm.js terminal, the gesture canvas, the agent sidebar. Reads from the bottom layer. If the camera glitches or MediaPipe gets confused, you close the window, reopen it, and everything below it is untouched.

Layer 03

Phone · the remote control

A web app served straight from your machine. Same data source, different surface. Lose signal in the lift, the phone reconnects when you come back up. Close the tab, reopen, pick up where you left off.

Figure 03 — three layers, three crash domains

Why we picked Tauri, not Electron.

The desktop app is a Tauri build. Five-megabyte binary, native WebView, real PTY through a tiny Rust shim, no Node runtime shipped to your laptop. It opens in under a second on the kind of mid-spec Linux box that AI coding actually happens on, and it doesn't squat 400 MB of RAM just to render a window.

The tradeoff: WebKit on Linux doesn't have every API a Chromium build would. So voice runs through a separate Python sidecar instead of the Web Speech API, which turned out to be a win — local Whisper is more accurate, works offline, and doesn't make a network call every time you say something.

Why the camera goes through a broker.

The hardest bug in the early builds was simple: open gmux, your video call breaks. The webcam is exclusive on Linux — one process owns it at a time. So gmux runs an ffmpeg broker as a background service. Real camera goes in one end, a virtual camera (via v4l2loopback) comes out the other. Anyone can read from the virtual device simultaneously: gmux, Brave, OBS, Zoom.

The same broker means the camera-on light goes on once when you log in and stays on, which is honestly the most polite thing it can do.
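A minimal sketch of what the broker boils down to, assuming v4l2loopback is already loaded. The device paths and the bare ffmpeg flags are examples — the real service reads its configuration rather than hardcoding it:

```python
import subprocess

def broker_cmd(real_cam="/dev/video0", virtual_cam="/dev/video10"):
    """Build the ffmpeg command that copies the real webcam into a
    v4l2loopback virtual device. Paths here are illustrative."""
    return [
        "ffmpeg",
        "-f", "v4l2", "-i", real_cam,   # read the exclusive physical camera
        "-f", "v4l2", virtual_cam,      # write the shareable loopback device
    ]

def run_broker():
    # Long-running background service; every consumer (gmux, Brave,
    # OBS, Zoom) reads the virtual device instead of the real one.
    subprocess.run(broker_cmd(), check=True)
```

One process owns `/dev/video0`; everyone else reads the loopback copy.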

A good architecture isn't the one where everything works. It's the one where you can tell exactly what broke.

Most of the design notes for gmux read like that quote in different costumes. The terminal layer doesn't know the desktop layer exists. The desktop layer doesn't depend on the phone. Each one publishes its state, each one subscribes to what it needs, and none of them care if the others come and go.

Read piece 03 → ← Back to 01
03 Gestures, voice, and the question of "natural"

One hand to navigate, one hand to command.

We tried thirty gestures and kept eight. The survivors all answer the same question: would you do this naturally, without thinking?

Hand-tracking demos look great in a thirty-second clip and feel terrible after thirty minutes. Your arm gets tired. The gesture you used to scroll is the same one you used to wave at your colleague, and now the page is at the bottom. By the end of the afternoon you'd rather use the keyboard.

The cure isn't more gestures. It's a small vocabulary that splits between two hands so that accidental motions on one side never trigger an action on the other. Your dominant hand navigates. Your other hand issues commands. Move with the right, decide with the left.

Left · command

Thumbs up · approve permission, or send the voice draft
Point up · toggle voice listening on or off
Three fingers · jump to the next agent waiting on you
Open palm · open agent chat for the focused pane

Right · navigate

Pinch & drag · scroll, or slide to switch window
Point & hold · dwell-select the pane under your finger
Pinch release · click — confirm the dwell selection
Two-hand spread · change how many agent columns you see
Figure 04 — the eight gestures that survived testing

The passive mode that saved everything.

Gestures only fire when you're being deliberate. When you're typing — the most common state — gmux runs in a high-threshold passive mode. Swipes are blocked, gentle motions ignored. Hold an open palm steady for 1.5 seconds and it switches into active mode, where every gesture is live.

It sounds small. In practice it's the single feature that took the system from annoying after fifteen minutes to quietly running for hours.
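A toy version of that gate, using the 1.5-second dwell from above. The active-mode timeout is our own illustrative addition — the piece doesn't say how gmux drops back to passive:

```python
DWELL_SECONDS = 1.5     # hold an open palm this long to go active
ACTIVE_TIMEOUT = 10.0   # illustrative: quiet spell that ends active mode

class GestureGate:
    """Passive/active gating, sketched from the description above.
    Call feed(now, gesture) once per camera frame; `gesture` is a name
    like "open_palm", or None when no hand is confidently tracked."""

    def __init__(self):
        self.mode = "passive"
        self.palm_since = None   # start of the current open-palm hold
        self.last_seen = 0.0     # timestamp of the last tracked gesture

    def feed(self, now, gesture):
        # A long quiet spell drops us back to the safe default.
        if self.mode == "active" and now - self.last_seen > ACTIVE_TIMEOUT:
            self.mode, self.palm_since = "passive", None
        if gesture is not None:
            self.last_seen = now
        if self.mode == "passive":
            if gesture == "open_palm":
                if self.palm_since is None:
                    self.palm_since = now
                if now - self.palm_since >= DWELL_SECONDS:
                    self.mode = "active"   # gestures fire from next frame
            else:
                self.palm_since = None     # hold broken, dwell restarts
            return None                    # passive mode fires nothing
        return gesture                     # active mode: everything is live
```

The important property is the asymmetry: entering active mode takes a deliberate hold, while any break in the hold silently resets the dwell.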

Voice, when the keyboard is a long way away.

Voice and gesture are partners, not alternatives. Stand at a projector across the room and there's no keyboard to reach for. You'd rather just say "approve it" or "next red" or "claude, write the test for that".

A wake word — kalarc by default, configurable — opens the channel. Transcription happens on-device with faster-whisper, about 400ms end-to-end. Anything that looks like a navigation command goes to tmux: "next window", "split", "previous". Anything else gets typed into whichever agent is focused.
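The navigation/dictation split can be sketched as a lookup table. The phrases and tmux arguments below are examples, not the shipped command set:

```python
# Phrases that sound like navigation go to tmux; everything else is
# typed into the focused agent. Table contents are illustrative.
TMUX_COMMANDS = {
    "next window": ["next-window"],
    "previous": ["previous-window"],
    "split": ["split-window", "-h"],
}

def route_utterance(text):
    """Return ("tmux", argv) for navigation, ("agent", text) otherwise."""
    phrase = text.lower().strip()
    for key, argv in TMUX_COMMANDS.items():
        if phrase == key or phrase.startswith(key + " "):
            return "tmux", ["tmux"] + argv
    return "agent", text
```

Anything the table doesn't recognise falls through to the agent untouched, which is the safe default: a misheard command becomes a strange message, not a strange action.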

A voice command is just a faster way of doing what your fingers were going to do anyway.
Read piece 04 → ← Back to 02
04 Your fleet, in your pocket

Pocket-sized mission control.

Every agent on your desk, live, from a bus seat. Approve permissions with a thumb, push-to-talk to any agent, watch progress while the kettle boils.

A specific feeling: you've kicked off three long agent runs before leaving the house. Halfway to the café one of them hits a permission prompt — "can I install this package?" — and just sits there. The other two finished an hour ago. You don't know any of this because you're not at your desk.

The fix isn't a cloud product. It's that the desktop you've already paid for has every piece of information you need; you just can't see it from outside the house. gmux turns your own machine into the back-end and your phone into the dashboard.

Approve from anywhere

The permission prompt that froze your agent shows up on your phone. One tap to allow, one tap to deny. You don't need to be home.

Push to talk

Hold the mic, speak. The transcript goes to whichever agent you've selected on the phone — and gets answered by the model that pane is wired to.

Volume keys cycle agents

Phone in your pocket, screen off — volume down moves to the next pane, volume up is push-to-talk. The page exposes them through the standard Media Session API.

Figure 05 — the phone PWA, live from your own machine

Not a cloud account. Your own URL.

The phone UI is just a web page that your desktop serves. No login. No third-party service. On a home network it's reachable directly; over the open internet, gmux is happy to live behind Tailscale or a Cloudflare Tunnel — same data, same UI, encrypted end-to-end without any of the secrets leaving your machine.

If your laptop is off, the dashboard is off. That's the right tradeoff. Nothing about your agents, your code, your conversations sits on someone else's server waiting to be subpoenaed or leaked.

The carousel sorts by who needs you.

Open the PWA and the agents are ordered by urgency: anything asking for permission first, then anything waiting on you, then the ones still working, then the ones already finished. The top of the list is always where your attention should land. Scroll down only if you're curious.
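That ordering is one sorted() call. The state names match the legend from piece 01; the priority numbers are just an illustration of "most urgent first":

```python
# Urgency order from the piece: permission, then waiting,
# then working, then done. Lower number sorts first.
URGENCY = {"permission": 0, "waiting": 1, "working": 2, "done": 3, "idle": 4}

def carousel_order(agents):
    """Sort (name, state) pairs so the top of the list is always
    where your attention should land."""
    return sorted(agents, key=lambda a: URGENCY.get(a[1], len(URGENCY)))
```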

The phone isn't a smaller terminal. It's a smaller, calmer view of the one you already have.
Read piece 05 → ← Back to 03
05 gmux-brain

Agents that remember the last session.

Every agent pane starts amnesiac by default. gmux-brain gives each one about six hundred tokens of context — structural, episodic, workspace — before it touches a single file.

Most days the worst part of working with coding agents is the re-explanation. You spent forty minutes yesterday talking your agent through the architecture, the conventions, why you rejected the obvious approach. Today you open a fresh pane and it's all gone. The agent grep-searches blindly through the same files, proposes the same first solution you already vetoed, and reaches the wrong conclusion in a slightly different way.

gmux-brain is the memory layer that fixes this. It's not one new database — it's a router that combines three kinds of memory that already exist, and hands them to every agent pane as inline context the moment it starts.

Structural memory

What the code actually is.

A live graph of what calls what. The "god nodes" everything depends on. The clusters that form natural modules. Built by a tree-sitter pass over your repo, refreshed every git commit.

asks: "what calls the voice router?"
answers: 12 references across 4 files, two of them in tests, one in a dead branch.

Episodic memory

What you decided, and when.

Past sessions, conversations, design notes, the half-finished thought from last Thursday. Stored as searchable text and as dated facts in a small knowledge graph.

asks: "why are we using faster-whisper?"
answers: chosen 2026-02-10, kokoro rejected — 400ms latency was too high.

Workspace memory

What the other agents are doing right now.

Live state from every other pane. Which agent is blocked. Which one just finished and is waiting to hand off. The cross-pane context that turns ten parallel agents into a real team.

asks: "is anyone else touching the auth module?"
answers: pane 3 finished it 14 minutes ago and is waiting for tests.
Figure 06 — three memory layers, one MCP endpoint

One endpoint. The agent doesn't need to know how it works.

The agents in each pane connect to a single MCP endpoint called brain_query. Ask it a question and a small router decides which layer to consult. Keywords like "what calls" or "god nodes" route to the structural graph. "Why did we" or "when was" route to episodic memory. "What's the other pane doing" hits the live workspace. Ambiguous questions get all three.

No new database, no new LLM. The routing is plain keyword matching — fast, free, and easy to debug when it gets something wrong.
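In sketch form, with a deliberately tiny trigger table — the real keyword lists would be longer:

```python
# Plain keyword routing, as described: no LLM, just substring checks.
STRUCTURAL = ("what calls", "god node", "imports", "depends on")
EPISODIC = ("why did we", "when was", "who decided", "last session")
WORKSPACE = ("other pane", "anyone else", "right now", "currently")

def route_query(question):
    """Return the set of memory layers a question should hit.
    Ambiguous questions fan out to all three."""
    q = question.lower()
    layers = set()
    if any(k in q for k in STRUCTURAL):
        layers.add("structural")
    if any(k in q for k in EPISODIC):
        layers.add("episodic")
    if any(k in q for k in WORKSPACE):
        layers.add("workspace")
    return layers or {"structural", "episodic", "workspace"}
```

When the router gets something wrong, the fix is a one-line edit to a tuple, not a prompt-engineering session.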

Six hundred tokens, automatic.

When a pane opens on a project, gmux-brain assembles a context block of about 600 tokens and hands it to the agent before the first message: a one-page summary from the code graph, the most relevant decisions from past sessions, and a snapshot of what the rest of the workspace is doing.

The agent doesn't ask for it. The user doesn't write it. It's just there, the same way a new teammate gets a handover doc instead of a blank page.
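A rough sketch of the assembly step. The section labels and the four-characters-per-token approximation are our own assumptions, not gmux-brain internals:

```python
TOKEN_BUDGET = 600      # rough budget from the piece
CHARS_PER_TOKEN = 4     # crude approximation, good enough for a cap

def assemble_context(structural, episodic, workspace):
    """Join the three memory summaries into one startup block,
    trimmed to roughly the token budget. Labels are examples."""
    parts = [
        "Code map:\n" + structural,
        "Past decisions:\n" + episodic,
        "Workspace now:\n" + workspace,
    ]
    block = "\n\n".join(parts)
    return block[: TOKEN_BUDGET * CHARS_PER_TOKEN]
```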

Without gmux-brain, an agent starts with zero tokens of context. With it, every agent starts as if it had been there last week.
Read piece 06 → ← Back to 04
06 Scope, kept small

Not a tmux replacement.

gmux is a calm UI layer on top of tmux. Your config still works, your muscle memory still works, your sessions still survive. We just added eyes, hands, and a phone.

Every few months someone proposes replacing tmux. Rust rewrite. New protocol. Modern terminal. Every one of those projects spends two years rebuilding a fraction of what tmux already does well, and the migration cost is borne by people who didn't ask for any of it.

gmux refuses to be that. tmux is the foundation. Your existing config file is honoured. Your prefix key is whatever you set it to. Your status bar, your keybindings, your tmux-resurrect setup, your .tmux.conf tweaks from 2019 — none of it changes.

Without gmux

tmux on its own

Black panes. Spinner or prompt. Alt-tab between them looking for the one that needs you. Permissions block silently. Voice and gestures aren't part of the conversation.

With gmux

The same tmux, with eyes

Coloured dots per pane. A sidebar of agents sorted by who needs you. Gestures to switch. Voice to direct. A phone to approve from. tmux underneath, untouched.

What's actually shipped versus what's still being built.

Honest about state: the status bar, the sidebar, the gesture engine, the voice daemon, the phone PWA, the session restore, the camera broker — all working. The Tauri desktop wrapper that ties them into a single window is the piece we're polishing now. The installer comes after that, because packaging something that doesn't run cleanly is premature.

If you want to follow along, the email signup on the homepage will tell you when each milestone lands. We've capped the early access invites so the first round of users gets real attention rather than a Discord channel and a shrug.

Calm is a design choice. It's not the absence of features — it's the discipline to add only the ones that don't ask for your attention.
Read piece 07 → ← Back to 05
07 Coming soon

Multiple cameras. Scheduled nudges. The next twelve months.

Two things on the near-horizon: a wider gesture field built from more than one webcam, and agents that wake themselves up at the right moment to check on long-running work.

Everything in pieces 01 through 06 is either shipped or in active development. This piece is different. It's the public roadmap — what's coming next, in roughly the order we plan to build it. Some of it is sketched, some of it is prototyped, none of it is finished.

We're publishing it here because the people most likely to use gmux are also the people most likely to have an opinion about what should come first. If any of these matters to you, let us know on the interest form and it moves up the list.

Now

What's already working

Status bar, agent sidebar, gesture engine, voice daemon, phone PWA, camera broker, session restore, gmux-brain memory layer.

Next

Polished Tauri desktop · one-command install

A single binary that ties the running backend into a window you can open from your launcher. AUR package, .deb, .AppImage.

Soon

Multi-camera gesture field

Two or more webcams stitched into a single tracking volume. Wider arm range, fewer dead zones, projector-friendly.

Soon

Scheduled prompts & self-triggered check-ins

Agents that wake themselves up at the right moment — to test, to ask, to summarise, to escalate when a long-running task drifts.

Later

AR-glasses gestures

The same gesture vocabulary, hands-free, with display-glass hardware as it becomes properly usable.

Figure 07 — the rough order of what's coming next

Multi-camera · a wider field of view.

Single-camera gesture tracking has a soft ceiling. Stretch your arm fully to one side and your hand drifts off the visible frame. Step back from the desk to think and the camera loses you entirely. For a projector or shared-screen setup — two or three metres back from the wall — one webcam isn't enough.

The fix is geometrically simple and operationally annoying. Use two cameras: one above the screen for desk work, one on the side or ceiling for projector and standing use. Both feed into MediaPipe; the gmux gesture engine fuses the two coordinate systems into a single tracking volume so a hand that exits one camera enters the other without a glitch.
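Since this part is still being prototyped, here's only the simplest possible fusion rule, assuming each camera's reading has already been calibrated into a shared room coordinate system (the calibration is the operationally annoying part):

```python
def fuse(readings):
    """Pick one hand position from several cameras' readings.
    Each reading is (confidence, (x, y, z)), already transformed
    into shared room coordinates. Returns None when no camera
    currently sees the hand."""
    seen = [r for r in readings if r is not None]
    if not seen:
        return None
    # Simplest possible fusion: trust the most confident camera.
    # A hand leaving one cone is picked up by the other without a seam.
    return max(seen, key=lambda r: r[0])[1]
```

A real engine would blend overlapping readings rather than hard-switch, but the hand-off property is the same.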

01 Single camera

Hand stays inside one cone. Step sideways or back, and tracking quietly drops.

02 Two cameras, fused

Two cones overlap in the middle. Hands tracked across the full sweep — desk, projector, room.

Figure 08 — one camera versus two, same person, same room

Practically, this also means any old USB webcam earns its keep. You probably have one in a drawer. A second cheap camera mounted off-axis turns out to be one of the highest-leverage upgrades to a gesture system — far more impactful than a more expensive single camera.

What changes for the user: the same gesture vocabulary, but you can stand up and walk away from the desk without losing tracking. Two people can stand in front of a wall-projected gmux and both have working hands. And on a desk, the awkward "my hand drifted off the right side" failure mode just stops happening.

Scheduled prompts · agents that wake themselves up.

The other half of this piece is about time. Right now, every conversation with an agent is a conversation you start. You ask, it answers. If the agent's last reply was "I'll let you know when the tests finish," that promise is empty — agents don't have a clock, and they certainly don't have your calendar.

Scheduled prompts fix this. You can attach a small set of timed and event-driven triggers to any pane. When the trigger fires, gmux types a prompt into the agent on your behalf and the answer surfaces in the sidebar like any other change of state. The agent comes back to you when there's something worth saying — not because you remembered to check.

Timed

"Every twenty minutes, summarise."

Recurring or one-shot. Cron-style if you want, plain English otherwise. The agent self-prompts at the interval and gives you a rolling status.

Idle

"If quiet for 10 minutes, check in."

The agent has been silent. Maybe stuck, maybe finished, maybe waiting for itself. A gentle self-prompt asks: what's the status? The reply goes into the sidebar.

File change

"When this file changes, review it."

Wired to the filesystem. The agent gets prompted the moment another pane saves the file it's responsible for — automatic code review without you asking.

Command finished

"When tests finish, interpret the result."

A long-running command exits. Whatever it printed is handed to the agent with a prompt: "what does this mean, and what should we do?"

Cross-pane

"When pane 3 finishes, hand off to pane 5."

Pane 3 finishes the auth module. Pane 5 wakes up with that context already in its prompt: "the auth module is done, please write tests for it."

Escalation

"If this agent burns $5, tell me."

Budget, time, or token thresholds. Cross the line and the agent posts a notification to your phone before it spends another dollar.

Figure 09 — six kinds of trigger, all wired through the same scheduler

How it'll actually work.

A trigger is a small piece of YAML or a one-line UI entry, attached to a pane. When it fires, the gmux bridge types a prompt into that pane's agent — same path that the phone, the voice daemon, and your own keystrokes already use. There's no special privileged channel; a scheduled prompt is just a normal prompt that you happen not to be typing in person.
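Nothing here is final, but a trigger file might look something like this — every field name below is hypothetical:

```yaml
# Hypothetical trigger config for one pane — field names are
# illustrative, not a shipped schema.
pane: 3
triggers:
  - kind: idle
    after: 10m
    prompt: "What's the status? Summarise where you are."
  - kind: command_finished
    match: "pytest"
    prompt: "Tests just finished. What does the output mean?"
  - kind: escalation
    budget_usd: 5
    notify: phone
```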

Everything visible to a human is visible in the sidebar: when a trigger fired, what it sent, what the agent replied. You can edit, pause, or delete any trigger at any time. None of it runs in the cloud — the scheduler is the same Python daemon that already runs the rest of the workspace.

Without scheduled prompts, an agent forgets you exist the moment it finishes a reply. With them, the agent has a clock — and the discipline to use it.

Help us pick the order.

Multi-camera and scheduled prompts are roughly tied for "the next big thing." We'd build both, but order matters when there's a small team and a long list. If you've read this far, you have a preference. Sign up on the interest form below, and if either feature is the reason you'd use gmux, say so — that's the signal that decides.

Tell us what you'd use first ← Back to 01