
From hands to intention: how an interface creeps closer to the thought

A sequel to steering a memory map by hand. Every fix I made was secretly the same move — shortening the distance between what I meant and what happened. Follow that line far enough and you arrive at reading the command before the muscle. With interactive diagrams.

The lesson hiding in a fist

A few weeks ago I gave a 3D memory map a webcam and a pair of hands — point to light up a thought, make a fist to grab the cloud, spread two fists to zoom. The write-up was mostly a story about failure: the obvious gesture (pinch-and-drag, like a touchscreen) was miserable, and the fix was to make the fist — the posture the hand keeps falling into anyway — the verb.

[Figure: the cortex memory map in its 'distributed' topology, a galaxy of glowing memory nodes linked by faint semantic edges, with a hand-tracking webcam inset and gesture controls along the bottom. Caption: Steering the memory map by hand: the same graph as the 3D version, now driven by webcam gestures (point, fist-grab, two-fist zoom).]
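
For readers who want to see what "point, fist-grab, two-fist zoom" can look like in code, here is a minimal Python sketch. It assumes a hand arrives as 21 (x, y) landmarks in the MediaPipe Hands ordering (0 is the wrist; 6/8, 10/12, 14/16 and 18/20 are the middle-joint and fingertip pairs for index, middle, ring and pinky). The rule "a finger is curled when its tip is closer to the wrist than its middle joint" and the names `classify` and `zoom_factor` are illustrative assumptions, not the demo's actual code.

```python
import math

WRIST = 0
# (PIP joint, fingertip) landmark indices, MediaPipe Hands ordering.
FINGERS = {"index": (6, 8), "middle": (10, 12), "ring": (14, 16), "pinky": (18, 20)}


def is_curled(landmarks, pip, tip):
    """Assumed heuristic: a curled fingertip sits closer to the wrist than its PIP joint."""
    wrist = landmarks[WRIST]
    return math.dist(landmarks[tip], wrist) < math.dist(landmarks[pip], wrist)


def classify(landmarks):
    """Return 'fist', 'point', or 'other' for one hand of 21 (x, y) landmarks."""
    curled = {name: is_curled(landmarks, pip, tip)
              for name, (pip, tip) in FINGERS.items()}
    if all(curled.values()):
        return "fist"
    if not curled["index"] and all(curled[n] for n in ("middle", "ring", "pinky")):
        return "point"
    return "other"


def zoom_factor(start_a, start_b, now_a, now_b):
    """Two-fist zoom: ratio of the current fist separation to the separation at grab time."""
    start = math.dist(start_a, start_b)
    if start == 0:
        return 1.0  # degenerate start; leave the zoom unchanged
    return math.dist(now_a, now_b) / start
```

The fist check is deliberately forgiving: it only asks whether four fingers are folded, not how neatly, which is the property that makes the fist a cheap posture to hold.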

I framed that at the time as an ergonomics lesson. It is more than that. Looking back, every single fix I made was the same move in disguise: shortening the distance between what I meant and what happened. The pinch failed because it demanded a precise, sustained translation of intent into a fragile posture. The fist won because it sat closer to the intention — almost no translation, just “close hand, the world is now grabbable.”

That is what an interface really is: a translator between an intention and an effect. And the whole craft is making the translation shorter. Once you see it that way, the gesture demo stops being an endpoint and becomes a rung — one step on a ladder that leads somewhere much stranger.

A ladder of coupling

There is a spectrum of how directly a body can drive a machine. At one end, a device you hold and learn. At the other, the signal taken straight from the nervous system. Each rung up shortens the path from intention to effect — and each one charges a different price. Tap through them:

[Interactive diagram: the coupling ladder. Bars for each rung: directness, bandwidth, effort.]

Schematic, not measured — the point is the shape of the trade-off, not the exact heights. "Directness" = how little translation sits between intention and effect; "effort" is the cost of sustaining it.

Read the ladder top to bottom and a pattern falls out. Going up buys directness — fewer translations between the thought and the result — but the currency you spend changes. Cutting out the device costs effort (your arm has no desk to lean on). Cutting out the muscle costs access (now you are talking about electrodes). Nobody gets directness for free.

Why a good interface disappears

The strangest property of a good interface is that you stop noticing it. Drive a car for a while and you feel the car’s width, not the wheel in your hands; the machine has been folded into your sense of your own body. Cognitive scientists call this the body schema, and it is startlingly elastic — it will absorb a tool within minutes if the tool responds the way a limb does. In the hands demo, this is exactly why pointing works: within a few seconds the glowing node is where your finger is. The fingertip and the cursor have merged.

This is also why precision has a measurable cost. Fitts’s law, one of the oldest quantitative results in interface design, says the time to hit a target grows with the ratio of its distance to its size, and that it grows as a logarithm, not a straight line. Small, far things are dear; big, near things are cheap. The fist-grab quietly exploits it: instead of asking the hand to land precisely on a node, I let the map snap to the nearest one, which in effect makes every target as big as the region closest to it. You aim roughly; the system meets you halfway. The interface disappears precisely when it stops demanding accuracy the body cannot comfortably give.
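
To make the logarithm concrete, here is a small Python sketch. It uses the Shannon formulation common in HCI, ID = log2(D/W + 1), which is a later variant rather than Fitts's original 1954 expression. The constants `a` and `b` are made-up placeholders (real values are fitted per device and per user), and `snap_to_nearest` is a hypothetical helper, not the demo's actual code.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon form of Fitts's index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted time MT = a + b * ID. The values of a and b are placeholders."""
    return a + b * index_of_difficulty(distance, width)


def snap_to_nearest(pointer, nodes):
    """Return the node closest to the pointer, so a rough aim is enough."""
    return min(nodes, key=lambda node: math.dist(pointer, node))


# Roughly doubling the distance adds one bit, not double the difficulty.
print(index_of_difficulty(7, 1))   # 3.0
print(index_of_difficulty(15, 1))  # 4.0

nodes = [(0, 0), (10, 0), (0, 10)]
print(snap_to_nearest((8, 1), nodes))  # (10, 0)
```

Snapping never moves a node, but it raises the effective width of every target, and in the formula a larger width is what lowers the index of difficulty.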

Position, rate, and the runaway cloud

There is a deeper fork under all of this, and it is worth feeling rather than reading. Two ways exist to map a movement onto an effect. In position control, your movement is the movement — shift your hand by this much, the thing shifts by that much, like a mouse. In rate control, your displacement sets a speed — hold it off-centre and the thing keeps going, like a joystick. Drag the handle below and switch modes to feel the difference:

[Interactive diagram: position vs rate control. Two markers: the map and your hand.]

Position control is the more direct of the two — your hand and the map move as one, so when you stop, it stops, exactly as your body expects. Its flaw is purely physical: run out of desk (or out of the camera’s view) and you must lift, re-grip, and go again — clutching. Rate control abolishes clutching, because a small held offset can scroll forever — but it inserts an integrator between you and the world. Your hand can be perfectly still while the map sails on, and that quietly violates a reflex so deep we never knew we had it: I stopped, therefore it stopped. This is why, in the hands demo, a joystick-style pan felt like the cloud was “running away on its own,” and a fist that moves the map one-to-one felt like simply holding it. Directness, again — the shorter the chain between your intention and the effect, the more the machine feels like an extension of you rather than a thing you are operating at a distance.
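
The difference between the two mappings fits in a few lines. The sketch below is a one-dimensional toy, assuming a hand offset sampled once per time step; `simulate`, `gain` and `dt` are illustrative names, not taken from the demo.

```python
def simulate(hand_offsets, mode, gain=1.0, dt=1.0):
    """Return the map position after each sample of hand offset.

    position: the map tracks the hand offset directly.
    rate:     the hand offset sets a velocity, which is integrated over time.
    """
    map_pos = 0.0
    trace = []
    for offset in hand_offsets:
        if mode == "position":
            map_pos = gain * offset
        elif mode == "rate":
            map_pos += gain * offset * dt  # the integrator between you and the world
        else:
            raise ValueError(f"unknown mode: {mode}")
        trace.append(map_pos)
    return trace


# The hand moves out to 2.0 and then holds perfectly still.
hand = [0.0, 1.0, 2.0, 2.0, 2.0, 2.0]
print(simulate(hand, "position"))  # [0.0, 1.0, 2.0, 2.0, 2.0, 2.0]
print(simulate(hand, "rate"))      # [0.0, 1.0, 3.0, 5.0, 7.0, 9.0]
```

In the last three samples the hand does not move at all. Under position control the map stops with it; under rate control the map keeps travelling, which is the "running away" feeling in numeric form.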

The gesture is still a muscle

Here is the crack that the next step prises wide open. Even the most fluent gesture is, physically, a muscle command. The chain runs: a decision forms, the motor cortex fires, nerves carry the order down, the arm moves, the camera sees the arm, the software reads the camera. By the time the map responds, the original intention has been through five or six translations. The whole game of this article — shortening that chain — has an obvious limit if you keep working outside the body. You can make the hand a better cursor, but it is still a hand.

So ask the heretical question: what if you skipped the muscle entirely, and read the command at its source?

That is precisely what a brain-machine interface does. It does not read thoughts — it taps the motor output in the cortex, the very signal that would have driven the muscle, and reroutes it. (Paralysis, after all, breaks the cable, not the command: the cortex still fires the order, it simply never arrives.) And the part that should give anyone pause is why it works at all. The decoder is never perfect. It works because the brain adapts to the machine — it folds the cursor, or the robotic arm, into the body schema, the same absorption that made a fingertip and a glowing node feel like one thing, now with no hand in the loop. Meanwhile the decoder recalibrates onto the brain. Two learners meeting in the middle:

[Diagram: co-adaptation, two learners meeting. Brain to decoder: intended command. Decoder to brain: feedback / correction. The gap between them: prediction error.]

The coupling is not a plug — it is mutual learning. The error shrinks because both sides move toward each other. That, not the silicon, is what makes a brain and a machine fuse.
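
The idea of two learners closing one error can be illustrated with a deliberately tiny toy. In the sketch below the "brain" and the "decoder" are each reduced to a single gain, and both take gradient steps on the same squared prediction error. Everything here (the scalar gains, the learning rate, the command sequence, the name `co_adapt`) is an assumption chosen for illustration; real decoders and real neural adaptation are far richer than this.

```python
def co_adapt(steps=400, lr=0.1, brain_gain=0.4, decoder_gain=0.7):
    """Toy co-adaptation: two scalar gains both learn from one shared error.

    The goal is output == intended command, i.e. brain_gain * decoder_gain == 1.
    Returns the absolute prediction error at every step.
    """
    commands = [1.0, -0.5, 0.8, -1.0]  # hypothetical intended commands, cycled
    errors = []
    for t in range(steps):
        intended = commands[t % len(commands)]
        signal = brain_gain * intended        # what the "brain" emits
        output = decoder_gain * signal        # what the "decoder" produces
        err = output - intended               # prediction error seen by both
        errors.append(abs(err))
        # Gradients of 0.5 * err**2 with respect to each gain.
        grad_brain = err * decoder_gain * intended
        grad_decoder = err * signal
        brain_gain -= lr * grad_brain         # the brain adapts to the machine
        decoder_gain -= lr * grad_decoder     # the decoder recalibrates onto the brain
    return errors


errors = co_adapt()
print(errors[0], errors[-1])  # the error starts large and ends near zero
```

Neither gain ends up at a "correct" value on its own; only their product is pinned down, which is a small-scale version of the claim that the coupled system, not either half, does the acting.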

It is the same idea as the rubber hand illusion, in which a fake hand stroked in sync with your hidden real one starts to feel like your own, or the car you feel the width of. Only now the “tool” is a cursor moved by thought, and the body schema reaches out to claim it. Engelbart’s old word for this was augmentation: it is neither the bare brain nor the machine alone that acts, but the coupled system. The gesture demo was one honest rung on that ladder; it is just the last one you can build with a webcam.

Where this is going

I started the hands project as a Friday “wouldn’t it be fun,” and it quietly turned into a thesis: an interface is a distance, and the work is closing it. Mouse to gesture closed a little of it. Gesture to gaze would close more. But each of those still ends at a muscle, and the muscle is the floor you hit when you stay outside the body.

The next rung doesn’t. It reads the command upstream of the muscle, at the source — and it raises every hard question this little demo got to wave away: bandwidth, calibration, surgery, consent, and what it even means for a machine to become part of you. That is the next piece. For now: go wave at the map, and notice how quickly your finger and the cursor become the same thing. That merging is the whole story — the rest is just moving the point where it happens closer to the thought.

Further reading

  • RubberEdge: Reducing Clutching by Combining Position and Rate Control (Casiez, Vogel et al.) — the trade-off the second diagram is built on: arXiv
  • Fitts, P. M. (1954), “The information capacity of the human motor system in controlling the amplitude of movement” — the law behind why big, near targets are cheap.
  • MediaPipe Hands (Zhang, Bazarevsky et al., 2020) — the on-device tracking behind the gesture demo: arXiv