Why I Prefer Keyboard-Driven Interfaces

This is not, I think, a controversial position among the computer programming set. However, I think it deserves a more centralized, rigorous defense than the usual appeal to "common sense" and personal preference, especially in response to some critiques of this preference, which I'm going to respond to below.

I prefer keyboard-driven interfaces to mouse-driven interfaces for three main reasons.

Throughput

Imagine that instead of having a keyboard to enter text, you're forced to use a point-and-click onscreen keyboard with your mouse. Now imagine that you have to write this essay.

It's excruciating, right?

At best, you'll probably be able to get around 20 wpm out of it (I got 18 when I tried this experiment, and that was with me clicking my heart out trying to go as fast as I could), whereas the vast majority of workers type faster than that, often twice as fast or more (I can get 130 on my best day, and 110 on average).

This illustrates just one of the reasons that I prefer keyboard-driven interfaces to mouse-driven ones: keyboards are inherently higher throughput devices than mice are.

To think about the throughput of input devices a little more rigorously: with a point-and-click interface, you can supply about 24 bits of information per click. This assumes X,Y coordinates on a 4K screen (or an assemblage of screens that adds up to 4K), because 3840 and 2160 each fit in 12 bits. Since I can click about 96 targets per minute across my quad-HD monitor when going as fast as I possibly can (a stressful, strenuous pace), that comes out to around 38 bits of information per second. Meanwhile, each key on the keyboard conveys around 8 bits of information, assuming ASCII (a lot more if we include compose keys or alternative shifting configurations like the Space Cadet had). Let's also say that an English word is on average five keys long. Since I can currently type around 130 words per minute if I go all out, that comes out to around 87 bits per second of throughput. More than double!¹
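As a quick back-of-the-envelope check, the arithmetic above can be reproduced in a few lines of Python. The inputs are the figures from the paragraph: a 3840×2160 screen, 96 clicks per minute, 8 bits per key, five keys per word, and 130 wpm. The helper names are just for illustration.

```python
import math

def bits_per_click(width_px: int, height_px: int) -> int:
    """Bits needed to name any pixel, encoding X and Y as separate integers."""
    return math.ceil(math.log2(width_px)) + math.ceil(math.log2(height_px))

def bits_per_second(bits_per_event: float, events: float, seconds: float) -> float:
    """Throughput when `events` inputs of `bits_per_event` bits each take `seconds`."""
    return bits_per_event * events / seconds

# Figures from the text: a 4K (3840x2160) screen, 96 clicks per minute,
# 8 bits per key, 5 keys per word, 130 words per minute.
mouse_bits = bits_per_click(3840, 2160)            # 12 + 12 = 24 bits
mouse_rate = bits_per_second(mouse_bits, 96, 60)   # 38.4 bits/s
keyboard_rate = bits_per_second(8, 130 * 5, 60)    # about 86.7 bits/s

print(f"mouse:    {mouse_rate:.1f} bits/s")
print(f"keyboard: {keyboard_rate:.1f} bits/s")
print(f"ratio:    {keyboard_rate / mouse_rate:.2f}x")
```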

And in fact, that's being too charitable. In the vast majority of point-and-click interfaces, each individual pixel that you can theoretically denote with a mouse is not actually a usefully separate piece of information, because clicking individual pixels demands far too precise a movement for the mouse. Instead, pixels are grouped into click areas, such as buttons, usually many hundreds of pixels at a time. So to be more accurate, we'd have to divide the 4K total resolution of the average desktop point-and-click interface into targets of, say, 28px by 100px (which seems reasonable, and is correct according to the Windows Vista guidelines). That means there are actually only about 3,000 unique locations that can be denoted in a graphical user interface: 3840 × 2160 / (28 × 100) ≈ 2,960. And again, this is being very generous with screen sizes. That's only about 11.5 bits of information per click, meaning we get under 20 bits of information per second with a point-and-click user interface. That's a tiny fraction, about a fifth, of the throughput of a keyboard!
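Here is the same back-of-the-envelope arithmetic with click targets instead of pixels. It again uses the paragraph's own assumptions: a 3840×2160 screen, 100 px by 28 px targets, and 96 clicks per minute. It treats every target as equally likely, which is a simplification.

```python
import math

# Figures from the text: a 4K screen split into 100 px x 28 px click
# targets, 96 clicks per minute, versus 130 wpm at 8 bits x 5 keys per word.
SCREEN_W, SCREEN_H = 3840, 2160
TARGET_W, TARGET_H = 100, 28

targets = (SCREEN_W * SCREEN_H) / (TARGET_W * TARGET_H)  # distinct click targets
bits_per_click = math.log2(targets)                        # information per click
mouse_rate = bits_per_click * 96 / 60                      # bits per second
keyboard_rate = 8 * 130 * 5 / 60                           # bits per second

print(f"{targets:,.0f} targets -> {bits_per_click:.1f} bits per click")
print(f"mouse: {mouse_rate:.1f} bits/s, keyboard: {keyboard_rate:.1f} bits/s")
print(f"mouse / keyboard = {mouse_rate / keyboard_rate:.2f}")
```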

Why?

It's because, on a keyboard, I have around 80 buttons all within easy reach of my fingers, at much shorter distances than one typically has to move the mouse to reach a button (unless you've turned your mouse speed up so high that you're sacrificing a lot of accuracy). It's also because my fingers are much more dexterous and fast than my shoulder, elbow, or even wrist, which are what you typically use to direct a mouse. Additionally, with a keyboard I can use all ten fingers at once. That lets me parallelize my key entry, positioning one finger for the next character while the previous finger is still typing its own (sort of like instruction pipelining in a processor). I can even press several keys simultaneously as a chord, which massively increases the information each entry carries. The mouse, in comparison, essentially turns your entire arm into one single appendage with perhaps five buttons at most, usually only two or three. Those buttons can't be used to speed up pointing; they only determine what happens once you get there.
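To put a rough number on the chording point, treat a chord as any unordered set of distinct keys pressed together. The information per entry is then log2 of the number of possible sets. This sketch assumes all 80 keys are freely combinable, which real keymaps are not, so treat its output as an upper bound rather than a measurement.

```python
import math

KEYS_IN_REACH = 80  # approximate key count from the text

def chord_bits(keys: int, chord_size: int) -> float:
    """Upper-bound information (bits) in pressing `chord_size` distinct keys at once,
    assuming every unordered combination of keys is a distinct, usable input."""
    return math.log2(math.comb(keys, chord_size))

for size in (1, 2, 3):
    print(f"{size}-key chord: {chord_bits(KEYS_IN_REACH, size):.1f} bits")
```

A single key comes out a little above 6 bits. A two-key chord is already near the 12 bits the previous section estimated for a whole mouse click.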

Thus you can't really get much better at point-and-click. Typing speed with the onscreen keyboard, for instance, is overwhelmingly dominated by intrinsic factors. One is how long it takes to move your entire shoulder and arm (since that requires coarse-grained motor skills and moving a lot of mass, relatively speaking). Another is the distance you have to move your mouse to get from one key to another. Meanwhile, with all the keys a mere (usually small) finger movement away, the biggest hurdles are your dexterity and reaction time (and your ability to spell).

Of course, the natural objection here would be that we're not comparing like with like: we're not using mice for what they're best at, but instead using them to emulate keyboards. That's not really true if you think about it, though. We're using mice here for exactly what they're meant to do: activating buttons on the screen in a point-and-click fashion in order to achieve a certain behavior. We're just using them for a task where high throughput is traditionally expected (text entry). Usually we use them for a task where people have learned, thanks to a historical accident of interface design, to expect slower communication with their computer (point-and-click GUIs). Mice aren't really out of their element here. We just typically have higher expectations for text entry than for other aspects of our user interfaces, so it feels as if mice are out of their element because you know what you're giving up. If you're used to Emacs interfaces like Magit or Dired, you'd feel just as frustrated and slowed down moving from them to a point-and-click GUI as you feel using a point-and-click onscreen keyboard instead of a real one.

Tactile Feedback and Location Invariance

Another advantage concerns pressing keys on the keyboard, assuming that each key corresponds to some functionality in your application. That can happen directly, as in Emacs special-mode interfaces or transients/hydras, or with a modifier key, as in most applications. Either way, you always know exactly where each key is, and your hands are always in the same position (the home row) relative to those keys. There's tactile feedback telling you when your fingers are in the right place on the home row, tactile feedback for the layout and separation of the keys, and tactile feedback for when they've been activated. Thus, if you know what keys to press, you can press them with very little mental attention at all, even without looking, or while looking at something else. This location invariance and tactile feedback also make it a lot easier to encode keybindings into muscle memory practically without trying.

This is excruciatingly difficult to do with a mouse. You always have to figure out where the cursor is (searching for it does take time, even for the best-sighted of us). Then you have to look around for where you need to go, and only then can you go there. And there's absolutely no tactile feedback for getting there or for successfully clicking the button when you do. By involving your vision, this takes a lot more of your attention and processing power, and by lacking tactile feedback, it's much more difficult to encode into muscle memory. You can avoid having to wonder where the buttons you want are. To do so, you would always fullscreen a certain application, always put buttons in the corners of the screen, and always adjust your mouse sensitivity to the screen size of whatever device you're on, so that the distances you have to move your mouse stay the same. But even then it will be a lot harder, simply because there's no tactile feedback. It would also be really difficult to achieve this kind of consistency with a point-and-click interface embedded in a multitasking operating system and window manager.

There's of course the added wrinkle that you can press the wrong key, so pressing a key doesn't map perfectly onto successfully performing an action. Still, you have an array of keys that usually have fixed meanings, and you can at least reliably press them. I would argue that's much closer to a one-to-one mapping than a single mouse button. That button's meaning changes based on very fine-grained screen locations, and you can't reliably move it to a given location to give it a certain meaning.

Convenience

This is an obvious one, so I won't spend a lot of time on it. If your job or hobby involves a lot of typing, your hands are already on the keyboard, so keyboard-focused interfaces are a lot more efficient. Otherwise, to operate your user interface in the middle of entering text, you'd have to pick your right or left hand up off the keyboard and put it on the mouse. That involves a fast, but still subconsciously noticeable, mental context switch from keyboarding to mousing. With a keyboard-focused interface, you can just manipulate your interface "inline."

Memorizability

One of the most common quotes trotted out by opponents of the idea that keyboard-focused interfaces are superior to mouse-focused ones is this one from "Keyboard vs. The Mouse, pt. 1" on Ask Tog:

We’ve done a cool $50 million of R & D on the Apple Human Interface. We discovered, among other things, two pertinent facts:

Test subjects consistently report that keyboarding is faster than mousing.

The stopwatch consistently proves mousing is faster than keyboarding.

This contradiction between user-experience and reality apparently forms the basis for many user/developers’ belief that the keyboard is faster.

People new to the mouse find the process of acquiring it every time they want to do anything other than type to be incredibly time-wasting. And therein lies the very advantage of the mouse: it is boring to find it because the two-second search does not require high-level cognitive engagement.

It takes two seconds to decide upon which special-function key to press. Deciding among abstract symbols is a high-level cognitive function. Not only is this decision not boring, the user actually experiences amnesia! Real amnesia! The time-slice spent making the decision simply ceases to exist.

While the keyboard users in this case feels as though they have gained two seconds over the mouse users, the opposite is really the case. Because while the keyboard users have been engaged in a process so fascinating that they have experienced amnesia, the mouse users have been so disengaged that they have been able to continue thinking about the task they are trying to accomplish. They have not had to set their task aside to think about or remember abstract symbols.

Hence, users achieve a significant productivity increase with the mouse in spite of their subjective experience.

There are a number of problems with this quote. Let's set aside the fact that Tog is conceitedly referencing a study we have no access to, so we can't check its sample size, methods, or other crucial aspects. He writes as if the price tag of a study were somehow an indicator of its quality. Let's also ignore the fact that this study was done by a company with a vested interest in winning over the last hard-core keyboard users. Every user convinced that the mouse was superior to the keyboard was a user buying their ten-thousand-dollar machines.

(For another, more in-depth takedown of Tog's statement, which takes a more experimental approach, check out this.)

The first key problem is this. Take someone sufficiently familiar with an application's keybindings: they recall the binding and enter it with the keyboard, which (as we've seen in the previous sections) is a much faster input method. I find it inherently unlikely that they would take longer than someone who has to visually find the button, move their whole hand off the keyboard, move the mouse to the button, click, and then put their hand back on the keyboard. That's simply painful. Even someone who isn't currently typing still has to move the mouse a lot farther, using their arm and shoulder, than they'd have to move their much faster and more dexterous fingers to reach a key. They also still have to look for the button's position (and recall where to look for it), because there's no tactile feedback for GUI buttons the way there is for keyboard shortcuts. Memory is extremely fast when you're familiar with the thing in question. Hell, for a lot of keybindings there's exactly zero conscious recall involved once you've been using an application long enough: it's pure muscle memory, it happens by reflex.
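One way to make this comparison concrete is Card, Moran, and Newell's Keystroke-Level Model (KLM), which estimates task time by summing standard operator times. The operator values below are the approximate figures commonly cited for KLM. The two operator sequences are a hypothetical reading of the scenario above: a user who is mid-typing and invokes a command once. The result is an illustration, not a measurement.

```python
# Approximate Keystroke-Level Model operator times, in seconds.
# These are rough textbook estimates, not measurements of any particular user.
K = 0.20  # press one key (average skilled typist)
P = 1.10  # point the mouse at an on-screen target
B = 0.10  # press or release a mouse button (a click is B + B)
H = 0.40  # "home" a hand between the keyboard and the mouse
M = 1.35  # mental preparation, e.g. recalling a binding or locating a button

def task_time(operators: list[float]) -> float:
    """Total predicted time for a sequence of KLM operators."""
    return sum(operators)

# Hypothetical scenario: the user is typing and wants to run one command.
keyboard_path = [M, K, K]        # recall the binding, press a two-key chord
mouse_path = [H, M, P, B, B, H]  # reach for the mouse, find and click, come back

print(f"keyboard shortcut: {task_time(keyboard_path):.2f} s")
print(f"mouse:             {task_time(mouse_path):.2f} s")
```

The paragraph argues that recall disappears once a binding is muscle memory. Dropping `M` from `keyboard_path` to model that only widens the gap.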

Which leads me to the core of the problem: Tog is probably comparing someone who's unfamiliar with the specific keybindings of an application with someone who's equally unfamiliar with its GUI. That would make sense from a naive experimental-design standpoint. It's also likely given that whatever software he had the subjects use would have been new to them, since GUI software was still very new at the time and Macintoshes were too expensive for many people to have owned one. If he'd actually shared the in-depth data from the study instead of just high-handedly referencing it, we could check, and perhaps this assumption would turn out to be wrong. Given what I've just said, though, I think it's a justified assumption. This means he's measuring just the start of the skill curve for an application, not where it goes over time, and generalizing that to everyone. In other words, what he's measuring isn't a difference in inherent efficiency, but a difference in discoverability and learning curve. And that's absolutely fair: it's much easier to discover and remember the functionality of your average GUI application than that of one that relies on largely invisible keyboard shortcuts.

The thing is, though, that this can be remedied. Consider, for instance, the humble Emacs quick help menu (help-quick-toggle):

./emacs4.png

This transforms the need to remember an arbitrary key command into the need to visually find it. In my opinion, that already puts keyboard shortcuts on a mostly level playing field with GUI buttons.

This isn't even close to the end of the toolkit we have at our disposal to make text-based interfaces even faster and more discoverable, either. We also have icomplete-vertical (and similar things) in Emacs:

./emacs5.png

It offers real-time fuzzy searching and narrowing of generally intuitively named commands, instead of the cryptic mnemonics of keyboard shortcuts (which it can also show you, in case they're faster). With packages like orderless, you don't even have to recall the specific order in which a set of words or subwords appears in a command name. Just type the first few letters of a couple of keywords and you'll probably find it. This style of interface may require typing a few more characters than a key chord. However, it makes a much wider variety of commands far easier to discover and remember than the Emacs quick help does. It also doesn't require complicated modifier-key combinations, which can be a real slowdown. And it can represent far more commands than a toolbar or action bar ever could, which makes it more directly comparable to point-and-click interfaces like menu bars. Those take a massive amount of time to navigate because they're hierarchical: you have to guess how things have been organized into categories instead of looking for the thing you want directly. (Search features for things named in text inherently rely on keyboarding.)
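The core idea behind orderless-style matching fits in a few lines: split the input on spaces and keep every candidate that contains all of the pieces, in any order. This is a simplified illustration using plain case-insensitive substring matching, not orderless's actual implementation, which supports several matching styles. The command names are ordinary Emacs commands used only as sample data.

```python
def orderless_match(query: str, candidate: str) -> bool:
    """True if every space-separated component of `query` occurs somewhere
    in `candidate`, regardless of order (case-insensitive substring match)."""
    name = candidate.lower()
    return all(part in name for part in query.lower().split())

def narrow(query: str, candidates: list[str]) -> list[str]:
    """Keep only the candidates that match `query`, preserving their order."""
    return [c for c in candidates if orderless_match(query, c)]

commands = [
    "find-file",
    "find-file-other-window",
    "switch-to-buffer-other-window",
    "kill-buffer",
    "revert-buffer",
]

print(narrow("win buf", commands))   # ['switch-to-buffer-other-window']
print(narrow("file oth", commands))  # ['find-file-other-window']
```

Note that "win buf" lists the pieces in the opposite order from the command name and still finds it. An empty query matches everything, which is what you want before the user has typed anything.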

And then there are things like the Casual Suite:

https://github.com/kickingvegas/casual/blob/main/docs/editkit.org

and Magit:

Transient interfaces can lend the discoverability of GUI applications to keyboard-focused user interfaces. They also maximize the efficiency of keyboard-based interfaces by eschewing modifier keys in favor of a modal interface that relies on instant single keypresses.
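The mechanism behind transient-style menus can be modeled as a tree of single-key bindings. Each unmodified keypress either runs an action or opens a sub-menu, and an unfinished sequence shows the available keys, which is where the discoverability comes from. The `Menu` class, the `run` function, and the bindings below are hypothetical, loosely inspired by Magit's commit menu. They are not the transient library's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Union

@dataclass
class Menu:
    """A hypothetical transient-style menu mapping single keys to actions or sub-menus."""
    title: str
    bindings: dict[str, tuple[str, Union[Callable[[], str], Menu]]] = field(default_factory=dict)

    def describe(self) -> str:
        """List the available keys, like a transient popup would."""
        rows = [self.title] + [f"  {key}  {label}" for key, (label, _) in self.bindings.items()]
        return "\n".join(rows)

def run(menu: Menu, keys: str) -> str:
    """Feed single keypresses (no modifiers) into the menu tree."""
    current = menu
    for key in keys:
        if key not in current.bindings:
            return f"{key} is undefined\n" + current.describe()
        _, target = current.bindings[key]
        if isinstance(target, Menu):
            current = target   # descend into the sub-menu
        else:
            return target()    # leaf binding: perform the action
    return current.describe()  # sequence not finished: show what comes next

commit_menu = Menu("Commit", {"c": ("Commit", lambda: "committed"),
                              "a": ("Amend", lambda: "amended")})
root = Menu("Git", {"c": ("Commit menu", commit_menu),
                    "s": ("Stage all", lambda: "staged")})

print(run(root, "c"))   # shows the Commit menu's keys
print(run(root, "ca"))  # two plain keypresses perform the action
```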

Thus I think the idea that keyboard-based user interfaces are somehow inherently worse at discoverability or recall just comes from a failure of imagination.

Pointing

What about the "pointing" part of point-and-click? Surely, surely, mice must be better at this? Right?

Wrong.

This intuition is born of the fact that most keyboard-focused user interfaces don't provide good primitives for jumping to specific locations on the screen. They'll often provide the arrow keys, or something equivalent, and perhaps a few more motions that move by larger increments. But all of these options ultimately share one flaw: they require you to move linearly through whatever content or interface you're looking at to find what you want, just in larger or smaller jumps. Even search falls into this trap. If you've got more than one result on a page, you're going to find yourself moving linearly through a list of results trying to get to the one you actually want, even if you're making some possibly large jumps in the process. These are all O(N) operations in the distance to the target, whereas moving the mouse is, more or less, an O(1) operation. It may take some time to move somewhere, but the mouse doesn't have to move linearly through a bunch of other content the way a text cursor does. It's also much easier to move the mouse a certain distance than to move a cursor the same distance with the arrow keys, because key repeat runs at a fixed rate that's a lot slower than the mouse's polling rate.
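Here is a rough model of that linear cost. Holding an arrow key moves one step immediately, then auto-repeats after a delay at a fixed rate, so the time grows linearly with distance. The 0.5 s delay and 30 steps per second are illustrative assumptions, not any particular system's defaults.

```python
def arrow_key_seconds(steps: int, delay: float = 0.5, rate: float = 30.0) -> float:
    """Seconds needed to move `steps` positions by holding down an arrow key.

    Assumed model: the keypress moves one step at once, the second step fires
    after `delay` seconds, and every later step follows at `rate` steps/second.
    """
    if steps <= 1:
        return 0.0  # a single tap moves one step immediately
    return delay + (steps - 2) / rate

for steps in (2, 62, 302):
    print(f"{steps:>4} steps: {arrow_key_seconds(steps):.1f} s")
```

Larger motions (by word, paragraph, or page) only divide the number of steps by a constant. The cost is still proportional to the distance.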

However, there are better ways to jump quickly to something. Consider avy's approach. You say what you want to jump to (a character, string, or window), and then you can instantly jump to it anywhere on the screen within O(log_26(N)) keystrokes, where N is the number of matching targets. You can then perform any action on whatever is at that location. This article has more details on advanced usage, but here's a simple demonstration of its capabilities:
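The labeling idea works like this: given N matching targets and an alphabet of 26 letters, labels of length ceil(log_26 N) are enough to give every target a unique key sequence. The version below is simplified, with uniform-length labels, and written for illustration. The function name is hypothetical, and avy's own tree-building may assign labels differently (for example, giving some targets shorter labels).

```python
import string
from itertools import islice, product

def jump_labels(n: int, alphabet: str = string.ascii_lowercase) -> list[str]:
    """Give each of `n` targets a distinct, equal-length label over `alphabet`.

    Labels of length ceil(log_k(n)) suffice for a k-letter alphabet, so every
    target is reachable in that many keystrokes no matter where it is on screen.
    """
    if n <= 0:
        return []
    length = 1
    while len(alphabet) ** length < n:  # smallest length with enough labels
        length += 1
    return ["".join(p) for p in islice(product(alphabet, repeat=length), n)]

# Hypothetical example: 500 matching positions (row, column) on screen.
positions = [(row, col) for row in range(50) for col in range(10)]
labels = jump_labels(len(positions))
table = dict(zip(labels, positions))

print(len(labels[0]), "keystrokes per jump")  # 26**1 < 500 <= 26**2
print(table["ab"])                            # the second target
```

Growing the target count from 500 to 5,000 only adds one keystroke per jump. The arrow-key cost grows in proportion to the distance instead.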

yt:zar4GsOBU0g

This can be used to great advantage to navigate even interfaces as complex as browsers. You simply label each interface element with a sequence of letters from a tree of letters, the way avy labels the characters that match your search:

yt:t67Sn0RGK54

So what is the mouse good for then?

There is exactly one circumstance where the mouse's throughput is higher than the keyboard's: when the path matters, not just the endpoints. I don't mean dragging between two points, which can easily be expressed as a jump. I mean dragging, or otherwise moving, in an interface where the intermediate positions of the mouse matter and each individual pixel you're pointing at, or thereabouts, is a unique piece of information. In that situation, you've got hundreds of 24-bit values being output each second, depending on your mouse's polling rate, and each one of them matters and makes a difference. Some examples of this kind of situation are:
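For scale, the raw rate of that position stream is easy to estimate: multiply the polling rate by the roughly 24 bits per sample estimated earlier. 125, 500, and 1000 Hz are common USB mouse polling rates. Consecutive samples are highly correlated, so these figures are a loose upper bound on the information actually conveyed.

```python
BITS_PER_SAMPLE = 24  # one X,Y position on a 4K screen, as estimated earlier

def stream_bits_per_second(polling_hz: int, bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Upper bound on the raw bit rate of a continuous stream of mouse positions."""
    return polling_hz * bits_per_sample

for hz in (125, 500, 1000):
    print(f"{hz:>4} Hz: {stream_bits_per_second(hz):,} bits/s")
```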

playing FPS games
drawing
animating
rearranging your view in something like Blender

There are probably dozens more, too, that I'm just not thinking of at the moment.

The thing, though, is that these situations just… aren't that common. They're niche hobbies or professions, usually best served by a different input device than a mouse, like a 3D joystick, or a pen.

So why do I still use the mouse for things?

Sometimes context switching (going from content production to browsing), or picking my hand up from the keyboard, is just kind of nice. It resets my brain and lets my hands do something different for a while. Given all of the above, you might think I'm advocating for an all-keyboard, all-the-time interface paradigm, and I'm here to tell you that's not true at all. Efficiency is not the only thing that matters; there are other values. Some people can't type very fast, or can't touch type at all, so there's little difference for them. There may also be some inherent discoverability benefits to GUIs that I haven't thought of. And some people may just prefer an input method that's a little less efficient, sure, but that they like better. Honestly, that's an absolutely fair tradeoff; it's not like I don't make it rather frequently!