Eyes and Keys: An Evaluation of Click Alternatives Combining Gaze and Keyboard
With eye gaze tracking technology entering the consumer market, there is an increased interest in using it as an input device, similar to the mouse. This holds promise for situations where a typical desk space is not available. While gaze seems natural for pointing, it is inherently inaccurate, which makes the design of fast and accurate methods for clicking targets (“click alternatives”) difficult. We investigate click alternatives that combine gaze with a standard keyboard (“gaze & key click alternatives”) to achieve an experience where the user’s hands can remain on the keyboard all the time. We propose three novel click alternatives (“Letter Assignment”, “Offset Menu” and “Ray Selection”) and present an experiment that compares them with a naive gaze pointing approach (“Gaze & Click”) and the mouse. The experiment uses a randomized, realistic click task in a web browser to collect data about click times and click accuracy, as well as asking users for their preference. Our results indicate that eye gaze tracking is currently too inaccurate for the Gaze & Click approach to work reliably. While Letter Assignment and Offset Menu were usable and a large improvement, they were still significantly slower and less accurate than the mouse.
Keywords: Eye gaze tracking · Click alternative · Keyboard
Pointing is a natural activity of the human eye: when we look at an object, this is a good indicator that the object is currently occupying our attention; this rule of thumb has become a core principle in research based on gaze tracking. As a result, gaze tracking is a promising technology for natural user interfaces. It does not require learning any new techniques. With eye gaze tracking devices entering the consumer market at prices similar to gaming mice, there is a growing interest among users as well as in the HCI community.
Gaze trackers hold promise for a variety of situations. They have become an established assistive technology for improving accessibility. While gaze tracking has yet to become a widely used technology, it holds promise for work away from the desk, especially with laptops. Laptops can provide a keyboard similar to desktop devices, but they have no adequate surface for a normal mouse, e.g. when used on the lap. Furthermore, the use of a mouse in typical productive work requires users to switch a hand between keyboard and mouse frequently, which incurs a time penalty. The use of gaze for pointing would allow users to keep their hands on the keyboard more persistently. In any case, users must identify a target visually before moving the mouse to click it, which means that eye gaze tracking could lead to more direct and fluent interaction. Gaze tracking technology may also mitigate some of the problems of repetitive strain injury (RSI) related to mouse overuse.
While pointing with the eyes may seem easy from a user’s perspective, eye gaze tracking technology inherently suffers from inaccuracy. First, there are technical challenges of calibration, resolution, tracking volume, changing lighting conditions, variances in the anatomy of the face and eyes, and optical properties of visual aids such as glasses and contact lenses. All these factors make it difficult to achieve a good tracking accuracy for all users. Second, there are physiological limitations: involuntary eye movements such as jitter and drifts, blinks, and the size of our fovea which also provides clear vision of objects that are not exactly on the point of gaze. These challenges and limitations make it difficult to design methods for pointing and clicking UI elements based on gaze (“click alternatives”) that are fast and accurate.
RQ1. How can fast and accurate gaze & key click alternatives be designed?
RQ2. How do gaze & key click alternatives compare with each other and the mouse?
RQ3. Are gaze & key click alternatives mature enough for everyday general use?
Three novel gaze & key click alternatives with formal state machine specifications.
An experimental comparison of gaze & key click alternatives, with insights into the interaction design and general usability of such technologies.
An experimental procedure for the evaluation of click alternatives and an open-source implementation of the aforementioned gaze & key click alternatives.
Section 2 summarizes related work about gaze-based click alternatives. Section 3 describes the design of the proposed gaze & key click alternatives. Section 4 describes the experimental methodology used. Section 5 gives an overview of the results, and Sect. 6 discusses them. Section 7 summarizes conclusions and points out future research directions.
2 Related Work
Eye gaze tracking as a pointing device currently lacks the accuracy required to be used as a simple point & click device [1, 2, 3, 4, 5], for a number of different reasons. Firstly, the fovea of the eye, which is responsible for sharp central vision, covers about one degree of visual angle [4, 6]. This relatively large angle means that it may be difficult for the eye gaze tracker to accurately pinpoint what the user is looking at on the screen, especially if the target is small, such as an icon or text. At a distance of 65 cm from the screen, the eye can view an area of about 1.1 cm diameter clearly. Furthermore, our gaze subconsciously drifts or jumps to other points of interest. As a result, it takes a conscious effort from the user to hold the gaze in an area for a length of time. These eye gaze tracking inaccuracies cannot be solved by simply upgrading the hardware; therefore different software solutions have been built to increase the accuracy in pinpointing the user’s gaze.
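The 1.1 cm figure follows directly from the geometry: a visual angle of one degree at a viewing distance of 65 cm subtends roughly that span on the screen. A quick check (the function name is ours):

```python
import math

def visual_span_cm(distance_cm, angle_deg):
    """Screen-plane diameter subtended by a given visual angle
    at a given viewing distance (simple trigonometry)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg / 2))

# A 1-degree fovea at 65 cm covers about 1.13 cm of the screen.
```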
Zhang et al. proposed techniques to improve the stability of an eye gaze cursor, using force fields, speed reduction and warping to a target’s center. Force fields act as a kind of magnet for the cursor: the algorithm attempts to deduce the user’s intent and tries to prevent the cursor from veering off target. Cursor speed reduction was found to increase speed and accuracy when using the eye gaze pointer for medium-size targets. Such techniques are useful, but do not improve the accuracy of eye gaze cursors sufficiently for general use.
The most obvious and natural purely gaze-based click alternative is “dwell”, which clicks a target after the gaze dwells on it for a certain time. For simple object selection tasks, dwell can be significantly faster than the mouse. However, while it has been successfully used for specialized UIs such as carefully designed menus [9, 10], dwell alone is generally insufficient as a general click alternative because it is not accurate enough for small targets. Hardware buttons for clicking seem slightly faster than simple dwell with a typical 0.4 s dwell threshold, but less accurate, as people tend to click before the gaze has fully settled on the target. The accuracy can be improved by taking into account system lag and delaying triggers accordingly.
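The dwell mechanism can be sketched as a small state machine that fires once the gaze has rested on the same hit-tested target for the threshold time. This is an illustrative sketch, not the implementation of any of the cited systems; the class and its API are hypothetical:

```python
class DwellDetector:
    """Clicks once the gaze has rested on the same target for `threshold`
    seconds. A minimal sketch of the dwell idea; class and API are ours."""

    def __init__(self, threshold=0.4):  # 0.4 s is the typical threshold cited above
        self.threshold = threshold
        self._target = None
        self._since = None

    def update(self, target, timestamp):
        """Feed one hit-tested gaze sample; return True when a click should fire."""
        if target != self._target:  # gaze moved to a different (or no) target
            self._target, self._since = target, timestamp
            return False
        if target is not None and timestamp - self._since >= self.threshold:
            self._since = timestamp  # restart, so a held gaze does not re-click at once
            return True
        return False
```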
One approach to address the lack of accuracy is to enlarge or zoom in on the general area of the user’s gaze. EyePoint magnifies the area around the gaze when a hotkey is pressed, and performs a click at the point of gaze in the magnified view when the hotkey is released. The reported click times are fast (below 2 s), but there are problems with accuracy (error rate exceeding 10 %). There are similar techniques relying only on gaze: Zoom Navigator continuously magnifies the area the user is looking at, until it is clear what the target is and the target is automatically clicked. If correcting movements are made, Zoom Navigator zooms out for a short period before continuing to zoom in again.
Most zooming techniques overlay the area under the cursor with a magnification, so context is lost. There are techniques to mitigate the loss of contextual information, such as fish-eye lenses and offset magnifying glasses. Ashmore et al. investigated a dwell-activated fish-eye lens with a continuous fish-eye zoom, which preserves but distorts the context around the target. FingerGlass employs an offset magnifying glass which never covers the zoomed-in area for touch-based interaction. However, when applying offset techniques in gaze-based interaction, one must consider that the offset content will immediately attract the user’s gaze.
Bates et al. investigated gaze clicking with zooming and found a clear relationship between the target size and the level of magnification used by a user when targeting a small area. Participants would zoom in until the target was just larger than the pre-test measured pointing accuracy of the eye gaze tracker. It was also found that the participants had difficulty maintaining focus on a target during the selection process. The time spent correcting the cursor position on targets was the largest portion of non-productive time spent carrying out the tasks. This emphasizes the need for additional techniques to address the inaccuracy of gaze cursors.
ceCursor uses transparent directional buttons located around the area the user is looking at. The buttons, which are activated by dwelling on them, can be used to move a cursor. This technique is accurate (even for small targets) but slow, taking on average 11.95 s. Using a keyboard, it would be straightforward to use the directional keys in a similar manner. But while accurate, this would be slow compared to the speed of gaze.
The gaze-based WeyeB browser uses a combination of dwell and eye gestures for link navigation. Once the user is looking at the desired target, they must flick their eyes upwards and then back downwards to click a link. If multiple links are under the general area of the cursor, a large secondary drop-down menu with the different link options is displayed – an alternative to zooming. The combination of dwell and eye gestures solved the “Midas touch” problem, i.e. inadvertent clicking that can occur when using dwell alone. Gaze & key click alternatives generally do not suffer from the Midas touch, as a key can be used to clearly signal a click.
Another method of improving the accuracy of pointing with eye gaze is to use facial movements to refine the cursor position. Four electrodes are placed on the user in order to capture electromyogram (EMG) signals from muscles in the face. The user first looks at the approximate target location, then uses facial movements to incrementally move the cursor, and finally performs click actions using other facial movements. While this increased accuracy to near-mouse levels, it was still about four times slower (more than 4 s per click).
Some approaches combine gaze tracking with a physical pointing device. MAGIC moves the pointer quickly to the gaze position to speed up pointing, using the mouse for finer movements and clicking. The Rake Cursor shows a grid of multiple mouse pointers simultaneously, moving the whole grid with the mouse and selecting the active pointer in the grid by gaze. It successfully reduces mouse movements as the pointer closest to a target can be used. The Gaze-enhanced User Interface Design (GUIDe) combines gaze with keyboard and mouse to improve various common tasks.
3 Click Alternative Design
Four gaze & key click alternatives were designed and implemented as follows. The Ray Selection alternative is included here, but was not used in the experiment for reasons outlined later in this section. All click alternatives are designed with a web browser as the basis, so the targets in the following examples are hyperlinks. The click alternatives can also be applied to other types of targets. All click alternatives were implemented in Java using the WebKit web browser engine as a basis. They are freely available as open-source software.
3.1 Gaze & Click
As indicated by the related work, it is very hard to use this click alternative, and this was confirmed in our pilot study. In particular, it was simply too difficult for the users to know if a link is currently underneath the recorded gaze position. It was necessary to add a visual gaze cursor, an orange dot representing the user’s current gaze position. While it may be possible to hide the gaze cursor for larger targets, most textual hyperlinks are simply too small given the typical inaccuracy of gaze tracking. With a visible gaze cursor users are at least aware of the gaze tracking error and can compensate for it by adjusting their gaze.
3.2 Letter Assignment
A white rectangle is drawn behind the overlaid letter to allow easier reading of the letter; since the hyperlinks are often quite close to other text, the overlaid letters could otherwise be hard to make out. A drop shadow is drawn behind the overlaid letters, on top of the white rectangle, to give the illusion of layering; the overlaid letters are on top and the web browser is the background. The user will naturally want to interact with the top-most layer. The color of the letters is kept black (the same as most of the text on the page) to make them less distracting, so they can be ignored more easily if clicking hyperlinks is not the user’s intention.
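As a rough illustration of how letters might be assigned to visible hyperlinks (the study's actual algorithm may differ), the following sketch prefers a letter occurring in the link's own text and falls back to any unused letter. This matches the “Citizenship”/“Countries” behavior described later in the discussion:

```python
import string

def assign_letters(link_texts):
    """Assign a unique letter to each visible link, preferring letters that
    occur in the link's own text. An illustrative sketch (our own strategy),
    not necessarily the assignment algorithm used in the study."""
    used, assignment = set(), {}
    for text in link_texts:
        candidates = [c for c in text.upper() if c in string.ascii_uppercase]
        letter = next((c for c in candidates if c not in used), None)
        if letter is None:  # no free letter from the text: fall back to any unused one
            letter = next(c for c in string.ascii_uppercase if c not in used)
        used.add(letter)
        assignment[text] = letter
    return assignment
```

With this strategy, “Citizenship” claims ‘C’ first, so “Countries” receives its next free letter, ‘O’.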
3.3 Offset Menu
A drop shadow is again drawn behind the menu options to create the illusion of layering; the menu options are on top and the web page is in the background. The selected option is green because the color green affords “going forward”, much like a traffic light. The size of the menu options is large enough to allow for some inaccuracy and imprecision in gaze tracking. The text is centered, drawing the user’s attention to the center of the menu option to make it easier to select.
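Which links populate the Offset Menu can be decided by a simple proximity test around the gaze point. The sketch below is our own illustration; the 60-pixel uncertainty radius is an assumed value, not one from the study:

```python
import math

def menu_candidates(gaze, links, radius=60):
    """Return links whose centre lies within the gaze-uncertainty radius
    (pixels), nearest first; these would populate the offset menu.
    Hypothetical sketch: link records carry a `bbox` of (x, y, w, h)."""
    gx, gy = gaze
    def dist(link):
        x, y, w, h = link["bbox"]
        return math.hypot(x + w / 2 - gx, y + h / 2 - gy)
    return sorted((l for l in links if dist(l) <= radius), key=dist)
```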
3.4 Ray Selection
The selected hyperlink’s name is drawn on a semi-opaque white background, so it is possible to read the name even against background text. A drop shadow creates the illusion of layers, similar to Letter Assignment and Offset Menu. The selected hyperlink is highlighted with a red border to make it clear to the user which hyperlink is selected, even if the user is not looking at it directly.
During our pilot study, it became clear that this alternative exhibited several disadvantages, to a degree that it was clearly not worthwhile to include it in the main study. We present it in the interest of also reporting negative results, so that others can learn from our experience. Users often found it very difficult to click the desired hyperlink as they were unsure where to look to select a target. In particular, they found it difficult to change the currently selected hyperlink if it was not the desired one. The start point would typically already be close to the target (as in Fig. 7), forcing users to look beyond the target to make the ray point in the right direction despite gaze tracking inaccuracy. Our experience with this alternative illustrates the problem of separating gaze from intention, i.e. of making users look at anything that is not clearly a target.
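The core of Ray Selection, choosing the link that lies closest in direction to the ray from a start point through the current gaze point, can be sketched as follows. The function and its 20-degree tolerance are our own illustrative choices, not the study's implementation:

```python
import math

def ray_select(origin, gaze, links, max_angle_deg=20):
    """Pick the link whose centre deviates least in angle from the ray
    that runs from `origin` through the current gaze point.
    Hypothetical sketch: link records carry a `bbox` of (x, y, w, h)."""
    ox, oy = origin
    ray = math.atan2(gaze[1] - oy, gaze[0] - ox)
    best, best_dev = None, math.radians(max_angle_deg)
    for link in links:
        x, y, w, h = link["bbox"]
        ang = math.atan2(y + h / 2 - oy, x + w / 2 - ox)
        # wrap the angular difference into [-pi, pi] before comparing
        dev = abs(math.atan2(math.sin(ang - ray), math.cos(ang - ray)))
        if dev < best_dev:
            best, best_dev = link, dev
    return best
```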
The usability study was conducted using a within-subjects design to reduce error variance stemming from individual performance differences. The independent variable is the click alternative used to complete the given tasks. The dependent variables measured are “time taken to click link” (click time) and “number of incorrect clicks” (inaccuracy). Ease of use was measured using the System Usability Scale (SUS).
A 30-inch 144 Hz LCD monitor with a resolution of 1920 × 1080 pixels, a standard QWERTY keyboard and a standard mouse with the default Windows 7 configuration were used. A Tobii X2-30 W eye gaze tracker with a refresh rate of 30 Hz was mounted on a tripod below the monitor, in a non-intrusive position. A fully adjustable chair with headrest and armrests allowed participants of various heights to sit well within the tracking volume of the gaze tracker, helped keep their heads still, and kept them comfortable for the duration of the experiment. The room was lit by fluorescent lights, and the blinds were closed to block sunlight from interfering with the eye gaze tracker.
After filling out a pre-experiment demographics questionnaire, the participants were comfortably seated and the chair adjusted to best fit the eye gaze tracker’s usable range. Before each click alternative was started, the eye gaze tracker was calibrated using Tobii’s EyeX software, which took 20–30 s. Recalibration was performed if the participant moved around too much or found the calibration to be too inaccurate. Calibration was then measured using a custom program which logged how close the gaze tracker coordinates were to each of nine on-screen calibration points.
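The custom accuracy check can be thought of as averaging the offset between each on-screen calibration point and the gaze samples recorded while the participant fixated it. This is a hypothetical sketch of such a measurement, not the program used in the study:

```python
import math
import statistics

def calibration_error(points_to_samples):
    """Mean Euclidean offset (in pixels) between each known calibration point
    and the gaze samples recorded while it was fixated.
    `points_to_samples` maps (x, y) points to lists of (gx, gy) gaze samples."""
    errors = [math.hypot(gx - px, gy - py)
              for (px, py), samples in points_to_samples.items()
              for (gx, gy) in samples]
    return statistics.mean(errors)
```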
A generic clicking task was used to measure click time and accuracy, in a series of 40 hyperlinks pseudo-randomly chosen from Wikipedia. We chose Wikipedia because it is one of the most visited websites and all the participants had used it in the past. An offline copy of Wikipedia was used to ensure all the pages were static and consistent. For each click alternative, participants were allowed as much “free-play” time as they wanted, so they could properly learn how the click alternative worked and get used to navigating Wikipedia pages.
All clicks were logged in a CSV file together with fine-grained events, such as the time the target was found and the time a button was pressed. After each click alternative was tested, a post-task questionnaire was filled out by the participant, which contained the ten SUS questions answered on a five-point Likert-scale. After completing the tasks using all four click alternatives, a post-experiment questionnaire was filled in, which asked the participants to rank the click alternatives from one to four, with one the best. An optional comment section allowed participants to explain their rankings and express their thoughts on each of the click alternatives.
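SUS scores are computed from the ten Likert responses in the standard way: odd-numbered items contribute (response - 1), even-numbered items (5 - response), and the sum is scaled to 0-100. A minimal sketch of that scoring:

```python
def sus_score(responses):
    """Standard SUS scoring for ten 1-5 Likert responses: odd-numbered items
    contribute (r - 1), even-numbered items (5 - r); sum scaled to 0-100."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # i = 0 is question 1 (odd)
                for i, r in enumerate(responses))
    return total * 2.5
```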
Table: Summary of results per click alternative (Gaze & Click, Letter Assignment, Offset Menu, Mouse): click time mean, standard deviation, median, and median 95 % CI in seconds; SUS score mean and standard deviation. (Table values not preserved.)
Another one-way within-subjects ANOVA was conducted to test the effect of the click alternative on the number of correct clicks, showing a significant effect (F(3, 72) = 9.97, p < 0.0001). Paired samples t-tests with Holm correction were used to make post hoc comparisons between the conditions. Similar to click time, there were significant differences between the numbers of correct clicks of all conditions (p < 0.01) except for Letter Assignment and Offset Menu (p = 0.10). It is clear that Gaze & Click is the least accurate and the mouse is the most accurate click alternative.
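The Holm correction applied in the post hoc comparisons works by sorting the raw p-values, multiplying the i-th smallest (0-based) by (m - i), and enforcing monotonicity. A self-contained sketch of the adjustment (the study presumably used a statistics package):

```python
def holm_correction(pvalues):
    """Holm step-down adjustment of p-values: the i-th smallest raw p-value
    (0-based) is multiplied by (m - i), capped at 1, and adjusted values are
    made monotonically non-decreasing in that order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted
```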
“Letter assignment was quick and easy to use. Mouse beats the other two because it was far more accurate”
“Mouse is the one I’m used to. Offset Menu was quick and accurate. Letter Assignment required whole keyboard. Gaze and click was super inaccurate”
“Gaze & click with very accurate eye tracker would outperform the other two”
“Offset: had to literally search the alternatives → cumbersome. Gaze and click: sometimes hard to hit target. Letter assignment: good but change of focus between keyboard and screen not ideal. Mouse: slow movement speed”.
“Letter assignment was a little troublesome actually having to wait for and read the assigned letter, which is annoying when it’s not the first letter”
“Offset menu is predictable and lots of visual feedback”
“Offset menu was easy to use but still seemed like a strain on the eyes with continuous use”.
As expected, the mouse was the fastest and most accurate click alternative, with the best SUS score. This may be in part due to every participant being highly accustomed to it, and to its gentle learning curve. Gaze & Click has no support for helping users click links accurately; therefore we expected it to be the least accurate click alternative, but not necessarily the slowest. Interestingly, participants often took one of two different approaches. One was to click links quickly regardless of whether they were sure their gaze position was on top of the right link, and the other was to spend a long time getting their gaze position to be stable on top of the right link before clicking the hotkey. The first approach was very quick, often being faster than the mouse, but the second approach sometimes took upwards of 10 s. Participants were told to “click links as fast and as accurately as possible”, so participants had to decide whether to be fast or to be accurate, as it was clearly not possible to do both.
From observation, participants had trouble deactivating the Offset Menu if none of the options given were correct. There were two causes for this. First, the 200 ms de-selection threshold was not explicitly explained to users beforehand. Second, some users looked too far off-screen, breaking the line of sight with the gaze tracker. This often caused the gaze position to freeze on a menu option. In both cases, the option would still be selected when the user released the key.
From observation, Letter Assignment proved to be difficult for participants because the assigned letter for a link was not always the one they were expecting. For example, two links “Citizenship” and “Countries” were often next to each other and both assigned a letter. “Citizenship” would be assigned the ‘C’ key and “Countries” would be assigned ‘O’. Participants would often click ‘C’ if they wanted to go to “Countries”. Most of the time the assigned letter was the one the participants were expecting, so it would trip them up when it was not.
For both Letter Assignment and Offset Menu, the speed was quite close to the mouse if the target link was among the first links selected (either by displaying the letters or showing the menu). However, if the target link was not immediately selected for whatever reason, then this would at least double the click time: the user would need to de-select and then re-select the options. This was not an issue for Gaze & Click or the mouse, as both of them are modeless and do not need a selection/de-selection process.
How do the presented gaze & key click alternatives compare to other gaze-based click alternatives? It is difficult to compare the results of studies with different methodologies. However, some studies use hyperlink clicking tasks similar to the one presented here, so at least a discussion is possible. The purely gaze-based Multiple Confirm click alternative seems slower than Letter Assignment and Offset Menu, which is not surprising considering that no hardware buttons are used. Interestingly, Multiple Confirm also seems more accurate, probably because it is harder to click incorrect (and correct) links. EyePoint, which is another gaze & key click alternative, seems faster than Letter Assignment and Offset Menu, but less accurate than Offset Menu. This could be because Offset Menu – in contrast to EyePoint – gives clear feedback about the target that will be clicked.
Eye trackers may well be one of the next types of computer peripherals going mainstream. However, it is still a challenge to create added value from these devices in everyday computing. Using them as a pointing device seems natural, and combining them with a keyboard to create a point-and-click interface may have advantages in situations where the use of a mouse is inconvenient or impossible.
We designed and implemented novel gaze & key click alternatives combining eye gaze tracking and keyboard input (Letter Assignment and Offset Menu), allowing users to click targets on the screen without the mouse. These click alternatives are able to mitigate some of the inaccuracies of eye gaze trackers and the eye, resulting in an improved accuracy when compared to a naive click alternative based on direct gaze pointing and a physical button (Gaze & Click). They are still significantly slower and less accurate than the mouse; however, we believe that with more work they could become realistic mouse replacements for certain situations.
One major issue found during the experiments was calibration: it was frequently necessary to recalibrate the gaze tracker, and many participants found this tiring and time-consuming. As a consequence, the use of methods for automatic or simplified calibration should be considered. Furthermore, there are problems of the proposed click alternatives that should be addressed, e.g. the assignment of unintuitive letters to targets in Letter Assignment and difficulties with the de-selection of potential targets in Offset Menu.
Finally, there are some open questions. For example, to what extent did touch-typing skills affect the performance of Letter Assignment? What are the long-term effects of using gaze & key click alternatives? How do such click alternatives compare to other pointing devices such as trackpads and touchscreens? We hope to answer some of these questions in future work.
We would like to acknowledge Abdul Moiz Penkar for his work on the web browser implementation, and all our participants without whom this study would not have been possible.
1. Porta, M., Ravelli, A.: WeyeB, an eye-controlled web browser for hands-free navigation. In: Proceedings of the Conference on Human System Interactions (HSI), pp. 210–215. IEEE (2009)
2. Porta, M., Ravarelli, A., Spagnoli, G.: ceCursor, a contextual eye cursor for general pointing in Windows environments. In: Proceedings of the Symposium on Eye-Tracking Research & Applications (ETRA), pp. 331–337. ACM (2010)
3. Skovsgaard, H.: Noise challenges in monomodal gaze interaction. Ph.D. thesis, IT University of Copenhagen, Copenhagen (2012)
4. Bates, R., Istance, H.: Zooming interfaces! Enhancing the performance of eye controlled pointing devices. In: Proceedings of the Fifth International ACM Conference on Assistive Technologies, pp. 119–126. ACM (2002)
5. Penkar, A.M., Lutteroth, C., Weber, G.: Designing for the eye: design parameters for dwell in gaze interaction. In: Proceedings of the Australian Computer-Human Interaction Conference (OzCHI), pp. 479–488. ACM (2012)
6. Wandell, B.A.: Foundations of Vision. Sinauer Associates, Sunderland (1995)
7. Zhang, X., Ren, X., Zha, H.: Improving eye cursor’s stability for eye pointing tasks. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 525–534. ACM (2008)
8. Sibert, L.E., Jacob, R.J.K.: Evaluation of eye gaze interaction. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 281–288. ACM (2000)
9. Ohno, T.: Features of eye gaze interface for selection tasks. In: Proceedings of the Asia Pacific Conference on Computer Human Interaction (APCHI), pp. 176–181. IEEE (1998)
10. Urbina, M.H., Lorenz, M., Huckauf, A.: Pies with eyes: the limits of hierarchical pie menus in gaze control. In: Proceedings of the Symposium on Eye-Tracking Research and Applications (ETRA), pp. 93–96. ACM (2010)
11. Ware, C., Mikaelian, H.H.: An evaluation of an eye tracker as a device for computer input. In: Proceedings of the SIGCHI/GI Conference on Human Factors in Computing Systems and Graphics Interface (CHI), pp. 183–188. ACM (1987)
12. Kumar, M., Klingner, J., Puranik, R., Winograd, T., Paepcke, A.: Improving the accuracy of gaze input for interaction. In: Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA), pp. 65–68. ACM (2008)
13. Kumar, M., Paepcke, A., Winograd, T.: EyePoint: practical pointing and selection using gaze and keyboard. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 421–430. ACM (2007)
14. Ashmore, M., Duchowski, A.T., Shoemaker, G.: Efficient eye pointing with a fisheye lens. In: Proceedings of Graphics Interface (GI), pp. 203–210. Canadian Human-Computer Communications Society (2005)
15. Käser, D.P., Agrawala, M., Pauly, M.: FingerGlass: efficient multiscale interaction on multitouch screens. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1601–1610. ACM (2011)
17. Zhai, S., Morimoto, C., Ihde, S.: Manual and gaze input cascaded (MAGIC) pointing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 246–253. ACM (1999)
18. Blanch, R., Ortega, M.: Rake cursor: improving pointing performance with concurrent input channels. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1415–1418. ACM (2009)
19. Kumar, M., Winograd, T.: GUIDe: gaze-enhanced UI design. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - Extended Abstracts (CHI EA), pp. 1977–1982. ACM (2007)
21. Brooke, J.: SUS: a ‘Quick and Dirty’ usability scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, A.L. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor & Francis, London (1996)