A Simple Method to Record Keystrokes on Mobile Phones and Other Devices for Usability Evaluations

  • Brian T. Lin
  • Paul A. Green
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9746)

Abstract

Task times, sometimes at the keystroke level, as well as the number of keystrokes, are often used to assess the usefulness and ease of use of mobile devices. This paper describes a new method to obtain keystroke-level timing of tasks using Virtual Network Computing (VNC) and TechSmith Morae (commonly used for usability tests). Running VNC, the PC mirrors what the mobile device does, which is recorded by Morae running on the PC. To evaluate this configuration, 24 pairs of subjects texted 1,200 messages concerning five topics to each other. Every keystroke was recorded and timed to the nearest 10 ms. As desired, the communications were quite stable, with the times of 90 % of the messages on the two recording computers being normally distributed and within ± 100 ms of each other. Others are encouraged to use this method, given its accuracy and low cost, to examine mobile device usability.

Keywords

Keystroke-logging · Usability testing · Mobile devices · Virtual Network Computing (VNC)

1 Introduction

Mobile devices are widely popular because they are convenient and easy to use. Ease of use can be assessed using many measures and statistics, though the most straightforward measures are task time [1] and the number of keystrokes. Previous studies have reported methods to record when and which key is pressed at any given time to determine inter-keystroke intervals for mobile devices [2, 3, 4]. However, more specific and easier-to-implement methods are needed. Until now, usability assessments of these devices have been limited in number because devising accurate methods to record the desired data is a challenge.

Basically, task time can be determined using (1) a real-time manual timing system operated by an observer, (2) specialized hardware, (3) video records analyzed after the fact (post processed frame by frame), or (4) usability evaluation software embedded in the device being tested (e.g. Remote User Interface, RUI [5], LetterWise [6], Uzilla [7]).

(1) Manual real-time data collection has a long history of use, especially within industrial engineering, typically using stopwatches. However, it is time consuming, and events that occur in rapid succession cannot be reliably timed manually. (2) For human-computer interaction, specialized data collection devices inserted between the key and computer (e.g. KeyGhost for PS/2 keyboards, KeyCarbon for USB keyboards) and computer software (Windows APIs, Mac Keylogger by REFOG) can be used to surreptitiously record all keystrokes, mouse actions, and screen updates. By design, most variations of this method will not work for mobile devices. (3) On the surface, post-processing of video records seems to be an easy method to collect the desired data. However, this approach is not cost effective, and the data analysis is tedious. (4) Finally, one could use specialized evaluation software that runs in parallel with the system being tested. This software may be specifically intended for human-computer interaction evaluations.

Accordingly, tailored approaches for mobile devices have been developed. These approaches involve universal applications, such as a small J2ME program that can be installed on the device very quickly [8] to log the times of key presses and releases and save them in a file (ASCII, Unicode) on the device. The program runs in the background, independent of other processes. After the data collection phase, the experimenter can export, reduce, and analyze the data. However, such software may add to the device processor load and memory requirements (leaving insufficient capacity for the applications being assessed) and may affect timing accuracy (adding variable delays) [9]. Furthermore, these programs often do not provide an independent real-time output, in particular a signal that can be used by other software to synchronize related information (e.g. physiological data, other task performance data). Some applications do not encrypt the data, so there may be security concerns.

To overcome mobile device limitations, the mobile device could be emulated on a personal computer, mapping the key/icon assignments (in Flash, JAVA, or XML) to those typical on the PC [6]. The data collection program can be installed on a powerful PC, run in the background [5], and can also be combined with other processes, such as eye-tracking systems [10]. The PC interface, however, is not what mobile device users actually use, and the key/icon assignments are device dependent. For example, users will feel odd if they are asked to use a PC numeric keypad instead of a telephone keypad, because the shape, size, layout, and force feedback of the two keypads are very different. Betiol and Cybis [11] clearly indicated that an emulator setup may affect the validity of the usability problems identified on the device.

Furthermore, these methods can require more software knowledge than those conducting usability evaluations may have, and can exceed their limited budgets and time. Therefore, this paper describes a new method to collect user task times using a remote protocol (Virtual Network Computing, VNC) and usability evaluation software (Morae v.3.2, TechSmith). A sample experiment conducted using this method is described.

2 New Method Description

2.1 Overview of the VNC-Morae Method

A novel method was developed in which a VNC (Virtual Network Computing) client was installed on a mobile device (a smartphone) that was connected to a PC running the usability evaluation software (Morae v3.2). Because the application involved texting, Skype was used as the platform for data exchange between mobile devices. Currently, the only limitation of this method is that the mobile device must run iOS, Android, or the BlackBerry operating system, which together constitute the majority of phones sold (94.1 % in Q4 2012, [12]).

By far, Morae is the most popular application for collecting usability data. The software suite consists of a recorder, an observer, and a manager. The Morae Recorder is installed on the subject’s computer and records every event occurring (keystroke, mouse action, screen refresh), along with input from a web camera (showing the user’s face or hands) and a microphone (what the user says). Accuracy can reliably be determined to the nearest 10 ms. Morae Observer, running on another computer connected to the Recorder via a local area network (LAN) or the Internet, allows a remote observer to see the user’s screen, hear what they say, and see their face as a picture-in-picture (PIP) in real time. Using the Observer software, an experimenter can mark when tasks start and stop, log errors, enter comments about tasks, and identify segments for highlight clips. The experimenter’s inputs are synchronized with the subjects’ events on the Recorder computer.

Morae Manager shows all recorded data on a single timeline (Fig. 1). In this example (Fig. 1a), the timeline stops at 0h:27m:28s.90 (entire clip duration 1h:29m:44s.15), the moment the message “unless that waitress eats them all” was sent. Two self-triggered markers (triangles, green at the left and red at the right) represent the beginning and end of the task of interest. The corresponding keystrokes are shown in Fig. 1b.
Fig. 1.

Windows in Morae Manager interface

2.2 Virtual Network Computing - The Connection Between Mobile Device and Computer

Virtual network computing (VNC) involves using the Remote Frame Buffer (RFB) protocol to remotely control another computer. Color values for pixels on the screen stored in the memory buffer are transferred through RFB. VNC consists of a server, a client (or viewer), and a protocol between them. The machine running the server feeds the signal to the client via RFB protocol, so the client has the same desktop image as the server. As the controller, the client can submit commands and interact with the server.

By default, the communication uses TCP port 5900 for connections over a broadband Internet connection or a LAN. RealVNC, a particular implementation, uses high-strength AES (Advanced Encryption Standard) encryption to improve data transfer security.
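To make the connection setup concrete, an RFB session opens with a fixed 12-byte ProtocolVersion exchange (e.g. b"RFB 003.008\n" for protocol 3.8). The following Python sketch parses that banner; the function name and error handling are illustrative, not part of any VNC library:

```python
def parse_rfb_version(banner: bytes) -> tuple[int, int]:
    """Parse the 12-byte RFB ProtocolVersion banner into (major, minor).

    A server opens every RFB session by sending e.g. b"RFB 003.008\n";
    the client then replies with the highest version it supports.
    """
    if len(banner) != 12 or not banner.startswith(b"RFB ") or not banner.endswith(b"\n"):
        raise ValueError("not an RFB ProtocolVersion message")
    major, minor = banner[4:11].split(b".")
    return int(major), int(minor)

print(parse_rfb_version(b"RFB 003.008\n"))  # → (3, 8)
```

In practice, a client would read these 12 bytes from TCP port 5900 immediately after connecting, before security negotiation begins.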

2.3 Data Transfer Platform – The Connection Between Computers

Because the user’s operations on the smartphone are relayed to the computer running the VNC server and Morae Recorder, data transfer is triggered by the mobile device but executed by the computer. In other words, the communication between mobile devices can be treated as communication between computers, which is much easier to handle. The communication can be over a local area network (LAN) or the Internet. Using a LAN eliminates network traffic jams and provides a secure connection.

2.4 Hardware and Software Configuration

Figure 2 shows the configuration used for this test case, which supported two smartphone users sending text messages to each other, whose keystrokes and inter-keystroke intervals were recorded. The smartphones were connected to a wireless router via WiFi, and the computers were connected to it via Ethernet, forming two local area networks (left and right halves of Fig. 2) that connected to the Internet through the wireless router. A single router was used in this instance because, most of the time, the data streams exchanged between the two sides did not occur simultaneously. Using one router also reduced the amount of hardware required, though two could be used. The wireless router is central to the entire process because it establishes the link between the smartphone and computer, and gives the computers access to the Internet.
Fig. 2.

Manual text entry configuration (The numbers 1–8 represent the steps in each communication cycle).

In summary, what the users see on their smartphone (running the VNC Client) actually comes from the PC, which is showing the Skype interaction and running Morae in the background. Most of the computationally intensive software runs on the PC, and the smartphone serves as an input device.

3 Case Study: Keystroke-Level Accurate Timing of Text Messages Between Smartphones

3.1 Overview

This configuration was used to study users sending text messages to each other [13]. Of interest were both the linguistic content of the messages [14] and the keystroke-level timing of user inputs. In this experiment, subjects sent text messages on five different topics that were the same for all subjects. About five minutes was spent on each topic, after which the interaction ended and subjects moved on to a new topic. To provide the realism of a driving scenario, this experiment took place in a driving simulator. Because texting while driving is illegal, all manual entry by drivers was done with the vehicle parked. The goal of this case study was to examine the protocol for real-time keystroke logging between two smartphones for usability testing, by comparing the timelines recorded on the two sides.

3.2 Participants

Twenty-four pairs of young smartphone users who regularly sent text messages to each other participated, an important distinction from other research. Each subject was paid $50 for approximately two hours of his/her time. One subject sat in the driver’s seat of the driving simulator (http://www.umich.edu/~driving/facilities/sim.html), and the other subject was elsewhere out of sight of the driver (sitting at a desk in an office), as is commonly the case. Subject pairs were equally drawn from two groups, late teens (aged 18–19, mean = 19) and young adults (aged 20–29, mean = 23). Among all drivers, these are the two groups most likely to text and drive. Because their relationship could affect their texting interaction, each group of twelve had three male pairs, three female pairs, and six mixed-gender pairs. All participants were friends, classmates, or colleagues, but not relatives, to avoid relationships that could alter message content.

3.3 Equipment, Materials, and Software

The ideal situation would be for subjects to use their own phones. Android users used their own phones with the RealVNC Client downloaded (free) and installed from the Android Market. iPhone users used two iPhone 4’s provided by the UMTRI Driver Interface Group, onto which RealVNC had been downloaded before the experiment to save time. BlackBerry users were provided iPhones with an attached hard-key QWERTY keyboard (Keyboard Buddy iPhone 4 case, Boxwave Co.) to simulate the BlackBerry experience, because the BlackBerry screen (usually about 2.5 in) was too small to accommodate all the information needed by the VNC Server.

Morae Recorder (version 3.2) ran independently on two Windows XP computers, recording the keys pressed and the inter-keystroke intervals to the nearest centisecond. Timestamps of when messages were received were also collected. A wireless router (Netgear WNDR3400, N600 wireless dual band, 300 Mbps × 2) was connected to the two mobile devices via WiFi. One WiFi band was at 2.4 GHz and the other at 5.0 GHz, each with a bandwidth of 300 Mbps and no mutual interference.

3.4 Experiment Design

As mentioned before, subjects sent text messages to each other, with one subject in the driving simulator and the other in an office. An expressway road scene was presented to the driver, but he/she did not drive. Initially, subjects practiced texting for 10 to 15 min to become familiar with the experimental situation, using the two topics “I need BBQ” and “I am glad the football season begins.” Subjects could either continue the assigned topic or change to other topics they were more interested in at any time.

Subsequently, subjects participated in five message sequences, each triggered by text provided to the driver by the experimenter on an in-vehicle display. Topics were selected to be statistically representative of those that typically occurred while texting (and not driving) based on a General Motors provided text message corpus collected while not driving [15]. See also Winter et al. [16]. There were 5 topics about human relations, 2 for activities and events, 3 for appointments and schedules, 2 for school and work, 2 for technology, and 1 for emotion. For further information about the design, please refer to Green et al. [13].

3.5 Results

There were 1,200 messages (49,985 keystrokes) sent between the two sides, 584 messages from drivers to partners and 616 from partners to drivers. On average, a typical message included 8.5 words (S.D. = 6.2) and 41.6 characters (S.D. = 30.7), including 7.5 spaces, 1.2 punctuation marks, and 32.9 letters and numbers. For a detailed analysis of message content, see Hecht et al. [14].

To analyze inter-keystroke intervals, one needs confidence that the timing is accurate. When typing from a transcript, the brain is used as a short-term buffer, and the typist loads a certain amount of text into that buffer [17]. Text is grouped into discrete units and entered at once [18], so the times between keystrokes can be very short. All too often, research is conducted on keying behavior, response times, or eye fixations, but the timing is never checked. This is particularly important when claims are made about millisecond or even centisecond accuracy that the hardware and software do not support. The focus of this paper is on how well the method captured the keystrokes and on its timing accuracy.
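As a simple illustration of the quantity being analyzed, inter-keystroke intervals are just the first differences of the logged key-press times. The timestamps below are in centiseconds (the resolution Morae records at) and are made up for the example:

```python
def inter_key_intervals(times_cs: list[int]) -> list[int]:
    """Differences between consecutive keystroke times, in centiseconds."""
    return [b - a for a, b in zip(times_cs, times_cs[1:])]

# Keystrokes logged at 0, 18, 41, and 47 cs after the task start.
print(inter_key_intervals([0, 18, 41, 47]))  # → [18, 23, 6]
```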

There was no evidence that any of the 49,985 keystrokes were lost or transmitted out of order. Also of interest was if various devices were reporting the same times and the same transmission delays between devices. The issue was addressed in two phases. During the preliminary phase of the development of this method, each smartphone was placed side by side with the PC that was associated with it, and text was entered into the phone. There was no perceptible delay between when keys were pressed, when the text appeared on the smartphone display, and when the text appeared on the associated PC (connected via VNC).

In a subsequent phase, system timing and lags were examined. Time logging started before the topic was shown to the driver, continued across the five topics, and did not stop until texting on all five topics ended. The authors compared the log files on each PC (driver, partner), each with its own timeline, and analyzed the effects of message characteristics. In this case, the driver and partner timelines could not be readily synchronized and compared using a third-party timeline.

The log files included the time a message was sent from one device and the time that message was received by the other device, each on its own timeline. Therefore, a method was devised to integrate the two timelines, using the first sent/received message as the baseline.

Comparing the adjusted message-sent and message-received timelines, the time differences included two parts: the time for Skype to pass messages through the Internet (whose transmission times were variable), and the difference between the Morae timelines running on two non-identical computers recording all the events. (The driver-side computer (Intel Core 2 Quad 2.4 GHz + 4 GB of RAM) had more throughput than the other computer (Intel Pentium IV 3.6 GHz + 1 GB of RAM).) The two timelines could not be synchronized without a third-party timeline, which did not exist in this case. Identical hardware for the driver and partner sides was not available, and this was initially not thought to be a concern.

Table 1 shows how the data were processed. The first sent/received messages of each subject pair (columns A and C) were treated as the baselines for the messages they sent back and forth. Thus, the first message served to zero the timeline, with a log file entry of 0h:00m:00s.00. Using that value, the times for messages being sent and received (columns B and D) could be compared, leading to column E. In column E, the time differences could be positive or negative, meaning that the time difference was less (if negative) or greater (if positive) than for the first sent/received pair.
Table 1.

Calculation of the difference between timelines

Msg #     | By driver’s computer          | By partner’s computer              | (E) Time
          | (A) Sent    | (B) Time based  | (C) Received | (D) Time based      | difference
          | time        | on first sent   | time         | on first received   | (ms)
          |             | message         |              | message             |
0 (First) | 0:12:49.52  |                 | 0:11:49.35   |                     |
1         | 0:13:53.85  | 0:01:04.33      | 0:12:53.60   | 0:01:04.25          | −80
2         | 0:14:48.56  | 0:01:59.04      | 0:13:48.24   | 0:01:58.89          | −150
3         | 0:15:49.49  | 0:02:59.97      | 0:14:49.35   | 0:03:00.00          | 30
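The baseline adjustment in Table 1 can be sketched in a few lines of Python. The timestamps are the ones from the worked example above; the helper names are illustrative:

```python
def to_seconds(ts: str) -> float:
    """Convert an 'h:mm:ss.cc' Morae timestamp to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def timeline_differences(sent: list[str], received: list[str]) -> list[int]:
    """Per-message (received - sent) offsets in ms, zeroed at message 0.

    Each side's timeline is re-based on its first message, so only the
    relative drift between the two logs remains.
    """
    s0, r0 = to_seconds(sent[0]), to_seconds(received[0])
    return [round(((to_seconds(r) - r0) - (to_seconds(s) - s0)) * 1000)
            for s, r in zip(sent[1:], received[1:])]

sent = ["0:12:49.52", "0:13:53.85", "0:14:48.56", "0:15:49.49"]
received = ["0:11:49.35", "0:12:53.60", "0:13:48.24", "0:14:49.35"]
print(timeline_differences(sent, received))  # → [-80, -150, 30]
```

The output reproduces column E of Table 1 for messages 1–3.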

During the data collection phase, three messages from the driver to the partner and one from the partner to the driver were delayed by VNC by more than five seconds; in these cases, the data stream from the VNC Client was not immediately sent to the VNC Server. These four messages were removed from further analysis, leaving 1,196 messages.

Further, the first sent/received message of each pair of subjects, on both sides, was excluded because it served as the baseline for the time-log adjustment and no message was sent/received before it. Therefore, 1,148 messages were considered (584 − 24 − 3 = 557 sent from the driver to the partner; 616 − 24 − 1 = 591 from the partner to the driver).

Figure 3 shows the distributions of the time differences for messages sent from the driver to the partner (3a) and vice versa (3b). The white bars at the far right of Fig. 3a and b represent relative time differences greater than 300 ms. Negative values mean that the processing time for Skype and Morae for a particular message was less than that for the first message sent and received (assumed to be zero).
Fig. 3.

Histogram of time difference between the driver and partner, based on the first sent/received message.

In Fig. 3a, when the driver sent messages to the partner, the relative time differences were normally distributed, with a mean of 0.70 ms and a standard deviation of 73 ms. Approximately 83 % (463/557) of messages had relative time differences within ± 73 ms (one standard deviation), and over 99 % (553/557) were within ± 3σ. Similar results were found in Fig. 3b for the messages sent by the partner to the driver. The mean and standard deviation of the relative time differences were 8.2 ms and 73 ms. Time differences within ± 73 ms accounted for 77 % (454/591) of messages, and over 99 % (589/591) were within ± 3σ. Thus, the mean delay was about 8 ms longer from the partner relative to the first message, but the standard deviation, which is most important, was identical. As a reminder, times were determined to the nearest 0.01 s by Morae.

Thus, the message transmission delays due to VNC and Morae were quite stable. However, as long as communication occurred over the Internet, communication delays could not be completely avoided. Certainly, using identical hardware for the driver and partner would have led to more consistent timing, but the expected differences due to hardware are likely to be much smaller than those due to the Internet. What is important is that, in many situations, the inter-keystroke intervals for each device were of sufficient accuracy.

4 Conclusion

4.1 Strengths of the Method Used

This paper describes in detail a method that easily and accurately collects keystrokes on mobile devices to the nearest centisecond, and provides example performance data collected using this method in a case study. The performance of the configuration was quite good, and the data recorded were very stable. Some 83 % (from driver to partner) and 77 % (from partner to driver) of the messages had time differences within ± 1σ (73 ms in both directions), using the first messages as the baselines. With an error tolerance of ± 100 ms, some 93 % (from driver to partner) and 88 % (from partner to driver) were included. This is excellent, considering that timing was to the nearest 10 ms. Kukreja et al. [5] and Austin et al. [19] report peak and mean inter-keystroke intervals of 175 ms and 356 ms, respectively, for typing on a full-size computer keyboard. Therefore, the data-logging configuration in this study was accurate enough to record keystroke timing. The timing was unaffected by the length of the message sent and was fairly stable throughout the experiment.

4.2 Concerns with the Method Used

There are three sources of potential timing error: (1) the Internet over which the communication occurred, (2) the software (VNC) used to exchange messages, and (3) the computers logging the timestamps. The Internet was used in this case for ease of access. There were three outliers in the messages sent from the driver to the partner and one from the partner to the driver. This corresponds to only 0.3 % of all messages, an acceptably low amount. Oddly, three of these cases occurred in the afternoon, between 1–4 pm, on a particular day, which is why some sort of Internet-related cause is suspected. In theory, a LAN dedicated to an experiment should provide more consistent transmission times because the load is stable and the hardware fixed. However, few applications support LANs, customizing a LAN information exchange platform is time consuming and costly, and the timing will not be as accurate as that of native programs [20].

Finally, although not considered to be a major source of timing problems, the driver and partner logging computers were different, so their processing time could differ. Using two identical computers is recommended to eliminate any suspicion of a problem.

4.3 Closing Thought

In summary, the method described in this paper provides a simple, low-cost, and accurate way to record and time keystroke-level actions on mobile devices, something which is extremely difficult to do as well using other methods. These data are essential for detailed analyses of user actions in applied usability studies and for more fundamental analyses of how people interact with mobile devices. The accuracy of this configuration is good enough for most purposes, but it is not perfect. The next steps are to explore (1) using LANs to improve timing, (2) variations in VNC performance as a function of hardware, and (3) recording performance for other input gestures such as swiping and dragging (in particular their paths and click locations). Researchers are strongly encouraged to use this method in their research.

References

  1. Hornbæk, K., Law, E.L.-C.: Meta-analysis of correlations among usability measures. Paper presented at the CHI 2007 Proceedings, San Jose, CA, USA (2007)
  2. Kjeldskov, J., Stage, J.: New techniques for usability evaluation of mobile systems. Int. J. Hum.-Comput. Stud. 60(5), 599–620 (2004)
  3. Klockar, T., Carr, D.A., Hedman, A., Johansson, T., Bengtsson, F.: Usability of mobile phones. Paper presented at the Proceedings of the 19th International Symposium on Human Factors in Telecommunication, Berlin, Germany (2003)
  4. Silfverberg, M., MacKenzie, S., Korhonen, P.: Predicting text entry speed on mobile phones. Paper presented at CHI 2000, The Hague, The Netherlands (2000)
  5. Kukreja, U., Stevenson, W.E., Ritter, F.E.: RUI: recording user input from interfaces under Windows and Mac OS X. Behav. Res. Meth. 38(4), 656–659 (2006)
  6. MacKenzie, S., Kober, H., Smith, D., Jones, T., Skepner, E.: LetterWise: prefix-based disambiguation for mobile text input. Paper presented at the Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, Orlando, FL, USA (2001)
  7. Edmonds, A.: Uzilla: a new tool for web usability testing. Behav. Res. Meth. Instrum. Comput. 35(2), 194–201 (2003)
  8. Holleis, P., Otto, F., Hußmann, H., Schmidt, A.: Keystroke-level model for advanced mobile phone interaction. Paper presented at the CHI 2007 Proceedings, San Jose, CA, USA (2007)
  9. Keller, F., Gunasekharan, S., Mayo, N., Corley, M.: Timing accuracy of web experiments: a case study using the WebExp software package. Behav. Res. Meth. 41(1), 1–12 (2009)
  10. Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., Johansson, R.: Combined eyetracking and keystroke-logging methods for studying cognitive processes in text production. Behav. Res. Meth. 41(2), 337–351 (2009)
  11. Betiol, A.H., de Abreu Cybis, W.: Usability testing of mobile devices: a comparison of three approaches. In: Costabile, M.F., Paternò, F. (eds.) INTERACT 2005. LNCS, vol. 3585, pp. 470–481. Springer, Heidelberg (2005)
  12. Gupta, A., Cozza, R., Milanesi, C., Lu, C.: Market Share Analysis: Mobile Phones, Worldwide, 4Q12 and 2012. Gartner, Inc. (2013)
  13. Green, P., Lin, B., Kang, T.-P., Best, A.: Manual and Speech Entry of Text Messages while Driving. University of Michigan Transportation Research Institute (UMTRI), Ann Arbor (2011)
  14. Hecht, R.M., Tzirkel, E., Tsimhoni, O.: Language models for text messaging based on driving workload. Paper presented at the 4th International Conference on Applied Human Factors and Ergonomics, San Francisco, CA, USA (2012)
  15. Winter, U., Grost, T.J., Tsimhoni, O.: Language pattern analysis for automotive natural language speech applications. Paper presented at the Proceedings of the 2nd International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Pittsburgh, PA, USA (2010)
  16. Winter, U., Ben-Aharon, R., Chernobrov, D., Hecht, R.M.: Topics as contextual indicators for word choice in SMS conversations. Paper presented at the Proceedings of SIGDIAL 2011: The 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Portland, OR, USA (2011)
  17. Thomas, E.A.C., Jones, R.G.: A model for subjective grouping in typewriting. Q. J. Exp. Psychol. 22(3), 353–367 (1970)
  18. Cooper, W.E.: Cognitive Aspects of Skilled Typewriting. Springer, New York (1983)
  19. Austin, D., Jimison, H., Hayes, T., Mattek, N., Kaye, J., Pavel, M.: Measuring motor speed through typing: a surrogate for the finger tapping test. Behav. Res. Meth. 43(4), 903–909 (2011)
  20. Eichstaedt, J.: An inaccurate-timing filter for reaction time measurement by JAVA applets implementing internet-based experiments. Behav. Res. Meth. Instrum. Comput. 33(2), 179–186 (2001)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Driver Interface Group, University of Michigan Transportation Research Institute, Ann Arbor, USA
