Effects of Password Permutation on Subjective Usability Across Platforms
The current work examines the effects of password permutation on subjective usability across platforms, using system-generated passwords that adhere to the password requirements found in higher-security enterprise environments. This research builds upon a series of studies at the National Institute of Standards and Technology by testing a previously proposed idea of password permutation: grouping like character classes together in order to improve password usability. Password permutation improves mobile device entry by reducing the number of keystrokes required to enter numbers and symbols. Across platforms (smartphone, tablet, and desktop computer) participants rated the longer (length 14) permuted passwords as easier to type than the shorter (length 10) non-permuted passwords. This demonstrates that the composition and structure of a password are important; people are sensitive to factors beyond simple password length. By combining qualitative and quantitative research, we will ultimately arrive at a more complete understanding of how password construction impacts usability.
Keywords: Passwords, Authentication, Mobile text entry, Typing, Touchscreens, Smartphones, Tablets, Password permutation, Chunking, Usable security
Although text-based passwords are widely viewed as a problematic mechanism for user authentication, they are nonetheless a core element of our current digital society. Large-scale efforts, such as the United States National Strategy for Trusted Identities in Cyberspace (NSTIC), are underway to replace passwords. Yet widespread replacement will take time, and legacy systems may continue to require text-based passwords. It is therefore important that we continue to research and understand passwords in order to improve them, both in terms of their security and usability.
Balancing the security and usability of passwords can be difficult, as the policy characteristics intended to make passwords more secure (increased length, numbers, symbols, and mixed uppercase and lowercase letters) largely make them less usable. In addition to password requirements and policies [4, 5], there are many other aspects of passwords to study.
One can examine passwords along different phases of their lifecycle, from their original generation to later retrieval. One can consider the security and usability of passwords from different sources, i.e., system-generated [7, 8] versus user-generated passwords. One can evaluate passwords on different platforms, from traditional desktop QWERTY keyboards to smaller mobile touchscreen keyboards [11, 12, 13]. Finally, one can examine passwords from the perspective of employees at large organizations [14, 15, 16] or web users in general.
Of the numerous research possibilities, the current work examines subjective password usability across multiple platforms (desktop, smartphone, and tablet) using system-generated passwords that adhere to the strict password requirements commonly found in higher-security enterprise environments. As the current research builds upon a series of studies at the United States National Institute of Standards and Technology (NIST), this paper begins with a detailed review of the three studies motivating the current work.
2.1 Usability of System-Generated Passwords on Desktop Keyboards
The first of several NIST studies to examine the usability of system-generated passwords was a desktop study in which participants had to memorize a series of ten different passwords and type them repeatedly. This study was informative in examining the fundamentals of desktop password typing, contributing much-needed baseline data on human performance with complex, system-generated passwords, thereby addressing a critical gap in the literature. Although there was already much research on general memory and memory for passwords [19, 20, 21, 22], as well as literature on transcription typing and skilled typing [24, 25, 26], those studies did not use stimuli directly analogous to the more complex, system-generated passwords of interest. In particular, prior research did not include stimuli containing the variety of symbols, numbers, and mixed case letters required for passwords in higher-security, enterprise environments.
In that desktop study, system-generated passwords were used in order to control for the effects of different levels of password meaningfulness. The researchers set out only to examine the effects of increasing password length, wanting to hold other factors, such as password meaning, constant. As is often the case in experimental research, it was necessary to trade external validity for the internal control required to address a specific research question: what are the effects of increasing password length on human behavior with complex passwords? Participants received ten system-generated passwords one at a time. Passwords ranged in length from six to fourteen characters. For each password, participants progressed through a series of three screens: Practice, Verify, and Entry. Participants were allowed to practice at will on the Practice Screen, then had to enter the target password correctly one time on the Verification Screen, and finally had to type the password ten times on the Entry Screen. After completing this three-phase sequence (i.e., practice, verification, entry) for each of the ten passwords, participants received a surprise recall test.
Not surprisingly, the longer a password was, the more time it took for participants to complete the tasks, with the slope of the timing line increasing around the eight-character password length. Of greater interest was the most prevalent error category: incorrect capitalization, or shifting, errors. In addition to capital letters, many symbols also require a shift action (e.g., “%” requires pressing the shift and “5” keys). Since many enterprises have password policies explicitly requiring the use of uppercase letters and symbols, the high frequency of shifting errors was particularly important.
2.2 Usability of System-Generated Passwords on Mobile Devices
The second of several NIST studies examining the usability of system-generated passwords was a mobile study replicating the aforementioned desktop study with smartphones and tablets. In the mobile study, the research goal was to investigate the effects of changing platforms (desktop versus mobile) on human performance using complex passwords. Therefore, the same experimental design and stimuli from the desktop study were used in the mobile study.
In the mobile study, device type greatly impacted both the frequency and nature of errors. A total of 2100 errors were made with the smartphone (iPhone 4S¹) versus only 1289 errors with the tablet (iPad 3). For the smartphone, that corresponds to over four times the number of per-participant group errors seen in the desktop study, and for the tablet, two and a half times the errors. The percentage of adjacent key errors was significantly higher for the smartphone than the tablet. This makes sense given the smaller target sizes for smartphone keys than tablet keys, especially in the portrait orientation used in that study.
The effects of mobile device constraints on password entry go beyond smaller key sizes; the nature of onscreen keyboards places significantly more demands on working memory for users. Each time a user has to change keyboards to access numbers or symbols, it is akin to a miniature task interruption, incurring interruption costs beyond the additional keystrokes required to change onscreen keyboards. In the mobile study, passwords requiring a number of onscreen keyboard changes, or screen depth changes, had disproportionately large effects across multiple dependent measures (e.g., entry times, error rates). The authors suggested that porting password requirements directly from desktop to mobile platforms may be unwise without consideration of device constraints, such as those imposed by onscreen keyboards.
2.3 Usability and Security of Permuted Passwords on Mobile Devices
The third NIST study of particular relevance to the current work examined the usability and security of permuted versions of the system-generated passwords used in [10, 11]. Given the significant difficulties participants faced in the mobile study due to the high number of onscreen keyboard changes required for entering complex passwords, it was suggested that password permutation should improve usability for such system-generated passwords [7, 27]. In [7, 27], the authors permuted the stimuli from prior studies [10, 11] by grouping characters into four categories: uppercase letters, lowercase letters, numbers, and symbols. The categories were arranged in that sequence (i.e., uppercase letters first, followed by lowercase, then numbers, and finally symbols) in order to minimize the number of onscreen keyboard changes required. The number of keystrokes saved (i.e., the efficiency gained) via permutation was not necessarily dependent on the length of the original password, but rather on the number of onscreen keyboard changes required. This in turn depended on the frequency and placement of numbers and symbols in the original password, since those are the character categories that necessitate switching back and forth between onscreen keyboards.
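The grouping described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `permute_password` is hypothetical, and keeping the original left-to-right order of characters within each class is an assumption, since the text specifies only the order of the four classes.

```python
def permute_password(password: str) -> str:
    """Group characters by class: uppercase, lowercase, digits, then symbols.

    Assumption: within each class, characters keep their original order.
    """
    upper = [c for c in password if c.isupper()]
    lower = [c for c in password if c.islower()]
    digits = [c for c in password if c.isdigit()]
    symbols = [c for c in password if not c.isalnum()]
    return "".join(upper + lower + digits + symbols)


# Applied to one of the non-permuted length 10 study passwords:
print(permute_password("p4d46*3TxY"))  # prints TYpdx4463*
```

The output is simply what this sketch produces under the stated ordering assumption; it is not claimed to match the permuted counterpart reported in the original studies.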
Although reducing the number of keyboard changes required improves password usability on mobile devices, introducing the predictable structure of uppercase, lowercase, numbers, and symbols negatively impacts security. In [7, 27], the authors measured the resulting security loss in a series of Monte Carlo simulations evaluating the entropy loss per password length category, as well as determining the number of lowercase letters needed to obtain equivalent levels of security for permuted versus non-permuted system-generated passwords. The authors of [7, 27] also evaluated the length that an all-lowercase password would need in order to have security equivalent to that of a complex, mixed-character password.
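To give a rough sense of the last comparison, the following sketch uses a closed-form entropy estimate. It is not the Monte Carlo procedure used in [7, 27] and does not model the entropy lost to permutation. It assumes each character is drawn uniformly and independently, that the mixed-character alphabet is the 94 printable ASCII characters (excluding space), and that the lowercase alphabet has 26 letters; the function names are hypothetical.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random string: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


def equivalent_lowercase_length(length: int, alphabet_size: int = 94) -> int:
    """Shortest all-lowercase length with at least the same entropy.

    Assumption: 94-character mixed alphabet versus 26 lowercase letters.
    """
    target = entropy_bits(length, alphabet_size)
    return math.ceil(target / math.log2(26))


for n in (10, 14):
    print(n, round(entropy_bits(n, 94), 1), equivalent_lowercase_length(n))
```

Under these assumptions, a length 10 mixed-character password corresponds to roughly 65.5 bits and an all-lowercase password of length 14, while a length 14 mixed-character password corresponds to roughly 91.8 bits and an all-lowercase password of length 20.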
In [7, 27] password permutation reduced the number of keystrokes required on mobile devices, thereby improving at least one facet of usability: efficiency. Although it is possible to count keystrokes (efficiency) without behavioral data, in order to measure improvements in effectiveness (error rates) and satisfaction (subjective usability) for permuted system-generated passwords, human data are needed. The current paper focuses on subjective usability, addressing this gap by testing a subset of the previously described original and permuted passwords across multiple platforms.
The current cross-platform² study builds upon the previously described NIST studies [7, 10, 11, 27] in several ways. One significant difference was its inclusion of all three devices (desktop computer, smartphone, and tablet) within-subjects. To help address questions raised in prior work about whether touch-typing ability (the ability to type without looking at the keys) may have contributed to performance differences between participant groups, the current study included a baseline typing test utilizing ten phrases from the MacKenzie and Soukoreff 500-phrase corpus for text entry research. Given the large variability in participant practice repetitions in [10, 11], the current study also implemented forced practice with feedback. To help address the question of whether passwords ever made it out of participants’ short-term memory in [10, 11], the current study increased the number of practice repetitions required, reduced the number of passwords used, and notified participants about the final recall test (prior studies used a surprise recall test). By significantly increasing the number of repetitions required while simultaneously reducing the number of passwords used, we hoped that passwords would transition from short-term memory to motor memory. Finally, the current study also had participants rate their experience typing versus memorizing the passwords, and incorporated a longer debriefing session where participants were queried explicitly about their strategies; these qualitative additions to the prior experimental protocol were most important. Whereas prior work [10, 11] was focused primarily on quantitative data, the current paper is focused on qualitative data, specifically on people’s perceptions of permuted versus “jumbled” system-generated passwords.
Table 1. Self-reported frequency of use for onscreen and desktop keyboards. Frequency-of-use categories included “Multiple times a day” and “Once a day.”
Of the original 83 participants, five were pilot participants, and 10 did not complete the entire experiment; only data from the remaining 68 participants were included in the following analyses.
Design. Participants used three devices: traditional desktop computer (PC running Windows 7 Enterprise), smartphone (iPhone, iOS 7.1), and tablet (iPad, iOS 7.1). Device presentation order was counterbalanced. In cases where participants could not complete the tasks with all three devices due to time constraints, they used the desktop computer and one of the mobile devices (either a smartphone or a tablet). Password set was manipulated between subjects: half of all participants received Password Set 1, and the remaining half of participants received Password Set 2. Of the 68 participants included in the following analyses, 37 used password Set 1, and 31 used password Set 2 (details in Materials section). Both password sets contained two passwords: an unmodified password originally used in prior research [10, 11] and a permuted version of a password from prior research. Participants used the same two passwords on all three devices. Password presentation order was randomized using the randomization function in the data collection software (details in Procedure section).
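One simple way to counterbalance device presentation order is to cycle participants through all six possible orderings of the three devices. The text does not say which counterbalancing scheme was used, so the sketch below is only an assumed illustration; `device_order` and the zero-based participant index are hypothetical.

```python
from itertools import permutations

DEVICES = ("desktop", "smartphone", "tablet")

# All 3! = 6 possible presentation orders.
ORDERS = list(permutations(DEVICES))


def device_order(participant_index: int) -> tuple:
    """Assign presentation orders round-robin (an assumed scheme)."""
    return ORDERS[participant_index % len(ORDERS)]


for i in range(3):
    print(i, device_order(i))
```

With this scheme, every block of six consecutive participants sees each ordering exactly once.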
Materials. Passwords in the current study were a carefully selected subset of the ten system-generated passwords used in the original NIST studies [10, 11]. Both password sets in the current study contained a previously used, non-permuted password of length 10, and a different, permuted password of length 14. Password Set 1 consisted of q80<U/C2mv and Rmofpaf2207#)^, where q80<U/C2mv was an original length 10 password used in prior studies and Rmofpaf2207#)^ was the permuted version of a previously used length 14 password. Password Set 2 consisted of p4d46*3TxY and QMifnh455230_$, where p4d46*3TxY was an original length 10 password used in prior studies and QMifnh455230_$ was the permuted version of a previously used length 14 password. For ease of exposition, these passwords may be referred to henceforth as “q80,” “Rmof,” “p4d4,” and “QMif.”
The length 14 passwords were selected because they were by far the most difficult passwords from prior work; they took longer to learn and enter and were more error prone as well [10, 11]. They were particularly problematic on mobile devices due to the number of screen depth changes (navigating back and forth between different onscreen keyboards) they required. Therefore, they should benefit most from password permutation.
The length 10 passwords were selected for several reasons. Informal comments from participants in prior studies indicated that “q80” and “p4d4” seemed “easier” and “more memorable.” People also previously commented that they were “breaking the password up at the symbols.” A desire to explore these qualitative observations more rigorously was one reason for the password selection in the current study. Perhaps more importantly, although the length 10 “q80” and “p4d4” passwords are obviously shorter than the length 14 “Rmof” and “QMif” passwords, they require almost exactly the same number of keystrokes, or taps on a mobile device, to enter as the longer, permuted length 14 passwords. In Set 1 “q80” and “Rmof” both take 19 taps to enter on a mobile (iOS) device. In Set 2 “p4d4” takes 18 taps and “QMif” takes 19 taps. Password sets and tap counts are shown in Table 2. Although only the bold passwords in Table 2 were used in the current study, their permuted or original counterparts are also shown for the sake of comparison.
Table 2. Password sets and tap counts
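The tap counts quoted in the text can be reproduced with a simple model of a three-page onscreen keyboard. The model is an assumption rather than a description of the authors' counting procedure: it supposes a letters page, a numbers page reached with one tap on a "123" key, and a symbols page reached with one further tap on a "#+=" key; that one tap on an "ABC" key returns to letters from either page; that shift must be tapped before every uppercase letter; and that the characters on each page are as listed in the two sets below (an ASCII approximation of the iOS 7 US English layout). The name `count_taps` is hypothetical.

```python
# Assumed page contents (ASCII characters only).
NUMBER_PAGE = set("1234567890-/:;()$&@\".,?!'")
SYMBOL_PAGE = set("[]{}#%^*+=_\\|~<>.,?!'")


def count_taps(password: str) -> int:
    """Count taps needed to enter a password, starting on the letters page."""
    page = "letters"
    taps = 0
    for ch in password:
        if ch.isalpha():
            if page != "letters":
                taps += 1  # "ABC" key back to the letters page
                page = "letters"
            if ch.isupper():
                taps += 1  # shift before each uppercase letter
        elif ch in NUMBER_PAGE:
            if page == "letters":
                taps += 1  # "123" key
                page = "numbers"
            elif page == "symbols" and ch not in SYMBOL_PAGE:
                taps += 1  # "123" key back from the symbols page
                page = "numbers"
        elif ch in SYMBOL_PAGE:
            if page == "letters":
                taps += 2  # "123" key, then "#+=" key
            elif page == "numbers":
                taps += 1  # "#+=" key
            page = "symbols"
        else:
            raise ValueError(f"character not in the assumed layout: {ch!r}")
        taps += 1  # the character key itself
    return taps


for pw in ("q80<U/C2mv", "Rmofpaf2207#)^", "p4d46*3TxY", "QMifnh455230_$"):
    print(pw, len(pw), count_taps(pw))
```

Under these assumptions the model yields 19, 19, 18, and 19 taps for the four study passwords, matching the counts given in the text, which illustrates how a length 14 permuted password can cost about as many taps as a length 10 non-permuted one.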
After participants completed both passwords on the first device, they completed a short questionnaire with four Likert items asking them to rate their experience memorizing and typing each password. Response options ranged from 1 (“Very Difficult”) to 5 (“Very Easy”). The moderator then asked them several verbal questions about their typing and memorization strategies for each password. If participants stated they were breaking the password into smaller pieces, they were asked to draw vertical lines between the characters in the password to indicate their break points. These qualitative questions—in particular the typing ratings—were an extremely important part of the current study, and are the focus of the current analyses.
The above procedure was repeated for each of the remaining two devices. After participants completed both passwords on all three devices, they answered a final set of questions about their overall password preferences and strategies. Customized data collection software³ displayed the passwords on all three devices for the current study. The research software captures a time-stamped log of all keyboard actions and button presses while users interact with the data collection application. Although the current study only used two sets of passwords, the software supports any number of sets of passwords via a customizable input file. One can also customize the contents of the ten baseline typing phrases via the input file. The number of practice, verify, and enter rounds required is also customizable via the software settings screen.
3.2 Results: Typing Difficulty Ratings
To simplify tabular presentation of the data, the Likert response options were collapsed from five to three categories. “Difficult” and “Very Difficult” options were collapsed into a single “Difficult” category, while “Easy” and “Very Easy” options were collapsed into an “Easy” category. “Neutral” options were left as they were.
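The collapsing step can be sketched as a simple mapping from the five numeric response options to three categories. The mapping of 2 to “Difficult,” 3 to “Neutral,” and 4 to “Easy” is inferred from the scale endpoints and the category names in the text; the function name and the sample ratings are hypothetical and are not study data.

```python
from collections import Counter

# 1 = "Very Difficult" ... 5 = "Very Easy" (intermediate codes are assumed).
COLLAPSE = {1: "Difficult", 2: "Difficult", 3: "Neutral", 4: "Easy", 5: "Easy"}


def collapse_ratings(ratings):
    """Tally five-point Likert ratings into three collapsed categories."""
    counts = Counter(COLLAPSE[r] for r in ratings)
    return {cat: counts.get(cat, 0) for cat in ("Difficult", "Neutral", "Easy")}


# Made-up ratings for illustration only.
print(collapse_ratings([1, 2, 3, 4, 5, 5]))
# prints {'Difficult': 2, 'Neutral': 1, 'Easy': 3}
```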
Per-set typing difficulty ratings
Per-password typing difficulty ratings
While prior work [7, 27] demonstrated increased objective usability (improved efficiency) from password permutation, that work did not include any qualitative research components, so it could not assess whether subjective usability was also improved. The current results strongly suggest that permuting system-generated passwords benefits subjective usability as well, nicely complementing [7, 27] and prior NIST quantitative work on the motoric difficulties of complex password entry [10, 11].
Since the purpose of drastically increasing the number of required password repetitions (in comparison to prior studies [10, 11]) was to transition passwords from participants’ short-term memory to motor memory, the focus of the current paper was on participants’ perceptions of typing difficulty (i.e., the motoric component of the password entry tasks). Of particular interest were participants’ perceptions of differences between permuted and non-permuted passwords. It is clear that people are sensitive to the structure and arrangement of characters within a password.
Interestingly, although password permutation was originally proposed specifically to improve ease of entry on mobile devices [7, 27], permuted passwords were rated as easier across both mobile and desktop devices in the current study. This suggests that such permutation may be beneficial regardless of device, having both motoric and cognitive benefits; in addition to reducing the number of keystrokes required on mobile devices, permuted passwords may also be easier to learn and recall. Participants did comment that permuted passwords seemed easier to remember, e.g., “structure was easier to remember, having the three symbols together was helpful” [P480] and “I chunked it into the way it was already organized, it was easier for me to memorize that, and to organize what my fingers were getting ready to do” [P460]. A crucial next step in this line of research will be comparing performance (efficiency and effectiveness) with preference (satisfaction), analyzing the quantitative data to see whether typing and memory improvements correlate with people’s subjective impressions of these passwords.
Although the emphasis on qualitative data was one of the biggest differences between the current work and prior NIST password typing studies [10, 11], the quantitative data from the current study are also of interest. The current study provides an extremely rich quantitative dataset that will further complement the qualitative results described here. By combining qualitative and quantitative research, we will ultimately arrive at a more complete understanding of how password construction impacts usability, both for system-generated and user-generated passwords. Although system-generated passwords can benefit greatly from the proposed password permutation, it is not likely that it would offer similar benefits to user-generated passwords, as many users already group like character classes together in their passwords (e.g., uppercase first, lowercase next, numbers and/or symbols at the end). Nonetheless, given the higher-security enterprise environments for which this research is intended, it is encouraging that we have found a means of making system-generated passwords somewhat more palatable to users.
1. Disclaimer: Any mention of commercial products or reference to commercial organizations is for information only; it does not imply recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that the products mentioned are necessarily the best available for the purpose.
2. A platform is a unified architecture composed of common hardware and software elements that may manifest in various specific devices. For example, the iPhone and iPad are two devices sharing the iOS platform.
3. Code available at https://github.com/usnistgov/TypingTester. The program was designed to maximize flexibility and opportunity for reuse in future NIST experiments, but we hope that other usable security researchers may also benefit from this research tool.
4. Participant IDs are denoted as (P###).
The author gratefully acknowledges Brian Stanton at NIST.
- 1. Honan, M.: Kill the password: why a string of characters can’t protect us anymore. Wired (2012)
- 2. National Strategy for Trusted Identities in Cyberspace: Enhancing Online Choice, Efficiency, Security, and Privacy. http://www.whitehouse.gov/sites/default/files/rss_viewer/NSTICstrategy_041511.pdf. Accessed on 2011
- 3. United States Department of Homeland Security: United States Computer Emergency Readiness Team (US-CERT). Security tip (ST04-002): Choosing and protecting passwords. http://www.us-cert.gov/cas/tips/ST04-002.htm. Accessed on 2009
- 4. Steves, M., Killourhy, K., Theofanos, M.F.: Clear, unambiguous password policies: an oxymoron? In: Rau, P. (ed.) CCD 2014. LNCS, vol. 8528, pp. 240–251. Springer, Heidelberg (2014)
- 5. Steves, M., Theofanos, M.F.: Password policy interpretation. In: Proceedings of the 3rd International Conference on Human Aspects of Information Security, Privacy and Trust, in the 17th International Conference on Human-Computer Interaction (2015, to appear)
- 6. Choong, Y.-Y.: A cognitive-behavioral framework of user password management lifecycle. In: Tryfonas, T., Askoxylakis, I. (eds.) HAS 2014. LNCS, vol. 8533, pp. 127–137. Springer, Heidelberg (2014)
- 7. Greene, K.K., Kelsey, J., Franklin, J.M.: Measuring the Usability and Security of Permuted Passwords on Mobile Platforms. National Institute of Standards and Technology Interagency Report (NISTIR) 8040 (2015)
- 8. Ploehn, C., Greene, K.K.: The authentication equation: visualizing the convergence of security and usability of system-generated passwords. In: Proceedings of the 3rd International Conference on Human Aspects of Information Security, Privacy and Trust, in the 17th International Conference on Human-Computer Interaction (2015, to appear)
- 9. Lee, P., Choong, Y.: Human generated passwords – the impacts of password requirements and presentation styles. In: Proceedings of the 3rd International Conference on Human Aspects of Information Security, Privacy and Trust, in the 17th International Conference on Human-Computer Interaction (2015, to appear)
- 10. Stanton, B.C., Greene, K.K.: Character strings, memory and passwords: what a recall study can tell us. In: Tryfonas, T., Askoxylakis, I. (eds.) HAS 2014. LNCS, vol. 8533, pp. 195–206. Springer, Heidelberg (2014)
- 11. Greene, K.K., Gallagher, M.A., Stanton, B.C., Lee, P.Y.: I can’t type that! p@$$w0rd entry on mobile devices. In: Tryfonas, T., Askoxylakis, I. (eds.) HAS 2014. LNCS, vol. 8533, pp. 160–171. Springer, Heidelberg (2014)
- 13. Gallagher, M.A.: Modeling password entry on mobile devices: please check your password and try again. Doctoral Dissertation, Rice University, Houston, TX (2015)
- 14. Choong, Y., Theofanos, M., Liu, H.K.: United States Federal Employees’ Password Management Behaviors – a Department of Commerce Case Study. National Institute of Standards and Technology Interagency Report (NISTIR) 7991 (2014)
- 15. Shelton, D.C.: Reasons for non-compliance with mandatory information assurance policies by a trained population. Doctoral Dissertation, Capitol Technology University (2014)
- 16. Choong, Y., Theofanos, M.F.: What 4,500+ people can tell you – employees’ attitudes toward organizational password policy do matter. In: Proceedings of the 3rd International Conference on Human Aspects of Information Security, Privacy and Trust, in the 17th International Conference on Human-Computer Interaction (2015, to appear)
- 17. Florencio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th International Conference on World Wide Web, pp. 657–666 (2007)
- 18. Unsworth, N., Engle, R.W.: Individual Differences in Working Memory Capacity and Retrieval: A Cue-Dependent Search Approach. The Foundations of Remembering: Essays in Honor of Henry L. Roediger III, pp. 241–258. Psychology Press, New York (2007)
- 19. Forget, A., Biddle, R.: Memorability of persuasive passwords. In: CHI 2008 Extended Abstracts on Human Factors in Computing Systems, pp. 3759–3764 (2008)
- 20. Vu, K., Cook, J., Bhargav-Spantzel, A., Proctor, R.W.: Short- and long-term retention of passwords generated by first-letter and entire-word mnemonic methods. In: Proceedings of the 5th Annual Security Conference, Las Vegas, NV (2006)
- 24. Coover, J.E.: A method of teaching typewriting based upon a psychological analysis of expert typing. Nat. Educ. Assoc. 61, 561–567 (1923)
- 25. Gentner, D.: Skilled finger movements in typing. Center for Information Processing. University of California, San Diego. CHIP Report 104 (1981)
- 27. Greene, K.K., Franklin, J., Kelsey, J.: Tap on, tap off: onscreen keyboards and mobile password entry. In: Proceedings of ShmooCon 2015 (2015)
- 28. MacKenzie, I.S., Soukoreff, R.W.: Phrase sets for evaluating text entry techniques. In: Extended Abstracts of the ACM Conference on Human Factors in Computing Systems - CHI 2003, pp. 754–755. ACM, New York (2003)