1 Introduction

The ubiquity of personal smart devices has led to a scenario in which we create and capture increasing amounts of content, generating a proportional demand to share that content with those around us. This near-field data exchange is typically facilitated by wireless connectivity, which allows a user to create an ad hoc connection with co-located peers, through which data can be shared with an individual or group. However, despite the variety and sophistication of connectivity options, a short-range exchange of information can still be a frustrating experience. Central to this problem is that peer-to-peer transactions are poorly supported by current smart devices. Despite the many options available for data transfer across devices, none is ubiquitous, cross-platform, and free of user interface friction (by which we mean the need to associate devices or establish a temporary network).

Of the options available, Bluetooth is perhaps the most widespread, but for ad hoc interactions it is susceptible to usability issues, as it requires a multi-step device discovery process [16, 18]. Alternative technologies such as Wi-Fi Direct and RFID-based near-field communication (NFC) exist, but are currently not feasible methods for peer-to-peer data exchange due to differing cross-platform implementations, leading to issues with device compatibility [48]. For example, although Wi-Fi Direct is now widely adopted on Android devices, it does not exist on iOS, where a proprietary alternative is used [73], and the RFID hardware on iOS devices is not currently exposed to developers. A further problem within the user experience of NFC technologies is the opacity (or lack of ‘visibility’) of the interaction, and the lack of shared status feedback. User feedback and visibility of the system status are key usability heuristics [61, 62]; yet, in a casual ad hoc interaction, it may not be obvious to other participants what the common state is in order to progress the transaction. This can hinder the speed and success of a sharing activity, and is particularly critical when problems arise in data exchange, potentially amplifying user frustration when device discovery or data sharing fails.

Acoustic data transmission presents an interesting alternative to the aforementioned technologies. In this approach, digital information is encoded in audio signals for transmission between air-gapped loudspeakers and microphones. Audio playback is supported on a broad range of hardware, including all mobile phones, so it immediately offers multiple ways to generate, transport, receive and decode sound on today’s devices. It therefore offers a frictionless way to transmit data between devices by utilising existing sensors. Such acoustic data transmission technology can support one-to-many transactions, unlike many wireless mechanisms. It has the further advantage that it is visible as an interaction medium, providing shared insight into the status of a sharing activity.

Despite significant research into both applications and the underlying technology (which we discuss in Section 3), to our knowledge, there exists no research on the user experience of using acoustic data transmission, either directly or in comparison to alternative wireless communication technologies. In this work, we address this by asking whether acoustic data transmission solves the aforementioned limitations, provides a viable and user-friendly mode of near-field data exchange, and has the potential to enhance the user experience (UX) of exchanging data between devices. We use Chirp [15], an existing, commercially available implementation of acoustic data transmission technology, which was developed in part by the authors.

In Sections 2 and 3 of this paper, we outline the opportunity for Chirp as a complement to other wireless technologies. We identify the benefits of sound, and thus how it can facilitate peer-to-peer transactions. In Section 4, we present a user study that compares Bluetooth (BLE), QR and Chirp in a simple peer-to-peer contact sharing task, evaluating the UX across the proposed technologies. The results suggest that Chirp can facilitate friction-free interaction between users and their devices, minimising the effort required and thus resulting in a more desirable UX. In summary, we present findings that identify Chirp as being as fast at individual sharing actions as QR codes, and significantly faster than BLE. Chirp also enables a sat-back interaction style that does not involve significant physical actions, similar to BLE, but dissimilar to QR, which involves physical manipulations of the devices and requires users to coordinate their positions in order to complete transactions.

Together, the quantitative and qualitative analyses from the user study suggest that there are significant opportunities in collaborative systems for data sharing using sound.

2 Peer-to-peer data sharing

2.1 Collaborative context

The use of smart devices to support co-located interaction has attracted considerable attention over the past decade [30, 50, 54]. Users typically have a significant amount of personal content on their phones that they wish to share with people around them, including, for example, photos [19, 43, 52], calendars [22] and notes [51].

A key part of small group interaction is how the scope of the interaction is defined. At least four classes can be identified: interactions facilitated by a shared device (e.g. [26, 34]), speculative interaction facilitated by ad hoc discovery of potential partners (e.g. Nintendo StreetPass [63]), server-based proximity services (see [42]), and user-activated sharing. We will focus on user-activated ad hoc collaborations. We will assume that the devices are user-owned, that there is no third party sharing service, and that the devices are not already paired or otherwise linked.

User-activated sharing can be achieved in a number of different ways. Often, there is a pairing or device association step where the devices that will interact are identified [17]. This interaction can be as simple as pressing two virtual or real buttons simultaneously (e.g. pressing a physical button on a new game controller and pressing a virtual button on the console to pair it). More novel methods include shaking, touching or banging the devices (e.g. [14, 31, 33, 53, 58]), or using audio as a spatial trigger (e.g. [74, 77]).

To minimise friction, effort and interaction time, the ideal user experience for a sharing task is one in which minimal or no user intervention is required. For this reason, this paper will focus on technologies which do not require any prior shared actions or pairing before the exchange itself takes place. We will also limit the scope to scenarios of one-off data transmission, rather than continuous, synchronous interaction, and omit multi-channel hybrid approaches in which audio (or other means) is used to pair an additional communication channel (cf. [71, 76]).

2.2 Device-to-device data sharing

There is a plethora of technologies for sharing information between devices. In the space of Internet of Things (IoT) devices, there may be the opportunity for only one or two technologies on any single device because of the requirements for low power and low cost [78]. In contrast, modern smart phones contain numerous sensors, such as motion sensors, cameras, various types of radio chips and microphones. Each of these may be used for ad hoc device-to-device communication.

The use of cameras to read coded information has a long history in collaborative technologies. Denso Wave developed the QR code in 1994; it is now an international standard [37], and many smart devices come with a QR code reader by default. Applications can generate QR codes on the fly, which allows the sharer’s screen to be used as the display surface, as long as the users involved in the transaction can align the receiving camera and display to complete the interaction. There are many similar visual-code based systems (see, for example, [41]), although the QR code is perhaps the most popular.

Smart phones have a range of capabilities for radio communication. Broadband cellular network technology (3G/4G) is very broadly deployed, but does not facilitate device-to-device communication for data sharing. Many phones support ad hoc Wi-Fi, but this can be at the expense of disabling wide-area connections, so it is not appropriate for fast, ad hoc communications at the current time. Bluetooth is commonly available in smart devices. Given its relatively high bandwidth, it has found good use in personal networks between peripherals. The more recent version, Bluetooth Low Energy (BLE), offers improved functionality for ad hoc communication between devices [65], removing the need for device pairing. Many modern smart devices can also read radio-frequency ID tags based on the NFC protocol. These can be used for ad hoc sharing between devices, but this is not as well explored as Bluetooth to date [13, 21]. Further radio-based technologies include ultra-wideband [1] and millimetre wave systems [70], such as 5G cellular networks. Whilst these technologies present promising solutions for low-energy, low-range, high bandwidth communications, they are not currently widely adopted, and presently very few smart devices contain the hardware required to operate in the required frequency ranges.

We will address the remaining modality, audio, in the following section.

3 Acoustic data transmission

3.1 Overview

As phones have evolved, their audio generation and processing abilities have expanded. For example, recent devices might have ‘always on’ listening to enable voice activation. Smart devices have full digital audio generation and sampling capabilities, but even older non-smart devices have microphones, speakers and the associated circuitry. The power consumption of using audio detection can be significantly lower than radio [74]. As a result, there exist many digital and analogue systems for generation, transport and presentation of audio.

Thus, it is sensible to use built-in microphones on a device as a sensing platform. While audio communication underpinned early long-distance communication through the use of modems over wired networks, it was somewhat overlooked as other wireless technologies proliferated in the 1990s [56]. In this section, we review some related technologies that have used acoustic data transmission, outlining the unique benefits that this technology presents to the user interface designer. Furthermore, we introduce Chirp, our implementation of acoustic data transmission.

3.2 Audible vs. near-ultrasonic

Acoustic data transmission technologies can be loosely divided into two categories based on their range in the acoustic spectrum, and thus their perceptibility to the human ear: audible (sub-15 kHz, audible to the majority of listeners) and near-ultrasonic (17–20 kHz, which is imperceptible to many adult listeners but can be detected by typical consumer microphones). Perhaps the first near-ultrasonic direct communication system was developed by Gerasimov and Bender [25]. By its nature, near-ultrasonic communication is not audible to most users, so its presence in an environment is not obvious. This makes it a good candidate for beacon-like or side-channel communication. It can be played on its own or embedded into another audio recording. Recognising that the greatest advantage of near-ultrasound communication was that no extra hardware was required, Ka et al. proposed a framework for TV second screen services [39]. Near-ultrasonic data over sound has also been used to communicate with wearable devices [68], transmit data from within shipping containers [35], share network credentials in an industrial IoT setting [24], and for wireless communication between everyday personal electronic devices and hearing aids [59]. In addition, it has previously been used for beacons, for example to control a smartphone museum guide [7]. There are obvious security concerns with inaudible data over sound: users may not be aware that data is being transmitted, and thus covert channels might be enabled [3, 12, 57]. However, because it is inaudible and can thus be present continuously, it has other potential uses, such as measurement of the movement or location of devices (e.g. [14, 74, 80]).

In the audible range, there is a design choice to make the data obvious or not. One prominent audible code is dual-tone multi-frequency signalling (DTMF), still in common use for communication over voice calls. When choosing other audio designs, two important factors are throughput and robustness. However, these are in tension with the desire to have tones that sound pleasant to the human ear. The early work of Madhavapeddy et al. [55] suggests a number of encoding strategies. Using DTMF between devices 3 m apart, they achieved 20 bits per second (bps) at 0.005% error per symbol. Using on-off keying at multiple frequencies, they achieved 251 bps with a \(4.4 \times 10^{-5}\) error rate. The concurrent work of Lopes and Aguiar [49] similarly suggests various protocols. They achieved 125 bps using Johann Sebastian Bach’s Badinerie as the melody code. By using a harmonic frequency shift key, they achieved 800 bps with few errors, but the output would sound more like noise than anything resembling a melody.

3.3 Chirp: a software framework for acoustic transmission

Chirp [15] is a software framework that facilitates over-the-air acoustic transmission. Originating in research at University College London, it was first released as a near-field image-sharing mobile app [5], and now exists as a range of cross-platform SDKs, with both free and commercial licenses.

Chirp uses frequency-shift keying (FSK) [72, p.173] for its modulation scheme, due to its robustness to the multipath propagation present in real-world acoustics [38] in comparison with schemes such as phase-shift keying [72, p.168] or amplitude-shift keying [72, p.165]. For spectral efficiency, Chirp uses an M-ary FSK scheme, encoding input symbols as one of M unique frequencies. Each symbol is modulated by an amplitude envelope to prevent discontinuities, with a guard interval between symbols to reduce the impact of reflections and reverberation on the tone detection.
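The modulation steps described above can be illustrated with a minimal sketch. It maps each symbol to one of M frequencies, shapes each tone with an amplitude envelope, and separates tones with a silent guard interval. The base frequency, tone spacing, durations, the choice of a Hann envelope and the nibble-based symbol mapping are all illustrative assumptions, and are not the parameters of any Chirp protocol.

```python
import math

SAMPLE_RATE = 44_100     # Hz
BASE_FREQ = 1_760.0      # Hz, assumed lowest tone (illustrative only)
TONE_SPACING = 100.0     # Hz between adjacent symbol frequencies (assumed)
SYMBOL_SECONDS = 0.05    # assumed tone duration
GUARD_SECONDS = 0.01     # assumed silent gap between tones


def symbol_frequency(symbol: int, m: int = 16) -> float:
    """Map a symbol in [0, m) to one of m unique frequencies."""
    if not 0 <= symbol < m:
        raise ValueError("symbol out of range")
    return BASE_FREQ + symbol * TONE_SPACING


def bytes_to_symbols(payload: bytes) -> list:
    """Split each byte into two 4-bit symbols (high nibble first), for M = 16."""
    symbols = []
    for b in payload:
        symbols.extend((b >> 4, b & 0x0F))
    return symbols


def modulate(symbols, m: int = 16) -> list:
    """Return float samples in [-1, 1] for a sequence of symbols."""
    n_tone = round(SAMPLE_RATE * SYMBOL_SECONDS)
    n_guard = round(SAMPLE_RATE * GUARD_SECONDS)
    samples = []
    for s in symbols:
        freq = symbol_frequency(s, m)
        for n in range(n_tone):
            # Hann envelope: rises from and returns to zero, so that tone
            # boundaries do not introduce discontinuities.
            envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_tone - 1)))
            samples.append(envelope * math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE))
        samples.extend([0.0] * n_guard)  # guard interval
    return samples


waveform = modulate(bytes_to_symbols(b"hi"))
```

With M = 16, each tone carries 4 bits, so under these assumed timings the raw symbol rate is one symbol per 60 ms; a real protocol would trade tone duration and guard length against robustness to reverberation.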

A Chirp payload is prefixed by a fixed set of preamble tones, to indicate the beginning of a message and to establish timing and synchronisation. It is suffixed by Reed-Solomon forward error correction (FEC) coding [66], enabling audio to be decoded when symbols are obscured due to background noise or reverberation. The transmission protocols can be configured for specific environments and acoustic channels, including both audible and near-ultrasonic bands. Both of these bands are supported by the majority of consumer audio devices that support sample rates of 44.1 kHz.

Chirp SDKs are designed to be integrated into client applications, and typically handle interaction with the operating system’s audio I/O layer. The client application provides the SDK with an array of bytes to transmit, which is encoded and played from the device’s loudspeaker. On the receiving device, audio is sampled from the microphone. When a Chirp signal is detected and decoded from the input stream, it is presented to the client application in a callback function.
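To make the receiving side more concrete, the sketch below shows one common way of deciding which of M candidate tones is present in a block of microphone samples, using the Goertzel algorithm. The text does not state which detector Chirp uses, so this is an assumption for illustration only; the 16-tone alphabet, block length and helper names are hypothetical.

```python
import math

SAMPLE_RATE = 44_100  # Hz

# Hypothetical 16-tone alphabet (not a Chirp protocol definition).
FREQUENCIES = [1_760.0 + 100.0 * k for k in range(16)]


def goertzel_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Squared magnitude of the component at `freq`, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_symbol(samples, frequencies=FREQUENCIES):
    """Return the index of the candidate frequency carrying the most energy."""
    powers = [goertzel_power(samples, f) for f in frequencies]
    return max(range(len(frequencies)), key=powers.__getitem__)


def make_tone(freq, n_samples=2_205):
    """Generate a test tone standing in for sampled microphone input."""
    return [math.sin(2.0 * math.pi * freq * n / SAMPLE_RATE) for n in range(n_samples)]


detected = detect_symbol(make_tone(FREQUENCIES[10]))
```

A practical decoder would additionally need preamble detection, symbol timing recovery and error correction before handing the decoded bytes to the client application's callback.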

3.4 Benefits of using sound to transmit data

In this section, we will briefly discuss the benefits of acoustic data transmission, in relation to the two alternative technologies included in the present study: QR and BLE. We selected the wireless technologies based on their suitability for the task, availability on popular mobile devices, and the type of interaction that they afford. QR is a readily available method for transferring contact details and vCards (being one of the default options to share a contact on Android devices). In addition, it can be used for many of the same applications as synchronous direct peer-to-peer mechanisms, such as authenticating users [46] and secure peer-to-peer data transfer [32, 64]. In terms of ubiquity, it is possible for any device with a camera (including all smart mobiles and tablets) to read QR codes, making it more readily available to users than less well-established technologies with specific hardware requirements, such as NFC. Much like Wi-Fi Direct, BLE is an RF-based technology that requires a device discovery stage, and both BLE and Wi-Fi Direct have been shown to have comparable durations for establishing a connection between devices [40]. As such, we considered these technologies to be very similar for our application in terms of the respective general benefits, at least within the scope of the present study (we note that Wi-Fi Direct has considerable benefits in terms of range and data rate, at the expense of power consumption; however, the data rate and range of BLE were sufficient for our task). For this reason, we chose to include only one of BLE or Wi-Fi Direct, and BLE was selected as the more widely available and better established technology (with Wi-Fi Direct unavailable on iOS devices, where only a proprietary equivalent exists [73]).

As with QR and BLE, acoustic data transmission has particular benefits that make it more or less suitable to specific applications. An overview of these is given in Table 1. From a technical perspective, as with BLE, acoustic data transmission is capable of one-to-one, two-way, and one-to-many (broadcast) transmissions. The former are useful for transmitting data objects between two users (such as contact details or URLs), but the latter presents a number of wide-reaching applications such as broadcasting status updates at transit stations, or providing information about collections in an art gallery. In addition, because it can utilise existing audio systems, data can be broadcast to radio listeners, TV viewers, or over public address systems by simply playing the data over the normal channels. Furthermore, because acoustic data transmission does not operate in the electromagnetic spectrum, the acoustic spectrum may be used in scenarios where restrictions on radio-frequency (RF) transmissions exist, such as in explosive or flammable environments.

Table 1 Outline of the benefits of acoustic data transmission (ADT) in relation to the technologies compared in the user study

As previously mentioned, acoustic data transmission can utilise devices’ existing hardware components and infrastructures where microphones and speakers are already built in. This makes it extremely cheap and easy to integrate in legacy equipment, compared to QR, which requires a camera, or BLE, which requires technology-specific hardware. However, acoustic data transmission has relatively low data rates compared to RF-based technologies. Specifically, BLE has physical layer and application throughput data rates of 1 Mbps and \(\sim\)240 kbps respectively [27]. The data rate for acoustic data transmission is dependent on the protocol and encoding scheme, which can be tuned for specific ranges and bit error rates. The standard Chirp audible and ultrasonic protocols have data rates of 100 bps and 200 bps respectively. However, for very near-field (sub 30 cm) transmission, up to 1 kbps is achievable using FSK modulation. The maximum amount of data represented by a QR code also varies depending on the encoding scheme. For binary encoding, it is possible to represent up to \(\sim\)3 kB of data. It should be noted that it is not clear how this relates to data rate, as the transfer of data using QR codes requires a camera and code to be aligned; therefore, transmission duration will depend on a number of factors, including motor control of the user and the distance between the QR code and camera.
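To put these data rates in perspective, the following sketch computes the nominal airtime needed to send a small payload at each of the rates quoted above. The 32-byte payload size is a hypothetical choice for illustration, and the calculation deliberately ignores preamble tones, error-correction overhead, device discovery and connection set-up, all of which affect real transaction times.

```python
# Data rates quoted in the text, in bits per second.
RATES_BPS = {
    "Chirp audible (standard)": 100,
    "Chirp ultrasonic (standard)": 200,
    "Acoustic, very near-field FSK": 1_000,
    "BLE application throughput": 240_000,
}

PAYLOAD_BYTES = 32  # hypothetical payload size, not taken from the study


def airtime_seconds(payload_bytes: int, rate_bps: float) -> float:
    """Nominal time to transmit the payload bits at a given raw data rate."""
    return payload_bytes * 8 / rate_bps


for name, rate in RATES_BPS.items():
    print(f"{name:32s} {airtime_seconds(PAYLOAD_BYTES, rate):10.5f} s")
```

For short identifiers of this size, the acoustic airtime is on the order of seconds, which suggests that raw throughput is only one of several contributors to the overall duration of a sharing transaction.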

Acoustic data transmission requires both sender and receiver devices to be within hearing range of each other, and QR codes require line-of-sight, whereas BLE does not have either constraint. This can have important implications for privacy and security, depending on the use case. Acoustic data transmission may be made secure by limiting the usable range of the protocol; however, to fully protect against eavesdropping attacks, end-to-end encryption is required. For both acoustic data transmission and QR, this must be implemented at the application layer, whereas encryption is available at the link layer in BLE, at least for paired devices (albeit the protection against eavesdropping offered by BLE is limited [67]). In some instances, these technology-specific properties may be desirable, whereas in others, they may be considered as disadvantages. As such, it is clear that there is no ‘one-size-fits-all’ solution to wireless data transmission, and it is conceivable that the choice of technology will be dependent on a number of technical requirements.

In this section, we have considered the technical features of each of the wireless technologies. However, there exists little work on how these features relate to the user experience. For example, does having zero-config or pairing requirements actually provide for a more frictionless user experience? Does the inherent audible notification have any benefit to users in terms of feedback and control? Does the requirement to open a camera for reading QR codes or find a target Bluetooth device interrupt the user to such an extent that it impedes flow and causes frustration? These are the questions that we seek to address through our user study. In particular, we are interested in the advantages and disadvantages that are presented by each of the compared technologies, each of which is technically capable of achieving the same end goal, and how these ultimately impact on the user experience.

4 Methods

Given the benefits of exchanging data over sound as outlined in the previous section, we are interested in evaluating the user experience of the technology in a real-world application. In this section, we present the design and results from a user study based on a simple peer-to-peer contact-sharing task. In particular, we are interested in the effect of the respective technologies (BLE, QR and Chirp) on transaction time, ease of use, user preference, and overall experience.

4.1 Experiment design

Three contact sharing role-play scenarios were formulated for the study: one for each mode (BLE, QR and Chirp). For each scenario, participants (n = 12) worked in pairs, and were tasked with sending and receiving three contact details using a simple address-book application. The participants each took part in three sessions (one for each mode), giving 18 total ‘transactions’ per participant. Our approach followed a within-subjects design and used a complete Latin square Williams design [79] balanced for first-order carry-over residual effects, consisting of three treatments and three periods (3 × 3) in six sequences (ABC, ACB, BAC, CAB, BCA, CBA). Participants were randomised in equal numbers to the six possible sequences of treatments, and also randomly assigned a different partner during each session so that no participant was paired with the same partner twice. Each session took place in a closed meeting room containing a table and chairs or sofa.
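The balance properties of this design can be checked with a short sketch. It enumerates the six sequences, counts how often each treatment appears in each period and how often each treatment immediately follows each other treatment, and then randomises participants to sequences in equal numbers. The mapping of the labels A, B and C to particular modes, the participant identifiers and the random seed are assumptions; the sketch also does not model the additional constraint that partners in a session must share a mode and never be paired twice.

```python
import random
from collections import Counter
from itertools import permutations

TREATMENTS = ("A", "B", "C")  # which mode each label denotes is not specified here


def williams_sequences(treatments):
    """For three treatments, the six-sequence design is every possible ordering."""
    return list(permutations(treatments))


def period_counts(sequences):
    """Count occurrences of each (period, treatment) combination."""
    return Counter((period, t) for seq in sequences for period, t in enumerate(seq))


def carryover_counts(sequences):
    """Count how often each treatment is immediately followed by each other one."""
    return Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))


def assign(participants, sequences, seed=0):
    """Randomise participants to sequences in equal numbers."""
    participants = list(participants)
    if len(participants) % len(sequences):
        raise ValueError("participants must divide evenly across sequences")
    rng = random.Random(seed)
    rng.shuffle(participants)
    per_sequence = len(participants) // len(sequences)
    return {p: sequences[i // per_sequence] for i, p in enumerate(participants)}


sequences = williams_sequences(TREATMENTS)
allocation = assign([f"P{i}" for i in range(1, 13)], sequences)
```

Each treatment appears twice in every period and each ordered pair of treatments occurs twice across the six sequences, which is the sense in which the design is balanced for first-order carry-over effects.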

Following each session, participants completed a survey based on the Usability Metric for User Experience (UMUX) [23], using a four-item, 7-point Likert scale ranging from 1–7 (strongly disagree to strongly agree). The UMUX is designed for the subjective assessment of a system’s perceived usability, and was formulated as an improvement of the System Usability Scale (SUS) [10]. UMUX conforms to the ISO 9241-11 [36] definition of usability, which suggests that measures of usability should cover: users’ ability to complete a task using the system, the quality of the resulting output (effectiveness), the level of resources employed in performing the task (efficiency), and users’ subjective reaction towards the use of the system (satisfaction). Following discussions about the validity of the system [8, 11], the UMUX has been re-assessed and validated in various studies [6, 75], and a UMUX-LITE version has also been proposed [45]. Overall, the UMUX has proven to be a compact, valid and reliable usability component for measuring the user experience of a system or technology, making it an appropriate metric for our study.

4.2 Participants

Twelve participants (4 males, 8 females; aged 21–46, median age = 25) were recruited through a combination of email and social media invitations, and an online user research recruitment platform. As such, they had a range of backgrounds, and included students, researchers, and working professionals. All participants reported owning a smartphone and having experience using both Bluetooth and QR technologies. A power analysis was conducted using the simr package for R [29]. Based on 3 groups (for the 3 modes), an effect size of 0.5 and alpha = 0.05, simulations indicated a power for predicting mode of between 0.93 and 1.0 (95% confidence interval) with 12 participants. This gives 108 observations using a balanced repeated measures design (36 observations per mode, 6 transactions per pair, 6 unique pairs). This also allows for each participant to complete the task in each modality with a randomly assigned partner, whilst avoiding pairing the same participants more than once.

4.3 Implementation of the technologies

We developed a simple mobile demo application for sharing contact details via Bluetooth, QR codes and Chirp (Fig. 1). The application simulated an address book, giving users the option to view, share and receive contacts. All versions offered the same functionality to send and receive contacts. The application was installed on six mobile devices running Android version 7, which were provided to participants while performing the task. All user actions and network calls were logged for analysis. The application was designed such that the same number of user actions was required to share a contact, regardless of the technology being used (see Tables 2 and 3).

Fig. 1 Screen capture of the contact sharing application. Sending and listening for a contact (via Chirp)

Table 2 The work flow for sending a contact using each of the three technologies. Each process contained the same number of actions (2)
Table 3 The work flow for receiving a contact using each of the three technologies.

4.4 Procedure

All participants were given verbal instructions on how to use the demo application before starting their first session. Participants were also provided with written instructions of the task and role play scenario at the start of each session. The facilitators configured the application before starting the sessions, to use either BLE, QR or Chirp, depending on the mode being tested in the given session.

Following each task, the participants completed the usability survey (Table 5). After completing all three sessions, semi-structured interviews were conducted, in which the participants were asked a consistent set of open-ended questions, prompting them to talk through their experience using the different technologies.

5 Results

5.1 Transaction time and failure rate

For the quantitative analysis we investigated 2 metrics: (i) the number of attempts required to successfully share each contact and (ii) the time taken to share a contact. These metrics were derived from the data logged by the demo application (every user action and network event was recorded). The demo application was designed to ensure that sharing a contact required the same number of user actions for each technology for both sender and receiver (as shown in Tables 2 and 3). The time taken to share a contact is defined as the duration between the user actioning to share a contact (step 1 in Table 2) and the contact being received on the recipient’s device (step 2 in Table 3). The number of attempts is defined as the number of times a user actions ‘share contact’ before the contact is received on the recipient’s device. All contacts were successfully transferred for the 108 transactions. For QR, 100% of contacts were sent on the first attempt, whereas for Chirp and BLE, this was 94.4% and 83.3% respectively, as shown in Table 4.
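As a sketch of how these two metrics might be derived from an event log, the code below counts ‘share’ actions per contact and measures the time until the contact is received. The log format, field names and example events are hypothetical and do not reflect the format used by the demo application; the sketch also assumes that, where more than one attempt is needed, duration is measured from the first ‘share’ action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float          # seconds since the start of the session
    kind: str         # "share" (sender action) or "received" (recipient device)
    contact_id: str


def transaction_metrics(events):
    """Return {contact_id: (attempts, duration_seconds)} for completed transfers."""
    first_share, attempts, metrics = {}, {}, {}
    for e in sorted(events, key=lambda e: e.t):
        if e.contact_id in metrics:
            continue  # ignore events logged after the transfer has completed
        if e.kind == "share":
            first_share.setdefault(e.contact_id, e.t)
            attempts[e.contact_id] = attempts.get(e.contact_id, 0) + 1
        elif e.kind == "received" and e.contact_id in first_share:
            duration = e.t - first_share[e.contact_id]
            metrics[e.contact_id] = (attempts[e.contact_id], duration)
    return metrics


def first_attempt_rate(metrics):
    """Fraction of completed transfers that succeeded on the first attempt."""
    return sum(1 for a, _ in metrics.values() if a == 1) / len(metrics)


# Made-up example log (not study data): one clean transfer, one needing a retry.
log = [
    Event(1.0, "share", "c1"), Event(3.5, "received", "c1"),
    Event(10.0, "share", "c2"), Event(14.0, "share", "c2"),
    Event(16.0, "received", "c2"),
]
result = transaction_metrics(log)
```

With 36 transactions per mode, the first-attempt rates reported above would be consistent with 34 of 36 (94.4%) and 30 of 36 (83.3%) transactions.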

Table 4 Percentage of successful transactions. All contacts were successfully shared via QR on the first attempt. Participants managed to share all contacts successfully within 2 attempts for all three technologies

In terms of time taken to successfully send a contact (duration), Chirp was fastest on average (2.4 s), followed by QR (6.3 s) and BLE (8.3 s), as shown in Fig. 2. We fitted a linear mixed effect regression model using the lme4 package for R [4], with duration as the response variable, fixed effects of mode, order and transaction number (with an interaction term between mode and transaction number), and random intercepts for the sender and receiver participants. Model assumptions of normality and homoskedasticity of the residuals were checked by visual inspection. We observed heteroskedasticity in the residuals of the fitted model (with the amount of variance and duration time being positively correlated, see Fig. 2), which was rectified by log transforming duration.

Fig. 2 Time taken to share contact information for each technology

The effect of each factor was tested using a full factorial type III analysis of variance (ANOVA) with Satterthwaite’s degrees of freedom approximation from the lmerTest package [44]. We found a significant effect of mode (F(2,77.3) = 52.5, p < 0.001), transaction number (F(5,77.3) = 10.6, p < 0.001), and a small but significant interaction between mode and transaction number (F(2,76.9) = 4.1, p < 0.001). There was no effect of order on the duration, i.e. the transaction duration did not change as users’ familiarity with the application and task increased, as shown in Fig. 3.

Fig. 3 Effect of order of mode presentation on the time taken to share a contact (mean and standard error bars)

The significant interaction between mode and transaction number means that it is not reasonable to analyse this model in terms of main effects [60]; therefore, we conducted a post hoc analysis of interaction contrasts between these factors using the phia package for R [20]. This showed a significant interaction for QR and BLE between transactions 1 and 2 (\(\chi^2(1)\) = 13.3, p < 0.01) and 1 and 5 (\(\chi^2(1)\) = 12.5, p < 0.01). There are also significant interactions for QR and Chirp between transaction 1 and each of 2 (\(\chi^2(1)\) = 14.1, p < 0.01), 3 (\(\chi^2(1)\) = 14.8, p < 0.01), 5 (\(\chi^2(1)\) = 15.9, p < 0.01), 6 (\(\chi^2(1)\) = 18.2, p < 0.001), and between transactions 4 and 5 (\(\chi^2(1)\) = 7.7, p < 0.05), and 4 and 6 (\(\chi^2(1)\) = 9.6, p < 0.05). These interactions are shown in Fig. 4. This highlights that the difference in transaction duration is dependent on whether the contact is being shared for the first time. When a set of contacts is shared, the first contact takes significantly longer than the subsequent contacts for QR. This effect is also observed, albeit to a lesser extent, for BLE, but is not the case for Chirp, where the transaction number has no effect on duration.

Fig. 4 Effect of transaction number on the time taken to share a contact, by mode (mean and standard error bars)

5.2 UMUX survey

After finishing each session, participants completed the four-question UMUX survey. The questions and their related usability components are given in Table 5.

Table 5 UMUX scale items from the survey presented to participants at the end of each session, and their corresponding usability components

Participants’ responses to the UMUX are summarised in Fig. 5. A Friedman rank sum test was performed, showing a significant difference between the responses for questions A, B and D: A (\(\chi^2\)(3, N = 36) = 14.1, p < 0.01); B (\(\chi^2\)(3, N = 36) = 18.0, p < 0.001); D (\(\chi^2\)(3, N = 36) = 25.6, p < 0.001). No significant difference was found between the responses for question C.
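For readers unfamiliar with the Friedman test, the sketch below computes its test statistic in the usual tie-corrected form: each participant's ratings are ranked across the compared conditions (using average ranks for ties), and the statistic measures how far the per-condition rank sums depart from what would be expected by chance. This is a generic illustration and not the analysis code used in the study; the example ratings are made up and are not study data.

```python
from collections import Counter


def average_ranks(row):
    """Rank the values in a row from 1..k, giving tied values their average rank."""
    order = sorted(range(len(row)), key=row.__getitem__)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Tie-corrected Friedman statistic; rows are participants, columns conditions."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    ties = 0
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        ties += sum(t ** 3 - t for t in Counter(row).values())
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction == 0.0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    return q / correction


# Made-up 7-point ratings for three conditions from four raters (not study data).
example = [[2, 5, 6], [3, 4, 7], [1, 5, 6], [2, 6, 7]]
q_example = friedman_statistic(example)
```

Because Likert responses often contain ties within a participant, the tie correction matters in practice; without it the statistic would be understated.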

Fig. 5

Participant responses to the UMUX following each session. Scale coding from 1 (strongly disagree) to 7 (strongly agree)
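For readers unfamiliar with the Friedman rank sum test used for the UMUX responses, the following minimal Python sketch shows how the test statistic can be computed from repeated ratings. It is a sketch under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, the ratings are invented, and the p value (obtained by comparing the statistic against a χ2 distribution with the returned degrees of freedom) is left out.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(rows):
    """Return (tie-corrected Friedman statistic, degrees of freedom).

    rows holds one list per participant, with one rating per condition.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
        for count in Counter(row).values():
            tie_term += count ** 3 - count
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all ratings are tied within every participant")
    return stat / correction, k - 1


# Hypothetical 1-7 ratings (assumed values): rows are participants,
# columns are the conditions being compared.
ratings = [[2, 5, 6], [1, 4, 7], [3, 6, 7]]
q, df = friedman_statistic(ratings)
```

The tie correction is relevant for Likert-type data such as the UMUX, where a participant often gives the same rating to more than one condition.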

A pairwise Wilcoxon signed-rank test (with Bonferroni correction) was performed on the modes for questions A, B and D, showing a significant difference between the responses for BLE and both the QR and Chirp modes, as shown in Table 6.

Table 6 P values from a Wilcoxon test for the pairwise comparison between responses for each mode, by question. All values were adjusted for each question using the Bonferroni correction
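The Bonferroni correction applied to the pairwise comparisons can be illustrated with a minimal Python sketch: each raw p value is multiplied by the number of comparisons made for that question and capped at 1. The function name `bonferroni_adjust` and the raw p values are hypothetical and do not reproduce the values in Table 6.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Hypothetical raw p values for one question (assumed, not from Table 6).
raw = {
    ("BLE", "QR"): 0.004,
    ("BLE", "Chirp"): 0.002,
    ("QR", "Chirp"): 0.5,
}
adjusted = bonferroni_adjust(raw)
```

Because the adjustment is applied per question, the multiplier here is the number of mode pairs compared within that question, not the total number of tests across all questions.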

5.3 Semi-structured interviews

In addition to the application data and survey, participants were asked a set of open-ended questions during semi-structured interviews. The discussion points addressed participants’ preferences for the technologies and the reasons for their choice, whether they experienced any difficulties completing the task (and, if so, what those difficulties were), and whether they felt the data transfer technology had any impact on the task. Finally, participants were invited to discuss their thoughts on the sound of Chirp. The main questions used as discussion points are given in Table 7.

Table 7 Main questions and discussion points from the semi-structured interviews

The interviews were video recorded and transcribed in order to conduct a qualitative analysis of the data. We followed an inductive approach to thematic analysis, performed at the latent level [9]. We present and discuss the main themes that emerged from the analysis, providing relevant extracts from the interviews for each theme.

User effort required / Ease of use (12):

All participants commented on the effort required to complete the task with each of the three technologies, and felt that the use of Bluetooth required significant effort due to the number of steps required to complete the task (“you have to select the device that you want to transfer the data to, and there are always lots of people phones in real life on Bluetooth”), (“it was slow and manual”), (“more interaction was required than the other methods”).

Participants reported that in some instances multiple attempts had to be carried out due to connection issues (“we had to wait a while for the Bluetooth to come on because it just would not pair for a while, then we just went back and started again”), (“it was slow, it kept buffering, so I had to keep going back”), and commented on the poor responsiveness of the technology compared to QR and Chirp (“Bluetooth was slow and we were not sure of what was happening”). This resulted in frustration and feelings of dislike towards the technology (“it annoys me when I have to wait and see if the signal is strong enough, [wait] for the signal to go through”).

Three participants commented on the ease of use of QR and their familiarity with the technology (“I used it before and I feel it’s very easy to use, it just scans quite easily..I guess it’s just what I’m best used to”), (“I found QR a lot quicker and I’ve had experience with it before so it was easier for me”).

Although they felt that QR was the fastest of the technologies, 5 out of 12 participants reported that QR required some degree of effort with device proximity and alignment (“it’s annoying to have to match the camera to the QR code”), (“in the beginning there was a problem when we were too close and also we need two phones together, so it’s a bit more interaction”), (“I wasn’t sure at what angle I had to scan it”). Some participants also reported disliking the QR interaction, due to issues encountered in low lighting conditions (“I don’t really like using QR codes in the real world because if the lighting is not right or you just have trouble positioning the phones”), (“I think the QR code was fastest but I don’t like having to scan a code”).

Half of the participants (6 out of 12) agreed that Chirp was very easy to use and required minimal user effort for completing the task (“Chirp was quite easy, it’s just one step”), (“Chirp was still a lot better than QR code because it wasn’t as fiddly”), (“Chirp was really easy, you just had to click and it was done”), (“I found Chirp really easy to transfer”), (“Chirp is good in that you don’t have to move your phone and, I don’t know how far away you can be from the other person but, it seems like it would work quite well”). There were no reports of Chirp being difficult to use or requiring effort.

Perceived transfer speed (12):

All participants based their preferred technology on the perceived speed of the data transfer (“when it was just done quickly it felt more efficient, it kind of felt better”), (“the faster it works the better it is”).

QR: (“QR [was my preferred method] because it was really fast”), (“QR code it’s quick and easy to use”).

Chirp: (“Chirp was the best because I didn’t have to wait for the signal to be strong enough, and I didn’t have to pair”), (“it was unexpected, in the sense that when I share and then the sound comes out and it’s done”), (“it was faster than Bluetooth and QR”), (“it was very very fast”), (“I had to press only one button and bang! it was done”), (“it was so instant, I was so impressed by it”).

However, it should be noted that user perception of the transaction time is subjective, and it is unclear whether all participants measured the time it took to complete the task from the moment they had started playing out the scenario, or if they rated transaction speed from the time they actively shared data.

Sound (12):

Participants expressed mixed feelings about the sound emitted by Chirp. Feelings of dislike were mostly associated with the loudness of the sound, with 7 participants stating that the volume was too high (“it was a bit high”), (“it was quite loud”), (“it was too high pitched”), whereas 2 participants reported not liking the actual sound of the system (“I didn’t like the sound”), (“it was a very squishy sound”). However, those participants confirmed that they would not have an issue with the sound if they were able to set the volume lower (“if it was a quieter sound then I feel it’d be fine”), (“it was fine, maybe the volume could be lower”).

Three participants mentioned that they would like to have control over the sound (“I was wondering, can you control the volume?”), (“I would definitely want it with the sound. It could be slightly quieter. Maybe it’s great to have the option, but the sound is really cool”), (“if there was a change of sound with something a bit more pleasant it would be a bit better”), or to have the option of an ultrasonic version of the method (“[I’d prefer a version with] no sound”).

Four participants made positive comments about the sound (“I thought it was really cool”), (“it’s a lovely sound”), (“it’s a really nice sound and you felt like something is happening”), (“I was fascinated by the sound”), (“it has a certain tonality”), (“it’s very unique”), (“it had a calming effect”).

Two participants reported that the sound provided feedback on the state of the task (“it’s going on”), (“you felt like something is happening”), and another participant felt that the sound of the method would benefit hearing-impaired users (“I thought it would be good for people with hearing difficulties”).

Novelty of data over sound (3):

Three participants expressed their interest in the novelty of the approach (“it was really cool that it was transferring data through sound”), (“I did like the idea of the Chirp [..] it’s something different from anything I’ve ever used before”), (“it was a completely new thing”).

6 Discussion

We presented a first evaluation of user experience during acoustic data exchange, by developing a simple contact sharing application where users could exchange contacts via BLE, QR and our implementation of acoustic data transmission, Chirp. From observations, it emerged that participants generally considered transaction time to be the main factor for determining their preferred data transfer method, irrespective of the effort required. The differences in transaction time are limited by hard floors of the technologies. For Chirp, this is determined solely by the data rate. For BLE, it will be determined by the data rate, scanning period (which determines the speed with which devices are detected) and number of devices that the user has to choose from (which will be dependent on the number of active Bluetooth users within range). For QR, the factors are more complex, as a successful transaction requires coordination and communication between users and physical effort to align devices.

This highlights that ‘technical’ specifications of technologies based on metrics such as data transfer speeds cannot be solely relied upon as determinants for their effectiveness in terms of interaction times. For example, QR codes have the potential to provide the fastest means of transferring data (up to a limited payload size). However, in reality, the scanning process can take a notable amount of time and effort. In addition, whilst BLE was the slowest technology overall, there was considerable variability in the data, and some cases where the transaction times were comparable to QR and Chirp, with the fastest BLE transfer being approximately 1.5 s.

6.1 Perceived interaction time versus actual interaction time

Despite transaction time being a main factor in terms of user experience, there is a mismatch between the actual transaction time, which reflects objective time (as defined for the quantitative analysis), and the time that users perceived the transaction to take, as indicated in the results of the UMUX survey. For example, QR was not necessarily faster for the whole transaction, due to having to align phones. However, because the transaction seemed instantaneous as soon as the phones were aligned, it created the perception of a fast transaction. This indicates that, although users tended to find the alignment process frustrating, they did not consider it part of the actual transaction of sharing a contact. In terms of user experience, it is the subjective experience of time, rather than the actual time of completion recorded by the system, that matters.

Problematic time-related experiences do not occur when users are engaged in performing a task [69], but waiting and interruptions can cause negative experiences. Furthermore, a lack of information about the expected waiting time can lead to an increase in the perceived waiting time [2], which consequently affects a user’s perception of the time taken for the whole interaction. However, a user’s perception of the speed of an interaction (whether accurate or not) affects their enjoyment in performing the task [47]. Another factor to consider is the user tolerance threshold, as introduced by [69], which arises from a user’s expectation. If users experience a perceived duration under their tolerance threshold, then they will judge the interaction as fast, whereas if the perceived duration falls beyond the threshold, they will judge it as slow, independently of the actual duration. As such, we cannot rely on the measured time alone as a predictor of user preference, but must also consider the perceived interaction time when designing technologies for device-to-device communication that involve user interaction.
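The tolerance threshold argument can be expressed as a minimal Python sketch in which the judgement depends only on the perceived duration and the threshold, never on the measured duration. The class and function names, the numeric values and the treatment of a perceived duration exactly equal to the threshold (classed here as slow) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    measured_s: float   # duration recorded by the system
    perceived_s: float  # duration as experienced by the user


def judge_speed(interaction, tolerance_s):
    """Classify an interaction using only the perceived duration."""
    return "fast" if interaction.perceived_s < tolerance_s else "slow"


# Hypothetical values: the longer measured interaction is judged fast
# because it is perceived as shorter than the tolerance threshold.
aligned_scan = Interaction(measured_s=8.0, perceived_s=3.0)
waiting_transfer = Interaction(measured_s=4.0, perceived_s=6.0)
judgements = [judge_speed(i, tolerance_s=5.0) for i in (aligned_scan, waiting_transfer)]
```

In this invented example the interaction with the longer measured time is the one judged as fast, which is the kind of mismatch between objective and subjective time discussed above.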

6.2 Effects of transaction number on interaction time

The pairs of participants transferred three contacts between each other, giving six transactions in total per session. Although they were not instructed to do so, participants tended to share all three of their contacts at once, before receiving three from their partner. Given this pattern of interaction, we found a notable effect of transaction number (1–6) for both QR and BLE, but not for Chirp (Fig. 4). The first and fourth transactions in each session tended to take more time than the third and sixth respectively, indicating that for multiple transactions in the same direction, transaction time is reduced with each subsequent contact shared. This can be explained for QR, where the initial transaction required the receiving phone to be positioned accordingly (whereas for subsequent transactions the phones were typically already in position). For BLE, it is likely to be indicative of a usability factor, i.e. once the user knows they have to select the device to send to, the subsequent transactions are naturally faster. As such, we might take the best-case scenario transaction times by only looking at those for transactions 3 and 6. Here, there is actually little difference between modes. Nonetheless, the effect of transaction number highlights an important usability difference in terms of the ability of people to immediately use the technology, for which Chirp outperforms both BLE and QR. This is a notable finding, particularly considering that all participants reported previous experience using BLE and QR, but not Chirp. In addition, it highlights that for applications where multiple items are to be sent in succession, interaction times may eventually reflect the technology-specific data rates.
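The best-case comparison described above, which considers only transactions 3 and 6, can be sketched in Python as a filter followed by a per-mode mean and standard error. This is a minimal illustration: the function name `best_case_summary`, the record layout and the durations are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def standard_error(durations):
    """Standard error of the mean; 0.0 when only one value is available."""
    if len(durations) < 2:
        return 0.0
    return stdev(durations) / sqrt(len(durations))


def best_case_summary(records, best_case=(3, 6)):
    """Return {mode: (mean, standard error)} for the chosen transactions.

    records is an iterable of (mode, transaction_number, duration_s).
    """
    by_mode = defaultdict(list)
    for mode, transaction, duration in records:
        if transaction in best_case:
            by_mode[mode].append(duration)
    return {mode: (mean(d), standard_error(d)) for mode, d in by_mode.items()}


# Hypothetical records in seconds (assumed values, not measured data).
records = [
    ("QR", 1, 10.0), ("QR", 3, 4.0), ("QR", 6, 6.0),
    ("Chirp", 1, 5.0), ("Chirp", 3, 5.0), ("Chirp", 6, 5.0),
]
summary = best_case_summary(records)
```

Restricting the summary to the last transaction in each direction removes the one-off setup cost of the first transaction, which is why the modes appear closer together under this view.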

6.3 Transaction failures

Beyond transaction time, one of the major user experience issues of device-to-device communication is when things go wrong and a transaction attempt is unsuccessful. Although all 108 transactions were eventually successful for all three technologies, there were instances where multiple attempts were required. For BLE, this was typically due to the recipient’s device not being found during the scanning process, and the users deciding to ‘go back’ and re-scan for devices. This is an issue that regular users of Bluetooth will be familiar with. For Chirp, there were two instances where the sound was not correctly decoded by the recipient’s device. Finally, the fact that all QR codes were successfully transferred on the first attempt to ‘share’ should be interpreted with caution, because although the senders never had to ‘go back’ and reopen the QR code, the recipients did not always manage to successfully scan the codes on the first attempt.

6.4 Audibility and audio volume

Finally, we found high variance in user preference for the sound of Chirp. In this study, we used an audible version of Chirp, in order to investigate the effect of ‘hearing’ the transaction (and thus increasing the visibility of the technology) from a user perspective. It has been previously shown that using modalities such as sound to convey information in the design of mobile interfaces reduces short-term memory loads [28], potentially enhancing the user experience. However, the participants did not appear to directly equate the audible transactions to a more ‘informative’ experience. In general, there was no clear consensus on whether the sound was perceived to be a positive or negative element of the interaction; some participants enjoyed the sound and novelty of the technology, whereas others disliked the aesthetic. In addition, many users expressed a preference to have some control over the loudness.

It should be noted that, during the study, the volume of the devices was set to a medium level and kept consistent for all participants. For future studies, it might be more suitable to allow participants to adjust the volume, or ask participants to set a volume of their choice before performing the task. Chirp does not inherently rely on being audible, and as mentioned in Section 3, inaudible transmission is possible. Therefore, in a real-world application, it may be desirable to provide some level of user control over the encoding method or to give the option of transmitting data using audible or near-ultrasonic (inaudible) signals.

7 Conclusions and future work

In this paper, we provided an initial evaluation of the use of wireless data-sharing technologies for peer-to-peer information sharing. We measured and compared the benefits of three different data-sharing technologies: Bluetooth (BLE), QR codes and Chirp (acoustic data transmission), in terms of the time taken to complete a transaction and the user experience of doing so.

Our main findings identify perceived transaction time as a major factor in determining user preference for each of the technologies in question. We found that real-world transaction times were lowest for Chirp, followed by QR codes, and were considerably higher for BLE. In general, it follows that QR and Chirp offer significantly more positive user experiences than BLE for the basic contact-sharing task presented herein, as confirmed by user feedback.

Users expressed frustration with BLE due to pairing or device selection issues, and with QR due to the physical coordination required to align devices and scan a code. In addition, users were divided on the aesthetic nature of the sound within Chirp’s implementation. However, all participants identified both QR and Chirp as easy to use and as meeting the requirements of the technology for the task.

This work identifies that acoustic data transmission technologies such as Chirp constitute a promising alternative to the more common QR and BLE technologies. This is particularly so for tasks that involve ‘one-off’ transactions of data between devices such as mobile phones, computers, and tablets. However, further work is required to establish user preference for different data encoding schemes, each of which offers different sonic aesthetics, and to further understand the role that the sound of audible data transmission plays in the overall user experience.