1 Introduction

In the audio production industry, visually-impaired sound engineers, musicians and audio production specialists rely on screen-readers to access digital audio workstations (DAWs), which are the primary tools for modern sound editing. However, unlike traditional audio production tools, modern DAW interfaces are highly visual and incorporate a number of graphical representations of audio parameters to support editing and mastering, such as waveforms and automation graphs (Footnote 1), which are inaccessible to users of screen-readers. In this context, we were interested in engaging end users to examine how non-visual interaction techniques can be used to design effective access to modern DAWs. Solutions to the accessibility issues faced by users living with visual impairments should be designed using non-visual modalities, such as audio, tactile and haptic displays. However, expressing design ideas that exploit these modalities can be challenging. Graphical designs can be drawn, edited and manipulated using low-cost means such as paper prototypes; it is harder to articulate, for example, how a particular shape or colour could be represented in sound or through touch, or how to interact with an auditory or a tactile object [1]. Additionally, involving users living with visual impairments in the design process means that the visual tools typically used in participatory design should be adapted or replaced to accommodate the particular needs of this population of users. We developed and applied a user-centred approach that incorporates various techniques to help make a participatory design process more accessible to people living with visual impairments. This paper extends our previous work on the use of this participatory design approach [2, 3] by outlining further details of the design process, describing the audio-haptic solutions that resulted from it and their evaluations with users, and reflecting on the design process and its outcomes.

2 Background

2.1 Participatory design

The use of participatory design (PD) within HCI projects is widely discussed, for example by Seland and Svanaes [4] and by Lee et al. [5], in contexts ranging from the design of mobile systems to creating art in Taiwanese night markets. Key techniques of contemporary participatory design, such as video brainstorming, are discussed by Mackay et al. in a video tutorial with accompanying text [6]; Holmquist discusses a form of structured brainstorming he describes as “Bootlegging” [7]; and the more physical technique of “Bodystorming” is discussed in [8]. However, many of these techniques break down when the participants are visually impaired because they rely on visual and spatial relations or interactions, so new techniques or variations need to be explored, as we discuss in [3].

2.2 Non-visual participatory design

A number of researchers have attempted to use alternative methods to overcome these issues. For example, using a scenario-based approach enabled rapid communication during workshop activities involving students and visually-impaired stakeholders [9]. A detailed description of this approach is given by Sahib et al. [10], where a scenario-based textual narrative was tailored and used as a basis for design dialogue between a sighted designer and visually-impaired users. Other approaches that proposed alternatives to visual design tools include the use of tactile paper prototypes, developed as part of the HyperBraille project [11], and lo-fi physical prototypes [12]. A workshop that ran as part of the NordiCHI conference in 2008 focused on developing guidelines for haptic lo-fi prototyping [13]; many of the suggestions made during that workshop can be used as part of an accessible participatory design process. Examples include using Lego models and technology examples together with scenarios to help give users first-hand experience of designed tools [14], or tangible models, such as cardboard mockups and plastic models, to support early prototyping activities of accessible haptic and tactile displays [15]. More recently, Magnusson et al. [16] showed that it is possible to create fun, rich and social co-located games with wearable technology employing very simple interactivity and audio playback. With its focus on the roles that sound can play in interaction, sonic interaction design (SID) has also emerged as a design field at the intersection of auditory display, interaction design and ubiquitous computing [17]. Hug et al. [18] provide an overview of a series of workshops on SID, which investigated methodologies for sonic interaction design that can integrate into existing design practices. Brazil [19] reviews methods and frameworks focused on the early conceptual design of sonic interactions. We drew upon techniques such as technology examples and audio-haptic lo-fi design in the approach we present in this paper.

2.3 PD and research through design

Parts of our design process also drew on Gaver’s approach to research through design (RTD), presented at CHI’12 [20] and now the subject of a biannual conference. Prior to this, Zimmerman et al. [21] discussed at CHI’07 how RTD could present a new paradigm for interaction design within the HCI community. Since then, the use of RTD methodologies has been discussed within various contexts, from developing new programming languages [22] to the world of gardening [23]. As we discuss, an RTD approach allowed us to draw on techniques from participatory design whilst benefiting from the diverse skills of the different stakeholders or “actors”, such as the connections and knowledge of the interactive digital instrument building community at the authors’ universities, the knowledge of design and materials of the researchers involved, and the specific expertise of the participants with visual impairments. Furthermore, as part of our approach, we envisioned the design process itself as being constituted by a dialogue between different groups or “actors”, often taking the form of material communication through objects alongside more conventional techniques (such as video interviews). We discuss this further in [24].

Fig. 1. Example of a densely visual DAW interface

Fig. 2. Overview of our approach to conducting participatory design with visually-impaired audio engineers/producers

2.4 Design domain: digital audio workstations

In the audio production industry, non-sighted audio engineers and audio production specialists rely on screen-reader technology to access modern DAWs. But DAW interfaces are highly visual and incorporate a number of graphical representations of sound to support editing and mastering, such as waveform representations, which are entirely inaccessible to screen-readers (e.g. Fig. 1). In a competitive industry, the time it takes to overcome these accessibility barriers often hinders the ability to deliver projects in a timely manner and to collaborate effectively with sighted partners. For many visually-impaired engineers, financial practicalities and working speed become professional constraints. They have noted this repeatedly in interviews and discussions throughout our research: “I have to be able to work as fast as a sighted person, and that’s just the reality of the industry I work in” (SB). The same point arose in conversation between participants during the initial exploratory workshop. SB: “It’s turnaround time as well – they’ll want it, someone’ll send you something they’ll say right I want my two tracks and I want it by ten o’clock tomorrow morning – ideal if you can see what you’re doing sometimes.” JR: “Of course there’s an element then of choice as to whether you accept that commission, at that point, isn’t there, you could say actually no my turnaround time is not that so forget it, I’m not doing it.” PB: “Yeah but it’s a huge issue, just the time it takes to do stuff.”

3 Approach

Figure 2 shows an overview of our user-centred approach to conducting participatory design with people living with visual impairments. It was organised around three main stages: an initial exploratory workshop, followed by a series of iterative participatory prototyping workshop sessions, and a final evaluation workshop. We describe each stage in the following sections together with the accessible techniques we employed and the designs that resulted from this process.

3.1 Participants and setup

We advertised a call for participation on a number of specialised mailing lists for professionals living with visual impairments. We called specifically for participants who encounter difficulties when working with sighted colleagues because of the inaccessibility of the tools available to them in the audio production industry. We recruited the first 18 respondents (14 male and 4 female, mean age 47), who worked across a number of domains as professional musicians, audio production specialists, sound engineers, and radio producers. All participants had little or no sight; all used a speech- or braille-based screen-reader to access information and a mobility aid such as a cane or a guide dog. Workshop sessions were held at the authors’ institution in an informal workspace and lasted between three and five hours each. The first and second stages were separated by four weeks, and the second and third stages by eight weeks. Workshops within the second stage were separated by a period of two weeks. We took extensive notes during the workshops, and video recorded discussions.

3.2 Stage 1: initial workshop

We set up an initial workshop with participants that included three main activities: focus group discussions, technology demonstrations, and the design of audio-haptic mock-ups. We took an active part in all these activities, engaging with participants in the discussions and jointly exploring design ideas.

Focus group discussions The discussions were structured around a number of topics to achieve the following aims:

  • Establishing an understanding of current best practice in participants’ various working domains and how current accessibility technology supports it.

  • Establishing an understanding of the limitations of current accessibility technology.

  • Building consensus around a priority list of tasks that are either difficult or impossible to accomplish using current accessibility solutions and that participants would like to be accessible. The aim was to use the list of tasks to drive the participatory design parts of this initial workshop as well as to set the direction for follow-up activities.

Fig. 3. Some of the technology demonstrated in the initial stage (haptic devices, sonifications and motorised faders)

Together with participants, we explored work practices and current solutions available to audio engineers. Participants explained that extending screen-reader functionality with specialised scripts is the most popular approach to improving the accessibility of “hard-to-use” applications such as DAWs, and that such scripts remain inadequate when accessing waveform representations, applying time-varying sound effects, or navigating a large menu structure to locate parameters of interest.

Technology demonstrations The second part of the initial workshop involved hands-on demonstrations of a range of technologies that could be used as a basis for designing better solutions to the accessibility limitations of DAWs identified in the focus group discussions. Technology demonstrations were performed either on a one-to-one basis or with pairs of participants. The technologies we demonstrated included haptic devices (a Phantom Omni (Footnote 2) and a Falcon (Footnote 3)), a multi-touch tablet and motorised faders (Fig. 3), alongside examples of sonification mappings and speech-based displays of information. We chose these technologies because they have been shown to be effective means for non-visual interaction. We deliberately demonstrated the capabilities of each technology without any reference to actual applications, so that the possibilities it offered were not constrained by a specific domain or context. For example, in order to ensure an application-independent demonstration of the Phantom Omni and Falcon haptic devices, we used a custom program that allowed us to switch between different effects that could be simulated with these devices, such as vibration, spring effects and viscosity.

Fig. 4. Foam paper, audio recorders, adhesive label tags and tag e-readers used to create lo-fi audio-tactile mock-ups

Audio-tactile physical mock-up design We then invited participants to actively think through new designs in the last part of the initial workshop. Having had hands-on experience with the capabilities of new technology, participants worked in small groups, with one or two design team members forming part of each group, and explored the design of a new interface that could be used to address some of the problematic tasks identified in the initial discussions. To help with this process, we attempted to use an accessible version of physical mock-up design [30]. The material used to construct the physical mock-ups included foam paper, basic audio recorders, label tags and electronic tag readers (Fig. 4). Foam paper could be cut into various forms and shapes with the assistance of the sighted group member and used to build tangible tactile structures. Self-adhesive tags could be attached to pieces of foam paper and then associated with an audio description that could be both recorded and played back using the electronic tag readers. Additionally, basic audio recorders (the circular devices shown in Fig. 4), which could record up to 20 seconds of audio, were provided to allow participants to record additional audio descriptions of their physical mock-ups. Thus, different pieces of audio-tagged foam paper could be organised spatially and, if combined with the audio recording devices, could constitute physical lo-fi semi-interactive audio-tactile mock-ups of an interface display or a flow of interaction. To close the session, participants were invited to present their physical mock-ups to the rest of the participants.

3.3 Stage 2: participatory prototyping

The second stage in our participatory design approach involved conducting a series of participatory prototyping workshops to engage users in an iterative design process that gradually develops fully functional designs. We invited a smaller group of participants (2–3 participants who also took part in the initial workshop) to actively contribute to the design of basic prototype implementations that embodied the design ideas generated in the initial stage. We wanted to enlist the same participants who were involved in the initial stage to ensure continuity between where the ideas were generated and how they were further developed and refined into concrete implementations. Participatory prototyping activities in this stage (Fig. 5) had a number of important characteristics. First, rather than being exploratory in nature, as was the case in Stage 1, activities at this stage were structured around the tasks that were identified as being problematic in the initial workshop. The aim was to expose the participants to prototype designs that embody the ideas generated in the initial workshop of how such tasks could be supported, and to work closely with them to improve on the implementations of these ideas through iterative prototype development. For example, participants used a sonification mapping that represented the peaks of a waveform to locate areas of interest within an audio track. The sonification mappings were based on ideas generated in the initial workshop, but could be manipulated programmatically in real time in response to participants’ feedback. Secondly, as opposed to the lo-fi physical mock-ups used in the previous stage, the prototype implementations were developed in a highly malleable digital form. Thirdly, each set of participatory prototyping sessions was held with the same group of participants over three to four workshops that were up to two weeks apart. While the design team worked on implementing participants’ feedback in the interim periods, participants were asked to keep detailed audio diaries of domain activities.

Fig. 5. Participatory prototyping

Highly malleable prototypes The prototypes we developed to embody the design ideas generated in the initial stage of this approach were highly malleable because they supported a number of alternatives for presenting a given piece of information or supporting a given task or functionality. The key to employing a highly malleable prototype in our approach is that it was easily customisable and alternatives could be generated quickly in real time. We achieved this flexibility by developing custom control panels, which we had available to us throughout the participatory prototyping sessions (Footnote 4). For example, we developed a prototype DAW controller that supports the scanning of a waveform representation by moving a proxy in a given direction and displaying an audio-haptic effect whose main parameters are mapped to the data values represented by the waveform (e.g. amplitude mapped to friction and frequency mapped to texture; a haptification and sonification of data). This design was malleable in a number of ways: the direction of scanning could be altered to be horizontal or vertical and could be initiated at different starting points; the mapping used to drive the haptification and sonification of the waveform could be adjusted in terms of scale and polarity; and the haptic effects themselves could be altered to display, for instance, friction, vibration or viscosity. Additionally, the prototypes could be reprogrammed in real time. That is, if participants wished to explore an alternative implementation of a given functionality or feature that could not be readily customised using the control panels, we reprogrammed these features on the fly as and when this was needed.

Audio diaries Another technique that we employed in this stage was to ask participants to record audio diaries in the interim periods that preceded each participatory prototyping session. Specifically, we asked participants to attempt to complete tasks similar to the ones explored during the sessions at their homes or workplaces. We asked them to do this while using their current accessibility technology setup and encouraged them to reflect on the process of completing these tasks in light of the particular iteration of prototype development that they were exposed to in the preceding participatory prototyping session. Whenever participants produced an audio diary, they shared it with the design team prior to the next prototyping session. This provided the designers with further feedback, thoughts and reflections that they could then incorporate in the next iteration of the prototypes and present to the participants in the next round of development.

Design workbook We gathered user feedback from prototyping activities in a design workbook [20]. The design workbook became the document through which we worked with an industrial designer to arrive at the final designs. The workbook outlined the brief, reviewed related work, summarised user feedback up to this point, and served as a sketchpad for design form factors, materials, and technical specifications. It allowed us to collate our observations on how users interacted with the original prototypes and to understand their ergonomics. The workbook also acted as a scrapbook, where related designs and design ideas could be quickly collected.

3.4 Produced designs

3.4.1 Point estimation in automation graphs

One of the tasks that was identified as difficult to achieve with current accessibility tools is applying time-varying sound effects to audio tracks. On a visual display, applying such effects can be achieved by drawing an automation graph overlaying the waveform representation of an audio track, which in turn involves editing the points that constitute the graph (e.g. Fig. 6). Editing the graph is accomplished by: (i) locating an existing point or creating a new one, (ii) estimating the point’s position on the X and Y axes, and (iii) altering these coordinates to reflect the desired level of effect (Y axis) at a given time on the track timeline (X axis). The representations that support these tasks are inaccessible to screen-readers.

Fig. 6. Applying an effect to an audio track using an automation graph overlaying a waveform representation

The participatory design activities that we undertook explored how a range of alternative audio and haptic representations of automation graphs could be used to improve the accessibility of these artefacts in DAW interfaces. Participants highlighted the importance of providing adequate feedback to indicate the positions of automation points. However, representing the position of an automation point on the Y axis as a single tone was deemed by participants to be insufficient, as they needed to know the position of a given point in relation to other points. We thus explored a number of alternative sonifications for conveying the position of automation points. We first designed a simple auditory interface to support users in estimating the position of a point when placing it at a desired location on an axis. The interface allows users to move a point along an axis using the keyboard up and down arrow keys. We then designed interactive sonifications to convey feedback about the position of a point and references that mark how far it is from an origin.

Pitch-only sonification mapping In the first design, we sonified the position of a point on an axis by mapping the pitch of a sine tone to the point’s Y coordinate following a positive polarity. That is, the tone’s pitch changes in accordance with the point’s movements on the axis; moving the point up increases the pitch, moving it down decreases it. We used an exponential function to map the position of the point to frequencies in the range of 120–5000 Hz. The range and mapping were chosen to fit within the human hearing range and were also based on workshop participants’ listening preferences for minimum and maximum values. With an exponential mapping, successive frequencies differ by a constant factor rather than a constant term, which has been found to be superior to linear mappings [25]. Interaction with this sonification was designed such that the point moves when users press and hold a cursor key. Pressing and holding a cursor key would therefore trigger a continuous sonification of the point as it moves on the axis.
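As an illustration only, the exponential pitch mapping described above could be sketched as follows. The text does not give the exact function, so the normalised position in [0, 1] and the name `position_to_frequency` are assumptions made for this sketch; only the 120–5000 Hz range and the positive polarity come from the text.

```python
def position_to_frequency(position, f_min=120.0, f_max=5000.0):
    """Map a normalised axis position in [0, 1] to a frequency in Hz.

    Exponential mapping with positive polarity: moving up raises the pitch,
    and equal steps in position multiply the frequency by a constant factor.
    The normalisation to [0, 1] is an assumption of this sketch.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    return f_min * (f_max / f_min) ** position


# Eleven equally spaced positions from the bottom to the top of the axis.
frequencies = [position_to_frequency(i / 10) for i in range(11)]

# Ratio between each pair of neighbouring frequencies (constant by design).
ratios = [high / low for low, high in zip(frequencies, frequencies[1:])]
```

Because neighbouring frequencies share a constant ratio, each step on the axis corresponds to the same musical interval, which is the property that distinguishes this mapping from a linear one.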

One-reference sonification mapping In the second design, we used the same pitch mapping described above and added one tone to convey a reference to an origin point. In this case, the reference tone represented the middle point on the scale (a pitch frequency of 774 Hz lasting 100 ms). We designed this such that the user hears pitch changes that correspond to the movement of the point when they press and hold a cursor key, and hears the reference tone with a static pitch on key release. Comparing the two pitches (on key press and on key release) is meant to provide a sense of distance between the current position on the axis and the origin point based on pitch difference: the larger the difference in pitch between the two points, the further away from the origin the point is located.
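The pitch of the reference tone follows from the exponential mapping. As a short worked check, assuming the mapping takes the normalised form $f(p) = f_{\min}\,(f_{\max}/f_{\min})^{p}$ with $p \in [0, 1]$ (the exact function is not stated in the text), the middle of the scale is the geometric mean of the two range limits:

```latex
\[
f\!\left(\tfrac{1}{2}\right)
  = f_{\min}\left(\frac{f_{\max}}{f_{\min}}\right)^{1/2}
  = \sqrt{f_{\min}\, f_{\max}}
  = \sqrt{120 \times 5000}\ \text{Hz}
  = \sqrt{600\,000}\ \text{Hz}
  \approx 774.6\ \text{Hz}
\]
```

This agrees with the reference pitch quoted for the middle point of the scale.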

Multiple-references sonification mapping In the third design, we again used the same pitch mapping as described above. But, instead of hearing only one reference point on key release, the user hears multiple successive reference tones with varying pitches that correspond to all the points between the current position and the origin reference. Previous research has shown that the threshold for determining the order of temporally presented tones is from 20 to 100 ms [26]. To create a succession of tones, our reference tones lasted 50 ms and were separated by gaps of 50 ms. In this case, the position of a point in relation to an origin can be estimated by judging both the pitch difference at that point compared to the subsequent points, and the total length of the succession of tones that separates it from the origin. A longer distance yields a longer succession of tones. Points located below the origin trigger an ascending set of tones, while those above the origin trigger a descending set of tones.
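The timing of the reference tones can be sketched as a simple schedule. This is a minimal sketch, not the implementation used in the study: it assumes the axis is divided into `n_steps` integer positions, that tones start at the position next to the current one and end at the origin, and it reuses a normalised exponential pitch mapping. The function name and all parameters other than the 50 ms tone and gap durations are hypothetical.

```python
def reference_tone_schedule(current, origin, n_steps,
                            f_min=120.0, f_max=5000.0,
                            tone_ms=50, gap_ms=50):
    """Return (onset_ms, frequency_hz) pairs for the tones heard on key release.

    Positions are integer steps 0..n_steps on the axis (an assumption).
    The sequence runs from the step next to `current` up to and including
    `origin`, so it ascends when the point is below the origin and descends
    when the point is above it.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    if not (0 <= current <= n_steps and 0 <= origin <= n_steps):
        raise ValueError("positions must lie in 0..n_steps")
    if current == origin:
        return []
    direction = 1 if origin > current else -1
    schedule = []
    for i, step in enumerate(range(current + direction, origin + direction, direction)):
        frequency = f_min * (f_max / f_min) ** (step / n_steps)
        schedule.append((i * (tone_ms + gap_ms), frequency))
    return schedule


# A point three steps below the origin and one three steps above it.
below = reference_tone_schedule(current=2, origin=5, n_steps=10)
above = reference_tone_schedule(current=8, origin=5, n_steps=10)
```

In this sketch both sequences contain three tones with onsets at 0, 100 and 200 ms, so the length of the sequence encodes distance while its direction (rising or falling) encodes which side of the origin the point is on.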

Fig. 7. Phantom Omni device and the virtual vertical axis used in the audio-haptic design. (1) A free-form haptification rendering the axis as a smooth line, and (2) a grid-based haptification of magnetic points to accentuate scale positions

Audio-haptic interaction We also designed a simple user interface that allows users to manipulate the position of a point using a Phantom Omni haptic device instead of a keyboard, by using its proxy to traverse a virtual axis with a vertical motion (Fig. 7). The virtual axis was designed to be 16 cm tall, sitting 5 cm above the base of the device and 16 cm away from it. We explored two basic haptification designs to render the vertical axis. In a free-form haptification design, we rendered the axis as a smooth line (Fig. 7(1)). In a grid-based haptification design, we introduced a grid-like structure by highlighting each position on the line with a magnetic effect such that moving the proxy along the line feels like snapping from one point to another. Points were positioned about 0.5 cm apart (Fig. 7(2)). A quick upwards or downwards movement in this design gives a textured as opposed to a smooth haptic sensation. In addition to these haptifications, the user’s movements were constrained to the virtual axis, allowing users to feel both the top and the bottom of the axis. Movements on the axis and reference to the origin were also sonified using the sonification designs described above, with the exception that the grid-based haptification made the sonification discrete rather than continuous.
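The difference between the two haptifications can be illustrated by how a proxy height is turned into an axis position. The sketch below only models the discretisation of the grid-based design; the magnetic force rendering on the device is not shown. It assumes heights are measured in centimetres from the bottom of the 16 cm axis and treats the "about 0.5 cm" spacing as exact. The function name and constants are hypothetical.

```python
AXIS_HEIGHT_CM = 16.0
GRID_SPACING_CM = 0.5  # "about 0.5 cm" in the text; treated as exact here


def snap_to_grid(height_cm, spacing=GRID_SPACING_CM, axis_height=AXIS_HEIGHT_CM):
    """Clamp a proxy height to the axis and snap it to the nearest grid point.

    Returns (grid_index, snapped_height_cm). Clamping models the constraint
    that lets users feel the top and bottom of the axis.
    """
    clamped = min(max(height_cm, 0.0), axis_height)
    index = round(clamped / spacing)
    return index, index * spacing


# Number of magnetic points along the axis under these assumptions.
n_grid_points = int(AXIS_HEIGHT_CM / GRID_SPACING_CM) + 1

inside = snap_to_grid(3.2)     # a height within the axis
below = snap_to_grid(-1.0)     # below the bottom of the axis
above = snap_to_grid(20.0)     # above the top of the axis
```

A free-form design would pass the clamped height straight to the pitch mapping, whereas the grid-based design passes only the snapped height, which is why the resulting sonification is discrete.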

3.4.2 Sonification of peak level meters

Another task that was identified as difficult to achieve using screen-readers is identifying areas of interest within an audio track, for example whether the amplitude of an audio track goes past a threshold that causes the signal to distort, also known as clipping. This is typically represented within a waveform or by a visual indicator called the peak level meter, which conveys audio levels in real time by flashing amber and red coloured signals (e.g. Fig. 8).

Fig. 8. Visual peak level meters (left) showing a red colour when a threshold is exceeded. Extracting peak level information from the waveform to be sonified to convey audio levels and threshold crossing (right)

Participants highlighted that the sonifications we designed for point estimation could be used to access peak level meter information. We thus explored how these sonifications could be modified and used to monitor variations in the shape of a waveform and to highlight clipping areas. The result was a sonification that can be used in two modes: a continuous mode, in which the peaks of a signal from an audio track are used to modulate the frequency of a sine wave (Fig. 8), and a clipping mode, in which the sine wave modulation is only displayed when parts of an audio track exceed a user-specified threshold. The clipping mode produces a short alarm beep (200 ms) each time the audio level goes past the threshold set by the user. We also used stereo panning to indicate whether the clipping occurs on the left or right audio output channel.
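The two modes can be sketched on pre-computed peak values. This is an illustrative sketch under several assumptions: peaks are taken per fixed-size block of samples, the continuous mode reuses a normalised exponential mapping to 120–5000 Hz, and panning is encoded as -1.0 (left) or +1.0 (right). None of these details are specified in the text, and all function names are hypothetical; only the 200 ms beep and the left/right panning come from the description above.

```python
def block_peaks(samples, block_size):
    """Peak absolute amplitude of each consecutive block of samples."""
    return [max(abs(s) for s in samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]


def continuous_mode(peaks, f_min=120.0, f_max=5000.0):
    """Continuous mode: map each peak (0..1 of full scale) to a sine frequency."""
    return [f_min * (f_max / f_min) ** min(max(p, 0.0), 1.0) for p in peaks]


def clipping_events(left_peaks, right_peaks, threshold, beep_ms=200):
    """Clipping mode: one beep each time a channel rises past the threshold.

    Returns (block_index, pan, beep_ms) tuples, with pan -1.0 for the left
    channel and +1.0 for the right channel (an assumed encoding).
    """
    events = []
    was_over = {"left": False, "right": False}
    pans = {"left": -1.0, "right": 1.0}
    for i, (left, right) in enumerate(zip(left_peaks, right_peaks)):
        for channel, peak in (("left", left), ("right", right)):
            is_over = peak > threshold
            if is_over and not was_over[channel]:
                events.append((i, pans[channel], beep_ms))
            was_over[channel] = is_over
    return events


peaks = block_peaks([0.1, -0.5, 0.3, -0.2, 0.9], block_size=2)
tones = continuous_mode([0.0, 1.0])
events = clipping_events(left_peaks=[0.2, 0.95, 0.97, 0.3, 0.99],
                         right_peaks=[0.1, 0.2, 0.96, 0.2, 0.1],
                         threshold=0.9)
```

Triggering only on the upward crossing, rather than on every block above the threshold, is one way to read "each time the audio level goes past the threshold"; a block-by-block alarm would be an equally plausible reading.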

3.4.3 Haptic display of waveform amplitude

A further key sound production task identified by our participants as difficult to achieve was gaining a sense of the overall shape of the audio waveform. A sighted engineer quickly gains a great deal of information from looking at a waveform: the presence of silences, loud parts of the audio, the beginnings of sections and the overall dynamics of a track. Graphical representations of audio waveforms are unreadable by standard accessibility tools such as screen-readers. Based on early feedback from participants on custom software implemented on off-the-shelf haptic interface hardware, we sought to design and build a custom hardware device conceived specifically to function as a haptic audio waveform editing interface. Given that the sound represented in a graphical waveform is audio amplitude (Y) as it evolves in time (X), we asked whether we might similarly constrain haptic interaction to two dimensions. This brought us to think of a planar construction, mapping the Cartesian display space of a visual waveform to the haptic domain. The initial prototype of the HapticWave was built using a disused scanner bed, to which we coupled a Copenhagen Institute of Interaction Design (CIID) M & M modified Arduino board and two 60 mm linear faders. The faders displayed the amplitude of the volume at any given point in the sample, which was located using an infrared distance sensor: the user could move through the sample by moving the scanner bed, whilst the faders displayed the amplitude at this point. This used serial communication and worked with either Max MSP (Footnote 5) or our own accessible DAW software (Fig. 9).
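The mapping from Cartesian waveform space to the device can be sketched as a lookup from carriage position to fader position. This is a hypothetical sketch rather than the HapticWave firmware: it assumes the carriage position is normalised to [0, 1] across the sample, that the waveform has been reduced to an amplitude envelope with values in [0, 1], and that fader travel is linear over 60 mm. The sensor reading and serial communication are not modelled.

```python
def fader_target_mm(carriage_pos, envelope, fader_travel_mm=60.0):
    """Return the fader position (mm) for a carriage position on the sample.

    carriage_pos: normalised position along the sample, 0.0 (start) to 1.0 (end).
    envelope: amplitude values in [0, 1], one per time slice of the sample.
    """
    if not envelope:
        raise ValueError("envelope must not be empty")
    pos = min(max(carriage_pos, 0.0), 1.0)
    # X axis: time. Pick the envelope slice under the carriage.
    index = min(int(pos * len(envelope)), len(envelope) - 1)
    # Y axis: amplitude, scaled to the physical travel of the fader.
    amplitude = min(max(envelope[index], 0.0), 1.0)
    return amplitude * fader_travel_mm


envelope = [0.0, 0.5, 1.0, 0.25]  # silence, medium, loud, quiet
start = fader_target_mm(0.0, envelope)
middle = fader_target_mm(0.5, envelope)
end = fader_target_mm(1.0, envelope)
```

Under these assumptions a silence leaves the fader at rest and a loud passage drives it to full travel, which is the kind of cue a sighted engineer would read from the height of the drawn waveform.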

Fig. 9. The HapticWave running with our own wave editing software: early scanner bed prototype (1); final design (2)

3.5 Stage 3: final workshop and qualitative evaluation

In the third and final stage of our design process, we invited three participants who also took part in the second stage to evaluate the final designs and provide further feedback. We asked participants to complete semi-structured tasks based on those identified in the initial workshop as difficult to accomplish using current accessibility solutions. The tasks were chosen together with the participants prior to the workshop. The aim was to test the developed solutions using realistic scenarios that matched participants’ actual working processes. To evaluate the sonifications of point estimation, we presented participants with audio tracks that were unknown to them and asked them to create automation graphs to apply various audio effects at different points on the tracks, such as panning and mixing tracks by inserting fade-ins and fade-outs at different points. This task involved scanning through the tracks, identifying where an audio effect should be applied, and then creating an automation graph by inserting automation points and estimating their positions. For the peak level meter sonification, we asked participants to examine a set of audio tracks, to use the sonification to monitor audio signal levels, and to describe the information they could extract from the sonification and how they would use it as part of their working process. For the HapticWave, we asked participants to find and export loops within audio tracks we provided, some speech and others musical. We asked them to isolate, for example, single words or percussive sounds, paying attention to the silences on either side of the sounds they were exporting. Being aware of these silences and editing with them makes for a “clean” export, where fragments of unwanted sounds cannot be heard, which is highly important in audio editing.

3.5.1 User feedback

We collected participants’ feedback through observations, think-aloud protocols and informal interviews. Participants were able to use the sonifications to create automation graphs, accurately inserting and editing audio effects to realise an outcome that they felt satisfied with. In particular, participants found it useful to combine the haptic and sonification displays, as this gave their interactions an increased sense of immediacy and control. They also pointed out that, with little training (approx. 20 min), they were able to edit audio effects as fast as they would when using their typical setup with screen-reader scripts, but that they felt more expressive with the non-speech audio-haptic designs. One participant commented that speech output during a creative process such as mixing audio effects can be unpleasant and distracting, and that replacing the speech element with non-spoken output such as the designed sonifications made the process more engaging and enjoyable. However, participants also pointed out that it was sometimes difficult to know how much pressure to apply when manipulating the haptic proxy at a given automation point. One participant concluded that this could be because of the vertical motion that the tool enforced, highlighting that horizontal movements, e.g. placing the device proxy on a physical flat surface, might be more intuitive. The free-form haptification seems to have eased this difficulty, and it was received much better than the grid-based haptification. Participants highlighted that a grid-based display allowed them to count their way through the display when estimating point positions and distances on the vertical axis. However, they found this to be too restrictive, forcing them to work in a manner similar to using a speech display due to the discrete nature of movement. Reducing the magnetic force, or using an alternative haptic metaphor such as ridges [27], could help address this issue.

Participants found the sonification of peak levels to be useful in exploring waveforms. Interestingly, the sonification designs were appropriated differently by each participant. For instance, one participant preferred the continuous sonification of peak levels to assess whether two audio tracks’ levels are consistent after mixing them together. They highlighted that a sonification of this kind gives them more insight than using a speech display, which can only provide access to a single track at a time. Another participant preferred to use the sonification in the clipping mode, but appropriated its use by looping a portion of an audio track while gradually reducing the clipping threshold until this displayed a consistently continuous tone, and monitored this consistency to judge the overall signal level of the track.
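The second participant's strategy can be illustrated with a minimal sketch. It is not the prototype's implementation: it assumes that the clipping-mode tone sounds whenever the peak of a frame reaches the threshold, and the names `frame_peaks`, `tone_gate` and `lower_until_continuous`, the frame size and the number of threshold steps are all hypothetical.

```python
def frame_peaks(samples, frame_size):
    """Peak absolute amplitude of each consecutive frame of the looped region."""
    return [
        max(abs(s) for s in samples[i:i + frame_size])
        for i in range(0, len(samples), frame_size)
    ]


def tone_gate(peaks, threshold):
    """True for each frame where the tone would sound (assumed rule: peak >= threshold)."""
    return [p >= threshold for p in peaks]


def lower_until_continuous(peaks, start=1.0, steps=20):
    """Lower the threshold in equal steps until every frame triggers the tone.

    The returned threshold is a rough indicator of the overall signal level
    of the loop: it is the highest tested value that the quietest frame reaches.
    """
    for k in range(steps, -1, -1):
        threshold = start * k / steps
        if all(tone_gate(peaks, threshold)):
            return threshold
```

Under these assumptions, a loop whose quietest frame peaks at 0.5 yields an intermittent tone at any higher threshold and a continuous tone once the threshold has been lowered to 0.5.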

The participants provided feedback on a revised prototype of the HapticWave, which coupled the refined hardware with software written in the Max MSP environment, allowing users to loop samples, edit samples through inserting and modifying start and end points, preview edits and then export their edit. This was based on the same principles as the initial prototype: moving the fader carriage from left to right allowed for scrubbing through a sample, whilst the motorised fader moved to indicate the volume of the sample at that point. One user found the HapticWave gave them a very good representation of the waveform that they compared to their notion of a sighted user’s experience, noting, “it gives you this immediate, intuitive indication [...] to me it must be pretty much as good as a sighted person would get looking at the waveform” (PB). The same user, who noted a lag between audio and haptic feedback, edited entirely by feel and described how the HapticWave allowed them to identify by feel the things a sighted user would find in a waveform: “it’s great [...] it’s like a dream [...] when I’m editing normally, I have to do it entirely by ear [...] I’m finding zero crossings and really good points by ear, which is fine [...] but this would be so much quicker and more intuitive”.
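The scrubbing principle described above can be sketched as follows. This is an illustrative approximation rather than the Max MSP patch used in the prototype: the function name `fader_height`, the normalised position and height ranges, and the use of a windowed peak as the measure of volume are assumptions.

```python
def fader_height(samples, carriage_pos, window=64):
    """Map the carriage position to a motorised fader height.

    carriage_pos: 0.0 (far left, start of sample) to 1.0 (far right, end of sample).
    Returns a height between 0.0 and 1.0, taken here as the peak absolute
    amplitude in a small window around the scrub point (an assumed measure).
    """
    if not samples:
        return 0.0
    pos = min(max(carriage_pos, 0.0), 1.0)      # clamp to the physical travel
    centre = int(pos * (len(samples) - 1))      # sample index under the carriage
    lo = max(0, centre - window // 2)
    hi = min(len(samples), centre + window // 2 + 1)
    peak = max(abs(s) for s in samples[lo:hi])
    return min(peak, 1.0)
```

In this sketch, sliding the carriage across a silent passage keeps the fader at the bottom of its travel, and the fader rises as the scrub point reaches louder material.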

We found that the participants preferred the 2D haptic display of the HapticWave over the 3D haptics of the other haptic devices, preferring the reference that the constrictions provided, with one participant suggesting, “that kind’ve 3d thing [...] my concept of that space, if I’ve got something to actually reference that against [...] that’s a much more realistic way of doing something” (SB), and another noting (of the Phantom) “there’s just too many degrees of freedom” (SC). Another user found the physical interface of the HapticWave to be “more pleasant” (than the other haptic devices), because “it’s under one hand, and it’s familiar in the sense I’m used to controlling for example a little camera, or a mouse, or faders or anything else, and the way that that moves, quite likely, it’s more immediate, it’s just a lot less clunky than the Phantom” (PB). The same participant noted, however, that the HapticWave would have to be better integrated into his own software for it to be of genuine use to him, with the increased functionality he found in the Phantom (that is, beyond just feeling the amplitude of the waveform), noting: “If I was using it for editing, then I’d want it to be able work with my existing software ideally, so that it would be an interface I was using with say Sound Forge for editing, because it gives you this immediate, intuitive indication” (PB).

4 Discussion

The user-centred approach presented in this paper attempts to address issues associated with the accessibility of a participatory design process to people living with visual impairments. In particular, the approach emphasised the use of audio-haptic technology throughout the design process in order to facilitate discussions about audio and haptic percepts and help the envisioning and capturing of non-visual design ideas. In our experience, close interaction with participants through detailed and thorough workshops such as the ones reported in this paper allows designers to gain an appreciation of the issues faced by users living with visual impairments and a deeper understanding of how these could be addressed. Similar findings were reported by Magnusson et al. [28], who showed that a longitudinal study design consisting of a linked sequence combining a focus group discussion, interviews, a diary study and lo-fi design workshops can be a useful tool for the exploration of non-visual interaction designs. In our co-designing experience, participants and designers brought different sets of expertise to the sessions. Participants had knowledge about the domain of their expertise but also in-depth knowledge about the practical limitations of current accessibility solutions, while designers brought design and technical knowledge. There was an element of serendipity in the design process, specifically in terms of our awareness about certain technologies: it was only through the networks of musicians and designers that some of us were involved with that we came into contact with the technology that allowed us to develop the prototypes. We also used methods of Research Through Design [20], which we feel allows for an open involvement of different groups within a design process, keeping that process adaptable and open-ended and valuing the multiple artefacts that embody elements of that process.

4.1 Prototype malleability

The technique of using highly malleable prototypes was successful in providing flexibility in interface designs which could be altered in real-time in response to user feedback. Similar results were reported in [29]. The malleability of the software elements was particularly useful for the HapticWave, which used Max MSP software for the audio editor it controlled. In a matter of seconds we were able to adjust, for example, the shortcut keys to accommodate a left-handed user, or introduce or change aspects of the functionality. The hardware for the HapticWave was also to an extent malleable, through technologies such as 3D printing. Being able to present participants with different alternatives and reprogram features on the fly captured an essential characteristic that is found in, for example, paper prototyping techniques and that makes them extremely effective design tools [30]. The prototypes' capacity to be adaptable in response to changes and feedback generated from the joint prototyping process is crucial in prototyping activities [31], and non-visual design tools should therefore incorporate flexible levels of adaptability for them to attain the same level of efficiency as their visual counterparts. While this was not true in our experience with using the physical audio-tactile mock-ups, which hindered rather than nurtured communication and exchange of design ideas, digital implementations of highly malleable prototypes afforded a more supportive medium of communication between non-sighted participants and designers. This approach might be taken further to develop a structured approach to malleable prototyping for multimodal interfaces, in which the use of control panels and dynamically programmable prototypes, possibly integrated with other techniques such as capturing user feedback through automatic logging of user input, could be made more methodical.

4.2 Communication and participation barriers

Not all the techniques used in the first stage of the design process achieved their expected outcomes and benefits. In the final part of the initial workshop, we observed that participants attempted to use the material provided to create audio-tactile mock-ups but, as discussions unfolded, they drifted away from these materials and focused on verbal exchange only. In our experience, the less material participants used the more ideas they expressed. Thus, the process of constructing these mock-ups seems to have hindered rather than encouraged communication. Our audio-tactile mock-ups have therefore had the opposite effect of their visual counterpart methods, where the use of mock-ups is often associated with engendering imagination and conversation [32]. While it is possible that training might change the situation, in general, one of the benefits of lo-fi mock-up design activities lies in the fact that they require minimal training while yielding significant design insights. More training is therefore not necessarily desirable in this case. Another explanation for this is that visually-impaired participants were not able to access the construction of a physical prototype at the same moment as it was being constructed, and so the process lacked the emergent properties and illuminating qualities that it can have when shared by sighted co-designers. That is, the audio-tactile mock-ups no longer functioned as an immediate shared artefact, which may have contributed to decreasing the spontaneity that the visual counterpart process has. Indeed, the use of the physical mock-ups might also have contributed to creating an asymmetry between the contributions of the sighted designers (who could not only see the physical artefacts but also assist with their construction) and those of the other participants. In this sense, the shift away from the physical artefacts to the verbal descriptions would have contributed to balancing this asymmetry between designers and participants, since all parties were then using a modality that could be equally shared amongst everyone.

A further possible explanation for the difficulty we observed with physical mock-ups is the nature of the artefacts used to construct them as well as the type of users we worked with. Cutting foam paper, tagging the pieces and recording audio snippets might have been perceived to be cumbersome by our participants. The resulting mock-ups might not have been enough to complement the ideas generated and discussed by the participants. In this sense, we suggest that the use of rich scenario-based design as described in [10] might complement and improve audio-tactile mock-ups. Additionally, the cumbersome character of audio recorders can also be replaced by Wizard of Oz design [33], although in our case, it could be difficult to provide adequate simulation of the kind of audio manipulation typically associated with interaction with digital audio workstations. Indeed, another possibility for the observed communication barriers is that the tasks that users were trying to design for were too complex to be captured using the lo-fi material provided. Our observations are nonetheless in line with previous work that found narrative scenario-based design to be a particularly effective tool for co-designing with participants living with visual impairments [9, 10]. We note that thorough comparisons of these different methods for non-visual participatory design are generally lacking and more studies are needed to further investigate these issues.

4.3 Appropriation and workshop activities

There were limitations associated with the insights we gained from the workshop activities. The user group was heterogeneous in terms of the types of audio editing they did, ranging from producing podcasts and editing audio books to recording and mastering songs by full bands. The users had their own unique setups in terms of software, recording environments, technologies and workflows. The workshop activities therefore did not attempt to replicate participants' working environments per se, but rather aimed to jointly explore design problems and come up with potential solutions that capture their experiences. The relatively short time frame of workshop activities meant that participants were not able to properly test the technologies within longer workflows. As one participant noted: “When you come into a workshop – and that was brilliant, being able to sit down with you and other visually impaired people – but I think we all have our own unique ways of doing things, as all people do in their workflow, it’s the same product but we all use it for different things” (SB). This user borrowed the HapticWave for further tests and observed that this was far more useful, noting “I’ve been able to use it in my own workflow.” Another participant who continues to use the peak level sonification noted: “I’ve had my hands on [the accessible peak meter] now for a couple of months and I’d say I’m still finding weird and wonderful uses for it” (SC). These comments highlight a more general issue with participatory design activities in that they do not always give designers the kind of insights that could be gained from longitudinal usage of the designs they produce. Longer usage leads to appropriation, as illustrated, for example, by the different uses of the peak level meter described in Sect. 3.5.1. The workshop cycle, therefore, cannot be seen to comprehensively exhaust the benefits a user might gain from a piece of music technology, nor the problems they might encounter with it.

4.4 Methodologies and future directions

Two further reflections on the evaluation and analytical methodologies we used are worth noting. First, the think-aloud protocol we employed with the users in the final stage was successful in spite of the fact that participants were being subjected to audio from the interface both in the form of speech from the screen reader and non-speech sounds from the sonifications. One factor that might have contributed to the success of using this technique is that users living with visual impairments are perhaps used to both talking about their experiences descriptively and to handling and integrating screen-reader output with other interactions. In the case of the audio diaries, the running commentary provided by the participants was valuable in giving us access to actual in-situ experiences with current accessibility solutions that would have been harder to tap into otherwise.

Second, we have gathered extensive user reactions through feedback, observation, and video and audio recordings (noting that sometimes the video showed a different use of the device than the participant described). However, we did not use established methods for analysing video and audio content, such as thematic or conversational analysis. More insights could therefore be gained from employing further analytical methods. We also chose not to capture quantitative aspects of participants' interactions in the final workshop as we believed this was an ineffective and inappropriate way to obtain the data we needed or to serve the community we were working with. Our users were a heterogeneous collective who had all evolved deeply personal tactics for audio production in professional situations and home studios using unique and varied combinations of off-the-shelf and customised accessibility tools. There was no single point of comparison or set of actions for a task-based performance study. In addition to this, studio techniques and music making are developed over time for both sighted and non-sighted users, with individuals often taking years to develop their own personal combinations of hardware layouts, software, keyboard shortcuts or specialised scripts. Therefore, novel audio-haptic interfaces inevitably take time to learn and incorporate efficiently and effectively into the diversity of users’ workflows, making it very difficult, and not entirely helpful, to compare task completion times or other quantitative measures amongst those testing the device. However, we believe that with increased acceptance amongst users and increased familiarity with these devices, future work could include more quantitative task-based studies of the audio-haptic interfaces in comparison to existing techniques and methods.

5 Conclusion

We presented a user-centred approach for conducting participatory design with visually-impaired sound engineers, musicians and audio production specialists. This approach incorporates accessible means for expressing non-visual design ideas for editing audio using digital audio workstations. It emphasises the need to use audio-haptic technology throughout the design process in order to build shared vocabularies and support effective expression, communication and capture of auditory and haptic design ideas. We presented the design of sonifications and audio-haptic interfaces that resulted from this process and reflected on the benefits and challenges that we experienced when applying this approach.