1 Introduction

Recent developments in interactive technologies have brought major changes in the way artists and performers interact with digital music technology. Computer music performers are presented with a myriad of interactive technologies and afforded near-complete freedom of expression when creating computer music or sound art. In real time, they can manipulate multiple parameters relating to digitally generated sound, effectively creating gesture interfaces and sound generators that have no real-world acoustic equivalent. When presented with such freedom of interaction, the challenge of providing performers with a tangible, transparent and expressive device for sound manipulation becomes apparent.

Digital musical instruments (DMIs) present musicians with performance challenges that are often unique to computer music. One of the most significant deviations from traditional musical instruments is the level of physical feedback conveyed by the instrument to the user. Currently, new interfaces for musical expression are not designed to be as physically communicative as acoustic instruments. Specifically, DMIs are often devoid of haptic feedback and therefore lack the ability to impart important performance information to the user [1].

In the field of human–computer interaction (HCI), the formal evaluation of an input device involves a rigorous and structured analysis, often involving the use of specific methods to ensure the repeatability of a trial. The formality of the process guarantees that the findings of one researcher can be applied and developed by other researchers. In computer music, the testing of DMIs has been highlighted as being unstructured or idiosyncratic [2,3,4,5] (see Sects. 10.3.2, 11.4, 12.3 and 12.4). However, it is arguably challenging to accurately measure and appraise the creative and effective application of technology in a creative context. These aspects of a DMI’s evaluation cannot effectively be represented by quantitative techniques alone. In response to these shortcomings, we seek to gather data via both quantitative and qualitative means, as has been seen in other studies [3]. Presented within this chapter is an experiment that evaluates and compares the major components of haptic feedback. To achieve this, the feedback mechanisms of two prototype DMIs were assessed, namely the Haptic Bowl and the Non-Haptic Bowl, which were augmented to provide vibrotactile feedback [6]. The objective of the experiment was to quantify the effect of haptic feedback in the performance of pitch selection tasks, specifically the move time and accuracy that could be achieved with different feedback types. In addition to measuring device performance, the users’ perception of usability and their overall experiences within the context of the experiment were also captured and analysed.

To formally structure the experiment, a validated framework of analysis was applied [7]. This DMI evaluation framework was designed to tackle the multiparametric nature of musical interactions while also assessing the practical design features applied in the construction of a DMI. By applying a structured evaluation model, data on functionality, usability and user experience were captured while participants undertook a pitch selection task. For this analysis, a pitch selection task was chosen to quantitatively measure user performance and maintain objectivity in the investigative and evaluation methodologies that were later applied. Structured post-task questionnaires were then administered after each stage of the experiment to elicit further information and to closely correlate quantitative with qualitative data. An empathy map for each feedback stage was then constructed to connect in-task results with post-task questioning.

In accordance with the evaluation framework, the structure of the chapter is presented as follows: each device is described and the feedback affordances they apply are reviewed; the experiment is then contextualised, stating the intentions and constraints of the study; a functionality trial is then presented that measures the move time and pitch selection accuracy of the different feedback stages; the usability and user experience data of the study are then presented; finally, the findings of the analysis and post-task data are discussed and concluded.

2 Experiment Design

It has been observed that traditional evaluation methodologies from HCI are unsuitable for the direct evaluation of DMIs without prior contextualisation and augmentation [1]. This is mainly due to the complex coupling of action and response in musical interaction (see Sect. 2.3). These two factors operate within the tightly linked processes of a focused spatiotemporal task. Therefore, if this process is interrupted for an evaluation (e.g. for a questionnaire or thinking-aloud protocols), the participants are inevitably separated from their instantaneous thoughts and therefore from achieving their goals. Due to this, any system of analysis that is applied outside of the interaction is disconnected from the task being evaluated. Similar problems exist in other areas of study, for example in the evaluation of gaming controllers [8]. To counter this, adaptive and reflective models have been developed in HCI that concentrate on specific elements of an interaction, and these techniques have been augmented to evaluate the participants’ experience in specific contexts. In the study presented, several validated HCI evaluation techniques were applied to combat the potential for task evaluation disconnect.

2.1 Functionality Testing

To assess the functionality of the feedback elements from the Haptic and Non-Haptic Bowl devices, an experiment was devised which required participants to use the interfaces in a non-musical pitch selection task. This task was designed to generate quantitative data that could be used to accurately compare each feedback stage. From analysing the functional mechanisms of both devices, a Fitts’ Law style experiment was designed.

2.2 Adapting Fitts’ Law

Fitts’ Law is used in HCI to describe the relationship between movement time, distance and target size when performing rapid aimed movements (Fig. 6.1). Per this law, the time it takes to move and point to a target of a specified width (W) and distance (D) is a logarithmic function of the spatial relative error [9]. While the logarithmic relationship may not exist beyond Windows, Icons, Menus, Pointer (WIMP) systems, the same experimental procedures can be followed to produce data for analysis in an auditory context [10, 11].
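To make the logarithmic relationship concrete, the following is a minimal sketch of Fitts' original formulation, MT = a + b log2(2D/W). It is an illustration rather than the analysis used in this chapter: the function names are hypothetical, and the regression constants a and b are placeholder values that would normally be fitted to measured data for a given device.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' original index of difficulty in bits: log2(2D / W)."""
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    return math.log2(2.0 * distance / width)


def predicted_move_time(distance: float, width: float,
                        a: float = 0.1, b: float = 0.15) -> float:
    """MT = a + b * ID.

    a (seconds) and b (seconds per bit) are hypothetical placeholders,
    not values reported in this chapter.
    """
    return a + b * index_of_difficulty(distance, width)


# Doubling the distance at a fixed width adds one bit of difficulty.
id_near = index_of_difficulty(8.0, 2.0)   # log2(8)  = 3 bits
id_far = index_of_difficulty(16.0, 2.0)   # log2(16) = 4 bits
```

Because only the ratio D/W enters the formula, the same expression can be evaluated with distance and width expressed in hertz instead of physical units, which is the remapping used in the experiment described next.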

Fig. 6.1 Fitts’ Law movement model

In the following experiment, we measured the time it took a participant to rapidly aim their movements towards a specified target pitch, which was constrained within a predefined frequency range. Essentially, physical distance was remapped to audio frequency range, where the start position corresponded to a point below 20 Hz and the target position lay within a range below 1 kHz. The target’s width was predetermined as a physiological constant of 3 Hz for sinewave signals below 500 Hz, increasing by approximately 0.6% (about 10 cents) as frequency increased towards 1 kHz [12].
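The target-width rule can be sketched as a small function. This is one possible reading of the description above, under the assumption that the width is held at 3 Hz below 500 Hz and taken as 0.6% of the target frequency from 500 Hz upwards (giving 6 Hz at 1 kHz); the function names are hypothetical.

```python
def target_width_hz(freq_hz: float) -> float:
    """Target width in Hz for a sinewave target at freq_hz.

    Assumed reading of the rule in the text: a constant 3 Hz below
    500 Hz, then roughly 0.6% of the frequency up to 1 kHz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    if freq_hz < 500.0:
        return 3.0
    return 0.006 * freq_hz


def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for an interval in cents (1200 cents per octave)."""
    return 2.0 ** (cents / 1200.0)


# 10 cents corresponds to a frequency change of a little under 0.6%.
ten_cents_percent = (cents_to_ratio(10.0) - 1.0) * 100.0
```

Under this reading the two branches meet at 500 Hz, where 0.6% of the frequency is exactly 3 Hz.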

2.3 Context of Evaluation

The evaluation context of the experiment was augmented to fit that of the performer/composer and designer’s perspective. These stakeholders concern themselves with how a device works, how it is interacted with, and how the overall design of a system responds to interaction [13]. Considering this, the experiment was purposefully designed to objectively evaluate the performance of device feedback and not the musical performance of the participant. To maintain objectivity, a feedback focused experiment was devised and executed to quantify the device performance in pitch selection tasks. Secondly, validated post-task questionnaires were issued to quantify the usability of the device. This was achieved by employing a Single Ease-of-use Question (SEQ), Subjective Mental Effort Question (SMEQ) and NASA Task Load Index (NASA-TLX) questionnaires. Finally, interviews focusing on user experience were conducted as well as a User Experience Questionnaire (UEQ) to evaluate how the participants experienced the interaction.

Although post-task user experience questioning is problematic due to user disconnect issues, previously validated techniques were applied to accurately evaluate each feedback stage. Firstly, a preference of use question was posed to the participants to evaluate their opinion on the practical application of feedback in their own performances [14]. Secondly, the UEQ was completed to collect quantitative data about the participant’s impressions of their experience [15]. This was followed by a moderately structured post-task interview formulated around specific topics. These known areas of concern in musical interactions included learnability, explorability, feature controllability and timing controllability [16]. These data were then subjected to content analyses. The content analysis topics were designed to elicit and explore critical incidents [17] that have been highlighted as problematic in the field of new instruments for musical expression.

Following the experiment, empathy mapping was applied in the context of user experience to understand and to form empathy for the end-user. This technique is typically applied to consider how a person is feeling and to understand what they are thinking better. This task was achieved by recording what the participants were thinking, feeling, doing, seeing and hearing as they were performing the task. With these data, it was possible to create a general post-experiment persona to raise issues specific to the context of the analysis. It is helpful to create empathy maps to reveal connections between a user’s movements, their choices and the judgements they made during the task in a way that the participants may not be able to articulate post-task. Therefore, empathy mapping data were recorded during the practical stages of the functionality study to capture instantaneous information about the participants’ experience without interrupting the task. Observations about what the participants said out loud, sentiments towards the device, their physical performance and how they used prior information of other devices during the experiment were recorded to validate and potentially expand upon the post-task questionnaire and interview data presented above.

2.4 Device Description: The Bowls

For the analysis of haptic feedback in DMI interactions, prototype devices were constructed (Fig. 6.2). Each device was designed to represent a variety of feedback techniques, and several different input metaphors were initially explored. From this assortment, two devices were selected that could display the unique characteristics of haptic feedback in combination and isolation, while affording the user freedom of movement in a three-dimensional (3D) space around the device. Specifically, the Haptic Bowl and the Non-Haptic Bowl were chosen.

Fig. 6.2 Haptic Bowl (left) and Non-Haptic Bowl (centre), user for scale (right)

2.4.1 The Haptic Bowl

The Haptic Bowl is an isotonic, zero-order, alternative controller that was developed from a console game interface [6]. The internal mechanisms of a GameTrak tethered spatial position controller were removed and relocated into a more robust and aesthetically pleasing shell. The original Human Interface Device (HID) electronics were removed and replaced with an Arduino Uno SMD edition. This HID upgrade reduced communication latencies and allowed for the development of further device functionality through the addition of auxiliary buttons and switches. The controller places very few restrictions on performer movement, as physical contact with the device is reduced to two tethers that connect to the user via gloves. Control of the device requires the performer to visualise an area in three dimensions, with each hand tethered to the device within this space.

2.4.2 The Non-Haptic Bowl

This device is also an isotonic, zero-order controller, based upon PING ultrasonic distance sensors and basic infrared (IR) motion capture (MOCAP) cameras, thus affording contactless interaction. The ultrasonic components are arranged as digital inputs via an Arduino Micro, and MOCAP cameras were created from modified Logitech C170 web cameras with visual light filters covering their optical sensors and internal IR filters removed. An IR LED embedded in a ring was then used to provide a tracking source for these MOCAP cameras. The constituent components are all contained within an aluminium shell, similar in size and shape to the Haptic Bowl. The use of these sensors matched the input capabilities of the Haptic Bowl, providing a comparable interaction. However, due to its contactless nature, this input device has fewer movement restrictions than the Haptic Bowl. Control of the Non-Haptic Bowl also requires the performer to visualise a 3D area, with input gestures captured within a comparable space to that of the Haptic Bowl.

2.5 Device Feedback Implementation

In addition to the user’s aural, visual and proprioceptive awareness, haptic feedback components were incorporated into the devices to communicate performance data to the user. In the Haptic Bowl, additional feedback was included in the form of a strengthened constant-force spring mechanism for both tether points. The device’s spring mechanisms were strengthened to further assist in hand localisation and the positioning effects this created in relation to the main body of the instrument. Furthermore, for vibrotactile feedback, the audio output from a sinewave-generating audio module was rerouted to voice-coil actuators (see Sect. 13.2) embedded in the device’s gloves. The sinewave audio signal was routed via a Bluetooth receiver embedded within the Haptic Bowl. This receiver was then connected to the voice-coil actuators contained within each of the device’s gloves [18], thereby providing sinewave feedback in real time that is directly related to the audio output, as is innately delivered in acoustic musical instrument interactions. It was also possible to apply this vibrotactile feedback to the Non-Haptic Bowl via the same gloved actuators. To achieve this, the sinewave audio output was again routed through the same type of Bluetooth speaker, but in this case, the speaker was kept external to the device. The removal of the speaker from the DMI was done to highlight the disconnect of these feedback sources in existing DMI designs.

From combinations formulated around these feedback techniques, it was possible to create four feedback profiles for investigation:

  • Haptic feedback (passive constant-force and active vibrotactile feedback)

  • Force feedback (passive constant-force feedback only)

  • Tactile feedback (active vibrotactile feedback only)

  • No feedback (no physical feedback)

Each feedback stage operated within the predefined requirements for sensory feedback as outlined in earlier research [19].

2.6 Participants

Twelve musicians participated in the experiment. All participants were recruited from University College Cork and the surrounding community area. The participants were aged 22–36 (M = 27.25, SD = 4.64). The group consisted of 10 males and 2 females. All participants self-identified as being musicians, having been formally trained or performing regularly in the past 5 years.

2.7 Procedure

All stages of the experiment were conducted in an acoustically treated studio space. The USB output from each Bowl device was connected to a 2012 MacBook Pro Retina. The serial input data from the devices were converted into Open Sound Control (OSC) messages in Processing and sent over UDP. Pure Data (Pd) then received and processed these data. Within Pd, the coordinates over the z-plane were used to create a virtual Theremin, with the right hand controlling the pitch, and the left hand the volume. The normal operational range of both devices was altered to fit within an effective working range of 30 cm; this range lay slightly above an average waist height of 80 cm (the average height in Ireland, as of 2007, was 170 cm, and the waist-to-height ratio was calculated as 0.48). A footswitch was employed by the participant to indicate the start and end of each test.
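The serial-to-OSC-to-UDP stage of this signal chain can be illustrated with a short sketch. The experiment itself used Processing for this step; the Python below is only an equivalent illustration of how an OSC 1.0 message carrying float arguments is laid out and sent as a UDP datagram. The address pattern "/bowl/z", the host, the port and the function names are all hypothetical.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * ((-len(raw)) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Build an OSC 1.0 message whose arguments are 32-bit big-endian floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", value) for value in values)
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_hand_heights(sock: socket.socket, host: str, port: int,
                      right_z: float, left_z: float) -> None:
    """Send both hand heights in one datagram (hypothetical address)."""
    sock.sendto(osc_message("/bowl/z", right_z, left_z), (host, port))


# Example usage (not executed here; host and port are placeholders):
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   send_hand_heights(sock, "127.0.0.1", 9000, 0.5, 0.25)
```

Every field in the message is aligned to a four-byte boundary, which is why the address and type-tag strings are padded with null bytes.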

After a brief demonstration, participants were given 5 min of free play to familiarise themselves with the operation of the device. Following this, participants were given a further 5 min to practise the experimental procedure. The overall time-on-task varied between participants and experiment stages, but remained within an average range of 1.5–2 h in total. Participants were presented with each feedback type in counterbalanced order (a method for controlling order effects in repeated-measures designs). For ecological validity, participants were required to wear the device gloves throughout all experimental stages. The task consisted of listening to a specific pitch, and then seeking and selecting that target pitch with the device as quickly and as accurately as possible. The listening time required for remembering the target pitch varied between participants, from 5 s to a maximum of 10 s. The start position for all stages was with hands resting in a neutral position at the waist. In each trial, participants used the footswitch to start and finish recording movement data. For each run of the experiment, eleven frequencies were selected in counterbalanced order across a range of 110–987.77 Hz. All frequencies in the experiment had a relative pitch value. Participants performed three runs, with a brief rest between each. The processing patch was used to capture input movement data and the time taken to perform the task; these data were then output as a .csv file for analysis.
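The chapter does not state which counterbalancing scheme was used. As an illustration only, one common choice for four conditions is a balanced Latin square, in which every condition appears once in each serial position and every ordered pair of adjacent conditions occurs exactly once. The sketch below builds such a square and cycles twelve participants through its four rows; the stage labels and function names are hypothetical.

```python
FEEDBACK_STAGES = ["haptic", "force", "tactile", "none"]


def balanced_latin_square(n: int) -> list:
    """Return n presentation orders (rows of condition indices).

    Uses the standard construction 0, 1, n-1, 2, n-2, ... for the first
    row and shifts it by one for each later row. First-order carry-over
    balance holds for even n only.
    """
    if n < 2 or n % 2:
        raise ValueError("n must be an even number of conditions")
    first, low, high = [0], 1, n - 1
    for i in range(1, n):
        if i % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    return [[(c + shift) % n for c in first] for shift in range(n)]


def presentation_orders(participants: int, stages=FEEDBACK_STAGES) -> list:
    """Assign each participant one row of the square, cycling through rows."""
    square = balanced_latin_square(len(stages))
    return [[stages[i] for i in square[p % len(square)]]
            for p in range(participants)]


orders = presentation_orders(12)
# orders[0] is ["haptic", "force", "none", "tactile"]
```

With twelve participants and four rows, each presentation order is used by exactly three participants.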

After each feedback stage of the experiment, participants were asked to complete a post-task evaluation questionnaire and informal interview. All interviews followed the same guiding question:

  • What were the central elements of device feedback that resulted in task success or failure?

This directorial question was then operationalised by the following:

  • What positive attributes did the feedback display?

  • What negative attributes did the feedback display?

  • What features made the task a success or failure?

  • Describe this success or failure in a musical context.

Throughout the interview, interview laddering was applied to explore the subconscious motives that led to the specific criteria being raised. A Critical Incident Technique (CIT) analysis was then applied to extrapolate upon the interview data collected. This set of procedures was used to systematically identify any behaviours that contributed to the success (positive) or failure (negative) in the specific context.

3 Results

Functionality data were collected during the experiment so as to represent objective and quantitative measures that impartially represent the effects of feedback in audio-based exercises. Following this, the validated questionnaires and qualitative interview techniques were undertaken to gather subjective opinions from participants. Participants were not made aware of these performance data when being interviewed.

3.1 Functionality Results

The results from the functionality evaluation can be seen in Fig. 6.3 and Table 6.1. An analysis of variance yielded no significant variations in move time for the different feedback types, with p > 0.05 for all frequencies. For the individual feedback stages, participants could target and select pitches within the predetermined target size of 3 Hz for all frequencies below and including 261.6 Hz. As expected, the accuracy of pitch selection decreased as frequency increased. Above 261.6 Hz and up to and including 523.25 Hz, the deviation from target pitch increased, but remained within the expected range. Beyond this, from 523.25 Hz up to and including 975.83 Hz, the average deviation increased further. Notably, the no feedback stage of the experiment exceeded the expected deviation constant of 6 Hz for this range by 3 Hz. As with the move time measurements, although there were practical variations in the accuracy of target selection across all feedback stages, there was found to be no significant effect of feedback on the accuracy of frequency selection, with p > 0.05 for all feedback types.

Fig. 6.3 Mean move time over frequency for all feedback stages

Table 6.1 Average deviation from target for all feedback stages

3.2 Usability Results

For the SEQ, the participants were given the opportunity to consider their own performance and factor this into their response. Users had to fit their rating of performance based upon the range of answers available (7 in total) and respond to their interpretation of the difficulty of the task accordingly. The post-task SEQ answers can be seen in Fig. 6.4 and Table 6.2.

Fig. 6.4 Diverging stacked bar chart for the SEQ

Table 6.2 SEQ evaluation for all feedback stages

For the haptic feedback stage, a larger portion of users (42%) found that the task was somewhat difficult for them to complete. The perceived difficulty then increased with each subsequent feedback stage, reaching a rating of very difficult (58%) for the no feedback stage. When verbally questioned, participants expressed that while they were fully engaged in the task, the perceived difficulty of performance using the devices was as it would be if they were performing for the first time with any new instrument. This increase in cognitive load moved them to consider their performance more critically. Participants were unaware of their actual move time and accuracy scores at this point.

A Friedman Test revealed a statistically significant effect of feedback upon SEQ answers across the four different feedback stages: χ²(3, n = 12) = 31.75, p < 0.001. Following this, post hoc Wilcoxon Signed-Ranks tests were conducted to explore the impact of device feedback on SEQ answers. There was found to be a statistically significant effect of feedback on device scores, with effect sizes ranging from 0.34 to 0.45. Post hoc comparisons indicated that the score for the no feedback stage of the experiment was significantly different from the haptic and force stages after Bonferroni adjustment. There were found to be no significant differences between the haptic and force feedback stages, or between the tactile and no feedback stages. This indicated that the participants’ perception of task difficulty was significantly different from no feedback when force feedback was presented in the interaction. Furthermore, tactile feedback played no role in this perception rating.
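For readers unfamiliar with the Friedman statistic, the following sketch shows how it is computed from a participants-by-conditions table of ratings, including the usual correction for tied ratings within a participant. It is a plain-Python illustration with hypothetical function names, not the software used for the analysis above; the ratings in the example are invented to show the mechanics and are not the study data. The statistic is referred to a chi-square distribution with k − 1 degrees of freedom, which is 3 for four feedback stages.

```python
def average_ranks(values: list) -> list:
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(table: list) -> float:
    """Tie-corrected Friedman statistic; table[p][c] is participant p's
    rating under condition c."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in table:
        for c, rank in enumerate(average_ranks(row)):
            rank_sums[c] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every participant gave identical ratings")
    stat = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1))
    stat -= 3.0 * n * (k + 1)
    return stat / correction


# Invented ratings: three participants who all order three conditions alike.
perfect_agreement = friedman_chi_square([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

When every participant ranks the conditions identically, the statistic reaches its maximum of n(k − 1).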

In comparison to the SEQ, the SMEQ presented a near-continuous response choice for the participants to choose from (Fig. 6.5). Theoretically, this allowed the participants to be more precise regarding their estimation of the device’s usability. The premise of this scale was to elicit an indication of the user’s thoughts towards the amount of mental effort they exerted during the task. The mean value of the SMEQ answers for each feedback type can be seen in Table 6.3. The results support the usability analysis of the SEQ; however, this scale measured the amount of effort the participants felt they invested rather than the amount of effort demanded from them.

Fig. 6.5 Boxplots representing mean SMEQ answers for each unique feedback element

Table 6.3 SMEQ evaluation for all feedback stages

A repeated-measures ANOVA was conducted to compare scores on the SMEQ scale. There was found to be a significant effect for feedback: F(3, 9) = 11, p = 0.002, with partial η² = 0.79. The post hoc comparisons indicated that the score for the no feedback stage of the experiment was significantly different from the haptic, force and tactile stages. There was found to be no significant difference between haptic and force feedback stages.
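The reported effect size can be checked against the F statistic and its degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the values quoted above; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and dfs positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# SMEQ result quoted above: F(3, 9) = 11
smeq_effect = partial_eta_squared(11.0, 3, 9)   # 33 / 42, about 0.79
```

The same identity applied to the NASA-TLX effort result reported later in this section, F(3, 9) = 4.22, gives approximately 0.58, which agrees with the value stated there.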

Following the evaluation of perceived effort, the participant’s subjective workload was recorded with a paper and pencil NASA-TLX assessment questionnaire. In this, the total workload is divided into six TLX subscales, the results of which can be seen in Fig. 6.6. The first indicator in the NASA-TLX subscale required the user to signify how demanding they found the task in terms of its complexity. The observed results denote that a somewhat small amount of mental and perceptual activity was required, indicating that the task was simple to complete for all feedback stages. Next, the mean physical demand of the task was measured, showing that the participants found the task relatively easy to complete, and that a reasonable amount of physical activity was demanded from them in completion of the task. In terms of temporal demand—the time pressure felt in performing the task—the mean user rating of the experiment shows that the pace of the task was realistic and that participants were not rushed, had plenty of time to complete the task without pressure, and that the task elements were presented within a realistic time frame. In the self-evaluation of performance in the TLX questionnaire, participants indicated that they were relatively unsatisfied with their own performance.

Fig. 6.6 NASA-TLX subscale ratings of usability for each unique feedback element

The users’ satisfaction with the success of their performance corroborates the earlier findings of negative self-satisfaction in performance of the task. It also highlights some difficulties in the completion of the task and that a raised mental awareness was required during its execution. Notably, all feedback stages were rated equally negatively, with no significant effect of feedback. Therefore, although a negative evaluation of performance was recorded, there was no distinction between the performance of the different feedback stages as was present in the SEQ and SMEQ. In contrast to the self-evaluation of performance, participants indicated that they worked only somewhat hard mentally and physically to accomplish their level of performance. This indicated that the participants did not feel that they had worked particularly hard to reach their overall level of performance, even though an unsatisfactory evaluation of performance was measured.

Next, participants recorded that they were not irritated or stressed by the task. The TLX measured relatively low frustration levels, weighting towards a relaxed attitude during the experiment. These results indicated that although participants were relatively unsatisfied with their performance, they were not stressed or unhappy. Finally, a mean overall “raw TLX” measure of workload was calculated to represent the overall TLX rating of each feedback type. Due to time restrictions, a pairwise comparison of each dimension was not deemed necessary and thus not undertaken.
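The "raw TLX" score mentioned above is the unweighted mean of the six subscale ratings, which is what remains when the pairwise weighting step is omitted. A minimal sketch follows; the subscale keys, the function name and the ratings are hypothetical, and a 0 to 100 rating scale is assumed.

```python
from statistics import mean

TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: dict) -> float:
    """Unweighted mean of the six NASA-TLX subscale ratings."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise KeyError(f"missing subscales: {missing}")
    return mean(ratings[name] for name in TLX_SUBSCALES)


# Invented ratings for one participant and one feedback stage.
example_ratings = {
    "mental_demand": 30,
    "physical_demand": 40,
    "temporal_demand": 20,
    "performance": 60,
    "effort": 45,
    "frustration": 15,
}
example_score = raw_tlx(example_ratings)   # (30+40+20+60+45+15) / 6 = 35
```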

A repeated-measures ANOVA was conducted to compare scores across the different feedback stages. Although there were some noticeable variations in the mean scores for each category and feedback type, no significant effect of feedback was recorded at the p < 0.05 level for any category except effort (F(3, 9) = 4.22, p = 0.04, partial η² = 0.58). Post hoc testing for effort revealed that there was a significant difference in mean scores for perceived effort between the no feedback and tactile feedback stages of the experiment (mean difference = 8.42, p = 0.046). This indicated that participants regarded the different feedback types as equally usable across all TLX categories except for effort, where there was minimal difference in scores between the tactile and no feedback stages.

3.3 User Experience Results

The final stage of the functionality analysis incorporated a post-task assessment of the users’ experiences during the experiment. A pre-existing questionnaire was used to measure user experience quickly, simply and as immediately as possible. Six critical aspects of experience were captured via the UEQ: attractiveness, perspicuity, efficiency, dependability, stimulation and novelty (Fig. 6.7). The overall internal consistency of the user experience scales was acceptable, with α = 0.88. However, poor internal consistencies for some of the individual feedback stages were observed, highlighting some disparity between participant answers. The scale ranged from −3 (very bad) to +3 (very good). However, maximum ratings have previously been reported as unlikely in user studies [15]; therefore, a more restrictive range was applied to compensate for different answer tendencies of the participants. For user experience measures on this scale, mean values between −0.8 and 0.8 are representative of a neutral evaluation of the corresponding dimension. Values greater than 0.8 represent a positive evaluation, and values below −0.8 represent a negative evaluation.
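The interpretation thresholds described above can be expressed as a small helper. The function name is hypothetical, and the example means passed to it are invented rather than taken from the study.

```python
def interpret_ueq_mean(score: float) -> str:
    """Classify a UEQ scale mean on the -3 to +3 range.

    Means between -0.8 and 0.8 (inclusive) are neutral, above 0.8 are
    positive and below -0.8 are negative.
    """
    if not -3.0 <= score <= 3.0:
        raise ValueError("UEQ means lie between -3 and +3")
    if score > 0.8:
        return "positive"
    if score < -0.8:
        return "negative"
    return "neutral"


# Invented scale means, one per label.
labels = [interpret_ueq_mean(m) for m in (1.4, 0.3, -1.1)]
```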

Fig. 6.7 Boxplots representing UEQ results for each unique feedback stage

A repeated-measures ANOVA was conducted to compare UEQ scores revealing that there were statistically significant variations in user experience answers for the efficiency, dependability and novelty category ratings at the p < 0.05 level. However, pairwise comparisons of novelty with adjustments for multiple comparisons (Bonferroni) revealed no significant differences between the feedback stages. The categories of efficiency and dependability specifically relate to the user’s experience of the ergonomic quality aspects that were applied in the design of the Bowl devices (Fig. 6.8). Participants evaluated their experience of device efficiency in the chosen task as being quick and organised for haptic feedback reducing towards a more neutral rating as feedback was reduced in the order of force, tactile and no feedback, respectively. Similarly, the participants’ experience of dependability of the feedback stages showed the same downwards trend, with experience ratings of predictable and secure behaviour for haptic and force feedback being high and a much more neutral rating for tactile and no feedback.

Fig. 6.8 Boxplots representing UEQ efficiency and dependability for each unique feedback stage

From these findings, participants rated the different feedback stages relatively equally for the categories of attractiveness, perspicuity, stimulation and novelty. Post hoc comparisons with Bonferroni adjustment indicated that the mean score for efficiency for force feedback was significantly different from the no feedback stage. In addition, the same test revealed that there were statistically significant effects between dependability ratings for haptic and force feedback and tactile and no feedback. This significance highlighted a perceived efficiency rating difference between the feedback stages of force, tactile and no feedback. These perceived differences are interesting due to the lack of difference observed in performance.

3.4 Interview Data

Participants were asked whether they would like to use each feedback stage to perform with outside of the experiment. Participants’ answers varied across the different feedback stages (Table 6.4). Most participants were pleased with their evaluation of feedback performance for each device and thought that they would use the device outside of the experiment. However, some users also indicated that they did not have an opinion about usage preference, as they would not normally use a computer interface to make music. When questioned further, users indicated that they were not particularly inspired by the experiment methodology, but suggested that if they could expand or explore the devices’ parameters further they might have rated it more favourably. The estimated usage ratings for the different device feedback stages noticeably reduced from the haptic stage through to the no feedback stage (Fig. 6.9). Participants who were not accustomed to performing with computer interfaces expressed that they felt increasingly negative towards devices as feedback was reduced.

Table 6.4 Participant preference of use

Fig. 6.9 Diverging stacked bar chart for preference of use evaluation

A Friedman test revealed a statistically significant difference in device use answers across the four different feedback stages, χ²(3, n = 12) = 25.05, p < 0.001. Following this, post hoc Wilcoxon signed-rank tests were conducted to explore the impact of device feedback on estimated use answers. There was a statistically significant difference at the p < 0.0125 level in device scores between the haptic stage and all other feedback stages, with medium-to-large effect sizes ranging from 0.24 to 0.44. There were also significant differences in results between the no feedback stage and the force and tactile feedback stages. This suggests that haptic feedback can act as a preferential feature when choosing between multiple DMIs in composition or music performance.
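For readers unfamiliar with the test, the sketch below shows how a Friedman chi-square statistic can be computed from within-participant ranks, using the standard tie correction. The function names and the example answers are hypothetical and do not reproduce the study's data. The helper `bonferroni_alpha` is likewise an assumption: it simply illustrates that a 0.05 level divided across four comparisons gives the 0.0125 threshold quoted above; the chapter does not state how many comparisons were counted.

```python
def average_ranks(row):
    """Rank one participant's answers, giving tied values their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def friedman_chi2(scores):
    """Friedman chi-square statistic with tie correction.

    scores: one row per participant, one column per feedback stage.
    Returns (chi2, degrees_of_freedom). Assumes at least one row
    contains values that are not all tied.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    chi2 = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    return chi2 / correction, k - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni adjustment."""
    return alpha / n_comparisons


# Hypothetical ordinal answers: 3 participants x 4 feedback stages (not study data).
answers = [[4, 3, 2, 1], [4, 3, 2, 1], [4, 3, 2, 1]]
chi2, df = friedman_chi2(answers)
print(f"chi2({df}) = {chi2:.2f}, adjusted alpha = {bonferroni_alpha(0.05, 4)}")
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom, which matches the three degrees of freedom reported for the four feedback stages.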

Participants were asked open-ended questions to gauge their opinions about the different feedback stages. These questions were then expanded upon in an interview, with care taken not to bias the participants’ responses. A CIT analysis was conducted based upon the participants’ answers to record the users’ attitudes to the different feedback types. Content analysis techniques were then applied to categorise the responses into areas of concern; these included: personal preference, playability, comparison to other musical instruments, learnability, comparison to other DMIs, explorability and tempo.

From the interview transcripts, coherent thoughts and single statements were identified and extracted. After redundancy checking, a total of 322 single statements were counted (M = 80.5, SD = 15.77, per feedback stage). Following this, three researchers were independently employed to iteratively classify this pool of statements as either “positive” or “negative” performance evaluations. Although this process was initially reductive, a second analysis of the data was used to develop a bottom-up categorical system of classifications to known areas of concern in musical interactions: learnability, explorability, feature controllability and timing controllability [16].
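The classification step can be pictured with the following minimal sketch. It assumes, purely for illustration, that the three independent labels for each statement are reconciled by majority vote; the chapter describes an iterative process and does not state the reconciliation rule. The function names and example statements are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by most raters (assumes an odd number of raters)."""
    return Counter(labels).most_common(1)[0][0]


def tally_by_category(statements):
    """Count majority-vote positive and negative statements per category."""
    tally = {}
    for category, rater_labels in statements:
        label = majority_label(rater_labels)
        tally.setdefault(category, Counter())[label] += 1
    return tally


# Hypothetical statements: (category, labels from three raters). Not study data.
statements = [
    ("playability", ["positive", "positive", "negative"]),
    ("playability", ["negative", "negative", "negative"]),
    ("tempo", ["negative", "positive", "negative"]),
]
for category, counts in tally_by_category(statements).items():
    print(category, dict(counts))
```

A tally of this shape corresponds to the per-category positive and negative counts presented in the content analysis tables.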

Participants were inclined to be positive about the haptic feedback stage of the experiment and were pleased with the amount of feedback that was delivered (see Table 6.5). It was noted that participants were more vocal about their experiences at this stage than for the tactile and no feedback stages. The CIT highlighted personal preference as the most reported aspect of user experience at this stage. These comments highlighted the overall enjoyment of participants when interacting with the device. However, while many comments were positive, participants highlighted some negative ergonomic aspects of the interaction as well. Comments about playability mainly focussed on interaction difficulties during the task. However, many remarks made in the playability category were positive. These demonstrated an appreciation for the increased performance information provided by haptic feedback. Participants expressed a partiality for a familiar feel to the interface, which they felt increased their attention to their actions. This showed that if care was taken to provide haptic feedback in DMI designs, the end-user may gain an increased sense of awareness of their interaction, without involving overly complicated mechanisms or device processing power. The comparison to other musical instruments category produced several interesting responses in comparison to the other feedback stages. Specifically, comments that compared the device directly with acoustic instruments provided an interesting insight into the combination of force and tactile feedback. Learnability was seen more positively here than for the force and tactile feedback alone. These findings have been observed in other research areas, most notably in [20]. The category containing the most negative remarks was tempo. The negative comments expressed here indicated that a tempo-based task would be very problematic to perform, and even the positive comments indicated that it would be challenging to accomplish.

Table 6.5 Content analysis for haptic feedback

Table 6.6 shows the results of the content analysis of the force feedback stage of the experiment. This stage of the experiment received the same number of positive comments as the haptic stage; however, it also received more negative comments. As with the haptic feedback stage, force feedback received noticeably more comments than the tactile and no feedback stages of the experiment. Again, the category that contained the most comments was the personal preference category; however, the categories following this varied from the haptic feedback stage.

Table 6.6 Content analysis for force feedback

The personal preference category of the force feedback stage contained comments discussing the novelty of the design and how the users found it interesting to use. There were also several positive comments focussing on simplicity and accessibility of the interface. However, some comments fixated negatively on the way pitch selection was achieved and the quality of sound reproduction from the small embedded speaker. Participants were more inclined to refer to other instruments in the comparison to other musical instruments category compared to the haptic feedback stage; however, some comments were critical of the lack of input gestures available to use. This further highlighted the restrictive nature of functionality-focused experimentation. Comments in the playability category discussed the implication of physical requirements for playing the device, either praising its accessibility or commenting on the interface requirements for interaction. The group containing the most negative remarks was again the tempo category. Comments made here referred to issues of envelope attack time, jumps in pitch and concerns about accuracy.

Table 6.7 shows the results of the content analysis of the tactile feedback stage. Participants were more conservative with comments, suggesting that there were not as many aspects of this feedback stage that were worthy of note. However, this may be attributable to the conservative nature of the participant pool. The categories that contained the most responses were personal preference, comparison to other musical instruments and playability.

Table 6.7 Content analysis for tactile feedback

The personal preference category contained the largest number of participant comments. This category also contained the most positive comments. These comments mainly reflected how the participants felt about the interaction and their curiosity about tactile feedback. However, some participants viewed the interaction as unpredictable and inaccurate. Comments in the comparison to other musical instruments category discussed how the interactions compared with those of the participants’ own instruments and compared accuracy between the two types of instrument. The playability category contained the highest number of negative comments. The participants were particularly focused on their own perception of lack of accuracy and precision in their movements.

Finally, the results from the no feedback stage of the experiment can be seen in Table 6.8. This feedback stage yielded a high number of comments about personal preference, comparison to other DMIs and playability issues. The negative personal preference comments highlighted the participants’ frustrations at the lack of feedback provided. Positive comments were directed to the novelty and fun factor of the interaction. Participants were more inclined to compare the no feedback stage of the experiment with other DMIs, as seen in the comparison to other DMIs category. Many of the comparisons were negative, focussing again on the perceived inaccuracy of their movements. Positive comments highlighted the differences to other DMI interaction types. As with the tactile feedback stage of the experiment, the playability category contained the most negative comments. These comments mainly focused on the perceived accuracy of the interaction, with a few comments about creative application.

Table 6.8 Content analysis for no feedback

3.5 Empathy Mapping

Empathy mapping results are represented in Figs. 6.10, 6.11, 6.12 and 6.13 showing little deviation from observed actions during the functional task and verbal explanations of answers in the interview; this serves to further validate the analysis techniques applied.

Fig. 6.10 Empathy mapping for haptic feedback

Fig. 6.11 Empathy mapping for force feedback

Fig. 6.12 Empathy mapping for tactile feedback

Fig. 6.13 Empathy mapping for no feedback

4 Discussion

In the functional analysis, participants could select the specific pitches with observable increases in mean move time across the four stages of feedback. However, the statistical analysis of mean move time variance between each feedback stage showed no significant effect for feedback. This indicated that, although there was evidence of some practical differences between feedback types, haptic feedback and its derivatives had no consistent effect upon move times in pitch selection tasks. This finding supports the argument that haptic feedback has no significant effect upon a device’s performance in functional device evaluation exercises. Furthermore, the accuracy of pitch selection across the different feedback stages also varied with frequency: mean deviation from the target frequency varied over three distinct bandwidths. For waveforms below 500 Hz, the predetermined physiological constant was maintained, with frequencies above this threshold increasing in deviation by approximately 0.6%. The mean accuracy figures for each feedback stage showed no significant differences; however, there was again evidence of practical differences. These findings further support an argument that haptic feedback may have no significant quantitative effect upon a device’s performance in auditory pitch selection exercises.
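To make the accuracy measure concrete, the sketch below expresses the deviation of a selected pitch from its target as a percentage of the target frequency. This is an assumed formulation for illustration; the chapter does not give the exact formula used. The second helper, which converts the same ratio to cents, is an additional convenience not taken from the source. All function names and values are hypothetical.

```python
import math


def percent_deviation(selected_hz, target_hz):
    """Absolute deviation from the target as a percentage of the target frequency."""
    return abs(selected_hz - target_hz) / target_hz * 100.0


def cents_deviation(selected_hz, target_hz):
    """Signed deviation in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(selected_hz / target_hz)


# Hypothetical selection: a 500 Hz target overshot by 5 Hz.
print(f"{percent_deviation(505.0, 500.0):.2f} %")
print(f"{cents_deviation(505.0, 500.0):.1f} cents")
```

Expressing deviation relative to the target keeps errors comparable across the frequency range, which is useful when accuracy is reported per bandwidth.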

For the SEQ, it was found that when participants were given the opportunity to evaluate their own performance, they rated themselves differently for each feedback type. Participants evaluated the difficulty of the task with tactile and no feedback as being more challenging than with haptic and force feedback. There was no significant difference between the haptic and force feedback stages or the tactile and no feedback stages, indicating that tactile feedback had no effect upon the participant’s perception of ease-of-use. However, from these observations, force feedback can be seen as having some positive effect. Although the quantitative measures of performance indicated that there was no significant difference in move time and accuracy, participants were inclined to be more self-critical of their performance than necessary when feedback was altered or removed. Many participants indicated that, although they found the task difficult across all stages, their level of engagement varied, as it would if they were performing for the first time with any new acoustic instrument.

The SMEQ further supported these findings, with ratings showing that some amount of effort to a fair amount of effort was required to perform the exercises. However, the SMEQ presented a different focus than that of the SEQ, as it measured the perceived amount of mental effort applied during the task. The results showed that the amount of mental effort required increased as feedback was removed, although the actual quantified performance of the different feedback stages did not significantly differ. These differences were significant between the haptic and force feedback stages and the no feedback stage. Tactile feedback did not differ significantly from any other stage. Furthermore, the perception of increased mental effort was also indicated as being a significant factor during the user experience analysis. From analysing the functional data and comparing them to the participants’ perception of mental effort and ease-of-use, it was observed that force feedback was the most influential feedback type, with no significant effect observed for tactile feedback. However, with the addition of tactile feedback to force feedback, there were also no detrimental effects on the user’s performance ratings.

The overall raw usability testing revealed no significant effect of feedback across all feedback stages; however, the data collected did reveal some interesting results. For example, the self-measure of performance on the NASA-TLX scale was found to be reasonably poor for all feedback types. This indicated that participants were equally negative about how successful and satisfied they were with their performance across all feedback types. The results also indicated that haptic feedback and its constituent parts each played some part in the reduction of participants’ perception of mental demand. The combination of TLX, SEQ and SMEQ usability ratings indicates a general level of dissatisfaction with performance for each feedback type.

The UEQ data from the study highlighted a significant difference between the users’ experience of efficiency and dependability across all feedback stages. For efficiency ratings, significant differences were observed between haptic and force feedback and tactile and no feedback ratings. This denoted that the evaluation of the participants’ experience of work performed to total effort expended was not affected by tactile feedback, but by force feedback alone. Similarly, the participants’ appraisal of dependability displayed the same evaluation characteristics. The participants’ experience and assessment of device reliability showed that they felt that the tactile and no feedback stages were less reliable than the haptic and force stages, regardless of there being no measurable effect of feedback in accuracy and move time.

Subsequently, critical incidents for each feedback stage were assessed. Overall, the CIT analysis revealed some interesting trends. The most obvious of these was the decrease in positive comments and the increase in negative comments made as feedback was removed from the interaction. Additionally, participants were particularly more vocal about their personal preferences when interacting with each feedback stage. This trend highlighted the importance of performer individuality and prior experiences when designing, building and using a DMI device with feedback. This would imply the need for a more explorative investigation methodology in the evaluation of experience. This aspect could be further expanded upon in user case studies and involve the further consideration of creative applications in its analysis.

With the specific matching and categorisation of the devices and the quantitative and qualitative data recorded during functionality testing, the results of the experiment showed that the effect of haptic feedback and its derivatives could be measured in the operation of a DMI, with accurate data measures. These findings denoted interesting results for the different types of feedback displayed to the user, and although there was no direct effect upon the quantitative performance of the DMI, feedback may still be revealed to have some positive influence upon the user’s perceptual experience when applying them in note-level-control metaphors, musical exercises, and explorative or creative contexts.

The discipline of HCI has a wide range of evaluation frameworks for the appraisal of digital technology as applied to simple, multiparametric tasks. This includes evaluation techniques that are designed to discover issues that arise in unique applications of technology, such as the effects of haptics in DMI design. For the appraisal of complex devices, HCI evaluation techniques can be incorporated in the evaluation of usability and user experience. In addition to this, the subject of human computing (or human-centred computing) can also be used to evaluate the user’s intentions and motivations in the application of technology in creative contexts. As has been presented here, an appraisal of function, as a task-focused approach, presents metrics that are easy to measure and quantify. However, in the creation of music, the application of technology relies upon the user’s previous training and experiences to accurately express the musicians’ inner thoughts and intentions.

It is therefore proposed that, although DMIs require functional testing to highlight potential usability issues, a comprehensive analysis should also include the evaluation of real-world situations to accurately capture and evaluate all aspects of an interaction. Thus, to expand our investigation of haptics into the real world, a music-focused analysis should also be undertaken. This idea emphasises the “third paradigm” concept, which includes the gathering of information relating to culture, emotion and previous experience. Our results show task-focussed evaluations are indeed a necessary precursor to experience-focussed assessment. However, task-focussed evaluations, when carried out in isolation, do not present sufficient information about the user or device in real-world applications of such technology.

Interaction information pertaining to acoustic musical instrument design already exists; therefore, data can be measured and used in DMI interaction design to provide a sense of realism and embodiment to virtual or augmented instruments or expanded upon to fit new design types [21]. Many digital musicians are recognised for their creativity, innovation and adaptation in the design and construction of DMIs; however, these digital instruments are often still devoid of haptic feedback. It is possible to reconstruct the operating principles of acoustic instruments and apply them to DMIs, as is seen in augmented instruments and DMIs that replicate the playing style of an acoustic instrument. For a performer, however, the emptiness of assignable “button bashing” may be seen as a negative characteristic. DMIs offer freedoms to musicians that are near endless, but digital music performers often also play conventional instruments, highlighting the need to experience the creation of music with all senses engaged.

If multimodal collocations are possible within DMI design, it should also be possible to simulate the haptic experience of an acoustic performance. Sound can be created electronically with the freedoms afforded through digital sound generation and with the combined information of the interaction response being fed back with comparable meaning to that of an acoustic instrument. Sound can be digitally created and manipulated by the artist, and a deeper sense of craft can potentially be realised. Computer musicians need to be able to experience consistency, adaptability, musicality and touch-related sensations to experience the physiological and psychological occurrences outlined within each of the research conclusions presented here.

5 Conclusions

In this chapter, it has been seen that the addition of haptics to DMI feedback archetypes enhances the user experience, but does not appear to impact on the effectiveness (move time) or accuracy of the functional elements of DMIs. Additionally, from the analysis of feedback in auditory interactions, it has been demonstrated how an HCI-informed framework can be applied in the evaluation of DMI design. Specifically, it was observed how a device’s analysis can be informed by HCI techniques that are applied in the evaluation of general computing and computing for unique or creative applications. Regarding the experimental results presented here, the functional capacity of haptic, force, tactile and no feedback afforded to users in tasks that require the selection of specific frequencies was quantified and evaluated. The accumulation of differences observed within this analysis revealed influential factors of information feedback on the user’s experiences in functional application contexts.

From the data gathered, DMI feedback appeared to be influential on several context-dependent levels. In the study, no significant effect of feedback was found upon the quantifiable performance capacities of the tested feedback stages. However, when the participants were questioned further, important inequalities were discovered in the perception of usability and experience when completing the task. Within these areas, the musician’s perception of performance was found to be more favourable with the presence of both tactile and force feedback. Therefore, it can be concluded from this experiment that haptic feedback has some positive effect upon many perceptual experiences in the application of DMI technology and should be further investigated in the field.

It is expected that the study of interactions between performers and digital instruments in a variety of contexts will continue to be of research interest. Research on digital musical instruments and interfaces for musical expression will continue to explore the role of haptics, incorporating user experience and the frameworks that are constructed to quantify the relationship between musical performers and new musical instruments. The complexities of these relationships are further complicated by the skills of musicians and are far greater and more meaningful than a physically stimulating interaction.

It has been shown in this work that digital musical instrument design and evaluation methodologies can be applied in the study of interactions between musicians and instrument. However, it is suggested that emergent DMI systems require further measures for an accurate appraisal of the user’s experience when applying the device in a musical context [22]. In a traditional HCI analysis, a device is evaluated in a specific context and the evaluation methods are expert-based heuristic evaluations or user-based experimental evaluations. Only by determining context is it possible to interpret correctly the data gathered. Therefore, it is suggested that DMI-specific functionality, usability and user experience evaluation methods should be developed.

The work presented has only begun to explore the possibilities of haptic feedback in future DMI designs. The experiment endeavoured to present evidence of some influence that haptic feedback has on a user’s perception of functionality, usability and user experience. Beyond this, future research goals should include long-term studies and the development of tools to assist in the creation of DMI designs, allowing designers to experiment with different gestural interface models. Within this space, composers, performers and DMI designers will be able to explore the affordances of technologies in the creation of new instruments for musical expression.