Universal Access in the Information Society, Volume 16, Issue 2, pp 365–379

Physiological mouse: toward an emotion-aware mouse

  • Yujun Fu
  • Hong Va Leong
  • Grace Ngai
  • Michael Xuelin Huang
  • Stephen C. F. Chan
Long paper


Human-centered computing is rapidly becoming a major research direction as new developments in sensor technology make it increasingly feasible to obtain signals from human beings. At the same time, the pervasiveness of computing devices is also encouraging more research in human–computer interaction, especially in the direction of personalized and adaptive user interfaces. Among the various research issues, affective computing, or the ability of computers to understand and react according to what a user “feels,” has been gaining in importance. In order to recognize the human affect (feeling), computers rely on the analysis of signal inputs captured by a multitude of means. This paper proposes the use of human physiological signals as a new form of modality in determining human affects, in a non-intrusive manner. The principle of non-invasiveness is very important, since it imposes no extra burden on the user, which improves user accessibility and encourages user adoption. This goal is realized via the physiological mouse, as a first step toward the support of affective computing. The conventional mouse is augmented with a small optical component for capturing the user's photoplethysmographic (PPG) signal. With the PPG signal, it is possible to compute and derive human physiological signals. A prototype of the physiological mouse was built and raw PPG readings were measured. The accuracy of the approach was evaluated through empirical studies to determine human physiological signals from the mouse PPG data. Finally, pilot experiments to correlate human physiological signals with various modes of human–computer interaction, namely gaming and video watching, were conducted. The trend in physiological signals could be used as feedback to the computer system, which in turn adapts to the needs or the mood of the user, for instance by changing the volume and the light intensity when watching a video or playing a game based on the current user emotion.
The authors argue that this research will provide a new dimension for multimodal affective computing research, and the pilot study has already shed some light toward this research goal.


Keywords: Affective computing · Physiological signals · Non-intrusive measurement · Gadget prototype · Human emotion

1 Introduction

Human–computer interaction has been an active research area and has resulted in much progress in improving the usability of computing devices. One important breakthrough was the advent of multi-touch screen inputs, which provide an intuitive means of sending input signals to the computer. However, even with improved input capabilities, computers still do not possess much ability to adapt to users. With continuous technological advancements in the past decade, especially the proliferation of ubiquitous devices, a paradigm shift has developed whereby users expect the computer to accommodate the human, rather than the human adjusting to the computer. This has given rise to the discipline of human-centered computing, in which the governing roles of computers and humans are basically reversed. This has been made possible in part through the enhancement of computational power in computing devices, which enables the adoption of various statistical or machine learning models to process and analyze captured human-generated signals.

A first step toward human-centered computing is the ability to more naturally understand what a user wants or implies. The seminal work pioneered by the “Put-That-There” system [5] had largely defined a major research direction in multimodal interaction for the past 30 years. Multimodal interfaces process two or more combined user input modes in a coordinated manner with multimedia system output, aiming to recognize naturally occurring forms of human language and behavior [21], in order to “understand” the human intention. Research work before the new millennium had mainly been focused on the use of alternative input signals, for instance, speech and gesture, and the corresponding recognition and processing algorithms [22].

Affective computing in human-centered computing takes into account the human feeling [23], or affect, which is usually deduced via subtle input channels, such as facial expression, body gesture, and physiological conditions. Most research works are geared toward the understanding and utilization of human mental states [8, 31]. The acquired signals are then processed, analyzed, coordinated, synchronized, fused, and integrated, to construct a model of the human affect, which can be used as an input modality for the computer. For instance, an e-learning system which detects that the learner is bored could switch to a different topic, or provide more audio–visual feedback, or change to a more interactive mode that focuses on problem solving rather than material presentation. In game playing, users can be more immersed in the gaming environment if the computer can ride on the emotion of the player. Positive feedback could be provided to an excited player, as reflected by an increase in heart beat rate.

In previous work, the authors have demonstrated that the emotions of a number of users viewing the same video can be conveyed to one another via the notion of the “emotar,” an avatar that presents the recognized emotion of a viewer [17]. They have shown that the provision of this emotar improves the user experience of the viewer, turning a solo activity into a social one. Similarly, two persons could work together to produce a piece of “collaborative” music by streaming across useful musical signals to bring out human affects [19]. This new area of affective computing has taken on much momentum in recent years, resulting in recognition by the research community and the new IEEE Transactions on Affective Computing [8].

There are two major research foci in affective computing. The first is the acquisition and processing of human-centered signals for recognition and integration. The second is to fuse multiple signals [26] acquired as a result of multimodality processing [18] to determine the target human affect. The application of the recognized affect in effecting diversified and context-specific response would be more domain-specific, forming the next higher level of work.

In this paper, the everyday mouse is enhanced to collect additional input signals and extract useful information to support affective computing. Since the mouse is available on virtually all computers and users generally rely heavily on it for interaction with the computer, it is proposed to capture the human physiological signals via the mouse in a non-intrusive manner. The goal is to be able to capture informative signals that will be useful for emotion detection without the user being consciously aware of a physiological data capturing device. There are three advantages to this approach. First, it uses low-cost, commonly found equipment, which makes the solution affordable and accessible to common users. Second, though expensive devices, such as Mindset [20] or EPOC [11], would provide a higher accuracy in measuring the brain signals, they are intrusive, imposing a burden on the user and potentially affecting human behavior in return. Finally, the mouse movement information can also be captured simultaneously, providing yet another stream of input signals that could be utilized for future human emotion detection. In their present work, the authors take advantage of non-intrusive photoplethysmographic (PPG) signal measurement, a technology that measures the blood volume pulse from light transmitted through or reflected off human skin [1], reckoning that the cardiac cycle pumping blood into blood vessels would create peaks in the received signal.

A physiological mouse prototype is built that enables the measurement of PPG signals (namely reflected light signals) in a non-intrusive manner. In particular, the infrared light emitted by a light-emitting diode (LED) and reflected off the human skin is measured in the form of PPG signals. Those raw PPG signals are then processed into appropriate physiological signals. The paper focuses on the heart beat rate and respiratory rate. It is also possible to compute the SpO2 value if light sensors for two different colors are adopted. When a human becomes motivated, or becomes angry, his/her heart beat rate will increase, and so will the respiratory rate. A bored person would display a relatively low respiratory rate. These physiological signals are often controlled unconsciously by the inner-body mechanisms, for example a rise in the level of adrenaline. To the authors’ best knowledge, they are probably the first research group to make use of the ubiquitous mouse to derive human physiological signals, in an attempt to recognize human affective state to support human-centric computing. It is argued that this work will provide a good start for designing future emotion-aware devices, as well as building blocks for emotion-adaptive applications, as exemplified by [17, 28].

The contributions of this paper include: (1) the design and development of a prototype for the novel physiological mouse, making use of a small optical component; (2) the proposal to measure the PPG signal via a light sensor and stream over the signal to be processed; (3) algorithms to compute the physiological signals from the captured PPG signal; (4) an evaluation of the algorithms for accuracy; and (5) a pilot empirical study on the relationship between measured physiological signals and human emotions that the experimental subjects are driven into, via video watching and gaming.

The rest of this paper is organized as follows. An overview of related work is presented in Sect. 2. The principles and algorithms for collecting and processing those signals are explained in Sect. 3. In Sect. 4, an outline of how the physiological mouse prototype is built is provided. Section 5 presents several experiments performed to evaluate the prototype for its accuracy. Section 6 studies the relationship between captured physiological signals and human emotions for a number of subjects. This paper closes with a brief conclusion, followed by an outline of future work.

2 Related work

The most common input factors for affect inference in state-of-the-art affective computing research include facial expression, vocal intonation, hand gesture, body posture, language, and physiological signals [8, 31]. Early work on affective computing focused on hand gestures, body posture, language signals, and vocal intonation, the former two being easily recognized via a camera and the latter two via speech processing techniques. With advances in image processing technology and the widespread use of webcams providing reasonable resolution, facial expression analysis has taken over the research focus. While body gesture and body posture represent the explicit user intention, physiological signals represent the implicit user intention that the user is unaware of and cannot even control or hide. Facial expression is somewhere in between, representing both explicit and implicit user intentions. There is a postulation that the nakedness of the human face (lack of facial hair) and color vision in humans are evolutionary adaptations that allow us to more easily deduce each other’s emotions, therefore facilitating community living [9]. In fact, the authors have recently utilized facial expression analysis techniques to identify viewer emotion when watching a video to create “emotars,” which are emotion-bearing avatars [17].

Despite the direct user intention carried by information-rich body posture and hand gestures, they have attracted less attention from the affective research community. This may be because it is more difficult to accurately infer affective states from these modalities. Some researchers also argue that posture only reflects the quantity (intensity) of the emotion, instead of its quality (category) [10]. More recently, emotion detection based on keystroke dynamics has been investigated [3], providing yet another approach to estimate user affect in a non-intrusive manner. Keystroke dynamics were extended with mouse movement in order to better determine the human attention level, attaining a certain degree of success [28]. It has also been shown that mouse movement pattern is correlated with the level of anxiety [30]. Along a reverse research direction, the mouse can be equipped with tactile feature to provide haptic feedback to the user in the form of an assistive technology. Emotional variation when different tasks are performed has been studied [6].

Physiological signals also appear to carry information related to affects. It had been shown that there is a correlation between the six basic human emotions, namely anger, fear, sadness, disgust, joy, and surprise, and four physiological signals, namely heart rate, skin conductivity, skin temperature, and general body activity [2]. Experimental studies were conducted by measuring the first three physiological signals via equipment attached to human subjects over the chest and one hand, whereas general somatic activity was measured indirectly via the change in mouse movement generated by the other hand.

Despite the usefulness of physiological signals in predicting human emotions, they are not widely adopted since most current physiological measurement approaches are intrusive. Unlike facial expressions, gestures, postures, vocal intonation, and language, which can be easily captured by camera and microphone, physiological signals often require users to have sensors attached to their body for electrocardiogram (heart related), electromyogram (muscle related), electroencephalogram (brain related) and similar measurements. For instance, the BP@Home system [16] relies on an A&D blood pressure sensor that must be worn by the user. The MobiSense system [29] is capable of returning heart beat and activity information to a server based on accelerometer and electrocardiogram sensor information, but it also requires the user to wear sensors on the body.

In this paper, the use of PPG signals to extract information about human heart beat rate and respiratory rate in a non-intrusive manner is explored. There has been work on using non-intrusive video-based methods to measure human physiological signals for health monitoring. For example, the video taken by the smartphone camera could be used for the determination of heart beat rate [25]. Captured video is analyzed via principal component analysis to detect a periodic pulse, from which heart beat rate and respiratory rate could be estimated [4]. Independent component analysis was adopted to reduce the impact of motion in the captured signal [14]. These techniques, however, are very easily affected by movement of the body and changes in the head orientation. After physiological signals are determined, one could proceed to associate them with human affects. It is observed that heart rate variability will decrease when a human feels fear, sadness, or happiness, while peak heart rate will increase with pleasure [24]. Slow respiration can be regarded as a sign of a relaxed emotion, while irregular rhythm and quick variations correspond to anger or fear [15, 24, 27]. Nevertheless, establishing a good mapping from the collection of physiological signals and possibly other multimodal signals to human emotions remains an interesting and perhaps open research problem.

3 From PPG signals to physiological signals

A small optical device is attached to the prototypical physiological mouse to capture PPG signals. The optical device emits infrared light, which is blocked and reflected by the user’s finger when the mouse is in use. A photodiode light sensor is used to measure the intensity of the reflected light, which will vary over time. This light intensity signal is then analyzed, and physiological signals are extracted. Currently, the response time for the sensor is 5 ms, which means that it is possible to capture 200 readings per second, i.e., a sampling rate of 200 Hz. The range of the light intensity reading returned by the sensor is \(l_i \in [0,1023]\). The signal processing work focuses on two key physiological measures, namely heart beat rate and respiratory rate (per minute). Since the infrared signal is monochromatic, it is not necessary to consider the more complex RGB signals.

3.1 Heart beat rate

In order to compute the heart beat rate, the first step is to clean the input signal by applying a smoothing function. The sampling frequency of the input signal \(l_i\) is f = 200 Hz. A moving window approach is employed, with half-window size hw = 10 (window size w = 2hw + 1 = 21). Given an input series of raw signal \(L= \langle l_1, l_2, l_3, \ldots, l_n\rangle\), the smoothened series over the moving window is computed as \(L'=\langle l'_{hw+1}, \ldots, l'_{n-hw}\rangle\), where \(l'_i=\sum _{j\in [-hw,hw]} l_{i+j} / w\). This moving window smoothing admits an incremental evaluation: once \(l'_i\) has been computed, \(l'_{i+1}=l'_i + (l_{i+hw+1} - l_{i-hw})/w\). This is illustrated in Fig. 1.
Fig. 1 Signal smoothing
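The incremental update can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `smooth` and the use of a plain list for the raw readings are hypothetical, and indices are 0-based, so the first output corresponds to \(l'_{hw+1}\).

```python
def smooth(raw, hw=10):
    """Centered moving average with half-window hw, updated incrementally.

    Returns one value per full window, i.e. len(raw) - 2*hw values.
    """
    w = 2 * hw + 1
    if len(raw) < w:
        return []
    # The first output is the plain mean of the first full window.
    current = sum(raw[:w]) / w
    out = [current]
    # Slide by one sample: add the entering reading, drop the leaving one.
    for i in range(hw + 1, len(raw) - hw):
        current += (raw[i + hw] - raw[i - hw - 1]) / w
        out.append(current)
    return out
```

Each new output costs one addition and one subtraction regardless of the window size, which is what makes the scheme attractive for a 200-Hz stream.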

The PPG principle states that the cardiac cycle manifests itself as periodic changes in signal intensity. As such, it is necessary to extract the dominant frequency from the smoothened signal. By employing a band-pass filter, signal components of extreme frequency induced by noise are removed, and those within the frequency band of interest are retained. The frequency band adopted for heart beat rate is \(B_H\) = [0.5 Hz, 3.5 Hz], corresponding to 30 to 210 beats per minute. Even for champion athletes, it is uncommon to record a heart beat rate below 30. Similarly, a heart beat rate above 210 is unlikely in humans.

To compute a continuous sequence of heart beat rates, a moving window of size \(W_H\) = 5 s on the smoothened signal \(L'\) is adopted. Upon acquiring \(W_H f\) = 1000 data points spanning 5 s, the evaluation of the heart beat rate based on these 1000 data points starts. To perform filtering based on the desired band, a fast Fourier transform (FFT) is applied on the signal, transforming it from the temporal domain to the frequency domain. It is then easy to filter out unwanted frequency components. The raw FFT signal is trimmed according to the pass band, which in the present case is \(B_H\) = [0.5 Hz, 3.5 Hz], as depicted in Fig. 2a.
Fig. 2 Frequency domain of heart beat and inter-beat interval

Finally, the power spectral density of the resultant signal is analyzed via the Welch periodogram method, extracting the frequency that yields the maximum power. This peak frequency, \(F^*_H\), is taken to be the heart beat frequency (in Hz). The heart beat rate per minute is then \(60F^*_H\). By sliding the moving window \(W_H\), it is possible to compute the heart beat rate throughout the experimental period in the form of a time series.
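The band-limited peak search can be illustrated with a short Python sketch. It is only an approximation of the pipeline described here: instead of an FFT followed by a Welch estimate, it evaluates a direct periodogram on a frequency grid restricted to \(B_H\), and the function names, the grid step of 0.05 Hz, and the mean removal are assumptions made for the example.

```python
import math


def peak_frequency(samples, fs, band, step=0.05):
    """Return the frequency (Hz) inside `band` that carries the most power.

    Direct periodogram on a frequency grid; a simplified stand-in for the
    FFT trimming and Welch estimate described in the text.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # drop the DC offset
    lo, hi = band
    best_f, best_p = lo, -1.0
    for k in range(int(round((hi - lo) / step)) + 1):
        f = lo + k * step
        omega = 2.0 * math.pi * f / fs
        re = sum(x * math.cos(omega * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(omega * i) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return best_f


def heart_rate_bpm(window, fs=200.0, band=(0.5, 3.5)):
    """Heart beat rate per minute: 60 times the peak frequency in B_H."""
    return 60.0 * peak_frequency(window, fs, band)
```

Calling `heart_rate_bpm` on each successive 1000-sample window of the smoothened signal would yield the heart beat rate time series. Note that a 5-s window limits the native FFT resolution to 0.2 Hz, so the finer grid used here only interpolates between those bins.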

3.2 Respiratory rate

Since respiration does not directly manifest itself in the periodic heart beat signal, it is not possible to directly apply a band-pass filter corresponding to the potential range of respiratory rate on the raw PPG signal. However, since respiration corresponds closely to the high frequency component of the signal variation [7], the heart beat rate variability is used, in the form of inter-beat interval (IBI) as the key signal to determine respiratory rate. IBI measures the timing difference between successive heart beats, and IBI fluctuation is known to be useful in characterizing respiratory sinus arrhythmia, which is a cardiorespiratory phenomenon in phase with the inhalation and exhalation of the breathing process [7].

To compute IBI, the peaks in the smoothened signal \(L'\) that represent the physical heart beats are detected, and the timing differences between successive peaks are taken as IBI values to generate the heart variability signal \(H=\langle h_1, h_2, h_3, \ldots , h_n\rangle\). H is then analyzed for the respiratory signal. Owing to the uneven distribution of the data points in this IBI signal H in the temporal domain, interpolation is performed to obtain an equally spaced series \(H'=\langle h'_1, h'_2, h'_3, \ldots , h'_n\rangle\). When the respiration rate changes, the high frequency peak also shifts accordingly [7]. In this case, the higher frequency end of \(H'\), which indirectly measures the respiratory rate, is of interest. An appropriate band-pass filter, i.e., \(B_R=\) [0.15 Hz, 0.4 Hz], is used to extract this higher frequency component after applying an FFT on \(H'\). This band-pass filter represents an acceptable range of respiratory rates between 9 and 24 breaths per minute. Figure 2b depicts the peak power frequency obtained after the band-pass filter \(B_R\) = [0.15 Hz, 0.4 Hz] has been applied in the frequency domain.
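The path from the smoothened signal to an evenly spaced IBI series can be sketched as below. The text does not specify the peak detector or the interpolation scheme, so everything here is an assumption for illustration: a simple local-maximum test with a minimum spacing of 1/3.5 s (the upper edge of \(B_H\)), linear interpolation, and a 4-Hz resampling rate. All function names are hypothetical.

```python
def detect_peaks(signal, fs, min_gap_s=1 / 3.5):
    """Return times (s) of local maxima that are at least min_gap_s apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            t = i / fs
            if not peaks or t - peaks[-1] >= min_gap_s:
                peaks.append(t)
    return peaks


def inter_beat_intervals(peak_times):
    """Pair each IBI value with the time of the beat that ends it."""
    return [(t1, t1 - t0) for t0, t1 in zip(peak_times, peak_times[1:])]


def resample_linear(points, rate_hz=4.0):
    """Linearly interpolate (time, value) pairs onto an evenly spaced grid."""
    if len(points) < 2:
        return []
    start, end = points[0][0], points[-1][0]
    out, j, k = [], 0, 0
    while start + k / rate_hz <= end:
        t = start + k / rate_hz
        # Advance to the segment [points[j], points[j + 1]] that contains t.
        while points[j + 1][0] < t:
            j += 1
        (t0, v0), (t1, v1) = points[j], points[j + 1]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        k += 1
    return out
```

The resampled series plays the role of \(H'\) and can then be passed to a spectral estimate restricted to \(B_R\).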

Spectral analysis is performed with the Lomb periodogram to locate the peak power frequency \(F^*_R\). A moving window \(W_R\) = 60 s is adopted instead of \(W_H\) = 5 s to compute the respiratory rate. The use of a longer window is required since the respiratory rate is much slower than the heart beat rate, which means that there is a longer latency in the detection of the respiratory rate, as compared to the heart beat rate. Due to the possibility of spectral power spreading, instead of directly locating one single candidate with peak power frequency, a cluster of strong candidates is identified, applying a smoothing filter to locate the dominating powerful group. This gives due credit to a cluster of neighboring frequencies of high power, which is more representative than a single frequency with an even higher power but without any supporting neighbor with high enough power. To be precise, for each moving window \(W_R\), candidate frequencies are located corresponding to local maxima of power. To qualify as a candidate C, the power of the frequency \(F_C\) must be no less than a threshold \(P_T\) and must be a local maximum within its neighborhood of size \(N_T\), i.e., \({\rm{Power}}(F_C) \ge {\rm{Power}}(F_i)\) for all \(F_i \in [F_C - N_T, F_C + N_T]\). Upon locating a sequence of candidate peak power frequencies \(\langle F_{C_1}, F_{C_2}, \ldots , F_{C_n}\rangle\) for each frame, each is converted into a respiratory rate (per minute) by multiplying it by 60 to yield \(\langle R_{C_1}, R_{C_2}, \ldots , R_{C_n}\rangle\), the respiratory rate candidates. As the moving window slides, the different respiratory rate candidates are accumulated into a histogram. Finally, an average filter is applied to obtain the peak candidate for respiratory rate, which reflects a dominating group of frequencies possessing the highest power.
In these experiments, \(P_T\) = 5 % of the total power and \(N_T\) = 0.02 Hz have been adopted. Too high a value for \(P_T\) would limit the size of the candidate set, while too low a value would not be effective in discarding weak candidates. Too high a value for \(N_T\) would reduce the size of the candidate set, while too low a value would generate too many candidates. This is illustrated in Fig. 3, where the peak frequency corresponding to the respiratory rate can be identified.
Fig. 3 A histogram for respiratory rate candidates
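The candidate selection and histogram voting can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code: a spectrum is assumed to be a list of `(frequency, power)` pairs already restricted to \(B_R\), candidate rates are assumed to be rounded to whole breaths per minute for binning, the average filter is assumed to span three bins, and ties are broken by the raw count. The names `candidates` and `dominant_rate` are hypothetical.

```python
from collections import Counter


def candidates(spectrum, p_t=0.05, n_t=0.02):
    """Frequencies whose power is at least p_t of the total power and is a
    local maximum within +/- n_t Hz."""
    total = sum(p for _, p in spectrum)
    out = []
    for f_c, p_c in spectrum:
        if p_c < p_t * total:
            continue
        # Small tolerance so grid points exactly n_t away are included.
        if all(p_c >= p for f, p in spectrum if abs(f - f_c) <= n_t + 1e-9):
            out.append(f_c)
    return out


def dominant_rate(windows, bins=range(9, 25), half_width=1):
    """Accumulate candidate rates over all windows into a histogram, apply
    an average filter, and return the strongest bin (breaths per minute)."""
    hist = Counter()
    for spectrum in windows:
        for f in candidates(spectrum):
            hist[round(60 * f)] += 1

    def smoothed(b):
        span = range(-half_width, half_width + 1)
        return sum(hist[b + k] for k in span) / (2 * half_width + 1)

    # Ties in the smoothed histogram are broken by the raw count.
    return max(bins, key=lambda b: (smoothed(b), hist[b]))
```

The averaging step is what lets a cluster of neighboring bins outvote an isolated spike, in the spirit of the "dominating group" described above.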

The standard algorithm for determining respiratory rate assumes the availability of readings for the whole period. In particular, the algorithm of adopting a histogram works well when all data have been captured and stored offline. It provides a more accurate estimate for the respiration rate over time. However, in real applications, it is important to generate continuous physiological signals online, even before sufficient data are available. The algorithms are thus adapted to produce continuous outputs once some data become available. There is a trade-off made when adapting the algorithm to the real-time setting in terms of the length of the warm-up stage. During the initial warm-up stage, the computed respiratory rate would be inaccurate and fluctuate considerably due to insufficient data. This is illustrated in Fig. 8b, where a delay period of 3 s is adopted, so that the first reading is obtained after 3 s. The initial fluctuation is quite significant, potentially extending for a few more seconds. However, the signal will become more stable and can track the respiratory rate as time goes on, especially after the first real breath is taken by the human subject. For practical purposes, it would be sufficient to remove the readings for the first 10 s and adopt the remaining physiological signal series for emotion analysis, since one would normally not expect the system to respond instantaneously, just as one waits for a computer system to “boot,” a smartphone to “turn on” or an application to “start.” In the present experiments, the first three data points are removed for a warm-up stage of 9 s.

4 Physiological mouse prototype

A prototype of the physiological mouse was built for both proof of concept and validation of the signal processing algorithms. The prototype is depicted in Fig. 4a. A small light sensor (a photodiode) is attached to the left side of the mouse (red box), where the thumb of the user is placed. An infrared LED (yellow circle) is attached next to the sensor. The device picks up the intensity of the infrared light reflected by the thumb when the user is holding the mouse, as depicted in Fig. 4b, and relays it via a connected Arduino board. Essentially, this is equivalent to attaching a second light sensor and light source to the side of the mouse (the first pair is at the bottom of a conventional optical mouse), and this can be easily integrated by product engineers.
Fig. 4 Physiological mouse and its usage

The intensity of the reflected infrared light is recorded and the 200-Hz time series signal is passed to a connected computer for processing. Modeling clay is used to hold the gadget together. Though there are wires coming out of the mouse and the modeling clay does not look nice, users in general do not find it unpleasant to use. This is good proof-of-concept feedback. It can be argued that the market will react to the availability of useful or interesting technology. Should the idea of the physiological mouse become well accepted, product engineers could design more user-friendly physiological mice, as well as create small add-ons to transform a normal mouse into a physiological mouse. They would also be able to integrate more sensors within the device or add-on in order to capture additional inputs for processing into other physiological signals, for instance a temperature sensor and a skin conductivity sensor.

5 Validation of physiological signal computation

To evaluate the accuracy of the physiological signals obtained when users use the physiological mouse, eight subjects, five male (Subjects 1–5) and three female (Subjects 6–8), were invited to participate in the experiments using the mouse. The subjects were aged from 20 to 30, all university undergraduate and graduate students who do not suffer from any underlying chronic illness. The aim is to study the viability and validity of the concept and algorithms in deriving physiological signals from PPG signals captured via the simple attachment of a small LED and light sensor to the mouse. Three sets of experiments were conducted, the first one to study the performance of heart beat rate computation and the other two the performance of respiratory rate computation. The study is presented in more detail in the subsequent subsections.

5.1 Performance of heart beat rate

The goal of the first experiment is to study the accuracy of the computation of heart beat rate from PPG signals acquired by the physiological mouse prototype. To measure the heart beat rate, an iHealth Pulse Oximeter [13] that clips onto the finger of the subject is used. The heart beat readings from the iHealth sensor are taken as the ground truth and compared with the signal returned by the physiological mouse.

In this first experiment, each subject is requested to use the mouse for 2 min. The reading from the iHealth device and the result coming from the mouse are recorded every 3 s. This gives two data series for each subject i, one from the iHealth device \(D_i=\langle d_{i,1}, d_{i,2}, \ldots , d_{i,40}\rangle\) and the other one from the mouse \(M_i=\langle m_{i,1}, m_{i,2}, \ldots , m_{i,40}\rangle\). In order to study the trend of sensor readings returned by the two devices, an error series is computed for each subject over the time period, i.e., \(E_i=\langle e_{i,t}=|m_{i,t} - d_{i,t}| \mid t \in [1,40]\rangle\). The general trend for the error is computed as the element-wise average \(E=\sum_i E_i / n_s\), where \(n_s\) is the number of subjects. The results for the average error over time and error for individual subjects over time are depicted in Fig. 5a, b, respectively.
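As a concrete illustration, the per-subject error series and the average error over time can be computed as in the following Python sketch. The function names and the nested-list data layout (one list of readings per subject) are assumptions made for the example.

```python
def error_series(mouse, reference):
    """Absolute difference between mouse and reference readings over time."""
    return [abs(m - d) for m, d in zip(mouse, reference)]


def average_error_over_time(all_mouse, all_reference):
    """Element-wise mean of the per-subject error series (the series E)."""
    per_subject = [error_series(m, d) for m, d in zip(all_mouse, all_reference)]
    n_s = len(per_subject)
    # zip(*...) groups the errors of all subjects at the same time step.
    return [sum(at_t) / n_s for at_t in zip(*per_subject)]
```

With 40 readings per subject, `average_error_over_time` returns 40 values, one per 3-s time step.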

From Fig. 5a, it can be observed that there is an initial transient impact on the readings due to the warm-up effect, and the first three readings are discarded in order to remove the bias induced by this warm-up effect. After that, the average error is normally below 3 across the board. Similarly, there are very few subjects displaying an error of more than 5 after the warm-up stage. Moreover, a relatively high error is observed with Subject 6. When the subject was interviewed, she mentioned that her hand was not always holding on to the mouse, and it is apparent that part of the signal deviates much from the norm. If this subject were removed from the set of results, the error would drop from 2.90 to 2.65. This is nevertheless an interesting case to report, and no additional data collection was requested.
Fig. 5 Heart beat rate error

The physiological signals for heart beat rate obtained by the proposed algorithms are compared with the ground truth in Table 1. For each subject, the mean values of \(D_i\) and \(M_i\) are calculated over time and their discrepancy, which reflects the error as an aggregate, is noted. Two error metrics are computed. The overall error reflects the error between the mean values of \(D_i\) and \(M_i\), and the actual error measures the average of the absolute deviations across all readings. It can be noted that the overall error is at most equal to the actual error, and equality occurs when there is a systematic bias in which the reading of one device is consistently higher or consistently lower than the other. In the present experiment, it is observed that the physiological mouse is returning a lower reading most of the time, making the actual error close to the overall error. Nevertheless, this error is very small. Besides the error, the mean square error (MSE) for each subject is also measured so as to quantify the variability of the errors. In general, small MSE values of at most 10 are observed (except for Subject 6), which means that the error seldom exceeds 3 or 4. It can be concluded that the physiological mouse is able to attain a good accuracy for heart beat rate determination, sufficient for emotion recognition purposes.
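The relation between the two metrics can be made explicit. As a sketch, writing T = 40 for the number of readings per subject and assuming that both errors are taken before any normalization into a percentage (the normalization is not spelled out in the text, so this is an assumption), the triangle inequality gives

\[
\left|\bar{m}_i - \bar{d}_i\right| = \left|\frac{1}{T}\sum_{t=1}^{T}\left(m_{i,t}-d_{i,t}\right)\right| \le \frac{1}{T}\sum_{t=1}^{T}\left|m_{i,t}-d_{i,t}\right|,
\]

where \(\bar{m}_i\) and \(\bar{d}_i\) denote the means of \(M_i\) and \(D_i\). The left-hand side is the overall error and the right-hand side is the actual error. Equality holds exactly when all differences \(m_{i,t}-d_{i,t}\) share the same sign, which corresponds to one device reading consistently higher or lower than the other.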
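Table 1 Heart beat rate performance

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| iHealth d | | | | | | | | |
| Mouse m | | | | | | | | |
| Overall error | 1.77 % | 2.34 % | 0.95 % | 3.21 % | 2.99 % | 6.85 % | 1.94 % | 2.31 % |
| Actual error | 2.19 % | 2.49 % | 1.61 % | 3.89 % | 3.00 % | 7.18 % | 2.43 % | 2.77 % |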

5.2 Performance of respiratory rate

Experiments were then conducted to evaluate the accuracy of the computation of respiratory rate from PPG signals. This is more challenging, since the mechanism to derive the respiratory rate from PPG signals is more complex and indirect, and there is more variation to the respiration pattern exhibited by human beings. Furthermore, unlike heart beat, which is more or less an involuntary mechanism, respiration is controllable by a human to a certain degree and breath holding is not an uncommon phenomenon.

The respiratory rate measurement is evaluated via two sets of experiments in line with [25]. The first set measures the respiratory rate in a controlled environment paced by a metronome. A simple metronome is implemented by displaying an inhale and exhale indicator at a given frequency, and the subjects are requested to breathe according to the rhythm. The second set measures respiration in a more natural context via self-reporting. Subjects are asked to breathe naturally over a period of time. Unlike [25], which uses an intrusive respiratory belt fastened around the chest of the subject to measure respiration, the subjects here are requested to press a key on the keyboard on every inhale and exhale. The timestamps of the keypresses give the actual respiration events, and thus the respiratory rate. While the respiratory rate induced by the metronome is constant in the controlled experiment, the actual respiratory rate in the natural experiment varies. This variation provides more room for validating the accuracy over time and for future human affect recognition.

In the first set of controlled experiments, the subjects are asked to breathe according to the predefined rhythms of 10, 12, 14, and 16 breaths per minute for 2 min. The resulting candidate respiratory rate histograms are depicted in Fig. 6a, b, which show the “best” and “worst” scenarios among the subjects, respectively. For each subject, the corresponding histogram summarizes the candidate rates for the four rhythms in different colors. It can be observed that, in general, the peak frequency for each rhythm is correct. In the best case scenario, there is just one non-negligible second peak at 14 per minute. The results for the worst case contain non-negligible peaks at 16 per minute, and there are also a number of spikes at 12 per minute. The results are summarized in Table 2. The error for each specific rhythm is illustrated in Fig. 7. The error is not high, ranging only from 2.5 to 6 %, with an average value of 4.1 %. The mean square errors (MSE) are also very low, all below 1.0. The small error rate is attributed to both the stable breathing rhythm and the spectral analysis performed on the collected offline data. This set of experimental results indicates a potential “best” case scenario for respiratory rate computation from PPG signals, compared with the natural breathing situation.
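
To make the reading of such histograms concrete, the following sketch bins a list of candidate respiratory rates and reports the most frequent bin, together with the average relative deviation from the metronome rhythm. It is a minimal illustration, not the processing pipeline used in the experiments: how the candidate rates are produced is not shown, the bin width of one breath per minute is an assumption, and all names and sample values are hypothetical.

```python
from collections import Counter


def peak_rate(candidates):
    """Return the most frequent candidate rate, binned to whole breaths per minute."""
    if not candidates:
        raise ValueError("no candidate rates given")
    histogram = Counter(round(c) for c in candidates)
    rate, _count = histogram.most_common(1)[0]
    return rate


def relative_error_pct(candidates, rhythm):
    """Average absolute deviation from the metronome rhythm, in percent."""
    if not candidates or rhythm <= 0:
        raise ValueError("need candidates and a positive rhythm")
    deviation = sum(abs(c - rhythm) for c in candidates) / len(candidates)
    return deviation / rhythm * 100.0


# Made-up candidate rates for a 12 breaths-per-minute rhythm.
candidates = [12.1, 11.8, 12.3, 14.2, 12.0, 11.9]
print(peak_rate(candidates), round(relative_error_pct(candidates, 12), 2))
```

In this made-up series, the single outlier near 14 per minute plays the role of a secondary peak: it raises the average deviation without moving the histogram peak.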

Table 2 Respiratory rate: controlled experiment

Subject         1        2        3        4        5        6        7        8
Rhythm 10
Rhythm 12
Rhythm 14
Rhythm 16
Actual error    3.46 %   3.08 %   5.77 %   2.50 %   5.00 %   3.08 %   5.19 %   4.42 %

Fig. 6 Candidate respiratory rates for subjects: controlled experiment

Fig. 7 Average result with error bar for different respiratory rhythms

Performance results from the second set of natural respiratory experiments are depicted in Fig. 8a, b, as captured by the physiological mouse and the self-reported respiratory rate, as well as the average error for each subject. Figure 8a seems to suggest that there is no transient warm-up effect and the error varies from 1 to 5, with an average value of 1.90. In fact, the transient error is still there, as indicated in Fig. 8b which highlights individual subjects. However, those transient errors are largely canceled out in the average result depicted in Fig. 8a. In general, the respiratory rate exhibits a larger error than heart beat rate, since the mechanism for its determination is more complicated.
Fig. 8 Respiratory rate error: natural respiration

To make a fair comparison with the heart beat rate computation, and knowing that the initial transient data would likely be incorrect, the first three data points (corresponding to a warm-up stage of 9 s) are removed when presenting Table 3. The reported rate is computed by measuring the time difference between two consecutive respirations. Since the frequency of respiration is lower than the frequency implied by the 3 s reporting interval, there may not be a breath taken in an interval. Linear interpolation is therefore applied to the respiratory rate when computing the ground truth. This sampling frequency may also have some impact on the accuracy of the physiological mouse. In contrast to the heart beat rate results, the overall error for the respiratory rate is generally smaller than the actual error, indicating that the error can go both ways and that there is no evidence of any systemic error. The mean square error (MSE) is also not very high across the board, with a highest value of 9 to 16, i.e., a deviation of 3 or 4 in the worst case.
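
The ground truth computation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: only the inhale keypresses are used, each rate is attributed to the time of the later keypress, the rate is taken as zero before the first measurable breath, and the last rate is held after the final keypress. All function names and the sample timestamps are hypothetical.

```python
def rates_from_keypresses(inhale_times):
    """Turn inhale keypress timestamps (seconds) into (time, breaths/min) points."""
    points = []
    for previous, current in zip(inhale_times, inhale_times[1:]):
        gap = current - previous
        if gap <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # One breath per gap, expressed in breaths per minute.
        points.append((current, 60.0 / gap))
    return points


def interpolate(points, t):
    """Linearly interpolate the rate at time t between neighbouring points."""
    if not points:
        raise ValueError("no rate points available")
    if t < points[0][0]:
        return 0.0               # assumption: no rate before the first full breath
    if t >= points[-1][0]:
        return points[-1][1]     # assumption: hold the last known rate
    for (t0, r0), (t1, r1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return r0 + (r1 - r0) * (t - t0) / (t1 - t0)


def ground_truth_series(inhale_times, interval=3.0):
    """Sample the interpolated rate every `interval` seconds."""
    points = rates_from_keypresses(inhale_times)
    steps = int(inhale_times[-1] // interval)
    return [(k * interval, interpolate(points, k * interval))
            for k in range(1, steps + 1)]


# Made-up inhale timestamps in seconds.
for t, rate in ground_truth_series([0.0, 6.0, 12.0, 16.0]):
    print(t, rate)
```

With a breath every 6 s, two out of three reporting intervals contain no keypress at all, which is why some form of interpolation is needed before the mouse readings can be compared point by point.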

Though very accurate results are obtained for the heart beat rate, and for the respiratory rate under the controlled environment, the results for natural respiratory rates are not as good, with an average actual error rate of 16.9 %, a maximum of 28.7 %, and a minimum of 5.9 %. This is likely because the respiratory rate is measured indirectly via a biological phenomenon, which allows noise to set in. The achievable accuracy also varies widely from subject to subject. The overall error rates, as with the heart beat experiments, are in general lower and, for respiratory rates in particular, much lower. This also implies that the errors can sometimes cancel each other out, without the clear presence of systemic errors. It can be argued that the measured respiratory rate can still be used for emotion recognition, especially when more aspects of the respiratory rate are computed to form the list of features for recognition, e.g., the average rate over a past window and the rate variation, besides the instantaneous respiratory rate signal.
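
A minimal sketch of such a feature list is given below. The window length of five samples and the use of the population standard deviation as the measure of "rate variation" are assumptions made for illustration; the names and the sample series are hypothetical.

```python
from statistics import mean, pstdev


def respiratory_features(rates, window=5):
    """Build (instantaneous rate, window average, window variation) per sample.

    The first window - 1 samples are skipped because their window is incomplete.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features = []
    for i in range(window - 1, len(rates)):
        recent = rates[i - window + 1:i + 1]
        features.append((rates[i], mean(recent), pstdev(recent)))
    return features


# Made-up respiratory rates (breaths per minute) at successive sampling points.
for row in respiratory_features([10, 12, 14, 12, 10, 16], window=3):
    print(row)
```

Averaging over a window damps the effect of a single noisy reading, while the variation term retains the information that the rate was unstable.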

Table 3 Respiratory rate: natural respiration

Subject         1         2         3         4         5         6         7         8
Ground rate
Mouse rate
Overall error   9.34 %    2.22 %    26.61 %   0.51 %    26.90 %   14.46 %   4.90 %    1.32 %
Actual error    12.69 %   5.89 %    27.81 %   13.43 %   28.71 %   15.00 %   19.01 %   6.06 %

5.3 Discussion

Overall, the heart beat rates align very well with the ground truth in most cases. Although the proposed algorithm suffers somewhat from the warm-up effect, the experiments indicate that it is sufficient to discard the first three data points, corresponding to a warm-up stage of about 9 s. In some extreme cases, when the subject is not holding the mouse properly, the algorithm does not return an accurate heart beat rate. This is reflected in abnormal light sensor readings, which provides a basis for detecting and filtering bad data in future enhanced prototypes, for instance, by comparison with multiple light sensors located at different positions on the mouse that return surrounding lighting readings, or by cross-checking with future skin conductivity readings to detect the loss of human contact with the mouse.
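
The two clean-up steps mentioned above, discarding the warm-up points and filtering readings taken with abnormal light sensor values, might be combined as in the sketch below. It is an assumption-laden illustration: the acceptable light range is a hypothetical pair of thresholds that would have to be calibrated for a real prototype, and the data layout and names are invented for the example.

```python
WARMUP_POINTS = 3  # three data points at 3 s each, i.e. about 9 s


def clean_readings(readings, light_min, light_max):
    """Drop warm-up samples, then drop samples with abnormal light levels.

    `readings` is a list of (rate, light_level) pairs in time order.
    `light_min` and `light_max` are hypothetical calibration thresholds.
    """
    usable = readings[WARMUP_POINTS:]
    return [rate for rate, light in usable if light_min <= light <= light_max]


# Made-up samples: the fifth one mimics a loosely held mouse (low light level).
samples = [(0.0, 500), (0.0, 510), (55.0, 505),
           (72.0, 498), (71.0, 40), (73.0, 502)]
print(clean_readings(samples, light_min=400, light_max=600))
```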

With respect to the heart beat rate computation, there appears to be a small systemic error, with the mouse under-estimating the heart beat rate by a small margin. A tuning stage could be added to the physiological mouse to reduce the systemic error, if any, for medical applications that rely on accurate heart beat rates. Nevertheless, as most machine learning algorithms and correlation calculations are invariant to linear transformations, the computed physiological readings would be sufficient for the purpose of emotion recognition, regardless of the magnitude of the “linear” systemic error present.
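
The correlation part of this argument can be checked numerically. The sketch below computes the Pearson correlation coefficient between a series of heart beat rates and a hypothetical emotion score, and then repeats the computation after applying a linear systemic error to the rates. All values are made up for illustration, and the invariance shown holds for a positive scale factor (a negative factor would flip the sign of the coefficient).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
true_rate = [68.0, 72.0, 75.0, 80.0, 77.0, 70.0]
arousal = [1.0, 2.0, 3.0, 5.0, 4.0, 2.0]  # hypothetical emotion score

# Hypothetical linear systemic error: the mouse reads 3 % low, minus 1 beat.
mouse_rate = [0.97 * r - 1.0 for r in true_rate]

r_true = pearson(true_rate, arousal)
r_mouse = pearson(mouse_rate, arousal)
print(abs(r_true - r_mouse) < 1e-9)
```

The scale factor and the offset cancel in the normalization of the coefficient, so a consistent under-estimation by the mouse leaves the correlation with the emotion score unchanged.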

As for the respiratory rate, a larger error is observed, especially when human subjects breathe naturally rather than following a prescribed rhythm. Unlike the more obvious transient warm-up effect in heart beat rate computation, which could have induced inaccuracy in the result, the warm-up effect may sometimes be hidden when the respiratory rate is computed. At 3-s sampling and interpolation intervals, the physiological mouse often returns readings of zero for the first 6 s and sometimes also for the 9th second. However, due to the relatively slow respiratory rate (about ten respirations per minute), it is probable that a subject would not have breathed during the first 3 s, resulting in a ground truth value of zero for the first data point. This phenomenon could repeat itself for the second data point for some of the subjects. Subjects who breathe would generate a high error, and subjects who do not breathe would generate no error. Thus, this combination of incorrect mouse readings and user variance in breathing does not generate the high aggregate error that is typical of the transient effect, as shown in Fig. 8a. Nevertheless, the presence of this transient effect can be easily seen in Fig. 8b, which highlights results for individual subjects, though the transient warm-up errors are largely amortized in the average result due to the presence of several zero errors.

For practical purposes, the warm-up effect should be eliminated by skipping over the first few data points, even though the overall error appears to be amortized across subjects. The experimental results indicate that it is sufficient to discard the first three data points, as with the heart beat rate computation, for a warm-up stage of 9 s. There are further challenges in measuring the respiratory rate, since human subjects may hold their breath once in a while throughout the experimental period. For example, a subject might hold his/her breath for a short period of time during a suspense scene in a video or intense shooting in the game, possibly followed by several fast breaths afterward. In a medical application, it is necessary to differentiate this scenario from one in which noise is induced in the PPG signals, since a loosely held mouse could be interpreted by the PPG processing algorithm as breath holding. However, in the context of human emotion recognition, it might be possible to compensate for this form of uncertainty by adopting more features to represent the subject's respiration, instead of purely the respiratory rate, as inputs to the machine learning algorithms.

6 Correlating physiological signals toward human emotions

The pilot study to determine the relationship between physiological signals and human emotions involves two sets of experiments drawn from two frequently performed tasks in human–computer interaction: namely watching videos and playing games. Both sets of experiments involve eight subjects. The experimental setup is depicted in Fig. 9. The subject is requested to hold on to the mouse during the entire experiment.
Fig. 9 Experimental setup for emotion-related experiments

In the first set of experiments, each subject is requested to watch two short videos, a funny video of about 2 min and a horror video of about 4 min (see footnote 1). The funny video presents a joke whereby a number of participants are invited to hold a big sign asking for kisses from passers-by. The participants are all men, and the passers-by happen to be attractive women. At the end of the video, a male passer-by appears and attempts to follow the instructions, which leads to a number of humorous moments. In the horror video, a number of teenagers are practicing gymnastics, with some workers repairing the electricity supply nearby. There are a number of “disasters waiting to happen” in the scene, including a hanging fan that is about to break loose (a scene shows a loosening screw), uneven bars starting to fall apart, and leaking water that is slowly making its way across the ground toward a live wire. The screw that drops from the hanging fan on the ceiling onto the balance beam plays an important role in this horror video. It is shown repeatedly, with several near misses by the girl on the beam, hence building up the suspense. There are also camera pans to the ceiling and the floor to build up the suspense. Finally, the girl, as expected, steps on the screw and triggers a cascading series of accidents, which culminate in the killing of another girl by electric shock and a fairly horrifying scene highlighting the death of a third girl, who breaks her neck. Indeed, one of the female subjects could not handle the horror scene in the video and opted to watch an alternative video involving a snake charmer (despite the fact that she has a phobia of snakes).

In the second set of experiments, the subjects are requested to play a video game using the physiological mouse. The game used is “The House of the Dead,” a classic first-person-shooter game set in a haunted mansion with zombies and monsters. The sequence culminates in a battle with a powerful “named mob,” which the subjects have to beat before the experiment is considered complete. Since not all subjects possess the same skill level, the amount of time that they spend to complete the experiment varies from 5 to 11 min.

The two physiological signals for each subject are measured, and the results for both sets of experiments are shown in Fig. 10. This provides a comparison for the same subject across different activities. In general, the heart beat rate is highest for the game-playing activity, which is not surprising given the more interactive nature of the activity, which demands continuous attention from the subject. Between the two videos, with the exception of two subjects, the horror video does not result in a significantly faster average heart beat than the funny video, and in fact, some subjects’ heart rates are higher during the funny video than during the horror video. In post-experiment interviews, all subjects stated that they were not particularly nervous or tense during the horror video, perhaps as a result of desensitization because they were used to watching such videos. Subject 8 is the subject with snake phobia (who chose to watch the video with the snake charmer instead of the horror video). Not surprisingly, her heart beat rate during that video is significantly higher than normal. Subject 1 also does not watch videos often, which means that he is not as desensitized as the others and is more susceptible to physiological changes brought on by changes of emotion.

The respiratory rate, on the other hand, shows less of a pattern than the heart beat rate. It is interesting that a heightened heart beat rate does not automatically result in a faster respiratory rate (as evidenced especially for Subject 8). One potential cause is the breath holding that some people exhibit in the presence of suspense or threat. This is an interesting issue that should be further investigated in future work.
Fig. 10 Physiological signals across subjects

For a more in-depth investigation, the physiological signals are measured and the temporal changes are aligned with turning points in the plot line of the videos. Similarly in game playing, the changes in the heart beat and respiratory rate are aligned with events in the game playing. In game playing, two different types of events can be distinguished. The first type is when the player is to be engaged, namely either about to face the boss or about to suffer an imminent death, while the second is the set of transition cut-scenes that help to narrate the underlying storyline behind the game. Changes would occur in the vicinity of the moment for the key events, i.e., funniest or humorous part in the funny video, horrifying part in the horror video, and the moment of threat in game playing. These are the so-called elicitation events. The heart beat rate and respiratory rate are considered 3 s before and 3 s after the event, and the change is measured. According to the result, there can be three possibilities: a certain increase in heart beat rate or respiratory rate, a certain decrease, and a negligible change (effectively no change).
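
The classification of the change around an elicitation event can be sketched as follows. The readings 3 s before and 3 s after the event are compared, and the difference is mapped to one of the three possibilities. The tolerance below which a change counts as negligible is not specified in the text, so the value used here is purely hypothetical, as are the names and the sample series.

```python
def value_at(times, values, t):
    """Return the sample whose timestamp is closest to t."""
    index = min(range(len(times)), key=lambda i: abs(times[i] - t))
    return values[index]


def classify_change(times, values, event_time, offset=3.0, tolerance=1.0):
    """Compare the signal `offset` seconds before and after an event.

    `tolerance` is a hypothetical threshold for a negligible change.
    """
    before = value_at(times, values, event_time - offset)
    after = value_at(times, values, event_time + offset)
    delta = after - before
    if delta > tolerance:
        return "increase"
    if delta < -tolerance:
        return "decrease"
    return "no change"


# Made-up heart beat rates sampled every 3 s, with an event at t = 6 s.
times = [0.0, 3.0, 6.0, 9.0, 12.0]
rates = [70.0, 71.0, 72.0, 78.0, 77.0]
print(classify_change(times, rates, event_time=6.0))
```

Counting the three outcomes per event type over all subjects would give the kind of tally needed to compare event types with one another.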

Figure 11 shows these statistics. Unsurprisingly, the transition cut-scenes in the game usually elicit a drop in the heart beat rate, since those scenes usually constitute a break from the constant interactive activity. Similarly, it is also expected that a player facing imminent death would exhibit a higher heart beat rate, and this also seems to be accompanied by an increase in respiratory rate. What is less expected is that the heart beat rate does not seem to increase following the appearance of the boss, perhaps because the appearance of the boss is rather expected and the image of the boss is not particularly scary. The funny scenes also appear to elicit an increase in the heart beat rate, perhaps because subjects often shift their position when laughing out loud, which then increases the heart beat rate momentarily. The respiratory rate does not seem to show a clear pattern.
Fig. 11 Physiological signals across activities

It is also interesting to analyze the change in heart beat rate and respiratory rate for an individual subject as a function of time. Figure 12 shows the data for one representative subject playing the game, showing the heart beat rate (red curve) against the left y-axis and the respiratory rate (blue curve) against the right y-axis. It can be seen that the transition cut-scenes elicit a drop in the heart beat rate, as evidenced in the overall data. The moments of imminent death (vertical blue lines) seem to be correlated with an increase in the heart rate, which is also expected. The appearance of the boss (vertical purple line), on the other hand, does not seem to result in an increase in the heart beat rate, perhaps due to the fact that the boss appearance is rather expected following the transition scene. The heart rate is highest toward the end of the experiment, perhaps as a consequence of increased tension upon seeing the end of the mission “within sight.”
Fig. 12 Temporal physiological signals playing game

Figure 13 shows the heart beat rate (red curve) and the respiratory rate (blue curve) for another representative subject watching the horror video. The key horrifying events are indicated on the time line. It can be observed that the heart beat rate increases more often than it stays the same when such an event happens. Near the end of the video, up to the point where the girl is about to be killed, the heart beat rate and respiratory rate both stay at a relatively high level, implying intense emotion felt by the subject. The authors plan to conduct a more in-depth study to correlate the different events with physiological signals, and machine learning would be a future direction to pursue.
Fig. 13 Temporal physiological signals watching the horror video

7 Conclusion

This paper has presented the design and development of a low-cost, non-intrusive physiological mouse which can detect the physiological signals of a user. PPG signal processing algorithms, based on infrared light reflected from the mouse user, were adopted to derive the physiological signals. The ultimate goal is to design an emotion-aware mouse capable of determining the human emotion, to be equipped with additional physiological signal detection components. Such a non-intrusive physiological mouse could be deployed by applications to interact better with the user and to support user-centric applications.

Based on the proof-of-concept prototype, the accuracy of these PPG-based algorithms on the mouse in detecting human physiological signals was evaluated, both in batch mode and in real-time mode. Although there were some initial fluctuations in the recognized physiological signals in the real-time mode, the result is in general rather encouraging. It was possible to attain an error below 2 to 3 % for heart beat rate and consistent respiratory situations and only a bit over 5 % for natural respiratory situations.

A pilot study on the relationship between measured physiological signals and human emotions was conducted. Joy, fear, and tension were induced in the experimental subjects by watching funny videos, watching horror videos, and game playing, respectively. There could be a mixture of anxiety and suspense besides fear in horror video watching, when the subject was expecting something bad to occur. There could also be some form of anxiety or fear in game playing when the subject was facing imminent death. Despite the possibly intermixed human emotions, some promising pilot results were observed in relating physiological mouse signals to human emotions. Game playing apparently exhibited slightly more prominent effects.

8 Future work

In future work, the authors plan to characterize more precisely the relationship between the physiological signals returned from the mouse and human affects, through more diversified tasks that are designed to elicit different emotions. This is to be augmented with the mouse movement information [28] in an integrated manner to better characterize human emotions. In addition, sensors to measure skin temperature and skin conductivity could be added to the mouse, making it a fully functional physiological mouse capable of returning multiple useful signals in a non-intrusive manner for physiological signal computation. Different machine learning approaches would then be studied to better recognize the user emotion based on the various signals.

Finally, it would be useful to expand the collection of human signals in a multimodal setting [18], to detect the human affect more accurately with the increased dimension of inputs, for instance, via fusion of recognition engines or by using a more complex recognition model. Recently, the authors have successfully utilized facial expressions, as captured by a webcam, and mapped the recognized emotion to an avatar showing the corresponding emotion for people watching the same video, achieving interesting preliminary results [17]. The eye gaze information embedded in a video stream [12] from the webcam would be highly useful as a complementary signal toward effective emotion identification. All these will be useful complementary models for the ultimate affective mouse. More extensive experiments would need to be conducted. The development of a general model for generic users, together with a learning module that adapts to a specific user over time, is also planned.


  1. The funny video is extracted from the famous hidden camera comedy show “Just For Laughs: Gags,” Season 9, Episode 8, between 9'38" and 10'58", whereas the horror video is taken from the movie “Final Destination 5,” running from 22'05" to 26'35".



We would like to thank the experiment subjects for their time and patience. We would also like to thank the reviewers for their valuable comments for improving this paper. This research is supported in part by the Research Grant Council and the Hong Kong Polytechnic University under Grant Nos. PolyU 5235/11E and PolyU 5222/13E.


  1. Allen, J.: Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 28, R1–R39 (2007)
  2. Ark, W.S., Dryer, D.C., Lu, D.J.: The emotion mouse. In: Proceedings of SIGCHI, ACM, pp. 818–823 (1999)
  3. Bixler, R., D’Mello, S.: Detecting boredom and engagement during writing with keystroke analysis, task appraisals, and stable traits. In: Proceedings of the 2013 International Conference on Intelligent User Interfaces, ACM, p. 225 (2013)
  4. Bixler, R., D’Mello, S.: Detecting pulse from head motions in video. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3430–3437 (2013)
  5. Bolt, R.: Put-that-there: voice and gesture at the graphics interface. ACM SIGGRAPH Comput. Graph. 14(3), 262–270 (1980)
  6. Brayda, L., Campus, C., Memeo, M., Lucagrossi, L.: The importance of visual experience, gender and emotion in the assessment of an assistive tactile mouse. IEEE Transactions on Haptics (2015). To appear
  7. Brown, T., Beightolm, L., Koh, J., Eckberg, D.: Important influence of respiration on human RR interval power spectra is largely ignored. J. Appl. Physiol. 75(5), 2310–2317 (1993)
  8. Calvo, R., Mello, S.: Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Trans. Affect. Comput. 1(1), 18–37 (2010)
  9. Changizi, M.: The Vision Revolution: How the Latest Research Overturns Everything We Thought We Knew About Human Vision. BenBella Books, Dallas, Texas (2009)
  10. Ekman, P., Friesen, W.: Detecting deception from the body or face. J. Pers. Social Psychol. 29(3), 288–298 (1974)
  11. Emotiv. EEG System/Electroencephalography. http://www.emotiv.com
  12. Huang, M.X., Kwok, T.C.K., Ngai, G., Leong, H.V., Chan, S.C.F.: Building a self-learning eye gaze model from user interaction data. In: Proceedings of the 2014 International Conference on Multimedia, ACM, pp. 1017–1020 (2014)
  13.
  14. Kim, B., Yoo, S.: Motion artifact reduction in photoplethysmography using independent component analysis. IEEE Trans. Biomed. Eng. 53(3), 566–568 (2006)
  15. Kim, J., Andre, E.: Emotion recognition based on physiological changes in music listening. IEEE Trans. Pattern Anal. Mach. Intell. 30(12), 2067–2083 (2008)
  16. Kusk, K., Nielsen, D., Thylstrup, T., Rasmussen, N., Jorvang, J., Pedersen, C., Wagner, S.: Feasibility of using a lightweight context-aware system for facilitating reliable home blood pressure self-measurements. In: Proceedings of International Conference on Pervasive Computing Technologies for Healthcare, pp. 236–239 (2013)
  17. Kwok, T.C.K., Huang, M.X., Tam, W.C., Ngai, G.: Emotar: communicating feelings through video sharing. In: Proceedings of the 2015 International Conference on Intelligent User Interfaces, ACM, pp. 374–378 (2015)
  18. Lalanne, D., Robinson, P., Nigay, L., Vanderdonckt, J., Palanque, P., Ladry, J.: Fusion engines for multimodal input: a survey. In: Proceedings of ACM International Conference on Multimodal Interfaces, ACM, pp. 153–160 (2009)
  19. Lo, K.W.K., Lau, C.K., Huang, M.X., Tang, W.W., Ngai, G., Chan, S.C.F.: Mobile DJ: a tangible, mobile platform for active and collaborative music listening. In: Proceedings of International Conference on New Interfaces for Musical Expression, ACM, pp. 217–222 (2013)
  20.
  21. Oviatt, S.: Advances in robust multimodal interface design. IEEE Comput. Graph. Appl. 23(5), 62–68 (2003)
  22. Oviatt, S.: Multimodal interfaces. In: Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications, pp. 286–304. L. Erlbaum Assoc. Inc. (2007)
  23. Picard, R.: Affective Computing. The MIT Press, Cambridge (1997)
  24. Rainville, P., Bechara, A., Naqvi, N., Damasio, A.: Basic emotions are associated with distinct patterns of cardiorespiratory activity. J. Pers. Social Psychol. 61(1), 5–18 (2006)
  25. Scully, C., Lee, J., Meyer, J., Gorbach, A., Granquist-Fraser, D., Mendelson, Y., Chon, K.: Physiological parameter monitoring from optical recordings with a mobile phone. IEEE Trans. Biomed. Eng. 59(2), 303–306 (2012)
  26. Shivappa, S., Trivedi, M., Rao, B.: Audiovisual information fusion in human–computer interfaces and intelligent environments: a survey. Proc. IEEE 98(10), 1–24 (2010)
  27. Soleymani, M., Lichtenauer, J., Pun, T., Pantic, M.: A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 3(1), 42–55 (2012)
  28. Sun, H.J., Huang, M.X., Ngai, G., Chan, S.C.F.: Nonintrusive multimodal attention detection. In: IEEE Proceedings of International Conference on Advances in Computer–Human Interactions (2014)
  29. Waluyo, A., Yeoh, W., Pek, I., Yong, Y., Chen, X.: Mobisense: mobile body sensor network for ambulatory monitoring. ACM Trans. Embed. Comput. Syst. 10(1), 13–42 (2010)
  30. Yamauchi, T.: Mouse trajectories and state anxiety: feature selection with random forest. In: IEEE Proceedings of ACII, pp. 399–404 (2013)
  31. Zeng, Z., Pantic, M., Roisman, G., Huang, T.: A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans. Pattern Anal. Mach. Intell. 31(1), 39–58 (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Yujun Fu (1)
  • Hong Va Leong (1)
  • Grace Ngai (1)
  • Michael Xuelin Huang (1)
  • Stephen C. F. Chan (1)

  1. Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
