Background & Summary

Emotions are complex aspects of the human experience1 that are recognized, analyzed, and comprehended in the field of Affective Computing (AC)2. Recent research in AC has frequently addressed the influence of the context in which emotions arise as a determining factor3. Context has a broad definition covering any information that can be used to characterize an individual’s situation4, including their physical or physiological signs as well as their relationship with the environment, other individuals, and daily activities3. While context can trigger and shape emotional responses5, emotions themselves contribute to context6, forming a bidirectional relationship. Emotion- and context-awareness enhance the ability of pervasive computing to interact more effectively with humans7, spanning a wide range of applications from entertainment8 and marketing9 to education10 and healthcare11. To benefit from emotion- and context-awareness, their measurement is a key component.

The bidirectional relationship between emotions and context necessitates their concurrent measurement. Measuring multiple aspects of context, from internal, such as emotion or physiological state, to external, such as activity or social context, requires employing a variety of sensors. Existing technologies allow emotion recognition using physiological12, vision13, or audio14 sensors, among others. Depending on which additional aspects of context are targeted, further environmental15, motion16, or localization17 sensors may be used. However, the use of multiple sensors raises issues such as power consumption, privacy, user discomfort, and high costs18,19,20. A workaround is to restrict the sources of context by using only a limited number of sensors18, although this may come at the expense of constraining the ultimate goal of context-awareness. This expense can be mitigated by doing more with less and exploiting the opportunities of using a single sensor for multiple purposes. Here, our focus is on miniaturized Inertial Measurement Units (IMUs) placed on the chest, as they provide a unique opportunity to concurrently capture not only movements of the wearer, but also heart- and respiration-originated vibrations, i.e., Seismocardiography (SCG)21.

At the same time, the use of Machine Learning (ML) algorithms has made it feasible for Artificial Intelligence (AI) to recognize human emotions through physiological data12. Several datasets have been published to address physiological responses to emotional stimuli, e.g., EMOGNITION22, POPANE23, BIRAFFE(2)24,25, CASE26, ASCERTAIN27, DECAF28, DEAP29, and MAHNOB-HCI30. Among physiological data, Heart Rate Variability (HRV) has been investigated several times for AC, with heart-originated biosignals being derived from Electrocardiography (ECG)31, Blood Volume Pulse (BVP)32,33, and/or Radio Frequency (RF) sensors34. However, the potential use of SCG for AC remains unexplored. Chest-worn IMUs can provide SCG data as a potential proxy for AC. In addition to SCG, chest-worn IMUs have applications in activity recognition, posture analysis, localization, swallow detection, and Voice Activity Detection (VAD)21. With the variety of contextual information they offer, they have the potential to bring high throughput with less cost and power consumption compared to using multiple sensors. Given the role of IMUs in capturing movement-related context, their integration into AC applications can ultimately broaden the spectrum of measured context using the same sensor35. This could additionally yield valuable insights into the interplay between physiological responses and body movements during emotional experiences, potentially paving the way for more comprehensive and accurate AC systems. However, existing datasets lack the setup to unveil the potential of chest-worn IMUs for capturing multiple types of contextual information with one sensor. In this paper, we describe our dataset, EmoWear, which aims to fill the mentioned gaps by targeting emotions, activity, swallowing, and voice activity as contextual aspects.

Two key enablers distinguish EmoWear from existing datasets in AC: first, the inclusion of SCG extracted from either Accelerometer (ACC) or Gyroscope (GYRO) modalities, which we validated against physiological sensor data; and second, the incorporation of concurrent IMU-oriented contextual tasks (detecting gait, voice activity, and drinking), alongside the standard setup for emotion recognition. Among AC datasets that incorporate IMU sensor data, none were designed for the investigation of SCG in the field of AC. BIRAFFE225 offers ACC data collected from a gamepad; K-EmoCon36, K-EmoPhone2, and Schmidt et al.32 collect their ACC data only from the wrist. Only WESAD37 and cStress38 provide chest-sensed acceleration data. However, cStress38 has a sampling rate of 16 Hz for its ACC data, which is not suitable for SCG21, nor does it make the data publicly available. While the ACC data provided in WESAD may have utility for SCG, the dataset does not validate its usefulness for SCG, e.g., by showing its correlation with heart and breathing signals. Moreover, the WESAD dataset differs significantly from ours in that it does not involve contextual tasks that are motion-detectable (such as gait, voice activity, and drinking), focuses solely on amusement and stressful states in its presented stimuli, and notably does not measure GYRO data.

The EmoWear dataset contains around 70 hours of inertial and physiological recordings from 49 adults who underwent a set of experiments in a lab environment. These experiments included watching 38 emotionally eliciting video clips, self-assessing emotional state, walking a predefined route, reading out sentences, and occasionally drinking water. To quantify emotional states, we employed the widely recognized circumplex model of affect39,40, with the valence, arousal, and dominance scales. We included supplementary questions to assess participants’ familiarity with the stimuli and their subjective liking of the presented content. Physiological data was recorded using two consumer-grade wearables: an Empatica E4 providing BVP, Electrodermal Activity (EDA), Skin Temperature (SKT), Inter-beat Interval (IBI), and ACC data, and a Zephyr BioHarness (BH3) providing ECG, Respiration (RSP), SKT, and ACC data. Motion data was exclusively recorded using three ST SensorTile.box (STb) devices, each providing three sources of ACC data and one source of GYRO data. Two of these devices were worn at the bottom of the sternum (used for SCG) and at the corresponding height on the subject’s back, while the third was attached to a cup of drinking water to capture any drinking activity. Table 1 provides a summary of the main dataset characteristics.

Table 1 Summary of the EmoWear dataset characteristics.

The EmoWear dataset presents the opportunity for combined research, unlocking the potential of a single chest-worn IMU for measuring the multiple contextual dimensions it has to offer. The use of duplicate modalities capturing heart activity (i.e., ECG and BVP) allows for the validation of SCG against them for HRV monitoring. The employment of chest-worn IMUs measuring movement data in both stationary and walking conditions allows for the study of both SCG and gait analysis for Emotion Recognition (ER). Additionally, the dataset enables VAD, drinking detection, and gait detection, opening the possibility of wider context measurement via the same setup; a selection that expands the concurrent applications of chest-worn IMUs. To the best of our knowledge, this is the first dataset to enable the investigation of SCG for emotion recognition, and the first to enable investigating the applicability of a single IMU for multiple contextual aspects. Potential research topics that may be investigated using the EmoWear dataset include but are not limited to: (1) use of SCG for AC, (2) emotion recognition while walking, (3) the effect of walking-induced heart rate variations on emotion recognition, (4) multi-modal emotion recognition, (5) VAD using a chest-worn IMU, and (6) swallow detection using a chest-worn IMU.

Methods

Ethics statement

All the hardware, methods, and procedures associated with the collection of the EmoWear dataset were reviewed in advance by the independent Ethics Committee for the Social Sciences and Humanities of the University of Antwerp, under the file SHW_22_035. The committee issued a positive decision regarding the ethical clearance after reviewing the following documents: (1) application file for ethical clearance of the Ethics Committee for the Social Sciences and Humanities of the University of Antwerp, (2) methodology of the study, (3) information sheet for the participant, (4) consent form for the participant, (5) a list of ethical committees to which the research proposal will be presented, (6) all information that will be used to contact the participants, (7) all the diaries or surveys that will be presented to the participants. The participants were informed in advance about the experiment details, both orally and in written form. They were explicitly asked for their consent to participate, and they were informed about their right to withdraw from the experiments at any stage of the data collection procedure. No personal information that could lead to the identification of the participants was saved.

Participants

Participants were recruited through advertisements on the following social media platforms: Facebook, Instagram, X (at the time known as Twitter), and LinkedIn. The advertisement invited healthy adults, aged between 18 and 65, without major known heart, movement, or psychological issues, to schedule an appointment for a voluntary data collection session. It was presented both in written form and with graphics, informing participants that they would watch videos while wearing physiological sensors, assess their emotional states, and enjoy a drink afterwards. Additionally, it indicated that the experiments would take approximately two hours of their time.

Out of all those who scheduled appointments, 49 attended their designated slots. None of the attendees were excluded due to health issues; instead, they were asked to disclose any physical or mental disorders in a pre-experiment questionnaire. Each participant is identified by a unique 4-character ID assigned by the Graphical User Interface (GUI), as well as a sequential integer code ranging from 1 to 49. Due to an unintended failure to run the logging software, data from one participant (code: 35, ID: 9W29) were not recorded, reducing the total number of EmoWear participants to 48 (21 females, 27 males) aged between 21 and 45 (mean = 29.27, SD = 4.53).

Hardware

We used the following wearable sensors to collect inertial and physiological data from the participants: (1) ST STb, set up to collect acceleration and rotation data from its three different IMUs: LIS2DW12, LIS3DHH, and LSM6DSOX. Three STbs were employed. One was positioned vertically on the sternum, aligning the bottom of the box (which houses the micro-USB port) with the bottom of the xiphoid process; the back of the box (which displays the printed device label) was pressed against the skin using an elastic strap for male participants and their own bra for female participants. The second box was placed at the same level as the first, on the subject’s back, secured with the same strap/bra, with the bottom of the box facing the ground and its back against the skin. The third box was attached vertically, in a similar orientation, to a cup of water on the participant’s desk, using a pair of adhesive Velcro strips. (2) Zephyr BH3 device, programmatically configured to be worn on the left side, with its log format set to “Enhanced Summary + Wave.” This device logged ACC, ECG, R wave to R wave (RR), raw RSP, and Breath to Breath (BB) data. It was worn immediately below the strap/bra used to secure the STbs. (3) Empatica E4 wristband, worn on the non-dominant hand (as in37) to minimize motion artifacts in the collected data. It recorded various data including BVP, EDA, SKT, Heart Rate (HR), IBI, and ACC.

We selected the STb device because it offers a variety of accelerometers and allows for different configurations of them. We employed all of them to provide redundancy and configured them differently (Table 2) to ensure robust motion sensing over a range of small to large movements. The STb has been previously validated for various applications, including activity recognition41, micro-positioning42, fall detection43, and virtual reality44. Prior to our selection, we conducted a literature review on the IMUs used for SCG21 and ensured that the available setups of the STb met similar specifications.

Table 2 Sensors used in the dataset, their position, provided data, and (configured) setup.

The Zephyr BioHarness was originally designed for sports performance monitoring and training optimization45. Nevertheless, its measurements have been validated for various applications, including ER46,47, stress detection47,48, energy expenditure estimation49, and psychometric research50, among others. Both the HR and respiratory rate measurements of this device have demonstrated promising correlation with other laboratory-grade and previously validated portable devices51. The presence of this device in our dataset provides additional means to validate our SCG signal against the ECG and the respiratory signal of the BH3 device.

The Empatica E4 has been previously validated for ER22,32,52, stress detection37,53, mental fatigue modeling54, and energy expenditure estimation49, among others. In our dataset, we employed the Empatica E4 to incorporate an additional proxy for the validation of heart monitoring, i.e., the BVP, as well as the EDA and SKT to enable multi-modal ER.

All the wearable sensors were configured to record data offline, storing it within their internal memory. These sensors offered a range of sampling rates for different types of data, which are detailed in Table 2. Fig. 1 illustrates the hardware setup in action. The coordinate systems of all devices, particularly those of the accelerometers, are illustrated in this figure. In order to match the coordinate systems of the trunk-worn devices, we applied adjustments to the STb accelerometers and gyroscopes. Specifically, we negated the x-axis of the STb accelerometers and gyroscopes. Additionally, we noticed an inconsistency in the y-axis of the STb gyroscopes, where its positive direction corresponded to clockwise rotation around the corresponding accelerometer axis. To rectify this, we negated the y-axis of the STb gyroscopes as well. These adjustments are reflected in the processed dataset packages, namely the mat and csv packages, which are explained under Data Records. Directions in the figure reflect the resulting coordinate systems of the accelerometers after these adjustments. Note that all gyroscope axes now have their positive direction corresponding to counterclockwise rotation around their corresponding accelerometer axes.
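For users working with the raw package, which, unlike the processed packages, presumably does not include these adjustments, the following minimal Python sketch illustrates the described axis corrections. The array shapes and the assumed [x, y, z] column order are ours and should be checked against the per-device info.txt files.

```python
import numpy as np

def align_stb_axes(acc_xyz, gyro_xyz):
    """Apply the axis adjustments described above to raw STb samples.

    acc_xyz, gyro_xyz: arrays of shape (N, 3) with columns assumed to be [x, y, z].
    Returns copies with the x-axis of both sensors negated and the y-axis of the
    gyroscope negated, matching the convention of the mat and csv packages.
    """
    acc = np.asarray(acc_xyz, dtype=float).copy()
    gyro = np.asarray(gyro_xyz, dtype=float).copy()
    acc[:, 0] *= -1    # negate accelerometer x-axis
    gyro[:, 0] *= -1   # negate gyroscope x-axis
    gyro[:, 1] *= -1   # negate gyroscope y-axis (counterclockwise-positive)
    return acc, gyro
```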

Fig. 1
figure 1

(a) Sensor setup used in the dataset. Wearable hardware included: two ST SensorTile.boxes secured with an elastic strap, a Zephyr BioHarness 3 strap worn on the left side of the body, and an Empatica E4 wristband, all in contact with the skin. An additional ST SensorTile.box was attached to a cup of water. Arrows indicate the positive direction of the Cartesian coordinate system for each sensor, based on which acceleration data is reported in the mat and csv packages. Reprinted with permission from IEEE55, slightly modified afterwards. (b) An author demonstrates the complete experimental setup in action.

We used a 24-inch Dell U2419H monitor to display our GUI to the participants. Additionally, we employed noise-canceling wireless headphones, specifically the Sony WH-H910N, to deliver audio of the video clips and minimize potential environmental noise disturbances.

Graphical user interface

We used the ColEmo data collection GUI, an open-source software interface that we designed for our study55. ColEmo was used to: (1) present and collect the pre-experiment questionnaire, (2) facilitate experiment phase control by providing timely instructions for every step of the data collection process, (3) facilitate recording, communication, and verification of vocal vibration data, (4) present emotional stimuli, and (5) collect self-assessment records.

ColEmo was run on the participant’s laptop and used the following I/Os to interact with participants: (1) an external monitor to present the software and the stimuli, (2) external headphones to present the stimuli audio, (3) the laptop’s built-in keyboard for the pre-experiment questionnaire, (4) an external mouse to enter self-assessment reports and press software buttons, and (5) the laptop’s built-in microphone to temporarily record their voice during the VAD data phase.

GUI logs

ColEmo published its logs using the Message Queuing Telemetry Transport (MQTT) protocol, with the participant’s laptop itself serving as the MQTT broker. The logs included participants’ answers to the questionnaire and the self-assessment reports, as well as event markers that indicated the occurrence time of every phase change of the GUI (e.g., the beginning of a stimulus video; see Table 7). An MQTT logger application (https://github.com/curzon01/mqtt2sql) installed on the same laptop captured and saved these messages into an SQLite56 database. The experimenter used an additional laptop outside the experimental room to monitor the progression of the experiment. The experimenter’s laptop subscribed to all topics published by ColEmo, so the logging messages were also captured and displayed there. Table 3 explains the logging information generated by ColEmo. The topic IDs can be used to find specific types of messages in the SQLite database.

Table 3 Description of the GUI logs.

Pre-experiment questionnaire

After obtaining their consent, the participants were asked to complete an electronic pre-experiment questionnaire regarding their long- and short-term conditions prior to the experiments. Such data will help future data users define their own inclusion criteria based on their specific research requirements. Participants were free to choose the language of the questionnaire, with options including English (default) and Dutch (local language). The questionnaire covered participants’ demographic information, including gender, birth year, education level, dominant hand, vision status, and use of vision aids. Additionally, it inquired about alertness level, normal nightly sleep hours, last night’s sleep duration, any physical or psychological disorders, and the consumption of coffee, tea, alcohol, tobacco, and drugs. Participants were asked to specify the regularity of their substance intake, with separate questions addressing the period up to one day prior to the experiments. These questions were similar to those found in the DEAP dataset29, excluding Electroencephalography (EEG)-related questions. While filling out the questionnaire, participants could ask the experimenter any questions they had for clarification.

Stimuli

We utilized the same stimuli as those collected and evaluated in the benchmark DEAP dataset29. The stimulus selection process in the DEAP dataset involved multiple stages. Initially, 120 stimuli were chosen, with half of them selected through a semi-automatic process and the remaining half chosen manually. Following that, a one-minute highlight segment was designated for each stimulus. Subsequently, an online subjective assessment experiment was conducted based on the well-known Self-Assessment Manikins (SAM) scale57. This assessment involved rating the videos according to their emotional valence, arousal, and dominance. The valence-arousal space was divided into four quadrants: high arousal-high valence (HAHV), low arousal-high valence (LAHV), low arousal-low valence (LALV), and high arousal-low valence (HALV). Based on the online assessment, the 10 most effective videos from each quadrant were selected, resulting in a final set of 40 stimuli. We refer to each of the four mentioned quadrants as (reference) labels for the used video clips.

We were unable to locate 2 out of the 40 labeled DEAP stimuli, resulting in a total of 38 stimuli for our dataset. These stimuli were distributed as follows: 10 with the HAHV label, 9 with the LAHV label, 10 with the LALV label, and 9 with the HALV label. The 2 missing stimuli corresponded to experiments with IDs 16 and 35 from the DEAP dataset. The 38 stimuli were presented in the form of 38 experiment cycles, during which the stimulus contents were displayed on the 24-inch monitor. The screen resolution was 1920 × 1080, and the videos were configured to play with a height of 400 pixels while maintaining their original aspect ratio. To minimize bias, the order of the experiments was randomized, following a similar approach to previous studies26,29. To ensure that participants concluded the data collection session with a positive experience, the last experiment was always selected among the videos with high labeled emotional valence. To implement this, the last experiment was chosen first and appended to the end of a shuffled list containing the remaining experiments.
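As an illustration of the described ordering logic, a minimal Python sketch is given below; the clip identifiers and label list are placeholders, not the actual stimulus IDs.

```python
import random

def order_stimuli(stimuli, labels):
    """Shuffle stimuli while keeping one high-valence clip for the last slot.

    stimuli: list of clip identifiers; labels: parallel list of quadrant labels
    ('HAHV', 'LAHV', 'LALV', 'HALV'). Identifiers here are illustrative only.
    """
    high_valence = [s for s, l in zip(stimuli, labels) if l in ("HAHV", "LAHV")]
    last = random.choice(high_valence)            # choose the closing clip first
    remaining = [s for s in stimuli if s != last]
    random.shuffle(remaining)                     # randomize the other 37 clips
    return remaining + [last]
```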

Self-assessment surveys

Annotating emotional states depends on the choice of emotion model. We used a dimensional approach where participants assessed their own affective state based on the circumplex model of affect, as frequently used by existing benchmark datasets29,30,37,58. In order to facilitate comparison with the DEAP dataset, which shares the same stimuli content, we employed the same continuous scales. Participants were instructed to assess their affective state after the presentation of each stimulus video. The same SAMs as in the DEAP dataset were used to assess emotional valence, arousal, dominance, and liking. A fifth question assessed the participant’s familiarity with the presented music video clip on a discrete scale of 1 (totally new) to 5 (well known). Participants could select their desired value by moving a slider below each question with the external mouse. Table 4 provides the specifications of the content presented to participants for their self-assessments following each stimulus video.

Table 4 Content and specifications of the self-assessment survey filled out by the participants after each stimulus presentation.

Experimental procedure

The experiments were conducted in a dedicated meeting room within our institute. The experimental room had a rectangular shape measuring approximately 6 m × 4 m. In the center of the room, there was a rectangular desk measuring 3.6 m × 1.2 m. Participants were seated at one corner of the desk to perform the experiments.

Upon arrival, participants received a comprehensive briefing regarding the purpose, scope, and details of the experiments. They were explicitly informed of their right to withdraw from the study at any point during the session. Any queries or uncertainties were addressed, and all necessary information was provided in written form. After obtaining explicit consent to participate, participants were given the opportunity to use the restroom before starting the experiments. Additionally, they were kindly requested to mute any electronic devices such as smartphones or smartwatches.

Subsequently, participants completed an electronic pre-experiment questionnaire in the ColEmo software. Meanwhile, the experimenter initiated the wearable devices and then assisted participants in wearing the sensors. To facilitate synchronizing the sensors, participants were asked to perform three consecutive jumps with around a one-second pause between the jumps. During these jumps, the participant held the third STb sensor in their hand. The experimenter pressed a button in the ColEmo software right after the participant’s third jump to indicate that the synchronization jumps were over. The moment of this button press was recorded in the GUI logs.

Phase 1: vocal vibration recording

The initial phase of data collection involved recording vocal vibration data, with the intention of future IMU data processing for VAD. Participants had the option to decline vocal recording in the ColEmo software if they preferred. This phase was introduced in a later update to the experiment procedure; therefore, not all participants have their VAD data recorded (see Usage Notes).

During this phase, participants read out 10 sentences while all wearable measurement hardware was operational. Each sentence was presented to participants for 6 seconds, along with a progress bar that indicated the elapsed time. Participants were asked to read out the sentences within the presentation time for each sentence. It was not possible to proceed to the next sentence if they read too quickly. Two 6-second periods of silence were included before and after the set of all 10 sentences.

Participants were instructed to place their arms on the desk and maintain a steady position throughout the recording. Simultaneously, the readings were temporarily captured using the laptop’s built-in microphone and subsequently processed to mark the beginning and end of each sentence. To achieve this, we used a Python interface to the WebRTC voice activity detector (https://github.com/wiseman/py-webrtcvad). The Python script processed the microphone recording to generate an initial set of time markers denoting the start and end points of speech segments. These markers were used to segment the audio into discrete chunks, which were then reviewed by the experimenter to verify marker accuracy. If any markers were found to be inaccurately detected, the experimenter had the opportunity to make necessary manual adjustments. Once the correctness of the detected voice chunks was confirmed, the microphone recording was permanently deleted to maintain privacy and confidentiality. An overview of the mentioned procedure is illustrated in Fig. 2b.
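A minimal sketch of how such initial speech markers can be generated with py-webrtcvad is shown below; the frame length, aggressiveness mode, and the subsequent merging of frames into sentence onsets/offsets are our assumptions rather than the exact parameters used in the study.

```python
import webrtcvad  # pip install webrtcvad

def detect_speech_frames(pcm16: bytes, sample_rate=16000, frame_ms=30, mode=2):
    """Return per-frame speech decisions for 16-bit mono PCM audio.

    Consecutive speech frames can then be merged into onset/offset time
    markers and reviewed manually, as done for the phase-1 recordings.
    """
    vad = webrtcvad.Vad(mode)  # 0 (least) .. 3 (most aggressive)
    bytes_per_frame = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    decisions = []
    for start in range(0, len(pcm16) - bytes_per_frame + 1, bytes_per_frame):
        frame = pcm16[start:start + bytes_per_frame]
        # (frame start time in seconds, speech decision)
        decisions.append((start / (2 * sample_rate), vad.is_speech(frame, sample_rate)))
    return decisions
```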

Fig. 2
figure 2

Experiment procedure of the data collection sessions. (a) The overall flow. (b) Details of the vocal vibration recording (phase 1). (c) Details of the elicit-assess-walk sequences in phase 2.

The sentences used during this phase were selected from the Common Voice dataset59. To ensure ease of reading, our selection criteria included sentences with a length ranging from 6 to 11 words. Moreover, some sentences were slightly simplified in terms of both length and vocabulary. In order to cover a range of sentence types, we included declarative, imperative, and interrogative sentences. These sentences were all presented in English. For reference, Table 5 provides the exact sentences that participants read out during this phase, in the order they were presented to the participants.

Table 5 Sentences that participants read out during the phase 1 of the experiments (vocal vibration recording), in the presented order.

Phase 2: elicit-assess-walk cycles

This phase forms the main and lengthiest part of the dataset, involving a 2-minute initial baseline recording while presenting a fixation cross on a white screen, followed by 38 cycles of elicit-assess-walk experiments. Each cycle consisted of the following 7 stages: (1) 2 seconds of presentation of the sequence number, to indicate the participant’s progress out of the total 38 experiments; (2) 5 seconds of presentation of a fixation cross on a white screen, to provide intermediate baselines; (3) presentation of an eliciting stimulus video of around 1-minute length; (4) 3 seconds of presentation of the same fixation cross on a white screen; (5) filling out the self-assessment survey with 5 scales (valence, arousal, dominance, liking, and familiarity); (6) walking a fixed route and returning to the chair; and (7) optionally, drinking water before proceeding to the next experiment. The main outline of the methodology above, including the presentation order and time of each component, the stimuli used, and the provided survey questions, matches that of the benchmark DEAP dataset29, with the extension of the walking and drinking tasks.

To familiarize participants with the procedure, a trial sequence of the elicit-assess-walk cycle mirroring the main experiment cycles was presented to the participant. At each step of the trial sequence, the experimenter explained what the participant could expect during the real experiments. The trial sequence included a brief example video of a bee on a flower. Participants were informed that the presented video was only a dummy one, and that during the main experiments they would encounter emotional videos lasting approximately one minute. An example self-assessment survey was also presented, with the experimenter clarifying the meaning of each scale.

During the trial sequence, participants were also informed that they would be asked to stand up and walk after completing the self-assessment survey. The experimenter demonstrated the designated walking route for participants to follow at the end of each experiment cycle. This path covered a total distance of approximately 18 to 19 m, involving circling around three sides of the desk, specifically, one long side, one short side, and finally another long side, before returning along the same route. During the walk, an on-screen timer counted 19 seconds before the participants could proceed to the next experiment. Participants were explicitly instructed to walk at their own pace and disregard the on-screen timer.

The purpose of adding such a walking stage was multi-fold: First, to enable research on any potential effect of emotional state on gait (as in35). Second, to enable research on the effect of movement-induced physiological parameter changes on erroneous emotion classification (as in60,61). Third, to enable research on movement artifact removal from the SCG signal, either by pure signal processing methods (as in62,63) or by using the additional STb on the back (as in64). Finally, to possibly facilitate washing out the previously elicited emotion and prepare the participant for the next experiment.

After returning, participants were instructed to press the “continue” button in ColEmo when they were ready for the next experiment. Additionally, a cup of water, equipped with an STb, was provided on the participant’s desk. They were encouraged to drink as needed, but only after completing each walk and before continuing to the next experiment in ColEmo.

At this stage, the experimenter exited the room. Participants initiated the second phase of the experiments, starting with a 2-minute initial baseline recording while a fixation cross was displayed on the screen. The explained experimental sequence was then repeated 19 times, followed by a break of about 5 minutes. The experimenter returned to the room during the break, offering participants non-alcoholic and non-caffeinated beverages. Once participants were ready, the experimenter exited again, allowing participants to complete the remaining 19 experiments.

Upon concluding all experiments, the experimenter re-entered the room to stop the wearable sensors. The total data recording session lasted approximately 1 h 30 min. Participants were then sincerely thanked and offered a drink at our cafe. Fig. 2 shows the detailed flow of the data collection procedure explained above.

Data cleaning and synchronization

After each data collection session, the raw data from all wearable devices was copied to a secure drive. Once the whole dataset was collected, we imported it into MATLAB to achieve uniform data structures and synchronize the data for all participants. Data cleaning and synchronization were performed with MATLAB R2023b.

The synchronization process consisted of two main steps: synchronization of data from different wearable devices, and synchronization of those data with the event time markers generated by ColEmo (see GUI section). For the first part, initially, we processed the three jumps executed by the participants. We identified the moments when each participant reached the apex of their jumps for all devices, resulting in five vectors of three points. These vectors were used to synchronize all sensors with respect to the BH3 device. This synchronization involved minimizing the distance between the vectors of each sensor and that of the BH3 device (see Fig. 3a). Additionally, we compensated for potential clock drift among the sensors by manually identifying corresponding data points among the sensors during the last experiments. Correction factors were then calculated to ensure synchronization with the BH3 device.
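The alignment step can be sketched as follows: with the jump-apex times extracted per device, the least-squares time shift relative to the BH3 reference reduces to a mean difference. This is a simplified illustration of the procedure, ignoring the separate clock-drift correction; the apex times in the comment are made up.

```python
import numpy as np

def estimate_offset(apex_times_sensor, apex_times_bh3):
    """Time shift (s) that best aligns a sensor's jump apexes with BH3's.

    With three apex times per device, the least-squares shift is the mean of
    the pairwise differences. The same idea applies to aligning accelerometer-
    and microphone-detected voice onsets/offsets in the second step.
    """
    sensor = np.asarray(apex_times_sensor, dtype=float)
    bh3 = np.asarray(apex_times_bh3, dtype=float)
    return float(np.mean(bh3 - sensor))  # add this shift to the sensor's timestamps

# Illustrative values (seconds); the estimated shift here is about +0.42 s:
# estimate_offset([10.10, 11.55, 13.02], [10.52, 11.97, 13.44])
```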

Fig. 3
figure 3

(a) Synchronization process of the five sensors using three jumps. Red dots represent the detected top of jump moments where acceleration crosses the gravitational value (g), while black dots denote the specified end of all jumps (only specified on the BioHarness 3 (BH3) plots). All sensors were synchronized with respect to the Zephyr BH3 device. (b) Synchronization of the Front ST SensorTile.box (STb) sensor with the event time markers using voice activity data. Black lines indicate detected voice onsets (b#) and offsets (e#) using microphone (MIC) found in the event time markers, while red lines represent detected voice onsets and offsets using accelerometer (ACC) data from the Front sensor. The necessary time shift is determined by minimizing the distance between the two sets of markers and is applied to all sensor data to synchronize them with the event time markers.

The second part involved synchronizing sensor data with the event time markers generated by the ColEmo app. Event time markers refer to specific points in time indicated by the GUI logs, such as the start and end times of a certain stimulus (see Table 7 for an extensive list). The goal of this synchronization is to ensure that the time values associated with GUI events align correctly with the sensor data. For this part, we made a distinction between the participants who have their VAD data recorded and those who do not. The former could be synchronized using their VAD phase data, as depicted in Fig. 3b. We first applied to the accelerometer data the same VAD algorithm that was used for the microphone data. Subsequently, all sensor data was shifted to minimize the distance between the accelerometer-detected and the microphone-detected voice onsets/offsets. To implement this, we ran the detection algorithm per participant on all axes (including the magnitude of 3D acceleration) of both the Front and the Back sensors. For synchronization, we only took into account those onsets/offsets that were best detected by the algorithm, according to careful visual inspection. For the VAD-less group, we conducted manual synchronization taking into account two key criteria: (1) we ensured that all three jumps were completed before being indicated by their corresponding GUI time marker (eoj < 0, see Table 7); and (2) we verified that the walking activity for all 38 experiments fit within the corresponding time window of the GUI application, aligning to our best effort with the period between the issuance of the walking instruction and the start of the next experiment (walkDetect_i > walkB_i and walkFinish_i < newExp_{i+1}, see Table 7).

We also identified missing data, imperfections, and anomalies. All identified problems are included in a CSV table along with the dataset (meta.csv).

Data Records

The EmoWear dataset is available at Zenodo65 at the following DOI link: https://doi.org/10.5281/zenodo.10407278. The data is divided into three main packages: raw, mat, and csv, each serving a distinct purpose to accommodate various user needs and software compatibility. In addition to these packages, three separate files are included: (1) questionnaires.csv, containing all participants’ responses to the pre-experiment questionnaires, (2) meta.csv, containing information about data completeness (all identified missing data, imperfections, and anomalies), and (3) sample.zip, containing sample data of one participant from all three provided packages. This sample file provides an easy way to check out the records in the dataset without having to download the whole dataset. The three packages of data are discussed below.

raw

The raw data package contains the original, unprocessed data collected during the study. The data is organized into folders named according to a specific convention, denoted as “[code]-[ID]”, representing individual participants. Additionally, an SQLite database file (mqtt.db) is included in the package. This database file holds the ColEmo logs, including event time markers, self-assessment reports, and pre-experiment questionnaires of all participants. Within each participant’s folder, there are zip files corresponding to the wearable sensors, each containing raw data in the form of CSV files. An info.txt file is provided within each device’s zip file, offering detailed explanations of the raw sensor data and file contents. Please note that the column order of the CSV files of the STb devices changed from participant code 6 onwards (including code 6 itself). This change is reflected in the info.txt files as well.

Also, we identified and resolved two problems in the SQLite database file, mqtt.db: (1) The entry for the “vidE” event in the 19th sequence for the participant with the ID “9UP0” was missing. We manually duplicated the “postB” event from the same experiment and modified it to “vidE” to solve the issue. Please note that these two markers are simultaneous (see Table 7). (2) The entry for the “participantRegistered” event for the participant with the ID “9VYD” was mistakenly stored in a separate database file due to a malfunction in the OneDrive synchronization mechanism. To resolve this issue, we manually transferred this missing entry to the main mqtt.db file.
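Because the table layout of mqtt.db is determined by the mqtt2sql logger, we suggest inspecting the schema before querying; the table name "mqtt" used in the sketch below is only a placeholder.

```python
import sqlite3

# Inspect the schema created by the mqtt2sql logger rather than assuming
# column names.
con = sqlite3.connect("mqtt.db")
cur = con.cursor()

# List the tables present in the database.
for (name,) in cur.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)

# Show the columns of one table before querying it; replace 'mqtt' with an
# actual table name printed above.
print(cur.execute("PRAGMA table_info(mqtt)").fetchall())

con.close()
```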

mat

The mat data package contains cleaned and synchronized participant data suitable for analysis within MATLAB. The folder structure within this package replicates the convention of the raw package, using “[code]-[ID]” folder names to distinguish individual participants. Inside each participant folder, four MAT files are found: signals, markers, surveys, and params. The signals structure provides access to the synchronized wearable sensor data. It is organized as a MATLAB struct variable with sensor names serving as structural fields. Each of these fields contains tables of data categorized per signal name that the wearable device provides. markers holds tables specifying time markers for crucial events during the experiments (see Table 7). markers is further divided into the “unique,” “phase1,” and “phase2” tables. The unique table provides the one-off event times: the end of the jumps (eoj), the beginning and end of the VAD phase (vadB and vadE), the beginning and end of the initial baseline recording (baselineB and baselineE), and finally, the beginning and end of the mid-experiment break (pauseB and pauseE). The phase1 table holds time markers of the detected beginnings (onset) and ends (offset) of the 10 sentences read out by the participant during the first phase of the experiments, i.e., the VAD phase. The phase2 table contains the event time markers during the second phase of the experiments, i.e., the elicit-assess-walk sequences. surveys is a table that holds the participant’s self-assessment survey responses for all experiment sequences of the second phase. All timestamps found in the mat package are referenced to the sync moment, at which the experimenter indicated, by pressing a button in ColEmo, that the participant’s synchronization jumps were completed. Finally, the params table holds the time shifts and correction factors that were applied per wearable device in order to synchronize them. See Tables 6 and 7 for a detailed description of the mat package contents.

Table 6 Overview of the data records per participant in the mat and the csv packages.
Table 7 Meaning of the event time markers used in the dataset.

csv

To provide compatibility with a wider range of data analysis tools, the csv data package mirrors the mat package but presents the data in CSV format. It shares the same organizational structure, with folders per participant. Each folder contains a total of 25 CSV files (6 files for E4, 8 files for BH3, 2 files × 3 STbs for front, back, and water, 3 files for markers, 1 for surveys, and 1 for synchronization parameters). The naming convention for these CSV files reflects the structure of the corresponding MATLAB variables, e.g., signals-bh3-ecg.csv and markers-phase1.csv. Tables 6 and 7 provide detailed information on the data records of the csv package.
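As a quick-start example, the csv package can be loaded with standard tools such as pandas. The participant folder name below is a placeholder following the “[code]-[ID]” convention, and only the two file names mentioned above are taken from the text; check the actual folder contents for the full set.

```python
from pathlib import Path
import pandas as pd

# Placeholder folder name following the "[code]-[ID]" convention.
participant = Path("csv") / "01-XXXX"

ecg = pd.read_csv(participant / "signals-bh3-ecg.csv")      # synchronized BH3 ECG
markers = pd.read_csv(participant / "markers-phase1.csv")   # sentence onsets/offsets (phase 1)

print(ecg.head())
print(markers.head())
```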

Technical Validation

We analyzed the technical validity of the EmoWear dataset from three aspects: (1) effectiveness of the emotion elicitation, (2) quality assessment of the collected signals using the Signal-to-Noise Ratio (SNR) metric, and (3) correlation of comparable signals collected from different body positions. We further explain these aspects in the three following subsections. The statistical analyses in this section were conducted using R v4.3.1, while the calculations for SNR and the correlation matrices were performed in MATLAB R2023b.

Elicitation success

As mentioned in the Methods section, different video clips were targeted at eliciting emotions in specific quadrants of the valence-arousal space (i.e., HAHV, HALV, LAHV, LALV; a.k.a. the reference labels). We compared the self-assessment scores of our participants collected in our experiments against the reference video clip labels (taken from the DEAP dataset29). Fig. 4 illustrates a scatter plot of the mean location of the self-assessed valence and arousal per video clip, and Fig. 5 shows distribution box plots of all the self-assessed scales per video clip condition. The reference labels of the valence-arousal quadrants in both figures (the legends in Fig. 4 and the conditions in Fig. 5) are determined by the DEAP dataset29.

Fig. 4
figure 4

Mean locations of the self-reported affective state on the valence-arousal plane for each video clip. The colors correspond to the expected regions based on the labels from the video selection procedure, and the bars indicate the standard errors for each video clip.

Fig. 5
figure 5

The distribution of the participants’ self-assessment scores per reported scale (A: arousal, V: valence, D: dominance, L: liking, F: familiarity) for the four different elicitation conditions (HAHV, HALV, LAHV, LALV). All scales are normalized between 0 and 1. The dotted line represents the midpoint of the scales, a threshold used to distinguish high- from low-valence/arousal in the four elicitation conditions.

To test whether the presented video clips elicited a variety of emotions, we used a one-way repeated-measures ANalysis Of VAriance (rmANOVA)66 with the reference label as its independent variable. We used Mauchly’s test of sphericity to test whether the assumption of sphericity was met. The degrees of freedom were then adjusted using the Greenhouse–Geisser correction to account for violations of the sphericity assumption. To check whether the effect sizes were substantive, we calculated the recommended effect sizes of eta-squared (η2)67,68. Table 8 summarizes the results of the one-way rmANOVA, as well as descriptive statistics (mean and standard deviation) of the self-assessed scales per video clip condition. The results in Figs. 4 and 5 and Table 8 suggest that presenting the video clips elicited the targeted emotions from the valence-arousal quadrants. Significant results of the rmANOVA indicate differences between video clip conditions for the scales of valence, arousal, liking, and familiarity, but not dominance. The box plots in Fig. 5 further explain the acquired significance of the rmANOVA for the different scales.
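For readers who prefer Python over R, a comparable one-way rmANOVA can be sketched with statsmodels as below; the data frame layout and values are illustrative, and the Greenhouse–Geisser correction applied in our analysis is not included in this sketch.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per participant x condition, holding the mean self-assessed valence
# per reference label. Values are illustrative only.
df = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "condition":   ["HAHV", "HALV", "LAHV", "LALV"] * 3,
    "valence":     [0.8, 0.3, 0.7, 0.2, 0.9, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8, 0.1],
})

# One-way repeated-measures ANOVA with the reference label as the within factor.
res = AnovaRM(df, depvar="valence", subject="participant", within=["condition"]).fit()
print(res)
```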

Table 8 Results of one-way repeated measures-ANalysis Of VAriance (rmANOVA) of the self-assessed scales between film clip conditions (reference labels), as well as the means and Standard Deviations (SDs) of the self-assessed scales per film clip conditions.

Signal to noise ratios

We analyzed the raw signals obtained from the wearable devices to estimate their SNRs. We used an algorithm that fits a second-order polynomial to the autocorrelation function of the signal around the n = 0 lag69. This algorithm assumes the noise to be white. While this assumption may not necessarily be accurate, it is a practical approach for SNR estimation that is also used in similar datasets22,23.
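A minimal Python sketch of this idea is given below; the number of lags used for the polynomial fit is our choice, and the reference implementation69 may differ in its details.

```python
import numpy as np

def estimate_snr_db(x, fit_lags=5):
    """Rough SNR estimate assuming additive white noise.

    White noise only contributes to the autocorrelation at lag 0, so a
    second-order polynomial fitted to the lags around zero (excluding lag 0
    itself) extrapolates the noise-free power at lag 0. This is a sketch of
    the idea, not the exact reference implementation.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)   # one-sided autocorrelation
    lags = np.arange(1, fit_lags + 1)
    coeffs = np.polyfit(lags, r[1:fit_lags + 1], deg=2)          # parabola through lags 1..fit_lags
    signal_power = np.polyval(coeffs, 0.0)                       # extrapolated noise-free R(0)
    noise_power = max(r[0] - signal_power, np.finfo(float).eps)
    return 10.0 * np.log10(max(signal_power, np.finfo(float).eps) / noise_power)
```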

We estimated SNRs separately for all raw recordings in both experiment phases. For phase 1, a single SNR was estimated per raw signal recording for each participant, representing the SNR value during the vocal vibration recording interval, i.e., from the onset of the first uttered sentence until the offset of the last one. In phase 2, we separately examined intervals during which participants were expected to remain seated (referred to as sitting intervals) and those during which they were not (referred to as moving intervals). In terms of the event time markers presented in Table 7, these intervals translate to (newExp_i, walkB_i) and (walkB_i, newExp_{i+1}), respectively. For the 19th sequence (after which a break is planned) and the last sequence (usually the 38th for most participants), the moving interval is defined differently, i.e., (walkB_i, walkFinish_i).

Table 9 lists descriptive statistics of the estimated SNRs. The computed values indicate high quality of the recorded signals. The mean SNRs range from 21.9 dB (EDA signal of the E4 device in moving intervals) to 49.8 dB (\(\parallel \overrightarrow{ACC_{3}}\parallel \) signal of the Front STb in sitting intervals). The minimum SNR of the dataset is 8.9 dB, corresponding to the EDA signal of the E4 device in one of the moving intervals; however, 99.7% of the EDA signals during the moving intervals still had SNR values above 15.1 dB.

Table 9 Descriptive statistics of the Signal-to-Noise Ratios (SNRs) computed for the recorded raw signals.

Sensor correlations

To further evaluate the technical validity of our dataset, we investigated correlations between data collected from the different sensors used, as recommended in49. A look at our available sensor data suggests that we have three sets of comparable data collected from different body sources: (a) the heartbeat set, referring to heartbeat-related data including BVP from the E4 device, ECG from the BH3 device, and chest/back vibrations from the Front/Back STbs; (b) the breathing set, referring to breathing-related data including RSP from the BH3 device and chest/back vibrations from the Front/Back STbs; and (c) the motion set, referring to pure movement data from all IMUs (E4, BH3, Front/Back STbs). For the SCG signal within the heartbeat and breathing sets, we chose the dorsoventral (z) axis of the Front/Back ACC signals and the mediolateral (x) axis of the GYRO signals, respectively. These axes are expected to be the most promising for this type of analysis in the literature70,71.

The heartbeat set and the breathing set involve signals with distinct morphological properties within each set. To conduct a meaningful comparison through correlation analysis, we pre-processed these signals to amplify the properties expected to remain similar across diverse signal types. For example, in the case of the heartbeat set, our pre-processing focused on accentuating variations originating from the heartbeats rather than emphasizing unique morphological properties (such as the QRS complex, the well-known morphological feature of an ECG signal). This was achieved by filtering the signals in the bandpasses of interest (0.67 to 3.33 Hz for the heartbeat set and 0.05 to 0.7 Hz for the breathing set) and extracting their autocorrelation sequences. Subsequently, Pearson correlation coefficients were calculated within each set and averaged over the whole dataset.
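The following Python sketch illustrates this pre-processing and correlation step for the heartbeat set, using synthetic signals in place of the resampled and time-aligned ECG and SCG data; the sampling rate, filter order, and maximum lag are illustrative choices, not the values used in our MATLAB analysis.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.stats import pearsonr

def band_autocorr(x, fs, low, high, max_lag_s=3.0, order=4):
    """Bandpass-filter a signal and return its normalized autocorrelation sequence."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    y = filtfilt(b, a, np.asarray(x, dtype=float))
    y = y - y.mean()
    r = np.correlate(y, y, mode="full")[len(y) - 1:]
    return r[: int(max_lag_s * fs)] / r[0]  # lags 0..max_lag_s, normalized

fs = 250  # illustrative common sampling rate after resampling
t = np.arange(0, 60, 1 / fs)
# Synthetic stand-ins for time-aligned ECG and dorsoventral SCG signals.
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)
scg_z = np.sin(2 * np.pi * 1.2 * t + 0.5) + 0.2 * np.random.randn(t.size)

rho, p = pearsonr(band_autocorr(ecg, fs, 0.67, 3.33),
                  band_autocorr(scg_z, fs, 0.67, 3.33))
print(f"Pearson r = {rho:.2f}, p = {p:.1e}")
```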

Fig. 6 presents the correlation results using heatmaps. Whenever the average correlation is above 0.3, all the p-values are below 0.05. The results suggest consistent and meaningful relationships among the measurements taken from the compared sources. Note that the amplitudes of accelerations recorded from the wrist (E4) show no correlation with those recorded from the trunk (BH3 and Front/Back STbs). This lack of correlation can be explained by the naturally different movements of the wrist compared to those of the trunk. Similarly, little correlation is observed among the BH3, Front, and Back STbs during sitting intervals, when they only measure small vibrations from different positions on the trunk. However, these correlations increase during moving intervals, highlighting the dominance of similarly sensed motions at different trunk positions during gait. In summary, the correlation results suggest: (1) the absence of systematic malfunctioning in the sensor measurements, as the correlations align with expectations, and (2) the feasibility of conducting HRV and breathing analysis through the measured inertial data, as they demonstrate meaningful correlation with BVP and ECG data.

Fig. 6
figure 6

Pearson correlation heat maps, computed separately for sitting intervals (from the beginning of progress indicator presentation to self-assessment form submission) and moving intervals (from self-assessment form submission to the beginning of the progress indicator for the subsequent experiment) and averaged over the entire dataset. Compared signals (referred to as listed in Table 2) are noted in the matrix rows and are assigned letters that are reused wherever the same source is used. Note 1: Signals from the heartbeat set and the breathing set undergo preprocessing using bandpass filters and the calculation of their autocorrelation sequences. Note 2: Filter cut-offs applied to the heartbeat set and the breathing set differ, even though identical letters may be found.

Usage Notes

Data interpretation

The three accelerometers and the gyroscope of the ST STb devices were initially configured to provide output data rates as listed in Table 2. However, their data was written to the output files using the task scheduler of FreeRTOS72 within the STb firmware. As a result, the timestamps of these sensors are irregular, which must be considered when working with their data. To convert these irregular timestamps to regular ones, functions such as MATLAB’s interp1 and NumPy’s interp are useful.
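A minimal resampling sketch using NumPy is shown below; the target rate and the handling of multiple columns are left to the user.

```python
import numpy as np

def resample_regular(timestamps, values, fs_out):
    """Linearly interpolate irregularly sampled STb data onto a regular time grid.

    timestamps: 1-D array of (irregular) sample times in seconds.
    values:     1-D array of the corresponding sensor readings (one axis).
    fs_out:     desired regular output rate in Hz.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    t_regular = np.arange(timestamps[0], timestamps[-1], 1.0 / fs_out)
    return t_regular, np.interp(t_regular, timestamps, values)
```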

For guidance on performing emotion recognition, we direct readers to a comprehensive review of the common steps required for emotion recognition from wearable physiological signals12. For available methods on the use of chest-worn accelerometer data for VAD, activity recognition, drinking detection, etc., we direct readers to our comprehensive survey on applications and methods associated with chest-worn IMUs21.

Limitations

As detailed in the Methods section, the synchronization of event markers with sensor data needed manual intervention. This need arose from an incorrect assumption of ours concerning the time synchrony between the two laptops used in the experiments, namely the experimenter’s and the participant’s laptops. The Zephyr BH3 devices were consistently synchronized with the experimenter’s laptop, as data was copied from, and the device was recharged through, this laptop. However, the event time markers were generated by the participant’s laptop, which was responsible for running the ColEmo GUI. The synchrony between these two laptops was neither sufficient nor consistent enough throughout the data collection period to synchronize the event markers with the BH3 device.

Another limitation was the lack of VAD and drinking data for the first 26 participants, as well as the lack of a pause for the first 2 participants. The reason was that the ColEmo GUI underwent two updates throughout the data collection period. The first happened after the 2nd participant. In this update, the 5-minute break was added in the middle of the experiments based on the feedback of the first two participants. Also, the sequence number, which had not previously been sent along with the MQTT log messages, was added to the logs. The second update took place after the 26th participant. In this update, we added the VAD phase at the beginning and started using the third STb device on the participants’ cup of water. Therefore, the first 26 participants (codes 1–26) are missing the VAD and drinking data. Furthermore, due to a technical issue, there is an absence of drinking data for participant 34, and due to an unintended failure to run the logging software, time markers from participant 35 were not recorded, making their sensor data uninterpretable. Details regarding such instances of data incompleteness and all identified problems are documented in a meta file available in CSV format (meta.csv), along with the data.

Accessibility

The EmoWear dataset is fully accessible to the public without restriction at: https://doi.org/10.5281/zenodo.10407278. However, in compliance with GDPR regulations, we strictly prohibit any attempt by the dataset users to reveal and/or disclose identity or personal information of the participants.