LabelSens: Enabling Real-time Sensor Data Labelling at the point of Collection on Edge Computing

In recent years, machine learning has advanced by leaps and bounds, enabling applications with high recognition accuracy for speech and images. However, other types of data to which these models can be applied have not yet been explored as thoroughly. In particular, it can be relatively challenging to accurately classify single or multi-modal, real-time sensor data. Labelling is an indispensable stage of data pre-processing that can be even more challenging in real-time sensor data collection. Currently, real-time sensor data labelling is an unwieldy process with limited tools available and poor performance characteristics that can compromise the performance of the machine learning models. In this paper, we introduce new techniques for labelling at the point of collection coupled with a systematic performance comparison of two popular types of Deep Neural Networks running on five custom-built edge devices. These state-of-the-art edge devices are designed to enable real-time labelling with various buttons, slide potentiometers and force sensors. This research provides results and insights that can help researchers utilising edge devices for real-time data collection select appropriate labelling techniques. We also identify common bottlenecks in each architecture and provide field-tested guidelines to assist developers building adaptive, high-performance edge solutions.


INTRODUCTION
Deep Neural Networks (DNNs) are attracting more and more attention as a breakthrough in the advance of artificial intelligence, showing high potential to accurately classify sensory data. An abundance of commodity edge computing devices powered by AI is currently available, including the Nvidia Jetson Nano [1] and Google Coral [2], which are predicted to be used routinely in the near future for on-device processing and machine learning applications.
In order to train DNNs, vast quantities of data must first be collected and labelled. Data can include videos, images, audio, physical activities, temperature and air quality. These can result in huge datasets containing all types of actions and behaviours. Labelling such data is not a trivial task, especially as the promise of such devices is to enable real-time machine learning such as recognising emotions or security threats. So far, most of the attention has been focused on the processing power of these devices and little attention has been paid to how to obtain clean and efficiently labelled data to train models [3] [4].
When collecting data in "the wild" [5] a participant could be doing anything from driving a car to eating in a restaurant. Labelling, either automatic or manual, while engaged in physical activities can be particularly challenging. Taking this into account, the nature of each activity needs to be considered, both at UX and User Interface design stage, for data sources and providers as well as at the application level. It is crucial to label sensor data in real-time, as unlike images and audio it is not usually possible to label the data offline using the raw data itself. In pervasive sensing there are three data collection methods [6]. Passive data sensing uses smartphones or other sensors to record unlabelled data in the background [7]. Passive data collection is often used to collect weather [8] , health [9] [10] and environmental data [7]. Alternatively, Active data sensing enables users to label the data in real-time through self-reporting. Active data sensing is often used to report well-being or physical activity. Hybrid data sensing combines both passive and active data collection as it involves users actively labelling the passive sensor data that is recorded in the background [11] as shown in figure 1. The choice of labelling approach depends on the complexity of a problem, the required training data, the size of a data science team, and the financial and time resources a company can allocate to implement a project. The best approach to label data often fundamentally depends on the data and source type being recorded, e.g. sensor data can utilise mobile phone applications to collect labelled data whereas labelling images and audio may utilise postprocessing techniques to implicitly crowdsource the labels such as Google's reCAPTCHA [12].
The labelling rate of sensor data can also dictate which approach to choose as data that frequently changes may require a higher labelling rate along with a more convenient labelling approach. The sample size is another factor that can dictate labelling approaches as the labelling of images can be automated or crowdsourced whereas a large sample size of sensor data requires recruiting many participants for an extended period. Crowdsourcing labels using web-based applications is often employed for images and audio data tagging as it is most commonly processed offline [13]. This is not possible with time-series data which has to be labelled online in real-time at the point of collection due to the nature of the data. Outsourcing the labelling of image, video and audio data to private companies is also gaining popularity although this is also not possible for sensor data as activities cannot be deduced from the raw data meaning real-time labelling techniques must be developed [14].
Our Vision. To meet the above challenge, we introduce LabelSens, a new framework for labelling sensor data at the point of collection. Our approach helps developers in adopting labelling techniques that achieve higher performance. In this paper we present five edge prototypes utilising different labelling mechanisms and provide a comprehensive performance comparison and analysis of these prototypes. In particular, two popular deep learning networks were tested: LSTM and GRU. Both were used to classify human generated, physiological activity data collected from 10 users.
Activity recognition is an established field; however, the methods used to label the collected sensor data are greatly under-explored. Researchers often manually label the activity participants undertake [15], which typically prevents the collection of data in the wild as it requires the researcher to continuously video participants' activities so that they can be labelled offline.
In this paper our contribution is two-fold: 1) We introduce a novel mechanism to label sensory data on edge computing devices while collecting training data for machine learning algorithms and 2) we present a systematic way to assess the performance of these labelling mechanisms.
Previous research has utilised smartphone applications to enable users to self-label their current activity using onscreen buttons [16]. However, it is not always possible to label using smartphone applications as they require explicit attention making them challenging to use when engaged in additional activities and not all users may own smartphones. Little research has been conducted to evaluate the feasibility and performance of other real-time labelling techniques that would be suitable for edge devices. Going beyond the data collection stage, we also examine the classification accuracy different labelling techniques produce.
The remainder of the paper is organized as follows: Section 2 presents related work whilst Section 3 introduces our experimental methods. Results are presented in Section 4 followed by discussion in Section 5. Potential applications are discussed in Section 6 and conclusions and future work are discussed in Section 7.

BACKGROUND: DATA LABELLING
There are numerous labelling approaches, which often vary depending on the data being collected. Sensor data is most commonly labelled using a hybrid approach where the sensor data is continuously recorded and the user occasionally records a label against all or part of the previously recorded data. The labelling of human activities increasingly relies on hybrid data collection techniques using smartphones to continuously record accelerometer data as well as enable users to self-report their current activity [16]. Smartphone applications are becoming increasingly popular to label sensor data as they provide a familiar, always accessible interface for users, although recently new smartphone labelling techniques such as NFC and volume buttons have been shown to be intuitive and popular when using an application is inconvenient [6].
Active learning [17] can be used to label data with few labelled training instances as the machine learning algorithm chooses the data from which it learns. Active learning could be beneficial for data where it is challenging to crowdsource labels such as raw sensor data that is not sufficiently labelled [18]. Additionally, active learning has been demonstrated to benefit labelling audio by removing the requirement for trained linguists to label the data and reducing the reliance on outsourcing and crowdsourcing labelling.
Legion:AR [14] used the power of crowdsourcing combined with active learning to label human activities. Active learning was used to automate the labelling process but was paired with real-time human labellers to label the data that cannot be correctly labelled automatically. However, this approach requires cameras to constantly monitor users so that the unlabelled activities can be tagged offline. This may be feasible in specific scenarios such as the workplace but would not be plausible in the wild. Another method to crowdsource human activities requires users to record short video clips of themselves performing different actions at home [19]. While crowdsourcing videos can result in ample data, it only allows for video data to be captured with no other sensor feeds and relies on the willingness of people to perform different activities on video.
The techniques used to label data vary greatly depending on the data type. Images can be labelled offline using an automated process based on clickthrough data, greatly reducing the effort required to create a labelled dataset [20]. Additionally, online tools have been developed that enable users to highlight and label objects within images. The use of an online tool allowed people from around the world to help label objects within images, which is simply not possible with sensor data [21].
Labelling audio data is similar to that of images as spoken words are often labelled in-house by linguistic experts or can be crowdsourced. There are many forms of audio labelling including genre classification, vocal transcription and labelling different sounds within the audio e.g. labelling where bird calls start and stop. One solution used album artwork, text reviews and audio tracks to label over 30000 albums to one of 250 labels using deep learning to provide the multi-label genre classification [22]. Vocal transcriptions to train machine learning models have typically relied upon manual labelling by trained linguists but more recently active learning has been used to reduce the number of trained labelled examples required [23].
While labelling specific sounds can be crowdsourced, encouraging individuals to correctly label data can be challenging as the task can be tedious. To increase compliance and engagement during labelling, previous research has developed games such as Moodswings [24] and TagATune [25] where players would label different sounds. TagATune demonstrates the ability to engage users in labelling data as 10 out of 11 players said they were likely to play the game again.
Text data from social media websites can be automatically labelled using the hashtags and emojis each post contains as these often describe the contents of the post; however, this can result in noisy data [30]. Alternatively, text can be manually labelled but this is a labour-intensive process. One solution to this problem involved training a machine learning model using a manually labelled dataset and then combining this with noisy emoticon data to refine the model through smoothing [31]. This method of combining labelled and noisy data outperformed models trained using just one datatype.
The real-time labelling of emotions or mental well-being is more challenging and often uses tangible interfaces as they provide a physical interface to interact with the digital sensor data being collected. Recent advances in pervasive technologies have allowed engineers to transform bulky and inconvenient monitors into relatively small, comfortable, and ergonomic research tools.
Emoball [32] has been designed to enable users to self-label their mood by squeezing an electronic ball. While this device only allows users to report a limited number of emotions, participants did believe it was simple to use and liked the novel interaction approach. An alternative method to label mood was explored using a cube with a face representing a different emotion on each side of the cube [33]. Users simply moved the cube to display the face that most represents their current mood, providing a simple, intuitive way for people to label data, although limited by the number of faces on the cube. Mood TUI [34] goes beyond self-reporting to a hybrid approach in order for users to record their emotions and relevant data from the user's smartphone including location data and physiological data such as heart rate. Participants found the use of a tangible interface very exciting, demonstrating the potential for them to increase the usability and engagement of labelling, but thus far they have not been utilised outside of self-reporting emotions.
Numerous methods of self-reporting emotions have been explored including touch, motion and buttons. These interaction techniques have paved the way for unique interactions with devices but the accuracy of the techniques has not been evaluated as previous research has not used the data collected for machine learning but purely as a method for individuals to self-report their well-being.
Sometimes it is not possible to interact with physical devices to label sensor data, such as when driving. One solution to this problem used the participant's voice to label potholes in the road [35]. When labelling fast-changing data such as road conditions it can be difficult to label the data at the exact time, so techniques to analyse sensor data windows near the label allow the exact pothole readings to be correctly labelled. Techniques such as these are vital to ensure that the sensor data is correctly labelled, as incorrectly labelled data will result in inaccurate machine learning models that will not be able to correctly classify any future data. Table 1 shows the current labelling approaches used, including in-house labelling and crowdsourced labelling that require user activities to be video recorded so they can be manually labelled offline. Similarly, automatic labelling can use large amounts of labelled video or sensor data to enable future data to be automatically labelled, dramatically reducing the time required to label but also reducing the accuracy with which the data is labelled. Alternatively, Generative Adversarial Networks (GAN) can be used to automatically generate further labelled data but a vast labelled dataset is first required and the synthetic data labels may be highly inaccurate.
In comparison, labelling at the point of collection is highly accurate as it is real-time, cost effective, time effective and enables in-situ data to be collected. Thus far, however, labelling at the point of collection has had limited use, mainly consisting of smartphone applications. There are numerous scenarios where labelling at the point of data collection would result in the most effective and accurate data but the requirement of a smartphone results in it being dismissed. When providing participants with tangible interfaces to collect sensory data, embedding a labelling method directly into the device removes the requirement to use a smartphone app while also creating a simple, tangible, easy to use method to label in real-time and in-situ.
Overall, there are numerous possibilities for text, audio and images to be labelled offline, unlike raw sensor data which must be labelled in real-time. Tangible interfaces have previously been used to self-report but the data is often not collected to train machine learning models and so the accuracy and validity of the labelling techniques have never been evaluated. Human activity recognition has been well-researched but the techniques to label the data have always either involved offline labelling or a mobile phone application. The use of tangible interfaces containing different labelling methods has not been considered. This research aims to explore the impact different labelling techniques embedded within tangible interfaces has on the accuracy of labelling, label rate and usability.

INTERFACES & EXPERIMENTAL SETUP
Labelling at the point of data collection provides many benefits including low cost, reduced time and the ability to label data in the wild. Tangible interfaces present many opportunities to embed unique physical labelling techniques that are easier to use than the comparable mobile phone applications that are currently used to collect in-situ labelled data. Furthermore, tangible interfaces provide ideal interfaces to directly embed a multitude of sensors, negating the need for participants to carry sensors in addition to a mobile phone to label the data. Embedding a labelling technique along with the sensors in a tangible interface, rather than in a separate mobile app, also ensures sensor data is being collected in real-time as the labels are recorded.
To evaluate the performance and capabilities of different labelling techniques for tangible interfaces, five prototypes have been developed that each contain a unique labelling technique and will be used to label human activity (walking, climbing downstairs and climbing upstairs):
• Two adjacent buttons (press one button for climbing upstairs, press the other button for climbing downstairs and press both buttons to record walking)
• Two opposite buttons (press one button for climbing upstairs, press the other button for climbing downstairs and press both buttons to record walking)
• Three buttons (one button each for climbing upstairs, climbing downstairs and walking)
• Force sensitive resistor to measure touch (light touch for walking, medium touch for climbing downstairs, hard touch for climbing upstairs)
• Slide potentiometer (slide to the left for climbing downstairs, slide to the middle for walking and slide to the right for climbing upstairs)
Each tangible interface is a 7cm * 7cm * 7cm 3D printed cube that contains a labelling technique combined with the required sensor and microcontroller. The embedded electronics include:
• Arduino Nano microcontroller, chosen due to its small size, open source nature and wide compatibility with sensors.
• Inertial measurement unit (IMU) to record motion data. An IMU with 9 degrees of freedom has been used as it integrates an accelerometer, a magnetometer and a gyroscope to provide better accuracy and additional data.
• Micro SD card reader to locally record the IMU sensor data along with the user inputted label.
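The label mappings listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the firmware running on the devices: the function names, the assignment of "button A" and "button B", the force thresholds and the split of the slider into equal thirds are all assumptions, and the 0 to 1023 range assumes a 10-bit analogue reading.

```python
from typing import Optional


def label_from_two_buttons(a_pressed: bool, b_pressed: bool) -> Optional[str]:
    """Two-button devices: one button = upstairs, the other = downstairs, both = walking.

    Which physical button is "A" is an arbitrary assumption for this sketch.
    """
    if a_pressed and b_pressed:
        return "walking"
    if a_pressed:
        return "upstairs"
    if b_pressed:
        return "downstairs"
    return None  # nothing pressed, so no new label is recorded


def label_from_force(reading: int, light: int = 100,
                     medium: int = 400, hard: int = 700) -> Optional[str]:
    """Force sensor: light = walking, medium = downstairs, hard = upstairs.

    The three thresholds are hypothetical analogue values, not measured ones.
    """
    if reading >= hard:
        return "upstairs"
    if reading >= medium:
        return "downstairs"
    if reading >= light:
        return "walking"
    return None  # below the light-touch threshold: treated as no touch


def label_from_slider(reading: int, adc_max: int = 1023) -> str:
    """Slider: left = downstairs, middle = walking, right = upstairs.

    Splitting the travel into three equal zones is an assumption.
    """
    third = (adc_max + 1) / 3
    if reading < third:
        return "downstairs"
    if reading < 2 * third:
        return "walking"
    return "upstairs"
```

Note that with a threshold scheme like this, a hard press necessarily passes through the light and medium ranges on the way up, which is one plausible reason a force sensor records intermediate labels.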
The buttons and slide potentiometer enable users to easily visualise the activity they are labelling whereas when using the touch sensor it is difficult to distinguish between the three levels of force. To visualise the selected label a multicoloured LED has also been incorporated into the device that changes from green to yellow to red when the device is touched with low, medium and high force. Figure 2 shows the electronic circuit and the developed tangible interface for each of the 5 labelling devices.

Fig. 2. Electronic circuits and developed tangible interfaces
This novel approach to in-situ labelling provides an easy to use interface that facilitates the collection of real-time labelled data. The five interfaces demonstrate the variety of labelling techniques that can be used in comparison to traditional app-based or offline labelling.
An experiment has been conducted that aims to explore the feasibility of different self-labelling techniques contained within tangible interfaces. Each participant used each tangible interface containing a different labelling technique for 2 minutes while undertaking 3 activities; walking, climbing upstairs and climbing downstairs. These activities allowed for each participant to experience the different labelling techniques as well as collect sensor data which can be used to examine the accuracy and performance of each labelling technique.

Labelling rate
The maximum labelling rate of the devices is a key factor in deciding a labelling technique as some forms of data can frequently change requiring a new label to be recorded multiple times every minute. To measure the maximum rate at which it is possible to label the data, each device was used continuously for 2 minutes to record the maximum number of label changes possible. Figure 3 shows the total number of times each label was recorded on each device.

Two adjacent and two opposite buttons show the lowest data rate for each of the three labels because of an intentional delay that is added to prevent mislabelling when clicking both buttons to record the third label. Three buttons shows significantly more data recorded, although very little data was recorded for one of the buttons, possibly due to the third button being more difficult to reach as each button is located on a different face of the cube. The touch sensor recorded a high label rate for all three labels because, to reach label 2 (high setting) by forcefully pressing the sensor, the user must first record labels 0 and 1 as they increase the force exerted on the sensor. The slider shows high labelling rates for label 0 and label 2 but not label 1 because it is simple to slide the slider from one end to the other but the slider was rarely located in the middle of the device long enough for the label to be recorded. It is also vital to compare the number of times the label changed over the 2-minute period to evaluate how simple it is to change label for each technique. Figure 4 shows the slider recorded the most label changes overall because of the ease of navigating between the labels and the high number of labels recorded, followed by two opposite buttons, which is surprising due to its low labelling rate shown in Figure 3. This demonstrates that while the use of buttons does not result in the highest labelling rate it is simple to switch between the different labels. Touch, three buttons and two adjacent buttons all performed similarly, showing there is little difference in accessing all of the labels in the devices.
Once all participants had used each device to label walking, climbing downstairs and climbing upstairs, the data was extracted, enabling comparisons to be established. The rate at which labels were changed during the collection of activity data shows three buttons recorded the fewest labelling changes for all users while two opposite buttons had the highest overall labelling rate, demonstrating that fewer buttons increases ease of use. Labelling via touch had a consistently high labelling rate for users but this again could be due to the requirement of looping through all of the labels to reach the desired level. Overall the slider and three buttons produced the lowest labelling rate during data collection, showing these labelling techniques should not be utilised with data that requires frequent labelling changes. Figure 5 shows the total number of in-situ recorded labels from all participants for each of the devices. Touch and slider have the most labels recorded as when using these labelling techniques each label must be cycled through to change the label. Two opposite buttons has the smallest number of labels, which is to be expected as a delay had to be added after a button press to prevent incorrect labelling. Because of the delay it was expected that two adjacent buttons would similarly have a low data rate but it achieved a higher data rate than three buttons, possibly because of the difficulty of accessing the three different buttons on different faces of the cube. This shows the position of the buttons has a greater impact on the number of labels recorded than the number of labelling techniques embedded into the device.
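The two quantities compared above, the number of times each label was recorded and the number of label changes, can be computed from a logged label sequence as in the following minimal sketch. The function name and the integer label encoding (0, 1, 2) are assumptions made for illustration; the actual log format on the SD card is not specified here.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def summarise_labels(labels: Sequence[int]) -> Tuple[Dict[int, int], int]:
    """Return (count of each recorded label, number of label changes)."""
    counts = Counter(labels)
    # A change is any position where the label differs from the previous one.
    changes = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return dict(counts), changes


# Hypothetical log: label 0 twice, then 1, then 2 three times, then 0 twice.
counts, changes = summarise_labels([0, 0, 1, 2, 2, 2, 0, 0])
```

Keeping the two measures separate matters because a device can record many labels while rarely switching between them, or the reverse.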

Algorithms
In order to identify the three activities from the sensor data collected, artificial neural networks (ANN) were used to develop three predictive models. The performance of three supervised deep learning algorithms was tested to classify the sensor data into three activity classes. A multilayer Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), a multilayer RNN with Gated Recurrent Unit (GRU) and a multilayer RNN with a stacked LSTM and GRU were selected due to their high performance and capabilities in classifying time series data. It is vital to utilise LSTM and GRU cells when working with sequential data such as human activity to capture long-term dependencies and mitigate the vanishing gradient problem. Recently the use of GRU cells has become increasingly popular due to their simpler design using only two gates, a reset gate and an update gate, rather than the three gates used by an LSTM: a forget gate, an input gate and an output gate. The use of a GRU can significantly reduce the time required to train models because of its simpler structure as it exposes the full hidden content to the next cell. GRUs have also been shown to outperform LSTM networks when there is a smaller training dataset but LSTMs should remember longer sequences than GRUs, outperforming them in tasks requiring modelling long distance relations [36] [37] [38] [39]. The data collected from each of the five devices was used to train the three models over 10 epochs with 10-fold cross-validation. Figure 7 shows the accuracy of each model. The stacked LSTM and GRU displayed little impact compared with the LSTM. The GRU outperformed the LSTM and stacked models for most labelling techniques with the exception of two adjacent buttons, where the LSTM network achieved the highest accuracy of all the labelling techniques at 92.8%. The GRU accuracies range between 68.5% and 89%, demonstrating the impact different labelling techniques have on a dataset and thus the accuracy of a classification model.
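For reference, the two GRU gates mentioned above can be written out explicitly. The following is one common formulation of the GRU update, not necessarily the exact variant used in the trained models; the symbols (input x_t, hidden state h_t, weight matrices W and U, biases b, logistic sigmoid σ, element-wise product ⊙) are notation introduced here for illustration, and some references swap the roles of z_t and 1 - z_t.

```latex
\begin{align}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{align}
```

Unlike an LSTM, there is no separate cell state and no output gate: the hidden state h_t is passed on in full, which is what keeps the parameter count and training time lower.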
The two adjacent buttons labelling technique achieved the highest accuracy of the devices which is unexpected due to its complex nature where 2 buttons represent 3 labels. The second most accurate device, touch, was also unexpected due to the more complex interaction required of pressing the device using varying levels of force to record the different labels. It is possible that the more complex action forced users to have a greater focus on labelling their activity resulting in more accurate labelling. This however may not be sustained if the device was to be used for several hours or days. Even though three buttons and the slider labelling techniques resulted in the lowest changing labelling rate, they result in consistently high accuracies in the three trained models. This demonstrates that although it may be more difficult to collect fast changing data with these techniques, the collected data is reliable and capable of producing accurate classification models.
The initial learning rate of the model was set to 0.0025 with a batch size of 32. The data sequences used during training have a length T=100 with an overlap of 20. Figure 9 shows the accuracy and loss of the combined user test data for all of the devices during each epoch when trained using the RNN with GRU. The loss for each of the models gradually decreases but the loss for the touch and slider decrease significantly as would be expected as they achieved the highest accuracy.
Fig. 9. Comparison of training accuracy and loss when using GRU on the total data collected for each device
Classification accuracy alone does not provide an informed overview of the most beneficial labelling technique. The f1 score, the harmonic mean of the precision and recall, for each label and device has been calculated, as shown in Table 2. "Walking" has consistently higher precision and recall compared to "upstairs", which has the lowest scores. The slider, two adjacent buttons and touch show the highest f1 scores, which demonstrate their consistency as a useful labelling technique. Even though three buttons had a higher accuracy than slider, its f1 score is extremely low when labelling "upstairs", demonstrating its unreliability. Cochran's Q test was used to evaluate the three different models (L=3) for each labelling technique, providing a chi squared value and Bonferroni adjusted p-value. Cochran's Q test is used to test the hypothesis that there is no difference between the classification accuracies across multiple classifiers; under this hypothesis the statistic is distributed as chi squared with L-1 degrees of freedom. Cochran's Q is similar to one-way repeated measures ANOVA and Friedman's test but for dichotomous data, as the classification will either be correct or incorrect, and can be applied across more than two groups unlike McNemar's test [40].
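The training sequences described in this section (length T=100 with an overlap of 20) can be produced with a simple sliding window. The sketch below is illustrative rather than the preprocessing actually used: it assumes "overlap of 20" means consecutive windows share 20 samples, so the window advances by 80 samples, and it assumes each window takes the most frequent label it contains. Neither detail is stated in the text.

```python
from collections import Counter
from typing import Any, List, Sequence, Tuple


def make_windows(samples: Sequence[Any], labels: Sequence[int],
                 length: int = 100, overlap: int = 20) -> List[Tuple[list, int]]:
    """Split a recording into fixed-length, overlapping windows.

    Each window is paired with its majority label (an assumption of this sketch).
    Trailing samples that do not fill a whole window are dropped.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    step = length - overlap  # 80 samples for T=100, overlap=20
    windows = []
    for start in range(0, len(samples) - length + 1, step):
        end = start + length
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((list(samples[start:end]), majority))
    return windows


# Hypothetical recording of 260 samples: first half label 0, second half label 1.
demo = make_windows(list(range(260)), [0] * 130 + [1] * 130)
```

With these defaults the 260-sample recording yields three windows starting at samples 0, 80 and 160.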
Assuming a significance level of α=0.05, Cochran's Q test shows that for touch, two adjacent buttons and three buttons the null hypothesis cannot be rejected, indicating that all three classifiers perform equally well. For the remaining labelling techniques, two opposite buttons and slider, the null hypothesis is rejected, showing there is a significant difference between the classifiers on those datasets. The F test was also performed to compare the three classifiers as it is regarded as analogous to Cochran's Q test. Assuming the same level of significance the slider rejects the null hypothesis in addition to two adjacent buttons confirming Cochran's results.
Cochran's Q test shows there is a significant difference between the models when trained on the two opposite buttons and slider datasets but does not show where the differences lie. To see which models contain the significant differences, the McNemar test was performed to compare the predictive accuracy of each model using the 2 datasets. Table 4 shows the resulting p-values when McNemar's test was performed. There is a significant difference between all of the models for both two opposite buttons and slider, with the largest difference being between LSTM and the stacked network for both datasets. This demonstrates that both the labelling technique and the model result in significant differences in the models' accuracy and reliability.
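As a worked illustration of the test described above, the following sketch computes Cochran's Q from per-sample correctness indicators for three classifiers. It is not the analysis code behind the reported results. The function name and the input layout are assumptions, the p-value uses the closed form exp(-Q/2) that holds only for a chi squared distribution with 2 degrees of freedom (L=3), and no Bonferroni adjustment is applied.

```python
import math
from typing import Sequence, Tuple


def cochrans_q(correct: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Cochran's Q for exactly three classifiers.

    correct[i][j] is 1 if classifier j classified sample i correctly, else 0.
    Returns (Q, p), where p is the chi squared tail probability with 2 d.o.f.
    """
    n_classifiers = len(correct[0])
    if n_classifiers != 3:
        raise ValueError("this sketch only supports L=3 classifiers")
    # G_j: number of samples classifier j got right.
    col_totals = [sum(row[j] for row in correct) for j in range(n_classifiers)]
    # L_i: number of classifiers that got sample i right.
    row_totals = [sum(row) for row in correct]
    total = sum(row_totals)
    denom = n_classifiers * total - sum(r * r for r in row_totals)
    if denom == 0:
        return 0.0, 1.0  # all classifiers agree on every sample
    q = ((n_classifiers - 1)
         * (n_classifiers * sum(g * g for g in col_totals) - total * total)
         / denom)
    p = math.exp(-q / 2.0)  # exact chi squared survival function for 2 d.o.f.
    return q, p


# Hypothetical outcomes for six test samples and three classifiers.
q, p = cochrans_q([[1, 1, 1], [1, 1, 0], [1, 0, 0],
                   [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```

For this toy input Q = 6.0 and p is roughly 0.0498, so the null hypothesis of equal accuracy would be rejected at α=0.05 before any multiple-comparison adjustment.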
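The pairwise comparison described above can be sketched as follows. This is an illustrative implementation of McNemar's test using the chi squared approximation with a continuity correction; whether the reported p-values used the continuity correction or an exact binomial version is not stated, so that choice, the function name and the input layout are assumptions.

```python
import math
from typing import Sequence, Tuple


def mcnemar(correct_a: Sequence[int], correct_b: Sequence[int]) -> Tuple[float, float]:
    """McNemar's test on paired correctness indicators (1 = correct, 0 = wrong).

    Returns (statistic, p), with p taken from chi squared with 1 d.o.f.
    """
    # Only the discordant pairs carry information about a difference.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if only_a + only_b == 0:
        return 0.0, 1.0  # the two models never disagree
    stat = (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)
    # Chi squared survival function for 1 d.o.f. via the complementary error function.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Hypothetical example: model A alone is right on 10 samples, model B alone on 2.
model_a = [1] * 10 + [0] * 2 + [1] * 5
model_b = [0] * 10 + [1] * 2 + [1] * 5
stat, p_value = mcnemar(model_a, model_b)
```

Running the test once per pair of models (LSTM vs GRU, LSTM vs stacked, GRU vs stacked) locates the differences that Cochran's Q only detects in aggregate.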

DISCUSSION
To ensure the effectiveness of the labelling techniques it is also vital to gain users' preference. 50 users were asked which labelling technique they preferred. Figure 10 shows the results from the 50 users, with 28% preferring three buttons as it was simple to understand and use due to there being one label per button, although it did not result in accurate models. Two adjacent buttons and two opposite buttons followed, again possibly due to the simplicity and familiarity of buttons to label data. The slider was well received but the granular control made the middle label more difficult to access, meaning careful consideration had to be made to ensure actions were being correctly labelled. Finally, the fewest people preferred the touch-based labelling technique due to the complexity of having to touch with varying pressure to correctly label the data. However, touch did result in highly accurate models, showing that while the increased attention required is not preferred it does ensure accurate data labelling, but this may not be sustained over long periods.
While the user preference of labelling technique does not correlate with the accuracy achieved for each method, it shows the benefits of using buttons as they are well-received by users and also achieve high accuracy. A lower number of buttons than labels is well-received by users and achieves the highest accuracy, but the number of buttons must remain similar to the number of labels to ensure users do not experience confusion when labelling. The position of the buttons has been shown to impact user preference. In terms of labelling rate and model accuracy, two adjacent buttons were preferred by users and resulted in 24.3% higher model accuracy than two opposite buttons, which had a higher total number of recorded in-situ labels but a lower labelling rate.
It is imperative to balance user preference with the rate at which the data needs to be labelled and the accuracy required from the model when selecting an appropriate labelling technique.
Novel labelling methods, including the slider and touch, displayed their own strengths and weaknesses. Labelling using touch resulted in high model accuracy and labelling rate but was the least favoured by users. If accurate labelling is required for only short periods, labelling via touch could be ideal. The slider was liked by users and had the highest labelling rate but achieved the second worst accuracy of all the devices at 73.4%, showing the slider is best for continually changing data or granular data that would be more difficult to label with buttons. One of the most significant challenges encountered was the inconsistent quality of labelled data, as when collecting in-situ data to train machine learning models it is not possible to ensure all users are successfully labelling their actions. By not accompanying users during the labelling process, the experiment more closely replicated in-situ data labelling, resulting in the different labelling rates experienced even though all users were instructed to walk the same route.
Additionally, as users had to repeat the experiment five times to enable them to use each device, their labelling rate may have changed as they became more familiar with the experiment. To combat this, users were provided with the devices in varying orders, preventing the same device from being used by all users at the same stage of the experiment.
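One simple way to vary the device order across users is a cyclic rotation, in which each participant starts from a different device. The sketch below is illustrative only: the exact ordering scheme used in the study is not specified, and the function name `device_order` and the use of a participant index are assumptions made for the example.

```python
# The five labelling devices evaluated in this work.
DEVICES = [
    "two opposite buttons",
    "two adjacent buttons",
    "three buttons",
    "slider",
    "touch",
]


def device_order(participant_index, devices=DEVICES):
    """Return the device order for one participant using a cyclic rotation.

    Consecutive participants start from different devices, so across any
    len(devices) consecutive participants each device appears exactly once
    at every stage of the experiment.
    """
    shift = participant_index % len(devices)
    return devices[shift:] + devices[:shift]


# Print the orders for the first five (hypothetical) participants.
for participant in range(5):
    print(participant, device_order(participant))
```

A cyclic rotation balances the stage at which each device is used but not which device precedes which, so a fully randomised or balanced Latin square ordering may be preferable where carry-over effects between devices are a concern.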

APPLICATIONS
AI-powered edge computing has numerous potential applications as it is not always possible to label real-time data using a smartphone application. Common uses for tangible labelling techniques include times when users may be engaged in other activities, such as labelling whilst physically active. Additionally, tangible labelling techniques are needed in cases where specialist sensors are required to collect labelled data, such as physiological sensors used to label mental well-being. The tangible labelling techniques discussed provide new opportunities to label real-time sensor data that has traditionally been challenging to label.

CONCLUSION
Tangible user interfaces are ideal for data collection and for running real-time machine learning classifiers, but first real-world labelled data must be collected. Images, video and audio data can all be labelled offline, but this is not possible with time-series sensor data. To address this issue and collect in-situ labelled sensor data, five different labelling techniques have been embedded into tangible user interfaces: two opposite buttons, two adjacent buttons, three buttons, slider and touch. The devices were used by participants to label three physical activities, enabling the performance of each technique to be evaluated. It is vital to compare different labelling techniques as machine learning models can only be as accurate as the labelled data they are trained on.
The results from participants using the five labelling techniques to train various RNNs demonstrate that while touch results in a high labelling rate and high model accuracy, it is the least favoured by users due to the high level of concentration required to use the device. The slider resulted in high user preference and labelling rate but poor model accuracy. Two adjacent buttons had both high user preference and the highest model accuracy, showing it is the most beneficial technique for this data collection.
Overall, embedding labelling techniques within tangible user interfaces addresses many of the challenges facing the collection of in-situ, time-series sensor data. When collecting labelled data, the nature of the data, labelling rate, duration of data collection and user preference all need to be considered to ensure the most effective labelling technique is used. This will increase the reliability of collected labelled data and enable more accurate machine learning classifiers.

Eiman Kanjo is an Associate Professor in Mobile Sensing & Pervasive Computing at Nottingham Trent University. She is a technologist, developer and an active researcher in the area of mobile sensing, smart cities, spatial analysis, and data analytics, who worked previously at the University of Cambridge, Mixed Reality Laboratory and the University of Nottingham as well as the International Centre for Computer Games and Virtual Entertainment, Dundee. She has authored some of the earliest papers in the research area of Mobile Sensing and currently carries out work in the area of Digital Phenotyping, Smart Cities, technologies for Mental Health and the Internet of Things for Behaviour Change.
Andreas Oikonomou is a Senior Lecturer at Nottingham Trent University where he teaches Computer Science, Human-Computer Interaction and Project Management. Dr Oikonomou previously taught Computer Science and Games Development at Derby and Coventry Universities in the UK and has also worked as a Technical Project Manager, Quality Assurance Manager and games studio head in the software and games development industry.