1 Introduction

Skin-to-skin touch is an important tactile interaction. This type of touch has attracted the attention of many authors (e.g. [9, 24, 31]) and motivated research across many fields; from philosophy [16, 23], cognitive neuroscience [2, 5, 10, 20, 28], to human development and well-being [1, 4, 8, 25]. All these works are based on introspection or on behavioural observations since, by necessity, they cannot rely on the objectification of the mechanical consequences, hence of the sensory consequences of skin touching skin. It was however recently been realised that the objectification of tactile interactions is possible when hands actively interact with inanimate objects [12, 29, 30]. Motivated by this observation, some of us collected a database of vibration signals collected from a index finger interacting with another finger, or a forearm, with a view to provide objective data produced during skin-to-skin interactions [17]. This latter study demonstrated that the “tactile waves” measured on a touching finger bore features related to the interaction, to wit, the pressure applied (a tonic characteristic) and the sliding speed (a kinematic characteristic). The signals were shown to be relatively independent from the posture with which the interaction was effected, making this technique potentially useful for analyses about tactile behaviour [17].

1.1 Present Study

In the present study we advanced the hypothesis that information contained in single-channel vibration signals recorded from a finger in sliding contact with another finger, or with a forearm, contained information that would enable the discrimination between self-touch and touching another person. In the foregoing, we show that certain supervised machine classifiers can achieve a very high level of success in deciding whether tactile vibrations came from self-touch or from other-touch (touching another person). Supervised machine classifiers trained models through ground-truth labels which are indicated here by self and other. Our findings could contribute to the study of behaviour in many domains, chief among them is the study of the role of touch in social interactions and investigations related to cognitive conditions such as autism or schizophrenia where self/other touch discrimination might be impaired [8, 31]. It also bears the intriguing conclusion that the determining factors differentiating self from ordinary touch are not limited to a unique convergence of sensory and motor signals [3, 14, 15, 23] but that the tactile inputs per se contain cues that are special to self-touch.

1.2 Signal Database

A key attribute of the database is that it included signals recorded in similar conditions of pressure and speed during self-touch but also when touching another person. The vibrations recorded from a finger sliding on skin clearly depended on the pressure applied and on the sliding speed [17]. On this account, it would be surprising if machine learning classifiers were not able to discriminate between categories of intensity or speed. In fact, the multichannel whole-hand recordings described in [29] contained sufficient information to enable a support vector machine classifier to categorise twelve different tactile gestures, three types of materials, as well as the shape of the objects being touched.

1.3 Feature Extraction

Since the data at hand were available in form of time-dependent signals arising from mechanical interactions between objects in contact, it stands to reason that techniques developed to classify sounds would also be appropriate to classify signals arising from sliding fingers. A first option was to train and then to test classifiers using raw data. Another possibility was to extract domain knowledge features from the signals.

Features frequently used in the processing of sound include: Maximum Mel Frequency Cepstral Coefficients (MFCC), a quantification technique for vibratory signals [11], minimum MFCC, mean MFCC, Zero-Crossing Rate which capture the rhythmic features of a signal [13], Chromograms which are commonly used for the analysis of musical sounds [27], Spectral Roll-Off which measures the right skewness of a spectrum [18], Spectral Flux which describes rate of change a time-varying spectrum arising from a non-stationary process [21], and Pitch which is a well-known instantaneous attribute of sounds [26]. Features do not contribute equally significant to a given classification problem. Their significance can be assessed through the decrease of accuracy in classification when a feature is dropped. To this end, GINI importance, or mean decrease in impurity (MIDI), may be used to evaluate the importance of each feature [6].

1.4 Performance Measures

The performance of a classifier is relative to a test dataset used to examine the model trained with a train dataset. Here we use standard performance metrics. Accuracy is measured by

$$\begin{aligned} \text {Accuracy} = (N_\mathrm{tp} + N_\mathrm{tn}) / (N_\mathrm{tp} + N_\mathrm{fp} + N_\mathrm{fn} + N_\mathrm{tn}), \end{aligned}$$

which accounts for the number, N, of predictions labeled as true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn), commonly expressed in percent, while precision is defined by the proportion of true positive predictions to the total number of positive predictions, and recall which reaches one when there are no false negatives. We also used a statistical measure termed, \(F_1\)-Score, which is the harmonic mean of precision and recall. These metrics are recalled below.

$$\begin{aligned} \begin{aligned} \text {Precision} = N_\mathrm{tp} / (N_\mathrm{tp} + N_\mathrm{fp}), \quad \text {Recall} = N_\mathrm{tp} / (N_\mathrm{tp} + N_\mathrm{fn}), \\ F_1\text {-Score} = 2 (\text {Recall} \cdot \text {Precision}) / (\text {Recall} + \text {Precision}).\quad \end{aligned} \end{aligned}$$

We used a graphical representation borrowed from Signal Detection Theory [22]. Here, a Receiver Operating Characteristics (ROC) curve plots the false positives vs. true positives. It can be interpreted as a plot of 1-sensitivity vs. sensitivity. Even though these curves may cross, any curve clearly above another is better. A single-number separability measure, the area under the ROC curve (AUC), follows from this representation. An AUC of 0.7 is said to be acceptable, excellent if it around 0.8, and outstanding above 0.9.

1.5 Ambiguity and Abstention

In the present study we employed a recently introduced technique which proposes that in case of ambiguity it is better to abstain rather than to make a prediction [7]. This technique can be implemented, for example, in a three-way random forest algorithm which can extract probabilities for each class. Given two probability thresholds, \(\alpha \) and \(\beta \), having value 1.0 for the ground-truth class and 0 for the incorrect class, predictions are declared positive when the score of the positive class is greater than \(\alpha \) and the score the positive class is greater than the score of the negative class. Conversely, predictions are declared negative when the score of the negative class is greater than \(\beta \) and the score of the negative class is greater than the score of the positive class. When neither of these conditions are met, then there is an abstention. Typical values for \(\alpha \) and \(\beta \) are 0.75.

2 Results

The skin-to-skin touch datasets described in [17] contained signal recorded with eighteen participants of balanced gender and hence captured a reasonable diversity of individual behaviours. The signals were recorded at audio-rate and down-sampled 10-fold. The initial one second interval of each recording was edited out to eliminate the energy burst due to the stick-to-slip transition. This deletion potentially eliminating useful information for the purpose of this study. The pressure dataset comprised ten 10-s recordings where participant touched ten times for each condition their own or the other participant’s index finger with a gentle or firm touch, resulting in 720 trials. The speed dataset comprised similar recordings but the participants touched their own or the other participant’s forearm at three different speeds giving rise to 1080 trials. The posture dataset comprised similar recordings but the participants touched their own or the other participant’s index finger in two different orientation to vary the relationship between the sensor and the regions of skin contact, resulting in 720 trials.

2.1 Relative Performance of Classification Techniques

Table 1 shows the performance of various classification techniques using the pressure dataset suggesting that the random forest classifier performed best compared to other classifiers. It produced negligible Mean Squared Error (MSE) for the test dataset with a classification accuracy of 81% greatly surpassing logistic regression, decision tree, Gaussian, and support vector classification.

Table 1. Classifier performance & Importance of feature extraction. Accuracy in %.

2.2 Importance of Feature Extraction

Tests conducted with the combined datasets to evaluate the contribution of extracted features vs raw data unequivocally confirmed the importance of providing the algorithms with extracted domain knowledge features. Most metrics, Table 1, show low to unacceptable values when raw data was used. Figures 1a, b further indicate that classification models become skilful with the introduction of domain knowledge features since the figure shows across-the-board reduction in false positives rate.

Fig. 1.
figure 1

ROC curves. a, Weak performance of classifiers on the raw data of combined datasets. b, Overall effect of domain knowledge feature extraction on different datasets.

The results shown in Table 2 indicate that a three-way classification algorithm with abstention greatly improved discrimination between the labels self and other. Accuracy was increased from 81% to 92% with the pressure dataset, from 78% to 97% with the speed dataset, and from 76% to 84% with the posture dataset. For the case of the combined dataset accuracy was increased from 73% to 90%, which is very significant.

Table 2. Three-way classification of self and other using abstention.

Figure 2 summarizes the GINI importance of the different domain knowledge features used relatively to the datasets.

Fig. 2.
figure 2

GINI importance of features. This measure is plotted for each features during three-way random forest classification of different datasets. Features are ranked by order of decreasing importance with combined data.

2.3 Discussion and Conclusion

Overall, the best performance was achieved by the three-way random forest classification algorithm which makes use of an ensemble learning method though a multitude of decision trees. The mutually exclusive branches represent sub-categories of the input features. The random forest classifier overcomes overfitting though voting to predict an output. The technique known as bootstrap aggregating de-correlates the decision trees corresponding to different training sets. Noise in single tree affects the performance of the model but not the average of many trees. This strategy was very successful in the classification of self-touch and other-touch (labels self and other) from single-channel recordings of tactile waves in the index finger.

As a whole, the eight domain knowledge features showed little relative advantages over the others in the task of discriminating self-touch from other-touch. The mean and max MFCC features which made important contributions when the speed of sliding contact varied could be considered as exceptions. Also, the zero-crossing feature was important when the posture was changed. Chromogram and pitch had the highest importance in terms of classification with combined data. These findings indicates that none of the commonly used audio features preferentially revealed the latent characteristics of skin-to-skin friction-induced vibrations that can be used to distinguish self-touch from touching other people. It is possible that the removal of the stick-to-slip frictional transitions was responsible for this general lack of sensitivity. These findings therefore suggest that further research is needed to discover better domain knowledge features for this type of data.

The AUC measure was found to be a preferred performance indicator over accuracy, an observation that was commented in [19] although it is recommended to consider different evaluation metrics to discuss performance of a classifier on a particular problem. This observation is validated by the ROC curve obtained under change in posture compared to that obtained under change in speed.

It is astonishing that some of the machine learning algorithms could reach such very high level of performance in discriminating self- from other-touch. Common sense would suggest that the applied pressure would be a factor but the results suggest otherwise. The same can be said of the speed of sliding. If it was an important factor, then the dataset where speed was purposefully varied would have led to poor performance, which was not the case. Thus, surprisingly, the latent characteristics that enabled discrimination between self- and other-touch were not related to neither the tonic nor the kinematic attributes of the gestures employed. Since it is not at all obvious which signal characteristics these algorithms exploited to achieve discrimination, future research will seek to identify which invariant properties hidden in the recorded tactile signals were used by the classifier to discriminate self- from other-touch.