Introduction

There are three main levels at which gender equality is considered in Horizon Europe, the European Union’s key funding program for research and innovation [1]:

  1. Having a Gender Equality Plan (GEP) in place becomes an eligibility criterion for certain categories of legal entities from the EU and associated countries.

  2. The integration of the gender dimension into research and innovation content is a requirement by default, an award criterion evaluated under the excellence criterion, unless the topic description explicitly specifies otherwise.

  3. Increasing gender balance throughout the program is another objective, with a target of 50% women in Horizon Europe related boards, expert groups and evaluation committees, and gender balance among research teams set as a ranking criterion for proposals with the same score.

In this paper, we analyze the main dynamic characteristics of online handwriting signals in different tasks, in line with level 2 of Horizon Europe. While this kind of signal has recently attracted the interest of the scientific community, few efforts have been made to analyze whether there are significant gender differences in healthy subjects as well as in those with some pathologies. We think that answering the fundamental question “Are there significant differences between online handwritten tasks performed by both genders?” is mandatory to develop gender-bias-free e-health and e-security applications based on online handwritten tasks. In addition, understanding these gender differences will help to improve cognitive systems that deal with handwriting signals. While cognitive computing has been used, for instance, in customer profiling by means of meta-classification for gender prediction [2], we think that a deep knowledge of gender differences in handwritten tasks will help to improve the classification accuracies of cognitive systems and to avoid gender biases.

A paradigmatic example of gender differences is found in heart attacks [3]. Although the incidence of cardiovascular disease (CVD) in women is usually lower than in men, women have higher mortality and worse prognosis after acute cardiovascular events. These gender differences exist in various CVDs, including coronary heart disease, stroke, heart failure, and aortic diseases. These differences have caused widespread concern, and the consideration of gender differences is of great importance for the prevention, diagnosis, treatment, and management of CVD.

In signal processing, we can easily compare, for instance, speech and online handwriting signals. Both are dynamic signals that permit us to express ourselves, and both require a high cognitive effort. However, as we will describe next, little effort has been devoted to studying the relevance of gender in handwriting signal analysis, and this should be corrected in order to develop science in a fair way that is aligned with European directives.

Speech Signals

In speech signal analysis, it is well known that gender differences exist. The clearest example is pitch. In general, women speak at a higher pitch, about an octave higher than men. This is mainly due to morphological aspects such as the vocal tract length and the dimensions of the oral cavities, and it has significant practical implications. For instance, in [4], the authors studied the effect of age and gender on emotion recognition applications. They compared the performance of four different models and presented the relationship between age/gender and emotion recognition accuracy. Experimental results showed that using a separate emotion model for each gender and age category gives a higher accuracy than using one classifier for all the data.

In [5], the authors identified that the associations between hearing loss (HL) and cognitive impairment varied according to gender in older community dwellers, suggesting that different mechanisms are involved in the etiology of HL.

Another different topic is behavioral sex differences. In [6], the authors found that many popular conceptions of behavioral sex differences are founded more on personal beliefs than on facts. Some of these beliefs include that women speak more, speak faster, leave more sentences unfinished, and operate at a simpler conceptual level than men.

Online Handwriting Signals

In online handwriting signal analysis, the influence of gender has attracted less interest from the scientific community than in other signals such as speech. Most of the existing literature dealing with gender and handwriting signals is aimed at gender classification [7,8,9,10,11,12,13,14]. For example, in [14], Likforman-Sulem et al. employed a support vector machine (SVM) classifier and modeled a set of dynamic handwriting features extracted from a sentence copy task in order to stratify a cohort into males and females. They reached 65% accuracy, suggesting that the gender effect is present but with a relatively small impact. This finding is in line with the study of Liwicki et al. [11], who utilized Gaussian mixture models trained on kinematic and spatial features and reached a classification accuracy of 64%. The team also reported that, even though this accuracy is low, it outperforms classification based on offline handwriting, which reached an accuracy of 55%. On the other hand, in [13], the authors quantified an offline handwritten text by spatial (geometric) features and achieved 74% gender classification accuracy using the Random Forest algorithm. Although this accuracy differed slightly across languages (English and Arabic), it again highlighted that the gender effect in handwriting processing should be taken into account and further explored. For example, the conclusions of the Kaggle competition (which hosted more than 190 teams) suggest that the effect of gender is also associated with the age of the writer [12].

Nevertheless, beyond the above-mentioned studies, some important questions concerning gender must still be addressed, the most important one being: are there gender/age differences in the handwriting of healthy subjects that must be taken into account when analyzing pathologies such as Alzheimer’s or Parkinson’s disease? The first step would be to identify these differences. If they exist, it is important to obtain deep knowledge about them, in the same way that we know that the pitch of males and females is significantly different. Age differences can be alleviated by the fact that control groups in pathology analysis are usually paired by age, or, in the case of imbalanced datasets, age is considered as a confounding factor and can be regressed out. However, it has not been deeply analyzed whether significant differences exist in handwritten tasks performed by males and females. If differences exist in speech signals, then why not in handwritten tasks?

The second important question is about the correlation between different handwritten tasks performed by the same user: if a user exerts a high pressure and exhibits higher writing speed than the standard population, is this result independent of the performed task? And, are there gender differences?

In this paper, we want to address the question about differences in online handwritten tasks due to gender. We are mainly interested in temporal, kinematic, and dynamic features such as time in-air/on-surface, pressure, acceleration, and complexity (measured by Shannon entropy). In order to respond to the questions, we performed a set of experiments on the BIOSECUR-ID database [15].

For a recent description of the state of the art, applications and future trends in online handwriting analysis, see [16, 17]. Usually, the scientific literature has ignored the separate analysis by gender, but we can forecast increasing attention to this issue, as improved classification accuracies in e-health and e-security could be obtained by considering the gender issue. From this point of view, we consider that a good starting point is the analysis of different online handwritten tasks performed by healthy subjects (when performing the same task). Obviously, this limits the analysis of other levels of information such as differences in word usage and expressions, which could be evaluated by means of a handwritten free text analysis. Nevertheless, this requires an acquisition of a new database, which is beyond the scope of this paper.

We also skip some interesting questions about gender-dependent skills in e-security and e-health applications, which should be addressed in future research work:

  • Is any gender more skilled at producing forgeries in biometric recognition? This topic has been analyzed in the case of offline Arabic handwritten signatures in [18]. The authors concluded that women had a marginal advantage in simulating all elements of the signatures, but there was no statistically significant difference between the genders on any of the elements examined.

  • Is any gender more skilled at producing handwritten tasks that require a cognitive effort, such as the complex figure copying test? To respond to this question, a large database of healthy people performing drawing tasks is required.

Experimental Analysis

This section describes the experimental database and the obtained results.

Database

We have analyzed the BIOSECUR-ID [15] database, which consists of 330 subjects (46% females, 54% males), who provided handwritten samples in four acquisition sessions. The participants performed the following tasks:

  • Text in cursive letters (TXT)

  • Numbers from zero to nine (NUM)

  • Eight words written in capital letters (WORD)

  • Genuine signature (SIGN)

  • Fake signature trying to imitate another user’s signature (SIGN fake)

The database was acquired with the Wacom Intuos3 A4 tablet in combination with the Wacom Inking pen. It provides a resolution of 5080 dpi, 1024 pressure levels, and a spatial accuracy of ±0.25 mm.

Figure 1 shows the population distribution for both genders. Experimental mean and standard deviation reveal that both populations present a similar distribution.

Fig. 1 Histograms with population distribution for males and females in the BIOSECUR-ID database

Handwriting Features

The parameters used to carry out the analysis were:

  • The time the pen has spent lifted from the tablet (tup) and the time the pen has spent on the surface (tdown), each computed as the mean over all the realizations done by males and females separately, together with the standard deviation

  • The mean of the pressure (\(\overline{p }\)) and its standard deviation

  • The mean of the speed (\(\overline{s }\)) and acceleration (\(\overline{a }\)) and their standard deviation

  • The entropy of the variables x, y, and p (Hx, Hy, Hp) and their standard deviation, where x and y are the spatial coordinates and p is the pressure exerted with the pen by a writer
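The features listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the exact procedure used in the experiments: the function names, the uniform sampling interval `dt`, the convention that a pressure of zero marks an in-air sample, the use of finite differences for speed and acceleration, and the computation of entropy over the empirical distribution of raw values are all assumptions introduced here.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def _mean(values):
    """Arithmetic mean, returning 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def handwriting_features(samples, dt):
    """Compute per-recording features from (x, y, p) samples taken every dt seconds.

    Assumption: p == 0 marks an in-air sample and p > 0 an on-surface sample.
    """
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    ps = [s[2] for s in samples]

    n_down = sum(1 for p in ps if p > 0)
    t_down = n_down * dt              # time on-surface
    t_up = (len(ps) - n_down) * dt    # time in-air

    # Speed: Euclidean displacement between consecutive samples per unit time.
    speeds = [
        math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) / dt
        for i in range(len(xs) - 1)
    ]
    # Acceleration (assumed): absolute change of speed per unit time.
    accels = [abs(speeds[i + 1] - speeds[i]) / dt for i in range(len(speeds) - 1)]

    return {
        "t_up": t_up,
        "t_down": t_down,
        "mean_p": _mean([p for p in ps if p > 0]),  # on-surface samples only (assumed)
        "mean_s": _mean(speeds),
        "mean_a": _mean(accels),
        "H_x": shannon_entropy(xs),
        "H_y": shannon_entropy(ys),
        "H_p": shannon_entropy(ps),
    }
```

Per-gender means and standard deviations would then be obtained by applying such a function to every recording and aggregating the results separately for males and females.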

Experimental Results

Table 1 shows handwriting features for different tasks using recordings from session 1.

Table 1 Mean and (standard deviation) for different tasks and features for males (M) and females (F)

It is worth mentioning that males and females have been taught during the schooling period to write texts in the same way. However, the signature is something personal, chosen by the individual, and is therefore more prone to gender differences.

The following figures depict scatter plots of a specific feature for a pair of tasks. Males are depicted in red and females in green. It is worth mentioning that, when comparing different tasks performed by the same user, there is a large correlation, as the dots appear aligned. Figure 2 shows the five analyzed features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) for the text in cursive letters (x coordinate) versus numbers (y coordinate). Figure 3 shows the same features for the cursive text (x coordinate) versus capital letters (y coordinate). Figure 4 is analogous for numbers versus capital letters, and Fig. 5 compares genuine signatures versus skilled forgeries performed by the same user. Figures 6 and 7 compare cursive text and capital letters, respectively, with genuine signatures, and Figs. 8, 9, and 10 compare cursive text, numbers, and capital letters, respectively, with forged signatures.

Fig. 2 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in the cursive text versus numbers for males (red dots) and females (green dots)

Fig. 3 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in the cursive text versus capital letters for males (red dots) and females (green dots)

Fig. 4 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in numbers versus capital letters for males (red dots) and females (green dots)

Fig. 5 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in genuine signatures versus skilled forgeries performed by the same user for males (red dots) and females (green dots)

Fig. 6 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in the cursive text versus genuine signatures performed by the same user for males (red dots) and females (green dots)

Fig. 7 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in capital letters versus genuine signatures performed by the same user for males (red dots) and females (green dots)

Fig. 8 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in the cursive text versus forgery signatures performed by the same user for males (red dots) and females (green dots)

Fig. 9 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in numbers versus forgery signatures performed by the same user for males (red dots) and females (green dots)

Fig. 10 Features (tup, tdown,\(\overline{p }\), \(\overline{s }\), \(\overline{a }\)) in capital letters versus forgery signatures performed by the same user for males (red dots) and females (green dots)

Table 2 shows Pearson’s correlation coefficients for a specific feature when performing two different tasks (for instance, the first cell in the upper left corner of the table represents the correlation between the tup required to perform a text in cursive letters versus numbers).

Table 2 Correlation coefficients and corresponding (p values) for different features extracted from a pair of tasks. High and low correlation values are highlighted in green and red colors, respectively

Significant correlation values are always positive in our case. In addition, all the correlations are significant with p < 0.001, except for those involving forgeries. This is probably because, in contrast to the other tasks, producing forgeries is an unusual handwriting activity in daily life, so its dynamics are quite different from those of “normal” handwriting.
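As an illustration of how one cell of such a correlation table could be obtained, the following sketch computes Pearson’s correlation coefficient between the values of one feature measured on the same users in two tasks. The function name and the pairing of values by user are assumptions made for this example. The sketch returns only the coefficient; the associated p value would normally be obtained from a statistics library such as SciPy (`scipy.stats.pearsonr` returns both).

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences.

    a[i] and b[i] are assumed to be the same feature measured on user i
    in two different tasks (for example, tdown in cursive text and in numbers).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)
```

When the dots of a scatter plot lie close to a straight line with positive slope, this coefficient approaches 1.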

Considering the following range of values:

  • [0, 0.2]: no association

  • (0.2, 0.4]: very weak association

  • (0.4, 0.6]: moderate association

  • (0.6, 0.8]: strong positive association

  • (0.8, 1]: very strong positive association

We observe:

  • Very similar behavior when comparing males and females

  • No association between the pressure exerted by a user when performing her/his own signature or trying to imitate another’s signature

  • Stronger correlations between features extracted from handwritten text (cursive, numbers and capital letters) than between signatures and handwritten text. This opens the possibility for improved biometric accuracies when combining text and signature
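The ranges of values used above to interpret the correlation coefficients can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and, since the ranges are defined only for values between 0 and 1, the function rejects anything outside that interval rather than guessing an interpretation.

```python
def association_label(r):
    """Map a correlation coefficient in [0, 1] to the qualitative ranges used in the text."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("ranges are defined only for 0 <= r <= 1")
    if r <= 0.2:
        return "no association"
    if r <= 0.4:
        return "very weak association"
    if r <= 0.6:
        return "moderate association"
    if r <= 0.8:
        return "strong positive association"
    return "very strong positive association"
```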

Another interesting question is whether, for a specific task and feature, there are significant differences between males and females. In order to respond to this question, and since the features are not normally distributed, we performed the Mann–Whitney U test with the null hypothesis that the two samples come from distributions with equal medians. Table 3 summarizes the p values for different tasks and features.

Table 3 p values of the Mann–Whitney U test for different tasks and features. The p values are computed by considering the two classes (males and females)
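For readers unfamiliar with the test, the sketch below shows one way a two-sided Mann–Whitney U test can be computed for the values of one feature in the two classes. It is a pure-Python illustration under stated assumptions, not the implementation used to produce the table: it relies on the normal approximation with tie correction and no continuity correction, and the function name is hypothetical. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    Assumption: no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i                   # size of this group of tied values
        tie_term += t ** 3 - t
        i = j

    # Rank sum of the first sample and the corresponding U statistic.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0               # all values identical: no evidence of a difference
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, min(p_value, 1.0)
```

A call such as `mann_whitney_u(values_males, values_females)` would be repeated for every combination of task and feature, with the two input lists being hypothetical containers of per-user feature values.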

Considering as significant those results with p < 0.05, we observe:

  • Signatures of males and females are different in time in-air and on-surface, speed, and acceleration. This is not surprising, as signatures are personal traits and each person has their own signature shape. On the other hand, the exerted pressure is not significantly different.

  • In the case of faked signatures, we observe that the dynamics measured by speed and acceleration are not significantly different. This is probably because BIOSECUR-ID forgeries have not been performed by professional forgers and, in some sense, males and females perform this task in a way that is closer to copying a drawing than to producing a signature.

  • The time in-air is not significantly different between males and females, while the time on-surface differs significantly for numbers and cursive text but not for capital letters. In fact, according to Table 1, females required on average 10.2% and 10.6% more on-surface time for cursive text and numbers, respectively, whereas the extra time for words in capital letters is 7.4% on average. This is probably because words in capital letters are produced in a simpler way, with strokes that consist mainly of straight lines, so there is less room for differences.

  • There is no significant difference in the pressure exerted by males and females in any of the evaluated tasks except for words in capital letters. In this case, according to Table 1, males exerted 10.2% higher pressure than females. A similar conclusion is found in speed and acceleration.

Conclusions

The experimental results are in agreement with our previous experiments, which revealed poor accuracies when trying to identify whether a handwritten text in capital letters was produced by a male or a female [19, 20], using both automatic classification and human expert classification. We reported an identification rate of up to 70%, which can be considered low for a two-class problem (flipping a coin would yield an accuracy of 50%). This is in agreement with the fact that few differences appear in time in-air and on-surface in this task.

The main conclusions of this work are that:

  • No significant differences related to gender have been found in handwritten tasks of healthy users, except for time on-surface in cursive text and numbers.

  • Significant differences exist in the signatures of males and females. It is worth mentioning that this will probably not be generalizable to signers of other languages. The database was acquired in Spain, and most of the signatures in Spain tend to be legible (they normally include the name and surname). According to [21], this is the case for around 50% of the signatures contained in the MCYT database [22].

  • High correlations exist on some features extracted from different handwritten tasks (text in cursive letters, capital letters, and numeric digits). The signature exhibits lower correlations with other tasks. This may be because handwriting has been deeply modified by education and the signature has been more freely decided by each user.