Towards the Prediction of Multiple Soft-Biometric Characteristics from Handwriting Analysis

Bouadjenek, Nesrine; Nemmour, Hassiba; Chibani, Youcef

doi:10.1007/978-3-319-89743-1_19

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 522))

Included in the following conference series:

IFIP International Conference on Computational Intelligence and Its Applications


Abstract

Soft-biometrics prediction from handwriting analysis is gaining wide interest in writer identification, since it provides additional knowledge about the writer, such as gender (man or woman), handedness (left-handed or right-handed) and age range. Research works developed in this context have so far focused on predicting a single soft-biometric trait. Nevertheless, it would be more useful to develop a system that predicts several traits from the same handwritten text. In this work, we investigate the feasibility of such multiple-trait prediction. To this end, we propose two prediction schemes. The first combines individual prediction scores into a global prediction. The second is based on a multi-class prediction. In both schemes, the prediction relies on an SVM classifier associated with gradient features. The experimental corpus is collected from the IAM handwriting database. The second scheme proved more promising and suggested that the age characteristic is stable over time for a certain category of writers.



1 Introduction

Soft-biometrics provide complementary information about an individual without being sufficient to fully authenticate them. They include various traits such as gender, skin color, eye color and ethnicity. These characteristics have recently been used to reinforce biometric identification systems. Besides, in forensic applications, soft-biometrics allow investigations to be restricted to a limited category of persons or suspects [1]. The first works in this field extracted soft-biometrics by analyzing face images, which constitute the most practical identification tool [2, 3]. Soft-biometrics can also bring useful information to some forensic and handwriting recognition applications. For instance, when analyzing an anonymous threat letter, soft-biometric information such as the writer’s gender, handedness, age range and educational level makes a valuable contribution to the investigation. Since 2001, researchers in the handwriting recognition field have been predicting soft-biometric traits from handwritten text. The first work, proposed by Cha et al. [4], tried to classify the US population into demographic sub-categories defined by gender, ethnicity and educational level. Other works followed, dealing with various traits such as gender, handedness, age range and nationality [5,6,7,8,9,10,11,12,13,14]. However, in the state of the art, soft-biometrics prediction systems were developed to deal with a single trait. This is mainly due to two reasons. First, there is a lack of datasets providing several soft-biometric characteristics; second, the prediction of even one characteristic is a challenging task. In fact, the results reported on several benchmark datasets vary from 55% to 85% [5,6,7,8,9,10,11,12,13,14]. Hence the following questions arise: can we predict two or more characteristics from the same analysis and, if so, what prediction score can be reached?
The present work attempts to answer these questions by proposing two multi-class prediction schemes. In the first scheme, we use the same handwritten text to develop individual systems that predict the writer’s gender, handedness and age range. The predictions obtained are then grouped to get a global prediction of the three characteristics. The second scheme directly adopts a multiclass prediction based on the one-against-all implementation. In both schemes, the prediction process is based on an SVM classifier associated with gradient features. The rest of this paper is arranged as follows: Sect. 2 introduces the multi-trait prediction schemes, Sect. 3 presents the experimental evaluation, and the last section reports the main conclusions of this work.

2 Multiclass Prediction of Soft-Biometrics

Gender is the social definition of a man and a woman, while handedness denotes the preference for using one hand, known as the dominant hand (left or right). As for age, it is considered by ranges. Until now, such characteristics have been predicted from handwriting only one at a time. In this work, we investigate the feasibility of a multiclass prediction from the analysis of the same handwritten text. Whatever the adopted scheme, the prediction task rests on two main steps: feature generation and prediction. For feature generation, several texture, gradient, shape and geometric features have been proposed [10, 14]. For the prediction step, several classifiers have been employed, such as artificial neural networks, Support Vector Machines (SVM) and decision tree algorithms [8, 10]. Findings reported in [10] indicate that SVM is the best candidate for solving the prediction task. So, in this work we employ SVM associated with Gradient Local Binary Patterns (GLBP), which showed high performance for predicting a single soft-biometric trait [10].

2.1 Dataset Description

Up to now, the prediction of writers’ soft-biometrics has not been widely investigated because of the lack of public datasets. For the Latin script, IAM is the only public dataset that provides the gender, handedness and age range of writers. IAM was developed by a research group on computer vision and artificial intelligence at Bern University in Switzerland (see Note 1). It contains handwritten sentences from more than 200 writers grouped into two age categories, “25–34 years” and “35–56 years”. By considering the three available traits, we can define 8 classification categories, as shown in Table 1. In this work, 534 samples are collected to perform multi-class soft-biometrics prediction. Specifically, we considered only one handwritten sentence per writer for right-handed writers. For classes of left-handed writers, we considered more than one sample per writer, since the IAM dataset contains only 20 left-handed writers.

Table 1. Data distribution for multiclass prediction.
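The way three binary traits yield 8 classes can be sketched in a few lines of Python. The numbering below is an assumption reconstructed from statements elsewhere in the text (classes 1 to 4 are female, classes 3, 4, 7 and 8 are right-handed, and classes 3 and 4 differ only in age range); the order of the two age ranges within each pair and all identifier names are hypothetical.

```python
from itertools import product

# Trait values listed in the assumed numbering order of the classes.
GENDERS = ("female", "male")
HANDS = ("left", "right")
AGES = ("25-34", "35-56")

# Classes 1..8: gender varies slowest, age range fastest (assumed layout).
CLASS_OF = {
    traits: index
    for index, traits in enumerate(product(GENDERS, HANDS, AGES), start=1)
}
TRAITS_OF = {index: traits for traits, index in CLASS_OF.items()}


def class_index(gender, hand, age):
    """Return the 1-based class index of a (gender, handedness, age) triple."""
    return CLASS_OF[(gender, hand, age)]


for index in sorted(TRAITS_OF):
    print(index, TRAITS_OF[index])
```

Under this layout, the right-handed classes are exactly 3, 4, 7 and 8, which matches the way they are referred to in the experimental section.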


2.2 Multiclass Soft-Biometrics Prediction Based on Individual Systems

Since soft-biometrics prediction is commonly addressed by systems predicting a single characteristic, the direct extension to a multi-class prediction consists of grouping the individual decisions of such systems. The idea of this scheme is to predict each soft-biometric characteristic independently of the others, so that the global system is composed of “j” individual binary systems if there are “j” characteristics to predict. In this respect, three systems are developed to predict gender, handedness and age range by grouping the training data according to the considered characteristic. Hence, we use 176 female samples and 180 male samples for gender prediction, 168 left-handed samples and 188 right-handed samples for handedness prediction, and finally, 178 samples for age range prediction. For the test stage, each sample is presented simultaneously to the three systems, as shown in Fig. 1. Then, the predictions of gender, handedness and age range are grouped and compared to the ground truth of the considered sample. In the experiments, the classes in Table 1 were grouped to perform each binary classification. For example, for gender prediction, the training samples of classes 1, 2, 3 and 4 were grouped to constitute the Female class, while the remaining classes were grouped to form the Male class.
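A minimal sketch of the grouping step follows. It illustrates only the combination logic: the three predictor functions are hypothetical stand-ins for the trained binary SVMs, and the decision values stored in each sample are invented for illustration.

```python
# Hypothetical stand-ins for the three trained binary SVMs. Each one turns
# a decision value stored in the sample into one of two labels.
def predict_gender(sample):
    return "female" if sample["gender_score"] >= 0 else "male"


def predict_hand(sample):
    return "left" if sample["hand_score"] >= 0 else "right"


def predict_age(sample):
    return "25-34" if sample["age_score"] >= 0 else "35-56"


def combine_predictions(sample):
    """Group the three independent binary decisions into one joint label."""
    return (predict_gender(sample), predict_hand(sample), predict_age(sample))


def joint_accuracy(samples, truths):
    """Fraction of samples for which all three traits are predicted correctly."""
    hits = sum(combine_predictions(s) == t for s, t in zip(samples, truths))
    return hits / len(samples)


# Invented toy data: the second sample gets the age range wrong.
samples = [
    {"gender_score": 1.2, "hand_score": -0.4, "age_score": 0.3},
    {"gender_score": -0.7, "hand_score": 0.9, "age_score": 0.1},
]
truths = [("female", "right", "25-34"), ("male", "left", "35-56")]
print(joint_accuracy(samples, truths))
```

A sample counts as correct only when all three decisions agree with the ground truth, so a single wrong trait makes the joint prediction wrong.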

2.3 Multiclass Soft-Biometrics Prediction Based on One-Against-All SVM

The One-Against-All (OAA) SVM builds “j” binary SVMs to solve a j-class classification problem. Each SVM is dedicated to separating one class from all the other classes. After the training stage, a test sample is presented to all the SVMs, which here produce 8 decisions, one per class of interest. The sample is then assigned to the class with the highest decision value, as depicted in Fig. 2.

Note that the test samples are common to the two schemes, in order to get a fair comparison of the prediction scores.
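The OAA decision rule reduces to picking the largest of the per-class decision values. The sketch below assumes the 8 decision values have already been computed by the binary SVMs; the numbers are invented for illustration.

```python
def oaa_predict(decision_values):
    """Return the 1-based class whose binary SVM gives the highest decision.

    Ties are resolved in favor of the lowest class index.
    """
    best = max(range(len(decision_values)), key=decision_values.__getitem__)
    return best + 1


# Invented decision values for one test sample, one per class (1..8).
scores = [-1.3, -0.2, 0.4, -0.9, -1.1, -0.6, 0.1, -2.0]
print(oaa_predict(scores))  # 3
```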

3 Experimental Evaluation

The proposed multiclass prediction schemes are evaluated on the selected IAM subset. For performance evaluation, the confusion matrix is used to highlight the precision per class and the global prediction accuracy. Recall that the closer the confusion matrix is to a diagonal matrix, the better the prediction system.
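As an illustration of this evaluation, the following sketch builds a confusion matrix from true and predicted class indices and derives per-class percentages and the overall accuracy. It assumes that rows index the true class and that the matrix is normalized per row, which the text does not state explicitly; the labels are toy data.

```python
def confusion_counts(truths, predictions, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truths, predictions):
        counts[t - 1][p - 1] += 1
    return counts


def row_percentages(counts):
    """Normalize each row to percentages; an empty row stays at zero."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([100.0 * c / total if total else 0.0 for c in row])
    return rows


def overall_accuracy(counts):
    """Percentage of samples lying on the diagonal of the count matrix."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return 100.0 * correct / total


# Toy 3-class example with 1-based class indices.
truths = [1, 1, 2, 2, 3, 3]
predictions = [1, 2, 2, 2, 3, 1]
counts = confusion_counts(truths, predictions, 3)
print(row_percentages(counts))
print(overall_accuracy(counts))
```

A perfect predictor would put 100% on every diagonal entry, which is the diagonal matrix mentioned above.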

3.1 Prediction Based on Individual Systems

Prediction results, expressed as overall accuracies, are shown in Fig. 3. The overall accuracy obtained by combining the individual decisions, 32.48%, is much lower than the accuracy of each individual predictor. This can be explained by the accumulation of the prediction errors of each binary system when aggregating the final decision. Specifically, the decrease in prediction accuracy is mainly due to the age range prediction system, which gives only a moderate accuracy of about 52.86%. This finding leads us to move towards a multiclass implementation to improve these results.
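The accumulation of errors can be illustrated with a short calculation. A joint prediction is correct only if all three binary predictions are correct, so the joint accuracy can never exceed the weakest individual accuracy; if the errors were independent, it would equal the product of the individual accuracies. In the sketch below, only the age accuracy (52.86%) comes from the text. The gender and handedness values are hypothetical placeholders, and independence is an assumption that this work does not verify.

```python
from math import prod


def joint_accuracy_if_independent(accuracies):
    """Joint accuracy when the individual errors are statistically independent."""
    return prod(accuracies)


def joint_accuracy_upper_bound(accuracies):
    """The joint accuracy cannot exceed the weakest individual accuracy."""
    return min(accuracies)


age_accuracy = 0.5286        # reported in the text
gender_accuracy = 0.75       # hypothetical placeholder
handedness_accuracy = 0.80   # hypothetical placeholder

accuracies = [gender_accuracy, handedness_accuracy, age_accuracy]
print(joint_accuracy_if_independent(accuracies))
print(joint_accuracy_upper_bound(accuracies))
```

Whatever the two placeholder values are, the upper bound stays at 52.86%, which shows why the age range predictor dominates the loss.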

3.2 Prediction Based on the OAA SVM

Compared to the first scheme, the OAA implementation improves the overall prediction accuracy to 54.09%. To understand this result, which remains low, we present the confusion matrix in Table 2. This table shows that classes 3, 4, 7 and 8, which correspond to right-handed writers, are poorly predicted. These classes are problematic because adding the age range characteristic, especially for right-handed writers, increases the complexity of the prediction task. More precisely, for right-handed females (classes 3 and 4), the two age ranges do not show any behavioral differences. This means that writers of this category keep an almost stable and stationary age characteristic over time. Similar behavior is observed for right-handed males, who are strongly confused with right-handed females, at a rate of 36.36%. To get a more precise interpretation of the soft-biometric behavior, we repeated the OAA test considering only two soft-biometrics, gender and handedness. This was done by merging the age range classes according to gender and handedness, which yields a 4-class prediction. The experiments report an overall prediction score of 66.17%. Based on this outcome, as well as on the results of the first scheme, we suggest that the age range is the most critical trait. So, to improve the description of age range information, we developed a fuzzy membership model that gives additional knowledge about the writer’s age.
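The relabeling behind the 4-class experiment can be sketched as follows. The sketch covers only the merging of labels, not the retraining of the OAA SVMs, and it assumes that the two age ranges of each gender and handedness combination occupy consecutive class indices (1 and 2, 3 and 4, 5 and 6, 7 and 8), as the discussion of classes 3 and 4 suggests.

```python
def merge_age_ranges(class_index):
    """Map an 8-class index (1..8) to a 4-class index (1..4) by dropping age."""
    if not 1 <= class_index <= 8:
        raise ValueError("class index must be between 1 and 8")
    return (class_index + 1) // 2


merged = [merge_age_ranges(c) for c in range(1, 9)]
print(merged)  # [1, 1, 2, 2, 3, 3, 4, 4]
```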

Table 2. Confusion matrix for multiclass prediction (%).


Membership degree for age modeling

Age range is modeled through a membership degree that automatically indicates the affinity of a sample to each of the two age ranges. Inspired by work carried out in a remote sensing application [15], we define a fuzzy membership degree to the age categories based on the Mahalanobis distance. Specifically, all training samples were grouped into two sets according to the age range. For each set, we calculated the mean and the covariance matrix. Then, for each sample, the fuzzy membership degree to the age range categories is calculated according to the steps presented in Algorithm 1.
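Since Algorithm 1 is not reproduced here, the following is only a sketch of one plausible form of such a membership: Mahalanobis distances to the two age groups are turned into inverse-distance weights normalized to sum to one. The normalization, the two-dimensional toy features and every identifier are assumptions, not the authors' procedure; real feature vectors are high-dimensional and would need a general (possibly regularized) matrix inverse instead of the 2x2 formula used below.

```python
from math import sqrt


def mean_and_covariance(points):
    """Sample mean and unbiased covariance matrix of a list of vectors."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]
    return mean, cov


def invert_2x2(m):
    """Inverse of a 2x2 matrix, enough for this two-dimensional toy example."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance from x to a group given its mean and inverse covariance."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    dim = len(diff)
    squared = sum(
        diff[i] * inv_cov[i][j] * diff[j] for i in range(dim) for j in range(dim)
    )
    return sqrt(max(squared, 0.0))


def age_memberships(x, models, eps=1e-12):
    """Inverse-distance memberships normalized to sum to 1 (assumed form)."""
    inverse = [1.0 / (mahalanobis(x, mean, inv) + eps) for mean, inv in models]
    total = sum(inverse)
    return [value / total for value in inverse]


# Invented 2-D training features for the two age ranges.
young = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
older = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]
models = []
for group in (young, older):
    group_mean, group_cov = mean_and_covariance(group)
    models.append((group_mean, invert_2x2(group_cov)))

features = [1.0, 1.0]
memberships = age_memberships(features, models)
augmented = features + memberships  # memberships appended to the feature vector
print(memberships)
```

A sample close to the first group receives a membership near 1 for that group, while a sample equidistant from both groups receives 0.5 for each.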

The membership degree is generated for all samples and concatenated with the GLBP features. Table 3 shows the confusion matrix obtained by adding the fuzzy memberships of age range.

Table 3. Confusion matrix for multiclass prediction with membership degree contribution (%).


As can be seen, the membership degree allows a gain of 7.42 percentage points, giving an overall prediction accuracy of about 61.51%. Indeed, improvements of 13.64% and 27.27% are reached for right-handed females of the two age ranges. Moreover, an improvement of 24% is noticed for right-handed males of the second age range. However, we observe a negative effect on left-handed writers. For instance, the precision for left-handed females aged between 25 and 34 years drops from 90.91% to 77.27%, which corresponds to 3 additional samples wrongly predicted.
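The figures quoted above can be cross-checked with simple arithmetic. The class size of 22 test samples used below is not stated in the text; it is an assumption inferred from the percentages, since 20/22 and 17/22 round to 90.91% and 77.27%.

```python
def percent(correct, total):
    """Percentage rounded to two decimals, as reported in the tables."""
    return round(100.0 * correct / total, 2)


# Overall accuracy: OAA baseline plus the reported gain.
baseline, gain = 54.09, 7.42
with_membership = round(baseline + gain, 2)

# Left-handed females aged 25-34: class size is an inferred assumption.
class_size = 22
before = percent(20, class_size)
after = percent(17, class_size)

print(with_membership)          # 61.51
print(before, after)            # 90.91 77.27
print(round(before - after, 2))  # 13.64
```

The difference of 3 correctly predicted samples out of 22 accounts for the 13.64-point drop in that class.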

In summary, predicting gender, handedness and age range simultaneously is very challenging, as it is limited by the difficulty of separating sub-categories according to the age characteristic. For this reason, the prediction accuracy drops from 66.17% without the age characteristic to 54.09% with it. All these outcomes reveal that right-handed writers, who represent the majority of writers in the database, are not distinguished according to age. This suggests that a characteristic that is supposed to evolve over time has generally stagnated for this category of writers.

4 Concluding Remarks

This work addresses the possibility of simultaneously predicting a writer’s gender, handedness and age range from a single analysis of the handwriting, which gives an 8-class prediction problem. In this respect, using a set of data extracted from an English benchmark dataset, we investigate two prediction schemes. In the first scheme, three systems, each designed to predict a single characteristic, are developed. Their predictions are then grouped to give an overall prediction of the three characteristics. The second scheme adopts a multiclass prediction based on the one-against-all implementation of SVM to solve the 8-class prediction problem. Experimental findings reveal that the age characteristic is problematic, as it seems to be stable and unchanged over time, especially for right-handed writers. This is perhaps because all contributors are adults, since the dataset does not contain young or very old writers. So, it seems necessary to perform other experiments with wider age range categories to get more conclusive results. Nevertheless, the best multiclass prediction accuracy is about 61.51%. This remains a promising result and can be improved by finding a better model of the age characteristic, not necessarily through a classifier but through a model representation such as regression. Other features or classification methods may also deal better with this characteristic.

Notes

1. http://www.iam.unibe.ch/fki.

References

1. Tome, P., Vera-Rodriguez, R., Fierrez, J., Ortega-Garcia, J.: Facial soft biometric features for forensic face recognition. Forensic Sci. Int. 257, 271–284 (2015)
2. Jain, A.K., Park, U.: Facial marks: soft biometric for face recognition. In: International Conference on Image Processing, Cairo, Egypt, pp. 37–40, November 2009
3. Zhang, H., Beveridge, J.R., Draper, B.A., Phillips, P.J.: On the effectiveness of soft biometrics for increasing face verification rates. Comput. Vis. Image Underst. 137, 50–62 (2015)
4. Cha, S.H., Srihari, S.N.: A priori algorithm for sub-category classification analysis of handwriting. In: International Conference on Document Analysis and Recognition, Seattle, USA, pp. 1022–1025, September 2001
5. Liwicki, M., Schlapbach, A., Loretan, P., Bunke, H.: Automatic detection of gender and handedness from on-line handwriting. In: Conference of the International Graphonomics Society, Melbourne, Australia, pp. 179–183, November 2007
6. Liwicki, M., Schlapbach, A., Bunke, H.: Automatic gender detection using on-line and off-line information. Pattern Anal. Appl. 14, 87–92 (2011)
7. Al-Maadeed, S., Ferjani, F., Elloumi, S., Hassaine, A.: Automatic handedness detection from off-line handwriting. In: GCC Conference and Exhibition, Doha, Qatar, pp. 119–124, November 2013
8. Al-Maadeed, S., Hassaine, A.: Automatic prediction of age, gender, and nationality in offline handwriting. EURASIP J. Image Video Process. 2014, 10 (2014)
9. Bouadjenek, N., Nemmour, H., Chibani, Y.: Age, gender and handedness prediction from handwriting using gradient features. In: International Conference on Document Analysis and Recognition, Tunisia, pp. 1116–1120, August 2015
10. Bouadjenek, N., Nemmour, H., Chibani, Y.: Robust soft-biometrics prediction from off-line handwriting analysis. Appl. Soft Comput. 46, 980–990 (2016)
11. Bouadjenek, N., Nemmour, H., Chibani, Y.: Fuzzy integral for combining SVM-based handwritten soft-biometrics prediction. In: 12th IAPR Workshop on Document Analysis Systems, Santorini, Greece, pp. 311–316, April 2016
12. Tan, J., Bi, N., Suen, C.Y., Nobile, N.: Multi-feature selection of handwriting for gender identification using mutual information. In: International Conference on Frontiers in Handwriting Recognition, Shenzhen, China, pp. 578–583, October 2016
13. Mahreen, A., Rasool, A.G., Afzal, H., Siddiqi, I.: Improving handwriting based gender classification using ensemble classifiers. Expert Syst. Appl. 85(C), 158–168 (2017)
14. Bouadjenek, N., Nemmour, H., Chibani, Y.: Fuzzy integrals for combining multiple SVM and histogram features for writer’s gender prediction. IET Biom. 6(6), 429–437 (2017)
15. Nemmour, H., Chibani, Y.: Fuzzy neural network architecture for change detection in remotely sensed imagery. Int. J. Remote Sens. 27(4), 705–717 (2006)


Author information

Authors and Affiliations

Laboratoire d’Ingénierie des Systèmes Intelligents et Communicants (LISIC), Faculty of Electronics and Computer Sciences, University of Sciences and Technology Houari Boumediene (USTHB), Algiers, Algeria
Nesrine Bouadjenek, Hassiba Nemmour & Youcef Chibani


Corresponding author

Correspondence to Hassiba Nemmour.

Editor information

Editors and Affiliations

University of Saida, Saida, Algeria
Abdelmalek Amine
University of Regina, Regina, Saskatchewan, Canada
Malek Mouhoub
Concordia University, Montreal, Québec, Canada
Otmane Ait Mohamed
University of Oran, Oran, Algeria
Bachir Djebbar


Cite this paper

Bouadjenek, N., Nemmour, H., Chibani, Y. (2018). Towards the Prediction of Multiple Soft-Biometric Characteristics from Handwriting Analysis. In: Amine, A., Mouhoub, M., Ait Mohamed, O., Djebbar, B. (eds) Computational Intelligence and Its Applications. CIIA 2018. IFIP Advances in Information and Communication Technology, vol 522. Springer, Cham. https://doi.org/10.1007/978-3-319-89743-1_19


DOI: https://doi.org/10.1007/978-3-319-89743-1_19
Published: 12 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89742-4
Online ISBN: 978-3-319-89743-1
eBook Packages: Computer Science, Computer Science (R0)

