Abstract
Verifying the genuineness of official documents, such as bank checks, certificates, contract forms, and bonds, remains a challenging task in terms of accuracy and robustness. Here, genuineness refers to the degree to which the signature contained in a document matches the original signatures of the authorized person, which are assumed to be known in advance. In this paper, a novel feature set based on the quasi-straightness of boundary pixel runs is introduced for signature verification. We extract quasi-straight line segments using elementary combinations of the directional codes of the signature boundary pixels and subsequently obtain the feature set from the various quasi-straight line classes. Quasi-straight line segments blend straightness with small curvatures, resulting in a robust feature set for the verification of signatures. We have used a Support Vector Machine (SVM) for classification and report results on standard signature datasets, namely CEDAR (Center of Excellence for Document Analysis and Recognition) and GPDS100 (Grupo de Procesado Digital de la Señal). The results establish that the proposed method outperforms the existing state of the art.
1 Introduction
Today, recognition of persons by machines is an active area of research in which biometrics plays an essential role. The developed models are generally based on two common types of biometric features: physical features, such as face, fingerprint, and retina [14, 23, 32], and behavioral features, such as voice and handwriting [22, 42]. The signature is a human behavioral characteristic by which individuals can be uniquely identified. Therefore, when it comes to security and fraud prevention, signatures can be used to design authentication systems. Bank checks, contract documents, and certificates, for example, are often faked and claimed to be original. Thus, to verify this type of document, we need prior knowledge of the original signers and their signature styles; an automated signature verification system can then be applied to investigate the genuineness of a signed document. Here, we assume prior knowledge of the signers and a ready-made dataset of their genuine signatures (for training the recognition model). There are two types of approaches to automated signature verification: online verification [1, 11, 12, 17, 41] and offline verification [10, 13, 18, 33, 35, 44]. Offline signature verification is considered more challenging than online verification because dynamic information, such as pen-tip pressure, velocity, and acceleration, is not available in offline signature images. On the other hand, the special arrangements required for signature acquisition often make the online method unsuitable in practice. Moreover, verifying the genuineness of existing legal documents or papers must necessarily be done offline.
This paper presents a novel feature set for offline signature verification. The objective is to detect a faked signature of a particular signer, given a dataset of genuine signatures and a sample of faked signatures of that signer. There are three basic types of forgeries, namely random forgeries, simple forgeries, and skilled forgeries. For the first two types, the faked signature is created without knowledge of the name, signature shape, etc., or without skill. In the case of skilled forgeries, the creator of the faked signature is assumed to be an expert in imitating the signature shape and style, and the genuine signature style is known to the imitator. Clearly, skilled forgery detection is more challenging in the absence of dynamic features.
Related works and motivation: The performance of a verification model depends on the set of features it uses. A lot of work related to offline signature verification has been done, using various types of feature sets. In most of these works, the features are based on topology, geometric information, gradient, structural information, and concavity [13, 26, 28, 33, 39]. For example, Ferrer et al. [13] proposed a method that uses a set of geometric features describing the signature envelope and the distribution of strokes; a hidden Markov model, a support vector machine, and a Euclidean distance classifier were then used for verification. ZulNarnain et al. [52], in a recent work, introduced a signature verification scheme based on geometric features such as the sides, angles, and perimeters of the triangles derived from a triangulation of the signature image; for classification, they used a Euclidean classifier and a voting-based classifier. Other reported works are based on gray value distribution [21, 24, 48], directional runs of pixels [5, 35, 36], pixel surroundings [31], and curvature-related features [18]. Graphometric feature-based works are also available in the literature [4]. In [29], the authors proposed a shape feature called the chord moment to analyze the upper and lower signature envelopes; a support vector machine (SVM) was then used with the chord moment-based features for verification. Frequently, multiple features are combined in a model to improve its classification accuracy. For example, in [35], moment information and gray value distribution were used along with a directional feature; the authors used a 16-directional feature obtained from the distribution of pixels in the thinned signature strokes. Combining different types of features makes the feature extraction part costly.
Clearly, computing moment information along with the 16-directional feature is computationally costly if the model is to be used in real-time applications. In a very recent work [44] by Serdouk et al., the directional distribution is not the sole feature extraction policy: a directional distribution-based longest-run feature is combined with gradient local binary patterns (GLBP) to strengthen the feature set, where the longest runs are taken in the horizontal, vertical, and two major diagonal directions. Thus, they used a combination of topological features (the longest runs of pixels) and gradient features (GLBP computed in the neighborhood of each pixel). Computing GLBP at each pixel of the signature image can be considered costly. For verification, Serdouk et al. proposed a system based on the Artificial Immune Recognition System (AIRS). A template-based verification scheme was also presented in [50]; the method encodes the geometric structure of signatures using grid templates.
We also note that many works in the state of the art use ensembles of multiple classifiers to produce the best results. For example, Ooi et al. [37], in a recent work, presented a framework based on the Discrete Radon Transform (DRT), Principal Component Analysis (PCA), and a Probabilistic Neural Network (PNN) to distinguish forgeries from genuine signatures. In applications, however, the designed hardware device should act quickly for classification and decision making. A summary of the existing methods and their classification techniques is given in Table 1.
Though several methods and recognition models have been developed, the results of the existing methods corroborate that there is still plenty of scope for improvement in terms of accuracy and robustness. In particular, there is scope for a strong feature set that can be paired with a low-complexity classifier to yield better performance, with the additional advantage that the features can be easily extracted from the signature images. In this paper, we propose a novel set of features derived from the quasi-straight digital curve segments defining the signature strokes. The following sections describe the proposed method and the experimental results in detail. The contributions of this paper are listed below.

1.
We use the idea of quasi-straightness of the pixel runs along the edges defining the signature boundaries. The orientations of the quasi-straight edge segments lead to a categorization of the edge segments based on singular and nonsingular directions. These categories, or classes, give us a vector of features. The idea of quasi-straightness enables us to capture longer edge segments with bends up to some extent.

2.
In the proposed method, feature extraction is very fast and robust, as it computes the quasi-straight edge segments from the edge contour using Freeman's 8-neighbor (8N) chain code. The complexity is linear in the number of edge pixels in the signature boundary.

3.
The method is robust to the presence of noise such as dots, blobs, or small strokes in the signature images.

4.
From the results on standard signature datasets, namely CEDAR (Center of Excellence for Document Analysis and Recognition) and GPDS100 (Grupo de Procesado Digital de la Señal), it is seen that the selected feature set outperforms the existing state of the art.
2 Proposed method
The directional distribution of edge pixels is often considered an important feature. Here, we propose a set of features using the runs of pixels along the signature edge boundary. To do so, we introduce classes of almost straight line segments following certain straightness criteria. First, the set of edge pixels E, coming from the signature boundary, is obtained by applying a \(3\times 3\) filter on the binarized signature image [15]. We make the boundary edge 1-pixel thick, through the removal of redundant pixels, using a morphological thinning procedure [15]. For binarization, we used the method given in [43]. The edge boundary of a sample signature image (cropped and zoomed portion) is shown in Fig. 1. The edge pixels in E can be understood as a set of digital straight line segments. The neighbors of a pixel are represented by values from the range \(\{0,1,2, \ldots ,7\}\) (considering 8N). The pixel to the east of the center pixel is represented by 0, and the other pixels by 1, 2, 3, 4, 5, 6, and 7 in counter-clockwise order; so the south-east pixel gets the code 7. This scheme is referred to as Freeman's chain code. Straight line segments are represented as sequences of pixels that follow certain regularity properties (in terms of the chain code values) [27, 40]. A curve segment is digitally straight if and only if there are at most two values, differing by \(\pm 1 \mod 8\), in the chain code of the segment, and one of these two values always has run-length 1 (Property 1). The code that always has run-length 1 is referred to as the singular code (s), and the other code is referred to as the nonsingular code (n). Also, the runs of the nonsingular code n can have only two lengths, which are consecutive integers (Property 2). Here, only Property 1 is used for the detection of the straight line segments and the subsequent feature extraction.
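As an illustration, the chain-code convention described above can be sketched in a few lines of Python (the paper's implementation is in C++; the names here are ours):

```python
# Freeman's 8N chain code: the east neighbour gets code 0, and codes
# 1..7 follow counter-clockwise, so the south-east neighbour gets 7.
# Offsets are (dy, dx) for an image coordinate system where y grows
# downward and x grows to the right.
CODE_TO_OFFSET = {
    0: (0, 1),    # east
    1: (-1, 1),   # north-east
    2: (-1, 0),   # north
    3: (-1, -1),  # north-west
    4: (0, -1),   # west
    5: (1, -1),   # south-west
    6: (1, 0),    # south
    7: (1, 1),    # south-east
}
OFFSET_TO_CODE = {v: k for k, v in CODE_TO_OFFSET.items()}

def chain_code(pixels):
    """Chain code of an ordered list of 8-connected (y, x) boundary pixels."""
    codes = []
    for (y0, x0), (y1, x1) in zip(pixels, pixels[1:]):
        codes.append(OFFSET_TO_CODE[(y1 - y0, x1 - x0)])
    return codes
```

For example, the pixel sequence (0,0), (0,1), (1,2) yields the codes [0, 7]: one move east, one move south-east.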
Henceforth, since we use only Property 1, the extracted straight line segments will be referred to as quasi-straight line segments. We consider twelve quasi-straight line classes, using all possible combinations of singular and nonsingular codes. The features used in our proposed method are based on the distribution of the boundary edge pixels among the twelve quasi-straight line classes. The flowchart in Fig. 2 represents the proposed method.
2.1 Quasi-straight line segment classes
Twelve different classes \(C_1,C_2, C_3, \ldots , C_{12}\) of quasi-straight line segments are considered in connection with Property 1 of digital straightness. The class-wise singular (s) and nonsingular (n) code values are: \(C_1\): \(n=0\), \(s=\epsilon \); \(C_2\): \(n=0\), \(s=1\); \(C_3\): \(n=1\), \(s=0\); \(C_4\): \(n=1\), \(s=\epsilon \); \(C_5\): \(n=1\), \(s=2\); \(C_6\): \(n=2\), \(s=1\); \(C_7\): \(n=2\), \(s=\epsilon \); \(C_8\): \(n=2\), \(s=3\); \(C_9\): \(n=3\), \(s=2\); \(C_{10}\): \(n=3\), \(s=\epsilon \); \(C_{11}\): \(n=3\), \(s=4\); \(C_{12}\): \(n=4\), \(s=3\). Figure 4 demonstrates the classes on an example digital curve. Here, the code value \(\epsilon \) denotes that the code is absent or not defined. In identifying the quasi-straight line segments present in the signature stroke boundaries, no restriction is applied to the run lengths of the nonsingular code (n). The number of classes could be increased further by considering the variations in the nonsingular code run length, but the resulting features would be over-sensitive for detecting genuine signatures.
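For reference, the twelve class definitions above can be written out as data; a minimal Python sketch, with `None` standing for the absent code \(\epsilon \):

```python
# The twelve quasi-straight line classes, as (nonsingular n, singular s)
# pairs keyed by class index; None stands for the absent code (epsilon).
QS_CLASSES = {
    1: (0, None),  2: (0, 1),    3: (1, 0),    4: (1, None),
    5: (1, 2),     6: (2, 1),    7: (2, None), 8: (2, 3),
    9: (3, 2),     10: (3, None), 11: (3, 4),  12: (4, 3),
}
```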
The method used for the detection of quasi-straight line segments is given in Algorithm 1. The algorithm takes the edge map E, the nonsingular code n, and the singular code s as input. It selects an initial run of two consecutive pixels \(p_1\) and \(p_2\) such that the direction code from \(p_1\) to \(p_2\) is n, i.e., dir(\(p_1\),\(p_2\)) \(=n\). This run, initiated from the seed \((p_1, p_2)\), is then extended along the curve, where possible, in two prospective directions, n and \((n+4) \mod 8\) (the nonsingular direction n and its opposite). The singular direction s, another input to the algorithm, is an important part of this extension. To extend the current run, the procedure ExtendSegment is invoked, as shown in Lines 6 and 7 of Algorithm 1. The maximal quasi-straight line segment starting from the seed \((p_1, p_2)\) is finally reported (see Fig. 3). The process is continued to report all quasi-straight line segments matching the nonsingular code n and the singular code s, thereby giving a set (class) of quasi-straight line segments. Different sets (classes) are obtained by varying the code values n and s. The algorithm as shown assumes \(s \ne \epsilon \). In the case \(s = \epsilon \) (e.g., \(C_1\): \(n=0\), \(s=\epsilon \)), the segment from the seed \((p_1, p_2)\) is extended in directions n and \((n+4) \mod 8\) simply by checking the neighborhood of the pixels. The algorithm halts at a pixel if either its neighbor is not in direction n (or \((n+4) \mod 8\)) or there is no unvisited neighbor pixel. Note that the procedure ExtendSegment returns Status = true only if the singular code s appears at least once in the run when \(s \ne \epsilon \). In our proposed method, a quasi-straight line segment is included in a class only if its length is at least a predefined threshold l (see Line 8 of Algorithm 1).
The first-hand selection of l is made on the basis of experimentation, so that very small segments are removed from consideration. The necessary discussion is provided in Sect. 3.3.
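The core of Property 1 can be sketched as follows. This is a simplified linear scan over an already-linearised chain code, not the seed-and-extend procedure of Algorithm 1; for a given (n, s) pair it reports maximal runs that contain only n and s, in which every run of s has length 1 and, when s is defined, s occurs at least once. All names are illustrative:

```python
def quasi_straight_segments(codes, n, s=None, min_len=4):
    """Return (start, length) of maximal quasi-straight runs in `codes`.

    Lengths are counted in chain-code moves; `min_len` plays the role of
    the threshold l in the text. s=None models the absent code (epsilon).
    """
    segments = []
    i = 0
    while i < len(codes):
        if codes[i] != n:
            i += 1
            continue
        j = i
        has_s = False
        while j < len(codes):
            if codes[j] == n:
                j += 1
            elif s is not None and codes[j] == s:
                # the singular code must always appear with run-length 1
                if j + 1 < len(codes) and codes[j + 1] == s:
                    break
                has_s = True
                j += 1
            else:
                break
        # when s is defined, it must occur at least once in the run
        if ((s is None) or has_s) and j - i >= min_len:
            segments.append((i, j - i))
        i = j if j > i else i + 1
    return segments
```

For example, on the chain code 0,0,1,0,0,0,1,0,2 with \(n=0\), \(s=1\) (class \(C_2\)), the scan reports one maximal quasi-straight run of length 8 starting at position 0.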
2.2 Features from the classes
For each of the twelve quasi-straight line classes, we compute the following three features.

The number of quasi-straight line segments in the class \(C_i\), denoted as \(n_i\).

The pixel density of the class \(C_i\), i.e., \(\frac{p_i}{P}\), where \(p_i\) is the number of edge pixels in the class and P is the total number of edge pixels in the signature boundary E.

The average edge length in the class, i.e., \( \frac{p_i}{n_i}\) (in pixels).
Hence, from the twelve classes of line segments, we obtain 36 feature values, giving a feature vector \(\langle f_1, f_2, f_3, \ldots , f_{36} \rangle \) directly from the distribution of edge pixels among the classes. As an example, the distributions of edge pixels in classes \(C_5\) and \(C_6\) for the signature image of Fig. 1 are shown in Fig. 5.
Further, the count of common pixels (\(c_p\)) in two neighboring classes \(C_i\) and \(C_j\), where \(j = (i+1) \mod 12\), is considered as a feature of \(C_i\). The existence of common pixels in two neighboring classes captures the smoothness of the boundary curves. We consider the ratio \(\frac{c_p}{P}\) for every class, which adds twelve more features to the feature vector, i.e., \(\langle f_{37}, f_{38}, f_{39}, \ldots , f_{48} \rangle \).
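The 48 features described so far can be sketched as follows, assuming each class is available as a list of its segments (each segment a list of pixel coordinates); the function and variable names are illustrative, not taken from the paper's code:

```python
def class_features(class_segments, total_edge_pixels):
    """48 per-class features.

    class_segments: list of 12 entries; entry i holds the segments of
    class C_{i+1}, each segment being a list of (y, x) pixels.
    Returns 36 values (count n_i, density p_i/P, average length p_i/n_i
    per class) followed by 12 neighbour-overlap densities c_p/P.
    """
    P = float(total_edge_pixels)
    pixel_sets = [set(p for seg in segs for p in seg) for segs in class_segments]
    feats = []
    for segs, pix in zip(class_segments, pixel_sets):
        n_i = len(segs)                         # number of segments
        p_i = len(pix)                          # edge pixels in the class
        feats += [n_i, p_i / P, (p_i / n_i) if n_i else 0.0]
    # common pixels between neighbouring classes C_i and C_{(i+1) mod 12}
    for i in range(12):
        j = (i + 1) % 12
        c_p = len(pixel_sets[i] & pixel_sets[j])
        feats.append(c_p / P)
    return feats  # 48 values
```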
In addition to these 48 features, we add 30 more. The rectangular signature area is first divided into six equal-sized rectangular regions \(R_1, R_2, \ldots , R_6\), as shown in Fig. 6. We then ask the two questions given below.

1.
Considering a region \(R_i\), which class \(C_j\) has the maximum contribution in \(R_i\), i.e., which class is the leader of the region?

2.
Given a class \(C_j\), in which region does it have the maximum contribution, and what is that contribution (pixel density)?
These pieces of information measure the degree of presence of the classes within the signature image area. The first question provides 6 values, i.e., the six class numbers that are the region-wise leaders. The second question provides 24 values: if class \(C_i\) (one of the 12 classes) has its maximum contribution in region \(R_j\) \((j=1,2,\ldots ,6)\), we take the feature values j and \(\frac{r_{ij}}{P}\), where \(r_{ij}\) is the count of pixels of class \(C_i\) in region \(R_j\). Hence, from this part, we obtain a total of 30 feature values, i.e., the feature vector \(\langle f_{49}, f_{50}, f_{51}, \ldots , f_{78} \rangle \).
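The 30 region-based features can be sketched as below. The paper only states that the signature box is split into six equal-sized rectangles (Fig. 6); the 2-row-by-3-column split used here is our assumption, as are all names:

```python
def region_features(class_pixels, height, width, total_edge_pixels):
    """30 region features. class_pixels: 12 sets of (y, x) edge pixels."""
    P = float(total_edge_pixels)

    def region_of(y, x):
        # assumed split: 2 rows x 3 columns, regions numbered 1..6 row-major
        row = 0 if y < height / 2 else 1
        col = min(int(3 * x / width), 2)
        return 3 * row + col + 1

    # r[i][j] = pixel count of class C_{i+1} in region R_j
    r = [[0] * 7 for _ in range(12)]
    for i, pixels in enumerate(class_pixels):
        for (y, x) in pixels:
            r[i][region_of(y, x)] += 1

    feats = []
    # question 2: per class, its best region and the density there (24 values)
    for i in range(12):
        best_j = max(range(1, 7), key=lambda j: r[i][j])
        feats += [best_j, r[i][best_j] / P]
    # question 1: per region, the leading class (6 values)
    for j in range(1, 7):
        leader = max(range(12), key=lambda i: r[i][j]) + 1
        feats.append(leader)
    return feats  # 30 values
```

Note that ties (empty classes or regions) are broken by taking the first maximum; the paper does not specify its tie-breaking rule.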
2.3 Sample feature vector
We have a feature vector of length 78 for every signature image. As an example, the feature vector extracted for the signature image shown in Fig. 1 is presented in Table 2. The first three columns represent the feature values \(\langle f_1, f_2, f_3, \ldots , f_{36} \rangle \), providing the number of quasi-straight line segments (\(n_i\)), the pixel density (\(p_i/P\)), and the average edge length (\(p_i/n_i\)) for all twelve classes. The fourth column lists 12 values, one per class, giving the common pixel densities (\(c_p/P\)) with the neighboring classes. The fifth column shows 12 values (one per class) giving the region number where class \(C_i\) has its maximum contribution, denoted by \(m_{C_i}\). The sixth column contains another 12 values showing the corresponding class-wise pixel densities \(r_{ij}/P\) (for all \(i=1,2,\ldots ,12\), as discussed in Sect. 2.2). The last column shows which class \(C_i\) has the maximum contribution in each region \(R_j\), i.e., the region-wise leaders, denoted as \(l_{R_j}\). Here, for example, \(C_6\) is the leader in \(R_2\), \(R_3\), \(R_4\), and \(R_5\); \(C_3\) is the leader in \(R_6\); and \(C_{11}\) is the leader in \(R_1\). We obtain 6 values from this part, giving, in total, a 78-dimensional feature vector. Representative comparisons of the feature values for signatures of the same class and of different classes are shown in Figs. 7, 8, 9, 10, 11, and 12. The plots clearly show that our features can differentiate genuine signatures from faked ones.
3 Experiments
Our signature verification scheme is signer-dependent; for performance evaluation, we carried out the tests for each signer individually, using standard performance measures. The False Rejection Rate (FRR) is the percentage of genuine signatures rejected by the system. The False Acceptance Rate (FAR) is the percentage of faked signatures accepted as genuine. FRR and FAR are combined to report the total verification error, termed the Average Error Rate (AER). The AER is the simple average of FAR and FRR for the CEDAR dataset; for GPDS100, the weighted mean of FAR and FRR is taken, because the test counts for genuine and faked signatures differ. Effectively, AER reflects the total verification error. For the various set-wise results (different training and test sets of images), we also report an equal error rate (EER). Here, the EER is computed as the mean (weighted mean for GPDS100) of the FAR and FRR of the set showing the minimum difference between FAR and FRR. Experiments were conducted on two standard datasets, as discussed in the following sections.
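The error rates above can be sketched directly; the weighted mean reduces to the plain mean when the genuine and forged test counts are equal, as on CEDAR:

```python
def error_rates(genuine_preds, forged_preds):
    """FAR, FRR, AER in percent.

    Predictions are 0 (classified genuine) or 1 (classified forged).
    FRR: genuine signatures rejected; FAR: forgeries accepted.
    AER is the mean of FAR and FRR weighted by the test counts.
    """
    frr = 100.0 * sum(p == 1 for p in genuine_preds) / len(genuine_preds)
    far = 100.0 * sum(p == 0 for p in forged_preds) / len(forged_preds)
    n_g, n_f = len(genuine_preds), len(forged_preds)
    aer = (frr * n_g + far * n_f) / (n_g + n_f)
    return far, frr, aer
```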
3.1 Test datasets
The CEDAR dataset contains signatures of 55 different signers. For each signer, there are 24 genuine signatures and 24 skillfully faked signatures, so the dataset has a total of 1320 genuine signatures and 1320 skilled forgeries. The signature images have 8-bit gray levels and were scanned at 300 dpi. The dataset was created by the Center of Excellence for Document Analysis and Recognition [8].
The GPDS corpus was prepared by the Grupo de Procesado Digital de Señales [16]. We used the collection consisting of the first 100 signers, referred to as GPDS100. For each person, there are 24 genuine signatures and 30 skilled forgeries, giving a total of 2400 genuine signatures and 3000 skilled forgeries. The signature images are provided as binary images.
Sample signature images, both genuine and faked, from the CEDAR and GPDS100 datasets are shown in Figs. 15 and 16, respectively. In many cases, it is difficult to spot differences between the faked signatures and the genuine signatures of the respective person; hence, these forgeries are considered skilled forgeries.
3.2 Support vector machine (SVM)
The support vector machine (SVM) is a well-accepted classifier for two-class classification problems, and it has been used extensively for signature verification [2, 5, 6, 29, 47, 50]. In our proposed method, performance evaluation is done for each individual signer, and the problem is treated as two-class classification, with Class-0 and Class-1 corresponding to the genuine signature class and the faked signature class, respectively.
For both CEDAR and GPDS100, signatures from the genuine category and signatures from the faked category were taken to train the model; how many from each category depends on the training size in use. For example, we denote the training size as \(\langle 16\)+\(16 \rangle \) when 16 signatures from the genuine category and 16 from the faked category are used for training. Besides \(\langle 16\)+\(16 \rangle \), we used training sizes \(\langle 12\)+\(12 \rangle \) and \(\langle 8\)+\(8 \rangle \) in the experiments. We found that the LINEAR kernel gives better results than the POLY, RBF, and SIGMOID kernels on our feature set, so we used the linear-kernel SVM for the classification work.
We used 8-fold cross-validation to set the value of the parameter C. For every person, a set of 48 random signatures (with equal contributions from genuine and faked signatures of that person) was considered and divided into 8 equal groups (6 images per group). During cross-validation, 7 groups (42 images) were used for training, and one group (6 images) was tested. The training and testing groups were selected in all possible combinations, giving a total of 48 test results (considering all different training samples). The value of C was incremented in a loop in small steps of 0.0001, starting at 0.0001 and going up to 1. For each person, the C achieving the maximum accuracy rate during cross-validation was selected, where accuracy means how often Class-0 objects are reported as Class-0 and Class-1 objects as Class-1. Note that 48 tests were carried out for every C value in the loop.
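The per-signer search over C can be sketched as a pure-Python loop: 48 signatures split into 8 folds of 6, each fold held out once, and the C with the best mean fold accuracy selected. Here `train_and_score` is a placeholder for the actual linear-kernel SVM training and evaluation (OpenCV's SVM in the paper); all names are ours:

```python
def select_C(samples, train_and_score, c_start=0.0001, c_stop=1.0, c_step=0.0001):
    """samples: list of (feature_vector, label) pairs for one signer
    (48 of them in the setup described above).

    train_and_score(train, held_out, c) must return the accuracy of a
    classifier trained on `train` with parameter c and tested on
    `held_out`. Returns the best C and its mean fold accuracy.
    """
    fold_size = len(samples) // 8
    folds = [samples[i:i + fold_size]
             for i in range(0, 8 * fold_size, fold_size)]
    best_c, best_acc = c_start, -1.0
    c = c_start
    while c <= c_stop + 1e-12:
        acc = 0.0
        for k in range(8):
            held_out = folds[k]
            train = [x for j, f in enumerate(folds) if j != k for x in f]
            acc += train_and_score(train, held_out, c)
        acc /= 8.0
        if acc > best_acc:
            best_acc, best_c = acc, c
        c += c_step
    return best_c, best_acc
```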
3.3 Results and discussions
The system was implemented in C++ on Ubuntu 12.04 (Linux kernel 3.2.0-54-generic, 64-bit) with an Intel Core i5-2310 CPU at 2.90 GHz. The SVM class of OpenCV was used for classification. Our system takes the edge length threshold l as user input; this is the only input parameter. It prevents small quasi-straight line segments from taking part in the feature extraction phase: a quasi-straight segment is considered only if its length is at least l. We experimented with various values of l and observed that the system works well for \(l=4\). The average error rates found by the proposed method for various l on the CEDAR dataset are shown in Fig. 13.
As the initial setup, for every individual, we used a training size of \(\langle 16\)+\(16 \rangle \), i.e., 16 genuine signatures and 16 faked signatures for training, as mentioned in Sect. 3.2. To report the error rates (FAR, FRR, and AER), we carried out the experiments four times, changing the training set randomly (with the test set consisting of the remaining signatures of that individual). The four different random arrangements are referred to as Set 1, Set 2, Set 3, and Set 4. The test result for Set 4 of the CEDAR dataset is shown in Table 3.
During each test, the corresponding error rates are recorded, and finally the resultant FAR, FRR, and AER are reported. Besides the training size \(\langle 16\)+\(16 \rangle \), as mentioned in Sect. 3.2, we also tested the performance of our algorithm with training sizes \(\langle 8\)+\(8\rangle \) and \(\langle 12\)+\(12\rangle \). The corresponding set-wise results are shown in Tables 4 and 5 for the CEDAR and GPDS100 datasets, respectively. The EER values are reported in Tables 4 and 5 for each training size over the various sets of training-test arrangements.
Comparative results with existing methods are shown in Tables 6 and 7. Only those existing methods that report results on the CEDAR and GPDS100/160/200/300 datasets are listed; the results are taken from the respective papers. We first discuss the results obtained on the CEDAR dataset and then those on GPDS100. We notice an average error rate below \(3\%\) (FAR = 3.35, FRR = 2.39) for the proposed method on the CEDAR dataset with training size \(\langle 16\)+\(16 \rangle \). We now discuss the training policies of the other models and the corresponding FAR and FRR values, considering the four methods with the four best results in Table 6 other than the proposed method. Bharathi and Shekar [5] used chain-code (4-directional) based directional features from the contour of the signature, with SVM as the classifier, and obtained FAR = 7.84 and FRR = 9.36. In their model, they used 12 genuine samples of a person and 108 random forgeries from other persons for training (taking 2 from each of the remaining persons; \(54\times 2=108\)); the test set for each person consisted of 12 genuine signatures and 24 forgeries of the same person. The results of the proposed method with training size \(\langle 12\)+\(12\rangle \) are better than theirs. Larkins and Mayo [33] used 16 genuine signatures for training; eight genuine signatures and 24 forgeries of the same person were then used for testing. They used an adaptive feature thresholding (AFT) based similarity score, and their automatic verification method resulted in FAR = 10.96 and FRR = 8.16. The training policies of these methods are not directly comparable with ours; still, they give an idea of the performance of our proposed method.
Kumar and Puhan [29] used the same training size as ours, i.e., 16 genuine signatures and 16 forgeries per person, with SVM as the classifier; the corresponding FAR and FRR values are 5.68 and 6.36, respectively. In their recent work [44], Serdouk et al. also used the same training size as ours: 16 genuine signatures and 16 forgeries in the training stage, with the remaining signatures (eight genuine + eight forgeries for CEDAR and eight genuine + 14 forgeries for GPDS100) used to test the verification performance. We find this method directly comparable with our proposed method; it has also achieved comparable, and sometimes better, performance than other systems. In comparison with Serdouk et al., our proposed method exhibits a better FAR = 3.35 (Serdouk et al.: 4.93) and a comparable FRR = 2.39 (Serdouk et al.: 2.12). Loka et al. [34] used two training strategies: in the first, they used genuine signatures and simulated forgeries, whereas in the second they used genuine signatures and only random forgeries. In the testing phase, the remaining genuine and forgery samples were tested using a binary SVM. The second strategy is comparable with our proposed method, as we also employ random forgeries in training. With training size 16, they obtained FRR = 6.22 and FAR = 5.33. In conclusion, on the CEDAR dataset, the performance of our proposed method is either better than or comparable to that of other methods.
The GPDS dataset is available in various sizes: GPDS100 consists of the samples of the first 100 persons, GPDS160 of the first 160, GPDS200 of the first 200, and GPDS300 of all 300 persons. Our test results refer to the GPDS100 dataset. For GPDS100, the average error rate of our method is 12.42 (FAR = 15.04, FRR = 7.85), obtained with training size \(\langle 16\)+\(16 \rangle \), which is comparable to other methods (see Table 7). Serdouk et al. also provided results for GPDS100, reporting FAR = 13.16 and FRR = 11.38 (AER = 12.52) with training size 16, which is comparable to our results. Good results are obtained for many of the 100 signers, but poor results for some persons lower the overall average accuracy. We show group-wise error rates for GPDS100 in Fig. 14, where each group contains 10 signers: Group 1 covers signers 1 to 10, Group 2 covers signers 11 to 20, and so on. If we consider the best four groups (the best \(40\%\) of results), with their lower average error rates, we obtain an AER close to \(10\%\). Zois et al. reported the same behavior on the GPDS300 dataset in their recent paper [50].
We notice three other very recently published works that employ pixel distribution-based and geometric features. Zois et al. [51] presented a method based on lattice structure arrangements and pixel distribution; using random training and testing sets, they obtained FAR \(= 12.35\%\), FRR \(= 12.21\%\), and AER \(= 12.28\%\) on the CEDAR dataset and FAR \(= 9.11\%\), FRR \(= 5.05\%\), and AER \(= 7.08\%\) on GPDS. In the work of Sharif et al. [45], the authors used geometric features and features generated from the local distribution of pixels, with a genetic algorithm-based feature selection followed by SVM classification. With a training-to-testing ratio of 70:30, they obtained FAR \(= 4.17\%\), FRR \(= 4.17\%\), and AER \(= 4.17\%\) on CEDAR and FAR \(= 6.67\%\), FRR \(= 4.16\%\), and AER \(= 5.42\%\) on GPDS. Batool et al. [3] presented a method that generates features from the distribution of pixels in regions of the signatures, with SVM for classification. With a training-to-testing ratio of 70:30, they obtained FAR \(= 3.34\%\) and FRR \(= 3.75\%\) on CEDAR and FAR \(= 9.17\%\) and FRR \(= 10.0\%\) on GPDS.
The proposed feature extraction is significantly fast. For example, the CPU time required to generate all features for the image shown in Fig. 1 (size = \(582\times 486\), 3638 edge pixels) is 0.028 s (excluding binarization). On average, the classification time for a single image is less than 0.2 s (excluding cross-validation tests), which shows the fitness of the feature set for real-time applications.
3.3.1 Other datasets and skilled forgeries
In addition to the two datasets (CEDAR and GPDS100), we have tested our method on two other datasets. The first is the dataset created at the Netherlands Forensic Institute (NFI) [7]. It consists of authentic signatures from 100 newly introduced writers and forged signatures from 33 writers (the writers are NFI employees), 1953 signatures in total. The dataset is referred to as SigComp2009.
Another dataset, consisting of forged signatures imitating the genuine signatures of the CEDAR dataset, was created by us. A genuine signature is selected for each person, and we imitate the signature style. The copied signatures were written on plain paper and captured by a mobile camera. We refer to this dataset as mobile-captured skilled forgeries of CEDAR genuine signatures, or MCSFC for short. In preparing MCSFC, special care was taken to ensure that our forged signatures look as good as the genuine ones; hence, these signatures are highly skilled forgeries. Sample signature images from MCSFC are shown in Fig. 17.
In addition to SVM, we have experimented with four other classification schemes, namely Multi-Layer Perceptron (MLP), Multiclass Classifier (MCC), Random Forest (RF), and Simple Logistic (SL) Regression. The Multi-Layer Perceptron (MLP) is a type of neural network employed for computational tasks such as predictive modeling. The Multiclass Classifier (MCC) handles classification problems that normally involve more than two classes or outputs, though variants have been designed for binary classification as well. The Random Forest (RF) classifier consists of several decision trees built on various subsets of the data; it collects the prediction of each tree and produces the final output by majority voting. Simple Logistic (SL) or Logistic Regression is one of the most popular supervised machine learning techniques; in application, it predicts outputs as probabilistic values in [0, 1] [49].
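The majority-voting step of the Random Forest described above can be illustrated in isolation; a toy sketch over per-tree labels:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Final Random Forest output: the label predicted by most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

label = majority_vote(["genuine", "forged", "genuine", "genuine", "forged"])
# → "genuine"
```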
For MLP, we use the following parameter settings: number of hidden layers \(=40\), learning rate \(=0.3\), momentum \(=0.2\), batch size \(=100\), number of training epochs \(=500\), and validation threshold \(=20\). For MCC, the batch size is set to 100, the classifier model is logistic regression with a ridge estimator, and the random width factor is set to 2.0. For RF, the bag size equals the training set size, the batch size is set to 100, the maximum depth of the trees is unlimited, and the number of iterations is set to 100. For SL, the maximum number of boosting iterations is set to 500, and the batch size is 100.
To evaluate our proposed method on the SigComp2009 dataset, we used a training-to-testing ratio of 80:20. The detection accuracies are \(91.48\%\), \(90.73\%\), \(89.43\%\), \(80.60\%\), and \(86.53\%\) for SVM, Random Forest (RF), Multi-Layer Perceptron (MLP), Multiclass Classifier (MCC), and Simple Logistic (SL), respectively. The SVM shows the best result, and the proposed method achieves an EER \(=8.51\%\) compared with the \(9.15\%\) reported by Blankers et al. [7].
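The EER quoted above is the operating point where FAR and FRR coincide. Given FAR/FRR values sampled over decision thresholds, it can be approximated as follows (a sketch; the exact interpolation behind the reported figures may differ):

```python
def approx_eer(fars, frrs):
    """Approximate equal error rate: at the threshold index where
    |FAR - FRR| is smallest, report the mean of the two rates."""
    i = min(range(len(fars)), key=lambda i: abs(fars[i] - frrs[i]))
    return (fars[i] + frrs[i]) / 2.0

# FAR falls and FRR rises as the acceptance threshold tightens:
print(approx_eer([20.0, 10.0, 5.0], [2.0, 9.0, 15.0]))  # → 9.5
```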
A separate test is conducted on MCSFC using the training size \(\langle 16\)+\(16 \rangle \), as mentioned earlier. Three forged signatures from MCSFC are taken for each person. Even though these forgeries closely resemble the genuine signatures, we observe notable accuracy in rejecting them: the FAR with respect to MCSFC is \(24.5\%\), i.e., \(75.5\%\) of the skilled forgeries are correctly rejected.
The plots in Fig. 18 show the corresponding FPR, FRR, and EER values for the CEDAR and GPDS datasets, respectively. The ROC plots for all the classification methods are shown in Fig. 20 for both the CEDAR and GPDS datasets. The Precision and Recall values for the GPDS dataset are shown in Fig. 19. We notice that SVM shows the best results and that RF performs almost as well.
3.3.2 Scale invariance
We tested our proposed feature set on scaled-down signature images created from the CEDAR dataset using scaling factors of 2 and 3, where a scaling factor k indicates that both height and width are reduced by a factor of k. As representative data, Table 8 presents the recognition accuracy of all the classification methods on Person 12 of the CEDAR dataset, for both 10-fold cross-validation and a training-to-testing ratio of 80:20. For the full dataset, we found that MLP and SVM show accuracies just above \(90.0\%\) in both settings when \(k=2\), which is very promising and is due to the normalized nature of the extracted feature values. We observe that when k increases to 3, the detection accuracy decreases.
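The scaled-down test images can be produced by simple nearest-neighbour subsampling; a sketch of reducing a binary image by a factor k (a different resampling filter may have been used in the actual experiments):

```python
def downscale(img, k):
    """Reduce height and width by factor k by keeping every k-th row/column
    of a 2D image given as a list of rows."""
    return [row[::k] for row in img[::k]]

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
small = downscale(img, 2)  # → [[1, 0], [0, 1]]
```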
3.3.3 Impactful features
For the CEDAR dataset, we run one test loop for each of the 55 persons. Each loop contains 16 tests (as there are 24 genuine and 24 forged signatures and we use \(\langle 16 + 16 \rangle \) training). The WEKA attribute-selection method reports the impactful features at the end of each person's test loop, i.e., after 16 tests. Thus, 55 times we obtain the 78 features ranked in descending order of impact. We use the following parameter settings for SVM: number of xval folds used when estimating subset accuracy \(= 5\), number of attributes to evaluate in parallel \(= 1\), seed for randomly generating xval splits \(= 1\), and threshold \(= 0.01\). We observe that the WEKA attribute-selection method picks the first 36 features, i.e., \(\langle f_1, f_2, \ldots , f_{36} \rangle \), more frequently as the comparatively impactful ones among the set of 78 features. For example, on CEDAR, one random test loop (over all 55 persons) identifies \(\langle f_{17}, f_{18}, f_{43}, f_{26}, f_{19}, f_4, f_7, f_{20}, f_{23}, f_1 \rangle \) as the 10 most impactful features, i.e., those selected most often across the 55 tests.
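The ranking of impactful features by how often they are picked across the 55 per-person test loops can be sketched as follows; `most_impactful` is a hypothetical helper for illustration, not part of WEKA:

```python
from collections import Counter

def most_impactful(selected_per_loop, top=10):
    """Rank feature names by how many test loops selected them."""
    counts = Counter(f for loop in selected_per_loop for f in loop)
    return [f for f, _ in counts.most_common(top)]

loops = [["f17", "f18"], ["f17", "f43"], ["f17", "f18"]]
print(most_impactful(loops, top=2))  # → ['f17', 'f18']
```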
4 Conclusion
In this paper, we have proposed a novel feature set for signature verification based on the distribution of boundary-edge pixels of the signature image. Previously, in some other works, directional features in terms of the longest pixel run or chain code have been used in combination with additional features. In this work, newly proposed classes of quasi-straight line segments have been used to define the discriminating features. We conducted experiments on standard datasets such as CEDAR and GPDS100. Experimental results corroborate that the proposed feature set performs significantly well and may be of use in real-time applications. There is also scope to combine it with other geometric features, such as convex hull shape, count and locations of endpoints, count of closed loops, etc., to improve the accuracy level. We must mention that the edge length threshold parameter plays a vital role in feature generation and classification; hence, the automatic selection of this threshold, which may be related to the resolution of the signature images, is worth investigating. Further, considering any two consecutive quasi-straight line segments from the boundary, it may be worth investigating whether their non-singular code values, singular code values, and non-singular run lengths together can capture the local curvature information at the junction of the two segments. The nature and amount of curvature can be important features in the recognition process, and notably, no extra computation is needed to extract this curvature information, which may further improve the accuracy of the model.
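As a hypothetical illustration of the curvature idea above: with 8-direction chain codes, the turn at the junction of two segments can be read off their dominant (non-singular) codes; a sketch:

```python
def turn(d1, d2):
    """Signed turn between two 8-direction chain codes, in 45-degree steps.
    Positive means counter-clockwise; the result lies in [-3, 4]."""
    diff = (d2 - d1) % 8
    return diff - 8 if diff > 4 else diff

# Adjacent directions differ by one 45-degree step:
print(turn(0, 1))  # → 1
```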
Abbreviations
E: Set of boundary edge pixels
\(C_i\): The ith (\(i=1\ldots 12\)) class of quasi-straight line segments
n: Non-singular code
s: Singular code
l: The straight edge length threshold
Q: The set of quasi-straight line segments
\(q_i\): The ith segment in the set Q
p: A pixel in E
\(d(p_1,p_2)\): Chain code direction from pixel \(p_1\) to pixel \(p_2\), where \(p_2\) is in the 8-neighborhood (8N) of \(p_1\)
\(f_i\): The ith feature
\(R_i\): The ith region in the signature area; \(i= 1\ldots 6\)
\(p_i\): The number of edge pixels in E belonging to the ith class
P: Total number of edge pixels in E
\(n_i\): Number of edge segments in the ith class
\(c_p\): Number of common pixels in two neighboring classes \(C_i\) and \(C_j\), \(j=(i+1)\bmod 12\)
\(r_{ij}\): Number of pixels in region \(R_j\) belonging to class \(C_i\)
\(m_{c_i}\): The region number (from 1 to 6) where class \(C_i\) has the maximum contribution
\(l_{R_j}\): The class number (from 1 to 12) with the maximum presence in region \(R_j\)
References
Ansari AQ, Hanmandlu M, Kour J, Singh AK (2014) Online signature verification using segment-level fuzzy modelling. IET Biom 3(3):113–127
Batista L, Granger E, Sabourin R (2012) Dynamic selection of generative-discriminative ensembles for offline signature verification. Pattern Recognit 45(4):1326–1340
Batool FE, Attique M, Sharif M, Javed K, Nazir M, Abbasi AA, Iqbal Z, Riaz N (2020) Offline signature verification system: a novel technique of fusion of GLCM and geometric features using SVM. Multimed Tools Appl 1–20
Bertolini D, Oliveira LS, Justino E, Sabourin R (2010) Reducing forgeries in writer-independent offline signature verification through ensemble of classifiers. Pattern Recognit 43(1):387–396
Bharathi RK, Shekar BH (2013) Offline signature verification based on chain code histogram and support vector machine. In: International conference on advances in computing, communications and informatics (ICACCI), pp 2063–2068
Bhattacharya I, Ghosh P, Biswas S (2013) Offline signature verification using pixel matching technique. Procedia Technol 10:970–977
Blankers VL, van den Heuvel CE, Franke KY, Vuurpijl LG (2009) ICDAR 2009 signature verification competition. In: International conference on document analysis and recognition, pp 1403–1407
CEDAR (Center of Excellence for Document Analysis and Recognition) Dataset. http://www.cedar.buffalo.edu/NIJ/publications.html. Last accessed: 2017-06-12
Chen S, Srihari S (2005) Use of exterior contours and shape features in offline signature verification. In: International conference on document analysis and recognition (ICDAR), pp 1280–1284
Chen S, Srihari S (2006) A new offline signature verification method based on graph matching. In: International conference on pattern recognition (ICPR), pp 869–872
Cpalka K, Zalasinski M (2014) Online signature verification using vertical signature partitioning. Expert Syst Appl 41(9):4170–4180
Cpalka K, Zalasinski M, Rutkowski L (2016) A new algorithm for identity verification based on the analysis of a handwritten dynamic signature. Appl Soft Comput 43:47–56
Ferrer MA, Alonso JB, Travieso CM (2005) Offline geometric parameters for automatic signature verification using fixed-point arithmetic. IEEE Trans Pattern Anal Mach Intell 27(6):993–997
Galbally J, Marcel S, Fiérrez J (2014) Image quality assessment for fake biometric detection: application to iris, fingerprint, and face recognition. IEEE Trans Image Process 23(2):710–724
Gonzalez RC, Woods RE (2006) Digital Image Processing (3rd Edition). Prentice-Hall, Inc
GPDS100 (Grupo de Procesado Digital de la Senal) Dataset. http://www.gpds.ulpgc.es/download/. Last accessed: 2017-06-12
Griechisch E, Malik MI, Liwicki M (2014) Online signature verification based on Kolmogorov–Smirnov distribution distance. In: International conference on frontiers in handwriting recognition (ICFHR), pp 738–742
Guerbai Y, Chibani Y, Hadjadji B (2015) The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters. Pattern Recognit 48(1):103–113
Hafemann LG, Sabourin R, Oliveira LS (2016) Writer-independent feature learning for offline signature verification using deep convolutional neural networks. In: International joint conference on neural networks, pp 2576–2583
Hamadene A, Chibani Y, Nemmour H (2012) Offline handwritten signature verification using contourlet transform and co-occurrence matrix. In: 2012 International conference on frontiers in handwriting recognition (ICFHR), pp 343–347
Hanmandlu M, Yusof MHM, Madasu VK (2005) Offline signature verification and forgery detection using fuzzy modeling. Pattern Recognit 38(3):341–356
He Z, You X, Tang YY, Fang B, Du J (2006) Handwriting-based personal identification. Int J Pattern Recognit Artif Intell 20(2):209–225
Jain A, Hong L, Bolle R (1997) On-line fingerprint verification. IEEE Trans Pattern Anal Mach Intell 19(4):302–314
Jiang N, Xu J, Yu W, Goto S (2013) Gradient local binary patterns for human detection. In: International symposium on circuits and systems, pp 978–981
Justino EJ, Bortolozzi F, Sabourin R (2005) A comparison of SVM and HMM classifiers in the offline signature verification. Pattern Recognit Lett 26(9):1377–1385
Kalera MK, Srihari S, Xu A (2004) Offline signature verification and identification using distance statistics. Int J Pattern Recognit Artif Intell 18(07):1339–1360
Klette R, Rosenfeld A (2004) Digital straightness: a review. Discrete Appl Math 139(1–3):197–230
Kovari B, Charaf H (2013) A study on the consistency and significance of local features in offline signature verification. Pattern Recognit Lett 34(3):247–255
Kumar MM, Puhan NB (2014) Offline signature verification: upper and lower envelope shape analysis using chord moments. IET Biom 3(4):347–354
Kumar R, Kundu L, Chanda B, Sharma J (2010) A writer-independent offline signature verification system based on signature morphology. In: International conference on intelligent interactive technologies and multimedia, pp 261–265
Kumar R, Sharma J, Chanda B (2012) Writer-independent offline signature verification using surroundedness feature. Pattern Recognit Lett 33(3):301–308
Lajevardi SM, Arakala A, Davis SA, Horadam KJ (2013) Retina verification system based on biometric graph matching. IEEE Trans Image Process 22(9):3625–3635
Larkins R, Mayo M (2008) Adaptive feature thresholding for offline signature verification. In: International conference on image and vision computing, pp 1–6
Loka H, Zois EN, Economou G (2017) Long range correlation of preceded pixels relations and application to offline signature verification. IET Biom 6(2):70–78
Lv H, Wang W, Wang C, Zhuo Q (2005) Offline Chinese signature verification based on support vector machines. Pattern Recognit Lett 26(15):2390–2399
Nguyen V, Kawazoe Y, Wakabayashi T, Pal U, Blumenstein M (2010) Performance analysis of the gradient feature and the modified direction feature for offline signature verification. In: International conference on frontiers in handwriting recognition (ICFHR), pp 303–307
Ooi SY, Teoh ABJ, Pang YH, Hiew BY (2016) Imagebased handwritten signature verification using hybrid methods of discrete radon transform, principal component analysis and probabilistic neural network. Appl Soft Comput 40:274–282
Pal S, Alaei A, Pal U, Blumenstein M (2016) Performance of an offline signature verification method based on texture features on a large Indic-script signature dataset. In: Workshop on document analysis systems (DAS), pp 72–77
Pham TA, Le H, Do N (2015) Offline handwritten signature verification using local and global features. Ann Math Artif Intell 75(1–2):231–247
Rosenfeld A (1974) Digital straight line segments. IEEE Trans Comput 23(12):1264–1269
Sae-Bae N, Memon ND (2014) Online signature verification on mobile devices. IEEE Trans Inf Forens Secur 9(6):933–947
Said HES, Tan TN, Baker KD (2000) Personal identification based on handwriting. Pattern Recognit 33(1):149–160
Sauvola JJ, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recognit 33(2):225–236
Serdouk Y, Nemmour H, Chibani Y (2016) New offline handwritten signature verification method based on artificial immune recognition system. Expert Syst Appl 51:186–194
Muhammad SK, Muhammad AF, Muhammad Y, Mussarat F, Steven L (2020) A framework for offline signature verification system: best features selection approach. Pattern Recognit Lett 139:50–59
Ruiz-del-Solar J, Devia C, Loncomilla P, Concha F (2008) Offline signature verification using local interest points and descriptors. In: Ibero-American congress on pattern recognition, pp 22–29
Vargas JF, Ferrer MA, Travieso C, Alonso JB (2011) Offline signature verification based on grey level information using texture features. Pattern Recognit 44(2):375–385
Vargas JF, Ferrer MA, Travieso CM, Alonso JB (2008) Offline signature verification based on high pressure polar distribution. In: International conference on frontiers in handwriting recognition (ICFHR), pp 373–378
WEKA—The Workbench for Machine Learning. https://www.cs.waikato.ac.nz/ml/weka/. Last accessed: 2021-01-18
Zois EN, Alewijnse L, Economou G (2016) Offline signature verification and quality characterization using poset-oriented grid features. Pattern Recognit 54:162–177
Zois EN, Alexandridis A, Economou G (2019) Writer independent offline signature verification based on asymmetric pixel relations and unrelated training-testing datasets. Expert Syst Appl 125:14–32
ZulNarnain Z, Rahim MSM, Ismail NAF, Arsad MAM (2016) Triangular geometric feature for offline signature verification. World Acad Sci Eng Technol Int J Comput Electr Autom Control Inf Eng 10(3):485–488
Funding
Open Access funding provided by Fachhochschule Nordwestschweiz FHNW.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ajij, M., Pratihar, S., Nayak, S.R. et al. Offline signature verification using elementary combinations of directional codes from boundary pixels. Neural Comput & Applic 35, 4939–4956 (2023). https://doi.org/10.1007/s00521021058546