Off-line signature verification using elementary combinations of directional codes from boundary pixels

Verifying the genuineness of official documents, such as bank checks, certificates, contract forms, and bonds, remains a challenging task in terms of accuracy and robustness. Here, genuineness refers to the degree to which the signature contained in a document matches the original signatures of the authorized person. Signatures of authorized persons are considered known in advance. In this paper, a novel feature set based on the quasi-straightness of boundary pixel runs is introduced for signature verification. We extract quasi-straight line segments using elementary combinations of directional codes from the signature boundary pixels and subsequently obtain the feature set from the various quasi-straight line classes. The quasi-straight line segments blend straightness with small curvatures, resulting in a robust feature set for the verification of signatures. We have used a Support Vector Machine (SVM) for classification and report results on standard signature datasets, namely CEDAR (Center of Excellence for Document Analysis and Recognition) and GPDS-100 (Grupo de Procesado Digital de la Señal). The results show that the proposed method outperforms the existing state of the art.


Introduction
Today, recognition of persons by machines is an active area of research in which biometrics plays an essential role. The developed models are generally based on two common biometric feature types: physical features, such as face, fingerprint, and retina [14,23,32], and behavioral features, such as voice and handwriting [22,42]. A signature is a human behavioral characteristic with which individuals can be uniquely identified. Therefore, when it comes to security and fraud prevention, signatures can be used to design authentication systems. Bank checks, contract documents, and certificates, for example, are often faked and claimed to be originals. Thus, to verify this type of document, we need prior knowledge about the original signers and their signature style. Therefore, to investigate the genuineness of a signed document, automated signature verification can be applied. Here, we assume prior knowledge of the signers and a ready-made dataset with genuine signatures of the signers (for training the recognition model).
There are two types of approaches to automating signature verification: online verification [1,11,12,17,41] and offline verification [10,13,18,33,35,44]. Offline signature verification is considered more challenging than online verification because dynamic information, such as pen-tip pressure, velocity, and acceleration, is not available in offline signature images. On the other hand, the special arrangements required for acquiring signatures make the online method impractical in many situations. Moreover, only an offline method can verify the genuineness of existing legal documents or papers.
This paper presents a novel feature set for off-line signature verification. The objective is to detect a faked signature in relation to a particular signer if we have a ready dataset of genuine and a sample of faked signatures of the signer. There are three basic types of forgeries, namely random forgeries, simple forgeries, and skilled forgeries. For the first two types, the faked signature is created without knowing the name, signature shape, etc., or they are not done skillfully. But, in the case of skilled forgeries, the creator of the faked signature is assumed to be an expert in imitating the signature shape and style, and the genuine signature style is known to the imitator. It is obvious that skilled forgery detection is more challenging in the absence of dynamic features.
Related works and motivation: The performance of a verification model depends on the set of features being used. A considerable amount of work on offline signature verification uses various types of feature sets. In most works, the features are based on topology, geometric information, gradients, structural information, and concavity [13,26,28,33,39]. For example, Ferrer et al. [13] proposed a method that uses a set of geometric features describing the signature envelope and the distribution of the strokes; a hidden Markov model, a support vector machine, and a Euclidean distance classifier were subsequently used for verification. ZulNarnain et al. [52], in a recent work, introduced a signature verification scheme based on geometric features such as the sides, angles, and perimeters of triangles derived by triangulating a signature image. For classification, they used a Euclidean classifier and a voting-based classifier. Other reported works are based on gray value distribution [21,24,48], directional runs of pixels [5,35,36], pixel surroundings [31], and curvature-related features [18]. Graphometric feature-based works are also available in the literature [4]. In [29], the authors proposed a shape feature called the chord moment to analyze upper and lower signature envelopes; for verification, a support vector machine (SVM) was used with the chord moment-based features. Frequently, multiple features have been combined in a model to improve classification accuracy. For example, in [35], moment information and gray value distribution were used along with a directional feature: the authors used a 16-directional feature obtained from the distribution of pixels in the thinned signature strokes. A combination of different types of features makes the feature extraction part costly.
Clearly, computing moment information along with the 16-directional feature is computationally costly if the model is to be used in real-time applications. In a very recent work by Serdouk et al. [44], directional distribution is not the sole feature extraction strategy: a directional longest-run feature, computed in the horizontal, vertical, and two major diagonal directions, is combined with gradient local binary patterns (GLBP) to strengthen the feature set. So, they used a combination of topological and gradient features, with the longest run of pixels as the topological feature and GLBP capturing gradient information in each pixel's neighborhood. Computing GLBP at each pixel of the signature image can be considered costly. Serdouk et al. also proposed a verification system based on the Artificial Immune Recognition System (AIRS). A template-based verification scheme was presented as well [50]; that method encodes the geometric structure of signatures using grid templates.
We also note that many works in the state of the art use ensembles of multiple classifiers to produce the best results. For example, Ooi et al. [37], in a recent work, presented a framework based on the Discrete Radon Transform (DRT), Principal Component Analysis (PCA), and a Probabilistic Neural Network (PNN) to distinguish forgeries from genuine signatures. In applications, however, the deployed hardware should classify and decide quickly. A summary of the existing methods and their classification techniques is given in Table 1.
Although several methods and recognition models have been developed, the results of the existing methods show that there is still plenty of scope for improvement in terms of accuracy and robustness. In addition, there is scope to propose a strong feature set that can be paired with a low-complexity classifier to yield better performance. It is an additional advantage if the feature set can be extracted easily from the signature images. In this paper, we propose a novel set of features derived from the quasi-straight digital curve segments defining the signature strokes. The following sections describe the proposed method and experimental results in detail.

Proposed method
The directional distribution of edge pixels is often considered an important feature. Here, we propose a set of features using the runs of pixels along the signature edge boundary. To do so, we introduce classes of almost-straight line segments that follow certain straightness criteria. First, the set of edge pixels E, coming from the signature boundary, is obtained by applying a 3 × 3 filter on the binarized signature image [15]. We make the boundary edge 1-pixel thick, removing redundant pixels with a morphological thinning procedure [15]. For binarization, we used the method given in [43]. The edge boundary for a sample signature image (cropped and zoomed portion) is shown in Fig. 1. The edge pixels in E can be understood as a set of digital straight line segments. The neighbors of a pixel are represented using values from the range {0, 1, 2, ..., 7} (considering the 8-neighborhood). The pixel to the east of the center pixel is represented by 0, and the other pixels by 1, 2, 3, 4, 5, 6, and 7 in counter-clockwise order; so the southeast pixel gets the code 7. This scheme is referred to as Freeman's chain code. Straight line segments are represented as sequences of pixels that follow certain regularity properties in terms of the chain code values [27,40]. A curve segment is digitally straight if and only if its chain code contains at most two values, differing by ±1 (mod 8), and one of these two values occurs only in runs of length 1 (Property-1). The code that always has run-length 1 is referred to as the singular code (s), and the other code as the non-singular code (n). Also, the runs of the non-singular code n can have only two lengths, which are consecutive integers (Property-2). Here, only Property-1 was used for detecting the straight line segments and for the subsequent feature extraction.
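Property-1 can be checked directly on a Freeman chain-code sequence. The following sketch is our own illustration of the test, not the paper's implementation; the helper name is hypothetical:

```python
def is_quasi_straight(codes):
    """Check Property-1: the sequence uses at most two chain-code values
    differing by ±1 (mod 8), and one of them (the singular code) occurs
    only in runs of length 1."""
    values = sorted(set(codes))
    if len(values) == 1:
        return True                      # a single direction: trivially straight
    if len(values) != 2:
        return False
    a, b = values
    if (b - a) % 8 not in (1, 7):        # the two codes must differ by ±1 mod 8
        return False

    def max_run(c):
        best = run = 0
        for x in codes:
            run = run + 1 if x == c else 0
            best = max(best, run)
        return best

    # at least one of the two codes must never run longer than 1
    return max_run(a) == 1 or max_run(b) == 1

print(is_quasi_straight([0, 0, 1, 0, 0, 0, 1, 0]))  # True: 1 is singular
print(is_quasi_straight([0, 0, 1, 1, 0]))           # False: both codes have runs > 1
```

Note that Property-2 (non-singular runs restricted to two consecutive lengths) is deliberately not checked, which is exactly what distinguishes quasi-straight segments from strictly digital straight ones.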
Henceforth, as we use only Property-1, the extracted straight line segments are referred to as quasi-straight line segments. We consider twelve quasi-straight line classes, covering all possible combinations of singular and non-singular codes. The features used in our proposed method are based on the distribution of the boundary edge pixels among these twelve quasi-straight line classes. The flowchart in Fig. 2 represents the proposed method.

Quasi-straight line segment classes
Twelve different classes C1, C2, C3, ..., C12 of quasi-straight line segments are considered in connection with Property-1 of digital straightness. Each class is defined by its singular (s) and non-singular (n) code values; Fig. 4 demonstrates the classes on an example digital curve. Here, the code value ∅ denotes that the singular code is absent or not defined. In identifying the quasi-straight line segments in the signature stroke boundaries, no restriction is applied to the run lengths of the non-singular code (n). The number of classes could be increased further by considering variations in the non-singular run length, but this would make genuine-signature detection over-sensitive. The method used for detecting quasi-straight line segments is given in Algorithm 1. The algorithm takes the edge map E, the non-singular code n, and the singular code s as input. It selects an initial run of two consecutive pixels p1 and p2, where the direction code from p1 to p2 is n, i.e., dir(p1, p2) = n. This run, initiated from the seed (p1, p2), is then extended along the curve, where possible, in two prospective directions: n and (n + 4) mod 8 (the non-singular direction and its opposite). The singular direction s, another input to the algorithm, is an important part of this extension. To extend the current run, the procedure Extend-Segment is invoked, as shown in Lines 6 and 7 of Algorithm 1. The maximal quasi-straight line segment started from the seed (p1, p2) is finally reported (see Fig. 3). The process continues until all quasi-straight line segments matching non-singular code n and singular code s are reported, giving a set (class) of quasi-straight line segments. Different sets (classes) are obtained by varying the code values n and s. The algorithm shown here assumes that s ≠ ∅.
In the case of s = ∅ (e.g., C1: n = 0, s = ∅), the segment from the seed (p1, p2) is extended in directions n and (n + 4) mod 8 simply by checking the neighborhood of the pixels. The algorithm halts at a pixel if either its neighbor is not in direction n (or (n + 4) mod 8) or no unvisited neighbor pixel exists. Note that the procedure Extend-Segment returns Status = TRUE only if the singular code s appears at least once in the run when s ≠ ∅. In our proposed method, a quasi-straight line segment is included in a class only if its length is at least a predefined threshold l (see Line 8 of Algorithm 1). The initial selection of l is based on experiments, so that very small segments are removed from consideration. Necessary discussions are provided in Sect. 3.3.
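A minimal sketch of the run-growing step, assuming edge pixels are stored as a set of (row, col) coordinates. This is our own simplified illustration of the Extend-Segment idea; the paper's Algorithm 1 additionally scans for seed pairs, grows from the seed in both n and (n + 4) mod 8, and maintains visited-pixel bookkeeping:

```python
# 8-neighbour offsets indexed by Freeman code (0 = east, counter-clockwise)
OFFSET = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def extend_segment(edges, start, n, s):
    """Grow a run from `start` in chain-code direction n, allowing single
    steps in the singular direction s (or none if s is None).  Returns the
    pixels collected and whether s actually occurred in the run."""
    seg, saw_s, p = [start], False, start
    prev_was_s = False
    while True:
        step_n = (p[0] + OFFSET[n][0], p[1] + OFFSET[n][1])
        step_s = (p[0] + OFFSET[s][0], p[1] + OFFSET[s][1]) if s is not None else None
        if step_n in edges and step_n not in seg:
            p, prev_was_s = step_n, False
        elif (step_s is not None and step_s in edges
              and step_s not in seg and not prev_was_s):
            p, prev_was_s, saw_s = step_s, True, True  # singular run stays length 1
        else:
            break
        seg.append(p)
    return seg, saw_s
```

A segment would then join the class for (n, s) only if `saw_s` is TRUE (when s ≠ ∅) and the collected run is at least the length threshold l.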
For each class C_i, we extract three feature values:
- The number of quasi-straight line segments in the class C_i, denoted n_i.
- The pixel density of the class C_i, i.e., p_i/P, where p_i is the number of edge pixels in the class and P is the total number of edge pixels in the signature boundary E.
- The average edge length in the class, i.e., p_i/n_i (in terms of pixels).
Hence, from the twelve classes of line segments, we obtain 36 feature values, giving a feature vector ⟨f1, f2, f3, ..., f36⟩ directly from the distribution of edge pixels among the classes. As an example, the distributions of edge pixels in classes C5 and C6 are shown in Fig. 5 for the signature image shown in Fig. 1.
Further, the count of common pixels (c_p) in two neighboring classes C_i and C_j, where j = (i + 1) mod 12, is considered a feature of C_i. The existence of common pixels in two neighboring classes reflects the smoothness of the boundary curves. We consider the ratio c_p/P for every class, which adds twelve more features to the feature vector: ⟨f37, f38, f39, ..., f48⟩.
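The 48 values described above can be assembled as in the following sketch, under the assumption that each class is represented by the list of its extracted segments (pixel sets). The function name and data layout are ours, not the paper's:

```python
def class_features(class_pixels, P):
    """First 48 features: for each of the 12 classes, the segment count n_i,
    the pixel density p_i/P, and the average segment length p_i/n_i; then
    the common-pixel density c_p/P with the next class (mod 12).
    `class_pixels[i]` is a list of segments of class C_{i+1}, each segment
    a set of pixel coordinates; P is the total number of boundary pixels."""
    feats, pixel_sets = [], []
    for segs in class_pixels:                # 12 entries, one per class
        n_i = len(segs)
        pix = set().union(*segs) if segs else set()
        pixel_sets.append(pix)
        p_i = len(pix)
        feats += [n_i, p_i / P, (p_i / n_i) if n_i else 0.0]
    for i in range(12):                      # smoothness: overlap with neighbour
        c_p = len(pixel_sets[i] & pixel_sets[(i + 1) % 12])
        feats.append(c_p / P)
    return feats                             # length 36 + 12 = 48
```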
In addition to these 48 features, we add 30 more. The rectangular signature area is first divided into six equal-sized rectangular regions R1, R2, ..., R6, as shown in Fig. 6. Then we ask the following two questions:
1. Considering a region R_i, which class C_j has the maximum contribution in R_i, i.e., which class is the leader in a given region?
2. Given a class C_j, in which region does it have the maximum contribution, and what is that contribution (pixel density)?
These pieces of information help us to measure the degree of presence of the classes within the signature image area.
The first question gives us 6 values: the six class numbers that are the region-wise leaders. The second question gives us 24 values: for each class C_i (one of the 12 classes), if R_j (j = 1, 2, ..., 6) is the region where C_i has its maximum contribution, we take the feature values j and r_ij/P, where r_ij is the count of pixels of class C_i in region R_j. Hence, from this part, we obtain a total of 30 feature values, i.e., the feature vector ⟨f49, f50, f51, ..., f78⟩.
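Under the assumption that the per-class, per-region pixel counts r_ij have already been tallied into a 12 × 6 array, the 30 region-based values can be computed as in this sketch (our own illustration, with hypothetical names):

```python
def region_features(r, P):
    """Region-based features: r[i][j] = pixel count of class C_{i+1}
    (i = 0..11) in region R_{j+1} (j = 0..5); P = total edge pixels.
    Returns the 6 region-wise leader class numbers, then, for each class,
    the region of its maximum contribution and that contribution's density."""
    # question 1: the leader class (1..12) of each region
    leaders = [max(range(12), key=lambda i: r[i][j]) + 1 for j in range(6)]
    # question 2: per class, the peak region (1..6) and its pixel density
    per_class = []
    for i in range(12):
        j_best = max(range(6), key=lambda j: r[i][j])
        per_class += [j_best + 1, r[i][j_best] / P]
    return leaders + per_class               # 6 + 24 = 30 values
```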

Sample feature vector
We have a feature vector of length 78 for every signature image. As an example, the feature vector extracted for the signature image shown in Fig. 1 is presented in Table 2. The first three columns give the feature values ⟨f1, f2, f3, ..., f36⟩: the number of quasi-straight line segments (n_i), the pixel density (p_i/P), and the average edge length (p_i/n_i) for each of the twelve classes. The fourth column lists 12 values, one per class, giving the common pixel densities (c_p/P) with the neighboring classes. The fifth column shows 12 values (one per class) giving the region number where class C_i has its maximum contribution, denoted m_Ci. The sixth column contains another 12 values showing the corresponding class-wise pixel densities r_ij/P (for all i = 1, 2, ..., 12, as discussed in Sect. 2.2). The last column shows which class C_i has the maximum contribution in each region R_j, i.e., the region-wise leaders, denoted l_Rj; this part yields 6 values. Here, for example, C6 is the leader in R2, R3, R4, and R5; C3 is the leader in R6; and C11 is the leader in R1.

Test datasets
The CEDAR dataset contains signatures of 55 different signers. For each signer, there are 24 genuine signatures and 24 skillfully faked signatures, so the dataset has a total of 1320 genuine signatures and 1320 skilled forgeries. The signature images have 8-bit gray levels and were scanned at 300 dpi. The dataset was created by the Center of Excellence for Document Analysis and Recognition [8].
The GPDS corpus was prepared by the Grupo de Procesado Digital de Señales [16]. The GPDS-100 collection consists of the signatures of the first 100 signers.
Fig. 5: Edge pixels in class C5 (n = 1, s = 2, left) and class C6 (n = 2, s = 1, right) for the signature image shown in Fig. 1.
Fig. 6: The six regions R1, R2, R3, R4, R5, and R6 of the signature content area.
Table 2: Extracted feature set used in the proposed method for the signature image shown in Fig. 1 (columns: class C_i, n_i, p_i/P, p_i/n_i, c_p/P, m_Ci, r_ij/P).
Sample signature images, both genuine and faked, from the CEDAR and GPDS-100 datasets are shown in Figs. 15 and 16, respectively. In many cases, it is difficult to spot differences between the faked signatures and the genuine signatures of the respective person. Hence, we consider these forgeries to be skilled forgeries.

Support vector machine (SVM)
The support vector machine (SVM) is a well-accepted classifier for two-class classification problems, and it has been used extensively for signature verification [2,5,6,29,47,50]. In our proposed method, performance evaluation is done for each individual signer, and the problem is treated as a two-class classification problem: Class-0 corresponds to the genuine signature class and Class-1 to the faked signature class. For both CEDAR and GPDS-100, signatures from the genuine category and signatures from the faked category were taken to train the model; how many signatures are taken from each category depends on the training size in use. For example, we refer to the training size as ⟨16+16⟩ when 16 signatures from the genuine category and 16 from the faked category are used for training. Besides ⟨16+16⟩, we used training sizes ⟨12+12⟩ and ⟨8+8⟩ in our experiments. We found that the LINEAR kernel gives better results than POLY, RBF, and SIGMOID on our feature set, so we used the LINEAR-kernel SVM for classification.
We used 8-fold cross-validation to set the value of the parameter C. For every person, a set of 48 random signatures (equal contributions from the genuine and faked signatures of that person) was considered and divided into 8 equal groups (6 images per group). During cross-validation, 7 groups (42 images) were used for training and one group (6 images) for testing. The training and testing groups were selected in all possible combinations, giving a total of 48 test results (considering all different training samples). The C value was incremented in a loop in small steps of 0.0001, starting at 0.0001 and going up to 1; note that the 48 tests were carried out for every C value in the loop. For each person, the C value that reached the maximum accuracy during cross-validation was selected. Accuracy here means the fraction of Class-0 objects reported as Class-0 and Class-1 objects reported as Class-1.
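The selection loop can be sketched as follows. This is a structural illustration only: `train_eval` is a placeholder for the paper's OpenCV linear-SVM train/test routine, and the toy nearest-centroid classifier below (which ignores C) merely exercises the loop:

```python
import numpy as np

def kfold_select_C(X, y, c_values, train_eval, k=8):
    """Select C by k-fold cross-validation: for each candidate C, train on
    k-1 folds and test on the held-out fold, rotating through all folds,
    and keep the C with the highest mean accuracy (the paper steps C from
    0.0001 to 1 in increments of 0.0001 with k = 8)."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    best_C, best_acc = None, -1.0
    for C in c_values:
        accs = [train_eval(X[np.setdiff1d(idx, te)], y[np.setdiff1d(idx, te)],
                           X[te], y[te], C) for te in folds]
        acc = float(np.mean(accs))
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc

def toy_classifier(Xtr, ytr, Xte, yte, C):
    """Stand-in classifier: nearest class centroid (C is unused here; a
    real linear SVM's accuracy would vary with C)."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float((pred == yte).mean())
```

With 48 signatures per person and 8 folds, each fold holds 6 images, matching the grouping described above.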

Results and discussions
The system was implemented in C++ on Ubuntu 12.04 (Linux kernel 3.2.0-54-generic, 64-bit) with an Intel® Core™ i5-2310 CPU at 2.90 GHz. The SVM class of OpenCV was used for classification. Our system takes the edge length threshold l as user input; this is the only input parameter. It is used to exclude small quasi-straight line segments from the feature extraction phase: a quasi-straight segment is considered only if its length is at least l. We experimented empirically with various l values and observed that the system works well for l = 4. The average error rates found by the proposed method for various l on the CEDAR dataset are shown in Fig. 13. As an initial setup, for every individual, we used a training size of ⟨16+16⟩, i.e., 16 genuine and 16 faked signatures for training, as mentioned in Sect. 3.2. Further, to report the error rates (FAR, FRR, and AER), we carried out the experiments four times, changing the training set randomly each time (the test set consisting of the remaining signatures of that individual). The four different random arrangements are referred to as Set-1, Set-2, Set-3, and Set-4. The test result for Set-4 on the CEDAR dataset is shown in Table 3.
During each test, the corresponding error rates are recorded, and finally the resulting FAR, FRR, and AER are reported. Besides the training size ⟨16+16⟩ mentioned in Sect. 3.2, we also tested the performance of our algorithm on training sizes ⟨8+8⟩ and ⟨12+12⟩. The corresponding set-wise results are shown in Tables 4 and 5 for the CEDAR and GPDS-100 datasets, respectively. The EER values are reported in Tables 4 and 5 against each training size, considering the various sets of training-test arrangements.
Comparative results with existing methods are shown in Tables 6 and 7. Only existing methods that report results on the CEDAR and GPDS-100/160/200/300 datasets are listed, with the results taken from the respective papers. We first discuss the results obtained on the CEDAR dataset and then those on GPDS-100. We notice an average error rate below 3% (FAR = 3.35 and FRR = 2.39) for the proposed method on the CEDAR dataset with training size ⟨16+16⟩. We now discuss the training policies of the other models and their corresponding FAR and FRR values, considering the four methods with the four best results in Table 6 other than the proposed method. Bharathi and Shekar [5] used chain-code (4-directional) based directional features from the contour of the signature. They also used SVM as the classifier and obtained FAR = 7.84 and FRR = 9.36. In their model, they used 12 genuine samples of a person and 108 random forgeries from other persons for training (taking 2 from each of the remaining 54 persons; 54 × 2 = 108). The test size for each person was thus 12 genuine signatures and 24 forgeries of the same person. The results of the proposed method with training size ⟨12+12⟩ are better than theirs. Larkins and Mayo [33] used 16 genuine signatures for training; eight genuine signatures along with 24 forgeries of the same person were then used for testing. They used an adaptive feature thresholding (AFT)-based similarity score, and their automatic verification method resulted in FAR = 10.96 and FRR = 8.16. The training policies of these methods are not directly comparable with ours; still, they give an idea of the performance of our proposed method. Kumar and Puhan [29] used the same training size as ours, i.e., 16 genuine signatures and 16 forgeries per person.
As a classifier, they also used SVM; the corresponding FAR and FRR values are 5.68 and 6.36, respectively. In their recent work [44], Serdouk et al. used two types of training strategies: in the first, they used genuine and simulated forgeries, whereas in the second, they used genuine and only random forgeries. In the testing phase, the remaining genuine and forgery samples were tested using a binary SVM. The second strategy is comparable with our proposed method, as we also employ random forgeries in training. With training size 16, they obtained FRR = 6.22 and FAR = 5.33. In conclusion, on the CEDAR dataset, the performance of our proposed method is either better than or comparable to the other methods.
The GPDS dataset is available in various sizes: GPDS-100 contains the samples of the first 100 persons, GPDS-160 the first 160, GPDS-200 the first 200, and GPDS-300 all 300 persons. Our test results refer to the GPDS-100 dataset. For GPDS-100, the average error rate of our method is 12.42, which is comparable to other methods (see Table 7); this result was obtained with training size ⟨16+16⟩, and the corresponding FAR and FRR are 15.04 and 7.85. Serdouk et al. also provided results for GPDS-100, reporting FAR = 13.16 and FRR = 11.38 (AER = 12.52) with training size 16, which is comparable to our results. Good results are obtained for many of the 100 signers, but poor results for some persons drag down the overall average accuracy. We show group-wise error rates for GPDS-100 in Fig. 14, where each group contains 10 signers: Group-1 covers signers 1 to 10, Group-2 signers 11 to 20, and so on. If we consider the best four groups (the 40% best results) with the lowest average error rates, the AER is close to 10%. Zois et al. reported the same behavior on the GPDS-300 dataset in their recent paper [50]. We also note three other very recently published works that employ pixel distribution-based features and geometric features. Zois et al. [51] presented a method based on lattice structure arrangements and pixel distribution; using random training and testing sets, they obtained FAR = 12.35%, FRR = 12.21%, AER = 12.28% on the CEDAR dataset and FAR = 9.11%, FRR = 5.05%, AER = 7.08% on GPDS. In the work proposed by Sharif et al. [45], the authors used geometric features and features generated from the local distribution of pixels; they applied a genetic algorithm-based feature selection followed by an SVM for classification.
The ratio of training to testing set sizes in their experiments was 70:30, and they obtained FAR = 4.17%, FRR = 4.17%, AER = 4.17% on the CEDAR dataset and FAR = 6.67%, FRR = 4.16%, AER = 5.42% on GPDS. Batool et al. [3] presented a method that generates features from the distribution of pixels in regions of the signatures, using SVM for classification. With the same 70:30 training-testing ratio, they obtained FAR = 3.34%, FRR = 3.75% on the CEDAR dataset and FAR = 9.17%, FRR = 10.0% on GPDS.
The proposed feature extraction is significantly fast. For example, the CPU time required to generate all features for the image shown in Fig. 1 (size 582 × 486, 3638 edge pixels) is 0.028 s (excluding binarization). On average, the classification time for a single image is less than 0.2 s (excluding cross-validation tests), which shows the feature set's fitness for real-time applications.

Other datasets and skilled forgeries
In addition to the two datasets (CEDAR and GPDS-100), we tested our method on two other datasets. The first is the dataset created at the Netherlands Forensic Institute (NFI) [7]; it consists of authentic signatures from 100 newly introduced writers and faked signatures from 33 writers (the writers are NFI employees). The second dataset, consisting of faked signatures imitating the genuine signatures in the CEDAR dataset, was created by us. A genuine signature was selected for each person, and we imitated the signature style; the copied signatures were written on plain paper and captured by a mobile camera. We refer to this dataset as the mobile-captured skilled forgeries of CEDAR genuine signatures, or MCSFC for short. In preparing MCSFC, special care was taken to ensure that our faked signatures look as good as the genuine signatures; hence, these signatures are highly skilled forgeries. Sample signature images from MCSFC are shown in Fig. 17. For MLP, we used the following parameter settings: number of hidden layers = 40, learning rate = 0.3, momentum = 0.2, batch size = 100, 500 training epochs, and validation threshold = 20. For MCC, the batch size is set to 100, the classifier model is logistic regression with a ridge estimator, and the random width factor is set to 2.0. For RF, the bag size equals the training set size, the batch size is set to 100, the maximum depth of the trees is unlimited, and the number of iterations is set to 100. For SL, the maximum number of boosting iterations is set to 500, and the batch size is 100.
A separate test was conducted on MCSFC using the training size ⟨16+16⟩, as mentioned earlier. Three faked signatures from MCSFC were taken for each person for the test. Even though these forgeries closely resemble the genuine signatures, we observe notable accuracy in rejecting them as faked: the FAR with respect to MCSFC is 24.5%, i.e., 75.5% of the skilled faked signatures are rejected as faked.
The plots in Fig. 18 show the corresponding FPR, FRR, and EER values for the CEDAR and GPDS datasets, respectively. The ROC plots for all the classification methods are shown in Fig. 20 for both datasets, and the precision and recall values for the GPDS dataset are shown in Fig. 19. We notice that SVM shows the best results, with RF performing almost as well.

Scale invariance
We tested our proposed feature set for the identification of scaled-down signature images, using scaling factors of 2 and 3 on images created from the CEDAR dataset; a scaling factor k indicates that both height and width are reduced by a factor of k. As representative data, Table 8 presents the recognition accuracy of all the classification methods on Person-12 of the CEDAR dataset, for both 10-fold cross-validation and a training-testing size ratio of 80:20. For the full dataset, we found that MLP and SVM show accuracies just above 90% in both settings when k = 2, which is very promising and is owed to the normalized nature of the extracted feature values. We observe that the detection accuracy decreases when k increases to 3.

Impactful features
For the CEDAR dataset, we run one test loop for each of the 55 persons. In one loop there are 16 tests (as there are 24 genuine and 24 faked signatures and we use ⟨16+16⟩ training). The WEKA attribute-selection method reports the impactful features at the end of each person's test loop, i.e., after 16 tests; so, 55 times, we rank the 78 features in descending order of their impact. We used the following parameter settings for SVM-based attribute selection: number of xval folds used when estimating subset accuracy = 5, number of attributes to evaluate in parallel = 1, seed (used for randomly generating xval splits) = 1, and threshold = 0.01. We observe that the WEKA attribute-selection method most frequently picks the first 36 features, i.e., ⟨f1, f2, ..., f36⟩.

Conclusion
In this paper, we have proposed a novel feature set for signature verification based on the distribution of boundary edge pixels of the signature image. Previously, in other works, directional features in terms of the longest pixel run or chain code have been used in combination with additional features. In this work, newly proposed classes of quasi-straight line segments are used to define the discriminating features. We conducted experiments on standard datasets, namely CEDAR and GPDS-100. The experimental results corroborate that the proposed feature set shows significantly good results and may be of use in real-time applications. There is also scope to combine it with other geometric features, such as the convex hull shape, the count and locations of endpoints, and the count of closed loops, to improve the accuracy level. We must mention that the edge length threshold plays a vital role in feature generation and classification; hence, automatic selection of this threshold, which may be related to the resolution of the signature images, is worth investigating. Further, for any two consecutive quasi-straight line segments on the boundary, it may be worth investigating whether their non-singular code values, singular code values, and non-singular run lengths can together capture the local curvature at the junction of the two segments. The nature and amount of curvature can be important features in the recognition process, and no extra computation is needed to extract this curvature information, which may further improve the accuracy of the model.
List of symbols. E: set of boundary edge pixels; C_i: the i-th (i = 1..12) class of quasi-straight line segments; n: non-singular code; s: singular code; l: the straight edge length threshold; Q: the set of quasi-straight line segments; q_i: the i-th segment in the set Q; p: a pixel in E; d(p_1, p_2): chain code direction from pixel p_1 to pixel p_2, where p_2 is in the 8-neighborhood of p_1; f_i: the i-th feature; R_i: the i-th region in the signature area, i = 1..6; p_i: the number of edge pixels in E belonging to the i-th class; P: total number of edge pixels in E; n_i: number of edge segments in the i-th class; c_p: number of common pixels in two neighboring classes C_i and C_j, j = (i + 1) mod 12; r_ij: number of pixels in region R_j concerning class C_i; m_Ci: the region number (from 1 to 6) where class C_i has the maximum contribution; l_Rj: the class number (from 1 to 12) with the maximum presence in region R_j.

Funding: Open Access funding provided by Fachhochschule Nordwestschweiz FHNW.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.