Two-stage approach to extracting visual objects from paper documents

In this paper we present an approach to the automatic detection and identification of important elements in paper documents, including stamps, logos, printed text blocks, signatures and tables. The presented approach consists of two stages. The first one performs object detection by means of an AdaBoost cascade of weak classifiers using Haar-like features. The resulting image blocks are, at the second stage, subjected to verification based on selected features calculated from recently proposed low-level descriptors, combined with classifiers representing current machine-learning approaches. The training phase, for both stages, uses bootstrapping, i.e., an iterative process aimed at increasing the accuracy. Experiments performed on a large set of digitized paper documents showed that the adopted strategy is useful and efficient.


Introduction
From the historical point of view, the paper document has been one of the basic means of human communication across the ages. Although the information in such documents is represented in different languages, structures and forms, they often contain common elements such as stamps, signatures, tables, logos, blocks of text and background. In order to prevent document accumulation, most valuable pieces are digitally scanned and kept as digital copies. Storing data this way makes organizing, accessing and exchanging documents easier, but even then, without a managing system it is difficult to keep things in order. In this paper we present an approach to extracting characteristic visual objects from paper documents. According to [1], an approach that is able to recognize a digitized paper document may be used to transform it into a hierarchical representation in terms of structure and content, which would allow for easier exchange, editing, browsing, indexing, filing and retrieval.
Our algorithm can be a part of a document managing system, whose main purpose is to determine parts of the document that should be processed further (e.g., text [2]) or be a subject of enhancement and denoising (e.g., graphics, pictures, charts [3]). It could be an integral part of any content-based image retrieval system, or simply a filter that selects only documents containing specific elements [4], segregates them in terms of importance (colored documents containing stamps and signatures are more valuable than monochromatic ones, which suggest a copy [5,10]), etc. The presented approach is document-type independent; hence, it can be applied to any formal documents, diplomas, newspapers, postcards, envelopes, bank checks, etc.
The paper is organized as follows: first, we review related works and point out their characteristic features; then, we demonstrate both stages of the algorithm; finally, we present selected experimental results and conclude the paper with an in-depth discussion.


Related works
The survey in [1] lists bottom-up page segmentation strategies, which group pixels and connected components into larger structures; one of them employs the k-means algorithm for grouping, with a mean accuracy equal to 94 %.
In the same survey, a list of top-down strategies was provided. Most of them rely on run-length analysis performed on binarized, skew-corrected documents. As an example, vertical and horizontal run-length histogram profiles are examined in terms of valley occurrence, which represents white space between blocks. Other solutions include the usage of a Gaussian pyramid in combination with low-level features, or Gabor-filtered pixel clustering.
Heuristic methods combine bottom-up and top-down strategies. Usage of XY-cuts algorithm for joining components of the same label, which were obtained through classification performed on run-length matrix statistics, is a perfect example of such combination. Another approach makes use of quad-tree adaptive split-and-merge operations [6] to group or divide regions of high and low homogeneity accordingly. An analysis of fractal signature value, which is lower for background than other elements, proves its usefulness while processing documents of high complexity.
When we consider zone classification as a separate issue, we can focus more on the multi-class discrimination problem. Keysers et al. [3] proposed a discrimination into eight different classes. The paper provides a comparative analysis of commonly used features. Among them, Tamura's histogram achieved the highest accuracy, but due to its computational complexity it was discarded in favor of less complex feature vectors. The reported error rate is equal to 2.1 %, but 72.7 % of logos and 31.4 % of tables were misclassified. Wang et al. [1] proposed a 69-element feature vector, which was reduced to 25 elements during the feature selection stage; this allowed them to achieve a mean accuracy of 98.45 %; however, 84.64 % of logos and 72.73 % of "other" elements were misclassified.

Individual approach
Individual approach focuses on single class detection and recognition. It is based on classification of characteristic features, often in a scheme "one versus all". In our previous works [5,9] a similar problem of stamp detection and recognition was described in detail. It applies Hough line and circle transforms, color segmentation and heuristic techniques. As it was stated in above-mentioned literature survey, logo detection is a very similar problem and can be solved with a little tweak to our previously presented solution [5]. Other authors propose to use key-point analyzing algorithms like Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF) and Features from Accelerated Segment Test (FAST) or Angular Radial Transform (ART). Two-step approaches similar to methods described in previous subsection are also highly popular.
It should be noted that the intraclass variance of table objects is a huge problem, since they can be very complex. A typical table consists of a header and cells forming rows and columns. The number of cells, rows and columns depends on the volume of information contained. Moreover, the font, ruling and background can be styled differently. In [19] Hu et al. focused on different kinds of mistakes that can be made during table detection. They also made the major assumption that the input document contains only one column of text with easily separable, non-overlapping lines [19]. Sameer et al. [20] proposed a solution based on a line detection algorithm. Although their aim was to reconstruct tables, information on the outermost line intersections could be used to determine table coordinates as well.
Signature and autograph detection methods may be derived from handwriting detection algorithms, but direct application of those methods is hampered by high intraclass variance caused by individual characteristic style of signatures [21]. When it comes to signatures recognition, much more effort was put into biometric aspects such as recognition carried on beforehand, manually extracted images of signatures. Zhu et al. [21] proposed an algorithm consisting of extensive pre-processing, multi-scale signature saliency measure calculation for each connected component and area merging based on proximity and curvilinear constraints. High accuracy (92.8 %) was achieved on popular Tobacco-800 database.
Keypoint-based algorithms are also popular in signature segmentation. In [22] the SURF algorithm was used to determine keypoint locations on images containing the results of connected component analysis, performed on an image with erased text (only the signature is visible) and on one with the erased signature. For each keypoint a feature vector is extracted and stored in the appropriate database. Components of a query document are labeled according to the closest example from both databases. Text-tagged components are erased; thus, a segmented signature is revealed. Connected component analysis is also a crucial part of the solution presented in [23]. The paper provides a comparative analysis of HOG, SIFT, gradient-based features, Local Ternary Patterns (LTP) and global low-level features. Classification is performed by an SVM classifier. Experiments performed on the Tobacco-800 database proved that the set containing gradient and low-level features was the best, achieving 95 % accuracy.
Since in this paper only a selection of the most interesting methods was described, for a broad and recent literature survey on page segmentation and zone classification a reader is directed to the paper mentioned in the beginning [1].

Two-stage processing concept
We apply a two-stage approach to page segmentation. This concept is definitely not novel in the computer vision field; however, it is rarely used in this particular task. Similar ideas have been applied mostly to the problems of object detection, extraction and classification in other classes of digital images [24]. In most of them, the idea comes from the assumption that the first processing stage performs a rough detection of objects of interest, while the second one applies more precise means to improve the identification accuracy [25]. In many papers, the two-stage approach is related to the integration of features (e.g., appearance and spatio-temporal HOGs [26], difference-of-Gaussians and accumulated gradient projection vector [27], entropy of local histograms and heuristic features [28], edge information and SIFT features [29]), combining classifiers (e.g., SVM and random sample consensus-RANSAC [30], two stages of mean-shift clustering [31]), or mixed approaches (e.g., Hough transform joined with DBSCAN clustering [32], edge map and SVM [33], HOG and SVM [34], two variants of snakes [35], particle swarm optimization and fuzzy classifier [36]).
The analysis of the literature shows that most of the algorithms often use image pre-processing techniques (e.g., document rectification), deal with restricted forms of analyzed documents (e.g., to checks) and employ sophisticated features together with multi-tier approaches. The other observation is that there is hardly any method aimed at the detection of all possible classes of visual objects in paper documents. It may be caused by non-trivial nature of the problem and different characteristics of analyzed graphical elements.
In the proposed approach, we do not apply any preprocessing and employ a very efficient AdaBoost cascade implemented using the integral image, hence giving very high processing speed. It should be stressed that we analyze probably most of all possible object types that can be found in documents, which has no significant representation in the literature.

Algorithm description
In our approach we adopted the assumption that a successful extraction of visual objects from a paper document can be performed using a sequence of rather simple means. Hence, the developed algorithm consists of two subsequent stages. The first one is a rough detection of candidates, while the second one is a verification of the found objects. The first stage is based on a fast and simple approach, namely an AdaBoost cascade of classifiers (employing Haar-like features). Since it results in a significantly high number of false positives, it is supported by a verification stage using an additional classification employing a set of more complex features. The training of the algorithm (see Fig. 1) in terms of detection and verification works in an iterative manner, which yields improved accuracy, depending on the quality and volume of the learning sets. As can be seen from Fig. 1, the reference documents dataset is subjected to manual cropping of interesting visual objects; this initializes the detector and verifier. Then, in each step (either detection or verification), the training involves fine-tuning and extending the learning sets: in each iteration the learning set is extended based on the results of accuracy verification, after which the algorithm stops.
The detector accuracy is evaluated on the set of testing documents, while the verifier is tested on objects extracted by cascade detector.
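The iterative training loop can be sketched in a few lines. Here `train`, `detect` and `overlaps` are hypothetical stand-ins for the cascade trainer, the sliding-window detector and the ground-truth matching rule; this is an illustrative sketch, not the actual implementation:

```python
def bootstrap_train(train, detect, overlaps, pos, neg, annotated_images,
                    iterations=2):
    """Iteratively extend the learning sets with the detector's own output.

    pos/neg are lists of sample patches; annotated_images yields
    (image, ground_truth_boxes) pairs.
    """
    model = train(pos, neg)                      # preliminary training
    for _ in range(iterations - 1):
        for image, truth in annotated_images:
            for box in detect(model, image):
                # true detections reinforce the positives,
                # false alarms become hard negatives
                if any(overlaps(box, t) for t in truth):
                    pos.append(box)
                else:
                    neg.append(box)
        model = train(pos, neg)                  # fine-tuning pass
    return model, pos, neg
```

With `iterations=2` this reproduces the two-pass scheme described above: one preliminary training on manually cropped samples, then one fine-tuning pass on automatically gathered samples.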

Cascade training and detection
Candidate detection is performed by AdaBoost-based cascades of weak classifiers [37,38]. At the training stage we learned five individual cascades for specific types of objects, namely: stamps, logos/ornaments, texts, tables and signatures. Exemplary objects are presented in Fig. 2. Background blocks, constituting an additional class, were further taken as negative examples for training the other cascades. The detection was performed using a sliding window of 24 × 24 elements on a pyramid of scales, where in each iteration we downscaled the input image by 10 %. Such a window size and downscale step are a compromise between complexity, memory overhead and discriminative properties.
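The speed of such a cascade rests on the integral image, which lets any rectangle sum, and hence any Haar-like feature, be evaluated with four array lookups. A minimal numpy sketch of this mechanism (illustrative only, not the implementation used here):

```python
import numpy as np

def integral_image(img):
    """Summed-area table, padded with a zero row/column for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r, c, h, w):
    """Sum of img[r:r+h, c:c+w] in O(1) from the integral image."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_two_rect_horizontal(ii, r, c, h, w):
    """Left-minus-right two-rectangle Haar-like feature (w must be even)."""
    half = w // 2
    return box_sum(ii, r, c, h, half) - box_sum(ii, r, c + half, h, half)
```

Whatever the window position or scale, each feature costs the same handful of lookups, which is what makes scanning a 24 × 24 window over a whole pyramid of scales feasible.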
The training procedure is performed iteratively with bootstrapping. The first, preliminary training, is to initialize the classifier. For this stage we used manually selected positive and negative samples for each class, marked in images collected from Internet and from SigComp2009 [39]. The number of objects was limited in order to lower the processing time, having in mind the assumption, that after this iteration, positive and negative samples will be determined automatically.
Since the selection may be imperfect, in order to increase the detection accuracy we performed a second iteration, in which the learning database was extended with objects resulting from the previous iteration. We call it fine-tuning the detector (see Fig. 1). The positive results were added to the positive samples collection and the negative ones to the negative collection, respectively. As a general rule, all samples from all classes except the selected one are put into the negative part. The numbers of objects per class (in the two iterations) are presented in Table 1. The class "background" was added in order to accumulate samples that were classified as other objects in the preliminary investigations. In the second iteration we removed the background class, since it gave very ambiguous results; it seems that detecting background using AdaBoost is not very accurate.
The effect of such fine-tuning is a removal of many false detections while retaining positive ones. Two examples of such situations are presented in Fig. 3 (the effect of fine-tuning the detector: stamps in the first row, signatures in the second row). The first row presents the results of stamp detection after the first and the second iterations of training; the same applies to the second row, which shows the results of signature detection. In both cases, the number of false detections has been reduced (however, not all of them have been eliminated). It is possible that repeating the above-presented stage again would increase the quality of the learning set further. We stopped at two iterations as a compromise between accuracy and computational overhead.

Table 1 Numbers of objects per class in the two training iterations

Class        Positive (it. 1)  Negative (it. 1)  Positive (it. 2)  Negative (it. 2)
Stamps       150               740               324               2808
Logos        150               740               594               2538
Texts        150               740               707               2425
Signatures   150               740               224               2908
Tables       140               750               1283              1849
Background   150               740               n/a               n/a

Verification stage
Detected candidates are verified using a set of low-level features. The initial learning set, upon which reference features were calculated, consists of manually extracted 219 logos, 452 text blocks, 251 signatures, 1590 stamps, 140 tables and 719 background areas. As in case of detection, background blocks are used as negative examples and we do not verify background detection accuracy. After the initial investigations and the analysis of confusion matrices, in the second iteration of verification, we extended the learning set using extra 60 tables, 120 signatures and 50 text areas. Logotype and stamp classes together are quite numerous, and since their verification accuracy was acceptable, they were not extended. It is a partial solution to the main observed problem; namely, many true-positive samples in signature and table classes were misclassified during the verification.
During our studies we selected eight feature sets, representing different approaches to low-level image description. They are presented in the following sections. Most of them (except the binary version of LBP, i.e., LBPB) work on single-channel intensity images and do not rely on color information, which is an advantage.

First-order statistics (FOS)
We propose to use low-dimensional FOS as a base for further comparisons. The employed feature vector consists of direct, low-level attributes calculated from the histogram of pixel intensities: the mean pixel intensity, the second (variance), third (skewness) and fourth (kurtosis) central moments, and the entropy. They provide information about the global characteristics of the input image. A visualization of averaged FOS vectors over the whole learning database is presented in Fig. 4.
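Assuming a standard 8-bit intensity histogram, the FOS vector can be computed roughly as follows (an illustrative sketch, not the exact implementation used in the paper):

```python
import numpy as np

def fos_features(img, bins=256):
    """First-order statistics from the intensity histogram:
    mean, variance, skewness, kurtosis, entropy."""
    hist, _ = np.histogram(img, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    levels = np.arange(bins)
    mean = (levels * p).sum()
    var = (((levels - mean) ** 2) * p).sum()
    std = np.sqrt(var)
    # guard against flat images, where the central moments degenerate
    skew = (((levels - mean) ** 3) * p).sum() / std ** 3 if std > 0 else 0.0
    kurt = (((levels - mean) ** 4) * p).sum() / std ** 4 if std > 0 else 0.0
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return mean, var, skew, kurt, entropy
```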

Gray-level run-length statistics (GLRLS)
This feature vector consists of eleven attributes calculated from the run-length matrix: short-run emphasis, long-run emphasis, gray-level non-uniformity, run-length non-uniformity, run percentage, low gray-level run emphasis, high gray-level run emphasis, short-run low gray-level emphasis, short-run high gray-level emphasis, long-run low gray-level emphasis and long-run high gray-level emphasis. These features provide information about texture coarseness and/or fineness. The algorithm for GLRLM matrix calculation and the respective equations are presented in [40-42]. A visualization of averaged GLRLS vectors over the whole learning database is presented in Fig. 5.
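As an illustration, the run-length matrix for horizontal runs and the first two emphasis measures might be computed as below; the remaining attributes follow the equations in [40-42]. This is a sketch for a single direction only:

```python
import numpy as np

def glrlm_horizontal(img, levels):
    """Gray-level run-length matrix for horizontal runs.
    Entry [g, l-1] counts runs of gray level g with length l."""
    P = np.zeros((levels, img.shape[1]))
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                P[run_val, run_len - 1] += 1
                run_val, run_len = v, 1
        P[run_val, run_len - 1] += 1          # close the last run in the row
    return P

def run_emphases(P):
    """Short-run and long-run emphasis from a run-length matrix."""
    lengths = np.arange(1, P.shape[1] + 1)
    n_runs = P.sum()
    sre = (P / lengths ** 2).sum() / n_runs
    lre = (P * lengths ** 2).sum() / n_runs
    return sre, lre
```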

Haralick's statistics (HS)
The well-known Haralick's properties form a set of 22 features calculated from the gray-level co-occurrence matrix. The list of features used in our approach consists of: autocorrelation, contrast, correlation, cluster shade, cluster prominence, dissimilarity, energy, entropy, homogeneity, maximum probability, sum of squares (variance), sum average, sum variance, sum entropy, difference variance, difference entropy, two information measures of correlation, inverse difference, inverse difference normalized and inverse difference moment normalized. The appropriate algorithms are available in [43-45]. A visualization of averaged HS vectors over the whole learning database is presented in Fig. 6.
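A sketch of the underlying GLCM and three of the listed properties (contrast, energy, homogeneity) follows; the exact normalization and set of offsets used in the paper may differ:

```python
import numpy as np

def glcm(img, levels, offset=(0, 1), symmetric=False):
    """Gray-level co-occurrence matrix for one pixel offset, normalized."""
    dr, dc = offset
    h, w = img.shape
    M = np.zeros((levels, levels))
    for r in range(max(0, -dr), min(h, h - dr)):
        for c in range(max(0, -dc), min(w, w - dc)):
            M[img[r, c], img[r + dr, c + dc]] += 1
    if symmetric:
        M += M.T
    return M / M.sum()

def haralick_subset(P):
    """Contrast, energy and homogeneity from a normalized GLCM."""
    i, j = np.indices(P.shape)
    contrast = (P * (i - j) ** 2).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1 + np.abs(i - j))).sum()
    return contrast, energy, homogeneity
```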

Neighboring gray-level dependence statistics (NGLDS)
A very low-dimensional vector employing neighboring gray-level dependence statistics contains five values derived from the NGLDM matrix, namely small number emphasis, large number emphasis, number non-uniformity, second moment and entropy. The elements and their value distribution inside the NGLDM matrix provide information about the level of texture coarseness. The algorithm for matrix calculation and the respective equations are presented in [46]. A visualization of averaged NGLDS vectors over the whole learning database is presented in Fig. 7.

Low-level features (LLF)
So-called low-level features are a result of our previous research on stamp detection and recognition [5,9]. This approach shares common features with the measures proposed by Haralick et al. The created feature vector contains eleven values, namely contrast, correlation, energy and homogeneity calculated in the same way as in the case of the GLCM matrix. The other attributes include: average pixel intensity, standard deviation of intensity, median intensity, contrast, mean intensity to contrast ratio, intensity of edges and mean intensity to edges intensity ratio. A visualization of averaged LLF vectors over the whole learning database is presented in Fig. 8.

Histograms of oriented gradients (HOG)
In order to investigate state-of-the-art methods aimed at object detection, we added the histogram of oriented gradients approach to the comparison. The method, proposed by Dalal and Triggs in [47], proved to be effective in human detection in digital images, but as mentioned in that paper, the algorithm is also capable of distinguishing between objects of different types. The feature vector of the HOG descriptor is 256 elements long. A visualization of averaged HOG vectors over the whole learning database is presented in Fig. 9.
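The core of HOG is a per-cell histogram of gradient orientations weighted by gradient magnitude. A single-cell sketch is given below; the full descriptor concatenates and block-normalizes many such cells, and the bin count here is illustrative:

```python
import numpy as np

def cell_orientation_histogram(img, bins=9):
    """Unsigned-gradient orientation histogram for a single cell,
    the building block of a HOG descriptor."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientations
    hist = np.zeros(bins)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    np.add.at(hist, idx, mag)                      # magnitude-weighted voting
    return hist / (np.linalg.norm(hist) + 1e-12)   # L2 normalization
```

A vertical step edge, for instance, produces a purely horizontal gradient, so all the histogram mass lands in the first (0 degree) bin.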

Local binary patterns (LBP)
The last of the discussed features are local binary patterns, introduced in [48] as a universal, fine-scale texture descriptor [49]. Similarly to HOG, the output vector consists of 256 elements. In our case, local binary patterns come in two different variants: the first one is calculated on the monochromatic image, while for the second a binarized image is supplied (LBPB). A visualization of averaged LBP vectors over the whole learning database is presented in Fig. 10.
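A sketch of the classic 3 × 3 LBP operator and its 256-bin histogram; note that the comparison convention (>= vs. >) and the bit ordering vary between implementations:

```python
import numpy as np

def lbp_codes(img):
    """Classic 3x3 LBP: each interior pixel gets an 8-bit code, one bit
    per neighbour that is >= the centre value."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros_like(centre, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        nb = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]  # shifted neighbour view
        codes |= (nb >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """256-bin normalized histogram of LBP codes (the feature vector)."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256)
    return hist / hist.sum()
```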

Dimensionality reduction
As can be seen from Figs. 6, 7, 8, 9 and 10, many feature vectors have values that are common for all distinguished classes. It is probable that by eliminating them we can reduce the dimensionality of the feature space while retaining recognition accuracy. That is why in the experiments we employed a substage of dimensionality reduction/feature selection, namely: principal component analysis (PCA) [50], linear discriminant analysis (LDA) [51], information gain (IG) [52] and the least absolute shrinkage and selection operator (LASSO) [53]. It is an improvement over a recent work [54].
In order to select the most discriminative information part in reduced feature spaces (after applying above algorithms) we performed an analysis of the distribution of energy (or importance levels) of reduced components. The visualizations of normalized components for each method of feature extraction are provided in Figs. 11, 12, 13 and 14. Selected components for further classification are marked.
As it can be seen, in some cases, only a fraction of calculated attributes were left (PCA), while in other cases the reduction algorithm selected more of them (less than half in case of LDA, more than half in case of IG and LASSO).
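For PCA, the "energy" of the components is the explained-variance ratio derived from the singular values. A numpy sketch of how components could be selected by an energy threshold (the 0.95 threshold is illustrative, not the value used in the paper):

```python
import numpy as np

def pca_energy(X):
    """Explained-variance ratio of the principal components of X
    (rows = samples), used to rank components by importance."""
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    return var / var.sum()

def components_for_energy(ratios, threshold=0.95):
    """Smallest number of leading components reaching the energy threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```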

Experiments
The experiments were performed on our own database consisting of 719 digitized documents of various origin, gathered, among others, from the Internet. It is the same database as the one used in our previous work [5]. It contains scanned copies of documents of various types (examples are shown in Fig. 15). First, an evaluation of the detection stage was performed. In the next step, all generated examples were divided into two categories: positive and negative detections. This allowed us to calculate confusion matrices for each combination of classifier and feature set.

Detection stage
The decision whether a result should be considered positive or negative was made based on its bounding box area. Objects covered in at least approximately 75 % by the resulting bounding box were classified as positive. The results for both iterations are provided in Table 2. The mean detection accuracy after the first iteration was equal to 54 % (with the highest, 80 %, for text and the lowest, 14 %, for signatures). The observed low accuracy is caused by high resemblance between classes, e.g., many logos were classified as stamps, and a large number of tables (which according to [6] should be considered as graphics) as printed text. The low accuracy for signatures comes from the lack of signatures in the input documents; hence, we included the samples from SigComp2009, which are quite different in character. Examples of objects difficult to detect are presented in Fig. 16.
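The positive/negative decision rule can be stated compactly. In this sketch, boxes are assumed to be (x0, y0, x1, y1) tuples, which is an assumption of the example, not the paper's notation:

```python
def coverage(obj, det):
    """Fraction of the ground-truth box `obj` covered by the detected
    box `det`; boxes are (x0, y0, x1, y1)."""
    ix0, iy0 = max(obj[0], det[0]), max(obj[1], det[1])
    ix1, iy1 = min(obj[2], det[2]), min(obj[3], det[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = (obj[2] - obj[0]) * (obj[3] - obj[1])
    return inter / area

def is_positive(obj, det, threshold=0.75):
    """Detection counts as positive when it covers ~75 % of the object."""
    return coverage(obj, det) >= threshold
```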
The lowest accuracy, obtained by the signature detector, results from the different characteristics of the examples used to train the cascade (high resolution, bright and noise-free background, clear strokes, contrasting ink) and the ones actually located on the test documents. In Table 2 one can see a significant increase in detection accuracy using a learning set obtained by two iterations of the training procedure. After that, there is a significantly lower number of false detections, yet also a slightly lower number of positive detections. The clearly visible, significant increase in the signature detection rate is still far from ideal. It is caused by the fact that in most cases signatures overlap with other elements, such as stamps, text and signature lines.

Verification stage
Experiments described below were aimed at determining a combination of a classifier and a feature vector (from the selection presented in Sect. 3.2) that gives the highest possible verification accuracy, depending on the quality of input samples. The selection of classifiers we investigated consists of: 1-nearest neighbor (1NN), Naïve Bayes (NBayes), binary decision tree (CTree), support vector machine (SVM), general linear model regression (GLM) and classification and regression trees (CART). There were also two iterations of processing provided for comparison. In the first iteration, the learning set was composed of initial features calculated for manually selected samples. The verification at this stage involved a selected pair of feature vector and classifier employed on objects returned in the first iteration of detection stage (see Sect. 3.1). The second iteration of verification process employed an extended learning set (see Sect. 3.2) and a feature vector/classifier fed with an output returned after the second iteration of detection.
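The exhaustive pairing of feature sets and classifiers amounts to a simple grid evaluation. The sketch below uses a 1NN and a nearest-centroid classifier as self-contained stand-ins; the actual study uses the six classifiers listed above:

```python
import numpy as np

def nn1(train_X, train_y, X):
    """1-nearest-neighbour prediction."""
    d = ((X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[d.argmin(axis=1)]

def centroid(train_X, train_y, X):
    """Nearest-class-centroid prediction (stand-in for a second classifier)."""
    classes = np.unique(train_y)
    cents = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def grid_accuracy(classifiers, features, train, test):
    """Accuracy for every (feature extractor, classifier) pair."""
    (Xtr, ytr), (Xte, yte) = train, test
    out = {}
    for fname, f in features.items():
        for cname, clf in classifiers.items():
            pred = clf(f(Xtr), ytr, f(Xte))
            out[(fname, cname)] = (pred == yte).mean()
    return out
```

The best pair per class is then simply the argmax over the resulting accuracy table, which mirrors the per-class analysis reported in Tables 3-7.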
The following figures (Figs.17 and 18) show examples of correct detections/verifications and failed ones, respectively.  In each figure the objects are grouped in classes, as follows: stamps, logos, texts, signatures, tables.
As it can be seen from above figures, logotypes are often classified as stamps. Similar confusion applies to tables which are sometimes classified as text areas. What is more, the most problematic are tables which contain or are overlapped with graphical elements (e.g., logotypes or stamps).
In Tables 3, 4, 5, 6 and 7 verification accuracy for each class is presented (there are two columns of results for each classifier, each for subsequent iterations, respectively). The highest accuracy in the first iteration is underlined, while the highest accuracy in the second iteration is double underlined, respectively. Sometimes, more than one accuracy has the highest value; hence, more results are underlined.

Dimensionality reduction
In the experiments devoted to dimensionality reduction we employed the k-nearest neighbor classifier and tenfold cross-validation. We tried to decide whether the reduction is necessary, since selected features (especially LBP and HOG) have a rather high-dimensional feature space. As mentioned, we used the PCA, LDA, IG and LASSO methods, since they are well-known, general-purpose methods of high efficiency. The results of this experiment are presented in Table 8. Bold values indicate the highest verification rate among methods involving dimensionality reduction.
As it can be seen, in most cases LASSO gives the highest accuracy; however, it is still lower than classification performed on a non-reduced features. Although the difference is not high, introducing these kinds of reductions may not be justified mainly because of additional computation overhead. The only exception is the case when we should conserve memory space, but nowadays it is not always crucial. The results of above experiment show that this substage may be omitted without loss of accuracy.

Discussion
As shown in Table 4, the verification accuracy of the logo-detecting cascade decreased after the second iteration. A large number of detected samples were misclassified as negative instead of positive. This is due to the quite rigorous character of the classifiers used. Taking into account the accuracy of the detection process (which is also done through classification), the cascade could be assigned a higher decision weight than the best pair of feature set and classifier used in verification, to compensate for the low precision of the verification stage. A similar situation occurs in the case of tables: again, high detection accuracy is combined with a low verification result. This is caused mostly by the fuzzy boundary separating tables containing text from the pure text class.
Average accuracies achieved at both stages of stamps and texts processing mean that equal decision weight could be assigned to both cascade and best combination of feature set and classifier. In both cases high precision of detection is coupled with high verification result. It is important to note that tables filled with text were classified as text. Otherwise, the results would be much lower.  As it was noted, signature class causes most of the problems. Higher detection accuracy is only a result of much lower FP rate. This is caused by the extension of the learning set (both in training of cascade and at the verification stage). Further increase, especially in case of positive samples number, would be beneficial.
The analysis of presented verification results shows that all of discussed object classes should be considered separately. Unfortunately, it is impossible to point out a single pair of classifier/feature vector that wins in all cases. There seems to be no one rule that is behind above results.
In the case of the stamp class, the most accurate pair consists of the GLM classifier and the HS feature set, with the pair of the 1NN classifier and the HOG descriptor coming second. These pairs alternate between iterations. Analogous observations were made in the case of the worst pair: in the first iteration, the GLM classifier with NGLDS features was the worst and NBayes + GLRLS was second worst; the reverse relationship occurred in the second iteration. The average accuracy across all sets is equal to 60.17 and 53.3 % in the first and second iterations, respectively. HS is the most accurate descriptor (average accuracy of 70.42 %) in the first iteration and HOG (with 61.82 % average accuracy) in the second. An accuracy of 63.51 % places the CART classifier as the best in the first iteration, and 61.86 % places the CTree classifier at the top in the second iteration. Results for the remaining classes are described in a similar manner: the first percentage value always corresponds to the result achieved in the first iteration, and so on.
In both iterations of logo verification SVM classifier and GLRLS features set proved to be the best. There was no recurrence in case of the worst pair. Average accuracy is equal to 48.6 and 29.04 %. The highest average score was achieved by SVM classifier (53.31, 34.12 %) and HOG descriptor (54.58, 38.11 %).
Bayes-based classifiers, namely NBayes + GLRLS and NBayes + HOG, achieved the highest accuracies in the first and the second iteration of the text verification process, respectively. An analogous switch in terms of the best and the second best as in the case of stamps occurred. The overall accuracy stands at 55.52 %.

Comparison with state-of-the-art methods
It is not easy to directly compare obtained results with other state-of-the-art methods, since the benchmark sets are very different. Moreover, the comparison with individual methods may not be justified because such methods employ classspecific approaches, which are tuned for particular object types. Hence, below, a not entirely meaningful comparison with certain, selected global approaches is provided. Taking into consideration average values, the detection accuracy in case of our algorithm is equal to 71.93 % and the verification accuracy (calculated for the best individual pairs) is equal to 78.48 %. When we exclude the most problematic class (in terms of detection), namely signatures, the detection accuracy rises to 82.61 % and the verification slightly drops to 76.59 %. It is because signatures are detected with a relatively low accuracy, yet their verification accuracy is quite high. In [3], the authors obtained an average detection accuracy equal to 81.84 %; however, when we consider only classes, that are similar to our case (however, without stamp class), the accuracy drops to 72.95 %. The main problem with that approach is a high number of misclassifications in case of tables. In our algorithm, tables are detected and verified with a very high accuracy. In [1] the mean accuracy for 9 classes is equal to 84.38 %. When we restrict the set in order to be similar to the one in our case (also without stamp class), it is equal to 89.11 %. The best result was obtained for printed text class, and again, the most problematic class is logotypes.
As it can be seen, our approach is comparable to the state-of-the-art approaches, while it features very intuitive processing flow and a significantly lower computational overhead. It also takes into consideration classes that are not analyzed in above-mentioned approaches, namely stamps and signatures. Having in mind increasing the learning datasets and introducing extra training iterations (at the detection stage), the accuracy may be even higher.

Summary
We have presented a novel approach to the extraction of visual objects from digitized paper documents. Its main contribution is a two-stage detection/verification idea based on iterative training and multiple feature-classifier pairs. In contrast to other known methods, the whole framework is common for various classes of objects. It also covers classes that are not considered by other researchers in global approaches, namely signatures and stamps. Extensive experiments showed that the whole idea is valid. High accuracies achieved in the in-depth analysis performed on a large, real document set prove this fact further. Results from the second iteration (see Table 2) are particularly encouraging. Although there is a high similarity between some classes and numerous challenging examples throughout the image database (see Fig. 16), the detection is successful. The signature class is an exception, and its lower detection/verification accuracy can be put down to the poor representation across databases. Increasing the size of the learning set for signature detection would, with a high degree of probability, boost the results, as shown in the case of the first and the second iteration.
High accuracies for certain classes could lead to dropping the verification stage as redundant, if the cascade is treated as what it really is: a classifier itself. However, as long as there is more than a handful of misclassified samples, the use of this stage is justified. If we decide to use the verification stage, it is important to examine each class separately, as shown in the previous section. This is well illustrated in Table 7: while the overall accuracy is really low, the accuracy for the LLF feature set is several times higher than in the case of any other feature set. As shown, the dimensionality reduction substage is not necessary, since it does not improve the classification accuracy.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.