1 Introduction

Over the last decades, character recognition in freestyle, unconstrained handwritten document images has received extensive attention. Text documents are scanned and converted into a digital, editable format that can be stored on a computer for future use. Character recognition has many applications, such as language-based learning, reading aids for the blind, bank cheque processing, and postal automation. Automating character recognition in Arabic handwritten documents, particularly for scripts written in multiple fonts, is a challenging task. Arabic script exhibits high inter-class similarity, and it varies from one writer to another depending on individual writing style.

Script recognition for languages such as English, Latin, and Chinese is well covered in the literature; previous studies show, however, that recognition of Arabic text in Arabic documents remains insufficiently explored. Arabic character recognition is considered more complicated for several reasons. First, the variety of writing styles makes character and text-line recognition intricate. Second, the large number of characters, each with different shapes depending on its position within a word, together with a large character set containing highly similar characters, makes off-line script recognition difficult. A good recognition system should therefore handle various handwriting styles, various font sizes, and connected characters. Document images must first be preprocessed to remove noise; then lines, words, and characters are segmented. After segmentation, discriminative features are identified and the resulting patterns are recognized using classifiers.

Because of the cursive nature of Arabic script, improving Arabic optical character recognition (OCR) systems raises numerous technical issues, particularly in the feature extraction and classification phases. Most Arabic characters carry a madda, zigzag(s), or dot(s), which may appear inside, below, or on top of the character. Several characters share the same base form and differ only in the number of dots, the number of secondary strokes, or their position. Although several researchers are discussing and examining solutions to these issues, little progress has been made in this direction.

No single pattern recognition method has so far proved best for recognizing Arabic handwritten script. Every methodology has its own advantages, disadvantages, and limitations. To overcome the challenges described above, a method needs to draw on the strengths of existing approaches and combine them into a multiple-data-based framework, in addition to introducing novelty of its own.

Hybrid methodologies are recognition systems that combine data at the classification level, joining at least two complementary classification paradigms with a decision-combination function that merges the classifier outputs into a single decision. Alternatively, the combination can take place at the feature extraction stage, merging at least two types of primitives to obtain a better description of the input character, or at both the feature extraction and classification stages.

The first aim of this manuscript is to evaluate methods built from a set of four features and three classifiers. The features used in this work are moment invariants (MI), the run-length matrix (RLM), statistical features of the intensity histogram (SFIH), and wavelet decomposition (WD). The classifiers are the modified quadratic discriminant function (MQDF), the support vector machine (SVM), and the random forest (RF). The second aim is to study the effect of fusing features and classifiers on the accuracy of Arabic character recognition by choosing different combinations of classifiers and features.

2 Related work

In any character recognition system, segmentation is a significant task, especially for Arabic handwriting. A text recognition method presented in [1] relies on the analysis of a subset of documents organized into three main sub-domains. To avert detection failures, a skeletonization step corrects false alarms when segmenting vertically connected characters. The reported accuracy rates are 98.9% and 97.4% on the IFN/ENIT dataset, which is considered high.

Another method for recognizing Arabic handwritten script was presented in [2]. Using information about character shapes, the method over-segments every word and then deletes the extra break-points. The algorithm achieved an accuracy rate of 92.2%, but it suffers from over-segmentation.

A method for locating segmentation points was suggested in [3]. It thins the word image to obtain strokes one pixel wide, and uses geometry and shape information in the segmentation procedure to detect the ligatures of Arabic characters. The approach handles touching characters, both when ligatures touch across the segmentation boundary between consecutive closed characters and when ligatures occur within open characters.

Three basic approaches to word segmentation, namely holistic, internal segmentation, and external segmentation, were presented in [4]. The holistic approach, also called segmentation-free, uses generic features of the entire word for recognition, so no characters are segmented. In internal segmentation, character segmentation and recognition are carried out jointly. External segmentation, in which segmentation is performed prior to recognition, is the most commonly used approach.

Another method, based on the mathematical morphology notions of regularities and singularities, was presented in [5]. The regularities are the segmentation candidates. By applying an opening operation to the handwritten Arabic word image, the authors identified the singularities and obtained the regularities by subtracting the singularities from the original document images. The segmentation handles overlapping characters, and the algorithm achieved an accuracy rate of 81.88%.

Another method, which segments and extracts characters from connected blocks and searches topographic features to identify candidate segmentation points, was suggested in [6]. Within a block, the division of these points depends on the average character width. The algorithm achieved an accuracy rate of 69.72%. The method faces two main problems: it fails to segment horizontally overlapping characters, and it does not handle segmentation of large Arabic handwritten texts.

To achieve a good accuracy rate for Arabic text recognition, a method combining the Hough transform with skeletonization and a Gaussian mixture modeling framework was suggested in [7]; it covers possible false corrections to segment vertically connected characters proficiently. The authors used Euclidean distance metrics and convex fusion to compute the distance between neighboring overlapping components, which is classified as an intra-word or an inter-word distance.

A Gaussian mixture model was suggested in [8] for identification at the word level. An attempt to examine and compare various classifiers using various statistical tests was presented in [9]. The authors recommended using 39 particular features based on the convex hull and topological properties. Among eight different classifiers, the multilayer perceptron (MLP) achieved an accuracy rate of 99.87%. The directional discrete cosine transform (DDCT) was exploited in [10], where KNN and LDA were used for classification.

A recognition method for handwritten documents was proposed in [11]. It extracts five features of the connected components, namely the number of holes, relative X centroid, aspect ratio, sphericity, and relative Y centroid. The skew, standard deviation, and mean of all the features are then computed, and LDA is used for script recognition. Different feature extraction techniques for handwritten script recognition are reviewed in [12], and a detailed review of script recognition from multi-script documents is given in [13].

3 Proposed methodology

This manuscript concentrates on the combination of features and the fusion of classifiers, and proposes the system shown in Fig. 1 to recognize handwritten characters of Arabic script written in multiple fonts. The system integrates numerous features and classifiers to increase the recognition rate.

The suggested work focuses on character-level recognition. The model mimics the way a human recognizes data by analyzing it from various viewpoints. To automate script recognition, the concept of fusion strategies is used: fusion first determines the Arabic script type and then recognizes the character. The classifiers and features are fused to arrive at the final decision.

Fig. 1 Feature and classifier level fusion for script recognition and identification

3.1 Pre-processing

The preprocessing steps convert every input image into a consistent form for recognition. Pre-processing involves resizing, binarization, skew detection, median filtering, skew correction, and thinning. The following operations are performed: (1) median filtering to eliminate salt-and-pepper noise; (2) filling of holes in the object region; (3) thresholding into a binary image using Otsu's method; (4) resizing every image to \(32 \times 32\) pixels with nearest-neighbor interpolation; (5) morphological thinning to obtain a one-pixel-wide skeleton so that stroke width can be normalized; (6) skew detection on the document images; and (7) skew correction [14].
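The pipeline below is a minimal sketch of these steps using OpenCV, SciPy, and scikit-image; the function name, the kernel size, and the handling of skew are illustrative assumptions rather than the authors' implementation.

```python
import cv2
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import thin

def preprocess_character(gray):
    """gray: 2-D uint8 array holding one scanned character image."""
    denoised = cv2.medianBlur(gray, 3)                    # (1) remove salt-and-pepper noise
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # (3) Otsu thresholding
    filled = binary_fill_holes(binary > 0)                # (2) fill holes inside the object
    resized = cv2.resize(filled.astype(np.uint8), (32, 32),
                         interpolation=cv2.INTER_NEAREST) # (4) 32x32 nearest-neighbour resize
    skeleton = thin(resized > 0)                          # (5) one-pixel-wide thinning
    # (6)-(7) skew detection and correction on the page image would precede this
    # step in practice (e.g. a projection-profile or Hough-based angle estimate).
    return skeleton.astype(np.uint8)
```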

3.2 Proposed model

Let \(C=\{c^1,c^2,\ldots ,c^k\}\) be the set of k classifiers and \(F=\{f_{1},f_{2},\ldots ,f_{n}\}\) the set of n feature types. Let \(W=\{w_{1},w_{2},\ldots ,w_{m}\}\) be the set of character collections, where m is the number of scripts in a multi-font Arabic document and \(w_{i}\), \(1\le i\le m\), contains the characters of the ith document. For each of the m Arabic scripts, the feature extraction techniques are applied to obtain an individual feature set. For every feature set \(f_{j}\), each of the k classifiers in C is trained and its Arabic script recognition accuracy is measured. Given C, \(f_{j}\), and W, multiple results are obtained, from which the best accuracy for the given dataset and classifier is selected; the classifier \(C^m_{j}\) with the best accuracy is then chosen to recognize the scripts in W using the feature set \(f_{j}\). Evaluating accuracy requires training and testing sets for each classifier \(C^k\); therefore, the whole dataset \(w_{i}\) is divided into training, validation, and testing sets. The experiment is executed in the same way for all n feature types and k classifiers to obtain a list of selected classifiers \(\{C^{k_{1}}_{1}(1\ldots m), C^{k_{2}}_{2}(1\ldots m),\ldots ,C^{k_{p}}_{n}(1\ldots m)\}\), one per feature type, each with the highest accuracy.

The proposed model, shown in Fig. 1, combines the n feature types with the k classifiers to recognize Arabic script effectively through feature combination and classifier fusion.
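As a sketch of the selection procedure described above (with illustrative names, not the authors' code), the loop below trains every candidate classifier on every feature set and keeps, per feature set, the classifier with the best cross-validated accuracy:

```python
from sklearn.model_selection import cross_val_score

def select_best_classifiers(feature_sets, classifiers, labels):
    """feature_sets: dict name -> array of shape (n_samples, n_dims)
       classifiers:  dict name -> unfitted scikit-learn estimator."""
    selected = {}
    for f_name, X in feature_sets.items():
        scores = {c_name: cross_val_score(clf, X, labels, cv=5).mean()
                  for c_name, clf in classifiers.items()}
        best = max(scores, key=scores.get)
        selected[f_name] = (best, scores[best])   # best classifier for this feature set
    return selected
```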

3.2.1 Features fusion

At the feature fusion level, four feature sets are computed using MIs, RLM, SFIH, and WD. These initial feature vectors are used to evaluate the highest achievable recognition rate. Using a concatenation rule, the features from the n feature types are merged into \(F^+=\{f_{1},f_{2},\ldots ,f_{n}\}\), which is the input to the classifier level. The decision with the highest recognition accuracy from the classifier fusion is chosen as the final decision for Arabic script recognition.

Feature selection is performed to decrease computation time and the size of the feature space, which increases prediction accuracy; this is done by eliminating irrelevant, noisy, and redundant features and selecting the subset of features that gives the best performance in terms of computation time and accuracy. This selection amounts to dimensionality reduction, for which principal component analysis (PCA) is used. Given a dataset of n dimensions, PCA finds a linear subspace of dimension d, with d smaller than n, such that the data points mainly lie in this subspace; the reduced subspace aims to preserve most of the variability of the data. Using PCA, the minimum number of features giving the maximum achievable recognition rate is obtained from the full feature vector for every procedure. The effect of feature fusion is analyzed by combining the features in different combinations and evaluating the recognition rate.
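A minimal sketch of the PCA step, assuming a fused feature matrix `X_fused` of shape (samples, n); retaining 95% of the variance is an illustrative way of choosing d < n:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)            # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X_fused)  # reduced d-dimensional representation, d < n
```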

The features extracted from the primitives in the feature fusion phase are compared against the feature set of the model, and classification is generally performed by minimizing the Euclidean distance between feature vectors. Normalization should follow the rule that every feature component contributes equally to the distance. The rationale for this rule is that it prevents particular features from dominating the distance calculation merely because they have large numerical values; after normalization, every feature component lies between zero and one over the entire dataset, which is achieved with a linear stretch. A weighting method known as feature contrast is then used to perform unsupervised feature selection. Let \(F_{i}=\{f_{i,1},f_{i,2},\ldots ,f_{i,n}\}\) denote the ith n-dimensional fused feature vector. Equation 1 defines the contrast of the jth feature component:

$$\xi _{j}=\frac{\max _{i}(f_{i,j}) -mean_{i}(f_{i,j})}{\max _{i}(f_{i,j})+mean_{i}(f_{i,j})}$$
(1)

Every feature component is then weighted by the ratio of its feature contrast to the maximum feature contrast over all components, that is,

$$F_{i}^*=\frac{1}{\max _{j}(\xi _{j})}\{\xi _{1}f_{i,1}, \xi _{2}f_{i,2},\ldots ,\xi _{n}f_{i,n}\}$$
(2)
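A short sketch of the linear stretch and the feature-contrast weighting of Eqs. 1 and 2, where the rows of `F` are the fused feature vectors \(F_{i}\) (variable names are illustrative):

```python
import numpy as np

def contrast_weight(F):
    # linear stretch: every feature component mapped to [0, 1] over the whole data set
    F = (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0) + 1e-12)
    # Eq. 1: feature contrast of every component
    xi = (F.max(axis=0) - F.mean(axis=0)) / (F.max(axis=0) + F.mean(axis=0) + 1e-12)
    # Eq. 2: weight each component by its contrast, scaled by the maximum contrast
    return (xi * F) / xi.max()
```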

A common feature fusion strategy is therefore to first merge the different features and then apply feature selection, such as PCA, to obtain the optimal feature subset.

3.2.2 Classifiers fusion

The main aim of a pattern recognition system at the classifier fusion level is to obtain the best possible classification performance. This aim has led to the development of various classification schemes for any given pattern recognition problem. Experimental evaluation of the different designs would normally be the basis for selecting the best classifier as the final solution. In such design studies it has been observed that, even though one design yields the best recognition performance, the sets of patterns misclassified by the different classifiers do not necessarily overlap. This indicates that different classifier designs potentially provide complementary information about the patterns being classified, which can be exploited to improve on the performance of any single selected classifier. Instead of relying on one decision scheme, classifier fusion can be used: fusing the results of individual classifiers overcomes the limited trainability of a single classifier and the deficiencies of individual features, and the combined results are more accurate. Classifier fusion takes two forms: fusion of identical classifiers trained on different datasets, and fusion of dissimilar classifiers. The outputs are confidences associated with every class; since these outputs cannot be compared directly, an aggregation function is used to fuse the outputs of the suggested classifiers. The decisions of the various classifiers are combined by majority voting, where \(D=\{D_{F}^1(1\ldots m),D_{F}^2(1\ldots m),\ldots ,D_{F}^k(1\ldots m)\}\) and D is the decision of every kth classifier for the feature set F, including \(F^+\). The script is recognized as the one selected by the majority of the classifiers, as shown in Fig. 1. The final fused decision \(D_{j}\) for the jth class is computed as in Eq. 3:

$$D_j= \sum _{CD=1}^{k}\varphi _{CD} * O_{jCD} ,\quad 1\le j\le k$$
(3)
$$\varphi _{CD}= \frac{D_{CD}}{\sum _{CD=1}^{k}D_{CD}}$$
(4)

Here k denotes the number of classes, CD indexes the classifier decisions, and \(O_{jCD}\) denotes the support of the CDth classifier for assigning the unknown pattern to the jth class, with \(1 \le j \le k\). The decision fusion counts class-based weighted votes over the individual decisions to reach the final decision, which is computed as \(D=\max (D_{j})\) with \(1 \le j \le k\).
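The weighted voting of Eqs. 3 and 4 can be sketched as follows, where `outputs[cd, j]` plays the role of \(O_{jCD}\) and `d_weights[cd]` the role of \(D_{CD}\); both names are assumptions made for illustration:

```python
import numpy as np

def fuse_decisions(outputs, d_weights):
    """outputs:   array (n_classifiers, n_classes) of per-class supports
       d_weights: array (n_classifiers,) of classifier decision weights."""
    phi = d_weights / d_weights.sum()   # Eq. 4
    D = phi @ outputs                   # Eq. 3: D_j = sum_CD phi_CD * O_{jCD}
    return int(np.argmax(D))            # final decision D = max_j D_j
```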

3.3 Features extraction

To demonstrate the efficiency of the proposed model, the following feature sets are used. These feature sets are widely used in recent research and are briefly described below:

3.3.1 MIs features

For a binary image of size \(X \times Y\), the regular moment of a shape is defined as in Eq. 5:

$$s_{mn}=\sum _{j=0}^{Y-1}\sum _{i=0}^{X-1}i^mj^nf(i,j)$$
(5)

where \(f(i,j)\) denotes the intensity at coordinate \((i,j)\), the pixel value f is either 1 or 0, and \(m+n\) is the moment order. The centroid coordinates are determined by Eq. 6:

$$i'=\frac{s_{10}}{s_{00}}\quad {{\text {and}}}\quad j'=\frac{s_{01}}{s_{00}}$$
(6)

The central moments relative to the centroid are then calculated using Eq. 7:

$$s_{mn}=\sum _{j=0}^{Y-1}\sum _{i=0}^{X-1}(i-i')^m(j-j')^nf(i,j)$$
(7)

A group of seven rotation-invariant moment functions, which form a proper representation of shape, was derived in [15,16,17]. The moment feature equations used in this work are shown in Eq. 8.

$$\begin{aligned} MF_{1}&=(s_{20}+s_{02}), \nonumber \\ MF_{2}&=(s_{20}-s_{02})^2+4s_{11}^2, \nonumber \\ MF_{3}&=(s_{30}-3s_{12})^2+(3s_{21}-s_{03})^2, \nonumber \\ MF_{4}&=(s_{30}+s_{12})^2+(s_{21}+s_{03})^2, \nonumber \\ MF_{5}&=(s_{30}-3s_{12})(s_{30}+s_{12})((s_{30}+s_{12})^2 \nonumber \\&\quad -\,3(s_{21}+s_{03})^2)+(3s_{21}-s_{03})(s_{21} \nonumber \\&\quad +\,s_{03})(3(s_{30}+s_{12})^2-(s_{21}+s_{03})^2) \nonumber \\ MF_{6}&=(s_{20}-s_{02})((s_{30}+s_{12})^2-(s_{21}+s_{03})^2) \nonumber \\&\quad +\,4s_{11}(s_{30}+s_{12})(s_{21}+s_{03}) \nonumber \\ MF_{7}&=(3s_{21}-s_{03})(s_{30}+s_{12})((s_{30}+s_{12})^2 \nonumber \\&\quad -\,3(s_{21}+s_{03})^2)-(s_{30}-3s_{12})(s_{21}\nonumber \\&\quad +\,s_{03})(3(s_{30}+s_{12})^2-(s_{21}+s_{03})^2) \end{aligned}$$
(8)
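As a compact way to obtain these seven invariants for a binarized character image, OpenCV's `HuMoments`, which implements the same classical invariants, can be used; this is a convenience sketch rather than the authors' implementation:

```python
import cv2
import numpy as np

def moment_features(binary_img):
    """binary_img: 32x32 array with values in {0, 1}."""
    m = cv2.moments(binary_img.astype(np.uint8), binaryImage=True)
    return cv2.HuMoments(m).flatten()   # MF_1 ... MF_7 of Eq. 8
```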

3.3.2 RLM features

Run-length statistics capture the coarseness of a texture in particular directions. A run is defined as a string of consecutive pixels with the same gray-level intensity along a particular linear orientation. Coarse textures contain many long runs of significantly different gray-level intensities, whereas fine textures tend to contain more short runs of similar gray-level intensities [18]. A run-length matrix P is defined such that each element p(i,j) gives the number of runs with gray-level intensity i and run length j along a particular orientation. The matrix P has size \(n \times k\), where k equals the maximum possible run length in the image and n is the maximum gray level of the image. The orientation is defined by a displacement vector d(x,y), where x and y are the displacements along the x-axis and y-axis, respectively; the typical orientations are \(0^\circ\), \(45^\circ\), \(90^\circ\), and \(135^\circ\), and computing the run-length encoding for each direction yields four run-length matrices. Once the run-length matrices are computed along each direction, various texture descriptors are computed to capture the texture properties and distinguish between different textures [18].
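A minimal sketch of building one run-length matrix P for the \(0^\circ\) direction, assuming the image has already been quantized to integer gray levels in [0, n_levels); the other three directions are computed analogously by traversing the image along the corresponding orientation:

```python
import numpy as np

def run_length_matrix(img, n_levels):
    """img: 2-D array of integer grey levels in [0, n_levels)."""
    max_run = img.shape[1]
    P = np.zeros((n_levels, max_run), dtype=int)
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                P[run_val, run_len - 1] += 1   # close the current run
                run_val, run_len = v, 1
        P[run_val, run_len - 1] += 1           # close the last run of the row
    return P   # P[i, j-1] = number of runs of grey level i with length j
```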

3.3.3 SFIH features

Texture analysis based on the statistical properties of the intensity histogram is a frequently used technique that relies on statistical moments as measures [19]. Equations 9 and 10 give the expression for the nth-order moment about the mean:

$$\rho _{n}= \sum\limits_{i=0}^{L-1}(R_{i}-X)^n \mu (R_{i})$$
(9)
$$X = \sum\limits_{i=0}^{L-1}R_{i}\mu (R_{i})$$
(10)
$$\begin{aligned} \rho _{n}= & {} {\bar{X}}=\frac{1}{N}\sum _{i}X_{i} \nonumber \\ \rho _{3}= & {} \frac{\sum (X-\rho )^3}{N\sigma ^3} \nonumber \\ \rho _{4}= & {} \frac{\sum (X-\rho )^4}{N\sigma ^4}-3 \nonumber \\ \sigma= & {} \sqrt{\frac{1}{N-1}\left( \sum (X_{i}-{\bar{X}})^2\right) } \nonumber \\ \sigma ^2= & {} \frac{1}{N-1}\left( \sum (X_{i}-{\bar{X}})^2\right) \nonumber \\ Z= & {} 1-\frac{1}{(1+\sigma ^2)} \nonumber \\ U= & {} \sum _{i=0}^{L-1}\mu ^2(R_{i}) \nonumber \\ RPC= & {} \frac{n_{r}}{U(i,j)*j} \nonumber \\ E= & {} -\sum _{i=0}^{L-1}\mu (R_{i})\log _{2}\mu (R_{i}) \nonumber \\ RLNU= & {} \frac{1}{N_{r}}\sum _{i=1}^{X}\left(\sum _{j=1}^{Y}\mu (i,j)\right)^2 \end{aligned}$$
(11)

where L denotes the number of possible intensity levels, X the mean intensity, \(R_{i}\) a random variable indicating intensity, and \(\mu (R_{i})\) the histogram of the intensity levels in the image. The statistical feature equations used in this work are shown in Eq. 11.
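A sketch of the histogram-based statistics (mean, variance, skewness, kurtosis, smoothness, uniformity, and entropy) computed from the normalized intensity histogram \(\mu (R_{i})\); the run-length-based descriptors in Eq. 11 (RPC, RLNU) are computed from the run-length matrices instead and are omitted here:

```python
import numpy as np

def histogram_features(gray, L=256):
    hist, _ = np.histogram(gray, bins=L, range=(0, L))
    mu = hist / hist.sum()                                # mu(R_i)
    R = np.arange(L)
    X = np.sum(R * mu)                                    # mean intensity, Eq. 10
    var = np.sum((R - X) ** 2 * mu)                       # 2nd central moment
    sigma = np.sqrt(var)
    skew = np.sum((R - X) ** 3 * mu) / (sigma ** 3 + 1e-12)
    kurt = np.sum((R - X) ** 4 * mu) / (sigma ** 4 + 1e-12) - 3
    smoothness = 1 - 1 / (1 + var)                        # Z
    uniformity = np.sum(mu ** 2)                          # U
    entropy = -np.sum(mu[mu > 0] * np.log2(mu[mu > 0]))   # E
    return np.array([X, var, skew, kurt, smoothness, uniformity, entropy])
```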

3.3.4 Wavelet decomposition

WD denotes a single-level wavelet decomposition, in which the Haar wavelet is applied to the matrix of each original gray-scale input channel. Converting each channel yields four coefficient matrices: the approximation coefficients, denoted cA, and three detail coefficient matrices, horizontal cH, vertical cV, and diagonal cD. Each coefficient matrix of every channel is then reconstructed to analyze the wavelet function. After that, the features of each channel are extracted by computing the energy of every reconstructed coefficient matrix. The number of features is 12, obtained from the four reconstructed coefficient matrices.
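A minimal sketch of the single-level Haar decomposition of one gray-scale channel using PyWavelets; taking the mean squared coefficient as the "energy" of each sub-band is an assumption about the exact energy definition:

```python
import numpy as np
import pywt

def wavelet_features(gray):
    cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), 'haar')        # single-level Haar DWT
    return np.array([np.mean(c ** 2) for c in (cA, cH, cV, cD)])    # one energy per sub-band
```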

3.4 Learning classifier

In general, classification is the main decision phase of an OCR system. To demonstrate the efficiency of the proposed model, the following classifiers are used. These classifiers are widely used in recent research and are briefly described below:

3.4.1 MQDF classifier

Following the Bayesian decision rule for pattern classification, the quadratic discriminant function (QDF) is derived under the hypothesis that the class-conditional probabilities follow Gaussian distributions and that the prior probabilities of all word classes are equal.

$$g_{i}(x)=(x-\mu _{i})^T\varSigma _{i}^{-1}(x-\mu _{i})+\log |\varSigma _{i}|\quad i=1,2,3,\ldots ,C$$
(12)

Equation 12 gives the QDF distance [20] for a d-dimensional feature vector x, where C denotes the total number of word classes, \(\varSigma _{i}\) the covariance matrix, and \(\mu _{i}\) the mean vector of class \(\varphi _{i}\). With limited samples, the estimation error of the covariance matrix becomes critical; to avoid this error in the case of a small training set, the minor eigenvalues are replaced by a constant \(\varsigma ^2\) and an orthogonal decomposition of \(\varSigma _{i}\) is used, leading to the MQDF distance of Eq. 13:

$$g_{i}(x)= \frac{1}{\varsigma ^2}\left( \Vert x-\mu _{i}\Vert ^2-\sum _{j=1}^{N}\left( 1-\frac{\varsigma ^2}{\omega _{ji}}\right) \left[ \lambda _{ji}^T(x-\mu _{i})\right] ^2\right) +\sum _{j=1}^{N}\log \omega _{ji}+(d-N)\log \varsigma ^2\quad i=1,2,3,\ldots ,C$$
(13)

The parameters of Eq. 13 are estimated by maximum likelihood estimation (MLE), where d denotes the dimensionality, N the number of dominant principal axes, \(\lambda _{ji}\) the jth eigenvector, and \(\omega _{ji}\) the jth eigenvalue (in descending order) of \(\varSigma _{i}\), with \(N<d\). During classification, MQDF computes the distance between the test sample and every class according to Eq. 13, and the K classes with the minimum distances are chosen as candidate results. The K candidates are sorted by their distances in increasing order, and the candidate distances are used to compute the recognition score.
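A sketch of the MQDF distance of Eq. 13 in the paper's notation, assuming `eigvals`/`eigvecs` hold the eigenvalues \(\omega _{ji}\) (sorted in descending order) and eigenvectors \(\lambda _{ji}\) of \(\varSigma _{i}\), and `sigma2` is the constant \(\varsigma ^2\) replacing the minor eigenvalues:

```python
import numpy as np

def mqdf_distance(x, mu_i, eigvals, eigvecs, N, sigma2):
    """Distance of sample x to class i; smaller means more likely."""
    d = x.shape[0]
    diff = x - mu_i
    proj = eigvecs[:, :N].T @ diff                       # lambda_ji^T (x - mu_i)
    g = (np.sum(diff ** 2)
         - np.sum((1.0 - sigma2 / eigvals[:N]) * proj ** 2)) / sigma2
    g += np.sum(np.log(eigvals[:N])) + (d - N) * np.log(sigma2)
    return g
```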

3.4.2 SVM classifier

The support vector machine (SVM) is a powerful discriminative classifier [21, 22]. Essentially, SVM determines the decision surface in Eq. 14 used for classification via a nonlinear transformation \(\varPhi\), which also handles the case of linearly inseparable data. The optimal hyperplane is obtained by solving a quadratic programming problem that depends on regularization parameters. The transformation is realized through kernel functions such as the sigmoid, linear, radial basis function, and polynomial kernels, computed as follows:

  • The Sigmoid kernel: \(K(x,y)=\tanh (\beta _{0}xy+\beta _{1})\)

  • The linear kernel: \(K(x,y)=x \times y\)

  • RBF kernel which is Radial Basis Function: \(K(x,y)=\exp (-\gamma \left\| x-y\right\| ^2)\)

  • The polynomial kernel: \(K(x,y)=[(x\times y)+1]^d\)

where the parameters \(\beta _{0}\), \(\beta _{1}\), \(\gamma\), and d are specified empirically.

$$f(x)=W^T\varPhi (x)+b$$
(14)

where \(\varPhi (x)\) denotes a feature map, \(b\in R\), and \(W\in R^n\). Since the feature space in this work is not linearly separable, the input data \((x_{i},y_{i})\) are mapped to a higher-dimensional feature space using the nonlinear operator \(\varPhi (x)\). The optimal hyperplane is then realized as:

$$f(x)= sgn\left( \sum y_{i}a_{i}K(x_{i},x)+b\right)$$
(15)
$$K(x_{i},x)= \exp (-\gamma \left\| x_{i}-x\right\| ^2)$$
(16)

Here the kernel function is based on the radial basis function (RBF). This classifier model, called the RBF-kernel SVM, is used to perform Arabic handwritten word classification.
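A minimal sketch of training the RBF-kernel SVM on one fused feature matrix with scikit-learn; `X_train`, `y_train`, and the values of C and gamma are illustrative and would be tuned empirically as noted above:

```python
from sklearn.svm import SVC

svm = SVC(kernel='rbf', gamma='scale', C=10.0, probability=True)
svm.fit(X_train, y_train)               # fused training features and character labels
svm_scores = svm.predict_proba(X_test)  # per-class confidences passed to the fusion stage
```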

3.4.3 Random forest

RF [23] is an ensemble learning method for classification and regression. During training it builds a number of decision trees, and the output class is the mode of the classes predicted by the individual trees; it is a fusion of tree predictors in which every tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. From the training set, the model makes a prediction \({\bar{p}}\) for a new point \(a'\) by looking at the neighborhood of that point, expressed through a weight function W as in Eq. 17:

$${\bar{p}}=\sum _{j=1}^{N}W(a_{j},a')b_{j}$$
(17)

Here \(W(a_{j},a')\) is the non-negative weight of the jth training point relative to the new point \(a'\), i.e. the fraction of the training data that falls in the same leaf as \(a'\); the weights sum to one for each \(a'\). The random forest averages the predictions of the set of trees with their individual weight functions, and its prediction is given by Eq. 18:

$${\bar{p}}=\frac{1}{M}\sum _{i=1}^{M}\sum _{j=1}^{N}W_{i}(a_{j},a')b_{j}=\sum _{j=1}^{N}\left( \frac{1}{M}\sum _{i=1}^{M}W_{i}(a_{j},a')\right) b_{j}$$
(18)
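A minimal sketch of the RF classifier with scikit-learn; the averaged tree votes returned by `predict_proba` correspond to the weighted neighborhood prediction of Eqs. 17 and 18 (the number of trees is an illustrative choice):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)                # fused training features and character labels
rf_scores = rf.predict_proba(X_test)    # per-class confidences for the decision fusion
```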

4 Experimentation

Experiments are carried out at different multi-font levels, i.e. on documents written using more than one Arabic script font. For the IFN/ENIT dataset, a recognition score threshold of 0.35 is selected as appropriate for the training set used to train the next layer, and for the AHCD training set a threshold of 0.42 is used. The experiments use 60%, 40%, 20%, and 10% of the IFN/ENIT and AHCD datasets for training, with the remaining 40%, 60%, 80%, and 90% used for testing. The training portion is further split into a training set and a validation set. Other feature extraction, dimensionality reduction, or classifier methods could also be used within the proposed model.

The recognition score threshold used in the fusion process must be determined: samples whose recognition score is smaller than the threshold are chosen to train the classifier of the next layer. The threshold \(\theta\) must not be too large, because that would select samples that are already recognized well instead of challenging ones; such easy samples passed to the next layer would leave its classifier unable to focus on the challenging samples and would lower the performance of the classifier fusion. It is worth mentioning that the threshold must not be too small either, as that would make the number of selected samples so small that the next training layer becomes numerically unstable.

Let m denote the total number of training samples and \(m'\) the number of samples recognized by classifier f with a recognition score smaller than \(\theta\). The ratio for a classifier f and a given recognition score threshold is denoted \(R(f,\theta )\), as shown in Eq. 19. Let M denote the number of samples correctly recognized by f with recognition scores larger than \(\theta\); then the correct-recognition rate of the samples with scores larger than \(\theta\) is denoted CRR, as in Eq. 20.

$$R(f,\theta )= \frac{m'}{m}$$
(19)
$$CRR(f,\theta )= \frac{M}{m-m'}$$
(20)
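A short sketch of these two statistics, assuming `scores` holds the recognition scores of classifier f on the training set and `correct` is a boolean array flagging correctly recognized samples:

```python
import numpy as np

def layer_statistics(scores, correct, theta):
    below = scores < theta                 # samples passed to the next layer (m' of m)
    R = below.mean()                       # Eq. 19: R(f, theta) = m' / m
    kept = ~below
    CRR = correct[kept].mean() if kept.any() else 0.0   # Eq. 20: CRR = M / (m - m')
    return R, CRR
```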

The IFN/ENIT dataset contains 26,400 words and 210,000 Arabic characters written by 411 writers. The experimental results are presented in Table 1. The character recognition accuracy was obtained by choosing a 60/40 split of the dataset samples with an m-fold procedure for training/testing on IFN/ENIT. Table 1 shows the accuracy of the proposed model for four Arabic font types, both for documents written in a single font and for multi-font documents that mix several font types.

The AHCD dataset contains 16,800 characters written by 60 writers. The experimental results are presented in Table 1, which shows the character recognition accuracy with a 60/40 training/testing split on AHCD, again for four Arabic font types as single-font documents and for multi-font documents mixing several font types.

The comparison of the proposed model for every font type over the two datasets with a 60/40 training/testing split is displayed in Fig. 2. The proposed model achieves a high recognition rate for all Arabic font types, with the Naskh font obtaining the highest accuracy. The character recognition results for Arabic handwritten script are summarized in Table 1. Within the single-feature group, MIs give the maximum recognition accuracy, followed by RLM and SFIH, while WD gives the minimum. Within the two-feature fusion group, (MIs, RLM) gives the maximum accuracy, followed by (MIs, SFIH), (RLM, SFIH), (MIs, WD), and (RLM, WD), respectively, while (SFIH, WD) gives the minimum. Within the three-feature fusion group, (MIs, RLM, SFIH) gives the maximum accuracy, followed by (MIs, SFIH, WD), while (RLM, SFIH, WD) gives the minimum.

Table 1 reveals that three-feature and two-feature fusion outperform any single feature: the three-feature combination (MIs, RLM, SFIH) gives the maximum accuracy, while WD alone, from the single-feature group, gives the minimum.

The classifier groups with two-classifier fusion and three-classifier fusion are also reported in Table 1. When one feature or two fused features are used, the maximum recognition accuracy is achieved with (MQDF, SVM, RF), followed by (MQDF, RF) and (SVM, RF), respectively. When three fused features are used, the maximum accuracy is again achieved with (MQDF, SVM, RF), followed by (SVM, RF) and (MQDF, RF). The pair (MQDF, SVM) gives the minimum recognition accuracy.

Figure 3 shows samples of handwritten character recognition, including overlapping and touching characters, in Arabic document images of all the mentioned font types from the AHCD and IFN/ENIT datasets, as well as from other datasets used for validation.

Figure 3 shows that the proposed model recognizes ligature characters, which are among the most challenging aspects of Arabic script and are absent from most other scripts such as English and Chinese. The suggested model also handles touching and overlapping characters.

Table 2 compares the suggested method, using the global fusion of all features and classifiers, under various training/testing splits of the AHCD and IFN/ENIT datasets. As shown in Fig. 1, the suggested method uses four feature descriptors, MIs, RLM, SFIH, and WD, and three classifiers, MQDF, SVM, and RF. Table 3 compares the suggested method with various well-known methods on the AHCD dataset, Table 4 on the IFN/ENIT dataset, and Table 5 on the HACDB dataset; the character recognition results are visualized in Fig. 3.

Table 1 Arabic script recognition accuracies
Fig. 2 Recognition accuracy rate on two datasets for various Arabic handwritten types

Fig. 3 Samples of Arabic handwritten characters recognition results. a Test image and b output document

Table 2 Arabic script recognition accuracies from the global fusion of all features (MI, RLM, SFIH, WD) and classifiers (MQDF, SVM, RF) using various training/testing percentages on the two datasets
Table 3 Comparison between the proposed method and the well-known methods with AHCD dataset
Table 4 Comparison between the proposed method and the well-known methods with IFN/ENIT dataset
Table 5 Comparison between the proposed method and the well-known methods with HACDB dataset

5 Conclusion

In this manuscript, a fusion-based system for robust recognition of Arabic handwritten characters has been proposed. The novel method fuses classifiers and features for Arabic script recognition, considering various font types such as SH Roqa, Naskh, Farsi, and Igaza, as well as multi-font documents. The method selects the classifiers and features algorithmically. The proposed model was compared experimentally with existing models and shown to work for complex Arabic handwritten characters. The models are built on popular feature extraction and classification techniques, and the fusion of suitable classifiers and features is suggested to improve character recognition accuracy.

The authors make three contributions in this manuscript. The first is a new method for handling slanted and skewed Arabic handwritten characters using classifier fusion: three different classifiers are combined at the decision level, each evaluating the text from a given direction, and fusing them increases the recognition accuracy significantly. The second is a new method for handling touching and overlapping characters using feature fusion: four different features are combined at the feature level, and fusing them increases the recognition rate significantly. The third contribution is that the suggested model recognizes ligature characters, which are among the most challenging aspects of Arabic script and are absent from most other scripts such as English and Chinese. These problems are solved by choosing the classifier fusion with the highest accuracy; the various classifier designs provide complementary information about the classified patterns, which improves the performance of the chosen classifier.

The results achieved by the suggested system demonstrate that the proposed contributions, namely carefully selecting the extracted features and choosing the classifier with the highest accuracy by fusing different classifiers, are extremely promising and achieve higher recognition rates than previous studies.