A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

In any multi-script environment, handwritten script classification is an unavoidable prerequisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have addressed this complex pattern classification problem by proposing various feature vectors, most of them of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them to only the essential and relevant features. In the present work, we address this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, covering 12 officially recognized Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The code used for implementing HSGFS can be found at the following GitHub link: https://github.com/Ritam-Guha/HSGFS.


Introduction
The past decade has witnessed an increased availability of digital images and high-capacity, low-cost storage devices. This has made storage of handwritten or printed documents in digital format a lot easier and more budget-friendly. These handwritten documents are non-editable in nature. In order to achieve easy editing, maintenance, indexing, retrieval and transfer of contents, researchers throughout the world strive to develop Optical Character Recognition (OCR) engines, which are currently used to convert images of handwritten, typed or printed text into machine-readable text. The machine-readable text produced by OCR engines is easily editable and maintainable, but the problem with these OCRs is that they are largely script-specific (the writing style or graphical form of a language is known as its script). This was not an issue as long as only single-script documents were involved, but as document storage became a regular practice among a large group of people, it became necessary to store and process multi-script documents as well. As OCRs are script-specific, multi-script documents require multiple OCRs to be converted into machine-readable form. The problem of converting multi-script documents into machine-readable format can be solved by introducing a new layer in the workflow, ahead of feeding the documents to OCR, which is known as Automatic Script Classification (ASC). The entire workflow for an automated multi-script document storage system is shown in Fig. 1.
The difficulty of ASC can be appreciated from an overview of the vast set of languages currently used across the globe. According to the Ethnologue catalogue of world languages, one of the best linguistic resources, there are currently 7,097 living languages used throughout the world [1]. A large multitude of scripts is used when these languages are expressed in writing, and the presence of this multitude of scripts makes ASC very complex. The problem is profound in highly multilingual countries. One of the best examples of such a country is India, which has 23 officially recognized languages and around 150 other languages. So, script/language identification is of great importance for increasing digital communication in the fields of culture, research and language studies. The 23 constitutionally recognized languages in India are Bangla, English, Hindi, Punjabi, Marathi, Gujarati, Sindhi, Oriya, Assamese, Malayalam, Urdu, Telugu, Sanskrit, Tamil, Kannada, Nepali, Kashmiri, Maithili, Manipuri, Konkani, Bodo, Santhali, and Dogri [2]. A total of 12 official scripts are used to write these Indian languages: Devanagari, Gujarati, Gurumukhi, Bangla, Tamil, Kannada, Urdu, Telugu, Malayalam, Manipuri, Oriya, and Roman.
The identification of handwritten or printed scripts is a complicated process with a number of steps, which include pre-processing, segmentation, feature extraction and finally classification. Depending on the mode of segmentation, the problem of script classification can be conceived at three different levels: (1) word-level, (2) text line-level and (3) block-level. The next step is the creation of a feature vector through some feature extraction strategy. Many algorithms have been employed to extract features from the pre-processed and segmented documents. Each algorithm gives a different feature vector, which is then passed through a classifier for ASC.
Sometimes the size of the feature vectors used for the ASC process becomes significantly large. Most of the time, such large feature vectors contain much redundant information, which may decrease the overall classification accuracy provided by the classifiers. Processing such large feature vectors also results in a large time requirement. That is why, before recognition, an FS algorithm can be used to keep only the necessary and significant features, which provides two advantages. Firstly, it extracts a near-optimal set of features, which improves the overall classification accuracy; secondly, the use of a reduced set of features decreases the load on the classifiers, thereby speeding up the recognition process. This fact motivates us to apply FS in the field of ASC for handwritten Indic scripts. There are mainly three categories of FS algorithms: filter [3,4], wrapper [5][6][7] and embedded [8][9][10]. Filter methods use statistical measures of the features to select the optimal feature subset, whereas wrapper methods take the help of a classifier to check the classification capability of a feature subset. Due to the supervision of a classifier, wrapper methods usually require more time than filter methods, but on the other hand they are able to achieve better classification than filter methods. Nowadays some researchers have proposed hybrid versions of filter and wrapper methods, which are known as embedded methods. For ASC, classification accuracy is a lot more important than time requirement. Hence, we have focused on developing a hybridization of two popularly used wrapper models.
In this paper, we have proposed a hybridized version of Binary Particle Swarm Optimization (BPSO) and Binary Gravitational Search Algorithm (BGSA), known as Hybrid Swarm and Gravitation based FS (HSGFS), in order to get an optimal feature subset to be used for ASC from handwritten Indic script documents. Eberhart and Kennedy created PSO in 1995 [6]; it simulates the social behaviour observed in schools of fish and flocks of birds. Over the years, PSO has gained huge popularity as an FS algorithm. In 2009, Rashedi et al. proposed GSA [7] to optimize solutions for single-objective functions. In contrast to PSO, GSA works on the principles of Newtonian gravitation. PSO and GSA are two popularly used optimization algorithms, but in our proposed model we have used their hybrid binary version to overcome the drawbacks of both algorithms. A local search is also implemented within this hybrid version to improve the exploitation ability of the algorithm, which is very useful for escaping locally optimal solutions and reaching the globally optimal one. After the optimization, the feature vector selected by the HSGFS method is finally used to identify the script. The step-by-step procedure of handwritten ASC is represented in Fig. 2. From the schematic representation presented in the figure, it can be observed that we have introduced an FS section (highlighted in red) to the existing workflow for ASC.
• Applying the proposed FS model over handwritten datasets written in 12 Indic scripts at all three levels of ASC, namely block-level, text line-level and word-level.
• Comparing our proposed HSGFS procedure with existing FS models for the handwritten ASC problem.
The rest of the paper is organized in 4 sections. Section 2 provides a brief description of some of the existing research works related to the domain of our work. The detailed explanation of our proposed FS model in ASC is provided in section 3. Section 4 describes the various experiments we have performed to test and analyze the proposed method. Finally, section 5 concludes our work and provides a brief overview of the future scope, mentioning possible extensions of this work.

Related Study
This section contains a brief discussion of some previously proposed methods in the domain of script identification and FS. We first discuss various script classification techniques; the later part of this section describes some significant variants of PSO and GSA found in the literature.
In 2017, Singh et al. [11] developed some standard datasets of handwritten Bangla-Roman and Devanagari-Roman mixed-script document images. Modified log-Gabor filter (MLG) was used for feature extraction in order to develop bi-script (Devanagari-Roman and Bangla-Roman) and tri-script (Bangla-Devanagari-Roman) word-level script identification modules.
In 2017, Obaidullah et al. [12] presented a handwritten document image dataset at page-level named PHDIndic_11, having 11 officially recognized Indic scripts: Devanagari, Bangla, Urdu, Roman, Oriya, Gujarati, Gurumukhi, Tamil, Malayalam, Telugu and Kannada. The paper also contained the results for handwritten script identification (HSI). The authors used SL (Simple Logistics) and Multi-layer Perceptron (MLP) and their voting-based integration using average of probabilities in order to perform HSI.
Feature extraction was done by a combination of polygonal and elliptical approximation, and MLP was found to be the best classifier for ASC. In 2016, Chaudhari et al. [14] performed script classification of Gujarati and English languages at word-level. In order to perform feature extraction, the directional energy distribution of a word was obtained by employing Gabor filters having suitable frequencies and orientations. Their proposed model used an SVM classifier to perform classification. In 2017, Obaidullah et al. [15] analyzed the performance of ASC when input data were conceived at different levels, i.e. page, block, text-line/word-level. The same multi-script handwritten document images were considered at 4 different levels, and mainly 2 kinds of features were considered, namely Script-Independent Features (SIF) and Script-Dependent Features (SDF). Final classification was performed by MLP and Random Forest (RF) classifiers. In 2017, Goswami et al. [16] put forward a novel approach for separating Indic scripts based on the presence of 'Matra', which was used as a precursor to simplify the subsequent HSI in a multi-script environment. Two scripts, Devanagari and Bangla, were considered as positive samples, as 'Matra' is present there; two other scripts, Roman and Urdu, were considered as negative samples for the experimentation, as they do not have 'Matra'. After experimentation, Fractal Geometry Analysis (FGA) was found to be the best suited feature extraction methodology for script identification, and RF was the most appropriate among the classifiers used in the process. Singh et al. proposed a tree-oriented approach to perform recognition of 12 handwritten Indic scripts in [17]. The authors separated the Matra and non-Matra based scripts using Distance-Hough transform (DHT) and then identified each script individually using modified log-Gabor (MLG) based features.
In 2018, Mukhopadhyay et al. proposed a method [18] to combine classifiers in order to efficiently recognize scripts in a multilingual environment. Combination of classifiers reduced the complementary nature imposed by different classifiers on the same pattern. This reduced the burden of selecting an appropriate classifier for a particular pattern recognition problem. The classifier combination approach was applied to a handwritten Indic script (word-level) database developed by the authors, which was named CMATERdb8.4.1 and was made available online.
In 2015, Singh et al. [2] provided a survey of the feature extraction and script identification techniques used for the classification of printed or handwritten Indic scripts. The survey provided a platform to encourage future research activities in the field of script classification.
In spite of its importance, FS in script classification has so far been little explored. An FS approach for script identification was first attempted on printed documents using the ReliefF algorithm [19]. In 2016, Das et al. [13] proposed a Harmony Search (HS) based FS procedure which was applied for HSI. In this approach, each candidate solution was considered as a musician. Just as musicians play various notes with different instruments and eventually find a perfect combination to get a harmony among the musical instruments, the candidate solutions were fine-tuned and processed to achieve the most appropriate combination of frequencies (i.e. the final solution), which was then used to optimize the objective function. Thus, it can be seen that although FS can play an important role in ASC, it has not yet been attempted much by researchers.
Throughout the years many optimization algorithms have been proposed which can be applied in the domain of FS. In 1995, Eberhart et al. [6] proposed two initial versions of PSO: the GBEST (Global BEST) model and the LBEST (Local BEST) model. The GBEST model's candidate solutions (particles in the case of PSO) use their own information as well as the information provided by the global best candidate to form their own solutions. Similarly, LBEST model particles produce solutions using their own information, but instead of the global best particle they seek additional information from a certain number of their neighbours. Canonical PSO utilises information obtained from only one neighbour, whereas in Full Information PSO (FIPSO) each neighbour acts as a source of information. Thus, Canonical PSO overlooks information provided by all the neighbours except one, and FIPSO retains a lot of redundant features due to the inclusion of too much information. To get rid of these limitations, Du et al. proposed an adequate-information version of PSO [21] in 2015, known as Limited Information PSO (LIPSO). LIPSO particles are influenced by top individuals of the swarm, and the number of individuals influencing each particle may vary from particle to particle. In 2014, Cheng et al. incorporated social learning in PSO particles [22]. According to this approach, any particle may learn or retrieve information from the particles which are better than the individual under consideration. In 2007, Ghamisi et al. proposed a hybridized model combining Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) [23]. The authors introduced elitism among the particles in the swarm. The elites, or top performing particles in the swarm, qualify to reach the next generation after going through PSO. All the other particles are discarded, and GA is performed on the elites to obtain other candidates for the next generation.
In 2009, Rashedi et al. presented an optimization approach [7] based on the Newtonian laws of motion and the interaction of masses. The search agents were considered to be collections of masses interacting among themselves, obeying laws of motion which guided the agents towards optimal solutions. A binary version of GSA, or BGSA, has also been developed [24] by the same authors, Rashedi and Saryazdi, to solve FS problems [25]. BGSA has been merged with Simulated Annealing (SA) to create GABSA [26]. Use of SA in GSA strengthens the local search and hence improves its exploitation abilities.

Recently Ghosh et al. proposed an improved version of GA known as Histogram based Multi-Objective GA (HMOGA) in [27]. They produced optimized feature vectors over multiple runs of GA and combined the results using histogram-based cut-off criteria. In [28], Guha et al. proposed another updated version of GA where they replaced the mutation operation of GA with the Great Deluge Algorithm (GDA) to improve the local searching capabilities of conventional GA. Like GA, Ant Colony Optimization (ACO) has gained massive popularity over the years as an efficient FS algorithm. Ghosh et al. introduced a wrapper-filter embedded version of ACO in [9]. The proposed model, known as Wrapper-Filter ACOFS (WFACOFS), used both wrapper and filter methods to evaluate its candidate solutions, thereby reducing the time requirement of the overall model (wrapper methods require more time to evaluate candidates than filter methods). Guha et al. proposed another level of improvement over HMOGA in [29], where they added memory to the existing technique to store the best candidate solutions generated over the iterations, which would otherwise be lost in the process. The model was then applied to handwritten numeral classification datasets.
Therefore, from the literature review, it can be noticed that the application of FS in handwritten digit or word recognition is quite well addressed. For example, in [29,30], the authors have applied FS to the problem of handwritten Devanagari digit recognition, whereas the work described in [31] implements FS for handwritten Bangla word classification. But, to the best of our knowledge, FS has rarely been used for the handwritten ASC problem. This motivates us to bridge this research gap and propose an FS method for the handwritten ASC problem. In this work, the proposed FS method is applied on three state-of-the-art feature extraction algorithms, namely the DHT algorithm [32], Histogram of Oriented Gradients (HOG) [33] and MLG Transform [34]. The proposed FS model is described in the next section.

Proposed Model
BPSO and BGSA are two well-known algorithms in the domain of FS. The first one is impressive in its exploitation abilities and the second one is rich in exploration. In order to attain a good exploitation-exploration trade-off, we have combined the methodologies of these two algorithms to create a new optimization method known as HSGFS. Even after combining these two algorithms, there remains a chance of premature convergence, as both algorithms tend to follow the global best in each iteration. Hence, we have implemented a local search within the algorithm to escape such premature convergence.

Hybrid Swarm and Gravitation based Feature Selection (HSGFS)
BPSO-BGSA [35] is built by combining the properties of both BPSO (which signifies the social thinking) and BGSA (which brings about its significant exploration abilities). The same concept is utilised in FS by converting the value of velocity into the probability of whether a feature should be selected or not. A binary string ($x_i$) of size $D$ is used to signify the feature selection status. A '1' and a '0' denote that the feature is selected and not selected, respectively.
Here, $N$ is the total number of particles, $d$ is the index of the feature under consideration, $D$ is the number of features, and $x_i^d$ is a binary value determining whether the $d$-th feature is selected or not.
Each particle in the population is randomly initialised with '0's and '1's.
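The binary encoding and random initialisation described above can be sketched as follows. This is only an illustrative sketch: the function names `init_population` and `apply_mask` are hypothetical and are not taken from the HSGFS code base.

```python
import random


def init_population(num_particles, num_features, rng=None):
    """Return num_particles random binary strings, each of length num_features."""
    rng = rng or random.Random()
    return [[rng.randint(0, 1) for _ in range(num_features)]
            for _ in range(num_particles)]


def apply_mask(sample, particle):
    """Keep only the feature values whose bit in the particle is 1."""
    return [value for value, bit in zip(sample, particle) if bit == 1]


# Toy usage: 4 particles over a 6-dimensional feature vector.
population = init_population(num_particles=4, num_features=6, rng=random.Random(0))
selected = apply_mask([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [1, 0, 0, 1, 1, 0])
```

Each particle is simply a mask over the original feature vector, so the same structure can be reused for any of the three feature sets.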
Each particle influences the others, with its influence proportional to its own fitness. The fitness of a particle is evaluated from the classification ability of the features chosen by that particle. The classification ability refers to the recognition accuracy of the candidate solutions (particles) obtained with the help of a trained classifier. The least performing particle ($worst(t)$) and the best performing particle ($best(t)$) are utilised to derive the masses of the particles using Equation 2. The masses are then modified using Equation 3 so as to make the mass of each particle proportional to its relative strength. $fit_i(t)$ is the fitness (recognition accuracy) of agent $i$ at time $t$.
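A wrapper-style fitness of this kind can be sketched as below. As a simplifying assumption, a tiny 1-nearest-neighbour classifier stands in for the trained classifier (the paper itself uses MLP, k-NN and SVM), and the toy data and the function names are hypothetical.

```python
def nearest_neighbour_label(train_x, train_y, query):
    """Label of the training sample closest to query (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for x, y in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(x, query))
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label


def fitness(particle, train_x, train_y, test_x, test_y):
    """Recognition accuracy obtained using only the features selected by particle."""
    idx = [d for d, bit in enumerate(particle) if bit == 1]
    if not idx:  # an empty feature subset cannot classify anything
        return 0.0

    def project(rows):
        return [[row[d] for d in idx] for row in rows]

    tr, te = project(train_x), project(test_x)
    correct = sum(nearest_neighbour_label(tr, train_y, q) == y
                  for q, y in zip(te, test_y))
    return correct / len(test_y)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
train_x = [[0.0, 5.0], [1.0, -5.0], [0.1, -4.0], [0.9, 4.0]]
train_y = ["A", "B", "A", "B"]
test_x = [[0.05, -5.0], [0.95, 5.0]]
test_y = ["A", "B"]

fit_informative = fitness([1, 0], train_x, train_y, test_x, test_y)
fit_all = fitness([1, 1], train_x, train_y, test_x, test_y)
```

In this toy setting the subset containing only the informative feature scores higher than the full feature set, which is the effect FS aims to exploit.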
$M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$ (3)
$G$, the gravitational constant, and $R_{ij}$, the Hamming distance between two particles $i$ and $j$, are obtained as follows: $\alpha$ is the descending coefficient, taken as 20 here. $G_0$ indicates the initial gravitational constant, which is taken as 1, $t$ denotes the current iteration number, and $max\_iter$ is the total number of iterations we set. Each particle exerts a force on another particle following Equation 6.
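The mass normalisation, the gravitational constant and the Hamming distance can be sketched as follows. The min-max form used for the raw mass and the exponential decay used for $G$ are the forms commonly given in the GSA literature and are assumed here; the paper's own Equations 2, 4 and 5 may differ in detail.

```python
import math


def normalised_masses(fitnesses):
    """Map fitness values to masses that sum to 1 (higher fitness, heavier particle)."""
    best, worst = max(fitnesses), min(fitnesses)
    if best == worst:  # all particles equally fit: share the mass equally
        return [1.0 / len(fitnesses)] * len(fitnesses)
    # Assumed standard GSA form: m_i = (fit_i - worst) / (best - worst).
    raw = [(f - worst) / (best - worst) for f in fitnesses]
    total = sum(raw)
    # Equation 3: M_i = m_i / sum_j m_j.
    return [m / total for m in raw]


def gravitational_constant(t, max_iter, g0=1.0, alpha=20.0):
    """Assumed exponential decay G(t) = G0 * exp(-alpha * t / max_iter)."""
    return g0 * math.exp(-alpha * t / max_iter)


def hamming_distance(p, q):
    """Number of feature positions in which two binary particles differ."""
    return sum(a != b for a, b in zip(p, q))


masses = normalised_masses([0.80, 0.90, 0.85])
g_start = gravitational_constant(t=0, max_iter=15)
distance = hamming_distance([1, 0, 1, 1], [1, 1, 0, 1])
```

With $\alpha = 20$ the constant decays very quickly, so gravitational attraction mainly shapes the early iterations of the search.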
In accordance with the laws of motion, the value of acceleration is found by dividing the force by the mass of the particle (Equation 8). The force derived from the value of mass and the Hamming distance allows the particles to be influenced more by better particles which are similar (smaller Hamming distance). This value is then used to modify the velocity corresponding to the particle, as shown in Equation 9. It combines the velocity updating strategies of both GSA and PSO. $c_1$ is the accelerating factor (Equation 10) and $c_2$ is the velocity factor (Equation 11).
$v_i^d(t + 1) = rand \times v_i^d(t) + c_1 \times a_i^d(t) + c_2 \times (gbest^d - x_i^d(t))$ (9)
The value of velocity is interpreted as the probability of the feature being selected. The overall flow of the algorithm is represented in Fig. 3.
The problem with BPSO-BGSA as developed so far is its relatively poor local search capability, especially in GPS, which is pointed out in [36]. To account for this shortcoming, we introduce a local search method, where we perform a filter ranking in offline mode and utilise it in the local search. In doing so, we add the top $k$ ranked features into a particle and delete the lowest $l$ ranked features we find in the particle. The ordered pair $(k, l)$ is generated randomly, and each value lies between 1 and $(5 \times D)/100$.
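The velocity update of Equation 9 and its interpretation as a selection probability can be sketched as follows. Several points are assumptions: the leading coefficient is taken to be a random number in [0, 1], the factors `c1` and `c2` are passed in as plain numbers because Equations 10 and 11 are not reproduced here, and a clamped sigmoid is used as the transfer function from velocity to probability.

```python
import math
import random


def update_velocity(v, a, x, gbest, c1, c2, rng):
    """One velocity update per feature, mixing the GSA acceleration and the PSO gbest pull."""
    return [rng.random() * v_d + c1 * a_d + c2 * (g_d - x_d)
            for v_d, a_d, x_d, g_d in zip(v, a, x, gbest)]


def sigmoid(v, v_max=6.0):
    """Assumed transfer function; v is clamped to [-v_max, v_max] to avoid overflow."""
    v = max(-v_max, min(v_max, v))
    return 1.0 / (1.0 + math.exp(-v))


def update_position(velocity, rng):
    """Select feature d with probability sigmoid(velocity[d])."""
    return [1 if rng.random() < sigmoid(v_d) else 0 for v_d in velocity]


rng = random.Random(1)
new_v = update_velocity(v=[0.0, 0.0], a=[0.0, 0.0], x=[0, 1], gbest=[1, 1],
                        c1=1.0, c2=2.0, rng=rng)
new_x = update_position(new_v, rng)
```

In the toy call, the first feature is absent from the particle but present in the global best, so its velocity rises and it becomes more likely to be selected; the second feature already agrees with the global best and its velocity stays at zero.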
Another major problem of BPSO-BGSA is the lack of storage for the best feature sets. Therefore, if the population degrades to a worse solution over the iterations, the better solutions are lost. To account for this shortcoming, a memory has been added to the algorithm to retain the best solutions produced over the iterations.
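The rank-guided local search and the memory of best solutions can be sketched as below. This is one possible reading of the description, not the authors' implementation: the offline filter ranking is assumed to be a list of feature indices ordered best first, and the memory capacity of 5 is an arbitrary illustrative choice.

```python
import random


def local_search(particle, ranking, rng):
    """Add the top-k ranked features, then drop the l lowest-ranked selected ones.

    ranking lists feature indices, best first (from an offline filter method).
    k and l are drawn independently from 1 .. 5% of the number of features.
    """
    num_features = len(particle)
    limit = max(1, (5 * num_features) // 100)
    k, l = rng.randint(1, limit), rng.randint(1, limit)
    refined = list(particle)
    for d in ranking[:k]:
        refined[d] = 1
    selected_worst_first = [d for d in reversed(ranking) if refined[d] == 1]
    for d in selected_worst_first[:l]:
        refined[d] = 0
    return refined


def update_memory(memory, particle, fit, capacity=5):
    """Keep the capacity best (fitness, particle) pairs seen so far, best first."""
    memory.append((fit, list(particle)))
    memory.sort(key=lambda entry: entry[0], reverse=True)
    del memory[capacity:]
    return memory


# Toy usage: 40 features, feature 0 ranked best and feature 39 ranked worst.
ranking = list(range(40))
start = [0] * 37 + [1, 1, 1]  # only the three worst-ranked features selected
refined = local_search(start, ranking, random.Random(3))

memory = []
update_memory(memory, [1, 0], 0.80, capacity=2)
update_memory(memory, [0, 1], 0.90, capacity=2)
update_memory(memory, [1, 1], 0.70, capacity=2)
```

Because the memory is kept outside the population, a good feature subset found early is still available at the end even if the swarm later drifts to worse solutions.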

Feature Extraction
The proposed FS method has been applied on three feature descriptors previously used for ASC, namely the DHT algorithm [32], HOG [33] and MLG Transform [34]. Since these feature descriptors have already been proposed earlier, they are described here only in brief.

Distance Hough Transform (DHT) Algorithm:
A feature vector consisting of 144 (72+72) attributes has been extracted using DHT algorithm.
The steps of implementation of the DHT algorithm, as proposed in [32], are given below:

Histogram of Oriented Gradients (HOG)
For object detection from images, the HOG descriptor was first proposed by Dalal and Triggs [33]. This descriptor was applied for the detection of pedestrians in static images. The essential thought behind HOG descriptors is that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. This method is similar to edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts. The only difference is that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. The algorithm for implementing HOG descriptors is as follows:
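As a rough illustration of the core idea only (it is not the full procedure of [33]), the sketch below builds a magnitude-weighted histogram of unsigned gradient orientations for a single cell. The central-difference gradients, the 9 orientation bins and the omission of block normalisation are simplifying assumptions.

```python
import math


def cell_orientation_histogram(cell, num_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each interior pixel votes for its orientation bin with its magnitude.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# A 4x4 cell containing a vertical edge: every gradient points horizontally.
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(cell)
```

For the vertical edge all votes fall into the first bin, which is how the distribution of edge directions ends up encoding local stroke shape.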

MLG Transform
MLG filter transform based features, as proposed in [34], are also considered as one of the texture feature descriptors for the classification of textual images based on the script in which they are written, at three different levels, viz. word-level, block-level and text line-level. In this work, a Windowed Fourier Transform (WFT) is taken into account for preserving the spatial information. The process of WFT consists of two steps. In the first step, the input image is multiplied with the window function, whereas in the second step the Fourier transform is applied to the result of the previous step in order to get the resulting output. In short, WFT is mainly a convolution of the low-pass filter with the input image. MLG transform uses a Gaussian function as the optimally concentrated function in both the spatial and the frequency domain [37]. In order to get the filtered images as output, the inverse Fourier transform is finally applied on the resulting vector. For the calculation of the feature vector, two important measures, namely energy and entropy [38], are calculated from the MLG filter transformed images. Here, the number of scales ($n_s$) is chosen as 5 (that is, $n_s$ = 1, 2, 3, 4 and 5) and the number of orientations ($n_o$) is taken as 18 (that is, 10°, 20°, 30°, ..., 180°). Hence, this produces a feature set comprising 180 elements for a given input image containing handwritten text.
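The feature count can be checked with a small sketch: 5 scales times 18 orientations give 90 filtered images, and two measures per image give 180 features. The energy and entropy definitions used below (sum of squared responses, and Shannon entropy of the normalised squared responses) are common choices assumed for illustration; the exact definitions of [38] may differ.

```python
import math

NUM_SCALES = 5
NUM_ORIENTATIONS = 18


def energy(image):
    """Sum of squared filter responses (assumed definition)."""
    return sum(v * v for row in image for v in row)


def entropy(image):
    """Shannon entropy of the normalised squared responses (assumed definition)."""
    total = energy(image)
    if total == 0:
        return 0.0
    probs = [v * v / total for row in image for v in row if v != 0]
    return -sum(p * math.log2(p) for p in probs)


def feature_vector(filtered_images):
    """Concatenate energy and entropy over all scale/orientation responses."""
    features = []
    for image in filtered_images:
        features.extend([energy(image), entropy(image)])
    return features


# Placeholder responses: one tiny 2x2 "filtered image" per scale/orientation pair.
responses = [[[1.0, 1.0], [1.0, 1.0]]] * (NUM_SCALES * NUM_ORIENTATIONS)
vector = feature_vector(responses)
```

The placeholder responses only exercise the bookkeeping; in practice each entry would be the inverse-transformed output of one MLG filter.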

Experimental Outcomes and Analysis
This section describes the experimental results achieved using the proposed FS method and related comparisons with other popularly used algorithms applied to ASC of handwritten Indic documents. All the experiments with the proposed method are implemented in the MATLAB 2016a environment and tested on a PC with an Intel Core-i3 (5th Gen.) CPU having 4 GB of RAM.
The classification accuracy, used to measure the performance of handwritten ASC, is calculated as follows:
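Assuming the conventional definition of classification accuracy (an assumption, since the expression itself is not shown here), it can be written as:

```latex
\text{Accuracy (\%)} =
  \frac{\text{number of correctly classified script samples}}
       {\text{total number of script samples tested}} \times 100
```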

Preparation of Handwritten Indic Script Database
We have prepared our own datasets of handwritten Indic script documents in the laboratory due to the unavailability of such datasets in the public domain. The document pages are collected from different writers, who contributed their handwriting on A4-sized white pages. We then scan the input script images at 300 dpi and save them in grey-scale form. A Gaussian filter is then used to remove noise from the collected images. Firstly, handwritten text blocks written in 12 Indic scripts, of pre-defined size 256x256 pixels, are automatically cropped from the document pages. The extracted text blocks may contain lines of varying size and thickness, and varying white space between characters, lines or words. Instead of performing any homogenizing technique to compensate for this, we manually ensure that at least 50% of each input image region contains text. Fig. 4 shows samples of handwritten text blocks in 12 Indian scripts. In a similar manner, the words and then the text lines are also extracted from the input document pages, employing the techniques described in [39] and [40] respectively.
Finally, a set of 7200 text blocks (with exactly 600 text blocks per script), 3600 text lines (with exactly 300 text lines per script) and 12000 text words (with exactly 1000 text words per script) written in 12 Indic scripts are prepared to evaluate the proposed HSGFS methodology for ASC from handwritten text obtained at block, text line and word level respectively.

Parameter Setting of HSGFS methodology
The performances of the DHT algorithm, HOG and MLG Transform feature sets are observed by altering the population size and the number of iterations, the two most important parameters of our FS method, HSGFS. This is done to select the optimal values for these two parameters in the present context. Throughout the process of searching for the optimal parameter values, MLP has been used as the classifier. After initial experimentation, it is observed that setting the population size to 20 and the number of iterations to 15 gives good results. We then vary one of these values while keeping the other constant. Firstly, the number of iterations is kept constant (in the present case, 15) while the population size is varied for all three datasets. The outcome of this experiment is graphically illustrated in Fig. 5. Then, the population size is kept constant (in our case, 20) while the number of iterations is varied, which is depicted in Fig. 6.
It can be observed from Fig. 5 (a-c) that the optimal values of population size are 15, 20 and 20 for the DHT algorithm, HOG and MLG Transform feature sets respectively. From Fig. 6 (a-c) it is also clear that the optimal number of iterations is 15 for all three feature sets.
The results obtained from the variation of parameters indicate that our initial estimates of 15 iterations and a population size of 20 are optimal, with minor variations. Hence, we have used these values for population size and number of iterations for the rest of the experiments.
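The one-parameter-at-a-time tuning described above can be sketched as follows. The function `toy_accuracy` is a purely hypothetical stand-in for running HSGFS with an MLP classifier; its values are not experimental results.

```python
def sweep(evaluate, base, name, candidates):
    """Vary one parameter over candidates while holding the others at base."""
    results = {}
    for value in candidates:
        settings = dict(base, **{name: value})
        results[value] = evaluate(**settings)
    best = max(results, key=results.get)
    return best, results


def toy_accuracy(population_size, iterations):
    """Hypothetical stand-in for 'run HSGFS, then return the MLP accuracy'."""
    return 0.90 - 0.001 * abs(population_size - 20) - 0.002 * abs(iterations - 15)


base = {"population_size": 20, "iterations": 15}
best_pop, pop_results = sweep(toy_accuracy, base, "population_size", [10, 15, 20, 25, 30])
best_iter, iter_results = sweep(toy_accuracy, base, "iterations", [5, 10, 15, 20, 25])
```

Sweeping one parameter at a time keeps the number of expensive wrapper runs linear in the number of candidate values, at the cost of ignoring interactions between the two parameters.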

ASC performance results without using HSGFS methodology
Firstly, the three previously mentioned feature sets have been used separately for classification of the Indic scripts on all three types of datasets, at three different levels, without any FS (i.e. all available features are used for identification). Here, training and testing of handwritten script samples are done using three popular classifiers, namely MLP [41], k-NN [42] and SVM [43]. The script classification accuracies attained by these three classifiers without using any FS are reported in Table 1.

ASC performance results using HSGFS methodology

MLP Classifier
MLP classifier, described in [41], is used to measure the performance of the optimal feature sets produced by the proposed HSGFS methodology. The values of its two parameters are experimentally set to 0.6 and 0.5 respectively, and the classifier is run for 1000 epochs. The MLP follows a three-layer input-hidden-output architecture. The numbers of neurons in the output and input layers are taken as the number of output classes and the number of features considered, respectively. The number of neurons in the hidden layer is varied experimentally for the three feature sets in order to achieve the optimal results. Appendix A contains the graphical representation of the results obtained by altering the number of hidden neurons of the MLP classifier. Then, we perform ASC on the previously mentioned datasets after performing FS using our proposed method HSGFS.
are achieved for k=3.

SVM Classifier
Finally, SVM classifier with polynomial kernel is employed to measure the FS ability of the proposed HSGFS methodology on the three script datasets.

Summarization of performance results of HSGFS methodology
It is already clear from the above trial outcomes that the best feature descriptor is MLG Transform, as it shows the highest classification accuracies on all three script datasets. It can be seen from Table 3 that, after applying the proposed HSGFS method, the sizes of the optimal feature sets are found to be 132, 130 and 137 for the block, text line and word-level datasets respectively. This means that the proposed HSGFS methodology selects only about 73%, 72% and 76% of the original feature vectors for the three datasets respectively. Moreover, increments of about 5%, 2% and 5% over the original classification accuracies are also noticed for the three datasets.
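The retained percentages follow directly from the subset sizes and the 180-element MLG feature vector, as the short check below shows (the variable names are illustrative).

```python
ORIGINAL_SIZE = 180  # size of the MLG Transform feature vector

selected = {"block": 132, "text line": 130, "word": 137}

# Percentage of the original features kept at each level of ASC.
retained = {level: round(100 * n / ORIGINAL_SIZE, 1) for level, n in selected.items()}
```

This gives 73.3%, 72.2% and 76.1%, in line with the rounded figures of 73%, 72% and 76% quoted above.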

Performance comparison of HSGFS methodology with other well-known optimization algorithms
It can be seen that FS can increase the accuracy of the ASC system. However, there exist several previously proposed optimization algorithms which can be used to perform FS. So, in order to evaluate the efficiency of the present FS method, i.e. HSGFS, we have compared it with some popularly used FS methods, such as GA, PSO, GSA, SA and HS, when applied to the ASC problem. We have used three classifiers for the evaluation of our proposed method, and MLP was found to provide the best classification accuracy. Hence, we have compared the FS performance of HSGFS with the other previously mentioned algorithms in terms of the classification accuracies obtained using the MLP classifier. From Table 4, it can be seen that HSGFS outperforms the other optimization algorithms at all three levels of ASC in terms of classification accuracy. For the comparison with the HS based FS method of [13], the same MLG Transform feature vector is implemented on the current datasets and classified using the MLP classifier. It can be noticed from the comparative analysis given in Table 5 that, using fewer features, the present HSGFS methodology performs better than the HS based FS method for all three levels of ASC.
It can be seen from the experimental results that the proposed HSGFS method attains increments of around 5%, 2% and 5% in accuracy compared with applying no FS method, while choosing only around 73%, 72% and 76% of the original feature vectors for the block, text-line and word-level datasets respectively. Our proposed HSGFS method is also found to perform better than other renowned optimization algorithms such as GSA, PSO, GA, SA and HS. As future work, we suggest increasing the number of datasets on which HSGFS is applied. Scripts other than Indic scripts can be used to ascertain the applicability of the proposed technique. Moreover, as the proposed method is applicable to any pattern recognition problem, the proposed HSGFS model can be employed to solve other PR problems, such as facial emotion recognition, word identification and character recognition, to test its efficiency.

Fig. 1: Schematic representation of the multi-script document storage system.

Fig. 2: Schematic representation of the handwritten ASC system using FS procedure.
, Guha et al. proposed another updated version of GA where they replaced the mutation operation of GA with Great Deluge Algorithm (DGA) to improve the local searching capabilities of conventional GA. Like GA, Ant Colony Optimization (ACO) has gained massive popularity over the years as an efficient FS algorithm. Ghosh et al. introduced a wrapper-filter embedded version of ACO in

Fig. 4: Samples of text blocks taken from our database written in 12 official scripts of India.

Fig. 5: Performance comparison illustrating the variation of classification accuracy with population size, keeping the number of iterations fixed, on block, text line and word-level datasets for: (a) DHT algorithm, (b) HOG and (c) MLG Transform feature sets.
an important research topic in the area of ASC, researchers have not yet addressed it in the domain of handwritten Indic script identification. In this work, a new hybridized version of BPSO and BGSA, called HSGFS, has been proposed for implementing FS. The FS capability of the proposed method has been tested on three different feature sets, namely MLG, HOG and DHT, at three different levels of script classification.
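As a rough illustration of how a candidate feature subset might be scored when FS is tested with a classifier, the sketch below evaluates a binary feature mask using a small k-NN classifier. It is a minimal sketch under stated assumptions: the function name `knn_accuracy`, the use of plain held-out accuracy as the score, and the toy data are all hypothetical and are not taken from the HSGFS implementation, whose fitness function may differ.

```python
import math


def knn_accuracy(train_X, train_y, test_X, test_y, mask, k=1):
    """Accuracy of a k-NN classifier restricted to features where mask[d] == 1.

    Hypothetical helper for illustration only; not the paper's fitness function.
    """
    selected = [d for d, bit in enumerate(mask) if bit == 1]
    if not selected:
        return 0.0  # an empty subset cannot classify anything

    def project(row):
        # Keep only the selected feature columns.
        return [row[d] for d in selected]

    train_P = [project(row) for row in train_X]
    correct = 0
    for row, label in zip(test_X, test_y):
        q = project(row)
        # Indices of the k training samples closest to q (Euclidean distance).
        neighbours = sorted(
            range(len(train_P)), key=lambda i: math.dist(q, train_P[i])
        )[:k]
        # Majority vote among the neighbours.
        votes = {}
        for i in neighbours:
            votes[train_y[i]] = votes.get(train_y[i], 0) + 1
        predicted = max(votes, key=votes.get)
        if predicted == label:
            correct += 1
    return correct / len(test_X)


# Toy data (assumed): feature 0 is informative, feature 1 is misleading noise.
train_X = [[0.0, 5.0], [1.0, -5.0]]
train_y = ["A", "B"]
test_X = [[0.1, -5.0], [0.9, 5.0]]
test_y = ["A", "B"]

score_subset = knn_accuracy(train_X, train_y, test_X, test_y, mask=[1, 0])
score_full = knn_accuracy(train_X, train_y, test_X, test_y, mask=[1, 1])
```

In this toy setting the mask that drops the noisy feature scores higher than the full feature vector, which is the behaviour a wrapper-style FS search tries to exploit.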
$M_i(t)$ is the mass related to agent $i$ at time $t$, $M_j$ is the mass corresponding to agent $j$, $G(t)$ is a gravitational constant at time instant $t$, $\epsilon$ is a very small positive value, and $R_{ij}(t)$ is the Euclidean distance between two agents $i$ and $j$ at time instant $t$. $F_{ij}^{d}$ is the force between the two particles for the $d$-th feature. The total force on a particle is calculated by Equation 7 for all the $N$ particles. $rand_j$ is a randomly generated number in the interval [0, 1].
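The quantities defined above can be tied together in a short sketch of the force computation. The equations themselves are not reproduced in this passage, so the sketch assumes the standard GSA form, in which the pairwise force is $G(t)\,M_i M_j / (R_{ij} + \epsilon)$ times the coordinate difference and the total force is a randomly weighted sum over the other particles. The function name `total_forces` and its parameters are hypothetical, and the details may differ from the formulation used in HSGFS.

```python
import math
import random


def total_forces(positions, masses, G, eps=1e-12, rng=None):
    """Randomly weighted total force on every agent, per dimension.

    Assumes the standard GSA formulation; illustrative only.
    """
    rng = rng or random.Random()
    n = len(positions)
    dim = len(positions[0])
    forces = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # an agent exerts no force on itself
            # Euclidean distance between agents i and j.
            r_ij = math.dist(positions[i], positions[j])
            # Random weight in [0, 1] for the contribution of agent j.
            weight = rng.random()
            for d in range(dim):
                # Pairwise force along dimension (feature) d.
                f_ijd = (
                    G * masses[i] * masses[j] / (r_ij + eps)
                    * (positions[j][d] - positions[i][d])
                )
                forces[i][d] += weight * f_ijd
    return forces


# Usage with two agents one unit apart on the first axis (toy values).
forces = total_forces([[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0], G=1.0)
```

The small constant `eps` only guards against division by zero when two agents coincide; each agent is pulled towards the other, with a magnitude that depends on the random weights.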

Table 1: ASC accuracies of three different original feature sets (without FS) using MLP, k-NN and SVM classifiers on block, text-line, and word-level datasets (maximum accuracy achieved at each level is marked in bold).
Feature vector | Size of feature vector | Level of classification | Number of training samples | Number of testing samples | Classification accuracy without FS: MLP classifier
scheme, are noted in Table 1. Table 1 shows that, for the DHT algorithm, the k-NN classifier scores the highest classification accuracies of 91.62%, 92.16% and 87.25% on the block, text-line and word-level datasets, respectively. For the HOG feature descriptor, the SVM classifier records the highest classification accuracies of 93.16%, 94.55% and 89.6% on the block, text-line and word-level datasets, respectively. For the MLG transform, the highest classification accuracies of 93.15%, 95.8% and 91.34% are attained by the MLP classifier on the block, text-line and word-level datasets, respectively.

Table 2: ASC accuracies of three different feature sets after FS by HSGFS using MLP, k-NN and SVM classifiers on block, text line, and word-level datasets (maximum accuracy achieved at each level is marked in bold).
k-NN classifier on block, text line, and word-level datasets. The numbers of optimal features required to attain the best accuracies are 115, 118 and 120. For the HOG feature set, the maximum classification accuracies of 97.02%, 95.42% and 93.05% are achieved using the k-NN classifier on block, text line, and word-level datasets. The numbers of optimal features selected for this case are 144, 118 and 149. Similarly, for the MLG transform feature set, the highest classification accuracies of 97.93%, 96.63% and 95.83% are achieved using the k-NN classifier on block, text line, and word-level datasets. The numbers of optimal features selected for this case are 146, 133 and 127. It is clear that the MLG transform feature set again performs the best among all the three feature sets for the k-NN classifier and the optimal results

Table 2 illustrates that for the DHT algorithm, the highest classification accuracies of 94.95%, 94.75% and 93.97% are achieved using the SVM classifier on block, text line, and word-level datasets. The numbers of optimal features required to attain the best accuracies are 114, 117 and 116. For the HOG feature set, the highest classification accuracies of 97.12%, 96.83% and 93.97% are achieved using the SVM classifier on block, text line, and word-level datasets. The numbers of optimal features selected for this case are 142, 151 and 168. Similarly, for the MLG Transform feature set, the highest classification accuracies of 97.58%, 97.08% and 96.73% are achieved using the SVM classifier on block, text line, and word-level datasets. The numbers of optimal features selected for this case are 155, 134 and 137. Moreover, the MLG Transform feature set once again performs the best among all the three feature sets when the SVM classifier is applied.

Comparison with previously proposed FS techniques reported for ASC
Since there is hardly any work in the literature that uses FS for the handwritten ASC problem, in the present work we have compared the proposed HSGFS with one of the previously proposed works in which a Harmony Search (HS) based FS technique is reported [13].

Table 5: Performance comparison of the present HSGFS method with the HS based FS methodology proposed by Singh et al. [13].