Feature selection algorithm based on P systems

When the number of features in a dataset is much higher than the number of patterns, the high dimensionality strongly affects the learning algorithm, and the dimension disaster (the curse of dimensionality) becomes an important problem. Feature selection can effectively reduce the dimension of the dataset and improve the performance of the algorithm. In this paper, a feature selection algorithm based on P systems (P-FS) is proposed. It exploits the parallel ability of cell-like P systems and the search ability of evolutionary algorithms to select features and remove redundant information in the data. The proposed P-FS algorithm is tested on five UCI datasets and an edible oil dataset from a practical application, and it is compared with genetic algorithm feature selection (GAFS) on the same six datasets. The experimental results show that the P-FS algorithm performs well in classification accuracy, stability, and convergence. Thus, the P-FS algorithm is feasible for feature selection.


Introduction
Membrane computing is a branch of natural computing. Inspired by the structure of cells and the way they process chemicals, Gheorghe Pȃun first proposed the membrane computing model in 1998 in a technical report of the Turku Center for Computer Science, Finland. The first paper on membrane computing (Pȃun 2000) was published in 2000. In recent years, research on membrane computing (Krishna 2007) has developed rapidly and become one of the frontier research fields in theoretical computer science. Membrane computing mainly studies computational models abstracted from cells, tissues, or organs, known as P systems. P systems can be divided into three types, cell-like, tissue-like, and neural-like, and have been successfully applied in various fields.
Cell-like P systems are abstracted from the structure and function of the cell and are mainly composed of membranes, objects, and evolution rules. Gheorghe Pȃun introduced the formal definition and computability of cell-like P systems in detail. He proved that cell-like P systems have the same computing power as Turing machines and offer great parallelism, distribution, and non-determinism. In 2006, Nishida (Nishida 2006) applied membrane computing to the traveling salesman problem, showing that membrane computing can be used to tackle NP-hard problems. Since then, membrane computing has developed rapidly in the fields of evolutionary computing and engineering (Xiao et al. 2012; Xiao et al. 2014).
With the rapid development of machine learning and data processing technology, the problem of dimension disaster (Dong et al. 2020) has become more serious. This kind of problem can be addressed by dimension reduction, for which feature extraction and feature selection (FS) (Farahat et al. 2013) are two commonly used methods. Feature extraction maps the feature space to a smaller space. Feature selection reduces the number of features directly by using an evaluation criterion to select a feature subset that carries enough information. Feature selection has been applied in many engineering fields, including network anomaly detection (Emiro and Eduardo 2014), facial expression recognition (Mlakar and Fister 2017), face detection (Pan et al. 2013), and medical applications (Vivekanandan and Ch 2017). It is an NP-hard problem. The simplest approach is the exhaustive method, which evaluates every feature subset and thereby determines the optimal one. However, a dataset with n features has 2^n candidate subsets, so the exhaustive method is time-consuming and costly to evaluate. Using a metaheuristic algorithm (Welikala et al. 2015; Singh and Singh 2019) can avoid this increase in the computational complexity of feature selection. Metaheuristic algorithms such as particle swarm optimization (PSO) (Amoozegar and Minaei-Bidgoli 2018), ant colony optimization (Ghosh et al. 2019), artificial bee colony optimization (ABC) (Xue et al. 2018), and differential evolution (Dong et al. 2018) are easier to apply to feature selection problems when binary coding is used (Xue et al. 2019). Raman et al. (Raman et al. 2017) used a genetic algorithm to build an intrusion detection system. Lin et al. (Lin et al. 2014) designed a genetic algorithm feature selection method that can be used for image retrieval and classification. Das et al. (Das et al. 2017) combined bi-objective optimization with a genetic algorithm for feature selection.
Inspired by membrane computing, this paper proposes a new feature selection method, the P-FS algorithm, which uses the advantages of the genetic algorithm (GA) in feature selection and the parallel characteristics of cell-like P systems to find the optimal feature subset. The P-FS algorithm and the GAFS algorithm are both tested on five UCI (Dheeru and Karra 2017) datasets and an edible oil dataset from a practical application. The performance is verified in terms of classification accuracy, the number of selected features, stability, and convergence.

Cell-like P systems
A cell-like P system of degree $m$ can be defined as
$$\Pi = (V, O, H, \mu, \omega_1, \ldots, \omega_m, R_1, \ldots, R_m, i_0)$$
where: $V$ is a nonempty finite alphabet, and the elements of the alphabet are called objects; $O \subseteq V$ is the collection of output objects; $H$ is the set of membrane labels, $H = \{1, 2, \ldots, m\}$; $\mu$ is a membrane structure with $m$ membranes, and $m$ is called the degree of $\Pi$; $\omega_i \in V^{*}$ $(1 \le i \le m)$ represents the multiset of objects in region $i$ of the membrane structure $\mu$, and $V^{*}$ is the set of all multisets over $V$; $R_i$ $(1 \le i \le m)$ is the set of evolution rules in region $i$ of the membrane structure $\mu$; $i_0$ is the label of the output membrane of the membrane system, $i_0 \in H$.
When the cell-like P system starts running, the objects in each membrane evolve according to the evolution rules of that membrane. The rules are executed in parallel, and the system terminates when no applicable rules remain in the system.
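The execution model described above can be illustrated with a minimal sketch. It assumes a single region (no nested membranes and no communication rules), rewriting rules with non-empty left-hand sides, and hypothetical helper names `parallel_step` and `run`; it is meant only to show maximally parallel rule application and halting, not the full formalism.

```python
import random
from collections import Counter

def parallel_step(objects, rules, rng):
    """Apply one maximally parallel step inside a single region.

    Rules are (lhs, rhs) pairs of Counters with non-empty lhs. Rules keep
    firing until the objects that are left cannot satisfy any rule.
    """
    remaining = Counter(objects)   # objects not yet consumed in this step
    produced = Counter()           # products only become usable next step
    fired = False
    while True:
        applicable = [(lhs, rhs) for lhs, rhs in rules
                      if all(remaining[o] >= n for o, n in lhs.items())]
        if not applicable:
            break
        lhs, rhs = rng.choice(applicable)  # non-deterministic rule choice
        remaining.subtract(lhs)
        produced.update(rhs)
        fired = True
    # Counter addition keeps only strictly positive counts.
    return remaining + produced, fired

def run(objects, rules, seed=0, max_steps=1000):
    """Iterate steps until no rule is applicable (the system halts)."""
    rng = random.Random(seed)
    current, steps = Counter(objects), 0
    while steps < max_steps:
        nxt, fired = parallel_step(current, rules, rng)
        if not fired:
            break
        current, steps = nxt, steps + 1
    return current, steps

# Illustrative rules: a -> bb and b -> c, starting from the multiset {a, a}.
rules = [(Counter("a"), Counter("bb")), (Counter("b"), Counter("c"))]
final, steps = run(Counter("aa"), rules)
print(final, steps)
```

In this example every `a` is rewritten in the first step and every `b` in the second, after which no rule applies and the system halts.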

Genetic algorithm
GA is a metaheuristic algorithm that belongs to the category of evolutionary algorithms (EA). It uses biologically inspired operators, such as mutation, crossover, and selection, to generate high-quality solutions to optimization and search problems. GA for feature selection includes population initialization, evaluation of individual fitness, selection, crossover, mutation, and a termination check. The flow chart of the algorithm is shown in Fig. 1.
Population initialization: set the number of iterations and initialize to generate n chromosomes. A chromosome is represented by a binary string composed of 0 or 1. The length of the chromosome is the dimension of the dataset (the total number of features in the dataset). 0 means that the feature is not selected, and 1 means that the feature is selected.
Evaluation of individual fitness: fitness function is used to calculate the fitness value of the feature subset corresponding to each chromosome. In this paper, the Support Vector Machine (SVM) algorithm (Cortes and Vapnik 1995) is used to calculate the fitness value.
Selection: chromosomes with different fitness values have different probabilities of being selected. This paper uses the roulette method to select chromosomes.
Crossover: two chromosomes exchange genes at a certain position according to a certain probability to produce new chromosomes.
Mutation: a gene in a chromosome changes from 0 to 1 or from 1 to 0 according to a certain probability to produce a new individual.
Termination condition: when the current number of iterations equals the maximum number of iterations, the algorithm stops.
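The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the single-point crossover, the parameter values, and especially `toy_fitness` are hypothetical. In the paper the fitness comes from an SVM; here a toy function stands in so that the sketch needs only the standard library.

```python
import random

def init_population(n, n_features, rng):
    """n binary chromosomes, one gene per feature (1 = feature selected)."""
    return [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]

def selected_features(chromosome):
    """Indices of the features whose gene is 1."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:                      # degenerate case: pick uniformly
        return list(rng.choice(population))
    pick, cumulative = rng.uniform(0, total), 0.0
    for chrom, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return list(chrom)
    return list(population[-1])

def crossover(a, b, rate, rng):
    """Single-point crossover applied with probability `rate` (assumed variant)."""
    if len(a) > 1 and rng.random() < rate:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return list(a), list(b)

def mutate(chrom, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chrom]

def next_generation(population, fitness_fn, cx_rate, mut_rate, rng):
    fitnesses = [fitness_fn(c) for c in population]
    children = []
    while len(children) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = crossover(p1, p2, cx_rate, rng)
        children += [mutate(c1, mut_rate, rng), mutate(c2, mut_rate, rng)]
    return children[:len(population)]

def toy_fitness(chrom):
    """Hypothetical stand-in for SVM accuracy: rewards features 0 and 2."""
    return 0.5 * (chrom[0] + chrom[2])

rng = random.Random(42)
population = init_population(n=10, n_features=6, rng=rng)
for _ in range(20):                     # termination: fixed iteration count
    population = next_generation(population, toy_fitness, 0.8, 0.05, rng)
best = max(population, key=toy_fitness)
print(selected_features(best))
```

Replacing `toy_fitness` with a function that trains a classifier on the selected columns and returns its validation accuracy would bring the sketch closer to the setting described in this section.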

Feature selection algorithm based on P systems
GA has a powerful search ability in feature selection, but it is computationally expensive and takes a long time to process high-dimensional data. In addition, the mutation rate is fixed, so the effectiveness of mutation in escaping a local optimum is unstable. To exploit the parallel processing ability of P systems, the P-FS algorithm is proposed. Combining the computing rules of cell-like P systems with GA to select features gives a strong global search ability, which helps the algorithm jump out of local optima.

Algorithm design
The designed P-FS algorithm adopts the membrane structure of cell-like P systems. Its structural form is defined as
$$\Pi = (V, O, H, \mu, \omega_1, \ldots, \omega_m, R_1, \ldots, R_m, i_0)$$
where:
(1) $V$ is a nonempty finite alphabet whose objects are the feature subsets corresponding to the chromosomes in the genetic algorithm;
(2) $O \subseteq V$ is the output alphabet, which holds the results of the algorithm;
(3) $H$ is the set of membrane labels, $H = \{1, 2, 3, 4\}$;
(4) $\mu$ is a membrane structure with $m$ membranes, $m = 4$;
(5) $\omega_i$ represents the multiset of objects in region $i$ of the membrane structure $\mu$, corresponding to the initial binary chromosome populations in membrane 3 and membrane 4;
(6) $R_i$ $(1 \le i \le m)$ is the set of evolution rules in each region of the membrane structure, including computing the fitness values of chromosomes, the selection, crossover, and mutation of GA, and the communication rules for transferring objects in a membrane to adjacent regions;
(7) $i_0$ is the label of the output membrane of the membrane system, $i_0 = 2$.

Evolution rules
In the cell-like P system, the objects in each region evolve according to the evolution rules of that region, and the rules are executed in parallel. The system terminates when there are no rules left to execute. In Fig. 1, no operation is performed in membrane 1; it only collects the chromosomes transferred from membrane 2. As the main membrane, membrane 2 receives the chromosome populations and the corresponding fitness values transmitted by membrane 3 and membrane 4. Then, it transfers the population containing the best fitness value to membrane 3 and membrane 4, and the other population to membrane 1. Membrane 3 and membrane 4 mainly search the space globally to find the region where the optimal solution is located.
The chromosomes in membrane 3 are updated by selection, crossover, and mutation. At each iteration, the chromosomes are sorted by fitness value from greatest to smallest. After sorting, the population and the optimal fitness value are transferred to membrane 2. The evolution rules of membrane 3 are as follows:

where: $k$ is the population size in each membrane, which is the same for membrane 3 and membrane 4; $C_{31}^{t'} C_{32}^{t'} \cdots C_{3k}^{t'}$ is the sequence of chromosomes in membrane 3 after selection, crossover, mutation, and sorting at iteration $t$; $f_{31}^{t'}, f_{32}^{t'}, \ldots, f_{3k}^{t'}$ are the fitness values of the chromosomes in membrane 3 at iteration $t$.
The chromosomes in membrane 4 are updated by selection, crossover, and mutation. At each iteration, the chromosomes are sorted by fitness value from greatest to smallest. After sorting, the population and the optimal fitness value are transferred to membrane 2. The evolution rules of membrane 4 are as follows: where: $C_{41}^{t'} C_{42}^{t'} \cdots C_{4k}^{t'}$ is the sequence of chromosomes in membrane 4 after selection, crossover, mutation, and sorting at iteration $t$; $f_{41}^{t'}, f_{42}^{t'}, \ldots, f_{4k}^{t'}$ are the fitness values of the chromosomes in membrane 4 at iteration $t$.
In membrane 2, the optimal fitness values transmitted from membrane 3 and membrane 4 are sorted from greatest to smallest. The population corresponding to the greater fitness value is transmitted to membrane 3 and membrane 4, and the population corresponding to the smaller fitness value is transmitted to membrane 1. The evolution rules of membrane 2 are as follows: where: $f_{1}^{t'} f_{2}^{t'}$ is the sequence of fitness values after sorting at iteration $t$; $C_{1}^{t'} C_{2}^{t'} \cdots C_{k}^{t'}$ is the population corresponding to $f_{1}^{t'}$ at iteration $t$; $C_{1'}^{t'} C_{2'}^{t'} \cdots C_{k'}^{t'}$ is the population corresponding to $f_{2}^{t'}$ at iteration $t$.
The structure of the algorithm is shown in Fig. 2, and the flow chart of the algorithm is shown in Fig. 3.
In Fig. 2, the initial objects $C_{31} C_{32} \cdots C_{3k}$ and $C_{41} C_{42} \cdots C_{4k}$ are the chromosome populations, whose chromosomes are binary strings composed of 0 and 1. The length of each chromosome equals the number of features in the dataset. $r_{21}$ is the sorting rule, and $r_{22}$ is the exchange rule between membrane 2 and membranes 1, 3, and 4. $r_{31}$ and $r_{41}$ are the selection, crossover, mutation, and fitness calculation rules. $r_{32}$ is the exchange rule between membrane 3 and membrane 2, and $r_{42}$ is the exchange rule between membrane 4 and membrane 2.
When the system starts to run, membrane 3 and membrane 4 contain initial objects, so their rules can be applied, and the rules in membrane 3 and membrane 4 are executed in parallel. Each population executes the selection, crossover, and mutation rules to obtain a new population. Then, according to the positions of the 1s in each chromosome, one feature subset is obtained per chromosome, so the number of feature subsets equals the population size. For each feature subset, SVM is used to calculate the fitness. The fitness values and the corresponding chromosomes are then sorted. Finally, membrane 3 and membrane 4 send their optimal fitness values and populations to membrane 2 at the same time, and the rules in membrane 2 are applied. Membrane 2 sorts the optimal fitness values transmitted from membrane 3 and membrane 4. It sends the larger fitness and its corresponding population to membrane 3 and membrane 4 at the same time and sends the smaller fitness and its corresponding population to membrane 1. At this point, the objects in membrane 3 and membrane 4 can continue to evolve according to the rules. When the system reaches the prescribed number of execution steps, the output objects of the system are the larger fitness and the corresponding population in membrane 2.
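The interaction between the membranes described above can be sketched as follows. This is a simplified, sequential illustration under stated assumptions, not the authors' implementation: membranes 3 and 4 are evolved one after the other rather than in parallel, `evolve`, `p_fs`, `toy_fitness`, and the `INFORMATIVE` mask are hypothetical names, and a toy fitness replaces the SVM so that only the standard library is needed.

```python
import random

def evolve(population, fitness_fn, rng, cx_rate=0.8, mut_rate=0.05):
    """One application of r31 / r41: select, cross over, mutate, evaluate, sort."""
    # Roulette weights; the small offset avoids an all-zero weight vector.
    weights = [fitness_fn(c) + 1e-9 for c in population]
    children = []
    while len(children) < len(population):
        p1, p2 = rng.choices(population, weights=weights, k=2)
        if rng.random() < cx_rate:
            point = rng.randint(1, len(p1) - 1)   # single-point crossover
            p1, p2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
        for parent in (p1, p2):
            # Bit-flip mutation builds a fresh list, so parents stay untouched.
            children.append([1 - b if rng.random() < mut_rate else b
                             for b in parent])
    children = children[:len(population)]
    children.sort(key=fitness_fn, reverse=True)   # greatest fitness first
    return children, fitness_fn(children[0])

def p_fs(fitness_fn, n_features, k=10, steps=30, seed=0):
    rng = random.Random(seed)

    def new_population():
        return [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(k)]

    membrane3, membrane4 = new_population(), new_population()
    membrane1 = []                                 # only collects discarded populations
    win_fit, win_pop = float("-inf"), []
    for _ in range(steps):
        # r31 and r41 (conceptually parallel; executed sequentially here).
        pop3, fit3 = evolve(membrane3, fitness_fn, rng)
        pop4, fit4 = evolve(membrane4, fitness_fn, rng)
        # r32 / r42 send both results to membrane 2; r21 sorts them.
        ranked = sorted([(fit3, pop3), (fit4, pop4)],
                        key=lambda pair: pair[0], reverse=True)
        (win_fit, win_pop), (_, lose_pop) = ranked
        # r22: the better population goes back to membranes 3 and 4,
        # the other one is sent to membrane 1.
        membrane3 = [list(c) for c in win_pop]
        membrane4 = [list(c) for c in win_pop]
        membrane1.append(lose_pop)
    return win_fit, win_pop                        # output of membrane 2

INFORMATIVE = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical "useful feature" mask

def toy_fitness(chrom):
    """Stand-in for SVM validation accuracy: share of bits matching the mask."""
    return sum(a == b for a, b in zip(chrom, INFORMATIVE)) / len(INFORMATIVE)

best_fit, best_pop = p_fs(toy_fitness, n_features=8)
print(best_fit, best_pop[0])
```

Because both inner membranes restart from the better population after every exchange, the two searches share information at each step while still exploring independently in between, which is the behaviour this section describes.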

Dataset
The purpose of feature selection is to reduce the dimension of a dataset and remove its redundant information. The dimension of the dataset equals the number of features in the dataset. The features of the dataset have actual physical meaning, and feature selection can reduce the dimension of the dataset without changing that physical meaning. Selecting a feature subset is an NP-hard problem, and the number of features in the dataset has a great impact on the computational complexity of the algorithm. Therefore, datasets with different dimensions were selected to observe the impact of data dimension on the model. We collected five datasets from different fields of the UCI Machine Learning Repository and used laser-induced fluorescence technology to collect fluorescence spectrum data of edible oil. To calculate the fitness, we randomly divided each dataset into a training set and a validation set at a ratio of 3:1. Table 1 shows the statistics of the six datasets. The number of features in Oil is 2048, indicating the fluorescence intensity at different wavelengths. The number of features in Gas is 128, which represents the reaction results between the sensor surface and chemical substances at different times. The number of features in Musk is 166, indicating the shape or conformation of different molecules. The number of features in Sonar is 60, which represents the sonar signals bounced off a metal cylinder from different positions and angles. The number of features in Ulc is 147, which represents the image information of land cover in different cities. The number of features in Wine is 13, indicating the chemical analysis results of different wines. Applying feature selection to these six datasets makes it possible to analyze the impact of different features on model performance and to remove redundant features.
In Table 1, the six datasets differ significantly in the number of features, categories, and samples, which makes it possible to verify the performance of the P-FS algorithm on different types of datasets.
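The random 3:1 division into training and validation sets can be sketched as follows. The function name `split_3_to_1` and the fixed seed are illustrative assumptions, and the sketch does not stratify by class, since the text does not say whether the original split was stratified.

```python
import random

def split_3_to_1(samples, seed=0):
    """Randomly split samples into training (3 parts) and validation (1 part)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)                 # random order, reproducible via seed
    cut = (3 * len(indices)) // 4        # first three quarters go to training
    train = [samples[i] for i in indices[:cut]]
    valid = [samples[i] for i in indices[cut:]]
    return train, valid

train, valid = split_3_to_1(list(range(100)))
print(len(train), len(valid))
```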

Evaluation metrics
To evaluate the efficiency of the proposed method, the confusion matrix was calculated for each of the six datasets. The P-FS algorithm and the GAFS algorithm are evaluated according to the following metrics.
Classification Accuracy: Classification Accuracy refers to the percentage of correctly classified samples compared to the total samples. Equation 6 is the classification accuracy of the method calculated according to the confusion matrix.
Feature Reduction Rate (FRR): FRR is the ratio of the number of features that are not selected to the number of original features, expressed as a percentage. Equation 7 is the formula for calculating the feature reduction rate.
F1_Score: For a certain classifier, F1_Score is a judgment indicator that combines Precision and Recall.
Receiver Operating Characteristic Curve (ROC): ROC is a curve drawn with (1 - specificity) as the abscissa and sensitivity as the ordinate. Equation 11 is the calculation formula of specificity, and Eq. 9 is the calculation formula of sensitivity.
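The metrics listed above can be written out from the entries of a confusion matrix. The sketch below uses the standard textbook definitions for the binary case (TP, FP, TN, FN) and assumes these match the paper's equations; the function names are illustrative, and how the multi-class datasets were averaged is not stated in the text.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def accuracy(tp, fp, tn, fn):
    """Share of correctly classified samples among all samples."""
    return _ratio(tp + tn, tp + fp + tn + fn)

def precision(tp, fp):
    return _ratio(tp, tp + fp)

def sensitivity(tp, fn):
    """Also called recall or true positive rate (ROC ordinate)."""
    return _ratio(tp, tp + fn)

def specificity(tn, fp):
    """True negative rate; the ROC abscissa is 1 - specificity."""
    return _ratio(tn, tn + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), sensitivity(tp, fn)
    return _ratio(2 * p * r, p + r)

def feature_reduction_rate(n_original, n_selected):
    """Share of the original features that were not selected."""
    return _ratio(n_original - n_selected, n_original)

# Illustrative counts only, not results from the paper.
tp, fp, tn, fn = 40, 10, 45, 5
print(accuracy(tp, fp, tn, fn), f1_score(tp, fp, fn))
print(feature_reduction_rate(n_original=2048, n_selected=512))
```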

Performance of the algorithm
The P-FS algorithm and the GAFS algorithm are tested on six datasets. The parameters are shown in Table 2. To compare the performance of the two algorithms fairly, the number of chromosomes, the number of iterations, and the mutation rate are set to be the same. Table 3 shows the experimental results of the P-FS algorithm and the GAFS algorithm. Each run uses 100 internal iterations, and each algorithm is run 100 times. Table 3 includes the optimal accuracy (OPA), worst accuracy (WOA), average accuracy (AVA), and standard deviation (STD) over the 100 runs. Figure 4 shows how the optimal accuracy of the P-FS algorithm and the GAFS algorithm changes over the iterations. From Table 3, it can be concluded that the optimal accuracy of the P-FS algorithm and the GAFS algorithm is the same on the Gas, Musk, Sonar, and Wine datasets, and that the P-FS algorithm is higher on the Oil and Ulc datasets. The average accuracy of the P-FS algorithm is higher than that of the GAFS algorithm on all six datasets, indicating that the performance of the P-FS algorithm is better. The worst accuracy of the P-FS algorithm is higher than that of the GAFS algorithm on all six datasets, indicating that the P-FS algorithm raises the lower bound of the accuracy. The standard deviation of the P-FS algorithm is lower than that of the GAFS algorithm on all six datasets, indicating that the P-FS algorithm is more stable than the GAFS algorithm. Figure 4 shows that both the P-FS algorithm and the GAFS algorithm converge, and that the P-FS algorithm converges faster than the GAFS algorithm. Except on the Musk dataset, the P-FS algorithm finds the optimal value first on the remaining five datasets, suggesting that the generalization ability of the P-FS algorithm is stronger than that of the GAFS algorithm. Table 4 shows the feature reduction rate (FRR) of the P-FS algorithm and the GAFS algorithm at the optimal accuracy. It includes the total number of features selected and the feature reduction rate achieved by each algorithm.
Table 4 shows that both the P-FS algorithm and the GAFS algorithm can effectively reduce the number of original features. Except for the Gas dataset, the FRR of the P-FS algorithm is higher than that of the GAFS algorithm, which shows that the P-FS algorithm has a stronger ability to remove redundant information than the GAFS algorithm. Table 5 shows the F1_Score of the P-FS algorithm and the GAFS algorithm. The value range of F1_Score is [0,1]: an F1_Score of 0 indicates the worst performance, and an F1_Score of 1 indicates the best performance.
In Table 5, the F1_Score of the P-FS algorithm and the GAFS algorithm is equal on Musk, Sonar, and Wine. The F1_Score of the P-FS algorithm is higher on the remaining three datasets, which indicates that the performance of the P-FS algorithm is better than that of the GAFS algorithm. Figure 5 shows the ROC of the P-FS algorithm and the GAFS algorithm. AUC in the figure is the area under the ROC curve; for a classifier that is no worse than random guessing, the value range of AUC is [0.5,1]. The higher the AUC value, the better the performance of the algorithm. In Fig. 5, the AUC of the P-FS algorithm and the GAFS algorithm is equal on Musk, Sonar, and Wine, and the AUC of the P-FS algorithm is higher on the remaining three datasets. In addition, the AUC of the P-FS algorithm is greater than 0.9 on all six datasets. These results indicate that the classification performance of the P-FS algorithm is at least as good as that of the GAFS algorithm.
P-FS is parallel: it runs two GAFS populations at the same time and lets them exchange information. From the above experimental results, it can be seen that the average accuracy of P-FS is higher than that of GAFS, indicating that P-FS has a stronger search ability. At the same time, the standard deviation of P-FS is lower than that of GAFS, indicating that P-FS has better stability. P-FS is also at least as good as GAFS on the other performance indexes, which shows that the parallelism of P-FS improves the performance of the algorithm.
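The four summary statistics reported in Table 3 can be computed from the accuracies of the repeated runs as sketched below. The function name `summarize_runs` and the sample values are illustrative, and the use of the population standard deviation (`statistics.pstdev`) is an assumption: the text does not state whether the population or the sample form was used.

```python
import statistics

def summarize_runs(accuracies):
    """Summary statistics over the accuracies of repeated independent runs."""
    return {
        "OPA": max(accuracies),                 # optimal (best) accuracy
        "WOA": min(accuracies),                 # worst accuracy
        "AVA": statistics.mean(accuracies),     # average accuracy
        "STD": statistics.pstdev(accuracies),   # population standard deviation
    }

# Illustrative values only, not results from the paper.
summary = summarize_runs([0.90, 0.92, 0.94])
print(summary)
```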
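The area under an ROC curve can be approximated from a list of (1 - specificity, sensitivity) points with the trapezoidal rule. The sketch below is a generic illustration with a hypothetical function name and made-up points; it is not necessarily how the AUC values in Fig. 5 were obtained.

```python
def auc_trapezoid(points):
    """Area under a curve given as (x, y) points, using the trapezoidal rule.

    For an ROC curve, x is 1 - specificity and y is sensitivity, and the
    points should include the end points (0, 0) and (1, 1).
    """
    pts = sorted(points)                       # order by x, then by y
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0    # one trapezoid per segment
    return area

print(auc_trapezoid([(0.0, 0.0), (1.0, 1.0)]))              # chance diagonal
print(auc_trapezoid([(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]))  # better than chance
```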

Discussion
In this section, we mainly discuss the advantages and applicability of the P-FS algorithm. We have shown the advantages of our P-FS algorithm in feature selection. From the experiments, on the one hand, we can see that the P-FS algorithm uses the parallel processing ability of cell-like P systems to expand its search ability in the feature space, and at the same time uses the communication between membranes to find the optimal region faster. On the other hand, the mutation factor can help the algorithm jump out of local optima and improve its ability to search the feature space. Besides these advantages, our P-FS algorithm also has some limitations. Firstly, the initialization of the population has a great influence on the search results. Secondly, the algorithm requires a lot of computing resources and takes a long time. Finally, the algorithm uses only the fitness value as the evaluation standard, so it tends to select a relatively large number of features, although fewer selected features would be preferable.
In future work, we plan to optimize our P-FS algorithm in the following aspects. First, the initialization method: in the experiments, we found that the initial subset has a great influence on the final result of the algorithm, so we want to initialize the population with a filter method to improve the stability of the algorithm. Secondly, the algorithm has only been tested on six datasets and only compared with the GAFS algorithm, so the evaluation is relatively limited. In the future, we will use more public datasets and fitness functions to verify the feasibility of the algorithm as comprehensively as possible, and we will compare its performance with other feature selection algorithms to study the advantages of the proposed algorithm. Finally, the kernel of the proposed method is GA. We hope to design a feature selection algorithm that simulates the membrane structure itself, which would provide a new idea for the application research of membrane computing.