Determining Chemical Reaction Systems in Plasma-Assisted Conversion of Methane Using Genetic Algorithms

Even for processes with only a few gas species involved the detailed description of plasma-assisted conversion processes in gas mixtures requires a large amount of processes to be taken into account and a large number of neutral and charged particles must be considered. In addition, setting up the corresponding reaction kinetics model needs the knowledge of the rate coefficients and their temperature dependence for all possible reactions between those species. Reduced reaction networks offer a simplified and pragmatic way to obtain an overall reaction kinetics model, already useful for the analysis of experimental data even if not all details of chemistry can be covered. In this paper we present a derivation of a data driven reduced model for plasma-assisted conversion of methane in an helium environment. By consideration of a small number of elementary reactions, a simple model is set up. Experimental data are analyzed by a genetic algorithm that provides best-fit approximations for the open parameters of the model. In a further step non-relevant parameters of the model are identified and a further model reduction is achieved. The data driven analysis of methane conversion serves as an illustrative example of the proposed method. The parameters and reaction channels found are compared with known results from the literature. The method is described in detail. The main goal of this work is to present the potential of this data driven method for a simplified and pragmatic modeling in the increasingly important field of plasma-assisted catalytic processes.


Introduction
Reaction kinetic models allow far-reaching predictions of complex chemical processes. The description on the basis of reaction kinetics has a tradition of decades and an enormous amount of literature exists on this topic and special applications. A starting point for the reader to get more information on this might be Refs. [1,2] and the references therein. Still, the determination of rate coefficients is in the center of attention when creating reaction kinetic models. Although several quantum mechanical approaches allow for a number of conclusions, it is not always possible to build a complete reaction kinetic model completely on the basis of ab initio theories. Furthermore, extensive statistical analysis of experimental data is required to support or supplement theories. This is mostly complicated by large numbers of species and reactions involved. The present work discusses an approach for data driven analysis of experimental data of plasma-assisted processes, which allows to derive reduced models in a simple way. When selecting a data evaluation method it is essential to know whether a detailed knowledge of the species involved is available or whether only a few of the reactants and products involved can be observed. Furthermore, it makes a decisive difference whether a time series of concentrations of individual species is available or only stationary states can be measured. In order to cope with these difficulties, we focus on the application of genetic algorithms to analyze the experimental data. The flexibility and robustness of the method is reflected by an enormous number of publications from various scientific disciplines on the application and concrete design of genetic algorithms. Useful references on the basic ideas and practical implementation are given by Refs. [3][4][5][6][7]. Examples of applications in the field of reaction kinetics can be found in Refs. [7][8][9][10][11][12][13][14][15][16]. Especially Lapene et al. [16] give a very good overview of applications and methods in chemical physics. These references are given to name just a few and without claiming to be exhaustive. The particular details of the method depend on the problem at hand. In the present data driven analysis we face the problem of being able to track only a small number of species in the plasma-assisted conversion of methane, but with a rather good time resolution. In addition, data are available from different experiments in which individual reaction parameters were varied. The goal is, on the one hand, to find a suitable set of only a few reactions to describe the conversion process, and on the other hand, to find the appropriate rate coefficients. Both can be achieved by the genetic algorithm that is presented. In Sect. 2 the concrete experiment and the data available are described. Then, in Sect. 3 the details of the method are presented, i.e. how the reaction kinetic model is set up and how the genetic algorithm is constructed. Sect. 4 contains numerical results and Sect. 5 a discussion of the possible conclusions for further steps in model refinement. In Sect. 6 the main results are summarized.

Experiments on Plasma-based Treatment of a Methane Containing Gas Stream
Recently, experiments were conducted to study a pretreating of hydrocarbon exhaust gas using a plasma process [17]. The intent was to study a process in which CH 4 is oxidized into CO and CO 2 while simultaneously consuming existing O 2 , therefore providing a kind of gas cleaning. For this purpose a radiofrequency (RF) atmospheric pressure plasma was generated in a plug flow reactor and various gas mixtures of O 2 , CH 4 and He were fed into it. Helium was the dominant species, so that O 2 and CH 4 were only present in high dilution. Infrared spectroscopy was used to analyse the plasma conversion. This high dilution has been chosen to keep the number of relevant species small, as secondary reactions/polymerizations can be expected to be unimportant. By this means it was possible to monitor CH 4 as well as the reaction products CO , CO 2 and H 2 O . Details on the experimental setup used and discussion of the experimental data can be found in [17] and references therein. In this paper, however, the aspect of data analysis will be in the foreground and the measured data 1 3 will be taken as given and not further questioned. The special characteristics of the available data can be described as follows: • Complex plasma-assisted processes involving several species can be adjusted by welldefined initial conditions and power input. • Well equipped experimental arrangements allow the measurement of time courses of individual species. • Despite the wealth of information obtained by varying mixing ratios and plasma power, many species and parameter ranges are not experimentally accessible.
From this general point of view, the evaluation of existing experimental data resembles a very common situation, which is why the method discussed here may also be of importance for a number of other experiments, e. g. in plasma-assisted catalysis, where the surface species on the catalyst are usually not easy to access. The concrete data to be analyzed in this work are partial time traces of CH 4 , CO , CO 2 and H 2 O which have been compiled in various RF discharges where the plasma power and the gas mixture have been varied. These data are supplemented by measurements of concentrations at only a single specific point in time, but obtained for several values of plasma power and initial gas composition.
In total a number of 269 data points are available from the campaign reported in [17] containing scattered data of the concentrations of CH 4 , CO , CO 2 and H 2 O at different times, plasma power and feed gas admixture.

Selection of the Reaction Network
First, a selection of the reactants and products to be considered is made. From the combinatorial possibilities of all unimolecular and bimolecular reactions, those processes are selected that are stoichiometrically possible. In our example, a reaction network is built up for a number of N s = 9 species X s , s = 1, … , N s , namely CH 4 , O 2 , O , CO 2 , CO , H 2 O , C , H 2 and e − . According to the stoichiometric selection rule a number of N r = 42 reactions are possible which are listed in Table 1. The corresponding rate laws for the species densities [X s ] can be written in the compact form The stoichiometric matrix is given by the components s,r and the rate coefficient for a particular reaction is denoted by k r . Only a part of the densities [X s ] are experimentally accessible and at the beginning of the evaluation we do not make any assumptions about the rate coefficients k r . Inspection of the reactions of Table 1 shows that the species densities obey the conservation laws

3
The conservation of electrons is due to the fact that it is assumed that they participate in electron impact dissociation only. It is to be noted that the combustion of methane is a well-known system [18][19][20][21] with typically very many species and reactions. Reaction pathways have also been studied extensively for plasma-assisted conversion including a large number of reactions and species [22][23][24][25][26]. In our system, however, the dilution is very high and the residence time is short. We therefore, restrict ourselves to only primary reactions. Reactions involving charged species such as ion molecule reactions or electron ion recombination are neglected due to the very low charge carrier density of 10 11 cm −3 in comparison to the neutral reactive species densities of the order of 10 17 cm −3 . This model does not pretend to be complete, but it exhibits important characteristics of the observed processes. In the end, it represents a first and certainly arbitrary step towards a pragmatic model that is to be found here.

Genetic Algorithm for Extraction of Rate Coefficients
The goal of our numerical calculations is to find rate coefficients k r that bring the results of the integrated rate equations of Eq. 1 in the best possible agreement with the experimental data. This represents a minimization problem where the difference between calculated densities and the experimental results should be minimized for all investigated time points, plasma power and starting conditions simultaneously. To solve this problem, we use a genetic algorithm which considers a population of N chromosomes which carry a number of M genes. The chromosomes are represented by vectors i , i = 1, … , N with components y i,j , j = 1, … , M . Each chromosome i is a candidate solution for the minimization problem considered and the genes represent model parameters to be optimized. In our case where rate coefficients are the model parameters, the components y i,j are given by the relations This choice ensures the positivity of the rate coefficients k i,j for any y i,j . The number of model parameters equals the number of reaction rate coefficients, M = N r , such that k i,r is the ith candidate solution for the reaction rate k r in the rate laws of Eq. 1. The fitness F i of the ith chromosome is defined by where a is an appropriate normalization constant and f i is a positive definite functional representing the deviance of our reaction kinetic model Here, ĉ q denotes a measured density of some observable species at a particular time and for particular experimental conditions. The density c iq is the corresponding numerical result obtained by integration of the rate laws of Eq. 1 using the ith candidate solution k i,r , r = 1, … , N r for the rate coefficients. In total a number of N c constraints (experimental data points) is used to define the fitness functional. The factor w q is a weight that makes it possible to assign greater importance to certain experimental points. For example, the choice of a weight w q = m , where m is an integer, would be equivalent to the m-fold repetition of an experiment, providing always the same result. An accumulated probability function P i is introduced by This means that if chromosome-indices i are picked randomly via the rule P i−1 < r ≤ P i , with uniform random numbers 0 ≤ r ≤ 1 , the resulting distribution of indices reflects their fitness, i. e. more chromosomes with high fitness are selected. The practical computation then begins with a first guess for the rate coefficients k r and the evolution to the next generation of chromosomes, i. e. a refinement of solutions, consists of the following steps indicated by roman numerals I to V. The populations i , i = 1, … , N resulting from the modifications of a particular step are also labeled by (I), ..., (IV).
Step: I Hierarchy The chromosomes i are sorted with respect to their fitness values F i and a resorted vector F (I) i is constructed following the ordering: This re-ordering defines the intermediate population (I) i .
Step: II Selection In the selection step the lower half of the population (the solutions with small fitness) is removed and the upper half is cloned. For the resulting population of chromosomes (II) i an accumulated probability function P (II) i is computed according to Eq. 9 for the fitness distribution F (II) i . The selection can be expressed formally by writing Step: III Crossover First, the two best fit chromosomes are copied into the next generation of offsprings.
To obtain further N − 2 offsprings, pairs are picked out by taking into account the fitness probability. However, the first partner in the pairing process is always the chromosome with the best fitness (II) 1 . The second partner (II) n is chosen randomly according to the probability P (II) n , but inbreeding, i. e. n = 1 is avoided. Then a crossover for a particular pair (II) 1 and (II) n takes place with a probability p c . The gene index m for crossover is chosen randomly in the range 1, … , M and offspings (III) 2j−1 and (III) 2j , j = 2, … , N∕2 , are obtained via the interpolating crossover rule [7,16] The interpolation parameter is taken as r = 1 for l ≠ m , but for l = m it is sampled from a uniform distribution on the interval [0,1]. Note, that the choice r = 0 for l = m would result in a simple exchange crossover.
Step: IV Mutation For each chromosome (III) i , i = 2, … , N (again i = 1 is excluded to keep the best fit chromosome), a mutation takes place with a probability p m in a single gene. The particular gene index m for mutation is chosen randomly in the range 1,...,M and the mutation is realized by an incremental change of the component y (III) i,m [6,16] The values of H m and L m denote prescribed upper and lower bounds for the mth model parameter. A uniform random number −1 ≤ r ≤ 1 and a prescribed increment 0 < Δ ≤ 1 are used to compute the change in the gene.
Step: V Update The resulting population (IV) i is the new generation. The assignment (IV) i ⟶ i completes one evolutionary step and the procedure begins anew until a certain convergence criterium is fulfilled.

Parallelization Issues
It is in the nature of genetic algorithms that an increase in computational effort is often accompanied by an improvement in results or an acceleration of the computation. A large population allows a greater variance of possible solutions and this can lead to a faster and more extensive search in the solution space. Advantageously, parallel computer architectures can be used very well for this purpose, since relatively simple parallelization strategies are possible. One method which is easy to implement is the island model [27], where the algorithm sketched in Sect. 3.2 is applied not only to a single population of N individuals, but to a group of populations that are considered to live on different islands. In practice the islands are different hardware processors. In our computations we consider a number of N i populations, i. e. a number of N i processors, each group consisting of N individuals. For N g generations the islands are isolated and the algorithms are used in an identical way on each island. The only difference is in the sampling of random numbers. Each island generates its own chain of random numbers which differs from all other islands. This gives diverging results for the best fit search results on the islands. After N g evolutionary steps all individuals from all islands are evaluated together and a global ranking is established. Then the two best individuals of the global ranking replace the two best individuals of each island population. After that the computation proceeds, again with different random numbers on each island. This minimizes the communication between the processors and introduces a considerable extension of variants in the global population. This implementation is easy and has been realized by MPI routines [28].

Inclusion of Plasma Effects
The experimentally observed plasma effects are taken into account by assumed electron impact dissociation processes. Therefore, the electrons are an additional species in the model and a fundamental part of the theoretical analysis. However, the electron density is hard to determine experimentally. To find a reasonable description of the impact of electrons a linear relationship between the discharge power P and the electron density [e − ] is used. This means, that the number of electrons follows a simple discharge characteristics and no saturation effects occur, where the energy of electrons might be fed into other reaction channels for higher electron energies. Such modifications would be possible by introduction of additional model parameters to describe a more complex relation between n e and P, but this will be analysed in future studies.

Numerical Setup for a Reference Case
As mentioned in Sect. 2 and Sect. 3.1, our numerical study considers a number of 269 experimental data points, representing densities of CH 4 , CO , CO 2 and H 2 O at different times, plasma power and feed gas admixture. The numerical regression model is based on an assumed 1 3 network with N s = 9 species and N r = 42 reactions. For the genetic algorithms a number of N = 20 chromosomes, each with M = 42 genes defines the population of a single island. The computations are performed with N i = 8 processors until no significant change in the globally best fitness is detected. The number of 20 chromosomes used in the computations is just a compromise between acceptable computational effort (low N) and quality of parameter space exploration (high N). After several numerical tests it was found that N > 20 does not change the results anymore, i. e. the best fit quality could not be increased. During the computation the islands communicate for the global ranking every N g = 100 iterations (generations). The probabilities for crossover and mutation are chosen as p c = 0.6 and p m = 0.9 . The upper and lower bounds for the model parameters y i,j are chosen as L j = −15 and H j = 5 for all i = 1, … , N and j = 1, … , M . Therefore, the rate coefficients are kept in a range 10 −15 ≤ k r ≤ 10 5 . Here, dimensionless coefficients k r result from a scaling of densities and time in the calculations. The species densities are scaled by a reference density n 0 = 6.0 ⋅ 10 16 cm −3 , which is equal to the nominal value of the density [CH 4 ] at the inlet, and the time coordinate is scaled by the residence time t 0 = 6.92 ⋅ 10 −2 s , which is of the order of the typical residence time of the plasma in the experimental device, where v = 0.3 m/s is the flow speed and the plasma volume has a length of about 26 mm (see Ref. [17]). This gives values for the scaled densities and times of the order 1. As a starting guess for the rate coefficients just a small number k r = 10 −5 has been used for all reactions r = 1, … , N r . The initial condition for the integration of the rate laws of Eq. 1, is prescribed by the amount of CH 4 and O 2 . All other densities are initially set to zero. The numerical integration of the rate laws is done with the Fortran-Routine DVODE from the package ODEPACK [29]. Prescribing all rate coefficients, the initial conditions and the time point of observation allows to obtain the values for the densities c iq in the cost functional Eq. 8 and to evaluate the fitness of each chromosome in the population. The experimental data are weighted with w q = 1 . Then one goes through all steps of the algorithm as described in Sect. 3.2. The Fig. 1 shows the distribution of numerically found data points compared to the experimental ones. A perfect match would lie on the straight line, but due to uncertainties in the measurements actually this would mean overfitting. Therefore, the regression gives a scattered distribution close to the perfect match. To quantify the quality of the regression the following parameters are introduced Fig. 1  where Actually they all consider just the sum of squared residuals (c q −ĉ) 2 , but in each case a different normalization factor is used to get dimensionless measures. Note that 1 − SSR 3 is identical with the so-called coefficient of determination R 2 . The plot in Fig. 1 illustrates the final distribution of numerical results vs experimental data, therefore the convergence of the algorithm. The results are shown and analysed in detail in Figs. 2 and 3. The corresponding measures for the residuals are SSR 1 = 0.16 , SSR 2 = 0.03 and SSR 3 = 0.06 . All of these measures demonstrate a fairly good quality of the regression to fit the data points. (16)

Regression of Experimental Data
The results of the regression with the numerical setup of the reference case described in Sect. 4.1 and shown in Fig. 1 Table 1 are used without further weighting (all w q = 1 ) or other constraints. Obviously, the raw model works reasonably well to fit the data. However, it has been observed, that for some different starting guesses k i,r some slight conversion can take place in the absence of plasma (top left figure in Fig. 2). It is assumed that this is not really reflecting the experiments, even though a detailed proof is missing. Therefore, an additional information is taken into account: The model should not show any CH 4 -conversion if no plasma is present. As a consequence, the reactions R 1 , R 8 , R 9 and R 10 of Table 1 should be discarded, i. e. k 1 = k 8 = k 9 = k 10 = 0 . According to the experimental trends it is also assumed that even for high plasma powers a finite amount of CO 2 , CO and H 2 O still remains. This requires that the processes R 36 , R 37 , R 38 , R 39 and R 41 of Table 1 are not involved, i. e. k 36 = k 37 = k 38 = k 39 = k 41 = 0 . This reduces the number of possible reactions in the model to N r = 32 . We will call this set of reactions the "constrained model", even though it is just an assumption of expected trends. At this point we would like to note that the numerical algorithm also yields negligible rate coefficients for the corresponding processes, if the weights for the data points corresponding to plasma power P = 0 are increased to w = 20 and additional (fictitious) points are inserted for the densities of CO 2 , CO and H 2 O at a power P > 8 W . This means that the algorithm was able to identify these insignificant reactions as such, if an appropriate constraint is taken into account. In addition to this and to demonstrate how additional data points which remove some uncertainties in the densities of O 2 and O comply with the observed data we combine the constrained model with a specification of fictitious data points, defining a mixture O 2 :O for a time t = t * ≫ t 0 .

3
The parameter t * is chosen to represent a point in time, when a stationary equilibrium is expected according to previous observations. The parameter is introduced to prescibe the densities according to the following relations which are a consequence of the conservation law Eq. 5 where 0 ≤ ≤ 1 and The indices 0 and * label the densities at time t = 0 (initial conditions in the experiment) and time t = t * , respectively. The parameter is set to either 0 or 1. These values represent the two limiting cases, in which either [O] * disappears and [O 2 ] * fulfills the mass balance ( = 0 ), or [O 2 ] * disappears and [O] * is constrained by the mass balance ( = 1 ). We call these models the = 0-model and the = 1-model, respectively. Finally, a fourth model is considered which consists of only 6 reactions, which are labeled by the symbol ★ in Table 1. This model was not the result of a systematic consideration, but arose from the  attempt to use only reactions whose reaction rates are known from the literature. We call this the "minimal model". The plots in Fig. 5 show exemplary results for the temporal evolution of the observable densities for a power of P = 4.00 W . For all the four alternative models the fit curves match fairly well the data points. This is confirmed by Fig. 6, where the entire set of fit results are compared to the experimental data. The corresponding error quality measures SSR 1 , SSR 2 and SSR 3 are given in the caption and show similar values as for the raw model regression. Although differences in regression quality can be seen in individual species, the overall quality remains similar. The same quality is found for plots similar to those shown in Figs. 2, 3 and 4 for the other time series and plasma power scans. However, considering the extrapolated results for the non-observable quantities shown in the plots of Fig. 7  separately. According to this general argumentation, also the differences in the models can be explained for points in time far outside the experimentally observable time series and for very fast processes, which cannot be resolved by experimental data. The findings up to now can be summarized as follows: • Satisfactory approximation of the experimental trends was found using a raw model with 9 species and 42 reactions and no additional information or assumption used as constraint. • The assumption that the plasma effects are described by a linear relation of electron density and plasma power and reactions including electron impact dissociation works very well to take into account the dependence of species densities on plasma power. • The method provides good results for an interpolation in the experimentally investigated parameter range , i.e. for times t < t 0 , for a plasma power 0 W ≤ P < 8 W , • The good agreement of the experimental data with the regression model suggests that the conservation laws the model implies are valid, i. e. Eqs. 2-5 describe the experimental situation well. This result is probably due to the fact that a high helium dilution was used in the experiments and thus the number of species remained small. • Such good interpolation results are also compatible with certain constraints imposed on species that cannot be observed or for data points beyond the experimental limit. This is demonstrated by numerical analysis of four alternative models (the constrained model, the = 0 -and = 1-model and the minimal model), which gives good regression for the experimental data too. In order to advance the modelling and to derive reliable rate coefficients, further a priori information should be included. In the ideal case detailed time traces of all species of a proposed model can be observed. However, before we get there, we want to emphasize the successful application of the presented algorithm and discuss the results of the model discovery in more detail.

Lower Limit of O 2 -Depletion
Even though the extrapolation to densities of O 2 and O separately is uncertain, the good interpolation of the density of the sum 2O 2 + O can be used to derive a lower limit for the conversion of O 2 . For this purpose coefficients for O 2 -depletion are introduced as The coefficient Γ true is actually the quantitiy of interest, which tells us the amount of molecular oxygen, reduced by the conversion process. For complete depletion one finds Γ true = 1 , whereas Γ true = 0 if no O 2 has been lost by production of CO 2 , CO and H 2 O . Due to the lack of knowledge of [O 2 ] the true depletion is not known. But one can give a lower bound for the depletion efficiency by Γ min , because Γ true is always larger than Γ min . This coefficient is plotted in Fig. 8 as a function of plasma power and initial admixture ration It can be seen that a complete depletion can be expected for a plasma power almost proportional to the square of mixing ratio, i. e. , the higher the initial content of O 2 , the more plasma power is needed to ensure reliable reduction.

Characteristics of the Minimal Model
The very surprising success of the minimal model gives reason to have a closer look on the implications of this good regression result. First we want to compare the coefficients with values from the literature. The table Table 2 lists those reactions for which numerical values  Table 2 Comparison of regression results for the scaled rate coefficients with corresponding scaled values from the NIST data base [30]. The symbol × labels reactions with a third body involved. To compute the corresponding rate coefficients a He-density of 1.0 ⋅ 10 19 cm −3 and a gas temperature of T = 298 K is assumed. The rate coefficients in physical units are recovered by multiplication with the conversion factors. The last column gives the references provided by the NIST data base  [35] could be found in the NIST database [30]. They are compared with the results of the regression with the raw model and the minimal model. One recognizes that the raw model reflects the negligibility of some reactions quite well and recognizes some essential processes as such.

Conclusion and Next Steps
What was found in the previous sections? Considering the aspect of finding a suitable regression model able to interpolate the experimental data, it was shown that the genetic algorithm presented is capable of identifying simple and small models. For different assumptions about reaction channels, reactants and products, rate coefficients could be determined that allow to describe the experimental data very well. Moreover, irrelevant reaction mechanisms could be identified, thus providing an indication for a further reduction of the models. However, if one considers the concrete physical processes underlying the model assumptions, it becomes apparent that the numerical method is not able to identify whether a reaction is physically meaningful. This has led to calculated rate coefficients sometimes differing considerably from known literature values (see Table 2, where a few rate coefficients match quite well and others deviate strongly). Also, various well-functioning models sometimes contradicted each other considerably. An extreme example of this is the minimal model, which gets by with a very small number of reactions and neglects mechanisms that are relevant in other useful regression models. It should be noted here that Ref. [17] also discusses a useful regression model that differs significantly from those presented here. However, in our data driven approach, the choice of the rate coefficients and their absolute values is in the first order unimportant as along as the model reproduces the data and allows some extrapolation. The comparison of individual rate coefficients with literature values may then serve as an indicator for open points in the model, whether further reaction mechanisms should be invoked in the future. At the moment the data do not enforce us to that.
Nevertheless, the results for the H 2 densities in Figs.4 and 7, for example, are not very satisfactory. The accumulation of H 2 is not very likely in an oxidizing environment. A way out of this inadequacy, for example, would be to consider hydroxyl OH as another reactive species. Various studies have already shown that there are many indications that OH plays an important role in the conversion of CH 4 [23][24][25] and must also be considered as an important channel for the production of CO 2 via the reaction pathway OH + CO ⟶ H + CO 2 [36]. Indeed, it would have been easy to include other species like OH in the models and look at other reaction pathways. But this route would only have introduced more unknowns into the models that would have to be clarified by experimental data, i.e., the problem of extrapolation to non-measurable quantities discussed in the previous sections would have become even greater.
At this point, it is important to remember, that the fundamental difference between a detailed forward calculation and the model discovery algorithm is that in the former case the rate coefficients and the reaction channels are specified, while in the latter only the reaction channels are provided, and the algorithm looks for an optimal solution of how these channels could be used by the reaction network to reproduce the experimental data. Physical laws can help by constraining the rate coefficients. But an educated guess about the basic reaction mechanisms is essential if one wants to build a realistic model that goes beyond the regression of a few measured data. Now what is the best way to use such a method? It could be used to extend a model step by step as new experimental data become known. The flexibility of the numerical method makes it quick and easy to "try out" additional effects that could serve as explanations without having to calculate or guess rate coefficients. In addition, competing effects could be added to existing models to investigate which reaction dominates a process, or under what conditions, a reaction becomes irrelevant. An important application is also in the identification of the relevant experimental domains to compare different models. For the CH 4 conversion discussed here, a few measurements of atomic or molecular oxygen would already be sufficient to exclude some of the presented models as unrealistic. Likewise, the measurement of H 2 would provide information on whether, for example, a species such as the hydroxyl OH is necessary to explain the experiments. In addition, a refined evaluation of the results from the genetic algorithm opens up the possibility of learning more about the significance and correlation of individual reactions. However, these refined methods are being developed at the moment and will be presented in a later paper. In this work, we have limited ourselves to presenting the basic framework for this type of study, which will be used further on for the analysis of plasma-assisted processes.

Summary
In this work a reaction kinetic analysis of experiments on the plasma conversion of CH 4 :O 2 :He gas mixtures in RF discharges was carried out. The aim was to formulate a model as simple as possible with few species and reaction channels, which allows the representation of experimental findings with only a few model parameters. Furthermore, the model should be flexible enough to consider future experimental findings and other information (or assumptions) about the underlying processes. To cope with both purposes, genetic algorithms prove to be useful tools that allow not only the calculation of model parameters but also the classification and comparison of different models. For the particular problem in the context of the plasma-assisted processes in RF plasma discharges considered here, a genetic algorithm has been developed and implemented. The details of the numerical structure and practical application were presented in detail. By considering a fairly extensive model based on 42 reactions and including the simplifying assumption for the plasma effects, that only a few electron impact dissociation processes are relevant, a fairly good match with the available experimental data has been obtained. This can serve as a basic framework for an analysis of further models which can be derived from it. Some examples of model reductions have been presented in this paper and have been examined and evaluated using the genetic algorithm. All these discussed models are suitable to reconstruct the experimental data. However, the predictions for species that cannot be observed and for parameter ranges that are not supported by experimental data can differ considerably from model to model. But this is not a fundamental problem, it only reflects the gaps in the currently available data. As soon as new data are added, it will be possible to use the contradictory results to exclude certain models. Nevertheless, the discussed examples show that our approach allows a convenient numerical investigation of different model assumptions within minutes or a few hours. In the sense of the objective of reduced reaction-kinetic models, our method allows to test different consequences of the model assumptions in a clearly arranged way in order to verify the predictions later in experiments. Among the model variants for the CH 4 -O 2 -conversion presented here, the model with only 6 reactions is particularly noteworthy, which partly allows analytical results. This allows a very convenient verification of the hypotheses and gives a direction for further investigations. In summary, the genetic algorithm presented here is a very flexible and simple method to systematically develop a reaction network for a variety of different plasma-assisted processes. The Fortran routines used for the simulations presented here can be obtained from the authors upon request.