1 Introduction

The modelling of non-equilibrium situations plays a fundamental role in the description of combustion, atmospheric and interstellar medium chemistry, of plasmas and all related devices. In such environments, the number of co-occurring physical processes is huge, and the system is not in a local thermal equilibrium state [1]. The contribution of each single process (i.e., two-body collisions involving atoms, molecules and electrons) and the effect of electric and magnetic fields need to be explicitly included in the description. Furthermore, processes involving excited electronic and vibrational states, which are significantly populated in such conditions, concur to characterize the overall distribution of the available energy in the system. For this reason, a large quantity of state-to-state data, i.e., considering atoms and molecules selected in specific electronic and vibrational states, is required, mainly in the form of rate coefficients or cross sections. Quite obviously, only a very limited number of experimental rate coefficients are available, and often the vast majority of needed data are obtained through very rough approximation/extrapolation procedures, like the use of first-order Schwartz, Slavsky, Herzfeld (SSH) theory [2] or scaling laws, undermining the reliability of the model, as pointed out in Ref. [3] for cold air plasmas. On the other hand, the direct calculation of accurate rate coefficients might pose some formidable challenges connected to the number of ro-vibrational states to be included, to their correct quantum treatment, to the availability of reliable potential energy surfaces (PESs) for the system, etc. In the last years, we specifically addressed the calculation of large datasets of rate coefficients for vibrational relaxation (vibration to translation/rotation, V–T/R, and vibration to electronic, V–E) and vibrational energy transfer (vibration to vibration, V–V) processes. In particular, we focused on atom–diatom [4, 5] and diatom–diatom collisions [6,7,8], by using a mixed quantum-classical (MQC) dynamics method [9] in conjunction with accurate analytical non-reactive PESs based on the Improved Lennard–Jones (ILJ) model [10, 11]. This represents an advantageous combination since: (1) the MQC method allows to recover quantum effects associated with the vibrational motion, which are the most relevant ones in this kind of process, yet for the remaining degrees of freedom a classical dynamics description is kept, thus involving a computational load similar to standard quasi-classical trajectory (QCT) calculations; (2) the analytical representation of the PES allows a fast evaluation of the potential and its derivatives, which are essential in nuclear dynamics simulations and, at the same time, provides a very accurate description of non-reactive potential even at long and very long interaction distances, which strongly influence the outcome of inelastic scattering, particularly in the low energy regime.

In the present paper, we apply the same methodology to determining large datasets of V–V and V–T/R rate coefficients for collisions between molecular nitrogen and molecular hydrogen. Indeed, \(\textrm{N}_2\)/\(\textrm{H}_2\) plasmas have a wide range of applications, starting from the low-temperature sustainable synthesis of ammonia, NH\(_3\), as opposed to the conventional Haber-Bosch process: the nonequilibrium conditions induced by different types of discharges could produce active species in N\(_2\)/H\(_2\) mixtures, leading to the weakening of N\(_2\) bond and to a convenient mechanism for ammonia (or ammonia derived fertilizers) formation. A recent study [12] on \(\textrm{N}_2\)/\(\textrm{H}_2\) plasmas shows that \(\textrm{H}_2\) can efficiently quench the high-lying vibrational levels or energetic metastable states of \(\textrm{N}_2\), so even a small amount of \(\textrm{H}_2\) (less than 1%) plays a significant role in the \(\textrm{N}_2\) plasma discharges. Nitrogen/hydrogen plasmas are also used for the chemical treatment of materials, like low-k etching in dielectric [13], surface cleaning [14], or for the production of metal nitrides, as well as for satellite propulsion [15,16,17]. Besides, Titan’s ionosphere is a natural plasma made of N\(_2\), H\(_2\) and CH\(_4\) and an increasing number of studies is devoted to investigate chemical processes arising for active species which might form therein [18]. The determination of the vibrational distribution function in such environments requires the knowledge, among others, of the V–V and V–T/R rate coefficients for N\(_2\)–N\(_2\), H\(_2\)–H\(_2\) and N\(_2\)–H\(_2\) collisions, spanning their complete vibrational ladder. We recently reported such values for energy exchange in molecular nitrogen [6], and we focus here on N\(_2\)–H\(_2\) inelastic scattering processes.

For this system, a PES based on an ILJ formulation has already been published [19]. As in previous investigations [6, 8], we start from such PES and fine-tune its parameters, which have a strong physical meaning and are therefore free to change only in a very limited range, to improve the agreement with the few available experimental data and with high-level ab initio points calculated at reference configurations of the interacting diatoms.

We use such modified PES and the MQC method to calculate rate coefficients in the temperature range 100–5000 K for the following processes, namely V–T/R energy exchanges of type:

$$\begin{aligned} \textrm{N}_2(v)+\textrm{H}_2(w) \longrightarrow \textrm{N}_2(v-\Delta v)+\textrm{H}_2(w), \end{aligned}$$
(1)

and

$$\begin{aligned} \textrm{N}_2(v)+\textrm{H}_2(w) \longrightarrow \textrm{N}_2(v)+\textrm{H}_2(w-\Delta w), \end{aligned}$$
(2)

and V–V energy exchanges:

$$\begin{aligned} \textrm{N}_2(v)+\textrm{H}_2(w+1) \longrightarrow \textrm{N}_2(v+1)+\textrm{H}_2(w), \end{aligned}$$
(3)

and

$$\begin{aligned} \textrm{N}_2(v+2)+\textrm{H}_2(w) \longrightarrow \textrm{N}_2(v)+\textrm{H}_2(w+1). \end{aligned}$$
(4)

We considered vibrational quantum numbers v for N\(_2\) up to 40 and w for H\(_2\) up to 10. Higher vibrational levels, though available, were not included because the Morse description of the intramolecular potential of N\(_2\) and H\(_2\) and of their vibrational wavefunctions might not be accurate as we approach the dissociation limit. For a reliable description of such states, a potential and a dynamical treatment accounting for bond breaking and reactivity should be used [20].

Even if such a combined approach has helped us to calculate and produce large tables of rate coefficients in a wide temperature range, full coverage of all possible vibrational states is a formidable task. Fitting procedures of the calculated coefficients, generally based on polynomial expansions [4], can be used and work rather well to estimate rates as a function of temperature for a specific process, but the prediction of rates upon change of the initial vibrational states is a more complex procedure. In the last years, the use of Machine Learning (ML) techniques to get insight and make dependable predictions starting from limited experimentally or theoretically available datasets has enormously grown, showing to have the potentialities to be applied for the modeling of a variety of aspects of collision dynamics [21,22,23,24,25]. As a matter of fact, nowadays, most PESs are obtained by exploiting Neural Network (NN) approaches to interpolate ab initio points [21, 24, 26], generally producing reliable and fast-to-compute reactive potentials and so much that the procedure and strategies have been widely described [27, 28]. We note here that one could in principle obtain a reliable PES for the description of collisional events at very high temperature (or when higher vibrational states are considered or reactive channels open) by merging the ILJ PES with PESs which accurately describe short range and the first long range regions as those obtained through NN procedures [25,26,27, 29]. On the other hand, ML approaches have just started to be applied for the production of databases (or complete datasets) of rate coefficients mainly for reactive dynamics [23, 30, 31], whereas, with the exception of some very early attempts [32], few studies report their application for vibrational energy transfer processes [25]. Considering that modeling of plasma and non-equilibrium environments must rely on the knowledge of rate coefficients for reactive and vibrational energy transfer covering all the vibrational ladder of reactants and products states, ML methods combined with a sufficient number of accurately calculated sets can provide the answer to the problem. In the present paper, we therefore present some preliminary results on the performance of different ML techniques, namely Neural Network and Gaussian Process (GP) [33] approaches, for the interpolation and the prediction of rate coefficients starting from a set of MQC calculated ones for a selected V–T/R process:

$$\begin{aligned}{} & {} \textrm{N}_2(v)+\textrm{H}_2(0) \longrightarrow \textrm{N}_2(v-\Delta v)\nonumber \\{} & {} \quad +\textrm{H}_2(0), \quad \Delta v=1,2,3. \end{aligned}$$
(5)

The paper is organized as follows: Sect. 2 describes the methods applied for (i) the construction of the PES, highlighting the modifications made to the original formulation of ref. [19]; (ii) the dynamical calculations, i.e., the details of the MQC method; (iii) the test of NN and GP approaches; Sect. 3 reports the datasets for processes (14) calculated with the MQC method and discusses physically meaningful trends of the rates as a function of temperature and of the vibrational quantum numbers. Section 4 describes the performance of the tested NN and GP models to fit and predict rate coefficients for processes (5) as a function of temperature and, most importantly, of the initial vibrational quantum numbers, discusses their performances in terms of their predictive capability and computational burden and, finally, provides the complete dataset of rates for (5) covering all v values up to \(v=40\). General conclusions are drawn in Sect. 5.

2 Methods

2.1 Potential energy surface

The formulation of the PES is essentially the same one of Ref. [19], so we report here only its main features to describe how the parameters were modified. A full detailed description can be found in the Supporting Information as well as in the original reference.

The overall interaction V of the diatom–diatom system is expressed as a sum of intramolecular (\(V_{\textrm{i n t r a}}\)) and intermolecular (\(V_{\textrm{i n t e r}}\)) interaction components. \(V_{\textrm{i n t r a}}\) is formulated using a Morse potential energy function \(D_{e}\left( t^{2}-2 t\right) \), in which \(D_{e}\) is the dissociation energy of the diatomic molecule, \(t=\exp \left[ -\beta _e\left( r-r_{e q}\right) \right] \) and r is the internuclear diatomic distance (with \(r_{e q}\) being its equilibrium value). The set of Morse parameters for \(\textrm{N}_2\) and \(\textrm{H}_2\), used in the intramolecular potential as well as in the Morse wavefunction of the MQC calculations (see the following), are shown in Table 1 and are taken from the spectroscopic data of Ref. [34].

It is well-known that Morse potential increasingly loses accuracy when describing vibrationally and rotationally highly excited states. For this reason, we tested the vibrational energy values and the matrix elements needed for MQC dynamics (Eq. (15) in the Supporting Information) calculated by the present Morse intramolecular potentials with ab initio based ones (Ref. [35] for N\(_2\) and [36] for H\(_2\)) and found that differences for the highest vibrational states amount to 10% ca. for energy values and to much less for the matrix elements. These uncertainties are smaller than the overall accuracy of the method (see discussion in the following).

Table 1 Molecular constants for \(\textrm{N}_2\) and \(\textrm{H}_2\) [34]
Fig. 1
figure 1

Behavior of the original and modified ILJ potential energy surfaces and ab initio points as a function of the intermolecular distance. Selected configurations, at the equilibrium intramolecular distance of both monomers, are considered: the parallel (or H), collinear (or I), crossed (or X) and perpendicular (or T\(_{a}\)) configurations

The intermolecular (\(V_{\textrm{i n t e r}}\)) component is represented as the sum of two main contributions:

$$\begin{aligned} V_{\textrm{inter}}=V_{v d W}+V_{\textrm{elect}}, \end{aligned}$$
(6)

where the \(V_{v d W}\) term accounts indirectly for many-body effects, formulated as a bond-bond interaction [19] according to the ILJ model. The \(V_{\textrm{elect}}\) term consists of the electrostatic interaction due to the molecular permanent multipoles, and only the main quadrupole–quadrupole term is taken into account herein [19]:

$$\begin{aligned} V_{\textrm{elect}}(R,\gamma ) = \frac{Q_{a} Q_{b}}{R^5} \; A^{2 2 4} (\gamma ), \end{aligned}$$
(7)

where \(Q_{a}\) and \(Q_{b}\) correspond to the quadrupole moments of the monomers (\(Q_{H2}=0.4835\) a.u. [37], \(Q_{N2}=-1.1\) a.u. at their equilibrium distances \(r_{eq}\) [38]) and \(A^{2 2 4}(\gamma )\) is the bipolar spherical harmonic which describes the angular dependence of the quadrupole–quadrupole interaction.

The \(V_{v d W}\) interaction (size repulsion plus dispersion attraction) depends on the distance R between the centers of mass of the interacting partners, and on \(\theta _{a}, \theta _{b}, \Phi \), collectively indicated as \(\gamma \), i.e., the four body Jacobi angular coordinates which describe the relative orientation of the two diatoms. In the ILJ formulation (see Supporting Information), the overall interaction can be expressed as a function of physically meaningful parameters, \(\beta ^i(\gamma )\), \(\varepsilon ^i(\gamma )\), \(R_{m}^{i}(\gamma )\), corresponding to five selected reference configurations i of the colliding diatoms, namely a parallel H geometry \(\left( \theta _{a}=90^{\circ }, \theta _{b}=90^{\circ },\Phi =0^{\circ }\right) \), a crossed X (90, 90, 90) geometry, a collinear I geometry (0, 0, 0), and two perpendicular T\(_{a}, (90,0,0)\) and T\(_b,(0,90,0)\) geometries, differing for H\(_2\) pointing to N\(_2\) center of mass (T\(_{a}\)) or N\(_2\) pointing to H\(_2\) center of mass (T\(_{b}\)). \(\beta ^i(\gamma )\) is related to the hardness of the interacting partners, whereas \(\varepsilon ^i(\gamma )\) and \(R_{m}^{i}(\gamma )\) correspond to the potential well depth and the equilibrium distance of each configuration. The values of such parameters for the original PES [19] can be found in Table 2.

Table 2 Parameters for the N\(_{2}\)–H\(_{2}\) intermolecular potentials

Figure 1 compares the original PES and ab initio interaction energies for the parallel H, collinear I, crossed X and perpendicular T\(_{a}\) configurations. The ab initio values were obtained by using the CCSD(T) level of theory together with Dunning’s aug-cc-pV5Z basis set [39] and the bond function set [3s3p2d1f] developed by Tao [40] and placed on the midpoint of the intermolecular distance R. The obtained interaction energies were corrected using the counterpoise method [41] to remove the basis set superposition error. All computations were performed using the Molpro code [42].

The figure shows that the PES of ref. [19] matches very well the ab initio points at long range and for the description of the potential well, whereas differences can be found in the repulsive region. We therefore tried to improve the description of the short range interaction by slightly changing the values of the \(\varepsilon ^{i}\) and \(R_{m}^{i}\) parameters. The modifications, amounting to a few percent so as to fully preserve the physical meaning of these parameters, can be found in Table 2 and the comparison with the ab initio points and with the original PES in Fig. 1. Note that in this ILJ formulation (as in the original one and that of refs. [6, 7]), the monomers’ deformations are explicitly accounted for in \(V_{elect}\) and \(V_{v d W}\) through the N\(_2\) and H\(_2\) bond length dependence of the molecular polarizabilities and quadrupole moments [19]. We tested the reliability of this approach by comparing the ILJ results to ab initio points, calculated at the same CCSD(T) level of theory described above, for the intermolecular interaction in the I and T\(_{a}\) configurations with each monomer stretched in turn by 10% and 20% (with respect to equilibrium distance) and the results are shown in Figures S1 and S2 in the SI displaying an overall good matching. Since the variation of N\(_2\) and H\(_2\) quadrupole moments and of their parallel and perpendicular polarizability components (see discussion and Fig.8 of Ref. [19]) agree with the theoretical values in a wide range of molecular deformations, we are confident that the proposed formulation is reliable for molecular deformations up to 40%. The uncertainty associated with larger elongations might be slightly larger.

The new values were also fine-tuned against the few available experimental data, that is, against the second virial coefficient, B(T), values [43,44,45] and V–T/R rate coefficients for N\(_{2}\left( 1\right) +\)H\(_{2}\left( 0\right) \rightarrow \)N\(_{2}\left( 0\right) +\)H\(_{2}\left( 0\right) \) (hereafter indicated as (1,0)\(\rightarrow \)(0,0) in short) [46] and V–V rate coefficients for N\(_{2}\left( 0\right) +\)H\(_{2}\left( 1\right) \rightarrow \)N\(_{2}\left( 1\right) +\)H\(_{2}\left( 0\right) \) (hereafter indicated as (0,1)\(\rightarrow \)(1,0)) processes [47].

The comparison in Fig. 1 shows a substantial improvement in the description of the first repulsive wall with the modified PES; differences remain in the very repulsive short range region, but they are not expected to play a relevant role in the temperature range considered here.

Comparison of second virial coefficients, calculated on the original and modified PESs by including first quantum corrections \(B_{q l}(\textrm{T})\) to the classical estimate \(B_{c l}(\textrm{T})\) [48] (\(B(\textrm{T})=B_{q 1}(\textrm{T})+B_{c 1}(\textrm{T})\)), with measured values [43,44,45]) can be found in Figure S3 in SI and shows an excellent agreement of both PESs, with an improvement at medium and high temperature upon the use of the modified one. This can be explained by considering that negative second virial coefficient values, at low T, mostly depend on the interaction anisotropy in the attractive part of the PES, which is substantially unchanged. On the other hand, at high T, positive B(T) values probe the potential first repulsive region, where changes of the parameters were most effective.

Furthermore, the spherical average of the modified potential remains very similar to the original one, both in the well depth and at long-range. This is an important point, because it ensures the correct reproductions of the maximum glory quantum interference resolved in scattering experiments and of the absolute value of the measured total integral cross section. Note that the negative area of such potential defines the capture probability leading to the formation of precursor states of inelastic processes.

V–T/R rate coefficients for the process (1,0)\(\rightarrow \)(0,0) calculated by the MQC method, as described in the following Section, on the modified PES (red line in Fig. 2) present a significant better agreement with experimental data of Ref. [46] (the experimental uncertainties are not available). However, they still tend to overestimate the experimental values by ca. 50% quantitatively. The figure also reports results obtained using the SSH theory [1] (dashed green line), which explicitly uses the experimental data in the fit.

For the V–V (0,1)\(\rightarrow \)(1,0) process, the only measured data are available at 295 K with the value of 6.41\(\cdot \)10\(^{-17}\) [47]. Rate coefficients calculated by the MQC approach on the original and modified PESs at the same temperature amount to 8.43\(\cdot \)10\(^{-17}\) and 4.42\(\cdot \)10\(^{-17}\), slightly overestimating and underestimating, respectively, the experiment. Figure S4 in the SI reports the rates in the whole temperature range investigated here and the values calculated by the SSH theory [49], fitted on the only available measured rate. It is worth noting how much the MQC and the SSH calculated rate coefficients differ at very low and high temperatures, with discrepancies larger than one order of magnitude.

2.2 The mixed quantum-classical method

The nuclear dynamics calculations of cross sections and rate coefficients were carried out with the MQC method, introduced by Billing [9] in the version developed by us and used to describe diatom–diatom collisions [6,7,8]. The method is based on the simultaneous solution of the time-dependent Schrödinger equation for the diatom vibrations (by solving a close coupled equations system) and the classical Hamilton equations of motion for the remaining degrees of freedom. Details of the method are reported in the above-mentioned references and the SI of the present paper. Here, we only report the settings specifically used in V–V and V–T/R calculations. V–V and V–T/R rate coefficients were obtained by coupling up to 121 initial vibrational states, needed to describe the highly excited vibrational states, and by starting from an initial diatoms separation distance equal to 15 Å and 50 Å for V–V and V–T/R processes, respectively. Forty-seven different values of total classical energies, comprised between 45 cm\(^{-1}\) and 80,000 cm\(^{-1}\), were considered, and a maximum value of the impact parameter of 9 Å is used. Such settings should ensure an accuracy of the MQC rates within 20%.

All the vibrational deactivation processes (Eqs. 1 and 2) are exothermic. However, by considering the detailed balance principle (microscopic reversibility), the rate coefficients \(k\prime _{endo}\) for the reverse endothermic vibrational excitation processes can readily be obtained as

$$\begin{aligned} k\prime _{\textrm{endo}}= k_{\textrm{exo}} \times \exp \left( -\frac{|\Delta E |}{k_B T}\right) , \end{aligned}$$
(8)

where \(\Delta E\) is the energy mismatch between the initial and final vibrational states (i.e., the endothermicity) in the process. V–V transitions (Eqs. 3 and 4), on the other hand, might be exothermic or endothermic depending on the initial N\(_2\) and H\(_2\) vibrational quantum numbers. In such cases, the rate coefficients for the reverse reactions can also be calculated by applying microscopic reversibility through the same Eq. 8.

Fig. 2
figure 2

Rate coefficients for the transition (1, 0)\(\longrightarrow \)(0, 0) calculated on original PES (dash-dot black line) and modified PES (full red line). Experimental data of Ref. [46] (blue squares) and results calculated by the SSH theory [1] (green dash line) are also reported

2.3 Neural network and Gaussian process models

Two different Machine Learning (ML) approaches are employed for rate coefficient predictions in terms of temperature and initial vibrational nitrogen quantum number v: the Gaussian Process (GP) and the Neural Network (NN) techniques, both known to perform well [23, 30, 31]. In all Supervised Learning methods, the complete dataset is split into a training and a test set. The training set sample is used to train the model to learn how to classify or predict a specific value. The test set is instead used to test the capability of the model to predict the values or to classify the new data, that is, how well the model can generalize [50]. Specifically, we are here interested in regression models that should be able to correctly predict variables (i.e., the label, in our case \(\log _{10}(k)\), see the following) given some descriptors (i.e., features, in our case temperature T and quantum number v).

GP uses a set of prior random Gaussian functions to predict the correlation between the Gaussian distributions at two nearby configurations. A Gaussian process is specified by a covariance/kernel function. Here, we used a Matern kernel, which is a generalization of the RBF (Radial basis function), having an extra parameter (i.e., \(\nu \)) in the Matern covariance function to specify the smoothness of the resulting function. We employed the GP model as implemented in scikit-learn [51] and a \(\nu = 5/2\), as recently done in Ref. [25], where they used GP to predict the energy transfer cross sections in CO–CO collisions for different vibrational quantum numbers and collision energies.

Fig. 3
figure 3

V–V rate coefficients for processes N\(_2(v)\)+H\(_2\)(\(w+1\))\(\rightarrow \) N\(_2(v+1)\)+H\(_2\)(w), with \(w=0, 1, 2, 4, 6, 9\), as a function of the vibrational quantum number v at different temperature values

Neural Networks (NNs), also known as artificial neural networks (ANNs), are sets of node layers, containing an input layer, one or more hidden layers, and an output layer mimicking the human brain [52]. Each node (i.e., artificial neuron) is connected to another having associated weight and threshold (i.e., the node is activated only when the output of any individual node is above the specified threshold value). ANNs are well-known Supervised ML techniques at the heart of deep learning algorithms. In this work, we use a NN model developed using TensorFlow [53] (see Fig. S8), made of an input bi-dimensional layer and an output layer of a single node. The NN model employs four hidden layers having: 32, 64, 128 and 32 units starting from the first to the last. A linear activation function for the input and output layers and a rectified linear activation function (ReLU) in all the hidden layers are used. The NN scheme has been trained for 50 epochs. We also tested a smaller NN model, hereafter indicated as NN2, made of only two hidden layers of 100 units each. NN2, whose performance is only shown in the Supporting Information (see Figs. S9–S12), is much less efficient than both the GP and NN models described here, indicating that oversimplified NN approaches cannot adequately handle predictions and a certain degree of complexity, in terms of hidden layers, is needed to get reliable results. All the ML code developed within this work is freely available at Ref. [54].

3 The V–V and V–T/R rate coefficients datasets

The modified PES, tested on its ability to reproduce experimental data of different type, was then used to build a large and accurate dataset for V–V and V–T/R rate coefficients of the N\(_2\)+H\(_2\) system.

Fig. 4
figure 4

V–V rate coefficients for asymmetric processes N\(_2(v+2)\)+H\(_2\)(w)\(\rightarrow \) N\(_2(v)\)+H\(_2\)(\(w+1\)), with \(w=0, 1, 2, 3, 7, 10\), as a function of the vibrational quantum number v at different temperature values

Here, we consider a wide range of temperature values between 100–5000 K as well as a large range of initial vibrationally excited states of N\(_2\) and H\(_2\). The determination of so many rate coefficients requires a great computational effort, and, to the best of our knowledge, they are not included in any available database of the title system. However, it is worth noting that an estimate of V–V and V–T/R rate coefficients using the SSH theory and scaling laws was made in ref. [49] and the obtained values were used, together with rates for electron impact, chemical reaction and wall influenced processes, to model DC glow discharges and post-discharges in N\(_2\)/H\(_2\) mixtures. We will therefore compare the rates estimated according to the simple model formulation in [49], as described in Appendix A, with those explicitly calculated with the MQC method.

All the datasets calculated here were deposited in Zenodo and are freely accessible [55].

The rate coefficients for V–V processes N\(_2(v)\)+H\(_2\)(w+1)\(\rightarrow \) N\(_2(v+1)\)+H\(_2\)(w) (Eq. 3), with \(w=0, 1, 2, 4, 6, 9\), as a function of the initial quantum number v at T= 100, 300, 500, 1000, 3000, 5000 K are reported in Fig. 3. Due to the difference in the magnitude of the vibrational quanta of N\(_2\) and H\(_2\), rate coefficients for this process increase with w, i.e., with the vibrational excitation of molecular hydrogen, which reduces the energy mismatch. As a matter of fact, processes with \(w=0\) are strongly exothermic (of about 2000 cm\(^{-1}\)), whereas when \(w=9\), we have quasi-resonant energy exchange at \(v\approx 12\). When the energy mismatch is large (i.e., \(w=0,1,2\)), the trend of the curves in Fig. 3 is approximately the same at all temperature values: for very small v, there is a small increase in the rates as v grows and then, the rates reach a plateau for higher v values. For these processes, a marked rise in the rates as temperature increases can also be observed. When H\(_2\) vibrational excitation grows (bottom panels in Fig. 3), the trend for high-temperature rates remains that previously described. At low temperatures, instead, there is first an increase (at low v) and then, a marked decrease (at high v) in the rates. The largest rates are found in the correspondence of exothermic V–V exchanges, presenting a small energy mismatch. Indeed, as observed experimentally [56] and theoretically [6,7,8] in diatom–diatom collisions for other systems, for quasi-resonant processes the enhancement of the rate coefficients becomes very relevant at the lowest temperature, producing an anti-Arrhenius kinetic behavior (i.e., larger rates at lower temperature). This phenomenon can be explained as the result of an effective trapping in the potential well at low collision energies, which determines a strong coupling of the diatomic vibrational wavefunctions, favoring the exchange of vibrational quanta.

As mentioned before, we report here, for the cases \(w=0\) and \(w=9\), the rate coefficients calculated by the analytical expression based on SSH theory given in [49] and described in Appendix A, which can be found as dashed lines in the first and last panel of Fig. 3. The qualitative trend of the curves is very similar to that of MQC rates. However, the quantitative difference is extremely large often amounting to more than one order of magnitude for \(w=0\) and two or three orders for \(w=9\). The only agreement is indeed found for the (0,1)\(\rightarrow \)(1,0) process rate coefficient at 300 K, i.e., very close to the experimental value the SSH model was calibrated upon.

Figure 4 reports the rate coefficients for asymmetric V–V processes N\(_2(v+2)\)+H\(_2\)(w)\(\rightarrow \) N\(_2(v)\)+H\(_2\)(w+1) (Eq. 4), with \(w=0, 1, 2, 3, 7, 10\), as a function of the initial quantum number v at T= 100, 300, 500, 1000, 3000, 5000 K. In this case, the basic trends are similar to those found for process (3), but they are enhanced: rates become larger as w increases and for the highest temperatures investigated here, i.e., 3000 and 5000 K, the behavior of the curves is the same as in Fig. 3. However, at lower temperatures, the behavior found in the bottom panels of Fig. 3 is emphasized: there is an increase followed by a strong decrease in the rates as v increases at a fixed temperature, the highest rates can again be found for nearly resonant processes. The latter processes show the smallest dependence on temperature: the rate coefficients are close at almost all temperatures. The peak in the rates is shifted to higher v values as w increases. Far from this peak, the rates strongly depend on temperature, particularly when we consider endothermic energy exchanges, which occur at high v values. Other rate coefficient data for generic V–V processes of Eqs. 3 and 4, with different v, w and temperature values, are given in [55].

In the first panel of Fig. 4, we also report the comparison with the rate coefficients calculated by the SSH theory as described in the Appendix A and used in Refs. [16, 49] to model N\(_2\)/H\(_2\) plasma in DC discharges. It can be noticed again that, though the qualitative trend is somehow similar to that of MQC rates, the rates evaluated by the SSH formulation present several orders of magnitude differences, particularly for high vibrationally excited nitrogen.

The behavior of rate coefficients for V–T/R processes N\(_2(v)\)+H\(_2\)(0)\(\rightarrow \) N\(_2(v-\Delta v)\)+H\(_2\)(0), with \(\Delta v=1, 2, 3\), as a function of the initial quantum number v at T= 100, 300, 500, 1000, 3000, 5000 K is reported in Figs. 5, S5 and S6. The results show that, though the rate of V–T/R processes is low at low temperatures (being generally some orders of magnitude smaller than V–V processes for the same initial vibrational states), their efficiency rapidly grows by increasing the temperature, and they become comparable with high-temperature values.

Additional rate coefficients for general V–T/R processes of the kind N\(_2(v)\)+H\(_2\)(w)\(\rightarrow \) N\(_2(V-\Delta v)\)+H\(_2\)(w) (Eq. 1) can be found in the Zenodo repository [55].

We also evaluated V–T rate coefficients for \(\Delta v = 1\) by using the scaling law proposed in ref. [49] and here reported in Appendix A. The results are shown in Fig. 5: they are comparable with MQC rates only for the \(v=1\) case. The difference becomes rapidly huge as v increases.

This evidence, together with the results obtained for V–V rates, confirms that the use of too simple first-order models, such as the SSH theory, or, even worse, of scaling laws, can lead to enormous discrepancies with directly calculated data, and, therefore, their use should be avoided. Nowadays, the availability of powerful computers and methods allowing the calculation of large datasets with reasonable computational time and load permits us to rely on much more accurate data.

Fig. 5
figure 5

V–T/R rate coefficients for N\(_{2}\left( v \right) +\)H\(_{2}\left( 0\right) \rightarrow \)N\(_{2}\left( v-1\right) +\)H\(_{2}(0)\) processes as a function of the vibrational quantum number v at different temperature values

V–T/R processes involving vibrational relaxation of molecular hydrogen, i.e., N\(_2(v)\)+H\(_2\)(w)\(\rightarrow \) N\(_2(v)\)+H\(_2\)(\(w-\Delta w\)) (Eq. 2), are less important in plasma applications than those involving the vibrational relaxation of nitrogen. Nevertheless, we calculated rate coefficients for the process N\(_2(0)\)+H\(_2\)(w)\(\rightarrow \) N\(_2(0)\)+H\(_2\)(\(w-1\)) as a function of the initial quantum number w at T= 100, 300, 500, 1000, 3000, 5000 K and the results can be found in Fig. S7. Rate coefficient data for additional generic V–T/R processes with different v values and involving hydrogen relaxation are also given in [55].

4 V–T/R rate coefficients by machine learning

The results reported in this section are intended to be preliminary, as they only concern the processes (5), for which the predictive performances of the ML approaches described in Sect. 2.3, i.e., Gaussian Process (GP) and Neural Network NN, were analyzed and compared in terms of the associated MSE (Mean Square Error) and computational time. It is worth noting that, though GP approaches automatically compute an intrinsic associated standard deviation for each predicted point, this is not the case for NN method. Therefore, in order to compare GP and NN on the same ground, in the present work we only evaluate and analyze MSE for the test set.

A more extensive ML investigation, covering not only all V–V and V–T/R processes for the title system, but also the use of different parameters or features for training the sets is in progress. It was indeed found, for instance, that results obtained by using rate coefficients k or instead \(\log _{10}(k)\) values might produce changes in the performance, the magnitude and direction of the changes being different with the methods, as could be expected because the models rely on different functional forms. Here, we chose to use \(\log _{10}(k)\) data which, at least for GP and NN models shown here, produced better results in the whole range of T and v investigated.

The capabilities of the two ML models to predict rate coefficients (at temperature values and initial vibrational quantum numbers v not included in the training set) were tested by considering four different splitting scenarios.

Fig. 6
figure 6

Test set MSE values as a function of temperature: \(log_{10}(k)\) values corresponding to a specific temperature T were removed from the training set and constitute the test set. The three panels correspond to processes (5) with \(\Delta v=1, 2, 3\), respectively

In the first one, we remove all the rate coefficients having the same temperature, one temperature at a time, using all the other points as a training set and the removed ones as the test set. The MSE obtained for the three models is reported as a function of the removed temperature value in the three panels of Fig. 6, corresponding to processes (5) \(\Delta v\)=1, 2, 3, respectively. The MSE slightly grows with \(\Delta v\), but is always very small for NN and GP (for T\(\le \) 2000 K, the MSE becomes relevant). A slightly larger error at very low temperatures can be found for GP and NN, but the absolute magnitude remains small. Specifically, the error associated with the GP model is practically negligible, showing the best performance in this case.

Fig. 7
figure 7

Test set MSE values as a function of initial vibrational quantum number v: \(log_{10}(k)\) values corresponding to a specific v were removed from the training set and constitute the test set. The three panels correspond to processes (5) with \(\Delta v=1, 2, 3\), respectively

Fig. 8
figure 8

Test set MSE values for the two models obtained by removing an increasing number of random points (5% to 50%) from the training set. The three panels correspond to processes (5) with \(\Delta v=1, 2, 3\), respectively

Fig. 9
figure 9

The test set MSE values for the two models obtained by removing an increasing number of systematically selected points, corresponding to specific v values, from the training set, i.e., Set1, removed v = [2; 4; 6; 8; 10; 14; 18; 22; 26; 30; 35], Set2, removed v = [1; 3; 5; 7; 9; 12; 16; 20; 24; 28; 32; 40], Set3, removed v = [2; 3; 5; 6; 8; 9; 12; 14; 18; 20; 24; 26; 30; 32], Set4, removed v = [1; 2; 4; 5; 7; 8; 10; 12; 16; 18; 22; 24; 28; 30; 35; 40]. The three panels correspond to processes (5) with \(\Delta v=1, 2, 3\), respectively

The other scenarios concern the performance of the ML models to predict rate coefficients for processes (5) corresponding to initial quantum numbers v different from the MQC calculated ones. Indeed, as mentioned before, this is the principal target for the use of ML techniques since extending the direct calculation of rate coefficients to different temperature ranges is much less demanding than to different vibrational quantum numbers.

Fig. 10
figure 10

NN and GP rate coefficient predictions for \(v=34\) (left panel) and \(v=36\) (right panel) as a function of temperature. The reference MQC calculated rates are shown for comparison

In the second scenario, we therefore proceed similarly: we split the full set into a training set made of all data points except those corresponding to one specific v value and a test set made of the latter. The three panels of Fig. 7 report the MSE results for each specific v for processes (5) with \(\Delta v=1, 2, 3\), respectively. The trend shown in the panels is basically similar to that of Fig. 6: the magnitude of MSE is relatively small for all models. The error tends to grow as \(\Delta v\) increases and, above all, at the edges of the investigated range of v values. For intermediate v, the MSE associated with GP is negligible and smaller than that for NN. However, for the highest v values investigated here (where the training set consists of more sparse data points), the predictive capability of GP drops.

A further comparison was made by splitting the full set into a training and a test sets made of randomly selected points. The number of random points to be moved from the training to the test set was progressively increased from 5% up to a maximum of 50% of the full MQC calculated set. The MSE results are shown in panels of Fig. 8 for processes (5) with \(\Delta v=1, 2, 3\), respectively, and show some of the trends found before: a growing MSE as \(\Delta v\) grows, and the GP model performing slightly better than NN. The main message from this scenario can, however, be found concerning the MSE behavior as a function of the percentage of randomly selected eliminated points. In general, the error tends to remain very small when up to 20 % of points are removed. When this percentage increases, we have a larger MSE in general, but this value does not necessarily grow with the percentage, i.e., the MSE seems to strongly depend on which points are removed: we might have 50% data points removed from the training set, but if they are effectively distributed, we will have smaller MSE than with more data points in the training set distributed less regularly.

To better analyze this issue, we considered a fourth scenario built by removing rate coefficients corresponding to initial vibrational quantum numbers v in a systematic fashion. Specifically, Sets 1 and 2 were obtained by moving alternate data points from the training to the test set, that is, points corresponding to v = [2; 4; 6; 8; 10; 14; 18; 22; 26; 30; 35] for Set 1 and v = [1; 3; 5; 7; 9; 12; 16; 20; 24; 28; 32; 40] for Set 2 (which does not correspond to even and odd points, because the MQC rate coefficients do not cover all v values). Sets 3 and 4 were obtained by moving 2 data points out of 3 from the training to the test set, i.e., Set 3 = [2; 3; 5; 6; 8; 9; 12; 14; 18; 20; 24; 26; 30; 32] and Set 4 = [1; 2; 4; 5; 7; 8; 10; 12; 16; 18; 22; 24; 28; 30; 35; 40].

The resulting MSE is reported in panels abc in Fig. 9 for processes (5) with \(\Delta v=1, 2, 3\), respectively. It can be noted that, with some exceptions, the best performance can be obtained when the rate coefficients for values at the edges (i.e., \(v=1\) and \(v=40\)) are kept in the training set, that is, for Set 1 and 3, the difference between removing one point out of 2 or two points out of 3 making a smaller impact on the MSE values. This pattern is more regular for the GP model and more unpredictable for NN. As in all other scenarios although NN is presenting a small MSE, it usually gives slightly larger errors than GP except when the prediction involves rate coefficients corresponding to v values larger than the maximum used in the training set, i.e., when an extrapolation rather than an interpolation of the values is requested.

From the present analysis, it emerges that, regardless of the considered scenario, both NN and GP provide reliable results with very small associated MSE, with GP generally performing better in a predictable way. The only exception to this behavior consists of GP predictions when extrapolating the rate coefficients for v values larger than those contained in the training set (see Fig. 7 and selected sets in Fig. 9), in such cases NN outperforms GP.

Given the above results, comparing the computational time required to train GP and NN models might be useful. The wall times needed in examples of the scenarios investigated above (quite obviously, the wall time depends on the training set magnitude) are reported in Table S1 in SI, which invariably shows that NN is much faster than GP in all cases by one or two orders of magnitude. This is not an issue when the magnitude of the training set is not too large since the training is done only once, but it might become important when larger datasets, depending on an increased number of variables (features), are used, as could be the case of complete datasets for the general V–T/R and V–V processes (14).

As a final step, we built the complete table of rate coefficients for the process (5) covering all the v values up to \(v=40\) using the full set as the training set. The complete dataset of MQC calculated and ML (GP or NN) predicted rates can be found in Zenodo [55]. In Fig. 10 panels (a) and (b), we report the comparison between the ML predictions using the full training set and the reference MQC rates specifically computed for such purpose, for processes (5) with \(v = 34\) and \(v = 36\), respectively. The results show a very good agreement between both GP and NN rates with MQC ones: the match of GP and MQC rates is excellent for the whole temperature range and all \(\Delta v=1,2,3\). NN and MQC rates instead slightly differ at the highest temperature for \(\Delta v=2\) and \(\Delta v=3\). The difference is, however, within 25 %.

5 Conclusions and final remarks

In this work, we have applied a consolidated strategy developed by our research team to calculate large datasets of V–V and V–T/R rate coefficients in a wide temperature range for the N\(_2\)-H\(_2\) collisions. The methodology relies on an accurate non-reactive PES (built according to the ILJ model) able to capture relevant effects occurring at long range, in the van der Waals well and in the first repulsive region of the interaction. The PES was fine-tuned and tested against available experimental data (second virial coefficients and few measured V–V and V–T/R rates) and was subsequently exploited in the dynamical calculations, performed by a MQC method, whereby the full dimensionality of the system is considered and quantum effects associated with the vibrational motion and roto-vibrational coupling are recovered.

In particular, we calculated V–V and V–T/R rates for processes (14) corresponding to several initial vibrational states up to \(v=40\) for molecular nitrogen and \(w=10\) for molecular hydrogen. The results were compared to the rate coefficients derived from first-order perturbation SSH theory or scaling laws, which are generally used in modeling non-equilibrium systems and plasma environments in different discharge conditions. The comparison is striking: with the exception of processes in the proximity of the experimental data, which were in fact used to fit the SSH parameters, discrepancies between calculated values and those obtained by the approximate SSH formulation are huge, often amounting to several orders of magnitude. This outcome poses serious doubts about the use of such approaches for modelling purposes.

In addition, we present a detailed, though preliminary, analysis on the use of different ML techniques and models for predicting rate coefficients corresponding to initial vibrational states not included in the direct calculations. It turns out that the most robust method is the GP model (also employed in Ref. [25], to the best of our knowledge the only previous attempt to use machine learning to produce rate coefficients for V–V processes), which gives negligible MSE values for interpolation. The NN model also gives very small MSE values (though in a less predictable fashion than GP) and presents two additional interesting features: it seems to lead to better results for extrapolating rates which fall just outside the calculated vibrational range, a case for which the present GP model does not perform too well; and the computational time needed for the training of the datasets is generally an order of magnitude smaller than for GP. This might not be an issue for the specific process (5) investigated here, consisting of a relatively small dataset. However, in the prospect of using ML to produce rate coefficients for the general process:

$$\begin{aligned} \textrm{N}_2(v)+\textrm{H}_2(w) \longrightarrow \textrm{N}_2(v')+\textrm{H}_2(w'), \end{aligned}$$
(9)

i.e., starting from a dataset with increased dimensionality (depending on 4 vibrational quantum numbers), the question of the associated computational time might be significant. The use of GPUs instead of CPUs, as well as a systematic optimization of the NN hyperparameters, can further speed up the training process for NN models [57]. Work on comparing the performance of ML methods depending on the features (e.g., cross sections (collision energies) instead of rate coefficients (temperature), etc.) and on optimizing hyperparameters in the models is currently in progress.