Machine learning and a computational fluid dynamic approach to estimate phase composition of chemical vapor deposition boron carbide

Chemical vapor deposition is an important method for the preparation of boron carbide. Knowledge of the correlation between the phase composition of the deposit and the deposition conditions (temperature, inlet gas composition, total pressure, reactor configuration, and total flow rate) has not been completely determined. In this work, a novel approach to identify the kinetic mechanisms for the deposit composition is presented. Machine leaning (ML) and computational fluid dynamic (CFD) techniques are utilized to identify core factors that influence the deposit composition. It has been shown that ML, combined with CFD, can reduce the prediction error from about 25% to 7%, compared with the ML approach alone. The sensitivity coefficient study shows that BHCl2 and BCl3 produce the most boron atoms, while C2H4 and CH4 are the main sources of carbon atoms. The new approach can accurately predict the deposited boron–carbon ratio and provide a new design solution for other multi-element systems.


Introduction 
Ceramic matrix composites (CMC) are ideal hightemperature materials because of their high-temperature resistance, wear resistance, low thermal conductivity, prevent further oxidation [5][6][7][8][9], it is a good candidate for oxidation protective materials, and it is usually added to modify the interface and matrix of CMC.
Chemical vapor deposition (CVD) is a promising method for the preparation of boron carbide [7,[10][11][12]. The deposition mechanism for B x C from a BCl 3 -CH 4 -H 2 system has been previously studied. For instance, Berjonneau et al. [13,14] identified the main gaseous species in situ by Fourier transform infrared (FTIR) spectroscopy, such as BCl 3 , BHCl 2 , and HCl. The B/C ratio of the deposits decreased from 3.6 to 2 when the deposition temperature increased from 900 to 1100 ℃. Karaman et al. [11,12] considered BHCl 2 a by-product of the gas phase. Liu et al. [15] systematically studied the relationship between deposition morphology and processing conditions. They found that temperature has an effect on the phase composition and microstructure. At higher temperatures (1050-1100 ℃), the deposits consisted of two interlaced phases (PyC and boron carbide) with low boron content ranging from about 40 to 49 at%. At lower temperatures (900-950 ℃ ), layered and dense B 4 C was produced with uniform phase composition, and its boron content is 65-79 at%.
It was considered that B x C exists as a single stable phase in a large homogeneous region from about 8 to 20 at% carbon concentration (the corresponding B/C ratio is from 11.5 to 4) [7], and then a carbon-rich B x C coating was obtained when B/C ratio was lower than 4 [15,16]. Thus, it is important to elucidate the correlation between the B/C ratio of the deposit and the deposition conditions to achieve reproducibility and consistency in the process. Apart from experimental studies, some numerical studies have provided an in-depth understanding of deposition kinetics [17][18][19], but scarce modeling attempts have been reported in boron carbide. Only Reinisch et al. [20] reported it by combining experimental and reaction kinetics computations, and brought strong evidences of the presence of methydichloroborane (MDB, BCl 2 CH 3 ) in the process.
This paper is concerned with the correlation between deposit composition and processing parameters, such as temperature, inlet gas concentration, pressure, and gas flow velocity. A novel approach that considers gas phase kinetics and heat and mass transfer, in combination with machine learning (ML), is proposed to better understand the actual deposition mechanism and design of the deposit composition. The approach was applied to a BCl 3 -CH 4 -H 2 system, which provides a useful foundation in understanding other precursor systems, such as SiCl 4 -BCl 3 -NH 3 -H 2 -Ar [21] and BCl 3 -MTS-H 2 [13].
This paper is organized as follows: In Section 2, the details of the reactor model (RM) and two machine learning (ML) methods, including error back propagation (BP) algorithm and support vector machine (SVM), are presented. In Section 3, ML was used to directly correlate the global processing parameters (temperature, inlet gas composition, total pressure, reactor inner diameter, and total flow rate) with the experimentally measured boron-carbon ratio of the deposits. However, no satisfactory functional relationship was found. Then, the proposed method was applied and validated by comparing the experimental data.

1 Reactor-scale model
The experimental data reported by Berjonneau et al. [13,14] and Liu et al. [15] are adopted in this study. The specific deposition parameters for their experiments are listed in Table 1. In Ref. [15], B 4 C and pyrolytic carbon (PyC) were deposited onto graphite slices (30 mm × 15 mm × 2 mm) [15]. Boron trichloride (BCl 3 ≥ 99.99 vol%) and methane (CH 4 ≥ 99.95 vol%) were used as the boron and carbon sources, respectively. The reactor was a vertical, hot-wall deposition furnace with a 200 mm inner diameter (a diagram of the equipment is shown in Fig. 1). The temperature within the reactor was 900-1100 ℃, and the pressure was fixed at 10 kPa. Berjonneau et al. [13,14] adopted the analogous CVD reactor, which was a vertical silica glass tube (700 mm in length and 34 mm in internal diameter) heated in its central part by a radio-frequency induction furnace. Their processing conditions are as follows: 850-1000 ℃, 2-12 kPa, and total flow rate of 210-390 sccm.
Because the reactor is cylindrical and symmetrical, its CFD model can be simplified to a two-dimensional axisymmetric model. The mass and heat transfer values were accounted via conservation of mass and conservation of momentum: where p is the total fluid pressure, μ is the hydrodynamic viscosity, ρ is the fluid density, u is the fluid velocity field, and I is the unit tensor. Note: Q is the total flow rate. In_BCl 3 , In_CH 4 , and In_H 2 are the input molar fractions of BCl 3 , CH 4 , and H 2 , respectively. ID is the reactor inner diameter and B/C is the boron-carbon ratio of the deposited products. The chamber was heated through the induction of the graphite layer by the radiation frequency (RF) coil to generate eddy currents. Heat transfer occurred mainly through surface radiation and convection conduction, and the electromagnetic phenomenon was neglected. The main governing equations are as follows: where p C is the fluid heat capacity, T is the deposition temperature of the fluid, and q is the Fourier heat conduction. The radiative transfer equation (RTE) is used to describe the transfer process of radiant heat. is the scattering coefficient. The intensity of directional light will be attenuated by scattering in different directions, and will be enhanced by radiation in different directions. Therefore, this radiation process is described by the following equation where the probability of ray scattering from the direction of   to the direction of  is described by the scattering phase function ( , ) kI represents the radiation of the medium in all directions, where b I is the intensity of the black body.
The most commonly used method for calculating the radiation transfer equation is the discrete coordinate method. The principle of this method is to calculate the ordinate component in the discrete direction. Therefore, it is still necessary to solve the intensity I by calculating the partial differential equation on each discrete ordinate: where i S is the discrete ordinate in the i-th direction and j  is the orthogonal weight.
For multi-component diffusion, the balance of the i-th chemical species includes the contribution of diffusion, convection, and loss/production of species in K gas phase reactions.
Among them, i  and i M represent the mass fraction and molecular weight of species i, respectively. The total mass and mole fraction are , 1, , where ij D and T i D are the matrix of multi-component diffusion coefficient and multi-component thermal diffusion coefficient, respectively. T i D is calculated by gas kinetic theory, and specific information can be found in Ref. [22]. The following equation was used to calculate the binary diffusion rate (m 2 /s) [23]: where M is the molar mass,  is the minimum energy value of the characteristic length; subscripts A and B indicate binary diffusion gas species; and D  is the collision integral: To calculate the collision integral D Ω , we need to define the minimum energy values of the characteristic length and Lennard-Jones interaction potential, which are  (10 -10 m) and b / (K) k  , respectively. It is also necessary to provide a material dipole moment D  (Debye). The values were obtained from Ref. [24], databases, or experiments ( Table 2).  and b /k  of BCl 3 were approximated for the unknown coefficients of other boron-containing intermediates.
The density of the mixed gas is expressed by the ideal gas equation: where M is the molar mass of the mixed gas. The detailed chemical kinetic model that describes gas-phase reactions in CVD has been presented in several studies. Table 3 lists 62 reversible reactions adopted for the gas model in the present case. Ge et al. [25][26][27] focused precisely on the thermodynamics of Si-C-H-Cl systems. Moreover, the temperature conditions they studied are within the range of our research temperature. Therefore, for C-H-Cl systems (G1-G44 in Table 3), we preferentially adopted Ge's calculated data. Based on the most favorable reaction path previously proposed, Liu et al. [28] studied the reaction rate of the BCl 3 /CH 4 /H 2 gas-phase system. Their rate constants were calculated for temperatures within 200-2000 K and adopted G45-G58 [29]. Reinisch et al. [30] reported a set of theoretical experiments of the gas-phase decomposition of boron trichloride in the presence of hydrogen radicals, and their reactions are also included (G59-G62 in Table 3).

2 ML
With the great success of data-driven modeling, ML has received increasing attention [31][32][33][34]. Considering the production of the variety of intermediate species and the complexity of the deposition process of the CVD-B x C, we combined a reactor model (RM) with ML, as shown in Fig. 2 . i X represents the input data set, and i t is the output data set.
Error back propagation (BP) algorithm and support vector machine (SVM) are two commonly used machine learning algorithms. They have many applications in material studies [35,36]; thus, they are applied in ML. The BP algorithm is composed of two processes: forward calculation of data stream (forward propagation) and backward propagation of error signal. In forward propagation, the propagation direction is from the input layer to the hidden layer and then to the output layer. The state of each layer of neurons only affects the next layer of neurons. However, if the actual output does not match the expected output, the process of back propagation of errors is then entered. Error back propagation involves passing the output error back to the input layer through the hidden layer, layer by layer, apportioning the error to all units of each layer, and using the error signal obtained from each layer as the basis for adjusting the weight of each unit. Through these two processes alternately, the error function gradient descent strategy is performed in the weight vector space, and a set of weight vectors are dynamically studied to achieve the minimum network error function value. Learning rate and learning error of BP neural network structure are set as 0.01 and 0.001, respectively. Considering the network accuracy and calculation time, three layers of BP neural network structure are selected, and the number of the hidden layer nodes is chosen as 10 by trial-and-error techniques. The N arbitrary samples ( , ) i i X t were entered into the BP neural algorithm program, and the specific steps are as follows: 1) Initialize the network, and randomly assign each connection weight and threshold values , i t r  ; 2) Compute hidden layers from a given input-output mode pair; 3) Calculate new connection weights and thresholds; 4) Select the next input mode pair to return to the second step, and repeat the training until the network output error reaches the required training.
The SVM algorithm is very powerful; it not only supports linear and nonlinear classification but also linear and nonlinear regression [37]. In the process of determining the hyperplane with the largest geometric interval, only the sample points closest to the hyperplane play a role. Such sample points are called support vectors. This classification model is also called support vector machine. In practical problems, data is usually not linearly separable in a multidimensional space; that is, in the input space where the data is located, there is no hyperplane that can complete the required classification. A feasible solution is to apply kernel techniques to map data from the input space to a higher dimensional space through a specific function and look for hyperplanes in the higher dimensional space. We call this space a feature space. Because the input space is mapped to a higher dimensional feature space through specific mapping, the amount of calculation in the higher dimensional space will increase significantly, and the computational complexity will also increase significantly.
In the SVM regression, the system attempts to fit as many data as possible into the interval while limiting the margin violation. To reduce the amount of calculation, on the premise that the calculation of the support vector machine involves only the inner product calculation, a kernel function is introduced to convert the inner product calculation in the feature space into a non-linear transformation of the inner product operation of the data in the input space. SVM needs to adjust the relevant parameters, mainly penalty parameter (denoted as C) and nuclear function parameter (denoted as G), to get a better predictive accuracy. C demonstrates how much the data range could be adapted for data fitting, the greater the C value, the wider the data range, and thus lead to overfitting more likely. G implicitly determines the distribution of the data after mapping to the new feature space. The larger the G, the less the support vector. The number of support vectors affects the speed of training and prediction. With regard to the optimization of C and G, the generally accepted method is to search the optimum values in a certain range. Here C and G are considered as 2.3 and 4 by grid-search techniques, respectively.
It is essential to assess the error during prediction in order to evaluate the performance of BP and SVM models. This could be carried out by comparing the model predicted value and the experimentally measured values. In this regard, mean absolute percentage error (MAPE) and mean square error (MSE) were employed to assess the closeness of prediction values and measured values.
where i E is the measured experimental value, i P indicates the ML model predicted result, and n refers to the total number of sample points.

1 Temperature field distribution
The temperature distribution of the reactor is shown in Fig. 3. The results show that the temperature distribution in the reactor shows a gradient distribution, and the temperature around the deposition substrate is the highest, which is consistent with the actual processing temperature. The temperature distribution in the middle deposition area is relatively uniform (isothermal zone). Due to the size effect of the reactor and the cooling effect of the gas, the temperature gradient near the inlet and outlet of the isothermal zone was relatively large. The precursor was gradually heated and decomposed

2 Distribution of intermediate species
The distribution of the molar fraction of the intermediate species in the reactor is shown in Fig. 4. The concentration of BCl 3 dropped sharply as it approached the substrate, which was consistent with the temperature distribution trend inside the reaction chamber. The main decomposed gas species of BCl 3 in the reaction chamber were BHCl 2 and HCl, and the trends of BHCl 2 and HCl were the same, which was consistent with the main reaction BCl 3 + H 2 → BHCl 2 + HCl [13]. BHCl 2 was considered an intermediate and/or by-product, which were formed by hydrogen reduction of BCl 3 , according to thermodynamic analysis and experimental work [13,38]. Berjonneau et al. [13] showed that in the case of a homogeneous reaction, CH 4 still exists in a large amount at high temperatures, and in the case of a heterogeneous reaction, CH 4 will decompose to form boron carbide and graphite as a carbon source. The above situation agrees with our prediction. In addition, we found that some intermediate gas phase components, such as BCl 2 , BCl, BH 2 Cl 2 , CHBCl 2 , CHBCl, CH 3 , C 2 H 3 , C 2 H 5 , and C 2 H 6 , have lower molar fractions, indicating that they do not play a dominant role in the B/C ratio of the deposited product during the homogeneous reaction. In addition, Reinisch et al. [20] proved that MDB (CH 3 BCl 2 ) was a major intermediate component by FTIR apparatus. Correspondingly, Fig. 4 shows MDB is the third important boron intermediate whose concentration is only lower than BHCl 3 and BHCl 2 . The results obtained from Reinisch et al. [20] and our model agrees well with each other. Fig. 4 Plots of the gas-phase compositions for (a) boron and (b) carbon elements along the reactor height for deposition temperature of 1000 ℃ , total pressure of 12 kPa, total gas flow rate of 210 sccm, In_BCl 3 is 2/7, and In_CH 4 is 1/7. Figure 5 shows the predicted mole fractions of important species at different temperatures, with a constant feed gas composition and inlet flow rate. The concentrations are the mean values at the substrate surface. Only the major chemical species having a molar fraction larger than 10 −4 are shown. Low temperature (900 ℃) leads to less decomposition of BCl 3 and CH 4 . BHCl 2 has the highest concentration compared to other boron-containing intermediate species. The decomposition of CH 4 is very sensitive to temperature. Remarkably, at 950 ℃, the concentration of hydrocarbon increased with temperature, e.g., C 2 H 2 and C 2 H 4 . The concentration of CH 4 slightly decreased with increasing temperature, and C 2 H 4 became the most abundant carbon intermediate. Further, C 2 H 2 became the second largest hydrocarbon species at higher temperatures. The shift from CH 4 to C 2 H 2 at high temperatures has also been found in hydrocarbon cracking chemistry (C-H system, without B nor Cl). These results demonstrate that the major hydrocarbon species are CH 4 and C 2 H 4 , and the major species containing boron are BCl 3 , BHCl 2 , BHCl 3 , and CH 3 BCl 2 for a typical CMC processing temperature. Our predictions are consistent with Berjonneau's experimental results and thermodynamic calculation [38].

3 Predicted boron to carbon ratio
Firstly, BP and SVM were used to directly correlate the deposition atomic ratio with deposition conditions (temperature, input gas ratio, total pressure, reactor inner diameter, and flow rate), without the RM. Figures  6 and 7 show that the functional relationship between them is not very good (R 2 = 0.938). The average values of MSE and MAPE are approximately 25% and 30%, respectively.
Next, an RM was established to predict the molar fraction of the intermediate gas species generated in various deposition processes. Then, we used BP and SVM to relate the boron-carbon ratio values of MSE and MAPE were around 6.8% and 8.2%, respectively. Figures 8 and 9 show that the experimental measured results and predicted values are close to overlapping (R 2 = 0.975). As shown in Table 5, after inputting the intermediate gas species predicted by RM into the SVM algorithm, the prediction error was reduced from about 25% to 7%. Compared with the BP algorithm, the prediction of SVM was better. The prediction of SVM was determined by the algorithm itself and affected by the sample size. In the case of small sample data, SVM is recommended.
One important application of the ML approach is to interpolate the correlation of B/C ratio vs. experimental control parameters. Given the fixed processing parameters, we calculate the B/C ratio with temperature in Fig. 10. Our model predicts that the B/C ratio attains a minimum at ~950 ℃ and then increases to the maximum at 1100 ℃. Liu et al. [15] found that even at the same gas ratio, the atomic composition obtained at different deposition temperatures is different. Liu et al. [15] and we both reach the same conclusion that it is not possible to simply establish a relationship between the gas ratio and the deposition result.

4 Sensitivity coefficient analysis
After the training session, sensitivity analysis was performed with SVM. Sensitivity coefficients were calculated with respect to the B/C: where 0 v is the predicted deposition B/C by SVM training model when the molar fraction of the i-th gas species is fixed as zero, and the concentrations of the other species remain unchanged; v is the original deposition rate predicted by SVM training model. Therefore, "sensitivity" indicates the degree of influence on the predicted ratio of the components of the deposited product when the studied species is removed. Sensitivity analysis can be used to identify core species in a deposition mechanism. Obviously, a larger S(k) value for a certain substance indicates that it plays a dominant role in the deposition process.
Corresponding to 900, 1000, and 1100 ℃ , we selected samples 2, 7, and 15 to study S. The sensitivity analysis results (Fig. 11) show that the intermediate species with molar fractions that limit the B/C ratio are BCl 3 , CH 4 , BHCl 2 , BHCl 3 , CH 3 BCl 2 , C 2 H 4 , C 2 H 2 , CH 4 , HCl, and H 2 . The sensitivity coefficients of other species were below 0.01, which suggested they minimally influence the B/C ratio. BHCl 2 and BCl 3 provide the most productivity for boron atoms, while C 2 H 4 and CH 4 are the main sources of carbon atoms. The S of CH 4 was the largest, which is consistent with the deposition mechanism proposed by Vandenbulcke [39]. Lartigue et al. [40] also indicated that the production of CH 4 is a predominant step toward deposition. The intermediate BHCl 2 was found to be very important to the deposition process, which is also consistent with Vandenbulcke's experimental results [39].
To compare with the effects of boron-and carboncontaining intermediate species on the composition of deposited products, the S of the main boron-and carbon-containing intermediate species were separately calculated and then compared in Fig. 12. The S of the boron-containing intermediate species increased with temperature. On the contrary, the S values of carboncontaining intermediates were larger at 900 ℃, and the values were higher than that of boron-containing species. This is consistent with the reported mechanism, which states that methane decomposes less as compared Fig. 9 Predicted B/C with reactor scale (input variables are T, P, BCl 3 , CH 4 , BHCl 2 , BHCl 3 , BCl 2 , HCl, and H 2 ).

Fig. 10
Plots of the B/C ratios with deposition temperature, total pressure is 12 kPa, total gas flow rate is 210 sccm, In_BCl 3 is 2/7, and In_CH 4 is 1/7.  to boron trichloride at low temperatures. This result is also consistent with the reported thermodynamic calculation results [38]. Therefore, the deposit composition is limited by methane at low temperatures. At 1000 and 1100 ℃ , the decompositions of the boron-and carbon-containing intermediate species were similar, which demonstrate a competitive deposition mechanism. Thus co-deposition occurs, and the B/C ratio of the deposits gradually decreased with temperature. We concluded the main factor controlling the boron-carbon ratio is temperature. Methane is the controlling gas species of the deposit compositions at low temperatures (~900 ℃), while methane and boron trichloride both affect the deposit compositions at higher temperatures (> 900 ℃).

Conclusions
1) Based on the reported experimental results, we established a simplified two-dimensional reactor model, which couples gas-phase reaction, as well as heat and mass transfer, for a BCl 3 -CH 4 -H 2 system.
3) Comparing ML techniques without RM, the prediction error of ML techniques incorporated with RM reduced from about 25% to 7%. 4) Sensitivity analysis shows that, with the exception of temperature, CH 4 is the key factor in controlling the deposit compositions at low temperatures, while CH 4 and BCl 3 are both important at high temperatures.
In summary, the combination of RM and ML allows for the prediction of the boron-carbon ratio of the deposited product within a reasonable error range (less than 10%). This method can also be applied to other multi-element CVD systems.