Parameter estimation of an ion-exchange process based on a pseudo probability measure

The manufacturing of integrated optical waveguide components can be complex if the design space is highly multivariate or constraints introduced by the process itself are not well-known. Numerical simulation helps to create a basic understanding of the process and the resulting components with their features but is costly with respect to time and hardware resources. The estimation approach in this work introduces a precise, numerically robust and controllable method without any black boxes to estimate the input parameters of the manufacturing ion-exchange process. Therefore, small number of simulation samples is used to determine basic dependencies of desired waveguide component features on the input parameters. A pseudo probability measure is then used to estimate the combination of process parameters that is most likely to result in the required features. The feature data is detached from the sample size of the simulation data with a multidimensional curve fit in order to debloat the estimation model. The implications of the estimation are wide-ranged and not restricted to the physics of the ion-exchange process, i.e. easy to be abstracted to other models. The estimation enables parameter subspace optimization or clustering of solution. Model benchmarks show a very high estimation rate for all types of considered applications.


Introduction
For manufacturing integrated waveguide components by diffusion, a metallic mask is grown on a substrate material. The immersion in an ionic salt melt results in a characteristic change of silver ion concentration in the areas of the mask opening and thus in an altered refractive index (Ramaswamy and Srivastava 1988;Tervonen et al. 2011). With a given set of process parameters, the ion-exchange process can be accurately modeled by Fick's law. Although semi-analytic modeling approaches help with the forward simulation (Roth et al. 2018a(Roth et al. , 2019, three-dimensional simulations remain both resource and time intensive. However, practical applications mostly require the estimation of the process parameters prior to the manufacturing based on specifically desired features of the resulting components. Conventional approaches for the determination of the desired features mostly imply forward simulation approaches. In order to ensure that certain requirements are met, a fundamental understanding of the manufacturing process is necessary, some of which can be gained by means of simulation (Roth et al. 2018b(Roth et al. , 2019 or by experience, but costly with respect to time and/or hardware resources. There are, of course, alternatives to 'brute force' design space sweeping that rely on general optimization techniques, metaheuristic algorithms or neural networks (Zhang et al. 2014). But there are also problems that arise from these approaches. Also, many of these approaches are black boxes because more often than not the direct impact of the parameters of the algorithms on depending variables is obscure or of little significance. Thus, controlling individual aspects of the algorithms can prove to be difficult. This is especially true for techniques that only consider a subspace of the design space or the according co-domain. Furthermore, there are many differences with respect to data treatment. The modeling data can be extensive or might need to be filtered. Also data clusters might be of relevance. The estimation approach presented in this work implies a pseudo probability measure and solely relies on basic mathematic procedures. The three main steps of the approach presented in this work are data characterization, numerical compression and estimation, each of which can be adjusted to the specific needs of the problem. This means that all calculation procedures and also their impact on any depending variable can be directly evaluated with respect to robustness, error and performance. This makes the estimation process open, controllable and debuggable at any given step.

Manufacturing process
Integrated optical waveguide components in electro-optical printed circuit boards (EOCB) can be manufactured by thermal ion-exchange processes. Therefor, a metallic mask is grown on the substrate material. The shape of the mask can be designed arbitrarily-its accuracy is consequently only limited by the tolerances of the manufacturing process. Figure 1 shows the schematic structure of the manufacturing process. By the application of a diluted salt melt at a specific temperature, Ag + ions diffuse into the substrate replacing Na + ions in the borosilicate glass substrate. The manufacturing process of integrated components can thus be divided into four basic steps accordingly: (1) Metallic coating, (2) Mask structuring, (3) Application of the diluted salt melt and (4) Mask removal and surface treatment. These steps lead to a specific profile of silver ion concentration in the area of the mask opening. The silver ions are also present in the area under the mask structure, which is referred to as side diffusion. For thermal ion-exchange processes, this effect increases with temperature. By specifically choosing the process parameters such as width of the mask or an arbitrary mask structure respectively, diffusion times, temperature, which implies the diffusion coefficient D of Ag + ions, and the initial ion concentration in the salt melt, the ion-exchange process and the resulting profile respectively can be accurately described by Fick's law where t is the diffusion time, D is the self-diffusion coefficient of Ag + ions in the glass substrate and c is the Ag + concentration in the substrate. The resulting Ag + concentration in the substrate can be calculated by solving Eq. (1). Evaluating a two-or three-dimensional geometry can only be done numerically. Here, both an efficient algorithm based on the Finite-Element-Method (FEM) as well as semi-analytical modeling tools were used (Roth et al. 2018a). As mentioned above, the geometric shape of the mask structure can generally be designed arbitrarily. However, the exact mask structure for the realization of a specific value for the mask opening depends on the type of lithographic manufacturing technique.

Feature extraction
From a practical point of view, it is most desirable to define specific feature or feature values of the components and to set up the manufacturing process accordingly. However, the nature of a highly multivariate design space often restricts a comprehensive understanding of global feature dependencies on the process parameters. While the forward simulation does provide a limited understanding of the relation between feature values and input parameters, the required brute force sweep of the design space is certainly limited by large computation times or finite physical memory. This also holds for the nature of the thermal diffusion process because its design space is too large to find the right parameter combination by sweeping through a sufficient and representative number of input parameter combinations. This leads to the need of a more efficient model to predict or estimate the input parameters of the manufacturing process based on the desired feature values of a (1) c t = ∇(D∇c),

Metallic Mask
Silicate glass (Na 2 O) Diluted salt melt (AgNO 3 ) Fig. 1 Schematic design of the manufacturing process. The diluted salt melt (AgNO 3 ) is applied to the borosilicate glass substrate (Na 2 O) at the area of the mask opening. A characteristic change of silver ion concentration occurs in this area which leads to a corresponding change of the refractive index. Silver ions are also present in the area under the mask structure, which is referred to as side diffusion component to avoid time-consuming simulation work and preferably without having to make educated guesses for initial values or solution subspaces. Figure 2 depicts the multivariate design space. A chosen set of process parameters yields the according concentration profile from which relevant features can be derived. This leads to a fundamental problem because especially the non-linear saturation effects of the diffusion process create ambiguities in the feature characterization process. In other words, the mapping from design space onto the co-domain, i.e. the concentration profile, is not bijective and multiple combinations of process parameters can and do result in one or even multiple matching features for the very same input parameter combination. The features used for the purpose of parameter extraction are shown in Fig. 3. The profile is therefore characterized by its maximum concentration c max , the diffusion depth at c max as well as by its height and width. Furthermore, the process parameters of the diffusion process are the initial concentration c 0 of silver ions in the salt melt, diffusion times t in , t out and the mask width w as listed in Table 1. The range of these input parameters represents the design space sweep, which is used to characterize the features. In order to cover all types of potential applications for the

Fig. 3
Exemplary concentration profile and the defined features for the purpose of parameter estimation, which are maximum concentration c max , diffusion depth at c max as well as height and width of the profile resulting components, the range of the input parameters has to be extensive. The different combinations allow single-and multimode waveguides with low and high index changes respectively. The choice of this wide input parameter range can thus be understood as worst case scenario and leaves plenty of room for improvements if the requirements of the applications limits the input parameter range a priori, e.g. in the case of single-mode waveguides. Consequently, limiting the input parameter range will improve the accuracy of the estimation, mainly since the error of the parameter compression can be reduced. Extending the input parameter range might reduce the accuracy of the estimation. The basis for the estimation in this work is the determination of the dependencies between waveguide component features and process parameters, which can be done with a small, limited number of simulations. These 1560 sampling points for two-dimensional waveguide geometries were used to create a data set for feature characterization and can be simulated in a matter of hours. In general, the dependencies of feature values and input parameters are ambiguous and therefore a single feature is not sufficient for the estimation of all input parameters. The feature extraction results in a four-dimensional matrix per feature. In the first step, two process parameters are selected as primary-in this case the diffusion times t in and t out , which were chosen because their combination is the most ambiguous (Roth et al. 2020), which means that, for example, any maximum concentration c max can be reached by just varying the two diffusion times. The results then differ, however, in other features, e.g. with respect to diffusion depth. The feature characterization has been carried out at a constant temperature of T = 250 • C, which leaves four process parameters. While the general model is not limited to a specific choice of temperature, it would add an additional dimension to the estimation process. With the remaining four process parameters we can determine four matrices as four-dimensional feature matrices, which hold the feature values for each combination of input parameters. The symbols (⋆, #, * , ⋄) indicate the respective feature.

Parameter compression
The creation of the feature matrices F is based on simulation samples. This is problematic since the size of the matrices and consequently the numerical complexity of the estimation model is linked to the extent of design space sampling used for the generation of the feature matrices. A solution can be a 'parameter compression'. Therefor, the feature matrices are curve fitted over the two primary process parameters t in and t out . In this case, this can be done efficiently with a 'poly22' fit, where the base functions for the fit are polynomial (2) F ⋆,#, * ,⋄ function of second degree. The complete approximation function is a broken rational function consisting of a polynomial base function in both the nominator and the denominator. Taking a closer look at the dimensionality, the feature matrix for every individual feature is a function of the according simulation sample vectors p i . For an exemplary ⋆-feature, the dimensions yield This means that the lengths of the simulation sample vectors p i are m, n, o and p for the according parameters w, t in , t out and c 0 . The poly22 fit of the feature matrix over the primary process parameters t in and t out yields six coefficient matrices per feature matrix with the dimensions m × p , which corresponds to the sample vector lengths of the diffusion times.
This results in a total number of n × o coefficient matrices of the dimension m × p for each of the six individual coefficients and per feature. The amount of six coefficient matrices is a result of the poly22 fit that is used. As a result of this first step, i.e. after the first fitting/ interpolation, the model now is detached from the sample size of the two primary process parameters. To fully detach the model from the length of the simulation sample vectors, the coefficient matrices can be fitted a second time over the secondary process parameters w and c 0 .
The length of the coefficient vector d depends on the type of fit used for this second step. Here, a higher order interpolation with 25 coefficients is used. While it is possible to also use simpler polynomial fits for some features, the resulting impact on fitting errors has not yet been investigated extensively. The interpolation scheme yields 25 coefficients for each of the six coefficients of the polynomial fit per feature. Thus, each feature can be represented by a total of 150 coefficients, which can potentially be reduced even further by choosing a simpler fitting scheme. Nonetheless, this number is completely independent of the initial sample size of the forward simulation. Because of this detachment, the fitting results are now referred to as 'double-fitted' or 'compressed' model.

Parameter estimation
The estimation of the process parameters is done by evaluating the compressed model reversely. The four-dimensional feature dependencies, i.e. the feature matrices, are generated with the approximation functions of the two fitting steps and the according coefficients of the compressed, double-fitted model. At this point, a testing set of known or desired feature values must be provided. For each feature and all combinations of secondary process parameters a deviation f is calculated as a function of the primary process parameters which describes a difference between generated feature value and testing set.
Similar to the generation of the model, this deviation f needs to be calculated for each feature, thus, the feature index. Essentially, it describes the numerical mismatch between the probing value in f probe and the values in the according feature matrix. If the mismatch between probing value and feature value is small for a combination of process parameters, the deviation is also small and vice versa. The constant in Eq. (6) serves the numerical purpose to avoid a deviation of value 0 if feature value and probing values do match exactly. This deviation is used to compute a pseudo probability measure P f , which depends on the two primary process parameters t in and t out . Since the values of P f are not actual probabilities in a strictly mathematical definition of the term, the item is to be understood as a measure of likelihood, hence, pseudo probability.
Hence, the more the modeled feature value and the value of the testing set diverge the higher the deviation f becomes resulting in a smaller probability for this particular combination of process parameters. The estimation is exemplarily illustrated in Fig. 4. Each feature, with its deviation f , leads to an individual probability distribution. When the probability values for each feature and all m × p combinations of secondary process parameters are calculated, they are normalized and superposed. By this superposition of feature probabilities, the most probable combination of primary process parameters and hence of all process parameters is found. It should be noted that the definition of the deviation f and thus also P f as a function of the primary process parameters is done arbitrarily to match the order of the model compression. An alternate separation of the input parameters, i.e. a permutation of their order does not have any influence on the probability values of the estimation results, since only the sequence of calculation is interchanged. Figure 5 shows the distribution of the pseudo probability measure for a testing set of feature values for all stages of the estimation process-simulation data (uncompressed), single fit (semi-compressed) and double fit (fully compressed). The combination of process parameters that has been identified correctly by all steps of the estimation process is w = 3 μ m, t in = 1.8 × 10 3 s, t out = 3.24 × 10 3 s, c 0 = 0.01 mol/m 3 . Figure 5a shows the probability distribution for the uncompressed simulation data. Expectedly, the contrast of probability between the correct set of process parameters and all other, in this context irrelevant sets is at a maximum. After the first compression step, which is the first curve fit of the feature matrices, the distribution is spread in the plane of the primary process parameters t in and t out . This makes sense, because the feature matrices are fitted over the primary process parameters. The error introduced by the first fit, thus, mainly causes variance for these parameters. Figure 5b illustrates the according probability distribution. The contrast of probability is still sufficient to correctly identify the correct set of input parameters. Lastly, Fig. 5c shows the probability distribution for the double-fitted, fully compressed

Benchmarking
For the practical application of the estimation model, its accuracy has to be determined on a comprehensive scale. Therefor, the model was benchmarked with a large number of random probing samples. The probing samples where derived from the simulation data and the results of the estimation model were then compared directly to the input parameters that were used for the generation of the probing samples. Table 2 shows the results of the global benchmarking of the superposition of all features. Unsurprisingly, all of the input parameters are correctly estimated when the simulation data is used as basis for the estimation process. However, since the simulation data only consists of the original sample Pseudo probability measure p p depending on all primary and secondary process parameter. Probability distributions of p p for all stages of the estimation process are illustrated-simulation data (uncompressed), single fit (semi-compressed) and double fit (fully compressed). Numerical compression artifacts caused by the curve fit are highlighted exemplarily. The combination of process parameters that has been identified correctly by all three steps of the estimation process is w = 3 μ m, t in = 1.8 × 10 3 s, t out = 3.24 × 10 3 s, c 0 = 0.01 mol/m 3 points used for feature characterization, i.e. is not interpolated by the parameter compression process, only these can be the output of the estimation model. This is partially true for the semi-compressed data set since the primary process parameters are interpolated and changes completely with the fully compressed data set in which the results for the input parameters can attain any value within the total range of the original sampling points. One thing to be pointed out in particular is the relatively low matching rate of only 44.23 and 6.35% for the semi-and fully compressed model if a numerically exact match of values is specified. This changes, if a physically irrelevant error bound is added to the matching algorithm. The detection rates increase to 75.19 and 90.06%. If the boundary values of the original sampling grid are not taken into account, the matching rate increases further, yet only insignificantly in case of the fully compressed data set. Table 3 shows a more detailed overview of the matching rates with respect to the individual input parameters. The global benchmarking results of Table 2 ultimately consist of the superposition of the individual matching rates listed in Table 3.
Obviously, the characteristics observed in Table 2 reflect the general behaviour of the individual matching rates here. The weak points with respect to complete, correct parameter estimation are diffusion time t in and the initial concentration c 0 . This can be explained by considering the dependency of the two input parameters on the chosen features. Both input parameters only have a weak dependence on most of the features. This is due to the fact that large parts of the input range of t in and c 0 do not impact feature like c max or the profile width, for example, because of non-linear saturation effects of the diffusion process. Certainly, this can be fundamentally different if another process temperature is chosen. An approach to optimize the matching rate for these two input parameters is the definition of another feature, possibly as substitute or as addition. Most important should be a strong feature dependence on the according input parameters to ensure an optimal matching rate. Figure 6 gives an impression of the probability distribution of the estimated input parameter combinations of the fully compressed model. This is shown for all input parameter combinations in Fig. 6a and also only for correctly estimated combinations thereof in Fig. 6c. From the direct comparison it becomes evident that the average maximum probability calculated by the estimation is significantly higher for correctly estimated combinations. Furthermore the according difference in probability to the second most likely combination of input parameters (contrast) is shown in Fig. 6b for all input parameters and also, again, only for correctly estimated combinations thereof in Fig. 6d. The same basic characteristic can be observed as with the previous comparison. The mean of the distributions increases, i.e. the average contrast value is higher for correctly estimated combinations of input parameters. When considering all input parameters, 35% of all combinations can be characterized by a contrast of under 1% whereas only 24% of the correctly estimated combinations share this property. Generally, the low absolute contrast values show that multiple similarly probable combinations can occur, yet the matching rates are satisfyingly high. If evaluated properly, this is a great way of identifying clusters of possible solutions that might be worth investigating. Lastly, both probability and contrast can be further increased if the individual feature probabilities described in Table 3 are maximized.

Conclusion
The estimation of the process parameters of the thermal ion-exchange process yields excellent results with respect to the unambiguous estimation of primary and secondary process parameters for all estimation steps. The relations between characterized features of the concentration profile and the primary process parameters greatly benefit the simplification of the estimation process. Furthermore, the implications of these results open up additional approaches in practical component design. If certain process parameters are fixed due to specific hardware or other external specifications, the subspace of remaining parameters can be estimated in order to find an optimal solution. Also, the visualization of the probability measure reveals clusters of possible solution that might be worth investigating further. Despite the non-optimized fitting for the parameter compression technique and the wide input parameter range, the estimation yields an excellent matching rate of over 90% and can be improved even further with the addition or substitution of a more suitable type of feature or the restriction to certain subspaces if the application allows it. The excellent performance of this approach is directly related to strong feature dependencies on the input parameters of the ion-exchange process. Certainly, the general idea of the approach is not limited to the thermal ion-exchange process and can easily be abstracted to other physics and models. Its utilization can prove to be useful where depending variables or desired characteristics are not easily to be determined because of a technologically complex manufacturing process or a resource intensive numerical simulation, which implies many applications involving component design. However, the estimation relies on strong feature dependencies on the input parameters. Oscillations, saturation effects or other specific nonlinearities in the feature dependencies could limit the effectiveness of the estimation. Thus, the application to other physics and models is to be determined individually.
Funding Open Access funding enabled and organized by Projekt DEAL. The authors have not disclosed any funding.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.