1 Introduction

For manufacturing integrated waveguide components by diffusion, a metallic mask is grown on a substrate material. The immersion in an ionic salt melt results in a characteristic change of silver ion concentration in the areas of the mask opening and thus in an altered refractive index (Ramaswamy and Srivastava 1988; Tervonen et al. 2011). With a given set of process parameters, the ion-exchange process can be accurately modeled by Fick’s law. Although semi-analytic modeling approaches help with the forward simulation (Roth et al. 2018a, 2019), three-dimensional simulations remain both resource and time intensive. However, practical applications mostly require the estimation of the process parameters prior to the manufacturing based on specifically desired features of the resulting components. Conventional approaches for the determination of the desired features mostly imply forward simulation approaches. In order to ensure that certain requirements are met, a fundamental understanding of the manufacturing process is necessary, some of which can be gained by means of simulation (Roth et al. 2018b, 2019) or by experience, but costly with respect to time and/or hardware resources. There are, of course, alternatives to ‘brute force’ design space sweeping that rely on general optimization techniques, metaheuristic algorithms or neural networks (Zhang et al. 2014). But there are also problems that arise from these approaches. Also, many of these approaches are black boxes because more often than not the direct impact of the parameters of the algorithms on depending variables is obscure or of little significance. Thus, controlling individual aspects of the algorithms can prove to be difficult. This is especially true for techniques that only consider a subspace of the design space or the according co-domain. Furthermore, there are many differences with respect to data treatment. The modeling data can be extensive or might need to be filtered. Also data clusters might be of relevance. The estimation approach presented in this work implies a pseudo probability measure and solely relies on basic mathematic procedures. The three main steps of the approach presented in this work are data characterization, numerical compression and estimation, each of which can be adjusted to the specific needs of the problem. This means that all calculation procedures and also their impact on any depending variable can be directly evaluated with respect to robustness, error and performance. This makes the estimation process open, controllable and debuggable at any given step.

2 Manufacturing process

Integrated optical waveguide components in electro-optical printed circuit boards (EOCB) can be manufactured by thermal ion-exchange processes. Therefor, a metallic mask is grown on the substrate material. The shape of the mask can be designed arbitrarily—its accuracy is consequently only limited by the tolerances of the manufacturing process. Figure 1 shows the schematic structure of the manufacturing process. By the application of a diluted salt melt at a specific temperature, \(\hbox {Ag}^+\) ions diffuse into the substrate replacing \(\hbox {Na}^+\) ions in the borosilicate glass substrate. The manufacturing process of integrated components can thus be divided into four basic steps accordingly: (1) Metallic coating, (2) Mask structuring, (3) Application of the diluted salt melt and (4) Mask removal and surface treatment. These steps lead to a specific profile of silver ion concentration in the area of the mask opening. The silver ions are also present in the area under the mask structure, which is referred to as side diffusion. For thermal ion-exchange processes, this effect increases with temperature. By specifically choosing the process parameters such as width of the mask or an arbitrary mask structure respectively, diffusion times, temperature, which implies the diffusion coefficient D of \(\hbox {Ag}^+\) ions, and the initial ion concentration in the salt melt, the ion-exchange process and the resulting profile respectively can be accurately described by Fick’s law

$$\begin{aligned} \frac{\partial c}{\partial t} = \nabla (D\nabla c), \end{aligned}$$
(1)

where t is the diffusion time, D is the self-diffusion coefficient of \(\hbox {Ag}^+\) ions in the glass substrate and c is the \(\hbox {Ag}^+\) concentration in the substrate. The resulting \(\hbox {Ag}^+\) concentration in the substrate can be calculated by solving Eq. (1). Evaluating a two- or three-dimensional geometry can only be done numerically. Here, both an efficient algorithm based on the Finite-Element-Method (FEM) as well as semi-analytical modeling tools were used (Roth et al. 2018a). As mentioned above, the geometric shape of the mask structure can generally be designed arbitrarily. However, the exact mask structure for the realization of a specific value for the mask opening depends on the type of lithographic manufacturing technique.

Fig. 1
figure 1

Schematic design of the manufacturing process. The diluted salt melt (AgNO\(_3\)) is applied to the borosilicate glass substrate (Na\(_2\)O) at the area of the mask opening. A characteristic change of silver ion concentration occurs in this area which leads to a corresponding change of the refractive index. Silver ions are also present in the area under the mask structure, which is referred to as side diffusion

3 Feature extraction

From a practical point of view, it is most desirable to define specific feature or feature values of the components and to set up the manufacturing process accordingly. However, the nature of a highly multivariate design space often restricts a comprehensive understanding of global feature dependencies on the process parameters. While the forward simulation does provide a limited understanding of the relation between feature values and input parameters, the required brute force sweep of the design space is certainly limited by large computation times or finite physical memory. This also holds for the nature of the thermal diffusion process because its design space is too large to find the right parameter combination by sweeping through a sufficient and representative number of input parameter combinations. This leads to the need of a more efficient model to predict or estimate the input parameters of the manufacturing process based on the desired feature values of a component to avoid time-consuming simulation work and preferably without having to make educated guesses for initial values or solution subspaces. Figure 2 depicts the multivariate design space. A chosen set of process parameters yields the according concentration profile from which relevant features can be derived. This leads to a fundamental problem because especially the non-linear saturation effects of the diffusion process create ambiguities in the feature characterization process. In other words, the mapping from design space onto the co-domain, i.e. the concentration profile, is not bijective and multiple combinations of process parameters can and do result in one or even multiple matching features for the very same input parameter combination. The features used for the purpose of parameter extraction are shown in Fig. 3. The profile is therefore characterized by its maximum concentration \(c_\text {max}\), the diffusion depth at \(c_\text {max}\) as well as by its height and width. Furthermore, the process parameters of the diffusion process are the initial concentration \(c_0\) of silver ions in the salt melt, diffusion times \(t_\text {in}\), \(t_\text {out}\) and the mask width w as listed in Table 1. The range of these input parameters represents the design space sweep, which is used to characterize the features. In order to cover all types of potential applications for the resulting components, the range of the input parameters has to be extensive. The different combinations allow single- and multimode waveguides with low and high index changes respectively. The choice of this wide input parameter range can thus be understood as worst case scenario and leaves plenty of room for improvements if the requirements of the applications limits the input parameter range a priori, e.g. in the case of single-mode waveguides. Consequently, limiting the input parameter range will improve the accuracy of the estimation, mainly since the error of the parameter compression can be reduced. Extending the input parameter range might reduce the accuracy of the estimation. The basis for the estimation in this work is the determination of the dependencies between waveguide component features and process parameters, which can be done with a small, limited number of simulations. These 1560 sampling points for two-dimensional waveguide geometries were used to create a data set for feature characterization and can be simulated in a matter of hours. In general, the dependencies of feature values and input parameters are ambiguous and therefore a single feature is not sufficient for the estimation of all input parameters. The feature extraction results in a four-dimensional matrix per feature. In the first step, two process parameters are selected as primary—in this case the diffusion times \(t_\text {in}\) and \(t_\text {out}\), which were chosen because their combination is the most ambiguous (Roth et al. 2020), which means that, for example, any maximum concentration \(c_\mathrm {max}\) can be reached by just varying the two diffusion times. The results then differ, however, in other features, e.g. with respect to diffusion depth. The feature characterization has been carried out at a constant temperature of \(T = 250\,^\circ\) C, which leaves four process parameters. While the general model is not limited to a specific choice of temperature, it would add an additional dimension to the estimation process. With the remaining four process parameters we can determine four matrices

$$\begin{aligned} F_{\star ,\#,*,\diamond } \end{aligned}$$
(2)

as four-dimensional feature matrices, which hold the feature values for each combination of input parameters. The symbols \((\star ,\#,*,\diamond )\) indicate the respective feature.

Fig. 2
figure 2

With a chosen set of process parameters, the multivariate design space is mapped onto the co-domain, which is the concentration profile. The characterization of the profile itself yields the relevant feature values of the components

Fig. 3
figure 3

Exemplary concentration profile and the defined features for the purpose of parameter estimation, which are maximum concentration \(c_{\mathrm {max}}\), diffusion depth at \(c_{\mathrm {max}}\) as well as height and width of the profile

Table 1 Input parameters with range and output parameters (features)

4 Parameter compression

The creation of the feature matrices F is based on simulation samples. This is problematic since the size of the matrices and consequently the numerical complexity of the estimation model is linked to the extent of design space sampling used for the generation of the feature matrices. A solution can be a ‘parameter compression’. Therefor, the feature matrices are curve fitted over the two primary process parameters \(t_\mathrm {in}\) and \(t_\mathrm {out}\). In this case, this can be done efficiently with a ‘poly22’ fit, where the base functions for the fit are polynomial function of second degree. The complete approximation function is a broken rational function consisting of a polynomial base function in both the nominator and the denominator. Taking a closer look at the dimensionality, the feature matrix for every individual feature is a function of the according simulation sample vectors \({\underline{p}}_i\). For an exemplary \(\star\)-feature, the dimensions yield

$$\begin{aligned} F_\star ({\underline{p}}_i) = F_\star ^{(m\times n\times o\times p )}({\underline{p}}_i). \end{aligned}$$
(3)

This means that the lengths of the simulation sample vectors \({\underline{p}}_i\) are \(m,\, n,\, o\) and p for the according parameters \(w,\, t_\mathrm {in},\, t_\mathrm {out}\)and \(c_0\). The poly22 fit of the feature matrix over the primary process parameters \(t_\mathrm {in}\) and \(t_\mathrm {out}\) yields six coefficient matrices per feature matrix with the dimensions \(m\times p\), which corresponds to the sample vector lengths of the diffusion times.

$$\begin{aligned} \left. \begin{array}{ll} &{} C_{\star _{00}}^{(m\times p)}\\ &{} C_{\star _{01}} \\ &{} C_{\star _{01}} \\ &{} C_{\star _{11}} \\ &{} C_{\star _{02}} \\ &{} C_{\star _{20}} \end{array}\right\} \quad \text {Coefficient matrices with dimesion }m\times p \end{aligned}$$
(4)

This results in a total number of \(n\times o\) coefficient matrices of the dimension \(m\times p\) for each of the six individual coefficients and per feature. The amount of six coefficient matrices is a result of the poly22 fit that is used. As a result of this first step, i.e. after the first fitting/interpolation, the model now is detached from the sample size of the two primary process parameters. To fully detach the model from the length of the simulation sample vectors, the coefficient matrices can be fitted a second time over the secondary process parameters w and \(c_0\).

$$\begin{aligned} C_{\star _{xx}}^{(m\times p)} \Longrightarrow d_{\star _{xx}} \end{aligned}$$
(5)

The length of the coefficient vector d depends on the type of fit used for this second step. Here, a higher order interpolation with 25 coefficients is used. While it is possible to also use simpler polynomial fits for some features, the resulting impact on fitting errors has not yet been investigated extensively. The interpolation scheme yields 25 coefficients for each of the six coefficients of the polynomial fit per feature. Thus, each feature can be represented by a total of 150 coefficients, which can potentially be reduced even further by choosing a simpler fitting scheme. Nonetheless, this number is completely independent of the initial sample size of the forward simulation. Because of this detachment, the fitting results are now referred to as ‘double-fitted’ or ‘compressed’ model.

5 Parameter estimation

The estimation of the process parameters is done by evaluating the compressed model reversely. The four-dimensional feature dependencies, i.e. the feature matrices, are generated with the approximation functions of the two fitting steps and the according coefficients of the compressed, double-fitted model. At this point, a testing set of known or desired feature values must be provided. For each feature and all combinations of secondary process parameters a deviation \(\sigma _f\) is calculated as a function of the primary process parameters which describes a difference between generated feature value and testing set.

$$\begin{aligned} \sigma _f({\underline{p}}_i) = | F - f_\mathrm {probe} | + \mathrm {const.} \end{aligned}$$
(6)
Fig. 4
figure 4

Exemplary illustration of the superposition of the individual probability distributions of each feature \((\star ,\#,*,\diamond )\), which represent the maximum concentration \(c_\text {max}\), the diffusion depth of \(c_\text {max}\) as well as the height and width of the concentration profile. This superposition is done for all \(m\times p\) combinations of secondary process parameters to identify the maximum probability of all process parameters including the primary process parameters \(p_1\) and \(p_2\). The individual feature probability distributions are not the actual ones but characteristically similar—for the sake of a clean visualization

Similar to the generation of the model, this deviation \(\sigma _f\) needs to be calculated for each feature, thus, the feature index. Essentially, it describes the numerical mismatch between the probing value in \(f_\mathrm {probe}\) and the values in the according feature matrix. If the mismatch between probing value and feature value is small for a combination of process parameters, the deviation is also small and vice versa. The constant in Eq. (6) serves the numerical purpose to avoid a deviation of value 0 if feature value and probing values do match exactly. This deviation is used to compute a pseudo probability measure \(P_f\), which depends on the two primary process parameters \(t_\mathrm {in}\) and \(t_\mathrm {out}\). Since the values of \(P_f\) are not actual probabilities in a strictly mathematical definition of the term, the item is to be understood as a measure of likelihood, hence, pseudo probability.

$$\begin{aligned} P_f(t_\text {in},t_\text {out}) = \frac{1}{\sqrt{2\pi }\,\sigma _f(t_\text {in},t_\text {out})} e^{ -\frac{1}{2} \left( \frac{1}{\sigma _f(t_\text {in},t_\text {out})} \right) ^2} \end{aligned}$$
(7)
Fig. 5
figure 5

Pseudo probability measure \(p_p\) depending on all primary and secondary process parameter. Probability distributions of \(p_p\) for all stages of the estimation process are illustrated—simulation data (uncompressed), single fit (semi-compressed) and double fit (fully compressed). Numerical compression artifacts caused by the curve fit are highlighted exemplarily. The combination of process parameters that has been identified correctly by all three steps of the estimation process is \(w = 3\,\upmu\)m, \(t_\mathrm {in} = 1.8\times 10^3\) s, \(t_\mathrm {out} = 3.24\times 10^3\) s, \(c_0 = 0.01\,\)  mol/m\(^3\)

Hence, the more the modeled feature value and the value of the testing set diverge the higher the deviation \(\sigma _f\) becomes resulting in a smaller probability for this particular combination of process parameters. The estimation is exemplarily illustrated in Fig. 4. Each feature, with its deviation \(\sigma _f\), leads to an individual probability distribution. When the probability values for each feature and all \(m\times p\) combinations of secondary process parameters are calculated, they are normalized and superposed. By this superposition of feature probabilities, the most probable combination of primary process parameters and hence of all process parameters is found. It should be noted that the definition of the deviation \(\sigma _f\) and thus also \(P_f\) as a function of the primary process parameters is done arbitrarily to match the order of the model compression. An alternate separation of the input parameters, i.e. a permutation of their order does not have any influence on the probability values of the estimation results, since only the sequence of calculation is interchanged.

Figure 5 shows the distribution of the pseudo probability measure for a testing set of feature values for all stages of the estimation process—simulation data (uncompressed), single fit (semi-compressed) and double fit (fully compressed). The combination of process parameters that has been identified correctly by all steps of the estimation process is \(w = 3\,\upmu\)m, \(t_\mathrm {in} = 1.8\times 10^3\) s, \(t_\mathrm {out} = 3.24\times 10^3\) s, \(c_0 = 0.01\) mol/m\(^3\). Figure 5a shows the probability distribution for the uncompressed simulation data. Expectedly, the contrast of probability between the correct set of process parameters and all other, in this context irrelevant sets is at a maximum. After the first compression step, which is the first curve fit of the feature matrices, the distribution is spread in the plane of the primary process parameters \(t_\text {in}\) and \(t_\text {out}\). This makes sense, because the feature matrices are fitted over the primary process parameters. The error introduced by the first fit, thus, mainly causes variance for these parameters. Figure 5b illustrates the according probability distribution. The contrast of probability is still sufficient to correctly identify the correct set of input parameters. Lastly, Fig. 5c shows the probability distribution for the double-fitted, fully compressed model. Similar to the single fit there are compression artifacts caused by the error of the fit, which conveniently only occur at the immediate neighbour values of the maximum probability simplifying correct parameter identification.

Table 2 Global benchmarking results

6 Benchmarking

For the practical application of the estimation model, its accuracy has to be determined on a comprehensive scale. Therefor, the model was benchmarked with a large number of random probing samples. The probing samples where derived from the simulation data and the results of the estimation model were then compared directly to the input parameters that were used for the generation of the probing samples. Table 2 shows the results of the global benchmarking of the superposition of all features. Unsurprisingly, all of the input parameters are correctly estimated when the simulation data is used as basis for the estimation process. However, since the simulation data only consists of the original sample points used for feature characterization, i.e. is not interpolated by the parameter compression process, only these can be the output of the estimation model. This is partially true for the semi-compressed data set since the primary process parameters are interpolated and changes completely with the fully compressed data set in which the results for the input parameters can attain any value within the total range of the original sampling points. One thing to be pointed out in particular is the relatively low matching rate of only 44.23 and 6.35% for the semi- and fully compressed model if a numerically exact match of values is specified. This changes, if a physically irrelevant error bound is added to the matching algorithm. The detection rates increase to 75.19 and 90.06%. If the boundary values of the original sampling grid are not taken into account, the matching rate increases further, yet only insignificantly in case of the fully compressed data set.

Table 3 shows a more detailed overview of the matching rates with respect to the individual input parameters. The global benchmarking results of Table 2 ultimately consist of the superposition of the individual matching rates listed in Table 3.

Table 3 Parameter benchmarking results

Obviously, the characteristics observed in Table 2 reflect the general behaviour of the individual matching rates here. The weak points with respect to complete, correct parameter estimation are diffusion time \(t_\mathrm {in}\) and the initial concentration \(c_0\). This can be explained by considering the dependency of the two input parameters on the chosen features. Both input parameters only have a weak dependence on most of the features. This is due to the fact that large parts of the input range of \(t_\mathrm {in}\) and \(c_0\) do not impact feature like \(c_\mathrm {max}\) or the profile width, for example, because of non-linear saturation effects of the diffusion process. Certainly, this can be fundamentally different if another process temperature is chosen. An approach to optimize the matching rate for these two input parameters is the definition of another feature, possibly as substitute or as addition. Most important should be a strong feature dependence on the according input parameters to ensure an optimal matching rate.

Fig. 6
figure 6

Normalized histogram of maximum probability values for the fully compressed model for all input parameters (a) and only for matching combinations (c). According difference in probability (contrast) to the second most likely combination of input parameters for all input parameters (b) and matching only for matching combinations (d)

Figure 6 gives an impression of the probability distribution of the estimated input parameter combinations of the fully compressed model. This is shown for all input parameter combinations in Fig. 6a and also only for correctly estimated combinations thereof in Fig. 6c. From the direct comparison it becomes evident that the average maximum probability calculated by the estimation is significantly higher for correctly estimated combinations. Furthermore the according difference in probability to the second most likely combination of input parameters (contrast) is shown in Fig. 6b for all input parameters and also, again, only for correctly estimated combinations thereof in Fig. 6d. The same basic characteristic can be observed as with the previous comparison. The mean of the distributions increases, i.e. the average contrast value is higher for correctly estimated combinations of input parameters. When considering all input parameters, 35% of all combinations can be characterized by a contrast of under 1% whereas only 24% of the correctly estimated combinations share this property. Generally, the low absolute contrast values show that multiple similarly probable combinations can occur, yet the matching rates are satisfyingly high. If evaluated properly, this is a great way of identifying clusters of possible solutions that might be worth investigating. Lastly, both probability and contrast can be further increased if the individual feature probabilities described in Table 3 are maximized.

7 Conclusion

The estimation of the process parameters of the thermal ion-exchange process yields excellent results with respect to the unambiguous estimation of primary and secondary process parameters for all estimation steps. The relations between characterized features of the concentration profile and the primary process parameters greatly benefit the simplification of the estimation process. Furthermore, the implications of these results open up additional approaches in practical component design. If certain process parameters are fixed due to specific hardware or other external specifications, the subspace of remaining parameters can be estimated in order to find an optimal solution. Also, the visualization of the probability measure reveals clusters of possible solution that might be worth investigating further. Despite the non-optimized fitting for the parameter compression technique and the wide input parameter range, the estimation yields an excellent matching rate of over 90% and can be improved even further with the addition or substitution of a more suitable type of feature or the restriction to certain subspaces if the application allows it. The excellent performance of this approach is directly related to strong feature dependencies on the input parameters of the ion-exchange process. Certainly, the general idea of the approach is not limited to the thermal ion-exchange process and can easily be abstracted to other physics and models. Its utilization can prove to be useful where depending variables or desired characteristics are not easily to be determined because of a technologically complex manufacturing process or a resource intensive numerical simulation, which implies many applications involving component design. However, the estimation relies on strong feature dependencies on the input parameters. Oscillations, saturation effects or other specific non-linearities in the feature dependencies could limit the effectiveness of the estimation. Thus, the application to other physics and models is to be determined individually.