1 Introduction

Customer demands for higher product quality and the requirement to adjust production flexibly pose increasing challenges for manufacturing companies. Especially in high-wage countries, companies aim for optimized process efficiency with little waste and short throughput times for each manufacturing order [1]. One focus derived from these demands is the fast determination of an optimized operating point in manufacturing processes: Process engineers or operators are required to set up a production process in a short time to minimize machine downtime. At the same time, as little waste as possible should be produced while identifying an optimized set of machine parameters for the process. Depending on the complexity of the process and the capability of the engineer or operator, a manual optimization based on expert knowledge or trial-and-error will take up an unpredictable amount of time, generate significant waste, or will not lead to an optimized process at all [2].

In recent years, machine learning (ML) algorithms have proven their feasibility for industrial purposes, introducing an objective, data-driven approach to process optimization, scheduling, and failure diagnosis [3,4,5,6]. Once fitted to a defined set of process data, an ML model serves as a surrogate model of the depicted process with fixed input and output parameters. In contrast to well-established linear modelling techniques such as simple regression, many ML models can adapt to non-linear process behavior and non-linear relationships between the input and output parameters. Artificial neural networks (ANNs) in particular have proven to deliver superior results for non-linear modelling assignments.

The injection molding (IM) process is one of the most important plastics manufacturing processes and an example of a highly complex manufacturing process with non-linear process behavior [7]. Typical application fields for the products are automotive, medical, electric, and houseware. IM is a discontinuous manufacturing process during which polymer material is drawn into the barrel of a plasticizing unit by the rotation of a screw. While the material is melted by friction and external heat, introduced by band heaters around the barrel, it accumulates in front of the screw tip, in the screw anteroom. During this dosing process, the screw continuously draws back in a translational movement to allow the material accumulation under a certain applied back pressure. Once enough plastic melt has been collected in the screw anteroom, the rotation stops, and the screw injects the plastic melt under high pressure through the nozzle into a tempered mold to achieve a volumetric fill of the cavity. The mold contains one or more cavities which take the melt in, shape and cool the material, and finally eject the solidified part after opening of the mold. The heat exchange is realized by a tempered coolant running in cooling channels in the mold. During the holding pressure phase, the screw keeps pressing melt into the cavity to compensate for volumetric shrinkage due to the melt’s pvT behavior. The plasticizing unit recuperates the injected plastic melt during the cooling and ejection phase, enabling the start of a new production cycle after closure of the mold.

However, as suitable as ML methods may be for the modeling of production processes such as injection molding, a remaining obstacle to broad application is the requirement of an adequate amount of training data. Potential users of the technology are deterred by the necessity to generate purposely collected training data for each process that is to be modelled [8]. On the other hand, in times of “plastics industry 4.0,” manufacturers can choose from a variety of solutions for comprehensive data collection on the production floor [9]. Hence, it is likely that manufacturing enterprises will have plenty of existing process data at hand in the future. It would therefore be desirable to reuse data from other injection molding processes in order to reduce the amount of training data that needs to be generated in a conventional approach to train an ML model for a new process. In this paper, it is investigated whether transfer learning can reduce the amount of training data needed to train an ML model for a specific injection molding process, here the target process, by utilizing trained source models from already known processes.

The paper is organized as follows. Chapter 2 gives a rough overview of the state of the art of applied ML methods for the optimization of injection molding processes. The idea of transfer learning and its adaption to the use case of injection molding process setup is depicted. ANNs are used to model the injection molding processes. Chapter 3 illustrates the artificial neural network and hyperparameters used, as well as the specimens and the methodology for the transfer learning approaches. Chapter 4 shows the transfer learning results and analyzes their significance regarding the use case described above. Chapter 5 concludes the proposed work and gives an outlook regarding further research on the topic.

2 Injection molding and transfer learning

2.1 Artificial neural networks in injection molding

In terms of industrial manufacturing processes, the injection molding process is particularly suited to data-driven optimization because of its highly non-linear process behavior and the complex relationship between machine, process, and quality parameters [10]. An accurate description of the process behavior based on physical modeling is not possible, e.g., due to the visco-elastic thermoplastic material characteristics [11].

Various researchers have investigated the possibility of implementing data-driven methods to model and optimize the process regarding different quality parameters. For example, Sedighi et al. proposed a combination of a radial basis function ANN and a genetic algorithm to optimize the gate location, which served as the input parameter [12]. The objective was the reduction of weld line formation on the considered product before the start of production. Simulations of the injection molding process were conducted with the software Moldflow. The gate location was randomly varied on the bottom of the part to perform an effective system identification.

Bensingh et al., on the other hand, modelled several surface quality parameters, such as the surface roughness or waviness, for bi-aspheric lenses using ANNs. The input parameters were fill time, fill pressure, holding pressure, melt and mold temperature, as well as cooling time [13]. The trained ANN was then used as a virtual model of the injection molding process during the optimization of the manufacturing conditions: because feedforward ANNs are not invertible, Bensingh et al. compared a conventional and an adaptive particle swarm optimization (PSO) method as well as a genetic algorithm to exploit the trained model [14, 15].

Shi et al. developed an offline optimization approach for injection molding machine parameter settings, using ANNs as a process surrogate model and genetic algorithms as well [16]. The mold temperature, melt temperature, injection time, and injection pressure were used as input parameters, while the maximum shear stress served as a quality parameter in the component, as this is a main reason for possible warpage. The training database for the ANN model was formed by simulation data.

Many more researchers proposed ANN-based models of the injection molding process for a subsequent optimization of the product warpage [17,18,19,20,21,22], mechanical properties [23,24,25], or even a combination of several quality parameters in a single model [26,27,28]. Each of the research works described above refers to an explicitly generated database, introducing an iterative data generation process and therefore costs into the optimization. Sharing different aspects of the modelling assignment, e.g., the database or the models themselves, could reduce the initial costs of applying data-based methods in injection molding and lead to resilient manufacturing across several processes [29, 30]. Transfer learning is assumed to be a feasible approach to achieve this goal.

2.2 Knowledge transfer in injection molding models

Transfer learning (TL) describes, in terms of machine learning, the transfer of knowledge from one or more source assignments AS to a target assignment AT [31]. Similarity between the assignments is assumed to prevent the so-called negative transfer which would ultimately reduce the model quality when performing transfer learning [32]. An assignment A consists of a domain D and a task T. A domain is represented by a defined input parameter space X and a marginal data distribution P(X). A task contains a defined output parameter space Y as well as a model f(X) with f : X → Y. From a probabilistic point of view, f(X) can also be denoted as the conditional probability P(Y| X). During fitting in supervised learning, a training algorithm utilizes an amount M of labelled data samples (xi, yi) to train the model f(X) (see Eqs. 1, 2, and 3).

$$ D=\left\{X,P(X)\right\} $$
$$ T=\left\{Y,f(X)\right\} $$
$$ M=\left\{\left({x}_i,{y}_i\right)\ |\ x\in {\mathbb{R}}^a,y\in {\mathbb{R}}^b,a,b,i\in \mathbb{N}\right\} $$

Torrey and Shavlik define three possible advantages of transfer learning for any new assignment as seen in Fig. 1 [33]. Firstly, in the case of a very similar source domain DS and target domain DT, transferred models or models trained with substitutionary data from DS could already be able to make accurate predictions for the target assignment without further training (1). Secondly, few additional target assignment training data samples MT lead to a faster adaption and tuning of the model to the new assignment compared with a conventional approach without transfer learning (2). Thirdly, the shared knowledge can ultimately lead to a higher generalization ability of the resulting model, e.g., due to the enrichment of the training dataset by the combination of native (MT) and substitutionary (MS, i) data from other data domains DS, i (3).

Fig. 1 Possible advantages of transfer learning

Besides differentiating transfer learning into model-based and data-based approaches, the three categories inductive, transductive, and unsupervised transfer learning can be used as a classification. Transductive and unsupervised learning are not utilized in this study; further information on these topics can be found in the denoted references [34, 35].

Inductive transfer learning assumes that, for the transfer from one assignment AS to another assignment AT, the source and target domains DS and DT are equal, and the source and target tasks TS and TT are different regarding the given definitions. Yosinski et al. describe the fundamental process for inductive transfer in machine learning [36]. In their work, a convolutional ANN is first fitted to a dataset of adequate size to achieve a good model quality. The model’s first n layers are extracted to hybridize a new ANN with partially pretrained and partially newly initialized neuron layers. The results indicate that the transfer learning success depends on the similarity of the provided pretrained data or model to the target data: If high similarity can be assumed, only fine tuning to the target task needs to be done as the fundamental feature extraction has been computed in the first n layers of the pretrained model [37]. While Yosinski et al. worked with images, other research was conducted regarding transfer learning, e.g., in the fields of handwriting recognition [38], recommendation systems for online shops [39], or natural language processing [40]. In earlier work, transfer learning has been applied by Tercan et al. to injection molding as a regression task, improving the model’s degree of determination for a small experimental training dataset. A prior training with simulation data of the same process has been conducted in order to let the ANN adapt to fundamental relations between machine setting parameters and quality parameters [41]. For the model, injection time, holding pressure, holding pressure time, mold temperature, cooling time, and melt temperature served as input parameters. The part weight from simulation and the real experiments, respectively, served as the quality parameter. Among other things, Tercan et al. were able to observe an accelerated adaption of the model to the provided experimental injection molding data as well as a higher generalization capability of the adapted model.

3 Data and methodology

However, the previously described transfer learning approach for injection molding remains iterative for the optimization of several processes: For every new article being produced on a machine, the necessity arises to run new simulations and fit a new ANN. Therefore, transfer learning between several geometrically different specimens is conducted and evaluated as a potential approach to reduce the number of samples that need to be generated.

3.1 Specimen geometries and data generation

Sixty different toy building blocks have been designed with the software Autodesk Inventor Professional 2018 (Autodesk Inc., San Rafael, CA, USA). The geometries can be categorized by size and configuration:

  • The size is given by the number of studs in a row and the number of parallel rows. The toy building block illustrated in Fig. 2, for example, has the size “4 × 2.” The designed toy building blocks vary in size with 1, 2, 3, 4, 6, or 8 studs per row and 1 or 2 rows.

  • A total of 5 different configurations of the toy building blocks are generated. Based on a configuration named original, all dimensions have either been scaled up or down by the factor 3 (x 3, /3), or the shoulder height of the toy building blocks has been doubled or halved (doubled, halved, see Fig. 3).
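The count of sixty geometries follows from the two categories above: 6 stud counts, 2 row counts, and 5 configurations. As a minimal sketch, the part catalogue can be enumerated as follows; the naming pattern is inferred from the labels used in the text (e.g., “4 × 2 original”), and the variable names are hypothetical.

```python
from itertools import product

# Levels taken from the two bullet points above.
STUDS_PER_ROW = (1, 2, 3, 4, 6, 8)
ROWS = (1, 2)
CONFIGURATIONS = ("original", "x 3", "/3", "doubled", "halved")

# Hypothetical naming scheme, inferred from labels such as "4 × 2 original".
parts = [
    f"{studs} × {rows} {config}"
    for studs, rows, config in product(STUDS_PER_ROW, ROWS, CONFIGURATIONS)
]

print(len(parts))  # 6 * 2 * 5 = 60 geometries
```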

Fig. 2 Exemplary dimensions of toy building block “4 × 2 original”

Fig. 3 DoE for simulation of different part geometries

For each part, injection molding process simulation data is generated with the software Cadmould 3D-F Solver Version (Simcon kunststofftechnische Software GmbH, Würselen, Germany). Seventy-seven different machine settings are sampled per part based on an a priori defined Design of Experiments (DoE). A central composite design is chosen as DoE. It consists of a 2^6 full factorial test plan (64 settings) together with 12 face-centered star points and the center point. In total, 4620 simulations have been conducted for all parts together. A structured DoE is necessary for the identification of the process behavior: it allows the impact of varied machine setting parameters on the chosen quality criteria of the injection molding process to be observed. In contrast, process data gathered during series production usually depicts the process only around a very limited operating point, rendering predictions for different operating points unvalidated [42]. In order to choose a practical approach for the simulations, default meshing is selected when the parts are imported into the Cadmould simulation software. The element length for the meshing is set to 0.376 mm for the smallest part and to 7.225 mm for the largest part.
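The size of the test plan can be checked with a short sketch of a face-centered central composite design in coded levels (−1, 0, +1). The function name is hypothetical, and the mapping of coded levels to the absolute machine settings of Fig. 3 is deliberately left out, since those values are not reproduced here.

```python
from itertools import product

FACTORS = (
    "injection volume flow", "melt temperature", "mold temperature",
    "holding pressure", "holding pressure time", "cooling time",
)

def face_centered_ccd(n_factors):
    """Return the coded design points of a face-centered central composite design."""
    # Full factorial corner points: every factor at its low (-1) or high (+1) level.
    corners = [list(point) for point in product((-1, 1), repeat=n_factors)]
    # Face-centered star points: one factor at -1 or +1, all others at the center.
    stars = []
    for axis in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[axis] = level
            stars.append(point)
    center = [[0] * n_factors]
    return corners + stars + center

design = face_centered_ccd(len(FACTORS))
print(len(design))       # 64 + 12 + 1 = 77 machine settings per part
print(60 * len(design))  # 4620 simulations for all sixty parts
```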

Figure 3 shows the exemplary depiction for a 3-dimensional machine parameter search space as well as the absolute values for the chosen machine parameters in the DoE. Six machine parameters are varied for the DoE: injection volume flow, melt temperature, mold temperature, holding pressure, holding pressure time, and cooling time. The part weight is considered an important quality criterion for the injection molding process and, therefore, chosen as such for the modelling assignments [43, 44]. It is determined by parsing the calculation of the part weight of the simulation after the last discrete time step of the holding pressure phase simulation. The 3D-F solver of Cadmould uses a Hele-Shaw approximation of the Navier-Stokes equations to model the flow of the polymer melt [45]. The simulations are conducted with a polypropylene PP 579S from SABIC Deutschland GmbH & Co. KG (Düsseldorf, Germany). The Carreau-WLF law is used to determine the material viscosity [46], and the Renner law serves as a model for the pvT behavior [47]. The parameters for the chosen models can be found in Tables 1 and 2. Further material properties, e.g., the specific heat capacity and thermal conductivity, can be found in Table 3. The simulations for each geometry have been performed with uniform cavity wall temperature.

Table 1 Carreau-WLF model parameters
Table 2 Renner law model parameters
Table 3 Material properties of SABIC PP 579S

Each part is molded with two gates on the short wall sides of the toy building blocks. The gates are located on one of the two symmetrical axes of each part, leading to a symmetrical flow front and temperature distribution during injection and holding pressure phase. Figure 4 shows several filling stages of the surface temperature distribution during the process simulation for the “4 × 2 original” toy building block.

Fig. 4 Surface temperature profile during different stages of filling for part “4 × 2 original”

3.2 Transfer learning: categorization and approach

In Chapter 2, inductive transfer learning was described in detail. The transfer of knowledge between different injection molding setup processes complies with the requirements for this category regarding the following aspects:

  • All data has been generated varying the same machine parameters of the injection molding process: injection volume flow, melt temperature, cavity wall temperature, holding pressure, holding pressure time, and cooling time. Therefore, XS, i = XT.

  • The marginal probability of each data sample for every assignment is equal because it has been generated in a controlled DoE. Therefore, PS, i(XS, i) = PT(XT).

  • The quality parameter for all toy building blocks considered in this study is the part weight, leading to a 1-dimensional quality parameter space. Therefore, YS, i = YT.

  • Each model f(X) correlates the data of a different injection molding process to the respective part weights. As different geometries of the produced parts are used, the correlation functions are assumed to vary from process to process. Therefore, fi(Xi) ≠ fj(Xj) ≠ fT(XT) with {i, j ∈ ℕ | i, j ≤ 60, i ≠ j}.

One main reason why ANNs are not widely used for the modelling of injection molding processes is the requirement of an abundant amount of available data for the training process. Torrey and Shavlik found one of the three possible main advantages of transfer learning to be the fast model adaption with low training effort and superior model determination compared with the conventional training of an ANN [33]. The transfer learning approach for the injection molding process data has been designed accordingly: It shall be investigated how the resulting model determination of the ANN for a transfer learning approach compares to a conventional training while providing a rising number of samples for the training of the models.

The choice of the architecture of the model as well as several hyperparameters for training can significantly influence the success of the model building. In order to ensure comparability between all source and target assignments, the structure of the ANN was defined and fixed for all experiments. The chosen architecture and hyperparameters are adopted from previously conducted experiments in the style of a random search with the dataset of the “4 × 2 original” toy building block. In the field of injection molding, ANNs with one hidden layer and a single-digit number of hidden neurons are frequently used for the modelling of the process and found to be suitable for a high model quality [48, 49]. Therefore, a model structure with one hidden layer and seven neurons in the layer is chosen. The selected hyperparameters in Table 4 resulted in a convergence of the model during training. Early stopping was used to prevent overfitting during training and to keep the training time to a minimum [50]. All implementations are done in Python (version 3.7.5), utilizing the Python modules TensorFlow (version 2.0.0), Keras (version 2.3.1), and Scikit-Learn (version 0.21.3).

Table 4 ANN architecture and hyperparameters

It is assumed that for the injection molding assignments AS, i, enough data has been sampled as training data for the training of the source models to achieve a good model determination after training. The assignment to learn the correlation between machine parameters and the part weight for the “4 × 2 original” domain is selected and defined as the target assignment AT. In the experimental setup for source domain training and conventional training,

  • All fS, i(XS, i) (source-ANN) are trained with the maximum amount of data without evaluation with a test partition in order to achieve the maximum model quality for the source models.

  • fT(XT) is trained and evaluated in a conventional approach with defined steps of training data availability in order to determine the respective model quality.

Yosinski et al. tested hybridized models consisting of pretrained and newly initialized layers [36]. Accordingly, in this paper, approaches with varying pretraining degrees are chosen (see Fig. 5). Three different intensities of the layer transfer and reassembly as a hybridized transfer learning ANN (TL-ANN) are investigated:

  1. Transferring a trained input layer, untrained hidden and output layer (IL)

  2. Transferring trained input and hidden layers, untrained output layer (IL_HL)

  3. Transferring the complete trained model (CM)

Fig. 5 Three intensities of network-based knowledge transfer

In order to evaluate the induced transfer learning results for AT, the TL-ANN model quality is tested against the model quality of a conventional approach for the “4 × 2 original” part without a layer transfer (conv-ANN).
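The three transfer intensities can be illustrated with a small, library-free sketch. It is not the Keras implementation used in the study: the network is represented as a dictionary of three parameter blocks, the layer shapes (6 inputs, 7 hidden neurons, 1 output) and the assumption that the input layer carries its own weights are illustrative, and the weights are drawn from an untruncated normal distribution with the He standard deviation sqrt(2 / fan_in). All names are hypothetical.

```python
import copy
import math
import random

# Assumed (fan_in, fan_out) shapes of the three layers; illustrative only.
LAYER_SHAPES = {"input": (6, 6), "hidden": (6, 7), "output": (7, 1)}

# Which layers are copied from the source-ANN for each transfer intensity.
TRANSFERRED_LAYERS = {
    "IL": ("input",),
    "IL_HL": ("input", "hidden"),
    "CM": ("input", "hidden", "output"),
}

def init_layer(fan_in, fan_out, rng):
    """Draw He-style initial weights (untruncated normal) and zero biases."""
    std = math.sqrt(2.0 / fan_in)
    weights = [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]
    return {"W": weights, "b": [0.0] * fan_out}

def new_network(rng):
    """Create a network in which every layer is newly initialized."""
    return {name: init_layer(*shape, rng) for name, shape in LAYER_SHAPES.items()}

def build_tl_ann(source_ann, intensity, rng):
    """Hybridize a TL-ANN from copied source layers and newly initialized layers."""
    tl_ann = new_network(rng)
    for name in TRANSFERRED_LAYERS[intensity]:
        tl_ann[name] = copy.deepcopy(source_ann[name])
    return tl_ann

rng = random.Random(42)
source_ann = new_network(rng)  # stands in for a trained source-ANN
tl_ann = build_tl_ann(source_ann, "IL_HL", rng)
```

In this sketch, the conv-ANN corresponds to calling `new_network` alone, i.e., no layer is copied. After hybridization, all layers of the TL-ANN would still be trained further on the target data.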

3.3 Data separation and conducted experiments

In line with inductive transfer learning, the newly created TL-ANN must be adapted to the new task TT. Therefore, a secondary training step is conducted with some available data from MT (see Eq. 4). In order to determine the model’s generalization quality for few target data samples, artificially reduced datasets DS are provided.

Equation 6 defines the generation of DS for the data. As the cardinal number of MT, i.e., the number of samples in the dataset, is 77, sampling effects are likely to occur, especially when providing a very small amount of training data as an excerpt of MT. Therefore, the experiments are prepared 10 times with data randomly shuffled by a pseudorandom generator. The resulting data sequences OT, β are defined in Eq. 5. Each ordered quantity is indicated by the index β. The data sequences are then discretely sliced into training and validation data OT, β; α on the one hand and testing data OT, β; 1 − α on the other hand. Nineteen experiments with a consecutively rising amount of combined training and validation data are conducted to simulate reduced amounts of training and validation data, starting with 4 samples of the available data and increasing in steps of 4 samples. The respective samples which are not used during training and validation are used for testing. Therefore, DS contains 190 different datasets of AT.

$$ {M}_T=\left\{\left({x}_i,{y}_i\right)\ |\ x\in {\mathbb{R}}^6,y\in \mathbb{R},i\in \left\{1,\dots, 77\right\}\right\} $$
$$ {O}_{T,\beta }={\left({\left(x,y\right)}_j\right)}_{j\in \left\{1,\dots, 77\right\}}\kern1.25em \forall \beta \in \left\{1,\dots, 10\right\} $$
$$ DS=\left\{\left({O}_{T,\beta; \alpha },{O}_{T,\beta; 1-\alpha}\right)|\ \alpha =0.05\ast \delta, \delta \in \left\{1,2,\dots, 19\right\},\beta \in \left\{1,\dots, 10\right\}\right\} $$
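The construction of DS can be sketched as follows. The function and field names are hypothetical, and the slice size follows the stated step of 4 samples per experiment (4δ) rather than a rounding of α · 77, which is an assumption about how the fractions were turned into sample counts.

```python
import random

def build_reduced_datasets(samples, n_sequences=10, n_steps=19, step=4, seed=0):
    """Shuffle the target data n_sequences times and slice each sequence n_steps times."""
    rng = random.Random(seed)
    datasets = []
    for beta in range(1, n_sequences + 1):
        sequence = list(samples)
        rng.shuffle(sequence)  # one pseudorandom ordering per index beta
        for delta in range(1, n_steps + 1):
            n_train_val = step * delta  # 4, 8, ..., 76 samples
            datasets.append({
                "beta": beta,
                "delta": delta,
                "train_val": sequence[:n_train_val],  # training and validation data
                "test": sequence[n_train_val:],       # remaining samples for testing
            })
    return datasets

# Sample indices 0..76 stand in for the 77 (x, y) pairs of the target dataset.
reduced = build_reduced_datasets(range(77))
print(len(reduced))  # 10 sequences * 19 slices = 190 datasets
```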

Table 5 displays the overall number of planned model trainings and evaluations for the investigation in this paper. Within the conventional approach, a total of 1900 fT(XT) are trained as conv-ANN with initially untrained neuron layers. While the conv- and TL-ANNs are trained with a rising number of training samples, all source models fS, i(XS, i) are trained with 80% of the available domain data. For the conventional and source domain trainings, as well as the transfer learning approach, each training run is repeated 10 times in order to prevent effects of the he_normal initialization of the neuron weights per layer from distorting the results. Also, 10 different data sequences are created by a pseudorandom number generator as explained before for the dataset generation of DS. For the source domain training, the data shuffling is conducted due to the assumption that, for the corresponding 64 samples of the injection molding process, sample effects can cause distorted models and therefore an inferior generalization capability of the ANN, which is possibly unfavorable for the subsequent transfer learning. The resulting average of the model qualities is taken into consideration for the calculation and display of the results.

Table 5 Description of conducted experiments

The resulting 5900 source-ANNs serve as base models for the transfer learning approaches IL, IL_HL, and CM. While the conv-ANN experiments are conducted for all elements in DS, the transfer learning experiments in this paper are focused on the range between 4 and 24 training data samples. In previous trials with a significantly smaller number of experiments, the absolute differences of model determination between conv-ANN and TL-ANN for larger training datasets were found to be significant although small; these larger datasets are therefore not considered for transfer learning in the investigation here.
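The stated totals can be reproduced with a short calculation. The factorization below is an interpretation that is consistent with the numbers given in the text (59 source parts besides the target part, 10 data sequences, 10 weight initializations, 19 dataset sizes); it is not taken from Table 5, and the variable names are hypothetical.

```python
N_PARTS = 60
N_SOURCE_PARTS = N_PARTS - 1  # all parts except the target "4 × 2 original"
N_SEQUENCES = 10              # shuffled data sequences
N_INITIALIZATIONS = 10        # repeated weight initializations
N_DATASET_SIZES = 19          # 4, 8, ..., 76 training and validation samples

conv_anns = N_DATASET_SIZES * N_SEQUENCES * N_INITIALIZATIONS
source_anns = N_SOURCE_PARTS * N_SEQUENCES * N_INITIALIZATIONS

# Models averaged per dataset size in the evaluation of the results.
conv_per_point = N_SEQUENCES * N_INITIALIZATIONS
tl_per_point = source_anns * N_SEQUENCES * N_INITIALIZATIONS

print(conv_anns, source_anns)        # 1900 5900
print(conv_per_point, tl_per_point)  # 100 590000
```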

Regarding the scaling of all experiments, the provision of few data samples for training of the models bears the risk of distortion because the scaling itself is subject to sample effects. However, in order to experiment as close to real processing conditions as possible, the MinMaxScaler class of the Scikit-Learn framework is fitted exclusively to the training data from AT for the adaption training and is then applied unchanged to the testing data. This allows a realistic interpretation of the prediction capabilities of the TL-ANN for reduced data in AT.
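The effect of fitting the scaler only to the training data can be shown with a plain-Python sketch of min-max scaling. It mimics the idea rather than the Scikit-Learn class itself; the helper names and the numeric values are made up for illustration.

```python
def fit_min_max(rows):
    """Determine the per-column minimum and maximum from the training rows only."""
    return [(min(column), max(column)) for column in zip(*rows)]

def apply_min_max(rows, bounds):
    """Scale rows with previously fitted bounds; a zero range is replaced by 1."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - low) / ((high - low) or 1.0)
            for value, (low, high) in zip(row, bounds)
        ])
    return scaled

# Illustrative values for two machine parameters (not taken from the DoE).
train = [[200.0, 20.0], [240.0, 40.0]]
test = [[260.0, 30.0]]

bounds = fit_min_max(train)               # fitted exclusively to the training data
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
print(test_scaled)  # [[1.5, 0.5]]
```

With few training samples, the fitted bounds may not cover the full parameter range, so scaled test values can fall outside [0, 1], as the first column of the example shows. This is the sample effect the paragraph above refers to.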

All model qualities are evaluated with the degree of determination R2 according to Eq. 7:

$$ {R}^2=1-\frac{\sum {\left({y}_i-{\hat{y}}_i\right)}^2}{\sum {\left({y}_i-{\overline{y}}_i\right)}^2}=\frac{SS_{ex}}{SS_{ov}} $$

R2 is a frequently used performance indicator for the quality of the predictions in this field [41, 19, 53, 54, 49]. It is defined as the quotient of the explained sum of squares SSex and the overall sum of squares SSov. It is computed with the true value of the quality criterion yi, the average of the true values \( {\overline{y}}_i \), and the predictions \( {\hat{y}}_i \) per sample made by the model. The possible value range for R2 in linear models such as linear regressions is usually defined as R2 ∈ [0; 1], with 0 indicating the prediction of the dataset mean for all samples and 1 signifying a perfect adaption when tested with an unknown set of samples. For non-linear models and datasets in particular, the metric can drop below 0, as an unsuitable model can introduce more variation into the prediction than can be observed in the true values. R2 is independent of the scaling of the data, in contrast to further performance indicators such as root mean square error (RMSE) or mean square error (MSE).
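The behavior of the metric at its characteristic values can be illustrated with a minimal sketch of the first form of Eq. 7. The function name and the example values are hypothetical and are not results of this study.

```python
def r_squared(y_true, y_pred):
    """Degree of determination: 1 - residual sum of squares / overall sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_overall = sum((y - mean) ** 2 for y in y_true)
    if ss_overall == 0:
        # E.g., a single test sample: the true values show no variation.
        raise ValueError("R2 is undefined when the true values have no variation")
    return 1.0 - ss_residual / ss_overall

y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [1.0, 2.0, 3.0]))  # 1.0, perfect prediction
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # 0.0, mean predicted for every sample
print(r_squared(y_true, [3.0, 2.0, 1.0]))  # -3.0, worse than predicting the mean
```

The explicit error for a single test sample reflects that the overall sum of squares is zero in that case, so the quotient cannot be evaluated.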

4 Results

4.1 Comparison of TL intensities

The overall transfer learning result is depicted in Fig. 6. The diagram shows the model’s R2 in relation to the provided cardinal number of training data MT for adaption, which has been taken from the target domain dataset of the “4 × 2 original” toy building block. The R2 results for the cardinal number 76 are omitted because R2 cannot be computed for only one sample in the test partition. The data series (curves) represent the conv-ANN approach without transfer learning and the three transfer intensities IL, IL_HL, and CM. The calculated standard deviations for each approach are shown by the bars with values on the secondary axis. Each conv-ANN series point represents the average model determination of 100 ANNs, whereas each transfer learning series point is averaged over 590,000 ANNs.

Fig. 6 Results for all domains of all distributions together

Regarding the conv-ANN results, the model architecture and the chosen hyperparameters prove to be suitable for the target assignment AT. The model achieves a model determination value R2 of 0.973 for 60 provided training data elements in MT. With further increasing training data, the R2 value does not fall below 0.963. This confirms an appropriate choice of hyperparameters for the modelling task. However, for small amounts of training data, the conv-ANN approach produces high standard deviations in the model quality (see Table 6): Even with 16 training data elements, the model determination R2 can still be below 0, indicating an insufficient adaption to the target task TT. Therefore, considering the described use case of process setup in injection molding, practitioners should not employ ANNs as a model building technique if fewer than 20 data samples are available for training. The ANN may perform worse than a simple mean estimator, with its degree of determination dropping below 0 (see Table 6, Conv, MIN). This emphasizes the necessary effort for data generation when using ANNs in a conventional approach. Interestingly, with the provision of 40 samples for training, the average model quality drops and the standard deviation has a significant peak. This outlier needs to be compared to the results of the transfer learning approaches once data is available for this sample size.

Table 6 R2 results for TL intensities and conv-ANN for DS for α ∈ [4; 24]

The result is expected as the models retrieve more information about the relationship between machine parameters and the quality parameter part weight once more samples are provided. This implies that information about the injection molding process in general is stored in any of the neuron layers of the models trained on any of the source domains DS. Equally important to the averaged model quality difference between the conventional and all three transfer learning approaches is the stability improvement of the model quality by transfer learning. A practitioner using this modeling technique is therefore less likely to receive a trained model which underperforms significantly relative to the average. Table 6 indicates that, for 20 provided training data samples of MT, the maximal deviation from the average model quality is 0.0058 for the CM approach and 0.584 for the conventional approach.

Differences between the TL approaches can be seen in Fig. 6 and Table 6 as well. For the observed cardinal numbers of MT, the CM approach outperforms the other transfer learning approaches in terms of the averaged model quality. The quality difference on average is especially significant for very small training datasets such as 4 samples. This indicates that the transfer of the whole model is in general a feasible approach when trying to achieve the highest model quality. However, the CM approach shows the highest standard deviation of the three TL methods. For example, for the provision of 8 training samples, the IL or IL_HL approach could result in a higher model quality for a specific model instance. Interestingly, the averaged model quality drops again for the conventional, IL, and IL_HL approach for 12 and 16 training data samples, while the CM model quality rises monotonically. This behavior cannot yet be explained and will need careful evaluation, which is beyond this paper’s scope. Regardless of the specific approach, network-based inductive transfer learning appears to be beneficial for the model quality for this use case in injection molding.

4.2 Resulting differences by provided source models

However, as indicated by the standard deviations in Fig. 6, the model qualities vary significantly for the same cardinal number of MT. One possible explanation lies in the differences between the source models, induced by the different geometrical characteristics of the mold cavities, which influence the process behavior: The longer the maximal flow path, the higher the injection pressure has to be in order to overcome the pressure loss in the mold. This, however, only applies to comparable wall thicknesses. Furthermore, thermal inhomogeneities are often unavoidable, as the rising complexity of injection molded parts often requires rapid changes in wall thickness. Due to the variance between injection molding processes introduced by different geometries, the conditional probability PS,i(Y|X) varies from process to process. Therefore, it is assumed that the transfer of a model fS,i to the target task TT is more successful if the processes, including the cavity geometries, are similar to each other.

Figure 7 depicts the transfer learning results for the CM approach for three different source domains: “4 × 1 original,” “8 × 2 x 3,” and “8 × 2 original.” The results for the “4 × 1 original” and “8 × 2 original” source models are at the same level: Even when only 4 data samples are provided, the retrained source models reach high R2 values of 0.89 and 0.92, respectively. Significant differences are visible for the “8 × 2 x 3” source model: For small training datasets, the model based on this source model performs considerably worse on average. Furthermore, its standard deviation for 4 and 8 provided training data samples appears to be significantly higher than for the other source models.

Fig. 7 Results of the CM approach for three different source domains

An initial analysis of how geometrical differences influence the process data can be conducted by comparing the geometrical characteristics mentioned above, flow path length and average wall thickness. The flow path of the “4 × 2 original” toy building block is 108.3 mm long; its average wall thickness is 1.97 mm. While the flow path of the “4 × 1 original” toy building block differs by 9.9 mm, that of the “8 × 2 original” toy building block is 62.8 mm longer. This difference, however, is not reflected in the results of the transfer learning approaches. The “4 × 1 original” and “8 × 2 original” toy building blocks have the same average wall thickness of 1.97 mm as the part of the target domain, while the “8 × 2 x 3” toy building block has an average wall thickness of 5.90 mm. Compared with the results in Fig. 7, the wall thickness appears to have a much higher influence on the transferability of a source model than the flow path length. These results suggest that the transfer of source models can be especially successful for small numbers of training data samples from the target assignment if the average wall thickness of the part manufactured in the target process is similar or equal to that of the source domain. This is an especially encouraging observation, as product variations commonly do not differ in average wall thickness.
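The geometrical comparison above can be made explicit with a few lines of arithmetic. The sketch below is an illustration under assumptions, not part of the study: it treats the “4 × 2 original” block as the reference part, uses relative differences as a simple (hypothetical) similarity measure, and uses only the values stated in the text. The flow path of the “8 × 2 x 3” block is not stated and is therefore left as `None`; the direction of the 9.9 mm difference is not stated either, so absolute values are used.

```python
# Values taken from the text; None marks a quantity the text does not state.
# Assumption: the "4 x 2 original" block serves as the reference part.
REFERENCE = {"flow_path_mm": 108.3, "wall_mm": 1.97}

SOURCES = {
    "4 x 1 original": {"flow_path_diff_mm": 9.9, "wall_mm": 1.97},
    "8 x 2 original": {"flow_path_diff_mm": 62.8, "wall_mm": 1.97},
    "8 x 2 x 3": {"flow_path_diff_mm": None, "wall_mm": 5.90},
}


def relative_differences(source, reference=REFERENCE):
    """Return (flow path, wall thickness) differences relative to the reference.

    The flow path entry is None when the text gives no value for it.
    """
    diff = source["flow_path_diff_mm"]
    flow_rel = None if diff is None else abs(diff) / reference["flow_path_mm"]
    wall_rel = abs(source["wall_mm"] - reference["wall_mm"]) / reference["wall_mm"]
    return flow_rel, wall_rel


for name, geometry in SOURCES.items():
    flow_rel, wall_rel = relative_differences(geometry)
    flow_text = "not stated" if flow_rel is None else f"{flow_rel:.1%}"
    print(f"{name}: flow path diff = {flow_text}, wall thickness diff = {wall_rel:.1%}")
```

Read this way, the “8 × 2 original” source differs strongly in flow path length but not in wall thickness, whereas the “8 × 2 x 3” source differs strongly in wall thickness, which is the contrast the paragraph draws on.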

5 Conclusion and outlook

The extensive data requirement for modeling the relationship between machine setting parameters and quality parameters in injection molding is one of the main reasons why artificial neural networks are not yet broadly used for process optimization. In this paper, different approaches to transfer learning with ANNs between different injection molding processes have been examined with the goal of reducing the amount of data necessary for model training. It has been determined that any applied transfer intensity for network-based inductive transfer learning has a beneficial effect on the resulting model’s coefficient of determination and, hence, should be used to enhance the model quality for small available datasets. The transfer of a complete model with subsequent retraining has been found to outperform all other approaches, in particular the conventional approach without any transfer. With 16 training samples from the target domain, an R2 value of 0.88 could be achieved on average. The conventional approach surpasses this result only with 24 training samples, which indicates a reduction of 33% in necessary training data. An even higher reduction of training samples can be seen when choosing a similar source domain: The model quality achieved using the “8 × 2 original” models for a CM approach with 4 training data samples is only surpassed by the conventional training of an ANN when using at least 32 training samples. This corresponds to a reduction of training data of 88%.
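The two reduction figures follow directly from the sample counts given above. The following minimal sketch reproduces the arithmetic; the function name `training_data_reduction` is a hypothetical helper introduced for illustration only.

```python
def training_data_reduction(n_transfer, n_conventional):
    """Relative saving in training samples versus conventional training.

    n_transfer: samples needed with transfer learning.
    n_conventional: samples the conventional approach needs to surpass
    the transfer learning result.
    """
    if n_conventional <= 0 or not 0 <= n_transfer <= n_conventional:
        raise ValueError("expected 0 <= n_transfer <= n_conventional, n_conventional > 0")
    return (n_conventional - n_transfer) / n_conventional


# CM approach averaged over source models: 16 vs. 24 samples.
print(f"{training_data_reduction(16, 24):.1%}")
# CM approach with the similar source model: 4 vs. 32 samples.
print(f"{training_data_reduction(4, 32):.1%}")
```

The two values, 33.3% and 87.5%, correspond to the rounded 33% and 88% stated in the text.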

However, the similarity of injection molding processes with regard to transfer learning success cannot yet be quantified. Even though the average wall thickness appears to be a relevant factor for transfer learning success, further investigation is required: More geometrical parameters of the parts, the cavity, and the mold have to be examined for correlations with the transfer learning success in order to estimate whether a source model is suitable for a specific target assignment. Other quality criteria such as warpage are expected to behave highly non-linearly and will introduce further complexity into the system. The success of transfer learning between different, practically relevant quality criteria in injection molding has to be compared in additional experiments. Especially for transfer learning between part variations, it needs to be investigated whether the observations for transfer between parts with equal wall thickness can be confirmed. Furthermore, these results, which are based on simulation data, have to be validated with real experimental data. The impact of noise in experimental data, introduced by varying environmental conditions, material fluctuations, or inaccurate machine movements, needs to be evaluated with regard to the transfer learning results. A real-world case study will be conducted based on the presented results and transfer learning strategy. Finally, material and machine influences could be equally or more relevant factors for the transferability of source models to a target assignment. Experiments need to be designed, in accordance with the test series with geometrically different parts, to identify and measure their impact.