Deep learning study of induced stochastic pattern formation in the gravure printing fluid splitting process

We use deep learning (DL) algorithms for the phenomenological classification of Saffman-Taylor-instability-driven spontaneous pattern formation at the liquid meniscus during fluid splitting in a gravure printing press. The DL algorithms are applied to high-speed video recordings of the fluid splitting process between the rotating gravure cylinder and the co-moving planar target substrate. Depending on the rotation or printing velocity and on the gravure raster of the engraved printing cylinder, a variety of transient liquid wetting patterns, e.g., a raster of separate drops, viscous fingers, or more complex, branched liquid bridges, appear in the printing nip. We discuss how these patterns are classified with DL methods, and how this could serve to identify different hydrodynamic flow regimes in the nip, e.g., point or lamella splitting.


Introduction
Printing, as a solvent-based surface technology, is one of the backbones of contemporary, highly productive, and cost-efficient mass production of packaging. 1 Typically, liquid wetting patterns are deposited on the substrate, ranging in size from plain liquid films to high-resolution ink patterns shaped on the 20-µm scale. Liquid films typically have a thickness in the 1-µm range. The deposited liquid is subsequently solidified and dried. Printing is not restricted to graphical art applications. Rather, layers of functional inks also protect the substrate from water imbibition and penetration, ultraviolet radiation, and microorganisms. Plain surfaces can be furnished with an appealing design, customer information, and branding features. Recent research on printing technology fosters the minimization of energy usage and the consumption of valuable resources. At the same time, paper and plastic foils are equipped with additional functions, transforming them into an economically and ecologically superior packaging alternative for nutrition, pharmaceuticals, and sensitive technical goods. Substrates can be endowed with adhesives, laminated to multilayer stacks, and equipped with printed electronics add-ons, as seen in the books by Klauk 2,3 and Nisato et al. 4 Moreover, by its economical use of materials, printing technology and printed products are an indispensable part of material recycling concepts and help to close the reuse cycle of valuable natural and synthetic materials.
However, the transfer and splitting of the printing fluid in a large-area printing unit, whether operating according to the gravure, flexographic, or offset lithographic principle, is hydrodynamically unstable. 5 The ink splitting process in the nip between cylinder and substrate tends to superimpose a ribbing pattern on the printing layout. This is due to the viscous fingering instability, which occurs when a less viscous fluid, e.g., air, supplants a more viscous one, e.g., ink, from a cavity. Saffman and Taylor 6 studied this phenomenon in a Hele-Shaw cell, where air or a fluid of small viscosity supplanted a more viscous fluid from a shallow cuvette. When the meniscus moved between the bottom and top plate, see Fig. 1, it developed finger-like structures. This left behind a branched network of liquid bridges in the cuvette, decorating the top and bottom plates all the way along its walls.
The same phenomenon occurs in a portion of liquid located in the wedge between a rotating printing cylinder and a substrate plate on which the cylinder rolls with some velocity. Air and liquid fingers of characteristic width are spontaneously created. They leave a stripy ink pattern on the surfaces, rather than a liquid layer of constant thickness, which would be the desired result in coating and printing applications. The technical problem is even worse, as the finger formation triggers secondary pattern formation effects: 8 dewetting of the ink from the surface in the case of a finite wetting angle, and Marangoni drag driven by the evaporation of volatile ink components. These effects amplify the finger pattern and may even create new, more complex patterns which persist on the printout after ink solidification and drying. All this is subsumed as the 'ribbing defect' in the graphic industry. The initial length scale of the pattern is on the order of 100 µm and is usually selected by the primary Saffman-Taylor instability. As a matter of fact, the problem is controlled using rastering methods when printing text and images; the human eye cannot resolve the ribbing pattern. However, in functional printing or glossy finishing applications, dense and very smooth layers are mandatory, and here the problem of ribbing is evident.
There have been numerous studies of the viscous fingering phenomenon in past decades for the rotating cylinder geometry. Gaskell et al. 9 studied the ink splitting in gravure roll coating. A linear stability analysis of the liquid meniscus at the onset of finger formation goes back to Carvalho and Scriven. 10 Figure 2 shows a schematic of the gravure cylinder setup, indicating the two different ink splitting regimes of point and lamella splitting which have been identified by Hübner, 11 creating characteristic dot and finger patterns, respectively. Systematic measurements of the instability and its phenomenology for the particular problem of ultrathin liquid films have been performed by Bornemann et al. 12 and by Kitsomboonloha et al. 13 All this has contributed to our fundamental understanding of the instability. However, the full phenomenology of pattern formation in the gravure press is much richer and still offers surprising features. Evidently, stochastic aspects cannot be ignored, showing that the pattern details are seeded by noise. One is tempted to suspect a complex sequence of critical bifurcations, which are characteristic of complex, nonlinearly interacting systems. It appears promising to apply the hierarchical complexity concepts of Glansdorff and Prigogine 14 to this pattern formation. Furthermore, we would also like to refer to the phenomenological study of Sahimi 15 on viscous finger phenomena in porous media, and to the dynamic-system view of Casademunt. 16 In a former study on flexographic printing, 17 we obtained evidence that the nonlinear liquid-structure interaction in the nip does indeed have a substantial impact on the scaling properties of the pattern structure. Elastic deformation of the printing cylinder, nonlinear rheology of complex printing inks, as well as possible cavitation of gas bubbles offer plenty of possibilities to construct models with delicate nonlinear dynamics and pattern formation mechanisms in the printing nip.
In this scheme, the printing process could be considered a self-stabilizing dynamical system, with technically useful process windows corresponding to stable dynamical fixed points in parameter space, and with Hopf bifurcations defining the onset of pattern evolution. This could be achieved in a systematic manner as described by Cross and Hohenberg. 18 Such models could also explain further pattern formation regimes apart from point and lamella splitting, which are distinct in their symmetry and local correlation. However, to the knowledge of the authors, no such approach has ever been successfully elaborated, nor have adequate order parameters been identified. Recent printing experiments which studied viscous fingering at velocities of up to several m/s instead of cm/s offered encouraging results which, however, are still waiting for a proper understanding.
Novel insight was gained by the unique experiments of Schäfer et al., 19,20 who were able to prepare a large set of high-speed video recordings of the microscopic, highly dynamic liquid-air interface in the nip of a rapidly rotating gravure printing press. The parameter space of possible machine settings and printing cylinder gravure patterns was deliberately narrowed down to settings which appeared interesting with respect to pattern formation. The videos show the vigorous dynamics of liquid bridges and filaments forming out of the expanding liquid meniscus between the wetted rotating cylinder and the co-moving tangent planar surface. A surprising variety of distinct phenomena was found, with partly simple, but also with multiply branched liquid bridges. Some of them inherited their length scale from the gravure pattern of the cylinder, whereas others appeared to be quite independent. Most details of these liquid patterns were transient, i.e., they were erased by capillary leveling and relaxation once the substrate had left the nip. For this reason, the full variety of structures could not be recognized in the finished, dried printouts any more. However, within the phase of fluid splitting, the meniscus dynamics was apparently in a steady-state situation and continuously generated the characteristic patterns as long as the process was kept running. The future task here is to fathom the parameter space for a map of distinguishable dynamic regimes. There is broad consensus that all this is principally related to the viscous finger instability observed by Saffman and Taylor 6 in their famous cuvette experiment of the retracting liquid meniscus. However, a complex dynamic system such as the printing nip obviously involves a much more sophisticated spontaneous pattern formation physics.
Based on the video data set of Schäfer, 21 we applied deep learning (DL) concepts for assigning the patterns to different classes. Our goal was to implement a tool which is capable of identifying and distinguishing the distinct regimes of pattern formation by scale, autocorrelation, and symmetry.

(Figure 2 caption: In the point splitting regime (a), there is no interaction of the gravure cells and each gravure cell deposits its printing fluid content onto the substrate. In the lamella splitting regime (b), a closed meniscus of printing ink forms and the liquid-air interface at the diverging side of the nip may become unstable due to viscous fingering, leading to a ribbing pattern on the substrate.)

DL-enhanced hydrodynamical models including instabilities are presently fostered by Brunton et al. 22 Typical applications of such models are related to the exploitation of mineral oil reservoirs, see Magzymov et al., 23 where viscous fingering instabilities at the subterrestrial oil-water interface are an important limiting factor for the yield of the oil sources. 15 In this article, we focus more on the variability of the patterns and on recognizing their particular features. DL methods are particularly useful to filter out long-ranged correlations from a huge and noisy set of image data. As a further benefit, the DL algorithms also exploit time correlations in subsequent frames of the video records of the patterns. This is important here, because the hydrodynamics of the nip is in a steady-state condition, but the individual recorded patterns are transient. The aim is to establish a method to complete the map of pattern formation regimes beyond Hübner's 11 categories of point and lamella splitting. It appears that the interesting features can be observed in the parameter range close to the transition between these two cases.
The long-term aim is to fit this map in a framework of pattern formation order parameters, and to make it accessible to the tools of modal stability analysis and bifurcation theory.
As our initial effort, we trained a recurrent neural network (RNN) and a 3-dimensional convolutional neural network (3D-CNN) with selected videos for which the type of pattern could be unambiguously identified. We then studied so-called class activation maps (CAMs) of the 3D-CNN, a graphical illustration of the structure features which were most relevant for the class assignment by the neural network (NN). As already mentioned, we focused on the well-established phenomenology of the transition between the pattern formation phenomena of a gravure printing press known as point and lamella splitting.
In the ''Experimental'' section, we explain the experimental gravure printing setup of Schäfer 20 and his video recordings of the finger instability. The ''Data analysis'' section shows how we prepared the video data and how the DL algorithms were applied. The results of the DL assessment are presented in the ''Results'' section.

Experimental
A large number of high-speed video recordings of the printing nip from Schäfer 21 served as the data set for this study. Schäfer built a unique, optically accessible rotogravure printing machine which he used to analyze highly dynamic fluid splitting phenomena and transient pattern formation in situ for the first time. Below we briefly present Schäfer's experimental setup, which is crucial for understanding the distinct pattern formation occurring in the resulting high-speed video data set.

Experimental setup with high-speed camera
The experimental setup from Schäfer mainly consisted of a modified laboratory sheet-fed rotogravure printing machine, a high-power white light emitting diode (LED) light source, a beam splitter for confocal light input, and a high-speed camera (Photron Fastcam SA-4), see Fig. 3. The core of the modified machine was an optically accessible substrate carrier, which replaced the standard opaque substrate carrier. Schäfer used a printing cylinder with 120 electromechanically engraved quadratic test fields (13 mm × 13 mm) with different raster frequencies (40-140 lines/cm) and tonal values (5-100 %) at a constant raster angle of 45°. This enabled him to study the impact of the underlying gravure raster pattern and transfer volume on transient pattern formation in the printing nip. Additionally, the printing velocity (0.5-1.5 m/s) was varied in the experiments. Ethanol was used as the printing fluid. Printing trials with 185 different parameter combinations of raster frequency, tonal value, and printing velocity were performed three times each, resulting in 555 high-speed video recordings. A simplified sketch of the setup as well as an exemplary video snapshot with annotations is presented in Fig. 4. For more details of the experimental setup see Schäfer et al. 19

Variety of observed patterns
The menisci on the converging and on the diverging side had clear optical contrast due to the light refraction at the liquid-air interface. While the meniscus situated on the converging side of the nip ('converging meniscus') was an almost straight border oriented parallel to the cylinder axis, the meniscus on the diverging side ('diverging meniscus') showed characteristic corrugation into more or less periodic finger-like structures with periods typically in the range of a few 100 µm. The finger frequency was especially dependent on the raster frequency and independent of the printing velocity and the mean gravure cell volume. 20

Analogous to the finger phenomenon at the retracting meniscus in the Hele-Shaw cuvette experiments of Saffman and Taylor, the fingers were the effect of air intrusions penetrating into the nip. In certain cases, extended branched dendrite-like patterns evolved, and even complex, disconnected structures were observed. As the air intrusions were always topologically interconnected, the tips of the air fingers were apparently in a steady pressure equilibrium with the air outside. For this reason, we can exclude gas bubble cavitation as the origin of any air volume in the seam. The liquid phase, however, was only occasionally connected. Rather, isolated, mostly corrugated, and irregularly shaped liquid structures, including drop-like structures, were detaching from the liquid seam. Each such structure was the footprint of a liquid bridge between the substrate and the gravure cylinder surface. 24 In certain cases, the liquid structures (e.g., drops and fingers) coincided with the gravure raster of the printing cylinder, but there were also regimes with drop and finger sizes larger than the gravure pattern, and periodic finger patterns of an integer multiple of the gravure raster width appeared. A frequent lock-in of the finger width at three or four times the raster width was observed even when the printing velocity was changed, and was apparent to the naked eye. In particular, extended finger patterns were either aligned with or, alternatively, inclined against the printing direction, and more correlated with the skew axes of the gravure raster. Thus, the transient patterns showed a considerable variety of regimes in meniscus dynamics, which we consider to deserve a phenomenological classification. Accordingly, a complementary, extended-scale view on the classification should be useful which considers the spatial correlation of these patterns, revealing point symmetries in the autocorrelation function and in the spectral Fourier transform.
We have already studied the pattern formation with 1D Fourier methods in reference 25. However, there are more options to detect hidden periodicities in a stochastic 2D pattern. Folding such functions with representations of the 2D crystallographic groups, i.e., of the planar uniform discontinuous groups, 26 extracts the particular weights of these symmetries from the patterns. These weights and their variation with printing parameters could serve as order parameters of symmetry breaking transitions, and help to identify specific regimes of pattern formation. NNs could be efficient in recognizing such symmetries in the highly stochastic image data, and one might also train them by use of the respective representation functions.

(Figure 5 caption: The snapshots were all taken from the same moment of the fluid splitting process and were enhanced in contrast and brightness for better visibility. A variety of patterns, from finger-like liquid bridges to dot-like patterns, can be observed. Only an experienced observer can distinguish between several predefined pattern classes: lamella splitting regime (LSR) (a-e), mixed regime (MR) (f), and point splitting regime (PSR) (g-i). A list of the videos used for exemplary snapshots in this paper can be found in Table 5 in the Appendix.)

Fundamentals of DL
Artificial intelligence (AI) is one of today's major worldwide topics of interest. Self-driving cars, face recognition, and predictive maintenance are just some familiar examples of applications of AI in industry.
A subdomain of AI is machine learning (ML). According to Zhou, ML 'is the technique that improves system performance by learning from experience via computational methods [...] and the main task of ML is to develop learning algorithms that build models from data'. 27 ML problems can be divided into three classes: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, which is used, for example, for regression or classification tasks, labeled data is needed on which the ML model is trained. For example, to classify certain classes of iris flowers according to the length and width of their sepals and petals, as given in the famous iris data set, 28 the class, or so-called label, needs to be known to train the ML model. In contrast, unsupervised learning does not need labeled data; clustering tasks, for example, can be performed using unsupervised learning. Reinforcement learning cannot be considered supervised or unsupervised, since it does not involve labels but takes feedback from the environment. DL is a subdomain of ML. We speak of DL when deep neural networks (DNNs) are used to, e.g., cluster or classify the data. DNNs are NNs with many layers of neurons, and they try to mimic the behavior of biological NNs, e.g., the human brain. In this study, we use DL for the classification of videos showing different classes of hydrodynamic pattern formation. There are many well-known architectures for DNNs which are available from online repositories. For image classification, convolutional neural networks (CNNs) are commonly used. They incorporate so-called convolutional layers in their architecture which detect and enhance certain features of the input image, e.g., edges, horizontal lines, circles, or much more complex shapes. The extracted features are fed to subsequent NN layers. In the case of pretrained CNNs, the convolutional layers have already been trained on well-known public data sets like ImageNet. 29

There is a difference between DNNs used for image classification and those used for video classification. Since videos are a chronological sequence of single images, DNNs designed for video classification take the temporal relationship between the images into account.
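The feature-extraction idea behind a convolutional layer can be illustrated with a plain 2D convolution (strictly, the cross-correlation used in CNNs). The following is a minimal, hypothetical sketch, not part of the models used in this study: a hand-crafted Sobel-type kernel responds strongly at a vertical edge in a synthetic image, just as a trained convolutional filter would respond to the feature it has learned.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a grayscale image with a kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel: responds where brightness changes from left to right.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Synthetic image: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

response = conv2d(img, sobel_x)  # large values only near the vertical edge
```

In a CNN, many such kernels are learned from the data rather than hand-crafted, and their responses are passed on to subsequent layers.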
A DL model needs to be trained before it can be used for its intended purpose, e.g., for image classification. The training duration is specified by the number of training epochs; one training epoch describes one cycle in the process diagram in Fig. 6. First, the DL model receives input data. Then, the model makes predictions, and the predictions are compared with the labels of the input data. According to the chosen loss function (e.g., cross-entropy loss), a loss value is calculated, and according to the chosen optimizer (e.g., Adam), the weights of the DL model are updated. The updated DL model again makes predictions, and the next training epoch starts.
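The cycle of predict, compute loss, and update weights can be sketched in a few lines. The example below is a deliberately minimal stand-in, not the models of this study: a one-parameter logistic classifier trained with plain gradient descent (instead of Adam) on invented toy data, using the cross-entropy loss.

```python
import math

def train_epoch(samples, labels, w, b, lr=0.1):
    """One training epoch: predict, compute cross-entropy loss, update weights."""
    total_loss = 0.0
    for x, y in zip(samples, labels):
        z = w * x + b
        p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
        total_loss += -(y * math.log(p + 1e-12)
                        + (1 - y) * math.log(1 - p + 1e-12))
        grad = p - y                      # dLoss/dz for sigmoid + cross-entropy
        w -= lr * grad * x                # optimizer step (plain gradient descent)
        b -= lr * grad
    return w, b, total_loss / len(samples)

# Toy data: class 0 clustered around x = -2, class 1 around x = +2.
xs = [-2.0, -1.5, -2.5, 2.0, 1.5, 2.5]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
losses = []
for epoch in range(50):                   # number of epochs is a hyperparameter
    w, b, loss = train_epoch(xs, ys, w, b)
    losses.append(loss)
```

The loss decreases from epoch to epoch as the weight aligns with the data, which is exactly the behavior monitored during DL training.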
In order to effectively train a DL model, a large enough data set is needed, the so-called training data set. In addition, a test data set and possibly also a validation data set for hyperparameter tuning are needed. Hyperparameters are the parameters that can be changed during the training process, e.g., the number of epochs and the learning rate. The test data set is used to evaluate whether the DL model was able to generalize or whether it rather learned the data by heart. The latter is called overfitting, which leads to poor performance of the DL model on unseen data even though it performed well on the training data set. Should overfitting occur, e.g., hyperparameters or even the architecture of the DL model have to be adjusted iteratively. This iteration process yields a final trained DL model. The performance of this final DL model is tested on the test data set, which has not been involved in the iteration process. A validation data set is only necessary if the model's hyperparameters have been tuned iteratively. To avoid overfitting, several techniques are generally used. One is called data augmentation, which increases the variety of the data set; examples are rotation, translation, or mirroring of the images from the data set before feeding them to the DL model. Another common technique to avoid overfitting is data set balancing. An image data set is called unbalanced if there is a large discrepancy between the number of images per class. To obtain a balanced data set, e.g., some images of over-represented classes can be dropped from the data set, or a so-called cost-sensitive training can be performed. In the case of cost-sensitive training, the loss function is weighted according to the representation of the classes; the weights are higher for under-represented classes. Figure 6 shows the general concept of training DL models. The performance of DNNs can be assessed using different approaches, for example:

1. Confusion matrix
2. Accuracy
3. CAMs

A confusion matrix shows true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions in a matrix. Table 1 shows a confusion matrix for a binary classification problem. 30 Often, it is also interesting to know with which confidence a DL model has predicted a class. A prediction results from the class probability for each of the two classes in the case of a binary classification problem; the class with the higher probability is the predicted class. As an example, assume that an image of a dog was classified with a class probability of 95 % as a dog and with a class probability of only 5 % as a cat. The model's prediction is then 'dog,' with the DL model being very sure about its prediction. If the class probabilities were 55 % for dog and 45 % for cat, the prediction would also be 'dog,' but the model would be rather unsure about its prediction. Based on the TP, TN, FP, and FN values, the so-called accuracy can be calculated [equation (1)]: 31

accuracy = (TP + TN) / (TP + TN + FP + FN)     (1)

The accuracy of a DL model can be calculated for the training, the validation, and the test data set; we refer to training accuracy, validation accuracy, and test accuracy. The higher the accuracy, the better the performance of a DL model, at least in the case of a balanced data set, i.e., when there is a similar number of samples for each class. CAMs are heat maps that reveal the areas of an input image that were most important for the classification decision of the DL model. 32 The warmer the color, the more important the area is for the classification decision. If applicable, the heat maps can be overlaid with the original image. For further reading on ML and DL, we suggest the textbooks by Zhou, 27 Joshi, 31 and Rebala et al., 30 from which much of the information in this subsection is taken.
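The confusion-matrix entries and the accuracy of equation (1) can be computed directly from predicted and true labels. The sketch below uses hypothetical label lists; the function names are our own.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    # Equation (1): accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)   # 4 correct out of 6 predictions
```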

Aim
The aim of our computer vision workflow was to train and test DL models on the video data set from Schäfer, 21 (Fig. 5h) or no pattern formation due to insufficiently transferred ink volume from the gravure cells (Fig. 5i). The rest of the data set, 72 videos, is labeled with '0.5,' meaning that the videos show mixed pattern formation of finger-like and dot-like patterns (i.e., MR), see Fig. 5f.

Computer vision framework
As a programming language for our computer vision workflow, we used Python 3.8.6, together with several Python libraries for image processing, deep learning, and other tasks; all used libraries are listed in Table 2. As a source code editor, Microsoft Visual Studio Code version 1.52.0 was used. Training and testing of all DL models were performed on a desktop computer running Microsoft Windows 10 Pro Version 1909 with an Intel Core i5-4460 3.2 GHz central processing unit (CPU) and 16 GB of DDR3-RAM. However, the possibility of parallel computing on a graphics processing unit (GPU) was implemented in the source code for further use. The complete source code used for this research as well as further research data can be downloaded from https://doi.org/10.48328/tudatalib-938.

DL data set creation
The high-speed videos from the Schäfer data set are provided in a raw image data format (MRAW) plus their metadata files (CIH). MRAW and CIH are native data formats of Photron high-speed cameras (Photron Deutschland GmbH, Reutlingen, Germany). The videos were recorded in grayscale at a bit depth of 12 bit and have a resolution of 512 px × 768 px at 49.6 µm/px. The length of the videos varies depending on the printing velocity. It is important to note that the videos recorded at different printing velocities do not start at the exact same moment of the printing process. The Python library pyMRAW was used for importing the grayscale videos and saving them as single RGB images in the PNG data format with a bit depth of 8 bit per channel and lossless compression. However, we needed to alter the library pyMRAW before using it so that 12 bit videos could be imported; the original library supports only 8 and 16 bit data formats. A conversion from grayscale to RGB was necessary since our DL models expected RGB input images. The conversion to RGB was performed by tripling the 8 bit grayscale information of every pixel, copying it to the three color channels of the desired RGB image. We decided to train our DL models only on a fraction of the frames of each video. In our code, we created a list of 16 evenly spaced frames within each video. Only the middle eight frames of the list were exported, since the middle part of the videos was found to contain the most relevant frames for the analysis of pattern formation phenomena. The first and last four frames contained a lot of black areas and only partially showed the test field. The export of eight frames per video led to a total amount of 4,440 images that served as our DL data set.
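The two preprocessing steps described above, scaling the 12 bit grayscale data to 8 bit RGB by channel tripling and selecting the middle eight of 16 evenly spaced frames, can be sketched as follows. This is an illustrative sketch, not the original code: in particular, the bit-shift conversion from 12 bit to 8 bit is our assumption, as the exact scaling used is not stated.

```python
import numpy as np

def to_rgb8(frame12):
    """Scale a 12-bit grayscale frame to 8 bit and triple it into RGB channels."""
    frame8 = (frame12.astype(np.uint16) >> 4).astype(np.uint8)  # 12 bit -> 8 bit
    return np.stack([frame8, frame8, frame8], axis=-1)          # identical channels

def middle_frames(n_total, n_list=16, n_keep=8):
    """Pick n_list evenly spaced frame indices, then keep the middle n_keep."""
    idx = np.linspace(0, n_total - 1, n_list).astype(int)
    start = (n_list - n_keep) // 2
    return idx[start:start + n_keep]

frame12 = np.array([[0, 2048, 4095]], dtype=np.uint16)  # 12-bit range is 0..4095
rgb = to_rgb8(frame12)                                  # shape (1, 3, 3), uint8
keep = middle_frames(n_total=1000)                      # 8 indices from the middle
```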

Training of DL models
We used two different architectures of DL models suitable for video classification: first, an RNN architecture (Fig. 7a) based on a pretrained 'ResNet-18' model 34 with a subsequent RNN, and second, a 3D-CNN architecture (Fig. 7b) based on a pretrained 'ResNet 3D 18' model 35 (Fig. 7 modified from Avendi 33). Our implementation of deep learning for video classification was based on the book from Avendi 33 and code snippets from the associated online GitHub repository. 36 From our DL data set consisting of eight consecutively numbered frames per video (numbers 1-8), one to eight frames centered around the middle of the list were used for the training and testing of our DL models, see Fig. 8. For the majority of this work, and if the number of frames is not mentioned further, four frames per video were used, i.e., the frames with the numbers 3, 4, 5, and 6. Due to the fact that videos recorded at different printing velocities do not start at the exact same moment within the printing process, the ROI is not located at the same place in extracted frames of a video recorded at 0.5 m/s (Fig. 8a) and at 1.5 m/s (Fig. 8b). In a separate analysis, we investigated the influence of the number of frames on the DL model's accuracy; we compared the training results for 1, 2, 4, 6, and 8 frames per video. Before feeding the frames to the DL models, they were cropped from their original size of 512 px × 768 px to 512 px × 512 px and then resized to 224 px × 224 px or 448 px × 448 px. For each printing parameter combination, three videos were available. Two of the videos (67 %) were used for training and one video (33 %) for testing of our DL models. We did not change hyperparameters and therefore performed our study without a validation data set. Each model was trained for 20 epochs. For the best performing model, data augmentation and data set balancing were additionally implemented.
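The frame selection and cropping just described can be sketched as below. Note two assumptions in this sketch: the 512 px × 512 px crop is placed at the image center (the actual crop placement is not stated), and the subsequent resizing to 224 px or 448 px is omitted.

```python
import numpy as np

def center_crop(frame, size=512):
    """Crop the central size x size region from a frame (height x width)."""
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

def central_frame_numbers(n_available=8, n_use=4):
    """Frames are numbered 1..n_available; keep n_use centered on the middle."""
    start = (n_available - n_use) // 2 + 1
    return list(range(start, start + n_use))

frame = np.zeros((512, 768))          # original frame size: 512 px x 768 px
cropped = center_crop(frame)          # 512 px x 512 px
frames = central_frame_numbers()      # the four central frames of eight
```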
As is standard practice, data augmentation was applied only to the training data set, not to the test data set, using a random translation and a random perspective transformation per frame and per training epoch. Data set balancing was implemented via cost-sensitive training. We distinguished between 3-class-models and 2-class-models. 3-class-models were trained and tested on frames of videos of all three classes: LSR, PSR, and MR. For 3-class-models, 370 videos were used for training and 185 for testing. Thus, the data set used for 3-class-models comprised 2,220 frames (1,480 for training, 740 for testing). 2-class-models were trained and tested only on frames of videos showing LSR or PSR. Thus, 322 videos were used for training and 161 for testing, which resulted in 1,932 frames in total (1,288 for training and 644 for testing).
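The two measures, random-translation augmentation and cost-sensitive class weights, might look as follows. This is an illustrative sketch, not the authors' implementation: the perspective transformation is omitted, and the LSR/PSR split in the toy label list is invented (only the 72 MR videos and the total of 555 videos are taken from the text).

```python
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def augment(frame, max_shift=4):
    """Random horizontal translation of a frame (applied to training data only)."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(frame, shift, axis=1)

def class_weights(labels):
    """Inverse-frequency weights: under-represented classes weigh more in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

frame = np.arange(12).reshape(3, 4)
augmented = augment(frame)               # same pixel content, shifted columns

# 555 videos in total, 72 of them MR; the LSR/PSR split here is hypothetical.
labels = ["LSR"] * 241 + ["PSR"] * 242 + ["MR"] * 72
weights = class_weights(labels)          # MR receives the largest weight
```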

Assessment of trained DL models
We compared the accuracy on the test data set, i.e., the test accuracy, as well as the training duration of the DL models as our main performance metrics. To identify which types of errors the DL models made during classification, we examined confusion matrices as well as class probabilities. To get a deeper insight into the decision-making process of the trained DL models, we implemented the possibility to derive CAMs for the 3D-CNN architecture models.
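A CAM is commonly computed by weighting the feature maps of the last convolutional layer with the fully connected layer's weights for the class of interest. The sketch below illustrates this recipe with invented feature maps and weights, reduced to 2D; a CAM implementation for a 3D-CNN would additionally aggregate the temporal axis, and our actual implementation is not reproduced here.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights):
    """
    CAM for one class: weighted sum of the last convolutional feature maps,
    using that class's weights from the final fully connected layer.
    feature_maps: shape (channels, h, w); fc_weights: shape (channels,).
    """
    cam = np.tensordot(fc_weights, feature_maps, axes=1)  # -> shape (h, w)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()        # normalize to [0, 1] for display as a heat map
    return cam

rng = np.random.default_rng(1)
features = rng.random((4, 7, 7))            # invented last-layer feature maps
weights = np.array([0.5, -0.2, 0.8, 0.1])   # invented class weights
cam = class_activation_map(features, weights)
```

The normalized map can then be upsampled to the input resolution and overlaid on the original frame, with warmer colors marking the regions most relevant for the class decision.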

Test accuracies and training duration
We trained six different DL models over 20 training epochs on the training data set and compared the test accuracies of all trained DL models. All models achieved test accuracies of more than 94 %, see Table 3. Model #2 was found to be the best performing 3-class-model with a test accuracy of 99.5 %. We observed that for a resolution of 224 px × 224 px the 3D-CNN architecture achieved higher test accuracies than the RNN architecture. For the same architecture and resolution, the 2-class-models performed better than the 3-class-models and also had a shorter training duration. For the RNN architecture, an increase in resolution led to higher test accuracies of the 2- and 3-class-models, respectively. Concerning the training duration, the RNN architecture models trained on lower-resolution frames of 224 px × 224 px (models #3 and #4) required the least amount of training time, about 1 h, whereas the 3D-CNN architecture models and the RNN architecture models with higher resolution (models #1, #2, #5, and #6) had about three times longer training durations. The test accuracy plotted against the training epoch for all 2-class- and 3-class-models as well as for an optimized model #2 is depicted in Fig. 9. The 2-class-model curves (Fig. 9a) are more closely spaced in comparison to the 3-class-model curves (Fig. 9b). Additionally, we applied data set balancing and data augmentation for model #2. We found that the implementation of data set balancing did not exceed the previous best test accuracy (99.5 %) (Table 3). In fact, data augmentation slightly decreased the test accuracy to 97.8 %.

(Figure 8 caption: The ROI is marked in red. The images were enhanced in contrast and brightness for better visibility, and the scaling bar is the same for all images. The complete DL data set comprised eight frames per video. Per default, four frames per video were used for training and testing, but for a separate analysis, the influence of the number of frames on the test accuracy was investigated.)
However, the test accuracy curve for model #2 with data augmentation showed a rising trend, see Fig. 9. We also investigated the effect of increasing the number of frames per video for training and testing of model #2, see Table 4. Four frames were used as the default number throughout this study. Training of model #2 with the default frame number took about 3.5 hours and yielded a test accuracy of 99.5 %. Reducing the number of frames per video to one cut the training duration of model #2 to roughly a third of the default value. However, the test accuracy also decreased by around 2 % to 97.3 %, leading to more confusions, especially concerning the MR videos, compare Fig. 16 in the Appendix. When doubling the number of frames per video to eight, the training duration also doubled, but the test accuracy stayed roughly the same. This could be associated with the fact that the additional frames did not contribute relevant information about the printed patterns, since only a small fraction of the test field was visible in them, see Fig. 8. However, increasing the number of frames per video from one to four improved the classification accuracy specifically for the MR class. This indicates that time correlations in the transient patterns were important for the MR classification, whereas there was almost no effect for the LSR and PSR class recognition.
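The frame-number tradeoff described above amounts to uniform temporal subsampling of each video. A minimal sketch, assuming a plain (T, H, W) grayscale video tensor and a hypothetical helper name (this is not the authors' pipeline code):

```python
import numpy as np

def subsample_frames(video: np.ndarray, n_frames: int) -> np.ndarray:
    """Select n_frames evenly spaced frames from a (T, H, W) video tensor."""
    total = video.shape[0]
    idx = np.linspace(0, total - 1, n_frames).round().astype(int)
    return video[idx]

# Toy stand-in for an eight-frame high-speed video of 4 x 4 px,
# with the frame index encoded in the pixel values.
video = np.stack([np.full((4, 4), t) for t in range(8)])

print(subsample_frames(video, 4).shape)  # (4, 4, 4) -- the default four frames
print(subsample_frames(video, 1).shape)  # (1, 4, 4) -- the single-frame variant
```

Halving the frame count roughly halves the per-video tensor size fed to the 3D-CNN, which is consistent with the near-proportional training durations reported in Table 4.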

Confusion matrices
Confusion matrices for the test data set for all trained DL models can be found in Fig. 10. They reveal that the 3-class-models (#2, #4, and #6, Fig. 10a) never confused the LSR and PSR, apart from one exception: model #4 misclassified one PSR video as LSR, see Fig. 11a. Thus, the 3-class-models had learned the characteristics of PSR and LSR very well, with only one wrong event among more than 150 classifications. All other misclassifications for the 3-class-models, 17 in total, involved the MR class. Consequently, the MR evoked by far the greatest number of confusions. The number of confusions for one model is obtained by adding up the entries that are not on the main diagonal of its confusion matrix. The confusion matrices for the 2-class-models (Fig. 10b) show that only one confusion happened in total: model #3 misclassified an LSR video as PSR, see Fig. 11b. A list of all confusions can be found in Table 6 in the Appendix, as well as confusion matrices for model #2 for different numbers of frames per video, see Fig. 16.
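The off-diagonal counting rule mentioned above can be written out directly. The matrix entries below are illustrative only, not the values from Fig. 10:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted),
# with made-up counts for illustration.
labels = ["LSR", "PSR", "MR"]
cm = np.array([
    [50,  0,  3],
    [ 1, 48,  2],
    [ 0,  4, 45],
])

# Number of confusions = sum of all entries off the main diagonal.
n_confusions = cm.sum() - np.trace(cm)
print(n_confusions)  # -> 10
```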

Class probabilities
We investigated the class probabilities for models #1 and #2 and plotted them as boxplots in Fig. 12. Model #1 is a 2-class-model which was trained only on PSR and LSR videos and reached a test accuracy of 100.0 %. The class probabilities make clear that model #1 was very decided in its classifications. For example, LSR videos were assigned a class probability for PSR of around 0 % and a class probability for LSR of around 100 %. The same applied to videos labeled as PSR in an analogous manner. In contrast, for videos labeled as MR, model #1 was rather indecisive. This behavior was expected, since model #1 was not trained on MR videos. For videos labeled as MR, the class probabilities for PSR and LSR each ranged from 0 to 100 %, although the median class probability for PSR was around 10 % and for LSR 90 %. In other words, model #1 tended to classify an MR video as LSR. For model #2, a 3-class-model with 99.5 % test accuracy, class probabilities are also displayed in Fig. 12. LSR videos were assigned a class probability for PSR and MR of around 0 % and for LSR of around 100 %, with only a few outliers. The same behavior applied to PSR videos in an analogous manner. Thus, model #2 was very confident about its classification decisions concerning videos labeled as LSR or PSR. For MR videos, the model was less confident, as indicated by more outliers and larger boxes in the boxplot, but it was still more decided than model #1.

Class activation maps
The CAMs of the first layer were overlaid with the four original video frames so that the hot parts of the CAM could be directly connected to certain features of the video frames. Whereas the first layer detected numerous small features, e.g., the gravure raster dots, small droplets, or single fluid fingers, the DL model focused on larger features in later layers, e.g., groups of features. The second layer of the DL model yielded two CAMs, and the third and fourth layer resulted in one CAM each.
Consequently, these CAMs could not be assigned to the four original video frames and were plotted without any overlay. This is because a 3D-CNN not only convolves the spatial dimensions but also the temporal dimension.
In other words, the model looks at each input frame individually in the first layer, and in later layers considers all input frames at once for the classification decision. Interestingly, especially in the first layer, model #2 also paid attention to features outside the ROI, e.g., as seen in Fig. 15a, where the region outside the ROI was much hotter than the ROI itself. In later layers (Fig. 15b-d), however, the model tended to focus on the ROI. We observed this behavior for many other videos from the Schäfer data set as well, although these are not shown here.
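For readers who want to reproduce such overlays, a coarse layer activation map can be upsampled to frame resolution and alpha-blended onto the frame. The following is a minimal numpy sketch under assumed sizes (a 7 x 7 CAM on a 224 px x 224 px frame); the function name and blending weight are our own choices, not the authors' implementation:

```python
import numpy as np

def overlay_cam(frame: np.ndarray, cam: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a coarse CAM (values in [0, 1]) onto a larger grayscale frame.

    The CAM is upsampled by nearest-neighbor block repetition (np.kron)
    to the frame resolution, then alpha-blended with the frame.
    """
    fy, fx = frame.shape
    cy, cx = cam.shape
    assert fy % cy == 0 and fx % cx == 0, "frame size must be a multiple of CAM size"
    cam_up = np.kron(cam, np.ones((fy // cy, fx // cx)))
    return (1 - alpha) * frame + alpha * cam_up

frame = np.random.default_rng(0).random((224, 224))  # stand-in for a nip video frame
cam = np.zeros((7, 7))
cam[3, 3] = 1.0  # a single hot region, e.g., inside the ROI
out = overlay_cam(frame, cam)
print(out.shape)  # (224, 224)
```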

Discussion
The aim of this study was to develop a training and verification scheme for DL models for the automated classification of high-speed videos of fluid splitting-related patterns in the gravure printing nip. Depending on the model details, classification accuracies of 94 % to 100 % were achieved for all trained DL models after 20 training epochs. We consider this a very promising result. Only a few videos were misclassified by certain DL models, as summarized in the confusion matrices in Fig. 10. The training duration of about one to seven hours on a CPU was acceptable for our research purposes. The 2-class-models achieved higher accuracies than the 3-class-models for the same architecture and resolution, see Table 3. This was not unexpected, as the 3-class classification problem is also more challenging for a human referee than the 2-class problem. It is consistent with the fact that the characteristic features of the LSR and PSR allow for a closer, more selective class definition, and are therefore associated with a larger structural entropy, which, in turn, means a superior learning efficiency for the DNNs. The MR, in contrast, comprises a relatively broad variety of features, and is therefore more challenging to recognize. Another explanation could be that the qualitative labels provided for the Schäfer data set, especially for the MR videos, contain some degree of inconsistency themselves, and do not allow a 100 % clear distinction by a DL model. The 2-class-models were able to distinguish the PSR and the LSR confidently, whereas videos labeled as MR yielded a broad range of class probabilities for PSR and LSR, see Fig. 12.
One might be tempted to identify the MR within the 2-class-models by setting a lower and an upper threshold on the PSR and LSR class probabilities. For example, a video could be classified as MR if the class probability for the PSR was between 30 % and 70 %. If the class probability for the PSR was less than 30 % or more than 70 %, the video would be classified as LSR or PSR, respectively. In this manner, 2-class-models could be used to distinguish all three fluid splitting regimes. The classification performed by the trained 2-class-models could even outperform the human referee by providing a more objective classification criterion in the form of thresholds. However, the use of such thresholds as a criterion for a third class could be spurious, because it would ultimately not distinguish between a pattern with truly unique features and a pattern which is just a superposition or collage of dots and fingers. Thus, the classification would be misleading. Using 3-class-models, however, we could show that the third class exists and is distinct from a superposition of the other classes, and that its features can be taught to the network in the learning process even though they are not explicitly known. In spite of the somewhat smaller accuracy, the 3-class-models confidently classified MR videos. Videos labeled as MR had high median class probabilities for MR and low median class probabilities for LSR and PSR. A clear discrimination of the third class from mixtures of the other two classes is thus possible. Regarding videos with a clear MR class assignment, a closer inspection with respect to unidentified, more complex symmetries or periodicities could be the next step in finding the relevant features of this third class. This could be done using spectral analysis of the fingering patterns within the new regime. We do not expect that extending the number of training epochs beyond 20 would have a substantial impact on the accuracy for most of the models.
The recognition rate had already reached a high level, and the test accuracy curves appeared to approach an upper limit. However, the test accuracy of model #2 with data augmentation might still have a rising trend, see Fig. 9; extended training could have a positive effect here. The number of frames used for training had a significant effect on the test accuracy only for the MR classification in the best-performing model (#2). The best results were already achieved using only four to six frames per video.
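The hypothetical 30 %/70 % threshold criterion for deriving a third class from a 2-class-model, as discussed above, would look as follows; it is shown only to make the (spurious) criterion concrete, not as a recommended classifier:

```python
def classify_from_2class(p_psr: float, lower: float = 0.30, upper: float = 0.70) -> str:
    """Map a 2-class PSR probability to three regimes via fixed thresholds.

    Thresholds follow the illustrative 30 %/70 % values from the text; the
    complementary LSR probability is 1 - p_psr.
    """
    if p_psr > upper:
        return "PSR"
    if p_psr < lower:
        return "LSR"
    return "MR"  # everything in between is declared "mixed"

print(classify_from_2class(0.95))  # PSR
print(classify_from_2class(0.10))  # LSR
print(classify_from_2class(0.50))  # MR
```

The weakness discussed above is visible directly: any video whose dot and finger features merely average out to an intermediate PSR probability would be labeled MR, regardless of whether it shows genuinely new features.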
CAMs for selected frames provided deep and useful insight into the decision-making process of the 3D-CNN architecture models. We found that the DL models indeed focused on the pattern formation within the ROI, although features outside the ROI were partly considered as well. From this, we conclude that the classification relied on the particular pattern features, even though the images contain a lot of structural information and entropy that is not related to pattern formation, e.g., the gravure raster and the borders of the test field, both of which have intense optical contrast and exhibit peculiar geometric details on length scales comparable to the dots and fingers. Therefore, we regard DL as a reasonable approach for distinguishing pattern formation regimes, even though the pattern contrast was not dominant and was sometimes obscured by redundant features. The trained DL models can be used for the automated classification of pattern formation regimes in further unlabeled video data sets, provided that the experimental high-speed video setup does not differ significantly from Schäfer's setup. In case of a significant change, the performance of the ML models would have to be tested and, if necessary, the models retrained with labeled high-speed videos from the new experimental setup. One could also consider training the models only on the stabilized ROI to obtain DL models that are more independent of experimental conditions such as the field of view.

Conclusions and outlook
In our study, we trained and compared several DL models on video frames of the complex, highly dynamic meniscus at the liquid-air interface that evolves through fluid splitting in a gravure printing press. We trained the models with different classes of meniscus patterns, i.e., LSR, PSR, and MR. Each such pattern was defined by specific local geometric features, indicating particular nip hydrodynamics, but was also highly stochastic. The features that defined the classes depended on, and could be manipulated by, the physical printing parameters, namely the printing velocity and the raster geometry of the gravure cylinder. The DL models were quite successful in assigning the correct classes to the video data, with test accuracies between 94 % and 100 %. From this, it can be concluded that it is possible to identify the parameter regimes of the prevailing fluid splitting hydrodynamics, and to establish a correlation between printing parameters and the mechanisms and phenomenology of a possibly very delicate pattern formation problem. This was our initial goal, in spite of the fact that we do not yet fully understand the details of fluid splitting from the hydrodynamic point of view. The automated classification via DL enables us to sort the videos and apply specific further analyses to each pattern class. Another conclusion results from the comparison of 2- and 3-class-models concerning the MR. Provided that the MR is regarded as a truly unique pattern rather than a superposition of PSR and LSR, we suggest using 3-class-models rather than 2-class-models for classification. Several methods, i.e., the calculation of test accuracies as well as the analysis of confusion matrices, class probabilities, and CAMs, were applied to confirm that the DL process was successful and that the trained DL models focused on the ROI over the course of the training.
By analyzing the CAMs of subsequent layers of the 3D-CNN, we demonstrated that the network increasingly shifted its focus to the actually relevant pattern details in the ROI. The hot areas in the CAMs of later layers appeared more uniform and extended over much larger scales than the gravure raster. In contrast, the CAMs of earlier layers typically consisted of more or less randomly distributed small hot islands, partly located outside of the ROI. We interpret this as an indication that later layers put more weight on the long-range autocorrelation in the evolving liquid meniscus pattern, and on its immanent, even though stochastically blurred, finger or dot periodicity.
Concerning pattern formation analysis, RNNs/3D-CNNs have interesting pattern detection capabilities which could supplement conventional classification methods: classification by the visual perception of an experienced human referee, and by spectral analysis, i.e., by fast Fourier transformation (FFT) of the video images and subsequent identification of characteristic spikes. Due to the stochastic nature of the patterns, none of these analysis methods need be fully conclusive, even if large quantities of data are processed.
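As a point of comparison, spike identification in a spectrum is straightforward when a pattern is strictly periodic. The sketch below detects the dominant spatial frequency of a synthetic, noisy finger profile via FFT; the signal is artificial and merely illustrates the conventional method referred to above:

```python
import numpy as np

# Synthetic intensity profile across the nip: a periodic finger pattern
# (12 cycles over the window) plus Gaussian noise.
rng = np.random.default_rng(1)
n = 512
x = np.arange(n)
profile = np.sin(2 * np.pi * 12 * x / n) + 0.3 * rng.standard_normal(n)

# Magnitude spectrum of the mean-subtracted profile; skip the DC bin.
spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
peak_freq = int(np.argmax(spectrum[1:])) + 1  # in cycles per window
print(peak_freq)  # -> 12
```

For a clean periodicity, the spike stands far above the noise floor; for the branched, aperiodic MR structures, however, the energy spreads over many bins, which is exactly why FFT-based spike identification loses its discriminative power there.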
In summary, we would like to emphasize the following key advantages of DL for pattern classification in the gravure printing fluid splitting process:
1. Independence of subjective human judgment: DL makes the distinction of spontaneously forming patterns independent of human judgment, and can handle large data quantities, which can help to improve statistical significance.
2. Multiple patterns: If it were clear that no more than two different patterns are possible (PSR, LSR), human judgment or FFT analysis of periodicities might be considered adequate for pattern discrimination. However, if there are many possible patterns in a noisy environment, NNs take advantage of long-ranged correlations in the noisy background. In contrast to FFT analysis, which is only sensitive to periodic features, DL can discriminate much more complex characteristics such as the branching frequency in network-like structures.
3. Time correlations: The very strength of video-based DL pattern recognition is the resolution of time correlations in a dynamic pattern, in addition to the spatial correlations of the patterns. We were able to demonstrate that this feature significantly reduced the number of erroneous assignments of the observed patterns to their respective classes.
DL is a useful tool for hydrodynamic and pattern formation research in general, especially when combined with large-scale printing technology. We would like to make the reader aware of our recent DL studies which did not use high-speed videos from the nip, but a large data set consisting of the finished printouts of a gravure printing press. 37 In principle, data generation and pattern recognition capabilities could be scaled up to huge data sets almost without an upper bound. NN capabilities have improved especially in the last decade with the advent of parallelized computation on GPUs. Printing technology offers fluid handling and transport performance and accuracy that have advanced drastically in the same manner over the past two decades, driven also by digitalization and economization. This could make stochastic data evaluation feasible even in cases where conventional methods appear statistically hopeless.
For future work, a deeper insight into the decision-making process of the trained DL models would be highly desirable. For this purpose, one could consider implementing further visualization methods for our DNNs apart from the CAMs for 3D-CNN architectures. Also, a classification into more than three fluid splitting regimes should be considered in order to accommodate the large variety of different pattern formation phenomena observed. Moreover, the MR should be investigated in closer detail, because this class may be composed of different, possibly very complex patterns. The distinction between LSR and MR as well as between PSR and MR is particularly challenging, because the class appears in a comparably small number of videos only, and MR is not defined in terms of geometric features alone, but also comprises time-dependent aspects. As a further option, pattern recognition could be made more independent of subjective human perception by implementing unsupervised learning methods. Labeled data would no longer be needed for teaching; instead, the AI would identify pattern regimes by itself. However, unsupervised learning for image classification is presently still very challenging, and a rapidly developing topic in current research. As an alternative, we consider the use of unsupervised ML methods an interesting option, e.g., clustering based on previously extracted features such as specific pattern symmetries.

Funding
Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.