High Resolution Interferometric Imaging of Liquid-Solid Interfaces with HOTNNET

A variety of imaging methods are available to obtain kinematic data at an interface, with widely varying spatial and temporal resolution. These methods require a trade-off between imaging rate and resolution. A deep learning framework, trained on synchronized profilometry data acquired with two imaging modalities at two different spatial resolutions, offers a way to enhance spatial resolution while maintaining temporal resolution. Here, Fizeau interferometry (FIF) and frustrated total internal reflection (FTIR) are combined through such a framework to overcome the resolution-rate trade-off. The FTIR imaging data are recorded at high resolution, while the FIF imaging data are recorded at lower resolution over a larger field of view. We apply a deep learning framework using a multi-layer convolutional neural network to enhance the FIF image resolution; with it, we achieve the high spatial resolution of measurements obtained by FTIR imaging in all three dimensions from the lower resolution FIF data. A high-order overset technique then assembles full up-scaled images from the network outputs without loss of precision. The accuracy of the super-resolved images is evaluated using test data. This hybrid framework, called HOTNNET, is applied in its entirety to high-speed imaging profilometry data acquired in the study of droplet impacts on a smooth, solid surface, and is used to recover full, high-resolution images at high rates by unwrapping the phase of the interferometry. The framework can be readily adapted to other paired datasets by retraining the network on the new data.


Introduction
Myriad applications of imaging involve rapid phenomena [1][2][3][4]. In studies of these phenomena, a trade-off between spatial and temporal resolution is imposed by the data bandwidth of typical high-speed camera hardware: we can obtain pixel-dense images at low imaging rates, or low pixel-count images at high rates, but we cannot obtain high pixel-count images at the highest frame rates without specialized techniques [5]. In addition to this technical constraint, physical phenomena such as optical diffraction and the finite numerical aperture of the image-forming objective also restrict spatial resolution. These limitations impede our ability to resolve rapid, multi-scale phenomena.
For rapid and multi-scale interfacial phenomena, interferometric imaging is often used as a means of visualization. This imaging modality has numerous applications in scientific and engineering fields, ranging from contact mechanics [6], fluid interfacial phenomena [7][8][9][10][11], biotechnology [12], nanotechnology [13], astronomy [14] to more industrial sectors like semiconductor [15] and display industries [16]. Interferometric imaging has the added benefit that it can be used as a non-invasive optical measurement technique. With interferometric imaging, the information about the target physical quantity is encoded in a two-dimensional fringe pattern, where the intensity of the fringes indicates the optical interference at a given spatial location within the image. This introduces a problem of lost phase information, leading to ambiguity in the interpretation of the interferogram; the ambiguity of the phase is only resolved in a process called demodulation. The demodulation process can limit the precision and accuracy of the measurement and requires a large field of view to work effectively.
Interferometry is often used in measurements of deformation during contact to obtain precise information about the gap between adjacent surfaces via interfacial profilometry; these measurements provide insight into material displacement, and can characterize material response due to mechanical loading or wear. As a result of the small spacing between adjacent surfaces, contact phenomena can occur on fleeting timescales; thus the problem of insufficient imaging resolution renders the aforementioned challenge of interferogram demodulation very difficult indeed. For fast interfacial dynamics, the limitations of high-speed imaging require that spatial resolution is reduced, or the field of view is shrunk, to obtain the necessary temporal resolution; however, any sacrifice of resolution or field of view can impair the demodulation effort by blurring or decreasing the fringe resolution. Even for slow or stationary interfacial profilometry, surface discontinuities, sharp interface changes and high slopes in the profilometry can degrade the fringe contrast [17] which makes it challenging to use interferometric optical techniques in nano-scale surface topographies [18].
Within the domain of contact mechanics, droplet impacts exemplify the rapid phenomena that can occur at an interface. Many experiments on droplet impacts use interferometry to directly visualize the contact profile and also the air gap between a liquid and a flat smooth substrate during impact [19][20][21]. Interferometric fringes can be demodulated to reveal the three dimensional profile of the air layer beneath the impacting drop, with a trough-to-peak resolution of approximately 150 nanometers between the dark and bright fringes. In these studies, however, absolute measurements of the air gap can only be obtained through more technically advanced implementations, such as white-light interferometry [8] or two-color techniques [22,23], and require additional hardware and processing.
We address this imaging challenge using deep convolutional neural networks (DCNN) to obtain accurate and absolute values of the target quantities, while bypassing the need for explicit interferometric phase unwrapping. DCNNs were recently used to recover cohesive properties in the context of dynamic fracture mechanics from experimental interferometry images [24] in a manner similar to our recovery of FTIR imaging data from interferometric imaging data. Here, we exploit the proven ability of DCNNs [25] to obtain a super-resolved image from a low-resolution counterpart in the study of droplets impacting on a glass surface. Our DCNN, adapted from Visual Geometry Group (VGG) deep convolutional network [26], simultaneously demodulates the interferometric fringe patterns and maps them to target quantities. This builds upon recent efforts to use neural networks to unwrap the phase of interferometric fringes [27][28][29], that show better accuracy [30] and robustness [31] than approaches like phase shifting [32] and spatial phase demodulation methods [33].
A set of images recorded during the droplet impact experiments comprises the DCNN training database. We image the entire spatial extent of the droplet impact interface with an interferometric modality using a large field of view. Concurrently, we employ a frustrated total internal reflection (FTIR) microscopy modality to image a narrow sub-region of the interface at a higher resolution, producing the absolute target quantities. We then crop the interferometry images and the FTIR images to the common sub-region, thus forming the data pairs used to train and test the DCNN. Once trained, the network then maps all the other sub-regions to high resolution data; these upscaled subregions are then assembled via a high-order overset image reconstruction technique [34] for the entire sequence of low-resolution interferometric images. The hybrid processing framework, consisting of the High-order OverseT assembly method and the deep convolutional Neural NETwork (HOTNNET), produces demodulated, super-resolved images on a large field of view. We can then use a nanometer-precise, closed-form mapping function [35] to treat the HOTNNET output, thus obtaining accurate, absolute measurements with improved resolution along all directions, with accuracy of the order of the sub-fringe thickness resolution obtained with two-color interferometric imaging methods [36].

Imaging Setup
A 1.6 mm diameter drop of water-glycerol solution was released from a syringe luer adapter, and impacted on a smooth glass surface, as shown in Fig. 1(a) at top. The droplet viscosity in each experiment was selected from 8 different viscosities in the range of 1 to 100 centi-Stokes. Impact velocity was controlled by releasing the droplet from among 5 different fall heights, H, varying from 12.5 to 24.5 mm. At each height, all viscosities were tested once; thus, in total, 40 impact experiments were conducted. Prior to liquid-solid contact initiation, a nanometer scale air film forms between the droplet and the glass surface. The initial thickness of this air film varies inversely with impact velocity, from approximately 300 nm to tens of nm [37]. The liquid-solid-air interface was simultaneously imaged with two independent optical measurement techniques: FTIR and Fizeau Interferometry (FIF). These data were used in reports on the lift-off instability beneath an impacting droplet [38].
In the FTIR method [2,[37][38][39], the top surface of an optically smooth dove prism (BK-7 glass) is illuminated with collimated light, aligned to be incident at an angle greater than the critical angle for total internal reflection at a glass-air interface, but smaller than the critical angle at the glass-liquid interface. This light excites an exponentially decaying evanescent field above the prism's surface, and when the liquid droplet enters within a few decay lengths of the surface, some light is transmitted to the droplet, and the intensity of the reflected light is less. The proportion of light transmitted to the drop increases as the droplet gets closer to the surface, further reducing the reflected intensity. The reflected light is imaged onto a high-speed camera sensor using a long-working distance (Mitutoyo 5X) microscope objective. The normalized light intensity is directly related to the air layer thickness by a deterministic transfer function, which varies with the incident light polarization, as described in Appendix 1.
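The sensitivity range of the FTIR measurement is set by the decay length of the evanescent field above the prism surface. As a rough sketch using the standard evanescent decay-length formula for total internal reflection, with the HeNe wavelength of the source used here and an assumed 45° angle of incidence (the actual angle and the full polarization-dependent transfer function are given in Appendix 1):

```python
import numpy as np

def evanescent_decay_length(wavelength_nm, n_glass, n_air, theta_deg):
    """Penetration depth of the evanescent field at a glass-air interface
    under total internal reflection (standard textbook formula)."""
    theta = np.radians(theta_deg)
    k = n_glass * np.sin(theta)
    if k <= n_air:
        raise ValueError("angle below the critical angle: no total internal reflection")
    return wavelength_nm / (4.0 * np.pi * np.sqrt(k**2 - n_air**2))

# HeNe laser (632.8 nm) on BK-7 glass (n ~ 1.515); the 45 degree angle is an
# assumption for illustration, not the measured setup value.
delta = evanescent_decay_length(632.8, 1.515, 1.0, 45.0)
print(f"decay length ~ {delta:.0f} nm")  # on the order of 100 nm
```

This order-100 nm decay length is why the reflected intensity is so sensitive to nanometric changes in the air gap.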
The interferometric method requires a second optical path. In this approach, collimated light is directed onto the optical path using a 50-50 beam-splitter and focused onto the impact surface. When the droplet enters to within the coherence length of the light, an optical cavity is formed between the droplet surface and the solid surface. Some of the light reflects from the solid-air interface; a smaller amount of the transmitted light then reflects from the air-liquid interface. Depending on the gap thickness between the solid and the liquid, these two reflected beams will interfere [22,40]. The interference between the two beams reflected from the spatially varying gap results in a pattern of fringes that is subsequently imaged onto a second high speed camera sensor. Both FTIR and FIF optical configurations are depicted in Fig. 1(a). Alignment of the optical paths and robust optical mounts ensured image registration fidelity across the 40 experiments.
Simultaneous FIF and FTIR imaging has been used in recent interfacial mechanics studies to probe the liquid-air interface beneath an impacting droplet [41][42][43]. These imaging modalities provide complementary information about the liquid-air profile at different scales and with different resolutions: FTIR can resolve gaps up to the wavelength of light, with exponentially increasing resolution as the droplet approaches the surface, whereas FIF can be used to measure gaps of up to several microns with lesser resolution. During simultaneous imaging, FTIR provides a reference for absolute measurement of the liquid-air interface with FIF; however, this does not remove the need for FIF fringe demodulation.

Fig. 1 (a) Light emitted from a Thorlabs HNL150LB HeNe laser (L1) illuminates the upper surface of the Dove prism (P), totally internally reflects from the glass-air interface directly beneath the impacting droplet, exits the prism, passes through a dry 5X Mitutoyo microscope objective (O1), and is captured by camera C1. Fizeau interferometry data are acquired along an independent light path: light from a Thorlabs M530L2 green LED source (L2) is collimated, passes through the beam splitter (BS), and is captured on sensor C2 through another dry 5X Mitutoyo microscope objective (O2). C1 and C2 are Phantom V711 and Phantom V7.3 fast cameras, respectively, with tube lenses; they record at 180,000 and 90,000 frames per second, respectively, and are simultaneously triggered by an external analog signal for each droplet impact experiment. Mirrors (M, M1, and M2) are used for light-path alignment. (b, c) Example images of the thin air film during contact initiation beneath the droplet: FTIR image (b) and Fizeau interferometry fringe image (c). The FTIR image is stretched by a factor of √2 in the x 2 direction (Appendix 1) to account for the oblique illumination angle, unifying the scale of the major and minor axes of FTIR images. The interferometric image covers a larger region of the surface. The surface region sampled simultaneously by both imaging methods is shown by a blue rectangular box with a height and width of 0.35 mm and 1.42 mm, respectively. (d, e) Image sequences illustrating the rapid nature of contact initiation; onset of wetting is mediated by the formation and growth of nanoscale liquid bridges, binding the liquid to the solid through a thin film of air. The FTIR modality better resolves the fleeting time scales of contact initiation, but over a narrower field of view.
In our simultaneous FIF and FTIR imaging setup, the FIF imaging path covers a larger physical region extending over the entire impact zone beneath the droplet; by contrast, the FTIR modality images a smaller region near the impact axis with greater magnification. The common region recorded by both imaging methods is shown by a blue rectangular box in Fig. 1(b and c). The FIF images comprise 128 × 128 pixels at a resolution of 12.7 microns per pixel in the x 1 and x 2 directions, with a precision of about one micron in the x 3 direction. The FTIR images comprise 512 × 64 pixels at a resolution of 3.8 by 5.0 microns per pixel in the x 1 and x 2 directions, respectively; FTIR resolves the gap thickness to within 10 nanometers in the x 3 direction [35]. FTIR and FIF images are recorded at 180,000 and 90,000 frames per second, respectively; example images are shown in Fig. 1(b and c). The sequence of images in Fig. 1(d and e) illustrates the rapid nature of contact formation.

Deep Convolutional Neural Network
We developed a deep convolutional neural network that maps our FIF images to our FTIR images. The network we employed [44] is adapted from the EnhanceNet neural network [26]. Similar networks have proven to be a robust tool in numerous applications in the field of data science whenever information is abundant [45,46], and there is considerable interest in using these networks in engineering applications [24,47,48]. The structure of our network is detailed in Appendix 2 and is available online [44].
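The actual architecture, adapted from EnhanceNet, is detailed in Appendix 2 and in the released code [44] and is not reproduced here. As a minimal numpy illustration of the building block such networks stack, the following sketch applies one convolution-plus-ReLU layer to a 3-frame FIF input of the dimensions used in this work; the kernel count and size are arbitrary placeholders, not the trained network's parameters.

```python
import numpy as np

def conv2d(x, kernels, stride=1):
    """Valid 2-D convolution of a (C_in, H, W) input with a bank of
    (C_out, C_in, kH, kW) kernels: the basic DCNN building block."""
    c_out, c_in, kh, kw = kernels.shape
    _, h, w = x.shape
    oh, ow = (h - kh) // stride + 1, (w - kw) // stride + 1
    out = np.zeros((c_out, oh, ow))
    for o in range(c_out):
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i*stride:i*stride + kh, j*stride:j*stride + kw]
                out[o, i, j] = np.sum(patch * kernels[o])
    return out

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 25, 108))          # 3 stacked FIF frames (see below)
k1 = rng.standard_normal((8, 3, 3, 3)) * 0.1   # placeholder: 8 feature maps, 3x3 kernels
features = relu(conv2d(x, k1))
print(features.shape)  # (8, 23, 106)
```

A full super-resolution network repeats such layers and ends with upsampling stages that bring the output to the 64 × 364 FTIR target size.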
To train the network, we generate input-output data pairs from the images recorded during the impact experiments. Prior to training, the images are treated with a pre-processing protocol. First, the FIF images are interpolated in time using a quadratic interpolation function in order to synchronize the FTIR and FIF images; this yields a total of 14,871 time-synchronized frames from the 40 droplet impact experiments. Second, the data from the common region indicated by a blue rectangle in Fig. 1(b and c) are denoised and cropped from the images; the dimensions of the FIF and FTIR images are thus reduced to 25 × 108 and 64 × 364 pixels, respectively. Third, from each FIF-FTIR pair, 3 additional data pairs are generated by mirroring the images on the x 1 , x 2 , and x 1 − x 2 axes, which increases the number of data pairs to 59,484. As the final step, instead of pairing the FIF-FTIR images on a 1-to-1 basis, the input FIF data are augmented with 2 additional FIF frames, one before and one after the target time step, resulting in the pairing of 3 successive FIF images to 1 FTIR image and yielding data pairs with dimensions 3 × 25 × 108 ⇒ 1 × 64 × 364. The sequence of 3 FIF frames provides additional information to the network, facilitating the interferometric signal phase unwrapping and better capturing the dynamics of the liquid-air interface.
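The mirroring and temporal-stacking steps above can be sketched as follows. Here the x 1 − x 2 mirror is interpreted as the combined flip (a 180° rotation), which preserves the frame dimensions; the array sizes match those in the text, while the data themselves are random placeholders.

```python
import numpy as np

def mirror_augment(fif, ftir):
    """From one FIF-FTIR pair, generate the 3 additional mirrored pairs
    (flips about the x1 axis, the x2 axis, and both combined)."""
    pairs = [(fif, ftir)]
    for flip in (lambda a: a[::-1, :],      # mirror about x1
                 lambda a: a[:, ::-1],      # mirror about x2
                 lambda a: a[::-1, ::-1]):  # combined flip
        pairs.append((flip(fif), flip(ftir)))
    return pairs

def stack_frames(fif_seq, t):
    """Pair 3 successive FIF frames (t-1, t, t+1) with the FTIR frame at t."""
    return np.stack([fif_seq[t - 1], fif_seq[t], fif_seq[t + 1]])

fif_seq = np.random.rand(10, 25, 108)  # placeholder cropped, denoised FIF frames
ftir = np.random.rand(64, 364)         # placeholder cropped FTIR frame at t = 5
x = stack_frames(fif_seq, 5)
pairs = mirror_augment(fif_seq[5], ftir)
print(x.shape, len(pairs))  # (3, 25, 108) 4
```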
The total set of data pairs was split into 2 pools: the first pool consists of 32 impact experiments, whose images were used for both training/validation and testing purposes; the second pool consists of 8 separate droplet impact experiments, whose images were used entirely for the final testing of the network performance. To train the network, 12,000 and 2,000 data pairs were randomly drawn from the first pool to assemble the training and validation data sets, respectively. The testing data set consists of the remaining data pairs from the first pool and all the pairs in the second pool: in total, 45,484 data pairs that the network did not see during training were used to test its performance. Of this testing set, 12,395 pairs were from the second pool, which was entirely left out during the network training. The root mean square error over all pixels, averaged over the 45,484 images, is of the order of 10⁻⁶, as shown in Appendix 2.
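The split protocol can be sketched as follows; the per-pool pair counts are inferred from the totals quoted in the text (59,484 total pairs, of which 12,395 come from the 8 held-out experiments).

```python
import random

random.seed(0)
# Inferred pool sizes: 59,484 total pairs minus the 12,395 held-out pairs.
pool1 = list(range(47_089))   # pairs from the 32 train/validation/test experiments
pool2 = list(range(12_395))   # pairs from the 8 experiments reserved for testing only

random.shuffle(pool1)
train = pool1[:12_000]
val = pool1[12_000:14_000]
test = pool1[14_000:] + pool2  # everything the network never sees during training
print(len(train), len(val), len(test))  # 12000 2000 45484
```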
There are four key stages of the droplet impact process, where the images appearing in each stage are somewhat distinctive; namely, before the contact initiation, immediately after the contact initiation, late stage of liquid-solid contact growth, and complete liquid-solid contact development. The last stage is discernible when the contact line(s) become stationary after sweeping the whole contact area and a single or multiple air bubbles might remain trapped. This stage of the impact process is distinguished by no perceptible change from frame-to-frame in the recorded images. Example network output along with the FIF and FTIR images for these four key stages is shown in Fig. 2. The network is able to capture the outline of the contact regions successfully at different stages of impact while simultaneously increasing the spatial resolution of the images. This "super-resolution" imaging is achieved by the network due to the higher magnification of the FTIR image. Additionally, the network enhances accuracy in the x 3 direction as a natural consequence of the elevated resolution along x 3 with the FTIR method. Thus, transforming the FIF data to the FTIR data by the use of deep learning implicitly bypasses the need for unwrapping the fringe patterns.

Mosaic Mapping with Overset Technique
The DCNN input dimension must match the size of the data used to train the network [49]. Our deep neural network is trained to transform subregions of the FIF images with dimensions of 3 × 25 × 108, converting them to high resolution 64 × 364 FTIR-like images. However, we aim to recover the entire dynamics recorded over the full 128 × 128 pixel field of view of the FIF image sequence. As a consequence of the mismatched dimensionality of our network input and the full FIF image, we must find a way to apply our trained network to the full FIF image despite this mismatch. Stitching image sub-regions is a common solution in super-resolution imaging applications. To avoid losing accuracy, we employed the overset grid technique [34,50], inspired by the Chimera method used extensively in high fidelity numerical simulations of fluid flows [51][52][53][54]. This method breaks a discretized domain into subregions and connects them by enforcing high order continuity conditions in an overlapping, or overset, neighborhood of each subregion's boundary.
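The partitioning of the full FIF image into overlapping, network-sized tiles can be sketched as follows; the tile dimensions and minimum overlaps are those used in this work, but the function itself is an illustrative sketch rather than the released implementation [44].

```python
def tile_starts(length, tile, min_overlap):
    """Start indices of fixed-size tiles covering [0, length) with at least
    `min_overlap` shared pixels between neighbouring tiles."""
    step = tile - min_overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile flush with the image boundary
    return starts

# FIF image: 128 x 128 pixels; network input tiles: 25 x 108 (x1 by x2)
rows = tile_starts(128, 25, 2)   # at least 2 overlapping pixels in x1
cols = tile_starts(128, 108, 3)  # at least 3 overlapping pixels in x2
tiles = [(r, c) for r in rows for c in cols]
print(len(rows), len(cols), len(tiles))
```

Each `(r, c)` pair marks the upper-left corner of one tile to be fed to the network.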
In order to implement the overset technique, we partitioned the FIF data such that there are a minimum of 2 and 3 overlapping pixels in the "untransformed" FIF images in the x 1 and x 2 directions, respectively, which results in a minimum of 5 overlapping data points in both directions in the transformed subdomain, as shown in Fig. 3(a). With this partitioning we break the entire FIF image into subregions; each subregion is an individual tile in the mosaic of the original FIF image. We can thus apply our neural network to each tile and obtain as output a series of tiles that together form a new mosaic image representing the piece-wise mapping of the entire FIF image into a new super-resolved image. However, for the resulting super-resolved image to be coherent, the tiles must be connected properly. Adjacent tiles in the super-resolved mosaic are connected by interpolating the 2-dimensional, equally-spaced coincident data points in the boundary region using a 6th-order Lagrangian interpolant [34]; the specific implementation of the interpolator stencils can be found elsewhere [44]. In this way, the subregions of the FIF data are fed one-by-one to the trained network, resulting in the high resolution mosaic shown in Fig. 3(b). Finally, by applying the pre-determined interpolator stencils, the whole image is reconstructed, as shown in Fig. 3(c). (Considering only the 12,395 images from the 8 droplet impact experiments that were entirely left out during the pre-processing stage ("Deep Convolutional Neural Network" section), the RMS of the error is 8.3 × 10⁻⁶ per pixel.)
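A 6th-order Lagrangian interpolant over equally spaced nodes, as used on the overlapping boundary pixels, can be sketched in a few lines; the stencil layout of the actual implementation is documented elsewhere [34,44], so this only demonstrates the weight construction and its exactness for polynomials up to degree 5.

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Lagrange interpolation weights at position x for the given 1-D nodes;
    a 6-node stencil gives a 6th-order interpolant."""
    w = np.ones(len(nodes))
    for i, xi in enumerate(nodes):
        for j, xj in enumerate(nodes):
            if i != j:
                w[i] *= (x - xj) / (xi - xj)
    return w

# 6 equally spaced nodes on an edge normalized to unit length
nodes = np.linspace(0.0, 1.0, 6)
x = 0.37
w = lagrange_weights(nodes, x)

# Exactness check: a degree-5 polynomial is reproduced to machine precision
f = lambda t: 1 + t - 2 * t**3 + 0.5 * t**5
approx = w @ f(nodes)
print(abs(approx - f(x)))
```

In two dimensions, the same 1-D weights are applied along each axis of the overlapping region (a tensor-product stencil).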

HOTNNET Results
Our DCNN can demodulate the interferometric fringe patterns and increase the data resolution of the image, as shown in Fig. 2. By virtue of the FTIR image database used to train the network, the network also enhances the image resolution along the third axis, with some loss of depth-of-field. In the prior section we introduced the means to extend the use of our DCNN to the entire FIF image via the high-order overset technique. The combined workflow of applying the DCNN region-by-region and assembling the output with the high-order overset technique comprises a hybrid processing framework that we call HOTNNET for short. With HOTNNET, we are able to enhance the resolution across the entire field of view contained in the interferometric images.
The results of HOTNNET are shown for 3 different FIF images corresponding to 3 stages of the droplet impact event in Fig. 4. Upon application of HOTNNET, instead of FIF fringe patterns the reconstructed images contain light intensities representing the air gap measurements; such an output, in principle, provides a rich database for extracting the physics of droplet impact, with images that can be mapped onto gap measurements. The HOTNNET output no longer suffers from the 150 nanometer ambiguity inherent to the monochromatic FIF measurement, and its accuracy is that of the FTIR imaging modality, which is proven to have an accuracy of a few nanometers in the x 3 direction [35].

Fig. 3 The mosaic mapping from the low resolution image (a) with 128 × 128 pixels to the higher resolution image (c) with 431 × 328 pixels. The white-outlined region near the bottom is the subset of the FIF image where high resolution FTIR data are available; it is used for training the neural network. An overset grid with overlapping nodes, adapted from computational fluid dynamics [34], partitions the FIF image into subsets of appropriate dimensions for the neural network input. The subsets in the untransformed domain (a) and the mosaics in the transformed image (c) are indicated by rectangular colored outlines; these comprise the unseen input and output of the trained neural network, respectively. The transformation is illustrated in (b) for 2 subdomain/mosaic pairs as an example; the arrows indicate application of the trained neural network. The height and width of each mosaic are normalized to unity with uniform node spacing in the x 1 and x 2 directions to construct the interpolation stencils for the overlapping pixels, as described elsewhere [34,44]. The number of overlapping pixels, indicated by the hatch pattern in (b), is at least 5 in the transformed subdomain; this allows for a 6th-order Lagrangian interpolation of the neural network output.
Using HOTNNET, we retain the resolution enhancement of the DCNN for each individual tile of the mosaic -the 128 × 128 images are up-sampled to 431 × 328 pixels.
Notably, any defect in the imaging system, e.g., dust on the camera sensor, is removed by HOTNNET, as indicated in Fig. 4(a-c). This is likely due to the use of 3 successive frames in the network training. Even in the presence of such artifacts, the output resolution of HOTNNET is at the same level as the FTIR imaging, suggesting that HOTNNET is capable of removing static noise from the input images.
HOTNNET successfully demodulates the grayscale and the profile edges of recorded FIF images. Some minor spurious noise is visible in Fig. 4(a and b), arising from the very rapid droplet spreading and the fastest dynamic contact line movement in these experiments. The recognition of fringe patterns is not as precise in Fig. 4(c), but the trapped air pockets (light spots in the dark area) are accurately identified. The lower quality of unwrapped fringes in this case arises from the lack of similar patterns in the network training data.
The overset technique employed in HOTNNET ensures that the reconstructed images do not show any non-uniformity in the inter-boundary regions of the tiles, thus eliminating the so-called 'edge effect,' as shown in Fig. 5. The high order stencils of HOTNNET ensure that even at the tile edges the error is of the order of 10⁻⁴; the error is larger very close to the liquid-solid contact due to the sharp change of intensity there.

Discussion
In this work, we introduce HOTNNET, a computational framework that upscales and demodulates interferometric images, thus enhancing spatial resolution in all three dimensions. At the core of HOTNNET is a deep convolutional neural network that eliminates static artifacts from the images while simultaneously enhancing resolution. HOTNNET uses a high order overset technique to produce full super-resolved images from interferometric images of a given size and is not constrained by the dimensions of the initial network training sets. To demonstrate the HOTNNET framework presented in this paper, we used a total of 21,888 frames generated from experiments on droplet impact interfaces. The underlying physical results are explained elsewhere [35,38]: when the drop approaches the surface, even at low speeds, the initial dynamics of wetting are strongly altered by the confined air layer, which is displaced by the wetting fluid and deforms the liquid-air interface ahead of the contact line. The deformation results in a capillary disturbance that precedes the wetting front and leads to peculiarly low wetting velocities beneath the impacting drop. These dynamics, ahead of the contact line, are inertial; thus, the wetting front surfs on a capillary wave while dragging a viscous tail in its wake.
Close to the edge of the contact line there is a distinct halo region with a sharp gradient in air layer thickness (Fig. 2(b and c)). In this vicinity, the FIF images suffer from degradation of the fringe patterns due to lack of lateral resolution and sharp slopes at the liquid-air and liquid-solid interfaces. This introduces inaccuracies in capturing the actual peak/trough locations (iso-intensity contours) and their intensity magnitudes. Consequently, fringe pattern demodulation becomes impossible or extremely inaccurate. In the HOTNNET results, the outline of the halo section is captured and there is little ambiguity in the extent of the actual peak of the halo region, as the relation between the air layer thickness and the normalized light intensity is monotonic. However, the mapping function from FTIR to air layer thickness given in Appendix 1 may need correction [55] for high slopes and curvatures.

Fig. 4 Converting low resolution FIF fringes to high resolution, easy-to-interpret, FTIR-like images with the hybrid overset technique-neural network (HOTNNET) (a-c). The arrows represent application of the HOTNNET framework introduced in this article: untransformed image partitioning, application of the neural network to subsets, and mosaic mapping of the neural network output with 6th-order interpolation at edges/corners to assemble the high resolution image. The size of the reconstructed FTIR-like images (431 × 328) exceeds the input FIF data (128 × 128) by more than a hundred thousand pixels, significantly extending the spatial content of these images. Dust on the FIF modality sensor is marked by red arrows; HOTNNET removes this static noise from the demodulated images, as indicated by red rectangles.
While the HOTNNET framework can be readily adapted to other imaging datasets, including, for instance, those obtained with other interferometric imaging modalities, the weight coefficients embedded in our trained neural network correspond to our specific optical setup. Consequently, changing the imaging system requires the network to be retrained. Retraining the network with a novel imaging system requires the collection of sufficient training data; the exact volume depends on the quality of the images, but as few as several hundred images could suffice. The open-source code for HOTNNET, including that for training the network, is available on GitHub [44].
Most neural-network-based super-resolution techniques use an artificially generated database of down-sampled, high-resolution images. Real-world imaging artifacts such as sensor noise and static defects are absent from the images in these databases, and thus such networks cannot be used for quantitative imaging modalities in science and engineering. HOTNNET faithfully treats real-world recorded images because the complete data pairs for its network training are the outcome of laboratory experiments, imaging artifacts included.
HOTNNET generates the final super-resolved mosaic by first mapping local tiles that are each super-resolved with the deep convolutional neural network. Because the regions of the mosaic correspond to different stages of the dynamics, the local mapping is non-invertible; thus, on some level, the network is classifying the contents of each tile based on limited information. Like all other supervised learning strategies, the neural network in HOTNNET may therefore lose fidelity when it is subject to substantially different interferometric image inputs. Nevertheless, for sufficiently similar interferometric images, HOTNNET is capable of producing super-resolved data that can be used quantitatively, and it represents a step toward physically relevant data generated by a neural network. Indeed, given the universality of physical behaviors, it is not outside the realm of possibility to use data obtained with HOTNNET to evaluate physical principles; this would require further work and verification to ensure that results obtained from the HOTNNET output are not artifacts of the processing but are instead physical phenomena.

Conclusion
HOTNNET could be directly applied in fields of precision metrology provided that the neural network is appropriately trained. Using HOTNNET, we address constraints on our measurement due to the performance characteristics of our imaging hardware; however, one can envision a means by which HOTNNET could address other constraints. For example, if an imaging system places an elliptical aperture in the imaging plane, the imaging domain would result in a non-rectangular tile shape; the overset technique in HOTNNET can be readily adapted to such curvilinear geometries. Additionally, with appropriate calibration, HOTNNET could increase the accuracy of white light interferometry [56], which is currently limited by the bandwidth of the color filters on the CCD camera.

Fig. 5 (a) The output of HOTNNET with its subset (solid white rectangle) at the common domain specified in Fig. 1. The image intensities at overlapping pixels of the tiles (colored rectangles) are the result of Lagrangian interpolation in the overset scheme. The span (s) is the edge of one of the tiles, selected as an example. (b) The normalized error along the horizontal span s passing through the overlapping grids. The error is of the order of 10⁻⁴, but is larger very close to the liquid-solid contact due to the sharp change of intensity.
HOTNNET is designed to address constraints imposed by a given experimental setup using machine learning. This framework makes progress toward accurate and reliable physical measurements as the output of a trained neural network. Here, images recorded during droplet impact and contact formation were treated with HOTNNET, but the technique can be employed in the broader context of physics and engineering. HOTNNET thus offers a compelling and powerful tool that simultaneously achieves noise rejection, phase unwrapping, and resolution enhancement of images when direct measurements are not possible or are cost prohibitive.

Profilometry by FTIR Imaging Modality
Profilometry with FTIR image processing is described in [57]. To summarise: first, the FTIR image is stretched in the x₂ direction by a geometric factor, set by n_G, n_S, and θ₁ (the refractive index of air (gas), the refractive index of BK-7 glass (substrate), and the angle of incidence of the light at the air-substrate interface, respectively), to compensate for the optical transformation in the total internal reflection. Second, the image intensities are divided by the background image to obtain normalized intensities. Finally, the normalized intensities are mapped onto absolute air-layer thickness fields, h(x₁, x₂), using the following equations.
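The first two pre-processing steps can be sketched as follows. The stretch factor is passed in as a parameter because its exact expression depends on n_G, n_S, and θ₁ as described above, and the row-wise linear interpolation is an implementation choice for this sketch, not necessarily that of [57]:

```python
import numpy as np

def preprocess_ftir(raw, background, stretch_factor):
    """Stretch the raw FTIR frame in the x2 direction by a given geometric
    factor (set by n_G, n_S and the angle of incidence) and normalise by
    the background frame to obtain the normalized intensity field."""
    n1, n2 = raw.shape
    # Resample each row onto a grid stretched along x2 (linear interpolation)
    x2_new = np.linspace(0, n2 - 1, int(round(n2 * stretch_factor)))
    stretched = np.array([np.interp(x2_new, np.arange(n2), row) for row in raw])
    bg = np.array([np.interp(x2_new, np.arange(n2), row) for row in background])
    return stretched / np.maximum(bg, 1e-12)  # normalized intensity

raw = np.random.rand(4, 100) * 200 + 20     # synthetic raw frame
bg = np.full((4, 100), 240.0)               # synthetic background frame
I_norm = preprocess_ftir(raw, bg, stretch_factor=1.3)
print(I_norm.shape)  # (4, 130)
```
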
In these equations, n_L is the refractive index of the liquid, j is the imaginary unit, and the subscripts ⟂ and ∥ denote the s- and p-polarized components of the light, respectively. In practice, either pure s- or pure p-polarized light is used in the FTIR imaging to simplify the equations above, and h is readily deduced using a lookup table [57] instead of inverting relation (4). In our droplet impact experiments, we are specifically interested in air-layer thicknesses at nanometer scales, where van der Waals and other short-range interfacial forces can drive contact formation; these are the subject of recent numerical investigations [59]. We have targeted an accuracy of a few nanometers. Close to the darkest point of the camera's range (I = O(10)), the slope of the transfer function exceeds 10³ nanometers per unit change of normalized intensity (Î). Hence, in our overall HOTNNET imaging profilometry, we kept the normalized root-mean-squared error (NRMSE) per pixel below 10⁻⁵. However, our initial implementation of FTIR profilometry of droplet impact experiments [38] employed a simplified total-energy inversion algorithm, which assumed that all light incident upon the droplet's surface above the air gap was transmitted into the drop, neglecting the important role played by the polarization of the light; this results in the error bars shown in Fig. 6 for the profilometry of the halo section. Figure 6 shows the kinematics of the wetting front for a droplet with a viscosity of 76 centistokes falling from a height of 18 mm. The resolution enhancement in HOTNNET is performed by a deep convolutional neural network [26], consisting of several interconnected layers, as depicted schematically in Fig. 7.
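The lookup-table inversion of normalized intensity to air-layer thickness can be sketched as below. The transfer function used here is a made-up monotonic placeholder; in practice the table is computed from the Fresnel relations for the chosen polarization, as in [57]:

```python
import numpy as np

# Hypothetical monotonic transfer function h -> I_norm over the thin-film
# range; in the real pipeline this table comes from the Fresnel relations.
h_table = np.linspace(0.0, 200.0, 2001)      # air-layer thickness [nm]
I_table = 1.0 - np.exp(-h_table / 80.0)      # placeholder, strictly increasing

def thickness_from_intensity(I_norm):
    """Invert normalized intensity to thickness by table lookup.
    np.interp requires the abscissa (I_table) to be increasing."""
    return np.interp(I_norm, I_table, h_table)

h = thickness_from_intensity(np.array([0.1, 0.5, 0.9]))
print(np.round(h, 1))
```

A table lookup of this kind avoids inverting relation (4) analytically, which is why it is preferred in practice.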
The EnhanceNet is a generative, feed-forward, fully convolutional network [26], derived from the Visual Geometry Group (VGG) deep neural network [60], using strided convolutions but with leaky ReLU activations and without pooling layers. Our network uses an Adam optimizer with a learning rate of 0.0005 to minimize a mean-squared-error cost function. Our goal was to achieve a normalized root-mean-squared error of order O(10⁻⁶) in the interior of the mosaics (see Appendix 1), which is two orders of magnitude smaller than the error on the edges, i.e., O(10⁻⁴) (refer to Fig. 5).
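A minimal sketch of an EnhanceNet-style generator is given below. It uses modern tf.keras rather than the TensorFlow 1.14 API of the paper, and the layer counts, filter widths, and input tile shape are illustrative assumptions, not the exact architecture of Fig. 7; only the design constraints from the text are preserved (fully convolutional, leaky ReLU, no pooling, Adam at 0.0005 with an MSE loss):

```python
import tensorflow as tf

def build_enhancenet_like(in_shape=(25, 108, 1)):
    """EnhanceNet-style generator sketch: fully convolutional, leaky-ReLU
    activations, no pooling layers; 4x up-sampling via two stride-2 stages."""
    x_in = tf.keras.Input(shape=in_shape)
    x = x_in
    for _ in range(4):                      # feature-extraction stages
        x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
    for _ in range(2):                      # strided up-sampling stages
        x = tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding="same")(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
    x_out = tf.keras.layers.Conv2D(1, 3, padding="same")(x)
    model = tf.keras.Model(x_in, x_out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
                  loss="mse")               # Adam at 0.0005 with MSE, per the text
    return model

model = build_enhancenet_like()
print(model.output_shape)  # (None, 100, 432, 1)
```
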
With this metric, we found that EnhanceNet performs better than similar upsampling and convolution networks in terms of capturing the contact line (Fig. 8). The different networks were implemented using the TensorFlow 1.14 Python library and run on an NVIDIA Tesla T4 graphics processing unit. Our DCNN showed better accuracy and a faster rate of convergence than the GAN and U-Net architectures. Training our modified EnhanceNet took 800 epochs over 9.5 hours of runtime to satisfy the convergence criterion. This was faster and more efficient than the original EnhanceNet and VGG-16 architectures, which took over 2400 and 4000 epochs, respectively, each with a runtime of over 24 hours. Due to the fine structure of the liquid-solid contact line, this many training epochs was required for the network to reproduce these dynamics. Our attempts to train a U-Net architecture were unsuccessful (Fig. 8(c)). Batch shuffling of the training data set with a drop-out ratio of 12 percent was used to avoid over-fitting.
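The per-pixel error metric used for these comparisons can be computed as follows; normalizing by the target image's dynamic range is an assumption about the convention, stated here so the sketch is self-contained:

```python
import numpy as np

def nrmse_per_pixel(pred, target):
    """Normalized root-mean-squared error per pixel, normalizing by the
    target image's dynamic range (an assumed convention)."""
    rng = target.max() - target.min()
    return np.sqrt(np.mean((pred - target) ** 2)) / rng

target = np.linspace(0.0, 1.0, 10000).reshape(100, 100)
pred = target + 1e-5            # a uniform 1e-5 offset as a test case
print(nrmse_per_pixel(pred, target))  # ~1e-5
```

Against this metric, the text's targets of O(10⁻⁶) in the mosaic interior and O(10⁻⁴) at the edges become directly checkable numbers.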
Further, we investigate here the role of image dimensions in the convergence of training the network in HOTNNET. In an ideal scenario, a super-resolution microscopic imaging method should be capable of recording data with high precision while covering a large spatial domain. In our experiment, the FIF imaging covers a sufficiently large surface; using the HOTNNET technique, we have been able to improve the accuracy and quality of our FIF images to the same level as our locally recorded FTIR images, as shown in Fig. 9(b). We are further confident that the overset technique removes the edge discontinuity (Fig. 5). For the physics of contact lines forming beneath impacting droplets, our imaging precision in the x₃ direction is satisfactory. However, further improvements of the accuracy in the x₁ and x₂ directions, from the current 3.9 and 5.5 microns per pixel, respectively, are desirable. This augmented resolution can be achieved by replacing the current 5X microscope objective in the FTIR setup with one of greater magnification. This will decrease the actual dimensions of the subset used for the network training, i.e., the blue rectangle in Fig. 1 or the white rectangular region in Fig. 5. One must be aware that the smaller subset recorded with the greater magnification might not contain sufficient information to train the network to demodulate the FIF fringe patterns.

Fig. 6 Kinematics of the wetting front: close to the edge of the contact line there is a halo region with a sharp gradient in air-layer thickness, causing smearing of the FIF fringes as depicted in (a). The halo region leading the propagating front excites a capillary wave ahead of it. Thanks to the higher pixel-per-length ratio of FTIR microscopy, HOTNNET better resolves (b) the outline of the halo section. The profilometry across the radial span r_s is given in (c); it shows the wetting front propagating outward from the point of contact initiation.
In other words, the training subset must contain sufficient interferometric fringe variation for the network to be trainable. This would likely require at least one full modulation, from minimum to maximum intensity, in the fringe pattern. If this criterion is satisfied, the network is trainable and the rest of the HOTNNET procedure is straightforward.
To quantitatively evaluate the minimum size of the training data, we shrank the dimensions of the DCNN input images to 108 × 25, 97 × 24, 86 × 20, 76 × 18, 65 × 15, 54 × 13, and 43 × 10 pixels toward the center, and checked the performance of the network on 45,484 test data pairs (unseen during training) in terms of the root-mean-squared error over all pixels. The architecture and the other parameters of the network, including the maximum number of epochs, were maintained to ensure direct comparability of the network fidelity. The results show that larger dimensions reduce the error, reproducing the FTIR images much more satisfactorily. The network is not trainable for 43 × 10 pixels, i.e., a 0.57 mm × 0.14 mm subdomain. For a 54 × 13 pixel input, i.e., a 0.71 mm × 0.18 mm subdomain, which is approximately the subdomain size that can be covered by a 20X objective, the network is trainable but the error is high. We use a maximum root-mean-squared error of 10⁻⁴ as a threshold for trainability, which was achieved with subdomain dimensions of 86 × 20 ≈ 1700 pixels, or a 1.15 mm × 0.28 mm tile size. This width of the tile is comparable to the diameter of the droplet, i.e., 1.6 mm.

Fig. 7 The architecture of the network employed in this work. The network implicitly unwraps the input interferometry fringes, up-samples the data to a higher spatial resolution, and improves the signal-to-noise ratio at the same time. (a) All the convolutional layers employ a stride of size (3 × 3). The ○ symbols on the convolutional layers indicate rectified linear unit (ReLU) activation. (b) In the first few convolutional layers, the outlines of the grayscales are determined; in other words, the network distinguishes which stage of the contact dynamics is being presented to it. The FIF images are enlarged in the figure and not to scale.
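The minimum-tile-size sweep described above can be sketched as a loop over candidate dimensions with the 10⁻⁴ RMSE threshold as the trainability criterion. The `evaluate_rmse` function below is a made-up monotone trend standing in for "train the DCNN at this size and measure the test RMSE"; it is not the measured data:

```python
import numpy as np

# Candidate input-tile dimensions from the text, largest to smallest.
sizes = [(108, 25), (97, 24), (86, 20), (76, 18), (65, 15), (54, 13), (43, 10)]

def evaluate_rmse(h, w):
    """Placeholder for 'train the DCNN on (h, w) tiles and return the test
    RMSE'; a made-up trend decreasing with pixel count, for illustration."""
    return 2e-4 / (h * w / 1000.0) ** 2

# Keep sizes whose test RMSE satisfies the 1e-4 trainability threshold.
trainable = [(h, w) for h, w in sizes if evaluate_rmse(h, w) <= 1e-4]
print(trainable)  # with this placeholder: [(108, 25), (97, 24), (86, 20)]
```

With the placeholder trend chosen here, the cut-off falls at 86 × 20 pixels, mirroring the threshold reported in the text; in practice each entry would come from a full retraining run.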