1 Introduction

Liquid crystals (LCs), a unique bridge between solid and liquid phases, combine the fluidity of liquids with the orientational order and anisotropy of crystals [1]. They are optically anisotropic and therefore birefringent, and they fall into two main categories: thermotropic and lyotropic. The specific LC phase is determined by temperature (for thermotropic LCs) or by concentration (for lyotropic LCs). For instance, the thermotropic LC 5CB is nematic at room temperature and transitions to the isotropic phase at approximately \(35^{\circ }\)C [2]. In the nematic phase, the molecules are aligned along a common direction but lack long-range positional order. When viewed under crossed polarizers, the nematic phase exhibits a characteristic “thread-like” or “worm-like” texture. A typical example of a lyotropic LC is sodium dodecyl sulfate (SDS) in water. SDS is an amphiphilic molecule: it has a hydrophilic (water-attracting) head and a hydrophobic (water-repelling) tail. At certain concentrations and under specific conditions, SDS molecules self-organize into lyotropic LC phases, including lamellar (smectic), hexagonal, and cubic phases, which can be distinguished under crossed polarizing microscopy by their textures (or, for the optically isotropic cubic phase, by the absence of birefringence). These textures provide information about the molecular ordering within the LC phases, because they arise from molecular alignment and organization, which in turn depend on factors such as concentration, temperature, and solvent [3].

The emergence of such patterns has been investigated extensively from a soft matter perspective. Examples include block copolymers, in which different polymer blocks self-assemble [4]; colloids that organize into structures such as crystals and gels [5]; LC patterns arising from their distinctive molecular order [6]; and vortex formation in active matter, such as swimming microorganisms displaying collective motion [7]. Patterns that emerge from drying sessile droplets, often exhibiting the “coffee-ring effect” [8], have also been investigated in soft matter. When a liquid droplet containing solutes evaporates, it can leave behind distinctive deposit patterns [9]. Understanding the dynamics of a drying droplet and the resulting pattern formation is interesting from a fundamental physics perspective and also has practical applications in fields including inkjet printing [10], coating technologies [11], and biomedical diagnostics [9]. Recent studies have examined how morphological patterns emerge when a birefringent LC (5CB) is used as an optical probe in drying protein droplets [2, 12, 13]. These studies reveal that adding a fixed volume of LC to different globular protein solutions (lysozyme, BSA, and myoglobin in de-ionized water) alters the morphological patterns formed during drying [14]. In droplets of the lower-molecular-weight proteins (myoglobin and lysozyme), the LCs are randomly distributed, whereas in droplets of the heavier protein BSA, they form umbilical defect structures [12]. Interestingly, when the solvent is changed from de-ionized water to phosphate-buffered saline (PBS), the birefringence of the LCs decreases progressively as the initial PBS concentration increases from 0.25x to 1x [13].

Within the field of LCs, both traditional and deep learning approaches to supervised machine learning (ML) have been applied over the last two decades. Traditional approaches include Random Forest (RF), Support Vector Machine (SVM), Decision Trees (DT), and Multivariate Adaptive Regression Splines, among others, whereas deep learning involves feed-forward, artificial, and convolutional neural networks (NNs). SVMs have been used to predict the transition temperatures of thermotropic LCs [15]. RF [16] and ordinal networks [17] have been applied to forecast LC properties, while DTs have been used specifically to identify clearing temperatures in bent-core LCs [18]. NNs have also been used successfully to predict the calibration of LC phases [19]. Unsupervised ML has even been used to characterize particle trajectories, pitch, and conical angle of nematic phases in relation to their structural and dynamical properties [20]. Other recent investigations include the prediction of molecular ordering [21], phase transitions [22,23,24], topological defects [25, 26], and LC textures [27, 28]. Separately, various ML methods have been applied to pattern recognition in drying sessile droplets [29,30,31].

Despite significant progress in ML and LC research, the integration of LC textures and ML techniques in drying droplets has been relatively limited. This paper has two main objectives. First, we quantify LC textures using image processing techniques. Second, we employ generalized additive modeling (GAM) and, as an unsupervised ML method, k-means clustering to identify distinct stages in the drying process. Specifically, we investigate which drying stage dominates in distinguishing the different LC-textured droplets and determine the number of LC textures at each stage. Existing approaches in drying-droplet studies often rely on visual inspection or limited quantitative analysis and lack the depth and precision of the integrated approach adopted here. Moreover, quantitative analysis is usually performed either before drying or after drying is complete. In contrast, we use the entire drying process, applying texture analysis to classify the drying stages and to quantify the pattern dynamics from the evolving textures. Our study provides insight into the efficacy of drying droplets as a rapid and straightforward tool for characterizing and classifying dynamic LC textures, with potential applications to LCs and, more broadly, to multi-component colloidal drying droplets. Furthermore, our approach differs from those described above in that it classifies the LC textures with an unsupervised, data-driven method. Unlike the classification approaches described so far, k-means clustering does not require labeled training and test datasets; instead, it discovers patterns, relationships, and structures (if any) within the dataset without predefined targets.

Fig. 1

Flowchart of the approach. The drying evolution of a sessile droplet is captured with optical microscopy, and different stages of the drying process are established. Quantitative image analysis with the Pyradiomics toolbox is used to follow the dynamics of the liquid crystal (LC) textures induced by drying. The features include First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), and Gray Level Dependence Matrix (GLDM). These feature vectors are the inputs to Generalized Additive Modeling (GAM) and K-means clustering, which are used to identify the LC-protein droplets at initial buffered concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x

To achieve this, we use texture analysis with parameters of increasing statistical order, from first-order to higher-order features. The quantitative image analysis uses the Pyradiomics toolbox to reveal the dynamics of LC textures induced by the drying process.

Figure 1 illustrates the flowchart of our approach. Each droplet progresses through several stages during drying and ultimately forms a distinct fingerprint pattern. The drying droplets are examined with bright-field microscopy between crossed polarizers. Throughout the drying process, features including First-Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), and Gray Level Dependence Matrix (GLDM) are systematically extracted. Statistical data analysis is then applied to these features. This step allows us to determine the number of LC textures present and to identify which drying stage predominantly influences the identification of the different LC-textured droplets. Overall, the approach aims to provide a detailed understanding of the evolving patterns in LC textures during drying.

2 Methods

2.1 Samples and their preparations

The lyophilized form of hen-egg white lysozyme (Catalog number L6876) was purchased from Sigma-Aldrich, USA. The 1x PBS (phosphate-buffered saline) solution (Catalog number BP24384, Fisher BioReagents, USA) contains 0.137 M (\({\sim }8.0\) mg mL\(^{-1}\)) NaCl, 0.0027 M (\({\sim }0.2\) mg mL\(^{-1}\)) KCl, and 0.0119 M phosphates (\({\sim }1.44\) mg mL\(^{-1}\) of Na\(_2\)HPO\(_4\) and \({\sim }0.24\) mg mL\(^{-1}\) of KH\(_2\)PO\(_4\)) at a pH of \({\sim }7.4\); it is diluted to 0.75x, 0.5x, and 0.25x. The 0x sample uses de-ionized water (Millipore, 18.2 M\(\Omega \).cm at \({\sim }25^{\circ }\)C). For each of these solutions, 100 mg of lysozyme is weighed and mixed in 1 mL. The thermotropic liquid crystal 5CB (4-Cyano-4’-pentylbiphenyl; Catalog number 328510, Sigma-Aldrich, USA) was heated above its nematic-isotropic transition temperature (\({\sim }35^{\circ }\)C), and \({\sim }10\) \(\upmu \)L was added as a third component to each protein-saline solution. This yields five LC-protein samples with initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x.
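The buffer compositions follow from scaling the 1x values linearly. The short sketch below tabulates this, assuming (the text does not state it) that each dilution is made by mixing 1x PBS with de-ionized water; the names `STOCK_1X_MG_PER_ML` and `dilution_plan` are hypothetical.

```python
# Salt content of 1x PBS in mg/mL, taken from the composition listed above.
STOCK_1X_MG_PER_ML = {"NaCl": 8.0, "KCl": 0.2, "Na2HPO4": 1.44, "KH2PO4": 0.24}
PBS_LEVELS = (0.0, 0.25, 0.5, 0.75, 1.0)  # 0x corresponds to pure de-ionized water
LYSOZYME_MG_PER_ML = 100.0                # 100 mg lysozyme per 1 mL of solution


def dilution_plan(level, final_volume_ml=1.0):
    """Return (mL of 1x PBS, mL of water, salt concentrations in mg/mL).

    Assumes a simple linear dilution of the 1x stock with de-ionized water.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("PBS level must lie between 0x and 1x")
    stock_ml = level * final_volume_ml
    water_ml = final_volume_ml - stock_ml
    salts = {name: level * conc for name, conc in STOCK_1X_MG_PER_ML.items()}
    return stock_ml, water_ml, salts
```

For example, `dilution_plan(0.25)` gives 0.25 mL of 1x PBS and 0.75 mL of water per mL, with 2.0 mg/mL NaCl.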

A volume of \({\sim }1\) \(\upmu \)L of sample solution is pipetted onto a freshly cleaned coverslip (Catalog number 48366-045, VWR, USA) under ambient conditions (room temperature of \({\sim }25^{\circ }\)C and relative humidity of \({\sim }50\)%). At the beginning of the experiment, the droplet’s contact angle is \({\sim }40^{\circ }\). Before the experiments, the coverslips were cleaned by washing with ethanol, isopropanol, and de-ionized water, followed by thorough drying, to remove any contaminants or residues that might interfere with the observations. The drying evolution is imaged every 2 s, with the clock starting when the droplet is deposited on the coverslip. The normalized time is the ratio of the instantaneous time to the total drying time. Each drying process lasts approximately 10 min (600 s) and is divided into three stages: initial, middle, and final. The initial stage typically lasts around 4 min (240 s), the middle stage about 3 min 30 s (210 s), and the final stage roughly 1 min 30 s (90 s). Each experiment was repeated three times, and the results were reproducible.
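The stage durations above correspond to normalized-time fractions of 240/600 = 0.40 and (240 + 210)/600 = 0.75. The following sketch assigns each frame to a stage, assuming (our assumption) that these fractions apply to every droplet and that frames are taken exactly every 2 s; the function names are hypothetical.

```python
import numpy as np

FRAME_INTERVAL_S = 2.0   # one image every 2 s
TOTAL_DRYING_S = 600.0   # approximate total drying time (~10 min)
# Stage boundaries in normalized time: 240/600 = 0.40 and 450/600 = 0.75 (assumed).
STAGE_EDGES = (0.40, 0.75)
STAGE_NAMES = np.array(["initial", "middle", "final"])


def normalized_time(n_frames, interval=FRAME_INTERVAL_S, total=TOTAL_DRYING_S):
    """Instantaneous time of each frame divided by the total drying time."""
    return np.arange(n_frames) * interval / total


def stage_of(t_norm):
    """Map normalized times to 'initial', 'middle', or 'final'.

    A time equal to a boundary is assigned to the later stage.
    """
    idx = np.searchsorted(STAGE_EDGES, t_norm, side="right")
    return STAGE_NAMES[idx]
```

With 300 frames (0 to 598 s), this yields 120 initial, 105 middle, and 75 final frames.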

2.2 Image acquisition

The images were captured at 5x magnification using cross-polarized optical microscopy (Leitz Wetzlar, Germany) in transmission mode. An 8-bit digital camera (MU300, Amscope) attached to the microscope recorded the top-view images. All experiments were conducted in a dark room, where the microscope lamp was the only light source and its intensity remained constant throughout. Nevertheless, minor intensity fluctuations were observed in the region outside the droplet.

2.3 Image processing

Quantitative features are extracted during drying with the Pyradiomics package [32] in Python (version 3.10). One droplet per PBS concentration is used for the analysis, and each drying droplet yields approximately 300 images, giving approximately 1500 images in total (300 \(\times \) 5 = 1500). The images are resized to 150 \(\times \) 150 pixels and converted to 8-bit grayscale. A mask is then selected on the time series of each droplet to define the region of interest (ROI) for feature extraction. The features were selected systematically. Shape features were excluded because top-view bright-field images capture only a 2D projection of the droplet, and the droplet outline remains unchanged during drying since its edge is pinned to the substrate. All 89 quantitative features from the Pyradiomics package were then plotted against the PBS concentration, and visual inspection identified the features that behaved distinctly for each concentration; these were selected for further analysis. The selected features are standard and can also be computed with other image processing packages. They belong to First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM), implemented in Pyradiomics by the classes RadiomicsFirstOrder, RadiomicsGLCM, RadiomicsGLRLM, RadiomicsGLSZM, and RadiomicsGLDM, respectively. Each class is initialized with an image and its associated mask and computes its features from the intensities within the ROI.
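The text does not specify the resampling method, grayscale conversion, or mask shape. The numpy sketch below makes these steps concrete under hypothetical choices: ITU-R BT.601 luma weights, nearest-neighbour resampling, and a centred circular mask standing in for the hand-selected droplet ROI. It also includes a fixed-bin-width discretization of the kind Pyradiomics documents (to our knowledge its default bin width is 25 and its bin indices are 1-based; here they are 0-based so they can index matrices directly).

```python
import numpy as np


def to_gray_uint8(rgb):
    """Convert an RGB array (H, W, 3) to 8-bit grayscale with BT.601 luma weights."""
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def resize_nearest(img, out_h=150, out_w=150):
    """Nearest-neighbour resampling to out_h x out_w pixels."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def circular_mask(shape, radius_frac=0.45):
    """Hypothetical circular ROI centred in the frame (the study selected its mask on the droplet)."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= (radius_frac * min(h, w)) ** 2


def discretize_fixed_width(values, bin_width=25):
    """Fixed-bin-width discretization into 0-based gray-level indices."""
    values = np.asarray(values, dtype=float)
    return (np.floor(values / bin_width) - np.floor(values.min() / bin_width)).astype(int)
```

In practice, the ROI values would be `discretize_fixed_width(small[mask])` for a resized frame `small` and its mask.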

For FOS, let \(X\) be the set of \(N_p\) pixels in the ROI, \(P(i)\) the first-order histogram with \(N_g\) discrete intensity levels, where \(N_g\) is the number of non-zero bins, and \(p(i) = \frac{P(i)}{N_p}\) the normalized first-order histogram. The FOS parameters are Mean (\(\bar{X} = \frac{1}{N_p} \sum _{i=1}^{N_p} X(i)\)); Energy (\(\sum _{i=1}^{N_p} (X(i) + c)^2\)); RootMeanSquared (\(\sqrt{\frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) + c)^2}\)), where \(c\) is an optional shift that prevents negative values in \(X\); Entropy (\(-\sum _{i=1}^{N_g} p(i) \log _2(p(i) + \epsilon )\), where \(\epsilon \approx 2.2 \times 10^{-16}\)); Skewness (\(\frac{\mu _3}{\sigma ^3} = \frac{\frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) - \bar{X})^3}{\left( \frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) - \bar{X})^2\right) ^{3/2}}\)); Kurtosis (\(\frac{\mu _4}{\sigma ^4} = \frac{\frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) - \bar{X})^4}{\left( \frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) - \bar{X})^2\right) ^2}\)), where \(\mu _3\) and \(\mu _4\) are the third and fourth central moments, respectively; Uniformity (\(\sum _{i=1}^{N_g} p(i)^2\)); and Variance (\(\sigma ^2 = \frac{1}{N_p} \sum _{i=1}^{N_p} (X(i) - \bar{X})^2\)).
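The first-order definitions translate directly into numpy. This is a from-scratch sketch, not the Pyradiomics implementation: the histogram uses one bin per distinct (already discretized) value, and returning 0 for skewness and kurtosis of a flat ROI is our assumption.

```python
import numpy as np

EPS = np.finfo(float).eps  # about 2.2e-16, the epsilon in the entropy formula


def first_order_features(values, c=0.0):
    """First-order statistics of the ROI intensities X, following the formulas above.

    `values` are the intensities inside the ROI; `c` is the optional shift used in
    Energy and RootMeanSquared.
    """
    x = np.asarray(values, dtype=float).ravel()
    n_p = x.size
    _, counts = np.unique(x, return_counts=True)
    p = counts / n_p                      # normalized first-order histogram p(i)
    mean = x.mean()
    dev = x - mean
    m2, m3, m4 = (dev ** 2).mean(), (dev ** 3).mean(), (dev ** 4).mean()
    energy = ((x + c) ** 2).sum()
    flat = m2 == 0                        # skewness/kurtosis undefined for a flat ROI
    return {
        "Mean": mean,
        "Variance": m2,
        "Energy": energy,
        "RootMeanSquared": np.sqrt(energy / n_p),
        "Entropy": -(p * np.log2(p + EPS)).sum(),
        "Uniformity": (p ** 2).sum(),
        "Skewness": 0.0 if flat else m3 / m2 ** 1.5,
        "Kurtosis": 0.0 if flat else m4 / m2 ** 2,
    }
```

For the toy ROI `[1, 1, 2, 2]`, this gives Mean 1.5, Variance 0.25, Energy 10, Entropy 1 bit, Uniformity 0.5, Skewness 0, and Kurtosis 1.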

A Gray Level Co-occurrence Matrix (GLCM) of size \(N_g \times N_g\) describes the second-order joint probability function of the ROI image. It is defined as \(P(i, j|\delta , \theta )\), whose \((i, j)\)th element is the number of times gray levels \(i\) and \(j\) occur in pairs of pixels separated by a distance of \(\delta \) pixels along angle \(\theta \). Each feature value is computed separately for each angle and then averaged over the angles. Let \(P(i, j)\) denote the co-occurrence matrix for an arbitrary \(\delta \) and \(\theta \), and \(p(i, j) = P(i, j)/\sum P(i, j)\) the normalized co-occurrence matrix. \(N_g\) is the number of discrete intensity levels in the image, \(p_x(i) = \sum _{j=1}^{N_g} p(i, j)\) are the marginal row probabilities, and \(p_y(j) = \sum _{i=1}^{N_g} p(i, j)\) are the marginal column probabilities. Further, \(\mu _x = \sum _{i=1}^{N_g} p_x(i) i\) and \(\mu _y = \sum _{j=1}^{N_g} p_y(j) j\) are the mean gray levels of \(p_x\) and \(p_y\), and \(\sigma _x\) and \(\sigma _y\) are the standard deviations of \(p_x\) and \(p_y\). Finally, \(p_{x+y}(k) = \sum _{i=1}^{N_g} \sum _{j=1}^{N_g} p(i, j)\) taken over pairs with \(i + j = k\) (\(k = 2, \dots , 2N_g\)), and \(p_{x-y}(k) = \sum _{i=1}^{N_g} \sum _{j=1}^{N_g} p(i, j)\) taken over pairs with \(|i - j| = k\) (\(k = 0, \dots , N_g - 1\)).

The GLCM features used here are Contrast (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_g} (i - j)^2 p(i, j)\)), Difference Average (DA) (\(\sum _{k=0}^{N_g-1} k\, p_{x-y}(k)\)), Maximum Probability (\(\max (p(i,j))\)), Correlation (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_g} p(i, j) \frac{ij - \mu _x \mu _y}{\sigma _x \sigma _y}\)), Difference Entropy (\(-\sum _{k=0}^{N_g-1} p_{x-y}(k) \log _2(p_{x-y}(k) + \epsilon )\)), Difference Variance (\(\sum _{k=0}^{N_g-1} (k - DA)^2 p_{x-y}(k)\)), Sum Entropy (\(-\sum _{k=2}^{2N_g} p_{x+y}(k) \log _2(p_{x+y}(k) + \epsilon )\)), and Inverse Difference Moment (IDM) (\(\sum _{k=0}^{N_g-1} \frac{p_{x-y}(k)}{1 + k^2}\)).
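The GLCM definitions can be sketched in numpy as follows. This is not Pyradiomics code. We assume symmetric co-occurrence counting, distance \(\delta = 1\), and the four 2-D angles 0, 45, 90, and 135 degrees, with per-angle features averaged as described above. The input must already hold discretized levels 0 to levels - 1, and returning Correlation = 1 for a flat region is our convention.

```python
import numpy as np

EPS = np.finfo(float).eps
# 2-D offsets (dy, dx) for theta = 0, 45, 90 and 135 degrees at distance delta = 1
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, dy, dx, levels, symmetric=True):
    """Co-occurrence counts P(i, j | delta, theta) for one pixel offset."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    # restrict to positions where both pixels of the pair lie inside the image
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    a = img[r0:r1, c0:c1]
    b = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    return P + P.T if symmetric else P


def glcm_features(P):
    p = P / P.sum()
    Ng = p.shape[0]
    g = np.arange(1, Ng + 1)                      # gray levels 1..Ng as in the formulas
    i, j = np.meshgrid(g, g, indexing="ij")
    px, py = p.sum(axis=1), p.sum(axis=0)
    mux, muy = (px * g).sum(), (py * g).sum()
    sx = np.sqrt((px * (g - mux) ** 2).sum())
    sy = np.sqrt((py * (g - muy) ** 2).sum())
    # p_{x-y}(k) for k = 0..Ng-1 and p_{x+y}(k) for k = 2..2Ng
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=Ng)
    p_sum = np.bincount((i + j).ravel(), weights=p.ravel(), minlength=2 * Ng + 1)[2:]
    k = np.arange(Ng)
    da = (k * p_diff).sum()
    corr = 1.0 if sx * sy == 0 else ((p * i * j).sum() - mux * muy) / (sx * sy)
    return {
        "Contrast": ((i - j) ** 2 * p).sum(),
        "Correlation": corr,
        "Idm": (p_diff / (1 + k ** 2)).sum(),
        "MaximumProbability": p.max(),
        "DifferenceAverage": da,
        "DifferenceVariance": ((k - da) ** 2 * p_diff).sum(),
        "DifferenceEntropy": -(p_diff * np.log2(p_diff + EPS)).sum(),
        "SumEntropy": -(p_sum * np.log2(p_sum + EPS)).sum(),
    }


def mean_glcm_features(img, levels):
    """Compute each feature per angle, then average over the four angles."""
    per_angle = [glcm_features(glcm(img, dy, dx, levels)) for dy, dx in OFFSETS.values()]
    return {name: float(np.mean([f[name] for f in per_angle])) for name in per_angle[0]}
```

For a checkerboard at \(\theta = 0\), every horizontal pair differs by one level, so Contrast = 1, IDM = 0.5, and Correlation = -1.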

The GLSZM assesses the distribution of gray level zones in an image. A gray level zone is a region of connected pixels sharing the same gray level intensity, and its size is the number of pixels it contains. In the matrix \(P(i,j)\), the \((i,j)\)th element is the number of zones with gray level \(i\) and size \(j\) in the image. \(N_g\) is the number of discrete intensity values in the image, \(N_s\) is the number of discrete zone sizes, \(N_p\) is the number of pixels, \(N_z = \sum _{i=1}^{N_g} \sum _{j=1}^{N_s} P(i,j)\) is the number of zones in the ROI, with \(1 \le N_z \le N_p\), and \(p(i,j) = \frac{P(i,j)}{N_z}\) is the normalized size zone matrix. The GLSZM is rotation-independent: a single matrix is calculated for all directions within the ROI.

The GLSZM parameters are Gray Level NonUniformity (\(\frac{1}{N_z}\sum _{i=1}^{N_g} \left( \sum _{j=1}^{N_s} P(i,j)\right) ^2\)), Size Zone NonUniformity (\(\frac{1}{N_z}\sum _{j=1}^{N_s} \left( \sum _{i=1}^{N_g} P(i,j) \right) ^2\)), Zone Entropy (\(-\sum _{i=1}^{N_g} \sum _{j=1}^{N_s} p(i,j) \log _2(p(i,j) + \epsilon )\)), Zone Variance (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_s} p(i,j) (j - \mu _j)^2\), where \(\mu _j = \sum _{i=1}^{N_g} \sum _{j=1}^{N_s} p(i,j)\, j\)), and Gray Level Variance (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_s} p(i,j) (i - \mu _i)^2\), where \(\mu _i = \sum _{i=1}^{N_g} \sum _{j=1}^{N_s} p(i,j)\, i\)).
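A zone matrix can be built with a flood fill. The sketch below is our own implementation, not the Pyradiomics code. It assumes 8-connectivity in 2-D and discretized levels 0 to levels - 1, and keeps every possible zone size as a column; empty columns do not change the five features.

```python
from collections import deque

import numpy as np

EPS = np.finfo(float).eps


def glszm(img, levels):
    """Size zone matrix: P[i, s - 1] counts zones of gray level i with s pixels."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    P = np.zeros((levels, h * w))
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            g, size = img[r, c], 0
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:                          # breadth-first flood fill of one zone
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):         # 8-connected neighbours
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] and img[ny, nx] == g:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            P[g, size - 1] += 1
    return P


def glszm_features(P):
    n_z = P.sum()                                 # number of zones
    p = P / n_z
    i = np.arange(1, P.shape[0] + 1)[:, None]     # gray levels
    j = np.arange(1, P.shape[1] + 1)[None, :]     # zone sizes
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    return {
        "GrayLevelNonUniformity": (P.sum(axis=1) ** 2).sum() / n_z,
        "SizeZoneNonUniformity": (P.sum(axis=0) ** 2).sum() / n_z,
        "ZoneEntropy": -(p * np.log2(p + EPS)).sum(),
        "ZoneVariance": (p * (j - mu_j) ** 2).sum(),
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
    }
```

For the image `[[0, 0, 1], [0, 1, 1], [2, 2, 2]]`, there are three zones of three pixels each, so Zone Variance is 0 and Zone Entropy is \(\log_2 3\).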

A Gray Level Run Length Matrix (GLRLM) characterizes runs, i.e., sequences of consecutive pixels with the same gray level, by their length in pixels. The matrix \(P(i,j|\theta )\) gives the number of runs with gray level \(i\) and length \(j\) along angle \(\theta \) within the ROI. \(N_g\) is the number of discrete intensity values, \(N_r\) is the number of discrete run lengths, \(N_p\) is the number of pixels, and \(N_r(\theta ) = \sum _{i=1}^{N_g}\sum _{j=1}^{N_r} P(i,j|\theta )\) is the number of runs along angle \(\theta \), with \(1 \le N_r(\theta ) \le N_p\). As for the GLSZM, \(p(i,j|\theta ) = \frac{P(i,j|\theta )}{N_r(\theta )}\) is the normalized run length matrix.

Four GLRLM parameters are used: Gray Level Variance (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_r} p(i,j|\theta ) (i - \mu _i)^2\), where \(\mu _i = \sum _{i=1}^{N_g}\sum _{j=1}^{N_r} p(i,j|\theta )\, i\)), Run Entropy (\(-\sum _{i=1}^{N_g}\sum _{j=1}^{N_r} p(i,j|\theta )\log _2(p(i,j|\theta ) + \epsilon )\)), Run Variance (\(\sum _{i=1}^{N_g}\sum _{j=1}^{N_r} p(i,j|\theta )(j - \mu _j)^2\), where \(\mu _j = \sum _{i=1}^{N_g}\sum _{j=1}^{N_r} p(i,j|\theta )\, j\)), and Gray Level NonUniformity (\(\frac{1}{N_r(\theta )}\sum _{i=1}^{N_g} \left( \sum _{j=1}^{N_r} P(i,j|\theta ) \right) ^2\)).
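The run length matrix and its four features can be sketched as follows. For brevity, this version counts runs only along \(\theta = 0\) (rows) and \(\theta = 90\) degrees (columns, via the transpose) and averages the two. Omitting the diagonal directions is a simplification of ours and does not reproduce Pyradiomics.

```python
import numpy as np

EPS = np.finfo(float).eps


def glrlm_horizontal(img, levels):
    """Run length matrix P(i, j | theta = 0): rows = gray levels, columns = lengths 1..W."""
    img = np.asarray(img, dtype=int)
    P = np.zeros((levels, img.shape[1]))
    for row in img:
        start = 0
        for k in range(1, len(row) + 1):
            # a run ends at the end of the row or where the gray level changes
            if k == len(row) or row[k] != row[start]:
                P[row[start], k - start - 1] += 1
                start = k
    return P


def glrlm_features(P):
    n_r = P.sum()                                 # N_r(theta), the number of runs
    p = P / n_r
    i = np.arange(1, P.shape[0] + 1)[:, None]     # gray levels
    j = np.arange(1, P.shape[1] + 1)[None, :]     # run lengths
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    return {
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
        "RunEntropy": -(p * np.log2(p + EPS)).sum(),
        "RunVariance": (p * (j - mu_j) ** 2).sum(),
        "GrayLevelNonUniformity": (P.sum(axis=1) ** 2).sum() / n_r,
    }


def glrlm_features_0_90(img, levels):
    """Average the features over theta = 0 (rows) and theta = 90 degrees (columns)."""
    f0 = glrlm_features(glrlm_horizontal(img, levels))
    f90 = glrlm_features(glrlm_horizontal(np.asarray(img).T, levels))
    return {name: (f0[name] + f90[name]) / 2 for name in f0}
```

For `[[0, 0, 1], [1, 1, 1]]` at \(\theta = 0\), the runs have lengths 2, 1, and 3, giving Run Variance 2/3 and Gray Level NonUniformity 5/3.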

The Gray Level Dependence Matrix (GLDM) quantifies gray level dependencies in an image. Analogous to a zone in the GLSZM, a gray level dependency is defined as the number of connected pixels within distance \(\delta \) that are dependent on the center pixel, where a neighboring pixel with gray level \(j\) is dependent on the center pixel with gray level \(i\) if \(|i - j| \le \alpha \). The \((i,j)\)th element of \(P(i,j)\) is the number of pixels with gray level \(i\) that have \(j\) dependent pixels in their neighborhood; \(N_d\) is the number of discrete dependency sizes, \(N_z = \sum _{i=1}^{N_g}\sum _{j=1}^{N_d} P(i,j)\) is the number of dependency zones, and \(p(i,j) = P(i,j)/N_z\). The GLDM parameters are Dependence Entropy (\(- \sum _{i=1}^{N_g} \sum _{j=1}^{N_d} p(i, j) \log _{2}(p(i, j) + \epsilon )\)), Dependence NonUniformity (\(\frac{1}{N_z}\sum _{j=1}^{N_d} \left( \sum _{i=1}^{N_g} P(i, j) \right) ^{2}\)), Dependence Variance (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_d} p(i, j) (j - \mu _j)^{2}\), where \(\mu _j = \sum _{i=1}^{N_g} \sum _{j=1}^{N_d} j\, p(i, j)\)), Gray Level NonUniformity (\(\frac{1}{N_z}\sum _{i=1}^{N_g} \left( \sum _{j=1}^{N_d} P(i, j) \right) ^{2}\)), and Gray Level Variance (\(\sum _{i=1}^{N_g} \sum _{j=1}^{N_d} p(i, j) (i - \mu _i)^{2}\), where \(\mu _i = \sum _{i=1}^{N_g} \sum _{j=1}^{N_d} i\, p(i, j)\)).
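A direct sketch of the dependence matrix follows, assuming a square neighborhood of Chebyshev radius \(\delta\) (default 1) and \(\alpha = 0\). Here the dependence size \(j\) counts the center pixel together with its dependent neighbors. This convention is our assumption; shifting \(j\) by a constant does not change the five features above.

```python
import numpy as np

EPS = np.finfo(float).eps


def gldm(img, levels, alpha=0, delta=1):
    """Dependence matrix P(i, j): pixels of gray level i with dependence size j.

    j counts the centre pixel plus every neighbour within Chebyshev distance
    `delta` whose gray level differs from the centre by at most `alpha`.
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    P = np.zeros((levels, (2 * delta + 1) ** 2))
    for r in range(h):
        for c in range(w):
            window = img[max(0, r - delta):r + delta + 1, max(0, c - delta):c + delta + 1]
            j = int((np.abs(window - img[r, c]) <= alpha).sum())  # includes the centre
            P[img[r, c], j - 1] += 1
    return P


def gldm_features(P):
    n_z = P.sum()                                 # one dependency zone per pixel
    p = P / n_z
    i = np.arange(1, P.shape[0] + 1)[:, None]     # gray levels
    j = np.arange(1, P.shape[1] + 1)[None, :]     # dependence sizes
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    return {
        "DependenceEntropy": -(p * np.log2(p + EPS)).sum(),
        "DependenceNonUniformity": (P.sum(axis=0) ** 2).sum() / n_z,
        "DependenceVariance": (p * (j - mu_j) ** 2).sum(),
        "GrayLevelNonUniformity": (P.sum(axis=1) ** 2).sum() / n_z,
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
    }
```

In a uniform 3 \(\times\) 3 image, corner, edge, and center pixels have dependence sizes 4, 6, and 9, respectively, which gives Dependence NonUniformity = 33/9.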

In total, thirty variables are used: eight First Order Statistics (FOS), eight Gray Level Co-occurrence Matrix (GLCM), four Gray Level Run Length Matrix (GLRLM), five Gray Level Size Zone Matrix (GLSZM), and five Gray Level Dependence Matrix (GLDM) features.

2.4 Statistical data analysis

The generalized additive model (GAM) is implemented in R (version 4.1.2) with the mgcv and gam packages [33], loaded with library(mgcv) and library(gam). In a GAM, each smooth function \( f \) is represented by choosing a basis for the function space in which \( f \) resides. This choice yields basis functions \( F_j \), each associated with a parameter \( b_j \), so that \(f(x) = \sum _{j=1}^{q} F_j(x) b_j\), where \( q \) is the number of basis functions used to represent \( f(x) \).
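The basis expansion \(f(x) = \sum_j F_j(x) b_j\) can be illustrated with a deliberately simple smoother. This is a sketch, not mgcv. mgcv's default `s()` basis is a thin plate regression spline, and mgcv selects the smoothing parameter automatically. Here, the truncated-power cubic basis, the knot positions, and the fixed penalty `lam` are assumptions chosen for illustration.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic basis F(x) = [1, x, x^2, x^3, (x - k_1)_+^3, ...]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_smooth(x, y, knots, lam=1e-3):
    """Estimate b in f(x) = sum_j F_j(x) b_j by penalized least squares.

    Only the knot coefficients are penalized (ridge penalty lam * ||b_knots||^2),
    which discourages wiggly fits; lam is fixed here rather than estimated.
    """
    F = cubic_spline_basis(x, knots)
    q = F.shape[1]                                   # number of basis functions
    penalty = np.diag([0.0] * 4 + [1.0] * (q - 4))
    A = np.vstack([F, np.sqrt(lam) * penalty])       # augmented least-squares system
    rhs = np.concatenate([np.asarray(y, dtype=float), np.zeros(q)])
    b, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return b


def predict_smooth(x, knots, b):
    """Evaluate f(x) = F(x) @ b."""
    return cubic_spline_basis(x, knots) @ b
```

With eight knots, `q = 12` basis functions. Fitting `sin(x)` on \([0, 2\pi]\) recovers the curve closely, which shows how a handful of coefficients \(b_j\) define a smooth term.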

The model is gam(LC-protein droplets \(\sim \) s(Energy) + s(Entropy) + s(Kurtosis) + s(Mean Absolute Deviation) + s(Mean) + s(Robust Mean Absolute Deviation) + s(Root Mean Squared) + s(Skewness) + s(TotalEnergy) + s(Uniformity) + s(Variance) + s(Contrast) + s(Correlation) + s(Difference Average) + s(Difference Entropy) + s(Difference Variance) + s(Idm) + s(Maximum Probability) + s(Sum Entropy) + s(Gray Level NonUniformity) + s(Gray Level Variance) + s(Size Zone NonUniformity) + s(Zone Entropy) + s(Zone Variance) + s(Gray Level NonUniformity) + s(Gray Level Variance) + s(Run Entropy) + s(Run Variance) + s(Dependence Entropy) + s(Dependence NonUniformity) + s(Dependence Variance) + s(Gray Level NonUniformity) + s(Gray Level Variance), data = df), where the response “LC-protein droplets” has five classes (initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x) and “s” denotes a smooth function in the GAM. The repeated terms s(Gray Level NonUniformity) and s(Gray Level Variance) refer, in order, to the GLSZM, GLRLM, and GLDM versions of these features. Before modeling, the data are standardized with R’s scale function to remove differences in scale between the features. The fitted model is summarized with summary(model). The model predicts the five classes from smooth terms of the thirty predictor variables.

Following the GAM, K-means clustering [34] is implemented in R with the factoextra and cluster packages (library(factoextra) and library(cluster)). The clustering is computed with kmeans(df, centers = 5, nstart = 25), where df is the data frame, centers is the number of clusters (centroids) the algorithm should find, and nstart is the number of times the algorithm is run from different initial cluster centers. We run the algorithm 25 times with different initializations and retain the solution with the lowest total within-cluster sum of squares (WCSS). The clustering results are visualized with fviz_cluster(km, data = df), where km holds the kmeans result.
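The role of `nstart` can be seen in a small numpy analogue. It repeats k-means from random initial centers and keeps the run with the lowest total WCSS, on column-standardized data. This is a sketch under stated assumptions. It uses Lloyd's iterations with randomly chosen rows as initial centers, whereas R's `kmeans` defaults to the Hartigan-Wong algorithm. We also assume that the clustering uses the same standardized feature table as the GAM. The `standardize` helper mirrors R's default `scale()` (centering and division by the sample standard deviation).

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores, like R's scale() with its defaults (sample SD, ddof = 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def kmeans_once(X, k, rng, max_iter=100):
    """One Lloyd run from k randomly chosen rows; returns labels, centers, and WCSS."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centroids; an empty cluster keeps its previous center
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = d2[np.arange(len(X)), labels].sum()
    return labels, centers, wcss


def kmeans(X, k, nstart=25, seed=0):
    """Run k-means nstart times and keep the solution with the lowest total WCSS."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(nstart)]
    return min(runs, key=lambda run: run[2])
```

For the feature table here, the call would be `kmeans(standardize(features), 5, nstart=25)`, with one row per image and one column per texture variable.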

The comparative analysis considers all drying stages together as well as each of the three drying stages individually. This approach assesses the impact of each stage on the characteristics of the five LC-protein mixtures at initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x. Together, GAM and K-means clustering provide a comprehensive view of how each drying stage distinguishes these LC-protein droplets.

3 Results and discussion

3.1 Qualitative and quantitative image analysis

Fig. 2

A Optical images showing the progressive drying stages of liquid crystal (LC)-protein droplets with varied initial buffered concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration (crossed arrows). The scale bar is 0.2 mm in length. B First-Order Statistics (FOS): (I) Mean, (II) Variance, (III) Skewness, (IV) Kurtosis, (V) Root mean square, (VI) Uniformity, (VII) Entropy, and (VIII) Energy. The FOS parameters are plotted against normalized time (instantaneous time divided by total drying time). The initial, middle, and final stages are marked in yellow, pink, and blue, respectively

Figure 2A shows the qualitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration. The 0x droplet contains lysozyme and LC in de-ionized water. After being pipetted onto the substrate, the droplets have a spherical-cap shape, and their height and contact angle decrease as they dry. The curvature of the droplets causes greater evaporative mass loss near the periphery than in the central region, leading to the well-known “coffee-ring effect” [8] commonly observed in bio-colloids [35]. The droplets are pinned to the substrate, and an outward radial capillary flow transports lysozyme particles toward the edge to compensate for the evaporative loss. The initial stage thus follows the general mechanism of bio-colloidal droplets and does not depend on the initial PBS concentration of the LC-protein droplets.

The middle stage is the most dynamic. It begins when the fluid front recedes from the periphery toward the central region, and cracks form due to mechanical stress. The lysozyme film is thin, and it buckles as more water evaporates. For 0x, upon stress-induced buckling of the cracked protein domains, the LCs are drawn beneath these domains. The bright regions observed between crossed polarizers correspond to randomly oriented LCs, whereas the dark region in each domain corresponds to the attached protein layer, which shows no birefringence, as discussed in detail in [12]. As the initial PBS concentration increases from 0.25x to 1x, interactions among lysozyme, salts, and LCs potentially influence the arrangement of the lysozyme particles and LCs. The LCs may affect the packing without necessarily increasing the film height, and they appear trapped in the layer between lysozyme and salts in the central regions. Further evaporation of water causes the trapped LCs to burst out and become randomly distributed. Unlike for 0x, an additional salt layer lies on top of the LC distribution. This process offers a plausible explanation for the reduced birefringence intensity observed between crossed polarizers as the PBS concentration increases from 0.25x to 1x, despite the fixed volume of LCs; a detailed analysis can be found in [13]. Accurate interpretation of these results requires understanding the interactions among the various components, such as proteins, salts, LCs, and the substrate surface. The substrate’s wettability can influence the adsorption and organization of proteins and other particles at the interface, and the presence of salts and LCs during drying can further modulate these interactions, potentially affecting the structure and dynamics of the assembled layers. However, the exact mechanism of this complex interplay is beyond the scope of this paper. The final stage is the concluding phase of the drying process and shows only minor observable changes (see Fig. 2A).

Figure 2B(I–VIII) shows the quantitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration, described by First-Order Statistics (FOS): Mean, Variance, Skewness, Kurtosis, Root mean square, Uniformity, Entropy, and Energy. The FOS values behave differently in the three drying stages. In the initial stage, the parameters change gradually, with only limited differences among the PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x). The middle stage shows a rapid rise and is more dynamic than the initial and final stages. In the final stage, the parameters stabilize at relatively constant values. This three-stage FOS evolution provides a comprehensive picture of the dynamic and settling behavior of the droplet during drying.

During the middle stage, a marked surge in the mean [Fig. 2B(I)], variance [Fig. 2B(II)], root mean square [Fig. 2B(V)], entropy [Fig. 2B(VII)], and energy [Fig. 2B(VIII)] is observed for PBS concentrations of 0x, 0.25x, and 0.5x, whereas the increase is comparatively gradual for 0.75x and 1x. In contrast, uniformity [Fig. 2B(VI)] declines in the middle stage relative to the initial stage. Uniformity measures the homogeneity of the image array and is calculated as the sum of the squared probabilities of the intensity levels, \(\sum _{i=1}^{N_g} p(i)^2\); it is largest when all pixels share a single intensity level.

Kurtosis measures the “peakedness” of the distribution of values in the image ROI; a higher kurtosis indicates that more of the distribution’s mass lies in the tails rather than near the mean. Skewness measures the asymmetry of the distribution about the mean. In this study, skewness and kurtosis are positive, but their magnitudes vary with the initial PBS concentration and drying stage: they are high initially and decrease as drying progresses [Fig. 2B(III–IV)].

The mean is the average gray level intensity within the ROI [Fig. 2B(I)]. Energy measures the magnitude of the pixel values in the image [Fig. 2B(VIII)], with a larger value indicating a greater sum of the squares of these values. Variance, the mean of the squared distances of the intensity values from the mean, reflects the spread of the distribution about the mean [Fig. 2B(II)]. The root mean square (RMS), the square root of the mean of all squared intensity values, measures the magnitude of the image values. Variance and RMS follow the same trend [Fig. 2B(II and V)]. Entropy quantifies the uncertainty and randomness in the image values, i.e., the average amount of information required to encode them [Fig. 2B(VII)].
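With \(c = 0\), the first-order definitions link RMS directly to Mean and Variance, which helps explain why RMS tracks Variance (and Mean) in Fig. 2B. Writing \(X(i) = (X(i) - \bar{X}) + \bar{X}\) and using \(\sum _{i}(X(i) - \bar{X}) = 0\), the cross term vanishes:

\[
\mathrm{RMS}^2 = \frac{1}{N_p}\sum_{i=1}^{N_p} X(i)^2
= \frac{1}{N_p}\sum_{i=1}^{N_p}\bigl(X(i)-\bar{X}\bigr)^2 + \frac{2\bar{X}}{N_p}\sum_{i=1}^{N_p}\bigl(X(i)-\bar{X}\bigr) + \bar{X}^2
= \mathrm{Variance} + \mathrm{Mean}^2, \qquad \mathrm{Energy} = N_p\,\mathrm{RMS}^2 .
\]

An increase in either the spread or the average intensity therefore raises RMS and Energy; this follows from the definitions alone (for \(c \neq 0\), the same identity holds with \(X(i) + c\) in place of \(X(i)\)).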

Fig. 3

A Gray Level Co-occurrence Matrix (GLCM) features: (I) Contrast, (II) Correlation, (III) Inverse Difference Moment, (IV) Maximum Probability, (V) Difference Average, (VI) Difference Variance, (VII) Sum Entropy, and (VIII) Difference Entropy. B Gray Level Run Length Matrix (GLRLM) features: (I) Variance, (II) Run Variance, (III) Nonuniformity, and (IV) Run Entropy. The GLCM and GLRLM parameters are plotted against normalized time (instantaneous time divided by total drying time) for the liquid crystal (LC)-protein drying droplets with initial buffered concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x. The initial, middle, and final stages are marked in yellow, pink, and blue, respectively

Figure 3A(I–VIII) shows the quantitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration, described by Gray Level Co-occurrence Matrix (GLCM) features. GLCM-derived texture features provide insight into local intensity patterns. The variables are Contrast, Correlation, Inverse Difference Moment (IDM), Maximum Probability, Difference Average, Difference Variance, Sum Entropy, and Difference Entropy [Fig. 3A(I–VIII)]. The middle stage is the most dynamic phase, with peaks and troughs in the GLCM parameter values. Contrast, a measure of local intensity variation, increases with larger differences between neighboring pixels [Fig. 3A(I)]. During the initial drying stage, the contrast remains largely similar across the LC-protein droplets, reflecting the overall uniformity of their initial flow behavior. As drying advances toward completion, however, some droplets exhibit markedly higher contrast than others. This change in contrast dynamics signifies evolving interactions and variations in the drying patterns of individual droplets, adding complexity to the overall drying process.

Correlation measures the linear dependency of gray level values within the GLCM; it ranges from -1 to 1, where 0 indicates uncorrelated and 1 perfectly correlated gray levels [Fig. 3A(II)]. In the present study, correlation values range from 0.55 to 0.99 over the drying process. The Inverse Difference Moment (IDM) quantifies the local homogeneity of an image, with weights that vary inversely with the Contrast weights [Fig. 3A(I and III)]. Maximum Probability is the frequency of the most predominant pair of neighboring intensity values [Fig. 3A(IV)]. Difference Average reflects the relationship between occurrences of pairs with similar and differing intensity values [Fig. 3A(V)]. Difference Variance emphasizes heterogeneity, assigning higher weights to intensity differences that deviate more from the mean [Fig. 3A(VI)]; its values increase at the later stage of drying. Sum Entropy quantifies the randomness of the sums of neighboring intensity values [Fig. 3A(VII)], while Difference Entropy gauges the randomness and variability of their differences [Fig. 3A(VIII)].
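To make the GLCM definitions concrete, here is a minimal NumPy sketch. It assumes each crossed-polarizer frame has already been quantized to integer gray levels `0..levels-1`, uses a single horizontal offset with a symmetric matrix (rather than averaging over several angles), and follows the feature formulas as documented for the PyRadiomics GLCM class. The quantization, offsets, and software used in this study are not stated here, and `glcm`/`glcm_features` are hypothetical helper names.

```python
import numpy as np

def glcm(img: np.ndarray, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Symmetric, normalized GLCM of an integer image (values 0..levels-1)
    for a single non-negative pixel offset (dx, dy)."""
    h, w = img.shape
    ref = img[: h - dy, : w - dx]          # reference pixels
    nbr = img[dy:, dx:]                    # neighbours at offset (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    m = m + m.T                            # count each pair in both directions
    return m / m.sum()

def glcm_features(p: np.ndarray) -> dict:
    """Eight GLCM texture features; gray levels are indexed 1..Ng."""
    ng = p.shape[0]
    g = np.arange(1, ng + 1)
    i, j = np.meshgrid(g, g, indexing="ij")
    px, py = p.sum(axis=1), p.sum(axis=0)  # marginal distributions
    mux, muy = (g * px).sum(), (g * py).sum()
    sdx = np.sqrt((((g - mux) ** 2) * px).sum())
    sdy = np.sqrt((((g - muy) ** 2) * py).sum())
    # p_{x-y}(k) for k = |i - j| = 0..Ng-1 and p_{x+y}(k) for k = i + j = 2..2Ng
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=ng)
    p_sum = np.bincount((i + j).ravel(), weights=p.ravel(), minlength=2 * ng + 1)[2:]
    k = np.arange(ng)
    diff_avg = (k * p_diff).sum()
    eps = np.finfo(float).eps              # avoids log2(0)
    # Correlation is undefined for a flat image (zero variance); report 1 by convention.
    corr = ((i * j * p).sum() - mux * muy) / (sdx * sdy) if sdx * sdy > 0 else 1.0
    return {
        "Contrast": ((i - j) ** 2 * p).sum(),
        "Correlation": corr,
        "IDM": (p / (1 + (i - j) ** 2)).sum(),
        "MaximumProbability": p.max(),
        "DifferenceAverage": diff_avg,
        "DifferenceVariance": ((k - diff_avg) ** 2 * p_diff).sum(),
        "SumEntropy": -(p_sum * np.log2(p_sum + eps)).sum(),
        "DifferenceEntropy": -(p_diff * np.log2(p_diff + eps)).sum(),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 8, size=(64, 64))   # stand-in for a quantized frame
    print(glcm_features(glcm(frame, levels=8)))
```

Computing these features frame by frame and plotting them against normalized time would give curves of the kind shown in Fig. 3A.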

The Gray Level Run Length Matrix (GLRLM) yields Variance, Run Variance (RV), NonUniformity, and Run Entropy (RE) [Fig. 3B(I–IV)]. Variance quantifies the variation in gray level intensity across runs, while RV quantifies the variance in run lengths. RV is noticeably prominent during the middle drying stage at a PBS concentration of 1x, whereas Variance is present for all LC-protein droplets throughout the drying process, starting at lower values and increasing toward the end [Fig. 3B(I–II)]. NonUniformity quantifies the similarity of gray-level intensity values in the image, with a lower value indicating greater similarity in intensity values. RE, on the other hand, measures the uncertainty and randomness in the distribution of run lengths and gray levels; a higher RE value suggests increased heterogeneity in the texture patterns. NonUniformity and RE are therefore expected to show opposite trends, since the former reflects homogeneity while the latter reflects heterogeneity. This expected opposition is observed in Fig. 3B(III–IV): the initial drying stage is characterized by uniform texture, while the patterns in the final drying stage are heterogeneous.
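A run-length matrix and the four GLRLM features discussed above can be sketched as follows. This assumes runs are counted along rows only (one direction, whereas texture packages typically average several directions), that "Variance" and "NonUniformity" correspond to the gray-level variants, and that the definitions follow the PyRadiomics GLRLM documentation; `glrlm_horizontal` and `glrlm_features` are hypothetical names.

```python
import numpy as np

def glrlm_horizontal(img: np.ndarray, levels: int) -> np.ndarray:
    """Run-length matrix along rows: R[g, L-1] counts runs of gray level g with length L."""
    h, w = img.shape
    R = np.zeros((levels, w))
    for row in img:
        start = 0
        for x in range(1, w + 1):
            # A run ends at the end of the row or where the gray level changes.
            if x == w or row[x] != row[start]:
                R[row[start], x - start - 1] += 1
                start = x
    return R

def glrlm_features(R: np.ndarray) -> dict:
    nr = R.sum()                                   # total number of runs
    p = R / nr
    i = np.arange(1, R.shape[0] + 1)[:, None]     # gray levels (1-based)
    j = np.arange(1, R.shape[1] + 1)[None, :]     # run lengths
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    eps = np.finfo(float).eps
    return {
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
        "RunVariance": (p * (j - mu_j) ** 2).sum(),
        # Raw counts; larger when the runs are concentrated in a few gray levels.
        "GrayLevelNonUniformity": (R.sum(axis=1) ** 2).sum() / nr,
        "RunEntropy": -(p * np.log2(p + eps)).sum(),
    }

if __name__ == "__main__":
    uniform = np.zeros((4, 4), dtype=int)          # one long run per row
    striped = np.tile([0, 1], (4, 2))              # many short runs
    print(glrlm_features(glrlm_horizontal(uniform, 2)))
    print(glrlm_features(glrlm_horizontal(striped, 2)))
```

The uniform image gives zero Run Entropy, while the striped one spreads its runs over two gray levels and gives one bit, mirroring the move from uniform to heterogeneous texture described above.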

Fig. 4

A Gray Level Size Zone Matrix (GLSZM) encompasses a range of variables: (I) NonUniformity, (II) Zone NonUniformity, (III) Variance, (IV) Zone Variance, and (V) Zone Entropy. B Gray Level Dependence Matrix (GLDM) encompasses a range of variables: (I) NonUniformity, (II) Dependence NonUniformity, (III) Variance, (IV) Dependence Variance, and (V) Dependence Entropy. Dynamic changes in the GLSZM and GLDM parameters are plotted against normalized time, defined as the instantaneous time divided by the total drying time. The liquid crystal (LC)-protein drying droplets have initial buffered concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x. The initial, middle, and final stages are marked in yellow, pink, and blue, respectively.

Figure 4A(I–V) shows the GLSZM features: NonUniformity, Zone NonUniformity, Variance, Zone Variance (ZV), and Zone Entropy (ZE). Figure 4B(I–V) shows the GLDM features: NonUniformity, Dependence NonUniformity (DN), Variance, Dependence Variance (DV), and Dependence Entropy (DE). Variance quantifies the variance in gray level intensities across the zones, while ZV measures the variance in zone sizes. ZE evaluates the uncertainty and randomness in the distribution of zone sizes and gray levels, with a higher value indicating increased heterogeneity in the texture patterns. NonUniformity assesses the variability of gray-level intensity values in the image, and a lower value signifies more homogeneous intensity values. Zone NonUniformity, in turn, gauges the variability of zone sizes in the image, with a lower value indicating more homogeneous zone sizes.
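The GLSZM features can be illustrated with a small flood-fill sketch. It assumes zones are 8-connected regions of identical gray level in a single 2D frame and follows the PyRadiomics-style GLSZM formulas; the connectivity and implementation used in this study are not stated, and `glszm`/`glszm_features` are hypothetical names.

```python
from collections import deque
import numpy as np

def glszm(img: np.ndarray, levels: int) -> np.ndarray:
    """Size-zone matrix: S[g, s-1] counts 8-connected zones of gray level g with s pixels."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    S = np.zeros((levels, h * w))
    for y in range(h):
        for x in range(w):
            if seen[y, x]:
                continue
            g, size = img[y, x], 0
            seen[y, x] = True
            queue = deque([(y, x)])
            while queue:                          # flood-fill one zone
                cy, cx = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] and img[ny, nx] == g:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            S[g, size - 1] += 1
    return S

def glszm_features(S: np.ndarray) -> dict:
    nz = S.sum()                                   # total number of zones
    p = S / nz
    i = np.arange(1, S.shape[0] + 1)[:, None]     # gray levels (1-based)
    j = np.arange(1, S.shape[1] + 1)[None, :]     # zone sizes in pixels
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    eps = np.finfo(float).eps
    return {
        "GrayLevelNonUniformity": (S.sum(axis=1) ** 2).sum() / nz,
        "SizeZoneNonUniformity": (S.sum(axis=0) ** 2).sum() / nz,
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
        "ZoneVariance": (p * (j - mu_j) ** 2).sum(),
        "ZoneEntropy": -(p * np.log2(p + eps)).sum(),
    }

if __name__ == "__main__":
    checker = np.indices((3, 3)).sum(axis=0) % 2   # two diagonal-connected zones
    print(glszm_features(glszm(checker, 2)))
```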

NonUniformity and Zone NonUniformity follow similar trends, with lower values at the initial stage and higher values at the final drying stage. In contrast, the trends in Variance and Zone Variance are not closely aligned, suggesting that the influence of zones depends on the specific parameter considered. Likewise, under the GLDM, Gray Level Variance measures the variance of gray levels in the image, while DV assesses the variance of dependence sizes. Although each of these parameters quantifies a variance, a given parameter can be markedly higher for one LC-protein droplet while remaining negligible for others. The same holds for ZE and DE. This variability underscores the intricate, context-dependent nature of the effects observed across droplets [Fig. 4A–B(I–IV)].
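The GLDM features can be sketched the same way. Here a neighbor is counted as dependent when it lies in the 8-neighborhood (distance 1) and its gray level differs from the center pixel by at most `alpha` (0 by default); these settings follow the common PyRadiomics defaults and are assumptions, and `gldm`/`gldm_features` are hypothetical names. The column index is the raw neighbor count; adding a constant offset to it would not change any of the five features below.

```python
import numpy as np

def gldm(img: np.ndarray, levels: int, alpha: int = 0) -> np.ndarray:
    """D[g, k] counts pixels of gray level g with k dependent neighbours
    (8-neighbourhood, distance 1, |difference| <= alpha)."""
    h, w = img.shape
    D = np.zeros((levels, 9))
    for y in range(h):
        for x in range(w):
            k = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        # int() avoids wrap-around for unsigned image dtypes
                        if abs(int(img[ny, nx]) - int(img[y, x])) <= alpha:
                            k += 1
            D[img[y, x], k] += 1
    return D

def gldm_features(D: np.ndarray) -> dict:
    nz = D.sum()                                   # one entry per pixel
    p = D / nz
    i = np.arange(1, D.shape[0] + 1)[:, None]     # gray levels (1-based)
    j = np.arange(D.shape[1])[None, :]            # dependent-neighbour counts
    mu_i, mu_j = (p * i).sum(), (p * j).sum()
    eps = np.finfo(float).eps
    return {
        "GrayLevelNonUniformity": (D.sum(axis=1) ** 2).sum() / nz,
        "DependenceNonUniformity": (D.sum(axis=0) ** 2).sum() / nz,
        "GrayLevelVariance": (p * (i - mu_i) ** 2).sum(),
        "DependenceVariance": (p * (j - mu_j) ** 2).sum(),
        "DependenceEntropy": -(p * np.log2(p + eps)).sum(),
    }

if __name__ == "__main__":
    print(gldm_features(gldm(np.zeros((3, 3), dtype=int), 2)))
```

In a flat 3x3 patch, corner, edge, and center pixels have 3, 5, and 8 dependent neighbors, so Dependence Variance is nonzero even though Gray Level Variance is zero, one example of how two "variance" parameters can diverge.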

Hence, the qualitative and quantitative analyses together support the existence of three distinct stages throughout the drying process. This suggests that features derived from FOS, GLCM, GLRLM, GLSZM, and GLDM hold significant potential as key parameters for discerning and characterizing the LC-protein droplets.

3.2 Quantitative statistical analysis: Generalized Additive Modeling

The GAM results indicate that all features contribute to identifying the different LC-protein droplets. The predictors are all the feature parameters, and the response variable is the LC-protein droplet type. Tables 1 and 2 list the F-statistics, degrees of freedom (Df), and p-values obtained from GAM modeling for the image texture features FOS, GLCM, GLSZM, GLRLM, and GLDM. A smooth function is applied to each feature parameter; the ‘s’ in front of each name denotes a smooth term, for instance, s(Energy) and s(Entropy). Df represents the flexibility of a smooth function, with higher values indicating greater flexibility; here, all smooth terms are equally flexible. The F-statistic measures the overall significance of a smooth term, with a higher value indicating a more significant effect, and the p-value indicates whether the term is statistically significant. A p-value below 0.05 indicates a significant effect on the response variable (LC-protein droplets), and this holds for all predictors (parameters within the features). Intriguingly, the GAM output shows that no single feature dominates; instead, all features contribute significantly to the differentiation of LC-protein droplets.
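How smooth terms s(·) and their effective degrees of freedom arise can be illustrated with a small penalized-regression-spline sketch in NumPy. It assumes a numerically coded Gaussian response, a basis of one linear term plus truncated cubics per predictor, and one ridge-type smoothing parameter per term chosen by generalized cross-validation (GCV). These are illustrative choices; the names `fit_additive` and `select_lambdas` are hypothetical, this is not the software or settings behind Tables 1 and 2, and the F-tests and p-values are not reproduced.

```python
import numpy as np

N_KNOTS = 8

def smooth_basis(x: np.ndarray) -> np.ndarray:
    """Basis for one predictor rescaled to [0, 1]: an unpenalized linear
    column plus truncated cubics at interior quantile knots."""
    knots = np.quantile(x, np.linspace(0, 1, N_KNOTS + 2)[1:-1])
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** 3
    B = np.column_stack([x, trunc])
    return B - B.mean(axis=0)               # centre so the intercept stays identifiable

def fit_additive(X, y, lams):
    """Fit y = b0 + sum_f s_f(x_f) by penalized least squares (Gaussian case)."""
    n, m = X.shape
    cols, pen, owner = [np.ones((n, 1))], [0.0], [-1]
    for f in range(m):
        x = X[:, f]
        B = smooth_basis((x - x.min()) / (x.max() - x.min()))
        cols.append(B)
        pen += [0.0] + [lams[f]] * N_KNOTS  # only the truncated terms are penalized
        owner += [f] * B.shape[1]
    Z, owner = np.hstack(cols), np.array(owner)
    ZtZ = Z.T @ Z
    A = np.linalg.inv(ZtZ + np.diag(pen))
    beta = A @ (Z.T @ y)
    rss = float(((y - Z @ beta) ** 2).sum())
    # Per-coefficient effective df: diag((Z'Z + P)^-1 Z'Z), summed per smooth term.
    edf = np.diag(A @ ZtZ)
    edf_terms = [float(edf[owner == f].sum()) for f in range(m)]
    return rss, float(edf.sum()), edf_terms

def select_lambdas(X, y, grid=np.logspace(-6, 3, 19), sweeps=3):
    """Choose one smoothing parameter per term by minimizing GCV (coordinate search)."""
    n, m = X.shape
    lams = np.ones(m)
    def gcv(l):
        rss, edf, _ = fit_additive(X, y, l)
        return n * rss / (n - edf) ** 2
    for _ in range(sweeps):
        for f in range(m):
            scores = [gcv(np.where(np.arange(m) == f, g, lams)) for g in grid]
            lams[f] = grid[int(np.argmin(scores))]
    return lams

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(300, 2))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)
    rss, edf, edf_terms = fit_additive(X, y, select_lambdas(X, y))
    print("edf per term:", edf_terms, "residual deviance:", rss)
```

In this toy example the wiggly predictor receives several effective degrees of freedom while the linear one stays close to 1, which is the sense in which Df measures the flexibility of a smooth term.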

The null deviance (a measure of the fit of a model without any predictors) is 245.3621 with Df = 1941. With the predictors included, the residual deviance drops to 1.8626 with Df = 1811, indicating that the GAM fits the data substantially better than the null model. This underscores the intricate interplay of various image statistics in capturing the evolving textures during the drying process.
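The reported deviances can also be expressed as the share of the null deviance explained by the model, a summary commonly reported for GAMs. The short calculation below uses only the numbers stated above and assumes an intercept-only null model, so that its Df equals the number of observations minus one.

```python
# Deviances and degrees of freedom as reported for the GAM above
null_deviance, null_df = 245.3621, 1941
resid_deviance, resid_df = 1.8626, 1811

deviance_explained = 1 - resid_deviance / null_deviance   # about 0.9924
df_used = null_df - resid_df        # effective df used by the smooth terms
n_obs = null_df + 1                 # intercept-only null model: Df = n - 1

print(f"deviance explained: {deviance_explained:.4f}, df used: {df_used}, n: {n_obs}")
```

On these figures, the smooth terms use roughly 130 effective degrees of freedom and account for about 99.2% of the null deviance.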

Table 1 Results of a Generalized Additive Model (GAM) with various features [First Order Statistics (FOS) and Gray Level Co-occurrence Matrix (GLCM)] and their corresponding parameters, degrees of freedom (Df), F-statistics, and p-values
Table 2 Results of a Generalized Additive Model (GAM) with various features [Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM)] and their corresponding parameters, degrees of freedom (Df), F-statistics, and p-values

To identify the drying stage that most effectively distinguishes the LC-protein droplets, a GAM analysis was conducted separately for the initial, middle, and final stages and for the entire drying process. The Akaike Information Criterion (AIC) is used to compare the models across drying stages; it provides a relative, rather than absolute, measure of model quality. Figure 5A compares the AIC values across these stages, with “all” denoting the entire process. The AIC values were 7720, 3749, 4325, and 3235 for the whole process and the initial, middle, and final stages, respectively.

The AIC balances goodness of fit against model complexity, with lower values indicating a better trade-off. Intriguingly, the GAM results suggest that the droplets are least well predicted from the entire drying process, which yields the highest AIC value (7720). The middle and initial stages follow, with AIC values of 4325 and 3749, respectively. Notably, the final stage yields the lowest AIC value (3235), indicating the most favorable balance between model fit and simplicity. This implies that the final drying stage plays a pivotal role in capturing and distinguishing the textures of the LC-protein droplets at initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x.
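For a Gaussian model, AIC can be computed from the residual sum of squares and the effective degrees of freedom. The sketch below is an illustration under that assumption (`gaussian_aic` is a hypothetical helper; the study's exact likelihood and df accounting are not stated). It also ranks the four reported AIC values. Because AIC is defined only up to an additive constant, only differences between values carry meaning.

```python
import math

def gaussian_aic(rss: float, n: int, edf: float) -> float:
    """AIC = -2 log L + 2k for a Gaussian model with the ML variance RSS/n;
    k counts the effective df of the mean model plus one for the variance."""
    neg2loglik = n * math.log(2 * math.pi * rss / n) + n
    return neg2loglik + 2 * (edf + 1)

# Extra flexibility that barely lowers the RSS is penalized:
simple = gaussian_aic(rss=100.0, n=100, edf=4.0)
flexible = gaussian_aic(rss=99.0, n=100, edf=10.0)

# AIC values reported for the four GAM fits (lower = better trade-off)
reported = {"all": 7720, "initial": 3749, "middle": 4325, "final": 3235}
ranking = sorted(reported, key=reported.get)
print(ranking, simple < flexible)
```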

3.3 K-means clustering– An unsupervised machine learning approach

Fig. 5

A The Akaike Information Criterion (AIC) values of Generalized Additive Modeling (GAM) for different features of the liquid crystal (LC)-protein droplets at different drying stages: the whole drying process (indicated with ‘all’) and the initial, middle, and final stages. The red dotted lines show the variation of the AIC values across these drying stages. K-means clustering, with clusters denoting the initial buffered concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x, for B the whole drying process, C the initial stage, D the middle stage, and E the final stage of the drying process.

In addition to the GAM analysis, we employed K-means clustering as a complementary approach to identify the drying stage that best distinguishes the LC-protein droplets [Fig. 5B–E]. This method groups data points into clusters based on their similarity in a multidimensional space. In each plot, the X and Y axes are the first two principal components of the data containing all predictors, chosen for 2D visualization. These principal components are linear combinations of the original parameters that capture the maximum variance in the data. Each point on the plot represents an observation, symbolized according to its initial buffered concentration (0x, 0.25x, 0.5x, 0.75x, or 1x).
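The clustering-and-projection workflow described above can be sketched in NumPy: standardize the feature matrix (rows = frames, columns = texture parameters), run K-means on all standardized predictors, and project onto the first two principal components for plotting. Standardization, k-means++ seeding, and the number of restarts are assumptions; the study's software and settings are not stated, and the helper names are hypothetical.

```python
import numpy as np

def standardize_and_project(X: np.ndarray):
    """Z-score each feature, then return the scores on the first two principal
    components and the fraction of variance explained by every component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # assumes no constant feature
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / (s**2).sum()
    return Z, Z @ Vt[:2].T, explained

def kmeans(Z: np.ndarray, k: int, n_init: int = 10, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm with k-means++ seeding; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = (None, np.inf)
    for _ in range(n_init):
        # k-means++: pick each new centre with probability proportional to D^2
        centers = [Z[rng.integers(len(Z))]]
        for _ in range(k - 1):
            d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(max_iter):
            labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        # Final assignment consistent with the final centres
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        inertia = float(((Z - centers[labels]) ** 2).sum())
        if inertia < best[1]:
            best = (labels, inertia)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 12))                 # stand-in feature matrix
    Z, scores, explained = standardize_and_project(X)
    labels, inertia = kmeans(Z, k=5)
    print(explained[:2], np.bincount(labels))
```

The `explained` ratios correspond to the percentages quoted for the first two dimensions, and `scores` gives the coordinates plotted in each panel.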

For the entire drying process, the first two dimensions account for approximately \(81\%\) and \(10\%\) of the variance, respectively, together explaining around \(91\%\). The concentration ellipses show that the 0.25x cluster overlaps substantially with the 0x cluster and slightly with the 1x cluster. In addition, the boundary of the 0.5x cluster touches that of the 0.75x cluster [Fig. 5B], indicating that K-means clustering does not uniquely resolve the five clusters. The situation improves when only the data from the initial drying stage are considered [Fig. 5C]. Overlap is minimal, and 0.5x is clearly identified, whereas 0x and 0.75x share a touching boundary, as do 0.25x and 1x. The first two dimensions explain approximately \(69\%\) and \(17\%\) of the variance, a total of around \(86\%\).
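Overlap between concentration ellipses, as discussed above, can be checked numerically. The sketch below assumes each ellipse is a bivariate-normal 95% region in the PC1–PC2 plane, i.e. points whose squared Mahalanobis distance from the cluster mean is at most the chi-square quantile with 2 df (-2 ln 0.05, about 5.991). The plotting tool behind Fig. 5 may draw its ellipses differently, the sampled-boundary test is approximate, and the function names are hypothetical.

```python
import numpy as np

CHI2_2DF_95 = -2 * np.log(0.05)   # 95% quantile of chi-square with 2 df

def inside_ellipse(pts, center, cov, q=CHI2_2DF_95):
    """True where the squared Mahalanobis distance from the centre is <= q."""
    d = pts - center
    m2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
    return m2 <= q

def ellipse_boundary(center, cov, q=CHI2_2DF_95, n=360):
    """Points on the ellipse whose squared Mahalanobis distance equals q."""
    vals, vecs = np.linalg.eigh(cov)                 # principal axes of the cluster
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    circle = np.stack([np.cos(t), np.sin(t)])        # unit circle, shape (2, n)
    return center + (vecs @ (np.sqrt(q * vals)[:, None] * circle)).T

def ellipses_overlap(a_pts, b_pts, q=CHI2_2DF_95):
    """Approximate test: does any sampled boundary point (or the centre) of one
    cluster's ellipse fall inside the other cluster's ellipse?"""
    ca, cb = a_pts.mean(axis=0), b_pts.mean(axis=0)
    Sa, Sb = np.cov(a_pts.T), np.cov(b_pts.T)
    pa = np.vstack([ellipse_boundary(ca, Sa, q), ca])
    pb = np.vstack([ellipse_boundary(cb, Sb, q), cb])
    return bool(inside_ellipse(pa, cb, Sb, q).any() or inside_ellipse(pb, ca, Sa, q).any())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(200, 2))                    # stand-in PC1/PC2 scores
    print(ellipses_overlap(a, a + [1.0, 0.0]), ellipses_overlap(a, a + [20.0, 0.0]))
```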

For the middle stage, the 0.5x and 0.75x clusters overlap, and the first two dimensions explain approximately \(77\%\) and \(14\%\) of the variance, respectively [Fig. 5D]. Notably, for the final drying stage, K-means clustering yields distinct, spatially separable clusters without overlapping regions, and the two dimensions explain a total variance of approximately \(97\%\) [Fig. 5E]. The shape and size of the clusters at each drying stage indicate how compact or dispersed the data points are within each cluster. Except in the final drying stage, the clusters are not well separated and differ in size, whereas in Fig. 5E the clusters are distinct and balanced. The distribution and variation of data points within and across the clusters show how the clusters become more homogeneous as the drying process approaches completion.
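The notions of compact, well-separated, and balanced clusters can be given a simple numerical form. The sketch below computes, for 2D scores such as PC1/PC2, each cluster's size and root-mean-square (RMS) radius, plus the ratio of the smallest centroid distance to the largest radius. These summaries and the helper names are illustrative assumptions and were not reported in the study.

```python
import numpy as np

def cluster_compactness(P: np.ndarray, labels: np.ndarray) -> dict:
    """Map each cluster label to (centroid, RMS radius, number of points)."""
    out = {}
    for c in np.unique(labels):
        pts = P[labels == c]
        centroid = pts.mean(axis=0)
        rms = float(np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean()))
        out[int(c)] = (centroid, rms, len(pts))
    return out

def separation_ratio(P: np.ndarray, labels: np.ndarray) -> float:
    """Smallest centroid-to-centroid distance divided by the largest RMS radius;
    larger values mean tighter, better separated clusters."""
    stats = cluster_compactness(P, labels)
    cents = np.array([v[0] for v in stats.values()])
    radii = np.array([v[1] for v in stats.values()])
    d = np.sqrt(((cents[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)
    return float(d.min() / radii.max())

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    base = rng.normal(scale=0.3, size=(50, 2))
    labels = np.repeat([0, 1], 50)
    print(separation_ratio(np.vstack([base, base + [10.0, 0.0]]), labels),
          separation_ratio(np.vstack([base, base + [0.5, 0.0]]), labels))
```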

Therefore, the K-means clustering method and GAM analysis enhance our understanding of the evolving patterns in the drying process and provide valuable insights into the spatial distribution and homogeneity of different LC-protein droplets as the drying process progresses toward completion.

4 Conclusions

This paper presents a comprehensive qualitative and quantitative exploration of the drying process, revealing distinct LC-protein textures across three main stages: initial, middle, and final. Using image statistics that incorporate features from First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM), we analyze the dynamics of these textures across the drying stages. Generalized Additive Modeling (GAM) was used to identify salient features, and intriguingly, the results suggest that no single feature dominates; instead, all features contribute significantly to the differentiation of LC-protein droplets. This underscores the intricate interplay of various image statistics in capturing the evolving textures during the drying process. In addition, combining K-means clustering (an unsupervised machine learning approach) with the GAM analysis shows how the textures are influenced in the three drying stages. The final drying stage yields well-defined, non-overlapping clusters, supporting the visual observations and offering evidence of distinct LC textures at varying initial buffered concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x). The novelty of this work is that it uses the whole drying process, together with advanced texture analysis techniques, to classify the drying stages and quantify the pattern dynamics from these evolving textures. This integrated approach therefore enhances our understanding of the evolving morphological patterns and spatial distribution of LC-proteins in a sessile droplet configuration, and supports the use of the drying droplet as a simple and rapid characterization tool for identifying and classifying texture dynamics.