Texture identification in liquid crystal-protein droplets using evaporative drying, generalized additive modeling, and K-means Clustering

Abstract Sessile drying droplets manifest distinct morphological patterns across diverse systems, viz., DNA, proteins, blood, and protein-liquid crystal (LC) complexes. This study employs an integrated methodology that combines droplet drying, image texture analysis (features from First Order Statistics, Gray Level Co-occurrence Matrix, Gray Level Run Length Matrix, Gray Level Size Zone Matrix, and Gray Level Dependence Matrix), and statistical data analysis (Generalized Additive Modeling and K-means clustering). It provides a comprehensive qualitative and quantitative exploration by examining LC-protein droplets at varying initial phosphate buffered saline (PBS) concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) during the drying process under optical microscopy in a crossed polarizing configuration. Notably, it unveils distinct LC-protein textures across three drying stages: initial, middle, and final. The Generalized Additive Modeling (GAM) reveals that all the features contribute significantly to differentiating LC-protein droplets. Integrating the K-means clustering method with the GAM analysis elucidates how textures evolve through the three drying stages compared to the entire drying process. Notably, the final drying stage stands out with well-defined, non-overlapping clusters, supporting the visual observations of unique LC textures. Furthermore, this paper showcases the efficacy of drying droplets as a rapid and straightforward tool for characterizing and classifying dynamic LC textures.


I. INTRODUCTION
Liquid crystals (LCs), a unique bridge between solid and liquid phases, combine the flowing nature of liquids with the symmetry properties of crystals [1]. They are optically anisotropic and birefringent, and fall into two main categories: thermotropic and lyotropic. The specific LC phase is determined by factors such as temperature changes (for thermotropic LCs) or variations in concentration (for lyotropic LCs). For instance, the thermotropic LC 5CB undergoes its nematic-isotropic transition at approximately 35 °C [2]. In the nematic phase, the molecules are aligned along a preferred direction but lack long-range positional order. When viewed under crossed polarizers, the nematic phase appears uniform and exhibits a characteristic "worm-like" or "thread-like" texture. On the other hand, one typical example of a lyotropic LC is sodium dodecyl sulfate (SDS) in water. SDS is an amphiphilic molecule, meaning it has both a hydrophilic (water-attracting) head and a hydrophobic (water-repelling) tail. At certain concentrations and under specific conditions, SDS molecules can self-organize to form lyotropic LC phases. The lyotropic phases include lamellar (smectic), hexagonal, and cubic phases. These phases exhibit distinctive textures when observed under crossed-polarizing microscopy. It is important to note that these textures provide information about the molecular ordering within the LC phases. The specific patterns result from the molecular alignment and organization, which can be influenced by factors such as concentration, temperature, solvent, etc. [3]. Such pattern emergence has been investigated thoroughly from a soft matter perspective. Examples include studying copolymers, where different polymer blocks self-assemble [4], exploring colloids that organize into structures like crystals and gels [5], understanding LC patterns with their distinctive molecular order [6], and investigating vortex formation in active matter, like swimming microorganisms displaying collective motion [7].
In contrast, patterns that emerge from drying sessile droplets, often exhibiting the "coffee ring effect" [8], have also been investigated in soft matter. When a liquid droplet containing solutes evaporates, it can leave behind distinctive patterns [9]. Understanding the dynamics of a drying droplet and the resultant pattern formation is not only interesting from a fundamental physics perspective but also has practical applications in various fields, including inkjet printing [10], coating technologies [11], and bio-medical diagnostics [9]. Recently, it has been investigated how morphological patterns emerge when optically active particles (5CB) are used as a probe in different protein drying droplets [2,12,13]. These studies reveal that adding a fixed volume of LC to different globular protein solutions (lysozyme, BSA, and myoglobin prepared in de-ionized water) alters the morphological patterns during drying [14]. In lightweight proteins (myoglobin and lysozyme), the LCs tend to be randomly distributed. Conversely, in heavier proteins like BSA, they form umbilical defect structures [12]. Interestingly, when the solvent is changed from de-ionized water to phosphate buffered saline (PBS), the optical activity of the LCs decreases progressively as the initial PBS concentration increases from 0.25x to 1x [13].
Within the field of LCs, both traditional and deep learning approaches in supervised machine learning (ML) have been applied during the last two decades. The traditional approaches include Random Forest (RF), Support Vector Machine (SVM), Decision Trees (DT), Multivariate Adaptive Regression, etc. In contrast, deep learning involves feed-forward neural networks (NN) as well as artificial and convolutional NN. SVM [15] predicts the transition temperatures in thermotropic LCs. Furthermore, RF [16] and ordinal networks [17] are applied to forecast LC properties, while DT is specifically implemented to identify clearing temperatures in bent-core LCs [18]. In contrast, the calibration of LC phases has been successfully predicted using neural networks (NN) [19]. Even unsupervised ML is used to characterize the particle trajectories, pitch, and conical angle of the nematic phases in relation to their structural and dynamical properties [20]. A few recent investigations include predicting molecular ordering [21], phase transitions [22-24], topological defects [25,26], and LC textures [27,28]. On the other hand, different ML methods are implemented for pattern recognition in a sessile drying droplet setting [29-31].
Despite significant progress in machine learning (ML) and liquid crystals (LC), the integration of LC textures and ML techniques in a drying droplet scenario has been relatively limited. This paper addresses two main objectives. Firstly, we aim to quantify LC textures using advanced image processing techniques. Secondly, we employ generalized additive modeling and K-means clustering to identify distinct stages in the drying process. Specifically, we investigate which drying stage predominates in identifying different LC-textured droplets and determine the number of LC textures at each stage.
To achieve this, we utilize texture analysis involving parameters that range from first-order to higher-order statistics. The quantitative image analysis employs the Pyradiomics toolbox to unveil the dynamics of LC textures induced by the drying process. The comprehensive approach employed in this study aims to provide a detailed understanding of the evolving patterns in LC textures during the drying process.
A volume of ∼1 µL of sample solution was pipetted onto a freshly cleaned coverslip (Catalog number 48366-045, VWR, USA) under ambient conditions (room temperature of ∼25 °C and relative humidity of ∼50%). The drying evolution was monitored every two seconds. The clock was started when the droplets were deposited on the coverslips. The normalized time is calculated as the ratio of the instantaneous time to the total drying time. To ensure reliability, all experiments were repeated three times. The samples showed good reproducibility.
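As an illustration of the normalized time defined above, the following minimal Python sketch converts frame indices into normalized times. It assumes that the first frame coincides with droplet deposition (t = 0) and that the last frame marks the end of drying; the function name and the frame count are hypothetical and are not taken from the experiments.

```python
def normalized_times(n_frames, frame_interval_s=2.0):
    """Return t / t_total for frames taken at a fixed interval, first frame at t = 0."""
    if n_frames < 2:
        raise ValueError("need at least two frames to define a total drying time")
    total_time = (n_frames - 1) * frame_interval_s
    return [k * frame_interval_s / total_time for k in range(n_frames)]


# Hypothetical run: 301 frames at 2 s intervals, i.e. 600 s of drying.
tau = normalized_times(301)
print(tau[0], tau[150], tau[-1])
```

Normalizing in this way lets droplets with different total drying times be compared on a common 0 to 1 axis.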

B. Image acquisition
The images were captured under 5x magnification using cross-polarized optical microscopy (Leitz Wetzlar, Germany) configured in the transmission mode. An 8-bit digital camera (MU300, Amscope) was attached to the microscope, and top-view images were recorded.

C. Image processing
The Pyradiomics tool [32] in Python (version 3.10) extracts the quantitative features during drying. To do this, a mask is selected on the time series of the droplet to highlight the region of interest (ROI) for the feature extraction. The extracted features include First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM). The respective classes are RadiomicsFirstOrder, RadiomicsGLCM, RadiomicsGLRLM, RadiomicsGLSZM, and RadiomicsGLDM. Each class is initialized with the batch of images as an input and the associated mask, and computes its features from the distribution of intensities within the specified ROI.
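To make the idea of computing statistics over a masked ROI concrete, the following is a minimal pure-Python sketch of a few first-order quantities. It is not the Pyradiomics implementation: it assumes the image and mask are small nested lists, uses the raw gray levels as histogram bins (no bin-width discretization), and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def first_order_features(image, mask, c=0):
    """First-order statistics of the pixels where mask == 1.

    image, mask: equally sized 2D lists (gray levels and 0/1 flags).
    c: optional shift added to each intensity before squaring.
    """
    roi = [px for img_row, mask_row in zip(image, mask)
           for px, flag in zip(img_row, mask_row) if flag == 1]
    n_p = len(roi)
    if n_p == 0:
        raise ValueError("mask selects no pixels")
    mean = sum(roi) / n_p
    variance = sum((x - mean) ** 2 for x in roi) / n_p
    energy = sum((x + c) ** 2 for x in roi)
    # Normalized histogram p(i) over the gray levels that actually occur.
    p = [count / n_p for count in Counter(roi).values()]
    return {
        "Mean": mean,
        "Variance": variance,
        "Energy": energy,
        "RootMeanSquared": math.sqrt(energy / n_p),
        "Entropy": -sum(pi * math.log2(pi) for pi in p),
        "Uniformity": sum(pi ** 2 for pi in p),
    }


# Toy example: the mask keeps only the two middle columns.
image = [[9, 1, 1, 9],
         [9, 3, 3, 9]]
mask = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
print(first_order_features(image, mask))
```

Pixels outside the mask (the value 9 here) do not influence any of the returned quantities, which is the purpose of restricting the computation to the ROI.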
For FOS, $X$ is the set of $N_p$ pixels in the ROI, $P(i)$ is the first-order histogram with $N_g$ discrete intensity levels, where $N_g$ is the number of non-zero bins, and $p(i)$ is the normalized first-order histogram, equal to $P(i)/N_p$. The FOS parameters include Mean ($\frac{1}{N_p}\sum_{i=1}^{N_p} X(i)$), Energy ($\sum_{i=1}^{N_p} (X(i)+c)^2$, where $c$ is used to shift the calculated intensities and prevent negative values in $X$), Root Mean Square ($\sqrt{\frac{1}{N_p}\sum_{i=1}^{N_p} (X(i)+c)^2}$), Entropy ($-\sum_{i=1}^{N_g} p(i)\log_2(p(i)+\epsilon)$), Skewness ($\mu_3/\sigma^3$) and Kurtosis ($\mu_4/\sigma^4$, where $\mu_3$ and $\mu_4$ are the 3rd and 4th central moments, respectively, and $\sigma$ is the standard deviation), Uniformity ($\sum_{i=1}^{N_g} p(i)^2$), and Variance ($\frac{1}{N_p}\sum_{i=1}^{N_p} (X(i)-\bar{X})^2$, where $\bar{X}$ is the Mean).

A Gray Level Co-occurrence Matrix (GLCM) of dimensions $N_g \times N_g$ characterizes the second-order joint probability function within the ROI image. It is defined as $P(i,j|\delta,\theta)$, where the $(i,j)$th element represents the frequency of occurrences of levels $i$ and $j$ in pairs of pixels. These pixels are separated by a distance of $\delta$ pixels along angle $\theta$ from the center pixel. The feature value is computed individually for each angle in the GLCM, and the average of these values is obtained. Let $P(i,j)$ denote the co-occurrence matrix for an arbitrary $\delta$ and $\theta$, and $p(i,j)$ represent the normalized co-occurrence matrix, defined as $P(i,j)/\sum_{i,j} P(i,j)$. $N_g$ is the count of discrete intensity levels in the image, $p_x(i) = \sum_{j=1}^{N_g} p(i,j)$ denotes the marginal row probabilities, and $p_y(j) = \sum_{i=1}^{N_g} p(i,j)$ denotes the marginal column probabilities. Further, $\mu_x$ is the mean gray level intensity of $p_x$, calculated as $\mu_x = \sum_{i=1}^{N_g} p_x(i)\,i$. Similarly, $\mu_y$ is the mean gray level intensity of $p_y$, defined as $\mu_y = \sum_{j=1}^{N_g} p_y(j)\,j$. Additionally, $\sigma_x$ represents the standard deviation of $p_x$, and $\sigma_y$ represents the standard deviation of $p_y$.
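The co-occurrence quantities just defined can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions rather than the Pyradiomics implementation: a single offset (one δ and θ) is used instead of averaging over angles, pairs are counted symmetrically, gray levels are stored as 0-based integers, and Contrast is taken here as the sum of (i - j)^2 p(i, j). All function names are hypothetical.

```python
def glcm(image, levels, offset=(0, 1), symmetric=True):
    """Count co-occurrences P(i, j) for pixel pairs separated by offset = (d_row, d_col)."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    P = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1
    return P


def normalize(P):
    """p(i, j) = P(i, j) / sum of all entries."""
    total = sum(map(sum, P))
    return [[value / total for value in row] for row in P]


def marginals(p):
    """Return the row marginals p_x and the column marginals p_y."""
    return [sum(row) for row in p], [sum(col) for col in zip(*p)]


def contrast(p):
    return sum((i - j) ** 2 * value
               for i, row in enumerate(p) for j, value in enumerate(row))


image = [[0, 0, 1],
         [1, 2, 2]]
p = normalize(glcm(image, levels=3))
p_x, p_y = marginals(p)
# Levels are numbered from 1 in the text, hence the (i + 1) below.
mu_x = sum((i + 1) * value for i, value in enumerate(p_x))
print(contrast(p), mu_x)
```

Because the counting is symmetric in this sketch, the two marginals coincide; with a non-symmetric count they would generally differ.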

The GLCM parameters include Contrast ($\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} (i-j)^2 p(i,j)$) and Difference Average (DA).

The GLSZM assesses the distribution of gray level zones in an image. A gray level zone is defined as the number of connected pixels that share the same gray level intensity. In the matrix $P(i,j)$, the $(i,j)$th element signifies the number of zones with gray level $i$ and size $j$ in the image. $N_g$ is the number of discrete intensity values in the image, $N_s$ is the number of discrete zone sizes in the image, $N_p$ is the number of pixels in the image, $N_z$ is the number of zones in the ROI, which is equal to $\sum_{i=1}^{N_g}\sum_{j=1}^{N_s} P(i,j)$ with $1 \le N_z \le N_p$, and $p(i,j)$ is the normalized size zone matrix, defined as $P(i,j)/N_z$. The GLSZM is rotation-independent, and only one matrix is calculated for all directions within the ROI.
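The zone-counting idea behind the GLSZM can be illustrated with a small flood-fill sketch in pure Python. It assumes 8-connectivity between pixels and a toy image given as a nested list; the function names are hypothetical, and the entropy shown is simply the Shannon entropy of the normalized size zone matrix (without an ε term), not necessarily identical to the toolbox output.

```python
import math
from collections import Counter


def gray_level_zones(image):
    """Return a Counter mapping (gray_level, zone_size) -> number of such zones."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = Counter()
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            level, size, stack = image[r0][c0], 0, [(r0, c0)]
            seen[r0][c0] = True
            while stack:  # flood fill over the 8 neighbours
                r, c = stack.pop()
                size += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc] and image[rr][cc] == level):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            zones[(level, size)] += 1
    return zones


def zone_entropy(zones):
    """Shannon entropy of p(i, j) = P(i, j) / N_z."""
    n_z = sum(zones.values())
    return -sum((n / n_z) * math.log2(n / n_z) for n in zones.values())


image = [[1, 1, 2],
         [3, 1, 2],
         [3, 3, 1]]
zones = gray_level_zones(image)
print(dict(zones), sum(zones.values()), zone_entropy(zones))
```

No direction enters the computation, which reflects why a single matrix suffices for all directions within the ROI.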
The main parameters of GLSZM are Non-Uniformity, Zone Non-Uniformity, Variance, Zone Variance, and Zone Entropy.

A Gray Level Run Length Matrix (GLRLM) characterizes sequences of pixels with the same gray level value, known as runs, by measuring their length in the number of consecutive pixels. The matrix, $P(i,j|\theta)$, represents the frequency of runs with gray level $i$ and length $j$ along the specified angle $\theta$ within the image ROI. $N_p$ is the number of pixels in the image, and $N_r(\theta)$ is the number of runs in the image along angle $\theta$, which is defined as $\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} P(i,j|\theta)$ with $1 \le N_r(\theta) \le N_p$. Therefore, similar to GLSZM, $P(i,j|\theta)$ is the run length matrix for an arbitrary direction $\theta$, and $p(i,j|\theta)$ is the normalized run length matrix, defined as $p(i,j|\theta) = P(i,j|\theta)/N_r(\theta)$. The GLRLM has four parameters. These are Gray Level Variance, Run Entropy, Run Variance, and Gray Level Non-Uniformity.

The Gray Level Dependence Matrix (GLDM) quantifies gray level dependencies in an image. Similar to GLSZM (which is dependent on the zone matrix), a gray level dependency is defined as the number of connected pixels within distance $\delta$ that are dependent on the center pixel.
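A run-length computation along a single direction can be sketched as follows. This is a minimal pure-Python illustration, assuming horizontal runs only (θ = 0) on a toy nested-list image; the function names are hypothetical, and the four quantities follow the standard definitions of Gray Level Variance, Run Variance, Run Entropy (without an ε term), and Gray Level Non-Uniformity.

```python
import math
from collections import Counter


def run_lengths(image):
    """Count horizontal runs: Counter of (gray_level, run_length) -> frequency."""
    runs = Counter()
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return runs


def glrlm_features(runs):
    n_r = sum(runs.values())                      # total number of runs
    p = {key: count / n_r for key, count in runs.items()}
    mu_i = sum(prob * i for (i, _), prob in p.items())
    mu_j = sum(prob * j for (_, j), prob in p.items())
    per_level = Counter()
    for (i, _), count in runs.items():
        per_level[i] += count                     # sum over run lengths for each level
    return {
        "GrayLevelVariance": sum(prob * (i - mu_i) ** 2 for (i, _), prob in p.items()),
        "RunVariance": sum(prob * (j - mu_j) ** 2 for (_, j), prob in p.items()),
        "RunEntropy": -sum(prob * math.log2(prob) for prob in p.values()),
        "GrayLevelNonUniformity": sum(n ** 2 for n in per_level.values()) / n_r,
    }


image = [[1, 1, 1, 2],
         [2, 2, 3, 3]]
print(glrlm_features(run_lengths(image)))
```

Repeating the same count along other directions and averaging the resulting feature values would mirror the per-angle treatment described for the directional matrices.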
In total, there are thirty variables with eight First Order Statistics (FOS), eight Gray Level Co-occurrence Matrix (GLCM), four Gray Level Run Length Matrix (GLRLM), five Gray Level Size Zone Matrix (GLSZM), and five Gray Level Dependence Matrix (GLDM).

D. Statistical Data Analysis
The generalized additive modeling (GAM) is implemented using R (version 4.1.2). For this, library(mgcv) and library(gam) [33] were installed before the modeling. A smooth function $f$ is represented by selecting a basis for the space in which $f$ resides. This selection leads to the basis functions $F_j$, each associated with a parameter $b_j$. The combination of these basis functions using these parameters results in $f(x) = \sum_{j=1}^{q} F_j(x) b_j$, where $q$ represents the number of basis functions chosen for the representation of $f(x)$. In the model, the term "LC-protein droplets" comprises the initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x (five classes), and the term "s" denotes the smooth function of the GAM modeling. The data are scaled using the scale function before the modeling to minimize inter-scale differences. The summary of the model is obtained using summary(model).
This modeling predicts five classes based on the smooth terms applied to the respective thirty predictor variables.
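The basis expansion f(x) = Σ F_j(x) b_j and the preliminary scaling step can be illustrated with a short Python sketch. The piecewise-linear "tent" basis used here is an assumption chosen for readability; it is not the basis that the R packages construct, and the knots, coefficients, and function names are hypothetical. The standardization uses the sample standard deviation (n - 1 denominator), which is what R's scale function applies by default.

```python
import statistics


def standardize(values):
    """Center to zero mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def hat_basis(x, knots):
    """Evaluate q piecewise-linear 'tent' functions F_j(x), one per knot."""
    values = []
    for j, knot in enumerate(knots):
        left = knots[j - 1] if j > 0 else None
        right = knots[j + 1] if j < len(knots) - 1 else None
        if x == knot:
            values.append(1.0)
        elif left is not None and left < x < knot:
            values.append((x - left) / (knot - left))
        elif right is not None and knot < x < right:
            values.append((right - x) / (right - knot))
        else:
            values.append(0.0)
    return values


def smooth(x, knots, coefficients):
    """f(x) = sum over j of F_j(x) * b_j."""
    return sum(F * b for F, b in zip(hat_basis(x, knots), coefficients))


knots = [0.0, 0.5, 1.0]   # hypothetical knot positions (q = 3)
b = [1.0, 3.0, 2.0]       # hypothetical coefficients b_j
print([smooth(x, knots, b) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])
print(standardize([1.0, 2.0, 3.0]))
```

With this basis the coefficients are simply the values of f at the knots, so the sketch shows how a flexible curve is reduced to a finite set of parameters that a fitting routine can estimate.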
Following the GAM modeling, K-means clustering is used as a complementary method for data analysis. Unlike supervised classification problems, it does not involve labeled training and test datasets. Instead, it focuses on discovering patterns, relationships, and structures (if there are any) within the dataset without predefined targets. K-means clustering [34] is implemented in R using library(factoextra) and library(cluster).
It is built using kmeans(df, centers = 5, nstart = 25), where df is the dataframe, centers specifies the number of clusters (or centroids) that the algorithm should aim to find in the data, and nstart is the number of times the K-means algorithm is run with different initial cluster centers. We ran the algorithm 25 times with different initializations and retained the result that minimizes the within-cluster sum of squares (WCSS).
The clustering results are visualized using fviz_cluster(km, data = df).
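The role of nstart, i.e., repeating the clustering from several random initializations and keeping the run with the lowest WCSS, can be sketched in pure Python. This sketch uses Lloyd's iteration for brevity, whereas R's kmeans defaults to the Hartigan-Wong algorithm, so it illustrates the restart logic rather than reproducing the R routine. The toy points, seed, and function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points]


def lloyd(points, k, rng, max_iter=100):
    """One K-means run from a random initialization; returns (wcss, labels, centers)."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[c])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    wcss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return wcss, labels, centers


def kmeans(points, k, nstart=25, seed=0):
    """Run lloyd() nstart times and keep the run with the smallest WCSS."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(nstart)), key=lambda run: run[0])


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wcss, labels, centers = kmeans(points, k=2)
print(wcss, labels)
```

Because a single K-means run can settle in a local minimum, comparing several starts by WCSS is a simple safeguard; the cluster labels themselves are arbitrary and may be permuted between runs.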
The comparative analysis considers all drying stages together as well as the three distinct drying stages individually. This approach aims to assess the impact of the various stages on the characteristics of five protein-LC mixtures at the initial PBS concentrations of 0x, 0.25x, 0.5x, 0.75x, and 1x. GAM and K-means clustering facilitate a comprehensive understanding of the distinctive influences of each drying stage on these protein-LC droplets.

Figure 2A exhibits the qualitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration. The 0x sample is the droplet of lysozyme and LC prepared in de-ionized water. During the drying process, the droplet height and contact angle decrease after the droplets are pipetted onto the substrate. These droplets exhibit a spherical-cap shape. The curvature of these droplets induces higher mass loss near the periphery compared to the central region, leading to the well-known "coffee-ring effect" [8], commonly observed in various bio-colloids [35]. The droplets become pinned to the substrate, and the lysozyme particles are transported by an outward radial capillary flow to compensate for the loss. The initial stage involves the general mechanism of bio-colloidal droplets and does not depend on the varied concentrations of the LC-protein droplets.
The middle drying stage is the most dynamic stage. It starts when the fluid front recedes from the periphery toward the central region, and the film undergoes crack formation due to mechanical stress. The lysozyme droplet films are thin enough to buckle as more water evaporates from the droplets. For 0x, upon mechanical stress-induced buckling of the cracked protein domains, the LCs are drawn beneath these domains. The bright regions observed under the crossed polarizing configuration represent randomly oriented LCs. In contrast, the dark region in each domain corresponds to the attached protein layer that is not optically active, as discussed in detail in [12]. When the initial PBS concentration increases from 0.25x to 1x, the interaction between lysozyme, salts, and LCs potentially influences the arrangement of lysozyme particles and LCs. However, LCs influence the packing and might not increase the film height. LCs appear trapped in the layer between lysozyme and salts in the central regions. The evaporation of a further volume of water leads to the bursting and random distribution of LCs. In contrast to 0x, an additional salt layer lies on top of the LC distribution. This entire process offers a plausible explanation for the observed reduction in birefringence intensity under the crossed polarizing configuration when the PBS concentration increases from 0.25x to 1x, despite the fixed volume of LCs. The detailed analysis can be found in [13]. The final stage represents the concluding phase of the drying process, characterized by minor observable changes (see Figure 2A).

The observed trends in Non-Uniformity and Zone Non-Uniformity follow a similar pattern, exhibiting lower values at the initial stage and higher values at the final drying stage. However, the trends in Variance and Zone Variance are not closely aligned, suggesting that the zone's impact may vary depending on the specific parameter being considered. In addition, the Gray Level Variance under the GLDM measures the variance in gray levels in the image, while DV assesses the variance in dependence size in the image.

In addition to the GAM analysis, we employed the K-means clustering method as a complementary approach to find the predominant drying stage influencing the identification of different LC-protein droplets [Figure 5B-E]. Therefore, the K-means clustering method and GAM analysis enhance our understanding of the evolving patterns in the drying process and provide valuable insights into the spatial distribution and homogeneity of different LC-protein droplets as the drying process progresses toward completion.

IV. CONCLUSIONS
This paper presents a comprehensive qualitative and quantitative exploration of the drying process, revealing distinct LC-protein textures across three main stages: initial, middle, and final. Through the application of image statistics, incorporating features from First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM), the dynamics of these textures are analyzed with respect to the drying stages. Generalized Additive Modeling (GAM) aims to identify salient features, and intriguingly, the results suggest that no single feature dominates; instead, all features contribute significantly to the differentiation of LC-protein droplets. This underscores the intricate interplay of various image statistics in capturing the evolving textures during the drying process. In addition, integrating the K-means clustering method with the GAM analysis shows how the textures are influenced in the three drying stages. The final drying stage emerges with well-defined, non-overlapping clusters, supporting the visual observations and offering evidence of distinct LC textures at varying initial buffered concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x). Therefore, this integrated approach enhances our understanding of the evolving morphological patterns and spatial distribution of LC-proteins in a sessile droplet configuration, and supports the use of the drying droplet as a simple and rapid characterization tool for identifying and classifying texture dynamics.

FIG. 1.
Figure 1 illustrates the flowchart outlining our approach. Each droplet progresses through various stages during the drying process, ultimately forming distinct fingerprint patterns. The drying droplets are examined using polarizing microscopy under crossed polarizing configurations. Throughout the drying process, features, including First Order Statistics (FOS), Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), and Gray Level Dependence Matrix (GLDM), are systematically extracted. Statistical data analysis is then applied to the diverse features extracted from the images. This step allows us to determine the number of LC textures present and identify which drying stage predominantly influences the identification of different LC-textured droplets.

GLRLM parameters: Gray Level Variance ($\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j|\theta)(i-\mu)^2$, where $\mu = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j|\theta)\,i$), Run Entropy ($-\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j|\theta)\log_2(p(i,j|\theta)+\epsilon)$), Run Variance ($\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j|\theta)(j-\mu)^2$, where $\mu = \sum_{i=1}^{N_g}\sum_{j=1}^{N_r} p(i,j|\theta)\,j$), and Gray Level Non-Uniformity ($\frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} P(i,j|\theta)\right)^2}{N_r(\theta)}$).

Figure 2B(I-VIII) shows the quantitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration using First Order Statistics (FOS). The variables include Mean, Variance, Skewness, Kurtosis, Root Mean Square, Uniformity, Entropy, and Energy. The FOS values reflect the three stages of the drying process identified qualitatively. The initial stage of the drying process reveals gradual changes in these parameters. However, limited variations in these parameters are observed for specific PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x). The middle stage shows a rapid rise and is more vibrant than the initial and final stages. In contrast, the final stage stabilizes the parameters, where their values become relatively

Figure 3A(I-VIII) shows the quantitative drying evolution of the LC-protein droplets with varied initial PBS concentrations (0x, 0.25x, 0.5x, 0.75x, and 1x) under a crossed polarizing configuration using the Gray Level Co-occurrence Matrix (GLCM). In the context of image analysis, various texture features derived from a GLCM provide valuable insights into local intensity patterns. The variables include Contrast, Correlation, Inverse Difference Moment (IDM), Maximum Probability, Difference Average, Difference Variance, Sum Entropy, and Difference Entropy [Figure 3A(I-VIII)]. The middle stage is the most dynamic phase,

Figure 4A-B(I-V) shows the GLSZM features, including Non-Uniformity, Zone Non-Uniformity, Variance, Zone Variance (ZV), and Zone Entropy (ZE). The GLDM features consist of Non-Uniformity, Dependence Non-Uniformity (DN), Variance, Dependence Variance (DV), and Dependence Entropy. Variance quantifies the variance in gray level intensities for the zones, while ZV measures the variance in zone size volumes for the zones. ZE evaluates the uncertainty and randomness in the distribution of zone sizes and gray levels, with

This method involves grouping data points into clusters based on their similarities in a multidimensional space. Each plot's X and Y that the five clusters are not uniquely determined by K-means clustering. The situation improves when focusing solely on the data from the initial drying stage [Figure 5C]. Overlapping regions are minimal, with 0.5x being predominantly identified. In contrast, 0x and 0.75x have a touching boundary, as do 0.25x and 1x. The first two dimensions represent approximately 69% and 17% of the variance, together explaining around 86% of the total variance. For the middle stage, the clusters of 0.5x and 0.75x exhibit overlapping regions, with the two dimensions explaining approximately 77% and 14% of the variance, respectively [Figure 5D]. Notably, K-means clustering provides distinct, spatially separable clusters without overlapping regions for the final drying stage. The two dimensions explain a total variance of approximately 97% [Figure 5E]. The shape and size of the clusters for each drying stage indicate how compact or dispersed the data points are within each cluster. Except for the final drying stage, the clusters are not well-separated and do not have similar sizes. By comparison, in Figure 5E, the clusters are distinct and balanced. The distribution and variation of data points within and across the clusters reflect how homogeneous these clusters become as the drying process progresses toward the end.

TABLE I. Results of a Generalized Additive Model (GAM) with various features [First Order Statistics (FOS), and Gray Level Co-occurrence Matrix (GLCM)] and their corresponding parameters, degrees of freedom (Df), F-statistics, and p-values.
The AIC is a measure that balances the goodness of fit against the complexity of the model, with lower values indicating a superior trade-off. Intriguingly, the GAM results

TABLE II. Results of a Generalized Additive Model (GAM) with various features [Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence