Probing the Rheological Properties of Liquids Under Conditions of Elastohydrodynamic Lubrication Using Simulations and Machine Learning

In elastohydrodynamic lubrication (EHL), the lubricant experiences pressures in excess of 500 MPa and strain rates larger than 105\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^5$$\end{document} s-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text{s}^{-1}$$\end{document}. The high pressures lead to a dramatic rise in the Newtonian viscosity and the high rates lead to large shear stresses and pronounced shear thinning. The extraction of accurate rheological properties using non-equilibrium molecular dynamics simulations (NEMD) has played a key role in improving our understanding of lubricant flow in EHL conditions. However, the high dimensionality of the output data generated by NEMD simulations often makes a deeper interrogation of the link between molecular-scale features and rheological properties challenging. In this paper, we explore the use of machine learning to analyze and visualize the high-dimensional output data generated in typical NEMD simulations. We show that dimension reduction of NEMD simulation data describing the shear flow of squalane enables a clear visualization of the transition from Newtonian to non-Newtonian shear thinning with increasing shear rate and provides a reliable assessment of the correlation between shear thinning and the evolution in molecular alignment. The end-to-end atom pairs dominate the largest variations in pair orientation tensor components for low-pressure systems (0.1, 100 MPa) and provide the clearest separation of the orientation tensors with rate. On the other hand, the side atom pairs dominate the largest variation in the tensor components for the high-pressure systems (P≥400\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P\ge 400$$\end{document} MPa) which exhibit an overall limited evolution in orientation tensors as a function of rate. Dimension reduction using all the six components of the orientation tensors of all 435 pairs associated with a squalane molecule shows that the decrease in viscosity with rate for low pressures is strongly correlated with changes in molecular alignment. However, for high pressures, shear thinning occurs at saturated orientational order.


Introduction
Many liquids when sheared exhibit a decrease in viscosity with the rise in shear strain rate [1,2]. This shear thinning behavior depends on many factors such as thermodynamic conditions, strain rates, and the molecular structure of the liquid. Shear thinning and the scaling of the Newtonian viscosity with temperature and pressure are critical to the performance of liquids in many practical applications. For example, the Newtonian viscosity and shear thinning of fracturing fluids determine whether the fluid viscosity is sufficiently large to reduce particle settling rates and sustain fractures of desired geometry against closure stresses imposed by the reservoir [3,4]. In elastohydrodynamic lubrication (EHL), high sliding velocities of solid machine components produce strain rates > 10 5 s −1 and substantial shear thinning of lubricant films, which lowers the frictional stress [5][6][7]. Further, the high normal loads compress the lubricant in EHL contacts to pressures in excess of 500 MPa, which increases the Newtonian viscosity of the lubricant by orders of magnitude, preventing squeeze out and limiting wear.
Accurate models describing the rheological properties of liquids under EHL conditions are critical to predicting friction. Coupled with a fundamental molecular-level understanding of shear thinning, these models can also help guide the design of synthetic lubricants. However, developing these rheological models have proved to be challenging [5,8,9]. A key reason is the difficulty to reach the relevant pressures and strain rates in laboratory experiments. Specially designed rheometers have been able to exceed 1 GPa at 10 4 s −1 and reach shear stresses of up to ~ 30 MPa [5,[10][11][12], but may be affected by unmeasured temperature increases during shear at high rates [5]. Experiments on idealized EHL contacts can reach the higher rates, pressures, and shear stresses (~ 150 MPa) important in real devices [5,7], but measure an average stress over contact regions with a range of local strain rates, pressures and temperatures [6,13]. EHL conditions also present fundamental theoretical challenges as fluids are in a regime where they exhibit strongly nonlinear viscoelastic behavior and poorly understood glassy rheology. While there is a general agreement on the qualitative changes in shear stress with increasing rate, there is an ongoing debate regarding the underlying mechanisms of shear thinning and the most suitable functional form of the rheological model that best describes liquid flows in EHL conditions [5,6,13].
Two phenomenological models at the center of this active debate are based on the work by Eyring [14,15] and Carreau [16], respectively. The Eyring model assumes that flow occurs by thermal activation over a single characteristic energy barrier and predicts that shear stress rises as log(̇ ) at high strain rate ̇ . In the Carreau model, increases as a power law, ∝̇n, with n typically around 0.5. This power-law behavior can arise from several different physical pictures of flow, including thermal activation over a broad range of energy barrier heights or a continuous increase in some type of order (e.g., molecular alignment) with increasing rate. Both of these competing models can be fit to the limited range of existing rheometer data on simple fluids that extends up to rates of about 10 4 s −1 . The two models predict dramatically different shear stresses and viscosities when extrapolated to rates that are important in EHL ( ̇> 10 5 s −1 ), but inaccessible to rheometers [5]. Resolving the debate about flow behavior at these rates is of great practical and fundamental interest. In practical terms, it will yield accurate quantitative information about the macroscopic flow properties that are essential in predicting EHL friction and guiding synthetic lubricant design. From a fundamental perspective, it will advance our understanding of the molecularlevel mechanisms of shear thinning and the nature of flow in glassy systems.
Non-equilibrium molecular dynamics (NEMD) simulations [17,18] can probe shear flow under high strain rates ( ̇> 10 5 s −1 ) which are challenging to access experimentally. These simulations have been used to investigate the shear thinning behavior of small-molecular liquids [19][20][21][22] as well as long-chain polymer molecules [23][24][25][26]. However, they are limited in probing the transport properties at lower rates due to increased computational costs associated with longer simulation times. Most of the early work on using NEMD to study model lubricants considered systems at state points far from conditions prevalent in EHL and was limited to short simulation times and high strain rates accessible to the computers at the time [20,[27][28][29][30].
Increased computational power and the development of accurate all-atom force fields at high pressure has enabled recent flow studies under pressure and temperature conditions prevalent in EHL [7][8][9][31][32][33][34]. Much of this work has been pioneered by Mark Robbins. Jadhao and Robbins [9,31,35] studied the nature of shear thinning in squalane by evaluating the shear stress, viscosity, and molecular order for a wide range of strain rates ( 10 5 −10 10 s −1 ) and thermodynamic conditions (pressures: 0.1−1200 MPa; temperatures: 150−373 K) using NEMD simulations of a united-atom model of squalane [9,31]. Their findings were consistent with available experimental data and, by extending the range of Newtonian viscosity N and rates to as low as 10 5 s −1 , revealed two distinct regimes of rheological behavior. At high temperatures and low pressures, where N of squalane is low, shear thinning viscosity was well fit by Carreau model, but with a scaling exponent n changing with N . As N increased above ∼ 1 Pa-s, squalane began to exhibit glassy flow and shear thinning was better described by the Eyring model over six orders of magnitude relative to N . For the high viscosity regime of experiments ( N > 10 3 Pa-s), the simulations and experiments were consistent with the Eyring equation over more than 8 orders of magnitude in strain rate. In this regime, using time-temperature superposition, simulation and experimental data were collapsed on the analytic solution of the Eyring model over almost 30 orders of magnitude in rate.
In a subsequent paper, Galvani Cunha and Robbins [32] employed a similar NEMD approach to study the shear response of 2,2,4-trimethylhexane using an all-atom force field developed by O'Connor, Andzelm, and Robbins for simulating shear flow at very high pressures [33]. They also observed a crossover from Carreau at low N to Eyring response at high N . More recently, Lin and Kedzierski employed the Eyring model to describe the NEMD simulation data for the rate-dependent viscosity of PEC6 at different temperatures and pressures over which N was changed by 4 orders of magnitude [34]. They found that the reduced viscosity vs rate data collapsed onto the analytic form of the Eyring equation. No universal Carreau equation could describe all the viscosity data for PEC6 as the power-law exponent n varied appreciably with temperature and pressure.
Shear thinning is often attributed to molecular ordering such as the alignment of chain-like molecules along the flow direction. By computing the ensemble-averaged components of the orientation order tensor associated with the pairs of end atoms, Jadhao and Robbins showed that while squalane molecules tend to align along the shear flow direction, the degree of order saturated before the viscosity dropped by more than half a decade. Shear thinning continued to occur after the squalane chains have aligned and the decrease in viscosity extended to over six decades and over more than eight decades in rate. This regime was characterized with nearly constant molecular alignment and the associated flow was best described by the Eyring model.
The reliable extraction of accurate macroscopic rheological properties using NEMD simulations has played a key role in improving our understanding of lubricant flow in EHL conditions. Further, the computation of molecular-scale quantities such as average end-to-end molecular distances, alignment angles, and radii of gyration using the simulation data have provided key insights into the microscopic origins of changes in the rheological properties. However, the high dimensionality of the output data generated by the NEMD simulations often makes a deeper interrogation of the link between molecular-scale features and rheological properties challenging. For example, in the case of squalane, a more informative link between rheological properties and changes in the intramolecular structure can be established by examining pair orientation tensors associated with all the 435 distinct pairs of united atoms, with each tensor characterized by six non-trivial pair distance components. Visualizing this 435 × 6 high-dimensional information is difficult using conventional analysis tools. As NEMD simulations explore EHL fluids containing molecules of more complex shapes or mixtures of molecules of different chemical structures and lengths, new tools will be necessary to expedite the analysis of the high-dimensional simulation outputs aimed at identifying key molecular features (e.g., subset of atom pairs) that best classify evolution in molecular structure (e.g., alignment along the flow direction).
Machine learning (ML) techniques have the potential to address these needs and facilitate the analysis of information hidden in the high-dimensional data generated by standard NEMD simulations. Recent years have seen a surge in ML approaches to assist, enhance, or bypass MD simulations aimed at understanding material phenomena [36][37][38]. ML has been employed to rapidly scan assembly landscapes employing particle trajectories [36,39], design efficient force fields [40], design integrators [41], and develop surrogate models to predict specific simulation outcomes bypassing part or all of the simulation [42][43][44][45][46][47][48]. Studies have demonstrated that ML methods can classify conformations of polymer chains and characterize structural relaxations in glassy materials [37,49,50]. However, with the exception of a few recent developments [51], there is a lack of utilization of ML techniques in examining the rheological behavior of liquids. The application of ML methods in the specific context of NEMD simulations related to tribological applications has not been explored to the best of our knowledge.
In this paper, we explore the use of ML tools such as dimension reduction methods to analyze and visualize the data generated in typical NEMD simulations. Using the extensive amount of simulation data generated for squalane, we show that ML methods can aid in the investigation and examination of the molecular-level mechanisms underlying the changes in the rheological properties. We show that the dimension reduction methods can enable a clear visualization of the transition from Newtonian flow to non-Newtonian shear thinning with increasing shear rate, and provide a reliable assessment of the correlation between shear thinning and the evolution in molecular alignment. Our findings suggest that the integration of ML methods with simulation and experimental approaches can further improve our understanding of lubricant flow behavior under EHL conditions.
The following section provides a brief overview of flow studies performed by Jadhao and Robbins [9]. Section 3 presents details of the simulation data preparation and dimension reduction methods utilized. Dimension reduction results for shear thinning in ambient conditions and under different higher pressures are given in Sect. 4. The final section presents discussion and conclusions.

Non-equilibrium Molecular Dynamics Simulations of Squalane
Squalane or 2,6,10,15,19,23-hexamethyltetracosane ( C 30 H 62 ) is a branched alkane with a C 24 backbone and six methyl side groups. As shown in Fig. 1, the side groups are placed symmetrically about the backbone in a way that frustrates crystallization. Squalane has played an important role in the debate about rheological models of shear thinning under pressure and temperature conditions prevalent in EHL [5,6,8,9,13]. Due to its relatively simple chemical structure and small molecular size (end-to-end distance ∼ 2 nm), squalane has also been chosen for several computational studies of rheological properties of liquids [9,20,28,30,31]. Below we briefly summarize the methods and results of the flow studies performed by Jadhao and Robbins using NEMD simulations of squalane at 293 K [9,31]; a detailed account can be found in previous publications [9,31,35]. Squalane is modeled using a united-atom (UA) potential based on the model developed by Mondello and Grest (MG) for branched alkanes [28,52,53]. In the UA description, hydrogens are lumped together with carbons into UA groups such as CH 3 , CH 2 , and CH. Each squalane molecule is represented by 30 UAs (Fig. 1). Within each molecule, the UAs interact via a harmonic bond-bending term, a harmonic bond-stretching term, a torsional potential, and a harmonic bending term to prevent umbrella inversion at tertiary carbon branch points. There is an additional non-bonded Lennard-Jones potential between UAs on different molecules or separated by at least four bonds on the same molecule.
In all cases, simulations are performed with LAMMPS [54] and used N m = 125 squalane molecules. Periodic boundary conditions are used to prevent edge effects, and the temperature is maintained with a Nosé-Hoover thermostat. An initial state is created at T = 293 K and ambient pressure P = 0.1 MPa where diffusion is rapid and equilibrium is reached quickly. High-pressure states are generated by gradually compressing this state at constant T until P reached the desired value. Typical equilibration times are ≈ 100 nanoseconds and the simulation time step is 1 femtosecond.
Shear is imposed using the SLLOD equations of motion to study steady-state flow behavior [18]. The periodic box is sheared along the x-axis at a constant rate to impose the desired average strain rate ̇= v x ∕ y , with the velocity v x along the x-axis and velocity gradient along the y-axis. For each ̇ , the system is sheared until it reached steady state. The total strain required to reach steady state is of order 5 to 10. After reaching steady state, simulations are run for strains large enough to achieve the desired statistical accuracy in average quantities such as the stress and viscosity.
The rheological data from experiments and simulations are compared to fits from Eyring and Carreau models. The Eyring model is based on the idea that shear flow is a stressbiased thermally activated process [14,15], that is, shear can only occur through molecular rearrangements that require activation over potential energy barriers with a single characteristic height. The model predicts that the shear stress rises sublinearly with strain rate according to the Eyring equation = E sinh −1 ( Ṅ∕ E ) where sinh −1 is the inverse hyperbolic sine function, and E = k B T∕V * is a parameter known as the Eyring stress that is related to the activation volume V * . The Eyring equation predicts Newtonian behavior with a constant viscosity for ≪ E and ̇≪̇E , where ̇E = E ∕ N is called the Eyring rate. The shear thinning behavior is given by the rate-dependent viscosity Limiting behavior for high rates sets in for ̇≫̇E , where the stress rises logarithmically with rate, ∼ log(̇) , and the corresponding viscosity decreases as ∼̇− 1 log(̇).
The Carreau model [16] belongs to the set of power-law fluid models which describe shear flow by assuming that there is a very broad distribution of barrier heights. These models are well suited to describe low viscosity fluids where molecules are in almost continual motion rather than hopping between cages formed by their neighbors. A power-law shear thinning behavior can also result where a continuous increase in some type of order with increasing rate leads to reductions in dissipation during flow. For example, individual molecules may stretch or align and groups of molecules may order into lines to reduce friction. The rate-dependent viscosity is given by the Carreau equation which has a specific form of the crossover from Newtonian behavior at ̇ below a characteristic rate ̇0 to power-law behavior at ̇≫̇0. In the high rate limit, the stress rises with rate as a power law, ∼̇n, where the exponent n is between 0 and 1, and the corresponding viscosity shows power-law shear thinning: ∼̇− (1−n) .
We now summarize the results for shear stress, viscosity, and molecular alignment obtained by Jadhao and Robbins [9,31]. The simulation data shown in Figs. 2 and 3 are the same as in their paper [9]. Figure 2a shows the plot of shear stress vs logarithm of rate ̇ at T = 293 K for pressures between 0.1 and 1200 MPa, focusing on values where there is experimental data [10,11,30]. Simulation results are shown along with fits to the Eyring and Carreau equations. For P ≤ 500 MPa, simulations reached the Newtonian regime and the limiting N is used in the fits. For P between 636 and 955 MPa, fits are constrained to have the experimental value of N . No data are available to constrain N at 1200 MPa, so N is included in the fit.
High-pressure results ( P ≥ 300 MPa) from experiments and simulations are found to be consistent with the Eyring equation over up to 8 orders of magnitude in rate. At high rates, rises logarithmically with ̇ and the slope gives the Eyring stress E , which is ≈ 9.3 MPa for 500 ≤ P ≤ 955 MPa, and ≈ 9.6 MPa for 1200 MPa. Fits to simulations extending to rates as low as 10 5 s −1 are consistent with experiments at the same pressure that reach rates up to 10 4 s −1 and extend down into the Newtonian regime. As the pressure decreases, the range of rates where logarithmic behavior (1) = ṄĖ sinh −1 (̇∕̇E). Fig. 2a show clear curvature and are not well fit by the Eyring equation. In contrast, this low-pressure data are reasonably fit by the Carreau equation. The small deviations in the Carreau fits are of order 2%. We note that these errors can be further reduced by a factor of 10 by fitting the data to the Carreau-Yasuda equation that uses an additional adjustable parameter [5]. There is a gradual decrease in the exponent n of the power law as P increases; n decreases from about 0.46 at 0.1 MPa to 0.2 at 200 MPa. Figure 2b shows the viscosities corresponding to the stress data in Fig. 2a in a log-log plot. Data for P < 300 MPa in Fig. 2b are fit to the Carreau equation with values of n that decrease with increasing pressure. Data at higher pressures also have apparently linear regions in Fig. 2b with a slope near −1 that corresponds to n near zero. However, as noted already, the data at these pressures are better fit by the Eyring equation where the high frequency limit, ∼ loġ∕̇ , is equivalent to a power law in the limit of n approaching zero. Eyring theory describes the decrease in viscosity by six orders of magnitude relative to the Newtonian viscosity N . Although there are no non-equilibrium experimental data for P ≤ 500 MPa, simulation data are consistent with the experimental N values [9,11,31]. Eyring theory also provides a reasonable fit for systems prepared under different T and P as long as N ≳ 1 Pa-s. A collapse of reduced viscosity ∕ N vs rate scaled by ̇E is found for all systems with N ≳ 1 Pa-s. Over almost 30 decades, simulation and experimental data lie on the analytic solution of the Eyring model.
Shear thinning in the Eyring model originates from a mechanism that only involves a change in the rate of rearrangements through activated motion over fixed barriers. In contrast, one commonly invoked mechanism for power-law shear thinning described by the Carreau model is a change in some order parameter. For squalane, the simplest type of order is the alignment of molecules along the flow direction [23,55,56]. This may lower the viscosity by decreasing the rate of collisions between molecules and the associated momentum transfer and dissipation. To study changes in molecular order under shear, Jadhao and Robbins [9] extracted the orientational tensor using NEMD simulation data: where and are Cartesian coordinates, û i is the unit vector in the direction between the end atoms of the ith squalane molecule, and N m is the number of molecules. The diagonal elements S xx , S yy , and S zz measure the degree of molecular alignment in the direction of the flow field, velocity gradient, and vorticity, respectively. A value of 0 indicates no preference to align along the axis, while values of 1 and −0.5 imply perfect alignment along or perpendicular to the axis. Figure 3 shows the variation of S xx and S yy with rate ̇ for squalane sheared at T = 293 K and pressures P indicated in the legend. In equilibrium, the system must become isotropic, implying S xx = S yy = 0 . This limit is approached for the lowest P and ̇ in Fig. 3. With increasing ̇ , molecules become more aligned along the flow direction. S xx and S yy rise rapidly and then saturate for all P. These changes occur over about two decades in rate. The maximum degree of alignment increases only slightly with increasing P. By correlating the rate dependence of orientational order and the reduced viscosity, it is found that the alignment saturates when viscosity ∕ N drops by Pressure dependence of shear stress and viscosity vs strain rate ̇ for squalane at T = 293 K. a Linear-log plot of vs ̇ at pressures P corresponding to the symbols defined in the legend. Symbols for ̇≥ 10 5 s −1 correspond to simulation results obtained in [9], and symbols at lower rates are experimental values [10,30]. Black dotted lines are Eyring model fits to simulation data and red dashed lines are Carreau fits. Fits are constrained to the limiting Newtonian viscosity from simulations for P ≤ 500 MPa and from non-equilibrium experiments for P between 636 and 955 MPa. The Newtonian viscosity is fit at 1200 MPa. b vs ̇ calculated from the stresses shown in Fig only a factor of ≈ 3 . The viscosity then continues to drop by many decades with little change in alignment.

Machine Learning Techniques for Analyzing Rheological Behavior
The link between rheological properties and changes in the molecular order in squalane can be examined more closely by visualizing the information encoded in all the components of the orientation tensors associated with all the atom pairs. We now describe the data preparation processes and the ML techniques used to enable the rapid visualization of this high-dimensional information.

Simulation Data Collection and Preparation
NEMD simulation produces trajectory data consisting of 3D positions of N m × 30 = 3750 atoms, where N m = 125 is the number of squalane molecules (each molecule has 30 united atoms). We collect this trajectory data from simulations at T = 293 K for six pressures P = 0.1, 100, 400, 636, 875, 955 MPa and rates ̇∈ 10 6 −10 10 s −1 . For all rates except 10 6 s −1 , this information is collected at a strain interval of 0.01. That is, the data collection frequency f changes with ̇ ; for ̇= 10 10 s −1 , f = 1000 steps and for ̇= 10 7 s −1 , f = 10 6 .
As simulations at ̇= 10 6 s −1 are relatively slower, the data are collected for every 0.001 strain. In some higher ̇ cases, new simulations are run to produce trajectory data with the needed granularity of strain. Otherwise, trajectory data generated from past simulations performed by Jadhao and Robbins are utilized [9,31]. For the purposes of dimension reduction, the last 1000 frames of the trajectory data associated with the steady-state region are used to compute the pair orientation tensors (see below). For each system (specified by pressure and rate), the size of this dataset is ∼ 4 GB. The complete dataset size is ∼ 200 GB. The trajectory data is used to extract the orientation tensor S p for each atom pair p belonging to the same molecule: where and are Cartesian coordinates, and r p i is the unit distance vector between the atoms associated with pair p of the ith squalane molecule. The diagonal elements S p xx , S p yy , and S p zz of the pair orientational tensor measure the degree of alignment of the atoms associated with pair p in the direction of the flow field, velocity gradient and vorticity, respectively. Each element of the tensor S p represents an ensemble average of the distance between the atoms associated with the pair p. The ensemble average is performed over the number of squalane molecules and over the selected time window (Fig. 4). Note that S p is equivalent to S of Eq. 3 for the end atom pair (1-30) of the squalane molecule. S p is a 3 × 3 symmetric matrix, that is S p = S p . This means six of the nine components are non-trivial and fully specify S p . Further, there are a total of N p = 435 distinct pairs associated with a squalane molecule. Figure 4 shows a sketch of the averaging process used to obtain the orientation tensor for all the 435 pairs. Given the six non-trivial components of S p and N p = 435 distinct pairs, the orientation information is represented with 6 × 435 = 2610 dimensions. We did not apply a min-max normalization to the high-dimensional data in order to preserve the relative difference between the separate components of the orientation tensor in the reduced dimension.

Dimension Reduction Methods
We now describe the ML techniques that enable the visualization of the 435 × 6 dimensional information associated with the pair orientation tensors in reduced dimensions. We experiment with two dimension reduction techniques: principal component analysis and t-distributed stochastic neighbor embedding, and reduce the data of N dimensions to 2. The N = 6 to 2 dimension reduction reduces the components of S p for each pair p, and the N = 2610 to 2 dimension reduction reduces the information encoded in all components of all pairs for a given pressure and rate. Our code and the data utilized for dimension reduction tasks are available on GitHub. 1

Principal Component Analysis (PCA)
PCA is an unsupervised linear transformation technique used for feature extraction and dimension reduction [57,58]. It enables the visualization of a dataset by performing a linear mapping of the data to a lower dimensional space so that the variance of the data in the reduced dimensional space is maximized. The linear transformation chooses a new coordinate system for the dataset such that the greatest variance by any projection of the dataset comes to lie on the first axis, then called the first principal component (or PC1), which accounts for as much of the variability in the data as possible. The second greatest variance lies on the second axis (PC2) under the constraint that it is orthogonal to the preceding component. Large pairwise distances are preserved after the original data are projected on to the two principal component axes. The implementation involves finding the eigenvalues and eigenvectors of the N × N dimensional covariance matrix associated with the N-dimensional input dataset. The covariance matrix is symmetric and its diagonal elements are the variances of N dimensions. The eigenvectors of the covariance matrix with the largest eigenvalues are identified as unit vectors in the N dimensional space associated with the principal components PC1 and PC2. These components describe the largest variations in the projected high-dimensional data in reduced dimensions. PC1 accounts for the maximum variation in the projected data, and PC2 represents the axis orthogonal to PC1 that is associated with the next largest variation. The PCA utility provided in the scikit-learn package is used to perform the PCA [59].

t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is a popular non-linear dimension reduction technique which captures non-linear relationships between the elements of the dataset that are not captured by linear methods such as PCA [60]. However, the interpretability of the patterns observed in data reduced via t-SNE is limited compared to PCA. In t-SNE, the local structures associated with the data are preserved while converting from higher dimensions to lower dimensions. t-SNE constructs a probability distribution over pairs of high-dimensional objects such that similar objects have a high probability of being picked, while dissimilar objects have a very low probability of being picked. It defines a similar probability distribution over objects in low dimensions. Finally, it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the objects in respective high and low dimensional spaces.
We use the t-SNE utility provided in the scikit-learn package [59]. The key parameters from an implementation standpoint are: perplexity and n _ iter. Other optimization parameters are set to their default values provided in the scikit-learn library. Perplexity is related to the number of nearest neighbors. We tuned perplexity over a range of 20 to 60 for the different dimension reduction tasks. n_iter represents the maximum number of iterations scikit-learn will perform for a given dimension reduction task. We find that the maximum n_iter value required across all tasks is 3000 via a grid search for the n_iter parameter. So, n_iter is set to 3000 across all studies. We note that the optimization

Results
We now utilize the information encoded in all 435 pairs as well as all the six non-trivial components of the pair orientation tensors to make a complete assessment of the correlation between strain rate, pressure, and molecular alignment in squalane via dimension reduction methods. We begin by showing the results for the dimension reduction of the 6-dimensional dataset formed by the components of S p to 2 dimensions for each pair p.

Dimension Reduction using PCA at a Single Rate
We first discuss the results obtained at a single strain rate using PCA for squalane under ambient conditions of P = 0.1 MPa. It is useful to recall that during the data pre-processing step for PCA, the data for each component of the pair orientation tensor (e.g., S p xx ) are centered using the mean of the components across all 435 pairs (e.g., ∑ 435 p=1 S p xx ), and no normalization is applied to the high-dimensional data. Figure 5a shows a scatter plot of the six dimensional S p components reduced to two dimensions using PCA for 435 pairs associated with the unsheared squalane system. We label this system with "0" strain rate, noting that this result corresponds to an equilibrium NVT simulation of squalane. Each point in Fig. 5a represents one orientation tensor S p for an atom pair p. Almost all pairs have ≈ 0 values (scores) along the two principal component axes, PC1 and PC2. The scores for PC1 and PC2 are similar and these components capture a similar amount of variance in the tensor components ( 72% and 28% by PC1 and PC2, respectively). The pairs are also nearly symmetrically distributed around the rate label 0 which represents the center of all 435 pairs relative to both PC1 and PC2.
Squalane has 4 side atom pairs, 4 end atom pairs, 4 endto-end atom pairs, and 15 physically connected backbone pairs. On the plot, we highlight these distinct groups of atom pairs using the following sets: 4 side atom pairs (7-8, 12-13, 18-19, 23-24), 4 end atom pairs (1-3, 2-3, 28-29, 28-30), 4 randomly chosen physically connected backbone atom pairs (3-4, 11-12, 20-21, 27-28), and 4 end-to-end atom pairs (1-30, 1-29, 2-30, 2-29). These are marked with the atom indices associated with the pair in different colors as noted in the caption. While some of the side atom pairs (12)(13)(7)(8) appear to contribute the largest variation in the data, no clear and significant separation between the different groups of atom pairs is observed. Overall, PCA results suggest that under ambient conditions, squalane acts like an isotropic liquid at zero strain rate. We note that the deviations in the PC1 and PC2 scores from 0 observed in the data for this case are due to finite size effects. These scores provide a reference to interpret the effects of shearing squalane with non-zero strain rates and under higher pressures. Figure 5b shows the PCA results for squalane sheared at a strain rate of 10 9 s −1 , which is chosen as a representative case for the sheared systems. These results are in clear contrast with the zero strain rate case. Most of the 435 pairs now have scores significantly different from 0 along both PC1 and PC2. The PC1 scores are significantly larger ( ≈ 5× ) than PC2 scores, and PC1 captures 96% of the variation in the tensor components associated with the pairs. The average pair behavior coincides with the center ( PC1 = 0, PC2 = 0 ) as evidenced by the " 10 9 " rate label, and is similar to the zero strain rate case. However, the distribution of the pairs in the 2D scatter plot is clearly asymmetric and inhomogeneous, indicating that many pairs contribute differently to the variations in the orientation tensor components.
We now examine these differences using the 4 groups of atom pairs defined above. Unlike the unsheared system, PCA clearly groups these four sets of atom pairs into well-separated clusters for squalane sheared at 10 9 s −1 . The side atom pairs (blue) account for the largest variation in the pair orientation tensor components S p along PC1, followed by the end atom pairs (lime green) and the set of backbone atom pairs (olive green). The end-to-end pairs also contribute significantly albeit in a contrasting role as indicated by the negative sign associated with their PC1 scores. The large density of points near the end-to-end pairs corresponds to pairs with atoms separated by at least 25 atoms. These pairs play a similar role as the end-to-end atom pairs in explaining the variation in the data. The large variations across the S p components exhibited by the side (and end) atom pairs can be attributed to their atoms being relatively less constrained in responding to thermal forces which can induce changes in all 6 S p components. On the other hand, the end-to-end atom pairs respond strongly to shear-driven alignment which couples most significantly to the flow and gradient directions and likely induces large changes in the S p xx , S p xy , S p yy components. The PCA scatter plot establishes that these two distinct groups of atom pairs contribute in contrasting roles.
Our aim is for PCA to furnish the principal components that enable the examination of the competing contributions of side atom pairs vs end-to-end atom pairs to the variations in the S p components across all pairs and all strain rates. We also expect PCA to reveal any significant changes in the flow behavior vs rate and provide a complete assessment of the extent of shear thinning in squalane resulting from changes in molecular alignment. Figure 6 shows the visualization of the dimension reduced S p tensors on a 2D scatter plot using PCA. Each point represents an orientation tensor S p . Different symbols correspond to different strain rates and are manually assigned without invoking any clustering algorithm. The pair orientation tensor data for each rate are represented by 435 points. Labels for each strain rate are placed at the center of the data of that rate to facilitate assessing the separation between different clusters. We also mark the atom indices associated with the side atom pairs, randomly chosen backbone atom pairs, and end-to-end atom pairs in different colors as noted in the caption. The index labels for the end atom pairs are not shown for the sake of clarity. We first note that PC1 scores are significantly larger ( 5× ) compared to PC2 scores. The values 0.95 and 0.03 on the axis labels PC1 and PC2 indicate variance ratios that signify the amount of variance in the data captured along the respective axis. Using the variance ratios, we Behavior of atom pairs associated with a squalane molecule illustrated via dimension reduction of orientation tensors using PCA for 8 strain rates ̇= 0, 10 7 , 2 × 10 7 , 10 8 , 2 × 10 8 , 10 9 , 2 × 10 9 , and 10 10 s −1 under ambient conditions. Each point represents one pair orientation tensor. Each rate is represented by 435 pairs. Different symbols of points represent different rates. Shear rate label with the same color marks the center of the data with respect to all the pairs associated with that rate. Side, backbone, and end-to-end pairs are labeled with pair numbers in blue, olive green, and magenta colors, respectively. The values in the brackets on the PC1 and PC2 axis labels are the variance ratios which signify the amount of variance captured along the respective axis (Color figure online) find that PC1 captures 97% of the variance in the data. Atom pairs belonging to high rates ( ̇≳ 10 9 s −1 ) have high PC1 scores. The end-to-end atom pairs contribute the largest variation in the S p components for a given high rate.
Examining the rate labels shows that the clusters of S p data for different rates are well separated from each other and organized along PC1 by strain rate increasing from left to right. This indicates that changes in shear strain rate strongly correlate with the variations in S p components.
The S p data for end-to-end atom pairs at different rates is grouped into well-separated clusters. The associated PC1 scores range from ∼ −0.1 to ∼ +0.4 as the rates shift from 10 7 to 10 10 s −1 . This large evolution in the orientation tensors for end-to-end atom pairs in 2 dimensions likely originates from significant changes in S p xx , S p yy , and S p xy in the 6-dimensional space because these components couple most strongly to shear. This signals a strong correlation between changes in molecular alignment and the shear thinning of squalane over this range of rates. Figure 6 shows that the side atom pairs have high negative PC1 scores, indicating that they contribute to the variation in S p components in a contrasting role to the end-to-end atom pairs. These roles are reversed in comparison to the system sheared at a single rate of 10 9 s −1 ; Fig. 5b. Further, the side atom pairs for all rates are grouped together, indicating that the changes in strain rate minimally affect the S p components associated with these pairs. PCA shows that the pair orientation tensor datasets for rates ̇≤ 2 × 10 8 s −1 are grouped in a region (with PC1 scores ≲ 0 ) that is well separated from the datasets for ̇> 2 × 10 8 s −1 . The separation of clusters for ̇≤ 2 × 10 8 s −1 from the S p datasets for ̇> 2 × 10 8 s −1 suggests a significant change in the pair orientation tensors as the strain rate increases beyond 2 × 10 8 s −1 . This shift can be associated with the change in the flow behavior seen in the viscosity vs rate data in Fig. 2b from Newtonian flow to non-Newtonian shear thinning with increasing rate. The observation of the crossover between the Newtonian and non-Newtonian flow regimes is made more precise and clear via dimension reduction using t-SNE. Figure 7 shows the dimension reduced S p tensors using t-SNE for the same set of physical systems analyzed using PCA. Different strain rates are represented with the same color scheme and marker style as in PCA. Following standard practices, we have not shown the abscissae and ordinates explicitly in the plot because t-SNE preserves the differences between samples in the high-dimensional space in transforming the information into two dimensions to produce abscissae and ordinates whose absolute values are not comparable and meaningful on their own [60,61]. Each point corresponds to one pair orientation tensor S p . t-SNE groups pairs associated with each rate into well-separated clusters for almost all pairs, indicating that the S p components are more clearly separable based on rates compared to what PCA revealed. A clear gap is observed between the clusters for ̇< 2 × 10 8 s −1 (top left) and those for ̇≥ 2 × 10 8 s −1 (bottom right). Unlike PCA, t-SNE groups the S p data associated with 435 pairs at 2 × 10 8 s −1 closer to the S p data at higher rates. The clear separation between pair orientation tensor data for ̇≥ 2 × 10 8 s −1 and ̇< 2 × 10 8 s −1 , and the prediction of a crossover rate between 1 × 10 8 and 2 × 10 8 s −1 is in agreement with the Newtonian to non-Newtonian transition in the shear flow response observed in the viscosity vs rate plot shown in Fig. 2b. t-SNE also confirms the PCA results on the contrasting roles of the end-to-end atom pairs and the side atom pairs in governing the evolution in S p components with rate. The side atom pairs are among the least well-separated set of atom pairs. On the other hand, the end-to-end atom pairs best classify the S p data for different rates as they are among the most well-separated set of atom pairs. This suggests that the important differences in the high-dimensional dataset are Fig. 7 Behavior of atom pairs associated with a squalane molecule illustrated via dimension reduction of orientation tensors using t-SNE for 8 strain rates ̇= 0, 10 7 , 2 × 10 7 , 10 8 , 2 × 10 8 , 10 9 , 2 × 10 9 , and 10 10 s −1 under ambient conditions. Each point represents one pair orientation tensor. Each rate is represented by 435 pairs. Different symbols represent different rates. Shear rate label with the same color marks the center of the data with respect to all the pairs associated with that rate. Side, backbone, and end-to-end pairs are labeled with pair numbers in blue, olive green, and magenta colors, respectively (Color figure online) originating from the S p features associated with the end-toend atom pairs. Overall, the dimension reduction of pair orientation tensor data involving all 435 pairs and 6 S p components via PCA and t-SNE shows clear evidence of a strong correlation between shear strain rate and the evolution of S p . Changes in S p are dominated by the end-to-end atom pairs when data across all systems sheared at different strain rates is analyzed. A clear distinct grouping of datasets for rates ̇≥ 2 × 10 8 s −1 signals a dramatic change in the flow behavior at a rate between 1 × 10 8 and 2 × 10 8 s −1 . This result can be utilized to determine the characteristic rate for the onset of shear thinning in the Carreau model ( ̇0).

Rheological Response Under Different Pressures
We now present the results for the dimension reduction of the pair orientation tensor data associated with the shear flow of squalane under different pressures. Figure 8 shows six scatter plots representing the PCA results for six different pressures P = 0.1, 100, 400, 636, 875, 955 MPa. Results for the ambient pressure P = 0.1 MPa are re-plotted in Fig. 8a and serve as a reference to assess changes in S p with increasing P. For P = 100 MPa, results are shown for the same set of 8 strain rates as used in the system under ambient conditions: 0, 10 7 , 2 × 10 7 , 10 8 , 2 × 10 8 , 10 9 , 2 × 10 9 , 10 10 s −1 . For P = 400 MPa, in addition to these rates, results are shown for 10 6 s −1 . For P = 636, 875, 955 MPa, results are shown for 2 additional rates: 10 6 and 2 × 10 6 s −1 . Other details are the same as noted for the system under ambient conditions. We note that the PCA plots and the associated scores remain unchanged by excluding the zero strain rate system from the dimension reduction tasks for the different pressures reported in Fig. 8. We have included the dimension reduction results for the zero rate case as a reference for each pressure to ease the assessment of changes in the flow response of squalane with increasing rate (for example, by examining the separation of the different clusters from the "0" labeled cluster). For all pressures, PC1 scores are significantly larger ( ≈ 5× ) than PC2 scores. PC1 captures 96 − 98% of the variance in the data for the pressures studied. For 100 MPa, the rate labels for the clusters of S p data for many rates are well separated from each other. The clusters are organized by strain rate along PC1, however, in contrast to the ambient case, the PC1 scores now increase with decreasing rate. This switch can be attributed to the large contributions of the side atom pairs to the variations in the pair orientation tensor components, as reflected by their large positive PC1 scores, which surpass those of the end-to-end atom pairs that receive large negative PC1 scores. The end-to-end atom pairs provide the clearest separation of S p data associated with different rates, although the separation is worse for the highest rates ( 10 9 , 10 10 s −1 ) compared to the ambient case. The associated PC1 scores range from ∼ −0.3 to ∼ +0.15 as the rates shift from 10 7 to 10 10 s −1 . These scores vary over a narrower range compared to the ambient case suggesting that the evolution in S p for the end-to-end atom pairs and related changes in molecular order saturate relatively quickly. Despite the overall greater degree of overlap between clusters of S p data for different rates compared to the ambient case, there is a correlation between changes in strain rate, molecular alignment, and shear thinning of squalane over the examined range of rates for 100 MPa. PCA shows that the pair orientation tensor datasets for rates ̇≤ 2 × 10 7 s −1 are grouped in a region (with PC1 scores ≳ 0 ) that is well separated from the datasets for ̇> 2 × 10 7 s −1 . This distinct grouping of clusters of S p datasets for ̇> 2 × 10 7 s −1 is similar to that observed in the ambient case (albeit at a different crossover rate), and signals a similar significant change in the pair orientation tensors and flow behavior as the strain rate increases beyond 2 × 10 7 s −1 . As before, a more precise assessment of this crossover is obtained via the dimension reduction using t-SNE shown later.
PCA results for 400 MPa show significant differences from the results at lower pressures P = 0.1, 100 MPa. First, the labels for the clusters of S p data associated with different strain rates show no discernible ordering of strain rates along PC1 (or PC2). These rate labels are also not clearly separable for most rates except for the zero strain rate system. The S p data for 1 × 10 6 s −1 are separated from the data for the higher strain rates but only marginally as indicated by its rate label. In stark contrast with results at lower pressures, the end-to-end atom pairs do not clearly separate S p tensors associated with different strain rates for 400 MPa. The associated PC1 scores range from ∼ −0.2 to ∼ −0.05 as the rates shift from 10 6 to 10 10 s −1 . These scores vary over a much narrower range compared to the results for P = 0.1, 100 MPa systems, signaling that the evolution in S p vs rate for the end-to-end atom pairs and related changes in molecular order saturate relatively quickly. These observations imply that the extent of shear thinning of squalane at 400 MPa arising from shear-driven molecular alignment is limited. The side atom pairs contribute the largest variation in the pair orientation tensor components along PC1, easily surpassing the contributions of the end-to-end atom pairs. These atom pairs (identified by blue atom indices) separate out into a cluster and are not separable by rate. The next largest variation in S p components is contributed by the end atom pairs (indices not labeled) and by the atom pairs associated with the system at zero strain rate. The closely spaced clusters associated with these three sets of atom pairs suggest the presence of a common mechanism. The dominant contribution to shear thinning of squalane as it is pushed toward the glassy regime at 400 MPa may originate from the thermally activated rearrangements of the side and the end atom pairs. New investigations utilizing different types of high-dimensional data are required to make progress toward a more precise diagnosis. PCA results for higher pressures of P = 636, 875, 955 MPa share many similarities with the results for 400 MPa. The rate labels are stacked almost on top of each other. There is no evidence of any ordering of strain rates along PC1 or PC2. Except for the unsheared system, there is no separation of S p data by strain rate and no clear grouping of clusters of S p datasets for different sets of low and high rates (as seen for P = 0.1, 100 MPa). The latter observation confirms that the flow behavior for the probed rates under these pressures is squarely within the shear thinning region; Fig. 2b. The ability of the end-to-end atom pairs to separate the data at different strain rates worsens as P is increased. The associated PC1 scores range from ∼ −0.2 to ∼ −0.05 , similar to 400 MPa, as the rates shift from 10 6 to 10 10 s −1 . These scores vary over a much narrower range compared to the results for P = 0.1, 100 MPa systems, signaling that the evolution in S p vs rate for the end-toend atom pairs and related changes in molecular order saturate relatively quickly. These observations imply that the contribution to shear thinning of squalane at high P due to shear-driven molecular alignment is small and bounded. The side atom and end atom pairs dominate the contribution to the variation in the pair orientation tensor components along PC1. Their PC1 scores slightly increase with pressure. Similar to the case of 400 MPa, the clusters formed by these groups of atom pairs are close to the cluster associated with S p data for all pairs belonging to the unsheared system. As squalane is pushed deep into the glassy flow regime under these high pressures of P = 636, 875, 955 MPa, we expect the common mechanism of thermally activated rearrangements of the side and end atoms to dominate the contribution to the decrease in viscosity with rate. We now discuss the results of dimension reduction for different P using t-SNE. Figure 9 shows the visualization of the dimension reduced S p tensors (from 6 dimensions to 2 dimensions) using t-SNE for the same set of physical systems analyzed using PCA in Fig. 8. For each pressure, different strain rates are represented with the same symbol scheme as in Fig. 8; other details are also the same as in the PCA plots. Results for the ambient pressure P = 0.1 MPa are re-plotted in Fig. 9a to serve as a reference to assess changes in S p with increasing P. Figure 9b shows t-SNE results for P = 100 MPa, which share many similarities with the results for the ambient case. t-SNE groups atom pairs comprising the S p data associated with different strain rates into well-separated clusters for a majority of pairs, signaling that the 6-dimensional S p tensor components are clearly separable based on rates. Similar to the 0.1 MPa system, t-SNE shows that the end-to-end atom pairs best classify the S p data for different rates as they are among the most well-separated set of atom pairs occupying the farthest regions in the 2D scatter plot. t-SNE confirms the PCA results on the contrasting roles of the end-to-end atom pairs and the side atom pairs in governing the evolution in S p components with rate. The side atom pairs are among the least well-separated set of atom pairs. There are few differences between the t-SNE results for the ambient case and P = 100 MPa system. The side atom pairs and the backbone atom pairs organize in larger across different subplots (e.g., green diamond represents 435 pairs at ̇= 10 7 s −1 across all six pressures.) Shear rate label with the same color marks the center of the data with respect to all the pairs associated with that rate. Side, backbone, and end-to-end pairs are labeled with pair numbers in blue, olive green, and magenta colors, respectively (Color figure online) numbers into more clearly separable and dense clusters for 100 MPa. An increase in the overlap of pairs (e.g., atom pairs separated by at least 25 atoms) associated with rates ̇≥ 10 8 s −1 is observed for 100 MPa compared to the ambient case.
t-SNE results for 100 MPa show a clear gap between the clusters for ̇≤ 2 × 10 7 s −1 and those for ̇> 2 × 10 7 s −1 . This distinct grouping of pair orientation tensors and the prediction of a crossover rate between 2 × 10 7 and 1 × 10 8 s −1 are in agreement with the Newtonian to non-Newtonian transition in the shear flow response observed in the viscosity vs rate plot shown in Fig. 2b. This crossover rate is lower than that for the ambient conditions. Figure 9c shows the t-SNE results for P = 400 MPa which show important differences from the results at lower pressures. The atom pairs belonging to the S p data associated with different strain rates are not easily separable for most rates. Only the atom pairs for the 10 6 s −1 rate and the unsheared system are separately grouped together into clusters well separated from the atom pairs that belong to higher strain rates. Unlike PCA results, the end-to-end atom pairs are able to separate the different strain rates but they are no longer the most well-separated and spread out pairs as in the case of P = 0.1, 100 MPa systems; Fig. 9a and b. The distance between clusters of the end-to-end atom pairs for higher rates is substantially reduced, and the pairs appear to cluster on one side of the 2D scatter plot. Similar to PCA, t-SNE groups the side atom pairs (blue indices) and the end atom pairs for all strain rates into separate clusters. The clusters formed by these groups of atom pairs are close to the S p data cluster for all pairs associated with the unsheared system. t-SNE results for 400 MPa also show a relatively more clear separation between the data at higher rates and the S p cluster at 10 6 s −1 . Including the pair orientation tensor data (not shown) from ̇= 3 × 10 5 s −1 in the dimension reduction task for 400 MPa, t-SNE predicts a crossover rate to Newtonian flow occurs in the vicinity of 10 6 s −1 , in agreement with Fig. 2b. t-SNE results for the higher pressures of P = 636, 875, 955 MPa share many similarities with the corresponding PCA results. The rate labels for S p data associated with different decades of rate are nearly stacked on top of each other, with the separation getting worse as P increases. Only the data for the unsheared system are clearly separated by strain rate.
The end-to-end atom pairs separate the data for different strain rates, but the distance between these clusters is further reduced compared to the 400 MPa system. These pairs are organized on one side of the 2D scatter plot (e.g., bottom left for P = 875, 955 MPa) and are substantially less spread out relative to the large separations observed for lower P = 0.1, 100 MPa systems.
Similar to PCA, t-SNE groups the side atom pairs (blue indices) and the end atom pairs (not labeled) for all strain rates into separate clusters. The clusters formed by these groups of atom pairs are close to the S p data cluster for all pairs associated with the unsheared system. The separation between these different clusters decreases with increasing pressure. This further supports the existence of a common mechanism governing the dynamics of atom pairs associated with these groups. We note that t-SNE groups the backbone atom pairs for all non-zero rates together into a cluster well separated from the side and end atom pairs. Furthermore, the large mix of atom pairs in the middle of the scatter plots for these high pressures corresponds to pairs separated with the number of atoms between 3 and 25.

Single Dimension Reduction for All Systems Using PCA
We now discuss the PCA results for the dimension reduction of the high-dimensional pair orientation tensor data associated with the shear flow of squalane for all non-zero strain rate values and all pressures using a single dimension reduction plot. The principal components PC1 and PC2 furnished by PCA employ the information from 49 systems characterized with different rates and pressures. Figure 10a shows the reduction of the 6-dimensional S p tensors to 2 dimensions for pressures P = 0. Our experiments with dimension reduction of six pair orientation tensor components to 2 dimensions for all 435 Fig. 10 a Behavior of atom pairs associated with a squalane molecule illustrated via dimension reduction of orientation tensors using PCA for all 49 sheared systems characterized with different pressures P ∈ (0.1, 955) MPa and rates ̇∈ (10 6 , 10 10 ) s −1 . Each point represents one pair orientation tensor. Different symbols represent different systems identified with the pressure-rate ( P−̇ ) labels shown in the legend panel on the right. P−̇ label with the same color mark the center of the data with respect to all the pairs associated with that system. Side, backbone, and end-to-end pairs are labeled with pair numbers in blue, olive green, and magenta colors, respectively. The values in the brackets on the PC1 and PC2 axis labels are the variance ratios which signify the amount of variance captured along the respective axis. b Behavior of squalane under different P and ̇ conditions illustrated via the dimension reduction of all pairs and all orientation tensor components using PCA. The same color scheme noted in the legend on the right is used to represent all the 49 systems. c Rate dependence of the PC1 scores associated with systems at pressures P indicated in the legend in (c). PC1 scores are obtained from the scatter plot shown in (b) (Color figure online) pairs reveal the important roles of the side atom pairs and the end-to-end atom pairs in assessing the correlation between changes in molecular alignment and shear thinning under increasing pressures. In addition to these studies, PCA can be used to process and reduce the entire N = 2610 dimensional data, consisting of 435 pairs and six orientation tensor components to 2 dimensions in order to reveal what features dominate the variations across all 2610 dimensions for all 49 systems. This dimension reduction results in 1 point for each system characterized with a given rate and pressure. Figure 10b shows a single scatter plot of the reduction of the 2610-dimensional orientation tensor components associated with the shear flow of squalane for all strain rates and pressures (49 different systems). The symbol scheme is the same as noted in Fig. 10a. PC1 scores are 10× larger compared to PC2, with PC1 capturing 99% of the variance in the entire dataset consisting of all 49 systems. Figure 10b shares similar information as Fig. 10a but with more clarity. The aforementioned three groups of systems are more clearly identified in Fig. 10b. Low-pressure systems sheared at smaller strain rates are characterized with PC1 scores > 6 and clearly separate out as a group that can be associated with flow closer to the Newtonian regime. Within this group, it is easy to separate different systems. The high-pressure systems ( P ≥ 400 MPa) at strain rates > 10 6 s −1 form a distinct group with negative PC1 scores between ∼ 0 and ∼ −3 . Most of the systems within this group are not easily separable as evident from the large mix of closely spaced pressure-rate labels. This group can be associated with non-Newtonian shear thinning dominated by a mechanism that does not rely on changing orientational order. Finally, systems characterized with small positive PC1 scores between ∼ 0 and ∼ 3 form another group. Within this group, the systems are relatively more separable and can be associated with non-Newtonian shear thinning dominated by changes in molecular alignment. The crossover from Newtonian regime to non-Newtonian shear thinning is clearly seen with increasing strain rate.
The dimension reduction tasks based on pair orientation tensors show that changes in molecular order in squalane are not correlated with strain rate for high-pressure systems even after considering all 435 pairs and all 6 S p components, as evident by the large mix of non-separable labels for these systems in Fig. 10b. This can also be observed by examining the PC1 scores for each rate and pressure obtained from Fig. 10b as a function of shear strain rate. Figure 10c shows that PC1 scores for low-pressure systems (0.1, 100 MPa) are strongly correlated with increasing rate. On the other hand, PC1 scores exhibit minimal variation with rate for the highpressure systems. Figure 10c closely resembles the S yy results shown in the order parameter plot (Fig. 3). This similarity bolsters the conclusions drawn from the earlier study [9] using the orientation tensor components along flow and gradient direction associated with a single end-to-end atom pair. The utilization of all the six components of the orientation tensors for all 435 pairs reaffirms that the decrease in viscosity with rate for low pressures is strongly correlated with changes in the alignment of molecules. However, for high pressures, shear thinning occurs at saturated orientational order.

Discussion and Conclusion
The complexity of the high-dimensional output data originating from non-equilibrium molecular dynamics simulations can often make its analysis time consuming using conventional post-processing methods. We highlighted the capabilities of machine learning-based feature investigation tasks to expedite the analysis of this data and obtain reliable information. The approach was illustrated by examining the link between the rheological properties and the molecular-scale structure of squalane via the dimension reduction of the information encoded in all the six components of the orientation tensors of all 435 atom pairs associated with a squalane molecule. PCA was used to assess the correlation between strain rate and the evolution of the components of the pair orientation tensors S p for different pressures, and examine the contributions of distinct sets of atom pairs to the variation in the six components of the pair orientation tensors. t-SNE was used to more precisely assess the clustering of pair orientation tensor data by PCA and probe the crossover from Newtonian to non-Newtonian flow regimes. Dimension reduction showed a clear, distinct grouping of atom pairs in the reduced dimension for systems characterized with small strain rates and pressures (0.1, 100 MPa), signaling a dramatic change in the flow behavior with increasing rate for a given pressure. This was related to the transition from Newtonian to non-Newtonian flow. The associated crossover rates predicted via dimension reduction can be used to determine the characteristic rate for the onset of shear thinning in Carreau and Eyring models.
Dimension reduction of 6-dimensional S p data in 2 dimensions showed that both the side atom pairs and the end-to-end atom pairs contribute significantly to variations in the orientation tensor components, but in contrasting roles. The end-to-end atom pairs dominated the largest variations in S p components for low-pressure systems (0.1, 100 MPa), and provided the clearest separation of the S p data with rate. A strong correlation between the shear rate and the PC1 scores was observed. On the other hand, the side atom pairs dominated the largest variation in S p components for the high-pressure systems ( P ≥ 400 MPa). For these systems, no clear correlation between shear rate and PC1 scores was observed and the ability of the end-to-end atom pairs to separate data at different rates worsened with increasing pressure. The evolution in S p vs rate for most pairs associated with high-pressure systems saturated relatively quickly, implying a small and limited contribution to shear thinning due to changes in shear-driven molecular alignment. Dimension reduction of the 2610-dimensional data consisting of all 435 pairs and six components of the orientation tensors for all strain rates and pressures reaffirmed the presence of 3 distinct groups of systems associated with the Newtonian flow, shear thinning with nearly saturated molecular order, and shear thinning dominated by changes in molecular alignment. Examining the PC1 scores for this dimension reduction task bolstered the conclusions drawn from the earlier study by Jadhao and Robbins [9] using the orientation tensor components along flow and gradient direction associated with a single end-to-end atom pair. Specifically, the utilization of all the six components of the orientation tensors for all 435 pairs showed that the decrease in viscosity with rate for low pressures is strongly correlated with changes in the alignment of molecules along the flow direction. However, for high pressures, shear thinning occurred at saturated orientational order.
Dimension reduction showed that the mechanisms responsible for shear thinning under high pressures, where squalane exhibits glassy flow, do not involve long-range molecular alignment such as that associated with the end-to-end atom pairs, but may involve the rearrangements of the side atom pairs. These rearrangements could be dominated by shear-assisted thermal activation [62,63]. Other mechanisms such as those originating from the elasticity of the squalane molecules as represented in the Prandtl model may also explain the origins of the shear thinning behavior at high pressures [64]. New investigations utilizing different types of high-dimensional data are required to make progress toward a more precise diagnosis. For example, investigating the time evolution of the pair orientation tensors (averaged only over the number of molecules) can help preserve and quantify the rearrangements of different pairs with time. Further, enhancing the input information provided to the dimension reduction task by including the pressure and kinetic energy tensors can also assist in diagnosing the origins of the non-Newtonian shear flow under high pressures.
Dimension reduction methods explored in this work can be used to complement simulations aimed at describing the shear flow of other small-molecular liquids. In particular, studies of liquids such as nonadecane and 9-octylheptadecane, which have a similar number of carbon atoms as squalane, can help assess the correlation between shear thinning and molecular alignment for molecules of different architecture. These methods can also be extended to long-chain polymer systems to estimate how much shear thinning may come from alignment [25,56]. In many applications, EHL fluids contain a mixture of molecules of different architecture and length. There may be shear thinning correlated with the order of one species, say a long polymer, that is dissolved in a lower viscosity base oil. Application of dimension reduction methods to analyze the simulations of such fluid mixtures over a range of pressures and temperatures [65] could also be valuable. In general, we expect that the use of the dimension reduction methods to visualize high-dimensional information can complement simulations and experiments aimed at furnishing an improved understanding of the molecular-level mechanisms underlying the macroscopic rheological response of liquids.