Introduction

In normal and diseased tissues, cells are continually exposed to a wide variety of extracellular stimuli, including growth factors and cytokines, that modulate morphological and phenotypic states. However, quantitative assessment of complex morphological states remains a challenging problem. Single timepoint measurements provide some information about cell state but do not capture how responses evolve over time. Live-cell imaging has been deployed to characterize dynamical changes in cell morphology, or cellular morphodynamics1,2. Recent advances in live-cell imaging technologies have enabled unprecedented assessment of the behavior and interactions of cellular populations3,4,5,6. To date, however, most analyses of live-cell image data have been primarily based upon classification of cell morphology observed in individual time points and do not directly examine the rich dynamic landscape of cell morphology trajectories7,8,9.

Here we describe a generalizable morphodynamical trajectory embedding method to analyze live-cell imaging datasets composed of unlabeled phase-contrast microscopy images. Analysis methods directly based upon trajectory features that are aggregated over multiple time points have been used to classify mitosis and apoptosis10, and also to monitor signaling responses via reporter molecules11. We show here that mapping the multiple-time-point trajectory space of cells, rather than examining single-cell time courses built from snapshots, can increase the information extracted from live-cell imaging experiments and improve the quantitative description of cellular responses.

Live-cell imaging provides temporal information not available from other single-cell and omic measures. Single-cell RNA sequencing can assay thousands of molecular read-outs across thousands or hundreds of thousands of cells. A common data analysis procedure is the extraction of cell states, including continuous low-dimensional cell state spaces, from high-dimensional molecular data12. Because sequencing is a destructive readout, single-cell trajectories in this space can only be inferred indirectly through population time-series modeling13,14,15,16 or pseudo-time approaches17,18; however, approaches such as “RNA velocity”19,20 have been used to infer cell state dynamics. In contrast, live-cell imaging enables cellular and phenotypic responses to be assayed over multiple time points.

In a live-cell imaging experiment, cell responses to a perturbation, such as the addition of a signaling ligand to the cell culture medium, can be examined. Cells with shared response patterns can be assigned to distinct cell states that can then be tracked over time to quantify response dynamics. Cell states can be discrete, as for cell-cycle states G0-G1-G2-M, or continuous as for epithelial-to-mesenchymal transition (EMT). Live-cell imaging has been used to develop gene-level functional annotation, including in RNAi gene knockdown screens and drug screening7, and also to study how single-cell trajectories evolve in a cell state space that represents distinct cell-cycle phases8. Gordonov et al. developed an unsupervised approach to characterize live-cell state, analyze cell shape space, and obtain models of cell responses that included three distinct cell states21. This workflow of live-cell imaging, segmentation, featurization, and tracking has been used to describe cell state as a continuum22 and to develop a cell trajectory-based description of EMT23 in the space of single-timepoint snapshot features. Heryanto et al. utilized 3D shape descriptors to explore the relationship between 3D shape and cell motility24. Trajectory information, including combined motility and morphological features computed as averages over single-cell trajectories, has also been used to define and identify cell state space in microglia and neural progenitor cells25.

Trajectories are the natural space from which to classify a system out of equilibrium, such as a living cell26,27. Leveraging such a framework, however, requires that the state space of the system can be measured. Floris Takens’ seminal trajectory embedding theorem28 states that in a deterministic dynamical system, there is a 1:1 correspondence between the space of the full dynamical system and that formed by concatenating incomplete observations of the system across time—the “trajectory embedding” space, also referred to as delay-embedding. For cell morphodynamics, the incomplete observations are single-cell morphological features (snapshots) and concatenating features across time forms morphodynamical feature trajectories, which we refer to as “trajectories.” For \({N}_{f}\) features and \({n}_{\tau }\) trajectory timepoints, the trajectory of a cell can be considered a vector of dimensionality \({N}_{f}\) × \({n}_{\tau }\). In stochastic systems or systems of sufficient complexity, such as a cell, a comprehensive map enabling perfect prediction of the dynamical system is not necessarily achievable, but trajectory embedding can still lead to an improved characterization of the dynamical behavior29,30,31. Trajectory embedding methodology has been applied in fields as diverse as weather prediction32, economics33, and molecular dynamics34,35,36, but to our knowledge has not yet been applied to examine cell states observable in live-cell imaging assays.

Here, we develop and apply the morphodynamical trajectory embedding method in a dataset of MCF10A mammary epithelial cells perturbed with a set of six ligands that target three signaling pathways (PI3K/AKT, MAPK/ERK, and SMAD) and induce distinct cellular responses, including changes to cell proliferation, differentiation state, and motility and which are known to be important in mammary tissues37. The live-cell imaging data are part of a broader data collection effort through the Library of Integrated Network-Based Cellular Signatures (LINCS) consortium38,39 MCF10A project37 where the molecular and phenotypic responses to these ligand perturbations were assessed. Molecular and cellular responses indicate changes in multiple pathways and initiation of unique cellular responses in each ligand condition37. Our live-cell imaging cell-trajectory-based analysis was developed to characterize the morphodynamical changes associated with molecular responses. By directly analyzing morphological feature trajectories, rather than single-timepoint features or averaged features over time, we leverage the additional information contained in the time-ordered single-cell trajectory information.

Results

We applied our trajectory embedding analysis to systematically characterize cell state from live-cell imaging of MCF10A mammary epithelial cells treated with a panel of ligands that induce distinct phenotypic responses: PBS (no ligand; control), EGF, HGF, OSM, BMP2 + EGF, IFNG + EGF, and TGFB + EGF. Cells were assessed via phase-contrast microscopy over 48 h, with images collected every 30 min, as part of the LINCS MCF10A project37. Single cells were segmented, featurized, and tracked through time as described in Methods. In our “trajectory embedding” approach, time-sequences of features for the trajectory length under consideration were concatenated and used for UMAP40 dimensionality reduction as the basis for further analysis. The single-cell trajectories are the set of extracted single cells and their tracks, or linkages between frames. An overview of the live-cell imaging trajectory embedding workflow is shown in Fig. 1. The quantification of morphodynamical cell states demonstrates the biological information intrinsic to cellular trajectories in a broadly applicable and relatively simple imaging assay.

Fig. 1: Live-cell imaging analysis and trajectory embedding pipeline.
figure 1

The data analysis pipeline starts from 48 h image stacks and proceeds to the morphodynamical trajectory analysis. Image processing steps include a preprocessing, b cell segmentation, c featurization (z-normalized phase-contrast pixel values colored red positive to blue negative), d tracking (cell boundaries at t, t + 30 min with cell centers connected by black arrows), e extracting morphodynamical trajectories of principal component features as sliding window cell feature trajectory snippets \(\{{\mathop{X}\limits^{ \rightharpoonup }}_{t}\}\) from cell linkages (3 possible trajectory snippets of length 3 shown in green, red, and blue), f trajectory embedding (UMAP), and g cell state and trajectory analysis in the trajectory embedding (UMAP) space using trajectories longer than the trajectory embedding length.

Comparing morphodynamical trajectories between the different ligand treatments requires the construction of a shared cell state space, which we created by analyzing all of the trajectories from the full set of treatments through the dimensionality reduction pipeline together. The single-cell trajectories we use for a trajectory embedding of length \({n}_{\tau }\) consist of all possible trajectory snippets of length \({n}_{\tau }\) in the full trajectory set; for example, a single cell that is tracked over 12 frames will have 5 possible trajectory snippets of length 8 in a sliding window manner (frames 1–8, 2–9, 3–10, 4–11, 5–12). Unless otherwise indicated, we used all available trajectory snippets over the 48-h experiment. Snippets mapped to the same location in the reduced-dimensionality cell state space share qualitatively similar morphologies across trajectory timepoints and across all treatments (Supplementary Fig. 2). In this shared morphodynamical trajectory space, ligand treatments alter the distribution of morphodynamical cell states (Fig. 2).
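The sliding-window construction of trajectory snippets can be illustrated with a minimal sketch. The function name `sliding_snippets` and the array layout (one row of features per frame) are assumptions made for illustration and are not taken from the analysis code used in this study.

```python
import numpy as np

def sliding_snippets(features, n_tau):
    """Return all length-n_tau snippets of one tracked cell.

    features: array of shape (T, N_f), one row of features per frame.
    Returns an array of shape (T - n_tau + 1, n_tau * N_f); each row is the
    time-ordered concatenation of n_tau consecutive feature vectors.
    """
    features = np.asarray(features, dtype=float)
    n_frames, n_features = features.shape
    n_snippets = n_frames - n_tau + 1
    if n_snippets < 1:
        # Track is shorter than the requested snippet length.
        return np.empty((0, n_tau * n_features))
    return np.stack([features[i:i + n_tau].ravel() for i in range(n_snippets)])

# A cell tracked over 12 frames with 3 features per frame (toy values).
track = np.arange(12 * 3).reshape(12, 3)
snippets = sliding_snippets(track, n_tau=8)
print(snippets.shape)  # (5, 24): 5 snippets, each of dimension 8 x 3
```

Each row of the result is one point in the \({N}_{f}\) × \({n}_{\tau }\)-dimensional trajectory space that is passed to the dimensionality reduction step.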

Fig. 2: Trajectory embedding increases the distinguishability of cell states induced by ligand perturbation.
figure 2

a Representative background subtracted phase-contrast images (size 1.6 mm × 1.9 mm, with z-normalized phase-contrast pixel values colored red (positive phase contrast) to blue (negative phase contrast)) in the set of ligand conditions (top to bottom) at 0, 6, 12, 24, and 48 h (left to right). b, c Distributions of cells in trajectory embedding space over the 48 h of imaging. b Snapshot space (trajectory embedding length = 1), and c trajectory embedding length = 8. d Average pairwise overlap over all treatment pairs (shared area under probability distributions) as a function of the morphodynamical feature trajectory length used in the embedding (log2 x-axis scale), comparing trajectory embedding (blue dashed lines) and null model with randomly scrambled time labels within treatments (red dashed lines), averaging over results obtained by dividing data into three sets by field of view (diamonds), with error bars from a bootstrapped 95% confidence interval over the three data splits. A trajectory length of 1 corresponds to a snapshot description.

Separation of unique and shared cell state under ligand perturbation

Ligand perturbation can induce time-dependent morphologic and phenotypic changes, including cycling rate, motility, and cytoskeletal features. Cells in the control condition (PBS, no ligand) did not proliferate, while cell populations grown in the other treatments display changes in proliferation and morphology as early as 6 h (Fig. 2a). For example, TGFB + EGF ligand treatment increased cell spacing and induced large lamellipodia, while OSM treatment induced tightly packed cell clusters; see Gross et al.37 for further phenotypic assessment. These changes motivate our interest in quantitative analysis of morphodynamical trajectories.

We characterized changes to cell morphodynamics under the different ligand treatments by quantifying the similarity between distributions of morphodynamical trajectories between ligand conditions (Fig. 2b–d). We found similar ligand-specific distributions in the embedding space of morphological snapshots, with increased ligand-specific uniqueness observed in the embedding space of morphological trajectories. At the snapshot level, which excludes trajectory information, Fig. 2b shows that over the course of the experiment cells occupy broad distributions in the embedding space. For example, we observed distinct shifts in occupancy that separate OSM and TGFB + EGF from other conditions, which is consistent with the distinct morphologies associated with these two treatments (Fig. 2a). At a trajectory length of 8 steps (3.5 h), these broader relationships are preserved but the cell state distributions in the embedded space become more condensed and display distinct peaks (Fig. 2c). The uniqueness of cell state distributions between ligand treatments is reflected in a monotonic reduction in the shared area, or overlap, between cell state probability distributions with increasing trajectory length (Fig. 2d). The pairwise overlap decreased more rapidly than in a null model in which cell features were randomly scrambled within treatment (Fig. 2d). Thus, the trajectory embedding that leveraged information across timepoints yielded improved description of the ligand-specific morphodynamical responses as compared to snapshot analysis.
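The shared area under two probability distributions can be sketched as follows. This is a minimal illustration that assumes a 2D embedding and a simple histogram density estimate; the bin count, the function name `distribution_overlap`, and the choice of estimator are assumptions and may differ from the estimator used for Fig. 2d.

```python
import numpy as np

def distribution_overlap(points_a, points_b, bins=50, extent=None):
    """Shared area under two histogram-estimated 2D probability distributions.

    Returns a value in [0, 1]: 1 for identical histograms, 0 for disjoint ones.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    if extent is None:
        # Use a common bounding box so both histograms share the same bins.
        both = np.vstack([points_a, points_b])
        extent = [(both[:, k].min(), both[:, k].max()) for k in range(2)]
    hist_a, _, _ = np.histogram2d(points_a[:, 0], points_a[:, 1],
                                  bins=bins, range=extent)
    hist_b, _, _ = np.histogram2d(points_b[:, 0], points_b[:, 1],
                                  bins=bins, range=extent)
    p = hist_a / hist_a.sum()
    q = hist_b / hist_b.sum()
    # The overlap is the sum over bins of the smaller of the two probabilities.
    return float(np.minimum(p, q).sum())

rng = np.random.default_rng(0)
cloud = rng.normal(0.0, 1.0, size=(2000, 2))
same = distribution_overlap(cloud, cloud)
disjoint = distribution_overlap(cloud, cloud + 100.0)
```

Averaging this quantity over all treatment pairs, with a fixed `extent` shared by every pair, would give an average pairwise overlap of the kind plotted against trajectory length.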

Improved cell state description from morphodynamical trajectories

Cell states can be defined by identifying metastable regions of the morphodynamical trajectory embedding space where trajectories remain localized for extended time periods. To compare cell states and their relationships, we extracted the single-cell dynamics in the embedding space, i.e., the space of trajectory snippets defined in a sliding window. If there are T timepoints in a full single-cell trajectory, then there are T-\({n}_{\tau }+1\) snippets of length \({n}_{\tau }\), each of which is a point in the embedding space. Together, these points trace out a trajectory in the embedding space with T-\({n}_{\tau }\) transitions between snippets (e.g., from the snippet consisting of frames 1–8 to the snippet consisting of frames 2–9). We used the dynamical information about snippet-to-snippet transitions to calculate dynamics in the morphodynamical cell state space. The average of all cell state trajectories passing through a local region in the landscape yields the cell state “flow”, which in a Markovian picture of a continuous stochastic process41,42, is proportional to the effective force “pushing” a cell from one morphodynamical state to another. These cell state force-fields are visualized in Fig. 3. In the snapshot landscape, cell state trajectories appear highly random with little systematic variation between treatments (Fig. 3, left column). In the trajectory embedding landscape, however, the cell state force-field displays treatment-specific convective flows (Fig. 3, center column). These flows indicate stabilization in the unique regions of density peaks between treatments, providing direct evidence for the paradigm of metastable attractors in a landscape of cell state. Individual single-cell trajectories in the embedded space can stay partly localized to these metastable cell states for extended time periods. Over the timescale of 10 or more hours, cell state changes reveal the transition pathways between metastable cell states (Fig. 3, single-cell trajectories shown as blue to green lines, image sequences in the right column). Trajectories that appear random when observed via single-timepoint snapshot features unfold43,44 and become systematic in the trajectory embedding space.
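The cell state flow can be sketched as a binned average of single-step displacements. The grid resolution, the function name `mean_flow`, and the use of a regular grid are illustrative assumptions; the sketch only shows the averaging step described above.

```python
import numpy as np

def mean_flow(positions, displacements, bins=10, extent=((0.0, 1.0), (0.0, 1.0))):
    """Average displacement vector in each bin of a 2D embedding.

    positions: (N, 2) embedding coordinates of snippets at time t.
    displacements: (N, 2) coordinates at t + 1 minus coordinates at t.
    Returns (flow, counts); flow has shape (bins, bins, 2) and is NaN in
    empty bins. Points outside `extent` are assigned to the edge bins.
    """
    positions = np.asarray(positions, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    (x0, x1), (y0, y1) = extent
    ix = np.clip(((positions[:, 0] - x0) / (x1 - x0) * bins).astype(int), 0, bins - 1)
    iy = np.clip(((positions[:, 1] - y0) / (y1 - y0) * bins).astype(int), 0, bins - 1)
    sums = np.zeros((bins, bins, 2))
    counts = np.zeros((bins, bins))
    # Unbuffered accumulation so repeated bin indices are all counted.
    np.add.at(sums, (ix, iy), displacements)
    np.add.at(counts, (ix, iy), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        flow = sums / counts[..., None]
    return flow, counts

pos = np.array([[0.05, 0.05], [0.05, 0.05], [0.95, 0.95]])
disp = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, -2.0]])
flow, counts = mean_flow(pos, disp, bins=2)
```

Plotting `flow` as arrows over the binned cell density would give a picture analogous to the force-field panels, under the stated assumptions.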

Fig. 3: Trajectory embedding enables the determination of metastable cell states and pathways across ligand treatments.
figure 3

Left and middle columns: average displacement proportional to the effective force (orange arrows) and cell density (grayscale) for the snapshot embedding (left, trajectory snippet length = 1) and the trajectory embedding (middle, trajectory snippet length = 8). Representative cell morphodynamical trajectories capturing cell state transitions (blue to green line with arrows showing the direction of motion in the embedding space) from t0 to tf determined by the available cell tracks. Right column: Cell images every hour along the extracted trajectory, with the tracked cell centered in the image frame and neighboring cells moving in and out of view, except in the IFNG+EGF images where the tracked cell is temporarily clipped at the edge of the microscope field of view.

The utility of a single-cell trajectory depends upon how well it characterizes cell state transitions and transition dynamics. We measured how systematic and predictable the trajectories are by quantifying the randomness of the trajectories in the embedding space. We defined the predictability of a trajectory as a locality ratio \(l=\sqrt{\left\langle {(x\left(t+30\min \right)-x\left(t\right))}^{2}\right\rangle }/\sqrt{\left\langle {(x-\left\langle x\right\rangle )}^{2}\right\rangle }\) between the root-mean-square (RMS) displacement in the embedding space after one timestep (30 min) and the standard deviation in the displacement over the full population. Angled brackets <···> indicate averages over all trajectories and timepoints \(t\). In a completely random trajectory, this ratio is 1 because the variance in a single timestep and the full population is identical. In a deterministic trajectory, all trajectories emanating from the same point are identical and have no variance after a single timestep–the only contribution to the ratio is the relative average displacement in the time interval. In a continuous, stochastic description of the trajectories in the embedded space41,42, this locality ratio is related to the effective diffusion rate. Figure 4a shows that this locality ratio systematically decreases with trajectory embedding length in contrast with the null model, indicating that trajectories are increasingly less random and more predictable with increasing trajectory embedding length.
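A minimal sketch of the locality ratio is given here, assuming that each trajectory is stored as an array of embedding coordinates at consecutive frames and that, for multi-dimensional embeddings, squared differences are summed over coordinates. The function name `locality_ratio` and the toy trajectory are illustrative assumptions.

```python
import numpy as np

def locality_ratio(trajectories):
    """RMS single-step displacement divided by the RMS spread of all points.

    trajectories: list of (T_i, d) arrays of embedding coordinates, one per
    tracked cell, sampled at consecutive frames (30 min apart in this study).
    """
    arrays = [np.asarray(x, dtype=float) for x in trajectories]
    steps = np.vstack([np.diff(x, axis=0) for x in arrays])
    points = np.vstack(arrays)
    rms_step = np.sqrt(np.mean(np.sum(steps ** 2, axis=1)))
    spread = np.sqrt(np.mean(np.sum((points - points.mean(axis=0)) ** 2, axis=1)))
    return rms_step / spread

# A smooth toy trajectory and a copy with its time order scrambled.
t = np.linspace(0.0, 1.0, 200)
smooth = np.column_stack([t, t ** 2])
rng = np.random.default_rng(0)
scrambled = rng.permutation(smooth)  # shuffles rows, i.e. time labels

ratio_smooth = locality_ratio([smooth])
ratio_scrambled = locality_ratio([scrambled])
```

In this toy case the time-ordered trajectory gives a much smaller ratio than its scrambled copy, which is the qualitative contrast between the trajectory embedding and the null model.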

Fig. 4: Trajectory embedding increases the predictability of cell trajectories.
figure 4

a Ratio between the single-step (dt = 30 min) and full RMS displacement in the trajectory embedding space as a function of the trajectory length, null model with randomly scrambled features within treatments (reds) and trajectory embedding (blues), and UMAP embeddings where d is the number of UMAP components. Three replicates are shown per embedding (sea green, turquoise, teal). b Average log-likelihood per trajectory step from the validation set cell trajectories, as a function of the trajectory length, averages for the trajectory embedding (blue dashed line) and for the null model (red dashed line), from UMAP d = 2 embeddings. Individual treatments (colors) for the null model (crosses) and for the trajectory embedding (diamonds).

To determine the capability of the morphodynamical embeddings to characterize single-cell morphodynamical trajectories, we first used a subset of the data to train a model of trajectory likelihood, then calculated the average log-likelihood of a held-out test set of trajectories. The average log-likelihood is a direct measure of the predictability of the cell trajectories45. Here we used a Markovian transition matrix likelihood model, trained by counting transitions between Voronoi states defined by k-means cluster centers on the landscape46,47. We utilized 100 k-means centers to discretize the morphodynamical embedding space, which provided sufficient Voronoi centers to capture the observed patterns of cell state flow between metastable cell states while still retaining adequate sampling of state-state transitions. The average log-likelihood increases as a function of morphodynamical trajectory embedding length and is higher than in a null model where cell features were randomly scrambled within treatments (Fig. 4b).
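The transition-counting likelihood model can be sketched as follows, assuming the cluster centers have already been obtained (for example from k-means). The function names and the `pseudocount` regularization are assumptions added for illustration; the source does not state how unobserved transitions were handled.

```python
import numpy as np

def assign_states(points, centers):
    """Index of the nearest center (Voronoi cell) for each embedded point."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def transition_matrix(state_sequences, n_states, pseudocount=1.0):
    """Row-normalized transition counts from training state sequences.

    pseudocount is an assumed regularizer that keeps unobserved transitions
    at non-zero probability so held-out log-likelihoods stay finite.
    """
    counts = np.full((n_states, n_states), pseudocount, dtype=float)
    for seq in state_sequences:
        seq = np.asarray(seq)
        np.add.at(counts, (seq[:-1], seq[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def mean_log_likelihood(state_sequences, trans):
    """Average log transition probability per step over held-out sequences."""
    logp = [np.log(trans[np.asarray(s)[:-1], np.asarray(s)[1:]])
            for s in state_sequences]
    return float(np.concatenate(logp).mean())

# Toy example with two states: training data alternates between them.
trans = transition_matrix([[0, 1, 0, 1, 0, 1]], n_states=2)
ll_alternating = mean_log_likelihood([[0, 1, 0]], trans)
ll_static = mean_log_likelihood([[0, 0, 0]], trans)
```

A held-out sequence that follows the trained transition pattern receives a higher average log-likelihood per step than one that does not, which is the sense in which this quantity measures predictability.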

We expect in general that greater trajectory embedding lengths will increase the descriptive capability of trajectory models, but only up to the point where the increase in information in the longer time-ordered trajectories is greater than the loss of information due to incomplete tracking of cells between frames. For snippet lengths longer than 10 frames, we did not observe an increase in ligand-specific trajectories relative to the null model (Fig. 2d), which likely is related to the decrease in the number of extracted trajectories longer than 8 frames (Supplementary Table 2). Thus we chose a trajectory embedding length of 8 (3.5 h) for further analysis. The decreased overlap between ligand-specific distributions (Fig. 2d), decreased locality ratio (Fig. 4a), and increased trajectory likelihood (Fig. 4b) indicate that even partial trajectory information over a timescale of a few hours substantially improved the representation of cell state.

Morphodynamical transitions precede cell cluster formation

Ligand treatments displayed characteristic transitions between cell states (Fig. 5a), indicating ligand-specific regulation of cell morphodynamics. In general, cell state distributions were more similar across ligand treatments at early times and became more condensed and distinct at later times, which is reflected in the time-dependent morphodynamical cell state distribution (Supplementary Fig. 2).

Fig. 5: Trajectory embedding resolves pathway of cell cluster formation via mesenchymal-like intermediate.
figure 5

a Cumulative distribution of all cells under all treatments (grayscale) with labeled fine-grained density peaks (A–S), and qualitative cell states (mauve dashed circles) with numeric labels (1–6), and cell state transition networks with arrow weight proportional to conditional transition probability, not overall transition flows, with transition probabilities <3% not drawn. b Representative trajectory snippets (embedding length = 8 = 3.5 h) extracted at density peak locations (A.–S.) with the treatment condition of the representative trajectory snippet, grouped by macrostate with morphologically descriptive macrostate labels (right of images).

Cell states in the embedding space were identified using quantitative methods and refined by visual inspection. We identified fine-grained metastable states spanning the cell state landscape from density peaks arising in individual treatments (Fig. 5a, b), which we divided into 6 states, or groupings, and assigned qualitatively descriptive labels based upon observed cell morphology. These labels are intended to be descriptive and aid interpretation. “Separated epithelial-like” cell states are rounder cells enriched in the EGF condition, while “separated mesenchymal-like” cell states display more extended cytoskeletal features such as lamellipodia and are enriched in the TGFB condition. The “intermediate clusters” cell state represents cells that are mostly separated from tightly clustered cells. We further divided multicellular clustered cell states into “bound clusters” which display a thick border around attached cells that span multiple cells, “unbound clusters” which lack this border and whose outer cells have extended cytoskeletal features, and “budding” cells which consist of semi-round cells attached to the outer border of a cell cluster.

The direction of the cell state flow in the trajectory embedding space indicates that a group of cells coming together and forming an attached multicellular cluster proceeds via state 2, “separated mesenchymal-like” cells with extended cytoskeletal features. This flow was consistently observed in the cell state “force-fields” (Fig. 3), cell state transition networks (Fig. 5a), and the time-dependent cell state distributions (Supplementary Fig. 3), and in all of the ligand conditions.

Cell cluster formation was most strongly associated with OSM treatment37, consistent with Fig. 5a showing that in the OSM treatment the transition probability from the “intermediate cluster” state 3 into the multicellular clustered state was higher than the transition probability out of it. Separated cells (states 1 and 2) are separated from cell clusters (states 4 and 5) and cell cluster formation proceeds through states 2 → 3 → 4 via cells displaying extended cytoskeletal features and increased cell border contrast (Fig. 5b).

Discussion

In vivo, cells continually modulate their phenotypic state in response to changes in local microenvironmental cues38. For example, during development, cells must precisely control cell states48, proliferation, and cell motility, and the loss of cell state control is associated with various diseases, including cancer49. Dynamic cell behaviors can be observed via live-cell imaging but quantifying the dynamic relationships between diverse cellular phenotypes has been challenging. Our morphodynamical “trajectory embedding” provides a method to quantitate dynamic morphologic behaviors. Trajectory embedding leverages the unique capability of live-cell imaging to follow single cells in time and constructs a coordinate space based on time-aggregated “hyper features” better suited to study cell states and their dynamical relationships as compared to standard featurization based on single timepoints.

We observed that cells dynamically transition between morphodynamical cell states and that the transition frequencies are strongly modulated by ligand treatment. These dynamic cell state relationships can provide a framework for understanding cell-cell heterogeneity and heterogeneous cell responses to perturbation. Live-cell trajectory embedding brings the cell state landscape paradigm from theoretical biology to direct application, where cell states and the transitions between them can be resolved, validated, and potentially leveraged for actionable control strategies14,50,51,52,53,54,55.

Our morphodynamical trajectory embedding procedure quantifies the space of morphological trajectories directly, leading to an improved description of dynamical cell state changes compared to using only morphological snapshots. One limitation of our analysis is that we did not explicitly consider cell cycle stage; instead, cell cycle processes remain implicitly described via morphological features. We envision that future studies could extend our framework by directly identifying cell cycle phase. Such an extension would provide insight into the coupling of cell cycling to changes in motility and morphology and the heritability of morphodynamical state from parent to daughter cell. We chose a broad cell feature set, but many other approaches to define cell features have been developed, including novel cell shape descriptors56,57 and machine learning-based approaches58. Trajectory embedding, in principle, has the capability to map dynamical information from any reasonable featurization towards a more complete description of cell state.

The morphodynamical trajectory embedding method can be applied to any live-cell imaging modality where cells can be characterized and tracked through time, and in particular, we expect that this method will be especially powerful for analysis of live-cell approaches that incorporate genetically-encoded, fluorescently-labeled reporters, both in vitro59,60,61 and in vivo62,63,64. Practical limitations will always come into play, nevertheless. For example, we applied our method to unlabeled phase-contrast imaging of cell cultures that approach confluence65,66, without paired ground truth labeled images67. Trajectory data quantity and quality will generally pose a constraint on trajectory length used in the morphodynamical trajectory embedding analysis. Even when many very long trajectories are available, analysis based upon shorter trajectory lengths might more directly capture processes and relationships of specific interest, such as the connection between specific morphological features and cell motility. The approach outlined here could be used as the foundation for analyses designed to identify the optimal trajectory lengths.

Biological interpretation and validation of the morphodynamical cell states extracted here will be important, and our findings help to motivate specific hypotheses that could be explored in future studies. Identification of the molecular programs associated with particular cell states and cell state transitions would provide insight into how these processes are mediated in normal tissues and how they may go awry in diseases. Importantly, trajectory embedding analysis enables quantification of cell state transitions and may therefore be useful for gaining insights into disease progression and therapeutic resistance68,69,70,71,72,73,74. Our observation that cell cluster formation is preceded by a mesenchymal-like shift in cell state aligns with the maturation of transverse arc stress fibers as a precursor to stable cell-cell junctions observed by Rajakylä et al.75 but live-cell imaging coupled with deeper molecular profiling data—such as multiplexed imaging76,77,78,79 and single-cell transcriptomics80,81—are needed in order to gain a more comprehensive understanding of the underlying biological mechanism beyond the information obtained from live-cell imaging alone. Manifold-based or mutual-information approaches have had some success with single-cell data integration82,83,84 and may enable integration of live-cell imaging trajectory embeddings with molecularly resolved data, a critical data analysis goal needed to provide insight into the biological relevance of morphodynamical cell states. With the trajectory embedding method we present here, it is now possible to study the emergence of metastable attractors and the regulation of dynamic cell state changes, via live-cell trajectories.

Methods

Live-cell imaging of MCF10A cells

Data used in this study were recently described by Gross et al.37 In brief, MCF10A cells were plated at 75,000 cells/well on collagen-coated 8-well plates. After an 8 h attachment period in growth media containing EGF and insulin and a 12 h period in media lacking EGF and insulin, cells received one of 7 ligand treatments (PBS, EGF 10 ng/ml, HGF 40 ng/ml, OSM 10 ng/ml, BMP2 20 ng/ml + EGF 10 ng/ml, IFNG 20 ng/ml + EGF 10 ng/ml, TGFB 10 ng/ml + EGF 10 ng/ml). Wells were imaged every 30 min for 48 h via bright-field phase contrast with an Incucyte microscope (1020 × 1280, 1.49 \(\mu\)m/pixel), with the initial frame coinciding with the addition of the ligands and fresh imaging media. Six image stacks were collected for each ligand treatment and imaged simultaneously; see Gross et al.37 for a discussion of batch effects and experimental replicates. Experimental protocols can be found in detail at the publicly available Synapse database85.

Image preprocessing

Foreground (cells) and background pixel classification was performed using manually trained random forest classifiers using the ilastik v1.3.3 software86. Images were z-normalized (mean subtracted and normalized by standard deviation) and background pixel values were set to a value of 0. In cell images, these z-normalized pixel values are shown from red to blue (positive to negative).

Cell segmentation

Single cells were segmented from the preprocessed images using the cytoplasm model of the Cellpose87 v0.6.5 software, a deep learning approach trained broadly across cell types and imaging modalities. The Cellpose algorithm requires estimating the size of the cells before segmentation; due to the variability in sizes and shapes of the MCF10A cells, segmentation was performed iteratively over multiple rounds, allowing the Cellpose algorithm to determine a new cell size at each round until no more new cells were found (pixels of previously segmented cells were set at each round to the background value of 0). Image preprocessing and segmentation scripts can be found in the GitHub repository; see Data and code availability. The unlabeled, bright-field phase-contrast imaging used here leads to image analysis challenges, particularly for cell segmentation. It is difficult to judge the quality of many extracted cell segmentations when local cell density is high; see Supplementary Table 1 and Supplementary Fig. 1 for manual validation.

Cell featurization

Three classes of features were used to characterize individual cells: (i) texture features, (ii) shape features, and (iii) features characterizing adjacent cells. As preliminary steps, segmented cells were extracted and mask-centered into zero-padded equal sized arrays larger than the linear dimension of the biggest cell (in each treatment); then the long axis was defined by the non-mass-weighted moment of inertia of the cell mask and aligned along a reference axis. (i) Two types of internal cell features were calculated. Zernike moments (49 features) were used to characterize the overall spatial phase contrast signal and Haralick texture features (13 features) were used to characterize the phase contrast texture; these were calculated in the Mahotas88 image analysis package. The sum average Haralick texture feature was discarded due to normalization concerns. (ii) Shape features (15 features) were calculated as the absolute value of the frequency coefficients of the Fourier transform of the distance to the boundary as a function of the radial angle around cell center89, with the sum of shape features normalized to 1. (iii) The cell environment was featurized in a similar fashion, using an indicator function with value 0 where the cell boundary was in contact with the background mask (no neighboring cell) and value 1 where it was in contact with the cell foreground mask. The absolute values of the Fourier transform coefficients of this indicator as a function of radial angle around the cell center then featurized the local cell environment (15 features), with the sum of cell environment features normalized to 1. Note the first component of the cell environment features is practically the fraction of the cell boundary in cell-cell contact. Additional information regarding the cell featurization can be found in Supplementary Fig. 4A.
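The Fourier-based shape and environment features can be sketched as below, assuming the boundary distance (or the 0/1 contact indicator) has already been sampled at equally spaced angles around the cell center. The function name `fourier_profile_features`, the use of a real-input FFT, and the number of angular samples are assumptions for illustration.

```python
import numpy as np

def fourier_profile_features(profile, n_features=15):
    """Magnitudes of the lowest Fourier coefficients of an angular profile.

    profile: values sampled at equally spaced angles around the cell center,
    e.g. distance from center to boundary, or a 0/1 cell-contact indicator.
    The returned magnitudes are normalized to sum to 1.
    """
    profile = np.asarray(profile, dtype=float)
    coeffs = np.abs(np.fft.rfft(profile))[:n_features]
    total = coeffs.sum()
    return coeffs / total if total > 0 else coeffs

theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
circle = fourier_profile_features(np.full(64, 5.0))           # constant radius
elongated = fourier_profile_features(5.0 + np.cos(2 * theta)) # two-fold symmetry
```

For a circular outline all weight falls on the zero-frequency component, while an elongated outline shifts weight to the second harmonic; taking magnitudes discards the phase of each harmonic, which is one reason this kind of descriptor is convenient for cells that have been aligned to a reference axis.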

After computing the raw features as described, the high-dimensional cell feature space was reduced in dimension using principal component analysis (PCA), retaining the 3 largest eigencomponents of the feature covariance matrix (spanning all treatments and image stacks), which captured >90% of the variability. More principal components can be retained with only a small computational cost in the subsequent dimensionality reduction step (here UMAP), which typically scales linearly with the data vector length. Additional information regarding the cell feature PCA reduction can be found in Supplementary Fig. 4B.
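
The PCA reduction can be sketched with an eigendecomposition of the feature covariance matrix. This is a generic illustration: whether features were standardized before PCA is not stated in the text, so the sketch only mean-centers them, and the function name is hypothetical.

```python
import numpy as np

def pca_reduce(features, n_components=3):
    """Project features (n_cells, n_features) onto the leading principal components.

    Returns the projected data and the fraction of variance retained.
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    explained = eigvals[order].sum() / eigvals.sum()
    return centered @ eigvecs[:, order], explained
```

The returned variance fraction gives a direct check on how many components are needed to pass a chosen threshold such as 90%.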

Cell tracking

Segmented cells were tracked between live-cell imaging frames to extract the set of single-cell trajectories. Image stacks were first registered translationally, without allowing rotational or general affine transformations, using the pystackreg implementation of subpixel registration90. Cell centers were recorded as the equally weighted center of mass of the single-cell masks. Cells were tracked between frames by first separating each contiguous cluster of cells, as defined by connected sets of the foreground/background cell mask. If a cell cluster occupied less than 10,000 pixels (a typical cell is very roughly 30 × 30 pixels), cells were simply tracked by minimum distance with a cutoff of 45 pixels. Tracking cells by minimum distance refers to linking a cell at frame t to the cell at frame t + 1 that has the minimum distance between cell centers. Larger cell clusters were first tracked as clusters by minimum distance with a cutoff of 300 pixels. Tracked cell clusters were each individually registered rotationally and translationally (again using pystackreg), and individual cells in the clusters were tracked between frames by maximum overlap with a cutoff of 10 pixels. The ligand panel yields differential impacts on cell proliferation, with a typical cell division time of ~12–20 h. In a correctly tracked cell division event, a parent cell will be linked to two daughter cells, which in turn will lead to two separate trajectories sharing a common initial history prior to mitosis. In our analysis, two or more cells may have the same parent cell as the closest cell in the previous frame. This can be the result of a cell division event, or of a missed track or missed cell segmentation. We do not separate cell division events into their own category, but simply use the cell tracks, or linkages between frames, to identify the unique cell trajectory history. 
Where trajectories are split, as in a cell division, trajectories are not truncated and begun anew; rather, each daughter is tracked backward in time, leading to two trajectories that are treated separately (see cell image sequences from the HGF and TGFB + EGF conditions in Fig. 3 for examples). That is, daughter cells will have overlapping or shared history from the parent cell.
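
A minimal sketch of minimum-distance linking with a cutoff is given below. It links each cell in the later frame to its nearest cell in the earlier frame, a direction chosen here because it allows two daughters to share one parent; this choice, the function name, and the use of -1 for unlinked cells are assumptions of the example.

```python
import numpy as np

def link_by_min_distance(prev_centers, curr_centers, cutoff=45.0):
    """For each cell in the current frame, return the index of the nearest
    cell in the previous frame, or -1 if none lies within the cutoff."""
    prev = np.asarray(prev_centers, dtype=float).reshape(-1, 2)
    curr = np.asarray(curr_centers, dtype=float).reshape(-1, 2)
    if len(prev) == 0:
        return np.full(len(curr), -1, dtype=int)
    # pairwise distances between current (rows) and previous (columns) centers
    dist = np.linalg.norm(curr[:, None, :] - prev[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    nearest[dist.min(axis=1) > cutoff] = -1
    return nearest
```

Several current-frame cells may map to the same previous-frame index, which is how a division (or a segmentation error) appears in the linkage.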

Morphodynamical trajectory embedding

High-dimensional time sequences of features are used as the input to dimensionality reduction algorithms as the basis for morphodynamical cell state analysis. Cell linkages were constructed from the available cell tracks. These cell linkages are not complete; that is, incomplete segmentation or tracking means that some cells cannot be traced all the way back to an initially plated progenitor cell. The available cell history is the unique backward trace of any extracted single cell through time. Of the total 476,855 cells extracted here, 137,845 could be traced back only one step, while 36,919 could be traced back 10 steps; see Supplementary Table 2. We refer to the set of all cells (from possibly different experimental time points) with cell history equal to or longer than a given length as the trajectory snippet set; see Supplementary Fig. 5 for a graphical description. Note that there is a large amount of duplication in this sliding-window division of available cell trajectories. Cell trajectories that are longer than the trajectory snippet length used in the trajectory embedding allow for the determination of cell states and pathways. From the linkages, all cell trajectory snippets of length \({n}_{\tau }\) (all possible cell histories with length \({n}_{\tau }\)) were extracted in a sliding-window manner. The number of available trajectory snippets for each treatment is shown in Supplementary Table 2. The PCs of each cell in the trajectory snippet were then concatenated together (e.g., for 2 PCs, \(\{{\vec{X}}^{{n}_{\tau }}\}=\{{\mathrm{PCA}}_{1}({t}_{0}),{\mathrm{PCA}}_{2}({t}_{0}),{\mathrm{PCA}}_{1}({t}_{1}),{\mathrm{PCA}}_{2}({t}_{1}),\ldots ,{\mathrm{PCA}}_{1}({t}_{{n}_{\tau }}),{\mathrm{PCA}}_{2}({t}_{{n}_{\tau }})\}\)) to form the trajectory snippet supervector, which we define as the morphodynamical feature trajectory of length \({n}_{\tau }\) (with N features, the feature trajectory for each cell is N × \({n}_{\tau }\)). 
These morphodynamical feature trajectories (spanning all image stacks and all treatments) are flattened into vectors and then embedded using UMAP91 into a space of dimension d. Changing UMAP embedding hyperparameters alters fine details of the trajectory embedding landscape, but we find our overall results to be robust, with very little change in the measured overlap between ligand populations or in the trajectory likelihood, as shown in Supplementary Fig. 6. The trajectory embedding analysis allows for the robust and systematic characterization of cell state trajectories even in this challenging data analysis regime with many missing and partially segmented cells.
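
The sliding-window snippet extraction can be sketched as a backward trace through per-frame parent links, followed by concatenation of the per-cell PCs. The data layout is an assumption made for this example: `features[t]` is an array of shape (cells in frame t, N), `parents[t][i]` is the index in frame t - 1 of the parent of cell i (or -1 if unlinked), and a snippet here contains `n_tau` time points. The resulting flattened vectors would then be passed to an embedding method such as UMAP, which is not shown.

```python
import numpy as np

def trace_back(parents, t, i, n_tau):
    """Indices of cell i's lineage at frames t - n_tau + 1 .. t,
    or None if its available history is shorter than n_tau."""
    indices = [i]
    for s in range(t, t - n_tau + 1, -1):
        i = parents[s][i]          # step from frame s to frame s - 1
        if i < 0:
            return None
        indices.append(i)
    return indices[::-1]

def extract_snippets(features, parents, n_tau):
    """All trajectory snippets of n_tau time points, flattened to vectors."""
    snippets = []
    for t in range(n_tau - 1, len(features)):
        frames = range(t - n_tau + 1, t + 1)
        for i in range(len(features[t])):
            chain = trace_back(parents, t, i, n_tau)
            if chain is None:
                continue
            parts = [features[s][c] for s, c in zip(frames, chain)]
            snippets.append(np.concatenate(parts))
    return np.array(snippets)
```

Because every cell at every frame starts its own backward trace, overlapping windows of the same lineage appear multiple times, which is the duplication noted in the text.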

Overlap coefficient

To compare the similarity of two probability distributions \(p\) and \(q\) over a shared space, we use the overlap coefficient92, defined as the sum over the shared space of the pointwise minimum of the two distributions, \(\sum_{x}\min (p(x),q(x))\). The overlap is 0 for completely distinct, non-overlapping distributions and 1 for identical distributions.
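
For distributions estimated as histograms over a common set of bins, the overlap coefficient reduces to a few lines. The sketch below assumes both inputs are non-negative arrays on the same bins and normalizes them internally; the function name is illustrative.

```python
import numpy as np

def overlap_coefficient(p, q):
    """Sum of the binwise minimum of two normalized histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())
```

Raw bin counts can be passed directly, since each histogram is rescaled to sum to 1 before the comparison.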

Stochastic dynamics: locality ratio, cell state force-fields

We employ several measures motivated by standard concepts of stochastic physics. A measure of the randomness of motion in stochastic dynamics is the effective diffusion rate, defined at a timescale \(\tau\) by \(D\equiv {\langle \Delta {x}^{2}\rangle }_{\tau }/\tau\), where \(\tau\) is here the time between frames (30 min). We characterize how random trajectories are by the ratio of the single-step RMS (root-mean-square) displacement to the total RMS displacement, a locality ratio \(l=\sqrt{\langle {(x(t+30\min )-x(t))}^{2}\rangle }/\sqrt{\langle {(x-\langle x\rangle )}^{2}\rangle }\), where the mean-squared displacements are summed over dimensions and the angle brackets \(\langle \ldots \rangle\) indicate averages over all trajectories and time points. For completely random trajectories, this ratio is 1, and it tends to the relative average displacement for continuous, deterministic dynamics. If cells obey stochastic Markovian dynamics characterized by a diffusion equation, then the average displacement \(\langle \Delta \vec{x}\rangle\) is proportional to the effective force \(\vec{F}(\vec{x})\) via \(\langle \Delta \vec{x}\rangle ={\gamma }^{-1}\vec{F}(\vec{x})\Delta t\), with \(\gamma\) the friction.
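
The locality ratio can be computed directly from a collection of embedded trajectories. In this sketch each trajectory is assumed to be an array of shape (time points, dimensions) sampled at consecutive frames, and averages are pooled over all trajectories; the function name is illustrative.

```python
import numpy as np

def locality_ratio(trajectories):
    """Single-step RMS displacement divided by the total RMS displacement."""
    steps = np.concatenate([np.diff(x, axis=0) for x in trajectories])
    points = np.concatenate(trajectories)
    # squared displacements are summed over dimensions, then averaged
    step_msd = (steps ** 2).sum(axis=1).mean()
    total_msd = ((points - points.mean(axis=0)) ** 2).sum(axis=1).mean()
    return float(np.sqrt(step_msd / total_msd))
```

A trajectory that drifts slowly across the landscape gives a small ratio, while reordering the same points at random increases it.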

Cell state clustering and prediction

The cell state dynamics were characterized by building a fine-grained, discretized transition matrix model in the continuous morphodynamical trajectory embedding space. The embedded space was binned using k-means clustering with \(k=100\) clusters. In the discrete space, a transition matrix between bins was estimated from the observed transition counts \({C}_{ij}\) from microbin \(i\) to microbin \(j\) as \({T}_{ij}={C}_{ij}/{C}_{i}\), with \({C}_{i}=\sum_{j}{C}_{ij}\). This type of transition matrix is commonly referred to as a “Markov Model” and is broadly used in the analysis of molecular systems46. Note that this transition matrix does not share a steady-state distribution with the cell populations, as cell birth and death states are not included. This transition matrix was used as a (highly simplified) model of the single-step trajectory likelihood.
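
Given bin assignments for consecutive trajectory points, the row-normalized transition matrix can be estimated as sketched below. The input format (an array of (i, j) bin-index pairs) and the convention of leaving rows with no observed transitions as zeros are assumptions of this example; the k-means binning step itself is not shown.

```python
import numpy as np

def transition_matrix(bin_pairs, n_bins):
    """Estimate T_ij = C_ij / C_i from observed (i, j) bin transitions."""
    pairs = np.asarray(bin_pairs, dtype=int)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (pairs[:, 0], pairs[:, 1]), 1)   # accumulate C_ij
    row_totals = counts.sum(axis=1, keepdims=True)     # C_i
    # rows with no outgoing transitions are left as zeros
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)
```

Every row with at least one observed transition sums to 1, so each such row can be read as a conditional probability distribution over the next bin.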

Trajectory likelihood

To assess the quality of the description of the cell state dynamics, we adopted a self-consistent measure of how likely a held-out set of single-cell trajectories was under a transition matrix model trained on a separate training set of trajectories. Data were split into a training set (5/6 image stacks per treatment) and a validation set (1/6 image stacks per treatment). The training set was used to train a transition matrix likelihood model, and the average log-likelihood per trajectory step was calculated from the validation set trajectories using the transition matrix as \(\langle L\rangle =\frac{1}{{N}_{2}}\sum_{{x}_{0}=i,{x}_{1}=j}\log ({T}_{ij})\), with \({N}_{2}\) the number of 2-step trajectories in the validation set (initial point \({x}_{0}\) mapped to bin \(i\) and next point \({x}_{1}\) mapped to bin \(j\)).
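
The average log-likelihood per step can be evaluated as sketched below, given a trained transition matrix and validation transitions expressed as (i, j) bin-index pairs. How transitions with zero estimated probability were handled is not stated in the text; the small `floor` value used here to avoid log(0) is an assumption of this example.

```python
import numpy as np

def mean_log_likelihood(T, bin_pairs, floor=1e-12):
    """Average of log(T_ij) over validation transitions (i, j).

    floor is an assumed lower bound on probabilities to avoid log(0).
    """
    pairs = np.asarray(bin_pairs, dtype=int)
    probs = np.asarray(T)[pairs[:, 0], pairs[:, 1]]
    return float(np.log(np.maximum(probs, floor)).mean())
```

Values closer to 0 indicate that the held-out transitions are more probable under the trained model.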

Cell metastable state extraction and grouping

To capture the broad relationships defined in the continuous trajectory embedding space, we defined a small set of discrete cell “states”. We first picked out all metastable locations on the landscape, identified as local maxima in the population density; see Fig. 5 and Supplementary Fig. 3. These fine-grained metastable cell states spanning the trajectory embedding landscape (19 total) were picked via density peaks in the individual treatments. To group these fine-grained cell states into broader state groupings, we first utilized an unsupervised kinetic clustering approach93, which separated three major metastable basins consistent with the regions of the landscape enriched in the EGF, TGFB + EGF, and OSM conditions (respectively the lower right, lower left, and upper regions; Fig. 2B, C). We then manually refined these regions to distinguish between clustered cell states (states 4, 5, and 6) differentially occupied across ligand treatments. These 6 qualitative cell states were used to define cell state transition networks (transition matrix; see Cell state clustering and prediction and Fig. 5a). Cells were mapped into these states by first finding the closest fine-grained metastable state in the embedding space and then assigning the state label accordingly. The morphodynamical trajectory embedding space indicates a continuum of cell states; thus, intermediate metastable states indicated by density peaks that are nearest neighbors but assigned to different states (such as E and F assigned to the epithelial-like state and G and H assigned to the mesenchymal-like state) may be very similar and not strictly distinct. Cell state names are descriptive for ease of interpretation but are not based upon validated biological interpretation.
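
The final mapping step, from embedded points to coarse states via the nearest fine-grained metastable location, can be sketched as a nearest-neighbor lookup. The peak coordinates and the peak-to-state table are inputs that would come from the density-peak and grouping steps described above; the names used here are hypothetical.

```python
import numpy as np

def assign_states(points, peak_locations, peak_to_state):
    """Label each embedded point with the state of its nearest density peak."""
    points = np.asarray(points, dtype=float)
    peaks = np.asarray(peak_locations, dtype=float)
    # distances from every point (rows) to every peak (columns)
    dist = np.linalg.norm(points[:, None, :] - peaks[None, :, :], axis=2)
    return np.asarray(peak_to_state)[dist.argmin(axis=1)]
```

Several peaks may share a state label, so the coarse states can cover irregularly shaped regions of the embedding.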

Statistics and reproducibility

Overlap and locality ratio results were validated by calculating them over 3 replicates of the data, obtained by splitting the data into 3 groups, each composed of 2 image stacks for each treatment. Means over the replicates and the individual replicate data points are plotted to allow visual estimation of the robustness of the analysis. Error bars and 95% confidence intervals are estimated from the data splits by Bayesian bootstrapping94.
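
A Bayesian bootstrap of a mean over replicate values can be sketched by drawing Dirichlet-distributed weights over the replicates. This is a generic illustration of the technique; the number of draws, the seed, and the use of percentile bounds for the 95% interval are assumptions of this example rather than details taken from the analysis.

```python
import numpy as np

def bayesian_bootstrap_mean(values, n_draws=10000, seed=0):
    """Posterior mean and 95% interval of the mean via Dirichlet weights."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # each draw is a set of weights over the replicates that sums to 1
    weights = rng.dirichlet(np.ones(len(values)), size=n_draws)
    means = weights @ values
    low, high = np.percentile(means, [2.5, 97.5])
    return float(means.mean()), (float(low), float(high))
```

With only a few replicates the interval is necessarily bounded by the smallest and largest replicate values, since each draw is a weighted average of them.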