Unlabelled landmark matching via Bayesian data selection, and application to cell matching across imaging modalities

We consider the problem of landmark matching between two unlabelled point sets, in particular where the number of points in each cloud may differ, and where points in each cloud may not have a corresponding match. We invoke a Bayesian framework to identify the transformation of coordinates that maps one cloud to the other, alongside correspondence of the points. This problem necessitates a novel methodology for Bayesian data selection, simultaneous inference of model parameters, and selection of the data which leads to the best fit of the model to the majority of the data. We apply this to a problem in developmental biology where the landmarks correspond to segmented cell centres, where potential death or division of cells can lead to discrepancies between the point-sets from each image. We validate the efficacy of our approach using in silico tests and a microinjected fluorescent marker experiment. Subsequently we apply our approach to the matching of cells between real time imaging and immunostaining experiments, facilitating the combination of single-cell data between imaging modalities. Furthermore our approach to Bayesian data selection is broadly applicable across data science, and has the potential to change the way we think about fitting models to data.


Introduction
Understanding early mammalian development is key to the advancement of in vitro fertilisation (IVF) techniques and improved understanding of early cell specification within mammals.Within developmental biology there have been significant advances in experimental techniques, including the ability to culture preimplantation embryos ex vivo and monitor their development through periodic 3D imaging, known as real time imaging (RTI) [1,2].In conjunction with the generation of mouse reporter lines, such as the H2b:GFP line, we are able to visualise the development of the murine embryo and monitor the behaviour of individual cells [3].One of the highly disputed questions regarding the development of the preimplantation embryo, is the effect of cell history and changes in embryo architecture on cell lineage specification [4][5][6].
After RTI experiments, embryos can be fixed to halt their development and stained for proteins of interest via immunostaining.The cells' respective fates can then be inferred from their protein expression profiles.In order to interrogate the relationship between cell history and cell specification it is crucial to link historical information (gained from RTI experiments) with protein expression (quantified from immunostaining) at the single cell level.However, the cell-to-cell matching across these two imaging modalities is non-trivial due to the random re-orientation of the embryo during staining and potential deformation during the fixation process.Coordinates of cell centres can be extracted from the final frame of the RTI experiment, using the H2b:GFP signal, and from the immunostained image, using the nuclear stain, allowing us to frame this problem as an unlabelled landmark matching problem where the landmarks are cell positions in three dimensions.
It is most common for landmark-based registration to be approached using variational techniques [7,8], although some probabilistic approaches have also been developed [9,10].A commonly used approach to match landmarks subject to some non-rigid transformation is the large deformation diffeomorphic metric matching (LDDMM) method.This method uses a curve to describe diffeomorphic mapping of individual landmarks between the target and template point clouds [11,12].A Bayesian approach of shape matching via a non-linear deformation is presented in [13], where the geodesic map which takes one shape to the other is inferred.
In this work we invoke the Bayesian framework in order to be able to not only find likely cell matchings, but also to quantify the uncertainty in those matchings.Our biological example has an additional difficulties, since the landmarks are unlabelled, and the assumption that all landmarks exist in both point-sets does not hold.This discrepancy in landmarks can occur due to cell death or division between the time that the RTI experiment was stopped and fixation, or due to inaccurate segmentation of cell centres.One approach would be to manually clean the data and select only cells with guaranteed matches in the corresponding image, however this is highly subjective with potential for significant errors as we do not know a priori which cells to eliminate from the registration.
There has been some work on data selection with regards to single and multi-source data acquisition [14], however there is little Bayesian description of data selection, although there are similarities with Bayesian model selection [15] and outlier detection [16].In this work, we introduce a novel approach to Bayesian data selection, which limits the effect that cells which do not appear in both images have on the inference.This is implemented through the introduction of parameters which describe our belief in the fidelity of each observation in the data.The values of these fidelity parameters are jointly inferred alongside the model parameters describing the transformation and correspondence of the landmarks.
We implement Markov chain Monte Carlo (MCMC) methods to explore the complex distribution on the model and fidelity parameters.The posterior is frequently highly multimodal, preventing complete exploration of the parameter state space due to 'trapping' in local minima.We therefore implement tempering of the likelihood, alongside adaptive MCMC [17] to optimise our sampling and minimise trapping.
Although the introduction of data selection is primarily introduced to facilitate landmark registration within our specific biological example, it is clear that this framework could be expanded to a very broad range of inferential problems, with potential for wide-ranging impact in many applications of data science.
In Section 2 we introduce the transformation model, including the description of a 3D affine transformation and a non-linear deformation, and a method of describing landmark correspondence within the model.In Section 3 we describe the construction of the posterior distribution that we wish to characterise.In Section 4 we introduce the concept of Bayesian data selection and its implementation within the landmark registration problem.We then go on to describe the MCMC implementation in Section 5.In Section 6 we firstly present several in silico test problems demonstrating the efficacy of the approach.We then perform inference on embryos with microinjected fluorescent cells which enable us to identify a subset of the cells in both images for validation on a real data set.We then demonstrate the accuracy of our approach on a problem in which we wish to match cells from the final frame of an RTI experiment with corresponding immunostained images, with the additional challenge of embryo matching.We conclude with a discussion in Section 7.

Landmark matching
In order to better understand mammalian development, spatiotemporal information from RTI experiments must be linked with protein expression which is inferred from secondary immunostaining images.In order to link this data, cell centres are extracted from the final frame of the RTI study and the immunostained image and matched, Figure 1.Previously, attempts have been made to manually match the cells between images, however this is non-trivial due to the manipulation of the embryos during staining and can lead to low confidence matchings of cells between images.immunostaining of stained embryos matching via model We can generalise this problem to the matching of two unlabelled point clouds: where d ∈ N is the dimension of the observation space, and the number of points in Y 1 and Y 2 is n 1 , n 2 respectively where we assume n 1 ≤ n 2 .In the context of cell matching, potential differences in n 1 and n 2 may arise from cell death or division after the completion of the RTI prior to fixation of the embryos, or due to segmentation errors.Y 1 and Y 2 are pre-processed such that the average coordinate in each cloud is shifted to (0, 0, 0), and re-scaled through division by the minimum cell-to-cell Euclidean distance.
The two point clouds can be considered to be noisily transformed versions of each other, with labels subject to a random permutation, along with the potential addition or subtraction of points in both clouds.The transformation of Y 2 to Y 1 can be described by the composition of an affine transformation, a non-linear deformation, and a permutation of labels, described by an observation operator G(θ; Y 2 ) with parameters θ.

Non-linear deformations via geodesic motion
Deformation can occur due to continued growth of the embryo prior to fixation, or manipulation of the embryo during immunostaining, necessitating an explicit description of a non-linear deformation within G.The non-linear deformation to the point-set is modelled as a geodesic transformation resulting from the application of an initial momentum, p i ∈ R d to each point q i ∈ R d [18,19].The deformed points, φ(θ; Y 2 ) = q(1), are then obtained through the ODE given by: over the time interval t = [0, 1], where q(0) = Y 2 , details given in Section S1.5.
The application of the geodesic flow is computationally expensive due to the solving of 3n differential equations.We envisage that for smaller embryos, deformation is minimal, in which case we set φ to be the identity.However for larger embryos, it may not be possible to accurately match cells without the addition of inference of a non-linear transformation between Y 1 and Y 2 .

Affine transformation
G also incorporates a three dimensional affine transformation to account for shear scaling, rotation and translation of points.The transformation is applied to φ(θ; Y 2 ) to give where 1 n 2 ∈ R n 2 is a column vector of ones, A(θ) is the transformation matrix and b(θ) the translation vector, see Section S1.6.

Permutation of labels
Our overall aim is to find the labelling of points in order to match cells across images.We introduce a permutation vector as a method of describing the matching of cells across Y 1 and Y 2 .The permutation vector P ∈ N n 2 describes the ordering of cells in Y 2 in order to match them with cells in Y 1 .Note that in the case that n 1 < n 2 , the cell numbers in the n 2 − n 1 last entries of the permutation vector are assumed not to have a corresponding match in Y 1 , and as such are not required for the calculation of the likelihood.
Our aim is to compare the positions of points in Y 1 with their corresponding matches, as given by P , in the transformed cell centres in Y 2 .As such, we define the matrix M P ∈ R

The observation operator
We define our observation operator G : Θ → R 3×n 1 , which takes the cell center coordinates of Y 2 , applies a non-linear transformation (if being applied), an affine transformation, and then reorders the subset of the cells which we aim to match to Y 1 according to the permutation vector P .Therefore we arrive at

Bayesian cell matching
Bayes' theorem is a fundamental property of sets and measures that forms the basis of a probabilistic framework for inverse problems, involving the combination of prior knowledge, observations, and models.Our aim is to characterise the posterior probability density π(θ|Y ) on the model parameters θ conditioned on the data Y , which by Bayes' theorem is given by: where L(Y |θ) is the likelihood of the observations given θ, and π 0 (θ) is the prior density.
The matching of cells between images can be considered as an inverse problem where we wish to identify a transformation of Y 2 in order to identify the correct matching, P , of the cells.The inverse problem of cell matching is complex with potentially correlated parameters across the components of the model, leading to potentially multimodal posterior distributions.

The likelihood
We assume that the observations of the cell centers are subject to mean-zero i.i.d.Gaussian noise, such that: where Y 1 i is the ith column of Y 1 , and G i (θ) is the ith column of G(θ).Therefore the likelihood is given by: where x is the covariance-weighted norm.

Priors
We choose mean zero priors on the affine transformation parameters and deformation momenta as shown in Table 1, and a uniform prior on label permutations.

Parameter
Prior Parameters

Hierarchical Bayes posterior
The noise covariance Σ is unknown a priori and so we use a hierarchical Bayes approach to infer its value alongside the model parameters.We choose the Inverse-Wishart distribution as a prior on Σ which is conjugate to the Gaussian likelihood, enabling marginalisation of Σ [20,21].This distribution has two parameters, the degrees of freedom ν > d−1, and the positive definite symmetric scale matrix Ψ ∈ R d×d .The Inverse-Wishart distribution has a mean given by when ν > d + 1, and variance of the diagonal terms given by when ν > d + 3. We choose ν and Ψ ∝ I 3 to achieve E(Σ) = 0.01I 3 and Var(Σ i,i ) = 0.2 2 , giving us ν = 6.0050 and Ψ = 0.0201I 3 .
The posterior given by: can be marginalised with respect to Σ to give the target density: 4 Bayesian Data Selection The observation operator G describes the transformation and permutation of points in Y 2 to match Y 1 , but assumes that all cells in Y 1 have a corresponding match in G(θ; Y 2 ).This assumption does not always hold, since cells can divide or undergo apoptosis in between the RTI experiment and fixation, or may be too faint for accurate segmentation, resulting in the presence of cells within one or both of the point clouds with no corresponding match.We cannot know which cells do not have a match a priori, and therefore we aim to infer this information, thereby conducting what we will refer to as Bayesian data selection.This refers to any approach where additional parameters are introduced into the inference which dictate the sensitivity of the posterior to a given observation, where the values of these parameters are themselves inferred from data, jointly with the model parameters.

Data fidelity
We aim to infer the value of parameters that represent our belief in the fidelity of each observation along with model parameters.We introduce fidelity parameters γ i ∈ [0, 1] for each observation (in our case a cell center from Y 1 ), that controls the relative contribution of that observation to the likelihood.These γ i are effectively inverse annealing temperatures for each observation, with high temperatures (where γ i 1) resulting in a likelihood which is not sensitive to the data-model mismatch for this observation.This approach limits the effect on the posterior of spurious data, for instance in the case where a cell has no corresponding match.
The likelihood function is ordinarily a function of the mismatch between each observation Y i , and the observation operator at a given value of the model parameters G i (θ), such that In ordinary Bayesian inference the likelihood is sensitive to all of the data-model mismatches Y i − G i (θ), which causes issues when the data is corrupted, or where the model does not adequately describe the entirety of the data.We now aim to infer which of these data can be well-matched to the model through a likelihood which takes into account the fidelity of each observation, given by: In the context of Bayesian unlabelled landmark matching, for each point in Y 1 the fidelity parameter represents our belief that this cell has a match in Y 2 .A value of γ i = 0 corresponds to a likelihood which is independent of the data-model mismatch of the i th observation, and γ i = 1 corresponds to a likelihood which is dependent on the i th cell's mismatch as in equation (12).
The inclusion of the fidelity parameters works to prevent the fitting of a model to an entire set of points for which a subset may not be adequately described by that model.Without appropriate data selection in the landmark matching problem, there are no guarantees that the transformation and permutation that leads to the lowest overall least squares fit corresponds with the correct matching.

Application to cell matching
We choose a beta prior B(α, β) on each γ i , with α = 2 and β = 2 such that E(γ i ) = 0.5 and Var(γ i ) = 0.05 and consider the data fidelity posterior distribution density which arises from multiplying each data-model mismatch by its fidelity parameter γ i : where as shown in Section S1.7.

MCMC methodology
The posterior distribution is a highly complex multimodal distribution on a high dimensional space, involving a mixture of continuous and discrete variables.In order to generate samples from the posterior distribution, we implement MCMC, a commonly used approach to sample from complex probability distributions.As our model and data selection approach is inherently modular (transformation, permutation and fidelity modules), we use a Metropolis-within-Gibbs approach [22], which enables us to tune the proposal variances adaptively for each of the modules involving continuous random variables.

Multimodality and tempering
We assume that the parameter state space is dominated by a single best-fit mode, corresponding to the correct matching of points.However, the state-space is likely to be multimodal and difficult to sample from due to its complexity and the level of correlation between components of the model.
To facilitate better exploration of the parameter space and avoid trapping in local minima, we implement likelihood-tempering, as described in [23].During early iterations improved mixing is promoted through a high temperature T , with adapted Metropolis-Hastings formula given by: where Φ is the negative log of the posterior predictive, and Q(•, •) is the proposal kernel.The temperature T > 0 is gradually reduced via an exponential cooling schedule.Selection of the start temperature T 0 and the cooling rate of the system is crucial to the successful and efficient identification of the dominant mode, details given in S1.8.Once T = 1, the temperature is fixed and subsequent samples from the posterior recorded.

Proposals on continuous random variables
A subset of the parameters we wish to infer are defined on bounded state spaces.In order to facilitate efficient proposals on these parameters we transform them onto an unbounded domain.
The proposals are then mapped back to the bounded parameters, and the adjustment to the proposal density included in the calculation of the acceptance ratio, details given in S1.9.
Random walk proposals are used for the continuous random variables, given by where β is the step-size of the proposed move, and tuned so that we achieved the optimum 23.4% acceptance rate within each Gibbs module [24].β is also adjusted during temperature reductions by a factor (1 − √ t c ), where t c is the cooling rate of the tempering schedule, to account for changes in the target distribution density.β is initialised with a value 2.38 2 np where n p is the number of parameters being sampled on within that Gibbs iteration.
The proposal covariance C is initialised as a diagonal matrix of the prior variances.However, to facilitate better sampling on θ at T = 1, we implement adaptive MCMC where we use the sample covariance matrix of accepted samples to construct efficient proposals on the model parameters, details given in S1.10.

Proposals on the permutation vector
MCMC techniques are predominantly designed for continuous problems, rather than for discrete problems such as permutation sampling [25].In order to explore different permutation vectors, we use a proposal distribution that is uniform on a set of permutations which are one switch of labels different from the current state.When at an initial permutation vector P we propose the swapping of two cell labels i = j to generate P .This proposal is uninformed and symmetric about P , therefore giving the same acceptance probability as a standard random-walk on continuous random variables.

Interpretation of results
To interpret the results of our sampling on the permutation vector, we record the number of times each cell in Y 1 matches with each cell in Y 2 during sampling at T = 1.The number of matches is recorded using a matrix M ∈ R n 1 ×n 2 .The matrix is then normalised so that the entries represent the proportion of samples in each matching, which can be visualised using probability heatmaps.
In order to calculate the most likely matching (MLM) of the cells in Y 1 , we solve the linear assignment problem where A i,j = 1 − M i,j , using the matchpairs MATLAB function [26].From this we can describe the MLM of a given chain, and then compare this to the ground truth permutation vector for the in silico tests.In tests using real data where the true matching is unknown, this MLM would be representative of the inferred matching of points for subsequent analyses.
To assess the accuracy of the spatial matching of the points, we evaluate and stored thinned samples of the cell-to-match distances given by during sampling at T = 1.These values are then used to evaluate the median and root-meansquared-error (RMSE) of the cell-to-match distances for each chain, thereby giving an indication of the spatial quality of the matchings.
To allow us to visualise the quality of the MLM, and compare fidelity parameters of cells, we also calculate the MAP estimates on the transformation and fidelity parameters, conditioned on the MLM, details given in S1.11.

Results
We first constructed several in silico tests which were designed to mimic real cell matching problems.
The in silico test problems used real cell centre coordinates segmented from images of fixed embryos for Y 2 .We chose to use embryos from four key stages within the mammalian preimplantation period with; 8, 15, 33 and 62 cells respectively, see S1.1-S1.3 for details.Y 1 was then generated by applying the observation operator with known values of the model parameters to Y 2 , and adding i.i.d.mean zero Gaussian noise.The permutation was chosen to be the identity to make it simpler to visualise a correct matching.
All test problems were evaluated through 8 independent Markov chains, on a machine with specification outlined in S1.12.Initial positions of chains were randomly chosen as draws from the parameter priors, and a random initial permutation vector chosen.A minimum of 7 × 10 6 tempered samples were performed (unless stated otherwise) and a further 10 6 samples at T = 1, where thinned chains were used to characterise the posterior.The average acceptance ratio ᾱ was evaluated every 2000 iterations, and the proposal variances adjusted accordingly to ensure efficient sampling.

In silico cell matching
For the first test, we generated problems where a known random affine transformation was applied to the original Y 2 coordinates in order to generate Y 1 , parameters given in Table S1.Additive noise of the form N (0, 0.01 2 I 3 ) was then added to each point.
We performed sampling on the affine transformation parameters, the permutation vector and fidelity parameters but disregarded non-linear deformation.All chains for the 8-, 15-, and 33-cell tests converged to a posterior distribution highly concentrated on the correct matching of points as can be seen in the example permutation probability heatmap in Figure 2a.For the 62-cell test, 7/8 chains also appeared to converge to posterior distributions highly concentrated on the correct matching of points.The outlier chain, which likely got trapped in a local minima, had an MLM with 60 incorrect matches.The incorrect matching was clearly identifiable through the median cellto-match distances (1.077 versus an average of 0.0189 for successful chains).This indicated that harder problems with more cells may require slower tempering due to a more complex state space.ᾱP was close to zero during sampling at T = 1 for all successful chains, due to the chains likely being within the mode containing the global minimum whereby any proposed move in the permutation vector was unlikely to be accepted.

Data selection in presence of non-corresponding cells
The assumption that every cell in Y 1 and Y 2 has a corresponding match does not always hold, as discussed in Section 4, motivating the introduction of fidelity parameters to facilitate the selection of data within the point sets.If there is sufficient evidence that a match can not be described by the current model, the fidelity parameter posterior will have small mean, dramatically reducing the impact of that observation on the likelihood.
To investigate the effectiveness of Bayesian data selection in an in silico setting, we simulated two test problems based on the 33-and 62-cell embryos.As before, we applied a random affine transformation, parameter values given in Table S1, and added noise of the form N (0, 0.  without corresponding matches.We first generated two problems where n r = 3, and 6 for the 33and 62-cell data sets respectively.We chose to model these two stages as cells divide asynchronously at this stage in development, making the presence of points without associated matches more likely.
For now we neglect the non-linear deformation.
All chains for the 33-cell tests converged to distributions which were highly concentrated on the correct permutation vector, with reductions in the final n r fidelity parameter posterior distributions, Figure 5a.The MLM identified was the expected permutation vector for the first 1 → (n 1 − n r ) cells in Y 1 , with the final n r cells matching to cells 1 → n r in Y 2 as expected.
We then compared these results with examples where we did not include data selection.All 8 chains in the 33-cell example were concentrated about an MLM with 2 incorrect matches, Figure 5b.
We compared the median and RMSE cell-to-match distances with and without data selection, see Tables 2, S2 and S3.It was evident that at a small cost to the RMSE, we were able to reduce the median cell-to-match distance, thereby facilitating a better, more accurate matching for the majority of cells with definitive matches, as can be seen in Figure 5c-d Larger problems with more densely packed points could result in an increased number of incorrect matchings, as we found in the 62-cell example with n r = 6.There were between 8 and 50 incorrect matches in the MLM when data selection was not included, and the distribution appeared less  concentrated on the correct permutation vector in all chains, Figure S2c and Tables S2 and S3.
This variability in the number of errors is indicative of a posterior that is much more difficult to explore, leading to local trapping.When we did include data selection, we were able to retrieve an MLM equal to the true matching in all chains.In this instance, Bayesian data selection helped us not only identify suitable data to be registered between Y 1 and Y 2 , but also to smooth the posterior making it easier to explore.
In the 62-cell test problem we observed an increase in the RMSE of cell-to-match distances when data selection was included, but improvement in the median cell-to-match distance, indicative of an improved matching of the majority of cells, see tables S2 and S3.We also noted that the cells with reduced posterior means of their fidelity parameters (the final n r cells in Y 1 ) displayed non-committal matching; matches with low fidelity could be freely exchanged, Figure S2a-b.We conducted a test with larger values for n r with even more stark differences in success, see S1.13.

Non-linear deformations
We next sought to incorporate non-linear deformation within the data.We generated a test problem based on the 33-cell data set where we assigned non-zero momenta, drawn from the prior, to 18 points where x < 0, and deformed explicitly through equations 2a and 2b.The points were then subject to an affine transformation, all parameters given in Table S1.Noise of the form N (0, 0.01 2 I 3 ) was then added.We designed four tests covering all combinations of inclusion of deformation in the observation operator and/or data selection.
When neglecting non-linear deformation and data selection, referred to as test a), we found that all chains had the same MLM with two incorrect matches, see table S4.Although the number of errors in this particular example is low, when we trialled another test problem with the initial momenta scaled by a factor of 1.1, we found four unique MLMs with up to 33 incorrect matches.
Without data selection and the inclusion of the non-linear deformation, even small increases in problem difficulty can lead to large numbers of incorrect matches.
Next we included non-linear deformation within G, but neglected data selection, test b).The 8 chains converged to different regions, with 8 unique MLMs and between 0 and 19 incorrect matches.
The posterior here is higher dimensional and more complex, leading to poor mixing.Interestingly, when we compared the minimum negative log of the posterior density after optimisation conditioned on the MLM, we found that the chain with the deepest mode actually corresponded to a chain with 4 incorrect matches not the chain with all correct matches, see table S4.Mirroring this, the median and RMSE cell-to-match distance was in fact larger for the chain with 0 incorrect matches when compared to the chain with 4 incorrect matches, see table S4, suggesting this was a fluke, and would actually lead to selection of the incorrect matching.Without the fidelity parameters to assist in interpretation of the results, the identification of good matches is ambiguous.
We then included data selection and non-linear deformation within G, test c) and observed similar results as in test b), with an obviously multi-modal, difficult to sample from posterior distribution, resulting in the identification of 8 unique MLMs.The MLMs had between 2 and 32 incorrect matches, see table S4, again indicating poor mixing of the chains.This test had the additional difficulty that the prior on the momenta must be carefully balanced with the prior on the fidelity parameters.
We are most interested in the identification of cell matchings where we are confident in the identified matching, i.e. not necessarily identifying all cells' matches.We therefore neglected nonlinear deformation but included data selection, test d).We found that the 8 chains identified 2 unique MLMs with either 5 or 6 incorrect matches.The cells with incorrect matches were associated with reduced fidelity parameter posterior means (< 0.15) and corresponded to cells which were explicitly deformed in the generation of the test problem.We identified consistent matching for cells with MAP estimates of fidelity parameters (conditioned on the MLM) greater than 0.5, see Figure 6, which corresponded with cells from the undeformed region.Two cells that were not deformed explicitly did have reduced posterior means of their fidelity parameters, but this is due to the interaction of points and their mutual repulsion via σ K in equations 2a and 2b.We compared the median and RMSE cell-to-match distances for test d) with the previous tests and found that they were higher.However, when we considered only the undeformed cells, we found that the median cell-to-match distances were greatly reduced, indicating a successful matching of this subset of undeformed cells, see table S4.We also trialled the more difficult test where the initial momentum was scaled by a factor of 1.1, and sampled only on the affine transformation, permutation vector and fidelity parameters.The 8 chains identified one unique MLM, and all cells that were subject to an initial deformation had low posterior means of their fidelity parameters indicating the successful reduction of their contribution to the likelihood.As for the previous example, we observed reduction of the median cell-to-match distance for the undeformed cells, with correct matchings, again suggesting a good matching for the subset of undeformed cells.
A final key point regarding the benefit of including data selection rather than complex non-linear deformation models is the significant improvement in run-time, due to the cost of solving the ODEs (2a)-(2b).Tests b) and c) that included the deformation took approximately 28 hours to run, and suffered from slow mixing due to additional dimensionality and complexity of the posterior.On the other hand, test d) took approximately 30 minutes and converged to consistent MLMs therefore making it a far more feasible approach to match subsets of cells accurately within reasonable time frames.

Validation of cell matching for fixed embryos using reference markers
Next we devised a simple biological test problem where we introduced reference markers within the embryo via microinjection.We collected embryos at the 8-cell stage and then microinjected a single cell with H2b-mCherry, a fluorescent protein.Embryos were then subject to 24 hours ex vivo culture and then fixed and stained with Hoechst to facilitate nuclear segmentation.See Sections S1.1-S1.3,S1.14, [27] and [28] for full protocols.
We selected one embryo where four mCherry positive cells were identified and used as reference markers.The embryo was imaged and then moved randomly using a pipette before a second image of the embryo was taken, Figure 7a-b.Cell centres were approximated through segmentation of the nuclei in both images, see Section S1.3.
We performed inference including data selection but neglecting non-linear deformations, and initiated 8 chains randomly using draws from the priors.A minimum of N = 10 7 tempered iterations were conducted, and a further N 1 = 10 6 iterations at T = 1.
All eight chains were found to have the same MLM and had good spatial matching between the two point sets with an average median cell-to-match distance equal to 0.0403 units across the 8 chains, Figure 7c.We noticed that 6 cells had reduced fidelity posterior means in this example, Figure 7d, but not so low as to indicate a poor overall matching.
Previously, in our in silico tests we knew the ground truth permutation of the points and were able to construct heatmaps which approximated a diagonal matrix.In real data tests we no longer know this ordering a priori for the purposes of visualisation.Therefore the ordering of cells in the permutation heatmap was arranged such that the cells in Y 1 were ordered according to the MAP estimates of the fidelity parameters (conditioned on the MLM), and then the order of Y 2 changed according to the maximum match probability for each cell in Y 1 .The resulting heatmap was a diagonal matrix and we were able to show that the reference cells in Y 1 corresponded to the reference cells in Y 2 , Figure S4.

Matching of cells and embryos across imaging modalities
Finally, we wanted to trial matching cells between the final frame of a RTI experiment and an immunostained image.H2b:GFP embryos were chosen to facilitate the segmentation of cell centres from the movie, and were subject to ex vivo culture.Prior to removal of the embryos from the confocal microscope, they were imaged a final time using a z resolution of 1µm to increase the accuracy of the extracted cell centres.Embryos were then fixed to halt development and stained using Hoechst to enable visualisation of the nuclei for segmentation.Details of experimental protocol given in Section S1.1-S1.4.
We chose a group of four embryos (embryos 1-4) that were co-cultured and successfully stained (embryos A-D).Due to the co-culture of the embryos, the embryo matching was unknown a priori, We noticed that for each of the embryo combinations, one of the embryo pairings would typically have a more concentrated posterior distribution about some permutation vector, whereas the remaining three embryo combinations would show more disperse distributions, with a larger variability in the permutation vector, see Figure S6.We also noticed that the 8 chains would converge to the same MLM when the optimum embryo matching was identified for embryos 1, 2 and 4 but would otherwise identify at least 3 unique MLMs, see table 3. Table 3: Number of unique MLMs identified for each embryo combination, out of 8 chains.
As before, we recorded the cell-to-match distances during sampling at T = 1 and then compared the median and RMSE distances from the chain that converged to the minimum negative log of the posterior for each embryo pairing, see table 4. By considering the median cell-to-match distance, we were able to clearly identify three embryo matchings; embryo 1, 2 and 4 with A, B and C respectively.
Interestingly, these embryo pairings did not always correspond to the lowest RMSE distances.For instance, the pairing of embryo 1 with embryo A had the third largest RMSE distance, despite having the overall minimum median cell-to-match distance.On closer inspection, this was due to the reduction of several fidelity parameters.We noticed that cells 5 and 19 in Y 1 had significantly reduced posterior means of the corresponding fidelity parameters and were matched to cells 14 and 31 in Y 2 causing the increase in the RMSE, see Figure S5.These cells were either deep within the embryo or in the extremes of the z axis, both regions where segmentation can be prone to errors.
The identification and description of these points via the fidelity parameters, could help us go back and re-segment these regions more accurately, something that without data selection we would not be able to do.4: Median and RMSE cell-to-match distances for each embryo combination, given in arbitrary units corresponding to the chain that converged to the minimum negative log of the posterior density.
Bold entries correspond to high confidence embryo matchings.
By deduction we can say that embryo 3 should match with embryo D, however this was not as clear through assessment of the median cell-to-match distance or permutation heatmaps as for the other embryos.The 8 chains converged to three unique MLMs, again increasing our uncertainty in the identified matchings.The permutation heatmaps were typically more diffuse for all embryo combinations, suggesting that we were unable to identify a single global minimum indicative of the true cell matching, see Figure S7.The inaccuracy of the matching between embryo 3 and D could potentially be due to some increased deformation during fixation or inaccuracies during segmentation of the cell centres.This highlights the requirement for good quality data when matching the points.
For comparison, we performed the cell matching for each of the well-identified embryo matches (embryo pairs 1A, 2B and 4C) without data selection.All 8 chains for embryo pairings 2A and 4C identified the same MLM as with data selection, as we would hope high quality data.However, when we tried to match embryo 1 with embryo A without data selection, we identified MLMs with large numbers of differences when compared to the MLM identified previously with data selection.
One chain had 7 differences, 6 of the chains had 11 differences and the final chain had 39 differences, indicating a completely different identified MLM.This highlights our need to include the data selection framework, to ensure the accurate matching where there are cells without a corresponding match.Furthermore, the inclusion of data selection facilitates further inference and interpretation of the confidence of the matches presented within the MLM, and enables better mixing of the Markov chains due to its smoothing properties.

Discussion
In this work we presented a solution to an unlabelled landmark registration problem, by introducing a novel Bayesian data selection approach to account for non-corresponding cells.We included non-linear deformation, 3D affine transformation and description of the matching of cells via a permutation matrix within the registration model.By using MCMC and tempering of the likelihood, we were able to explore the complex, multimodal posterior and identify most likely matchings of two point-sets.To demonstrate the efficacy of the approach, we constructed a series of in silico problems, and used real data from biological imaging experiments.We were able to determine the matching of cells between the final frame of a RTI experiment and corresponding immunostained images, even when the embryo correspondence was originally unknown due to co-culture of the embryos.
Our development of an approach to match single cells between imaging modalities enables the combination of historical cell data extracted from RTI studies, with protein expression at the single cell level.Previously this has been approached manually, resulting in potentially subjective conclusions relating cell behaviour and protein expression.By enabling this joint assessment of spatio-temporal information at the single cell level using our approach, we can begin to investigate the importance of cell history during cell lineage specification within the mammalian preimplantation period.
Existing landmark registration approaches are predominantly framed as optimisation problems, and therefore provide no measure of uncertainty in the identified matching of points [8].Some of these approaches also rely on some partial labelling of matches and additional information relating the points such as the properties of the landmarks [8][9][10].In contrast our approach is based solely on the geometrical coordinates of the landmarks.
The development of the data selection aspect of this approach was crucial to the accurate registration in the real-world problem due to the presence of cells without corresponding matches in either image.We demonstrated that without the incorporation of the data selection framework, the accuracy of identified cell matchings was reduced, especially in larger embryos where the number of cells without corresponding matches was potentially increased.We also demonstrated that the inclusion of data selection facilitated better mixing of the MCMC chains by reducing the roughness of the state space, thus improving chain convergence and improving the robustness of the approach.
More sophisticated MCMC methods that are known to be more efficient in multimodal targets, such as parallel tempering, could be used to further improve mixing and reduce computational complexity.
Choosing conjugate priors for the fidelity terms could also reduce the dimensionality of the problem, and further improve mixing.
The idea of Bayesian data selection, in which parameters which govern the effect of an observation on the posterior are inferred alongside the model parameters, is extremely general, with great potential to be applicable to a very broad class of inferential problems in statistics and data science.
Data cleaning is a subjective and laborious task which is often undertaken by hand, the results of which can have a profound impact on the outputs of the inference, and this approach automates that process in a way which is consistent and free from user-bias.In future work we plan to explore these ideas in more depth, and apply them to a range of disparate application areas.
which is uniquely determined by the initial momenta p j t at t = 0 at each landmark with coordinate q j .Here V is a reproducing kernel Hilbert space [19] with norm • V , and with kernel K V , which we assume to be Gaussian: The geodesic deformation is then given by the solution of the following differential equations over the time interval [0, 1], where q j and p j are the initial position and momentum of the j th cell respectively in three dimensions.We define u t at q j t as where σ K describes the variance of the kernel.As the data is pre-processed to ensure a minimum cell-to-cell distance of one, we found that σ K = 1 was a sensible value to use.Equations S1a and S1b can be re-written as The deformation is applied to q = Y 2 , the original positions of the cell points prior to deformation, through p and equations S3a and S3b solved over t = [0, 1] to give φ(θ; Y 2 ), the deformed Y 2 coordinates at time t = 1.

S1.6 Affine transformation in three dimensions
To apply an affine transformation to three dimensional points we first describe the matrix A(θ) as a combination of two rotation matrices R 1 (θ), R 2 (θ), and a scaling matrix S(θ) where A(θ) = R 1 (θ)S(θ)R 2 (θ), α i and β j are angles of rotation, and s k are the scaling coefficients.
The translation of points is applied through where 1 n ∈ R n 2 is a column vector of ones.The affine transformation is applied as where φ(θ; Y 2 ) = Y 2 when no non-linear behaviour is included within the model.

S1.7 Calculating the Γ-dependent normalisation to the posterior density
The normalisation factor of the posterior distribution is no longer constant when we include data selection, instead it is dependent on Γ as We can directly calculate the Γ-dependent normalisation by considering the substitution Y = XΓ, , where D J is the Jacobian of the transformation from X to Y , given by and the absolute value of the determinant given by From this we can re-write dX and write Z(Γ) as a combination of a Γ dependent function multiplied by some constant By dropping the constant terms in Z(Γ) and retaining only the factor dependent on Γ, we re-write the posterior as The resulting proposal is no longer a random walk, and the proposals are no longer symmetric.The ratio of the proposal densities as required in the Metropolis-Hastings formula is given by: where n p is the number of parameters being sampled on.

S1.10 Adaptive MCMC
When sampling at T=1, we switch to an adaptive sampling approach to increase efficiency of our sampling.The sample covariance is calculated on-the-fly to avoid the need for storing all accepted parameter values using The maximum number of iterations and evaluations of the function were set to 10 6 .These values of θ and γ are then used to generate spatial matching figures and displayed alongside permutation heatmaps.

S1.13 Fidelity parameter test with higher n r
To further investigate the effect of the fidelity parameters (in addition to tests in Section 6.2), we designed a more challenging test.We instead removed n r = 6 and 12 cells from Y 1 and Y 2 for the 33-and 62-cell examples respectively.
When data selection was included, all chains sampled from distributions highly concentrated about the correct permutation vector for cells 1 → (n 1 − n r ).The final n r cells has reduced posterior means of the fidelity parameters and exhibited non-committal matching in the permutation probability heatmap, Figure S3a-b.
When we attempted to identify the matching of the points without data selection, matching success was reduced with increased numbers of incorrect matches in the MLMs, see Figure S3c-d, and increased median cell-to-match distances, see tables S2 and S3.

S1.14 Embryo microinjection
Embryo microinjection was performed using the protocols outlined in [27].For a more detailed description of methods please refer to the full materials and methods within [27] and [28].Reference markers inside embryos were generated through microinjection of H2b-mCherry mRNA (1µg/µl) into single blastomeres of 8-cell stage embryos (collected from oviducts via flushing).Embryos were placed in drops of M2 in the deepest point of a glass-depression slide (Fisher), mounted within the micro-injector microscope stage (Leica).Drops of M2 were then covered with mineral oil to prevent evaporation.
Embryos were then washed in KSOM and cultured in 35-mm glass bottom culture dishes for 24 hours at 37.5 • C and 5% CO 2 .

(
;  2 ) a) Real time imaging b) Fixation and c) Confocal imaging d) Spot detection and cell (RTI)

Figure 2 Figure 3 :
Figure 2: a) Example normalised probability heatmap of matches for the 15-cell problem with corresponding MAP estimates (conditioned on MLM) on γ. b) Corresponding spatial matching of Y 1 and G(θ; Y 2 ) using estimates of transformation parameters.

01 2 I 3 )
to each point.To introduce cells without corresponding matches whilst maintaining n 1 = n 2 , we removed the first n r cells from Y 1 and the last n r cells from Y 2 , resulting in n r cells in Y 1 and Y 2

Figure 4 :
Figure 4: Typical average acceptance ratio (ᾱ) plots for permutation, transformation and fidelity modules (subscript P , t, and f respectively) during tempering.

Table 2 :
. Without data selection, the matching identified is the effective result of minimising the RMSE of the cell-to-match distances for all cells, including those without corresponding matches.Example of median and RMSE cell-to-match distances corresponding to the chains with the minimum values of the negative log of the posterior.

Figure 5 :
Figure 5: Comparison of matching for 33-cell test with n r cells without corresponding matches.a) Example of a permutation heatmap when data selection was included, with associated MAPs of γ, conditioned on MLM.b) Corresponding heatmap when data selection was not included, with two incorrect matches (pink arrows).c-d) Spatial matching of points for example with and without data selection respectively.

Figure 6 :
Figure 6: Permutation heatmap for the non-linear deformation test d).Cells ordered vertically and horizontally according to the ordering of the MAP estimates of the fidelity parameters conditioned on the MLM.Undeformed cells highlighted with cyan box.

Figure 7 :
Figure 7: a) The first confocal image of the embryo prior to movement on the imaging stagereference cells marked with arrows b) Second image of embryo after manipulation.Scale bar equal to 20µm.c) Spatial matching of points using MAP estimates of transformation parameters.d) Example of marginal posteriors on the fidelity parameters, with maximum possible posterior in black dashed line.
S20)    where i is the iteration number during sampling at T = 1, ∆ θ = (θ − θ) and θ the rolling average of the model parameters.The proposal covariance matrix is updated less frequently as i → N 1 to satisfy the diminishing adaptation condition.C is then used as the proposal covariance matrix to generate future proposals.S1.11Estimation of transformation parameter and fidelity parameter MAP estimates conditioned on MLMDuring non-tempered sampling (at T=1) the minimum value of the negative log of the posterior density from equation 14 is stored, along with corresponding θ and γ parameter values.This gives us an estimate of the the deepest mode within the explored state space.To estimate the MAPs of model parameters conditioned on the MLM, we use the inbuilt fmincon optimiser in MATLAB, using starting positions of the parameters as those identified at the minimum log of the posterior density.The permutation vector is not changed from the MLM during optimisation as optimisation over the discrete permutation vector state space would have been computationally expensive and likely unnecessary due to the low acceptance of new permutation vectors during sampling at T = 1.
Parameters used to generate in silico data sets.

Figure S2 :Figure S3 :
Figure S2: Example of the 62-cell test with n r = 6 non-corresponding cells from Section 6.2.a) Permutation heatmap when data-selection is included.b) Inset of region where cells have no corresponding matches and reduced γ.c) Example permutation heatmap when data selection is not included, 14 incorrect matches in the MLM.Less concentrated distribution about the MLM.

Figure S4 :
Figure S4: Example permutation heatmap from the problem in Section 6.4 with four known mCherry reference markers, ordered by fidelity parameter and cell matches in Y 2 .highlighted rows/columns indicate the successful matching of four reference maker cells.

Figure S6 :
Figure S6: Example permutation heatmaps for embryo 1 with embryos A, B, C and D from Section 6.5.Heatmaps ordered by MAP estimate conditioned on MLM of fidelity parameters and then the corresponding maximum match in Y 2 .

Figure S7 :
Figure S7: Example permutation heatmaps for embryo 3 with embryos A, B, C and D from Section 6.5.Heatmaps ordered by MAP estimate conditioned on MLM of fidelity parameters and then the corresponding maximum match in Y 2 .
n 2 ×n 1 where M P = e P 1 e P 2 . . .e Pn 1 , where e i ∈ R n 2 ×1 are the standard canonical basis column vectors for R n 2 .Post multiplication of the transformed Y 2 coordinates by M P gives us a matrix of the new cell center positions ordered according to P .

Table S2 :
Summary of the number of incorrect matches in the MLM, median and RMSE cell-tomatch distances (d) for each chain, for test problems in Section 6.2 when data selection was included.

Table S3 :
Summary of the number of incorrect matches in the MLM, median and RMSE cell-tomatch distances (d) for each chain, for each test problem in Section 6.2 when data selection was not included.