Dynamical projections for the visualization of PDFSense data

A recent paper on visualizing the sensitivity of hadronic experiments to nucleon structure [1] introduces the tool PDFSense which defines measures to allow the user to judge the sensitivity of PDF fits to a given experiment. The sensitivity is characterized by high-dimensional data residuals that are visualized in a 3-d subspace of the 10 first principal components or using t-SNE [2]. We show how a tour, a dynamic visualisation of high dimensional data, can extend this tool beyond 3-d relationships. This approach enables resolving structure orthogonal to the 2-d viewing plane used so far, and hence finer tuned assessment of the sensitivity.


Introduction
Many problems in physics can be broadly characterized as a description of a large number of observations with models that contain multiple parameters. It is common practice to perform a global fit to the observations to arrive at the set of parameter values that best fits the data. To understand how well this fit describes the observations, a series of one or two-dimensional projections of confidence level regions are usually provided.
It is desirable to visually inspect the results of such fits to gain insight into their structure. One possibility is to directly compare the predictions of different parameter sets in the vicinity of the best fit. A simple algorithm to organise this idea that results in a manageable number of such parameter sets can be constructed using singular value decomposition (SVD). One first decides the confidence level at which to make the desired comparison and quantifies it with the corresponding ∆χ 2 for the appropriate number of parameters being fit, n. The region in parameter space within the desired confidence level is approximately an n-dimensional ellipsoid, and SVD provides an ideal set of 2×n points on which to evaluate the predictions of the model for visual inspection. These points are given by the intersections of the ellipsoid with its principal axes and clearly provide a minimal sample of parameter space that covers all relevant directions at a desired confidence level.
A tool for the direct visualisation of the high dimensional model predictions thus constructed has existed in the statistics literature for many years, but has not been applied to high energy physics problems recently. 1 It is called a tour, and is a dynamic visualization of low-dimensional projections of high-dimensional spaces. The most recent incarnation of the tool is available in the R [4] package, called tourr [5]. The goal of this paper is to introduce the use of a tour as a visualisation tool for sensitivity studies of parton distribution functions (PDFs) building on the formalism that has been developed over the years by the CTEQ collaboration. It is beyond the scope of this article to provide a detailed analysis of the PDF uncertainties. The choice of this example has two motivations: the PDF fits embody the generic problem of multidimensional fits to large numbers of observables that are common in high energy physics; and Ref. [1] has recently provided the parameter sets for this problem in an initial effort to visualize the PDF fits. Our starting point will be the PDFSense [1] results but our study differs in an important way: PDFSense utilizes the Tensorflow Embedding Projector [6], limiting visualisation to three of the first ten principal components, that is, a 3-d subspace, whereas the tour allows us to explore the full space. As we will see here, this allows additional insights into the fits.
Our paper is organised as follows. In Section 2 we first describe the problem as formulated in Ref. [1] and we discuss a toy example to illustrate the concepts involved. We then introduce the tour algorithm and its implementation in Section 3. Finally we discuss the results obtained by applying tour to the PDFSense dataset in Section 4 and present our conclusions in Section 5.

PDF fits and residuals
The analysis of collider physics results relies on theoretical calculations of cross-sections and distributions. Factorization theorems allow us to bypass non-perturbative physics that cannot be calculated from first principles and to describe instead, the initial state of a reaction in terms of parton distribution functions or PDFs. These consist of simple functional forms describing the probability density for finding a given quark or gluon in the proton with a given momentum fraction x, at a given momentum transfer scale Q. The PDFs used today have been constructed by fitting high energy physics data collected over many years by multiple experiments and are produced by large collaborations. As such, they constitute an ideal example of a multidimensional parameter fit to a large data set to study with a tour.
For our study we will make use of the framework for treating uncertainties of the PDF predictions as has been defined in [7,8]. The best fit PDF, defined by the set of n parameters a 0 i , is obtained by finding the global minimum of a χ 2 function. To study uncertainties in the fit one considers small variations of the parameters around the minimum using a quadratic approximation for the χ 2 function written in terms of the Hessian matrix of second derivatives at the minimum, H. The eigenvectors of this matrix provide the principal axes of the confidence level ellipsoids around the global minimum, and one defines a displacement along these directions to find the n dimensional set of points a i which provide 2n PDF sets that differ from the best fit by a desired confidence level.
Ref. [1] has introduced the package PDFSense to study the sensitivity of different experiments to different aspects of the PDFs. An ingredient of that study are the so-called shifted residuals which are related to the experimental error contribution to the χ 2 by [9] where theλ α are the best-fit nuisance parameters. The shifted residuals r i ( a) are calculated as the difference between the theoretical prediction T i ( a) and the shifted central data value D i,sh ( a), normalised by the total uncorrelated uncertainty s i , Note that D i,sh ( a) is the observed central value shifted by a function of the optimal nuisance parametersλ α and therefore depends on the point in parameter space considered. The so-called response of a residual to an experimental result i is then defined as [1] with r 0 E the root-mean squared residuals characterizing the quality of fit to experiment E. 2 It parameterizes the change in residuals with variations along the independent directions a ± l . Large values of δ ± i,l therefore indicate considerable variation in the theory prediction values within the selected window of allowed probability variation along the considered direction. We thus consider a 2N dimensional vector for each data point (i.e. experimental result). Concretely, here we consider a 56 dimensional parameter space in which we want to compare and group the experimental results. These responses δ i are calculated and provided by Ref. [1] and they constitute the starting point of our study.
2 Note that the shifted central data value enters the residuals, thus while the observed central value cancels in the definition of δ ± i,l , differences in the shift arising from differences of the optimized nuisance parameters at a ± l are encoded in the results together with difference in theory predictions.

Simple illustrative example
The procedure described so far has been used for many years, but it is complicated. For newcomers to the field, we illustrate it here using a simple example drawn from two early data sets for the gluon parton distribution function extracted from two types of ψ production experiments [10]. This example will allow us to illustrate all the concepts involved. In Figure 1 we show these two data sets, labelling the points and their error bars p(x) ± ∆p(x), for 15 and 16 values of x (in red and blue) respectively. The points are fit to the two-parameter function similar to but simpler than the forms used today. The next step is to minimise the χ 2 -function defined by The parameters a 0 , b 0 that result in the global minimum χ 2 (a, b) min define the best fit to the data. They are shown as the cross in the right panel of Figure 2, and produce the solid black curve shown in Figure 1. At the same time one adopts a quadratic approximation to the χ 2 function in the vicinity of its minimum where the matrix of second derivatives evaluated at the global minimum is the well-known Hessian. This approximation seems unnecessary for the simple example we are discussing now but is used for the current global fits offering complementary features to exact numerical methods [11]. To quantify the error in the fit one then constructs the region in a, b parameter space corresponding to a given confidence level. For our example we take χ 2 (a, b) − χ 2 (a 0 , b 0 ) ≤ 5.99 which corresponds to a 95% confidence level in the estimation of two parameters. The intersection of the plane χ 2 (a, b) = χ 2 (a 0 , b 0 ) + 5.99 (green) with the χ 2 (a, b) function (shown in black) and its quadratic approximation (in orange) is shown in the left panel of Figure 2. The right panel in the same figure shows the ellipsoid (two-dimensional in this case) defined by this intersection for the quadratic approximation (in orange) and the deformed ellipsoid in black for the exact χ 2 (a, b) function. The difference between the two is small indicating that the quadratic approximation is quite adequate for this confidence level. The eigenvectors of the Hessian matrix provide the directions of the principal axes of the ellipsoid and are shown in black in the right panel of Figure 2: the dashed (dotted) lines correspond to the direction associated with the largest (smallest) eigenvalue. The intersections of these axes with the ellipse, shown as black dots, provide a set of fits to the data that can be compared with the best fit and used as a means of quantifying the uncertainty in the fitting procedure. These are also shown in Figure 1.  Figure 2: Difference between the χ 2 -function (black), and quadratic approximation (orange). Their intersection with a 95% confidence level plane is shown on the right panel. The intersections of the principal axes with the ellipse (that occurs in the quadratic approximation) are shown as the black dots in the right panel. The numbers label the eigenvector of H corresponding to that direction.
The set of responses, δ ± i,l , in this example is shown in Figure 3. From inspecting the limiting behaviour of Eq. 5 it is clear that the description at low x is dependent mainly on a while large values of x are mostly sensitive to b. This is reflected in the uncertainty curves in Figure 1, and also when looking at the δs. For this simple example the main directions identified by the Hessian method are in fact well aligned with the original directions in parameter space. Considering the values of δ we find that δ ± 1 , which corresponds mainly to a variation of a, takes large values for bins with low values of x, while δ ± 2 takes large values for bins with large values of x. We conclude that the parameter dependence is captured by the δs as expected. Going to more complex descriptions and fits, as we do in the following, this correspondence is no longer clear from the description and the δ values may be used to infer the parameter dependence of a given prediction.
In Figures 1 and 3 we have labelled the following four points: 1. point with highest value in δ 1 , found at low x and with small error bar These observations illustrate that large values of δ correlate with points with errors that are comparable to or smaller than the uncertainty in the fit as parametrized by the Hessian method. At the same time, points that are not well described by the fits do not necessarily result in large δs.

Data visualisation
When looking for structure in high dimensional parameter spaces we rely on tools for dimensional reduction and visualisation. Due to the importance of this task, many methods have been developed. Here we give a brief overview of the tools used in the following work.

Principal component analysis
Principal component analysis (PCA) is an orthogonal linear transformation of elliptical data into a coordinate system, such that the first basis aligns with the maximum variance. The second basis is the direction of maximum variation orthogonal to the first coordinate, and the remaining bases are sequentially computed analogously. It is typically used for dimension reduction. To choose the number of principal components (PCs) to use, the proportion of variance explained by each component is examined. Either a pre-determined proportion of total variance is used, or by plotting the proportions against the number of PCs and choosing the point where this flattens to zero. PCA is an optimization problem with a well defined solution. However, the outcome of the PCA is affected by the preparation of the input data. The preparation can also be used to highlight specific aspects of the data distribution. For example, the input data is generally centered before performing PCA by setting each variable to have a mean value of zero. In this way, large variation describing only mean values different from zero are removed from the results. Another approach would be to normalize the distribution, to emphasize directional information. Typically this means "sphering" of the data points, by normalizing each vector to have length one. This results in comparison of similar, or different, directions in the parameter space, but information about the differences in length are lost by this approach.
In this work we use the standard implementation prcomp in R for the computation of the principal components.

Nonlinear embeddings
It is also common to examine non-linear mapping of the data points onto a low dimensional embedding. The aim is to preserve multidimensional structure by minimizing the difference in distances in the full parameter space as compared to distances in the low dimensional projection. PCA is a simple member of this more general type of transformation. A widely used method in machine learning is the algorithm called t-distributed stochastic neighbor embedding (t-SNE) [2]. It has a goal to cluster similar points together (i.e. points with small Euclidean distance) while separating the individual clusters from one another. This gives appealing and often useful pictures but results should be considered with care as t-SNE is a nonlinear transformation and does not preserve original distance. Note that while nonlinear embeddings may be useful in identifying clusters in the data, their interpretation is limited by lack of an analytical description of the transformation. This is not the case for linear transformations such as the PCA, where the transformation can be readily reversed to identify the contribution of the original parameters to a given principal component direction.

Overview
When a data set has more than two parameters, the tour [12] can be used to plot the multiple dimensions. Currently the typical approach is to plot two parameters or pairs of combinations of the parameters. The tour extends this idea to plot all possible combinations. The viewer is provided with a continuous movie of smooth transitions from one combination to another, from which it is possible to extrapolate the shape of the parameter space in high-dimensions. Seeing many combinations in quick succession shows the associations between all the parameters.
There are several types of tours. Here we use a grand tour, of projections from n-dimensional parameter space to 2-d projections space. A projection of data is computed by multiplying an m×n data matrix, X, having m sample points in n dimensions, by an orthonormal n×d projection matrix, A, yielding a d-dimensional projection. The grand tour is a mechanism for choosing which projections to display, and how the smooth transitions happen. New projections are chosen from all possible projections, and a geodesic interpolation to a target projection provides the smooth transition. The original algorithm is documented in [13]. The implementation used in this paper is from the tourr [5] package in R [4].
The tour shows linear projections of the parameter space. In contrast, methods like t-SNE [2] produce non-linear mappings from high-to low-dimensional space. The difference is that the shape of the data in high-dimensions is preserved by linear projections, but not with nonlinear mappings.

Algorithm
A movie of data projections is created by interpolating along a geodesic path from the current (starting) plane to the new target plane. In the grand tour, the target plane is chosen by randomly selecting a plane. The interpolation algorithm (as described in [14]) follows these steps: 1. Given a starting n × d projection A a , describing the starting plane, create a new target projection A z , describing the target plane. It is important to check that A a and A z describe different planes, and generate a new A z if necessary. To find the optimal rotation of the starting plane into the target plane we need to find the frames in each plane which are the closest.
2. Determine the shortest path between frames using singular value decomposition.
The principal directions are the frames describing the starting and target planes which have the shortest distance between them. The rotation is defined with respect to these principal directions. The singular values, λ i , i = 1, . . . , d, define the smallest angles between the principal directions.
3. Orthonormalize B z on B a , giving B * , to create a rotation framework.

5.
Rotate the frames by dividing the angles into increments, τ i (t), for t ∈ (0, 1], and create the i th column of the new frame, b i , from the i th columns of B a and B * , by b i (t) = cos(τ i (t))b ai + sin(τ i (t))b * i . When t = 1, the frame will be B z .

Project the data into
7. Continue the rotation until t = 1. Set the current projection to be A a and go back to step 1.
In a grand tour the target plane is drawn randomly from all possible target planes, which means that any plane is equally likely to be shown. That is, we are sampling from a uniform distribution on a sphere. To achieve this, sample n values from a standard univariate normal distribution, resulting in a sample from a standard multivariate normal. Standardize this vector to have length equal to one, gives a random value from a (n − 1)-dimensional sphere, that is, a randomly generated projection vector. Do this twice to get a 2-dimensional projection, where the second vector is orthonormalized on the first.
The data typically needs some standardization or scaling before computing the tour. This can be as simple as centering each variable on 0, and standardizing to a range of -1 to 1. It could be as severe as sphering the data which in statistics means that the data is transformed into principal components (from elliptical shape to spherical shape). The same term is used for a different type of transformation in other fields, where observations are scaled to fall on a high-dimensional sphere, by scaling each observation to have length 1. (An interesting diversion: this type of sphering is the same transformation made on multivariate normal vectors to obtain a point on a sphere, to choose the target planes in the grand tour.) The initial description of the tour promised display of all possible projections. Theoretically this is true, but practically it would require that the user stay watching forever! However, the coverage of the space is fairly fast, depending on n, and within a short time it is possible to guarantee all possible projections are displayed within an angle of tolerance.

Display
For physics problems, setting d = 2 would be most common. The projected data is displayed as a scatterplot of points. It is also possible to overlay confidence regions, or contours. Groups in the data can be highlighted by color. Displaying the combination of variables of a particular projection can be useful to interpret patterns. This can be realized by plotting a circle with segments indicating the magnitude and direction of the contribution, and it is called the axes.
The same tour path can be used to display subsets of the data, in different plots, to compare groups. When we break the display into subsets, the full data is also shown in each plot, in light grey. This makes it easier to do group comparison.

Results
This section compares the findings made using the tour relative to those made with PDFSense using the recent CT14HERA2 fits [15]. The PDFSense results form the basis on which to expand the knowledge of PDF fits. The results from both tools are summarized in Table 1, where PDFSense results were obtained using the TensorFlow Embedding Projector (TFEP) software [6] for the visualisation of high-dimensional data. The summary statistic "reciprocated distance" referenced in Table 1 is defined as: TFEP provides two methods, PCA and t-SNE, and [1] is exploring both for the visualisation of the data set. The PCA implementation returns projections onto the 10 first PCs evaluated from centered and sphered data, and allows the user to choose two or three of them to view the results.

Results from PDFSense & TFEP
For comparison we first reproduce results similar to those found in [1] by using the TFEP software. A selection of four views is shown in Figure 4, for a complete set of plots related to the PDFSense column in Table 1 we refer the reader to [1]. The selected examples show how the view was chosen based on orthogonality of assigned groups, and how for the example of the jet+tt group the various contributions have been compared. We can identify several limitation in using the TFEP software for the visualisation: • Relevant information about the distributions is encoded in more than 3 dimensions. This is clear as PCs 3, 5 and 8 have been selected in the visualisation, thus the majority of variation in the data is not captured in Figure 4. Moreover, the application of t-SNE clustering shown PDFSense & TFEP Tour 1 Three clusters can be separated in the visualisation, labelled DIS, VBP and jet cluster. In the selected view the jet cluster is roughly orthogonal to the DIS cluster.
We observe the differences in distributions between the three clusters more clearly. Substructure within the clusters is also observed, and studied in some detail.
2 New ATLAS and CMS results will dominate the jet cluster.
A more detailed comparison of jet cluster results shows that CMS results are mainly responsible for extending the range, consistent with sensitivity rankings.
3 tt results are characterized by large δ but there are only a few points and they are found inside the jet cluster.
While the tt results follow similar distributions to the jet cluster, they do contain outlying points.
4 Results from semi-inclusive charm production at HERA (147) are found to overlap with the DIS and jet clusters.
These results do not take significant values in any direction of the δ space, directional information is misleading here.
5 CCFR/NuTeV dimuon SIDIS results (124-127) are orthogonal, the direction cannot be resolved in the selected view.
The tour resolves the orthogonal direction and further allows to identify outlying points.
6 Reciprocated distance as summary statistic to characterize "relevance" of results.
We can use the ranking as guidance to select results to highlight in the visualisation to gain understanding of how the summary statistics relate to raw distributions.  in [1] results in a large number of clusters, indicating higher dimensional structure. It would be preferable to display it as a linear projection for which interpretations are straightforward.
• The sphering of data points when preparing the PCA visualisation is removing relevant information about the length of the vectors δ i .
• In addition while the online tool allows highlighting of groups it is considerably less flexible in selecting options compared to scripted tools like the tour, limiting the detail in which the results can efficiently be studied.
We next explore how these points can be addressed, in particular in the framework of dynamical projections and the tour algorithm.

Expanded findings made using the tour
We first optimize the number of principal components considered in our study, and then show how the tour results expand on previous observations, as was summarized in Table 1. The mapping from the original δ coordinates onto the PCs for all PCAs considered in this work are listed in Appendix B.

PCA, normalisation and variance explained
In the following we study two sets of principal components (PCA1, PCA2), corresponding to the two data preparation choices described above (i.e. PCA1=centered, and PCA2=centered and sphered). Results from each are compared. Note that for this problem, the centering has negligible impact on the results as the mean value in each direction δ ± i,l is close to zero. An important consideration is the number of PCs that contain relevant information. To study this we show in Figure 5 the proportional variance that is explained by the principal components, for the two choices of the PCA, with labels "Centered" for PCA performed on centered data (PCA1) and "Sphered" for the PCA obtained for centered and sphered data (PCA2) thus reproducing results from Figure 4. We find a steep curve for the first few PCs, followed by a slow decay of the proportional variance, and the curve only flattens out towards zero around PC30. As a consequence we expect that looking at a 3 dimensional subset of the first 10 PCs is not sufficient to understand the variation in the considered parameter space, and that judging similarity based on the view in Figure 4 only, is misleading. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q To capture all the variation, one would need close to 30 principal components, but around 6 captures about 50% of the variation. Both data preparations produce similar variance explanation, but the differences are enough to matter in some interpretations.
In the following we want to study a higher dimensional subspace where we base the number of dimensions considered on the results found in Figure 5.
For simplicity, we illustrate the tour approach using just the first 6 PCs, which captures about 50% of the overall variation.This is sufficient to provide new insights as compared to Figure 4 (left), and additional PCAs can be added for detailed studies of subgroups as we do below.

Grand tour result details
A short tour path is generated, of 20 basis planes and associated interpolation between them, of 2 dimensional projections of 6-d. This is used to compare between multiple groups. The examples considered are guided by findings in [1] and are summarized in Table 1.
Grouping of data points We first consider a display corresponding to Figure 4 (left), i.e. the data set is grouped into three main clusters. Selected views from the animation are shown in Figure 6, PCA1 (left) and PCA2 (right). The same colors used in Ref. [1] indicate the grouping: the DIS cluster is shown in blue, VBP in orange and the jets cluster in red. The first window in the display shows the axes, the other windows show the projected data, where one group is highlighted in color, while the remaining points are shown below in grey for easy comparison. As can be seen from the selected views, in any particular static view it is only possible to separate two of them at a time. The static views are not sufficient to convey the full picture obtained by watching the tour animation which allows to separate all three groups. The tour indicates that there is higher dimensional structure in the data points as can be seen in the linked animation. In addition, it is possible to visually identify substructure within the clusters (e.g. groups of points aligned along some direction) as well as outlying points. This is especially true for PCA1 which is found to provide a much clearer picture than PCA2. We also find that the DIS and VBP clusters extend in multiple directions, while the jets cluster seems to be well described in a single plane.
The jet cluster In more detail, we investigate the jet cluster. These results are of special interest since they contain indeed the largest data sets to be added in the fit, which were indeed found to be important according to [1]. In addition, the new data from LHC jet measurements is of interest because of possible tensions e.g. [16,17]. As seen above the jet cluster appears to be described in a lower dimensional subspace. Indeed performing PCA on the results in the jet cluster alone we see that the cumulative proportional variance reaches 49/75/91/95 % for PC1/2/3/4 respectively, with the proportional variance dropping to less than 2% for PC5. We therefore study substructure in this 4 dimensional space. While [1] distinguish three types of groups, i.e. "old" jet results (those included in the CT14HERA2 fit), "new" jet results (more recent ATLAS and CMS results) and tt, it makes sense to differentiate the LHC results further by experiment and √ s (motivated also by the differences in sensitivities observed in [1]). For simplicity we consider only the results from performing PCA on the centered data shown in Figure 7 with grouping into: Tevatron (IDs 504, 515), ATLAS7old (535), CMS7old (538), CMS7new (542), ATLAS7new (544), tt-energy (565, 567), tt-rap (566, 568) and CMS8 (545). Indeed we observe that the Tevatron results as well as the ATLAS results generally fall in the center of the cluster, with exception of some outlying points. On the other hand CMS 7 and 8 TeV results extend in (different) new directions. It is interesting to note that "old" CMS 7 TeV results extend further out than the corresponding "new" ones. In  fact while the new measurement extended to higher rapidities and lower values in jet p T , the old measurement contains higher p T bins at no longer present in the updated result, which turn out to give large values of δ. Finally for tt results we distinguish the observations binned in energy (p t T or m tt ) or rapidity (y t/t or y tt ). We can identify differences between the two groups in the visualisation, however as already noted in [1] the data points are not significantly different from the main jet cluster.
It is interesting to study which data points are found to be outlying in the visualisation. These points are highlighted in Figure 7 and are best distinguished when watching the tour animation: • |y| > 2.5 and µ > 950 GeV -marked with a star symbol: only one such point is found in the 7 TeV data sets. It occurs in ATLAS7new, it is the last rapidity bin and is clearly outlying (large negative values in PCs 1, 2 and 3). However no particular trend is observed when comparing with points in nearby bins. There are two more such data points in the CMS8 data set, but they do not stand out in δ space.
• |y| > 2 and µ > 1000 GeV -marked with downward pointing triangle. These points are seen to align in a new direction, away from the main cluster highlighting their importance in the fits.
They are also useful for comparing the different CMS results: in this case there are common points to both datasets that nevertheless look different, suggesting the need for further study of these points.
• for CMS8 we also highlight |y| < 1 and µ < 200 -marked with diamond symbol: they are very different from the main distribution and give large positive values in PC1. It is interesting that we can clearly separate these low µ bins in CMS8 set but not in CMS7.
The DIS cluster We next consider subgroups of the DIS cluster for which the TFEP visualisation allowed only limited interpretation. Concretely, while the bulk of the cluster was clearly spanned by the HERA results (ID 160) as expected, other results were found to follow quite different distributions. In particular the Charm SIDIS (ID 147) results are distributed in a different direction, overlapping partly with both the DIS and the jet clusters, while the dimuon SIDIS results (IDs 124-127) were found in the center of the distribution and it was concluded that this cluster extends in an orthogonal direction, although it was not shown explicitly. We therefore compare in detail these three groups. In this case it is useful to consider both PCA1 and PCA2, the latter more closely related to the TFEP output. First, we observe that the dimuon SIDIS is poorly separated in the PCA2 projection, whereas PCA1 clearly shows how it extends considerably away from the main DIS cluster (ID 160). On the other hand, the charm SIDIS can be separated more easily when studying the directional information in the PCA2 projection because the individual values in the space of deltas are all comparatively small. These results suggest that either predictions for these type of observables are well under control in the existing fits, or that alternatively the experimental errors are too large for them to be constraining. We also observe substructure in the DIS HERA1+2, see Figure 8 and the corresponding animation, indicating that this group combines a number of qualitatively different types of results.
Comparison with summary statistics We now consider the experimental results with the highest values in reciprocated distances to show they can also be easily distinguished with our visualisation. We highlight three groups in Figure 9: the HERA dataset (ID 160), the W asymmetry measurements (ID 234, 266 and 281) and the fixed-target Drell-Yan measurements from E605 and E866 (ID 201, 203 and 204).
Indeed we find that the W asymmetry measurements (234, 266 and 281) follow a very distinct distribution, as does the HERA DIS dataset (160). On the other hand, the fixed-target Drell-Yan measurements (201, 203 and 204), do not stand out in our visualisation. We find that this is a consequence of the dimension reduction, 3 and we can easily identify views separating this group from the other data points when considering additional dimensions. Here we show this by looking at projections found by performing PCA on this data subset only and using it to compare it to the other data sets in the subspace of the first 4 PCs thus defined. Note however that the tour allows visualisation of the distributions in the full parameter space which would yield the same information. Our choice of procedure is simply to limit the viewing times required, which grow  with the number of dimensions considered. 4 This type of visualisation, together with inverting the mapping onto principal components, may be used to identify the origin (i.e. underlying physics) of the large differences. For example the first three PCs found for the DY dataset capture three different distributions, and mapping those back to the original δ directions together with study of those directions with respect to uncertainty in individual parton pdfs may provide additional insight. Such detailed investigations are however beyond the scope of this study.

Summary and conclusions
Starting from the set of 56 dimensional vectors in the space of residual responses calculated in [1], we have demonstrated how the grand tour may be used for visualizations in particle physics. The 56 dimensions are reduced to 6 dimensions (for illustration) using principal component analysis, and the resulting representation is then passed onto the tour. The findings made about the fits using the tour, even with only 6 dimensions, are more comprehensive and clearer than what TFEP allows.
The tour visualisation verified several results from [1], notably, the separation between DIS, VBP and JET experiments into clusters populating different regions of delta space. It also allowed us to go into further detail by examining certain substructures within these groups. We have moreover demonstrated that the tour can complement and support analyses based on the use of reciprocated distances.
In our examples we have considered performing the PCA either on centered data (PCA1) or on centered and sphered data (PCA2), as they highlight different aspects of the structure, the former retaining length information and the latter emphasizing directionality. In general we find the results from PCA1 more useful, in particular for this application where the length of the individual data point vectors (i.e. for each experiment) carries important information that is lost when sphering the input data.
The sensitivity defined in [1], or projection of δs onto a direction given by the gradient of a QCD variable (e.g. cross section prediction) can also be inspected visually and the tour permits this visualisation in multiple dimensions.
We conclude that the above described method is a valuable tool for PDF uncertainty and sensitivity studies. In addition, the visual analysis allows a better understanding of the method itself and can uncover unexpected features, and even possibly errors. It can provide experiments with a guide to the measurements needed to improve PDF fits.

A CTEQ labelling IDs
These are the same numbers used in Ref. [1], we reproduce them here for convenience. Experimental datasets included in the CT14HERA2 fit are listed in Table 3, additional results included in the study are given in Table 4.

B Projections on the principal components
Here we list the projection matrices used to obtain projections in the various PC subspaces considered, for PCA1 in Table 5, for PCA2 in Table 6, for the PCA performed on the jet cluster in Table 7 and for the PCA performed only on the DY data in Table 8.