The Hidden Geometry of Particle Collisions

We establish that many fundamental concepts and techniques in quantum field theory and collider physics can be naturally understood and unified through a simple new geometric language. The idea is to equip the space of collider events with a metric, from which other geometric objects can be rigorously defined. Our analysis is based on the energy mover's distance, which quantifies the "work" required to rearrange one event into another. This metric, which operates purely at the level of observable energy flow information, allows for a clarified definition of infrared and collinear safety and related concepts. A number of well-known collider observables can be exactly cast as the minimum distance between an event and various manifolds in this space. Jet definitions, such as exclusive cone and sequential recombination algorithms, can be directly derived by finding the closest few-particle approximation to the event. Several area- and constituent-based pileup mitigation strategies are naturally expressed in this formalism as well. Finally, we lift our reasoning to develop a precise distance between theories, which are treated as collections of events weighted by cross sections. In all of these various cases, a better understanding of existing methods in our geometric language suggests interesting new ideas and generalizations.


Introduction
Unification of ideas in physics has been an important way of achieving elegance, clarity, and simplicity, which in turn helps inspire meaningful new developments. In this paper, we use the energy mover's distance (EMD) between collider events [1] to provide a natural geometric language that unifies many important concepts and techniques in quantum field theory and collider physics from the past five decades. Furthermore, we introduce and discuss several new ideas inspired by this geometric approach to studying the space of events. Throughout this paper, we refer to an event and its energy flow interchangeably. The energy flow, or distribution of energy, is the kinematic information that is experimentally observable and perturbatively well-defined in quantum field theories with massless particles [2]. As it relates to collider physics, the energy flow has been extensively studied [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17], and this paper builds on many of these previous concepts. For an event consisting of $M$ particles with positive energies $E_i$ and angular directions $\hat n_i$, the energy flow is:

$$\mathcal{E}(\hat n) = \sum_{i=1}^{M} E_i\, \delta(\hat n - \hat n_i). \tag{1.1}$$

Note that the energy flow is insensitive to charge and flavor information. Particles are taken to be massless in the body of this paper, with $n_i^\mu = (1, \hat n_i)^\mu = p_i^\mu / E_i$, and the case of massive particles is discussed in App. A. In a hadron collider context, particle transverse momenta $p_{T,i}$ are typically used in place of particle energies, but we focus on energies in this paper to minimize extraneous notation.
The EMD was introduced in Ref. [1] as a metric between events. It is based on the well-known earth mover's distance [18][19][20][21][22], also known as the Wasserstein metric [23,24]. Intuitively, the EMD between two events is the amount of "work" required to rearrange one event into the other. Its value can be obtained by solving the following optimal transport problem between energy flows $\mathcal{E}$ and $\mathcal{E}'$:

$$\mathrm{EMD}_{\beta,R}(\mathcal{E}, \mathcal{E}') = \min_{\{f_{ij} \ge 0\}} \sum_{i=1}^{M} \sum_{j=1}^{M'} f_{ij} \left(\frac{\theta_{ij}}{R}\right)^{\beta} + \left| \sum_{i=1}^{M} E_i - \sum_{j=1}^{M'} E'_j \right|, \tag{1.2}$$

$$\sum_{j=1}^{M'} f_{ij} \le E_i, \qquad \sum_{i=1}^{M} f_{ij} \le E'_j, \qquad \sum_{i=1}^{M} \sum_{j=1}^{M'} f_{ij} = \min\left( \sum_{i=1}^{M} E_i, \sum_{j=1}^{M'} E'_j \right), \tag{1.3}$$

where $\theta_{ij}$ is a pairwise distance between particles known as the ground metric, $R > 0$ is a parameter controlling the tradeoff between transporting energy and destroying it, and $\beta > 0$ is an angular weighting exponent.¹ For the angular metric between two massless particles, we focus on the case of

$$\theta_{ij} = \sqrt{2\, n_i^\mu n_{j\mu}} = \sqrt{2\,(1 - \hat n_i \cdot \hat n_j)}, \tag{1.4}$$

which reduces to their opening angle in the nearby limit. The first term in Eq. (1.2) quantifies the difference in radiation patterns while the second term, which vanishes in the case of normalized energy flows, allows for the comparison of events with different total energies. The constraints in Eq. (1.3) specify that the amount of energy moved to or from a particle cannot exceed its initial energy, and that as much energy must be moved as possible.

¹ Strictly speaking, for the case of $\beta > 1$, one must raise the first term in Eq. (1.2) to the $1/\beta$ power for the EMD to be a proper metric satisfying the triangle inequality, in which case it is known as a $p$-Wasserstein metric with $p = \beta$. Additionally, $2R$ should be larger than or equal to the maximum distance in the ground space for the EMD to satisfy the triangle inequality. When written without subscripts, $\mathrm{EMD}(\mathcal{E}, \mathcal{E}')$ refers to the case of $\beta = 1$ and a sufficiently large $R$ to ensure that we have a proper metric. Even if the EMD is not a proper metric, though, it is still a valid optimal transport problem for any positive values of $\beta$ and $R$.

Figure 1: The distance between events is quantified by the EMD, giving rise to a metric space. Geometry in this abstract space of events provides a natural language to understand many ideas and developments in quantum field theory and collider physics.

The EMD has previously been used to bound modifications to infrared- and collinear-safe (IRC-safe) observables, distinguish different types of jets, and enable visualizations of the space of events [1]. It has also been used to explore the space of jets and quantify detector effects with CMS Open Data from the Large Hadron Collider (LHC) [25]. Alternative pairwise event distances were considered in Ref. [26] in the context of new physics searches. Here, we demonstrate that the EMD can be used to clarify numerous concepts throughout quantum field theory and collider physics using a unified language of event space geometry. In addition to demonstrating how concepts such as IRC safety, observables, jet finding, and pileup subtraction are related, we will develop new ideas and techniques in each of these areas, which we describe below.
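To make the optimal transport problem of Eqs. (1.2) and (1.3) concrete, the following is a minimal sketch of a special case in which the minimization can be done by hand: the second event consists of a single particle. The function names, the representation of an event as a list of (energy, unit vector) pairs, and the default parameter values are illustrative assumptions. The ground metric is taken in the square-root form that reduces to the opening angle for nearby particles. With only one target particle, the constraints reduce to $0 \le f_i \le E_i$ and $\sum_i f_i = \min(E_{\rm tot}, E')$, so moving energy from the nearest particles first is optimal.

```python
import math

def theta(n1, n2):
    """Ground metric sqrt(2 (1 - n1.n2)) between two unit 3-vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.sqrt(max(0.0, 2.0 * (1.0 - dot)))

def emd_to_single_particle(event, axis, energy, beta=1.0, R=1.0):
    """EMD_{beta,R} between `event` and a one-particle event (special case only).

    event  : list of (E_i, n_i) with n_i a unit 3-vector (illustrative layout)
    axis   : direction of the single target particle
    energy : energy of the single target particle
    """
    total = sum(E for E, _ in event)
    to_move = min(total, energy)          # total transported energy, Eq. (1.3)
    # Cost per unit energy for each particle, cheapest first (greedy is optimal here).
    costs = sorted(((theta(n, axis) / R) ** beta, E) for E, n in event)
    transport = 0.0
    for cost, E in costs:
        f = min(E, to_move)               # cannot move more than the particle carries
        transport += f * cost
        to_move -= f
        if to_move <= 0.0:
            break
    # The second term of Eq. (1.2) penalizes the difference in total energy.
    return transport + abs(total - energy)
```

For example, a two-particle event with energies 3 and 1 separated by an opening angle of 0.2, compared with a single particle of energy 4 along the harder direction, has an EMD of $1 \times 2\sin(0.1)$, since only the softer particle has to move. A general pair of events requires a linear-programming or dedicated optimal transport solver, which this sketch does not attempt.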
Equipping collider events with a metric allows us to explore interesting geometric and topological ideas in the space of events. Fig. 1 illustrates the space of events with the EMD as a metric. One key construction for relating these concepts is the notion of a manifold in the space of events, which will allow us to define the distance between an event and a manifold, as well as the point of closest approach on a manifold. Since fixed-order perturbation theory works with a definite number of particles, an important type of manifold will be the idealized massless $N$-particle manifold:

$$\mathcal{P}_N = \left\{ \sum_{i=1}^{N} E_i\, \delta(\hat n - \hat n_i) \;\middle|\; E_i \ge 0 \right\}, \tag{1.5}$$

which, intuitively, is the set of all possible events with $N$ massless particles. Note that $\mathcal{P}_N \supset \mathcal{P}_{N-1} \supset \cdots \supset \mathcal{P}_2 \supset \mathcal{P}_1 \supset \mathcal{P}_0$ via soft and collinear limits, so that the idealized $N$-particle manifold contains each manifold of smaller particle multiplicity.
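As a small worked illustration of how the soft and collinear limits connect manifolds of different multiplicity, consider the following sketch with $\beta = 1$ and two events of equal total energy. The labelling of the particles is chosen here for illustration. Since the target event has a single particle, all energy must be transported to $\hat n_1$, and only the second particle pays a cost:

```latex
\mathcal{E}(\hat n) = E_1\,\delta(\hat n - \hat n_1) + E_2\,\delta(\hat n - \hat n_2) \in \mathcal{P}_2,
\qquad
\mathcal{E}'(\hat n) = (E_1 + E_2)\,\delta(\hat n - \hat n_1) \in \mathcal{P}_1,
\\[4pt]
\mathrm{EMD}_{1,R}(\mathcal{E}, \mathcal{E}') = \frac{E_2\,\theta_{12}}{R}
\;\longrightarrow\; 0
\quad \text{as } E_2 \to 0 \text{ (soft) or } \theta_{12} \to 0 \text{ (collinear)}.
```

In either limit the two-particle event approaches a point of $\mathcal{P}_1$ in the EMD sense.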
The key concepts unified in this paper are outlined and summarized in Table 1. In Sec. 2, we discuss observables as functions defined on event radiation patterns and IRC safety as smoothness in the space of energy flows. Colloquially, the label "IRC safe" indicates that an observable should be well-defined and calculable in perturbation theory [27,28] due to its robustness to long-distance effects (e.g. hadronization in the case of QCD). This "perturbatively accessible" IRC safety is traditionally connected to the observable being "insensitive" to the addition of low energy particles or collinear splittings of particles [29][30][31][32][33][34][35][36].
Here, we refine the definition of IRC safety and clarify when discontinuities in an observable spoil its perturbative calculability. Critical to our formulation is the notion of continuity with respect to the metric topology provided by the EMD:

Definition 1. An observable $\mathcal{O}$ is EMD continuous at an event $\mathcal{E}$ if, for any $\epsilon > 0$, there exists a $\delta > 0$ such that for all events $\mathcal{E}'$:

$$\mathrm{EMD}(\mathcal{E}, \mathcal{E}') < \delta \implies \left| \mathcal{O}(\mathcal{E}) - \mathcal{O}(\mathcal{E}') \right| < \epsilon. \tag{1.6}$$

We argue that IRC safety is EMD continuity everywhere except a negligible set of events, where a negligible set is one that contains no EMD balls of non-zero radius. Using the EMD provides a particle-free definition of IRC safety, which circumvents many pathologies of previous definitions. We argue that observables that are calculable in fixed-order perturbation theory are exactly those that satisfy a slightly stronger continuity condition known as Hölder continuity [62,63], which restricts the types of divergences that can appear in the distribution of an observable [32,36]. Fascinatingly, this framework naturally accommodates Sudakov-safe observables [37][38][39] as those that are IRC safe but fail to satisfy EMD Hölder continuity on a non-negligible subset of some $\mathcal{P}_N$ (where a non-negligible subset of $\mathcal{P}_N$ is one that has nonzero measure in $\mathcal{P}_N$). This suggests, in agreement with Ref. [39], that Sudakov-safe observables are indeed perturbatively calculable once properly regulated.

[Table 1. Columns: Sec. | Concept | Equation | Illustration]

Table 1: Concepts from quantum field theory and collider physics, unified in this paper as geometric and topological constructions in the space of events. In Sec. 2, IRC safety is identified as continuity in this space. In Sec. 3, many classic collider observables are shown to be the shortest distance between the event and a manifold of events. In Sec. 4, popular jet algorithms are derived by projecting the event onto manifolds of $N$-particle events. In Sec. 5, common pileup mitigation strategies are cast as transporting away uniform radiation. In Sec. 6, a space of theories is developed using a distance between event distributions.
In Sec. 3, we highlight that many well-known collider observables can be viewed as the distance of closest approach between an event and a manifold of events. Many of the observables we consider can be exactly cast as:

$$\mathcal{O}(\mathcal{E}) = \min_{\mathcal{E}' \in \mathcal{M}} \mathrm{EMD}_{\beta,R}(\mathcal{E}, \mathcal{E}'), \tag{1.7}$$

for particular choices of the manifold $\mathcal{M}$ and parameters $\beta$ and $R$. Observables that have the form of Eq. (1.7) include thrust [40,41], spherocity [42], (recoil-free) broadening [43], and $N$-jettiness [44]. Particularly interesting is the event isotropy, recently proposed in Ref. [45], which was inspired by EMD geometry and is directly based on optimal transport. This geometric framework also includes jet substructure observables such as jet angularities [46] and $N$-subjettiness [47,48].

In Sec. 4, we demonstrate how jet finding can be phrased in our geometric language. Intuitively, a jet algorithm "approximates" an $M$-particle event with $N < M$ objects called jets. To phrase this geometrically, we are interested in the point of closest approach in $\mathcal{P}_N$ to our event, allowing us to define jets as:

$$\mathcal{J}(\mathcal{E}) = \underset{\mathcal{E}' \in \mathcal{P}_N}{\arg\min}\; \mathrm{EMD}_{\beta,R}(\mathcal{E}, \mathcal{E}'), \tag{1.8}$$

where $\mathcal{J}$ is the collection of $N$ jets corresponding to the event $\mathcal{E}$. Many common jet finding algorithms can be derived in full detail from Eq. (1.8). For instance, we show that jets defined by Eq. (1.8) are precisely those found by XCone [49,50], where $\beta$ is the angular weighting exponent and $R$ is the jet radius. Also, several popular sequential clustering algorithms and recombination schemes, such as $k_T$ clustering [51,52] with winner-take-all recombination [43,53,54], can be exactly obtained by iterating Eq. (1.8) with $N = M - 1$ for various $\beta$. It is satisfying that a rich diversity of jet algorithms can be concisely encoded using event geometry, and we find that several new schemes not previously appearing in the literature naturally emerge.

In Sec. 5, we connect several pileup mitigation strategies to optimal transport through the EMD. There is a long-established relationship between pileup subtraction and geometric concepts [55][56][57][58][59][60][61].
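Returning to the jet-finding statement of Eq. (1.8), a single $N = M - 1$ step can be sketched for $\beta = 1$ under a stated assumption: that the closest $(M-1)$-particle event is reached either by deleting one particle, at a cost equal to its energy, or by merging one pair into the direction of the harder particle (winner-take-all), at a cost $\min(E_i, E_j)\,\theta_{ij}/R$. The function names and the event layout as (energy, unit vector) pairs are illustrative, and this is not presented as the derivation given in Sec. 4.

```python
import math

def theta(n1, n2):
    """Ground metric sqrt(2 (1 - n1.n2)) between two unit 3-vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.sqrt(max(0.0, 2.0 * (1.0 - dot)))

def closest_step(event, R=1.0):
    """One M -> M-1 step for beta = 1 (sketch): return (cost, new_event).

    Candidates considered (an assumption of this sketch):
      * delete particle i          : cost E_i (energy-difference term only)
      * merge i and j, keep harder : cost min(E_i, E_j) * theta_ij / R
    """
    best_cost, best_event = None, None
    for i, (Ei, ni) in enumerate(event):
        # Deleting particle i leaves all other particles untouched.
        if best_cost is None or Ei < best_cost:
            best_cost, best_event = Ei, event[:i] + event[i + 1:]
        for j in range(i + 1, len(event)):
            Ej, nj = event[j]
            cost = min(Ei, Ej) * theta(ni, nj) / R
            if cost < best_cost:
                axis = ni if Ei >= Ej else nj      # winner-take-all direction
                others = [p for k, p in enumerate(event) if k not in (i, j)]
                best_cost, best_event = cost, others + [(Ei + Ej, axis)]
    return best_cost, best_event
```

Iterating `closest_step` until the desired number of particles remains gives a $k_T$-like sequential procedure in which a pair is merged rather than a particle deleted only when $\theta_{ij} < R$, so $R$ plays the role of a jet radius in this toy version.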
Since pileup is reasonably modeled as uniform contamination in rapidity and azimuth [60], we phrase pileup subtraction as removing a uniform distribution of radiation from the event using optimal transport. Intuitively, pileup mitigation finds the event that, when combined with an amount $\rho$ of uniform radiation $\mathcal{U}$, is closest to the given event:

$$\mathcal{E}_C(\mathcal{E}, \rho) = \underset{\mathcal{E}' \in \Omega}{\arg\min}\; \mathrm{EMD}_{\beta}(\mathcal{E}, \mathcal{E}' + \rho\, \mathcal{U}), \tag{1.9}$$

yielding the pileup-corrected event $\mathcal{E}_C$. Here, $\Omega$ refers to the space of all possible energy flows and $\mathrm{EMD}_\beta$ compares events of equal energy, as described at the beginning of Sec. 3. We demonstrate that Voronoi area subtraction [55,56] and constituent subtraction [58] can be phrased exactly as Eq. (1.9) in the small-pileup limit. Generalizing this to the large-pileup limit, we develop two new pileup subtraction schemes, Apollonius subtraction and iterated Voronoi subtraction, and discuss their prospects and potential advantages.

Table 2: Comparing the constructions of EMD and ΣMD as optimal transport problems. Events are treated as energy-weighted angular distributions, whereas theories are treated as cross section-weighted event distributions. This connection allows us to bootstrap the EMD as a ground metric for the ΣMD to develop a rigorous notion of theory space.
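As a toy illustration of the small-pileup limit of Eq. (1.9), the sketch below works in one dimension: particles live on a circle parametrized by azimuth only, and uniform pileup has density `rho` per radian. Both simplifications are assumptions made here for brevity. Transporting the uniform radiation to the nearest particle removes `rho` times the length of that particle's Voronoi arc from its energy. The clipping at zero is a crude guard for larger `rho` and is not one of the large-pileup schemes named above.

```python
import math

TWO_PI = 2.0 * math.pi

def voronoi_arc_lengths(phis):
    """Length of each particle's Voronoi arc on the circle (distinct angles assumed)."""
    M = len(phis)
    if M == 1:
        return [TWO_PI]
    order = sorted(range(M), key=lambda i: phis[i] % TWO_PI)
    lengths = [0.0] * M
    for k, i in enumerate(order):
        here = phis[i] % TWO_PI
        prev_phi = phis[order[k - 1]] % TWO_PI
        next_phi = phis[order[(k + 1) % M]] % TWO_PI
        # A cell extends halfway to each angular neighbour.
        gap_prev = (here - prev_phi) % TWO_PI
        gap_next = (next_phi - here) % TWO_PI
        lengths[i] = 0.5 * (gap_prev + gap_next)
    return lengths

def voronoi_subtract(energies, phis, rho):
    """Subtract rho * (arc length) from each particle, clipping at zero."""
    arcs = voronoi_arc_lengths(phis)
    return [max(E - rho * A, 0.0) for E, A in zip(energies, arcs)]
```

When no particle is clipped, the total energy removed equals `rho * 2 * pi`, the full amount of uniform radiation on the circle. In two dimensions the arcs become Voronoi areas in the rapidity-azimuth plane.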
In Sec. 6, we introduce a distance between theories: the cross section mover's distance (stylized as ΣMD, using the typical Greek letter for cross section). Here, a "theory" $\mathcal{T}$ is taken to be a distribution over (or collection of) events $\{\mathcal{E}_i\}$ weighted by cross sections $\{\sigma_i\}$:

$$\mathcal{T}(\mathcal{E}) = \sum_{i} \sigma_i\, \delta(\mathcal{E} - \mathcal{E}_i). \tag{1.10}$$

The ΣMD is formulated as an optimal transport problem with EMD as the ground metric and cross sections as the weights. The similarities between the constructions of EMD and ΣMD are highlighted in Table 2. Interestingly, we connect ΣMD to a recently proposed technique for probing jet modifications due to the quark-gluon plasma by comparing similar sets of events between proton-proton and heavy-ion collisions [64]. We also demonstrate that representative events can be identified by clustering using the ΣMD, analogously to how particles are clustered into jets. The ΣMD provides the foundation for a rigorous formulation of "theory space", quantifying how different two theories are based on all of their physically observable quantities simultaneously. Our conclusions are presented in Sec. 7, where we also highlight the interesting and unique interplay between machine learning and the natural sciences in this story.
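By direct analogy with Eqs. (1.2) and (1.3), one natural way to write such a transport problem is sketched below. The symbols $F_{ij}$, $\gamma$, and $S$ are names introduced here for illustration (playing the roles of $f_{ij}$, $\beta$, and $R$), and the authors' own definition is the one given in Sec. 6:

```latex
\Sigma\mathrm{MD}_{\gamma,S}(\mathcal{T}, \mathcal{T}')
= \min_{\{F_{ij} \ge 0\}} \sum_{i} \sum_{j} F_{ij}
\left( \frac{\mathrm{EMD}(\mathcal{E}_i, \mathcal{E}'_j)}{S} \right)^{\gamma}
+ \left| \sum_{i} \sigma_i - \sum_{j} \sigma'_j \right|,
\\[4pt]
\sum_{j} F_{ij} \le \sigma_i, \qquad
\sum_{i} F_{ij} \le \sigma'_j, \qquad
\sum_{i,j} F_{ij} = \min\!\Big( \sum_{i} \sigma_i, \sum_{j} \sigma'_j \Big).
```

Here cross sections replace energies as the transported weights and the EMD between events replaces the angular distance between particles.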

Infrared and collinear safety: Smoothness in the space of events
IRC safety is a central notion in collider physics because it indicates when an observable is robust to long distance effects and hence can be described in perturbation theory [27,28]. This insensitivity is frequently connected to the invariance of an observable under certain modifications of the event, namely soft and collinear splittings [29][30][31][33][34][35][36].
In this section, we review some of the common mathematical statements of this invariance that have appeared in the literature, with the goal of clarifying and categorizing their implications. We arrive at a simple, unified description of IRC safety and related concepts (including Sudakov safety) as statements about continuity in the space of energy flows. In Fig. 2, we show the breakdown of observables into broad classes according to our categorization. A few common examples of each category are given in Table 3.

[Figure 2: breakdown of observables into broad classes. Recoverable labels: "Sudakov Safe", "Discontinuous on some N-particle manifolds".]

Review of infrared and collinear invariance
The most straightforward statement of IRC invariance is that an observable $\mathcal{O}$ is unchanged under the addition of an exactly zero energy particle or an exactly collinear splitting [33]:

$$\mathcal{O}(p_1^\mu, \ldots, p_M^\mu) = \mathcal{O}(0\, p_0^\mu, p_1^\mu, \ldots, p_M^\mu), \tag{2.1}$$

$$\mathcal{O}(p_1^\mu, \ldots, p_M^\mu) = \mathcal{O}(\lambda p_1^\mu, (1 - \lambda) p_1^\mu, p_2^\mu, \ldots, p_M^\mu), \tag{2.2}$$

for any soft momentum $p_0^\mu$ and collinear splitting fraction $\lambda \in [0, 1]$. These conditions correctly rule out some observables from having a perturbative description, such as the number of particles in an event, which changes by a finite amount under any splitting. Exact IRC invariance, however, is not sufficiently restrictive to guarantee perturbative calculability of an observable. For instance, the number of calorimeter cells with non-zero energy is safe according to Eqs. (2.1) and (2.2), though it is highly sensitive to arbitrarily low-energy effects [69]. Similarly, the pseudo-multiplicity, which we define as the smallest $N$ that yields
Another common statement of IRC invariance refines the concept by invoking the limit as particles become soft or collinear [30,31,34,36]:

$$\mathcal{O}(p_1^\mu, \ldots, p_M^\mu) = \lim_{\epsilon \to 0} \mathcal{O}(\epsilon\, p_0^\mu, p_1^\mu, \ldots, p_M^\mu), \tag{2.3}$$

$$\mathcal{O}(p_1^\mu, \ldots, p_M^\mu) = \lim_{p_0^\mu \to p_1^\mu} \mathcal{O}(\lambda p_0^\mu, (1 - \lambda) p_1^\mu, p_2^\mu, \ldots, p_M^\mu). \tag{2.4}$$

One issue with this definition is that many reasonable observables that have hard boundaries in phase space are excluded, such as jet kinematics, which are sensitive to particles on a jet boundary. Hybrid definitions mixing exact and near IRC invariance also appear in the literature, but they suffer from the same pathologies. Another issue is that Eqs. (2.3) and (2.4) (and also Eqs. (2.1) and (2.2)) do not guarantee insensitivity to multiple soft or collinear splittings. Several of these issues were previously identified in Ref. [36], which utilized a limit-based statement of IRC invariance, recognized the importance of allowing for multiple soft and collinear emissions, and allowed for exceptions on sets of measure zero. Despite noting that a rigorous mathematical definition of IRC safety would be desirable, Ref. [36] concluded that formulating one without pathologies was challenging and that a satisfactory definition had not yet been obtained. Here, we explore how the geometric picture provided by the EMD yields a natural and elegant way to phrase IRC safety and to control these various subtleties. This builds on the notion of "C-continuity" advocated for in Refs. [2,7], which argue that the perturbative calculability of C-continuous observables can be seen by relating the energy flow to the stress-energy tensor of the underlying quantum field theory.
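The invariance statements reviewed in this subsection can be checked numerically on simple observables. The sketch below uses a planar toy event, a list of (energy, angle) pairs, and an additive observable with angular weight $1 - \cos$ of the angle to a fixed axis. The layout, function names, and choice of weight are illustrative assumptions. It contrasts the additive observable, which is unchanged by an exactly zero-energy particle and by an exactly collinear splitting, with the particle multiplicity, which jumps by one.

```python
import math

def multiplicity(event):
    """Number of particles: changes by a finite amount under any splitting."""
    return len(event)

def additive_observable(event, axis_angle=0.0):
    """Sum_i E_i f(n_i) with f = 1 - cos(angle to a fixed axis)."""
    return sum(E * (1.0 - math.cos(a - axis_angle)) for E, a in event)

def add_soft(event, angle, energy=0.0):
    """Add a particle of the given (possibly zero) energy at `angle`."""
    return event + [(energy, angle)]

def collinear_split(event, index, lam):
    """Split particle `index` into two exactly collinear pieces, lam and 1 - lam."""
    E, a = event[index]
    return event[:index] + [(lam * E, a), ((1.0 - lam) * E, a)] + event[index + 1:]
```

Adding a particle of small energy `eps` instead of exactly zero changes the additive observable by at most `2 * eps`, which vanishes in the soft limit, whereas the multiplicity still changes by one however small `eps` is.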

Infrared and collinear safety in the space of events
The EMD provides a natural language for understanding IRC-safe observables as continuous functions on the space of events. To make this precise, we first must understand which observables are well-defined functions of the energy flow.
We can show that observables that are defined on all energy flows are precisely those which have exact IRC invariance according to Eqs. (2.1) and (2.2). First, an observable is well defined on the space of energy flows if its value is the same on events that are zero EMD apart. The following lemma establishes the remaining connection to exact IRC invariance.

Lemma. Two events are zero EMD apart if and only if they differ only by zero-energy particles and exactly collinear splittings.

Proof. Adding a zero energy particle or a collinear splitting to an event manifestly does zero energy moving, proving the forward direction. To prove the reverse direction, suppose that two events are zero EMD apart and take their energy flows to be:

$$\mathcal{E}(\hat n) = \sum_{i=1}^{M} E_i\, \delta(\hat n - \hat n_i), \qquad \mathcal{E}'(\hat n) = \sum_{j=1}^{M'} E'_j\, \delta(\hat n - \hat n'_j).$$

Since the EMD is a proper metric between energy flows, the identity of indiscernibles says that $\mathrm{EMD}(\mathcal{E}(\hat n), \mathcal{E}'(\hat n)) = 0$ implies $\mathcal{E}(\hat n) = \mathcal{E}'(\hat n)$. For any direction $\hat n$ with at least one particle, either the sums of energies in that direction are equal between the two events or the particle has zero energy. In the first case, the events differ by exactly collinear splittings in that direction, and in the second case they differ by zero energy particles.
By this lemma we see that exact IRC invariance ensures that we can write $\mathcal{O}(\mathcal{E})$ rather than $\mathcal{O}(p_1^\mu, \ldots, p_M^\mu)$ for an observable. As discussed in Sec. 2.1, exact IRC invariance is insufficient to guarantee IRC safety and we must formulate a stronger condition phrased in the geometric language of the space of events.
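The content of the lemma can be illustrated with a minimal sketch that reduces a particle list to its energy flow, represented here as a dictionary from direction to total energy. The representation and the function name are assumptions made for illustration. Exact comparison of floating-point directions and energies is a simplification that works for the exactly collinear, exactly zero-energy cases the lemma concerns.

```python
from collections import defaultdict

def energy_flow(particles):
    """Reduce a list of (E, n) pairs to {direction: total energy}.

    Zero-energy particles are dropped and exactly collinear particles are
    summed, so two particle lists related by those operations map to the
    same dictionary.
    """
    flow = defaultdict(float)
    for E, n in particles:
        if E > 0.0:
            flow[tuple(n)] += E
    return dict(flow)
```

Two particle lists that differ by an exactly collinear splitting and by an extra zero-energy particle, for instance, give identical dictionaries, which is why an observable with exact IRC invariance can be regarded as a function of the energy flow alone.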
We propose that IRC safety is achieved by requiring an observable to be EMD continuous, in the sense of Definition 1, except possibly on a negligible set of events. We define a negligible set to be one that contains no EMD ball. The (open) EMD ball $B_r(\mathcal{E})$ around an event $\mathcal{E}$ is defined as all events within an EMD of $r > 0$:

$$B_r(\mathcal{E}) = \left\{ \mathcal{E}' \in \Omega \;\middle|\; \mathrm{EMD}(\mathcal{E}, \mathcal{E}') < r \right\},$$

where $\Omega$ is the space of all energy flows. Implicit in the above requirement is that an observable must be well defined on energy flows. Concretely, we state IRC safety as the following:

Infrared and Collinear Safety. An observable is IRC safe if it is EMD continuous for all energy flows, except potentially on a negligible set of events.
This new formulation of IRC safety has many aspects of existing ideas of safety discussed in Sec. 2.2 wrapped into a concise and rigorous statement. It makes mathematically precise the intuitive notion that small perturbations in the energy flow of the event give rise to small perturbations in the observable. This notion of EMD continuity for IRC safe observables is illustrated in Fig. 3. The exception for negligible sets allows observables to be discontinuous in a way that affords them the opportunity to depend sharply on phase space but does not spoil their calculability. Calculability is a statement about integrability, and removing a negligible set of points from an integral cannot change its value.
To get some familiarity with this definition, consider additive IRC-safe observables, which are ubiquitous structures [17] of the form $\mathcal{O}(\mathcal{E}) = \sum_{i=1}^{M} E_i\, f(\hat n_i)$ for an angular function $f$. One can prove that they are Lipschitz continuous in the space of events assuming $f$ is Lipschitz continuous [1], and therefore they naturally satisfy continuity according to the EMD. As a generalization of additive observables, energy flow networks [16] are a machine learning architecture that can approximate any IRC-safe observable through an additive IRC-safe latent space. As long as the activation functions are continuous almost everywhere, the final energy flow network output will be IRC safe.
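The Lipschitz statement can be sketched in a few lines for two events of equal total energy, assuming the additive form $\mathcal{O}(\mathcal{E}) = \sum_i E_i f(\hat n_i)$ and the $\beta = 1$ EMD without the energy-difference term. The symbols $\pi_{ij}$ (any transport plan with $\sum_j \pi_{ij} = E_i$ and $\sum_i \pi_{ij} = E'_j$) and $L$ (the Lipschitz constant of $f$ with respect to the angular distance $\theta$) are notation introduced only for this sketch.

```latex
\begin{align}
|\mathcal{O}(\mathcal{E}) - \mathcal{O}(\mathcal{E}')|
  &= \Big| \sum_{i} E_i f(\hat n_i) - \sum_{j} E'_j f(\hat n'_j) \Big|
   = \Big| \sum_{i,j} \pi_{ij} \big[ f(\hat n_i) - f(\hat n'_j) \big] \Big| \\
  &\le L \sum_{i,j} \pi_{ij}\, \theta_{ij} .
\end{align}
```

Since the bound holds for every admissible plan, it holds for the optimal one, which gives $|\mathcal{O}(\mathcal{E}) - \mathcal{O}(\mathcal{E}')| \le L\, \mathrm{EMD}(\mathcal{E}, \mathcal{E}')$ under the stated assumptions.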
There are also observables that fail the criteria of Eqs. (2.3) and (2.4) for small sets of events but are safe according to our definition and are indeed calculable. The energy of a jet is a simple example where emissions on the jet boundary result in discontinuous behavior of the observable, but this discontinuity is integrable in fixed-order perturbation theory. A more complicated example is the invariant mass after soft drop grooming [38,68]: for events on the threshold of having an emission dropped, tiny perturbations can give rise to discontinuously large changes in the observable. This issue, however, only occurs on a negligible set, satisfying our definition of safety and avoiding serious analytic pathologies [78][79][80][81]. Piecewise continuity does, however, complicate analyzing the nonperturbative corrections [82] and detector response [83,84] of soft-dropped jet mass.
Our definition also includes observables that would sometimes not be called IRC safe since they do not have a well-defined Taylor expansion in the small parameter of the theory (e.g. $\alpha_s$ for QCD). These observables are nevertheless perturbatively calculable, though methods beyond fixed-order perturbation theory may be required. The next subsections are devoted to exploring which IRC-safe observables are calculable in fixed-order perturbation theory and which require additional techniques.

Calculability in fixed-order perturbation theory
IRC safety has long been connected with the notion of calculability order-by-order in perturbative quantum field theory. However, IRC safety according to our Definition 1 includes observables that are not calculable in fixed-order perturbation theory, which we explore further in the next subsection. Here, building off the work in Refs. [32,36], we formulate the stronger notion of EMD Hölder continuity [62,63] and argue that it is the appropriate condition to guarantee order-by-order perturbative control. Note that the case of $\alpha = 1$ corresponds to Lipschitz continuity at $\mathcal{E}$, and in general we have containment such that Hölder continuity with exponent $\alpha$ implies Hölder continuity with exponent $\beta$ if $\beta \le \alpha$. EMD Hölder continuity effectively specifies that the $\delta$ in Definition 1 is no smaller than $\epsilon$ to some power (times a constant) for all points in a neighborhood of $\mathcal{E}$, and thus it is a stronger requirement than plain EMD continuity.
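For orientation, the standard Hölder condition from analysis can be transcribed with the EMD as the metric. The block below is a sketch under that assumption, not a quotation of the definition used in this paper; the constant $K$, the radius $r$, and the restriction $0 < \alpha \le 1$ are notation assumed here.

```latex
\begin{equation}
|\mathcal{O}(\mathcal{E}) - \mathcal{O}(\mathcal{E}')| \;\le\; K\, \mathrm{EMD}(\mathcal{E}, \mathcal{E}')^{\alpha}
\qquad \text{for all } \mathcal{E}' \text{ with } \mathrm{EMD}(\mathcal{E}, \mathcal{E}') < r .
\end{equation}
```

Under this form, any $\epsilon > 0$ is met by the choice $\delta = \min\big(r, (\epsilon/K)^{1/\alpha}\big)$, which makes explicit how $\delta$ is bounded below by a power of $\epsilon$ times a constant.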
To connect to fixed-order perturbation theory, we state the following conjecture: An observable is calculable order-by-order in perturbation theory if it is EMD Hölder continuous on all but a negligible set of events in each N -particle manifold.
This relation phrases the ideas of Ref. [32] and "Version 2" of the IRC safety definition of Ref. [36] in our geometric language via the EMD. While these criteria were originally formulated for the calculability of moments of an observable, they appear to also extend to the calculability of distributions of observables [85]. It is possible to demonstrate a precise equivalence between our Conjecture 1 and the criteria of Ref. [32], Eqs. (2.8) and (2.9), regarding when the average value of an observable $\mathcal{O}$ is calculable in fixed-order perturbation theory, where the powers $a$ and $b$ are positive and the choices of $i$ and $j$ are arbitrary. Here, Eq. (2.8) is a statement of Hölder continuity in the energy of particle $i$, which implies ordinary soft safety. Similarly, Eq. (2.9) is a statement of Hölder continuity in the angular distance between particles $i$ and $j$, which implies ordinary collinear safety. In these soft and collinear limits, $\mathrm{EMD}(\mathcal{E}, \mathcal{E}') \propto E_i$ and $\mathrm{EMD}(\mathcal{E}, \mathcal{E}') \propto \theta_{ij}$ respectively, and so Eqs. (2.8) and (2.9) can be phrased compactly as Eq. (2.10) for some positive exponent $c$. This is equivalent to the Hölder continuity of the observable $\mathcal{O}$ at $\mathcal{E}$ with some exponent $\alpha \ge c$, connecting the formulation of Ref. [32] to our conjecture.
Our Conjecture 1 also nicely connects to "Version 2" of the IRC safety definition in Ref. [36], which we restate here with a suggestive relabeling of the original notation. The criteria for fixed-order calculability of an observable in Ref. [36] are as follows:
Ref. [36]: Given almost any fixed set of particles and any value $n$, then for any $\epsilon > 0$, however small, there should exist a $\delta > 0$ such that producing $n$ extra soft or collinear emissions, each emission being at a distance of no more than $\delta$ from the nearest particle, then the value of the observable does not change by more than $\epsilon$. Furthermore, there should exist a positive power $c$ such that for small $\epsilon$, $\delta^c$ can always be taken greater than $\epsilon$.
By equipping the space of events with these topological and geometric structures via the EMD, our framework provides a natural language in which to formulate this discussion with mathematical precision. The first sentence can be encoded as EMD continuity of the observable on all but a negligible set of events. The power relation between the $\epsilon$ and $\delta$ parameters is precisely captured by EMD Hölder continuity with some exponent $\alpha > c$, connecting to our Conjecture 1.
A variety of observables are considered in Ref. [36] at the boundary of perturbative calculability, which helpfully illustrate the various requirements in their definition (see footnote 4). An observable that is useful to consider is the one given in Eq. (2.11), where $\mathcal{T}_N$ are the $N$-jettiness observables [44] discussed further in Sec. 3.1.4, and $E$ is the total energy of the event. We will refer to this observable as the "V parameter". The double logarithmic structure of $\mathcal{T}_3$ spoils the integrability of $V$ at fixed order due to its behavior as $\mathcal{T}_3$ goes to zero [36], i.e. on approach to the three-particle manifold $\mathcal{P}_3$. Nonetheless, this observable can be calculated using techniques beyond fixed-order perturbation theory, such as the Sudakov safety approach discussed in the next section.
The relation between our formalism and fixed-order perturbative calculability is phrased as a conjecture since additional subtleties or nuances about this type of calculability may emerge with future research. Nonetheless, it is very satisfying that our geometric language provides an efficient encapsulation and unification of the existing formulations of Refs. [32,36]. In future work, it would be interesting to find a geometric phrasing of recursive IRC safety [36], which is a more restrictive condition than EMD Hölder continuity and relevant for understanding factorization and resummation. It would also be interesting to find a geometric phrasing of unsafe observables that can nevertheless be computed with the help of nonperturbative fragmentation functions (see Ref. [86] for a broad class of such observables). We hope that further refinements and developments will benefit from and be enabled by the rigorous geometric and topological constructions we have introduced for the space of events via the EMD.

A refined understanding of Sudakov safety
Sudakov-safe observables [37][38][39] are an interesting class of observables that are not typically considered IRC safe because divergences may appear order by order in perturbation theory; this issue was originally pointed out in Ref. [57]. Nevertheless, the distribution for a Sudakov-safe observable $\mathcal{O}_s$ can be computed perturbatively by calculating its conditional distribution with an IRC-safe companion observable $\mathcal{O}_c$, resumming the $\mathcal{O}_c$ distribution, and then marginalizing over $\mathcal{O}_c$ to obtain a finite answer [39]:
$$p(\mathcal{O}_s) = \int d\mathcal{O}_c\; p(\mathcal{O}_s \mid \mathcal{O}_c)\, p(\mathcal{O}_c). \qquad (2.12)$$
The conditional probability $p(\mathcal{O}_s \mid \mathcal{O}_c)$ can either be computed in fixed-order perturbation theory or it can be further resummed to obtain a more accurate prediction for $p(\mathcal{O}_s)$.
Here, we interpret Sudakov-safe observables as observables that are IRC safe according to our definition but may be EMD (Hölder) discontinuous on sets with non-zero measure when restricted to some idealized massless N -particle manifold P N , defined in Eq. (1.5). The relevant manifolds are the N -particle manifolds since these contain the infrared singular regions of massless gauge theories, namely configurations that differ by soft and collinear splittings. The IRC safety of an observable according to our definition guarantees that any potentially problematic energy flows are infinitesimally close to energy flows for which the observable is well defined. The strategy in Eq. (2.12) also enables the computation of observables such as the V parameter in Eq. (2.11), which are EMD continuous everywhere but exhibit Hölder discontinuities on sets with non-zero measure in P N and are therefore incalculable with fixed-order perturbation theory alone.
It is instructive to make a connection to practical methods of computing Sudakov-safe observables. In a quantum field theory of massless particles, the cross section to produce events with exactly N particles is zero (i.e. the naive S-matrix is zero), and such theories ultimately yield smooth predictions in the space of events. Hence, divergences that appear in the calculation of such an observable in a fixed-order expansion can be regulated by a joint, all-orders calculation of the observable and the distance from the problematic manifold P N . This is precisely the strategy represented by Eq. (2.12), though Ref. [39] did not provide a generic method to identify the companion observable O c . In Sec. 3, we will establish that the distance from an event to the manifold P N is precisely N -(sub)jettiness [44,47,48], suggesting that they are universal companion observables for the calculation of Sudakov-safe observables, in a similar spirit to Refs. [87][88][89].
It is worth mentioning that, even if an observable is EMD Hölder continuous everywhere, resummation along the lines of Eq. (2.12) may still be beneficial for making reliable predictions. The C-parameter [71][72][73] is an example of an EMD Hölder continuous observable, yet its fixed-order perturbative distribution exhibits discontinuous behavior at $C = 3/4$ [74]. This perturbative discontinuity can be smoothed through soft-gluon resummation, and such techniques are relevant for other observables that exhibit Sudakov shoulder behavior [90]. This is different, however, from Sudakov-safe observables, where the observable itself (and not just its distribution) is ill-defined on some $\mathcal{P}_N$.
To summarize, our definition of IRC safety does include Sudakov-safe observables, but we argue that this is appropriate since such observables are indeed perturbatively accessible via regulation with $N$-(sub)jettiness.

Observables: Distances between events and manifolds
In this section, we show that a number of event-level and jet substructure observables can be identified as geometric quantities in the space of events. Broadly speaking, the observables we consider take the general form of a distance between an event and a manifold, as in Eq. (1.7). The illustration in Fig. 4 shows an observable as a distance between geometric objects in the space of events. We will work with unnormalized observables here, but normalized versions can be obtained by dividing by the total energy (or transverse momentum in the hadronic case).
We begin by discussing thrust and spherocity, where the manifold is the set of all back-to-back two-particle events. To understand (recoil-free) broadening, we expand the manifold to all two-particle events, beyond just back-to-back configurations. Then, to connect to $N$-jettiness, we utilize the idealized $N$-particle manifold defined in Eq. (1.5). Our geometric language gives clear and intuitive explanations of what physics these observables probe and why they take the forms that they do. Finally, we identify jet angularities and $N$-subjettiness as jet substructure observables obeying similar principles at the level of jets.

Observable         β    Manifold M
Thrust t(E)        2    P_2^BB: 2-particle events, back to back
Spherocity s(E)    1    P_2^BB: 2-particle events, back to back
Broadening b(E)    1    P_2: 2-particle events

Table 4: Observables as the EMD between the event E and a manifold M, using the EMD definition in Eq. (3.2). Several of these observables are illustrated in Fig. 5. Here, we consider only the "recoil-free" versions of these observables.
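For most of the observables in this section, the $R$ parameter is not needed, in which case we define a notion of EMD relevant for comparing events with equal energies:
$$\mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}') = \lim_{R \to \infty} R^\beta\, \mathrm{EMD}_{\beta,R}(\mathcal{E}, \mathcal{E}'). \qquad (3.1)$$
This only has a finite limit if $\mathcal{E}$ and $\mathcal{E}'$ have the same total energy, which is a useful property to simplify our analysis. Explicitly, when comparing events with equal energy, this EMD simplifies to:
$$\mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}') = \min_{\{f_{ij} \ge 0\}} \sum_{i} \sum_{j} f_{ij}\, \theta_{ij}^\beta, \qquad \sum_{j} f_{ij} = E_i, \quad \sum_{i} f_{ij} = E'_j. \qquad (3.2)$$
This will be the precise notion of EMD we use when the $R$ subscript is suppressed.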
In Table 4, we summarize some of the observables considered below and their geometric interpretations. In Fig. 5, we illustrate the geometric construction of many of these observables, which we will explore in detail below.
Figure 5: An illustration of a variety of observables as distances between an event $\mathcal{E}$ and various manifolds in the space of events, as summarized in Table 4. (a) Thrust $t$ is the smallest distance from the event to the manifold $\mathcal{P}_2^{\rm BB}$ of two-particle back-to-back events, while event isotropy $\mathcal{I}$ is the distance to the uniform event $\mathcal{U}$. (b) $N$-jettiness observables $\mathcal{T}_N$ are the smallest distances from the event to the $N$-particle manifolds $\mathcal{P}_N$.

Thrust
Thrust is an observable that quantifies the degree to which an event is pencil-like [40,41,91]. It has been experimentally measured [92][93][94][95][96][97][98][99][100][101][102][103][104] and theoretically calculated [105][106][107][108][109][110] in detail for electron-positron collisions. Thrust seeks to find an axis $\hat n$ (the "thrust axis") such that most of the radiation lies in the direction of either $\hat n$ or $-\hat n$; i.e. it maximizes the amount of radiation longitudinal to the thrust axis. While a variety of conventions for defining thrust exist, here we use the following dimensionful definition:
$$t(\mathcal{E}) = 2 \min_{\hat n} \sum_{i=1}^{M} \big( |\vec p_i| - |\vec p_i \cdot \hat n| \big), \qquad (3.4)$$
where $\hat n_i = \vec p_i / |\vec p_i|$ and other definitions follow by simple rescalings. A thrust value of zero corresponds to an event consisting of two back-to-back prongs, while its maximum value of the total energy corresponds to a perfectly spherical event.
Interestingly, the value of thrust in Eq. (3.4) is equivalent to the cost of an optimal transport problem. This connection will allow us to cast thrust as a simple geometric quantity written in terms of the EMD. Using $E_i = |\vec p_i|$ for massless particles and writing out the absolute value, we can cast Eq. (3.4) as:
$$t(\mathcal{E}) = \min_{\hat n} \sum_{i=1}^{M} E_i \min\big( 2 - 2\, \hat n_i \cdot \hat n,\; 2 + 2\, \hat n_i \cdot \hat n \big). \qquad (3.5)$$
For a fixed $\hat n$, the summand in Eq. (3.5) is the transportation cost to move particle $i$ to the closer of $\hat n$ or $-\hat n$ with an angular measure of $\theta_{ij}^2 = 2 n_i^\mu n_{j\mu} = 2(1 - \hat n_i \cdot \hat n_j)$. The sum is then the EMD between the event and a two-particle event consisting of back-to-back particles directed along $\hat n$, where the energy of each of the two particles is equal to the total energy in the corresponding hemisphere. The minimization over $\hat n$ is equivalent to a minimization over all such two-particle events.
Thus, thrust is our first example of an observable that can be cast in the form of Eq. (1.7). First, we define the manifold of back-to-back two-particle events:
$$\mathcal{P}_2^{\rm BB} = \{ \mathcal{E}' \in \mathcal{P}_2 \mid \hat n_1 = -\hat n_2 \}. \qquad (3.6)$$
Then, using the notation of Eq. (3.2) with $\beta = 2$ (see footnote 5), thrust is the smallest EMD from the event to the $\mathcal{P}_2^{\rm BB}$ manifold:
$$t(\mathcal{E}) = \min_{\mathcal{E}' \in \mathcal{P}_2^{\rm BB}} \mathrm{EMD}_2(\mathcal{E}, \mathcal{E}'),$$
where the minimization is carried out over all back-to-back two-particle configurations. Because of the $R \to \infty$ limit in Eq. (3.1), the optimal back-to-back configuration is guaranteed to have the same total energy as the event $\mathcal{E}$, as desired. Note that even if this analysis is carried out in the center-of-mass frame, the optimal back-to-back configuration will generically not be at rest, since it involves two massless particles with different energies (see footnote 6). This suggests a possible variant of thrust where one restricts the two-particle manifold to only include events that are physically accessible, either by forcing $E_1 = E_2$ or by considering massive particles as in App. A.
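The transport-cost reading of thrust can be sketched numerically. The Python below is an illustration only: it evaluates the cost $\sum_i E_i \min(2 - 2\,\hat n_i \cdot \hat n,\, 2 + 2\,\hat n_i \cdot \hat n)$ implied by the angular measure $\theta^2 = 2(1 - \hat n_i \cdot \hat n_j)$ for a given axis, and then takes the smallest value over a finite list of trial axes. The function names, the trial-axis list, and the example events are hypothetical, and scanning a few axes only approximates the true minimization over all directions.

```python
import math


def unit(v):
    """Return v rescaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def dot(a, b):
    """Euclidean dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def thrust_cost(event, axis):
    """Cost of moving each particle to the closer of +axis or -axis.

    event: list of (energy, direction) pairs for massless particles.
    Uses the beta = 2 angular measure theta^2 = 2 (1 - n_i . n).
    """
    n = unit(axis)
    total = 0.0
    for energy, direction in event:
        c = dot(unit(direction), n)
        total += energy * min(2.0 - 2.0 * c, 2.0 + 2.0 * c)
    return total


def thrust_over_candidates(event, candidate_axes):
    """Smallest cost over a finite, user-supplied list of trial axes."""
    return min(thrust_cost(event, axis) for axis in candidate_axes)


# Made-up example events: (energy, direction).
back_to_back = [(3.0, (0, 0, 1)), (2.0, (0, 0, -1))]
three_prong = [(1.0, (0, 0, 1)), (1.0, (0, 0, -1)), (1.0, (1, 0, 0))]
axes = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

print(thrust_over_candidates(back_to_back, axes))
print(thrust_over_candidates(three_prong, axes))
```

The back-to-back example has unequal energies in the two hemispheres and still costs nothing along its own axis, in line with the remark that the optimal two-particle configuration need not be at rest.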

Spherocity
Spherocity is an observable that also probes the jetty nature of events [42]. It seeks to find an axis that minimizes the amount of radiation in the event transverse to it according to the following criterion:
$$s(\mathcal{E}) = \min_{\hat n} \Big( \sum_{i=1}^{M} E_i\, |\hat n_i \times \hat n| \Big)^2, \qquad (3.8)$$
where the original definition of spherocity is related to this by an overall rescaling. In the small $s$ limit, where the event configurations are back to back, we can write $|\hat n_i \times \hat n| \simeq \sqrt{2(1 - |\hat n_i \cdot \hat n|)}$ and obtain:
$$s(\mathcal{E}) \simeq \min_{\hat n} \Big( \sum_{i=1}^{M} E_i \sqrt{2(1 - |\hat n_i \cdot \hat n|)} \Big)^2. \qquad (3.9)$$
We focus on this limiting form for the following discussion. Similar to the case of thrust, we can identify the spherocity expression to be minimized as an optimal transport problem. For a fixed $\hat n$, the summand in Eq. (3.9) is the cost to transport particle $i$ to the closer of $\hat n$ or $-\hat n$ with an angular measure of $\theta_{ij} = \sqrt{2 n_i^\mu n_{j\mu}}$ (see footnote 7). The sum is once again the EMD from the event to the manifold of back-to-back events, with the minimization over $\hat n$ interpreted as a minimization over the manifold. Spherocity, in the appropriate limit, is therefore the square of the smallest EMD (with $\beta = 1$) from the event to the manifold $\mathcal{P}_2^{\rm BB}$ from Eq. (3.6):
$$s(\mathcal{E}) \simeq \Big( \min_{\mathcal{E}' \in \mathcal{P}_2^{\rm BB}} \mathrm{EMD}_1(\mathcal{E}, \mathcal{E}') \Big)^2. \qquad (3.10)$$
Through this lens, spherocity differs from thrust (besides the overall exponent) solely in the angular weighting factor: $\beta = 1$ for spherocity and $\beta = 2$ for thrust. One could continue in this direction, defining the distance of closest approach for general $\beta$. (This is related to the event shape angularities [70], with a key difference being that angularities are traditionally measured with respect to the thrust axis.) Instead, we now turn towards enlarging the manifold itself.
Footnote 5: As mentioned in footnote 1, strictly speaking only the square root of $\mathrm{EMD}_2$ is a proper metric. Because the square root is a monotonic function, though, this has no impact on the interpretation of thrust as an optimal transport problem.
Footnote 6: We thank Samuel Alipour-fard for discussions related to this point.
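The small-$s$ replacement used above can be checked with a short expansion. Writing $\Omega_i \in [0, \pi/2]$ for the opening angle between particle $i$ and the closer of $\pm\hat n$ (a symbol introduced here only for this check), both expressions reduce to $\Omega_i$ at leading order:

```latex
\begin{align}
|\hat n_i \times \hat n| &= \sin\Omega_i
  = \Omega_i - \tfrac{1}{6}\,\Omega_i^3 + \mathcal{O}(\Omega_i^5), \\
\sqrt{2\,(1 - |\hat n_i \cdot \hat n|)} &= \sqrt{2\,(1 - \cos\Omega_i)}
  = 2 \sin\frac{\Omega_i}{2}
  = \Omega_i - \tfrac{1}{24}\,\Omega_i^3 + \mathcal{O}(\Omega_i^5).
\end{align}
```

The two angular measures therefore agree up to corrections of relative order $\Omega_i^2$, which is why the limiting form is adequate for nearly back-to-back configurations.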

Broadening
Recoil-free broadening [43] is an observable that is sensitive to two-pronged events that are not precisely back-to-back jets. Here we focus on recoil-free broadening, to be distinguished from the original jet broadening [111][112][113] which is defined in terms of the thrust axis. It differs from spherocity only in that it minimizes the same quantity over two "kinked" axes that need not be antipodal. Though subtle, this difference gives rise to very important theoretical differences between broadening and spherocity in the treatment of soft recoil effects [114], as discussed extensively in Ref. [43].
Here, we use the following definition of broadening:
$$b(\mathcal{E}) = \min_{\hat n_L, \hat n_R} \sum_{i=1}^{M} E_i \min(\theta_{iL}, \theta_{iR}), \qquad (3.11)$$
where $\theta_{iL}$ and $\theta_{iR}$ are the angular distances between particle $i$ and $\hat n_L$ and $\hat n_R$, respectively. The fact that $\hat n_L$ and $\hat n_R$ are minimized separately (rather than $\hat n_L = -\hat n_R$) is the key distinction between recoil-free broadening and previous observables. For a fixed $\hat n_L$ and $\hat n_R$, the summand in Eq. (3.11) is the cost to transport particle $i$ to the closer of $\hat n_L$ or $\hat n_R$ with an angular measure of $\theta_{ij} = \sqrt{2 n_i^\mu n_{j\mu}}$. The sum is then the EMD from the event to the manifold of all two-particle events, which need not be back-to-back, namely $\mathcal{P}_2$ from Eq. (1.5). The minimization over $\hat n_L$ and $\hat n_R$ is then interpreted as a minimization over this manifold.
Footnote 7: In fact, Eq. (3.8) is already an optimal transport problem, using $\theta_{ij} = \sin\Omega_{ij}$, where $\Omega_{ij}$ is the opening angle between particles $i$ and $j$. This has the same small angle behavior as $\theta_{ij} = 2 \sin(\Omega_{ij}/2)$.
Thus, broadening is the smallest EMD with $\beta = 1$ from the event to $\mathcal{P}_2$:
$$b(\mathcal{E}) = \min_{\mathcal{E}' \in \mathcal{P}_2} \mathrm{EMD}_1(\mathcal{E}, \mathcal{E}'). \qquad (3.12)$$
The geometrical formulation of broadening in Eq. (3.12) differs from that of spherocity in Eq. (3.10) only in that it does not restrict the manifold to back-to-back configurations. This distinction is important to extend these ideas beyond the two-particle manifold.

N -jettiness
$N$-jettiness [44] (see also Ref. [115]) is an observable that partitions an event into $N$ jet regions and, for hadronic collisions, a beam region. Without a beam region, it is defined based on a minimization procedure over $N$ axes:
$$\mathcal{T}_N^{(\beta)}(\mathcal{E}) = \min_{\hat n_1, \ldots, \hat n_N} \sum_{i=1}^{M} E_i \min\big( \theta_{i1}^\beta, \ldots, \theta_{iN}^\beta \big), \qquad (3.13)$$
where $\theta_{i1}$ through $\theta_{iN}$ are the angular distances between particle $i$ and axes $\hat n_1$ through $\hat n_N$, respectively. We immediately identify the summand as the cost of transporting particle $i$ to the nearest axis. For fixed $\hat n_1$ through $\hat n_N$, assigning the energy transported to each axis as the energy of that axis gives rise to an $N$-particle event. The expression to be minimized is then the EMD between the original event and that $N$-particle event. The minimization over $\hat n_1$ through $\hat n_N$ is interpreted as a minimization over all such $N$-particle events.
Therefore, $N$-jettiness is the smallest distance between the event and the manifold $\mathcal{P}_N$ of $N$-particle events. Equivalently, one can view it as the EMD to the best $N$-particle approximation of the event, and we return to this interpretation in Sec. 4.1. Thus, we have:
$$\mathcal{T}_N^{(\beta)}(\mathcal{E}) = \min_{\mathcal{E}' \in \mathcal{P}_N} \mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}'). \qquad (3.14)$$
We see that $N$-jettiness generalizes the geometric interpretation of broadening to a general $N$-particle manifold and a general angular weighting exponent $\beta$.
For hadronic collisions, initial state radiation and underlying event activity require the introduction of a "beam" (or out-of-jet) region [44,116,117]. This can be accomplished via the introduction of a beam distance $\theta_{i,\text{beam}}$ into the minimization of Eq. (3.13). There are many possible beam measures [49,118], including ones that involve optimizing over two beam axes $\hat n_a$ and $\hat n_b$. For simplicity, we focus on $\theta_{i,\text{beam}} = R^\beta$, which makes no explicit reference to the beam directions [48]. Dividing by an overall factor of $R^\beta$, this modified version of $N$-jettiness can be written as:
$$\mathcal{T}_N^{(\beta,R)}(\mathcal{E}) = \min_{\hat n_1, \ldots, \hat n_N} \sum_{i=1}^{M} E_i \min\Big( 1, \frac{\theta_{i1}^\beta}{R^\beta}, \ldots, \frac{\theta_{iN}^\beta}{R^\beta} \Big). \qquad (3.15)$$
This definition of $N$-jettiness is similar to Eq. (3.13), though now a particle can be closer to the beam than to any axis. In this case, we say that the particle is transported to the beam and removed for a cost $E_i$. The summand is then the cost to transport the event to an $N$-particle event plus the cost of removing any particles beyond $R$ from any axes.
Remarkably, this precisely corresponds to the EMD when formulated for events of different total energy. Namely, $N$-jettiness with this beam region is simply the smallest distance between the event and the manifold of $N$-particle events, with $R$ smaller than the radius of the space:
$$\mathcal{T}_N^{(\beta,R)}(\mathcal{E}) = \min_{\mathcal{E}' \in \mathcal{P}_N} \mathrm{EMD}_{\beta,R}(\mathcal{E}, \mathcal{E}'). \qquad (3.16)$$
Particles removed by the optimal transport procedure are interpreted as being part of the beam region. This fact will also be relevant in Sec. 4.2 for understanding sequential recombination jet clustering algorithms as geometric constructions in the space of events.
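To make the role of the beam term concrete, the sketch below evaluates the $N$-jettiness summand for a fixed set of axes, with and without the unit-cost beam option. It is an illustration under stated assumptions rather than a full implementation: the minimization over axes is not performed, the angular measure is taken to be $\theta_{ij} = \sqrt{2(1 - \hat n_i \cdot \hat n_j)}$, and the function names and example event are hypothetical.

```python
import math


def angle(n_i, n_j):
    """Angular measure theta_ij = sqrt(2 (1 - n_i . n_j)) for unit vectors."""
    c = sum(a * b for a, b in zip(n_i, n_j))
    return math.sqrt(max(0.0, 2.0 * (1.0 - c)))


def n_jettiness_fixed_axes(event, axes, beta=1.0, R=None):
    """Energy-weighted cost of sending each particle to its cheapest target.

    event: list of (energy, unit_direction); axes: list of unit directions.
    With R=None only the axes are available (beam-free summand).
    With R given, angles are scaled by R and removal to the beam costs 1.
    The axes are held fixed; no minimization over axes is attempted here.
    """
    total = 0.0
    for energy, direction in event:
        if R is None:
            costs = [angle(direction, a) ** beta for a in axes]
        else:
            costs = [1.0] + [(angle(direction, a) / R) ** beta for a in axes]
        total += energy * min(costs)
    return total


# Made-up event: two hard back-to-back particles plus one wide-angle emission.
event = [(10.0, (0, 0, 1)), (5.0, (0, 0, -1)), (1.0, (1, 0, 0))]
axes = [(0, 0, 1), (0, 0, -1)]

no_beam = n_jettiness_fixed_axes(event, axes, beta=1.0)
small_R = n_jettiness_fixed_axes(event, axes, beta=1.0, R=1.0)
large_R = n_jettiness_fixed_axes(event, axes, beta=1.0, R=2.0)
print(no_beam, small_R, large_R)
```

In this example the wide-angle particle sits at $\theta = \sqrt{2}$ from both axes, so for $R = 1$ it is cheaper to remove it to the beam, while for $R = 2$ it is cheaper to transport it to an axis.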

Event isotropy
Our new geometric phrasing of these classic collider observables highlights the types of configurations that they are designed to probe. Specifically, Eq. (1.7) can be interpreted as how similar an event is to the class of events on the manifold $\mathcal{M}$. This framework also suggests regions of phase space that are poorly resolved by existing observables and provides a prescription for developing new observables by identifying new manifolds of interest.
Event isotropy [45] is a recently proposed observable that provides a clear example of this strategy. It is based on the insight that distances from the $N$-particle manifolds (such as thrust and $N$-jettiness) are not well-suited for resolving isotropic events with uniform radiation patterns. Having observables with sensitivity to isotropic events can, for instance, improve new physics searches for microscopic black holes or strongly-coupled scenarios. This motivates event isotropy, which is the distance between the event $\mathcal{E}$ and an isotropic event $\mathcal{U}$ of the same total energy:
$$\mathcal{I}^{(\beta)}(\mathcal{E}) = \mathrm{EMD}_\beta(\mathcal{U}, \mathcal{E}). \qquad (3.17)$$
Since $\mathcal{E}$ and $\mathcal{U}$ have the same total energy by construction, it is natural to normalize event isotropy by the total energy to make it dimensionless. The analysis in Ref. [45] focused primarily on $\beta = 2$, though this approach can be extended to a general angular exponent. For practical applications, it is convenient to consider a manifold of quasi-isotropic events of the same total energy and then estimate event isotropy as the average EMD between an event and this manifold.
We can cast Eq. (3.17) into the form of Eq. (1.7) by introducing a manifold $\mathcal{M}_{\mathcal{U}}$ of uniform events with varying total energies: (3.18) The $R \to \infty$ limit in Eq. (3.1) enforces that the optimal isotropic approximation $\mathcal{U}$ has the same total energy as $\mathcal{E}$, as in the original event isotropy definition.
The particular notion of a uniform distribution depends on the collider context (spherical for electron-positron collisions and cylindrical or ring-like for hadronic collisions), with corresponding choices for the energy and angular measures. The case of ring-like isotropy at a hadron collider is particularly interesting, since there are known simplifications for one-dimensional circular optimal transport problems. For $\beta = 1$, ring-like event isotropy can be computed in $O(M)$ runtime [119] and there are fast approximations for any $\beta \ge 1$ [119]. This is much faster than the generic $O(M^3)$ expectation for EMD computations, motivating further studies of these one-dimensional geometries.

Jet angularities
Jet angularities are the energy-weighted angular moments of radiation within a jet [46] (see also Refs. [43,120,121]). Here, we use the following definition of a recoil-free jet angularity:
$$\lambda_\beta(\mathcal{J}) = \min_{\hat n} \sum_{i \in \mathcal{J}} E_i\, \theta_i^\beta, \qquad (3.19)$$
where $\theta_i$ is the angular distance between particle $i$ and an axis $\hat n$. The summand of an angularity is the EMD from the jet to the axis, so we can follow the analogous logic from our previous discussions of event shapes to reframe this observable in our geometric language. Specifically, the recoil-free angularities are the closest distance between the jet and the 1-particle manifold $\mathcal{P}_1$:
$$\lambda_\beta(\mathcal{J}) = \min_{\mathcal{J}' \in \mathcal{P}_1} \mathrm{EMD}_\beta(\mathcal{J}, \mathcal{J}'). \qquad (3.20)$$
One can alternatively consider a definition of angularities where $\theta_i$ is computed with respect to a fixed jet axis. In that case, the angularities are the EMD from the jet to a 1-particle configuration where the total energy of the jet is placed at the position of the desired axis.

N -subjettiness
$N$-subjettiness is a jet substructure observable that applies the ideas of $N$-jettiness at the level of jet substructure [47,48]. $N$ axes are placed within the jet, with a penalty for having energy far away from any axis, and then the positions of the axes are optimized. The (dimensionful) $N$-subjettiness of a jet $\mathcal{J}$ can be defined as follows:
$$\tau_N^{(\beta)}(\mathcal{J}) = \min_{\hat n_1, \ldots, \hat n_N} \sum_{i \in \mathcal{J}} E_i \min\big( \theta_{i1}^\beta, \ldots, \theta_{iN}^\beta \big),$$
where $\theta_{i1}$ through $\theta_{iN}$ are the angular distances between particle $i$ and axes $\hat n_1$ through $\hat n_N$. The beam region is absent due to the fact that these observables are only defined using the particles already within an identified jet.
Figure 6: An illustration of $N$-subjettiness values as the smallest distances, as measured by EMD, between the event $\mathcal{E}$ and each of the $N$-particle manifolds $\mathcal{P}_N$. The jet angularities are the distances to the 1-particle manifold $\mathcal{P}_1$. These observables form a set of "coordinates" for the space.
We can find a geometric interpretation for $N$-subjettiness by using the analogous discussion from $N$-jettiness in Sec. 3.1.4. $N$-subjettiness is the distance between the jet and the manifold of all $N$-particle jets:
$$\tau_N^{(\beta)}(\mathcal{J}) = \min_{\mathcal{J}' \in \mathcal{P}_N} \mathrm{EMD}_\beta(\mathcal{J}, \mathcal{J}').$$
As a limiting case, $N = 1$ corresponds to the jet angularities in Eq. (3.20).
In this way, we can view $N$-subjettiness values as "coordinates" for the space of jets, defined as distances from each of the $N$-particle manifolds, illustrated in Fig. 6. The $N$-subjettiness ratios $\tau_N/\tau_{N-1}$, used ubiquitously for jet substructure studies [122][123][124], are then the relative distances between the manifolds $\mathcal{P}_N$ and $\mathcal{P}_{N-1}$. This is also an interesting way to interpret existing constructions of observable bases using $N$-(sub)jettiness [125][126][127]; the fact that multiple $\beta$ values are typically needed for these constructions emphasizes that the choice of ground metric affects the geometry of the space induced by the EMD.
Figure 7: An illustration of jet clustering algorithms as projections to $N$-particle manifolds $\mathcal{P}_N$ in the space of events. (a) Exclusive cone finding algorithms yield $N$ jets as the closest $N$-particle approximation to the event, as measured by the EMD. (b) Sequential recombination algorithms iteratively find the best $(M-1)$-particle approximation to the $M$-particle event, either (dashed) merging two particles or (solid) removing a particle and calling it a jet.

Jets: The closest N-particle description of an M-particle event
In this section, we turn our attention to how jets are defined. We interpret two of the most common classes of jet algorithms as simple geometric constructions in the space of events. Intuitively, we find that jets are the best N-particle approximation to an M-particle event. Many existing techniques naturally emerge from this simple principle in fascinating ways.
First, we discuss exclusive cone finding, as this technique corresponds exactly to the intuition above that jets approximate the energy flow of an event using a smaller number of particles. Next, we show that many sequential recombination algorithms can be derived by iteratively approximating an M -particle event using M − 1 particles. These jet-finding strategies are illustrated in Fig. 7 as projections to N -particle manifolds in the space of events.

General N : Exclusive cone finding
XCone [49,50] is an exclusive cone finding algorithm that seeks to find jets by minimizing N -jettiness. It returns a fixed number of jets based on the parameters N and R, in the same spirit as the exclusive version of the k t sequential recombination algorithm [51]. XCone proceeds by finding the N axes that minimize N -jettiness as defined in Eq. (3.15): Together with the energy assigned to those axes, or equivalently the set of particles mapped to each axis, the N axes from Eq. (4.1) define N jets. The jet radius parameter R controls which particles are not assigned to any jet (i.e. assigned to the beam region). Following the discussion in Sec. 3.1.4, Eq. (4.1) can be interpreted as finding the N -particle configuration that best approximates the event of interest. In our geometric language, we can cast XCone as identifying the point of closest approach between an event E and the N -particle manifold P N : Different variants of XCone correspond to different choices for the energy weight E i and the angular measure θ ij [49,118], which in turn correspond to different choices for what defines the "best" N -particle approximation to an event.
As discussed in Ref. [128], there is a close relationship between exclusive cone finding algorithms, stable cone algorithms [129][130][131], and jet maximization algorithms [132][133][134][135][136]. For the choice of β = 2, the jet axis aligns with the jet momentum direction, which is known as the stable cone criterion [129,130]. For N = 1, one can relate the optimization problem in Eq. (4.2) to maximizing a "jet function" over all possible partitions of an event into one in-jet region and one out-of-jet region [132]. Iteratively applying the N = 1 procedure is related to the SISCone algorithm with progressive jet removal [131]. All of these various algorithms can now be interpreted in our geometric picture as different ways to "project" the event E onto the N -particle manifold P N .

N = M − 1: Sequential recombination
Sequential recombination algorithms are a class of jet clustering algorithms that have seen tremendous use at colliders, particularly the anti-k t algorithm [137] which is the current default jet algorithm at the LHC. These methods utilize an interparticle distance d ij , a particle-beam distance d iB , and a recombination scheme for merging two particles. The algorithm proceeds iteratively by finding the smallest distance, combining particle i and j if it is a d ij , or calling i a jet and removing it from further clustering if it is a d iB .
There are a variety of distance measures and recombination schemes that appear in the literature, many of which are implemented in the FastJet library [138]. The most commonly used distance measures take the form:

$$ d_{ij} = \min\left(E_i^{2p}, E_j^{2p}\right) \frac{\theta_{ij}^2}{R^2}, \qquad d_{iB} = E_i^{2p}, \tag{4.3} $$

where p is an energy weighting exponent and R is the jet radius. The exponent p = 1 corresponds to $k_t$ jet clustering [51,52], p = 0 corresponds to Cambridge/Aachen (C/A) clustering [139,140], and p = −1 corresponds to anti-$k_t$ clustering [137]. The recombination scheme determines the energy $E_c$ and direction $\hat{n}_c$ of the combined particle and typically takes the form:

$$ E_c = E_i + E_j, \qquad \hat{n}_c = \frac{E_i^\kappa\, \hat{n}_i + E_j^\kappa\, \hat{n}_j}{E_i^\kappa + E_j^\kappa}, \tag{4.4} $$

where κ = 1 corresponds to the E-scheme (most typically used), κ = 2 is the $E^2$-scheme [51,141], and κ → ∞ is the winner-take-all scheme [43,53,54]. In the E-scheme, the four-momenta of the two particles are simply added. In the winner-take-all scheme, the direction is determined by the more energetic particle.

The conceptual and algorithmic richness of these different distance measures and recombination schemes arose from decades of phenomenological studies. Remarkably, many of these techniques naturally emerge from event space geometry, as finding the point on the (M − 1)-particle manifold $\mathcal{P}_{M-1}$ that is closest to a configuration $\mathcal{E}$ with M particles. Note that the sequential recombination algorithms in Eqs. (4.3) and (4.4) depend on the two parameters p and κ, whereas Eq. (4.6) depends only on β, so the logic below will only identify a one-dimensional family of jet algorithms, as summarized in Table 5.

Table 5: Different sequential recombination measures $d_{ij}$ and recombination schemes $\lambda^*$ that emerge from an EMD formulation. A question mark indicates a method that, to our knowledge, does not yet appear in the literature. The traditional definitions of generalized $k_t$ and C/A require squaring $d_{ij}$ and $d_{iB}$. Note the factor of 2 in the C/A effective jet radius parameter.
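To make the roles of the exponents p and κ concrete, here is a minimal sketch of the generalized $k_t$ distances and of the κ-weighted recombination. It assumes the standard energy-based form $d_{ij} = \min(E_i^{2p}, E_j^{2p})\,\theta_{ij}^2/R^2$, $d_{iB} = E_i^{2p}$, and it represents directions as planar coordinates, which is a small-angle simplification; the function names are illustrative.

```python
import math


def generalized_kt_distances(e_i, e_j, theta_ij, p, R):
    """Pairwise and beam distances of the generalized k_t family (assumed form)."""
    d_ij = min(e_i ** (2 * p), e_j ** (2 * p)) * theta_ij ** 2 / R ** 2
    d_iB = e_i ** (2 * p)
    return d_ij, d_iB


def recombine(e_i, n_i, e_j, n_j, kappa=1.0):
    """Merge two particles; the direction is an E^kappa-weighted average."""
    e_c = e_i + e_j
    if math.isinf(kappa):
        # Winner-take-all: inherit the direction of the more energetic particle.
        n_c = n_i if e_i >= e_j else n_j
    else:
        w_i, w_j = e_i ** kappa, e_j ** kappa
        n_c = tuple((w_i * a + w_j * b) / (w_i + w_j) for a, b in zip(n_i, n_j))
    return e_c, n_c


# kappa = 1 (E-scheme-like), kappa = 2 (E^2-scheme), kappa = inf (winner-take-all).
schemes = [recombine(3.0, (0.0, 0.0), 1.0, (1.0, 0.0), k) for k in (1.0, 2.0, math.inf)]
```

As κ grows, the merged direction moves toward the more energetic particle, reaching it exactly in the winner-take-all limit.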
To derive this connection between event geometry and sequential recombination, we need the following simple yet profound lemma, using the suggestive notation of d iB and d ij to refer to the EMD cost of rearrangement.
Lemma 2. As measured by the EMD, the closest (M − 1)-particle event to an M-particle event has, without loss of generality, either:
(a) Two of the particles in the event merged together.
(b) One of the particles in the event removed.

Proof. Removing a particle from the event has some EMD cost $d_{iB}$ and merging a pair of particles has some EMD cost $d_{ij}$. To reduce the number of particles in the event by one, one can either remove a particle or merge two particles. Altering more than two particles by (re)moving fractions of additional particles always incurs additional EMD costs. If there are multiple pairs that are zero distance apart, then we can without loss of generality always choose to only merge one pair.
The two options in this lemma correspond precisely to the two possible actions at each stage of a sequential recombination algorithm. The EMD cost of removing a particle is always its energy:

$$ d_{iB} = E_i. \tag{4.5} $$

If this is less than the cost of merging any two particles together, then particle i can be identified as a jet. For one step of a sequential recombination (SR) procedure applied to an event $\mathcal{E}$ with M particles, we can express this mathematically as: In our geometric picture, if the M-particle event is "far away" from the (M − 1)-particle manifold $\mathcal{P}_{M-1}$, then the projected difference is a jet.
On the other hand, if the cost of merging two particles is less than any of the particle energies, then the event is "close" to the (M − 1)-particle manifold. Consider a pair of particles with energies $E_i$ and $E_j$ separated by a distance $\theta_{ij}$. To find the best (M − 1)-particle approximation, we want to merge these two particles into one combined particle with energy $E_i + E_j$. Because the EMD is a metric, the optimal transportation plan must occur along a "geodesic" connecting the particles, with particle i moving a distance $\lambda\,\theta_{ij}$ and particle j moving a distance $(1 - \lambda)\,\theta_{ij}$ for some λ ∈ [0, 1]. Minimizing this cost with respect to λ yields both the cost of merging those two particles as well as the optimal recombination scheme with which to merge them. Because no energy is removed in this process, Eq. (4.6) yields a zero energy jet, which we can interpret as no jet being found at this step of the sequential recombination.
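The cost of merging particles i and j depends on the jet radius parameter R and angular exponent β:

$$ d_{ij}(\lambda) = E_i \left(\frac{\lambda\,\theta_{ij}}{R}\right)^{\beta} + E_j \left(\frac{(1-\lambda)\,\theta_{ij}}{R}\right)^{\beta}. \tag{4.7} $$

For β ≤ 1, the cost in Eq. (4.7) is minimized at the endpoints. This corresponds to moving the less energetic particle the entire distance $\theta_{ij}$ to the more energetic particle, which is precisely the behavior of the winner-take-all recombination scheme. For β > 1, the optimal value $\lambda^*$ can be found by differentiating Eq. (4.7) with respect to λ and setting the result equal to zero. In general, the optimal recombination scheme has:

$$ \lambda^* = \frac{E_j^{\kappa}}{E_i^{\kappa} + E_j^{\kappa}}, \qquad \kappa = \frac{1}{\beta - 1}, \tag{4.8} $$

with the winner-take-all scheme recovered as κ → ∞. To determine the actual cost, we substitute this $\lambda^*$ back into Eq. (4.7):

$$ d_{ij} = \frac{E_i E_j}{\left(E_i^{\kappa} + E_j^{\kappa}\right)^{\beta - 1}} \frac{\theta_{ij}^{\beta}}{R^{\beta}}. \tag{4.9} $$

If the smallest $d_{ij}$ value in Eq. (4.9) is smaller than every particle energy in Eq. (4.5), then the optimal transportation plan is to merge the corresponding particles i and j.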
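The minimization over λ can be checked numerically. The sketch below assumes the merge cost $E_i(\lambda\theta_{ij}/R)^\beta + E_j((1-\lambda)\theta_{ij}/R)^\beta$ implied by the EMD, uses the stationary point $\lambda^* = E_j^\kappa/(E_i^\kappa + E_j^\kappa)$ with $\kappa = 1/(\beta-1)$ for β > 1 and the endpoint solution for β ≤ 1, and compares against a brute-force scan. All function names and the sample values are illustrative.

```python
def merge_cost(e_i, e_j, theta, lam, beta, R=1.0):
    """Cost of moving particle i by lam*theta and particle j by (1-lam)*theta."""
    return e_i * (lam * theta / R) ** beta + e_j * ((1.0 - lam) * theta / R) ** beta


def lambda_star(e_i, e_j, beta):
    """Optimal fraction of the separation traveled by particle i."""
    if beta <= 1.0:
        # Endpoint solution: the less energetic particle moves the whole way.
        return 1.0 if e_i <= e_j else 0.0
    kappa = 1.0 / (beta - 1.0)
    return e_j ** kappa / (e_i ** kappa + e_j ** kappa)


def d_ij(e_i, e_j, theta, beta, R=1.0):
    """Merge cost evaluated at the optimal lambda."""
    return merge_cost(e_i, e_j, theta, lambda_star(e_i, e_j, beta), beta, R)


def brute_force(e_i, e_j, theta, beta, R=1.0, steps=20000):
    """Scan lambda on a uniform grid in [0, 1] as an independent check."""
    return min(merge_cost(e_i, e_j, theta, k / steps, beta, R) for k in range(steps + 1))


checks = {b: (d_ij(3.0, 1.0, 0.4, b), brute_force(3.0, 1.0, 0.4, b)) for b in (0.5, 1.0, 1.5, 2.0, 3.0)}
# Large-beta check of the C/A-like limit: d_ij^(1/beta) approaches theta / (2R).
ca_limit = d_ij(3.0, 1.0, 0.4, 200.0) ** (1.0 / 200.0)
```

For β = 2 the closed form reduces to $E_i E_j \theta_{ij}^2 / ((E_i + E_j) R^2)$, and for large β the quantity `ca_limit` approaches $\theta_{ij}/(2R)$, which is the origin of the doubled effective jet radius.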
In this way, Eq. (4.6) takes an M -particle event and returns a jet (with zero energy if no actual jet is found) plus the remaining (M − 1)-particle approximation. This corresponds exactly to one step of a sequential clustering procedure. Iterating this procedure until M = 1, we derive a sequential recombination jet algorithm, where the jets correspond to all of the positive energy configurations obtained from Eq. (4.6).
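The iteration described above can be sketched as a short clustering loop. This is a minimal illustration under stated assumptions, not a reference implementation: the removal cost is taken to be the particle energy, the merge cost and merged direction follow from minimizing $E_i(\lambda\theta_{ij}/R)^\beta + E_j((1-\lambda)\theta_{ij}/R)^\beta$ over λ, directions are planar coordinates, and the last remaining particle is removed as a jet. The function names and the example event are hypothetical.

```python
import math


def merge_step(e_i, e_j, theta, beta, R):
    """Return (cost, lam): EMD cost of merging and the fraction moved by particle i."""
    if beta <= 1.0:
        lam = 1.0 if e_i <= e_j else 0.0
    else:
        kappa = 1.0 / (beta - 1.0)
        lam = e_j ** kappa / (e_i ** kappa + e_j ** kappa)
    cost = e_i * (lam * theta / R) ** beta + e_j * ((1.0 - lam) * theta / R) ** beta
    return cost, lam


def emd_sequential_recombination(particles, beta=2.0, R=1.0):
    """particles: list of (energy, (x, y)). Returns the jets in the same format."""
    particles = list(particles)
    jets = []
    while particles:
        # Cheapest removal: the lowest-energy particle, with cost equal to its energy.
        i_rm = min(range(len(particles)), key=lambda i: particles[i][0])
        best_cost, best_pair, best_lam = particles[i_rm][0], None, None
        for i in range(len(particles)):
            for j in range(i + 1, len(particles)):
                (e_i, n_i), (e_j, n_j) = particles[i], particles[j]
                theta = math.hypot(n_i[0] - n_j[0], n_i[1] - n_j[1])
                cost, lam = merge_step(e_i, e_j, theta, beta, R)
                if cost < best_cost:
                    best_cost, best_pair, best_lam = cost, (i, j), lam
        if best_pair is None:
            # Removing a particle is cheapest: it becomes a jet.
            jets.append(particles.pop(i_rm))
        else:
            i, j = best_pair
            e_j, n_j = particles.pop(j)  # pop the larger index first
            e_i, n_i = particles.pop(i)
            # Particle i moves a fraction lam of the way toward particle j.
            n_c = tuple(a + best_lam * (b - a) for a, b in zip(n_i, n_j))
            particles.append((e_i + e_j, n_c))
    return jets


event = [(10.0, (0.0, 0.0)), (5.0, (0.1, 0.0)), (8.0, (3.0, 0.0)), (0.5, (3.2, 0.0))]
jets = emd_sequential_recombination(event, beta=2.0, R=1.0)
```

In this example the two nearby pairs are merged first and the two resulting well-separated clusters are then removed as jets. The double loop over pairs is quadratic per step and is meant only to expose the logic.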
Many existing methods reside within the simple framework of Eq. (4.6). For instance, β = 1 corresponds to $k_t$ jet clustering with winner-take-all recombination. The recombination scheme for β = 2 is the E-scheme, whereas for β = 3/2 it is the $E^2$-scheme. Raising the distance measures to the 1/β power and taking the β → ∞ limit, we obtain the C/A clustering metric, albeit with an effective jet radius that is twice the R parameter. There are also a number of methods, indicated as question marks in Table 5, that emerge from this reasoning yet do not presently appear in the literature. Exploring these new methods is an interesting avenue for future work.
Intriguingly, in this geometric picture, the distance measure $d_{ij}$ and the recombination scheme $\lambda^*$ are paired by the β parameter. A similar pairing was noted in Refs. [49,142] in the context of choosing approximate axes for computing N-(sub)jettiness, and it would be interesting to explore the phenomenological implications of these paired choices for jet clustering. One sequential combination algorithm that does not appear is anti-$k_t$. Given that anti-$k_t$ is a kind of hybrid between sequential recombination and cone algorithms, there may be a way to combine the logic of Secs. 4.1 and 4.2 to find a geometric phrasing of anti-$k_t$. If successful, such a geometric construction would likely illuminate the difference between exclusive jet algorithms like XCone that find a fixed number of jets N and inclusive jet algorithms like anti-$k_t$ that determine N dynamically.

Figure 8: A visualization of pileup subtraction in the space of events as moving away from uniform radiation. This proceeds by finding the event $\mathcal{E}_C$ that, when combined with uniform contamination $\rho\,\mathcal{U}$, is most similar to the given event $\mathcal{E}$. Different pileup mitigation strategies implement this removal in different ways. In the figure above, Ω refers to the space of all energy flows and $\Omega + \rho\,\mathcal{U}$ is a subset of that space obtained by adding uniform contamination to every event configuration (shown as a separate manifold for ease of visualization).

Pileup subtraction: Moving away from uniform events
The LHC era has brought with it new collider data analysis challenges. One notable example is pileup mitigation [60], the removal of diffuse soft contamination from additional uncorrelated proton-proton collisions. The radiation from pileup interactions is approximately uniform in the rapidity-azimuth plane, and several existing pileup mitigation strategies seek to remove this uniform distribution of energy from the event [55,58,59,143-148].
In this section, inspired by the approximate uniformity of pileup, we consider a class of pileup removal procedures that can be described as "subtracting" a uniform distribution of energy with density ρ, denoted $\rho\,\mathcal{U}$, from a given event. We take the pileup density per unit area ρ to be given, for instance, by the area-median approach [55]. Given an event flow $\mathcal{E}$, the subtracted distribution $\mathcal{E} - \rho\,\mathcal{U}$ is typically not a valid energy flow, since the local energy density can go negative. Therefore, to implement this principle at the level of energy distributions, we turn this logic around and declare the corrected event $\mathcal{E}_C$ to be one that is as close as possible to the given event $\mathcal{E}$ when uniform radiation $\rho\,\mathcal{U}$ is added to it:

$$ \mathcal{E}_C = \operatorname*{argmin}_{\mathcal{E}' \in \Omega} \mathrm{EMD}_\beta\left(\mathcal{E}, \mathcal{E}' + \rho\,\mathcal{U}\right). \tag{5.1} $$

Here, Ω refers to the complete space of energy flows, and the R → ∞ limit of the EMD from Eq. (3.1) enforces that the corrected distribution $\mathcal{E}_C$ has the correct total energy. As illustrated in Fig. 8, one can visualize Eq. (5.1) as a procedure that subtracts a uniform component from the energy flow.

To make contact with existing techniques, we show that area-based Voronoi subtraction [55,56,138] and ghost-based constituent subtraction [58] can be cast in the form of Eq. (5.1) in the low-pileup limit. We then develop two new pileup mitigation techniques that have optimal transport interpretations even away from the low-pileup limit: Apollonius subtraction, which corresponds to exactly implementing Eq. (5.1) for β = 1, and iterated Voronoi subtraction, which repeatedly applies Eq. (5.1) with an infinitesimal ρ. Since pileup is characteristic of a hadron collider, throughout this section we compute the EMD using particle transverse momenta $p_{T,i}$ and rapidity-azimuth coordinates $\hat{n}_i = (y_i, \phi_i)$, with $\theta_{ij}$ being the rapidity-azimuth distance. Typically, pileup is taken to be uniform in a bounded region of the plane (e.g. $|y| < y_{\max}$), though the specifics will not significantly affect our analysis.
First, though, we establish an important lemma that justifies why the corrected distribution E C has a particle-like interpretation.

A property of semi-discrete optimal transport
There is a direct connection between pileup subtraction in Eq. (5.1) and semi-discrete optimal transport [149]. Semi-discrete means that we are comparing a discrete energy flow (i.e. one composed of individual particles) to a smooth distribution (i.e. uniform pileup contamination).
Importantly, if E is discrete, then the corrected distribution E C will also be discrete. This can be proved via the following lemma.
Lemma 3. E C defined according to Eq. (5.1) is strictly contained in E, where containment here means that E − E C is a valid distribution with non-negative particle transverse momenta.
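Proof. Suppose for the sake of contradiction that $\mathcal{E}_C$, defined according to Eq. (5.1), has some support where $\mathcal{E}$ does not. Let $\tilde{\mathcal{E}}$ be the distribution that $\mathcal{E}_C$ flows to when $\mathcal{E}_C + \rho\,\mathcal{U}$ is optimally transported to $\mathcal{E}$, noting that, by definition, $\tilde{\mathcal{E}}$ must be contained in $\mathcal{E}$. By the linear sum structure of Eq. (3.2) [150], we have the following relation:

$$ \mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}_C + \rho\,\mathcal{U}) = \mathrm{EMD}_\beta(\tilde{\mathcal{E}}, \mathcal{E}_C) + \mathrm{EMD}_\beta(\mathcal{E} - \tilde{\mathcal{E}}, \rho\,\mathcal{U}). \tag{5.2} $$

Now using the following property of $\mathrm{EMD}_\beta$ inherited from Wasserstein distances [149]:

$$ \mathrm{EMD}_\beta(\mathcal{E} + \mathcal{F}, \mathcal{E}' + \mathcal{F}) \le \mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}'), \tag{5.3} $$

with equality if β = 1 and the ground metric is Euclidean, we add $\tilde{\mathcal{E}}$ to both arguments of the last term in Eq. (5.2) and apply Eq. (5.3) to find:

$$ \mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}_C + \rho\,\mathcal{U}) \ge \mathrm{EMD}_\beta(\tilde{\mathcal{E}}, \mathcal{E}_C) + \mathrm{EMD}_\beta(\mathcal{E}, \tilde{\mathcal{E}} + \rho\,\mathcal{U}). \tag{5.4} $$

Now using that $\mathrm{EMD}_\beta(\tilde{\mathcal{E}}, \mathcal{E}_C) > 0$ by the assumption that they have different supports as well as the non-negativity of the EMD, we find:

$$ \mathrm{EMD}_\beta(\mathcal{E}, \mathcal{E}_C + \rho\,\mathcal{U}) > \mathrm{EMD}_\beta(\mathcal{E}, \tilde{\mathcal{E}} + \rho\,\mathcal{U}), \tag{5.5} $$

which contradicts the assumption that $\mathcal{E}_C$ is found according to Eq. (5.1). Thus, we conclude that $\mathcal{E}_C$ has no support outside of the support of $\mathcal{E}$, verifying the claim.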
This lemma establishes that pileup mitigation strategies defined by Eq. (5.1) act by scaling the energies of the particles in the original event $\mathcal{E}$, not by producing new particles. Indeed, this is a desirable feature of many popular pileup mitigation schemes, including two well-known methods that we describe next.

Voronoi area subtraction
Voronoi area subtraction [55,56,138] is a pileup mitigation technique that estimates a particle's pileup contamination by associating it with an area determined by its corresponding Voronoi region, i.e. the set of points in the plane closer to that particle than to any other [151]. Letting $A_i^{\rm Vor.}$ be the area of the Voronoi region of particle i, Voronoi subtraction then simply removes $\rho A_i^{\rm Vor.}$ from each particle's transverse momentum, without letting the particle $p_T$ become negative. If $\rho A_i^{\rm Vor.} \ge p_{T,i}$, then the particle is removed entirely. In Fig. 9a, we show the Voronoi regions for an example jet recorded by the CMS detector [25,152].
Voronoi area subtraction (VAS) can be thought of as carving up the uniform event $\rho\,\mathcal{U}$ according to the original event's Voronoi diagram and transporting this energy to the location of the corresponding particle, yielding the corrected energy flow:

$$ \mathcal{E}_{\rm VAS}(\hat{n}) = \sum_{i=1}^{M} \max\left(p_{T,i} - \rho A_i^{\rm Vor.},\, 0\right) \delta(\hat{n} - \hat{n}_i). \tag{5.6} $$

Strictly speaking, Voronoi area subtraction does not satisfy exact IRC invariance (see Eqs. (2.1) and (2.2)) and thus it cannot in general be written as operating on energy flows. The reason is that an exact IRC splitting changes the number of Voronoi regions as well as their areas. In order for Eq. (5.6) to be valid, we therefore assume that particles with exactly zero transverse momentum are removed and exactly coincident particles are combined before applying the Voronoi area subtraction procedure.

Figure 9: The jet constituents are shown as gray disks at their locations in the rapidity-azimuth plane with sizes proportional to their transverse momenta. The boundary is at a distance R = 0.5 from the jet axis at the center. The color intensity of each region is proportional to its area, which determines the size of the pileup correction. The Voronoi diagram (a) is independent of ρ. The constituent subtraction (b) and Apollonius (c) diagrams are determined using a ρ that corresponds to subtracting one-tenth of the total scalar $p_T$ of the jet.
In the limit that $\rho \le p_{T,i}/A_i^{\rm Vor.}$ for all particles i, the max in Eq. (5.6) evaluates to just its first argument. In this case, since no particle is assigned a larger correction than its own transverse momentum, the Voronoi diagram gives the optimal transportation plan that minimizes the EMD of moving the uniform event with density ρ onto the event of interest: Thus, in this small-pileup limit, Eq. (5.6) agrees with Eq. (5.1) with β = 1. Despite this attractive geometric interpretation, Voronoi area subtraction beyond this limit is sensitive to arbitrarily soft particles: the amount that is subtracted depends only on particle positions, through their Voronoi areas, and not their transverse momenta.
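The Voronoi area subtraction rule can be sketched with a grid of numerical ghost cells standing in for exact Voronoi areas. The sketch below assumes a square region of the rapidity-azimuth plane, ignores the periodicity of the azimuth, and clips each corrected transverse momentum at zero; the function names, the grid resolution, and the two-particle example are illustrative.

```python
def voronoi_areas(points, half_width=1.0, n_grid=100):
    """Estimate Voronoi cell areas in [-half_width, half_width]^2 with ghost cells."""
    step = 2.0 * half_width / n_grid
    areas = [0.0] * len(points)
    for a in range(n_grid):
        y = -half_width + (a + 0.5) * step
        for b in range(n_grid):
            phi = -half_width + (b + 0.5) * step
            # Assign the ghost cell to the nearest particle.
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - y) ** 2 + (points[i][1] - phi) ** 2,
            )
            areas[nearest] += step * step
    return areas


def voronoi_area_subtraction(pts, points, rho, half_width=1.0, n_grid=100):
    """Remove rho times the Voronoi area from each particle, clipping at zero."""
    areas = voronoi_areas(points, half_width, n_grid)
    return [max(pt - rho * area, 0.0) for pt, area in zip(pts, areas)]


# Hypothetical event: one hard and one soft particle in a 2 x 2 region.
points = [(-0.5, 0.0), (0.5, 0.0)]
corrected = voronoi_area_subtraction([10.0, 1.0], points, rho=1.0)
```

Here each particle owns half of the region, so the soft particle is removed entirely while only half of the uniform contamination is charged to the hard particle; the remainder assigned to the soft particle's cell is simply dropped.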

Constituent subtraction
Constituent subtraction [58] is another pileup mitigation method that resolves several pathologies of Voronoi area subtraction by correcting the particles in a manner that depends on both their positions and their transverse momenta. This comes at the cost of requiring a fine grid of low energy "ghost" particles with $p^g_T = \rho A_{\rm ghost}$, where $A_{\rm ghost}$ is the area assigned to each ghost, as a proxy for the pileup contamination. The algorithm is applied by considering the geometrically closest ghost-particle pair k, i and modifying them via:

$$ \begin{cases} p_{T,i} \to p_{T,i} - p^g_{T,k}, \quad p^g_{T,k} \to 0, & \text{if } p_{T,i} \ge p^g_{T,k}, \\ p^g_{T,k} \to p^g_{T,k} - p_{T,i}, \quad p_{T,i} \to 0, & \text{otherwise}, \end{cases} \tag{5.8} $$

continuing until all such pairs have been considered. Since the number of ghosts is typically large in order to have fine angular granularity, this iteration through all ghost-particle pairs can be computationally expensive.

Constituent subtraction (CS) in the continuum ghost limit can be geometrically described by placing circles around each point in the rapidity-azimuth plane and simultaneously increasing their radii. Each point in the plane is assigned to the particle whose circle reaches it first. Circles stop growing when $A_i^{\rm CS}$, the area assigned to particle i, reaches $p_{T,i}/\rho$. We can write the resulting distribution as:

$$ \mathcal{E}_{\rm CS}(\hat{n}) = \sum_{i=1}^{M} \left(p_{T,i} - \rho A_i^{\rm CS}\right) \delta(\hat{n} - \hat{n}_i). \tag{5.9} $$

Unlike naive Voronoi area subtraction, continuum constituent subtraction satisfies exact IRC invariance, since a zero energy particle has zero $A^{\rm CS}$ and an exact collinear splitting yields two areas that sum to the original $A^{\rm CS}$. Constituent subtraction is also better suited for intermediate values of ρ, where particles can be fully removed, since further corrections are distributed to the next closest particle instead of being ignored as in Voronoi area subtraction. Due to the complicated shapes of the corresponding regions, it is difficult to describe the areas $A_i^{\rm CS}$ analytically and in practice they need to be estimated using numerical ghosts.

An example of constituent subtraction is shown in Fig. 9b, where it can be seen that some region boundaries are straight and thus contained in the Voronoi diagram of Fig. 9a. Indeed, growing circles from a set of points and assigning points in the plane according to which circle reaches them first is another way of describing the construction of a Voronoi diagram. Regions with circular boundaries correspond to softer particles that are fully subtracted by the constituent subtraction procedure.
When ρ is sufficiently small such that no particle's region has a circular boundary (i.e. no circle stops growing), constituent subtraction is exactly equivalent to Voronoi area subtraction. Constituent subtraction in the low-pileup limit is then also equivalent to optimally transporting the uniform event with density ρ to the event of interest and subtracting accordingly, again in line with Eq. (5.1) with β = 1: (5.10) Constituent subtraction can also be extended with a ∆R max parameter to restrict ghosts from affecting distant particles. Our geometric formalism can also encompass this locality by re-introducing the R-parameter to the EMD in Eq. (5.10) with R = ∆R max .
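The ghost-based procedure can be sketched as follows. This minimal version assumes a uniform square grid of ghosts, a purely geometric ordering of ghost-particle pairs, no $\Delta R_{\max}$ cut, and no azimuthal periodicity; each pair exchanges the smaller of the two transverse momenta, which is the pairwise update described above. The function name and the example values are illustrative.

```python
import math


def constituent_subtraction(pts, points, rho, half_width=1.0, n_grid=40):
    """Ghost-based constituent subtraction on a uniform grid (illustrative sketch)."""
    step = 2.0 * half_width / n_grid
    ghosts = [
        (-half_width + (a + 0.5) * step, -half_width + (b + 0.5) * step)
        for a in range(n_grid)
        for b in range(n_grid)
    ]
    ghost_pt = [rho * step * step] * len(ghosts)  # each ghost carries rho * A_ghost
    pts = list(pts)
    # Visit ghost-particle pairs from geometrically closest to farthest.
    pairs = sorted(
        (math.hypot(p[0] - g[0], p[1] - g[1]), i, k)
        for i, p in enumerate(points)
        for k, g in enumerate(ghosts)
    )
    for _, i, k in pairs:
        if pts[i] <= 0.0 or ghost_pt[k] <= 0.0:
            continue
        # The smaller of the two momenta is subtracted from both members of the pair.
        removed = min(pts[i], ghost_pt[k])
        pts[i] -= removed
        ghost_pt[k] -= removed
    return pts


# Same hypothetical two-particle event as one might use for Voronoi subtraction.
points = [(-0.5, 0.0), (0.5, 0.0)]
corrected = constituent_subtraction([10.0, 1.0], points, rho=1.0)
```

In this example the soft particle is fully removed and the leftover ghosts in its neighborhood are charged to the hard particle, so the full contamination $\rho$ times the total area is subtracted from the event.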

Apollonius subtraction
Voronoi area subtraction and constituent subtraction both make contact with Eq. (5.1) in the small-ρ limit, but we would like to explore pileup subtraction based on optimal transport for all values of ρ. By Lemma 3, we know that the corrected event is contained in the original event, and by the decomposition properties of the EMD in Eq. (5.2), we only need to consider the transport of $\rho\,\mathcal{U}$ to $\mathcal{E}$. Since the total transverse momenta of $\rho\,\mathcal{U}$ and $\mathcal{E}$ are generally different, this is now an example of a semi-discrete, unbalanced optimal transport problem [153,154]. The problem of minimizing the EMD between a uniform distribution and an event is solved, for general β, by a generalized Laguerre diagram [154]. For the special case of β = 1, which we focus on here, this is also known as the Apollonius diagram (or additively weighted Voronoi diagram) [149,155,156], and for β = 2 it is a power diagram [157].

An Apollonius diagram in the plane is constructed from a set of points $\hat{n}_i$ that each carry a non-negative weight $w_i$ that is the i-th component of a vector $w \in \mathbb{R}^M_+$. In the two-dimensional Euclidean plane, the Apollonius region associated to particle i depending on w is:

$$ R_i^{\rm Apoll.}(w) = \left\{ \hat{n} \in \mathbb{R}^2 \,\middle|\, \|\hat{n} - \hat{n}_i\| - w_i \le \|\hat{n} - \hat{n}_j\| - w_j, \ \forall\, j \ne i \right\}, \tag{5.11} $$

where particle indices i, j = 1, . . . , M and $\|\cdot\|$ is the Euclidean norm. One interpretation of Eq. (5.11) is that region i is all points closer to a circle of radius $w_i$ centered at $\hat{n}_i$ than to the corresponding circle for any other particle. The boundaries of the Apollonius regions are contained in the set $\{\hat{n} \in \mathbb{R}^2 \mid \|\hat{n} - \hat{n}_i\| - \|\hat{n} - \hat{n}_j\| = w_i - w_j\}$, which is a union of hyperbolic segments. Note that adding the same constant to all of the weights does not change the resulting Apollonius diagram. Hence, if all the weights are equal, they can equivalently be set to zero and we attain the Voronoi diagram as a limiting case of an Apollonius diagram.
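The defining property of Apollonius regions, namely that each point of the plane belongs to the particle minimizing its Euclidean distance minus the particle weight, can be explored numerically. The sketch below estimates region areas on a grid for given weights; it does not solve for the optimal weights, and the function name, grid resolution, and sample points are illustrative assumptions.

```python
import math


def apollonius_areas(points, weights, half_width=1.0, n_grid=60):
    """Grid estimate of Apollonius region areas for the given additive weights."""
    step = 2.0 * half_width / n_grid
    areas = [0.0] * len(points)
    for a in range(n_grid):
        y = -half_width + (a + 0.5) * step
        for b in range(n_grid):
            phi = -half_width + (b + 0.5) * step
            # Owner minimizes the distance to the circle of radius w_i around n_i.
            owner = min(
                range(len(points)),
                key=lambda i: math.hypot(y - points[i][0], phi - points[i][1]) - weights[i],
            )
            areas[owner] += step * step
    return areas


points = [(-0.5, 0.1), (0.4, -0.2), (0.1, 0.6)]
voronoi = apollonius_areas(points, [0.0, 0.0, 0.0])   # zero weights: Voronoi diagram
shifted = apollonius_areas(points, [0.3, 0.3, 0.3])   # common shift: same diagram
boosted = apollonius_areas(points, [0.3, 0.0, 0.0])   # larger weight: larger region
```

Equal weights reproduce the Voronoi areas up to grid resolution, while raising a single weight enlarges that particle's region at the expense of its neighbors.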
We can now specify the action of Apollonius subtraction on an event using the areas of the Apollonius regions subject to the minimal EMD requirement: treating ρR Apoll. i (w) as an event with uniform energy density ρ in that Apollonius region. Here, Eq. (5.12) is analogous to Eqs. (5.6) and (5.9), and Eq. (5.13) implements the requirement that the EMD of the subtraction is minimal. Note that the R parameter in Eq. (5.13) serves only to guarantee that it is more efficient to transport energy rather than create/destroy it. As long as 2R is greater than the diameter of the space, R has no impact on the solution other than to guarantee that ρA Apoll. i (w * ) does not exceed p T,i , as this would be less efficient than transporting the excess energy elsewhere. An example of an Apollonius diagram is shown in Fig. 9c, where hyperbolic boundaries of the Apollonius regions are clearly seen in the outer part of the jet and straight boundaries, matching those of the Voronoi diagram, are seen near the core.
In this way, Apollonius subtraction generalizes Voronoi area and constituent subtraction beyond the small-pileup limit, directly implementing Eq. (5.1) for β = 1 for all values of ρ: (5.14) While the optimal solution in Eq. (5.13) is based on an unbalanced optimal transport problem, the restatement in Eq. (5.14) corresponds to balanced transport. This same connection underpins Lemma 3, guaranteeing that the corrected event in Eq. (5.12) involves the same M directions as the original event, just with different weights. To turn Eq. (5.14) into a practical algorithm, we would need an efficient way to compute the weights according to Eq. (5.13). While Refs. [153,154] have developed the theoretical framework of semi-discrete, unbalanced optimal transport needed to solve this convex minimization problem, they stop short of describing easily-implementable algorithms to attain practical solutions. In order to create Fig. 9c, we were limited to using numerical ghosts to directly solve for the transport plan that minimizes the EMD cost of subtracting the uniform energy component from the event, which is too computationally costly for LHC applications.
If the target areas $A_i^{\rm Apoll.}$ are specified in advance, then the solution to Eq. (5.13) simplifies [149]. Given that the areas depend nontrivially on the resulting weight vector, though, the only case where we know them ahead of time is when ρ is such that all of the energy will be exactly subtracted, in which case $A_i^{\rm Apoll.} = p_{T,i}/\rho$. Though this is not so useful for pileup, where we typically want to subtract an amount of energy less than the total, it does indicate that an Apollonius diagram can be found and used to compute the event isotropy from Sec. 3.1.5 without the use of numerical ghosts. We leave the implementation of such a procedure to future work, though we note that Ref. [149] has already built an implementation that relies on numerical ghosts to estimate the areas of the Apollonius regions rather than solving for them analytically.

Iterated Voronoi subtraction
Given the difficulty of analytically solving Eq. (5.13) and thus implementing Apollonius subtraction, we now develop an alternative method called iterated Voronoi subtraction that gives up a global notion of minimizing EMD but retains a local one. In all three methods described above, the difficulty comes when a particle is removed in the course of subtracting pileup. Otherwise, the above methods all reduce to subtracting transverse momentum according to the Voronoi areas of the regions corresponding to the particles, as in Eq. (5.7). This suggests a procedure in which pileup is subtracted according to Eq. (5.1) an infinitesimal amount at a time, thus ensuring that Eq. (5.7) can be used at every stage of the procedure.
The area of the Voronoi cell of particle i is now a function of the total amount of energy density that has been subtracted thus far, a quantity that starts at zero and will be integrated up to the target ρ tot over the course of the procedure. When a particle loses all of its transverse momentum, it is removed from the Voronoi diagram and is considered to have zero area associated to it. The removal of a particle from the diagram changes the Voronoi regions of all of its neighbors, and their areas are updated accordingly. Denoting the area associated to particle i after ρ worth of energy density has been subtracted as A IVS i (ρ), we can write the corrected distribution for iterated Voronoi subtraction (IVS) as:

Unlike Eqs. (5.12) and (5.13), Eq. (5.15) naturally lends itself to a simple and efficient implementation. We can iteratively solve for A IVS i (ρ) using the fact that the areas correspond to Voronoi regions, and furthermore that these regions change only when a particle is removed. Let E (0) be the initial event consisting of particles with transverse momenta p i . We subtract a total energy density ρ tot by breaking up the integral in Eq. (5.15) starting with ρ (0) = 0 and determining the boundaries from:

where n starts at 1 and goes up to at most M. The values of ρ (n) can be expressed simply as:

where the inner minimum is taken over all remaining particles with p (n−1) T,j > 0. The updated particle momenta from each piece of the integral in Eq. (5.15) are then:

and a particle is considered removed if its transverse momentum is zero, in which case it is also considered to have zero area. The areas A (n) i are determined by the Voronoi diagram of E (n). The above procedure terminates either when the total amount of energy density removed is equal to ρ tot or there are no more particles left in the event. Thus, iterated Voronoi subtraction makes contact with the geometric perspective of Eq. (5.1), applying it in infinitesimal increments, resulting in the discrete steps:

Said another way, this is simply a repeated application of Voronoi area subtraction: subtract until a particle reaches zero momentum, and repeat until the desired energy density has been removed.
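The repeated subtraction described above can be illustrated in a toy setting. The following Python sketch is not the authors' implementation: it assumes particles at distinct positions on a one-dimensional periodic strip of length `length`, so that the Voronoi "area" of a particle is simply half the gap to each surviving neighbor. The function names, the tolerance `eps`, and the 1D geometry are all assumptions made for illustration.

```python
def voronoi_lengths(xs, alive, length):
    """1D periodic Voronoi cell sizes; removed particles get zero area."""
    order = sorted((i for i in range(len(xs)) if alive[i]), key=lambda i: xs[i])
    areas = [0.0] * len(xs)
    k = len(order)
    if k == 1:
        # A lone survivor owns the whole strip.
        areas[order[0]] = length
        return areas
    for a in range(k):
        i = order[a]
        gap_left = (xs[i] - xs[order[a - 1]]) % length
        gap_right = (xs[order[(a + 1) % k]] - xs[i]) % length
        areas[i] = 0.5 * (gap_left + gap_right)
    return areas


def iterated_voronoi_subtraction(xs, pts, rho_tot, length, eps=1e-12):
    """Subtract energy density rho_tot in steps that each end when a particle hits zero."""
    pts = list(pts)
    rho = 0.0
    for _ in range(len(pts) + 1):  # at most M steps are ever needed
        alive = [p > 0.0 for p in pts]
        if rho >= rho_tot or not any(alive):
            break
        areas = voronoi_lengths(xs, alive, length)
        # Largest density that can be removed before some particle reaches zero.
        step = min(pts[i] / areas[i] for i in range(len(pts)) if alive[i])
        if step >= rho_tot - rho:
            step, rho = rho_tot - rho, rho_tot  # final, partial step
        else:
            rho += step
        for i in range(len(pts)):
            if alive[i]:
                new_pt = pts[i] - step * areas[i]
                pts[i] = new_pt if new_pt > eps else 0.0
    return pts, rho
```

For example, with four equally spaced particles `xs = [0, 1, 2, 3]` on a strip of length 4 and `pts = [1.0, 0.5, 1.0, 1.0]`, the softest particle is removed once ρ reaches 0.5, after which its two neighbors each inherit half of its cell. This sketch recomputes all cell sizes at every step for clarity rather than updating only the neighbors of the removed particle.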
Iterated Voronoi subtraction is made even more attractive computationally when one considers that the Voronoi diagram of E (n) does not need to be recomputed from scratch. Rather, it can be obtained from the Voronoi diagram of E (n−1) by removing a site and updating only the neighboring regions. Thus, we only need to construct the Voronoi diagram of E (0) and each removal can be done in constant (amortized) time as the average number of neighbors of any cell is no more than 6 [151]. We have constructed an implementation of iterated Voronoi subtraction that interfaces with FastJet and will explore its phenomenological properties in future work.

Theory space
When do two theories give rise to similar signatures? In this section, we seek to generalize the intuition behind the EMD to obtain a metric between theories using their predicted cross sections in energy flow space. A construction of such a distance and the induced "theory space" is conceptually useful and, in fact, naturally underpins several recently introduced techniques for collider physics.
We introduce the cross section mover's distance (ΣMD) as a metric for the space of theories. Here, we treat a "theory" as an ensemble of event energy flows with corresponding cross sections, encompassing both the predictions of quantum field theories as well as the structure of collider datasets. To accomplish this, we again make use of an EMD-like construction, except the ΣMD uses the EMD itself as the "angles" and the event cross sections as the "energies", as mentioned in Table 2. The resulting space of theories with the ΣMD as a metric is illustrated in Fig. 10. Interestingly, Ref. [158] also put a metric on theory space by using the Fisher information matrix.

Introducing a distance between theories
A "theory" T is taken to be a (finite, for now) set of events with associated cross sections:

We can equivalently view T as a distribution over the space of event configurations E:

In the case of unweighted events, the cross sections are simply σ i = 1/L, where L is the total integrated luminosity. While it might seem strange to associate a cross section to an individual event, one can think of σ i as being the rate to produce events that look similar to E i , with the degree of similarity determined by the EMD. In the L → ∞ limit, the cross section of an individual event goes to zero, and Eq. (6.1) becomes a smooth distribution. The ΣMD is the minimum "work" required to rearrange one theory T into another theory T′ by moving cross section F ij from event i in one theory to event j in the other:

ΣMD γ,S;β,R (T , T′) = min

Here, i and j index the events in theories T and T′, respectively. The parameter S, which has the same units as the EMD, controls the relative importance of the two terms, analogous to the jet radius parameter R in the EMD. We have also introduced a possible γ exponent, analogous to the β angular exponent in the EMD. The ΣMD has dimensions of cross section, where the first term quantifies the difference in event distributions and the second term accounts for the creation or destruction of cross section. For γ > 0, it is a true metric as long as the underlying EMD is a metric and 2S is larger than the largest attainable EMD between two events (footnote 11). In the limit as S → ∞, the ΣMD reduces simply to the difference in total cross section between the two theories. The natural continuum notion of Eq. (6.2) can be used whenever such an analysis is analytically tractable.
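For orientation, one explicit form consistent with this description can be written by direct analogy with the EMD, with cross sections playing the role of energies and EMD/S playing the role of θ/R. The constraint set below is an assumption modeled on the unbalanced transport problem that defines the EMD, and the primed symbols for the second theory are notation introduced here; it should be checked against Eq. (6.2) rather than read as a quotation of it.

```latex
\Sigma\mathrm{MD}_{\gamma,S;\beta,R}(\mathcal{T},\mathcal{T}')
  = \min_{\{F_{ij}\ge 0\}} \sum_{i}\sum_{j} F_{ij}
    \left(\frac{\mathrm{EMD}_{\beta,R}(\mathcal{E}_i,\mathcal{E}'_j)}{S}\right)^{\gamma}
    + \left|\sum_{i}\sigma_i - \sum_{j}\sigma'_j\right| ,
\qquad
\sum_{j} F_{ij} \le \sigma_i , \quad
\sum_{i} F_{ij} \le \sigma'_j , \quad
\sum_{i,j} F_{ij} = \min\!\left(\sum_{i}\sigma_i , \sum_{j}\sigma'_j\right) .
```

This form reproduces the properties stated in the text: both terms carry dimensions of cross section, the first term vanishes as S → ∞ so that only the difference in total cross section survives, and the second term vanishes when the two theories have equal total cross section.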
The ΣMD from a theory to itself is zero in the continuum limit or with infinite data. Further, two theories that differ in their Lagrangians yet give rise to identical scattering cross sections for all energy flows will have a ΣMD of zero. This includes, for instance, theories that are equivalent up to field redefinitions [159] or rearrangements of the asymptotic states [160][161][162]. Finally, if two theories have any observable differences in energy flow, then the ΣMD between them will be non-zero. Note that the ΣMD inherits the flavor and charge insensitivity of the EMD, but it is interesting to consider extending the ΣMD to account for additional quantum numbers that particles may carry.

11 The analogous discussion to footnote 1 holds for γ > 1.

Jet quenching via quantile matching
Quantile matching [64] is an analysis strategy to study the modification of jets as they traverse the quark-gluon plasma in heavy-ion collisions. We now show that, surprisingly, this technique can be cast naturally in a ΣMD formulation. The optimal "theory moving" transport between two otherwise-equivalent datasets provides a proxy for the jet modification by the quark-gluon plasma.
Intuitively, the idea is to select a set of statistically equivalent jets from both proton-proton (pp) collisions and heavy-ion (AA) collisions. This gives a snapshot of jets and their energies before and after modification by the quark-gluon plasma, respectively. Such a selection can be achieved by selecting jets with the same upper cumulative effective cross section, after appropriately normalizing the AA cross section to account for the average number of nucleon-nucleon collisions.
With such a selection, a quantile matching can be used to specify p quant T for a given heavy-ion jet with reconstructed transverse momentum p AA T : where p quant T gives a proxy for the jet p T prior to modification by the quark-gluon plasma. The ratio between the heavy-ion and proton-proton jet transverse momentum in the same quantile then gives a physically-motivated quantification of the medium jet modification.
We now turn to explaining the intriguing connection between this quantile matching procedure and the ΣMD through optimal transport. We use transverse momenta in place of energies and take R → ∞ in Eq. (1.2), where the EMD becomes simply the difference in jet transverse momenta. Further, we set γ = 1 and S = 1 in Eq. (6.2) and note that the normalization of the cross sections makes the second term in that equation vanish.
The theory moving problem now becomes a simple one-dimensional optimal transport problem of moving the pp jet p T distribution to the AA jet p T distribution. Remarkably, this is mathematically equivalent to quantile matching. We use the notation TM to represent the optimal theory movement F * in the ΣMD. Letting T AA be the set of heavy-ion jets and T pp be the set of proton-proton jets, we have that: where we can define this formally using a "ghost" heavy-ion jet with transverse momentum p AA T and infinitesimal cross section σ ∼ 0.
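The equivalence between one-dimensional optimal transport and quantile matching can be checked numerically on a small sample. The sketch below assumes unweighted jets with equal cross sections and equal sample sizes in the two datasets, so that matching quantiles reduces to matching ranks; the function names are hypothetical, and the brute-force search over permutations is only practical for a handful of jets.

```python
from itertools import permutations


def quantile_match(pp_pts, aa_pts):
    """Assign to each AA jet the pp jet pT with the same rank (hardest to hardest)."""
    pp_sorted = sorted(pp_pts, reverse=True)
    aa_order = sorted(range(len(aa_pts)), key=lambda i: aa_pts[i], reverse=True)
    matched = [0.0] * len(aa_pts)
    for rank, i in enumerate(aa_order):
        matched[i] = pp_sorted[rank]
    return matched


def transport_cost(pp_assigned, aa_pts):
    """Total |pT difference| moved, i.e. the gamma = 1 cost with the R -> infinity EMD."""
    return sum(abs(p - a) for p, a in zip(pp_assigned, aa_pts))


def brute_force_min_cost(pp_pts, aa_pts):
    """Minimum transport cost over all one-to-one assignments of pp jets to AA jets."""
    return min(transport_cost(perm, aa_pts) for perm in permutations(pp_pts))


pp = [100.0, 120.0, 150.0, 200.0]   # toy proton-proton jet pT values
aa = [170.0, 90.0, 130.0, 105.0]    # toy heavy-ion jet pT values
matched = quantile_match(pp, aa)     # proxy for the unmodified pT of each AA jet
```

In this toy sample the rank-based matching attains the same cost as the best assignment found by exhaustive search, which is the discrete analogue of the statement that the optimal theory movement reproduces quantile matching.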
Quantile matching can therefore be seen as a matching induced by the optimal theory movement between the heavy-ion and proton-proton jets. In this sense, it operationally defines the modification by the quark-gluon plasma in terms of the theory-movement of the jet transverse momentum spectrum. It would be interesting to follow this connection further and explore this procedure using the full EMD beyond the R → ∞ limit to study the medium modification as a function of the jet substructure.

Event clustering and coresets
One of the essential unsupervised methods for probing a dataset is to analyze its complexity. A method to do this for collider physics datasets is that of k-eventiness, recently introduced in Ref. [25]. Here, one seeks to find k representative events that minimize the EMD from each event in the dataset to the nearest representative event:

where we have dropped the β and R subscripts on EMD for compactness. The value of V k probes how well the dataset is approximated by the k events. This gives rise to the notion of V k as the "k-eventiness" of the dataset, in analogy with N-(sub)jettiness, where smaller values of V k indicate better approximations. From a geometric perspective, V k is the smallest ΣMD to the manifold of k-event datasets. Analogous to Eq.

Here, we use the |·| notation to count the number of events in T. Just like for N-(sub)jettiness, different values of γ highlight different aspects of theory space geometry. Following the logic in Sec. 4.1 of lifting the N-jettiness observable into the XCone jet algorithm, we can lift k-eventiness into an event clustering algorithm. The representative events K (i.e. the point of closest approach on the k-event manifold) have the interpretation of the "k-geometric-medians" for γ = 1 or "k-means" for γ = 2. For practical applications, it is often convenient to restrict the representative events to be within the dataset T, i.e. the "k-medoids", giving only an approximate value of V k . While the full problem of finding the representative jets may be computationally intractable, fast approximations to find the medoids exist and have been explored in Ref. [25].
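The medoid-restricted version of k-eventiness can be sketched directly from a matrix of pairwise distances. The example below is an illustrative brute-force search, not one of the fast approximations of Ref. [25]: the function name is hypothetical, the per-event weights `sigmas` and the exponent `gamma` are included as assumptions about how cross sections and γ would enter, and the toy "events" are single numbers with the absolute difference standing in for the EMD.

```python
from itertools import combinations


def k_eventiness_medoids(dist, sigmas, k, gamma=1.0):
    """Exhaustive k-medoids: best k representatives drawn from the dataset itself.

    dist[i][j] stands in for EMD(E_i, E_j); sigmas[i] is the weight of event i.
    Returns the minimized value and the indices of the chosen medoids.
    """
    n = len(dist)
    best_value, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        value = sum(
            sigmas[i] * min(dist[i][m] for m in medoids) ** gamma
            for i in range(n)
        )
        if value < best_value:
            best_value, best_medoids = value, medoids
    return best_value, best_medoids


# Toy dataset: two well-separated pairs of "events".
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
sigmas = [1.0] * len(points)

v1, medoids1 = k_eventiness_medoids(dist, sigmas, k=1)
v2, medoids2 = k_eventiness_medoids(dist, sigmas, k=2)
```

The sharp drop from `v1` to `v2` in this toy dataset reflects its two-cluster structure, which is the kind of complexity information that the k-eventiness is meant to expose. The search cost grows combinatorially with the dataset size, so this is suitable only for small examples.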
Inspired by Sec. 4.2, one might consider implementing sequential clustering algorithms by iterating ΣMD computations to approximate M events with M − 1 events and so forth. Such a clustering may be helpful for rigorous data compression of large collider datasets or, if implemented efficiently, for tasks such as triggering. These ideas are closely related to the notion of finding a coreset (see Ref. [163] for a recent review), for which techniques from quantum information and quantum computation may also find use [164]. Additionally, Ref. [165] uses the Wasserstein metric to construct "measure coresets" that take into account the underlying data distribution and which may prove useful for high-energy physics applications. We leave further exploration of theory geometry and theory space algorithms to future work.

Conclusions
In this paper, we have explored the metric space of collider events from a theoretical perspective. Beginning from the EMD between final states, namely the "work" required to rearrange one into another, we have cast a multitude of diverse collider algorithms and analysis techniques in a geometric language. First, we connected this metric to the fundamental notion of IRC safety in massless quantum field theories, with the EMD providing a sharp language to define IRC safety and even Sudakov safety. We extended this connection by highlighting that a wide variety of collider observables, including thrust and N-jettiness, can be cast as distances between events and manifolds in this space. Further, we demonstrated that many jet clustering algorithms, such as exclusive cone finding and sequential clustering, can be exactly derived in full detail from the simple principle that jets are the best N-particle approximation to the event. Even pileup mitigation techniques developed to face the LHC-era challenge of high luminosity running can be cast in the language of subtracting a uniform radiation pattern, connecting this field to semidiscrete unbalanced optimal transport. Finally, we generalized our reasoning to define a distance between "theories" as sets of events with cross sections, providing a new lens to understand several existing techniques and a roadmap for future developments in the geometry of theory space.
From the perspective of massless quantum field theories, our metric space of events is the natural space for understanding observables, as the only truly observable quantities are IRC safe. More speculatively, it would be interesting to circumvent the (unphysical) particle-level stage of calculations and make theoretical predictions directly in the space of events. Understanding and expanding in this direction would require natural notions of volume and integration in this space, perhaps aided by recent developments in Wasserstein spaces [166][167][168][169], though we leave this fascinating exploration to future work. Nonetheless, it has already been established that the energy flows themselves obey factorization theorems in effective field theory contexts [170], and give rise to rich behavior in correlators [5][171][172][173]. Going directly from first principles and symmetries to observables (i.e. energy flows, not particles) suggests a natural extension to the philosophy driving the present scattering amplitudes program (see Refs. [159,174,175] for reviews). It is also interesting to extend this logic to massive quantum field theories where observable quantities can depend on flavor and charge [176].
It is also useful to discuss these developments in the broader context of machine learning and the physical sciences [122,177,178]. Typically, problems in the natural sciences can be cast as machine learning problems such as classification and regression, whereby the relevant tools from machine learning can be applied to achieve improved performance on those tasks. It is far rarer for machine learning to enhance our theoretical or conceptual understanding of physics directly. This story provides an interesting case where new insights and questions exposed by machine learning have impacted purely theoretical and phenomenological collider physics. The question of when two collider events are similar, for which the EMD was introduced, was originally motivated by unsupervised learning methods and autoencoders [179][180][181][182], which require a distance matrix or reconstruction loss. By providing an answer to this simple question, which itself involved familiar machine learning tools such as optimal transport, we uncovered a new mathematical formalism to better understand and express concepts in quantum field theory and collider physics. We hope that this will be just one example of many future profound insights into the natural sciences facilitated by this perspective.

A Energy moving with massive particles
In this appendix, we briefly explore an alternative definition of energy flow appropriate for massive particles, with a corresponding change in the measures used to define the EMD. The energy flow in Eq. (1.1) treats events as sets of particles that have energy-like weights {E i } and geometric directions {n̂ i }. The EMD in Eq. (1.2) is based on pairwise distances {θ ij } that are only functions of the n̂ i and n̂ j directions. The exact definitions of E i , n̂ i , and θ ij may vary depending on the collider context and other choices. For massless final-state particles in e + e − collisions, it is typical to take the energy E to be equal to the total momentum |p|, and the geometric direction n̂ to be equal to the unit vector p/E. For massless particles in hadronic collisions, it is natural to use transverse momentum p T and a geometric direction based on azimuth φ and pseudorapidity η.
Figure 11: The space of (massive) particle kinematics, with pairwise distances corresponding to Euclidean distances in this space. Massless particles with v = 1 live on the boundary and particles at rest with v = 0 are at the origin. One can interpret this figure as a snapshot of the event taken at a time t after the collision, when the particles have traveled a distance vt.

It is straightforward to adapt the energy flow to massive particles (see related discussion in Ref. [13]). For the energy measure, the natural choices are energy in the e + e − case and transverse energy in the hadronic case:
Both of these reduce nicely to the expected expressions in the m i → 0 limit. For the geometric direction, the natural choices are velocity and transverse velocity, written in four-vector notation:

where v i = p i /E i is the particle three-velocity, v T i = p T i /E T i is the particle transverse two-velocity, and y i is the particle rapidity. Again, these have the expected behavior in the m i → 0 limit, and for finite mass, the velocities are bounded as |v| ∈ [0, 1] and |v T | ∈ [0, 1].
To define the EMD, we choose the following pairwise angular distance:

where one replaces n µ with n µ T in the hadronic case. The first minus sign is needed because the difference between two time-like vectors with n 2 ∈ [0, 1] is space-like. This expression reduces to the usual expression θ ij = √(2 n µ i n jµ ) in the massless limit. To gain intuition for this geometric distance between massive particles, it is instructive to expand out Eq. (A.3) in the e + e − case:

where Ω ij = arccos(n̂ i · n̂ j ) is the purely geometric angle between particles i and j. We see that the velocity magnitude v = |v| acts as a radial coordinate on the sphere, and the pairwise distances θ ij are just the Euclidean distances between two points in the unit ball, with distances v i and v j from the origin and angle Ω ij between them. Massless particles live entirely on the boundary with v = 1 and massive particles live inside the ball with 0 ≤ v < 1. An illustration of this massive particle phase space is shown in Fig. 11.

The use of this massive distance measure has an interesting interplay with some of the studies in the body of the paper. For example, the analysis of thrust in Sec. 3.1.1 involved finding the EMD to the manifold of back-to-back massless particle configurations of potentially unequal energy. Using the massive particle distance, one could consider finding the EMD to the manifold of all possible two-particle configurations, including massive particles. For β = 2, this is equivalent to partitioning the event into two halves with masses M A and M B and corresponding energies E A and E B , and minimizing the quantity M 2 A /E A + M 2 B /E B . A nice feature of this approach is that the optimal two-particle configuration has the same energies and velocities as one would get from clustering the particles in each half. Note that this approach is closely related to (but not identical to) the original definition of heavy jet mass in Ref. [67] based on minimizing M 2 1 + M 2 2 . The idea of optimizing jet regions based on M 2 /E also appears in the jet maximization approach [132]. In fact, using the massive distance measure in Eq. (4.2) with β = 2 and N = 1, and repeating the logic in Ref. [128], we recover precisely the algorithm in Ref. [132], where the parameter R controls the size of the resulting jet region. The EMD approach yields a natural way to extend the jet maximization algorithm to N > 1, and also allows for an alternative definition of N-jettiness from Eq. (3.16) based on time-like axes.
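The two descriptions of the massive-particle distance in the e + e − case, as a Minkowski interval between the vectors n µ = (1, v) µ and as a Euclidean distance in the unit ball of velocities, can be compared numerically. The sketch below assumes four-momenta given as tuples `(E, px, py, pz)` and a mostly-minus metric; the function names are hypothetical, and the hadronic (transverse) variant is not covered.

```python
import math


def velocity(p):
    """Three-velocity v = p_vec / E of a four-momentum p = (E, px, py, pz)."""
    E, px, py, pz = p
    return (px / E, py / E, pz / E)


def theta_minkowski(p_i, p_j):
    """sqrt(-(n_i - n_j)^2) with n = (1, v) and a mostly-minus metric."""
    n_i = (1.0,) + velocity(p_i)
    n_j = (1.0,) + velocity(p_j)
    d = [a - b for a, b in zip(n_i, n_j)]
    interval = d[0] ** 2 - d[1] ** 2 - d[2] ** 2 - d[3] ** 2
    return math.sqrt(max(0.0, -interval))


def theta_ball(p_i, p_j):
    """Law-of-cosines distance between points at radii v_i, v_j and opening angle Omega."""
    v_i, v_j = velocity(p_i), velocity(p_j)
    r_i = math.sqrt(sum(a * a for a in v_i))
    r_j = math.sqrt(sum(a * a for a in v_j))
    if r_i == 0.0 or r_j == 0.0:
        omega = 0.0  # the angle is undefined for a particle at rest and drops out
    else:
        c = sum(a * b for a, b in zip(v_i, v_j)) / (r_i * r_j)
        omega = math.acos(max(-1.0, min(1.0, c)))
    return math.sqrt(max(0.0, r_i ** 2 + r_j ** 2 - 2.0 * r_i * r_j * math.cos(omega)))
```

For two massless back-to-back particles both functions give the diameter of the ball, and for a particle at rest paired with a massless particle both give the radius, in line with the picture in Fig. 11.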
As another example, the analysis of recombination schemes in Sec. 4.2 involved minimizing the transportation cost to merge two particles into one. For β = 2, this was equivalent to the E-scheme, namely κ = 1 in Eq. (4.4), up to the subtlety noted in footnote 8. Using the massive particle distance, the merged particle in the E-scheme has the energy and direction:

where appropriate T subscripts should be included in the hadronic case. The combined four-vector is

p µ c = E c n µ c = p µ i + p µ j , (A.6)

which is a valid expression in both the e + e − and hadronic cases. Thus, the combined four-vector is just the sum of the two particles, which is indeed the desired E-scheme behavior. Note, however, that the interpretation of the jet radius is very different if one uses the massive particle distance, since clustering happens in velocity space. We leave further studies of the massive particle distance to future work.
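The statement that the β = 2 merger reproduces four-vector addition can be illustrated numerically. The sketch below assumes that the cost of merging particles i and j into a single particle with velocity v c is E i |v i − v c |² + E j |v j − v c |², which is minimized by the energy-weighted mean velocity; this cost function, the `(E, px, py, pz)` tuple convention, and the function names are assumptions made for illustration in the e + e − case.

```python
def velocity(p):
    """Three-velocity v = p_vec / E of a four-momentum p = (E, px, py, pz)."""
    E, px, py, pz = p
    return (px / E, py / E, pz / E)


def merge_cost(p_i, p_j, v_c):
    """Assumed beta = 2 cost: E_i |v_i - v_c|^2 + E_j |v_j - v_c|^2."""
    cost = 0.0
    for p in (p_i, p_j):
        v = velocity(p)
        cost += p[0] * sum((a - b) ** 2 for a, b in zip(v, v_c))
    return cost


def merge_e_scheme(p_i, p_j):
    """Energy-weighted mean velocity and the resulting combined four-vector."""
    E_c = p_i[0] + p_j[0]
    v_i, v_j = velocity(p_i), velocity(p_j)
    v_c = tuple((p_i[0] * a + p_j[0] * b) / E_c for a, b in zip(v_i, v_j))
    p_c = (E_c,) + tuple(E_c * c for c in v_c)
    return E_c, v_c, p_c


# Two massless particles at right angles.
p_i = (1.0, 1.0, 0.0, 0.0)
p_j = (1.0, 0.0, 1.0, 0.0)
E_c, v_c, p_c = merge_e_scheme(p_i, p_j)
best = merge_cost(p_i, p_j, v_c)
shifted = merge_cost(p_i, p_j, (v_c[0] + 0.1, v_c[1], v_c[2]))
```

Here `p_c` equals the component-wise sum of the two input four-vectors and any shift of `v_c` raises the cost. For this massless pair the minimum cost also equals M²/E of the combined particle, consistent with the M²/E quantities discussed above for β = 2.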