Towards a Compositional Framework for Convex Analysis (with Applications to Probability Theory)

We introduce a compositional framework for convex analysis based on the notion of convex bifunction of Rockafellar. This framework is well-suited to graphical reasoning, and exhibits rich dualities such as the Legendre-Fenchel transform, while generalizing formalisms like graphical linear algebra, convex relations and convex programming. We connect our framework to probability theory by interpreting the Laplace approximation in its context: The exactness of this approximation on normal distributions means that logdensity is a functor from Gaussian probability (densities and integration) to concave bifunctions and maximization.


Introduction
Convex analysis is a classical area of mathematics with innumerable applications in engineering, economics, physics, statistics and information theory. The central notion is that of a convex function $f : \mathbb{R}^n \to \mathbb{R}$, satisfying the inequality $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$ for all $t \in [0, 1]$. Convexity is a useful property for optimization problems: every local minimum of $f$ is automatically a global minimum. Convex functions furthermore admit a beautiful duality theory; the ubiquitous Legendre-Fenchel transform (or convex conjugation), defined as

$$f^*(x^*) = \sup_x \{\langle x^*, x\rangle - f(x)\},$$

encodes $f$ in terms of all affine functions $\langle x^*, x\rangle - c$ majorized by $f$ (here $\langle -, -\rangle$ denotes the standard inner product on $\mathbb{R}^n$). The function $f^*$ is convex regardless of $f$, and under a closedness assumption we recover $f^{**} = f$.
While convex analysis is a rich field, its compositional structure is not readily apparent; the central notion, convex functions, is not closed under composition. The notion which does compose is less well known: a convex bifunction, due to [27], is a jointly convex function $F : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ of two variables. Such bifunctions compose via infimization

$$(F \bullet G)(x, z) = \inf_y \{F(y, z) + G(x, y)\}.$$

arXiv:2312.02291v2 [math.CT] 27 Jan 2024

Categorical Methods. In this work, we will study bifunctions and their associated dualities in the framework of category theory. Graphical methods are ubiquitous in engineering and statistics, and can be used to derive efficient algorithms by making use of the factorized structure of a problem. The language of props and string diagrams unifies these methods, as a large body of work on graphical linear algebra and applied category theory shows [2,1,19,7]. We extend these methods to problems of convex analysis and optimization. Our category of bifunctions subsumes an array of mathematical structures, such as linear maps and relations, convex relations, and (surprisingly) multivariate Gaussian probability. It turns out that anything we can do with Gaussian densities and integration can instead be done with logdensities and maximization. For example, to compute the density of a sum of independent variables, we may take a convolution of densities, or instead compute a sup-convolution of logdensities (see Fig. 1), that is, maximize $\log f_X(x) + \log f_Y(z - x)$ over $x$, which recovers the logdensity of the sum up to an additive constant. This is highly particular to Gaussians. We can elegantly formalize this statement in categorical terms, as our main theorem states:

Logprobability defines a functor from Gaussian probability to concave bifunctions (Theorem 5).

In this sense, the essence of Gaussians is captured by concave quadratic functions. By extending our viewpoint to partial concave quadratic functions, we obtain a generalized notion of Gaussian relation which includes improper priors. Such entities are subtle to describe measure-theoretically, but straightforward in the convex analytic view. The duality theory of bifunctions generalizes the duality of precision and covariance, and more generally connects to the notion of cumulant-generating function in probability theory.
We elegantly formalize the connections between convex analysis and probability theory using the language of Markov categories [17], which are a categorical formalism for probability theory, and have close connections to the semantics of probabilistic programs [30].
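The claim about sums of independent Gaussian variables can be checked numerically. The following is a minimal sketch, assuming univariate Gaussians and a brute-force maximization over a finite grid; the helper names (`log_gauss`, `sup_convolution`, `offset`) and all parameter values are hypothetical and chosen only for illustration. It compares the sup-convolution of two logdensities with the logdensity of the sum and checks that the difference does not depend on the evaluation point.

```python
import math

def log_gauss(x, mu, var):
    """Logdensity of the univariate normal N(mu, var)."""
    return -0.5 * (x - mu) ** 2 / var - 0.5 * math.log(2 * math.pi * var)

def sup_convolution(h1, h2, z, grid):
    """Brute-force sup over x of h1(x) + h2(z - x), with x ranging over a finite grid."""
    return max(h1(x) + h2(z - x) for x in grid)

# Hypothetical parameters, chosen for illustration only.
mu1, var1 = 1.0, 2.0
mu2, var2 = -0.5, 0.5
grid = [i / 100 for i in range(-2000, 2001)]  # x in [-20, 20], step 0.01

def offset(z):
    """Sup-convolution of the two logdensities minus the logdensity of X + Y at z."""
    sup_conv = sup_convolution(
        lambda x: log_gauss(x, mu1, var1),
        lambda y: log_gauss(y, mu2, var2),
        z,
        grid,
    )
    return sup_conv - log_gauss(z, mu1 + mu2, var1 + var2)

# The offset is (numerically) the same at every evaluation point:
# the two sides agree up to an additive scalar.
offsets = [offset(z) for z in (-1.0, 0.0, 2.5)]
```

The offset comes from the normalization constants of the densities, which is why the correspondence is stated up to a scalar. Repeating the experiment with non-Gaussian logdensities generally produces an offset that varies with `z`.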
Contribution and Outline. This paper is intended to serve as a high-level roadmap to a categorical treatment of convex analysis. Our aim is to spell out the underlying structures, and present a diverse range of connections, especially with diagrammatic methods and categorical probability. For the sake of presentation, we choose to stick to general statements and keep some technical notions (such as regularity conditions) informal. Spelling out the details in a concrete setting is a starting point for future developments. We elaborate one such particular setting in detail, namely Gaussian probability.
We begin in §2 by recalling the relevant notions of convex analysis, and proceed to define and study the categorical structure of bifunctions in §3. This includes two structures as a hypergraph category and the duality theory of §3.1.
In §4, we elaborate different examples of categories which embed in bifunctions, such as linear and affine algebra, convex relations and convex optimization problems. In each case, the embedding preserves the relevant categorical structures and dualities. In particular, we show that the theory of bifunctions is a conservative extension of graphical linear algebra [25].
In §5 we begin making connections to probability theory. We recall Gaussian probability from a categorical point of view, and construct the embedding functor to bifunctions. We discuss how partial quadratic functions can be seen as an extension of Gaussian probability beyond measure theory.
We conclude with §6-7 discussing the wider context of this work, elaborating connections of probability and convex analysis such as the Laplace approximation and cumulant generating functions, and the idea of idempotent analysis as a 'tropical limit' of ordinary analysis.

Overview of Convex Analysis
The following section is a brief overview of standard material in convex analysis; all propositions and conventions are taken from [27].
Caveat: An important feature of convex analysis is that it deals with formal infinities $+\infty$, $-\infty$ in a consistent fashion. This is crucial because optimization problems may be unbounded. Traditionally, one considers the extended real numbers $\overline{\mathbb{R}} = [-\infty, +\infty]$ and extends the usual laws of arithmetic to them. The case $(+\infty) + (-\infty)$ is left undefined and carefully avoided, like $0/0$ in real analysis.
A more systematic approach [36,18] is based on enriched category theory, and endows $\overline{\mathbb{R}}$ with the structure of a commutative quantale, which gives it totally defined operations with a particular arithmetic.
A more serious caveat is that many results in convex analysis require specific regularity assumptions to hold. As these assumptions are not the focus of the present paper, we will state some big-picture theorems in §3 subject to these conditions. We then elaborate an array of concrete examples in §4-5 where we make sure that all regularity conditions are indeed satisfied. We discuss this drawback in §7.
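A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is convex if its epigraph $\mathrm{epi}(f) = \{(x, c) : c \ge f(x)\}$ is a convex subset of $\mathbb{R}^{n+1}$. This is equivalent to the well-known definition from the introduction, while accounting for infinities. We say that $f$ is concave if $(-f)$ is convex.
Example 1. The following functions are convex: linear functions, $|x|$, $x^2$, $\exp(x)$, $-\ln(x)$. For a convex subset $A \subseteq \mathbb{R}^n$, the convex indicator function $\delta_A : \mathbb{R}^n \to \overline{\mathbb{R}}$ is defined by

$$\delta_A(x) = \begin{cases} 0 & x \in A \\ +\infty & x \notin A. \end{cases}$$

We also write indicator functions using modified Iverson brackets as $\{|x \in A|\} = \delta_A(x)$. A convex function $f$ is called closed if its epigraph is a closed set; this is equivalent to $f$ being lower semicontinuous.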

Conjugacy: the Legendre-Fenchel Transform
Definition 1. For a convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, its convex conjugate $f^* : \mathbb{R}^n \to \overline{\mathbb{R}}$ is the convex function

$$f^*(x^*) = \sup_x \{\langle x^*, x\rangle - f(x)\}.$$

For a concave function $g : \mathbb{R}^n \to \overline{\mathbb{R}}$, its concave conjugate $g^* : \mathbb{R}^n \to \overline{\mathbb{R}}$ is the concave function

$$g^*(x^*) = \inf_x \{\langle x^*, x\rangle - g(x)\}.$$

Geometrically, $f^*$ encodes information about which affine functions $\langle x^*, -\rangle - c$ are majorized by $f$. It is thus natural to view $f^*$ as a function on covectors $x^* \in (\mathbb{R}^n)^*$. This is for example done in [36], but in order to keep notation consistent with [27], we make the traditional identification $(\mathbb{R}^n)^* \cong \mathbb{R}^n$ via the inner product, and the notation $x^*$ is purely decoration. The Legendre-Fenchel transform has applications in many areas of mathematics and physics [33], such as the Hamiltonian formalism in mechanics, statistical mechanics or large deviation theory (e.g. §6.2).
A closed convex function $f$ is the pointwise supremum of all affine functions $h \le f$ [27, 12.1]. This allows closed convex functions to be recovered from their Legendre-Fenchel transform:
Proposition 1 ([27, Theorem 12.2]). For any convex function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, $f^*$ is a closed convex function. We have $f^{**} = f$ if and only if $f$ is closed.
For arbitrary functions $f$, the operation $f \mapsto f^{**}$ is a closure operator which we denote by $\mathrm{cl}(f)$. This is the largest closed convex function majorized by $f$.
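Conjugation and biconjugation can be explored numerically. The following is a minimal sketch, assuming functions of one real variable and a supremum taken over a finite grid (so infinite values are only approximated by the grid bounds); the helper name `conjugate` is hypothetical. It conjugates the closed convex function $x^2/2$, which is its own conjugate, and then conjugates again to recover the original function.

```python
def conjugate(f, grid):
    """Grid approximation of the convex conjugate f*(s) = sup_x (s*x - f(x))."""
    def f_star(s):
        return max(s * x - f(x) for x in grid)
    return f_star

grid = [i / 20 for i in range(-200, 201)]  # x in [-10, 10], step 0.05

def square(x):
    """The closed convex function x^2 / 2."""
    return 0.5 * x * x

square_star = conjugate(square, grid)         # close to s^2 / 2 for |s| <= 10
square_bistar = conjugate(square_star, grid)  # biconjugate, close to x^2 / 2 again

# The conjugate of |x| is the indicator of [-1, 1]: it vanishes for |s| <= 1.
# For |s| > 1 the true value is +infinity; the grid only yields a large finite number.
abs_star = conjugate(abs, grid)
```

Because the supremum runs over a bounded grid, this sketch is only trustworthy where the maximizer lies well inside the grid; it is meant to illustrate Definition 1 and Proposition 1, not to serve as a general-purpose transform.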

Categories of Convex Bifunctions
We now come to the central definition of this article, namely that of convex (or concave) bifunctions. This concept is due to Rockafellar [27], where it is scattered throughout the book.
A bifunction $F$ from $\mathbb{R}^m$ to $\mathbb{R}^n$ is the convex analysis terminology for a curried function $\mathbb{R}^m \to (\mathbb{R}^n \to \overline{\mathbb{R}})$. The uncurried function $F : \mathbb{R}^{m+n} \to \overline{\mathbb{R}}$ is referred to as the graph function of $F$. We will suppress the partial application and write $F(x)(y)$ and $F(x, y)$ interchangeably.
Definition 2. A bifunction $F$ from $\mathbb{R}^m$ to $\mathbb{R}^n$ is called convex (or concave, closed) if its graph function $F : \mathbb{R}^{m+n} \to \overline{\mathbb{R}}$ has that property. The closure operation $\mathrm{cl}(F)$ is applied on the level of graph functions. We denote a convex bifunction by $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$ and a concave bifunction by $F : \mathbb{R}^m \rightharpoondown \mathbb{R}^n$.
Definition 3 (Categories of bifunctions). We define a category CxBiFn of convex bifunctions as follows:
- objects are the spaces $\mathbb{R}^n$
- morphisms are convex bifunctions $\mathbb{R}^m \rightharpoonup \mathbb{R}^n$
- the identity $\mathbb{R}^n \rightharpoonup \mathbb{R}^n$ is given by the indicator function $\mathrm{id}_n(x, y) = \{|x = y|\}$
- composition is infimization over the middle variable, $(F \bullet G)(x, z) = \inf_y \{F(y, z) + G(x, y)\}$
Analogously, the category CvBiFn of concave bifunctions is defined as follows:
- objects are the spaces $\mathbb{R}^n$
- morphisms are concave bifunctions $\mathbb{R}^m \rightharpoondown \mathbb{R}^n$
- the identity $\mathbb{R}^n \rightharpoondown \mathbb{R}^n$ is given by the concave indicator function, which is $0$ if $x = y$ and $-\infty$ otherwise
- composition is supremization over the middle variable, $(F \bullet G)(x, z) = \sup_y \{F(y, z) + G(x, y)\}$
Proof (of well-definedness). This construction is a subcategory of the category of weighted relations $\mathrm{Rel}(Q)$ taking values in a commutative quantale $Q$ [3,12,23], where $Q = \overline{\mathbb{R}}$ is the extended reals. It suffices to verify that convex bifunctions are closed under composition and tensor (addition), and contain the identities ([27, p. 408]).
We will write bifunction composition as $F \bullet G$ when it is clear from context whether we use the convex or concave variety. We will write $I$ for the unit space $\mathbb{R}^0$, and $0$ for its unique element.
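Composition by infimization can be prototyped directly. The following is a minimal sketch, assuming bifunctions between copies of $\mathbb{R}$ represented as Python functions that may return `math.inf`, and with the infimum over the middle variable approximated on a finite grid; the names `compose`, `identity`, `F` and `G` are hypothetical. It uses the convention $(F \bullet G)(x, z) = \inf_y \{F(y, z) + G(x, y)\}$ from the introduction.

```python
import math

def compose(F, G, grid):
    """Convex composite (F . G)(x, z) = inf_y { F(y, z) + G(x, y) }, y ranging over a grid."""
    def FG(x, z):
        return min(F(y, z) + G(x, y) for y in grid)
    return FG

def identity(x, y):
    """Indicator bifunction of the diagonal: 0 if x == y, +infinity otherwise."""
    return 0.0 if x == y else math.inf

grid = [i / 100 for i in range(-1000, 1001)]  # middle variable in [-10, 10]

def G(x, y):
    """Hypothetical convex bifunction: penalizes the deviation of y from 2x."""
    return (y - 2 * x) ** 2

def F(y, z):
    """Hypothetical convex bifunction: penalizes the deviation of z from y."""
    return (z - y) ** 2

FG = compose(F, G, grid)              # analytically (z - 2x)^2 / 2
id_then_G = compose(identity, G, grid)  # composing with the identity changes nothing
```

For these two quadratics the infimum is attained at $y = (z + 2x)/2$, giving $(z - 2x)^2/2$. The sketch never adds $+\infty$ to $-\infty$; handling that case would require the quantale arithmetic mentioned in the caveat of §2.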

Duality for Bifunctions
Unless otherwise stated, theorems phrased for convex bifunctions will hold for concave bifunctions by selecting the appropriate versions of the operations.
The duality theory of convex functions extends to bifunctions as follows.
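Definition 4 ([27, §30]). The adjoint of a convex bifunction $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$ is the concave bifunction $F^* : \mathbb{R}^n \rightharpoondown \mathbb{R}^m$ defined by

$$F^*(y^*, x^*) = \inf_{x, y} \{F(x, y) - \langle y^*, y\rangle + \langle x^*, x\rangle\}.$$

The adjoint of a concave bifunction is convex and uses sup instead of inf. The adjoint of the convex bifunction $F$ is related to the conjugate of its graph function using the formula $F^*(y^*, x^*) = -F^*(-x^*, y^*)$, where $F^*$ on the right-hand side denotes the conjugate of the graph function. (Note the slight asymmetry that one input is negated.) The analogue of Proposition 1 for bifunctions is as follows.
Proposition 3 ([27, Theorem 30.1]). For any convex bifunction $F$, the adjoint $F^*$ is a closed concave bifunction, and we have $F^{**} = \mathrm{cl}(F)$. In particular, if $F$ is a closed convex bifunction, then $F^{**} = F$.
Theorem 1 ([27, Theorem 38.5]). Under regularity assumptions, the adjoint operation respects composition. That is, for composable convex bifunctions $F$ and $G$ we have $(F \bullet G)^* = G^* \bullet F^*$. In other words, the adjoint operation defines a pair of mutually inverse functors

$$(-)^* : \mathrm{CxBiFn}^{op} \dashrightarrow \mathrm{CvBiFn}, \qquad (-)^* : \mathrm{CvBiFn}^{op} \dashrightarrow \mathrm{CxBiFn}.$$

We indicate with dashed arrows that the functoriality depends on regularity assumptions.
For the interested reader, the regularity assumptions in Theorem 1 include closedness, as well as properness and certain (relative interiors of) domains of the involved bifunctions intersecting [27, §38]. These assumptions are sufficient, but not necessary.
As a corollary of functoriality, we can derive the following well-known fact:
Corollary 1 (Fenchel duality). Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be convex, $g : \mathbb{R}^n \to \overline{\mathbb{R}}$ concave, and let $f^*$, $g^*$ be their convex and concave conjugates respectively. Then under sufficient regularity assumptions, we have

$$\inf_x \{f(x) - g(x)\} = \sup_{x^*} \{g^*(x^*) - f^*(x^*)\}.$$

Proof. Consider the convex function $h = -g$ and form the state $s_f : I \rightharpoonup \mathbb{R}^n$, $s_f(0, x) = f(x)$, and the effect $e_h : \mathbb{R}^n \rightharpoonup I$, $e_h(x, 0) = h(x)$. The proof proceeds by using functoriality to compute the scalar $(e_h \bullet s_f)^*$ in two ways. On the one hand, we have

$$(e_h \bullet s_f)(0, 0) = \inf_x \{f(x) + h(x)\} = \inf_x \{f(x) - g(x)\}.$$

On the other hand, we express the adjoints in terms of the conjugates, $s_f^*(x^*, 0) = -f^*(x^*)$ and $e_h^*(0, x^*) = g^*(x^*)$, so that

$$(s_f^* \bullet e_h^*)(0, 0) = \sup_{x^*} \{g^*(x^*) - f^*(x^*)\}.$$

The adjoint acts as the identity on scalars, so we obtain $e_h \bullet s_f = (e_h \bullet s_f)^* = s_f^* \bullet e_h^*$, which is the claim.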
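A small worked instance may help to see both sides of Fenchel duality agree. The following sketch assumes the standard form of the statement, $\inf_x \{f(x) - g(x)\} = \sup_{x^*} \{g^*(x^*) - f^*(x^*)\}$, with the concave conjugate $g^*(x^*) = \inf_x \{\langle x^*, x\rangle - g(x)\}$; the particular functions $f$ and $g$ are hypothetical choices made for illustration.

```latex
% Worked example on \mathbb{R}: f(x) = x^2/2 (convex), g(x) = -(x-1)^2/2 (concave).
\begin{align*}
\inf_x \{ f(x) - g(x) \}
  &= \inf_x \left\{ \tfrac12 x^2 + \tfrac12 (x-1)^2 \right\}
   = \tfrac14 \quad \text{(attained at } x = \tfrac12\text{)}, \\
f^*(x^*) &= \sup_x \left\{ x^* x - \tfrac12 x^2 \right\} = \tfrac12 (x^*)^2, \\
g^*(x^*) &= \inf_x \left\{ x^* x + \tfrac12 (x-1)^2 \right\}
          = x^* - \tfrac12 (x^*)^2 \quad \text{(attained at } x = 1 - x^*\text{)}, \\
\sup_{x^*} \{ g^*(x^*) - f^*(x^*) \}
  &= \sup_{x^*} \left\{ x^* - (x^*)^2 \right\}
   = \tfrac14 \quad \text{(attained at } x^* = \tfrac12\text{)}.
\end{align*}
```

Both functions are finite and closed, so the regularity assumptions are unproblematic in this instance.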

Hypergraph Structure and Symmetries
Bifunctions can not only be composed in sequence, but also in parallel. The relevant structure is that of a symmetric monoidal category $(\mathcal{C}, \otimes, I)$. In this work, we are dealing with a particularly simple form of such categories called a prop. A prop $\mathcal{C}$ is a strict monoidal category which is generated by a single object $R$, so that every object is of the form $R^{\otimes n}$ for some $n \in \mathbb{N}$. The monoid of objects $(\mathrm{ob}(\mathcal{C}), \otimes, I)$ is thus isomorphic to $(\mathbb{N}, +, 0)$.
Proposition 4. Convex bifunctions have the structure of a prop, generated by the object $\mathbb{R}$. The tensor of bifunctions is given by addition:

$$(F \otimes G)((x_1, x_2), (y_1, y_2)) = F(x_1, y_1) + G(x_2, y_2).$$

Proof (of well-definedness). This is a general fact about categories of weighted relations $\mathrm{Rel}(Q)$ [23].
Symmetric monoidal categories are widely studied and admit a convenient graphical language using string diagrams [28]. It is useful to consider further pieces of structure on such a category:
1. In a copy-delete category [11], every object carries the structure of a commutative comonoid $\mathrm{copy}_X : X \to X \otimes X$ and $\mathrm{discard}_X : X \to I$. This lets information be used in a non-linear way (in the sense of linear logic).
2. In a hypergraph category [14], every object carries the structure of a special commutative Frobenius algebra. Every hypergraph category is in particular a copy-delete category.
The pieces of structure of a hypergraph category are often rendered as cups and caps in string diagrams, with generators copy, discard, multiply and unit, subject to equations such as the Frobenius law.
[String diagrams: the generators copy, discard, multiply, unit, and the Frobenius law]
This gives rise to a rich graphical calculus, which has been explored for a number of engineering applications like signal-flow diagrams or electrical circuits [25,8,7,9,2,1].
Proposition 5. CxBiFn has the structure of a hypergraph category in two different ways, which we call the additive and co-additive structure. That is, every object carries two different structures as a special commutative Frobenius algebra:
1. The additive structure is given by $\mathrm{copy}(x, (y, z)) = \{|x = y = z|\}$, $\mathrm{discard}(x, 0) = 0$, $\mathrm{comp}((x, y), z) = \{|x = y = z|\}$ and $\mathrm{unit}(0, x) = 0$.
2. The co-additive structure is given by $\mathrm{coadd}(x, (y, z)) = \{|x = y + z|\}$, $\mathrm{cozero}(x, 0) = \{|x = 0|\}$, $\mathrm{add}((x, y), z) = \{|x + y = z|\}$ and $\mathrm{zero}(0, x) = \{|x = 0|\}$.
The analogous structures on CvBiFn use concave indicator functions instead.
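We can motivate the names of the hypergraph structures by observing how the multiplications act on states. This duality is clarified in what follows.
Example 5. Let $f, g : I \rightharpoonup \mathbb{R}^n$ be two states. Then the multiplication of the additive structure computes the pointwise sum $f + g$, while the multiplication of the co-additive structure computes the infimal convolution $(f \,\square\, g)(x) = \inf_y \{f(y) + g(x - y)\}$.
Definition 5. The dagger of a bifunction $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$ is given by reversing its arguments,

$$F^\dagger(y, x) = F(x, y).$$

The inverse of a bifunction $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$ is the concave bifunction [27, p. 384]

$$F_*(y, x) = -F(x, y).$$

Both these operations define involutive functors

$$(-)^\dagger : \mathrm{CxBiFn}^{op} \to \mathrm{CxBiFn}, \qquad (-)_* : \mathrm{CxBiFn}^{op} \to \mathrm{CvBiFn}.$$

The functor $(-)^\dagger$ is a dagger functor in the sense of [29]. The composite operation $F_*^*$ (the adjoint of the inverse) defines another covariant functor CxBiFn → CxBiFn, which we now interpret. As is customary in graphical linear algebra, we render the two hypergraph structures as follows:
[Diagram (1): string-diagram generators copy, discard, comp, unit (additive structure) and coadd, cozero, add, zero (co-additive structure)]
We refer to the additive structure as 'black' (•) and the co-additive one as 'white' (◦). This presentation reveals an array of symmetries (mirror-image and color-swap), which we relate now:
Theorem 2. The adjoint operation interchanges the additive and co-additive structure. That is, we have functors of hypergraph categories

$$(-)^* : (\mathrm{CxBiFn}, \bullet)^{op} \to (\mathrm{CvBiFn}, \circ), \qquad (-)^* : (\mathrm{CxBiFn}, \circ)^{op} \to (\mathrm{CvBiFn}, \bullet).$$

Note that the opposite of a hypergraph category is again a hypergraph category where cups and caps are interchanged.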
Proof. Follows from the results in §4.1, as the hypergraph structures are induced by linear maps.
In terms of the generators (1), the mirror image is given by the $(-)^\dagger$ functor. Both hypergraph structures consist of †-Frobenius algebras, meaning that $(-)^\dagger$ is a functor of hypergraph categories $\mathrm{CxBiFn}^{op} \to \mathrm{CxBiFn}$.
The color-swap operation is given by the inverse adjoint $F_*^*$, which gives a hypergraph equivalence (CxBiFn, •) → (CxBiFn, ◦). This equivalence does not, however, commute with †, i.e. it is not an equivalence of dagger hypergraph categories.

Example Categories of Bifunctions
We now elaborate example subcategories of bifunctions on which functoriality and duality work as desired (that is, all regularity conditions apply).

Linear Algebra
The identities and dualities of convex bifunctions generalize those of linear algebra. Let $A : \mathbb{R}^m \to \mathbb{R}^n$ be a linear map. The convex indicator bifunction of $A$ is defined as

$$F_A(x, y) = \{|y = Ax|\}.$$

The following facts hold [27, p. 310]:
1. For composable linear maps $A$, $B$ we have $F_A \bullet F_B = F_{AB}$.
2. The adjoint $(F_A)^*$ is the concave indicator bifunction associated to the transpose $A^T$.
3. If $A$ is invertible, then the inverse $(F_A)_*$ is the concave indicator bifunction associated to the inverse $A^{-1}$. In that case, Proposition 6 generalizes the identity $(A^{-1})^T = (A^T)^{-1}$.
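In more categorical terms, let Vect denote the prop of the vector spaces $\mathbb{R}^n$ and linear maps. This is a copy-delete category equipped with the linear maps $\Delta : \mathbb{R}^n \to \mathbb{R}^n \oplus \mathbb{R}^n$ and $! : \mathbb{R}^n \to \mathbb{R}^0$. For a linear map $A : \mathbb{R}^m \to \mathbb{R}^n$, define $F(A)$ to be its convex indicator bifunction and $G(A)$ its concave indicator bifunction.
Theorem 3. We have a commutative diagram of functors between copy-delete categories
[Diagram (2): the functors F and G out of Vect, related by the transpose on Vect and the (dashed) adjoint $(-)^*$ on bifunctions]
Proof. Functoriality and commutativity follow from the above facts. For the copy-delete structures, notice that copy, delete, add, zero are the indicator bifunctions of the linear maps $\Delta$ and $!$ and of their transposes. The transpose of $\Delta$ is summation $(x, y) \mapsto x + y$.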
We call a diagram like (2) a duality situation.The dashed arrows indicate that, while (−) * is neither a functor nor idempotent on all bifunctions without further conditions, everything works out on the image of F, G respectively.We could thus obtain a genuine commutative diagram of functors by characterizing these images exactly (which we refrain from doing here for the sake of simplicity).
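The relation between adjoints and transposes can be verified by a short computation. The following sketch assumes the adjoint convention $F^*(y^*, x^*) = \inf_{x,y} \{F(x, y) - \langle y^*, y\rangle + \langle x^*, x\rangle\}$ of [27, §30] and the convex indicator bifunction $F_A(x, y) = \{|y = Ax|\}$ of a linear map $A$.

```latex
\begin{align*}
(F_A)^*(y^*, x^*)
  &= \inf_{x, y} \left\{ \{|y = Ax|\} - \langle y^*, y \rangle + \langle x^*, x \rangle \right\} \\
  &= \inf_{x} \left\{ \langle x^*, x \rangle - \langle y^*, A x \rangle \right\} \\
  &= \inf_{x} \, \langle x^* - A^T y^*, x \rangle \\
  &= \begin{cases} 0 & \text{if } x^* = A^T y^*, \\ -\infty & \text{otherwise.} \end{cases}
\end{align*}
```

The result is the concave indicator bifunction of the transpose $A^T : \mathbb{R}^n \to \mathbb{R}^m$: a linear functional is bounded below only if it vanishes identically.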
Linear Relations. Graphical Linear Algebra [25] is the diagrammatic study of the prop LinRel of linear relations, which are relations $R \subseteq \mathbb{R}^m \times \mathbb{R}^n$ that are also vector subspaces. This category is a hypergraph category using the two structures shown in (1), and the operations mirror-image and color-swap are defined for linear relations via the relational converse $R^\dagger = \{(y, x) : (x, y) \in R\}$ and a twisted orthogonal complement $R^c$. The operations $(-)^\dagger$ and $(-)^c$ commute and define a composite contravariant involution $(-)^* : \mathrm{LinRel}^{op} \to \mathrm{LinRel}$. The following theorem shows that bifunctions are a conservative extension of graphical linear algebra. In addition, the embedding functor $I$ of (3), which sends a linear relation to its indicator bifunction, preserves both hypergraph structures.
Affine Relations. Graphical linear algebra has been extended to affine relations [6]; those are affine subspaces $R \subseteq \mathbb{R}^m \times \mathbb{R}^n$. This still forms a hypergraph category with both structures •, ◦; however, the color-swap symmetry of linear relations is broken. That is because the affine generator $1 : 0 \to 1$ representing the affine relation $\{(0, 1)\}$ does not have an obvious color-swapped dual; affine subspaces are not recovered by their orthogonal complements.
The embedding into bifunctions suggests an avenue to recover such a symmetry: Taking the embedding (3) as a starting point, the indicator bifunction of 1 is $f : I \rightharpoonup \mathbb{R}$, $f(0, x) = \{|x = 1|\}$, whose adjoint is the linear function $f^*(x^*, 0) = -x^*$, which is a perfectly well-defined bifunction but not the indicator bifunction of any affine relation. This suggests that an extension of affine relations with color-swap symmetry can be obtained using a category of partial affine functions (e.g. [27, p. 107]), but details are left for future work. We will discuss the case of partial quadratic functions in §5.2.

Convex Relations
Generalizing the previous example even further, a convex relation $R \subseteq \mathbb{R}^m \times \mathbb{R}^n$ is a relation which is also a convex subset of $\mathbb{R}^{m+n}$. Convex relations are closed under the usual relation composition and thus form a prop CxRel [3,12,23].
Every linear relation is in particular convex, and like linear relations, convex relations embed into convex bifunctions via the indicator function.
We sketch a certain converse to this embedding: The space $(\mathbb{R}, +, 0)$ is a monoid object in CxRel. We consider the 'writer' monad $T : \mathrm{CxRel} \to \mathrm{CxRel}$ associated to that monoid, i.e. $T(\mathbb{R}^m) = \mathbb{R}^{m+1}$. If $S \subseteq \mathbb{R}^m \times \mathbb{R}^{n+1}$ and $R \subseteq \mathbb{R}^n \times \mathbb{R}^{k+1}$ are Kleisli arrows, then Kleisli composition takes the following form:

$$R \bullet S = \{(x, z, t_1 + t_2) : (x, y, t_1) \in S, (y, z, t_2) \in R\}.$$

Given a convex bifunction $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$, the epigraph of its graph function $\mathrm{epi}(F) \subseteq \mathbb{R}^m \times \mathbb{R}^{n+1}$ is thus a Kleisli arrow for $T$. Under sufficient regularity assumptions, this is functorial, and we have an embedding $\mathrm{epi} : \mathrm{CxBiFn} \to \mathrm{CxRel}_T$.

Ordinary Convex Programs
We briefly discuss the historical origins of bifunctions in convex optimization [27, §29-30]: For simplicity, we say that an ordinary convex program $P$ is a minimization problem of the form

$$\inf\{f(x) : g_1(x) \le 0, \ldots, g_k(x) \le 0\}$$

where the objective function $f$ and the constraints $g_1, \ldots, g_k : \mathbb{R}^n \to \mathbb{R}$ are finite convex functions. The bifunction associated to $P$ is defined as

$$F_P(v, x) = f(x) + \{|g_1(x) \le v_1, \ldots, g_k(x) \le v_k|\}.$$

The inputs $v \in \mathbb{R}^k$ can be thought of as perturbations of the constraints. The so-called perturbation function of $P$ is the parameterized minimization problem $(\inf F_P)(v) = \inf_x \{F_P(v, x)\}$. The convex function $F_P(0, -)$ represents the unperturbed problem and $(\inf F_P)(0)$ is the desired solution. Note that in categorical language, the perturbation function is straightforwardly obtained as the bifunction composite $(\mathrm{discard} \bullet F_P) : \mathbb{R}^k \rightharpoonup I$, or graphically
[String diagram: $F_P$ followed by discard]
The associated bifunction $F_P$ contains all information about the problem $P$, and allows one to find the dual problem $P^*$ by taking its adjoint. This way one can think of any bifunction $\mathbb{R}^k \rightharpoonup \mathbb{R}^n$ as a generalized convex program ([27, p. 294]). For example, for the linear program $\inf\{\langle c, x\rangle : x \ge 0, Ax \ge b\}$, the adjoint of the associated bifunction is the concave bifunction associated to the dual maximization problem $\sup\{\langle b, y\rangle : y \ge 0, c - A^T y \ge 0\}$.
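The perturbation function and the dual problem can be illustrated on a tiny linear program. The following is a minimal sketch, assuming a one-dimensional program $\inf\{cx : x \ge 0, ax \ge b\}$ with hypothetical coefficients, brute-force optimization over a finite grid, and the simplification that only the constraint $ax \ge b$ is perturbed while $x \ge 0$ is kept fixed; the names `F_P`, `perturbation`, `primal` and `dual` are illustrative.

```python
import math

# Hypothetical one-dimensional linear program: inf { c*x : x >= 0, a*x >= b }.
a, b, c = 2.0, 1.0, 3.0
grid = [i / 1000 for i in range(0, 5001)]  # candidate points in [0, 5]

def F_P(v, x):
    """Associated bifunction: objective plus indicator of the perturbed constraint b - a*x <= v."""
    return c * x if (x >= 0 and b - a * x <= v) else math.inf

def perturbation(v):
    """Perturbation function (inf F_P)(v) = inf_x F_P(v, x), brute-forced on the grid."""
    return min(F_P(v, x) for x in grid)

# The unperturbed problem is the value of the perturbation function at v = 0.
primal = perturbation(0.0)

# Dual problem sup { b*y : y >= 0, c - a*y >= 0 }, again by brute force.
dual = max(b * y for y in grid if c - a * y >= 0)
```

Relaxing the constraint (larger `v`) can only decrease the optimal value, so `perturbation` is non-increasing in `v`. For this instance the primal and dual values coincide, as linear programming duality predicts.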

Gaussian Probability and Convexity
We now study the probabilistic applications of our categorical framework: Recently, a sizeable body of work in categorical probability theory has been developed in terms of copy-delete and Markov categories. A Markov category [17] is a copy-delete category $(\mathcal{C}, \otimes, I)$ where every morphism $f : X \to Y$ is discardable, i.e. $\mathrm{discard}_Y \circ f = \mathrm{discard}_X$. Classic examples of Markov categories are the category FinStoch of finite sets and stochastic matrices, and the category Stoch of measurable spaces and Markov kernels. Discardability expresses that probability measures are normalized (integrate to 1). Markov categories provide a natural semantic domain for probabilistic programs [30].
In this section, we will focus on Gaussian probability, by which we mean the study of multivariate normal (Gaussian) distributions and affine-linear maps. This is a small but expressive fragment of probability, which suffices for a range of interesting applications from linear regression and Gaussian processes to Kalman filters. The univariate normal distribution $N(\mu, \sigma^2)$ is defined on $\mathbb{R}$ via the density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

Multivariate Gaussian distributions are most easily described as the laws of random vectors $A \cdot X + \mu$ where $A \in \mathbb{R}^{n \times k}$ and $X_1, \ldots, X_k \sim N(0, 1)$ are independent variables. The law is fully characterized by the mean $\mu$ and the covariance matrix $\Sigma = AA^T$. Conversely, for every vector $\mu \in \mathbb{R}^n$ and positive semidefinite matrix $\Sigma \in \mathbb{R}^{n \times n}$, there exists a unique Gaussian law $N(\mu, \Sigma)$. If $X \sim N(\mu, \Sigma)$ and $Y \sim N(\mu', \Sigma')$ are independent then $X + Y \sim N(\mu + \mu', \Sigma + \Sigma')$ and $AX \sim N(A\mu, A\Sigma A^T)$. Gaussians are self-conjugate: If $(X, Y)$ are jointly Gaussian, then so is the conditional distribution $X | Y = y$ for any constant $y \in \mathbb{R}^k$.
If the covariance matrix $\Sigma$ is positive definite, then the Gaussian has a density with respect to the Lebesgue measure on $\mathbb{R}^n$ given by

$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left(-\frac{1}{2} (x - \mu)^T \Omega (x - \mu)\right)$$

where $\Omega = \Sigma^{-1}$ is known as the precision matrix. This suggests two equivalent representations of Gaussians with different advantages (e.g. [20,31]):
- In covariance representation $\Sigma$, pushforwards (addition, marginalization) are easy to compute. Conditioning requires solving an optimization problem.
- In precision representation $\Omega$, conditioning is straightforward. Pushforwards require solving an optimization problem.
If Σ is singular, the Gaussian distribution is only supported on the affine subspace µ + S where S = im(Σ).In that case, the distribution has a density only with respect to the Lebesgue measure on S.This variability of base measure makes it complicated to work with densities, and by extension the precision representation.
The situation becomes clearer if we represent Gaussians by the quadratic functions induced by their covariance and precision matrices.These functions are convex (concave), and turn out to be adjoints of each other.This explains the duality of the two representations, and paves the way for generalizations of Gaussian probability like improper priors [31] which correspond to partial quadratic functions ( §5.2).

Embedding Gaussians in Bifunctions
We now give a categorical account of Gaussian probability (in covariance representation). A Gaussian morphism $\mathbb{R}^m \to \mathbb{R}^n$ is a stochastic map of the form $x \mapsto Ax + N(\mu, \Sigma)$, that is, a linear map with Gaussian noise.

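Definition 6 ([17, §6]). The Markov prop Gauss is given as follows:
1. objects are the spaces $\mathbb{R}^n$
2. morphisms $\mathbb{R}^m \to \mathbb{R}^n$ are tuples $(A, \mu, \Sigma)$ with $A \in \mathbb{R}^{n \times m}$, $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$ positive semidefinite
3. composition and tensor are given by the formulas
$$(A, \mu, \Sigma) \circ (B, \nu, \Theta) = (AB, A\nu + \mu, A\Theta A^T + \Sigma)$$
$$(A, \mu, \Sigma) \otimes (B, \nu, \Theta) = (A \oplus B, (\mu, \nu), \Sigma \oplus \Theta)$$
where $\oplus$ is block-diagonal composition
4. the copy-delete structure is given by the linear maps $\Delta$, $!$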
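Composition of Gaussian morphisms is easy to prototype. The following is a minimal sketch, assuming the standard rule for pushing Gaussian noise through an affine map, $(A, \mu, \Sigma) \circ (B, \nu, \Theta) = (AB, A\nu + \mu, A\Theta A^T + \Sigma)$, and representing matrices as nested Python lists; all helper names (`matmul`, `transpose`, `matvec`, `matadd`, `compose`) and the example morphisms are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def compose(g, f):
    """Composite 'g after f' of Gaussian morphisms, each a tuple (A, mu, Sigma)."""
    A, mu, Sigma = g
    B, nu, Theta = f
    mean = [m1 + m2 for m1, m2 in zip(matvec(A, nu), mu)]
    cov = matadd(matmul(matmul(A, Theta), transpose(A)), Sigma)
    return matmul(A, B), mean, cov

# f : R^1 -> R^2 copies its input and adds independent standard normal noise.
f = ([[1.0], [1.0]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# g : R^2 -> R^1 sums its inputs and adds N(0.5, 0.25) noise.
g = ([[1.0, 1.0]], [0.5], [[0.25]])

A, mean, cov = compose(g, f)
```

In this example the composite sends $x$ to $2x$ plus noise with mean $0.5$ and variance $1 + 1 + 0.25 = 2.25$, matching the rules for sums of independent Gaussians recalled above.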
We have a Markov functor Gauss → Stoch which sends R n to the measurable space (R n , Borel(R n )) and assigns (A, µ, Σ) to the probability kernel given by x → N (Ax+µ, Σ).Functoriality expresses that the formulas of Definition 6 agree with composition of Markov kernels given by integration of measures.Our main theorem shows that, surprisingly, the representation of Gaussians by quadratic functions is also functorial, i.e. we have an embedding Gauss → CxBiFn.
Proof. We elaborate the proof systematically in the appendix.
The value $\mathrm{logpdf}_f(y, x)$ is indeed the conditional log-probability (4) minus a scalar. The name cgf is short for cumulant-generating function, which we elaborate in §6.2. For now, we can see cgf as a generalized covariance representation.

Outlook: Gaussian Relations
Measure-theoretically, there is no uniform probability distribution over the real line. Such a distribution, if it existed, would be useful to model complete absence of information about a point $X$; in Bayesian inference, this is called an uninformative prior. Intuitively, such a distribution should have density 1, but this would not integrate to 1. On the other hand, a formal logdensity of 0 makes sense: this is simply the indicator function of the full subset $\mathbb{R} \subseteq \mathbb{R}$.
An extended Gaussian distribution, as described in [31], is a formal sum N (µ, Σ) + D of a Gaussian distribution and a vector subspace D ⊆ R n , called a fibre, thereby blending relational and probabilistic nondeterminism.Such entities were considered by Willems in the control theory literature, under the name of linear open stochastic systems [34,35]; he identifies them with Gaussian distributions on the quotient space R n /D.A categorical account based on decorated cospans is developed in [31].
It is straightforward to embed extended Gaussian distributions into convex bifunctions, by taking the sum of the interpretations from Theorems 4 and 5: the distribution $\psi = N(\mu, \Sigma) + D$ has a convex interpretation given by the sum of the bifunctions associated to $N(\mu, \Sigma)$ and to $D$. Functions of this form are partial convex quadratic functions, which are known to form a well-behaved class of convex functions (see §7.3). The theory of such functions can be understood as an extension of Gaussian probability with relational nondeterminism and conditioning, which we term Gaussian relations.
In Gaussian relations, we achieve full symmetry between covariance and density representation (that is, there exists a color-swap symmetry).
Partiality is necessary to be able to interpret all generators of (1); on the upside, the presence of partiality makes conditioning a first-class operation: For example, if $f : \mathbb{R}^2 \rightharpoondown I$ is the joint logdensity of Gaussian variables $(X, Y)$, then conditioning on $Y = 0$ is the same as computing the bifunction composite with the zero map, which is a simple restriction of the logdensity, $f_{X|Y=0}(x) = f(x, 0)$. On the other hand, conditioning in the covariance representation $f^*$ requires solving the infimization problem $\inf_{y^*} \{f^*(x^*, y^*)\}$.
[String diagrams: conditioning on $Y = 0$ for $f$ versus for $f^*$]
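The two ways of conditioning can be compared on a small example. The following is a minimal sketch, assuming a centred bivariate Gaussian with a hypothetical precision matrix, logdensities taken up to an additive scalar, and the infimum over $y^*$ approximated on a finite grid; the function names are illustrative. Restricting the logdensity to $y = 0$ and infimizing the quadratic form of the covariance over $y^*$ should describe the same conditional distribution, with precision `a` and variance `1 / a`.

```python
# Hypothetical precision matrix [[a, b], [b, d]] of a centred bivariate Gaussian (X, Y).
a, b, d = 2.0, 1.0, 3.0
det = a * d - b * b
s11, s12, s22 = d / det, -b / det, a / det  # covariance matrix = inverse of the precision

def logdensity(x, y):
    """Joint logdensity up to an additive scalar (precision representation)."""
    return -0.5 * (a * x * x + 2 * b * x * y + d * y * y)

def cgf(xs, ys):
    """Quadratic form of the covariance matrix (covariance representation)."""
    return 0.5 * (s11 * xs * xs + 2 * s12 * xs * ys + s22 * ys * ys)

grid = [i / 100 for i in range(-1000, 1001)]  # candidate values for y*

def conditioned_logdensity(x):
    """Conditioning on Y = 0 in precision representation: plain restriction."""
    return logdensity(x, 0.0)

def conditioned_cgf(xs):
    """Conditioning on Y = 0 in covariance representation: infimization over y*."""
    return min(cgf(xs, ys) for ys in grid)
```

The infimization reproduces the Schur complement $\Sigma_{11} - \Sigma_{12}^2 / \Sigma_{22} = 1/a$, which is the conditional variance, whereas the restriction reads off the conditional precision `a` directly.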

A Wider Perspective
The example of Gaussian probability was a particular situation in which we could map probabilistic concepts to concepts of convex analysis in a functorial way. In this section, we will take an even wider perspective and view convex bifunctions as a categorical model of probability on its own. We will then point out known connections between probability theory and convex analysis, such as the Laplace approximation and cumulant-generating functions.

The Laplace Approximation
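For every copy-delete category $\mathcal{C}$, the subcategory of discardable morphisms is a Markov category, and can therefore be seen as a generalized model of probability theory. We investigate this notion for categories of bifunctions.
Proposition 7. Let $F : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$, $G : \mathbb{R}^n \rightharpoondown \mathbb{R}^m$ be bifunctions. Then $F$ is discardable if and only if $\inf_y F(x, y) = 0$ for all $x$, and $G$ preserves the zero state if and only if $G(0, -)$ is the concave indicator function of $\{0\}$; the adjoint $(-)^*$ defines a bijection between the two.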
The embedding of Theorem 5 takes values in discardable bifunctions and hence preserves Markov structure. Functoriality means that the composition of Gaussians (integration) and the composition of bifunctions (optimization) coincide. For general probability distributions, this will no longer be the case. We can however understand bifunction composition as an approximation of ordinary probability theory under the so-called Laplace approximation. In its simplest instance, Laplace's method (or method of steepest ascent) is a method to approximate certain integrals by finding the maxima of their integrands (e.g. [33]).
A wide class of commonly used probability distributions is log-concave, including Gaussian, Laplace, Dirichlet, exponential and uniform distributions. Laplace's approximation (e.g. [22, §27]) is a way of approximating such distributions around their mode $x_0$ by a normal distribution, as the second-order Taylor expansion of their logdensity resembles a Gaussian one:

$$\log f(x) \approx \log f(x_0) - \frac{1}{2} (x - x_0)^T \Omega (x - x_0), \qquad \Omega = -\nabla^2 \log f(x_0).$$

We can attempt to reduce questions about such distributions to mode-finding (maximization). The Laplace approximation is fundamental in many applications such as neuroscience [15,16] and has been generalized to a large body of literature on so-called saddle-point methods [10,24]. The existence of the functor from Gaussians to bifunctions expresses that, as desired, the Laplace approximation is exact on Gaussians. We give an example of the approximation not being exact (ironically) on Laplacian distributions.
Consider the standard Laplacian distribution with density $f(x) = \frac{1}{2} \exp(-|x|)$. Its negative logdensity is, up to a scalar, $h(x) = |x|$, whose convex conjugate is the indicator function $h^*(x^*) = \{|-1 \le x^* \le 1|\}$. The latter function is idempotent under addition, and conversely $h \,\square\, h = h$, so $h$ is idempotent under infimal convolution. In contrast, the density $f(x)$ is not idempotent under integral convolution: The sum of independent standard Laplacians is not itself Laplacian.
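The Laplacian counterexample can be reproduced numerically. The following is a minimal sketch, assuming the negative logdensity $h(x) = |x|$ of the standard Laplacian (up to a scalar), a grid approximation of the infimal convolution, and the known closed form $\frac{1}{4}(1 + |x|)e^{-|x|}$ for the density of the sum of two independent standard Laplacians; the function names are hypothetical.

```python
import math

def inf_convolution(h1, h2, x, grid):
    """Grid approximation of the infimal convolution inf_y { h1(y) + h2(x - y) }."""
    return min(h1(y) + h2(x - y) for y in grid)

grid = [i / 100 for i in range(-1000, 1001)]  # y in [-10, 10], step 0.01

def neg_log_sum_density(x):
    """Negative logdensity of the sum of two independent standard Laplacians."""
    return -math.log(0.25 * (1 + abs(x)) * math.exp(-abs(x)))

# Optimization side: |x| is idempotent under infimal convolution.
optimized = [inf_convolution(abs, abs, x, grid) for x in (0.0, 1.5, 3.0)]

# Integration side: the true negative logdensity of the sum differs from |x|
# by an amount that depends on x, so it is not |x| up to a scalar.
gaps = [neg_log_sum_density(x) - abs(x) for x in (0.0, 2.0)]
```

Here `optimized` reproduces $|x|$ at the sample points, while `gaps` takes different values at different points: the optimization-based composite and the integration-based composite disagree for Laplacians by more than an additive scalar.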

Convex Analysis in Probability Theory
For a random variable $X$ on $\mathbb{R}^n$, the moment-generating function $M_X$ is defined by the following expectation (provided that it exists): $M_X(x^*) = \mathbb{E}[e^{\langle x^*, X\rangle}]$. The cumulant-generating function is defined as its logarithm $c_X(x^*) = \log M_X(x^*)$.
The function $c_X$ is always convex. The cumulant-generating function of a multivariate Gaussian $X \sim N(\mu, \Sigma)$ is precisely

$$c_X(x^*) = \langle x^*, \mu\rangle + \frac{1}{2} \langle x^*, \Sigma x^*\rangle,$$

which explains our choice of the convex bifunction cgf associated to a Gaussian morphism in Theorem 5. The notion of cumulant-generating function has a central place in the study of exponential families. It is a particular fact about Gaussians that the cumulant-generating function is the convex conjugate of the logdensity. In the general case, the convex conjugate $c_X^*(x)$ does have a probabilistic interpretation as a so-called rate function in large deviations theory (Cramér's theorem, [13]). It has also been used to formulate a variational principle [37].
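The relation between logdensity and cumulant-generating function can be checked by hand in the univariate case. The following sketch assumes $X \sim N(\mu, \sigma^2)$ with $\sigma^2 > 0$ and works with the negative logdensity up to an additive scalar, $h(x) = (x - \mu)^2 / (2\sigma^2)$; the symbol $h$ is introduced here only for illustration.

```latex
\begin{align*}
h^*(x^*)
  &= \sup_x \left\{ x^* x - \frac{(x - \mu)^2}{2\sigma^2} \right\}
  && \text{(maximizer: } x = \mu + \sigma^2 x^* \text{)} \\
  &= x^* (\mu + \sigma^2 x^*) - \frac{\sigma^2 (x^*)^2}{2} \\
  &= \mu x^* + \tfrac12 \sigma^2 (x^*)^2 .
\end{align*}
```

This agrees with the cumulant-generating function $\log \mathbb{E}[e^{x^* X}] = \mu x^* + \tfrac12 \sigma^2 (x^*)^2$ of the normal distribution.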

Idempotent Mathematics
We zoom out to an even wider perspective. This subsection briefly outlines some further background on the connections between the convex and the probabilistic world. The logarithm of base $0 < t < 1$ defines an isomorphism of semirings $([0, \infty), \times, +) \to (\mathbb{R} \cup \{+\infty\}, +, \oplus_t)$, where $\oplus_t$ is given by $x \oplus_t y = \log_t(t^x + t^y)$. In the 'tropical limit' $t \searrow 0$, we have $x \oplus_t y \approx \min(x, y)$, so we can consider working in the semiring $(\mathbb{R}, +, \min)$ as a limit or deformation of the usual operations on the reals. The semiring $\mathbb{R}$ is idempotent, meaning $x \oplus x = \min(x, x) = x$. Hence this field of study is also known as idempotent mathematics [26], and the limiting procedure has been called Maslov dequantization [21]. Our definition of convex bifunctions in terms of the idempotent semiring $\mathbb{R}$ thus carries a strong flavor of idempotent mathematics.
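The deformation can be observed numerically. The sketch below, with hypothetical helper names `log_t` and `oplus`, checks on sample values that the logarithm of base $t$ turns multiplication into addition and addition into $\oplus_t$, and that $x \oplus_t y$ approaches $\min(x, y)$ as $t$ decreases towards $0$. The sample points and bases are arbitrary choices.

```python
import math

def log_t(a, t):
    # Logarithm of a > 0 to base t, for 0 < t < 1.
    return math.log(a) / math.log(t)

def oplus(x, y, t):
    # Deformed addition: x ⊕_t y = log_t(t**x + t**y).
    return log_t(t ** x + t ** y, t)

a, b, t = 2.0, 3.0, 0.5
# Multiplication is sent to ordinary addition ...
print(log_t(a * b, t), log_t(a, t) + log_t(b, t))
# ... and addition is sent to the deformed addition.
print(log_t(a + b, t), oplus(log_t(a, t), log_t(b, t), t))

# Tropical limit: as t decreases to 0, x ⊕_t y approaches min(x, y).
for base in (0.5, 1e-3, 1e-6, 1e-12):
    print(base, oplus(1.0, 2.0, base))
```

When $x = y$ the convergence is slow, since $x \oplus_t x = x + \log_t 2$, which tends to $x$ only logarithmically in $t$.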
Idempotent analogues of measure theory are discussed in [26,21], and many theorems in classical probability theory are mirrored by theorems of idempotent probability theory. For example, the idempotent analogue of integration is infimization; under this view, the tropical analogue of the Laplace transform (cf. moment-generating function) is the Legendre transform [21, §7], which explains the appearance of the cumulant-generating function in our work. Theorem 5 means that for Gaussians, it makes no difference whether we work in the real-analytic or idempotent world. Idempotent Gaussians have been defined in [26, 1.11.10] using the same formula (5).

Related and Future Work
We have described categories of bifunctions as a compositional setting for convex analysis which subsumes a variety of formalisms like linear functions and relations, as well as convex optimization problems, and has a rich duality theory and an elegant graphical language. We have then explored connections between convex analysis and probability theory, and shown that Gaussian probability can be equivalently described in a measure-theoretic and a convex-analytic language. The equivalence of these two perspectives is elegantly formalized as a structure-preserving functor between copy-delete categories. It will be interesting to see how this approach can be generalized to larger classes of distributions such as exponential families.
Concurrently to our work, the categorical structure of convex bifunctions has been exploited by [19] to compositionally build up objective functions for MPC in control theory. That work does not explore Legendre duality and the connections with categorical models of probability theory. The language of props has a history of applications in engineering [2,1,7], and our work was directly inspired by the semantics of probabilistic programming [32,30].
A starting point for future work is to flesh out the outlook given in §5.2, that is, to define a hypergraph category of partial quadratic convex functions, which generalizes Gaussian and extended Gaussian probability. It is also interesting to give a presentation for this prop in the style of [25]: we believe that this is achieved by the addition of a single generator $\nu : I \to \mathbb{R}$ to graphical affine algebra [6] which represents the quadratic function $f(x) = \frac{1}{2} x^2$, and that its equational theory is essentially given by invariance under the orthogonal groups $O(n)$. A similar equational theory has been attempted in [32], though no completeness has been proven. Diagrammatic presentations of concepts from geometry and optimization, such as polyhedral algebra and Farkas' lemma, have been given in [4,5].
We realize that the dependence on regularity assumptions (the caveat of §2) makes general theorems about categories of bifunctions like Theorem 1 somewhat awkward to state. We still believe that using a general categorical language is a useful way of structuring the field and making connections, but see the following avenues for improving the technical situation:
1. Identifying specific, well-behaved subcategories of bifunctions (such as convex relations, (partial) linear and (partial) quadratic functions) on which everything behaves as desired. This was pursued in §4 and §5.
2. The Legendre-Fenchel transform has been phrased in terms of enriched adjunctions in [36]. We hope that developing this enriched-categorical approach may take care of some regularity conditions in a systematic way.
Proposition 11. Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix. Then for all $y \in \operatorname{im}(A)$, the value $\langle y, A^- y \rangle$ does not depend on the choice of generalized inverse $A^-$.
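Proof. Let $y = Ax$. Then $\langle Ax, A^- Ax \rangle = \langle x, A A^- A x \rangle = \langle x, Ax \rangle$, using the symmetry of $A$ and the defining property $A A^- A = A$, so we need to show that this value does not depend on the choice of $x$. Let $x'$ be another solution of $y = Ax'$. Then $A(x - x') = 0$, and we can derive $\langle x, Ax \rangle - \langle x', Ax' \rangle = \langle x - x', Ax \rangle + \langle x', A(x - x') \rangle = \langle A(x - x'), x \rangle + 0 = 0$.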
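The statement can be illustrated on a small singular example. This is only a sketch: the matrix $A$, the two generalized inverses `G1` and `G2`, and the helper functions are hypothetical choices, and a generalized inverse is taken to mean any matrix $G$ with $AGA = A$. For a vector outside the image of $A$ the two values differ, which shows why the restriction to the image is needed.

```python
def matmul(P, Q):
    # Product of two square matrices given as nested lists.
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_generalized_inverse(A, G, tol=1e-12):
    # Checks the defining property A G A = A entrywise.
    AGA = matmul(matmul(A, G), A)
    return all(abs(AGA[i][j] - A[i][j]) < tol for i in range(len(A)) for j in range(len(A)))

A = [[1.0, 1.0], [1.0, 1.0]]          # symmetric and singular
G1 = [[0.25, 0.25], [0.25, 0.25]]     # Moore-Penrose pseudoinverse of A
G2 = [[1.0, 0.0], [0.0, 0.0]]         # a different generalized inverse of A

x = [1.0, 2.0]
y = matvec(A, x)                      # y lies in the image of A
print(inner(y, matvec(G1, y)), inner(y, matvec(G2, y)), inner(x, y))

y_out = [1.0, 0.0]                    # not in the image of A
print(inner(y_out, matvec(G1, y_out)), inner(y_out, matvec(G2, y_out)))
```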

Partial Quadratic Functions
The theory of (partial) quadratic functions is spelled out in [27, p. 109], which we summarize here. A quadratic function $q : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if it is of the form $q(x) = \langle x, Ax \rangle + \langle \mu, x \rangle + c$ with $A$ positive semidefinite. A partial convex quadratic function (pcqf) is a function of the form $f(x) = q(x) + \{|x \in M|\}$, where $q$ is a convex quadratic function and $M \subseteq \mathbb{R}^n$ is an affine subspace. One can show that every pcqf arises via suitable linear transformations from an elementary pcqf given by the diagonal form. From this formula, we can derive that the class of pcqfs is closed under convex conjugation.
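As a small illustration of closure under conjugation, consider the hypothetical pcqf $f(x_1, x_2) = a x_1^2 + \{|x_2 = 0|\}$ on $\mathbb{R}^2$. A direct computation gives $f^*(y_1, y_2) = y_1^2 / (4a)$, which is again a (total) convex quadratic function and does not depend on $y_2$. The sketch below checks this on a grid; the coefficient, the grids and the function names are assumptions made for the example, and the indicator is represented by `math.inf` outside the subspace.

```python
import math

A_COEFF = 2.0

def pcqf(x1, x2):
    # f(x1, x2) = A_COEFF * x1**2 on the subspace {x2 = 0}, and +infinity elsewhere.
    return A_COEFF * x1 * x1 if x2 == 0.0 else math.inf

X1 = [i / 100.0 for i in range(-1000, 1001)]   # grid on [-10, 10]
X2 = [j / 10.0 for j in range(-10, 11)]        # grid on [-1, 1], contains 0.0

def conjugate(g, y):
    # g*(y) = sup_x (<y, x> - g(x)), with the supremum taken over the grid.
    return max(y[0] * x1 + y[1] * x2 - g(x1, x2) for x1 in X1 for x2 in X2)

def expected_conjugate(y):
    # Closed form derived by hand for this particular example.
    return y[0] ** 2 / (4.0 * A_COEFF)

for y in ((4.0, 5.0), (4.0, -7.0), (-2.0, 0.0)):
    print(y, conjugate(pcqf, y), expected_conjugate(y))
```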

Example 4.
The states (morphisms $I \rightharpoonup \mathbb{R}^n$ out of the unit) are in bijection with convex functions $f : \mathbb{R}^n \to \mathbb{R}$, as are the effects $\mathbb{R}^n \rightharpoonup I$. States and effects in CvBiFn are in bijection with concave functions $f : \mathbb{R}^n \to \mathbb{R}$.

Theorem 4.
If we embed a linear relation $R \subseteq \mathbb{R}^m \times \mathbb{R}^n$ via its indicator function as a bifunction $I_R : \mathbb{R}^m \rightharpoonup \mathbb{R}^n$, then we have a commutative diagram