OPTIMASS: A Package for the Minimization of Kinematic Mass Functions with Constraints

Reconstructed mass variables, such as $M_2$, $M_{2C}$, $M_T^\star$, and $M_{T2}^W$, play an essential role in searches for new physics at hadron colliders. The calculation of these variables generally involves constrained minimization in a large parameter space, which is numerically challenging. We provide a C++ code, OPTIMASS, which interfaces with the MINUIT library to perform this constrained minimization using the Augmented Lagrangian Method. The code can be applied to arbitrarily general event topologies and thus allows the user to significantly extend the existing set of kinematic variables. We describe this code and its physics motivation, and demonstrate its use in the analysis of the fully leptonic decay of pair-produced top quarks using the $M_2$ variables.


Introduction
The CERN Large Hadron Collider (LHC) has successfully completed the decades-long quest to discover the particles of the Standard Model (SM) by finding the Higgs Boson [1,2]. The paramount question in the current Run 2 of the LHC is whether the LHC can reach the relevant energy scale to discover physics beyond the standard model (BSM). Popular frameworks for new physics such as Supersymmetry (SUSY) [3][4][5][6] and Universal Extra Dimensions (UED) [7,8] predict: 1. The presence of particles, such as neutralinos or KK-photons, that are not reconstructed in the detector, and are hence termed "invisible". In general, the production of these particles will lead to "missing transverse energy" 1 (MET) in an event.
2. Relatively complex decay topologies, in which pair-produced, generally colored, particles undergo several subsequent decays. Each "decay chain" thus produces one or more visible particles, as well as at least one invisible particle.
Searches for new particles that produce such long decay chains in combination with a MET signature are complicated by large backgrounds from tt, W , and Z production, often with additional jets from initial state radiation (ISR). The severity of these backgrounds in multijet and multilepton channels increases with the collider energy. Even if a signal of such new physics is seen, the corresponding measurements of particle properties such as masses, couplings, and spins, are highly nontrivial [9][10][11]. Therefore, sophisticated procedures must be used to separate signal from background and to extract the quantum numbers of the new particles. In a given event, one observes some number of "physics objects" which correspond to physical particles that have produced appropriate energy deposits in the tracker and calorimeters of the detector; we will refer to these objects as "visible particles". We shall denote their measured four-momenta by p µ j , where j is the visible particle label. At the same time, the existence of a non-vanishing MET in the event indicates the presence of some number of additional, invisible, particles with four-momenta q µ k , where k now labels the invisible particles. The individual momenta q µ k are not measured, and the only available piece of information is the missing transverse momentum (1.1) The MET, E T / , is then simply the magnitude of the missing transverse momentum vector: The essential question is how to take these sets of measured four-momenta, {p µ j } a , (one for each event a) and determine whether the events are produced purely by SM processes or whether new physics is at work. The standard procedure is to construct some kinematic variable, v, and compare the v distribution predicted by the SM to the data. In choosing a suitable variable, v, one typically follows one of these approaches: 1. The variable, v, is an analytic function of some, but not all, of the measured momentum degrees of freedom. This is the preferred approach in inclusive analyses, where one targets a specific subset of the event. For example, in an inclusive search for a visibly decaying resonance, X, one would select only the momenta of the hypothesized decay products j 1 , j 2 , ..., j n , and form the invariant mass of the resonance X, leaving out all other objects in the event. 2 In the case of missing energy events, the situation is much more complicated -we cannot reconstruct the mass of the resonance as in eq. (1.3), since we have not measured the momenta, q µ k , of the invisible decay products. Then, one typically tries to form a variable which correlates with the scale of M X . Various candidates have been tried, including the transverse momentum of the hardest object of a given type (lepton, jet, etc.) [12,13], the scalar p T sum of the four hardest jets (or of all jets) [14], the jet multiplicity [15,16], the "fat" jet mass [17], the "contransverse mass", M CT [18][19][20], the lepton energy [21][22][23] or lepton energy ratios [24,25], and many more. The advantage of such techniques is their simplicity and robustness -they do not involve too many theoretical assumptions, making them ideal for model-independent searches for new physics. At the same time, they appear to be suboptimal, since they do not utilize the full set of measured degrees of freedom, leading to a certain loss of information. It is also rather challenging to assign a proper physical meaning to a kinematic variable which only uses such partial information (for more detailed discussion, see refs. [14,26,27]).
2. The variable, v, is an analytic function of some measured momentum degrees of freedom and the measured E T / . The explicit inclusion of the measured E T / in the definition of v was the next attempt to design a better performing class of variables. Perhaps the best known example is the W transverse mass [28,29], where one identifies the transverse momentum of the missing neutrino with the measured / P T . Other possibilities include the "effective mass", M ef f [14,30], the √ŝ min variable [31][32][33], and the "razor" variables [34][35][36][37]. The outputs of neural nets, boosted decision trees, and other multivariate analyses [38], particularly those involving some form of machine learning, are also variables in this class. Incorporating the measured E T / , which is often a sensitive variable all by itself [39,40], into the definition of the kinematic variable, v, is certainly a step in the right direction.
3. The definition of the variable, v, involves all measured momentum degrees of freedom and the individual invisible momenta, q µ k . Finally, one may construct the variable, v, so that from the onset it has explicit dependence on the individual invisible momenta, q µ k . The advantage of this approach is that one works with theoretically motivated quantities with clear physical meaning [26]. The obvious downside is that the individual invisible particle momenta, q µ k , are unknown, and something must be done to fix their values in the calculation. There are two possible alternatives: • Integrate over all possible values of the invisible momenta. Perhaps the simplest solution is to allow all possible values of the invisible momenta, q µ k , which are consistent with the measured missing transverse momentum (1.1), and compute the variable, v, as a suitably weighted average. A celebrated example of this approach is the Matrix Element Method (MEM) [41][42][43] which is finding increased use in hadron collider physics [44][45][46][47][48][49][50][51][52][53]. However, the method is often too computationally challenging, requiring novel ideas and approaches [54][55][56]. Additionally, it is generally difficult to incorporate "reducible" backgrounds, which consist of events where the reconstructed particles are misidentified and/or their momenta significantly mismeasured.
• Use a physically motivated ansatz for the invisible momenta. Alternatively, instead of considering all possible values of the set of invisible momenta, {q µ k }, one could fix them by following some prescription specified in advance. With this approach, one gives up on trying to "guess" the correct values of the invisible momenta, and instead focuses on constructing a useful variable, v, whose properties can reveal important information about the underlying physics. Examples of such variables include the Cambridge M T 2 variable [57,58] and its variants [59][60][61], and the variables M 2C [62,63], M CT 2 [64,65], M T [66], M W T 2 [67], and M min [68]. The variables in this class are often specified, not by analytic formulae, but by the algorithm used to calculate them 3 .
Our focus in this paper will be on the algorithmically specified variables from the very last category, which are known to possess several attractive features: 1. They are "maximally constraining" [26,76] in the sense that, on an event per event basis, they provide the best possible lower bound on an invariant mass quantity of interest, such as the parent masses or the center-of-mass energy, √ŝ . This is particularly useful in cases where it is not possible to determine the actual values of that quantity due to incomplete event information.
2. Their kinematic distributions exhibit sharper endpoints which are easier to measure over the SM backgrounds [77], leading to a more precise determination of the new physics mass spectrum.
3. Certain measurements of their properties can be used as a self-consistency check on the assumed signal event topology [77,78]. If the check fails, our conjecture about the event topology is falsified, thus narrowing down the allowed set of possibilities.
At the same time, such algorithmically defined variables are notoriously difficult to compute. The algorithmic procedure typically involves calculating a mass for a set of hypothesized particles in an event (possibly after projecting their momenta onto the transverse plane), and minimizing the value of that mass with respect to all invisible momenta, q µ k . What makes the problem particularly challenging, however, is the presence of additional non-linear mass-shell constraints. In essence, we are faced with a multidimensional constrained optimization problem, where our objective function is an energy function which is to be minimized. Given the importance of the maximally constraining invariant mass variables for new physics searches and measurements [26,[76][77][78], it is important to have publicly-available software for performing constrained minimizations to calculate kinematic variables from high energy collision data with sufficient generality and efficiency. The existing packages described in the literature are typically only applicable to a specific variable, e.g., M T 2 [58,[79][80][81], or M W T 2 [67] and cannot be readily generalized to the whole class of on-shell constrained variables [76,77]. The standard approach is to try to solve the constraining equations, thus reducing the unknown number of degrees of freedom (d.o.f.), then implement an unconstrained minimization over the remaining d.o.f. While this approach generally provides the most efficient algorithm for a specific event topology, it is not extendable to more general event topologies, where not all constraints can be simultaneously solved analytically.
In this paper, we describe an alternative approach that is sufficiently universal and flexible, and can be applied to arbitrarily general event topologies. The main idea is to use the Augmented Lagrangian Method (ALM) [82,83], briefly described below in section 2.3. In this approach, the feasibility (i.e., the validity of the constraints) is ensured by penalizing infeasibility by adding "penalty terms" to the objective function, rather than by directly solving the constraining equations. The fact that the method does not require the solving of any constraints beforehand makes it very flexible and applicable to a very general class of event topologies. Of course, we still have to perform a standard unconstrained minimization, for which we can take advantage of any one of the many publicly available packages -we have chosen to use Minuit [84], which is widely popular in high energy physics. We also supply a comprehensive software package, Optimass 4 , in the form of a library, which interfaces with Minuit to perform the constrained minimization of a user-specified kinematic function using the ALM. Appendix B contains instructions on the installation and usage of Optimass.
The paper is organized as follows. In section 2, we review the general problem of constrained optimization with special emphasis on the motivation and the techniques used by Optimass, in particular the ALM. The relevant Minuit routines with which it interfaces are described in Appendix A. Section 3 describes in detail the algorithm behind the Optimass package and presents several toy examples for its validation. In section 4 we briefly review the M 2 variables [26,76,77], and provide examples of their use, in the study of the fully leptonic decays of pair-produced top quarks. We use this as an opportunity to compare the results from Optimass with previous studies and known analytical calculations. Finally, section 5 is reserved for our conclusions and a brief discussion of the future of Optimass.

Review of constrained minimization
While many excellent textbooks and references discuss the optimization techniques that we will utilize (see, e.g., refs. [85][86][87] and references therein), we feel that a brief review of the elements of optimization theory relevant to the operation of Optimass will prove useful. We first note that while we will sometimes speak of "optimization" rather than "minimization": (i) in the calculation of kinematic variables we will be interested only in minimization, and (ii) the methods used to find a maximum are, up to obvious changes in the sign of certain parameters, identical to those used to find a minimum. So in what follows we will not worry much about this distinction.
The first issue that will concern us is the important division of minimization problems into two types: (i) constrained and (ii) unconstrained. In constrained minimization, we want to minimize an objective function, where x, in general, refers to a point in R n formed from some unknown momentum degrees of freedom x 1 , x 2 , . . . , x n . In what follows, we shall assume that the number of constraints, m, is always less than the number of independent degrees of freedom, n, so that we are dealing with a true minimization problem. If the constraints in eq. (2.2) are all independent, then the parameter space is effectively reduced from n dimensions to n−m dimensions. Sometimes this reduction can be performed analytically. For example, consider an optimization problem in R 3 , subject to the constraint x 2 +y 2 +z 2 = 1 -then we should parameterize the two dimensional subspace that satisfies the constraints, i.e., the surface of the unit sphere, S 2 , by the angles θ and ϕ in the standard way.
However, this reduction of dimensionality cannot always be performed analytically. A useful alternative procedure, therefore, is to turn the constrained minimization problem into the problem of an unconstrained minimization 5 of a modified objective function,f ( x), over the full, unconstrained, parameter space, R n . This new problem can then be solved by setting the gradient of some function equal to zero, or by searching for a local/global minimum using one of the many standard numerical algorithms conventionally used for this purpose.
When performing this unconstrained minimization iteratively, one develops an algorithm for finding the location of the minimum off ( x), x * , which is also referred to as the minimizer. At each iteration, one starts with some initial estimate, x k , (typically taken to be the minimizer of the previous, k −1 st , iteration), then refining this estimate by obtaining a new minimizer, x k+1 , in some prescribed way, until certain convergence criteria are met. Since we have not analytically solved the constraints (2.2), the estimates, x k , will not, in general, satisfy the constraints exactly. Following the standard mathematical terminology, we shall refer to values of x that satisfy the constraints in eq. (2.2) as feasible. The absolute value of c a ( x) is then a measure of the feasibility 6 . Even though feasibility is not strictly guaranteed, the ultimate solution found by the method should nevertheless be such that the constraints are satisfied to within a required degree of numerical precision. 5 Or possibly, a series of unconstrained minimization problems. 6 A point x in the unknown momentum space R n is feasible if it is an element of the feasible set Ω defined by Ω = { x | ca( x) = 0, a = 1, .., m} .
In the remaining three subsections of this section, we discuss three possible ways to transform a constrained minimization problem into an unconstrained minimization problem, namely (i) the method of Lagrange multipliers (in section 2.1), (ii) penalty methods (in section 2.2), and (iii) the Augmented Lagrangian Method (in section 2.3). We shall see how penalty methods solve some of the problems associated with the use of Lagrange multipliers, while the ALM, in turn, resolves certain numerical issues related to the use of penalty methods. Of course we also must be able to solve the resulting unconstrained optimization problem; we discuss approaches to this challenge in Appendix A.

The method of Lagrange multipliers
As noted above, in a generic constrained minimization problem we are looking for the minimum value of the target function, f ( x), subject to the constraints (2.2): Alternatively, one is trying to find the location of the minimizer, x * , in R n . We note that in practical applications of the maximally constraining invariant mass variables both the minimum value of the function, f ( x * ), and the minimizer, x * , itself can serve a useful purpose. For example, in the M T 2 -assisted on-shell (MAOS) reconstruction method, the minimizer is used to provide an ansatz for certain transverse components of the invisible momenta [88,89]. Both the function, f ( x), and the constraints, c a ( x), are assumed to be smooth 7 real-valued functions in R n . The method of Lagrange multipliers provides necessary and sufficient conditions for finding local solutions of the minimization problem (2.3) above. In this method, we define a corresponding Lagrangian, henceforth denoted by L, for an objective function, f ( x), and constraints, c a ( x), by where λ ≡ (λ 1 , λ 2 , . . . , λ m ) is an m-component vector 8 of Lagrange multipliers, λ a . We now describe the conditions that must be satisfied in this method in order to establish the existence of a local minimum at the proposed minimizer, x * .
First order condition (FOC). In unconstrained optimization, a necessary condition for the existence of a local minimum of f at x * is that the gradient vector, ∇ x f ( x * ), vanishes. As this condition involves the gradient, we may term it a first order condition. In the method of Lagrange multipliers, an analogous condition holds for the existence of a local minimum. Here it is only necessary that the gradient of the objective function, f , at x * is orthogonal to the surface defined by the constraints. This condition will hold if the 7 In some cases, the objective function may depart from smoothness in a specific way; we discuss this point and related issues in sections 3.2.2 and 4.1, see also ref. [77]. 8 Throughout this paper we shall use the notation v for n-dimensional vectors in the space of independent variables, R n , and v to denote m-dimensional vectors in the space of constraints, R m .
, is an element of the vector space spanned by the gradient vectors of c a , ∇ x c a ( x * ). Thus the FOC is that there exists a Lagrange multiplier vector, λ * , in R m , such that at the point ( x * , λ * ) ∈ R n ⊗ R m , the following conditions hold Therefore, at least at the level of the FOC, the problem of constrained optimization has been reformulated as an unconstrained optimization problem.
Second order condition (SOC). As is well-known, the condition that a first derivative vanishes is not sufficient to establish that there is an extremum at that point -one must also verify that the second derivative is, in the case of a minimum, positive. The extension of this idea to unconstrained optimization in many dimensions is to require that the Hessian matrix, defined for the objective function, f , by and evaluated at the prospective minimizer, x * , be positive definite 9 . This is the "second order condition" (SOC); that both the FOC and the SOC are satisfied is sufficient for the existence of a local minimum. As long as we are not at a boundary of the parameter space, these conditions are both necessary and sufficient. We now consider the corresponding SOC for the Lagrange multiplier method, hoping that, as in the case of the FOC, we can obtain a condition analogous to the case of unconstrained minimization, i.e., a constraint involving the positive definiteness of some Hessian. We therefore evaluate the Hessian of the Lagrange function (2.4) with respect to x: To determine the SOC, we note that for x * to be a minimizer (with Lagrange multiplier vector, λ * ), we must have for any infinitesimal displacement, d, from the proposed minimizer, x * , in a direction allowed by the constraints. Thus the relevant condition is not the positive definiteness of the Hessian (2.7), but the positive definiteness of the restriction of the Hessian to the space allowed by the constraints. The fact that we must still use the constraints to see if a given stationary point (i.e., a point satisfying the FOC) is a minimum, means that we have failed in our mission to convert the constrained minimization problem to an unconstrained minimization problem. We therefore proceed to look for methods for which the SOC does not explicitly involve the constraints.

Penalty methods
A natural approach to our problem of transforming a constrained optimization problem to an unconstrained optimization problem is to follow the example of the method of Lagrange multipliers in modifying the objective function, but to do so in a different way. Clearly, we would like to modify the function so that infeasibility incurs a penalty. One possible way to achieve this is via "convexification" of the geometry near the desired solution point, i.e., making sure that a solution to the constrained minimization problem is a local minimum of the transformed function,f ( x), even if it is not a local minimum of f ( x) in the absence of constraints. We now proceed to give an example of one such convexification approach.
In the so-called penalty methods, the original objective function, f ( x), is modified by the addition of a penalty term, i.e., a functional of c a (x) weighted by a positive penalty parameter, µ, so that the term vanishes when c a ( x) = 0, but becomes large if c a ( x) = 0 in the µ → 0 limit. While there are various penalty methods, which differ in the form of the penalty function, we consider the "Quadratic Penalty Method" (QPM) here, because of its connection with the ALM discussed below. In the QPM, the penalty term is chosen so that the modified function 10 under consideration is In the course of the algorithm, the parameter, µ, will be reduced, as the desired properties of this function hold in the µ → 0 limit. The gradient of (2.9) is which reproduces the gradient of the Lagrange function (2.5) if we take to be (the components of) the Lagrange multiplier vector, λ. This shows that the necessary FOC for a minimum is the same as for the method of Lagrange multipliers, only the Lagrange multiplier vector is now determined for us. Since a minimization using a QPM should give the same value for the solution, x * , as the Lagrange multiplier method (which gives the solution ( x * , λ * )), it follows that as we approach this solution, we must have In determining the SOC for the QPM, we note that the Hessian of the function in (2.9) is If x * satisfies the constraints, then this expression simplifies to (2.14) If we consider an infinitesimal displacement, d, along the surface allowed by the constraints, then as On the other hand, for displacements, d , orthogonal to this surface, we obtain non-zero terms from both the objective function and the penalty term, i.e.
where ∇c a (x) · d is now non-vanishing. In the µ → 0 limit, the second term will dominate. So if the Hessian is positive definite with respect to allowed displacements, it is automatically positive definite in general. Thus, we now have that in the limit of µ → 0, the sufficient condition that the stationary point, x * , be a local minimum is simply that the Hessian of the function (2.9) is positive definite. Thus we have succeeded in overcoming the limitation of the method of Lagrangian multipliers in that both the FOC and SOC are the same as in unconstrained minimization. Unfortunately, all is not well when it comes to practical applications of the QPM. The basic problem is that in the µ → 0 limit the algorithm becomes too sensitive to small departures from feasibility, hence numerical instabilities may prevent convergence to the solution. In more formal language, small values of µ → 0 can result in severe illconditioning of the Hessian, since the rightmost term in (2.14), which dominates in the µ → 0 limit, is not invertible. Therefore, we will need to further modify the QPM, retaining the way in which it maps constrained optimization problems to unconstrained optimization problems, but reducing the relative importance given to feasibility in the µ → 0 limit. The solution, presented in the next subsection, is the Augmented Lagrangian Method (ALM).

Augmented Lagrangian Method
We seek a method which preserves the success of the QPM in generating FOC and SOC that correspond exactly to those obtained in unconstrained minimization, but which overcomes the difficulties encountered by the QPM in the µ → 0 limit. One approach is to introduce augmented Lagrange multiplier terms, leading to the following modified objective function:L Note that λ is no longer determined numerically by the optimization procedure, but is instead a fixed vector, just like the penalty parameter, µ. As was the case for the QPM, (2.18) is used iteratively; unconstrained minimization ofL( x ; λ, µ) is performed for the chosen values of λ and µ in each step of the procedure. After optimizingL( x ; λ, µ) to within some tolerance, new values of λ and µ are chosen; the process is repeated until the desired levels of optimality and feasibility are satisfied. This procedure is the ALM. We must verify that the ALM indeed avoids the problems of the QPM. To do this, we note that while the Hessian is given by We note that (2.19) recovers the expression for the method of Lagrange multipliers (2.5) with the substitution which shows that the FOCs for optimality are the same as for the problem of unconstrained minimization, just as in the case of the QPM. It is also possible, albeit more challenging, to show that the SOC for a minimum is the same as in the unconstrained case; see refs. [82], [83], and [90] for details. We also conclude from (2.21) that in the asymptotic limit in analogy to (2.12). The point of the ALM is that now that we are free to choose both λ a and µ, we can enforce (2.22) without taking the µ → 0 limit. We shall do this by choosing the value of λ a in the (k + 1) st iteration as follows As we do not take µ k → 0, the Hessian defined in (2.20) should never become ill-conditioned. Hence the ALM avoids the major drawback of the QPM, while preserving its successes. We therefore implement the ALM in the Optimass code, as described in the following section.

Step one: initialization
At the onset, we must specify certain parameters. We first discuss the parameters associated with the optimality condition, followed by those related to the feasibility condition. Finally, we will discuss the penalty/Lagrange multiplier parameter, and the starting value for the minimizer.
Optimality condition. The test of optimality, i.e., whether the relevant sub-minimization procedure, has reached an acceptable local minimum in the k th iteration, should be performed in terms of some relevant geometric quantity. In Optimass, we choose the Migrad algorithm of Minuit (see Appendix A.1), where the optimality criterion utilizes the so called "Estimated vertical Distance to the Minimum" (EDM) defined as follows Using the EDM (3.1), at each iteration k in Optimass the optimality convergence test in Minuit is performed by checking if the EDM is smaller than the optimality tolerance, ω k : We have observed that simply setting ω k to a constant, ω * , suffices. By default in Migrad, this constant is set by the internal re-parametrization In Optimass, the optimality tolerance ω * is controlled and set by the tolerance parameter through the interface with Migrad 12 . Its default value is set to be while the parameter up is not used, it is set internally in Optimass to up = 1.
Feasibility condition: Throughout the whole minimization procedure, we test for feasibility by computing the quantity 13 in each iteration. Then, the test for feasibility is where η k denotes the feasibility tolerance in the k th iteration; this criterion evolves iterationby-iteration, unlike the optimality parameter ω k . Although η k eventually approaches zero as k → ∞, we set the final convergence criterion as follows: where η * denotes the terminal feasibility tolerance set to be Here M is the appropriate typical mass scale associated with the target mass function, its value should be chosen depending on the specific physics process at hand. In the current version of Optimass, the user is expected to provide the relevant fixed value of the scale 12 The specific method to set this quantity is where maxfcn denotes maximum number of function calls after which the Minuit minimization routine will be stopped even if it has not yet obtained satisfactory convergence to an acceptable minimum. The default value is maxfcn = 5000. 13 In eq. (3.6), we assume that all constraints ca have the same mass dimension dim(ca) = dim(c) = 1, otherwise each term in the RHS should be raised to the appropriate power of 1/dim(ca).
M . 14 The rules for updating η k will be described in section 3.1.3. The starting feasibility tolerance is initialized by Here the pre-factorη is given byη with some coefficient α chosen in the range 10 < ∼ α < ∼ 1000, (3.12) µ 0 is the starting penalty parameter given in (3.16) below, whileγ ∈ (0, 1), and our default for it isγ = 0.2. (3.13) By adjusting the coefficient α within the range (3.12), one can control the relative scale of the initial feasibility tolerance, η 0 , to its terminal value, η * . This in turn determines the relative number of iterations within each of the two regimes (phase-1 and phase-2) described in section 3.1.3 below. If α is too large, the ALM iterations in phase-1 terminate very quickly, and the majority of the ALM iterations are performed in the regime of phase-2, where the Lagrange-multipler driven evolution may not be efficient. On the other hand, if α is chosen to be too small, most of the ALM iterations will be done in the regime of phase-1, which only reduces the penalty parameter µ k . In that case, Optimass will not be able to take full advantage of the ALM method in avoiding the ill-conditioning as explained in section 2.2. We recommend that users test several different values of α, until the number of iterations in each phase is adequate and the results are stable. Finally, the "tightening" parameters, β k η ∈ (0, 1), for the feasibility constraint are set to be β 0 η = 0.5, (3.14) Penalty and Lagrange multiplier parameters: The penalty parameter, µ, and Lagrange multiplier parameter vector, λ, are updated in each iteration (the actual assignment rule will be explained below in section 3.1.3). Their starting values are λ 0 a = 0 (a = 1, · · · , m). If the value of τ µ is too small, the penalty parameter µ may decrease too quickly, causing strong convexification, which can lead to ill-conditioning. Conversely, if the value of τ µ is too large, one can experience slow convergence and a premature transition to phase-2 (see section 3.1.3 below).
Initial minimizer and initial step size for Minuit: The initial guess for the minimizer of the objective function, x s 0 , (referred to as "seeding" in the flowchart of figure 1), is set by the invisible momentum configuration corresponding toŝ min [31], i.e., the invisible momenta are such that the total invariant mass in the event is minimized. In addition, the initial step size, ∆ x s 0 , from from the x s 0 toward the final minimizer, x * , is another input parameter for the Minuit initialization 15 .

Step two: unconstrained minimization with Minuit
Once the initial parameters have been chosen, we use the ALM mapping of a constrained minimization problem to an unconstrained minimization problem and then perform the latter minimization with Minuit (see Appendix A). In the process of this minimization, we perform an appropriate adjustment of parameters in each step of the algorithm. The code has two different options for minimization with Minuit in the k th iteration: 1. Migrad: Using as a starting value the minimizer, x k−1 , obtained in the previous iteration, Migrad searches for a minimizer, x k,1 .

Simplex and Migrad:
Again using as a starting value the minimizer, x k−1 , obtained in the previous iteration, Simplex first finds a minimizer, x k,S , and then Migrad uses x k,S as a starting value to find the final minimizer, x k,2 .
Among the two possible answers, x k,1 and x k,2 , we choose as our minimizer, x k , the one which gives a lower value for the objective function. One could repeat either algorithm 1 or algorithm 2 (or other minimization procedures) with various starting points, x 0 , to find a global minimum more accurately. However, we find that the combination of algorithms 1 and 2 described above is adequate to obtain accurate on-shell constrained M 2 values, while keeping the computational effort to a minimum. Interestingly, we find that algorithms 1 and 2 are complementary to each other. For example, in the M 2 calculations of section 4, we found that for the maximally constrained case of M 2CC , the final solution, x k , was given by the answer x k,1 from algorithm 1 ( x k,2 from algorithm 2) in 83% (17%) of the events. This trend was reversed in the calculation of the minimally constrained case of M 2XX , where algorithm 1 (algorithm 2) supplied the final solution x k in 19% (81%) of the events. This behavior can be understood as follows. Migrad relies on gradient information, thus 15 The initial input values for x s 0 and ∆ x s 0 can be set via the method it can be fast and accurate if the objective function is smooth and continuous. On the other hand, Simplex does not require gradient information, and can handle more complicated functions (including "folds" and "creases"), although the ultimate accuracy is not as high.
In the case of M 2CC , the objective function is convexified by the penalty terms, which makes the relevant geometry near the local minimum smooth and well-defined, thus we expect algorithm 1 by Migrad to outperform algorithm 2. However, it is known that the M 2XX objective function, "the maximum of the two invariant masses", develops a crease, on which the solution is found [77], and therefore one might expect algorithm 2 by Simplex to work better for this case. The performance of Optimass with regard to the M 2 variables will be discussed in more detail in section 4.

Step three: ALM parameter adjustment
Once the minimization routine has obtained a value of the minimizer, x k , we then evaluate the constraints at this minimizer, i.e., c a ( x k ). Depending on the value of the feasibility (3.6), we define three phases: "Phase 1", "Phase 2", and "Phase 3". While "Phase 3" is nothing but terminating the entire minimization procedure, the other two phases basically tighten the feasibility tolerance. Our tightening scheme is inspired by the LANCELOT package [91]. As mentioned earlier, the tolerance, ω k , for the optimality condition (3.3) is not evolved: ω k = ω * .
Phase 1. The feasibility condition is far from satisfied: In this case we put more weight on the penalty term by reducing µ k+1 for the next iteration. At the same time, the Lagrange multiplier vector, λ k , remains unchanged. In the next iteration, the starting value of the minimizer, x s k+1 , is set to be the minimizer obtained in the previous iteration, x k . Finally, the feasibility tolerance, η k+1 , is also evolved. The detailed updating rules for Phase 1 are thus Phase 2. The feasibility condition is converging, but insufficient to terminate the algorithm: In this case, we do not reduce the penalty parameter, µ k , and instead adjust the values of the Lagrange multiplier vector by the rule in eq. (2.23). The starting value of the minimizer, x s k+1 , for the next iteration is set as in Phase 1. The feasibility tolerance is also modified. The detailed updating rules are given by Phase 3. In this phase, we have achieved sufficient feasibility: We therefore break the sub-minimization loop and return x k as the final value, x * , of the minimizer.

Validation
We now demonstrate the performance of the algorithm described in the previous subsection with two simple examples, for which one can also obtain analytic solutions for the minimizer, x * , and the Lagrange multiplier vector, λ * . The first example, considered in section 3.2.1, yields a well-defined solution at a unique global minimum. In the second example, treated in section 3.2.2, we find that the solution for the Lagrange multiplier vector is not well-defined because the relevant objective function is "folded" and is not differentiable at x * . The examples illustrate the evolution of the ALM parameters and demonstrate how the solution found in the k th iteration converges to the true value in terms of feasibility and optimality.

Example one
Our first example involves minimizing the objective function over the usual plane, x ≡ (x, y), subject to the constraint This constraint implies that our solution must lie on a unit circle centered at the origin. Clearly, the function (3.30) is minimized along the circle at the point which is the global minimizer in this example. The objective function, (3.30), is plotted in the left panel of figure 2. The locus of feasible points (i.e., the unit circle about the origin,) is shown in black. With the Lagrange multiplier method, one adds a Lagrange multiplier term to the objective function as in (2.4) L(x, y, λ) = x + y − λ (x 2 + y 2 − 1). There are two stationary points given by The Hessian corresponding to the function (3.33) is and the condition for a minimum (i.e., that H L should be positive definite) requires us to choose λ < 0, which selects the correct minimizer among the two stationary points (3.34): confirming the earlier result (3.32).
We now wish to verify that the Optimass algorithm reproduces this solution. After running the code, we obtain (x * , y * ; λ * ) = (−0.707106, −0.707106; −0.707180), (3.37) which is consistent with (3.36) ( √ 2/2 = 0.707107...). We note that this convergence only required five steps, suggesting that the minimum was found relatively easily. The right panel of figure 2 shows a contour plot of the augmented Lagrangian, L(x, y; µ 5 , λ 5 ) = (x + y) − λ 5 (x 2 + y 2 − 1) + 1 2µ 5 (x 2 + y 2 − 1) 2 with  The upper right and lower left panels in figure 3 respectively show the evolution of (the initial values of) the penalty parameter, µ k , and the Lagrange multiplier, λ k , in each iteration. We see that during Phase 1, µ k was updated while λ k was fixed, while in Phase 2, λ k was updated while µ k was held fixed. Throughout this process, the solution, x k , from each step gradually converges to the analytic solution, (3.32), as shown in the lower right panel in figure 3. Note that from the first iteration on, the solutions are on the diagonal line x = y (see also the right panel in figure 2) -except for their starting values, the red and blue lines in the lower right panel of figure 3 essentially overlap.

Example two
Our second example is very similar to the one considered in the previous subsection, except now we change the objective function to f (x, y) = |x + y| , (3.39) and we keep the same constraint as before: The left panel in figure 4 plots the objective function, f (x, y), as well as the feasible set (the unit circle centered on the origin). Note the "fold" along the line y = −x. On this line, the objective function is not a smooth function, hence one of the basic assumptions generally employed in the theory of constrained optimization (see section 2) is not satisfied. Nevertheless, in such cases we typically find that the Optimass algorithm still converges efficiently to the correct solution.
It is clear from figure 4 that the current problem has a two-fold ambiguity, there are two equivalent solutions for the global minimizer: Starting from the initial point (x s 0 , y s 0 ) = (0.6, 0.7), Optimass converges to (x * , y * ) = (−0.707132, 0.707132) (3.42) in a single iteration (i.e., without taking µ → 0). The right panel in figure 2 shows a contour plot of the augmented Lagrangian, for the only step that the algorithm needed, the initial step. The "folded valley" feature of the objective function suggests that one does not need much additional convexification from the penalty term (since we are already in a valley), and the algorithm converges very quickly.
In both of these two toy examples, as well as in numerous physics motivated studies described in the next section, we verified that the numerical solutions obtained with Optimass are stable with respect to small variations of the default initial values of the parameters, and in particular x s 0 .

Calculating M 2 variables with Optimass
In this section, we describe the main intended use of Optimass, the calculation of kinematic variables suitable for analyzing missing energy events at hadron colliders. The calculation involves a minimization of a mass function over a number of invisible particle momenta, subject to certain constraints (e.g., on-shell constraints, or the missing transverse momentum constraint (1.1)). In particular, we will show how the code can be used to calculate the recently proposed M 2 variables with non-linear constraints [26,76,77]. We first provide a brief review of the M 2 variables; then define the relevant objective function and identify the sorts of constraints that may be imposed. We then demonstrate the performance of Optimass in the calculation of M 2 in the physically important case of top quark pair production, when both tops decay leptonically.

Introduction to M 2
The M 2 variable [26,76,77] is a (3 + 1)-dimensional analogue of the well-known M T 2 variable [57]. Both are typically applied to final states that may result from (a) the pair 16 production of "mother" particles that (b) subsequently decay to both visible and invisible particles. The best motivated scenarios typically have too many invisible particles in the final state, so that we cannot, in general, reconstruct the masses or momenta of all of the Let us consider for concreteness a process in which a pair of heavy particles, A, undergo identical two-step, two-body, cascade decays, i.e., each A decays into two (massless) visible particles, a and b, plus a (massive) invisible particle, C, via an on-shell intermediate particle, B, (see figure 5) A → a B → a b C. (4.1) For simplicity, we assume that all visible particles are fully distinguishable and that particle a is emitted before particle b, i.e., we do not address the combinatorial issues since they are not relevant for the current discussion of computing the M 2 variables. Given the event topology of figure 5, one can consider three different subsystem topologies, depending on which of the particles along the red dashed lines are treated as mothers and which are treated as daughters [73,77]. For example, considering the event as a whole corresponds to subsystem (ab), in which A i are the mothers, C i are the daughters, while B i are intermediate resonances, dubbed "relatives" in [77]. We are interested in placing the maximum possible lower bound on the mass of A, as a function of the hypothesized mass, m, of C. The prescription for doing so is well-known (see, e.g., [26]): we minimize of the heavier of the two parent masses, M A 1 and M A 2 , subject to relevant kinematic constraints. In the simplest case, we only apply the missing transverse momentum constraint, (1.1), and obtain where q i denotes the three-momentum of the invisible particle, C i . Note that the missing transverse momentum condition is linear and is easily solved, so that the minimization in (4.2) is unconstrained and can be performed over an unconstrained four-dimensional momentum space, e.g., { q 1T , q 1z , q 2z }.
The situation becomes much more interesting (and challenging) when we consider additional nonlinear constraints. Given the process of figure 5, it is natural to consider additionally constrained versions of eq. (4.2). Having already made the assumption that the two decay chains are identical 17 , we can additionally impose that the particles A 1 and A 2 have the same mass, that the particles B 1 and B 2 have the same mass, 4) or both (4.3) and (4.4). Together with the case where neither (4.3) or (4.4) is required to hold, a total of four variants are therefore possible. Following the same notation as ref. [77], we introduce two more subscripts on the M 2 variable to indicate whether the constraints in eqs. (4.3) and (4.4) were applied during the minimization or not. The first subscript will refer to the parent constraint in eq. (4.3), while the second subscript will refer to the relative constraint in eq. (4.4). If a constraint is imposed, the corresponding index is "C", otherwise it is "X". Therefore, eq. (4.2) can be expressed as M 2XX because no extra constraints are imposed: The other three variables are formally defined as follows: 17 See refs. [60,61,78] for relaxing this assumption.
Eqs. (4.5-4.8) define the four possible M 2 variables for the (ab) subsystem. One can similarly define four M 2 variables for each of the (a) and (b) subsystems, we refer the reader to [77] for the exact definitions.

Calculating M 2 for dilepton top events
From the definitions (4.6-4.8) it is clear that the problem of computing the variables M 2CX , M 2XC , and M 2CC falls into the general category of constrained minimization problems (2.1, 2.2) which Optimass is designed to solve. In the remainder of this section, we shall therefore illustrate the functionality of Optimass with the physics example of figure 5. Specifically, we consider the case of pair-produced top quarks that decay fully leptonically: Thus, in figure 5, particle A is associated with the top quark, particle B with the W -boson, and particle C with the neutrino. For simplicity, since our major interest is not in the shape of the distributions but in the precision of the minimization procedure, we consider events where the top quarks are produced at threshold and decay according to phase space. We neglect initial and final state radiation, and also do not take into account experimental efficiencies, cuts, combinatorics, and detector resolution. All those effects are important in a real physics analysis, but are irrelevant to the question of evaluating the performance of the Optimass minimization algorithm, which is our goal here. We use Optimass to compute the values for the M 2CX , M 2XC , and M 2CC variables for all three subsystems. In order to judge the precision of this numerical calculation, ideally we need to identify an alternative method for computing these answers, which would give us reference benchmarks. Fortunately, for the case of the M 2CX variable, such a benchmark is provided by the M T 2 variable itself -we can use the result, proven in ref. [77], that M 2CX and M T 2 are identical event by event. This enables us to directly compare the values of M T 2 and M 2CX for each of the three possible subsystems. The Cambridge variable M T 2 can already be reliably computed with one of several publicly available codes; here we  use the package from ref. [79]. Furthermore, for the (ab) and (a) subsystems, analytical formulae for M T 2 are also available [69,72], facilitating the comparison. 18 Figures 6 and 7 show the results from the comparison between the value of M 2CX obtained from Optimass and the corresponding reference value, M Ref T 2 , for all three subsystems. Since the exercise is performed with the tt decay sample, we take the trial mass to be the true mass of the daughter particle:m = 0 GeV for the (ab) and the (b) subsystems andm = 80 GeV for the (a) subsystem. We choose the M parameter in eq. (3.9) to be 200 GeV for subsystems (ab) and (a), and 100 GeV for subsystem (b). Figure 6 reveals that the distributions of the on-shell constrained M 2 variable, M 2CX , (red dashed histograms) are almost identical to the corresponding M T 2 distributions (blue dot-dashed shaded his-  Of course, the precision can be further improved by tweaking the relevant tolerance parameters, increasing the maximum number of iterations, or improving on the initial guess of x s 0 . However, this will come at the cost of increased computation time; we believe that the level of precision seen in these figures should be sufficient for most practical analyses. Figures 8 and 9 provide a similar validation for the case of the M 2XC and M 2CC variables. In this case, however, we do not have a readily available benchmark for comparison: first, because analytical formulas for those cases do not exist, and second, because there is no publicly available code which is able to handle M 2XC and M 2CC . This is why we produced two different versions of our code (created independently by different sets of the current authors), and proceeded to compare their results for M 2XC and M 2CC in figures 8 and 9, respectively. The figures show that the two internal codes agree reasonably well, with notable differences only in about 1% of the events. The events with the largest deviations were scrutinized further, revealing that one of the codes typically found a local minimum, due to a different choice of starting values for  Figure 11. The same as figures 2 and 4, but for the single event considered in section 4.3. Since the objective function has four independent arguments, in order to visualize the evolution of the minimizer, we plot q 1z and q 2z , having fixed the other two variables, q 1x and q 1y , to the values which minimize the objective function for the given choice of q 1z and q 2z . Figure 10 is the analogue of Fig. 3, showing the convergence to a satisfactory solution for M 2CC in subsystem (ab) after the k = 4 step, giving   .4)) is shown in blue (red). As before, the magenta points mark the locations of the minimizer, x k , found in the k th iteration.

Conclusions
With the restart of the LHC, the quest for new physics has resumed. We believe that kinematic variables like M 2 will play an increasingly important role in searching for SUSY and related models; the gain in sensitivity that these variables provide (see, e.g., [78]) aids both in setting limits on and in discovering BSM physics.
Since the calculation of M 2 , like many other kinematic variables, involves a constrained minimization that must be performed numerically, it is important to ensure that this calculation is performed in an efficient and reliable way. Thus we have introduced the public package, Optimass, which achieves these important goals. Our algorithm utilizes the ALM and interfaces with the popular unconstrained minimization package, Minuit. Figure 12. An example of the general event topology which can be handled by Optimass. We allow for two decay chains, involving a sequence of 2-body or 3-body decays. In each decay, the final state particles can be visible, invisible, or both.
We have described the relevant issues in what we hope will be sufficient depth to aid the physicist new to the challenges of constrained optimization. The Optimass algorithm has been described in detail and examples of its use have been provided. We compared analytically calculated values of M T 2 to the M 2CX variable obtained using Optimass and found excellent agreement. Other tests of Optimass were performed and the results are encouraging. We stress that while our physics example in section 4 was limited to the dilepton tt topology of figure 5, Optimass has been designed to handle arbitrarily general event topologies, as indicated in figure 12: • Multiple invisible particles in each decay chain. In many motivated scenarios, invisible particles may appear not only at the end of the decay chain (as is customary for dark matter particles), but also at intermediate stages. The Optimass code can handle such cases, since the total number of invisible particles is unrestricted. In figure 12, the first decay in the lower chain and the second to last decay in the upper chain provide examples of sources of such intermediate invisible particles.
• Multi-body decays. The decay chains can be constructed of two-body decays, threebody decays, etc. Furthermore, a multi-body decay may result in a set of final state particles which can be visible, invisible, or both. As an illustration, in figure 12 we show three two-body decays and three three-body decays. Two of the three-body decays result in visible particles only, while the remaining one produces both a visible and an invisible final state particle.
This more general functionality of Optimass will be explored and demonstrated in a future publication [101].
In conclusion, we look forward to implementing improvements to and extensions of the Optimass framework, and to its use in the search for new physics that lies ahead.

A Unconstrained minimization with Minuit
Using the methods of section 2, and in particular the ALM, we can convert a constrained optimization problem into an unconstrained minimization problem; the next step is to actually solve the resulting unconstrained minimization problem. For this task we use Migrad and Simplex, two algorithms which are part of the Minuit library. Thus, after briefly introducing this ubiquitous library, we will discuss these algorithms in sections A.1 and A.2, respectively.
Minuit [84] is a popular function minimization library. It is often used for data analysis, as the minimization of χ 2 functions and likelihoods represents perhaps the main use of minimization in experimental particle physics. Minuit was initially written in Fortran, but has been reimplemented (as Minuit2 [92]) in C++, taking advantage of its object-oriented features; Minuit2 is included in the math library of the omnipresent data analysis package Root [93]. Minuit and Minuit2 (which we will henceforth refer to simply as "Minuit") contain various minimization algorithms, offering the user a choice. Among the several main algorithms (Migrad, Seek, Scan, Simplex), we choose to use Migrad and Simplex which are briefly described in the next two subsections.

A.1 Migrad
Migrad utilizes a variation of the Newton's method called a Variable Metric Method (VMM) [94]. We remind the reader that Newton's method is an iterative method for finding the root of a function f (x) in which Finding a minimum of f (x) rather than a root corresponds to finding a zero of f (x). In this case the sequence of approximate solutions obtained by Newton's method are described by The analogous expression in the multidimensional case is where H −1 ( x n ) is the inverse of the Hessian matrix. The name "Variable Metric Method" is due to an interesting parallel with General Relativity. Namely, in the limit where the objective function, f ( x), is a quadratic form with minimum at x = 0, then, for small x, The expression on the right hand side is a bilinear form with (1/2)H( x) playing the role of the metric tensor. If one chooses in n dimensions, then (1/2)H( x) is precisely the n-dimensional Euclidean metric. The quantity on the right hand side of eq. (A.4) is known as the "estimated vertical distance to minimum" (EDM), i.e., Generally, when using a VMM, the optimality condition will check whether the calculated value for the EDM exceeds a certain tolerance parameter. Eq. (A.3) describes the essential idea of the VMM. However, the VMM is a "quasi-Newton method" (as opposed to Newton's method itself) because instead of calculating the Hessian (or "metric") exactly, it approximates it iteratively. The main differences among different VMM algorithms lie in the precise form of this iterative approximation procedure. The first VMMs used the so-called DFP updating formula [94,95] (after Davidon, Fletcher, and Powell). Currently, the most common algorithms, including Migrad, use the BFGS method [96][97][98][99].
A very useful property of VMMs is that subsequent steps are in "conjugate" directions, i.e., in orthogonal directions with respect to the metric provided by the Hessian; additionally, convergence to the minimum is efficient. Thus, the algorithm rarely crosses the folded region of the M 2 -objective function where the gradient and Hessian are not defined. This fact motivates the use of the Migrad implementation of the VMM for our constrained M 2 calculations.

A.2 Simplex
Unlike VMMs, the downhill simplex (or Nelder-Mead) method [100], does not require the calculation of gradients. Instead, one calculates the values of f ( x) at the n + 1 vertices of a simplex, a non-degenerate solid in n dimensions with n + 1 sides and n + 1 vertices. A new vertex for the simplex is generated in each iteration of the method. If the value of the objective function at the new point is lower than the value at one of the existing vertices, the worst vertex is replaced by the new point. In this way, the volume of the simplex becomes smaller; the algorithm stops when the simplex, now enclosing the minimum, has shrunk to a specified size.
To be more concrete, let us first consider a (large) simplex of n + 1 points in n dimensions, with vertices p 1 , p 2 , ..., p n , p n+1 . (A.7) These points are ordered so that where f i ≡ f (p i ). We define the "center of mass",p, using all points except p n+1 as follows, In each step, the algorithm tries to replace the worst point, p n+1 . First, a new test point, p r , is obtained by reflection of the worst point about the center of mass, p r =p + α(p − p n+1 ), (A. 10) for some typical value of the expansion factor (generally α = 1; this is the value we shall use in our use of Simplex). A new point is then determined using the value of f (p r ) as follows: 1. f 1 ≤ f r ≤ f n : The previous worst point, p n+1 , is replaced by p r , and the points are relabeled in accordance with (A.8).
2. f r ≤ f 1 : The test point, p r , is the best point, so the current search direction is considered to be effective. We therefore shift the first n points p 1 → p 2 , p 2 → p 3 , ...., p n → p n+1 . (A.11) To determine the new value for p 1 , we try one additional point, p s1 =p + β(p r −p) (typically β = 2), and evaluate its functional value, f s1 . If f s1 < f r , we set p 1 = p s1 , otherwise p 1 = p r .
3. f r > f n : The simplex may be too big and therefore its size must be reduced. If f r > f n+1 , then a new contracted simplex is defined around the best point, p 1 , by replacing all points except p 1 by p i = p 1 + δ(p i − p 1 ) with 0 < δ < 1 (typically δ = 0.5). If f n < f r < f n+1 , then p n+1 is replaced by p r . A test of the new inner point, p s2 =p − γ(p − p n+1 ) (typically γ = 0.5), is then performed, and p n+1 is replaced by p s2 if f s2 < f n+1 .
Since the simplex method is always designed to take as big a step as possible, it is rather insensitive to shallow local minima and other small-scale structures in the objective function. Thus, we use the method to identify promising candidates for global minima. Once the location of a possible global minimum has been identified, the Migrad algorithm described above is used to obtain a more precise value of any local minimum in this area, hopefully obtaining an accurate value for the location of the global minimum. The downhill simplex method is implemented in Minuit using the Simplex algorithm.

B Installation and user instructions
The latest version of the Optimass has been developed and designed to achieve the automation of kinematic mass function minimization with constraints for general particle decay system. In particular, 1. It has generality to treat various decay topologies from multiple parent particles where in general the multiplicity can be larger than two.
2. It also has generality to include decay vertices where in general the multiplicity of branch legs can be larger than three.
3. It has flexibility to easily define a specific sub-system of intermediate parent particles with effective invisibles, in the full decay system. 4. It also has flexibility to define kinematic constraint functions of user's own interest, in terms of visible and invisible particles' momentum degrees of freedom.
5. All these generality and flexibility can be initiated from user's simple model card file which defines 1) full decay process with user's own particle label scheme, 2) parent node particle in each decay chain, 3) effective invisible nodes in each decay chain, and 4) constraint functions of particle momenta which can be interactively expressed by the user's particle label scheme.
The Optimass is free software written in C++ and Python under the copyleft of the GNU General Public License. The latest version of the Optimass can be downloaded from the following web page : http://hep-pulgrim.ibs.re.kr/optimass More detailed Optimass installation guide and the tutorial with examples on how to run the code implementing user's own decay topologies, can be found on the webpage as well.