Maintaining AUC and $H$-measure over time

Measuring the performance of a classifier is a vital task in machine learning. The running time of an algorithm that computes the measure plays a very small role in an offline setting, for example, when the classifier is being developed by a researcher. However, the running time becomes more crucial if our goal is to monitor the performance of a classifier over time. In this paper we study three algorithms for maintaining two measures. The first algorithm maintains the area under the ROC curve (AUC) under addition and deletion of data points in $O(\log n)$ time. This is done by maintaining the data points sorted in a self-balancing search tree. In addition, we augment the search tree so that we can query the ROC coordinates of a data point in $O(\log n)$ time. In doing so we are able to maintain AUC in $O(\log n)$ time. Our next two algorithms maintain the $H$-measure, an alternative measure based on the ROC curve. Computing the measure is a two-step process: first we need to compute the convex hull of the ROC curve, followed by a sum over the convex hull. We demonstrate that we can maintain the convex hull using a minor modification of the classic convex hull maintenance algorithm. We then show that under certain conditions, we can compute the $H$-measure exactly in $O(\log^2 n)$ time, and if the conditions are not met, then we can estimate the $H$-measure in $O((\log n + \epsilon^{-1})\log n)$ time. We show empirically that our methods are significantly faster than the baselines.


Introduction
Measuring the performance of a classifier is a vital task in machine learning. The running time of an algorithm that computes the measure plays a very small role in an offline setting, for example, when the classifier is being developed by a researcher. However, the running time becomes more crucial if our goal is to monitor the performance of a classifier over time where the new data points may arrive at a significant speed.
For example, consider a task of monitoring abnormal behaviour in IT systems based on event logs. Here, the main problem is the gargantuan volume of event logs making the manual monitoring impossible. One approach is to have a classifier to monitor for abnormal events and alert analysts for closer inspection. Here, monitoring should be done continuously to notice abnormalities rapidly. Moreover, the performance of the classifier should also be monitored continuously as the underlying distribution, and potentially the performance of the classifier, may change due to the changes in the IT system.
In order to detect recent changes in the performance, we are often interested in the performance over the last n data points. More generally, we are interested in maintaining the measure under addition or deletion of data points.
We study algorithms for maintaining two measures. The first measure is the area under the ROC curve (AUC), a classic technique of measuring the performance of a classifier based on its ROC curve. We also study H-measure, an alternative measure proposed by Hand [14]. Roughly speaking, the measure is based on the minimum weighted loss, averaged over the cost ratio. A practical advantage of the H-measure over AUC is that it allows a natural way of weighting classification errors.
Both measures can be computed in O(n log n) time from scratch, or in O(n) time if the data points are already sorted. In this paper we present three algorithms that allow us to maintain the measures in polylogarithmic time.
The first algorithm maintains AUC under addition or deletion of data points. The approach is straightforward: we maintain the data points sorted in a self-balancing search tree. In order to update AUC we need to know the ROC coordinates of the data point that we are changing. Luckily, this can be done by modifying the search tree so that it maintains the cumulative counts of the labels in each subtree. Consequently, we can obtain the coordinates in O(log n) time, which leads to a total of O(log n) maintenance time.
Our next two algorithms involve maintaining the H-measure. Computing the H-measure involves finding the convex hull of the ROC curve, and enumerating over the hull. First we show that we can use a classic dynamic convex hull algorithm with some minor modifications to maintain the convex hull of the ROC curve. The modifications are required as we do not have the ROC coordinates of individual data points, but we can use the same trick as when computing AUC to obtain the needed coordinates.
Then we show that if we estimate the class priors from the test data, we can decompose the H-measure into a sum over the points in the convex hull such that the ith term depends only on the difference between the ith and the (i − 1)st data points. This decomposition allows us to maintain the H-measure in $O(\log^2 n)$ time.
If the class priors are not estimated from the test data, then we propose an estimation algorithm. Here the idea is to group points that are close in the convex hull together. In other words, if there are points in the convex hull that are close to each other, then we only use one data point from each such group. The grouping is done in a way that we maintain an ε-approximation in $O((\log n + \epsilon^{-1}) \log n)$ time.
Structure: The rest of the paper is organized as follows. We present preliminary definitions in Section 2. In Section 4 we demonstrate how to maintain AUC, and in Sections 5-6 we demonstrate how to maintain the H-measure. We present the experimental evaluation in Section 7, and conclude the paper with a discussion in Section 8.

Fig. 1: Example of a ROC curve and AUC. If we consider label 1 as a true label and label 2 as a false label, then the vertical axis is the true positive rate (TPR) while the horizontal axis is the false positive rate (FPR).

Preliminaries
Assume that we are given a multiset of n data points Z. Each data point z = (s, ℓ) consists of a score s ∈ R and a true label ℓ ∈ {1, 2}. The score is typically obtained by applying a classifier, with high values implying that z should be classified as class 2. To simplify the notation, given z = (s, ℓ) we define d(z) = (1, 0) if ℓ = 1, and d(z) = (0, 1) if ℓ = 2. We can now write

$$(n_1, n_2) = \sum_{z \in Z} d(z),$$

that is, $n_j$ is the number of points having the label equal to j. Here we used the convention that the sum of two tuples, say (a, b) and (c, d), is (a + c, b + d). Note that $n = n_1 + n_2$.

Let $S = (s_1, \ldots, s_n)$ be the list of all scores, ordered from the smallest to the largest. Let us write

$$r_i = (r_{i1}, r_{i2}) = \sum_{z \in Z,\ s(z) \le s_i} d(z), \qquad (1)$$

that is, $r_i$ are the label counts of points having a score less than or equal to $s_i$. We obtain the ROC curve by normalizing $r_i$ in Eq. 1, that is, the ROC curve is a list of n + 1 points $X = (x_0, x_1, \ldots, x_n)$, where $x_i = (r_{i1}/n_1, r_{i2}/n_2)$ and $x_0 = (0, 0)$. Note that not all points in X are necessarily unique. The points in X are confined to the unit square [0, 1] × [0, 1]. See Figure 1 for illustration.

The area under the curve, auc(Z), is the area below the ROC curve. If there is a threshold σ such that all data points with a score smaller than σ belong to class 1 and all data points with a score larger than σ belong to class 2, then auc(Z) = 1.
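As an illustration of the definitions above, the following minimal Python sketch builds the non-normalized counts $r_i$ and the ROC points $x_i$ from a list of (score, label) pairs. The function name `roc_points` and the list representation of Z are assumptions made for this example, and the sketch assumes that both labels occur at least once.

```python
from itertools import groupby

def roc_points(Z):
    """Z is a list of (score, label) pairs with label in {1, 2}.
    Returns the label counts (n1, n2), the non-normalized points R,
    and the normalized ROC points X (hypothetical helper)."""
    n1 = sum(1 for _, label in Z if label == 1)
    n2 = len(Z) - n1
    R = [(0, 0)]              # r_0 = (0, 0)
    a = b = 0                 # running label counts
    by_score = sorted(Z, key=lambda z: z[0])
    for _, group in groupby(by_score, key=lambda z: z[0]):
        labels = [label for _, label in group]
        a += labels.count(1)
        b += labels.count(2)
        # tied scores share the same r_i, so the point is repeated
        R.extend([(a, b)] * len(labels))
    X = [(r1 / n1, r2 / n2) for r1, r2 in R]
    return (n1, n2), R, X

counts, R, X = roc_points([(0.1, 1), (0.4, 1), (0.4, 2), (0.8, 2)])
```

The two tied scores at 0.4 produce a repeated point, which matches the remark that the points in X need not be unique.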
If the scores are independent of the true labels, then the expected value of auc(Z) is 1/2.
Instead of defining auc(Z) using the ROC curve, we can also define it directly with the Mann-Whitney U statistic [16]. Assume that we are given a multiset of points Z. Let $S_1 = \{s \mid (s, \ell) \in Z, \ell = 1\}$ be the multiset of scores with the corresponding labels being equal to 1, and define $S_2$ similarly. The Mann-Whitney U statistic is equal to

$$U = \sum_{s \in S_1} \sum_{t \in S_2} f(s, t), \quad\text{where}\quad f(s, t) = \begin{cases} 1 & \text{if } s < t, \\ 1/2 & \text{if } s = t, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

We obtain auc(Z) by normalizing U, that is, $\mathit{auc}(Z) = \frac{1}{|S_1||S_2|} U$. AUC can be computed naively using the U statistic in $O(n^2)$ time. However, we can easily speed up the computation to $O(n \log n)$ time using Algorithm 1. To see the correctness, note that in Eq. 2 each $t \in S_2$ contributes to U with

$$\sum_{s \in S_1} f(s, t) = |\{s \in S_1 \mid s < t\}| + \tfrac{1}{2} |\{s \in S_1 \mid s = t\}|.$$
Algorithm 1 achieves its running time by maintaining the first term (in a variable h) as it loops over sorted scores. Note that if Z is already sorted, then the running time reduces to linear.
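The loop described above can be sketched in Python as follows. This is a minimal sketch rather than the paper's pseudo-code: the function name `auc` and the grouping of tied scores with `itertools.groupby` are choices made for this example. The variable `h` plays the role described in the text, the number of label-1 scores strictly below the current score, and each label-2 point contributes `h` plus half of the tied label-1 points.

```python
from itertools import groupby

def auc(Z):
    """AUC of a list Z of (score, label) pairs, label in {1, 2}.
    Sorting dominates the running time; the loop itself is linear."""
    n1 = sum(1 for _, label in Z if label == 1)
    n2 = len(Z) - n1
    U = 0.0
    h = 0  # label-1 points with a score strictly below the current score
    by_score = sorted(Z, key=lambda z: z[0])
    for _, group in groupby(by_score, key=lambda z: z[0]):
        labels = [label for _, label in group]
        ties1 = labels.count(1)
        ties2 = labels.count(2)
        # every label-2 point at this score sees h smaller label-1 scores
        # and ties1 equal ones, which count one half each
        U += ties2 * (h + 0.5 * ties1)
        h += ties1
    return U / (n1 * n2)

example = auc([(0.1, 1), (0.4, 1), (0.4, 2), (0.8, 2)])
```

If Z is already sorted, the call to `sorted` can be dropped and the remaining loop runs in linear time.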
Algorithm 1: Algorithm for computing auc(Z)

Our first goal is to show that we can maintain AUC in O(log n) time under addition or removal of data points.
Our second contribution is a procedure for maintaining the H-measure. The H-measure is an alternative method proposed by Hand [14]. The main idea is as follows: consider minimizing a weighted loss Q(c, σ), where c is a cost ratio, σ is a threshold, z is a random data point, and π k = p(ℓ(z) = k) are class priors. Let us write σ(c) to be the threshold minimizing Q(c, σ) for a given c. Increasing c will decrease σ(c), or in other words by varying c we will vary the threshold. As pointed out by Flach et al. [10], the curve Q(c, σ(c)) is a variant of a cost curve (see [8]). Here the difference is that Q(c, σ(c)) uses class priors π k whereas the cost curve omits them.
Since not all values of c may be sensible, we assume that we are given a weight function u(c). We are interested in measuring the weighted minimum loss as we vary c,

$$L = \int_0^1 Q(c, \sigma(c))\, u(c)\, dc. \qquad (3)$$

Here small values of L indicate a strong signal between the labels and the score. The H-measure is a normalized version of L,

$$H = 1 - \frac{L}{L_{\max}}.$$

Here, $L_{\max}$ is the largest possible value of L over all possible ROC curves. The negation is done so that the values of H are consistent with the AUC scores: values close to 1 represent good performance.
We will see that the convenient choice for u will be a beta distribution, as suggested by Hand [14], since it allows us to express the integrals in a closed form.
Computing the empirical H-measure in practice starts with an ROC curve X. The following computations assume that the ROC curve is convex. If not, then the first step is to compute the convex hull of X, which we will denote by Y = (y 0 , . . . , ym). Taking a convex hull will inflate the performance of the underlying classifier, however it is possible to modify the underlying classifier (see [14] for more details) so that its ROC curve is convex.
We then define the values c i (Eq. 4), where, recall that, π k = p(ℓ(z) = k) are the class probabilities and (y 0 , . . . , ym) is the convex hull. The probabilities π k can be either estimated from Z or by some other means. In the former case, we show that we can maintain the H-measure exactly; in the latter case, we need to estimate the measure in order to achieve a sublinear maintenance time.
We also set $c_0 = 0$ and $c_m = 1$. Note that $c_i$ is a monotonically decreasing function of the slope of the convex hull. This guarantees that $c_i \le c_{i+1}$. We can show that (see [14]) if $c_i < c < c_{i+1}$, then the minimum loss is equal to We can now write Eq. 3 as and if we use beta distribution with parameters (α, β) as u(c), we have where B(·, α, β) is an incomplete beta function. Finally, we can show that the normalization constant is equal to Given an ROC curve X, computing the convex hull Y, and subsequent steps, can be done in O(n) time. We will show in Section 5 that we can maintain the H-measure in $O(\log^2 n)$ time if $\pi_k$ are estimated from Z. Otherwise we will show in Section 6 that we can approximate the H-measure in $O((\epsilon^{-1} + \log n) \log n)$ time.
As pointed out earlier, Q(c, σ(c)) can be viewed as a variant of a cost curve. If we were to replace Q with the cost curve and use the uniform distribution for u, then, as pointed out by Flach et al. [10], L is equivalent to the area under the cost curve. Interestingly enough, we cannot use the algorithm given in Section 5 to compute the area under the cost curve as the presence of the priors is needed to decompose the measure. However, we can use the algorithm in Section 6 to estimate the area under the cost curve.
Interestingly enough, Q(c, σ) can be linked to AUC. If, instead of using the optimal threshold σ(c), we average Q over a carefully selected distribution for σ and also use the uniform distribution for c, then the resulting integral is a linear transformation of AUC [10].
Self-balancing search trees. In this paper we make significant use of self-balancing search trees such as AVL trees or red-black trees. Such trees are binary trees where each node, say u, has a key, say k. The left subtree of u contains nodes with keys smaller than k and the right subtree of u contains nodes with keys larger than k. Maintaining this invariant allows for efficient queries as long as the height of the tree is kept in check. Self-balancing trees such as AVL trees or red-black trees keep the height of the tree in O(log n). The balancing is done with O(log n) left rotations or right rotations whenever the tree is modified (see Figure 2). Searching for nodes with specific keys, inserting new nodes, and deleting existing nodes can be done in O(log n) time. Moreover, splitting the search tree into two search trees or combining two trees into one can also be done in O(log n) time.
We assume that we can compare and manipulate integers of size O(n) and real numbers in constant time. We do this because it is reasonable to assume that the bit-length of integers in modern computer architectures is sufficient for any practical application, and we do not need to resort to any custom big integer implementations. If needed, however, the running times need to be multiplied by an additional O(log n) factor.

Related work
Several works have studied maintaining AUC in a sliding window. Brzezinski and Stefanowski [6] maintained the order of n data points using a red-black tree but computed AUC from scratch, resulting in a running time of O(n + log n) per update. Tatti [19] proposed an algorithm yielding an ε-approximation of AUC in $O((1 + \epsilon^{-1}) \log n)$ time per update. Here the approach bins the ROC space into a small number of bins. The bins are selected so that the AUC estimate is accurate enough. Bouckaert [3] proposed estimating AUC by binning and only maintaining counters for individual bins. In this work, on the other hand, we do not need to resort to binning; instead we can maintain the exact AUC by maintaining a search tree structure in O(log n) time per update.
We should point out that AUC and the H-measure are defined over the whole ROC curve, and are useful when we do not want to commit to a specific classification threshold. On the other hand, if we do have the threshold, then we can easily maintain a confusion matrix, and consequently maintain many classic metrics, for example, accuracy, recall, F1-measure [11,12], and Kappa statistic [2,20].
In related work, Ataman et al. [1], Brefeld and Scheffer [4], Ferri et al. [9], Herschtal and Raskutti [15] proposed methods where AUC is optimized as a part of training a classifier. Note that this setting differs from ours: changing the classifier parameters will most likely change the scores of all data points, and may change the data point order significantly. On the other hand, we rely on the fact that we can maintain the order using a search tree. Interestingly, Calders and Jaroszewicz [7] estimated AUC using a continuous function which then allowed optimizing the classifier parameters with gradient descent.
Our approaches are useful if we are working in a sliding window setting, that is, we want to compute the relevant statistic using only the last n data points. In other words, we abruptly forget the (n + 1)th data point. An alternative option would be to gradually downplay the importance of older data points. A convenient option is to use exponential decay, see for example a survey by Gama et al. [13]. Maintaining the confusion matrix is trivial when using exponential decay but, to our knowledge, there are no methods for maintaining AUC or the H-measure under exponential decay.

Maintaining AUC
In this section we present a simple approach to maintain AUC in O(log n) time. We accomplish this by showing that the change in AUC can be computed in O(log n) time whenever a new point is added or an existing point is deleted. We rely on the following two propositions that express how AUC changes when adding or deleting a data point. We then show that the quantities occurring in the propositions, namely, the weights (u 1 , u 2 ) and (v 1 , v 2 ) can be obtained in O(log n) time.
Proposition 1 (Addition) Let Z be a set of data points with $(n_1, n_2)$ label counts. Let Y be a set of points having the same score σ.

Proof We will use the Mann-Whitney U statistic, given in Eq. 2, to prove the claim. Let us write Z′ = Z ∪ Y and define Eq. 2 states that We obtain the claim by rearranging the terms. □

Proposition 2 (Deletion) Let Z be a set of data points with $(n_1, n_2)$ label counts. Let Y ⊆ Z be a set of points having the same score σ. Write $(w_1, w_2) = \sum_{y \in Y} d(y)$.
Define also Note that the sign of the last term is the same for both addition and deletion.

Proof We will use the Mann-Whitney U statistic, given in Eq. 2, to prove the claim. Let us write Z′ = Z \ Y and define Eq. 2 states that We obtain the claim by rearranging the terms. □

Note that normally we would be adding or deleting a single data point, that is, Y = {y}. However, the propositions also allow us to modify multiple points with the same score. These two propositions allow us to maintain AUC as long as we can compute $(u_1, u_2)$ and $(v_1, v_2)$. To compute these quantities we will use a balanced search tree T such as a red-black tree or an AVL tree. Let S be the unique scores of Z. Each score s ∈ S is given a node n ∈ T.
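The effect of adding or deleting tied points on the U statistic can be checked numerically. The sketch below is derived directly from the pair-counting definition of U (a pair s < t counts 1, a tie counts 1/2) and is not a transcription of the propositions. It assumes that $(u_1, u_2)$ are the label counts of points in Z with a score below σ, $(v_1, v_2)$ the label counts of points in Z with a score equal to σ, and $(w_1, w_2)$ the label counts of Y. The helper names are hypothetical, and the counts are computed naively here, whereas the text obtains them from the search tree.

```python
def counts(Z, keep):
    """Label counts (c1, c2) of the points (s, label) in Z with keep(s) true."""
    c1 = sum(1 for s, label in Z if label == 1 and keep(s))
    c2 = sum(1 for s, label in Z if label == 2 and keep(s))
    return c1, c2

def u_statistic(Z):
    """Naive O(n^2) Mann-Whitney U: 1 per pair s < t, 1/2 per tie."""
    S1 = [s for s, label in Z if label == 1]
    S2 = [s for s, label in Z if label == 2]
    return sum(1.0 if s < t else 0.5 if s == t else 0.0
               for s in S1 for t in S2)

def updated_u(U, Z, Y, sigma, sign):
    """U after adding (sign=+1) or deleting (sign=-1) the points Y,
    all having score sigma. Z is the data set before the update."""
    n1, n2 = counts(Z, lambda s: True)
    u1, u2 = counts(Z, lambda s: s < sigma)
    v1, v2 = counts(Z, lambda s: s == sigma)
    w1, w2 = counts(Y, lambda s: True)
    # pairs between a point of Y and a point of Z
    cross_pairs = w1 * (n2 - u2 - 0.5 * v2) + w2 * (u1 + 0.5 * v1)
    # pairs inside Y: the sign is the same for addition and deletion
    return U + sign * cross_pairs + 0.5 * w1 * w2

Z = [(0.1, 1), (0.4, 1), (0.4, 2), (0.8, 2), (0.4, 2)]
Y = [(0.4, 1), (0.4, 2)]
after_add = updated_u(u_statistic(Z), Z, Y, 0.4, +1)
after_del = updated_u(u_statistic(Z), Z, Y, 0.4, -1)
```

Dividing the updated U by the product of the new label counts gives the updated AUC. The last term has the same sign in both cases because, on deletion, the pairs inside Y are subtracted twice by the cross terms.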
Moreover, for each node x with a score of s, we will store the total label counts having the same score, $d(x) = \sum_{z \in Z,\ s(z) = s} d(z)$. In addition, we will store cd(x), the cumulative label counts of all descendants of x, including x itself. We need to maintain these counts whenever we add or remove nodes from T, change the counts of nodes, or when T needs to be rebalanced.
Since cd(x) = d(x) + cd(left(x)) + cd(right(x)), we can compute cd(x) in constant time as long as we have the cumulative counts of the children of x. Whenever node x is changed, only its ancestors are changed, so the cumulative counts can be updated in O(log n) time. The balancing in a red-black tree or an AVL tree is done by using left or right rotations. Only two nodes are changed per rotation (see Figure 2), and we can recompute the cumulative counts for these nodes in constant time. There are at most O(log n) rotations, so the running time is not increased. Given a tree T and a score threshold σ, let us define $\mathit{lcount}(\sigma, T) = \sum_{x \in T,\ s(x) < \sigma} d(x)$ to be the total count of nodes with scores smaller than σ. Computing lcount(σ, T) gives us $(u_1, u_2)$ used by Propositions 1-2.
In order to compute lcount(σ, T) we will use the procedure given in Algorithm 2. Here, we use a binary search over the tree, and sum the cumulative counts of the left branches. To see the correctness of the algorithm, observe that during the while-loop Algorithm 2 maintains the invariant that u + cd(left(x)) is equal to lcount(s(x), T). We should point out that similar queries were considered by Tatti [19]. However, they were not combined with Propositions 1-2.
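A minimal Python sketch of such a query on an augmented search tree is given below. It is a variant written for illustration, not the paper's Algorithm 2. The class `Node` and the functions `insert` and `lcount` are hypothetical names, and for brevity the tree is not rebalanced, so the O(log n) bound only holds if a balancing scheme (with the count updates during rotations described above) is added.

```python
class Node:
    def __init__(self, score):
        self.score = score
        self.d = [0, 0]    # label counts of points with exactly this score
        self.cd = [0, 0]   # label counts in the whole subtree
        self.left = None
        self.right = None

def cd(node):
    """Cumulative counts of a subtree; an empty subtree counts (0, 0)."""
    return node.cd if node is not None else [0, 0]

def insert(root, score, label):
    """Insert one point and return the root. No rebalancing (assumption)."""
    if root is None:
        root = Node(score)
    if score < root.score:
        root.left = insert(root.left, score, label)
    elif score > root.score:
        root.right = insert(root.right, score, label)
    else:
        root.d[label - 1] += 1
    # only nodes on the search path need their cumulative counts refreshed
    root.cd = [root.d[k] + cd(root.left)[k] + cd(root.right)[k]
               for k in range(2)]
    return root

def lcount(root, sigma):
    """Label counts of all points with a score strictly below sigma."""
    u = [0, 0]
    x = root
    while x is not None:
        if sigma <= x.score:
            x = x.left
        else:
            # x and its whole left subtree have scores below sigma
            u = [u[k] + x.d[k] + cd(x.left)[k] for k in range(2)]
            x = x.right
    return tuple(u)

root = None
for s, label in [(0.5, 1), (0.2, 2), (0.8, 1), (0.2, 1), (0.6, 2)]:
    root = insert(root, s, label)
```

The counts $(v_1, v_2)$ of points tied with σ can be read from the field `d` of the node holding σ, if such a node exists.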
Algorithm 2: Computes lcount(σ, T) using a binary search tree

Since T is balanced, the running time of Algorithm 2 is O(log n). In summary, we can maintain T in O(log n) time, and we can obtain $(u_1, u_2)$ and $(v_1, v_2)$ using T in O(log n) time. These quantities allow us to maintain AUC in O(log n) time.

Maintaining H-measure
If we were to compute the H-measure from scratch, we first need to compute the convex hull, and then compute the H-measure from the convex hull. In order to maintain the H-measure, we will first address maintaining the convex hull, and then explain how we maintain the actual measure.

Divide-and-conquer approach for maintaining a convex hull
Maintaining a convex hull under point additions or deletions is a well-studied topic in computational geometry. A classic approach by Overmars and Van Leeuwen [18] maintains the hull in $O(\log^2 n)$ time. Luckily, the same approach with some modifications will work for us.
Before we continue, we should stress two important differences between our setting and a traditional setting of maintaining a convex hull.
First, in a normal setting, the additions and removals are done to new points in a plane. In other words, the remaining points do not change over time. In our case, the data point consists of a classifier score and a label, and modifications shift the ROC coordinates of every point. As a concrete example, in a traditional setting, adding a point cannot reveal already existing points whereas adding a new data point can shift the ROC curve enough so that some existing points become included in the convex hull.
Secondly, we do not have the coordinates for all the points. However, it turns out that we can compute the needed coordinates with no additional costs.
We should point out that the approach by Overmars and Van Leeuwen [18] is not the fastest for maintaining the hull: for example an algorithm by Brodal and Jacob [5] can maintain the hull in O(log n) time. However, due to the aforementioned differences adapting this algorithm to our setting is non-trivial, and possibly infeasible.  We will explain next the main idea behind the algorithm by Overmars and Van Leeuwen [18], and then modify it to our needs.
The overall idea behind the algorithm is as follows. A generic convex hull can be viewed as a union of the lower convex hull and the upper convex hull. We only need to compute the upper convex hull, and for simplicity, we will refer to the upper convex hull as the convex hull.
In order to compute the convex hull C for a point set P we can use a divide-and-conquer technique. Assume that we have ordered the points using the x-coordinate, and split the points roughly in half, say in sets R and Q. Then assume we have computed convex hulls, say $H = \{h_i\}$ and $G = \{g_i\}$, for R and Q independently.
A key result by Overmars and Van Leeuwen [18] states that the convex hull C of P is equal to $\{h_1, \ldots, h_u, g_v, g_{v+1}, \ldots\}$, that is, C starts with H and ends with G. See Figure 3a for illustration. The segment between $h_u$ and $g_v$ is often referred to as a bridge.
We can find the indices u and v in O(log n) time using a binary search over H and G. In order to perform the binary search we will store the hulls H and G in balanced search trees (red-black tree or AVL tree). Then the binary search amounts to traversing these trees.
Note that the concatenation and splitting of a search tree can be done in O(log n) time. In other words, we can obtain C for partial convex hulls H and G in O(log n) time.
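The prefix-and-suffix structure of the merged hull can be illustrated with a small Python sketch. It finds the bridge by brute force over all pairs instead of the O(log n) binary search used in the text, so it only demonstrates the structural claim and not the running time. The function names are hypothetical, and the sketch assumes that every x-coordinate in H is smaller than every x-coordinate in G and that no three points are collinear.

```python
def cross(o, a, b):
    """Positive if b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def upper_hull(points):
    """Upper convex hull, left to right (monotone chain)."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

def merge_hulls(H, G):
    """Join two upper hulls via the bridge (h_u, g_v).
    Brute-force search; the text uses a binary search instead."""
    for u, h in enumerate(H):
        for v, g in enumerate(G):
            # the bridge has every point on or below the line through h and g
            if all(cross(h, g, p) <= 0 for p in H + G):
                return H[:u + 1] + G[v:]
    return H + G  # not reached for non-empty inputs in general position

R = [(0, 0), (10, 30), (20, 40), (30, 42)]
Q = [(40, 41), (50, 60), (60, 50), (70, 0)]
merged = merge_hulls(upper_hull(R), upper_hull(Q))
```

In this example the bridge connects (20, 40) and (50, 60), so the points (30, 42) and (40, 41), which belong to the partial hulls, are dropped from the merged hull.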
In order to maintain the hull we will store the original points in a balanced search tree T; only the leaves store the actual points. Each node u ∈ T represents a set of points stored in the descendant leaves of u. See Figure 3b for illustration.
Let us write H(u) to be the convex hull of these points: we can obtain H(u) from H(left(u)) and H(right(u)) in O(log n) time. So whenever we modify T by adding or removing a leaf v, we only need to update the ancestors of v, and possibly some additional nodes due to the rebalancing. All in all, we only need to update O(log n) nodes, which brings the running time to $O(\log^2 n)$.
An additional complication is that whenever we compute H(u) we also destroy H(left(u)) and H(right(u)) in the process, trees that we may need in the future. However, we can rectify this by storing the remains of the partial hulls, and then reversing the join if we were to modify a leaf of u. This reversal can be done in O(log n) time.

Maintaining the convex hull of a ROC curve
Our next step is to adapt the existing algorithm to our setting so that we can maintain the hull of an ROC curve X.
First of all, adding or removing data points shifts the remaining points. To partially rectify this issue, we will use non-normalized coordinates R = (r 0 , . . . , rm) given in Eq. 1. We can do this because scaling does not change the convex hull.
Consider adding or removing a data point z which is represented by a leaf u ∈ T . The points in R associated with smaller scores than s(z) will not shift, and the points in R associated with larger scores than s(z) will shift by the same amount. Consequently, the only partial hulls that are affected are the ancestors of u. This allows us to use the update algorithm of Overmars and Van Leeuwen [18] for our setting as long as we can obtain the coordinates of the points.
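The shifting behaviour of the non-normalized coordinates can be checked with a short Python sketch. The helper `cumulative_counts` is a hypothetical name; it maps each distinct score to the label counts of the points with a score at most that value.

```python
def d(z):
    """Tuple (1, 0) for label 1 and (0, 1) for label 2."""
    return (1, 0) if z[1] == 1 else (0, 1)

def cumulative_counts(Z):
    """Map each distinct score to the non-normalized ROC coordinates."""
    out = {}
    a = b = 0
    for z in sorted(Z):
        da, db = d(z)
        a, b = a + da, b + db
        out[z[0]] = (a, b)  # the last write for a tied score is the full count
    return out

Z = [(0.1, 1), (0.3, 2), (0.5, 1), (0.9, 2)]
z_new = (0.4, 2)
before = cumulative_counts(Z)
after = cumulative_counts(Z + [z_new])
shift = d(z_new)
```

Points with scores below 0.4 keep their coordinates, while every point with a larger score moves by the same vector d(z) = (0, 1), so the shape of any partial hull that lies entirely on one side of the new point is unchanged.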
Our second issue is that we do not have access to the coordinates r i . We approach the problem with the same strategy as when we were computing AUC.
Let U be the search tree of a convex hull H. Let u ∈ U be a node with coordinates $r_i$. We will define and store d(u) as the coordinate difference $r_i - r_{i-1}$. Let $s_i$ be the score corresponding to $r_i$. Then Eq. 1 implies that $d(u) = \sum_{s_{i-1} < s(z) \le s_i} d(z)$.
In addition, we will store cd (u), the total sum of the coordinate differences of descendants of u, including u itself.
Let u be the root of U . The coordinates, say p, of u in U are cd (left(u)) + d (u). Moreover, the coordinates of the left child of u are p − d (u) − cd (right(left(u))) , and the coordinates of the right child of u are p + d (right(u)) + cd (left(right(u))) .
In other words, we can compute the coordinates of children in U in constant time if we know the coordinates of a parent.
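The parent-to-child rule above can be sketched in Python as follows. The class `HullNode` and the helpers `build` and `coordinates` are hypothetical names. The tree is built statically from a list of coordinate differences, and the coordinates are derived top-down, applying the same rule at every node and not only at the root.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def cd(node):
    """Sum of the differences in a subtree; (0, 0) for an empty subtree."""
    return node.cd if node is not None else (0, 0)

class HullNode:
    def __init__(self, d, left=None, right=None):
        self.d = d            # coordinate difference to the previous hull point
        self.left = left
        self.right = right
        self.cd = add(d, add(cd(left), cd(right)))

def build(diffs):
    """Balanced search tree over the differences, kept in hull order."""
    if not diffs:
        return None
    mid = len(diffs) // 2
    return HullNode(diffs[mid], build(diffs[:mid]), build(diffs[mid + 1:]))

def coordinates(root):
    """In-order list of coordinates, each derived from the parent's."""
    out = []
    def visit(u, p):
        if u.left is not None:
            visit(u.left, sub(sub(p, u.d), cd(u.left.right)))
        out.append(p)
        if u.right is not None:
            visit(u.right, add(add(p, u.right.d), cd(u.right.left)))
    if root is not None:
        visit(root, add(cd(root.left), root.d))
    return out

diffs = [(1, 0), (0, 2), (3, 1), (1, 1), (2, 0)]
coords = coordinates(build(diffs))
```

The result agrees with the prefix sums of the differences, which is what the binary search for the bridge needs while it descends from the root.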
When combining two hulls, the binary search needed to find the bridge is based on descending U from the root to the correct node. During the binary search the algorithm needs to know the coordinates of a node, which we can now obtain from the coordinates of the parent. In summary, we can do the binary search in O(log n) time, which allows us to maintain the hull of a ROC curve in $O(\log^2 n)$ time.
For completeness we present the pseudo-code for the binary search in the Appendix.

Maintaining H-measure
Now that we have means to maintain the convex hull, our next step is to maintain the H-measure. Note that the only non-trivial part is L given in Eq. 5.
Assume that we have n data points Z with $n_k$ data points having class k. Let $Y = (y_0, \ldots, y_m)$ be the convex hull of the ROC curve computed from Z. Let $(d_1, \ldots, d_m)$ be the non-normalized differences between the neighboring points, that is, We will now assume that $\pi_k$ occurring in Eq. 5 are computed from the same data as the ROC curve, that is, $\pi_k = n_k/n$. We can rewrite the first term in Eq. 5 as If we use the beta distribution for u, Eq. 6 reduces to Let us now consider values $c_j$. Because we assume that $\pi_k$ are estimated from the test data, we have $\pi_k = n_k/n$, so the values $c_j$, given in Eq. 4, reduce to In summary, the terms of the sum in Eq. 7 depend only on the coordinate differences $d_j$. We should stress that this is only possible if we assume that $\pi_k$ are computed from the same data as the ROC curve. Otherwise, the terms $n_k$ will not cancel out when computing $c_j$.
Let T be a binary tree representing a convex hull. The sole dependency on d j allows us to use T to maintain the H-measure. In order to do that, let v ∈ T be a node with the coordinate difference (d 1 , We also maintain ch(v) to be the sum of h(u) of all descendants u of v, including v. Note that maintaining ch(v) can be done in a similar fashion as cd (v).
Finally, Eq. 7 implies that $L = \frac{ch(\mathrm{root}(T))}{n B(1, \alpha, \beta)}$, allowing us to maintain the H-measure in $O(\log^2 n)$ time.

Approximating H-measure
In our final contribution we consider the case where $\pi_k$ are not computed from the same dataset as the ROC curve. The consequence is that we can no longer simplify $c_j$ so that it only depends on $d_j$, and we cannot express L as a sum over the nodes of the tree representing the convex hull. We will approach the task differently. We will still maintain the convex hull H. We then select a subset of points from H from which we compute the H-measure from scratch. This subset will be selected carefully. On one hand, the subset will yield an ε-approximation. On the other hand, the subset will be small enough so that we still obtain polylogarithmic running time.
We start by rewriting Eq. 5. Given a function $x : [0, 1] \to \mathbb{R}^+$, let us define Consider the values $\{y_i\}$ and $\{c_i\}$ as used in Eq. 5. We define two functions $f, g : [0, 1] \to \mathbb{R}^+$ as We can now write Eq. 5 as $L = L_1(f) + L_2(g)$. We say that a function $x'$ is an ε-approximation of a function x if $x(c) - x'(c) \le \epsilon x(c)$. The following two propositions are immediate.

Proposition 3 Let $x'$ be an ε-approximation of x, then

Proposition 4 Let f and g be defined as in Eq. 8, and let $f'$ and $g'$ be respective ε-approximations. Define .

In other words, if we can approximate f and g, we can also approximate the H-measure. Note that the guarantee is ε(1 − H), that is, the approximation is more accurate when H is closer to 1, that is, when the classifier is accurate.
Next we will focus on estimating g.

Proposition 5
Assume ε > 0. Let Y be the convex hull of an ROC curve. Let Q be a subset of Y such that for each $y_i$, there is $q_j \in Q$ such that Let g be the function constructed from Y as given by Eq. 8, and let $g'$ be a function constructed similarly from Q. Then $g'$ is an ε-approximation of g.

Proof Let $(c_i)$ be the slope values computed from Y using Eq. 4, and let $(c'_i)$ be the slope values computed from Q.
Due to convexity of Y, the slope values have a specific property that we will use several times: fix index j, and let i be the index such that $y_i = q_j$. Then Assume 0 < c < 1. Let i be an index such that $c_i \le c < c_{i+1}$, consequently $g(c) = y_{i2}$. Similarly, let j be an index such that $c'_j \le c < c'_{j+1}$, so that $g'(c) = q_{j2}$. Let a be an index such that $q_j = y_a$.

Proposition 6 Assume ε > 0. Let Y be a convex hull of a ROC curve. Let Q be a subset of Y such that for each $y_i$, there is $q_j \in Q$ such that Let f be the function constructed from Y as given by Eq. 8, and let $f'$ be a function constructed similarly from Q. Then $f'$ is an ε-approximation of f.

The above propositions lead to the following strategy. Only use a subset of the ROC curve to compute the H-measure; if we select the points carefully, then the relative error will be less than ε.
Let us now focus on estimating L 2 (g). Assume that we have the convex hull Y = {y 0 , . . . , ym} of a ROC curve stored in a search tree T . Consider an algorithm given in Algorithm 3 which we call Subset.
Algorithm 3: Subset(u, p, q, ε), outputs a truncated part of the convex hull tree. Here, u is the current node, p and q are the minimum and the maximum coordinates of the subtree rooted at u, and ε is the approximation guarantee.
    Subset(left(u), p, z, ε);
    Subset(right(u), z, q, ε);

The pseudo-code traverses T, and maintains two variables p and q that bound the points of the current subtree. If $q_2 \le (1 + \epsilon) p_2$, then we can safely ignore the current subtree; otherwise we output the current root, and recurse on both children. It is easy to see that $Q = \{y_0, y_m\} \cup$ Subset(r, 0, cd(r), ε) satisfies the conditions of Proposition 5.
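The traversal described above can be sketched in Python. Since only part of the pseudo-code is available, this is a reconstruction under assumptions: z is taken to be the coordinate of the current node, computed as p + cd(left(u)) + d(u), and the tree is the same difference-based search tree used for the hull. The names `HullNode`, `build`, and `subset` are hypothetical.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def cd(node):
    return node.cd if node is not None else (0, 0)

class HullNode:
    def __init__(self, d, left=None, right=None):
        self.d, self.left, self.right = d, left, right
        self.cd = add(d, add(cd(left), cd(right)))

def build(diffs):
    """Balanced tree over coordinate differences, kept in hull order."""
    if not diffs:
        return None
    mid = len(diffs) // 2
    return HullNode(diffs[mid], build(diffs[:mid]), build(diffs[mid + 1:]))

def subset(u, p, q, eps, out):
    """Append the selected hull coordinates to out, in hull order.
    p and q bound the coordinates of the subtree rooted at u."""
    if u is None or q[1] <= (1 + eps) * p[1]:
        return  # all points here are within a (1 + eps) factor of p
    z = add(add(p, cd(u.left)), u.d)  # assumed: coordinate of u itself
    subset(u.left, p, z, eps, out)
    out.append(z)
    subset(u.right, z, q, eps, out)

root = build([(1, 1)] * 7)       # hull points (1, 1), (2, 2), ..., (7, 7)
selected = []
subset(root, (0, 0), cd(root), 1.0, selected)
Q = [(0, 0)] + selected + [cd(root)]
```

With ε = 1 only three of the seven nodes are reported. Every skipped point y has a point q in Q with $q_2 \le y_2 \le (1 + \epsilon) q_2$, which is the property that follows from the pruning rule; whether this is exactly the condition of Proposition 5 cannot be confirmed from the text available here.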
A similar traversal can also be done in order to estimate $L_1(f)$. However, we can estimate both values with the same subset by replacing the if-condition with

Proof Given a node v, let us write $T_v$ to mean the subtree rooted at v. Write $p_v$ and $q_v$ to be the values of p and q when processing v.
Let V be the nodes reported by Subset. Let W ⊆ V be a set of m nodes that have two reported children. Let $\{h_1, \ldots, h_m\}$ be the non-normalized 2nd coordinates of the nodes in W, ordered from smallest to largest.
Fix i and let u and v be the nodes corresponding to $h_i$ and $h_{i+1}$. Assume that $v \notin T_u$. Let r = right(u) be the right child of u. Then $T_r \cap W = \emptyset$ as otherwise $h_i$ and $h_{i+1}$ would not be consecutive. We have $h_{i+1} \ge q_{r2} > (1 + \epsilon) p_{r2} = (1 + \epsilon) h_i$.
In summary, $h_{i+1} > (1 + \epsilon) h_i$. Since $\{h_i\}$ are integers, we have $h_2 \ge 1$. In addition, $h_m \le n$ since the original data points (from which the ROC curve is computed) do not have weights.
Given v ∈ W, define k(v) to be the number of nodes in V \ W that have v as their youngest ancestor in W. The nodes contributing to k(v) form at most two paths starting from v. Since the height of the search tree is in O(log n), we have Finally, we can bound |V| by concluding the proof. □

Speed-up
It is possible to reduce the running time of Subset to $O(\log^2 n + \epsilon^{-1} \log n)$. We should point out that in practice Subset is probably the faster approach, as the theoretical improvement is relatively modest while the overheads increase.
There are several ways to approach the speed-up. Note that the source of the additional $\log n$ term is that in the proof of Proposition 7 we have $k(v) \in O(\log n)$. The loose bound is due to the fact that we are traversing a search tree balanced on tree height. We will modify the search procedure so that we can show that $k(v) \in O(1)$, which will give us the desired outcome. More specifically, we would like to traverse the hull using a search tree balanced using the 2nd coordinate.
The best candidate to replace the search tree for storing the convex hull is a weight-balanced tree [17]. Here, the subtrees are (roughly) balanced based on the number of nodes they contain. The problem is that this tree, despite its name, does not allow weights for nodes. Moreover, the algorithm relies on the fact that the nodes have no weights.
It is possible to extend weight-balanced trees to handle weights, but such a modification is not trivial. Instead, we demonstrate an alternative approach that is possible using only stock search structures.
We will do this by modifying the search tree T in which the nodes correspond to the partial hulls, see Figure 3b.
Let $Z$ be the current set of points and let $P = \{(s, \ell) \in Z \mid \ell = 2\}$ be the points with label equal to 2. Set $N = Z \setminus P$. We store $P$ in a tree $T$ of bounded balance; the points are only stored in leaves. Each leaf, say $u$, also stores all points in $N$ that immediately follow $u$. These points are stored in a standard search tree, say $L_u$, so that we can join two trees or split them when needed. Any points in $N$ without a preceding point in $P$ are handled and stored separately.
Note that $L_u$ corresponds to a vertical line when drawing the ROC curve. Consequently, a point in the convex hull will always be the last point in $L_u$ for some $u$. This allows us to define the weight $d(u)$ of a leaf $u$ in $T$ as $(m, 1)$, where $m$ is the number of nodes in $L_u$. We now apply the convex hull maintenance algorithm on $T$. As always, we maintain the cumulative weights $cd(u)$ for the non-leaf nodes.
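The grouping of points into leaves can be sketched as follows. This is only an illustration under simplifying assumptions: points are ordered by ascending score, ties receive no special handling, and plain Python lists stand in for both the tree of bounded balance $T$ and the search trees $L_u$. The names `Leaf`, `group_into_leaves`, and `cumulative_weights` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import accumulate
from typing import List, Tuple

Sample = Tuple[float, int]  # (score, label) with label in {1, 2}


@dataclass
class Leaf:
    point: Sample                                          # a point of P (label 2)
    followers: List[Sample] = field(default_factory=list)  # stands in for L_u

    def weight(self) -> Tuple[int, int]:
        """Weight d(u) = (m, 1), where m is the number of points in L_u."""
        return (len(self.followers), 1)


def group_into_leaves(samples: List[Sample]) -> Tuple[List[Sample], List[Leaf]]:
    """Split samples into orphans (points of N before any point of P) and leaves."""
    orphans: List[Sample] = []
    leaves: List[Leaf] = []
    for sample in sorted(samples):         # assumption: ascending score order
        if sample[1] == 2:
            leaves.append(Leaf(sample))    # every point of P starts a new leaf
        elif leaves:
            leaves[-1].followers.append(sample)
        else:
            orphans.append(sample)         # no preceding point in P
    return orphans, leaves


def cumulative_weights(leaves: List[Leaf]) -> List[Tuple[int, int]]:
    """Running sums of the leaf weights, one entry per leaf."""
    return list(accumulate((leaf.weight() for leaf in leaves),
                           lambda a, b: (a[0] + b[0], a[1] + b[1])))
```

In the actual structure the cumulative weights $cd(u)$ are kept in the non-leaf nodes of $T$ and updated during rebalancing; the running sums above only show which quantities are aggregated.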
In order to approximate the H-measure we will use a variant of Subset, except that we will traverse $T$ instead of traversing the hull. The pseudo-code is given in Algorithm 4. At each node we output the bridge if it is included in the final convex hull. The condition is easy to test: we just need to make sure that the bridge does not overlap with the previously reported bridges. Since we output both points of the bridge, this may lead to duplicate points, but we can prune them as a postprocessing step. Finally, we truncate the traversal if the subtree is sandwiched between two bridges that are close enough to each other. It is easy to see that the output of SubsetAlt satisfies the conditions in Proposition 5, so we can use the output to estimate $L_2(g)$. In order to estimate $L_1(f)$ we duplicate the procedure, except that we swap the labels and negate the scores, which leads to a mirrored ROC curve.
Algorithm 4: SubsetAlt($u$, $o$, $p$, $q$, $\epsilon$) outputs a truncated part of the convex hull tree. Here, $u$ is the current node, $o$ holds the minimum coordinates of the subtree rooted at $u$, $p$ and $q$ are the coordinate bounds based on already reported bridges, and $\epsilon$ is the approximation guarantee.

Proof. Let $T$ be the tree traversed by SubsetAlt. Let us write $T_v$ for the subtree rooted at $v$.
Let n(v) be the number of nodes in Tv, and let ℓ(v) be the number of leaves in Tv. Note that n(v) = 2ℓ(v) + 1.
Let $v$ be a child of $u$. Since $T$ is a weight-balanced tree [17], we have $\alpha \le \frac{1 + \ell(v)}{1 + \ell(u)} \le 1 - \alpha$. (11)
Let us write $o(v)$ for the 2nd origin coordinate of $T_v$. Note that $o(v)$ corresponds to the variable $o_2$ in SubsetAlt when $v$ is processed.
Let $V$ be the set of nodes whose bridges we output, and let $U$ be the set of nodes $u$ in $T$ for which $\ell(u) > \epsilon\, o(u)$.
We will prove the claim by showing that $V \subseteq U$ and $|U| \in O(\log^2 n + \epsilon^{-1} \log n)$.
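To prove the first claim, let $v \in V$. Let $p$ and $q$ match the variables of SubsetAlt when $v$ is visited. The points $p$ and $q$ correspond to two leaves of $T_v$. In other words, $q_2 - p_2 \le \ell(v)$ and $o(v) \le p_2$. Thus, since the traversal was not truncated at $v$, we have $\ell(v) \ge q_2 - p_2 > \epsilon p_2 \ge \epsilon\, o(v)$. This proves that $v \in U$.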
To bound $|U|$, let $W \subseteq U$ be the set of $m$ nodes that have two children in $U$. Define $(h_1, \ldots, h_m) = (o(\mathrm{right}(v)) \mid v \in W)$ to be the sequence of the (non-normalized) 2nd coordinates of the right children of nodes in $W$, ordered from the smallest to the largest.
Fix $i$. Let $u \in W$ be the node for which $o(\mathrm{right}(u)) = h_i$, and let $v \in W$ be the node for which $o(\mathrm{right}(v)) = h_{i+1}$.
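Assume that $h_i > o(v)$. Then $u \in T_{\mathrm{left}(v)}$, and consequently $v \notin T_{\mathrm{right}(u)}$. Thus, $T_{\mathrm{right}(u)} \cap W = \emptyset$, as otherwise $h_i$ and $h_{i+1}$ would not be consecutive. Since $\mathrm{right}(u) \in U$, we have $h_{i+1} \ge o(\mathrm{right}(u)) + \ell(\mathrm{right}(u)) > (1 + \epsilon)\, o(\mathrm{right}(u)) = (1 + \epsilon) h_i$.
In summary, we have $h_{i+1} > (1 + \epsilon) h_i$. Note that $h_1 \ge 1$. In addition, $h_m \le n$ since the original data points (from which the ROC curve is computed) do not have weights.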
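Consequently, $n \ge h_m \ge (1 + \epsilon)^{m-1}$. Solving for $m$ leads to $m \le 1 + \log_{1+\epsilon} n$.
Given $v \in W$, define $k(v)$ to be the number of nodes in $U \setminus W$ that have $v$ as their youngest ancestor in $W$. The nodes contributing to $k(v)$ form at most two paths starting from $v$. Since the height of the search tree is in $O(\log n)$, we have $k(v) \in O(\log n)$.
Assume that $\epsilon > \alpha/2$ (recall that $\alpha = 1 - \sqrt{2}/2$). Then $m \in O(\log n)$ and $|U| \in O((1 + m) \log n) \subseteq O(\log^2 n)$, proving the proposition.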
Recall that the nodes corresponding to $k(v)$ form at most two paths. Let $u_1, \ldots, u_j$ be such a path. Let $w$ be a child of $u_1$ for which $w \notin U$. We have $\ell(w) \le \epsilon\, o(w)$, which, together with Eq. 11 and $\epsilon \le \alpha/2$, in turn implies $1 + \ell(u_1) \le 2\alpha^{-1}(1 + \epsilon\, o(u_1))$. Applying Eq. 11 iteratively and the fact that $u_j \in U$, we see that $1 + \epsilon\, o(u_1) \le 1 + \epsilon\, o(u_j) < 1 + \ell(u_j) \le (1 - \alpha)^{j-1}(1 + \ell(u_1)) \le 2\alpha^{-1}(1 - \alpha)^{j-1}(1 + \epsilon\, o(u_1))$, where the first inequality holds because $u_j$ is a descendant of $u_1$. Solving for $j$ leads to $j \le 1 + \log_{1-\alpha}(\alpha/2) \in O(1)$, and consequently $k(v) \in O(1)$. We conclude that $|U| \in O(\log^2 n + \epsilon^{-1} \log n)$.

We see that DynAuc is faster than the baseline by several orders of magnitude, with the needed time increasing logarithmically for DynAuc and linearly for the baseline.

We repeat the same experiments, but now we compare maintaining the H-measure against computing it from scratch from sorted data points. From the results shown in Figures 6 and 7 we see that Hexact is about $10$ to $10^2$ times faster, and that the time grows polylogarithmically for Hexact and linearly for the baseline. Similarly, the spikes in the running time of Hexact are due to self-balancing search trees. Interestingly, Hexact is faster for APS than for the other datasets. This is probably due to the imbalanced labels, which make the ROC curve relatively skewed and the convex hull small.
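To illustrate why the hull can be much smaller than the ROC curve itself, the following minimal Python sketch computes the upper convex hull from scratch with a standard monotone-chain scan. It is not the baseline code used in the experiments: we assume that the labels are already sorted by score, that a label 2 moves the curve along the 2nd coordinate and a label 1 along the 1st coordinate, and the function names `roc_points`, `cross`, and `upper_hull` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # non-normalized (1st, 2nd) ROC coordinates


def roc_points(labels: List[int]) -> List[Point]:
    """Turn labels, already sorted by score, into non-normalized ROC points."""
    x = y = 0
    points = [(0, 0)]
    for label in labels:
        if label == 2:
            y += 1          # assumption: label 2 moves along the 2nd coordinate
        else:
            x += 1          # assumption: label 1 moves along the 1st coordinate
        points.append((x, y))
    return points


def cross(o: Point, a: Point, b: Point) -> int:
    """Cross product of (a - o) and (b - o); negative means a right turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of points sorted by 1st and then by 2nd coordinate."""
    hull: List[Point] = []
    for p in points:
        # Drop the last hull point while it lies on or below the segment to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For the hypothetical label sequence `[2, 2, 1, 2, 1, 1, 2, 1]` the ROC curve has nine points while its upper hull keeps five of them. Recomputing the hull in this way takes linear time per update, which matches the linear growth observed for the baseline.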
In our final experiment we use the approximate H-measure, without the speed-up described in Section 6.1. Here, we measure the total time to compute the H-measure for $z_1, \ldots, z_i$ for every $i$ as a function of $\epsilon$. Figure 8 shows the running time as well as the difference to the correct score when using the whole data. Computing the H-measure from scratch required roughly 1 minute for APS, and 2.5 minutes for Diabetes and Dota2. On the other hand, we only need 10 seconds to obtain an accurate result, and as we increase $\epsilon$, the running time decreases. As we increase $\epsilon$, the error grows but only modestly (up to 3%), with Happrox underestimating the exact value.

Conclusions
In this paper we considered maintaining AUC and the H-measure under addition and deletion. More specifically, we show that we can maintain AUC in $O(\log n)$ time, and the H-measure in $O(\log^2 n)$ time, assuming that the class priors are obtained from the testing data. We also considered the case where the class priors are not obtained from the testing data. Here, we can approximate the H-measure in $O((\log n + \epsilon^{-1}) \log n)$ time.
We demonstrate empirically that our algorithms, DynAuc and Hexact, provide significant speed-up over the natural baselines where we compute the score from the sorted, maintained data points.
When computing the H-measure, the biggest time-saving factor is maintaining the convex hull, as the hull is typically smaller than the set of data points used for creating the ROC curve. Because of the smaller size of the hull, the tricks employed by Happrox provide less of a speed-up. Still, for larger values of $\epsilon$, the speed-up can be almost 50%.