We discuss possible approaches to fix TimSort and describe how one of these fixes was formally verified. To be specific, we proved mechanically that TimSort terminates normally for any input, i.e., no ArrayIndexOutOfBoundsException is thrown, see Listing 7. We have not proven that TimSort ensures the sortedness and permutation properties. However, as will become clear in Sect. 4.3 below, even the proof of absence of top-level exceptions required verification of a considerable amount of non-trivial functional specifications (500+ lines, see Table 2). The reason is that ensuring that the exception is never thrown depends on a complex wellformedness property that must be maintained. Hence, we do not think this specification and verification effort could have been achieved with “lightweight” verification tools [6].
Different Ways to Fix TimSort
In [13] we suggested two possible fixes of the TimSort bug. The first was to use larger stack sizes based on an operational pen-and-paper worst case analysis of the implementation. While the code change is trivial (modification of a few integer numbers in the allocation table), we did not favor this solution as it fixes the symptom, but not the underlying problem that the fundamental invariant of TimSort is broken. In short, when adopting this fix (as done in the Java OpenJDK), one uses an algorithm for which it is not fully understood why and how it works. We do not know how to formulate a correct invariant for the original implementation of the algorithm.
We favored the second suggestion which is to formalize the invariant as originally intended and to fix the code of the method mergeCollapse that is responsible for re-establishing the invariant. We were able to formally and mechanically prove that this fixed version of the algorithm is correct in the sense that the stack lengths are sufficient and no ArrayIndexOutOfBoundsException is thrown. We describe this fix and its verification in Sect. 4.3 below.
In the aftermath of our discovery, it turned out that the bug was present in several implementations of TimSort. Besides (Open)JDK (Footnote 4), the bug was present in (1) its original Python implementation (Footnote 5), (2) Android (Footnote 6), (3) an independent Java implementation used by Apache Lucene (Footnote 7), as well as (4) a Haskell implementation (Footnote 8).
All of these projects fixed the bug within a short time frame. The OpenJDK project was the only one where the bug was fixed by just increasing the allocated array lengths, which is in our opinion sub-optimal, and there is no machine checked proof of that fix. All other projects implemented our second suggestion and fixed the underlying problem. Notably, the Android fix varies from our proposal, but we were able to mechanically verify their fix with only minor modifications to the specifications and proofs of our fix. We discuss the Android fix in detail in Sect. 4.4. We do not know the reason for the alternative fix as the comment discussing their solution refers to an internal Google issue tracker.
Verification in KeY
Before we start to explain the verification of TimSort in detail, we briefly sketch the verification process in KeY. In particular, we clarify the notion of proof obligation and explain its generic form. We start by explaining that KeY is based on symbolic execution rather than on verification condition generation. For each method a formula in Java Dynamic Logic (JavaDL) is generated, which is valid if and only if the implementation adheres to its specification. The formula is then given to KeY's theorem prover to be proven valid. The formula contains the source code of the method to be verified as a first-class citizen, and the calculus rules concerned with program elimination implement a symbolic interpreter, i.e., the whole program elimination is an integral part of the logic calculus itself and not an external entity. The advantage of integrating symbolic execution and first-order reasoning into a logic calculus is that first-order reasoning and symbolic execution rules can be seamlessly interleaved. This allows KeY to simplify intermediate states eagerly and to close infeasible paths early.
The simplified form of a proof obligation for a method m is
$$\begin{aligned} ( pre \wedge inv ) \rightarrow \langle m( args ); \rangle ( post \wedge inv ) \end{aligned}$$
with \( pre , post \) and \( inv \) being the method precondition, postcondition and the class invariant, respectively. In JML, the specification language used by KeY, a precondition is given by a requires clause, and a post-condition is given by ensures. To avoid manually adding the class invariant at all these points, JML offers an invariant keyword which implicitly conjoins the class invariant to all pre- and post-conditions.
The formula expresses that, if a method m is invoked in a state where the precondition and invariant hold, then m terminates normally (not throwing an exception) and in its final state the postcondition and the class invariant hold. This dynamic logic formula is equivalent to the Hoare triple
$$\begin{aligned} \{ pre \wedge inv \}~m( args )~\{ post \wedge inv \} \end{aligned}$$
plus termination.
Such formulas had to be proven valid for all methods of TimSort. Roughly, during verification the method body of m is symbolically executed to translate the above formula into a pure first-order formula. When symbolic execution reaches a method call statement, the contract of the invoked method is used instead of inlining its implementation. Using a contract involves showing that, at the point of the call, the precondition of the called method as well as the invariant of the receiver object hold. Symbolic execution then continues with the next statement after the method invocation, assuming that the method's postcondition and the class invariant hold.
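As a minimal illustration of these keywords and of contract-based reasoning at call sites, consider the following sketch. The class BoundedStackSketch and its methods are hypothetical and unrelated to TimSort. The JML annotations are ordinary comments, so the code compiles as plain Java; they have not been checked with KeY.

```java
/** Hypothetical bounded stack, used only to illustrate JML contracts. */
final class BoundedStackSketch {
    private /*@ spec_public @*/ final int[] data = new int[4];
    private /*@ spec_public @*/ int size = 0;

    // Class invariant: implicitly part of every pre- and postcondition.
    //@ public invariant 0 <= size && size <= data.length;

    /*@ public normal_behavior
      @   requires size < data.length;
      @   ensures size == \old(size) + 1 && data[size - 1] == v;
      @   assignable size, data[size];
      @*/
    public void push(int v) {
        data[size++] = v;
    }

    /*@ public normal_behavior
      @   requires size + 2 <= data.length;
      @   ensures size == \old(size) + 2;
      @   assignable size, data[*];
      @*/
    public void pushTwice(int v) {
        // A contract-based verifier would not inline push here. At each call
        // it shows push's precondition and the invariant, then assumes push's
        // postcondition and the invariant for the remaining statements.
        push(v);
        push(v);
    }

    public int size() {
        return size;
    }
}
```

The precondition of pushTwice is what makes the second call provable: after the first call the contract of push only tells the caller that size grew by one, which together with size + 2 <= data.length re-establishes size < data.length.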
We conclude this section with a brief overview of the required proof-obligations for verification of TimSort, and their interplay. Figure 1 provides a (simplified) call graph. For each of the methods we have to prove that they adhere to their specification and in particular preserve the invariant. The methods directly relevant for the bug are the methods pushRun (where the exception was thrown) and mergeCollapse, which failed in its original implementation to re-establish the invariant. The specification and verification of these methods is explained in detail in Sects. 4.3 and 4.4.
The methods mergeLo/Hi do not change the runLen array or other program locations occurring in the invariant at all and thus cannot invalidate the invariant. Their verification proved surprisingly challenging because of their complex control flow which caused the number of symbolic paths to explode. In order to be able to prove these methods, extensions to the KeY verification system were necessary, namely, an improved rule to verify do-while loops (see Sect. 5.1) and state-merging (see Sect. 6.3).
The verification of these methods was mostly necessary to exclude the presence of implicit run-time exceptions and only a few of the listed methods mentioned in the previous two paragraphs modify fields occurring in the invariant. Their method contracts serve mostly to ensure that no NullPointerExceptions are thrown and that accesses to the array to be sorted are within bounds.
Verification of the Code that Re-establishes the Invariant
In Sect. 3 we showed that mergeCollapse does not fully re-establish the invariant, which led to an ArrayIndexOutOfBoundsException in pushRun. Now we fix mergeCollapse so that the invariant of the main sorting loop (Listing 1) is re-established, formally specify the new implementation in JML and provide a correctness proof, focusing on the most important specifications and proof obligations. (In the specification listings shown below we omitted some irrelevant lines; the specifications are otherwise unaltered). This formal proof has been fully mechanized in the theorem prover KeY [1]. Statistics of the verification effort and insights drawn from it are discussed in Sects. 5 and 6.
Listing 8 shows the fixed version of mergeCollapse. The main idea is to check validity of the element invariant on the top four elements of runLen (lines 4–5 and 8), instead of only the top three, as in the original implementation. The question arises: why is checking the last four runs sufficient? Initially, the precondition of mergeCollapse guarantees that all but the last three runs satisfy the element invariant. After mergeAt, the entry of runLen at index stackSize-2 or stackSize-1 may be modified, but runs at earlier indices remain intact. Thus the element invariant of runLen[stackSize-4] might not hold after merging, but the element invariant of earlier runs is not affected by the merging. This is the basis for checking the element invariant on the last four runs. Merging continues until the top four elements satisfy the invariant, at which point we break out of the merging loop (line 9). We prove below that this ensures that all runs satisfy the invariant.
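The fix described above can be sketched on run lengths alone. The code below follows the prose description and is not a verbatim copy of Listing 8: the class name RunStackSketch is hypothetical, the stack capacity of 40 is arbitrary, and mergeAt is reduced to adding two lengths, since the actual merging of array segments does not matter for the invariant.

```java
import java.util.Arrays;

/** Hypothetical model of the run-length stack; only lengths are tracked. */
final class RunStackSketch {
    final int[] runLen = new int[40]; // capacity chosen arbitrarily for the sketch
    int stackSize = 0;

    void pushRun(int len) {
        runLen[stackSize++] = len;
    }

    // Simplified mergeAt: combine runs i and i+1 (i is stackSize-2 or stackSize-3).
    void mergeAt(int i) {
        runLen[i] += runLen[i + 1];
        if (i == stackSize - 3) {
            runLen[i + 1] = runLen[i + 2]; // slide the top run down
        }
        stackSize--;
    }

    // Checks the element invariant on the top four runs instead of the top three.
    void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n >= 1 && runLen[n - 1] <= runLen[n] + runLen[n + 1]
                    || n >= 2 && runLen[n - 2] <= runLen[n] + runLen[n - 1]) {
                if (runLen[n - 1] < runLen[n + 1]) {
                    n--;
                }
            } else if (runLen[n] > runLen[n + 1]) {
                break; // the top four runs satisfy the invariant
            }
            mergeAt(n);
        }
    }

    int[] runs() {
        return Arrays.copyOf(runLen, stackSize);
    }
}
```

For example, pushing the run lengths 1000, 120, 80, 25, 20, 30 and then calling mergeCollapse once leaves the stack 1000, 275. The second disjunct of the guard is what triggers the merge after 25 and 20 have been combined into 45, because 120 is not larger than 80 + 45.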
To obtain a human readable specification and a feasible (mechanized) proof, we introduce suitable abstractions using the following auxiliary predicates:
Predicate name | Predicate definition
---|---
\(\text{elemBiggerThanNextTwo}(arr, idx)\) | \((0 \le idx \wedge idx+2 < arr.length) \rightarrow arr[idx] > arr[idx+1] + arr[idx+2]\)
\(\text{elemBiggerThanNext}(arr, idx)\) | \((0 \le idx \wedge idx+1 < arr.length) \rightarrow arr[idx] > arr[idx+1]\)
\(\text{elemLargerThanBound}(arr, idx, v)\) | \(0 \le idx < arr.length \rightarrow arr[idx] \ge v\)
\(\text{elemInv}(arr, idx, v)\) | \(\text{elemBiggerThanNextTwo}(arr, idx) \wedge \text{elemBiggerThanNext}(arr, idx) \wedge \text{elemLargerThanBound}(arr, idx, v)\)
The predicate \(\text{elemInv}(\texttt{runLen}, \mathtt{i}, 16)\) holds when runLen[i] satisfies the element invariant as defined in Sect. 3, and has length at least 16 (the lower bound on the minimal run length). Aided by these predicates we are ready to express the formal specification, beginning with the main sorting loop, which contains the fundamental invariant of TimSort.
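The auxiliary predicates can also be read as executable checks. The following is a minimal sketch, assuming a hypothetical helper class ElemPredicates; the specification itself uses JML predicates, not Java methods. Note that each predicate is an implication, so it holds trivially when the index guard is false.

```java
/** Executable reading of the auxiliary predicates (hypothetical helper class). */
final class ElemPredicates {
    // An implication with a false guard is true, so out-of-range indices pass.
    static boolean elemBiggerThanNextTwo(int[] arr, int idx) {
        boolean guard = 0 <= idx && idx + 2 < arr.length;
        // long arithmetic avoids overflow when adding two int run lengths
        return !guard || (long) arr[idx] > (long) arr[idx + 1] + arr[idx + 2];
    }

    static boolean elemBiggerThanNext(int[] arr, int idx) {
        boolean guard = 0 <= idx && idx + 1 < arr.length;
        return !guard || arr[idx] > arr[idx + 1];
    }

    static boolean elemLargerThanBound(int[] arr, int idx, int v) {
        boolean guard = 0 <= idx && idx < arr.length;
        return !guard || arr[idx] >= v;
    }

    static boolean elemInv(int[] arr, int idx, int v) {
        return elemBiggerThanNextTwo(arr, idx)
                && elemBiggerThanNext(arr, idx)
                && elemLargerThanBound(arr, idx, v);
    }
}
```

For the run lengths {52, 34, 17, 16}, elemInv(arr, i, 16) holds at every index, whereas lowering the first entry to 51 falsifies it at index 0, because 51 > 34 + 17 fails.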
Invariant of Main Sorting Loop We now specify the main sorting loop (Listing 1), formalising the invariant discussed in Sect. 3. Listing 9 shows the loop invariant in JML (note the use of the JML keyword loop_invariant). The crucial lines 6–10 express that all elements in runLen satisfy the invariant.
The local variable lo points to the index in the input array a of the first element that has yet to be processed; thus a has been partitioned into runs from index \(\backslash \texttt{old(lo)}\) (Footnote 9) to lo. Since JML by default uses Java integer types, which can overflow, we need to make sure this does not happen by casting those expressions that potentially can overflow to \(\backslash \texttt {bigint}\) (the \(\backslash \texttt {bigint}\) type represents mathematical integers). Furthermore, nRemaining counts the number of elements still to be processed. This explains lines 4 and 5. Lines 11–12 specify that if there are remaining elements, all runs in runLen have a length of at least 16.
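The overflow concern can be made concrete with a small sketch. The class OverflowSketch is hypothetical, and BigInteger is used here only as a Java analogue of mathematical integers; it is not how JML or KeY represent \(\backslash \texttt {bigint}\).

```java
import java.math.BigInteger;

/** Hypothetical demo of why sums in specifications need mathematical integers. */
final class OverflowSketch {
    // Java int addition wraps around silently on overflow.
    static int wrappedSum(int x, int y) {
        return x + y;
    }

    // Widening to BigInteger plays the role of a cast to a mathematical integer.
    static BigInteger exactSum(int x, int y) {
        return BigInteger.valueOf(x).add(BigInteger.valueOf(y));
    }
}
```

With both arguments equal to 2000000000, wrappedSum returns -294967296 while exactSum returns 4000000000, so a bound stated over Java int arithmetic could hold for the wrong reason.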
The class invariant is formalised next. It is roughly a weaker version of this loop invariant.
Class Invariant As mentioned above, a class invariant is a property that all instances of a class should satisfy and which must be preserved by each instance method, i.e., if it holds before a method invocation then it must also hold after termination of the method (Footnote 10). This means the class invariant is implicitly contained in a method's pre- and postcondition.
A seemingly natural candidate for the class invariant states that all runs on the stack satisfy the element invariant and have a length of at least 16, like lines 6–10 of the sort(..) loop invariant. The method pushRun critically relies on this invariant, to ensure that runLen is sufficiently long to push a new run on the stack (otherwise, it throws the ArrayIndexOutOfBoundsException). However, this class invariant is not preserved by pushRun. Further, inside the loop of mergeCollapse (Listing 8) the mergeAt method is called, so the class invariant must hold after it. But the merge could result in a new entry at the one-but-last index stackSize-1 in runLen, thus the element invariant for the last four runs might be broken after mergeAt. Finally, the last run pushed on the stack in the main sorting loop (Listing 1) can be shorter than 16 if fewer items remain. The class invariant given in Listing 10 addresses all this.
Lines 3–6 specify the length of runLen in terms of the length of the input array a. Line 7 formalizes the property that the length of all runs together (the sum of all run lengths) does not exceed a.length. Line 8 contains bounds for stackSize. Line 9 expresses that all but the last four elements satisfy the element invariant. The properties satisfied by the last four elements are specified on lines 10–13. Lines 14–16 say that run \(\mathtt {i}\) starts at runBase[i] and extends for runLen[i] elements.
The pushRun method. This method adds a new run of length runLen to the stack, starting at index runBase (Footnote 11). Lines 4–5 of Listing 11 express that the starting index of the new run (runBase) directly follows the end index of the last run (at index stackSize-1 in this.runLen and this.runBase). The assignable clause indicates which locations can be modified; it entails that previous runs on the stack are unchanged.
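The adjacency requirement on pushRun has a simple consequence: the end of the last run equals the base of the first run plus the sum of all run lengths. The following sketch illustrates this with runtime checks standing in for the requires clause. The class PushRunSketch, its capacity of 8, and the two helper methods are assumptions made for the illustration and do not appear in the verified code.

```java
/** Hypothetical run stack with runtime checks in place of the JML precondition. */
final class PushRunSketch {
    final int[] runBase = new int[8]; // capacity chosen arbitrarily for the sketch
    final int[] runLen = new int[8];
    int stackSize = 0;

    void pushRun(int base, int len) {
        if (len <= 0) {
            throw new IllegalArgumentException("run must be non-empty");
        }
        if (stackSize > 0 && base != runBase[stackSize - 1] + runLen[stackSize - 1]) {
            throw new IllegalArgumentException("run must directly follow the previous run");
        }
        // Only the slot at index stackSize is written; earlier runs are untouched.
        runBase[stackSize] = base;
        runLen[stackSize] = len;
        stackSize++;
    }

    // Index just past the last run on the stack (requires stackSize > 0).
    int endOfLastRun() {
        return runBase[stackSize - 1] + runLen[stackSize - 1];
    }

    int sumOfRunLengths() {
        int sum = 0;
        for (int i = 0; i < stackSize; i++) {
            sum += runLen[i];
        }
        return sum;
    }
}
```

After any sequence of accepted pushes, endOfLastRun() equals runBase[0] + sumOfRunLengths(), which is the telescoping argument that relates the stack contents to the length of the input array.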
The mergeCollapse method. The new implementation of mergeCollapse restores the invariant at all elements in runLen; this is stated in lines 6–7 of Listing 12. Since the method mergeCollapse only merges existing runs, the sum of all run lengths should be preserved (lines 8–9). Line 10 expresses that the length of the last run on the stack after merging never decreases (merging increases it). This is needed to ensure that all runs, except possibly the very last one, have length \(\ge 16\).
The loop invariant of mergeCollapse is given in Listing 13. As discussed above, merging preserves the sum of all run lengths (lines 2–3). Line 4 expresses that all but the last four runs satisfy the element invariant: a merge at index \(\texttt {stackSize-3}\) (before merging) can break the invariant of the run at index stackSize-4 after merging (beware: stackSize was decreased). Lines 5–8 state the conditions satisfied by the last four runs. Lines 9–10 specify consistency between runLen and runBase. Line 11 states that stackSize can only decrease through merging.
In order to prove the contract of mergeCollapse we make use of the contract of mergeAt in Listing 14. The postcondition formalizes the first three cases of the merging pattern of mergeCollapse as described in Sect. 2. It allows us to formally prove that, by repeated application of this merging pattern, the loop invariant of mergeCollapse in Listing 13 is established again.
To prove that each method satisfies its contract, several verification conditions must be established. We discuss the two most important ones. The first states that on entry of pushRun, the stackSize must be smaller than the stack length:
This is a crucial property: the ArrayIndexOutOfBoundsException of Listing 6 was caused by its violation.
Proof
Line 8 of the class invariant implies \(\texttt {stackSize} \le \texttt {this.runLen.length}\). We derive a contradiction from \(\texttt {stackSize} = \texttt {this.runLen.length}\) by considering four cases: \(\texttt {a.length} < 120\), or \( \texttt {a.length} \ge 120\texttt { \& \& }{} \texttt {a.length} < 1542\), or \( \texttt {a.length} \ge 1542\texttt { \& \& }{} \texttt {a.length} < 119151\), or \(\texttt {a.length} \ge 119151\). We detail the case \(\texttt {a.length} < 120\), the other cases are analogous. Since \(\texttt {a.length} < 120\), line 3 of the class invariant implies \(\texttt {stackSize} = \texttt {this.runLen.length} = 4\).
Let \(\texttt {SUM} = \texttt {this.runLen[0]} + \ldots + \texttt {this.runLen[3]}\). Suitable instances of lines 15–16 of the class invariant imply \(\texttt {this.runBase[3]} + \texttt {this.runLen[3]} = \texttt {this.runBase[0]} + \texttt {SUM}\). Together with line 14 of the class invariant and lines 4–5 of the pushRun contract we get \(\texttt {runLen} + \texttt {SUM} < 120\). But the \(\backslash \texttt {requires}\) clause of pushRun implies \(\texttt {runLen} > 0\), so \(\texttt {SUM} < 119\). The \(\backslash \texttt {requires}\) clause also implies \(\texttt {runLen[3]} \ge 16\) (line 9), \(\texttt {runLen[2]} \ge 17\) (line 8), \(\texttt {runLen[1]} \ge 34\) and \(\texttt {runLen[0]} \ge 52\) (line 7), the last two because each of these runs is strictly longer than the sum of the two runs above it. So \(\texttt {SUM} \ge 16+17+34+52 = 119\), a contradiction. \(\square \)
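The arithmetic in this proof can be replayed with a short recurrence. The sketch below, with the hypothetical names MinimalStackSum and minimalSum, computes the smallest possible total length of n stacked runs when the top run has at least the minimal length, and every other run is strictly longer than the sum of the (at most two) runs above it.

```java
/** Hypothetical helper that replays the lower-bound argument of the proof. */
final class MinimalStackSum {
    // Builds the minimal run lengths from the top of the stack downwards.
    static long minimalSum(int n, long minRun) {
        long next = 0;      // length of the run one position closer to the top
        long nextNext = 0;  // length of the run two positions closer to the top
        long sum = 0;
        for (int k = 0; k < n; k++) {
            // Top run: exactly minRun. Any other run: one more than the sum
            // of the runs above it (nextNext is 0 for the second run).
            long cur = (k == 0) ? minRun : next + nextNext + 1;
            sum += cur;
            nextNext = next;
            next = cur;
        }
        return sum;
    }
}
```

Here minimalSum(4, 16) returns 119, matching the bound 16+17+34+52 used above. The same recurrence yields 1541 for nine runs and 119150 for eighteen runs, each one below the next threshold in the case split (1542 and 119151). That nine and eighteen are the stack lengths used in those cases is an assumption; this section only states the length 4 for \(\texttt {a.length} < 120\).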
The second verification condition arises from the break statement in the loop of the method mergeCollapse (Listing 8, line 9). At that point the guards on lines 4–5 are false, the one on line 8 is true, and the \(\backslash \texttt {ensures}\) clause of mergeCollapse (which implies that the invariant holds for all runs in runLen) must be proven:
Proof
Preservation of sums (lines 8–9 of \(\backslash \texttt {ensures}\)) follows directly from lines 2–3 of the loop invariant. Lines 10–11 of \(\backslash \texttt {ensures}\) are implied by lines 11–12 of the loop invariant. The property elemBiggerThanNext(runLen,stackSize-2) follows directly from \(\texttt {n}>= 0 \texttt { ==> runLen[n]} > \texttt {runLen[n+1]}\). We show by cases that
- \(\texttt {i} < \texttt {stackSize-4}\): from line 4 of the loop invariant.
- \(\texttt {i} = \texttt {stackSize-4}\): from line 3 of the premise. The original mergeCollapse implementation (Listing 4) did not cover this case, which was the root cause that the invariant \(\texttt {elemInv(runLen, i, 16)}\) could be false for some i.
- \(\texttt {i} = \texttt {stackSize-3}\): from line 4 of the premise. \(\square \)
Preservation of the Main Loop Invariant The final proof obligation we discuss states that the loop invariant (Listing 9) of the main sorting loop (Listing 1) is preserved by the loop body. Line 2 follows from line 11 of the contract of mergeCollapse (Listing 12) and line 8 of the class invariant (Listing 10). Line 3 follows from the contract of minRunLength and lines 9–14 of the main loop. Line 4 follows from lines 20 and 21 of the main loop. For line 5, notice that pushRun increases the sum of run lengths by runLen (the local variable passed as parameter; see lines 12–15 of its contract, Listing 11); this sum is preserved by mergeCollapse (lines 8 and 9 of its contract); finally, in line 20 of the main loop, lo is incremented by the value of runLen.
Lines 6–7, formalising the crucial part of the invariant, follow from line 6 of the contract of mergeCollapse. Line 8 follows from line 7 of the contract of mergeCollapse. Lines 12–13 follow from lines 12–13 of the class invariant. Lines 11–12 follow from lines 11–15 and 21 of the main loop. Finally, lines 13–14 follow from lines 15–16 of the class invariant.
Of course, all proof obligations described above (plus all others) were formally shown in KeY.
The Android Fix
In the Android implementation of TimSort, the bug was corrected by adapting the method mergeCollapse in a different way than proposed in Listing 8. We discuss this new version and its correctness, which we have proved in KeY.
The above method also checks that the last four elements of runLen satisfy the invariant, but at a different time than the corrected version in Listing 8. Suppose that on entry of the loop, the last five elements of runLen are A, B, C, D, E. As explained in Sect. 3, the invariant can be broken at (the run of length) A by merging the runs C and D, which would happen in line 6. But in this case, the above method immediately checks whether the invariant holds at A (i.e., whether \(A> B + C + D\)), and if not, it merges the last two runs, yielding the run length stack \(A,B,C+D+E\). Hence, the appropriate invariant for mergeCollapse_android ensures that the invariant holds at all but the last three elements of runLen. Indeed, it is obtained from the invariant of mergeCollapse in Listing 13 by replacing lines 4–5 by the following:
This is the only required change: the contract of mergeCollapse_android is the same as the contract of mergeCollapse. Hence, similar to the explanation of the correctness of mergeCollapse, the main verification condition arises from the break statement at line 16. At that point, the guards at line 4 and 13 are false, hence we must prove the following:
The invariant of mergeCollapse_android implies that of mergeCollapse, and, because of (2), it implies the left-hand side of the verification condition (1) discussed above in the correctness of mergeCollapse. Finally, since we have that
(their contracts are identical), the above verification condition follows from (1).