On the Average Case of MergeInsertion

MergeInsertion, also known as the Ford-Johnson algorithm, is a sorting algorithm which, to this day, achieves the best known upper bound on the number of comparisons for many input sizes. Indeed, it gets extremely close to the information-theoretic lower bound. While the worst-case behavior is well understood, little is known about the average case. This work takes a closer look at the average-case behavior. In particular, we establish an upper bound of $n \log n - 1.4005n + o(n)$ comparisons. We also give an exact description of the probability distribution of the length of the chain a given element is inserted into and use it to approximate the average number of comparisons numerically. Moreover, we compute the exact average number of comparisons for $n$ up to 148. Furthermore, we experimentally explore the impact of different decision trees for binary insertion. To conclude, we conduct experiments showing that a slightly different insertion order leads to a better average case, and we compare the algorithm to the recent combination with (1,2)-Insertionsort by Iwama and Teruyama.


Introduction
Sorting a set of elements is an important operation frequently performed by many computer programs. Consequently there exist a variety of algorithms for sorting, each of which comes with its own advantages and disadvantages.
Here we focus on comparison-based sorting and study a specific sorting algorithm known as MergeInsertion. It was discovered by Ford and Johnson in 1959 [5]. Before D. E. Knuth coined the term MergeInsertion in his study of the algorithm in his book "The Art of Computer Programming, Volume 3: Sorting and Searching" [7], it was known only as the Ford-Johnson algorithm, named after its creators. The one outstanding property of MergeInsertion is that the number of comparisons it requires is close to the information-theoretic lower bound of log(n!) ≈ n log n − 1.4427n (for sorting n elements). This sets it apart from many other sorting algorithms. MergeInsertion can be described in three steps: first, pairs of elements are compared; in the second step, the larger elements are sorted recursively; as a last step, the elements belonging to the smaller half are inserted into the already sorted larger half using binary insertion.
In the worst case the number of comparisons of MergeInsertion is quite well understood [7]: it is n log n + b(n) · n + o(n), where b(n) oscillates between −1.415 and −1.3289. Moreover, for many n MergeInsertion is proved to be the optimal algorithm in the worst case (in particular, for n ≤ 15 [9,10]). However, there are also n where it is not optimal [8,2]. One reason for this is the oscillating linear term in the number of comparisons, which allowed Manacher [8] to show that for certain n it is more efficient to split the input into two parts, sort both parts with MergeInsertion, and then merge the two parts into one array.
Regarding the average case not much is known: in [7] Knuth calculated the number of comparisons required on average for n ∈ {1, . . . , 8}; an upper bound of n log n − 1.3999n + o(n) has been established in [3]. Most recently, Iwama and Teruyama [6] showed that in the average case MergeInsertion can be improved by combining it with their (1,2)-Insertion algorithm resulting in an upper bound of n log n − 1.4106n + O(log n). This reduces the gap to the lower bound by around 25%. It is a fundamental open problem how close one can get to the information-theoretic lower bound of n log n − 1.4427n (see e. g. [6,11]).
The goal of this work is to study the number of comparisons required in the average case. In particular, we analyze the insertion step of MergeInsertion in greater detail. In general, MergeInsertion achieves its good performance by inserting elements in a specific order that in the worst case causes each element to be inserted into a sorted list of 2^k − 1 elements (thus, using exactly k comparisons). When looking at the average case, elements are often inserted into less than 2^k − 1 elements, which is slightly cheaper. By calculating those small savings we seek to achieve our goal of a better upper bound on the average case. Our results can be summarized as follows:
- We derive an exact formula for the probability distribution into how many elements a given element is inserted (Theorem 2). This is the crucial first step in order to obtain better bounds for the average case of MergeInsertion.
- We experimentally examine different decision trees for binary insertion. We obtain the best result when assigning shorter decision paths to positions located further to the left.
- We use Theorem 2 in order to compute quite precise numerical estimates for the average number of comparisons for n up to roughly 15000.
- We compute the exact average number of comparisons for n up to 148, thus going much further than [7].
- We improve the bound of [3] to n log n − 1.4005n + o(n) (Theorem 3). This partially answers a conjecture from [11], which asks for an in-place algorithm with n log n − 1.4n comparisons on average and n log n − 1.3n comparisons in the worst case. Although MergeInsertion is not in-place, the techniques from [3] or [11] can be used to make it so.
- We evaluate a slightly different insertion order, decreasing the gap between the lower bound and the average number of comparisons of MergeInsertion by roughly 30% for n ≈ 2^k/3.

- We compare MergeInsertion to the recent combination by Iwama and Teruyama [6], showing that, in fact, their combined algorithm is still the better one, and that it can be further improved with the different insertion order.

Most proofs as well as additional explanations and experimental results can be found in the appendix. The code used in this work and the generated data are available on [12].

Preliminaries
Throughout, we assume that the input consists of n distinct elements. The average case complexity is the mean number of comparisons over all input permutations of n elements.

Description of MergeInsertion
The MergeInsertion algorithm consists of three phases: pairwise comparison, recursion, and insertion. Accompanying the explanations we give an example where n = 21. Throughout, we call a set of known order relations between individual elements a configuration.
1. Pairwise comparison. The elements are grouped into ⌊n/2⌋ pairs. Each pair is sorted using one comparison. After that, the larger elements are called a_1 to a_{⌊n/2⌋} and the smaller ones b_1 to b_{⌊n/2⌋}, such that a_i > b_i; if n is odd, the unpaired element is treated as b_{⌈n/2⌉}.
2. Recursion. The ⌊n/2⌋ larger elements, i. e., a_1 to a_{⌊n/2⌋}, are sorted recursively. Then all elements (the ⌊n/2⌋ larger ones as well as the corresponding smaller ones) are renamed accordingly such that a_i < a_{i+1} and a_i > b_i still hold.
3. Insertion. The ⌈n/2⌉ small elements, i. e., the b_i, are inserted into the main chain using binary insertion. The term "main chain" describes the set of elements containing a_1, . . . , a_{t_k} as well as the b_i that have already been inserted. The elements are inserted in batches starting with b_3, b_2. In the k-th batch the elements b_{t_k}, b_{t_k−1}, . . . , b_{t_{k−1}+1} are inserted in that order, where t_k = (2^{k+1} + (−1)^k)/3. Elements b_j where j > ⌈n/2⌉ (which do not exist) are skipped. Note that technically b_1 is the first batch; but inserting b_1 does not need any comparison.
Because of the insertion order, every element b_i which is part of the k-th batch is inserted into at most 2^k − 1 elements; thus, it can be inserted by binary insertion using at most k comparisons.
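The three phases can be sketched in code. The following is our own minimal Python sketch (not the paper's implementation), assuming distinct elements as in the Preliminaries and using the standard batch endpoints t_k = (2^{k+1} + (−1)^k)/3:

```python
from bisect import bisect_left

def t(k):
    """Batch endpoints t_k = (2^(k+1) + (-1)^k) / 3, i.e. 1, 1, 3, 5, 11, 21, 43, ..."""
    return (2 ** (k + 1) + (-1) ** k) // 3

def merge_insertion_sort(lst):
    n = len(lst)
    if n <= 1:
        return list(lst)
    # 1. Pairwise comparison: one comparison per pair, larger element first.
    pairs = [(max(lst[i], lst[i + 1]), min(lst[i], lst[i + 1]))
             for i in range(0, n - 1, 2)]
    straggler = [lst[-1]] if n % 2 else []
    # 2. Recursion: sort the larger elements a_1 < ... < a_m.
    a = merge_insertion_sort([p[0] for p in pairs])
    partner = dict(pairs)               # a_i -> b_i (requires distinct elements)
    b = [partner[x] for x in a] + straggler
    m = len(b)
    # 3. Insertion: b_1 is free; then insert b_3, b_2, then b_5, b_4, ... in batches.
    chain = [b[0]] + a
    k = 2
    while t(k - 1) < m:
        for j in range(min(t(k), m), t(k - 1), -1):   # j = t_k, ..., t_{k-1}+1
            # b_j < a_j, so the binary search only covers the part below a_j.
            hi = chain.index(a[j - 1]) if j <= len(a) else len(chain)
            chain.insert(bisect_left(chain, b[j - 1], 0, hi), b[j - 1])
        k += 1
    return chain
```

The `chain.index` call is bookkeeping, not a comparison in the counting model; an implementation that tracks positions explicitly avoids it (cf. Appendix D).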
Regarding the average number of comparisons F(n) we make the following observations: the first step always requires ⌊n/2⌋ comparisons. The recursion step does not perform any comparisons by itself but depends on the other steps. The average number of comparisons G(n) required in the insertion step is not obvious; it will be studied more closely in the following sections. Following [7], we obtain the recurrence (which is the same as for the worst-case number of comparisons)

F(n) = ⌊n/2⌋ + F(⌊n/2⌋) + G(⌈n/2⌉). (1)
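For reference, in the worst case this recurrence is solved by the well-known closed form F(n) = Σ_{i=1}^{n} ⌈log(3i/4)⌉ (Knuth [7]). A small check of ours, using ⌈log₂(3i/4)⌉ = bit_length(3i − 1) − 2 to stay in exact integer arithmetic:

```python
def ford_johnson_worst_case(n):
    """Worst-case comparisons of MergeInsertion: sum of ceil(log2(3*i/4)).

    ceil(log2(3*i/4)) = ceil(log2(3*i)) - 2 = (3*i - 1).bit_length() - 2,
    which avoids floating point entirely (3*i is never a power of two).
    """
    return sum((3 * i - 1).bit_length() - 2 for i in range(1, n + 1))
```

For instance, this reproduces the famous value of 66 comparisons for sorting 21 elements.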

Average Case Analysis of the Insertion Step
In this section we have a look at different probabilities when inserting one batch of elements, i. e., the elements b_{t_k} to b_{t_{k−1}+1}. We assume that all elements of previous batches, i. e., b_1 to b_{t_{k−1}}, have already been inserted; together with the corresponding a_i they constitute the main chain and have been renamed to x_1, . . . , x_{2t_{k−1}}. The situation is shown in Fig. 1.
We will look at the element b_{t_{k−1}+i} and want to answer the following questions: what is the probability of it being inserted between x_j and x_{j+1}? And what is the probability of it being inserted into a specific number of elements? We can ignore batches that are inserted after the batch we are looking at, since those do not affect the probabilities we want to obtain.
First we define a probability space for the process of inserting one batch of elements: let Ω_k be the set of all possible outcomes (i. e., linear extensions) when sorting the partially ordered elements shown in Fig. 1 by inserting b_{t_{k−1}+1}, . . . , b_{t_k}. Each ω ∈ Ω_k can be viewed as a function that maps an element e to its final position, i. e., ω(e) ∈ {1, 2, . . . , 2t_k}. While the algorithm mandates a specific order for inserting the elements b_{t_{k−1}+1} to b_{t_k} during the insertion step, using a different order does not change the outcome, i. e., the elements are still sorted correctly. For this reason we may assume a different insertion order in order to simplify calculating the likelihood of relations between individual elements.
Let us look at where an element will end up after it has been inserted. Not all positions are equally likely. For this purpose we define the random variable X_i as the position of b_{t_{k−1}+i} after insertion: X_i = j means that b_{t_{k−1}+i} ends up between x_j and x_{j+1}. To simplify notation we define x_{t_{k−1}+j} := a_j for t_{k−1} < j ≤ t_k (hence, the main chain consists of x_1, . . . , x_{2^k}).
We are interested in the probabilities P (X i = j). These values follow a simple pattern (for k = 4 these are given in Table 2 in the appendix).
Theorem 1. The probability of b_{t_{k−1}+i} being inserted between x_j and x_{j+1}, i. e., P(X_i = j), is given by the closed formula derived in Appendix A.

Next, our aim is to compute the probability that b_{t_{k−1}+i} is inserted into a particular number of elements. This is of particular interest because the difference between average and worst case comes from the fact that sometimes we insert into less than 2^k − 1 elements. For that purpose we define the random variable Y_i as the number of elements that b_{t_{k−1}+i} is inserted into.
The elements in the main chain when inserting b_{t_{k−1}+i} are x_1 to x_{2t_{k−1}+i−1} and those elements out of b_{t_{k−1}+i+1}, . . . , b_{t_k} which have been inserted before a_{t_{k−1}+i} (which is x_{2t_{k−1}+i}). For computing the number of these, we introduce random variables Ỹ_{i,q} counting the elements in {b_{t_{k−1}+i+1}, . . . , b_{t_{k−1}+i+q}} that are inserted before a_{t_{k−1}+i}. For an illustration see Figure 16 in the appendix. Clearly we have P(Ỹ_{i,0} = j) = 1 if j = 0 and P(Ỹ_{i,0} = j) = 0 otherwise. For q > 0 there are two possibilities: either Ỹ_{i,q−1} = j − 1 and b_{t_{k−1}+i+q} is inserted before a_{t_{k−1}+i}, or Ỹ_{i,q−1} = j and b_{t_{k−1}+i+q} is inserted after a_{t_{k−1}+i}.
From these we obtain the following recurrence:

P(Ỹ_{i,q} = j) = P(Ỹ_{i,q−1} = j−1) · P(X_{i+q} < 2t_{k−1}+i | Ỹ_{i,q−1} = j−1) + P(Ỹ_{i,q−1} = j) · P(X_{i+q} ≥ 2t_{k−1}+i | Ỹ_{i,q−1} = j).

The probability P(X_{i+q} < 2t_{k−1}+i | Ỹ_{i,q−1} = j−1) can be obtained by looking at Fig. 1 and counting elements. When b_{t_{k−1}+i+q} is inserted, the elements on the main chain which are smaller than a_{t_{k−1}+i} are x_1 to x_{2t_{k−1}}, a_{t_{k−1}+1} to a_{t_{k−1}+i−1}, and j − 1 elements out of {b_{t_{k−1}+i+1}, . . . , b_{t_{k−1}+i+q−1}}, which is a total of 2t_{k−1} + 2i + j − 2 elements. Combined with the fact that the main chain consists of 2t_{k−1} + 2i + 2q − 2 elements smaller than a_{t_{k−1}+i+q}, we obtain the probability (2t_{k−1}+2i+j−1)/(2t_{k−1}+2i+2q−1). We can calculate P(X_{i+q} ≥ 2t_{k−1}+i | Ỹ_{i,q−1} = j) similarly, leading to 1 − (2t_{k−1}+2i+j)/(2t_{k−1}+2i+2q−1). By solving the recurrence, we obtain a closed form for P(Ỹ_{i,q} = j) and, thus, for P(Y_i = j). The complete proof is given in Appendix B.2.
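The recurrence for P(Ỹ_{i,q} = j) can be evaluated directly with exact rational arithmetic. The sketch below is ours (it hard-codes the batch endpoints t_k = (2^{k+1} + (−1)^k)/3 and represents the distribution as a dictionary):

```python
from fractions import Fraction

def t(k):
    """Batch endpoints t_k = (2^(k+1) + (-1)^k) / 3."""
    return (2 ** (k + 1) + (-1) ** k) // 3

def y_tilde_distribution(k, i, q):
    """Exact distribution of Y~_{i,q}: the number of elements among
    b_{t_{k-1}+i+1}, ..., b_{t_{k-1}+i+q} inserted before a_{t_{k-1}+i}."""
    tk1 = t(k - 1)
    dist = {0: Fraction(1)}                    # P(Y~_{i,0} = 0) = 1
    for r in range(1, q + 1):
        denom = 2 * tk1 + 2 * i + 2 * r - 1    # positions open to b_{t_{k-1}+i+r}
        new = {}
        for j, p in dist.items():
            # Probability that b_{t_{k-1}+i+r} lands before a_{t_{k-1}+i},
            # given that j of the earlier elements already did.
            before = Fraction(2 * tk1 + 2 * i + j, denom)
            new[j + 1] = new.get(j + 1, Fraction(0)) + p * before
            new[j] = new.get(j, Fraction(0)) + p * (1 - before)
        dist = new
    return dist
```

For example, for k = 3 and i = 1 the single element b_{t_2+2} avoids the gap directly below a_{t_2+1} with probability 8/9.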
Figure 2 shows the probability distributions of Y_1, Y_21 and Y_42 for k = 7. Y_42 corresponds to the insertion of b_{t_k} (the first element of the batch); Y_1 corresponds to the insertion of b_{t_{k−1}+1} (the last element of the batch). In addition to those three probability distributions, Fig. 3 shows the mean of all Y_i for k = 7.
Binary Insertion and different decision trees The binary insertion step is an important part of MergeInsertion. In the average case many elements are inserted into less than 2^k − 1 elements (which is the worst case). This leads to ambiguous decision trees where some positions can be reached using only k − 1 instead of k comparisons. We consider four strategies for distributing these shorter decision paths: left, right, center-left, and center-right, where k = ⌊log n⌋. Notice that the left strategy is also used in [6], where it is called right-hand-binary-search. Figure 5 shows experimental results comparing the different strategies for binary insertion regarding their effect on the average case of MergeInsertion. As we can see, the left strategy performs best, closely followed by center-left and center-right; right performs worst. The left strategy performing best is no surprise, since the probability that an element is inserted into one of the left positions is higher than that of it being inserted further to the right. Therefore, in all further experiments we use the left strategy.
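A possible implementation of the left strategy (our own sketch, not the paper's code): choose each comparison element so that every gap costs either k or k − 1 comparisons and the cheaper gaps are exactly the leftmost ones:

```python
def left_insert_position(a, x):
    """Binary insertion of x into the sorted list a using the 'left' strategy.

    Returns (gap_index, comparisons).  The worst case is ceil(log2(len(a)+1))
    comparisons; the gaps needing one comparison less are the leftmost ones.
    """
    lo, hi, comps = 0, len(a), 0
    while lo < hi:
        s = hi - lo + 1                  # number of gaps still possible
        k = (s - 1).bit_length()         # comparisons left in the worst case
        # Size of the left part: keep both subtrees at depth <= k-1 while
        # pushing the short leaves to the left.
        l = 1 if k == 1 else max(1 << (k - 2), s - (1 << (k - 1)))
        comps += 1
        if x < a[lo + l - 1]:
            hi = lo + l - 1
        else:
            lo += l
    return lo, comps
```

For example, when inserting into m = 4 elements, the five gaps cost (2, 2, 2, 3, 3) comparisons from left to right.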

Improved Upper Bounds for MergeInsertion
Numeric upper bound The goal of this section is to combine the probability given by Theorem 2 that an element b_{t_{k−1}+i} is inserted into j elements with an upper bound for the number of comparisons required for binary insertion. By [4], the number of comparisons required for binary insertion when inserting into m elements is at most ⌈log(m+1)⌉ + 1 − 2^{⌈log(m+1)⌉}/(m+1) on average. While only being exact in the case of a uniform distribution, this formula acts as an upper bound in our case, where the probability is monotonically decreasing with the index.
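A small exact-arithmetic check of ours: the bound ⌈log(m+1)⌉ + 1 − 2^{⌈log(m+1)⌉}/(m+1) equals the external path length of an optimal insertion tree divided by the number of gaps:

```python
from fractions import Fraction

def avg_binary_insertion(m):
    """Average comparisons for binary insertion into m elements when all
    m + 1 gaps are equally likely: ceil(log2(m+1)) + 1 - 2^ceil(log2(m+1))/(m+1)."""
    k = m.bit_length()                   # equals ceil(log2(m + 1))
    return k + 1 - Fraction(2 ** k, m + 1)
```

An optimal decision tree over m + 1 gaps has 2^k − (m + 1) leaves at depth k − 1 and 2(m + 1) − 2^k leaves at depth k, which yields the same value.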
This leads to an upper bound for the cost of inserting one batch of elements. From there we calculated an upper bound for MergeInsertion. Figure 6 compares those results with experimental data on the number of comparisons required by MergeInsertion. We observe that the difference is rather small.

Computing the Exact Number of Comparisons In this section we explore how to numerically calculate the exact number of comparisons required in the average case. The most straightforward way of doing this is to compute the external path length of the decision tree (the sum of the lengths of all paths from the root to the leaves) and divide by the number of leaves (n! when sorting n elements), which unfortunately is only feasible for very small n. Instead we use Equation (1), which describes the number of comparisons. The only unknown in that formula is G(n), the average number of comparisons required in the insertion step of the algorithm. Since the insertion step of MergeInsertion works by inserting elements in batches, G(n) decomposes into a sum of per-batch costs Cost(s, e), where Cost(s, e) is the cost of inserting one batch of elements starting from b_{s+1} up to b_e. The idea for computing Cost(s, e) is to calculate the external path length of the decision tree corresponding to the insertion of that batch of elements and then divide by the number of leaves. As this is still not feasible, we apply some optimizations which we describe in detail in Appendix C.
For n ∈ {1, . . . , 15} the computed values are shown in Table 1; for larger n, Fig. 7 shows the values we computed. The complete data set is provided in the file exact.csv in [12]. Our results match the values for n ∈ {1, . . . , 8} calculated in [7]. Note that for these values the chosen insertion strategy does not affect the average case (we use the left strategy).

Table 1: Computed values of F(n) · n!.
Improved theoretical upper bounds In this section we improve upon the upper bound from [3], leading to the following result:

Theorem 3. The number of comparisons required in the average case of MergeInsertion is at most n log n − c(x_n) · n ± O(log² n), where x_n is the fractional part of log(3n), i. e., the unique value in [0, 1) such that n = 2^{k−log 3+x_n} for some k ∈ Z, and c : [0, 1) → R is an explicitly given function whose minimum is slightly above 1.4005.

Hence we have obtained a new upper bound for the average case of MergeInsertion, which is n log n − 1.4005n + O(log² n). A visual representation of c(x) is provided in Fig. 8. The worst case is near x = 0.6 (i. e., n roughly a power of two), where c(x) is just slightly larger than 1.4005. The proof of Theorem 3 analyzes the insertion of one batch of elements more carefully than [4]. The exact probability that b_{t_{k−1}+i} is inserted into j elements is given by Theorem 2. We are especially interested in the case of b_{t_{k−1}+u} where u = ⌊(t_k − t_{k−1})/2⌋. However, the equation from Theorem 2 is hard to work with, so we approximate it with the binomial distribution p(j), showing that with high probability such elements are inserted into linearly fewer elements compared to the worst case. Combining that with the bounds from [4] we obtain Theorem 3. The complete proof is given in Appendix B.3.

Experiments
In this section we discuss our experiments, which consist of two parts: first, we evaluate how increasing t_k by some constant factor can reduce the number of comparisons; then we examine how the combination with the (1,2)-Insertion algorithm as proposed in [6] improves MergeInsertion.
We implemented MergeInsertion using a tree based data structure, similar to the Rope data structure [1] used in text processing, resulting in a comparably "fast" implementation. Implementation details can be found in Appendix D. All experiments use the left strategy for binary insertion (see Section 3). The number of comparisons has been averaged over 10 to 10000 runs, depending on the size of the input.
Increasing t_k by a Constant Factor In this section we modify MergeInsertion by replacing t_k with t̂_k = ⌊f · t_k⌋; otherwise the algorithm is the same. Originally the numbers t_k have been chosen such that each element b_i with t_{k−1} < i ≤ t_k is inserted into at most 2^k − 1 elements (which is optimal for the worst case). As we have seen in the previous sections, many elements are inserted into slightly less than 2^k − 1 elements. The idea behind increasing t_k by a constant factor f is to allow more elements to be inserted into close to 2^k − 1 elements. Figure 10 shows how different factors f affect the number of comparisons required by MergeInsertion. The different lines represent different input lengths. For instance, n = 21845 is an input size for which MergeInsertion works best. An overview of the different input lengths and how the original MergeInsertion performs on them can be seen in Figure 9. The chosen values are assumed to be representative for the entire algorithm. We observe that for all shown input lengths, multiplying t_k by a factor f between 1.02 and 1.05 leads to an improvement. Figure 11 compares different factors from 1.02 to 1.05. The factor 1.0 (i. e., the original algorithm) is included as a reference. We observe that all other factors lead to a considerable improvement compared to 1.0. The difference between the factors in the chosen range is rather small; however, 1.03 appears to be the best of the tested values. At n ≈ 2^k/3 the difference to the information-theoretic lower bound is reduced to 0.007n, improving upon the original algorithm, which has a difference of 0.01n to the optimum.
Another observation we make from Figure 11 is that the plot repeats itself periodically with each power of two. Thus, we conclude that replacing t_k with t̂_k = ⌊f · t_k⌋ for f ∈ [1.02, 1.05] reduces the number of comparisons required per element by some constant.
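The modification itself is a one-line change of the batch endpoints (a sketch; the name `t_hat` is ours):

```python
import math

def t_hat(k, f=1.0):
    """Batch endpoint t_k = (2^(k+1) + (-1)^k) / 3 scaled by a constant
    factor f, i.e. floor(f * t_k); f = 1.0 gives the original algorithm."""
    return math.floor(f * ((2 ** (k + 1) + (-1) ** k) // 3))
```

With f = 1.03 only the larger endpoints actually change, e.g. t_6 = 43 becomes 44, so the last elements of a batch may be inserted into slightly more than 2^k − 1 elements.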
Combination with (1,2)-Insertion (1,2)-Insertion is a sorting algorithm presented in [6]. It works by inserting either a single element or two elements at once into an already sorted list. On its own, (1,2)-Insertion is worse than MergeInsertion; however, it can be combined with MergeInsertion. Let u_k = (4/3) · 2^k denote a point where MergeInsertion is optimal. The combined algorithm works by sorting m = max {u_k | u_k ≤ n} elements with MergeInsertion; then the remaining elements are inserted using (1,2)-Insertion.
In Fig. 12 we can see that at the point u_k MergeInsertion and the combined algorithm perform the same. However, for the values following u_k the combined algorithm surpasses MergeInsertion until, at one point close to the next optimum, MergeInsertion is better once again. In their paper Iwama and Teruyama calculated that for 0.638 ≤ n/2^{⌈log n⌉} ≤ 2/3 MergeInsertion is better than the combined algorithm. The fraction 2/3 corresponds to the point where MergeInsertion is optimal. They derived the constant 0.638 from their theoretical analysis using the upper bound for MergeInsertion from [3]. Comparing this to our experimental results, we observe that the range where MergeInsertion is better than the combined algorithm starts at n ≈ 2^{17.242}. This yields 2^{17.242}/2^{18} = 2^{17.242−18} = 2^{−0.758} ≈ 0.591. Hence the range where MergeInsertion is better than the combined algorithm is 0.591 ≤ n/2^{⌈log n⌉} ≤ 2/3, which is slightly larger than the theoretical analysis suggested. Also shown in Fig. 12 is the combined algorithm where we additionally apply our suggestion of replacing t_k by t̂_k = ⌊f · t_k⌋ with f = 1.03. This leads to an additional improvement and comes even closer to the lower bound of log(n!).

Conclusion and Outlook
We improved the previous upper bound of n log n − 1.3999n + o(n) to n log n − 1.4005n + o(n) for the average number of comparisons of MergeInsertion. However, there still is a gap between the number of comparisons required by MergeInsertion and this upper bound.
In Section 4 we used a binomial distribution to approximate the probability of an element being inserted into a specific number of elements during the insertion step. However, the difference between our approximation and the actual probability distribution is rather large. Finding an approximation which reduces that gap while still being simple to analyze with respect to its mean would facilitate further improvements to the upper bound.
Our suggestion of increasing t_k by a constant factor f reduced the number of comparisons required per element by some constant. However, we do not have a proof for this. Thus, future research could try to determine the optimal value for the factor f as well as study how this suggestion affects the worst case.

B.1 Proof of Theorem 1

For an arbitrary k we can calculate the probabilities P(X_i = j) with the following recursive scheme. We start with P(X_1 = j). This corresponds to the insertion of b_{t_{k−1}+1} into x_1, . . . , x_{2t_{k−1}}. All positions are equally likely, so P(X_1 = j) = 1/(2t_{k−1}+1) for 0 ≤ j ≤ 2t_{k−1}. For i > 1 we can express P(X_i = j) in terms of P(X_{i−1} = j). Observe that when inserting b_{t_{k−1}+i} there are 2t_{k−1} + 2i − 2 elements known to be smaller than a_{t_{k−1}+i}. These are x_1, . . . , x_{2t_{k−1}} and a_{t_{k−1}+1}, . . . , a_{t_{k−1}+i−1} as well as the corresponding b's. The number of elements known to be smaller than a_{t_{k−1}+i−1} is one less: just 2t_{k−1} + 2i − 3. As a result the probability that b_{t_{k−1}+i} is inserted between a_{t_{k−1}+i−1} and a_{t_{k−1}+i} is P(X_i = 2t_{k−1}+i−1) = 1/(2t_{k−1}+2i−1).

A Tables and Figures
The probability that it ends up in one of the other positions consequently is 1 − 1/(2t_{k−1}+2i−1) = (2t_{k−1}+2i−2)/(2t_{k−1}+2i−1). If we know that b_{t_{k−1}+i} is inserted into one of those other positions, then it is inserted into exactly the same elements as b_{t_{k−1}+i−1}; thus we can write P(X_i = j | X_i ≠ 2t_{k−1}+i−1) = P(X_{i−1} = j). This leads to Eq. (2).
It remains to simplify Eq. (2). We begin with the first case:

Florian Stober and Armin Weiß
For the second case we have a similar simplification. Substituting (3) and (4) into (2) yields Theorem 1.
Recall the definitions of Y_i and Ỹ_{i,q} and their relation. To prove Theorem 2 we start with the following closed form for the probability P(Ỹ_{i,q} = j). From the definition of Ỹ_{i,q} we can see that 0 ≤ Ỹ_{i,q} ≤ q; thus P(Ỹ_{i,0} = 0) = 1. This also holds for Eq. (8).
Recall that for q > 0 there are two possibilities:
1. Ỹ_{i,q−1} = j − 1 and X_{i+q} < 2t_{k−1} + i. Informally speaking, that means that out of {b_{t_{k−1}+i+1}, . . . , b_{t_{k−1}+i+q−1}} there have been j − 1 elements inserted before a_{t_{k−1}+i}, and b_{t_{k−1}+i+q} is inserted before a_{t_{k−1}+i}.
2. Ỹ_{i,q−1} = j and X_{i+q} ≥ 2t_{k−1} + i. Informally speaking, that means that out of {b_{t_{k−1}+i+1}, . . . , b_{t_{k−1}+i+q−1}} there have been j elements inserted before a_{t_{k−1}+i}, and b_{t_{k−1}+i+q} is inserted after a_{t_{k−1}+i}.
Note that the first case requires j > 0 and the second case requires j < q, so we look at j = 0 and j = q separately. Using Bayes' theorem we obtain the following identities: The probability P(X_{i+q} < 2t_{k−1} + i | Ỹ_{i,q−1} = d) can be obtained by looking at Fig. 16 and counting elements. When b_{t_{k−1}+i+q} is inserted, the elements on the main chain which are smaller than a_{t_{k−1}+i} are x_1 to x_{2t_{k−1}}, a_{t_{k−1}+1} to a_{t_{k−1}+i−1}, and d elements out of {b_{t_{k−1}+i+1}, . . . , b_{t_{k−1}+i+q−1}}, which is a total of 2t_{k−1} + 2i − 1 + d elements. Combined with the fact that the main chain consists of 2t_{k−1} + 2i + 2q − 2 elements smaller than a_{t_{k−1}+i+q}, we obtain the following formula: From that we can calculate the complementary probability. Now we have all the necessary ingredients to prove Eq. (8) by induction.

B.3 Proof of Theorem 3
The exact probability that b_{t_{k−1}+i} is inserted into j elements is given by Theorem 2. We are especially interested in the case of b_{t_{k−1}+u} where u = ⌊(t_k − t_{k−1})/2⌋, because if we know P(Y_u < m), then we can use that for all q < u the probability of b_{t_{k−1}+q} being inserted into less than m elements is at least P(Y_u < m),
Fig. 17: Configuration where one batch is to be inserted.

i. e., P(Y_q < m) ≥ P(Y_u < m). This is because when b_{t_{k−1}+i} is inserted into m elements, then no matter which position it is inserted into, the next element, b_{t_{k−1}+i−1}, is inserted into at most m elements. However, Theorem 2 is hard to work with, so we approximate it with a binomial distribution. For a given k let d = t_k − t_{k−1} be the number of elements that are inserted as part of the batch. This configuration is illustrated in Fig. 17. To calculate into how many elements b_{t_{k−1}+u} = b_{t_{k−1}+⌊d/2⌋} is inserted, we ask how many elements out of b_{t_{k−1}+⌊3d/4⌋} to b_{t_k} (marked as section B in Fig. 17) are inserted between a_{t_{k−1}+⌊d/2⌋+1} and a_{t_{k−1}+⌊3d/4⌋−1} (marked as section A). The rationale is that for each element from section B that is inserted into section A, b_{t_{k−1}+u} is inserted into one element less. As a lower bound for the probability that an element from section B is inserted into one of the positions in section A, we use the probability that b_{t_k} is inserted between a_{t_k−1} and a_{t_k}, which is 1/(2t_k − 1). That is because if we assume that all b_i with i < t_k are inserted before b_{t_k}, then b_{t_k} is inserted into 2t_k − 2 elements, so the probability for each position is 1/(2t_k − 1). Since none of the b_i with i < t_k can be inserted between a_{t_k−1} and a_{t_k} (they are all smaller than a_{t_k−1}), the probability that b_{t_k} is inserted between a_{t_k−1} and a_{t_k} does not change when we insert it first, as the algorithm demands.
To calculate the probability that an element b_{t_k−q} with q > 0 is inserted into the rightmost position, we assume that all b_i with i < t_k − q are inserted before b_{t_k−q}. Then b_{t_k−q} is inserted into at most 2t_k − q − 2 elements. Hence the probability for each position is greater than 1/(2t_k − q − 1), which is greater than 1/(2t_k − 1). Since none of the b_i with i < t_k − q can be inserted to the right of a_{t_k−q−1}, the probability that b_{t_k−q} is inserted into any of the positions between a_{t_k−q−1} and a_{t_k−q} remains unchanged when inserting the elements in the correct order.
The probability that an element is inserted at a specific position is monotonically decreasing with the index. This is because if an element b_i is inserted to the left of an element a_{i−h}, then b_{i−h} is inserted into one more element than it would be if b_i had been inserted to the right of a_{i−h}. As a result any position further to the left is more likely than the rightmost position, so we can use the latter as a lower bound.
There are d/4 − 1 elements in section A, i. e., there are at least ⌊d/4⌋ positions where an element can be inserted. Hence the probability that an element from section B is inserted into section A is at least ⌊d/4⌋/(2t_k − 1), and consequently the probability that it is not inserted before b_{t_{k−1}+u} is at least ⌊d/4⌋/(2t_k − 1); that is because all positions that are part of section A come after a_{t_{k−1}+u}. Section B contains about d/4 elements. Using that and substituting u = ⌊d/2⌋, we obtain the binomial distribution with the parameters n_B = u/2 and p_B = ⌊d/4⌋/(2t_k − 1). As a result we obtain an approximation p(j) with q = 2^k − 1 − j that by construction fulfills the property given above. Fig. 18 compares our approximation p(j) with the real distribution P(Y_u = j). We observe that the maximum of our approximation is further to the right than that of the real distribution.
We split S(n) into S_α(n) + S_β(n). We can represent n as 2^{k−log 3+x_n} with x_n ∈ [0, 1). With y = 1 − x_n we obtain Theorem 3.

C Details on Computing the Exact Number of Comparisons
The code for calculating F(n) and G(n) is shown in Algorithm 2 and Algorithm 3, respectively. Cost(s, e) is the number of comparisons required for inserting the batch of elements that consists of b_{s+1} to b_e. Such a configuration can be seen in Fig. 19. Cost(s, e) is computed by calculating the external path length of the decision tree and dividing by the number of leaves. To improve performance we apply the following optimization: we collapse "identical" branches of the decision tree. For example, whether b_e is inserted between x_1 and x_2 or between x_2 and x_3 does not influence the number of comparisons required to insert the subsequent elements, so we can neglect that difference. However, if b_e is inserted between a_{e−1} and a_e, then the next element (and all thereafter) is inserted into one element less, so this is a difference we need to acknowledge. The same holds if an element is inserted between any a_i and a_{i+1}: by the time we insert b_i, the element inserted between a_i and a_{i+1} is known to be larger than b_i and thus is no longer part of the main chain, resulting in b_i being inserted into one element less. In conclusion, our algorithm needs to keep track of the elements inserted between any a_i and a_{i+1} as well as those inserted at any position before a_{s+1}, as two branches of the decision tree that differ in any of these cannot be collapsed. Algorithm 4 shows how this is implemented.
Algorithm 2 Computation of F(n)
1: procedure ComputeF(n)
2:   if n = 1 then
3:     return 0
4:   else
5:     return ⌊n/2⌋ + ComputeF(⌊n/2⌋) + ComputeG(⌈n/2⌉)
6:   end if
7: end procedure

1. The sorted elements need to be stored in a data structure that supports insertion at arbitrary positions. With a plain array, every insertion shifts all subsequent elements; as each element is inserted once, this results in a complexity of O(n^2). To avoid this we store the elements in a custom data structure inspired by the Rope data structure [1] used in text processing. Being based on a tree, it offers O(log n) performance for lookup, insertion and deletion operations, thus putting our algorithm in O(n log^2 n).
2. In the second step of the algorithm we need to rename the b_i after the recursive call. Our chosen solution is to store which a_i corresponds to which b_i in a hash map (line 11) before the recursive call and use that information to reorder the b_i afterwards (line 13). The disadvantage of this solution is that it requires each element to be unique and the hash map might introduce additional comparisons. An alternative would be to have the recursive call generate the permutation it applies to the larger elements and then apply that to the smaller ones. That is a cleaner solution, as it does not require the elements to be unique and it avoids potentially introducing additional comparisons. It is also potentially faster, though not by much. However, we stuck with using a hash map, as that solution is easier to implement.
3. In the insertion step we need to know into how many elements a specific b_i is inserted. For b_{t_k} this is 2^k − 1 elements. However, for other elements that number can be smaller, depending on where the previous elements have been inserted. To account for that we create the variable u in line 21. It holds the position of the a_i corresponding to the element b_i that is inserted next. Thus b_i is inserted into u − 1 elements (since b_i < a_i). After the insertion of b_i, we decrease u in line 25 until it matches the position of a_{i−1}, which is what we want, as b_{i−1} is the next element to be inserted.
This step also makes use of the requirement that each element be unique. At this point we have to be aware that testing whether the element at position u is a_{i−1} might introduce additional comparisons to the algorithm. This is acceptable because we do not count these comparisons; moreover, they are not necessary: we could keep track of the positions of the elements a_i, but we choose not to, in order to keep the implementation simple.
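The rope-like structure mentioned above can be sketched, for instance, as an implicit treap: expected O(log n) insertion at an arbitrary position and positional lookup (our illustration, not the actual implementation from [12]):

```python
import random

class Node:
    """Implicit treap node: a balanced sequence in the spirit of a Rope."""
    __slots__ = ("val", "prio", "size", "left", "right")
    def __init__(self, val):
        self.val, self.prio, self.size = val, random.random(), 1
        self.left = self.right = None

def size(n):
    return n.size if n else 0

def update(n):
    n.size = 1 + size(n.left) + size(n.right)
    return n

def split(n, k):
    """Split the sequence rooted at n into (first k elements, rest)."""
    if n is None:
        return None, None
    if size(n.left) >= k:
        left, n.left = split(n.left, k)
        return left, update(n)
    n.right, right = split(n.right, k - size(n.left) - 1)
    return update(n), right

def merge(a, b):
    """Concatenate two sequences, keeping the max-heap property on prio."""
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return update(a)
    b.left = merge(a, b.left)
    return update(b)

def insert_at(root, pos, val):
    """Insert val so that it becomes the element at index pos."""
    left, right = split(root, pos)
    return merge(merge(left, Node(val)), right)

def to_list(n):
    return to_list(n.left) + [n.val] + to_list(n.right) if n else []
```

Compared to a plain list, `insert_at` avoids the linear shifting cost, which is exactly what brings the simulation from O(n^2) down to O(n log^2 n).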