Fast and accurate computation of the Euclidean norm of a vector

The numerical computation of the Euclidean norm of a vector is perfectly well conditioned with favorable a priori error estimates. Recently there has been interest in computing a faithfully rounded approximation, which means that there is no other floating-point number between the computed and the true real result. Hence the result is either the rounded-to-nearest result or its neighbor. Previous publications guarantee a faithfully rounded result for large dimension, but not the rounded-to-nearest result. In this note we present several new and fast algorithms producing a faithfully rounded result, as well as the first algorithm to compute the rounded-to-nearest result. Executable MATLAB codes are included. As a by-product, a fast loop-free error-free vector transformation is given. It transforms a vector such that the sum remains unchanged, but the condition number of the sum is multiplied by the rounding error unit.


Notation and introduction
We assume a precision-p binary floating-point arithmetic with rounding to nearest according to the IEEE 754 standard [10] to be given and denote the set of floating-point numbers by 𝔽. Then 𝔽 is symmetric, i.e., 𝔽 = −𝔽, and there is a smallest and largest positive normalized floating-point number realmin and realmax, respectively. We define P := 𝔽 ∩ [realmin, realmax] and call N := −P ∪ {0} ∪ P the normalized range.
To be more precise, for "rounding to nearest" we assume RoundTiesToEven, which means that a real number being the midpoint of two adjacent floating-point numbers is rounded to the one with even mantissa. Calling that rounding function fl : ℝ → 𝔽, it follows that fl(a•b) is the floating-point result of a•b for a, b ∈ 𝔽 and • ∈ {+, −, ×, ∕}.
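Since Python's float is IEEE 754 binary64 (p = 53, u = 2⁻⁵³), the tie-breaking rule can be observed directly; a small illustrative sketch:

```python
# binary64: 1 + 2**-53 lies exactly halfway between the adjacent
# floating-point numbers 1 and 1 + 2**-52.
tie = 2.0 ** -53

# RoundTiesToEven: the exact midpoint rounds to the neighbor with even mantissa,
# while any value strictly above the midpoint rounds up.
assert 1.0 + tie == 1.0                              # tie rounds down to 1.0
assert 1.0 + (tie + 2.0 ** -60) == 1.0 + 2.0 ** -52  # just above the tie rounds up
```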
In the computation of the Euclidean norm of a vector, intermediate results may be outside N but the final result in N. That is taken care of by case distinctions and normalization, see [1,3,20]. Henceforth, we assume throughout this note without loss of generality that neither over- nor underflow occurs, i.e., all intermediate results are in N.
For u := 2^(−p) denoting the relative rounding error unit [7], the refined error estimate |fl(a•b) − a•b| ≤ (u∕(1+u))·|a•b| holds true [4,7,23,31], and the same constant u∕(1+u) bounds the relative error of every floating-point operation.
Many of our results are also true for a precision-p floating-point arithmetic with general base β and u = (1∕2)β^(1−p). Since we target MATLAB implementations, we restrict our attention to binary.
Throughout this note ‖·‖ denotes the Euclidean, i.e., 2-norm. The result of a floating-point evaluation of an expression is denoted by float(·), where parentheses are respected but otherwise any order of evaluation may be used. Hence float(a•b) = fl(a•b) is the floating-point result of a•b for a, b ∈ 𝔽 and • ∈ {+, −, ×, ∕}. For example, s := float(‖x‖) denotes a floating-point approximation of the Euclidean norm of x ∈ 𝔽^n using any order of summation. Standard error estimates [7] bound the relative error of s, provided that (n + 1)u < 1. In [11] we proved a refined estimate without restriction on n.
Lemma 1 Let x ∈ 𝔽^n and s := float(√(∑_{i=1}^n x_i²)) be computed in any order. Then the refined error estimate holds true without restriction on n ∈ ℕ.
The bound is basically sharp, but practical experience and probabilistic arguments [8,9,26] suggest that in practice the relative error for the Euclidean norm and for summation is hardly larger than √n·u‖x‖.

Recently [6,20] there has been interest in algorithms computing a faithful approximation of the Euclidean norm. That means that there is no other floating-point number between the computed and the true real result. Both are based on error-free transformations and some kind of double-double arithmetic [2], where the latter was already considered in [5]. The computed result is thus equal to the rounded-to-nearest result or to one of its neighbors. If the true result is a floating-point number, that will be the result of the algorithms.
Both approaches [6,20] are devoted to the computation of the Euclidean norm. In [17] we introduced a novel pair arithmetic cpair and proved sufficient conditions such that, for a general arithmetic expression comprised of {+, −, ×, ∕, √}, the result computed using cpair is faithfully rounded. As a by-product this includes the Euclidean norm.
One difference from the well-known double-double pair arithmetic [2], which is intrinsically used in [6,20], is that a final error-free transformation is omitted. That speeds up the algorithms in cpair significantly. While there is not much penalty in the accuracy of the computed result, it bears the advantage that, in contrast to [2], the higher order part is equal to the ordinary floating-point result. In that sense cpair is a floating-point arithmetic together with an error term.
In this note we present some new algorithms for the computation of a faithful rounding of the Euclidean norm as well as for the rounded-to-nearest result. All algorithms are given in executable MATLAB code [21]. We take particular care to design fast algorithms diminishing the interpretation overhead. In particular, we avoid loops as they may slow down the performance significantly.
This note is organized as follows. The next section recalls error-free transformations and some improvements, and mainly error estimates to ensure a faithfully rounded result. In Sect. 3 a vectorized error-free vector transformation is given, we recall recent sharp error estimates on summation, and we present the first two of our new algorithms to approximate ‖x‖. Those are based on relative splittings and adopt methods presented in [25]. In the next section another two new algorithms are presented using absolute splitting as in [28,29], again with sufficient conditions for a faithfully rounded result. In Sect. 5 an algorithm is presented computing the nearest approximation of ‖x‖ with proof of correctness. To our knowledge it is the first of its kind. The generation of ill-conditioned test examples, i.e., floating-point vectors x with ‖x‖ very close to a switching point, is addressed in Sect. 6. The note closes with computational results on the computing time and the accuracy of all algorithms and a conclusion.

Error-free transformations and previous algorithms
It has long been known [5,13,14,22] that the sum and product of two floating-point numbers can be expressed as the sum x + y of two floating-point numbers, and that x and y can be computed using few pure floating-point operations. This was used implicitly by Neumaier, who wrote the remarkable paper [24] when he was a bachelor student; otherwise it was basically known to experts. The methods received wide attention when I coined the term "error-free transformations" in [25], with numerous papers following thereafter.
For this note we need only the error-free transformations for sum and product; for details of other error-free transformations see, e.g., [23] (Fig. 1). We display all algorithms in executable MATLAB code; later some longer algorithms appear, so we decided to add line numbers. The following is true [18,22,23,25]. The assumptions for Algorithm FastTwoSum can be weakened [18,23], but we do not need this here. One might use Algorithm FastTwoSum together with an "if"-statement, thereby reducing the number of operations from 6 to 3; however, that is often slower [25] than applying Algorithm TwoSum.
The proof of correctness [23] relies on the fact that all operations from row 3 on are error-free, i.e., cannot cause a rounding error.
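The algorithms in this note are MATLAB codes; as a hedged illustration, the two transformations can be transliterated to Python (whose float is binary64, like MATLAB's double). This sketch mirrors the standard 6-operation TwoSum and the 3-operation FastTwoSum:

```python
def two_sum(a, b):
    """Error-free transformation: returns (x, y) with x = fl(a+b) and x + y = a + b."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)   # operations from here on are error-free
    return x, y

def fast_two_sum(a, b):
    """3-operation variant; requires |a| >= |b| (or a = 0)."""
    x = a + b
    y = b - (x - a)
    return x, y

# the rounding error of 1 + 2**-60 is recovered exactly:
x, y = two_sum(1.0, 2.0 ** -60)
assert x == 1.0 and y == 2.0 ** -60
```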
The key to the error-free transformation of multiplication is to split [5] both factors into a sum of two floating-point numbers such that the product of the addends does not cause a rounding error.The Algorithm Split for the binary64 format can be implemented as follows (Fig. 2).
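Fig. 2 gives the MATLAB code; as an illustrative Python sketch of the same Veltkamp-style splitting for binary64 (p = 53, factor 2²⁷ + 1):

```python
def split(a):
    """Split a binary64 number a = A + s, where A carries the leading ~26 bits,
    so that products of the parts cause no rounding error."""
    factor = 2.0 ** 27 + 1.0   # = 2**ceil(p/2) + 1 for p = 53
    c = factor * a
    A = c - (c - a)
    s = a - A
    return A, s

# the splitting itself is error-free:
A, s = split(0.1)
assert A + s == 0.1
```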
In precision-p the factor in line 2 is to be replaced [5] by 2^⌈p∕2⌉ + 1. For various splitting methods and many details see [12]. For the calculation of the Euclidean norm we need only squares, so we add a specialized method for that (Fig. 3).
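A Python sketch of TwoProduct and the specialized TwoSquare without FMA, following Dekker's correction formula (the function names mirror the MATLAB algorithms in Fig. 3; this is an illustration, not the paper's code):

```python
def split(a):
    c = (2.0 ** 27 + 1.0) * a     # factor 2**ceil(p/2) + 1 for binary64
    A = c - (c - a)
    return A, a - A

def two_product(a, b):
    """Error-free product: returns (x, y) with x = fl(a*b) and x + y = a*b."""
    x = a * b
    A, a2 = split(a)
    B, b2 = split(b)
    # all partial products of the split parts are exact
    y = ((A * B - x) + A * b2 + a2 * B) + a2 * b2
    return x, y

def two_square(a):
    """Specialization to squares: one split instead of two."""
    x = a * a
    A, a2 = split(a)
    y = ((A * A - x) + 2.0 * (A * a2)) + a2 * a2
    return x, y

# (1 + 2**-30)**2 = 1 + 2**-29 + 2**-60; the low bit is recovered exactly:
x, y = two_square(1.0 + 2.0 ** -30)
assert x == 1.0 + 2.0 ** -29 and y == 2.0 ** -60
```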
Since the input is split into two parts we use, for example, the notation Aa to indicate that A + a = Aa in line 3, and similarly for Bb in Algorithm TwoProduct. In particular, the pair (x, y) returned by Algorithm TwoSum satisfies x + y = a + b and fl(x + y) = x.

Proof The first result (3) is well known [14,23], where the proof relies on the fact that all operations in lines 3−5 do not cause a rounding error. That proves (4) as well because multiplication by 2 is error-free. The last estimate is a well-known property [23] of Algorithm TwoProduct. ◻

For given x ∈ 𝔽^n, previous approaches [6,20] borrow from the double-double pair arithmetic [2] to calculate a pair (T, t) such that T + t is an accurate approximation of the sum of squares ∑_{i=1}^n x_i². Another candidate for a pair arithmetic is the cpair arithmetic [17]. Both are implemented as toolboxes dd and cpair in INTLAB [27], the MATLAB toolbox for Reliable Computing.
In [6] TwoProduct is used to compute a pair approximation for x_i²; in [20] FMA instructions are used. While the FMA is part of the current floating-point standard [10] and implemented on many computers, it is not available in MATLAB. Therefore some of our algorithms in this note avoid it.
Given (T, t) it remains to compute a good floating-point approximation of √(T + t). In [6] just sqrt(T) is used, ignoring the lower order part t. In [20] the algorithm of our cpair arithmetic [17] is used, adapted to one output P + p rather than the pair (P, p) (Fig. 4). If (T, t) are such that the correction t is below the last bit of T, i.e., fl(T + t) = T, then the result of AccSqrt is almost always equal to √T, at most one bit apart. In [20, Theorem 3.6] the following error estimate is proved. The theorem estimates the error of P + p rather than that of fl(P + p); otherwise the additional rounding error u would spoil the result.

Now we can display the Algorithms normG from [6] and normL from [20]. Recall that the latter uses an FMA instruction to calculate P, p in line 2, that is, P(i) = fl(x(i)·x(i)) and p(i) = FMA(x(i), x(i), −P(i)) inside the loop. Since the FMA instruction is not available in MATLAB, we replaced the computation in the loop by [P,p] = TwoSquare(x), splitting the whole vector x without a loop. In that respect later time comparisons may be fairer (Fig. 5).
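The idea behind AccSqrt — take sqrt(T), square that approximation error-free, and apply one first-order correction including t — can be sketched in Python as follows. This is an approximation of the algorithm in Fig. 4, not the INTLAB code:

```python
import math

def split(a):
    c = (2.0 ** 27 + 1.0) * a
    A = c - (c - a)
    return A, a - A

def two_square(a):
    x = a * a
    A, a2 = split(a)
    return x, ((A * A - x) + 2.0 * (A * a2)) + a2 * a2

def acc_sqrt(T, t):
    """Sketch of an accurate sqrt(T + t): initial square root plus one
    Newton-like correction based on the error-free square of P."""
    P = math.sqrt(T)
    Q, q = two_square(P)        # P*P = Q + q exactly
    r = ((T - Q) - q) + t       # residual T + t - P**2
    return P + r / (2.0 * P)    # first-order correction

assert acc_sqrt(4.0, 0.0) == 2.0
```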
The summation scheme in Algorithms normG and normL is slightly different, but the main improvement is in the last line: Algorithm normG ignores the lower order part, whereas normL uses our Algorithm AccSqrt in Fig. 4 to compute the square root approximation of the pair S + s. As we will see in Sect. 7, that often produces a nearest rounding.
An alternative is to use the double-double and the cpair toolbox directly (Fig. 6). The goal is to guarantee a faithfully rounded approximation to ‖x‖ or even the rounded-to-nearest result. In [6] it is proved that, if computed in binary64, the result is a faithfully rounded approximation to ‖x‖ if n < (24u + u²)^(−1), corresponding to n ≲ 3.7·10^14. Our cpair arithmetic proves similar conditions for general arithmetic expressions. Applied to the Euclidean norm, the result is faithful for n ≤ (βu)^(−1∕2) when using base-β arithmetic, and that corresponds to n ≲ 8.3·10^7 in binary64. In [20] we did not find an explicit limit for n, but the error estimates suggest that it should be a little larger than that for Algorithm normG.
In order to prove a faithful rounding for our algorithms to be presented we use the following criterion [17, Lemma 5.3]. That is a specialized version; the original allows for a much more general computer arithmetic. In a typical application a pair (T, t) with r := T + t is an approximation to some real quantity q. If |r − q| < (u∕(2−u))|r|, then fl(r) is a faithful rounding of q. An application is the following criterion that fl(T + t) is a faithful rounding of q := √x.
Lemma 6 Let T, t ∈ 𝔽 with T + t > 0 be given, and let 0 ≤ q ∈ ℝ. Assume … for some ε ∈ ℝ with ε < 1. Let r ∈ ℝ be such that … for some δ < 1. Then … implies that fl(r) is a faithful rounding of q.
Proof Note that … for positive x, y ∈ ℝ. We show … and distinguish two cases. First, suppose T + t ≤ q². Then (5) gives … Second, suppose T + t > q². Then using again q² ≤ (T + t)∕(1 − ε) and (5) gives … and proves (8). Hence (6) yields … and r ≥ (1 − δ)√(T + t) together with Lemma 5 implies the result. ◻

In our applications ε is very small. A sufficient criterion that AccSqrt(T,t) is a faithful rounding of √x follows.

Faithful rounding of ‖x‖ based on relative splitting of x
The Algorithms TwoProduct and TwoSquare as in Fig. 3 apply to vector input as well.As a consequence we obtain the following lemma.
Lemma 7 For a, b ∈ 𝔽^n the output [P, p] of Algorithm TwoProduct as in Fig. 3 applied to a, b satisfies ∑_{i=1}^n (P_i + p_i) = ∑_{i=1}^n a_i b_i, and the output [P, p] of Algorithm TwoSquare applied to a satisfies ∑_{i=1}^n (P_i + p_i) = ∑_{i=1}^n a_i².

Thus one way to approximate ‖x‖ is to compute vectors P, p ∈ 𝔽^n with ∑ P_i + ∑ p_i = ‖x‖² and apply some accurate summation algorithm. Both [6] and [20] follow that scheme. Note that the vectors P, p are computed based on a relative splitting of x; later we will use an absolute splitting.
In [25] efficient summation algorithms are developed based on TwoSum. First, q = VecSum(p) transforms an input vector p into a vector q without changing its sum S, but with the property that q_{1…n−1} is small in absolute value and q_n = float(∑_{i=1}^n p_i). The error estimates in [25] imply that res = float(∑_{i=1}^n q_i) is a very good approximation of the true sum S.
Before continuing, we need to estimate the error of ordinary floating-point summation. To that end the traditional Wilkinson-type estimate with factor γ_{n−1} can be used. However, new and optimal bounds are available. The following sharp bound was shown in [16, Theorem 5].
Lemma 8 Let s := float(∑_{i=1}^n p_i), computed in any order, and denote by δ_i the errors in the n − 1 nodes of the evaluation tree. Then the sharp bound … holds, and it is attained for the input vector p = (1, u, …, u)^T.
Algorithm VecSum is realized by a loop in [25]. In MATLAB we face some interpretation overhead, so loops should be avoided where possible. That has been done in TwoSquare, and next we give a new, loop-free version of VecSum; see Fig. 7.
It is easily verified that Algorithms VecSum and FastVecSum produce identical results.The error analysis follows by Lemma 8.
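A NumPy transliteration of the loop-free idea (assuming NumPy is available): np.cumsum evaluates the prefix sums sequentially, so the per-step TwoSum errors can be recovered for all steps at once without a Python-level loop. A sketch; the MATLAB original in Fig. 7 is analogous:

```python
import numpy as np

def fast_vec_sum(p):
    """Loop-free error-free vector transformation: returns (S, q) with
    S = fl(sum(p)) and S + sum(q) equal to the exact sum of p."""
    p = np.asarray(p, dtype=np.float64)
    T = np.cumsum(p)                 # T[i] = fl(T[i-1] + p[i]), sequentially
    z = T[1:] - T[:-1]
    # per-step TwoSum errors, computed for all steps at once
    q = (T[:-1] - (T[1:] - z)) + (p[1:] - z)
    return T[-1], q

S, q = fast_vec_sum([1.0, 2.0 ** -60, -1.0])
assert S == 0.0 and q.sum() == 2.0 ** -60   # exact sum recovered
```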

Lemma 9
For given p ∈ 𝔽^n let [S, s] be the output of Algorithm FastVecSum (Fig. 7, an error-free vector transformation). Then S + ∑ s_i = ∑ p_i, and the bound (10) is sharp, as shown by the input vector p = (1, u, …, u)^T.

We mention that (10) is true [15, Theorem 2.1] without restriction on n when replacing γ_k by ku∕(1+u). Our first algorithm is based on Algorithm Sum2 in [25], which in turn is equivalent to Algorithm IV in [24] (Fig. 8).

Theorem 1 Let x ∈ 𝔽^n with (32∕31)n²u ≤ 1 be given, and let res be the result of Algorithm normSum2 applied to x. Then res is a faithful rounding of ‖x‖.

Proof We will prove the required bound for the scalars [T, t] computed in line 4 of Algorithm normSum2 in order to apply Corollary 1. We know ∑ P_i + ∑ p_i = ‖x‖² by Lemma 7, so that Lemma 9 applies. Denote the floating-point sum sum(s) by σ_s, and correspondingly the floating-point sum sum(p) by σ_p. Note that s ∈ 𝔽^{n−1} and p ∈ 𝔽^n. Then Lemma 8 gives the corresponding estimates. Furthermore, T + t = S + fl(σ_s + σ_p). Hence, using S + …, the claim follows. ◻

Algorithm VecSum is an error-free vector transformation, so as in [25] we may apply it a second time, thus further diminishing the condition number of the sum (Fig. 9).

Theorem 2 Let x ∈ 𝔽^n be given and apply Algorithm normSum3 to x. Suppose n ≤ ((17∕4)u²)^(−1∕3) and u ≤ 2^−8. Then res is a faithful rounding of ‖x‖.
Proof We proceed as in the proof of Theorem 1 and show that the scalars [T, t] in Algorithm normSum3 satisfy the required bound. The quantities in Algorithm normSum3 are scalars Q, S, T and t as well as vectors P, p ∈ 𝔽^n, q ∈ 𝔽^{n−1} and s ∈ 𝔽^{2n−2}. As before, denote the floating-point sum sum(s) by σ_s. Furthermore, T + t = Q + fl(S + σ_s). Hence, using (13), the desired bound follows. The factor is monotonically increasing in n; a direct computation for u ∈ {2^−e : 8 ≤ e ≤ 53} and the maximal value n := ⌊((17∕4)u²)^(−1∕3)⌋ verifies it. Hence (14) is true and Corollary 1 finishes the proof. ◻

The error bound for floating-point summation in Lemma 9 is sharp but, as mentioned after Lemma 1, highly overestimated in practice: we hardly find cases with relative error exceeding √n·u unless we look for them. In particular it seems unlikely that the worst-case bound (10) is attained for all summations in Algorithms normSum2 or normSum3.
Theorems 1 and 2 prove that Algorithms normSum2 and normSum3 compute a faithfully rounded result if the vector length n satisfies (32∕31)n²u ≤ 1 or 4n³u² ≤ 1, respectively. These are sufficient criteria, but in practice the results are faithful for much larger n.
A rough estimate of this limit under practical assumptions, i.e., when replacing γ_k in Lemma 8 by √k·u, suggests a faithfully rounded result for n ≲ u^(−1) for Algorithm normSum2 and n ≲ (1∕4)u^(−4∕3) for Algorithm normSum3. In other words, in practical applications it suffices to use Algorithm normSum2, and we can always expect a faithfully rounded result.
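Putting the pieces together, the normSum2 scheme can be sketched in Python (assuming NumPy): error-free squares of the entries, the error-free vector transformation on the high-order parts, and an accurate square root of the resulting pair. Names and structure only approximate the MATLAB original:

```python
import math
import numpy as np

def split(a):
    c = (2.0 ** 27 + 1.0) * a
    A = c - (c - a)
    return A, a - A

def two_square(a):
    x = a * a
    A, a2 = split(a)
    return x, ((A * A - x) + 2.0 * (A * a2)) + a2 * a2

def norm_sum2(x):
    """Sketch: ||x||^2 = sum(P) + sum(p) exactly; compensate the sum of P,
    collect low-order corrections, then take an accurate square root."""
    x = np.asarray(x, dtype=np.float64)
    P = np.empty_like(x); p = np.empty_like(x)
    for i, xi in enumerate(x):
        P[i], p[i] = two_square(xi)       # x_i**2 = P_i + p_i exactly
    # compensated summation of P with errors gathered in q (VecSum idea)
    T = np.cumsum(P)
    z = T[1:] - T[:-1]
    q = (T[:-1] - (T[1:] - z)) + (P[1:] - z)
    S = T[-1]
    t = q.sum() + p.sum()                 # low-order corrections
    # accurate sqrt of the pair (S, t), as in AccSqrt
    r = math.sqrt(S)
    Q, qq = two_square(r)
    return r + (((S - Q) - qq) + t) / (2.0 * r)

assert norm_sum2([3.0, 4.0]) == 5.0
```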

Faithful rounding of ‖x‖ based on absolute splitting of x
The error-free transformation of ‖x‖ 2 into (P, p) with P + p = ‖x‖ 2 , as used in [6] and [20] and our algorithms up to now, is based on a relative splitting of the x i , i.e., each x i is transformed into a sum of two floating-point numbers.
Once the vectors P, p are available, any good summation algorithm may be applied. An alternative to a relative splitting of the x_i was first proposed in [32]. A constant σ larger in absolute value than all summands is chosen. The split of the vector x with respect to σ into a pair of vectors (r, s) is such that x = r + s, all bits of r reside in the same range, and sum(r) is error-free. The same principle can be applied successively.
In [32] the splitting was performed using scaling and integer rounding, and no analysis was given.In [28] we pursued that principle in Algorithm AccSum with an efficient implementation and thorough error analysis.Based on that Algorithm AccDot is presented in [28] for the accurate computation of a dot product x T y .Basically, it first splits x T y = r + s as in TwoProduct and then applies AccSum.That algorithm can be used for ‖x‖ as well.
Following that principle, we split the input vector x directly into vectors q, b such that sum(q.*q) is error-free. That avoids the costly splitting ‖x‖² = P + p by Algorithm TwoSquare. Algorithm normExtract is presented in Fig. 10. Note that M, in contrast to Algorithm AccSum in [28], is not a power of 2.
The bound on the dimension n, as for Algorithm normSum2, to guarantee that the approximation res is a faithful rounding of ‖x‖ is very conservative. We present this algorithm because it is very fast and, as explained at the end of the previous section, we can expect a faithful result up to n ≲ 79 million. That may be sufficient in most practical applications.
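The extraction step underlying the absolute splitting can be illustrated in Python (assuming NumPy); here sigma plays the role of the constant M, and the concrete value below is chosen only for the demonstration:

```python
import numpy as np

def extract(x, sigma):
    """Absolute splitting: x = q + b exactly, where each q_i keeps only the
    bits of x_i above the grid determined by sigma (requires sigma >= |x_i|)."""
    x = np.asarray(x, dtype=np.float64)
    q = (sigma + x) - sigma    # high-order parts, all on the same bit grid
    b = x - q                  # remainders; error-free as in FastTwoSum
    return q, b

x = np.array([0.1, 1.0, 3.7])
q, b = extract(x, 2.0 ** 30)
assert np.all(q + b == x)      # the splitting is error-free
```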
To that end we need "ufp" as introduced in [28], the unit in the first place: for 0 ≠ x ∈ ℝ, ufp(x) is the largest power of 2 not larger than |x|, and ufp(0) := 0. Compared to the often used "ulp", the unit in the last place, it bears the advantage that it is independent of a floating-point format and applies to real numbers as well. The following properties are proved in [28]. For …, (20) holds as well.
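A Python sketch of ufp via frexp (for nonzero x, ufp(x) = 2^⌊log₂|x|⌋):

```python
import math

def ufp(x):
    """Unit in the first place: largest power of 2 not exceeding |x|; ufp(0) = 0."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(0.5, e)   # = 2**(e-1)

assert ufp(3.5) == 2.0
assert ufp(-0.75) == 0.5
assert ufp(0.0) == 0.0
```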
Theorem 3 Let x ∈ 𝔽^n be given and apply Algorithm normExtract to x. Suppose n ≤ (11∕59)u^(−1∕3) and u ≤ 2^−8. Then res is a faithful rounding of ‖x‖.
Proof As in the previous proofs we will show that the scalars [T, t] in Algorithm normExtract satisfy Henceforth we assume x i ≥ 0 as justified by line 2 of Algorithm normExtract.

Here the relevant factor is 4∕((2 − (n + 8)u)√u). Lines 5 and 6 of Algorithm normExtract are similar to Algorithm FastTwoSum in Fig. 1. More precisely, the code for FastTwoSum(M,x) is identical, where q in line 5 of Algorithm normExtract is equal to -qs, and b in line 6 is the same. By (24), Lemma 2 for Algorithm FastTwoSum is applicable, so that there is no rounding error when subtracting M in line 5, i.e., q_i = fl(M + x_i) − M. Using ufp(M + x_i) ≤ 2ufp(M) by (24), that implies q_i ≤ x_i + u·ufp(M + x_i) for all i ∈ {1, …, n}. We distinguish three cases to show (26). First, if 2u·ufp(M) ≤ x_i, then ufp(M + x_i) ≤ 2ufp(M) proves (26). Second, if u·ufp(M) ≤ x_i < 2u·ufp(M), then ufp(M + x_i) = ufp(M), and this proves (26) as well. Third and finally, if x_i < u·ufp(M), then fl(M + x_i) = M and q_i = 0. Thus (26), (24) and (15) yield the desired bound. Now (20) and (16)−(19) show that the floating-point sum of all q_i² is error-free, i.e., S = ∑_{i=1}^n q_i². For c_i := float((q_i + x_i)b_i) we see by (1) the corresponding estimate, and, if n ≥ 3, the refined one. Moreover, using (23), (26) and (25) yields the remaining bound. In order to show (22) we note that 6(n − 1)u < … ◻

The reason for the severe restriction of the vector length n to guarantee a faithfully rounded result is the estimate (27). As before, a rough estimate under the practical assumption γ_k ≲ √k·u in Lemma 8 suggests a faithfully rounded result for n ≲ (1∕12)u^(−1∕2) for Algorithm normExtract. The limit on the dimension for guaranteed faithful rounding is improved by the following Algorithm normExtract2, which introduces a second splitting (Fig. 11).
We show this algorithm as yet another example to compute the Euclidean norm faithfully; however, we refrain from giving a complete analysis. We just mention that the main errors occur in line 10, namely, in the summation of 2(q_i + r_i)c_i and c_i². The following sums of the r_i², the q_i r_i and q_i² are error-free.

Computation of the nearest rounding of ‖x‖
The algorithms in the previous section adapt Algorithm AccSum in [28] to the computation of the Euclidean norm of a vector. In [29] we explored that principle by designing Algorithm NearSum to compute the rounded-to-nearest value of the sum of floating-point numbers, and Algorithm AccSign to compute the sign of the sum. Several other algorithms are given there as well, such as storing the result in an unevaluated vector, computing the rounded-downward and rounded-upward result, the treatment of vectors of huge lengths, and more.
Next we derive Algorithm normNearest to compute the nearest value of the Euclidean norm of a vector.To that end we first present an adapted version of the Algorithm Transform derived in [28] (Fig. 12).
In our adaptation we rewrote the "repeat"- into a "while"-loop and omitted one output parameter. Then Lemma 4.3 in [28] shows the following.

Proof
The definition of M in line 2 implies 2^M ≥ k + 2 ≥ 2^(M−1), and therefore … Hence the assumptions of Lemma 4.3 in [28] are satisfied, and the assertions until (30) follow. The last statement is implied by Theorem 4.2 in [29]. ◻

The smaller the constant is, the fewer "while"-loop iterations are necessary in Algorithm Transform. As shown in [28] and [29], the chosen constants 2^(2M) for a faithful result and 2^M for the sign are optimal.
Our Algorithm NearSum needs the predecessor and successor of a floating-point number. The next Algorithm PredSucc combines the corresponding algorithms of [30] (Fig. 13).
In Theorem 2.2 in [30] it is shown that Algorithms Pred and Succ compute the predecessor and successor of a floating-point number c provided that u ≤ 1∕16, except for a tiny range near the smallest positive normalized floating-point number.
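A Python sketch of a pure floating-point predecessor/successor computation for binary64 inputs well inside the normalized range; the constant u(1 + 2u) follows [30], and near underflow the scaling mentioned in the text would be needed. We check the sketch against math.nextafter:

```python
import math

PHI = 2.0 ** -53 * (1.0 + 2.0 ** -52)   # u*(1 + 2u) for binary64

def succ(c):
    """Next float after c (valid away from underflow)."""
    return c + PHI * abs(c)

def pred(c):
    """Previous float before c (valid away from underflow)."""
    return c - PHI * abs(c)

assert succ(1.0) == math.nextafter(1.0, math.inf)
assert pred(1.0) == math.nextafter(1.0, -math.inf)
assert succ(3.0) == math.nextafter(3.0, math.inf)
```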
To avoid that, we scale the input in line 2 so that, provided no overflow occurs, Algorithm PredSucc computes the predecessor and successor of c. Of course, proper scaling avoids overflow. Now we can present our Algorithm normNearest in Fig. 14 to compute the nearest value of the Euclidean norm of a vector. It borrows from Algorithm NearSum in [29] and is adapted to our task.

Remark 1 There are obvious ways to improve Algorithm normNearest by utilizing information obtained in the first transformation in line 3 in the following transformations in lines 5 and possibly 19, or by integrating the call in line 5 into that of line 3. Moreover, the transformation in Algorithm 3.3 in [29] with an extra parameter σ, computing a faithful rounding of σ + ∑_{i=1}^n p_i, could be used. We refrain from doing that to keep the code simple.
Remark 2 Algorithm Transform in line 3 transforms the input vector [S; s] into p. The number of "while"-loop iterations is proportional to the condition number of the sum, i.e., to how close the true sum is to the midpoint of adjacent floating-point numbers.
Algorithm Transforms in lines 5 and possibly 19 is applied to the already transformed vector p, so that in all our examples we did not encounter more than 2 loops.
Theorem 4 Let x ∈ 𝔽^n be given and apply Algorithm normNearest to x, where Algorithm Transforms in lines 5 and 19 is identical to Transform in Fig. 12 with the constant in line 3 replaced by 2^M. Suppose n ≤ (1∕4)u^(−1∕2) − 4. Then the computed result res is equal to the Euclidean norm of x rounded to the nearest floating-point number, i.e., res = fl(‖x‖).
Proof The sum of the vector argument of Transform in line 3 equals ∑_{i=1}^n p_i, so that Lemma 10 shows that f computed in line 4 is a faithful rounding of ‖x‖². Thus pred(f) < ‖x‖² < succ(f). The vector argument of Transforms in line 5 is chosen such that Lemma 10 shows that the signs of Q and the computed quantity coincide. It follows that g₁ and g₂ are equal or adjacent floating-point numbers, and (32) yields that the nearest rounding of ‖x‖ is in {g₁, g₂}. Thus, if g₁ = g₂, the nearest rounding is equal to g₁ = g₂, which is handled in line 15. Otherwise, line 17 implies g₁² = R + r. Then d, which is a power of 2 because it is half the distance between g₁ and g₂, is computed in line 18 without rounding error. Thus the product 2g₁d is computed without error as well, and the sum of the vector argument of Transforms in line 19 equals the corresponding residual. Note that the length of the vector argument is 2n + 6, and the assumption on n verifies that Lemma 10 is applicable and implies that the computed sign equals sign(S). Now g₁ + d is the midpoint between the adjacent floating-point numbers g₁ and g₂, and the result follows by ‖x‖ ∈ {g₁, g₂}. ◻

We mention that the assumption n ≤ (1∕4)u^(−1∕2) − 4 can be lifted to n ≤ (1∕32)u^(−1) − 64 using the ideas in Algorithm AccSumHugeN in [29], but we refrain from exploring this.
We showed that the f computed in line 4 is a faithful rounding of ‖x‖². As has been noted in [6], that does not imply that fl(√f) is a faithful rounding of ‖x‖, but likely AccSqrt(f,delta) is.

Generation of ill-conditioned examples
A vector p is ill-conditioned with respect to the nearest rounding of ‖p‖ if a very small change of the input data changes the result. The closer ‖p‖ is to a switching point, i.e., to the midpoint of two adjacent floating-point numbers, the more difficult and ill-conditioned is the computation of the nearest rounding. For given e it is, in principle, not too difficult to generate a vector p with ‖p‖ having a relative distance e to a switching point. To that end a multiple precision package may be helpful. However, when doing this we observed a severe influence on the timing: the mere presence of a call to the multiple precision package, of course outside the loop to be measured, changed the measured computing time by a factor of 2 and more. Therefore, we wrote Algorithm GenVec, see Fig. 15 (a vector p ∈ 𝔽^n with relative distance e of ‖p‖ to a switching point). Using it ensured reliable computing times.

The challenge is to approximate the anticipated final result ‖p‖ near a switching point s "from below": during the loop the vector norm must always stay below s. That is the principle of Algorithm GenVec, a nice example of our algorithms with absolute splitting used for faithful and nearest rounding.
The rationale is as follows. The output vector p is computed in K segments, each of length m. The initialization in line 4 ensures that the final vector length is n. The floating-point number f in line 6 or its successor f + 1 is the anticipated result of the nearest rounding of the final vector p to be generated, with relative distance e to the switching point f + 0.5. The initial vector p as in line 4 satisfies ‖p‖² = p₁ + p₂ and f − ‖p‖ > 0. Lines 7−8 yield f² = F₁ + F₂ and e·f = ef₁ + ef₂, giving the S in line 9. Here ∑ S_i denotes the mathematical sum of all elements of S. Furthermore, lines 10−11 and Lemma 10 imply that sumS is a faithful rounding of ∑ S_i. The factor in line 15 is bounded by 1 − 4u for reasonable values of n and e, so that ps in line 14 satisfies (33). In the for-loop the element ps is appended to the vector p and −ps² = −ps₁ − ps₂ to the vector S, so that the sum T = ∑ S_i changes into T − ps². Since sumS is a faithful rounding of ∑ S_i, (33) implies that T − ps² > 0. At the end of every loop iteration, sumS is again a faithful rounding of the sum ∑ S_i by lines 19−20, and the construction implies that sumS decreases by a fixed factor in each step. The starting value of sumS is about f², and the factor and K are chosen such that the remainder is at most |e| after finishing the loop. After finishing the for-loop, sumS is a faithful rounding of the remaining defect, so that the relative distance of ‖p‖ to the switching point is e. The "approximations" in (34) are very accurate.

Finally, if e < 0, then ‖p‖ is left of the switching point f + 1∕2 and f is the nearest rounding; otherwise, as computed in line 24, the nearest rounding is f + 1. The random perturbation in line 22 may be useful for testing the generality of algorithms.
It is clear from the code that the elements of one segment are close together, and the segments decay by a constant factor. If the number of segments K is increased, a better distribution of the vector elements of p is obtained, however, at the cost of increased computing time.

Computational results
The following computational results are all performed using MATLAB Version 2020b on a Core i7 laptop. In all of the following examples the number of test cases is generally 1 million, but for large dimensions it is chosen such that the computing time stays below 1 hour.
We start with some timing comparisons of variants of MATLAB implementations.For example, an alternative to Algorithm Split in Fig. 2 is the following (Fig. 16).
Table 1 shows the computing time of Algorithm Split1 divided by that of Algorithm Split. We also compare sqr(a) vs. a.*a, TwoProduct vs. TwoSquare as in Fig. 3, and VecSum vs. FastVecSum as in Fig. 7.
The original Algorithm Split is significantly faster than the simulation by log2 and round, so we use Algorithm Split. Similarly, Algorithm TwoSquare in Fig. 3 is some 50% faster than Algorithm TwoProduct, and the loop-free variant Algorithm FastVecSum in Fig. 7 is much faster than Algorithm VecSum in [25]. We use a.*a because its time seems the same as for sqr(a).
Before we come to timing comparisons, we give information about the accuracy of our algorithms and competitors.We start with possibilities to approximate ‖x‖ by the built-in MATLAB routines, where the obvious candidate is norm(x).We generate random testcases and display triples of numbers: The first and second is the percentage of nearest and faithful roundings, respectively, and the third the percentage where the result is not faithful.
As can be seen in the third row of Table 2, the built-in function norm(x) is surprisingly accurate, more accurate than theory predicts [7-9].
We therefore perform tests on the same data using sqrt(sum(x.*x)) and an ordinary for-loop. Still sqrt(sum(x.*x)) is more accurate than expected; only the for-loop shows the anticipated behavior.
Details on the actual implementation of norm and sum are confidential, but the data in Table 2 suggests that some higher precision or compensating algorithms are used.The essential difference between Algorithms normG [6] and normL [20] is the use of AccSqrt of [17] for the square root approximation in the last line of normL.
To see the advantage, we use Algorithm normGacc, which is identical to Algorithm normG except that the final square root approximation is computed by AccSqrt. The following Table 3 shows the percentage of nearest roundings for random test cases with different dimensions.
There was no case with not faithful rounding, as proved in [6]. For the improved Algorithm normGacc we found cases where the rounding was not to nearest only for n = 10^7, in fact, some 17%.
Up to now we used random vectors produced by randn(n,1), for which it is not too difficult to calculate a nearest rounding of ‖x‖. That changes when the true result is close to the midpoint between two adjacent floating-point numbers, i.e., close to a "switching point". To that end we use Algorithm GenVec to generate vectors x of different dimensions with prescribed relative distance of ‖x‖ to a switching point. For each pair of dimension n and relative distance, we display the percentage of nearest roundings in Table 4.
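The notion of a switching point can be made concrete: it is the exact midpoint between two adjacent floating-point numbers, which is itself not representable in binary64, and RoundTiesToEven sends an exact midpoint to the neighbor with even mantissa. A small Python illustration (the function name is ours; exact arithmetic via fractions):

```python
import math
from fractions import Fraction

def switching_point(f):
    # exact midpoint between f > 0 and its upward floating-point neighbor,
    # returned as a rational number since it is not representable in binary64
    succ = math.nextafter(f, math.inf)
    return (Fraction(f) + Fraction(succ)) / 2

# converting the exact midpoint back to float applies RoundTiesToEven:
# 1.0 has an even (all-zero) mantissa, so the tie resolves down to 1.0
assert float(switching_point(1.0)) == 1.0
```

Test vectors whose norm lies extremely close to such a midpoint are the hard cases for nearest rounding, which is what Algorithm GenVec produces.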
In all test cases and for all algorithms we did not encounter an example with not faithful rounding. As already seen in Table 3, Algorithm normGacc generally outperforms the original normG in terms of accuracy. Algorithm normG is targeted at a faithfully rounded result S, not at minimizing the error of S + s versus ‖x‖. Thus about half the results of Algorithm normG are nearest, the other half faithful but not nearest.
Algorithm normDD uses a general purpose double-double arithmetic, and Algorithm normCpair our pair arithmetic with computable error bounds. As Algorithms normGacc and normL are tailored methods but use the same principle, we expect similarly accurate results. Indeed, that can be seen in Table 4. As Algorithms normSum2 and normSum3 are based on similar principles, they show the same accuracy, with normSum3 being a little better. Algorithms normExtract and normExtract2 are based on a different principle, namely on an absolute splitting. As mentioned, this avoids the costly application of Algorithm AccSquare. For moderate distance the rounding is nearest, including large vector lengths; for distance 10^-10 and below the accuracy is similar to normG, with roughly a 50-50 chance of a nearest result. As we will see next, the slightly smaller number of nearest roundings is compensated by a much better performance.
The number of nearest cases improves a little bit with normExtract2, and the result of Algorithm normNearest is, of course, always rounded to nearest.
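The absolute splitting underlying normExtract can be sketched by its scalar kernel, the extraction step of AccSum-type algorithms [28]. The Python below is a hedged reconstruction (the name extract_scalar is ours): adding and subtracting a power of two sigma splits p error-free into a high part on the grid of multiples of ulp(sigma) and an exact remainder.

```python
import math

def extract_scalar(sigma, p):
    # absolute splitting w.r.t. a power of two sigma: q is p rounded to the
    # grid of multiples of ulp(sigma); the remainder p - q is computed exactly,
    # so q + (p - q) == p (requires |p| sufficiently small relative to sigma)
    q = (sigma + p) - sigma
    return q, p - q

sigma = 2.0 ** 30               # ulp(sigma) = 2**30 * 2**-52 = 2**-22
p = math.pi * 2.0 ** 10
q, r = extract_scalar(sigma, p)
assert q + r == p               # the splitting is error-free
```

Unlike the relative splitting of TwoSquare, only one addition and two subtractions per element are needed, which is one reason for the speed of normExtract.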
Next we present timing results for our algorithms and competitors for random vectors of dimension up to n = 10^7. It is appropriate to use random vectors because the computing time of all algorithms except normNearest does not depend on the difficulty of the problem, only on the length of the input vector; times for normNearest for different relative distances are displayed separately.
It turns out that our new Algorithm normExtract is always the fastest. Therefore the following Table 5 shows the time ratio against normExtract. The timing for Algorithms normDD and normCpair is dominated by MATLAB's interpretation overhead and in particular by the use of operator overloading. Therefore comparing the computing times hardly gives information on the performance of the algorithms and is omitted.
From the operation count it may be surprising that normExtract is so much faster than normG and normL. However, normExtract is based on AccSum in [28], for which Langlois [19] showed a better instruction-level parallelism than for other algorithms. The same applies to normSum2 and may explain its relatively good performance, and also to normSum3, where we see twice the computing time of normSum2, as expected. That is still faster than normG and normL. There is an exception for all algorithms, namely n = 10^5. We think this is due to unfortunate cache management, similarly for normNearest.
Algorithm normNearest is about as fast as normL, and for medium size dimensions much faster, although it guarantees a nearest rounding of ‖x‖.
The time in seconds for 5000 calls in dimensions up to 1 million for all algorithms is shown in Fig. 17. The legend on the left is ordered by performance, from the slowest normG down to the fastest normExtract. All algorithms except Algorithm normNearest execute the same code independent of the difficulty of the problem, hence the computing time depends almost linearly on the dimension. For normNearest we see small zig-zags depending on the number of transformations.
Finally we investigate whether the guarantee of nearest rounding causes a time penalty for Algorithm normNearest if ‖x‖ is very close to a switching point. As before we generate examples with prescribed relative distance to a switching point. The ratio of the computing time of Algorithm normNearest to that of normExtract is displayed in Table 6; the time for the other algorithms does not change because it is independent of the condition of the problem.
There is not much impact on the computing time of Algorithm normNearest in our examples despite the guarantee of nearest rounding of ‖x‖, even for a tiny relative distance of 10^-100 to a switching point.

Summary
We may use a general purpose pair arithmetic such as double-double [2] or [17] to calculate an accurate approximation of the Euclidean norm ‖x‖ of a vector. To that end we presented Algorithms normDD and normCpair in Fig. 6. Specialized algorithms based on a pair arithmetic have been presented in [6,20] and are displayed as Algorithms normG and normL in Fig. 5.
In this note we developed Algorithms Sum2 and Sum3 in Sect. 3, based on a relative splitting as in the algorithm in [25]. The performance is significantly improved by a vectorized version FastVecSum of VecSum in [25]. In addition, Algorithms Extract and Extract2, based on absolute splittings as in [28,29], are presented in Sect. 4.
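The error-free vector transformation behind these summation algorithms can be sketched with the classical VecSum loop of [25] (the paper's FastVecSum is a loop-free, vectorized equivalent). In the Python reconstruction below the output vector has exactly the same sum as the input, its last entry is the floating-point recursive sum, and the remaining entries carry the rounding errors; this is the transformation that leaves the sum unchanged while multiplying the condition number by the rounding error unit.

```python
from fractions import Fraction

def two_sum(a, b):
    # Knuth's TwoSum: x = fl(a + b) and x + y == a + b exactly
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def vec_sum(p):
    # error-free vector transformation: sum(output) == sum(p) exactly;
    # the last entry is the recursively computed floating-point sum,
    # the other entries are the accumulated rounding errors
    q = list(p)
    for i in range(1, len(q)):
        q[i], q[i - 1] = two_sum(q[i], q[i - 1])
    return q

x = [1.0, 1e16, -1e16, 1e-8]
t = vec_sum(x)
assert sum(Fraction(v) for v in t) == sum(Fraction(v) for v in x)
```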

Lemma 2
Let a, b be floating-point numbers and x, y be the result of Algorithm TwoSum applied to a, b. Then

x = fl(a + b) and x + y = a + b. (2)

If |a| ≥ |b|, then (2) is also true for the result of Algorithm FastTwoSum.
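A numerical illustration of Lemma 2 (Python sketch; exactness of (2) is verified with rational arithmetic, and the FastTwoSum call respects the precondition |a| ≥ |b|):

```python
from fractions import Fraction

def two_sum(a, b):
    # Knuth's TwoSum: satisfies (2) for any ordering of |a|, |b|
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def fast_two_sum(a, b):
    # Dekker's FastTwoSum: satisfies (2) provided |a| >= |b|
    x = a + b
    y = b - (x - a)
    return x, y

a, b = 1e16, 1.2345          # |a| >= |b|, so both algorithms apply
x1, y1 = two_sum(a, b)
x2, y2 = fast_two_sum(a, b)
assert (x1, y1) == (x2, y2)
assert Fraction(x1) + Fraction(y1) == Fraction(a) + Fraction(b)
```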

Lemma 4
Let T, t be floating-point numbers such that fl(T + t) = T, and assume u ≤ 2^-5. Let P, p be the final values in Algorithm AccSqrt when applied to the pair [T, t]. Then

Fig. 17
Timing for 5000 calls for random vectors of dimension up to 1 million

Table 1
Time comparisons for different vector lengths

Table 2
Percentage of rounding is nearest/faithful/none

Table 3
Percentage of nearest rounding of normG vs. normGacc

Table 4
Percentage of nearest rounding for relative distance of ‖x‖ to switching point

Table 5
Timing relative to normExtract for random vectors of length n

Table 6
Timing of normNearest/normExtract, relative distance of ‖x‖ to switching point