Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems

Digital mathematical libraries assemble the knowledge of years of mathematical research. Numerous disciplines (e.g., physics, engineering, pure and applied mathematics) rely heavily on compendia of gathered findings. Likewise, modern research applications rely more and more on computational solutions, which are often calculated and verified by computer algebra systems. Hence, the correctness, accuracy, and reliability of both digital mathematical libraries and computer algebra systems are crucial attributes for modern research. In this paper, we present a novel approach to verify a digital mathematical library and two computer algebra systems with one another by converting mathematical expressions from one system to the other. We use our previously developed conversion tool (referred to as LaCASt) to translate formulae from the NIST Digital Library of Mathematical Functions to the computer algebra systems Maple and Mathematica. The contributions of our presented work are as follows: (1) we present the most comprehensive verification of computer algebra systems and digital mathematical libraries with one another; (2) we significantly enhance the performance of the underlying translator in terms of coverage and accuracy; and (3) we provide open access to translations for Maple and Mathematica of the formulae in the NIST Digital Library of Mathematical Functions.


Introduction
Digital Mathematical Libraries (DML) gather the knowledge and results from thousands of years of mathematical research. Even though pure and applied mathematics are precise disciplines, gathering their knowledge bases over many years results in issues which every digital library shares: consistency, completeness, and accuracy. Likewise, Computer Algebra Systems (CAS) play a crucial role in the modern era for pure and applied mathematics, and those fields which rely on them. CAS can be used to simplify, manipulate, compute, and visualize mathematical expressions. Accordingly, modern research regularly uses DML and CAS together. Nonetheless, DML [8,19] and CAS [1,12,25] are not exempt from having bugs or errors. Durán et al. [12] even raised the rather dramatic question: "can we trust in [CAS]?" Existing comprehensive DML, such as the Digital Library of Mathematical Functions (DLMF) [11], are consistently updated and frequently corrected with errata. Although each chapter of the DLMF has been carefully written, edited, validated, and proofread over many years, errors still remain. Maintaining a DML, such as the DLMF, is a laborious process. Likewise, CAS are eminently complex systems, and in the case of commercial products, often similar to black boxes in which the magic (i.e., the computations) happens in opaque private code [12]. CAS, especially commercial products, are often exclusively tested internally during development.
An independent examination process can improve testing and increase trust in the systems and libraries. Hence, we want to elaborate on the following research question.
How can digital mathematical libraries and computer algebra systems be utilized to improve and verify one another?
Our initial approach for answering this question is inspired by our previous studies on translating DLMF equations to CAS [8]. In order to verify a translation tool from a specific LaTeX dialect to Maple, we performed symbolic and numeric evaluations on equations from the DLMF. Our approach presumes that a proven equation in a DML must also be valid in a CAS. In turn, a disparity between the DML and CAS would point to an issue in the translation process. However, assuming a correct translation, a disparity would also indicate an issue either in the DML source or the CAS implementation. In turn, we can take advantage of the same approach to improve and even verify DML with CAS and vice versa. Unfortunately, previous efforts to translate mathematical expressions from various formats, such as LaTeX [9,19,34], MathML [36], or OpenMath [22,35], to CAS syntax have shown that the translation will be the most critical part of this verification approach.
In this paper, we elaborate on the feasibility and limitations of the translation approach from DML to CAS as a possible answer to our research question. We further focus on the DLMF as our DML and the two general-purpose CAS Maple and Mathematica for this first study. This relatively sharp limitation is necessary in order to analyze the capabilities of the underlying approach to verify commercial CAS and large DML. The DLMF uses semantic macros internally in order to disambiguate mathematical expressions [32,40]. These macros help to mitigate the open issue of retrieving sufficient semantic information from a context to perform translations to formal languages [19,36]. Further, the DLMF and general-purpose CAS have a relatively large overlap in coverage of special functions and orthogonal polynomials. Since many of those functions play a crucial role in a large variety of different research fields, we focus in this study mainly on these functions. Lastly, we will take our previously developed translation tool LaCASt [9,19] as the baseline for translations from the DLMF to Maple. In this successor project, we focus on improving LaCASt to minimize the negative effect of wrong translations as much as possible for our study. In the future, other DML and CAS can be improved and verified following the same approach by using a different translation approach depending on the data of the DML, e.g., MathML [36] or OpenMath [22].
In particular, in this paper, we fix the majority of the remaining issues of LaCASt [8], which allows our tool to translate twice as many expressions from the DLMF to the CAS as before. Current extensions include support for the mathematical operators sum, product, limit, and integral, as well as overcoming semantic hurdles associated with Lagrange (prime) notations commonly used for differentiation. Further, we extend its support to include Mathematica using the freely available Wolfram Engine for Developers (WED) (hereafter, with Mathematica, we refer to the WED). These improvements allow us to cover a larger portion of the DLMF, increase the reliability of the translations via LaCASt, and allow for comparisons between two major general-purpose CAS for the first time, namely Maple and Mathematica. Finally, we provide open access to all the results contained within this paper, including all translations of DLMF formulae, an endpoint to LaCASt, and the full source code of LaCASt.
The paper is structured as follows. Section 2 explains the data in the DLMF. Section 3 focuses on the improvements made to LaCASt to make the translation as comprehensive and reliable as possible for the upcoming evaluation. Section 4 explains the symbolic and numeric evaluation pipeline. Since Cohl et al. [8] only briefly sketched the approach of a numeric evaluation, we will provide an in-depth discussion of that process in Section 4. Subsequently, we analyze the results in Section 5. Finally, we summarize the findings and provide an outlook for upcoming projects in Section 6.

Related Work
Existing verification techniques for CAS often focus on specific subroutines or functions [6,7,13,21,25,26,30,31], such as specific theorems [28], differential equations [23], or the implementation of the math.h library [29]. Most common are verification approaches that rely on intermediate verification languages [6,21,23,25,26], such as Boogie [2,30] or Why3 [5,26], which, in turn, rely on proof assistants and theorem provers, such as Coq [4,6], Isabelle [23,33], or HOL Light [20,21,25]. Kaliszyk and Wiedijk [25] proposed an entirely new CAS which is built on top of the proof assistant HOL Light so that each simplification step can be proven by the underlying architecture. Lewis and Wester [31] manually compared the symbolic computations on polynomials and matrices with seven CAS. Aguirregabiria et al. [1] suggested teaching students the known traps and difficulties with evaluations in CAS instead, in order to reduce the overreliance on computational solutions.
Cohl et al. [8] developed the aforementioned translation tool LaCASt, which translates expressions from a semantically enhanced LaTeX dialect to Maple. By evaluating the performance and accuracy of the translations, we were able to discover a sign error in one of the DLMF's equations [8]. While the evaluation was not intended to verify the DLMF, the translations by the rule-based translator LaCASt provided sufficient robustness to identify issues in the underlying library. To the best of our knowledge, besides this related evaluation via LaCASt, there are no existing libraries or tools that allow for automatic verification of DML.

The DLMF dataset
In the modern era, most mathematical texts (handbooks, journal publications, magazines, monographs, treatises, proceedings, etc.) are written using the document preparation system LaTeX. However, the focus of LaTeX is on precise control of the rendering mechanics rather than on a semantic description of its content. In contrast, CAS syntax is necessarily unambiguous in order to interpret the input correctly. Hence, a transformation tool from DML to CAS must disambiguate mathematical expressions. While there is an ongoing effort towards such a process [18,27,37,38,39,41], there is no reliable tool available to disambiguate mathematics sufficiently to date.
The DLMF contains numerous relations between functions and many other properties. It is written in LaTeX but uses specific semantic macros when applicable [40]. These semantic macros represent a unique function or polynomial defined in the DLMF. Hence, the semantic LaTeX used in the DLMF is often unambiguous. For a successful evaluation via CAS, we also need to utilize all requirements of an equation, such as constraints, domains, or substitutions. The DLMF provides this additional data too, generally in a machine-readable form [40]. This data is accessible via the i-boxes (information boxes next to an equation, marked with an icon). If the information is not given in the attached i-box or the information is incorrect, the translation via LaCASt would fail. The i-boxes, however, do not contain information about branch cuts (see Section B) or constraints. Constraints are accessible if they are directly attached to an equation. If they appear in the text (or even a title), LaCASt cannot utilize them. The test dataset we use was generated from DLMF Version 1.1.3 (2021-09-15) and contains 9,977 formulae with 1,505 defined symbols, 50,590 used symbols, 2,691 constraints, and 2,443 warnings for non-semantic expressions, i.e., expressions without semantic macros [40]. Note that the DLMF does not provide access to the underlying LaTeX source. Therefore, we added the source of every equation to our result dataset.

Semantic LaTeX to CAS translation
The aforementioned translator LaCASt was developed by Cohl and Greiner-Petter et al. [8,9,19]. They reported a coverage of 58.8% translations for a manually selected part of the DLMF to the CAS Maple. This version of LaCASt serves as a baseline for our improvements. In order to verify their translations, they used symbolic and numeric evaluations and reported a success rate of ∼16% for symbolic and ∼12% for numeric verifications.
Evaluating the baseline on the entire DLMF results in a coverage of only 31.6%. Hence, we first want to increase the coverage of LaCASt on the DLMF. To achieve this goal, we first increase the number of translatable semantic macros by manually defining more translation patterns for special functions and orthogonal polynomials. For Maple, we increased the number from 201 to 261. For Mathematica, we define 279 new translation patterns, which enable LaCASt to perform translations to Mathematica. Even though the DLMF uses 675 distinct semantic macros, we cover ∼70% of all DLMF equations with our extended list of translation patterns (see Zipf's law for mathematical notations [17]). In addition, we implemented rules for translations that are applicable in the context of the DLMF, e.g., ignore ellipsis following floating-point values, or \choose always refers to a binomial expression. Finally, we tackle the remaining issues outlined by Cohl et al. [8], which can be categorized into three groups: (i) expressions in which the arguments of operators are not clear, namely sums, products, integrals, and limits; (ii) expressions with prime symbols indicating differentiation; and (iii) expressions that contain ellipsis. While we solve some of the cases in Group (iii) by ignoring ellipsis following floating-point values, most of these cases remain unresolved. In the following, we elaborate on our solutions for (i) in Section 3.1 and (ii) in Section 3.2.

Parse sums, products, integrals, and limits
Here we consider common notations for the sum, product, integral, and limit operators. For these operators, one may consider mathematically essential operator metadata (MEOM). For all these operators, the MEOM includes argument(s) and bound variable(s). The operators act on the arguments, which are themselves functions of the bound variable(s). For sums and products, the bound variables are referred to as indices. The bound variables for integrals are called integration variables. For limits, the bound variables are continuous variables (for limits of continuous functions) and indices (for limits of sequences). For integrals, MEOM include precise descriptions of regions of integration (e.g., piecewise continuous paths/intervals/regions). For limits, MEOM include limit points (e.g., points in R^n or C^n for n ∈ N), as well as information on whether the limit is independent of or dependent on the direction in which the limit point is approached (e.g., one-sided limits).
For a translation of mathematical expressions involving the LaTeX commands \sum, \int, \prod, and \lim, we must extract the MEOM. This is achieved by (a) determining the argument of the operator and (b) parsing corresponding subscripts, superscripts, and arguments. For integrals, the MEOM may be complicated, but certainly contains the argument (the function which will be integrated), the bound (integration) variable(s), and details related to the region of integration. Bound variable extraction is usually straightforward since the variable is usually contained within a differential expression (infinitesimal, pushforward, differential 1-form, exterior derivative, measure, etc.), e.g., dx. Argument extraction is less straightforward: even though differential expressions are often given at the end of the argument, sometimes the differential expression appears in the numerator of a fraction (e.g., f(x) dx / g(x)). In that case, the argument is everything to the right of the \int (neglecting its subscripts and superscripts) up to and including the fraction involving the differential expression (which may be replaced with 1). In cases where the differential expression is fully to the right of the argument, it is a termination symbol. Note that some scientists use an alternate notation for integrals where the differential expression appears immediately to the right of the integral sign, e.g., ∫ dx f(x). However, this notation does not appear in the DLMF. If such notations are encountered, we follow the same approach that we used for sums, products, and limits (see Section 3.1).

Extraction of variables and corresponding MEOM
The subscripts and superscripts of sums, products, limits, and integrals may be different for different notations and are therefore challenging to parse. For integrals, we extract the bound (integration) variable from the differential expression. For sums and products, the upper and lower bounds may appear in the subscript or superscript. Parsing subscripts is comparable to the problem of parsing constraints [8] (which are often not consistently formulated). We overcame this complexity by manually defining patterns of common constraints, which we refer to as blueprints. This blueprint pattern approach allows LaCASt to identify the MEOM in the sub- and superscripts. More detailed explanations of the blueprints, with examples, are available in Appendix A.

Identification of operator arguments
Once we have extracted the bound variable for sums, products, and limits, we need to determine the end of the argument. We analyzed all sums in the DLMF and developed a heuristic that covers all the formulae in the DLMF and potentially a large portion of general mathematics. Let x be the extracted bound variable. For sums, we consider a summand as a part of the argument if (I) it is the very first summand after the operation; or (II) x is an element of the current summand; or (III) x is an element of a following summand (subsequent to the current summand) and there is no termination symbol between the current summand and the summand which contains x with an equal or lower depth according to the parse tree (i.e., closer to the root). We consider a summand as a single logical construct since addition and subtraction are granted a lower operator precedence than multiplication in mathematical expressions. Similarly, parentheses are granted higher precedence and, thus, a sequence wrapped in parentheses is part of the argument if it obeys the rules (I-III). Summands, and such sequences, are always either entirely part of sums, products, and limits or entirely not.
A termination symbol always marks the end of the argument list. Termination symbols are relation symbols, e.g., =, ≠, ≤, closing parentheses or brackets, e.g., ), ], or >, and other operators with MEOMs if and only if they define the same bound variable. If x is part of a subsequent operation, then the following operator is considered as part of the argument (as in (II)). However, a special condition for a termination symbol is that it only terminates the current chain of arguments. Consider a sum over a fraction of sums. In that case, we may reach a termination symbol within the fraction. However, the termination symbol would be deeper inside the parse tree as compared to the current list of arguments. Hence, we use the depth to determine whether a termination symbol should be recognized or not. Consider an unusual notation with the binomial coefficient as an example. This equation contains two termination symbols, marked red and green. The red termination symbol = is obviously for the first sum on the left-hand side of the equation. The green termination symbol terminates the product to the left because both products run over the same bound variable m. In addition, none of the other = signs are termination symbols for the sum on the right-hand side of the equation because they are deeper in the parse tree and thus do not terminate the sum.
Note that varN in the blueprints also matches multiple bound variables, e.g., ∑_{m,k∈A}. In such cases, x from above is a list of bound variables, and a summand is part of the argument if one of the elements of x is within this summand. In the translation, the operation is split into two consecutive operations, i.e., ∑_{m,k∈A} becomes ∑_{m∈A} ∑_{k∈A}. Figure 1 shows the extracted arguments for some example sums. The same rules apply for the extraction of arguments for products and limits.
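The rules (I-III) can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the implementation in LaCASt: it assumes that the expression following the operator has already been split into a flat list of summands and termination symbols, so the parse-tree depth check is omitted. The names `TERMINATORS`, `contains_var`, and `argument_summands` are hypothetical.

```python
import re

# Hypothetical flat token model: each entry is either a summand (a string)
# or a termination symbol. Parse-tree depth is ignored in this sketch.
TERMINATORS = {"=", "!=", "<", ">", "<=", ">=", ")", "]"}


def contains_var(term: str, var: str) -> bool:
    """True if `var` occurs in `term` as a stand-alone letter sequence."""
    pattern = rf"(?<![A-Za-z]){re.escape(var)}(?![A-Za-z])"
    return re.search(pattern, term) is not None


def argument_summands(terms, var):
    """Return the summands that belong to the operator's argument."""
    # A termination symbol always ends the argument list.
    scope = []
    for term in terms:
        if term in TERMINATORS:
            break
        scope.append(term)
    if not scope:
        return []
    # Rule (I): the first summand always belongs to the argument.
    # Rules (II)/(III): everything up to the last summand containing the
    # bound variable belongs to the argument as well.
    last = max((i for i, t in enumerate(scope) if contains_var(t, var)), default=0)
    return scope[: last + 1]


# Summands following a sum over n in "a_n + b + c n^2 = d_n"
print(argument_summands(["a_n", "b", "c n^2", "=", "d_n"], "n"))
```

In this example, the summand `b` does not contain the bound variable but is still part of the argument by rule (III), because a later summand contains `n` and no termination symbol lies in between.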

Lagrange's notation for differentiation and derivatives
Another remaining issue is the Lagrange (prime) notation for differentiation, since it does not outwardly provide sufficient semantic information. This notation presents two challenges. First, we do not know with respect to which variable the differentiation should be performed. Consider for example the Hurwitz zeta function ζ(s, a) [11, §25.11]. In the case of a differentiation ζ′(s, a), it is not clear if the function should be differentiated with respect to s or a. To remedy this issue, we analyzed all formulae in the DLMF which use prime notations and determined which variables (slots) of which functions represent the variables of the differentiation. Based on our analysis, we extended the translation patterns by meta information for semantic macros according to the slot of differentiation. For instance, in the case of the Hurwitz zeta function, the first slot is the slot for prime differentiation, i.e., ζ′(s, a) = d/ds ζ(s, a). The identified variables of differentiation for the special functions in the DLMF can be considered to be the standard slots of differentiation, e.g., in other DML, ζ′(s, a) most likely refers to d/ds ζ(s, a). The second challenge occurs if the slot of differentiation contains complex expressions rather than single symbols, e.g., ζ′(s^2, a). In this case, ζ′(s^2, a) = d/d(s^2) ζ(s^2, a) rather than d/ds ζ(s^2, a).
Since CAS often do not support derivatives with respect to complex expressions, we use the inbuilt substitution functions in the CAS to overcome this issue. To do so, we use a temporary variable temp for the substitution. CAS perform substitutions from the inside to the outside. Hence, we can use the same temporary variable temp even for nested substitutions. Table 1 shows the translation performed for ζ′(s^2, a). CAS may provide optional arguments to calculate the derivatives for certain special functions, e.g., Zeta(n,z,a) in Maple for the n-th derivative of the Hurwitz zeta function. However, this shorthand notation is generally not supported (e.g., Mathematica does not define such an optional parameter). Our substitution approach is lengthier but also more reliable. Unfortunately, lengthy expressions generally harm the performance of CAS, especially for symbolic manipulations. Hence, we have a genuine interest in keeping translations short, straightforward, and readable. Thus, the substitution translation pattern is only triggered if the variable of differentiation is not a single identifier. Note that this substitution only triggers on semantic macros. Generic functions, including prime notations, are still skipped.
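The substitution pattern can be sketched as a small string generator. This is an illustration under assumptions, not the output of LaCASt: the exact strings that LaCASt emits (Table 1) may differ, and the function names `is_single_identifier` and `translate_hurwitz_zeta_prime` are hypothetical. The sketch only uses CAS commands that exist in the respective systems (`diff`, `subs`, and `Zeta(0, z, a)` in Maple; `D`, `HurwitzZeta`, and the replacement operator `/.` in Mathematica).

```python
import re


def is_single_identifier(expr: str) -> bool:
    """True if the slot of differentiation is a single identifier such as 's'."""
    return re.fullmatch(r"[A-Za-z]\w*", expr.strip()) is not None


def translate_hurwitz_zeta_prime(slot: str, a: str, cas: str) -> str:
    """Build a CAS expression for the prime derivative of zeta(slot, a).

    The substitution via the temporary variable `temp` is only triggered
    if the slot is not a single identifier.
    """
    direct = is_single_identifier(slot)
    var = slot.strip() if direct else "temp"
    if cas == "Maple":
        body = f"diff(Zeta(0, {var}, {a}), {var})"
        return body if direct else f"subs(temp = {slot}, {body})"
    if cas == "Mathematica":
        body = f"D[HurwitzZeta[{var}, {a}], {var}]"
        return body if direct else f"({body} /. temp -> {slot})"
    raise ValueError(f"unsupported CAS: {cas}")


print(translate_hurwitz_zeta_prime("s", "a", "Maple"))
print(translate_hurwitz_zeta_prime("s^2", "a", "Maple"))
print(translate_hurwitz_zeta_prime("s^2", "a", "Mathematica"))
```

For a single identifier, the sketch keeps the short direct derivative, which reflects the stated preference for short and readable translations.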

The notation of derivatives poses a problem related to the MEOM of sums, products, integrals, limits, and differentiations. The semantic macro for derivatives \deriv{w}{x} (rendered as dw/dx) is often used with an empty first argument to render the function behind the derivative notation, e.g., \deriv{}{x}\sin@{x} for d/dx sin x. This leads to the same problem we faced above for identifying MEOMs. In this case, we use the same heuristic as we did for sums, products, and limits. Note that derivatives may be written following the function argument, e.g., sin(x) d/dx. If we are unable to identify any following summand that contains the variable of differentiation before we reach a termination symbol, we look for arguments prior to the derivative according to the heuristic (I-III).
Wronskians. With the support of prime differentiation described above, we are also able to translate the Wronskian [11, (1.13.4)] to Maple and Mathematica. A translation requires one to identify the variable of differentiation from the elements of the Wronskian, e.g., z for W{Ai(z), Bi(z)} from [11, (9.2.7)]. We analyzed all Wronskians in the DLMF and discovered that most Wronskians have special functions in their arguments, such as the example above. Hence, we can use our previously inserted metadata about the slots of differentiation to extract the variable of differentiation from the semantic macros. If the semantic macro argument is a complex expression, we search for the identifier that appears in the arguments of both elements of the Wronskian. For example, in W{Ai(z^a), ζ(z^2, a)}, we extract z as the variable since it is the only identifier that appears in both arguments z^a and z^2 of the elements. This approach is also used when there is no semantic macro involved, i.e., from W{z^a, z^2} we extract z as well. If LaCASt extracts multiple candidates or none, it throws a translation exception.
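The search for the common identifier can be sketched as a set intersection. This is a simplified illustration, not the LaCASt code: it assumes that the slots of differentiation have already been extracted as plain strings, and it treats every maximal sequence of letters as one identifier. The names `identifiers` and `wronskian_variable` are hypothetical.

```python
import re


def identifiers(expr: str) -> set:
    """Collect all letter sequences of an expression as identifiers."""
    return set(re.findall(r"[A-Za-z]+", expr))


def wronskian_variable(slot_1: str, slot_2: str) -> str:
    """Return the single identifier shared by both slots of differentiation.

    Mirrors the described behavior: zero or multiple candidates are an error.
    """
    common = identifiers(slot_1) & identifiers(slot_2)
    if len(common) != 1:
        raise ValueError(f"cannot determine variable of differentiation: {sorted(common)}")
    return common.pop()


# Slots of differentiation of W{Ai(z^a), zeta(z^2, a)} are z^a and z^2.
print(wronskian_variable("z^a", "z^2"))
```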

Fig. 2: The workflow of the evaluation engine and the overall results. Errors and abortions are not included. The generated dataset contains 9,977 equations. In total, the case analyzer splits the data into 10,930 cases, of which 4,307 cases were filtered. This sums up to a set of 6,623 test cases in total.
For evaluating the DLMF with Maple and Mathematica, we follow the same approach as demonstrated in [8], i.e., we symbolically and numerically verify the equations in the DLMF with CAS. If a verification fails both symbolically and numerically, we have identified an issue either in the DLMF, the CAS, or the verification pipeline. Note that an issue does not necessarily represent errors/bugs in the DLMF, CAS, or LaCASt (see the discussion about branch cuts in Section B). Figure 2 illustrates the pipeline of the evaluation engine. First, we analyze every equation in the DLMF (hereafter referred to as test cases). A case analyzer splits multiple relations in a single line into multiple test cases. Note that only adjacent relations are considered, i.e., with f(z) = g(z) = h(z), we generate two test cases f(z) = g(z) and g(z) = h(z) but not f(z) = h(z). In addition, expressions with ± and ∓ are split accordingly, e.g., i^{±i} = e^{∓π/2} [11, (4.4.12)] is split into i^{+i} = e^{−π/2} and i^{−i} = e^{+π/2}. The analyzer utilizes the additional information attached to each line, i.e., the URL in the DLMF, the used and defined symbols, and the constraints. If a used symbol is defined elsewhere in the DLMF, it performs substitutions. For example, the multi-equation [11, (9.6.2)] is split into six test cases and every ζ is replaced by (2/3) z^{3/2} as defined in [11, (9.6.1)]. The substitution is performed on the parse tree of expressions [19]. A definition is only considered as such if the defining symbol is identical to the equation's left-hand side. That means z = ((3/2) ζ)^{2/3} [11, (9.6.10)] is not considered as a definition for ζ. Further, semantic macros are never substituted by their definitions. Translations for semantic macros are exclusively defined by the authors. For example, the equation [11, (11.5.2)] contains the Struve function K_ν(z). Since Mathematica does not contain this function, we defined an alternative translation to its definition H_ν(z) − Y_ν(z) in [11, (11.2.5)] with the Struve function H_ν(z) and the Bessel function of the second kind Y_ν(z), because both of these functions are supported by Mathematica. The second entry in Table 3 in Appendix D shows the translation for this test case.
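The splitting performed by the case analyzer can be sketched as follows. This is a string-based simplification and only an assumption about the overall behavior; the actual analyzer works on the parse tree and additionally handles symbol substitutions and attached metadata. The names `split_relations`, `split_plus_minus`, and `analyze` are hypothetical.

```python
import re

# Relation symbols on which a line is split; longer symbols are listed first
# so that "<=" is not mistaken for "<" followed by "=".
RELATION = re.compile(r"\s*(<=|>=|!=|=|<|>)\s*")


def split_relations(expr: str) -> list:
    """Split a chain a = b = c into test cases for adjacent relations only."""
    parts = RELATION.split(expr.strip())
    operands, relations = parts[0::2], parts[1::2]
    return [
        f"{operands[i]} {relations[i]} {operands[i + 1]}"
        for i in range(len(relations))
    ]


def split_plus_minus(expr: str) -> list:
    """Resolve ± and ∓ consistently into an upper-sign and a lower-sign case."""
    if "±" not in expr and "∓" not in expr:
        return [expr]
    upper = expr.replace("±", "+").replace("∓", "-")
    lower = expr.replace("±", "-").replace("∓", "+")
    return [upper, lower]


def analyze(expr: str) -> list:
    """Generate all test cases of one line."""
    return [case for variant in split_plus_minus(expr) for case in split_relations(variant)]


print(analyze("f(z) = g(z) = h(z)"))
print(analyze("i^(±i) = e^(∓π/2)"))
```

A line without any relation symbol yields an empty list in this sketch, which corresponds to skipping lines in which no relation can be detected.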
Next, the analyzer recursively checks for additional constraints defined by the used symbols. The mentioned Struve K_ν(z) test case [11, (11.5.2)] contains the Gamma function. Since the definition of the Gamma function [11, (5.2.1)] has the constraint ℜz > 0, the numeric evaluation must respect this constraint too. For this purpose, the case analyzer first tries to link the variables in constraints to the arguments of the functions. For example, the constraint ℜz > 0 sets a constraint for the first argument z of the Gamma function. Next, we check all arguments in the actual test case at the same position. The test case contains Γ(ν + 1/2). In turn, the variable z in the constraint ℜz > 0 of the definition of the Gamma function is replaced by the actual argument used in the test case. This adds the constraint ℜ(ν + 1/2) > 0 to the test case. This process is performed recursively. If a constraint does not contain any variable that is used in the final test case, the constraint is dropped.
In total, the case analyzer identifies four additional constraints for the test case [11, (11.5.2)]. Table 3 in Appendix D shows the applied constraints (including the directly attached constraint ℜz > 0 and the manually defined global constraints from Figure 3). Note that the constraints may contain variables that do not appear in the actual test case, such as ℜ(ν + k + 1) > 0. Such constraints do not have any effect on the evaluation because a constraint that cannot be computed to true or false is ignored. Unfortunately, this recursive loading of additional constraints may generate impossible conditions in certain cases, such as for |Γ(iy)| [11, (5.4.3)]. There are no valid real values of y such that ℜ(iy) > 0. In turn, every test value would be filtered out, and the numeric evaluation would not verify the equation. However, such cases are the minority, and we were able to increase the number of correct evaluations with this feature.
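The linking of a constraint to an actual function argument can be sketched with plain string substitution. This is an illustrative simplification under assumptions (the analyzer itself operates on parsed expressions); the helper names `instantiate_constraint`, `is_relevant`, and `admissible` as well as the textual constraint format are hypothetical.

```python
import re


def instantiate_constraint(constraint: str, formal: str, actual: str) -> str:
    """Replace the formal argument of a definition by the actual argument."""
    pattern = rf"(?<![A-Za-z]){re.escape(formal)}(?![A-Za-z])"
    return re.sub(pattern, lambda _match: f"({actual})", constraint)


def is_relevant(constraint: str, case_variables: set) -> bool:
    """A constraint is kept only if it mentions a variable of the test case."""
    names = set(re.findall(r"[A-Za-z]+", constraint)) - {"Re", "Im"}
    return bool(names & case_variables)


def admissible(values, predicate):
    """Filter test values by a constraint given as a Python predicate."""
    return [v for v in values if predicate(v)]


# Constraint Re z > 0 of the Gamma function, applied to Gamma(nu + 1/2).
attached = instantiate_constraint("Re z > 0", "z", "nu + 1/2")
print(attached)
print(is_relevant(attached, {"nu", "z"}))

# Impossible condition: Re(i*y) > 0 holds for no real y, so every
# real test value is filtered out.
print(admissible([-1.5, 0.5, 2.0], lambda y: (1j * y).real > 0))
```

The last call illustrates the drawback mentioned above: for a real variable y, the inherited constraint removes all test values, so the test case cannot be verified numerically.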
To avoid a large portion of incorrect calculations, the analyzer filters the dataset before translating the test cases. We apply two filter rules to the case analyzer. First, we filter expressions that do not contain any semantic macros. Due to the limitations of LaCASt, these expressions most likely result in wrong translations. Further, this rule filters out several meaningless expressions that are not verifiable, such as z = x in [11, (4.2.4)]. The result dataset flags these cases with 'Skipped - no semantic math'. Note that the result dataset still contains the translations for these cases to provide a complete picture of the DLMF. Second, we filter expressions that contain ellipsis (e.g., \cdots), approximations, and asymptotics (e.g., O(z^2)) since those expressions cannot be evaluated with the proposed approach. Further, a definition is skipped if it is not a definition of a semantic macro, such as [11, (2.3.13)], because definitions without an appropriate counterpart in the CAS are meaningless to evaluate. Definitions of semantic macros, on the other hand, are of special interest and remain in the test set since they allow us to test if a function in the CAS obeys the actual mathematical definition in the DLMF. If the case analyzer (see Figure 2) is unable to detect a relation, i.e., split an expression on <, ≤, ≥, >, =, or ≠, the line in the dataset is also skipped because the evaluation approach relies on relations to test. After splitting multi-equations (e.g., ±, ∓, a = b = c) and filtering out all non-semantic expressions, non-semantic macro definitions, ellipsis, approximations, and asymptotics, we end up with 6,623 test cases in total from the entire DLMF.
After generating the test case with all constraints, we translate the expression to the CAS representation. Every successfully translated test case is then symbolically verified, i.e., the CAS tries to simplify the difference of the two sides of an equation to zero. Non-equation relations simplify to Booleans. Non-simplified expressions are verified numerically for manually defined test values, i.e., we calculate actual numeric values for both sides of an equation and check their equivalence.

Symbolic Evaluation
The symbolic evaluation was performed for Maple as in [8]. However, we use the newer version Maple 2020. Another feature we added to LaCASt is the support of packages in Maple. Some functions are only available in modules (packages) that must be preloaded, such as QPochhammer in the package QDifferenceEquations. The general simplify method in Maple does not cover q-hypergeometric functions. Hence, whenever LaCASt loads functions from the q-hypergeometric package, the better performing QSimplify method is used. With the WED and the new support for Mathematica in LaCASt, we perform the symbolic and numeric tests for Mathematica as well. The symbolic evaluation in Mathematica relies on the full simplification. For Maple and Mathematica, we defined the global assumptions x, y ∈ R and k, n, m ∈ N. Constraints of test cases are added to their assumptions to support simplification. Adding more global assumptions for symbolic computation generally harms the performance since CAS internally use assumptions for simplifications. It turned out that by adding more custom assumptions, the number of successfully simplified expressions decreases.

Numerical Evaluation
Defining an accurate test set of values to analyze an equivalence can be an arbitrarily complex process. Ideally, every expression would be tested on specific values chosen according to the functions it contains. However, this laborious process is not suitable for evaluating an entire DML and CAS. It makes more sense to develop a general set of test values that (i) generally covers interesting domains and (ii) avoids singularities, branch cuts, and similar problematic regions. Considering these two attributes, we came up with the ten test points illustrated in Figure 3. It contains four complex values on the unit circle and six points on the real axis. The test values cover the general area of interest (complex values in all four quadrants, negative and positive real values) and avoid the typical singularities at {0, ±1, ±i}. In addition, several variables are tied to specific values for entire sections. Hence, we applied additional global constraints to the test cases.

[Figure 3, right panel. Special test values: n, m, k, l, i, j, ε ∈ {1, 2, 3}. Global constraints: x, α, β > 0; −π < ph(z) < π; x, y, a, b, c, r, s, t, α, …]

The numeric evaluation engine heavily relies on the performance of extracting free variables from an expression. Unfortunately, the inbuilt functions in CAS, if available, are not very reliable. As the authors explained in [8], a custom algorithm within Maple was necessary to extract identifiers. Mathematica has the undocumented function Reduce`FreeVariables for this purpose. However, both systems, the custom solution in Maple and the inbuilt Mathematica function, have problems distinguishing free variables of entire expressions from the bound variables in MEOMs, e.g., integration and continuous variables. Mathematica sometimes does not extract a variable but returns the unevaluated input instead. We regularly faced this issue for integrals. However, we also discovered one example without integrals. For EulerE[n,0] from [11, (24.4.26)], we expected to extract {n} as the set of free variables but instead received a set containing the unevaluated expression itself, {EulerE[n,0]}. Since the extended version of LaCASt handles operators, including bound variables of MEOMs, we drop the use of internal methods in CAS and extend LaCASt to extract identifiers from an expression. During a translation process, LaCASt tags every single identifier as a variable, as long as it is not an element of a MEOM. This simple approach proves to be very efficient since it is implemented alongside the translation process itself and is already more powerful than the existing inbuilt CAS solutions. We defined subscripts of identifiers as a part of the identifier, e.g., z_1 and z_2 are extracted as variables from z_1 + z_2 rather than z.
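The idea of tagging identifiers as variables unless they are bound by an operator can be illustrated with a minimal Python sketch. The expression classes (`Sym`, `Num`, `Apply`, `Binder`) and the function `free_variables` are hypothetical names introduced only for this illustration; they do not reflect the internal data structures of LaCASt.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:
    name: str  # identifier; a subscript is part of the name, e.g. "z_1"


@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Apply:
    head: str    # function or operator name, e.g. "add" or "EulerE"
    args: tuple  # child expressions


@dataclass(frozen=True)
class Binder:
    kind: str          # binding operator, e.g. "int", "sum", "prod", "lim"
    bound: str         # name of the bound variable
    body: "Expr"       # expression in which the variable is bound
    limits: tuple = () # limits are evaluated outside the binding scope


Expr = Union[Sym, Num, Apply, Binder]


def free_variables(expr) -> set:
    """Collect identifiers that are not bound by an enclosing operator."""
    if isinstance(expr, Sym):
        return {expr.name}
    if isinstance(expr, Num):
        return set()
    if isinstance(expr, Apply):
        found = set()
        for arg in expr.args:
            found |= free_variables(arg)
        return found
    if isinstance(expr, Binder):
        found = free_variables(expr.body) - {expr.bound}
        for limit in expr.limits:
            found |= free_variables(limit)
        return found
    raise TypeError(f"unsupported node: {expr!r}")


# z_1 + z_2: subscripted identifiers are kept as distinct variables
subscripted = Apply("add", (Sym("z_1"), Sym("z_2")))

# EulerE[n, 0]: only n is free
euler = Apply("EulerE", (Sym("n"), Num(0)))

# integral over t from 0 to infinity of exp(-z t) q(t): t is bound, z is free
exponent = Apply("mul", (Num(-1), Sym("z"), Sym("t")))
integrand = Apply("mul", (Apply("exp", (exponent,)), Apply("q", (Sym("t"),))))
laplace = Binder("int", "t", integrand, limits=(Num(0), Num(float("inf"))))

print(free_variables(subscripted), free_variables(euler), free_variables(laplace))
```

Because the bound variable is removed only inside the scope of its operator, an identifier that appears both free and bound in different parts of an expression is still reported as free.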
The general pipeline for a numeric evaluation works as follows. First, we replace all substitutions and extract the variables from the left- and right-hand sides of the test expression via LaCASt. For the previously mentioned example of the Struve function [11, (11.5.2)], LaCASt identifies two variables in the expression, ν and z.
According to the values in Figure 3, ν and z each take the ten general test values. A numeric test contains every combination of test values for all variables. Hence, we generate 100 test calculations for [11, (11.5.2)]. Afterward, we filter out the test values that violate the attached constraints. In the case of the Struve function, we end up with 25 test cases.
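The combination and filtering step described above can be sketched in Python as follows. The ten values and the two constraints below are assumptions made for illustration: the exact points are those of Figure 3, and the constraints attached to [11, (11.5.2)] are not restated in this section. The helper `generate_test_cases` is a hypothetical name and not part of LaCASt.

```python
import cmath
import itertools
import math

# ASSUMPTION: four points on the unit circle (one per quadrant) and six real
# points that avoid 0, +-1, and +-i; the authoritative values are in Figure 3.
TEST_VALUES = [
    cmath.exp(1j * math.pi / 6),
    cmath.exp(2j * math.pi / 3),
    cmath.exp(-1j * math.pi / 3),
    cmath.exp(-5j * math.pi / 6),
    -2.0, -1.5, -0.5, 0.5, 1.5, 2.0,
]

EPS = 1e-9  # tolerance against floating-point noise in the real parts


def generate_test_cases(variables, values, constraints):
    """Build every combination of values and keep those passing all constraints."""
    cases = []
    for combo in itertools.product(values, repeat=len(variables)):
        assignment = dict(zip(variables, combo))
        if all(check(assignment) for check in constraints):
            cases.append(assignment)
    return cases


# ASSUMPTION: illustrative constraints Re(z) > 0 and Re(nu) > -1/2
constraints = [
    lambda a: complex(a["z"]).real > EPS,
    lambda a: complex(a["nu"]).real > -0.5 + EPS,
]

all_cases = generate_test_cases(["nu", "z"], TEST_VALUES, [])
valid_cases = generate_test_cases(["nu", "z"], TEST_VALUES, constraints)
print(len(all_cases), len(valid_cases))
```

With these assumed inputs, the sketch produces 100 combinations, of which 25 satisfy both constraints. The small tolerance matters because the real part of a point such as e^{2πi/3} is not exactly −1/2 in floating-point arithmetic.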
In addition, we apply a limit of 300 calculations for each test case and abort a computation after 30 seconds due to computational limitations. If the test case generates more than 300 test values, only the first 300 are used. Finally, we calculate the result for every remaining test value, i.e., we replace every variable by its value and calculate the result. The replacement is done by Mathematica's ReplaceAll method because the more appropriate method With, for unknown reasons, does not always replace all variables by their values. We wrap test expressions in Normal for numeric evaluations to avoid conditional expressions, which may cause incorrect calculations (see Section 5.1 for a more detailed discussion of conditional outputs). After replacing variables by their values, we trigger numeric computation. If the absolute value of the result (i.e., the difference between the left- and right-hand sides of the equation) is below the defined threshold of 0.001, or if the result is true (in the case of inequalities), the test calculation is considered successful. A numeric test case is considered successful if and only if every test calculation was successful. If a numeric test case fails, we store the values on which it failed and the number of calculations that were successful.
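The decision rule of the paragraph above (threshold 0.001, at most 300 calculations, success only if every calculation succeeds) can be written down as a small Python sketch. The function `run_numeric_test` and the two sample identities are hypothetical illustrations; the 30-second timeout and the actual CAS calls are deliberately left out.

```python
import cmath

THRESHOLD = 1e-3        # maximal absolute difference counted as success
MAX_CALCULATIONS = 300  # only the first 300 test values are used


def run_numeric_test(difference, test_cases,
                     threshold=THRESHOLD, limit=MAX_CALCULATIONS):
    """Evaluate lhs - rhs on each assignment and record successes and failures."""
    successes, failures = [], []
    for assignment in test_cases[:limit]:
        try:
            value = difference(**assignment)
        except (ArithmeticError, ValueError):
            # e.g. division by zero or evaluation on a pole
            failures.append(assignment)
            continue
        if isinstance(value, bool):
            ok = value                    # inequalities evaluate to True/False
        else:
            ok = abs(value) < threshold   # NaN compares as False, so it fails
        (successes if ok else failures).append(assignment)
    return {
        "passed": bool(successes) and not failures,
        "successes": successes,
        "failures": failures,
    }


cases = [{"z": v} for v in (-2.0, -0.5, 0.5, 2.0, cmath.exp(1j * cmath.pi / 6))]

# sin^2(z) + cos^2(z) - 1 vanishes everywhere
pythagoras = run_numeric_test(
    lambda z: cmath.sin(z) ** 2 + cmath.cos(z) ** 2 - 1, cases)

# sqrt(z^2) - z vanishes only where the principal square root returns z
root = run_numeric_test(lambda z: cmath.sqrt(z * z) - z, cases)

print(pythagoras["passed"], root["passed"],
      len(root["successes"]), len(root["failures"]))
```

The second identity fails only partially: it holds for the points with positive real part and fails for the negative real values, which mirrors the distinction between partially and totally failed numeric tests used in the results.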

Results
The translations to Maple and Mathematica, the symbolic results, the numeric computations, and an overview PDF of the bugs reported to Mathematica are available online on our demo page. In the following, we mainly focus on Mathematica because of page limitations and because Maple has been investigated more closely in [8]. The results for Maple are also available online. Compared to the baseline (≈31%), our improvements doubled the number of translations (≈62%) for Maple and reached ≈71% for Mathematica. The majority of expressions that cannot be translated contain macros that have no adequate translation pattern to the CAS, such as the macros for interval Weierstrass lattice roots [11, §23.3(i)] and the multivariate hypergeometric function [11, (19.16.9)]. Other errors (6% for Maple and Mathematica) occur for several reasons. For example, out of the 418 errors in translations to Mathematica, 130 occurred because the MEOM of an operator could not be extracted, 86 contained prime notations that do not refer to differentiations, 92 failed because of the underlying LaTeX parser [39], and in 46 cases, the arguments of a DLMF macro could not be extracted.
Out of 4,713 translated expressions, 1,235 (26.2%) were successfully simplified by Mathematica (1,084 of 4,114 or 26.3% in Maple). For Mathematica, we also count results that are equal to 0 under certain conditions as successful (called ConditionalExpression). We identified 65 of these conditional results: 15 of the conditions are equal to constraints that were provided in the surrounding text but not in the info box of the DLMF equation; 30 were produced due to branch cut issues (see Section B); and 20 were the same as attached in the DLMF but reformulated, e.g., z ∈ C\(1,∞) from [11, (25.12.2)] was reformulated to Im(z) ≠ 0 ∨ Re(z) < 1. The remaining translated but not symbolically verified expressions were numerically evaluated for the test values in Figure 3. For the 3,474 cases, 784 (22.6%) were successfully verified numerically by Mathematica (698 of 2,618 or 26.7% by Maple). For 1,784 the numeric evaluation failed. In the evaluation process, 655 computations timed out and 180 failed due to errors in Mathematica. Of the 1,784 failed cases, 691 failed partially, i.e., there was at least one successful calculation among the tested values. For 1,091 all test values failed. Table 3 in Appendix D shows the results for three sample test cases. The first case is a false positive evaluation because of a wrong translation. The second case is valid, but the numeric evaluation failed due to a bug in Mathematica (see next subsection). The last example is valid and was verified numerically but was too complex for symbolic verification.

Error Analysis
The performance of the numeric tests strongly depends on correctly attached and utilized information. The first example in Table 3 in Appendix D illustrates the difficulty of the task on a relatively easy case. Here, the argument of f was not explicitly given, as it would be in f(x). Hence, LaCASt translated f as a variable. Unfortunately, this resulted in a false verification, both symbolically and numerically. This type of error mostly appears in the first three chapters of the DLMF because they use generic functions frequently. We hoped to skip such cases by filtering expressions without semantic macros. Unfortunately, this derivative notation uses the semantic macro deriv. In the future, we will filter expressions that contain only semantic macros that are not linked to a special function or orthogonal polynomial.
As an attempt to investigate the reliability of the numeric test pipeline, we can run numeric evaluations on symbolically verified test cases. Since Mathematica already approved a translation symbolically, the numeric test should be successful if the pipeline is reliable. Of the 1,235 symbolically successful tests, only 94 (7.6%) failed numerically. None of the failed test cases failed entirely, i.e., for every test case, at least one test value was verified. Manually investigating the failed cases reveals 74 cases that failed due to an Indeterminate response from Mathematica and 5 that returned infinity, which clearly indicates that the tested numeric values were invalid, e.g., due to testing on singularities. Of the remaining 15 cases, two were identical: [11, (15.9.2)] and [11, (18.5.9)]. This reduces the remaining failed cases to 14. We evaluated invalid values for 12 of these because the constraints for the values were given in the surrounding text but not in the info boxes. The remaining 2 cases revealed a bug in Mathematica regarding conditional outputs (see below). The results indicate that the numeric test pipeline is reliable, at least for relatively simple cases that were previously symbolically verified. The main reasons for the high number of failed numerical cases in the entire DLMF (1,784) are missing constraints in the i-boxes and branch cut issues (see Section B in the Appendix), i.e., we evaluated expressions on invalid values.
Bug reports. Mathematica has trouble with certain integrals, which, by default, generate conditional outputs if applicable. With the method Normal, we can suppress conditional outputs. However, it only hides the condition rather than evaluating the expression to a non-conditional output. For example, integral expressions in [11, (10.9.1)] are automatically evaluated to the Bessel function J_0(|z|) under the condition z ∈ R rather than to J_0(z) for all z ∈ C. Setting the GenerateConditions option to None does not change the output. Normal only hides z ∈ R but still returns J_0(|z|). To fix this issue, for example in (10.9.1) and (10.9.4), we are forced to set GenerateConditions to False.
Setting GenerateConditions to False, on the other hand, reveals severe errors in several other cases. Consider [11, (8.4.4)], which gets evaluated to Γ(0,z) but only under the condition Re(z) > 0 ∧ Im(z) = 0. With GenerateConditions set to False, the integral incorrectly evaluates to Γ(0,z) + ln(z). This happened with the 2 cases mentioned above. With the same setting, the difference of the left- and right-hand sides of [11, (10.43.8)] is evaluated to 0.398942 for x, ν = 1.5. If we evaluate the same expression on x, ν = 3/2, the result is Indeterminate due to infinity. For this issue, one may use NIntegrate rather than Integrate to compute the integral. However, evaluating via NIntegrate decreases the number of successful numeric evaluations in general. We have revealed errors with conditional outputs in (8.4.4), (10.22.39), (10.43.8-10), and (11.5.2) (in [11]). In addition, we identified one critical error in Mathematica. For [11, (18.17.47)], WED (Mathematica's kernel) ran into a segmentation fault (core dumped) for n > 1. The kernel of the full version of Mathematica gracefully died without returning an output.
Besides Mathematica, we also identified several issues in the DLMF. None of the newly identified issues were critical, unlike the sign error reported in the previous project [8]; they generally refer to missing or wrongly attached semantic information. With the generated results, we can effectively fix these errors and further semantically enhance the DLMF. For example, some definitions are not marked as such, e.g., Q(z) = ∫_0^∞ e^{−zt} q(t) dt [11, (2.4.2)]. In [11, (10.24.4)], ν must be a real value but was linked to a complex parameter, and x should be positive real. An entire group of cases [11, (10.19.10-11)] also revealed the incorrect use of semantic macros. In these formulae, P_k(a) and Q_k(a) are defined but had been incorrectly marked up as Legendre functions going all the way back to DLMF Version 1.0.0 (May 7, 2010). In some cases, equations are mistakenly marked as definitions, e.g., [11, (9.10.10)] and [11, (9.13.1)] are annotated as local definitions of n. We also identified an error in LaCASt, which incorrectly translated the exponential integrals E_1(z), Ei(x), and Ein(z) (defined in [11, §6.2(i)]). A more explanatory overview of discovered, reported, and fixed issues in the DLMF, Mathematica, and Maple is provided in Appendix C.

Conclusion
We have presented a novel approach to verify the theoretical digital mathematical library DLMF with the power of two major general-purpose computer algebra systems, Maple and Mathematica. With LaCASt, we transformed the semantically enhanced LaTeX expressions from the DLMF to each CAS. Afterward, we symbolically and numerically evaluated the DLMF expressions in each CAS. Our results are auspicious and provide useful information to maintain and extend the DLMF efficiently. We further identified several errors in Mathematica, Maple [8], the DLMF, and the transformation tool LaCASt, demonstrating the benefit of the presented verification approach. Further, we provide open access to all results, including translations and evaluations, and to the source code of LaCASt.
The presented results show a promising step towards an answer for our initial research question. By translating an equation from a DML to a CAS, automatic verifications of that equation in the CAS allow us to detect issues in either the DML source or the CAS implementation. Each analyzed failed verification successively improves the DML or the CAS. Further, analyzing a large number of equations from the DML may be used to finally verify a CAS. In addition, the approach can be extended to cover other DML and CAS by exploiting different translation approaches, e.g., via MathML [36] or OpenMath [22].
Nonetheless, the analysis of the results, especially for an entire DML, is cumbersome. Minor missing semantic information, e.g., a missing constraint or not respected branch cut positions, leads to a relatively large number of false positives, i.e., unverified expressions correct in the DML and the CAS. This makes a generalization of the approach challenging because all semantics of an equation must be taken into account for a trustworthy evaluation. Furthermore, evaluating equations on a small number of discrete values will never provide sufficient confidence to verify a formula, which leads to an unpredictable number of true negatives, i.e., erroneous equations that pass all tests. A more sophisticated selection of critical values or other numeric tools with automatic results verification (such as variants of Newton's interval method) potentially mitigates this issue in the future. After all, we conclude that the approach provides valuable information to complement, improve, and maintain the DLMF, Maple, and Mathematica. A trustworthy verification, on the other hand, might be out of reach.

Future Work
The resulting dataset provides valuable information about the differences between CAS and the DLMF. These differences have not been studied extensively in the past and are worthy of analysis. In particular, a comprehensive and machine-readable list of branch cut positions in different systems is a desired goal [10]. Hence, we will continue to work closely together with the editors of the DLMF to further improve and expand the available information on the DLMF. Finally, the numeric evaluation approach would benefit from test values that depend on the actual functions involved. For example, the current layout of the test values was designed to avoid problematic regions, such as branch cuts. However, for identifying differences between the DLMF and CAS, especially for analyzing the positioning of branch cuts, an automatic evaluation of these particular values would be very beneficial and could be used to collect a comprehensive, inter-system library of branch cuts. Therefore, we will further study the possibility of linking semantic macros with numeric regions of interest.

B Why Branch Cuts Matter
Problems that we regularly faced during evaluation are issues related to multi-valued functions. Multi-valued functions map values from a domain to multiple values in a codomain and frequently appear in the complex analysis of elementary and special functions. Prominent examples are the inverse trigonometric functions, the complex logarithm, and the square root. A proper mathematical description of multi-valued functions requires the complex analysis of Riemann surfaces. Riemann surfaces are one-dimensional complex manifolds associated with a multi-valued function. One usually multiplies the complex domain into a many-layered covering space. The correct properties of multi-valued functions on the complex plane may no longer be valid for their counterpart functions in CAS, e.g., (1/z)^w and 1/(z^w) for z, w ∈ C and z ≠ 0. For example, consider z, w ∈ C such that z ≠ 0. Then mathematically, (1/z)^w always equals 1/(z^w) (when defined) for all points on the Riemann surface with fixed w. However, this should certainly not be assumed to be true in CAS, unless very specific assumptions are adopted (e.g., w ∈ Z, z > 0). For all modern CAS, this equation is not true. Try, for instance, w = 1/2. Then (1/z)^{1/2} − 1/z^{1/2} ≠ 0 in CAS, and the same holds for w being any other rational non-integer number.
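The discrepancy between (1/z)^w and 1/(z^w) can be reproduced outside a CAS with the principal branch of the complex logarithm. The following Python sketch uses only the standard `cmath` module; the helper names `principal_pow` and `power_mismatch` are introduced here for illustration. Under the principal-branch convention assumed below, the two sides differ for z on the negative real axis (where the branch cut lies) and agree away from it.

```python
import cmath


def principal_pow(x, w):
    """x**w on the principal branch, i.e. exp(w * Log(x)) with -pi < arg <= pi."""
    return cmath.exp(w * cmath.log(x))


def power_mismatch(z, w):
    """Return |(1/z)**w - 1/(z**w)| using principal values."""
    return abs(principal_pow(1 / z, w) - 1 / principal_pow(z, w))


# z is passed as a real float so that 1/z stays exactly on the negative real
# axis; complex division could otherwise produce a signed zero in the
# imaginary part and silently select the other side of the cut.
on_cut = power_mismatch(-4.0, 0.5)       # (-1/4)**(1/2) vs 1/((-4)**(1/2))
on_cut_third = power_mismatch(-8.0, 1 / 3)
off_cut = power_mismatch(2 + 1j, 0.5)

print(on_cut, on_cut_third, off_cut)
```

For z = −4 and w = 1/2, the left-hand side evaluates to i/2 and the right-hand side to −i/2, so the difference has absolute value 1, while the off-cut point agrees up to rounding error.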
The resulting ranges of multi-valued functions are referred to as branches, and the curves which separate these branches are called branch cuts. The restricted range that is associated with the range typically adopted using real numbers is often referred to as the principal branch. In order to compute multi-valued functions, CAS choose branch cuts for these functions so that they may evaluate them on their principal branches. Branch cuts may be positioned differently among CAS [10], e.g., arccot(−1/2) ≈ 2.03 in Maple but ≈ −1.11 in Mathematica. This is certainly not an error and is usually well documented for specific CAS [14,24]. However, there is no central database that summarizes branch cuts in different CAS or DML. The DLMF likewise explains and defines its branch cuts carefully but does not carry the information within the info boxes of expressions. Due to this complexity, it is rather easy to lose track of branch cut positioning and evaluate expressions on incorrect values. For example, consider the equation [11, (12.7.10)]. A path of z(φ) = e^{iφ} with φ ∈ [0,2π] would pass three different branch cuts. An accurate evaluation of the values of z(φ) in CAS requires calculations on the three branches using analytic continuation. LaCASt and our evaluation frequently fall into the same trap by evaluating values that are no longer on the principal branch used by CAS. To solve this issue, we need to utilize branch cuts not only for every function but also for every equation in the DLMF [19]. The positions of branch cuts are exclusively provided in the text but not in the i-boxes. Adding the information to each equation in the DLMF would be a laborious process because a branch cut position may change according to the used values (see the example [11, (12.7.10)] from above). Our result data, however, would provide beneficial information to update, extend, and maintain the DLMF, e.g., by adding the positions of the branch cuts for every function.
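The two values quoted above for arccot(−1/2) correspond to two common conventions for the inverse cotangent on the real line. The Python sketch below evaluates both; the function names are hypothetical, and the assignment of each formula to a particular CAS is inferred from the values reported in the text rather than taken from the systems' documentation.

```python
import math


def arccot_range_0_pi(x):
    """Convention with range (0, pi): arccot(x) = pi/2 - arctan(x)."""
    return math.pi / 2 - math.atan(x)


def arccot_via_reciprocal(x):
    """Convention arccot(x) = arctan(1/x), range (-pi/2, pi/2], x != 0."""
    return math.atan(1 / x)


x = -0.5
print(round(arccot_range_0_pi(x), 2))      # value reported for Maple
print(round(arccot_via_reciprocal(x), 2))  # value reported for Mathematica
```

The two results differ by exactly π for every negative real argument and coincide for positive arguments, which is why test values on the negative real axis are sensitive to the convention in use.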

C Overview of Bug Reports and Discovered Issues
Throughout the development of LaCASt and especially during the research on this paper, we identified several issues in the DLMF, Maple, and Mathematica. Some of these issues are severe, while most of them are minor problems. With this section, we want to take the opportunity to review the progress of LaCASt as a verification approach and summarize the more prominent issues we discovered over time. Please note that some of these issues (especially with regard to the DLMF and Maple) have been reported before and even published in previous publications.

C.1 Digital Library of Mathematical Functions
Since LaCASt was always developed in collaboration with developers of the DLMF, numerous minor fixes, tweaks, and updates have been implemented over time. Most of them are not worth noting, with a few exceptions. The first error in the DLMF that we discovered with the help of LaCASt [19] was the sign error in [11, (14.5.14)]. This error also appeared in the original Handbook of Special Functions [15, p. 359] and was fixed with DLMF version 1.0.16 in September 2017. An entire group of equations [11, (10.19.10-11)] used semantic macros incorrectly and therefore led to wrong links and annotations visible in the attached information box next to the equation in the DLMF. In these formulae, P_k(a) and Q_k(a) are defined but had been incorrectly marked up as Legendre functions going all the way back to DLMF version 1.0.0. This error was fixed in response to our feedback with DLMF version 1.0.27 in June 2020.

C.2 Maple
Via LaCASt, we discovered a bug in the simplify procedure of Maple 2016. For the equation [11, (7.18.4)], where e is the base of the natural logarithm, erfc(z) is the complementary error function, and i^n erfc(z) denotes the repeated integrals of the complementary error function, LaCASt correctly generated a translation (redundant parentheses removed to improve readability).
Maple 2016 falsely returns 0 when we call the simplify procedure for the translated left-hand side of the equation. Maplesoft has confirmed this defect in Maple 2016 in private communications [19]. Although the behavior changed in Maple 2018 and 2020, the error persists. Maple version 2020.2 automatically evaluates the left-hand side of equation (3) to a rather complex expression involving the Meijer G-function G^{m,n}_{p,q}(z; a_1,...,a_p; b_1,...,b_q) [11, §16.17], the incomplete Bell polynomials [3], and the Pochhammer symbol (x)_n [11, (5.2.4)].
For small n and z, the difference of the left- and right-hand sides of equation (3) is indeed almost zero up to machine accuracy. For large absolute values of z, however, the difference increases quickly.

C.3 Mathematica
As we pointed out in Section 5.1, we discovered some trouble with integrals in Mathematica and confusing behavior with rational numbers. After discussing these cases with Mathematica developers, some of them have been confirmed as bugs.
Other cases, however, were the results of our testing methodology. First, we take a look at the confirmed errors. The most crucial report was about [11, (18.17.14)]. For this equation, we calculated the difference of the left- and right-hand sides as usual and computed numerical test values for this difference. In particular, we identified the four variables x, n, α, and µ. As described for Figure 3, n is a special variable restricted to the values {1,2,3}, and x and α are restricted to the positive real values of our general test values. This resulted in 270 test value combinations, which are further limited by the attached (local) constraints in the DLMF [11, (18.17.14)]: µ > 0, x > 0. Since x was already constrained to positive real values by our global constraints, the second local constraint has no additional effect.

Segmentation Fault Example in
We further identified errors in the variable extraction procedure in Mathematica. For example, for [11, (24.4.26)] we expected to extract just n as the free variable. We reduced the issue to a minimal working example for just the left-most side of the equation.
False Variable Extraction in Mathematica v. 12.0

This particular error was confirmed and has been fixed (see footnote 30). However, since the procedure Reduce`FreeVariables is not a publicly documented function in Mathematica, the method remains unstable. Especially in mathematical operators with bound variables, such as sums, products, integrals, and limits, the procedure tends to generate inaccurate results.
In regard to the outlined issues with the GenerateConditions flag in integrals, most problematic cases were the result of using ReplaceAll to set numeric values for variables. Consider, for example, [11, (10.43.8)]. First, LaCASt splits the expression into two test cases by resolving ± and ∓. For the first case, i.e., ± is replaced by + and ∓ by −, Mathematica automatically evaluates the test expression, i.e., the difference of the left- and right-hand sides of the equation, with GenerateConditions set to None for the integral. This happens because CAS automatically perform some computations on their inputs unless we prevent it (e.g., via Hold). However, evaluating this expression on x, ν = 1.5 returns 0.398942 rather than the expected zero. For x, ν = 3/2, the return value is infinity (i.e., indeterminate). The issue was acknowledged by the developers, who explained that the last term causes the behaviour because it is Infinity/Infinity for ν = 3/2. A workaround to this issue is to use Limit rather than ReplaceAll to evaluate the expression on specific values.

Footnotes: 28 Case ID: 4664157. 29 The fix was communicated to us via a new case ID: 4776927. 30 Case ID: 4373302.
Limit Workaround for equation 9

To the best of our knowledge, this issue still persists. Whether this behavior is intended (or even desired) is up for debate. Yet, it is another characteristic of CAS to keep track of. The same workaround was suggested for [11, (11.5.2)] and [11, (11.5.8-10)]. In the case of [11, (10.9.1)], the right-hand side of the equation was evaluated to BesselJ[0,z] by Mathematica for GenerateConditions -> False, which is correct. However, without this flag (or with it set to None), Mathematica returns BesselJ[0, Abs[z]] if z ∈ R. While confusing at first glance, the output is not wrong: since J_0(z) is even in its argument, the absolute value is simply redundant along the real line.
In the case of [11, (8.4…)], we have not received any feedback from the developers. For the difference between the left- and right-hand sides of the first equation, Mathematica conditionally returns 0 if Re(z) > 0 and Im(z) = 0. We would expect 0 without conditions. Setting the problematic GenerateConditions to False returns −ln(z).
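The difference between substituting a value and taking a limit, which underlies the workaround discussed above for [11, (10.43.8)], can be illustrated by analogy in plain Python. The sketch below is not Mathematica's Limit or ReplaceAll: the quotient is a made-up expression with a removable singularity at ν = 3/2 (a 0/0 form instead of the Infinity/Infinity form reported by the developers), and `by_substitution` and `by_limit` are hypothetical helper names.

```python
import math


def quotient(nu):
    """Toy expression: numerator and denominator both vanish at nu = 1.5."""
    return (nu ** 2 - 2.25) / (nu - 1.5)


def by_substitution(f, point):
    """Plug the value in directly; an undefined form yields NaN."""
    try:
        return f(point)
    except ZeroDivisionError:
        return math.nan


def by_limit(f, point, h=1e-6):
    """Approximate the two-sided limit by averaging values next to the point."""
    return 0.5 * (f(point - h) + f(point + h))


direct = by_substitution(quotient, 1.5)
limit = by_limit(quotient, 1.5)
print(direct, limit)
```

Direct substitution hits the undefined form and reports NaN, whereas the limiting value is close to 3. A numeric test based on substitution alone would therefore mark such a case as failed even though the underlying identity may be valid.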

Second Example of Conditional Flag Influence in
We noticed that another initial computation hook on the input could cause the issue. For example, if we prevent instant evaluations on the input via Hold and evaluate the expression on z = i, Mathematica returns 0. + 0.i. Without Hold, an evaluation on the same value returns undefined.

D Evaluation Tables
In this section, we provide three additional tables for our evaluation and translation results. Table 3 provides three examples of our evaluations on the DLMF with different degrees of complexity. The first entry [11, (1.4.8)], for example, illustrates the difficulty of translating formulae from LaTeX to CAS syntaxes even on a semantically enriched dataset like the DLMF. Often, the arguments of a function in derivative notations are omitted since they can be deduced from the variable of differentiation. For example, in d²f/dx², the argument of the function f(x) is omitted. However, in this case LaCASt was unable to correctly interpret f as a function and presumed it to be a variable. Unfortunately, this error not only caused a wrong translation but also produced a false positive evaluation because the symbolic simplification returned 0 = 0. The other two examples in Table 3, even though more complex, illustrate the capability of LaCASt and our evaluation pipeline. Additionally, Tables 5 and 6 show the number of translated and evaluated expressions for each chapter of the DLMF. For reference, Table 4 shows the full name of each chapter and the total number of displayed formulae according to the dataset released by Youssef and Miller [40]. The actual number of formulae may vary compared to Table 5 because Youssef and Miller did not split multi-equations or resolve ± and ∓. Hence, our final dataset consists of 10,930 formulae. Additionally, as described in our paper, we filter out non-semantic expressions, non-semantic macro definitions, ellipses, approximations, and asymptotics. We ended up with 6,623 test cases. A more comprehensive table and all data are available at https://lacast.wmflabs.org. For overview reasons, Table 6 only shows the results for translations to Mathematica. For Maple, see our website.

Table 3: The table shows three sample cases of our evaluation pipeline from the DLMF. The translation shows the performed translations to Mathematica. The numeric column contains the number of successfully computed test cases. The constraints column contains all applied constraints including global constraints from Figure 3.

Fig. 3: The ten numeric test values in the complex plane for general variables. The dashed line represents the unit circle |z| = 1. On the right, we show the set of values for special variables and the general global constraints. There, i refers to a generic variable and not to the imaginary unit.

Table 1 :
Example translations for the prime derivative of the Hurwitz zeta function with respect to s 2 .
(Section 4.2) in the paper, n is defined as a special variable bind to the numeric values {1,2,3}, x and α are positive real values of our general test values, i.e., x,α∈{1

Table 5: Overview of translations for DLMF chapters. Table headings are 2C: 2-letter chapter codes; C#: chapter numbers; F: number of formulae; T_old: number of translated expressions using the old translator; T_Map, T_Math: number of translations with the improved translator (Map for Maple and Math for Mathematica); M_Map, M_Math: number of failed translations due to missing macro translations; E: number of other errors in the translation process. The best five performances are colored. Chapter codes are linked with our result page https://lacast.wmflabs.org.

Table 6: Overview of symbolic and numeric evaluations for DLMF chapters as in Table 5. Table headings are T_Math: number of successful translations to Mathematica; S_success, S_fail: number of successful and failed symbolic verifications (for translated expressions only), respectively; N_success, N_fail: number of successful and remaining failed numeric verifications (for failed symbolic tests only), respectively; P, T: number of partially (at least one test was successful) and totally failed numeric tests. The best five performances are colored. Chapter codes are linked with our result page https://lacast.wmflabs.org.