All serious software requires testing, and Owl is no exception. Time and again, testing has helped us discover errors that we failed to notice during development. In this chapter, we briefly introduce the philosophy of testing in Owl, the tool we use for unit testing, and examples that demonstrate how to write unit tests. We also discuss issues such as using functors in tests and other points to keep in mind when writing test code for Owl.

11.1 Unit Test

There are multiple ways to test your code. One common way is to use assertions or to raise and catch errors in the code itself. These kinds of tests are useful, but they mix the testing code with the function code, whereas we want separate test modules that check the implementation of functions against expected behaviors.

In Owl, we use unit tests to ensure the correctness of numerical routines as much as possible. A unit test is a software testing method that checks the behavior of individual units in the code. In our case, a “unit” usually means a single numerical function.

There is an approach to software development called test-driven development, where you write the test code before you implement the function to be tested. Though we don’t enforce this approach, there are certain testing principles we follow in the development of Owl. For example, we generally don’t trust code that is not tested, so in a GitHub pull request, it is good practice to accompany your implementation with a unit test in the test/ directory of the source code. In addition, try to keep each function short and simple, so that a test case can focus on one aspect of it.

We use the alcotest framework for testing in Owl. It is a lightweight test framework with simple interfaces. It exposes a simple TESTABLE module type, a check function to assert test predicates, and a run function to perform a list of unit -> unit test callbacks.
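As a rough illustration of these three pieces, the sketch below builds a custom testable for float arrays with Alcotest.testable, asserts with check, and hands a one-element test list to run. The names scale, float_array, and test_scale are made up for this example and do not come from the Owl codebase. The ~and_exit:false argument is passed only so that the program continues after the run instead of exiting.

```ocaml
(* A hypothetical function under test: multiply every element by k. *)
let scale k a = Array.map (fun x -> k *. x) a

(* A custom testable: a pretty-printer plus an equality predicate.
   Two float arrays are equal if they match element-wise within eps. *)
let float_array eps =
  let pp ppf a =
    let items = Array.to_list (Array.map string_of_float a) in
    Format.fprintf ppf "[|%s|]" (String.concat "; " items)
  in
  let equal a b =
    Array.length a = Array.length b
    && List.for_all2
         (fun x y -> abs_float (x -. y) <= eps)
         (Array.to_list a) (Array.to_list b)
  in
  Alcotest.testable pp equal

(* A unit -> unit callback: check raises if the two values differ. *)
let test_scale () =
  Alcotest.check (float_array 1e-6) "scale by 2"
    [| 2.; 4.; 6. |]
    (scale 2. [| 1.; 2.; 3. |])

(* run takes a suite name and a list of (group name, test cases). *)
let () =
  Alcotest.run ~and_exit:false "demo"
    [ "scale", [ Alcotest.test_case "scale by 2" `Quick test_scale ] ]
```

The expected value comes before the actual value in check, which is the order alcotest uses when it reports a mismatch.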

11.2 Example

Let’s look at an example of using alcotest in Owl. Suppose you have implemented some functions in the linear algebra module, such as rank, determinant, and inversion, and want to test them before making a pull request. The testing code can be placed in one test unit, and each unit consists of four major sections.

In the first section, we define utility functions and common constants that will be used in the unit. For this example, we specify the required precision and some predefined input data. Here, we use 1e-6 as the precision threshold. Two ndarrays are deemed the same if the sum of their differences is less than 1e-6 in absolute value, as shown in mpow. The predefined input data can also be defined inside each test case, as in is_triu_1.

open Owl
open Alcotest
module M = Owl.Linalg.D

(* Section #1 *)
let approx_equal a b =
  let eps = 1e-6 in
  Stdlib.(abs_float (a -. b) < eps)

let x0 = Mat.sequential ~a:1. 1 6

The second section is the core. It contains the actual testing logic, for example, whether the det function can correctly calculate the determinant of a given matrix. Every testing function defined in the To_test module has self-contained logic to validate the implementation of the target function.

(* Section #2 *)
module To_test = struct
  let rank () =
    let x = Mat.sequential 4 4 in
    M.rank x = 2

  let det () =
    let x = Mat.hadamard 4 in
    M.det x = 16.

  let vecnorm_01 () =
    let a = M.vecnorm ~p:1. x0 in
    approx_equal a 21.

  let vecnorm_02 () =
    let a = M.vecnorm ~p:2. x0 in
    approx_equal a 9.539392014169456

  let is_triu_1 () =
    let x = Mat.of_array [| 1.; 2.; 3.; 0.;
      5.; 6.; 0.; 0.; 9. |] 3 3 in
    M.is_triu x = true

  let mpow () =
    let x = Mat.uniform 4 4 in
    let y = M.mpow x 3. in
    let z = Mat.(dot x (dot x x)) in
    approx_equal Mat.(y - z |> sum') 0.
end

The most common test function used in Owl has the type unit -> bool. The idea is that each test function compares a certain aspect of a function with expected results. If there are multiple test cases for the same function, such as the case in vecnorm, we tend to build different test cases instead of using one large test function to include all the cases. The common pattern of these functions can be summarized as

let test_func () =
  let expected = expected_value in
  let result = func args in
  expected = result

It is important to understand that the equal sign does not necessarily mean that the two values have to be identical. In fact, when floating-point numbers are involved, which is quite often the case, we only need the two values to be approximately equal within an error range. In that case, you need to pay attention to which precision you are using: single or double. A threshold that is adequate for single-precision numbers may still allow a large error in a double-precision computation.
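The sketch below illustrates the point about thresholds using only the standard library. Here approx_equal takes the tolerance as an optional argument, and to_single is a helper introduced for this example that rounds a double to the nearest single-precision value. The sketch also shows a caveat of comparing the signed sum of element-wise differences: errors of opposite sign can cancel, so the largest absolute difference is a stricter check. All names are illustrative and are not part of Owl.

```ocaml
(* Absolute-tolerance comparison with an explicit threshold. *)
let approx_equal ?(eps = 1e-6) a b = abs_float (a -. b) < eps

(* Round a double to the nearest single-precision value
   (illustrative helper, not an Owl function). *)
let to_single x = Int32.float_of_bits (Int32.bits_of_float x)

(* Signed sum of element-wise differences: opposite errors can cancel. *)
let sum_diff a b =
  let acc = ref 0. in
  Array.iteri (fun i ai -> acc := !acc +. (ai -. b.(i))) a;
  !acc

(* Largest absolute element-wise difference: no cancellation. *)
let max_abs_diff a b =
  let acc = ref 0. in
  Array.iteri (fun i ai -> acc := max !acc (abs_float (ai -. b.(i)))) a;
  !acc

let () =
  let x = 0.1 in
  let x32 = to_single x in
  (* The single-precision rounding error of 0.1 is about 1.5e-9, so it
     passes a 1e-6 threshold but fails a 1e-12 one. *)
  Printf.printf "1e-6  threshold: %b\n" (approx_equal ~eps:1e-6 x x32);
  Printf.printf "1e-12 threshold: %b\n" (approx_equal ~eps:1e-12 x x32);
  (* The two arrays differ by 0.5 in each element, with opposite signs. *)
  let a = [| 1.; 2. |] and b = [| 1.5; 1.5 |] in
  Printf.printf "sum_diff = %g, max_abs_diff = %g\n"
    (sum_diff a b) (max_abs_diff a b)
```

A relative tolerance, scaled by the magnitude of the expected value, is another common choice when the results span several orders of magnitude.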

The third section contains mostly boilerplate code that tells the testing framework two important pieces of information. The first is the name of the testing function, which the testing framework stores in its log for later analysis. The second is the anticipated result, which the testing framework checks against the outcome of the testing function. If the outcome does not match the expected result, the test fails and the failure is logged.

Here, we expect all the test functions to return true, though alcotest also supports checking values of many other types, such as string and int. Please refer to its source file for more details.

(* Section #3 *)
let rank () =
  Alcotest.(check bool) "rank" true (To_test.rank ())

let det () =
  Alcotest.(check bool) "det" true (To_test.det ())

let vecnorm_01 () =
  Alcotest.(check bool) "vecnorm_01" true (To_test.vecnorm_01 ())

let vecnorm_02 () =
  Alcotest.(check bool) "vecnorm_02" true (To_test.vecnorm_02 ())

let is_triu_1 () =
  Alcotest.(check bool) "is_triu_1" true (To_test.is_triu_1 ())

let mpow () =
  Alcotest.(check bool) "mpow" true (To_test.mpow ())

Now let us look at the last section of a test unit.

(* Section #4 *)
let test_set =
  [ "rank", `Slow, rank
  ; "det", `Slow, det
  ; "vecnorm_01", `Slow, vecnorm_01
  ; "vecnorm_02", `Slow, vecnorm_02
  ; "is_triu_1", `Slow, is_triu_1
  ; "mpow", `Slow, mpow ]

In the final section, we take the functions from Section #3 and put them into a list that forms the test set. Each entry of the test set specifies the name and mode of a test. The test mode is either `Quick or `Slow. Quick tests run on every invocation of the test suite. Slow tests are for stress tests that run only on occasion, typically before a release or after a major change. We can further specify the execution order of these testing functions.
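A minimal sketch of how the two modes appear side by side in a test set, using made-up test bodies rather than Owl functions. In the alcotest versions we are aware of, passing -q (or --quick-tests) to the test executable skips the `Slow cases. Treat the exact flag as an assumption to verify against your installed version.

```ocaml
(* Illustrative checks; neither function comes from Owl. *)
let quick_check () =
  Alcotest.(check int) "sum of 1..4" 10
    (List.fold_left ( + ) 0 [ 1; 2; 3; 4 ])

let slow_check () =
  (* A larger input, standing in for an occasional stress test. *)
  let n = 1_000_000 in
  let a = Array.init n float_of_int in
  let total = Array.fold_left ( +. ) 0. a in
  let expected = float_of_int n *. float_of_int (n - 1) /. 2. in
  Alcotest.(check (float 1e-3)) "sum of 0..n-1" expected total

(* Each entry is a (name, mode, function) triple, as in Section #4. *)
let test_set =
  [ "quick_check", `Quick, quick_check
  ; "slow_check", `Slow, slow_check ]

(* ~and_exit:false lets the program continue after the run. *)
let () = Alcotest.run ~and_exit:false "demo" [ "speed levels", test_set ]
```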

After this step, the whole file is named unit_linalg.ml and placed under the test/ directory, as with all other unit test files. Now the only thing left is to add it to test_runner.ml:

let () =
  Alcotest.run
    "Owl"
    [ "stats_rvs", Unit_stats_rvs.test_set
    ; "maths", Unit_maths.test_set
    ; "linear algebra", Unit_linalg.test_set
    ...
    ; "conv3d_mec", Unit_conv_mec_naive.Conv3D_MEC.test_set
    ; "conv2d_naive", Unit_conv_mec_naive.Conv2D_NAIVE.test_set
    ; "conv3d_naive", Unit_conv_mec_naive.Conv3D_NAIVE.test_set
    ; "dilated_conv2d", Unit_dilated_conv2d.test_set
    ; "dilated_conv3d", Unit_dilated_conv3d.test_set
    ; "base: algodiff diff", Unit_base_algodiff_diff.test_set
    ; "base: algodiff grad", Unit_base_algodiff_grad.test_set
    ; "base: slicing basic", Unit_base_slicing_basic.test_set
    ; "base: pooling2d", Unit_base_pool2d.test_set
    ; "base: pooling3d", Unit_base_pool3d.test_set
    ... ]

That’s all. Now you can run make test and check whether the functions are implemented correctly. The test output is shown in Figure 11-1. It shows that all tests are successful.

Figure 11-1. All tests passed

What if one of the test functions does not pass? Let’s intentionally make a failing test, for example by asserting that the rank of the matrix in the rank test equals 1 instead of the correct answer 2, and run the test again.

As we can see in Figure 11-2, the failure was detected and logged directly onto the standard output.

Figure 11-2. Error in tests

11.3 What Could Go Wrong

“Who’s watching the watchers?” Beware that the test code itself is still code and thus can also be wrong. We need to be careful in implementing the testing code. There are certain cases that you may want to check.

11.3.1 Corner Cases

Corner cases involve situations that occur outside of normal operating parameters. Their importance is obvious in the testing of convolution operations. As the core operation in deep neural networks, convolution is complex: it takes the input, kernel, strides, padding, etc. as parameters. Therefore, special cases such as a 1x1 kernel or strides that differ in height and width are tested in various combinations, sometimes with different input data.

module To_test_conv2d_back_input = struct
    (* conv2D, 1x1 kernel *)
    let fun00 () =
      let expected =
        [| 30.0; 36.0; 42.0; 66.0; 81.0; 96.0; 102.0
         ; 126.0; 150.0; 138.0; 171.0; 204.0
         ; 174.0; 216.0; 258.0; 210.0; 261.0; 312.0 |]
      in
      verify_value test_conv2d
        [| 1; 2; 3; 3 |] [| 1; 1; 3; 3 |] [| 1; 1 |]
        VALID expected

    (* conv2D, 1x2 kernel, stride 3, width 5 *)
    let fun01 () =
      let expected =
        [| 2271.0; 2367.0; 2463.0
         ; 2901.0; 3033.0; 3165.0 |]
      in
      verify_value test_conv2d
        [| 1; 2; 3; 3 |] [| 2; 2; 3; 3 |] [| 1; 1 |]
        VALID expected

    (* conv2D, 1x2 kernel, stride 3, width 6 *)
    let fun02 () = ...

    (* conv2D, 1x2 kernel, stride 3, width 7 *)
    let fun03 () = ...

    (* conv2D, 2x2 kernel, padding: Same *)
    let fun04 () = ...

    ...

    (* conv2D, 2x2 kernel, stride 2, padding: Same *)
    let fun09 () = ...

11.3.2 Test Coverage

Another issue is test coverage, that is, the percentage of the code that is exercised by at least one test. Though we don’t seek strict 100% coverage for now, wider test coverage is always a good idea. For example, the implementation of the repeat operation takes a different path depending on whether the given axes contain one or multiple integers. Therefore, it is crucial that the test functions cover both cases.
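To make the idea concrete, the sketch below uses a toy function with two code paths and one test case per path. This repeat works on plain OCaml arrays and is only a stand-in written for this example. It is not Owl's repeat and does not follow its semantics.

```ocaml
(* Toy stand-in with two code paths; this is NOT Owl's repeat. *)
let repeat x reps =
  match reps with
  | [| r |] ->
    (* Path 1: a single count applied to every element. *)
    Array.init (Array.length x * r) (fun i -> x.(i / r))
  | _ ->
    (* Path 2: one count per element. *)
    if Array.length reps <> Array.length x then
      invalid_arg "repeat: reps must have one entry or one per element";
    Array.concat
      (Array.to_list (Array.mapi (fun i r -> Array.make r x.(i)) reps))

module To_test = struct
  (* Exercises path 1 only. *)
  let repeat_single () =
    repeat [| 1.; 2. |] [| 2 |] = [| 1.; 1.; 2.; 2. |]

  (* Exercises path 2; without it, half of repeat would go untested. *)
  let repeat_multi () =
    repeat [| 1.; 2. |] [| 1; 3 |] = [| 1.; 2.; 2.; 2. |]

  (* The error branch of path 2 deserves a case as well. *)
  let repeat_mismatch () =
    match repeat [| 1.; 2. |] [| 1; 2; 3 |] with
    | _ -> false
    | exception Invalid_argument _ -> true
end
```

A test unit that contained only repeat_single would pass while leaving the second path and its error branch unchecked.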

11.4 Use Functor

Note that we can still benefit from all the powerful features of OCaml, such as functors. For example, in testing the convolution operations, we would like to test both the implementation in the core library (which is written in C) and the one in the base library (in pure OCaml). Clearly, there is no need to write the same unit test code twice for these two implementations. To solve that problem, we have a test file unit_conv2d_generic.ml that has a large module containing all the previous four sections:

module Make (N : Ndarray_Algodiff with type elt = float) = struct
    (* Section #1 - #4 *)
    ...
end

The testing file for the core implementation, unit_conv2d.ml, then contains just one line of code:

include Unit_conv2d_generic.Make (Owl_algodiff_primal_ops.S)

The test file for the base library, unit_base_conv2d.ml, is just as short:

include Unit_conv2d_generic.Make (Owl_base_algodiff_primal_ops.S)
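The same pattern can be tried out on a small scale without Owl. In the sketch below, the Vec_ops signature and the two implementations Loop_impl and Fold_impl are hypothetical stand-ins for the core and base libraries. The Make functor holds the shared test logic and is instantiated once per implementation.

```ocaml
(* Hypothetical interface shared by the implementations under test. *)
module type Vec_ops = sig
  val name : string
  val dot : float array -> float array -> float
end

(* Implementation 1: imperative accumulation. *)
module Loop_impl : Vec_ops = struct
  let name = "loop"
  let dot a b =
    let acc = ref 0. in
    Array.iteri (fun i ai -> acc := !acc +. (ai *. b.(i))) a;
    !acc
end

(* Implementation 2: map, then fold. *)
module Fold_impl : Vec_ops = struct
  let name = "fold"
  let dot a b = Array.fold_left ( +. ) 0. (Array.map2 ( *. ) a b)
end

(* The test logic is written once, against the interface only. *)
module Make (N : Vec_ops) = struct
  let approx_equal a b = abs_float (a -. b) < 1e-6

  module To_test = struct
    let dot_basic () =
      approx_equal (N.dot [| 1.; 2.; 3. |] [| 4.; 5.; 6. |]) 32.
    let dot_empty () = approx_equal (N.dot [||] [||]) 0.
  end

  (* (name, function) pairs, labelled with the implementation name. *)
  let test_set =
    [ N.name ^ ": dot_basic", To_test.dot_basic
    ; N.name ^ ": dot_empty", To_test.dot_empty ]

  let run_all () = List.for_all (fun (_, f) -> f ()) test_set
end

(* One line per implementation, as in unit_conv2d.ml. *)
module Test_loop = Make (Loop_impl)
module Test_fold = Make (Fold_impl)

let () =
  Printf.printf "loop: %b, fold: %b\n"
    (Test_loop.run_all ()) (Test_fold.run_all ())
```

Adding a third implementation then costs one more functor application rather than another copy of the tests.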

11.5 Performance Tests

For a numerical library, being able to calculate correct results is not enough. How fast a function runs also matters. In fact, it matters a lot in modern real-time data analysis, which has wide applications in fields such as finance, robotics, and flight control. Therefore, in addition to correctness tests, performance tests are also included in the Owl testing framework. The following simple generic function runs a target function a given number of times and then calculates the mean execution time:

(* test one operation c times, output the mean time *)
let test_op s c op =
  let ttime = ref 0. in
  for i = 1 to c do
    Gc.compact ();
    let t0 = Unix.gettimeofday () in
    let _ = op () in
    let t1 = Unix.gettimeofday () in
    ttime := !ttime +. (t1 -. t0)
  done;
  ttime := !ttime /. (float_of_int c);
  Printf.printf "| %s :\t %.8fs \n" s !ttime;
  flush stdout

The following function is similar, but it also prints the time used by each individual evaluation as soon as that evaluation finishes:

(* test one operation c times, output the used time in each evaluation *)
let test_op_each c op =
  Printf.printf "| test some fun %i times\n" c;
  let ttime = ref 0. in
  for i = 1 to c do
    Gc.compact ();
    let t0 = Unix.gettimeofday () in
    let _ = op () in
    let t1 = Unix.gettimeofday () in
    Printf.printf "| #%0i\t:\t %.8fs \n" i (t1 -. t0);
    flush stdout;
    ttime := !ttime +. (t1 -. t0)
  done;
  ttime := !ttime /. (float_of_int c);
  Printf.printf "| avg.\t:\t %.8fs \n" !ttime

With these two generic functions, we can write up a list of tests very quickly. An example that tests the execution time of various matrix operations is shown as follows:

let _ =
  Random.self_init ();
  let m, n = 5000, 20000 and c = 1 in
  print_endline (String.make 60 '+');
  Printf.printf "| test matrix size: %i x %i    exps: %i\n" m n c;
  print_endline (String.make 60 '-');
  let x, y = (M.uniform Float64 m n), (M.uniform Float64 m n) in
  test_op "empty             " c (fun () -> M.empty Float64 m n);
  test_op "zeros             " c (fun () -> M.zeros Float64 m n);
  test_op "col               " c (fun () -> M.col x (n-1));
  test_op "row               " c (fun () -> M.row x (m-1));
  test_op "cols              " c (fun () -> M.cols x [|1;2|]);
  test_op "rows              " c (fun () -> M.rows x [|1;2|]);
  test_op "map               " c (fun () -> M.map (fun y -> 0.) x);
  ...
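When timings need to be compared programmatically rather than read from the console, a variant that returns the mean instead of printing it can be convenient. The sketch below is one such variant under stated assumptions: time_op and sum_to are names invented for this example, and Sys.time (processor time) is used in place of Unix.gettimeofday (wall-clock time) only to keep the example free of the unix library.

```ocaml
(* Hypothetical variant of test_op: run op c times and return the
   mean processor time in seconds instead of printing it. *)
let time_op c op =
  if c <= 0 then invalid_arg "time_op: c must be positive";
  let total = ref 0. in
  for _i = 1 to c do
    Gc.compact ();
    let t0 = Sys.time () in
    ignore (op ());
    let t1 = Sys.time () in
    total := !total +. (t1 -. t0)
  done;
  !total /. float_of_int c

(* A made-up workload: sum the integers from 1 to n. *)
let sum_to n () =
  let s = ref 0 in
  for i = 1 to n do
    s := !s + i
  done;
  !s

let () =
  let small = time_op 5 (sum_to 1_000) in
  let large = time_op 5 (sum_to 1_000_000) in
  Printf.printf "| small :\t %.8fs\n| large :\t %.8fs\n" small large
```

Returning the value makes it possible, for instance, to fail a test when an operation becomes slower than a chosen bound, although such bounds are machine dependent and should be set generously.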

11.6 Summary

In this chapter, we briefly introduced how unit tests are performed with the alcotest framework in the existing Owl codebase. We used an example piece of test code for the linear algebra module in Owl to demonstrate the general structure of Owl test code. We then discussed some tips we find helpful in writing tests, such as considering corner cases, test coverage, and using functors to simplify the test code. In practice, we find that unit tests come in really handy during development, and we just cannot have too many of them.