
1 Introduction

Tensor Shape Checking and Its Difficulties. Tensor shape mismatch is one of the common sources of dynamic errors in programs using tensors (i.e., multi-dimensional arrays). For example, the reshape operation on tensors takes a tensor x and an integer list S and returns a new tensor of shape S obtained by realigning the elements of x. The input and output tensors must have the same number of elements; a tensor of shape [2; 3; 4] can be reshaped into shape [3; 2; 4], while trying to reshape it into [3; 4] results in a runtime error.
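For instance, with OCaml-Torch the mismatch above surfaces only when the program is run. Below is a minimal sketch of ours, assuming the usual Tensor.zeros and Tensor.reshape bindings of OCaml-Torch:

```ocaml
open Torch

let () =
  let t = Tensor.zeros [ 2; 3; 4 ] in               (* 24 elements *)
  let _ok = Tensor.reshape t ~shape:[ 3; 2; 4 ] in  (* 24 elements: fine *)
  (* 3 * 4 = 12 <> 24 elements: fails at runtime *)
  let _bad = Tensor.reshape t ~shape:[ 3; 4 ] in
  ()
```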

Early detection of tensor shape mismatch errors is particularly critical for deep learning programs, where tensors are used pervasively. Since deep learning programs often take a considerable amount of time to train networks, it is not unusual for a program to spend hours or even days computing the weights of a deep neural network, only to be terminated by a single tensor shape mismatch error that throws away the trained weights. Even worse, some tensor shape mismatches can be harder to notice: mixing up the height and the width of square images raises no runtime error but degrades the performance of the neural network.

The existing work on static detection of tensor shape mismatch errors can be classified into two categories. One is the whole-program analysis approach [17, 31], which collects tensor shape information by partially evaluating the program in the style of abstract interpretation. The other is the type-based approach [3, 25], which expresses the shapes of tensors as part of the type information. Still, none of them is fully satisfactory: they are either too conservative, rejecting valid programs, or they fail to detect some shape mismatch errors.

This paper pursues the type-based approach, as it is expected to provide modular detection of tensor shape inconsistencies. Designing an appropriate type system and a type inference procedure to reason about tensor shapes is challenging because shapes are first-class objects. For example, the library function Tensor.zeros of OCaml-Torch [4] (which provides OCaml bindings for libtorch [20]) takes a list S of integers and returns a new tensor whose shape is S. Thus, we have to work with dependent types: Tensor.zeros would be given the type \( S\mathbin {:}{ \texttt {int list}} \rightarrow \{r:{ \texttt {tensor}}\mid r.{ \texttt {shape}}=S\}\). It is difficult to infer such dependent (refinement) types fully automatically. Yet, we wish to avoid burdening programmers with too many type annotations.

Another difficulty is that shape constraints can be so complex that even type checking, let alone inference, can be too costly or impossible. For instance, the reshape operation explained earlier needs a proof that the shape of the input tensor x is compatible with the given shape \(S=[s_1;\ldots ;s_n]\) (i.e., if the shape of x is \([s_1';\ldots ;s_m']\), then \(\varPi _{i=1}^m s_i' = \varPi _{i=1}^n s_i\) must hold). Thus, type checking requires complex reasoning about (non-linear) integer arithmetic and lists.

Overview of Our Approach. Based on the observations above, we propose an approach that is expected to work well in practice despite the above-mentioned difficulties. Our approach is characterized by three main features: best-effort type inference, hybrid type checking, and gradual typing [27]. We explain them using our prototype tool GraTen.

Best-Effort Type Inference. GraTen does not try to infer the most general types; it performs type/shape inference in a best-effort manner. Thanks to this design choice, GraTen works even if no type annotations are provided (even though the underlying type system involves dependent types), and yet it can statically detect some (though not necessarily all) shape mismatch errors.

Fig. 1. An OCaml program written with OCaml-Torch.

As an example, let us consider the program in Figure 1. The function model takes an integer parameter s, defines functions f and g, and returns a layer (a function that takes a tensor and returns a tensor) that composes f and g. The definitions of f and g are omitted here, but their types are assumed to be as follows, where s in the type of f is the argument of model and the function \({ \texttt {nth}}(n,S)\) returns the n-th element of the list S (the index starts at 0).

$$\begin{aligned} { \texttt {f}}:&\,\, x{:}\{ \nu :{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}}) = 1 \} \rightarrow { \texttt {tensor}}\left( \left[ { \texttt {nth}}(0,x.{ \texttt {shape}})/{ \texttt {s}}\right] \right) \\ { \texttt {g}}:&\,\, { \texttt {tensor}}([10]) \rightarrow { \texttt {tensor}}([1]) \end{aligned}$$

These types indicate that f takes a 1-dimensional tensor (i.e., a vector) and returns a vector whose length equals the length of the argument vector divided by s, and that g expects a vector of length 10 and returns a vector of length 1. The formal syntax of types will be introduced later in Section 2.

For the program above, GraTen’s best-effort inference outputs the following type for the function model.

$$ s{:}{ \texttt {int}}\rightarrow x{:}\left\{ \nu {:}{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}}) = 1 \wedge { \texttt {nth}}(0,\nu .{ \texttt {shape}})/{ \texttt {s}} = 10 \right\} \rightarrow { \texttt {tensor}}([1]) $$

Here, the constraint \({{ \texttt {nth}}(0,\nu .{ \texttt {shape}})}/{{ \texttt {s}}}=10\) on the shape of x is necessary for this program not to raise a shape mismatch error at the application of g. The inferred type of model is used to prevent any calls to model that violate the constraint. Indeed, GraTen rejects the call on line 4 of Figure 1, where the arguments do not satisfy the constraint \(\frac{{ \texttt {nth}}(0,\nu .{ \texttt {shape}})}{{ \texttt {s}}} = 10\). As this example shows, our approach can statically detect shape mismatches when enough type information has been obtained from the best-effort type inference or from user-provided type annotations.

Fig. 2. The program from Figure 1 with a small modification.

Fig. 3. The program returned by GraTen given the program in Figure 2.

Hybrid Type Checking. Another main feature of our approach is hybrid type checking: we combine static and dynamic checking. The type checker inserts assertions at program points where type safety is not statically guaranteed, à la Knowles and Flanagan’s hybrid type checking [16]. For example, consider the program in Figure 2, which is obtained by adding a conditional branch to the one in Figure 1. The types of the then and else branches of the if expression are inferred to be \({ \texttt {tensor}}({ \texttt {x}}.{ \texttt {shape}})\) and \({ \texttt {tensor}}([\frac{{ \texttt {nth}}(0,{ \texttt {x}}.{ \texttt {shape}})}{{ \texttt {s}}}])\), respectively. In this case, the type of y is inferred to be simply \({ \texttt {tensor}}\), without any information about its shape, and the inferred type of model is as follows.

$$ s{:}{ \texttt {int}}\rightarrow x{:}\{ \nu :{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}}) = 1 \} \rightarrow { \texttt {tensor}}([1]) $$

Thus, the best-effort inference of GraTen fails to capture the constraint \(\frac{{ \texttt {nth}}(0,\nu .{ \texttt {shape}})}{s}=10\) for x due to the imprecise type information for y. Along with the inferred types, GraTen outputs the program in Figure 3, which is the same as the original program except for the assertion inserted at the argument of g. Since the statically inferred type of y fails to guarantee that the application of g to y does not lead to a shape mismatch error, GraTen inserts the assertion to check the requirement dynamically.
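Figure 3 itself is not reproduced here, but the effect of the inserted check can be pictured by the following sketch of ours (Tensor.shape is the OCaml-Torch function returning a tensor’s shape as an int list):

```ocaml
(* Hypothetical rendering of the inserted run-time check: before y is
   passed to g, assert that y is a vector of length 10, which is what
   g's type requires. *)
let checked_g g y =
  assert (Tensor.shape y = [ 10 ]);
  g y
```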

Fig. 4. The program from Figure 2 after adding type annotations.

Gradual Typing. Lastly, our approach incorporates gradual typing [27] so that users can improve the precision of inferred types by adding type annotations. For example, consider the program in Figure 4, which is obtained from the one in Figure 2 by adding a type annotation to y. With this annotation, GraTen infers the same type for model as it did for the model in Figure 1, and no assertions are inserted. Thus, adding correct type annotations improves the type checking and decreases the number of inserted assertions.

Thanks to the best-effort inference, users need not add type annotations everywhere in the program. They can focus on the program points where the static inference did not perform well, which are indicated by the inserted assertions. We prove that our type system satisfies the gradual guarantee [27], which ensures that adding a type annotation preserves the typeability and the behavior of the program (with some assertions inserted) regardless of its precision, as long as the annotation does not disagree with the program.

Among the three features, hybrid type checking was first proposed by Knowles and Flanagan [16], and our gradual typing is closely related to gradual refinement types [18], but we believe that the particular combination of the three features is new. In particular, unlike the original gradual refinement types [18], we insert assertions instead of carrying around evidence terms [11] during reduction to guarantee type safety.

The contributions are summarized as follows. (i) The formalization of a type system that combines hybrid type checking and gradual typing. We define our type system as a type-based transformation relation from source programs to programs with run-time assertion checks, and we prove the soundness of our type system as well (Section 2). (ii) A proof that our system satisfies the gradual guarantee [27] (Section 3). (iii) An implementation of best-effort type inference in a prototype system GraTen (Section 4). (iv) An experimental evaluation of GraTen using the example deep learning programs bundled with the OCaml-Torch library. We confirm that GraTen can statically type-check the programs effectively with a reasonable number of type annotations (Section 5).

2 A Gradually-Typed Language with Refinement Types

In this section, we formalize our type system and the translation to insert assertions. We first introduce the source and target languages of the translation in Sections 2.1 and 2.2. We then formalize the type system and the translation and prove their soundness in Section 2.3. The gradual guarantee is discussed later in Section 3.

2.1 Source Language

We consider a call-by-value functional language, whose syntax is given in Figure 5. Throughout this paper, \(n\), \(c\), and \(x\) respectively denote integers, constants (including integers and primitive functions) and variables. The base types B and refinement predicates \(\varphi \) are explained later.

Fig. 5. Syntax of the source language, the types and the type environments.

Type annotations can be added to the function arguments \(\lambda x{:}\tau .M\), recursive functions \({ \texttt {fix}}(f{:}(x{:}\tau _1\mathbin {\rightarrow }\tau _2),x,M)\) and to arbitrary expressions by \((M\mathbin {:}\tau )\). In the implementation of GraTen, users may omit the type annotations in lambda expressions and recursive functions as the best-effort type inference tries to complete them.

The argument of a function application and the branching condition of an if-expression are restricted to variables for the sake of simplicity of typing rules. Note that this restriction does not lose generality, as a general function application \(M_1 \, M_2\) can be normalized to \({ \texttt {let}}\ f=M_1\ { \texttt {in}}\ { \texttt {let}}\ x=M_2\ { \texttt {in}}\ f\,x\).

Types are defined following the standard definition of refinement types. Intuitively, the type \(\{x\mathbin {:}B\mid \varphi \}\) describes a value \(x\) of type \(B\) such that \(\varphi \) holds. For example, \(\{x\mathbin {:}{ \texttt {int}}\mid x \ge 0\}\) is the type of non-negative ints. We may omit the refinement predicates when they are true. For example, we may write \(\{ x\mathbin {:}{ \texttt {int}}\mid { \texttt {true}} \}\) as \({ \texttt {int}}\).

The language presented so far is general; in GraTen, it is instantiated to a language for tensor programs by defining the base types and refinement predicates as in Figure 6, and by assuming that primitive operations on tensors are included in the set of constants ranged over by \(c\). The refinement predicates, shapes and sizes are expressions of type \({ \texttt {bool}}\), \({ \texttt {int}}\ { \texttt {list}}\) and \({ \texttt {int}}\), respectively. The supported predicates are those describable by quantifier-free formulas of first-order logic. As shown in the definition, they may use built-in predicates and functions over integer lists, such as append, and primitives of integer arithmetic, in order to express common tensor operations. We implicitly assume that the refinement predicates are well formed (as defined in the full version [13]).

Fig. 6. Syntax of base types B and predicates \(\varphi \) in GraTen.

2.2 Target Language

Fig. 7. Syntax of the target language.

As explained in Section 1, we insert run-time checks into places where type safety cannot be statically guaranteed. Figure 7 shows the syntax of programs obtained by the insertion of assertions. One main difference from the source language is the addition of the assertion \(\textbf{assert}(\varphi ); N\), which is used to implement the run-time checks. Like Knowles and Flanagan’s hybrid type system [16] (and unlike the blame calculus [32]), we guarantee the safety of target programs by assertions. Compared with the blame calculus, this method is expected to be easier to implement, since most modern programming languages are equipped with assertions, and more efficient, in that it avoids the accumulation of dynamic casts at runtime. This implementation of the dynamic cast is possible since our system is only “gradualized” at the predicate level of the refinement types, while the underlying simple type system is static.

Another difference is that the binders in let expressions are annotated with their types. This is required for defining the precision relation over cast terms in Section 3.

Fig. 8. Selected rules of substitution and reduction of the target language (the full definition is given in the full version [13]).

The substitution and reduction rules for cast terms are presented in Figure 8. The evaluation \({ \texttt {ev}}(c, v)\) of a primitive function is defined to be the return value of the primitive function c applied to an argument v if v meets the constraint on the argument of c, and is undefined otherwise. We write \(N \Uparrow \) if there exists an infinite reduction sequence from N.
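Figure 8 is not reproduced above; judging from the surrounding discussion, the reduction rules for assertions are presumably of the following form (our reconstruction, writing \(\varphi \downarrow b\) for “the closed predicate \(\varphi \) evaluates to the boolean b”):

$$ \frac{\varphi \downarrow { \texttt {true}}}{\textbf{assert}(\varphi );\,N \longrightarrow N} \qquad \frac{\varphi \downarrow { \texttt {false}}}{\textbf{assert}(\varphi );\,N \longrightarrow { \texttt {error}}} $$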

The substitution for cast terms is defined in the standard manner, except that the implicitly-annotated type information and the predicates in assertions need to be updated as well. As can be seen in the definition of cast term reduction, these implicitly-annotated types are only required for the sake of the formalization and are ignored at runtime.

Fig. 9. Typing rules for the cast terms \(\varGamma ;\varphi \vdash N:\tau \).

Fig. 10. Subtyping rules.

We also introduce the type derivation rules for the cast terms \(\varGamma ;\varphi \vdash N:\tau \) in Figure 9. This relation is used in the discussion of the soundness of the type system in Section 2.3. The quadruple relation \(\varGamma ;\varphi \vdash N:\tau \) denotes that a cast term N has type \(\tau \) under a type environment \(\varGamma \) and a logical context \(\varphi \). The logical context \(\varphi \) holds the logically valid predicates at the respective program points. New predicates are added in the then and else branches in (CT-If), and in the post-assertion cast term in (CT-Ass). Subsumption is allowed by (CT-Sub) via the subtyping relation \(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) (Figure 10), which is defined in a standard manner.

2.3 Typing Rules

Fig. 11. Type derivation rules for the source language \(\varGamma ;\varphi \vdash M \leadsto N:\tau \).

Inserting Assertions. Next, we discuss the typing rules for the source language and the insertion of assertions into it. Figure 11 defines the type judgement and cast insertion relation. The intuition of the 5-ary relation \(\varGamma ;\varphi \vdash M \leadsto N : \tau \) is: under a type environment \(\varGamma \) and a logical context \(\varphi \), a term M translates to a cast term N and has type \(\tau \). If we ignore the part “\(\leadsto N\)” and replace the consistent subtyping relation \(\lesssim \) with the standard subtyping relation on refinement types (Figure 10), our type system is a standard refinement type system. Thus, the main novelty in the rules in Figure 11 lies in the use of the consistent subtyping relation \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\), which is explained below.

The consistent subtyping relation \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) (Figure 12) is used in the cast insertion relation to guarantee that there exists a value that has both of the types \(\tau _1\) and \(\tau _2\) under \(\varGamma \) and \(\varphi \), and to produce an assertion term N that checks at runtime whether a value statically known to be of type \(\tau _1\) can be used as a value of type \(\tau _2\).

The rule for the base case (Cast-Base) checks whether there exist a value, and an assignment of values to the variables in the type environment, that satisfy both \(\tau _1\) and \(\tau _2\). Intuitively, this holds if \(\tau _1\) is castable to \(\tau _2\) for some runtime values. The rule also produces a lambda function that implements the cast with an assertion. It is defined in such a way that \(\varphi _2\) can always be used as the content \(\varphi '\) of the assertion, but \({ \texttt {true}}\) can also be used for \(\varphi '\) if \(\varphi _1\) implies \(\varphi _2\). Note that the definition cannot simply fix \(\varphi _2\) as the content of the assertion; otherwise Proposition 1 below would not hold.
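As a small worked example of our own (not taken from the paper’s figures), consider casting non-negative integers to integers equal to 10. The two refinements are jointly satisfiable (e.g., by 10), so (Cast-Base) applies, and the produced term checks the target refinement at runtime:

$$ \emptyset ;{ \texttt {true}}\vdash \{\nu {:}{ \texttt {int}}\mid \nu \ge 0\} \lesssim \{\nu {:}{ \texttt {int}}\mid \nu = 10\} \leadsto \lambda x.\,(\textbf{assert}(x=10);\,x) $$

Conversely, when casting \(\{\nu {:}{ \texttt {int}}\mid \nu = 10\}\) to \(\{\nu {:}{ \texttt {int}}\mid \nu \ge 0\}\), the source refinement implies the target one, so \({ \texttt {true}}\) may be chosen as the asserted predicate, in line with Proposition 1 below.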

The rule for function types (Cast-Fun) recursively checks the castability of the argument types and the return types and combines the assertion terms for them. Notice how the subsumption for the return types \(\tau _2\) and \(\tau _4\) has the meet of the two argument types \(\tau _1\sqcap \tau _3\) in the type environment. The meet of two types (Figure 12) is defined as a conjunction of the refinement predicates.

The consistent subtyping relation can be seen as a gradualization of the subtyping relation \(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) (Figure 10). In fact, when a type \(\tau _1\) is a subtype of another type \(\tau _2\), the assertion term generated by casting \(\tau _1\) to \(\tau _2\) can be chosen so that it only contains assertions that always succeed, which can be erased by a simple optimization. The following proposition states this fact. Note that it corresponds to the blame-subtyping theorem, one of the criteria for gradual typing presented in [27].

Fig. 12. Definition of the consistent subtyping relation \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\).

Proposition 1

\(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) implies \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) for some N where all the assertions in N are of the form \(\textbf{assert}({ \texttt {true}}); N'\).

Type Safety. We conclude this section with a note on the soundness of our type system. The soundness relies on the fact that if the source program is well-typed, then the program after assertion insertion is also well-typed.

The most critical part of the proof is to show that the assertion term can be assigned a function type from the pre-assertion type to the post-assertion type.

Lemma 1

\(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) implies \(\varGamma ;\varphi \vdash N:x{:}\tau _1\rightarrow \tau _2\) for some variable x that does not occur in \(\tau _2\).

The proof is found in the full version [13]. With Lemma 1, we can prove that the assertion-inserted program can be assigned the same type as that of the original program.

Lemma 2 (Assertion Insertion Preserves Types)

\(\varGamma ;\varphi \vdash M\leadsto N:\tau \) implies \(\varGamma ;\varphi \vdash N:\tau \).

We can also prove the standard progress and preservation properties under a reasonable assumption that the types of the primitive functions are properly defined as follows (see the full version [13] for the proofs).

Assumption 1

\(\vdash c\,v:\tau \) implies \( ev (c,v)\) is defined and \(\vdash ev (c, v):\tau \).

Combining Lemma 2 with the progress and preservation properties, we obtain the type safety as follows.

Theorem 1 (Type Safety)

With Assumption 1, \(\emptyset ;{ \texttt {true}}\vdash M\leadsto N:\tau \) implies \(N \longrightarrow ^* v\) for some v, \(N \Uparrow \), or \(N \longrightarrow ^* { \texttt {error}}\).

The type safety property states that a well-typed program does not cause untrapped dynamic errors. The only case where a cast-inserted program causes an untrapped error is when the result of applying a primitive function is undefined (i.e., \( ev (c,v)\) is undefined). The type safety property ensures that such untrapped errors do not happen for well-typed terms as long as \( ty (c)\) is defined appropriately.

3 Gradual Guarantee

In a standard gradual type system, programs are compared by their precision, i.e., the amount of information contained in their type annotations. This notion is used to define the gradual guarantee [27], which is the core property of gradual typing. The gradual guarantee comes in two parts. The first is the static gradual guarantee, which states that decreasing the precision of a type annotation in a well-typed program preserves the typeability of the program, at a less precise type. The second is the dynamic gradual guarantee, which states that a less precise program behaves the same as the more precise one, with fewer assertion errors.

Below we first define the precision for the language introduced in Section 2. We then show that our type system satisfies the gradual guarantee.

Fig. 13. Precision relation of types and type environments.

Precision. Figure 13 defines the precision relation \(\widetilde{x}\vdash \tau _1\sqsubseteq \tau _2\) on types by using logical implication between the refinement predicates. The sequence of variables \(\widetilde{x}\) records the variables that may appear in the refinement predicates. The following is an example of the type precision relation for a base type.

$$ \vdash \{x:{ \texttt {tensor}}\mid x.{ \texttt {shape}}=[3]\} \sqsubseteq \{x:{ \texttt {tensor}}\mid { \texttt {len}}(x.{ \texttt {shape}})=1\} $$

Note that in the rule (Prec-Fun), the precision of the argument type and that of the return type are compared independently; the type information on x is not used in the comparison of the return types. This is in contrast with the rule (Sub-Fun) in Figure 10 for subtyping. Figure 13 also extends the relation to \(\varGamma \sqsubseteq \varGamma '\) on type environments. The precision relation is further extended to the relation \(\widetilde{x}\vdash M \sqsubseteq M'\) on terms by the rules in Figure 14, where \(\widetilde{x}\) is the sequence of variables in scope. Finally, we define the precision relation on cast terms, also in Figure 14. Unlike the term precision relation, the precision relation \(\varGamma ;\varphi \vdash N_1\sqsubseteq N_2\) on cast terms requires the type environment \(\varGamma \) and the logical context \(\varphi \) in the judgement, and the refinement extraction \(\Phi (\varGamma )\) from the type environment is used in the rule (PC-Assert). We also assume the following property of the evaluation of primitive functions.

Assumption 2

If \( ev (c,v_2)\) and \( ev (c,v_1)\) are both defined, then \(v_1\sqsubseteq v_2\) implies \( ev (c,v_1)\sqsubseteq ev (c,v_2)\).

Fig. 14. Selected rules for the precision relation on terms and cast terms (the full definition is found in the full version [13]).

Intuitively, the precision relation on cast terms is designed in such a way that, when \(\emptyset ;{ \texttt {true}}\vdash N_1\sqsubseteq N_2\) holds, the assertions in \(N_1\) are stricter than those in \(N_2\), and therefore the dynamic checks in \(N_1\) are more likely to fail than those in \(N_2\). The following two propositions state this intuition (the proofs are found in the full version [13]).

Proposition 2

Suppose \(\emptyset ;{ \texttt {true}}\vdash N_1:\tau \) and \(\emptyset ;{ \texttt {true}}\vdash N_2:\tau '\). Then, \(\emptyset ;{ \texttt {true}}\vdash N_1\sqsubseteq N_2\) and \(N_1\longrightarrow N_1'\) imply \(N_2\longrightarrow N_2'\) and \(\emptyset ;{ \texttt {true}}\vdash N_1'\sqsubseteq N_2'\) for some \(N_2'\).

Proposition 3

Suppose \(\emptyset ;{ \texttt {true}}\vdash N_1:\tau \) and \(\emptyset ;{ \texttt {true}}\vdash N_2:\tau '\). Then, \(\emptyset ;{ \texttt {true}}\vdash N_1\sqsubseteq N_2\) and \(N_2\longrightarrow N_2'\) imply either of the following.

  • \(N_1\longrightarrow N_1'\) and \(N_1'\sqsubseteq N_2'\) for some \(N_1'\)

  • \(N_1\longrightarrow { \texttt {error}}\)

Gradual Guarantee. We show that our system satisfies the gradual guarantee [27]. First, we prove that the consistent subtyping relation \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) is upper-closed with respect to the precision relation \(\widetilde{x}\vdash \tau _1\sqsubseteq \tau _3\) on types.

Lemma 3

\(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N_1\), \( dom (\varGamma )\vdash \tau _1\sqsubseteq \tau _3\), \( dom (\varGamma )\vdash \tau _2\sqsubseteq \tau _4\) and \(\varGamma \sqsubseteq \varGamma '\) imply \(\varGamma ';\varphi \vdash \tau _3\lesssim \tau _4\leadsto N_2\) for some \(N_2\).

We can further prove that the cast term \(N_2\) in the statement of Lemma 3 is less precise than the original cast term \(N_1\) as follows.

Lemma 4

Suppose \(\varGamma \sqsubseteq \varGamma ', dom (\varGamma )\vdash \tau _1\sqsubseteq \tau _1'\) and \( dom (\varGamma )\vdash \tau _2\sqsubseteq \tau _2'\). Then, \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) and \(\varGamma ';\varphi \vdash \tau _1'\lesssim \tau _2'\leadsto N'\) imply \(\varGamma ;\varphi \vdash N\sqsubseteq N'\).

Using the above properties, we can prove the following lemma which constitutes the core part of the proof of the gradual guarantee.

Lemma 5

\(\varGamma \sqsubseteq \varGamma '\), \( dom (\varGamma )\vdash M \sqsubseteq M'\) and \(\varGamma ;\varphi \vdash M \leadsto N : \tau \) imply \(\varGamma ';\varphi \vdash M' \leadsto N' : \tau '\), \(\varGamma ;\varphi \vdash N \sqsubseteq N'\) and \( dom (\varGamma )\vdash \tau \sqsubseteq \tau '\) for some \(N'\) and \(\tau '\).

Finally, we can show the static and dynamic gradual guarantee as follows.

Theorem 2 (Static gradual guarantee)

\(\emptyset \vdash M_1 \sqsubseteq M_2\) and \(\vdash M_1 :\tau _1\) imply \(\vdash M_2 :\tau _2\) and \(\emptyset \vdash \tau _1 \sqsubseteq \tau _2\) for some \(\tau _2\).

Proof

This follows immediately from Lemma 5.    \(\square \)

Theorem 3 (Dynamic gradual guarantee)

Suppose \(\emptyset \vdash M_1 \sqsubseteq M_2\) and \(\vdash M_1 \leadsto N_1:\tau _1\). Then, there exist \(N_2\) and \(\tau _2\) that satisfy all of the following.

  • \(\vdash M_2 \leadsto N_2 : \tau _2\).

  • \(N_1\longrightarrow ^*v_1\) implies \(N_2\longrightarrow ^*v_2\) and \(v_1\sqsubseteq v_2\) for some \(v_2\).

  • \(N_1\Uparrow \) implies \(N_2\Uparrow \).

  • \(N_2\longrightarrow ^*v_2\) implies \(N_1\longrightarrow ^*v_1\) and \(v_1\sqsubseteq v_2\) for some \(v_1\), or \(N_1\longrightarrow ^*{ \texttt {error}}\).

  • \(N_2\Uparrow \) implies \(N_1\Uparrow \) or \(N_1\longrightarrow ^*{ \texttt {error}}\).

Proof

By Lemma 5, \(\vdash M_2\leadsto N_2:\tau _2\) holds for some \(N_2\) and \(\tau _2\) where \(\vdash N_1 \sqsubseteq N_2\) and \(\vdash \tau _1 \sqsubseteq \tau _2\). Also, from Lemma 2, we obtain \(\vdash N_1:\tau _1\) and \(\vdash N_2:\tau _2\). Using Proposition 2, \(N_1 \longrightarrow ^* v_1\) for some \(v_1\) implies \(N_2 \longrightarrow ^* v_2\) for some \(v_2\) such that \(v_1 \sqsubseteq v_2\). Also, \(N_1 \Uparrow \) implies \(N_2 \Uparrow \). Using Proposition 3, \(N_2 \longrightarrow ^* v_2\) for some \(v_2\) implies \(N_1 \longrightarrow ^* v_1\) for some \(v_1\) such that \(v_1 \sqsubseteq v_2\), or \(N_1 \longrightarrow ^* { \texttt {error}}\). Also, \(N_2 \Uparrow \) implies \(N_1 \Uparrow \) or \(N_1 \longrightarrow ^* { \texttt {error}}\).    \(\square \)

4 Best-Effort Type Inference

Thanks to our combination of gradual typing and hybrid checking described in the previous sections, a type inference procedure need not necessarily output the most precise types. It is allowed to perform type inference only in a best-effort manner, and the results in the previous sections do not depend on the particular design of the type inference procedure. Nevertheless, it is desirable for the procedure to infer reasonably good types. In this section, we report a specific design of the type inference procedure, which we have implemented in our prototype system GraTen; as reported in Section 5, our procedure works reasonably well for actual deep learning programs.

4.1 Overview of Type Inference and Checking in GraTen

The type checking in GraTen consists of the following three phases: (1) simple type inference, (2) best-effort refinement type inference, and (3) consistent subtyping checking and assertion insertion.

In the first phase, GraTen performs simple type inference using the standard Hindley-Milner algorithm and annotates each AST node with its inferred simple type.

In the second phase, GraTen first collects all the consistent subtyping constraints of the form \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) from the source program. When it encounters AST nodes whose refinement type cannot be constructed directly, GraTen generates template refinement types using the simple types inferred in the previous phase. Template refinement types may contain variables for undetermined predicates (referred to as predicate variables).

Using the collected constraints, GraTen then tries to find a solution for all of the predicate variables with its hand-made constraint solver. Constraint solving takes place at every let binding to allow let-polymorphism on shapes. We discuss the details of the solver implementation in the next subsection; at a high level, the solver tries to find a solution such that:

  • only general types are inferred, as otherwise it could result in rejecting well-typed programs.

  • \(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) holds for as many constraints \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) as possible. This is to make the cast term \(N\) consist of trivial assertions (which can statically be discharged to avoid run-time overheads; recall Proposition 1).

Given that the subtyping constraints can be expressed in the form of constrained Horn clauses (CHC) and not all the subtyping constraints need to hold, the problem above is essentially a CHC solving problem with weak constraints and maximality [22] where the optimization objective of the problem is defined by pointwise logical comparison of the solutions.

The constraint solver of GraTen does not always find a solution for all predicate variables. In such cases, GraTen assigns true to the undetermined predicate variables; that way, they at least do not invalidate the consistent subtyping constraints.

Note that GraTen does not take into account the consistent subtyping \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) itself when trying to find a solution, as we expect that it would be rare for a consistent subtyping \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) to hold when the subtyping relation \(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) does not hold. GraTen therefore defers the check of consistent subtyping constraints to the next phase.

In the third phase, GraTen checks the validity of the consistent subtyping constraints using the solutions for the predicate variables obtained in the previous phase. GraTen first attempts to simplify and verify the constraints with its hand-made solver, and falls back on Z3 [5] with a timeout if that fails. Simultaneously, it also generates the assertion terms and inserts them into the source program.

4.2 Heuristics of Best-Effort Type Inference

To solve the subtyping constraints explained above, we have implemented a hand-made constraint solver. GraTen does not use off-the-shelf SMT or CHC solvers such as Z3 [5], since the refinement predicates in GraTen often involve complicated predicates on integer lists, for which standard SMT/CHC solvers cannot find a solution in a reasonable time. Also, while GraTen should infer general types (so as not to reject well-typed programs), those generic solvers are not biased towards generality and may return an arbitrary (non-general) solution that satisfies the constraints. This subsection describes the heuristics used in GraTen for constraint solving.

Preparation for the inference already starts when GraTen generates the template refinement types during constraint collection. For each predicate variable generated, GraTen attaches the set of program variables it depends on, which is calculated from the type environment. This is used later in constraint solving to avoid assigning irrelevant predicates to the predicate variables. We denote predicate variables as \(p_{\widetilde{x}}(\widetilde{y})\), where \(\widetilde{x}\) denotes the set of program variables the variable depends on and \(\widetilde{y}\) denotes its parameters.

After collecting the constraints, GraTen decomposes the subtyping constraints into constrained Horn clauses of the form \(\widetilde{\varphi _1}\wedge \widetilde{\varphi _2}\Rightarrow \widetilde{\varphi _3}\) following the definition of the subtyping relation (Figure 10). The notation \(\widetilde{\varphi }\) denotes a set of predicates, logically interpreted as the conjunction of the predicates. The first, second, and third sets of predicates in a clause respectively correspond to the predicates from the context \(\Phi (\varGamma )\wedge \varphi \), the refinement of the type on the left \(\varphi _1\), and that of the type on the right \(\varphi _2\). We intentionally distinguish between \(\widetilde{\varphi _1}\) and \(\widetilde{\varphi _2}\) on the left-hand side of the clauses when describing the constraint solving algorithm. For example, let us reconsider the program in Figure 2. The subtyping constraints collected from the if expression of the program are as follows, where p, q and r are the predicate variables generated for the types of s, x and the if expression, respectively.

$$\begin{aligned} \varGamma ;(s=1)&\vdash \{\nu {:}{ \texttt {tensor}}\mid q_{s,\nu }(\nu )\}<: \{\nu {:}{ \texttt {tensor}}\mid r_{s,x,\nu }(\nu )\} \\ \varGamma ;(s\ne 1)&\vdash \{\nu {:}{ \texttt {tensor}}\mid q_{s,\nu }(\nu )\}<: \{\nu {:}{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}})=1\} \\ \varGamma ;(s\ne 1)&\vdash { \texttt {tensor}}([{ \texttt {nth}}(0,x.{ \texttt {shape}})/s]) <: \{\nu {:}{ \texttt {tensor}}\mid r_{s,x,\nu }(\nu )\} \\ \text {where }\, \varGamma&:= [s\mapsto \{\nu {:}{ \texttt {int}}\mid p_{\nu }(\nu )\},x\mapsto \{\nu {:}{ \texttt {tensor}}\mid q_{s,\nu }(\nu )\}] \end{aligned}$$

These constraints are decomposed into the following clauses.

$$\begin{aligned} \begin{aligned} \{p_s(s), q_{s,x}(x), s=1\}\wedge \{ q_{s,\nu }(\nu ) \}&\Rightarrow r_{s,x,\nu }(\nu ) \\ \{p_s(s), q_{s,x}(x), s\ne 1\}\wedge \{ q_{s,\nu }(\nu ) \}&\Rightarrow { \texttt {len}}(\nu .{ \texttt {shape}}) = 1 \\ \{p_s(s), q_{s,x}(x), s\ne 1\}\wedge \{ \nu .{ \texttt {shape}}=[{ \texttt {nth}}(0,x.{ \texttt {shape}})/s] \}&\Rightarrow r_{s,x,\nu }(\nu ) \end{aligned} \end{aligned}$$
(1)

From the clauses obtained as above, GraTen tries to find a solution for the predicate variables using the algorithm presented in Algorithm 1.

The algorithm processes the constraints by first trying to find a solution for predicate variables that occur on the right-hand side of a clause \(\widetilde{\varphi _1}\wedge \widetilde{\varphi _2}\Rightarrow \widetilde{\varphi _3}\) (Lines 6-10), and then for those on the left-hand side of a clause (Lines 11-15), and it repeats this until either all of the constraints are solved or the constraints cannot be processed any further (Line 4). On Lines 8 and 13, the set of program variables \(\widetilde{x}\) of a predicate variable \(p_{\widetilde{x}}\) is used to assign predicates to the predicate variables.

During the iteration, the constraints occasionally need to be updated with the current solution \(\theta \) by applying the substitution \(\theta \) to all the predicates in the constraints. After that, we also simplify the set of clauses (with simplify in Algorithm 1) by removing from the right-hand side of a clause the predicates that trivially follow from the left-hand side, and by removing clauses whose right-hand side is empty. For example, a clause \(\{\}\wedge \{x=1\}\Rightarrow \{x=1\}\) is simplified to \(\{\}\wedge \{x=1\}\Rightarrow \{\}\), and then removed from the set of clauses.

To illustrate the behavior of Algorithm 1, consider applying it to the clauses (1). During the first iteration of the while loop (Line 4), the first for loop (Line 6) exits with an empty \(\theta \) as r appears on the right-hand side of multiple clauses and cannot be resolved here due to the check at Line 7. In the next for loop (Line 11), \(\theta \) is updated to:

$$\begin{aligned}{}[q_{s,\nu }(\nu )\mapsto \left( { \texttt {len}}(\nu .{ \texttt {shape}})=1 \wedge q'_{s,\nu }(\nu )\right) ] \end{aligned}$$
(2)

where \(q'_{s,\nu }(\nu )\) is a fresh predicate variable, and the constraints c would be updated as follows.

$$\begin{aligned} \{p_s(s), { \texttt {len}}(x.{ \texttt {shape}})=1, q'_{s,x}(x), s=1\}\wedge \{ { \texttt {len}}(\nu .{ \texttt {shape}})=1 \wedge q'_{s,\nu }(\nu ) \}&\Rightarrow r_{s,x,\nu }(\nu ) \\ \{p_s(s), { \texttt {len}}(x.{ \texttt {shape}})=1, q'_{s,x}(x), s\ne 1\}\wedge \{ \nu .{ \texttt {shape}}=[{ \texttt {nth}}(0,x.{ \texttt {shape}})/s] \}&\Rightarrow r_{s,x,\nu }(\nu ) \end{aligned}$$

The while loop exits after the second iteration, as no new predicate variables can be added to \(\theta \) and \(c=c'\) holds. Thus, we only obtain (2) from Algorithm 1. After the inference, GraTen assigns true to the remaining predicate variables p, \(q'\) and r.

(Algorithm 1: listing omitted)

5 Experiment

This section reports on experiments to evaluate the effectiveness of our approach by running our tool GraTen on the example programs bundled with the OCaml-Torch library [4]. We also checked how type annotations change the inference results.

5.1 Methods

Input and Output of GraTen. GraTen takes an OCaml program and performs type checking with its best-effort type inference. If the type checking is successful, it returns the inferred types of the top-level variables defined in the program, together with the source program with the necessary assertions inserted. Otherwise, the type checking fails with an error message.

Assertions are inserted into the output program only when they are needed. Namely, an assertion is inserted at a place where the consistent subtyping \(\varGamma ;\varphi \vdash \tau _1\lesssim \tau _2\leadsto N\) is used only when \(\varGamma ;\varphi \vdash \tau _1<:\tau _2\) does not hold (see Proposition 1).

Besides the source program, GraTen also reads the types of the library functions (including those of OCaml-Torch) from manually prepared stub files. For example, the type of tr (matrix transpose function) is defined as follows.

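The stub listing is not reproduced here; a plausible rendering in the paper’s notation (our guess, assuming tr requires a 2-dimensional tensor and swaps its two dimensions) is:

$$ { \texttt {tr}}: x{:}\{\nu {:}{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}})=2\} \rightarrow \{\nu {:}{ \texttt {tensor}}\mid \nu .{ \texttt {shape}}=[{ \texttt {nth}}(1,x.{ \texttt {shape}});{ \texttt {nth}}(0,x.{ \texttt {shape}})]\} $$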

Note that describing the types of some higher-order OCaml-Torch functions requires the polymorphic extension, which we sketch in the full version [13]. For example, the type of Layer.forward is defined as follows.

$$\begin{aligned}&\forall b_1{:}{ \texttt {bool}},b_2{:}{ \texttt {bool}}.\\&(x{:}\{x{:}{ \texttt {tensor}}\mid b_1\}\rightarrow \{y{:}{ \texttt {tensor}}\mid b_2\}) \rightarrow x{:}\{x{:}{ \texttt {tensor}}\mid b_1\}\rightarrow \{y{:}{ \texttt {tensor}}\mid b_2\} \end{aligned}$$

GraTen handles such types by instantiating the quantified parameters (\(b_1\) and \(b_2\) in the above case) with fresh predicate variables.

Test Cases. We applied GraTen to the programs under the examples/ directory of the OCaml-Torch repository. The list of programs tested is shown in Table 1. Since some programs use features of OCaml or OCaml-Torch that are not yet supported by GraTen, they were modified not to use such features, without changing the structure of the neural network. The major modifications applied to the target programs are listed below; other smaller syntactic modifications can be found in the supplementary materials.

  (M1) Replacing or removing type-polymorphic functions. Some functions that create loops, such as List.foldl, are replaced with recursive functions. Others, such as no_grad, are replaced with the type-instantiated versions.

  (M2) Removing uses of non-integer lists, especially tensor lists and layer lists. As a result, two list-taking primitive functions are removed. One is Tensor.cat, which takes a list of tensors and returns their concatenation. It is replaced with a variant Tensor.cat_ which takes only two tensors. The other is Layer.sequential, which takes a list of layers and returns a layer that sequentially applies all the input layers.

  (M3) Replacing mutable float objects with 0-dimensional tensors, as GraTen does not support reference types.

As an example of (M1) and (M2), consider the following function, which creates a list of linear layers and returns a new layer that applies all the layers in the list.

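The listing is not reproduced here; the following is our own hedged reconstruction (the function name is invented, and we assume OCaml-Torch’s Layer.linear and Layer.sequential):

```ocaml
(* Hypothetical reconstruction, not the paper's code: build n linear
   layers and compose them with Layer.sequential. *)
let make_model vs n =
  Layer.sequential
    (List.init n (fun i -> Layer.linear vs ~input_dim:(i + 1) (i + 2)))
```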

The i-th layer in the list takes a tensor whose last dimension is size i+1, and returns a tensor of the same shape except that the last dimension is changed to i+2. By the modifications (M1) and (M2), the above function definition is replaced with:

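Again, the modified listing is omitted; below is a hedged sketch of ours, additionally assuming that Layer.forward applies a layer to a tensor and Layer.of_fn wraps a tensor function as a layer:

```ocaml
(* Hypothetical reconstruction after (M1)/(M2): the list of layers and
   Layer.sequential are gone; a recursive function builds each layer
   once and composes their applications in the same order. *)
let make_model vs n =
  let rec build i k =
    if i >= n then k
    else
      let l = Layer.linear vs ~input_dim:(i + 1) (i + 2) in
      build (i + 1) (fun x -> Layer.forward l (k x))
  in
  Layer.of_fn (build 0 (fun x -> x))
```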

Some programs in the examples/ directory are excluded from the test cases for the following reasons.

  • neural_transfer uses a library function Vgg.vgg16_layers whose type cannot be described in GraTen; the relation between its inputs and its output tensor’s shape could not be expressed in the syntax supported by GraTen.

  • Programs dqn.ml, dqn_atari.ml and dqn_pong.ml in reinforcement-learning use queues which are not supported in GraTen yet.

  • env_gym_pyml.ml and venv_env_gym_pyml.ml under reinforcement-learning use Python objects, whose verification is outside the scope of this paper.

  • reinforcement-learning/policy_gradient.ml uses mutable lists which cannot be replaced with another datatype already supported in GraTen.

  • yolo/darknet.ml and translation/lang.ml use hash tables which are not supported in GraTen yet.

  • translation/dataset.ml and translation/lang.ml are irrelevant as tensor objects do not appear in them.

Evaluation. We evaluated the best-effort inference of GraTen on the following three aspects.

First, we counted the assertions inserted into the original program when GraTen is run on the target program. Since the assertions indicate the program points that could fail at runtime, a user of GraTen would want to pay attention to the location and the number of inserted assertions and try to decrease them.

Second, we counted the minimum number of type annotations required to type-check the program with the minimum number of assertions inserted. This is for evaluating the realistic burden on programmers trying to statically verify the program with type annotations. The annotations were added in such a way that the types of the functions do not lose their original generality. Type annotations are counted by the number of refinement types with non-true refinement predicates in them. For example, the following annotation counts as 3, because the refinements of the input tensor and the two output tensors are not true, while the refinement of the second argument bool is true.

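The annotated signature itself is not shown above; a hypothetical annotation with the same count (our own example, assuming a pair return type) would be

$$ x{:}\{\nu {:}{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}})=2\} \rightarrow \{\nu {:}{ \texttt {bool}}\mid { \texttt {true}}\} \rightarrow \{\nu {:}{ \texttt {tensor}}\mid \nu .{ \texttt {shape}}=x.{ \texttt {shape}}\} \times \{\nu {:}{ \texttt {tensor}}\mid \nu .{ \texttt {shape}}=x.{ \texttt {shape}}\} $$

where the three tensor refinements are non-true and count toward the total, while the true refinement on the bool argument does not.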

Third, we also measured the time taken by GraTen to analyze the unannotated and annotated programs. The experiments were conducted on a Linux machine with 12-core Intel i5-11400 (2.60GHz) and GraTen is implemented in Haskell with GHC version 9.0.2.

5.2 Experimental Results

Table 1. Results of running GraTen on the test cases. The second column is the size of the program after the modification. The third and fourth columns are the results for unannotated programs: the third column is the duration of the type checking and the fourth column is the number of assertions inserted. The fifth to seventh columns are for the annotated programs; the fifth column is the number of annotations added to the program.

Table 1 summarizes the experimental results. We analyze them from the following three aspects: assertions, type annotations, and analysis time.

Inserted Assertions. Out of the 26 programs tested, 10 programs required no type annotations to type-check without assertions, and another 7 programs type-checked without assertions after adding appropriate type annotations. For the remaining 9 programs, such as gan/began.ml and gan/gan_stability.ml, we could not eliminate all assertions, although some of them were removed after adding type annotations. The remaining assertions were due to the imprecise type signatures of some library functions. For instance, Torch.Serialize.load is a function that loads a tensor from a file, and its type signature is defined as follows.

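The stub listing is omitted above; given the next sentence, it is presumably just

$$ { \texttt {load}}: \textit{filename}{:}{ \texttt {string}}\rightarrow \{\nu {:}{ \texttt {tensor}}\mid { \texttt {true}}\} $$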

The return type of load is simply defined as tensor since it is impossible to assume any properties about its shape. As a result, an assertion was inserted to check if the loaded tensor satisfies the requirement to run the program without uncaught errors. Even adding type annotations to the loaded tensor does not remove the assertion.

Some other functions are given imprecise types due to GraTen’s immature support of polymorphic data types. For example, the type of Tensor.stack is defined as follows because GraTen does not effectively support non-integer lists yet. Refining the return types of such functions is left as future work.

(stub listing omitted)

Patterns of Added Type Annotations. As we added type annotations to the test cases, we observed that the program points that require type annotations share similarities. All of the type annotations fall into one of the following patterns.

  (P1) Branches, i.e., if expressions and match expressions with multiple branches (e.g., Figure 4 in Section 1).

  (P2) Recursive functions. For example, loop in translation/seq2seq.ml is annotated as follows.

    (annotation listing omitted)
  (P3) Higher-order shape-polymorphic arguments. For example, sample in char_rnn.ml is annotated as follows.

    (annotation listing omitted)
  (P4) Definitions of record types. The current implementation of GraTen expects the definition of a record type to describe the refinement types of its fields.

  (P5) Imprecise type signatures of primitive functions, or of user-defined functions in dependent modules. For example, translation/seq2seq.ml has the following type annotation, since the return type of Tensor.stack is only inferred to be tensor due to its imprecise type signature.

    (annotation listing omitted)

    The statically inferred type of enc_outputs here is tensor([1; enc.hidden_size]) list, so this type annotation would not be needed if the type signature of Tensor.stack were appropriately defined. Since it is not possible to statically verify the correctness of these kinds of annotations, assertions are still inserted after adding them.

The first three patterns indicate that GraTen’s current best-effort type inference does not effectively infer precise refinements for branches, recursive functions and higher-order shape-polymorphic arguments. The fourth pattern (P4) would be inevitable when using record types. It remains as future work to exempt users from having to add type annotations for (P5). With such improvements, we believe that it will become easier to find the program points that require type annotations for better inference.

Number of Type Annotations. There is no correlation between the number of assertions inserted into the unannotated program and the number of annotations needed to minimize the number of assertions.

For example, adding two type annotations to gan/gan_stability.ml resulted in removing 38 assertions. This is because GraTen inferred an imprecise type for a helper function resnet_block without any type annotations, which degraded the precision of the inference for the 24 callers of the function. Meanwhile, translation/seq2seq.ml required comparatively many type annotations, as it has many definitions of record types and several recursive functions with multiple inputs.

Analysis Time. For all of the 11 annotated programs, GraTen’s type checking was faster on the annotated program than on its unannotated counterpart. This is presumably because the additional static information made it easier for GraTen to infer precise types and to resolve the subsumption constraints.

5.3 Discussions

In this subsection, we discuss the strengths and weaknesses of our system, and our perspective on its future development.

Performance of Best-Effort Inference. As reported in the previous subsection, the best-effort inference of GraTen does not infer precise types for branches, recursions and higher-order shape-polymorphic arguments. While this may seem unsatisfying at first glance, the aim of this research is not to develop a perfect inference algorithm, but to propose a method that can work on unannotated programs and allows users to work interactively with the type checker, gradually adding type annotations. In this respect, we believe that GraTen has achieved desirable results, since it is easy for the user to find out where to add type annotations: (1) the inserted assertions inform the user of the locations of potential dynamic errors, and (2) all of the required type annotations fall into one of the patterns listed in the previous subsection and should thus be predictable.

Lists of Tensors and Layers. As of now, the refinement inference for lists in GraTen is limited to integer lists. Meanwhile, lists of tensors and lists of functions are commonly used in deep learning programs: Tensor.cat and Tensor.stack both take a list of tensors and return their concatenation, and Layer.sequential takes a list of layers (functions that take and return a tensor) and returns their composition.

A potential approach to supporting these library functions would be to add new refinement predicates for tensor lists or layer lists. For example, we can add a predicate \({ \texttt {composable}}(x,S_1,S_2)\), which means that the composition of a list of layers x takes a tensor of shape \(S_1\) and returns a tensor of shape \(S_2\). The type of Layer.sequential would then be expressed with the shape-polymorphic extension (see the full version [13]) as follows.

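The listing is not reproduced; in the style of the shape-polymorphic type shown for Layer.forward in Section 5.1, our guess at such a type is

$$ \forall S_1{:}{ \texttt {int list}},S_2{:}{ \texttt {int list}}.\; \{\nu \mid { \texttt {composable}}(\nu ,S_1,S_2)\} \rightarrow { \texttt {tensor}}(S_1) \rightarrow { \texttt {tensor}}(S_2) $$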

To practically infer the composable predicate for layer lists, we would also need to change the type-instantiated versions of the list-manipulating functions. For instance, the type of the cons function for layers would need to be defined as follows.

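Again our own sketch: cons would prepend a layer from \(S_1\) to \(S_2\) onto a list composable from \(S_2\) to \(S_3\), yielding a list composable from \(S_1\) to \(S_3\):

$$ \forall S_1,S_2,S_3{:}{ \texttt {int list}}.\;({ \texttt {tensor}}(S_1)\rightarrow { \texttt {tensor}}(S_2)) \rightarrow \{\nu \mid { \texttt {composable}}(\nu ,S_2,S_3)\} \rightarrow \{\nu \mid { \texttt {composable}}(\nu ,S_1,S_3)\} $$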

Reporting Incorrect Type Annotations. Since our type system treats standard refinement types as gradual, some users might find the behavior of GraTen unexpected in some cases. Consider the following function f, which takes a matrix and returns the matrix obtained by transposing the input. Suppose that the programmer mistakenly annotated the return value of f to have the same shape as the input matrix.

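The listing is omitted; in the paper’s source-language syntax, the function plausibly looks like (our reconstruction):

$$ { \texttt {f}} := \lambda x{:}\{\nu {:}{ \texttt {tensor}}\mid { \texttt {len}}(\nu .{ \texttt {shape}})=2\}.\;({ \texttt {tr}}\;x \mathbin {:} \{\nu {:}{ \texttt {tensor}}\mid \nu .{ \texttt {shape}}=x.{ \texttt {shape}}\}) $$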

Although this type annotation does not hold in general, this program is not rejected by our type system because the annotation can hold if the input x is a square matrix. GraTen would output the following program with an assertion.

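The output program is likewise omitted; the inserted check presumably amounts to (our reconstruction):

$$ { \texttt {f}} := \lambda x.\;{ \texttt {let}}\ y = { \texttt {tr}}\;x\ { \texttt {in}}\ (\textbf{assert}(y.{ \texttt {shape}}=x.{ \texttt {shape}});\;y) $$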

To avoid such situations, it would be possible to extend the type system with types whose refinements are fully statically known, and to let annotated types be interpreted as such.

6 Related Work

Tensor Shape Checking in Deep Learning Programs. The problem of tensor shape checking has been studied for decades in various contexts, such as numeric analysis [2, 7] and array-oriented languages with rank polymorphism [12, 28, 29]. Tensor shape checking for deep learning programs is nevertheless a new challenge because the shapes can be more complicated, and a variety of methods have been proposed both in academia and in industry.

Some tools statically check tensor shapes with advanced type systems. Hasktorch [3] is a Haskell binding of libtorch [20] which provides a mode that statically checks tensor shapes. Since it uses the type-level programming features of Haskell to implement tensor shapes, shapes are not first-class objects. As a result, programs such as the one in Figure 1 cannot be expressed, since it is impossible to define the function f whose type depends on the first-class value s. Relay [24, 25] is an IR for deep learning compilers with a rich type system for tensor shapes with type inference. Both Relay and Hasktorch support dynamic shapes as wild cards in the static shape checking.

Apart from the type-based verification methods, some tensor shape error detection tools also take a static approach. Pythia [6, 17] statically detects shape faults in TensorFlow [1] programs by keeping track of the tensor shapes throughout the program using value-flow analysis. The tracking of shapes is done in a best-effort manner, allowing the shape inference results to be “unknown” in some cases. The analysis crucially relies on the programming practice in TensorFlow of annotating tensor shapes as much as possible.

Other static checking tools use symbolic execution to collect constraints from the program and verify them with a solver; Tensors Fitting Perfectly [21] and PyTea [15] take this approach. Both methods remove loops from the program in an ad-hoc manner based on reasonable assumptions about the program.

Lastly, some tools take dynamic approaches to provide lightweight shape fault detection. ShapeFlow [31] is an abstract interpreter of TensorFlow programs; it shares the same APIs as TensorFlow but only calculates the shapes of tensors. Users can run the analysis by replacing the import of TensorFlow with ShapeFlow in the target program, which executes more efficiently than the original TensorFlow program. Elichika [14] uses a method similar to ShapeFlow, with a feature to display the interpreted shapes as symbolic expressions. These dynamic approaches enable quick analysis and require no type annotations, but provide no guarantee for untested inputs.

Static and Dynamic Checking for Refinement Types. Earlier work on dependent type systems focused on decidable type checking and inference with restricted refinement logics [10, 26, 33, 34]. Dynamic checking with contracts [9, 19] offers expressive verification that cannot be covered by a static type system, but at the cost of runtime overhead. Naturally, the combination of static and dynamic checking has been actively explored by the successors of both lines of work.

Hybrid type checking [16], on which our work is based, extends the purely dynamic method of using contracts by verifying specifications statically as much as possible. This method differs from ours in that it inserts a dynamic check only when the subtyping constraint can be proven neither valid nor invalid. As a result, this method statically rejects the incorrectly annotated program discussed in Subsection 5.3, while our method accepts it with a dynamic check, in the hope that a more precise type annotation will remove the need for the dynamic check. Our method can be understood as a variant of hybrid type checking with a focus on being gradual in adding type annotations.

The application of gradual typing to dependent type systems has also been studied [8, 18]. In particular, gradual refinement types [18] are very similar to our type system in that they gradualize only the predicate part of a refinement type system while the underlying simple type is static. One of the differences is that their system distinguishes statically-unknown refinement predicates from statically-known ones, while our system assumes that any refinement predicate can have a statically-unknown portion. For example, consider the following program:

$$ { \texttt {let}}\,\,f\,x\,(y:\{\nu :{ \texttt {int}}\mid { \texttt {true}}\}) = x / y $$

This program is rejected in their system because the type annotation on y indicates that the programmer is confident that y can be any integer, including 0; otherwise, the type annotation should have been \(\{\nu :{ \texttt {int}}\mid \,\star \,\}\). Meanwhile, our system interprets the type annotation as not precise enough and accepts the program by inserting a dynamic check on y. Intuitively, \(\{x:B\mid \varphi \}\) in our type system translates to \(\{x:B\mid \varphi \,\wedge \, \star \}\) in gradual refinement types [18].

Type inference for gradual refinement types has been studied by Vazou et al. [30]. Their work restricts the refinements to liquid predicates [26] to maintain decidability, while our work does not impose such a limitation.

7 Conclusion and Future Work

We presented an extension of the standard refinement type system that can be viewed as a gradual type system. The essence of this extension is the introduction of the consistent subtyping relation, which inserts into the source program assertions that check statically-unverified properties at runtime. We also showed that the extended type system satisfies the refined criteria of gradual typing [27].

We then applied this type system to the verification of tensor shapes with best-effort type inference. This application exploits the property of the proposed type system that the limitations of the best-effort static analysis can be covered by dynamic checks. We also implemented a prototype type checker GraTen and applied it to some of the example programs publicly available in the OCaml-Torch repository. We observed that, thanks to the best-effort type inference, users are not required to write too many type annotations to statically type-check the whole program, and that it is not difficult to find where to add type annotations to improve the inference.

We conclude with some ideas for future work.

  • Extension with type polymorphism. As we observed in the experiments, type polymorphic functions are frequently used in realistic programs. Extending our type system with ML-style type polymorphism would make the type checker more practical.

  • Application to imperative languages with dynamic typing, such as Python. In this paper, we chose OCaml as the target of the prototype to ensure that input programs are statically typed. Python would, however, be a more attractive target since it is widely used in the machine learning community.