Graded Modal Dependent Type Theory

Graded type theories are an emerging paradigm for augmenting the reasoning power of types with parameterizable, fine-grained analyses of program properties. There have been many such theories in recent years which equip a type theory with quantitative dataflow tracking, usually via a semiring-like structure which provides analysis on variables (often called ‘quantitative’ or ‘coeffect’ theories). We present Graded Modal Dependent Type Theory (Grtt for short), which equips a dependent type theory with a general, parameterizable analysis of the flow of data, both in and between computational terms and types. In this theory, it is possible to study, restrict, and reason about data use in programs and types, enabling, for example, parametric quantifiers and linearity to be captured in a dependent setting. We propose Grtt, study its metatheory, and explore various case studies of its use in reasoning about programs and studying other type theories. We have implemented the theory and highlight the interesting details, including showing an application of grading to optimising the type checking procedure itself.


Introduction
The difference between simply-typed, polymorphically-typed, and dependently-typed languages can be characterised by the dataflow permitted by each type theory. Dataflow in each can be similarly enacted by substituting a term for occurrences of a variable in some other term, the scope of which is delineated by a binder. In the simply-typed λ-calculus, data can only flow in 'computational' terms; computations and types are separate syntactic categories, with variables, bindings (λ), and substitution (and thus dataflow) only at the computational level. In contrast, polymorphic calculi like System F [51,27] permit dataflow within types, via type quantification (∀), and a limited form of dataflow from computations to types, via type abstraction (Λ) and type application. Dependently-typed calculi (e.g., [14,41,42,43]) break down the barrier between computations and types further: variables are bound simultaneously in types and computations, such that data can flow to both computations and types simultaneously via dependent functions (Π) and application. This pervasive dataflow enables the Curry-Howard correspondence to be leveraged for program reasoning and theorem proving [58]. However, the unrestricted flow of data between computations and types can impede reasoning and sometimes interacts poorly with other type-theoretic ideas.
Firstly, whilst System F allows parametric reasoning due to representation independence [52,56], this representation independence is lost in general in dependently-typed languages when quantifying over higher-kinded types [45] (rather than just 'small' types [7,37]). Furthermore, unrestricted dataflow impedes efficient compilation, as compilers cannot tell from the types alone where a term is actually needed. Additional static analyses are thus needed to recover dataflow information for optimisation and reasoning. For example, if a term is shown to be used only for type checking (not flowing to the computational 'run time' level) then it can be erased [9]. Thus, dependent theories do not expose the distinction between proof-relevant and proof-irrelevant terms, requiring extensions to capture irrelevance [49,50,4]. Whilst unrestricted dataflow between computations and types has its benefits, the permissive nature of dependent types can hide useful information. This permissiveness also interacts poorly with other type theories which seek to deliberately restrict dataflow, notably linear types.
Linear types allow data to be treated as a 'resource' which must be consumed exactly once: linearly-typed values are restricted to linear dataflow [28,57,59]. This reasoning about resourceful data has been exploited by several languages, e.g., ATS [53], Alms [55], Clean [18], Granule [46], and Linear Haskell [8]. However, linear dataflow is rare in a dependently-typed setting. Consider typing the body of the polymorphic identity function in Martin-Löf type theory:

a : Type, x : a ⊢ x : a

This judgment uses a twice (typing x in the context and the subject of the judgment) and x once in the term but not at all in the type. There have been various attempts to meaningfully reconcile linear and dependent types [12,15,38,40], usually by keeping them separate, allowing types to depend only on non-linear variables. However, all such theories are unable to distinguish variables used for computation from those used purely for type formation, which could then be erased at runtime.
Recent work by McBride [44], refined by Atkey [6], generalises ideas from 'coeffect analyses' (variable usage analyses, like that of Petricek et al. [48]) to a dependently-typed setting to reconcile the ubiquitous flow of data in dependent types with the restricted dataflow of linearity. This approach, called Quantitative Type Theory (Qtt), types the above example as:

a :⁰ Type, x :¹ a ⊢ x : a

The annotation 0 on a explains that we can use a to form a type, but we cannot, or do not, use it at the term level, thus it can be erased at runtime. The cornerstone of Qtt's approach is that dataflow of a term to the type level counts as 0 use, so arbitrary type-level use is allowed whilst still permitting quantitative analysis of computation-level dataflow. Whilst this gives a useful way to relate linear and dependent types, it cannot reason about dataflow at the type level (all type-level usage counts as 0). Thus, for example, Qtt cannot express that a variable is used just computationally but not at all in types.
In an extended abstract, Abel proposed a generalisation of Qtt to track variable use at both the type and computational levels [2]. We develop a core dependent type theory along the same lines. We use the terminology of "grading", augmenting types with additional information to capture the structure of proofs or terms [24,46], thus we name our approach Graded Modal Dependent Type Theory (Grtt for short). Our type theory is parameterised by a semiring which, like other coeffect and quantitative approaches [3,6,10,26,44,48,60], describes dataflow through a program, but in both types and computations equally here. We extend Abel's initial idea by presenting a rich language, including dependent tensors, a complete metatheory, and an additional notion of (dependent) graded modality which aids the practical use of this approach. The result is a calculus which extends the power of existing graded typed languages, like Granule [46], to a dependent setting.
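The semiring parameterisation can be sketched concretely. The following is a minimal illustration (our own encoding, not the paper's artifact), using the exact-usage natural numbers as the instance:

```python
# A minimal sketch of the semiring interface that parameterises
# GrTT-style grading, with the natural-numbers instance used for
# exact usage counting. Names here are ours, for illustration only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Semiring:
    zero: object    # 0: no usage / irrelevant
    one: object     # 1: a single use
    add: callable   # + combines usage from two subterms (contraction)
    mul: callable   # * scales usage under application/substitution

# Exact-usage semiring (N, ×, 1, +, 0)
nat = Semiring(zero=0, one=1,
               add=lambda r, s: r + s,
               mul=lambda r, s: r * s)

# Using x once in each of two subterms contracts to grade 2.
assert nat.add(nat.one, nat.one) == 2
# Scaling by 0 absorbs all usage.
assert nat.mul(nat.zero, 5) == 0
```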
We begin by unpacking the definition of Grtt (Section 2), before developing its metatheory (Section 3): admissibility of graded structural rules, substitution, type preservation, and strong normalization. We demonstrate the power of Grtt through several case studies (Section 4). We show that grading can be used to restrict Grtt terms to simply-typed reasoning, parametric reasoning (regaining universal quantification smoothly within a dependent theory), existential types, and linear types. The calculus can be instantiated to different kinds of dataflow reasoning: we show an example application to information-flow security.
We implemented a prototype language based on Grtt called Gerty. We briefly mention its syntax in Section 2.5 for use in examples. Later, Section 5 describes how the formal definition of Grtt is implemented as a bidirectional type checking algorithm, interfacing with an SMT solver to solve constraints over grades. Furthermore, Abel conjectured that a quantitative dependent theory could enable usage-based optimisation of type-checking itself [2], which would assist dependently-typed programming at scale. We validate this claim in Section 5 showing a grade-directed optimisation to Gerty's type checker.
Section 6 discusses next steps for further increasing the expressive power of Grtt. Proofs are provided in the supplement. Appendix A collects typing rules.
Terms include variables and a constructor for an inductive hierarchy of universes, annotated by a level l. Dependent function types are annotated with a pair of grades s and r, with s capturing how x is used in the body of the function (the 'subject'), and r capturing how x is used in the codomain B. Dependent tensors have a single grade r, which describes how the first element is used in the typing of the second. The graded modal type operator □_s A 'packages' a term and its dependencies (sometimes referred to as 'boxing') so that values of type A can be used with grade s in the future. Graded modal types are introduced via promotion □t and eliminated via let □x = t₁ in t₂. The remainder of this section explains the semantics of each piece of syntax with respect to its typing. We typically use A and B to connote terms used as types.

Typing judgments, contexts, and grading
Typing judgments are written in either of two equivalent forms: a 'horizontal' syntax, used most often, and an equivalent 'vertical' form, used for clarity in some places. Ignoring the part to the left of ⊙, typing judgments and their rules are essentially those of Martin-Löf type theory (with the addition of the modality), where Γ ranges over usual dependently-typed typing contexts. The left of ⊙ provides the grading information, where σ and ∆ range over grade vectors and context grade vectors respectively, of the form:

(contexts) Γ ::= ∅ | Γ, x : A
(grade vectors) σ ::= ∅ | σ, s
(context grade vectors) ∆ ::= ∅ | ∆, σ

A grade vector σ is a vector of semiring elements, and a context grade vector ∆ is a vector of grade vectors. We write (s₁, . . . , sₙ) to denote an n-vector, and likewise for context grade vectors. We omit parentheses when this would not cause ambiguity. Throughout, a comma is used to concatenate vectors and disjoint contexts, and to extend vectors with a single grade, grade vector, or typing assumption.
For a judgment (∆ | σ_s | σ_r) ⊙ Γ ⊢ t : A, the vectors Γ, ∆, σ_s, and σ_r are all of equal size. Given a typing assumption y : B at index i in Γ, the element σ_s[i] ∈ R is a grade denoting the use of y in t (the subject of the judgment), σ_r[i] ∈ R is a grade denoting the use of y in A (the subject's type), and ∆[i] ∈ Rⁱ is a vector of grades, of size i, that describes how each assumption prior to y is used in the formation of y's type, B.
Consider the following example, which types the body of a function that takes two arguments of type A, and returns only the first:

((), (1), (1, 0) | 0, 1, 0 | 1, 0, 0) ⊙ a : Type_l, x : a, y : a ⊢ x : a

Let the context grade vector be called ∆. Then ∆[0] = () (the empty vector) explains that no assumptions are used to type a in the context, as it is a closed term and the first assumption. ∆[1] = (1) explains that the first assumption a is used (grade 1) in the typing of x in the context, and ∆[2] = (1, 0) explains that a is used once in the typing of y in the context, and x is unused in the typing of y. The subject grade vector σ_s = (0, 1, 0) explains that a is unused in the subject, x is used once, and y is unused. Finally, the subject-type grade vector σ_r = (1, 0, 0) explains that a appears once in the subject's type (which is just a), and x and y are unused in the formation of the subject's type.
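The shape invariants of judgments can be checked mechanically; the following sketch (names and encoding are ours) represents the grade vectors of this example as Python lists:

```python
# A hypothetical encoding of the example judgment's grading, to make
# the shape invariants of GrTT judgments explicit. Names are ours.
Gamma   = ["a", "x", "y"]     # a : Type_l, x : a, y : a
Delta   = [[], [1], [1, 0]]   # Delta[i] grades the assumptions before i
sigma_s = [0, 1, 0]           # the subject uses x exactly once
sigma_r = [1, 0, 0]           # the subject's type uses a exactly once

# Invariants from the definition of judgments:
# all four vectors have the same size as the context...
assert len(Delta) == len(sigma_s) == len(sigma_r) == len(Gamma)
# ...and the i-th context grade vector has exactly i entries.
assert all(len(Delta[i]) == i for i in range(len(Delta)))
```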
To aid reading, recall that standard typing rules typically have the form context ⊢ subject : subject-type, the order of which is reflected by (∆ | σ_s | σ_r) ⊙ . . . giving the context, subject, and subject-type grading respectively.
Well-formed contexts Contexts are identified as well-formed by the relation ∆ ⊙ Γ ⊢. Unlike typing, well-formedness does not need to include subject and subject-type grade vectors, as it considers only the well-formedness of the assumptions in a context with respect to prior assumptions in the context. The WfEmp rule states that the empty context is well-formed with an empty context grade vector, as there are no assumptions to account for. The WfExt rule states that given A is a type under the assumptions in Γ, with σ accounting for the usage of Γ variables in A, and ∆ accounting for usage within Γ, then we can form the well-formed context Γ, x : A by extending ∆ with σ to account for the usage of A in forming the context. The notation 0 denotes a vector for which each element is the semiring 0. Note that the well-formedness ∆ ⊙ Γ ⊢ is inherent from the premise of WfExt due to the following lemma:

Lemma 1 (Typing contexts are well-formed). If (∆ | σ₁ | σ₂) ⊙ Γ ⊢ t : A then ∆ ⊙ Γ ⊢.

Typing rules
We examine the typing rules of Grtt one at a time (collected in Appendix A.1).
Variables are introduced as follows. The premise identifies Γ₁, x : A, Γ₂ as well-formed under the context grade vector ∆₁, σ, ∆₂. By the size condition |∆₁| = |Γ₁|, we are able to identify σ as capturing the usage of the variables Γ₁ in forming A. This information is used in the conclusion, capturing type-level variable usage as σ, 0, 0, which describes that Γ₁ is used according to σ in the subject's type (A), and that x and the variables of Γ₂ are used with grade 0. For subject usage, we annotate the first zero vector with a size |∆₁|, allowing us to single out x as being the only assumption used with grade 1 in the subject; all other assumptions are used with grade 0. For example, typing the body of the polymorphic identity ends with Var:

((), (1)) ⊙ a : Type, x : a ⊢ (by WfExt)    |(())| = |a : Type|
--------------------------------------------------------------- Var
(((), (1)) | 0, 1 | 1, 0) ⊙ a : Type, x : a ⊢ x : a
In the conclusion of Var, the typing (() | 1 | 0) ⊙ a : Type ⊢ a : Type is 'distributed' to the typing of x in the context and to the formation of the subject's type. Thus the subject grade (0, 1) corresponds to the absence of a from the subject and the presence of x, and the subject-type grade (1, 0) corresponds to the presence of a in the subject's type (a), and the absence of x.

Type
We use an inductive hierarchy of universes [47] with ordering < such that l < suc l. Universes can be formed under any well-formed context, with every assumption graded with 0 subject and subject-type use, capturing the absence of any assumptions from the universes, which are closed forms.
Functions Function types (x : (s,r) A) → B are annotated with two grades: x is used with grade s in the body of the inhabiting function and with grade r in B. Function types have the following formation rule. The usages of the dependencies of A and B (excepting x) are given by σ₁ and σ₂ in the premises (in the 'subject' position), which are combined as σ₁ + σ₂ (via vector addition using the + of the semiring), which serves to contract the dependencies of the two types. The usage of x in B is captured by r, and then internalised to the binder in the conclusion of the rule. An arbitrary grade s is allowed here as there is no information on how x is used in an inhabiting function body. Function terms are then typed by the following rule. The second premise types the body of the λ-term, showing that s captures the usage of x in t and r captures the usage of x in B; the subject and subject-type grades for x are then internalised as graded annotations on the binder. Dependent functions are eliminated through application, where * is the usual vector scalar multiplication, using the semiring multiplication. Given a function t₁ which uses its parameter with grade s to compute and with grade r in the typing of the result, we can apply it to a term t₂, provided that we have the resources required to form t₂, scaled by s at the subject level and by r at the subject-type level, since t₂ is substituted into the return type B. This scaling behaviour is akin to that used in coeffect calculi [26,48], Qtt [6,44], and Linear Haskell [8], but scalar multiplication happens here at both the subject and subject-type level. The use of variables in A is accounted for by σ₁ as explained in the third premise, but these usages are not present in the resulting application, since A no longer appears in the types or the terms. As a simple example, consider the constant function λx.λy.x : (x : (1,0) A) → (y : (0,0) B) → A (for some A and B).
We can see that the resources required for the second parameter (in the body) will always be scaled by 0, and as 0 is absorbing, this means that anything we pass as the second argument has 0 subject and subject-type use. This example begins to show some of the power of grading-the grades capture the structure of the program at all levels.
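The grade arithmetic behind this observation can be sketched as follows (a toy encoding of the application rule's vector operations, with hypothetical resource vectors):

```python
# Sketch of the App rule's grade arithmetic (our encoding): applying
# t1 : (x :(s,r) A) -> B to t2 consumes the resources of t2 scaled by
# s at the subject level and by r at the subject-type level.
def scale(s, sigma):
    """Scalar multiplication of a grade vector by a grade."""
    return [s * g for g in sigma]

def vadd(sigma, tau):
    """Pointwise semiring addition of two grade vectors."""
    return [a + b for a, b in zip(sigma, tau)]

sigma1 = [0, 1]   # hypothetical resources of the function term
sigma2 = [1, 0]   # hypothetical resources of the argument
s, r = 0, 0       # the constant function's second parameter: grade (0,0)

# Because 0 is absorbing, the argument's resources vanish entirely:
assert scale(s, sigma2) == [0, 0]
assert vadd(sigma1, scale(s, sigma2)) == sigma1
```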
Tensors The rule for forming dependent tensor types is as follows. This rule is almost identical to the formation rule for function types, but there is only a single grade r on the binder, capturing the use of x in B (the type of the second component). For 'positive' semirings, 0 really means 'unused', and (x : 0 A) ⊗ B is then a product A × B (see Section 4 for further discussion). Dependent tensors are introduced as follows. In the typing premise for t₂, occurrences of x are replaced with t₁ in the type, ensuring that the type of the second component (t₂) is calculated using the first component (t₁). The resources for t₁ in this substitution are scaled by r, accounting for the existing usage of x in B. In the conclusion, we see the resources for the two components (and their types) combined via the semiring addition. Finally, tensors are eliminated with the following rule. As this is a dependent eliminator, we allow the result type C to depend upon the value of the tensor as a whole, bound as z in the second premise, into which our actual tensor term t₁ is substituted in the conclusion.
To eliminate a tensor, we must consider how each component is used in the resulting expression t₂. However, as we cannot inspect the tensor term itself, and semiring addition is not injective (thus preventing us from splitting the grades required to form t₁), we can only capture that each component of the pair is used in the same way, that is, with the same grade. This is seen by capturing the use of x and y with the same grade s in t₂, and scaling the resources of the tensor term by this same amount. A remedy to this coarse-grained usage is provided by graded modalities.
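The non-injectivity of semiring addition is easy to see concretely in the exact-usage naturals: a grade of 2 for the whole tensor admits several splittings between the two components, so the eliminator cannot recover separate grades for them.

```python
# In the natural-numbers semiring, addition is not injective: a total
# grade of 2 could have arisen as 0+2, 1+1, or 2+0, so an eliminator
# cannot reconstruct per-component grades from the whole-tensor grade.
splittings = [(a, b) for a in range(3) for b in range(3) if a + b == 2]
assert splittings == [(0, 2), (1, 1), (2, 0)]
```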
Graded Modality Graded binders alone are rather inflexible: they do not allow different subparts of a value to be used differently, e.g., computing the length of a list ignores the elements, and projecting from a pair discards one component. We therefore introduce a graded modality (à la [46]) which allows us to capture the notion of local inspection on data, and internalises usage information into types. A type □_s A denotes terms of type A that are used with grade s. Type formation and introduction rules are as follows. To form a term of type □_s A, we 'promote' a term t of type A by requiring that we can use the resources used to form t (σ₁) according to grade s. This 'promotion' resembles that of other graded modal systems (e.g., [3,10,24,46]).
We can see promotion □t as capturing t for later use according to grade s. Thus, when eliminating a term of type □_s A, we must consider how the 'unboxed' term is used with grade s, as per the following dependent eliminator.

Subtyping then subsumes equality, adding ordering of universe levels. Type conversion then allows re-typing terms based on the subtyping judgment. The full rules for equality and subtyping can be found in Appendix A.1.

Operational semantics
As with other graded modal calculi (e.g., [3,10,24]), the core calculus of Grtt has a call-by-name small-step operational semantics with reductions t ↝ t′. The rules are standard, with the addition of the β-rule for the graded modality:

let □x = □t₁ in t₂ ↝ [t₁/x]t₂

Type preservation and normalisation are considered in Section 3.

Implementation and examples
To explore our theory, we provide an implementation, Gerty. Section 5 describes how the declarative definition of the type theory is implemented as a bidirectional type checking algorithm. We briefly mention the syntax here for use in later examples. The following is the polymorphic identity function in Gerty:

id : (a : (.0, .2) Type 0) -> (x : (.1, .0) a) -> a
id = \a -> \x -> x

The syntax resembles the theory, where grade terms .n are syntactic sugar for a unary encoding of grades in terms of 0 and repeated addition of 1, e.g., .2 = (.0 + .1) + .1. This syntax can be used for grade terms of any semiring, which can be resolved to particular built-in semirings at other points of type checking.
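The unary desugaring of .n can be sketched for an arbitrary semiring (our own illustration; `desugar` is a hypothetical helper, not part of Gerty):

```python
# Sketch of the .n sugar: repeated semiring addition of 1 onto 0.
# The defaults instantiate the exact-usage naturals; other semirings
# can be supplied via zero/one/add. This helper is ours, not Gerty's.
def desugar(n, zero=0, one=1, add=lambda a, b: a + b):
    g = zero
    for _ in range(n):
        g = add(g, one)       # .k+1 = .k + .1
    return g

assert desugar(2) == 2        # .2 = (.0 + .1) + .1 in the naturals
```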
The following shows first projection on (non-dependent) pairs, using the graded modality (at grade 0 here) to give fine-grained usage on compound data. The implementation adds various built-in semirings, some syntactic sugar, and extras such as: a singleton unit type, extensions of the theory to semirings with a pre-ordering (discussed further in Section 6), and some implicit resolution. Anywhere a grade is expected, an underscore can be supplied to indicate that Gerty should try to resolve the grade implicitly. Grades may also be omitted from binders (see above in fst), in which case they are treated as implicits. Currently, implicits are handled by generating existentially quantified grade variables, and using SMT to solve the necessary constraints (see Section 5).
So far we have considered the natural numbers semiring providing an analysis of usage. We come back to this and similar examples in Section 4. As another example, we consider here the semiring of privacy levels (proposed by Orchard et al. [46]), which enforces information-flow control, akin to DCC [1]. Differently from DCC, this tracks dataflow through variable dependencies, rather than through the results of computations in the monadic style of DCC.

Definition 1 (Security levels). Let R = {Irrelevant, Private, Public} be a set of region labels where 0 = Irrelevant denotes variables that are unused, and 1 = Private, i.e., we treat the base notion of dataflow as being within the private domain. Any use of Public data must then be guarded by a graded modality. This forms a lattice via Irrelevant ≤ Private ≤ Public, with semiring operations defined via this ordering; we use the ordering here just to succinctly state the operations.

This semiring is provided as primitive in Gerty. Thus we can express the following example:

idPriv : (a : (.0, .2) Type 0) -> (x : (Private, Irrelevant) a) -> a
idPriv = \a -> \x -> x

-- The following is rejected as ill-typed
leak : (a : (.0, .2) Type 0) -> (x : (Public, Irrelevant) a) -> a
leak = \a -> \x -> idPriv a x

The first definition is well-typed, but the second yields a typing error originating from the application in its body, where grade 1 is Private for the security-labels semiring. Thus we can use these abstract label semirings as a way of restricting the flow of data between regions, as in region typing systems [54,32].
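The operations themselves are elided above, but one choice consistent with the stated lattice (an assumption on our part, not a definitive reading) takes + as least upper bound, and * as least upper bound except that 0 annihilates. A brute-force check confirms the semiring laws for this choice:

```python
# A plausible encoding (ours) of the security-levels semiring,
# consistent with the lattice Irrelevant <= Private <= Public:
# + is least upper bound; * is lub except that 0 annihilates.
IRR, PRIV, PUB = 0, 1, 2      # encode the lattice order as integers
R = [IRR, PRIV, PUB]

def add(r, s):
    return max(r, s)          # lub: any Public use makes the sum Public

def mul(r, s):
    return IRR if IRR in (r, s) else max(r, s)

# Brute-force check of semiring laws with 0 = Irrelevant, 1 = Private.
for r in R:
    assert add(IRR, r) == r              # 0 is the additive unit
    assert mul(PRIV, r) == r             # 1 = Private is the unit of *
    assert mul(IRR, r) == IRR            # 0 annihilates
    for s in R:
        assert add(r, s) == add(s, r)    # + commutes
        for t in R:                      # * distributes over +
            assert mul(r, add(s, t)) == add(mul(r, s), mul(r, t))
```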

Metatheory
We now study Grtt's metatheory. We first explain how substitution presents itself in the theory, and how type preservation follows from a relationship between equality and reduction. We then show admissibility of graded structural rules for contraction, exchange, and weakening, and strong normalization.

Substitution
We first introduce substitution for well-formed contexts, and then for typing.
Lemma 3 (Substitution for well-formed contexts). If the following hold:

That is, given Γ₁, x : A, Γ₂ is well-formed, we can cut out x by substituting t for x in Γ₂, accounting for the new usage in the context grade vectors. The usage of Γ₁ in t is given by σ₂, and the usage in A by σ₁. When substituting, ∆ remains the same, as Γ₁ is unchanged. However, to account for the usage in [t/x]Γ₂, we have to form a new context grade vector ∆′ \ |∆| + (∆′ / |∆|) * σ₂. The operation ∆′ \ |∆| (pronounced 'discard') removes grades corresponding to x, by removing the grade at index |∆| from each grade vector in ∆′. Everything previously used in the typing of x in the context must now be distributed across [t/x]Γ₂, which is done by adding on (∆′ / |∆|) * σ₂, which uses ∆′ / |∆| (pronounced 'choose') to produce a vector of the grades corresponding to those cut out in ∆′ \ |∆|. The multiplication (∆′ / |∆|) * σ₂ produces a context grade vector by scaling σ₂ by each element of ∆′ / |∆|. When adding vectors, if the sizes of the vectors differ, then the shorter vector is right-padded with zeroes. Thus ∆′ \ |∆| + (∆′ / |∆|) * σ₂ can be read as '∆′ without the grades corresponding to x, plus the usage of t scaled by the prior usage of x'.
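The 'discard' and 'choose' operations can be sketched directly (our own encoding). Applied to the worked example that follows, with ∆′ = ((0, 0, 2)), |∆| = 2, and σ₂ = (0, 1), the new context grade vector comes out as ((0, 2)):

```python
# Sketch (our encoding) of the 'discard' and 'choose' operations used
# in the substitution lemma, applied to the worked example below.
def discard(delta, i):
    """Delta' \\ i : drop the grade at index i from every row."""
    return [row[:i] + row[i+1:] for row in delta]

def choose(delta, i):
    """Delta' / i : collect the grades at index i from every row."""
    return [row[i] for row in delta]

def scale_all(grades, sigma):
    """(Delta'/i) * sigma2 : scale sigma by each collected grade."""
    return [[g * x for x in sigma] for g in grades]

def vadd(d1, d2):
    """Pointwise addition (the vectors here already have equal size)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(d1, d2)]

delta_p, sigma2 = [[0, 0, 2]], [0, 1]   # Delta' = ((0,0,2)), sigma2 = (0,1)
result = vadd(discard(delta_p, 2), scale_all(choose(delta_p, 2), sigma2))
# z used x twice; after [y/x], z uses y twice instead:
assert result == [[0, 2]]
```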
For example, given the typing ((), (1) | 0, 1 | 1, 0) ⊙ a : Type, y : a ⊢ y : a and the well-formed context ((), (1), (1, 0), (0, 0, 2)) ⊙ a : Type, y : a, x : a, z : t′ ⊢, where t′ uses x twice, we can substitute y for x. Therefore, let Γ₁ = a : Type, y : a (thus |Γ₁| = 2) and Γ₂ = z : t′ and ∆′ = ((0, 0, 2)) and σ₁ = 1, 0 and σ₂ = 0, 1. Then the context grade of the substitution [y/x]Γ₂ is calculated as ∆′ \ |∆| + (∆′ / |∆|) * σ₂ = ((0, 0)) + ((0, 2)) = ((0, 2)).

Definition 7. For a type valuation ∆ ⊙ Γ ⊨ ε and a type A ∈ (Kind ∪ Type ∪ Con) with A typable in ∆ ⊙ Γ, the interpretation of types ⟦A⟧ε is defined inductively. Grades play no role in the reduction relation for Grtt, and hence our interpretation erases graded modalities and their introduction and elimination forms (translated into substitutions). In fact, the above interpretation can be seen as a translation of Grtt_{0,1} into non-substructural set theory; there is no data-usage tracking in the image of the interpretation. Tensors are translated into Cartesian products, whose eliminators are translated into substitutions similarly to graded modalities. All terms however remain well-typed through the interpretation. The interpretation of terms corresponds to term valuations that are used to close the term before interpreting it into the interpretation of its type.
Definition 8. Valid term valuations, ∆ ⊙ Γ ⊨ε ρ, are defined as follows. We interpret terms as substitutions, but graded modalities must be erased, and their elimination forms converted into substitutions, and similarly for the eliminator of tensor products.
Definition 9. Suppose ∆ ⊙ Γ ⊨ε ρ. Then the interpretation of a term t typable in ∆ ⊙ Γ is ⟦t⟧ρ = ρt, but where all let-expressions are translated into substitutions, and all graded modalities are erased.
Case studies

We now demonstrate Grtt via several case studies that focus the reasoning power of dependent types via grading. Since grading in Grtt serves to explain dataflow, we can characterise subsets of Grtt that correspond to various type theories. We demonstrate the approach with simple types, parametric polymorphism, and linearity. In each case study, we restrict Grtt to a subset by a characterisation of the grades, rather than by, say, placing detailed syntactic restrictions or employing meta-level operations or predicates that restrict syntax (as one might do, for example, to map a subset of Martin-Löf type theory into the simply-typed λ-calculus by restricting oneself only to closed types, requiring deep inspection of type terms). Since this restriction is only on grades, we can harness the specific reasoning power of particular calculi from within the language itself, simply by specifications on grades. In the context of an implementation like Gerty, this amounts to using type signatures to restrict dataflow.
Section 5 returns to a case study that builds on the implementation.

Recovering Martin-Löf type theory
When the semiring parameterising Grtt is the singleton semiring (i.e., any semiring where 1 = 0), it can be seen that grade annotations are redundant, as all grades are equal. Such a semiring gives rise to an isomorphism □_r A ≅ A, rendering the graded modality redundant. For such a semiring, typing judgments can be written Γ ⊢ t : A, and all vectors and grades on binders may be omitted, giving rise to a standard Martin-Löf type theory as a special case of Grtt.
Taken together, these axioms ensure that a 0-grade in a quantitative semiring represents irrelevant variable use. This notion has recently been proved in a non-dependent setting by Choudhury et al. [13] via a heap-based semantics for grading (on computations), and the same result applies here. Conversely, in a quantitative semiring any grade other than 0 denotes relevance. From this, we can encode two things. First, we can directly encode non-dependent tensors and arrows: in (x : 0 A) ⊗ B the grade 0 captures that x cannot have any computational content in B, and likewise for (x : (s,0) A) → B the grade 0 explains that x cannot have any computational content in B, but may have computational use according to s in the inhabiting function. Thus, the grade 0 here describes that elimination forms cannot ever inspect the variable during normalisation. Next, we use quantitative semirings for simply-typed and polymorphic reasoning.

Example 1. Some quantitative semirings are:
- (Exact usage) (N, ×, 1, +, 0);
- (0-1) the semiring over R = {0, 1} with 1 + 1 = 1, which describes relevant vs. irrelevant dependencies, but no further information;
- (None-One-Tons [44]) the semiring on R = {0, 1, ∞}, which is more fine-grained than 0-1, where ∞ represents more than one usage, with 1 + 1 = ∞ = 1 + ∞.
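These semirings can be checked mechanically against the quantitative axioms as we read them here (positivity: r + s = 0 implies r = s = 0; zero-product: r * s = 0 implies r = 0 or s = 0); a sketch:

```python
# Brute-force sketch (ours) checking the 0-1 and None-One-Tons
# semirings against the quantitative axioms as we read them.
INF = float("inf")    # 'tons': more than one usage

def quantitative(R, add, mul):
    for r in R:
        for s in R:
            if add(r, s) == 0:
                assert r == 0 and s == 0    # positivity
            if mul(r, s) == 0:
                assert r == 0 or s == 0     # zero-product

# 0-1 semiring: + is 'or' (so 1 + 1 = 1), * is 'and'.
quantitative([0, 1], lambda r, s: max(r, s), lambda r, s: r * s)

# None-One-Tons: 1 + 1 = INF = 1 + INF; 0 annihilates under *.
def not_add(r, s):
    return s if r == 0 else (r if s == 0 else INF)

def not_mul(r, s):
    return 0 if 0 in (r, s) else (INF if INF in (r, s) else 1)

quantitative([0, 1, INF], not_add, not_mul)
```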

Simply-typed reasoning
As discussed in Section 1, the simply-typed λ-calculus (STLC) can be distinguished from dependently-typed calculi via the restriction of dataflow: in simple types, data can only flow at the computational level, with no dataflow within, into, or from types. We can thus view a Grtt function as simply typed when its variable is irrelevant in the type, e.g., (x : (s,0) A) → B for quantitative semirings.
We define a subset of Grtt restricted to simply-typed reasoning.

Definition 11 (Simply-typed Grtt). For a quantitative semiring, the following predicate Stlc(−) determines a subset of simply-typed Grtt programs: all subject-type grades are 0. A similar predicate is defined on well-formed contexts (elided), restricting context grades of well-formed contexts to contain only the zero grading vector.

Under the restriction of Definition 11, a subset of Grtt terms embeds into the simply-typed λ-calculus in a sound and complete way. Since STLC does not have a notion of tensor or modality, these are omitted from the encoding. Variable contexts of Grtt are interpreted by pointwise applying ⟦−⟧τ to typing assumptions. We then get the following preservation of typing into the simply-typed λ-calculus, and soundness and completeness of this encoding:

Lemma 13 (Soundness of typing). Given a derivation of (∆

Theorem 2 (Soundness and completeness of the embedding). Given A, then for CBN reduction ⤳stlc in the simply-typed λ-calculus:

Thus, we capture simply-typed reasoning just by restricting type grades to 0 for quantitative semirings. We consider quantitative semirings again for parametric reasoning, but first recall issues with parametricity and dependent types.
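Definition 11's grade-only restriction is simple enough to state as executable code; a sketch of the predicate on subject-type grade vectors (our own encoding):

```python
# Sketch (our encoding) of the Stlc predicate: a judgment is simply
# typed when every subject-type grade is 0, i.e. no dataflow into types.
def stlc(sigma_r):
    return all(g == 0 for g in sigma_r)

assert stlc([0, 0, 0])        # e.g. (x :(s,0) A) -> B style judgments
assert not stlc([1, 0, 0])    # the polymorphic identity uses a in its type
```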

Recovering parametricity via grading
One powerful feature of grading in a dependent type setting is the ability to recover parametricity from dependent function types. Consider the following type of functions in System F (we borrow this example from Nuyts et al. [45]):

RI A B ≜ ∀γ. (γ → A) → γ → B

Due to parametricity, we get the following notion of representation independence in System F: for a function f : RI A B, some type γ′, and terms h : γ′ → A and c : γ′, we know that f can only use c by applying h c. Subsequently, RI A B ≅ A → B by parametricity [51], and this isomorphism is defined uniquely. In a dependently-typed language, one might seek to replace System F's universal quantifier with Π-types, i.e., RI′ A B ≜ (γ : Type) → (γ → A) → γ → B.
However, we can no longer reason parametrically about the inhabitants of such types (we cannot prove that RI′ A B ≅ A → B), as the free interaction of types and computational terms allows us to give the following non-parametric element of RI′ A B over 'large' type instances:

leak : RI′ Type Type
leak = λγ. λh. λc. γ

Instead of applying h c, the above "leaks" the type parameter γ. Grtt can recover universal quantification, and hence parametric reasoning, by using grading to restrict the dataflow capabilities of a Π-type. We can refine representation independence to the following: for some grades s₁, s₂, and s₃, and with shorthand 2 = 1 + 1.
If we look at the definition of leak above, we see that γ is used in the body of the function and thus requires usage 1, so leak cannot inhabit RI″ A Type. Instead, leak would be typed differently as leak : (γ : (1,2) Type) → . . . The problematic behaviour (that the type parameter γ is returned by the inner function) is exposed by the subject grade 1 on the binder of γ. We can thus define a graded universal quantification from a graded Π-type:

∀r (γ : Type_l). B ≜ (γ : (0,r) Type_l) → B    (1)

This denotes that the type parameter γ can appear freely in B, described by grade r, but is irrelevant in the body of any corresponding λ-abstraction. This is akin to the work of Nuyts et al., who develop a system with several modalities for regaining parametricity within a dependent type theory [45]. Note however that parametricity is recovered for us here as one of many possible options coming from systematically specialising the grading.
Capturing existential types With the ability to capture universal quantifiers, we can similarly define existentials (allowing, e.g., abstraction [11]). We define the existential type via a Church-encoding as follows: Embedding into Stratified System F We show that parametricity is regained here (and thus eqn. (1) really behaves as a universal quantifier and not a general Π-type) by showing that we can embed a subset of Grtt into System F, based solely on a classification of the grades. We follow a similar approach to Section 4.3 for simply-typed reasoning, but rather than defining a purely syntactic encoding (and then proving it type sound), our encoding is type directed since we embed Grtt functions of type (x : (0,r) Type l ) → B as universal types in System F with corresponding type abstractions (Λ) as their inhabitants. Since Grtt employs a predicative hierarchy of universes, we target Stratified System F (hereafter SSF) since it includes the analogous inductive hierarchy of kinds [39]. We use the formulation of Eades and Stump [21] with terms t_s and types T ::= X | T → T | ∀(X : K).T, with kinds K ::= ⋆_l where l ∈ N provides the stratified kind hierarchy. Capitalised variables are System F type variables and t_s [T] is type application. Contexts may contain both type and computational variables, and so free-variable type assumptions may have dependencies, akin to dependent type systems. Kinding is via judgments Γ ⊢ T : ⋆_l and typing via Γ ⊢ t : T. We define a type-directed encoding on a subset of Grtt typing derivations characterised by the following predicate: By Type_l ∉ +ve(B) we mean that Type_l does not occur as a positive subterm of B, avoiding higher-order typing terms (e.g., type constructors) which do not exist in SSF.
Under this restriction, we give a type-directed encoding mapping derivations of Grtt to SSF: given a Grtt derivation of judgment (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A, there exists an SSF term t_s such that there is a derivation of judgment ⟦Γ⟧ ⊢ t_s : ⟦A⟧τ in SSF, where ⟦−⟧τ interprets a subset of Grtt terms A as types: Thus, dependent functions with Type parameters that are computationally irrelevant (subject grade 0) map to ∀ types, and dependent functions with parameters irrelevant in types (subject-type grade 0) map to regular function types. We elide the full details but sketch the key parts, where functions and applications are translated inductively (with Ty_l as shorthand for Type_l): In the last case, note the presence of [t′_s/x]⟦B⟧τ. Reasoning under the context of the encoding, this is proven equivalent to ⟦B⟧τ since the subject-type grade is 0 and the use of x in B is irrelevant.
Theorem 3 (Soundness and completeness of the SSF embedding). Given Ssf((∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A), there exists a term t_s in SSF such that ⟦(∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A⟧ = ⟦Γ⟧ ⊢ t_s : ⟦A⟧τ, and the encoding is sound and complete with respect to CBN reduction in Grtt and in Stratified System F. Thus, we can capture parametricity in Grtt via the judicious use of 0 grading for quantitative semirings, as one particular application of grading.

Graded modal types and non-dependent linear types
Grtt can embed the reasoning present in other graded modal type theories (which often have a linear base), for example the explicit semiring-graded necessity modality found in coeffect calculi [10,24] and Granule [46]. We can recover the axioms of a graded necessity modality (usually modelled by an exponential graded comonad [24]). For example, in Gerty the following are well typed: corresponding to ε : □₁A → A and δ_{r,s} : □_{r·s}A → □_r(□_s A), the operations of graded necessity / graded comonads. Since we cannot use arbitrary terms for grades in the implementation, we have picked some particular grades here for comult. First-class grading is future work, discussed in Section 6.
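These operations can be illustrated with a toy encoding (a sketch, not Gerty code; the `Box` class and its run-time grade checks are assumptions for illustration, standing in for the graded modality □ᵣ):

```python
# Toy graded-necessity values: a pair of a grade and a payload, illustrating
# counit : Box 1 a -> a  and  comult : Box (r*s) a -> Box r (Box s a).

class Box:
    def __init__(self, grade, value):
        self.grade, self.value = grade, value

def counit(b):
    # epsilon: only a value boxed at grade 1 may be unboxed outright
    assert b.grade == 1, "counit requires grade 1"
    return b.value

def comult(b, r, s):
    # delta_{r,s}: split a grade r*s box into nested boxes at r and s
    assert b.grade == r * s, "comult requires grade r*s"
    return Box(r, Box(s, b.value))

inner = comult(Box(6, "x"), 2, 3)
print(inner.grade, inner.value.grade, inner.value.value)  # 2 3 x
```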
Linear functions can be captured as A ⊸ B ≜ (x : (1,r) A) → B for an exact-usage semiring. It is straightforward to characterise a subset of Grtt programs that maps to the linear λ-calculus, akin to the encodings above. Thus, Grtt provides a suitable basis for studying both linear and non-linear theories alike.
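A usage check for the exact-usage semiring can be sketched as follows (a minimal sketch over a toy term language assumed for illustration, not the Grtt rules): a binder (x :(1,r) A) → B is linear when the computed usage of x in the body equals the declared subject grade 1.

```python
# Exact-usage semiring: the naturals with ordinary + and *.

def usage(term, var):
    """Count occurrences of `var` in a tiny term language:
    variables are strings, applications are ('app', f, a),
    lambdas are ('lam', x, body)."""
    if isinstance(term, str):
        return 1 if term == var else 0
    tag = term[0]
    if tag == 'app':
        # semiring addition combines usage across subterms
        return usage(term[1], var) + usage(term[2], var)
    if tag == 'lam':
        # shadowing: an inner binder hides the outer variable
        return 0 if term[1] == var else usage(term[2], var)
    raise ValueError(tag)

def checks_linear(binder_var, body, declared_grade=1):
    return usage(body, binder_var) == declared_grade

# the identity body is linear; the duplicating body x x is not
print(checks_linear('x', 'x'))                # True
print(checks_linear('x', ('app', 'x', 'x')))  # False
```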

Implementation
Our implementation Gerty is based on a bidirectionalised version of the typing rules given here, broadly following traditional schemes of bidirectional typing [19,20], but with grading (similar to [46], extended considerably to the dependent setting). We briefly outline the implementation scheme, and highlight a few key points, rules, and examples. We also use this implementation to explore further applications of Grtt, namely, optimising type checking algorithms.
Bidirectional typing splits declarative typing rules into check and infer modes. Furthermore, bidirectional Grtt rules split the grading context (left of ⊙) into input and output contexts where (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A is implemented via: where ⇐ rules check that t has type A and ⇒ rules infer (calculate) that t has type A. In both judgments, the context grading ∆ and context Γ left of ⊢ are inputs whereas the grade vectors σ 1 and σ 2 to the right of A are outputs. This input-output context approach resembles that employed in linear type checking [5,33,61]. Rather than following a "left over" scheme as in these works (where the output context explains what resources are left), the output grades here explain what has been used according to the analysis of grading ('adding up' rather than 'taking away').
For example, the following is the infer rule for function elimination: The rule can be read by starting at the input of the conclusion (left of ⊢), then reading top down through each premise, to calculate the output grades in the rule's conclusion. Any concrete value or already-bound variable appearing in the output grades of a premise can be read as causing an equality check in the type checker. The last premise checks that the output subject-type grade σ 13 from the first premise matches σ 1 + σ 3 (which were calculated by later premises). In contrast, function introduction is a check rule: Thus, dependent functions can be checked against type (x : (s,r) A) → B given input ∆; Γ by first inferring the type of A and checking that its output subject-type grade comprises all zeros 0. Then the body of the function t is checked against B under the context ∆, σ 1 ; Γ, x : A, producing grade vectors σ 2 , s ′ and σ 1 , r ′, where it is checked that s = s ′ and r = r ′ (described implicitly in the rule), i.e., the calculated grades match those of the binder. The implementation anticipates some further work for Grtt: the potential for grades which are first-class terms, in which case we anticipate complex equations on grades. For grade equality, Gerty has two modes: one which normalises terms and then compares for syntactic equality, and another which discharges constraints via an off-the-shelf SMT solver (we use Z3 [17]). We briefly discuss some performance implications in the next section.
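The 'adding up' discipline can be sketched as follows (an illustration of the output-grade bookkeeping only, not the Gerty rules; the binder grade s and the usage vectors are assumptions): inference returns a usage vector over the context, and for an application t₁ t₂ where t₁ has type (x :(s,r) A) → B, the output usage is σ₁ + s · σ₂.

```python
# Output-grade bookkeeping in the exact-usage semiring: application
# combines the function's usage with s copies of the argument's usage.

def add(v1, v2):
    return [a + b for a, b in zip(v1, v2)]

def scale(s, v):
    return [s * x for x in v]

def infer_app(usage_fun, s, usage_arg):
    # output usage of the whole application: sigma1 + s * sigma2
    return add(usage_fun, scale(s, usage_arg))

# context [x, y]; t1 uses x once, t2 uses y once, binder grade s = 2
print(infer_app([1, 0], 2, [0, 1]))  # [1, 2]
```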
Using grades to optimise type checking Abel posited that a dependent theory with quantitative resource tracking at the type level could leverage linearity-like optimisations in type checking [2]. Our implementation provides a research vehicle for exploring this idea; we consider one possible optimisation here.
Key to dependent type checking is the substitution of terms into types in elimination forms (i.e., application, tensor elimination). However, in a quantitative semiring setting, if a variable has subject-type grade 0, then we know it is irrelevant to type formation (it is not semantically depended upon, i.e., during normalisation). Consequently, substitutions into a 0-graded variable can be elided (or allocations to a closure environment can be avoided). We implemented this optimisation in Gerty when inferring the type of an application t 1 t 2 (rule ⇒ λ e above), where the type of t 1 is inferred as (x : (s,0) A) → B. For a quantitative semiring we know that x is irrelevant in B, thus we need not perform the substitution [t 2 /x]B when type checking the application.
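The optimisation can be sketched as follows (a minimal illustration over a toy term representation, not Gerty's implementation): when the subject-type grade is 0, the result type is returned without performing the substitution at all.

```python
# Grade-directed elision of substitution: subject-type grade 0 on the
# binder guarantees the body type does not depend on the variable.

def subst(ty, var, term):
    """Naive capture-unaware substitution over a toy term language
    (strings are variables/constants, tuples are compound types)."""
    if isinstance(ty, str):
        return term if ty == var else ty
    return tuple(subst(t, var, term) for t in ty)

def result_type(binder_var, subject_type_grade, body_type, arg):
    if subject_type_grade == 0:
        return body_type  # elide the (potentially large) substitution
    return subst(body_type, binder_var, arg)

# B = Vec n A depends on n, so a nonzero grade forces the substitution;
# with grade 0 the traversal is skipped entirely.
print(result_type('n', 1, ('Vec', 'n', 'A'), '3'))  # ('Vec', '3', 'A')
print(result_type('x', 0, 'B', 'big_term'))         # B
```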
We evaluate this on simple Gerty programs defining an n-ary "fan-out" combinator implemented via an n-ary application combinator, e.g., for arity 3: Note that fan3 uses its parameter x three times (hence the grade 3), which then incurs substitutions into the type of app3 during type checking, but each such substitution is redundant since the type does not depend on these parameters, as reflected by the subject-type grade 0.
To evaluate the optimisation and SMT solving vs. normalisation-based equality, we ran Gerty on the fan-out program for arities from 3 to 8, with and without the optimisation and under the two equality approaches. Table 1 gives the results. For grade equality based on normalisation, the optimisation yields increasingly significant speedups (up to 38%) as the overall cost increases. For SMT-based grade equality, the optimisation causes some slowdown for arities 4 and 5 (and only just breaks even for arity 3). This is because working out whether the optimisation can be applied requires checking whether grades are equal to 0, which incurs extra calls to the SMT solver. Eventually, this cost is outweighed by the time saved by eliding substitutions. We note the SMT-based approach is often worse due to the extra start-up cost of the SMT solver. Since the grades here are all relatively simple, it is often more efficient for the type checker to normalise and compare terms rather than compiling to SMT and booting up the external solver.
The baseline performance here is poor (the implementation is not highly optimised), partly due to the overhead of frequently computing type formation judgments to accurately account for grading. However, such checks are often recomputed and could be optimised away by memoisation. Nevertheless, this experiment gives evidence that grades can indeed be used to optimise type checking. A thorough investigation of grade-directed optimisations is future work.

Discussion
Grading, coeffects, and quantitative types The notion of coeffects, describing how a program depends on its context, arose in the literature from two directions: as a dualisation of effect types [48] and a generalisation of Bounded Linear Logic to general resource semirings [26,10]. Coeffect systems can capture reuse bounds, information flow security [24], hardware scheduling constraints [26], and sensitivity for differential privacy [16,22]. A coeffect-style approach also enables linear types to be retrofitted to Haskell [8]. A common thread is the annotation of variables in the context with usage information, drawn from a semiring. Our approach generalises this idea to capture type, context, and computational usage.
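As an illustration of the semiring parameterisation (a sketch; this particular instance is not tied to any one cited system), the standard 0–1–ω semiring used for linearity-style analyses can be written as:

```python
# The 0-1-omega semiring: grades are 'unused' (0), 'used once' (1),
# or 'used many times' (W). Addition accumulates usage across subterms;
# multiplication scales usage under a binder.

W = 'w'  # omega: 'many'

def add01w(a, b):
    if a == 0:
        return b
    if b == 0:
        return a
    return W  # 1+1, 1+w, w+w are all 'many'

def mul01w(a, b):
    if a == 0 or b == 0:
        return 0  # multiplication by 0 discards usage
    if a == 1:
        return b
    if b == 1:
        return a
    return W

print(add01w(1, 1))   # w
print(mul01w(0, W))   # 0
```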
McBride reconciles linear and dependent types, allowing types to depend on linear values [44], an approach refined by Atkey as Quantitative Type Theory [6]. The approach employs coeffect-style annotation of each assumption in a context with an element of a resource accounting algebra. Qtt judgments have the form x 1 ρ1: A 1 , . . . , x n ρn: A n ⊢ t ρ: B, where ρ i , ρ are elements of a semiring, and ρ = 0 or ρ = 1, respectively denoting a term which can be used in type formation (erased at runtime) or at runtime. Dependent function arrows are of the form (x ρ : A) → B, where ρ is a semiring element that denotes the computational usage of the parameter.
Variables used for type formation but not computation are annotated by 0. Consequently, type formation rules all have the form 0Γ ⊢ T, meaning every variable assumption has a 0 annotation. Grtt is similar to Qtt, but differs in its more extensive grading to track usage in types, rather than blanketing all type usage with 0. In Atkey's formulation, a term can be promoted to a type if its result and dependency quantities are all 0. A set of rules provides formation of computational type terms, but these are also graded at 0. Consequently, it is not possible to construct an inhabitant of Type that can be used at runtime. We avoid this shortcoming, allowing matching on types. For example, a computation t that inspects a type variable a would be typed as (∆, 0, ∆ ′ | σ 1 , 1, σ ′ 1 | σ 2 , r, σ ′ 2 ) ⊙ Γ, a : Type, Γ ′ ⊢ t : B, denoting 1 computational use of a and r uses in the type B.
At first glance, it seems Qtt could be encoded into Grtt by taking the semiring R of Qtt and parameterising Grtt by the semiring R ∪ {0̂}, where 0̂ denotes arbitrary usage in type formation. However, there is an impedance between the two systems, as Qtt always annotates type use with 0. It is not clear how to make this happen in Grtt whilst still having non-0 tracking at the computational level, since we use one semiring for both. Exploring an encoding is future work.
Dependent types and modalities Dal Lago and Gaboardi extend PCF with linear and lightweight dependent types [15] (later adapted for differential privacy analysis [23]). They add a natural number type indexed by upper- and lower-bound terms which index a modality. Combined with linear arrows of the form [a < I].σ ⊸ τ, these describe functions using the parameter at most I times (where the modality acts as a binder for the index variable a, which denotes instantiations). Their system is leveraged to give fine-grained cost analyses in the context of Implicit Computational Complexity. Whilst a powerful system, their approach is restricted in terms of dependency: only a specialised type can depend on specialised natural-number-indexed terms (which are non-linear).
Gratzer et al. define a dependently-typed language with a Fitch-style modality [31]. It seems that such an approach could also be generalised to a graded modality, although we have used the natural-deduction style for our graded modality rather than the Fitch-style.
As discussed in Section 1, our approach closely resembles Abel's resourceful dependent types [2]. Our work expands on the idea, including tensors and the graded modalities. We considerably develop the associated metatheory, provide an implementation, and study applications.
Further work A powerful extension of Grtt left as future work is to allow grades to be first-class terms. Typing rules in Grtt involving grades could be adapted to internalise grade elements as first-class terms, and additional terms could then be added to internalise the semiring operations. We could then, e.g., define the map function over sized vectors, which requires that the parameter function is used exactly the same number of times as the length of the vector: This type provides strong guarantees: the only well-typed implementations do the correct thing, up to permutations of the result vector. Without the grading, an implementation could apply f fewer than n times, replicating some of the transformed elements; here we know that f must be applied exactly n times.
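One plausible rendering of such a map type, with the first-class grade n tracking that f is applied exactly n times (the remaining grades r₁, r₂ are schematic assumptions):

```latex
\mathsf{map} : (n :_{(r_1,r_2)} \mathbb{N})
             \to (f :_{(n,0)} (x :_{(1,0)} A) \to B)
             \to (xs :_{(1,0)} \mathsf{Vec}\;n\;A)
             \to \mathsf{Vec}\;n\;B
```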
A further appealing possibility for Grtt is to allow the semiring to be defined internally, rather than as a meta-level parameter, leveraging dependent types for proofs of the key properties. An implementation could specify what is required for a semiring instance, e.g., a record type capturing the operations and properties of a semiring. The rules of Grtt could then be extended, similarly to the extension to first-class grades, with the provision of the semiring(s) coming from Grtt terms; anywhere with a grading premise could instead take its grades from a term of this Semiring type. This opens up the ability for programmers and library developers to provide custom modes of resource tracking with their libraries, allowing domain-specific program verification.
One expressive extension is to capture analyses which have an ordering, e.g., grading by a pre-ordered semiring, allowing a notion of approximation. This would enable analyses such as bounded reuse from Bounded Linear Logic [29], intervals with least- and upper-bounds on use [46], and top-completed semirings with an ∞-element denoting arbitrary usage as a fall-back. We have made progress in exploring the interaction between approximation and dependent types; the remainder is left as future work.
Conclusions The paradigm of 'grading' exposes the inherent structure of a type theory, proof theory, or semantics by matching the underlying structure with some algebraic structure augmenting the types. This idea has been employed for reasoning about side effects via graded monads [36], and reasoning about data flow as discussed here by semiring grading. Richer algebras could be employed to capture other aspects, such as ordered logics in which the exchange rule can be controlled via grading (existing work has done this via modalities [35]).
We developed the core of grading in the context of dependent types, treating types and terms equally (as one comes to expect in dependent type theories). The tracking of data flow in types appears complex since we must account for how variables are used to form types both in the context and in the subject type, making sure not to double-count context formation use. The result, however, is a powerful system for studying dependencies in type theories, as shown by our ability to study different theories just by specialising grades.
Whilst not yet a fully fledged implementation, Gerty is a useful test bed for further exploration.

A Appendix
A.1 Full typing rules for Grtt Typing judgments have the form (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A. Here σ 1 and σ 2 are grade vectors, i.e., vectors of grades which describe subject (t) and subject-type (A) use, respectively. ∆ is a context grade vector, i.e., a vector of grade vectors, accounting for usage in the typing of assumptions in the context (Γ).
Throughout the theory, we implicitly assume that for any judgment (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ J , we have |∆| = |σ 1 | = |σ 2 | = |Γ| (i.e., that sizes align), and that for every vector σ in ∆, the size of σ is the same as its index in ∆. We assume these sizing requirements for all judgments (even those without grade vectors), not just typing ones.
The notation 0 is used to denote a vector consisting entirely of 0 grades, to whichever size would be necessary to satisfy the above sizing conditions. We may write 0 π to specify a size (π), if we feel that it aids understanding.
Proofs over multiple judgments When lemmas should hold for multiple forms of judgment, we may use the syntax (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ J (where J ranges over judgments), optionally restricting J to a subset of judgments. Repeated uses of J restrict the form of the resulting judgments accordingly, and term operations map over the judgment. Operations on terms lift naturally to operations on judgments.

Definition 2.2 (Move on grade vectors). mv(π 1 ; π 2 ; σ) moves the element at index π 1 to index π 2 in σ (pushing back elements as necessary). As a special case, we define exch(π; σ) = mv((π + 1); π; σ).

Definition 2.3 (Move on context grade vectors). The operation mv(π 1 ; π 2 ; ∆) is mv(π 1 ; π 2 ; σ) for each σ in ∆. As a special case, we define exch(π; ∆) = mv((π + 1); π; ∆).

Definition 2.4 (Contraction on context grade vectors). contr(π; ∆) combines the elements at index π and π + 1 of each grade vector in ∆, via addition. This is defined as contr(π; ∆) = ∆\(π+1) + (∆/(π+1)) * (0 π , 1).

Definition 2.5 (Insert on grade vectors). The operation ins(π; s; σ) inserts the element s at index π in σ, such that all elements preceding index π keep their positions, and every element at index π or greater in σ is shifted one index later in the new vector.
Definition 2.7 (Choose on grade vectors). The operation σ/π selects the element at index π of σ.

Definition 2.8 (Choose on context grade vectors). The operation ∆/π is σ/π on each σ in ∆, producing a new grade vector of size |∆|.

Definition 2.9 (Discard on grade vectors). The operation σ\π removes the element at index π from σ.

Definition 2.10 (Discard on context grade vectors). The operation ∆\π is σ\π for each σ in ∆.
Definition 2.11 (Splash multiplication of grade vectors). The operation σ 1 * σ 2 scales σ 2 by each element of σ 1 to produce a context grade vector.

Definition 2.12 (Addition on grade vectors). The operation σ 1 + σ 2 combines the two vectors element-wise using the semiring addition, right-padding the shorter vector with 0 to ensure correct sizing.

Definition 2.13 (Addition on context grade vectors). The operation ∆ 1 + ∆ 2 is pointwise addition of the elements of ∆ 1 and ∆ 2 'as much as possible': if the corresponding vectors at a given index are of different sizes, then the shorter vector is treated as if it were right-padded with zeros.
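The vector operations above can be transcribed directly (a minimal sketch over Python lists, with grades as integers and a context grade vector as a list of lists):

```python
# Direct transcriptions of the grade-vector operations defined above.

def mv(p1, p2, sigma):
    """Move the element at index p1 to index p2 (Definition 2.2)."""
    rest = sigma[:p1] + sigma[p1+1:]
    return rest[:p2] + [sigma[p1]] + rest[p2:]

def exch(p, sigma):
    """Exchange adjacent elements at p and p+1 (special case of mv)."""
    return mv(p + 1, p, sigma)

def contr(p, delta):
    """Contraction: add the grades at indices p and p+1 in each row
    of the context grade vector (Definition 2.4)."""
    return [row[:p] + [row[p] + row[p+1]] + row[p+2:] for row in delta]

def ins(p, s, sigma):
    """Insert grade s at index p (Definition 2.5)."""
    return sigma[:p] + [s] + sigma[p:]

def choose(sigma, p):
    """sigma / p: select the element at index p (Definition 2.7)."""
    return sigma[p]

def discard(sigma, p):
    """sigma \\ p: remove the element at index p (Definition 2.9)."""
    return sigma[:p] + sigma[p+1:]

def vadd(s1, s2):
    """Element-wise addition, right-padding the shorter vector with 0
    (Definition 2.12)."""
    n = max(len(s1), len(s2))
    p1, p2 = s1 + [0] * (n - len(s1)), s2 + [0] * (n - len(s2))
    return [a + b for a, b in zip(p1, p2)]

def splash(s1, s2):
    """Splash multiplication s1 * s2: scale s2 by each element of s1,
    producing a context grade vector (Definition 2.11)."""
    return [[a * b for b in s2] for a in s1]

print(exch(0, [1, 2, 3]))     # [2, 1, 3]
print(contr(0, [[1, 2, 3]]))  # [[3, 3]]
print(vadd([1, 2], [3]))      # [4, 2]
print(splash([2, 0], [1, 1])) # [[2, 2], [0, 0]]
```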
Lemma 3.5 (Typing an assumption in a judgmental context).
Lemma 3.6 (Typing the type of a term).

Contraction
Lemma 3.22 (Contraction). The following rule is admissible:

Exchange
Lemma 3.23 (Exchange). The following rule is admissible: Corollary 3.23.1 (Exchange (general)). As a corollary to Lemma 3.23, the following rule is admissible: Corollary 3.23.2 (Exchange from end). As a corollary to Corollary 3.23.1, the following rule is admissible:

Weakening
Lemma 3.24 (Weakening). The following rule is admissible, where J is typing, equality, or subtyping:
Lemma 3.25 (Weakening for well-formed contexts). The following rule is admissible: Lemma 3.26 (Weakening (general)). The following rule is admissible (where J is typing, equality, or subtyping):

Substitution
Lemma 3.27 (Substitution for judgments). If the following premises hold: Lemma 3.28 (Equality through substitution). If the following premises hold:

Properties of operations
3.5 Properties of equality Lemma 3.30 (Equality is an equivalence relation). For all, we have:

Properties of subtyping
Lemma 3.34 (Subtyping inversion to typing).
Type l for some level l.

Strong Normalization
Definition 4.1. Typing can be broken up into the following stages: It is the case that Kind ∩ Type = ∅ and Const ∩ Term = ∅.
Definition 4.3. The set of base terms B is defined by: 3. If t 1 ∈ B and t 2 ∈ SN, then (t 1 t 2 ) ∈ B, 4. If t 2 ∈ B and t 1 ∈ SN, then (let (x, y) = t 1 in t 2 ) ∈ B, 5. If t 2 ∈ B and t 1 ∈ SN, then (let x = t 1 in t 2 ) ∈ B, 6. If A, B ∈ SN, then ((x : (r,s) A) → B) ∈ B for any r, s ∈ R, 7. If A, B ∈ SN, then ((x : r A) ⊗ B) ∈ B for any r ∈ R, 8. If A ∈ SN, then ( r A) ∈ B for any r ∈ R.
Definition 4.4. The key redex of a term is defined by: 1. If t is a redex, then t is its own key redex, 2. If t 1 has key redex t, then (t 1 t 2 ) has key redex t, 3. If t 1 has key redex t, then (let (x, y) = t 1 in t 2 ) has key redex t, 4. If t 1 has key redex t, then (let x = t 1 in t 2 ) has key redex t.
The term obtained from t by contracting its key redex is denoted by red k t.
Lemma 4.5. The key redex of a term, if it exists, is unique, and it is a head redex.
Definition 4.6. A set of terms X is saturated if: 3. If red k t ∈ X and t ∈ SN, then t ∈ X.
The collection of saturated sets is denoted by SAT.
Lemma 4.7 (SN is saturated). Every saturated set is non-empty and SN is saturated.
Proof. By definition.
Proof of part 1:

T Type
This case holds trivially, because Type 1 cannot be of type Type 1 .
In this case we have: Thus, we must show that: We know by Definition C.10 and Definition C.9 that x ε = ε x ∈ K A ; thus, we obtain our result.
Thus, by Definition C.8 we must show that: (x : (s,r) B 1 ) → B 2 ε ∈ K Type 0 = SAT By Definition C.10 we know that: By the IH: SAT which holds by the closure of SAT under function spaces.

T Ten
This case follows nearly exactly as the previous case, but ending with a cartesian product, SAT×SAT, rather than the function space.

T Fun
In this case we know: We also know that (∆ | σ 3 + σ 5 | 0) ⊙ Γ ⊢ (x : (s,r) B 1 ) → B 2 : Type 1 by assumption. This implies by inversion that B 1 , B 2 ∈ Kind, B 1 ∈ Kind and B 2 ∈ Type, or B 1 ∈ Type and B 2 ∈ Kind. We consider each case in turn: Then we must show that: By Definition C.10 we know that: We know by assumption that ∆ ⊙ Γ |= ε and (∆, Therefore, by the IH: is what was to be shown. Subcase 2: Suppose B 1 ∈ Kind and B 2 ∈ Type. Then we must show that: By Definition C.10 we know that: which is what was to be shown.
Subcase 3: Suppose B 1 ∈ Type and B 2 ∈ Kind. Then we must show that: By Definition C.10 we know that: which is what was to be shown.

T App
In this case we know that: It suffices to show that: In this case we know by assumption that [t 2 /x]B 2 ∈ Kind which implies that B 2 ∈ Kind, and this further implies that ((x : (s,r) B 1 ) → B 2 ) ∈ Kind. However, there are two cases for B 1 , either B 1 ∈ Kind or B 1 ∈ Type. We consider each case in turn.
Subcase 1: Suppose B 1 ∈ Kind. Then we know by assumption that ∆ ⊙ Γ |= ε. Thus, we apply the IH to obtain: Using the IH's and Definition C.10 we know: which was what was to be shown.
Subcase 2: Suppose B 1 ∈ Type. Then we know by assumption that ∆ ⊙ Γ |= ε. Thus, we apply the IH to obtain: Using the IH's and Definition C.10 we know: which was what was to be shown.

T Pair
In this case we know that: It suffices to show that: By assumption we know that ((x : r B 1 ) ⊗ B 2 ) ∈ Kind. Thus, it must be the case that either B 1 , B 2 ∈ Kind, B 1 ∈ Kind and B 2 ∈ Type, or B 1 ∈ Type and B 2 ∈ Kind. We consider each of these cases in turn.
Case 8: In this case we know that: It suffices to show that: Based on these assumptions, it must be the case that C ∈ Kind and t 2 ∈ Type. First, we can conclude by kinding for typing that: for some vector σ 1 . Then by well-formed contexts for typing we know that: (∆, σ 3 , (σ 4 , r)) ⊙ (Γ, x : B 1 , y : B 2 ) ⊢ This then implies that: We will use these to apply the IH and define the typing evaluation below.
We must now consider cases for when B 1 , B 2 ∈ Kind, B 1 ∈ Kind and B 2 ∈ Type, B 1 ∈ Type and B 2 ∈ Kind, and B 1 , B 2 ∈ Type. We consider each case in turn.
Subcase 1: Suppose B 1 , B 2 ∈ Kind. Then we know: By Definition C.8 and Definition C.10 it suffices to show: By applying the IH to the premise for t 1 and Definition C.8: Hence, π 1 t 1 ε ∈ K B 1 and π 2 t 1 ε ∈ K B 2 . This, along with the kinding judgments given above, implies the following, given the assumption ∆ ⊙ Γ |= ε. We also know: by applying kinding for typing to the premise for t 2 . We now have everything we need to apply the IH to the premise for t 2 : IH (2): Now by Definition C.10 and the previous results: which was what was to be shown.
Subcase 2: Suppose B 1 ∈ Kind and B 2 ∈ Type. Then we know: By Definition C.8 and Definition C.10 it suffices to show: By applying the IH to the premise for t 1 and Definition C.8: This, along with the kinding judgments given above, implies the following, given the assumption ∆ ⊙ Γ |= ε. We also know: by applying kinding for typing to the premise for t 2 . We now have everything we need to apply the IH to the premise for t 2 : IH (2): Now by Definition C.10 and the previous results: which was what was to be shown.
Subcase 3: Suppose B 1 ∈ Type and B 2 ∈ Kind. This case is similar to the previous case using: Subcase 4: Suppose B 1 , B 2 ∈ Type. This case is similar to the previous case using: This case follows directly from the induction hypothesis.
Case 10: This case follows directly from the induction hypothesis.

T BoxE
In this case we know that: It suffices to show that: In this case we know that ([t 1 /z]B 2 ) ∈ Kind, and thus, B 2 ∈ Kind and either B 1 ∈ Kind or B 1 ∈ Type. We cover both of these cases in turn.
Subcase 1: Suppose B 1 ∈ Kind. It suffices to show that: As we have seen in the previous cases we can apply well-formed contexts for typing to obtain that: We can now apply the IH to the premise for t 1 to obtain: Using the previous two facts along with the assumption that ∆ ⊙ Γ |= ε we may obtain In addition, we know that (∆, σ 7 | σ 6 , r | 0) ⊙ Γ, z : s B 1 ⊢ B 2 : Type 1 . Thus, we can now apply the induction hypothesis a second time.
IH (2): which was what was to be shown.
Subcase 2: Suppose B 1 ∈ Type. It suffices to show that: As we have seen in the previous cases we can apply well-formed contexts for typing to obtain that: Using this along with the assumption that ∆ ⊙ Γ |= ε we may obtain In addition, we know that (∆, σ 7 | σ 6 , r | 0) ⊙ Γ, z : s B 1 ⊢ B 2 : Type 1 . Thus, we can now apply the induction hypothesis a second time.
IH (2): which was what was to be shown.
Case 12: This case follows by first applying the induction hypothesis to the typing premise, and then applying Lemma C.15 to obtain that K A = K A ′ obtaining our result.
We now move onto the second part of this result assuming the first. In this part we will show: Recall that this is a proof by mutual induction on (∆ | σ 1 | σ 2 ) ⊙ Γ ⊢ t : A and (∆ | σ | 0) ⊙ Γ ⊢ A : Type 1 .

T Var
This case is impossible, because the well-formed context premise fails, because Type 1 has no type.

T Arrow
In this case we know that: It suffices to show that: In this case either B 1 , B 2 ∈ Kind, B 1 ∈ Kind and B 2 ∈ Type, or B 1 ∈ Type and B 2 ∈ Kind. We consider each case in turn.
Subcase 1: Suppose B 1 , B 2 ∈ Kind. It suffices to show: Now suppose X ∈ (K B 1 → K B 2 ) and Y ∈ K B 1 . We know by assumption that (∆ | σ 3 | 0) ⊙ Γ ⊢ B 1 : Type l1 and ∆ ⊙ Γ |= ε, and so we can apply the induction hypothesis to the premise for B 1 to obtain: The previous facts now allow us, by Definition C.9, to obtain: Thus, we can now apply the induction hypothesis to the premise for B 2 to obtain: Then we know by IH(1) that B 1 ε (Y ) ∈ SAT and by IH(2) B 2 ε[x →Y ] (X(Y )) ∈ SAT, thus: Then by Lemma C.8: Therefore, we obtain our result.
Now suppose X ∈ K B 2 . We know by assumption that (∆ | σ 3 | 0) ⊙ Γ ⊢ B 1 : Type 0 and ∆ ⊙ Γ |= ε, and so we can apply the first part of the induction hypothesis to the premise for B 1 to obtain: IH (1): B 1 ε ∈ K Type 0 = SAT Now we know by assumption that (∆, σ 3 | σ 4 , r | 0) ⊙ Γ, x : B 1 ⊢ B 2 : Type 0 , and we can now show by Definition C.9 that (∆, σ 3 ) ⊙ (Γ, x : B 1 ) |= ε holds. So we can apply the IH to the former judgment to obtain: At this point we can see that (λX ∈ K B 2 .( B 1 ε → B 2 ε (X))) ∈ SAT by the previous facts, and the fact that SAT is closed under function spaces.

T Ten
This case is similar to the previous case.

T Box
This case follows from the IH.
Case 6: This case is impossible, because Type 1 has no type B.
Proof. This is a proof by induction on the assumed typing derivation.

T Type
In this case we have: We can now see that this case holds trivially, because Type 1 has no type.
In this case we have: Now either A ∈ Kind or A ∈ Type. We consider both cases in turn.

20
Subcase 1: Suppose A ∈ Kind and ∆ ⊙ Γ |= ε ρ. It suffices to show: But, this holds by Definition C.12, because the well-formed context premise above implies the proper kinding of A.
Subcase 2: Suppose A ∈ Type and ∆ ⊙ Γ |= ε ρ. It suffices to show: But, this holds by Definition C.12, because the well-formed context premise above implies the proper kinding of A.
Case 3: In this case we have: We only need to consider the first case of this theorem, because Type 1 has no type. So suppose ∆ ⊙ Γ |= ε ρ. It suffices to show: We know by assumption that ∆ ⊙ Γ |= ε ρ so we can apply the IH to conclude: Now suppose t ∈ B 1 ε . Then we know by Definition C.12 that (∆, σ 3 ) ⊙ (Γ, x : Thus, by applying the IH to the premise for B 2 we may conclude that IH (2): holds for every t. Therefore, we may conclude our result.

T Ten
Similar to the previous case.

T Fun
In this case we have: We have two cases to consider, either ((x : (s,r) B 1 ) → B 2 ) ∈ Kind or ((x : (s,r) B 1 ) → B 2 ) ∈ Type. We cover both cases in turn.
Then by the IH: Thus, we obtain our result. Subcase 2: Suppose B 1 ∈ Kind and B 2 ∈ Type. This case is similar to the previous case, except we use part two of the IH. Subcase 3: Suppose B 1 ∈ Type and B 2 ∈ Kind. Similar to the previous case.
Subcase 2: Suppose ((x : (s,r) B 1 ) → B 2 ) ∈ Type. This case is similar to the above, but we will use the second part of the IH.

T App
In this case we have: We have several cases to consider.
By the IH we have: Notice that by Lemma C.16 we know that t 2 ε ∈ K B 1 . So using C.10 we know: Subcase 2: Suppose B 1 ∈ Kind and [t 2 /x]B 2 ∈ Type. It suffices to show: By the IH we have: Notice that by Lemma C.16 we know that t 2 ε ∈ K B 1 . So using C.10 we know: Subcase 3: Suppose B 1 ∈ Type and [t 2 /x]B 2 ∈ Type. It suffices to show: By the IH we have: Using C.10 we know:

T Pair
This case is similar to the case for λ-abstraction above.
Case 8: Similar to the application case above.
Case 9: This case follows from the IH.

Case 10: In this case we have: We have two cases to consider.
Subcase 1: Suppose B ∈ Kind. It suffices to show: At this point, this case holds by the IH. Subcase 2: Suppose B ∈ Type. Similar to the previous case.

T BoxE
In this case we have: We have several cases to consider: Subcase 1: Suppose B 1 , B 2 ∈ Kind. It suffices to show: By the IH: But, this follows by definition and Lemma C.16. By the IH: Subcase 2: Suppose B 1 ∈ Kind and B 2 ∈ Type. Similar to the previous case.
Subcase 3: Suppose B 1 ∈ Type and B 2 ∈ Kind. It suffices to show: By the IH: But, this follows by definition. By the IH: Subcase 4: Suppose B 1 , B 2 ∈ Type. Similar to the previous case.

Corollary 4.17.1 (Strong Normalization). For every
Proof. Similarly to CC, we can define a notion of canonical element in K A , and define a term valuation ∆ ⊙ Γ |= ε ρ, and then conclude SN by the previous theorem.
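As an illustration of what the corollary provides, the following sketch normalizes terms of a minimal untyped λ-syntax by repeated leftmost-outermost β-reduction (the datatypes and function names here are ours and purely illustrative, not the paper's definitions); termination of the reduction loop on well-typed terms is exactly what strong normalization guarantees:

```python
from dataclasses import dataclass

# Hypothetical minimal λ-syntax, for illustration only.
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class Lam: param: str; body: object
@dataclass(frozen=True)
class App: fn: object; arg: object

def subst(t, x, s):
    # Substitute the closed term s for x in t (closedness rules out capture).
    if isinstance(t, Var): return s if t.name == x else t
    if isinstance(t, Lam): return t if t.param == x else Lam(t.param, subst(t.body, x, s))
    return App(subst(t.fn, x, s), subst(t.arg, x, s))

def step(t):
    # One leftmost-outermost β-step; None if t is already in normal form.
    if isinstance(t, App):
        if isinstance(t.fn, Lam):
            return subst(t.fn.body, t.fn.param, t.arg)
        fn = step(t.fn)
        if fn is not None: return App(fn, t.arg)
        arg = step(t.arg)
        if arg is not None: return App(t.fn, arg)
    if isinstance(t, Lam):
        body = step(t.body)
        if body is not None: return Lam(t.param, body)
    return None

def normalize(t):
    # Iterate reduction; SN guarantees this terminates on well-typed terms.
    while (t2 := step(t)) is not None:
        t = t2
    return t
```

For example, normalizing (λx. x) (λy. y) yields λy. y after one step.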

A Proofs for Graded Modal Dependent Type Theory
Proof. For well-formedness: By induction on the definition of well-formed contexts.
• Case Wf Empty. Trivial since it does not match the form of the lemma.
• Case Wf Ext We consider two cases depending on the syntactic structure of Γ 2 and ∆ 2 (simultaneously, since they must have the same size by the lemma statement).
For typing: By induction on the structure of the typing derivation, thus satisfying the goal here.
Case T Var: Thus, we can apply the variable rule: which satisfies the goal here.

Case T Arrow: By induction on the premises we get: Then we can derive: which satisfies this case.
Case T Ten: Same reasoning as above, since the structure is exactly the same and thus σ 1 \π = σ ′ 1 , σ ′′ 1 , from which we form the derivation: satisfying the goal here.
• Case T TyConv: By induction we have that: Then we can build the judgment: As required.
For subtyping, by standard induction and re-application (see Section A.8). For equality, by standard induction and re-application (see Section A.8).
We need to show ∆ ⊙ Γ ⊢, which holds by premise. Case.
This holds by induction on the typing premise for A. Each of the cases for T Ten, T App, T Pair, T Box, T BoxI, T BoxE, and T TenCut proceeds similarly. Case.
For subtyping: all cases proceed trivially by induction.
For equality: all cases proceed trivially by induction, with the exception of TEQ Fun, which proceeds similarly to T Fun.
Proof. This holds by assumption over judgments. This may also be proven inductively, by adding size annotations to occurrences of 0.
Proof. By induction on the definition of well-formed contexts.
• Case Wf Empty. Trivial since it does not match the form of the lemma.
• Case Wf Ext We consider two cases depending on the syntactic structure of Γ 2 and ∆ 2 (simultaneously, since they must have the same size by the lemma statement).
Lemma 3.5 (Typing an assumption in a judgmental context).
Then we will show (∆ | 0 | 0) ⊙ Γ ⊢ Type suc l : Type suc suc l which holds by T Type and the well-formedness premise for Γ. Case.
The proof for T Ten follows the same process. Case.
Then we will show (∆ | σ 1 + σ 3 | 0) ⊙ Γ ⊢ (x : r A) ⊗ B : Type l ′ for some level l ′ . Applying Lemma 3.5 to the typing premise for B, we have (∆ | σ 1 | 0) ⊙ Γ ⊢ A : Type l ′′ (for some level l ′′ ). We can then apply this result, along with the typing premise for B to T Ten, to obtain: As required. Case.
Then we will show (∆ | 0 | 0) ⊙ Γ ⊢ Type l : Type suc l which holds by the following derivation: Case.
Then we will show (∆ | σ 2 | 0) ⊙ Γ ⊢ s A : Type l , for some level l. By induction on the typing premise for t, we have (∆ | σ 2 | 0) ⊙ Γ ⊢ A : Type l , therefore, we can perform the following application to achieve our goal: Case.
Proof. For well-formed contexts: Case.
Then our goal holds by the following derivation: Then our goal holds by the following derivation: For typing: most cases hold by induction and reapplication. Use when necessary. For equality, all cases proceed by induction then re-application to respective rules. For subtyping, all cases proceed by induction then re-application to respective rules.
Then our goal holds by induction.
Then our goal holds by induction.
for some level l.
Then our goal holds by induction.
Then our goal holds by the following derivation: Proof. The proof proceeds similarly to the proof for Lemma 3.11, but using Lemma 3.15.
Then by Lemma 3.16 we have: Then our goal holds by the following derivation: Case.
Then our goal holds by the following derivation: Lemma 3.14 (Arrow subtyping inversion).
Type l ′ , then x = y, s = s ′ , r = r ′ , and respectively, based on J 1 , Proof. The proof proceeds similarly to the proof for Lemma 3.16.

Lemma 3.15 (Tensor subtyping inversion). If
Proof. The proof proceeds similarly to the proof for Lemma 3.14.

Lemma 3.16 (Box subtyping inversion). If
, then s = s ′ , and respectively, based on J 1 , Proof. For subtyping, by induction, as follows: Case. With: Case.

ST Eq
Then by induction we have s = s ′ and (∆ | σ | 0) ⊙ Γ ⊢ A = A ′ : Type l , and therefore have ( Case.

ST Trans
Then by ST Trans we have (∆ | σ) ⊙ Γ ⊢ s A ≤ C, and therefore our goal holds by induction. Case.

ST Trans
Then by ST Trans we have (∆ | σ) ⊙ Γ ⊢ C ≤ s ′ A ′ , and our goal holds by induction. Case. Case.

ST Trans
Then by induction we have (∆ | σ) ⊙ Γ ⊢ A ′′ ≤ A ′ , and therefore our goal holds by ST Trans. Case.
Then our goal holds by induction. Case.

TEQ ConvTy
Then by ST Trans we have (∆ | σ) ⊙ Γ ⊢ C ′ ≤ Type l , and our goal holds by induction. Case.
Remaining cases proceed similarly.
Then our goal holds by the following derivation:

T Arrow
As required. The proofs for T Ten, T Fun, T App, T Pair, T TenCut, T Box, T BoxI, T BoxE, and T TyConv proceed similarly. Case.
Our goal holds by the following derivation: Case.
Our goal holds by the following derivation: Our goal holds by the following derivation: Case.
Corollary 3.23.2 (Exchange from end). As a corollary to Corollary 3.23.1, the following rule is admissible: Proof. This is Corollary 3.23.1 with Γ 3 being empty.
Then our goal is to show: which we obtain through the following derivation: Case.

T Arrow
Then our goal is to show: which we obtain through the following derivation: The proofs for T Ten, T Fun, T App, T Pair, T TenCut, T Box, T BoxI, T BoxE, and T TyConv proceed similarly, using induction and Lemma 3.23. Case.
Then our goal is to show: Which holds by the following derivation: For equality, by standard induction and re-application (see Section A.8). For subtyping, by standard induction and re-application (see Section A.8).

A.2.4 Substitution
Lemma 3.27 (Substitution for judgments). If the following premises hold: Proof. Throughout the proof, we make implicit use of the following size information derived from the premises (largely via Lemma 3.2); further size calculations are trivial, and we typically do not bring attention to them. For well-formed contexts, we proceed by induction on the structure of (∆, σ 1 , ∆ ′ ) ⊙ Γ 1 , x : A, Γ 2 ⊢, as follows:

Wf Empty
Trivial as it does not match the form of the typing premise. Case.
Then our goal is: Which holds by Lemma 3.3 on the typing premise for A. Case.
This proceeds similarly to the case for T Arrow. Case.
Therefore we have t ′ = x and B = A. Our full goal is therefore: We can form the following inductive premises: Giving: We can then form the following derivation to achieve our goal: Case.
For equality, we proceed by induction and re-application on the structure of (∆, σ 1 , ∆ ′ | σ 3 , s, σ 4 | σ 5 , r, σ 6 ) ⊙ Γ 1 , x : A, Γ 2 ⊢ t ′′ : B (in some cases substitutions need to be rewritten; refer to the typing cases for details). See Section A.8. For subtyping, by standard induction and re-application (see Section A.8).
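The quantitative content of substitution lemmas in graded theories is a semiring calculation on grade vectors: the uses σ of the substituted term are scaled by the grade s at which the variable was used, and added pointwise to the remaining uses σ'. A minimal sketch of this calculation (the function name and the choice of natural numbers as the semiring are illustrative assumptions on our part, not the lemma's exact statement):

```python
def subst_grades(sigma_prime, s, sigma):
    """Pointwise σ' + s·σ, the grade-vector shape of substitution.

    Grades here are natural numbers; any semiring (+, ·) would do.
    """
    assert len(sigma_prime) == len(sigma), "vectors must grade the same context"
    return [a + s * b for a, b in zip(sigma_prime, sigma)]
```

For example, substituting a term whose usage vector is [0, 1, 1] for a variable used at grade 2 into a term with remaining uses [1, 0, 2] gives the combined vector [1, 2, 4].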

A.8 Standard results
For inductive results on multi-judgment lemmas, refer to the following lists for how to obtain goals:
For subtyping:
• ST Eq This holds by induction then re-application to the rule.
• ST Trans This holds by induction then re-application to the rule.
• ST Ty This holds by induction then re-application to the rule.
• ST Arrow This holds by induction then re-application to the rule (see the T Arrow case for how to handle the typing premise for [[B]]).
• ST Ten This holds by induction then re-application to the rule (see the T Fun case for how to handle the extended context).
• ST Box This holds by induction then re-application to the rule.
For equality:
• TEQ Refl This holds by induction then re-application to the rule.
• TEQ Trans Holds similarly to the case for TEQ Refl.
• TEQ Sym Holds similarly to the case for TEQ Refl.
• TEQ ConvTy Holds similarly to the case for TEQ Refl.
• TEQ Arrow Holds similarly to the case for T Arrow.
• TEQ ArrowComp Holds similarly to the case for T App (induction then re-application).
• TEQ ArrowUniq Holds by induction then re-application.
• TEQ Fun Holds similarly to the case for T Fun.
• TEQ App Holds similarly to the case for T App.
• TEQ Ten Holds similarly to the case for T Ten.
• TEQ TenComp Holds by induction then re-application.
• TEQ Pair Holds similarly to the case for T Pair.
• TEQ TenCut Holds similarly to the case for T TenCut.
• TEQ TenU Holds by induction then re-application.
• TEQ Box Holds similarly to the case for T Box.
• TEQ BoxI Holds similarly to the case for T BoxI.
• TEQ BoxB Holds by induction then re-application.
• TEQ BoxE Holds similarly to the case for T BoxE.
• TEQ BoxU Holds by induction then re-application.

B.1 Simply-typed Lambda Calculus
A subset of Grtt encodes STLC. We define the following subset of Grtt typing judgments by the predicate Stlc(⊢). Then we define a partial inductive encoding from a subset of Grtt syntax (ignoring Type and tensors) to STLC, given on terms by − t and on contexts by − Γ :
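To make the shape of such a partial encoding concrete, here is a hypothetical sketch (the constructor names and the representation of grades are our assumptions, not the paper's definition): graded λ-abstractions map to plain STLC abstractions by dropping their grade annotations, while the encoding is undefined (returns None) on universes:

```python
from dataclasses import dataclass

# Source side: a toy fragment of graded terms (grades as numbers).
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class GLam: param: str; grade: int; body: object   # graded λ-abstraction
@dataclass(frozen=True)
class App: fn: object; arg: object
@dataclass(frozen=True)
class Type: level: int                              # universe (not encodable)

# Target side: plain STLC λ-abstraction.
@dataclass(frozen=True)
class Lam: param: str; body: object

def encode(t):
    """Partial encoding into STLC terms: drops grades, undefined on Type."""
    if isinstance(t, Var):
        return t
    if isinstance(t, GLam):
        body = encode(t.body)
        return None if body is None else Lam(t.param, body)
    if isinstance(t, App):
        fn, arg = encode(t.fn), encode(t.arg)
        return None if fn is None or arg is None else App(fn, arg)
    return None  # Type l lies outside the encoded subset
```

Partiality shows up as propagated None: any subterm mentioning a universe makes the whole encoding undefined.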

.1 Key lemmas on quantitative use
Some key lemmas for soundness. And we are done. Case.
By ST Trans we have (∆ | 0) ⊙ Γ ⊢ B ≤ Type l , and therefore our goal holds by induction. The second goal follows from A2 or B2 trivially.

T App
We define two cases depending on A ′ : - Type l ′ ∉ +ve(A ′ ) (no positive occurrences of Type l ′ ) Thus t a = t ′ a t ′′ a . By induction we know that: [T /y](t ′ a t ′′ a ) satisfying the goal.
If y = x then t 1 = y and [t 2 /y]t 1 = y = x. By the encoding, t 0 = x and thus [T /y]t 0 = x, satisfying the goal here.
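The case analysis above establishes that the encoding commutes with substitution. Under a toy, self-contained version of the syntax and encoding (all names here are illustrative assumptions, as before), the commuting square encode([t2/y]t1) = [encode(t2)/y]encode(t1) can be checked directly on an example:

```python
from dataclasses import dataclass

# Illustrative toy syntax: graded source terms and plain STLC targets.
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class GLam: param: str; grade: int; body: object  # graded abstraction
@dataclass(frozen=True)
class Lam: param: str; body: object               # STLC abstraction
@dataclass(frozen=True)
class App: fn: object; arg: object

def encode(t):
    # Drop grade annotations, mapping graded terms to STLC terms.
    if isinstance(t, Var): return t
    if isinstance(t, GLam): return Lam(t.param, encode(t.body))
    return App(encode(t.fn), encode(t.arg))

def subst(t, x, s):
    # [s/x]t for closed s (so no variable capture can occur).
    if isinstance(t, Var): return s if t.name == x else t
    if isinstance(t, GLam):
        return t if t.param == x else GLam(t.param, t.grade, subst(t.body, x, s))
    if isinstance(t, Lam):
        return t if t.param == x else Lam(t.param, subst(t.body, x, s))
    return App(subst(t.fn, x, s), subst(t.arg, x, s))

# Check the commuting square on a sample term and closed substituend.
t1 = App(Var("y"), Var("x"))
t2 = GLam("z", 1, Var("z"))
assert encode(subst(t1, "y", t2)) == subst(encode(t1), "y", encode(t2))
```

The variable cases of the check mirror the proof: a variable other than y is untouched on both sides, and y itself becomes t2 before encoding or encode(t2) after.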