Typpete encodes the type inference problem for a Python program into an SMT constraint resolution problem such that any solution of the SMT problem yields a valid type assignment for the program. The process of generating the SMT problem consists of three phases, which we describe below.
In a first pass over the input program, Typpete collects: (1) all globally defined names (to resolve forward references), (2) all classes and their respective subclass relations (to define subtyping), and (3) upper bounds on the size of certain types (e.g., tuples and function parameters). This pre-analysis encompasses both the input program—including all transitively imported modules—and stub files, which define the types of built-in classes and functions as well as libraries. Typpete already contains stubs for the most common built-ins; users can add custom stub files written in the format that is supported by MyPy.
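The three kinds of information collected in this pass can be illustrated with Python's standard ast module. The following is a minimal sketch for a single module, not Typpete's implementation: the function name pre_analyze, the layout of the returned dictionary, and the example program (including the superclass Item) are assumptions made for illustration, and imports and stub files are ignored.

```python
import ast


def pre_analyze(source: str) -> dict:
    """Collect global names, class hierarchy, and size bounds for one module."""
    tree = ast.parse(source)
    info = {
        "globals": set(),     # names defined at module level
        "classes": {},        # class name -> list of base class names
        "max_tuple_len": 0,   # length of the largest tuple expression
        "max_params": 0,      # largest number of positional parameters
    }

    # (1) Globally defined names: inspect top-level statements only.
    for stmt in tree.body:
        if isinstance(stmt, (ast.FunctionDef, ast.ClassDef)):
            info["globals"].add(stmt.name)
        elif isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    info["globals"].add(target.id)

    # (2) Class hierarchy and (3) size bounds: visit every node, nested or not.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            bases = [b.id for b in node.bases if isinstance(b, ast.Name)]
            info["classes"][node.name] = bases or ["object"]
        elif isinstance(node, ast.Tuple):
            info["max_tuple_len"] = max(info["max_tuple_len"], len(node.elts))
        elif isinstance(node, ast.FunctionDef):
            # Counts plain positional parameters (including self) only.
            info["max_params"] = max(info["max_params"], len(node.args.args))
    return info


# Hypothetical input program, used only to exercise the sketch.
EXAMPLE = '''
class Item:
    def compete(self, other):
        return other

class Odd(Item):
    pass

def pair(a, b, c):
    return (a, b)

x = pair(1, 2, 3)
'''

result = pre_analyze(EXAMPLE)
```

Here result records that Odd derives from Item, that no tuple has more than two elements, and that no function takes more than three parameters; bounds of this kind determine how many tuple and function constructors need to be declared later.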
In the second phase, Typpete declares an algebraic datatype Type, whose members correspond one-to-one to Python types. Typpete declares one datatype constructor for every class in the input program; non-generic classes are represented as constants, whereas a generic class with n type parameters is represented by a constructor taking n arguments of type Type. As an example, the class Odd in Fig. 1 is represented by the constant \(\textsf {class}_\textsf {Odd}\). Typpete also declares constructors for tuples and functions up to the maximum size determined in the pre-analysis, and for all type variables used in generic functions and classes.
The subtype relation is represented by an uninterpreted function subtype, which maps pairs of types to a boolean value. This function is delicate to define because of the possibility of matching loops (i.e., axioms being endlessly instantiated [7]) in the SMT solver. For each datatype constructor, Typpete generates axioms that explicitly enumerate the possible subtypes and supertypes. As an example, for the type \(\textsf {class}_\textsf {Odd}\), Typpete generates the following axioms:
Note that the second axiom allows None to be a subtype of any other type (as in Java). As we discuss in the next section, this definition of subtype allows us to avoid matching loops by providing explicit instantiation patterns to the SMT solver. A substitution function substitute, which substitutes type arguments for type variables when interacting with generic types, is defined in a similar way.
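The idea of enumerating subtypes and supertypes explicitly, rather than axiomatizing transitivity, can be sketched in plain Python. This is an illustration under assumptions, not the axioms Typpete emits: the hierarchy (in particular the superclass Item), the name none for the type of None, and the helper names are all hypothetical.

```python
def enumerate_subtype_axioms(parents: dict) -> dict:
    """For every class, list all of its supertypes and subtypes explicitly.

    parents maps a class name to its direct superclass (None for the root).
    """
    axioms = {}
    for cls in parents:
        # Supertypes: the class itself plus every class on the path to the root.
        supertypes, current = [], cls
        while current is not None:
            supertypes.append(current)
            current = parents[current]
        # Every class starts with "none" as a subtype (None fits any type).
        axioms[cls] = {"supertypes": supertypes, "subtypes": ["none"]}

    # Subtypes: invert the supertype lists computed above.
    for cls, entry in axioms.items():
        for sup in entry["supertypes"]:
            axioms[sup]["subtypes"].append(cls)
    return axioms


def is_subtype(axioms: dict, sub: str, sup: str) -> bool:
    """Decide subtyping by lookup in the enumerated lists; no chaining needed."""
    return sub in axioms[sup]["subtypes"]


# Hypothetical hierarchy; only Odd and Even are named in the text.
HIERARCHY = {"object": None, "Item": "object", "Odd": "Item", "Even": "Item"}
AXIOMS = enumerate_subtype_axioms(HIERARCHY)
```

Because each class carries its complete list of subtypes and supertypes, a query such as is_subtype(AXIOMS, "Odd", "object") is answered by a single lookup. This mirrors the motivation given above: no transitivity axiom has to be instantiated repeatedly.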
In the third phase, Typpete traverses the program, creating an SMT variable for each node in its abstract syntax tree and generating type constraints over these variables for the constructs in the program. During the traversal, a context maps all defined names (i.e., program variables, fields, etc.) to the corresponding SMT variables. The context is later used to retrieve the type assigned by the SMT solver to each name in the program. Constraints are generated for expressions (e.g., call arguments are subtypes of the corresponding parameter types), statements (e.g., the right-hand side of an assignment is a subtype of the left-hand side), and larger constructs such as methods (e.g., covariance and contravariance constraints for method overrides). For example, the (simplified) constraint generated for the call to item1.compete(item2) at line 21 in Fig. 1 contains a disjunction of cases depending on the type of the receiver:
$$\begin{aligned}&(\textsf {v}_\textsf {item1} = \textsf {class}_\textsf {Odd} \mathrel {\wedge } \textsf {compete}_{\textsf {Odd}} = \textsf {f\_2}(\textsf {class}_{\textsf {Odd}}, \textsf {arg}, \textsf {ret}) \mathrel {\wedge }\textsf {subtype}(\textsf {v}_\textsf {item2}, \textsf {arg})) \\ \vee ~&(\textsf {v}_\textsf {item1} = \textsf {class}_\textsf {Even} \mathrel {\wedge } \textsf {compete}_{\textsf {Even}} = \textsf {f\_2}(\textsf {class}_\textsf {Even}, \textsf {arg}, \textsf {ret}) \mathrel {\wedge } \textsf {subtype}(\textsf {v}_\textsf {item2}, \textsf {arg})) \end{aligned}$$
where f_2 is a datatype constructor for a function with two parameter types (and one return type ret), and \(\textsf {v}_\textsf {item1}\) and \(\textsf {v}_\textsf {item2}\) are the SMT variables corresponding to item1 and item2, respectively.
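The shape of this constraint, one disjunct per class that may be the type of the receiver, can be sketched as a small term builder. The tuple-based term representation and the names call_constraint and show are assumptions for illustration; a real implementation would build solver terms instead of tuples.

```python
def call_constraint(receiver_var, arg_var, method, receiver_classes):
    """Build a disjunction with one case per candidate receiver class."""
    disjuncts = []
    for cls in receiver_classes:
        disjuncts.append((
            "and",
            # The receiver has exactly this class as its type.
            ("eq", receiver_var, f"class_{cls}"),
            # The method of that class is a function of two parameters.
            ("eq", f"{method}_{cls}", ("f_2", f"class_{cls}", "arg", "ret")),
            # The actual argument is a subtype of the declared parameter type.
            ("subtype", arg_var, "arg"),
        ))
    return ("or", *disjuncts)


def show(term) -> str:
    """Render a term as text for inspection."""
    if isinstance(term, str):
        return term
    head, *args = term
    if head in ("and", "or"):
        return "(" + f" {head} ".join(show(a) for a in args) + ")"
    if head == "eq":
        return f"{show(args[0])} = {show(args[1])}"
    return f"{head}(" + ", ".join(show(a) for a in args) + ")"


constraint = call_constraint("v_item1", "v_item2", "compete", ["Odd", "Even"])
print(show(constraint))
```

The printed term has the same two cases as the formula above, one for Odd and one for Even.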
The generated constraints guarantee that any solution yields a correct type assignment for the input program. However, there are often many different valid solutions, as the constraints only impose lower or upper bounds on the types represented by the SMT variables (e.g., subtype(\(\textsf {v}_\textsf {item2}\), arg) shown above imposes only an upper bound on the type of \(\textsf {v}_\textsf {item2}\)). This has an impact on performance (cf. Sect. 4) as the search space for a solution remains large. Moreover, some type assignments could be more desirable than others for a user (e.g., a user would most likely prefer to assign type int rather than object to a variable initialized with value zero). To avoid these problems, Typpete additionally generates optional type equality constraints in places where the mandatory constraints only demand subtyping (i.e., local variable assignments, return statements, passed function arguments), thereby turning the SMT problem into a MaxSMT optimization problem. For instance, in addition to subtype(\(\textsf {v}_\textsf {item2}\), arg) shown above, Typpete generates the optional equality constraint \(\textsf {v}_\textsf {item2}\) \(=\) arg. The optional constraints guide the solver to try the specified exact type first, which is often a correct choice and therefore improves performance, and additionally leads to solutions with more precise variable and parameter types.
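The effect of the optional equality constraints can be illustrated on the example of a variable initialized with zero. The sketch below is a brute-force stand-in for a MaxSMT solver over a tiny, hypothetical type universe; the names TYPES, SUPERTYPES, hard_models, and max_soft are assumptions, and a real solver does not enumerate assignments this way.

```python
from itertools import product

# Hypothetical type universe; object is listed first on purpose.
TYPES = ["object", "int", "str"]
SUPERTYPES = {
    "object": {"object"},
    "int": {"int", "object"},
    "str": {"str", "object"},
}


def subtype(sub: str, sup: str) -> bool:
    return sup in SUPERTYPES[sub]


def hard_models(variables, hard):
    """Yield every assignment that satisfies all mandatory constraints."""
    for values in product(TYPES, repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(check(model) for check in hard):
            yield model


def max_soft(variables, hard, soft):
    """Among the valid assignments, pick one satisfying the most optional constraints."""
    return max(hard_models(variables, hard),
               key=lambda model: sum(check(model) for check in soft))


# x = 0: the type of the literal (int) must be a subtype of the type of x.
variables = ["v_x"]
hard = [lambda m: subtype("int", m["v_x"])]
# Optional constraint: prefer the exact type of the right-hand side.
soft = [lambda m: m["v_x"] == "int"]

candidates = list(hard_models(variables, hard))
chosen = max_soft(variables, hard, soft)
```

The mandatory constraint alone admits both object and int for v_x, whereas the optional equality constraint singles out int, the more precise type.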