In the following, we derive a more precise data usage analysis based on syntactic dependencies between program variables. For simplicity, the analysis does not take program termination into account, but we discuss possible solutions at the end of the section. Due to space limitations, we only provide a terse description of the abstraction and refer to [36] for further details.
In order to capture implicit dependencies from variables appearing in the boolean conditions of conditional and while statements, we track when the value of a variable is used or modified in a statement based on the level of nesting of the statement within other statements. More formally, each program variable maps to a value in the complete lattice shown in Fig. 5: the value \(U\) (used) denotes that a variable may be used at the current nesting level, while \(N\) (not-used) denotes that it is not used at the current nesting level; the values \(B\) (below) and \(W\) (overwritten) both denote that a variable may be used at a lower nesting level, and \(W\) additionally indicates that the variable is modified at the current nesting level.
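As an illustration, the \(\textsc {usage}\) lattice can be encoded in a few lines of Python. The diamond-shaped ordering assumed below (\(N\) below both \(B\) and \(W\), which are in turn below \(U\)) is inferred from the description above; the authoritative Hasse diagram is the one in Fig. 5.

```python
from enum import Enum

class Usage(Enum):
    """Abstract values of the usage lattice (cf. Fig. 5)."""
    N = "not-used"     # not used at the current nesting level
    B = "below"        # may be used at a lower nesting level
    W = "overwritten"  # used at a lower level, modified at the current level
    U = "used"         # may be used at the current nesting level

# Assumed diamond ordering: N below B and W, which are both below U.
_LEQ = {
    (Usage.N, Usage.N), (Usage.N, Usage.B), (Usage.N, Usage.W), (Usage.N, Usage.U),
    (Usage.B, Usage.B), (Usage.B, Usage.U),
    (Usage.W, Usage.W), (Usage.W, Usage.U),
    (Usage.U, Usage.U),
}

def leq(a: Usage, b: Usage) -> bool:
    """Partial order of the usage lattice."""
    return (a, b) in _LEQ

def join(a: Usage, b: Usage) -> Usage:
    """Least upper bound of the usage lattice."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return Usage.U  # B and W are incomparable; their join is the top element U
```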
A variable is used (i.e., maps to \(U\)) if it is used in an assignment to another variable that is used in the current or a lower nesting level (i.e., a variable that maps to \(U\) or \(B\)). We define the operator \(\textsc {assign} [\![{x = e} ]\!]\) to compute the effect of an assignment on a map \(m:\mathrm {X} \rightarrow \textsc {usage}\), where X is the set of all variables:
$$\begin{aligned} \textsc {assign} [\![{x = e} ]\!](m) {\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\lambda y. {\left\{ \begin{array}{ll} W&{} y = x \wedge y \not \in \textsc {vars} (e) \wedge m(x) \in \left\{ U, B\right\} \\ U&{} y \in \textsc {vars} (e) \wedge m(x) \in \left\{ U, B\right\} \\ m(y) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(19)
The assigned variable is overwritten (i.e., maps to \(W\)), unless it is used in e.
A variable is also used if it appears in the boolean condition e of a statement that uses another variable or modifies another used variable (i.e., there exists a variable x that maps to \(U\) or \(W\)):
$$\begin{aligned} \textsc {filter} [\![{e} ]\!](m) {\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\lambda y. {\left\{ \begin{array}{ll} U&{} y \in \textsc {vars} (e) \wedge \exists x \in \mathrm {X} :m(x) \in \left\{ U, W\right\} \\ m(y) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(20)
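Reusing the Usage enumeration from the sketch above, the two map transformers of Eq. 19 and Eq. 20 can be transliterated directly; representing an expression e simply by its set of variables \(\textsc {vars} (e)\) is an assumption made to keep the sketch small.

```python
from typing import Dict, Set

# A map m : X -> USAGE; maps are assumed to be total on the program variables X.
UsageMap = Dict[str, Usage]

def assign(x: str, expr_vars: Set[str], m: UsageMap) -> UsageMap:
    """Effect of the assignment x = e (Eq. 19); expr_vars stands for vars(e)."""
    result = dict(m)
    if m[x] in (Usage.U, Usage.B):
        # the assigned variable is overwritten, unless it also appears in e
        if x not in expr_vars:
            result[x] = Usage.W
        # every variable appearing in e becomes used
        for y in expr_vars:
            result[y] = Usage.U
    return result

def filter_cond(expr_vars: Set[str], m: UsageMap) -> UsageMap:
    """Effect of a boolean condition e (Eq. 20); expr_vars stands for vars(e)."""
    result = dict(m)
    if any(v in (Usage.U, Usage.W) for v in m.values()):
        # the enclosed statement uses a variable or modifies a used variable,
        # so the variables appearing in the condition become used
        for y in expr_vars:
            result[y] = Usage.U
    return result
```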
We maintain a stack of these maps that grows or shrinks based on the level of nesting of the currently analyzed statement. More formally, a stack is a tuple \(\langle m_0, m_1, \dots , m_k \rangle \) of mutable length k, where each element \(m_0, m_1, \dots , m_k\) is a map from \(\mathrm {X} \) to \(\textsc {usage}\). In the following, we use \(\mathrm {Q} \) to denote the set of all stacks, and we abuse notation by writing \(\textsc {assign} [\![{x = e} ]\!]\) and \(\textsc {filter} [\![{e} ]\!]\) to also denote the corresponding operators on stacks:
$$\begin{aligned} \textsc {assign} [\![{x = e} ]\!](\langle m_0, m_1, \dots , m_k \rangle )&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\langle \textsc {assign} [\![{x = e} ]\!](m_0), m_1, \dots , m_k \rangle \\ \textsc {filter} [\![{e} ]\!](\langle m_0, m_1, \dots , m_k \rangle )&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\langle \textsc {filter} [\![{e} ]\!](m_0), m_1, \dots , m_k \rangle \end{aligned}$$
The operator \(\textsc {push} \) duplicates the map at the top of the stack and modifies the copy using the operator \(\textsc {inc} \), to account for an increased nesting level:
$$\begin{aligned} \begin{aligned} \textsc {push} (\langle m_0, m_1, \dots , m_k \rangle )&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\langle \textsc {inc} (m_0), m_0, m_1, \dots , m_k \rangle \\ \textsc {inc} (m)&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\lambda y. {\left\{ \begin{array}{ll} B&{} m(y) \in \left\{ U\right\} \\ N&{} m(y) \in \left\{ W\right\} \\ m(y) &{} \text {otherwise} \end{array}\right. } \end{aligned} \end{aligned}$$
(21)
A used variable (i.e., mapping to \(U\)) becomes used below (i.e., now maps to \(B\)), and a modified variable (i.e., mapping to \(W\)) becomes unused (i.e., now maps to \(N\)). The dual operator \(\textsc {pop} \) combines the two maps at the top of the stack:
$$\begin{aligned} \begin{aligned} \textsc {pop} (\langle m_0, m_1, \dots , m_k \rangle )&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\langle \textsc {dec} (m_0, m_1), \dots , m_k \rangle \\ \textsc {dec} (m, k)&{\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\lambda y. {\left\{ \begin{array}{ll} k(y) &{} m(y) \in \left\{ B, N\right\} \\ m(y) &{} \text {otherwise} \end{array}\right. } \end{aligned} \end{aligned}$$
(22)
where the \(\textsc {dec} \) operator restores the value a variable y mapped to before the nesting level was increased (i.e., k(y)) if it has not changed since (i.e., if the variable still maps to \(B\) or \(N\)), and otherwise retains the new value that y maps to.
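Continuing the sketch, the stack operators of Eq. 21 and Eq. 22 and the lifting of \(\textsc {assign} \) and \(\textsc {filter} \) to stacks can be written as follows, representing a stack as a Python list whose first element is the top map \(m_0\):

```python
from typing import List, Set

Stack = List[UsageMap]  # <m_0, m_1, ..., m_k>, with index 0 as the top

def inc(m: UsageMap) -> UsageMap:
    """Adjust a map for an increased nesting level (Eq. 21)."""
    return {y: Usage.B if v is Usage.U
               else Usage.N if v is Usage.W
               else v
            for y, v in m.items()}

def push(q: Stack) -> Stack:
    """Duplicate the top of the stack, adjusting the copy with inc (Eq. 21)."""
    return [inc(q[0])] + q

def dec(m: UsageMap, k: UsageMap) -> UsageMap:
    """Restore the value a variable had before the nesting level was increased,
    unless it changed in the meantime (Eq. 22)."""
    return {y: k[y] if v in (Usage.B, Usage.N) else v for y, v in m.items()}

def pop(q: Stack) -> Stack:
    """Combine the two maps at the top of the stack (Eq. 22)."""
    return [dec(q[0], q[1])] + q[2:]

def assign_stack(x: str, expr_vars: Set[str], q: Stack) -> Stack:
    """Lifting of assign to stacks: only the top map is transformed."""
    return [assign(x, expr_vars, q[0])] + q[1:]

def filter_stack(expr_vars: Set[str], q: Stack) -> Stack:
    """Lifting of filter to stacks: only the top map is transformed."""
    return [filter_cond(expr_vars, q[0])] + q[1:]
```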
We can now define the data usage analysis \(\varLambda _\mathrm {Q} \), which is a backward analysis on the lattice \(\langle \mathrm {Q}, \sqsubseteq _\mathrm {Q}, \sqcup _\mathrm {Q} \rangle \). The partial order \(\sqsubseteq _\mathrm {Q} \) and the least upper bound \(\sqcup _\mathrm {Q} \) are the pointwise lifting, for each element of the stack, of the partial order and least upper bound between maps from \(\mathrm {X} \) to \(\textsc {usage}\) (which in turn are the pointwise lifting of the partial order \(\sqsubseteq _\textsc {usage}\) and least upper bound \(\sqcup _\textsc {usage}\) of the \(\textsc {usage}\) lattice, cf. Fig. 5). The transfer function \(\varTheta _\mathrm {Q} [\![{s} ]\!]:\mathrm {Q} \rightarrow \mathrm {Q} \) is defined for each statement s of our simple programming language by composing the operators introduced above: assignments are handled by the \(\textsc {assign} \) operator, while the boolean conditions of conditional and while statements are handled by means of the \(\textsc {push} \), \(\textsc {filter} \), and \(\textsc {pop} \) operators.
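A minimal Python sketch of one way this composition can be realized, continuing the sketches above; the statement representation (expressions abstracted by their variable sets), the joining of a conditional's two branches via the least upper bound, and the fixpoint iteration used for while loops are assumptions of this sketch rather than the exact definition of \(\varTheta _\mathrm {Q} \):

```python
from dataclasses import dataclass, field
from typing import List, Set, Union

@dataclass
class Assign:                 # x = e, with e abstracted by vars(e)
    target: str
    expr_vars: Set[str]

@dataclass
class If:                     # if e: then_body else: else_body
    cond_vars: Set[str]
    then_body: List["Stmt"]
    else_body: List["Stmt"] = field(default_factory=list)

@dataclass
class While:                  # while e: body
    cond_vars: Set[str]
    body: List["Stmt"]

Stmt = Union[Assign, If, While]

def join_map(m1: UsageMap, m2: UsageMap) -> UsageMap:
    """Pointwise lifting of the usage join to maps (same key set assumed)."""
    return {y: join(m1[y], m2[y]) for y in m1}

def join_stack(q1: Stack, q2: Stack) -> Stack:
    """Pointwise lifting, for each element of the stack."""
    return [join_map(a, b) for a, b in zip(q1, q2)]

def transfer(s: Stmt, q: Stack) -> Stack:
    """Backward transfer function for a single statement (a sketch)."""
    if isinstance(s, Assign):
        return assign_stack(s.target, s.expr_vars, q)
    if isinstance(s, If):
        then_q = pop(filter_stack(s.cond_vars, transfer_block(s.then_body, push(q))))
        if not s.else_body:
            return then_q
        else_q = pop(filter_stack(s.cond_vars, transfer_block(s.else_body, push(q))))
        return join_stack(then_q, else_q)
    if isinstance(s, While):
        current = q  # iterate to a fixpoint; the lattice is finite, so this terminates
        while True:
            once = pop(filter_stack(s.cond_vars, transfer_block(s.body, push(current))))
            joined = join_stack(current, once)
            if joined == current:
                return current
            current = joined
    raise TypeError(f"unknown statement: {s!r}")

def transfer_block(stmts: List[Stmt], q: Stack) -> Stack:
    """Backward analysis of a statement sequence: last statement first."""
    for s in reversed(stmts):
        q = transfer(s, q)
    return q

def analyze(program: List[Stmt], variables: Set[str], outputs: Set[str]) -> UsageMap:
    """Run the analysis from the initial stack described below: a single map
    in which the output variables map to U and all other variables map to N."""
    initial: Stack = [{v: (Usage.U if v in outputs else Usage.N) for v in variables}]
    return transfer_block(program, initial)[0]
```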
The initial stack contains a single map, in which the output variables map to the value \(U\), and all other variables map to \(N\). We exemplify the analysis below.
Example 10
Let us consider again the program P shown in Example 7. The initial stack contains a single map m, in which the output variable \(\texttt {passing}\) maps to \(U\) and all other variables map to \(N\).
At line 4, before analyzing the body of the conditional statement, a modified copy of m is pushed onto the stack: this copy maps \(\texttt {passing}\) to \(B\), meaning that \(\texttt {passing}\) is only used in a lower nesting level, and all other variables still map to \(N\) (cf. Eq. 21). As a result of the assignment (cf. Eq. 19), \(\texttt {passing}\) is overwritten (i.e., maps to \(W\)), and \(\texttt {bonus}\) is used (i.e., maps to \(U\)). Since the body of the conditional statement modifies a used variable and uses another variable, the analysis of its boolean condition makes \(\texttt {math}\) used as well (cf. Eq. 20). Finally, the maps at the top of the stack are merged and the result maps \(\texttt {math}\), \(\texttt {bonus}\), and \(\texttt {passing}\) to \(U\), and all other variables to \(N\) (cf. Eq. 22). The analysis is visualized in Fig. 6.
The stack remains unchanged at line 3 and line 2, since the statement at line 3 is identical to line 4 and the body of the conditional statement at line 2 does not modify any used variable and does not use any other variable. Finally, at line 1 the variable \(\texttt {passing}\) is modified (i.e., it now maps to \(W\)), while \(\texttt {math}\) and \(\texttt {bonus}\) remain used (i.e., they map to \(U\)). Thus, the analysis is precise enough to conclude that the input variables \(\texttt {english}\) and \(\texttt {science}\) are unused. \(\blacksquare \)
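As a usage illustration, the sketch above can be run on a small toy program with the same shape as the one narrated in this example; the program below is a hypothetical stand-in, not the actual program P of Example 7, and the set of input variables is an assumption.

```python
# Hypothetical stand-in, analyzed backward from the output variable `passing`:
#   1: passing = True
#   2: if english >= 60: score = science      (does not affect passing)
#   3: if math >= 90:    passing = bonus
#   4: if math >= 90:    passing = bonus
variables = {"passing", "score", "bonus", "math", "english", "science"}
program = [
    Assign("passing", set()),                          # line 1
    If({"english"}, [Assign("score", {"science"})]),   # line 2
    If({"math"}, [Assign("passing", {"bonus"})]),      # line 3
    If({"math"}, [Assign("passing", {"bonus"})]),      # line 4
]
inputs = {"english", "math", "science", "bonus"}       # assumed input variables
result = analyze(program, variables, outputs={"passing"})
print(sorted(v for v in inputs if result[v] is Usage.N))  # ['english', 'science']
```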
Note that, similarly to the non-interference analysis presented in Sect. 8, the data usage analysis \(\varLambda _\mathrm {Q} \) does not consider non-termination. Indeed, for the program shown in Example 8, the analysis does not capture that the input variable \(\texttt {english}\) is used, even though the termination of the program depends on its value. We define the concretization function \(\gamma _\mathrm {Q} :\mathrm {Q} \rightarrow \mathcal {P}\left( \mathcal {P}\left( \varSigma \times \varSigma \right) \right) \) as:
$$\begin{aligned} \gamma _\mathrm {Q} (\langle m_0, \dots , m_k \rangle ) {\mathop {=}\limits ^{{\tiny \mathrm{def}}}}\left\{ R \subseteq \varSigma \times \varSigma \mid \forall i \in \mathrm {X} :m_0(i) \in \left\{ N\right\} \Rightarrow \textsc {unused}_i(R) \right\} \end{aligned}$$
(23)
where again we write \(\textsc {unused}_i\) (cf. Eq. 3) to also denote its dependency abstraction. We now show that \(\varLambda _\mathrm {Q} \) is sound for proving that a program does not use a subset of its input variables, if the program is terminating.
Theorem 7
A terminating program does not use a subset J of its input variables if the image via \(\gamma _\rightsquigarrow \circ \gamma _\mathrm {Q} \) of its abstraction \(\varLambda _\mathrm {Q} \) is a subset of \(\mathcal {N} _J\):
$$\begin{aligned} \gamma _\rightsquigarrow (\gamma _\mathrm {Q} (\varLambda _\mathrm {Q})) \subseteq \mathcal {N} _J \Rightarrow P \models \mathcal {N} _J \end{aligned}$$
Proof
Let us assume that \(\gamma _\rightsquigarrow (\gamma _\mathrm {Q} (\varLambda _\mathrm {Q})) \subseteq \mathcal {N} _J\). Since the program is terminating, we have that \(\varLambda _\rightsquigarrow \subseteq \gamma _\mathrm {Q} (\varLambda _\mathrm {Q})\), by definition of the concretization function \(\gamma _\mathrm {Q} \) (cf. Eq. 23). Then, by monotonicity of \(\gamma _\rightsquigarrow \) (cf. Eq. 11), we have that \(\gamma _\rightsquigarrow (\varLambda _\rightsquigarrow ) \subseteq \gamma _\rightsquigarrow (\gamma _\mathrm {Q} (\varLambda _\mathrm {Q}))\). Thus, since \(\gamma _\rightsquigarrow (\gamma _\mathrm {Q} (\varLambda _\mathrm {Q})) \subseteq \mathcal {N} _J\), we have that \(\gamma _\rightsquigarrow (\varLambda _\rightsquigarrow ) \subseteq \mathcal {N} _J\). The conclusion follows from Theorem 4. \(\square \)
In order to take termination into account, one could map each variable appearing in the guard of a loop to the value \(U\). Alternatively, one could run a termination analysis [3, 33, 34] alongside the data usage analysis, and only map to \(U\) the variables appearing in the guard of a possibly non-terminating loop.
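For instance, the first option could be sketched as a variant of the While case of the transfer function sketched earlier; the function name and the placement of the adjustment are assumptions of this sketch.

```python
def transfer_while_nonterm_aware(s: While, q: Stack) -> Stack:
    """Variant of the While case that additionally marks every variable in the
    loop guard as used, to conservatively account for possible non-termination."""
    current = q
    while True:  # same fixpoint iteration as in the sketch above
        once = pop(filter_stack(s.cond_vars, transfer_block(s.body, push(current))))
        joined = join_stack(current, once)
        if joined == current:
            break
        current = joined
    top = dict(current[0])
    for y in s.cond_vars:
        top[y] = Usage.U   # the guard variables may determine (non-)termination
    return [top] + current[1:]
```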