Abstract
Modern software development rarely takes place within a single programming language. Often, programmers appeal to cross-language interoperability, for example to exploit novel features of one language within another or to reuse code across languages. Our previous work developed a theory of so-called multi-languages, which arise by combining existing languages, and defined a precise notion of (algebraic) multi-language semantics. As regards static analysis, the heterogeneity of the multi-language context opens up new and unexplored scenarios. In this paper, we provide a general theory for the combination of abstract interpretations of existing languages, regardless of their inherent nature, in order to obtain an abstract semantics of multi-language programs. As part of this general theory, we show that formal properties of interest of multi-language abstractions (e.g., soundness and completeness) boil down to the features of the interoperability mechanism that binds the underlying languages together. We extend many of the standard concepts of abstract interpretation to the framework of multi-languages.
Notes
A commercial static code analyser for Java (version 3.2.0.1227: for Linux 64 bit).
Ethics declarations
Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Appendices
Appendix A
Examples of algebraic semantics for \(\textsf{Imp}\)
We illustrate a simple imperative language \(\textsf{Imp}\) on which we define various semantics in the algebraic style, namely small step operational, prefix trace, and reachability.
Let \(\mathbb{X}\) be a set of variables and \(\mathbb{V}\) a set of scalar values with metavariables x and v, respectively. Variables and values occur in the language as terminal symbols, and for each production defining the syntax of the language (on the right), we introduce a corresponding algebraic operator (on the left), or a family of operators when they are parametric on a subscript:
where \(\odot \) is a binary operator such as \(\text {+}\), \(\text {-}\), \(\text {*}\), etc. We abuse notation and assume that \(\odot \) denotes both a syntactical symbol of the language and a mathematical function \(\odot :\mathbb{V}^2 \rightarrow \mathbb{V}\) over values. The rank of each algebraic operator can be inferred from the non-terminals appearing in the production rules; for instance, the operator \( cond \) is sorted as \( cond : exp \times com \times com \rightarrow com \).
In the examples in the following sections, we often use the correspondence between algebraic and context-free terms. For instance, we may write the algebraic term \( cond (bop_{>}(\text {x}, 0), skip , assign _{\text {x}}(bop_{-}(0, \text {x})))\) in the less cumbersome context-free form \(\text{if}\;\text{x > 0}\;\text{then}\;\text{skip}\;\text{else}\;\text{x}\;\text{ = }\;\text{0 - x}\).
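The correspondence between algebraic terms and context-free syntax can be made concrete with a small Python sketch. The class names (`Val`, `Var`, `Bop`, `Skip`, `Assign`, `Cond`) and the printer `show` are illustrative assumptions, not part of the paper's formalism; each class stands for one algebraic operator, and subscripts such as \(\odot \) or x are stored as fields.

```python
from dataclasses import dataclass

# Hypothetical encoding: one frozen dataclass per algebraic operator.
@dataclass(frozen=True)
class Val:
    v: int

@dataclass(frozen=True)
class Var:
    x: str

@dataclass(frozen=True)
class Bop:
    op: str          # the subscript of bop, e.g. ">" or "-"
    left: object
    right: object

@dataclass(frozen=True)
class Skip:
    pass

@dataclass(frozen=True)
class Assign:
    x: str           # the subscript of assign
    e: object

@dataclass(frozen=True)
class Cond:
    test: object
    then: object
    other: object

def show(t) -> str:
    """Render an algebraic term in context-free form."""
    if isinstance(t, Val):
        return str(t.v)
    if isinstance(t, Var):
        return t.x
    if isinstance(t, Bop):
        return f"{show(t.left)} {t.op} {show(t.right)}"
    if isinstance(t, Skip):
        return "skip"
    if isinstance(t, Assign):
        return f"{t.x} = {show(t.e)}"
    if isinstance(t, Cond):
        return f"if {show(t.test)} then {show(t.then)} else {show(t.other)}"
    raise TypeError(f"not an Imp term: {t!r}")

# cond(bop_>(x, 0), skip, assign_x(bop_-(0, x)))
term = Cond(Bop(">", Var("x"), Val(0)), Skip(),
            Assign("x", Bop("-", Val(0), Var("x"))))
print(show(term))
```

The printer does not insert parentheses, so it is only faithful for terms as flat as the one above.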
2.1 A small-step operational semantics
We define a small-step operational semantics \(\mathscr {S}\) describing the program execution steps. The presentation provided here is purely algebraic, and therefore less intuitive than the traditional rule-based style. However, the algebraic framework makes it possible to express many more kinds of semantics in the same formalism, thus favouring their comparison.
Expressions
We treat expressions \(\text {E}\) as “atomic” terms that are fully evaluable to a scalar value in a single step. Let \(\mathbb{S}_{ exp }\triangleq \{ \, \langle {\text {E}, \rho }\rangle \, \vert \; \text {E} \in {\llbracket exp \rrbracket }_{\mathscr {T}_{\textsf{Imp}}} \wedge \rho \in {\mathbbm{Env}} \}\) be the set of configurations where \(\text {E}\) is an expression and \(\rho \) an environment in \({\mathbbm{Env}}\triangleq \mathbb{X}\rightarrow \mathbb{V}\). The small-step semantics of expressions is given in Fig. 17. Intuitively, starting from an expression \(\text {E}\), we build a set of pairs, that is, an element of \(\mathcal {P}(\mathbb{S}_{ exp }\times \mathbb{V})\), representing the one-step evaluation of \(\text {E}\) in each environment \(\rho \). More precisely, \( \langle {\text {E}, \rho }\rangle {\rightarrowtriangle} v \in {\llbracket \text {E}\rrbracket }_{\mathscr {S}}\) simply means that \(\text {E}\) evaluates to v in \(\rho \). We write \({\llbracket {\text {E}} \rrbracket }_{\mathscr {S}}^\rho \) to denote this v (unique by construction).
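As an illustration of the one-step evaluation of expressions, the following Python sketch computes a finite slice of the semantics of an expression over a given list of environments. It is a sketch under stated assumptions: Fig. 17 is not reproduced in this text, so the clauses are the standard ones; terms are encoded as nested tuples and environments as tuples of pairs (both hypothetical encodings); and the comparison `>` returns 1 or 0, since the text only requires \(\odot :\mathbb{V}^2 \rightarrow \mathbb{V}\).

```python
import operator

# Assumed interpretation of binary operators as functions V x V -> V.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    ">": lambda a, b: int(a > b),  # assumption: truth values encoded as 1/0
}

def eval_exp(e, rho):
    """Return the unique v such that <E, rho> evaluates to v (rho is a dict)."""
    tag = e[0]
    if tag == "val":
        return e[1]
    if tag == "var":
        return rho[e[1]]
    if tag == "bop":
        _, op, e1, e2 = e
        return OPS[op](eval_exp(e1, rho), eval_exp(e2, rho))
    raise ValueError(f"not an expression: {e!r}")

def sem_exp(e, envs):
    """Finite slice of the semantics of E: pairs ((E, rho), v) for rho in envs.

    Each rho is a tuple of (variable, value) pairs so that it is hashable.
    """
    return {((e, rho), eval_exp(e, dict(rho))) for rho in envs}

e = ("bop", "-", ("val", 0), ("var", "x"))       # 0 - x
print(sem_exp(e, [(("x", 3),), (("x", -2),)]))
```

Every pair in the result carries the term E itself in its first component, which is the observation behind Remark 5 below.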
Remark 5
Note that from the small-step semantics \(\llbracket \text {E} \rrbracket_{\mathscr{S}}\) of an expression \(\text {E}\), we are able to recover the term \(\text {E}\). Indeed, \(\llbracket \text {E} \rrbracket_{\mathscr {S}} \ne \varnothing \) and if \( {\langle {\text{E}}_{1}, \rho_{1} \rangle} {\rightarrowtriangle} v_{1}\) and \( {\langle {\text{E}}_{2}, \rho_{2} \rangle} {\rightarrowtriangle} v_{2}\) are transitions (that is, pairs) in \({\llbracket {\text{E}} \rrbracket }_{\mathscr {S}}\), then \(\text {E}_{1} = \text{E} = \text {E}_2\) (this can be shown by a simple structural induction on \(\text {E}\)).
Remark 6
There are some missing cases in the definition of the interpretation functions for the operators in Fig. 17. For instance, we have defined \({\llbracket bop_\odot \rrbracket }_{\mathscr {S}}\) on arguments \({\llbracket \text {E}_1\rrbracket }_{\mathscr {S}}\) and \({\llbracket \text {E}_2\rrbracket }_{\mathscr {S}}\). However, there are semantic elements in \(\mathcal {P}(\mathbb{S}_{ exp }\times \mathbb{V})\) that are not the image of any expression \(\text {E}\) (e.g., the empty set \(\varnothing \)). We shall leave implicit that \({\llbracket bop_\odot \rrbracket }_{\mathscr {S}}(e_1, e_2) \triangleq \varnothing \) whenever there are no \(\text {E}_1\) or \(\text {E}_2\) such that \(e_1 = {\llbracket \text {E}_1\rrbracket }_{\mathscr {S}}\) and \(e_2 = {\llbracket \text {E}_2\rrbracket }_{\mathscr {S}}\). (This remark and Rem. 5 also apply to the definitions that follow.)
Commands
Let \(\mathbb{S}_{ com }\triangleq \{ \, \langle {\text {C}, \rho }\rangle \, \vert \; \text {C} \in {\llbracket com \rrbracket }_{\mathscr {T}_{\textsf{Imp}}} \cup \{\bot \} \wedge \rho \in {\mathbbm{Env}} \}\) where \(\text {C}\) is a command (or \(\bot \), denoting the end of a computation) and \(\rho \) an environment. For each command operator of \(\textsf{Imp}\) we define its semantics by specifying exactly the pairs of configurations which are related by the action of such an operator (Fig. 18). We write \({\llbracket \text {C}\rrbracket }_{\mathscr {S}}^\rho \) for the unique \(\langle {\text {C}^{\prime}, \rho^{\prime}}\rangle \) such that \( \langle {\text {C}, \rho }\rangle {\rightarrowtriangle} \langle {\text {C}^{\prime}, \rho^{\prime}}\rangle \in {\llbracket \text {C}\rrbracket }_{\mathscr {S}} \).
Example 3
We show a small example of the application of the newly defined semantics \({\llbracket -\rrbracket }_{\mathscr {S}}\). We adopt the more intuitive notation provided by the context-free grammar for denoting terms, and we avoid the use of subscripts \(_\mathscr {S}\). Suppose we want to compute the small-step semantics of the conditional statement \(\text {if }\text {x > 0}\text { then }\text {skip}\text { else }\text {x} = {0 - x}\). Then,
where the semantics of the condition is
and therefore,
Note that the same result would have been achieved with a traditional rule-based style for specifying small-step semantics.
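A rule-based reading of the conditional example can be sketched in Python as follows. The rules for assignment and for the conditional are assumptions, since Fig. 18 is not reproduced in this text; only the step from \( skip \) to \(\bot \) is confirmed by the trace semantics of \( skip \) given later. The tuple encoding of terms, the names `step` and `BOTTOM`, and the reading of a non-zero value as true are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       ">": lambda a, b: int(a > b)}  # assumption: truth values as 1/0
BOTTOM = "⊥"  # marks the end of a computation

def eval_exp(e, rho):
    """One-step evaluation of an expression in the environment rho (a dict)."""
    if e[0] == "val":
        return e[1]
    if e[0] == "var":
        return rho[e[1]]
    _, op, e1, e2 = e  # remaining case: ("bop", op, e1, e2)
    return OPS[op](eval_exp(e1, rho), eval_exp(e2, rho))

def step(c, rho):
    """Return the configuration reached from <C, rho> in one step."""
    tag = c[0]
    if tag == "skip":
        return (BOTTOM, rho)
    if tag == "assign":                     # assumed rule
        _, x, e = c
        return (BOTTOM, {**rho, x: eval_exp(e, rho)})
    if tag == "cond":                       # assumed rule: non-zero is true
        _, e, c1, c2 = c
        return (c1 if eval_exp(e, rho) != 0 else c2, rho)
    raise ValueError(f"not a command: {c!r}")

# if x > 0 then skip else x = 0 - x
P = ("cond", ("bop", ">", ("var", "x"), ("val", 0)),
     ("skip",),
     ("assign", "x", ("bop", "-", ("val", 0), ("var", "x"))))
print(step(P, {"x": 3}))
print(step(P, {"x": -2}))
```

Under these assumptions, the conditional steps to its first branch when x is positive and to the assignment otherwise, leaving the environment unchanged in both cases.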
2.2 Fixpoint definition of prefix trace semantics
Prefix trace semantics associates each program \(\text{P}\) with the set of all finite traces obtained by iterating an arbitrarily large number of times the small-step semantics \(\mathscr {S}\) from \(\langle {\text{P}, \rho }\rangle \), for each environment \(\rho \).
Let \(\mathbb{S}_{ com }^{*} \triangleq \bigcup _{n \in \mathbb{N}} \mathbb{S}_{ com }^n\) be the set of finite sequences of command configurations (that is, finite traces). A trace \(\tau \in \mathbb{S}_{ com }^n\) is denoted by \( \langle {\text {C}_1, \rho _1}\rangle {\rightarrowtriangle} \cdots {\rightarrowtriangle} \langle {\text {C}_n, \rho _n}\rangle \). The prefix trace semantics \(\mathscr {P}\) is defined by keeping the one-step evaluation semantics for expressions \(\text {E}\) (i.e., \({\llbracket \text {E}\rrbracket }_{\mathscr {P}} \triangleq {\llbracket \text {E}\rrbracket }_{\mathscr {S}}\)), and by defining the following fixpoint semantics for command operators \(f:w \rightarrow s\) on the domain \(\langle {\llbracket com \rrbracket }_{\mathscr {P}} \triangleq \mathcal {P}(\mathbb{S}_{ com }^{*}), \subseteq , \varnothing , \cup \rangle \):
where \(F_{f(\text{P}_1, \ldots , \text{P}_n)}:\mathcal {P}(\mathbb{S}_{ com }^{*}) \rightarrow \mathcal {P}(\mathbb{S}_{ com }^{*})\) is defined as
and the trace semantics of the constant \( skip \) is trivially defined by \({\llbracket skip \rrbracket }_{\mathscr {P}} \triangleq \{\varepsilon \} \cup \{ \, \langle { skip , \rho }\rangle \, \vert \; \rho \in {\mathbbm{Env}} \} \cup \{ \, \langle skip , \rho \rangle {\rightarrowtriangle} \langle \bot , \rho \rangle \, \vert \; \rho \in {\mathbbm{Env}} \}\). The constructive computation of \({{\,\textrm{lfp}\,}}_\varnothing ^\subseteq F_{f(\text{P}_1, \ldots , \text{P}_n)}\) is guaranteed by Kleene’s theorem (\(F_{f(\text{P}_1, \ldots , \text{P}_n)}\) is continuous on the pointed dcpo \(\langle \mathcal {P}(\mathbb{S}_{ com }^{*}), \subseteq , \varnothing , \cup \rangle \)).
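The constructive computation by Kleene iteration can be sketched in Python. The transformer built by `make_F` is an assumption: the definition of \(F_{f(\text{P}_1, \ldots , \text{P}_n)}\) is given in a display that is not reproduced in this text, so the sketch uses the usual prefix-extension operator (the empty trace, the initial configurations, and every trace already collected extended by one step). The function names and the toy transition table are hypothetical; configurations are abbreviated to a label paired with the value of x.

```python
def kleene_lfp(F):
    """Iterate F from the empty set until the iterates stabilise.

    Terminates only when the least fixpoint is reached in finitely many
    steps, e.g. when the transition relation has no cycles.
    """
    X = frozenset()
    while True:
        nxt = frozenset(F(X))
        if nxt == X:
            return X
        X = nxt

def make_F(initial, step):
    """Assumed prefix-extension transformer over sets of finite traces."""
    def F(X):
        out = {()}                              # the empty trace
        out |= {(s,) for s in initial}          # traces of length one
        for tau in X:
            if tau and tau[-1] in step:         # extend by one transition
                out.add(tau + (step[tau[-1]],))
        return out
    return F

# Toy deterministic transition table for a single initial environment x = 1.
step = {("P", 1): ("skip", 1), ("skip", 1): ("⊥", 1)}
traces = kleene_lfp(make_F([("P", 1)], step))
for tau in sorted(traces, key=len):
    print(tau)
```

Each iterate contains the previous one, so the loop computes the increasing chain whose union is the least fixpoint; for programs with loops the chain is infinite and this naive iteration would not terminate.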
Example 4
We restate Ex. 3 for the prefix trace semantics \(\mathscr {P}\) applied to the same term \(\text{P} \triangleq \text {if }\text {x > 0}\text { then }\text {skip}\text { else }\text {x} = {0 - x}\):
where the iterates of \(F_{\text{P}}\) are
and therefore \({\llbracket \text{P}\rrbracket }_{\mathscr {P}}\) is the union of the iterates.
2.3 Reachability semantics as abstraction of trace semantics
Reachability semantics aims at computing the set of states that a program \(\text{P}\) may reach during its execution. Such a set can be parametric on program points (that is, locations) or it can be the union of all the environments reached at any point. We show that both of these versions can be obtained by abstracting the collecting semantics \(\mathscr {P}^{*}\) over traces provided in the previous section.
Reachability on Program Points
Let \(\mathscr {R}\) be the reachability semantics that collects states per program point. Its carrier set of sort \( com \) is defined as \({\llbracket com \rrbracket }_{\mathscr {R}} \triangleq \mathcal {P}(\mathbb{S}_{ com })\), thus a command is interpreted as a set of configurations (where program code denotes locations). We show that \(\mathscr {R}\) can be obtained by abstracting the collecting semantics \(\mathscr {P}^{*}\) by establishing a Galois connection between their carrier sets:
The abstraction function \(\alpha \) maps a semantic property \(\mathcal {X} \subseteq \mathcal {P}(\mathbb{S}_{ com }^{*})\) (i.e., a set of sets of finite traces) to the set of states that appear in those traces:
Conversely, the concretisation function \(\gamma \) maps each set of states C to the set containing only those traces whose configurations are in C:
Now, the definition of \(\mathscr {R}\) follows from the existence of a best correct approximation, as shown in Sect. 4.
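The pair of functions described above can be illustrated on finite data with a short Python sketch. It rests on assumptions: the displayed definitions of \(\alpha \) and \(\gamma \) are not reproduced in this text, so `alpha` and `in_gamma` follow the prose description only, and \(\gamma \) is represented by a membership test because the set it denotes is in general infinite. Configurations are abbreviated to a label paired with the value of x.

```python
def alpha(prop):
    """Collect every configuration occurring in some trace of some set in prop.

    prop is a set of sets of traces; a trace is a tuple of configurations.
    """
    return {s for traces in prop for tau in traces for s in tau}

def in_gamma(traces, C):
    """Membership test for gamma(C): all configurations of all traces lie in C."""
    return all(s in C for tau in traces for s in tau)

T = frozenset({(), (("P", 1),), (("P", 1), ("skip", 1))})
prop = {T}
C = alpha(prop)
print(C)

# Adjunction check on this instance: alpha(prop) <= C iff prop <= gamma(C).
for cand in (C, {("P", 1)}):
    assert (alpha(prop) <= cand) == all(in_gamma(t, cand) for t in prop)
```

The final loop checks the defining property of a Galois connection on two candidate sets of configurations; it is a sanity check on this instance, not a proof.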
Reachability without Program Points
The reachability semantics \(\mathscr {R}_\cup \) forgets about program locations and simply collects the environments reached during the execution of a program. The carrier set of commands is defined as \({\llbracket com \rrbracket }_{\mathscr {R}_\cup } \triangleq \mathcal {P}({\mathbbm{Env}})\). \(\mathscr {R}_\cup \) can be obtained by abstracting the collecting semantics \(\mathscr {P}^{*}\) over traces:
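Forgetting program locations amounts to keeping only the second component of each configuration. A minimal Python sketch, assuming the same finite representation of trace properties as sets of sets of traces and a hypothetical encoding of environments as tuples of (variable, value) pairs:

```python
def alpha_env(prop):
    """Collect the environments of all configurations occurring in prop."""
    return {rho for traces in prop for tau in traces for (_, rho) in tau}

# One trace of the program x = 0 - x started with x = -1.
tau = (("x = 0 - x", (("x", -1),)), ("⊥", (("x", 1),)))
print(alpha_env({frozenset({(), tau})}))
```

The abstraction is coarser than the one indexed by program points, since it no longer records where each environment was reached.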
About this article
Cite this article
Buro, S., Crole, R. & Mastroeni, I. On multi-language abstraction: Towards a static analysis of multi-language programs. Form Methods Syst Des (2023). https://doi.org/10.1007/s10703-022-00405-8
DOI: https://doi.org/10.1007/s10703-022-00405-8