On the Multi-Language Construction

Modern software is no longer developed in a single programming language. Instead, programmers tend to exploit cross-language interoperability mechanisms to combine code stemming from different languages, thus yielding fully-fledged multi-language programs. Whilst this approach enables developers to benefit from the strengths of each single-language, it also complicates the semantics of such programs. Indeed, the resulting multi-language does not meet any of the semantics of the combined languages. In this paper, we broaden the boundary-functions-based approach à la Matthews and Findler to propose an algebraic framework that provides a constructive mathematical notion of multi-language able to determine its semantics. The aim of this work is to overcome the lack of a formal method (resp., model) to design (resp., represent) a multi-language, regardless of the inherent nature of the underlying languages. We show that our construction ensures the uniqueness of the semantic function (i.e., the multi-language semantics induced by the combined languages) by proving the initiality of the term model (i.e., the abstract syntax of the multi-language) in its category.


Introduction
Two elementary arguments lie at the heart of the multi-language paradigm: the large availability of existing programming languages, along with a very high number of already written libraries, and software that, in general, needs to interoperate. Although there is consensus in claiming that there is no best programming language regardless of the context [4,8], it is equally true that many of them are conceived and designed in order to excel at specific tasks. Examples are R for statistical and graphical computation, Perl for data wrangling, Assembly and C for low-level memory management, etc. "Interoperability between languages has been a problem since the second programming language was invented" [8], so it is hardly surprising that developers have focused on the design of cross-language interoperability mechanisms, enabling programmers to combine code written in different languages. In this sense, we speak of multi-languages.
The field of cross-language interoperability has been driven more by practical concerns than by theoretical questions. The current scenario sees several engines and frameworks [47,28,44,13,29] (among others) to mix programming languages, but only [30] discusses the semantic issues related to the multi-language design from a theoretical perspective. Moreover, the existing interoperability mechanisms differ considerably not only from the viewpoint of the combined languages, but also in terms of the approach used to provide the interoperation. For instance, Nashorn [47] is a JavaScript interpreter written in Java to allow embedding JavaScript in Java applications. Such an engineering design works in a similar fashion to embedded interpreters [41,40]. On the contrary, the Java Native Interface (JNI) framework [29] enables the interoperation of Java with native code written in C, C++, or Assembly through external procedure calls between languages, mirroring the widespread mechanism of foreign function interfaces (FFI) [14], whereas theoretical papers follow the more elegant approach of boundary functions (or, for short, boundaries) in the style of Matthews and Findler's multi-language semantics [30]. Simply put, boundaries act as a gate between single-languages. When a value needs to flow into the other language, they perform a conversion so that it complies with the other language's specifications.
The major issue concerning this new paradigm is that multi-language programs do not obey any of the semantics of the combined languages. As a consequence, any method of formal reasoning (such as static program analysis or verification) is neutralized by the absence of a semantics specification. In this paper, we propose an algebraic framework based on the mechanism of boundary functions [30] that unambiguously yields the syntax and the semantics of the multi-language regardless of the combined languages.
The Lack of a Multi-Language Framework. The notion of multi-language is employed naively in several works in the literature [37,2,49,14,21,36,35,30] to indicate the embedding of two programming languages into a new one, with its own syntax and semantics.
The most recurring way to design a multi-language is to exploit a mechanism (like embedded interpreters, FFI, or boundary functions) able to regulate both control flow and value conversion between the underlying languages [30], and thus adequate to provide cross-language interoperability [8]. The full construction is usually carried out manually by language designers, who define the multi-language by reusing the formal specifications of the single-languages [36,37,2,30] and by applying the selected mechanism for achieving the interoperation. Inevitably, therefore, all these resulting multi-languages notably differ from one another.
These different ways to achieve cross-language interoperation are all attributable to the lack of a formal description of multi-language, which leaves language designers with neither a method to conceive new multi-languages nor any guarantee on the correctness of such constructions.
The Proposed Framework: Roadmap and Contributions. Matthews and Findler [30] propose boundary functions as a way to regulate the flow of values between languages. They show their approach on different variants of the same multi-language obtained by mixing ML [33] and Scheme [9], representing two "syntactically sugared" versions of the simply-typed and untyped lambda calculi, respectively.
Rather than showing the embedding of two fixed languages, we extend their approach to the much broader class of order-sorted algebras [19] with the aim of providing a framework that works regardless of the inherent nature of the combined languages. There are a number of reasons to choose order-sorted algebras as the underlying framework for generalizing the multi-language construction. From the first formulation of initial algebra semantics [17], the algebraic approach to program semantics [16] has become a cornerstone in the theory of programming languages [27]. Order-sorted algebras provide a mathematical tool for representing formal systems as algebraic structures through a systematic use of the notion of sort and subsort to model different forms of polymorphism [19,18], a key aspect when dealing with multi-languages sharing operators among the single-languages. They were initially proposed to ensure a rigorous model-theoretic semantics for error handling, multiple inheritance, retracts, selectors for multiple constructors, polymorphism, and overloading. Over the years, several uses [25,52,3,11,38,39,24,6] and different variants [51,45,43,38] have been proposed for order-sorted algebras, making them a solid starting point for the development of a new framework. In particular, results on rewriting logic [32] extend easily to the order-sorted case [31], thus facilitating a future extension of this paper towards the operational semantics world. Improvements of the order-sorted algebra framework have also been proposed to model languages together with their type systems [10] and to extend order-sorted specification with higher-order functions [38] (see [48] and [18] for detailed surveys).
In this paper, we propose three different multi-language constructions according to the semantic properties of boundary functions. The first one models a general notion of multi-language that does not require any constraints on boundaries (Sect. 3). We argue that when such generality is superfluous, we can achieve a neater approach where boundary functions do not need to be annotated with sorts. Indeed, we show that when the cross-language conversion of a term does not depend on the sort at which the term is considered (i.e., when boundaries are subsort polymorphic) the framework is powerful enough to apply the correct conversion (Sect. 4.1). This last construction is an improvement of the original notion of boundaries in [30]. From a practical point of view, it allows programmers to avoid explicitly dealing with sorts when writing code, a non-trivial task that could introduce type cast bugs in real-world languages. Finally, we provide a very specific notion of multi-language where no extra operator is added to the syntax (Sect. 4.2). This approach is particularly useful to extend a language in a modular fashion while ensuring backward compatibility with "old" programs. For each one of these variants we prove an initiality theorem, which in turn ensures the uniqueness of the multi-language semantics, thereby legitimating the proposed framework. Moreover, we show that the framework guarantees a fundamental closure property on the construction: The resulting multi-language admits an order-sorted representation, i.e., it falls within the same formal model of the combined languages. Finally, we model the multi-language designed in [30] in order to show an instantiation of the framework (Sect. 6).

Background
All the algebraic background of the paper was first stated in [17,15,19]. We briefly introduce here the main definitions and results, and we illustrate them on a simple running example.
Given a set of sorts $S$, an $S$-sorted set $A$ is a family of sets indexed by $S$, i.e., $A = \{ A_s \mid s \in S \}$. We stick to the convention of using $s$ and $w$ as metavariables for sorts in $S$ and $S^*$, respectively, and we use the blackboard bold typeface to indicate a specific sort in $S$. In addition, if $A$ is an $S$-sorted set and $w = s_1 \dots s_n \in S^+$, we denote by $A_w$ the cartesian product $A_{s_1} \times \dots \times A_{s_n}$. Given $P \subseteq S$, the restriction of an $S$-sorted function $f$ to $P$ is denoted by $f|_P$ and it is the $P$-sorted function $f|_P = \{ f_s \mid s \in P \}$. If $g : A \to B$ is a function, we still use the symbol $g$ to denote the direct image map of $g$ (also called the additive lift of $g$), i.e., the function $g : \wp(A) \to \wp(B)$ such that $g(X) = \{ g(a) \mid a \in X \}$. Analogously, if $\leq$ is a binary relation on a set $A$ (with elements $a \in A$), we use the same relation symbol to denote its pointwise extension, i.e., we write $a_1 \dots a_n \leq a'_1 \dots a'_n$ for $a_1 \leq a'_1, \dots, a_n \leq a'_n$.
The basic notions underpinning the order-sorted algebra framework are the definitions of signature, which models the symbols forming the terms of the language, and algebra, which provides an algebraic meaning to those symbols.
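Definition 1 (Order-Sorted Signature). An order-sorted signature is a triple $\langle S, \leq, \Sigma \rangle$, where $S$ is a set of sorts, $\leq$ is a binary relation on $S$, and $\Sigma$ is an $(S^* \times S)$-sorted set $\Sigma = \{ \Sigma_{w,s} \mid w \in S^*, s \in S \}$, satisfying the following conditions: (1os) $\langle S, \leq \rangle$ is a poset; and (2os) $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \leq w_2$ imply $s_1 \leq s_2$.
If $\sigma \in \Sigma_{w,s}$ (or, $\sigma : w \to s$ and $\sigma : s$ when $w = \varepsilon$, as shorthands), we call $\sigma$ an operator (symbol) or function symbol, $w$ the arity, $s$ the sort, and $(w, s)$ the rank of $\sigma$; if $w = \varepsilon$, we say that $\sigma$ is a constant (symbol). We name $\leq$ the subsort relation and $\Sigma$ a signature when $\langle S, \leq \rangle$ is clear from the context. We abuse notation and write $\sigma \in \Sigma$ when $\sigma \in \bigcup_{w,s} \Sigma_{w,s}$.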
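Definition 2 (Order-Sorted Algebra). An order-sorted $\langle S, \leq, \Sigma \rangle$-algebra $\mathcal{A}$ over an order-sorted signature $\langle S, \leq, \Sigma \rangle$ is an $S$-sorted set $A$ of interpretation domains (or, carrier sets or semantic domains) $A = \{ A_s \mid s \in S \}$, together with interpretation functions $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}} : A_w \to A_s$ for each $\sigma \in \Sigma_{w,s}$, such that: (1oa) $s \leq s'$ implies $A_s \subseteq A_{s'}$; and (2oa) $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$ and $w_1 \leq w_2$ imply that $\llbracket \sigma \rrbracket^{w_1,s_1}_{\mathcal{A}}(a) = \llbracket \sigma \rrbracket^{w_2,s_2}_{\mathcal{A}}(a)$ for each $a \in A_{w_1}$.
An important property of signatures, related to polymorphism, is regularity. Its relevance lies in the possibility of linking each term to a unique least sort (see Proposition 2.10 in [19]).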
Definition 3 (Regularity of an Order-Sorted Signature). An order-sorted signature $\langle S, \leq, \Sigma \rangle$ is regular if for each $\sigma \in \Sigma_{w,s}$ and for each lower bound $w_0 \leq w$ the set $\{ (w', s') \mid \sigma \in \Sigma_{w',s'} \wedge w_0 \leq w' \}$ has a minimum. This minimum is called the least rank of $\sigma$ with respect to $w_0$.
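As a concrete reading of Definition 3, the following Python sketch checks regularity of a finite signature by brute force. The encoding (sorts as strings, `ranks` as a dictionary from operator names to sets of (arity, sort) pairs) and the helper names `leq_closure`, `word_leq`, and `is_regular` are assumptions made for illustration, and ranks are compared componentwise. The second signature is a hypothetical non-regular example, not one taken from the paper.

```python
from itertools import product


def leq_closure(sorts, pairs):
    """Reflexive-transitive closure of the declared subsort pairs."""
    leq = {(s, s) for s in sorts} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq


def word_leq(leq, w0, w):
    """Pointwise extension of the subsort relation to words of sorts."""
    return len(w0) == len(w) and all((a, b) in leq for a, b in zip(w0, w))


def is_regular(sorts, pairs, ranks):
    """Check that, for every operator and every lower bound w0 of one of
    its arities, the set of ranks above w0 has a minimum."""
    leq = leq_closure(sorts, pairs)
    for rank_set in ranks.values():
        for length in {len(w) for w, _ in rank_set}:
            for w0 in product(sorted(sorts), repeat=length):
                above = [(w, s) for w, s in rank_set if word_leq(leq, w0, w)]
                if above and not any(
                    all(word_leq(leq, w, w2) and (s, s2) in leq for w2, s2 in above)
                    for w, s in above
                ):
                    return False
    return True


# Signature of L1 from the running example: n <= e, numerals of sort n, + : e e -> e.
l1_ranks = {"0": {((), "n")}, "1": {((), "n")}, "+": {(("e", "e"), "e")}}
l1_regular = is_regular({"e", "n"}, {("n", "e")}, l1_ranks)

# Hypothetical signature: f : b -> b and f : c -> c, with a <= b and a <= c.
# The lower bound a has two incomparable ranks above it, so no least rank exists.
bad_ranks = {"f": {(("b",), "b"), (("c",), "c")}}
bad_regular = is_regular({"a", "b", "c"}, {("a", "b"), ("a", "c")}, bad_ranks)
```

The enumeration of all candidate lower bounds is exponential in the arity, which is acceptable only for small signatures such as the ones used here.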
Homomorphisms between algebras capture the compositional nature of semantics: The meaning of a term is determined by the meanings of its constituents. They are defined as order-sorted functions that preserve the interpretation of operators.
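Definition 5 (Order-Sorted Homomorphism). Let $\mathcal{A}$ and $\mathcal{B}$ be $\langle S, \leq, \Sigma \rangle$-algebras. An order-sorted $\langle S, \leq, \Sigma \rangle$-homomorphism from $\mathcal{A}$ to $\mathcal{B}$, denoted by $h : \mathcal{A} \to \mathcal{B}$, is an $S$-sorted function $h : A \to B$ such that: (1oh) $h_s(\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}}(a)) = \llbracket \sigma \rrbracket^{w,s}_{\mathcal{B}}(h_w(a))$ for each $\sigma \in \Sigma_{w,s}$ and $a \in A_w$; and (2oh) $s \leq s'$ implies $h_s(a) = h_{s'}(a)$ for each $a \in A_s$.
The class of all the order-sorted $\langle S, \leq, \Sigma \rangle$-algebras and the class of all order-sorted $\langle S, \leq, \Sigma \rangle$-homomorphisms form a category denoted by $\mathbf{OSAlg}(S, \leq, \Sigma)$. Furthermore, the homomorphism definition determines the property of the term algebra $\mathcal{T}_\Sigma$ of being an initial object in its category whenever the signature is regular. Since initiality is preserved by isomorphisms, this allows us to identify $\mathcal{T}_\Sigma$ with the abstract syntax of the language. If $\mathcal{T}_\Sigma$ is initial, the homomorphism leaving $\mathcal{T}_\Sigma$ and going to an algebra $\mathcal{A}$ is called the semantic function (with respect to $\mathcal{A}$).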
Example. Let $L_1$ and $L_2$ be two formal languages (see Fig. 1). The former is a language to construct simple mathematical expressions: $n \in \mathbb{N}$ is the metavariable for natural numbers, while $e$ inductively generates all the possible additions (Fig. 1a). The latter is a language to build strings over a finite alphabet of symbols $A = \{ a, b, \ldots, z \}$: $a \in A$ is the metavariable for atoms (or, characters), whereas $s$ concatenates them into strings (Fig. 1b). A term in $L_1$ and $L_2$ denotes an element in the sets $\mathbb{N}$ and $A^*$, according to the equations in Fig. 2a and 2b, respectively.
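The syntax of the language $L_1$ can be modeled by an order-sorted signature $\mathcal{S}_1 = \langle S_1, \leq_1, \Sigma_1 \rangle$ defined as follows: $S_1 = \{ \mathbb{e}, \mathbb{n} \}$, a set with sorts $\mathbb{e}$ (which stands for expressions) and $\mathbb{n}$ (which stands for natural numbers); $\leq_1$ is the reflexive relation on $S_1$ plus $\mathbb{n} \leq_1 \mathbb{e}$ (natural numbers are expressions); and the operators in $\Sigma_1$ are $0, 1, 2, \ldots : \mathbb{n}$ and $+ : \mathbb{e}\,\mathbb{e} \to \mathbb{e}$. Similarly, the signature $\mathcal{S}_2 = \langle S_2, \leq_2, \Sigma_2 \rangle$ models the syntax of the language $L_2$: the set $S_2 = \{ \mathbb{s}, \mathbb{a} \}$ carries the sort for strings $\mathbb{s}$ and the sort for atomic symbols (or, characters) $\mathbb{a}$; the subsort relation $\leq_2$ is the reflexive relation on $S_2$ plus $\mathbb{a} \leq_2 \mathbb{s}$ (characters are one-symbol strings); and the operator symbols in $\Sigma_2$ are $a, \ldots, z : \mathbb{a}$, $- : \mathbb{s}$, and $+ : \mathbb{s}\,\mathbb{s} \to \mathbb{s}$. Semantics of $L_1$ and $L_2$ can be embodied by algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ over the signatures $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively. We set the interpretation domains of $\mathcal{A}_1$ to $A^1_{\mathbb{n}} = A^1_{\mathbb{e}} = \mathbb{N}$ and those of $\mathcal{A}_2$ to $A^2_{\mathbb{a}} = A \subseteq A^* = A^2_{\mathbb{s}}$. Moreover, we define the interpretation functions as follows (the juxtaposition of two or more strings denotes their concatenation, and we use $\hat{a}$ as metavariable ranging over $A^*$):
Since $\mathcal{S}_1$ and $\mathcal{S}_2$ are regular, $\mathcal{A}_1$ and $\mathcal{A}_2$ induce the semantic functions $h_1 : \mathcal{T}_{\Sigma_1} \to \mathcal{A}_1$ and $h_2 : \mathcal{T}_{\Sigma_2} \to \mathcal{A}_2$, providing semantics to the languages.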
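The semantic functions of the running example can be sketched as structural recursions over terms. The Python fragment below is an illustration under stated assumptions: terms are encoded as nested tuples `(operator, subterm, ...)`, the names `h1` and `h2` mirror the semantic functions of the text, `+` is read as addition on naturals and as concatenation on strings, and the constant `-` of the second signature is left out because its interpretation is not spelled out here.

```python
def h1(term):
    """Semantic function for L1: maps a term to a natural number."""
    op, *args = term
    if op == "+":
        # Compositionality: the meaning of a sum depends only on the
        # meanings of its two subterms.
        return h1(args[0]) + h1(args[1])
    return int(op)  # numerals 0, 1, 2, ... denote themselves


def h2(term):
    """Semantic function for L2: maps a term to a string over a, ..., z."""
    op, *args = term
    if op == "+":
        return h2(args[0]) + h2(args[1])  # juxtaposition of strings
    return op  # characters denote one-symbol strings


sum_term = ("+", ("10",), ("5",))
word_term = ("+", ("f",), ("+", ("o",), ("o",)))
```

Here `h1(sum_term)` evaluates to `15` and `h2(word_term)` to `"foo"`. The same operator symbol `+` receives two different meanings, which is the source of the ambiguity that the multi-language construction has to resolve.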

Combining Order-Sorted Theories
The first step towards a multi-language specification is the choice of which terms of one language can be employed in the others [36,30,35]. For instance, a multi-language requirement could demand to use ML expressions in place of Scheme expressions and, possibly, but not necessarily, vice versa (such a multi-language is designed in [30]). A multi-language signature is an amenable formalism to specify the compatibility relation between syntactic categories across two languages.
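Definition 6 (Multi-Language Signature). A multi-language signature is a triple $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$, where $\mathcal{S}_1 = \langle S_1, \leq_1, \Sigma_1 \rangle$ and $\mathcal{S}_2 = \langle S_2, \leq_2, \Sigma_2 \rangle$ are order-sorted signatures, and $\leq$ is a binary relation on $S = S_1 \cup S_2$ that satisfies the following condition: (1s) $\leq \cap\, (S_i \times S_i) = \leq_i$ for $i = 1, 2$.
To make the notation lighter, we introduce the following binary relations on $S$: $s \ltimes s'$ if $s \leq s'$ but neither $s \leq_1 s'$ nor $s \leq_2 s'$, and $s \preccurlyeq s'$ if $s \leq s'$ but not $s \ltimes s'$.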
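In the following, we always assume that the sets of sorts $S_1$ and $S_2$ of the order-sorted signatures $\mathcal{S}_1$ and $\mathcal{S}_2$ are disjoint. Condition (1s) requires the multi-language subsort relation $\leq$ to preserve the original subsort relations $\leq_1$ and $\leq_2$ (i.e., $\leq \cap\, (S_i \times S_i) = \leq_i$). The join relation $\ltimes$ provides a compatibility relation between sorts in $S_1$ and $S_2$. More precisely, $S_i \ni s \ltimes s' \in S_j$ suggests that we want to use terms in $T_{\Sigma_i,s}$ in place of terms in $T_{\Sigma_j,s'}$, whereas the intra-language subsort relation $\preccurlyeq$ shifts the standard notion of subsort from the order-sorted to the multi-language world. In a nutshell, the relation $\leq\; =\; \ltimes \cup \preccurlyeq$ can only join (through $\ltimes$) the underlying languages without introducing distortions (indeed, $\preccurlyeq\; =\; \leq_1 \cup \leq_2$).
The role of an algebra is to provide an interpretation domain for each sort, as well as the meaning of every operator symbol in a given signature. When moving towards the multi-language context, the join relation $\ltimes$ may add subsort constraints between sorts belonging to different signatures. Consequently, if $s \ltimes s'$, a multi-language algebra has to specify how values of sort $s$ may be interpreted as values of sort $s'$. These specifications are called boundary functions [30] and provide an algebraic meaning to the subsort constraints added by $\ltimes$. Henceforth, we define $S = S_1 \cup S_2$, $\Sigma = \Sigma_1 \cup \Sigma_2$, and, given $(w, s) \in S_i^* \times S_i$, we denote by $\Sigma^i_{w,s}$ the $(w, s)$-sorted component of $\Sigma_i$.
Definition 7 (Multi-Language Algebra). Let $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ be a multi-language signature. A multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebra $\mathcal{A}$ is an $S$-sorted set $A$ of interpretation domains $A = \{ A_s \mid s \in S \}$, together with interpretation functions $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}} : A_w \to A_s$ for each $\sigma \in \Sigma^i_{w,s}$, and with a $\ltimes$-sorted set $\alpha$ of boundary functions $\alpha = \{ \alpha_{s,s'} : A_s \to A_{s'} \mid s \ltimes s' \}$, such that the following constraint holds: (1a) the projected algebra $\mathcal{A}_i$, where $i = 1, 2$, specified by the carrier set $A_i = \{ A_s \mid s \in S_i \}$ and the interpretation functions $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}_i} = \llbracket \sigma \rrbracket^{w,s}_{\mathcal{A}}$ for each $\sigma \in \Sigma^i_{w,s}$, must be an order-sorted $\mathcal{S}_i$-algebra.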
If $\mathcal{M}$ is an algebra, we adopt the convention of denoting by $M$ (standard math font) its carrier set and by $\mu$ (Greek math font) its boundary functions whenever possible. Condition (1a) is the semantic counterpart of condition (1s): It requires the multi-language to carry (i.e., preserve) the underlying languages' order-sorted algebras, whereas the boundary functions model how values can flow between languages.
Given two multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebras $\mathcal{A}$ and $\mathcal{B}$, we can define morphisms between them that preserve the sorted structure of the underlying projected algebras.
Definition 8 (Multi-Language Homomorphism). Let $\mathcal{A}$ and $\mathcal{B}$ be multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebras with sets of boundary functions $\alpha$ and $\beta$, respectively. A multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-homomorphism $h : \mathcal{A} \to \mathcal{B}$ is an $S$-sorted function $h : A \to B$ such that: (1h) the restriction $h|_{S_i}$ is an order-sorted $\mathcal{S}_i$-homomorphism $h|_{S_i} : \mathcal{A}_i \to \mathcal{B}_i$, for $i = 1, 2$; and (2h) $s \ltimes s'$ implies $h_{s'} \circ \alpha_{s,s'} = \beta_{s,s'} \circ h_s$.
Conditions (1h) and (2h) are easily intelligible when the domain algebra is the abstract syntax of the language [15]: Simply put, both conditions require the semantics of a term to be a function of the meaning of its subterms, in the sense of [15,46]. In particular, the second condition demands that boundary functions act as operators. The identity homomorphism on a multi-language algebra $\mathcal{A}$ is denoted by $\mathrm{id}_{\mathcal{A}}$ and it is the set-theoretic identity on the carrier set $A$ of the algebra $\mathcal{A}$. Hence, as in the many-sorted and order-sorted case [15,19], we immediately obtain the category of all the multi-language algebras over a multi-language signature:
Theorem 1. Let $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ be a multi-language signature. The class of all $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebras and the class of all $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-homomorphisms form a category denoted by $\mathbf{Alg}(\mathcal{S}_1, \mathcal{S}_2, \leq)$.

The Initial Term Model
In this section, we introduce the concepts of (multi-language) term and (multi-language) semantics in order to show how a multi-language algebra yields a unique interpretation for any regular (see Def. 11) multi-language specification. Multi-language terms should comprise all of the underlying languages' terms, plus those obtained by merging the two languages according to the join relation $\ltimes$. In particular, we aim for a construction where subterms of sort $s'$ may have been replaced by terms of sort $s$, whenever $s \ltimes s'$ (we recall that $s$ and $s'$ are two syntactic categories of different languages due to Def. 6). Nonetheless, we must be careful not to add ambiguities during this process: A term $t$ may belong to both the $\mathcal{S}_1$ and $\mathcal{S}_2$ term algebras but with different meanings $\llbracket t \rrbracket_{\mathcal{A}_1}$ and $\llbracket t \rrbracket_{\mathcal{A}_2}$ (assuming that $\mathcal{A}_1$ and $\mathcal{A}_2$ are algebras over $\mathcal{S}_1$ and $\mathcal{S}_2$, respectively). When $t$ is included in the multi-language, we lose the information needed to determine which of the two interpretations to choose, thus making the (multi-language) semantics of $t$ ambiguous. The same problem arises whenever an operator $\sigma$ belongs to both languages with different interpretation functions. The simplest solution to avoid such issues is to add syntactical notations to make explicit the context of the language in which we are operating.
Definition 9 (Associated Signature). The associated signature to the multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is the ordered triple $\langle S, \preccurlyeq, \Pi \rangle$, where $S = S_1 \cup S_2$, $\preccurlyeq\; =\; \leq_1 \cup \leq_2$, and $\Pi = \{ \sigma_i : w \to s \mid \sigma \in \Sigma^i_{w,s},\ i = 1, 2 \} \cup \{ \hookrightarrow_{s,s'} : s \to s' \mid s \ltimes s' \}$.
It is trivial to prove that an associated signature is indeed an order-sorted signature, thus admitting a term algebra $\mathcal{T}_\Pi$. All the symbols forming terms in $\mathcal{T}_\Pi$ carry the source language information as a subscript, and all the new operators $\hookrightarrow_{s,s'}$ specify when a term of sort $s$ is used in place of a term of sort $s'$. Although $\mathcal{T}_\Pi$ seems a suitable definition for multi-language terms, it is not a multi-language algebra according to Def. 7. However, we can exploit the construction of $\mathcal{T}_\Pi$ in order to provide a fully-fledged multi-language algebra able to generate multi-language terms.
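Definition 10 (Multi-Language Term Algebra). The multi-language term algebra $\mathcal{T}$ over a multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ with boundary functions $\tau$ is defined as follows: (1t) $T_s = T_{\Pi,s}$ for each $s \in S$; (2t) $\sigma \in \Sigma^i_{w,s}$ implies $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{T}} = \llbracket \sigma_i \rrbracket^{w,s}_{\mathcal{T}_\Pi}$ for $i = 1, 2$; and (3t) $s \ltimes s'$ implies $\tau_{s,s'} = \llbracket \hookrightarrow_{s,s'} \rrbracket^{s,s'}_{\mathcal{T}_\Pi}$.
Proving that $\mathcal{T}$ satisfies Def. 7 is easy and omitted. $\mathcal{T}$ and $\mathcal{T}_\Pi$ share the same carrier sets (condition (1t)), and each single-language operator $\sigma \in \Sigma^i_{w,s}$ is interpreted as its annotated version $\sigma_i$ in $\mathcal{T}_\Pi$ (condition (2t)). Furthermore, the multi-language operators $\hookrightarrow_{s,s'}$ no longer belong to the signature (they belong neither to $\mathcal{S}_1$ nor to $\mathcal{S}_2$) but their semantics is inherited by the boundary functions $\tau$ (condition (3t)), while their syntactic values are still in the carrier sets of the algebra (this construction is highly technical and very similar to the freely generated $\Sigma(X)$-algebra over a set of variables $X$, see [15]).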
Note that this is exactly the formalization of the ad hoc multi-language specifications in [37,2,36,30]: [37,2,36] exploit distinct colors to disambiguate the source language of the operators, whereas [30] uses different font styles for different languages. Moreover, boundary functions in [30] conceptually match the introduced operators $\hookrightarrow_{s,s'}$.
The last step in order to finalize the framework is to provide semantics for each term in $\mathcal{T}$. As with the order-sorted case, we need a notion of regularity for proving the initiality of the term algebra in its category, which in turn ensures a single eligible (initial algebra) semantics.

Definition 11 (Regularity). A multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is regular if its associated signature $\langle S, \preccurlyeq, \Pi \rangle$ is regular.
Proposition 2. The associated signature $\langle S, \preccurlyeq, \Pi \rangle$ of a multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is regular if and only if $\mathcal{S}_1$ and $\mathcal{S}_2$ are regular.
The last proposition allows us to avoid checking the multi-language regularity whenever the regularity of the order-sorted signatures is known.
Theorem 2 (Initiality of $\mathcal{T}$). The multi-language term algebra $\mathcal{T}$ over a regular multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is initial in the category $\mathbf{Alg}(\mathcal{S}_1, \mathcal{S}_2, \leq)$.
Initiality of T is essential to assign a unique mathematical meaning to each term, as in the order-sorted case: Given a multi-language algebra A, there is only one way of interpreting each term t ∈ T in A (satisfying the homomorphism conditions).
Definition 12 ((Multi-Language) Semantics). Let $\mathcal{A}$ be a multi-language algebra over a regular multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$. The (multi-language) semantics of a (multi-language) term $t \in T$ induced by $\mathcal{A}$ is defined as $\llbracket t \rrbracket_{\mathcal{A}} = h_{ls(t)}(t)$.
The last equation is well-defined since $h$ is the unique multi-language homomorphism $h : \mathcal{T} \to \mathcal{A}$ and for each $t \in T$ there exists a least sort $ls(t) \in S$ such that $t \in T_{\Pi,ls(t)}$ (see Prop. 2.10 in [19]).
Example. Suppose we are interested in a multi-language over the signatures $\mathcal{S}_1$ and $\mathcal{S}_2$ specified in the example given in the background section that satisfies the following properties:
- Terms denoting natural numbers can be used in place of characters $a \in A$ according to the function $\mathrm{chr} : \mathbb{N} \to A$ that maps the natural number $n$ to the character symbol $a^{(n \bmod |A|)}$ (we are assuming a total lexicographical order $a^{(0)}, a^{(1)}, \ldots, a^{(|A|-1)}$ on $A$);
- Terms denoting strings can be used in place of natural numbers $n \in \mathbb{N}$ according to the function $\mathrm{ord} : A \to \mathbb{N}$, which is the inverse of $\mathrm{chr}$ restricted to the initial segment $\mathbb{N}_{<|A|}$ of the natural numbers.
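In order to achieve such a multi-language specification, we can simply provide a join relation $\ltimes$ on $S$ and a boundary function $\alpha_{s,s'}$ for each extra-language subsort relation $s \ltimes s'$ introduced by $\ltimes$. We define the join relation and the boundary functions as follows:
The multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebra $\mathcal{A}$ can now be obtained by joining the projected algebras $\mathcal{A}_1$ and $\mathcal{A}_2$ with the set of boundary functions $\alpha$. The term algebra $\mathcal{T}$ over $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ provides all the multi-language terms, and Thm. 2 ensures a unique denotation of each $t \in T$ in $\mathcal{A}$. For instance, the term $t = \hookrightarrow_{\mathbb{s},\mathbb{n}}(t_1)$, with $t_1 = +_2(f_2, +_2(o_2, \hookrightarrow_{\mathbb{e},\mathbb{a}}(+_1(10_1, 5_1))))$ (1), is syntactically equivalent to the following but with a less pedantic notation, where language subscripts are replaced by colors (red for one, and blue for two) and prefix notation is replaced by infix notation: $\hookrightarrow_{\mathbb{s},\mathbb{n}}(f + o + \hookrightarrow_{\mathbb{e},\mathbb{a}}(10 + 5))$. It denotes the natural number 765: $\llbracket +_1(10_1, 5_1) \rrbracket_{\mathcal{A}} = \llbracket + \rrbracket^{\mathbb{e}\mathbb{e},\mathbb{e}}_{\mathcal{A}}(\llbracket 10 \rrbracket_{\mathcal{A}}, \llbracket 5 \rrbracket_{\mathcal{A}}) = \llbracket + \rrbracket^{\mathbb{e}\mathbb{e},\mathbb{e}}_{\mathcal{A}}(10, 5) = 15$ and $\llbracket t \rrbracket_{\mathcal{A}} = \llbracket \hookrightarrow_{\mathbb{s},\mathbb{n}} \rrbracket^{\mathbb{s},\mathbb{n}}_{\mathcal{A}}(\llbracket t_1 \rrbracket_{\mathcal{A}}) = \llbracket \hookrightarrow_{\mathbb{s},\mathbb{n}} \rrbracket^{\mathbb{s},\mathbb{n}}_{\mathcal{A}}(foo) = 765$ (see the proof of Prop. 2.10 in [19] to check how to compute the least sort of a term).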
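The way annotated terms and boundary operators are interpreted can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple encoding of terms, the names `ALPHABET`, `chr_boundary`, `ord_boundary`, `ALPHA`, `PLUS`, and `evaluate` are hypothetical, the alphabet is assumed to be ordered as `a`, ..., `z` starting from index 0, and only the boundaries determined by chr and ord are modeled (the conversion from strings to naturals is left out). The example term is a hypothetical one, chosen so as not to depend on the omitted boundary.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed ordering: a(0) = 'a', ..., a(25) = 'z'


def chr_boundary(n):
    """chr : N -> A, mapping n to the character of index n mod |A|."""
    return ALPHABET[n % len(ALPHABET)]


def ord_boundary(a):
    """ord : A -> N, the inverse of chr on the initial segment below |A|."""
    return ALPHABET.index(a)


# Boundary functions indexed by (source sort, target sort).
ALPHA = {("n", "a"): chr_boundary, ("e", "a"): chr_boundary, ("a", "n"): ord_boundary}

# The language subscript selects the interpretation of the shared symbol +.
PLUS = {1: lambda m, n: m + n,   # addition on naturals
        2: lambda u, v: u + v}   # concatenation of strings


def evaluate(term):
    """Terms: ("const", lang, value), ("+", lang, t1, t2), ("->", src, dst, t)."""
    tag = term[0]
    if tag == "const":
        return term[2]
    if tag == "+":
        _, lang, left, right = term
        return PLUS[lang](evaluate(left), evaluate(right))
    if tag == "->":
        # A boundary operator is interpreted by the matching boundary function.
        _, source, target, inner = term
        return ALPHA[(source, target)](evaluate(inner))
    raise ValueError(f"unknown term: {term!r}")


# Hypothetical term: f +_2 ->_{e,a}(7 +_1 1)
example = ("+", 2, ("const", 2, "f"),
           ("->", "e", "a", ("+", 1, ("const", 1, 7), ("const", 1, 1))))
```

Under these assumptions `evaluate(example)` yields `"fi"`, since the inner sum evaluates to 8 and the boundary converts it to the character of index 8. The evaluator is compositional in the sense of the homomorphism conditions: each node is interpreted from the values of its subterms alone.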

Refining the Construction
The construction in Sect. 3 does not set any constraint on boundary functions, thus giving a great deal of flexibility to language designers. For instance, they can provide boundary functions that act differently with respect to the intra-language subsort relation $\preccurlyeq$: According to the previous example, it would have been possible to define $\alpha_{\mathbb{n},\mathbb{a}} \neq \alpha_{\mathbb{e},\mathbb{a}}$ to employ different value conversion specifications for terms in $T_{\mathbb{n}}$, based on whether they are used as natural numbers ($\mathbb{n}$) or as expressions ($\mathbb{e}$). However, when this amount of flexibility is not needed, we can refine the previous construction by reducing the amount of syntax introduced by the associated signature. In this section we examine the case where boundary functions satisfy the monotonicity conditions of order-sorted algebra operators (Sect. 4.1); and the case where boundary functions commute with the semantics of operator symbols (Sect. 4.2).
In both cases, we prove that the introduced refinements do not affect the initiality of the term algebra, thereby providing unambiguous semantics to the multi-language.

Subsort Polymorphic Boundary Functions
In Sect. 3, the join relation constraints $s \ltimes s'$ are turned into syntactical operators $\hookrightarrow_{s,s'}$ in the associated signature $\langle S, \preccurlyeq, \Pi \rangle$. We now show how to handle all the syntactical overhead introduced by $\ltimes$ with a single polymorphic operator $\hookrightarrow$ whenever the boundary functions satisfy the monotonicity conditions of the order-sorted algebras [19]. Such conditions require a subsort relation $s_1 \leq s_2$ between the sorts of a polymorphic operator $\sigma \in \Sigma_{w_1,s_1} \cap \Sigma_{w_2,s_2}$, assuming that $w_1 \leq w_2$. In our case, $\sigma = \hookrightarrow$, and thus we extend Def. 6 with the following ad hoc constraint (2s*):
Definition 6* (SP Multi-Language Signature). A subsort polymorphic (SP) multi-language signature is a multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ such that (2s*) $s_1 \ltimes s'_1$, $s_2 \ltimes s'_2$, and $s_1 \preccurlyeq s_2$ imply $s'_1 \preccurlyeq s'_2$.
Furthermore, order-sorted algebras demand consistency of the interpretation functions of a subsort polymorphic operator on the smaller domain, which results in the following condition (2a*) on boundary functions (that extends Def. 7):
Definition 7* (SP Multi-Language Algebra). Let $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ be an SP multi-language signature. A subsort polymorphic (SP) multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebra is a multi-language $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebra $\mathcal{A}$ such that (2a*) $s_1 \ltimes s'_1$, $s_2 \ltimes s'_2$, and $s_1 \preccurlyeq s_2$ imply that $\alpha_{s_1,s'_1}(a) = \alpha_{s_2,s'_2}(a)$ for each $a \in A_{s_1}$.
The notion of homomorphism in this new context does not change (a homomorphism between two SP algebras is still an $S$-sorted function decomposable into two order-sorted homomorphisms that commute with boundaries), whereas the associated signature to an SP multi-language signature merely differs from Def. 9 in having a unique polymorphic operator $\hookrightarrow$ instead of a family of parametrized symbols $\{ \hookrightarrow_{s,s'} : s \to s' \mid s \ltimes s' \}$.
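Definition 9* (SP Associated Signature). The subsort polymorphic (SP) associated signature to the SP multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is the ordered triple $\langle S, \preccurlyeq, \Pi \rangle$, where $S = S_1 \cup S_2$, $\preccurlyeq\; =\; \leq_1 \cup \leq_2$, and $\Pi = \{ \sigma_i : w \to s \mid \sigma \in \Sigma^i_{w,s},\ i = 1, 2 \} \cup \{ \hookrightarrow\, : s \to s' \mid s \ltimes s' \}$.
Since the associated signature is the basis for the term algebra, we need to modify the condition (3t) in Def. 10:
Definition 10* (SP Multi-Language Term Algebra). The subsort polymorphic (SP) multi-language term algebra $\mathcal{T}$ over an SP multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ with boundary functions $\tau$ is defined as follows: (1t) $T_s = T_{\Pi,s}$ for each $s \in S$; (2t) $\sigma \in \Sigma^i_{w,s}$ implies $\llbracket \sigma \rrbracket^{w,s}_{\mathcal{T}} = \llbracket \sigma_i \rrbracket^{w,s}_{\mathcal{T}_\Pi}$ for $i = 1, 2$; and (3t*) $s \ltimes s'$ implies $\tau_{s,s'} = \llbracket \hookrightarrow \rrbracket^{s,s'}_{\mathcal{T}_\Pi}$.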
Signature regularity is still defined as in Def. 11 and Prop. 2 still holds for the extended version developed in this section. As a result, the SP multi-language term $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebra $\mathcal{T}$ is still initial in the category $\mathbf{Alg}^*(\mathcal{S}_1, \mathcal{S}_2, \leq)$ of SP multi-language algebras over the SP multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$.
Theorem 3. Let $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ be an SP multi-language signature. The class of all SP $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-algebras and the class of all $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$-homomorphisms form a category denoted by $\mathbf{Alg}^*(\mathcal{S}_1, \mathcal{S}_2, \leq)$.
Theorem 4 (Initiality of $\mathcal{T}$). The SP multi-language term algebra $\mathcal{T}$ over a regular SP multi-language signature $\langle \mathcal{S}_1, \mathcal{S}_2, \leq \rangle$ is initial in the category $\mathbf{Alg}^*(\mathcal{S}_1, \mathcal{S}_2, \leq)$.
The semantics of a term $t$ induced by an SP multi-language algebra $\mathcal{A}$ is defined as in Def. 12, thanks to the initiality result: $\llbracket t \rrbracket_{\mathcal{A}} = h_{ls(t)}(t)$. The main advantage of dealing with SP multi-language terms is that the framework is able to determine the correct interpretation function of the operator $\hookrightarrow$, making the subscript notation developed in the previous section superfluous. This also means that programmers are exempted from explicitly annotating multi-language programs with sorts, a non-trivial task in the general case that could introduce type cast bugs.
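Example. The boundary functions of the previous example are subsort polymorphic: $\alpha_{\mathbb{a},\mathbb{n}}(a) = \mathrm{ord}(a) = \alpha_{\mathbb{s},\mathbb{n}}(a)$ for each character $a \in A$, and $\alpha_{\mathbb{n},\mathbb{a}} = \alpha_{\mathbb{e},\mathbb{a}}$ by definition. Thus, the equivalent of the term $t$ (see Eq. 1) in the SP term algebra is $\hookrightarrow(+_2(f_2, +_2(o_2, \hookrightarrow(+_1(10_1, 5_1)))))$ or, according to the previous notation, $\hookrightarrow(f + o + \hookrightarrow(10 + 5))$, and it denotes the same natural number 765.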

Semantic-Only Boundary Functions
In the previous section, we have shown how to handle the flow of values across different languages with a single polymorphic operator. Now, we present a new multi-language construction where neither extra operators are added to the associated signature, nor single-language operators have to be annotated with subscripts indicating their original language. Thus, the resulting multi-language syntax comprises only symbols in $\Sigma_1 \cup \Sigma_2$. Such a construction is achieved by:
- Imposing commutativity conditions on algebras, making homomorphisms transparently inherit the semantics of boundary functions. The framework is therefore able to apply the correct value conversion function whenever necessary, without the need for an explicit syntactical operator $\hookrightarrow$.
- Requiring a new form of cross-language polymorphism able to cope with shared operators among languages. The initiality of term algebras is preserved by modifying the notion of signature in a way that every operator admits a least sort.
The variant of the framework presented in this section is particularly useful when designing the extension of a language in a modular fashion. For instance, if the signature $\mathcal{S}_1$ models the syntax of a simple functional language (for an example, see [15, p. 77]) without an explicit encoding for string values, and $\mathcal{S}_2$ is a language for manipulating strings (similar to the language $L_2$ of the running example of this paper), we can exploit the construction presented below in order to embed $\mathcal{S}_2$ into $\mathcal{S}_1$.
Signature. The main issue that can arise when defining the multi-language signature is the presence of shared operators in $\Sigma_1$ and $\Sigma_2$. Contrary to the previous cases, where such ambiguity is solved by adding subscripts in the associated signature, the trade-off here is requiring ad hoc or subsort polymorphism across signatures.
Condition (2s) forces the subsort relation to be directed, avoiding symmetry of syntactic categories (this is typical when modeling language extensions), while condition (3s) shifts the monotonicity condition of order-sorted signatures to syntactically equal operators in

The associated signature is defined without adding extra symbols to the signature, i.e., Π = Σ_1 ∪ Σ_2, and by deliberately conflating the relations ⋉ and ≤ in ≤:

Definition 9 (SO Associated Signature). The SO associated signature to the SO multi-language signature ⟨S_1, S_2, ≤⟩ is the ordered triple ⟨S, ≤, Π⟩, where

The embedding of ⋉ in ≤ (i.e., ⋉ ⊆ ≤) in the associated signature enables the order-sorted term algebra construction to build multi-language terms automatically, without the need for an explicit operator → that acts as a bridge between syntactic categories. It is easy to see that the term algebra over the associated signature is precisely the symbol-free version of the multi-language described at the beginning of this section.
Unfortunately, multi-language regularity no longer follows from the regularity of the single languages, and vice versa (see Figs. 3 and 4). More formally, Prop. 2 does not hold in this new context:
- Suppose S_1 = { w, s }, S_2 = { w_0, w′, s′ }, ≤_1 and ≤_2 to be the reflexive relations on S_1 and S_2, respectively, plus w_0 ≤_2 w′, and σ ∈ Σ^1_{w,s} ∩ Σ^2_{w′,s′}. If the join relation is defined as w_0 ⋉ w and s ⋉ s′, the resulting associated signature is no longer regular, although S_1 and S_2 are regular (Fig. 3a). In Fig. 3b, it is easy to see that σ ∈ Σ_{w,s} and w_0 ≤ w, but the set of ranks of σ whose arity lies above w_0, namely { (w, s), (w′, s′) }, does not have a least element.
A positive result can be obtained by recalling that regularity is easier to check when ⟨S, ≤⟩ satisfies the descending chain condition (DCC):

Lemma 1 (Regularity over DCC poset [19]). An order-sorted signature Σ over a DCC poset ⟨S, ≤⟩ is regular if and only if whenever σ ∈ Σ_{w1,s1} ∩ Σ_{w2,s2} and there is some w_0 ≤ w_1, w_2, then there is some w ≤ w_1, w_2 such that σ ∈ Σ_{w,s} and w_0 ≤ w.
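On a finite poset (which trivially satisfies the DCC), the condition of Lemma 1 can be checked by brute force. The following is a minimal sketch under stated assumptions, not part of the original framework: the function names, the encoding of ranks as (arity tuple, result sort) pairs, and the sort names in the two sample signatures are all hypothetical. The first sample is our reading of the running example (a ≤ s, n ≤ e, with the join s ≤ n merged into the order); the second is our reading of the non-regular signature of Fig. 3, with primed sorts spelled w2 and s2.

```python
from itertools import product

def order_closure(sorts, pairs):
    """Reflexive-transitive closure of `pairs` over the finite set `sorts`."""
    leq = {(s, s) for s in sorts} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq

def leq_words(leq, u, v):
    """Componentwise extension of the sort order to arities (tuples of sorts)."""
    return len(u) == len(v) and all((a, b) in leq for a, b in zip(u, v))

def is_regular(sorts, pairs, ranks):
    """Check the condition of Lemma 1; `ranks` maps an operator to a set of
    (arity, result sort) pairs. Only meaningful on finite (hence DCC) posets."""
    leq = order_closure(sorts, pairs)
    for rank_set in ranks.values():
        for (w1, _), (w2, _) in product(rank_set, repeat=2):
            if len(w1) != len(w2):
                continue
            for w0 in product(sorted(sorts), repeat=len(w1)):
                if not (leq_words(leq, w0, w1) and leq_words(leq, w0, w2)):
                    continue  # w0 is not a common lower bound
                witness = any(
                    leq_words(leq, w, w1) and leq_words(leq, w, w2)
                    and leq_words(leq, w0, w)
                    for w, _ in rank_set
                )
                if not witness:
                    return False
    return True

# Running example (assumed encoding): + is shared by both languages.
example_sorts = {"n", "e", "a", "s"}
example_order = {("n", "e"), ("a", "s"), ("s", "n")}
example_ranks = {"+": {(("e", "e"), "e"), (("s", "s"), "s")}}

# Counterexample (assumed encoding): w0 lies below two incomparable arities.
counter_sorts = {"w", "s", "w0", "w2", "s2"}
counter_order = {("w0", "w2"), ("w0", "w")}
counter_ranks = {"sigma": {(("w",), "s"), (("w2",), "s2")}}
```

The search is exponential in the arity length, which is acceptable for hand-sized signatures but is not meant as an efficient decision procedure.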
At this point, we can relate the DCC of the poset ⟨S, ≤⟩ in the associated signature of ⟨S_1, S_2, ≤⟩ to the DCC of S

As a result, whenever we know that ⟨S_1, ≤_1⟩ and ⟨S_2, ≤_2⟩ are DCC, we can check the regularity of ⟨S_1, S_2, ≤⟩ by employing Lemma 1, without checking whether ⟨S, ≤⟩ is DCC.
Algebra. In this multi-language construction, the behaviour of boundary functions is no longer bound to syntactical operators as in the previous sections, but is inherited by homomorphisms. A necessary condition to accomplish this aim is the commutativity of interpretation functions with boundary functions:

Definition 7 (SO Multi-Language Algebra). Let ⟨S_1, S_2, ≤⟩ be an SO multi-language signature. A semantic-only (SO) multi-language ⟨S_1, S_2, ≤⟩-algebra is an SP multi-language ⟨S_1, S_2, ≤⟩-algebra A such that (3a) σ ∈ Σ_{w1,s1} ∩ Σ_{w2,s2} and w_1 ⋉ w_2 imply that α_{s1,s2}(⟦σ⟧^{w1,s1}_A(a)) = ⟦σ⟧^{w2,s2}_A(α_{w1,w2}(a)) for each a ∈ A_{w1}.
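The commutativity requirement can be tested on sample values for a concrete shared operator. The sketch below is an illustration under assumptions of our own: it takes + interpreted as string concatenation in one language and as addition of naturals in the other, with string length as the boundary function (the conversion used in the example later in this section); the names `concat`, `add`, and `commutes` are hypothetical. Testing on samples can refute commutativity but does not prove it.

```python
from itertools import product

def concat(x, y):
    """Interpretation of + at rank (s s, s): string concatenation."""
    return x + y

def add(m, n):
    """Interpretation of + at rank (e e, e): addition of naturals."""
    return m + n

def commutes(alpha, samples):
    """Check alpha(x + y) == alpha(x) + alpha(y) on all pairs of samples,
    i.e. converting after the low-rank operator agrees with converting
    the arguments first and applying the high-rank operator."""
    return all(
        alpha(concat(x, y)) == add(alpha(x), alpha(y))
        for x, y in product(samples, repeat=2)
    )

samples = ["", "f", "fo", "abc"]
length_ok = commutes(len, samples)                     # length is additive
shifted_ok = commutes(lambda x: len(x) + 1, samples)   # off-by-one conversion
```

Here `length_ok` holds because the length of a concatenation is the sum of the lengths, whereas the shifted conversion fails already on the pair of empty strings.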
The term algebra is defined similarly to Def. 10, except for boundary functions:

Definition 10 (SO Multi-Language Term Algebra). The semantic-only (SO) multi-language term algebra T over an SO multi-language signature ⟨S_1, S_2, ≤⟩ with boundary functions τ is defined as follows: T Π ; and (3t) s ⋉ s′ implies τ_{s,s′} = id_{T_s}.
Since the subsort relation ≤ includes the join relation ⋉, s ⋉ s′ implies T_{Π,s} = T_s ⊆ T_{s′} = T_{Π,s′}. Thus, the boundary function τ_{s,s′} can be defined as the identity on the smaller domain (note that it trivially satisfies the commutativity condition (3a)).
Proposition 4. Let ⟨S_1, S_2, ≤⟩ be an SO multi-language signature. Then, the SO multi-language term ⟨S_1, S_2, ≤⟩-algebra is a proper SO multi-language algebra.
Theorem 5. Let ⟨S_1, S_2, ≤⟩ be an SO multi-language signature. The class of all SO ⟨S_1, S_2, ≤⟩-algebras and the class of all ⟨S_1, S_2, ≤⟩-homomorphisms form a category denoted by Alg (S_1, S_2, ≤).
We can now prove the initiality of T in its category.
Theorem 6 (Initiality of T). Let ⟨S_1, S_2, ≤⟩ be a regular multi-language signature. Then, the term algebra T is an initial object in the category Alg (S_1, S_2, ≤).
Thanks to the initiality of the term algebra, the definition of term semantics is the same as in Def. 12.
Example. Let A1 and A2 be two order-sorted algebras over the signatures S1 and S2, respectively, as formalized in the example in Sect. 3. Suppose we are interested in a new multi-language A over S1 and S2 such that any string expression t of sort s in S2 denotes the natural number length(⟦t⟧_{A2}) when embedded in S1 terms. For instance, we require that ⟦10 + 5⟧_A = ⟦10 + 5⟧_{A1} = 15 and ⟦f + o⟧_A = ⟦f + o⟧_{A2} = fo, but ⟦(f + o) + (10 + 5)⟧_A = length(fo) + 15 = 17 (parentheses in the last term have only been used to disambiguate the parsing result).
Since the requirements demand the use of string expressions in place of natural numbers, the join relation shall define s ⋉ n and ensure transitivity, hence s ⋉ e, a ⋉ n, and a ⋉ e.
The signatures S1 and S2 are trivially regular. However, by merging S1 and S2, we are causing subsort polymorphism on the symbol +, which is used as the sum operator in A1 and as the concatenation operator in A2, and therefore we have to check the regularity: Let w1 = e e, w2 = s s, s1 = e, and s2 = s. Given + ∈ Σ_{w1,s1} ∩ Σ_{w2,s2} and the lower bound w0 = a a ≤ w1, w2, there exists w = s s such that w ≤ w1, w2 and + ∈ Σ_{w,s}, where s = s ≤ s1, s2 (we have employed Lemma 1 thanks to Prop. 3). Analogously, when w0 = w1, w2 the relative least rank is (s s, s).
The multi-language ⟨S1, S2, ≤⟩-algebra A is now defined by joining the projected algebras A1 and A2 and by defining boundary functions α_{s,s′} for each s ⋉ s′ that convert strings into naturals (their length) when strings are used in place of naturals:

The above definition of boundary functions satisfies both conditions (2a*) and (3a). The initiality theorem yields the semantic homomorphism from T to A. For instance, suppose we want to compute the semantics of the term

The least sorts of t, t1, and t2 are e, s, and e, respectively. The operator + belongs to both Σ_{e e,e} and Σ_{s s,s}, and its least rank w.r.t. the lower bound ls(t1) ls(t2) = s e is (e e, e). By Def. 12 we have

We can observe that, without any syntactical operator, the framework is still able to apply the correct boundary functions to move values across languages.
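The behaviour described in this example can be mimicked by a small evaluator. This is a minimal sketch under assumptions of our own, not the construction of the paper: terms are encoded as nested tuples `("+", left, right)` with Python integers and strings as constants, the least rank of + is selected by inspecting the evaluated operands, and `boundary` and `evaluate` are hypothetical names.

```python
def boundary(value):
    """Boundary function from strings to naturals: a string is converted to
    its length; naturals are left untouched."""
    return len(value) if isinstance(value, str) else value

def evaluate(term):
    """Evaluate a term built from naturals, strings, and the shared operator +.

    If both operands are strings, + is read at rank (s s, s) and denotes
    concatenation. Otherwise it is read at rank (e e, e), and any string
    operand is first moved to the naturals through the boundary function.
    """
    if isinstance(term, (int, str)):
        return term
    op, left, right = term
    if op != "+":
        raise ValueError(f"unknown operator: {op!r}")
    lhs, rhs = evaluate(left), evaluate(right)
    if isinstance(lhs, str) and isinstance(rhs, str):
        return lhs + rhs
    return boundary(lhs) + boundary(rhs)

numeric = evaluate(("+", 10, 5))
textual = evaluate(("+", "f", "o"))
mixed = evaluate(("+", ("+", "f", "o"), ("+", 10, 5)))
```

The three sample terms reproduce the requirements stated at the beginning of the example: the single-language terms keep their original meaning, and the conversion is applied only where a string occurs in place of a natural.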

Reduction to Order-Sorted Algebra
The constructions in the previous sections raise the question of whether a multi-language algebra admits an equivalent order-sorted representation. Conceptually, it would mean that being a multi-language is essentially a matter of perspective: by forgetting how the multi-language has been constructed, what is left is simply an ordinary language. Mathematically speaking, it requires us to exhibit a reduction functor F from the multi-language category to an order-sorted one, such that there is an isomorphism φ between the carrier sets of the multi-language term ⟨S_1, S_2, ≤⟩-algebra T and F(T), and such that ⟦t⟧_A = ⟦φ(t)⟧_{F(A)} for each t ∈ T and for each multi-language ⟨S_1, S_2, ≤⟩-algebra A.
In the case of the Alg(S_1, S_2, ≤) and Alg*(S_1, S_2, ≤) categories, the construction of F and F* is very simple, and we illustrate it only for the plain multi-language algebras of Sect. 3: Let A be a multi-language ⟨S_1, S_2, ≤⟩-algebra. Then, we define the order-sorted ⟨S, ≤, Π⟩-algebra A_Π (called the associated order-sorted algebra of A) by setting A for each σ ∈ Σ^i_{w,s} and i = 1, 2; and (3π) ⟦→_{s,s′}⟧^{s,s′}_{A_Π} = α_{s,s′} for each s ⋉ s′. If A and B are multi-language ⟨S_1, S_2, ≤⟩-algebras, and h is a multi-language ⟨S_1, S_2, ≤⟩-homomorphism from A to B, the functor F maps A and B to their associated order-sorted algebras A_Π and B_Π and the homomorphism h to itself. Since A_Π = A, the isomorphism φ is the identity function.
If A is an SP multi-language ⟨S_1, S_2, ≤⟩-algebra, the construction of the reduction functor F* is similar to the definition of F. The only difference is that the equation in condition (3π) turns into (3π*) ⟦→⟧^{s,s′}_{A_Π} = α_{s,s′} for each s ⋉ s′.

Finally, the definition of F starting from the category Alg (S_1, S_2, ≤) of SO multi-language algebras is slightly different. We define F as a map from the multi-language category Alg (S_1, S_2, ≤) to the order-sorted category OSAlg(S, ≤, Σ). The reduction F(A) of a multi-language algebra A has the same carrier sets and the same interpretation functions as A, and the reduction F(h) of a homomorphism h : A → B is h itself, now read as an order-sorted homomorphism from F(A) to F(B). Intuitively, F(A) is obtained simply by forgetting the boundary functions, while F(h) inherits their semantics from h. Again, the isomorphism φ is the identity.

Theorem 8. F : Alg (S_1, S_2, ≤) → OSAlg(S, ≤, Σ) is a functor for every SO multi-language signature ⟨S_1, S_2, ≤⟩. Moreover, ⟦t⟧_A = ⟦t⟧_{F(A)} for each t ∈ T and for each SO multi-language ⟨S_1, S_2, ≤⟩-algebra A.
Unfortunately, even though T is an initial algebra in its category, F(T) is not: given two multi-language algebras A and A′ that differ only in the boundary functions (we denote by α and α′ the families of boundary functions of A and A′, respectively), F maps both to the same order-sorted algebra F(A) = F(A′). Thus, if h : T → A and h′ : T → A′ are the unique homomorphisms going from T to A and A′, the functor F maps them to two different order-sorted homomorphisms F(h) and F(h′), both going from F(T) to F(A), hence losing the uniqueness property. However, this does not pose a problem once a family of boundary functions is fixed:

Theorem 9. Let T be the multi-language term ⟨S_1, S_2, ≤⟩-algebra and A be an order-sorted ⟨S, ≤, Σ⟩-algebra. Given a family of boundary functions α = { α_{s,s′} | s ⋉ s′ } satisfying condition (3a), there exists a unique order-sorted ⟨S, ≤, Σ⟩-homomorphism h^α : T → A commuting with α, i.e., if s ⋉ s′, then h^α_{s′}(t) = α_{s,s′}(h^α_s(t)) for each t ∈ T_s.
The reduction theorems presented in this section have a strong consequence: all known results for order-sorted algebras can be lifted to the multi-language world.

An Example of Multi-Language Construction
The first theoretical paper addressing the problem of multi-language construction is [30]. The authors study the so-called natural embedding (a more realistic improvement of the lump embedding [30,7,40,34]), in which Scheme terms can be converted to equivalent ML terms, and vice versa. The novelty of their approach lies in how they define boundaries in order to translate values from Scheme to ML. Indeed, the latter does not admit an equivalent representation for each Scheme function. Their solution is to "represent a Scheme procedure in ML at type τ 1 → τ 2 by a new procedure that takes an argument of type τ 1 , converts it to a Scheme equivalent, runs the original Scheme procedure on that value, and then converts the result back to ML at type τ 2 ".
Our goal here is not to give a complete presentation of the ML and Scheme languages in the form of order-sorted algebras, but rather to show how we can model the natural embedding construction in our framework. To this end, we provide a sketch of the formalization of Scheme and ML syntax and semantics, and we refer the reader to [30] for all the language details.
To provide the semantics of Scheme, we follow the same approach as Goguen et al. [15], where the denotational semantics of the simple applicative language (SAL) introduced by Reynolds [42] is given by means of an algebra, exploiting the initiality theorem. Such a language is a "syntactically sugared" version of the untyped lambda calculus with the fixpoint operator, which in turn is very similar to Scheme.
Let X = { x_1, x_2, . . . } be a set of variables and N be the lattice of naturals with ⊤ and ⊥ adjoined. From [46], there exists a complete lattice V that satisfies the isomorphism φ : V ≅ N + (V → V), where + is the disjoint union with minimum and maximum elements identified, and V → V is the complete lattice of Scott-continuous functions from V to V. Given ξ ∈ { N, V → V }, we define the injections j_ξ : ξ → N + (V → V) and i_ξ = φ^{-1} ∘ j_ξ, and the projection

The set of all Scheme environments is the lattice of all total functions P = X → V with componentwise ordering ρ ⊑ ρ′ if and only if ρ(x) ⊑ ρ′(x) in V for all x ∈ X. Furthermore, we define auxiliary functions (see [15] for a more detailed explanation) in order to provide the semantics of the language (in the following, x ∈ X and n ∈ N): is defined by ((abs(f))(x))(y) = f(x, y) (abstraction); and choice : V^3 → V (conditional function), add : V^2 → V (addition), and sub : The definition of sub is analogous to that of add, with the only difference that, in the second case,

The semantics of the language is obtained by defining an algebra H over a signature H; the initiality then yields the unique homomorphism from the term algebra. A Scheme term denotes a continuous function in the semantic domain H_e = P → V. The interpretation functions of the operators are defined by the following equations:

For the sake of simplicity, we made a minor change to the language presented in [30]. That language has an extra operator wrong to print an error message in case of an illegal operation, due to the lack of a type system. For instance, the sum of two functions produces the error wrong "non-number". To avoid adding cases almost everywhere in the definition of the interpretation functions, we let ill-typed terms denote the value ⊥ without an explicit encoding of the error message. Furthermore, we denote by ‚ the function application.
The ML-like language defined in [30] is an extended version of the simply-typed lambda calculus. As before, we provide its semantics by defining an algebra M over an order-sorted signature M = ⟨S_2, ≤_2, Σ_2⟩. Let I be a set of base types and K an I-sorted set of base values K = { K_ι | ι ∈ I }. We inductively define the set of simple types T:
- If ι is a base type, then it is a simple type;
- If τ, τ′ are simple types, then (τ) → (τ′) is a simple type (henceforth we omit the parentheses).
We abuse notation and extend K to the T-sorted set of simple values

The set of all ML environments is defined as the set of all total functions ∆ = Y → K, where Y = { y_1, y_2, . . . } is a set of variables disjoint from X (this assumption comes from [30]) and K = ⋃_{τ ∈ T} K_τ. We instantiate I = { n } and K_n = N. The poset ⟨S_2, ≤_2⟩ carries all the simple types (i.e., T ⊆ S_2) and the sort t; ≤_2 is the reflexive relation on S_2 plus τ ≤_2 t for each τ ∈ T. An ML term of type τ denotes a total function in M_τ = ∆ → K_τ, and we define M_t = ∆ → K. Due to the Turing-incompleteness of such a language, we do not need all the mathematical machinery of [15,46] to formalize its semantics.
Until now, we have just formalized the single languages. The multi-language A that combines Scheme and ML is obtained by requiring e ⋉ τ and τ ⋉ e in order to use ML terms in place of Scheme terms and vice versa. However, in the simplest version of the natural embedding, "the system has stuck states, since a boundary might receive a value of an inappropriate shape" [30]. The authors of [30] restore type soundness by first employing dynamic checks, and then by decoupling error handling from value conversion through the use of higher-order contracts [12]. We limit ourselves here to describing the first version; the subsequent refinements can be embodied by further complicating the semantics of the boundary functions (we have not imposed any constraints on them).
Since we need a value representing the notion of stuck state in ML, we have to extend the algebra M. This is particularly easy by exploiting the underlying framework: We make M_⊥ into an order-sorted M-algebra by defining , and the T-sorted injection ϕ from M_τ to M_{⊥,τ} such that ϕ(t̂) = t̂. Now, M_⊥ becomes an algebra by letting ϕ be an order-sorted M-homomorphism (this in turn forces ⟦−⟧^{w,s}_{M_⊥} = ⟦−⟧^{w,s}_M) and by letting the interpretation functions denote the value ⊥ in the remaining, not yet defined cases (namely, they compute the value ⊥ whenever one of their arguments is ⊥).
The boundary function α_{e,τ}(ê) moves the Scheme value ê : P → V in M_τ :

Vice versa, α_{τ,e}(t̂) moves values from ML to Scheme. Its definition is analogous to the previous case: α n,e (n) = val n where n = δ → n, and H (α τ ,e ( t(⊥[α e,τ (v)/y])))

These definitions adhere to the conversion approach of the natural embedding in [30]: If ê is the value denoted by a natural number in Scheme, then it is converted (aside from cases deriving from ill-typed terms) by α^N_{e,n} to the corresponding constant function denoting the same natural value in ML. Otherwise, if ê is the value denoted by a Scheme function, then it is mapped by α^{V→V}_{e,τ→τ′} to the ML function with variable x at type τ → τ′ that converts its argument of type τ to the Scheme equivalent through α_{τ,e}. Then it runs the original procedure ê on it and converts the result back through α_{e,τ′}.
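The type-directed wrapping performed by these boundary functions can be sketched in a few lines. The code below is an illustration under strong simplifying assumptions and is not the definition of α given in the paper: environments are ignored, Scheme and ML values are both modelled as Python integers or one-argument callables, simple types are encoded as the string `"n"` or a tuple `("->", dom, cod)`, and the stuck value ⊥ is modelled by `None`. The names `to_ml`, `to_scheme`, and `BOTTOM` are hypothetical.

```python
BOTTOM = None  # stands for the value ⊥ denoting a stuck state

def to_ml(tau, value):
    """Move a Scheme value to ML at simple type `tau`."""
    if value is BOTTOM:
        return BOTTOM
    if tau == "n":
        # A dynamic check: only naturals may cross at the base type.
        return value if isinstance(value, int) else BOTTOM
    _, dom, cod = tau
    if not callable(value):
        return BOTTOM

    def wrapped(x):
        # Convert the ML argument to Scheme, run the original procedure,
        # and convert the result back to ML at the codomain type.
        arg = to_scheme(dom, x)
        return BOTTOM if arg is BOTTOM else to_ml(cod, value(arg))

    return wrapped

def to_scheme(tau, value):
    """Move an ML value of simple type `tau` to Scheme (symmetric case)."""
    if value is BOTTOM:
        return BOTTOM
    if tau == "n":
        return value if isinstance(value, int) else BOTTOM
    _, dom, cod = tau
    if not callable(value):
        return BOTTOM

    def wrapped(x):
        arg = to_ml(dom, x)
        return BOTTOM if arg is BOTTOM else to_scheme(cod, value(arg))

    return wrapped

NAT_TO_NAT = ("->", "n", "n")
succ = lambda x: x + 1                       # a Scheme procedure
twice = lambda f: lambda x: f(f(x))          # a higher-order Scheme procedure
ml_succ = to_ml(NAT_TO_NAT, succ)
ml_twice = to_ml(("->", NAT_TO_NAT, NAT_TO_NAT), twice)
```

Note how a procedure is never inspected when it crosses the boundary: the checks are deferred to the moment its wrapper is applied, which is why a Scheme procedure returning a value of the wrong shape yields ⊥ only at call time.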
Since the given boundary functions are subsort polymorphic, we can improve the construction and handle all the value conversions with a single polymorphic operator as explained in Sect. 4.1.

Concluding Remarks
In this paper, we have addressed the problem of providing a formal semantics to the combination of programming languages, the so-called multi-languages. We have introduced a new algebraic framework for modeling this new paradigm, and we have constructively shown how to attain a multi-language specification by stipulating only (1) how the syntactic categories of the single languages have to be combined and (2) how the values may flow from one language to the other. We have proved the suitability of the framework to unambiguously yield the algebraic semantics of each multi-language term, while simultaneously preserving the semantics of the single languages. We have also proved that combining languages is a closed operation, i.e., that every multi-language admits an equivalent order-sorted representation. In particular, we have focused our study on the semantic properties of boundary functions in order to provide three different notions of multi-language designed to suit both general and specific cases.
To the best of our knowledge, this is the first attempt to provide a formal semantics of a multi-language independently from the combined languages.
Related Works. Cross-language interoperability is a well-researched area both from theoretical and practical points of view. The work most closely related to our approach is undoubtedly [30], which provides operational semantics to a combined language obtained by embedding a Scheme-like language into an ML-like language. Such an outcome is achieved by introducing boundaries, syntactic constructs that model the flow of values from one language to the other. Our boundary functions draw heavily from their work. Nonetheless, we shift them to a semantic level, in order to obtain several variants of multi-language constructions. [40,7,21,53,36] take a similar line and combine typed and untyped languages (Lua and ML [40], Java and PLT Scheme [21], or Assembly and a typed functional language [36]), focusing on typing issues and value-exchange techniques. Instead of focusing on a particular problem, we adopt a rather general framework to model languages. This choice abstracts away many low-level details, allowing us to reason on semantic concerns in more general terms, without having to fix any particular pair of languages.
A lot of work has been done on multi-language runtime mechanisms: [20] provides a type system for a fragment of Microsoft Intermediate Language (IL) used by the .NET framework, which allows programmers to write components in several languages (C#, Visual Basic, VBScript, . . . ) that are then translated to IL. [22] proposes a virtual machine that can execute the composition of dynamically typed programming languages (Ruby and JavaScript) and statically typed ones (C). [5,4] describe a multi-language runtime mechanism achieved by combining single-language interpreters of (different versions of) Python and Prolog.
Future Works. From our perspective, the research presented in this paper opens up three directions. Firstly, future works should aim to provide an operational semantics to the formalization of multi-languages. Rewriting logic seems the most reasonable approach to unifying the denotational world, presented in this paper, with the operational one [31]. This line of research is particularly useful in order to move towards an implementation of an automatic tool able to combine languages such that the resulting multi-language guarantees the results proved in the paper.
Secondly, future research should use the multi-language model in order to study the problem of analyzing multi-language programs. In particular, we aim at investigating how it is possible to obtain analyses of multi-language programs by merging already existing analyses of the single combined languages.
Finally, further studies should investigate the problem of compiling multi-languages. Current compilers are closed tools, non-parametric on language constructs (for instance, we cannot compile a single if-then-else term of a standard language like C or Java unless it is plugged into a valid program). Several works on typing [20,1,26], compiling [2,37], and running [50,23] multi-language programs already exist, but without providing a formal notion of multi-language. It would be beneficial to study how their approaches can be applied to the formal framework developed in this paper.

e ::= n | e + e    where n ∈ N
(a) The BNF grammar of L1.

s ::= - | a | s + s    where a ∈ A
(b) The BNF grammar of L2.

Figure 1: The BNF grammars of the running example languages.

Figure 2: The two formal semantics of the running example languages.
and associativity follows easily by the definition of •.
Proposition 1. Multi-language homomorphisms are closed under composition.

Figure 3: A non-regular multi-language signature comprising two regular order-sorted signatures.

Figure 4: A regular multi-language signature comprising a non-regular order-sorted signature.