The construction in Sect. 3 does not impose any constraint on boundary functions, thus giving language designers a great deal of flexibility. For instance, they can provide boundary functions that act differently with respect to the intra-language subsort relation \(\preccurlyeq \): in the previous example, it would have been possible to define
to employ different value conversion specifications for terms in
, based on whether they are used as natural numbers (
) or as expressions (
). However, when this amount of flexibility is not needed, we can refine the previous construction by reducing the amount of syntax introduced by the associated signature. In this section we examine
-
the case where boundary functions satisfy the monotonicity conditions of order-sorted algebra operators (Sect. 4.1); and
-
the case where boundary functions commute with the semantics of operator symbols (Sect. 4.2).
In both cases, we prove that the introduced refinements do not affect the initiality of the term algebra, thereby providing unambiguous semantics to the multi-language.
4.1 Subsort Polymorphic Boundary Functions
In Sect. 3, the join relation constraints \(s \ltimes s'\) are turned into syntactic operators \(\hookrightarrow _{s, s'}\) in the associated signature
. We now show how to handle all the syntactic overhead introduced by \(\ltimes \) with a single polymorphic operator \(\hookrightarrow \) whenever the boundary functions satisfy the monotonicity conditions of order-sorted algebras [19]. Such conditions require a subsort relation \(s_1 \le s_2\) between the sorts of a polymorphic operator \(\sigma \in \varSigma _{w_1, s_1} \cap \varSigma _{w_2, s_2}\), assuming that \(w_1 \le w_2\). In our case, \(\sigma = \hookrightarrow \), and thus we extend Definition 6 with the following ad hoc constraint (2s\(^{*}\)):
Definition 6\(^{*}\) (SP Multi-Language Signature). A subsort polymorphic (SP) multi-language signature is a multi-language signature
such that
Furthermore, order-sorted algebras demand consistency of the interpretation functions of a subsort polymorphic operator on the smaller domain, which results in the following condition (2a\(^{*}\)) on boundary functions (that extends Definition 7):
Definition 7\(^{*}\) (SP Multi-Language Algebra). Let
be a SP multi-language signature. A subsort polymorphic (SP) multi-language
-algebra is a multi-language
-algebra \(\mathcal {A}\) such that
-
(2a\(^{*}\)) \(s_1 \ltimes s'_1\), \(s_2 \ltimes s'_2\), and \(s_1 \preccurlyeq s_2\) imply that \(\alpha _{s_1,s'_1}(a) = \alpha _{s_2,s'_2}(a)\) for each \(a \in A_{s_1}\).
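Condition (2a\(^{*}\)) can be tested mechanically on finite carriers. The following is a minimal sketch, not part of the formal development: the encoding of sorts as strings, of relations as sets of pairs, and the names `is_subsort_polymorphic`, `carriers`, and `alpha` are all assumptions made for illustration.

```python
def is_subsort_polymorphic(carriers, subsort, join, alpha):
    """Check condition (2a*) on finite carriers (illustrative encoding).

    carriers: dict mapping a sort to the finite set A_s of its values
    subsort:  set of pairs (s1, s2) meaning s1 ≼ s2
    join:     set of pairs (s, s') meaning s ⋉ s'
    alpha:    dict mapping (s, s') to the boundary function alpha_{s,s'}
    """
    for (s1, t1) in join:
        for (s2, t2) in join:
            if (s1, s2) not in subsort:
                continue
            # Both boundary functions must agree on the smaller domain A_{s1}.
            for a in carriers[s1]:
                if alpha[(s1, t1)](a) != alpha[(s2, t2)](a):
                    return False
    return True


# Hypothetical sorts: digits are a subsort of naturals, both join into strings.
carriers = {"digit": set(range(10)), "nat": set(range(100))}
subsort = {("digit", "digit"), ("nat", "nat"), ("digit", "nat")}
join = {("digit", "str"), ("nat", "str")}

consistent = {("digit", "str"): str, ("nat", "str"): str}
inconsistent = {("digit", "str"): lambda n: "d" + str(n), ("nat", "str"): str}

print(is_subsort_polymorphic(carriers, subsort, join, consistent))    # True
print(is_subsort_polymorphic(carriers, subsort, join, inconsistent))  # False
```

In the second call the two conversions disagree on every digit, so a single polymorphic operator \(\hookrightarrow \) could not be interpreted unambiguously.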
The notion of homomorphism in this new context does not change (a homomorphism between two
algebras is still an S-sorted function decomposable into two order-sorted homomorphisms that commute with boundaries), whereas the associated signature to an
multi-language signature differs from Definition 9 only in having a unique polymorphic operator \(\hookrightarrow \) instead of a family of parametrized symbols
.
Definition 9\(^{*}\) (SP Associated Signature). The subsort polymorphic (SP) associated signature to the SP multi-language signature
is the ordered triple
, where \(S = S_1 \cup S_2\), \(\mathord {\preccurlyeq } = \mathord {\le _1} \cup \mathord {\le _2}\), and
Since the associated signature is the basis for the term algebra, we need to modify condition (3t) in Definition 10:
Definition 10\(^{*}\) (SP Multi-Language Term Algebra). The subsort polymorphic (SP) multi-language term algebra \(\mathcal {T}\) over a SP multi-language signature
with boundary functions \(\tau \) is defined as follows:
-
(1t) \(s \in S\) implies \(T_s = T_{\varPi ,s}\);
-
(2t) \(\sigma \in \varSigma ^i_{w,s}\) implies
for \(i = 1,2\); and
-
(3t\(^{*}\)) \(s \ltimes s'\) implies
.
Signature regularity is still defined as in Definition 11 and Proposition 2 still holds for the extended version developed in this section. As a result, the
multi-language term
-algebra \(\mathcal {T}\) is still initial in the category
of
multi-language algebras over the
multi-language signature
.
Theorem 3
Let
be a
multi-language signature. The class of all
-algebras and the class of all
-homomorphisms form a category denoted by
.
Theorem 4
(Initiality of \(\mathcal {T}\)). The
multi-language term algebra \(\mathcal {T}\) over a regular
multi-language signature
is initial in the category
.
The semantics of a term t induced by a
multi-language algebra \(\mathcal {A}\) is defined in the same way as in Definition 12, thanks to the initiality result:
. The main advantage of dealing with
multi-language terms is that the framework is able to determine the correct interpretation function of the operator \(\hookrightarrow \), making the subscript notation developed in the previous section superfluous. This also means that programmers are exempted from explicitly annotating multi-language programs with sorts, a non-trivial task in the general case that could introduce type cast bugs.
Example. The boundary functions of the previous example are subsort polymorphic:
for each character
, and
by definition. Thus, the equivalent of the term t (see Eq. 1) in the
term algebra is
or, according to the previous notation,
and denoting the same natural number 765.
4.2 Semantic-Only Boundary Functions
In the previous section, we have shown how to handle the flow of values across different languages with a single polymorphic operator. Now, we present a new multi-language construction in which no extra operators are added to the associated signature, and single-language operators do not have to be annotated with subscripts indicating their original language. Thus, the resulting multi-language syntax comprises only symbols in \(\varSigma _1 \cup \varSigma _2\). Such a construction is achieved by:
-
Imposing commutativity conditions on algebras, making homomorphisms transparently inherit the semantics of boundary functions. The framework is therefore able to apply the correct value conversion function whenever necessary, without the need for an explicit syntactic operator \(\hookrightarrow \).
-
Requiring a new form of cross-language polymorphism able to cope with shared operators among languages. The initiality of term algebras is preserved by modifying the notion of signature in a way that every operator admits a least sort.
The variant of the framework presented in this section is particularly useful when designing the extension of a language in a modular fashion. For instance, if the signature
models the syntax of a simple functional language (for an example, see [15, p. 77]) without an explicit encoding for string values, and
is a language for manipulating strings (similar to the language \(L_2\) of the running example of this paper), we can exploit the construction presented below in order to embed
into
.
Signature. The main issue that can arise when defining the multi-language signature is the presence of shared operators in \(\varSigma _1\) and \(\varSigma _2\). Contrary to the previous cases, where such ambiguity is resolved by adding subscripts in the associated signature, the trade-off here is requiring ad hoc or subsort polymorphism across signatures.
Definition 6\(^{\star }\) (SO Multi-Language Signature). A semantic-only (SO) multi-language signature is a multi-language signature
such that
-
(2s\(^{\star }\))
is a poset; and
-
(3s\(^{\star }\)) \(\sigma \in \varSigma ^i_{w_1, s_1} \cap \varSigma ^j_{w_2, s_2}\) and \(w_1 \ltimes w_2\) imply \(s_1 \ltimes s_2\) with \(i, j = 1, 2\) and \(i \ne j\).
Condition (2s\(^{\star }\)) forces the subsort relation to be directed, avoiding symmetry between syntactic categories (this is typical when modeling language extensions), while condition (3s\(^{\star }\)) shifts the monotonicity condition of order-sorted signatures to syntactically equal operators in \(\varSigma _1 \cap \varSigma _2\).
The associated signature is defined without adding extra symbols in the signature, i.e., \(\varPi = \varSigma _1 \cup \varSigma _2\), and deliberately confounding the relations \(\ltimes \) and \(\preccurlyeq \) in \(\le \):
Definition 9\(^{\star }\) (SO Associated Signature). The SO associated signature to the SO multi-language signature
is the ordered triple
, where \(S = S_1 \cup S_2\), \(\mathord {\le } = \mathord {\preccurlyeq } \cup \mathord {\ltimes }\), and \(\varPi = \varSigma _1 \cup \varSigma _2\).
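As an illustration of Definition 9\(^{\star }\), the sketch below builds \(\le \) as the union of \(\le _1\), \(\le _2\), and \(\ltimes \) for finite sort sets and tests whether the result is a partial order. It assumes that the poset requirement of condition (2s\(^{\star }\)) concerns exactly this relation; the sort names and the helpers `associated_order` and `is_partial_order` are hypothetical.

```python
def associated_order(subsort1, subsort2, join):
    """Return ≤ = ≼ ∪ ⋉, where ≼ = ≤_1 ∪ ≤_2 (relations as sets of pairs)."""
    return set(subsort1) | set(subsort2) | set(join)


def is_partial_order(elements, rel):
    """Check reflexivity, antisymmetry, and transitivity of rel on elements."""
    reflexive = all((x, x) in rel for x in elements)
    antisymmetric = all(x == y or (y, x) not in rel for (x, y) in rel)
    transitive = all(
        (x, w) in rel for (x, y) in rel for (z, w) in rel if y == z
    )
    return reflexive and antisymmetric and transitive


# Hypothetical one-sort languages: naturals and strings.
sorts = {"nat", "str"}
leq1 = {("nat", "nat")}
leq2 = {("str", "str")}

directed = associated_order(leq1, leq2, {("str", "nat")})
symmetric = associated_order(leq1, leq2, {("str", "nat"), ("nat", "str")})

print(is_partial_order(sorts, directed))   # True
print(is_partial_order(sorts, symmetric))  # False: antisymmetry fails
```

The second relation joins the two sorts in both directions, which is the kind of symmetry between syntactic categories that a poset requirement rules out.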
The embedding of \(\ltimes \) in \(\le \) (i.e., \(\mathord {\ltimes } \subseteq \mathord {\le }\)) in the associated signature enables the order-sorted term algebra construction to automatically build multi-language terms, without the need for an explicit operator \(\hookrightarrow \) that acts as a bridge between syntactic categories. It is easy to see that the term algebra over the associated signature is precisely the symbol-free version of the multi-language construction described at the beginning.
Unfortunately, multi-language regularity no longer follows from the regularity of the single languages, nor vice versa (see Figs. 3 and 4). More formally, Proposition 2 does not hold in this new context:
-
Suppose
,
, \(\le _1\) and \(\le _2\) to be the reflexive relations on \(S_1\) and \(S_2\), respectively, plus
, and
. If the join relation \(\ltimes \) is defined as
and
, the resulting associated signature is no longer regular, although
and
are regular (Fig. 3a). In Fig. 3b, it is easy to see that
and
but the set
does not have a least element w.r.t.
.
-
On the other hand, let
,
, \(\le _1\) and \(\le _2\) be the reflexive relations on \(S_1\) and \(S_2\), respectively, plus
and
, and
. If the join relation \(\ltimes \) is defined as
, and
, the resulting associated signature is regular (Fig. 4a), although
is not: given
and
, the set
has least element
w.r.t.
(Fig. 4b).
A positive result can be obtained by recalling that regularity is easier to check when
satisfies the descending chain condition (DCC):
Lemma 1
(Regularity over DCC poset [19]). An order-sorted signature \(\varSigma \) over a DCC poset
is regular if and only if whenever \(\sigma \in \varSigma _{w_1, s_1} \cap \varSigma _{w_2, s_2}\) and there is some \(w_0 \le w_1, w_2\), then there is some \(w \le w_1, w_2\) such that \(\sigma \in \varSigma _{w, s}\) and \(w_0 \le w\).
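For finite signatures, the condition of Lemma 1 can be checked by brute force. The sketch below is only an illustration under stated assumptions: sorts are strings, the order `leq` is given as a reflexive and transitive set of pairs, ranks are pairs `(w, s)` with `w` a tuple of sorts, and the names `leq_words` and `is_regular_dcc` are hypothetical.

```python
from itertools import product


def leq_words(w, v, leq):
    """Pointwise extension of the sort order to equal-length strings of sorts."""
    return len(w) == len(v) and all((a, b) in leq for a, b in zip(w, v))


def is_regular_dcc(sorts, leq, ranks):
    """Check the condition of Lemma 1 for a finite signature.

    ranks: dict mapping an operator symbol to a set of ranks (w, s).
    """
    for rs in ranks.values():
        for (w1, _), (w2, _) in product(rs, repeat=2):
            if len(w1) != len(w2):
                continue
            # Enumerate every candidate lower bound w0 of w1 and w2.
            for w0 in product(sorts, repeat=len(w1)):
                if not (leq_words(w0, w1, leq) and leq_words(w0, w2, leq)):
                    continue
                # Some rank w of the operator must sit between w0 and w1, w2.
                if not any(
                    leq_words(w0, w, leq)
                    and leq_words(w, w1, leq)
                    and leq_words(w, w2, leq)
                    for (w, _) in rs
                ):
                    return False
    return True


# Hypothetical overloaded "+" on naturals and strings, with str ≤ nat.
sorts = ["nat", "str"]
leq = {("nat", "nat"), ("str", "str"), ("str", "nat")}
plus = {"+": {(("nat", "nat"), "nat"), (("str", "str"), "str")}}
print(is_regular_dcc(sorts, leq, plus))  # True

# Hypothetical failure: c lies below both a and b, but f has no rank on c.
sorts2 = ["a", "b", "c"]
leq2 = {("a", "a"), ("b", "b"), ("c", "c"), ("c", "a"), ("c", "b")}
f = {"f": {(("a",), "a"), (("b",), "b")}}
print(is_regular_dcc(sorts2, leq2, f))  # False
```

The enumeration is exponential in the arity of the operator, which is acceptable for the small signatures used in examples but not intended as an efficient decision procedure.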
At this point, we can relate the
of the poset
in the associated signature of
to the
of
and
:
Proposition 3
Let
be the associated signature of
. Then,
is
if and only if
and
are
.
As a result, whenever we know that
and
are
, we can check the regularity of
by employing Lemma 1 without checking whether
is
.
Algebra. In this multi-language construction, the behaviour of boundary functions is no longer bound to syntactic operators as in the previous sections, but is inherited by homomorphisms. A necessary condition to achieve this is the commutativity of interpretation functions with boundary functions:
Definition 7\(^{\star }\) (SO Multi-Language Algebra). Let
be an
multi-language signature. A semantic-only (SO) multi-language
-algebra is an SP multi-language
-algebra \(\mathcal {A}\) such that
Note that \(\sigma \in \varSigma _{w_1, s_1} \cap \varSigma _{w_2, s_2}\) and \(w_1 \ltimes w_2\) imply \(s_1 \ltimes s_2\) by condition (3s\(^{\star }\)). The notion of homomorphism remains unchanged from Definition 8 (to understand how homomorphisms inherit the behaviour of boundary functions, see the proof of Theorem 6).
The term algebra is defined similarly to Definition 10, except for boundary functions:
Definition 10\(^{\star }\) (SO Multi-Language Term Algebra). The semantic-only (SO) multi-language term algebra \(\mathcal {T}\) over an SO multi-language signature
with boundary functions \(\tau \) is defined as follows:
-
(1t\(^{\star }\)) \(s \in S\) implies \(T_s = T_{\varPi , s}\);
-
(2t\(^{\star }\)) \(\sigma \in \varSigma _{w, s}\) implies
; and
-
(3t\(^{\star }\)) \(s \ltimes s'\) implies
.
Since the subsort relation \(\le \) includes the join relation \(\ltimes \), \(s \ltimes s'\) implies \(T_{\varPi ,s} = T_s \subseteq T_{s'} = T_{\varPi ,s'}\). Thus, the boundary function \(\tau _{s,s'}\) can be defined as the identity on the smaller domain (note that it trivially satisfies the commutativity condition (3a\(^{\star }\))).
Proposition 4
Let
be an
multi-language signature. Then, the
multi-language term
-algebra is a proper
multi-language algebra.
Theorem 5
Let
be a
multi-language signature. The class of all
-algebras and the class of all
-homomorphisms form a category denoted by
.
We can now prove the initiality of \(\mathcal {T}\) in its category.
Theorem 6
(Initiality of \(\mathcal {T}\)). Let
be a regular multi-language signature. Then, the term algebra \(\mathcal {T}\) is an initial object in the category
.
Thanks to the initiality of the term algebra, the definition of term semantics is the same as in Definition 12.
Example. Let \(\mathcal {A}_1\) and \(\mathcal {A}_2\) be two order-sorted algebras over the signatures
and
, respectively, as formalized in the example in Sect. 3. Suppose we are interested in a new multi-language \(\mathcal {A}\) over
and
such that any string expression t of sort
in
can denote the natural number
when embedded in
terms. For instance, we require that
and
, but
(parentheses in the last term have only been used to disambiguate the parsing result).
Since the requirements demand that string expressions be usable in place of natural numbers, the join relation \(\ltimes \) shall define
and ensure transitivity, hence
,
, and
.
The signatures
and
are trivially regular. However, by merging
and
, we introduce subsort polymorphism on the symbol \(\texttt {+}\), which is used as the sum operator in \(\mathcal {A}_1\) and as the concatenation operator in \(\mathcal {A}_2\), and therefore we have to check regularity. Let
, and
. Given \(\texttt {+} \in \varSigma _{w_1,s_1} \cap \varSigma _{w_2,s_2}\) and the lower bound
, then there exists
such that \(w \le w_1, w_2\) and \(\texttt {+} \in \varSigma _{w,s}\), where
(we have employed Lemma 1 thanks to Proposition 3). Analogously, when \(w_0 = w_1, w_2\) the relative least rank is
.
The multi-language
-algebra \(\mathcal {A}\) is now defined by joining the projected algebras \(\mathcal {A}_1\) and \(\mathcal {A}_2\) and by defining boundary functions \(\alpha _{s, s'}\) for each \(s \ltimes s'\) that convert strings into naturals (their length) whenever strings are used in place of naturals:
The above definition of boundary functions satisfies both conditions (2a\(^{*}\)) and (3a\(^{\star }\)).
The initiality theorem yields the semantic homomorphism from \(\mathcal {T}\) to \(\mathcal {A}\). For instance, suppose we want to compute the semantics of the term
The least sorts of t, \(t_1\), and \(t_2\) are
, and
, respectively. The operator \(\texttt {+}\) belongs to both
and
, and its least rank w.r.t. the lower bound
is
. By Definition 12 we have
At this point, since
and
, then the least rank of the root symbol + of \(t_1\) w.r.t. the lower bound
is
, thus
Similarly,
and
. Then, the least rank of the root symbol + of \(t_2\) w.r.t. the lower bound
is
and therefore we have
Finally,
as desired.
We can observe that without any syntactical operator the framework is still able to apply the correct boundary functions to move values across languages.
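The evaluation strategy of this example can be mimicked by a small interpreter. The sketch below is a deliberate simplification and not the construction itself: it assumes a single sort per language (named `nat` and `str` here), literals only, and the overloaded symbol `+`; the classes `Lit` and `Plus` and the functions `least_sort`, `boundary`, and `evaluate` are hypothetical names. The rank of `+` is chosen from the least sorts of its arguments, and the boundary function (string length) is applied only where a string is used in place of a natural.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: Union[int, str]


@dataclass(frozen=True)
class Plus:
    left: "Term"
    right: "Term"


Term = Union[Lit, Plus]


def least_sort(t):
    """Least sort of a term: 'str' only if both arguments of + are strings."""
    if isinstance(t, Lit):
        return "nat" if isinstance(t.value, int) else "str"
    if least_sort(t.left) == "str" and least_sort(t.right) == "str":
        return "str"
    return "nat"


def boundary(value):
    """Boundary function for str ⋉ nat: a string denotes its length."""
    return len(value)


def evaluate(t):
    """Evaluate a term, converting values only when crossing str ⋉ nat."""
    if isinstance(t, Lit):
        return t.value
    target = least_sort(t)  # selects the least rank of the root symbol +
    args = []
    for sub in (t.left, t.right):
        v = evaluate(sub)
        if least_sort(sub) == "str" and target == "nat":
            v = boundary(v)
        args.append(v)
    # Python's + is concatenation on strings and addition on integers.
    return args[0] + args[1]


print(evaluate(Plus(Lit("ab"), Lit("c"))))                # abc
print(evaluate(Plus(Plus(Lit("ab"), Lit("c")), Lit(2))))  # 5
```

Because the length of a concatenation equals the sum of the lengths, converting `"ab"` and `"c"` before adding them yields the same natural as converting `"abc"` afterwards, which is the kind of commutativity that lets the conversion stay implicit.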