Visual Low-Code Language for Orchestrating Large-Scale Distributed Computing

Distributed, large-scale computing is typically performed using textual general-purpose programming languages. This requires significant programming skills associated with the parallelisation and distribution of computations. In this paper, we present a visual (graphical) programming language called the Computation Application Language (CAL) to raise abstraction in distributed computing. CAL programs define computation workflows by visualising data flowing between computation units. The goal is to reduce the amount of traditional code needed and thus facilitate development even by non-professional programmers. The language follows the low-code paradigm, i.e. its implementation (the editor and the runtime system) is available online. We formalise the language by defining its syntax using a metamodel and specifying its semantics using a two-step approach. We define a translation of CAL into an intermediate language which is then defined using an operational approach. This formalisation was used to develop a programming and execution environment. The environment orchestrates computations by interpreting the intermediate language and managing the instantiation of computation modules using data tokens. We also present an explanatory case-study example that shows a practical application of the language.

construction of time-efficient computation software that uses parallel and distributed processing. Considering that experienced programmers are scarce on the market, the need for new programming approaches is constantly growing. These approaches should reduce complexity by raising the level of abstraction and removing unwanted technology-related issues. This way, they would be accessible to non-professional programmers or even to domain experts.
This tendency to reduce the complexity of programming and raise the abstraction at which programming constructs are formulated led to the emergence of the low-code approach [1]. Low-code solutions are predominantly based on creating visual, model-based languages [2] with the aim of making them more understandable and accessible. It can be argued that such a solution should be easier to use by inexperienced programmers and raises the productivity of programming [3]. For this reason, visual programming languages have recently been gaining popularity in engineering and education [4,5]. This can be observed especially in areas of distributed computing such as IoT [6], which shares multiple similarities with more powerful distributed Large-Scale Computing platforms.
Typically, low-code systems are used to develop web-based business applications. However, recently it has been observed that the low-code paradigm can be easily applied to solve complex computation problems (using, e.g. Artificial Intelligence modules) [7]. This can be achieved by wrapping certain fragments of computation logic into computation units. These units can then be (re-)used when constructing computation applications at a significantly higher level of abstraction. This would lead to the emergence of a graphical (visual) programming language that would allow for expressing orchestrations (or choreographies [8]) of many computation units.
The main challenge for such a visual language would be dealing with typical computation parallelisation issues. Prominently, these issues pertain to High-Performance Computing (HPC) systems [9]. These systems focus on using powerful co-located homogeneous environments called supercomputers with strong bindings between computation nodes, thus allowing for strongly parallelised computations. Such approaches are mainly used by big research institutions and enterprises. This is due to the very high cost of operation and the expert knowledge of parallel computing required to use the potential of such machines fully. Regular users would instead need an approach which we could call Large-Scale Computing (LSC). This approach would focus on using multiple distributed computation nodes, where each node is far less powerful than any supercomputer. However, linked together and parallelised, they can be used to solve advanced computation problems much faster than with the help of a single node. In both HPC and LSC, the main problem in parallelisation is the passing of data. However, the specific challenges differ. HPC supercomputers are homogeneous, and their computation nodes are connected within a single location. In LSC, nodes are distributed and connected through the Internet, so data transfer speeds are significantly slower than in HPC.
Considering the above, the main focus of the LSC approach is the management of data flow between computation nodes. The key is to minimise the overhead caused by data transfer and maximise computation speed through efficient distribution of workload between many nodes. In other words, computations should be controlled by the flow of data between the nodes where computations are performed. In such a data flow-driven approach to LSC, the entire computation is separated into steps. Each step produces results that become input data for the next steps until the final result is computed. Separating complex computations into steps allows treating each step as an independent computation unit placed in a separate container. Containerisation facilitates the orchestration of computations, as each step can follow one of the multiple solutions widely used in computation centres worldwide.
Moreover, the data-driven approach would allow for better utilisation of computer resources as the computation can start only when the input data is available. This reduces the amount of reserved but unused computation resources. Separation of computation steps into containers allows for easier reuse of already developed pieces of the application. The end-user would not need to edit the code of the computation step and treat it as a black box that can be connected to other computation steps to create new applications.
In this paper, we approach the creation of a graphical distributed computations language from the point of view of the low-code paradigm. It should be noted that several approaches to representing parallel and distributed computations in a graphical form already exist (see the next section). However, none of them seems to draw from the results of research on model-driven development that is used in constructing low-code languages. This includes aspects such as the notation's usability and the formalisation of its syntax and semantics. We thus present the Computation Application Language (CAL), developed as part of the BalticLSC Platform [10,11].
The paper contributes by introducing the precise syntax and semantics of CAL and the details of its implementation. The language's abstract syntax is defined through a metamodel with concrete syntax following best practices in this area [12]. We aimed to make the language accessible to non-professional programmers and domain experts according to the principles of low-code software development. Moreover, our work includes the definition of the language's semantics using translational and operational approaches. This allows for the unambiguous construction of execution environments for the language and illustrates a method for constructing similar scientific workflow environments.
In the next section, we discuss related work, referencing previous approaches to constructing visual computation languages. Section 3 provides a brief informal introduction to the presented language. This is followed by a presentation of CAL's syntax (abstract and concrete) in Section 4 and its semantics in Section 5. Section 6 shows how CAL's semantics was used to implement its execution environment on the web. This environment was used to conduct several case studies, one of which is presented in Section 7. We conclude with a discussion and proposals for the future development of CAL and its environment.

Related Work
As a recent study shows, the development of computation applications (HPC, parallel processing) is dominated by textual programming languages [13]. Only 5% of the developers use purely visual languages. The dominating model is to use a general-purpose textual programming language and equip it with features specific to the existing parallel and distributed programming models like MPI or OpenMP [14]. In these approaches, programmers need to deal with relatively low-level issues like message passing and memory sharing. This means that programmers need to have appropriate skills due to the significant technical complexity of parallel programming models.
The role of visual notations is to reduce the "accidental" complexity [15] and raise the level of abstraction at which programs are formulated by presenting computation flows in a graphical form. Whether visual notations benefit professional programming has long been debated [16]. However, they are more comprehensible for novice programmers [17] and increase the capability to create a mental representation of computation problems [18]. This generally stems from the fact that diagrams are most often better than text at expressing complex issues, including complex programs [19].
In the past, various graphical notations were used to assist parallel program development [20][21][22]. Thus, the idea to apply visual languages to high-performance computations (including parallel and distributed programming) has emerged quite early [23]. As a natural consequence, several visual programming systems have been proposed [24,25] together with graph grammars [26] and models [27] supporting the definition of parallelised computation. Syntactically, practically all such languages support graph structures, where graph nodes define the computation elements and graph arcs define data or control flows. This is also the case for the Computation Application Language (CAL) presented in the current work. However, CAL has several characteristics that distinguish it from other such languages.
The CODE, CODE 2.0, and Hence languages [28][29][30] are based on graph structures, where nodes define simple operations and the edges represent the order of their execution. Such a visual approach allows for a better representation of the concurrency of computations. These languages were used to define computations using atomic operations, and the graph was used to generate code directly executed on machines. Compared to them, CAL focuses on less granular computation steps encapsulated in separate containers. In CAL, graphs define sequences of container executions and the flow of data between these executions. This is different from the low-level code generation approach for which, over the years, many solutions were created. These solutions mainly focus on low-level parallelization of computations with the help of graphical languages, usually converted to C or C-like languages [31][32][33][34].
Another solution, extending the CODE language, is PEDS [35]. This tool allows for the visual mapping of computation units onto computation resources. It consists of four levels of abstraction: physical level, support level, visual language level, and application level. The PEDS tool allows for constructing applications with visual language and mapping their execution onto specific computation resources. In the PEDS visual language graphs, the nodes represent parallel processes, and the edges represent data dependencies between the processes. The use of PEDS requires its users to work within multiple layers of abstraction. Another similar tool was GRADE (Graphical Application Development Environment) [36], which consisted of a graphical editor to write parallel applications, a C code generator, and various distributed debugging tools. GRADE could perform computations on multiple nodes using only the graphically defined program.
A different parallel programming tool has been proposed by Delaitre et al. [37]. EDPEPPS is an IDE for visual parallel programs, providing a set of tools, including ones to build parallel programs, simulate their execution in heterogeneous environments and debug the programs executed in such environments. The main components of parallel computation are represented visually with the algorithms written textually as C-like procedures. Visual languages for parallel computation have also been developed to support cloud-based computations. The Visual Parallel Programming Environment (VPPE) [38] allows for the definition of low-level computation with a visual language, which is later translated into Java-MPI programs executed in the cloud. VPPE and other solutions [34,39] based on the generation of MPI programs require defining computation steps at an atomic level, which does not reduce the complexity required to develop a new solution. Compared to these approaches, CAL allows for defining computation with more complex steps and running it on a cloud without the need for the generation of low-level code.
Another use of a visual language for computation parallelisation has been proposed by Feng et al. [40] in the form of an extension of the Snap! language, designed to make learning parallel programming easier. This language has been inspired by MIT's Scratch project [41] that uses interlocking blocks instead of other already established visual notations such as Petri Nets [42] or Colored Petri Nets [43] as a way to visualise the control flow between computation steps. The previously sequential Snap! tool has been extended, and the blocks can be converted to OpenMP code and executed on the machine of choice. Students could use the blocks to define parallelised computations using parallelisation blocks with operations such as MapReduce or parallel Map and ForEach functions. The authors believe that by providing a visual language, they successfully lowered the learning curve for parallel computing compared to traditional C-based textual languages. The results emerging from the performed assessment seem to confirm this statement. CAL has been created with the same goal. However, it uses boxes and arrows instead of interlocking blocks and operates on a less atomic level of computation steps.
A similar goal was the basis for the emergence of Scientific Workflow Management Systems [44]. Some of them, like Galaxy, Taverna, Kepler, and WS-PGRADE, use graphical notations to denote the computation workflows. Seemingly, the most similar to our proposition is the WS-PGRADE system [45,46], which is the successor of the already mentioned GRADE system [36]. This system provides a web portal that allows for the creation and execution of computations in the form of workflows. Importantly, it offers a visual workflow language where the execution of computation units (services) is controlled by the flow of data (files). Several solutions used in WS-PGRADE are close to those used in CAL and BalticLSC. Thus, we also refer to them later in this paper. In particular, the data-flow-driven characteristics distinguish both solutions from the rest. It can be noted that WS-PGRADE was used to develop several dedicated portals for specific computation domains. In turn, BalticLSC aims to provide a common, user-friendly workspace where various domain-specific applications can easily be built using standard computation modules. Also, as mentioned in the introduction, CAL can be distinguished through its strict formalisation and usage of low-code (model-driven) approaches. It can be noted that both WS-PGRADE and BalticLSC use the orchestration approach with a central system that controls the flow of computations and data. This can be contrasted with the Flowbster system [47], which uses the choreography approach. In this system, workflows can be created as "autonomous graphs" of computation nodes that can be executed in the cloud. Moreover, it uses textual rather than visual notation.
The variety of approaches to representing scientific workflows calls for a common framework for easy-to-use, web-based workflow editors. An interesting attempt in this direction was that proposed by Gesing et al. [48]. They have introduced a generic data model and a design model for a workflow editor (Generic Web-based Workflow Editor, GeWWE). Based on this, they attempted to build a prototype graphical editor capable of generating workflow applications in different textual languages. Unfortunately, this attempt did not result in a fully developed system. In our approach with CAL, we propose a fully developed data model (metamodel) with semantics that allows for building a fully operational translator and execution engine.
The BalticLSC CAL editor allows for the web-based development of parallel applications. Similar web-based approaches to parallel computation can already be found, allowing for the graph-based definition of programs [49] or just providing a general gateway to distributed computation resources [50]. In a survey performed by Calegari et al. [51], researchers examined multiple existing web-based HPC solutions and defined general requirements for such platforms. The BalticLSC Platform fulfils these requirements and attempts to exceed them through, e.g. allowing for the visual definition of general-purpose computations with CAL. Many visual parallel computing languages on such platforms are domain specific as it simplifies the challenge [52], but simultaneously, it makes the solution less universal. Kubeflow [53] is a good example of such a domain-specific visual language, created to help develop machine learning pipelines running on Kubernetes [54]. In addition to allowing for the visual pipeline definition, Kubeflow allows for the execution of computations on standard Kubernetes clusters, allowing for easier management of computation resources. The BalticLSC Platform and CAL have similar foundations, with CAL being domain-independent and more uniform, allowing for its wider use while still utilising the easier orchestration of computations provided by containerisation and Kubernetes technology. To help with parallel application development, CAL has multiple skeletons similar to those described by Zandifar et al. [55] that help with the automatic parallelisation of computation similar to the MapReduce operations.
The specifics of the BalticLSC Network have many similarities with the volunteer computing platforms such as BOINC [56] or Seti@Home [57]. In both solutions, advanced computations are performed in parallel on multiple machines connected via the Internet. However, the BalticLSC Network mostly consists of comparatively larger resources (from small clusters to even HPC solutions) with the more stable (not voluntary) connection of nodes to the network.

CAL Overview
As a low-code language, CAL strives to be mostly self-explanatory. Thus, we will start introducing the language with a simple example. This should provide an intuitive understanding of the language constructs without studying a formal language specification.
Our example operates in the domain of video editing. The aim is to process black-and-white films with subtitles. Each film should be colourized, and its subtitles translated into a specific language. The subtitles should be appropriately mixed within the video file. Importantly, we would like to process many such films in parallel.
As a typical representative of low-code languages, CAL is strongly based on graphical syntactic constructs. This is illustrated in Figs. 1 and 2 that contain the full application for our example problem. The first figure shows an elementary application for processing a single film ("VS Mixer", a video and subtitle mixer). It receives a video file ("Video Input") and a subtitle file ("Subtitle Input") and produces a processed film ("Output Film"). The application uses three computation modules. The first one ("Video Colorizer") performs automatic colourization. The second one translates subtitles ("Subtitle Translator"). As we can see, both of these modules can be run in parallel. Their results form the input to the third module ("Subtitle Mixer") that embeds the translated subtitles into the colourized video.
In the second figure (Fig. 2) we can see an application that enables the parallel processing of many films. We can notice that it uses the VS Mixer app (Fig. 1) as part of its code. On its input, it receives a folder of video files ("Video Inputs") and a folder of subtitle files ("Subtitle Inputs"). These folders are handled by the "File Synchroniser" module. This module creates pairs of files (for example, based on specific file naming rules) and sends these pairs sequentially to its outputs ("files1" and "files2"). Each such pair is then input to an instance of the "VS Mixer" application. The resulting films are placed by the mixer instances into a specified output folder ("Output Films"). One notation element that might be non-intuitive is the "data pin" symbols. Generally, single files are denoted by a single triangle symbol, folders are denoted by a triple triangle symbol, and file/folder sequences are denoted by a black triangle symbol. This will be explained in more detail in the following section.
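The data dependencies of the VS Mixer application can be sketched as a plain dependency graph. The snippet below is only an illustration, not CAL itself: the dictionary encoding and the function name `parallel_stages` are assumptions made for this example, and Python's standard `graphlib` module stands in for the CAL runtime. It groups the elements of Fig. 1 into stages whose inputs are all available, which shows why the colorizer and the translator can run side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the VS Mixer data flows (Fig. 1):
# each key is an element, each value is the set of elements it needs data from.
VS_MIXER = {
    "Video Colorizer": {"Video Input"},
    "Subtitle Translator": {"Subtitle Input"},
    "Subtitle Mixer": {"Video Colorizer", "Subtitle Translator"},
    "Output Film": {"Subtitle Mixer"},
}


def parallel_stages(graph):
    """Group graph nodes into stages whose members have all inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)                 # pretend the whole stage has finished
    return stages


stages = parallel_stages(VS_MIXER)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

The second stage contains both "Subtitle Translator" and "Video Colorizer", while "Subtitle Mixer" appears only once both have finished.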
Note that the computation modules ("Video Colorizer", "Subtitle Translator" etc.) form the elementary building blocks of the language. In this sense, the language is extendable through defining new computation modules and thus extending their libraries. It is up to the language users to define their modules or to use the existing ones found in the library. The role of the language is to provide means for the parallelisation and distribution of computations. The language runtime system takes care of running instances of appropriate computation modules on appropriate computation resources and transmitting data between these instances. Thus, an important part of the runtime system is a component that performs computation job brokerage (assignment of jobs to specific computation nodes).
It is worth noting that some of the CAL constructs influence the way module execution is parallelised. For example, let us consider the file pairs produced by the "File Synchroniser" (see Fig. 2). The creation of such pairs early in the processing facilitates the optimisation of job brokerage. In this case, both the colourization and subtitle translation tasks can be assigned to the same computation node, thus avoiding potentially costly file transfers.
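The effect of early pairing on brokerage can be illustrated with a small sketch. Everything here is an assumption made for illustration: the naming rule (files are paired when they share a base name), the round-robin assignment, and the names `pair_by_stem` and `assign_pairs` are not taken from the BalticLSC implementation. The point is only that once a pair exists, both files of a film can be sent to one node.

```python
from itertools import cycle
from pathlib import PurePosixPath


def pair_by_stem(videos, subtitles):
    """Pair video and subtitle files that share a base name (assumed naming rule)."""
    subs_by_stem = {PurePosixPath(s).stem: s for s in subtitles}
    return [
        (v, subs_by_stem[PurePosixPath(v).stem])
        for v in sorted(videos)
        if PurePosixPath(v).stem in subs_by_stem
    ]


def assign_pairs(pairs, nodes):
    """Assign each pair to a single node (round-robin), keeping both files together."""
    return {pair: node for pair, node in zip(pairs, cycle(nodes))}


pairs = pair_by_stem(
    ["films/b.mp4", "films/a.mp4"],
    ["subs/a.srt", "subs/b.srt", "subs/c.srt"],
)
placement = assign_pairs(pairs, ["node-1", "node-2"])
for (video, subtitle), node in placement.items():
    print(node, video, subtitle)
```

A real broker would also weigh node capacity and data location; the round-robin choice above is only the simplest placeholder.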
In a practical implementation of our language, computation modules are provided as container images. Each computation node operates an instance of a container management system (like Kubernetes or Docker Swarm). The CAL runtime engine then controls the distribution of container instances to appropriate container management system instances.

Language Syntax
To define the CAL syntax, we use typical techniques of software language engineering for defining graph-based languages [58]. This consists of defining the language's abstract syntax (internal structure of language constructs) and concrete syntax (visual representations of language constructs as seen by the language users). The abstract syntax is defined using a metamodel, which is a typical approach. The concrete syntax is defined through some examples and an informal description of how visual representations can be used.
Basically, CAL consists of just a few constructs: unit calls, data pins (declared and computed), and data flows. The respective metamodel can be seen in Fig. 3, and a detailed description of the main classes and concrete syntax follows in Tables 1, 2, 3, and the next paragraphs. The CAL metamodel is centred around two main metaclasses: ComputationUnit and ComputationUnitRelease. The first of them is used to distinguish the logical units of computation that perform a specific function. The second of them represents particular versions of those units tied to given implementations. Both of those metaclasses are further described by their respective specialisations. ComputationModules and ComputationModuleReleases represent the bottom layer of units and releases, corresponding to atomic parts of computations executed on particular computation nodes. ComputationApplications and ComputationApplicationReleases represent elements situated higher in a hierarchy and typically encompass the functionality of many smaller units. Specific dependency between units is defined at the level of their releases, using the UnitCall metaclass. Instances of this metaclass are contained in ComputationApplicationReleases and point to UnitReleases invoked by them. The exact sequence of unit calls in the application is defined using representatives of the DataFlow metaclass, specifying the acceptable paths along which data can flow between the called units. These data flows connect DataPins observing their type. These data pins represent data input or output from respective computation units. Data pins associated with unit calls (computed pins) refer to data pins that are associated with unit releases (declared pins). In other words, each ComputedDataPin points to a compatible DeclaredDataPin contained in the respective ComputationUnitRelease. These DeclaredDataPins specify details about data received and produced by unit releases, and in the case of applications, also serve as starting and finishing points for data flows. Computation units and their releases do not have concrete syntax and are not presented in CAL specifications. Other language syntactic constructs have appropriate graphical representations as presented in Tables 1, 2, 3. Each of the tables briefly explains the abstract syntax (the metamodel elements in Fig. 3) and the concrete syntax through appropriate examples.

Unit Call
Unit calls are depicted as rounded rectangles. The upper part of the unit call contains the unit call's name, and the lower part contains the name of the invoked computation unit release, which consists of the unit's name and release version. Data pins owned by the unit call are typically placed on the left and right of the unit call (required on the left, provided on the right). The strength of the unit call is denoted by the outline shape of the rectangle: strong unit calls have solid lines and weak ones are dashed.
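The relationship between unit releases, unit calls, and the two kinds of data pins can be made concrete with a minimal sketch. The class names follow the metamodel, but the attributes (`required`, `call_name`, `version`), the pin names `input` and `output`, and the helper functions are assumptions introduced for this example. The direction check encodes the rule stated for the Data Flow class: a flow starts at a declared required or computed provided pin and ends at a declared provided or computed required pin.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeclaredDataPin:
    name: str
    required: bool              # True: consumes data, False: provides data


@dataclass(frozen=True)
class ComputedDataPin:
    declared: DeclaredDataPin   # the declared pin this computed pin is derived from
    call_name: str              # name of the owning unit call


@dataclass
class ComputationUnitRelease:
    unit_name: str
    version: str
    declared_pins: list = field(default_factory=list)


@dataclass
class UnitCall:
    name: str
    release: ComputationUnitRelease
    pins: list = field(init=False)

    def __post_init__(self):
        # Computed pins mirror the declared pins of the invoked release.
        self.pins = [ComputedDataPin(p, self.name) for p in self.release.declared_pins]

    def pin(self, pin_name):
        return next(p for p in self.pins if p.declared.name == pin_name)


def can_be_source(pin):
    """Declared required pins and computed provided pins may start a data flow."""
    if isinstance(pin, DeclaredDataPin):
        return pin.required
    return not pin.declared.required


def can_be_target(pin):
    """Declared provided pins and computed required pins may end a data flow."""
    if isinstance(pin, DeclaredDataPin):
        return not pin.required
    return pin.declared.required


def is_valid_flow(source, target):
    return can_be_source(source) and can_be_target(target)


# Hypothetical release and application pins, loosely modelled on Fig. 1.
colorizer = ComputationUnitRelease("Video Colorizer", "1.0", [
    DeclaredDataPin("input", required=True),
    DeclaredDataPin("output", required=False),
])
call = UnitCall("colorize", colorizer)
video_input = DeclaredDataPin("Video Input", required=True)
output_film = DeclaredDataPin("Output Film", required=False)

print(is_valid_flow(video_input, call.pin("input")))    # application input into the call
print(is_valid_flow(call.pin("output"), output_film))   # call result to application output
print(is_valid_flow(output_film, call.pin("input")))    # wrong direction
```

The first two checks succeed and the third fails, because a provided declared pin of an application can only be the target of a data flow.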
Data Flow
The Data Flow class represents the transition of data tokens between data pins. Every data flow connects exactly two data pins. The flow of data is from a declared required or computed provided pin to a declared provided or computed required pin. There can be just one outgoing data flow from any CAL element, but multiple incoming data flows are possible.

Associations.
Target [1] - the data pin that consumes data tokens.
Source [1] - the data pin that produces data tokens.

Data flows are depicted as arrows. They must connect two data pins. The figure below is a simple example of a CAL program with two data flows (arrows) appropriately connecting two declared pins with two computed pins.

Data Pin
The Data Pin class is an abstraction that encompasses declared and computed data pins. Data pins represent the inputs and outputs of computation units (applications and modules). This refers to specific data sets required and provided by computation unit releases. Each data pin is characterised by its data and token multiplicities, its data type (e.g. JSON, XML, Image), its metadata structure (for structured data types), and its access type (e.g. MongoDB, FTP). The Declared Data Pin class represents the inputs and outputs of applications or modules. In turn, the Computed Data Pin class represents instantiations of declared data pins that are parts of unit calls. Computed pins are derived from the declarations of the respective computation unit releases that are invoked by the including unit calls. More specifically, they are derived from the respective declared pins. Data pins can be connected by data flows, being their respective sources and targets. A required declared pin can only be a source, while a provided declared data pin can only be a target. Computed data pins act in the opposite way: required computed pins can be targets, while provided computed pins can be sources.

Key attributes and associations.
Name: string - specifies the name of the data pin.
Binding: DataBinding - specifies the data binding for the data pin. Binding might be RequiredStrong, RequiredWeak, Provided or ProvidedExternal. Required data pins define data sets that are consumed, and provided data pins define data sets that are produced. Strong data pins define mandatory data sets, and weak data pins define optional data sets.
DataMultiplicity: CMultiplicity - specifies the multiplicity of data items in the data set defined by a single token. It might be Single (e.g. a single file) or Multiple (e.g. a folder of files).
TokenMultiplicity: CMultiplicity - specifies the multiplicity of tokens consumed or produced by the data pin. It might be Single (one token produced) or Multiple (multiple tokens produced).

Declared Data Pin
Declared pins for applications are defined within CAL programs using a rectangle notation. The shape of the rectangle is determined by the values of the pin's attributes. The upper part of the figure below contains the required declared data pins, while the lower part contains provided declared data pins of an application. They are distinguished by the placement of a black bar: a required data pin has it on the left side, and a provided data pin has it on the right side.
Data multiplicity is depicted by the number of triangles: "single" is denoted by one triangle, and "multiple" is denoted by three. Token multiplicity is depicted by the colour of the triangles: "single" has a white interior, and "multiple" has a dark interior. The name of the declared data pin is placed near (e.g. above) the rectangle.

Computed Data Pin
Computed pins are depicted as rectangles pinned to unit call boxes. The figure below shows examples, where pin names denote: Req - required, Prv - provided; the last two letters give the token and data multiplicity respectively (S - single, M - multiple). Required pins are on the left, provided on the right side of the unit call. Multiplicities of computed data pins are denoted as for the declared data pins. The name of the computed data pin is placed near (e.g. above) the rectangle.

Considering the configuration and token multiplicity of input and output pins, we can distinguish several reference module types. These types can be combined into hybrid types (e.g. splitter-joiner).
• Simple processor (one single input, one single output) - the most common computation unit, which performs some algorithm on input data, creating output data.
• Data separator (one single input, many single outputs) - a computation unit that splits the input data into two or more output data sets that will be further processed using different means; separation can be done using some computation algorithm.
• Data splitter (one single input, at least one multiple output) - a computation unit that splits a data set into smaller parts that will be further processed in the same way; splitting can be done using some computation algorithm.
• Data joiner (many single inputs, one single output) - a computation unit that joins several data sets (typically resulting from different processing paths) into a single data set; joining can be done using some computation algorithm.
• Data merger (at least one multiple input, one single output) - a computation unit that merges several data sets of the same type into a combined data set; merging can be done using some computation algorithm.
It can be noticed that this classification of nodes is somewhat similar to that found in the WS-PGRADE system [45] (cf. Collector and Generator nodes). However, in our approach, the exact categories and semantics slightly differ.
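The reference module types listed above depend only on how many input and output pins a unit has and on their token multiplicities, so the classification can be sketched as a small function. This is an illustrative reading of the list, not part of the CAL definition: the `Multiplicity` enum, the function name `classify`, and the catch-all label for combinations outside the five reference types are assumptions.

```python
from enum import Enum


class Multiplicity(Enum):
    SINGLE = "single"
    MULTIPLE = "multiple"


S, M = Multiplicity.SINGLE, Multiplicity.MULTIPLE


def classify(inputs, outputs):
    """Map token multiplicities of input and output pins to a reference module type."""
    one_single_in = inputs == [S]
    one_single_out = outputs == [S]
    many_single_in = len(inputs) > 1 and all(m is S for m in inputs)
    many_single_out = len(outputs) > 1 and all(m is S for m in outputs)

    if one_single_in and one_single_out:
        return "simple processor"
    if one_single_in and many_single_out:
        return "data separator"
    if one_single_in and M in outputs:
        return "data splitter"
    if many_single_in and one_single_out:
        return "data joiner"
    if M in inputs and one_single_out:
        return "data merger"
    return "hybrid or other"   # e.g. a splitter-joiner combination


print(classify([S], [S]))
print(classify([S], [S, S]))
print(classify([S], [M]))
print(classify([S, S], [S]))
print(classify([M], [S]))
print(classify([S, S], [M]))
```

The last call combines several single inputs with a multiple output, which matches none of the five reference types and therefore falls into the hybrid category.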

Language Formal Semantics
Being a low-code programming language, CAL needs a precise definition of its semantics to be used during runtime. To define it, we use a hybrid approach consisting of two steps. In the first step, we use the translational semantics approach [58] (see Chapter 10). For this purpose, we define an intermediate language called CAL-Executable. Based on this, we specify a set of translation rules that map CAL constructs onto the constructs of CAL-Executable. In the second step, we use the operational semantics approach [59] (see Chapter 8). For this purpose, we define an abstract machine with a set of transitions defining its behaviour. This machine defines the execution of CAL-Executable programs.
The reason for this hybrid approach lies in the characteristics of CAL. The language is graph-based and thus it is not trivial to define operational semantics directly. At the same time, it is not possible to use translational semantics alone. This is due to special requirements for the execution of CAL programs (parallelisation and distribution of computations through container instances). This prevents us from using a standard existing language (with known semantics) as the target for the translation.

CAL-Executable Definition
Before specifying CAL semantics formally, we first need to define the CAL-Executable syntax. We do it in the same way as for the CAL syntax: through a metamodel, as shown in Fig. 4. The metamodel is based on three main classes: CTask, CJobBatch, and CJob. CTask represents the whole computation task solving a specific problem. CJob represents the smallest portion of a computation task, connected to a particular code run in a container. CJobBatch represents a strongly dependent set of CJobs that need to be run on the same cluster. An additional class, CService, represents containers like databases that need to be running constantly and are required by certain CJobs.
CJobs and CServices are the elementary executable elements contained in CJobBatches, and they specialise a more general CJobBatchElement metaclass. This metaclass is used to group common features of CJobs and CServices, like paths to particular container images. Even more general is the CExecutable metaclass, which is specialised by all metaclasses that represent executable elements, including CTasks and CJobBatches. It is used to provide the identification of an executable element within the runtime environment. Moreover, every CExecutable instance can contain many CDataToken elements. The CDataToken metaclass represents the metadata of the data elements (e.g. files) passed between the executable elements.
Figures 5, 6 and 7 contain examples of CAL-Executable syntax. As can be noticed, the syntax is textual. Moreover, each program can be expressed in a linear form (tasks containing batches and batches containing jobs). Figure 5 shows a translation of the VS Mixer program (see Fig. 1) into CAL-Executable. As we can see, the translated program contains one task with one embedded batch. The batch contains three jobs corresponding to the three unit calls of the source VS Mixer program. Each job contains data token definitions corresponding to its required and provided pins. The batch contains data tokens corresponding to the declared pins of the overall application; this situation occurs when just a single batch is created in a CAL-Executable program.
In the runtime environment, this program is executed through the exchange of data token instances. Every such instance represents a particular piece of data (e.g., a file) to be processed by computation module instances. The initial data tokens are created based on the user input. Here, for instance, the application user should provide appropriate metadata that points to the files containing "Video Input" and "Source Input". This will cause the creation of two appropriate data token instances with respective token numbers (no=1 and no=2). This in turn will cause the initiation of a new task and its only job batch instance, because the definition of the job batch has two "required strong" data tokens with matching token numbers.
The initiation of the new batch instance is followed by the initiation of contained job instances. Specifically, an instance of the Video Coloriser and an instance of the Subtitle Translator are created. This is because these jobs have "required strong" tokens whose numbers correspond to the two data token instances already received (no=1 for the Video Coloriser and no=2 for the Subtitle Translator). When these two job instances finish execution, they produce appropriate data token instances (no=4 and no=5, respectively). This causes the initiation of an instance of the Subtitle Mixer module. Finally, this instance produces a token (no=3) which corresponds to the "provided" data token of the containing batch. This causes the finalisation of the batch instance and the whole program.
Figure 6 shows a translation of the extended CAL application (see Fig. 2). Note that this extended application calls the VS Mixer app. The translation was made in a "strong" mode, which means that all jobs should be contained in a single job batch. This results in more efficient data transfer but can negatively impact parallelisation (all job instances are executed in the same computation node).
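The token-driven start of jobs in the VS Mixer walk-through above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the token numbers and job names come from the text, while the dictionary layout and the `run_batch` function are hypothetical, and jobs that the runtime would run in parallel are simply fired one after another here.

```python
# Required and provided token numbers, as described for the VS Mixer program.
JOBS = {
    "Video Coloriser": {"required": {1}, "provided": {4}},
    "Subtitle Translator": {"required": {2}, "provided": {5}},
    "Subtitle Mixer": {"required": {4, 5}, "provided": {3}},
}
BATCH = {"required": {1, 2}, "provided": {3}}


def run_batch(initial_tokens):
    """Return the order in which jobs start for the given initial token numbers."""
    available = set(initial_tokens)
    if not BATCH["required"] <= available:
        return []  # the batch (and so the task) is not initiated
    order, started = [], set()
    progress = True
    # Keep firing jobs until the batch's provided token has been produced.
    while progress and not BATCH["provided"] <= available:
        progress = False
        for name, job in JOBS.items():
            if name not in started and job["required"] <= available:
                started.add(name)
                order.append(name)
                available |= job["provided"]  # the job finishes and emits its tokens
                progress = True
    return order


start_order = run_batch({1, 2})
```

With tokens no=1 and no=2 present, the Subtitle Mixer can only start after both other jobs have produced tokens no=4 and no=5; with only one initial token, nothing starts.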
The CAL-Executable program in Fig. 6 will be executed similarly to the program in Fig. 5 but with an additional job (File Synchroniser). A significant difference is caused by the "multiple" data tokens in the job batch and File Synchroniser definitions. The tokens required by the job batch (and the File Synchroniser) have a "data multiplicity" set, which means that the batch expects to receive two tokens (no=1 and no=2) pointing to appropriate data sets (e.g. file folders). The tokens provided by the File Synchroniser have a "token multiplicity" set. This means that it produces many tokens of each type (no=4 and no=5). As a result, the remaining jobs will have many instances, depending on the number of tokens produced.
Figure 7 shows a translation of the same extended CAL application in the "weak" mode. This means that jobs can be distributed between several job batches. This can result in better parallelisation but can also increase data transfer times. Note that marking applications and module calls as "weak" or "strong" indicates the sensitivity of computations to data transfer and influences the division into computation batches. This is a unique characteristic of CAL which distinguishes it from previous such languages.
In our example, the CAL-Executable program is divided into two batches. Each of the batches contains its own set of data tokens. Moreover, the whole task contains a set of data tokens. This is because the initiation of the task is not equivalent to the initiation of one of its batches. The task will be initiated when token instances with no=1 and no=2 arrive. This will also initiate the batch with uid=b003. The other job batch will be initiated only after an instance of the File Synchroniser produces token instances with no=4 and no=5. Note that this time, multiple tokens produced by the instance of batch uid=b003 will cause the initiation of many instances of batch uid=b004.

Translation from CAL to CAL-Executable
Having defined the syntax of CAL-Executable, we can now start defining the semantics of CAL. Here, we provide the first part of the formal specification using a translational approach. We start by defining a utility function that is used within the translational rules to shorten them. Note that within the rules we refer to class and attribute names from the metamodel (see Fig. 3).

Definition 1
The function "child" is defined as follows:
$$\mathit{child}(x : \mathit{UnitCall},\; y : \mathit{UnitCall}) \longrightarrow y.\mathit{Unit}.\mathit{Calls} \ni x \;\lor\; \exists (z : \mathit{UnitCall}) \text{ such that } \big(y.\mathit{Unit}.\mathit{Calls} \ni z \,\land\, \mathit{child}(x, z)\big)$$
The "child" function is boolean and has two parameters, both unit calls. It is true if the first unit call is (recursively) contained within a computation application release called by the second unit call.
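As an illustration, the recursive containment check of Definition 1 could be written as follows. This is a minimal sketch: the `Unit` and `UnitCall` classes are simplified stand-ins for the metamodel classes (only the `Unit` and `Calls` associations are modelled), and the example object names are hypothetical. It also assumes that applications do not call themselves, so the recursion terminates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """A computation unit release; only applications contain unit calls."""
    name: str
    calls: List["UnitCall"] = field(default_factory=list)


@dataclass
class UnitCall:
    """A call of a unit release placed inside an application diagram."""
    name: str
    unit: Unit


def child(x: UnitCall, y: UnitCall) -> bool:
    """Return True if x is (recursively) contained in the release called by y."""
    for z in y.unit.calls:
        # Direct containment, or containment through a nested application call.
        if z is x or child(x, z):
            return True
    return False


# Example: an outer application calls the VS Mixer application, which in turn
# calls the Video Coloriser module.
coloriser_call = UnitCall("colorise", Unit("Video Coloriser"))
vs_mixer_call = UnitCall("mix", Unit("VS Mixer", calls=[coloriser_call]))
sync_call = UnitCall("sync", Unit("File Synchroniser"))
outer_call = UnitCall("outer", Unit("Extended app", calls=[sync_call, vs_mixer_call]))
```

Here `child(coloriser_call, outer_call)` holds through the nested VS Mixer call, whereas `child(coloriser_call, sync_call)` does not, because the File Synchroniser is a plain module with no contained calls.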
With this definition, we define 7 translation rules. Each rule is presented in a uniform notation. The rule definition starts with a brief, informal textual description. This is followed by three sections. The "source" section lists and names a set of objects subject to the respective translation. The "condition" section defines the specific configurations of the objects listed in the "source" section. These configurations need to be fulfilled for the rule to be applied to these objects. Moreover, the rule will be applied to all configurations that fulfil the condition. The "target" section defines objects and their configurations that should be created as a result of applying the given rule. Note that the first four rules are responsible for creating the structure of the target CAL-Executable program: the task with contained batches and jobs. The last three rules are responsible for creating tokens associated with appropriate tasks, batches, and jobs.

Operational Semantics of CAL-Executable
Having defined the translation from CAL to CAL-Executable we should now complement the specification of CAL semantics by formally defining the semantics of CAL-Executable. To do this, we use the operational semantics approach. We will define an abstract machine with a set of configurations and a set of transition relations (a transition system [60]). where X denotes a finite set of elements where each element belongs to X .

Definition 2 A CAL-Executable Abstract Machine
These elements of the abstract machine correspond to the syntactical structure of CAL-Executable programs, consisting of job batches, jobs, and tokens. Batch and job sets correspond directly to the batch and job definitions in a CAL-Executable program. The token set is treated differently. For constructing token sets, we treat CDataTokens with the same TokenNo and Binding as the same token (even if they have different PinNames). Finally, the execution starting relations correspond to the containment of tokens in appropriate batches and jobs. For these relations, the tokens are also treated the same as described above.
Based on this definition, we can specify a transition system. First, we define the following instance sets: • I S is a finite set of unique identifiers According to this, our abstract machine during execution operates on appropriate sets of token, batch, and job instances. Token and batch instances are distinguished through unique identifiers. Job instances are identified through their assignment to specific batch instances.
The resulting transition system is thus defined as follows. Its set of configurations is: and the set of transition relations is: The first transition set B pertains to creating new batch executions based on the batch execution starting relation ρ B . The source configuration at, b, j is transformed such that a new batch execution d is added to the current set of batch executions. For such a transformation to be executed, two conditions need to be met. The first condition simply requires that the current set of batch executions b does not yet contain the new batch execution d. The second condition is applied to the current set of token instances. This set should contain a subset of token instances a, compliant with the batch execution starting relation ρ B . By this, we mean that there exists a batch in relation ρ B with exactly such a set of tokens, that all of these tokens are the first elements of token instance tuples in a and all of them have the same identifier (second element of token instance tuples). Moreover, the first element of the new batch execution tuple d 1 , will be set to this above-mentioned batch, and the second element of the tuple d 2 , will be set to the above-mentioned identifier.
The second transition set J pertains to creating new job executions based on the job execution starting relation ρ J . The source configuration at, db, j is transformed such that a new job execution c is added to the current set of job executions. Moreover, a subset of token instances is removed from the current set of token instances. For such a transformation to be executed, two conditions need to be met. The first condition simply requires that the current set of batch executions contains a batch execution d that is the second element of the new job execution tuple c 2 . The second condition is applied to the current set of token instances and is analogous to the second condition of the transition set B , but pertaining to job execution. This also involves the batch d 1 (the first element of the batch execution tuple d) that needs to participate additionally in the job execution starting relation ρ J .
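The two transition sets described above can be sketched as functions over configurations. The sketch assumes, following the prose, that token instances are pairs (token, identifier), batch executions are pairs (batch, identifier) and job executions are pairs (job, batch execution). The relation contents, the names `RHO_B`, `RHO_J`, `step_batch` and `step_job`, and the requirement that a job's tokens carry the identifier of its batch execution are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Optional, Tuple

TokenInst = Tuple[int, str]       # (token number, instance identifier)
BatchExec = Tuple[str, str]       # (batch, instance identifier)
JobExec = Tuple[str, BatchExec]   # (job, batch execution)
Config = Tuple[FrozenSet[TokenInst], FrozenSet[BatchExec], FrozenSet[JobExec]]

# Starting relations for a single-batch program such as VS Mixer (illustrative).
RHO_B: Dict[str, FrozenSet[int]] = {"batch": frozenset({1, 2})}
RHO_J: Dict[Tuple[str, str], FrozenSet[int]] = {
    ("Video Coloriser", "batch"): frozenset({1}),
    ("Subtitle Translator", "batch"): frozenset({2}),
    ("Subtitle Mixer", "batch"): frozenset({4, 5}),
}


def step_batch(cfg: Config) -> Optional[Config]:
    """Start one batch execution whose required tokens all share an identifier."""
    tokens, batches, jobs = cfg
    for batch, required in RHO_B.items():
        for ident in sorted({i for _, i in tokens}):
            d = (batch, ident)
            if d not in batches and all((t, ident) in tokens for t in required):
                return tokens, batches | {d}, jobs  # tokens are not consumed here
    return None


def step_job(cfg: Config) -> Optional[Config]:
    """Start one job execution inside a running batch, consuming its tokens."""
    tokens, batches, jobs = cfg
    for (job, batch), required in RHO_J.items():
        for d in sorted(batches):
            consumed = frozenset((t, d[1]) for t in required)
            if d[0] == batch and consumed <= tokens:
                return tokens - consumed, batches, jobs | {(job, d)}
    return None


cfg0: Config = (frozenset({(1, "t1"), (2, "t1")}), frozenset(), frozenset())
cfg1 = step_batch(cfg0)   # batch execution ("batch", "t1") is created
cfg2 = step_job(cfg1)     # Video Coloriser starts and consumes token 1
cfg3 = step_job(cfg2)     # Subtitle Translator starts and consumes token 2
```

Note the asymmetry that the prose points out: starting a batch leaves the token instances in place, while starting a job removes the token instances it uses.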

Language Implementation
The presented syntax and semantics of CAL and CAL-Executable were implemented as a basis for constructing a full Large Scale Computing system - the BalticLSC system. The system offers a web-based user interface and is currently freely available for application developers. The overall architecture of the system is presented in Figs. 8 and 14. Figure 8 shows the main components of the Master Node that are responsible for the management and execution of CAL programs. This includes mechanisms for distributing computations to be performed on the various Cluster Nodes registered in the system.
CAL programs can be developed using the CAL Editor available through the BalticLSC FrontEnd component, as illustrated in Fig. 9. The editor is web-based and implements the full syntax of CAL. Individual Computation Modules can be added to the editor's toolbox, placed on the canvas, and their pins connected through data flows. The editor ensures dynamic validation of syntax, not allowing for incorrect connections. CAL diagrams are dynamically stored in the DiagramRegistry component through an appropriate API. Note that a detailed discussion of the FrontEnd component
Applications expressed in CAL can be run from the BalticLSC Computation Cockpit illustrated in Fig. 10. The user can select an application and initiate a new task. All the current and finished tasks can be accessed and examined. For example, Fig. 11 shows one of the task executions (X8) from Fig. 10. As we can see, one of the job instances in this task has failed and the user can diagnose the problem by examining the final message and the appropriate logs (not shown in the figure).
Whenever a new task instance is created, appropriate CAL diagrams are accessed through the IDiagram interface of the DiagramRegistry component (see again Fig. 8). The CAL program associated with the given application is first processed by the TaskManager. Following this, the TaskManager initiates the TaskProcessor component. This is done by passing DataToken instances received from the FrontEnd (specified by the user). The TaskProcessor accesses the CAL-Exec code stored in the TaskRegistry. Based on the received DataTokens, it interprets the CAL-Exec code to start JobBatch and Job instances. This is done with the help of the MultiQueue component. All the token instances are pushed to specific queues, which form groups that trigger respective job instances. This triggering is done according to the operational semantics of CAL-Exec. The queue component helps in managing multiple tokens that need to be directed to appropriate job instances. For instance, some tokens with the same token number need to be transported to the same job instance, and others need to be distributed between several job instances. This mechanism is to some extent similar to that found in WS-PGRADE [45] but uses multi-level sequence identifiers contained in the tokens (metadata).
Token instance distribution done by the TaskProcessor in cooperation with the MultiQueue is illustrated in Figs. 12 and 13. The figures show example task executions for the CAL-Exec programs from Figs. 6 and 7. Token instances are denoted by circles with numbers corresponding to the token numbers as specified in the respective CAL-Exec programs. Figure 12a shows an initial step in task execution. An instance of token no. 1 has already been provided by the user and is waiting in the MultiQueue (denoted by "MQ"). At this moment, an instance of token no. 2 is sent from the front end and inserted into the queue (denoted by a solid arrow). This causes the initiation (denoted by a dashed arrow) of a new job batch execution "be102". This is consistent with the definition of the respective job batch ("b002" in Fig. 6) that requires the arrival of tokens no. 1 and 2.
In the next step, the arrival of tokens 1 and 2 causes the initiation of job execution "je104" according to the definition of job "j004" (see Fig. 12b). Following this, "je104" can start its execution and subsequently produces token instances according to its job definition. Since the provided (output) DataTokens of "j004" are of "multiple token" type, the job execution can produce several tokens numbered "4" and "5". In our example, "je104" produces two sequences of two tokens which are shown in Fig. 12c. Tokens in a sequence are additionally numbered by the execution engine to keep track of token ordering (sequence numbering is denoted by numbers in squares; the final token in a sequence is denoted by an "f"). Figure 12d shows the status of task execution during the processing of token sequences produced by "je104". Based on the initiation rules for jobs "j005" and "j006", the execution engine starts several instances of these jobs ("je105", "je205", "je106" and "je206") and passes appropriate token instances to them. These new job executions start processing and finally provide tokens according to the definitions of jobs "j005" and "j006". As shown in Fig. 12e, the sequence numbers created by "je104" are maintained by the execution engine. After providing tokens on their output, job instances are terminated. Figure 12f illustrates one of the further steps in token processing. It shows the initiation of an instance of the job "j007". It can be noted that the MultiQueue component takes care of grouping token instances according to sequence numbering. Thus, job execution "je107" is created only after the arrival of tokens numbered "6" and "7" with the same sequence numbers (here: "2f"). Consecutive groups of matching tokens initiate consecutive job executions, which is illustrated in Fig. 12g. Figures 12h and 12i show the final steps in the example task execution.
Job executions "je107" and "je207" produce tokens numbered "3", still maintaining the additional sequence numbering started by "je104". At this point, we should note that the output DataToken of the job "j007" is typed "single token" and "single data" (token no. 3, see Fig. 6 again). At the same time, the output DataToken of batch "b002" (also no. 3) is typed "single token" and "multiple data". This means we need an additional job that will gather individual data items sent by the job executions of job "j007" and place them in a data folder. The additional job execution ("je208") is shown in Fig. 12i. It is equivalent to a job with an input DataToken typed as "multiple token" and "single data" and produces a single output token typed as "multiple data" (a folder).
Figure 12 is silent on the actual flow of data (files) which obviously follows the flow of tokens. The BalticLSC execution environment handles data transfer between external storage and the computation nodes on which job batches are executed. In our example, an appropriate copying job is executed when tokens no. 1 and 2 are delivered. It copies files specified by these tokens to the internal storage of the appropriate container holding the appropriate batch execution (here: "be102"). Since all the jobs in our first example are computed within one batch execution (within a single computation node), there is no need to copy any data.
The tokens simply pass pointers to appropriate data elements kept in the internal storage. Finally, when a token is produced by the job execution "je208", an appropriate copying job sends the resulting folder (holding files specified by the tokens with no. 3) to an appropriate output storage. This way, the user can access the results of computations.
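The grouping of token instances by sequence number, as described for job "j007" in the walk-through above, can be illustrated as follows. This is a simplified sketch and not the MultiQueue implementation: the class name, the flat string sequence labels and the payload values are hypothetical, and multi-level sequence identifiers are not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class SequenceGroupingQueue:
    """Collects token instances for one job and releases complete groups."""

    def __init__(self, required_tokens: Iterable[int]):
        self.required = frozenset(required_tokens)
        self.waiting: Dict[str, Dict[int, str]] = defaultdict(dict)

    def push(self, token_no: int, seq: str, payload: str) -> Optional[Dict[int, str]]:
        """Store a token; return the full group once all required tokens share seq."""
        if token_no not in self.required:
            raise ValueError(f"token no. {token_no} is not required by this job")
        group = self.waiting[seq]
        group[token_no] = payload
        if self.required.issubset(group):
            return self.waiting.pop(seq)  # this group triggers one job execution
        return None


# Job "j007" requires tokens no. 6 and no. 7 with matching sequence numbers.
queue = SequenceGroupingQueue({6, 7})
first = queue.push(6, "1", "video-1")      # waits for its partner
second = queue.push(6, "2f", "video-2")    # waits as well
third = queue.push(7, "2f", "subs-2")      # completes sequence "2f"
fourth = queue.push(7, "1", "subs-1")      # completes sequence "1"
```

As in Fig. 12f, the group for sequence "2f" can trigger a job execution before the group for sequence "1", because triggering depends only on which matching tokens have arrived.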
The issue of data transfer and job execution becomes more complex when a given task is executed in "weak" mode. This is illustrated in Fig. 13, which shows some key steps in the execution of a task based on the program from Fig. 7. Figure 13a shows the situation where "je108" has produced tokens no. 4 and 5, and appropriate jobs are being created. In this mode, this causes the creation of another batch execution which can be assigned to a different (possibly geographically distant) computation node. In Fig. 13b, we can notice the next step, in which another pair of tokens with a different sequence number ("2") is produced. This, in turn, causes yet another batch execution ("be204") to be created. Note that the execution environment makes sure to start the various executions of jobs "j009" and "j010" together in the same batches, keeping track of their sequence numbering. This is consistent with the CAL-Executable program in Fig. 7, which assures that jobs "j009", "j010" and "j011" are kept together.
Fig. 13 Example flow of tokens in a task execution ("weak" mode)
Figure 13c shows the situation where an execution of job "j011" is created in the same batch execution as the previously executed (and now terminated) executions of jobs "j009" and "j010". Finally, Fig. 13d shows the final step, where an additional copying job is created in a separate batch execution. Similarly to the previous example, all the tokens numbered "3" are directed to this new job. The potential distribution of batch executions between several computation nodes necessitates additional data transfer. Thus, the execution environment introduces additional copying jobs. It keeps track of the various batch executions and assures that appropriate data elements (files, folders) are copied between the containers holding these distributed batch executions.
Individual batch executions and contained job executions are assigned to specific computation nodes (or "cluster nodes"). This is done through the JobBroker component shown in Fig. 8. This component has access to the NetworkRegistry that holds information about available cluster nodes. When a new batch execution is to be started, the JobBroker checks the resources available in each node and compares them with the resources required by the batch execution (determined from the contained jobs). Following this, it sends a batch-starting message to a selected cluster node. Note that the algorithm for assigning batches to cluster nodes is out of the scope of this paper.
Each cluster node is equipped with an installation of a container execution environment (currently, the system supports Kubernetes and Docker Swarm) and mechanisms for managing job batches assigned for execution on the given cluster (see Fig. 14). Communication between the master node and the cluster nodes is based on DataToken instances. These tokens are passed through appropriate interfaces: the BalticNodeAPI and the BalticServerAPI. In addition, the BalticNodeAPI allows for passing appropriate messages for starting and terminating Job Batch and Job instances. This is managed on each cluster by the BatchManager components. All the batch/job instance initiation and termination actions are scheduled by the ClusterManager components that interact directly with appropriate container execution environments (Kubernetes or Docker Swarm). Ultimately, these mechanisms allow for the parallel execution of containerised job instances.
Each JobInstance container implements a computation module release (see Fig. 3). Each such module should be built as containerised software that receives the input data, performs the task, and sends the output data. However, there are no restrictions on what technologies (operating systems, programming languages, frameworks, etc.) are used. Communication with the BalticLSC Environment has to be done using the above-mentioned APIs. As described above, the communication between modules and the system is done using data tokens. Thus, a module should implement and use several predefined methods as REST API endpoints and read the appropriate configuration data from environment variables.
There are just two methods to be implemented by a module:
• ProcessTokenMessage - accepts a message containing an input data token and responds with a simple integer denoting the initial status of token validation;
• GetStatus - responds with an appropriate job status object denoting the current status of data processing.
Moreover, there are just two methods to be used by a module:
• PutTokenMessage - sends a message containing an output data token and receives a simple integer response;
• AckTokenMessages - sends a special message acknowledging the completion of an entire computation execution.
In short, the code of the computation module should comply with the following life cycle.
5. When the computation execution ends, send an acknowledgement message using the AckTokenMessages endpoint.
To make the development of the modules easier, we provide templates for C# and Python, which hide all the technical details related to communication through the REST API and storing data in remote storage. More information about the development of computation modules can be found in the technical documentation in the "Download" section of the BalticLSC website.
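To make the module interface described above more concrete, the following sketch models the four methods as plain Python methods and callables. It is not one of the provided templates and does not use the actual REST API: only the four method names come from the text, while the message fields (`pin_name`, `values`), the status strings, the return codes and the trivial "computation" (upper-casing a string) are hypothetical placeholders.

```python
from typing import Callable, Dict, List


class ComputationModule:
    """In-memory stand-in for a module that would normally expose REST endpoints."""

    def __init__(self,
                 put_token_message: Callable[[Dict], object],
                 ack_token_messages: Callable[[List[str]], object]):
        self._put = put_token_message      # stands in for PutTokenMessage
        self._ack = ack_token_messages     # stands in for AckTokenMessages
        self._inputs: List[Dict] = []
        self._status = "Idle"

    def process_token_message(self, message: Dict) -> int:
        """ProcessTokenMessage: validate an input token; 0 means accepted here."""
        if "pin_name" not in message or "values" not in message:
            return 1
        self._inputs.append(message)
        self._status = "Working"
        return 0

    def get_status(self) -> Dict:
        """GetStatus: report the current state of data processing."""
        return {"status": self._status, "received": len(self._inputs)}

    def run(self) -> None:
        """Process the received tokens, send outputs, then acknowledge completion."""
        for message in self._inputs:
            self._put({"pin_name": "Output", "values": message["values"].upper()})
        self._ack([m["pin_name"] for m in self._inputs])
        self._status = "Completed"


sent: List[Dict] = []
acked: List[List[str]] = []
module = ComputationModule(sent.append, acked.append)
code = module.process_token_message({"pin_name": "Input", "values": "abc"})
module.run()
```

Injecting the two outgoing calls as callables keeps the sketch independent of any HTTP library; in a real module they would be replaced by requests to the BalticLSC APIs.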

Case Study Example -Waste Collection Logistics Optimisation
To show the applicability of CAL to solving various computation problems, we present a real-life case study. The aim of the case study is to show how the BalticLSC Environment and CAL can be used to handle non-trivial large-scale computing tasks. The emphasis is on CAL's ability to combine standalone computation modules into a single app and to reuse the built apps and modules for other tasks.
The case study involves a company that develops a software component that optimises routes of waste collection vehicles. For the given set of customers, the vehicle fleet, and the waste fields, we need to compute an optimal (as short as possible) route. The vehicle capacity and the customer demand are considered while doing the optimisation. This results in solving the Capacitated Vehicle Routing Problem (CVRP) [61]. Since such computation problems are NP-hard, they take significant time and computation resources.
The task can be split into three main steps: 1) The coordinates (latitude and longitude) have to be found for the set of geographical objects (clients, vehicle depots, waste fields) given as addresses. This is called geocoding. 2) The distance matrix containing the road distances between all geographical objects in the task has to be computed. 3) Optimisation of the route has to be performed considering the capacity of vehicles, client demand and road distances between clients, waste fields, and depots.
The data model of modules' inputs and outputs is depicted in Fig. 15. There are two main classes (besides road maps) of objects that the modules are operating on. The first is the XGeoWasteLogisticsObject class. Instances of this class contain domain-specific (e.g., capacities of vehicles and amount of clients' demand) and geographical information (addresses). They are used as the input to the application we have built for the use case. The second is the XLocation class which describes the geographical location -longitude and latitude. The XDistance class instances refer to these objects but contain the actual road distance between these objects. Information on actual distances and coordinates is used internally by the computation application.
For each step, an independent computation module has been built according to the description in the previous section.
1) Geo Coder. The module requires a list of objects with addresses (XAddressable instances) and provides a list of coordinates (XLocationObject instances) for these objects. The module uses an external service - the OpenCage GeoCoding API (https://opencagedata.com/api).
2) Geo Router. The module finds the shortest distance between all given locations using the road network given by the map of the region where the locations are situated. Thus, the Geo Router module requires the list of locations (e.g., XLocationObject instances provided by the Geo Coder) and an OpenStreetMap file describing the region. The Geo Router provides a list of distances between objects (XDistance instances). This, in fact, forms the distance matrix for these objects. The Geo Router uses the open-source routing engine GraphHopper (https://www.graphhopper.com/).
3) Geo Waste Logistics Optimizer. The module optimises the route for the given set of vehicles, customers, and waste fields. Thus, the module requires a list of the mentioned objects (XGeoWasteLogisticsObject instances) and the distance matrix (distances between all the objects described by XDistance instances) for these objects. It provides a list of optimised routes - sequences of the customers that the vehicles should visit in the given order. The module uses the open-source optimisation engine OptaPlanner (https://www.optaplanner.org/).
Fig. 15 Waste Collection Logistics Optimisation - data model
Firstly, we build a computation application, the Distance Matrix Calculator (see Fig. 16), that computes the distance matrix for the given set of addressable objects and the road map of the region in which these objects are located. This app can be used independently of this use case whenever a distance matrix is needed. There are two required (input) data pins. The "Objects with addresses" data pin is of "multiple data" type, while the "Map" data pin is "single data". Thus, the app receives a single map file and multiple data files containing the objects' information. The usage of the "multiple data" pin allows the BalticLSC System to split data and parallelize the execution of the Geo Coder module across multiple job instances. Unlike in the previous examples, where explicit splitting and merging computation modules provided the possibility of concurrent execution of job instances, here the splitting is hidden behind the mismatch of data multiplicity of data pins on the opposite sides of the data flow. Since the Geo Coder module processes just one data item (token) at a time, the execution of the module (regardless of whether concurrent instances are present or not) produces a sequence of new data items (tokens) on the provided (output) data pin of the module. Thus, the next module, Geo Router, has a required "multiple token, single data" pin called "point_list". It collects all the produced coordinates and, in fact, acts as a merger. The Geo Router also requires a map passed straight from the declared data pin. The module produces a list of distances - the distance matrix passed to the app's provided declared data pin called "Matrix".
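The data flow of the Distance Matrix Calculator (items of a "multiple data" input processed one at a time, possibly concurrently, and then merged by the next module) can be sketched as below. All specifics are stand-ins chosen for illustration: the coordinates are made up, a lookup table replaces the external geocoding service, and great-circle (haversine) distance replaces the road distances that the actual Geo Router obtains from a routing engine and a map.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Coord = Tuple[float, float]          # (latitude, longitude) in degrees
Point = Tuple[str, Coord]            # (object name, coordinates)

# Made-up coordinates standing in for an external geocoding service.
ADDRESS_BOOK: Dict[str, Coord] = {
    "Depot": (54.35, 18.65),
    "Client A": (54.37, 18.61),
    "Waste field": (54.32, 18.70),
}


def geo_coder(name: str) -> Point:
    """Process a single data item: one addressable object in, one location out."""
    return name, ADDRESS_BOOK[name]


def haversine_km(p: Coord, q: Coord) -> float:
    """Great-circle distance; a stand-in for road distance."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def geo_router(points: List[Point]) -> Dict[Tuple[str, str], float]:
    """Merge all locations and compute the full distance matrix."""
    return {(a, b): haversine_km(pa, pb) for a, pa in points for b, pb in points}


def distance_matrix_calculator(objects: List[str]) -> Dict[Tuple[str, str], float]:
    # Each object is geocoded by a separate (possibly concurrent) call,
    # mirroring one job instance per data item.
    with ThreadPoolExecutor() as pool:
        points = list(pool.map(geo_coder, objects))
    # The router acts as the merger: it needs all points before it can start.
    return geo_router(points)


matrix = distance_matrix_calculator(list(ADDRESS_BOOK))
```

The thread pool only imitates the concurrency that the execution environment provides through separate job instances; the point of the sketch is that the merge step cannot begin until the whole sequence of geocoded items has arrived.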
Secondly, we reuse the distance matrix calculation app and build the Waste Collection Logistics Optimisation Application (see Fig. 17). The app has three required data pins (inputs). Two are needed to pass the data to the Distance Matrix Calculator ("Objects with addresses" and "Map"). The third pin is used to pass domain objects to the Geo Waste Logistics Optimizer module. In fact, it would be enough to have just one data pin instead of the "Objects with capacities" and "Objects with addresses" pins (they are copies of the same objects). However, due to the limitations of the data transformation capabilities of CAL, two pins are required. Note that the data multiplicities of both pins differ. The Geo Waste Logistics Optimizer module has two required data pins. The "AllGeoObjects" data pin receives a single file containing all the domain objects, while the "DistanceMatrix" data pin receives a single file containing the distance list between these objects.
The application and its individual modules have been implemented and tested and are currently available for reuse in the BalticLSC system. Through this case study, we have shown how computation modules can be reused for multiple computation applications and how a computation application can be reused in another computation application. The usage of the built modules is much broader than just the waste collection optimisation domain. Distance matrices, and distances between geographical locations in general, are needed in a wide range of domains related to transportation and logistics. Each of the modules is also reusable separately. Geo-coding, as well as geo-routing, can be useful for different purposes, e.g., for GIS analysis and cartography. Thus, thoughtfully chosen "bricks" (in the case of BalticLSC, the developed computation modules) can serve as building blocks for a wide range of possible computation applications in various domains.
CAL and its execution environment is a usable "glue" to make them work together in parallel without the need for a steep learning curve or deep knowledge of underlying technical details.

Conclusion and Future Work
The presented general-purpose language allows the definition of distributed and parallel computations in a visual, low-code way. It has simple graphical syntax and precise runtime semantics. The language implementation comprises an online graphical editor and a comprehensive execution environment (BalticLSC). An important characteristic of CAL is that its programs operate at a high level of abstraction, where the fundamental entities are reusable computation units. Synchronisation of computations is based on flowing data that arrive at the inputs of specific unit instances. The flow of computations in a CAL application is controlled by data produced by the computation units. This characteristic of CAL allows for the automatic distribution of computations and optimisation of data transfer between computation nodes.
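The data-driven synchronisation summarised above can be illustrated with a small Python sketch. It is a deliberately simplified model, not the formal CAL semantics: the `Module` class, its pin queues and the rule "fire when every required pin holds a token" are assumptions chosen for illustration, and token and data multiplicities are ignored.

```python
from collections import deque

class Module:
    """Hypothetical computation unit with named required (input) pins."""

    def __init__(self, name, required_pins, compute):
        self.name = name
        # One FIFO queue of waiting tokens per required pin.
        self.queues = {pin: deque() for pin in required_pins}
        self.compute = compute

    def put(self, pin, token):
        """A token arrives on one of the required pins."""
        self.queues[pin].append(token)

    def ready(self):
        """An instance can start only when every required pin holds a token."""
        return all(self.queues[pin] for pin in self.queues)

    def fire(self):
        """Consume one token per pin and run the computation on them."""
        inputs = {pin: queue.popleft() for pin, queue in self.queues.items()}
        return self.compute(**inputs)

adder = Module("adder", ["left", "right"], lambda left, right: left + right)
adder.put("left", 2)
ready_with_one_token = adder.ready()   # only one of two pins has data
adder.put("right", 3)

results = []
while adder.ready():
    results.append(adder.fire())
```

No explicit control flow links the units in this model: the arrival of the second token is the only event that makes the computation start.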
Developing a CAL-based Large Scale Computing application requires programming at two distinct levels. The first level uses a visual language that people can use without advanced knowledge of distributed programming and parallelization. At this level, the programmer concentrates on the actual computation algorithm in terms of high-level computation steps and data flowing between these steps. The second level is the development of computation modules. This requires typical programming and technical skills but does not require advanced parallel and distributed programming. The developed modules can be reused easily within CAL programs, thus avoiding code duplication.
From the perspective of the CAL user, the programming task consists in selecting and reusing "computation blocks", and then defining data flows between these blocks. CAL programmers can reflect the flow of data between different computation steps in a natural way. At the same time, the execution environment allows for easy management and structuring of the user's data sets, which reflect specific data types. Additionally, the "computation block" structure facilitates the reusability of code. The CAL programmer can reuse computation modules, use entire computation applications as computation modules in new applications, and even reuse data sets between different computation tasks. On the other hand, module developers can easily reuse existing software, e.g. in the form of an existing library, by incorporating it directly into a computation module. This makes the solution available to every CAL programmer in the future.
Thanks to its data-flow orientation and the online execution environment, CAL abstracts away all technicalities associated with parallelization and distribution of computing (even across many computation clusters; cf. batches). Additionally, the data-flow orientation of the language allows for easy "serial" parallelization of computations by automatically processing multiple tokens simultaneously without the prerequisite of parallelization knowledge from the end-user. Therefore, the end-user can focus on the complexity of data, its dependencies and processing without manually managing computation parallelization and orchestration.
It can be noted that the individual computation modules (processing steps) can be designed with varying levels of granularity. However, one must remember that a computation module is implemented as a container. Thus, this granularity should be a manageable size. This also influences the granularity of parallel processing. It is controlled mainly by the data flowing between computation module instances (jobs). Thus, the proper design of data pins (varying token and data multiplicities) is crucial for allowing the BalticLSC Environment to decide the number of parallel workers to be launched for the same job depending on the availability of computation resources.
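As a rough illustration of how the number of parallel workers might depend on both the waiting tokens and the available resources, consider the following Python sketch. The function `plan_workers`, its parameters and the "minimum of tokens and free slots" policy are hypothetical; the actual brokerage logic of the BalticLSC Environment is not described here and may differ.

```python
def plan_workers(pending_tokens, free_slots, max_per_job=None):
    """Hypothetical policy: at most one worker per waiting token,
    and never more workers than the currently free resource slots."""
    if pending_tokens <= 0 or free_slots <= 0:
        return 0
    workers = min(pending_tokens, free_slots)
    if max_per_job is not None:
        # Optional cap, e.g. to keep one job from occupying a whole cluster.
        workers = min(workers, max_per_job)
    return workers

# Ten tokens waiting on a "multiple data" pin, four free slots.
limited_by_resources = plan_workers(10, 4)
# Two tokens waiting, eight free slots.
limited_by_tokens = plan_workers(2, 8)
# Ten tokens, eight free slots, explicit per-job cap of three.
limited_by_cap = plan_workers(10, 8, max_per_job=3)
```

Under this assumed policy, coarse-grained pins that deliver few tokens limit parallelism regardless of how many resources are free, which is why the pin design matters.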
The granularity of processing is also related to the performance of computations. We have not done specific performance tests for the CAL implementation. This is because the overall performance is determined by the performance of computation nodes, the efficiency of the computation modules code, containerisation environments, and data transfer to/from the nodes. Execution logs collected in case studies show that the times used for diagram translation and job brokerage are minimal compared to the processing times and have little or no impact on the total execution time. Thus, the graphical nature of CAL does not significantly influence the performance of computations.
Another important related issue is the performance of application development. The built-in reusability of code in the form of computation modules and automatic computation orchestration has a significant potential to reduce the work required to develop computation software, similar to other low-code solutions. The speed-up should be much higher when many required computation modules are already available on the BalticLSC platform. To foster this goal, CAL can be enhanced by adding automatic transformation of data between modules, allowing for greater flexibility and reusability of existing computation modules.
The future research agenda will be mainly based on the development of further computation modules and improving the usability of CAL. This would allow validating the reduction of effort when using the CAL environment with a significant portfolio of reusable computation modules. To conduct such validation, the research agenda would include experimental work comprising controlled experiments comparing developers' performance using CAL and traditional programming models. Current results are promising but are based on anecdotal evidence. CAL has already been used in several industrial applications and several student projects (including Master's degree projects). All these examples show the usability of the language and the relative ease of developing computation modules based on existing computation libraries. However, this needs to be systematically analysed, which will be the subject of future research and publications.
Another interesting issue related to CAL that is worth researching is the enhancement of CAL's flexibility regarding the execution environment. This would consist in shifting from language interpretation (as in BalticLSC) to its compilation. This would allow the generation of "autonomous workflows" according to the choreography approach, similar to that proposed by the Flowbster system [47]. Compilation of CAL programs into other workflow specification formats would also allow for integration with other distributed computation ecosystems (like Galaxy, WS-PGRADE or Flowbster), similar to that proposed by GeWWE [48].
In summary, in the future, we would like to enhance and validate CAL's potential to significantly foster the use of Large Scale Computations. This is especially important in the current world where programming and especially advanced specialised programming skills are scarce on the market. Based on our current results, we can claim that CAL limits the knowledge and experience needed to develop distributed applications. Instead, it allows the language end-users to focus on the complexity of the data they are working with, easing the parallelization and orchestration process. In effect, we demonstrate how the low-code approach can be used to define and execute workflows in distributed computing.

Author contributions Kamil Rybiński and Michał Śmiałek are the primary authors of the language, have contributed to its syntax and defined its semantics. Agris Sostaks has contributed to the language syntax and semantics and has prepared the syntax descriptions and the case study example. Krzysztof Marek has contributed to language usability and has participated in formulating textual content. Radosław Roszczyk and Marek Wdowiak have contributed to language implementation and have participated in formulating textual content. All authors read and approved the final manuscript.
Funding This work is partially funded by the European Regional Development Fund, Interreg Baltic Sea Region Programme, project BalticLSC #R075.

Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.