In this chapter, we first present Zoo, a subsystem we originally developed for sharing OCaml scripts. We introduce how it is used and how it is designed. Based on this system, we then discuss the problem of composing and deploying computation services in a numerical library.

9.1 Script Sharing with Zoo

The core functionality of Zoo is simple: sharing OCaml scripts. OCaml can be used as a scripting language much like Python, at a certain performance cost, because the code is compiled into bytecode. Even though compiling to native code is recommended for production use, scripting remains useful and convenient, especially for light deployment and fast prototyping. In fact, the performance penalty in most Owl scripts is almost unnoticeable because the heaviest numerical computation is still offloaded to Owl, which runs native code.

While designing Owl, our goal is always to make the whole ecosystem open, flexible, and extensible. Programmers can make their own “small” scripts and share them with others conveniently, so they do not have to wait for such functions to be implemented in Owl’s master branch or submit something “heavy” to OPAM.

9.1.1 Example

To illustrate how to use Zoo, let’s start with a simple synthetic scenario. Alice is a data analyst who uses Owl in her daily job. One day, she realized that the functions she needed had not yet been implemented in Owl. She therefore spent an hour at her computer implementing these functions herself. Thinking that they might be useful to others, for example, her colleague Bob, she decided to share them using the Zoo system. Now let’s see, step by step, how Alice manages to do so.

First, Alice needs to create a folder (e.g., myscript folder) for her shared script. What to put in the folder then? She needs at least two files in this folder. The first one is of course the file (i.e., coolmodule.ml) implementing the function as follows. The function sqr_magic returns the square of a magic matrix; it is quite useless in reality but serves as an example here.

#!/usr/bin/env owl
open Owl

let sqr_magic n = Mat.(magic n |> sqr)

The second file she needs is a #readme.md, which provides a brief description of the shared script. Note that the first line of #readme.md will be used as a short description of the shared script. This short description is displayed when you use the owl -list command to list all the available Zoo code snippets on your computer.

Square of Magic Matrix

`Coolmodule` implements a function to generate the square of magic matrices.

Second, Alice needs to distribute the files in the myscript folder. Distribution is done via Gist, so you must have gist installed on your computer. For example, on macOS you can install it with brew install gist. Owl provides a simple command-line tool to upload Zoo code snippets. Note that you need to be logged in to your GitHub account for gist and git.

owl -upload myscript

The owl -upload command simply uploads all the files in myscript as a bundle to your Gist page. The command also prints out the URL after a successful upload. The bundle Alice just uploaded is assigned a unique id, namely, 9f0892ab2b96f81baacd7322d73a4b08. To use the sqr_magic function, Bob only needs the #zoo directive in his script, for example, bob.ml, to import the function.

#!/usr/bin/env owl
#zoo "9f0892ab2b96f81baacd7322d73a4b08"

let _ = Coolmodule.sqr_magic 4 |> Owl.Mat.print

Bob’s script is very simple, but there are a couple of things worth pointing out:

  • The Zoo system will automatically download the bundle of a given id if it is not cached locally.

  • All the .ml files in the bundle will be imported as modules, so you need to use Coolmodule.sqr_magic to access the function.

  • You may also want to run chmod +x bob.ml to make the script executable. This will be obvious to heavy terminal users.

Note that to use the #zoo directive in a REPL such as utop, you need to manually load the owl-zoo library with #require "owl-zoo";;. Alternatively, you can load owl-top using #require "owl-top";;, which is an OCaml top-level wrapper of Owl. If you want utop to load the library automatically, add this line to ~/.ocamlinit.
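For example, assuming ocamlfind’s topfind mechanism is available in the top level (the standard setup when Owl is installed via OPAM), the lines to add to ~/.ocamlinit would be:

```ocaml
(* In ~/.ocamlinit: load ocamlfind support, then Owl's top-level wrapper. *)
#use "topfind";;
#require "owl-top";;
```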

9.1.2 Version Control

Alice has modified and uploaded her scripts several times. Each version of her code is assigned a unique version id. Different versions of code may work differently, so how could Bob specify which version to use? The good news is that he barely needs to change his code.

#!/usr/bin/env owl
#zoo "9f0892ab2b96f81baacd7322d73a4b08?vid=71261b317cd730a4dbfb0ffeded02b10fcaa5948"

let _ = Coolmodule.sqr_magic 4 |> Owl.Mat.print

The only thing he needs to add is a version id using the parameter vid. The naming scheme of Zoo is designed to be similar to the field-value pairs in a RESTful query. The version id can be obtained from a gist’s revisions page.

Besides specifying a version, it is also quite possible that Bob prefers to use the newest version Alice provides, whatever its id may be. The question then is how often Bob needs to contact the Gist server to retrieve the version information. Every time he runs his code? That may not be a good idea in many cases, considering the communication overhead and response time. Zoo caches gists locally and tends to use the cached code and data rather than downloading them every time.

To solve this problem, Zoo provides another parameter in the naming scheme: tol. It is a threshold, in seconds, on how long a gist may live in the local cache before it is considered outdated. Any gist that has existed in a user’s local cache for longer than tol seconds is deemed outdated, and the latest vid information must be fetched from the Gist server before the gist is used. For example:

#!/usr/bin/env owl
#zoo "9f0892ab2b96f81baacd7322d73a4b08?tol=300"

let _ = Coolmodule.sqr_magic 4 |> Owl.Mat.print

By setting the tol parameter to 300, Bob indicates that if Zoo has already fetched the version information of this gist from the remote server within the past 300 seconds, it should keep using the local cache; otherwise, it contacts the Gist server to check if a newer version has been pushed. If so, the newest version is downloaded to the local cache before being used. If Bob doesn’t want to miss a single update of Alice’s gist code, he can simply set tol to 0, which means fetching the version information every time he executes his code. The vid and tol parameters enable users to have fine-grained version control of Zoo gists. Of course, these two parameters should not be used together; when vid is set in a name, the tol parameter is ignored. If neither is set, as in the previous code snippet, Zoo uses the latest locally cached version if it exists.
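The decision logic just described can be sketched as follows; all function and field names here are illustrative assumptions, not Zoo’s actual implementation (the sketch also assumes the unix library for timestamps):

```ocaml
(* Stubs standing in for Zoo's actual cache and network operations. *)
let cached_vid _gid = "vid-from-local-cache"
let fetch_latest_vid _gid = "vid-from-gist-server"

(* A gist is fresh if its version info was fetched within tol seconds. *)
let is_fresh ~last_fetch_time ~tol =
  Unix.time () -. last_fetch_time <= float_of_int tol

(* vid takes precedence over tol; with no vid, a fresh cache is used
   as-is, otherwise the Gist server is contacted for the newest vid. *)
let resolve ?vid ~tol ~last_fetch_time gid =
  match vid with
  | Some v -> v
  | None ->
      if is_fresh ~last_fetch_time ~tol then cached_vid gid
      else fetch_latest_vid gid
```

Setting tol to 0 makes is_fresh false on every run, which matches the “always fetch” behavior described above.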

A user can either choose a specific version id or use the latest version, which means the newest version in the local cache. Obviously, using latest introduces cache inconsistency: the latest version on one machine might not be the same as on another. To stay up to date with the Gist server, the download time of the latest version on a local machine is saved as metadata, and the newest version on the server is pulled into the local cache after a certain period of time if the latest flag is set in the Gist name. Ideally, every published service should pin a specific version id, and latest should only be used during development.

9.2 Service Deployment and Composition

Based on the Zoo system, in the rest of this chapter, we discuss the problem of deploying and composing computation services. First, let’s briefly present some background. Recently, computation on edge and mobile devices has seen rapid growth in both industry and academia, for example, personal data analytics in the home, DNN applications on a tiny stick, and semantic search and recommendation in a web browser [53]. HUAWEI has identified the speed and responsiveness of native AI processing on mobile devices as key to a new era of smartphone innovation. Many challenges arise when moving machine learning (ML) analytics from the cloud to edge devices.

One problem is not yet well defined and investigated: model composition. Training a model often requires large datasets and rich computing resources, which are often not available to normal users. That is one of the reasons that they are bound to the models and services provided by large companies. To this end, we propose the idea of composable service. Its basic idea is that many services can be constructed from basic ones such as image recognition, speech-to-text, and recommendation to meet new application requirements. Modularity and composition will be the key to increasing usage of ML-based data analytics.

Composing components into a more complex entity is not uncommon in computer science. One such example is the composition of web services. A web service is a software application that is identified by a URI and supports machine-to-machine interaction over a network. Messages in formats such as XML and JSON are transferred among web services according to their prescribed interfaces. The potential of the web service architecture lies in the fact that developers can compose multiple services and build a larger application via the network. One problem in web service composition is to select proper participant services so that they can work together properly. A lot of research effort has gone into composition methods that consider information such as interfaces, message types, and the dynamic message sequences exchanged.

A similar paradigm is the microservices architecture. With this architecture, a large monolithic software application is decomposed into small components, each with distinct functionalities. These components can communicate with each other via predefined APIs. This approach provides numerous benefits, such as module reusability, service scalability, and fault isolation. Many companies, such as Netflix, have successfully adopted it. In the composition of different microservices, the application API plays a key role. Another field that advocates the composition approach is serverless computing, where stateless functions can be composed into more complex ones. Based on the observation that existing serverless systems spend a large portion of time booting function containers and on interaction between functions, the SAND system investigates the combination of different functions. By proposing application-level sandboxing and a hierarchical message bus, this system reduces latency and improves resource utilization.

In this chapter, as a contribution, the Zoo system provides a small domain-specific language (DSL) to enable the composition of advanced data analytics services. Benefiting from OCaml’s powerful type system, the Zoo provides type checking for the composition. Besides, the Zoo DSL supports fine-grained version control in composing different services provided by different developers, since the code of these services may be in constant change.

Another challenge in conducting ML-based data analytics on edge devices is the deployment of data analytics services. Most existing machine learning frameworks, such as TensorFlow and Caffe, focus mainly on the training of analytics models. On the other hand, end users, many of whom are not ML professionals, mainly use trained models to perform inference. This gap between the current ML systems and users’ requirements is growing.

The deployment of services is close to the idea of model serving. The Clipper [13] serving system is used for ML model–based prediction, and it features choosing the model with the lowest latency among models from multiple ML frameworks. It enables users to access models based on multiple machine learning frameworks, implemented in the form of containers. Compared with Clipper, TensorFlow Serving focuses on using TensorFlow itself as the model execution framework. The models are in the SavedModel format, and they can be deployed as a container that contains TensorFlow to serve prediction requests. Another field that employs the idea of service deployment is serverless computing. In serverless platforms such as Amazon Lambda and OpenLambda, which utilize the powerful ecosystems of existing cloud providers, the stateless functions provided by users can be deployed on different types of devices to access resources such as databases and cloud files. In this respect, as a contribution, the Zoo DSL also supports deploying composed services to multiple backends: not only containers but also unikernels and JavaScript. We discussed these in Chapter 8.

9.3 System Design

Based on these basic functionalities, we extend the Zoo system to address the composition and deployment challenges. Specifically, we design a small DSL to enable script sharing, type-checked composition of different data analytics services with version control, and deployment of services to multiple backends. First, we would like to briefly introduce the workflow of Zoo as shown in Figure 9-1. The workflow consists of two parts: development on the left side and deployment on the right.

Figure 9-1. Zoo system architecture: a workflow diagram spanning development (Owl Gists and services) and deployment (published models, deployed services, and end users)

Development concerns the design of interaction workflow and the computational functions of different services. One basic component is the Gist. By using Zoo, a normal Gist script will be loaded as a module in OCaml. To compose functionalities from different Gists only requires a developer to add one configuration file to each Gist. This file is in JSON format. It consists of one or more name-value pairs. Each pair is a signature for a function the script developer wants to expose as a service. These Gists can be imported and composed to make new services. When a user is satisfied with the result, they can save the new service as another Zoo Gist.
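The source does not give the exact schema of this JSON configuration file; a plausible sketch, with made-up function names and signatures, might look like:

```json
{
  "infer": "png_img -> en_text",
  "seg": "png_img -> png_img"
}
```

Each name-value pair exposes one function from the Gist as a service, keyed by its name and annotated with its type signature.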

Deployment takes a Gist and creates models in different backends. These models can be published and deployed to edge devices. It is separated from the logic of development. Basic services and composed ones are treated equally. Besides, users can move services from being local to remote and vice versa, without changing the structure of the constructed service. Deployment is not limited to edge devices, but can also be on cloud servers, or a hybrid of both cases, to minimize the data revealed to the cloud and the associated communication costs. Thus, by this design, a data analytics service can easily be distributed to multiple devices. In the rest of this section, we will elaborate on the design and give details of different parts of this workflow.

9.3.1 Service

Gist is a core abstraction in Zoo. It is the center of code sharing. However, to compose multiple analytics snippets, Gist alone is insufficient. For example, it cannot express the structure of how different pieces of code are composed together. Therefore, we introduce another abstraction: service.

A service consists of three parts: Gists, types, and the dependency graph. Gists is the list of Gist ids this service requires. Types are the parameter types of this service; any service has zero or more input parameters and one output, a design that follows that of an OCaml function. The dependency graph is a graph structure that contains information about how the service is composed. Each node in it represents a function from a Gist and contains the Gist’s name, its id, and the number of parameters of this function.
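The three parts of a service might be modeled as the following OCaml records; the field names are illustrative, not Zoo’s actual definitions:

```ocaml
(* One node of the dependency graph: a function drawn from a Gist. *)
type node = {
  gist_name : string;  (* name of the function's Gist *)
  gist_id   : string;  (* the 32-character Gist id *)
  n_params  : int;     (* number of parameters of this function *)
}

(* A service: required Gists, parameter types, and the dependency graph
   (simplified here to a list of nodes). *)
type service = {
  gists : string list;  (* ids of the Gists this service requires *)
  types : string list;  (* input parameter types plus the output type *)
  graph : node list;
}
```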

Zoo provides three core operations on a service: create, compose, and publish. The create_service operation creates a dictionary of services given a Gist id. It reads the service configuration file from that Gist and creates a service for each function specified in the configuration file. The compose_service operation provides a series of operations to combine multiple services into a new service. A compose operation performs type checking by comparing the “types” fields of two services; an error is raised if incompatible services are composed. A composed service can be saved to a new Gist or be used for further composition. The publish_service operation converts a service’s code into forms that can be readily used by end users. Zoo is designed to support multiple backends for these publication forms; currently, it targets Docker containers, JavaScript, and MirageOS [37].
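Under the assumption that create_service returns a Hashtbl keyed by service name and that the other two operations take services directly (the source does not give their exact signatures), a session might be sketched as:

```ocaml
(* Hypothetical sketch only; Zoo's actual signatures may differ, and the
   service names "f" and "g" are made up for illustration. *)
let () =
  let services = create_service "9f0892ab2b96f81baacd7322d73a4b08" in
  let f = Hashtbl.find services "f" in
  let g = Hashtbl.find services "g" in
  (* compose_service type-checks f's output against g's input *)
  let s = compose_service [f] g in
  (* publish to a backend, e.g., a Docker container *)
  ignore (publish_service s)
```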

9.3.2 Type Checking

As mentioned in Section 9.3, one of the most important tasks of service composition is to make sure the types match. For example, suppose there is an image analytics service that takes a PNG image as input; if we connect to it another service that produces a JPEG image, the resulting composition will only generate meaningless output because of the data type mismatch. OCaml provides primitive types such as integer, float, string, and Boolean, and the core data structure of Owl is the ndarray. However, all these types are insufficient for the high-level service type checking just described. That motivates us to derive richer high-level types.

To support this, we use generalized algebraic data types (GADTs) in OCaml. There already exist several model collections on different platforms, for example, Caffe and MXNet. We observe that most currently popular deep learning models can generally be categorized into three fundamental types: image, text, and voice. Based on them, we define subtypes for each: PNG and JPEG images, French and English text, and French and English voice, that is, the png_img, jpeg_img, fr_text, en_text, fr_voice, and en_voice types. More can easily be added in Zoo. Type checking in OCaml then ensures type-safe and meaningful composition of high-level deep learning services.
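A minimal sketch of how such high-level types could be encoded with GADTs follows; the constructor names match the types listed above, but the actual encoding in Zoo may differ:

```ocaml
(* Phantom base types for the three fundamental categories. *)
type img
type text
type voice

(* Each constructor is indexed by its base category; comparing the
   constructors themselves still distinguishes Png_img from Jpeg_img,
   so a PNG consumer cannot be fed by a JPEG producer. *)
type _ stype =
  | Png_img  : img stype
  | Jpeg_img : img stype
  | Fr_text  : text stype
  | En_text  : text stype
  | Fr_voice : voice stype
  | En_voice : voice stype
```

With this encoding, the compiler rejects at compile time any composition whose input and output categories disagree.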

9.3.3 DSL

Zoo provides a minimal DSL for service composition and deployment.

Composition: To acquire services from a Gist of id gid, we use $gid to create a dictionary, which maps service name strings to services. We implement the dictionary data structure using Hashtbl in OCaml. The # operator is overloaded to represent the “get item” operation. Therefore

$gid#sname

can be used to get the service named “sname.” Now suppose we have n services f1, f2, …, fn, whose outputs are of types tf1, tf2, …, tfn. Each service s accepts m_s input parameters, of types t_s^1, t_s^2, …, t_s^{m_s}. There is also a service g that takes n inputs, of types t_g^1, t_g^2, …, t_g^n, and produces output of type t_o. Here, Zoo provides the $> operator to compose a list of services with another:

[f1; f2; …; fn] $> g

This operation returns a new service that has m_1 + m_2 + … + m_n inputs and output type t_o. It performs type checking to make sure that t_{f_i} = t_g^i for all i in 1, 2, …, n.

Deployment: Taking a service s, be it a basic or composed one, it can be deployed using the following syntax:

s $@ backend

The $@ operator publishes services to a certain backend. It returns a string of URI of the resources to be deployed.

Note that the $> operator leads to a tree structure, which is in most cases sufficient for real-world service deployment. A more general operation would support a graph structure; this is left as future work.

9.3.4 Service Discovery

The services require a service discovery mechanism. For simplicity’s sake, each newly published service is added to a public record hosted on a server. The record is a list of items, each of which contains the following:

  • A Gist id that the service is based on

  • A one-line description of this service

  • A string representing the input and output types of this service, such as “image → int → string → text”

  • A service URI; for the container deployment, the URI is a Docker Hub link, and for the JavaScript backend, it is a URL link to the JavaScript file itself

The service discovery mechanism is implemented using an off-the-shelf database.
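An entry of this public record might be modeled as follows; the field names are illustrative, not a definition from the source:

```ocaml
(* One item in the public service discovery record. *)
type service_record = {
  gist_id     : string;  (* the Gist this service is based on *)
  description : string;  (* one-line description *)
  type_sig    : string;  (* e.g., "image -> int -> string -> text" *)
  uri         : string;  (* Docker Hub link or JavaScript file URL *)
}
```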

9.4 Use Case

To illustrate the preceding workflow, let us consider a synthetic scenario. Alice is a French data analyst. She knows how to use ML and DL models on existing platforms but is not an expert. Her recent work is about testing the performance of different image classification neural networks. To do that, she first needs to modify the image using the DNN-based Neural Style Transfer (NST) algorithm. NST takes two images and outputs a new image that is similar to the first image in content and to the second in style. This new image should be passed to an image classification DNN for inference. Finally, the classification result should be translated into French. She does not want to put academic-related information on Google’s servers, but she cannot find any single pretrained model that performs this series of tasks.

Here comes the Zoo system to help. Alice finds Gists that can do image recognition, NST, and translation separately. Even better, she can perform image segmentation to greatly improve the performance of NST using another Gist. All she has to provide is some simple code to generate the style images she needs to use. She can then assemble these parts together easily using Zoo.

open Zoo

(* Image classification *)
let s_img = $ "aa36e" # "infer";;
(* Image segmentation *)
let s_seg = $ "d79e9" # "seg";;
(* Neural style transfer *)
let s_nst = $ "6f28d" # "run";;
(* Translation from English to French *)
let s_trans = $ "7f32a" # "trans";;
(* Alice's own style image generation service *)
let s_style = $ alice_Gist_id # "image_gen";;

(* Compose services *)
let s = [s_seg; s_style] $> s_nst $> s_img $> s_trans;;

(* Publish to a new Docker image *)
let pub = (List.hd s) $@ (CONTAINER "alice/image_service:latest");;

Note that the Gist ids used in the code are shortened from 32 characters to 5 due to column width limits. Once Alice creates the new service and publishes it as a container, she can run it locally, send a request with image data to the deployed machine, and get the image classification result back in French.

9.5 Discussion

One thing to note is that, in service composition, type checking is a nice property to have, but not the only one. From web services to microservices, the industry and researchers have studied the composition issue for years. Besides checking the static information such as message types, interfaces, etc., sometimes the dynamic behavior between services should also be checked. It is the same in our data analytics services composition scenario.

For example, the Generative Adversarial Network (GAN) is a huge family of networks. A GAN consists of two parts: a generator and a discriminator. The generator tries its best to synthesize images based on existing parameters. The discriminator takes the images produced by the generator and tries its best to separate the generated data from true data, using a Boolean or percentage value. This mutual deception process is iterated until the discriminator can no longer tell the difference between generated and true data. Using Zoo, users may want to compose a generator with different discriminators to see which combination produces the most convincing fake images. To do this, matching the types of these two services is not enough; users also need to specify dynamic information such as the order and number of messages exchanged between them.

To solve this problem, some formalism may need to be introduced as a theoretical foundation to structure interaction and reason about communicating processes between services. One such option is session types [31]. Session types are a type discipline for communication-centric programming. Based on the π-calculus, their basic idea is that a communication protocol can be described as a type, which can be checked at runtime or statically. Session types have gained much attention recently and are already implemented in multiple languages, including OCaml. This approach could effectively enhance the type checking in Zoo and is a promising direction for future work.

9.6 Summary

In this chapter, we first introduced Zoo, a script-sharing tool in Owl, including its usage and design. Based on it, we explored two topics: service composition and deployment. Zoo provides a small DSL, benefiting from OCaml’s powerful type system, to enable type-checked composition of different data analytics services with version control, as well as deployment of services to multiple backends. A use case was presented to demonstrate the expressiveness of this DSL in composing advanced ML services such as image recognition and text translation. The Zoo DSL also enables deploying composed services to multiple backends, namely, containers, unikernels, and JavaScript; service deployment often requires choosing a suitable one.