1 Introduction and problem statement

Components offered by modern cloud vendors are often designed for standard web applications. In most cases, such an application consists of backend services, which implement business logic; a frontend web server, which communicates with users; and a data store, typically a relational database. When an application hosts content for multiple users, it is often recommended (or even required) to use a caching layer between the application backend and the data store. Caching is also a good fit for modern application architectures such as microservices [1], where a single user request might cascade into multiple calls between backend services. Each call introduces additional latency and degrades the overall user experience.

Modern caching solutions, often known as Data Grids, need to respond to high application demands and offer data storage within a distributed system without a single master node. This goal can be achieved by using consistent hashing for data partitioning, which allows data access with O(1) complexity [2]. Consistent hashing can also be used by a remote client application to calculate which node in a data grid cluster owns the data and to access that data via the shortest possible path (without being redirected from one node to another).
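To make the idea concrete, the following minimal Java sketch (illustrative only; real data grids such as Infinispan use a more elaborate, segment-based variant) shows how a client can place server addresses on a hash ring and compute the owner of a key locally, without asking any server:

    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import java.util.SortedMap;
    import java.util.TreeMap;
    import java.util.zip.CRC32;

    // Hypothetical illustration of client-side consistent hashing: each node is
    // placed on a hash ring, and a key is owned by the first node whose position
    // is >= hash(key), wrapping around at the end of the ring.
    public class ConsistentHashRing {

        private final SortedMap<Long, String> ring = new TreeMap<>();

        public ConsistentHashRing(List<String> nodes, int virtualNodesPerServer) {
            for (String node : nodes) {
                // Virtual nodes smooth out the key distribution between servers.
                for (int i = 0; i < virtualNodesPerServer; i++) {
                    ring.put(hash(node + "#" + i), node);
                }
            }
        }

        /** Returns the node that owns the given key, without contacting any server. */
        public String ownerOf(String key) {
            long h = hash(key);
            SortedMap<Long, String> tail = ring.tailMap(h);
            Long position = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
            return ring.get(position);
        }

        private static long hash(String value) {
            CRC32 crc = new CRC32(); // simple, stable hash used for illustration only
            crc.update(value.getBytes(StandardCharsets.UTF_8));
            return crc.getValue();
        }

        public static void main(String[] args) {
            ConsistentHashRing ring = new ConsistentHashRing(
                    List.of("10.0.5.5:11222", "10.0.6.5:11222", "10.0.7.5:11222"), 100);
            // The client can now send the request for "user:42" directly to its owner.
            System.out.println("key 'user:42' is owned by " + ring.ownerOf("user:42"));
        }
    }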

Each system needs to be deployed on appropriate hardware in order to provide value to users. Each system may have different demands, and IT departments inside companies might focus on different characteristics (e.g. time to market, or the time required to onboard a new development team member). Cloud computing fits these requirements well and has therefore gained popularity very quickly [3]. Enterprise applications very often require a hybrid approach between a private cloud (where all components are installed in a standard on-premise data center) and a public cloud (where all components are hosted by a cloud vendor). Those scenarios are especially challenging from a discovery and connectivity point of view. Some cloud platforms (such as Kubernetes [4] or OpenShift [5]) encapsulate all the traffic within the cloud, exposing a single point for accessing inner components (Fig. 1). Since most vendors target standard web applications which use the HTTP(S) protocol for communication with the client, the component responsible for ingress traffic is usually a reverse proxy rather than an edge router or an L4 (Transport Layer of the OSI model) load balancer. A reverse proxy usually offers more features related to the HTTP protocol (such as load balancing and content compression) than an edge router (which is focused on L3/Network routing) or an L4 load balancer (which uses transport layer information to make balancing decisions). This approach is very convenient from a security point of view, since a single component is responsible for both defending the system from unwanted traffic (the firewall) and handling encryption and decryption (SSL termination).

Fig. 1 Accessing application instances between clouds

Although this architecture makes a lot of sense for a typical application, some types of deployments need to consume topology information in order to optimize system performance. Common examples are high-efficiency data grids and gaming systems. Many video games use RTP (Real-time Transport Protocol), which was originally designed for delivering audio and video content, to communicate with one or more dedicated servers. Those systems can be viewed as using a client-based load balancing technique [6], where topology information is essential to fulfill all application requirements. Apart from caching and gaming, there are also a number of hybrid cloud applications where part of the cluster is hosted inside a public cloud and part of it is hosted within a traditional on-premises data center. This approach can sometimes be seen in quorum-based systems, where a majority of instances are hosted in the cloud for stability, while a minority are hosted within a data center to improve communication speed.

Enabling client-based load balancing requires achieving two major properties—discoverability and connectivity. Discoverability represents a way to identify topology information—in particular the number of server instances, exposed ports and IP addresses. This enables the client to configure the connection pool according to that information. Connectivity represents the ability to connect to those internal instances directly. It is also worth mentioning that both the caching and gaming industries have very high demands on application throughput and latency. Therefore, an ideal solution needs to have minimal negative impact on both of those characteristics.

The major contributions of this paper are:

  • The proposal and evaluation of solutions for exposing services hosted in the cloud to the outside world

  • A method for enabling client-side routing using a load balancer per application instance when the client application is hosted outside of the cloud

  • A discussion on benchmark results followed by a general recommendation for using different communication flows depending on the use case

This paper is structured as follows: Sect. 2 describes related work both conceptually and technically; Sect. 3 presents a number of proposed solutions and describes a set of requirements for evaluating a prototype; Sect. 4 evaluates a prototype selected from Sect. 3 according to the requirements and provides some implementation details as well as benchmarks; and Sect. 5 contains a discussion of results and planned future work.

2 Related work

Methods for exposing an application cluster deployed in the cloud to the outside world have been investigated since the early days of cloud computing. The “Cluster as a Service” [7] approach, based on the RVWS (Resources Via Web Service) framework and published in 2010, proposes an abstraction to expose a set of WSDL-based web services deployed across multiple cloud vendors. One of the most important characteristics of the framework is the ability to discover new clusters automatically. This is made possible by using a broker architecture which connects all exposed clusters. A “client” application, along with a “cloud provider”, specifies a required SLA (Service Level Agreement) for the client’s request or workflow (Fig. 2). An SLA is a type of contract between a cloud provider and a client that describes the performance and functional characteristics of a service.

Fig. 2 Cluster as a Service overview [7]

The RVWS framework uses a Publisher Service to expose cluster characteristics in WSDL format. Based on this information, the Broker can decide which cluster best meets client requirements (Fig. 3).

Fig. 3 Exposing a cluster using Publisher Service [7]

Even though the “CaaS” framework aims at different goals than those of this paper, it nevertheless solves both the discovery and connectivity problems within the cloud. The RVWS framework, which is a fundamental part of the system, focuses on batch processing tasks and WSDL-based web services, whereas this paper focuses on the high-performance, low-latency communication required by modern caching solutions and the gaming industry. To address this, the solutions presented in this paper utilize basic parts of a “Publisher Service” and offer direct communication with the cluster (similarly to “CaaS”). Since both solutions allow the client application to communicate directly with the cluster, the security concerns remain similar. However, WSDL-based solutions are usually based on the HTTP protocol rather than TCP connections with custom binary protocols (network protocols which use binary arrays to encode commands and data, often used in caching solutions). Since both edge routers and reverse proxies offer many more configuration options for HTTP, “CaaS” should be considered more secure than the solutions proposed in this paper.

Exposing a cluster within the cloud is also similar (to some extent) to multi-tenancy [8]. The distinction is that multi-tenancy is based on tenant recognition, whereas exposing a cluster requires recognizing a specific server (the one to which a given request should be redirected). For HTTP-based protocols there are a number of options, including cookie-based routing or using the Host HTTP header. However, high performance protocols are often limited to binary data carried over TCP or UDP transport. Even though the TCP protocol itself does not have any support for this kind of routing, encrypting traffic using Transport Layer Security (TLS) with Server Name Indication (SNI) allows the use of the HostName field [9], which contains a fully qualified server name. This approach has been successfully used for the multi-tenancy implementation shown in Fig. 4.
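As an illustration of the client side of such SNI-based routing, the minimal Java sketch below (host names are hypothetical) shows how a TLS client sets the SNI HostName field so that a terminating proxy can route the connection to the intended backend; the actual routing rules live in the proxy configuration and are not shown here.

    import java.util.List;
    import javax.net.ssl.SNIHostName;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLParameters;
    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;

    public class SniClientExample {
        public static void main(String[] args) throws Exception {
            SSLSocketFactory factory =
                    (SSLSocketFactory) SSLContext.getDefault().getSocketFactory();

            // The proxy terminates TLS on a single public address; the SNI HostName
            // ("tenant-1.example.com" is a hypothetical name) tells it which backend
            // server the TCP stream should be routed to.
            try (SSLSocket socket = (SSLSocket) factory.createSocket("cloud.example.com", 443)) {
                SSLParameters params = socket.getSSLParameters();
                params.setServerNames(List.of(new SNIHostName("tenant-1.example.com")));
                socket.setSSLParameters(params);
                socket.startHandshake(); // the ClientHello now carries the SNI extension
            }
        }
    }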

Fig. 4 Multi-tenancy based on SNI routing

An interesting approach (analyzed in depth in the following paragraphs) is to use a load balancer for each server deployed in the cloud, since most cloud vendors use high-performance solutions for load balancing. A similar type of functionality has been provided by Fabric8’s Expose Controller [10]. This controller exposes a load balancer per group of server instances (not a load balancer per server) and it does not provide any information about internal/external address mapping. Still, from a technical perspective, the Expose Controller was used as a source of inspiration for the technical solution presented in this paper.

Finally, all application instances need to operate on some sort of hardware. The scheduler is typically responsible for assigning an application replica to a node that runs the program. Even though scheduling algorithms are not strictly connected to the proposal presented in this paper, it is worth mentioning that scheduling is an interesting and very important area of research for cloud-related applications [11,12,13,14].

3 The proposed solution

Solutions for exposing clusters deployed within the cloud to the outside world need to meet discoverability and connectivity criteria. A client application needs to be able to discover nodes in the cluster, as well as to establish connections to individual cluster members. A good solution also needs to introduce the smallest possible penalty on performance (defined by throughput in operations per second). The final goal is full process automation, which is required from a cluster auto-healing perspective. A full list of requirements and priorities is presented in Table 1.

Table 1 Requirements for proposed solution

One of the solutions initially evaluated was securing the transport with TLS and using the SNI extension to decide which server receives the request from the client (Fig. 4). This solution is natively supported by many reverse proxy applications used by cloud vendors (such as HAProxy [15] or NGINX [16]), so it can be easily automated. The biggest downside of this approach is that performance is 30% lower (Table 2) than with unencrypted transport.

Table 2 Performing 10,000 Cache PUT operations with different TLS configurations [8]

An alternative approach is to use a custom load balancer application which is able to decode additional, non-standard TCP and UDP options (as many data grid and gaming applications use TCP or UDP as transport protocols). However, during the evaluation it became clear that only a small fraction of libraries allow the use of custom options at the socket level, one of the commonly used examples being JNetPcap [17]. With this in mind, the custom router solution was discarded, as only a small proportion of projects would be able to leverage this approach.

The next group of possible solutions are those based on load balancers provided by the cloud vendors. Many cloud vendors consider load balancers and reverse proxies to be one and the same; however, they are treated separately in this paper. Most solutions of this kind (such as Amazon Elastic Load Balancer [18] and Google Cloud Load Balancer [19]) are optimized to serve high volume traffic whilst introducing the smallest possible loss of performance. Network virtualization efforts, which integrate seamlessly with both load balancers and SDN, are used within the cloud to achieve much better results than ever before [19, 20]. To solve the connectivity problem, it is possible to allocate a load balancer per node in the cluster. Since creating a new load balancer and a new binary proxy very often requires only a REST call to an API exposed by the cloud vendor, this process can be easily automated. Finally, the discoverability problem can be solved by creating a new application which exposes mappings between internal and external IP addresses for a cluster deployed in a cloud. The resulting proposal is presented in Fig. 5.

Fig. 5 Our proposed system

Each load balancer instance has a publicly accessible IP address, and in fact, many cloud vendors treat load balancers as an external/virtual IP implementation [21]. Each request is passed to a binary proxy (a proxy which forwards messages sent using binary protocols) and then to an application instance. The motivation behind proxying is that in container-based clouds, application instances can often be restarted (e.g. to balance hardware resource utilization or as part of scaling in and out). The proxy acts as a safety buffer, pausing communication until application instances are ready to process incoming requests. Another argument is that creating and destroying load balancer instances can take some time (on Google Cloud Platform it is a matter of minutes). With binary proxies, the number of create/destroy events can be limited.

The use of load balancers and binary proxies satisfies the connectivity requirement. A newly created component called the External IP Controller is responsible for automating the process of discovering new application instances and creating proxies, as well as load balancers, for all instances. This component also exposes a REST endpoint (which serves YAML-based content with the internal/external IP address mapping) that helps the client application determine which application instances are mapped to which addresses. This satisfies the discoverability requirement. However, during application benchmarking it turned out that routing communication through the binary proxy degrades performance (Table 3).
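The payload format served by the External IP Controller is not prescribed here beyond being YAML-based; the sketch below therefore assumes a hypothetical document of simple internal: external address pairs and shows how a client could fetch and parse it. The endpoint URL and the flat key/value layout are assumptions made for illustration.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical client for the External IP Controller REST endpoint.
    // Assumed payload layout (one "internal: external" pair per line), e.g.:
    //   10.0.5.5: 104.155.17.202
    public class AddressMappingClient {

        private final HttpClient http = HttpClient.newHttpClient();

        public Map<String, String> fetchMapping(String controllerUrl) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create(controllerUrl)).GET().build();
            String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

            Map<String, String> internalToExternal = new LinkedHashMap<>();
            for (String line : body.split("\n")) {
                String[] parts = line.trim().split(":\\s*");
                if (parts.length == 2) {
                    internalToExternal.put(parts[0].trim(), parts[1].trim());
                }
            }
            return internalToExternal;
        }
    }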

Table 3 Binary proxy benchmark results

Therefore it was decided to remove the binary proxy (along with all its benefits) and propose the final, optimized prototype without it.

4 Prototype evaluation

The final prototype was implemented using open source technologies: Kubernetes as the cloud platform (hosted on Google Cloud Engine) and Infinispan for testing client/server communication. The main argument for choosing Kubernetes is that it does not depend on any particular cloud vendor (in other words, it can be hosted using any public cloud provider, such as Google Cloud Platform or Amazon Web Services). As for client/server communication, Infinispan is a data grid implementation which uses a consistent hash algorithm for locating the primary owner [2] nodes for a specific portion of the data. This optimization allows us to assess whether the prototype improves communication performance or not. The evaluated system is shown in Fig. 6.

Fig. 6 The diagram of the evaluated system

Hot Rod is the name of Infinispan’s custom binary protocol used for transmitting data to/from the Infinispan server. The Hot Rod client obtains topology information upon its first connection to the cluster. The topology contains a list of servers and their internal addresses (10.0.5.5, 10.0.6.5 and 10.0.7.5). The External IP Controller service provides a mapping (via a REST service) between internal addresses (e.g. 10.0.5.5) and the corresponding external addresses (e.g. 104.155.17.202). This information allows the client application to reconstruct the consistent hash provided by the Infinispan server. Each topology update follows the same translation pattern on the client side. The biggest advantage of this approach is that the modification only needs to be applied to the client code, and the server is not aware of any translation process.

The algorithm used by the Hot Rod client is presented in Fig. 7.

Fig. 7 Hot Rod internal/external address mapping algorithm
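A minimal sketch of the translation step from Fig. 7 is shown below. It assumes a topology list already received from the server and a mapping already fetched from the External IP Controller; it does not reproduce Infinispan’s actual client internals.

    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Illustration of the internal -> external address translation performed on
    // the client side. Topology entries with a known external mapping are
    // rewritten; unknown entries are kept as-is.
    public class TopologyTranslator {

        public static List<InetSocketAddress> translate(
                List<InetSocketAddress> serverTopology,
                Map<String, String> internalToExternal) {

            return serverTopology.stream()
                    .map(addr -> {
                        String external = internalToExternal.get(addr.getHostString());
                        return external == null
                                ? addr
                                : InetSocketAddress.createUnresolved(external, addr.getPort());
                    })
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            // Internal addresses come from the Hot Rod topology update (see Fig. 7),
            // the mapping from the External IP Controller REST endpoint.
            List<InetSocketAddress> topology = List.of(
                    InetSocketAddress.createUnresolved("10.0.5.5", 11222));
            Map<String, String> mapping = Map.of("10.0.5.5", "104.155.17.202");

            // The translated list now points at 104.155.17.202 on port 11222.
            System.out.println(translate(topology, mapping));
        }
    }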

The creation and removal of load balancers has been automated within the External IP Controller. The algorithm used by the controller has been designed to run indefinitely. The application thread wakes up every 5 minutes and queries the Kubernetes API for all application instances. To create a load balancer for each application instance, the controller sets marker labels on each instance. A Kubernetes Service then selects the proper instances based on a selector, whose query must match the marker labels. Figure 8 presents the algorithm in pseudo-code notation.

Fig. 8 External IP Controller algorithm
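The sketch below is a simplified rendering of the loop from Fig. 8, written against the fabric8 kubernetes-client (6.x API); the namespace, label names and port are assumptions made for illustration, and a production controller would also need to remove stale Services and handle API errors.

    import io.fabric8.kubernetes.api.model.IntOrString;
    import io.fabric8.kubernetes.api.model.Pod;
    import io.fabric8.kubernetes.api.model.PodBuilder;
    import io.fabric8.kubernetes.api.model.Service;
    import io.fabric8.kubernetes.api.model.ServiceBuilder;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    // Simplified controller loop: every 5 minutes, list the application pods,
    // add a marker label to each pod and ensure a LoadBalancer-type Service
    // (selecting exactly that pod) exists for it.
    public class ExternalIpController {

        public static void main(String[] args) throws InterruptedException {
            try (KubernetesClient client = new KubernetesClientBuilder().build()) {
                while (true) {
                    for (Pod pod : client.pods().inNamespace("datagrid")
                            .withLabel("app", "infinispan").list().getItems()) {

                        String podName = pod.getMetadata().getName();

                        // Marker label used by the per-pod Service selector.
                        client.pods().inNamespace("datagrid").withName(podName)
                                .edit(p -> new PodBuilder(p)
                                        .editMetadata().addToLabels("lb-target", podName).endMetadata()
                                        .build());

                        // One LoadBalancer Service per application instance.
                        Service lb = new ServiceBuilder()
                                .withNewMetadata().withName("lb-" + podName).endMetadata()
                                .withNewSpec()
                                    .withType("LoadBalancer")
                                    .withSelector(Map.of("lb-target", podName))
                                    .addNewPort()
                                        .withPort(11222)
                                        .withTargetPort(new IntOrString(11222))
                                    .endPort()
                                .endSpec()
                                .build();
                        client.services().inNamespace("datagrid").createOrReplace(lb);
                    }
                    TimeUnit.MINUTES.sleep(5); // wake-up interval from the algorithm above
                }
            }
        }
    }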

System benchmarks were performed using both a topology-aware and a simple Hot Rod client. The topology-aware client uses consistent hash information to send each request to a specific server, whereas the simple client chooses a random server from the configured connection list. Each test consists of sending 10,000 Put Operations (which means inserting 10,000 random strings into the data grid) and 10,000 Put and Get Operations (where the Get Operation retrieves a value previously inserted by a Put Operation). Each test was run 31 times and an error margin was calculated using a 99.9% confidence interval. The results are shown in Tables 4 and 5 as well as in Figs. 9 and 10.
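For reference, the core of a single benchmark iteration could look like the sketch below, which uses the Infinispan Hot Rod Java client; the cache name is hypothetical, and the warm-up, timing statistics and confidence-interval calculation of the real harness are omitted.

    import java.util.UUID;
    import org.infinispan.client.hotrod.RemoteCache;
    import org.infinispan.client.hotrod.RemoteCacheManager;
    import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

    // Core of a single benchmark iteration: N random-string Put operations,
    // followed by a Get for each previously inserted key.
    public class HotRodBenchmarkSketch {

        public static void main(String[] args) {
            ConfigurationBuilder cfg = new ConfigurationBuilder();
            // External address of one load balancer; the topology-aware client then
            // learns the remaining cluster members from the Hot Rod topology update.
            cfg.addServer().host("104.155.17.202").port(11222);

            try (RemoteCacheManager manager = new RemoteCacheManager(cfg.build())) {
                RemoteCache<String, String> cache = manager.getCache("benchmark"); // hypothetical cache name

                int operations = 10_000;
                String[] keys = new String[operations];

                long start = System.nanoTime();
                for (int i = 0; i < operations; i++) {
                    keys[i] = "key-" + i;
                    // random 16-character payload, as in the paper's benchmark
                    cache.put(keys[i], UUID.randomUUID().toString().substring(0, 16));
                }
                for (int i = 0; i < operations; i++) {
                    cache.get(keys[i]); // the "Put and Get" variant of the test
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println(2 * operations + " operations took " + elapsedMs + " ms");
            }
        }
    }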

Table 4 Benchmark results for performing 1000 Put operations
Fig. 9 Benchmark results for performing 1000 Put operations

Table 5 Benchmark results for performing 1000 Put and Get operations
Fig. 10 Benchmark results for performing 1000 Put and Get operations

To check whether benchmark results differed from each other in any meaningful way, a test of significance was performed for each pair of results in Table 6. The test is based on a null hypothesis which assumes that both tested mean values are equal (see equation below). The research hypothesis is that the means are different [22].

$$\begin{aligned} \mu _0 = \mu _1 \end{aligned}$$
(1)

Since the samples are independent of each other, and the measured groups are not linked, the significance test can be done using the “Two-Sample t-Test for Equal Means”:

$$\begin{aligned} Z=\frac{\overline{X_1}-\overline{X_2}}{\sqrt{\frac{\sigma _1^2}{n_1}+\frac{\sigma _2^2}{n_2}}} \end{aligned}$$
(2)

where \(\overline{X_1}\) and \(\overline{X_2}\) are the sample means, \(\sigma _1^2\) and \(\sigma _2^2\) are the sample variances, and \(n_1\) and \(n_2\) are the sample sizes.

The final step is to check whether the calculated Z value falls within \((-\infty , -z_{1-\frac{\alpha }{2}}] \cup [z_{1-\frac{\alpha }{2}}, \infty )\) for the tested hypothesis that \(\mu _0 \ne \mu _1\), where \(z_{1-\frac{\alpha }{2}}\) is the corresponding quantile of the test distribution. For \(\alpha =0.01\) (99%), the tested interval is equal to \((-\infty , -2.58] \cup [2.58, \infty )\). Table 6 contains the calculated Z parameter for each test pair. We rejected the null hypothesis for each tested pair except “L1 + Kubernetes internal + LB” and “L3 + Kubernetes external + LB”. For that pair, the difference between the values was not statistically significant.
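For completeness, the decision rule can be expressed in a few lines of Java; the statistic follows Eq. 2, and the sample values in the example are illustrative only, not taken from the benchmark tables.

    // Two-sample test statistic from Eq. 2 and the decision rule used above:
    // reject the null hypothesis (equal means) when |Z| > 2.58 (alpha = 0.01).
    public class SignificanceTest {

        static double zStatistic(double mean1, double var1, int n1,
                                 double mean2, double var2, int n2) {
            return (mean1 - mean2) / Math.sqrt(var1 / n1 + var2 / n2);
        }

        static boolean meansDiffer(double z) {
            return Math.abs(z) > 2.58; // critical value for the two-sided test
        }

        public static void main(String[] args) {
            // Illustrative numbers only: two groups of 31 runs each.
            double z = zStatistic(1520.0, 180.0, 31, 1475.0, 210.0, 31);
            System.out.printf("Z = %.2f, significant: %b%n", z, meansDiffer(z));
        }
    }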

Table 6 Significance measure matrix between benchmark results

5 Conclusion and future work

Exposing services hosted within the cloud for external consumption is generally not very common, yet it is very useful for some use cases. Many cloud vendors have sophisticated tools or templates that allow such services to be bootstrapped very quickly. Automatic scaling based on CPU and memory metrics, health monitoring, and automatic backup management are only a few reasons why hosting applications within the cloud is preferred by many development teams. Many cloud vendors use load balancers as the only way of reaching services hosted within the cloud from the outside world. Very frequently, the process of provisioning load balancers is strictly connected to creating new firewall or port forwarding rules. The process of creating all the necessary components is often automated and takes no longer than a few minutes. The most significant downside of using load balancers is that they are relatively expensive and are very often charged on an hourly basis.

The network infrastructure maintained by cloud vendors is very often optimized for using load balancers. Comparing the “L3 + Kubernetes internal” and “L3 + Kubernetes internal + LB” benchmark results from Table 5 showed only 13.43% lower throughput when sending data through a load balancer. The binary proxy implementation presented in Sect. 3 achieved much worse results. Comparing the “Perform 10,000 Put and Get Operations” results with and without the binary proxy in Table 3 showed 68.05% worse performance.

Communication performance degrades much further when using other virtual machines within the same cloud offering (but not in the same container cloud). The example in the benchmark used a Kubernetes cluster running on a CentOS VM and a client application deployed in a separate VM instance. Comparing the benchmarks for a topology-aware client (the “L3 + Kubernetes external + LB” and “L3 + Kubernetes internal + LB” results from Table 5) showed a 68.70% performance loss when the client is deployed in a separate VM instance. For a client that does not use consistent hash-based routing, the results were much better and stabilized at 24.08% worse performance when the client is deployed in a separate VM instance. These figures might vary significantly from one cloud vendor to another, due to the network optimizations used in each data center.

A general recommendation based on the results is that one should always use direct communication within a container-based cloud when possible. This approach presented the best performance results among all tested configurations. Using consistent hash-based routing is always the fastest approach from the client perspective, since it picks the shortest possible path to the proper server. If for some reason this approach is not feasible (e.g. configured network policies do not allow direct communication), one should use the load balancer per instance approach presented in this paper. This scenario resulted in a 13.43% performance drop compared to direct communication. In many cases such a small performance penalty is perfectly acceptable. Another group of use cases are client applications deployed outside of the cloud that connect to a service deployed inside the cloud. Using the load balancer per instance approach presented in this paper enables the client to pick the destination address (e.g. by using a consistent hash). Comparing the “L3 + Kubernetes external + LB” and “L1 + Kubernetes external + LB” benchmark results from Table 5 showed that using topology information resulted in 8.90% better performance than sending requests to randomly picked instances. Even though 8.90% might not seem like much, it is worth mentioning that the benchmark used relatively small payloads—random 16-character strings. Sending such small portions of data over Google’s optimized Andromeda network stack between data grid instances is relatively fast. The difference between those benchmarks is expected to be greater when using payloads of several megabytes.

It is also worth mentioning that adding TLS encryption resulted in 30% worse performance. This high performance penalty was the deciding factor in leaving the security aspects out of this paper.

Even though the benchmark results look promising, there is still a need to benchmark other scenarios (e.g. involving disk access) and different types of services. Furthermore, this paper did not investigate the security aspects of the proposed approach. Exposing the internal-to-external (and vice versa) address mapping might be used as a potential attack vector. Some existing implementations, like Infinispan, may mitigate this risk by moving the address mapping into the server. This way the client obtains an already-translated list of servers and is not aware of any mapping process.