
1 Introduction

Cloud computing is being increasingly adopted in various real-life software systems, thanks to its potential benefits, such as cost reduction, improved performance, elasticity, and scalability - the so-called non-functional characteristics. In deciding whether to adopt Cloud computing in the early design phase, estimating such non-functional characteristics in a fast, inexpensive, and reliable manner is an essential task for Chief Information Officers (CIOs) operating under tight time constraints and limited budgets, especially when initial requirements are ambiguous, incomplete, inconsistent, and changing [1,2,3,4,5].

Cloud benchmarking and simulation are two techniques long practiced for performance estimation. Although Cloud benchmarking typically yields reliable estimates, it is potentially expensive and time-consuming to use Cloud resources to process benchmark transactions. The nature of requirements in the early design phase (e.g., the unclear and changing number of concurrent users) increases the time and cost of benchmarking various architectural designs in Clouds [6,7,8,9]. On the other hand, Cloud simulation, while faster and cheaper than benchmarking, has difficulty ensuring the reliability and understandability of its results because of the reduced fidelity of the simulation model. For example, when representing CPU clock speed, Cloud simulations tend to use Million Instructions Per Second (MIPS), while real-world Clouds use Gigahertz (GHz) [10,11,12].

To deal with this dichotomy, we propose a complementary approach, using both benchmarking and simulation, to estimating the performance of Cloud-based systems, whereby performance estimates can be obtained in a fast, inexpensive, and also reliable manner. Our approach is complementary in the sense that benchmarks enhance the reliability and understandability of the simulation model, while simulation results reduce the cost and time of benchmarking Clouds by narrowing down the architectural design space. In our approach, the ontological concepts of a benchmark model, whose benchmark results have already been collected, are mapped onto those of a simulation model, while similarities between the two models are considered and highlighted to help better explain why the simulation results may, or may not, be reliable. This work draws on our previous work [9, 13, 14], which used benchmarking neither for guiding the development of simulation models nor for estimating their reliability, but simply for comparing simulation results against the corresponding benchmark results, and only when such results were available.

To validate our complementary approach, simulation models are constructed using CloudSim, and the simulation results are compared against the corresponding benchmark results obtained from benchmarking Amazon Web Service (AWS) and Google Compute Engine (GCE) using the Yahoo! Cloud Serving Benchmark (YCSB) tool and a Cassandra NoSQL database. In our experiments, the simulation results show about 90% accuracy with respect to the benchmark results. We also find that our approach helps explain why the simulation results are, or are not, reliable, and that it supports evaluating and validating Cloud design alternatives in the early design stages.

In the rest of this paper, Sect. 2 provides related work, and Sect. 3 presents our proposed approach. Section 4 describes our experiments, followed by observations and discussions in Sect. 5. Finally, a summary of contributions and future directions is given.

2 Related Work

The key distinguishing feature of our work lies in the use of a complementary approach, using benchmarks and simulations together, to estimate the performance of Cloud-based systems in a fast, inexpensive, and also reliable manner. More specifically, our work proposes a five-step process for capturing the ontologies of estimation, benchmark, and simulation, and for mapping the benchmark ontologies to simulation ontologies, while considering the similarities, and mismatches, between the two. Our work also provides a prototype tool for our semi-automated approach.

While there seems to be little or no other work that proposes a complementary approach to estimating the performance of Cloud-based systems in a fast, inexpensive, and also reliable manner, some parts of our work share similarities with previous work, although such work deals with either benchmarking or simulation alone, without regard to the other.

Concerning benchmarking, Cloud benchmark studies show that the benchmark-based approach is expensive and time-consuming, leading to dissatisfaction among CIOs with low budgets and restrictive time constraints [6,7,8]. Moreover, a case study [15] described difficulties in finding Cloud benchmarks similar to the design alternatives under consideration. Although these studies addressed the problems of the benchmark-based approach, there seems to be a lack of studies on solutions alleviating these issues - this further motivated our work in this paper.

Concerning simulation, simulation-based approaches to Cloud computing face challenges in ensuring that simulation results are reliable and understandable. There seems to be a lack of papers describing how to construct Cloud simulation models whose reliability can be understood and justified. For example, the MIPS values used in simulation-based approaches ranged widely, from as low as 250 to as high as 20,000 [10,11,12, 16].

3 Our Framework for a Complementary Approach

Our approach, using benchmarks and simulations in a complementary manner, consists of a five-step process and a prototype tool. The five-step process starts with capturing the ontology of applications, benchmarks, and simulations, and ends with a design that outputs reliable and understandable performance estimates. In between, similarities among the ontologies are measured and mappings between them are defined; simulation models are constructed using the ontological relationships and are used to confirm, and reconfirm, the quality of the design alternatives. The tool supports capturing and mapping the ontologies, measuring similarities, and running simulation models.

The Gane-Sarson DFD of the five steps is shown in Fig. 1. Each step is envisioned for use in an interactive, iterative, and interleaving manner, rather than in a strictly sequential manner, so that the outcome of each step can be enhanced with new information obtained from later steps.

Fig. 1. A 5-step process for estimating the performance of Cloud-based system

3.1 Step 1: Identifying Estimation Ontology

An ontology, which is a set of essential concepts and the interrelationships among them, plays a key role in successfully designing for a domain by making explicit whether essential concepts are included in the design or omitted. In our context, we identify the ontology of estimating the performance of Cloud-based systems, which means capturing the essential system concepts for the estimation. Besides the ontology of the estimation itself, our approach takes into account softgoals of the estimation, such as inexpensive, fast, and reliable estimation, together with the corresponding ontologies; for example, the ontology of reliable estimation, the ontology of fast estimation, and the ontology of inexpensive estimation.

Estimation goals and ontology are extracted from our previous work and other literature [9, 14, 17], and are then represented in a goal model using a Softgoal Interdependency Graph (SIG). The SIG provides a useful framework, across various domains, for reasoning about goals, ontology, and the interdependencies among them [1, 18, 19]. The goal model is not meant to be a completely generalized model constructed from first principles. Rather, as initial research of its kind, it is intended to serve as an example of how goal models work in our approach and, potentially, as a reference goal model. Any such reference goal model would need to be refined and customized in future work, because every domain has different stakeholders, ontologies, goals, and priorities.

Fig. 2. Estimating ontology using softgoal interdependency graph

Figure 2 illustrates a goal model using SIG. The SIG represents a softgoal in the form of type[topic]. The type part represents the softgoal type, and the topic part describes the ontology of the softgoal, such as inexpensive[Cloud Estimation]. Each softgoal is labeled as {\({G}_{i}|1 \le i \le 34\)}. To satisfy a softgoal, we decompose the softgoal into sub-softgoals. For example, inexpensive[Cloud Estimation] (\({G}_{3}\)) is decomposed into inexpensive[Machine Usage Fee] (\({G}_{9}\)), inexpensive[Network Usage Fee] (\({G}_{10}\)), and inexpensive[Storage Usage Fee] (\({G}_{11}\)). If the three sub-softgoals are satisfied, then the inexpensive[Cloud Estimation] (\({G}_{3}\)) can be satisfied. Likewise, to satisfy fast[Cloud Estimation] (\({G}_{1}\)), Fast[Data Loading] (\({G}_{4}\)) and Fast[Workload Processing] (\({G}_{6}\)) need to be satisfied.
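The decomposition of softgoals can be read as simple AND-refinements. As a purely illustrative sketch (not part of our framework), the following Python fragment checks whether a parent softgoal is satisficed once all of its sub-softgoals are; real SIG evaluation also supports partial and negative contributions, which we omit here.

```python
# Illustrative only: AND-decomposition check for the SIG fragment of Fig. 2.
# A parent softgoal is considered satisficed only if all sub-softgoals are.
decomposition = {
    "G3 inexpensive[Cloud Estimation]": ["G9", "G10", "G11"],
    "G1 fast[Cloud Estimation]": ["G4", "G6"],
}

def satisficed(parent, labels):
    """True iff every sub-softgoal of `parent` is labeled as satisficed."""
    return all(labels.get(sub, False) for sub in decomposition[parent])

labels = {"G9": True, "G10": True, "G11": True, "G4": True, "G6": False}
print(satisficed("G3 inexpensive[Cloud Estimation]", labels))  # True
print(satisficed("G1 fast[Cloud Estimation]", labels))         # False
```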

Fig. 3. Benchmark and simulation ontology (Color figure online)

In order for Cloud estimation to be reliable (\({G}_{2}\)), the models of estimation techniques, such as the estimation model, benchmark model, and simulation model, should be reliable (\({G}_{6}\)). To ensure the reliability of the estimation models, the ontologies of each model must be reliable (\({G}_{7}\)), which means each model should sufficiently include essential system concepts. Furthermore, because our approach utilizes the three complementary models, the ontological differences between them need to be low (\({G}_{8}\)). For example, the benchmark ontologies need to cover the estimation ontologies. Specifically, as the ontologies of the models can be classified into workload ontology and resource ontology [17], there should be low differences in workload ontology (\({G}_{14}\)) and resource ontology (\({G}_{15}\)). For example, a benchmark should have a workload ontology similar to the estimation workload ontology. The ontology of the estimation model is labeled as {\({O}_{i}|1 \le i \le 23\)}. For example, workload ontology is labeled as \({O}_{1}\), and resource ontology is shown as \({O}_{12}\). The workload ontology \({O}_{1}\) is represented as \({O}_{1}\) = {\({O}_{2}\),\({O}_{3}\),{\({O}_{4}\)},{\({O}_{5}\)}}, where \({O}_{4}\) = {\({O}_{6}\),\({O}_{7}\),\({O}_{8}\)} and \({O}_{5}\) = {\({O}_{9}\),\({O}_{10}\),\({O}_{11}\)}. Likewise, the resource ontology \({O}_{12}\) is decomposed as \({O}_{12}\) = {\({O}_{13}\),{\({O}_{14}\)},{\({O}_{15}\)}}, where \({O}_{15}\) = {\({O}_{16}\),\({O}_{17}\)}, \({O}_{14}\) = {\({O}_{18}\),\({O}_{19}\),{\({O}_{20}\)},\({O}_{21}\)}, and \({O}_{20}\) = {\({O}_{22}\),\({O}_{23}\)}.
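For readers who prefer a concrete representation, the nested sets above can be recorded as a simple parent-to-children table; the Python sketch below is purely illustrative and uses only the indices given in the text (with \({O}_{1}\) the workload ontology, \({O}_{5}\) the data ontology introduced in Step 2, and \({O}_{12}\) the resource ontology).

```python
# Illustrative only: the estimation-ontology decomposition from the text,
# recorded as parent -> children (O1 = workload, O5 = data, O12 = resource).
decomposition = {
    "O1":  ["O2", "O3", "O4", "O5"],
    "O4":  ["O6", "O7", "O8"],
    "O5":  ["O9", "O10", "O11"],
    "O12": ["O13", "O14", "O15"],
    "O14": ["O18", "O19", "O20", "O21"],
    "O15": ["O16", "O17"],
    "O20": ["O22", "O23"],
}

def leaves(node):
    """Return the leaf-level entities under an ontology node."""
    children = decomposition.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]

print(leaves("O1"))  # ['O2', 'O3', 'O6', 'O7', 'O8', 'O9', 'O10', 'O11']
```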

3.2 Step 2: Capturing Benchmark Ontology and Defining Mapping Rules

The goal model with the estimation ontology, constructed in the previous step, is used in an interactive, iterative, and interleaving manner as a guide that provides insights into which benchmark ontologies to capture; these are depicted in Fig. 3 using a Class diagram. In the backward direction, while capturing the benchmark ontologies, ontologies not recognized in the previous step can be captured, resulting in an enhanced estimation ontology in the goal model.

For example, while capturing the benchmark ontologies that correspond to the estimation ontology from [15, 20], data size, data access distribution, and data distribution were captured as critical factors that impact performance; these were not explicitly identified in the estimation ontology extracted from previous studies [14, 17, 21]. The goal model is then enhanced with the new ontology (i.e., low difference[\({O}_{5}\): Data Ontology]) and the sub-ontologies \({O}_{9,10,11}\).

This study captures the Yahoo! Cloud Serving Benchmark (YCSB) ontology and the CloudSim ontology, because the YCSB benchmark data collected from benchmarking Amazon Web Service (AWS) and Google Compute Engine (GCE) are utilized to construct the simulation models using CloudSim. The ontologies are clustered into workload ontology and resource ontology, and the interrelationships among the ontologies are depicted in the left part of Fig. 3.

The estimation ontologies are heuristically mapped onto those of YCSB with the same semantics, and the mapping is expressed below; a small illustrative sketch follows the list. \({O}_{11}\) is not mapped onto the YCSB ontology, as YCSB does not provide data distribution functionality.

  • \({O}_{i}\): entities of the estimation ontology O, with index \(\{1 \le i \le 23\}\).

  • \({YO}_{i}\): entities of the YCSB ontology YO.

  • \(smap_1{:} O_{i} \rightarrow {YO}_{i}\), a semantic mapping function, entity index \(i \in \mathbb {N}\). The estimation ontology \({O}_{i}\) is mapped onto \({YO}_{i}\) if the entities are semantically identical.

  • \(smap_1(O_i) = {YO}_{i}\), where \(\{O_i,{YO}_i | 1 \le i \le 23 \; and \; i \ne 11 \}\).
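As a small illustrative sketch of this mapping (the indices follow the definition above; the dictionary itself is not part of our tool):

```python
# smap_1 as a lookup table: each estimation entity O_i maps onto the
# semantically identical YCSB entity YO_i, except i = 11 (data distribution),
# which YCSB does not provide.
smap_1 = {f"O{i}": f"YO{i}" for i in range(1, 24) if i != 11}

print(smap_1["O2"])       # 'YO2'
print(smap_1.get("O11"))  # None (no YCSB counterpart)
```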

3.3 Step 3: Capturing Simulation Ontology and Defining Mapping Rules

Since the simulation ontologies are at some distance from real-world Cloud ontologies, the Cloud benchmark ontologies captured in the previous step are mapped onto the simulation ontologies to complement the semantics of the simulation ontologies, thereby enhancing the reliability and understandability of the simulation models. We have captured the CloudSim ontology by adopting and refining the simulation model in our previous work [14], and the simulation ontology is illustrated on the right side of Fig. 3 in blue.

The technical difficulty in the mapping lies in the ontological mismatches between the benchmark and the simulation. The semantics of three simulation ontologies do not directly match the benchmark ontologies: the number of Cloudlets, the length of each Cloudlet, and the MIPS value of the CPUs. According to [16], Cloudlets are Cloud-based application services, where the Cloudlet length is a pre-defined instruction length plus data transfer overhead. However, it is difficult to precisely define what a “Cloud-based application service” is and what a “pre-defined instruction length” means with regard to Cloud benchmarks.

For mapping ontologies, we define a mapping rule that maps the semantics of benchmark ontologies, and the values thereof, onto simulation ontologies. The goal model in Fig. 2 acts as a guide, along the same lines as in Step 2. The ontologies that are semantically identical (e.g., memory size (\({YO}_{21}\))) are directly mapped onto simulation ontologies (e.g., RAM size (\({CO}_{21}\))).

For the three mismatched ontologies, we define the number of Cloudlets as the number of concurrent users (\({O}_{2}\)) and the Cloudlet length as the number of transactions (\({O}_{7}\)) that each user generates. Since we defined the rules, \(smap_1(O_2) = {YO}_{2}\) and \(smap_1(O_7) = {YO}_{7}\), the number of client threads in each benchmark data is assigned to the number of Cloudlets, and the operation count that each client thread generates is assigned to the Cloudlet length. This is because the benchmark settings for the number of client threads and the operation count are known values, but the instruction length is not. Converting benchmark transactions to the instruction length is a non-trivial task because of difficulties in obtaining the internal information for the conversion from the Cloud providers, such as the compiler version, hypervisor version, optimization options, or VM deployment policies.

The mapping is illustrated in Fig. 3 with red lines and can be expressed as follows.

  • \({CO}_{i}\): entities of the CloudSim ontology CO

  • \(smap_2{:} {YO}_{i} \rightarrow {CO}_{i}\), a semantic mapping function, entity index \(i \in \mathbb {N}\). The YCSB ontology \({YO}_{i}\) is mapped onto \({CO}_{i}\) if the entities are semantically identical.

  • \(vmap{:} {YO}_{i} \rightarrow {CO}_{i}\), a value mapping function, entity index \(i \in \mathbb {N}\). A value of \({YO}_i\) is mapped onto a configuration \({CO}_{i}\)

  • \(smap_2({YO}_i) = {CO}_{i}\), where \(\{{YO}_i,{CO}_i | i \in \{12, 13, 14, 18,19,21,22,23\} \}\)

  • \(smap_2({YO}_7) = {CO}_{7}\)

  • \(smap_2({YO}_i) = {CO}_{2}\), where \(\{{YO}_i | 1 \le i \le 10 \; or \; i \in \{16, 20\} \}\)

  • \(vmap({YO}_2) = {CO}_{2}\)

  • \(vmap({YO}_i) = {CO}_{i}\), where \(\{{YO}_i | i \in \{7,18,21,22,23\}\}\).

The mapping rules are intended as an example of how to map benchmark ontologies to simulation ontologies in order to build reliable and understandable simulation models. The more benchmark data and mapping rules are developed, the more reliable the simulation models that can be constructed for diverse domains. Each mapping rule can have a weight, representing the degree to which the mapping is satisfied, and the maximum weight value is configurable, in a fuzzy fashion. We assume that every mapping rule has a maximum weight of five.
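To make the weighting concrete, the sketch below shows one possible encoding of weighted mapping rules; the maximum weight of five follows the text, whereas the individual rule weights are placeholders rather than values from our experiments.

```python
from dataclasses import dataclass

@dataclass
class MappingRule:
    source: str    # benchmark ontology entity, e.g. "YO2"
    target: str    # simulation ontology entity, e.g. "CO2"
    weight: float  # degree to which the mapping is satisfiable, in [0, W_R]

W_R = 5  # configurable maximum weight, assumed to be five in this paper

rules = [
    MappingRule("YO2", "CO2", 5.0),    # client threads -> number of Cloudlets
    MappingRule("YO7", "CO7", 4.0),    # operation count -> Cloudlet length
    MappingRule("YO21", "CO21", 5.0),  # memory size -> RAM size
]

for r in rules:
    print(r.source, "->", r.target, "normalized weight:", r.weight / W_R)
```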

3.4 Step 4: Constructing Simulations and Measuring Reliability

After assigning the semantics and values of benchmark ontologies to simulation ontologies, except for MIPS, we derive the MIPS value needed to run the simulations by utilizing the benchmark data, as described in Algorithm 1. Algorithm 1 takes as inputs the machine type (\(type\_in\)), the number of concurrent users (\(threads\_in\)), the number of servers (\(servers\_in\)), and the number of operations (\(operations\_in\)), and returns a simulation throughput (t). One assumption is that n benchmark data points have been collected. The arrays type[i], threads[i], servers[i], and operations[i] contain the machine type, the number of client threads, the number of servers, and the number of operations of the \({i}^{th}\) benchmark datum, respectively. The throughput of the \({i}^{th}\) benchmark is stored in th[i].

Algorithm 1

Lines 1–10 are a forward and backward process to find MIPS values that yield simulation throughputs similar to the benchmark throughputs. The mips variable is randomly initialized. The function mapping() follows the mapping rules defined in Step 3 to map the \({i}^{th}\) benchmark settings to the number of Cloudlets (cl), the length of each Cloudlet (len), and the number of servers (vms) that are needed to run CloudSim simulations (\(run\_simulation()\)). If the difference between the simulation throughput t and the \({i}^{th}\) benchmark throughput th[i] is less than the threshold, the mips value is kept in M[i]. Otherwise, the mips value is adjusted by delta and the simulation is rerun. The variables threshold and delta are user-definable.

The function \(derive\_mips\_equation()\) in Line 11 derives a MIPS equation (equ) using the set of MIPS values (M) and the simulation settings mapped from the benchmark settings. Our work uses non-linear regression to derive the MIPS equation because the size of the collected benchmark data is limited; other techniques could yield better equations. For example, if more benchmark data were collected, machine learning techniques could be applied, in which case the simulation settings obtained from the mapping() function would act as features and M would be used as labels.

Lines 12–14 run a simulation for the input settings to be estimated. The value of the mips variable is derived in Line 12 using the MIPS equation, and the other simulation parameters are obtained in Line 13. Line 14 runs the simulation and returns the simulation throughput.
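Since Algorithm 1 is shown only as a figure, the following Python sketch restates its structure as described above. The functions mapping() and run_simulation() are placeholders (the latter is a toy stand-in, not CloudSim), and the quadratic fit via numpy.polyfit merely illustrates one possible non-linear regression; threshold and delta remain user-definable.

```python
import random
import numpy as np

def mapping(machine_type, threads, operations, servers):
    """Placeholder for the Step 3 mapping rules: benchmark settings ->
    number of Cloudlets (cl), Cloudlet length (length), number of VMs (vms).
    Here `operations` is assumed to be the per-thread operation count."""
    return threads, operations, servers

def run_simulation(cl, length, vms, mips, machine_type):
    """Toy stand-in for a CloudSim run (NOT the real simulator)."""
    return (mips * vms * 1000.0) / length

def estimate_throughput(type_in, threads_in, servers_in, operations_in,
                        types, threads, servers, operations, th,
                        threshold=100.0, delta=50.0):
    n = len(th)
    M = [0.0] * n
    # Lines 1-10: forward/backward search for a MIPS value per benchmark datum.
    for i in range(n):
        mips = random.uniform(500.0, 5000.0)        # random initialization
        for _ in range(100_000):                    # guard against non-convergence
            cl, length, vms = mapping(types[i], threads[i], operations[i], servers[i])
            t = run_simulation(cl, length, vms, mips, types[i])
            if abs(t - th[i]) < threshold:
                M[i] = mips                         # keep this MIPS value
                break
            mips += delta if t < th[i] else -delta  # adjust and re-run
    # Line 11: derive a MIPS equation; a quadratic fit over the number of
    # client threads is used here purely as an illustration.
    coeffs = np.polyfit(threads, M, deg=2)
    # Lines 12-14: simulate the requested configuration.
    mips_in = float(np.polyval(coeffs, threads_in))
    cl, length, vms = mapping(type_in, threads_in, operations_in, servers_in)
    return run_simulation(cl, length, vms, mips_in, type_in)
```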

To yield reliable performance estimates by using benchmarks and simulation together, the benchmark ontology has to be similar to the estimation ontology (similarity\({}_{1}\)), and the simulation ontology has to be similar to the benchmark ontology (similarity\({}_{2}\)). Therefore, the reliability of the simulation is measured as below.

$$\begin{aligned} reliability = {similarity}_{1}({O}_{est}, {R}_{eb}, {O}_{ben}) * {similarity}_{2}({O}_{ben}, {R}_{bs}, {O}_{sim}) \end{aligned}$$
(1)

\({O}_{est}\), \({O}_{ben}\), and \({O}_{sim}\) are the ontologies of estimation, benchmark, and simulation respectively. \({R}_{eb}\) is a mapping rule from estimation ontology to benchmark ontology. \({R}_{bs}\) is a mapping rule from benchmark ontology to simulation ontology. Given a rule R for mapping the source ontology \(O_s\) to the target ontology \(O_t\), the similarity between the two ontologies is measured as below.

$$\begin{aligned} similarity_i(O_s, R, O_t) = \sum _{i=1}^{n}(\frac{{w}_{{R}_{i}}}{W_R}* \frac{{w}_{{O}_{si}}}{\sum _{j=1}^{m}({w}_{{O}_{sj}}) * {r}_{{O}_{si}}}) \end{aligned}$$
(2)

The variable n is the number of rules in the mapping rule R, \(W_R\) is the maximum weight value of the rule R, and \({w}_{R_i}\) is the weight of each rule. The variable m is the total number of ontologies in \(O_s\), \({w}_{{O}_{si}}\) is the weight of an ontology of \(O_s\) in a rule \(R_i\), and \({r}_{{O}_{si}}\) is the number of mapping rules with which the ontology of \({O}_{s}\) in the rule \(R_i\) is associated.

For example, let us assume that there are only three mapping rules: \(smap({YO}_{2})\) = \({CO}_{2}\), \(smap({YO}_{2})\) = \({CO}_{3}\), and \(smap({YO}_{3})\) = \({CO}_{3}\). The maximum weight of a rule is assumed to be five, and each rule has a weight of four. \({YO}_{2}\) and \({YO}_{3}\) each have a weight of three. In this case, \(W_R\) is five, \({w}_{{R}_{1-3}}\) is four, \({w}_{{O}_{s1-3}}\) is three, and the summation of \({w}_{{O}_{s1-3}}\) is nine. Since \({YO}_{2}\) is associated with two rules, \({r}_{{O}_{s1-2}}\) is two, and \({r}_{{O}_{s3}}\) is one.
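Under our reading of Eq. (2), the example evaluates as in the sketch below; the resulting number (about 0.53) is our own illustration and is not reported elsewhere in the paper.

```python
# Eq. (2) applied to the three-rule example above; Eq. (1) then multiplies
# the two similarities, e.g. the values reported later in Sect. 4.2.
def similarity(rule_weights, W_R, source_weights, rule_counts):
    """rule_weights[i]   : w_Ri, weight of the i-th mapping rule
       W_R               : maximum rule weight
       source_weights[i] : w_Osi, weight of the source entity used in rule i
       rule_counts[i]    : r_Osi, number of rules that source entity appears in"""
    total = sum(source_weights)  # sum over j of w_Osj
    return sum((w_r / W_R) * (w_s / (total * r))
               for w_r, w_s, r in zip(rule_weights, source_weights, rule_counts))

s = similarity(rule_weights=[4, 4, 4], W_R=5,
               source_weights=[3, 3, 3], rule_counts=[2, 2, 1])
print(round(s, 3))                # 0.533

print(round(0.9565 * 0.9454, 3))  # 0.904, cf. the reliability in Sect. 4.2
```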

3.5 Step 5: Deriving Cloud System Architecture and Experimenting on Cloud

After building the simulations, various workload and resource configurations can be simulated to derive architectural alternatives, such as different numbers of concurrent users and servers, and different server types.

Let us assume a scenario in which a CIO wants good performance (e.g., a throughput of over 10,000 operations per second), stable performance (e.g., stable throughput between 100 and 400 concurrent users), and an inexpensive monthly cost (e.g., a monthly fee of less than 400 USD). The simulation then shows that when the CIO uses three nodes of i2.xlarge instances, the good-performance goal is satisfied, because the throughput ranges from 11,000 to 15,000 (Ops/Sec). However, it costs 1,842.48 USD per month, as the hourly cost of the instance is 0.853 USD. On the other hand, when the CIO selects three nodes of m4.large instances, the monthly fee would be 259.20 USD, as the hourly cost is 0.12 USD; therefore, the goal of an inexpensive monthly cost is achieved. However, the simulated throughputs range from 5,600 to 8,000 (Ops/Sec). By using the simulation, the CIO can derive Cloud design alternatives and perform trade-off analysis.
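As a back-of-the-envelope check of this scenario (assuming a 720-hour month; the hourly prices are those quoted above):

```python
# Illustrative cost check for the two alternatives, assuming 720 hours/month.
def monthly_cost(hourly_usd, nodes, hours_per_month=720):
    return hourly_usd * hours_per_month * nodes

print(round(monthly_cost(0.853, 3), 2))  # 1842.48 USD -> exceeds the 400 USD budget
print(round(monthly_cost(0.12, 3), 2))   # 259.2 USD   -> meets the budget goal
```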

Since the requirements in the early design stage are ambiguous and changing, it is essential for CIOs to come up with Cloud design alternatives quickly, inexpensively, and also reliably in the initial design phases; this helps them make decisions quickly and reduces trial and error in the later stages.

3.6 A Prototype Tool for Capturing Ontologies, Defining Mapping Rules, and Running Simulations

Figure 4 shows a tool supporting the five-step process. In the Ontology View tab, users can define new ontologies or load pre-defined ontologies to refine the ontologies and their weights. Likewise, pre-defined mapping rules can be loaded and refined in the Mapping View tab. After the ontologies and mapping rules are configured, the tool automatically calculates the similarities among the ontologies, constructs simulation models, runs simulations, and measures the reliability of the simulations. The Console tab displays the results.

Fig. 4. A Prototype Tool

4 Experiment

The objective of our experiments is to provide a proof of concept of our approach, constructing reliable simulation models complemented by benchmarks. Since few papers describe how to build reliable Cloud simulations, our experimental results are compared against Cloud benchmark data.

4.1 Benchmark Data

Figure 5 illustrates the Cloud benchmark data, collected from Amazon Web Service (AWS) and Google Compute Engine (GCE) by using the Yahoo! Cloud Serving Benchmark (YCSB). The benchmark data is utilized to construct the simulation models and used as a baseline to evaluate our simulation results.

Fig. 5. Results of Cloud Benchmarking

The YCSB benchmark used the Cassandra NoSQL database and ran on m4.large, c4.large, r3.large, and i2.xlarge instances of AWS and n1-standard-2 instances of GCE. The operation proportions for reading, updating, and inserting were 63.25%, 21.64%, and 15.09%, respectively. The operation count and record count increased as the number of concurrent users rose: the operation count rose from 100,000 to 9,200,000, the record count increased from 255,550 to 8,922,472, and the number of concurrent users was 10, 40, 80, 160, 320, 640, and 920.

4.2 Simulation Experiments

In the simulation experiments using CloudSim, m4.large instances of AWS were selected. Across the iterations of the experiments, the number of instances was increased from 3 to 9, and the number of concurrent users from 10 to 960. Figure 6a shows a comparison between the simulation results and the original benchmark data. The dotted lines show the simulation throughput results, and the solid lines depict the benchmark throughput results with the same workload and infrastructure settings as the simulation.

We cross-validated the simulation models by running Cloud benchmarks with workload configurations that had not previously been collected, such as 60, 120, 240, and 480 concurrent users, and then compared the simulation results with the new benchmark results. Figure 6b shows throughput comparisons between the benchmarks and simulations. The cross-validation experiments show a Mean Absolute Percentage Error (MAPE) of 9.79% and a Mean Absolute Error (MAE) of 665.73 ops/sec. In other words, the simulation models show 90.21% accuracy.
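For clarity, the two error metrics are computed as in the sketch below; the arrays hold illustrative values rather than our actual measurements.

```python
# Mean Absolute Percentage Error and Mean Absolute Error over paired
# benchmark/simulation throughputs (values here are illustrative only).
def mape(actual, predicted):
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

benchmark = [5600.0, 7200.0, 8000.0]   # ops/sec
simulated = [5100.0, 7600.0, 8500.0]

print(f"MAPE = {mape(benchmark, simulated):.2f}%")
print(f"MAE  = {mae(benchmark, simulated):.2f} ops/sec")
print(f"accuracy = {100 - mape(benchmark, simulated):.2f}%")
```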

Fig. 6. Cloud Simulation Results

The validation results are compared to the reliability and similarity measurements described in the previous section. After applying the mapping rules, the reliability equation yields 0.9042, where similarity\({}_{1}\) is 0.9565 and similarity\({}_{2}\) is 0.9454. In other words, the measured reliability of the simulation models is 90.42%, which is similar to the accuracy evaluation result.

5 Observations and Discussion

We have focused on explaining the five steps for constructing simulation models complemented by benchmarks. Our approach is semi-automated with the help of a prototype tool; the tool's users can either define new ontologies, mapping rules, and reliability quantification schemes, or refine existing ones, which can serve as references for future applications. The tool then automates constructing simulation models, running simulations, and measuring the reliability of the models. The ontologies, mapping rules, and reliability quantification scheme described in this paper appear to yield reliable simulation models. Moreover, by mapping the benchmark ontologies to those of a simulation, the semantics of the ontologies in the simulation become understandable and traceable.

One threat to the validity of our approach, specifically regarding the ontologies, mapping rules, and reliability quantification, is its limited generalizability, because the range of experiments and benchmark data was limited. Instead of fully generalizing our approach to fit all domains, we intend to offer as many references as possible - i.e., reference ontologies, reference mapping rules, and reference reliability quantifications defined by users in different domains - and then let users select and refine one, as from a catalog. However, constructing and maintaining ontologies are non-trivial tasks in practice. Also, since we discussed only throughput in our evaluation, our approach needs to cover a broader range of QoS parameters. Addressing these problems requires a more extensive case study. Last, the automation for constructing simulation models currently works only for specific mapping rules, such as those described in this paper.

6 Conclusion

We presented a complementary approach, among the first of its kind, that uses benchmarking and simulation together to estimate the performance of Cloud-based systems in a fast, inexpensive, and also reliable manner. In this approach, mapping benchmark ontologies to those of simulations enhances the reliability of simulation-based estimations, and does so in a more understandable manner. More specifically, our work proposes a five-step process for capturing the ontologies of estimation, benchmark, and simulation, and for mapping the benchmark ontologies to simulation ontologies, while considering the similarities, and mismatches, between the two, with the incorporation of a quantification scheme for simulation reliability and an algorithm that helps automate the construction of simulation models. A prototype tool has also been presented to support our five-step semi-automated process. The results from the simulations based on our complementary approach show that they are indeed reliable, to the extent that they are similar to the YCSB benchmark results when using Cassandra on AWS and GCE.

As future work, we plan to apply our approach to a wide variety of different types of domains and investigate more ontologies pertaining to such domains, while also considering mapping rules for them. Investigations of more systematic approaches for ontology mapping schemes and reliability quantification schemes, instead of just a few heuristic ones, lie ahead as well. Work is also underway towards fuller implementation of our prototype tool, e.g., for managing graphically-oriented goal-oriented models and mapping between them.