1.1 Model Implementation

Typically, building models is a fairly easy task, but making them accurate representations of the phenomena to be reproduced is a completely different matter. The potential sources of error are numerous. Among the most frequent are wrong interconnections of the components, inaccurate values assigned to input parameters, and incorrect use of the techniques and tools adopted to implement and solve the models.

The construction of a model requires several iterations at different levels of granularity. At the highest level, the user needs to iterate many times between three different operational environments: the Real world, the Abstract space, and the Modeling domain. The process of modeling is outlined in Fig. 1.1. In this book we will focus primarily on the Modeling domain phase.

A modeling study typically begins in the real-world environment with the observation and measurement of the phenomenon that must be reproduced. The interactions among the various components, or resources, must be assessed and their influence on the behavior of the phenomenon must be investigated. The variables that describe the quantities that we measure or want to estimate will be referred to as performance metrics or indexes. Usually, some key components are critical to the success of the study because of the strong influence they exert on the metrics of interest, while others have a negligible impact on them. Since the effect of the latter on the results is minimal or zero, they can be ignored without affecting the validity of the model, thus greatly reducing its complexity.

Among the key components of the model it is very important to identify the most requested one, i.e., the bottleneck, as it determines the performance of the overall system. The saturation condition of the bottleneck depends on the characteristics of both the service requests and the component itself. Note that due to the typical fluctuations of a workload, both in its intensity and in the amount of work required, the bottleneck can migrate among different resources, generating abrupt changes in performance. Ignoring this condition may invalidate the entire modeling study. Bottleneck migration is very common when the workload consists of different applications with highly variable service demands, as the sketch below illustrates.
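To make this concrete, here is a minimal sketch (in Python, with purely hypothetical service demands) showing how a change in the workload mix can move the bottleneck, i.e., the resource with the largest total demand per job, from one resource to another:

```python
# Hypothetical total service demands (seconds per job) of two resources
# under two different workload mixes. The resource with the largest
# demand is the bottleneck and saturates first as the load grows.
demands = {
    "CPU":   {"mix_A": 0.040, "mix_B": 0.012},
    "Disk1": {"mix_A": 0.025, "mix_B": 0.030},
}

for mix in ("mix_A", "mix_B"):
    bottleneck = max(demands, key=lambda r: demands[r][mix])
    print(f"{mix}: bottleneck is {bottleneck}")
# mix_A: bottleneck is CPU
# mix_B: bottleneck is Disk1
```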

Fig. 1.1 Operational environments involved in the modeling process

The technical features of the components, the assumptions introduced, the workload considered, and the statistical properties of the variables used must be accurately described. Depending on the type of study, the data required for these operations can be derived directly from measurements or must be generated by analytic functions.

The knowledge gained from the analysis of the real-world phenomenon will be merged in the Abstract space (see Fig. 1.1) with the technical background of the modeler. In this phase, decisions must be made on the modeling technique to be adopted, on the plausibility of the assumptions introduced, on which components should be considered, on the metrics to be used, and on many other issues. The actions performed in the abstract space rely on the experience and creativity of a person rather than on her/his technical background. These skills can neither be learned with an academic approach nor passively absorbed; instead, they build up with experience gained through daily trial-and-error work.

The work in the abstract space ends with the implementation of a first version of the model. We now enter the modeling domain. The process of implementing a model is intrinsically iterative, and an incremental approach is usually adopted and strongly recommended.

The components represented in the model are progressively increased, starting from a small initial set that must include the bottleneck and the other key components that have a great impact on the results. At each iteration the complexity of the model is increased until the level of detail of the metrics obtained matches the objectives of the study.

Once a model and its workload are completely defined and parameterized, the validation phase starts to assess the model's accuracy. Several methods of analysis are used to evaluate different properties of the model. The techniques adopted are a function of the type of study to be conducted.

A model should be calibrated, its sensitivity and robustness must be assessed, its performance must be forecast with a projection technique, and the domain of validity of its results must be evaluated. Typically, a validation technique requires several iterations at a low level of granularity. When the system to be analyzed is operational, the performance metrics obtained as model outputs must be compared with those measured on the real system while it processes the production workload. In this case, the calibration of the model consists of tuning its input parameters so that the differences between the two sets of performance metrics (the measured outputs compared to those of the model) are negligible or at least tolerable. In case of unacceptable differences, the input parameters must be recomputed and the assumptions introduced (including the layout of the model) should be revised.

Great attention must also be paid to the selection of the measurement interval (the observation interval) in which data on the behavior of the system components and the workload executed are collected. The data must be collected when the workload processed is representative of the load that is typically executed by the system. In some cases, the measurement period may consist of several disjoint intervals. The main steps of the modeling design process and the operations required by the incremental approach are shown in Fig. 1.2.

Fig. 1.2 Incremental approach to model building

1.2 Inputs and Outputs of Models

Any computing system can be viewed as a set of resources (hardware and software) that execute the processing requests submitted by users. Therefore, the input parameters of a model can be divided into two groups, regarding the load and the resources, respectively. Depending on the system being modeled, processing requests will hereinafter be called jobs, applications, customers, requests, or users interchangeably, and the resources will be referred to as stations, elements, components, or service centers.

The arriving requests are collectively called workload, while workload characterization refers to their quantitative description [11]. When the individual workload components have similar characteristics, they are grouped together and their statistical parameters, such as mean, standard deviation, and distribution, are used as inputs to models. In this case, the workload is referred to as homogeneous or single class and the models are called single class models. The components of a workload that consists of various types of applications typically have significantly different service demands. In this case, several groups of components with similar characteristics must be identified and the workload is referred to as heterogeneous or multiple class (multiclass). Each class will be described with its statistical characteristics.
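As an illustration of how such groups might be identified, the following minimal sketch (in Python, with hypothetical measurements) clusters jobs with similar resource usage into classes using k-means, one of the techniques commonly adopted in workload characterization; a real study would of course work on measured data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row describes one measured job: (CPU seconds, number of I/O operations).
# The values are hypothetical.
jobs = np.array([[0.05, 12], [0.06, 10], [0.04, 15],      # interactive-like
                 [2.10, 900], [1.90, 850], [2.30, 940]])  # batch-like

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(jobs)
for c in range(2):
    members = jobs[labels == c]
    print(f"class {c}: mean CPU = {members[:, 0].mean():.2f} s, "
          f"mean I/O = {members[:, 1].mean():.0f}")
```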

In this section we focus on the inputs and outputs of single class models. With multiple class workloads the notation becomes more complex (an index for the classes must be added to the identifiers of the stations), but the meaning of the parameters remains the same. Models with single class workloads will be described in Chap. 2, while models with multiclass workloads will be analyzed in Chap. 3.

Table 1.1 Some basic input parameters and output measures of single class queueing network models with queue and delay stations

Since performance models can be implemented at different levels of detail, the metrics described in Table 1.1 are divided into two groups, depending on whether they refer to the component level or to the system level. These metrics are the basic ones used in single class models consisting of queue and delay stations. In a queue station, requests compete for the server, wait in queue to receive service, and leave the station when finished. In a delay station there is no competition for servers, as it is assumed that there is always a free server for each incoming request.

In single class models, the class index can be omitted. When a metric refers to a single component of the model, the subscript identifies the specific element considered. The subscript 0 (zero) is used to identify metrics that refer to the system as a whole. For clarity, whenever possible, we will refer to the service requests arriving at each station in a component-level model simply as requests. In system-level models, the computational requests submitted by the users will be identified as jobs or customers interchangeably.

A description of some basic input parameters and output results for models that use queue and delay stations follows. The algebraic relationships among some of them, derived in [10, 16, 26], are also reported. Metrics for other types of stations, e.g., fork/join, finite capacity regions, and Petri Net places/transitions, will be described in the case studies where they are used.

Throughout this book we have tried to keep the description as general as possible, but since we have used the JMT Java Modelling Tools to solve the models, it has sometimes been necessary to refer to terms specific to the individual tools used, i.e., JSIMg (the Simulator), JMVA (the Analytical solver), and JABA (the Asymptotic Bound Analyzer).

The open source JMT suite can be downloaded from http://jmt.sourceforge.net.

Let us remark that a large part of the queueing networks solved with analytical techniques are of the separable type. This subset of general queueing networks can be solved analytically with very efficient algorithms. Clearly, the property of being separable introduces some restrictions on the system characteristics that can be modeled. Some of them concern the concurrent use of resources, the constraints on the number of requests, the adaptivity of the routing, the priority scheduling algorithms, the blocking policies, the creation and deletion of jobs, and the dynamic change of Service times (see Sect. 2.3 and, e.g., [4, 25]).

Often these limitations have a minimal impact on the behavior of the system modeled and can easily be bypassed by changing the assumptions adopted. In some cases, system characteristics that cannot be modeled directly with separable networks may have a negligible influence on performance. In other cases, the global model can be split into several sub-models, some separable and others not, that may be solved with different techniques. The results obtained from the solutions of the individual sub-models can then be combined with various techniques in order to obtain the solution of the original global model.

In any case, it is important to note that these limitations only affect models that are solved analytically, while those solved with simulation are not (or only minimally) affected.

Input Parameters 

Open/Closed (types of workloads, types of models)

Workloads, like models, may be of two types: open or closed. When the workload is open, the number of customers in the model fluctuates and can grow indefinitely when a station becomes saturated, while with a closed workload this number is kept constant. An example of an open workload is the flow of requests arriving from an Internet connection, which is usually quite bursty. The models that execute open workloads are referred to as open models, while those executing closed workloads are called closed models (see Fig. 1.3).

Fig. 1.3 Examples of the two basic types of models: open (a) and closed (b)

An example of a closed workload is a computing infrastructure that can only be accessed by the employees of a company. The number of customers is fixed and limited to the company employees. When the maximum number of customers that can be in execution simultaneously is reached, a new customer can enter the system only when another customer completes its execution. In simulation models with open workloads, the customers arriving at the system are generated by a Source station and, at the end of the execution, are routed to a Sink station (in JSIMg also Fork, Class Switch, and Transition stations may generate customers). Models with open and closed workloads running concurrently are also possible, and are referred to as mixed models.

\({\boldsymbol{\lambda }_{{\textbf {0}}}}\)–Interarrival times (workload intensity in open models)

Describes the characteristics of the incoming flow of requests arriving at the system. In analytical models (JMVA), the exponential distribution of Interarrival times is typically assumed as default, and in this case only the arrival rate \(\lambda _0\) is required. In simulation models (with JSIMg), different types of distributions (e.g., burst, hyper-exponential, Erlang, Pareto, etc.) may be selected and some related statistical indexes should be defined. The number of these parameters varies as a function of the distribution.

\(\boldsymbol{N}_{{\textbf {0}}}\)–Number of customers in the model (workload intensity in closed models)

This parameter in closed models refers to the mean number \(N_0\) of customers in the model. A job arrives at the system, circulates among the service stations (the resources of the system), and then departs at the completion of its execution. In closed models it is immediately replaced by a new job with the same characteristics, keeping \(N_0\) constant.

Type of component: Queue, Delay, Source, Sink, Class Switch, Fork/ Join, Semaphore, Place/Transition, Finite Cap. Region, Router

In a model, the components representing system resources can be of different types. The components most used in analytically solved models are of two types: queue and delay. In queue components, requests arrive, compete for the server, wait in the queue when it is busy, execute when it becomes idle, and then exit. A delay component simply introduces a delay in the flow of requests, but no queue is created. In this case, an arriving request will always find an idle server, since the servers are assumed to be infinite in number. Many more types of components are used in simulation models, depending on the tool considered and the complexity of the system to be modeled. For example, in JSIMg, Fork and Join stations are used to simulate parallelism, Finite Capacity Regions (FCR) to control access to model regions, Semaphores to selectively block requests, and Places and Transitions to simulate Petri Nets.

\({\textbf {V}}_{\textbf {r}}\)–Number of visits per job (to each component)

During its execution, a job visits the components (CPU, disks, storage, ...) several times before leaving the system. \(V_r\) is the mean number of visits, also referred to as visit count, that a job makes to station r during its complete execution.

\({\textbf {S}}_{\textbf {r}}\)–Service time per visit (for queue components)

The mean time required by component r to execute one service request, corresponding to one visit, is referred to as Service time. This value does not include the time spent waiting in queue when the server is busy. The mean value of \(S_r\) and/or other statistical indexes (distribution, variance, coefficient of variation, etc.) must be provided according to the technique adopted to solve the model (analytical or simulation). See also the comments made above for Interarrival times.

\({\textbf {D}}_{\textbf {r}}\)–Service demand per job (for each component)–\(D_r=V_r\,S_r=B_r/C\)

The global amount of service time required by a job from component r for a complete execution is referred to as Service demand \(D_r\). Its value is given by the product of the Service time \(S_r\) required by one visit to component r and the Number of visits \(V_r\) that the job makes to it. The \(D_r\) are important because it can be shown that the solution of most queueing networks does not depend on the individual values of \(V_r\) and \(S_r\) but only on their product, i.e., only the service demand matters. This property, exhibited by separable queueing networks, is important as its application reduces the number of input parameters and the complexity of the models. Furthermore, the \(D_r\) can be measured more easily than \(V_r\) and \(S_r\) since, very often, their values are stored directly in the system log files. Another way to obtain the values of the \(D_r\) is to divide the total busy time \(B_r\) of component r by the number of jobs C completed in the observation interval. The high level of aggregation adopted in the models that use the \(D_r\) makes it impossible to compute the Throughput and Response time at the single component level, while the performance indexes at the system level are obtained correctly. A more detailed description of Service demands and separable networks can be found in the Case Study Sect. 2.3.
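The two ways of obtaining \(D_r\) can be summarized in a few lines (a sketch with hypothetical values for a disk component):

```python
# From the station parameters: D_r = V_r * S_r
V_r = 4500            # visits per job
S_r = 0.008           # service time per visit (s)
D_r = V_r * S_r       # 36 s of disk service per job

# From measurements over an observation interval: D_r = B_r / C
B_r = 7200.0          # total busy time of the component (s)
C = 200               # jobs completed in the interval
assert abs(D_r - B_r / C) < 1e-9   # both give 36 s per job
```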

\({\textbf {Z}}\)–Think time per visit (for delay components)

In a delay component, the requests never wait in queue since a server is always available for their execution, which therefore always takes on average Z time units (this time is typically referred to as Think time). Z is the counterpart of the Service time S of queue components and can be considered as the mean delay introduced by a delay component in the flow of requests that goes through it. A delay component is often used to represent the users in closed models. For this reason it is commonly selected by default as the Reference station of a model.

Reference station for each workload class (at system level)

The station used to compute the performance indexes at the system level (throughput, response time, global utilization of resources, etc.) is referred to as the Reference station (RS). When a job flows through the RS, in most cases it is implicitly assumed that its execution has completed and therefore it is leaving the system. In this case, a job visits the RS only once during its life, and for this reason the delay station that in most models simulates the users is often selected as RS. However, any of the components of the model may be selected as RS. Clearly, all the performance indexes are affected by this choice. To compute their values, the visits to each component of the model must be scaled with respect to those made to the RS. In open models, JSIMg assumes by default a Source station as RS. When a job completes its execution, its performance indexes are computed considering the time interval elapsed between its generation by the Source and its exit through the Sink. In closed models, any station can be selected as RS. In this case, the performance indexes of a job are calculated considering the time elapsed between the instant it leaves the RS and enters the model and the instant it exits the model and returns to the RS (see Fig. 1.3).
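For example, visits are usually derived from completion counts measured over an observation interval: the count \(C_r\) of each component is divided by the count of the Reference station. A minimal sketch with hypothetical counts:

```python
# Completions measured in the same observation interval (hypothetical).
completions = {"RS": 1000, "CPU": 100000, "Disk": 45000}

# Visits per job, scaled with respect to the Reference station: V_r = C_r / C_RS
visits = {r: c / completions["RS"] for r, c in completions.items()}
print(visits)   # {'RS': 1.0, 'CPU': 100.0, 'Disk': 45.0}
```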

Output Measures 

\({\textbf {X}}_{\textbf {0}}\)–System Throughput (at system level)–\(X_0=N_0/(R_0+Z)\) (closed model), \(\lambda_0=N_0/R_0\) (open model)–\(\textit{Little's Law}\)

This metric represents the rate at which the jobs complete their executions and leave the system. To compute this metric it is fundamental to know which station is the Reference station of the model, since only the jobs that visit it have completed their executions. Without loss of generality, the number of visits to the external part of the model is often assumed to be one. \(X_0\) may be computed by applying Little's law to the system as a whole (see Fig. 1.3).
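A minimal numeric sketch of these relations (hypothetical values):

```python
# Closed model: X0 = N0 / (R0 + Z)
N0, R0, Z = 20, 1.5, 8.5          # jobs, seconds, seconds
X0 = N0 / (R0 + Z)                # = 2.0 job/s

# Open model in equilibrium: the throughput equals the arrival rate,
# and Little's law gives N0 = lambda0 * R0
lam0, R0_open = 2.0, 1.5
N0_open = lam0 * R0_open          # = 3.0 jobs in the system on average
```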

\({\textbf {N}}_{\textbf {0}}\)–Number of jobs (at system level)–\(N_0=X_0\,(R_0+Z)\) (closed model), \(N_0=\lambda_0\,R_0\) (open model)

In open models the number of jobs in the system is an output metric since it depends on the arrival rate and on the contention of the components. When the utilization of a component is close to one it saturates and the value of \(N_0\) grows to infinity.

\({\textbf {R}}_{\textbf {0}}\)–System Response Time (at system level)–\(R_0=(N_0/X_0)-Z\) (closed model), \(R_0=N_0/\lambda_0\) (open model)

The amount of time required for a complete execution of a job is referred to as the System Response Time. In closed models, it can be seen as the time interval between two consecutive visits to the Reference station by the same job (the first corresponds to the instant of time in which the job is generated and the second to the instant of its completion). In open models, \(R_0\) corresponds to the time interval between the generation of a job by the Source station and the moment in which it completes its execution and reaches the Sink station. Since the models are in equilibrium and each job entering the system usually visits the Sink once, the Source can be considered as the Reference station.

\({\textbf {X}}_{{\textbf {r}}}\)–Throughput of component r–\(X_r = X_0 V_r\) \(\textit{Forced Flow law}\)

The number of requests processed in a time unit by component r. Note that the unit of measure is requests per unit of time and not jobs per unit of time (used by the System Throughput). When the Service demands \(D_r\), instead of the Visits \(V_r\) and Service times \(S_r\), are used as input parameters, the \(X_r\) cannot be obtained from the model. In this case, the throughput of each component is equal to \(X_0\). The relationship between \(X_0\) and \(X_r\) is given by the Forced Flow law.

\({\textbf {N}}_{{\textbf {r}}}\)–Number of requests in component r

In a queue component, this metric refers to all the requests in the station, whether waiting in queue or in execution. In a delay component all the requests in the station are in execution, thus \(N_r\) corresponds to the mean number of requests in service.

\({\textbf {U}}_{{\textbf {r}}}\)–Utilization of component r–\(U_r = X_r S_r = X_0 V_r S_r = X_0 D_r\) Utilization law

Fraction of time the server of a queue component r is busy (in a station with one server). In a delay component this value corresponds to the mean number of requests in service.
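The Forced Flow and Utilization laws are easy to apply once \(X_0\) is known; a minimal sketch with hypothetical values:

```python
X0 = 2.0                # system throughput (job/s)
V_r, S_r = 10, 0.03     # visits per job and service time per visit (s)

X_r = X0 * V_r          # Forced Flow law: 20 requests/s at component r
U_r = X_r * S_r         # Utilization law: 0.6
assert abs(U_r - X0 * (V_r * S_r)) < 1e-12   # equivalently U_r = X0 * D_r
```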

\({\textbf {Q}}_{\textbf {r}}\)–Queue time of component r (per request)–\(Q_r = R_r - S_r\)

Mean time spent in queue waiting for the server in a queue component.

\({\textbf {R}}_{{\textbf {r}}}\)–Response time of component r (per request)–\(R_r = N_r / X_r\)

Mean time required to execute the processing request of one visit to component r. Its value includes all time spent in the component during a visit, whether waiting in queue or being served. The unit of measure is the time per request. If the number of servers of the queue component is one, then it will be: \(R_r = Q_r + S_r\).

\({\textbf {Rd}}_{{\textbf {r}}}\)–Residence time of component r (per job)–\(Rd_r = V_r R_r\)

The total time spent by a job at component r during its complete execution (including both the time spent in queue and the time being served) is referred to as the Residence time \(Rd_r\). While the Response time \(R_r\) is local to a component (i.e., it may be computed considering only the Response times of one visit to the resource), to compute the Residence time \(Rd_r\) of a resource it is necessary to know the Number of visits \(V_r\) that a job makes to the resource during its complete execution (see Appendix A.1). The unit of measure is time per job. The System Response Time is the sum of the Residence times of all the components of the model.
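A small numeric sketch of this relation (hypothetical values):

```python
# Residence times per job: Rd_r = V_r * R_r
V = {"CPU": 100, "Disk": 99}         # visits per job
R = {"CPU": 0.005, "Disk": 0.012}    # response time per visit (s)

Rd = {r: V[r] * R[r] for r in V}     # {'CPU': 0.5, 'Disk': 1.188}
R0 = sum(Rd.values())                # System Response Time = 1.688 s per job
```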

1.3 Parameterization of Simulation Models

The sequence of operations required to implement a model is clearly influenced by the techniques and tools used. In this section we restrict our attention to simulation which, compared to other techniques, allows for maximum generality in terms of the system architectures that can be modeled and the assumptions that can be adopted.

In the following description of the steps required to implement a simulation model (see Fig. 1.4) we tried to be as general as possible. However, since the models were solved with the JSIMg simulator, its characteristics clearly influenced the sequence of operations performed. The figures mentioned in the flowchart of Fig. 1.4 show some user-interface windows for setting the parameters required by the steps represented.

Fig. 1.4 Main steps to implement a simulation model with JSIMg. Figure numbers in the flowchart refer to examples of the corresponding screenshots

Implementing a model begins with describing its components and their interconnections. Depending on the type of user interface available, the description can be graphical (see, for example, the model of Fig. 1.5 created with JSIMg) or of another type (e.g., wizard-based).

Fig. 1.5 Example of the graphical representation of a model using JSIMg

Fig. 1.6 Definition of the parameters for the open class Class1

The parameters for the workload characterization  are: type of customers classes (open or closed), arrival rate and distribution of interarrival times (in some cases also other statistical parameters are required) for open classes, number of customers for closed classes, and Reference station. In Fig. 1.6 the following parameters for Class1 are set: open class, arrival rate \(\lambda =1\) req/s, exponential distribution of interarrival times, and the station Source1 as Reference station. To select different arrival rates or distributions, simply click on the Edit button and the list of available distributions will be shown. With multiclass workloads the parameters of each class must be provided.

The next step is setting the station parameters for all workload classes. In JSIMg, the parameters of a Queue station are organized into three sections: Queue, Service, and Routing. In the Queue Section, the Capacity size (the maximum number of customers allowed in the station, in queue and in service), the type of scheduling algorithm, the queue policy, and the Drop Rule (in stations with limited capacity) are set. In the Service Section, the Number of Servers of the station, the type of service Strategy (load independent or load dependent), and the statistical parameters of the Service Times Distribution are set. In the Routing Section, the routing strategies of jobs over the interconnections (automatically detected among the stations) may be described. For example, in Fig. 1.7, Probability has been set as the Routing Strategy for station Queue1, and the customers in output are sent to Queue2 with probability 0.3 and to Queue3 with probability 0.7.

Fig. 1.7 Parameters of the Routing Section of Queue1 station

Fig. 1.8 Performance indexes to be collected, their precision and statistical requirements

The next step concerns the selection of the metrics (performance indexes) that must be computed by the model. Usually, for each metric, several statistical parameters must be set. In JSIMg (see Fig. 1.8) the following are required: the class of customers and the station (or the entire system) considered, the confidence level (see Appendix A.2 and, e.g., [36, 37]), the maximum relative error, and whether or not to generate a file with all the collected values of the analyzed metric. In Fig. 1.8, five indexes concerning Class1 customers are selected: two aggregated at the system level (System Response Time and System Number of Customers) and three at the Queue1 station level (Response time, Number of customers, and Utilization). For the Response time, the generation of the CSV file with all the values of the samples analyzed is requested (i.e., the Stat.Res checkbox is flagged). The 99% confidence level (the default value) is required for all the indexes, and the maximum relative error tolerated is 0.03. The simulator stops collecting data for an index when the required accuracy is achieved.
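The stopping rule can be sketched as follows (in Python, with a synthetic stream of samples standing in for the collected Response times; actual simulators such as JSIMg use more sophisticated methods, e.g., batching, to deal with correlated samples):

```python
import math
import random
import statistics

Z99 = 2.576                      # normal quantile for a 99% confidence level
max_rel_err = 0.03
samples = []

while True:
    samples.append(random.expovariate(1.0))   # stand-in for one collected sample
    if len(samples) < 100:                     # warm-up before testing precision
        continue
    mean = statistics.fmean(samples)
    half_width = Z99 * statistics.stdev(samples) / math.sqrt(len(samples))
    if half_width / mean <= max_rel_err:       # required accuracy achieved
        break

print(f"stopped after {len(samples)} samples: {mean:.3f} ± {half_width:.3f}")
```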

When the simulation starts, for each selected index a graph like the one of Fig. 1.9 is plotted. As the simulation progresses, the behavior of the confidence intervals and of the mean value of each index are shown, together with the number of samples analyzed. As requested (see the Stat.Res. checkbox flagged in Fig. 1.8), the CSV file was generated with all the values of the Response times, and the statistical indexes were computed. The CSV file will contain, among the other variables, the values of the percentiles (see the example of Figs. 2.10, 2.11).

Fig. 1.9 Response times of Queue1 station: mean value and confidence intervals computed on different samples collected as the simulation progresses

Fig. 1.10 Response times of Queue1 station as a function of the Arrival rates

Most performance studies require the evaluation of the impact of one or more parameters on system performance. To meet this objective, it is necessary to execute a sequence of models, increasing (or decreasing) at each step the value of one parameter, e.g., the Arrival rate or the Number of customers, referred to as the Control parameter. To make this process efficient, many simulators provide a feature called What-if. For example, Fig. 1.10 shows the Response times obtained from the execution of 10 models with the Class1 customer Arrival rate increasing from 0.2 to 1.2 job/s. Mean values and confidence intervals are also reported.
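To show the kind of behavior such a sweep exposes, here is an analytic stand-in for the What-if of Fig. 1.10: a sketch using the M/M/1 formula \(R=S/(1-\lambda S)\) with a hypothetical service time (the simulated station of Fig. 1.10 need not satisfy the M/M/1 assumptions):

```python
S = 0.8                                      # hypothetical mean service time (s)

for lam in [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]:   # control parameter: arrival rate
    U = lam * S                              # utilization of the station
    if U < 1.0:
        R = S / (1.0 - U)                    # M/M/1 mean response time
        print(f"lambda = {lam:.1f} req/s -> R = {R:.2f} s")
    else:
        print(f"lambda = {lam:.1f} req/s -> saturated (U >= 1)")
```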

1.4 Parameterization of Analytical Models

In this section we outline the steps required to implement a model that will be solved with an analytical technique. As can be seen from Fig. 1.11, these steps are not very different from the ones already described for simulation models in Fig. 1.4. Clearly, the analytical technique and the tool adopted introduce some peculiarities in the operations that are not found in simulation.

Depending on the solution algorithm adopted, the analytical techniques can be subdivided into exact, approximate, and asymptotic. Each technique has its own parameters. In the following, we will refer to models solved with the Mean Value Analysis (MVA) technique [25, 31] using the JMVA tool. The MVA algorithm computes the exact values of the performance indexes, but has several limitations in terms of the system characteristics that can be modeled.

In the screenshot of Fig. 1.12, MVA has been selected as the solution algorithm for the closed class Class1 with 10 customers. The tabs are shown in the sequence that must be followed for the parameter settings: Classes, Stations, Service times, Visits, Reference station, What-if. In Fig. 1.13 the mean values of the Service times of the three stations CPU, Storage1, and Storage2 are set. As required by the MVA algorithm, the values of these parameters are considered exponentially distributed. The visits to the three stations are \(V_{CPU}=10000\), \(V_{Storage1}=5499\), \(V_{Storage2}=4500\).
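For readers who want to see what the exact MVA recursion computes, here is a minimal single-class sketch (in Python; the visits are those of Fig. 1.13, while the service times below are hypothetical placeholders):

```python
def mva(service_times, visits, N, Z=0.0):
    """Exact single-class MVA for load-independent queue stations,
    with an optional think time Z. Returns X0, R0 and queue lengths."""
    demands = [s * v for s, v in zip(service_times, visits)]
    queue = [0.0] * len(demands)                 # Q_r(0) = 0
    for n in range(1, N + 1):
        # Residence time at station r: Rd_r(n) = D_r * (1 + Q_r(n-1))
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        R0 = sum(resid)                          # system response time
        X0 = n / (R0 + Z)                        # Little's law on the system
        queue = [X0 * rd for rd in resid]        # Q_r(n) = X0 * Rd_r(n)
    return X0, R0, queue

# Visits from Fig. 1.13; service times are assumed values (seconds per visit).
V = [10000, 5499, 4500]            # CPU, Storage1, Storage2
S = [0.0002, 0.0005, 0.0006]
X0, R0, Q = mva(S, V, N=10)
print(f"X0 = {X0:.4f} job/s, R0 = {R0:.2f} s")
```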

In simulation models, to minimize the overhead introduced by the collection of the data, users should select only the performance indexes of interest for the study. In analytical models, however, a complete set of indexes is always computed, as their derivation is very fast (see, e.g., Fig. 1.14). In Figs. 1.15 and 1.16 the Utilizations and the Residence times of the three stations are plotted for a Number of customers ranging from 10 to 100 (90 models were executed with a What-if analysis). The values of the performance indexes are also provided in tabular form.

Fig. 1.11 Main steps for the implementation of a JMVA model solved with an analytical technique. Figure numbers in the flowchart refer to examples of the corresponding screenshots

Fig. 1.12 Selection of the MVA solution algorithm and settings of one closed class Class1 with 10 customers

Fig. 1.13 Settings of the Service times of the three stations

Fig. 1.14 Throughput of the three stations. All the performance indexes are computed

Fig. 1.15 Utilizations of the three stations as a function of the Number of customers

Fig. 1.16 Residence times of the three stations as a function of the Number of customers