Modeling and Optimization of Decision-Making Process During Loading and Unloading Operations at Container Port

The subject of this paper is the process of unloading and loading ships at maritime container terminals. Since decision problems in container terminals are dynamic in nature and must be re-evaluated over time based on the state of some crucial underlying factors (such as ships, cranes, or storage yard characteristics), the problem is formulated as a Markov decision process (MDP). The objective is to provide the optimal sequence of decisions that are to be undertaken at each time during the period of unloading and loading operations in order to minimize the total waiting time of wharf cranes and vehicles allocated to serve a containership. A simulation model is also developed in this paper to test and evaluate the optimal policy provided by the MDP.


Introduction
The maritime container terminal is a place where cargo containers are transferred between different transport vehicles for onward transportation. Every marine container terminal performs four basic functions: receiving, storage, staging, and loading for both import and export containers [1]. Containers arrive at the terminal by multiple means of transport and are stored in the terminal area. They then leave the terminal by the same means to reach their final destinations. The maritime container terminal provides the interface between railroads, ocean-going ships, and over-the-road trucks, and represents a critical node in the transport network. The wharf crane is the most critical element in container terminals. Thus, managers must make decisions regarding labor and equipment assignments that directly affect wharf crane productivity. The problem for the container terminal is to achieve different goals in an uncertain and complex environment characterized by container arrivals using different transportation modes (trucks, trains, and ships) and by decisions to be taken during each stage. Loading/unloading operations, resource allocation, and storage yard management are the main activities to be optimized. These operations are carried out in sequence or in parallel. The terminal may also have to take into account some global environment variables (wind, visibility, etc.), which can disturb the operation of the entire system. So, the final decision can only be made by a human decision maker. Providing adequate policies to manage the container terminal is not a simple task, and it is unlikely that a single model exists that can generate an optimal solution and deal with the full complexity of this problem. However, it is possible to provide the decision maker with optimal policies to solve some problems such as resource allocation or loading/unloading operations.
This paper presents an application of the Markov decision process (MDP) to optimize the loading and unloading operations in the container terminal context, with the objective of minimizing the total waiting time of wharf cranes and vehicles. To unload a ship, the wharf crane picks up containers from the ship and puts them on shuttle trucks that move them to the storage yard in the terminal. In the loading procedure, the wharf crane unloads a container from a shuttle truck and puts it on the ship. In the formulated MDP, we consider two queues of shuttle trucks: one under the wharf crane and the other in the storage yard (near the yard crane). The states in this model correspond to the numbers of shuttle trucks in the two queues, and the actions concern the allocation of shuttle trucks and yard cranes to serve the wharf crane at a specific time. The cost function depends on the number of waiting shuttle trucks under each crane.
Our previous paper [2] presents an MDP for the same problem in which the formulation considers only quay crane operations and the study is limited to providing a policy that affects only quay crane productivity. That MDP ignores the storage and movement operations performed in the terminal by yard cranes and vehicles, and it does not take into account that the shuttle trucks form a closed loop between the quay and the storage yards. Instead, each quay crane is considered as a single service facility, and the shuttle trucks arrive according to a Poisson distribution. The model presented in this paper extends the one formulated in [2] by also considering yard cranes and vehicles in the MDP. The objective is to make the model more realistic. Furthermore, the optimal policy will involve all critical elements that affect container terminal productivity.
Our basic idea in this paper is the following: since exponential and Poisson distributions may be accepted to represent crane service time and truck arrival time, we can establish an MDP based on this assumption in order to obtain an alternative policy that can be compared to the usual practices in container terminals. To test the validity of the exponential/Poisson assumption, we performed, in Sect. 3, a time-motion study and a data analysis of the inter-arrival and service times of shuttle trucks and cranes. Kolmogorov-Smirnov tests were used to determine which theoretical distributions can or cannot be used to describe individual samples of the time-motion study. The exponential distribution was the service time distribution considered in the testing procedure. This distribution was appropriate for the majority of the samples tested for crane service times and shuttle truck inter-arrival times. On the other hand, the arrival process appears to be properly represented by the Poisson distribution.
Several distributions are, of course, valid to represent crane service time and truck inter-arrival time in the terminal. In the literature, the exponential distribution is adopted in [3,4], and other authors propose distribution functions such as uniform, gamma, Erlang, and Weibull. In [5], the uniform distribution is proposed to represent wharf crane service time and a triangular distribution to represent yard gantry crane service time. The authors of [3,4] suggest a triangular distribution function for yard gantry crane operation times. Other works [6,7] suggest using Erlang random variables, whereas [8] proposes normal random variables for both crane types (quay and yard). In [9,10], several distribution functions (truncated normal, lognormal, gamma, Weibull, exponential, beta) are tested for each handling activity and for different container types. The estimation results show that only the truncated normal, gamma, and Weibull random variables were statistically significant and, in particular, that the gamma random variable produced the best results.
Because existing non-commercial simulators of container terminals are quite limited and cannot simulate a specific scenario such as the policy obtained in this paper, a simulation model based on discrete-event simulation techniques and an object-oriented approach is presented in Sect. 6. Section 7 is dedicated to the simulation results and compares the obtained policy with the current practices in the container terminal. According to congestion criteria, the solved MDP provides a more efficient policy than the current practices (without policy). The policy consists of dynamically managing the equipment allocated in the terminal. For example, a shuttle truck allocated to serve a particular wharf crane can start working immediately with another crane when the speed of this wharf crane decreases. Furthermore, this real-time strategy allows allocating exactly the needed number of pieces of equipment in the storage yards and at the berth, depending on the flow of containers between ships and quay.

Literature Review
Many research works in engineering, as well as in computer science, have approached the problem of container terminal management in different ways, and the existing literature reports several approaches to managing a container terminal. Most proposed approaches are based on stochastic optimization models [11][12][13]. Such approaches schematize container terminal activities through single-queue models or through a network of queues. Although the approaches based on optimization models allow a more elegant and compact formulation of the problem [14][15][16][17], discrete event simulation (DES) models help to achieve several aims: overcome the mathematical limitations of optimization approaches, make computer-generated strategies/policies more understandable, and support decision makers in daily decision processes through a what-if approach [13]. In recent years, several simulation tools have been developed. Some of those tools offer libraries, graphical interfaces, and different facilities for the user. In our previous work [11], we proposed an architecture for container terminal simulation based on the object-oriented paradigm and a distributed discrete event simulation approach.
Other approaches are based on simulation models. These approaches schematize container terminal activities through single-queue models or through a network of queues, and the main contributions seek to maximize overall terminal efficiency or the efficiency of a specific sub-area or activity inside the terminal. In [3], the authors propose a simulation model for analyzing the performance of a real container terminal in Korea. The simulation model is developed using an object-oriented approach and SIMPLE++, an object-oriented simulation software package. In [18], the authors present a solution to the problem of resource allocation and scheduling of loading and unloading operations in a container terminal. The solution of the resource allocation problem is based on a network design formulation that assumes that the loading and unloading processes can be modeled as a network of container flows between the ships and the terminal yard for all the work shifts. In [19], an approach is proposed to help the management of a container transshipment seaport decide on the balance between the percentage of containers that undergo security inspection and the concomitant departure delays of outbound vessels and port costs, as measured by the number of container moves. The authors use a modified A* (A-star) algorithm to model the problem and to understand the relation between the percentage of containers to be inspected and departure delays. However, the paper is a preliminary study; the proposed formulation is quite simple, and it does not accurately recreate a real container terminal scenario. In [13], Witness software is used to simulate the Kwai Chung container terminals. The model is used to predict the actual container terminal operations. Simulation is also used in [20] to investigate the positive impact of ship-to-rail direct loading on the capacity of a container terminal and to identify the congested areas of the terminal.
In our previous paper [11], we propose a simulation model for loading/unloading operations management. The software is based on a distributed discrete event simulation approach and the object-oriented paradigm. We used the Java language to implement the simulator. In [20], simulation is used as a support tool for investment decisions (reduction of the queue of incoming vessels). Still on simulation as a decision support tool, the authors in [21] propose a simulation model for the design and evaluation of multi-terminal systems for container handling. In [21], the authors present a mixed-integer programming model that considers various constraints related to the integrated operations between different types of handling equipment. They propose a heuristic method, called a multi-layer genetic algorithm, to obtain a near-optimal solution of the integrated scheduling problem. The algorithm is tested using simulation. In [22], the authors study how to better utilize a gate appointment system from the terminal operator's perspective by regulating the number of trucks that can enter the terminal. In [17], the authors propose methods for optimizing the block size by considering the throughput requirements of yard cranes and the block storage requirements. They examine two types of block layouts: blocks for which transfer points are located beside each bay and blocks for which transfer points are located at both ends. They determined that the optimal number of bays in blocks with a transfer point at each side of a bay was larger than that in blocks with transfer points at the ends, and that the optimal number of rows in blocks with a transfer point at each side of a bay was smaller than that in blocks with transfer points at the ends. In [23], two strategies are proposed for reducing container re-handles during the drayage truck retrieval process. These strategies are designed to be used in real time, allowing for information updates during the retrieval period.
Simulation is used to evaluate the use of truck arrival information to reduce container re-handles during the import container retrieval process by improving terminal operations. In [24], a simulation model combined with statistical and analytical models is proposed for a container terminal, taking into account container inspection activities. The paper presents experimental results that illustrate the impact of the container inspection process on the container.
In [25], the Visual SLAM language for discrete-event simulation is used to implement a simulation model for container terminal logistic activities. The model is based on a queuing network approach related to the arrival, berthing, and departure processes of ships at the container terminal. A result validation step is also performed. An application of simulation to the management of the Malaysian Kelang Container Terminal is discussed in [26]. The paper presents a model to improve the logistics processes at the port. Simulation is used in [4] to examine how productivity could be improved with a dynamic planning system for yard tractors that uses real-time location systems technology in container terminals. As mentioned in [10], in most contributions dealing with container terminal loading/unloading operations there is heterogeneity in the level of detail considered for the activities involved and in how such activities are aggregated into a single macro-activity. Furthermore, the most widely followed methodologies are based on a stochastic approach. A more detailed state of the art is given in [10], which classifies the published works according to the approach adopted for each type of equipment. With regard to calibration and validation, we have published a paper [27] that presents a simulation model based on an object-oriented approach and discrete event simulation techniques, together with its calibration and validation phase. We have published other works in the field of container terminal simulation [28,29]. Other recent and interesting works dealing with DES can be found in [30,31].
Several works applying MDPs in the field of transport have been published. In [32], the authors present a Markov decision process applied to the modeling and analysis of the operation process of a public urban transportation system. They assume that the states of the stochastic process correspond to operation states of the technical object (vehicle) and that the decisions (called alternatives in the paper) can represent given modes of operation, events, transportation routes, maintenance, repair, etc., which can be assigned to the state of the modeled process. A simplified computing model using a Gamma distribution for buses is presented, together with some simulation experiments. In [33], the authors discuss a discrete-time MDP approach for shipment consolidation in logistics strategy. The objective is to determine when to realize consolidation loads of a consolidation program. The minimization criteria are the cost per unit time and the cost per hundredweight per unit time. In [34], a Markov decision process is used to produce an optimal maintenance policy that answers the following question: which maintenance action should be chosen when the road segment has reached a certain age and condition? The results of the optimal policy are compared to another policy found using the Equivalent Annual Cost (EAC) method used by the Road and Hydraulic Engineering Division. A Markov decision process for a traffic signal system is formulated in [35]. The objective is to find an optimal control strategy for a signalized traffic intersection that reduces congestion. A statistical analysis of simulation results with different arrival rates is presented to show the effectiveness of this approach. An MDP-based approach to determining the optimal maintenance strategy for wind turbines is presented in [36], with an example that illustrates the implementation of the proposed method.
A semi-Markov decision process (SMDP) is used in [37] to determine whether maintenance should be performed in each deterioration state and, if so, what type of maintenance to perform for repairable power equipment. In [38], road deterioration is modeled as a semi-Markov process in which the state transition has the Markov property and the holding time in each state is assumed to follow a discrete Weibull distribution. The optimal maintenance policies, obtained through linear programming, minimize the life-cycle cost of a road network.

Data Acquisition and Analysis
In this section, we perform a time-motion study to obtain the arrival and service time distributions at both the quay and yard cranes. Our goal is to determine whether the Poisson/exponential distribution assumptions for shuttle truck arrivals and crane service times are appropriate. If they are not, it will be necessary to determine which distribution can be used to accurately describe the system.

Data Collection
The collection of inter-arrival and service times is conceptually simple: the service time is the difference between the service completions of succeeding vehicles. The assumption is that a vehicle in the queue begins service immediately after the preceding vehicle completes service. To track the desired information, we record the time at which each vehicle enters the queue or the service stage. Vehicles are identified by recording the truck or chassis number, which appears on both sides of the chassis. The data used in this paper were collected at the container terminal of Casablanca in Morocco.
Figure 1 shows the primary data collection site near the quay crane and the secondary data collection site in the storage yard. The motion of wheels was used as the basis of event occurrences, as described in Table 1. The visits to the container terminal of Casablanca in Morocco resulted in multiple data files. These files were transformed to obtain two groups of data: service times and inter-arrival times.

The Kolmogorov-Smirnov (K-S) Test
The objective of this section is to determine the distributions of the service and inter-arrival times. These analyses are critical in correctly specifying the theoretical queuing models that are used in studying container terminal operations. The K-S test is a nonparametric test for the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K-S test), or to compare two samples (two-sample K-S test). The Kolmogorov-Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The K-S test operates by comparing the cumulative distribution functions of the theoretical and the sample distributions. The test statistic, $D$, is the maximum absolute difference between the two distributions, as expressed in the following equation, where the theoretical distribution is represented by $F(t)$ and the sample distribution by $G(t)$:
$$D = \max_{t} \left| F(t) - G(t) \right|$$
with $F(t) = 1 - e^{-\lambda t}$ and $G(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}$.
The distribution test results are reported for a significance level of $\alpha = 0.05$. Note that the majority of the data files that were tested allow several possible distributions. Since the model developed in this paper is based on the exponential assumption, we focus only on this distribution. Figures 2 and 3 compare the sample distribution with the theoretical distribution for the service and inter-arrival times of shuttle trucks.
We performed multiple tests. The tests on inter-arrival time distributions show that the exponential distribution (i.e., Poisson arrivals at the wharf or yard crane) is more appropriate than other distributions (such as normal or Erlang). Only two of the eleven data files reject the exponential distribution as statistically similar to the sample distribution. On the other hand, the exponential distribution for service time was rejected by at least five of the data files. This shows that the service time distribution at the wharf or yard crane is not always exponential, for several reasons. In general, because of single and double moves (i.e., shuttles can move one or two containers at once, according to their size), there is no indication that the service times at wharf or yard cranes can be predicted or modeled with a single distribution.
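The one-sample test described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the reported results: the function names are hypothetical, the rate `rate` is assumed to be given rather than estimated, and the critical value is the usual large-sample approximation. When the rate is estimated from the same sample, the true critical values are smaller, so this sketch is then conservative.

```python
import math

def ks_statistic_exponential(sample, rate):
    """K-S distance D between the empirical CDF of `sample` and Exp(rate)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-rate * x)          # theoretical CDF F(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_value(n, alpha=0.05):
    """Large-sample critical value sqrt(-ln(alpha/2)/2) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)

def exponential_not_rejected(sample, rate, alpha=0.05):
    """True when the exponential hypothesis is not rejected at level alpha."""
    return ks_statistic_exponential(sample, rate) <= ks_critical_value(len(sample), alpha)
```

A data file of inter-arrival times would be passed as `sample`, with `rate` set to the assumed arrival rate in the same time unit.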

Formalization of the Container Terminal Problem as a Markov Decision Process
In this section, we formulate the problem of loading and unloading operations as a Markov decision process (MDP), with the objective of computing optimal decision rules that improve the productivity of each piece of equipment used at the terminal. The main pieces of equipment considered here are quay cranes, yard cranes, and trucks. We first give a review of the Markov decision process, and then we characterize the process that describes the loading and unloading operations within the container terminal. The goal is to maximize (or minimize) over an infinite horizon the expected value of the sum of successive rewards (or costs) weighted by a discount factor $0 < \gamma < 1$, which ensures the convergence of the sum but can also be interpreted as a probability of system failure (end of mission) between two moments of the process [2,23]. If the probabilities or costs are unknown, the problem is one of reinforcement learning [39]. The main task of reinforcement learning is to find a policy that optimizes the value of the costs. This policy can be represented by a map $\pi : S \to A$, $s \mapsto \pi(s)$, where $\pi(s)$ is the action that the agent (a human, a robot, a part of a machine, or anything capable of taking a decision) takes in the state $s$.
Most MDP algorithms are based on estimating value functions. The state-value function according to a policy $\pi$ is given by:
$$J^{\pi}(s) = C(s, \pi(s)) + \gamma \sum_{s' \in S} T(s, \pi(s), s')\, J^{\pi}(s')$$
where:
1. $C(s, \pi(s))$ is the expected value of the cost $C_{t+1}$ when the policy $\pi$ is followed and the state at time $t$ is $s$;
2. $\gamma$ ($0 < \gamma < 1$) is the discount rate, which is necessary to obtain a present value of the future costs.
In this case, the cost is given by the infinite sum:
$$J^{\pi}(s) = E\left[ \sum_{t=0}^{\infty} \gamma^{t} C_{t+1} \,\middle|\, s_0 = s \right]$$
The Bellman operator is defined as [14]:
$$(T^{*} J)(s) = \min_{a \in A} \left[ C(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, J(s') \right]$$
where $T(s, a, s')$ is the probability of moving from state $s$ to state $s'$ under action $a$.
We denote by $J^{*}$ the fixed point of the Bellman operator. $J^{*}(s)$ is the minimum, over the actions, of $J(s)$ for a particular state $s$; $J^{*}$ is the optimal state-value function.
In each case, there is one equation per state in S. Therefore, finding the policy π that gives the right action to take in every state of the system is equivalent to solving |S| equations with |S| unknowns. Solving an MDP consists of looking for an efficient algorithm to solve this system. In the literature, several algorithms are proposed to calculate the optimal policy (value iteration, policy iteration, Q-learning, SARSA, etc.) [39].
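As a small worked illustration of the remark about |S| equations with |S| unknowns, consider a hypothetical two-state system under a fixed policy π. All numbers below are assumptions chosen for easy arithmetic; they do not come from the terminal data.

```latex
% Hypothetical data: S = {s_1, s_2}, \gamma = 0.5,
% C(s_1, \pi(s_1)) = 1, C(s_2, \pi(s_2)) = 2,
% T(s_1, \pi(s_1), s_1) = T(s_1, \pi(s_1), s_2) = 0.5, T(s_2, \pi(s_2), s_1) = 1.
\begin{aligned}
J^{\pi}(s_1) &= 1 + 0.5\,\bigl(0.5\,J^{\pi}(s_1) + 0.5\,J^{\pi}(s_2)\bigr) \\
J^{\pi}(s_2) &= 2 + 0.5\,J^{\pi}(s_1) \\
\Longrightarrow\quad J^{\pi}(s_1) &= 2.4, \qquad J^{\pi}(s_2) = 3.2
\end{aligned}
```

Substituting the second equation into the first gives $2.5\,J^{\pi}(s_1) = 6$, which yields the two values.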

Markov Decision Process for Loading and Unloading Operations Within the Container Terminal
To unload a ship, the wharf crane picks up containers from the ship and puts them on shuttle trucks that move them to the storage yard in the terminal. To load a ship, the wharf crane unloads a container from a shuttle truck and puts it on the ship. This operation forms a closed loop that is traveled by the shuttle trucks servicing a ship (see Fig. 4). If no shuttle truck is available underneath the crane, work ceases until a shuttle truck arrives from the yard or until another one is allocated to continue service. In this formalization, we consider two queues of shuttle trucks: the first is under the wharf crane (denoted $Q_1$), and the second is in the storage yard (denoted $Q_2$). The two service centers are the wharf and yard cranes, each with its own finite buffer, of sizes $SZ_1$ and $SZ_2$. We assume that both queues have arrivals according to a Poisson process with parameters $\lambda_1$ and $\lambda_2$, respectively. All the shuttle trucks that have been served at $Q_1$ are routed to $Q_2$, and the shuttle trucks that finish service at $Q_2$ return to $Q_1$. We also assume that the service times are exponentially distributed with parameters $\mu_1$ and $\mu_2$. Model: The state space $S$ and the action set $A$ are finite. When the action $a$ is chosen in state $s$, there is a transition to state $s'$ with probability $P(s, a, s')$ and a direct cost $c(s, a)$. Because the state and action sets are finite, it is known that for the discounted as well as for the average cost criterion there is at least one minimizing policy that is stationary and deterministic [22]. Our MDP is described as follows: 1. Definition of the set of states: $S$ is the set of states of one three-stage handling system (formed by one wharf crane, a set of shuttle trucks, and a set of yard cranes) allocated to serve a container ship.
We consider the possible states: we assume that at time $t$ the system is in the state $i(t) = (i_1(t), i_2(t)) \in S$, where: (a) $i_1$ and $i_2$ are, respectively, the numbers of shuttle trucks in the queues $Q_1$ and $Q_2$ under the wharf and the yard crane; and (b) $SZ_1$ and $SZ_2$ are, respectively, the buffer sizes of $Q_1$ and $Q_2$.

Definition of the set of actions:
The set of actions is defined by:

The transition probability:
The transition probability (denoted $P(i, a, i')$) is the probability of being in the state $i(t + \Delta t) = i'$ (at the date $t + \Delta t$) after selecting the action $a(t) \in A$ when the system was in the state $i(t) = i$ (at the date $t$).
Since the queues $Q_1$ and $Q_2$ are independent, the transition probability can be calculated as the product of the transition probabilities of the two queues:
$$P(i, a, i') = P(i_1, a, i_1') \cdot P(i_2, a, i_2')$$
where $P(i_k, a, i_k')$ is the probability that $i_k(t + \Delta t) = i_k'$ given that $i_k(t) = i_k$ and that the action $a$ is selected ($k = 1, 2$). Since the action (+1T) adds a shuttle truck to the queue, the probability that $i_1(t + \Delta t) = i_1'$ when the action (+1T) is selected and $i_1(t) = i_1$ is the same as the probability that $i_1(t + \Delta t) = i_1'$ when $i_1(t) = i_1 + 1$. Similarly, the probability that $i_1(t + \Delta t) = i_1'$ when the action (−1T) is selected and $i_1(t) = i_1$ is the same as the probability that $i_1(t + \Delta t) = i_1'$ when $i_1(t) = i_1 - 1$. Thus, the transition probabilities under (+1T) and (−1T) follow from those of the queue started at $i_1 + 1$ and $i_1 - 1$, respectively. We note that executing the actions (+1C) and (−1C) is equivalent to modifying the value of the service rate in the storage yard. Therefore, the calculation of $P(i_1, (+1C), i_1')$ and $P(i_2, (+1C), i_2')$ is similar to Eqs. (9) and (10).
In this paper, we suppose that the service times of the wharf and yard cranes are exponentially distributed with parameters $\mu_1$ and $\mu_2$. So, the probability that the wharf crane begins servicing the next shuttle truck within the next period $\Delta t$ is:
$$P(T \le \Delta t) = 1 - e^{-\mu_1 \Delta t}$$
where $T \sim \mathrm{Exp}(\mu_1)$.
When $\Delta t$ is small, the higher-order terms $\sum_{n=2}^{+\infty} \frac{(-\mu_1 \Delta t)^n}{n!}$ of the series expansion are negligible compared with $\mu_1 \Delta t$, so $P(T \le \Delta t) \approx \mu_1 \Delta t$. Similarly, the probability that the yard crane begins servicing the next shuttle truck within the next period $\Delta t$ is $P(T \le \Delta t) \approx \mu_2 \Delta t$. We also suppose that each queue ($Q_1$ and $Q_2$) has arrivals according to a Poisson process with parameters $\lambda_1$ and $\lambda_2$. Then, the probability of an arrival of a shuttle truck at the queue $Q_1$ (respectively the queue $Q_2$) in the next period $\Delta t$ is proportional to $\lambda_1$ (respectively $\lambda_2$):
$$P(X_{t+\Delta t} - X_t = 1) \approx \lambda_1 \Delta t, \qquad P(Y_{t+\Delta t} - Y_t = 1) \approx \lambda_2 \Delta t$$
where $(X_t)_{t>0}$ and $(Y_t)_{t>0}$ are the Poisson processes that describe the numbers of arrivals of shuttle trucks at the queues $Q_1$ and $Q_2$ (respectively) before the date $t$.
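The quality of the first-order approximation of the service-completion probability can be checked numerically. The sketch below compares the exact exponential probability with its linearised value for a few step lengths; the rate value and the function names are hypothetical and only serve the illustration.

```python
import math

def service_completion_probability(mu, dt):
    """Exact P(T <= dt) for an exponential service time T with rate mu."""
    return 1.0 - math.exp(-mu * dt)

def first_order_approximation(mu, dt):
    """Linearised value mu * dt, used when dt is small."""
    return mu * dt

mu = 0.5  # hypothetical service rate (services per minute)
for dt in (1.0, 0.1, 0.01):
    exact = service_completion_probability(mu, dt)
    approx = first_order_approximation(mu, dt)
    print(f"dt={dt:<5} exact={exact:.6f} approx={approx:.6f} error={approx - exact:.2e}")
```

The error is bounded by $(\mu \Delta t)^2 / 2$, so it shrinks quadratically as the step length decreases.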
In the rest of this paper, we write $\rho = \Delta t$. Thus, we have the following equations, where $\mu_2'$ is the new service rate after the allocation of an additional yard crane. After the release of the additional yard crane, the service rate is $\mu_2$. Thus:
$$P(i_2, (-1C), i_2') = \begin{cases} \dots \\ \rho \mu_2 (1 - \rho \lambda_2) & \text{if } i_2' = i_2 - 1 \text{ and } i_2 > 0 \\ 0 & \text{otherwise} \end{cases}$$
Table 2: The cost function related to operations at the quay
Table 3: The cost function related to operations at the storage yard
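To make the small-step reasoning concrete, the following sketch builds one row of transition probabilities for a single queue. It is not the paper's exact set of equations: it assumes a standard birth-death discretisation in which at most one arrival (probability ρλ) and one service completion (probability ρμ) can occur per step, and the function names, the clamping at the buffer limits, and the numerical values are all hypothetical.

```python
def queue_transition_probabilities(i, lam, mu, rho, buffer_size):
    """Return {next_length: probability} for one queue over a short step rho.

    Assumption: arrivals are blocked when the buffer is full and no service
    can complete when the queue is empty.
    """
    p_arrival = rho * lam if i < buffer_size else 0.0
    p_service = rho * mu if i > 0 else 0.0
    up = p_arrival * (1.0 - p_service)      # one arrival, no completion
    down = p_service * (1.0 - p_arrival)    # one completion, no arrival
    row = {i: 1.0 - up - down}              # otherwise the length is unchanged
    if up > 0.0:
        row[i + 1] = up
    if down > 0.0:
        row[i - 1] = down
    return row

def apply_truck_action(i, action, buffer_size):
    """(+1T)/(-1T) shift the starting queue length, as described in the text."""
    shift = {"+1T": 1, "-1T": -1}.get(action, 0)
    return min(max(i + shift, 0), buffer_size)   # clamping is an assumption

# Row for Q1 when a truck is added in state i1 = 2 (hypothetical rates).
start = apply_truck_action(2, "+1T", buffer_size=5)
row = queue_transition_probabilities(start, lam=0.4, mu=0.5, rho=0.1, buffer_size=5)
```

The actions (+1C) and (−1C) would be handled in the same way by calling the function with the modified yard service rate.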

The cost function:
We define the cost function as follows: the cost function depends on the number of waiting shuttle trucks under each crane. Its value tends to be high when the queue $Q_1$ or $Q_2$ is long (i.e., the number of waiting shuttle trucks exceeds the size of the queue) or when one of these queues is empty (i.e., the wharf or yard crane is waiting for a shuttle truck). High values are marked with a dedicated symbol in Tables 2 and 3.
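The shape of such a cost function can be sketched as follows. The actual entries of Tables 2 and 3 are not reproduced here: the constant `HIGH`, the choice of the queue length as the ordinary cost, and the fact that the sketch ignores the selected action are all assumptions made for illustration.

```python
HIGH = 1_000.0  # hypothetical stand-in for the "high value" entries of the tables

def queue_cost(length, buffer_size):
    """Cost of one queue: HIGH when the crane starves or the buffer overflows."""
    if length == 0 or length > buffer_size:
        return HIGH
    return float(length)   # assumption: cost grows with the number of waiting trucks

def total_cost(i1, i2, sz1, sz2):
    """Cost of the state (i1, i2): sum of the quay and yard queue costs."""
    return queue_cost(i1, sz1) + queue_cost(i2, sz2)
```

With this shape, a policy that minimizes the discounted cost is pushed to keep both queues non-empty and within their buffers.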

Value Iteration Algorithm
One way to find an optimal policy is to find the optimal value function. It can be determined by a simple iterative algorithm called value iteration. Starting from a bounded cost function $J$, the iterates $(T^k J)_k$ converge uniformly to the optimal cost $J^*$. This algorithm is called the value iteration algorithm (VIA), or sometimes successive approximation. VIA is an iterative method which, in general, only converges asymptotically to the value function, even if the state space is finite. In practice, it is often very useful to have methods to accelerate the convergence of this algorithm. VIA is described as follows:
Input: the set of states $S$, the set of actions $A$, the vectors of costs for every action $C(i, a)$, the transition probability matrices for every action $P(i' \mid i, a)$, the discount rate $\gamma$, and a very small number $\theta$ close to 0.
Initialise $J_0$ arbitrarily for every state (for example, $J_0(i) = 0$ for all $i \in S$).
Repeat, for every state $i \in S$ and every action $a \in A$:
$$Q_{k+1}(i, a) = C(i, a) + \gamma \sum_{i' \in S} P(i' \mid i, a)\, J_k(i')$$
$$J_{k+1}(i) = \min_{a \in A} Q_{k+1}(i, a), \qquad \pi(i) = \arg\min_{a \in A} Q_{k+1}(i, a)$$
where $\pi(i)$ is the action that attains $J_{k+1}(i)$, until $|J_{k+1}(i) - J_k(i)| < \theta$ for every state $i$.
Output: $J_k$ at the last iteration, which is the optimal state-value function (up to the tolerance $\theta$).
Output: $\pi$ at the last iteration, which is the corresponding optimal policy.
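The algorithm can be sketched in Python for a generic finite MDP. The two-state toy problem below is hypothetical and is not the terminal model; it only shows the data layout and lets the result be checked by hand.

```python
def value_iteration(states, actions, cost, trans, gamma=0.9, theta=1e-8):
    """Return (J, policy) for a finite discounted-cost MDP.

    cost[s][a] is the immediate cost C(s, a);
    trans[s][a] maps each next state s2 to P(s2 | s, a).
    """
    J = {s: 0.0 for s in states}                       # J_0 = 0 for every state
    while True:
        # Q_{k+1}(s, a) = C(s, a) + gamma * sum over s2 of P(s2 | s, a) * J_k(s2)
        Q = {s: {a: cost[s][a]
                    + gamma * sum(p * J[s2] for s2, p in trans[s][a].items())
                 for a in actions}
             for s in states}
        J_next = {s: min(Q[s].values()) for s in states}
        delta = max(abs(J_next[s] - J[s]) for s in states)
        J = J_next
        if delta < theta:                              # stopping rule
            policy = {s: min(Q[s], key=Q[s].get) for s in states}
            return J, policy

# Hypothetical two-state toy problem (not the terminal model of the paper).
STATES = ["empty", "busy"]
ACTIONS = ["wait", "add"]
COST = {"empty": {"wait": 10.0, "add": 3.0},
        "busy": {"wait": 1.0, "add": 4.0}}
TRANS = {
    "empty": {"wait": {"empty": 1.0}, "add": {"busy": 1.0}},
    "busy": {"wait": {"busy": 0.5, "empty": 0.5}, "add": {"busy": 1.0}},
}

J, policy = value_iteration(STATES, ACTIONS, COST, TRANS, gamma=0.5)
print(J, policy)
```

For this toy problem the fixed point can be verified by hand: the optimal costs are 4.4 for `empty` and 2.8 for `busy`, reached with the actions `add` and `wait` respectively, and the computed values agree with them up to the tolerance.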

Policy Iteration Algorithm
In this sub-section, we present the policy iteration algorithm (also referred to as PIA) for finding the optimal policy in a discounted infinite horizon problem. As opposed to the value iteration algorithm, the output of PIA is not an approximation of the optimal policy, but the optimal policy itself. The policy iteration algorithm generates a sequence of improving stationary policies. The algorithm is as follows.
Initialization: Start with an initial stationary policy $\pi_0$.
Policy Evaluation: Given the stationary policy $\pi_k$, compute its cost $J^{\pi_k}$ by solving the linear system of equations:
$$J^{\pi_k}(i) = C(i, \pi_k(i)) + \gamma \sum_{i' \in S} P(i' \mid i, \pi_k(i))\, J^{\pi_k}(i'), \quad \forall i \in S$$
Policy Improvement: Obtain a new stationary policy $\pi_{k+1}$ satisfying:
$$\pi_{k+1}(i) = \arg\min_{a \in A} \left[ C(i, a) + \gamma \sum_{i' \in S} P(i' \mid i, a)\, J^{\pi_k}(i') \right], \quad \forall i \in S$$
If the policy does not change, then stop. Otherwise, repeat the process from the policy evaluation step. Note that when the algorithm stops, we have $\pi_{k+1} = \pi_k$, so $J^{\pi_k} = T^* J^{\pi_k}$, and hence $J^{\pi_k}$ is the fixed point of the Bellman operator. Thus, the policy $\pi_k$ is optimal.
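The evaluation and improvement steps can be sketched as follows. The linear system is solved with a small Gaussian elimination routine so that the example needs no external library; the routine, the tie-breaking rule, and the two-state toy problem are assumptions made for this illustration and are not taken from the paper.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def policy_iteration(states, actions, cost, trans, gamma=0.9):
    """Return (J, policy); cost[s][a] and trans[s][a][s2] as in the text."""
    policy = {s: actions[0] for s in states}           # initial stationary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) J = C_pi.
        A = [[(1.0 if s == s2 else 0.0) - gamma * trans[s][policy[s]].get(s2, 0.0)
              for s2 in states] for s in states]
        b = [cost[s][policy[s]] for s in states]
        J = dict(zip(states, solve_linear(A, b)))
        # Policy improvement: greedy action with respect to J.
        improved = {}
        for s in states:
            q = {a: cost[s][a] + gamma * sum(p * J[s2] for s2, p in trans[s][a].items())
                 for a in actions}
            best = min(q, key=q.get)
            # Keep the current action on ties so that the loop terminates.
            improved[s] = policy[s] if q[policy[s]] <= q[best] + 1e-12 else best
        if improved == policy:
            return J, policy
        policy = improved

# Hypothetical two-state toy problem (not the terminal model of the paper).
STATES = ["empty", "busy"]
ACTIONS = ["wait", "add"]
COST = {"empty": {"wait": 10.0, "add": 3.0},
        "busy": {"wait": 1.0, "add": 4.0}}
TRANS = {
    "empty": {"wait": {"empty": 1.0}, "add": {"busy": 1.0}},
    "busy": {"wait": {"busy": 0.5, "empty": 0.5}, "add": {"busy": 1.0}},
}

J, policy = policy_iteration(STATES, ACTIONS, COST, TRANS, gamma=0.5)
print(J, policy)
```

On this toy problem the loop stops after a few policy changes, and the final evaluation solves the two-equation system exactly instead of approaching it asymptotically.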

Obtained Solution
Once we have formalized, the problem of handling operations within the container terminal as a Markov decision process,

Simulation Model
Because existing non-commercial simulators of container terminals are quite limited and cannot simulate a specific scenario such as the obtained policy, we developed a simulation model based on the distributed discrete-event simulation technique (see Fig. 5). This technique was proposed as an alternative to discrete-event simulation when the number of events to be simulated in the system is very high. The problem is broken into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. Interactions in the physical system are simulated by message transmission among the processing elements. The simulation model is therefore based on the process-oriented paradigm and parallel computing: it uses multiple processing elements simultaneously to simulate the container terminal operations.
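The event-driven principle behind such a simulator can be illustrated with a much simpler, single-threaded sketch. It is not the distributed, thread-based architecture described above: it uses one global event list, ignores travel times between quay and yard, assumes exponential service times and at least one truck, and all names and parameter values are hypothetical.

```python
import heapq
import random

def simulate_closed_loop(n_trucks, mu1, mu2, horizon, seed=0):
    """Simulate trucks cycling between the quay crane (0) and the yard crane (1)."""
    rng = random.Random(seed)
    rates = [mu1, mu2]
    queue = [n_trucks, 0]          # trucks at each crane, including the one in service
    served = [0, 0]                # completed services per crane
    idle_time = [0.0, 0.0]         # accumulated crane waiting time
    idle_since = [None, 0.0]       # the yard crane starts idle at time 0
    events = [(rng.expovariate(mu1), 0)]   # (completion time, crane index)
    while events:
        now, k = heapq.heappop(events)
        if now > horizon:
            break
        other = 1 - k
        queue[k] -= 1              # the served truck leaves crane k
        served[k] += 1
        queue[other] += 1          # and joins the queue of the other crane
        if queue[other] == 1:      # the other crane was waiting: it starts now
            idle_time[other] += now - idle_since[other]
            idle_since[other] = None
            heapq.heappush(events, (now + rng.expovariate(rates[other]), other))
        if queue[k] > 0:           # crane k serves its next truck
            heapq.heappush(events, (now + rng.expovariate(rates[k]), k))
        else:                      # crane k starts waiting for a truck
            idle_since[k] = now
    for k in (0, 1):               # close idle periods still open at the horizon
        if idle_since[k] is not None:
            idle_time[k] += horizon - idle_since[k]
    return {"queue": queue, "served": served, "idle_time": idle_time}

result = simulate_closed_loop(n_trucks=4, mu1=0.5, mu2=0.4, horizon=1000.0, seed=1)
print(result)
```

Running the function with different numbers of trucks gives a rough picture of how crane waiting time reacts to the allocation, which is the kind of quantity the policy aims to reduce.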
We used a UML class diagram to model the system classes and their relationships. Class diagrams model the structure of the classes that represent the "building blocks" of the system. Classes are coupled to each other through associations. In Fig. 6, we considered more general classes, which in fact coincide with categories in our system. These categories have subcategories that correspond to the elements of our model. Trucks, trains, and ships are means of transport, so the classes Truck, Train, and Ship inherit from the base class TransportMean. Import and export yards of containers are likewise specializations of the storage yard. Thus, the classes ImportYard and ExportYard both inherit from the superclass StorageYard.
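The inheritance relationships described above can be sketched as follows. The class names TransportMean, Truck, Train, Ship, StorageYard, ImportYard, and ExportYard come from the text; the fields, the methods getId, store, and size, and the HierarchySketch driver are hypothetical additions made only so that the example runs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical driver showing that subclasses can be used through their base types. */
final class HierarchySketch {
    public static void main(String[] args) {
        TransportMean ship = new Ship("ship-1");
        StorageYard yard = new ImportYard();
        yard.store("container-1");
        System.out.println(ship.getId() + " unloaded into a yard holding " + yard.size());
    }
}

/** Base class for all means of transport (name from the class diagram description). */
abstract class TransportMean {
    private final String id;                       // hypothetical attribute
    protected TransportMean(String id) { this.id = id; }
    String getId() { return id; }
}

final class Truck extends TransportMean { Truck(String id) { super(id); } }
final class Train extends TransportMean { Train(String id) { super(id); } }
final class Ship extends TransportMean { Ship(String id) { super(id); } }

/** Base class for storage yards; the container stack is a hypothetical detail. */
abstract class StorageYard {
    private final Deque<String> containers = new ArrayDeque<>();
    void store(String containerId) { containers.push(containerId); }
    int size() { return containers.size(); }
}

final class ImportYard extends StorageYard { }
final class ExportYard extends StorageYard { }
```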
Some terminal resources, such as cranes and shuttle trucks, are considered as processing elements. For this reason, they inherit from the superclass java.lang.Thread, which represents an independent stream of execution within a program written in the Java language. The types of events considered in this model are, for example, the arrival of shuttle trucks at the queue under the quay or yard crane, the loading/unloading of a shuttle truck, and the start or end of waiting of a crane or a truck.
Fig. 7 Queue size under wharf crane during unloading operations in both cases: when the optimal policy is or is not applied (more than 3,000 simulated operations)
Our model is implemented in the Java programming language. The Java Thread API allows us to write a simulation application that can take advantage of multiple processors and perform background tasks while still retaining the interactive feel that users require.
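As a rough illustration of processing elements that extend java.lang.Thread and interact by message transmission, the sketch below lets several shuttle truck threads send arrival messages to a wharf crane thread through a queue. The class names, the STOP marker, and the use of java.util.concurrent.LinkedBlockingQueue as the message channel are assumptions made here; the paper does not describe its messaging mechanism at this level of detail.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: cranes and trucks as threads that interact only through messages. */
final class ProcessingElementSketch {
    static final int STOP = -1;                    // hypothetical end-of-simulation message

    /** Crane thread: counts one served container per arrival message. */
    static final class WharfCrane extends Thread {
        private final BlockingQueue<Integer> inbox = new LinkedBlockingQueue<>();
        private int served;

        void send(int truckId) { inbox.add(truckId); }
        int served() { return served; }

        @Override public void run() {
            try {
                while (true) {
                    int truckId = inbox.take();    // blocks while no truck is waiting
                    if (truckId == STOP) return;
                    served++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Truck thread: announces each of its trips to the crane. */
    static final class ShuttleTruck extends Thread {
        private final int truckId;
        private final WharfCrane crane;
        private final int trips;

        ShuttleTruck(int truckId, WharfCrane crane, int trips) {
            this.truckId = truckId;
            this.crane = crane;
            this.trips = trips;
        }

        @Override public void run() {
            for (int k = 0; k < trips; k++) crane.send(truckId);
        }
    }

    /** Runs the toy simulation and returns the number of served containers. */
    static int simulate(int trucks, int tripsPerTruck) {
        WharfCrane crane = new WharfCrane();
        crane.start();
        ShuttleTruck[] fleet = new ShuttleTruck[trucks];
        for (int i = 0; i < trucks; i++) {
            fleet[i] = new ShuttleTruck(i, crane, tripsPerTruck);
            fleet[i].start();
        }
        try {
            for (ShuttleTruck t : fleet) t.join();
            crane.send(STOP);                      // sent only after every truck has finished
            crane.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("simulation interrupted", e);
        }
        return crane.served();
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 5));        // prints 40
    }
}
```

Extending Thread directly mirrors the design described in the text; in current Java practice the same behaviour is often obtained by implementing Runnable and submitting tasks to an executor.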
The background of the simulator's graphical interface is based on the map of the Casablanca container terminal in Morocco, which consists of gate, yard, and berth subsystems (see Fig. 5). The container handling equipment considered in this paper consists of storage yard cranes, loading/unloading (wharf) cranes, and shuttle trucks. The storage yard is divided into several blocks that are served by tire-mounted or rail-mounted yard cranes. The simulator is intended to reproduce realistically the activities and flows that occur inside the terminal during loading and unloading operations.

Simulation Results
In order to test the policy obtained in Sect. 5, we performed multiple simulations of loading and unloading operations of containers. Only unloading operations are considered in the rest of this paper, since the only difference between loading and unloading is the destination of the containers (ship or storage yard). In this section, we present two simulation cases.
(1) In the first case, we simulate the unloading of more than 3,000 containers from the ship when the optimal policy is applied. The results are compared with the same case when this policy is not applied. We start from the allocation of one wharf crane and one yard crane installed in the area reserved for storing the import containers of the ship. The number of required shuttle trucks is determined dynamically by the policy when it is applied, and it is fixed at 8 in the other case. Other parameters, such as wharf and yard crane performance, are the same in both cases. According to the study carried out in Sect. 5, it is acceptable to assume an exponential distribution for crane service times and a normal distribution for shuttle truck travel times. Thus, we assume that the service times of the wharf crane and the yard crane are exponentially distributed with parameters $\mu_1 = 0.3$ and $\mu_2 = 0.3$, and that the travel time of shuttle trucks between the quay and the storage yard is normally distributed with parameters $m = 4$ and $\sigma^2 = 0.4$. Other probability distributions are supported by the simulator, but they are not considered in this paper.
The output data are presented in Figs. 7, 8, 9, and 10. They concern the queue length of trucks under the wharf and yard cranes, the idle times of each crane, and the total number of shuttle trucks in the terminal over time. Each figure compares the results of the two cases: when the obtained policy is applied and when it is not. Figure 7 shows that the queue length can reach 7 trucks without the policy, which is not acceptable in most current container terminals where space is increasingly congested, whereas it remains below 5 when the policy is applied. The trend curves in the two histograms of this figure show that the average truck queue is the same in both cases (about 3 trucks), but the standard deviation around this average differs from one case to the other. The applied policy has evened out the number of waiting trucks and ensured the normal performance of the quayside crane. Figure 8 compares the idle times in the two cases. The policy allows the wharf crane to operate without stopping unless a failure occurs. In Fig. 9, we see a significant decrease in the queue length when the policy is applied. This is because our policy increases the performance of unloading operations in the storage areas when the queue becomes too long. This has a negative effect on the storage equipment, which is shown in Fig. 10: the average idle time becomes higher than in the case where the policy is not applied. This is not a big problem, since the increase is compensated by allowing the wharf crane to operate without idle time. In addition, the total number of trucks allocated to serve the ship decreases over time (see Fig. 11).
Fig. 8 Comparison of wharf crane idle time during unloading operations in both cases: when the optimal policy is or is not applied (more than 3,000 simulated operations)
Fig. 9 Queue size under yard crane during unloading operations in both cases: when the optimal policy is or is not applied (more than 3,000 simulated operations)
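The distributional assumptions stated for the first case (exponential crane service times, normal truck travel times) can be sampled as in the following sketch. It rests on assumptions that the text does not settle: the parameters 0.3 are read as rates of the exponential distribution, 0.4 is read as a variance, and the class and method names are hypothetical.

```java
import java.util.Random;

/** Sketch: drawing service and travel times under the stated distributions. */
final class ServiceTimeSketch {

    /** Exponential sample by inverse transform; 'rate' is assumed to be the rate parameter. */
    static double exponential(Random rng, double rate) {
        return -Math.log(1.0 - rng.nextDouble()) / rate;   // 1 - U lies in (0, 1]
    }

    /** Normal sample with the given mean and variance (not truncated at zero). */
    static double normal(Random rng, double mean, double variance) {
        return mean + Math.sqrt(variance) * rng.nextGaussian();
    }

    /** Empirical mean of n exponential service times. */
    static double meanExponential(long seed, int n, double rate) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int k = 0; k < n; k++) sum += exponential(rng, rate);
        return sum / n;
    }

    /** Empirical mean of n normal travel times. */
    static double meanNormal(long seed, int n, double mean, double variance) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int k = 0; k < n; k++) sum += normal(rng, mean, variance);
        return sum / n;
    }

    public static void main(String[] args) {
        // Under the rate reading, the mean service time is 1 / 0.3 (about 3.33 time units).
        System.out.println("mean crane service time: " + meanExponential(42L, 200_000, 0.3));
        System.out.println("mean truck travel time:  " + meanNormal(42L, 200_000, 4.0, 0.4));
    }
}
```

With $m = 4$ and $\sigma^2 = 0.4$, a negative travel time is extremely unlikely, so the sketch does not truncate the normal distribution; a simulator that needs a hard guarantee could resample negative draws.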

Fig. 10 Comparison of yard crane idle time during unloading operations in both cases: when the optimal policy is or is not applied (more than 3,000 simulated operations)
Fig. 11 The number of trucks during the simulation
(2) In the second case, the objective is to test the performance of the policy when more than one ship is served at the same time, so that the trucks allocated to a particular ship may also move the containers of another. To this end, we ran separate simulations of the simultaneous unloading of one, two, and three ships. The distance between two ships is assumed to be small. For each ship, we allocated one wharf crane and one yard crane. The number of containers is assumed to be more than 1,000 for each ship. As in the first case, the number of allocated trucks is determined dynamically when the policy is adopted. Other parameters are the same as those of the first case ($\mu_1 = 0.3$, $\mu_2 = 0.3$, $m = 4$, and $\sigma^2 = 0.4$ for each ship). We note that the values of these parameters do not affect the performance of the policy, since the policy manages the number of shuttle trucks, which is independent of these values. Starting from these initial inputs, we ran the simulation and retrieved the data describing the number of shuttle trucks waiting under each crane, the idle time of the equipment, and the total number of trucks moving in the system to serve the ships. In this paragraph, we focus on the number of trucks allocated by the policy in each case: one, two, or three ships. Figure 12 compares the number of trucks in these three cases during unloading operations. According to the trend curve, we need on average 8 trucks for one ship, 12 trucks for two ships, and 16 trucks for three ships to carry out the unloading operations. At most, we need 14 trucks for one ship, 21 trucks for two ships, and 28 trucks for three ships. Thus, the number of trucks is not proportional to the number of ships.
Without the policy, 8 trucks would be needed on average for one ship, 16 trucks for two ships, and 24 trucks for three ships. Implementing the policy therefore significantly reduces the number of trucks required when more than one ship is served, thereby reducing the congestion of trucks traveling between the quay and the storage areas. We also note an advantage of the policy regarding the idle time of trucks in queues.
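As a worked check on the averages quoted above, the relative saving in trucks can be computed directly from the reported figures (8, 12, 16 with the policy against 8, 16, 24 without it). No values beyond those stated in the text are used.

```latex
\[
\text{one ship: } \frac{8 - 8}{8} = 0\%, \qquad
\text{two ships: } \frac{16 - 12}{16} = 25\%, \qquad
\text{three ships: } \frac{24 - 16}{24} \approx 33\%.
\]
```

On these averages, the saving grows with the number of ships served simultaneously, which is consistent with trucks being shared between neighbouring ships.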

Conclusion
In this paper, the Markov decision process (MDP) methodology was proposed and applied to the container terminal in order to improve the efficiency of loading and unloading operations. The MDP provides an interesting framework for modeling decision-making, and the compact representation of the system provided by an MDP makes the decision strategy easier to implement. The value iteration and policy iteration algorithms mainly consist of finding the optimal policy that associates each state of the system with the action to be taken. Simulation is a valuable tool that allowed us to test the obtained solution. Although the approach using value iteration or policy iteration has not previously been used for controlling a container terminal system, this work, by showing its performance, makes it an attractive one to pursue. Future work could develop the model further to make it more realistic by relaxing some assumptions, such as those concerning the selected probability distributions and the possibility of vehicle and stacking equipment failure.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.