1 Introduction

Supply chain optimization approaches are critical for businesses to survive, as they aim at reducing costs and increasing profit margins on both the business and customer sides by developing win–win scenarios. Businesses increasingly require robust systems that can quickly adapt to dynamic and risky business environments. As part of effective supply chain management practices, one of the areas companies focus on is reducing logistics-related costs. The design of the distribution system must ensure that the costs of logistics operations are pushed down while the service level is kept at an acceptable level. When it comes to service level, it is important to recognize that the best course of action for one side (a supplier or a consumer) is not always the best course of action for the other, as noted in Taleizadeh et al. (2020). Consequently, a systematic approach is needed to address this issue. Vendor Managed Inventory (VMI) is one such systematic approach that promises benefits to both suppliers and customers in the supply chain.

John (1958) was the first study to raise the question of who should be in charge of maintaining inventory, and this resulted in the emergence of VMI. VMI is a well-known system in which the timing and quantity of the deliveries to customers are determined by the supplier, while ensuring that the inventory of the customer stays within minimum and maximum limits determined by the customer. VMI provides suppliers with the advantage of better demand information and operational efficiency in distribution, while significantly reducing the inventory control related costs for the customers. The problem we tackle in this study assumes the VMI setting where the distributor or supplier has the freedom to select the timing and quantity of the deliveries. Utilizing a VMI program has many advantages for the supply chain and each of its participants.

The major components of the logistics costs are the fleet ownership cost, the routing cost of the vehicles, and the inventory related costs at the customer sites. The fleet ownership cost depends directly on the size and composition of the fleet and is a strategic decision, while the routing cost results from daily routing plans based on the assignment of customers to vehicles. Therefore, an effective fleet composition significantly influences how far the logistics costs can be minimized. That is why obtaining optimal or near optimal solutions for the strategic fleet sizing and composition problem is a major logistical decision.

In this study, for the integrated fleet sizing and replenishment planning problem we develop an Approximate Dynamic Programming (ADP) algorithm that uses a Fix and Optimize (F&O) method in order to calculate the objective function value in an approximated manner for the partial problem at each iteration of the dynamic programming. Dastjerd and Ertogral (2019) developed a fix and optimize heuristic for the same problem. ADP is used for enhancing the solution quality, both in terms of percentage deviations from the optimal objective function value or best bounds and computational times.

The solution method discussed in this paper is a fix and optimize based approximate dynamic programming approach along with a final improvement step. ADP has been applied to several problems in the literature, as we discuss in more detail in the literature survey. The main benefit of ADP is the reduction in computational time, which results from solving the partial problems at each stage of the dynamic programming using a problem specific heuristic or a general heuristic instead of solving them optimally. In this study, we suggest such an ADP approach that uses fix and optimize as the heuristic method for the partial problems. The ADP we suggest also employs a look ahead strategy to further reduce the computational effort. We execute an improvement stage at the end to increase the quality of the solutions produced by the ADP heuristic.

The organization of this paper is as follows. The next section examines the pertinent literature. The mathematical model and the problem are explained in Sect. 3. Section 4 describes the steps of the suggested solution approach. In Sect. 5, the structure of the dataset and the performance analysis of the heuristic are presented. We present our conclusions and give some insights for potential future research in Sect. 6.

2 Literature survey

In this section we introduce some relevant studies and give an overview of the literature in four focus groups: fleet sizing, delivery problems with given frequencies, ADP related studies, and fix and optimize method applications. The novelty of the problem is established in our previous paper, Dastjerd and Ertogral (2019). Here, we mainly aim at searching the existing literature for the solution techniques applied to problems similar to ours and at investigating the fields to which the suggested heuristics are applied. That is, we do not discuss the similarities and differences between the problems reviewed; instead, we analyze the solution methods used and their efficiency.

2.1 Fleet sizing problem

We touch on the studies that directly address fleet sizing in this section. In Desrochers and Verhoog (1991), customers with known demands are to be replenished from a central depot. The problem integrates the decisions of fleet composition and routing. They apply a version of the savings-type heuristic, which relies on consecutive route fusions. A novel mathematical formulation for fleet dimension optimization and freight car assignment is studied in Sayarshad and Ghoseiri (2009). They suggest a simulated annealing-based solution approach. Three techniques are employed for getting out of a trap, namely, inferior solutions, solution space neighborhood search, and acceptance probability. Liu et al. (2009) tackle a problem of fleet sizing and vehicle routing. There is a set of nonhomogeneous vehicles and the main purpose is to decide the structure of the fleet. The suggested heuristic relies on a genetic algorithm. They test it on a number of benchmark instances and claim that the provided heuristic is comparable to the available solutions in the literature in terms of solution quality and durations. Żak et al. (2011) solve the fleet sizing problem using a two-phased heuristic. The problem is handled in a road freight transportation firm which owns a set of heterogeneous vehicles. The first phase of the proposed method uses innovative software for producing a collection of Pareto-optimal results. Next, the decision maker uses their model of preferences to examine the sample produced in the first phase. Aziez et al. (2022) suggest methods for enhancing the efficiency of transportation operations in hospitals. They use automated guided vehicles to save labor, which results in higher efficiency. They develop a mathematical formulation and a powerful metaheuristic. Sun et al. (2021) propose an operational type fleet dimensioning problem which also considers the cost of fuel. The problem is a modified version of the fleet dimensioning and mix vehicle routing problem. They create an economy traveling distance (ETD) method to determine the range of travel distances most suitable for each type of vehicle by taking into account its fuel consumption rate. Hajba et al. (2023) use a MILP to replenish a petrol station. Vehicles with different compartment sizes, time window constraints, and restrictions on the customers that a vehicle can serve are among the constraints in the problem. They apply a clustering method and combine it with the MILP they proposed. The results show that integrating the clustering improves the solutions. Belfiore and Fávero (2007) suggest a scatter search approach for the fleet size and mix problem with time windows. The fleet is considered to be heterogeneous. They report that the solution method outperforms the existing methods.

2.2 Delivery with given frequencies

Speranza and Ukovich (1994) take into account the distribution problem with candidate frequencies. Both indivisible and divisible demand situations are supported by their integer and mixed integer formulations. They modify several dominance principles and employ them in a two-step heuristic as the solution strategy. Speranza and Ukovich (1996) use a branch and bound approach to solve the problem of distributing a variety of commodities from a single origin to a variety of consumers depending on preset replenishment schedules.

In Bertazzi et al. (1997), the authors provide a strategy for resolving a problem with predefined frequencies. Similar to our problem, items are delivered from a single origin to several destinations. They also take into account expenditures connected to transportation and inventories, which is another resemblance to our study. The fact that Bertazzi et al. (1997) use continuous frequencies in their replenishment decision making approach distinguishes our case from theirs. Regarding cost evaluation, they do not account for the fixed costs per delivery. They employ a sequential heuristic method that first builds a mixed integer programming model for the single link problems and then solves a shortest path problem. As a result, a decision on frequency-truck assignment and the percentage of transported cargo is made. Bertazzi and Speranza (1999) analyze a situation that differs from ours, where there are several products, one origin, a few intermediary nodes, and a destination. They make the assumption that the candidate delivery frequencies are predetermined. They offer four distinct heuristic techniques with different underlying concepts. They develop a method based on the idea of dividing sequences into links and handling each link independently. The other method proposed by Bertazzi and Speranza (1999) applies the same shipping percentages to all the links in the network. The remaining two heuristics are discrete versions of the well-known EOQ formulations. Lastly, they try a technique built on the idea of dynamic programming. The heuristic takes the links as stages and the collection of delivery frequencies on the preceding links as states.

The problem of transporting several items from an origin to a single destination at a predefined frequency is examined by Bertazzi et al. (2000). They employ both a heuristic and an exact method for solving the problem. The heuristic methods are developed according to the prominent EOQ formula and the exact method includes a modified version of the branch and bound algorithm. A complicated production–distribution network is handled in Bertazzi et al. (2005). The problem is approached in a Vendor Managed Inventory (VMI) setting and it aims at producing and distributing the products regularly. A fleet of trucks is available for accomplishment of the distribution operations. Their solution techniques are hierarchic heuristics that solve the production and distribution subproblems consecutively. That is, first the solution of production problem is provided and then distribution problem is handled based on the results from the production subproblems.

2.3 Approximate dynamic programming

Approximate Dynamic Programming is introduced as an algorithmic technique for addressing the curse of dimensionality problem. It is frequently employed in stochastic problems. In ADP, a heuristic approach is used for the approximation of the objective function value. The literature addresses various problems which are solved by ADP and some of them can be listed as scheduling problems, fleet management problems, knapsack problem versions, vehicle routing problems, replenishment planning problems and allocation problems.

Bertsimas and Demir (2002) suggest an ADP approach for the multidimensional knapsack problem. In their paper, the objective function value is estimated by means of a non-parametric method as well as a parametric technique. They claim that the proposed heuristic approach provides high quality solutions, and that off-the-shelf solvers such as CPLEX need much longer durations to find solutions of the same quality in this setting. As another example of an ADP solution approach, a convex Quadratic Knapsack Problem (QKP) is considered in Hua et al. (2006). Two approaches are suggested for estimating the objective function value: continuous quadratic programming relaxation and the integral parts of the solutions to the same relaxation. The computational results from testing the method on instances with about 200 integer variables indicate that the new solution technique generates high quality solutions for large-scale QKPs. Perry and Hartman (2009) use ADP to solve a dynamic, stochastic knapsack problem. They provide an approximation technique that combines simulation with deterministic dynamic programming to allow for the solution of longer-term problems. The computational outcomes demonstrate the effectiveness of the suggested approach. Topaloglu (2005) employs an ADP-based technique to optimize the distribution activities of a business, which involve producing a product across several facilities and distributing it to a number of different locations where it may be purchased. They prove the concavity of the objective function and employ concave approximations to estimate the objective function value in their version of ADP. Simao et al. (2009) deal with model creation for a highly detailed problem of a large truckload motor carrier company in the United States. A model is created for data containing the movement details of more than 6000 drivers. They use a Monte Carlo simulation to update their estimates of the value function and combine mathematical programming with machine learning to approximate their value function. In order to accurately assess the marginal value of 300 different types of drivers, the model closely calibrates itself against actual operations.

Approximate dynamic programming is applied in other areas as well, such as the surgical scheduling problem (Astaraky and Patrick 2015; Silva and de Souza 2020), the patient admission problem (Hulshof et al. 2016), and the machine scheduling problem (Ronconi and Powell 2010).

2.4 Fix and optimize heuristic

Pochet and Wolsey (2006) introduce the multi-level capacitated lot sizing problem and give an improvement heuristic known as the “exchange heuristic,” which is where F&O may have originated. When researchers discovered that the readily available solvers were unable to deliver high-quality or optimum solutions within reasonable computing times, they turned to mixed integer programming-based heuristics (Tanksale and Jha 2020). F&O is an effective solution method that produces excellent solutions within reasonable solution times and is used in several lot-sizing problem variations, including capacitated lot sizing with setup carryover (Goren et al. 2012; Chen 2015; Gören and Tunalı 2015), cooperative lot-sizing (Drechsel and Kimms 2011), and stochastic capacitated lot sizing (Helber et al. 2013).

F&O is used for a variety of additional problem types. Gintner et al. (2005) use the fix and optimize heuristic to solve a bus scheduling problem. Dorneles et al. (2014) study the problem of high school timetabling. They suggest a mixed integer linear programming model and apply a combination of fix and optimize and variable neighborhood search to the problem. They assert that, for particular datasets, the suggested heuristic discovered new best-known solutions. Federgruen et al. (2007) use F&O to solve a multi-product capacitated lot sizing problem. The fix and optimize method is used by Helber and Sahling (2010) to address the multi-level capacitated lot sizing problem. Neves-Moreira et al. (2018) use F&O to assign time intervals and create delivery schedules, and claim that the performance of F&O is superior to that of commercial solvers. Aghazadeh and Ertogral (2023) integrate F&O into a problem space search metaheuristic and apply it to the same problem considered in Dastjerd and Ertogral (2019). They show that F&O produces fast and high quality solutions and is a suitable problem specific heuristic to be embedded in another metaheuristic.

As far as we are aware, no previous work has recommended an ADP based solution for the fleet sizing problem. In this study, we present a novel solution strategy based on ADP for the problem that Dastjerd and Ertogral (2019) proposed. We propose an enhanced version of classical ADP that uses F&O to estimate the objective function value at each iteration.

3 Problem definition

In this section, we will outline our problem and present a mathematical representation of it. We have a situation where there's a single product in stock with a known demand pattern, and this product is needed at various locations regularly. Our main goal is to minimize the total cost of shipping this product from one central point to multiple different destinations, while also managing inventory levels at each destination. To do this, we need to decide how often and how much to restock the inventory for each customer, as well as determine the size and composition of the vehicle fleet for transportation.

The decision of how frequently to make deliveries is made by selecting from a set of predetermined delivery frequency options. We use a fleet of different vehicles with varying capacities, costs per kilometer, and ownership expenses to transport the product to customers. The replenishment of inventory is scheduled based on specified frequencies, determined by the number of weeks between deliveries and the specific weekday of delivery. This means that a customer’s inventory can be restocked on a weekly, biweekly, triweekly, or quad-weekly basis, and deliveries can occur on any of the 5 weekdays. Additionally, we consider the possibility of daily deliveries in our planning. Altogether, we have a total of 21 distinct frequency options (4 multiplied by 5, plus 1). To categorize these frequencies, we designate the first one as the daily frequency, while frequencies 2 to 6 represent weekly deliveries on each of the 5 weekdays. Likewise, frequencies 7–11, 12–16, and 17–21 correspond to biweekly, triweekly, and quad-weekly frequencies, respectively, for the five weekdays. Some of these frequencies overlap, and it is crucial to account for this overlap to avoid double-counting deadheading costs and to reflect capacity constraints accurately. Specifically, if we focus on the frequencies that involve Mondays, the overlapping frequencies are 2, 7, 12, and 17, in addition to the daily frequency, marked as frequency 1. Hence, the first group of overlapping frequencies can be denoted as F1 = {1, 2, 7, 12, 17}.
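The indexing scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and extending the pattern of F1 to the other four weekdays is our assumption, since the text spells out only the Monday set.

```python
# Illustrative indexing of the 21 delivery frequencies.
# Frequency 1 is daily; 2..21 are (weeks between deliveries, weekday) pairs.
WEEKDAYS = 5
PERIODS = [1, 2, 3, 4]  # weekly, biweekly, triweekly, quad-weekly

def frequency_index(period_weeks, weekday):
    """Map (weeks between deliveries, weekday 1..5) to a frequency id in 2..21."""
    return 1 + (period_weeks - 1) * WEEKDAYS + weekday

def overlapping_set(weekday):
    """Frequencies that may place a delivery on the given weekday, daily included.
    Assumption: every weekday follows the pattern given for Monday in the text."""
    return [1] + [frequency_index(p, weekday) for p in PERIODS]

all_frequencies = [1] + [frequency_index(p, d)
                         for p in PERIODS
                         for d in range(1, WEEKDAYS + 1)]

monday_set = overlapping_set(1)
```

Under this scheme `overlapping_set(1)` reproduces the set F1 given in the text, and `all_frequencies` enumerates the 21 options in order.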

Our primary focus revolves around determining both the fleet size and devising delivery plans. To be more precise, our model will make decisions pertaining to the quantity of vehicles for each vehicle type in the fleet, the chosen delivery frequency, and the specific vehicle allocated to restock the inventory of each customer.

The objective function we employ reflects the total annual cost, comprising two key components: costs associated with transportation, which encompass vehicle ownership and routing expenses, and costs linked to inventory, consisting of fixed replenishment and inventory holding expenses at the customer sites.

We have made a few assumptions concerning operational aspects in our problem, as follows:

  1. Each customer’s inventory must be replenished using a single vehicle and a single predetermined frequency. This assumption aligns with common practice, as customers generally prefer a consistent delivery frequency.

  2. Considering real-world factors like daily traffic and long travel distances, we assume that each vehicle can complete one route in a day. Additionally, there’s a constraint on the maximum number of customers a vehicle can visit during its daily route.

  3. We also consider the possibility of grouping customers by geographic regions, allowing us to restrict two customers from different geographic areas from being assigned to the same delivery route.

A notable feature of our model is its lack of intricate routing determinations. Instead, we incorporate routing costs in a simplified manner by multiplying the number of customers visited on a route by the average travel cost between customers.
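As a purely illustrative calculation with made-up numbers, suppose the average travel cost between customers for a vehicle is 12 cost units, the vehicle visits 4 customers on a weekly route, and a weekly frequency means 52 deliveries per year (this count is our assumption, based on a 52-week year). The simplified annual routing cost of that route is then the product of the three quantities:

$$12 \times 52 \times 4 = 2496 \text{ cost units per year.}$$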

We provide an overview of the mixed integer programming model for the problem below. The notations used are as follows:

Sets:

\(I\): Customer set

\(V\): Vehicle set

\(F\): Frequencies set, \(F\)= {1, 2, …, 21}

\({F}_{j}\): Coinciding frequencies set, \(\forall\,j=1,\dots ,n\)

\(D\): Set of days of the week, \(D\) = {1, 2, …, 5}

\(H\): Set of weeks per year, \(H\) = {1, 2, …, 52}

Parameters:

\(n\): Number of coinciding frequency sets

\(m\): Number of customers

\({r}_{v}\): Approximate routing cost between two customers for vehicle \(v\)

\({G}_{v}\): Deadheading cost for vehicle \(v\)

\({a}_{v}\): Annual ownership cost of vehicle \(v\)

\({\lambda }_{if}\): Demand of customer \(i\) if we replenish the customer with frequency \(f\)

\(h\): Annual inventory holding cost per unit of a product

\({k}_{if}\): Fixed cost of replenishing customer \(i\) if we replenish the customer with frequency \(f\)

\({s}_{max}\): Maximum number of customers that can be visited during the day

\({c}_{v}\): Capacity of vehicle \(v\)

\(M\): A big number

\({p}_{f}\): Total number of annual replenishments for frequency \(f\)

\({t}_{ik}\): Incidence matrix of customers \(i\) and \(k\) (customers \(i\) and \(k\) can be in the same route if \({t}_{ik}\)=2, and cannot be in the same route when \({t}_{ik}\)=\(1\))

Decision variables:

\({x}_{ivf}\):\(1\) if customer \(i\) is replenished by vehicle \(v\) and frequency \(f\), \(0\) otherwise

\({V}_{v}\):\(1\) if vehicle \(v\) is used, \(0\) otherwise

\({L}_{vf}\):\(1\) if any customer is assigned to vehicle \(v\) and frequency \(f\), \(0\) otherwise

\({R}_{dvh}\):\(1\) if any customer is assigned to vehicle \(v\) with a frequency that falls on day \(d\) in week \(h\), \(0\) otherwise

\({C}_{vf}\):Number of customers assigned to vehicle \(v\) and frequency \(f\)

3.1 Mathematical model

$$\min \sum_{v\in V}{G}_{v}{p}_{1}{L}_{v1}+\sum_{d\in D}\sum_{h\in H}\sum_{v\in V}{G}_{v}{R}_{dvh}+\sum_{v\in V}\sum_{f\in F}{r}_{v}{p}_{f}{C}_{vf}+\sum_{v\in V}{a}_{v}{V}_{v}+\sum_{i\in I}\sum_{v\in V}\sum_{f\in F}h\frac{{\lambda }_{if}}{2}{x}_{ivf}+\sum_{i\in I}\sum_{v\in V}\sum_{f\in F}{k}_{if}{x}_{ivf}$$
(1)

Subject to

$$\sum_{v\in V}\sum_{f\in F}{x}_{ivf}=1 \quad \forall\, i\in I$$
(2)
$$\sum_{i\in I}{x}_{ivf}={C}_{vf} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(3)
$$M{L}_{vf}\ge {C}_{vf} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(4)
$${L}_{vf}\le {C}_{vf} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(5)
$$M{V}_{v}\ge \sum_{i\in I}\sum_{f\in F}{x}_{ivf} \quad \forall\, v\in V$$
(6)
$$\sum_{i\in I}{\lambda }_{if}{x}_{ivf}\le {c}_{v} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(7)
$$\sum_{i\in I}\sum_{f\in {F}_{j}}{\lambda }_{if}{x}_{ivf}\le {c}_{v} \quad \forall\, v\in V,\quad \forall\, j=1,\dots ,n$$
(8)
$$\sum_{f\in {F}_{j}}{C}_{vf}\le {s}_{max} \quad \forall\, v\in V,\quad \forall\, j=1,\dots ,n$$
(9)
$$\sum_{f\in {F}_{j}}{x}_{ivf}+\sum_{f\in {F}_{j}}{x}_{kvf}\le {t}_{ik} \quad \forall\, v\in V,\ i\in I,\ k\in I,\quad \forall\, j=1,\dots ,n$$
(10)
$${R}_{dvh}\ge {L}_{vf}-{L}_{v1} \quad \forall\, v\in V,\ d\in D,\ f\in {F}_{j},\ h\in H$$
(11)
$${x}_{ivf}\in \{0,1\} \quad \forall\, i\in I,\quad \forall\, v\in V,\quad \forall\, f\in F$$
(12)
$${L}_{vf}\in \{0,1\} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(13)
$${C}_{vf}\in {\mathbb{Z}}_{\ge 0} \quad \forall\, v\in V,\quad \forall\, f\in F$$
(14)
$${V}_{v}\in \{0,1\} \quad \forall\, v\in V$$
(15)
$${R}_{dvh}\in \{0,1\} \quad \forall\, d\in D,\ v\in V,\ h\in H$$
(16)

Constraint (2) ensures that every customer is assigned to exactly one vehicle and frequency pair, so that all demands are satisfied. The number of customers served by a specific frequency and vehicle is calculated using (3). The value of \({L}_{vf}\) is set by constraints (4) and (5). Constraint (6) decides whether vehicle \(v\) is used. Vehicle capacity limitations are imposed on the model by constraints (7) and (8). The first one ensures that the load assigned to a vehicle in each frequency is less than or equal to the carrying capacity of the vehicle. The latter ensures the same on coinciding frequencies. Constraint (9) sets \({s}_{max}\) as the maximum number of customers that may be visited on a route each day. Some customers are not eligible to be on the same route due to their geographical location. This restriction is reflected by constraint (10). Constraint (11) prevents deadheading costs from being counted redundantly when a vehicle already makes daily deliveries.
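To make the role of constraints (7) to (10) concrete, the following Python sketch checks a candidate assignment against them. It is a simplified illustration rather than the model itself: the function name, the data layout, and the toy numbers are all hypothetical, and constraint (2) is represented implicitly by storing exactly one vehicle and frequency pair per customer.

```python
from itertools import combinations

def is_feasible(assign, demand, cap, s_max, overlap_sets, compatible):
    """assign: {customer: (vehicle, frequency)}, one pair per customer (constraint (2)).
    demand[(i, f)] plays the role of lambda_if, cap[v] of c_v,
    compatible(i, k) is True when t_ik = 2."""
    pairs = set(assign.values())
    for v, f in pairs:                                    # constraint (7)
        load = sum(demand[(i, f)] for i, o in assign.items() if o == (v, f))
        if load > cap[v]:
            return False
    for v in {v for v, _ in pairs}:
        for F_j in overlap_sets:
            group = [i for i, (vv, f) in assign.items() if vv == v and f in F_j]
            load = sum(demand[(i, assign[i][1])] for i in group)
            if load > cap[v] or len(group) > s_max:       # constraints (8), (9)
                return False
            if any(not compatible(i, k) for i, k in combinations(group, 2)):
                return False                              # constraint (10)
    return True

def same_region(i, k):
    """Toy compatibility rule: every pair of customers may share a route."""
    return True

# Hypothetical toy data: only the Monday set F1 from the text, two vehicles.
overlap_sets = [{1, 2, 7, 12, 17}]
demand = {(1, 2): 40, (2, 7): 50, (3, 2): 30}
cap = {"A": 100, "B": 100}

crowded = {1: ("A", 2), 2: ("A", 7), 3: ("A", 2)}  # 120 units when frequencies coincide
split = {1: ("A", 2), 2: ("A", 7), 3: ("B", 2)}

crowded_ok = is_feasible(crowded, demand, cap, 3, overlap_sets, same_region)
split_ok = is_feasible(split, demand, cap, 3, overlap_sets, same_region)
```

In the `crowded` assignment each frequency respects the capacity on its own, but the weekly and biweekly Monday deliveries coincide and overload vehicle A, which is exactly the situation constraint (8) is meant to exclude.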

For further details about the mathematical formulation one can see Dastjerd and Ertogral (2019).

4 ADP based solution heuristic

Here, we offer an ADP based solution heuristic for the problem. The phrase approximate dynamic programming (ADP) designates a large family of computational and modeling methods for tackling large and complex decision problems. ADP is a way to get around the well-known dimensionality curse that troubles the use of Bellman's equation. The employment of an approximate value function for decision making is essential for the approximate dynamic programming.

We approximate the value of the objective function at each ADP iteration using the Fix and Optimize (F&O) heuristic. In Dastjerd and Ertogral (2019) we developed the original F&O and applied it to the problem. In the following sections, first we give the recursive equation for the ADP we suggest, explain the steps for F&O heuristic and then go through the steps of the suggested ADP approach.

4.1 Suggested ADP for the problem

4.1.1 Development of the recursive equation

ADP, or Approximate Dynamic Programming, is a potent method for addressing large-scale decision-making processes that involve discrete time multistage optimization. In these problems, there is a state space \(S\), and at each stage the problem is in a specific state \({S}_{t}\in S\) from which a decision \({x}_{t}\) can be taken. After making a decision \({x}_{t}\), we receive rewards or incur costs as given by \({C}_{t}({S}_{t}; {x}_{t})\) and move to a new state \({S}_{t+1}\). Thus, the decisions at each state are conditionally dependent on all previous states and decisions. As a result, a decision has an impact not only on the immediate costs but also on the environment in which future decisions will be made, thereby influencing the upcoming costs.

Dynamic programming solves complicated decision-making problems by dividing them into smaller subproblems. The optimal solution for the problem is the one which delivers an optimal solution for all of the subproblems, as stated in Bellman and Kalaba (1957).

In our setting, the stages of the dynamic programming formulation correspond to the customers (indexed by \(i\)), and at each iteration we check whether a frequency-vehicle assignment is appropriate for that specific customer. \({x}_{ivf}\) is the decision variable at each stage, and we decide whether to set \({x}_{ivf}\) to 1 or 0 for each customer \(i\). Assigning the value 1 to \({x}_{ivf}\) for a particular \((v, f)\) means setting \({x}_{ivf}\) to 0 for all other \((v, f)\) options for customer \(i\), since a customer can only be assigned to a single vehicle and frequency pair. In deciding the \({x}_{ivf}\) values, one must consider the available vehicle capacity in each frequency for the current customer \(i\), which constitutes the state in dynamic programming terminology. At each stage, the state of the system consists of the remaining capacity of each vehicle-frequency pair in terms of both the remaining volume and the remaining number of daily customers (tracked through \({C}_{vf}\)) that can be assigned to it. This state, which we call the capacity state \({Cap}_{i}\), is determined by the decisions already made prior to the current stage. In our implementation, we use the mathematical model to yield the state of the system at each stage.

At each stage, the incremental cost of allocating or not allocating the current customer to a vehicle-frequency pair must be calculated given that the capacity state is \({Cap}_{i}\). The last, and most significant, factor is the estimated objective function value of the partial problem for the remaining customers when \({Cap}_{i}\) is the capacity state at stage \(i\). With the aforementioned dependencies and definitions in mind, we give the forward recursion equation below:

Notations:

\({Cap}_{i}\): Available capacity state at the stage of customer \(i\), \(i\in I\)

\(K(i, {Cap}_{i})\): Optimal objective function value of the partial problem for customers \(i, i+1, i+2,\dots , |I|\), given the capacity state \({Cap}_{i}\)

\(I({Cap}_{i},{x}_{ivf})\): Incremental cost of the decision \({x}_{ivf}\) (setting it to 1 or 0) for customer \(i\in I\), given the capacity state \({Cap}_{i}\)

\(f({Cap}_{i},{x}_{ivf})\): The function that returns the capacity state at customer \(i+1\), given the capacity state \({Cap}_{i}\) before customer \(i\) and the decision \({x}_{ivf}\) (setting it to 1 or 0)

Forward recursive equation of the problem:

$$K\left(i,{Cap}_{i}\right)=\min_{\substack{{x}_{ivf}\in \{0,1\}\\ v\in V,\ f\in F}}\left\{I\left({Cap}_{i},{x}_{ivf}\right)+K\left(i+1,f\left({Cap}_{i},{x}_{ivf}\right)\right)\right\}$$
$$K\left(|I|+1,\text{any capacity state}\right)=0.$$

The approximate dynamic programming we suggest is founded on an algorithmic approach that progresses forward through the customers. To solve this problem using traditional dynamic programming, we would need to identify the exact value function \(K\left(i+1,f\left({Cap}_{i},{x}_{ivf}\right)\right)\) for each value of \({Cap}_{i}\), which becomes intractable as the data size increases. Hence, instead of calculating the exact objective function value for the succeeding stages, we approximate it with a problem specific heuristic method. In this paper, we employ a Fix and Optimize heuristic for solving the partial problem approximately in each iteration of the approximate dynamic programming. The result obtained from the Fix and Optimize method (F&O) is used as the approximate optimal value in place of \(K\left(i+1,f\left({Cap}_{i},{x}_{ivf}\right)\right)\) in the recursive formula. The problem specific heuristic F&O relies on breaking the main problem down into smaller and easier subproblems. The details of the implemented F&O are given below.
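The forward pass described above can be sketched in Python on a toy instance. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: a greedy rule stands in for F&O as the cost-to-go estimate, demand does not depend on the frequency, the capacity state tracks only the remaining volume per vehicle and frequency pair, and all names and numbers are hypothetical.

```python
import math

def step_cost(i, opt, state, data):
    """Incremental cost I(Cap_i, x_ivf) of assigning customer i to opt = (v, f)."""
    remaining, used = state
    if remaining[opt] < data["demand"][i]:
        return math.inf                       # option violates the capacity state
    vehicle = opt[0]
    ownership = 0 if vehicle in used else data["ownership"][vehicle]
    return data["cost"][(i, opt)] + ownership

def transition(state, i, opt, data):
    """Capacity state after the decision, i.e. f(Cap_i, x_ivf)."""
    remaining, used = dict(state[0]), set(state[1])
    remaining[opt] -= data["demand"][i]
    used.add(opt[0])
    return remaining, frozenset(used)

def greedy_cost_to_go(customers, state, data):
    """Stand-in for the F&O estimate of K(i+1, .): cheapest feasible option each time."""
    total = 0
    for i in customers:
        c, opt = min((step_cost(i, o, state, data), o) for o in data["options"])
        if c == math.inf:
            return math.inf
        total += c
        state = transition(state, i, opt, data)
    return total

def adp_forward(data):
    state = (dict(data["capacity"]), frozenset())
    plan, total = {}, 0
    customers = data["customers"]
    for pos, i in enumerate(customers):       # one stage per customer
        best = None
        for opt in data["options"]:
            c = step_cost(i, opt, state, data)
            if c == math.inf:
                continue
            nxt = transition(state, i, opt, data)
            estimate = c + greedy_cost_to_go(customers[pos + 1:], nxt, data)
            if best is None or estimate < best[0]:
                best = (estimate, c, opt, nxt)
        if best is None:
            raise ValueError(f"no feasible option for customer {i}")
        _, c, opt, state = best
        plan[i] = opt
        total += c
    return plan, total

# Hypothetical toy instance: two vehicles, one weekly frequency each.
options = [("A", 2), ("B", 2)]
data = {
    "customers": [1, 2, 3],
    "options": options,
    "capacity": {("A", 2): 100, ("B", 2): 60},
    "demand": {1: 50, 2: 50, 3: 40},
    "ownership": {"A": 100, "B": 80},
    "cost": {(i, o): 10 for i in [1, 2, 3] for o in options},
}
plan, total = adp_forward(data)
```

At each stage the sketch evaluates every feasible option as incremental cost plus the estimated cost of the remaining customers, then commits to the cheapest one, which mirrors the structure of the recursion with the exact value function replaced by a heuristic estimate.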

4.1.2 Fix and optimize heuristic

Fix and optimize is a two-phased heuristic in which the first phase produces a feasible, good quality solution and the second phase is executed to improve the solution of the first phase. The main idea of this heuristic is that solving the main problem as a sequence of partial integer problems reduces solution durations while still producing high quality solutions. Here, the problem is divided into customer subsets. We describe the phases of fix and optimize and give the exact steps in the following.

In the first phase of the F&O the main problem is divided into smaller sub problems based on a predetermined criterion, which is the customer index in this context. At each iteration, variables of only one sub problem are expressed as integers or binaries and the variables in the remaining sub problems are defined as linear variables or they are equated to the fixed values from previous iterations. The procedure continues until all the variables are integers and the final solution is saved to be improved in the next phase. The problem solved in Phase I at iteration \(i\) is depicted in the following Fig. 1;

Fig. 1 The problem solved at iteration \(i\) in Phase I

The notation and pseudocode of phase I of the F&O version for our problem are given below;

Notation:

\(s\): Subproblem generation criteria (customer index in our problem)

\(k\): Number of customers in a subproblem

\(I\): Total number of customers

\(B\): Set of variables in a subproblem

\({Fixed}^{I}\): Fixed variables from the first phase

\({Fixed}^{II}\): Fixed variables from the second phase

figure a
figure b
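The Phase I bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pseudocode: the function names `customer_blocks` and `phase_one_status` are hypothetical, subproblems are assumed to be consecutive groups of \(k\) customers, and the actual MIP solve at each iteration is omitted. The sketch only shows which customers' variables would be fixed, integer, or relaxed at each iteration.

```python
from typing import Dict, List


def customer_blocks(num_customers: int, k: int) -> List[List[int]]:
    """Split customer indices 0..I-1 into consecutive blocks of at most k customers."""
    return [list(range(s, min(s + k, num_customers)))
            for s in range(0, num_customers, k)]


def phase_one_status(num_customers: int, k: int) -> List[Dict[int, str]]:
    """Return, for every Phase I iteration, the status of each customer's variables.

    Blocks before the current one are 'fixed' (values kept from earlier iterations),
    the current block is 'integer', and later blocks are 'relaxed' (continuous).
    """
    blocks = customer_blocks(num_customers, k)
    schedule = []
    for current in range(len(blocks)):
        status = {}
        for b, block in enumerate(blocks):
            for i in block:
                if b < current:
                    status[i] = "fixed"
                elif b == current:
                    status[i] = "integer"
                else:
                    status[i] = "relaxed"
        schedule.append(status)
    return schedule


# Hypothetical instance: I = 5 customers, k = 2 customers per subproblem.
for iteration, status in enumerate(phase_one_status(num_customers=5, k=2)):
    print(iteration, status)
```

In a full implementation, each iteration would solve the partially relaxed model and store the integer values of the current block before moving on.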

In the second phase, the main problem is divided into the same subproblems. The variables in this phase are either declared as integers or fixed to the values obtained in the previous phase. At iteration \(i\), we solve the entire model with all binary variables fixed to their values from the previous iteration except those in \({B}_{i}\), which are kept as integer variables and re-optimized. If the objective improves, the algorithm restarts from the first subproblem. The procedure continues until no further improvement is observed. The problem solved in Phase II at iteration \(i\) is depicted in Fig. 2.

Fig. 2 The problem solved at iteration \(i\) in Phase II

The pseudocode for the second phase of the F&O is given below:

figure c
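The Phase II loop can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in rather than the authors' implementation: `phase_two`, `toy_cost`, `OWNERSHIP` and `SERVE` are hypothetical names and data, each customer simply picks one vehicle, and the re-optimization of a block is done by brute-force enumeration instead of a MIP solver. It keeps the structure described above: free one block, keep everything else fixed, and restart from the first block whenever the objective improves.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Assignment = Dict[int, int]  # customer index -> chosen vehicle (toy decision)


def phase_two(start: Assignment,
              blocks: List[List[int]],
              options: Sequence[int],
              cost: Callable[[Assignment], float]) -> Tuple[Assignment, float]:
    """Re-optimize one block at a time while all other customers stay fixed."""
    best = dict(start)
    best_cost = cost(best)
    improved = True
    while improved:
        improved = False
        for block in blocks:
            # Enumerate every choice for the free block (stand-in for a MIP solve).
            for choice in product(options, repeat=len(block)):
                trial = dict(best)
                trial.update(zip(block, choice))
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
                    improved = True
            if improved:
                break  # an improvement occurred: restart from the first block
    return best, best_cost


# Hypothetical data: two vehicles with an ownership cost, four customers.
OWNERSHIP = {0: 10.0, 1: 10.0}
SERVE = {0: {0: 1.0, 1: 2.0}, 1: {0: 1.0, 1: 2.0},
         2: {0: 2.0, 1: 1.0}, 3: {0: 2.0, 1: 1.0}}


def toy_cost(assign: Assignment) -> float:
    """Ownership cost of every vehicle in use plus per-customer service costs."""
    used = set(assign.values())
    return sum(OWNERSHIP[v] for v in used) + sum(SERVE[i][v] for i, v in assign.items())


start = {0: 0, 1: 0, 2: 1, 3: 1}  # plays the role of the Phase I solution
best, best_cost = phase_two(start, [[0, 1], [2, 3]], [0, 1], toy_cost)
print(best, best_cost)
```

In this toy instance, moving a single customer never pays off because both vehicles remain in use, whereas re-optimizing a block of two customers jointly releases one vehicle. This is the kind of gain that block-wise re-optimization is meant to capture.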

4.1.3 Look ahead strategy for ADP

At each ADP step, the value of \({x}_{ivf}\) for a specific customer \(i\) is set to \(1\) or \(0\) for each possible vehicle-frequency combination. The number of combinations that must be checked to determine the value of \({x}_{ivf}\) grows rapidly with the size of the data, which leads to long solution times for moderate and large problems. We reduce the number of vehicle-frequency pairs to be checked by means of a “look ahead” fixation technique. The idea is to identify the \({x}_{ivf}\) variables of later customers (looking ahead) in the dynamic programming stages that are likely to take the value 1 in the final solution. As we carry out F&O iterations during the dynamic programming, we check which \({x}_{ivf}\) variables of later customers often take a value of 1 or close to it, and fix them to 1. This saves significant computational effort, since fixing \({x}_{ivf}\) to 1 for customer \(i\) and a specific \((v,f)\) pair sets \({x}_{ivf}\) to \(0\) for all remaining \((v,f)\) options of that customer. Specifically, our algorithm monitors the values of the \({x}_{ivf}\) variables in the last α% of the F&O iterations and fixes to 1 those that take a value of 0.9 or more in these iterations.
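The fixation rule can be sketched as follows. The function name `look_ahead_fix`, the data layout (one dictionary of relaxed values per F&O iteration) and the rounding of the α% window are assumptions made for illustration. The sketch also assumes that a variable must reach the 0.9 threshold in every iteration of the window, which is one possible reading of the rule above.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (customer i, vehicle v, frequency f)


def look_ahead_fix(history: List[Dict[Key, float]],
                   alpha: float,
                   threshold: float = 0.9) -> List[Key]:
    """Return the (i, v, f) triples whose value stayed >= threshold
    in each of the last alpha percent of the recorded F&O iterations.

    `history` should contain only variables of later customers, one
    dictionary of (possibly fractional) x values per F&O iteration.
    """
    if not history:
        return []
    # Size of the monitoring window, at least one iteration.
    window = max(1, math.ceil(len(history) * alpha / 100.0))
    recent = history[-window:]
    return sorted(key for key in recent[0]
                  if all(snap.get(key, 0.0) >= threshold for snap in recent))


# Hypothetical record of 10 F&O iterations for two later-customer variables.
history = [{(5, 0, 1): 0.2, (6, 1, 2): 0.95}] * 7 + [
    {(5, 0, 1): 0.92, (6, 1, 2): 1.0},
    {(5, 0, 1): 0.95, (6, 1, 2): 0.97},
    {(5, 0, 1): 1.0, (6, 1, 2): 0.85},
]
print(look_ahead_fix(history, alpha=30))
```

Every triple returned would be fixed to 1, which in turn sets the other \((v,f)\) options of that customer to 0 and removes them from the enumeration in later ADP stages.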

4.1.4 Improvement algorithm for ADP

We suggest a two-phase improvement algorithm to further improve the results obtained from ADP with the look ahead strategy. Examining the solutions delivered by ADP, we observed that large deviations from the optimal or best bound values are due to the use of extra vehicles. Since the highest cost component in our setting is the vehicle ownership cost, one extra vehicle compared to the optimal solution affects the objective function value significantly. The improvement algorithm is therefore based on changing the fleet size and composition. In the first phase, we check the solutions in which the number of vehicles of each type in the ADP solution is increased and decreased by one while the numbers of the other vehicle types are kept fixed.

Let ki be the number of vehicles of type i in the solution found by ADP or in the best solution found so far during the improvement. The steps of phase I of the improvement are as follows:

Improvement phase I:

Step 1. Let i = 1, improvement flag = 0, and best_solution = the solution from ADP.

Step 2. Fix ki = ki + 1 and fix the other vehicle numbers to their values in the best solution so far. Apply ADP. If the solution found is better than the best solution so far, let best_solution = the current solution and improvement flag = 1. Else go to the next step.

Step 3. Fix ki = ki - 1 and fix the other vehicle numbers to their values in the best solution so far. Apply ADP. If the solution found is better than the best solution so far, let best_solution = the current solution and improvement flag = 1. Else go to the next step.

Step 4. Let i = i + 1. If i ≤ |V|, go to Step 2; else go to the next step.

Step 5. If improvement flag = 1, reset i = 1 and improvement flag = 0 and go to Step 2 with the current best_solution; else STOP and report best_solution.
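The steps of phase I can be sketched as a simple loop. This is a minimal illustration under assumptions: `apply_adp` is a hypothetical callable that stands in for a full ADP run with the given fleet bounds, the toy cost data are invented, and the sketch tries both the +1 and the -1 move for every vehicle type in each pass.

```python
import math
from typing import Callable, Dict, Tuple

Fleet = Dict[str, int]  # vehicle type -> number of vehicles


def improvement_phase_one(start: Fleet,
                          apply_adp: Callable[[Fleet], float]) -> Tuple[Fleet, float]:
    """Try +1 and -1 on each vehicle type; repeat passes while any move improves."""
    best_fleet = dict(start)
    best_cost = apply_adp(best_fleet)
    improved = True
    while improved:                       # Step 5: another pass after an improvement
        improved = False
        for vtype in list(best_fleet):    # Steps 2-4: loop over vehicle types
            for delta in (+1, -1):
                trial = dict(best_fleet)
                trial[vtype] += delta
                if trial[vtype] < 0:
                    continue              # a negative vehicle count is not a valid fleet
                cost = apply_adp(trial)
                if cost < best_cost:
                    best_fleet, best_cost = trial, cost
                    improved = True
    return best_fleet, best_cost


# Hypothetical stand-in for ADP: ownership cost if the fleet can carry the demand.
OWNERSHIP = {"small": 60.0, "large": 100.0}
CAPACITY = {"small": 50, "large": 100}
DEMAND = 180


def toy_adp(fleet: Fleet) -> float:
    capacity = sum(CAPACITY[t] * n for t, n in fleet.items())
    if capacity < DEMAND:
        return math.inf                   # infeasible fleet
    return sum(OWNERSHIP[t] * n for t, n in fleet.items())


fleet, cost = improvement_phase_one({"small": 2, "large": 2}, toy_adp)
print(fleet, cost)
```

In the toy run the loop removes the two small vehicles one at a time, because each removal keeps the fleet feasible and lowers the ownership cost.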

Improvement phase II:

In the second phase of the improvement algorithm, we check whether there is any improvement when we replace two small vehicles with a large one or a large vehicle with a small one. The two neighborhoods are defined as follows:

  1. Removing two small vehicles and replacing them with a large one

  2. Replacing one large vehicle in the fleet with a small one

We use the defined neighborhoods to change the fixed \({V}_{v}\) set in the solution obtained after the first improvement phase. The resulting vehicle numbers are used as bounds on the number of vehicles that can be used. If the changes in the vehicle sets in the second phase yield an improved solution compared to the solution from the first phase, the first phase is restarted with the new vehicle composition and the solution is updated. Otherwise, the algorithm terminates and the current best solution is reported as the final solution.
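The two second-phase moves can be written as a small neighbor generator. The function name `phase_two_neighbors` and the two-type fleet representation (`"small"`, `"large"`) are assumptions for illustration. Each returned fleet would serve as the vehicle bounds for a new ADP run.

```python
from typing import Dict, List

Fleet = Dict[str, int]  # vehicle type -> number of vehicles


def phase_two_neighbors(fleet: Fleet) -> List[Fleet]:
    """Generate the fleets reachable by the two second-phase moves."""
    neighbors = []
    # Move 1: remove two small vehicles and add one large vehicle.
    if fleet["small"] >= 2:
        neighbors.append({"small": fleet["small"] - 2, "large": fleet["large"] + 1})
    # Move 2: replace one large vehicle with one small vehicle.
    if fleet["large"] >= 1:
        neighbors.append({"small": fleet["small"] + 1, "large": fleet["large"] - 1})
    return neighbors


print(phase_two_neighbors({"small": 2, "large": 1}))
```

A move is skipped when the fleet does not contain enough vehicles of the type being removed.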

4.1.5 Pseudo codes of the ADP and improvement algorithm

The following section shows the ADP pseudocode:

Notation:

\(C\): Set of Customers

\(V\): Set of Vehicles

\(F\): Set of given frequencies

\({P}_{0}\): Set of \((i, v, f)\) indices for which \({x}_{ivf}\) are set to 0 in the previous stages in dynamic programming

\({P}_{1}\): Set of \((i, v, f)\) indices for which \({x}_{ivf}\) are set to 1 in the previous stages in dynamic programming

\(P\): Set of \((i, v, f)\) indices for which \({x}_{ivf}\) are set to 1 or 0 in the previous stages using look ahead strategy in dynamic programming

\({Cap}_{i}\): Remaining capacity in terms of volume and assignable customer number in stage \(i\)

\({L}_{1}^{\alpha }:\) Set of \({x}_{ivf}\)’s for later customers in dynamic programming that have a value near to \(1\) (greater than 0.9) in the last \(\alpha\) percent of the iterations of F&O executed in the dynamic programming

\({Cost}_{0}\): Approximate cost of setting \({x}_{ivf}\)= 0 in each stage (approximate value for \(\left\{I\left({Cap}_{i},0\right)+ K\left(i+1,f\left({Cap}_{i},0\right)\right)\right\}\))

\({Cost}_{1}\): Approximate cost of setting \({x}_{ivf}\)= 1 in each stage (approximate value for \(\left\{I\left({Cap}_{i},1\right)+ K\left(i+1,f\left({Cap}_{i},1\right)\right)\right\}\))

ADP pseudocode

figure d
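To make the roles of \({P}_{0}\), \({P}_{1}\), \({Cost}_{0}\) and \({Cost}_{1}\) concrete, the following toy sketch walks through the stages of a forward ADP. It is not the authors' pseudocode: the data in `SERVICE` and `OWNERSHIP` are invented, capacities are ignored, the look ahead set is omitted, and the function `estimate` is a simple greedy completion used as a hypothetical stand-in for the F&O heuristic.

```python
import math
from typing import Set, Tuple

Triple = Tuple[int, str, int]  # (customer i, vehicle v, frequency f)

# Hypothetical data: cost of serving customer i with each (vehicle, frequency) pair.
SERVICE = {
    0: {("small", 1): 4.0, ("large", 2): 3.0},
    1: {("small", 1): 5.0, ("large", 2): 2.0},
    2: {("small", 1): 1.0, ("large", 2): 6.0},
}
OWNERSHIP = {"small": 3.0, "large": 10.0}  # hypothetical cost per vehicle type in use


def estimate(p1: Set[Triple], p0: Set[Triple]) -> float:
    """Greedy stand-in for F&O: complete the partial assignment, return its cost."""
    chosen = {i: (v, f) for (i, v, f) in p1}
    used = {v for (v, _) in chosen.values()}
    total = sum(SERVICE[i][pair] for i, pair in chosen.items())
    for i in sorted(SERVICE):
        if i in chosen:
            continue
        allowed = [p for p in SERVICE[i] if (i, p[0], p[1]) not in p0]
        if not allowed:
            return math.inf  # every option of customer i has been excluded
        # Marginal cost: service cost plus ownership if the vehicle is not yet in use.
        pick = min(allowed,
                   key=lambda p: SERVICE[i][p] + (0.0 if p[0] in used else OWNERSHIP[p[0]]))
        total += SERVICE[i][pick]
        used.add(pick[0])
    return total + sum(OWNERSHIP[v] for v in used)


def adp() -> Tuple[Set[Triple], float]:
    """One stage per customer: compare Cost_1 and Cost_0 for each (v, f) pair."""
    p1: Set[Triple] = set()
    p0: Set[Triple] = set()
    for i in sorted(SERVICE):
        for (v, f) in SERVICE[i]:
            cost1 = estimate(p1 | {(i, v, f)}, p0)   # x_ivf set to 1
            cost0 = estimate(p1, p0 | {(i, v, f)})   # x_ivf set to 0
            if cost1 <= cost0:
                p1.add((i, v, f))
                break            # remaining pairs of customer i are implicitly 0
            p0.add((i, v, f))
    return p1, estimate(p1, p0)


assignment, cost = adp()
print(sorted(assignment), cost)
```

When all but one pair of a customer have been excluded, the estimate for excluding the last pair is infinite, so the sketch always assigns exactly one pair to each customer.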

Steps of the improvement phase and the notations are defined below:

Notations:

\(N\): Set of neighborhoods generated in the first phase, \(\left\{{n}_{1}, {n}_{2},{n}_{3},{n}_{4}\right\}\)

\({N}{\prime}\): Set of neighborhoods generated in the second phase, \(\left\{{n{\prime}}_{1}, {n{\prime}}_{2}\right\}\)

\({N}_{v}\): Set of bounds generated by applying moves in \(N\) to \(V\)

\({N}_{v}{\prime}\): Set of bounds generated by applying moves in \(N{\prime}\) to \(V\)

\(C\): Set of customers

\(V\): Set of vehicles

\(F\): Set of frequencies

\(J\): Neighbors

\({P}_{1}\): Set of the \((i, v, f)\) triples that are fixed to 1

\(Cost\): Value of objective function from ADP

\({Cost}_{{n}_{v}}{\prime}\): Value of objective function for the first phase

\({Cost}_{{n{\prime}}_{v}}{\prime}\): Value of objective function for the second phase

\({n}_{1}:\) Increasing large vehicles number by \(1\)

\({n}_{2}:\) Decreasing large vehicles number by 1

\({n}_{3}:\) Increasing small vehicles number by \(1\)

\({n}_{4}:\) Decreasing small vehicles number by 1

\({n{\prime}}_{1}:\) Addition of one small vehicle and exclusion of one large vehicle

\({n{\prime}}_{2}:\) Exclusion of two small vehicles and addition of one large vehicle

Improvement Stage Pseudocode

figure e

5 Computational results

Using the dataset we created, as detailed in Section 3.1, we carried out a numerical analysis of the proposed solution approach. The setup and results of the analysis are covered in the next subsections.

5.1 Dataset

The suggested heuristic is evaluated on four scenarios that differ in demand and customer clustering. These are the same scenarios used in Dastjerd and Ertogral (2019). Scenarios 1 and 2 are normal demand cases, where in scenario 2 the customers are grouped into clusters based on their geographical location. Scenarios 3 and 4 are the 50% higher demand versions of scenarios 1 and 2, respectively. In terms of problem variation, we defined 24 different settings considering vehicle capacities (cap), costs per kilometer for vehicles (R), vehicle ownership costs (A) and inventory holding costs (h). We also consider a fixed setup cost (K). In Table 1, 1.2A stands for the cases with 20% increased ownership costs and 1.4A for the cases with 40% increased ownership costs for larger vehicles. The same pattern applies to 1.2R and 1.4R in terms of the cost per kilometer of vehicles. In all parameter settings, 40 customers are served with 8 owned vehicles, and their yearly demand is generated from a uniform distribution over the range [80,120]. The following table shows the parameter settings for the problems solved:

Table 1 Problem characteristics

5.2 ADP results with improvement step

In this section, we examine the outcomes derived from employing ADP. Additionally, we compare the results of the FO and ADP approaches by constructing a 90% confidence interval on the difference of the gaps based on the well-known paired-t test, assuming the normality of the results.
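For readers who want to reproduce this kind of comparison, the sketch below computes a paired-t confidence interval for the mean difference of gaps (FO minus ADP). The function name `paired_t_interval` and the sample gaps are hypothetical and are not taken from the paper's tables. The critical value is passed in explicitly because the Python standard library has no t-quantile function; for a two-sided 90% interval with 4 degrees of freedom it is approximately 2.132.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_interval(gaps_fo: Sequence[float],
                      gaps_adp: Sequence[float],
                      t_crit: float) -> Tuple[float, float]:
    """Confidence interval for the mean of the paired differences (FO - ADP).

    t_crit is the Student t critical value for the chosen confidence level
    and n - 1 degrees of freedom, supplied by the caller.
    """
    if len(gaps_fo) != len(gaps_adp):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(gaps_fo, gaps_adp)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    # Sample standard deviation (n - 1 in the denominator) of the differences.
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Made-up percentage gaps for five instances (illustration only).
fo_gaps = [12.0, 9.5, 14.2, 8.8, 11.0]
adp_gaps = [6.0, 5.5, 9.0, 4.1, 7.2]
low, high = paired_t_interval(fo_gaps, adp_gaps, t_crit=2.132)
print(low, high)
```

An interval that lies entirely above zero indicates that the FO gaps are larger than the ADP gaps at the chosen confidence level.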

The suggested heuristic is applied to the generated instances for performance evaluation. Its results are compared to those from CPLEX and the basic F&O heuristic. All problems are solved in CPLEX within a time limit of 3 h; for the problems not solved to optimality within this limit, the results are reported in terms of the best lower bounds. The gaps from the lower bounds are marked with *. We analyzed the results under different categories, including gaps from the CPLEX results, fleet compositions and solution times. The results are tabulated below.

Table 2 presents the gaps and the paired t-test results for the improved ADP and F&O. The 90% confidence intervals of the expected difference (FO–ADP result) are on the positive side for all scenarios, which indicates that ADP statistically outperforms FO in terms of the gaps from the optimal/lower bound. Based on the gaps given in Table 2, the solutions from the ADP heuristic are satisfactory. The average percentage deviations range from 4.5 to 9.24 across the four scenarios. Since the gaps are mostly measured from the lower bounds obtained from CPLEX within 3 h of run time, the deviation from the actual optimum can be no larger than these values. With ADP, we found 2 optimal solutions in scenario 1 and 4 in scenario 2.

Table 2 Percentage deviations from best bounds /optimal and paired t-test results for ADP and F&O

Table 2 shows that clustering seems to affect the performance of the ADP heuristic negatively when we compare scenario 2 to scenario 1 and scenario 4 to scenario 3. This is expected because clustering makes the problem more difficult for fix and optimize: a wrong early assignment of a vehicle-frequency pair to a customer is harder to compensate for by later assignments, since the feasible solution space in the clustered case is much more limited than in the case with no clusters.

The difficulty of the clustered problems is also clear from the fact that the number of problems solved to optimality within the 3 h run time is much higher in scenarios 1 and 3 than in scenarios 2 and 4.

Higher demand also seems to affect the performance of ADP adversely. High demand problems require a somewhat larger number of vehicles in the solution, which increases the number of alternatives to be considered. In short, higher demand makes the problem more complicated and hence reduces the performance of the heuristic slightly.

Considering results from paired-t test, we can see that the confidence intervals of the expected difference (FO–ADP result) for all scenarios are on the positive side, which indicates that ADP outperforms FO statistically speaking in terms of the gaps from optimal/lower bound.

In Table 3, considering the scenario-based average vehicle utilization for ADP and F&O, we observe that ADP reduces the number of vehicles in all of the scenarios, which leads to a lower total cost. The total average number of vehicles assigned to distribution operations increases with the addition of clusters and with the 50% increase in demand. For instance, moving from scenario 2 to 3, the total average changes from 2.25 to 2.67, and in terms of fleet composition the number of small trucks changes from 1.58 to 2. This increase is due to the fact that demand is higher in scenario 3 while the available truck capacity is kept the same. Likewise, going from scenario 2 to 4 changes the average total number of utilized vehicles and the fleet composition. Adding clusters changes the total average number of vehicles in the scenarios with higher demand, whereas in the scenarios with base demand it stays the same.

Table 3 Scenario based Fleet utilization average for improved ADP

Table 4 tabulates the computational times of ADP and F&O for the different scenarios. ADP yields quality solutions in less than 1 h, which indicates its efficiency compared to CPLEX. As anticipated, the scenarios with clusters take longer to solve because of the restriction on route generation: customers that are geographically distant cannot be visited on the same route in a day. This constraint leads to an increase in the number of choices to be assessed when assigning customers to vehicles and frequencies. Comparing ADP to F&O, ADP takes longer because it repeatedly runs F&O inside its algorithm. Considering the improvement brought by ADP, the gap between the computational times of F&O and ADP can be neglected.

Table 4 CPU times (in seconds) for improved ADP

Figures 3, 4, 5 and 6 illustrate the share of each cost component in the total cost. Each chart contains the values for four cost components in one specific scenario. The cost components considered are setup cost, inventory holding cost, vehicle ownership cost and approximate routing cost. In all of the scenarios and problem settings, ownership cost is the highest cost item, ranging from 30 to 60% of the total cost. Setup cost is the second highest cost item, fluctuating between 20 and 50% of the total annual cost. Inventory holding cost constitutes about 5 to 30% of the total cost. As stated previously, routing cost has the smallest share, at up to 20% of the total cost. These cost percentages are overall in line with the expected picture for logistics costs in practice, where approximately one third of the total logistics cost is inventory related while the remaining two thirds is for transportation.

Fig. 3 Cost element percentages for scenario 1

Fig. 4 Cost element percentages for scenario 2

Fig. 5 Cost element percentages for scenario 3

Fig. 6 Cost element percentages for scenario 4

The next group of charts, Figs. 7, 8, 9 and 10, shows how often each delivery frequency is selected in each problem, by scenario. That is, these charts answer the question of how often each frequency (daily, weekly, bi-weekly, thrice-weekly or quarto-weekly) is used in each parameter setting of each scenario.

Fig. 7 Frequency repetition for scenario 1

Fig. 8 Frequency repetition for scenario 2

Fig. 9 Frequency repetition for scenario 3

Fig. 10 Frequency repetition for scenario 4

In scenarios 1 and 2, bi-weekly replenishment plans are preferred most of the time whereas in scenarios 3 and 4 customers are mostly replenished with weekly delivery programs. The reason for this difference between scenarios is that in the first two scenarios the demand is relatively low, and the amount which can be consolidated in a single truck is larger considering carrying capacities in comparison to the cases with higher demand. In scenarios 3 and 4, the demand values are increased by 50% and the vehicle capacities are kept the same. Hence, there is less chance to consolidate in a single truck and an increased need for more frequent replenishment in order to satisfy the customer demands.

6 Conclusion

This paper investigates a novel solution method for the previously proposed problem of integrated fleet dimensioning and delivery planning when a set of candidate replenishment frequencies is given in advance. The aim is to satisfy the deterministic demand of customers according to a selected delivery frequency. The frequency is defined by the number of weeks between deliveries, and daily replenishment is also considered. The goods are transported to their destinations using a heterogeneous fleet of trucks. Since the problem under investigation is strategic, it calls for approximations rather than precise details for the routing costs. Our key goal is to find the customer-vehicle-frequency assignment with the lowest cost. Because the problem is NP-hard, off-the-shelf solvers such as CPLEX cannot provide optimal solutions in a reasonable time for large instances.

In this work, we enhanced the ADP algorithm and demonstrated its effectiveness on a variety of randomly generated instances with various features. Compared to the previously implemented F&O method, ADP noticeably improves the percent deviations from the optimal objective function values or best bounds. In particular, ADP was able to find optimal solutions for two problems from scenario 1 and for four problems from scenario 2. Overall, ADP reduced the gaps in all of the scenarios, as shown in Table 2.

Additionally, we established a 90% confidence interval for the disparity between the gaps using the widely recognized paired-t test, under the assumption of result normality. It is evident that, in all scenarios, the confidence intervals for the expected difference (FO–ADP result) lean towards the positive side. This suggests that, statistically speaking, ADP demonstrates superior performance over FO in terms of the gaps from the optimal or lower bound.

In terms of computational times, ADP outperformed CPLEX. The time limit for the CPLEX runs was 3 h, while ADP solved the problems to optimality in less than 5 min in scenarios 1 and 2. Overall solution times for ADP are under 1 h, which indicates its efficiency.

One might look at using different heuristics in ADP instead of the F&O technique as future research possibilities. Another promising research field is to model the problem under patterns of replenishment frequency that are more broadly applicable than the regular patterns based on weeks that we assume in the current study. By considering seasonal demand cases and the choice of employing rental vehicles during peak demand periods, one may further try to broaden the problem description.