Introduction

A supply chain is a set of facilities, suppliers, customers, products and methods of purchasing, monitoring inventory and distribution. In a supply chain, the flow of goods between suppliers and customers goes across several stages and each stage may include some facilities (Sabri and Beamon 2000). Supply chain design is the most fundamental decision that will impact the entire supply chain and has been obtaining much importance recently due to increasing competitiveness in markets (Simchi-Levi and Kaminsky 2004; Thomas and Griffin 1996). Firms are forced to keep high customer service levels while they are obliged to reduce costs and maintain profit margins simultaneously.

Distribution networks are considered as the main profitability key because they directly affect both the cost of supply chains and satisfaction of the customers (Chopra 2003). The high cost of buying assets and building facilities has made locating or relocating the facilities a long-term investment and the deployed facilities are expected to carry out their activities for a long period of time. Therefore, it seems essential to design a supply chain network which performs well with some parameters variation across time (Santosa 2003). General structure of the supply chain is shown in Fig. 1.

Fig. 1
figure 1

Typical supply chain structure (Melo et al. 2009)

Supply chain design includes Strategic decisions such as determining the number and location and capacity of facilities in each echelon of the supply chain. The location of facilities and the allocation of demands to them have been a substantial research area. The typical location–allocation problem involves selecting p facilities among n potential locations and allocating demand to the open facilities to minimize the total cost (Hakimi 1965). In addition, distances, times or costs among customers and facilities can be considered (Drezner and Hamacher 2004; ReVelle and Eiselt 2005). This is p-median problem which has attracted much attention in the literature (Daskin 1995; Mladenovic et al. 2007).

Different techniques have been employed to solve problems relating to facility location. The p-median problem is typically formulated as an integer program (IP). Enumeration and heuristics methods were the primary techniques which are proposed to solve this problem. Also, several methods have been used to solve LP relaxations of the IP formulation problems including branch-and-bound, dual ascent, subgradient optimization and Lagrangian relaxation.

Metaheuristics and approximation algorithms are the prominent techniques in the literature over the last few years. The most common Metaheuristics are Genetic Algorithms (GA), Simulated Annealing, Variable Neighborhood Search, Tabu Search and Neural Networks. Moreover, some approximation algorithms have been created which attain different bounds on the favorable solution quality.

In fact, the configuration stability of a supply chain network is highly desirable because of great investments normally dedicated with this type of decision. In this paper, the goal is to obtain a distribution network design that can cope with the future demand variations and achieve a ‘‘robust” network design. In this case study, actual data are used from a company which is one of the largest pharmaceutical distribution firms in Iran. Because drug and medicine are considered as strategic commodity in all around the world, minimizing the total cost of pharmaceutical supply chain is not the entire objective and achieving a robust supply chain design is more important than having the least cost network.

In this paper, the goal is to determine the optimal number and location of distribution centers. Moreover, the allocation of customer demands to distribution centers under demand uncertainty is defined to minimize the total annual cost including warehouse fixed costs, inventory costs and transportation costs while maintaining the required service level.

This paper is organized as follows: In “Problem statement”, the mathematical model and background on methodology are stated. The proposed genetic algorithm is given in “Proposed genetic algorithm” and the approach to deal with uncertainty is discussed in “The approach to deal with uncertainty”. Numerical results are presented in “Results and analysis” and finally, the paper is concluded in “Conclusion”.

Problem statement

The problem considered in this paper is designing the distribution network for a pharmaceutical distribution companies in Iran. Existing in the pharmaceutical industry entails accepting a series of requirements and commitments related to the Department of Health. For example, Manufacturers and importers in this industry do not have the permission to distribute their own products and they are obliged to distribute their products through distribution companies. Actually, the pharmaceutical distribution firms are the connecting chain between suppliers and retailers.

Based on the scope of activity, pharmaceuticals distribution companies are divided into provincial and national firms. Provincial distribution companies are licensed to operate in particular region bounds, whereas national distribution companies are forced by Ministry of Health to distribute drugs in all provinces of the country. This company is one of the national distribution firms in Iran which should cover the demands of all cities in country with its own products.

According to the latest published statistics, the market share of this company is 15 % of the medicines and medical equipment market in Iran. The suppliers of this company include 32 manufactures and 46 importers of medicine and medical equipment which are the main suppliers of the country. These infrastructures make this company as the second pharmaceuticals distribution company in Iran.

The customers of this company include pharmacies, hospitals and medical centers in most of the cities in Iran whose numbers roughly equate 10,000. It is obvious that this optimization problem needs a very large amount of data. Therefore, data aggregation is the first essential step that should be done. The efficient approach which is used for this purpose is classifying the pharmacies, hospitals and medical centers in each city in one cluster.

Then, all customers located in each group are replaced by a single demand point in the center of cluster. Even if suitable technology exists to solve the problem with large initial data, data aggregation would be useful. Aggregation causes reduce the data variability and forecasting with aggregated data is significantly more precise compared with non-aggregated data (Simchi-Levi and Kaminsky 2004).

This company approximately owns 1,900 different drugs and medical equipment in its product portfolio and the composition of them has been determined based on market demands. Due to the relatively large number and small size of the products, these are integrated to one product. Moreover, what is significant matter for the company is how much space each customer demands occupy the content of vehicles in the transportation process. So, according to the length, width and height of product packages, the total volume of demand products is calculated and it is used as the measure of customer demands. So, this problem is considered as a single product and multi-level supply chain design problem (Sousa et al. 2011).

The current logistical structure of the company includes a warehouse in Tehran and 18 distribution centers in different provinces. The company wants to redesign its distribution network to satisfy all customer demands and tolerate the minimum total costs caused by demand fluctuations.

Location–allocation model

The design of the distribution network for this company is modeled in the form of a mixed integer linear programming problem in both location and allocation phases. In the location phase, the objective is to determine the optimal number of distribution centers and define which customers are supplied with each distribution center. In the allocation phase, the objective is to specify the amount of inventory which is transferred from the warehouse to distribution centers and from distribution centers to the customers (Amiri 2006; Gebennini et al. 2009; Jayaraman and Ross 2008; Jayaraman and Pirkul 2001). The assumptions considered in this model are:

  • The delivery time of transferring goods from warehouses to distribution centers and from distribution centers to customers are considered constant.

  • The cost of transportation is calculated based on the distance between two points and the shipping cost of a product unit in each path.

  • The total demand of each customer must be provided by a distribution center.

  • The number of customers and their demands are known.

  • The numbers and capacities of potential distribution centers and warehouses are known.

Set I represents the set of Customer zones, J is the set of potential distribution centers, K is the set of potential warehouse locations and L is the set of products. The location model intends to minimize three types of costs. The first type of costs is fixed costs associated with open warehouses ( V K ) and distribution centers ( F J ).

Second, external transportation cost to transfer products from the warehouses to distribution centers ( C j k l ). The third type is internal transportation costs to transfer products from distribution centers to customer zones ( C i j l ). The customer demands for products ( a i l ), the capacity of distribution centers ( b j ) and warehouses ( d k ) are all given as input parameter to the model.

The decision variables track the set of customer zones that are dedicated to distribution centers for products ( X i j l ), the set of open distribution centers that are dedicated to open warehouses ( Y j k l ), and the set of distribution centers ( Z J ) and warehouses ( P K ) which are open or closed.

M 1 : Location model

MIN Z 1 = j F j Z j + k V k P K + i j l C i j l X i j l + j k l C j k l Y j k l
(1)

Subject to:

j X i j l = 1 i and l
(2)
k Y j k l = Z j j and l
(3)
i l a i l X i j l b j j
(4)
j l b j Y j k l d k k
(5)
X i j l Z j i , j and l
(6)
Y j k l P k j , k and l
(7)
X i j l { 0 , 1 } i , j and l
(8)
Y j k l { 0 , 1 } j , k and l
(9)
Z j { 0 , 1 } j
(10)
P k { 0 , 1 } k
(11)

In allocation model, ( S i j l ) shows the cost to supply one unit of product L from distribution center J which would be used by customer I, ( S j k l ) is transportation cost for one unit of product L from warehouse K to distribution center J, ( T C k l ) is the transportation cost to deliver one unit of product L to warehouse K and a i l is the demand from customer I for product L. The decision variables are represented by B k l which is the quantity of product L sent to warehouse k, Q j k l is the quantity of product L sent from warehouse k to distribution center J and V i j l is the quantity of product L sent from distribution center J to customer I (Jang et al. 2002; Shu et al. 2010; Tragantalerngsak et al. 2000).

M 2 : Allocation model

MIN Z 2 = k l T C k l B k l + i j l S i j l V i j l + j k l S j k l Q j k l
(12)

Subject to:

B k l = j Q j k l k and l
(13)
k Q j k l = i V i j l j and l
(14)
j V i j l = a i l i and l
(15)
B k l 0 k and l
(16)
V i j l 0 i , j and l
(17)
Q j k l 0 j , k and l
(18)

Backgrounds on methodology

Over the past 40 years, a great number of combinatorial problems have emerged from computer science, engineering (such as the location–allocation problem studied here). It has been proved that these combinatorial problems belong to the class of NP-complete problems (Garey and Johnson 1979) and with increase in number of variables, the time required to find the optimal solution is enhanced exponentially (Garey and Johnson 1979). Thus, solving large scale of these problems to achieve optimality in a reasonable amount of computational time is not possible.

To solve these problems, heuristic algorithms were developed which provide acceptable solutions in proper computational time. One of the major defects of heuristic algorithms is the high risk of being trapped in local optimal solutions. To improve these algorithms, new approaches were presented in 1970s. They were Meta-heuristic algorithms which explore the solution area to find optimality with diversification and intensification search (Talbi 2009).

These algorithms are very efficient and successful in solving complex optimization problems in which classical methods are unable to solve. As a Meta-heuristic algorithm, GA has been used widely as a tool in simulation and exploring the solution area in various contexts. The high applicability and simplicity usage are the main reasons for the widespread use of this algorithm.

Proposed genetic algorithm

Genetic algorithms are based on Darwin’s theory and two main ideas are considered in this theory. First, next generations are created from the evolution of the previous generations. Second, members who are stronger in the struggle for survival have greater living chance. So, GA is an algorithm which utilizes initial population and each member of the population is considered as a solution (Arostegui et al. 2006; Melo et al. 2012).

Representation

The first step to solve a problem by genetic algorithm is defining the structure of representation and the way to convert a chromosome to a solution. Representation is one of the most important subjects that influence the performance of GA. The first application of GAs in distribution problems was carried out by Michalewicz et al. (1991). One known way to represent network problems is tree-based representation. Basically, there are three ways for representing a tree: edge-based encoding, vertex-based encoding and edge-and-vertex based encoding (Gen and Cheng 2000). In this paper, matrix-based representation which belongs to edge-based encoding category is used to represent the solutions (Altiparmak et al. 2009; Drezner and Wesolowsky 2003; Syarif et al. 2002).

For this purpose, the (n × m) binary matrix is used in which n indicates the number of distribution centers and m indicates the number of demand points. There are two constraints in the representation of solutions which associate with assumptions. First, the summation of the columns of the representation matrix should certainly be one which means that the demand of each city should be covered only by a distribution center. Second, the summation of each row of this matrix must not be zero which means that each distribution center should cover the demand of at least one customer. It is noticeable that the number of initial population size be considered 250 with regard to the problem largeness.

The fitness function

Fitness function indicates the quality of the solutions with a numerical value and guides the Searching process to better solutions. The fitness function in this problem consists of three costs. The first cost is external transportation which is defined as the cost of transportation from warehouses to distribution centers. The second is internal transportation cost which is defined as the cost of transportation from distribution centers to customers and the third cost is the inventory holding costs in distribution centers (Talbi 2009).

External transportation cost

The location of each distribution center (DC) is identified based on logistical structure and the latitude and longitude of each demand point. After the calculation of distances between each DC to the Warehouse, the required vehicles for transportation of products are determined with regard to the demand of the cities. Then, the external transportation cost is calculated based on the unit cost of transportation in each path.

Internal transportation cost

According to the specific location of each DC, the distance between DCs and cities is calculated and with regard to the demand of each city, the required vehicles and the internal transportation cost are determined.

Inventory holding costs

The number of distribution centers to cover all demand points is determined by logistical structure of each solution. Based on the financial department documents related to the costs of DCs, a strong linear relationship between the amount of inventory in storage facilities and the available labors in DCs and annual cost of DCs is found. Thus, the required labor and annual cost of each DC is estimated and accordingly the inventory holding cost is calculated.

Genetic operators

In each generation, the chromosomes are evaluated by measuring their related fitness functions. The chromosomes of next generation which are produced through two ways mentioned below are called offspring.

  1. 1-

    Merging two chromosomes using crossover operator

  2. 2-

    Changing chromosomes using mutation operator

Crossover is an operator used to explore new solutions in search space and it is responsible to exchange parts of chromosomes among selected parents. In this study, two-point crossover operator is employed in which these two points is randomly selected among the corresponding parts of parents. It is noteworthy that crossover probability has been considered 0.8 in this operator.

Mutation is another operator that is responsible for modifying genes within a chromosome at low rates and creates random variations in some chromosomes. The main role of mutation is identifying some areas of the search space whose possibility of being explored is low. Also, this operator reduces the probability of trapping populations in local optimal and prevents the premature convergence (Jaramillo et al. 2002). In the proposed algorithm, uniform mutation is employed and mutation probability is considered 0.1 in this operator.

Selection strategy

The most important question that should be answered to determine the selection strategy is which parents should be selected for creating the next generation to move towards a better fitness function. In this paper, roulette wheel method is used to determine the parents. In this method, the selection probability of each chromosome is proportional to the degree they deserve. The reason for using this method is to maintain diversification in the population and provide even low chance of being selected for parents which have low fitness function (Talbi 2009).

Control parameters

This parameter impacts on the stopping criteria of algorithms and can be decreased when the convergence in solution occurs. Also, this parameter can be increased when significant improvement was not observed after a certain number of iterations to enhance the diversification in the population. It is noticeable that the optimal solution is not known in advance in this problem and the stopping criteria are set to achieve a certain number of iterations (Talbi 2009). In this paper, the number of generations is set to 1,000 and the algorithm stops if it has not any improvement over 400 generations.

With regard to the proposed mathematical model and being in the NP-hard category, there is no rational computational time for solving medium and large scale of this problem. To validate the proposed algorithm, some small-scale problems are solved by exact solutions and the answers are compared with the answers of the proposed genetic algorithm. It is noticeable that “Gams 21.7” is the software which is used to solve the mathematical model with exact approach.

The results of solving small-scale problems with exact algorithms demonstrate that they are identical with the answers resulting from the proposed genetic algorithm. Exact algorithms find the optimal solution with exploring all the search space of the problems. This equality indicates that the proposed genetic algorithm can achieve optimality in problems with small size and this demonstrates the validation of the algorithm. Therefore, it can be concluded that the proposed algorithm is appropriate for solving large and medium problems.

The approach to deal with uncertainty

With studying the data of customer demands in the past 40 months, it becomes clear that the demands behavior do not follow any specific probability distribution function. This uncertainty occurs due to the strong bullwhip effect which exists in most of the industries, but it is very impressive in Pharmaceutical industry in Iran. Figure 2 illustrates the behavior of the demands quantity in an instance city in the last 40 months.

Fig. 2
figure 2

Behavior of pharmaceutical product demands in the 40 past months in Tehran

The main reason for the demand fluctuations and the bullwhip effect exacerbation is false orders in periods of drug shortages. Such orders are so common because the customers think that the future supply of drugs and medicines will encounter by reduction. Also, the fluctuation in the price of medicines is a cause to intensify this phenomenon (Simchi-Levi and Kaminsky 2004).

An important step in the process of decision making is risk analysis. If there is a lot of uncertainty in a parameter or there is not enough availability or accessibility to the previous information, the future cannot be predicted accurately. Monte Carlo simulation provides the possibility to distinguish all the consequences of decisions and evaluate the related risk of each factor and helps to make better decisions under uncertain conditions. This technique is used in a wide variety of fields such as insurance, financial management, project management, manufacturing, research development and transportation (Deleris and Erhun 2005; Flores et al. 2006).

Monte Carlo simulation conducts risk analysis based on the outputs with replacing the values of uncertain variables. In this technique, a random set of values between the upper and lower bound for each variable is obtained from its probability distribution function. Monte Carlo simulation can contain even several thousands of calculations with regard to the number of uncertain variables and their variation ranges. Thus, a set of patterns are generated in each scenario which represent one of the possible conditions. The output of the Monte Carlo simulation illustrates the frequency of objective function values which is associated with occurrence of different scenarios (Kalos and Whitlock 2008; Klibi et al. 2010). It is obvious that the higher the numbers of scenarios, the more likely conditions are examined in the available space.

Results and analysis

As was mentioned, to consider the uncertainty in customer demands, some scenarios are created based on Monte Carlo simulation. For each scenario, the optimal number and location of distribution centers and the allocation of customers demand to distribution centers are determined. Figure 3 shows the location of 209 demand points which are the integration of pharmacies, hospitals and health centers in the intended cities with their Geographical specifications (longitude and latitude).

Fig. 3
figure 3

Locations of the demand points

Figure 4 demonstrates the Histogram of the optimal number of distribution centers based on the 400 scenarios of customer demands. In each scenario, the best number of distribution centers is calculated using the proposed genetic algorithm. This figure illustrates that the logistical structures with 9, 10 and 11 distribution centers gain the most frequencies among the structures. In the next step, the optimal logistical structures for candidate solutions are determined. This time a great number of scenarios (10,000 scenarios) for customer demands are created and the average and standard deviation of costs for these three structures are compared and shown in Table 1.

Fig. 4
figure 4

Frequency histogram of the demand scenarios

Table 1 Comparisons of average and standard deviation of the logistical structure costs

The results show that logistical structure with 9 distribution centers has the least coefficient of variation among others. In this paper, the coefficient of variation is mentioned as an index of risk, so it can be concluded that this structure is more stable against uncertainty in customer demands. Figure 5 illustrates the logistical structure with 9 distribution centers which is selected as the best distribution network for this company. It is remarkable that the total cost in current distribution network is 9.97 × 108 monthly and the proposed distribution network leads to 14 % reduction in total cost compared with the current distribution network.

Fig. 5
figure 5

Logistical structure with 9 distribution centers

To prove the significance of the difference between the costs of the structures, ANOVA test is used in SPSS 20.0 software. This test is done to compare the mean cost of these three groups and to study the influence of independent variables on dependent variable.

In this problem, the independent variables include the number of distribution centers in each candidate logistical structure and the dependent variable is the logistical costs associated with the structures. The result of this test is shown in Table 2. As it can be shown, Sig is smaller than 0.05 which means H 0 is rejected and there are significant differences between the average costs of these structures.

Table 2 Result of ANOVA test

Conclusion

In this study, a model of location–allocation problem in a three echelon supply chain with the aim of locating an unknown number of distribution centers to distribute pharmaceutical products from a specific warehouse to customers (pharmacies, hospitals and medical centers in specific cities) is provided. The objective function of this model is to minimize external and internal transportation costs and holding costs of inventory in distribution centers.

On the one hand, increasing the number of distribution centers is a cause to enhance the service levels and reduce the transportation costs from the distribution centers to the customers. On the other hand, decreasing the number of distribution centers leads to inventory integration and take advantage of risk pooling and reduce the inventory holding cost. Thus, the questions which are raised in this paper are how the optimal number of distribution centers should be determined, where distribution centers should be established and which customers must be assigned to each distribution center.

We are witnessing the customer demands uncertainty in pharmaceutical industry and it cannot be considered in any specific probability distribution function. This uncertainty is due to the intense bullwhip effect created by retailers and other factors. To consider the uncertainty in this study, a Set of possible scenarios is created based on the Monte Carlo simulation method and for each scenario, the distribution network has been optimized using genetic algorithm.

According to the frequency of optimal distribution center numbers specified from scenarios, three structures with 9, 10 and 11 distribution centers which have the highest frequency are introduced as candidate solutions. The mean and standard deviation of the costs of structures are compared and the significant difference between them was proved. The results show that logistical structure with 9 distribution centers leads to 14 % reduction in total distribution network cost compared to current network. Moreover, it has fewer coefficient of variation than two others. In this paper, the coefficient of variation is mentioned as an index of risk and with considering the risk of solutions, it can be concluded that this structure is more robust than other structures and in case of happening turbulence in customer demands such as epidemic diseases outbreak in some areas of the country, fewer costs will be imposed to this company distribution network.