Multi-period project portfolio selection under risk considerations and stochastic income

This paper deals with the multi-period project portfolio selection problem, in which the available budget is invested in the best portfolio of projects in each period so that the net profit is maximized. We also consider more realistic assumptions to cover a wider range of applications than previous studies. A novel mathematical model is presented that accounts for risk, stochastic incomes, and the possibility of investing surplus budget in each time period. Due to the complexity of the problem, an effective meta-heuristic hybridized with a local search procedure is developed to solve it. The algorithm is based on the genetic algorithm (GA), a prominent method for this type of problem; it is enhanced by a new solution representation and carefully selected operators, and is hybridized with a local search mechanism to obtain better solutions in less time. The performance of the proposed algorithm is compared with that of well-known algorithms, namely the basic genetic algorithm (GA), particle swarm optimization (PSO) and the electromagnetism-like algorithm (EM-like), by means of several prominent indicators. The computational results show the superiority of the proposed algorithm in terms of accuracy, robustness and computation time. Finally, the proposed algorithm is combined with PSO to improve the computing time considerably.


Introduction
Project portfolio selection (PPS) is one of the most important decision-making problems for most organisations in project management and engineering management (Lean et al. 2012). The problem involves selecting the optimal portfolio of projects from a range of available projects, subject to a number of an enterprise's intrinsic constraints, such as budget and available resources, as well as extrinsic and technical limitations of the real world (Tofighian and Naderi 2015). As defined by the Project Management Institute (PMI) (2008), a portfolio is a collection of projects or programs grouped together to facilitate effective management of work to meet strategic business objectives. The projects or programs of the portfolio may not necessarily be interdependent or directly related.
Recently, PPS has become one of the most active research topics in the fields of economic analysis (e.g., see Tofighian and Naderi 2015; Lee et al. 2006; Wu and Chen 2015), R&D projects (e.g., see Fang et al. 2008; Bhattacharyya et al. 2011; Hassanzadeh et al. 2014), supplier selection (e.g., see Hosseininasab and Ahmadi 2015; Vazhayil and Balasubramanian 2014; Lorca and Prina 2014), etc. Any model for this problem should consider the relations between projects, the uncertainties associated with incomes and risk issues, so that the obtained results are more valid. A retrospect of the literature reveals that numerous studies have taken uncertainty and risk into account through stochastic programming and fuzzy programming. However, the cost dependency between projects is often disregarded.
In this paper, a novel formulation of the PPS problem is developed taking into account cost dependencies as presented in Golmohammadi and Pajoutan (2011). In addition to the existing features, others such as multiple periods and the possible investment of extra budget in financial institutions in each period are also included. The main objective of the proposed model is to maximize the net profit earned from investing the available budget. Due to the complexity of the problem, no analytical method could be established to approximate a global optimal solution in a reasonable length of time. To overcome this challenge, a new genetic-based algorithm with a new solution representation and operators is developed. The performance of the solution approach is then compared with a well-known algorithm, particle swarm optimization (PSO), by means of four prominent indicators, namely the mean gained fitness, standard deviation, relative percentage deviation and run time. The computations performed show that the proposed solution approach can substantially increase the accuracy and robustness of the results.
The rest of the paper is organized as follows: Sect. 2 presents the background and a brief overview of the project portfolio selection problem. In Sect. 3, a mathematical model of the problem is presented. Section 4 describes the proposed algorithm and the solution approaches developed based on three well-known meta-heuristics, namely the genetic algorithm (GA), particle swarm optimization (PSO) and the EM-like algorithm. The calibration of these methods, the computational results and the comparison between them are presented in Sect. 5. Finally, the paper is concluded in Sect. 6.

Literature review
The widespread use of portfolio selection models in real-world situations has led to extensive studies in this area in recent years. For an accurate modelling of the problem, all real-world conditions and features should be considered. Uncertainty and risk are two substantial components of real-world conditions, and hence several research papers consider these two elements, irrespective of the application of the study. For example, Kocadaglı and Keskin (2015) asserted that the risk-return trade-off is the main concern of financial theory and proposed a new portfolio selection model based on fuzzy goal programming techniques that incorporated the risk attitude of the investor and the market trend. Huang (2008) provided a new definition of risk in the field of portfolio selection and developed a new model; a hybrid intelligent algorithm based on GA was also proposed and its effectiveness in solving the model was evaluated. Hosseininasab and Ahmadi (2015) explored one of the newest fields of PPS and proposed a new two-phase supplier selection model that considers risks. Li et al. (2015) developed a fuzzy portfolio selection model considering background risks, which may affect investors' decisions.
In addition to risk and uncertainty, dependencies between projects play an important role in modelling real-world cases. Rebiasz (2013) presented a new method for selecting investment projects in a fuzzy environment and concluded that economic dependencies between projects significantly affect their effectiveness and risk. Mathuria et al. (2015) proposed a new framework aiming to improve the profit-risk trade-off in the portfolio optimization of a power generation company. In another study, Lorca and Prina (2014) used a PPS approach to select the best power producer in a competitive electricity market while considering locational electricity prices and risk management, such that the expected profit of the company was maximised. Vazhayil and Balasubramanian (2014) developed a multi-objective model to select the optimal electricity generation portfolio for India's 12th 5-year plan, taking into account risks and barriers, and used an Intelligent Pareto search Genetic Algorithm (IPGA) to solve the model. Despite all the above-mentioned studies on risk and uncertainty, only a few studies consider cost dependencies between projects. Golmohammadi and Pajoutan (2011) developed a PPS model taking into account dependency between projects, stochastic revenue and risk. Two meta-heuristic algorithms, based on GA and electromagnetism-like (EM-like), were proposed to solve the problem; their performance was compared and the GA was found to perform better than the EM-like algorithm. As noticed from the literature reviewed above, the focus of solution methods has so far been on meta-heuristic algorithms because of the NP-hard nature of the problem. GA and PSO are widely used throughout the literature for solving NP-hard complex problems, and there are good examples of using GA and PSO for PPS problem-solving. Guang-Feng et al.
(2012) presented a PSO algorithm for solving the cardinality-constrained Markowitz portfolio optimization problem. They compared this meta-heuristic with GA, SA (simulated annealing) and TS (tabu search) and showed that PSO performed better in most cases. Zhu et al. (2011) proposed a PSO-based algorithm to solve a non-linear constrained portfolio optimization problem with multiple objective functions. The results from PSO were compared with those obtained from GA and VBA (Virtual Bee Algorithm), and it was shown that the PSO results were comparable with or superior to those of GA and VBA. In light of these studies, applying GA and PSO to the portfolio selection problem and further examining their performance appears promising.

Problem definition
Project portfolio selection is one of the most recent fields of study and research in project management, and project portfolio management is widely applied to utilise resources for maximum profit. To date, several studies have been conducted to bring these models closer to reality, but only a few of them have considered the cost dependency between projects. As mentioned earlier, the formulation of Golmohammadi and Pajoutan (2011) is further extended in this study to achieve the maximum benefit at the end of the last period. In each period, there are a number of available projects and a possibility of bank investment, which can be chosen by the decision-makers (henceforth referred to as DM). In many cases, the balance of the budget (after selecting projects) is less than the minimum financial needs of the unselected projects; thus, the surplus budget can be invested in banks for additional benefit. The net benefit of the first period is considered as the budget of the next period, and this sequence continues to the end of the last period (T). Even though the costs and expected incomes of the projects at the beginning of the first period are already known to the organisation, the value of money changes over time; hence, the time value of money is also considered in this study. In several investments, particularly in R&D project portfolio selection, finding a deterministic value for incomes is very difficult and tends to be inaccurate, so this study considers incomes as independent normally distributed stochastic variables. The costs, however, are considered deterministic, because the resources needed and the consequent costs of each project are specified. In real-world cases, projects commonly have relations among them, such as time relations or monetary/financial relations. In the majority of cases, these relations are synergic and may reduce costs.
For instance, assume that projects i and m are very similar in terms of resources, techniques and some other aspects. If project i is selected in period j, the associated costs of project m in later periods will be much lower, because of the increased level of knowledge and experience; project m will also be more straightforward to perform than project i. For this reason, this paper also considers dependencies and relations between projects that affect costs. Because incomes are stochastic, risk is handled through the chance-constrained approach proposed by Charnes and Cooper (1962).
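As an illustration, under the independence and normality assumptions above, a chance constraint on period revenue admits a standard deterministic equivalent. The following is a sketch in the paper's notation; the exact form of the model's risk constraint appears in Sect. 3, so the specific formulation below is an assumption, not the paper's constraint (2) verbatim:

```latex
% Require that period-j revenue falls below RC_j with probability at most alpha:
%   P( \sum_{i \in N_j} r_{ji} x_{ji} < RC_j ) \le \alpha .
% With independent normal revenues, one standard deterministic equivalent is
\sum_{i \in N_j} \mathrm{E}(r_{ji})\, x_{ji}
  \;-\; \Phi^{-1}(1-\alpha)\,
        \sqrt{\textstyle\sum_{i \in N_j} \sigma_{ji}^{2}\, x_{ji}}
  \;\ge\; RC_j ,
```

where $\Phi^{-1}$ is the standard normal quantile function (note that $x_{ji}^2 = x_{ji}$ for binary variables, so the variance term is linear in $x_{ji}$).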

Characteristics
Some specific assumptions of this study are described below:

1. If a project starts, it must be continued and finished without any pause or break.
2. The whole financing process is self-financing: all finances are inserted into the selected project(s) at once, and no extra money is financed thereafter.
3. The revenues of the projects are independent normally distributed stochastic variables.
4. Investment in banks has a deterministic revenue.
5. Risk is considered in each period.
6. A deterministic budget is considered for all projects.
7. The costs of each project are deterministic and dependent on the next and previous periods.
Regarding these characteristics, in Sects. 3.2 and 3.3, the parameters, indexes and decision variables are introduced and in Sect. 3.4 our mathematical model is presented.

Parameters and indexes

- n: total number of available competitive projects
- N: set of all available projects, N = {1, ..., n}
- T: total number of periods
- j: index for time periods, j ∈ {1, ..., T}
- N_j: set of projects available in period j, N_j ⊆ N
- i, m: indexes for the projects, i, m ∈ N
- z_ji: cost of project i in period j without taking into account the time value of money
- c_ji: cost of project i in period j with respect to the time value of money
- r_ji: revenue of project i in period j with respect to the time value of money
- v_ji: revenue of project i in period j without taking into account the time value of money
- σ_ji: standard deviation of the revenue of project i in period j with respect to the time value of money
- σ̄_ji: standard deviation of the revenue of project i in period j without taking into account the time value of money
- s_j: income from investments in banks in period j
- t_j: amount invested in banks in period j
- s_jim: cost coefficient between dependent project i in period j and project m in period j + 1
- RC_j: minimum acceptable revenue of projects in period j, as accepted by the decision maker
- α: maximum acceptable risk of earning at least RC_j
- b: total available budget
- Rate: annual interest rate
- β: interest rate of investing in the bank

Decision variables

- x_ji: 1 if project i is selected in period j; 0 otherwise
- y_j: 1 if an investment in the bank is made in period j; 0 otherwise

subject to:

x_ji ∈ {0, 1}  ∀ j, i   (10)
y_j ∈ {0, 1}   ∀ j      (11)

Equation (1) represents an objective function consisting of two parts. The first part,

Σ_{j=1}^{T} [ s_j y_j − t_j y_j + Σ_{i ∈ N_j} ( E(r_ji) x_ji − c_ji x_ji ) ],

maximises the net profit from the selected projects and from the investments in banks. The second part includes all costs reduced due to project relations and dependencies. Constraint (2) represents risk: it assures that the probability that the obtained revenue in a period falls below the minimum acceptable revenue of the projects in that period is lower than α. This constraint can be rewritten with respect to the ratio of the project costs in period j, where q represents the minimum rate of return (ROR) accepted by the DM; that is, the DM will not accept the risk of selecting projects whose gained revenue is lower than the cost of the selected projects under the minimum acceptable ROR. Constraint (3) reflects the budget limit of the first period. Since the remainder of the budget after project selection can be entirely invested in banks, the whole available budget is invested and Constraint (3) is formulated as an equality; here t_1 is the difference between the initial budget and the sum of the costs of the projects selected in the first period. Constraint (4) represents the total income generated from bank investments in period j. Constraints (5), (6) and (7) calculate the future values of the costs, expected revenues and standard deviations of revenues for period j. Constraint (8) assures that the costs in period j + 1 are less than the total income of period j, and assigns this income as the budget of the next period. Constraint (9) ensures that a project that is unavailable in a period cannot be selected in that period. Constraints (10) and (11) define the binary decision variables.

Illustrative example
In what follows, a numerical example is provided to demonstrate different aspects of the problem and the presented model. We assume that five projects are available in each period and that there are three periods in the planning horizon (see Fig. 1). The total available budget (b), minimum accepted rate of return (q), interest rate of investing in the bank (β) and annual interest rate (Rate) are 919.5, 10%, 5% and 6%, respectively.
The cost coefficients of dependent projects (s_jim) are presented in Table 1. As shown in Table 1, if project 2 is selected in period 1, the cost of project 4 in period 2 will decrease by 5.92%. Part A of Table 1 shows that the projects available in each period may depend on the projects of the consecutive period, and part B indicates that there is no dependency between the projects within the same period.
This illustrative example was solved with Lingo 10 and the results are shown in Fig. 2. As can be seen, projects 4 and 5 are selected in the first and the second periods, whereas, in the third period, the projects 2, 4 and 5 are selected. Net profit value for this solution is estimated to be 965.89.
Following these results, an examination of the solution was carried out. If projects 4 and 5 are selected in the first period, the sum of their costs is 449 + 329 = 778. With respect to the initial budget of 919.5, the budget constraint is not violated and the remainder, t_1 = 919.5 − 778 = 91.5, is invested in a bank. The risk constraint is also satisfied: with revenue standard deviation √(33² + 20²), the resulting risk probability is 0.012, far below 5%. The net profit of the first period is 1094.075, obtained from the bank investment, 91.5 × (1 + 0.05) = 96.075, and the expected revenue of projects 4 and 5, 597 + 401 = 998. Additionally, the net profit of the first period is considered as the budget of the second period. Taking into account the project dependencies and the time value of money, the sum of the costs of the selected projects is 389 × 1.06 …
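The first-period figures can be checked with a few lines of arithmetic. This is a sketch using the values quoted in the example (the remainder t_1 = 91.5 and the expected revenues 597 and 401 are taken from the text as reported):

```python
# Quick verification of the first-period figures reported in the example.
bank_rate = 0.05
remainder = 91.5                       # t_1 as reported in the example
bank_return = remainder * (1 + bank_rate)
expected_revenue = 597 + 401           # expected revenues of projects 4 and 5

net_profit_period_1 = bank_return + expected_revenue
print(round(bank_return, 3))           # 96.075
print(round(net_profit_period_1, 3))   # 1094.075
```

The result matches the 1094.075 carried forward as the second period's budget.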

Proposed meta-heuristic algorithm
The project portfolio selection problem is non-deterministic polynomial-time hard (NP-hard) (Doerner et al. 2006) and can therefore be tackled with meta-heuristic algorithms. As mentioned earlier, three prominent meta-heuristics in the literature are the genetic algorithm (GA), particle swarm optimization (PSO) and the EM-like algorithm. After describing the proposed algorithm, we calibrate all of these methods and solve the problem with each of them.

Genetic algorithm
Genetic algorithms (GAs) are population-based meta-heuristics widely used to solve combinatorial problems. In the last decade, GAs have been applied to countless problems and have proved to be an effective and robust search method. In this paper, a GA-based approach to solving the PPS model is presented, as shown in Fig. 3. The first population is generated using the generation and evaluation mechanisms discussed later. Each population consists of agents called individuals, represented in GAs by chromosomes; the encoding of these chromosomes is discussed in Sect. 4.1.1. After the individuals are evaluated, they are sorted and the fittest is selected as the global best solution. The main loop then begins. Three types of offspring are generated: two of them from the current population (the parents) by means of the classical GA operators, crossover and mutation. For the sake of diversity, a portion of brand-new offspring is also generated, and to improve current solutions a local search mechanism is used, which is discussed in Sect. 4.1.4. All offspring and parents are evaluated and sorted, and the worst surplus individuals are deleted; this pruned generation becomes the current population. The best solution of this generation is compared with the global best solution, and the better of the two becomes the new global best. This process continues until the termination criterion is met.

Encoding scheme
The key issue in using GAs is encoding a solution into a chromosome, so that solutions are represented in a machine-readable form. A binary matrix of size T × max_j |N_j| is proposed to encode the solutions, where each row represents a period and each column a project. If the element in row j and column i is one, project i is selected in period j. For instance, the solution shown in Fig. 2 is encoded as below:
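As a sketch, the binary-matrix encoding and a decoding step can be expressed as follows. The matrix values reproduce the selection reported for the illustrative example; the `decode` helper is illustrative and not from the paper:

```python
# Rows = periods, columns = projects; a 1 in row j, column i means
# project i+1 is selected in period j+1 (1-based project numbering).
X = [
    [0, 0, 0, 1, 1],  # period 1: projects 4 and 5
    [0, 0, 0, 1, 1],  # period 2: projects 4 and 5
    [0, 1, 0, 1, 1],  # period 3: projects 2, 4 and 5
]

def decode(matrix):
    """Return, for each period, the 1-based indices of the selected projects."""
    return [[i + 1 for i, v in enumerate(row) if v == 1] for row in matrix]

print(decode(X))  # [[4, 5], [4, 5], [2, 4, 5]]
```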

Population generation and evaluation mechanisms
A characteristic of the problem is that the generation mechanism works step by step: the solution for the first period is generated first, then the solution for the second period is generated based on the first period's solution, and so on. In each step, the feasibility of the solution is checked against the budget and risk constraints (Fig. 4). This mechanism is shown in Fig. 5.
Since the budget of each period is calculated alongside the population generation mechanism, the evaluation mechanism becomes straightforward. This mechanism is shown in Fig. 6.
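A minimal sketch of the step-by-step generation, with the budget carried forward as described, might look like the following. The risk check is omitted for brevity, and the per-period project data and the 0.5 selection probability are illustrative assumptions, not the paper's exact procedure:

```python
import random

def generate_solution(projects, budget, bank_rate):
    """projects[j] is a list of (cost, expected_revenue) pairs for period j.
    Builds a budget-feasible selection period by period, investing the
    remainder in the bank and carrying the income forward as next budget."""
    solution, b = [], budget
    for period in projects:
        chosen, spent, exp_income = [], 0.0, 0.0
        order = list(range(len(period)))
        random.shuffle(order)
        for i in order:
            cost, revenue = period[i]
            # random, budget-feasible pick
            if random.random() < 0.5 and spent + cost <= b:
                chosen.append(i)
                spent += cost
                exp_income += revenue
        remainder = b - spent                         # t_j, invested in the bank
        b = exp_income + remainder * (1 + bank_rate)  # budget of the next period
        solution.append(chosen)
    return solution, b  # b doubles as the expected terminal value (fitness proxy)

random.seed(1)
sol, fitness = generate_solution(
    [[(449, 597), (329, 401)], [(400, 520), (300, 380)]],
    budget=919.5, bank_rate=0.05)
```

Because the running budget is updated during generation, evaluating a solution reduces to reading off the terminal value, which mirrors why the paper calls the evaluation mechanism straightforward.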

GA's operators and offspring
As mentioned earlier, in order to generate the new population, the parents and all three types of offspring are used. Type 1 offspring are generated by the crossover operator. Two parents are selected by means of tournament selection (see Sect. 5.2). Then, a random binary matrix of the same size as the parents is generated as a mask matrix: elements where the mask is 0 are taken from the first parent and elements where the mask is 1 from the second. If the resulting solution is not feasible, the mask matrix is re-drawn until a feasible solution is reached. The crossover mechanism is shown in Fig. 7. Type 2 offspring are generated by the mutation operator: a row and a column of the parent matrix are selected, and the corresponding element is flipped (1 to 0 and vice versa). Feasibility is then checked and, if the child is not feasible, another row and column are selected. Figure 8 shows this mechanism.
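The two operators can be sketched as follows. The `feasible` predicate stands in for the model's budget and risk checks and is an assumed helper; the retry limits are ours, since the paper does not state how long the re-drawing continues:

```python
import random

def mask_crossover(p1, p2, feasible, max_tries=100):
    """Mask-matrix crossover: each element comes from parent 1 (mask 0)
    or parent 2 (mask 1); the mask is re-drawn until the child is feasible."""
    T, n = len(p1), len(p1[0])
    for _ in range(max_tries):
        child = [[p1[j][i] if random.random() < 0.5 else p2[j][i]
                  for i in range(n)] for j in range(T)]
        if feasible(child):
            return child
    return p1  # fall back to a parent if no feasible child was found

def mutate(parent, feasible, max_tries=100):
    """Mutation: flip one randomly chosen element, retrying with a new
    position until the mutant is feasible."""
    T, n = len(parent), len(parent[0])
    for _ in range(max_tries):
        child = [row[:] for row in parent]
        j, i = random.randrange(T), random.randrange(n)
        child[j][i] = 1 - child[j][i]
        if feasible(child):
            return child
    return parent
```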
The last types of offspring are completely new. This type of offspring is generated for the sake of diversity and to avoid sticking to the local optimum. This is generated by the means of population generation mechanism, which was explained earlier. The numbers for each one of these types are predetermined and will be discussed in Sect. 5.2.

Local search mechanism
To enhance the accuracy of the solutions found, we apply a local search mechanism; Figure 8 shows its pseudo-code. One of the current solutions is selected by means of tournament selection, which favours better solutions. The mechanism starts from the very first period and tries to add as many projects as it can to the selected ones. First, it builds a blacklist containing the already selected projects. It then checks whether any project can still be added to the period (if all available projects are selected, the blacklist is full and nothing can be added). If more than 50% of the projects are unselected, this is a good sign that the current period can be improved, so the number of tries is drawn as a random number between 50% of the number of projects and the maximum number of tries; otherwise, it is drawn between one and the maximum number of tries (see Sect. 5.2). Afterwards, in the while-loop, an available project is randomly selected and added to the list of selected projects. The mechanism then checks the feasibility of the new solution: if it is feasible, the mechanism updates the solution and the budget and adds the selected project to the blacklist; otherwise, it reverts the change, adds the selected project to the blacklist and increments the try counter by one.
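In code, the mechanism described above might be sketched like this. The `feasible` callback stands in for the budget and risk checks (and the corresponding budget update), and the revert on infeasibility is our reading of the mechanism:

```python
import random

def local_search(solution, feasible, max_tries=10):
    """Try to add unselected projects to each period of a binary-matrix
    solution while it stays feasible, as in the mechanism above."""
    for period in solution:
        n = len(period)
        blacklist = {i for i, v in enumerate(period) if v == 1}
        # More tries when many projects are still unselected.
        lower = max(1, n // 2) if n - len(blacklist) > n // 2 else 1
        tries_budget = random.randint(lower, max(lower, max_tries))
        tries = 0
        while tries < tries_budget and len(blacklist) < n:
            i = random.choice([k for k in range(n) if k not in blacklist])
            period[i] = 1
            if feasible(solution):
                blacklist.add(i)          # keep the addition
            else:
                period[i] = 0             # revert and count a failed try
                blacklist.add(i)
                tries += 1
    return solution

# If every addition is feasible, the search fills each period completely:
print(local_search([[0, 0, 0]], lambda s: True))  # [[1, 1, 1]]
```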

Computational evaluation
This section describes the computational evaluations and compares the proposed algorithm with GA, PSO and EM-like on different test data sets. The algorithms are coded in MATLAB 7.12 and executed on a laptop with a Core i7 processor, 4 GB of RAM and Windows 7.

Test data generation
Although, in reality, problems with more than 15 available projects are rarely faced, large-scale problems are considered in this study to obtain a more accurate assessment of the proposed algorithms. Table 2 shows the problem parameters and the strategy for generating them: the number of available projects is a uniform random number between 5 and 70 and, due to fluctuations in Rate, the number of periods is considered to vary between 3 and 12. Since in real-world situations the revenue of completing a project is usually larger than its cost, r_ji is taken to be up to 1.5 times the cost. The other parameters are calculated from the data in Table 2.

Parameter calibration
Appropriate design of the parameters and operators significantly improves the effectiveness of a meta-heuristic algorithm. In this section, we study the behaviour of the proposed algorithm under different operators and parameters. Among the various DOE methods, the Taguchi approach is one of the most prominent and suitable, as it does not need full factorial trials: orthogonal arrays are used to study numerous decision variables with a limited number of trials. The responses of these trials are converted into the signal-to-noise (S/N) ratio. For a maximisation (larger-the-better) problem, the definition used is

S/N = −10 log₁₀( (1/n) Σ_{i=1}^{n} 1/F_i² ),

where F_i is the mean value of the fitness function and n is the number of trials. In this paper, eight control factors are considered: population number (PN), mutation rate (MR), crossover rate (CR), crossover method (CT), brand-new population rate (NP), local search rate (LS), tries rate (TR) and parent selection method (PS). Table 3 depicts the levels of these factors. The orthogonal array L9 is chosen because it meets all the minimum requirements (Fig. 9); this array is presented in Table 4. Ten problems of different sizes are generated and each experiment is performed three times, so, with the L9 orthogonal array, the total number of runs is 10 × 3 × 9 = 270. After performing the experiments, the fitness values are individually transformed into S/N ratios. Figure 9 shows the average S/N ratio obtained at each level. Based on Fig. 10 and Table 5, the best level for each parameter is set as follows: PN = 30, MR = 0.15, CR = 0.7, NP = 0.2, LS = 0.6, TR = 0.3, CT = mask matrix and PS = tournament selection. To assess the impact of each factor on the performance of the proposed algorithm, the delta test is used; Table 5 shows the levels and delta values obtained for each factor.
The most effective factor is the local search usage rate, while the number of tries has the least impact on the solutions. The calibration details for the GA, EM-like and PSO parameters are not presented here, but, using the same methodology, the best values for each algorithm's factors are shown in Table 6.
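For reference, the larger-the-better S/N ratio used in the calibration can be computed as follows. This is a sketch of the standard Taguchi formula (the paper omits the formula itself, so the exact definition is assumed):

```python
import math

def sn_larger_is_better(responses):
    """Standard Taguchi larger-the-better signal-to-noise ratio:
    S/N = -10 * log10( (1/n) * sum(1 / F_i^2) )."""
    n = len(responses)
    return -10 * math.log10(sum(1 / f ** 2 for f in responses) / n)

# Three identical trial responses of 100 give S/N = -10 * log10(1e-4) = 40 dB.
print(round(sn_larger_is_better([100.0, 100.0, 100.0]), 2))  # 40.0
```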

Results
This section compares the results of the proposed algorithm with those of the three other algorithms. In total, 150 instances are generated (50 for each problem size) and each is solved 10 times. For the sake of brevity, 82 instances (25, 25 and 32 for small, medium and large sizes, respectively) are selected. The results are examined based on four criteria: mean, standard deviation, mean RPD (relative percentage deviation) and run time. The mean indicates how much better an algorithm is in terms of solution quality; the standard deviation identifies the algorithm with the higher precision; and the mean RPD identifies the algorithm whose results are nearest to the best solution found across all algorithms. The RPD is calculated as

RPD_experiment = (z_best across all algorithms − z_experiment) / z_best across all algorithms × 100.

In our case, a lower mean RPD implies a better solution. The termination criterion varies with problem size: each algorithm runs until 50, 40 or 30 consecutive iterations without improvement for small, medium and large instances, respectively, and the run time is calculated without these final iterations. Note that each algorithm is executed 10, 10 and 5 times for small, medium and large instances, respectively. Table 7 summarizes the computational results for each algorithm and each size. It shows the superiority of the proposed algorithm on almost all indicators and instances (except for 1 of 25, 4 of 25 and 2 of 32 instances of the small, medium and large problems, respectively), meaning that the proposed algorithm produces better and more robust results than the others. Note that for large instances PSO achieves a better average CPU time; this is discussed and exploited to improve the proposed algorithm for large sizes in the next subsection.
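The RPD calculation above is a one-liner; for instance, with illustrative objective values:

```python
def rpd(z_experiment, z_best):
    """Relative percentage deviation from the best objective value found
    across all algorithms; lower is better for this maximisation problem."""
    return (z_best - z_experiment) / z_best * 100

# An algorithm reaching 950 against a best-found 1000 deviates by 5%.
print(rpd(950.0, 1000.0))  # 5.0
```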
To examine the significance of this superiority, an ANOVA test is performed. Table 8 shows the ANOVA results for RPD (as the most important indicator) at the 95% confidence level. The p values reveal that the algorithms are significantly different; since the proposed algorithm produces a better RPD, it can be considered superior to the others at the 95% confidence level. Because the PSO results are close to those of the proposed algorithm, another ANOVA test is performed between these two at the 95% confidence level, and the results are shown in Table 9.

Further discussions and improvements
We found that the proposed algorithm is superior in terms of all indicators and performs better than the other algorithms; the only significant merit of PSO in comparison with the proposed algorithm is its computational time on large problems. Table 10 shows the corresponding results.

The results show that if the initial solution is good enough, the proposed algorithm reaches its best solution faster and more efficiently. We therefore use the PSO algorithm to generate the initial solution. To reach the best results, however, an optimal amount of time for generating the initial solution with PSO must be found.
Based on the behaviour of PSO over time (as represented in Fig. 10), we examined the results of running the hybrid algorithm on the large-size problems while giving PSO 60, 70, 80, 90 and 100 percent of its normally needed time. Table 11 shows the results and reveals that the best choice is 90%: generating the initial solutions of the large instances with PSO at 90% of its needed time improves the computation time of the proposed algorithm by almost 40%, which is significant for large instances.
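The resulting hybrid can be sketched as the following control flow. All function names are illustrative stand-ins for the actual solvers; the 0.9 fraction corresponds to the 90% figure found above:

```python
def hybrid(pso_run, ga_run, pso_time, seed_fraction=0.9):
    """Run PSO for a fraction of the time it would normally need, then use
    its final population to seed the proposed GA."""
    seed_population = pso_run(time_limit=seed_fraction * pso_time)
    return ga_run(initial_population=seed_population)

# Control-flow illustration with stub solvers:
best = hybrid(pso_run=lambda time_limit: ["pso_seed"],
              ga_run=lambda initial_population: initial_population + ["ga_refined"],
              pso_time=100.0)
print(best)  # ['pso_seed', 'ga_refined']
```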