Introduction

Portfolio optimization deals with the problem of how to allocate total wealth among a number of assets. The first mathematical model for portfolio selection, formulated by Markowitz (1952, 1959), evaluates investments in terms of their mean and variance. A fundamental assumption of the Markowitz model is that the investor knows the true expected return. In practice, however, investors need to estimate the expected return because investment returns change over time. When historical data on security returns are scarce, as in new security markets, it is difficult to forecast the investment return accurately (Qin et al. 2013). The classical portfolio formulation ignores the estimation error and therefore performs poorly under uncertain conditions. There is thus a need for a portfolio optimization methodology that accounts for data uncertainty by integrating statistical methods and experts’ experience to estimate the future return on investment.

The ability to predict the future return of a business from historical data is the biggest challenge for any investment, as the return is affected by uncertainties from various contributing sources. Sources of uncertainty may be divided into two types: aleatory and epistemic (Oberkampf et al. 2004; Khalaj et al. 2013). Aleatory uncertainty is irreducible. Examples include phenomena that exhibit natural variation, such as operating conditions and material properties. In contrast, epistemic uncertainty results from a lack of knowledge about the system, from approximations in the system behavior models, or from limited or subjective data (e.g., expert opinion); it can be reduced as more information about the system is obtained. Epistemic uncertainty regarding model parameters can be viewed in two ways. It can be defined with reference to a stochastic quantity whose distribution type and/or distribution parameters are not precisely known (Baudrit and Dubois 2006), or with reference to a deterministic quantity whose value is not precisely known (Helton et al. 2004). This paper focuses on the first definition, i.e., epistemic uncertainty with reference to a stochastic quantity. In some cases, distribution information of a random variable may only be available as intervals given by experts. The objective of this paper is to develop an efficient robust portfolio optimization methodology that includes both aleatory uncertainty and epistemic uncertainty described through interval data.

Robustness can be defined as the ability of a system to be insensitive to small departures from the assumptions on which it depends or the degree to which the system operates correctly in the presence of uncertain environmental conditions (Hu et al. 2006; Mehrbod et al. 2015; Hafezalkotob et al. 2015; Vafaeinezhad et al. 2016; Fereiduni and Shahanaghi 2017). The essential elements of robust optimization are: (1) ensuring objective robustness, (2) ensuring feasibility robustness, (3) estimating mean and measure of variation (e.g., variance) of the performance function, and (4) multi-objective optimization. A detailed description of these four components can be found in Zaman et al. (2011a). In this paper, we propose a methodology for robustness-based portfolio optimization to include estimation error in the decision-making framework so that the resulting optimal portfolio is least sensitive to the variations in the input parameters.

There is now an extensive body of methods and applications for portfolio optimization (e.g., Ehrgott et al. 2004; Mulvey 2004; Boyle et al. 2008; Anagnostopoulos and Mamanis 2010; Oliveira et al. 2011; Mansini et al. 2014; Bekiros et al. 2015; Tofighian et al. 2018). While some of the existing methods include robustness in the portfolio optimization framework (e.g., Kawas and Thiele 2008; DeMiguel and Nogales 2009; Delage and Ye 2010; Chen et al. 2011; Zymler et al. 2011), most of the existing methods for portfolio optimization can deal with only aleatory uncertainty. A few methods exist that deal with both aleatory uncertainty and epistemic uncertainty. To deal with epistemic uncertainty, the concept of stochastic dominance has been used in the portfolio choice problem by many authors, including Kuosmanen (2004), Berleant et al. (2008), Dentcheva and Ruszczynski (2010), and Post and Kopa (2013). Berleant et al. (2008) proposed a methodology for portfolio management under epistemic uncertainty using stochastic dominance and information gap theory. Xingyu (2013) developed a robust model based on constant elasticity of variance that considers both input uncertainty and underlying distribution uncertainty using Monte Carlo simulation. Kawas and Thiele (2008) developed a log-robust linear programming problem with theoretical insight into worst-case uncertainty, for which no probabilistic assumption is required. Fuzzy approaches (Carlsson et al. 2007; Abiyev and Menekay 2007; Zhang et al. 2009; Huang 2011) have also been used to represent epistemic uncertainty in robust portfolio models.

Most current methods of robust portfolio optimization under epistemic uncertainty use a probabilistic approach to deal with aleatory uncertainty and a non-probabilistic approach to deal with epistemic uncertainty, which may result in an expensive nested analysis. Also, some of these methods need additional non-probabilistic formulations to incorporate epistemic uncertainty into the robust optimization framework, which may be computationally expensive. However, if the epistemic uncertainty can be converted to a probabilistic format, these additional formulations can be avoided, and well-established probabilistic methods of robust design optimization (e.g., Zaman et al. 2011a; Zaman and Mahadevan 2013) can be used in the framework for robust portfolio optimization. Therefore, there is a need for an efficient robust portfolio optimization methodology that deals with both aleatory uncertainty and epistemic uncertainty.

In this paper, we propose robustness-based portfolio optimization formulations using probabilistic representation of epistemic uncertainty. This paper specifically focuses on epistemic uncertainty arising from interval data. Epistemic uncertainty is represented using two approaches: (1) moment bounding approach (Zaman et al. 2011b) and (2) likelihood-based approach (Zaman and Dey 2017). The main contribution of this paper is to propose a robust portfolio optimization methodology that does not require separate representation for epistemic uncertainty and aleatory uncertainty as it can handle both types of uncertainty using probabilistic format. This paper first proposes a nested robustness-based portfolio optimization formulation using the moment bounding approach-based representation of epistemic uncertainty. The nested robust portfolio formulation is simple to implement; however, the computational cost is often high due to the epistemic analysis performed inside the optimization loop. A decoupled approach is then proposed to un-nest the robustness-based portfolio optimization from the analysis of epistemic variables to achieve computational efficiency. This paper also proposes a single-loop robust portfolio optimization formulation using the likelihood-based representation of epistemic uncertainty that completely separates the epistemic analysis from the portfolio optimization framework and thereby achieves further computational efficiency.

The proposed methodology intends to achieve robustness by simultaneously optimizing the mean (i.e., portfolio return) and minimizing the variation of the performance function (i.e., portfolio risk). Therefore, the performance of robust portfolio can be defined by the mean and variation of the performance function. In our proposed formulations, we obtain the optimum mean value of the objective function (e.g., portfolio return) while also minimizing its variation (e.g., standard deviation). Thus, the optimal portfolio will meet target values in terms of both mean values and standard deviations of the problem parameters. One of the most significant contributions of this paper is to propose new methods to estimate the bounds on the median and semi-variance of multiple-interval data. Therefore, in the proposed methodology, we solve the portfolio optimization formulations by using four different risk–return measures: mean–variance, median–variance, mean–downside risk, and median–downside risk. The proposed robust portfolio optimization formulations are tested on real market data from five S&P 500 companies, and performance of the robust optimization models is discussed empirically based on portfolio return and risk. It is seen that the single-loop robust portfolio optimization formulation generates better optimal solutions than the decoupled approach in terms of both portfolio return and risk. The proposed decoupled formulations are also compared with a nominal mean–variance model, and it is seen that the proposed decoupled formulations generate conservative solutions in the presence of epistemic uncertainty. The main aspects of this paper are summarized as follows. First, we propose approaches to quantify portfolio return and risk under data uncertainty using four different risk–return measures. We then discuss the performance of each risk–return measure empirically.

The remainder of the paper is organized as follows. “Portfolio optimization under aleatory uncertainty” section gives an overview of the basic portfolio optimization. “Proposed methodologies” section describes the proposed robustness-based portfolio optimization methodologies under aleatory uncertainty and epistemic uncertainty. In “Numerical examples” section, we illustrate the proposed methodologies with numerical examples. “Conclusions” section concludes the paper with future work.

Portfolio optimization under aleatory uncertainty

Portfolio optimization

The first mathematical model for portfolio optimization, developed by Markowitz (1952, 1959), was the mean–variance model, in which “portfolio return” is measured by the expected value of the random portfolio return and “portfolio risk” is quantified by the variance of the portfolio return. For a given level of risk, the investor may choose the portfolio with the highest expected return. The classical formulation for maximizing the expected return subject to an upper limit on the variance can be written as follows:

$$\begin{aligned} & \mathop {\hbox{max} }\limits_{{x_{i} }} f = \sum\limits_{i = 1}^{n} {x_{i} r_{i} } \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } \le V \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad x_{i} \ge 0\quad \forall i \\ \end{aligned}$$
(1)

where \(x_{i}\) is the fraction of capital invested in asset \(i,\) \(x_{j}\) is the fraction of capital invested in asset \(j,\) \(r_{i}\) is the expected value of return for asset \(i,\) \(\sigma_{ij}\) is the covariance of the return between assets \(i\) and \(j,\) and \(V\) is the maximum allowable portfolio risk.

Similarly, for a given level of expected return, the investor may choose the portfolio with the minimum risk. The classical formulation for minimizing the variance for a lower limit on expected return can be expressed as follows:

$$\begin{aligned} & \mathop {\hbox{min} }\limits_{{x_{i} }} f = \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {x_{i} r_{i} } \ge R \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad x_{i} \ge 0\quad \forall i \\ \end{aligned}$$
(2)

where R is the minimum portfolio return.
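
As a rough illustration of Eqs. (1) and (2), the following sketch evaluates the portfolio return and variance and approximates Eq. (1) for two assets by scanning a grid of weights. The function names, the grid search, and the numerical inputs are hypothetical choices made for this example only; a practical implementation would use a quadratic programming solver.

```python
def portfolio_return(x, r):
    """Expected portfolio return: sum_i x_i * r_i."""
    return sum(xi * ri for xi, ri in zip(x, r))


def portfolio_variance(x, cov):
    """Portfolio variance: sum_i sum_j x_i * x_j * sigma_ij."""
    n = len(x)
    return sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))


def max_return_grid(r, cov, V, steps=100):
    """Approximate Eq. (1) for two assets by scanning x_1 = k / steps.

    The budget constraint fixes x_2 = 1 - x_1, and x_i >= 0 holds on the grid.
    """
    best_x, best_f = None, float("-inf")
    for k in range(steps + 1):
        x = [k / steps, 1.0 - k / steps]
        if portfolio_variance(x, cov) <= V:  # risk constraint of Eq. (1)
            f = portfolio_return(x, r)
            if f > best_f:
                best_x, best_f = x, f
    return best_x, best_f


# Hypothetical inputs: expected returns and covariance matrix of two assets.
r = [0.10, 0.20]
cov = [[0.04, 0.01], [0.01, 0.09]]
best_x, best_f = max_return_grid(r, cov, V=0.05)
```

With these illustrative inputs the search puts as much capital as the risk limit allows into the higher-return asset. Eq. (2) can be sketched in the same way by swapping the roles of the objective and the constraint.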

Portfolio optimization may use different risk–return measures. Return can be maximized in terms of the mean or the median, and risk can be minimized in terms of variance (Delage and Ye 2010; Chen et al. 2011), lower semi-variance (Boasson et al. 2011), mean absolute deviation (Konno and Yamazaki 1991), etc. As a robust statistic that is not affected by outliers, the sample median has recently been used in portfolio optimization instead of the sample mean (e.g., Benati 2015; Salah et al. 2016). Value-at-risk (Fertis et al. 2012), conditional value-at-risk (Huang et al. 2010), partitioned value-at-risk (Goh et al. 2012), asymmetry-robust value-at-risk (Natarajan et al. 2008), worst-case value-at-risk (Huang et al. 2007), worst-case polyhedral value-at-risk (Zymler et al. 2013), and worst-case quadratic value-at-risk (Zymler et al. 2013) have also been used to minimize the risk in portfolio optimization, but many recent improvements of robust optimization still focus on the conventional mean–variance structure (e.g., DeMiguel and Nogales 2009; Delage and Ye 2010; Chen et al. 2011). The Sharpe ratio is another important measure of risk that is widely used in portfolio management (Ghosh and Mahanti 2009). All these return and risk measures have their own advantages and limitations. A brief discussion of various techniques to estimate the parameters for portfolio optimization is presented next.

Parameter estimation for portfolio model

The Markowitz (1952, 1959) model assumes that parameters such as the mean, variance, and correlation coefficient are known with certainty. In practice, however, these parameters must be estimated from historical data for an uncertain future. Because of estimation error, the resulting portfolio weights fluctuate substantially over time, which may result in an unrealistic and unstable portfolio model. Therefore, the parameters need to be estimated accurately to induce stability in the portfolio model.

There exists a large volume of literature on ex ante portfolio parameter estimation methods. The plug-in approach (Kan and Zhou 2007; Brandt 2009) is a popular and natural choice when both the sample mean and covariance matrix are unknown to the investor, but it fails to include estimation risk (Li 2015). Another approach for portfolio parameter estimation is the resampling method (Bennett 2013; Yu et al. 2013), which improves the efficiency of investment by reducing the estimation error and enhancing the robustness of the classical mean–variance portfolio model. Researchers have recently focused on the minimum variance portfolio, which only estimates the covariance matrix (e.g., Clarke et al. 2011; Mostowfi and Stier 2013; Yang et al. 2015) to minimize the potential sampling error of estimation. James–Stein estimation (Jorion 1986), also known as shrinkage estimation (Jagannathan and Ma 2003; Ledoit and Wolf 2003; Pollak 2012; Stefanovits et al. 2014), is one of the most commonly used estimators of mean and covariance, which generates minimum variance portfolios incorporating significant short-sale positions (Disatnik and Benninga 2007). Nonlinear shrinkage estimation of the sample covariance matrix performs well when the sample size is very large compared to the number of assets (Ledoit and Wolf 2012). A frequently used approach is the Bayesian approach (Markowitz and Usmen 2003; Shi and Irwin 2005; Levy and Levy 2014; Stefanovits et al. 2014), which considers sampling error and uncertainty to estimate parameters under the predictive distribution of asset returns. Maximum likelihood estimation (MLE) is a popular method to estimate the sample mean and covariance, which is widely used for diversification of stock market portfolios (e.g., Pandher 2001; Valadkhani et al. 2008; Făt and Dezsi 2012; Lingaraja et al. 2015). Maximum likelihood estimation constructs an estimator of the unknown distribution parameters (P) from the observed data; it is statistically well understood and least affected by sampling error. Other approaches that reduce sampling error in parameter estimation include the bootstrap method (Hall 1992; Hall and Yao 2003; Mendes and Leal 2010) and the \(\delta\) method (Weisberg 2014). The \(\delta\) method is useful for estimating the parameters when the sample size is large, whereas the bootstrap method is useful when the sample size is very small (Wu et al. 2017).

Most existing methods estimate model parameters by using expected values, which leaves the estimated parameters with estimation error or uncertainty. However, in portfolio optimization, the expected returns and the covariances are also uncertain, which directly affects investment decision making as the solutions to optimization problems show remarkable sensitivity to uncertainty. Robustness-based portfolio optimization takes this uncertainty into account.

Robust portfolio optimization

Unlike classical mean–variance portfolio optimization, where all input parameters are estimated using expected values, robustness-based portfolio optimization takes the input parameter uncertainty into account so that the resulting solution is less sensitive to the variations of the input random variables. The robustness-based portfolio optimization problem under aleatory uncertainty alone can be formulated as follows:

$$\begin{aligned} & \mathop {\hbox{max} }\limits_{{x_{i} }} f\left( x \right) = w \times \sum\limits_{i = 1}^{n} {r_{i} x_{i} } - v \times \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } \le V \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad LB \le x_{i} \le UB \\ \end{aligned}$$
(3)

where \(w \ge 0\) and \(v \ge 0\) are the weighting coefficients that represent the relative importance of the objectives. Sometimes the investors are motivated by others to invest a particular amount of capital in particular assets. Therefore, the fractions of capital invested in different assets have lower and upper bounds. LB and UB are the vectors of lower and upper bounds of decision variables \(x_{i}\).
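
The weighted objective and the constraints of Eq. (3) can be sketched as below. This is only an illustrative reading of the formulation; the function names, the tolerance, and the sample inputs are assumptions introduced for the example.

```python
def robust_objective(x, r, cov, w, v):
    """Objective of Eq. (3): w * (mean return) - v * (variance)."""
    n = len(x)
    mean_term = sum(r[i] * x[i] for i in range(n))
    var_term = sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))
    return w * mean_term - v * var_term


def is_feasible(x, cov, V, LB, UB, tol=1e-9):
    """Check the constraints of Eq. (3): risk limit, budget, and bounds."""
    n = len(x)
    variance = sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))
    within_bounds = all(LB[i] - tol <= x[i] <= UB[i] + tol for i in range(n))
    return variance <= V + tol and abs(sum(x) - 1.0) <= tol and within_bounds


# Hypothetical inputs for two assets.
r = [0.10, 0.20]
cov = [[0.04, 0.01], [0.01, 0.09]]
value = robust_objective([0.5, 0.5], r, cov, w=0.5, v=0.5)
```

Raising w relative to v places more emphasis on expected return; raising v places more emphasis on reducing the variation.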

The implementation of Eq. (3) requires that all \(r_{i} 's\) and \(\sigma_{ij} 's\) be precisely known, which is possible only when a large number of data points are available. In practical situations, only a small number of data points may be available for the input variables. In other cases, information about random input variables may only be specified as intervals, as by expert opinion. This is input data uncertainty (i.e., epistemic uncertainty), causing uncertainty regarding the expected value and covariance of the returns. Robustness-based optimization has to take this into account. In the following section, we propose a new methodology for robustness-based portfolio optimization that accounts for data uncertainty.

Proposed methodologies

In this paper, we propose three formulations: a nested-loop formulation, a decoupled formulation, and a single-loop formulation for robust portfolio optimization under epistemic uncertainty. Nested-loop and decoupled formulations are presented for four different risk–return measures: mean–variance, median–variance, mean–downside risk, and median–downside risk; the single-loop formulation is presented for the mean–variance portfolio measure.

Nested-loop formulation

The inclusion of epistemic uncertainty in robust portfolio model adds another level of complexity in the optimization methodology. The input variables \(r_{i}\) and \(\sigma_{ij}\) in Eq. (3) might have epistemic uncertainty. Since the investment analyst does not have any control on the epistemic variables \(r_{i}\) and \(\sigma_{ij}\), the portfolio optimization methodology has to employ a search among the possible values of such epistemic variables in order to find an optimal solution. The main feature of this nested-loop formulation is that it consists of an outer optimization loop that repeatedly calls the inner optimization loop to get the optimum solution. In the nested portfolio optimization, we maximize the outer-loop objective function at each iteration of the inner-loop optimization. In such case, we get a conservative robust solution. The robustness-based portfolio optimization problem under both aleatory uncertainty and epistemic uncertainty can now be formulated with the following generalized statement, where the objective is to maximize the worst-case cost function (i.e., lower bound of the cost function, which is due to the epistemic uncertainty):

$$\begin{aligned} & \mathop {\hbox{max} }\limits_{{x_{i} }} \left( {\mathop {\hbox{min} }\limits_{{r_{i} ,\sigma_{ij} }} f\left( {x,r,\sigma } \right) = w\sum\limits_{i = 1}^{n} {r_{i} x_{i} - \left( {1 - w} \right)\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } } } \right) \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij} } } \le V \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad r_{l} \le r_{i} \le r_{u} \\ & \quad lb \le \sigma_{ij} \le ub \\ & \quad LB \le x_{i} \le UB \\ \end{aligned}$$
(4)

where \(r_{l}\) and \(r_{u}\) are the vectors of lower and upper bounds of the decision variables r, and lb and ub are the vectors of lower and upper bounds of the decision variables \({\varvec{\upsigma}}\) of the inner-loop optimization problem.

Note that the outer-loop optimization is a portfolio optimization problem, where a robust portfolio optimization is carried out for a fixed set of epistemic variables. The inner-loop optimization is the analysis for the epistemic variables, where the optimizer searches among the possible values of the epistemic variables to calculate the lower bound of the objective function value.
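
The max-min structure of Eq. (4) can be sketched as follows for two assets. Two simplifications are assumed here purely for illustration: the inner search runs over a finite list of candidate (r, σ) scenarios rather than over the continuous bounds, and the variance constraint is omitted. All names and numbers are hypothetical.

```python
def objective(x, r, cov, w):
    """Objective of Eq. (4): w * return - (1 - w) * variance."""
    n = len(x)
    ret = sum(r[i] * x[i] for i in range(n))
    var = sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))
    return w * ret - (1.0 - w) * var


def worst_case(x, scenarios, w):
    """Inner loop: lowest objective value over the candidate scenarios."""
    return min(objective(x, r, cov, w) for r, cov in scenarios)


def nested_search(scenarios, w, steps=100):
    """Outer loop: scan two-asset weights, scoring each by the inner loop."""
    best_x, best_f = None, float("-inf")
    for k in range(steps + 1):
        x = [k / steps, 1.0 - k / steps]
        f = worst_case(x, scenarios, w)  # one inner analysis per outer point
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f


# Hypothetical epistemic scenarios: (expected returns, covariance matrix).
scenarios = [
    ([0.10, 0.20], [[0.04, 0.01], [0.01, 0.09]]),
    ([0.05, 0.10], [[0.05, 0.02], [0.02, 0.12]]),
]
best_x, best_f = nested_search(scenarios, w=0.5)
```

The number of inner analyses grows with the number of outer evaluations, which is the source of the computational cost of the nested formulation.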

The nested formulation is not guaranteed to converge, and even if it converges, it is computationally very expensive. In the nested approach, for every iteration of the epistemic analysis, the portfolio optimization problem under aleatory uncertainty has to be repeated. Therefore, in the following subsection, we propose a decoupled approach to un-nest the portfolio optimization problem from the epistemic analysis and thereby achieve computational efficiency.

Decoupled approach

In this paper, we decouple the nested problem from the analysis of epistemic variables, which can be expressed as:

$$\begin{aligned} & x_{i}^{*} = \mathop {\arg \hbox{max} }\limits_{{x_{i} }} \left( {w\sum\limits_{i = 1}^{n} {r_{i}^{*} x_{i} - \left( {1 - w} \right)\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij}^{*} } } } } \right) \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij}^{*} } } \le V \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad LB \le x_{i} \le UB \\ \end{aligned}$$
(5)
$$\begin{aligned} & r_{i}^{*} ,\sigma_{ij}^{*} = \mathop {\arg \hbox{min} }\limits_{{r_{i} ,\sigma_{ij} }} \left( {w\sum\limits_{i = 1}^{n} {r_{i} x_{i}^{*} - \left( {1 - w} \right)\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i}^{*} x_{j}^{*} \sigma_{ij} } } } } \right) \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i}^{*} x_{j}^{*} \sigma_{ij} } } \le V \\ & \quad r_{l} \le r_{i} \le r_{u} \\ & \quad lb \le \sigma_{ij} \le ub \\ \end{aligned}$$
(6)

The optimization problems in Eqs. (5) and (6) are solved iteratively until convergence. Note that \(r_{i}^{*}\) and \(\sigma_{ij}^{*}\) are fixed quantities in the optimization in Eq. (5), and \(x_{i}^{*}\) and \(x_{j}^{*}\) are fixed quantities in the optimization in Eq. (6).
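
The alternating scheme of Eqs. (5) and (6) can be sketched as below. As a simplifying assumption, the epistemic analysis of Eq. (6) is restricted to a finite list of candidate (r, σ) scenarios, the portfolio problem of Eq. (5) is solved on a two-asset weight grid, and the variance constraint is omitted. The names, the iteration cap, and the inputs are hypothetical.

```python
def objective(x, r, cov, w):
    """Shared objective of Eqs. (5) and (6)."""
    n = len(x)
    ret = sum(r[i] * x[i] for i in range(n))
    var = sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))
    return w * ret - (1.0 - w) * var


def best_weights(r, cov, w, steps=100):
    """Eq. (5): maximize over x with the epistemic variables held fixed."""
    grid = [[k / steps, 1.0 - k / steps] for k in range(steps + 1)]
    return max(grid, key=lambda x: objective(x, r, cov, w))


def worst_scenario(x, scenarios, w):
    """Eq. (6): minimize over epistemic candidates with x held fixed."""
    return min(range(len(scenarios)),
               key=lambda s: objective(x, scenarios[s][0], scenarios[s][1], w))


def decoupled(scenarios, w, start=0, max_iter=20):
    """Alternate Eqs. (5) and (6) until the selected scenario stops changing."""
    s = start
    x = None
    for _ in range(max_iter):
        r, cov = scenarios[s]
        x = best_weights(r, cov, w)
        s_new = worst_scenario(x, scenarios, w)
        if s_new == s:  # fixed point: x and the scenario agree
            break
        s = s_new
    return x, s


# Hypothetical epistemic scenarios: (expected returns, covariance matrix).
scenarios = [
    ([0.10, 0.20], [[0.04, 0.01], [0.01, 0.09]]),
    ([0.05, 0.10], [[0.05, 0.02], [0.02, 0.12]]),
]
x_star, s_star = decoupled(scenarios, w=0.5)
```

Each pass solves one portfolio problem and one epistemic analysis, instead of one epistemic analysis per portfolio evaluation. The iteration cap is a safeguard in this sketch, since an alternating scheme of this kind may cycle in general.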

Robust portfolio models under epistemic uncertainty

This section develops methodologies for robustness-based portfolio optimization under epistemic uncertainty, using the formulations in Eqs. (5) and (6). The proposed nested and decoupled formulations can accommodate four different risk–return measures: mean–variance, median–variance, mean–downside risk, and median–downside risk. The portfolio mean is a popular measure of return because it uses all of the asset return data, whereas the portfolio median is a robust statistic that is not affected by outliers. The variance (or standard deviation) of portfolio returns is a statistical measure of dispersion and one of the best-known measures of risk. Generally, a larger variance indicates greater uncertainty and risk in terms of future returns. However, variance is an imperfect measure of risk: although minimum variance indicates minimum risk, a higher variance does not always indicate higher risk, because variance also increases when the assets earn higher returns. Higher asset returns are always desirable; they are an opportunity, not a risk. Downside risk, also known as lower semi-variance, quantifies only losses and is therefore often regarded as a better measure of risk than variance. When investors minimize variance, they diversify away both risks and opportunities, whereas minimizing semi-variance diversifies away only risks.

In the following subsections, the decoupled formulations are presented for four different portfolio risk–return measures, for the case in which the information on the portfolio returns is available as single- and/or multiple-interval data. Once the bounds on the different return and risk measures are estimated using the methods described below, we can use Eqs. (5) and (6) to solve the robustness-based portfolio optimization problem under epistemic uncertainty.

Robust mean–variance portfolio model

In this paper, we use the mean–variance portfolio measure for both single- and multiple-interval data. In mean–variance portfolio measure, portfolio return is calculated in terms of mean of the asset returns and risk is measured by the variance of asset returns. In this case, in Eqs. (5) and (6), \(r_{i}\) is the mean of return for asset \(i\) and \(\sigma_{ij}\) is the covariance of the return for assets i and j. The mean–variance portfolio optimization formulations require that the first two moments of interval data be estimated as bounds. Zaman et al. (2011b) proposed optimization-based algorithms to calculate the bounds on the moments for both single- and multiple-interval data as shown in Tables 1 and 2, respectively.

Table 1 Methods for calculating moment bounds for single-interval data
Table 2 Methods for calculating moment bounds for multiple-interval data

Since covariance is a monotone function with respect to both variance and correlation coefficient, once the bounds on the moments of interval data are estimated by the methods described above, we can now use the bounds on variance to obtain the bounds on the covariance of asset returns as follows:

$$\left[ {\begin{array}{*{20}c} {\underline{{\sigma_{ij} }} } & {\overline{{\sigma_{ij} }} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\left( {\underline{{r_{ij} }} \times \underline{{\sigma_{i} }} \times \underline{{\sigma_{j} }} } \right)} & {\left( {\overline{{r_{ij} }} \times \overline{{\sigma_{i} }} \times \overline{{\sigma_{j} }} } \right)} \\ \end{array} } \right],$$
(7)

where \(\underline{{r_{ij} }}\) and \(\overline{{r_{ij} }}\) are the lower and upper bounds on the correlation coefficient of the asset returns between asset i and asset j, respectively.
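
Eq. (7) can be written as a small helper, shown below as a sketch. It assumes that \(\sigma_{i}\) denotes the standard deviation (the square root of the variance bound) and that the correlation bounds are nonnegative, so that the product of lower bounds is indeed the smallest value; with a negative lower correlation bound the endpoint combinations would need to be rechecked. The function name and inputs are hypothetical.

```python
import math


def covariance_bounds(var_i, var_j, corr):
    """Bounds on the covariance following Eq. (7).

    var_i, var_j: (lower, upper) bounds on the variances of assets i and j.
    corr: (lower, upper) bounds on their correlation, assumed nonnegative here.
    """
    lower = corr[0] * math.sqrt(var_i[0]) * math.sqrt(var_j[0])
    upper = corr[1] * math.sqrt(var_i[1]) * math.sqrt(var_j[1])
    return lower, upper


# Hypothetical variance and correlation bounds for two assets.
cov_lower, cov_upper = covariance_bounds((0.04, 0.09), (0.01, 0.04), (0.2, 0.5))
```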

Note that in many problems, it is likely that interval data for individual assets are not observed simultaneously. Therefore, it is impractical to calculate the correlation coefficients among the asset returns which are described by interval data. Zaman et al. (2013) assumed that with interval data the correlations among the input variables (i.e., asset returns) are unknown and therefore can range from −1 to 0 or 0 to +1. In this paper, we assume that the correlation between two asset returns is available as bounds [\(\underline{{r_{ij} }}\), \(\overline{{r_{ij} }}\)]. Once the bounds on the expected returns and covariances are obtained using the methods described above, we can now use Eqs. (5) and (6) to solve the robustness-based portfolio optimization problem under epistemic uncertainty represented through single-interval or multiple-interval data.

Robust median–variance portfolio model

In this paper, we use the median–variance portfolio measure for multiple-interval data only. In median–variance portfolio measure, portfolio return is calculated in terms of median of the asset returns and risk is measured by the variance of asset returns. In this case, in Eqs. (5) and (6), \(r_{i}\) is the median of return for asset \(i,\) and \(\sigma_{ij}\) is the covariance of the return for assets \(i\) and \(j.\) The median–variance formulations require that the median and variance be estimated as bounds. We propose a new method to estimate the bounds on the median as follows.

Since the median is a monotone function of each data point, the bounds on the median of multiple-interval data with lower endpoints \(a_{i}\) and upper endpoints \(b_{i}\) can be estimated as:

$$\left[ {\begin{array}{*{20}c} {\underline{M} } & {\overline{M} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\text{Median}}\left( {a_{i} } \right)} & {{\text{Median}}\left( {b_{i} } \right)} \\ \end{array} } \right]$$
(8)

For example, if the returns on investment have the following interval data: [4, 7; 4.5, 8; 5, 8; 2, 4; 8.5, 10], then the bounds on the median are calculated as [4.5, 8] (the medians of the lower and upper endpoints, respectively), whereas the bounds on the mean are [4.8, 7.4], according to Table 2.
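
The median bounds of Eq. (8) for this example can be reproduced with a few lines of code. The helper name and the representation of intervals as (lower, upper) pairs are assumptions made for this sketch.

```python
from statistics import median


def median_bounds(intervals):
    """Eq. (8): medians of the lower endpoints and of the upper endpoints."""
    lowers = [a for a, _ in intervals]
    uppers = [b for _, b in intervals]
    return median(lowers), median(uppers)


# Interval data from the example: each pair is (lower, upper).
intervals = [(4, 7), (4.5, 8), (5, 8), (2, 4), (8.5, 10)]
bounds = median_bounds(intervals)  # (4.5, 8)
```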

Once the bounds on median are estimated using Eq. (8) and bounds on covariance are estimated using Eq. (7), we can now use Eqs. (5) and (6) to solve robustness-based portfolio optimization problem described by multiple-interval data.

Robust mean–downside risk portfolio model

In this paper, we use the mean–downside risk portfolio measure for multiple-interval data only. In the mean–downside risk portfolio, portfolio return is calculated in terms of the mean of asset returns and risk is measured in terms of the lower semi-variance of asset returns. In Eqs. (5) and (6), \(r_{i}\) is the mean of return for asset \(i,\) and \(\sigma_{ij}\) is the semi-covariance of the return for assets \(i\) and \(j.\) The mean–downside risk formulations require that the mean and semi-variance of interval data be calculated as bounds. In the following discussion, we propose a new method to estimate the bounds on the semi-variance.

Lower semi-variance for multiple-interval data, which is always associated with losses, can be estimated as:

$$\begin{aligned} & \mathop {\hbox{min} /\hbox{max} }\limits_{{y_{1} , \ldots ,y_{n} }} \quad \tilde{\sigma } = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left[ {\hbox{min} \left\{ {\left( {y_{i} - \frac{1}{n}\sum\limits_{j = 1}^{n} {y_{j} } } \right),0} \right\}} \right]^{2} } \\ & {\text{s}} . {\text{t}} .\quad \quad \quad lb_{i} \le y_{i} \le ub_{i} \quad i = \left\{ {1, \ldots ,n} \right\} \\ \end{aligned}$$
(9)

For example, if the returns on investment have the following interval data: [4, 7; 4.5, 8; 5, 8; 2, 4; 8.5, 10], then bounds on the lower semi-variance are calculated as [0.5923, 5.0240], whereas bounds on variance can be calculated as [2.0250, 8], using the method given in Table 2.
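
The objective of Eq. (9) and its upper bound for this example can be sketched as follows. The lower semi-variance is a convex function of the data points, so its maximum over the intervals is attained at a combination of interval endpoints; the sketch therefore enumerates those combinations. This enumeration is an illustrative choice for small problems, not necessarily the solver used to obtain the reported values, and the function names are hypothetical. The lower bound requires a continuous optimizer and is not computed here.

```python
from itertools import product


def lower_semivariance(y):
    """Objective of Eq. (9): mean squared shortfall below the sample mean."""
    n = len(y)
    mean = sum(y) / n
    return sum(min(v - mean, 0.0) ** 2 for v in y) / n


def semivariance_upper_bound(intervals):
    """Maximize Eq. (9) by checking every combination of interval endpoints."""
    return max(lower_semivariance(corner) for corner in product(*intervals))


# Interval data from the example: each pair is (lower, upper).
intervals = [(4, 7), (4.5, 8), (5, 8), (2, 4), (8.5, 10)]
upper = semivariance_upper_bound(intervals)  # about 5.024
```

The enumerated maximum agrees with the upper bound of 5.0240 quoted above. For comparison, the feasible point (4.8, 4.8, 5, 4, 8.5) evaluates to about 0.5923, which is consistent with the quoted lower bound.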

Once the bounds on the lower semi-variance are estimated using Eq. (9), we can use Eq. (7) to estimate the bounds on lower semi-covariance; these bounds are then used in Eqs. (5) and (6) to solve robust portfolio optimization problem under epistemic uncertainty.

Robust median–downside risk portfolio model

In this paper, we use the median–downside risk portfolio measure for multiple-interval data only. In this portfolio risk–return measure, portfolio return is calculated in terms of median of asset returns and risk is measured in terms of lower semi-variance of asset returns. In Eqs. (5) and (6), \(r_{i}\) is the median of return for asset \(i,\) and \(\sigma_{ij}\) is the semi-covariance of the return for assets \(i\) and \(j.\) The median–downside risk formulations require that the median and semi-covariance of interval data be calculated as bounds. Bounds on median are estimated using Eq. (8), and the bounds on lower semi-covariance are estimated using Eqs. (7) and (9); these bounds are then used in Eqs. (5) and (6) to solve robust portfolio optimization problem under epistemic uncertainty.

The proposed decoupled approach is computationally more efficient than the nested formulation. However, it is still an iterative approach, in which a portfolio problem and an uncertainty analysis problem for the epistemic variables are solved alternately until convergence. We can achieve further computational efficiency if the uncertainty analysis for the epistemic variables is carried out outside the portfolio optimization framework. In the following subsection, we propose such an efficient single-loop approach for robustness-based portfolio optimization under both aleatory uncertainty and epistemic uncertainty.

Single-loop formulation

In this approach, epistemic uncertainty about portfolio return data is quantified by a maximum likelihood-based approach. Zaman and Dey (2017) recently proposed a worst-case maximum likelihood-based estimation (WMLE) approach to obtain a unique distribution for random variables with interval data so that the double-loop procedure for robust optimization can be eliminated. They proposed a nested optimization formulation to find the worst-case maximum likelihood estimates of the distribution parameters of a random variable described by multiple-interval data; that formulation, however, ignores the correlation among input variables. In this paper, we modify their WMLE approach to include correlated input random variables as follows.

We use a multivariate normal distribution to fit portfolio return data available as multiple intervals. The log-likelihood function for n observations \(Y_{i}\) of the random vector under the multivariate normal distribution is:

$$\log \left( {L\left( {\mathbf{p}} \right)} \right) = \log \left( {L\left( {\mu ,\varSigma } \right)} \right) = - \frac{nM}{2}\log \left( {2\pi } \right) - \frac{n}{2}\log \left( {\det \varSigma } \right) - \frac{1}{2}\sum\limits_{i = 1}^{n} {\left( {Y_{i} - \mu } \right)^{T} \varSigma^{ - 1} \left( {Y_{i} - \mu } \right)}$$
(10)

The mean (µ) and the covariance (\(\varSigma\)) are the parameters of the multivariate normal distribution, where mean (\(\mu\)) is an M-vector and covariance (\(\varSigma\)) is an M × M matrix.
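The log-likelihood in Eq. (10) can be evaluated with a short sketch. The version below uses only the Python standard library and a Cholesky factorization of the covariance matrix, which is an implementation choice made here rather than something stated in the paper. The function names `cholesky` and `mvn_log_likelihood` are hypothetical.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S (S must be symmetric positive definite)."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_log_likelihood(Y, mu, Sigma):
    """Eq. (10): log-likelihood of n observations (rows of Y) of an M-vector."""
    n, m = len(Y), len(mu)
    L = cholesky(Sigma)
    # log(det Sigma) = 2 * sum(log of the diagonal of L)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    quad = 0.0
    for y in Y:
        d = [y[i] - mu[i] for i in range(m)]
        # Forward substitution solves L z = d, so z.z = d^T Sigma^{-1} d.
        z = []
        for i in range(m):
            z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        quad += sum(v * v for v in z)
    return -0.5 * n * m * math.log(2.0 * math.pi) - 0.5 * n * log_det - 0.5 * quad


# Two observations of a two-asset return vector (illustrative numbers only).
print(mvn_log_likelihood([[1.0, 2.0], [3.0, 4.0]], [2.0, 3.0], [[4.0, 0.0], [0.0, 1.0]]))
```

Working with the Cholesky factor avoids forming the inverse of \(\varSigma\) explicitly and rejects covariance matrices that are not positive definite.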

We solve a nested optimization formulation with the objective of the outer optimization problem being maximization of the likelihood function (Eq. (10)) and the objective of the inner optimization problem being minimization of the likelihood with the data points constrained to fall within each of the respective intervals. The maximum likelihood estimation problem under interval uncertainty can now be formulated with the following generalized statement, where the objective is to maximize the worst-case likelihood (i.e., lower bound of the likelihood, which is due to the epistemic uncertainty) (Zaman and Dey 2017):

$$\begin{aligned} & \mathop {\hbox{max} }\limits_{{\mathbf{p}}} \left( {\mathop {\hbox{min} }\limits_{y} \left( {f\left( {{\mathbf{y}}\left| {\mathbf{p}} \right.} \right) = \log \left( {L\left( {{\mathbf{p}};{\mathbf{y}}} \right)} \right)} \right)} \right) \\ & {\text{s}} . {\text{t}} .\quad lb_{i} \le y_{i} \le ub_{i} \quad {\text{for}}\;\;i = 1,2, \ldots ,n \\ \end{aligned}$$
(11)

where the decision variables \(\mathbf{y}\) of the inner-loop optimization problem are the configurations of the multiple-interval data (\(\mathbf{y} = [y_{1}, y_{2}, \ldots, y_{n}]\)), which are constrained to fall within the respective intervals \([lb_{i}, ub_{i}]\). In this formulation, the outer-loop decision variables \(\mathbf{p}\) are the parameters µ and \(\varSigma\) of the multivariate normal distribution. Note that the use of correlated random variables in the WMLE approach requires that an equal number of observations be available for each input random variable.
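To make the max-min structure of Eq. (11) concrete, the sketch below works through a deliberately simplified case: a single (univariate) normal variable rather than the correlated multivariate case treated in this paper. For fixed parameters, the inner minimization places each observation at the interval endpoint farthest from the mean; for a fixed mean, the outer maximization over the variance has a closed form; and the remaining one-dimensional problem in the mean is convex, so a golden-section search is used. The function names are hypothetical, and the code assumes the intervals are not all the same single point.

```python
import math


def worst_case_sum_sq(mu, intervals):
    """Inner problem: the likelihood is lowest when each point sits at the
    interval endpoint farthest from mu, which maximises the squared deviations."""
    return sum(max((lo - mu) ** 2, (hi - mu) ** 2) for lo, hi in intervals)


def wmle_univariate_normal(intervals, tol=1e-10):
    """Outer problem: (mu, variance) maximising the worst-case log-likelihood."""
    n = len(intervals)
    a = min(lo for lo, _ in intervals)
    b = max(hi for _, hi in intervals)
    # worst_case_sum_sq is convex in mu, so golden-section search finds its minimiser.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if worst_case_sum_sq(c, intervals) < worst_case_sum_sq(d, intervals):
            b = d
        else:
            a = c
    mu = (a + b) / 2.0
    # Closed-form maximiser of the normal likelihood in the variance for fixed mu.
    variance = worst_case_sum_sq(mu, intervals) / n
    log_lik = -0.5 * n * math.log(2.0 * math.pi * variance) - 0.5 * n
    return mu, variance, log_lik


mu, variance, log_lik = wmle_univariate_normal([(0.0, 2.0), (2.0, 4.0)])
print(round(mu, 4), round(variance, 4))
```

For the two intervals [0, 2] and [2, 4], the worst-case points are 0 and 4 when the mean is 2, so the sketch returns a mean of about 2 and a variance of about 4. The multivariate case additionally requires optimizing over the covariance matrix, for which a general nonlinear solver would be needed.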

The advantage of the above uncertainty quantification method is that it generates a single PDF for a random variable in the presence of interval uncertainty, which can then be conveniently used in any existing algorithm for optimization under uncertainty. In the following discussion, we propose a new methodology for robustness-based portfolio optimization using the likelihood-based representation of epistemic uncertainty.

Likelihood-based robust portfolio optimization

In the proposed robustness-based approach, the uncertainty analysis of the epistemic variables is done outside the portfolio optimization framework using the WMLE approach. The resulting single-loop formulation is equivalent to a portfolio formulation under aleatory uncertainty alone, which completely eliminates the need for a nested analysis or an epistemic uncertainty analysis within the portfolio optimization framework. Therefore, the proposed robustness-based portfolio optimization methodology can solve the investment problem with only marginally more computational effort than a portfolio formulation under aleatory uncertainty alone; the additional cost comes from computing the worst-case maximum likelihood estimates for the epistemic variables.

The proposed single-loop formulation for the likelihood-based robust portfolio optimization can be expressed as:

$$\begin{aligned} & x_{i}^{*} = \mathop {\arg \hbox{max} }\limits_{{x_{i} }} \left( {w\sum\limits_{i = 1}^{n} {r_{i}^{*} x_{i} - \left( {1 - w} \right)\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij}^{*} } } } } \right) \\ & {\text{s}} . {\text{t}} .\\ & \quad \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{n} {x_{i} x_{j} \sigma_{ij}^{*} } } \le V \\ & \quad \sum\limits_{i = 1}^{n} {x_{i} = 1} \\ & \quad LB \le x_{i} \le UB \\ \end{aligned}$$
(12)

In Eq. (12), the decision variables \(x_{i}\) are the fractions of capital invested in each asset \(i\), and the epistemic variables are kept fixed at \(r_{i}^{*}\) and \(\sigma_{ij}^{*}\). Unlike the nested and decoupled formulations discussed in the “Nested-loop formulation” and “Decoupled approach” sections, respectively, the proposed robustness-based portfolio optimization formulation is solved using the worst-case maximum likelihood estimates of the mean \(r_{i}^{*}\) and covariance \(\sigma_{ij}^{*}\) of the epistemic variables obtained through the likelihood-based approach discussed earlier.

Since the optimization formulation in Eq. (12) is solved with a fixed set of epistemic variables, Eq. (12) is equivalent to a robust portfolio formulation under aleatory uncertainty alone. Unlike the nested formulation, the proposed formulation does not suffer from any convergence issues. The proposed formulation also does not require any epistemic analysis within the portfolio optimization framework.
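The structure of Eq. (12) can be illustrated with a small brute-force sketch. It is not the solver used in this paper (the numerical examples use MATLAB's fmincon); instead it enumerates portfolios on a regular grid, which is only practical for a handful of assets. The function name `solve_single_loop`, the grid step, and the input numbers are all hypothetical.

```python
from itertools import product


def solve_single_loop(r, cov, w, V, lb=0.0, ub=1.0, step=0.01):
    """Grid-search approximation of Eq. (12) with fixed estimates r* and sigma*."""
    n = len(r)
    ticks = round(1.0 / step)
    best_x, best_val = None, float("-inf")
    # Enumerate the first n-1 weights; the last follows from the budget constraint.
    for head in product(range(ticks + 1), repeat=n - 1):
        last = ticks - sum(head)
        if last < 0:
            continue
        x = [k * step for k in head] + [last * step]
        if any(xi < lb - 1e-12 or xi > ub + 1e-12 for xi in x):
            continue  # violates LB <= x_i <= UB
        ret = sum(ri * xi for ri, xi in zip(r, x))
        var = sum(x[i] * x[j] * cov[i][j] for i in range(n) for j in range(n))
        if var > V + 1e-12:
            continue  # violates the variance cap V
        val = w * ret - (1.0 - w) * var
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Illustrative two-asset inputs (not taken from Tables 3 or 4).
r_star = [0.10, 0.05]
cov_star = [[0.04, 0.0], [0.0, 0.01]]
for w in (0.0, 0.5, 1.0):
    print(w, solve_single_loop(r_star, cov_star, w, V=1.0))
```

Sweeping w from 0 to 1, as in the loop above, traces the risk-return trade-off; because \(r_{i}^{*}\) and \(\sigma_{ij}^{*}\) are fixed beforehand, no epistemic analysis is repeated inside the loop.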

In the following section, we illustrate our proposed methodologies for robustness-based portfolio optimization with both single- and multiple-interval data.

Numerical examples

The proposed methodologies are illustrated with two numerical examples. Each example has data from five different S&P 500 companies. Example 1 has point data, single-interval data, and multiple-interval data, where the numbers of observations for point and multiple-interval data are not the same. Example 2 has point data and multiple-interval data, where the numbers of observations are the same.

Example 1

We consider point data for the return of three companies (ZTH, ALTR and MS), single-interval data for one company (ABT), and multiple-interval data for one company (NTAP) as given in Table 3. It is assumed that the correlations of asset four with the first three assets are negative, with correlation coefficients ρ = [−0.9, −0.2], and that the correlations of asset five with the others are positive, with correlation coefficients ρ = [0.2, 0.9].

Table 3 Data of five S&P 500 companies for Example 1

Psychological and emotional effects sometimes influence investment decision making, and investors may be motivated by others to invest a particular amount of wealth in particular assets. Therefore, it is assumed that the investor is biased toward investing at least 20% of the total money in the first asset.

Since this problem contains single-interval data, we cannot solve it using the robust portfolio optimization methods that use the median and lower semi-variance. The MLE-based single-loop portfolio method also cannot be applied to this problem, as the numbers of observations for point and interval data are not the same. Therefore, Example 1 is illustrated for the decoupled mean–variance portfolio model only.

Since the assets ABT and NTAP contain single- and multiple-interval data, respectively, bounds on the first two moments of these return data are estimated by the moment bounding methods as given in Tables 1 and 2, respectively. The bounds on covariance are calculated using Eq. (7). Once the bounds on mean asset return and covariance are estimated, the next step is to solve Eqs. (5) and (6) iteratively until convergence to obtain optimal solutions for the decoupled mean–variance portfolio model.
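A rough sketch of how covariance bounds might follow from interval-valued correlations and variance bounds is given below. Eq. (7) is not reproduced in this section, so the code assumes the standard identity \(\sigma_{ij} = \rho_{ij}\sigma_{i}\sigma_{j}\), which may differ in detail from the paper's formulation. Because this product is linear in each factor separately, its extremes over a box occur at the corners. The function name `covariance_bounds` and the variance values are hypothetical.

```python
import math
from itertools import product


def covariance_bounds(rho_bounds, var_i_bounds, var_j_bounds):
    """Bounds on rho * sigma_i * sigma_j over interval-valued inputs.

    Assumption: covariance = correlation * std_i * std_j (not confirmed
    to be the form of Eq. (7)).
    """
    corners = [
        rho * math.sqrt(vi) * math.sqrt(vj)
        for rho, vi, vj in product(rho_bounds, var_i_bounds, var_j_bounds)
    ]
    return min(corners), max(corners)


# Correlation interval as in Example 1; variance bounds are illustrative only.
print(covariance_bounds((-0.9, -0.2), (4.0, 9.0), (1.0, 4.0)))
```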

The weight parameter w is varied (from 0 to 1), and the optimization formulations in Eqs. (5) and (6) are solved with the MATLAB solver “fmincon.” For each weight, the optimization problems converged in five iterations. To demonstrate the efficiency of the proposed decoupled formulation, we also solve this problem using a nominal mean–variance portfolio model. In the nominal mean–variance model, the midpoints of the multiple-interval data are treated as point data. For single-interval data, the midpoint of the interval is taken as the mean, and the variance is assumed to be 10% of the mean. The solutions from both approaches are presented in Fig. 1.
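The nominal moments described above can be sketched as follows. The text does not state whether the nominal variance of the midpoints uses the divisor n or n − 1; the population form (divisor n) is assumed here. The function names and the sample intervals are hypothetical.

```python
def nominal_multi_interval(intervals):
    """Treat interval midpoints as point data and return (mean, variance).

    Assumption: population variance (divisor n).
    """
    mids = [(lo + hi) / 2.0 for lo, hi in intervals]
    n = len(mids)
    mean = sum(mids) / n
    variance = sum((m - mean) ** 2 for m in mids) / n
    return mean, variance


def nominal_single_interval(lo, hi, variance_ratio=0.10):
    """Midpoint as the mean; variance taken as 10% of the mean, as in the text."""
    mean = (lo + hi) / 2.0
    return mean, variance_ratio * mean


print(nominal_multi_interval([(4, 7), (4.5, 8), (5, 8), (2, 4), (8.5, 10)]))
print(nominal_single_interval(2.0, 4.0))
```

Because the nominal model collapses each interval to a single value, it discards the width of the intervals, which is the epistemic information the decoupled approach retains.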

Fig. 1 Portfolio risk–return for Example 1

Figure 1 shows the solutions of the robust portfolio optimization in the presence of epistemic uncertainty. It is seen from Fig. 1 that for different weights (w), portfolio risk increases with the increase of return. This is a well-known characteristic for any multi-objective optimization problem. A decrease in the risk (i.e., standard deviation) implies that some robustness is achieved in the solutions. Therefore, there is a trade-off between the two objectives, minimizing the risk and maximizing the return of portfolio.

Note that the selection of weights in the presence of both aleatory uncertainty and epistemic uncertainty is not different from the case where only aleatory uncertainty is considered. In both cases, the structure of the final results is exactly the same; we get a list of values for the mean (i.e., return) and standard deviation (i.e., risk) of the performance function corresponding to different weights as shown in Fig. 1. At this stage, the investment analyst is the main driving force of the portfolio optimization problem. The analyst needs to decide how much robustness he or she can afford at the expense of a decreased mean (i.e., less return) of the performance function, and this will serve as a guideline in selecting a combination of weights, which is completely problem dependent.

It is also seen in Fig. 1 that for the same value of the return, the nominal mean–variance portfolio generates smaller values of risk than the decoupled approach. Similarly, for the same value of the risk, the optimal solutions obtained by the nominal mean–variance portfolio have larger values of the return than the decoupled approach. This behavior is intuitive because the decoupled approach yields a conservative robust portfolio solution: it searches among the possible values of the epistemic variables, which is a minimization problem, to find the optimal solution, as discussed in the “Proposed methodologies” section. In contrast, the nominal mean–variance portfolio underestimates input uncertainty (i.e., ignores epistemic uncertainty) and thereby results in an optimistic portfolio.

However, epistemic uncertainty exists in the physical world. The question is how to deal with the investment problem in its presence. In general, a portfolio optimization solution might become infeasible or too optimistic, compared with the true optimal solution, if epistemic uncertainty is left unaccounted for. Also, if epistemic uncertainty exists but is not explicitly considered in the portfolio optimization framework, a considerable difference would be expected between simulated and observed results. Therefore, in the presence of epistemic uncertainty arising from interval data, the proposed robustness-based portfolio optimization methodology generates realistic solutions.

Example 2

In Example 2, we consider point data for the return of three companies (CTL, BBT, and IRM) and multiple-interval data for two companies (VIZ and KMI), where the numbers of observations for point and multiple-interval data are the same, as given in Table 4. The assumptions for correlations between asset returns and the minimum amount to be invested in the first asset are the same as in Example 1.

Table 4 Data of five S&P 500 companies for Example 2

Since this problem does not contain any single-interval data, we solve it by the decoupled approach using four different risk–return measures: mean–variance, median–variance, mean–downside risk, and median–downside risk. In addition, because the numbers of observations for point and multiple-interval data are the same, the MLE-based single-loop portfolio method is also used to solve this problem using the mean–variance portfolio measure.

Since the assets VIZ and KMI contain multiple-interval data, bounds on the first two moments of these return data are estimated by the moment bounding methods as given in Table 2, bounds on the median are estimated using Eq. (8), and bounds on lower semi-variance are estimated using Eq. (9). Equation (7) is then used to obtain bounds on covariance and lower semi-covariance using the bounds on variance and lower semi-variance thus obtained. We then use the decoupled approach to solve the robustness-based portfolio optimization formulations under epistemic uncertainty for the different risk–return measures. The weight parameter w is varied (from 0 to 1), and the optimization formulations in Eqs. (5) and (6) are solved iteratively until convergence to obtain optimal solutions. In each case, the optimization problems converged in five iterations. This problem is also solved by the MLE-based single-loop formulation given in Eq. (12) using the mean–variance risk–return measure. The solutions from both approaches are presented in Fig. 2.

Fig. 2 Portfolio risk–return for Example 2

It is seen from Fig. 2 that different risk–return measures perform better for different weights (w). For lower values of the weight parameter (w), i.e., when the analyst puts more emphasis on minimizing risk than on maximizing return, portfolio models with lower semi-variance, i.e., downside risk, show slightly higher risk values than the variance-based portfolio models. For higher weights, i.e., when the analyst places more emphasis on maximizing return, variance-based portfolio models show significantly higher risk values than the lower semi-variance-based portfolio models. In general, the median–variance and median–downside risk models are better than the mean–variance and mean–downside risk models, respectively, in terms of return maximization. Also, the mean–variance and median–variance models are better than the mean–downside risk and median–downside risk models, respectively, in terms of risk minimization.

It is also seen in Fig. 2 that the single-loop formulation provides better portfolio risk–return values than the decoupled portfolio models. For the same value of the return, the MLE-based single-loop mean–variance portfolio model generates smaller values of risk than the decoupled approach. Similarly, for the same value of the risk, the optimal solutions obtained by the single-loop formulation have much larger values of the return than the decoupled approach. Therefore, the proposed single-loop robust optimization formulation generally outperforms the decoupled formulations in terms of both return and risk.

Conclusions

This paper proposed several formulations for robust portfolio optimization under both aleatory uncertainty and epistemic uncertainty. The proposed formulations specifically deal with epistemic uncertainty for random variables arising from interval data. Epistemic uncertainty is represented using two approaches: (1) moment bounding approach and (2) likelihood-based approach.

First, a nested robust portfolio optimization formulation is proposed in this paper using the moment bounding approach-based representation of epistemic uncertainty, where the optimizer searches among the possible values of the epistemic variables for a conservative solution of the robust portfolio problem. However, the nested formulation does not guarantee convergence, and even if it converges, it is computationally very expensive. Therefore, we propose a decoupled approach to un-nest the robustness-based portfolio optimization from the analysis of the epistemic variables to achieve computational efficiency. The proposed decoupled formulations are presented for four risk–return portfolio measures: classical mean–variance, mean–downside risk, median–variance, and median–downside risk. With numerical experimentation, we show that the median–variance and median–downside risk models are better than the mean–variance and mean–downside risk models, respectively, in terms of return maximization. Also, the mean–variance and median–variance models are better than the mean–downside risk and median–downside risk models, respectively, in terms of risk minimization.

The decoupled approach is computationally more efficient than the nested formulation, but it quantifies uncertainty through iterative analysis. Therefore, a likelihood-based moment estimation method is also proposed for representing uncertainty that completely separates the epistemic analysis from the portfolio optimization framework and thereby achieves further computational efficiency. The proposed likelihood-based approach is general and can estimate the parameters of any known multivariate probability distribution. However, in this paper, we have used the multivariate normal distribution for illustration only. The resulting MLE-based single-loop robust portfolio optimization formulation generates better optimal solutions than the decoupled approach in terms of both portfolio return and risk.

The major advantage of the proposed methodologies is that, unlike the existing methods, they do not require separate representations for aleatory and epistemic uncertainties, as they handle both types of uncertainty in a probabilistic format. The developed methodologies can assist investors in making robust decisions with emphasis on both return and risk. The methods developed in this paper are applicable to any robust optimization problem under epistemic uncertainty that uses moment information or probability distributions.

The proposed uncertainty representation and optimization methodologies are capable of solving risk and revenue management problems efficiently. In a revenue management problem, the decision maker focuses not only on maximizing the expected revenue but also on minimizing the risk of failing to achieve a given target revenue; in this situation, the proposed methodologies can help in making a robust decision.

Finally, an interesting question is how the portfolio decision may change if a linear risk estimator is used instead of variance or semi-variance. This question is left for further research. In the future, the proposed model can be compared with linear robust risk estimators such as the mean absolute deviation and the median absolute deviation.