1 Introduction

Sales and macroeconomic factors are among the driving forces behind stock movements, but there are many others. For example, the subjective views of market participants also have important effects. With the growing popularity of social media over the past decades, people tend to express and exchange their thoughts and opinions rapidly [21]. As a result, the importance of their views has risen dramatically [6]. Stock movements are now considered to be essentially driven by new information and the beliefs of investors [17].

Meanwhile, sentiment analysis has emerged as a new tool for analyzing the opinions shared on social media [7]. It is a branch of affective computing research that aims to classify natural language utterances as positive or negative, and sometimes also neutral [9]. In the financial domain, sentiment analysis is frequently used to obtain a data stream of public mood toward a company, a stock, or the economy. Public mood is the aggregation of individual sentiments, which can be obtained and estimated from various sources, such as stock message boards [2, 19], blogs, newspapers, and really simple syndication (RSS) feeds [34].

Recently, Twitter has become a dominant microblogging platform on which many works rely for their investigations, such as [20, 23, 27]. Many previous studies support the claim that public mood helps to predict the stock market. For instance, a fuzzy neural network model that considers public mood achieves high directional accuracy in predicting the market index, and the mood time series has also been shown to Granger-cause the market index [4]. Si et al. build a topic-based sentiment time series and predict the market index better with a vector autoregression model that interactively links the two series [26]. The Hurst exponents of mood time series extracted from financial news also suggest a long-term dependency, similar to that of many market indices [8].

Despite its important role in stock market prediction, we assume that public mood does not affect the market directly: it does so indirectly, through market participants’ views. The actions taken by market participants as agents depend on their own views and on their knowledge of other agents’ views, and changes in asset prices are the consequences of such actions. These assumptions are very different from econometric research using productivity, equilibrium, and business cycle models [1], but closer to agent-based models [14]. However, the mechanism by which market views are formed from public mood is largely overlooked even in the latter case. An intuitive hypothesis could be: the happier the public mood, the higher the stock price. In the real-world market, however, this relationship is far more complicated. For this reason, existing financial applications of AI that rely on such superficial relationships do not appear convincing to professionals.

In this paper, we attempt to fill this gap by proposing a method for incorporating public mood to form market views computationally. To validate the quality of our views, we simulate the trading performance with a constructed portfolio. The key contributions of this paper can be summarized as follows:

  1. We introduce a stricter and easier-to-compute definition of market views based on a Bayesian asset allocation model. We prove that our definition is compatible and has the same expressiveness as the original form.

  2. We propose a novel online optimization method to estimate the expected returns by solving a temporal maximization problem of portfolio returns.

  3. Our experiments show that the portfolio performance with market views blending the public mood data stream is better than that of directly training a neural trading model without views. This advantage is robust across the different models used to generate market views, provided that their parameters are chosen appropriately.

The remainder of the paper is organized as follows: Sect. 2 explains the concept of Bayesian asset allocation; Sect. 3 describes the methodologies developed for modeling market views; Sect. 4 evaluates these methodologies by running trading simulations with various experimental settings, and Sect. 5 shows the interpretability of our model with an example; finally, Sect. 6 concludes the paper and describes future work.

2 Bayesian Asset Allocation

The portfolio construction framework [18] has been a prevalent model for investment for more than half a century. Given an amount of initial capital, the investor needs to allocate it to different assets. Based on the idea of trading off asset returns against the risk taken by the investor, the mean-variance method proposes the following condition for an efficient portfolio [18, 29]:

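$$\begin{aligned} \max _{w}\; \sum _{i} w_i \mu _i - \frac{\delta }{2} \sum _{i} \sum _{j} w_i w_j \sigma _{ij} \end{aligned}$$
(1)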

where \(\delta \) is an indicator of risk aversion, \(w_i\) denotes the weight of the corresponding asset in the portfolio, \(\mu _i\) denotes the expected return of asset i, and \(\sigma _{ij}\) is the covariance between the returns of assets i and j. The optimized weights of an efficient portfolio are therefore given by the first-order condition of Eq. 1:

$$\begin{aligned} w^{*} = (\delta \varSigma )^{-1} \mu \end{aligned}$$
(2)

where \(\varSigma \) is the covariance matrix of asset returns and \(\mu \) is the vector of expected returns \(\mu _i\). At the risk level of holding \(w^{*}\), the efficient portfolio achieves the maximum combined expected return.
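A minimal NumPy sketch of Eq. 2, assuming the mean-variance objective takes the standard form \(w^\intercal \mu - \frac{\delta }{2} w^\intercal \varSigma w\); the two-asset inputs below are hypothetical and not taken from the experiments. It solves the linear system \(\delta \varSigma w = \mu \) instead of forming the inverse, and checks that the gradient of the objective vanishes at \(w^{*}\).

```python
import numpy as np

def mean_variance_weights(mu: np.ndarray, sigma: np.ndarray, delta: float) -> np.ndarray:
    """Unconstrained efficient weights w* = (delta * Sigma)^{-1} mu (Eq. 2)."""
    # Solving the linear system is numerically safer than forming the inverse.
    return np.linalg.solve(delta * sigma, mu)

# Hypothetical inputs: expected returns and covariance of two assets.
mu = np.array([0.06, 0.04])
sigma = np.array([[0.040, 0.006],
                  [0.006, 0.020]])
delta = 2.5

w_star = mean_variance_weights(mu, sigma, delta)

# First-order condition of w'mu - (delta/2) w'Sigma w:
# the gradient mu - delta * Sigma w must vanish at w*.
gradient = mu - delta * sigma @ w_star
print(w_star, np.allclose(gradient, 0.0))
```

These unconstrained weights need not sum to one or be nonnegative; budget or no-short-selling constraints would require a constrained solver instead of a single linear solve.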

However, several problems arise when this mean-variance approach is applied in real-world cases. For example, the first two moments of asset returns are difficult to estimate accurately [25], as returns are non-stationary time series. The situation is worsened by the fact that the Markowitz model is very sensitive to the estimated returns and volatility used as inputs: a small error in \(\mu \) or \(\varSigma \) can lead to very different optimized weights. To address this limitation of the Markowitz model, Black and Litterman [3] proposed a Bayesian approach that integrates additional information from the investor’s judgment and the market fundamentals. In the Black-Litterman model, the expected returns \(\mu _{BL}\) of a portfolio are inferred from two antecedents: the equilibrium risk premiums \(\varPi \) of the market, as calculated by the capital asset pricing model (CAPM), and a set of the investor’s views on expected returns.
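To illustrate this sensitivity, the following sketch uses hypothetical numbers for two assets whose returns have correlation 0.99, perturbs their expected returns by only 0.1 percentage point, and recomputes the weights of Eq. 2.

```python
import numpy as np

def mean_variance_weights(mu: np.ndarray, sigma: np.ndarray, delta: float) -> np.ndarray:
    """Unconstrained efficient weights w* = (delta * Sigma)^{-1} mu (Eq. 2)."""
    return np.linalg.solve(delta * sigma, mu)

# Two hypothetical, highly correlated assets (correlation 0.99).
sigma = np.array([[0.0400, 0.0396],
                  [0.0396, 0.0400]])
delta = 2.5

mu_a = np.array([0.050, 0.050])   # baseline expected returns
mu_b = np.array([0.051, 0.049])   # perturbed by only 0.1 percentage point

w_a = mean_variance_weights(mu_a, sigma, delta)
w_b = mean_variance_weights(mu_b, sigma, delta)
print(w_a.round(4), w_b.round(4))
```

With these inputs the weights move from about 0.25 per asset to about 1.25 and -0.75: because the two assets are nearly collinear, the tiny return difference is amplified by the small eigenvalue of \(\varSigma \) along the direction (1, -1).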

The Black-Litterman model assumes that the equilibrium returns are normally distributed as \(r_{eq}\sim \mathcal {N}(\varPi , \tau \varSigma )\), where \(\varSigma \) is the covariance matrix of asset returns, \(\tau \) is an indicator of the confidence level of the CAPM estimation of \(\varPi \). The market views on the expected returns held by an investor agent are also normally distributed as \(r_{views}\sim \mathcal {N}(Q, \varOmega )\).

Consequently, the posterior distribution of the portfolio returns given the views is also Gaussian. If we denote this distribution by \(r_{BL}\sim \mathcal {N}(\bar{\mu }, \bar{\varSigma })\), then \(\bar{\mu }\) and \(\bar{\varSigma }\) are functions of the aforementioned variables (see Fig. 1).

$$\begin{aligned} \left[ \bar{\mu }, \bar{\varSigma }\right] = f (\tau , \varSigma , \varOmega , \varPi , Q) \end{aligned}$$
(3)
Fig. 1. The posterior distribution of the expected returns in the Black-Litterman model, whose mean lies between those of the two prior distributions and whose variance is smaller than either of them.

The function can be derived by applying Bayes’ theorem to the probability density function of the posterior expected returns:

$$\begin{aligned} pdf(\bar{\mu }) = \frac{pdf(\bar{\mu }|\varPi )\ pdf(\varPi )}{pdf(\varPi |\bar{\mu })} \end{aligned}$$
(4)

Then, the optimized Bayesian portfolio weights have a form similar to Eq. 2, with \(\varSigma \) and \(\mu \) substituted by \(\bar{\varSigma }\) and \(\bar{\mu }\):

$$\begin{aligned} w^{*}_{BL} = (\delta \bar{\varSigma })^{-1} \bar{\mu }. \end{aligned}$$
(5)

The most common criticism of the Black-Litterman model is the subjectivity of the investor’s views. In other words, the model relies on the good quality of the market views, while it leaves unanswered the question of how to actually form these views. In Sect. 3, we investigate the possibility of automatically formalizing market views from public mood distilled from the Web and from the maximization of portfolio returns for each time period.

3 Methodologies

3.1 Modeling Market Views

The Black-Litterman model defines a view as a statement that the expected return of a portfolio has a normal distribution with mean equal to q and standard deviation \(\omega \). This hypothetical portfolio is called a view portfolio [13]. In practice, we are especially interested in two intuitive types of views on the market, termed relative views and absolute views. Next, we introduce the formalization of these two types of views.

Because the standard deviation \(\omega \) can be interpreted as the confidence in the expected return of the view portfolio, a relative view takes the form “I have \(\omega _1\) confidence that asset x will outperform asset y by \(a\%\) (in terms of expected return)”; an absolute view takes the form “I have \(\omega _2\) confidence that asset z will outperform the (whole) market by \(b\%\)”. Consequently, for a portfolio consisting of n assets, a set of k views can be represented by three matrices \(P_{k,n}\), \(Q_{k,1}\), and \(\varOmega _{k,k}\).

\(P_{k,n}\) indicates the assets mentioned in the views. The sum of each row of \(P_{k,n}\) should be either 0 (for relative views) or 1 (for absolute views); \(Q_{k,1}\) is a vector comprising the expected return of each view. Mathematically, the confidence matrix \(\varOmega _{k,k}\) is a measure of covariance between the views. The Black-Litterman model assumes that the views are independent of each other, so the confidence matrix can be written as \(\varOmega =diag(\omega _1, \omega _2, \dots , \omega _k)\). In fact, this assumption does not affect the expressiveness of the views as long as the k views are compatible (not self-contradictory). This is because, when \(\varOmega _{k,k}\) is not diagonal, we can always perform a spectral decomposition \(\varOmega =V\varOmega ^{\varLambda }V^{-1}\), where \(\varOmega ^{\varLambda }\) is diagonal, and write the new mentioning and new expected return matrices as \(P^{\varLambda }=V^{-1}P\) and \(Q^{\varLambda }=V^{-1}Q\). Under these constructions, we introduce two important properties of the view matrices in Theorems 1 and 2.
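The diagonalization argument can be checked numerically. The sketch below assumes \(\varOmega \) is symmetric positive definite, as a covariance matrix is, so that `np.linalg.eigh` returns an orthonormal V with \(V^{-1}=V^\intercal \); the views and their covariance are hypothetical. It verifies that the rotated confidence matrix is diagonal and that the Gaussian view likelihood term is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical set of k = 2 views on n = 3 assets with correlated view errors.
P = np.array([[1.0, -1.0, 0.0],    # relative view: asset 1 vs asset 2
              [0.0,  0.0, 1.0]])   # absolute view on asset 3
Q = np.array([0.02, 0.05])
Omega = np.array([[0.0010, 0.0004],
                  [0.0004, 0.0020]])   # non-diagonal view covariance

# Omega is symmetric, so eigh returns an orthonormal V with V^{-1} = V^T.
lam, V = np.linalg.eigh(Omega)
P_new = V.T @ P
Q_new = V.T @ Q
Omega_new = V.T @ Omega @ V           # diagonal up to rounding, equal to diag(lam)

# The quadratic form (Pr - Q)' Omega^{-1} (Pr - Q) is unchanged for any returns r.
r = rng.normal(size=3)
old = (P @ r - Q) @ np.linalg.solve(Omega, P @ r - Q)
new = (P_new @ r - Q_new) @ ((P_new @ r - Q_new) / lam)
print(np.allclose(Omega_new, np.diag(lam)), np.isclose(old, new))
```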

Theorem 1

(Compatibility of Independent Views). Any set of independent views is compatible.

Proof

Compatible views are views that can hold at the same time. For example, {asset x will outperform asset y by \(3\%\), asset y will outperform asset z by \(5\%\), asset x will outperform asset z by \(8\%\)} is compatible. However, if we change the third view to “asset z will outperform asset x by \(8\%\)”, the view set becomes self-contradictory. Because the third view is actually a deduction from the first two, the view set is called “not independent”.

Assume there is a pair of incompatible views \(\{p, q\}\) and \(\{p, q^{\prime }\}\), \(q\ne q^{\prime }\). Both views are either explicitly stated or can be derived from a set of k views. Hence, there exist two different linear combinations, such that:

$$\begin{aligned}&\sum \limits _{i=1}^{k} a_i p_i = p \qquad \sum \limits _{i=1}^{k} a_i q_i = q \\&\sum \limits _{i=1}^{k} b_i p_i = p \qquad \sum \limits _{i=1}^{k} b_i q_i = q^{\prime } \end{aligned}$$

where the coefficients \((a_i-b_i)\) are not all zero, since \(\sum \limits _{i=1}^{k} (a_i-b_i) q_i = q-q^{\prime }\ne 0\).

Thus, we have \(\sum \limits _{i=1}^{k} (a_i-b_i) p_i =\mathbf{0}\), which means that matrix P is rank deficient and the k views are not independent. By contraposition, the statement “all independent view sets are compatible” is true.    \(\square \)
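The notions in Theorem 1 can also be checked numerically. The sketch below uses the standard Rouché–Capelli criterion (the system \(Pr = Q\) has a solution r iff \(\mathrm {rank}(P)=\mathrm {rank}([P\,|\,Q])\)); this rank test is our illustration, not a procedure stated in the paper. The view sets are the x, y, z examples from the proof, and the helper names are hypothetical.

```python
import numpy as np

def views_compatible(P: np.ndarray, Q: np.ndarray) -> bool:
    """Views {P, Q} can hold simultaneously iff rank(P) == rank([P | Q])."""
    augmented = np.column_stack([P, Q])
    return np.linalg.matrix_rank(P) == np.linalg.matrix_rank(augmented)

def views_independent(P: np.ndarray) -> bool:
    """Views are independent iff the k rows of P are linearly independent."""
    return np.linalg.matrix_rank(P) == P.shape[0]

# Assets (x, y, z).
P = np.array([[1.0, -1.0,  0.0],   # x outperforms y
              [0.0,  1.0, -1.0],   # y outperforms z
              [1.0,  0.0, -1.0]])  # x outperforms z
Q_ok = np.array([0.03, 0.05, 0.08])

P_bad = P.copy()
P_bad[2] = [-1.0, 0.0, 1.0]            # z outperforms x ...
Q_bad = np.array([0.03, 0.05, 0.08])   # ... by 8%

print(views_independent(P), views_compatible(P, Q_ok), views_compatible(P_bad, Q_bad))
```

Dropping the deduced third view leaves two independent views, and, as Theorem 1 states, these are compatible for any Q: when the k rows of P are independent, \(\mathrm {rank}([P\,|\,Q])\) cannot exceed k.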

Theorem 2

(Universality of Absolute View Matrix). Any set of independent relative and absolute views can be expressed with a non-singular absolute view matrix.

Proof

Assume a matrix P with r relative views and \((k-r)\) absolute views.

$$\begin{aligned} P_{k,n} = \begin{bmatrix} p_{1,1}&p_{1,2}&\cdots&p_{1,n} \\ \vdots&\vdots&\ddots&\vdots \\ p_{r,1}&p_{r,2}&\cdots&p_{r,n} \\ \vdots&\vdots&\ddots&\vdots \\ p_{k,1}&p_{k,2}&\cdots&p_{k,n} \end{bmatrix} \end{aligned}$$

The corresponding return vector is \(Q=(q_1, q_2, \dots , q_k)\), the capital weight vector for assets is \(w=(w_1, w_2, \dots , w_k)\). Hence, we can write \((r+1)\) equations with regard to r new variables \(\{q_1^{\prime }, q_2^{\prime }, \dots , q_r^{\prime }\}\), where \(j=1, 2, \dots , r\):

$$\begin{aligned}&1+ q_j^{\prime }=\sum \limits _{i\ne j}^{r}(1+q_i^{\prime }) \frac{w_i}{\sum \limits _{s\ne j} w_s} (1+q_j) \\&\quad \sum \limits _{i=1}^{r}q_i^{\prime }w_i+\sum \limits _{i=r+1}^{k}q_i w_i=Qw^\intercal \end{aligned}$$

If we consider \(\{asset_{r+1},\dots , asset_k\}\) to be one asset, the return of this asset is decided by \(P_{r,n}\). Hence, only r of the \((r+1)\) equations above are independent.

According to Cramer’s rule, there exists a unique solution \(Q^{\prime }=(q_1^{\prime }, q_2^{\prime }, \dots , q_r^{\prime }, q_{r+1}, \dots , q_k)\) to the aforementioned \((r+1)\) equations, such that the view matrices \(\{P^{\prime },\, Q^{\prime }\}\) are equivalent to the view matrices \(\{P,\, Q\}\) for all the assets considered, where

$$\begin{aligned} P^{\prime }_{k,n} = \begin{bmatrix} 1&0&\cdots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&p_{r,r}=1&\cdots&0 \\ \vdots&\vdots&\ddots&\vdots \\ p_{k,1}&p_{k,2}&\cdots&p_{k,n} \end{bmatrix}. \end{aligned}$$

Now, \(P^{\prime }_{k, n}\) only consists of absolute views. By deleting those dependent views, we can have a non-singular matrix that only consists of absolute views and is compatible.    \(\square \)

Given Theorems 1 and 2, without loss of generality, we can use the following equivalent yet stricter definition of market views to reduce computational complexity.

Definition 1

Market views on n assets can be represented by three matrices \(P_{n,n}\), \(Q_{n,1}\), and \(\varOmega _{n,n}\), where \(P_{n,n}\) is an identity matrix; \(Q_{n,1} \in \mathbb {R}^n\); \(\varOmega _{n,n}\) is a nonnegative diagonal matrix.

3.2 The Confidence Matrix

In the original form of the Black-Litterman model, the confidence matrix \(\varOmega \) is set manually according to investors’ experience. In the numerical example given by [13], however, the confidence matrix is derived from the equilibrium covariance matrix:

$$\begin{aligned} \hat{\varOmega }_0 = diag(P(\tau \varSigma )P^{\prime }) \end{aligned}$$
(8)

This is because \(P(\tau \varSigma )P^{\prime }\) can also be understood as a covariance matrix of the expected returns in the views. Our definition makes this estimation easier to understand: because P is an identity matrix, \(P(\tau \varSigma )P^{\prime }\) reduces to \(\tau \varSigma \), and \(\hat{\varOmega }_0\) keeps only its diagonal, i.e., the scaled variances \(\tau \sigma _{ii}\) of the individual assets. The underlying assumption is that the variance of an absolute view on asset i is proportional to the return variance of asset i. In this case, the estimation of \(\varOmega \) utilizes past information on asset price volatilities.
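Under Definition 1, Eq. 8 reduces to scaling the individual return variances by \(\tau \). A short NumPy check, with a hypothetical three-asset covariance matrix and the \(\tau = 0.05\) used in Sect. 4.2:

```python
import numpy as np

tau = 0.05
# Hypothetical covariance matrix of three assets' returns.
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.020, 0.004],
                  [0.010, 0.004, 0.030]])
P = np.eye(3)  # Definition 1: one absolute view per asset

# Eq. 8: keep only the diagonal of P (tau Sigma) P'.
Omega0 = np.diag(np.diag(P @ (tau * Sigma) @ P.T))

# With P = I this is just tau times the individual return variances.
print(np.allclose(Omega0, tau * np.diag(np.diag(Sigma))))
```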

3.3 Optimal Market Views

We obtain the optimal market views \(\{P, Q, \varOmega \}\) in a hybrid way: we first adopt the confidence matrix \(\hat{\varOmega }_0\), and then derive Q from the inverse optimization problem of the Black-Litterman model.

We start from the optimal portfolio weights that maximize the portfolio returns for each period t. Without short selling and transaction fees, one should reinvest the whole capital daily in the asset that grows fastest over the next time period.

The optimal holding weights for each time period t thus take the form of a one-hot vector, where \(\oslash \) and \(\odot \) denote element-wise division and product, and the maximization is over nonnegative weights that sum to one:

$$\begin{aligned} w_t^{*}=\mathop {\mathrm {argmax}}\limits _{w_t}\;\; \mathbf {1}^{\intercal }\left( w_t \oslash price_{t} \odot price_{t+1}\right) \end{aligned}$$
(9)
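Because the objective in Eq. 9 is linear in \(w_t\) and the feasible weights (no short selling, full reinvestment) form a simplex, the maximum is attained at a vertex, i.e., a one-hot vector. A minimal sketch with hypothetical prices for five assets; breaking ties by the first index is our choice.

```python
import numpy as np

def optimal_one_hot_weights(price_t: np.ndarray, price_next: np.ndarray) -> np.ndarray:
    """Hindsight-optimal long-only weights for one period (Eq. 9)."""
    growth = price_next / price_t          # gross return of each asset
    w = np.zeros_like(price_t, dtype=float)
    w[np.argmax(growth)] = 1.0             # all capital in the fastest-growing asset
    return w

# Hypothetical closing prices of five assets on days t and t+1.
p_t = np.array([150.0, 180.0, 34.0, 38.0, 60.0])
p_next = np.array([151.5, 179.0, 34.2, 39.9, 60.3])
w_opt = optimal_one_hot_weights(p_t, p_next)
print(w_opt)
```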

Letting this \(w_t^{*}\) be the solution to Eq. 1, with the Black-Litterman posterior moments in place of \(\mu \) and \(\varSigma \) (cf. Eq. 5), we have:

$$\begin{aligned} w_t^{*} = (\delta \bar{\varSigma }_t)^{-1} \bar{\mu }_t \end{aligned}$$
(10)

where the Black-Litterman model gives (footnote 1):

$$\begin{aligned}&\bar{\varSigma }_t=\varSigma _t+[(\tau \varSigma _t)^{-1}+P^{\prime }\hat{\varOmega }^{-1}_tP]^{-1} \end{aligned}$$
(11)
$$\begin{aligned}&\bar{\mu }_t= [(\tau \varSigma _t)^{-1}+P^{\prime }\hat{\varOmega }^{-1}_tP]^{-1}[(\tau \varSigma _t)^{-1}\varPi _t+P^{\prime }\hat{\varOmega }^{-1}_t Q_t] \end{aligned}$$
(12)
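A minimal NumPy sketch of Eqs. 11 and 12 and the resulting weights of Eq. 10, with \(P = I\) as in Definition 1; all inputs are hypothetical, and \(\delta \), \(\tau \) follow Sect. 4.2. The two limiting cases serve as sanity checks: with very uncertain views (large \(\varOmega \)) the posterior mean approaches the equilibrium returns \(\varPi \), and with very confident views (small \(\varOmega \)) it approaches Q.

```python
import numpy as np

def black_litterman_posterior(Sigma, Pi, P, Omega, Q, tau):
    """Posterior covariance (Eq. 11) and mean (Eq. 12) of asset returns."""
    prior_prec = np.linalg.inv(tau * Sigma)                    # (tau Sigma)^{-1}
    post_prec = prior_prec + P.T @ np.linalg.solve(Omega, P)   # + P' Omega^{-1} P
    M = np.linalg.inv(post_prec)
    mu_bar = M @ (prior_prec @ Pi + P.T @ np.linalg.solve(Omega, Q))
    Sigma_bar = Sigma + M
    return Sigma_bar, mu_bar

def bl_weights(Sigma_bar, mu_bar, delta):
    """Eq. 10: w = (delta Sigma_bar)^{-1} mu_bar."""
    return np.linalg.solve(delta * Sigma_bar, mu_bar)

# Hypothetical inputs for three assets.
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.020, 0.004],
                  [0.010, 0.004, 0.030]])
Pi = np.array([0.05, 0.03, 0.04])     # equilibrium returns
Q = np.array([0.08, 0.01, 0.04])      # absolute views, one per asset
P = np.eye(3)
tau, delta = 0.05, 0.25

_, mu_vague = black_litterman_posterior(Sigma, Pi, P, 1e6 * np.eye(3), Q, tau)
_, mu_sure = black_litterman_posterior(Sigma, Pi, P, 1e-10 * np.eye(3), Q, tau)
Omega0 = tau * np.diag(np.diag(Sigma))                        # Eq. 8 with P = I
Sigma_bar, mu_bar = black_litterman_posterior(Sigma, Pi, P, Omega0, Q, tau)
w_bl = bl_weights(Sigma_bar, mu_bar, delta)
print(mu_vague.round(4), mu_sure.round(4), w_bl.round(3))
```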

According to Eqs. 10, 11, and 12, the optimal expected returns of our market views for each period t are:

$$\begin{aligned} \begin{aligned} Q_t^{*}&=\hat{\varOmega }_{0,t}\big \{[\,(\tau \varSigma _t)^{-1}+P^{\prime }\hat{\varOmega }_{0,t}^{-1}P\,]\,\bar{\mu }_t-(\tau \varSigma _t)^{-1}\varPi _t\big \}\\&=\delta [\,\hat{\varOmega }_{0,t}(\tau \varSigma _t)^{-1}+\mathbb {I}\,]\,\bar{\varSigma }_tw_t^{*}-\hat{\varOmega }_{0,t}(\tau \varSigma _t)^{-1}\varPi _t\\&=\delta [\,\hat{\varOmega }_{0,t}(\tau \varSigma _t)^{-1}+\mathbb {I}\,]\,[\,\varSigma _t+[(\tau \varSigma _t)^{-1}+\hat{\varOmega }^{-1}_t]^{-1}\,]w_t^{*}\\&\quad -\hat{\varOmega }_{0,t}(\tau \varSigma _t)^{-1}\varPi _t \end{aligned} \end{aligned}$$
(13)
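The inverse step of Eq. 13 can be sketched the same way, assuming \(P = I\), hypothetical inputs, \(\varOmega = \hat{\varOmega }_0\) from Eq. 8, and \(\delta = 0.25\), \(\tau = 0.05\) as in Sect. 4.2. The round trip feeds \(Q^{*}\) back through Eqs. 11, 12, and 10 and should recover the one-hot target weights \(w_t^{*}\) up to floating-point error.

```python
import numpy as np

def optimal_view_returns(w_star, Sigma, Pi, Omega0, tau, delta):
    """Eq. 13 with P = I: view returns Q* under which w_star is BL-optimal."""
    A = np.linalg.inv(tau * Sigma)                                  # (tau Sigma)^{-1}
    Sigma_bar = Sigma + np.linalg.inv(A + np.linalg.inv(Omega0))    # Eq. 11
    I = np.eye(len(w_star))
    return delta * (Omega0 @ A + I) @ Sigma_bar @ w_star - Omega0 @ A @ Pi

# Hypothetical inputs for three assets.
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.020, 0.004],
                  [0.010, 0.004, 0.030]])
Pi = np.array([0.05, 0.03, 0.04])
tau, delta = 0.05, 0.25
Omega0 = tau * np.diag(np.diag(Sigma))     # Eq. 8 with P = I
w_star = np.array([0.0, 1.0, 0.0])         # hindsight-optimal one-hot weights

Q_star = optimal_view_returns(w_star, Sigma, Pi, Omega0, tau, delta)

# Round trip: Eqs. 11-12 with Q = Q_star, then Eq. 10 should give back w_star.
A = np.linalg.inv(tau * Sigma)
M = np.linalg.inv(A + np.linalg.inv(Omega0))
mu_bar = M @ (A @ Pi + np.linalg.solve(Omega0, Q_star))
w_back = np.linalg.solve(delta * (Sigma + M), mu_bar)
print(Q_star.round(4), np.allclose(w_back, w_star))
```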

3.4 Generating Market Views with Neural Models

Equation 13 provides a theoretical perspective on determining the expected returns of the optimal market views. However, computing \(w_t^{*}\) requires future asset prices, which are not available. A feasible approach is therefore to learn an approximation of \(Q_t^{*}\) from historical data and other priors. We use the time series of asset prices, trading volumes, and the public mood data stream to train neural models (nn) for this approximation problem of optimal market views:

$$\begin{aligned} \hat{Q}_t = nn(prices, volumes, sentiments;\, Q_t^{*}) \end{aligned}$$
(14)

We denote the time series of asset prices \(price_{t-k}, price_{t-k+1}, \dots , price_t\) by a lag operator \(\mathcal {L}^{0\sim k}price_t\); trading volumes are denoted analogously. The model input at each time point, \([\mathcal {L}^{0\sim k}price_t, \mathcal {L}^{0\sim k}volume_t, sentiment_t, capital_t]\), is then abbreviated as \([p,v,s,c]_t\).

We train two types of neural models for comparison: a neural-fuzzy approach and a deep learning approach. Figure 2 illustrates the online training process using a long short-term memory (LSTM) network, where \(\hat{Q}\) is the output.

Fig. 2. Model training process (LSTM) with/without sentiment information.

Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) is a neural network model with fuzzy rule nodes [16]. The partitioning that determines which rule nodes are activated is dynamically updated with the distribution of incoming data. This evolving clustering method (ECM) gives the model both stability and fast adaptability. Compared with many other fuzzy neural networks, DENFIS performs better in modeling nonlinear complex systems [32].

Considering the financial market as a real-world complex system, we learn first-order Takagi-Sugeno-Kang type rules online. Each rule node has the form:

$$\begin{aligned}&IF \mathcal {L}^{0\sim k}attribute_{t,i}=pattern_i,\, i=1,2,\dots , N \\&\quad THEN \hat{Q_t}=f_{1,2,\dots , N}([p,v,s]_t) \end{aligned}$$

where we have 3 attributes and \((2^N-1)\) candidate functions to activate. In our implementation of the DENFIS model, all the membership functions are symmetrical and triangular; each is defined by two parameters, b and d, and spans the interval \(b\pm d/2\): b is where the membership degree equals 1, and d is the activation range of the fuzzy rule. In our implementation, b is iteratively updated by a linear least-squares estimator of the existing consequent function coefficients.
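A sketch of the symmetric triangular membership function described above, parameterized by the center b and the activation range d (Sect. 4.2 uses d = 0.21); the function name is hypothetical and not taken from any DENFIS library.

```python
def triangular_membership(x: float, b: float, d: float) -> float:
    """Membership degree: 1 at x = b, decreasing linearly to 0 at x = b +/- d/2."""
    half_width = d / 2.0
    return max(0.0, 1.0 - abs(x - b) / half_width)

# Degrees for a rule centred at b = 0 with activation range d = 2.
degrees = [triangular_membership(x, b=0.0, d=2.0) for x in (-1.5, -0.5, 0.0, 0.5, 1.0)]
print(degrees)
```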

LSTM is a type of recurrent neural network with gated units. This unit architecture is claimed to be well suited for learning to predict time series with lags of unknown size and long-term event dependencies. Early attempts to apply LSTM to time series prediction were not very successful [11]. It is now recognized that, although LSTM cells have many variants, their performance across different tasks is similar [12].

Therefore, we use a vanilla LSTM unit structure. Our implementation of LSTM cells follows the update rules of the input gate, forget gate, and output gate as in Eq. 15:

$$\begin{aligned} \begin{aligned} i_t&= \sigma (W_i\cdot [\,h_{t-1},[p,v,s]_t\,]+b_i)\\ f_t&= \sigma (W_f\cdot [\,h_{t-1},[p,v,s]_t\,]+b_f)\\ o_t&= \sigma (W_o\cdot [\,h_{t-1},[p,v,s]_t\,]+b_o) \end{aligned} \end{aligned}$$
(15)

where \(\sigma \) denotes the sigmoid function, \(h_{t-1}\) is the output of the previous state, \(W_i\), \(W_f\), and \(W_o\) are state transfer matrices, and \(b_i\), \(b_f\), and \(b_o\) are the corresponding biases.

The state of each LSTM cell \(c_t\) and its output \(h_t\) are updated by:

$$\begin{aligned} \begin{aligned}&c_t=f_t \odot c_{t-1} + i_t \odot \tanh (W_c\cdot [\,h_{t-1},[p,v,s]_t\,]+b_c)\\&h_t=o_t\odot \tanh (c_t) \end{aligned} \end{aligned}$$
(16)
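A minimal NumPy sketch of one step of the vanilla LSTM cell in Eqs. 15 and 16, assuming the standard tanh nonlinearity on the candidate cell input. The weights are random, and the input size of 11 (five price features, five volume features, one sentiment value) is a hypothetical choice. The loop carries the states over between steps, in the spirit of the online training described below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One vanilla LSTM step (Eqs. 15-16). W[g] has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, [p, v, s]_t]
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell input (standard tanh)
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(42)
hidden, n_in = 3, 11    # 3 units as in Sect. 4.2; input size is hypothetical
W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "ifoc"}
b = {g: np.zeros(hidden) for g in "ifoc"}

h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, n_in)):   # five hypothetical time steps
    h, c = lstm_step(x, h, c, W, b)    # states carried over between steps
print(h.shape, np.all(np.abs(h) < 1))
```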

We make the training process online as well, in the sense that each time a new input is received, we use the previous states and parameters of the LSTM cells \([c_{t-1}, \mathbf {W},\, \mathbf {b}]\) to initialize the LSTM cells for period t.

4 Experiments

To evaluate the quality and effectiveness of our formalization of market views, we run trading simulations with various experimental settings.

4.1 Data

The data used in this study are publicly available on the Web. We obtain the historical closing prices of stocks and daily trading volumes from the Quandl API (footnote 2), the market capitalization data from Yahoo! Finance, and the daily count and intensity of company-level sentiment time series from PsychSignal (footnote 3). The sentiment intensity scores are computed from multiple social media platforms using NLP techniques. Figure 3 depicts an example segment of the public mood data stream. Because the market is closed on weekends, a corresponding weekly cycle of message volume can be observed.

Fig. 3. The volume of daily tweets filtered by cashtag AAPL (blue, left); average sentiment intensity (red, left); net sentiment polarity (red, right); daily returns (black, right) in a time period of 90 days (2017-03-04 to 2017-06-04). All the series are normalized. (Color figure online)

We investigate a window of around 8 years (2800 days): all the time series are trimmed to the period from 2009-10-05 to 2017-06-04. Missing values, such as the closing prices on weekends and public holidays, are filled with the nearest historical data to train the neural models. The lagged values we use for both price and trading volume consist of the 4 most recent days and a moving average of the past 30 days; that is, the input of our neural models takes the form of Eqs. 17 and 18:

$$\begin{aligned} \mathcal {L}^{0\sim k}price_t&=(p_t, p_{t-1}, p_{t-2}, p_{t-3}, \frac{\sum ^{30}_{i=1} p_i}{30}) \end{aligned}$$
(17)
$$\begin{aligned} \mathcal {L}^{0\sim k}volume_t&=(v_t, v_{t-1}, v_{t-2}, v_{t-3}, \frac{\sum ^{30}_{i=1} v_i}{30}) \end{aligned}$$
(18)
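A minimal sketch of the input construction in Eqs. 17 and 18. Two choices here are our assumptions: gaps are forward-filled with the last observed value (our reading of “nearest historical data”), and the 30-day moving average covers days \(t-29, \dots , t\). The helper names are hypothetical.

```python
import numpy as np

def forward_fill(x: np.ndarray) -> np.ndarray:
    """Replace NaNs (e.g., weekends, holidays) with the last observed value."""
    filled = x.astype(float).copy()
    for i in range(1, len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled

def lag_features(series: np.ndarray, t: int, n_lags: int = 4, window: int = 30) -> np.ndarray:
    """(x_t, x_{t-1}, x_{t-2}, x_{t-3}, 30-day moving average) as in Eqs. 17-18."""
    if t < max(n_lags, window) - 1:
        raise ValueError("not enough history before day t")
    lags = series[t - n_lags + 1 : t + 1][::-1]        # most recent value first
    moving_avg = series[t - window + 1 : t + 1].mean()
    return np.append(lags, moving_avg)

prices = forward_fill(np.array([10.0, np.nan, np.nan, 13.0]))
features = lag_features(np.arange(40, dtype=float), t=39)
print(prices, features)
```

The same helper applies unchanged to the volume series of Eq. 18.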

4.2 Trading Simulation

We construct a virtual portfolio consisting of 5 big-cap stocks: Apple Inc (AAPL), Goldman Sachs Group Inc (GS), Pfizer Inc (PFE), Newmont Mining Corp (NEM), and Starbucks Corp (SBUX). This random selection covers both the NYSE and NASDAQ markets and diversified industries, such as technology, financial services, health care, and consumer discretionary. During the period investigated, there were two splits: a 7-for-1 split for AAPL on June 9th 2014, and a 2-for-1 split for SBUX on April 9th 2015. The prices per share are adjusted according to the current share size for computing all related variables; however, dividends are not taken into account. We benchmark our results against two portfolio construction strategies:

(1) The value-weighted portfolio (VW): we re-invest daily according to the percentage share of each stock’s market capitalization. In this case, the portfolio performance is the weighted average of each stock’s performance. This strategy is fundamental, yet an empirical study [10] shows that beating the market is difficult even before netting out fees.

(2) The neural trading portfolio (NT): we remove the construction of market views and directly train the optimal weights of the daily position with the same input. For this black-box strategy, we cannot get any insight into how the output portfolio weights come about.

Fig. 4. Trading simulation performance with different experimental settings (x-axis: number of trading days; y-axis: cumulative returns). In particular, we use a timespan of 90 and 180 days for our approach. The performance of neural trading is independent of the timespan; accordingly, the two neural models are compared in (d) and (e), respectively, for better presentation.

In the simulations, we assume no short selling, taxes, or transaction fees, and we assume that the portfolio investments are infinitely divisible, starting from 10,000 dollars. We construct portfolios with no views (\(\varOmega _{\varnothing }\); in this case, the degenerate portfolio is equivalent to Markowitz’s mean-variance portfolio using historical return series to estimate the covariance matrix as a measure of risk), random views (\(\varOmega _{r}\)), and the standard views using the construction of the Black-Litterman model (\(\varOmega _{0}\)), with and without our sentiment-induced expected returns (s). The trading performances are shown in Fig. 4.

Following previous research [13], we set the risk aversion coefficient \(\delta =0.25\) and the confidence level of CAPM \(\tau =0.05\). Setting the activation range of the fuzzy membership function to \(d=0.21\), we obtain 21 fuzzy rule nodes from the whole online training process of DENFIS; this value of d minimizes the global portfolio weight error. For the second neural model, using deep learning, we stack two layers of LSTMs followed by a densely connected layer. Each LSTM layer has 3 units; the densely connected layer has 50 neurons, considerably more than the number of LSTM units. We use the mean squared error of vector Q as the loss function and the rmsprop optimizer [30] to train this architecture. We observe fast convergence of the training error in our experiments.

4.3 Performance Metrics

Various metrics have been proposed to evaluate the performance of a given portfolio [5, 15, 31]. We report four metrics in our experiments.

Root mean square error (RMSE) is a universal metric for approximation problems. It is widely used in engineering and for data that are normally distributed with few outliers. We calculate the RMSE between our realized portfolio weights and the optimal weights:

$$\begin{aligned} \text {RMSE}= \sqrt{\frac{1}{n} \sum _{i=1}^n \Vert w_i-\hat{w_i}\Vert ^2} \end{aligned}$$
(19)
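A short sketch of Eq. 19, assuming the realized and optimal weights are stored as arrays with one row per trading day and one column per asset; the values below are hypothetical.

```python
import numpy as np

def weight_rmse(w_realized: np.ndarray, w_optimal: np.ndarray) -> float:
    """Eq. 19: square root of the mean (over days) squared Euclidean error."""
    sq_norms = np.sum((w_realized - w_optimal) ** 2, axis=1)   # per-day ||w_i - w_hat_i||^2
    return float(np.sqrt(sq_norms.mean()))

w_opt = np.array([[1.0, 0.0], [0.0, 1.0]])
w_real = np.array([[0.5, 0.5], [0.5, 0.5]])
print(weight_rmse(w_real, w_opt))
```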

Annualized return (AR) measures the profitability of a given portfolio. We calculate the geometric mean growth rate per year, also referred to as the compound annual growth rate (CAGR), over these 2800 days.
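The annualized return can be sketched as follows. Counting calendar days with 365 days per year is our assumption, motivated by the series above including weekends and holidays filled with historical values.

```python
def cagr(initial_value: float, final_value: float, n_days: int, days_per_year: float = 365.0) -> float:
    """Compound annual growth rate: the geometric mean growth rate per year."""
    years = n_days / days_per_year
    return (final_value / initial_value) ** (1.0 / years) - 1.0

# A hypothetical portfolio that doubles from 10,000 dollars over two years.
print(cagr(10_000.0, 20_000.0, n_days=730))
```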

Sharpe ratio (SR) is a risk-adjusted return measure. We choose the value-weighted portfolio as the base; consequently, the Sharpe ratio of VW is 1:

$$\begin{aligned} \text {SR}= \frac{\mathbb {E}(R_{portfolio}/R_{VW})}{\sigma (R_{portfolio})/\sigma (R_{VW})} \end{aligned}$$
(20)

SR uses the standard deviation of daily returns as the measure of risk. Note that, to distinguish between good and bad risk, we can also use the standard deviation of downside returns only [28], which yields the Sortino ratio. Our results suggest that the Sortino ratios, which are not reported due to the page limit, are very close to the SRs and lead to the same conclusion.
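Eq. 20 leaves the exact form of R open; the sketch below reads R as daily gross returns \(1 + r\), which is our assumption, and computes the ratio relative to the VW benchmark. Because a standard deviation is unchanged by a constant shift, the denominator can use net returns directly. The return series are hypothetical.

```python
import numpy as np

def relative_sharpe(r_portfolio: np.ndarray, r_vw: np.ndarray) -> float:
    """Eq. 20 with R read as daily gross returns (assumption); SR of VW itself is 1."""
    mean_ratio = np.mean((1.0 + r_portfolio) / (1.0 + r_vw))
    vol_ratio = np.std(r_portfolio) / np.std(r_vw)   # std is shift-invariant
    return float(mean_ratio / vol_ratio)

rng = np.random.default_rng(1)
r_vw = rng.normal(0.0005, 0.01, size=250)   # hypothetical daily VW returns
r_p = 1.5 * r_vw                             # a more volatile portfolio
print(relative_sharpe(r_vw, r_vw), relative_sharpe(r_p, r_vw))
```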

The maximum drawdown (MDD) measures the largest percentage loss an investor could suffer from a peak of the portfolio value to any later point within the trading period \([0, T]\):

$$\begin{aligned} \text {MDD}= \max \limits _{0\le t<\tau \le T}\Big \{\frac{Value_t-Value_\tau }{Value_t}\Big \} \end{aligned}$$
(21)
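A sketch of the maximum drawdown computed with a running peak, which is equivalent to maximizing the relative loss over all pairs of days with the peak preceding the later value; the value series is hypothetical.

```python
import numpy as np

def max_drawdown(values: np.ndarray) -> float:
    """Largest relative drop from a running peak to any later value."""
    running_peak = np.maximum.accumulate(values)
    drawdowns = (running_peak - values) / running_peak
    return float(drawdowns.max())

portfolio_values = np.array([100.0, 120.0, 90.0, 130.0, 65.0])
print(max_drawdown(portfolio_values))
```

Here the drop from 120 to 90 is 25%, but the later drop from the peak of 130 to 65 is 50%, so the MDD is 0.5.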

Asset allocation strategies with a large MDD are exposed to the risk of withdrawal. Table 1 presents the metrics.

Table 1. Performance metrics for various portfolio construction strategies, timespan = 90 and 180 days. Top three metrics are in bold.

4.4 Findings

We make several interesting observations from Fig. 4 and Table 1. SR and AR are usually considered the most important metrics; moreover, the RMSE and MDD values are all very close in our experiments. The correlation between RMSE and the other three metrics is weak, although it is intuitive that the portfolio performance should be better if the realized weights are close to the optimal weights. On the contrary, the LSTM models seem to overfit as they are trained on the mean squared error of the weights or of the expected returns of views [22]. However, as mentioned in Sect. 1, the relationship between weights and daily returns is nonlinear. Therefore, holding portfolio weights that are close to the optimal weights does not necessarily mean that the AR must be higher. In fact, it is dangerous to use seemingly reasonable metrics from outside the study of asset allocation, such as the directional accuracy of price change prediction [4, 33], to evaluate the expected portfolio performance.

The Markowitz portfolio (\(\varOmega _{\varnothing }\)) displays a very similar behavior to the market-following strategy. This is consistent with the inefficacy of the mean-variance approach in practice mentioned by previous studies: holding the Markowitz portfolio is holding the market portfolio. In fact, if the CAPM holds, the market portfolio already reflects the adjustments to risk premiums; that is, fewer market participants will invest in highly risky assets, and for this reason their market capitalization will be smaller as well.

However, the Black-Litterman model does not always guarantee better performance than the Markowitz portfolio. “Garbage in, garbage out” still holds in this circumstance. Given random views (\(\varOmega _r\)), the portfolio can be worse than market-following in terms of both SR and AR. The lesson learned is that if the investor knows nothing, it is better to hold no views and follow the market than to pretend to know something.

In our experiments, DENFIS generally performs better than the LSTM models, achieving higher SRs and ARs. The reason may be that LSTM models adapt faster to the incoming data, whereas financial time series are usually very noisy. The ECM mechanism provides DENFIS models with converging learning rates, which may benefit the stability of the memorized rules. However, it is important to note that the ARs of both neural models improve with the blending of sentiments. The timespan used to estimate the correlation and volatility of assets does not seem critical: DENFIS models perform better with a longer timespan, while LSTM models perform better with a shorter timespan. The Markowitz portfolio is less affected by the timespan.

5 A Story

One of the main advantages of our formalization and computation of market views is that it brings some transparency to the daily asset reallocation decisions. In most cases, a stock price prediction system based on machine learning algorithms cannot justify “why it thinks the price will reach the predicted point”. Unlike these systems, our method can tell a story of the portfolio to professional investors and advice seekers. Take June 1st 2017 as an example:

“On June 1st 2017, we observe 164 positive opinions of polarity \(+1.90\), 58 negative opinions of polarity \(-1.77\) on AAPL stock; 54 positive opinions of polarity \(+1.77\), 37 negative opinions of polarity \(-1.53\) on GS stock; 5 positive opinions of polarity \(+2.46\), 1 negative opinion of polarity \(-1.33\) on PFE stock; no opinion on NEM stock; and 9 positive opinions of polarity \(+1.76\), 5 negative opinions of polarity \(-2.00\) on SBUX stock. Given the historical prices and trading volumes of the stocks, we have \(6.29\%\) confidence that AAPL will outperform the market by \(-70.11\%\); \(23.50\%\) confidence that GS will outperform the market by \(263.28\%\); \(0.11\%\) confidence that PFE will outperform the market by \(-0.50\%\); \(1.21\%\) confidence that SBUX will outperform the market by \(4.57\%\). Since our current portfolio invests \(21.56\%\) on AAPL, \(25.97\%\) on GS, \(29.43\%\) on PFE, and \(23.04\%\) on SBUX, by June 2nd 2017, we should withdraw all the investment on AAPL, \(2.76\%\) of the investment on GS, \(81.58\%\) of the investment on PFE, and \(30.77\%\) of the investment on SBUX, and re-invest them onto NEM.”

6 Conclusion and Future Work

In previous studies that have considered sentiment information for financial forecasting, the role of the investor as a market participant is often absent. In this paper, we present a novel approach to incorporate market sentiment by fusing the public mood data stream into the Bayesian asset allocation framework.

This work is pioneering in formalizing sentiment-induced market views. Our experiments show that the market views provide a powerful method for asset management. We also confirm the efficacy of the public mood data stream based on social media for developing asset allocation strategies.

A limitation of this work is that we fixed a portfolio with five assets, although in practice the portfolio selection problem is of equal importance. This paper also does not discuss how to assess the quality of sentiment data. We are not yet at the stage of distinguishing or detecting opinion manipulation, although there is real concern that open networks are rife with bots. Another limitation is that survivorship bias is not taken into account: the risk that assets selected in the portfolio may exit the market or suffer from a lack of liquidity. This problem can be alleviated by including only high-quality assets. In the future, we will examine the quality of sentiment data obtained using different content analysis approaches. We also plan to develop a Bayesian asset allocation model that can deal with market frictions.