Constructing Sentiment Signal-Based Asset Allocation Method with Causality Information

Taguchi, Rei; Sakaji, Hiroki; Izumi, Kiyoshi; Murayama, Yuri

doi:10.1007/s00354-023-00231-4

Published: 11 September 2023

Volume 41, pages 777–794, (2023)

New Generation Computing


Rei Taguchi (ORCID: orcid.org/0000-0003-1413-4360)¹, Hiroki Sakaji¹, Kiyoshi Izumi¹ & Yuri Murayama¹


Abstract

This study examines whether financial text is useful for tactical asset allocation with stocks. To this end, we use natural language processing to create polarity indexes from financial news. We then cluster the polarity indexes into regimes using a change point detection algorithm, construct a stock portfolio, and rebalance it at each change point using an optimization algorithm. The proposed asset allocation method outperforms the comparative approaches, which suggests that the polarity index is useful for constructing equity asset allocation methods.


1 Introduction

1.1 Background

We examine whether financial text can benefit tactical asset allocation methods using equities. To this end, we use natural language processing and statistical causal inference to create rebalancing signals. Numerous studies have examined investment techniques using machine learning; in particular, a growing body of research addresses asset allocation, including the derivation of investment signals and the calculation of investment ratios [1,2,3]. We focus on the points at which stock and portfolio prices change rapidly owing to external factors, that is, regime change points. In finance theory, regimes are unobservable states of the market, such as expansion, recession, bull, and bear markets. Some studies have attempted to capture market alpha by incorporating regime changes into investment strategies [4,5,6].

1.2 Focus

We draw specifically on the following two studies. Wood et al. [7] employed a change point detection module to capture regime changes and created a simple and expressive model. Ito et al. [8] developed a method of switching investment strategies in response to market conditions. These studies developed investment strategies that are agile enough to capture regime changes at the time of observation while maintaining high performance. We go one step further and focus on how to estimate future regime changes. If information on future regime changes, that is, future changes in the market environment, were known, active management with a higher degree of freedom would become possible. However, because calculating future regimes from traditional financial time series data alone has certain limitations, we construct an investment strategy that combines financial time series data with alternative data, which have attracted attention in recent years.

1.3 Hypothesis

In recent years, given the explosive development of artificial intelligence (AI), alternative data have received worldwide attention. They are particularly prominent in the financial and economic fields, where they are beginning to be widely used, together with traditional data, for economic forecasting and investment strategies. Text data are especially versatile, as shown by research on creating economic polarity indexes through sentiment analysis [9, 10] and on extracting causal information from text [11]. Against this background, we formulate the following hypotheses for calculating the change points of future regimes.

Polarity indexes calculated from text data contain information that precedes traditional financial time series data such as stocks.
Polarity indexes calculated from text data can serve as signals to rebalance a portfolio, and these signals can improve portfolio performance.
Portfolio performance can be improved by switching between risk-minimizing and return-maximizing optimization strategies according to the change points created by the polarity index.

1.4 Contributions

We expect the proposed investment strategy using financial text to outperform the comparative strategies. The strategy generates polarity indexes from financial text using natural language processing, tests whether these polarity indexes lead stock prices using statistical causal inference, calculates the regime change points of the polarity indexes using change point detection, and rebalances the portfolio at these change points using multiple mathematical optimization models. This study makes the following contributions.

Proposed a highly expressive asset allocation framework using financial text mining techniques.
Demonstrated that the estimation of regime change points using financial text is material for active management.
Demonstrated that the lead–lag relationship between financial time series and text is material for active management.

2 Related Works

2.1 Asset Allocation Using Machine Learning

Studies of frameworks that combine mean-variance models and deep learning include Ma et al. [12] and Wang et al. [13]. Yun et al. [14] proposed a two-stage deep learning model for forecast-based portfolio management. Zhang et al. [15] proposed an asset allocation framework that uses deep learning to maximize the portfolio Sharpe ratio. Chen et al. [3] proposed an asset allocation framework using XGBoost and the firefly algorithm. Imajo et al. [1] constructed a network that predicts the distribution of the residual term of returns and proposed a method for designing portfolios based on the predicted distribution. Ito et al. [2] proposed the Trader-Company method, an evolutionary computation model that mimics the roles of traders. This study differs from the aforementioned work in that it incorporates natural language processing into the asset allocation framework.

2.2 Creation of Economic Index Using Text Mining

Seki et al. [16] developed a business confidence index using BERT, with text data provided by Nihon Keizai Shimbun, Inc. Several other studies have developed business sentiment indices from text data [9, 17], and others have used text data to predict stock closing prices [18, 19]. Research has also created economic indexes using rating information from analyst reports [6, 10], and several studies have applied text mining to analyst reports to predict stock prices [20, 21]. The novelty of the current study lies in using the masked language model (MLM) score to create a polarity index that captures the tone of financial text.

2.3 Causal Inference and Its Applications

Causal structure learning algorithms can be classified into three groups: constraint-based approaches, which use conditional independence tests to decide whether an edge exists between two nodes [22, 23]; score-based methods, which use search procedures to optimize a particular score function [24,25,26]; and structural causal models, which represent the variable at each node as a function of its parents [27,28,29,30,31]. Statistical causal inference has been applied in finance as follows. D’Acunto et al. [32] used VAR-LiNGAM, which extends the Linear Non-Gaussian Acyclic Model (LiNGAM), a semiparametric causal inference algorithm, to time series, to reveal the causal structure of risk factors in stocks. Ohmura [33] analyzed the relationship between the stock market and political support using VAR-LiNGAM. Research has also examined the construction of networks, known as causal chains, that link causal relationships regarding performance among firms [34,35,36]. The current study is novel in that it uses VAR-LiNGAM for an empirical analysis of the leading–lagging relationship between stock portfolios and polarity indexes created from financial texts.

2.4 Time Series Change Point Detection and Its Applications

In time series data, the lack of independence over time must be addressed explicitly because observations are temporally correlated. For example, change point detection methods based on autoregressive models [37] and singular spectrum analysis [38, 39] have been proposed. Several studies apply change point detection algorithms to financial time series analysis [40,41,42]. Wood et al. [7] used a change point detection algorithm with a Gaussian process to construct an investment strategy. The current study is novel in that it uses Binary Segmentation Search, a highly expressive change point detection algorithm, to estimate economic regimes.

3 Task Setting

We propose a tactical asset allocation method that agilely captures market regime changes and reflects them in rebalancing. We perform asset allocation for a stock portfolio comprising three or more stocks, using the regime change points of a polarity index created from financial news as signals. We build this investment strategy from financial news with natural language processing and AI techniques in the four steps below and refer to it as the sentiment signal-based asset allocation method (SSAAM). By comparing it with comparative strategies, we demonstrate the usefulness of the proposed framework.

3.1 SSAAM Overview

Step 1 (Creating polarity index): Score financial news titles using MLM scoring. Then calculate the quartiles of the scores and classify each title as positive, negative, or neutral according to its quartile range. Aggregate the classified values daily.

Step 2 (Demonstration of leading effects): Use statistical causal inference, specifically the VAR-LiNGAM algorithm, to test whether the polarity index created in Step 1 has a leading effect on a stock portfolio that combines 10 stocks.

Step 3 (Change point detection): Once Step 2 has confirmed that the polarity index has a leading effect, calculate the regime change points of the polarity index using the Binary Segmentation Search algorithm.

Step 4 (Portfolio optimization): Optimize the portfolio based on the change points detected in Step 3 using the entropic value-at-risk (EVaR) optimization algorithm.

Figure 1 illustrates the architecture used in this study.

3.2 Framework Validity

This framework builds on Ito et al. [8] and Wood et al. [7]; we propose a more practical framework by incorporating textual causality into these approaches. In this section, we discuss the validity of each step of the framework. In Step 1, polarity is assigned to the financial text using MLM scoring, motivated by research showing that the tone of U.S. financial reporting affects stock prices [43, 44]. In Step 2, VAR-LiNGAM was selected because the VAR framework is known to risk false positives when variables have contemporaneous effects on one another [45], and VAR-LiNGAM addresses this issue. In Step 3, Binary Segmentation Search was selected: Truong et al. [46] compared change point detection models and showed that Binary Segmentation Search is more scalable. We also used the least squared deviation (CostL2) as the cost function; this combination has been shown to be effective for time series analysis in the financial domain [46, 47]. In Step 4, EVaR optimization was selected because a pilot experiment using the S&P 500 [48, 49] showed that it performed better than comparative methods such as CVaR optimization.

4 Method

4.1 Creating Polarity Index

We use pseudo-log-likelihood scores (PLLs), proposed by Salazar et al. [50], to create polarity indexes. Because masked language models (MLMs) are pre-trained to predict words from context on both sides, they cannot be handled as conventional probabilistic language models. PLLs address this: a PLL is the sum of the log-likelihoods of the conditional probabilities obtained when each word is masked and predicted, so it measures how natural a sentence is. Each token $\psi _t$ is replaced with [MASK] and predicted from the remaining tokens $\varvec{\Psi }_{\backslash t}= [\psi _1,\dots ,\psi _{t-1},\psi _{t+1},\dots ,\psi _{\mid \varvec{\Psi }\mid }]$. Here, t indexes token positions, $\Theta $ denotes the model parameters, and $P_{MLM} (\cdot )$ is the probability that the MLM assigns to a token. We use BERT [51] as the MLM.

$$\begin{aligned} {\textbf {PLL}}(\varvec{\Psi }) := \sum ^{\mid \varvec{\Psi }\mid }_{t=1}\log _{2} P_{MLM} (\psi _t \mid \varvec{\Psi }_{\backslash t}; \Theta ). \end{aligned}$$

(1)
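As a minimal sketch of Eq. (1), the function below masks each token in turn, asks a masked language model for the probability of the original token given the remaining tokens, and sums the base-2 log-probabilities. `toy_mlm_prob` is a hypothetical stand-in with a fixed probability table; in the study this role is played by BERT, whose masked-LM head is not called here.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"

# Scorer signature: (sentence with position t masked, t, original token) -> P(token | context).
# Hypothetical interface; a real scorer would query BERT's masked-LM head.
MLMProb = Callable[[Sequence[str], int, str], float]


def pseudo_log_likelihood(tokens: List[str], mlm_prob: MLMProb) -> float:
    """PLL(Psi) = sum_t log2 P_MLM(psi_t | Psi without t), as in Eq. (1)."""
    total = 0.0
    for t, token in enumerate(tokens):
        masked = tokens[:t] + [MASK] + tokens[t + 1:]  # psi_t replaced by [MASK]
        total += math.log2(mlm_prob(masked, t, token))
    return total


def toy_mlm_prob(masked: Sequence[str], t: int, token: str) -> float:
    """Hypothetical stand-in MLM that ignores the context and uses a fixed table."""
    table = {"shares": 0.5, "rally": 0.25, "plunge": 0.125}
    return table.get(token, 0.01)


print(pseudo_log_likelihood(["shares", "rally"], toy_mlm_prob))  # -3.0
```

Higher (less negative) PLLs indicate sentences the model finds more natural; how these raw scores are mapped to polarity is handled by the quartile step that follows.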

After preprocessing, we score the financial news text with PLLs one sentence at a time and calculate the quartiles of the resulting scores. Table 1 shows the polarity classification method.

Table 1 Polarity Classification Method


Finally, the polarity values of the news titles are aggregated chronologically by release date.
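Table 1 itself is not reproduced here, so the thresholds below are an assumption: this sketch labels sentence scores above the third quartile as positive (+1), below the first quartile as negative (−1), and otherwise neutral (0), then sums the labels per day to form a daily polarity index. The summing rule, the sign convention, and the toy scores are illustrative choices, not values from the study.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def classify(score: float, q1: float, q3: float) -> int:
    """Three-valued polarity label from quartile thresholds (assumed rule)."""
    if score > q3:
        return 1   # positive
    if score < q1:
        return -1  # negative
    return 0       # neutral


def daily_polarity_index(scored: List[Tuple[str, float]]) -> Dict[str, int]:
    """(date, sentence score) pairs -> {date: summed polarity label}."""
    q1, _, q3 = statistics.quantiles([s for _, s in scored], n=4)
    index: Dict[str, int] = defaultdict(int)
    for date, score in scored:
        index[date] += classify(score, q1, q3)
    return dict(sorted(index.items()))  # ISO dates sort chronologically


scores = [("2015-01-02", 1.0), ("2015-01-02", 2.0), ("2015-01-02", 3.0),
          ("2015-01-05", 4.0), ("2015-01-05", 5.0), ("2015-01-05", 6.0),
          ("2015-01-06", 7.0), ("2015-01-06", 8.0)]
print(daily_polarity_index(scores))
# {'2015-01-02': -2, '2015-01-05': 0, '2015-01-06': 2}
```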

4.2 Demonstration of Leading Effects

We use VAR-LiNGAM, a statistical causal inference model proposed by Hyvärinen et al. [45], to test for leading effects. VAR-LiNGAM models the time series as follows.

$$\begin{aligned} {\textbf {x}}(t) = \sum ^{T}_{\tau =0} {\textbf {B}}_{\tau } {\textbf {x}}(t-\tau )+{\textbf {e}}(t) \end{aligned}$$

(2)

where ${\textbf {x}}(t)$ is the vector of variables at time t, $\tau $ is the time lag, and T is the maximum lag. ${\textbf {B}}_{\tau }$ is the coefficient matrix representing the causal effects of ${\textbf {x}}(t-\tau )$ on ${\textbf {x}}(t)$; ${\textbf {B}}_{0}$ captures the instantaneous effects among the variables at time t. ${\textbf {e}}(t)$ denotes the vector of disturbances, which VAR-LiNGAM assumes to be non-Gaussian and mutually independent. VAR-LiNGAM is estimated in two steps. First, a VAR model is fitted to estimate the lagged relationships among the variables. Second, LiNGAM is applied to the residuals of the fitted VAR model to estimate the instantaneous causal relationships among the variables at the current time. In this way, we test whether financial news leads the stock portfolio.
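The first stage of VAR-LiNGAM can be illustrated without any library. The sketch below simulates a hypothetical two-variable VAR(1) in which variable 0 (standing in for the polarity index) drives variable 1 (standing in for the portfolio) with non-Gaussian uniform noise, then recovers the lag-1 coefficient matrix by ordinary least squares. It deliberately omits the second stage, the LiNGAM estimation of the instantaneous effects from the residuals, for which the study relied on the LiNGAM Python package; the coefficients are made up for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical lag-1 coefficients: row i is the equation for x_i(t).
# x_0 (polarity) leads x_1 (portfolio); x_1 does not feed back into x_0.
TRUE_B1: Matrix = [[0.5, 0.0],
                   [0.6, 0.3]]


def simulate_var1(b: Matrix, n: int, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    x = [[0.0, 0.0]]
    for _ in range(n):
        prev = x[-1]
        # Uniform (non-Gaussian) disturbances, in the spirit of LiNGAM
        e = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        x.append([b[i][0] * prev[0] + b[i][1] * prev[1] + e[i] for i in range(2)])
    return x


def fit_var1(x: List[List[float]]) -> Tuple[Matrix, List[List[float]]]:
    """OLS estimate of B_1 (no intercept) and the VAR residuals."""
    lagged, current = x[:-1], x[1:]
    s00 = sum(p[0] * p[0] for p in lagged)
    s01 = sum(p[0] * p[1] for p in lagged)
    s11 = sum(p[1] * p[1] for p in lagged)
    det = s00 * s11 - s01 * s01
    b_hat: Matrix = []
    for i in range(2):
        y0 = sum(p[0] * c[i] for p, c in zip(lagged, current))
        y1 = sum(p[1] * c[i] for p, c in zip(lagged, current))
        # Solve the 2x2 normal equations by Cramer's rule
        b_hat.append([(s11 * y0 - s01 * y1) / det, (s00 * y1 - s01 * y0) / det])
    residuals = [[c[i] - b_hat[i][0] * p[0] - b_hat[i][1] * p[1] for i in range(2)]
                 for p, c in zip(lagged, current)]
    return b_hat, residuals


b_hat, resid = fit_var1(simulate_var1(TRUE_B1, 5000))
print([[round(v, 2) for v in row] for row in b_hat])  # should be close to TRUE_B1
```

A large estimate in `b_hat[1][0]` together with a near-zero `b_hat[0][1]` is the lead–lag pattern the study looks for; `resid` is what the LiNGAM stage would then analyze.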

4.3 Change Point Detection

Binary Segmentation Search [47, 52] is a greedy sequential algorithm; we follow the notation of Truong et al. [46]. In the first step, it searches for the single change point that minimizes the sum of the costs of the two resulting segments; the procedure is greedy in the sense that each step seeks only the change point with the lowest sum of costs. The signal is then split into two at this change point, and the same operation is repeated on the resulting sub-signals until a stopping criterion is reached. Algorithm 1 summarizes Binary Segmentation Search. Let $y=\{y_s\}^S_{s=1}$ be a signal with S samples that follows a multivariate nonstationary stochastic process. L denotes the list of change points, s denotes the index of a change point, and G denotes the ordered list of change points to be computed. Given a signal y, the $(b-a)$-sample sub-signal $\{y_s\}^b_{s=a+1}, (1 \le a < b \le S)$ is denoted $y_{a, b}$. Hats denote estimated values. Other notation is explained in the comments of Algorithm 1.
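A plain-Python sketch of greedy binary segmentation with the CostL2 cost follows, using a fixed number of change points as the stopping criterion (one of several possible criteria). The study used the ruptures library; this standalone version recomputes costs naively, which is quadratic per step and only suitable for illustration. Breakpoints are returned in the ruptures convention, with the signal length as the last element, and `min_size` is an assumed minimum segment length.

```python
from typing import List, Optional, Sequence


def cost_l2(y: Sequence[float], a: int, b: int) -> float:
    """CostL2 on y_{a,b}: sum of squared deviations from the segment mean."""
    seg = y[a:b]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def binary_segmentation(y: Sequence[float], n_bkps: int, min_size: int = 2) -> List[int]:
    bkps = [len(y)]
    for _ in range(n_bkps):
        best_gain: Optional[float] = None
        best_split: Optional[int] = None
        ends = sorted(bkps)
        starts = [0] + ends[:-1]
        for a, b in zip(starts, ends):
            total = cost_l2(y, a, b)
            # Greedy step: the split (over all segments) that most reduces the cost
            for s in range(a + min_size, b - min_size + 1):
                gain = total - cost_l2(y, a, s) - cost_l2(y, s, b)
                if best_gain is None or gain > best_gain:
                    best_gain, best_split = gain, s
        if best_split is None:  # no segment is long enough to split further
            break
        bkps.append(best_split)
    return sorted(bkps)


signal = [0.0] * 20 + [5.0] * 20 + [-3.0] * 20
print(binary_segmentation(signal, n_bkps=2))  # [20, 40, 60]
```

On this toy signal the first split lands at 40 (the larger cost reduction) and the second at 20, which illustrates the greedy, one-split-at-a-time character of the method.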

4.4 Portfolio Optimization

EVaR is a coherent risk measure, derived from Chernoff’s inequality, that is an upper bound for both VaR and conditional VaR (CVaR) [53, 54]. When incorporated into stochastic optimization problems, EVaR has the advantage of being more computationally tractable than other risk measures such as CVaR [54]. EVaR is defined as follows.

$$\begin{aligned} {\textbf {EVaR}}_{\alpha }(X):=\min _{z>0}\left\{ z \ln \left( \frac{1}{\alpha } M_{X} \left( \frac{1}{z}\right) \right) \right\} . \end{aligned}$$

(3)

Here, X is a random variable, $M_{X}$ is its moment generating function, $\alpha $ is the significance level, and z is an auxiliary variable. Cajas [49] proposed a general convex programming framework for EVaR. We switch between the following two optimization strategies depending on the regime identified in Sect. 4.3.

Minimize risk optimization: A convex optimization problem that minimizes EVaR subject to a given level of expected return $\widehat{\mu }$.

(4)

Maximize return optimization: A convex optimization problem that maximizes expected return subject to a given level of EVaR ($\widehat{EVaR}$).

(5)
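The bodies of Eqs. (4) and (5) are not reproduced in this text. As a hedged reconstruction, not necessarily the authors' exact formulation, the sketch below assumes the exponential-cone representation of EVaR used by Cajas [49], T equally likely return observations $r_j$ (the rows of r), and a fully invested long-only portfolio; the budget and sign constraints in particular are our assumption. With $K_\mathrm{{exp}} = \mathrm{cl}\lbrace (x, y, v) : y > 0,\ y e^{x/y} \le v\rbrace $, the two problems would read:

$$\begin{aligned} (4')\quad &\min _{w,q,z,u}\; q + z \ln \left( \frac{1}{T\alpha }\right) \\ &\text {s.t.}\;\; \mu w \ge \widehat{\mu },\quad \sum ^{T}_{j=1} u_j \le z,\quad (-r_j w - q,\, z,\, u_j) \in K_\mathrm{{exp}},\ j=1,\dots ,T,\quad \sum ^{N}_{i=1} w_i = 1,\quad w \ge 0, \\ (5')\quad &\max _{w,q,z,u}\; \mu w \\ &\text {s.t.}\;\; q + z \ln \left( \frac{1}{T\alpha }\right) \le \widehat{EVaR},\quad \sum ^{T}_{j=1} u_j \le z,\quad (-r_j w - q,\, z,\, u_j) \in K_\mathrm{{exp}},\ j=1,\dots ,T,\quad \sum ^{N}_{i=1} w_i = 1,\quad w \ge 0. \end{aligned}$$

The cone constraint is equivalent to $u_j \ge z\, e^{(-r_j w - q)/z}$, so $\sum _j u_j \le z$ gives $q \ge z \ln \sum _j e^{-r_j w/z}$ and therefore

$$\begin{aligned} q + z \ln \left( \frac{1}{T\alpha }\right) \ge z \ln \left( \frac{1}{\alpha } \cdot \frac{1}{T}\sum ^{T}_{j=1} e^{-r_j w/z}\right) , \end{aligned}$$

with equality at the optimum over q and u. The right-hand side is the objective of Eq. (3) for the loss $X = -rw$ with $M_X(1/z)$ replaced by its sample mean, which is why also minimizing over z recovers the empirical EVaR.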

Here, q, z, and u are auxiliary variables, $K_\mathrm{{exp}}$ is the exponential cone, and T is the number of observations; w is the vector of weights for N assets, r is the matrix of returns, and $\mu $ is the vector of mean asset returns.
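Eq. (3) can also be evaluated numerically for an empirical sample. The sketch below assumes X is a loss (a negated return) with equally likely observations, so $M_X(1/z)$ is estimated by the sample mean of exp(x/z); because the objective is convex in z, a golden-section search over ln z suffices. The search bounds, iteration count, and toy losses are arbitrary illustrative choices, and this is not Riskfolio-Lib's solver, which works with the exponential-cone formulation.

```python
import math
from typing import Sequence


def evar(losses: Sequence[float], alpha: float,
         z_lo: float = 1e-3, z_hi: float = 1e3, iters: int = 100) -> float:
    """Empirical EVaR_alpha of equally likely losses via Eq. (3)."""
    n = len(losses)
    m = max(losses)

    def objective(z: float) -> float:
        # z * ln(M_X(1/z) / alpha) with M_X(1/z) estimated as mean(exp(x / z));
        # factoring out exp(m / z) (log-sum-exp) avoids overflow for small z.
        log_mgf = m / z + math.log(sum(math.exp((x - m) / z) for x in losses) / n)
        return z * (log_mgf - math.log(alpha))

    # Golden-section search over u = ln z: convex in z implies unimodal in u.
    # If the infimum is only approached as z -> 0 (EVaR equals the maximum
    # loss), the search stops at z_lo and returns a value slightly above it.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(z_lo), math.log(z_hi)
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = objective(math.exp(c)), objective(math.exp(d))
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = objective(math.exp(c))
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = objective(math.exp(d))
    return min(fc, fd)


losses = [float(k) for k in range(1, 11)]
print(evar(losses, alpha=0.2))  # lies between CVaR (9.5) and the maximum loss (10)
```

For these ten losses and α = 0.2, CVaR is the mean of the two worst losses, 9.5, so the result illustrates the upper-bound property stated above.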

5 Experiments and Results

5.1 Dataset Description

Assuming that financial news leads the equity portfolio, we calculate signals for portfolio rebalancing and perform tactical asset allocation to actively pursue alpha. We use two types of data.

Stock Data: We used daily stock data provided by Yahoo! Finance. We selected ten constituents of the NYSE FANG+ Index: Facebook, Apple, Amazon, Netflix, Google, Microsoft, Alibaba, Baidu, NVIDIA, and Tesla. We used adjusted closing prices for the period from January 2015 through December 2019.

Financial News Data: We used the daily historical financial news archive provided by Kaggle, a data analysis platform. The archive covers U.S. stocks listed on the NYSE and NASDAQ over the past 12 years, and we confirmed that it contains news on all 10 stocks. The data consist of nine columns and 221,513 rows; we used the title and release date columns. The period used is January 2015 through December 2019.

5.2 Preparation for Back-Testing

The experiments use in-sample validation: all data are used to estimate the polarity index, the VAR-LiNGAM model, and the change points.

The polarity index is created using the method in Sect. 4.1 after the financial news data have been preprocessed. Both the financial news and the stock data are daily; to align their periods, we drop any date on which either dataset has a blank. Table 2 presents summary statistics for the resulting polarity index.

Table 2 Summary statistics for polarity index


After creating the polarity index, we create a stock portfolio by summing the adjusted closing prices of the ten stocks, with a uniform investment ratio across stocks. We then perform causal inference using VAR-LiNGAM as described in Sect. 4.2.
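The alignment and aggregation described above can be sketched as follows, with hypothetical ticker names and prices: dates missing from either source are dropped (an inner join on dates), and the portfolio series is the sum of the adjusted closing prices, that is, one share of each stock. Whether the study's uniform investment ratio corresponds exactly to this construction is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inputs: date -> value, with None marking a blank
Series = Dict[str, Optional[float]]


def align_and_build(polarity: Series, prices: Dict[str, Series]
                    ) -> List[Tuple[str, float, float]]:
    """(date, polarity, portfolio value) rows for dates with no blanks anywhere."""
    rows = []
    for date in sorted(polarity):
        closes = [prices[ticker].get(date) for ticker in prices]
        if polarity[date] is None or any(c is None for c in closes):
            continue  # drop the date if either dataset has a blank
        # Portfolio value: sum of adjusted closes, i.e. one share of each stock
        rows.append((date, polarity[date], sum(closes)))
    return rows


polarity = {"2015-01-02": -2.0, "2015-01-05": 0.0, "2015-01-06": None}
prices = {"AAA": {"2015-01-02": 10.0, "2015-01-05": 11.0, "2015-01-06": 12.0},
          "BBB": {"2015-01-02": 20.0, "2015-01-06": 19.0}}
print(align_and_build(polarity, prices))  # [('2015-01-02', -2.0, 30.0)]
```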

Before doing so, we use the Augmented Dickey-Fuller (ADF) test [55] to confirm that the data are stationary. According to Table 3, the p-value is smaller than 0.05, so the null hypothesis of a unit root can be rejected; therefore, the data can be regarded as stationary.

Table 3 Augmented Dickey-Fuller test

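For intuition about what the stationarity check measures, the sketch below computes the plain Dickey-Fuller t-statistic (with an intercept and no lagged differences, so it is a simplification of the augmented test used in the study) and compares it with the commonly tabulated asymptotic 5% critical value of about −2.86. A real analysis would use a library implementation that reports p-values, such as the `adfuller` function in statsmodels; the simulated series here are hypothetical.

```python
import itertools
import math
import random
from typing import Sequence

CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant


def dickey_fuller_t(y: Sequence[float]) -> float:
    """OLS t-statistic of gamma in  dy_t = a + gamma * y_{t-1} + e_t."""
    x = list(y[:-1])                                  # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first difference
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (di - md) for xi, di in zip(x, dy)) / sxx
    a = md - gamma * mx
    rss = sum((di - a - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se = math.sqrt(rss / (n - 2) / sxx)               # standard error of gamma
    return gamma / se


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]    # stationary series
walk = list(itertools.accumulate(noise))             # random walk (unit root)
for name, series in (("noise", noise), ("walk", walk)):
    t_stat = dickey_fuller_t(series)
    verdict = "reject unit root" if t_stat < CRITICAL_5PCT else "cannot reject"
    print(name, round(t_stat, 2), verdict)
```

A strongly negative statistic rejects the unit-root null, which is the sense in which the study's data are described as stationary.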

Having verified stationarity, we use the data as input to VAR-LiNGAM. Table 4 presents the results of the causal inference.

Table 4 Causal inference in VAR-LiNGAM


The values in Table 4 are elements of the adjacency matrix, with the lower limit set at 0.05. The results indicate that the polarity index leads the equity portfolio. We used the Python library LiNGAM [45].
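Reading causal edges off estimated adjacency matrices with the 0.05 lower limit can be sketched as below; treating the limit as an absolute-value threshold is our interpretation. The matrix values and variable names are hypothetical, not the entries of Table 4. For reference, a fitted `VARLiNGAM` model in the LiNGAM package exposes its estimated matrices through the `adjacency_matrices_` attribute, with the instantaneous matrix first, and entry [i][j] is the effect of variable j on variable i.

```python
from typing import List, Tuple

NAMES = ["polarity", "portfolio"]  # hypothetical variable order


def edges(matrices: List[List[List[float]]], threshold: float = 0.05
          ) -> List[Tuple[str, str, float]]:
    """List cause -> effect edges whose |coefficient| reaches the threshold.

    matrices[tau][i][j] is the effect of variable j at time t - tau on
    variable i at time t (tau = 0 is the instantaneous matrix B_0).
    """
    found = []
    for tau, b in enumerate(matrices):
        for i, row in enumerate(b):
            for j, coef in enumerate(row):
                if abs(coef) >= threshold:
                    cause = f"{NAMES[j]}(t)" if tau == 0 else f"{NAMES[j]}(t-{tau})"
                    found.append((cause, f"{NAMES[i]}(t)", coef))
    return found


# Hypothetical estimates: B_0 followed by B_1
example = [[[0.0, 0.0], [0.12, 0.0]],
           [[0.40, 0.01], [0.30, 0.20]]]
for edge in edges(example):
    print(edge)
```

In this made-up example the edges run from polarity to portfolio but not back (the 0.01 coefficient is pruned), which is the kind of asymmetry read as a leading effect.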

We split the polarity index into regimes using the method described in Sect. 4.3. Each regime is classified according to whether the index is rising or falling, and Table 4 shows that movements of the index are linked to those of the portfolio. Because portfolio regimes are of only two types, up and down, signals obtained by applying a sequential method such as binary segmentation search to the index can be used effectively for portfolio rebalancing. We use the results of dividing the index into 5 and 10 regimes.

The following metrics are used to evaluate change point detection, reusing some of the notation introduced in Sect. 4.3. We used the Python library ruptures [46].

Precision: Precision is the proportion of predicted change points that are correct, that is, that lie within a margin of a true change point. In the context of change point detection, precision is defined as follows:

$$\begin{aligned} \text {Precision} := \mid \text {TP}\mid /\mid \lbrace \hat{s}_l\rbrace _l\mid . \end{aligned}$$

(6)

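Hausdorff Metric (HM): HM measures the worst-case prediction error, that is, the largest distance between a change point in one set and its nearest counterpart in the other. It has been used in change point detection studies such as Boysen et al. [56] and Harchaoui and Lévy-Leduc [57].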

$$\begin{aligned} {\textbf {HM}} := \max \lbrace \max _m \min _l \mid s_m - \hat{s}_l\mid , \max _m \min _l \mid \hat{s}_m - s_l\mid \rbrace \end{aligned}$$

(7)

Here, l and m index change points, and TP denotes the set of true positives, defined as TP $:= \lbrace s_m\mid \exists \hat{s}_l \text{ s.t. } \mid s_m - \hat{s}_l\mid <\mathrm{{Mar}}\rbrace $, where Mar is a margin. Table 5 presents the results for each metric. Note that the authors manually assigned the correct labels required for the evaluation.

Table 5 Evaluation of change point detection

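Equations (6) and (7) translate directly into code. The sketch below takes true and predicted change point indices and a margin; it follows the set definition of TP given above, so each true change point counts at most once. The example indices are made up, and ruptures ships its own implementations in `ruptures.metrics`; this is an independent illustration.

```python
from typing import Sequence


def precision(true_cps: Sequence[int], pred_cps: Sequence[int], margin: int) -> float:
    """|TP| / |predicted|, TP = true points with a prediction closer than margin."""
    tp = {s for s in true_cps if any(abs(s - p) < margin for p in pred_cps)}
    return len(tp) / len(pred_cps)


def hausdorff(true_cps: Sequence[int], pred_cps: Sequence[int]) -> int:
    """Worst-case distance between the two sets of change points (Eq. 7)."""
    d_true = max(min(abs(s - p) for p in pred_cps) for s in true_cps)
    d_pred = max(min(abs(p - s) for s in true_cps) for p in pred_cps)
    return max(d_true, d_pred)


true_cps, pred_cps = [100, 250, 400], [98, 260, 330, 405]
print(precision(true_cps, pred_cps, margin=10))  # 0.5
print(hausdorff(true_cps, pred_cps))             # 70
```

Here 250 misses because its nearest prediction is exactly 10 away and the margin is strict, while the spurious prediction at 330 drives the Hausdorff distance to 70.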

5.3 Back-Testing Scenarios

We merge the following rebalancing timings and back-test the resulting strategies, using the Python libraries vectorbt [58] and Riskfolio-Lib (Cajas [59]). In addition to EVaR optimization, CVaR optimization and the mean-variance model were used as optimization algorithms in the comparative methods. The number of regimes was set at 5 and 10, and the periodic rebalancing intervals were 30, 90, and 180 days. The back-tested strategies are listed below; CPD-EVaR++ is the proposed strategy, and CPD-EVaR+ is the runner-up strategy.

CPD-EVaR++ (proposed): Change point rebalancing using risk minimization and return maximization EVaR optimization + regular intervals rebalancing strategy
CPD-EVaR+: Change point rebalancing using risk minimization and no-restrictions EVaR optimization + regular intervals rebalancing strategy
EVaR: EVaR optimization regular intervals rebalancing strategy
CVaR: CVaR optimization regular intervals rebalancing strategy
MV: Mean-Variance optimization regular intervals rebalancing strategy
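The merged rebalancing schedule behind the CPD strategies can be sketched as the union of periodic rebalancing days and change point days, each tagged with what triggered it. The day indices are hypothetical trading-day offsets, and treating a change point that coincides with a periodic date as a single rebalance is our assumption, since the text does not say how such overlaps are handled.

```python
from typing import Dict, List, Set


def merged_schedule(n_days: int, interval: int, change_points: List[int]) -> Dict[int, Set[str]]:
    """Map each rebalancing day to its triggers ('periodic' and/or 'change_point')."""
    schedule: Dict[int, Set[str]] = {}
    for day in range(interval, n_days, interval):
        schedule.setdefault(day, set()).add("periodic")
    for day in change_points:
        if 0 < day < n_days:
            schedule.setdefault(day, set()).add("change_point")
    return dict(sorted(schedule.items()))


plan = merged_schedule(n_days=100, interval=30, change_points=[20, 60, 75])
print(list(plan))         # [20, 30, 60, 75, 90]
print(sorted(plan[60]))   # ['change_point', 'periodic']
```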

Whether the polarity index within each regime shows an uptrend or a downtrend was determined by inspecting Figs. 2 and 3. Tables 6 and 7 list the algorithms used for the SSAAM back-tests.

Table 6 Model Judgment: 5-Regime


Table 7 Model Judgment: 10-Regime

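The study judged each regime's trend visually, so the following is only a hypothetical automated stand-in: it labels a regime as an uptrend when the polarity index ends higher than it starts and, as an assumed mapping in the spirit of the third hypothesis in Sect. 1.3, assigns return maximization to uptrends and risk minimization to downtrends. The actual per-regime assignments are those in Tables 6 and 7, and the toy index is made up.

```python
from typing import List, Tuple


def regime_objectives(index: List[float], bkps: List[int]) -> List[Tuple[int, int, str]]:
    """(start, end, objective) per regime; bkps follows the ruptures convention."""
    plan = []
    start = 0
    for end in bkps:
        seg = index[start:end]
        rising = seg[-1] > seg[0]  # hypothetical trend rule: last vs. first value
        plan.append((start, end, "maximize_return" if rising else "minimize_risk"))
        start = end
    return plan


polarity = [0, 1, 2, 3, 2, 1, 0, -1, 0, 2, 4, 6]
print(regime_objectives(polarity, [4, 8, 12]))
# [(0, 4, 'maximize_return'), (4, 8, 'minimize_risk'), (8, 12, 'maximize_return')]
```

A rule this simple ignores noise within a regime; the Discussion notes that this step still relies on the user's judgment.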

5.4 Evaluation by Back-Testing

The following metrics were used to assess portfolio performance. For regime periods that cannot be back-tested with the current parameters, the rebalancing-related parameters were uniformly set to 10. Table 8 summarizes the back-testing results for the proposed methods (SSAAM), and Table 9 summarizes those for the comparative methods.

Total return (TR): TR is the total return earned from an investment product within a given period, calculated as TR = valuation amount + cumulative distributions received + cumulative amount sold – cumulative amount bought. This study does not account for taxes or trading commissions.
Maximum drawdown (MDD): MDD is the largest rate of decline from a peak in portfolio value, calculated as MDD = (trough value – peak value)/peak value.
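Both metrics can be computed from a portfolio value series. This sketch assumes a self-financing portfolio with no external cash flows or distributions, in which case total return reduces to the change in value relative to the initial value; like the study, it ignores taxes and commissions. MDD is reported as a negative fraction, following the formula above, and the value path is made up.

```python
from typing import Sequence


def total_return(values: Sequence[float]) -> float:
    """Relative total return, assuming no cash flows in or out of the portfolio."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values: Sequence[float]) -> float:
    """Most negative (trough - peak) / peak, measured against the running peak."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, (v - peak) / peak)
    return worst


path = [100.0, 120.0, 90.0, 130.0, 104.0]
print(round(total_return(path), 4))  # 0.04
print(max_drawdown(path))            # -0.25
```

The drawdown from 130 to 104 (−20%) is smaller than the earlier fall from 120 to 90 (−25%), so MDD reports the latter.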

Table 8 Back-testing (SSAAM)


Table 9 Back-testing (comparison)


For both SSAAM (CPD-EVaR++ and CPD-EVaR+) and the comparison methods (MV, CVaR, and EVaR), the covariance matrix and expected returns were estimated from historical data. The limit on portfolio turnover deviation was fixed at 0.05.
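Estimating expected returns and the covariance matrix from historical data usually means taking the sample mean and the sample covariance of past returns. The sketch below assumes exactly that, with the n − 1 denominator, and uses a hypothetical three-day, two-asset return matrix; the estimator settings actually used in the study are not specified beyond "historical".

```python
from typing import List, Sequence, Tuple


def historical_estimates(returns: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Sample mean vector and sample covariance (n - 1 denominator) of T x N returns."""
    t, n = len(returns), len(returns[0])
    mu = [sum(row[k] for row in returns) / t for k in range(n)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in returns) / (t - 1)
            for j in range(n)] for i in range(n)]
    return mu, cov


r = [[0.01, 0.02],
     [0.03, 0.00],
     [0.02, 0.01]]
mu, cov = historical_estimates(r)
print([round(m, 4) for m in mu])                    # [0.02, 0.01]
print([[round(c, 6) for c in row] for row in cov])  # [[0.0001, -0.0001], [-0.0001, 0.0001]]
```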

As an example, Fig. 4 shows the value of each portfolio, assuming an initial value of 100 and rebalancing every 30 days; SSAAM is shown for the case of switching strategies across five regimes.

6 Discussion

According to Table 8, the more frequent the regular rebalancing, the higher the TR. In addition, the maximum drawdowns ranged between 25% and 45%, which is considered within the range acceptable to the average system trader. The experiment was conducted separately for five and ten regimes. The TR was higher with five regimes, whereas the maximum drawdown was almost the same in both settings. Furthermore, as hypothesized in Sect. 1.3, CPD-EVaR++, which combines risk minimization and return maximization, performed better than the other strategies. Therefore, within this framework, the best practice for managing equity portfolios is to use CPD-EVaR++ with irregular rebalancing at the change points of five regimes in addition to regular rebalancing every 30 days.

Table 9 shows the back-testing results of the comparative methods, obtained with the same parameters as in Table 8. Among these algorithms, EVaR optimization performed best, consistent with the results in Cajas [49]. This may be because EVaR is more computationally efficient than other risk measures such as CVaR in stochastic optimization problems. As in Table 8, the TR tends to increase with the number of rebalancing cycles. The maximum drawdowns ranged from 43 to 47% and remained high on average.

Comparing the strategies in Tables 8 and 9, the TR in Table 8 shows that the strategies with 30-day rebalancing and five regimes perform best in terms of both risk and return. Conversely, strategies with 180-day rebalancing perform poorly. This suggests that strategies with wide rebalancing intervals do not fully exploit the leading information in the text identified in Sect. 5.2.

We next consider how the rebalancing timings are combined. According to the metrics in Table 5, the ten-regime split scores higher as a segmentation. However, the back-test results show that irregular rebalancing with ten regimes is incompatible with periodic rebalancing every 30, 90, or 180 days. Following Ito et al. [8], we combined the two types of rebalancing; the experimental results indicate that this approach is effective for short-term tactical asset allocation but remains problematic for long-term strategic asset allocation.

We next discuss how regime trends are determined. Although the change points are calculated in a data-driven manner by Binary Segmentation Search, the user of the framework must decide whether the trend of each regime is upward or downward. Fully automated models are not very realistic for quantitative fund managers in their investment operations, but this step certainly needs further sophistication and explanatory power.

We now consider using this method in actual trading. Although we used daily financial news, the method is not limited to text issued at high frequency. For example, Taguchi et al. [6] first predicted the polarity index using a bidirectional LSTM and then simulated investment after splitting it into regimes. If individual polarity indexes are created from quarterly earnings reports or analyst reports, which are published irregularly, or from a combination of these texts, first predicting the polarity index and then using the prediction may be effective to some extent.

We also discuss the validity of this method outside the U.S. market. We used MLM scoring, which captures textual tone, to create polarity indexes. Textual tone, the way emotions are expressed, and language systems are likely to differ across countries. It would therefore be interesting to test this method in European and Asian markets in addition to the U.S. market and to measure the contribution of textual information.

7 Conclusion and Future Work

This study demonstrates the utility of financial text for asset allocation with equity portfolios. We used natural language processing and change point detection techniques to create polarity indexes that signal when to rebalance. In the future, we would like to develop a tactical asset allocation strategy that includes other asset classes, such as bonds, in addition to stocks. We also hope to enhance this framework with portfolio management that includes option-based hedging strategies and credit derivatives. Finally, we would like to confirm the effectiveness of this method on other available textual data, such as reports and financial statements published by central banks.

Data Availability

The authors declare that the data supporting the findings of this study are available on Kaggle (https://www.kaggle.com/).


References

Imajo, K., Minami, K., Ito, K., Nakagawa, K.: Deep portfolio optimization via distributional prediction of residual factors. Proc. AAAI Conf. Artif. Intell. 35(1), 213–222 (2021)
Google Scholar
Ito, K., Minami, K., Imajo, K., Nakagawa, K.: Trader-company method: a metaheuristics for interpretable stock price prediction. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS ’21, pp. 656–664. International Foundation for Autonomous Agents and Multiagent Systems, Richland (2021)
Chen, W., Zhang, H., Mehlawat, M.K., Jia, L.: Mean-variance portfolio optimization using machine learning-based stock price prediction. Appl. Soft Comput. 100, 106943 (2021). https://doi.org/10.1016/j.asoc.2020.106943
Article Google Scholar
Komatsu, T., Makimoto, N.: Dynamic investment strategy with factor models under regime switches. Asia Pac. Financ. Mark. 22(2), 209–237 (2015)
Article MATH Google Scholar
Komatsu, T., Makimoto, N.: Linear rebalancing strategy for multi-period dynamic portfolio optimization under regime switches. J. Oper. Res. Soc. Jpn. 61(3), 239–260 (2018)
MathSciNet Google Scholar
Taguchi, R., Watanabe, H., Sakaji, H., Izumi, K., Hiramatsu, K.: Constructing equity investment strategies using analyst reports and regime switching models. Front. Artif. Intell. 5 (2022)
Wood, K., Roberts, S., Zohren, S.: Slow momentum with fast reversion: a trading strategy using deep learning and changepoint detection. J. Financ. Data Sci. 4(1), 111–129 (2021). https://doi.org/10.3905/jfds.2021.1.081
Article Google Scholar
Ito, M., Jo, K., Hibiki, N.: Application of asset allocation models in practice and mutual fund design (in Japanese). Oper. Res. Manag. Sci. 66(10), 683–689 (2021)
Google Scholar
Sakaji, H., Kuramoto, R., Matsushima, H., Izumi, K., Shimada, T., Sunakawa, K.: Financial text data analytics framework for business confidence indices and inter-industry relations. In: Proceedings of the First Workshop on Financial Technology and Natural Language Processing, pp. 40–46 (2019)
Taguchi, R., Watanabe, H., Hirano, M., Suzuki, M., Sakaji, H., Izumi, K., Hiramatsu, K.: Market trend analysis using polarity index generated from analyst reports. In: 2021 IEEE International Conference on Big Data (Big Data), pp. 3486–3494 (2021). https://doi.org/10.1109/BigData52589.2021.9671702
Izumi, K., Sakaji, H.: Economic causal-chain search using text mining technology. In: Proceedings of the First Workshop on Financial Technology and Natural Language Processing, Macao, China, pp. 61–65 (2019). https://aclanthology.org/W19-5510
Ma, Y., Han, R., Wang, W.: Portfolio optimization with return prediction using deep learning and machine learning. Expert Syst. Appl. 165, 113973 (2021). https://doi.org/10.1016/j.eswa.2020.113973
Wang, W., Li, W., Zhang, N., Liu, K.: Portfolio formation with preselection using deep learning from long-term financial data. Expert Syst. Appl. 143, 113042 (2020). https://doi.org/10.1016/j.eswa.2019.113042
Yun, H., Lee, M., Kang, Y.S., Seok, J.: Portfolio management via two-stage deep learning with a joint cost. Expert Syst. Appl. 143, 113041 (2020). https://doi.org/10.1016/j.eswa.2019.113041
Zhang, Z., Zohren, S., Roberts, S.: Deep learning for portfolio optimization. J. Financ. Data Sci. 2(4), 8–20 (2020). https://doi.org/10.3905/jfds.2020.1.042
Seki, K., Ikuta, Y., Matsubayashi, Y.: News-based business sentiment and its properties as an economic index (2021)
Yono, K., Sakaji, H., Matsushima, H., Shimada, T., Izumi, K.: Construction of macroeconomic uncertainty indices for financial market analysis using a supervised topic model. J. Risk Financ. Manag. (2020). https://doi.org/10.3390/jrfm13040079
Wang, H., Guo, Z., Chen, L.: Financial forecasting based on LSTM and text emotional features. In: 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), pp. 1427–1430 (2019). IEEE
Liu, M., Huo, J., Wu, Y., Wu, J.: Stock market trend analysis using hidden Markov model and long short term memory (2021)
Suzuki, M., Sakaji, H., Izumi, K., Matsushima, H., Ishikawa, Y.: Forecasting net income estimate and stock price using text mining from economic reports. Information (2020). https://doi.org/10.3390/info11060292
Suzuki, M., Sakaji, H., Izumi, K., Ishikawa, Y.: Forecasting stock price trends by analyzing economic reports with analyst profiles. Front. Artif. Intell. 103 (2022)
Huang, B., Zhang, K., Zhang, J., Ramsey, J.D., Sanchez-Romero, R., Glymour, C., Schölkopf, B.: Causal discovery from heterogeneous/nonstationary data. J. Mach. Learn. Res. 21(89), 1–53 (2020)
Spirtes, P., Glymour, C., Scheines, R.: Causation, prediction, and search
Chickering, D.M.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3(Nov), 507–554 (2002)
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., Glymour, C.: Generalized score functions for causal discovery. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1551–1560 (2018)
Bühlmann, P., Peters, J., Ernest, J.: CAM: causal additive models, high-dimensional order search and penalized regression. Ann. Stat. 42(6), 2526–2556 (2014)
Hoyer, P., Janzing, D., Mooij, J.M., Peters, J., Schölkopf, B.: Nonlinear causal discovery with additive noise models. Adv. Neural Inf. Process. Syst. 21 (2008)
Peters, J., Mooij, J.M., Janzing, D., Schölkopf, B.: Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15(58), 2009–2053 (2014)
Shimizu, S., Hoyer, P.O., Hyvärinen, A., Kerminen, A., Jordan, M.: A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7(10) (2006)
Shimizu, S., Inazumi, T., Sogawa, Y., Hyvärinen, A., Kawahara, Y., Washio, T., Hoyer, P.O., Bollen, K.: DirectLiNGAM: a direct method for learning a linear non-Gaussian structural equation model. J. Mach. Learn. Res. 12, 1225–1248 (2011)
D’Acunto, G., Bajardi, P., Bonchi, F., De Francisci Morales, G.: The evolving causal structure of equity risk factors. In: Proceedings of the Second ACM International Conference on AI in Finance, pp. 1–8 (2021)
Ohmura, H.: The connection between stock market prices and political support: evidence from Japan. Appl. Econ. Lett. 29(1), 1–7 (2022)
Izumi, K., Sano, H., Sakaji, H.: Economic causal-chain search and economic indicator prediction using textual data. In: Proceedings of the 3rd Financial Narrative Processing Workshop, pp. 19–25 (2021)
Izumi, K., Sakaji, H.: Economic causal-chain search using text mining technology. In: International Joint Conference on Artificial Intelligence, pp. 23–35 (2019). Springer
Nakagawa, K., Sashida, S., Sakaji, H., Izumi, K.: Economic causal chain and predictable stock returns. In: 2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 655–660 (2019). IEEE
Yamanishi, K., Takeuchi, J.-I.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681 (2002)
Moskvina, V., Zhigljavsky, A.: An algorithm based on singular spectrum analysis for change-point detection. Commun. Stat. Simul. Comput. 32(2), 319–352 (2003)
Idé, T., Tsuda, K.: Change-point detection using Krylov subspace learning. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 515–520 (2007). SIAM
Banerjee, S., Guhathakurta, K.: Change-point analysis in financial networks. Stat 9(1), 269 (2020)
Georgescu, V.: Online change-point detection in financial time series: challenges and experimental evidence with frequentist and Bayesian setups 131–145 (2012)
Habibi, R.: Bayesian online change point detection in finance. Financ. Internet Q. 18(1), 27–34 (2022)
Twedt, B., Rees, L.: Reading between the lines: an empirical examination of qualitative attributes of financial analysts’ reports. J. Acc. Public Policy 31(1), 1–21 (2012)
Huang, A.H., Zang, A.Y., Zheng, R.: Evidence on the information content of text in analyst reports. Acc. Rev. 89(6), 2151–2180 (2014)
Hyvärinen, A., Zhang, K., Shimizu, S., Hoyer, P.O.: Estimation of a structural vector autoregression model using non-Gaussianity. J. Mach. Learn. Res. 11(5) (2010)
Truong, C., Oudre, L., Vayatis, N.: Selective review of offline change point detection methods. Signal Process. 167, 107299 (2020)
Fryzlewicz, P.: Wild binary segmentation for multiple change-point detection. Ann. Stat. 42(6), 2243–2281 (2014)
Ahmadi-Javid, A., Fallah-Tafti, M.: Portfolio optimization with entropic value-at-risk. Eur. J. Oper. Res. 279(1), 225–241 (2019)
Cajas, D.: Entropic portfolio optimization: a disciplined convex programming framework. Available at SSRN 3792520 (2021)
Salazar, J., Liang, D., Nguyen, T.Q., Kirchhoff, K.: Masked language model scoring. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2699–2712. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.240
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding (2019)
Bai, J.: Estimating multiple breaks one at a time. Econom. Theory 13(3), 315–352 (1997)
Ahmadi-Javid, A.: An information-theoretic approach to constructing coherent risk measures. In: 2011 IEEE International Symposium on Information Theory Proceedings, pp. 2125–2127 (2011). https://doi.org/10.1109/ISIT.2011.6033932
Ahmadi-Javid, A.: Entropic value-at-risk: a new coherent risk measure. J. Optim. Theory Appl. 155(3), 1105–1123 (2012)
Dickey, D.A., Fuller, W.A.: Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 74(366a), 427–431 (1979)
Boysen, L., Kempe, A., Liebscher, V., Munk, A., Wittich, O.: Consistencies and rates of convergence of jump-penalized least squares estimators. Ann. Stat. (2009). https://doi.org/10.1214/07-aos558
Harchaoui, Z., Lévy-Leduc, C.: Multiple change-point estimation with a total variation penalty. J. Am. Stat. Assoc. 105(492), 1480–1493 (2010)
Polakow, O.: vectorbt (1.4.2) (2022). https://github.com/polakowo/vectorbt/tree/master/vectorbt
Cajas, D.: Riskfolio-Lib (3.0.0) (2022). https://github.com/dcajasn/Riskfolio-Lib


Acknowledgements

This work was supported by JST-Mirai Program Grant Number JPMJMI20B1, Japan. The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding

Open access funding provided by The University of Tokyo.

Author information

Authors and Affiliations

Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Rei Taguchi, Hiroki Sakaji, Kiyoshi Izumi & Yuri Murayama


Corresponding author

Correspondence to Rei Taguchi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

This article is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.

About this article

Cite this article

Taguchi, R., Sakaji, H., Izumi, K. et al. Constructing Sentiment Signal-Based Asset Allocation Method with Causality Information. New Gener. Comput. 41, 777–794 (2023). https://doi.org/10.1007/s00354-023-00231-4


Received: 30 November 2022
Accepted: 19 August 2023
Published: 11 September 2023
Issue Date: November 2023
DOI: https://doi.org/10.1007/s00354-023-00231-4
