1 Introduction

1.1 Background

We determine whether financial text can be beneficial for tactical asset allocation methods using equities. This can be accomplished using natural language processing and statistical causal inference to create rebalancing signals. Numerous studies have been conducted on investment techniques using machine learning. In particular, there is a growing body of research on asset allocation, including the derivation of investment signals and the calculation of investment ratios [1,2,3]. We focus on the point at which stock and portfolio prices change rapidly owing to external factors, that is, the point of regime change. Regimes in finance theory refer to invisible states of the market, such as expansion, recession, bull, and bear. Some studies have attempted to capture market alpha by incorporating these regime changes into investment strategies [4,5,6].

1.2 Focus

We specifically drew on the following two studies. Wood et al. [7] employed a change point detection module to capture regime changes and created a simple and expressive model. Ito et al. [8] developed a method of switching investment strategies in response to market conditions. These studies developed investment strategies that are sufficiently agile to capture regime changes at the time of observation and remain high performers. We go one step further and focus on how to measure future regime changes. If information on future regime changes, that is, future changes in the market environment, is known, active management with a higher degree of freedom becomes possible. However, because there are certain limitations in calculating future regimes using only traditional financial time series data, we construct an investment strategy that combines financial time series data with alternative data, which has been attracting attention in recent years.

1.3 Hypothesis

In recent years, with the explosive development of artificial intelligence (AI), alternative data have received worldwide attention. Such data are particularly prominent in the financial and economic fields, where they are beginning to be widely used for economic forecasting and investment strategies together with traditional data. Text data are notably versatile: existing research includes creating an economic polarity index through sentiment analysis [9, 10] and extracting causal information from text data [11]. Against this background, we formulate the following hypotheses to calculate the point of change in future regimes.

  • Polarity indexes calculated from text data contain information that precedes traditional financial time series data such as stocks.

  • Polarity indexes calculated from text data can serve as a signal to rebalance a portfolio, and this signal can improve portfolio performance.

  • Portfolio performance can be improved by switching between risk-minimizing and return-maximizing optimization strategies according to the change points created by the polarity index.

1.4 Contributions

The proposed investment strategy using financial text is expected to generate better performance than the comparative strategy. This is done by generating polarity indexes from the financial text using natural language processing techniques, identifying the precedence of the generated polarity indexes using statistical causal inference, calculating the regime change points of the polarity indexes using change point detection techniques, and rebalancing the portfolio using multiple mathematical optimization models according to the change points. This study makes the following contributions.

  • Proposed a highly expressive asset allocation framework using financial text mining techniques.

  • Demonstrated that the estimation of regime change points using financial text is material for active management.

  • Demonstrated that the preceding and following relationships between financial time series and text are material for active management.

2 Related Works

2.1 Asset Allocation Using Machine Learning

Studies of frameworks that combine mean-variance models and deep learning include Ma et al. [12] and Wang et al. [13]. Yun et al. [14] proposed a two-stage deep learning model for a forecast-based portfolio management approach. Zhang et al. [15] proposed an asset allocation framework using deep learning to maximize the portfolio Sharpe ratio. In other work, Chen et al. [3] proposed an asset allocation framework using XGBoost and the firefly algorithm. Imajo et al. [1] constructed a network to predict the distribution of the residual term of returns and proposed a method to design portfolios based on the predicted distribution. Ito et al. [2] outlined an evolutionary computation model that mimics the role of a trader, known as the Trader-Company method. This study differs from the aforementioned work in that it uses natural language processing technology in the asset allocation framework.

2.2 Creation of Economic Index Using Text Mining

Seki et al. [16] developed a business confidence index using BERT. Text data was provided by Nihon Keizai Shimbun, Inc. Several other studies have developed business sentiment indices from text data [9, 17]. There is also research on using text data to predict stock closing prices [18, 19]. In addition, research has been conducted on the creation of an economic index using rate information from analyst reports [6, 10]. Several studies use text mining of analyst reports to predict stock prices [20, 21]. The novelty of the current study is that the masked language model (MLM) score is used to create a polarity index that captures the tone of the financial text.

2.3 Causal Inference and Its Applications

Causal structure learning algorithms can be classified into three clusters. The first comprises constraint-based approaches, which use conditional independence tests to establish the existence of an edge between two nodes [22, 23]. The second comprises score-based methods, which use several search procedures to optimize a particular score function [24,25,26]. The third comprises structural causal models, which represent the variable at a specific node as a function of its parents [27,28,29,30,31]. The following are examples of applications of statistical causal inference in the financial field. D’Acunto et al. [32] used VAR-LiNGAM, which incorporates time series into the Linear Non-Gaussian Acyclic Model (LiNGAM), a semiparametric causal inference algorithm, to reveal the causal structure of risk factors in stocks. Ohmura [33] analyzed the relationship between the stock market and political support using VAR-LiNGAM. Research has also examined the construction of networks that link causal relationships regarding performance among firms, known as causal chains [34,35,36]. The current study is novel in that it uses VAR-LiNGAM for empirical analysis of the leading–lagging relationship between stock portfolios and polarity indicators created from financial texts.

2.4 Time Series Change Point Detection and Its Applications

In time series data, temporal non-independence must be explicitly addressed owing to the existence of temporal correlations among the data. For example, models based on autoregression [37] and singular spectrum analysis [38, 39] have been proposed. Several studies apply change point detection algorithms to financial time series analysis [40,41,42]. Wood et al. [7] used a change point detection algorithm with a Gaussian process to construct an investment strategy. The current study is novel in that it uses Binary Segmentation Search, a highly expressive change point detection algorithm, to estimate economic regimes.

3 Task Setting

We propose a tactical asset allocation method that agilely captures market regime changes and reflects them in rebalancing. Asset allocation is performed for a stock portfolio comprising three or more stocks, using the regime change points of a polarity index created from financial news as signals. Using financial news, we developed an investment strategy based on natural language processing and AI techniques in the following four steps. By comparing this investment strategy with comparative strategies, we demonstrate that the framework proposed herein is useful.

3.1 SSAAM Overview

  • Step 1 (Creating polarity index): Financial news titles are scored using MLM scoring. Quartiles are then calculated from the same scores, and each title is classified as positive, negative, or neutral according to the quartile range. The classified values are aggregated daily.

  • Step 2 (Demonstration of leading effects): Statistical causal inference is used to test whether financial news has leading effects on a stock portfolio. The inputs are the polarity index created in Step 1 and a portfolio that combines 10 stocks. We use the VAR-LiNGAM algorithm.

  • Step 3 (Change point detection): Once Step 2 has verified that the polarity index has leading effects, the regime change points of the polarity index are calculated using a change point detection algorithm. We use the Binary Segmentation Search algorithm.

  • Step 4 (Portfolio optimization): Portfolio optimization is performed based on the change points created in Step 3. We use the Entropic value-at-risk (EVaR) optimization algorithm.

Fig. 1 Framework of the proposed method

The architecture used in this study is illustrated in Fig. 1.

3.2 Framework Validity

Studies on which this framework is based include Ito et al. [8] and Wood et al. [7]. We propose a more practical framework by incorporating textual causality into these. In this section, we discuss the validity of each step of the framework. In Step 1, polarity was assigned to the financial text using MLM scoring. This task was set based on research indicating that the tone of U.S. financial reporting affects stock prices [43, 44]. In Step 2, VAR-LiNGAM was selected. It has been highlighted that the VAR framework is at risk of false positives when there are simultaneous effects between variables [45], and this model was selected to address this issue. In Step 3, Binary Segmentation Search was selected. Truong et al. [46] compared change point detection models and revealed that Binary Segmentation Search has higher scalability. We also used the least squared deviation (CostL2) as the cost function; the Step 3 combination is effective for time series analysis in the financial domain [46, 47]. In Step 4, EVaR optimization was selected. Results of a pilot experiment using the S&P 500 [48, 49] established that performance was better with EVaR optimization than with comparative methods such as CVaR optimization.

4 Method

4.1 Creating Polarity Index

We use pseudo-log-likelihood scores (PLLs) to create polarity indices. PLLs, proposed by Salazar et al. [50], are sentence scores that let masked language models (MLMs) play the role of probabilistic language models. Since MLMs are pre-trained by predicting words in both directions, they cannot be handled as conventional probabilistic language models. In contrast, PLLs can determine the naturalness of sentences at a high level, because they are represented by the sum of the log-likelihoods of the conditional probabilities when each word is masked and predicted. Token \(\psi _t\) is replaced with [MASK] and predicted from the remaining tokens \(\varvec{\Psi }_{\backslash t}= [\psi _1,...,\psi _{t-1},\psi _{t+1},...,\psi _{\mid \varvec{\Psi }\mid }]\). Here, t is the token position, \(\Theta \) is a model parameter, and \(P_{MLM} (\cdot )\) is the probability that the MLM assigns to each sentence token. BERT [51] was selected as the MLM.

$$\begin{aligned} {\textbf {PLL}}(\varvec{\Psi }) := \sum ^{\mid \varvec{\Psi }\mid }_{t=1}\log _{2} P_{MLM} (\psi _t \mid \varvec{\Psi }_{\backslash t}; \Theta ). \end{aligned}$$
(1)
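To make Eq. (1) concrete, the following is a minimal sketch of the masking loop. It assumes a hypothetical callable `cond_prob` that stands in for the MLM (BERT in this study); the toy probabilities below are invented purely for illustration and are not outputs of any real model.

```python
import math
from typing import Callable, Sequence

MASK = "[MASK]"


def pseudo_log_likelihood(
    tokens: Sequence[str],
    cond_prob: Callable[[Sequence[str], int, str], float],
) -> float:
    """Sum of log2 P(token_t | sentence with position t masked), as in Eq. (1)."""
    total = 0.0
    for t, target in enumerate(tokens):
        masked = list(tokens)
        masked[t] = MASK  # hide only the token being scored
        total += math.log2(cond_prob(masked, t, target))
    return total


def toy_cond_prob(masked: Sequence[str], position: int, target: str) -> float:
    """Hypothetical stand-in for an MLM; a real model would use the context."""
    return 0.5 if target in {"stocks", "rise"} else 0.25


score = pseudo_log_likelihood(["stocks", "rise", "sharply"], toy_cond_prob)
```

With these toy probabilities the score is log2(0.5) + log2(0.5) + log2(0.25). In practice `cond_prob` would wrap a forward pass of the chosen MLM over subword tokens.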

After preprocessing, the financial news text is scored with PLLs one sentence at a time. Quartile ranges are then calculated from these sentence-level scores. See Table 1 for the polarity classification method.

Table 1 Polarity Classification Method

Aggregate the scores chronologically according to the title column of the financial news.
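The classification and aggregation steps can be sketched as follows. Because the contents of Table 1 are not reproduced here, the thresholds are an assumption: scores above the third quartile are labeled positive (+1), scores below the first quartile negative (-1), and the rest neutral (0). Summing the labels per release date is likewise an assumed aggregation rule, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def classify_by_quartile(scores):
    """Three-value labels from the quartiles of the scores (assumed rule)."""
    q1, _, q3 = quantiles(scores, n=4)
    return [1 if s > q3 else -1 if s < q1 else 0 for s in scores]


def aggregate_daily(dates, labels):
    """Sum the labels of all titles released on the same date."""
    daily = defaultdict(int)
    for date, label in zip(dates, labels):
        daily[date] += label
    return dict(sorted(daily.items()))


# Illustrative PLL scores for eight titles released on two days.
scores = [-40, -30, -20, -10, -5, -35, -15, -25]
dates = ["2019-01-02"] * 3 + ["2019-01-03"] * 5
labels = classify_by_quartile(scores)
polarity_index = aggregate_daily(dates, labels)
```

ISO-formatted date strings sort chronologically, so the resulting dictionary is already ordered by day.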

4.2 Demonstration of Leading Effects

We use VAR-LiNGAM, which is a statistical causal inference model proposed by Hyvärinen et al. [45], to demonstrate precedence. The causal graph inferred by VAR-LiNGAM is as follows.

$$\begin{aligned} {\textbf {x}}(t) = \sum ^{T}_{\tau =1} {\textbf {B}}_{\tau } {\textbf {x}}(t-\tau )+{\textbf {e}}(t) \end{aligned}$$
(2)

where \({\textbf {x}}(t)\) is the vector of variables at time t, and \(\tau \) is the time delay. T is the maximum time delay considered. Additionally, \({\textbf {B}}_{\tau }\) is a coefficient matrix that represents the causal effect of the variables \({\textbf {x}}(t-\tau )\), and \({\textbf {e}}(t)\) denotes the disturbance term. VAR-LiNGAM is implemented by the following procedure. First, a VAR model is fitted to capture the relationships from the lagged variables to the current variables. Second, for the causal relationships among variables at the current time, LiNGAM inference is performed on the residuals of that VAR model. We confirm whether financial news precedes the stock portfolio.
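The two-stage procedure can be written out as below. This is a sketch following the standard presentation of VAR-LiNGAM in Hyvärinen et al. [45], under the assumption that the model also carries an instantaneous coefficient matrix \({\textbf {B}}_{0}\) for \(\tau = 0\). The symbols \({\textbf {M}}_{\tau }\) (VAR coefficients), \({\textbf {n}}(t)\) (VAR residuals), and \({\textbf {I}}\) (identity matrix) are introduced here for illustration and do not appear in Eq. (2).

$$\begin{aligned} \text {Step 1 (VAR):}\quad&{\textbf {x}}(t) = \sum ^{T}_{\tau =1} {\textbf {M}}_{\tau } {\textbf {x}}(t-\tau ) + {\textbf {n}}(t), \\ \text {Step 2 (LiNGAM on residuals):}\quad&{\textbf {n}}(t) = {\textbf {B}}_{0} {\textbf {n}}(t) + {\textbf {e}}(t), \\ \text {Lagged effects:}\quad&{\textbf {B}}_{\tau } = ({\textbf {I}} - {\textbf {B}}_{0}) {\textbf {M}}_{\tau }, \qquad \tau \ge 1. \end{aligned}$$

The last line follows from moving the instantaneous term to the left-hand side: \(({\textbf {I}} - {\textbf {B}}_{0}){\textbf {x}}(t) = \sum _{\tau \ge 1} {\textbf {B}}_{\tau } {\textbf {x}}(t-\tau ) + {\textbf {e}}(t)\), so that \({\textbf {M}}_{\tau } = ({\textbf {I}} - {\textbf {B}}_{0})^{-1} {\textbf {B}}_{\tau }\) and \({\textbf {n}}(t) = ({\textbf {I}} - {\textbf {B}}_{0})^{-1} {\textbf {e}}(t)\).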

4.3 Change Point Detection

Binary Segmentation Search [47, 52] is a greedy sequential algorithm. The notation follows Truong et al. [46]. The operation is greedy in the sense that it first seeks the change point that yields the lowest sum of costs. Next, the signal is divided into two at the position of the first change point, and the same operation is repeated on the resulting sub-signals until the stopping criterion is reached. Binary Segmentation Search is expressed as Algorithm 1. Define a signal \(y=\{y_s\}^S_{s=1}\) that follows a multivariate nonstationary stochastic process. This process has S samples. L refers to the list of change points. Let s denote the value of the change point. G refers to the ordered list of change points to be computed. If signal y is given, the \((b-a)\)-sample long sub-signal \(\{y_s\}^b_{s=a+1}, (1 \le a < b \le S)\) is simply denoted \(y_{a, b}\). Hats represent calculated values. Other notations are noted in the algorithm comments.

Algorithm 1 Binary Segmentation Search Method
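The greedy procedure described above can be sketched in plain Python for a univariate signal. This is an illustrative implementation under stated assumptions, not the authors' code: the cost is the least squared deviation from the segment mean (CostL2), the stopping criterion is a fixed number of change points `n_bkps`, and all function names are hypothetical.

```python
def l2_cost(signal, a, b):
    """Sum of squared deviations from the mean of signal[a:b] (CostL2)."""
    segment = signal[a:b]
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def best_split(signal, a, b):
    """Return (gain, index) of the split of signal[a:b] that lowers the cost most."""
    base = l2_cost(signal, a, b)
    best = (0.0, None)
    for s in range(a + 1, b):
        gain = base - l2_cost(signal, a, s) - l2_cost(signal, s, b)
        if gain > best[0]:
            best = (gain, s)
    return best


def binary_segmentation(signal, n_bkps):
    """Greedily add the most cost-reducing split until n_bkps points are found."""
    bkps = []
    segments = [(0, len(signal))]
    while len(bkps) < n_bkps:
        candidates = []
        for a, b in segments:
            gain, s = best_split(signal, a, b)
            if s is not None:
                candidates.append((gain, s, (a, b)))
        if not candidates:  # no split reduces the cost any further
            break
        gain, s, (a, b) = max(candidates, key=lambda c: c[0])
        bkps.append(s)
        segments.remove((a, b))
        segments.extend([(a, s), (s, b)])
    return sorted(bkps)


# Piecewise-constant toy signal with mean shifts at indices 10 and 20.
signal = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
change_points = binary_segmentation(signal, n_bkps=2)
```

In the ruptures library, the comparable call is `rpt.Binseg(model="l2").fit(signal).predict(n_bkps=k)`; under this sketch's convention, k change points yield k + 1 regimes.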

4.4 Portfolio Optimization

EVaR is a coherent risk measure derived from Chernoff’s inequality that provides an upper bound for both VaR and conditional VaR (CVaR) [53, 54]. EVaR has the advantage of being computationally tractable compared with other risk measures such as CVaR when incorporated into stochastic optimization problems [54]. The definition of EVaR is as follows.

$$\begin{aligned} {\textbf {EVaR}}_{\alpha }(X):=\min _{z>0}\left\{ z \ln \left( \frac{1}{\alpha } M_{X} \left( \frac{1}{z}\right) \right) \right\} . \end{aligned}$$
(3)

X is a random variable, \(M_{X}\) is the moment generating function, \(\alpha \) is the significance level, and z is a variable. Cajas [49] proposed a general convex programming framework for EVaR. We switch between the following two optimization strategies depending on the regime classified in Sect. 4.3.

  • Minimize risk optimization: A convex optimization problem with constraints imposed to minimize EVaR given a level of expected return \(\mu \) (\(\widehat{\mu }\)).

(4)
  • Maximize return optimization: A convex optimization problem imposed to maximize expected return given a level of expected EVaR (\(\widehat{EVaR}\)).

(5)

where q, z, and u are variables, \(K_\mathrm{{exp}}\) is the exponential cone, and T is the number of observations. w is defined as a vector of weights for N assets, r as a matrix of returns, and \(\mu \) as a mean vector of assets.
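To illustrate the risk measure in Eq. (3), the sketch below evaluates EVaR on a finite sample by replacing \(M_{X}\) with the equally weighted sample average of \(\exp (X/z)\) and searching over \(z > 0\). It assumes that the sample contains losses (negative portfolio returns); the search bounds and the use of ternary search are illustrative choices, not the convex programming formulation of Cajas [49].

```python
import math


def evar(losses, alpha=0.05, z_lo=1e-4, z_hi=1e4, iters=200):
    """Evaluate Eq. (3) on an equally weighted sample by a 1-D search over z > 0."""
    n = len(losses)

    def objective(log_z):
        z = math.exp(log_z)
        scaled = [x / z for x in losses]
        m = max(scaled)
        # log of the sample moment generating function M_X(1/z), computed stably
        log_mgf = m + math.log(sum(math.exp(s - m) for s in scaled) / n)
        return z * (log_mgf - math.log(alpha))

    # The objective is convex in z, hence unimodal in log z: ternary search applies.
    lo, hi = math.log(z_lo), math.log(z_hi)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return objective((lo + hi) / 2)


# Illustrative daily losses (negative returns) of a hypothetical portfolio.
losses = [-0.02, 0.01, 0.03, -0.01, 0.05, 0.00, -0.03, 0.02]
evar_50 = evar(losses, alpha=0.5)
evar_05 = evar(losses, alpha=0.05)
```

A smaller significance level gives a more conservative (larger) value, and the result always lies between the sample mean and, up to the search tolerance, the sample maximum of the losses.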

5 Experiments and Results

5.1 Dataset Description

Based on the assumption that financial news has precedence over the equity portfolio, we calculate the signal for portfolio rebalancing and tactical asset allocation to actively go for an alpha. We use two types of data.

  • Stock Data: We used the daily stock data provided by Yahoo! Finance. The stocks used are the components of the NYSE FANG+ Index: Facebook, Apple, Amazon, Netflix, Google, Microsoft, Alibaba, Baidu, NVIDIA, and Tesla. For these data, adjusted closing prices are used. The time period for these data is January 2015 through December 2019.

  • Financial News Data: We used the daily historical financial news archive provided by Kaggle, a data analysis platform. The data represent the historical news archive of U.S. stocks listed on the NYSE/NASDAQ for the past 12 years. The data were confirmed to contain information on the 10 stocks above. The data consist of nine columns and 221,513 rows; the title and release date columns were used in this study. The time period for the data is January 2015 through December 2019.

5.2 Preparation for Back-Testing

The experiments are in-sample validation. All data are used to estimate the polarity index, VAR-LiNGAM, and change points.

The polarity index is created using the method in Sect. 4.1. Financial news data are preprocessed once before creating the polarity index. Both financial news and stock data are in daily units; however, to match the period, if there are blanks in either, lines containing blanks are dropped. The summary statistics for the polarity index created are as follows.

Table 2 Summary statistics for polarity index

Once the polarity index is created as per the method in Sect. 4.1, the next step is to create a stock portfolio by adding up the adjusted closing prices of ten stocks. The investment ratio for the portfolio was set uniformly for all stocks. Next, use VAR-LiNGAM in Sect. 4.2 to perform causal inference.

Here, the Augmented Dickey-Fuller (ADF) test [55] is used to confirm the stationarity of the data. According to Table 3, the p-value is smaller than 0.05, so the null hypothesis of a unit root can be rejected. Therefore, we can say that the data are stationary.

Table 3 Augmented Dickey-Fuller test

Now that stationarity has been verified, the data are used as input to VAR-LiNGAM. The results of causal inference are as follows.

Table 4 Causal inference in VAR-LiNGAM

Values in Table 4 refer to the elements of the adjacency matrix. The lower limit was set at 0.05. The results indicate that the polarity index leads the equity portfolio. The Python library LiNGAM [45] was used.

The polarity index is split at regime turning points according to the method described in Sect. 4.3. The regimes are classified according to whether the index is rising or falling, and from Table 4, we know that the movement of the index is linked to that of the portfolio. As there are only two types of portfolio change points, up and down, the signals created by applying sequential methods such as Binary Segmentation Search to the index are effective for portfolio rebalancing. We use the results of classifying the index into 5 and 10 regimes.

Fig. 2 Change point detection (5 regimes)

Fig. 3 Change point detection (10 regimes)

The following metrics are used to evaluate the performance of change point detection. Some of the notation introduced in Sect. 4.3 is reused. The Python library ruptures [46] was used.

  • Precision: Precision is the fraction of predicted change points that are correct. In the context of change point detection, precision is defined as follows:

$$\begin{aligned} \text {Precision} := \mid \text {TP}\mid /\mid \lbrace \hat{s_l}\rbrace _l\mid . \end{aligned}$$
(6)
  • Hausdorff Metric (HM): HM measures the worst-case prediction error, that is, the largest distance between a true change point and its nearest predicted change point, or vice versa. Studies relevant to change point detection include Boysen et al. [56] and Harchaoui & Lévy-Leduc [57].

$$\begin{aligned} {\textbf {HM}} := \max \lbrace \max _m \min _l \mid s_m - \hat{s}_l\mid , \max _l \min _m \mid \hat{s}_l - s_m\mid \rbrace \end{aligned}$$
(7)

Here, l and m index the predicted and true change points, respectively. TP refers to the true positives and is defined as TP \(:= \lbrace s_m\mid \exists \hat{s}_l \text{ s.t. } \mid \hat{s}_l - s_m\mid <\mathrm{{Mar}} \rbrace \), where Mar is a margin. The results calculated using each metric are presented in Table 5. Note that the authors manually assigned the correct labels necessary for the evaluation.
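Both metrics can be computed directly from two lists of change point indices, as in the following sketch. The function names and the example values are hypothetical; the definitions follow Eqs. (6) and (7), with a true change point counted as a true positive when some predicted change point lies strictly within the margin.

```python
def true_positives(true_cps, pred_cps, margin):
    """True change points that have a predicted change point within the margin."""
    return [s for s in true_cps if any(abs(p - s) < margin for p in pred_cps)]


def precision(true_cps, pred_cps, margin):
    """Eq. (6): number of true positives divided by the number of predictions."""
    return len(true_positives(true_cps, pred_cps, margin)) / len(pred_cps)


def hausdorff(true_cps, pred_cps):
    """Eq. (7): the larger of the two worst-case nearest-neighbor distances."""
    worst_missed = max(min(abs(s - p) for p in pred_cps) for s in true_cps)
    worst_spurious = max(min(abs(p - s) for s in true_cps) for p in pred_cps)
    return max(worst_missed, worst_spurious)


# Illustrative indices: one prediction is close, the other is 50 samples off.
true_cps = [100, 200]
pred_cps = [105, 250]
prec = precision(true_cps, pred_cps, margin=10)
hm = hausdorff(true_cps, pred_cps)
```

Here only the true change point at 100 has a prediction within the margin, so the precision is one half, while the Hausdorff metric is driven by the prediction at 250.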

Table 5 Evaluation of change point detection

5.3 Back-Testing Scenarios

The following rebalancing timings are merged and back-tested. The Python libraries vectorbt [58] and Riskfolio-Lib (Cajas [59]) were used for back-testing. In addition to EVaR optimization, CVaR optimization and the mean-variance model were used as optimization algorithms for the comparative methods. The number of regimes was set at 5 and 10. The regular rebalancing intervals were 30, 90, and 180 days. The back-testing methodology is as follows. CPD-EVaR++ is positioned as the proposed strategy, and CPD-EVaR+ is the runner-up strategy.

  • CPD-EVaR++ (proposed): Change point rebalancing using risk minimization and return maximization EVaR optimization + regular intervals rebalancing strategy

  • CPD-EVaR+: Change point rebalancing using risk minimization and no-restrictions EVaR optimization + regular intervals rebalancing strategy

  • EVaR: EVaR optimization regular intervals rebalancing strategy

  • CVaR: CVaR optimization regular intervals rebalancing strategy

  • MV: Mean-Variance optimization regular intervals rebalancing strategy

The binary determination of whether the polarity index within each regime shows an uptrend or downtrend is made by looking at Figs. 2 and 3. See Tables 6 and 7 for a list of algorithms used for SSAAM back-testing.
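How the two rebalancing timings are merged, and how an optimization strategy is chosen per regime, can be sketched as follows. This is only an illustration under assumptions: day 0 is treated as the first regular rebalancing date, and an uptrend regime is mapped to return maximization and a downtrend regime to risk minimization. The actual assignments are those listed in Tables 6 and 7, and the function names are hypothetical.

```python
def merged_schedule(n_days, interval, change_points):
    """Union of regular rebalancing days and change point days, in order."""
    regular = set(range(0, n_days, interval))
    irregular = {cp for cp in change_points if 0 <= cp < n_days}
    return sorted(regular | irregular)


def objective_for_day(day, change_points, regime_trends):
    """Pick the optimization objective of the regime that contains `day`."""
    regime = sum(1 for cp in change_points if cp <= day)
    return "max_return" if regime_trends[regime] == "up" else "min_risk"


# Two change points split 100 days into three regimes with assumed trends.
change_points = [45, 60]
regime_trends = ["up", "down", "up"]
schedule = merged_schedule(n_days=100, interval=30, change_points=change_points)
plan = [(day, objective_for_day(day, change_points, regime_trends)) for day in schedule]
```

Each entry of `plan` pairs a rebalancing day with the objective that would be passed to the optimizer on that day.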

Table 6 Model Judgment: 5-Regime
Table 7 Model Judgment: 10-Regime

5.4 Evaluation by Back-Testing

The following metrics were employed to assess portfolio performance. For periods of regimes that cannot be back-tested with the current parameters, those related to rebalancing were set uniformly to 10. The results of the back-testing are presented in the following tables: Table 8 summarizes the methodology of this study and Table 9 summarizes the comparative methodology.

  • Total return (TR): TR refers to the total return earned from an investment in an investment product within a given period. The TR formula is as follows: TR = valuation amount + cumulative distribution amount received + cumulative amount sold – cumulative amount bought. This study does not incorporate tax amounts and trading commissions.

  • Maximum drawdown (MDD): MDD refers to the largest rate of decline from the maximum asset value. The MDD formula is as follows: MDD = (trough value – peak value)/peak value.
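The two metrics above can be computed as in the following sketch. The function names and the sample portfolio values are hypothetical; taxes and trading commissions are ignored, as in this study, and MDD is reported as a non-positive fraction in line with the formula above.

```python
def total_return(valuation, distributions, sold, bought):
    """TR = valuation + distributions received + amount sold - amount bought."""
    return valuation + distributions + sold - bought


def max_drawdown(values):
    """Largest decline from a running peak, as (trough - peak) / peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)  # highest portfolio value seen so far
        worst = min(worst, (v - peak) / peak)
    return worst


# Illustrative portfolio values starting from an initial value of 100.
values = [100, 120, 90, 110, 80, 130]
tr = total_return(valuation=values[-1], distributions=0, sold=0, bought=values[0])
mdd = max_drawdown(values)
```

In this example the deepest decline runs from the peak of 120 to the trough of 80, so the maximum drawdown is one third of the peak value.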

Table 8 Back-testing (SSAAM)
Table 9 Back-testing (comparison)

The covariance matrix and expected returns of the SSAAM (CPD-EVaR++ and CPD-EVaR+) were estimated based on historical data. The same is true for the comparison methods (MV, CVaR, and EVaR). Also, the limit of the portfolio turnover deviation is fixed at 0.05.

As an example, the values for each portfolio are presented in Fig. 4. They assume an initial value of 100 and rebalancing every 30 days; SSAAM is calculated for the case of switching strategies in five regimes.

Fig. 4 Portfolio value: 30-days rebalance & 5-Regime

6 Discussion

According to Table 8, the higher the number of regular rebalances, the higher the TR. In addition, the maximum drawdowns hovered between 25% and 45%, which is considered to be within the range of maximum drawdowns acceptable to the average system trader. The experiment was conducted separately when there were five and 10 regimes. The TR was higher when there were five regimes, whereas the maximum drawdown was almost the same for both regimes. Furthermore, as hypothesized (Sect. 1.3), CPD-EVaR++, which is a combination of risk minimization and return maximization operations, performed better than the others. Therefore, using this method, the best practice in managing equity portfolios is to use CPD-EVaR++ and to rebalance irregularly in regime five in addition to the regular rebalancing every 30 days.

The back-testing in Table 9 used the same parameters as in Table 8. The results indicate that, among the algorithms, EVaR optimization performed better than the others, similar to the results in Cajas [49]. This may be because the computational efficiency of EVaR in stochastic optimization problems is higher than that of other risk measures such as CVaR. As in Table 8, the TR tends to be higher as the number of rebalancing cycles increases. The maximum drawdowns ranged from 43 to 47% and appeared to remain high on average.

When comparing the strategies in Tables 8 and 9, the results in Table 8 reveal that the strategies with a rebalancing interval of 30 days and 5 regimes are high performers in both risk evaluation and return evaluation. Conversely, strategies with a rebalancing interval of 180 days are found to have low performance. This can be interpreted as meaning that strategies with wide rebalancing intervals do not fully utilize the leading information contained in the text, as shown in Sect. 5.2.

We address the combination of rebalancing timing. As shown by the metrics in Table 5, regime 10 has a higher rating for regime splitting. In contrast, the back-test results show that irregular rebalancing in regime 10 is incompatible with periodic rebalancing at intervals such as 30, 90, and 180 days. Therefore, following Ito et al. [8], we experimented with combining the two types of rebalancing. According to the experimental results, this method is effective for short-term tactical asset allocation but remains problematic for long-term strategic asset allocation.

We discuss the determination of regime trends. Although the change points were calculated in a data-driven manner through Binary Segmentation Search, it is up to the user of the framework to decide whether the trend of each regime is upward or downward. It is not very realistic for fund managers in quant management to use fully automated models in their investment operations, but further sophistication and explanatory power in this regard are certainly needed.

We discuss the case where this method is used in actual transactions. Daily financial news was used here, but the method is not necessarily limited to textual information issued with high frequency. For example, Taguchi et al. [6] predicted the polarity index once using Bidirectional LSTM as an alternative and simulated the investment after regime splitting. If individual polarity indices are created from quarterly earnings reports or analyst reports that are published irregularly, or if a combination of these texts is used to create polarity indices, the method of predicting a polarity index once and then using it may be effective to some extent.

We discuss the validity of this method outside of the U.S. market. We used MLM scoring, which calculates textual tone, for creating polarity indicators. The tone of the text, the way emotions are expressed, and the language system are considered to be different in each country. In addition to the U.S. market, it may be interesting to test this method for markets in European and Asian countries and measure the contribution of textual information.

7 Conclusion and Future Work

This study demonstrates the utility of financial text for asset allocation with equity portfolios. This was accomplished using natural language processing and change point detection techniques to create polarity indicators to signal for rebalancing. In the future, we would like to develop a tactical asset allocation strategy that mixes stocks as well as other asset classes such as bonds. We also hope to further enhance this framework with portfolio management that includes option-based hedging strategies and credit derivatives. We would like to confirm the effectiveness of this method by examining available textual data, such as reports and financial statements published by central banks.