1 Introduction

The decentralization of information is an inevitable facet of managing a large system. In modern technological systems, agents constantly face the challenge of making decisions under incomplete and asymmetric information. This challenge arises in many different applications, including transportation, cyber-security, communication, energy management, smart grids, and e-commerce.

There is a large body of literature on the issues related to decision making in informationally decentralized systems when the agents are cooperative and jointly wish to maximize a social welfare function, which is usually the sum of their utilities [4, 19]. The problem is more challenging when the agents are selfish and not concerned with society as a whole, but aim only at maximizing their own utilities. Currently, there exist two approaches to the design of efficient multi-agent systems with asymmetric information and selfish/strategic agents. In the following, we briefly describe these approaches.

1. Mechanism design In this approach, each of the strategic agents is assumed to possess some private information. There is a coordinator who wishes to optimize a network performance metric which depends on the agents’ private information. To elicit strategic agents’ true information, the coordinator (he) needs to provide incentives to them so as to align their strategic objectives (e.g., maximization of a strategic agent’s utility) with his own objective (e.g. maximization of social welfare). In this situation, the system’s information structure (who knows what and when) is fixed, and the coordinator’s goal is to design a mechanism that aligns the agents’ incentives with the coordinator’s incentives (see [23, 40, 51, 56,57,58, 69] and references therein).

2. Information design In this approach, the coordinator knows perfectly the evolution of the system’s state, but the decision making is done by the strategic agents who have incomplete/imperfect knowledge of the state. To incentivize strategic agents to take actions that are desirable for the coordinator, he can provide, sequentially over time, information about the system’s state to them. The goal of information provision/disclosure is the alignment of each agent’s objective with the coordinator’s objective. In this situation, the game-form/mechanism is fixed but the system’s information structure is not fixed. It has to be designed by the coordinator through the sequential disclosure/provision of information to the strategic agents so as to serve his goal (see [12, 18, 74] and references therein).

This paper addresses an information design problem. Information design problems are also referred to as Bayesian persuasion problems because the strategic agents are assumed to understand how information is generated/manipulated and react to information in a rational (Bayesian) manner. Therefore, information design problems can be seen as persuading Bayesian agents to act in desirable ways. The coordinator who would like to lead other agents to act as he wants is usually referred to as the principal. The system state is the principal’s private information which can be used to persuade others to serve his goal.

Information design problems in dynamic environments involving a principal and long-term-optimizing strategic agents are challenging (see our discussion in Sect. 3.2). Currently, very little is known about the design of optimal sequential information disclosure policies in dynamic environments with long-term-optimizing agents. Therefore, we focus on a very simple problem that allows us to highlight how sequential information provision/disclosure strategies can be designed in dynamic environments with long-term optimizing strategic agents.

We consider a version of the quickest detection problem [70] with a strategic principal (that we call “he”) and one strategic agent/detector (that we call “she”). These gender pronouns are assigned arbitrarily, only for clear reference to the agents. The detector wants to detect when a two-state discrete-time Markov chain jumps from the “good” state to the “bad” absorbing state. The detector cannot make direct observations of the Markov chain’s state. Instead, she receives, at each time instant, a message from the principal about the Markov chain’s state. The principal observes perfectly the evolution of the Markov chain; his objective is to delay, as much as possible, detection of the jump by the detector. At the beginning of the process, the principal commits to a sequential information disclosure strategy/policy which he announces to the detector. The detector’s knowledge of the policy shapes her interpretation of the messages she receives. For each fixed sequential information disclosure policy of the principal, the detector is faced with a standard quickest detection problem with noisy observations. The principal’s goal is to determine the sequential information disclosure policy that convinces the detector to wait as long as possible before declaring the jump. A precise formulation of this problem is presented in Sects. 2 and 3.

The key features of our problem are:

  • The problem has a dynamic nature;

  • The principal and the agent/detector are long-term optimizers;

  • The principal’s private information varies/evolves over time;

  • The principal has the power of commitment.

These features are not all simultaneously present in any work available so far (see literature survey in Sect. 1.1).

In this paper, we discover an optimal sequential information disclosure strategy for the principal. We prove that it is optimal for the principal to give no information to the detector before a time threshold, run a mixed strategy to confuse the detector at the threshold time, and reveal the true state afterwards. We present an algorithm that determines both the optimal time threshold and the optimal mixed strategy that could be employed by the principal.

We compare the performance of the policy we discover with that of other sequential information disclosure policies (see Sect. 5). The comparison shows that the policy we discover outperforms the sequential information disclosure policies currently available in the literature. We extend our results to the case where the Markov chain has \(n>2\) states, one of which is absorbing.

The contribution of this paper is threefold. (i) The novelty of our model (see the features of our model listed above). (ii) The analytical approach to the solution of the dynamic information design problem. This approach is based on technical arguments that are different from the standard concavification method that is commonly used in the information design literature (see literature survey in Sect. 1.1). (iii) The extension of our results to n-state Markov chains containing one absorbing state.

1.1 Review of Related Works

Information disclosure mechanisms can be seen as a communication protocol between an information transmitter (he) and one or multiple receivers. A significant part of the existing work in this area has studied the nonstrategic case, where the information transmitter and receivers are cooperative and jointly wish to maximize the global utility [50, 54, 75,76,77].

The strategic case of information disclosure mechanisms (Bayesian persuasion), where the transmitter (principal) and receiver have misaligned objectives, is in the tradition of cheap talk [3, 8, 17, 38, 39, 61] and signaling games [34, 71]. In signaling games, the transmitter’s utility depends not only on the receiver’s actions, but also on his type (private information). However, in information design problems neither the principal’s type nor his actions enter his utility directly; they only influence his utility through the effect they have on the receivers’ actions. This feature of the model enables us to investigate/analyze the pure effect of the principal’s private information on the receivers’ behaviors.

The main difference between cheap talk and information design is the level of commitment power they give to the principal. In the cheap talk literature, the principal has no commitment power; he decides on the information message to be sent after seeing the realization of his private information. The cheap talk model induces a simultaneous game between the principal and the agents. Thus, the main goal of this strand of works is to characterize the (Nash) equilibria of the induced game. Most of the existing work has focused on the static setting [3, 8, 17, 61].

The work of [17] shows that when the state of the system (the principal’s private information) is one-dimensional with a compact domain, and both the principal’s and the agent’s utilities satisfy certain properties, stated explicitly in [17], the principal’s equilibrium strategy employs quantization.

More general models of cheap talk, such as multidimensional sources [8], noisy communication [61], and multiple principals with misaligned objectives [3], have been studied in the literature for static settings. There are a few works that study the dynamic version of cheap talk communication [38, 39]. These works show that allowing for dynamic information transmission improves the informativeness of communication.

In information design problems, the transmitter is endowed with full commitment power.

In this model, the transmitter is allowed to send any distribution of messages as a (measurable) function of the history of system states and messages, but he must choose and announce his information disclosure policy before observing the system state and then stay committed to it forever. By committing to an information disclosure policy at the very beginning of the system’s operation, the transmitter attempts to persuade all other agents to employ strategies that achieve his objective.

The fact that a player in a game can improve his outcome through commitment was first established in [66, 67, 72] (see also [41]). The full commitment assumption holds in many economic ([11, 32, 37]) and engineering ([18, 74]) multi-agent systems.

The literature on information design is generally divided into two main categories: static and dynamic information design problems.

Static information design problems The static version of the problem, where the state of the system is fixed and no time is involved, has been studied extensively in the literature. The authors of [1] consider a problem of static information disclosure where the state of the system is Gaussian and the utilities are quadratic, and show that there exists a linear policy for the principal that leads to a Stackelberg equilibrium. In [12, 43], the authors propose a concavification method for deriving an optimal information provision mechanism in static settings. The concavification approach was first developed by a group of U.S. economists and game theorists, led by R. Aumann and M. Maschler, in the context of the negotiations for the Strategic Arms Limitation Treaty (SALT) between the U.S. and U.S.S.R. (see [6]). The approach was developed for problems that can be modeled as repeated games of incomplete information. For over half a century, this idea played an important role in the analysis of repeated games (see [27, 80] and references therein) but was not applied to information design problems until 2011 [43].

Following [43], the information design problem has been studied in more general settings, such as costly communication [33], multi-dimensional state [59, 73], multiple principals [30, 48], multiple receivers [9, 10], and receivers with different prior beliefs [2]. An information design problem with transfers, where the principal can provide both information and monetary incentives to the receiver, is also studied in [47]. There is a group of works in static information disclosure which are more applied and aim to understand or improve real-world institutions via information design. Research in this strand includes applications to grading in schools [16], research procurement [79], medical testing [68], price discrimination [11], insurance [29], and routing software [18, 45, 74, 78, 81]. A thorough discussion of the literature on information design up until 2018 appears in [42].

Dynamic information design problems The dynamic version of the information design problem, where the informed-player/principal can disclose his superior information sequentially over time, has recently been attracting rapidly growing interest. In dynamic environments, agents’ decisions at each instant of time affect their opponents’ decisions, not only at present but also in the future. The problem becomes more intricate if the information transmitter has commitment power. In this case, the information disclosure policy that the principal commits to for the future has a direct effect on the receivers’ estimates of what they can gain in the future, and hence on their current decisions. The interdependency among the agents’ decision-making processes over time makes dynamic information design problems very complex and challenging.

Most of the available works avoid this challenge by assuming that the agents are myopic, that is, at each instant they consider only the immediate consequences of their actions, ignoring the subsequent (future) effects. In [26] and [62], both the information transmitter and receivers are assumed to be myopic. Under this simplifying assumption, [62] shows that the principal’s optimal information disclosure policy is a set of linear functions of the current state.

To make the problem closer to reality, the authors of [14, 15, 21, 49, 60, 63,64,65] impose the myopic assumption only on the information receivers (in [63,64,65], the information receiver is myopic as a result of the model and her objective). This set of works studies the interactions between a long-term-optimizing principal and either a myopic receiver or a sequence of short-lived receivers. In the latter case, at each instant of time a new receiver enters the system, forms her belief about the sender’s strategy by observing the history of past messages, takes her action, and then exits the system. Since the receivers leave the system after taking only one action, they are not concerned about the subsequent effects of their decisions; therefore, considering a sequence of short-lived receivers is exactly equivalent to assuming that the receiver is myopic. Under this assumption, [21] proposes a generalization of the concavification method for dynamic information design problems. The authors of [60] show that when the receiver is myopic, the greedy disclosure policy, in which the principal discloses at each stage the least information subject to maximizing his current payoff, is optimal in many cases, but not always.

There are only a few papers which study the dynamic information design problem with both a long-term-optimizing principal and long-term-optimizing receivers [5, 20, 22, 25, 35, 36, 52, 55, 74]. In [35, 36, 55], the principal is assumed to have no commitment power. This assumption simplifies the problem because, in this case, the principal’s policy at each time instant affects the receiver’s decisions only in the future and not in the past. Although in some of these works communication is costly, they can be seen, due to the lack of commitment, as dynamic versions of the cheap talk problem. The authors in [5, 20, 22, 25, 52, 74] study the dynamic interactions between a long-term-optimizing principal who has full commitment power and a long-term-optimizing receiver. In [5, 20, 22], the private information of the principal is considered to be constant and not varying with time. The problem with time-varying private information for the principal is discussed in [25, 52, 74] for dynamic two-stage settings. The authors of [52] also tackle the information design problem in an infinite-horizon setting. In this setting, they first simplify the problem by restricting attention to a special class of information disclosure mechanisms and then characterize a mechanism that improves the principal’s utility, but is not always optimal.

There are also a few works that combine mechanism design and information design (see [7, 20, 31, 44]). In these papers, the principal chooses both the game form and the information structure so as to persuade the agent to follow the recommendations. The scope of the problems discussed in this line of research is distinctly different from that of our paper.

In our problem, we consider information design in a dynamic setting with an arbitrary time horizon and long-term optimizing principal/transmitter and detector/receiver, when the principal has full commitment power and his private information evolves over time. Because of all these features, our model and problem are distinctly different from those appearing in the literature on information design that we reviewed above.

1.2 Organization of the Paper

The rest of the paper is organized as follows. We present our dynamic information design problem with strategic and long-term-optimizing agents in Sect. 2. In Sect. 3, we formulate the principal’s problem as a dynamic information design problem and discuss its main features. We describe the optimal sequential information disclosure mechanism we propose for the solution to this problem in Sect. 4. In Sect. 5, we show the superiority of our proposed mechanism by comparing it to non-optimal mechanisms that are used in real-world applications. We present extensions of our results to more general settings in Sect. 6. We conclude our paper in Sect. 7. The proofs of all the technical results appear in Appendices 1-9.

1.3 Notation

We use the following notation. Normal font letters x, bold font letters \({\mathbf{x}}\), and calligraphic font letters \({\mathcal {X}}\) stand for scalars, vectors, and sets, respectively. Scalars with a sub-index, e.g., \(x_t\), represent elements of the vector with the same name. \(x_{t_1:t_2}\), with \(t_2 \ge t_1\), represents the vector consisting of the \(t_1\)-th through \(t_2\)-th elements of vector \({\mathbf{x}}\), i.e., \(x_{t_1:t_2}=(x_{t_1},x_{t_1+1}, \ldots , x_{t_2})\). The superscript asterisk (*) indicates an optimal value of the corresponding variable. A vector to the power of an integer, e.g., \({\mathbf{x}}^n\), indicates a vector containing n copies of the original vector \({\mathbf{x}}\). Using this notation, a vector that repeats the same value a, n times, is denoted by \((a)^n\). Notation \(((a)^n, (b)^m)\) represents a vector whose first n elements are a and whose next m elements are b. A set \({\mathcal {X}}\) raised to the power of an integer n, i.e., \({\mathcal {X}}^n\), denotes the Cartesian product of set \({\mathcal {X}}\) with itself n times.

The strategy of the principal is denoted by \(\varvec{\rho }\). For a fixed strategy \(\varvec{\rho }\), notations \(m^{\varvec{\rho }}\) and \({\mathbb {P}}(A|\varvec{\rho })\) represent the messages sent by the principal based on strategy \(\varvec{\rho }\) and the conditional probability of an event A given \(\varvec{\rho }\), respectively. Notation \({\mathbb {E}}^{\varvec{\rho }}_{t_1:t_2} \{f | A\}\) denotes the expected value of function f from time \(t_1\) to \(t_2\) when the principal commits to strategy \(\varvec{\rho }\) and event A has occurred. \(1_{\{A\}}\) is the indicator function of an event A.

2 Model

2.1 General Framework

Consider a Markov chain \(\{s_t, t \in {\mathcal {T}}=\{ 1,2, \ldots , T \}\}\) with state space \(\{g (good),b (bad)\}\) and a one-step transition probability matrix

$$\begin{aligned} P = \begin{pmatrix} 1-q &amp; q \\ 0 &amp; 1 \end{pmatrix}. \end{aligned}$$

The chain starts in the good state with probability \(\mu \), i.e. \({\mathbb {P}}(s_1=g)=\mu \), and then, at each instant of time \(t = 2, \ldots , T\), may switch from g to b with probability q. State b is an absorbing state, meaning that once the Markov chain goes to the bad state, it remains in that state forever. We denote by \(\theta \) the random time at which the Markov chain jumps from the good state to the bad state. The situation where the Markov chain starts in the bad state is captured by setting \(\theta =1\). If the Markov chain remains in the good state until the end of the horizon T, we set \(\theta =T+1\). Therefore, the distribution of the random variable \(\theta \) is as follows:

$$\begin{aligned} {\mathbb {P}} (\theta =\theta ')= {\left\{ \begin{array}{ll} 1-\mu , &amp; \text {if} \,\, \theta '=1,\\ \mu \, (1-q)^{\theta '-2} \, q, &amp; \text {if} \,\, 2 \le \theta ' \le T,\\ \mu \, (1-q)^{T-1}, &amp; \text {if} \,\, \theta '=T+1. \end{array}\right. } \end{aligned}$$
(1)
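
For concreteness, the following Python sketch (our own illustrative code; the function name and parameter values are arbitrary) constructs the distribution (1) and checks that it sums to one.

```python
import numpy as np

def theta_prior(mu, q, T):
    """Prior distribution of the jump time theta, Eq. (1).

    Returns a vector p of length T+1 with p[i] = P(theta = i+1),
    i.e., entries for theta' = 1, ..., T+1.
    """
    p = np.zeros(T + 1)
    p[0] = 1 - mu                                    # theta' = 1: chain starts in b
    for theta in range(2, T + 1):                    # theta' = 2, ..., T
        p[theta - 1] = mu * (1 - q) ** (theta - 2) * q
    p[T] = mu * (1 - q) ** (T - 1)                   # theta' = T+1: no jump by time T
    return p

# Illustrative parameter values; the probabilities sum to one.
p = theta_prior(mu=0.9, q=0.2, T=10)
assert abs(p.sum() - 1.0) < 1e-12
```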

There is a strategic detector (she) in the system who wants to detect the jump to the bad state as accurately as possible. Let \(\tau \) denote the (random) time the detector declares that the jump has occurred. The detector’s cost associated with declaring the jump at time \(\tau \) is

$$\begin{aligned} J^D(\tau ,\theta )=1_{\{\tau < \theta \}}+1_{\{\tau \ge \theta \}} c(\tau -\theta ), \end{aligned}$$
(2)

where \(1_{\{A\}}\) is the indicator function of an event A, which takes value one if A occurs, and zero otherwise. The detector pays one unit of cost if she declares the jump before it actually happens (i.e., false alarm), and pays c units of cost per unit of delayed detection. The goal of the detector is to choose a detection time \(\tau \) so as to minimize the expected value of the cost (2). The detector does not observe the Markov chain’s state \(s_t\), but she receives some information about it from another agent called the principal.

At each instant of time t, the principal (he) observes perfectly the Markov chain’s state \(s_t\) and sends, according to some information transmission/disclosure strategy \(\varvec{\rho }=(\varvec{\rho }_1,\ldots ,\varvec{\rho }_T)\), a message \(m_t\) to the detector. When the detector receives the message \(m_t\), she updates her belief about the state of the system in a Bayesian way (using \(\varvec{\rho }\)), and based on her new belief she decides whether or not to declare that the jump has occurred. Let \(a_t \in \{k,d\}\) denote the detector’s action at time \(t \in {\mathcal {T}}\), where \(a_t=k\) indicates that the detector keeps silent at time t and does not declare a jump, and \(a_t= d\) indicates that she declares that the jump has occurred. For any fixed choice of the principal’s strategy \(\varvec{\rho }\), the detector has to solve a quickest detection problem [70] to find her best sequence of actions. Therefore, the detector’s optimal strategy at each time instant t is described by a threshold; these thresholds are time-varying and depend on the choice of the principal’s strategy \(\varvec{\rho }\).
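
For intuition about the detector’s Bayesian updating, the following sketch (our own illustrative code) computes her posterior belief that the chain is still in the good state along the all-silent message history; for simplicity it assumes a disclosure policy that randomizes on the current state only, which is a special case of the general mechanisms formulated in Sect. 3.1.

```python
import numpy as np

def silent_history_beliefs(mu, q, rho_g, rho_b):
    """Detector's posterior P(s_t = g | m_{1:t} = (k,...,k)) for t = 1,...,T via a
    forward (HMM-style) recursion. rho_g[t-1] and rho_b[t-1] are the assumed
    probabilities of sending message k at time t in the good and bad state,
    respectively (an illustrative, state-only policy)."""
    T = len(rho_g)
    a_g = mu * rho_g[0]            # P(s_1 = g, m_1 = k)
    a_b = (1 - mu) * rho_b[0]      # P(s_1 = b, m_1 = k)
    beliefs = [a_g / (a_g + a_b)]  # assumes the all-silent history has positive probability
    for t in range(1, T):
        a_g, a_b = a_g * (1 - q) * rho_g[t], (a_g * q + a_b) * rho_b[t]
        beliefs.append(a_g / (a_g + a_b))
    return np.array(beliefs)

# No disclosure (always send k): the belief in the good state decays as mu*(1-q)^(t-1).
print(silent_history_beliefs(0.9, 0.2, rho_g=[1.0] * 5, rho_b=[1.0] * 5))
```

Full disclosure corresponds to setting rho_b to zero, in which case a silent message leaves the detector certain that the chain is still in the good state.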

The principal’s objective is to delay detection of the jump. Therefore, utilizing his superior information about the state of the Markov chain, the principal attempts to provide informational incentives to the detector so as to persuade her to keep silent. The principal’s utility is

$$\begin{aligned} U^P(\tau )=\tau -1. \end{aligned}$$
(3)

Assumption 1

The parameters of the model, including transition probability matrix P, prior belief \(\mu \), delay cost c, and action set \(\{k,d\}\), are common knowledge between the principal and the detector. The objectives of the principal and the detector are also common knowledge. The only private information in the model is the Markov chain’s state which is perfectly observed only by the principal and not by the detector.

The detector’s decision depends on her belief about the evolution of the system’s unknown state \(s_t\); this belief depends on the principal’s strategy \(\varvec{\rho }\). Therefore, the principal must design a dynamic (over time) information disclosure mechanism in order to influence the evolution of the detector’s beliefs and therefore her sequence of actions.

Remark 1

Our model is similar to Ely’s model [21] in that they both consider an uninformed agent (detector) who wants to detect the transition of a two-state Markov chain to its absorbing state. In both models, the goal of the information provider is to delay such detection. However, there is a fundamental difference between our model and that of Ely. In Ely’s model, the detector uses a time-invariant decision rule that is characterized by a fixed threshold \(p^*\). Specifically, the detector declares that the jump from the “good” state to the “bad” state has occurred at the first time instant \(\tau \) at which her (posterior) belief that the Markov chain is in the bad state exceeds \(p^*\). Such a decision strategy is not optimal for either a finite-horizon or an infinite-horizon quickest detection problem. In the finite T-horizon quickest detection problem, the detector’s optimal decision rule is characterized by a sequence of time-varying thresholds that depend on the parameter c (see Eq. (2)) and on the functional form of her observations (i.e., the principal’s information disclosure strategy). In the infinite-horizon quickest detection problem, the detector’s optimal decision rule is characterized by a time-invariant threshold as long as the functional form of her observations (i.e., the principal’s information disclosure strategy) is time-invariant. It turns out that in our problem, as well as in Ely’s problem [21], the principal’s (optimal) information disclosure strategy is not time-invariant; therefore, the detector’s optimal decision strategy is not characterized by a time-invariant threshold. In our problem, the detector’s decision rule is the optimal decision rule for the T-horizon quickest detection problem where the functional form of her observations is the principal’s optimal information disclosure strategy.

In Sect. 3, we formulate the above-mentioned problem as a dynamic information design problem and present a dynamic information disclosure mechanism that maximizes the principal’s utility. But before that, we briefly present some applications that motivate the model of this section and illustrate some of the fundamental issues in dynamic information design.

2.2 Motivating Applications

As pointed out in Remark 1, our model is similar to that of Ely [21]. Thus, it is motivated by applications similar to those described in [21], specifically, by the manner the information technology department of an organization (principal) provides information to the organization’s employees (detectors) over time so as to influence their productivity, and by the manner a bank (principal) releases, over time, reports about its financial status so as to convince its customers (agents) not to withdraw their savings.

In general, this model captures fundamental issues arising in more general strategic information transmission problems, where one strategic agent (the principal) has superior information about the status of a system as compared to other agents. In these situations, through appropriate observable actions/signaling, the principal attempts to control the other agents’ perception of the system’s status so as to induce them to act in line with his own interest. Such problems frequently occur in cyber-physical systems, such as power, transportation, and security systems.

3 The Dynamic Information Design Problem

3.1 Problem Formulation

A dynamic information disclosure mechanism specifies the set of messages \({\mathcal {M}}_t\) that the principal sends to the detector at each instant of time \(t\in {\mathcal {T}}\), along with a distribution over \({\mathcal {M}}_t\) given all the information available to the principal at time t. The principal’s information at time t consists of (1) the history of evolution of the state (which the principal observes perfectly) up to time t, i.e. \(s_{1:t}\), (2) the history of his past messages to the detector, i.e. \(m_{1:t-1}\), and (3) the history of the detector’s past actions, i.e. \(a_{1:t-1}\). The set of dynamic information disclosure mechanisms is completely general: it includes the extremes of full information, no information, and all conceivable intermediate mechanisms. The full information mechanism is the rule that reveals perfectly the current state \(s_t\), i.e., \({\mathcal {M}}_t=\{g,b\}\) and \({\mathbb {P}} (m_t=s_t)=1\). A no-information mechanism is obtained when the set of messages \({\mathcal {M}}_t\) has only one element and the principal sends that single message irrespective of the system state.

As is clear from the above examples, a dynamic information disclosure mechanism can be very complicated, since there is no restriction on the set of messages \({\mathcal {M}}_t\) or the probability distribution on \({\mathcal {M}}_t\), \(t \in {\mathcal {T}}\). However, it is shown in [43] that there is no loss of generality in restricting attention to direct dynamic information disclosure mechanisms that are obedient.

The intuition behind this result is that for any information disclosure mechanism \(\Gamma \), we can construct an obedient direct information disclosure mechanism that achieves the same performance as \(\Gamma \). Thus, in this paper, we concentrate on direct dynamic information disclosure mechanisms that are obedient.

In a direct information disclosure mechanism, at each instant of time, the principal directly recommends to the detector the action she should take. In our problem, the detector’s possible actions at each time t are to either keep silent (\(a_t=k\)) or declare the jump (\(a_t=d\)). Therefore, the set of messages used by the principal at each time t is \({\mathcal {M}}_t={\mathcal {M}}=\{k,d\}\), where k is a recommendation to keep silent, and d is a recommendation to declare a jump. As a result, the principal’s behavior in a direct information disclosure mechanism can be described by a recommendation policy \(\varvec{\rho }=(\rho _{t}^{s_{1:t},m_{1:t-1},a_{1:t-1}}, t \in {\mathcal {T}})\), where \(\rho _{t}^{s_{1:t},m_{1:t-1},a_{1:t-1}}\) is the probability according to which the principal sends message k to the detector (i.e., recommends her to keep silent), when the sequence of the states he has observed up to time t is \(s_{1:t}=(s_1,\ldots ,s_t)\), the history of the past messages he has sent is \(m_{1:t-1}=(m_1,\ldots ,m_{t-1})\), and the history of the detector’s past actions is \(a_{1:t-1}=(a_1,\ldots ,a_{t-1})\). For each \(t \in {\mathcal {T}}\), \(s_{1:t} \in \{g,b\}^{t}\), and \(m_{1:t-1}, a_{1:t-1} \in \{k,d\}^{t-1}\), the principal sends message d to the detector with probability \(1-\rho _{t}^{s_{1:t},m_{1:t-1},a_{1:t-1}}\). There are two features in our problem that help us to simplify the principal’s information.

(1) At each instant of time, the detector has two actions, one of which (i.e., declaring the jump) terminates the whole process. Therefore, making a recommendation at time t is meaningful for the principal only if the detector kept silent at all previous times, i.e., \(a_{1:t-1}=(k)^{t-1}\). Thus, \(a_{1:t-1}\) can be omitted from the principal’s information.

(2) The bad state of our Markov chain is an absorbing state. Therefore, the state evolution of the Markov chain until time t is of the form \(s_{1:t}=((g)^{\theta _t-1},(b)^{t-\theta _t+1})\), where \(\theta _t\) can take any integer value between 1 and \(t+1\). We define \(\theta _t\) as the earliest possible time for the jump based on the principal’s information at time t, i.e. \(\theta _t=\min {(\theta ,t+1)}\). The parameter \(\theta _t\) is equal to \(\theta \) if the jump has occurred up to time t; however, it takes the value \(t+1\) when the principal finds the Markov chain in the good state at time t. Because of this feature, we can represent the recommendation policy \(\rho _{t}^{s_{1:t},m_{1:t-1}}\) of each time t by \(\rho _{t}^{\theta _t,m_{1:t-1}}\), where \(\theta _t \in \{1, \ldots , t+1 \}\).

When the principal designs an information disclosure mechanism, he announces the corresponding recommendation policy \(\varvec{\rho }\) to the detector and commits to it.

The detector is strategic and long-term optimizing; she utilizes the information she receives from the principal to her own advantage and does not necessarily follow the actions recommended by the principal. Therefore, to achieve his goal, the principal must design an information disclosure mechanism that possesses the obedience property, that is, it provides the detector with strong enough incentives to follow his recommendations.

At each time \(t \in {\mathcal {T}}\), the long-term-optimizing detector obeys the principal if the recommended action \(m_t\) satisfies the following set inclusion:

$$\begin{aligned} m_t \in \mathop {\mathrm{arg~min}}\limits _{a_t}{\left[ \min _{\gamma _{t+1:T}}{{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}\}}\right] }, \, \forall t, m_{1:t}, \end{aligned}$$
(4)

where \(\gamma _{t+1:T}\) denotes her decision strategy profile from time \(t+1\) up to T, and \(\min _{\gamma _{t+1:T}}{{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}\}}\) is her minimum expected continuation cost conditional on her information \(m_{1:t}\) when the principal commits to the information disclosure strategy/mechanism \(\varvec{\rho }\). Constraint (4) ensures that at each time \(t \in {\mathcal {T}}\), following the principal’s recommendation \(m_t\) minimizes the detector’s expected continuation cost. Therefore, to check the obedience property of a direct information disclosure mechanism at any time t, \(t=1,2,\ldots ,T\), we need to solve the series/sequence of optimization problems

$$\begin{aligned} \min _{a_t}{\left[ \min _{\gamma _{t+1:T}}{{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}\}}\right] }, \end{aligned}$$
(5)

for all \(m_{1:t}\). Solving these optimization problems is a challenging and time-consuming task. However, the one-shot deviation principle allows us to derive the obedience constraints by assuming that the detector sticks to the obedient strategy in the future. Specifically, considering a fixed direct information disclosure mechanism, the whole process can be seen as a finite extensive-form game between the detector and nature (principal), where at each stage t nature sends a signal \(m_t\) to the detector according to \(\rho _t\) and then the detector takes an action \(a_t\). With this interpretation, the set of obedience constraints (4) is a necessary and sufficient set of conditions for the strategy profile \(a_t=m_t\), \(t \in {\mathcal {T}}\), to be a subgame-perfect Nash equilibrium (SPNE) [53]. According to the one-shot deviation principle, a strategy profile \(\gamma \) is an SPNE if and only if no player can improve her payoff by deviating from \(\gamma \) for one period and then reverting to it [28]. Therefore, a simpler version of the obedience constraints is as follows:

$$\begin{aligned} m_t \in \mathop {\mathrm{arg~min}}\limits _{a_t}{\left[ {\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\}\right] }, \end{aligned}$$
(6)

for all t and \(m_{1:t}\), where the condition \(a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\) of the expectation reflects the fact that the detector obeys the recommendations from time \(t+1\) onward, and the notation \(m^{\varvec{\rho }}_{t+1:T}\) indicates that the messages \(m_{t+1:T}\) are generated according to the policy \(\varvec{\rho }\).

Equation (6) includes \(\sum _{t=1}^T{2^t}=2(2^T-1)\) constraints. These constraints force the detector to obey the recommendations after each message history \(m_{1:t}\). However, due to the nature of our problem, the first time the detector is recommended to declare a jump in an obedient mechanism, she will do so and the process will terminate. Therefore, the message histories with at least one d in the past are not going to occur and need not be checked. This feature reduces the obedience constraints that need to be considered to the following:

$$\begin{aligned}&{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}=(k)^t, a_t=k, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\} \nonumber \\&\quad \le {\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}=(k)^t, a_t=d\}, \forall t \in {\mathcal {T}}, \end{aligned}$$
(7)
$$\begin{aligned}&{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}=((k)^{t-1},d), a_t=k, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\} \nonumber \\&\ge {\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}=((k)^{t-1},d), a_t=d\}, \forall t \in {\mathcal {T}}, \end{aligned}$$
(8)

(2T constraints).

Now that we have derived the obedience constraints, we proceed to calculate the principal’s utility. When the detector follows the recommendation policy \(\varvec{\rho }\), the expected utility the principal gets is

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}^{\varvec{\rho }} \{U^P(\tau )\}={\mathbb {E}}^{\varvec{\rho }} \{\tau -1\}={\mathbb {E}}^{\varvec{\rho }}\left\{ \sum _{t=1}^T{1_{\{a_{1:t} = (k)^t\}}}\right\} \\&\quad =\sum _{t=1}^T{{\mathbb {P}}(a_{1:t} = (k)^t| \varvec{\rho })}=\sum _{t=1}^T{{\mathbb {P}}(m^{\varvec{\rho }}_{1:t} = (k)^t)}\\&\quad =\sum _{t=1}^T{\sum _{\theta '=1}^{T+1}{{\mathbb {P}}(\theta =\theta ') \prod _{t'=1}^t{\rho _{t'}^{\min {(\theta ',t'+1)},(k)^{t'-1}}}}}. \end{aligned} \end{aligned}$$
(9)
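
For a numerical sanity check, (9) can be evaluated directly for any recommendation policy along the all-silent history. The following sketch is illustrative (the interface rho(t, theta_t) is ours; it reuses theta_prior from the sketch in Sect. 2.1):

```python
def principal_expected_utility(mu, q, T, rho):
    """Evaluate Eq. (9): rho(t, theta_t) is assumed to return the probability of
    recommending 'keep silent' at time t along the all-silent history when
    theta_t = min(theta, t+1)."""
    p = theta_prior(mu, q, T)                      # Eq. (1), from the earlier sketch
    total = 0.0
    for theta in range(1, T + 2):                  # theta' = 1, ..., T+1
        prob_silent_so_far = 1.0
        for t in range(1, T + 1):
            prob_silent_so_far *= rho(t, min(theta, t + 1))
            total += p[theta - 1] * prob_silent_so_far
    return total

# Full disclosure: recommend silence iff the chain is still good (theta_t = t+1);
# the value then equals sum_t P(theta > t).
full_info = lambda t, theta_t: 1.0 if theta_t == t + 1 else 0.0
print(principal_expected_utility(mu=0.9, q=0.2, T=10, rho=full_info))
```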

Therefore, we can formulate the information design problem (Problem P) for the principal as follows:

$$\begin{aligned} \text {(P)}: \qquad \max _{\varvec{\rho }} \;\; {\mathbb {E}}^{\varvec{\rho }} \{U^P(\tau )\} \quad \text {subject to the obedience constraints (7)–(8).} \end{aligned}$$

That is, the principal wants to choose a feasible dynamic information disclosure mechanism \(\varvec{\rho }\) that satisfies the obedience constraints (7)–(8) and maximizes his expected utility given by (9).

3.2 Features of the Problem

Solving the optimization problem (P) is a formidable task, as the optimization variables are strongly coupled through many non-convex constraints. This strong coupling can be seen by taking a closer look at the expectations appearing in the obedience constraints (7)–(8). According to (2), the detector’s expected continuation cost from time t onward, when she has received messages \(m_{1:t}\) and decides to declare the jump at time t, is

$$\begin{aligned} {\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}, a_t=d\}= {\mathbb {P}} (s_t=g | m_{1:t}), \end{aligned}$$
(10)

i.e., the probability of false alarm at t. If the detector decides to keep silent at time t and sticks to the obedient strategy in the future, her expected continuation cost is

$$\begin{aligned}&{\mathbb {E}}^{\varvec{\rho }}_{t:T} \{J^D(\tau ,\theta ) | m_{1:t}, a_t=k, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\} \nonumber \\&= c (1-{\mathbb {P}} (s_t=g | m_{1:t})) + {\mathbb {E}}^{\varvec{\rho }}_{t+1:T} \{J^D(\tau ,\theta ) | m_{1:t}, a_t=k, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\}, \end{aligned}$$
(11)

where the first term on the right-hand side is the expected delay cost at time t, and the second term denotes the expected value of all the future costs from time \(t+1\) onward. Substituting (10)–(11) in (7)–(8) shows that at each time t, the detector’s decision to obey or disobey the principal’s recommendation depends on two factors:

(i) the belief the detector has about the good state of the Markov chain, i.e., \({\mathbb {P}} (s_t=g| m_{1:t})\); this belief is constructed according to Bayes’ rule from the past messages \(m_{1:t}\) she received, and hence from the past recommendation rules \(\rho _{t'}^{\theta _{t'},m_{1:t'-1}}\) (\(t' \le t\)) that generate these messages;

(ii) the cost the detector expects to incur in the future if she remains silent at time t and follows the recommendations afterwards, i.e., \({\mathbb {E}}^{\varvec{\rho }}_{t+1:T} \{J^D(\tau ,\theta ) | m_{1:t},a_t=k, a_{t+1:T}=m^{\varvec{\rho }}_{t+1:T}\}\). This expected cost depends on the messages the detector expects to receive in the future. These messages depend on the future recommendation rules to which the principal commits, i.e. \(\rho _{t'}^{\theta _{t'},m_{1:t'-1}}\), \(t' > t\), \(\forall \theta \).

Therefore, the detector’s decision at each time t depends not only on the recommendation policy for time t, but also on the recommendation policies for all times before and after t. The dependence of the detector’s decision at each time t on the recommendation policy for the entire horizon makes the discovery of an optimal information disclosure mechanism that satisfies the obedience property very challenging. This is mainly because any change in the principal’s recommendation policy at any time t affects the detector’s decisions at all times. Therefore, the principal cannot optimize the recommendation policies of different time slots separately, just by considering the obedience constraints at that time; instead, he needs to optimize the recommendation policies of the whole horizon simultaneously.

4 An Optimal Sequential Information Disclosure Mechanism

In this section, we present the main result of the paper, namely a dynamic information disclosure mechanism that solves the principal’s problem, expressed by Problem (P). The mechanism is described in Theorem 1. To state Theorem 1, we need the following definitions.

Definition 1

Define by \(\Gamma ^M\) the class of information disclosure mechanisms \(\varvec{\rho }=(\rho _{t}^{s_{t},m_{1:t-1}}, t \in {\mathcal {T}})\) where \(\rho _{t}^{s_{t},m_{1:t-1}}\) depends on the message profile \(m_{1:t-1}\) received by the detector up to time \(t-1\) and only on the current state \(s_{t}\) of the Markov chain at time t (not on the state evolution \(s_{1:t}\)).

Definition 2

A mechanism \(\varvec{\rho }=(\rho _{t}^{s_{t},m_{1:t-1}}, t \in {\mathcal {T}}) \in \Gamma ^M\) is called time-based prioritized if:

  (i) \(\rho _{t}^{g,m_{1:t-1}}=1\) for all t and all \(m_{1:t-1}\);

  (ii) there is no time t, \(t \in {\mathcal {T}}\), such that \(\rho _{t+1}^{b,(k)^{t}} >0\) while \(\rho _t^{b,(k)^{t-1}} <1\), where \((k)^{t}=(k,\ldots ,k)\) is a vector of length t with all components equal to k; and

  (iii) for all \(t \in {\mathcal {T}}\), \(\rho _t^{b,m_{1:t-1}}\) is arbitrary, \(0 \le \rho _t^{b,m_{1:t-1}} \le 1\), when \(m_{1:t-1} \ne (k)^{t-1}\).

We denote by \(\Gamma ^{MP}\) the class of time-based prioritized mechanisms. In an information disclosure mechanism \(\varvec{\rho } \in \Gamma ^{MP}\), the priority of keeping the detector silent at each time \(t > \theta \) is higher than that of keeping her silent at \(t+1\). Therefore, if \(s_t=b\), the principal does not put any effort into manipulating the detector’s information at \(t+1\) unless there is no room left for improving his own performance at time t. As a consequence of Definition 2, for each mechanism \(\varvec{\rho } \in \Gamma ^{MP}\), there is a threshold \(n_p\) such that

$$\begin{aligned} \rho _t^{b,(k)^{t-1}}= {\left\{ \begin{array}{ll} 1, &amp; t < n_p,\\ 0, &amp; t > n_p,\\ q_{n_p}, &amp; t=n_p, \quad q_{n_p} \in \left[ 0,1\right]. \end{array}\right. } \end{aligned}$$
(12)

Therefore, any \(\varvec{\rho } \in \Gamma ^{MP}\) can be uniquely described by two parameters \(n_p\) and \(q_{n_p}\). We denote any time-based prioritized mechanism with parameters \(n_p\) and \(q_{n_p}\) as \(\varvec{\rho }=TbP(n_p,q_{n_p})\).
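
To make the parametrization (12) concrete, the following simulation sketch (our own illustrative code, reusing theta_prior from Sect. 2.1) draws the jump time, generates recommendations of a \(TbP(n_p,q_{n_p})\) mechanism along the all-silent history, and records the principal's realized utility under the assumption that the detector obeys every recommendation; its empirical mean approaches the closed-form expression in (15) below (of which (13) is the special case at the optimal parameters).

```python
import numpy as np

def tbp_recommendation(t, s_t, n_p, q_np, rng):
    """Recommendation of TbP(n_p, q_np) at time t along the all-silent history,
    following Definition 2 and Eq. (12): 'k' = keep silent, 'd' = declare."""
    if s_t == 'g' or t < n_p:
        return 'k'
    if t > n_p:
        return 'd'
    return 'k' if rng.random() < q_np else 'd'       # randomize only at t = n_p

def simulate_tbp(mu, q, T, n_p, q_np, rng):
    """One run of the interaction, assuming an obedient detector; returns the
    principal's realized utility tau - 1 (the number of silent periods)."""
    theta = rng.choice(np.arange(1, T + 2), p=theta_prior(mu, q, T))   # jump time, Eq. (1)
    for t in range(1, T + 1):
        s_t = 'g' if t < theta else 'b'
        if tbp_recommendation(t, s_t, n_p, q_np, rng) == 'd':
            return t - 1                             # obedient detector declares at time t
    return T                                         # detector stays silent the whole horizon

rng = np.random.default_rng(0)
runs = [simulate_tbp(0.9, 0.2, 10, n_p=4, q_np=0.5, rng=rng) for _ in range(50000)]
print(np.mean(runs))                                 # compare with Eqs. (13)/(15)
```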

Theorem 1

Without loss of optimality, in Problem (P) the principal can restrict attention to time-based prioritized mechanisms. Determining an optimal time-based prioritized mechanism \(\varvec{\rho }^*\) for Problem (P) is equivalent to finding \(n_p^*+q_{n_p}^*=\max {\left\{ n_p+q_{n_p}\right\} }\) such that the mechanism \(TbP(n_p,q_{n_p})\) satisfies all the obedience constraints. The parameters of the optimal time-based prioritized information disclosure mechanism can be obtained by Algorithm 1.

Algorithm 1 Computation of the parameters \(n_p^*\) and \(q_{n_p}^*\) of an optimal time-based prioritized mechanism

The principal’s expected utility at \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\) is

$$\begin{aligned} {\mathbb {E}}^{\varvec{\rho }^*} \{U^P\}= n_p^*-1+{\mathbb {P}}(\theta \le n_p^*) q_{n_p}^*+\sum _{t=n_p^*}^T{{\mathbb {P}}(\theta > t)}. \end{aligned}$$
(13)

Outline of the proof of Theorem 1

Theorem 1 provides us with an optimal sequential information disclosure mechanism. We prove this theorem in three steps.

In the first step, we reduce the complexity of the optimization problem (P) by reducing/simplifying the domain of \(\varvec{\rho }\). To this end, we show that:

(1) the recommendation policy the principal uses at any time t after having advised the detector at least once before t to declare the jump plays no role in Problem (P);

(2) without loss of optimality, the principal can restrict attention to mechanisms \(\varvec{\rho }=(\rho _{t}^{s_{t},m_{1:t-1}}, t \in {\mathcal {T}})\in \Gamma ^M\). In this class of mechanisms, the recommendation policy at any time t depends only on the state \(s_{t}\) of the Markov chain and the principal’s previous messages \(m_{1:t-1}\) (not on the state evolution \(s_{1:t-1}\)).

In the second step, we prove that \(\rho _{t}^{g,m_{1:t-1}}=1\), for \(t \in {\mathcal {T}}\) and all \(m_{1:t-1}\). Thus, when the Markov chain is in the good state, the principal always recommends the detector to wait.

In the third step, we use the results of the first two steps to prove that restricting attention to the class of time-based prioritized mechanisms is without loss of optimality. Then, we determine an optimal solution for the dynamic information disclosure problem (P) in this class.

Proof of Theorem 1: We prove the lemmas appearing in each step in the Appendices.

Step 1 As we discussed in Sect. 3, the principal’s behavior in a direct dynamic information disclosure mechanism is described by a recommendation policy \(\varvec{\rho }=(\rho _{t}^{\theta _t,m_{1:t-1}}, t \in {\mathcal {T}})\), where \(\rho _{t}^{\theta _t,m_{1:t-1}}\) is the probability with which the principal recommends the detector to keep silent at time t, when \(\theta _t=\min {(\theta ,t+1)}\) and the message profile that has been sent so far is \(m_{1:t-1}\). For each time t, \(\theta _t\) can take any integer value between 1 and \(t+1\). Moreover, the principal’s message may take two values at each time slot, so there can be \(2^{t-1}\) different message profiles at time t. Therefore, to design an optimal direct dynamic information disclosure mechanism we need to determine the optimal values of \(\sum _{t=1}^T{2^{t-1}(t+1)}=2^T T\) different variables. This number grows exponentially with the horizon length T. Furthermore, these variables are coupled with one another through the obedience constraints (7)–(8); hence, their optimal values must be determined simultaneously. Since such a determination is a difficult problem, our goal in this step is to reduce the number of design variables without any loss of optimality. We achieve our goal via the results of Lemmas 1 and 2. Before stating Lemmas 1 and 2, we define the following function that counts the number of time epochs at which the detector was recommended to declare the jump in the past.

Definition 3

For each time t and each message history \(m_{1:t-1}\), we define \({\mathcal {N}}_{d}(m_{1:t-1})\) as the number of time slots where message d has been sent to the detector.

Lemma 1

For each time t, the recommendation policy \(\rho _{t}^{\theta _t,m_{1:t-1}}\) needs to be designed only for message profiles \(m_{1:t-1}\) with \({\mathcal {N}}_{d}(m_{1:t-1}) \le 1\). The part of the recommendation policy related to cases where the message d has been sent more than once has no effect on either the obedience property of the mechanism or the utility it yields to the principal.

As a result of Lemma 1, an optimal recommendation policy must determine the optimal values of \(\sum _{t=1}^T{t(t+1)}=\frac{1}{3} T (T+1)(T+2)\) variables. This number grows polynomially rather than exponentially with the horizon length T, so the complexity of the information disclosure problem is significantly reduced.

Lemma 2

Without loss of optimality, for each message profile \(m_{1:t-1}\), the principal can restrict attention to recommendation policies that depend, at each time t, on the current state \(s_t\) of the Markov chain and not on the exact time the jump has occurred.

As a result of Lemma 2, at each time t and for each message profile \(m_{1:t-1}\), the principal needs to consider only two recommendation strategies, one for \(s_t=g\) and the other for \(s_t=b\). Therefore, the total number of design variables is \(\sum _{t=1}^T{2t}=T(T+1)\), which grows as \(T^2\) rather than \(T^3\).

Step 2 We derive an optimal recommendation strategy for the principal when the Markov chain is in the good state.

Lemma 3

If at any time t the Markov chain is in the good state, irrespective of the message profile \(m_{1:t-1}\), it is always optimal for the principal to recommend the detector to keep silent. That is

$$\begin{aligned} \rho _{t}^{* \, g,m_{1:t-1}}=1, \forall m_{1:t-1}, \forall t \in {\mathcal {T}}, \end{aligned}$$
(14)

The result of Lemma 3 is intuitive, because when the state of the Markov chain is good there is no conflict of interest between the principal and the detector. In this state, the principal wants to prevent the detector from declaring a jump, and the detector herself has no incentive to create a false alarm. Therefore, there is no incentive for the principal to mislead the detector.

As a result of Lemma 3, when the detector is recommended to declare a jump, she is absolutely sure that the Markov chain is in the bad state, and thus she declares a jump. Therefore, the obedience constraints (8) corresponding to situations where the detector receives recommendation \(m_t=d\) are automatically satisfied and can be neglected in the rest of the design process. Moreover, if any message d has been sent by the principal in the past, the detector declares a jump right after receiving d and the whole process terminates. Therefore, we do not need to design a recommendation policy for message profiles that contain at least one d. Consequently, the only variable we need to design at any time t is the probability of recommending the detector to declare a jump when \(s_t=b\) and \(m_{1:t-1}=(k)^{t-1}\). Finding the optimal values of these variables is the subject of the next step.

Step 3 First, we show that, without loss of optimality, the principal can restrict attention to time-based prioritized mechanisms. Then, we determine such an optimal mechanism.

Lemma 4

Without loss of optimality, the principal can restrict attention to the class of time-based prioritized mechanisms.

We proceed now to complete the proof of our main result (Theorem 1). The expected utility of a principal who uses a time-based prioritized mechanism \(\varvec{\rho }=TbP(n_p,q_{n_p})\) is

$$\begin{aligned} {\mathbb {E}}^{TbP(n_p,q_{n_p})} \{U^P\}= n_p -1+{\mathbb {P}}(\theta \le n_p)q_{n_p}+\sum _{t=n_p}^T{{\mathbb {P}}(\theta > t)}. \end{aligned}$$
(15)

This can be seen as follows. Before the threshold time is reached, the detector is always recommended to keep silent. Therefore, in the first \(n_p-1\) time slots silence is guaranteed. At the threshold period \(n_p\), if the Markov chain is in the bad state (prob. \({\mathbb {P}}(\theta \le n_p)\)), the detector remains silent with probability \(q_{n_p}\). However, if the Markov chain is in the good state (prob. \({\mathbb {P}}(\theta > n_p)\)), the detector remains silent for sure. At each time t after the threshold period, the detector keeps quiet only if the jump has not occurred. The probability of this event is \({\mathbb {P}}(\theta > t)\).

Equation (15) shows that for each constant threshold \(n_p\), the principal’s expected utility is an increasing linear function of \(q_{n_p}\). Moreover, we have \({\mathbb {E}}^{TbP(n_p,1)} \{U^P\}={\mathbb {E}}^{TbP(n_p+1,0)} \{U^P\}\). This is true simply because \(\varvec{\rho }=TbP(n_p,1)\) and \(\varvec{\rho '}=TbP(n_p+1,0)\) are actually two representations of the same mechanism. We can easily conclude from these two facts that the principal’s expected utility is a piecewise linear function of \(n_p+q_{n_p}\), as depicted in Fig. 1. The slope of the segments increases whenever the variable \(n_p+q_{n_p}\) takes an integer value.

Fig. 1 An example of the principal’s utility function in a time-based prioritized mechanism
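
The piecewise linear shape in Fig. 1 can be reproduced numerically. The following sketch (our own illustrative code, reusing theta_prior from Sect. 2.1) implements (15) and sweeps \(n_p+q_{n_p}\) over a grid:

```python
import numpy as np

def tbp_expected_utility(mu, q, T, n_p, q_np):
    """Principal's expected utility of TbP(n_p, q_np), Eq. (15)."""
    p = theta_prior(mu, q, T)                        # Eq. (1), from the earlier sketch
    P_le = np.cumsum(p)                              # P_le[l-1] = P(theta <= l)
    return (n_p - 1 + P_le[n_p - 1] * q_np
            + sum(1.0 - P_le[t - 1] for t in range(n_p, T + 1)))

# Sweep k = n_p + q_np: the values trace the increasing piecewise linear curve of
# Fig. 1, with slope P(theta <= n_p) on each unit segment.
mu, q, T = 0.9, 0.2, 10
for k in np.arange(1.0, T + 0.001, 0.25):
    n_p, q_np = int(np.floor(k)), float(k - np.floor(k))
    print(round(float(k), 2), round(tbp_expected_utility(mu, q, T, n_p, q_np), 4))
```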

The arguments above show that finding the optimal time-based prioritized mechanism is equivalent to finding the maximum value of \(n_p+q_{n_p}\) such that the mechanism \(\varvec{\rho }=TbP(n_p,q_{n_p})\) satisfies the obedience constraints. With some algebra, it can be shown that for a time-based prioritized mechanism \(\varvec{\rho }\) the obedience constraints (7) can be simplified to

$$\begin{aligned} c\left( \sum _{l=t}^{n_p-1}{{\mathbb {P}}(\theta \le l)} + {\mathbb {P}}(\theta \le n_p) q_{n_p}\right) \le {\mathbb {P}}(\theta > t), \forall t \le n_p. \end{aligned}$$
(16)

These constraints are very intuitive: for each time t, (1) the right-hand side of (16) is the expected cost of declaring the jump at time t; and (2) the left-hand side of (16) is the expected continuation cost that the detector incurs from time t onward when she follows the recommendations made by the mechanism \(\varvec{\rho }=TbP(n_p,q_{n_p})\). Therefore, constraints (16) simply say that a time-based prioritized mechanism is obedient if and only if the detector finds declaring the jump more costly than keeping silent at each time \(t \le n_p\) at which she is recommended to stay quiet.

The left-hand side of each obedience constraint in (16) is increasing in \(q_{n_p}\). Therefore, if the obedience conditions are satisfied for a mechanism \(\varvec{\rho }=TbP(n_p,q_{n_p})\), they are also satisfied for mechanisms with smaller values of \(q_{n_p}\). Given that the mechanisms \(TbP(n_p,0)\) and \(TbP(n_p-1,1)\) are the same, we can conclude that obedience of a time-based prioritized mechanism \(\varvec{\rho }=TbP(n_p,q_{n_p})\) implies the obedience of all time-based prioritized mechanisms with smaller values of \(n_p+q_{n_p}\). Therefore, there is a threshold \(k^*\) such that the set of all obedient time-based prioritized mechanisms consists of the mechanisms for which \(n_p+q_{n_p}\) takes a value no larger than the threshold \(k^*\). The fact that the principal’s expected utility is increasing in \(n_p+q_{n_p}\) implies that the mechanism with \(n_p+q_{n_p}=k^*\) is optimal. The parameters of the optimal mechanism are uniquely determined by Algorithm 1.

Algorithm 1 works as follows: It iterates over \(n_p=1,\ldots ,T\) and at each iteration it computes the maximum value of \(q_{n_p}\) such that \(\varvec{\rho }=TbP(n_p,q_{n_p})\) satisfies the obedience constraints (16) for all \(t \le n_p\). Achieving a maximum greater than 1 means that the mechanism \(TbP(n_p,1)=TbP(n_p+1,0)\) not only satisfies all the obedience constraints, but also has no binding constraints. This means that there is still room for improvement. Therefore, the algorithm goes to the next iteration to find a mechanism with greater utility for the principal. If at some iteration \(n_p^*\) we obtain a maximum \(q_{n_p}^*\) of less than one, we stop. The mechanism \(TbP(n_p^*,q_{n_p}^*)\) satisfies all obedience constraints, and some obedience constraints are binding. Therefore, \(k^*=n_p^*+q_{n_p}^*\) is the optimal threshold, which cannot be increased any further, and hence \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\) is an optimal information disclosure mechanism.
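
The following sketch of Algorithm 1 (our own illustrative code; it reuses theta_prior and tbp_expected_utility from the earlier sketches and assumes \(\mu <1\), so that \({\mathbb {P}}(\theta \le n_p)>0\)) implements this scan using the simplified constraints (16):

```python
import numpy as np

def optimal_tbp(mu, q, c, T):
    """Sketch of Algorithm 1: scan n_p = 1, ..., T and, for each n_p, take the
    largest q_np permitted by the obedience constraints (16); stop at the first
    n_p whose maximal q_np falls below one."""
    p = theta_prior(mu, q, T)                   # Eq. (1), from the earlier sketch
    P_le = np.cumsum(p)                         # P_le[l-1] = P(theta <= l)
    P_gt = 1.0 - P_le                           # P_gt[t-1] = P(theta > t)
    for n_p in range(1, T + 1):
        # Constraint (16) for each t <= n_p:
        # c * (sum_{l=t}^{n_p-1} P(theta<=l) + P(theta<=n_p) * q) <= P(theta>t)
        q_caps = [(P_gt[t - 1] / c - sum(P_le[l - 1] for l in range(t, n_p)))
                  / P_le[n_p - 1]
                  for t in range(1, n_p + 1)]
        q_max = min(q_caps)
        if q_max < 1.0:
            return n_p, max(q_max, 0.0)
    return T, 1.0                               # no constraint ever binds: silence is sustained over the whole horizon

n_p_star, q_star = optimal_tbp(mu=0.9, q=0.2, c=0.3, T=10)
print(n_p_star, q_star, tbp_expected_utility(0.9, 0.2, 10, n_p_star, q_star))
```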

The proof of Theorem 1 is now complete.

Remark 2

Algorithm 1 determines the parameters of the optimal time-based prioritized mechanism \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\). Using these parameters, the information disclosure mechanism \(\varvec{\rho }^*\) works as follows. At any time t, the principal recommends the detector to keep silent if the Markov chain is in the good state (Definition 2-(i)). However, when the Markov chain is in the bad state, his recommendation depends on the time (see (12)). In fact,

  • when \(s_t=b\) and time t is before the threshold time \(n_p^*\), the principal recommends the detector to keep silent;

  • when \(s_t=b\) and \(t=n_p^*\), the principal recommends silence with probability \(q_{n_p}^*\); and

  • when \(s_t=b\) and time exceeds the threshold, the principal recommends the detector to declare the jump.

Therefore, the principal’s recommendation is first a function of the current state and then, if the current state is bad, a function of the time.

We further discuss the mechanism’s behavior in the next section.

5 Numerical Results and Discussion

In this section, we discuss and highlight some interesting features of our designed optimal mechanism obtained with Algorithm 1. We also run some numerical experiments to observe its performance.

Feature 1 The optimal mechanism \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\) we propose is a three-phase mechanism. The principal employs three different strategies in different time regions [See Eq. (12)].

Region 1 In the first region, which consists of times before the threshold \(n_p^{*}\), the principal recommends the detector to keep silent irrespective of the Markov chain’s state \(s_t\). During this time interval, the principal’s messages are independent of the Markov chain’s state. These messages give no information to the detector and hence, without loss of optimality, can be removed from the mechanism. Therefore, this region can be referred to as the no-information region.

Region 2 In the second region, which takes only one time slot (i.e., \(t=n_p^{*}\)), the principal runs a mixed/randomized strategy to hide his information. In this time slot, the principal always recommends the detector to keep silent if the Markov chain is in the good state. However, when the Markov chain is in the bad state, he conceals this undesirable news by recommending silence with probability \(q_{n_p}^{*}\), revealing it only with probability \(1-q_{n_p}^{*}\). By employing this strategy, the detector who receives the recommendation to keep silent cannot distinguish whether this recommendation is caused by the truth-telling strategy of the principal in the good state or by his randomized strategy in the bad state. Therefore, she forms a belief about each of these two cases and takes an action that maximizes her expected continuation utility with respect to these beliefs. The probability \(q_{n_p}^{*}\) is chosen as the maximum probability which makes obedience the best action for the detector. We refer to this region as the randomized region.

Region 3 In the third region, which consists of times after the threshold \(n_p^{*}\), the state of the Markov chain can be inferred exactly from the principal’s messages. In this time interval, a recommendation to keep silent means that the Markov chain is in the good state and a recommendation to declare the jump means that the state of the Markov chain has switched to the bad state. This region can be referred to as the full-information region.

Based on the above arguments, we can depict the principal’s optimal strategy as in Fig. 2. In this optimal mechanism, the principal does not give any information to the detector up to time \(n_p^{*}-1\), but he promises that if the detector remains silent until that point, then he starts to provide her with “almost accurate” information. The information provided by the principal after time \(n_p^{*}-1\) contains some noise at time \(n_p^{*}\), but is precise and fully revealing of the state thereafter.

Fig. 2 The principal’s strategy in the optimal mechanism

Feature 2 In the optimal three-phase mechanism, the principal’s commitment to full disclosure of information after a certain time increases the detector’s patience and gives her incentives to remain silent longer. To see this we compute the length of time the detector remains silent in two instances: (1) when the principal employs a no-information disclosure strategy for the whole horizon; (2) when the principal commits to full information disclosure some time in the future. In the first instance, the detector’s expected cost if she declares the jump at \(\tau =1, \ldots , T\) is

$$\begin{aligned} \begin{aligned}&{\mathbb {E}}^{No} \{J^D(\tau ,\theta )\}={\mathbb {P}} (\theta >\tau )+c \sum _{t=1}^{\tau -1}{{\mathbb {P}} (\theta \le t )}\\&\quad = \mu (1-q)^{\tau -1}+c \sum _{t=1}^{\tau -1}{(1-\mu (1-q)^{t-1})}\\&\quad = \mu (1-q)^{\tau -1}+c (\tau -1)- c \mu \frac{1-(1-q)^{\tau -1}}{q}. \end{aligned} \end{aligned}$$
(17)

If the detector keeps silent over the whole horizon, captured by \(\tau =T+1\), her expected cost is

$$\begin{aligned} {\mathbb {E}}^{No} \{J^D(T+1,\theta )\}=c \sum _{t=1}^{T}{{\mathbb {P}} (\theta \le t )}=cT-c \mu \frac{1-(1-q)^{T}}{q}. \end{aligned}$$
(18)

Therefore, to minimize her expected cost, the detector with no additional information declares the jump at time

$$\begin{aligned} \tau ^{No}=\mathop {\mathrm{arg~min}}\limits _{\tau =1:T+1}{{\mathbb {E}}^{No} \{J^D(\tau ,\theta )\}}.^1 \end{aligned}$$
(19)

Footnote 1

The case \(\tau ^{No}=T+1\) means that the belief \(\mu \) the detector has in the good state of the Markov chain is high enough that the best action for her is to keep silent over the whole horizon.

Using the optimal stopping time \(\tau ^{No}\), we can derive the principal’s utility as follows:

$$\begin{aligned} {\mathbb {E}}^{No} \{U^P\}=\tau ^{No}-1. \end{aligned}$$
(20)

The arguments above show that the detector who receives no new information remains silent for \(\tau ^{No}-1\) time slots. In the optimal time-based-prioritized mechanism (instance 2), the principal’s commitment to provide almost accurate information in the future incentivizes the detector to keep silent for \(n_p^{*}-1\) time slots without receiving any new information. The difference \(\eta =n_p^{*}-\tau ^{No}\), which we call the patience enhancement, measures by how much the principal’s commitment increases the detector’s patience. In Fig. 3, we have plotted this measure for different values of the parameters \(\mu \in [0,1]\), \(q \in [0,1]\) and \(c \in [0,1]\). In each sub-figure, we fix one of the parameters and partition the space of the two remaining parameters in terms of the patience enhancement \(\eta \). Based on Fig. 3a, the detector’s patience increases by at least one time slot in \(57.17\%\) of the cases, when \(c=1\). For \(c=1\), the patience enhancement is at least 4, 7 and 10 time slots in \(13.38\%\), \(5.64\%\), and \(2.94\%\) of the cases, respectively. The probabilities of having a patience enhancement above thresholds 1, 4, 7, and 10 are depicted in Table 1 for each of Fig. 3a–c. These results show that the principal’s commitment to provide almost accurate information in the future can significantly enhance the detector’s patience.
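
As an illustration, the stopping time \(\tau ^{No}\) of (19) and the patience enhancement \(\eta \) can be computed as in the sketch below, which reuses algorithm_1 from the earlier sketch; the parameter values are illustrative only.

```python
def tau_no(mu, q, c, T):
    """Detector's stopping time (19) under no information disclosure: the tau in
    {1, ..., T+1} minimizing the expected cost (17)-(18)."""
    def cost(tau):
        delay = c * sum(1 - mu * (1 - q) ** (t - 1) for t in range(1, tau))
        false_alarm = mu * (1 - q) ** (tau - 1) if tau <= T else 0.0
        return false_alarm + delay
    return min(range(1, T + 2), key=cost)


# Patience enhancement eta = n_p* - tau^No (Feature 2), reusing algorithm_1 above.
mu, q, c, T = 0.9, 0.3, 1.0, 50
n_p_star, _ = algorithm_1(mu, q, c, T)
print("tau_no =", tau_no(mu, q, c, T), " n_p* =", n_p_star,
      " eta =", n_p_star - tau_no(mu, q, c, T))
```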

Fig. 3 Color-coded map for patience enhancement (\(\eta =n_p^{*}-\tau ^{No}\)). This parameter measures by how much (in terms of the number of time slots) the principal’s commitment increases the detector’s patience. In Fig. 3a–c, we fix the delay cost (c), the transition probability (q), and the initial belief (\(\mu \)), respectively. The statistical results obtained from these figures are summarized in Table 1. See Feature 2 of Sect. 5 for a more detailed discussion (Color figure online)

Table 1 Probabilities of having a patience enhancement above certain thresholds

Feature 3 The mechanism is almost independent of the horizon length T. The only effect of the horizon length on the optimal mechanism is to limit the threshold \(n_p\) from above. Therefore, as long as the optimal threshold \(n_p^{*}\) is an interior point, increasing the time horizon T does not change the optimal mechanism.
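
A quick numerical check of this feature, reusing algorithm_1 from the earlier sketch (the parameter values are illustrative, and the check presumes that \(n_p^{*}\) is interior for them):

```python
# Feature 3, numerically: once the optimal threshold n_p* is an interior point,
# enlarging the horizon does not change the optimal mechanism.
mu, q, c = 0.9, 0.3, 1.0
print(algorithm_1(mu, q, c, T=20))
print(algorithm_1(mu, q, c, T=50))   # expected to print the same (n_p*, q_{n_p}*)
```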

Up to now, we have assumed that the horizon length T is finite, fixed and deterministic. However, Feature 3 allows us to extend our results to both the infinite horizon and random finite horizon cases. Below, we discuss the extension of our approach to the infinite horizon case. The applicability of our approach to the random finite horizon setting then follows from the well-known fact that a random horizon can be reformulated as an infinite horizon [13].

According to Feature 3, finding the optimal information disclosure mechanism in the infinite horizon setting is equivalent to solving the problem with a sufficiently large horizon length \({\bar{T}}\) such that \(n_p^{*}<{\bar{T}}\). To find a large enough horizon length \({\bar{T}}\), we can run Algorithm 1 with \(T=\infty \). Assuming that the algorithm halts and the optimal values \(n_p^{*}\) and \(q_{n_p}^*\) are found, we can define \({\bar{T}}=n_p^{*}+1\). According to Theorem 1, the mechanism \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\) is an optimal information disclosure mechanism in a finite horizon setting with horizon length \({\bar{T}}\). Since \(n_p^*\) is an interior point in this setting, we can conclude from Feature 3 that increasing the horizon length has no effect on the optimality conditions, and hence \(\varvec{\rho }^*=TbP(n_p^*,q_{n_p}^*)\) is an optimal information disclosure mechanism for the infinite horizon case.

The only remaining part of the proof is to show that Algorithm 1 with \(T=\infty \) always halts. Suppose otherwise; then the obedience constraints are satisfied for a time-based prioritized mechanism with \(q_{n_p} = 1\) for all \(n_p \ge 1\). In such a mechanism, the recommendations are state-independent. Therefore, based on the discussion made in Feature 1, the mechanism is equivalent to the no-information mechanism. It can be seen from (17) that in a no-information mechanism, the detector’s expected cost \({\mathbb {E}}^{No} \{J^D(.)\}\) goes to infinity as the detection time \(\tau \) goes to infinity. Therefore, it is never optimal for the detector to obey recommendations that ask her to remain silent forever. This contradicts the assumption that the mechanism is obedient. Therefore, Algorithm 1 always halts and outputs \(n_p^{*}< \infty \).
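
In code, running Algorithm 1 with \(T=\infty \) amounts to iterating over \(n_p\) without an upper bound until a constraint binds. A minimal sketch, reusing max_feasible_q from the earlier sketch, with a safety cap added purely as a guard of our own:

```python
def algorithm_1_infinite(mu, q, c, safety_cap=10_000):
    """Algorithm 1 run with T = infinity. The halting argument in the text guarantees
    termination for c > 0; safety_cap is only a guard against misconfigured inputs."""
    for n_p in range(1, safety_cap + 1):
        q_max = max_feasible_q(mu, q, c, n_p)
        if q_max < 1:
            return n_p, max(0.0, min(1.0, q_max))
    raise RuntimeError("no binding obedience constraint found within safety_cap")
```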

Feature 4 In the optimal time-based prioritized mechanism derived by Algorithm 1, we choose the maximum value of \(n_p\) for which there exists a \(q_{n_p} \in [0,1]\) such that \(\rho =TbP(n_p,q_{n_p})\) satisfies all the obedience constraints for times before \(n_p\). To do so, at each round, the algorithm considers a fixed value of \(n_p\) and computes the maximum value of \(q_{n_p}\) that together with \(n_p\) satisfies all the obedience constraints of times \(t \le n_p\). We can simplify Algorithm 1 by using the result of the next lemma.

Lemma 5

Suppose \(n_p \ge \tau ^{No}\). Let

$$\begin{aligned} q_{n_p}^t=\frac{1}{{\mathbb {P}}(\theta \le n_p)}\left( \frac{{\mathbb {P}}(\theta > t)}{c}-\sum _{l=t}^{n_p-1}{{\mathbb {P}}(\theta \le l)}\right) , \end{aligned}$$
(21)

denote the maximum value of \(q_{n_p}\) that together with \(n_p\) satisfies the obedience constraint of time t. Then, we have

$$\begin{aligned} q_{n_p}^{\tau ^{No}}=\min _{t \le n_p}{q_{n_p}^t}, \end{aligned}$$
(22)

where \(\tau ^{No}\) is the time at which the detector declares the jump, when the principal employs a no-information strategy.

Lemma 5 intuitively says that persuading the detector to keep silent at time \(\tau ^{No}\) is the most difficult challenge faced by the principal. This lemma suggests the following simplification for Algorithm 1. To find the optimal mechanism, we can run Algorithm 1 up to \(n_p=\tau ^{No}-1\). Then, if the terminating condition (Lines 3-5) has not been satisfied yet, we can replace Line 2 of the algorithm with

$$\begin{aligned} q_{n_p}=\frac{1}{{\mathbb {P}}(\theta \le n_p)}\left( \frac{{\mathbb {P}}(\theta > \tau ^{No})}{c}-\sum _{l=\tau ^{No}}^{n_p-1}{{\mathbb {P}}(\theta \le l)}\right) , \end{aligned}$$
(23)

where \(\tau ^{No}\) is derived by (19). This simplification reduces the complexity of Algorithm 1: the algorithm no longer needs to solve an optimization problem at each round. Instead, it solves the optimization problem (19) once and uses the result to find the maximum feasible value of \(q_{n_p}\) at each round.
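
In terms of the earlier sketch, the simplification replaces the per-round minimization inside max_feasible_q with a single evaluation of (23) (p_gt and p_le as before, and tau_no_value obtained from (19)):

```python
def q_np_via_lemma_5(mu, q, c, n_p, tau_no_value):
    """Simplified line 2 of Algorithm 1: for n_p >= tau^No, the binding obedience
    constraint is the one at t = tau^No, so the minimization reduces to formula (23)."""
    cont = sum(p_le(mu, q, l) for l in range(tau_no_value, n_p))
    return (p_gt(mu, q, tau_no_value) / c - cont) / p_le(mu, q, n_p)
```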

Feature 5 In Fig. 3 and Table 1, we showed the superiority of our proposed mechanism compared to the no-information mechanism. In this part, we compare our mechanism with three other benchmark mechanisms, in terms of the expected utility they can provide for the principal.

  • The first benchmark is the full information mechanism in which the principal reveals perfectly the Markov chain’s state to the detector. The full information mechanism can be considered as a time-based prioritized mechanism with \(n_p=1\), and \(q_{n_p}=0\). Therefore, we can conclude from (15) and (1) that the expected utility the principal gets if he honestly shares his information with the detector is

    $$\begin{aligned} {\mathbb {E}}^{Full} \{U^P\}=\sum _{t=1}^T{{\mathbb {P}}(\theta > t)}=\sum _{t=1}^T{\mu (1-q)^{t-1}}=\mu \frac{1-(1-q)^{T}}{q}. \end{aligned}$$
    (24)
  • The second benchmark we consider here is the best static mechanism that can be employed by the principal. This comparison highlights the power of dynamic mechanisms compared to static ones. In a static mechanism, both the set of messages \({\mathcal {M}}\) that the principal sends to the detector at each instant of time and the distribution over \({\mathcal {M}}\) given the current state are time-independent. By the direct revelation principle, without loss of generality, we can focus on direct static mechanisms in which the detector follows the principal’s recommendations.

    In a direct static mechanism, the principal recommends the detector to keep silent with probability \(\rho _{s_t}\), where \(s_t \in \{g,b\}\) is the current state of the Markov chain. By an argument similar to that in Lemma 3, we can show that in the optimal static mechanism, we have \(\rho _{g}=1\). Moreover, we can show that the principal’s expected utility is an increasing function of \(\rho _{b}\). Therefore, the problem of finding the best static mechanism is equivalent to finding the maximum value of \(\rho _{b}\) such that the mechanism satisfies the obedience constraints. By some algebra, the detector’s obedience constraint at each time t can be derived as follows:

    $$\begin{aligned}&\frac{\mu (1-q)^{t-1}}{\mu (1-q)^{t-1}+(1-\mu )\rho _{b}^{t}+\sum _{\tau =1}^{t-1}{\mu (1-q)^{\tau -1}q \rho _b^{t-\tau }}} \nonumber \\&\quad \ge c \sum _{l=t}^T{\frac{(1-\mu )\rho _{b}^{l}+\sum _{\tau =1}^{l-1}{\mu (1-q)^{\tau -1}q \rho _b^{l-\tau }}}{\mu (1-q)^{t-1}+(1-\mu )\rho _{b}^{t}+\sum _{\tau =1}^{t-1}{\mu (1-q)^{\tau -1}q \rho _b^{t-\tau }}}} \end{aligned}$$
    (25)

    where the left-hand side is the average cost of declaring a jump and the right-hand side is the expected cost of keeping silent at time t, when the detector is recommended to keep silent. We denote the maximum value of \(\rho _b \in [0,1]\) that satisfies constraint (25) for each \(t \le T\), by \({\hat{\rho }}\). Therefore, an efficient static mechanism for the principal is a direct mechanism with \(\rho _{g}=1\) and \(\rho _{b}={\hat{\rho }}\). This mechanism provides the principal with the following expected utility:

    $$\begin{aligned}&{\mathbb {E}}^{stat} \{U^P\}=\sum _{t=1}^T{\left[ {\mathbb {P}}(\theta > t)+\sum _{\theta '=1}^t{{\mathbb {P}}(\theta =\theta ') {\hat{\rho }}^{t-\theta '+1}}\right] }\nonumber \\&\quad =\sum _{t=1}^T{\left[ \mu \, (1-q)^{t-1}+(1-\mu ){\hat{\rho }}^{t}+\sum _{\theta '=2}^t{\mu \, (1-q)^{\theta '-2}\, q \, {\hat{\rho }}^{t-\theta '+1}}\right] }. \end{aligned}$$
    (26)
  • The third benchmark is the class of delayed mechanisms. In these mechanisms, the principal’s strategy is to reveal the time of the jump to the detector with a fixed delay. Such a mechanism is shown to be optimal in the problem studied by Ely [21]. Ely assumes that the detector is myopic and uses a time-invariant threshold to detect the time of the jump; based on this assumption, he proves that the principal’s optimal strategy is to reveal the time of the jump to the detector with a fixed delay. In the comparison of our mechanism with fixed delay mechanisms, we assume that the detector is a long-term optimizer and uses the time-varying thresholds of the quickest detection problem corresponding to the principal’s strategy. The goal of this comparison is to investigate how much the principal’s performance would deteriorate if he restricted attention to the set of delayed mechanisms when the detector is a long-term optimizer.

In Fig. 4, we have illustrated the principal’s expected utility when he adopts the optimal dynamic, best static, full-information, no-information, and best delayed mechanisms, for different delay costs c, when \(\mu =0.9\), \(q=0.3\) and \(T=50\). We observe that while the benchmark mechanisms outperform each other in different regions, the optimal mechanism proposed in this paper always outperforms all of them. In the comparison of different benchmarks, we can see that for low values of the delay cost c, the best delayed mechanism outperforms the other benchmarks. However, as c goes up, the performance of the best static mechanism becomes superior to that of the three other benchmarks. We observe from Fig. 4 that the percentage of the principal’s utility enhancement in our proposed mechanism compared to the best available benchmark is negligible when c approaches 0, rises to \(19.2\%\) at \(c=0.29\), and then approaches \(5.5\%\) as the delay cost reaches one.

Fig. 4 Comparison of our results with the benchmark mechanisms, for \(\mu =0.9, q=0.3, T=50\). We observe that while the benchmark mechanisms outperform each other in different regions of delay cost (c), the optimal mechanism proposed in this paper always outperforms all of them. The results of this figure are discussed thoroughly in Feature 5 of Sect. 5 (Color figure online)
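
For reference, a sketch of how the full-information and best static benchmarks can be evaluated numerically (the no-information benchmark is simply \(\tau ^{No}-1\) from (20)). The bisection over \(\rho _b\) relies on the fact that the right-hand side of (25) is increasing in \(\rho _b\), so the feasible set is an interval \([0,{\hat{\rho }}]\); the function names are our own.

```python
def post_jump_silence_prob(mu, q, rho_b, t):
    """Sum over jump times theta' <= t of P(theta = theta') * rho_b^(t - theta' + 1):
    the post-jump terms appearing in (25) and (26)."""
    return (1 - mu) * rho_b ** t + sum(
        mu * (1 - q) ** (tp - 2) * q * rho_b ** (t - tp + 1) for tp in range(2, t + 1))


def best_static_rho(mu, q, c, T, iters=40):
    """Largest rho_b in [0, 1] satisfying the static obedience constraints (25) for all t,
    found by bisection; the common denominator of (25) cancels, so only numerators compare."""
    def obedient(rho_b):
        return all(
            mu * (1 - q) ** (t - 1) >= c * sum(post_jump_silence_prob(mu, q, rho_b, l)
                                               for l in range(t, T + 1))
            for t in range(1, T + 1))
    if obedient(1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if obedient(mid) else (lo, mid)
    return lo


def full_info_utility(mu, q, T):
    """Principal's expected utility (24) under full information disclosure."""
    return mu * (1 - (1 - q) ** T) / q


def static_utility(mu, q, c, T):
    """Principal's expected utility (26) under the best static mechanism."""
    rho_hat = best_static_rho(mu, q, c, T)
    return sum(mu * (1 - q) ** (t - 1) + post_jump_silence_prob(mu, q, rho_hat, t)
               for t in range(1, T + 1))
```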

Feature 6 The three-phase optimal policy is not unique. First, the three-phase structure is a consequence of the fact that we restrict attention to direct obedient information disclosure mechanisms. There may be indirect information disclosure mechanisms that are also optimal, but we do not know whether such mechanisms exist; the existence of optimal indirect information disclosure mechanisms is an open problem.

Moreover, even within the class of direct obedient information disclosure mechanisms, there are some optimal mechanisms that are different from the three-phase optimal policy. The main reason for this is as follows. In Definition 2 of the paper, we define the class of time-based prioritized mechanisms as a special class of dynamic information disclosure mechanisms. We prove through Lemmas 2-4 that restricting attention to this class of mechanisms is without loss of optimality. However, this does not mean that no other optimal direct information disclosure mechanism exists.

To illustrate this fact more clearly, we consider a numerical example with parameters \(q=0.4\), \(\mu =0.7\), \(c=0.5\), and \(T=4\), and present an optimal mechanism that is different from the three-phase one. In this example, the optimal time-based prioritized mechanism which is derived by running Algorithm 1 is \(\varvec{\rho }^*=(3,0.3476)\) which provides the principal with expected utility \({\mathbb {E}}^{\varvec{\rho }^*} \{U^P\}=2.6632\). In mechanism \(\varvec{\rho }^*\), the principal recommends the detector to keep silent at times \(t=1\) and \(t=2\) irrespective of the Markov chain’s state, runs a mixed strategy at time \(t=3\), and fully reveals the state at time \(t=4\). Now, we produce another direct mechanism that is obedient and provides the principal with the same utility. Let \(\varvec{\rho }=(\rho _{t}^{s_t})\), where

$$\begin{aligned} \rho _{t}^{s_t}= {\left\{ \begin{array}{ll} 1, &{} \text {if} \, s_t=g,\\ 1, &{} \text {if} \, s_t=b, t=1,2,\\ 0.15, &{} \text {if} \, s_t=b, t=3,\\ 0.694, &{} \text {if} \, s_t=b, t=4.\\ \end{array}\right. } \end{aligned}$$
(27)

It is easy to show that this mechanism satisfies the obedience constraints and maximizes the principal’s expected utility. Comparing this mechanism with mechanism \(\varvec{\rho }^*\), we can see that in \(\varvec{\rho }\), the principal builds the detector’s trust by giving her more accurate information at time \(t=3\) (\(\rho _{3}^{b}<\rho _{3}^{*b}=0.3476\)) and then hides more information from her at time \(t=4\) (\(\rho _{4}^{b}>\rho _{4}^{*b}=0\)). Both of these mechanisms result in the same expected utility \({\mathbb {E}} \{U^P\}=2.6632\).
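
This equivalence can be checked numerically. The sketch below evaluates the principal’s expected utility for a bad-state silence profile \((\rho _{t}^{b})_{t \le T}\); the utility expression is our own reconstruction (it mirrors (26) with a time-varying silence probability in the bad state), and it returns \(\approx 2.6632\) for both mechanisms with the parameters above.

```python
def principal_utility(mu, q, T, rho_b):
    """Principal's expected utility for a direct mechanism that always recommends silence
    in the good state and recommends silence with probability rho_b[t] in the bad state:
    sum_t [ P(theta > t) + sum_{theta' <= t} P(theta = theta') prod_{l=theta'}^{t} rho_b[l] ]."""
    def p_theta(k):                                   # prior of the jump time
        return 1 - mu if k == 1 else mu * (1 - q) ** (k - 2) * q
    total = 0.0
    for t in range(1, T + 1):
        total += mu * (1 - q) ** (t - 1)              # state still good at t: silence for sure
        for theta in range(1, t + 1):
            prob_silent = 1.0
            for l in range(theta, t + 1):
                prob_silent *= rho_b[l]
            total += p_theta(theta) * prob_silent
    return total


mu, q, T = 0.7, 0.4, 4
print(principal_utility(mu, q, T, {1: 1, 2: 1, 3: 0.3476, 4: 0}))      # TbP(3, 0.3476)
print(principal_utility(mu, q, T, {1: 1, 2: 1, 3: 0.15, 4: 0.694}))    # the mechanism in (27)
```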

The main reason for the existence of optimal policies like (27) is Lemma 4. In the proof of Lemma 4, we showed that for any optimal policy \(\varvec{\rho }\) that splits its obfuscation power into two consecutive time slots t and \(t+1\) (i.e., \(\rho _{t}^{b}<1\) and \(\rho _{t+1}^{b}>0\)), we can construct a time-based prioritized optimal policy that first obfuscates the information at time t as much as it can, and then puts the rest of its obfuscation power, if any, on time \(t+1\). It is clear from this discussion that, apart from the time-based prioritized mechanisms, there exist other optimal mechanisms, similar to \(\varvec{\rho }\) in (27), that split their obfuscation power across several time slots. However, restricting attention to the set of time-based prioritized mechanisms involves no loss of optimality and considerably simplifies the search for an optimal mechanism.

6 Extensions

We discuss extensions of our results in two directions: (1) when the Markov chain has a time-varying matrix of transition probabilities; (2) when the Markov chain has more than two states and one of the states is absorbing [46]. We conclude the section with a conjecture.

  1.

    So far, for simplicity, we have assumed that the Markov chain’s transition probability q is time-independent. However, the results of the paper are readily extendable to settings with a time-varying transition probability, i.e., q(t). For such settings, all theorems and lemmas presented in the paper still hold. This basically means that, for problems in which the Markov chain’s transition matrix varies with time, the principal can still, without loss of optimality, restrict his attention to time-based prioritized mechanisms and the optimal time-based prioritized information disclosure mechanism can be obtained by running Algorithm 1. This can be readily proved by following the same approach as in Sect. 4.

  2.

    Consider a Markov chain \(\{s_t, t \in {\mathcal {T}}\}\) with state space \(\{e_1, e_2, \ldots , e_n\}\), and a one-step transition probability matrix

    $$\begin{aligned} P = \begin{pmatrix} 1 &{} 0 \\ {\underline{P}}_{n-1 \times 1} &{} {\bar{P}}_{n-1 \times n-1} \end{pmatrix}. \end{aligned}$$

    In this Markov chain, state \(e_1\) is an absorbing state and denotes the state after the jump occurs. This state is equivalent to the bad state in our main model. We denote the initial distribution of the Markov chain, which is common knowledge to both the principal and the detector, by \(\varvec{\mu }=(\mu _1,\mu _2,\ldots ,\mu _n)\), where \(\mu _i \ge 0\) and \(\sum _{i=1}^n{\mu _i}=1\). In this setting, the detector’s goal is to detect the jump to the absorbing state as accurately as possible, while the principal’s goal is to delay detection of the jump.

    This problem is more complicated than the two-state case for two reasons:

    (a)

      The probability of jumping to the absorbing state depends on the current state of the Markov chain which is unknown to the detector. Therefore, the information superiority of the principal in this case is higher than in the two-state case.

    (b)

      When \(|S| = 2\), the state evolution of the Markov chain is of the form \(s_{1:t} = ((g)^{\theta _t-1}, (b)^{t-\theta _t+1})\), where \(\theta _t\) denotes whether the jump has occurred (i.e., \(\theta _t \le t\)) or not (i.e., \(\theta _t = t+1 \)), and if so, when. In this case, as described in detail in Sect. 3.1, the history of states at any time t can be expressed by a one-dimensional parameter \(\theta _t\). However, when the Markov chain has more than two states, this type of simplification is not possible.

    In spite of these additional difficulties, the method proposed in Sect. 4 can be modified as described below to address the general case. First, according to the direct revelation principle, we can restrict attention to direct dynamic information disclosure mechanisms that are obedient. In such a mechanism, at any time t, the principal directly recommends the detector to either keep silent (\(m_t=k\)) or declare the jump (\(m_t=d\)), and the detector must be incentivized to follow all the principal’s recommendations. We describe the principal’s strategy by a recommendation policy \(\varvec{\rho }=(\rho _{t}^{s_{1:t},m_{1:t-1}}, t \in {\mathcal {T}})\), where \(\rho _{t}^{s_{1:t},m_{1:t-1}}\) is the probability with which the principal recommends the detector to keep silent, when the state and message histories are \(s_{1:t}\) and \(m_{1:t-1}\), respectively.

    Considering direct dynamic information disclosure mechanisms and following an approach similar to that of Sect. 4, we can prove that it is always optimal for the principal to recommend to the detector to keep silent when the jump has not yet occurred, i.e. \(\rho _t^{s_{1:t},m_{1:t-1}}=1\) when \(s_t \ne e_1\). We present the details of the proof of this result in [24]. Therefore, designing an optimal information disclosure mechanism is equivalent to determining the optimal values of the parameters \(\rho _t^{s_{1:t},m_{1:t-1}}\) when \(s_t=e_1\). These probabilities could potentially depend on the evolution path of the states (i.e. \(s_{1:t}\)). However, since the principal uses the same strategy in all non-absorbing states, the detector’s belief about the Markov chain’s state is only a function of the jump time, and not of which non-absorbing states the Markov chain has visited. Moreover, the visited non-absorbing states do not affect the future transition probabilities, as the Markov chain has already jumped to state \(e_1\) and remains there forever. Therefore, when the Markov chain is in the absorbing state, the history of the states, apart from the jump time, affects neither the detector’s belief about the past nor the expectation (of the principal and the detector) about the future. Therefore, without loss of optimality, the principal can restrict attention to recommendation policies that depend only on the time of the jump (and not on the state evolution \(s_{1:t}\)). This feature suffices for Lemmas 1-4 in Sect. 4 to be applicable to the general case of the problem. These lemmas allow the principal to further restrict attention to the class of time-based prioritized mechanisms.

    Using this result, we can derive an algorithm, similar to Algorithm 1 of Sect. 4, to determine the optimal time-based prioritized mechanism that maximizes the principal’s utility. The steps of both algorithms are the same; the only difference is in the probabilities that appear in line 2 of Algorithm 1. For the multi-state case, these probabilities are different from those of the two-state case (they are still common knowledge between the principal and the detector).

We conclude our discussion with the following conjecture.

Conjecture 1

The method presented in this paper can be used to derive close-to-optimal information disclosure mechanisms for problems with general jump processes (i.e., processes that cannot be modeled as a Markov chain).

The reason we believe this conjecture to be true is the following. For Markov chains with an arbitrary number of states, one of which is absorbing, the distribution of the jump time into the absorbing state is a phase-type (PH) distribution [46]. The family of all PH distributions forms a dense subset of the set of all distributions [46], and hence can be used to approximate jump times with an arbitrary distribution. Using this approximation and the method presented in this paper, we can design information disclosure mechanisms for general jump processes.

7 Conclusion

We studied a dynamic Bayesian persuasion problem in which a strategic principal observes the evolution of a Markov chain and designs a recommendation policy that generates a recommendation to a strategic detector at each time. Since the goals of the principal and the detector are different, the long-term-optimizing detector does not have to obey the principal’s recommendations unless she is convinced to do so. We presented a sequential recommendation policy that maximizes the principal’s utility and ensures the detector’s obedience. We proved that the optimal policy is of threshold type, with two thresholds that can be explicitly computed. As time goes by, the optimal recommendation strategy first shifts from a no-information type to a randomized type and then switches to a full-information type.