Motivating Time-Inconsistent Agents: A Computational Approach

We study the complexity of motivating time-inconsistent agents to complete long-term projects in a graph-based planning model proposed by Kleinberg and Oren (2014). Given a task graph G with n nodes, our objective is to guide an agent towards a target node t under certain budget constraints. The crux is that the agent may change its strategy over time due to its present bias. We consider two strategies to guide the agent. First, a single reward is placed at t and arbitrary edges can be removed from G. Secondly, rewards can be placed at arbitrary nodes of G but no edges may be deleted. In both cases we show that it is NP-complete to decide if a given budget is sufficient to keep the agent motivated. For the first setting, we give complementing upper and lower bounds on the approximability of the minimum required budget. In particular, we devise a (1 + √n)-approximation algorithm and prove NP-hardness for ratios greater than √n/3. We also argue that the second setting does not permit any efficient approximation unless P = NP.


Introduction
Motivated by a recent paper of Kleinberg and Oren [4], we study the phenomenon of time-inconsistent behavior from a computer science perspective. This fundamental problem in behavioral economics has many examples in everyday life, including academia. Consider, for instance, a referee who agrees to evaluate a scientific proposal. Despite good intentions, the referee gets distracted and never submits a report. Or consider a student who enrolls in a course. After successfully completing the first couple of homework assignments, the student drops out without earning any credit points. In general, these situations follow a recurring pattern. An agent makes a plan to complete a set of tasks in the future but changes the plan at a later point in time. Sometimes this is the result of unforeseen circumstances. In many cases, however, the plan is changed or abandoned even if the circumstances are the same as when the plan was made. This paradoxical behavior of procrastination and abandonment is well known in the field of behavioral economics and can have substantial effects on the performance of agents in an economic or social domain, see e.g. [1, 7, 8].
A sensible explanation for time-inconsistent behavior is that agents assign disproportionately greater value to current cost than to future expenses. As an example, consider a simple car wash problem in which an agent, say Alice, is promised extra pocket money for washing her family's car. Each day Alice can either do the chore or postpone it once more; as we will see in Section 2, her present bias leads her to procrastinate.

Our contribution: In this paper we focus on the complexity and approximability of finding motivating subgraphs and reward configurations. Our objective is budget efficiency. Note that we take a design perspective. In particular, we are not interested in minimizing the total cost experienced by the agent on its walk from s to t but rather the reward necessary to motivate the agent.
As for the first problem, by removing edges from G, it is possible to limit the agent's options in each of its steps. Ideally, this prevents the agent from pursuing costly distractions and thereby reduces the reward required for it to finish the project. The benefit of choice reduction is a well-known phenomenon in the field of behavioral economics. It also has a very natural intuition in many real-life projects. Take for instance the car wash problem. As we will show in Section 2, the removal of edges in the problem's task graph corresponds to the introduction of deadlines.
The second problem takes a slightly less restrictive approach and allows the placement of intermediate rewards at arbitrary vertices of G. Again this is meant to prevent the agent from pursuing distractions and encourage it to complete the project. We examine a version of the problem that, in our view, is the most sensible one. First, only non-negative rewards may be laid out. This assumption is reasonable as it could be hard to convince an agent to pursue projects in which it has to make payments. Furthermore, it is not clear how to account for such payments in the budget. Secondly, the cost of a reward configuration is measured only by the sum of the rewards that are placed at vertices visited by the agent on its walk from s to t. This setting is fundamentally different from the ones analyzed by Tang et al. as it may lead to configurations in which the agent is motivated by rewards that are never claimed. Such configurations are also called exploitative. We give an example in Section 2.
In Section 3 we settle the complexity of finding a motivating subgraph for a fixed r. We first observe that the problem is polynomially solvable if β = 0 or β = 1. We then prove that, for general β ∈ (0, 1), it is NP-complete to decide the existence of a motivating subgraph. In their paper [9], Tang et al. showed NP-hardness via a reduction from 3-SAT. In contrast, we present a different reduction via k DISJOINT CONNECTING PATHS [3]. We believe that this reduction is slightly simpler. More importantly, we are able to generalize the reduction and show a hardness of approximation result in the following section.
In Section 4 we study the optimization version of the motivating subgraph problem. More formally, given a β ∈ (0, 1), determine the smallest possible value of r such that G contains a motivating subgraph. We develop a (1 + √n)-approximation algorithm that outputs r as well as a corresponding motivating subgraph. Interestingly, these subgraphs are paths. The algorithm is in fact a combination of two strategies, one which computes good solutions for small β and one which is effective for large β. Furthermore, the approximation factor of our algorithm is asymptotically tight. As the main technical contribution of this paper, we prove that the optimization problem cannot be approximated in polynomial time to within a ratio of √n/4 or less unless P = NP. Thus we resolve the approximability of the problem.
In Section 5 we explore the problem of finding reward configurations within a fixed total budget of at most b. We show that the problem can again be solved in polynomial time if β = 0 or β = 1. Using a reduction from SET PACKING [3], we prove that deciding the existence of a motivating reward configuration is NP-complete for general β ∈ (0, 1), even if b = 0. This immediately implies that the optimization problem of computing the minimum b that admits a motivating reward configuration cannot be approximated efficiently to within any ratio greater than or equal to 1 unless P = NP.

The formal model
In the following, we present the model by Kleinberg and Oren [4]. Let G = (V, E) be a directed acyclic graph. Associated with each edge (v, w) is a non-negative cost c_G(v, w). An agent, with a bias factor β ∈ [0, 1], has to incrementally construct a path from a source s to a target t. At any vertex v the agent evaluates its lowest perceived cost. For this purpose, the agent considers all paths from v to t and accounts for the cost of incident outgoing edges by their actual value, whereas it discounts future edges by β. More specifically, let d_G(w) denote the cost of a cheapest path from some vertex w to t, considering the original edge costs. If no path exists, we assume that d_G(w) = ∞. Accordingly, the agent's lowest perceived cost is defined as ζ_G(v) = min{c_G(v, w) + βd_G(w) | (v, w) ∈ E}. Ties are broken arbitrarily. Should G be clear from context, we will omit the index and write c(v, w), d(v) and ζ(v) instead.
In Sections 3 and 4 we will investigate problems in which a single non-negative reward r is placed at t. The agent perceives the value of this reward as βr at every vertex different from t. A graph G is motivating if the agent does not abandon the project while constructing a path from s to t in G. More specifically, at any vertex v along the agent's path, it compares ζ(v) to βr and continues moving if ζ(v) ≤ βr, i.e. the reward is sufficiently motivating. Otherwise, if ζ(v) > βr, the agent abandons. Because ties are broken arbitrarily, there could be more than one path for the agent. Consequently, G is only considered motivating if the agent abandons the project on none of these paths.
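The definitions of d and ζ suggest a direct computation by one backward pass over a topological order of G. The following Python sketch is our own illustrative rendering of these definitions (the dict-of-lists graph encoding and the function name are assumptions, not part of the model):

```python
import math

def perceived_costs(edges, t, beta):
    """edges: dict v -> list of (w, cost) in a DAG; returns (d, zeta).

    d[v]    = cost of a cheapest path from v to t (inf if none exists)
    zeta[v] = agent's lowest perceived cost at v, i.e.
              min over edges (v, w) of c(v, w) + beta * d[w]
    """
    verts = set(edges) | {w for outs in edges.values() for w, _ in outs}
    # Kahn's algorithm for a topological order of the DAG.
    indeg = {v: 0 for v in verts}
    for outs in edges.values():
        for w, _ in outs:
            indeg[w] += 1
    order, stack = [], [v for v in verts if indeg[v] == 0]
    while stack:
        v = stack.pop()
        order.append(v)
        for w, _ in edges.get(v, []):
            indeg[w] -= 1
            if indeg[w] == 0:
                stack.append(w)
    d = {v: math.inf for v in verts}
    d[t] = 0.0
    zeta = {}
    for v in reversed(order):  # vertices close to t are finished first
        for w, c in edges.get(v, []):
            d[v] = min(d[v], c + d[w])
        zeta[v] = min((c + beta * d[w] for w, c in edges.get(v, [])),
                      default=math.inf)
    zeta[t] = 0.0  # the agent is done at t
    return d, zeta
```

For instance, on the small DAG with edges s → a (cost 1), a → t (cost 2) and s → t (cost 4) and β = 1/2, this yields d(s) = 3 and ζ(s) = min{1 + β·2, 4 + β·0} = 2.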
In Section 5 we will generalize Kleinberg and Oren's model to allow the placement of non-negative rewards r(v) at arbitrary vertices v. We call such a placement a reward configuration. Given a specific reward configuration r, let c_r(v, w) = c(v, w) − r(w) be the cost of traversing (v, w) minus the reward collected at w with respect to r. Using c_r as new cost metric, we denote the cost of a cheapest path from w to t as d_r(w). When located at v, the agent considers all paths from v to t and accounts for incident outgoing edges by their actual value, whereas future costs and rewards are discounted by β. More specifically, we define the agent's perceived cost as ζ_r(v) = min{c(v, w) + β(d_r(w) − r(w)) | (v, w) ∈ E}. If ζ_r(v) ≤ 0, the agent traverses an outgoing edge (v, w) which minimizes its perceived cost, i.e. c(v, w) + β(d_r(w) − r(w)) = ζ_r(v). Again ties are broken arbitrarily. Otherwise, if ζ_r(v) > 0, the agent abandons. The agent only collects the rewards placed at the vertices it visits on its path from s to t. We are only interested in the value of the total reward handed out to the agent. We say r is within some given budget b if the agent does not collect a total reward greater than b on any of these paths.
To illustrate the model, we consider the car wash problem once more. Assume that the car has to be washed during the next m days, where m > 50. The task graph G is depicted in Figure 1. For each day i, with 1 ≤ i ≤ m, there is a vertex v_i. Let v_1 be the source. There is an edge (v_i, t) of cost i/50 representing the action that Alice washes the car on day i. In order to keep the drawing simple, the edges (v_i, t) merge in Figure 1. Moreover, for every i < m there is an edge (v_i, v_{i+1}) of cost 0 that represents the postponement of the job from day i to the next day. Assume for now that Alice is located at some v_i, with i < m. Her perceived cost for procrastination is at least β(i + 1)/50. This lower bound is tight if Alice plans to traverse the edges (v_i, v_{i+1}) and (v_{i+1}, t). Alternatively, her perceived cost for using (v_i, t) and washing the car on day i is i/50. Remember that β = 1/3. It follows that ζ(v_i) = β(i + 1)/50, which means that Alice always prefers to wash the car on the next day instead of the current day. Moreover, if i < 50, then ζ(v_i) ≤ βr for the reward of r = 1 provided by the family at t. Thus Alice procrastinates and moves along (v_i, v_{i+1}). Note that her planning is time-inconsistent. On day i she intends to follow the path (v_i, v_{i+1}), (v_{i+1}, t). However, when located at v_{i+1} she pursues a different strategy. Once Alice reaches v_50, she realizes that ζ(v_50) = β(50 + 1)/50 exceeds the perceived value of the reward βr and abandons. Thus G is not motivating.
Next assume that we delete (v_16, v_17) from G. In other words, we remove a procrastination edge and thereby set a deadline at day i = 16. Let G′ be the resulting graph. When Alice reaches v_16 she cannot procrastinate anymore and her perceived cost is ζ_G′(v_16) = 16/50, which is less than the perceived value βr = 1/3 of the reward. Hence Alice washes the car and reaches t. Subgraph G′ is motivating. However, it is interesting to observe that there is no reward configuration r within a budget less than (m/50)/β that is motivating in the original task graph G. This is due to the fact that no matter how much reward is placed at t, Alice will always prefer to procrastinate until day m, when her cost for washing the car is m/50.
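The day-by-day reasoning of the last two paragraphs can be replayed mechanically. The sketch below is our own illustrative simulation of the car wash instance (the function and its deadline parameter are assumptions; a deadline d models the deletion of the procrastination edge (v_d, v_{d+1}), and we use the fact that the cheapest continuation after postponing is washing on the very next day):

```python
def car_wash(m, beta, r, deadline=None):
    """Walk the agent from v_1; return the day the car is washed,
    or None if the agent abandons. Edge (v_i, t) costs i/50."""
    i = 1
    while True:
        wash = i / 50                       # perceived cost of washing today
        can_postpone = i < m and (deadline is None or i < deadline)
        # plan: postpone once, then wash tomorrow (cheapest continuation)
        postpone = beta * (i + 1) / 50 if can_postpone else float("inf")
        if min(wash, postpone) > beta * r:
            return None                     # reward not motivating: abandon
        if postpone < wash:
            i += 1                          # procrastinate one more day
        else:
            return i                        # wash the car on day i
```

Without a deadline the call returns None (Alice abandons, as argued above); with a deadline at day 16 it returns 16.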
To illustrate the strengths of reward configurations, we consider a second scenario. Suppose that at day i = 50 Alice's family offers her a new opportunity to earn pocket money. If she first washes the family's car, which now incurs a cost of 1, and afterwards also cleans her room, which due to years of neglect incurs a cost of 6, she receives 10 Euros. Secretly, the family does not care about Alice cleaning her room. They only try to trick her into washing the car for free. We model this project with a new task graph G that consists of a path from s to t via an intermediate vertex v and another path from v to t via an intermediate vertex w. The edge (s, v) corresponds to the job of washing the car and has a cost of 1, while (v, w) is the job of cleaning Alice's room and has a cost of 6. The edges (v, t) and (w, t) are of cost 0. Assuming that β = 1/3, there is a reward configuration r for which the family can motivate Alice to complete the project within a budget of 0. Setting r(w) = 10, Alice traverses (s, v) with a lowest perceived cost of ζ_r(s) = −1/3. This cost is realized along the edges (s, v), (v, w) and (w, t). When at v, Alice perceives a cost of 8/3 for traversing (v, w) and cleaning her room but 0 for ending the project right away along (v, t). Thus she changes her plan and moves to t without collecting a reward. Interestingly, there is no motivating subgraph of G for a reward less than 3 if the reward must be placed at t. This suggests that, depending on the structure of the task graph, the performance of our two design strategies may vary drastically.
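The walk in this example can be traced with a small simulation of the reward-configuration model. This sketch is our own illustrative code (the dict encoding is an assumption): it computes d_r by memoized recursion over the DAG and moves the agent greedily:

```python
import functools
import math

def agent_walk(edges, rewards, s, t, beta):
    """Greedy agent walk under reward configuration `rewards`.
    Returns (path, total collected reward), or (path, None) on abandonment."""
    @functools.lru_cache(maxsize=None)
    def d_r(v):
        # cheapest cost from v to t under c_r(v, w) = c(v, w) - r(w)
        if v == t:
            return 0.0
        return min((c - rewards.get(w, 0.0) + d_r(w)
                    for w, c in edges.get(v, [])), default=math.inf)
    path, collected, v = [s], 0.0, s
    while v != t:
        options = [(c + beta * (d_r(w) - rewards.get(w, 0.0)), w)
                   for w, c in edges.get(v, [])]
        zeta, nxt = min(options, default=(math.inf, None))
        if zeta > 0:
            return path, None               # zeta_r(v) > 0: agent abandons
        v = nxt
        path.append(v)
        collected += rewards.get(v, 0.0)    # reward claimed on arrival
    return path, collected
```

On the task graph of this example (β = 1/3, r(w) = 10), the walk is s, v, t and the collected reward is 0, i.e. the configuration is exploitative.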

The complexity of finding motivating subgraphs
In this section we first observe that if β = 0 or β = 1, then the problem of finding a motivating subgraph can be solved in polynomial time. We then prove that the decision problem, which we refer to as MOTIVATING SUBGRAPH (MS), is NP-complete for general β ∈ (0, 1). Our proof is based on a reduction from k DISJOINT CONNECTING PATHS (k-DCP), cf. [3]. Lynch [6] showed that k-DCP is NP-complete in undirected graphs. In the Appendix, by adapting Lynch's proof, we show that k-DCP is also NP-complete in directed acyclic graphs.

Proposition 1. If β = 0 or β = 1, a motivating subgraph can be found in polynomial time, provided that one exists.
Proof. We start with β = 0. In this case a subgraph G′ of G is only motivating if at every vertex of G′ the agent's perceived cost is 0. Hence G contains a motivating subgraph if and only if G contains a path from s to t such that all of its edges have cost 0. Any such path is a motivating subgraph. If β = 1, then the agent follows a cheapest path from s to t in any subgraph. Hence G contains a motivating subgraph if and only if there exists a path from s to t with a total edge cost of at most r. Should such a path exist, then G is its own motivating subgraph. Clearly, a motivating subgraph can be found in polynomial time for both cases β = 0 and β = 1.
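Both easy cases can be implemented directly. The following sketch is our own code, not the paper's: it searches for a zero-cost path when β = 0 and compares the cheapest path cost against r when β = 1.

```python
import heapq
import math

def motivating_subgraph_easy(edges, s, t, beta, r):
    """Decide existence of a motivating subgraph for beta in {0, 1}."""
    if beta == 0:
        # beta * r = 0, so the agent only moves along edges of cost 0:
        # search for an s-t path that uses zero-cost edges only.
        seen, stack = {s}, [s]
        while stack:
            v = stack.pop()
            if v == t:
                return True
            for w, c in edges.get(v, []):
                if c == 0 and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    if beta == 1:
        # The agent follows a cheapest path; a motivating subgraph exists
        # iff the cheapest s-t path costs at most r (Dijkstra).
        dist, pq = {s: 0.0}, [(0.0, s)]
        while pq:
            dv, v = heapq.heappop(pq)
            if v == t:
                return dv <= r
            if dv > dist.get(v, math.inf):
                continue
            for w, c in edges.get(v, []):
                if dv + c < dist.get(w, math.inf):
                    dist[w] = dv + c
                    heapq.heappush(pq, (dv + c, w))
        return False
    raise ValueError("this sketch only covers beta in {0, 1}")
```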
We now formally define the decision problem MS.

Definition 1 (MOTIVATING SUBGRAPH).
Given a task graph G, a reward r and a bias factor β ∈ [0, 1], decide the existence of a motivating subgraph of G.
The following proposition, while being interesting in its own right, implies that MS is contained in the complexity class NP.

Proposition 2. For any task graph G, reward r and bias factor β, it can be decided in polynomial-time if G is motivating.
Proof. We modify G in the following way. For each vertex v, we calculate the lowest perceived cost ζ_G(v). Next, we take a copy of G, say G′, and remove all edges (v, w) with c(v, w) + βd_G(w) > ζ_G(v). In other words, we remove all edges from G′ that do not minimize the agent's perceived cost. Because the vertices that can be reached from s in G′ are exactly those vertices that are visited by the agent in G, G is motivating if and only if ζ_G(v) ≤ βr for all vertices that can be reached from s in G′. The latter condition can be checked in polynomial time by standard graph search algorithms.
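The proof translates into a short procedure. Below is our illustrative sketch of this check (not the authors' code): it computes ζ_G by memoized recursion over the DAG, keeps only perceived-cost-minimizing edges, and searches from s.

```python
import math

def is_motivating(edges, s, t, beta, r):
    """Check whether the task graph itself is motivating (Proposition 2)."""
    memo = {}
    def d(v):                       # cheapest cost from v to t (DAG)
        if v == t:
            return 0.0
        if v not in memo:
            memo[v] = min((c + d(w) for w, c in edges.get(v, [])),
                          default=math.inf)
        return memo[v]
    def zeta(v):                    # lowest perceived cost at v
        return min((c + beta * d(w) for w, c in edges.get(v, [])),
                   default=math.inf)
    # Explore G': only edges attaining zeta(v) survive; every vertex the
    # agent can reach must satisfy zeta(v) <= beta * r.
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        if v == t:
            continue
        z = zeta(v)
        if z > beta * r:
            return False            # the agent may abandon at v
        for w, c in edges.get(v, []):
            if c + beta * d(w) == z and w not in seen:
                seen.add(w)
                stack.append(w)
    return True
```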
Before we prove NP-hardness of MS, we restate the definition of k-DCP as a brief reminder.
Definition 2 (k DISJOINT CONNECTING PATHS). Given a directed acyclic graph H and k disjoint vertex pairs (s_1, t_1), . . ., (s_k, t_k), decide if H contains k mutually vertex-disjoint paths, one connecting every s_i to the corresponding t_i.
Furthermore, we introduce a simple but useful lemma, which lets us set edge costs along a path of arbitrary length k such that at every vertex, except for the last, the perceived cost of following the path to its end is exactly 1. Such cost structures will be a recurring feature of the reductions in Theorems 1 and 3.

Lemma 1. For every positive integer k and bias factor β ∈ [0, 1), consider a path u_1, . . ., u_{k+1} in which each edge (u_i, u_{i+1}) has a cost of (1 − β)^{k−i}. Then at every vertex u_i, with i ≤ k, the perceived cost of following the path to u_{k+1} is exactly 1.

Proof. If β is equal to 0, every edge has a cost of 1 and all future edges are discounted to 0, so the claim is easy to verify. However, should β be greater than 0, the geometric series Σ_{j=0}^{k−i−1} (1 − β)^j = (1 − (1 − β)^{k−i})/β yields a perceived cost of (1 − β)^{k−i} + β(1 − (1 − β)^{k−i})/β = 1 at every vertex u_i with i ≤ k.

We are now ready to prove NP-completeness of MS.

Theorem 1. MS is NP-complete for any bias factor β ∈ (0, 1).
Proof. By Proposition 2 we can take any motivating subgraph G′ as certificate for a "yes"-instance of MS. Hence MS is in NP. In the following we present a polynomial-time reduction from k-DCP to show NP-hardness. This establishes the theorem. Consider an instance I of k-DCP, consisting of a directed acyclic graph H and k disjoint vertex pairs (s_1, t_1), . . ., (s_k, t_k). We construct an instance J of MS that is composed of a task graph G, a bias factor β and a reward r. The graph H will be embedded into G in such a way that G has a motivating subgraph if and only if H has k disjoint connecting paths. We proceed to describe the MS instance J. Let β ∈ (0, 1) be any value with the property that its encoding length is polynomial in that of I. Set r = 1/β. The task graph G is constructed as follows, see also Figure 2. It consists of a source s and a target t. These two vertices are connected by a directed main path along intermediate vertices v_1, . . ., v_{k+3}. The first k + 1 edges of the main path each have a cost of (1 − β)^3 − ε, where ε is a positive value satisfying ε < β(1 − β)^3/(k + 1).

Figure 2: The graph G with an embedding of H.
The last three edges of the main path, connecting v_{k+1} to t, have a cost of (1 − β)^2, 1 − β and 1 respectively. Additionally, G contains k shortcuts that connect every v_i, with 1 ≤ i ≤ k, to t via an embedding of H. More formally, H is added to G. The i-th shortcut starts at v_i. It visits a distinct vertex w_i along an edge of cost (1 − β)^2, continues to s_i along an edge of cost (k + 1 − i)(1 − β)/(k + 1), follows the embedding of H to t_i, and finally reaches t along an edge of cost i(1 − β)/(k + 1) + 1. In Figure 2 the latter edge cost is shown as two terms, namely i(1 − β)/(k + 1) and 1, in order to keep the labels of the parallel edges (t_i, t) simple. Note that for any shortcut i, the edge costs of (w_i, s_i) and (t_i, t) complement each other, i.e. they sum to exactly (1 − β) + 1. The edges of H all have a cost of 0. We remark that at every vertex different from t, the perceived value of the reward is βr = 1. The resulting graph G is acyclic and its encoding length is polynomial in that of I. We next prove that I has a solution if and only if J has one.
(=⇒) First assume that I has a solution, i.e. there exist k vertex-disjoint paths, one connecting every s_i to the corresponding t_i. In the embedding of H we remove all edges, except for the k vertex-disjoint paths. Let G′ be the resulting subgraph of G. We will show that G′ is motivating for r = 1/β. More specifically, we will argue that the agent travels along the main path from s to t. If the agent resides at one of the first k vertices v_i, it has two options. Either it traverses (v_i, v_{i+1}) and follows the main path, or it takes (v_i, w_i) and walks along the i-th shortcut. Let v_0 = s. For 0 ≤ i < k, the perceived cost of traversing (v_i, v_{i+1}) and following the (i + 1)-st shortcut is (1 − β)^3 − ε + β((1 − β)^2 + (1 − β) + 1). According to Lemma 1, the value of the perceived cost simplifies to 1 − ε. Note that similar calculations are scattered throughout the entire proof. For the sake of brevity, Lemma 1 will not be referred to explicitly each time. If the agent is at v_k, its perceived cost in following the main path to t is also 1 − ε. Hence, taking (v_i, v_{i+1}) is a motivating option. In contrast, if the agent resides at v_i, with 1 ≤ i ≤ k, and plans to traverse (v_i, w_i), following the i-th shortcut, its perceived cost is 1. Although this option is also motivating, it is perceived as more expensive than taking (v_i, v_{i+1}). As a result, the agent follows the main path until it reaches v_{k+1}. At this point the agent has no option but to stay on the main path. The perceived cost at any of the vertices v_{k+1}, v_{k+2} and v_{k+3} is 1. Thus subgraph G′ is indeed motivating.
(⇐=) Next assume that I does not have a solution. We prove that no subgraph G′ of G is motivating. Consider any subgraph G′. Observe that G′ is only motivating if the agent never leaves the main path. Otherwise the agent must visit some t_i on its way to t, at which point it perceives a cost of i(1 − β)/(k + 1) + 1 > 1 and abandons. We therefore focus on subgraphs G′ that contain all edges of the main path. More specifically, we focus on subgraphs G′ in which the agent walks along the main path. We say that the i-th shortcut is degenerate if the total edge cost of a cheapest path from v_i to t via (v_i, w_i) is different from the target value θ = (1 − β)^2 + (1 − β) + 1. In particular, the i-th shortcut is degenerate if there is no path from v_i to t via (v_i, w_i), in which case the perceived cost of the shortcut is infinite. Note that by construction, every degenerate shortcut must miss the target value by (1 − β)/(k + 1) or more.
We first argue that there is at least one degenerate shortcut in G′. For the sake of contradiction, assume no such shortcut exists. This means that there is a cheapest path P_i of cost θ from v_i to t via (v_i, w_i) for all 1 ≤ i ≤ k. By construction, P_i traverses (w_i, s_i). Remember that the total cost of P_i must sum up to θ. The only way to achieve this is if P_i ends in (t_i, t). Furthermore, P_i must be vertex-disjoint from all other paths P_j with j < i. Otherwise P_i would not be a shortest path from v_i to t, given that c(t_j, t) < c(t_i, t). However, this implies that there are k vertex-disjoint paths in H, one from each s_i to the corresponding t_i, which contradicts the assumption that I has no solution. Now that we have established the existence of a degenerate shortcut, we distinguish two cases. Either there exists a degenerate shortcut i such that the cost of a cheapest path from v_i to t via (v_i, w_i) is less than θ, or for each degenerate shortcut i the cost of a cheapest path from v_i to t via (v_i, w_i) is greater than θ.
We study the first case first. Let i be the largest index of a degenerate shortcut such that the cost of a cheapest path from v_i to t via (v_i, w_i) is less than θ. When located at v_i, the agent perceives a cost of at most (1 − β)^2 + β(θ − (1 − β)^2 − (1 − β)/(k + 1)) = 1 − β(1 − β)/(k + 1) along (v_i, w_i). Conversely, in planning a cheapest path along (v_i, v_{i+1}) and following a subsequent shortcut or the main path, the agent perceives a cost of at least 1 − ε. This holds true because all subsequent shortcuts are of cost θ or more. By choice of ε, the perceived cost along (v_i, w_i) is less than the perceived cost along (v_i, v_{i+1}). However, this contradicts our assumption that the agent stays on the main path.
We finally study the second case. Suppose that the i-th shortcut is degenerate and consider the agent planning its path from v_{i−1} to t via (v_{i−1}, v_i). The agent has two options. If the agent plans to follow the i-th shortcut, it perceives a cost of at least (1 − β)^3 − ε + β(θ + (1 − β)/(k + 1)) = 1 − ε + β(1 − β)/(k + 1) > 1. The inequality holds by choice of ε. If the agent instead plans to traverse (v_{i−1}, v_i) and then either take a shortcut j > i or follow the main path all the way to t, it perceives a cost of at least (1 − β)^3 − ε + β((1 − β)^3 − ε + θ) = 1 + β(1 − β)^3 − (1 + β)ε > 1. This holds true because no shortcut is of cost less than θ. Once more, the perceived cost is greater than 1 by definition of ε. Hence the agent certainly abandons at v_{i−1}, which shows that G′ cannot be motivating.

Approximating optimum rewards
Considering that the decision problem MS is NP-hard, the next and arguably natural question is whether there exist good approximation algorithms. Hence we formulate MS as an optimization problem.

Definition 3 (MOTIVATING SUBGRAPH OPT).
Given a task graph G and a bias factor β ∈ (0, 1), determine the minimum reward r to place at t such that G contains a motivating subgraph.
We present two simple algorithms. The first algorithm is designed for small values of β. The second algorithm computes good solutions for large β. The algorithms output r as well as a corresponding motivating subgraph G′. Both strategies are somewhat reminiscent of Proposition 1. A combination of them yields a (1 + √n)-approximation, for any β ∈ (0, 1).
Suppose that β is small. Then the agent is highly oblivious to the future. Consequently it is sensible to let the agent travel along a path that minimizes the maximum cost of any edge. We call a path with this property a minmax path. A minmax path can be computed easily in polynomial time. For instance, starting with an empty subgraph, insert the edges of G in non-decreasing order of cost until s and t become connected for the first time. Next, choose one of the possibly several paths that connect s and t in the subgraph as minmax path. Our first algorithm, called MINMAXPATHAPPROX, computes a minmax path P and returns the corresponding G′ containing only the edges of P. Furthermore, the algorithm sets r according to the maximum over all perceived costs along P, or more formally r = max{ζ_G′(v) | v ∈ P}/β. Clearly, this reward is sufficient to make G′ motivating.

Proposition 3. MINMAXPATHAPPROX achieves an approximation ratio of 1 + βn, for any β ∈ (0, 1).
Proof. Let c denote the maximum cost of any edge along the path P computed by MINMAXPATHAPPROX. By definition of P, the agent must encounter an edge of cost at least c in any motivating subgraph. Thus the optimum reward is lower bounded by c/β. Conversely, the cost of every edge in P, of which there are at most n − 1, is upper bounded by c. This means that MINMAXPATHAPPROX returns a reward r that is upper bounded by (c + β(n − 2)c)/β ≤ (1 + βn) · c/β, which yields the desired approximation ratio of 1 + βn.
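A possible implementation of MINMAXPATHAPPROX, following the edge-insertion procedure described above, looks as follows. This is our own sketch (it assumes no parallel edges, so that a (v, w) → cost lookup is unambiguous):

```python
def minmax_path_approx(edges, s, t, beta):
    """Return (path, r): a minmax s-t path and a motivating reward for it."""
    edge_list = sorted((c, v, w) for v, outs in edges.items() for w, c in outs)
    sub, parent = {}, {}
    for c, v, w in edge_list:       # insert edges in non-decreasing cost order
        sub.setdefault(v, []).append((w, c))
        parent = {s: None}          # search for an s-t path in the subgraph
        stack = [s]
        while stack:
            x = stack.pop()
            for y, _ in sub.get(x, []):
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        if t in parent:
            break
    else:
        return None                 # t is not reachable from s at all
    path, x = [], t                 # reconstruct the path from the search tree
    while x is not None:
        path.append(x)
        x = parent[x]
    path.reverse()
    cost = {(v, w): c for v, outs in edges.items() for w, c in outs}
    d, r = 0.0, 0.0                 # d: remaining cost from path[i+1] to t
    for i in range(len(path) - 2, -1, -1):
        c_i = cost[path[i], path[i + 1]]
        r = max(r, (c_i + beta * d) / beta)   # r = max zeta(v) / beta
        d += c_i
    return path, r
```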
Our second algorithm, called CHEAPESTPATHAPPROX, computes a path P of minimum total cost from s to t and returns the corresponding G′ containing only the edges of P. Again, the algorithm sets the reward to r = max{ζ_G′(v) | v ∈ P}/β.

Proposition 4. CHEAPESTPATHAPPROX achieves an approximation ratio of 1/β, for any β ∈ (0, 1).
Proof. Let P be the path computed by CHEAPESTPATHAPPROX. At any vertex v of P, the agent perceives a cost of at most d_G′(v), which is upper bounded by d_G′(s). Thus d_G′(s)/β is an upper bound on the reward r calculated by CHEAPESTPATHAPPROX. In an optimal solution, when located at s, the agent is faced with a perceived cost of at least βd_G(s). Consequently, a reward of at least d_G(s) is required to motivate the agent. Because P is a cheapest path, it holds that d_G(s) = d_G′(s), which establishes an approximation ratio of 1/β.
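CHEAPESTPATHAPPROX admits an equally short sketch (again our own illustrative code, assuming no parallel edges): a Dijkstra pass for the cheapest s-t path, followed by the same backward computation of r.

```python
import heapq
import math

def cheapest_path_approx(edges, s, t, beta):
    """Return (path, r): a cheapest s-t path and a motivating reward for it."""
    dist, parent, pq = {s: 0.0}, {s: None}, [(0.0, s)]
    while pq:                                   # Dijkstra from s
        dv, v = heapq.heappop(pq)
        if dv > dist.get(v, math.inf):
            continue
        for w, c in edges.get(v, []):
            if dv + c < dist.get(w, math.inf):
                dist[w] = dv + c
                parent[w] = v
                heapq.heappush(pq, (dv + c, w))
    if t not in parent:
        return None
    path, x = [], t                             # reconstruct the cheapest path
    while x is not None:
        path.append(x)
        x = parent[x]
    path.reverse()
    cost = {(v, w): c for v, outs in edges.items() for w, c in outs}
    d, r = 0.0, 0.0                             # d: remaining cost to t
    for i in range(len(path) - 2, -1, -1):
        c_i = cost[path[i], path[i + 1]]
        r = max(r, (c_i + beta * d) / beta)     # r = max zeta(v) / beta
        d += c_i
    return path, r
```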

Let COMBINEDAPPROX be the combined algorithm that chooses MINMAXPATHAPPROX if β ≤ 1/√n and CHEAPESTPATHAPPROX otherwise. Propositions 3 and 4 immediately imply the following result.

Theorem 2. COMBINEDAPPROX achieves an approximation ratio of 1 + √n, for any β ∈ (0, 1).
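Why the threshold 1/√n gives the claimed ratio can be written out in one line per case (a reconstruction of the case analysis, using only Propositions 3 and 4):

```latex
% Approximation ratio of the combined algorithm:
\[
  \text{ratio} \;\le\;
  \begin{cases}
    1 + \beta n \,\le\, 1 + \sqrt{n}, & \text{if } \beta \le 1/\sqrt{n}
      \quad (\text{Proposition 3}),\\[2pt]
    1/\beta \,<\, \sqrt{n} \,<\, 1 + \sqrt{n}, & \text{if } \beta > 1/\sqrt{n}
      \quad (\text{Proposition 4}).
  \end{cases}
\]
```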
We next prove that, although our (1 + √n)-approximation algorithm is simple, it achieves the best possible performance guarantee, up to a small constant factor, that can be hoped for in polynomial time. For the proof of the theorem we need the next technical lemma.
Lemma 2. For any integer ρ, with ρ ≥ 1, it holds that (1 − 1/(3ρ + 3))^{3ρ+3} > 1/3.

Proof. The sequence (1 − 1/n)^n is monotonically increasing for n ≥ 1. Hence it holds that (1 − 1/(3ρ + 3))^{3ρ+3} ≥ (1 − 1/6)^6 = 15625/46656 > 1/3.

Theorem 3. MS-OPT is NP-hard to approximate within a ratio of √n/4.
Proof. Again we present a reduction from k-DCP. Let ρ be an arbitrary positive integer. The best choice of ρ will be determined later. Given an instance I of k-DCP, we construct an instance J of MS-OPT, consisting of a task graph G and a bias factor β, having the following two properties. (a) If I has a solution, then G has a subgraph that is motivating for a reward of r = 1/β. (b) If I does not have a solution, then no subgraph of G is motivating for a reward of at most r = ρ/β. Hence any algorithm that achieves an approximation ratio of ρ or better must solve instances of k-DCP.
We begin with the description of J. As before, an instance of I is specified by a graph H together with k vertex pairs (s_1, t_1), . . ., (s_k, t_k). Considering that Proposition 4 gives a (1/β)-approximation, the bias factor of J cannot be chosen arbitrarily anymore. It must be less than 1/ρ. For convenience we set β = 1/(3ρ + 3). From a structural point of view, the graph G of J consists of two units, a central unit and an amplification unit. The central unit contains an embedding of H. The amplification unit precedes the central unit and increases any approximation error that might occur in the central unit.
The central unit, depicted in Figure 3, has the same overall structure as the graph constructed in the proof of Theorem 1. There exists a main path and k shortcuts, linking to an embedding of H. However, there are important differences. The main path is longer and edge costs are different. More specifically, the main path starts at a vertex u_{9ρ²}, which is the last vertex of the amplification unit, and ends at vertex t. The path consists of k + 3ρ + 3 intermediate vertices v_1, . . ., v_{k+3ρ+3}. The first k + 1 edges of the main path each have a cost of (1 − β)^{3ρ+3} − ε, where ε is a positive value satisfying ε < min{β(1 − β)^{3ρ+1}/(k + 1), ((1 − β)^{3ρ+3} − 1/3)/2}. Note that (1 − β)^{3ρ+3} − 1/3 is a positive quantity according to Lemma 2. The remaining edges of the main path have increasing cost. In particular, we set the cost of the j-th edge after v_{k+1} to (1 − β)^{3ρ+3−j}, so that the last three edges of the main path have costs (1 − β)^2, 1 − β and 1 respectively. Each of the first k vertices v_i is the starting point of a shortcut. Similar to the construction in Theorem 1, the i-th shortcut is routed through a distinct vertex w_i before it enters the embedding of H and eventually reaches t. The edges (v_i, w_i) have a cost of (1 − β)^{3ρ+2}. Furthermore, the edges (w_i, s_i) and (t_i, t) are of cost (k + 1 − i)(1 − β)^{3ρ+1}/(k + 1) and i(1 − β)^{3ρ+1}/(k + 1) + Σ_{j=0}^{3ρ} (1 − β)^j respectively. In Figure 3, the latter edge cost is shown as two terms, namely i(1 − β)^{3ρ+1}/(k + 1) and Σ_{j=0}^{3ρ} (1 − β)^j, in order to keep the labels of the parallel edges (t_i, t) simple. Note that for 1 ≤ i ≤ k, the costs of (w_i, s_i) and (t_i, t) sum to exactly Σ_{j=0}^{3ρ+1} (1 − β)^j. All edges of H have cost 0. Next we describe the amplification unit, which is shown in Figure 4. Starting at vertex s, there is a directed path to u_{9ρ²}, called the amplification path, that consists of intermediate vertices u_1, . . ., u_{9ρ²−1}. Each edge of the amplification path has a cost of (1 − β)^{3ρ+3} − ε. From every u_i, there is also an edge to a vertex z of cost (1 − β)^{3ρ+2}. Vertex z is connected to t via an edge of cost Σ_{j=0}^{3ρ+1} (1 − β)^j. In the following we prove the statements given in (a) and (b) above.
(a) Assume that I has a solution. Let G′ be the subgraph of G obtained by deleting all edges from the embedding of H that do not belong to one of the k vertex-disjoint paths in a fixed solution of I. We claim that G′ is motivating with reward r = 1/β. Remember that the agent perceives r as 1 on all vertices of G except for t. Furthermore, we will use Lemma 1 to calculate the agent's perceived cost if not stated explicitly otherwise. Let u_0 = s. At every vertex u_i, with i < 9ρ², the agent's perceived cost for moving to u_{i+1} and then directly to t, hence traversing edges (u_i, u_{i+1}), (u_{i+1}, z) and (z, t), is 1 − ε. Conversely, when residing at u_i, with i ≤ 9ρ², the agent's perceived cost in moving directly to t using edges (u_i, z) and (z, t) is 1. Thus the agent moves along the edges of the amplification path until it reaches u_{9ρ²}. At u_{9ρ²} the agent moves on to v_1 because its perceived cost in traversing (u_{9ρ²}, v_1) and then following the first shortcut, which starts at v_1, is 1 − ε. Similarly, when located at v_i, with 1 ≤ i < k, the agent perceives a cost of 1 − ε for traversing (v_i, v_{i+1}) and then taking the (i + 1)-st shortcut. In contrast, planning a path along (v_i, w_i), with 1 ≤ i ≤ k, has a cost of 1. Thus the agent follows the main path until reaching v_k. By the same calculations, the agent moves to v_{k+1}. At this point the agent has no other option but to follow the main path. Because the agent's perceived cost is 1 at all vertices v_i, with k < i ≤ k + 3ρ + 3, it eventually reaches t.
(b) Suppose that I does not have a solution and consider any subgraph G′ of G. We first argue that if the agent leaves the amplification path or the main path, then it abandons given a reward of at most ρ/β, which is perceived as ρ at every vertex different from t. If the agent leaves at some u_i, it must pass z, where it perceives a cost of Σ_{j=0}^{3ρ+1} (1 − β)^j. However, it holds that Σ_{j=0}^{3ρ+1} (1 − β)^j ≥ (3ρ + 2)(1 − β)^{3ρ+1} > (3ρ + 2)/3 > ρ, see Lemma 2 for the second inequality, and therefore the agent abandons. Similarly, if the agent leaves at some v_i, with 1 ≤ i ≤ k, then it must pass one of the vertices t_j, where the perceived cost is greater than Σ_{j=0}^{3ρ} (1 − β)^j and consequently also greater than ρ. Hence, we will restrict ourselves to subgraphs G′ in which the amplification path and main path are intact and assume that the agent walks them.
We say that the i-th shortcut is degenerate if the cost of a cheapest path from v_i to t via (v_i, w_i) differs from the target value θ = ∑_{j=0}^{3ρ+2} (1 − β)^j. In particular, the i-th shortcut is degenerate if no such path exists. Note that by construction, every degenerate shortcut must miss the target value by (1 − β)^{3ρ+1}/(k + 1) or more. As in the proof of Theorem 1, the assumption that I has no solution implies the existence of a degenerate shortcut. By the same argument given in Theorem 1, it is also clear that if there is a degenerate shortcut of cost less than θ, the agent leaves the main path and abandons. In the remainder of the analysis of (b) we therefore assume that for every degenerate shortcut i, the cost of a cheapest path from v_i to t via (v_i, w_i) is greater than θ. We distinguish two cases depending on whether or not the first shortcut is degenerate.
If the first shortcut is not degenerate, then there exists an integer i, with 1 < i ≤ k, such that the (i − 1)-st shortcut is not degenerate but the i-th shortcut is. When the agent resides at v_{i−1} and plans a cheapest path along (v_{i−1}, w_{i−1}), it perceives a cost of 1. In contrast, traversing (v_{i−1}, v_i) and taking the degenerate shortcut i has a perceived cost greater than 1; the inequality holds by the choice of ε and because there are no degenerate shortcuts of cost less than θ.
Thus, when traversing (v_{i−1}, v_i) and continuing further on the main path, possibly taking a subsequent shortcut, the agent's perceived cost is also greater than 1; again, the inequality holds by the choice of ε. Thus the agent leaves the main path at v_{i−1} and abandons. Finally, we study the case that the first shortcut is degenerate and the cost of a cheapest path from v_1 to t via (v_1, w_1) is greater than θ. Let i be the highest index of a vertex on the amplification path such that u_i is connected to t via (u_i, z) and (z, t) in G′. The perceived cost of such a path is 1. Conversely, any path along (u_i, u_{i+1}), or along (u_{9ρ²}, v_1) if i = 9ρ², has a perceived cost greater than 1, as calculated in the last paragraph. Thus the agent leaves the amplification path and abandons. However, if no u_i is connected to t via (u_i, z) and (z, t), then the agent's lowest perceived cost at s is lower bounded by a term that, taking into account that β = 1/(3ρ + 3), simplifies to a value greater than ρ; once more, the two inequalities hold by the choice of ε. Hence G′ is not motivating for a reward of at most ρ/β. In order to finish the proof of the theorem, we have to determine ρ. We set ρ = m, where m is the number of vertices in H. The total number of vertices in G is n = 2 + (9m² + 1) + (m + 2k + 3m + 3). The first term accounts for s and t, the first bracket accounts for the number of vertices in the amplification unit, and the second bracket accounts for the number of vertices in the central unit. Thus we have presented a polynomial-time reduction. Moreover, it holds that n ≤ 9m² + 6m + 6 < 16m² for every m ≥ 2, which means that ρ is lower bounded by √n/4.

Motivation through intermediate rewards
In this section, we generalize the original model of Kleinberg and Oren [4] to incorporate the placement of rewards on arbitrary vertices instead of just t. The goal is to minimize the total value of the rewards collected by the agent as it travels from s to t. The problem of finding an optimal reward configuration can be solved in polynomial time if β = 0 or β = 1. We prove that, for general β ∈ (0, 1), the corresponding decision problem, which we call MOTIVATING REWARD CONFIGURATION (MRC), is NP-complete. Furthermore, we show that the optimization version, MRC-OPT, is NP-hard to approximate within any ratio greater than or equal to 1.
Proposition 5. An optimal reward configuration can be found in polynomial time for β = 0 or β = 1.
Proof. First suppose that β = 0. In this case the agent does not perceive rewards placed at any vertex of G and will only traverse edges of cost 0. Consider the subset V′ of the vertices that can be reached from s on a path of cost 0. The agent travels from s to t and does not abandon if and only if t can be reached from every vertex of V′ on a path of cost 0. Because no rewards need to be placed in this scenario, the optimal budget is b = 0. Next assume β = 1. When the agent is at s, its lowest perceived cost for moving from s to t is equal to d(s). Setting r(t) = d(s) yields a motivating and also optimal reward configuration. This holds true because the agent travels from s to t along the edges of a cheapest path and its lowest perceived cost does not increase on its walk.
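The β = 0 case of this proof amounts to two reachability computations over the zero-cost edges. A sketch of the check (assuming edges are given as (u, v, cost) triples; all names are ours):

```python
from collections import defaultdict

def motivating_at_beta_zero(edges, s, t):
    """For beta = 0 the agent only traverses zero-cost edges, so the
    instance is motivating with budget 0 iff every vertex reachable
    from s through zero-cost edges can also reach t through them."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v, c in edges:
        if c == 0:
            fwd[u].append(v)  # forward zero-cost adjacency
            rev[v].append(u)  # reversed, for backward reachability

    def reach(start, adj):
        # plain depth-first search over the given adjacency lists
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # V' = reach(s, fwd); every vertex of V' must reach t at cost 0,
    # i.e. lie in the backward-reachable set of t.
    return reach(s, fwd) <= reach(t, rev)
```

The overall running time is linear in the size of G, dominated by the two graph searches.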
We now formally introduce the decision problem MRC.

Definition 4 (MOTIVATING REWARD CONFIGURATION).
Given a task graph G, a non-negative budget b and bias factor β ∈ [0, 1], decide the existence of a motivating reward configuration r, with r(v) ≥ 0 for all vertices of G, such that the total reward collected on any of the agent's walks is at most b.
The following proposition establishes membership of MRC in NP.

Proposition 6.
For any task graph G, reward configuration r and bias factor β, it can be decided in polynomial time whether r is motivating within a given budget b.
Proof. Similar to Proposition 2, we modify G in the following way. First, we compute the lowest perceived cost ζ_r(v) for all vertices of G. Next we take a copy of G, say G′, and remove all edges from G′ that do not minimize the agent's perceived cost in the original graph G. More formally, we remove all edges (v, w) for which ζ_r(v) < c(v, w) + β(d_r(w) − r(w)). The given reward configuration r is motivating if and only if ζ_r(v) ≤ 0 for all v that can be reached from s in G′. This condition can be checked in polynomial time using graph search algorithms. To determine whether the budget constraint is satisfied, we assign each edge (v, w) of G′ a cost of r(w). Let c be the maximum cost among all paths from s to t in G′ according to these edge costs. Because G′ is acyclic, c can be computed in polynomial time. Thus r is within budget if and only if c + r(s) ≤ b.
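The final budget check of the proof, i.e. the maximum total reward over all s-t paths of the pruned acyclic graph G′, can be sketched as follows (assuming G′ has already been computed; the interface is ours):

```python
import math
from functools import lru_cache

def max_collected_reward(pruned_edges, reward, s, t):
    """Worst-case total reward over all s-t paths in the pruned graph G':
    weight each edge (v, w) by r(w) and take a longest path. Since G' is
    acyclic, memoisation makes this a linear-time DAG computation."""
    adj = {}
    for u, v in pruned_edges:
        adj.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def best(v):
        # maximum reward collectable on a v-t path; -inf if t unreachable
        if v == t:
            return 0.0
        return max((reward.get(w, 0.0) + best(w) for w in adj.get(v, [])),
                   default=-math.inf)

    # the agent also collects any reward placed at s itself
    return reward.get(s, 0.0) + best(s)
```

The configuration r is then within budget if and only if this value is at most b.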
Our NP-completeness proof of MRC relies on a reduction from SET PACKING (SP) [3]. For convenience, we restate SP in the following definition.

Definition 5 (SET PACKING). Given a collection S_1, ..., S_ℓ of finite sets and an integer k ≤ ℓ, decide whether there exist at least k mutually disjoint sets.
We are now ready to prove NP-completeness of MRC.

Theorem 4. MRC is NP-complete for any bias factor β ∈ (0, 1), even if b = 0.

Proof. By Proposition 6, we can take any motivating reward configuration r within budget b as a certificate for a "yes"-instance of MRC. Hence MRC is in NP. In the following we present a polynomial-time reduction from SP to MRC. We focus on the case that b = 0. At the end of the proof we show how to modify the reduction to handle arbitrary values b > 0.
Let I be an arbitrary instance of SP, consisting of finite sets S_1, ..., S_ℓ and an integer k ≤ ℓ. We start by constructing the task graph G of an MRC instance J. Figure 5 depicts G for a small sample instance of SP. In general, G consists of a source s, a target t and kℓ vertices v_{i,j}, where 1 ≤ i ≤ k and 1 ≤ j ≤ ℓ. Intuitively, if the agent visits v_{i,j}, then S_j is the i-th set to be added to the solution of I. For every v_{i,j} with i < k, there is a directed edge to all vertices v_{i+1,j′} on the next level. We call these edges upward edges. The cost of any such edge is 1 − β − ε. Here β ∈ (0, 1) is any value whose encoding length is polynomial in that of I, and ε is a sufficiently small positive value. In Figure 5 the upward edges are merged to maintain readability. From s there is a directed edge to every vertex v_{1,j} on the bottom level. Again, the cost of all such edges is 1 − β − ε. Finally, for every vertex v_{k,j} on level k there exists an edge of cost 0 to t.
In order to guide the agent onto a specific upward edge, we add shortcuts to G that connect every v_{i,j} to t via an intermediate vertex w_{i,j}. The first edge (v_{i,j}, w_{i,j}) has cost 1 and the second edge (w_{i,j}, t) has cost 0. In Figure 5 the edges (w_{i,j}, t) are omitted for the sake of readability. As we will see, a reward of value less than 1/β can be placed on w_{i,j} such that the agent does not claim it. Finally, we introduce downward paths of length two that connect each v_{i,j} with all w_{i′,j′} for which i′ < i and S_j ∩ S_{j′} ≠ ∅, i.e. the sets are not disjoint. The first edge has a cost of 0, while the second edge has a cost of (1 − β − kε)/(β − β²). In Figure 5 each downward path is drawn as a single dashed edge. The purpose of these paths is to let the agent claim a reward or abandon whenever the disjointness constraints of I are violated. Notice that G is acyclic. We will show that I has a solution if and only if J has one.
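The construction above is entirely mechanical, which is what makes the reduction polynomial. A sketch of the edge list it produces (our encoding with string-named vertices; a fresh vertex d_... serves as the middle of each downward path):

```python
def build_mrc_graph(sets, k, beta, eps):
    """Sketch of the reduction's task graph: returns (u, v, cost) edges.
    `sets` is a list of Python sets S_1, ..., S_l (0-indexed here)."""
    l = len(sets)
    up = 1 - beta - eps
    edges = []
    for j in range(l):                         # source to bottom level
        edges.append(('s', f'v_1_{j}', up))
    for i in range(1, k):                      # upward edges between levels
        for j in range(l):
            for j2 in range(l):
                edges.append((f'v_{i}_{j}', f'v_{i+1}_{j2}', up))
    for j in range(l):                         # top level to target
        edges.append((f'v_{k}_{j}', 't', 0.0))
    for i in range(1, k + 1):                  # shortcuts via w_{i,j}
        for j in range(l):
            edges.append((f'v_{i}_{j}', f'w_{i}_{j}', 1.0))
            edges.append((f'w_{i}_{j}', 't', 0.0))
    down = (1 - beta - k * eps) / (beta - beta**2)
    for i in range(2, k + 1):                  # downward paths for conflicts
        for j in range(l):
            for i2 in range(1, i):
                for j2 in range(l):
                    if sets[j] & sets[j2]:     # S_j and S_{j2} intersect
                        mid = f'd_{i}_{j}_{i2}_{j2}'
                        edges.append((f'v_{i}_{j}', mid, 0.0))
                        edges.append((mid, f'w_{i2}_{j2}', down))
    return edges
```

For instance, two disjoint singleton sets with k = 2 yield downward paths only from each v_{2,j} back to its own w_{1,j}, since every set intersects itself.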
(=⇒) First assume that I has a solution, i.e. there exist k mutually disjoint sets among S_1, ..., S_ℓ. Fix such a selection of k disjoint sets and assign each set to a distinct level of G. Suppose that S_j of the collection is assigned to level i. Then place a reward of value (1 − ε)/β on w_{i,j}. The corresponding shortcut from v_{i,j} is referred to as an active shortcut. We now analyze the agent's walk through G and show that it visits exactly the vertices v_{i,j} at which the active shortcuts start, and that it does not claim any reward.
Suppose that the agent is located at the initial vertex v_{i,j}, with i < k, of an active shortcut. There are three options. First, the agent could follow the shortcut to w_{i,j}. However, the perceived cost along this path to t is ε, so the agent has no incentive to move to w_{i,j}. Secondly, the agent could take a downward path. By construction, none of them leads to an active shortcut. This means that the agent cannot collect any reward on such a path but encounters a positive cost on the downward path. Hence this option is not motivating either. The agent's only remaining option is to take an upward edge. If the agent plans to take the active shortcut of level i + 1, it perceives a total cost of 0. This is a motivating choice. Conversely, assume that the agent plans a path P to t that visits a vertex v_{i+1,j′} on the next level such that the corresponding shortcut is not active. We distinguish four scenarios. If P includes a downward path, then at best one reward can be located along P, so that the agent's perceived cost remains positive and P is not motivating; the inequality follows from our choice of ε. If P includes the shortcut at v_{i+1,j′}, then the agent encounters no reward but a positive cost, which is not motivating either. If P includes a shortcut on some level above i + 1, then the agent must traverse at least two upward edges but can collect at most one reward. In this case the agent's perceived cost is again positive; as always, the inequality follows by the choice of ε. Finally, P may include neither a downward path nor a shortcut. However, this means that the agent has positive cost and does not collect any reward. All in all, the only motivating option is to take the upward edge leading to the active shortcut of level i + 1.
The same arguments also apply when the agent is at s or at the initial vertex of an active shortcut on the top level. At s, the agent's only option is to take an upward edge. Hence it moves to the initial vertex of the active shortcut of the bottom level. At the top level the agent takes the direct edge to t, which incurs no cost. All other options, namely taking a downward path or the current shortcut, are not motivating.
(⇐=) Next assume that J has a solution, i.e. there exists a motivating reward configuration such that the agent does not claim any reward. Consider the agent's possible walks. A first crucial observation is that no such walk enters a shortcut or a downward path, because a positive reward on a vertex along these paths would be needed to guide the agent onto them. Considering that the agent cannot change its plan once it has entered a shortcut or downward path, it would either claim the reward or abandon, which contradicts the assumption that J has a solution. Hence the agent visits one vertex v_{i,j} at each level i. We call every v_{i,j} that is contained in one of the agent's possible walks an active vertex. Note that there might be more than one active vertex per level.
We next prove that at every active vertex the agent's lowest perceived cost is at least (1 − k)ε. More specifically, we show by backwards induction, from the top level down to the bottom level, that whenever the agent is located at an active vertex of level i, its perceived cost in planning a path to t is at least (i − k)ε. Moreover, if i < k, then the only motivating paths are those passing through the shortcuts of active vertices on level i + 1. First assume that the agent is at an active vertex v_{k,j} on the top level. As argued above, the agent cannot take the shortcut or a downward path to t. However, the edge (v_{k,j}, t) is a motivating path with a perceived cost of 0, which is equal to (k − k)ε. This proves the base case of our induction.
For the inductive step, suppose that i < k and that the agent is located at some active vertex v_{i,j}. Let v_{i+1,j′} be the active vertex visited next by the agent. Because the agent moves from v_{i,j} to v_{i+1,j′}, there must exist a path P from v_{i,j} to t via (v_{i,j}, v_{i+1,j′}) that minimizes the agent's perceived cost. We distinguish four scenarios. First, assume that i = k − 1 and P contains (v_{i+1,j′}, t). This means that the agent receives no reward but has positive cost, which is not motivating. Secondly, assume P contains an upward edge leaving v_{i+1,j′} and consider the perceived cost of the remaining portion of P when viewed from v_{i+1,j′}. By the induction hypothesis this cost must be at least ((i + 1) − k)ε. Furthermore, no reward may be placed at v_{i+1,j′}, as this would violate the budget. This means that the perceived cost of P at v_{i,j} increases by β(1 − β − ε) when compared to the perceived cost of P at v_{i+1,j′}. Thus the perceived cost of P at v_{i,j} is at least ((i + 1) − k)ε + β(1 − β − ε) > 0; the last inequality holds by the choice of ε. Hence P is not motivating. Thirdly, assume P contains a downward path out of v_{i+1,j′}. In this case, the perceived cost of P at v_{i,j} compared to the perceived cost of P at v_{i+1,j′} increases by even more, namely 1 − β − ε. Certainly, P cannot be motivating. Finally, assume P contains the shortcut out of v_{i+1,j′}. When viewed from v_{i,j} instead of v_{i+1,j′}, the perceived cost of P increases by 1 − β − ε and decreases by 1 − β. Thus the perceived cost is at least ((i + 1) − k)ε − ε = (i − k)ε, which concludes the inductive step. By a similar argument, the only motivating paths out of s traverse the shortcut of an active vertex on the bottom level.
The last three paragraphs imply that for every active vertex v_{i,j} a reward of at least (1 − ε)/β has to be placed at w_{i,j}. Otherwise the shortcut would not be motivating when the agent resides at an active vertex on the previous level i − 1, or at s if i = 1. This implies that there can be no downward path connecting an active vertex v_{i,j} to the shortcut of an active vertex on a lower level, because the perceived cost at v_{i,j} for following the downward path would be at most 0, making the downward path motivating and leading the agent to claim a reward, a contradiction. Thus the active vertices v_{i,j} along any walk of the agent correspond to k disjoint sets S_j, which proves that I has a solution. We finally address the case that the agent may collect a total reward of b > 0. Consider a slightly modified version of G. We rename t to t′ and add an edge from t′ to a new target t. The cost of this edge is βb. The agent only reaches t from t′ if a reward of value b is placed on t. With this observation the above proof immediately carries over.
We next turn to the optimization variant of MRC.

Definition 6 (MOTIVATING REWARD CONFIGURATION OPT). Given a task graph G and a bias factor β ∈ (0, 1), determine the minimum budget b for which there exists a reward configuration r, with r(v) ≥ 0 for all vertices of G, such that the total reward collected on any of the agent's walks is at most b.
Assuming that P ≠ NP, Theorem 4 implies that there exists no polynomial-time algorithm that approximates a motivating reward configuration such that the required budget is within any ratio greater than or equal to 1 compared to the budget of an optimal solution. This follows from the fact that MRC is NP-complete even in the special case that b = 0.

Corollary 1. MRC-OPT is NP-hard to approximate within any ratio greater than or equal to 1.

Conclusions
In this paper we have studied computational problems in time-inconsistent planning using a graph model by Kleinberg and Oren [4]. As a main result, assuming P ≠ NP, we established asymptotically tight upper and lower bounds of Θ(√n) on the efficient approximability of MSG-OPT, as well as a negative approximability result for MRC-OPT. Given the state of the art, we believe that a generalization of the graph model to quasi-hyperbolic discount functions is a promising research direction. In quasi-hyperbolic discounting, which is frequently used in the behavioral economics literature [2,5], there are two parameters β, δ ∈ [0, 1]. Any value c that is realized t time steps in the future is perceived with a current value of βδ^t c. For t = 0, the perceived value is c. Note that Kleinberg and Oren's [4] model is the special case of quasi-hyperbolic discounting with δ = 1. Although such discount functions are more involved, the exponentially fading value of future costs and rewards might allow for improved approximation guarantees if δ < 1.
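The quasi-hyperbolic evaluation rule can be illustrated directly (our sketch, not from [4]): a cost realized t steps ahead is weighted by βδ^t for t ≥ 1 and counts fully for t = 0, so δ = 1 recovers the model studied in this paper.

```python
def quasi_hyperbolic_perceived(costs, beta, delta):
    """Perceived cost of a planned path under quasi-hyperbolic
    (beta-delta) discounting: a cost realised t steps in the future
    is weighted by beta * delta**t; the immediate cost counts fully."""
    return sum(c if t == 0 else beta * delta**t * c
               for t, c in enumerate(costs))
```

For example, three unit-cost steps with β = 1/2 are perceived as 1 + 1/2 + 1/2 = 2 when δ = 1, but only as 1 + 1/4 + 1/8 = 1.375 when δ = 1/2, which hints at why a fading δ might simplify the approximation problem.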
For s′_j and t′_j the path of any literal in c_j that evaluates to true can be selected. Because the formula is satisfied, at least one such path must exist. By construction of H, this yields m + n vertex-disjoint connecting paths.
(⇐=) Observe that every literal's path has exactly one intermediate vertex. As a result, the only paths that connect a terminal s′_j with t′_j are exactly the paths that correspond to the literals of c_j. The same holds for the high and low paths of a variable x_i. Hence, if there are m + n vertex-disjoint connecting paths in H, then the chosen high and low paths directly translate to a satisfying variable assignment of the 3-SAT formula.

Figure 1: The task graph of the car wash problem.

Figure 3: The central unit of G.

Figure 4: The amplification unit of G.