1 Introduction

Networks occur naturally in many areas such as economics, computer science, chemistry, and biology. A common way to model scenarios within networks is to use Markov chains. For a finite state space, a transition matrix describes the structure of changes of the chain's states, see e.g. Gagniuc (2017). Usually, the iterative process of repeated transitions over the states converges to a stationary distribution. However, if the considered time horizon is short, the question arises of how to efficiently manipulate the distribution of information within the network. As an obvious choice for manipulation, each agent may start with an initial distribution and spread the information by communicating with the neighbors. But, since an agent's own manipulation power is often limited, it is quite reasonable to engage intermediary organizations instead. This could be due to restricted access to parts of the network, to the risk of revealing the manipulation interests too openly, or to a lack of knowledge about the network structure. Possible examples include:

Influencers:

Companies that want to credibly advertise their products or services via social media channels pay influencers on social media platforms. The influencers act as part of the network and spread information on the products or services.

Search engines:

Website owners try to increase the visibility of their websites. In order to find proper content, most Internet users enter a query into a web search engine. Thus, the query results strongly influence the short-term behavior of the users.

Conspiracy theory:

Agents try to spread false information to serve various interests. In order to spread fake news, such agents have to rely on different distribution channels, such as groups in social media networks or Internet blogs.

Note that two of our examples are related to the manipulation of information in a social network. There is a growing literature concerning this topic, see Acemoglu and Ozdaglar (2011) for an overview. Most of the models describe the update of opinions or beliefs, see e.g. DeGroot (1974), which is done according to a convex combination of other network members' opinions. Applying traditional techniques from the analysis of Markov chains, the formation of a consensus is examined. In these approaches, manipulation is modeled by modifying the transition matrix, e.g. by introducing randomness, see Acemoglu and Ozdaglar (2011). Förster et al. (2016) studied manipulation in a model of opinion formation, where the weights of the transition matrices can be changed by agents, while all starting distributions are fixed. Our model differs from those in the existing literature, since we examine how the information regarding a topic is distributed among network participants through intermediaries. Loosely speaking, we analyze who knows how much and how this information state can be efficiently manipulated by engaging intermediaries. As mentioned by Acemoglu and Ozdaglar (2011), one central component of opinion formation is how agents update their prior beliefs based on new information. In this paper, we also contribute to opinion formation, because we investigate a way to manipulate the acquisition of information by employing the network of information sources. Our goal is not to learn the complete structure of the network, for which usually hidden Markov models are applied, see e.g. Yang et al. (1997). Instead, organizations should be able to select a starting distribution aiming to arrive at a certain information state after a number of iterations. Agents choose among the intermediary organizations to boost manipulation. This is at the core of our network manipulation algorithm. Note that a similar problem has been analyzed by Lindqvist (1977), where the author applies decision-theoretic techniques to observe a state at a time and obtain information about the initial state.

Let us comment on the mathematics behind the proposed network manipulation algorithm. It is motivated by Nesterov (2020), where a new technique for soft clustering is introduced. For this, voters and political parties alternately solve their subproblems, yielding an alternating minimization scheme. The behavior of voters turns out to be in accordance with the well-known multinomial logit model from discrete choice theory. Namely, the voters choose rationally among the parties, but are prone to random errors, see e.g. Anderson et al. (1992). The parties update their political positions depending on how many voters they attract. Overall, the resulting soft clustering is given in terms of probabilities from the multinomial logit model. In this paper, we generalize the idea suggested in Nesterov (2020) to a broader class of discrete choice probabilities. This is done by presenting a network manipulation model based on alternating steps performed by agents and organizations. Agents try to manipulate a network by choosing intermediary organizations to help with this. In order to select among the organizations, agents observe which of them manipulates the network more in line with the agents' goals. While doing so, agents are prone to random errors, which lead to choice probabilities following certain discrete choice models examined in our previous paper Müller et al. (2021a). Altogether, we show how the alternating minimization scheme introduced by Nesterov (2020) can be applied to network manipulation. Additionally, we present an inexact version of the alternating minimization scheme. The inexactness is due to the fact that the subproblems of agents and/or organizations may not be solved exactly and may suffer from numerical inaccuracies. Overall, we conclude that the agents' imperfect behavior and the organizations' conservatism in profit maximization reduce the accumulated errors.

Notation In this paper, we mainly focus on subspaces of \(\mathbb {R}^n\) and \(\mathbb {R}^{m \times n}\), where \(\mathbb {R}^n\) is the space of n-dimensional column vectors

$$\begin{aligned} x = \left( x^{(1)}, \dots , x^{(n)}\right) ^T, \end{aligned}$$

and \(\mathbb {R}^{m \times n}\) denotes the linear space of \((m \times n)\)-matrices. We denote by \(e_j \in \mathbb {R}^n\) the j-th coordinate vector of \(\mathbb {R}^n\) and write e for the vector of an appropriate dimension whose components are equal to one. By \(\mathbb {R}^n_+\) we denote the set of all vectors with nonnegative components and notation \(\varDelta _n\) is used for the standard simplex

$$\begin{aligned} \varDelta _n = \left\{ x \in \mathbb {R}^n_+ : \sum _{i=1}^{n} x^{(i)} = 1\right\} . \end{aligned}$$

For \(x \in \mathbb {R}^n\) we use the following norms:

$$\begin{aligned} \Vert x\Vert _2 = \left[ \sum _{i=1}^{n} \left( x^{(i)}\right) ^2\right] ^\frac{1}{2}, \quad \Vert x\Vert _1 = \sum _{i=1}^{n} \left| x^{(i)}\right| , \quad \Vert x\Vert _\infty = \max _{i=1, \ldots , n} \left| x^{(i)}\right| . \end{aligned}$$

For \(x, s \in \mathbb {R}^n\) we use the standard scalar product:

$$\begin{aligned} \langle x, s \rangle = \sum \limits _{i=1}^n x^{(i)} s^{(i)}. \end{aligned}$$

For matrices \(A,B \in \mathbb {R}^{m\times n}\) the inner product is defined via the trace:

$$\begin{aligned} \langle A,B\rangle = \text{ Tr }(A^T\cdot B). \end{aligned}$$

A function \(F:Q \rightarrow \mathbb {R}\) is called \(\beta\)-strongly convex on a convex and closed set \(Q \subset \mathbb {R}^n\) w.r.t. a norm \(\Vert \cdot \Vert\) if for all \(x,y \in Q\) and \(\alpha \in [0,1]\) it holds:

$$\begin{aligned} F(\alpha x +(1-\alpha )y) \le \alpha F(x) + (1-\alpha ) F(y) - \alpha (1-\alpha ) \cdot \frac{\beta }{2} \Vert x-y\Vert ^2. \end{aligned}$$

The constant \(\beta \ge 0\) is called the convexity parameter of F. If \(\beta =0\), we simply call F convex. A function \(\pi\) is \(\beta\)-strongly concave if \(-\pi\) is \(\beta\)-strongly convex. For a convex function \(F:Q \rightarrow \mathbb {R}\) the set \(\partial F(x)\) represents its subdifferential at \(x \in Q\), i.e.

$$\begin{aligned} \partial F(x) = \left\{ g \in \mathbb {R}^n : F(y) \ge F(x) + \langle g, y - x\rangle \, \text{ for } \text{ all } \, y \in Q \right\} . \end{aligned}$$

Its convex conjugate is

$$\begin{aligned}F^*(s) = \underset{x \in \mathbb {R}^n}{\sup }\left\{ \left\langle x,s \right\rangle - F(x)\right\} ,\end{aligned}$$

where \(s \in \mathbb {R}^n\) is a vector of dual variables. We denote by \(\nabla F(x)\) the gradient of a differentiable function F at x.

2 Manipulation model

Let us introduce our model in order to later construct a manipulation algorithm based on interaction within a network.

2.1 Interaction network

A central aspect of our model is a network with n nodes. The structure of this network describes how nodes interact with each other, e.g. how persons receive and exchange information. Here, a link from node j to node i represents a connection. In the context of an information network, such a link would depict that person i acquires information from person j. We summarize the data in a transition matrix \(M = \left( M_{ij}\right) _{i,j =1}^{n}\), where \(M_{ij}\) denotes the transition probability from node j to node i. Hence, the following holds:

$$\begin{aligned} \sum \limits _{i=1}^{n} M_{ij} = 1 \quad \text {for} \quad j=1,\ldots , n. \end{aligned}$$

M is a column stochastic matrix, i.e. \(M\ge 0, \; e^T\cdot M = e^T\). Our model describes the process of information acquisition rather than the formation of opinions as e.g. in Förster et al. (2016). We are interested in a few periods of interaction, thus, we take the transition matrix as fixed. The interaction causes different states of the network, based on the connections of its nodes. The state of a network can be represented as an element of the standard simplex in \(\mathbb {R}^n\) depending on a time variable. We call a vector \(x(t) \in \varDelta _n\) a state of the network at time t. Such a state reflects the value each node possesses after an interaction with other nodes. This could be, for example, the amount of information a person possesses in relation to the others or the market share of a company.

The dynamics of interaction can be described by an iterative process. Starting with a vector \(x(0) \in \varDelta _n\), the nodes interact repeatedly with each other. Thus, the iterative process is given by

$$\begin{aligned} x(t) = M\cdot x(t-1)= \ldots = M^t\cdot x(0). \end{aligned}$$
(1)

Obviously, all x(t) generated according to this process are elements of \(\varDelta _n\). Our idea is closely related to the concept of network rankings, such as the famous PageRank from Page et al. (1999). However, we focus on a limited, typically small, number of interaction periods. Within an information network, persons would typically exchange information for a few periods before they make a decision. This short-term behavior endows the starting vector \(x(0) \in \varDelta _n\) with importance. For the sake of brevity, we drop the time index by writing \(x=x(0)\).
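To make the dynamics (1) concrete, here is a minimal Python sketch that iterates a small, purely illustrative 3-node network; the matrix M and the starting state x(0) are assumptions chosen for demonstration only.

```python
import numpy as np

# Hypothetical column-stochastic transition matrix of a 3-node network:
# M[i, j] is the transition probability from node j to node i.
M = np.array([
    [0.6, 0.2, 0.1],
    [0.3, 0.7, 0.2],
    [0.1, 0.1, 0.7],
])
assert np.allclose(M.sum(axis=0), 1.0)  # e^T M = e^T

x = np.array([1.0, 0.0, 0.0])  # starting state x(0) in the simplex

# Iterate x(t) = M x(t-1); every x(t) remains in the simplex.
for t in range(1, 5):
    x = M @ x
    print(f"x({t}) = {np.round(x, 4)}")
```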

2.2 Agents

Let us assume that agents want to manipulate the resulting state of a network in favor of their own interests. Though they aspire to certain network states, agents face several challenges when trying to manipulate a network. Often, they do not have knowledge of the network structure. Additionally, there are many situations where agents cannot participate in the network because they cannot connect to a node without revealing their intentions, e.g. companies cannot credibly advertise their products by themselves. There might also be networks in which an agent could interact, but is restricted to start with a fixed vector, in particular, if the information is just spread uniformly. Instead, agents could instruct organizations to manipulate the interaction in order to reach an aspired state of the network. The organizations often have more information or at least experience about the structure of a network. In fact, they could even operate it. The agents choose among K organizations, where each organization provides an observable utility \(u^{(k)}\). We describe this discrete choice behavior by means of so-called additive random utility models. The additive decomposition of utility goes back to psychological experiments conducted in the 1920s by Thurstone (1927). A formal description of this framework was first introduced in an economic context by McFadden (1978), where rational decision-makers choose from a finite set of mutually exclusive alternatives \(\{1, \ldots , K\}\). Although the decision rule follows a rational behavior, agents are prone to random errors. These random errors describe decision-affecting features which cannot be observed. Each alternative \(k = 1,\ldots , K\) provides the utility

$$\begin{aligned}u^{(k)} + \epsilon ^{(k)}, \end{aligned}$$

where \(u^{(k)} \in \mathbb {R}\) is the deterministic utility part of the k-th alternative and \(\epsilon ^{(k)}\) is its stochastic error. We use the following notation for the vectors of deterministic utilities and of random utilities, respectively:

$$\begin{aligned} u = \left( u^{(1)}, \ldots , u^{(K)}\right) ^T, \quad \epsilon = \left( \epsilon ^{(1)}, \ldots , \epsilon ^{(K)}\right) ^T. \end{aligned}$$

The probabilistic framework yields choice probabilities for each alternative:

$$\begin{aligned} p^{(k)} = \mathbb {P}\left( u^{(k)} + \epsilon ^{(k)} = \max _{1 \le m \le K} u^{(m)} + \epsilon ^{(m)}\right) , \quad k=1, \ldots , K. \end{aligned}$$
(2)

As the agents behave rationally, their surplus is given by the expected maximum utility of their decision:

$$\begin{aligned} E(u) = \mathbb {E}_\epsilon \left( \max _{1 \le k \le K} u^{(k)} + \epsilon ^{(k)}\right) . \end{aligned}$$
(3)

It is well known that the surplus function is convex, see e.g. Anderson et al. (1992). Additionally, we make a standard assumption concerning the distribution of random errors.

Assumption 1

The random vector \(\epsilon\) follows a joint distribution with zero mean that is absolutely continuous with respect to the Lebesgue measure and fully supported on \(\mathbb {R}^K\).

We stress that the zero mean part of Assumption 1 is not restrictive and could be replaced by a finite mean assumption. By adding constants to the deterministic utilities u, it can be achieved that the random vector \(\epsilon\) has zero mean, see e. g. Train (2009).

Let \(g_{k,m}\) denote the density function of the difference \(\epsilon ^{(m)} - \epsilon ^{(k)}\), \(k \not = m\), of random errors. Any point \({\bar{z}}_{k,m} \in \mathbb {R}\) which maximizes the density function \(g_{k,m}\) is called a mode of the random variable \(\epsilon ^{(m)} - \epsilon ^{(k)}\).

Assumption 2

The differences \(\epsilon ^{(k)} - \epsilon ^{(m)}\) of random errors have finite modes for all \(k \not = m\).

Assumption 1 guarantees that no ties occur in (3), which provides differentiability of the surplus function. Further, the gradient of E corresponds to the vector of choice probabilities, which is known as the Williams-Daly-Zachary theorem, see e.g. McFadden (1978), i.e.

$$\begin{aligned} \frac{\partial E}{\partial u^{(k)}} = p^{(k)}, \quad k=1, \ldots , K. \end{aligned}$$
(4)

Hence, the k-th component of the gradient of E yields the probability that alternative k provides the maximum utility among all alternatives.

Another equivalent representation of choice probabilities can be obtained by means of the convex conjugate of the surplus function. Note that the convex conjugate of E is given by the function \({E^*: \mathbb {R}^K \rightarrow \mathbb {R}\; \cup \; \left\{ \infty \right\} }\), defined by:

$$\begin{aligned} E^*(p) = \sup _{u \in \mathbb {R}^K} \left\{ \langle p, u \rangle - E(u)\right\} , \end{aligned}$$

where \(p = \left( p^{(1)}, \ldots , p^{(K)}\right) ^T\) is the vector of dual variables. In view of conjugate duality, the vector of choice probabilities can be derived from an optimization problem of rational inattention, see e. g. Fosgerau et al. (2020) and Müller et al. (2021a). Indeed, it has been shown that under Assumption 2 the vector of choice probabilities p is the unique solution of

$$\begin{aligned} \underset{p \in \varDelta _K}{\max } \left\{ \langle u,p \rangle - E^*(p)\right\} . \end{aligned}$$
(5)

Now, we assume that there are N agents trying to manipulate the network. Each agent i has an aspired state of the network which we denote by \(v_i \in \varDelta _n\). In order to reach the aspired state, agents can choose among K organizations. The k-th organization is able to manipulate the interaction dynamics in the network, which yields at time t a network state \(x_k(t) \in \varDelta _n, \; k=1, \ldots , K\). In general, agents prefer organizations which provide a network state in line with the states they desire, such as an aspired market-share distribution or information state. In order to assess the outcome of a manipulation, any agent i has to compare K distances, i.e.

$$\begin{aligned} \Vert v_i - x_k(t)\Vert _2 , \quad k=1, \ldots , K, \end{aligned}$$

respectively

$$\begin{aligned} \Vert v_i - M^t\cdot x_k\Vert _2 , \quad k=1, \ldots , K. \end{aligned}$$
(6)

Note that (6) provides a way for agent i to observe the utility of choosing the k-th organization. The network state at time t is observable, so any agent is able to check whether an organization has manipulated the network satisfactorily. Let us collect all these states in a matrix, which summarizes the states of the network at time t in one variable, i.e.

$$\begin{aligned} X(t)= \left( x_1(t), \ldots , x_K(t)\right) \in \varDelta _n^K. \end{aligned}$$

The matrix above can also be expressed in terms of the starting vectors, by defining

$$\begin{aligned} X = \left( x_1, \ldots , x_K\right) \in \varDelta _n^K, \end{aligned}$$

which enables us to write

$$\begin{aligned} X(t) = M^t \cdot X. \end{aligned}$$
(7)

We define a vector-valued function \(g_i: \varDelta _n^K \rightarrow \mathbb {R}_+^K\) for any agent i, which stores all the distances of the i-th agent and, hence, depends on the matrix X as input variable:

$$\begin{aligned} g_i(X) = \left( \Vert v_i - M^t\cdot x_1\Vert _2, \ldots , \Vert v_i - M^t\cdot x_K\Vert _2\right) ^T,\quad i=1,\ldots , N. \end{aligned}$$

We write in matrix form:

$$\begin{aligned} G(X) = \left( g_1(X), \ldots , g_N(X)\right) \in \mathbb {R}^{K \times N}. \end{aligned}$$

In view of additive random utility models, \(g_i(\cdot )\) provides a way to characterize the observable utility \(u_i\) by setting

$$\begin{aligned} u_i = -g_i(X), \quad i=1,\ldots ,N. \end{aligned}$$

Hence, the vector of the i-th agent's choice probabilities has entries

$$\begin{aligned} p_i^{(k)}(X) = \mathbb {P}\left( -g_i^{(k)}(X) + \epsilon _i^{(k)} = \max _{1 \le m \le K} -g_i^{(m)}(X) + \epsilon _i^{(m)}\right) , \quad k=1, \ldots , K. \end{aligned}$$

Equivalently, \(p_i(X)\) solves the following rational inattention problem:

$$\begin{aligned} \min _{p \in \varDelta _K} \langle g_i(X),p \rangle + E_i^*(p). \end{aligned}$$
(8)

Let us stack the choice probabilities of all the agents into a matrix and call it the choice matrix:

$$\begin{aligned} P(X) = \left( p_1(X), \ldots , p_N(X)\right) \in \varDelta _K^N. \end{aligned}$$
(9)

Similarly to the choice matrix, we write \(P \in \varDelta _K^N\) for any matrix of probability vectors, i.e. \(P = \left( p_1, \ldots , p_N\right)\) with \(p_i \in \varDelta _K\), \(i=1, \ldots , N\).
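To illustrate how a choice matrix arises, the following sketch computes P(X) from (9) under the assumption of multinomial logit errors (see Remark 1 in Sect. 4), in which case each \(p_i(X)\) takes a closed softmax form; the function name choice_matrix and the noise parameter mu are our own illustrative choices.

```python
import numpy as np

def choice_matrix(X, M, V, t, mu=0.1):
    """Sketch of the choice matrix P(X) from (9), assuming MNL errors so
    that p_i(X) is the softmax of -g_i(X)/mu.
    X: (n, K) starting distributions; V: (n, N) aspired states v_i."""
    Xt = np.linalg.matrix_power(M, t) @ X  # manipulated states M^t x_k
    # G[k, i] = || v_i - M^t x_k ||_2, the distances from (6)
    G = np.linalg.norm(Xt[:, :, None] - V[:, None, :], axis=0)
    U = -G / mu                              # scaled deterministic utilities
    U -= U.max(axis=0, keepdims=True)        # numerically stable softmax
    P = np.exp(U)
    return P / P.sum(axis=0, keepdims=True)  # columns p_i(X) lie in the simplex
```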

2.3 Organizations

Let us describe the behavior of the advertising organizations. Their goal is to attract agents as clients by providing them with additional manipulation power. This is done by choosing an appropriate starting distribution; thus, the communication process is initialized by the organizations. By strategic decisions, such as substantial alignment, design, product placements or personal relations, or by direct decisions, such as the ranking of a website as the result of a certain query or advertising products directly on a marketplace, the organizations determine these vectors, which reflect a network state before the interaction starts.

In order to attract the i-th agent with aspired state \(v_i\), the k-th organization selects a starting distribution \(x_k \in \varDelta _n\) such that \(\left\| v_i - M^t\cdot x_k \right\| _2\) becomes small. The organization's goal is to acquire as many agents as possible by simultaneously satisfying the corresponding aspired states. However, the agents are not necessarily equally important for the organization. Instead, the organization primarily wants to please agents who already prefer it over its competitors. Let us state these considerations in a formal way. An organization k observes to which extent the agents choose it, quantified by the choice probabilities \(p_i^{(k)}\), \(i=1, \ldots , N\). Thus, the k-th organization measures its performance by the following objective:

$$\begin{aligned} \sum _{i=1}^{N} p_i^{(k)}\cdot \Vert v_i-M^t\cdot x_k\Vert _2. \end{aligned}$$
(10)

Yet, an organization's choice of the manipulation distribution not only depends on the agents' aspired states, but also on its own objectives. This reflects that an organization might also aspire to a certain state of the network in order to gain profits from the network participants. Therefore, we introduce a payoff function for organization k, which depends on the state of manipulation it causes:

$$\begin{aligned} \pi _k\left( M^t\cdot x_k\right) . \end{aligned}$$
(11)

Let us illustrate by examples how a network state could affect the payoff of the k-th organization. Groups on social media platforms might avoid sharing information with persons who hold contrary opinions, such that no arguments against their theories or fake news are communicated. Prohibiting or restricting persons' access to information might be a worthwhile purpose in an information network. This is particularly interesting in situations where direct manipulation of opinions is difficult. Since the authors in Acemoglu and Ozdaglar (2011) mention the source of information as a key component of opinion formation, the manipulation of the information acquisition process contributes to the tampering with opinion formation. Moreover, a social media influencer might lose credibility with her followers if they find out about an unacceptable advertisement. We state an assumption concerning the payoff functions.

Assumption 3

The payoff function \(\pi _k\) is \(\tau _k\)-strongly concave w.r.t. the norm \(\Vert \cdot \Vert _2\) for all \(k=1, \ldots , K\).

Altogether, the objective function of the k-th organization incorporates both goals:

$$\begin{aligned} \sum _{i=1}^{N} p_i^{(k)}\cdot \Vert v_i-M^t\cdot x_k\Vert _2 - \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) , \end{aligned}$$
(12)

where \(\eta _k > 0\) is a regularization parameter, which reflects the importance of payoffs generated by the network. Note that small values of \(\eta _k\) indicate a more restrictive behavior of the k-th organization, meaning that it focuses on its own interests rather than freely adjusting the manipulation distribution according to the agents' aspired states. According to Assumption 3, the negative of the payoff function serves as a regularization term. Strongly concave regularization is a well-known and widely used technique in optimization theory, see e.g. Nesterov (2018). From an economic perspective, the payoff function mimics a stable behavior of the organization. Apart from the already mentioned payoffs generated by the network, this function could also reflect that the k-th organization avoids deviating too much from a certain targeted state \(c_k=M^t\cdot s_k\), where \(s_k \in \varDelta _n\), due to adjustment costs. As a matter of fact, organizations might know from experience which starting distributions cause network states in a neighborhood of the targeted state, but must take more effort to detect starting distributions for states outside this neighborhood and, thus, face larger adjustment costs. Based on these considerations, a typical \(\tau _k\)-strongly concave payoff function is

$$\begin{aligned} \pi _k(M^t\cdot x_k) = -\frac{\tau _k}{2} \left\| M^t\cdot x_k- c_k\right\| _2^2. \end{aligned}$$

We shall discuss the numerical practicability of this choice later on. For a given choice matrix \(P \in \varDelta _K^N\), network M and time t, the k-th organization chooses its optimal starting distribution \(x_k \in \varDelta _n\) by solving

$$\begin{aligned} \min _{x_k \in \varDelta _n} \sum _{i=1}^{N} p_i^{(k)}\cdot \Vert v_i-M^t\cdot x_k\Vert _2 - \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) . \end{aligned}$$
(13)

For now, we assume that the optimization problems given in (13) have unique solutions for any choice matrix P, which we denote by \(x_k(P)\), \(k=1, \ldots , K\). We keep these optimal manipulation values in a matrix

$$\begin{aligned} X(P) =\left( x_1(P), \ldots , x_K(P)\right) , \end{aligned}$$
(14)

and call it the manipulation matrix.
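Problem (13) admits no closed-form solution in general. The following sketch approximates it for the quadratic payoff example above by projected subgradient descent on the simplex; the helper names, step size and iteration count are illustrative assumptions, not part of the model.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the standard simplex."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, y.size + 1) > 0)[0][-1]
    return np.maximum(y + (1.0 - css[rho]) / (rho + 1), 0.0)

def organization_update(pk, V, M, t, ck, tau=1.0, eta=1.0, steps=200, lr=0.05):
    """Approximate solution of (13) with pi_k(y) = -(tau/2)||y - c_k||^2,
    via projected subgradient descent; pk holds the probabilities p_i^(k)."""
    A = np.linalg.matrix_power(M, t)
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(steps):
        r = A @ x[:, None] - V                 # residuals M^t x - v_i
        norms = np.linalg.norm(r, axis=0)
        safe = norms > 1e-12                   # subgradient 0 where a norm vanishes
        g = A.T @ (r[:, safe] / norms[safe]) @ pk[safe]  # distance term
        g += (tau / eta) * A.T @ (A @ x - ck)  # gradient of -pi_k / eta_k
        x = project_simplex(x - lr * g)
    return x
```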

2.4 Network manipulation algorithm

In the preceding section we described the behavior of agents and organizations when facing the challenge of manipulating a network in favor of the agents' desires. The key aspect is that their behavior, summarized in (8) and (13), suggests an alternating interaction between both groups. Organizations enter the market and offer their manipulation distributions. Then, agents observe how satisfactorily the organizations would manipulate the network state in view of the agents' aspired states (e.g. by comparing past results caused by an organization). Based on these observations, agents make their decisions, i.e. they choose organizations with probabilities according to (8). The choice probabilities provide feedback to the organizations, which in turn adjust their starting distributions following the behavior given in (13). Using the previous notation, we have the following dynamics:

$$\begin{aligned} P_{\ell +1} = P(X_\ell ), \quad X_{\ell +1} = X(P_{\ell +1}), \end{aligned}$$

where \(X_0\) is any feasible starting variable, e. g. \(X_0 = \frac{1}{n}\cdot ee^T\).

In what follows, we provide an equivalent description of this network manipulation algorithm in order to better study its convergence properties. For that, we define a potential function which incorporates the behavior of all agents and organizations:

$$\begin{aligned} \varPhi (X,P) = \sum _{i=1}^{N} E_i^*(p_i) + \sum _{k=1}^{K} \left( \sum _{i=1}^{N} p_i^{(k)}\cdot \Vert v_i-M^t\cdot x_k\Vert _2- \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) \right) . \end{aligned}$$
(15)

Therefore, the choice matrix solves the following minimization problem:

$$\begin{aligned} P(X) = \arg \min _{P \in \varDelta _K^N} \varPhi (X,P). \end{aligned}$$
(16)

Analogously, we have for a manipulation matrix:

$$\begin{aligned} X(P) = \arg \min _{X \in \varDelta _n^K} \varPhi (X,P), \end{aligned}$$
(17)

which means that the network manipulation algorithm can be viewed as an alternating minimization scheme.

From the viewpoint of computational economics, it seems reasonable to assume that agents and organizations are not able to solve their corresponding optimization problems exactly. Rather, the solutions can be obtained up to small errors. This can be, for example, due to observation errors in the input parameters given by choice and/or manipulation matrices. Another reason could be that exact optimization is too time-consuming or too costly. In order to incorporate this faulty behavior into our manipulation algorithm, we assume that only inexact minimization in (16) and (17) is possible. More precisely, \(\delta _1\)-inexact solutions for (16) and \(\delta _2\)-inexact solutions for (17) are available. We recall that, evaluated at a \(\delta\)-inexact solution, the function value is at most the minimum value plus \(\delta\), see Sect. 3 for details. Thus, we are ready to state a more general network manipulation algorithm, based on an inexact alternating minimization scheme:

$$\begin{aligned}&\text{ Initialize } \tilde{X}_0 \in \varDelta _n^K. \text{ For } \ell = 0, 1, 2, \ldots \text{ update: } \\&\quad \tilde{P}_{\ell +1} = \arg \min ^{\delta _1}_{P \in \varDelta _K^N} \varPhi \left( \tilde{X}_\ell , P\right) , \\&\quad \tilde{X}_{\ell +1} = \arg \min ^{\delta _2}_{X \in \varDelta _n^K} \varPhi \left( X, \tilde{P}_{\ell +1}\right) . \end{aligned}$$

The inexact algorithm raises the question of whether the corresponding alternating behavior converges to a stable equilibrium. Do agents and organizations reach a state where their choices do not change anymore, no matter what the starting distributions of the organizations look like? In other words, does a unique minimizer of the potential function exist and does the algorithm converge to this minimizer? Moreover, it is interesting to analyze how the faulty behavior in terms of the errors impacts the possible convergence. We shall answer these questions by applying general results on inexact alternating minimization schemes, which we present in Sect. 3. This is possible since the potential function (15) can be suitably decomposed. For that, we define:

$$\begin{aligned} f(X) = -\sum _{k=1}^{K} \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) , \quad h(P) = \sum _{i=1}^{N} E_i^*(p_i). \end{aligned}$$
(18)

Using the standard inner product, the potential function in (15) can be written as follows:

$$\begin{aligned} \varPhi (X,P) = f(X) + \langle G(X),P \rangle + h(P). \end{aligned}$$
(19)
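Putting the pieces together, a minimal sketch of the alternating dynamics \(P_{\ell +1} = P(X_\ell )\), \(X_{\ell +1} = X(P_{\ell +1})\) could look as follows; it reuses M from the first sketch and the functions choice_matrix, organization_update and project_simplex defined above, with all sizes and parameters chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, N, t = 3, 2, 4, 3
V = rng.dirichlet(np.ones(n), size=N).T        # aspired states v_i as columns
S = rng.dirichlet(np.ones(n), size=K).T
C = np.linalg.matrix_power(M, t) @ S           # targeted states c_k = M^t s_k
X = np.full((n, K), 1.0 / n)                   # X_0 = (1/n) e e^T

for _ in range(20):
    P = choice_matrix(X, M, V, t)              # agents:        P_{l+1} = P(X_l)
    X = np.column_stack([                      # organizations: X_{l+1} = X(P_{l+1})
        organization_update(P[k], V, M, t, C[:, k]) for k in range(K)
    ])
print(np.round(X, 3))                          # approximate fixed point
```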

3 Inexact alternating minimization

In cases where the analytical solution of an optimization problem cannot be derived, it is necessary to solve the problem numerically. Normally, these numerical solutions are only exact up to a small \(\delta\)-error. We review some theoretical aspects of inexact optimization which we need for the convergence analysis. Let us consider optimization problems of the form

$$\begin{aligned} \min _{z \in Q} \varPhi (z), \end{aligned}$$
(20)

where \(\varPhi\) is a strongly convex function and Q a closed and convex set. We denote by \(z^*\) the solution of problem (20). Recall that for a \(\beta\)-strongly convex function \(\varPhi\) it holds:

$$\begin{aligned} \varPhi (z) \ge \varPhi (z^*) + \frac{\beta }{2}\Vert z-z^*\Vert ^2 \quad \text {for all} \; z \in Q. \end{aligned}$$
(21)

For a \(\delta\)-inexact solution we use the standard definition, see e. g. Stonyakin et al. (2019):

Definition 1

A point \(\tilde{z}\) is a \(\delta\)-inexact solution with \(\delta \ge 0\), i.e.

$$\begin{aligned} \tilde{z}\in \arg \min ^\delta _{z \in Q} \varPhi (z), \end{aligned}$$

if and only if there exists \(g \in \partial \varPhi (\tilde{z})\) such that \(\langle g, z^* - \tilde{z}\rangle \ge - \delta\).

Due to Definition 1, a point \(\tilde{z}\) provides the minimal objective function value of (20) up to the error \(\delta\). This can be easily seen, because \(\varPhi\) is convex and therefore it holds for any \(z \in Q\):

$$\begin{aligned} \varPhi (z^*) \ge \varPhi (\tilde{z}) + \langle g, z^* - \tilde{z}\rangle \ge \varPhi (\tilde{z}) - \delta , \end{aligned}$$

which is equivalent to

$$\begin{aligned} \varPhi (\tilde{z}) \le \varPhi (z^*) + \delta \le \varPhi (z) + \delta . \end{aligned}$$
(22)

In what follows, we shall focus on a decision variable z which can be separated into two blocks, i.e. \(z= (x,p)\). For such situations, alternating minimization methods can be applied. The block structure enables one to minimize the objective function for each block separately, which is, in particular, a valuable property for big data applications. Over the years, many convergence results for alternating minimization methods have been shown under different assumptions. For example, Grippo and Sciandrone (1999) show that updating each component in a sequential manner yields a sequence of iterates such that each limit point is a global minimizer of a continuously differentiable and pseudoconvex function. Under the assumption of Lipschitz continuous gradients and coordinate-wise strong convexity of the objective function, Luo and Tseng (1993) prove linear convergence to a stationary point for constrained problems. Convergence of an alternating minimization scheme for objective functions with non-differentiable parts has been derived by Beck (2015). Pu et al. (2014) show that, under assumptions such as convexity of one objective term and strong convexity of the other, the inexact alternating minimization algorithm applied to the primal problem coincides with the inexact proximal gradient method applied to the dual problem. Recently, in Nesterov (2020) an alternating minimization method was used for soft clustering. There, the objective function additionally includes an interaction term linking both blocks of variables. Under certain assumptions, linear convergence was established provided the problem can be solved exactly in each block. In this paper, we are interested in an inexact alternating minimization algorithm for objective functions equipped with the structure introduced by Nesterov (2020). Let \(Q_1, Q_2\) be closed and convex sets in finite dimensional vector spaces \(\mathbb {V}_1, \mathbb {V}_2\) and let \(\mathbb {V}\) be a finite dimensional vector space. The objective function is given by

$$\begin{aligned} \varPhi (x,p) = f(x) + \langle G_1(x),G_2(p) \rangle + h(p), \quad x \in Q_1, p \in Q_2, \end{aligned}$$
(23)

where the operators \(G_1: \mathbb {V}_1 \rightarrow \mathbb {V}^*\) and \(G_2: \mathbb {V}_2 \rightarrow \mathbb {V}\) are Lipschitz-continuous with moduli \(L_1\) and \(L_2\) on the respective sets \(Q_1, Q_2\). Moreover, we assume that the interaction term \(\langle G_1(x),G_2(p) \rangle\) is convex and closed in \(x \in Q_1\) for any fixed \(p \in Q_2\) and vice versa and that the functions f and h are \(\sigma _1\)- and \(\sigma _2\)-strongly convex on \(Q_1\), respectively on \(Q_2\). Further, we assume the following strict inequality to hold

$$\begin{aligned} L_1^2\cdot L_2^2 < \sigma _1 \cdot \sigma _2, \end{aligned}$$
(24)

under which the function \(\varPhi\) is shown to be strongly convex on \(Q = Q_1\times Q_2\), see Nesterov (2020). Let the optimal solution of (20) be written as \(z^* = (x^*, p^*)\). In order to solve (20), an alternating minimization method has been proposed by Nesterov (2020). This method generates sequences \(\{x_\ell \}_{\ell \ge 0}\) and \(\{p_\ell \}_{\ell \ge 1}\) as follows:

$$\begin{aligned}&\text{ Choose } x_0 \in Q_1. \text{ For } \ell = 0, 1, 2, \ldots \text{ update: } \\&\quad p_{\ell +1} = \arg \min _{p \in Q_2} \varPhi (x_\ell , p) = u(x_\ell ), \\&\quad x_{\ell +1} = \arg \min _{x \in Q_1} \varPhi (x, p_{\ell +1}) = v(p_{\ell +1}). \end{aligned}$$

Convergence analysis in Nesterov (2020) is based on fixed-point iteration. For that, the operators \(T:Q_1 \rightarrow Q_1\) and \(S:Q_2 \rightarrow Q_2\) are defined as follows:

$$\begin{aligned} T(x) = v(u(x)), \quad S(p) = u(v(p)). \end{aligned}$$
(25)

This enables to write the update step of the alternating minimization scheme:

$$\begin{aligned}&x_{\ell +1} = T(x_\ell ), \quad p_{\ell +1} = S(p_\ell ). \end{aligned}$$

Under condition (24), \(T(\cdot )\) and \(S(\cdot )\) are contraction mappings. Thus, the linear convergence of the generated sequences to the minimizer \(\left( x^*,p^*\right)\) of \(\varPhi\) was shown in Nesterov (2020):

$$\begin{aligned} \Vert x_{\ell +1} - x^*\Vert \le \lambda ^{\ell +1} \Vert x_0-x^*\Vert , \quad \Vert p_{\ell +1} - p^*\Vert \le \lambda ^{\ell } \Vert p_1-p^*\Vert , \end{aligned}$$

where

$$\begin{aligned} \lambda = \frac{L_1^2\cdot L_2^2}{\sigma _1 \cdot \sigma _2} <1. \end{aligned}$$

We analyze an inexact version of the alternating minimization method applied to objective functions in (23), when subproblems are solved inexactly in the sense of Definition 1. For that, let us adapt the algorithm in the following way:

$$\begin{aligned}&\text{ Choose } \tilde{x}_0 \in Q_1. \text{ For } \ell = 0, 1, 2, \ldots \text{ update: } \\&\quad \tilde{p}_{\ell +1} = \arg \min ^{\delta ^{(\ell )}_1}_{p \in Q_2} \varPhi (\tilde{x}_\ell , p) = u^{\delta ^{(\ell )}_1}(\tilde{x}_\ell ), \\&\quad \tilde{x}_{\ell +1} = \arg \min ^{\delta ^{(\ell )}_2}_{x \in Q_1} \varPhi (x, \tilde{p}_{\ell +1}) = v^{\delta ^{(\ell )}_2}(\tilde{p}_{\ell +1}). \end{aligned}$$

We allow different accuracies for the above subproblems. Moreover, we allow for iteration-specific errors. The equations also suggest that in iteration \(\ell\) a \(\delta ^{(\ell )}\)-error is made twice. This can be seen by looking at the function values evaluated at two consecutive points of the sequences \(\{\tilde{x}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{p}_\ell \}_{\ell \ge 0}\) generated via the \(\delta\)-inexact solutions of the auxiliary optimization problems:

$$\begin{aligned} \varPhi (\tilde{x}_{\ell +1},\tilde{p}_{\ell +1})&= f(\tilde{x}_{\ell +1}) + \langle G_1(\tilde{x}_{\ell +1}),G_2(\tilde{p}_{\ell +1}) \rangle + h(\tilde{p}_{\ell +1}) \\&\le f(\tilde{x}_{\ell }) +\delta ^{(\ell )}_2 + \langle G_1(\tilde{x}_{\ell }),G_2(\tilde{p}_{\ell +1}) \rangle +h(\tilde{p}_{\ell +1}) \\&\le f(\tilde{x}_{\ell })+ \delta ^{(\ell )}_2 + \langle G_1(\tilde{x}_{\ell }),G_2(\tilde{p}_{\ell }) \rangle + h(\tilde{p}_{\ell })+ \delta ^{(\ell )}_1 \\&\le \varPhi (\tilde{x}_{\ell },\tilde{p}_{\ell }) + 2\cdot \max \{\delta ^{(\ell )}_1, \delta ^{(\ell )}_2\}. \end{aligned}$$

Next, we estimate the distances between \(u^{\delta ^{(\ell )}_1}(x)\) and u(x) as well as between \(v^{\delta ^{(\ell )}_2}(p)\) and v(p).

Lemma 1

For any \(x \in Q_1\) and \(\ell = 0, 1, \ldots\), it holds:

$$\begin{aligned} \Vert u^{\delta ^{(\ell )}_1}(x) - u(x)\Vert \le \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}}, \end{aligned}$$

and for any \(p \in Q_2\) and \(\ell = 0, 1, \ldots\) it holds:

$$\begin{aligned} \Vert v^{\delta ^{(\ell )}_2}(p) - v(p)\Vert \le \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}}. \end{aligned}$$

Proof

Take an arbitrary iteration \(\ell\). We apply (21) to derive:

$$\begin{aligned} h(u^{\delta ^{(\ell )}_1}(x)) + \langle G_1(x), G_2(u^{\delta ^{(\ell )}_1}(x)) \rangle \ge h(u(x)) + \langle G_1(x), G_2(u(x)) \rangle + \frac{\sigma _2}{2}\Vert u^{\delta ^{(\ell )}_1}(x) - u(x)\Vert ^2. \end{aligned}$$

Due to (22) we additionally have:

$$\begin{aligned} h(u(x)) + \langle G_1(x), G_2(u(x)) \rangle \ge h(u^{\delta ^{(\ell )}_1}(x)) + \langle G_1(x), G_2(u^{\delta ^{(\ell )}_1}(x)) \rangle - \delta ^{(\ell )}_1. \end{aligned}$$

Altogether, we obtain:

$$\begin{aligned} \Vert u^{\delta ^{(\ell )}_1}(x) - u(x)\Vert ^2 \le \frac{2\delta ^{(\ell )}_1}{\sigma _2}. \end{aligned}$$

The proof for \(\Vert v^{\delta ^{(\ell )}_2}(p) - v(p)\Vert\) follows analogously. \(\square\)

Let us elaborate on the continuity properties for operators \(u^{\delta ^{(\ell )}_1}(\cdot )\) and \(v^{\delta ^{(\ell )}_2}(\cdot )\).

Lemma 2

For any \(x_1, x_2 \in Q_1\) and \(\ell = 0, 1, \ldots\) it holds:

$$\begin{aligned} \Vert u^{\delta ^{(\ell )}_1}(x_1) - u^{\delta ^{(\ell )}_1}(x_2)\Vert \le 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \Vert x_1 - x_2\Vert , \end{aligned}$$

and for any \(p_1, p_2 \in Q_2\) and \(\ell = 0, 1, \ldots\) it holds:

$$\begin{aligned} \Vert v^{\delta ^{(\ell )}_2}(p_1) - v^{\delta ^{(\ell )}_2}(p_2)\Vert \le 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \Vert p_1 - p_2\Vert . \end{aligned}$$

Proof

Fix an arbitrary iteration \(\ell\). Applying Lemma 1 and the triangle inequality twice yields:

$$\begin{aligned} \Vert u^{\delta ^{(\ell )}_1}(x_1) - u^{\delta ^{(\ell )}_1}(x_2)\Vert&\le \Vert u^{\delta ^{(\ell )}_1}(x_1) - u(x_1)\Vert + \Vert u(x_1) - u^{\delta ^{(\ell )}_1}(x_2)\Vert \\&\le \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \Vert u(x_1) - u(x_2)\Vert + \Vert u(x_2) - u^{\delta ^{(\ell )}_1}(x_2)\Vert \\&\le \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \Vert x_1 - x_2\Vert + \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}}. \end{aligned}$$

The last inequality is due to Nesterov (2020), where it is shown for any \(x_1, x_2 \in Q_1\):

$$\begin{aligned} \Vert u(x_1)-u(x_2)\Vert \le \frac{L_1\cdot L_2}{\sigma _2} \Vert x_1-x_2\Vert . \end{aligned}$$

Similar reasoning yields the result for \(\Vert v^{\delta ^{(\ell )}_2}(p_1) - v^{\delta ^{(\ell )}_2}(p_2)\Vert\). \(\square\)

Let us introduce inexact versions of the operators T and S:

$$\begin{aligned} T^{\delta ^{(\ell )}}(x) = v^{\delta ^{(\ell )}_2}(u^{\delta ^{(\ell )}_1}(x)), \quad S^{\delta ^{(\ell )}}(p) = u^{\delta ^{(\ell )}_1}(v^{\delta ^{(\ell )}_2}(p)), \end{aligned}$$

which we use to rewrite the update of the inexact alternating minimization as

$$\begin{aligned} \tilde{x}_{\ell +1} = T^{\delta ^{(\ell )}}(\tilde{x}_\ell ), \quad \tilde{p}_{\ell +1} = S^{\delta ^{(\ell )}}(\tilde{p}_\ell ). \end{aligned}$$
(26)

The following result provides, up to an error, uniform continuity of the operators defined in (26).

Proposition 1

For any \(\tilde{x}_1, \tilde{x}_2 \in Q_1\) and \(\ell = 0, 1, \ldots\) it holds:

$$\begin{aligned} \Vert T^{\delta ^{(\ell )}}(\tilde{x}_1) - T^{\delta ^{(\ell )}}(\tilde{x}_2)\Vert \le \lambda \Vert \tilde{x}_1 - \tilde{x}_2\Vert +2\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}}\cdot \frac{L_1\cdot L_2}{\sigma _1}, \end{aligned}$$

and for any \(\tilde{p}_1, \tilde{p}_2 \in Q_2\) it holds:

$$\begin{aligned} \Vert S^{\delta ^{(\ell )}}(\tilde{p}_1) - S^{\delta ^{(\ell )}}(\tilde{p}_2)\Vert \le \lambda \Vert \tilde{p}_1 - \tilde{p}_2\Vert +2\cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}}\cdot \frac{L_1\cdot L_2}{\sigma _2}. \end{aligned}$$

Proof

We apply Lemma 2 to derive

$$\begin{aligned} \Vert T^{\delta ^{(\ell )}}(\tilde{x}_1) - T^{\delta ^{(\ell )}}(\tilde{x}_2)\Vert&= \Vert v^{\delta ^{(\ell )}_2}(u^{\delta ^{(\ell )}_1}(\tilde{x}_1)) - v^{\delta ^{(\ell )}_2}(u^{\delta ^{(\ell )}_1}(\tilde{x}_2))\Vert \\&\le 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \Vert u^{\delta ^{(\ell )}_1}(\tilde{x}_1) - u^{\delta ^{(\ell )}_1}(\tilde{x}_2)\Vert \\&\le 2\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot 2\sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \underbrace{\frac{L_1^2\cdot L_2^2}{\sigma _1 \cdot \sigma _2}}_{= \lambda }\cdot \Vert \tilde{x}_1 - \tilde{x}_2\Vert . \end{aligned}$$

Again, the second assertion follows similarly. \(\square\)

Since we cannot rely on the contraction property of \(T^{\delta ^{(\ell )}}\) and \(S^{\delta ^{(\ell )}}\), the convergence analysis of the sequences \(\{\tilde{x}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{p}_\ell \}_{\ell \ge 1}\) becomes more involved. We start with the following auxiliary result.

Lemma 3

For any \(x \in Q_1\) and \(\ell =0, 1, \ldots\) it holds:

$$\begin{aligned} \left\| T(x)-T^{\delta ^{(\ell )}}(x)\right\| \le \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}}, \end{aligned}$$

and for any \(p \in Q_2\) and \(\ell =0, 1, \ldots\) it holds:

$$\begin{aligned} \left\| S(p)-S^{\delta ^{(\ell )}}(p)\right\| \le \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}}. \end{aligned}$$

Proof

We show the first part. It follows by means of Lemmata 1 and 2.

$$\begin{aligned} \left\| T(x)-T^{\delta ^{(\ell )}}(x)\right\|&= \left\| v(u(x))- v^{\delta ^{(\ell )}_2}(u^{\delta ^{(\ell )}_1}(x))\right\| \\&\le \left\| v(u(x))- v(u^{\delta ^{(\ell )}_1}(x))\right\| + \left\| v(u^{\delta ^{(\ell )}_1}(x))- v^{\delta ^{(\ell )}_2}(u^{\delta ^{(\ell )}_1}(x))\right\| \\&\le \frac{L_1\cdot L_2}{\sigma _1} \cdot \left\| u(x) - u^{\delta ^{(\ell )}_1}(x) \right\| + \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} \\&\le \frac{L_1\cdot L_2}{\sigma _1} \cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} + \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}}. \end{aligned}$$

Clearly, the proof of the second part is similar. \(\square\)

Now we are ready to state the main result concerning convergence of the inexact alternating minimization scheme.

Theorem 1

For the inexact alternating minimization scheme it holds:

$$\begin{aligned} \left\| \tilde{x}_{\ell +1} - x^*\right\| \le \lambda ^{\ell +1} \left\| x_0 - x^*\right\| + \sum _{k=0}^{\ell }\lambda ^k\left[ \sqrt{\frac{2\delta ^{(\ell -k)}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta ^{(\ell -k)}_1}{\sigma _2}}\right] \end{aligned}$$
(27)

and

$$\begin{aligned} \left\| \tilde{p}_{\ell +1} - p^*\right\| \le \lambda ^{\ell }\left[ \left\| p_1 - p^*\right\| + \sqrt{\frac{2\delta ^{(0)}_1}{\sigma _2}} \right] + \sum _{k=1}^{\ell }\lambda ^k\left[ \sqrt{\frac{2\delta ^{(\ell -k)}_1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta ^{(\ell -k)}_2}{\sigma _1}}\right] . \end{aligned}$$
(28)

Proof

We apply Lemma 3 to derive:

$$\begin{aligned} \left\| \tilde{x}_{\ell +1} - x_{\ell +1}\right\|&\le \left\| T(x_{\ell }) - T(\tilde{x}_{\ell })\right\| + \left\| T(\tilde{x}_{\ell }) - T^{\delta ^{(\ell )}}(\tilde{x}_{\ell })\right\| \\&\le \lambda \cdot \left\| x_{\ell } - \tilde{x}_{\ell }\right\| + \sqrt{\frac{2\delta ^{(\ell )}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta ^{(\ell )}_1}{\sigma _2}} \\&\le \ldots \le \lambda ^{\ell +1} \underbrace{\left\| x_0-x_0\right\| }_{=0} + \sum _{k=0}^{\ell }\lambda ^k\left[ \sqrt{\frac{2\delta ^{(\ell -k)}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta ^{(\ell -k)}_1}{\sigma _2}}\right] . \end{aligned}$$

We are therefore able to estimate the distance of the \(\left( \ell +1\right)\)-th iterate to the minimizer:

$$\begin{aligned} \left\| \tilde{x}_{\ell +1} - x^*\right\|&\le \left\| \tilde{x}_{\ell +1} - x_{\ell +1}\right\| + \left\| x_{\ell +1} - x^*\right\| \\&\le \sum _{k=0}^{\ell }\lambda ^k\left[ \sqrt{\frac{2\delta ^{(\ell -k)}_2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta ^{(\ell -k)}_1}{\sigma _2}}\right] + \lambda ^{\ell +1} \Vert x_0 - x^*\Vert . \end{aligned}$$

For the proof of inequality (28), note that the first iterate of the algorithm is not chosen freely. Instead, it is the solution of the corresponding optimization problem. Hence, the first iterates of the exact and the inexact version are in general not equal, i.e. \(p_1 \ne \tilde{p}_1\), which provides

$$\begin{aligned} \left\| \tilde{p}_{\ell +1} - p_{\ell +1}\right\| \le \lambda ^\ell \left\| \tilde{p}_{1} - p_{1}\right\| + \sum _{k=1}^{\ell }\lambda ^k\left[ \sqrt{\frac{2\delta ^{(\ell -k)}_1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta ^{(\ell -k)}_2}{\sigma _1}}\right] . \end{aligned}$$

It remains to recall that \(p_1=u(x_0)\) and \(\tilde{p}_1=u^{\delta ^{(0)}_1}(x_0)\) and apply Lemma 1. The result (28) then follows in the same manner as for (27). \(\square\)

According to Theorem 1, the inexact alternating minimization scheme does not converge in general. Yet, if the sequences of errors \(\left\{ \delta ^{(\ell )}_1\right\} _{\ell \ge 0}\) and \(\left\{ \delta ^{(\ell )}_2\right\} _{\ell \ge 0}\) are not growing, the right-hand sides of (27) and (28) can be controlled by the model's parameters. In order to reach good convergence results, Theorem 1 suggests that at the beginning of the inexact alternating minimization algorithm the subproblems may be solved up to larger errors, whereas at later iterations the errors should be reduced, as the following sketch illustrates.
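The following small computation illustrates this effect by evaluating the accumulated error sum on the right-hand side of (27) for two hypothetical schedules, a constant one and a geometrically decaying one; all parameter values are illustrative.

```python
import numpy as np

lam, s1, s2, L1L2 = 0.5, 1.0, 1.0, 0.5  # illustrative lambda, sigma_1, sigma_2, L_1*L_2
ell = 30

def accumulated_error(d1, d2):
    # sum_{k=0}^{ell} lam^k [ sqrt(2 d2^(ell-k)/s1) + (L1L2/s1) sqrt(2 d1^(ell-k)/s2) ]
    return sum(
        lam ** k * (np.sqrt(2 * d2[ell - k] / s1)
                    + (L1L2 / s1) * np.sqrt(2 * d1[ell - k] / s2))
        for k in range(ell + 1)
    )

const = np.full(ell + 1, 1e-4)                 # same error in every iteration
decay = 1e-4 * 0.5 ** np.arange(ell + 1)       # errors reduced at later iterations
print(accumulated_error(const, const))         # larger accumulated error
print(accumulated_error(decay, decay))         # considerably smaller
```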

Corollary 1

Let the same errors be made in each iteration, i.e. \(\delta ^{(\ell )}_1 = \delta _1\) and \(\delta ^{(\ell )}_2 = \delta _2\) for \(\ell =0, 1, \ldots\). Then, for the inexact alternating minimization scheme it holds:

$$\begin{aligned} \left\| \tilde{x}_{\ell +1} - x^*\right\| \le \lambda ^{\ell +1} \left\| x_0 - x^*\right\| + \frac{1-\lambda ^{\ell +1}}{1-\lambda }\left[ \sqrt{\frac{2\delta _2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta _1}{\sigma _2}}\right] \end{aligned}$$
(29)

and

$$\begin{aligned} \left\| \tilde{p}_{\ell +1} - p^*\right\| \le \lambda ^{\ell }\left[ \left\| p_1 - p^*\right\| + \sqrt{\frac{2\delta _1}{\sigma _2}} \right] + \frac{\lambda \left( 1-\lambda ^{\ell }\right) }{1-\lambda }\left[ \sqrt{\frac{2\delta _1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta _2}{\sigma _1}}\right] . \end{aligned}$$
(30)

Proof

This directly follows from Theorem 1 and the fact that \(\lambda < 1\). \(\square\)

According to Corollary 1, the distance to the minimizer is asymptotically bounded by the second term on the right-hand side of inequalities (29) and (30), respectively. By taking limits, we obtain:

$$\begin{aligned} \limsup _{\ell \rightarrow \infty } \left\| \tilde{x}_{\ell +1} - x^*\right\| \le \frac{1}{1-\lambda }\left[ \sqrt{\frac{2\delta _2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta _1}{\sigma _2}}\right] \end{aligned}$$

and

$$\begin{aligned} \limsup _{\ell \rightarrow \infty } \left\| \tilde{p}_{\ell +1} - p^*\right\| \le \frac{1}{1-\lambda }\left[ \sqrt{\frac{2\delta _1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta _2}{\sigma _1}}\right] . \end{aligned}$$

Obviously, convergence is guaranteed if the subproblems can be solved exactly, i.e. if \(\delta _1=\delta _2=0\). This is not surprising, as in this case the iterates generated by the inexact alternating minimization scheme (26) coincide with those generated by the exact method proposed by Nesterov (2020). Inequalities (29) and (30) show that the total error of the inexact alternating minimization scheme can be controlled. Furthermore, large convexity parameters not only improve the rate of convergence for the exact version of the algorithm, but also decrease the total accumulated error in the inexact scenario.

4 Convergence analysis

We analyze the convergence of our network manipulation algorithm by applying the general theory of inexact alternating minimization from Sect. 3. First, we estimate the convexity parameter of

$$\begin{aligned} h(P) = \sum _{i=1}^{N} E_i^*(p_i) \end{aligned}$$

w.r.t. the norm

$$\begin{aligned} \Vert P\Vert _\mathbb {H} = \left( \sum _{i=1}^{N}\Vert p_i\Vert _1^2\right) ^\frac{1}{2}, \quad P \in \varDelta _K^N. \end{aligned}$$

It turns out that the strong convexity of \(E_i^*\) holds due to Assumption 2. This has been recently shown in Müller et al. (2021a).

Lemma 4

(Müller et al. (2021a)) Let the differences \(\epsilon ^{(k)}_i - \epsilon ^{(m)}_i\) of random errors have modes \({\bar{z}}^{k,m}_i \in \mathbb {R}\), \(k \not = m\). Then, the corresponding convex conjugate \(E^*_i\) is \(\beta _i\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), where the convexity parameter is given by

$$\begin{aligned} \beta _i = \frac{1}{\displaystyle 2\sum _{k=1}^{K} \sum _{m\not =k} g^{k,m}_i \left( {\bar{z}}^{k,m}_i\right) }, \end{aligned}$$

and \(g^{k,m}_i\) denotes the density function of \(\epsilon ^{(k)}_i - \epsilon ^{(m)}_i\).

Let us review important discrete choice models in accordance with Assumption 2, where convexity parameters can be explicitly estimated.

Remark 1

In the multinomial logit model (MNL), the error terms are IID Gumbel distributed with zero location parameter and standard deviation \(\frac{\pi \cdot \mu }{\sqrt{6}}\), where \(\mu > 0\), see e.g. Anderson et al. (1992). The choice probabilities are:

$$\begin{aligned} \mathbb {P}\left( u^{(k)} + \epsilon ^{(k)} = \max _{1 \le m \le K} u^{(m)} + \epsilon ^{(m)}\right) = \frac{e^{\frac {u^{(k)}}{\mu }}}{\displaystyle \sum _{m=1}^{K}e^{\frac {u^{(m)}}{\mu }}}, \quad k=1, \ldots , K. \end{aligned}$$

From the choice probabilities we can conclude that the parameter \(\mu\) reflects the randomness of the decision. As \(\mu\) converges to zero, the decision becomes the deterministic one based on the observable utility only. On the other hand, very large values of the parameter produce nearly random choices, tending towards the uniform distribution in the limit. The convex conjugate of the corresponding surplus function is, up to an additive constant:

$$\begin{aligned} E^*(p)=\mu \sum _{k=1}^{K} p^{(k)} \cdot \ln p^{(k)}. \end{aligned}$$

It is well known from the Pinsker inequality that this function is \(\mu\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\).
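Since for the MNL the surplus equals \(E(u) = \mu \ln \sum _{k} e^{u^{(k)}/\mu }\) up to an additive constant, the Williams-Daly-Zachary theorem (4) can be checked numerically; the following sketch verifies by finite differences that the gradient of E coincides with the softmax choice probabilities (the values of u and \(\mu\) are illustrative).

```python
import numpy as np

mu = 0.5
u = np.array([1.0, 0.3, -0.2])

def surplus(u):
    # MNL surplus up to an additive constant: mu * log-sum-exp(u / mu)
    return mu * np.log(np.sum(np.exp(u / mu)))

p = np.exp(u / mu) / np.sum(np.exp(u / mu))    # MNL choice probabilities

eps = 1e-6
grad = np.array([
    (surplus(u + eps * e) - surplus(u - eps * e)) / (2 * eps)
    for e in np.eye(u.size)
])
print(np.allclose(grad, p, atol=1e-6))         # dE/du^(k) = p^(k), cf. (4)
```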

Remark 2

Another famous example is the nested logit model (NL) introduced in McFadden (1978). Compared to the MNL, the NL is more appropriate in situations where some of the alternatives are correlated, i.e. the axiom of independence of irrelevant alternatives is violated, see e.g. Anderson et al. (1992). In the NL, each alternative k belongs to one of L different nests \(N_\ell \subset \{1, \ldots , K\}\), \(\ell = 1, \ldots , L\). The choice probabilities for \(k \in N_\ell\) are

$$\begin{aligned} \mathbb {P}\left( u^{(k)} + \epsilon ^{(k)} = \max _{1 \le m \le K} u^{(m)} + \epsilon ^{(m)}\right) = \frac{e^{\mu _\ell \ln \sum _{m \in N_\ell } e^{\frac {u^{(m)}}{\mu _\ell }}}}{\displaystyle \sum _{\ell \in L} e^{\mu _\ell \ln \sum _{m \in N_\ell } e^{\frac {u^{(m)}}{\mu _\ell }}}}\cdot \frac{e^{\frac {u^{(k)}}{\mu _\ell }}}{\displaystyle \sum _{m\in N_\ell } e^{\frac {u^{(m)}}{\mu _\ell }}}, \end{aligned}$$

where the following condition shall be satisfied:

$$\begin{aligned} 0 < \mu _\ell \le 1, \quad \ell = 1, \ldots , L. \end{aligned}$$

The parameter \(\mu _\ell\) determines the randomness of choices within the \(\ell\)-th nest. Further, the correlation of alternatives within the \(\ell\)-th nest is given by \(1 - \mu _\ell ^2\). The convex conjugate of the NL surplus function has been derived, up to an additive constant, in Fosgerau et al. (2020):

$$\begin{aligned} E^*(p) = \sum _{\ell \in L} \mu _\ell \sum _{m\in N_\ell } p^{(m)} \ln p^{(m)} + \sum _{\ell \in L} \left( 1-\mu _\ell \right) \left( \sum _{m\in N_\ell } p^{(m)} \right) \ln \left( \sum _{m\in N_\ell } p^{(m)} \right) . \end{aligned}$$

It is \(\displaystyle \left( \min _{\ell \in L} \mu _\ell \right)\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), see Müller et al. (2021b).
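The product structure of the NL choice probabilities, nest probability times within-nest probability, can be computed directly; the following sketch implements the formula of this remark for an assumed nest structure and illustrative parameters.

```python
import numpy as np

def nl_probabilities(u, nests, mus):
    """NL choice probabilities: probability of a nest times the conditional
    probability of an alternative within that nest, cf. Remark 2."""
    iv = [ml * np.log(np.sum(np.exp(u[Nl] / ml))) for Nl, ml in zip(nests, mus)]
    q = np.exp(iv) / np.sum(np.exp(iv))        # probabilities of the nests
    p = np.zeros_like(u)
    for Nl, ml, ql in zip(nests, mus, q):
        within = np.exp(u[Nl] / ml) / np.sum(np.exp(u[Nl] / ml))
        p[Nl] = ql * within
    return p

u = np.array([1.0, 0.8, 0.2, -0.1])            # illustrative utilities
nests = [[0, 1], [2, 3]]                       # assumed nests N_1, N_2
mus = [0.5, 0.9]                               # nest parameters in (0, 1]
p = nl_probabilities(u, nests, mus)
print(p, p.sum())                              # probabilities sum to one
```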

Remark 3

MNL and NL belong to the broader class of generalized nested logit models (GNL) introduced in Wen and Koppelman (2001). GNL surplus functions are determined by the generating function

$$\begin{aligned} G(x)= \sum _{\ell \in L} \left( \sum _{i =1}^{n} \left( \sigma _{i\ell }\cdot x^{(i)}\right) ^{\frac {1}{\mu _\ell }} \right) ^{\frac {\mu _\ell }{\mu }}. \end{aligned}$$

Here, L is a generic set of nests. The parameters \(\sigma _{i\ell } \ge 0\) denote the shares of the i-th alternative with which it is attached to the \(\ell\)-th nest. For any fixed \(i \in \{1, \ldots ,n\}\) they sum up to one:

$$\begin{aligned} \sum _{\ell \in L} \sigma _{i\ell } = 1, \end{aligned}$$

and \(\sigma _{i\ell }=0\) means that the \(\ell\)-th nest does not contain the i-th alternative. Hence, the set of alternatives within the \(\ell\)-th nest is

$$\begin{aligned} N_\ell = \left\{ i \,|\, \sigma _{i\ell } >0\right\} . \end{aligned}$$

The nest parameters \(\mu _\ell > 0\) describe the variance of the random errors while choosing alternatives within the \(\ell\)-th nest. Analogously, \(\mu >0\) describes the variance of the random errors while choosing among the nests, where the following conditions shall be satisfied

$$\begin{aligned} \mu _\ell \le \mu \quad \text{ for } \text{ all } \ell \in L. \end{aligned}$$

Apart from the MNL and the NL, the concrete specification of the surplus' convex conjugate \(E^*\) is not known yet. Estimates of the convexity parameter of the convex conjugate are derived in Müller et al. (2021a). However, the choice probabilities are available in closed form.

The choice probability of the i-th alternative according to GNL amounts to

$$\begin{aligned} \mathbb {P}\left( u^{(i)} + \epsilon ^{(i)} = \max _{1 \le m \le n} u^{(m)} + \epsilon ^{(m)}\right) = \mu \frac{\partial G\left( e^{u}\right) }{\partial x^{(i)}}\cdot \frac{e^{u^{(i)}}}{G\left( e^{u}\right) }= \sum _{\ell \in L} q_\ell \cdot p_{i\ell }, \end{aligned}$$

where \(q_\ell\) denotes the probability of choosing the \(\ell\)-th nest, \(p_{i\ell }\) the conditional probability of choosing the i-th alternative within the \(\ell\)-th nest, and we set \(e^{u} = \left( e^{u^{(1)}}, \ldots , e^{u^{(n)}}\right)\) for the sake of brevity.

We state Lemma 5 concerning the strong convexity of the function h defined in (18).

Lemma 5

Let the functions \(E_i^*\) be \(\beta _i\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), \(i=1, \ldots , N\). Then, the function h is \(\sigma _2\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _\mathbb {H}\), where

$$\begin{aligned} \sigma _2=\min _{1 \le i \le N} \beta _i. \end{aligned}$$

Proof

Take any \(P,Q \in \varDelta _K^N, \alpha \in \left[ 0,1\right]\). Then the following holds

$$\begin{aligned} h(\alpha \cdot P + (1-\alpha )\cdot Q) =&\sum _{i=1}^{N}E^*_i\left( \alpha \cdot p_i + (1-\alpha )\cdot q_i\right) \\ \le&\alpha \cdot \sum _{i=1}^{N}E^*_i(p_i) + (1-\alpha ) \cdot \sum _{i=1}^{N}E^*_i(q_i) \\&\quad - \alpha \cdot (1-\alpha ) \cdot \sum _{i=1}^{N}\frac{\beta _i}{2} \Vert p_i-q_i\Vert _1^2 \\ \le&\alpha \cdot \sum _{i=1}^{N}E^*_i(p_i) + (1-\alpha ) \cdot \sum _{i=1}^{N}E^*_i(q_i) \\&\quad - \alpha \cdot (1-\alpha ) \cdot \frac{\sigma _2}{2} \cdot \sum _{i=1}^{N} \Vert p_i-q_i\Vert _1^2 \\ =&\alpha \cdot h(P) + (1-\alpha )\cdot h(Q) - \alpha \cdot (1-\alpha ) \cdot \frac{\sigma _2}{2} \cdot \Vert P-Q\Vert ^2_\mathbb {H}. \end{aligned}$$

\(\square\)

Hence, the worst convexity parameter amongst all agents determines the strong convexity of the function h(P). In order to apply results from Sect. 3, we next need to show that

$$\begin{aligned} f(X) = -\sum _{k=1}^{K} \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) \end{aligned}$$

is strongly convex w.r.t. the norm

$$\begin{aligned} \Vert X\Vert _{\mathbb {F}} = \left[ \sum _{k=1}^{K} \Vert x_k\Vert _2^2\right] ^\frac{1}{2}, \quad X \in \varDelta _n^K. \end{aligned}$$

For that, we need to assume that the underlying network is regular.

Assumption 4

The smallest singular value of M is positive, i. e. \(\sigma _{\min }\left( M\right) > 0\) holds.

As a consequence, we are able to estimate the convexity parameter of f w.r.t. the norm \(\Vert \cdot \Vert _{\mathbb {F}}\).

Lemma 6

The function f is \(\sigma _1\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _{\mathbb {F}}\), where

$$\begin{aligned} \sigma _1= \min _{1 \le k \le K} \frac{\tau _k}{\eta _k} \cdot \left[ \sigma _{\min }\left( M\right) \right] ^{2t}. \end{aligned}$$

Proof

First, we recall:

$$\begin{aligned} \sigma _{\min }\left( M\right) =\min _{\Vert x\Vert _2=1} \Vert M\cdot x\Vert _2. \end{aligned}$$

Hence, we get:

$$\begin{aligned} \sigma _{\min }\left( M^t\right)&= \sigma _{\min }\left( M\cdot M^{t-1}\right) = \min _{\Vert x\Vert _2=1} \left\| M\cdot M^{t-1}\cdot x \right\| _2 \\&\ge \sigma _{\min }\left( M\right) \cdot \min _{\Vert x\Vert _2=1} \left\| M^{t-1}\cdot x\right\| _2 \ge \ldots \ge \left[ \sigma _{\min }\left( M\right) \right] ^t. \end{aligned}$$

For any \(\alpha \in [0,1]\) and \(X,Z \in \varDelta _n^K\) it holds due to the \(\tau _k\)-strong convexity of \(-\pi _k\) w.r.t. the norm \(\Vert \cdot \Vert _2\):

$$\begin{aligned}&-\pi _k\left( \alpha \cdot M^t\cdot x_k + \left( 1-\alpha \right) \cdot M^t\cdot z_k\right) \\&\quad \le -\alpha \cdot \pi _k\left( M^t\cdot x_k\right) - \left( 1-\alpha \right) \cdot \pi _k\left( M^t\cdot z_k\right) \\&\quad -\alpha \cdot \left( 1-\alpha \right) \cdot \frac{\tau _k}{2}\cdot \Vert M^t x_k - M^t z_k\Vert _2^2. \end{aligned}$$

Further, we have:

$$\begin{aligned} \Vert M^t x_k - M^t z_k\Vert _2 \ge \sigma _{\min }\left( M^t\right) \cdot \Vert x_k - z_k\Vert _2\ge \left[ \sigma _{\min }\left( M\right) \right] ^t \cdot \Vert x_k - z_k\Vert _2. \end{aligned}$$

Hence, the convexity parameter of \(-\pi _k\left( M^t x_k\right)\) is \(\tau _k \cdot \left[ \sigma _{\min }\left( M\right) \right] ^{2t}\). The assertion follows analogously to the proof of Lemma 5. \(\square\)
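
The key estimate \(\sigma _{\min }\left( M^t\right) \ge \left[ \sigma _{\min }\left( M\right) \right] ^t\) used above is easy to verify numerically; a small sanity check on a randomly generated column-stochastic matrix (our own illustration) reads:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((8, 8))
M /= M.sum(axis=0)                                   # make columns sum to one
smin = np.linalg.svd(M, compute_uv=False).min()
for t in range(1, 6):
    smin_t = np.linalg.svd(np.linalg.matrix_power(M, t), compute_uv=False).min()
    assert smin_t >= smin ** t - 1e-12, (t, smin_t, smin ** t)
```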

Note that the considerations in the proof of Lemma 6 also guarantee the existence of a unique minimizer \(x^*_k(P)\) for each objective function in (13), i.e. the manipulation matrix \(X_*(P)\) is indeed well defined.

It remains to inspect the multiplicative term. We study the Lipschitz-continuity property of the operator

$$\begin{aligned} G(X) = \left( g_1(X), \ldots , g_N(X)\right) \in \mathbb {R}^{K \times N}, \end{aligned}$$

where

$$\begin{aligned} g_i(X) = \left( \Vert v_i - M^t\cdot x_1\Vert _2, \ldots , \Vert v_i - M^t\cdot x_K\Vert _2\right) ^T, \; i=1,\ldots , N. \end{aligned}$$

For that, the dual norm of \(\Vert \cdot \Vert _\mathbb {H}\) is required, see Nesterov (2020):

$$\begin{aligned} \Vert Z\Vert ^*_\mathbb {H} = \left[ \sum _{i=1}^{N} \Vert z_i\Vert _\infty ^2\right] ^{\frac{1}{2}}, \quad Z \in \mathbb {R}^{K \times N}. \end{aligned}$$

Lemma 7

The operator G is Lipschitz-continuous with modulus

$$\begin{aligned} L_1 = N^\frac{1}{2} \cdot \left[ \sigma _{\max }\left( M\right) \right] ^t, \end{aligned}$$

where \(\sigma _{\max }\) denotes the largest singular value of M. This is to say that

$$\begin{aligned} \Vert G(X) - G(Y)\Vert _{\mathbb {H}}^* \le L_1 \cdot \Vert X-Y\Vert _{\mathbb {F}}, \quad X,Y \in \varDelta ^K_n. \end{aligned}$$

Proof

The Lipschitz-continuity of G essentially follows from Nesterov (2020). In fact, take any \(X,Y \in \varDelta ^K_n\):

$$\begin{aligned} \Vert G(X) - G(Y)\Vert _{\mathbb {H}}^* = \left[ \sum _{i=1}^{N} \Vert g_i(X) - g_i(Y)\Vert _\infty ^2\right] ^{\frac{1}{2}}. \end{aligned}$$

By means of the triangle inequality, it holds for every component \(k=1, \ldots , K\):

$$\begin{aligned} \left| g_i^{(k)}(X)-g_i^{(k)}(Y) \right| = \left| \Vert v_i-M^t\cdot x_k\Vert _2 -\Vert v_i-M^t\cdot y_k\Vert _2 \right| \le \Vert M^t x_k- M^t y_k\Vert _2 \le \left[ \sigma _{\max }\left( M\right) \right] ^{t} \cdot \Vert x_k-y_k\Vert _2. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert g_i(X) - g_i(Y)\Vert _\infty ^2&= \left( \max _{1 \le k\le K} \left| \Vert v_i-M^t\cdot x_k\Vert _2 - \Vert v_i-M^t\cdot y_k\Vert _2 \right| \right) ^2 \\&\le \left[ \sigma _{\max }\left( M\right) \right] ^{2t} \cdot \max _{1 \le k\le K}\Vert x_k-y_k\Vert _2^2 \\&\le \left[ \sigma _{\max }\left( M\right) \right] ^{2t} \cdot \sum _{k=1}^{K}\Vert x_k-y_k\Vert _2^2, \end{aligned}$$

and the assertion follows. \(\square\)

Note that all components of G are convex and nonnegative. Moreover, all entries of the matrices P are nonnegative. Due to Lemmata 5 and 6, \(\varPhi (\cdot ,P)\) is strongly convex for any fixed \(P \in \varDelta _K^N\) and \(\varPhi (X,\cdot )\) is strongly convex for any fixed \(X \in \varDelta _n^K\). We therefore conclude that the alternating update steps of our network manipulation algorithm are well defined.

Let us finally present our main results on the convergence of the network manipulation algorithm. Recall that the derived constants are as follows:

$$\begin{aligned} \sigma _1=\min _{1 \le k \le K} \frac{\tau _k}{\eta _k} \cdot \left[ \sigma _{\min }\left( M\right) \right] ^{2t}, \quad \sigma _2=\min _{1 \le i \le N} \beta _i, \end{aligned}$$
(31)

and

$$\begin{aligned} L_1 =N^\frac{1}{2} \cdot \left[ \sigma _{\max }\left( M\right) \right] ^t, \quad L_2=1. \end{aligned}$$
(32)

Moreover, for the rate of convergence we have:

$$\begin{aligned} \lambda = \frac{L_1^2\cdot L_2^2}{\sigma _1 \cdot \sigma _2} = \frac{N \cdot \left[ \kappa (M)\right] ^{2 t}}{\displaystyle \min _{1 \le k \le K} \frac{\tau _k}{\eta _k} \cdot \min _{1 \le i \le N} \beta _i}, \end{aligned}$$
(33)

where \(\kappa (M)\) denotes the condition number of the matrix M. In order to establish convergence of the network manipulation algorithm, we need an additional assumption which reflects a certain stability of the model.

Assumption 5

It holds:

$$\begin{aligned} \left[ \kappa \left( M\right) \right] ^t < {\left( \frac{\displaystyle \min _{1 \le k \le K} \frac{\tau _k}{\eta _k} \cdot \min _{1 \le i \le N} \beta _i}{N}\right) }^{\frac{1}{2}}. \end{aligned}$$
(34)

Assumption 5 is a version of condition (24): it enforces \(\lambda < 1\) and thereby guarantees strong convexity of the potential function (19). A straightforward application of Theorem 1 and Corollary 1 now yields:

Theorem 2

Let \(\left( X^*,P^*\right) \in \varDelta _n^K \times \varDelta _K^N\) be the unique minimizer of the potential function (19). Then, for the sequences \(\{\tilde{X}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{P}_\ell \}_{\ell \ge 1}\) it holds:

$$\begin{aligned} \left\| \tilde{X}_{\ell +1} - X^*\right\| _{\mathbb {F}} \le \lambda ^{\ell +1} \left\| X_0 - X^*\right\| _{\mathbb {F}} + \frac{1-\lambda ^\ell }{1-\lambda }\left[ \sqrt{\frac{2\delta _2}{\sigma _1}} + \frac{L_1\cdot L_2}{\sigma _1}\cdot \sqrt{\frac{2\delta _1}{\sigma _2}}\right] \end{aligned}$$
(35)

and

$$\begin{aligned} \left\| \tilde{P}_{\ell +1} - P^*\right\| _{\mathbb {H}} \le \lambda ^{\ell }\left[ \left\| P_1 - P^*\right\| _{\mathbb {H}} + \sqrt{\frac{2\delta _1}{\sigma _2}} \right] + \frac{1-\lambda ^{\ell -1}}{1-\lambda }\left[ \sqrt{\frac{2\delta _1}{\sigma _2}} + \frac{L_1\cdot L_2}{\sigma _2}\cdot \sqrt{\frac{2\delta _2}{\sigma _1}}\right] , \end{aligned}$$
(36)

where \(\sigma _1\), \(\sigma _2\), \(L_1\), \(L_2\), and \(\lambda\) are given in (31)–(33).

Let us comment on Assumption 5 by elaborating how the model parameters enter into the inequality (34):

Interaction network:

The network structure plays a key role in (34). This is reflected by the condition number \(\kappa \left( M\right)\). Large values of \(\kappa \left( M\right)\) cause instability of manipulation, since small changes of the aspired states could lead to large changes of the optimal starting distributions. In other words, a more predictable pattern of network transitions speeds up the convergence. The minimum value of the condition number is attained for permutation matrices. In this case, the network interaction is obviously predictable, i.e. organizations can easily determine how network participants distribute information. For similar reasons, the number of interaction periods t has a negative impact on the possibility of manipulation. More periods weaken the influence of the starting distribution on the resulting state. Instead, as time progresses, the state is mainly determined by the network structure, independently of the starting distributions.

Agents:

Clearly, more agents N slow down the rate, as organizations have to pay attention to more aspired states. Moreover, large values of \(\beta _i\), \(i=1, \ldots , N\), improve the rate of convergence. In order to interpret this fact, we refer to Remark 1. There, it has been shown that the \(\beta _i\)'s can be viewed as measures of how uncertain agents still are about their decisions. Due to the duality of discrete choice and rational inattention, agents prone to errors have high information processing costs. Thus, these agents pay less attention to the observable utility, i.e. to whether their aspired states have been reached. The fact that imperfect behavior of agents could help to stabilize economic systems faster was recently also described in Müller et al. (2021b).

Organizations:

The parameters \(\eta _{k}\), \(k=1, \ldots , K\), reflect the extent to which organizations take into account their network payoffs. If the \(\eta _k\)'s are relatively large, the organizations focus mainly on reaching the agents' aspired states. It may seem surprising that this does not improve the convergence rate of the network manipulation algorithm, but actually worsens it. However, if organizations do not properly act on the network by maximizing their profits, their manipulation power diminishes, since they lose their credibility: their followers may, e.g., be disappointed by getting biased information and leave them. Organizations thus become worthless for agents in terms of manipulation and, as a consequence, the network manipulation algorithm becomes less efficient. Hence, the parameters \(\eta _k\) mirror a certain credibility of the organizations. Further, the impact of \(\tau _k\), \(k=1, \ldots , K\), on the convergence rate becomes clear if we interpret these parameters as measures of the organizations' reluctance to change their starting distributions. From this point of view, conservative behavior of organizations towards profit maximization makes the network manipulation more stable.

Additionally, it is worth mentioning that the parameters \(\beta _i\), \(\eta _k\), and \(\tau _k\), which reflect the behavior of agents and organizations, also affect the error bounds in (35) and (36). The corresponding interpretation is similar to that for the convergence rate. Namely, the agents' imperfect behavior and the organizations' conservatism in profit maximization reduce the accumulated errors.
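
The constants (31)–(33) and the stability condition (34) are cheap to evaluate for a given model. The following sketch (a direct transcription with our own naming) returns the rate \(\lambda\) and whether Assumption 5 holds; for instance, with the parameter values of the example in Sect. 6 it returns \(\lambda = 2/(0.25 \cdot 8.2) \approx 0.976 < 1\), consistent with the check of (34) there.

```python
import numpy as np

def rate_and_stability(M, t, tau, eta, beta):
    """Constants (31)-(33) and the stability check (34).

    tau, eta : organization parameters, length K; beta : agent parameters, length N
    """
    s = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    sigma1 = np.min(np.asarray(tau) / np.asarray(eta)) * s.min() ** (2 * t)
    sigma2 = np.min(beta)
    L1 = np.sqrt(len(beta)) * s.max() ** t          # N = number of agents, L2 = 1
    lam = L1 ** 2 / (sigma1 * sigma2)               # equals N * kappa(M)^(2t) / (...)
    return lam, lam < 1                             # lam < 1 iff Assumption 5 holds
```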

5 Computational aspects

We discuss the implementation and benefits of the inexact alternating minimization scheme applied to the objective function (15). We recall that we have to minimize a strongly convex function \(\varPhi (X,P)\) over the convex and bounded set \(\varDelta _n^K \times \varDelta _K^N\). An alternative way to find a solution is therefore minimization via direct methods. The efficiency of such methods obviously depends on the properties of the objective function. By applying the alternating minimization scheme instead, it is possible to exploit the properties of the components of \(\varPhi (X,P)\) separately, as each iteration consists of solving the two subproblems

$$\begin{aligned} \tilde{P}_{\ell +1} = \arg \min ^{\delta _1}_{P \in \varDelta _K^N} \varPhi \left( \tilde{X}_\ell ,P\right) , \end{aligned}$$
(37)
$$\begin{aligned} \tilde{X}_{\ell +1} =\arg \min ^{\delta _2}_{X \in \varDelta _n^K} \varPhi \left( X,\tilde{P}_{\ell + 1}\right) . \end{aligned}$$
(38)

Hence, the performance of the alternating scheme crucially depends on how efficiently these subproblems can be solved. Furthermore, the inexact version provides an opportunity to accelerate the alternating scheme, since we can accept approximate solutions of the subproblems and thus stop an optimization method at an earlier iteration. In fact, due to Theorem 2 the impact of these numerical inaccuracies on the convergence of the alternating scheme can be controlled by means of the model's parameters.

5.1 Agent’s subproblem

The complexity of solving the agent’s subproblem (37) depends on the concrete specification of the underlying discrete choice model. In order to update the choice matrix \(\tilde{P}_{\ell +1}\), the following minimization problem has to be inexactly solved:

$$\begin{aligned} \min ^{\delta _1}_{P \in \varDelta _K^N} \langle G(\tilde{X}_\ell ),P \rangle + h(P). \end{aligned}$$

Notice that this minimization problem is separable, which yields:

$$\begin{aligned} \min ^{\delta _1}_{p_i \in \varDelta _K} \langle g_i(\tilde{X}_\ell ),p_i \rangle + E_i^*(p_i), \quad i= 1, \ldots , N. \end{aligned}$$
(39)

The solution of each of these problems is given by the choice probabilities of the underlying discrete choice model, cf. Fosgerau et al. (2020) and Müller et al. (2021a). The challenge of solving problem (39) lies in the concrete specification of the function \(E_i^*\). In general, the derivation of this convex conjugate of the surplus function can be very involved. In fact, for many discrete choice models the convex conjugate \(E^*_i\) is not known yet. This is e.g. the case for most of the generalized nested logit models, as well as for the probit and mixed logit models. However, for a large class of discrete choice models we are able to inexactly solve (39) even without knowing the functions \(E_i^*\). Let us discuss this in more detail:

  • For a variety of discrete choice models the choice probabilities are given by a formula. This is the case for the generalized nested logit models, where the formula is presented in Remark 3; the formulas for multinomial logit and nested logit arise as special cases. Thus, we are able to solve problem (39) without knowing the concrete specification of the function \(E_i^*\). In this case, the computational costs of solving the subproblem are determined by the costs of evaluating the distance matrix, which can be done in \({\mathcal {O}}(KnN)\) operations if we assume that the costs of computing \(\Vert v-M^t\cdot x\Vert _2\) are \({\mathcal {O}}(n)\); a short sketch of this computation is given below the list.

  • For general discrete choice models the choice probabilities can be simulated, see Train (2009). This is the case e. g. for the multinomial probit, where the random errors are normally distributed, or the mixed logit, which can approximate any random utility model, see McFadden and Train (2000). These two models are very flexible in terms of modeling substitution; however, simulating the choice probabilities at each iteration could be computationally expensive. For the multinomial probit it is possible to use analytical approximations of the integral. Connors et al. (2014) show that such approximations perform rather fast in numerical tests compared to simulation approaches such as the GHK-simulator by Börsch-Supan and Hajivassiliou (1993).

We stress that for applying the inexact alternating minimization method it is sufficient to have approximate solutions of the subproblems (39). Exact solutions of (39) are desirable, but not needed.
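
The distance-matrix computation mentioned in the first item above can be transcribed as follows (our own sketch; V and X collect the aspired states and the starting distributions as columns):

```python
import numpy as np

def distance_matrix(V, M, X, t):
    """All entries ||v_i - M^t x_k||_2 of G(X), in O(K n N) once M^t X is formed.

    V : aspired states as columns, shape (n, N)
    X : starting distributions as columns, shape (n, K)
    """
    MX = np.linalg.matrix_power(M, t) @ X            # states reached after t periods
    diff = V[:, None, :] - MX[:, :, None]            # shape (n, K, N)
    return np.linalg.norm(diff, axis=0)              # D[k, i] = ||v_i - M^t x_k||_2
```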

5.2 Organization’s subproblem

In order to update the manipulation values, we have to inexactly solve the subproblem (38) or, equivalently,

$$\begin{aligned} \min ^{\delta _2}_{X \in \varDelta _n^K} f(X) + \langle G(X),{\tilde{P}}_{\ell +1} \rangle . \end{aligned}$$

Its decomposable structure enables us to solve, for any \(k=1, \ldots , K\):

$$\begin{aligned} \min ^{\delta _2}_{x_k \in \varDelta _n}\sum _{i=1}^{N} \tilde{p}_i^{(k)}\cdot \Vert v_i-M^t\cdot x_k\Vert _2 - \frac{1}{\eta _k} \cdot \pi _k\left( M^t\cdot x_k\right) . \end{aligned}$$
(40)

The computational efficiency of a chosen algorithm applied to (40) depends on the properties of the payoff functions \(\pi _k\). Note that these functions are allowed to be nonsmooth and are not necessarily simple. In the most general situation we might, hence, have to deal with a strongly convex and nonsmooth objective function and have to rely on first-order methods. Under the assumption of Lipschitz-continuity, nonsmooth convex optimization problems can be solved at a rate of \({\mathcal {O}}\left( \frac{1}{\sqrt{T}}\right)\), where T is the iteration counter. For strongly convex problems this rate can be improved to \({\mathcal {O}}\left( \frac{1}{T}\right)\), see e.g. Lan (2020). We point out that the evaluation of a subgradient of the objective function in (40) requires the transpose of \(M^t\). This explains why the manipulation of organizations is required to influence the network: they, rather than the agents, are likely to possess the knowledge of the network's structure in terms of M.
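
To illustrate the role of the transpose, a subgradient of the objective in (40) may be assembled as follows (a sketch under our own naming; the gradient of the payoff \(\pi _k\) is passed as an oracle):

```python
import numpy as np

def subgradient_obj40(x_k, Mt, V, p_k, grad_pi_k, eta_k):
    """A subgradient of the objective in (40); note the transposed factor Mt.T.

    Mt        : the matrix M^t (precomputed)
    V         : aspired states as columns, shape (n, N)
    p_k       : the agents' weights p~_i^{(k)}, shape (N,)
    grad_pi_k : gradient oracle of the payoff pi_k
    """
    y = Mt @ x_k
    g = -Mt.T @ grad_pi_k(y) / eta_k                 # payoff part of (40)
    for i in range(V.shape[1]):
        r = y - V[:, i]
        nrm = np.linalg.norm(r)
        if nrm > 0:                                  # at r = 0, zero is a subgradient
            g += p_k[i] * (Mt.T @ r) / nrm
    return g
```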

The total complexity of the subproblems does not only depend on the rate of convergence but also on how efficiently each iteration can be computed. Usually, there is a trade-off between achieving a better rate in terms of iterations and the numerical efficiency per iteration. An advantage of the inexact version is to counteract this trade-off, as it enables us to stop the algorithm at an earlier stage. By applying the mirror descent with the relative entropy as Bregman divergence, the projection onto the probability simplex is avoided, since the update steps are then given by a closed-form expression, see e.g. Beck and Teboulle (2003). For large networks, i. e. with large values of n, this is crucial, since the Euclidean projection onto the simplex \(\varDelta _n\) comes at a cost of \({\mathcal {O}}(n\log n)\), see Chen and Ye (2011). There is also a variant of the mirror descent for strongly convex functions, proposed by Juditsky and Nemirovski (2011), which achieves \({\mathcal {O}}\left( \frac{1}{T}\right)\). However, the evaluation of each iteration step becomes more involved.

Remark 4

(Entropic mirror descent) We recall the entropic setup of the mirror descent method for minimizing a convex function \(f:\varDelta _n \rightarrow \mathbb {R}\) on the probability simplex with (sub-)gradients \(f'(x)\) at the point x. The update step at iteration \(\ell\) is then given by (Beck 2017):

$$\begin{aligned} x_{\ell +1}^{(i)} = \frac{x_{\ell }^{(i)}\cdot \exp \left( -\alpha _\ell \cdot f'(x_{\ell })^{(i)}\right) }{\sum _{j=1}^{n} x_{\ell }^{(j)}\cdot \exp \left( -\alpha _\ell \cdot f'(x_{\ell })^{(j)}\right) }, \quad i=1,2, \ldots , n, \end{aligned}$$
(41)

where \(\alpha _\ell\) is a suitably chosen stepsize. In practice, a dynamic adaptive stepsize is often chosen, while the fixed stepsize turns out to be useful for the complexity analysis, see for example Beck (2017). In order to control the level of inexactness in the alternating minimization scheme, we can therefore rely on the complexity analysis of the mirror descent method. More precisely, let the (sub-)gradients be bounded, i. e. \(\Vert f'(x)\Vert _\infty \le M_f\) for all \(x \in \varDelta _n\), and let the fixed stepsize \(\alpha _\ell =\frac{\sqrt{2\log (n)}}{M_f\sqrt{L+1}}, \; \ell =0, \ldots , L\) be selected. Then, after L iterations the optimality gap reads as

$$\begin{aligned} f^{\text{ best }}_L - f^* \le \frac{\sqrt{2 \log (n)}M_f}{\sqrt{L+1}}, \end{aligned}$$

where \(f^{\text{ best }}_L\) is the minimal function value obtained so far (Beck 2017). This enables us to determine the maximum number of iterations L for attaining a \(\delta\)-inexact solution, i. e.

$$\begin{aligned} L > \frac{2 \log (n)\cdot M_f^2}{\delta ^2}. \end{aligned}$$
(42)

The details of applying the mirror descent method to (40) are given in Sect. 6, where \(\pi _k\) is taken as the squared Euclidean distance. \(\square\)
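
For reference, a minimal transcription of the fixed-stepsize entropic mirror descent with the iteration budget (42) might look as follows (our own sketch; grad and f are oracles for the subgradient and the function value, and M_f bounds \(\Vert f'(x)\Vert _\infty\) on the simplex):

```python
import numpy as np

def entropic_mirror_descent(grad, f, x0, M_f, delta):
    """Entropic mirror descent on the simplex with the fixed stepsize of Remark 4."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    L = int(np.ceil(2.0 * np.log(n) * M_f ** 2 / delta ** 2))   # budget from (42)
    alpha = np.sqrt(2.0 * np.log(n)) / (M_f * np.sqrt(L + 1))
    x_best, f_best = x.copy(), f(x)
    for _ in range(L):
        g = grad(x)
        w = x * np.exp(-alpha * (g - g.min()))   # update (41); the shift cancels
        x = w / w.sum()
        fx = f(x)
        if fx < f_best:                          # track the best iterate f^best
            x_best, f_best = x.copy(), fx
    return x_best
```

Combined with a subgradient oracle like the sketch in Sect. 5.2, this is the kind of \(\delta\)-inexact inner solver we use in Sect. 6.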

The first part of the objective function in (40) consists of a finite sum. If N is very large, the efficiency of first-order methods might suffer, since in any iteration N gradients must be evaluated. Therefore, it seems reasonable to apply a stochastic version of the mirror descent algorithm. However, the stochastic mirror descent for composite objectives does not linearize the second part of the objective function, see Duchi et al. (2010). Therefore, if \(\pi _k\) is not simple, the computation of an iteration step might be too expensive here.

Smoothing techniques for nonsmooth functions are commonly used in optimization to achieve convergence of order \({\mathcal {O}}\left( \frac{1}{T}\right)\), see Nesterov (2005) and Beck (2017). There exist several approximations of the norm \(\Vert \cdot \Vert _2\) with Lipschitz-smooth gradients. E.g., it is possible to replace each summand of the first term of (40) by the following approximation:

$$\begin{aligned} \sqrt{\Vert v_i - M^t\cdot x_k\Vert _2^2 + \delta _k^2} - \delta _k, \quad k=1, \ldots , K. \end{aligned}$$
(43)

The function in (43) has Lipschitz-continuous gradients with constant \(\frac{\left[ \sigma _{\max }\left( M\right) \right] ^{2t}}{\delta _k}\), see Beck and Teboulle (2012). Note that large values of \(\delta _k\) yield a very smooth function but provide a worse approximation. Furthermore, we stress that it is possible to examine the effect of the smoothing parameter on the convergence of the inexact alternating minimization method. Incorporating these smoothed versions enables us to improve the efficiency even if the functions \(\pi _k\) remain nonsmooth. For simple \(\pi _k\)'s the problem can be solved in \({\mathcal {O}}\left( \log \frac{1}{\epsilon }\right)\) iterations by a version of the accelerated gradient method, see e.g. Lan (2020). However, at each iteration N gradients have to be evaluated, yielding a total of \({\mathcal {O}}\left( N\cdot \log \frac{1}{\epsilon }\right)\) gradient evaluations. These gradient computations can be reduced by applying the random primal–dual gradient method introduced by Lan and Zhou (2018). There, only one component of the sum is randomly selected and its gradient is used for the update step. Compared to the accelerated gradient version, this can save up to \({\mathcal {O}}(\sqrt{N})\) gradient evaluations, see Lan (2020).
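
For completeness, the smoothed term (43) and its gradient admit the following short transcription (our own sketch; Mt stands for the precomputed matrix \(M^t\)):

```python
import numpy as np

def smoothed_distance(v, Mt, x, delta_k):
    """Value and gradient of the smoothed summand (43)."""
    r = Mt @ x - v
    root = np.sqrt(r @ r + delta_k ** 2)
    # the gradient is Lipschitz with constant sigma_max(M)^(2t) / delta_k
    return root - delta_k, Mt.T @ r / root
```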

Obviously, better properties of the payoff functions increase the possibilities to efficiently solve the subproblems (40). If the payoff functions have Lipschitz-smooth gradients, or can be smoothed, replacing the norm \(\Vert \cdot \Vert _2\) by (43) enables us to apply conditional gradient methods for solving the subproblems. Such algorithms are of the order \({\mathcal {O}}\left( \frac{1}{T}\right)\) and hence, an \(\epsilon\)-solution can be found in \({\mathcal {O}}\left( \frac{1}{\epsilon }\right)\) iterations. From the numerical point of view, it is important to note that the conditional gradient method does not rely on projections, see Jaggi (2013). This facilitates its application in our setting and lowers the per-iteration cost for large networks. In fact, the alternating structure enables us to deal with subproblems on the probability simplex. Minimizing a linear function on the simplex is straightforward; thus, the dominant factor of each iteration step is the gradient computation. Recently, a conditional gradient sliding method has been introduced by Lan and Zhou (2016), where the number of calls of the first-order oracle can be reduced to \({\mathcal {O}}\left( \log \frac{1}{\epsilon }\right)\), while the number of iterations remains unchanged. We note that stochastic versions of the conditional gradient sliding method are available.
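
A minimal sketch of such a conditional gradient iteration on the simplex (our own illustration with the standard stepsize \(2/(t+2)\)) reads:

```python
import numpy as np

def conditional_gradient(grad, x0, T):
    """Conditional gradient on the simplex; no projection is needed, since
    min over the simplex of <g, s> is attained at a vertex e_j."""
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad(x)
        j = int(np.argmin(g))            # vertex solving the linear subproblem
        gamma = 2.0 / (t + 2)            # standard open-loop stepsize
        x *= 1.0 - gamma
        x[j] += gamma                    # x = (1 - gamma) * x + gamma * e_j
    return x
```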

The preceding discussions clearly suggest that the alternating minimization algorithm simplifies the computation of update steps when optimizing objective functions of the form (15). Moreover, the possibility to compute inexact solutions of the subproblems turns out to be crucial for an efficient implementation. Popular choice models like probit must rely on approximate choice probabilities anyway, and being able to stop an algorithm earlier automatically saves computational effort, e.g. gradient evaluations.

6 Numerical examples

We examine the theoretical findings in Theorem 2 by means of numerical examples. Due to this result, we are allowed to solve the subproblems inexactly and still achieve convergence up to an error which can be controlled by the model parameters. Note that the bounds derived in Theorem 2 are based on a worst-case analysis, so in practice we may expect even better results. For all numerical tests, we select

$$\begin{aligned} \pi _k(M^t\cdot x_k)= \frac{1}{2} \left\| M^t\cdot x_k - c_k\right\| ^2_2, \quad k=1,\ldots , K, \end{aligned}$$

where \(c_k\) is the k-th organization’s targeted network state. For each organization the subproblem consists of minimizing a nonsmooth function on the probability simplex, which is solved by the entropic mirror descent, see Remark 4.

Let us provide a simple test example with 2 agents, who choose among 4 credible organizations according to the multinomial logit model with parameters

$$\begin{aligned} \mu _1=8.2, \quad \mu _2=9. \end{aligned}$$

The network is of size \(n=20\) and is block-diagonal. More precisely, the j-th of 8 blocks has the structure \(\left( e\cdot e^T - I\right) \cdot \frac{1}{\left( n_j-1\right) }\), where e is the all-ones vector of dimension \(n_j\) and I is the identity matrix of size \(n_j \times n_j\). We choose the block sizes \(n_j\), \(j=1, \ldots , 8\), with

$$\begin{aligned} \sum _{j=1}^{8} n_j = n, \quad 2 \le n_j \le 3. \end{aligned}$$
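
One admissible choice of block sizes, together with the resulting singular values, is sketched below (our own illustration using scipy's block_diag):

```python
import numpy as np
from scipy.linalg import block_diag

# the j-th block (e e^T - I) / (n_j - 1) is doubly stochastic on n_j states
sizes = [3, 3, 3, 3, 2, 2, 2, 2]         # 8 blocks, sum = n = 20, 2 <= n_j <= 3
blocks = [(np.ones((nj, nj)) - np.eye(nj)) / (nj - 1) for nj in sizes]
M = block_diag(*blocks)
s = np.linalg.svd(M, compute_uv=False)
print(s.max(), s.min())                   # 1.0 and 0.5, as used below
```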

There is only one period of interaction, i.e. \(t=1\), and the credibility values of the organizations are

$$\begin{aligned} \eta _1=0.95, \quad \eta _2=0.81, \quad \eta _3=1, \quad \eta _4=0.79. \end{aligned}$$

The aspired states \(v_1\) and \(v_2\) of the two agents are randomly generated. Note that in this example it holds:

$$\begin{aligned} \sigma _1=\min _{1 \le k \le K} \frac{\tau _k}{\eta _k} \cdot \left[ \sigma _{\min }\left( M\right) \right] ^{2t} =\frac{1}{1}\cdot 0.5^2, \quad \sigma _2=\min _{1 \le i \le N} \beta _i = 8.2, \end{aligned}$$

and

$$\begin{aligned} L_1 =N^\frac{1}{2} \cdot \left[ \sigma _{\max }\left( M\right) \right] ^t= 2^\frac{1}{2} \cdot 1, \quad L_2=1. \end{aligned}$$

Hence, Assumption 5 is satisfied, since

$$\begin{aligned} 2 < \left( \frac{\frac{1}{1}\cdot 8.2}{2}\right) ^\frac{1}{2} =2.02485. \end{aligned}$$

First, we produce iterates \(\left( P^\star , X^\star \right)\) via the exact version of the alternating minimization algorithm. This is done by solving the organizations' inner subproblems via entropic mirror descent with dynamic adaptive stepsize (Beck 2017), where the algorithm terminates if no sufficient progress is achieved. The latter is controlled by comparing consecutive iterates element-wise and by stopping if this element-wise difference is less than \(10^{-10}\). Due to the fixed-point scheme of the inexact alternating minimization algorithm, an element-wise comparison with precision \(10^{-9}\) is used as stopping criterion for the outer iterations. Second, we produce iterates \(\left( \tilde{P}^\star , \tilde{X}^\star \right)\) via the inexact version of the alternating minimization algorithm. For computing the inexact solutions we rely on the entropic mirror descent with fixed stepsize discussed in Remark 4. Therefore, a \(\delta\)-level is chosen and the algorithm stops after the number of iterations determined by Eq. (42). Figure 1 shows that the gaps between the exact solutions \(P^\star , X^\star\) and the inexact solutions \(\tilde{P}^\star , \tilde{X}^\star\) close rapidly, at least proportionally to the square root of \(\delta\). This observation confirms our theoretical convergence results in Theorem 2.

Fig. 1 Difference between the exact solutions \(P^\star , X^\star\) and the inexact solutions \(\tilde{P}^\star , \tilde{X}^\star\) dependent on the chosen \(\delta\)-level of the inner optimization problems

However, the inexact algorithm yields a numerical advantage. Table 1 compares the computational time of the exact method with that of different inexact versions of the alternating minimization scheme. Clearly, the running time can be significantly reduced by applying inexact versions.

Table 1 Computational time in seconds for different inexact versions

From now on, we ignore Assumption 5 and test the performance of the inexact version relative to the exact one by running numerical simulations. In order to illustrate the numerical performance, we control the number of inner iterations by allowing 10, 50, 100, or 1000 of them. All subproblems are solved by entropic mirror descent with dynamic adaptive stepsize. For that, we focus on agents choosing probabilities according to the nested logit model, implying that their updates can be written in closed form. There are 5 organizations to choose from: organizations 1 and 4 form the first nest, organizations 2 and 3 the second, and organization 5 the third. The nest parameters are chosen uniformly at random between 0.01 and 0.6. The aspired states as well as the targeted network states of the organizations are also randomly initialized. We set

$$\begin{aligned} \eta _1=1, \quad \eta _2=2, \quad \eta _3=3, \quad \eta _4=0.01, \quad \eta _5=2, \end{aligned}$$

and the network structure is randomly generated. The results are summarized in Fig. 2. The computational time is significantly reduced, see Table 2. A higher number of agents increases the computational effort, as summarized in Table 3, while the inexact versions yield close approximations from at most 100 inner iterations onwards, see Fig. 3.

Fig. 2 Difference between exact and inexact solutions for \(n=50, N=10, t=1\)

Table 2 Computational time for \(n=50, N=10, t=1\)
Fig. 3 Difference between exact and inexact solutions for \(n=50, N=30, t=1\)

Table 3 Computational time for \(n=50, N=30, t=1\)

The computation gets noticeably more involved when more periods of interaction take place. Setting \(t=2\) significantly increases the running time needed to compute the exact solution (Table 4).

Table 4 Computational time for \(n=50, N=30, t=2\)

The differences between the exact and inexact versions are shown in Fig. 4.

Fig. 4 Difference between exact and inexact solutions for \(n=50, N=30, t=2\)

Let us increase the network size to \(n=100\) and randomly initialize a sparse network. As Table 5 shows, the computational effort of the exact version dramatically increases. At the same time, the inexact versions run much faster and yield reasonable approximations.

Table 5 Computational time and errors for \(n=100, N=10\), \(t=1\), sparse network

Overall, our numerical experiments suggest that the alternating minimization algorithm is rather robust when the inner subproblems are solved inexactly. This feature prevails if the number of organizations grows, more interactions take place, or the network is ill-conditioned or sparse. Thus, the presented computational results support our theoretical findings in Theorem 2 and motivate the use of the inexact alternating minimization.