Abstract
In this paper, we present a network manipulation algorithm based on an alternating minimization scheme from Nesterov (Soft Comput 1–12, 2020). In our context, the alternating process mimics the natural behavior of agents and organizations operating on a network. By selecting starting distributions, the organizations determine the short-term dynamics of the network. While choosing an organization in accordance with their manipulation goals, agents are prone to errors. This rationally inattentive behavior leads to discrete choice probabilities. We extend the analysis of our algorithm to the inexact case, where the corresponding subproblems can only be solved with numerical inaccuracies. The parameters reflecting the imperfect behavior of agents and the credibility of organizations, as well as the condition number of the network transition matrix, have a significant impact on the convergence of our algorithm. Namely, they turn out not only to improve the rate of convergence, but also to reduce the accumulated errors. From the mathematical perspective, this is due to the induced strong convexity of an appropriate potential function.
1 Introduction
Networks naturally occur in many areas such as economics, computer science, chemistry or biology. A common way to model scenarios within networks is to use Markov chains. For a finite state space, a transition matrix describes the structure of changes on the chain’s states, see e.g. Gagniuc (2017). Usually, the iterative process of repeated transitions over the states provides a stationary distribution. However, if the considered time horizon is short, the question arises of how to efficiently manipulate the distribution of information within the network. As an obvious choice for manipulation, each agent may start with an initial distribution and spread the information by communicating with their neighbors. But, since their own manipulation power is often limited, it is quite reasonable for agents to engage intermediary organizations instead. This could be due to restricted access to parts of the network, to revealing the manipulation interests too openly, or to a lack of knowledge about the network structure. Possible examples include:
- Influencers: Companies that want to credibly advertise their products or services via social media channels pay influencers on social media platforms. The influencers act as part of the network and spread information on the products or services.
- Search engines: Website owners try to increase the visibility of their websites. In order to find proper content, most Internet users enter a query into a web search engine. Thus, the query results strongly influence the short-term behavior of the users.
- Conspiracy theories: Agents try to spread false information for different interests. In order to spread fake news, agents have to rely on different distribution channels, such as groups in social media networks or Internet blogs.
Note that two of our examples are related to the manipulation of information in a social network. There is a growing literature concerning this topic, see Acemoglu and Ozdaglar (2011) for an overview. Most of the models describe the update of opinions or beliefs, see e.g. DeGroot (1974), which is done according to a convex combination of other network members’ opinions. Applying traditional techniques from the analysis of Markov chains, the formation of a consensus is examined. In these approaches, manipulation is modeled by modifying the transition matrix, e.g. by introducing randomness, see Acemoglu and Ozdaglar (2011). Förster et al. (2016) studied manipulation in a model of opinion formation, where the weights of the transition matrices can be changed by agents, while all starting distributions are fixed. Our model differs from those in the existing literature, since we examine how the information regarding a topic is distributed among network participants through intermediaries. Loosely speaking, we analyze who knows how much and how this information state can be efficiently manipulated by engaging intermediaries. As mentioned by Acemoglu and Ozdaglar (2011), one central component of opinion formation is how agents update their prior beliefs based on new information. In this paper, we also contribute to opinion formation, because we investigate a way to manipulate the acquisition of information by employing the network of information sources. Our goal is not to learn the complete structure of the network, for which usually hidden Markov models are applied, see e.g. Yang et al. (1997). Instead, organizations should be able to select a starting distribution aiming to arrive at a certain information state after a number of iterations. Agents choose among the intermediary organizations to boost manipulation. This is at the core of our network manipulation algorithm.
Note that a similar problem has been analyzed by Lindqvist (1977), where the author applies decision-theoretic techniques to observe the state of a Markov chain at a given time and thereby obtain information about its initial state.
Let us comment on the mathematics behind the proposed network manipulation algorithm. It is motivated by Nesterov (2020), where a new technique for soft clustering is introduced. For this, voters and political parties alternately solve their subproblems, yielding an alternating minimization scheme. The behavior of voters turns out to be in accordance with the well-known multinomial logit model from discrete choice theory. Namely, the voters choose rationally among the parties, but are prone to random errors, see e.g. Anderson et al. (1992). The parties update their political positions depending on how many voters they attract. Overall, the resulting soft clustering is given in terms of probabilities from the multinomial logit model. In this paper, we generalize the idea suggested in Nesterov (2020) to a broader class of discrete choice probabilities. This is done by presenting a network manipulation model based on alternating steps performed by agents and organizations. Agents try to manipulate a network by choosing intermediary organizations to help with this task. In order to select among the organizations, agents observe which of them manipulates the network more in line with the agents’ goals. While doing so, agents are prone to random errors, which lead to choice probabilities following certain discrete choice models examined in our previous paper Müller et al. (2021a). Altogether, we show how the alternating minimization scheme introduced by Nesterov (2020) can be applied to network manipulation. Additionally, we present an inexact version of the alternating minimization scheme. Inexactness is due to the fact that the subproblems of agents and/or organizations may not be solved exactly and may suffer from numerical inaccuracies. Overall, we conclude that the agents’ imperfect behavior and the organizations’ conservatism in profit maximization reduce the accumulated errors.
Notation In this paper, we mainly focus on subspaces of \(\mathbb {R}^n\) and \(\mathbb {R}^{m \times n}\), where \(\mathbb {R}^n\) is the space of n-dimensional column vectors
and \(\mathbb {R}^{m \times n}\) denotes the linear space of \((m \times n)\)-matrices. We denote by \(e_j \in \mathbb {R}^n\) the j-th coordinate vector of \(\mathbb {R}^n\) and write e for the vector of an appropriate dimension whose components are equal to one. By \(\mathbb {R}^n_+\) we denote the set of all vectors with nonnegative components and the notation \(\varDelta _n\) is used for the standard simplex
\(\varDelta _n = \left\{ x \in \mathbb {R}^n_+ \,:\, e^T \cdot x = 1 \right\} .\)
We use the following norms for \(x \in \mathbb {R}^n\):
\(\Vert x \Vert _1 = \sum _{j=1}^{n} \left| x^{(j)} \right| , \quad \Vert x \Vert _2 = \left( \sum _{j=1}^{n} \left( x^{(j)} \right) ^2 \right) ^{1/2} .\)
For \(x, s \in \mathbb {R}^n\) we use the standard scalar product:
\(\langle x, s \rangle = \sum _{j=1}^{n} x^{(j)} s^{(j)} .\)
For matrices \(A,B \in \mathbb {R}^{m\times n}\) the inner product is defined via the trace:
\(\langle A, B \rangle = \text {tr}\left( A^T \cdot B \right) = \sum _{i=1}^{m} \sum _{j=1}^{n} A_{ij} B_{ij} .\)
A function \(F:Q \rightarrow \mathbb {R}\) is called \(\beta\)-strongly convex on a convex and closed set \(Q \subset \mathbb {R}^n\) w.r.t. a norm \(\Vert \cdot \Vert\) if for all \(x,y \in Q\) and \(\alpha \in [0,1]\) it holds:
\(F\left( \alpha x + (1-\alpha ) y \right) \le \alpha F(x) + (1-\alpha ) F(y) - \frac{\beta }{2} \alpha (1-\alpha ) \Vert x - y \Vert ^2 .\)
The positive constant \(\beta\) is called the convexity parameter of F. If \(\beta =0\), we call F convex. A function \(\pi\) is \(\beta\)-strongly concave if \(-\pi\) is \(\beta\)-strongly convex. For a convex function \(F:Q \rightarrow \mathbb {R}\) the set \(\partial F(x)\) represents its subdifferential at \(x \in Q\), i.e.
\(\partial F(x) = \left\{ g \in \mathbb {R}^n \,:\, F(y) \ge F(x) + \langle g, y - x \rangle \text { for all } y \in Q \right\} .\)
Its convex conjugate is
\(F^*(s) = \sup _{x \in Q} \; \langle s, x \rangle - F(x),\)
where \(s \in \mathbb {R}^n\) is a vector of dual variables. We denote by \(\nabla F(x)\) the gradient of a differentiable function F at x.
2 Manipulation model
Let us introduce our model in order to later construct a manipulation algorithm based on interaction within a network.
2.1 Interaction network
A central aspect in our model is a network with n nodes. The structure of this network describes how nodes interact among each other, e.g. how persons receive and exchange information. Thereby, a link from node j to i represents a connection. In the context of an information network, such a link would depict that person i acquires information from person j. We summarize the data in a transition matrix \(M = \left( M_{ij}\right) _{i,j =1}^{n}\), where \(M_{ij}\) denotes the transition probability of node j to node i. Hence, the following holds:
\(M_{ij} \ge 0 \text { for all } i, j, \quad \sum _{i=1}^{n} M_{ij} = 1 \text { for all } j = 1, \ldots , n .\)
M is a column stochastic matrix, i.e. \(M\ge 0, \; e^T\cdot M = e^T\). Our model describes the process of information acquisition rather than the formation of opinions as e.g. in Förster et al. (2016). Since we are interested in a few periods of interaction, we take the transition matrix as fixed. The interaction causes different states of the network, based on the connections of its nodes. The state of a network can be represented as an element of the standard simplex in \(\mathbb {R}^n\) depending on a time variable. We call a vector \(x(t) \in \varDelta _n\) a state of the network at time t. Such a state reflects the value each node possesses after an interaction with other nodes. This could be, for example, the amount of information a person possesses in relation to the others or the market share of a company.
The dynamics of interaction can be described by an iterative process. Starting with a vector \(x(0) \in \varDelta _n\), the nodes interact repeatedly with each other. Thus, the iterative process is given by
\(x(t) = M \cdot x(t-1) = M^t \cdot x(0), \quad t = 1, 2, \ldots .\)
Obviously, all x(t)’s generated according to this process are elements of \(\varDelta _n\). Our idea is closely related to the concept of network rankings, such as the famous PageRank from Page et al. (1999). However, we focus on a limited, typically small, number of interaction periods. Within an information network, persons would typically exchange information for a few periods before they make a decision. This short-term behavior endows the starting vector \(x(0) \in \varDelta _n\) with importance. For the sake of brevity, we drop the time index by writing \(x=x(0)\).
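As a quick numerical illustration of this short-term behavior, the following sketch iterates the interaction process for a made-up 3-node transition matrix (not one from the paper) and shows that the state stays in the simplex:

```python
import numpy as np

# Toy 3-node network; the transition matrix is made up for illustration.
# Column j holds the probabilities of moving from node j to each node i.
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
assert np.allclose(M.sum(axis=0), 1.0)  # column stochastic

def network_state(M, x0, t):
    """State x(t) after t interaction periods, starting from x(0) = x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(t):
        x = M @ x
    return x

x0 = np.array([1.0, 0.0, 0.0])      # all information initially at node 1
x3 = network_state(M, x0, 3)
print(x3.round(3), x3.sum())        # the state remains in the simplex
```

After a few periods the state still strongly reflects \(x(0)\), while iterating long enough approaches the stationary distribution of M; this is why the starting vector matters in the short run.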
2.2 Agents
Let us assume that agents want to manipulate the resulting state of a network in favor of their own interests. Though they aspire to certain network states, agents face some challenges when trying to manipulate a network. Often, they do not have knowledge of the network structure. Additionally, there are many situations where agents cannot participate in the network, because they cannot connect to a node without revealing their intentions, e.g. companies cannot credibly advertise their products by themselves. There might also be networks where an agent could interact, but is restricted to start with a fixed vector, in particular, if the information is just spread uniformly. Instead, agents could instruct organizations to manipulate the interaction in order to reach an aspired state of the network. The organizations often have more information or at least experience regarding the structure of a network. In fact, they could even operate it. The agents choose among K organizations, where each organization provides an observable utility \(u^{(k)}\). We describe this discrete choice behavior by means of so-called additive random utility models. The additive decomposition of utility goes back to psychological experiments accomplished in the 1920s by Thurstone (1927). A formal description of this framework was first introduced in an economic context by McFadden (1978), where rational decision-makers choose from a finite set of mutually exclusive alternatives \(\{1, \ldots , K\}\). Although the decision rule follows a rational behavior, agents are prone to random errors. These random errors describe decision-affecting features which cannot be observed. Each alternative \(k = 1,\ldots , K\) provides the utility
\(u^{(k)} + \epsilon ^{(k)},\)
where \(u^{(k)} \in \mathbb {R}\) is the deterministic utility part of the k-th alternative and \(\epsilon ^{(k)}\) is its stochastic error. We use the following notation for the vectors of deterministic utilities and of random utilities, respectively:
\(u = \left( u^{(1)}, \ldots , u^{(K)} \right) ^T, \quad \epsilon = \left( \epsilon ^{(1)}, \ldots , \epsilon ^{(K)} \right) ^T .\)
The probabilistic framework yields choice probabilities for each alternative:
\(p^{(k)} = \mathbb {P} \left( u^{(k)} + \epsilon ^{(k)} = \max _{m=1, \ldots , K} \; u^{(m)} + \epsilon ^{(m)} \right) , \quad k = 1, \ldots , K .\)
As the agents behave rationally, their surplus is given by the expected maximum utility of their decision:
\(E(u) = \mathbb {E} \left( \max _{k=1, \ldots , K} \; u^{(k)} + \epsilon ^{(k)} \right) .\)
It is well known that the surplus function is convex, see e.g. Anderson et al. (1992). Additionally, we make a standard assumption concerning the distribution of random errors.
Assumption 1
The random vector \(\epsilon\) follows a joint distribution with zero mean that is absolutely continuous with respect to the Lebesgue measure and fully supported on \(\mathbb {R}^K\).
We stress that the zero mean part of Assumption 1 is not restrictive and could be replaced by a finite mean assumption. By adding constants to the deterministic utilities u, it can be achieved that the random vector \(\epsilon\) has zero mean, see e. g. Train (2009).
Let \(g_{k,m}\) denote the density function of differences \(\epsilon ^{(m)} - \epsilon ^{(k)}\), \(k \not = m\) of random errors. Any point \({\bar{z}}_{k,m} \in \mathbb {R}\) which maximizes the density function \(g_{k,m}\) is called a mode of the random variable \(\epsilon ^{(m)} - \epsilon ^{(k)}\).
Assumption 2
The differences \(\epsilon ^{(m)} - \epsilon ^{(k)}\) of random errors have finite modes for all \(k \not = m\).
Assumption 1 guarantees that no ties occur in (3), which provides differentiability of the surplus function. Further, the gradient of E corresponds to the vector of choice probabilities, which is known as the Williams–Daly–Zachary theorem, see e.g. McFadden (1978), i.e.
\(\nabla E(u) = p .\)
Hence, each component of the gradient of E yields the probability that alternative k provides the maximum utility among all alternatives.
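For the special case of i.i.d. Gumbel-distributed errors, the choice probabilities reduce to the well-known multinomial logit model mentioned in the introduction, and the surplus function becomes a log-sum-exp. The sketch below (our illustration; the paper covers general error distributions) computes the logit probabilities and verifies the Williams–Daly–Zachary relation by a numerical gradient:

```python
import numpy as np

def surplus(u, mu=1.0):
    """Surplus E(u) under i.i.d. Gumbel errors with scale mu: then E is the
    log-sum-exp function (up to an additive constant, ignored here)."""
    m = u.max()
    return mu * np.log(np.exp((u - m) / mu).sum()) + m

def choice_probs(u, mu=1.0):
    """Multinomial logit probabilities, i.e. the gradient of E at u."""
    z = np.exp((u - u.max()) / mu)  # shift by max for numerical stability
    return z / z.sum()

u = np.array([1.0, 2.0, 0.5])
p = choice_probs(u)

# Williams-Daly-Zachary: p matches the numerical gradient of E
eps = 1e-6
grad = np.array([(surplus(u + eps * e) - surplus(u - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
print(p.round(4), np.abs(grad - p).max())
```

Other error distributions lead to other discrete choice models; only the logit case admits this simple closed form.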
Another equivalent representation of choice probabilities can be obtained by means of the convex conjugate of the surplus function. Note that the convex conjugate of E is given by the function \({E^*: \mathbb {R}^K \rightarrow \mathbb {R}\; \cup \; \left\{ \infty \right\} }\), defined by:
\(E^*(p) = \sup _{u \in \mathbb {R}^K} \; \langle u, p \rangle - E(u),\)
where \(p = \left( p^{(1)}, \ldots , p^{(K)}\right) ^T\) is the vector of dual variables. In view of conjugate duality, the vector of choice probabilities can be derived from an optimization problem of rational inattention, see e.g. Fosgerau et al. (2020) and Müller et al. (2021a). Indeed, it has been shown that under Assumption 2 the vector of choice probabilities p is the unique solution of
\(\max _{p \in \varDelta _K} \; \langle u, p \rangle - E^*(p) .\)
Now, we assume that there are N agents trying to manipulate the network. Each agent i has an aspired state of the network which we denote by \(v_i \in \varDelta _n\). In order to reach the aspired state, agents can choose among K organizations. The k-th organization is able to manipulate the interaction dynamics in the network, which yields at time t a state of the network \(x_k(t) \in \varDelta _n, \; k=1, \ldots , K\). In general, agents prefer organizations which provide a network state in line with the states they desire, such as an aspired distribution of market shares or state of information. In order to assess the outcome of a manipulation, any agent i has to compare K distances, i.e.
\(\left\| v_i - x_k(t) \right\| _2, \quad k = 1, \ldots , K,\)
respectively
\(\left\| v_i - M^t \cdot x_k \right\| _2, \quad k = 1, \ldots , K .\)
Note that (6) provides a way for agent i to observe the utility of choosing the k-th organization. The network state at time t is observable, so any agent is able to check whether an organization has manipulated the network satisfactorily. Let us put all the states in a matrix, which yields a way to summarize all the states of a network at time t in one variable, i.e.
\(X(t) = \left( x_1(t), \ldots , x_K(t) \right) .\)
The matrix above can also be expressed in terms of the starting vectors, by defining
\(X = \left( x_1, \ldots , x_K \right) \in \varDelta _n^K,\)
which enables us to write
\(X(t) = M^t \cdot X .\)
We define a vector valued function \(g_i: \varDelta _n^K \rightarrow \mathbb {R}_+^K\) for any agent i, which stores all these distances of the i-th agent and, hence, depends on a matrix X as input variable:
We write in matrix form:
In view of additive random utility models, \(g_i(\cdot )\) provides a way to characterize the observable utility \(u_i\) by setting
\(u_i = -g_i(X) .\)
Hence, the vector of the i-th agent’s choice probabilities has entries
Equivalently, \(p_i(X)\) solves the following rational inattention problem:
\(\min _{p \in \varDelta _K} \; \langle g_i(X), p \rangle + E^*(p) .\)
Let us stack the choice probabilities of all the agents into a matrix and call it the choice matrix:
\(P(X) = \left( p_1(X), \ldots , p_N(X) \right) .\)
Similarly to the choice matrix, we write \(P \in \varDelta _K^N\) for any matrix of probability vectors, i.e. \(P = \left( p_1, \ldots , p_N\right)\) with \(p_i \in \varDelta _K\), \(i=1, \ldots , N\).
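To make these definitions concrete, the following sketch assembles a choice matrix from starting distributions and aspired states, assuming squared Euclidean distances and a logit response. Both functional forms are illustrative choices, since the model allows for general discrete choice probabilities:

```python
import numpy as np

def choice_matrix(M, t, X, V, mu=0.5):
    """K x N choice matrix for starting distributions X (n x K) and aspired
    states V (n x N); the logit response with scale mu is an assumption."""
    states = np.linalg.matrix_power(M, t) @ X            # offered states M^t x_k
    K, N = X.shape[1], V.shape[1]
    P = np.empty((K, N))
    for i in range(N):
        g = ((V[:, [i]] - states) ** 2).sum(axis=0)      # distances of agent i
        w = np.exp(-(g - g.min()) / mu)                  # utilities u_i = -g_i
        P[:, i] = w / w.sum()
    return P

M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])           # toy column-stochastic matrix
X = np.array([[1.0, 0.0],
              [0.0, 0.5],
              [0.0, 0.5]])                # starting distributions of K = 2 orgs
V = np.array([[0.8, 0.1],
              [0.1, 0.1],
              [0.1, 0.8]])                # aspired states of N = 2 agents
P = choice_matrix(M, 2, X, V)
print(P.round(3))                         # each column sums to one
```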
2.3 Organizations
Let us describe the behavior of the advertising organizations. Their goal is to attract agents as clients by providing them with additional manipulation power. This is done by choosing an appropriate starting distribution; thus, the communication process is initialized by the organizations. By strategic decisions, such as substantial alignment, design, product placements or personal relations, or by direct decisions, such as the ranking of a website as the result of a certain query or advertising products directly on a marketplace, the organizations determine these vectors, which reflect a network state before the interaction starts.
In order to attract the i-th agent with aspired state \(v_i\), the k-th organization selects a starting distribution \(x_k \in \varDelta _n\) such that \(\left\| v_i - M^t\cdot x_k \right\| _2\) becomes small. The organization’s goal is to acquire as many agents as possible by simultaneously satisfying the corresponding aspired states. However, the agents are not necessarily equally important for the organization. Instead, the organization primarily wants to please agents who already prefer it compared to other competitors. Let us state these considerations in a formal way. An organization k observes to which extent the agents choose it, quantified by the choice probabilities \(p_i^{(k)}\), \(i=1, \ldots , N\). Thus, the k-th organization measures its performance by the following objective:
Yet, an organization’s choice of the manipulation distribution not only depends on the agents’ aspired states, but also on its own objectives. This reflects, that an organization might also aspire a certain state of the network in order to gain profits from the network participants. Therefore, we introduce a payoff function for organization k, which depends on its caused state of manipulation:
Let us illustrate by examples how a network state could affect the payoff of the k-th organization. Groups on social media platforms might avoid sharing information with persons who have contrary opinions, such that no arguments against their theories or fake news are communicated. Prohibiting or restricting persons’ access to information might be a worthwhile purpose in an information network. This is particularly interesting in situations where direct manipulation of opinions is difficult. Since the authors in Acemoglu and Ozdaglar (2011) mention the source of information as a key component of opinion formation, the manipulation of the information acquisition process contributes to the tampering of opinion formation. A social media influencer might lose the credibility of her followers if they find out about an unacceptable advertisement. We state an assumption concerning the payoff functions.
Assumption 3
The payoff function \(\pi _k\) is \(\tau _k\)-strongly concave w.r.t. the norm \(\Vert \cdot \Vert _2\) for all \(k=1, \ldots , K\).
Altogether, the objective function of the k-th organization incorporates both goals:
where \(\eta _k > 0\) is a regularization parameter, which shows the importance of payoffs generated by the network. Note that small values of \(\eta _k\) indicate a more restrictive behavior of the k-th organization, meaning that it focuses on its own interests rather than freely adjusting the manipulation distribution according to the agents’ aspired states. According to Assumption 3, the negative of the payoff function serves as a regularization term. Strongly concave regularization is a well-known and widely used technique in optimization theory, see e.g. Nesterov (2018). From an economic perspective, the payoff function mimics a stable behavior of the organization. Apart from the already mentioned payoffs generated by the network, this function could also reflect that the k-th organization avoids deviating too much from a certain targeted state \(c_k=M^t\cdot s_k\), where \(s_k \in \varDelta _n\), due to adjustment costs. As a matter of fact, organizations might know from experience which starting distributions cause network states in a neighborhood of the targeted state, but must take more effort to detect starting distributions for states outside this neighborhood and, thus, face larger adjustment costs. Based on these considerations, a typical \(\tau _k\)-strongly concave payoff function is
\(\pi _k (y) = - \frac{\tau _k}{2} \left\| y - c_k \right\| _2^2 .\)
We shall discuss the numerical practicability of this choice later on. For a given choice matrix \(P \in \varDelta _K^N\), network M and time t, the k-th organization chooses its optimal starting distribution \(x_k \in \varDelta _n\) by solving
For now, we assume that the optimization problems given in (13) have unique solutions for any choice matrix P, which we denote by \(x_k(P)\), \(k=1, \ldots , K\). We keep these optimal manipulation values in a matrix
and call it the manipulation matrix.
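As an illustration of how the k-th organization’s subproblem (13) might be solved numerically, the sketch below minimizes a plausible instance of it: the p-weighted sum of squared distances to the aspired states plus a quadratic payoff regularizer around a target \(c_k\), by projected gradient over the simplex. The concrete objective, its weighting by \(\eta _k \tau _k\), and the step size are our assumptions, not the paper’s specification:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the standard simplex (sort-based method)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(y)) + 1) > 0)[0][-1]
    return np.maximum(y + (1.0 - css[rho]) / (rho + 1), 0.0)

def organization_step(A, V, p, c, eta=1.0, tau=1.0, iters=300, lr=0.05):
    """Projected gradient for min over the simplex of
    sum_i p_i ||v_i - A x||^2 + eta * tau / 2 * ||A x - c||^2,  with A = M^t.
    This objective is a hypothetical stand-in for the paper's (13)."""
    x = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(iters):
        r = A @ x
        grad = 2.0 * A.T @ (p.sum() * r - V @ p) + eta * tau * A.T @ (r - c)
        x = project_simplex(x - lr * grad)
    return x

M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])          # toy column-stochastic matrix
A = np.linalg.matrix_power(M, 2)         # two interaction periods
V = np.array([[0.8, 0.1],
              [0.1, 0.1],
              [0.1, 0.8]])               # aspired states of two agents
p = np.array([0.7, 0.3])                 # current choice probabilities for org k
c = A @ np.full(3, 1.0 / 3.0)            # targeted state c_k = M^t s_k
x_k = organization_step(A, V, p, c)
print(x_k.round(4), x_k.sum())
```

The projection keeps every iterate a valid starting distribution; for the small step size used here, each projected-gradient step decreases the objective.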
2.4 Network manipulation algorithm
In the preceding section, we described the behavior of agents and organizations when facing the challenge to manipulate a network in favor of the agents’ desires. The key aspect is that their behavior, summarized in (8) and (13), suggests an alternating interaction between both groups. Organizations enter the market and offer their manipulation distributions. Then, agents observe how satisfactorily the organizations would manipulate the network state in view of the agents’ aspired states (e.g. by comparing past results caused by an organization). Based on these observations, agents make their decisions, i.e. they choose organizations with probabilities according to (8). The choice probabilities provide feedback to the organizations, which in turn adjust their starting distributions following the behavior given in (13). By using previous notation, we have the following dynamics:
where \(X_0\) is any feasible starting variable, e. g. \(X_0 = \frac{1}{n}\cdot ee^T\).
In what follows, we provide an equivalent description of this network manipulation algorithm in order to better study its convergence properties. For that, we define a potential function which incorporates the behavior of all agents and organizations:
Therefore, the choice matrix solves the following minimization problem:
Analogously, we have for a manipulation matrix:
which means that the network manipulation algorithm can be viewed as an alternating minimization scheme.
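To illustrate the alternating scheme end to end, the self-contained toy implementation below iterates a logit agent response and projected-gradient organization updates. All concrete functional forms (logit response, squared distances, quadratic payoff) and parameters are our assumptions, not the paper’s general model:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the standard simplex (sort-based method)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(y)) + 1) > 0)[0][-1]
    return np.maximum(y + (1.0 - css[rho]) / (rho + 1), 0.0)

def manipulate(M, t, V, C, eta=1.0, tau=1.0, mu=0.5, rounds=25):
    """Alternating scheme: agents' choice matrix update, then organizations'
    starting-distribution updates. V: n x N aspired states, C: n x K targets."""
    n, N = V.shape
    K = C.shape[1]
    A = np.linalg.matrix_power(M, t)
    X = np.full((n, K), 1.0 / n)                     # X_0: uniform starts
    for _ in range(rounds):
        S = A @ X                                     # offered states M^t x_k
        G = ((V[:, None, :] - S[:, :, None]) ** 2).sum(axis=0)   # K x N distances
        W = np.exp(-(G - G.min(axis=0)) / mu)
        P = W / W.sum(axis=0)                         # agents' logit response
        for k in range(K):                            # organizations' response
            x = X[:, k]
            for _ in range(200):                      # inner projected gradient
                r = A @ x
                g = 2.0 * A.T @ (P[k].sum() * r - V @ P[k])
                g += eta * tau * A.T @ (r - C[:, k])
                x = project_simplex(x - 0.05 * g)
            X[:, k] = x
    return X, P

M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
V = np.array([[0.8, 0.1],
              [0.1, 0.1],
              [0.1, 0.8]])                            # two agents
C = np.array([[0.4, 0.2],
              [0.3, 0.3],
              [0.3, 0.5]])                            # two organizations
X, P = manipulate(M, 2, V, C)
print(X.round(3), P.round(3), sep="\n")
```

Each round performs one pass of the two minimizations; the returned pair (X, P) approximates a fixed point of the alternating dynamics.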
From the viewpoint of computational economics, it seems reasonable to assume that agents and organizations are not able to solve their corresponding optimization problems exactly. Rather, the solutions can be obtained up to small errors. This can be, for example, due to observation errors of the input parameters given by choice and/or manipulation matrices. Another reason could be that exact optimization is too time-consuming or too costly. In order to incorporate this faulty behavior into our manipulation algorithm, we assume that just inexact minimization in (16) and (17) is possible. More precisely, \(\delta _1\)-inexact solutions for (16) and \(\delta _2\)-inexact solutions for (17) are available. We recall that, evaluated at a \(\delta\)-inexact solution, the function value is at most the minimum value plus \(\delta\), see Sect. 3 for details. Thus, we are ready to state a more general network manipulation algorithm, based on an inexact alternating minimization scheme:
The inexact algorithm raises the question of whether the corresponding alternating behavior converges to a stable equilibrium. Do agents and organizations reach a state where their choices do not change anymore, no matter what the starting distributions of the organizations look like? In other words, does a unique minimizer of the potential function exist and does the algorithm converge to this minimizer? Moreover, it is interesting to analyze how the faulty behavior in terms of the errors impacts the possible convergence. We shall answer these questions by applying general results on inexact alternating minimization schemes, which we present in Sect. 3. This is possible since the potential function (15) can be suitably decomposed. For that, we define:
Using the standard inner product, the potential function in (15) can be written as follows:
3 Inexact alternating minimization
In cases where the analytical solution of an optimization problem cannot be derived, it is necessary to solve the problem numerically. Normally, these numerical solutions are exact only up to a small \(\delta\)-error. We review some theoretical aspects of inexact optimization which we need for the convergence analysis. Let us consider optimization problems of the form
where \(\varPhi\) is a strongly convex function and Q a closed and convex set. We denote by \(z^*\) the solution of problem (20). Recall that for a \(\beta\)-strongly convex function \(\varPhi\) with minimizer \(z^*\) it holds:
\(\varPhi (z) \ge \varPhi (z^*) + \frac{\beta }{2} \left\| z - z^* \right\| ^2 \quad \text {for all } z \in Q .\)
For a \(\delta\)-inexact solution we use the standard definition, see e. g. Stonyakin et al. (2019):
Definition 1
A point \(\tilde{z}\) is a \(\delta\)-inexact solution with \(\delta \ge 0\), i.e.
if and only if there exists \(g \in \partial \varPhi (\tilde{z})\) such that \(\langle g, z^* - \tilde{z}\rangle \ge - \delta\).
Due to Definition 1, a point \(\tilde{z}\) provides the minimal objective function value of (20) up to the error \(\delta\). This can easily be seen: since \(\varPhi\) is convex, it holds
\(\varPhi (z^*) \ge \varPhi (\tilde{z}) + \langle g, z^* - \tilde{z} \rangle \ge \varPhi (\tilde{z}) - \delta ,\)
which is equivalent to
\(\varPhi (\tilde{z}) \le \min _{z \in Q} \varPhi (z) + \delta .\)
In what follows, we shall focus on decision variables z which can be separated into two blocks, i.e. \(z= (x,p)\). For those situations, alternating minimization methods can be applied. The block structure enables minimizing the objective function for each block separately, which is, in particular, a valuable property for big data applications. Over the years, many convergence results for alternating minimization methods under different assumptions have been shown. For example, Grippo and Sciandrone (1999) show that updating each component in a sequential manner yields a sequence of iterates such that each limit point is a global minimizer of a continuously differentiable and pseudoconvex function. Under the assumption of Lipschitz continuous gradients and coordinate-wise strong convexity of the objective function, Luo and Tseng (1993) prove linear convergence to a stationary point for constrained problems. Convergence of an alternating minimization scheme for objective functions with non-differentiable parts has been derived by Beck (2015). Pu et al. (2014) show that, under assumptions such as convexity for one and strong convexity for the other objective term, the inexact alternating minimization algorithm applied to the primal problem coincides with the inexact proximal gradient method applied to the dual problem. Recently, in Nesterov (2020) an alternating minimization method was used for soft clustering. There, the objective function additionally includes an interaction term linking both blocks of variables. Under certain assumptions, linear convergence was established, provided the problem can be solved exactly in each block. In this paper, we are interested in an inexact alternating minimization algorithm for objective functions equipped with the structure introduced by Nesterov (2020). Let \(Q_1, Q_2\) be closed and convex sets in finite dimensional vector spaces \(\mathbb {V}_1, \mathbb {V}_2\) and let \(\mathbb {V}\) be a finite dimensional vector space.
The objective function is given by
\(\varPhi (z) = f(x) + \langle G_1(x), G_2(p) \rangle + h(p), \quad z = (x,p) \in Q_1 \times Q_2,\)
where the operators \(G_1: \mathbb {V}_1 \rightarrow \mathbb {V}^*\) and \(G_2: \mathbb {V}_2 \rightarrow \mathbb {V}\) are Lipschitz-continuous with moduli \(L_1\) and \(L_2\) on the respective sets \(Q_1, Q_2\). Moreover, we assume that the interaction term \(\langle G_1(x),G_2(p) \rangle\) is convex and closed in \(x \in Q_1\) for any fixed \(p \in Q_2\) and vice versa, and that the functions f and h are \(\sigma _1\)- and \(\sigma _2\)-strongly convex on \(Q_1\), respectively on \(Q_2\). Further, we assume the following strict inequality to hold:
\(\sigma _1 \cdot \sigma _2 > L_1^2 \cdot L_2^2,\)
under which the function \(\varPhi\) is shown to be strongly convex on \(Q = Q_1\times Q_2\), see Nesterov (2020). Let the optimal solution of (20) be written as \(z^* = (x^*, p^*)\). In order to solve (20), an alternating minimization method has been proposed by Nesterov (2020). This method generates sequences \(\{x_\ell \}_{\ell \ge 0}\) and \(\{p_\ell \}_{\ell \ge 1}\) as follows:
\(p_{\ell +1} = \mathop {\mathrm {argmin}}\limits _{p \in Q_2} \; \langle G_1(x_\ell ), G_2(p) \rangle + h(p), \quad x_{\ell +1} = \mathop {\mathrm {argmin}}\limits _{x \in Q_1} \; f(x) + \langle G_1(x), G_2(p_{\ell +1}) \rangle , \quad \ell = 0, 1, \ldots .\)
Convergence analysis in Nesterov (2020) is based on fixed point iteration. For that, with
\(u(x) = \mathop {\mathrm {argmin}}\limits _{p \in Q_2} \; \langle G_1(x), G_2(p) \rangle + h(p), \quad v(p) = \mathop {\mathrm {argmin}}\limits _{x \in Q_1} \; f(x) + \langle G_1(x), G_2(p) \rangle ,\)
the operators \(T:Q_1 \mapsto Q_1\) and \(S:Q_2 \mapsto Q_2\) are defined as follows:
\(T(x) = v\left( u(x)\right) , \quad S(p) = u\left( v(p)\right) .\)
This enables us to write the update step of the alternating minimization scheme:
\(x_{\ell +1} = T(x_\ell ), \quad p_{\ell +1} = S(p_\ell ).\)
Under condition (24), \(T(\cdot )\) and \(S(\cdot )\) are contraction mappings. Thus, the linear convergence of the generated sequences to the minimizer \(\left( x^*,p^*\right)\) of \(\varPhi\) could be shown in Nesterov (2020):
\(\left\| x_\ell - x^* \right\| \le q^{\ell } \left\| x_0 - x^* \right\| , \quad \left\| p_\ell - p^* \right\| \le q^{\ell - 1} \left\| p_1 - p^* \right\| ,\)
where
\(q = \frac{L_1^2 \cdot L_2^2}{\sigma _1 \cdot \sigma _2} < 1 .\)
We analyze an inexact version of the alternating minimization method applied to objective functions in (23), when subproblems are solved inexactly in the sense of Definition 1. For that, let us adapt the algorithm in the following way:
\(\tilde{p}_{\ell +1} = u^{\delta ^{(\ell )}_1}\left( \tilde{x}_\ell \right) , \quad \tilde{x}_{\ell +1} = v^{\delta ^{(\ell )}_2}\left( \tilde{p}_{\ell +1}\right) , \quad \ell = 0, 1, \ldots ,\)
where \(u^{\delta ^{(\ell )}_1}(x)\) and \(v^{\delta ^{(\ell )}_2}(p)\) denote \(\delta ^{(\ell )}_1\)- and \(\delta ^{(\ell )}_2\)-inexact solutions of the corresponding subproblems.
We allow different accuracies for the above subproblems. Moreover, we allow for iteration-specific errors. The equations also suggest that in iteration \(\ell\) a \(\delta ^{(\ell )}\)-error is made twice. This can be seen by looking at the function values evaluated at two consecutive points of the sequences \(\{\tilde{x}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{p}_\ell \}_{\ell \ge 0}\) generated via the \(\delta\)-inexact solutions of the auxiliary optimization problems:
Next, we estimate the distances between \(u^{\delta ^{(\ell )}_1}(x)\) and u(x) as well as between \(v^{\delta ^{(\ell )}_2}(p)\) and v(p).
Lemma 1
For any \(x \in Q_1\) and \(\ell = 0, 1, \ldots\), it holds:
\(\left\| u^{\delta ^{(\ell )}_1}(x) - u(x) \right\| \le \sqrt{\frac{2 \delta ^{(\ell )}_1}{\sigma _2}},\)
and for any \(p \in Q_2\) and \(\ell = 0, 1, \ldots\) it holds:
\(\left\| v^{\delta ^{(\ell )}_2}(p) - v(p) \right\| \le \sqrt{\frac{2 \delta ^{(\ell )}_2}{\sigma _1}}.\)
Proof
Take an arbitrary iteration \(\ell\). We apply (21) to derive:
Due to (22) we additionally have:
Altogether, we obtain:
The proof for \(\Vert v^{\delta ^{(\ell )}_2}(p) - v(p)\Vert\) follows analogously. \(\square\)
Let us elaborate on the continuity properties of the operators \(u^{\delta ^{(\ell )}_1}(\cdot )\) and \(v^{\delta ^{(\ell )}_2}(\cdot )\).
Lemma 2
For any \(x_1, x_2 \in Q_1\) and \(\ell = 0, 1, \ldots\) it holds:
\(\left\| u^{\delta ^{(\ell )}_1}(x_1) - u^{\delta ^{(\ell )}_1}(x_2) \right\| \le \frac{L_1 L_2}{\sigma _2} \left\| x_1 - x_2 \right\| + 2 \sqrt{\frac{2 \delta ^{(\ell )}_1}{\sigma _2}},\)
and for any \(p_1, p_2 \in Q_2\) and \(\ell = 0, 1, \ldots\) it holds:
\(\left\| v^{\delta ^{(\ell )}_2}(p_1) - v^{\delta ^{(\ell )}_2}(p_2) \right\| \le \frac{L_1 L_2}{\sigma _1} \left\| p_1 - p_2 \right\| + 2 \sqrt{\frac{2 \delta ^{(\ell )}_2}{\sigma _1}}.\)
Proof
Fix an arbitrary iteration \(\ell\). Applying Lemma 1 and the triangle inequality twice yields:
The last inequality is due to Nesterov (2020), where it is shown for any \(x_1, x_2 \in Q_1\):
\(\left\| u(x_1) - u(x_2) \right\| \le \frac{L_1 L_2}{\sigma _2} \left\| x_1 - x_2 \right\| .\)
Similar reflections yield the result for \(\Vert v^{\delta ^{(\ell )}_2}(p_1) - v^{\delta ^{(\ell )}_2}(p_2)\Vert\). \(\square\)
Let us introduce inexact versions of the operators T and S:
\(T^{\delta ^{(\ell )}}(x) = v^{\delta ^{(\ell )}_2}\left( u^{\delta ^{(\ell )}_1}(x)\right) , \quad S^{\delta ^{(\ell )}}(p) = u^{\delta ^{(\ell )}_1}\left( v^{\delta ^{(\ell )}_2}(p)\right) ,\)
which we use to rewrite the update of the inexact alternating minimization as
\(\tilde{x}_{\ell +1} = T^{\delta ^{(\ell )}}\left( \tilde{x}_\ell \right) , \quad \tilde{p}_{\ell +1} = S^{\delta ^{(\ell )}}\left( \tilde{p}_\ell \right) .\)
The following result shows that the operators defined in (26) are uniformly continuous up to an error.
Proposition 1
For any \(\tilde{x}_1, \tilde{x}_2 \in Q_1\) and \(\ell = 0, 1, \ldots\) it holds:
and for any \(\tilde{p}_1, \tilde{p}_2 \in Q_2\) it holds:
Proof
We apply Lemma 2 to derive
Again, the second assertion follows similarly. \(\square\)
Since we cannot rely on the contraction property of \(T^{\delta ^{(\ell )}}\) and \(S^{\delta ^{(\ell )}}\), the convergence analysis of the sequences \(\{\tilde{x}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{p}_\ell \}_{\ell \ge 1}\) becomes involved. For that, we start with the following auxiliary result.
Lemma 3
For any \(x \in Q_1\) and \(\ell =0, 1, \ldots\) it holds:
and for any \(p \in Q_2\) and \(\ell =0, 1, \ldots\) it holds:
Proof
We show the first part. It follows by means of Lemmata 1 and 2.
Clearly, the proof of the second part is similar. \(\square\)
Now we are ready to state the main result concerning convergence of the inexact alternating minimization scheme.
Theorem 1
For the inexact alternating minimization scheme it holds:
and
Proof
We apply Lemma 3 to derive:
We are therefore able to estimate the distance of the \(\left( \ell +1\right)\)-th iterate to the minimizer:
For the proof of the inequality (30) note that the first iterate of the algorithm is not chosen freely. Instead it is the solution of the corresponding optimization problem. Hence, the first iterates of the exact and inexact version are in general not equal, i.e. \(p_1 \ne \tilde{p}_1\), which provides
It remains to recall that \(p_1=u(x_0)\) and \(\tilde{p}_1=u^{\delta ^{(0)}_1}(x_0)\) and apply Lemma 1. The result (30) follows then in the same manner as for (29). \(\square\)
According to Theorem 1, the inexact alternating minimization does not converge in general. Yet, if the sequences of errors \(\left\{ \delta ^{(\ell )}_1\right\} _{\ell \ge 0}\) and \(\left\{ \delta ^{(\ell )}_2\right\} _{\ell \ge 0}\) are not growing, the right hand side of inequalities (29) and (30) can be controlled by the model's parameters. In order to reach good convergence results, Theorem 1 suggests that at the beginning of the inexact alternating minimization algorithm the subproblems may be solved up to bigger errors, whereas at later iterations the errors shall be reduced.
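The error-accumulation mechanism behind Theorem 1 can be illustrated on a generic inexact fixed-point iteration (the contraction factor, fixed point, and error levels below are arbitrary illustrative choices, not the paper's constants):

```python
import numpy as np

# Inexact iteration x_{l+1} = T(x_l) + e_l with a lam-contraction T and
# ||e_l|| = delta_l. The distance to the fixed point then obeys
#   ||x_l - x*|| <= lam^l ||x_0 - x*|| + sum_j lam^(l-1-j) delta_j,
# which mirrors the structure of the bounds in Theorem 1.

rng = np.random.default_rng(0)
lam = 0.6                      # illustrative contraction factor (lambda < 1)
x_star = np.array([0.3, 0.7])  # fixed point of T

def T(x):
    # affine lam-contraction with fixed point x_star
    return x_star + lam * (x - x_star)

def run(deltas, x0):
    x, dists = x0.copy(), []
    for d in deltas:
        err = rng.normal(size=x.shape)
        err *= d / np.linalg.norm(err)   # error of size exactly delta_l
        x = T(x) + err
        dists.append(np.linalg.norm(x - x_star))
    return dists

x0 = np.array([1.0, 0.0])
const = run([1e-3] * 30, x0)                          # constant accuracy
decay = run([1e-3 * 0.5**l for l in range(30)], x0)   # shrinking errors

# with constant errors the distance typically stalls near delta/(1 - lam);
# shrinking the errors over the iterations drives it much further down
print(const[-1], decay[-1])
```

Consistent with the discussion above, spending high accuracy early buys little, whereas reducing the errors at later iterations pays off.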
Corollary 1
Let the same errors be made in each iteration, i.e. \(\delta ^{(\ell )}_1 = \delta _1\) and \(\delta ^{(\ell )}_2 = \delta _2\) for \(\ell =0, 1, \ldots\). Then, for the inexact alternating minimization scheme it holds:
and
Proof
This directly follows from Theorem 1 and the fact that \(\lambda < 1\). \(\square\)
According to Corollary 1, the distance to the minimizer is bounded by the second term of the right hand side of inequalities (29) and (30). By taking the limits, we obtain:
and
Obviously, convergence is guaranteed if the subproblems can be solved exactly, i.e. if \(\delta _1=\delta _2=0\). This is not surprising, as in this case the iterates generated by the inexact alternating minimization scheme (26) coincide with those generated by the exact method proposed by Nesterov (2020). Inequalities (29) and (30) show that the total error of the inexact alternating minimization scheme can be controlled. Furthermore, large convexity parameters not only improve the rate of convergence for the exact version of the algorithm, but also decrease the total accumulated error in the inexact scenario.
4 Convergence analysis
We analyze the convergence of our network manipulation algorithm by applying the general theory of inexact alternating minimization from Sect. 3. First, we estimate the convexity parameter of
w.r.t. the norm
It turns out that the strong convexity of \(E_i^*\) holds due to Assumption 2. This has been recently shown in Müller et al. (2021a).
Lemma 4
(Müller et al. (2021a)) Let the differences \(\epsilon ^{(k)}_i - \epsilon ^{(m)}_i\) of random errors have modes \({\bar{z}}^{k,m}_i \in \mathbb {R}\), \(k \not = m\). Then, the corresponding convex conjugate \(E^*_i\) is \(\beta _i\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), where the convexity parameter is given by
and \(g^{k,m}_i\) denotes the density function of \(\epsilon ^{(k)}_i - \epsilon ^{(m)}_i\).
Let us review important discrete choice models in accordance with Assumption 2, where convexity parameters can be explicitly estimated.
Remark 1
In the multinomial logit model (MNL), the error terms are IID Gumbel distributed with zero location parameter and standard deviation \(\frac{\pi \cdot \mu }{\sqrt{6}}\), where \(\mu > 0\), see e.g. Anderson et al. (1992). The choice probabilities are:
The choice probabilities show that the parameter \(\mu\) reflects the randomness of the decision. As \(\mu\) tends to zero, the choice becomes deterministic, based on the observable utility only. Very large values of the parameter, on the other hand, yield nearly random choices, tending towards the uniform distribution in the limit. The convex conjugate of the corresponding surplus function is, up to an additive constant:
It is well known from the Pinsker inequality that this function is \(\mu\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\).
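A minimal numerical sketch of these limiting regimes (the utilities below are illustrative):

```python
import numpy as np

# Multinomial logit choice probabilities for one agent, as in Remark 1:
# P(k) = exp(u_k / mu) / sum_j exp(u_j / mu). The utilities u are
# illustrative values, not taken from the paper.

def mnl_probabilities(u, mu):
    z = (u - u.max()) / mu          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

u = np.array([1.0, 0.5, 0.2, 0.1])

# small mu: nearly deterministic choice of the best alternative
print(mnl_probabilities(u, 0.01))   # ~ [1, 0, 0, 0]

# large mu: choices tend towards the uniform distribution
print(mnl_probabilities(u, 100.0))  # ~ [0.25, 0.25, 0.25, 0.25]
```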
Remark 2
Another famous example is the nested logit model (NL) introduced in McFadden (1978). Compared to the MNL, the NL is more appropriate in situations where some of the alternatives are correlated, i.e. the axiom of independence of irrelevant alternatives is violated, see e.g. Anderson et al. (1992). In the NL, each alternative k belongs to one of L different nests \(N_\ell \subset \{1, \ldots , K\}\), \(\ell = 1, \ldots , L\). The choice probabilities for \(k \in N_\ell\), \(\ell = 1, \ldots , L\), are
where the following condition shall be satisfied:
The parameter \(\mu _\ell\) determines the randomness of choices within the \(\ell\)-th nest. Further, the correlation of alternatives within the \(\ell\)-th nest is given by \(1 - \mu _\ell ^2\). The convex conjugate of the NL surplus function has been derived, up to an additive constant, in Fosgerau et al. (2020):
It is \(\displaystyle \left( \min _{\ell = 1, \ldots , L} \mu _\ell \right)\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), see Müller et al. (2021b).
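For concreteness, the NL choice probabilities can be evaluated in their standard textbook form (with the top-level scale normalized to one; the nests and utilities below are illustrative, not taken from the paper):

```python
import numpy as np

# Nested logit choice probabilities in standard textbook form. For k in
# nest N_l with nest parameter mu_l:
#   P(k) = exp(u_k/mu_l) * (sum_{j in N_l} exp(u_j/mu_l))^(mu_l - 1)
#          / sum_m (sum_{j in N_m} exp(u_j/mu_m))^(mu_m)

def nl_probabilities(u, nests, mus):
    inclusive = [np.sum(np.exp(u[N] / m)) for N, m in zip(nests, mus)]
    denom = sum(s**m for s, m in zip(inclusive, mus))
    p = np.zeros_like(u)
    for N, m, s in zip(nests, mus, inclusive):
        p[N] = np.exp(u[N] / m) * s**(m - 1) / denom
    return p

u = np.array([1.0, 0.5, 0.2, 0.1])
nests = [np.array([0, 1]), np.array([2, 3])]

p = nl_probabilities(u, nests, mus=[0.5, 0.8])
print(p, p.sum())  # probabilities sum to one

# with all nest parameters equal to one, NL reduces to MNL
p_mnl = nl_probabilities(u, nests, mus=[1.0, 1.0])
```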
Remark 3
MNL and NL belong to the broader class of generalized nested logit models (GNL) introduced in Wen and Koppelman (2001). GNL surplus functions are determined by the generating function
Here, L is a generic set of nests. The parameters \(\sigma _{i\ell } \ge 0\) denote the shares of the i-th alternative with which it is attached to the \(\ell\)-th nest. For any fixed \(i \in \{1, \ldots ,n\}\) they sum up to one:
and \(\sigma _{i\ell }=0\) means that the \(\ell\)-th nest does not contain the i-th alternative. Hence, the set of alternatives within the \(\ell\)-th nest is
The nest parameters \(\mu _\ell > 0\) describe the variance of the random errors while choosing alternatives within the \(\ell\)-th nest. Analogously, \(\mu >0\) describes the variance of the random errors while choosing among the nests, where the following conditions shall be satisfied
Apart from MNL and NL, the concrete specification of the surplus' convex conjugate \(E^*\) is not known yet; estimates of its convexity parameter are derived in Müller et al. (2021a). The choice probabilities, however, are still available in closed form.
The choice probability of the i-th alternative according to GNL amounts to
where we set \(e^{u} = \left( e^{u^{(1)}}, \ldots , e^{u^{(n)}}\right)\) for the sake of brevity.
We state Lemma 5 concerning the strong convexity of the function h defined in (18).
Lemma 5
Let the functions \(E_i^*\) be \(\beta _i\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _1\), \(i=1, \ldots , N\). Then, the function h is \(\sigma _2\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _\mathbb {H}\), where
Proof
Take any \(P,Q \in \varDelta _K^N, \alpha \in \left[ 0,1\right]\). Then the following holds
\(\square\)
Hence, the worst convexity parameter amongst all agents determines the strong convexity of the function h(P). In order to apply the results from Sect. 3, we next need to show that
is strongly convex w.r.t. the norm
For that, we need to assume that the underlying network is regular.
Assumption 4
The smallest singular value of M is positive, i. e. \(\sigma _{\min }\left( M\right) > 0\) holds.
As a consequence, we are able to estimate the convexity parameter of f w.r.t. the norm \(\Vert \cdot \Vert _{\mathbb {F}}\).
Lemma 6
The function f is \(\sigma _1\)-strongly convex w.r.t. the norm \(\Vert \cdot \Vert _{\mathbb {F}}\), where
Proof
First, we recall:
Hence, we get:
For any \(\alpha \in [0,1]\) and \(X,Z \in \varDelta _n^K\) it holds due to the \(\tau _k\)-strong convexity of \(-\pi _k\) w.r.t. the norm \(\Vert \cdot \Vert _2\):
Further, we have:
Hence, the convexity parameter of \(-\pi _k\left( M^t x_k\right)\) is \(\tau _k \cdot \left[ \sigma _{\min }\left( M\right) \right] ^{2t}\). The assertion follows analogously to the proof of Lemma 5. \(\square\)
Note that the considerations in the proof of Lemma 6 also guarantee the existence of a unique minimizer \(x^*_k(P)\) for each objective function in (13), i.e. the manipulation matrix \(X_*(P)\) is indeed well defined.
It remains to inspect the multiplicative term. We study the Lipschitz-continuity property of the operator
where
For that, the dual norm of \(\Vert \cdot \Vert _\mathbb {H}\) is required, see Nesterov (2020):
Lemma 7
The operator G is Lipschitz-continuous with modulus
where \(\sigma _{\max }\) denotes the largest singular value of M. This is to say that
Proof
The Lipschitz-continuity of G follows mainly from (Nesterov 2020). In fact, take any \(X,Y \in \varDelta ^K_n\):
It holds by means of the triangle inequality:
Therefore,
and the assertion follows. \(\square\)
Note that all components of G are convex and nonnegative. Moreover, all entries of the matrices P are nonnegative. Due to Lemmata 5 and 6, \(\varPhi (\cdot ,P)\) is strongly convex for any fixed \(P \in \varDelta _K^N\) and \(\varPhi (X,\cdot )\) is strongly convex for any fixed \(X \in \varDelta _n^K\). We therefore conclude that the alternating update steps of our network manipulation algorithm are well defined.
Let us finally present our main results on the convergence of the network manipulation algorithm. Recall that the derived constants are as follows:
and
Moreover, for the rate of convergence we have:
where \(\kappa (M)\) denotes the condition number of the matrix M. In order to establish convergence of the network manipulation algorithm, we need an additional assumption which indicates a certain stability for the model.
Assumption 5
It holds:
Assumption 5 is a version of condition (24), which enforces \(\lambda < 1\) and thereby guarantees strong convexity of the potential function (19). The straightforward application of Theorem 1 and Corollary 1, respectively, now provides:
Theorem 2
Let \(\left( X^*,P^*\right) \in \varDelta _n^K \times \varDelta _K^N\) be the unique minimizer of the potential function (19). Then, for the sequences \(\{\tilde{X}_\ell \}_{\ell \ge 0}\) and \(\{\tilde{P}_\ell \}_{\ell \ge 1}\) it holds:
and
where \(\sigma _1\), \(\sigma _2\), \(L_1\), \(L_2\), and \(\lambda\) are given in (31)–(33).
Let us comment on Assumption 5 by elaborating how the model parameters enter into the inequality (34):
- Interaction network:
-
The network structure plays a key role in (34). This is reflected by the condition number \(\kappa \left( M\right)\). Its large values cause instability of manipulation, since small changes regarding the aspired states could lead to big changes of the optimal starting distributions. In other words, a more predictable pattern of network transitions speeds up the convergence. The minimum value of the condition number is attained for permutation matrices. In this case, the network interaction is obviously predictable, i.e. organizations can easily determine how network participants distribute information. For similar reasons, the number of interaction periods t has a negative impact on the possibility of manipulation. More periods hamper the influence of the starting distribution on the resulting state. Instead, as time progresses, the state is mainly determined by the network structure, independently of the starting distributions.
- Agents:
-
Clearly, more agents N slow down the rate, as organizations have to pay attention to more aspired states. Moreover, large values of \(\beta _i\), \(i=1, \ldots , N\), improve the rate of convergence. In order to interpret this fact, we refer to Remark 1, where the \(\beta _i\)'s can be viewed as measures of how uncertain agents are about their decisions. Due to the duality of discrete choice and rational inattention, agents prone to errors have high information processing costs. Thus, these agents pay less attention to the observable utility, i.e. to whether their aspired states were reached. The fact that imperfect behavior of agents could help to stabilize economic systems faster was recently also described in Müller et al. (2021b).
- Organizations:
-
The parameters \(\eta _{k}\), \(k=1, \ldots , K\), reflect to which extent organizations take into account their network payoffs. If the \(\eta _k\)'s are relatively large, the organizations focus mainly on reaching the agents' aspired states. It may seem surprising that this does not improve the convergence rate of the network manipulation algorithm, but actually worsens it. However, if organizations do not properly act on the network by maximizing their profits, their manipulation power diminishes, since they lose their credibility – e.g., their followers may be disappointed by getting biased information and leave them. Organizations thus become worthless for agents in terms of manipulation and, as a consequence, the network manipulation algorithm becomes less efficient. Hence, the parameter mirrors a certain credibility of the organizations. Further, the impact of \(\tau _k\), \(k=1, \ldots , K\), on the convergence rate becomes clear if we interpret these parameters as measures of the organizations' reluctance to change their starting distributions. From this point of view, the conservative behavior of organizations towards profit maximization makes the network manipulation more stable.
Additionally, it is worth mentioning that the parameters \(\beta _i\), \(\eta _k\), and \(\tau _k\), which reflect the behavior of agents and organizations, also affect the error bounds in (35) and (36). The corresponding interpretation is similar to that for the convergence rate. Namely, the agents' imperfect behavior and the organizations' conservatism in profit maximization reduce the accumulated errors.
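The fading influence of the starting distributions discussed for the interaction network above can be observed numerically (the transition matrix is randomly generated, not taken from the paper's examples):

```python
import numpy as np

# Illustration of the role of the interaction periods t: for a randomly
# generated transition matrix, the reached states M^t x and M^t y of two
# very different starting distributions x and y drift together as t grows,
# so manipulation via the starting distribution becomes harder.

rng = np.random.default_rng(1)
n = 6
M = rng.random((n, n))
M /= M.sum(axis=0, keepdims=True)   # column-stochastic transition matrix

x = np.zeros(n); x[0] = 1.0         # point mass
y = np.ones(n) / n                  # uniform distribution

gaps = []
for t in [1, 3, 10]:
    Mt = np.linalg.matrix_power(M, t)
    gaps.append(np.linalg.norm(Mt @ x - Mt @ y, 1))

print(gaps)  # the gap never grows with t and generically shrinks
```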
5 Computational aspects
We discuss the implementation and benefits of the inexact alternating minimization scheme applied to the objective function (15). We recall that we have to minimize a strongly convex function \(\varPhi (X,P)\) over the convex and bounded set \(\varDelta _n^K \times \varDelta _K^N\). An alternative way to find a solution would therefore be minimization via direct methods, whose efficiency obviously depends on the properties of the objective function. By applying the alternating minimization scheme instead, it is possible to exploit the properties of the components of \(\varPhi (X,P)\) separately, as each iteration consists of solving the two subproblems
Hence, the performance of the alternating scheme crucially depends on how efficiently these subproblems can be solved. Furthermore, the inexact version provides an opportunity to accelerate the alternating scheme, since we can accept approximate solutions of the subproblems and thus, stop an optimization method at an earlier iteration. In fact, due to Theorem 2 the impact of these numerical inaccuracies on the convergence of the alternating scheme can be controlled by means of the model’s parameters.
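Schematically, each iteration of the inexact alternating scheme solves the two subproblems up to prescribed accuracies. A minimal sketch, with the paper's subproblems replaced by a toy quadratic so that the \(\delta\)-inexact solvers can be written explicitly:

```python
import numpy as np

# Toy stand-in for Phi(X, P) = ||X - P||^2 + ||X - a||^2 + ||P - b||^2;
# the exact block minimizers are (P + a)/2 and (X + b)/2, which we
# perturb by a vector of norm delta to mimic delta-inexact solutions.

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
e = np.array([1.0, -1.0]) / np.sqrt(2)   # unit perturbation direction

def argmin_X(P, delta):      # organizations' subproblem, up to delta
    return (P + a) / 2 + delta * e

def argmin_P(X, delta):      # agents' subproblem, up to delta
    return (X + b) / 2 + delta * e

X = np.zeros(2)
for _ in range(50):
    P = argmin_P(X, 1e-6)    # accuracy delta_1
    X = argmin_X(P, 1e-6)    # accuracy delta_2

# the exact minimizer of the toy Phi is X* = (2a + b)/3, P* = (a + 2b)/3
print(X, P)  # close to [2/3, 1/3] and [1/3, 2/3]
```

Despite the inexact inner solves, the iterates settle within an error of the order of the \(\delta\)'s around the exact minimizer.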
5.1 Agent’s subproblem
The complexity of solving the agent’s subproblem (37) depends on the concrete specification of the underlying discrete choice model. In order to update the choice matrix \(\tilde{P}_{\ell +1}\), the following minimization problem has to be inexactly solved:
Notice that this minimization problem is separable, which yields:
The solution of each of these problems is given by the choice probabilities of the underlying discrete choice model, cf. Fosgerau et al. (2020) and Müller et al. (2021a). The challenge of solving problem (39) lies in the concrete specification of the function \(E_i^*\). In general, the derivation of this convex conjugate of the surplus function can be very involved. In fact, for many discrete choice models the convex conjugate \(E^*_i\) is not known yet. This is e.g. the case for most of the generalized nested logit, the probit, or the mixed logit models. However, for a large class of discrete choice models we are able to inexactly solve (39), even without knowledge of the functions \(E_i^*\). Let us discuss this in more detail:
-
For a variety of discrete choice models the choice probabilities are given by an explicit formula. This is the case for the generalized nested logit models, where the formula is presented in Remark 3; the formulas for the multinomial logit and nested logit models are special cases, see Remarks 1 and 2. Thus, we are able to solve problem (39) without knowing the concrete specification of the function \(E_i^*\). In this case, the computational costs of solving the subproblem are determined by the costs of evaluating the distance matrix, which can be done in \({\mathcal {O}}(KnN)\) operations, if we assume that the costs of computing \(\Vert v-M^t\cdot x\Vert _2\) are \({\mathcal {O}}(n)\).
-
For general discrete choice models the choice probabilities can be simulated, see Train (2009). This is the case e.g. for the multinomial probit, where the random errors are normally distributed, or the mixed logit, which can approximate any random utility model, see McFadden and Train (2000). These two models are very flexible in terms of modeling substitution; however, simulating the choice probabilities at each iteration could be computationally expensive. For the multinomial probit it is possible to use analytical approximations of the integral. Connors et al. (2014) show that such approximations perform rather fast in numerical tests compared to simulation approaches such as the GHK simulator by Börsch-Supan and Hajivassiliou (1993).
We stress that for applying the inexact alternating minimization method it is sufficient to have approximate solutions of the subproblems (39). Exact solutions of (39) are desirable, but not needed.
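For the multinomial logit case from the first bullet point, the update can be sketched as follows. All data are randomly generated placeholders, and we assume, for illustration only, that agent i's utility of organization k is the negative distance \(-\Vert v_i - M^t x_k\Vert _2\); the paper's exact utility specification is not reproduced here:

```python
import numpy as np

# Sketch of the agents' update in the multinomial logit case: evaluate the
# N x K matrix of distances ||v_i - M^t x_k||_2 (cost O(K n N) given the
# propagated states M^t x_k) and plug it into the logit formula.

rng = np.random.default_rng(2)
n, K, N, t, mu = 20, 4, 2, 1, 0.1

M = rng.random((n, n)); M /= M.sum(axis=0, keepdims=True)   # transitions
X = rng.random((n, K)); X /= X.sum(axis=0, keepdims=True)   # starting distributions
V = rng.random((n, N)); V /= V.sum(axis=0, keepdims=True)   # aspired states

reached = np.linalg.matrix_power(M, t) @ X                  # states after t periods
# D[i, k] = || v_i - M^t x_k ||_2
D = np.linalg.norm(V[:, :, None] - reached[:, None, :], axis=0)

# logit choice probabilities: agents prefer organizations whose reached
# state is close to their aspired state (row-wise shift for stability)
Z = np.exp(-(D - D.min(axis=1, keepdims=True)) / mu)
P = Z / Z.sum(axis=1, keepdims=True)                        # rows sum to one
```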
5.2 Organization’s subproblem
In order to update the manipulation values, we have to inexactly solve the subproblem (38) or, equivalently,
Its decomposable structure enables us to solve, for any \(k=1, \ldots, K\):
The computational efficiency of a chosen algorithm applied to (40) depends on the properties of the payoff functions \(\pi _k\). Note that these functions are allowed to be nonsmooth as well as not necessarily simple. In the most general situation we might, hence, have to deal with a strongly convex and nonsmooth objective function and have to rely on first-order methods. Under the assumption of Lipschitz-continuity, nonsmooth convex optimization problems can be solved at a rate of \({\mathcal {O}}\left( \frac{1}{\sqrt{T}}\right)\), where T is the iteration counter. For strongly convex problems this rate can be improved to \({\mathcal {O}}\left( \frac{1}{T}\right)\), see e.g. Lan (2020). We point out that for the evaluation of a subgradient of the objective function in (40) it is necessary to compute the transpose of \(M^t\). This explains why organizations are needed to influence the network: they are more likely than the agents to possess knowledge of the network's structure in terms of M.
The total complexity of the subproblems does not only depend on the rate of convergence but also on how efficiently each iteration can be computed. Usually, there is a trade-off between achieving a better rate in terms of iterations and the numerical efficiency per iteration. An advantage of the inexact version is to counteract this trade-off, as it enables us to stop the algorithm at an earlier stage. By applying the mirror descent with the relative entropy as Bregman divergence, the projection onto the probability simplex is avoided, since the update steps are then given by a closed-form expression, see e.g. Beck and Teboulle (2003). For large networks, i.e. with large values of n, this is crucial, since the Euclidean projection onto the simplex \(\varDelta _n\) comes at a cost of \({\mathcal {O}}(n\log n)\), see Chen and Ye (2011). There is also a variant of the mirror descent for strongly convex functions, which achieves \({\mathcal {O}}\left( \frac{1}{T}\right)\), proposed by Juditsky and Nemirovski (2011). However, the evaluation of each iteration step becomes more involved.
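The two options can be contrasted in a few lines: the sort-based Euclidean projection onto \(\varDelta _n\) analyzed in Chen and Ye (2011) versus the projection-free entropic update (the input vectors are illustrative):

```python
import numpy as np

# Euclidean projection onto the probability simplex via sorting,
# O(n log n), contrasted with the closed-form entropic mirror descent
# update, which needs no projection and costs O(n).

def project_simplex(y):
    u = np.sort(y)[::-1]                 # sort in decreasing order
    css = np.cumsum(u)
    # largest index with u_i - (css_i - 1)/i > 0 (0-based)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(y - theta, 0)

def entropic_update(x, grad, alpha):
    # closed-form mirror descent step with relative entropy
    w = x * np.exp(-alpha * grad)
    return w / w.sum()

p = project_simplex(np.array([0.5, 1.2, -0.3]))
print(p)   # lies on the simplex: [0.15, 0.85, 0]

q = entropic_update(np.ones(3) / 3, np.array([1.0, 0.0, 0.0]), 0.5)
print(q)   # mass shifts away from the coordinate with larger gradient
```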
Remark 4
(Entropic mirror descent) We recall the entropic setup of the mirror descent method for minimizing a convex function \(f:\varDelta _n \rightarrow \mathbb {R}\) on the probability simplex with (sub-)gradients \(f'(x)\) at the point x. The update step at iteration \(\ell\) is then given by (Beck 2017):
where \(\alpha _\ell\) is a suitably chosen stepsize. In practice, a dynamic adaptive stepsize is often chosen, while the fixed stepsize turns out to be useful for the complexity analysis, see for example Beck (2017). In order to control the level of inexactness in the alternating minimization scheme, we can therefore rely on the complexity analysis of the mirror descent method. More precisely, let the (sub-)gradients be bounded, i.e. \(\Vert f'(x)\Vert _\infty \le M_f\) for all \(x \in \varDelta _n\), and let the fixed stepsize \(\alpha _\ell =\frac{\sqrt{2\log (n)}}{M_f\sqrt{L+1}}, \; \ell =0, \ldots , L\), be selected. Then, after L iterations the optimality gap reads as
where \(f^{\text{ best }}_L\) is the minimal function value so far (Beck 2017). This enables us to determine the maximum number of iterations L for attaining a \(\delta\)-inexact solution, i.e.
The details of applying the mirror descent method to (40) are given in Sect. 6, where \(\pi _k\) is taken as the squared Euclidean distance. \(\square\)
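Remark 4 can be sketched as follows. Since the stopping rule (42) is not reproduced here, we use the standard optimality-gap bound \(M_f\sqrt{2\log (n)/(L+1)}\) from Beck (2017), which matches the stated stepsize, and a toy objective \(f(x)=\Vert x-c\Vert _1\) as an illustrative placeholder:

```python
import numpy as np

# Entropic mirror descent with the fixed stepsize
# alpha = sqrt(2 log n) / (M_f sqrt(L+1)); the number of iterations for a
# delta-inexact solution is read off the standard gap bound
# M_f * sqrt(2 log(n) / (L+1)) <= delta.

def mirror_descent(subgrad, n, M_f, L):
    alpha = np.sqrt(2 * np.log(n)) / (M_f * np.sqrt(L + 1))
    x = np.ones(n) / n                 # start at the simplex center
    f_best, best = np.inf, x.copy()
    for _ in range(L + 1):
        g, fx = subgrad(x)
        if fx < f_best:
            f_best, best = fx, x.copy()
        w = x * np.exp(-alpha * g)     # closed-form entropic update
        x = w / w.sum()
    return best, f_best

def iterations_for(delta, n, M_f):
    # smallest L with M_f * sqrt(2 log(n) / (L+1)) <= delta
    return int(np.ceil(2 * np.log(n) * (M_f / delta) ** 2)) - 1

# toy objective f(x) = ||x - c||_1 with subgradient sign(x - c),
# minimized at x = c (which lies on the simplex), so f* = 0
c = np.array([0.7, 0.2, 0.1])
f = lambda x: (np.sign(x - c), np.abs(x - c).sum())

L = iterations_for(0.05, n=3, M_f=1.0)
x_best, f_best = mirror_descent(f, 3, 1.0, L)
print(L, f_best)   # f_best is guaranteed to be at most 0.05
```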
The first part of the objective function in (40) consists of a finite sum. If N is very large, the efficiency of first-order methods might suffer, since in any iteration N gradients must be evaluated. Therefore, it seems reasonable to apply a stochastic version of the mirror descent algorithm. However, the stochastic mirror descent for composite objectives does not linearize the second part of the objective function, see Duchi et al. (2010). Therefore, if \(\pi _k\) is not simple, the computation of an iteration step might be too expensive here.
Smoothing techniques for nonsmooth functions are commonly used in optimization for achieving the convergence of order \({\mathcal {O}}\left( \frac{1}{T}\right)\), see Nesterov (2005) and Beck (2017). There exist several approximations of the norm \(\Vert \cdot \Vert _2\) with Lipschitz smooth gradients. E.g., it is possible to replace each summand of the first term of (40) by the following approximation:
The function in (43) has \(\frac{\left[ \sigma _{\max }\left( M\right) \right] ^{2t}}{\delta _k}\)-smooth gradients, see Beck and Teboulle (2012). Note that large values of \(\delta _k\) yield a very smooth function but provide a worse approximation. Furthermore, we stress that it is possible to examine the effect of the smoothing parameter on the convergence of the inexact alternating minimization method. Incorporating these smoothed versions enables us to improve the efficiency even if the functions \(\pi _k\) remain nonsmooth. For simple \(\pi _k\)'s the problem can be solved in \({\mathcal {O}}\left( \log \frac{1}{\epsilon }\right)\) iterations by a version of the accelerated gradient method, see e.g. Lan (2020). However, at each iteration N gradients have to be evaluated, yielding a total of \({\mathcal {O}}\left( N\cdot \log \frac{1}{\epsilon }\right)\) gradient evaluations. Reducing these gradient computations can be achieved by applying the random primal–dual gradient method, which has been introduced by Lan and Zhou (2018). There, only one component of the sum is randomly selected and its gradient is used for the update step. Compared to the accelerated gradient version, this can save up to \({\mathcal {O}}(\sqrt{N})\) gradient evaluations, see Lan (2020).
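Since (43) is not reproduced here, the following sketch uses one common smooth approximation of the Euclidean norm, \(s_\delta (r)=\sqrt{\Vert r\Vert ^2+\delta ^2}-\delta\), whose gradient is \(\frac{1}{\delta }\)-Lipschitz; composed with \(r = v - M^t x\) this reproduces the smoothness constant \(\frac{[\sigma _{\max }(M)]^{2t}}{\delta _k}\) stated above. It is an assumed, illustrative choice, not necessarily the paper's (43):

```python
import numpy as np

# Smooth approximation s_delta(r) = sqrt(||r||^2 + delta^2) - delta of the
# Euclidean norm. Its error is at most delta, its gradient is bounded by 1
# in norm and (1/delta)-Lipschitz.

def smoothed_norm(r, delta):
    return np.sqrt(r @ r + delta**2) - delta

def smoothed_norm_grad(r, delta):
    return r / np.sqrt(r @ r + delta**2)

r = np.array([3.0, 4.0])   # ||r|| = 5
for delta in [1.0, 0.1, 0.01]:
    # approximation error is at most delta and shrinks with it
    print(delta, abs(smoothed_norm(r, delta) - np.linalg.norm(r)))
```

Smaller \(\delta\) gives a tighter approximation but a larger smoothness constant, mirroring the trade-off described above.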
Obviously, better properties of the payoff functions increase the possibilities to efficiently solve the subproblems (40). If the payoff functions have Lipschitz-smooth gradients, or can be smoothed, replacing the norm \(\Vert \cdot \Vert _2\) by (43) enables to apply conditional gradient methods for solving the subproblems. Such algorithms are of the order \({\mathcal {O}}\left( \frac{1}{T}\right)\) and hence, an \(\epsilon\)-solution can be found in \({\mathcal {O}}\left( \frac{1}{\epsilon }\right)\) iterations. From the numerical point of view, it is important to note that the conditional gradient method does not rely on the projection, see Jaggi (2013). This facilitates its application in our setting and lowers the per iteration cost for large networks. In fact, the alternating structure enables to deal with subproblems on the probability simplex. Minimizing a linear function on the simplex is straightforward, thus, a dominant factor of each iteration step is to compute the gradient. Recently, a conditional gradient sliding method has been introduced by Lan and Zhou (2016), where the number of calls of the first-order oracle can be reduced to \({\mathcal {O}}\left( \log \frac{1}{\epsilon }\right)\), while the number of iterations remains unchanged. We note that stochastic versions of the conditional gradient sliding are available.
The preceding discussion clearly suggests that the alternating minimization algorithm simplifies the computation of update steps for objective functions of the form (15). Moreover, the possibility to compute inexact solutions of the subproblems turns out to be crucial for an efficient implementation. Popular choice models like probit must rely on approximate computation of the choice probabilities, and being able to stop an algorithm earlier automatically saves computational effort, such as gradient evaluations.
6 Numerical examples
We examine the theoretical findings in Theorem 2 by means of numerical examples. Due to Theorem 2, we are allowed to solve the subproblems inexactly and still achieve convergence up to an error which can be controlled by the model parameters. Note that the bounds derived in Theorem 2 are based on a worst-case analysis. In practice, we might expect to see improved results. For all numerical tests, we select
where \(c_k\) is the k-th organization’s targeted network state. For each organization the subproblem consists of minimizing a nonsmooth function on the probability simplex, which is solved by the entropic mirror descent, see Remark 4.
Let us provide a simple test example of 2 agents, who choose between 4 credible organizations according to the multinomial logit model with parameters
The network is of size \(n=20\) and is block-diagonal. More precisely, the j-th of, say, 8 blocks has structure \(\left( e\cdot e^T - I\right) \cdot \frac{1}{\left( n_j-1\right) }\), where e is the vector of ones and I is the identity matrix of dimension \(n_j\) and size \(n_j \times n_j\), respectively. We choose \(n_j\), \(j=1, \ldots 13\), with
There is only one period of interaction, i.e. \(t=1\), and the credibility values of the organizations are
The aspired states of the agents \(v_1\), \(v_2\), and \(v_3\) are randomly generated. Note that in this example it holds:
and
Hence, Assumption 5 is satisfied, since
First, we produce iterates \(\left( P^*, X^*\right)\) via the exact version of the alternating minimization algorithm. This is done by solving the organizations' inner subproblems via entropic mirror descent with dynamic adaptive stepsize (Beck 2017), where the algorithm terminates if no sufficient progress is achieved. The latter is controlled by comparing consecutive iterates element-wise and stopping if this element-wise difference is less than \(10^{-10}\). Due to the fixed point scheme of the inexact alternating minimization algorithm, an element-wise comparison with precision \(10^{-9}\) is used as stopping criterion for the outer iterations. Second, we produce iterates \(\left( \tilde{P}^*, \tilde{X}^*\right)\) via the inexact version of the alternating minimization algorithm. For computing the inexact solutions we rely on the entropic mirror descent with fixed stepsize discussed in Remark 4. Therefore, a \(\delta\)-level is chosen and the algorithm stops after a number of iterations determined by Eq. (42). Figure 1 shows that the gaps between the exact solutions \(P^*, X^*\) and the inexact solutions \(\tilde{P}^*, \tilde{X}^*\) close rapidly, at least proportionally to the square root of \(\delta\). This observation confirms our theoretical convergence results in Theorem 2.
However, the inexact algorithm yields a numerical advantage. In Table 1 the computational time of the exact method is compared to the computational time of different inexact versions of the alternating minimization scheme. Clearly, the computational time can be significantly reduced by applying inexact versions.
From now on, we ignore Assumption 5 and test the performance of the inexact in relation to the exact version by running numerical simulations. In order to illustrate the numerical performance, we control the number of inner iterations by allowing 10, 50, 100, or 1000 of them. All subproblems are solved by entropic mirror descent with dynamic adaptive stepsize. For that, we focus on agents choosing probabilities according to the nested logit model, implying that their updates can be written in closed form. There are 5 organizations to choose from, where organizations 1 and 4 are in the first nest, organizations 2 and 3 are in the second nest, and organization 5 is in the third nest. The nest parameters are chosen uniformly at random between 0.01 and 0.6. The aspired states as well as the targeted network states of the organizations are also randomly initialized. We set
and the network structure is randomly generated. The results are summarized in Fig. 2. The computational time is significantly reduced, see Table 2. A higher number of agents increases the computational effort, which is summarized in Table 3, while the inexact versions yield close approximations from 100 maximum inner iterations onwards at the latest, see Fig. 3.
The computation gets noticeably more involved when more periods of interaction take place. Setting \(t=2\) significantly increases the running time in order to compute the exact version (Table 4).
The difference between exact and inexact versions are shown in Fig. 4.
Let us increase the network size to \(n=100\) and randomly initialize a sparse network. As Table 5 shows, the computational effort for an exact version dramatically increases. At the same time, the inexact versions can be solved much faster and yield reasonable approximations.
Overall, our numerical experiments suggest that the alternating minimization algorithm is rather robust when the inner subproblems are solved inexactly. This feature prevails if the number of organizations grows, more interactions take place, or the network is ill-conditioned or sparse. Thus, the presented computational results support our theoretical findings in Theorem 2 and motivate the use of the inexact alternating minimization.
Availability of data and material
Not applicable.
Code availability
Not applicable.
References
Acemoglu D, Ozdaglar A (2011) Opinion dynamics and learning in social networks. Dyn Games Appl 1(1):3–49
Anderson SP, Palma AD, Thisse LF (1992) Discrete choice theory of product differentiation. MIT press, Cambridge
Beck A (2015) On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J Optim 25(1):185–209
Beck A (2017) First-order methods in optimization. SIAM, Philadelphia, PA
Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper Res Lett 31(3):167–175
Beck A, Teboulle M (2012) Smoothing and first order methods: a unified framework. SIAM J Optim 22(2):557–580
Börsch-Supan A, Hajivassiliou VA (1993) Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. J Econ 58(3):347–368
Chen Y, Ye X (2011) Projection onto a simplex. arXiv preprint arXiv:1101.6081
Connors RD, Hess S, Daly A (2014) Analytic approximations for computing probit choice probabilities. Transp A Trans Sci 10(2):119–139
DeGroot MH (1974) Reaching a consensus. J Am Stat Assoc 69(345):118–121
Duchi J, Shalev-Shwartz S, Singer Y, Tewari A (2010) Composite objective mirror descent. In: Proceedings of the 23rd annual conference on learning theory. Omnipress. pp 14–26
Förster M, Mauleon A, Vannetelbosch VJ (2016) Trust and manipulation in social networks. Netw Sci 4(2):216–243
Fosgerau M, Melo E, De Palma A, Shum M (2020) Discrete choice and rational inattention: a general equivalence result. Int Econ Rev 61(4):1569–1589
Gagniuc PA (2017) Markov chains: from theory to implementation and experimentation. Wiley, Hoboken
Grippo L, Sciandrone M (1999) Globally convergent block-coordinate techniques for unconstrained optimization. Optim Methods Softw 10(4):587–637
Jaggi M (2013) Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: International conference on machine learning. PMLR. pp 427–435
Juditsky A, Nemirovski A (2011) First order methods for nonsmooth convex large-scale optimization, I: general purpose methods. Optim Mach Learn 30(9):121–148
Lan G (2020) First-order and stochastic optimization methods for machine learning. Springer Nature, Heidelberg
Lan G, Zhou Y (2016) Conditional gradient sliding for convex optimization. SIAM J Optim 26(2):1379–1409
Lan G, Zhou Y (2018) An optimal randomized incremental gradient method. Math Program 171(1):167–215
Lindqvist B (1977) How fast does a Markov chain forget the initial state? A decision theoretical approach. Scand J Stat 4(4):145–152
Luo ZQ, Tseng P (1993) Error bounds and convergence analysis of feasible descent methods: a general approach. Ann Oper Res 46(1):157–178
McFadden D (1978) Modeling the choice of residential location. Transp Res Rec 673:72–77
McFadden D, Train K (2000) Mixed MNL models for discrete response. J Appl Economet 15(5):447–470
Müller D, Nesterov Y, Shikhman V (2021) Discrete choice prox-functions on the simplex. Math Oper Res 47(1):485–507. https://doi.org/10.1287/moor.2021.1136
Müller D, Nesterov Y, Shikhman V (2021) Dynamic pricing under nested logit demand. J Pure Appl Funct Anal 6(6):1435–1451
Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Program 103(1):127–152
Nesterov Y (2018) Lectures on convex optimization, vol 137. Springer, New York
Nesterov Y (2020) Soft clustering by convex electoral model. Soft Comput 24(23):17609–17620
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: bringing order to the web. Tech. rep, Stanford InfoLab
Pu Y, Zeilinger MN, Jones CN (2014) Inexact fast alternating minimization algorithm for distributed model predictive control. In: 53rd IEEE conference on decision and control. pp 5915–5921
Stonyakin FS, Dvinskikh D, Dvurechensky P, Kroshnin A, Kuznetsova O, Agafonov A, Gasnikov A, Tyurin A, Uribe CA, Pasechnyuk D, Artamonov S (2019) Gradient methods for problems with inexact model of the objective. In: International conference on mathematical optimization theory and operations research. Springer. pp 97–114
Thurstone L (1927) A law of comparative judgment. Psychol Rev 34(4):273
Train KE (2009) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Wen CH, Koppelman FS (2001) The generalized nested logit model. Transp Res Part B Methodol 35(7):627–641
Yang J, Xu Y, Chen CS (1997) Human action learning via hidden Markov model. IEEE Trans Syst Man Cybern-Part A Syst Humans 27(1):34–44
Acknowledgements
The authors would like to thank the anonymous referees for their precise and constructive remarks which considerably improved the quality of the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
Not applicable.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Müller, D., Shikhman, V. Network manipulation algorithm based on inexact alternating minimization. Comput Manag Sci 19, 627–664 (2022). https://doi.org/10.1007/s10287-022-00429-9