# Constrained Network Formation


## Abstract

This study presents a novel framework for the study of endogenous network growth subject to constraints. The literature on strategic network formation analysed the specific case of positive constraints: in the present work, the model is extended to constraints which can be negative and change in time depending on the actions of the agents. A characterisation of stable networks in the static case is provided, and it is proved that finding them is computationally difficult unless specific assumptions are made. The framework can be applied to contexts in which the formation of a link inhibits or implies the formation of another one, typically due to time, space or capacity constraints. Two specific examples are investigated, highlighting the importance of modelling constraints in order to obtain credible simulations and null models: the network of corporate control and the network of citations among scientific papers.

### Keywords

Network formation, Nash equilibrium, Complexity of equilibria, Network analysis

### JEL Classification

D85, C55, C72

*“Always go to other people’s funerals; otherwise they won’t go to yours.”*

*Yogi Berra (facing a typical “constrained growth” network)*

## 1 Introduction

In the last 20 years, the theory of networks has been recognised as playing an important role in explaining the formation and functioning of social and economic settings in which *relationships* among agents are of fundamental importance. In particular, several models of network formation were developed targeting the mechanisms by which some characteristics of nodes (typically, the cost of creating/keeping alive a link, compared to the utility received from becoming, directly or indirectly, connected to some other nodes) endogenously determine the structure of a network. A stream of literature, starting from the seminal work of Bala and Goyal (2000), has focused on a *noncooperative* approach, where the choice of adding a link between two nodes is made independently by only one of them, which bears all the cost, although other nodes potentially benefit from such a link. Based on this framework, a definition of *stability* can be given, typically based on the concept of *pairwise Nash equilibrium* (such as in Galeotti et al. 2006 and Haller et al. 2007), or some refinement of it (for instance Dutta and Mutuswami 1997 consider *coalition* choices, while the concept of “*far-sightedly stable networks*” formulated by Herings et al. 2009 is based on attributing nodes a longer horizon of strategic reasoning).

The aforementioned studies share the implicit assumption that links can be added and destroyed freely (though at some cost). Even experimental works on endogenous network formation have usually been based on the assumption that participants can *at any point in time*—or at least *repeatedly*—decide to create/break a link (Goeree et al. 2009; Kirchsteiger et al. 2016). This is a natural starting point for several reasons: links in many real world networks (e.g. computer networks, social relationships...) are indeed at least potentially volatile, the data available to researchers often describe some inherently volatile flow (e.g. trade, influence, information) over them, and even considering networks which are typically characterised by a *stratification* of links over time (such as connections in Internet social networks, or the network of roads between cities), most databases available to researchers are *snapshots* of networks at given points in time, sacrificing information on their temporal evolution. However, there are several contexts in which the process of network formation is profoundly shaped by constraints, and in which the assumption that links can be freely created is at odds both with reality and with data available to researchers. Constraints may have different origins: they can for instance be related to time (e.g. networks where nodes are scientific papers, patents or other kinds of timestamped objects), space (e.g. planarity), and rivalry (e.g. cross-ownership networks, nodes affected by capacity limits): two specific examples will be analysed in more detail in Sect. 3. Only recently some form of constrained growth was formalised in the context of strategic network formation by Haller (2012). His study provides interesting conclusions concerning networks which grow around an exogenously fixed subset of links, shown to potentially change drastically the existence, numerosity, stability and efficiency of stable configurations. 
An interesting insight is that such *backbone infrastructures*, that is, sets of links which are guaranteed to exist *ex ante* and independently from individual incentives, and which hence forbid nodes from playing their individual best replies, can actually cause global *welfare improvements*. The present study generalises the approach to the analysis of *repeated* addition of nodes and/or links, under positive *and negative* constraints. Differently from the work of Haller, the set of guaranteed/forbidden links will not necessarily be exogenously given, but can come instead from the previous iteration of the network formation process. This results in a rich framework, which can be specialised according to the characteristics of the network under analysis.

## 2 The Model

As in the model by Galeotti et al. (2006), a network is composed of a set \(N = \{1, \dots , n\}\) of nodes: for each pair of nodes (*i*, *j*) a cost parameter \(c_{ij} > 0\) and a value parameter \(v_{ij} > 0\) are given. A *directed* network *g* is formally a collection of pairs of nodes: if a pair (*i*, *j*) is in *g*, we say that *i* *sponsors* a link to *j*, and we write \(g_{ij}=1\). \(\bar{g}\) represents the corresponding *undirected* network, i.e. the smallest network containing *g* and also (*j*, *i*) for each (*i*, *j*) contained in *g*. The set of feasible networks is denoted by \(\mathcal G\).

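Denoting by \(N_i(g)\) the set of nodes *j* such that the network *g* contains a path from *i* to *j* *or vice-versa*, the benefit extracted by *i* is defined as:

$$\begin{aligned} \sum _{j \in N_i(g),\, j \ne i} v_{ij}. \end{aligned}$$

Node *i* also pays a cost which is the sum of costs of sponsored (outgoing) links:

$$\begin{aligned} \sum _{j \,:\, g_{ij}=1} c_{ij}. \end{aligned}$$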

In what follows, *e* denotes the empty network. A set of connected nodes \(S \subset N\) is said to be a *component* if they are not connected to any node outside *S* (notice that \(N_i(g)\) simply denotes the component which contains a given node *i*); a link is said to be a *bridge* if it connects two otherwise disconnected components (i.e. if the number of components in the network increases by 1 when removing it). Moreover, the notation \(g_i\) denotes the links sponsored by *i* in the network *g* (in the present work, it is always assumed for simplicity that \(g_{ii}=0\)). An *action* for a node *i* is a subset of *N* (so again, an element of \(\{0,1\}^n\)), determining the links that *i* is sponsoring.

### 2.1 Internal Constraints

The present work also considers *negative* constraints: a model of network formation will be characterised not only by the network of guaranteed links \({\mathfrak {g}}\), which will be denoted henceforth as \({\mathfrak {g}}^+\), but also by another network \({\mathfrak {g}}^-\) (disjoint from \({\mathfrak {g}}^+\)), containing links which will be *absent* in any possible network. Although it is possible to introduce this generalisation by setting the cost of links in \({\mathfrak {g}}^-\) high enough, a more tractable approach is to neglect their benefits in the payoff function,^{1} which is hence defined as […].^{2}

It can be easily verified that when \({\mathfrak {g}}^- = e\), this coincides with the payoff function defined by Haller (2012). With all the components of the model exposed, we can proceed to the generalisation of some of his results concerning Nash networks, that is, networks which are stable with respect to individual deviations. As a starting point, let us consider the following result.

### Proposition 1

(Haller 2012) Consider a strategic model of network formation with payoff functions \(\Pi _i({\mathfrak {g}}^+, e, g)\), \(g \in \mathcal G\), \(i \in N\). Suppose that costs are owner-homogeneous. Then there exists a Nash network \(g^*\).

What follows is a simplified proof of this result. Like the original proof by Haller (2012), it relies on the observation that the proof of existence by Haller et al. (2007) does not exploit the homogeneity of costs, only the owner-homogeneity.

### Proof

We start by replacing each link \((i,j) \in {\mathfrak {g}}^+\) with an “ancillary” node *h* “serving” *i* and *j*, which has cost \(c_{h} = 1\), \(v_{hk}=2\) for \(k=i,j\), and \(v_{hk} = 0\) otherwise. This model has owner-homogeneous costs, so it has a Nash network \({g^*}'\) (Haller et al. 2007). In \({g^*}'\), each ancillary node must be connected (possibly indirectly) to both the nodes it serves. If it is connected indirectly, we sever the first link of the path (or any other link, if the first step is a direct connection to the other node it serves) and replace it with a direct link. Similarly, we sever any other link to any ancillary node and replace it with a link to any of the two nodes it serves. The result is still a Nash network (all utilities of “original” nodes weakly increase and their space of strategies is unchanged, while ancillary nodes have clearly no incentive to deviate), and if we restrict it to the *N* original nodes we obtain the desired \(g^*\). \(\square \)

The following result is analogous to the previous except that it now allows for \({\mathfrak {g}}^- \not = e\).

### Proposition 2

Consider a strategic model of network formation with payoff functions \(\Pi _i({\mathfrak {g}}^+, {\mathfrak {g}}^-, g)\), \(g \in \mathcal G\), \(i \in N\). Suppose that costs are owner-homogeneous. Then there exists a Nash network \(g^*\).

### Proof of Proposition 2

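The proof proceeds by induction on the links contained in \({\mathfrak {g}}^-\). Let \(g^*\) be a Nash network of the model, and suppose that a link (*i*, *j*), not in \({\mathfrak {g}}^+\), is added to \({\mathfrak {g}}^-\). If \((i,j) \not \in g^*\), then \(g^*\) is still a Nash network (*i*’s strategy set having been restricted, and those of all other nodes staying unchanged, the equilibrium is still such), and hence this step is trivial. So let us assume that \((i,j) \in g^*\). The link (*i*, *j*) is contained in \(g^* \ominus {\mathfrak {g}}^+\) (since it is by assumption not in \({\mathfrak {g}}^+\)), so it must have been convenient for *i*, i.e. it must be a bridge. Two cases are possible.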

- (a) There exists another link (*h*, *k*)^{3} from \(N_i(g^* \ominus (i,j))\) to \(N_j(g^* \ominus (i,j))\) (see Fig. 1) or vice-versa, which is not forbidden \(((h,k) {\not \in {\mathfrak {g}}^-})\) and is part of the best response of *h* to \(g^* \ominus (i,j)\), i.e.

  $$\begin{aligned} c_{h,k} < \sum _{k' \in N_k(g^* \ominus (i,j))} v_{h,k'}. \end{aligned} \tag{1}$$

- (b) There is no such pair (*h*, *k*).

In case (a), consider the network \(g^* \ominus (i,j) \oplus (h,k)\): for any node \(l \not \in \{i,h\}\), the action space is unchanged from \(g^*\), as well as the payoffs. For *i*, all *available* strategies now deliver a payoff increased by \(c_{i,j}\) (the cost of connecting the two components now being borne by *h*), so their preference ordering does not change. Finally, since costs are owner-homogeneous, *h* does not have an incentive to deviate by replacing the link (*h*, *k*) with a different one. In case (b), consider instead the network \(g^* \ominus (i,j)\). For any node outside \(N_i(g^*)\), the preference ordering of strategies is unchanged. The same holds for nodes in \(N_i(g^*)\), except for strategies which would connect the two components; but such strategies are, by assumption (Eq. 1 is not satisfied), dominated. So in both cases we have a new Nash equilibrium. Since the case \(\Pi ({\mathfrak {g}}^+, e,\cdot )\) is proved by Proposition 1, the result is proved by induction for any possible \({\mathfrak {g}}^-\). \(\square \)

Proposition 2 is the natural generalisation of Proposition 1 to the presence of negative restrictions.^{4} Analogously, the following result, related to networks of positive constraints which are (in the unconstrained model of network formation) Pareto optimal, generalises Proposition 2 by Haller (2012).

### Proposition 3

Consider a strategic model of network formation with payoff functions \(\Pi _i({\mathfrak {g}}^+,{\mathfrak {g}}^-, g), g \in \mathcal G, i \in N\). Suppose that the pre-existing network or infrastructure \({\mathfrak {g}}^+ \in \mathcal G\) is Pareto optimal. Then the empty network is a strict Nash network and the only Nash network.

### Proof

Let \({\mathfrak {g}}^+\) be Pareto optimal. The case \({\mathfrak {g}}^-=e\) is Proposition 2 by Haller (2012). When considering \({\mathfrak {g}}^-\ne e\), the action set of some nodes is restricted, but the links in \({\mathfrak {g}}^+\) are left untouched (recall that \({\mathfrak {g}}^+\) and \({\mathfrak {g}}^-\) are disjoint). Hence, the empty network is still a strict Nash network, because the preference ordering on available strategies does not change.

Suppose next that some \(g^*\ne e\) is a Nash network. The proof develops as in the original result: given some pair (*i*, *j*) with \(1 = {g^*}_{ij} \ne {{\mathfrak {g}}^+}_{ij}= 0\), it must be that \(g^*_i\) is a best response against \(g_{-i}^*\). But then \(g^*\) is strictly preferred to \({\mathfrak {g}}^+\) by at least *i*, while it is at least equally preferred by all other agents (since it contains all links in \({\mathfrak {g}}^+\)). This contradicts the Pareto optimality of \({\mathfrak {g}}^+\). \(\square \)

1. considering negative constraints is important in order to understand the growth of some real world networks,
2. from a social planner perspective, imposing negative constraints could in principle improve the beneficial effects of an endogenously formed network, possibly at a lower cost than through positive constraints.

As shown by Haller (2012), positive constraints can have a *stabilising* effect (in some cases in which Nash equilibria do not exist, they can instead be obtained by choosing an appropriate \({\mathfrak {g}}^+\)), a *welfare improvement* effect (constraints can raise the overall sum of payoffs in Nash equilibrium), and others. Those exogenous constraints can hence be imagined as infrastructures publicly provided by the social planner. Can some of the described effects be attained as well through *negative* constraints, i.e. with a social planner acting through *prohibition* of a set of given links? The question is relevant because in principle it can be much easier for the policy maker to forbid some given links than to provide others, or to oblige the interested nodes to build them (the problem of contribution to links as public goods is analysed for instance by Anshelevich et al. 2003).

It is worth starting with an example. The network in Fig. 2 does not admit a Nash equilibrium, since \(\{(2,1)\}\) is the best response of 2 to any network which includes (3, 1) (allowing 2 to connect to 3), but (3, 1) is *in* the best response of 3 to and only to networks which do *not* include (2, 1) (3 prefers to connect to 1 through 2 rather than directly). However, the network \(\{(2,1), (3,1)\}\) can be made stable both with only positive constraints (\({\mathfrak {g}}^+ = \{(3,1)\}\)) and with only negative constraints (\({\mathfrak {g}}^- = \{(3,2)\}\)).

In general, any network *g* can be made trivially stable by setting \({\mathfrak {g}}^+ = g\) and \({\mathfrak {g}}^-\) to the complement of *g*. At the same time, there are obvious configurations which cannot be made stable by using only positive or only negative constraints: consider the case of \(n=2\), with \(c_{12} = 2\) and \(c_{21} = 4\). If \(v_{12} = v_{21} = 1\), the “connected” configuration can only be obtained in the presence of positive constraints, while if \(v_{12} = v_{21} = 3\), the “disconnected” configuration can only be obtained in the presence of negative constraints.
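The two-node example can be checked by brute force. The following is a minimal sketch, not the formal model: it assumes that a node's payoff is the sum of the values of the nodes in its (undirected) component minus the cost of the links it sponsors, that links in \({\mathfrak {g}}^+\) are provided for free, and that links in \({\mathfrak {g}}^-\) are simply removed from the action sets. All function names (`nash_networks`, `payoff`, `connected_value`, `powerset`) are illustrative.

```python
from itertools import combinations, product


def powerset(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    return [frozenset(s) for r in range(len(items) + 1)
            for s in combinations(items, r)]


def connected_value(i, links, v, nodes):
    """Sum of v[(i, j)] over nodes j connected to i, ignoring link direction."""
    neighbours = {k: set() for k in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {i}, [i]
    while stack:
        for nb in neighbours[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return sum(v[(i, j)] for j in seen if j != i)


def payoff(i, profile, g_plus, c, v, nodes):
    """ASSUMED payoff: value of i's component minus cost of links i sponsors."""
    links = set(g_plus).union(*profile.values())
    return connected_value(i, links, v, nodes) - sum(c[link] for link in profile[i])


def nash_networks(nodes, c, v, g_plus=frozenset(), g_minus=frozenset()):
    """Brute-force search of (weak) Nash networks under the given constraints."""
    # Each node may sponsor any link that is neither guaranteed nor forbidden.
    actions = {
        i: powerset((i, j) for j in nodes
                    if j != i and (i, j) not in g_plus and (i, j) not in g_minus)
        for i in nodes
    }
    found = []
    for choice in product(*(actions[i] for i in nodes)):
        profile = dict(zip(nodes, choice))
        if all(payoff(i, profile, g_plus, c, v, nodes)
               >= max(payoff(i, {**profile, i: alt}, g_plus, c, v, nodes)
                      for alt in actions[i])
               for i in nodes):
            # Report the realised network: sponsored links plus guaranteed ones.
            found.append(frozenset(g_plus).union(*choice))
    return found


nodes = [1, 2]
c = {(1, 2): 2, (2, 1): 4}
v_low = {(1, 2): 1, (2, 1): 1}
v_high = {(1, 2): 3, (2, 1): 3}

print(nash_networks(nodes, c, v_low))                     # only the empty network
print(nash_networks(nodes, c, v_low, g_plus={(1, 2)}))    # connected via g_plus
print(nash_networks(nodes, c, v_high))                    # only {(1, 2)}
print(nash_networks(nodes, c, v_high, g_minus={(1, 2)}))  # only the empty network
```

Under these assumptions, with low values the connected configuration appears only once \((1,2)\) is guaranteed, and with high values the disconnected configuration appears only once \((1,2)\) is forbidden. The enumeration is exponential in the number of possible links, so it is only meant for toy instances.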

Hence, both positive and negative constraints have the sometimes exclusive ability of transforming given network configurations into Nash equilibria. This symmetry, however, breaks when we look at the welfare of the obtained equilibria, as suggested already in the last example proposed, and as formalised by the following result.^{5}

### Proposition 4

Consider a network *g* which is not a Nash equilibrium.

- (a) If *g* becomes an equilibrium with some \({\mathfrak {g}}^- \ne e\), \({\mathfrak {g}}^+ = e\), then it is not Pareto optimal.
- (b) If *g* is Pareto optimal, then it can be made an equilibrium with some \({\mathfrak {g}}^+ \ne e\), \({\mathfrak {g}}^- = e\).

### Proof

It is easy to see that the creation of a new link (*i*, *j*), possibly replacing another link (*i*, *k*) to the same component of the network, is always (strictly) Pareto improving when it is part of the (strict) best reply of *i*. This is because it must be convenient for *i*, and makes two components connected (or keeps them unchanged, in the replacement case). Now consider case (a): since *g* becomes unstable once negative constraints are removed, there must be some \((i,j) \in {\mathfrak {g}}^-\) which would be part of the (strict) best reply of *i* to *g*, possibly replacing some (*i*, *k*). So \(g \oplus (i,j)\) (or \(g \oplus (i,j) \ominus (i,k)\), in the replacement case) necessarily Pareto dominates *g*, which is hence not Pareto optimal. For case (b), notice that if *g* is Pareto optimal, the argument above states that there cannot be a link \((i,j) \not \in g\) which is part of a profitable deviation for *i*, so to stabilise *g* it is sufficient to set \({\mathfrak {g}}^+ = g\). \(\square \)

Figure 3 summarises the social planner perspective on positive and negative constraints: the latter can sometimes substitute the former (and presumably be easier to implement) when the goal is to avoid implicit costs related to instability, but do not help in reaching Pareto optimality in the sense of the mere maximization of private values.

### 2.2 Complexity of Finding Nash Equilibria

Both the proof of Proposition 2 and the proof by Haller et al. (2007) it reduces to are constructive proofs, i.e. as long as the required conditions are satisfied, they provide a recipe for finding a Nash network. Such a recipe is relatively simple to implement.^{6}

Outside of such assumptions, however (i.e. with non-owner-homogeneous costs), not only might a Nash equilibrium fail to exist, but determining whether one exists, and finding it, can be a computationally hard task. What follows is a more precise characterisation of the computational complexity of this problem.

First, determining whether a given network configuration is a Nash equilibrium is relatively simple (with or without constraints): it only requires checking the best response of each of the *n* nodes, and each of these checks requires *O*(*n*) operations; so the whole verification requires \(O(n^2)\) operations, which places the problem in NP, the class of problems whose solutions can be validated in polynomial time.

It remains to show that the problem is NP-hard, i.e. *at least as difficult* as any other problem in NP. To do this, it is sufficient to reduce another NP-hard problem to it, and a suitable problem in this case is 3-SAT (3 satisfiability).^{7} Consider a set of *H* Boolean variables \(\{x_1, \dots , x_H\}\), and a set of *K* clauses, each containing three possibly negated instances of such variables, joined by disjunctive operators (an example of clause is \(x_1 \vee x_2 \vee \overline{x}_3\), where \(\overline{x}_3\) denotes the negation of \(x_3\)). The 3-SAT problem consists in determining whether there is an assignment of Boolean values to the variables which makes every clause evaluate to true.
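To make the definition of 3-SAT concrete, here is a minimal brute-force sketch. The encoding is an assumption made for illustration: a clause is a tuple of three nonzero integers, where `h` stands for \(x_h\) and `-h` for \(\overline{x}_h\); the names `satisfies` and `solve_3sat` are hypothetical. Checking one assignment takes time linear in the number of clauses, while the naive search enumerates all \(2^H\) assignments.

```python
from itertools import product


def satisfies(assignment, clauses):
    """True if every clause contains at least one literal that evaluates to true.

    assignment: dict mapping variable index h (1-based) to a bool.
    clauses: iterable of 3-tuples of nonzero ints; -h denotes the negation of x_h.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def solve_3sat(num_vars, clauses):
    """Return a satisfying assignment, or None if the instance is unsatisfiable."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if satisfies(assignment, clauses):
            return assignment
    return None


# The clause x1 OR x2 OR (NOT x3) from the text.
print(solve_3sat(3, [(1, 2, -3)]))

# All eight sign patterns over three variables: no assignment satisfies them all.
unsat = [(a * 1, b * 2, c * 3) for a, b, c in product([1, -1], repeat=3)]
print(solve_3sat(3, unsat))
```

The first call returns an assignment with all three variables set to false (which satisfies the clause through \(\overline{x}_3\)), and the second returns `None`.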

The reduction of 3-SAT to the search of Nash equilibria is performed by constructing a network composed of *H* “variable” components and *K* “clause” components, as represented in Fig. 4. Notice that the components are connected among them at the extrema of the links \(\ell _{hT}\) and \(\ell _{hF}\) they share. In order for a “variable node” \(v_h\) to become connected to its objective \(o_h\), it needs to sponsor \(\ell _{hT}\) or \(\ell _{hF}\) (and is *ex ante* indifferent between them). Now, the “clause” component is unstable whenever one of the peripheral links is built (recall the example in Fig. 2). It becomes instead a Nash equilibrium if at least one of the internal paths is built, i.e. if at least one of the variable nodes chooses the appropriate path/truth value. Hence, the collection of all clause components (and hence the network) is a Nash equilibrium if and only if an appropriate assignment of truth values is implemented.

It can be observed that the stability concept being employed is *weak*: each variable node \(v_h\) can deviate and build the other path to its objective \(o_h\) without incurring any loss. However it is trivial to make it strict by amending the definition of \(v_{ij}\) with the rule that \(v_{ij} = 0.5\) if \((i,j) = (v_h, l_k)\) for some *h*, *k*, and by raising the cost of \(\ell _{hF}\) to 1.1: with these changes, variable nodes have a strict incentive not to deviate from configurations which satisfy all clauses, and if this happens regardless of the value of some variable *h*, then the variable node \(v_h\) has a slight preference for building \(\ell _{hT}\) over \(\ell _{hF}\).

Since the problem of finding Nash equilibria has been shown to be both in NP and NP-hard, it is NP-complete (the class defined as the intersection of the two). Notice that membership in NP was proven in the more general case of arbitrary constraints, while NP-hardness was proven in the more restrictive case of *not* resorting to constraints. So the whole proof of NP-completeness applies both to the original model by Bala and Goyal (2000), and to the generalised model adopted in the present work.

### 2.3 Repeated Internal Constraints and Non-Decreasing Network Models

A basic ingredient of virtually any real world process of network formation is *time*: as will be exemplified in Sect. 3.2, it can be a crucial ingredient in the study of some real world networks. A study of the consequences of repeated internal constraints, going beyond the analysis of static Nash equilibria relative to exogenous constraints, is hence a natural development of the theory exposed so far. In what follows, I will assume that the formation of the network happens in a discrete time setting. For each \(t=1,2\dots \), I will define as \({\mathfrak {g}}^{t^+}\) and \({\mathfrak {g}}^{t^-}\) respectively the positive and negative constraints at that time period. At each time, the best reply of each node is the one maximizing \(\Pi _i({\mathfrak {g}}^{t^+}; {\mathfrak {g}}^{t^-}; \cdot )\).^{8} The outcome, if any, of the step *t*, denoted as \(g^t\), will hence be a Nash equilibrium for these payoff functions. Clearly, such an outcome need not be unique, nor does it necessarily exist: if it does not, the network formation process *terminates* at time *t*.

The introduction of endogenously determined, time dependent constraints is a powerful conceptual tool, but it increases considerably the number of degrees of freedom, so the model is of limited utility unless one restricts to specific classes of rules which have a particular economic meaning. The result which follows considers the class of *non-decreasing network models*, defined as those for which \({\mathfrak {g}}^{t^+} = g^{t-1}\) (the positive restriction coincides with the outcome of the previous step of the process): such a class naturally maps to several real world contexts, including the case of bibliometric networks analysed in Sect. 3.2. A peculiarity of non-decreasing network models is that, since the number of links present at time *t* is (weakly) increasing in *t* itself, and since it can never exceed \(n^2-n\), the process must, for some *t*, terminate *or* stabilise in some configuration, which I will call a *limit network*. A limit network will then be defined as *strict* if there is no other limit network composed of a subset *or* a superset of its links.
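The counting argument behind the existence of a limit network can be spelled out as follows. This is a sketch which assumes that the outcome \(g^t\) of each step always contains the positive constraints \({\mathfrak {g}}^{t^+}\), i.e. that guaranteed links are counted as part of the network:

$$\begin{aligned} g^{t-1} = {\mathfrak {g}}^{t^+} \subseteq g^{t} \quad \Longrightarrow \quad |g^{t-1}| \le |g^{t}| \le n(n-1) = n^2-n. \end{aligned}$$

The number of links can therefore strictly increase in at most \(n^2-n\) steps; in any other step in which the process does not terminate, \(|g^{t}| = |g^{t-1}|\) together with \(g^{t-1} \subseteq g^{t}\) gives \(g^{t} = g^{t-1}\).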

### Proposition 5

If \({\mathfrak {g}}^{t^-}\) is constant, then the set of (strict) limits of the non-decreasing network model corresponds to the set of (strict) Nash equilibria of the static model associated to the payoff functions \(\Pi _i({\mathfrak {g}}^{1^+}; {\mathfrak {g}}^{1^-}; \cdot )\).

### Proof

Consider a (strict) Nash equilibrium \(g^*\) of the model associated to payoff functions \(\Pi _i({\mathfrak {g}}^{1^+}; {\mathfrak {g}}^{1^-}; \cdot )\). By definition, it is also a (strict) Nash equilibrium for the first step of the non-decreasing network model. In order to prove that it is a limit network, it is hence sufficient to show that it is still a (strict) Nash equilibrium for \(\Pi _i(g^*; {\mathfrak {g}}^{1^-}; \cdot )\). Assume it is not: this means there is some *i* which (weakly) prefers some \(g_i'\supset g_i^*\). But then, \(g^*\) was not a (strict) Nash network in the first place. The same applies hence for \(t=2,3\dots \)

Conversely, consider a (strict) limit network \(g^*\), reached at some time \(t^*\). For any *t* and any link (*i*, *j*) in \(g^t \ominus {\mathfrak {g}}^{t^+}\), let \(\Delta ^t_{i,j}\) be the profit which the link (*i*, *j*) yields to *i* in \(g^t\) […]. Any \(\Delta ^{t'}_{i,j}\) with \(t' > t\) will also be positive: all new links are bridges, and so the connected component of *j* can only grow, while no paths from *i* to *j* alternative to (*i*, *j*) can appear. So no node has a (weakly) positive individual incentive to simply break one or more existing links in \(g^t\). If \(g^*\) is not a (strict) Nash equilibrium of \(\Pi _i({\mathfrak {g}}^{1^+}; {\mathfrak {g}}^{1^-}; \cdot )\), then necessarily some node has a (weakly) positive individual incentive to *add* some link, or to *replace* some link with some other. The first case is impossible: since \({\mathfrak {g}}^{t^-}\) is constant, this would make \(g^*\) unstable also at time \(t^*\). But the second is also impossible: since \(\Pi _i({\mathfrak {g}}^{1^+}; {\mathfrak {g}}^{1^-}; \cdot )\) is positive, the new link should still connect *i* to \(N_j(g^* \ominus (i,j))\). So to be incentive compatible, it should cost less than (*i*, *j*). But then, it would have been chosen at time *t* in its place.

1. considering *partially* non-decreasing networks, i.e. networks in which some previously provided links *can* be destroyed, or
2. introducing time-dependent *negative* constraints: for instance, real world networks with a population of nodes which increases over time can conveniently be modelled through appropriate negative constraints which decrease over time.^{9}

## 3 Applications

Several examples of real world networks whose growth is significantly affected by positive or negative constraints have been mentioned in Sect. 1. The present section examines two of them in more depth, in order to highlight the importance of taking such constraints into account when modelling them. The two case studies also show that modelling constraints inside a model of strategic network growth can result in interesting economic insights even when the distributions of costs \(c_{ij}\) and values \(v_{ij}\) are only partially known, or entirely unknown.

### 3.1 The Network of Corporate Control

^{10}Being part of a single group, which acts in a strategically coherent way, can clearly present benefits to member institutions, e.g. in terms of vertical integration. This relation of control among institutions is then subject to both natural and policy constraints.

- The main natural constraint consists in the fact that control is clearly exclusive, i.e. if firm *A* controls firm *B*, firm *C* cannot control firm *B*.
- A typical example of policy constraints is represented by antitrust policies, e.g. the European Commission forbidding the acquisition of a firm *D* by some holding *E* which already owns a competitor *F*.

Both types of restriction depend on the current configuration of the network: in the first case, firm *C* might get control of firm *B* if firm *A* decided to sell enough shares; in the second case, firm *E* might get control of firm *D* if it first sold its shares of *F*. Equally important is that the formalisation of both types of restriction must take into account indirect ownership. For instance, in the example of antitrust constraints, if *E* is forbidden from acquiring a majority share of *F*, it should as well be forbidden from acquiring a majority share in each of two other entities *G* and *H* each owning 30% of the shares of *F*. In general, defining restrictions which implement even relatively simple principles (e.g. an upper bound to market share controlled by a single entity) can result in complex rules.

While the restrictions described above are irrelevant for a mere characterisation of the network under study (i.e. quantifying the power of a core of firms), they become important in order to measure specific features relative to appropriate null models, resulting from simulations with endogenous incentives. For instance, an apparently low level of clustering (the tendency of nodes to link to neighbours of their neighbours) could be a consequence of the simple fact that ownership is exclusive, while a low assortativity (the tendency of nodes to link to similar nodes) might be due to local antitrust authorities limiting the control power of a single firm in a given national economy/sector. Running simulations which take into account such aspects becomes simple by resorting to the appropriate constraints, which in this case are only negative.

Interestingly, the cost of building links in the network being analysed coincides at a first approximation with the market value of the (voting) shares needed to control the target firm, which in turn is independent from the identity of the buying firm.^{11} This suggests that the case of *target-homogeneous costs* could be important to investigate (while the literature has so far mostly focused on *owner*-homogeneous costs—see Proposition 1).

### 3.2 The Network of Citations Between Scientific Papers

The network of citations between scientific papers is a prominent example of an endogenously formed network in which the time component is not just crucial for the endogenous growth mechanism, but also easily observable in the data typically available to researchers. Indeed, scientific papers have well defined publication dates, which impose a clear temporal hierarchy among them and hence strong restrictions to the set of “actions” (that is, of citations) they can make: this observation, together with the specific constraints the network is subject to, is exploited by Battiston (2014) to provide a measure of the so-called “Matthew effect” (Merton 1968) in shaping citation flows, and hence bibliometric indicators. The Matthew effect consists in a cumulative advantage by which papers or authors which already received many citations in the past tend to be more cited in the future, even if hypothetically controlling for quality, originality and age.

A noncooperative model *à la* Bala and Goyal (2000) is the most appropriate for the setting being discussed because a citation is a relation sponsored purely by one side: an author can very well find out *ex-post* (if ever) that some paper of hers has been cited by some other paper in the literature. The fact that being cited can, at least in some cases, represent a gain for a researcher is unanimously recognised, and is part of the reason why the network of citations is interesting to bibliometric scientists. Less intuitive is the evaluation of the utility obtained from *making* a citation, but the mere fact that the overwhelming majority of scientific articles have a list of bibliographic references is an obvious sign of such implicit benefits. Notice that, coherently with the non-cooperative approach, a paper *cannot* create ingoing links.^{12}

Although there is apparently no cost involved in “sponsoring” a citation, it is evident that the number of bibliographic references contained in a single scientific work is limited: many authors, starting with de Solla Price (1965), have analyzed different aspects of its distribution, evidencing a strong concentration for small values. While this evidence does not help in quantifying the implicit costs borne by authors in making citations, which may be due partly to editorial/formatting choices and partly to the work involved in processing the literature to be cited, it does provide clear evidence of some implicit costs. Finally, as best exemplified by the phenomenon of *literature reviews*, it is very natural to assume that the benefit of a citation to a given paper depends in turn also on the citations *included* in that paper. The hypothesis of *perfectly reliable links* (meaning that being connected to another paper through an arbitrarily long path is equivalent to being directly connected) is instead a non-harmful approximation of reality for the analysis by Battiston (2014): it does not affect its qualitative results, and on the other hand an alternative specification would make the model much more complex and require some arbitrary choices.

Given a network of *n* scientific articles (i.e. one composed of all papers published in a given time span), it can be assumed for simplicity that there is a one-to-one relation between each node *i* and the time \(t_i\) at which it is published. The negative restrictions are then defined as follows:^{13}

Differently from the case of corporate control, the network of citations among scientific papers is also characterised by positive restrictions: namely, \({{\mathfrak {g}}^{t_i^+} = g^{t-1}}\), i.e. once a citation is established, it “lasts” forever. The structure of negative constraints is then peculiar in that it is decreasing over time: no link to or from a node *h* can be built before \({t_h}\), and so the model describes a growing network.

In Sect. 2.3, the fundamental building block of the development of a network with repeated internal constraints was assumed to be the Nash equilibrium of a given step *t*. Under the specification given for the network of citations, in which at each step only one node is active, such a Nash equilibrium degenerates to the best response of each node. The hypothesis of all nodes existing since time 0 does not influence the strategic choice, which is determined simply as a best response *among allowed links*—because of the direction in which value “flows”, *later* links are irrelevant.
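The step-by-step mechanism just described can be sketched in a few lines of Python. The sketch is illustrative only and rests on assumptions that are not taken from the paper: papers are indexed by publication order (node `i` is published at time `i`), the payoff of the active paper is the number of papers it reaches by following chains of citations minus a constant `cost` per citation made, and ties are broken in favour of fewer citations. The function names `reachable`, `best_response` and `grow_citation_network` are likewise hypothetical.

```python
from itertools import combinations


def reachable(links, start):
    """Return the set of papers reachable from `start` by following citations."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in links.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def best_response(links, active, cost):
    """Brute-force best response of `active` among allowed links only.

    Assumption: allowed targets are the papers published earlier
    (index < active). Assumed payoff: papers reached minus cost per link.
    A candidate replaces the current best only if strictly better, so
    ties favour fewer citations.
    """
    best_set, best_payoff = (), 0.0  # citing nothing yields payoff 0
    for size in range(1, active + 1):
        for chosen in combinations(range(active), size):
            trial = dict(links)
            trial[active] = chosen
            payoff = len(reachable(trial, active)) - cost * size
            if payoff > best_payoff:
                best_set, best_payoff = chosen, payoff
    return best_set


def grow_citation_network(n, cost):
    """Activate papers one at a time; established citations are never removed."""
    links = {}
    for active in range(n):
        links[active] = best_response(links, active, cost)
    return links


if __name__ == "__main__":
    print(grow_citation_network(4, 0.5))
```

Under these assumptions, `grow_citation_network(4, 0.5)` produces a chain in which each paper cites only its immediate predecessor, since a single citation already gives access to all earlier papers. The enumeration is exponential in the number of earlier papers, so the sketch is only meant for very small examples.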

## 4 Conclusions

The evolution of many economic and social networks is characterised by constraints which delimit the action space of single nodes, in terms of links they can build and sever. This paper provides a general framework for introducing such constraints in models of strategic network formation where links are sponsored by individual nodes.

Previous results by Haller (2012) on the existence of Nash networks are extended to the presence of negative constraints; moreover, Pareto optimality of network configurations is related to the constraints needed to transform them into equilibria: in general, negative constraints do not share the welfare benefits of positive ones, but they can provide a tool to guarantee the existence of equilibria. It is then shown that finding Nash equilibria, and even just ascertaining whether they exist, can be computationally infeasible (NP-complete) if the cost of building new links is not owner-homogeneous.

Two prominent examples were presented of the importance of taking into account constraints in models of endogenous network formation. In the case of the network of corporate control, the constraints can be both natural (control is exclusive) and regulatory (e.g. antitrust); in the case of bibliometric networks, they are mainly related to the time factor (links are established at the time of publication, and only go backwards). The theoretical model can be specialised to study many other kinds of social networks, and provide empirical researchers with tools that go beyond what the mere *static* analysis of networks allows one to identify. For instance, such restrictions should be taken into account when simulations of endogenous network formation are used to build null models against which to compare relevant features of real world models.

The literature has explored other kinds of strategic network formation: two examples of deviations from the basic assumptions by Galeotti et al. (2006) are network models in which links allow only a one-way flow of value (Galeotti 2006), and models in which the transmission of value over links is imperfect, and hence the length of paths is relevant (Billand et al. 2010). The concept of constraints can be straightforwardly applied to these and other frameworks, but understanding which of the results presented in this paper extend, at least in part, to those other models might prove challenging, and is a stimulating direction for further research.

## Footnotes

- 1.
As in the approach of Haller (2012), the original cost of links in \({\mathfrak {g}}^+\) should be taken again into consideration when doing comparative statics and welfare analysis.

- 2.
With a slight abuse of notation, when the network to be added/removed is composed of a single link, I will write \(g \oplus (i,j)\) or \(g \ominus (i,j)\), instead of \(g \oplus \{(i,j)\}\) or \(g \ominus \{(i,j)\}\), respectively.

- 3.
Notice that *i* and *h*, or *j* and *k*, can coincide.

- 4.
The assumption that costs are owner-homogeneous is one of the reasons why it is impractical to define negative constraints just as arbitrarily costly links: if this were the case, in order for an owner-homogeneous model of network formation to remain such after the imposition of negative constraints, such constraints could not consist in arbitrary sets of links, and should rather include all outgoing links from a given set of nodes. Another reason is that this would make the definition of *endogenous* negative constraints, as described in Sect. 2.3, much more complicated.

- 5.

- 6.
The proof by Haller et al. (2007) is composed of *n* iterative steps, each consisting in the evaluation of links from a given node to each other connected component. It is easy to see that for each connected component, the cost of such operation is bounded above by its size, so the total cost of each step is at most *n*. The proof of Proposition 2 adds in principle as many as \(n^2-n\) steps (the maximum size of \({\mathfrak {g}}^-\)). But since the initial \(g^* \ominus {\mathfrak {g}}^+\) cannot have more than *n* links (each link must be a bridge), all but at most *n* of such steps will be trivial: for instance, if one starts by looking at \((i,j) \not \in g^*\), the first steps are all trivial. So the total cost is still \(O(n^2)\).

- 7.
This choice and the proof which follows are heavily inspired by Anshelevich et al. (2003), who proved the NP-completeness of finding Nash networks in a different but related framework of network formation.

- 8.
Clearly, the framework could also be an ideal context for the study of a less myopic type of rationality, such as the farsightedly stable networks (Herings et al. 2009).

- 9.
I thank an anonymous reviewer for this remark.

- 10.
In the aforementioned studies, such share is 50%, but it is commonly acknowledged (Barclay and Holderness 1989) that the largest shareholder can attain *de facto* control even with a *smaller* share.

- 11.
This is clearly an oversimplification made for illustrative purposes: the buyer entity could already own some shares and, most importantly, the price of the shares could reflect the buyer's interest in them.

- 12.
This description of the network of citations excludes on purpose “spurious” effects due to environmental constraints, such as the role that the fame of an author or the prestige of a journal can have in influencing the amount of citations to a given piece of research. This modelling decision is instrumental in building a null model which allows Battiston (2014) to find evidence of such spurious effects, ultimately resulting in the desired measure.

- 13.
In principle, given the typical publication process, which goes through a period of open discussion in seminars/workshops, an often lengthy refereeing process, and finally a delay from the definitive acceptance to the publication, it can easily happen that two papers *i* and *j* cite some version of each other. This very special case, which is not admissible under the simplified settings just described, would possibly deserve a specific analysis.

### References

- Anshelevich E, Dasgupta A, Tardos E, Wexler T (2003) Near-optimal network design with selfish agents. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing. ACM, pp 511–520
- Bala V, Goyal S (2000) A noncooperative model of network formation. Econometrica 68(5):1181–1229
- Barclay MJ, Holderness CG (1989) Private benefits from control of public corporations. J Fin Econ 25(2):371–395
- Battiston P (2014) Citations are forever: modeling constrained network formation. Working paper, LEM paper series
- Billand P, Bravard C, Sarangi S (2010) The insider-outsider model reexamined. Games 1(4):422–437
- Chapelle A, Szafarz A (2005) Controlling firms through the majority voting rule. Phys A: Stat Mech Appl 355(2):509–529
- de Solla Price DJ (1965) Networks of scientific papers. Science 149:510–515
- Dutta B, Mutuswami S (1997) Stable networks. J Econ Theory 76(2):322–344
- Galeotti A (2006) One-way flow networks: the role of heterogeneity. Econ Theory 29(1):163–179
- Galeotti A, Goyal S, Kamphorst J (2006) Network formation with heterogeneous players. Games Econ Behav 54(2):353–372
- Goeree JK, Riedl A, Ule A (2009) In search of stars: network formation among heterogeneous agents. Games Econ Behav 67(2):445–466
- Haller H (2012) Network extension. Math Soc Sci 64(2):166–172
- Haller H, Kamphorst J, Sarangi S (2007) (Non-)existence and scope of Nash networks. Econ Theory 31(3):597–604
- Herings PJJ, Mauleon A, Vannetelbosch V (2009) Farsightedly stable networks. Games Econ Behav 67(2):526–541
- Kirchsteiger G, Mantovani M, Mauleon A, Vannetelbosch V (2016) Limited farsightedness in network formation. J Econ Behav Organ 128:97–120
- Merton RK (1968) The Matthew effect in science. Science 159(3810):56–63
- Vitali S, Glattfelder JB, Battiston S (2011) The network of global corporate control. PLoS One 6(10):e25995