From acquaintance to best friend forever: robust and fine-grained inference of social tie strengths

Social networks often provide only a binary perspective on social ties: two individuals are either connected or not. While external information can sometimes be used to infer the strength of social ties, access to such information may be restricted or impractical. Sintos and Tsaparas (KDD 2014) first suggested inferring the strength of social ties from the topology of the network alone, by leveraging the Strong Triadic Closure (STC) property. The STC property states that if person A has strong social ties with persons B and C, then B and C must be connected to each other as well (whether with a weak or strong tie). Sintos and Tsaparas exploited this to formulate the inference of the strength of social ties as an NP-hard optimization problem, and proposed two approximation algorithms. We refine and improve upon this landmark paper by developing a sequence of linear relaxations of this problem that can be solved exactly in polynomial time. Usefully, these relaxations infer more fine-grained levels of tie strength (beyond strong and weak), which also makes it possible to avoid arbitrary strong/weak strength assignments when the network topology provides inconclusive evidence. One of the relaxations simultaneously infers the presence of a limited number of STC violations. An extensive theoretical analysis leads to two efficient algorithmic approaches. Finally, our experimental results elucidate the strengths of the proposed approach and shed new light on the validity of the STC property in practice.


A strength function $w : E \to \{\text{weak}, \text{strong}\}$ is STC-compliant on an undirected network $G = (V, E)$ if and only if for all $i, j, k \in V$ with $\{i, j\}, \{i, k\} \in E$: $w(\{i, j\}) = w(\{i, k\}) = \text{strong}$ implies $\{j, k\} \in E$.

A consequence of this definition is that for an STC-compliant strength function, any wedge, defined as a triple of nodes $i, j, k \in V$ for which $\{i, j\}, \{i, k\} \in E$ but $\{j, k\} \notin E$, can include at most one strong edge. We will denote such a wedge by the pair $(i, \{j, k\})$, where $i$ is the root and $j$ and $k$ are the end-points of the wedge, and denote the set of wedges in a given network by $\mathcal{W}$.
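The wedge definition and the STC-compliance check can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `find_wedges` and `is_stc_compliant` and the toy network are assumptions made here, not part of the original text.

```python
from itertools import combinations

def find_wedges(edges):
    """Return every wedge (root, {j, k}): root is tied to j and k, but j and k are not tied."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    wedges = []
    for root, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            if k not in adj[j]:  # open triple: the end-points are not connected
                wedges.append((root, frozenset((j, k))))
    return wedges

def is_stc_compliant(edges, strong):
    """True if no wedge contains two strong edges (`strong` is a set of frozenset edges)."""
    for root, ends in find_wedges(edges):
        j, k = tuple(ends)
        if frozenset((root, j)) in strong and frozenset((root, k)) in strong:
            return False
    return True

# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
triangle_strong = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c")]}
bad_strong = {frozenset(("a", "c")), frozenset(("c", "d"))}
print(find_wedges(edges))                       # both wedges are rooted at c
print(is_stc_compliant(edges, triangle_strong))
print(is_stc_compliant(edges, bad_strong))
```

Labelling the three triangle edges as strong is compliant, whereas making both edges of the wedge (c, {a, d}) strong is not.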

On the other hand, for a triangle, defined as a triple of nodes $i, j, k \in V$ for which $\{i, j\}, \{i, k\}, \{j, k\} \in E$, no constraints are implied on the strengths of the three involved edges. We will denote a triangle simply by the (unordered) set of its three nodes $\{i, j, k\}$, and the set of all triangles in a given network by $\mathcal{T}$.

arXiv:1802.03549v3 [cs.SI] 18 Sep 2018

Relying on the STC property, Sintos and Tsaparas [18] propose an approach to infer the strength of social ties. They observe that a strength function that labels all edges as weak is always STC-compliant. However, as a large number of strong ties is expected to be found in a social network, they suggest searching for a strength function that maximizes the number of strong edges, or (equivalently) minimizes the number of weak edges.

To write this formally, we introduce a variable $w_{ij}$ for each edge $\{i, j\} \in E$, defined as $w_{ij} = 0$ if $w(\{i, j\}) = \text{weak}$ and $w_{ij} = 1$ if $w(\{i, j\}) = \text{strong}$. Then, the original STC problem, maximizing the number of strong edges, can be formulated as:

$$\max_{w_{ij} : \{i,j\} \in E} \; \sum_{\{i,j\} \in E} w_{ij}, \tag{STCmax}$$

such that

$$w_{ij} + w_{ik} \le 1, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W}, \tag{1}$$

$$w_{ij} \in \{0, 1\}, \quad \text{for all } \{i, j\} \in E. \tag{2}$$

Equivalently, one could instead minimize $\sum_{\{i,j\} \in E} (1 - w_{ij})$ subject to the same constraints, or, with transformed variables $v_{ij} = 1 - w_{ij}$ equal to 1 for weak edges and 0 for strong edges:

$$\min_{v_{ij} : \{i,j\} \in E} \; \sum_{\{i,j\} \in E} v_{ij}, \tag{STCmin}$$

such that

$$v_{ij} + v_{ik} \ge 1, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W}, \tag{3}$$

$$v_{ij} \in \{0, 1\}, \quad \text{for all } \{i, j\} \in E. \tag{4}$$

When we do not wish to distinguish between the two formulations, we will refer to them jointly as STCbinary. Sintos and Tsaparas [18] observe that STCmin is equivalent to Vertex Cover on the so-called wedge graph $G_E = (E, F)$, whose nodes are the edges of the original input graph $G$, and whose edges are $F = \{(\{i, j\}, \{i, k\}) \mid (i, \{j, k\}) \in \mathcal{W}\}$, i.e., two nodes of $G_E$ are connected by an edge if the edges they represent in $G$ form a wedge. While Vertex Cover is NP-hard, a simple factor-2 approximation algorithm can be adopted for STCmin. On the other hand, STCmax is equivalent to finding the maximum independent set on the wedge graph $G_E$, or equivalently the maximum clique on the complement of the wedge graph. It is known that there can [...] real number $\varepsilon > 0$ approximates the maximum clique to within [...]. In other words, while a polynomial-time approximation algorithm exists for minimizing the number of weak edges (with approximation factor two), no such polynomial-time approximation algorithm exists [...] in this paper.
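As an illustration of the Vertex Cover view of STCmin, the following sketch builds the edge set $F$ of the wedge graph and applies the textbook maximal-matching heuristic, which is one standard way to obtain a factor-2 vertex cover. Whether this is the exact procedure meant in [18] is an assumption; the function names and the path graph are hypothetical.

```python
from itertools import combinations

def wedge_graph(edges):
    """Edge set F of the wedge graph: pairs of edges of G that together form a wedge."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    F = []
    for root, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            if k not in adj[j]:
                F.append((frozenset((root, j)), frozenset((root, k))))
    return F

def approx_weak_edges(edges):
    """Factor-2 vertex cover of the wedge graph via a greedy maximal matching.

    The returned set holds the edges of G labelled weak; all other edges are strong.
    """
    weak = set()
    for e1, e2 in wedge_graph(edges):
        if e1 not in weak and e2 not in weak:  # wedge not yet covered
            weak.update((e1, e2))              # take both end-points of the matched pair
    return weak

# Hypothetical example: the path a-b-c-d has the wedges (b, {a, c}) and (c, {b, d}).
edges = [("a", "b"), ("b", "c"), ("c", "d")]
weak = approx_weak_edges(edges)
print(sorted(sorted(e) for e in weak))
```

On this path the heuristic labels two edges as weak, while labelling only the middle edge as weak would suffice, which matches the factor-2 guarantee.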

First, STCbinary is an NP-hard problem. Thus, one has to either resort to approximation algorithms, which are applicable only for certain problem variants (see the discussion on STCmin vs. STCmax above), or rely on exponential algorithms and hope for good behavior in practice. Second, the problem returns only binary edge strengths, weak vs. strong. In contrast, real-world social networks contain tie strengths of many different levels. A third limitation is that, on real-life networks, STCbinary tends to have many optimal solutions. Thus, any such optimal solution makes arbitrary strength assignments for the edges where different optimal solutions differ from each other. Last but not least, STCbinary assumes that the STC property holds for all wedges. Yet, real-world social networks tend to be noisy, with spurious connections as well as missing edges.

Contributions. In this paper we propose a series of linear programming relaxations that address all of the above limitations of STCbinary. In particular, our LP relaxations provide the following advantages.

• The first relaxation replaces the integrality constraints $w_{ij} \in \{0, 1\}$ with fractional counterparts $0 \le w_{ij} \le 1$. It can be shown that this relaxed LP is half-integral, i.e., there always exists an optimal solution whose edge strengths take values $w_{ij} \in \{0, \frac{1}{2}, 1\}$. Thus, not only does the problem become polynomial, but the formulation also introduces meaningful three-level social strengths.

• Next we relax the upper-bound constraint, requiring only $w_{ij} \ge 0$, while generalizing the STC property to deal with higher gradations of edge strengths. We show that the optimal edge strengths still take values in a small discrete set, controlled with an additional parameter. Thus, our approach can yield multi-level edge strengths, from a small set of discrete values, while ensuring a polynomial algorithm.

• We show how the previous relaxations can be solved by advanced and highly efficient combinatorial algorithms, so that one need not rely on generic LP solvers.

• As our relaxations allow intermediate strength levels, arbitrary choices between weak and strong values can be avoided by assigning an intermediate strength. Furthermore, the computational tractability of the relaxed solution makes it possible to also quantify in polynomial time the range of possible strengths an edge can have in the set of optimal strength assignments. For STCbinary, given the intractability of finding even one optimal solution, this is clearly beyond reach.

• Our final relaxation simultaneously edits the network while optimizing the edge strengths, making it robust against noise in the network. This variant also has no integrality constraints, and thus it can again be solved in polynomial time.

Outline. We start by proposing the successive relaxations in Sec. 2. In Sec. 3 we analyse these relaxations and derive properties of their optima, highlighting the benefits of these relaxations with respect to STCbinary. The theory developed in Sec. 3 leads to efficient algorithms, discussed in Sec. 4. Empirical performance is evaluated in Sec. 5 and related work is reviewed in Sec. 6, before drawing conclusions in Sec. 7.


LP RELAXATIONS

Here we will derive a sequence of increasingly loose relaxations of Problem STCmax. Their detailed analysis is deferred to Sec. 3.


Elementary relaxations

In this subsection we simply enlarge the feasible set of strengths $w_{ij}$, for all edges $\{i, j\} \in E$. This is done in two steps.

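2.1.1 Relaxing the integrality constraint. The first relaxation relaxes the constraint $w_{ij} \in \{0, 1\}$ to $0 \le w_{ij} \le 1$. Denoting the set of edge strengths with $\mathbf{w} = \{w_{ij} \mid \{i, j\} \in E\}$, this yields:

$$\max_{\mathbf{w}} \; \sum_{\{i,j\} \in E} w_{ij}, \tag{LP1}$$

such that

$$w_{ij} + w_{ik} \le 1, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W}, \tag{5}$$

$$w_{ij} \ge 0, \quad \text{for all } \{i, j\} \in E, \tag{6}$$

$$w_{ij} \le 1, \quad \text{for all } \{i, j\} \in E. \tag{7}$$
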
Equivalently, in Problem STCmin one can relax constraint (4) to $0 \le v_{ij} \le 1$. Recall that Problems STCmax and STCmin are equivalent to the Independent Set and Vertex Cover problems, respectively, on the wedge graph. This relaxation will lead to solutions that are not necessarily binary. However, as will be explained in Sec. 3, Problem LP1 is half-integral, meaning that there always exists an optimal solution with values $w_{ij} \in \{0, \frac{1}{2}, 1\}$ for all $\{i, j\} \in E$.
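Because a half-integral optimum always exists, Problem LP1 can be solved on a very small graph by simply enumerating all assignments in $\{0, \frac{1}{2}, 1\}$. The sketch below does exactly that; it is an exponential brute force meant only as an illustration, not the polynomial-time approach of the paper, and the function names and the 5-cycle example are assumptions made here.

```python
from fractions import Fraction
from itertools import combinations, product

def wedge_pairs(edges):
    """Pairs of edges (as frozensets) that together form a wedge."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    pairs = []
    for root, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            if k not in adj[j]:
                pairs.append((frozenset((root, j)), frozenset((root, k))))
    return pairs

def solve_lp1_bruteforce(edges):
    """Best half-integral solution of LP1, found by exhaustive search (tiny graphs only)."""
    edge_list = [frozenset(e) for e in edges]
    pairs = wedge_pairs(edges)
    levels = (Fraction(0), Fraction(1, 2), Fraction(1))
    best_value, best_w = None, None
    for values in product(levels, repeat=len(edge_list)):
        w = dict(zip(edge_list, values))
        if all(w[e1] + w[e2] <= 1 for e1, e2 in pairs):  # wedge constraints (5)
            total = sum(values)
            if best_value is None or total > best_value:
                best_value, best_w = total, w
    return best_value, best_w

# Hypothetical example: in a 5-cycle every pair of consecutive edges forms a wedge.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
value, w = solve_lp1_bruteforce(cycle)
print(value, sorted(set(w.values())))
```

On the 5-cycle the relaxed optimum assigns strength 1/2 to every edge for a total of 5/2, whereas a binary assignment can make at most two edges strong. This is a case where the topology gives no reason to prefer one edge over another, and the intermediate level avoids an arbitrary choice.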


2.1.2 Relaxing the upper bound constraints to triangle constraints. We now further relax Problem LP1, so as to allow for edge strengths larger than 1. The motivation to do so is to allow for higher gradations in the inference of edge strengths.

Simply dropping the upper-bound constraint (7) would yield uninformative unbounded solutions, as edges that are not part of any wedge would be unconstrained. Thus, the upper-bound constraints cannot simply be deleted; they must be replaced by looser constraints that bound the values of edge strengths in triangles in the same spirit as the STC constraint does for edges in wedges.

To do so, we propose to generalize the wedge STC constraints (5) to STC-like constraints on triangles, as follows: in every triangle, the combined strength of two adjacent edges should be bounded by an increasing function of the strength of the closing edge. In social-network terms: the stronger a person's friendship with two other people, the stronger the friendship between these two people must be. Encoding this intuition as a linear constraint yields:

$$w_{ij} + w_{ik} \le c + d \cdot w_{jk},$$

for some $c, d \in \mathbb{R}^+$. This is the most general linear constraint that imposes a bound on $w_{ij} + w_{ik}$ that is increasing with $w_{jk}$, as desired. We will refer to such constraints as triangle constraints.

In sum, we relax Problem LP1 by first adding the triangle constraints for all triangles, and subsequently dropping the upper-bound constraints. For the resulting optimization problem to be a relaxation of Problem LP1, the triangle constraints must be satisfied throughout the original feasible region. This is the case as long as $c \ge 2$: indeed, then the box constraints $0 \le w_{ij} \le 1$ ensure that the triangle constraint is always satisfied. The tightest possible relaxation is thus achieved with $c = 2$, yielding the following relaxation:

$$\max_{\mathbf{w}} \; \sum_{\{i,j\} \in E} w_{ij}, \tag{LP2}$$

such that

$$w_{ij} + w_{ik} \le 1, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W},$$

$$w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}, \quad \text{for all } \{i, j, k\} \in \mathcal{T}, \tag{8}$$

$$w_{ij} \ge 0, \quad \text{for all } \{i, j\} \in E.$$
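To make the constraint set of Problem LP2 concrete, the following sketch checks whether a given strength assignment is feasible. It only verifies feasibility and does not solve the LP. The helper name `lp2_feasible`, the toy graph (a triangle x-y-z, a node w tied to all three, and a node u tied only to w) and the chosen strengths are assumptions made for illustration.

```python
from itertools import combinations

def lp2_feasible(edges, w, d):
    """Check the wedge, triangle and non-negativity constraints of LP2.

    `w` maps frozenset({i, j}) to the strength of edge {i, j}.
    """
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)

    def s(a, b):
        return w[frozenset((a, b))]

    if any(value < 0 for value in w.values()):
        return False
    for i, nbrs in adj.items():  # every node acts as the root i in turn
        for j, k in combinations(sorted(nbrs), 2):
            if k in adj[j]:
                # triangle {i, j, k}: w_ij + w_ik <= 2 + d * w_jk
                if s(i, j) + s(i, k) > 2 + d * s(j, k):
                    return False
            elif s(i, j) + s(i, k) > 1:
                # wedge (i, {j, k}): w_ij + w_ik <= 1
                return False
    return True

edges = [("x", "y"), ("y", "z"), ("x", "z"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("w", "u")]
d = 2
w = {frozenset(e): 3 for e in edges[:3]}         # edges of the triangle x-y-z
w.update({frozenset(e): 1 for e in edges[3:6]})  # edges from w to x, y, z
w[frozenset(("w", "u"))] = 0
print(lp2_feasible(edges, w, d), sum(w.values()))
```

With $d = 2$ this assignment is feasible although three edges exceed strength 1, which is exactly what dropping the upper bound (7) permits. Raising the strength of the edge between w and u to 1 would violate a wedge constraint.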

Remark 1 (The wedge constraint is a special case of the triangle constraint). Considering an absent edge as an edge with negative strength $-1/d$, the wedge constraint can in fact be regarded as a special case of the triangle constraint.


Enhancing robustness by allowing edge additions [...]

[...] real-world social networks are noisy and may contain many exceptions to this rule. In this subsection we propose two further relaxations of Problem LP2 that gracefully deal with exceptions of two kinds: wedges where the sum of edge strengths exceeds 1, and edges with a negative edge strength, indicating that the STC property would be satisfied should the edge not be present. These relaxations thus solve the STC problem while allowing a small number of edges to be added or removed from the network.


Allowing violated wedge STC constraints.

In order to allow for violated wedge STC constraints, we can simply add non-negative slack variables $\epsilon_{jk}$ for all $(i, \{j, k\}) \in \mathcal{W}$:

$$w_{ij} + w_{ik} \le 1 + \epsilon_{jk}, \quad \epsilon_{jk} \ge 0. \tag{9}$$

Elegantly, the slacks $\epsilon_{jk}$ can be interpreted as quantifying the strength of the (absent) edge between $j$ and $k$. To show this, let $\bar{E}$ denote the set of pairs of end-points of all the wedges in the graph, i.e., $\bar{E} = \{\{j, k\} \mid \text{there exists } i \in V : (i, \{j, k\}) \in \mathcal{W}\}$. We also extend our notation to introduce strength values for those pairs $\{j, k\} \in \bar{E}$, and define $w_{jk} \triangleq \frac{\epsilon_{jk} - 1}{d}$. The relaxed wedge constraints (9) are then formally identical to the triangle STC constraints (8). Meanwhile, the lower bound $\epsilon_{jk} \ge 0$ from (9) implies $w_{jk} \ge -\frac{1}{d}$, i.e., the strength of these absent edges is allowed to be negative.
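The equivalence between the relaxed wedge constraint (9) and the triangle constraint (8) can be verified by direct substitution. The following short derivation assumes $d > 0$ and uses only the definition $w_{jk} = \frac{\epsilon_{jk} - 1}{d}$ given above:

$$2 + d \cdot w_{jk} \;=\; 2 + d \cdot \frac{\epsilon_{jk} - 1}{d} \;=\; 1 + \epsilon_{jk}, \qquad\text{so}\qquad w_{ij} + w_{ik} \le 1 + \epsilon_{jk} \;\Longleftrightarrow\; w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}.$$

Likewise, $\epsilon_{jk} = 1 + d \cdot w_{jk} \ge 0$ holds if and only if $w_{jk} \ge -\frac{1}{d}$. In particular, a wedge without any slack ($\epsilon_{jk} = 0$) corresponds to $w_{jk} = -\frac{1}{d}$, the strength that Remark 1 associates with an absent edge.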

In order to bias the solution towards few violated wedge constraints, a term $-C \sum_{\{j,k\} \in \bar{E}} w_{jk}$ is added to the objective function. The larger the parameter $C$, the more a violation of a wedge constraint will be penalized. The resulting problem is:

$$\max_{\mathbf{w}} \; \sum_{\{i,j\} \in E} w_{ij} - C \sum_{\{j,k\} \in \bar{E}} w_{jk}, \tag{LP3}$$

such that

$$w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W},$$

$$w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}, \quad \text{for all } \{i, j, k\} \in \mathcal{T},$$

$$w_{ij} \ge 0, \quad \text{for all } \{i, j\} \in E,$$

$$w_{jk} \ge -\frac{1}{d}, \quad \text{for all } \{j, k\} \in \bar{E}.$$

Note that in Remark 1, $-\frac{1}{d}$ was argued to correspond to the strength of an absent edge. Thus, the lower-bound constraint on $w_{jk}$ requires [...] penalty paid for adding it.

2.2.2 Allowing negative edge strengths. The final relaxation is obtained by allowing edges to have negative strength, with lower bound equal to the strength signifying an absent edge:

$$\max_{\mathbf{w}} \; \sum_{\{i,j\} \in E} w_{ij} - C \sum_{\{j,k\} \in \bar{E}} w_{jk}, \tag{LP4}$$

such that

$$w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}, \quad \text{for all } (i, \{j, k\}) \in \mathcal{W},$$

$$w_{ij} + w_{ik} \le 2 + d \cdot w_{jk}, \quad \text{for all } \{i, j, k\} \in \mathcal{T},$$

$$w_{ij} \ge -\frac{1}{d}, \quad \text{for all } \{i, j\} \in E,$$

$$w_{jk} \ge -\frac{1}{d}, \quad \text{for all } \{j, k\} \in \bar{E}.$$

This formulation allows the optimization problem to strategically delete some edges from the graph, if doing so allows it to increase [...]

[...] [14]. In this section we demonstrate and exploit the existence of symmetries in the optima to show an analogous result for Problem LP2. Furthermore, the described symmetries also exist for Problems LP3 and LP4, although they do not imply an analogue of the half-integrality result for these problems.

We also discuss how the described symmetries are useful in reducing the arbitrariness of the optima, as compared to Problems STCmax and STCmin, where structurally indistinguishable edges might be assigned different strengths at the optima. Furthermore, in Sec. 4 we will show how the symmetries can be exploited for algorithmic performance gains as well.

We start by giving some useful definitions and lemmas.


Auxiliary definitions and results

It is useful to distinguish two types of edges:

Definition 3.1 (Triangle edge and wedge edge). A triangle edge is an edge that is part of at least one triangle, but that is part of no wedge. A wedge edge is an edge that is part of at least one wedge.

These definitions are illustrated in a toy graph in Figure 1, where edges $(x, y)$, $(y, z)$, and $(x, z)$ are triangle edges, while edges $(w, x)$, $(w, y)$, $(w, z)$, and $(w, u)$ are wedge edges.
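Definition 3.1 translates directly into a small classification routine. The sketch below is illustrative only: the function name `classify_edges` is an assumption, and the edge list is the toy graph as described in the text for Figure 1.

```python
from itertools import combinations

def classify_edges(edges):
    """Split edges into triangle edges and wedge edges following Definition 3.1."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    in_triangle, wedge_edges = set(), set()
    for i, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            pair = (frozenset((i, j)), frozenset((i, k)))
            if k in adj[j]:
                in_triangle.update(pair)   # {i, j, k} is a triangle
            else:
                wedge_edges.update(pair)   # (i, {j, k}) is a wedge
    # A triangle edge lies in at least one triangle and in no wedge.
    triangle_edges = in_triangle - wedge_edges
    return triangle_edges, wedge_edges

# Toy graph described for Figure 1.
edges = [("x", "y"), ("y", "z"), ("x", "z"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("w", "u")]
triangle_edges, wedge_edges = classify_edges(edges)
print(sorted("".join(sorted(e)) for e in triangle_edges))
print(sorted("".join(sorted(e)) for e in wedge_edges))
```

An edge that is part of neither a triangle nor a wedge (an isolated edge) ends up in neither set.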

It is clear that in this toy example the set of triangle edges forms a clique. This is in fact a general property of triangle edges:

Lemma 3.2 (Subgraph induced by triangle edges). Each connected component in the edge-induced subgraph, induced by all triangle edges, is a clique.

Proof. The contrary would imply the presence of a wedge within this subgraph, which is a contradiction since by definition none of the triangle edges can be part of any wedge. □

Thus, we can introduce the notion of a triangle clique. The nodes $\{x, y, z\}$ in Figure 1 form a triangle clique. Note that not every clique in a graph is a triangle clique. E.g., nodes $\{x, y, z, w\}$ form a clique but not a triangle clique.

A node k is a neighbor of a triangle clique C if k is connected to at least one node of C. It turns out that a neighbor of a triangle clique is connected to all the nodes of that triangle clique.In other words, a neighbor of one node in the triangle clique must be a neighbor of them all, in which case we can call it a neighbor of the triangle clique.

Proof. Assume the contrary, that is, that some node $k \in V \setminus C$ is connected to $i \in C$ but not to $j \in C$. This means that $(i, \{j, k\}) \in \mathcal{W}$. However, this contradicts the fact that $\{i, j\}$ is a triangle edge. □

This lemma allows us to define the concepts bundle and ray:

Definition 3.5 (Bundle and ray). For a triangle clique $C$ and a neighbor $k$ of $C$, the set of edges $\{k, i\}$ for all $i \in C$ is called a bundle of the triangle clique. Each edge $\{k, i\}$ in a bundle is called a ray of the triangle clique.

In Figure 1 the edges $(w, x)$, $(w, y)$, and $(w, z)$ form a bundle of the triangle clique with nodes $x$, $y$, and $z$.
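Triangle cliques and their bundles can be extracted mechanically: triangle cliques are the connected components of the subgraph formed by the triangle edges (Lemma 3.2), and each outside neighbor of such a component contributes one bundle. The following sketch assumes this reading of the definitions; the function name and the data layout are hypothetical.

```python
from itertools import combinations

def triangle_cliques_and_bundles(edges):
    """Return the triangle cliques and, per (neighbor, clique) pair, the rays of the bundle."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    in_triangle, in_wedge = set(), set()
    for i, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            pair = (frozenset((i, j)), frozenset((i, k)))
            (in_triangle if k in adj[j] else in_wedge).update(pair)
    triangle_edges = in_triangle - in_wedge

    # Connected components of the subgraph formed by the triangle edges.
    tri_adj = {}
    for e in triangle_edges:
        a, b = tuple(e)
        tri_adj.setdefault(a, set()).add(b)
        tri_adj.setdefault(b, set()).add(a)
    cliques, seen = [], set()
    for start in tri_adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(tri_adj[node] - comp)
        seen |= comp
        cliques.append(frozenset(comp))

    # One bundle per outside neighbor k; by Lemma 3.4, k is tied to every node of the clique.
    bundles = {}
    for clique in cliques:
        neighbors = set().union(*(adj[i] for i in clique)) - clique
        for k in neighbors:
            bundles[(k, clique)] = {frozenset((k, i)) for i in clique}
    return cliques, bundles

# Toy graph described for Figure 1.
edges = [("x", "y"), ("y", "z"), ("x", "z"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("w", "u")]
cliques, bundles = triangle_cliques_and_bundles(edges)
print(cliques, bundles)
```

For the toy graph this yields the single triangle clique with nodes x, y, z and a single bundle, formed by the three rays from w.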

A technical condition to ensure finiteness of the optimal solution. Without loss of generality, we will further assume that no connected component of the graph is a clique; such connected components can be easily detected and handled separately. This ensures that a finite optimal solution exists, as we show in Propositions 3.7 and 3.8. These propositions rest on the following lemma:

Lemma 3.6 (Each triangle edge is adjacent to a wedge edge). Each triangle edge in a graph without cliques as connected components is immediately adjacent to a wedge edge.

Proof. First note that, in a connected component, for each edge there exists at least one adjacent edge. If a triangle edge $\{i, j\}$ were adjacent to triangle edges only, all neighbors of $i$ and $j$ would be connected. Together with $i$ and $j$, this set of neighbors would form a connected component that is a clique, which is a contradiction. □

Proposition 3.7 (Finite feasible region without slacks). A graph in which no connected component is a clique has a finite feasible region for Problems LP1 and LP2.

Thus, also the optimal solution is finite.

Proof. The weight of wedge edges is trivially bounded by 1. From Lemma 3.6, we know that each triangle edge appears in at least one triangle inequality in which the strength on the right-hand side is that of a wedge edge, i.e., the weight of each triangle edge is also bounded by a finite number, thus proving the proposition. □

For Problems LP3 and LP4 the following weaker result holds:

Proposition 3.8 (Finite optimal solution with slacks). A graph in which no connected component is a clique has a finite optimal solution for Problems LP3 and LP4 for sufficiently large $C$.

Note that for these problems the feasible region is unbounded.

Proof. Let $n$ be the number of nodes in the largest connected component in the graph. Let $j, k \in V$ be two nodes in this connected component for which $\{j, k\} \notin E$. We will first show that $w_{jk}$ is finite at the optimum for sufficiently large $C$, by showing that increasing it from a finite value strictly and monotonically reduces the value of the objective function.

By increasing $w_{jk}$ by $\delta$, the other weights in the connected component may each increase by at most $\max\{d, d^2\} \cdot \delta$. Indeed, the left-hand side in the wedge constraints for wedges $(i, \{j, k\})$ can increase by $d \cdot \delta$, which may result in a potential increase of the left-hand sides of the triangle constraints by $d^2 \cdot \delta$. As there are at most $n^2$ edges in the connected component, this means that the objective function will strictly decrease if $n^2 \cdot \max\{d, d^2\} < C$, as we set out to prove.

If $w_{jk}$ is finite for all $(i, \{j, k\}) \in \mathcal{W}$, this means that each wedge edge is also finitely bounded.

From Lemma 3.6, it thus also follows that each triangle edge is finitely bounded.□


Symmetry in the optimal solutions

We now proceed to show that certain symmetries exist in all optimal solutions (Sec. 3.2.2), while for other symmetries we show that there always exists an optimal solution that exhibits them (Sec. 3.2.1).


3.2.1 There always exists an optimal solution that exhibits symmetry. We first state a general result, before stating a more practical corollary. The theorem pertains to automorphisms $\alpha : V \to V$ of the graph $G$, defined as node permutations that leave the edges of the graph unaltered: for $\alpha$ to be a graph automorphism, it must hold that $\{i, j\} \in E$ if and only if $\{\alpha(i), \alpha(j)\} \in E$. Graph automorphisms form a permutation group defined over the nodes of the graph.

Theorem 3.9 (Invariance under graph automorphisms). For any subgroup $A$ of the graph automorphism group of $G$, there exists an optimal solution for Problems LP1, LP2, LP3 and LP4 that is invariant under all automorphisms $\alpha \in A$. In other words, there exists an optimal solution $\mathbf{w}$ such that $w_{ij} = w_{\alpha(i)\alpha(j)}$ for each automorphism $\alpha \in A$.

Proof. Let $w_{ij}$ be the strength for the node pair $\{i, j\}$ in an optimal solution $\mathbf{w}$. Then, we claim that assigning a strength $\frac{1}{|A|} \sum_{\alpha \in A} w_{\alpha(i)\alpha(j)}$ to each node pair $\{i, j\}$ is also an optimal solution. This solution satisfies the condition in the theorem statement, so if the claim is true, the theorem is proven.

It is easy to see that this strength assignment has the same value of the objective function.Thus, we only need to prove that it is also feasible.

As $\alpha$ is a graph automorphism, it preserves the presence of edges, wedges, and triangles (e.g., $\{i, j\} \in E$ if and only if $\{\alpha(i), \alpha(j)\} \in E$). Thus, if a set of strengths $w_{ij}$ for node pairs $\{i, j\}$ is a feasible solution, then the set of strengths $w_{\alpha(i)\alpha(j)}$ is also feasible for these node pairs. Due to convexity of the constraints, the average over all $\alpha$ of these strengths is also feasible, as required. □
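The averaging construction used in this proof is easy to reproduce numerically. The sketch below averages a strength assignment over a group of automorphisms. The function name `symmetrize`, the representation of automorphisms as dictionaries, and the three-node path are assumptions chosen for illustration.

```python
from fractions import Fraction

def symmetrize(w, automorphisms):
    """Average the strengths w over a group of automorphisms (dicts mapping node -> node)."""
    averaged = {}
    for edge in w:
        i, j = tuple(edge)
        total = sum(w[frozenset((a[i], a[j]))] for a in automorphisms)
        averaged[edge] = Fraction(total) / len(automorphisms)
    return averaged

# Hypothetical example: the path a-b-c, whose only wedge is (b, {a, c}).
# Strengths (1, 0) are optimal for LP1 but treat the two edges asymmetrically.
w = {frozenset(("a", "b")): Fraction(1), frozenset(("b", "c")): Fraction(0)}
identity = {"a": "a", "b": "b", "c": "c"}
swap = {"a": "c", "b": "b", "c": "a"}   # mirror the path around b
w_sym = symmetrize(w, [identity, swap])
print(sorted(w_sym.values()), sum(w_sym.values()) == sum(w.values()))
```

The averaged assignment gives both edges strength 1/2, still satisfies the wedge constraint, and has the same objective value as the asymmetric optimum.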

Enumerating all automorphisms of a graph is computationally at least as hard as solving the graph-isomorphism problem. The graph-isomorphism problem is known to belong to NP, but it is not known whether it belongs to P. However, the set of permutations in the following proposition is easy to find.

Proposition 3.10. The set $\Pi$ of permutations $\alpha : V \to V$ for which $i \in C$ if and only if $\alpha(i) \in C$ for all triangle cliques $C$ in $G$ forms a subgroup of the automorphism group of $G$.

Thus the set $\Pi$ contains permutations of the nodes that map any node in a triangle clique onto another node in the same triangle clique.

Proof. Each permutation $\alpha \in \Pi$ is an automorphism of $G$. This follows directly from Lemma 3.4 and the fact that $\alpha$ only permutes nodes within each triangle clique. Furthermore, it is clear that if $\alpha \in \Pi$ then also $\alpha^{-1} \in \Pi$, and if $\alpha_1, \alpha_2 \in \Pi$ then also $\alpha_1 \alpha_2 \in \Pi$.

Finally, Π contains at least the identity and is thus non-empty, proving that Π is a subgroup of A. □

We can now state the more practical corollary of Theorem 3.9:

Corollary 3.11 (Invariance under permutations within triangle cliques). Let $\Pi$ be the set of permutations $\alpha : V \to V$ for which $i \in C$ if and only if $\alpha(i) \in C$ for all triangle cliques $C$. There exists an optimal solution $\mathbf{w}$ for Problems LP1, LP2, LP3 and LP4 for which $w_{ij} = w_{\alpha(i)\alpha(j)}$ for each permutation $\alpha \in \Pi$.

Thus there always exists an optimal solution for which edges in the same triangle clique (i.e., adjacent triangle edges) have equal strength, and for which rays in the same bundle have equal strength.

Such a symmetric optimal solution can be constructed from any other optimal solution, by setting the strength of a triangle edge equal to the average of strengths within the triangle clique it is part of, and setting the strength of each ray equal to the average of the strengths within the bundle it is part of. Indeed, this averaged solution is equal to the average of all permutations of the optimal solution, which, from convexity of the problem, is also feasible and optimal.


In each optimum, connected triangle-edges have equal strength.

Here, we will prove that only some of the symmetries discussed above are present in all optimal solutions, as formalized by the following theorem:

Theorem 3.12 (Optimal strengths of adjacent triangle edges are equal). In any optimal solution of Problems LP1, LP2, LP3 and LP4, the strengths of adjacent triangle edges are equal.

Proof. Consider an optimum $\mathbf{w}$ for which this is not the case, i.e., two adjacent triangle edges can be found that have different strength. From Corollary 3.11 we know that we can construct from this optimal solution another optimal solution $\mathbf{w}^{=}$ for which adjacent triangle edges do have the same strength, equal to the average strength in $\mathbf{w}$ of all triangle edges in the triangle clique they are part of. Moreover, in $\mathbf{w}^{=}$ all rays within the same bundle have the same strength, equal to the average strength in $\mathbf{w}$ of all rays in the bundle. Let us denote the strength in $\mathbf{w}^{=}$ of the $b$-th bundle to the triangle clique as $w^{=}_b$ (i.e., $b$ is an index to the bundle), and the strength of the edges in the triangle clique as $w^{=}_c$. We will prove that $\mathbf{w}^{=}$ is not optimal, reaching a contradiction.

In particular, we will show that there exists a solution $\mathbf{w}^{*}$ for which $w^{*}_b = w^{=}_b$ for all bundles $b$, but for which the strength within the triangle clique is strictly larger: $w^{*}_c > w^{=}_c$. We first note that the strengths of the triangle edges $w^{*}_c$ are bounded in triangle constraints involving two rays and one triangle edge, namely

$$w^{*}_c \le 2 + (d - 1) \cdot w^{=}_b.$$

They are bounded also in triangle constraints involving only triangle edges, namely

$$(2 - d) \cdot w^{*}_c \le 2.$$

For $d \ge 2$ this constraint is trivially satisfied, but not for $d < 2$. Thus, we know that $w^{*}_c = 2 + \min_b (d - 1) \cdot w^{=}_b$ for $d \ge 2$, and $w^{*}_c = \min\left\{2 + \min_b (d - 1) \cdot w^{=}_b, \; \frac{2}{2 - d}\right\}$ for $d < 2$.

If this optimal value for $w^{*}_c$ is larger than $w^{=}_c$, the contradiction is established. First we show that $w^{=}_c$ [...] by contradiction. For each triangle $\{i, j, k\}$ in the triangle clique, the following h[...] smallest $(d - 1) \cdot w^{=}_b$ has at least two different weights in $\mathbf{w}$. Then, note that for each [...] $w_{bi}$ and $w_{bj}$ from node $b$ to triangle edge $\{i, j\}$, the following bounds must hold:

$$w_{ij} \le 2 + d \cdot \min\{w_{bi}, w_{bj}\} - \max\{w_{bi}, w_{bj}\}.$$

Summing this over all $\{i, j\}$ and dividing by $\frac{n(n-1)}{2}$, where $n$ is the number of nodes in the triangle clique, yields:

$$w^{=}_c \le 2 + d \cdot w^{-}_b - w^{+}_b$$

for some $w^{-}_b < w^{+}_b$ with $w_b = \frac{w^{-}_b + w^{+}_b}{2}$. This means that $w^{=}_c <$ [...] at least one pair of rays $\{b, i\}$ and $\{b, j\}$ for which $w_{bi} < w_{bj}$. Thus, this shows that $w^{*}_c > w^{=}_c$, and a contradiction is reached.

[...] (since the right-hand side is independent of $i$ and $j$). Thus, again a contradiction is reached. □

Figure 2: This graph is an example where an optimal solution of Problem LP2 (with $d = 2$) exists that is not constant within a bun[...]e to both triangle cliques (the one with node[...] the one with nodes $z_i$). Its rays to both bundles constrain ea[...]se nodes. This is achieved by assigning strengths of 1 to $y$'s rays to $z_i$, and 0 to $y$'s rays to $x_i$. Then the triangle edges in the $z$ triangle clique can have strength 3, and the strengths betwee[...] constrain each other in wedges $(x_i, \{b_1, b_2\})$, such that edges from $b_1$ and $b_2$ to the same $x_i$ must sum to 1 at the optimum. Furthermore, triangles $\{b_i, x_j, x_k\}$ impose a constraint on the strength of those edges as $w_{b_i x_j} + w_{x_j x_k} \le 2 + d \cdot w_{b_i x_k}$. For $d = 2$ and $w_{x_j x_k} = 2$, this gives $w_{b_i x_j} \le 2 \cdot w_{b_i x_k}$. No other constraints apply. Thus, the (unequal) strengths for the edges in the bundles from $b_1$ and $b_2$ shown in the figure are feasible. Moreover, this particular optimal solution is a vertex point of the feasible polytope (proof not given). [...] 1/2 for each of those edges is also feasible.

Note that there do exist graphs for which not all optimal solutions have equal strengths within a bundle.An example is shown in Fig. 2.


An equivalent formulation for finding symmetric optima of Problem LP2

Solutions that lack the symmetry properties [...] solutions that exhibit these symmetries. In addition, exploiting symmetry leads to fewer variables, and thus to computational-efficiency gains.

In this section, we will refer to strength assignments that are invariant with respect to permutations within triangle cliques as symmetric, for short. The results here apply only to Problem LP2. The set of free variables consists of one variable per triangle clique, one variable per bundle, and one variable per edge that is neither a triangle edge nor a ray in a bundle. To reformulate Problem LP2 in terms of this reduced set of variables, it is convenient to introduce the contracted graph, defined as the graph obtained by edge-contracting all triangle edges in $G$. More formally:

Definition 3.13 (Contracted graph). Let $\sim$ denote the equivalence relation between nodes defined as $i \sim j$ if and only if $i$ and $j$ are connected by a triangle edge. Then, the contracted graph $\widetilde{G} = (\widetilde{V}, \widetilde{E})$ with $\widetilde{E} \subseteq \binom{\widetilde{V}}{2}$ is defined as the graph for which $\widetilde{V} = V/\!\sim$ (the quotient set of $\sim$ on $V$), and for any $A, B \in \widetilde{V}$, it holds that $\{A, B\} \in \widetilde{E}$ if and only if for all $i \in A$ and $j \in B$ it holds that $\{i, j\} \in E$.

[Figure 3: the contracted graph, with nodes $\{b_1\}$, $\{b_2\}$, $\{x_i \mid i = 1{:}3\}$, $\{y\}$, $\{z_i \mid i = 1{:}6\}$.]
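A contracted graph in the sense of Definition 3.13 can be computed by merging the end-points of all triangle edges and then connecting two merged node sets whenever all of their members are pairwise connected in $G$. The sketch below follows that recipe on a small toy graph; the function name, the label-propagation step and the example are assumptions, not the paper's implementation.

```python
from itertools import combinations

def contracted_graph(edges):
    """Return (nodes, edges) of the contracted graph; each node is a frozenset of original nodes."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    in_triangle, in_wedge = set(), set()
    for i, nbrs in adj.items():
        for j, k in combinations(sorted(nbrs), 2):
            pair = (frozenset((i, j)), frozenset((i, k)))
            (in_triangle if k in adj[j] else in_wedge).update(pair)
    triangle_edges = in_triangle - in_wedge

    # Merge the end-points of every triangle edge by propagating the smallest label.
    label = {v: v for v in adj}
    changed = True
    while changed:
        changed = False
        for e in triangle_edges:
            a, b = tuple(e)
            low = min(label[a], label[b])
            if label[a] != low or label[b] != low:
                label[a] = label[b] = low
                changed = True
    classes = {}
    for v, l in label.items():
        classes.setdefault(l, set()).add(v)
    nodes = [frozenset(c) for c in classes.values()]

    # {A, B} is an edge iff every node of A is connected to every node of B.
    c_edges = {frozenset((A, B)) for A, B in combinations(nodes, 2)
               if all(j in adj[i] for i in A for j in B)}
    return nodes, c_edges

# Hypothetical toy graph: triangle x-y-z, node w tied to x, y, z, and node u tied to w.
edges = [("x", "y"), ("y", "z"), ("x", "z"),
         ("w", "x"), ("w", "y"), ("w", "z"), ("w", "u")]
nodes, c_edges = contracted_graph(edges)
print(nodes, c_edges)
```

Here the triangle clique with nodes x, y, z collapses into a single node, so the contracted graph is a path on three nodes.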

Figure 3 illustrates these definitions for the graph from Fig. 2. We now introduce a vector $\mathbf{w}^t$ indexed by sets $A \subseteq V$, with $|A| \ge 2$, with $w^t_A$ denoting the strength of the edges in the triangle clique $A \subseteq V$. We also introduce a vector $\mathbf{w}^w$ indexed by unordered pairs $\{A, B\} \in \widetilde{E}$, with $w^w_{AB}$ denoting the strength of the wedge edges between nodes in $A \subseteq V$ and $B \subseteq V$. Note that if $|A| \ge 2$ or $|B| \ge 2$, these edges are rays in a bundle.

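With this notation, we can state the symmetrized problem as:

$$\max_{\mathbf{w}^t, \mathbf{w}^w} \; \sum_{A \in \widetilde{V} : |A| \ge 2} \frac{|A|(|A| - 1)}{2} \, w^t_A + \sum_{\{A, B\} \in \widetilde{E}} |A||B| \, w^w_{AB}, \tag{LP2sym}$$

such that

$$w^w_{AB} + w^w_{AC} \le 1, \quad \text{for all } (A, \{B, C\}) \in \widetilde{\mathcal{W}}, \tag{11}$$

$$w^t_A \le 2 + (d - 1) \cdot w^w_{AB}, \quad \text{for all } \{A, B\} \in \widetilde{E}, \; |A| \ge 2, \tag{12}$$

$$w^t_A \le \frac{2}{2 - d} \; (\text{if } d < 1), \quad \text{for all } A \in \widetilde{V}, \; |A| \ge 3, \tag{13}$$

$$w^t_A \ge 0, \quad \text{for all } A \in \widetilde{V}, \; |A| \ge 2, \tag{14}$$

$$w^w_{AB} \ge 0, \quad \text{for all } \{A, B\} \in \widetilde{E}. \tag{15}$$

[...] solutions of Problem LP2). The set of symmetric optimal solutions of Problem LP2 is equivalent to the set of all optimal solutions of Problem LP2sym.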

Proof. It is easy to see that for a symmetric solution, the objective functions of Problems LP2sym and LP2 are identical. Thus, it suffices to show that: (1) the feasible region of Problem LP2sym is contained within the feasible region of Problem LP2, and (2) the set of symmetric feasible solutions of Problem LP2 is contained within the feasible region of Problem LP2sym. The latter is immediate, as all constraints in Problem LP2sym are directly derived from those in Problem LP2 (see the rest of the proof for clarification), apart from the reduction in variables, which does nothing other than impose symmetry.

To show the former, we need to show that all constraints of Problem LP2 are satisfied. This is trivial for the positivity constraints (14) and (15). The wedge inequalities (11) are also accounted for in Problem LP2sym, and thus trivially satisfied, too.

We consider three types of triangle constraints in Problem LP2: those involving two rays from the same bundle and one triangle edge with the triangle edge strength on the left-hand side of the inequality, those involving two rays from the same bundle and one triangle edge with the triangle edge strength on the right-hand side of the inequality, and those involving three triangle edges.

Constraint (12) covers all triangle constraints involving two rays (from the same bundle) and one triangle edge, with the triangle edge strength being upper bounded. Indeed, $w^t_A$ is the strength of the triangle edges between nodes in $A$, and $w^w_{AB}$ is the strength of the edges in the bundle from any node in $B$ to the two nodes connected by any triangle edge in $A$.

Triangle constraints involving two rays and one triangle edge that lower bound the triangle edge strength are redundant and can thus be omitted. Indeed, they can be stated as $w^w_{AB} + w^w_{AB} \le 2 + d \cdot w^t_A$, which is trivially satisfied as each wedge edge has strength at most 1.

Finally, triangle constraints involving three triangle edges within $A$ reduce to $w^t_A + w^t_A \le 2 + d \cdot w^t_A$. For $d \ge 2$ this constraint is trivially satisfied. For $d < 2$ it reduces to $w^t_A \le \frac{2}{2 - d}$. For $2 > d \ge 1$, this constraint is also redundant with the triangle constraints involving the triangle edge and two rays, which imply an upper bound of at most $d + 1 \le \frac{2}{2 - d}$ for $d \ge 1$ (namely for the ray strengths equal to 1). Thus, constraint $w^t_A \le \frac{2}{2 - d}$ must be included in Problem LP2sym only for $d < 1$. Finally, note that such triangle constraints are only possible in triangle cliques $A$ with $|A| \ge 3$. □
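The inequality $d + 1 \le \frac{2}{2 - d}$ used in this argument can be checked directly. The derivation below is a short worked verification under the assumption $0 < d < 2$, so that $2 - d > 0$ and multiplying by it preserves the direction of the inequality:

$$d + 1 \le \frac{2}{2 - d} \;\Longleftrightarrow\; (d + 1)(2 - d) \le 2 \;\Longleftrightarrow\; 2 + d - d^2 \le 2 \;\Longleftrightarrow\; d\,(1 - d) \le 0 \;\Longleftrightarrow\; d \ge 1.$$

Hence for $1 \le d < 2$ the bound $w^t_A \le d + 1$, which follows from constraint (12) together with $w^w_{AB} \le 1$, already implies $w^t_A \le \frac{2}{2 - d}$, whereas for $d < 1$ it does not.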


The vertex points of the feasible polytope of Problem LP2

The following theorem generalizes the well-known half-integrality result for Problem LP1 [14] to Problem LP2sym.

Theorem 3.15 (Vertices of the feasible polytope). On the vertex points of the feasible polytope of Problem LP2sym, the strengths of the wedge edges take values $w^w_{AB} \in \{0, \frac{1}{2}, 1\}$, and the strengths of the triangle edges take values $w^t_A \in \{0, 2, \frac{d+3}{2}, d+1\}$ for $d \ge 1$, or $w^t_A \in \{0, \frac{2}{2-d}, d+1, \frac{d+3}{2}, 2\}$ for $d < 1$.

Proof. Assume the contrary, i.e., that a vertex point of this convex feasible polytope can be found that has a different value for one of the edge strengths. To reach a contradiction, we will nudge the wedge edges' strengths $w^w_{AB}$ as follows [...]. Note that $0 \le w^w_{AB} \le 1$ for all wedge edges (due to the wedge constraints in Eq. (11)). Thus, all wedge edge strengths that are not exactly equal to $0$, $\frac{1}{2}$, or $1$ will be nudged. For the triangle edges we need to distinguish between $d \ge 1$ and $d < 1$. For $d \ge 1$, we nudge their strengths as follows:
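- if $w^t_A \in (0, 2)$, add $\epsilon$,
- if $w^t_A \in \left(2,\ 2 + (d-1) \cdot \frac{1}{2}\right) = \left(2, \frac{d+3}{2}\right)$, add $(d-1)\epsilon$,
- if $w^t_A \in \left(2 + (d-1) \cdot \frac{1}{2},\ 2 + (d-1) \cdot 1\right) = \left(\frac{d+3}{2}, d+1\right)$, subtract $(d-1)\epsilon$.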
For $d < 1$, we nudge the strengths as follows:
- if $w^t_A \in \left(0, \frac{2}{2-d}\right) \cup \left(\frac{2}{2-d}, d+1\right)$, add $\epsilon$,
- if $w^t_A \in \left(2 + (d-1) \cdot 1,\ 2 + (d-1) \cdot \frac{1}{2}\right)$ [...] (due to the triangle constraints in Eq. (12) [...] edge strengths that are not one of the values specified in the theorem statement will be nudged.

For sufficiently small $|\epsilon|$ no loose constraint will become invalid by this. Furthermore, it is easy to verify that strengths in tight constraints are nudged in corresponding directions, such that all tight constraints remain tight and thus valid. Now, this nudging can be done for positive and negative $\epsilon$, yielding two new feasible solutions of which the average is the supposed vertex point of the polytope, a contradiction. □

Corollary 3.16. On the vertices of the optimal face of the feasible polytope of Problem LP2sym, the strengths of the wedge edges take values $w^w_{AB} \in \{0, \frac{1}{2}, 1\}$, and the strengths of the triangle edges take values $w^t_A \in \{2, \frac{d+3}{2}, d+1\}$ if $d \ge 1$, or $w^t_A \in \{\frac{2}{2-d}, d+1, \frac{d+3}{2}, 2\}$ if $d < 1$. Moreover, for $d < 1$, triangle edge strengths for $|A| \ge 3$ are all equal to $w^t_A = \frac{2}{2-d}$ throughout the optimal face of the feasible polytope.

Proof. A strength of 0 for a triangle edge can never be optimal, as triangle edges are upper bounded by at l [...] ion of the edge strengths. The second statement follows from the fact that $\frac{2}{2-d}$ is the smallest possible value for triangle edges when $d < 1$, and Eq. (13) bounds the triangle edge strengths in triangle cliques $A$ with $|A| \ge 3$ to that value. Thus, it is the only possible value for the vertex points of the optimal face of the feasible polytope, and thus for that entire optimal face. □

This means that there always exists an optimal solution to Problem LP2sym where the edge strengths belong to these small sets of possible values. Note that the symmetric optima of Problem LP2 coincide with those of Problem LP2sym, such that this result also applies to the symmetric optima of LP2.
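To make these value sets concrete, the following minimal sketch evaluates them for a given $d$ using exact rational arithmetic. The function names are hypothetical and chosen for illustration only; the sets themselves are those stated in Theorem 3.15 and Corollary 3.16.

```python
from fractions import Fraction

# Wedge-edge strengths at vertex points (Theorem 3.15).
WEDGE_VALUES = (Fraction(0), Fraction(1, 2), Fraction(1))


def triangle_vertex_values(d):
    """Candidate triangle-edge strengths at vertex points (Theorem 3.15)."""
    d = Fraction(d)
    if d >= 1:
        values = {Fraction(0), Fraction(2), (d + 3) / 2, d + 1}
    else:
        values = {Fraction(0), 2 / (2 - d), d + 1, (d + 3) / 2, Fraction(2)}
    return sorted(values)  # duplicates collapse, e.g. for d = 1


def optimal_triangle_values(d):
    """The same sets with 0 removed (Corollary 3.16)."""
    return [v for v in triangle_vertex_values(d) if v != 0]


for d in (Fraction(1, 2), 1, 2):
    print(d, [str(v) for v in triangle_vertex_values(d)])
```

For $d = 1$ the three nonzero candidates coincide at 2, so on the optimal face every triangle edge has strength exactly 2 in that case.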


ALGORITHMS

In this section we discuss algorithms for solving the edge-strength inference problems LP1, LP2, LP3, and LP4. The final subsection 4.3 also discusses a number of ways to further reduce the arbitrariness of the optimal solutions.


Using generic LP solvers

First, all proposed formulations are linear programs (LP), and thus, standard LP solvers can be used. In our experimental evaluation we used CVX [7] from within Matlab, and MOSEK [1] as the solver that implements an interior-point method.

Interior-point algorithms for LP run in polynomial time, namely in $O(n^3 L)$ operations, where $n$ is the number of variables, and $L$ is the number of digits in the problem specification [13]. For our problem formulations, $L$ is proportional to the number of constraints. Today, the development of primal-dual methods and practical improvements ensure convergence that is often much faster than this worst-case complexity. Alternatively, one can use the Simplex algorithm, which has worst-case exponential running time, but is known to yield excellent performance in practice [19].


Using the Hochbaum-Naor algorithm

For rational $d$, we can exploit the special structure of Problems LP1 and LP2 and solve them using more efficient combinatorial algorithms. In particular, the algorithm of Hochbaum and Naor [11] is designed for a family of integer problems named 2VAR problems. 2VAR problems are integer programs (IP) with 2 variables per constraint of the form $a_k x_{i_k} - b_k x_{j_k} \ge c_k$ with rational $a_k$, $b_k$, and $c_k$, in addition to integer lower and upper bounds on the variables. A 2VAR problem is called monotone if the coefficients $a_k$ and $b_k$ have the same sign. Otherwise the IP is called non-monotone.

The algorithm of Hochbaum and Naor [11] gives an optimal integral solution for monotone IPs and an optimal half-integral solution for non-monotone IPs. The running time of the algorithm is pseudopolynomial: polynomial in the range (difference between lower bound and upper bound) of the variables. More formally, it is $O(n \Delta^2 (n + r))$, where $n$ is the number of variables, $r$ is the number of constraints, and $\Delta$ is the maximum range size. For completeness, we briefly discuss the problem and algorithm for solving it below.

The monotone case. We first consider an IP with monotone inequalities:

$$\max \sum_{i=1}^{n} d_i x_i, \qquad \text{(monotone IP)}$$
such that
$$a_k x_{i_k} - b_k x_{j_k} \ge c_k \quad \text{for } k = 1, \ldots, r, \qquad (16)$$
$$\ell_i \le x_i \le u_i, \quad x_i \in \mathbb{Z}, \quad \text{for } i = 1, \ldots, n, \qquad (17)$$

where $a_k$, $b_k$, $c_k$, and $d_i$ are rational, while $\ell_i$ and $u_i$ are integral. The coefficients $a_k$ and $b_k$ have the same sign, and $d_i$ can be negative. The algorithm is based on constructing a weighted directed graph $G' = (V', E')$ and finding a minimum s-t cut on $G'$.

For the construction of the graph $G'$, for each variable $x_i$ in the IP we create a set of $(u_i - \ell_i + 1)$ nodes $\{v_{ip}\}$, one for each integer $p$ in the range $[\ell_i, u_i]$. An auxiliary source node $s$ and a sink node $t$ are [...] The edges of $G'$ are created as follows: First, we connect the source $s$ to all nodes $v_{ip} \in V^+$, with $\ell_i + 1 \le p \le u_i$. We also connect all nodes $v_{ip} \in V^-$, with $\ell_i + 1 \le p \le u_i$, to the sink node $t$. All these edges have weight $|d_i|$. The rest of the edges described below have infinite weight.

For the rest of the graph, we add edges from $s$ to all nodes $v_{ip}$ with $p = \ell_i$, both in $V^+$ and $V^-$. For all $\ell_i + 1 \le p \le u_i$, the node $v_{ip}$ is connected to $v_{i(p-1)}$ by a directed edge. Let $q_k(p) = \left\lceil \frac{c_k + b_k p}{a_k} \right\rceil$. For each inequality $k$ we connect node $v_{j_k p}$, corresponding to $x_{j_k}$ with $\ell_{j_k} \le p \le u_{j_k}$, to the node $v_{i_k q}$, corresponding to $x_{i_k}$, where $q = q_k(p)$. If $q_k(p)$ is below the feasible range $[\ell_{i_k}, u_{i_k}]$, then the edge is not needed. If $q_k(p)$ is above this range, then node $v_{j_k p}$ must be connected to the sink $t$.
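The edges generated by a single inequality can be sketched as follows. This is a minimal illustration under the assumption $a_k > 0$, so that $q_k(p)$ is the smallest integer value of $x_{i_k}$ compatible with $x_{j_k} = p$. The function names and the node labels `("i", q)`, `("j", p)` and `"t"` are hypothetical and not part of the algorithm's original description.

```python
from fractions import Fraction
from math import ceil


def q(a, b, c, p):
    """q_k(p) = ceil((c + b*p) / a), computed exactly; assumes a > 0."""
    return ceil(Fraction(c + b * p) / Fraction(a))


def implication_edges(a, b, c, range_i, range_j):
    """Edges induced by one inequality a*x_i - b*x_j >= c.

    range_i and range_j are the inclusive integer bounds (lower, upper)
    of x_i and x_j. Returns (tail, head) pairs; head is "t" for the sink.
    """
    lo_i, hi_i = range_i
    lo_j, hi_j = range_j
    edges = []
    for p in range(lo_j, hi_j + 1):
        target = q(a, b, c, p)
        if target < lo_i:
            continue  # below the feasible range: the edge is not needed
        if target > hi_i:
            edges.append((("j", p), "t"))  # above the range: connect to sink
        else:
            edges.append((("j", p), ("i", target)))
    return edges


# Example: x_i - x_j >= 1 with both variables in {0, 1, 2}.
print(implication_edges(1, 1, 1, (0, 2), (0, 2)))
```

In the example, $x_j = 2$ would force $x_i \ge 3$, which is outside the range of $x_i$, so the corresponding node is connected to the sink.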

Hochbaum and Naor [11] show that the optimal solution of (monotone IP) can be derived from the source set $S$ of minimum s-t cuts on the graph $G'$ by setting x_i = max{p | [...] number of variables in the (monotone IP) problem, $r$ is the number of constraints, and $\Delta$ is the maximum range size $\Delta = \max_{i \in [1,n]} (u_i - \ell_i + 1)$. Recall that in Problem LP1 the number of variables is equal to the number of edges in the original graph $m$, the number of constraints is equal to the number of wedges $|W|$, while the range size $\Delta$ is constant.

Note also that despite the relatively high worst-case complexity, in practice the graph is sparse and finding the cut is fast.

Monotonization and half-integrality. A non-monotone IP with two variables per constraint is NP-hard. Edelsbrunner et al. [5] showed that a non-monotone IP with two variables per constraint has half-integral solutions, which can be obtained by the following monotonization procedure. Consider a non-monotone IP:
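$$\max \sum_{i=1}^{n} d_i x_i, \qquad \text{(non-monotone IP)}$$
such that
$$a_k x_{i_k} + b_k x_{j_k} \ge c_k \quad \text{for all } k = 1, \ldots, m, \qquad (18)$$
$$\ell_i \le x_i \le u_i, \quad x_i \in \mathbb{Z}, \quad \text{for all } i = 1, \ldots, n, \qquad (19)$$
with no constraints on the signs of $a_k$ and $b_k$.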

For monotonization we replace each variable $x_i$ by $x_i = \frac{x_i^+ - x_i^-}{2}$, where $\ell_i \le x_i^+ \le u_i$ and $-u_i \le x_i^- \le -\ell_i$. Each non-monotone inequality ($a_k$ and $b_k$ having the same sign) $a_k x_{i_k} + b_k x_{j_k} \ge c_k$ is replaced by a pair:
$$a_k x_{i_k}^+ - b_k x_{j_k}^- \ge c_k, \qquad (20)$$
$$-a_k x_{i_k}^- + b_k x_{j_k}^+ \ge c_k. \qquad (21)$$
Each monotone inequality $\bar{a}_k x_{i_k} - \bar{b}_k x_{j_k} \ge c_k$ is replaced by:
$$\bar{a}_k x_{i_k}^+ - \bar{b}_k x_{j_k}^+ \ge c_k, \qquad (22)$$
$$-\bar{a}_k x_{i_k}^- + \bar{b}_k x_{j_k}^- \ge c_k. \qquad (23)$$
[...] problem (non-monotone IP).
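The monotonization step can be sketched in a few lines. The encoding below is an assumption made for illustration: each constraint is a tuple `(a, i, b, j, c)` standing for $a x_i + b x_j \ge c$, so that a constraint is non-monotone when `a` and `b` have the same sign, and the split variables are labelled `("+", i)` and `("-", i)`. The helper names are hypothetical.

```python
def monotonize(constraints):
    """Split each a*x_i + b*x_j >= c into two monotone constraints.

    Output tuples (c1, var1, c2, var2, c) stand for c1*var1 + c2*var2 >= c
    over the split variables ("+", i) and ("-", i).
    """
    out = []
    for a, i, b, j, c in constraints:
        if a * b > 0:  # same sign in the "+" form: non-monotone
            out.append((a, ("+", i), -b, ("-", j), c))
            out.append((-a, ("-", i), b, ("+", j), c))
        else:  # opposite signs: already monotone
            out.append((a, ("+", i), b, ("+", j), c))
            out.append((-a, ("-", i), -b, ("-", j), c))
    return out


def is_monotone(constraint):
    """In the "+" form, monotone means coefficients of opposite sign."""
    c1, _, c2, _, _ = constraint
    return c1 * c2 < 0


def recover(x_plus, x_minus):
    """Map split variables back: x_i = (x_i^+ - x_i^-) / 2."""
    return {i: (x_plus[i] - x_minus[i]) / 2 for i in x_plus}


# A wedge constraint w12 + w23 <= 1, written as -w12 - w23 >= -1.
wedge = [(-1, "w12", -1, "w23", -1)]
print(monotonize(wedge))
print(recover({"w12": 1, "w23": 0}, {"w12": 0, "w23": 0}))
```

The last line shows how a half-integral value arises: an integral assignment with $x^+ = 1$ and $x^- = 0$ maps back to a strength of $\frac{1}{2}$.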

LP1 is a (non-monotone) 2VAR system, so that it can directly be solved by the [...] m, such that the Hochbaum and Naor algorithm is not directly applicable. Yet for integer $d \ge 1$, Problem LP2sym is a [...] upper bound is $\max\{2, d + 1\}$, which may be [...] problem in terms of $a \cdot w_{ij}$ for $a$ the smallest integer for which $a \cdot d$ is integer turns it into a 2VAR problem again. Thus, for $d$ rational, finding one of the symmetric solutions of Problem LP2 can be done using Hochbaum and Naor's algorithm. Moreover, this symmetric solution will immediately be one of the half-integral solutions we know exist from Corollary 3.16.


Approaches for further reducing arbitrariness

As pointed out in Sec. 3.3, Problem LP2sym does not impose symmetry with respect to all graph automorphisms, as it would be impractical to enumerate them. However, in Sec. 4.3.1 below we discuss an efficient (polynomial-time) algorithm that is able to find a solution that satisfies all such symmetries, without the need to explicitly enumerate all graph automorphisms. Furthermore, in Sec. 4.3.2 we discuss a strategy for reducing arbitrariness that is not based on finding a fully symmetric solution. This alternative strategy is to characterize the entire optimal face of the feasible polytope, rather than selecting a single optimal (symmetric) solution from it. We furthermore propose a number of different algorithms implementing this strategy, which run in polynomial time as well.

Several algorithms discussed below exploit the following characterization of the optimal face. As an example, and with $o^*$ the value of the objective at the optimum, for Problem LP2sym this characterization is:
$$P^* = \Big\{ w \ \Big|\ \sum_{\{i,j\} \in E} w_{ij} = o^*, \quad w_{ij} + w_{ik} \le 1 \ \text{for all } (i, \{j,k\}) \in W, \quad w_{ij} + w_{ik} \le 2 + d \cdot w_{jk} \ \text{for all } \{i,j,k\} \in T, \quad w_{ij} \ge 0 \ \text{for all } \{i,j\} \in E \Big\}.$$
It is trivial to extend this to the optimal faces of the other problems.


4.3.1 Invariance with respect to all graph automorphisms. Here we discuss an efficient algorithm to find a fully symmetric solution, without explicitly having to enumerate all graph automorphisms.

Given the optimal value of the objective function of (for example) Problem LP2, consider the following problem which finds a point in the optimal face of the feasible polytope that minimizes the sum of squares of all edge strengths:
$$\min_{w} \sum_{\{i,j\} \in E} w_{ij}^2 \quad \text{such that } w \in P^*. \qquad \text{(Problem LP2fullsym)}$$
This problem can be solved efficiently using interior point methods.

Theorem 4.1 (Problem LP2fullsym finds a solution symmetric with respect to all graph automorphisms). The edge strength assignments that minimize Problem LP2fullsym are an optimal solution to Problem LP2 that is symmetric with respect to all graph automorphisms.

Proof. Let us denote the optimal vector of weights found by solving Problem LP2fullsym as $w^*$. It is clear that $w^*$ is an optimal solution to Problem LP2, as it is constrained to be such. Now, assume there is a graph automorphism $\alpha \in A$ with respect to which it is not symmetric, such that there exists a set of edges $\{i, j\} \in E$ for which $w^*_{ij} \ne w^*_{\alpha(i)\alpha(j)}$. Due to convexity, $w^{**}$ with $w^{**}_{ij} = \frac{w^*_{ij} + w^*_{\alpha(i)\alpha(j)}}{2}$ is then also a feasible solution to Problem LP2fullsym and thus an optimal solution to Problem LP2. However, since $a^2 + b^2 \ge 2 \left(\frac{a+b}{2}\right)^2$ for any $a, b \in \mathbb{R}$, with equality only if $a = b$, $w^{**}$ has a smaller value for the objective of Problem LP2fullsym, such that $w^*$ cannot be optimal, a contradiction. □

For simplicity of notation, we explained this strategy for Problem LP2, but of course it is computationally more attractive to seek a solution within the optimal face of the feasible polytope for Problem LP2sym.
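The averaging argument in the proof can be checked numerically on a small case. The sketch below uses a 4-cycle, which has no triangles, and assumes that the alternating assignment (strengths 1, 0, 1, 0 around the cycle) is an optimal solution under the wedge constraints. The graph, the rotation `alpha` and the helper names are illustrative choices, not taken from the experiments.

```python
def key(i, j):
    """Canonical (sorted) representation of an undirected edge."""
    return (min(i, j), max(i, j))


def average_with_image(w, alpha):
    """Average w with its image under the node permutation alpha."""
    return {(i, j): (w[(i, j)] + w[key(alpha[i], alpha[j])]) / 2
            for (i, j) in w}


def sum_of_squares(w):
    return sum(v * v for v in w.values())


# 4-cycle 1-2-3-4-1 with an asymmetric optimal assignment.
w_star = {(1, 2): 1.0, (2, 3): 0.0, (3, 4): 1.0, (1, 4): 0.0}
rotation = {1: 2, 2: 3, 3: 4, 4: 1}  # an automorphism of the cycle

w_avg = average_with_image(w_star, rotation)
print(w_avg)
print(sum(w_star.values()), sum(w_avg.values()))    # same linear objective
print(sum_of_squares(w_star), sum_of_squares(w_avg))  # squares decrease
```

The averaged assignment keeps the value of the linear objective but has a strictly smaller sum of squares, which is exactly why an asymmetric point cannot minimize Problem LP2fullsym.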


4.3.2 Characterizing the entire optimal face of the feasible polytope. Here, we discuss an alternative strategy for reducing arbitrariness: characterizing the entire optimal face of the feasible polytope of the proposed problem formulations, rather than selecting a single (possibly arbitrary) optimal solution from it. Specifically, we propose three algorithmic implementations of this strategy.

The first algorithmic implementation of this strategy exactly characterizes the range of the strength of each edge amongst the optimal solutions. This range can be found by solving, for the edge strength $w_{ij}$ of each $\{i, j\} \in E$, two optimization problems: $\max_w w_{ij}$ such that $w \in P^*$, and $\min_w w_{ij}$ such that $w \in P^*$. Each of these is again an LP, and thus requires polynomial time. Yet, it is clear that this approach is impractical, as the number of such optimization problems to be solved is twice the number of variables in the original problem.
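For a very small graph, these per-edge ranges can be illustrated without an LP solver. The sketch below replaces the $2|E|$ LPs by brute-force enumeration of half-integral points, which is only valid under two assumptions: that LP1 maximizes the sum of strengths subject to the wedge constraints and $0 \le w_{ij} \le 1$, and that its vertices are half-integral (the result cited from [14]), so that the extremes over the optimal face are attained at half-integral points. The 4-cycle and its hand-listed wedges are illustrative; the enumeration is exponential in $|E|$ and is not a substitute for the LPs on real networks.

```python
from itertools import product

# 4-cycle 1-2-3-4-1: every node is the root of exactly one wedge.
edges = [(1, 2), (2, 3), (3, 4), (1, 4)]
wedges = [((1, 2), (1, 4)), ((1, 2), (2, 3)),
          ((2, 3), (3, 4)), ((3, 4), (1, 4))]


def feasible(w):
    """Check all wedge constraints w_e + w_f <= 1."""
    return all(w[e] + w[f] <= 1 for e, f in wedges)


# Enumerate all feasible half-integral assignments.
candidates = []
for values in product((0.0, 0.5, 1.0), repeat=len(edges)):
    w = dict(zip(edges, values))
    if feasible(w):
        candidates.append(w)

o_star = max(sum(w.values()) for w in candidates)
optimal = [w for w in candidates if sum(w.values()) == o_star]

# Range of each edge strength over the optimal half-integral points.
ranges = {e: (min(w[e] for w in optimal), max(w[e] for w in optimal))
          for e in edges}
print(o_star, ranges)
```

On this graph every edge has the full range from 0 to 1 across the optimal solutions, so any single binary labeling would be an arbitrary choice.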

The second algorithmic implementation of this strategy is computationally much more attractive, but quantifies the range of each edge strength only partially. It exploits the fact that the strengths at the vertex points of the optimal face belong to a finite set of values. Thus, given any optimal solution, we can be sure that for each edge there exists an optimal solution in which that edge's strength is equal to the smallest value within that set equal to or exceeding its value in the given solution, as well as one in which it is equal to the largest value within that set equal to or smaller than its value in the given solution. To ensure this range is as large as possible, it is beneficial to avoid finding vertex points of the feasible polytope, and more generally points that do not lie within the relative interior of the optimal face. This can be done in the same polynomial time complexity as solving the LP itself, namely $O(n^3 L)$ where $L$ is the input length of the LP [13]. This could be repeated several times with different random restarts to yield wider intervals for each edge strength.
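The interval obtained for a single edge under this second implementation can be sketched as a lookup in the sorted set of admissible vertex values. The function name `bracket` is hypothetical, and the example uses the wedge-edge values $\{0, \frac{1}{2}, 1\}$; in practice a small numerical tolerance would be needed before the lookup, since interior-point solvers return approximate values.

```python
from bisect import bisect_left, bisect_right


def bracket(value, allowed):
    """Return (largest allowed value <= value, smallest allowed value >= value)."""
    allowed = sorted(allowed)
    lo_idx = bisect_right(allowed, value) - 1
    hi_idx = bisect_left(allowed, value)
    if lo_idx < 0 or hi_idx >= len(allowed):
        raise ValueError("value lies outside the range of allowed strengths")
    return allowed[lo_idx], allowed[hi_idx]


wedge_values = [0.0, 0.5, 1.0]
print(bracket(0.3, wedge_values))  # strength strictly between two levels
print(bracket(0.5, wedge_values))  # strength exactly on a level
```

A strength that already sits on one of the admissible levels yields a degenerate interval, which is why solutions in the relative interior of the optimal face are more informative here.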

The third implementation is to uniformly sample points (i.e., optimal solutions) from the optimal face $P^*$. A recent paper [3] details an MCMC algorithm with polynomial mixing time for achieving this.


EMPIRICAL RESULTS

This section contains the main empirical findings. The code used in the experiments is available at https://bitbucket.org/ghentdatascience/stc-code-public.


Qualitative analysis

To gain some insight into our methods, we start by discussing a simple toy example. Figure 4 shows a network of 8 nodes, modelling a scenario of 2 communities being connected by a bridge, i.e., the edge {4, 5}. The nodes {1, 2, 3, 4} form a near-clique (the edge {1, 3} is missing), while the nodes {5, 6, 7, 8} form a 4-clique. This 4-clique contains a triangle clique: the subgraph induced by the nodes {6, 7, 8}. Triangle edges are colored orange in the figure. Fig. 4a contains a solution to STCbinary. Fig. 4b shows a half-integral optimal solution to Problem LP1. We observe that for STCbinary we could swap nodes 1 and 3 and obtain a different yet equally good solution, hence the strength assignment is arbitrary with respect to several edges, while for LP1 this is not the case. Indeed, there is no evidence to prefer a strong label for edges {2, 3} and {3, 4} over the edges {1, 2} and {1, 4}.

Figure 4c shows a symmetric optimal solution to Problem LP2, allowing for multi-level edge strengths.It labels the triangle edges as stronger than all other (wedge) edges, in accordance with Theorem 3.12 and Corollary 3.16.

Finally, Figure 4d shows the outcome of LP4 for $d = 1$ and $C = 1$, allowing for edge additions and deletions. For $C = 0$, the problem becomes unbounded: the edge {4, 5} is only part of wedges, and since wedge violations are unpenalized, $w_{45} = +\infty$ is the best solution (see Section 2.2.2). Since this edge is part of 6 wedges, the problem becomes bounded for $C > 1/6$. For $C = 1$, the algorithm produces a value of 2 for the absent edge {1, 3}. This suggests the addition of an edge {1, 3} with strength 2 to the network, in order to increase the objective function. This is the only edge being suggested for addition by the algorithm. Edge {4, 5}, on the other hand, is given a value of −1. As discussed in Section 2.2.2, this corresponds to the strength of an absent edge (when $d = 1$), suggesting the removal of the bridge in the network in order to increase the objective.
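The wedge counts used in this discussion can be reproduced directly from the textual description of the toy network. The sketch below rebuilds the edge list from that description (near-clique {1, 2, 3, 4} without {1, 3}, 4-clique {5, 6, 7, 8}, bridge {4, 5}), which is an assumption insofar as Figure 4 itself is not available here; the variable and function names are illustrative.

```python
from itertools import combinations

edges = [(1, 2), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]

adjacency = {}
for i, j in edges:
    adjacency.setdefault(i, set()).add(j)
    adjacency.setdefault(j, set()).add(i)

# A wedge (root, {j, k}): root is adjacent to j and k, but {j, k} is absent.
wedges = [(root, (j, k))
          for root, nbrs in adjacency.items()
          for j, k in combinations(sorted(nbrs), 2)
          if k not in adjacency[j]]


def wedge_count(i, j):
    """Number of wedges that contain the edge {i, j}."""
    return sum(1 for root, ends in wedges
               if (root == i and j in ends) or (root == j and i in ends))


# Triangle edges are the edges that are part of no wedge.
wedge_edges = {frozenset((root, end)) for root, ends in wedges for end in ends}
triangle_edges = [e for e in edges if frozenset(e) not in wedge_edges]

print(len(wedges), wedge_count(4, 5), triangle_edges)
```

The bridge {4, 5} appears in three wedges rooted at node 4 and three rooted at node 5, and the only edges outside all wedges are those of the triangle clique {6, 7, 8}.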

For large $C$ there will be no more edge additions being suggested, as can be seen by setting $C = \infty$ in LP4-STC (reducing it to LP3-STC). The cost of a violation of a wedge constraint will always be higher than the possible benefits. However, regardless of the value of $C$, the edge {4, 5} is always being suggested for edge deletion. A further illustration on a more realistic network is given in Fig. 5, which shows the edge strengths assigned by STCbinary (1st), LP1 (2nd), LP2 with $d = 1$ (3rd), and LP4 with $d = 1$ and $C = 1$ (4th). Also here, we see that STCbinary is forced to make arbitrary choices, while LP1 and LP2 avoid this by making use of an intermediate level. Densely-connected parts of the graph tend to contain edges marked as strong, with an extra level of strength for LP2 assigned to the triangle edges. In comparison with LP2, LP4 suggests removing many weak edges (weight 0 in LP2) that act as bridges between the communities, in order to allow a stronger labeling in the densely-connected regions. Besides edge removal, it also suggests the addition of edges in near-cliques to form full cliques.


Objective performance analysis

We evaluate our approaches in a similar manner as Sintos and Tsaparas [18]. In particular, we investigate whether the optimal strength assignments correlate with externally provided ground truth measures of tie strength, on a number of networks for which such information is available. Table 1 shows a summary of the dataset statistics and edge weight interpretations. We compare the algorithms STCbinary Greedy (which Sintos and Tsaparas found to perform best), LP1 and LP2. For each dataset, the first row in Table 2 displays the number of edges that are assigned to each strength category. The second row shows the mean ground truth weight over the edges given that label by the respective algorithm.
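The two rows reported per dataset can be computed as in the following minimal sketch. The edge labels and ground-truth weights in the example are made-up toy values used purely for illustration, not data from any of the evaluated networks, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict


def summarize(assigned, ground_truth):
    """Per strength level: (number of edges, mean ground-truth weight)."""
    groups = defaultdict(list)
    for edge, label in assigned.items():
        groups[label].append(ground_truth[edge])
    return {label: (len(weights), sum(weights) / len(weights))
            for label, weights in sorted(groups.items())}


# Toy values only: inferred strength level and external weight per edge.
assigned = {("a", "b"): 0, ("a", "c"): 0.5, ("b", "c"): 1, ("c", "d"): 1}
truth = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 5, ("c", "d"): 7}

print(summarize(assigned, truth))
```

A method agrees with the ground truth when the mean weight increases with the inferred strength level.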

Les Miserables is a network where STCbinary Greedy is known to perform well [18]. For this dataset, we can clearly see that our methods provide a correct multi-level strength labeling, enabling more refined notions of tie strength.

A second observation is that for the networks KDD, Facebook, Twitter, and Authors, neither the existing nor the newly-proposed methods perform well. This raises the question of whether the STC assumption is valid in these networks with the provided ground truths. That said, it is reassuring to see that our methods work in a robust and fail-safe way in such cases, as indicated by the high number of 1/2 strength assignments. For trust networks in particular, however, it has been described that the STC property is likely to develop due to the transitive property [4]. Indeed, if a user A trusts user B and user B trusts user C, then user A has a basis for trusting user C. The two BitCoin networks are examples of such trust networks. Our methods perform well in identifying some clearly strong and some clearly weak edges, although they again take a cautious approach in assigning an intermediate strength to many edges. Remarkably though, STCbinary Greedy performs poorly on this network, incorrectly labeling many strong edges as weak and vice versa.

Finally, Table 3 reports running times of our CVX/MOSEK and Hochbaum-Naor implementations on a PC with an Intel i7-4800MQ CPU at 2.70GHz and 16 GB RAM. It demonstrates the superior performance of the latter. Remarkably, the Hochbaum-Naor algorithm performs very comparably to the greedy approximation algorithm for STCbinary.


RELATED WORK

This paper builds on the STC principle, which was proposed in sociology by Simmel [17]. Sintos and Tsaparas [18] first considered the problem of labeling the edges of the graph to enforce the STC property: maximize the number of strong edges, such that the network satisfies the STC property. In our work we relax and extend this formulation by introducing new constraints and integer labels. To our knowledge, we are the first to introduce and study such formulations. The work by Sintos and Tsaparas [18] is part of a broader line of active recent research aiming to infer the strength of the links in a social network. E.g., Jones et al. [12] use the frequency of online interaction to predict the strength of ties with high accuracy. Gilbert et al. [6] characterize social ties based on similarity and interaction information. Similarly, Xiang et al. [24] estimate relationship strength from the homophily principle and interaction patterns and extend the approach to heterogeneous types of relationships. Pham et al. [15] incorporate spatio-temporal features of social interactions to increase the accuracy of the inferred tie strength.

A related direction of research focuses solely on inferring types of the links in a network. E.g., Tang et al. [20][21][22] propose a generative statistical model, which can be used to classify heterogeneous relationships. The model relies on social theories and incorporates structural properties of the network and node attributes. Their more recent works can also produce the strength of the predicted types of ties. Backstrom et al. [2] focus on the graph structure to identify a particular type of tie, namely romantic relationships in Facebook.

Most of these works, however, make use of various meta-data and characteristics of social interactions in the networks. In contrast, like Sintos and Tsaparas' work, our aim is to infer the strength of ties solely based on graph structure, and in particular on the STC assumption.

Another recent extension of the work of Sintos and Tsaparas [18] is pursued by Rozenshtein et al. [16]. However, their direction is different: they consider binary strong and weak labeling with additional community connectivity constraints and allow STC violations to satisfy those constraints.


CONCLUSIONS AND FURTHER WORK


Conclusions

We have proposed a sequence of linear programming relaxations of the STCbinary problem introduced by Sintos and Tsaparas [18]. These formulations have a number of advantages, most notably their computational complexity, the fact that they refrain from making arbitrary strength assignments in the presence of uncertainty, and as a result, enhanced robustness. Extensive theoretical analysis of the second relaxation (LP2) has not only provided insight into the solution and the arbitrariness the solution from STCbinary may exhibit, it also yielded a highly efficient algorithm for finding a symmetric (non-arbitrary) optimal strength assignment.

The empirical results confirm these findings. At the same time, they raise doubts about the validity of the STC property in real-life networks, with trust networks appearing to be a notable exception.


Further work

Our research results open up a large number of avenues for further research. The first is to investigate whether more efficient algorithms can be found for inferring the range of edge strengths across the optimal face of the feasible polytope. A related research question is whether the marginal distribution of individual edge strengths, under the uniform distribution of the optimal polytope, can be characterized in a more analytical manner (instead of by uniform sampling). Both these questions seem important beyond the STC problem, and we are unaware of a definite solution to them.

A second line of research is to investigate alternative problem formulations. An obvious variation would be to take into account community structure, and the fact that the STC property probably often fails to hold for wedges that span different communities. A trivial approach would be to simply remove the constraints for such wedges, but more sophisticated approaches could exist. Additionally, it would be interesting to investigate the possibility of allowing for different relationship types and respective edge strengths, requiring the STC property to hold only within each type. Furthermore, the fact that the presented formulations are LPs, combined with the fact that many graph-theoretical properties can be expressed in terms of linear constraints, opens up the possibility to impose additional constraints on the optimal strength assignments without incurring significant computational overhead as compared to the interior point implementation. One line of thought is to impose upper bounds on the sum of edge strengths incident to any given node, modeling the well-known fact that an individual is limited in how many strong social ties they can maintain.

A third line of research is whether an active learning approach can be developed, to quickly reduce the number of edges assigned an intermediate strength by our approaches.

More directly, a fourth line of research concerns the question of whether the theoretical understanding of Problems LP1 and LP2 can be transferred more completely to Problems LP3 and LP4 than achieved in the current paper.

Finally, perhaps the most important line of further research concerns the validity of the STC property: could it be modified so as to become more widely applicable across real-life social networks?

Figure 1: A toy graph illustrating the different types of edges defined in Section 3.1.


Definition 3.3 (Triangle cliques). The connected components in the edge-induced subgraph induced by all triangle edges are called triangle cliques.


Lemma 3.4 (Neighbors of a triangle clique). Consider a triangle clique $C \subseteq V$, and a node $k \in V \setminus C$. Then, either $\{k, i\} \notin E$ for all $i \in C$, or $\{k, i\} \in E$ for all $i \in C$.


(2) All rays in the bundle $b$ with smallest $(d - 1) \cdot w^{=}_b$ have equal strength $w_b = w^{=}_b$ in $w$. In this case, we know that $w_{ij} \le 2 + (d - 1) w^{=}_b$ (due to feasibility of the original optimum). Again, averaging this over all triangle edges $\{i, j\}$ yields $w^{=}_c \le 2 + (d - 1) w^{=}_b$, with equality only if all terms are equal


Figure 3: The contracted graph corresponding to the graph shown in Fig. 2.




particular, problem LP1 has $|E|$ variables and $|W|$ constraints, problem LP2 has $|E|$ variables and $|W| + |T|$ constraints, and problems LP3 and LP4 have $|E| + |\bar{E}|$ variables and $|W| + |T|$ constraints. Here $|E|$ is the number of edges in the input graph, $|W|$ the number of wedges, and $|T|$ the num