On swarm-level resource allocation in BitTorrent communities

BitTorrent is a peer-to-peer computer network protocol for sharing content in an efficient and scalable way. Modeling and analysis of the popular private BitTorrent communities has become an active area of research. In these communities users are strongly incentivized to contribute their resources, i.e., to share their files. In BitTorrent terminology, users who have finished downloading files and stay online to share these files with others in the network are called seeders. The combination of seeders and downloaders of a file is called a swarm. In this paper we examine and evaluate the efficiency of the resource allocation of seeders in multiple swarms. This is formulated as an integer linear fractional programming problem. The evaluation is done on traces representing two existing BitTorrent communities. We find that there is room for improvement, particularly in communities with a low users-to-files ratio (which is typically the case).

distribute files to their peers through swarms. Peers within a swarm are divided into two classes. Those peers who have and share the complete copy of a file are called seeders, while peers who are downloading the file are called leechers. Note that a peer in a community can be both seeder and leecher simultaneously in multiple swarms. Because asymmetric Internet connections prevail among domestic users, the upload bandwidth available in a swarm is frequently a bottleneck for the download speed. Seeders alleviate this problem and are therefore paramount for download performance.
In this paper, we evaluate how well seeder resource allocation (to be defined later in Sect. 2) serves downloaders in two BitTorrent communities. Each seeder currently allocates its resources among previously downloaded files autonomously. It is not clear whether this strategy yields desirable results in practice. We devise here an algorithm which serves as a tool for investigating the margins for further optimization.
Considerable research and development effort has been invested in designing and evaluating incentive mechanisms to promote seeding (e.g. [6,8,10]), but the analysis of the allocation mechanisms at inter-swarm level has received less attention. The paper [9] deals with the problem of channel-resource imbalance in multi-channel peer-to-peer systems, which is similar to the problem considered in this paper; however, the solution provided there is heuristic. The approaches proposed in the papers [3,7] are similar to ours, but the scenarios are studied only under synthetic workloads.
We take the complementary approach of analyzing traces from two BitTorrent communities to contrast the resource allocation mechanisms in normal use with results from optimizing algorithms. As evaluation criterion we consider the average download performance, which indicates the aggregate throughput in the community. Our analysis advances the hypothesis that it is possible to increase download performance in BitTorrent communities through better seeder allocation mechanisms.

Problem formalization
We represent a BitTorrent community at an instant in time as a triplet $(G, L, C)$, which captures the demand in each swarm, the swarms in which each user can seed, and how many swarms each user can seed. The first two aspects are represented by a directed acyclic bipartite graph $G = (T \cup U, E_l \cup E_s)$, where $T = \{t_1, \ldots, t_m\}$ is the set of swarms currently active in the community, $U = \{u_1, \ldots, u_n\}$ is the set of users in that community, $E_l = \{(u, t) : u \in U,\ t \in T \text{ and } u \text{ is leeching in } t\}$, and $E_s = \{(u, t) : u \in U,\ t \in T \text{ and } u \text{ is able to seed } t\}$. A user is able to seed in a swarm if the user has downloaded the corresponding file in the past and has neither deleted it nor configured the BitTorrent software to stop serving it. We denote by $L_i = \{t \mid (u_i, t) \in E_s\}$ the library of user $u_i$; thus $L := \{L_1, \ldots, L_n\}$.
Besides $G$ and $L$, we also need to represent how much users can seed. Each user $u_i$ has a seeding capacity (hereafter capacity for short) $c_i$, providing $C := \{c_1, \ldots, c_n\}$. Note that $0 < c_i \le |L_i|$ holds.
In this paper we investigate a resource allocation problem in which the aim is to find the seeding allocation that maximizes the mean leeching session throughput. We use the proportion of leechers in swarms to approximate throughput. This metric evaluates an allocation focusing on leeching sessions, in the sense that each swarm is characterized by its fixed number of downloaders and its variable number of seeders. Thus, we do not consider which user a leeching session belongs to. In order to formalize this, we introduce further notation. Let $a_{ij}$ be a binary parameter which shows whether seeder $u_i$ has the file corresponding to $t_j$ in the library, i.e.,
$$a_{ij} = \begin{cases} 1 & \text{if } t_j \in L_i, \\ 0 & \text{otherwise.} \end{cases}$$
We define the decision variable $x_{ij}$ to denote whether seeder $u_i$ is seeding in swarm $t_j$:
$$x_{ij} = \begin{cases} 1 & \text{if } u_i \text{ is seeding in } t_j, \\ 0 & \text{otherwise.} \end{cases}$$
Moreover, $\lambda(t_j)$ denotes the number of leechers in swarm $t_j$, i.e., $\lambda(t_j) = |\{(u, t_j) \in E_l\}|$. Our resource allocation problem is to find
$$\max \sum_{j=1}^{m} \lambda(t_j)\, \frac{\sum_{i=1}^{n} a_{ij} x_{ij}}{\lambda(t_j) + \sum_{i=1}^{n} a_{ij} x_{ij}} \quad \text{subject to} \quad \sum_{j=1}^{m} a_{ij} x_{ij} \le c_i \ (i = 1, \ldots, n), \quad x_{ij} \in \{0, 1\}, \tag{1}$$
which is an integer linear fractional programming problem. The term $\sum_{i=1}^{n} a_{ij} x_{ij}$ gives the number of actual seeders in swarm $t_j$. In the objective function, for each swarm we take the number of actual seeders divided by the size of the swarm (so that this ratio is normalized into $(0, 1]$), and this ratio is multiplied by the number of leechers in order to weight each swarm when we calculate the global metric (the sum of the metrics of all swarms). The constraints ensure that the seeders are not seeding more than their capacities.
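As an illustration of how an allocation could be scored, the following minimal sketch evaluates the objective as described in the text (for each swarm, the number of leechers times the number of seeders divided by the swarm size) and checks the capacity constraints. The function names and the dictionary-based data layout are assumptions made for this example only.

```python
from typing import Dict, Set


def objective(leechers: Dict[str, int], seeding: Dict[str, Set[str]]) -> float:
    """Sum over swarms of leechers * seeders / (seeders + leechers).

    leechers: swarm -> number of leechers (lambda in the text)
    seeding:  user  -> set of swarms the user is currently seeding
    """
    # Count the actual seeders of every swarm from the per-user allocation.
    seeders = {t: 0 for t in leechers}
    for swarms in seeding.values():
        for t in swarms:
            seeders[t] = seeders.get(t, 0) + 1
    total = 0.0
    for t, s in seeders.items():
        lam = leechers.get(t, 0)
        if s + lam > 0:  # skip empty swarms to avoid dividing by zero
            total += lam * s / (s + lam)
    return total


def is_feasible(seeding: Dict[str, Set[str]],
                libraries: Dict[str, Set[str]],
                capacities: Dict[str, int]) -> bool:
    """Each user seeds only files from the library, at most `capacity` of them."""
    return all(swarms <= libraries[u] and len(swarms) <= capacities[u]
               for u, swarms in seeding.items())


# Hypothetical toy instance: two swarms, two users.
toy_leechers = {"t1": 3, "t2": 1}
toy_seeding = {"u1": {"t1"}, "u2": {"t1", "t2"}}
toy_value = objective(toy_leechers, toy_seeding)  # 3*2/5 + 1*1/2
```

In the toy instance, swarm `t1` has two seeders and three leechers and swarm `t2` has one of each, so the two terms are 1.2 and 0.5.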
Our aim is to determine the optimal solution of Problem (1) under the typical conditions of real BitTorrent communities. The optimal solution gives us insight into how good the allocation established by the users of BitTorrent communities, who follow no central instructions, actually is, and how far it is from a random allocation. To this end, we first introduce a deterministic algorithm, which is inspired by those used for solving maximum flow problems. Then, we present the datasets obtained from measurements of activities in two BitTorrent communities. The last section gives the numerical results we acquired, followed by concluding remarks.

Algorithm 1 Optimal seeder allocation algorithm
flow in a graph [4,5]. In this section a description of the algorithm is given followed by a proof that it finds the optimal allocation for Problem (1).

Description of the algorithm
Our algorithm takes a BitTorrent community at an instant in time as input and transforms it into a flow network graph, on which it iteratively selects swarms and increases the number of seeders in them until the term of each swarm in the objective function of Problem (1) reaches its maximum value. This provides the optimal solution of Problem (1), where feasibility is ensured by the capacity constraints in the flow network.
In the following, we refer to the lines of Algorithm 1 for the formal description. The input of our algorithm is the triplet $(G, L, C)$, which is transformed into a flow network graph $G^+ = (T \cup U^+, E^+, \xi)$. To this end we keep all the existing edges of the graph $G$ and extend it with a source vertex $s_0$, which is connected to all the seeders in the community (line 1). We define the capacity of the edges in the following way: for the edges between the source vertex $s_0$ and the seeders the capacity equals the seeding capacity, while for all other edges the capacity equals 1 (lines 2-3). This definition of capacities enforces the constraints of Problem (1). The initial flow $f$ through graph $G^+$ is set to 0 (line 4). Note that for the flow function $f : E^+ \to \mathbb{R}$, the property of conservation holds at every user vertex:
$$\sum_{v : (v, u) \in E^+} f(v, u) = \sum_{w : (u, w) \in E^+} f(u, w) \quad \text{for all } u \in U.$$
The algorithm starts processing every swarm in a set $Q$, initialized with the entire set of swarms $T$ (line 6). The main loop (starting at line 7) finds augmenting paths $w$ between the source vertex $s_0$ and the swarms in $Q$, and keeps running while there are swarms to be processed in $Q$. In each iteration, a swarm $t_{\max}$ is chosen such that the biggest increment in the objective value is obtained if one more seeder is added to that swarm (line 8). A path $w$ from $s_0$ to $t_{\max}$ is then constructed (line 9). If there is no such path (i.e., the length of $w$ is zero, as checked in line 10), then $t_{\max}$ is removed from the set $Q$ and the loop condition is evaluated again; otherwise, the flow $f$ is updated along the augmenting path $w$ (lines 13-16). The final allocation is represented by the set of edges $(u, t) \in E_s$ for which $f(u, t) = 1$.
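The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' listing: the per-swarm objective term is taken to be leechers times seeders divided by swarm size, the source vertex is implicit (users with spare capacity are the starting points of the search), paths are found by breadth-first search in the residual graph, and every swarm that occurs in a library is assumed to be a key of `leechers`. All names are hypothetical.

```python
from collections import deque


def marginal_gain(lam, s):
    """Objective increment when a swarm with lam leechers goes from s to s+1 seeders."""
    if lam == 0:
        return 0.0
    return lam * (s + 1) / (s + 1 + lam) - lam * s / (s + lam)


def find_augmenting_path(target, libraries, capacities, assigned):
    """Breadth-first search from the users with spare capacity to `target`."""
    seeders_of = {}  # swarm -> users seeding it (backward residual edges)
    for u, swarms in assigned.items():
        for t in swarms:
            seeders_of.setdefault(t, set()).add(u)
    parent, queue = {}, deque()
    for u in libraries:  # edges leaving the source that are not saturated
        if len(assigned[u]) < capacities[u]:
            parent[("u", u)] = None
            queue.append(("u", u))
    while queue:
        kind, name = queue.popleft()
        if kind == "u":   # forward edges: library files not yet seeded
            neighbours = [("t", t) for t in sorted(libraries[name] - assigned[name])]
        else:             # backward edges: users who could move away
            neighbours = [("u", u) for u in sorted(seeders_of.get(name, ()))]
        for node in neighbours:
            if node in parent:
                continue
            parent[node] = (kind, name)
            if node == ("t", target):
                path = [node]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(node)
    return None


def allocate(libraries, capacities, leechers):
    """Greedy allocation: always serve the swarm with the largest marginal gain."""
    assigned = {u: set() for u in libraries}
    count = {t: 0 for t in leechers}
    active = set(leechers)  # the set Q of the description
    while active:
        t_max = max(sorted(active), key=lambda t: marginal_gain(leechers[t], count[t]))
        path = find_augmenting_path(t_max, libraries, capacities, assigned)
        if path is None:
            active.remove(t_max)
            continue
        for src, dst in zip(path, path[1:]):
            if src[0] == "u":
                assigned[src[1]].add(dst[1])     # push flow on a forward edge
            else:
                assigned[dst[1]].remove(src[1])  # cancel flow on a backward edge
        count[t_max] += 1
    return assigned


# Hypothetical instance in which serving swarm "b" requires rerouting u1 away from "a".
toy_result = allocate({"u1": {"a", "b"}, "u2": {"a"}},
                      {"u1": 1, "u2": 1},
                      {"a": 2, "b": 2})
```

In this toy instance the first augmentation puts `u1` into swarm `a`; the second one reaches swarm `b` only through the path `u2`, `a`, `u1`, `b`, which moves `u1` to `b` and lets `u2` take over `a`, so the seeder count of `a` is unchanged while that of `b` grows by one.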

Optimality and complexity of the algorithm
The objective function of Problem (1) is a sum of non-negative numbers (which we call sharing ratios in the following). Thus, this sum is maximized if its components are maximized. Moreover, similarly to flow conservation in the classical maximum flow algorithm, when the algorithm increases the sharing ratio in the selected swarm $t_{\max}$, it does not decrease it in any other swarm.
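To make the selection rule of the main loop concrete, one can write out the increment obtained by adding one seeder to a swarm. The following worked calculation assumes that the per-swarm term has the form $\lambda s / (\lambda + s)$, with $\lambda$ leechers and $s$ seeders, as the description of the objective suggests; the symbols $s$ and $\Delta$ are introduced here for illustration only.

```latex
\Delta(\lambda, s)
  = \frac{\lambda (s+1)}{\lambda + s + 1} - \frac{\lambda s}{\lambda + s}
  = \frac{\lambda \left[ (s+1)(\lambda + s) - s(\lambda + s + 1) \right]}
         {(\lambda + s)(\lambda + s + 1)}
  = \frac{\lambda^{2}}{(\lambda + s)(\lambda + s + 1)}
```

Under this assumption the increment is positive whenever $\lambda > 0$ and decreases as $s$ grows, so each additional seeder in the same swarm contributes less than the previous one. For example, with $\lambda = 4$ the first seeder adds $16/20 = 0.8$ and the second adds $16/30 \approx 0.53$.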
We now show that if there is no more path $w(s_0, t_{\max})$ available for a selected swarm $t_{\max}$, then the swarm $t_{\max}$ has reached its maximum sharing ratio. To give the proof by contraposition, suppose that we did not reach the maximum sharing ratio in swarm $t_{\max}$. This means that it is possible to put at least one more seeder $u$ into this swarm. Thus, there is at least one seeder whose capacity is not saturated, i.e., $\exists u_j : f(s_0, u_j) < c_j$. Therefore, a direct link exists from seeder $u_j$ to the swarm $t_{\max}$, which provides a $w(s_0, t_{\max})$ path, namely the one consisting of the edges $(s_0, u_j)$ and $(u_j, t_{\max})$. Note that the algorithm does not check a swarm $t$ anymore if there is no $w(s_0, t)$ path to it. Hence, we conclude that when there are no more swarms left in the set $Q$, Algorithm 1 has reached the optimum value for Problem (1).
Regarding the complexity of the algorithm, the main loop has to be executed $|L|$ times, Step 8 can be done in $O(E_s \log E_s)$ time, and the path finding in Step 9 takes $O((n + m)^2)$ time.

Datasets
This section presents the datasets that inform our analysis and the methodology to extract the necessary information from these datasets.
We use data from two BitTorrent communities: Bitsoup and Filelist. These two communities require users to obtain accounts to participate in the community. In this way, they can track user behavior in all swarms over time. Both communities also employ sharing ratio enforcement to promote seeding. This mechanism prevents users who have not uploaded a minimum proportion of the data volume they downloaded from joining new swarms.
Both traces were collected by periodically crawling Web pages in these communities that report statistics. These include, for each user in each swarm, the user name, current uptime and amounts uploaded and downloaded during this uptime. For Bitsoup, these pages were crawled hourly for 64 days; for Filelist, crawling happened on average every six minutes for 93 days. The main characteristics of the resulting datasets are summarized in Table 1. We notice that the two communities are significantly different in every aspect. There are more users in Filelist, but far fewer swarms, and the average number of active swarms, as well as the average number of leeching sessions, are also much lower compared to Bitsoup.

Estimating user capacities and libraries
We consider the capacity of a user at time $\tau$ to be the number of swarms the user is seeding in at $\tau$. To estimate the contents of the libraries of users, we consider two scenarios that bound the worst- and best-case configurations from the allocation perspective. The worst-case scenario, named conservative, considers that a file is in the library of a user at time $\tau$ if that user was observed seeding this file both in the past, $[\tau - w_p, \tau]$, and in the future, $[\tau, \tau + w_f]$. The values $w_p$ and $w_f$ define windows of observation. The second scenario, named optimistic, identifies the best-case configuration. In this scenario, a file is in the library of a user at time $\tau$ if that user is observed seeding it at least once in the past, $[\tau - w_p, \tau]$. Maintaining these windows constant allows us to perform unbiased comparisons of possible allocations at different times and in different traces.
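The two library estimates and the capacity estimate can be sketched as follows. The trace format (a list of `(time, user, swarm)` seeding observations) and all function names are assumptions made for this example; the crawled data may be organised differently.

```python
from typing import List, Set, Tuple

Observation = Tuple[float, str, str]  # (time, user, swarm) seeding sighting


def seen_seeding(obs: List[Observation], user: str, start: float, end: float) -> Set[str]:
    """Swarms in which `user` was observed seeding within the closed interval [start, end]."""
    return {s for (t, u, s) in obs if u == user and start <= t <= end}


def estimate_library(obs: List[Observation], user: str, tau: float,
                     w_p: float, w_f: float, scenario: str) -> Set[str]:
    past = seen_seeding(obs, user, tau - w_p, tau)
    if scenario == "optimistic":
        return past                      # seeded at least once in the past window
    if scenario == "conservative":
        future = seen_seeding(obs, user, tau, tau + w_f)
        return past & future             # seeded both before and after tau
    raise ValueError("scenario must be 'optimistic' or 'conservative'")


def estimate_capacity(obs: List[Observation], user: str, tau: float) -> int:
    """Number of swarms the user is seeding in exactly at time tau."""
    return len(seen_seeding(obs, user, tau, tau))


# Hypothetical observations for one user, with tau = 5 and both windows of length 4.
toy_obs = [(1, "u", "a"), (2, "u", "d"), (5, "u", "b"), (8, "u", "d"), (9, "u", "c")]
toy_optimistic = estimate_library(toy_obs, "u", 5, 4, 4, "optimistic")
toy_conservative = estimate_library(toy_obs, "u", 5, 4, 4, "conservative")
toy_capacity = estimate_capacity(toy_obs, "u", 5)
```

Because both windows are closed at $\tau$, a file seeded exactly at $\tau$ falls into the past and the future window alike, so in this sketch the estimated capacity never exceeds the size of either library.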

Sampling community states
To analyze the seeding allocation in a community, we look at a sample of snapshots of that community taken at random times. All snapshot times $\tau_i$ must allow for $\tau_i - w_p$ and $\tau_i + w_f$ to be contained in the sampled trace. The larger the time windows $w_p$ and $w_f$ are, the better the library estimations, but the less space there is for choosing snapshots and hence the higher the potential sampling bias.
As a compromise, we set $w_p = w_f$ to the largest value such that we still have a time period of at least one week for choosing snapshots in both communities. We consider that randomly sampling times in a one-week interval accounts for most of the significant fluctuation of BitTorrent users' behavior. As a result, we have $w_p = w_f = 28$ days.
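The window length follows from simple arithmetic on the trace lengths given above (64 and 93 days). The sketch below reproduces it, assuming windows are measured in whole days and snapshot times are drawn uniformly; the function names are hypothetical.

```python
import random
from typing import List


def largest_window(trace_days: List[int], min_sampling_days: int = 7) -> int:
    """Largest w (whole days) with w_p = w_f = w that leaves at least
    `min_sampling_days` for choosing snapshots in the shortest trace."""
    return (min(trace_days) - min_sampling_days) // 2


def sample_snapshots(trace_days: int, w_p: int, w_f: int, k: int,
                     rng: random.Random) -> List[float]:
    """Draw k snapshot times (in days) so that both windows stay inside the trace."""
    return sorted(rng.uniform(w_p, trace_days - w_f) for _ in range(k))


w = largest_window([64, 93])  # (64 - 7) // 2
toy_snapshots = sample_snapshots(64, w, w, 10, random.Random(0))
```

With $w = 28$ the 64-day trace leaves an interval of 8 days for snapshots, whereas $w = 29$ would leave only 6 days, which is less than the required week.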

Numerical results
This section presents the numerical results obtained on the datasets which were introduced in Sect. 4. From both communities we selected 10 instances and created the conservative and the optimistic scenarios. The allocations recorded in the community traces serve as the baseline for possible improvements. We call this baseline the observed allocation and compare it to random allocation, which represents a completely uninformed algorithm, and to the optimal allocation given by our Algorithm 1.
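One plausible way to generate the random baseline is to let every user pick as many files as the capacity allows, uniformly at random from the library. The text does not spell out the procedure, so the sketch below is an assumption, and the names are hypothetical.

```python
import random
from typing import Dict, Set


def random_allocation(libraries: Dict[str, Set[str]],
                      capacities: Dict[str, int],
                      rng: random.Random) -> Dict[str, Set[str]]:
    """Each user seeds `capacity` files drawn uniformly without replacement."""
    allocation = {}
    for user, library in libraries.items():
        k = min(capacities[user], len(library))
        # Sort first so that results are reproducible for a given seed.
        allocation[user] = set(rng.sample(sorted(library), k))
    return allocation


toy_random = random_allocation({"u1": {"a", "b", "c"}, "u2": {"a"}},
                               {"u1": 2, "u2": 1},
                               random.Random(0))
```

Every allocation produced this way uses the full capacity of each user, so it differs from the observed allocation only in which files are chosen, not in how many.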
The results obtained for the Filelist community are presented in Table 2. We observe first that the conservative scenario leaves room for tiny improvements only, both for random and optimized allocations. This is because in this case the variety of possible allocations is very small. On the other hand, considering the optimistic scenario, the possible improvements are much larger. The observed allocations are already 20-45 % better than random allocations. The optimal allocations give about 7 % improvement compared to the observed ones. We conclude that in both scenarios the current allocations are already close to optimal, which is mainly due to the large peers-to-swarms ratio.
Although the possible improvement is small in terms of the objective value of Problem (1), the actual allocations can still differ considerably between scenarios. To investigate this, Fig. 1 depicts histograms of the swarms' seeder-to-leecher ratio (SLR) for a selected Filelist instance for the observed allocation and for the optimized allocations in the conservative and optimistic scenarios. The optimized conservative scenario results in a seeder distribution very similar to the observed one. On the other hand, the optimized optimistic scenario provides a much better distribution: it eliminates most of the very high over-seeding situations, reduces the under-seeding ones (the number of swarms with SLR ≤ 1 decreased from 34 to 12), and establishes good ratios between 4 and 12.
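A histogram of this kind can be computed from per-swarm seeder and leecher counts as in the following sketch. The bin layout (unit-width bins, with the first bin holding all swarms with SLR ≤ 1) and the exclusion of swarms without leechers are assumptions made for this example, not details taken from the figures.

```python
import math
from collections import Counter
from typing import Dict


def slr_per_swarm(seeders: Dict[str, int], leechers: Dict[str, int]) -> Dict[str, float]:
    """Seeder-to-leecher ratio of every swarm that has at least one leecher."""
    return {t: seeders.get(t, 0) / lam for t, lam in leechers.items() if lam > 0}


def slr_histogram(ratios: Dict[str, float], bin_width: float = 1.0) -> Counter:
    """Bin k counts swarms with (k-1)*bin_width < SLR <= k*bin_width; SLR = 0 goes to bin 1."""
    return Counter(max(1, math.ceil(r / bin_width)) for r in ratios.values())


# Hypothetical counts: swarm "d" has no leechers and is left out.
toy_ratios = slr_per_swarm({"a": 1, "b": 6, "c": 0}, {"a": 2, "b": 2, "c": 3, "d": 0})
toy_hist = slr_histogram(toy_ratios)
under_seeded = toy_hist[1]  # swarms with SLR <= 1
```

Comparing `under_seeded` and the counts in the highest bins across the observed and optimized allocations gives the kind of under- and over-seeding comparison discussed above.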
Regarding the Bitsoup community, the results are more divergent, as we can see in Table 3. In the conservative scenario, even random allocation can sometimes give better results than the current one, although these improvements are minor, if any. Comparing the current allocations to the optimal ones, we obtain about 7 % improvement. Turning to the optimistic scenario, which enables a larger search space, the current allocation gives about 20 % improvement over random selection. The optimal allocation is better still, with a further 20 % improvement over the current one. Given that the associated graph of Bitsoup (as defined in Sect. 2) is sparser than that of Filelist, giving a larger variation of possible allocations, the optimized allocations lead to significantly better average leeching session throughput.
Further analysis of the allocations in a particular Bitsoup instance is given in Fig. 2, which shows the histograms of SLR in the observed, conservative optimized and optimistic optimized scenarios. We can observe, first of all, that in the observed allocation swarms with SLR ≤ 1 are the most frequent ones, followed by decreasing frequencies of higher SLR values. For the conservative scenario, the optimized allocation already shows a slightly different pattern. Here the swarms with SLR ≤ 1 have a lower frequency than those with values up to and including 5. From that value on, we obtain a pattern similar to the observed allocation. Finally, for the optimistic optimized scenario, the histogram gives evidence of a completely different resource allocation structure. SLR values below 3 are much less frequent, and again, there are hardly any highly over-seeded swarms. We conclude that the optimal allocation in both cases (conservative and optimistic) leads not only to better throughput, and thus faster average download time, but also to a more balanced allocation of seeders.
Fig. 2 Histograms of seeder-to-leecher ratios in a Bitsoup instance (b01): observed (left), conservative optimized (middle), and optimistic optimized (right)

Conclusion
The seeders in BitTorrent file sharing communities can decide in which swarms they want to share their resources. We have shown that optimizing the seeder resource allocation across multiple swarms is equivalent to an integer optimization problem. We evaluated the seeder resource allocation in two communities and compared the observed allocations to both optimized and random allocations in worst-case and best-case scenarios. Summarizing our findings, we conclude that in typical communities, where the number of users is relatively low compared to the number of shared files, it is possible to improve the average throughput and at the same time decrease the number of under- and over-seeded swarms.