Application: Sharing Links, Multiple Access, Buffers
Topics: Central Limit Theorem, Confidence Intervals, Queueing, Randomized Protocols

Background:

  • General RV (B.4)

3.1 Sharing Links

One essential idea in communication networks is to have different users share common links.

For instance, many users are attached to the same coaxial cable; a large number of cell phones use the same base station; a WiFi access point serves many devices; the high-speed links that connect buildings or cities transport data from many users at any given time (Figs. 3.1 and 3.2).

Fig. 3.1 Shared coaxial cable for internet access, TV, and telephone

Fig. 3.2 Cellular base station antennas

Networks implement this sharing of physical resources by transmitting bits that carry information of different users on common physical media such as cables, wires, optical fibers, or radio channels. This general method is called multiplexing. Multiplexing greatly reduces the cost of the communication systems. In this chapter, we explain statistical aspects of multiplexing.

In the internet, at any given time, a number of packet flows share links. For instance, 20 users may be downloading web pages or video files and use the same coaxial cable of their service provider.

The transmission control protocol (TCP) arranges for these different flows to share the links as equally as possible (at least, in principle).

We focus our attention on a single link, as shown in Fig. 3.3. The link transmits bits at rate C bps. If ν connections are active at a given time, they each get a rate C∕ν. We want to study the typical rate that a connection gets. The nontrivial aspect of the problem is that ν is a random variable.

Fig. 3.3 A random number ν of connections share a link with rate C

As a simple model, assume that there are N ≫ 1 users who can potentially use that link. Assume also that the users are active independently, with probability p. Thus, the number ν of active users is Binomial(N, p), which we also write as B(N, p). (See Sect. B.2.8.)

Figure 3.4 shows the probability mass function for N = 100 and p = 0.1, 0.2, and 0.5. To be specific, assume that N = 100 and p = 0.2. The number ν of active users is then B(100, 0.2). On average, there are Np = 20 active users. However, there is some probability that a few more than 20 users are active. We want to find a number m so that the likelihood that there are more than m active users is negligible, say 5%. Given that value, we know that each active user gets at least a rate C∕m, with probability 95%.

Fig. 3.4 The probability mass function of the Binomial(100, p) distribution, for p = 0.1, 0.2 and 0.5

Thus, we can dimension the links, or provision the network, based on that value m. Intuitively, m should be slightly larger than the mean. Looking at the actual distribution, for instance, by using Python’s “ppf” as in Fig. 3.5, we find that

$$\displaystyle \begin{aligned} P(\nu \leq 27) = 0.966 > 95\% \mbox{ and } P(\nu \leq 26) = 0.944 < 95\%. \end{aligned} $$
(3.1)

Thus, the smallest value of m such that P(ν ≤ m) ≥ 95% is m = 27.

Fig. 3.5 The Python tool “ppf” shows (3.1)
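For instance, here is a minimal sketch of the computation behind (3.1), assuming SciPy's scipy.stats.binom; the text only mentions Python's “ppf”, so the exact code behind Fig. 3.5 may differ.

# A sketch of the computation in (3.1), assuming SciPy's binom distribution.
from scipy.stats import binom

N, p = 100, 0.2
print(binom.ppf(0.95, N, p))    # smallest m with P(nu <= m) >= 0.95, i.e., 27
print(binom.cdf(27, N, p))      # P(nu <= 27), about 0.966
print(binom.cdf(26, N, p))      # P(nu <= 26), about 0.944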

To avoid having to use distribution tables or computation tools, we use the fact that the binomial distribution is well approximated by the Gaussian distribution, which we discuss next.

3.2 Gaussian Random Variable and CLT

Definition 3.1 (Gaussian Random Variable)

  1. (a)

    A random variable W is Gaussian, or normal, with mean 0 and variance 1, and one writes \(W =_D {\mathcal {N}}(0, 1)\), if its probability density function (pdf) is f W where

    $$\displaystyle \begin{aligned} f_W(x) = \frac{1}{\sqrt{2 \pi}} \exp\left\{ - \frac{x^2}{2} \right\}, x \in \Re. \end{aligned}$$

    One also says that W is a standard normal random variable, or a standard Gaussian random variable (named after C.F. Gauss; see Fig. 3.6).

    Fig. 3.6 Carl Friedrich Gauss, 1777–1855

  2. (b)

    A random variable X is Gaussian, or normal, with mean μ and variance σ 2, and we write \(X =_D {\mathcal {N}}(\mu , \sigma ^2)\), if

    $$\displaystyle \begin{aligned} X = \mu + \sigma W, \end{aligned}$$

    where \(W =_D {\mathcal {N}}(0, 1)\). Equivalently, the pdf of X is given by

    $$\displaystyle \begin{aligned} f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left\{- \frac{(x - \mu)^2}{2 \sigma^2}\right\}.\end{aligned} $$

Figure 3.7 shows the pdf of a \({\mathcal {N}}(0, 1)\) random variable W. Note in particular that

$$\displaystyle \begin{aligned} P(W > 1.65) \approx 5\%, P(W > 1.96) \approx 2.5\% \mbox{ and } P(W > 2.32) \approx 1\%.\end{aligned} $$
(3.2)
Fig. 3.7 The pdf of a \({\mathcal {N}}(0, 1)\) random variable

The Central Limit Theorem states that the sum of many small independent random variables is approximately Gaussian. This result explains why thermal noise, which is due to the agitation of many electrons, is Gaussian. Many other natural phenomena exhibit a Gaussian distribution when they result from the superposition of many independent effects.

Theorem 3.1 (Central Limit Theorem)

Let {X(n), n ≥ 1} be i.i.d. random variables with mean E(X(n)) = μ and variance var(X(n)) = σ 2. Then, as n → ∞,

$$\displaystyle \begin{aligned} \frac{X(1) + \cdots + X(n) - n \mu}{\sigma \sqrt{n}} \Rightarrow {\mathcal{N}}(0, 1).\end{aligned} $$
(3.3)

\({\blacksquare }\)

In (3.3), the symbol ⇒ means convergence in distribution. Specifically, if {Y (n), n ≥ 1} are random variables, then \(Y(n) \Rightarrow {\mathcal {N}}(0, 1)\) means that

$$\displaystyle \begin{aligned} P(Y(n) \leq x) \rightarrow P(W \leq x), \forall x \in \Re,\end{aligned} $$

where W is a \({\mathcal {N}}(0, 1)\) random variable. We prove this result in the next chapter.
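As a quick numerical illustration of Theorem 3.1 (a sketch; the choice of U[0, 1] summands and of NumPy is an assumption, not from the text), the normalized sums in (3.3) have tail probabilities close to the N(0, 1) values listed in (3.2).

import numpy as np

# Sum n i.i.d. U[0,1] variables, center and scale as in (3.3), and compare the
# empirical tail probabilities with the N(0,1) values in (3.2).
rng = np.random.default_rng(0)
n, trials = 200, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)           # mean and standard deviation of U[0,1]

S = np.zeros(trials)
for _ in range(n):
    S += rng.uniform(size=trials)
Y = (S - n * mu) / (sigma * np.sqrt(n))    # normalized sums, approximately N(0,1)
print(np.mean(Y > 1.65), np.mean(Y > 1.96), np.mean(Y > 2.32))   # about 5%, 2.5%, 1%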

More generally, one has the following definition.

Definition 3.2 (Convergence in Distribution)

Let {X(n), n ≥ 1} and X be random variables. One says that X(n) converges in distribution to X, and one writes X(n) ⇒ X, if

$$\displaystyle \begin{aligned} P(X(n) \leq x) \rightarrow P(X \leq x), \mbox{ for all } x \mbox{ s.t. } P(X = x) = 0.\end{aligned} $$
(3.4)

As an example, let X(n) = 3 + 1∕n for n ≥ 1 and X = 3. It is intuitively clear that the distribution of X(n) converges to that of X. However,

$$\displaystyle \begin{aligned} P(X(n) \leq 3) = 0 \not \rightarrow P(X \leq 3) = 1.\end{aligned} $$

But,

$$\displaystyle \begin{aligned} P(X(n) \leq x) \rightarrow P(X \leq x), \forall x \neq 3.\end{aligned} $$

This example explains why the definition (3.4) requires convergence of P(X(n) ≤ x) to P(X ≤ x) only for x such that P(X = x) = 0.

How does this notion of convergence relate to convergence in probability and almost sure convergence? First note that convergence in distribution is defined even if the random variables X(n) and X are not on the same probability space, since it involves only the distributions of the individual random variables. One can show that

$$\displaystyle \begin{aligned} X(n) \overset{a.s.}{\rightarrow} X \mbox{ implies } X(n) \overset{p}{\rightarrow} X \mbox{ implies } X(n) \Rightarrow X. \end{aligned}$$

Thus, convergence in distribution is the weakest form of convergence.

Also, a fact that I find very comforting is that if X(n) ⇒ X, then one can construct random variables Y (n) and Y  on the same probability space so that Y (n) =D X(n) and Y =D X and

$$\displaystyle \begin{aligned} Y(n) \rightarrow Y, \mbox{ with probability } 1. \end{aligned}$$

This may seem mysterious but is in fact quite obvious. First note that a random variable with cdf F(⋅) can be constructed by choosing a random variable Z =D U[0, 1] and defining (see Fig. 3.8)

$$\displaystyle \begin{aligned} X(Z) = \inf\{ x \in \Re \mid F(x) \geq Z\}. \end{aligned}$$

Indeed, one then has P(X(Z) ≤ a) = F(a) since X(Z) ≤ a if and only if Z ∈ [0, F(a)], which has probability F(a) since Z =D U[0, 1]. But then, if X(n) ⇒ X, we have \(F_{X_n}(x) \rightarrow F_X(x)\) whenever P(X = x) = 0, and this implies that

$$\displaystyle \begin{aligned} X_n(z) = \inf\{ x \in \Re \mid F_{X_n}(x) \geq z\} \rightarrow X(z) = \inf\{ x \in \Re \mid F(x) \geq z\}, \end{aligned}$$

for all z.

Fig. 3.8 If Z =D U[0, 1], then the cdf of X(Z) is F
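Here is a small sketch of this inverse-cdf construction; the exponential cdf F(x) = 1 − e^{−x} and the use of NumPy are my choices for illustration, not part of the text.

import numpy as np

# If Z =_D U[0,1], then X(Z) = inf{x : F(x) >= Z} has cdf F.  With the
# exponential cdf F(x) = 1 - exp(-x), the inverse is F^{-1}(z) = -log(1 - z).
rng = np.random.default_rng(0)
Z = rng.uniform(size=100_000)
X = -np.log(1 - Z)                          # X(Z) for this choice of F

# The empirical cdf of the samples matches F at a few test points.
for a in [0.5, 1.0, 2.0]:
    print(a, np.mean(X <= a), 1 - np.exp(-a))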

3.2.1 Binomial and Gaussian

Figure 3.9 compares the binomial and Gaussian distributions.

Fig. 3.9 Comparing Binomial(100, 0.2) with Gaussian(20, 16)

To see why these distributions are similar, note that if X =D B(N, p), then one can write

$$\displaystyle \begin{aligned} X = Y_1 + \cdots + Y_N,\end{aligned} $$

where the random variables Y n are i.i.d. and Bernoulli with parameter p. Thus, by the CLT,

$$\displaystyle \begin{aligned} \frac{X - Np}{\sqrt{N}} \approx {\mathcal{N}}\big(0, \sigma^2\big),\end{aligned} $$

where \(\sigma ^2 = \mbox{var}(Y_1) = E(Y_1^2) - (E(Y_1))^2 = p(1 - p)\). Hence, one can argue that

$$\displaystyle \begin{aligned} B(N, p) \approx_D {\mathcal{N}}\big(Np, N \sigma^2\big) =_D {\mathcal{N}}(Np, Np(1 - p)). \end{aligned} $$
(3.5)

For p = 0.2 and N = 100, one concludes that \(B(100, 0.2) \approx {\mathcal {N}}(20, 16)\), which is confirmed by Fig. 3.9.
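A short sketch of the comparison in Fig. 3.9, assuming SciPy: the Binomial(100, 0.2) pmf is close to the N(20, 16) density evaluated at the integers.

# Compare the Binomial(100, 0.2) pmf with the N(20, 16) density (a sketch, assuming SciPy).
from scipy.stats import binom, norm

N, p = 100, 0.2
mu, var = N * p, N * p * (1 - p)            # 20 and 16
for k in range(14, 27, 2):
    print(k, round(binom.pmf(k, N, p), 4), round(norm.pdf(k, mu, var ** 0.5), 4))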

3.2.2 Multiplexing and Gaussian

We now apply the Gaussian approximation of a binomial distribution to multiplexing. Recall that we were looking for the smallest value of m such that P(B(N, p) > m) ≤ 5%. The ideas are as follows. From (3.5) and (3.2), we have

$$\displaystyle \begin{aligned} & \mbox{(1) } B(N, p) \approx {\mathcal{N}}(Np, Np(1-p)), \mbox{ for } N \gg 1; \\ & \mbox{(2) } P({\mathcal{N}}\big(\mu, \sigma^2\big) > \mu + 1.65 \sigma) \approx 5\%. \end{aligned} $$

Combining these facts, we see that, for N ≫ 1,

$$\displaystyle \begin{aligned} P(B(N, p) > Np + 1.65 \sqrt{Np(1 - p)}) \approx 5\%. \end{aligned}$$

Thus, the value of m that we are looking for is

$$\displaystyle \begin{aligned} m = Np + 1.65 \sqrt{Np(1 - p)} = 20 + 1.65 \sqrt{16} \approx 27. \end{aligned}$$

A look at Fig. 3.9 shows that it is indeed unlikely that ν is larger than 27 when ν =D B(100, 0.2).
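The same provisioning rule is easy to compute directly; a minimal sketch for the example N = 100, p = 0.2:

import math

# m ~ Np + 1.65*sqrt(Np(1-p)), from the Gaussian approximation (3.5) and (3.2).
N, p = 100, 0.2
m = N * p + 1.65 * math.sqrt(N * p * (1 - p))
print(m)        # 26.6, which we round up to m = 27, matching (3.1)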

3.2.3 Confidence Intervals

One can invert the calculation that we did in the previous section and try to guess p from the observed fraction Y (N) of active users among the N ≫ 1 users. From the ideas (1) and (2) above, together with the symmetry of the Gaussian distribution around its mean, we see that the events

$$\displaystyle \begin{aligned} A_1 = \{B(N, p) \geq Np + 1.65 \sqrt{Np(1 - p)} \} \end{aligned}$$

and

$$\displaystyle \begin{aligned} A_2 = \{B(N, p) \leq Np - 1.65\sqrt{Np(1 - p)} \} \end{aligned}$$

each have a probability close to 5%. With Y (N) =D B(N, p)∕N, we see that

$$\displaystyle \begin{aligned} A_1 = \left\{Y(N) \geq p + 1.65 \sqrt{\frac{p(1-p)}{N}} \right\} \end{aligned}$$

and

$$\displaystyle \begin{aligned} A_2 = \left\{Y(N) \leq p - 1.65 \sqrt{\frac{p(1-p)}{N}} \right\}. \end{aligned}$$

Hence, the event A 1 ∪ A 2 has probability close to 10%, so that its complement has probability close to 90%. Consequently,

$$\displaystyle \begin{aligned} P\left(Y(N) - 1.65 \sqrt{\frac{p(1 - p)}{N}} \leq p \leq Y(N) + 1.65 \sqrt{\frac{p(1 - p)}{N}} \right) \approx 90\%. \end{aligned}$$

We do not know p, but p(1 − p) ≤ 1∕4. Hence, we find

$$\displaystyle \begin{aligned} P\left(Y(N) - 0.83 \frac{1}{\sqrt{N}} \leq p \leq Y(N) + 0.83 \frac{1}{\sqrt{N}} \right) \geq 90\%. \end{aligned}$$

For N = 100, this gives

$$\displaystyle \begin{aligned} P(Y(N) - 0.08 \leq p \leq Y(N) + 0.08) \geq 90\%. \end{aligned}$$

For instance, if we observe that 30% of the 100 users are active, then we guess that p is between 0.22 and 0.38, with probability 90%. In other words, [Y (N) − 0.08, Y (N) + 0.08] is a 90%-confidence interval for p.

Figure 3.7 shows that we can get a 95%-confidence interval by replacing 1.65 by 2 (since 1.96 ≈ 2). Thus, we see that

$$\displaystyle \begin{aligned} \left[Y(N) - \frac{1}{\sqrt{N}}, Y(N) + \frac{1}{\sqrt{N}}\right] \end{aligned} $$
(3.6)

is a 95%-confidence interval for p.

How large should N be to have a good estimate of p? Let us say that we would like to know p plus or minus 0.03 with 95% confidence. Using (3.6), we see that we need

$$\displaystyle \begin{aligned} \frac{1}{\sqrt{N}} \leq 3\%, \mbox{ i.e., } N \geq 1,112. \end{aligned}$$

Thus, Y (1, 112) is an estimate of p with an error of at most 0.03, with probability 95%. Such results form the basis for the design of public opinion surveys.
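A sketch of this sample-size calculation and of the interval (3.6); the observed fraction 0.30 is just the example value used earlier.

import math

# Sample size for a +/- 3% estimate of p with about 95% confidence, using the
# variance bound p(1-p) <= 1/4, so that the half-width of (3.6) is 1/sqrt(N).
target = 0.03
N = math.ceil((1 / target) ** 2)
print(N)                                    # 1112

# 95%-confidence interval (3.6) for an observed fraction Y(N), e.g., Y = 0.30.
Y = 0.30
half = 1 / math.sqrt(N)
print(Y - half, Y + half)                   # about [0.27, 0.33]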

In many cases, one does not know a bound on the variance. In such situations, one replaces the standard deviation by the sample standard deviation. That is, for i.i.d. random variables {X(n), n ≥ 1} with mean μ, the confidence intervals for μ are as follows:

$$\displaystyle \begin{aligned} \left[\mu_n - 1.65 \frac{\sigma_n}{\sqrt{n}}, \mu_n + 1.65 \frac{\sigma_n}{\sqrt{n}}\right] &= 90\%-\mbox{Confidence Interval} \\ \left[\mu_n - 2 \frac{\sigma_n}{\sqrt{n}}, \mu_n + 2 \frac{\sigma_n}{\sqrt{n}}\right] &= 95\%-\mbox{Confidence Interval}, \end{aligned} $$

where

$$\displaystyle \begin{aligned} \mu_n = \frac{X(1) + \cdots + X(n)}{n} \end{aligned}$$

and

$$\displaystyle \begin{aligned} \sigma_n^2 = \frac {\sum_{m=1}^n (X(m) - \mu_n)^2}{n-1} = \frac{n}{n-1} \left\{ \frac{\sum_{m=1}^n X(m)^2}{n} - \mu_n^2 \right\}. \end{aligned}$$
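A minimal sketch of these intervals, assuming NumPy; the exponential data are just an example with an unknown mean to be estimated.

import numpy as np

# 95%-confidence interval for the mean from i.i.d. samples, using the sample
# mean and the sample standard deviation; ddof=1 gives the n-1 denominator.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=400)    # example data; the true mean is 2.0

n = len(X)
mu_n = X.mean()
sigma_n = X.std(ddof=1)
half = 2 * sigma_n / np.sqrt(n)
print(mu_n - half, mu_n + half)             # contains 2.0 about 95% of the time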

What’s up with this n − 1 denominator? You probably expected the sample variance to be the arithmetic mean of the squares of the deviations from the sample mean, i.e., a denominator n in the first expression for \(\sigma _n^2\). It turns out that to make the estimator such that \(E(\sigma _n^2) = \sigma ^2\), i.e., to make the estimator unbiased, one should divide by n − 1 instead of n. The difference is negligible for large n, obviously. Nevertheless, let us see why this is so.

For simplicity of notation, assume that E(X(n)) = 0 and let σ 2 = var(X(n)) = E(X(n)2). Note that

$$\displaystyle \begin{aligned} & n^2 E( \big(X(1) - \mu_n\big)^2) \\ & ~~~ = E\big(\big(nX(1) - X(1) - X(2) - \cdots - X(n)\big)^2\big) \\ & ~~~ = E\big((n-1)^2X(1)^2\big) + E\big(X(2)^2\big) + \cdots + E\big(X(n)^2\big) \\ & ~~~ = (n-1)^2 \sigma^2 + (n-1) \sigma^2 = n(n-1) \sigma^2. \end{aligned} $$

For the second equality, note that the cross-terms E(X(i)X(j)) for i ≠ j vanish because the random variables are independent and zero-mean.

Hence,

$$\displaystyle \begin{aligned} E\big( (X(1) - \mu_n)^2\big) = \frac{n-1}{n} \sigma^2 \mbox{ and } \sum_{m=1}^n E\big( (X(m) - \mu_n)^2\big) = (n-1) \sigma^2. \end{aligned}$$

Consequently, an unbiased estimate of σ 2 is

$$\displaystyle \begin{aligned} \sigma_n^2 := \frac{1}{n-1} \sum_{m=1}^n \big(X(m) - \mu_n\big)^2. \end{aligned}$$

3.3 Buffers

The internet is a packet-switched network. A packet is a group of bits of data together with some control information such as a source and destination address, somewhat like an envelope you send by regular mail (if you remember that). A host (e.g., a computer, a smartphone, or a web cam) sends packets to a switch. The switch has multiple input and output ports, as shown in Fig. 3.10.

Fig. 3.10 A switch with multiple input and output ports

The switch stores the packets as they arrive and sends them out on the appropriate output port, based on the destination address of the packets. The packets arrive at random times at the switch and, occasionally, packets that must go out on a specific output port arrive faster than the switch can send them out. When this happens, packets accumulate in a buffer. Consequently, packets may face a queueing delay before they leave the switch. We study a simple model of such a system.

3.3.1 Markov Chain Model of Buffer

We focus on packets destined to one particular output port. Our model is in discrete time. We assume that one packet destined for that output port arrives with probability λ ∈ [0, 1] at each time instant, independently of previous arrivals. The packets have random sizes, so that they take random times to be transmitted. We assume that the time to transmit a packet is geometrically distributed with parameter μ and all the transmission times are independent. Let X n be the number of packets in the output buffer at time n, for n ≥ 0. At time n, a transmission completes with probability μ and a new packet arrives with probability λ, independently of the past. Thus, X n is a Markov chain with the state transition diagram shown in Fig. 3.11.

Fig. 3.11 The transition probabilities for the buffer occupancy for one of the output ports

In this diagram,

$$\displaystyle \begin{aligned} p_2 &= \lambda(1 - \mu) \\ p_0 &= \mu(1 - \lambda) \\ p_1 &= 1 - p_0 - p_2. \end{aligned} $$

For instance, p 2 is the probability that one new packet arrives and that the transmission of a previous packet does not complete, so that the number of packets in the buffer increases by one.

3.3.2 Invariant Distribution

The balance equations are

$$\displaystyle \begin{aligned} & \pi(0) = (1 - p_2)\pi(0) + p_0\pi(1) \\ & \pi(n) = p_2 \pi(n-1) + p_1 \pi(n) + p_0 \pi(n+1), 1 \leq n \leq N - 1 \\ & \pi(N) = p_2 \pi(N-1) + (1 - p_0) \pi(N). \end{aligned} $$

You can verify that the solution is given by

$$\displaystyle \begin{aligned} \pi(i) = \pi(0) \rho^i, i = 0, 1, \ldots, N \mbox{ where } \rho := \frac{p_2}{p_0}. \end{aligned}$$

Since the probabilities add up to one, we find that

$$\displaystyle \begin{aligned} \pi(0) = \left[\sum_{i = 0}^N \rho^i\right]^{-1} = \frac{1 - \rho}{1 - \rho^{N+1}}. \end{aligned}$$

In particular, the average value of X under the invariant distribution is

$$\displaystyle \begin{aligned} E(X) &= \sum_{i=0}^N i \pi(i) = \pi(0) \sum_{i=0}^N i \rho^i \\ &= \rho \frac{N\rho^{N+1} - (N+1) \rho^N + 1}{(1 - \rho)(1 - \rho^{N+1})}\\ &\approx \frac{\rho}{1 - \rho} = \frac{p_2}{p_0 - p_2} = \frac{\lambda(1 - \mu)}{\mu - \lambda},\end{aligned} $$

where the approximation is valid if ρ < 1, i.e., λ < μ, and N ≫ 1 so that ρ^N ≪ 1.

Figure 3.12 shows a simulation of this queue when λ = 0.16, μ = 0.20, and N = 20. It also shows the average queue length over the first n steps, and we see that it approaches λ(1 − μ)∕(μ − λ) = 3.2. Note that this queue is almost never full, which explains why one can let N → ∞ in the expression for E(X).

Fig. 3.12 A simulation of the queue with λ = 0.16, μ = 0.20, and N = 20
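Here is a sketch of a simulation like the one behind Fig. 3.12 (the text does not give the code, so the details are assumptions); it simulates the chain of Fig. 3.11 and reports the long-run average backlog.

import numpy as np

# Simulate the Markov chain of Fig. 3.11 with lam = 0.16, mu = 0.20, N = 20.
# The running average backlog should approach lam*(1 - mu)/(mu - lam) = 3.2.
lam, mu, N, T = 0.16, 0.20, 20, 200_000
p2 = lam * (1 - mu)           # arrival and no transmission completion: backlog up
p0 = mu * (1 - lam)           # transmission completion and no arrival: backlog down
rng = np.random.default_rng(0)

x, total = 0, 0
for _ in range(T):
    u = rng.random()
    if u < p2:                # move up, unless the buffer is full
        if x < N:
            x += 1
    elif u < p2 + p0:         # move down, unless the buffer is empty
        if x > 0:
            x -= 1
    total += x
print(total / T)              # close to 3.2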

3.3.3 Average Delay

How long do packets stay in the switch? Consider a packet that arrives when there are k packets already in the buffer. That packet then leaves after k + 1 packet transmissions. Since each packet transmission takes 1∕μ steps, on average, the expected time that the packet spends in the switch is (k + 1)∕μ. Thus, to find the expected time a packet stays in the switch, we need to calculate the probability ϕ(k) that an arriving packet finds k packets already in the buffer. Then, the expected time W that a packet stays in the switch is given by

$$\displaystyle \begin{aligned} W = \sum_{k \geq 0} \frac{k + 1}{\mu} \phi (k).\end{aligned} $$

The result of the calculation is given in the next theorem.

Theorem 3.2

If λ < μ, one has

$$\displaystyle \begin{aligned} W = \frac{1}{\lambda} E(X) = \frac{1 - \mu}{\mu - \lambda}. \end{aligned}$$

\({\blacksquare }\)

Proof

The calculation is a bit lengthy and the details may not be that interesting, except that they explain how to calculate ϕ(k) and that they show that the simplicity of the result is quite remarkable.

Recall that ϕ(k) is the probability that there are k + 1 packets in the buffer after a given packet arrives at time n. Thus, ϕ(k) = P[X(n + 1) = k + 1∣A(n) = 1] where A(n) is the number of arrivals at time n. Now, if D(n) is the number of transmission completions at time n,

$$\displaystyle \begin{aligned} \phi(k) &= P[X(n) = k + 1, D(n) = 1 \mid A(n) = 1] \\ &\quad + P[X(n) = k, D(n) = 0 \mid A(n) = 1]. \end{aligned} $$

Also,

$$\displaystyle \begin{aligned} &P[X(n) = k + 1, D(n) = 1 \mid A(n) = 1] = \frac{P[X(n) = k + 1, D(n) = 1, A(n) = 1]}{P(A(n) = 1)} \\ &\quad = \frac{1}{\lambda} P(X(n) = k + 1)P[D(n) = 1, A(n) = 1 \mid X(n) = k + 1] \\ &\quad = \frac{1}{\lambda} \pi(k+1) \lambda \mu = \pi(k+1) \mu. \end{aligned} $$

Similarly,

$$\displaystyle \begin{aligned} & P[X(n) = k, D(n) = 0 \mid A(n) = 1] = \frac{P[X(n) = k, D(n) = 0, A(n) = 1]}{P(A(n) = 1)} \\ &~~~~ = \frac{1}{\lambda} P(X(n) = k)P[D(n) = 0, A(n) = 1 \mid X(n) = k] \\ &~~~~ = \frac{1}{\lambda} \pi(k) \lambda (1 - \mu 1\{k > 0\}) = \pi(k) (1 - \mu 1\{k > 0\}). \end{aligned} $$

Hence,

$$\displaystyle \begin{aligned} \phi(k) = \pi(k) (1 - \mu 1\{k > 0\}) + \pi(k+1) \mu. \end{aligned}$$

Consequently, the expected time W that a packet spends in the switch is given by

$$\displaystyle \begin{aligned} W &= \sum_{k \geq 0} \frac{k+1}{\mu} \phi(k) = \frac{1}{\mu} + \frac{1}{\mu} \sum_{k \geq 0} k \pi(k) (1 - \mu 1\{k>0\}) + \sum_{k \geq 0} k \pi(k+1) \\ &= \frac{1}{\mu} + \frac{1}{\mu} \sum_{k \geq 0} k \pi(k) (1 - \mu) + \sum_{k \geq 1} (k - 1) \pi(k) \\ &= \frac{1}{\mu} + \frac{1 - \mu}{\mu} E(X) + E(X) - 1 = \frac{1}{\mu} + \frac{1}{\mu} E(X) - 1 \\ &= \frac{1}{\mu} + \frac{\lambda (1 - \mu)}{\mu(\mu - \lambda)} - 1 = \frac{1 - \mu}{\mu - \lambda} = \frac{1}{\lambda} E(X). \end{aligned} $$

3.3.4 A Note About Arrivals

Since the arrivals are independent of the backlog in the buffer, it is tempting to conclude that the probability that a packet finds k packets in the buffer upon its arrival is π(k). An argument in favor of this conclusion goes as follows:

$$\displaystyle \begin{aligned} P[X_{n+1} = k + 1 \mid A_n = 1] &= P[X_n = k \mid A_n = 1] \\ &= P[X_n = k] = \pi(k), \end{aligned} $$

where the second identity comes from the independence of the arrivals A n and the backlog X n. However, the first identity does not hold since it is possible that X n+1 = k, X n = k, and A n = 1. Indeed, one may have D n = 1.

If one assumes that λ < μ ≪ 1, then the probability that A n = 1 and D n = 1 is negligible, and it is then the case that ϕ(k) ≈ π(k). We encounter that situation in Sect. 5.6.

3.3.5 Little’s Law

The previous result is a particular case of Little’s Law (Little 1961) (Fig. 3.13).

Fig. 3.13 John D. C. Little, b. 1928

Theorem 3.3 (Little’s Law)

Under weak assumptions,

$$\displaystyle \begin{aligned} L = \lambda W, \end{aligned}$$

where L is the average number of customers in a system, λ is the average arrival rate of customers, and W is the average time that a customer spends in the system. \({\blacksquare }\)

One way to understand this law is to consider a packet that leaves the switch after having spent T time units. During its stay, λT packets arrive, on average. So the average backlog in the switch should be λT.

It turns out that Little’s law applies to very general systems, even those that do not serve the packets in their order of arrival.

One way to see this is to think that each packet pays the switch one unit of money per unit of time it spends in the switch. If a packet spends T time units, on average, in the switch, then each packet pays T, on average. Thus, the switch collects money at the rate of λT per unit of time, since λ packets go through the switch per unit of time and each pays an average of T. Another way to look at the rate at which the switch is getting paid is to realize that if there are L packets in the switch at any given time, on average, then the switch collects money at rate L, since each packet pays one unit per unit time. Thus, L = λT.
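Little's law is also easy to check numerically. Here is a sketch (the FIFO service order and the parameter values, reused from Sect. 3.3.2, are my choices) that measures L, W, and the throughput in one simulation.

from collections import deque
import numpy as np

# Simulate the buffer with Bernoulli(lam) arrivals and geometric(mu) transmission
# times, serving packets in FIFO order, and compare L with (throughput) * W.
lam, mu, T = 0.16, 0.20, 500_000
rng = np.random.default_rng(1)

buffer = deque()              # arrival times of the packets currently in the buffer
delays, backlog_sum = [], 0
for t in range(T):
    if buffer and rng.random() < mu:      # head-of-line transmission completes
        delays.append(t - buffer.popleft())
    if rng.random() < lam:                # a new packet arrives
        buffer.append(t)
    backlog_sum += len(buffer)

L = backlog_sum / T                       # average number of packets in the system
W = np.mean(delays)                       # average time a packet spends in the system
throughput = len(delays) / T              # departure rate, close to lam
print(L, throughput * W)                  # the two sides of Little's law agree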

3.4 Multiple Access

Imagine a number of smartphones sharing a WiFi access point, as illustrated in Fig. 3.14. They want to transmit packets.

Fig. 3.14 A number of smartphones share a WiFi access point

If multiple smartphones transmit at the same time, the transmissions garble one another, and we say that they collide. We discuss a simple scheme to regulate the transmissions and achieve a large rate of success. We consider a discrete time model of the situation.

There are N devices. At time n ≥ 0, each device transmits with probability p, independently of the others. This scheme, called randomized multiple access, was proposed by Norman Abramson in the late 1960s for his Aloha network (Abramson 1970) (Fig. 3.15).

Fig. 3.15 Norman Abramson, b. 1932

The number X(n) of transmissions at time n is then B(N, p) (see (B.4)). In particular, the fraction of time that exactly one device transmits is

$$\displaystyle \begin{aligned} P(X(n) = 1) = N p(1 - p)^{N-1}. \end{aligned}$$

The maximum over p of this success rate occurs for p = 1∕N and equals λ*, where

$$\displaystyle \begin{aligned} \lambda^* = \left(1 - \frac{1}{N}\right)^{N - 1} \approx \frac{1}{e} \approx 0.36. \end{aligned}$$

In this derivation, we use the fact that

$$\displaystyle \begin{aligned} \left(1 - \frac{a}{N}\right)^N \approx e^{-a} \mbox{ for } N \gg 1. \end{aligned} $$
(3.7)

Thus, this scheme achieves a transmission rate of about 36%. However, it requires selecting p = 1∕N, which means that the devices need to know how many other devices are active (i.e., try to transmit). We discuss an adaptive scheme in the next chapter that does not require that information.
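A quick sketch of this optimization (N = 50 is an arbitrary example, not a value from the text):

import numpy as np

# Success rate of randomized multiple access: P(exactly one of the N devices
# transmits) = N * p * (1 - p)**(N - 1).  It is maximized near p = 1/N, where
# it is approximately 1/e.
N = 50
p = np.linspace(0.001, 0.1, 1000)
rate = N * p * (1 - p) ** (N - 1)
print(p[np.argmax(rate)], 1 / N)          # maximizing p, close to 1/N = 0.02
print(rate.max(), np.exp(-1))             # maximum rate, close to 1/e ~ 0.37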

3.5 Summary

  • Gaussian random variable \({\mathcal {N}}(\mu , \sigma ^2)\);

  • CLT;

  • Confidence Intervals;

  • Buffers: average backlog and delay; Little’s Law;

  • Multiple Access Protocol.

3.5.1 Key Equations and Formulas


3.6 References

The buffering analysis is a simple example of queueing theory. See Kleinrock (1975–6) for a discussion of queueing models of computer and communication systems.

3.7 Problems

Problem 3.1

Write a Python code to compute the number of people to poll in a public opinion survey to estimate the fraction of the population that will vote in favor of a proposition within α percent, with probability at least 1 − β. Use an upper bound on the variance. Assume that we know that p ∈ [0.4, 0.7].

Problem 3.2

We are conducting a public opinion poll to determine the fraction p of people who will vote for Mr. Whatshisname as the next president. We ask N 1 college-educated and N 2 non-college-educated people. We assume that the votes in each of the two groups are i.i.d. B(p 1) and B(p 2), respectively, in favor of Whatshisname. In the general population, the percentage of college-educated people is known to be q.

  1. (a)

    What is a 95%-confidence interval for p, using an upper bound for the variance?

  2. (b)

    How do we choose N 1 and N 2 subject to N 1 + N 2 = N to minimize the width of that interval?

Problem 3.3

You flip a fair coin 10,000 times. The probability that there are more than 5085 heads is approximately (choose the correct answer)

  • 15%;

  • 10%;

  • 5%;

  • 2.5%;

  • 1%.

Problem 3.4

Write a Python simulation of a buffer where packets arrive as a Bernoulli process with rate λ and geometric service times with rate μ. Plot the simulation and calculate the long-term average backlog.

Problem 3.5

Consider a buffer that can transmit up to M packets in parallel. That is, when there are m packets in the buffer, \(\min \{m, M\}\) of these packets are being transmitted. Also, each of these packets completes transmission independently in the next time slot with probability μ. At each time step, a packet arrives with probability λ.

  1. (a)

    What are the transition probabilities of the corresponding Markov chain?

  2. (b)

    For what values of λ, M, and μ do you expect the system to be stable?

  3. (c)

    Write a Python simulation of this system.

Problem 3.6

In order to estimate the probability of head in a coin flip, p, you flip a coin n times, and count the number of heads, S n. You use the estimator \(\hat {p}=S_n/n\). You choose the sample size n to have a guarantee

$$\displaystyle \begin{aligned}P(|S_n/n-p|\geq \epsilon) \leq \delta. \end{aligned}$$
  1. (a)

    What is the value of n suggested by Chebyshev’s inequality? (Use a bound on the variance.)

  2. (b)

    How does this value change when 𝜖 is reduced to half of its original value?

  3. (c)

    How does it change when δ is reduced to half of its original value?

  4. (d)

    Compare this value of n with that given by the CLT.

Problem 3.7

Let {X n, n ≥ 1} be i.i.d. U[0, 1] and Z n = X 1 + ⋯ + X n. What is P(Z n > n)? What would the estimate be of the same probability obtained from the Central Limit Theorem?

Problem 3.8

Consider one buffer where packets arrive one by one every 2 s and take 1 s to transmit. What is the average delay through the queue per packet? Repeat the problem assuming that the packets arrive ten at a time every 20 s. This example shows that the delay depends on how “bursty” the traffic is.

Problem 3.9

Show that if \(X(n) \overset {p}{\rightarrow } X\), then X(n) ⇒ X.

Hint

Assume that P(X = x) = 0. To show that P(X(n) ≤ x) → P(X ≤ x), note that if |X(n) − X|≤ 𝜖 and X ≤ x, then X(n) ≤ X + 𝜖.