General Caching Is Hard: Even with Small Pages

Published in: Algorithmica

Abstract

Caching (also known as paging) is a classical problem concerning page replacement policies in two-level memory systems. General caching is the variant with pages of different sizes and fault costs. The strong NP-hardness of its two important cases, the fault model (each page has unit fault cost) and the bit model (each page has the same fault cost as size) has been established, but under the assumption that there are pages as large as half of the cache size. We prove that this already holds when page sizes are bounded by a small constant: The bit and fault models are strongly NP-complete even when page sizes are limited to \(\{1, 2, 3\}\). Considering only the decision versions of the problems, general caching is equivalent to the unsplittable flow on a path problem and therefore our results also improve the hardness results about this problem.

References

  1. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theor. Comput. Sci. 234(1–2), 203–218 (2000). doi:10.1016/S0304-3975(98)00116-9. (A preliminary version appeared at ESA 1996)

  2. Adamaszek, A., Czumaj, A., Englert, M., Räcke, H.: An O(log k)-competitive algorithm for generalized caching. In: Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1681–1689 (2012). doi:10.1137/1.9781611973099

  3. Albers, S., Arora, S., Khanna, S.: Page replacement for general caching problems. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 31–40 (1999). http://dl.acm.org/citation.cfm?id=314500.314528

  4. Anagnostopoulos, A., Grandoni, F., Leonardi, S., Wiese, A.: A mazing \(2+\epsilon \) approximation for unsplittable flow on a path. In: Chekuri, C. (ed.) Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5–7, 2014, pp. 26–41. SIAM (2014). doi:10.1137/1.9781611973402.3

  5. Bansal, N., Buchbinder, N., Naor, J.: Randomized competitive algorithms for generalized caching. SIAM J. Comput. 41(2), 391–414 (2012). doi:10.1137/090779000. (A preliminary version appeared at STOC 2008)

  6. Bansal, N., Chakrabarti, A., Epstein, A., Schieber, B.: A quasi-PTAS for unsplittable flow on line graphs. In: Kleinberg, J.M. (ed.) Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 721–729. ACM (2006). doi:10.1145/1132516.1132617

  7. Bar-Noy, A., Bar-Yehuda, R., Freund, A., Naor, J., Schieber, B.: A unified approach to approximating resource allocation and scheduling. J. ACM 48(5), 1069–1090 (2001). doi:10.1145/502102.502107. (A preliminary version appeared at STOC 2000)

  8. Batra, J., Garg, N., Kumar, A., Mömke, T., Wiese, A.: New approximation schemes for unsplittable flow on a path. In: Indyk, P. (ed.) Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 47–58. SIAM (2015). doi:10.1137/1.9781611973730.5

  9. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966). doi:10.1147/sj.52.0078

  10. Bonsma, P.S., Schulz, J., Wiese, A.: A constant-factor approximation algorithm for unsplittable flow on paths. SIAM J. Comput. 43(2), 767–799 (2014). doi:10.1137/120868360

  11. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press, Cambridge (1998)

  12. Brodal, G.S., Moruz, G., Negoescu, A.: Onlinemin: a fast strongly competitive randomized paging algorithm. Theory Comput. Syst. 56(1), 22–40 (2015). doi:10.1007/s00224-012-9427-y. (A preliminary version appeared at WAOA 2011)

  13. Chrobak, M., Karloff, H.J., Payne, T.H., Vishwanathan, S.: New results on server problems. SIAM J. Discrete Math. 4(2), 172–181 (1991). doi:10.1137/0404017. (A preliminary version appeared at SODA 1990)

  14. Chrobak, M., Larmore, L.L., Lund, C., Reingold, N.: A better lower bound on the competitive ratio of the randomized 2-server problem. Inf. Process. Lett. 63(2), 79–83 (1997). doi:10.1016/S0020-0190(97)00099-9

  15. Chrobak, M., Woeginger, G.J., Makino, K., Xu, H.: Caching is hard–even in the fault model. Algorithmica 63(4), 781–794 (2012). doi:10.1007/s00453-011-9502-9. (A preliminary version appeared at ESA 2010)

  16. Darmann, A., Pferschy, U., Schauer, J.: Resource allocation with time intervals. Theor. Comput. Sci. 411(49), 4217–4234 (2010). doi:10.1016/j.tcs.2010.08.028

  17. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.E.: Competitive paging algorithms. J. Algorithms 12(4), 685–699 (1991). doi:10.1016/0196-6774(91)90041-V. (A preliminary version appeared in 1988)

  18. Folwarczný, L., Sgall, J.: General caching is hard: Even with small pages. In: Elbassioni, K.M., Makino, K. (eds.) Algorithms and Computation—26th International Symposium, ISAAC 2015, Nagoya, Japan, December 9–11, 2015, Proceedings, Lecture Notes in Computer Science, vol. 9472, pp. 116–126. Springer (2015). doi:10.1007/978-3-662-48971-0_11

  19. Irani, S.: Page replacement with multi-size pages and applications to web caching. Algorithmica 33(3), 384–409 (2002). doi:10.1007/s00453-001-0125-4. (A preliminary version appeared at STOC 1997)

  20. McGeoch, L.A., Sleator, D.D.: A strongly competitive randomized paging algorithm. Algorithmica 6(6), 816–825 (1991). doi:10.1007/BF01759073. (A preliminary version appeared in 1989)

  21. Young, N.E.: On-line file caching. Algorithmica 33(3), 371–383 (2002). doi:10.1007/s00453-001-0124-5. (A preliminary version appeared at SODA 1998)

Acknowledgments

We are grateful to the anonymous reviewers, in particular for bringing the articles [10, 16] to our attention and for suggestions that helped us to improve the presentation.

Author information

Correspondence to Lukáš Folwarczný.

Additional information

Partially supported by the Center of Excellence—ITI, Project P202/12/G061 of GA ČR (J. Sgall) and by the Project 14-10003S of GA ČR (L. Folwarczný).

Appendix: The Simple Proof

In this appendix, we present a simple variant of the proof for the almost-fault model with two distinct costs. This completes the sketch of the proof presented at the end of Sect. 2. We present it with a complete description of the simplified reduction, so that it can be read independently of the rest of the paper. This appendix can therefore serve as a short proof of the hardness of general caching.

Theorem 6.1

General caching is strongly NP-hard, even in the case when page sizes are limited to 1, 2, 3 and there are only two distinct fault costs.

We prove the theorem for the optional policy; the theorem for the forced policy follows in the same way as in the proof of Theorem 5.1.

1.1 The Reduction

The reduction described here will be equivalent to Reduction 2.1 with \(H = 1\) and the fault cost of each vertex-page set to \(1/(n+1)\).

Suppose we have a graph \(G=(V,E)\) with n vertices and m edges. We construct an instance of general caching whose optimal solution encodes a maximum independent set in G. Fix an arbitrary numbering of the edges \(e_1, \ldots , e_m\).

The cache size is \(C=2m+1\). For each vertex v, we have a vertex-page \(p_v\) with size one and cost \(1/(n+1)\). For each edge e, we have six associated edge-pages \(a^e, \bar{a}^e, \alpha ^e, b^e, \bar{b}^e, \beta ^e\); all have cost one, pages \(\alpha ^e,\beta ^e\) have size three and the remaining pages have size two.

The request sequence is organized in phases and blocks. There is one phase for each vertex v. In the phase for v, there are two adjacent blocks associated with each edge e incident to v; the incident edges are processed in an arbitrary order. In addition, there is one initial block I before all phases and one final block F after all phases. Altogether, there are \(d=4m+2\) blocks. There are four blocks associated with each edge e; denote them \(B^e_1\), \(B^e_2\), \(B^e_3\), \(B^e_4\), in the order in which they appear in the request sequence.

For each \(v\in V\), the associated page \(p_v\) is requested exactly twice, right before the beginning of the v-phase and right after the end of the v-phase; these requests do not belong to any phase. An example of the structure of phases and blocks is given in Fig. 6.

Fig. 6: An example of phases, blocks and requests on vertex-pages for a graph with three vertices u, v, w and two edges \(e_1 = \{u, w\}\), \(e_2 = \{v, w\}\) when \(H = 2\)

Even though each block is associated with some fixed edge, it contains one or more requests to the associated pages for every edge e. In each block, we process the edges \(e_1,\ldots ,e_m\) in this order. For each edge e, we make one or more requests to the associated pages according to Table 3.

Table 3 Requests associated with an edge e

Figure 7 shows an example of the requests on edge-pages associated with one particular edge.

Fig. 7: Requests on all pages associated with the edge e. Each column represents some block(s). The four labeled columns represent the blocks in the heading, the first column represents every block before \(B^e_1\), the middle column represents every block between \(B^e_3\) and \(B^e_4\), and the last column represents every block after \(B^e_4\). The requests in one column are ordered from top to bottom.
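The overall shape of the reduction (cache size, page sizes and costs, and the phase/block skeleton) can be summarized in code. The following Python sketch builds this skeleton for a given graph; the function and page names are ours, and the per-block requests on edge-pages (Table 3) are not reproduced here.

```python
from fractions import Fraction

def build_instance(vertices, edges):
    """Sketch of the reduction: cache size, pages, and the phase/block
    skeleton for a graph G = (V, E).  Blocks are labeled by (phase vertex,
    edge index, 1 or 2); the paper's labels B^e_1..B^e_4 correspond to
    these four blocks in request-sequence order."""
    n, m = len(vertices), len(edges)
    C = 2 * m + 1  # cache size

    pages = {}
    for v in vertices:  # vertex-pages: size 1, cost 1/(n+1)
        pages[('p', v)] = (1, Fraction(1, n + 1))
    for e in range(m):  # six edge-pages per edge; alpha and beta have size 3
        for name, size in [('a', 2), ('abar', 2), ('alpha', 3),
                           ('b', 2), ('bbar', 2), ('beta', 3)]:
            pages[(name, e)] = (size, Fraction(1))

    # Initial block I, two blocks per (phase, incident edge), final block F.
    blocks = ['I']
    for v in vertices:
        for e, (x, y) in enumerate(edges):
            if v in (x, y):
                blocks += [(v, e, 1), (v, e, 2)]
    blocks.append('F')
    assert len(blocks) == 4 * m + 2  # d = 4m + 2 blocks in total
    return C, pages, blocks
```

Each edge is incident to exactly two vertices and contributes two blocks per incident phase, which is where the count \(d = 4m+2\) comes from.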

1.2 Proof of Correctness

Instead of minimizing the service cost, we maximize the saving compared to the service which does not use the cache at all. This is clearly equivalent when considering the decision version of the problem.

Without loss of generality, we assume that any page is brought into the cache only immediately before a request to that page and removed from the cache only immediately after a request to that page; furthermore, the cache is empty at the beginning and at the end. That is, a page may be in the cache only between two consecutive requests to this page, and it is either in the cache for the whole interval between them or not at all.

Each page of size three is requested only twice, in two consecutive blocks, and these blocks are distinct for all pages of size three. Thus, a service of edge-pages is valid if and only if at each time at most m edge-pages are in the cache. It is thus convenient to think of the cache as m slots for edge-pages.

Each vertex-page is requested twice. Thus, the saving on the n vertex-pages is at most \(n/(n+1)<1\). Since all edge-pages have cost one, an optimal service must serve the edge-pages optimally. Furthermore, a vertex-page can be cached if and only if, during its phase, it never happens that all m slots for edge-pages are full while a page of size three is cached.

Let \(S_B\) denote the set of all edge-pages cached at the beginning of the block B and let \(s_B = |S_B|\). Now observe that each edge-page is requested only in a contiguous segment of blocks, once in each block. It follows that the total saving on edge-pages is equal to \(\sum _B s_B\), where the sum is over all blocks. In particular, the maximum possible saving on the edge-pages is \((d-1)m\), using the fact that \(S_I\) is empty.
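The bound just derived, and the decision threshold \((d-1)m + K/(n+1)\) appearing below, can be computed exactly with rational arithmetic. This is a small illustrative helper with a name of our own choosing, not part of the reduction itself.

```python
from fractions import Fraction

def saving_threshold(n, m, K):
    """Decision threshold of the reduction: (d-1)m + K/(n+1), d = 4m + 2.
    For K = 0 this is the maximum edge-page saving (d-1)m.  Since K <= n,
    the fractional part K/(n+1) is always below 1, so the vertex-page
    saving can never substitute for a full unit of edge-page saving."""
    d = 4 * m + 2
    return (d - 1) * m + Fraction(K, n + 1)
```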

We prove that there is a service with the total saving at least \((d-1)m+K/(n+1)\) if and only if there is an independent set of size K in G. First the easy direction.

Lemma 6.2

Suppose that G has an independent set W of size K. Then there exists a service with the total saving \((d-1)m+K/(n+1)\).

Proof

For any e, write \(e=\{u, v\}\) so that u precedes v in the ordering of phases. If \(u\in W\), we keep \(\bar{a}^e, b^e, \bar{b}^e\) and \(\beta ^e\) in the cache from the first to the last request on each page, and we do not cache \(a^e\) and \(\alpha ^e\) at any time. Otherwise we cache \(\bar{b}^e, a^e, \bar{a}^e\) and \(\alpha ^e\), and do not cache \(b^e\) and \(\beta ^e\) at any time. In both cases, at each time at most one page associated with e is in the cache, and the total saving on edge-pages is \((d-1)m\). See Fig. 8 for an illustration.

Fig. 8: The two ways of caching in Lemma 6.2

For any \(v\in W\), we cache \(p_v\) between its two requests. To check that this is a valid service, observe that if \(v\in W\), then during the corresponding phase no page of size three is cached. Thus the page \(p_v\) always fits in the cache together with at most m pages of size two. \(\square \)
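The case analysis in this proof amounts to a simple invariant: for each edge, exactly one of its size-3 pages is cached, and it lies in the phase of an endpoint outside W. The sketch below (our own naming; it assumes vertices are listed in phase order) checks this invariant, and it succeeds precisely when W is independent.

```python
def valid_size3_placement(vertices, edges, W):
    """For each edge e = {u, v} with u's phase first, the proof of
    Lemma 6.2 caches beta^e (size 3, living in v's phase) when u is in W,
    and alpha^e (size 3, living in u's phase) otherwise.  The service is
    valid iff no size-3 page is cached during the phase of a vertex in W,
    since p_w plus m-1 size-2 pages plus a size-3 page would exceed C."""
    order = {v: i for i, v in enumerate(vertices)}  # phase order
    for x, y in edges:
        u, v = (x, y) if order[x] < order[y] else (y, x)
        size3_phase = v if u in W else u  # phase holding e's size-3 page
        if size3_phase in W:
            return False
    return True
```

On a triangle, any single vertex passes the check while any two adjacent vertices fail it, matching the independence requirement.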

Now we prove the converse in a sequence of claims. Fix a valid service with saving at least \((d-1)m\). For a block B, let \(B^{\prime }\) denote the following block.

Claim 6.3

For any block B, with the exception of \(B=I\), we have \(s_B=m\).

Proof

For each \(B\ne I\) we have \(s_B\le m\). Because \(s_I = 0\), the total saving on edge-pages is \(\sum _B s_B\le (d-1)m\). Since the saving on vertex-pages is less than 1 and the total saving is at least \((d-1)m\), equality must hold in every term, i.e., \(s_B=m\) for each \(B\ne I\). \(\square \)

We now prove that each edge occupies exactly one slot during the service.

Claim 6.4

For any block \(B\ne I\) and for any e, \(S_B\) contains exactly one page associated with e.

Proof

For a block B, let \(S_B^{e}\) denote the set of pages associated with the edge e in \(S_B\), and write \(S_B^{\le k} = S_B^{e_1} \cup \cdots \cup S_B^{e_k}\) and \(s_B^{\le k} = \left| S_B^{\le k}\right| \). First, we shall prove that for each \(k \le m\)

$$\begin{aligned} s_B^{\le k} = k. \qquad (8) \end{aligned}$$

This is true for \(B=F\), as only the m edge-pages \(\bar{b}^e\) can be cached there, and by the previous claim all of them are indeed cached. Similarly for \(B=I^{\prime }\), the block immediately following the initial block.

If (8) is not true, then for some k and \(B\not \in \{I,F\}\) we have \(s_B^{\le k}<s_{B^{\prime }}^{\le k}\). Then after processing the edge \(e_k\) in the block B we have in the cache all the pages in \((S_B{\setminus }S_B^{\le k})\cup S_{B^{\prime }}^{\le k}\). Their number is \((m-s_B^{\le k})+s_{B^{\prime }}^{\le k}>m\), a contradiction.

The statement of the claim is an immediate consequence of (8). \(\square \)

Claim 6.5

For any edge e, at least one of the pages \(\alpha ^e\) and \(\beta ^e\) is cached between its two requests.

Proof

Assume that none of the two pages is cached. It follows from the previous claim that \(b^e\in S_{B^e_2}\), as at this point \(\alpha ^e\) and \(b^e\) are the only pages associated with e that can be cached. Similarly, \(a^e\in S_{B^e_4}\).

It follows that there exists a block B between \(B^e_1\) and \(B^e_4\) such that \(S_B\) contains the page \(b^e\) and \(S_{B^{\prime }}\) contains the page \(a^e\). However, in B, the page \(a^e\) is requested before the page \(b^e\). Thus, at the point between the two requests, the cache contains two pages associated with e, plus one page associated with every other edge, for a total of \(m+1\) pages, a contradiction. \(\square \)

Now we are ready to complete this direction.

Lemma 6.6

Suppose that there exists a valid service with the total saving \((d-1)m+K/(n+1)\). Then G has an independent set W of size K.

Proof

Let W be the set of all v such that \(p_v\) is cached between its two requests. The total saving implies that \(|W|=K\).

Now we claim that W is independent. Suppose not, and let \(e=\{u,v\}\) be an edge with \(u,v\in W\). Then \(p_u\) and \(p_v\) are cached during the corresponding phases. Thus neither \(\alpha ^e\) nor \(\beta ^e\) can be cached: together with the \(m-1\) cached pages of size 2 associated with the remaining edges and the cached vertex-page, the cache size needed would be \(2m+2 > C\). However, this contradicts Claim 6.5. \(\square \)

Lemmas 6.2 and 6.6 together show that we have constructed a valid polynomial-time reduction from the independent set problem to general caching. This proves Theorem 6.1.

Cite this article

Folwarczný, L., Sgall, J. General Caching Is Hard: Even with Small Pages. Algorithmica 79, 319–339 (2017). https://doi.org/10.1007/s00453-016-0185-0
