Abstract
Functional programming (FP) lets users focus on the business logic of their applications by providing them with high-level and composable abstractions. However, both automatic memory management schemes traditionally used for FP, namely tracing garbage collection and reference counting, may introduce latencies in places that can be hard to predict, which limits the applicability of the FP paradigm.
We reevaluate the use of lazy reference counting in single-threaded functional programming with guaranteed constant-time memory management, meaning that allocation and deallocation take only a bounded and predictable amount of time. This approach does not leak memory as long as we use uniform allocation sizes. Uniform allocation sizes were previously considered impractical in the context of imperative programming, but we find them to be surprisingly suitable for FP.
Preliminary benchmark results suggest that our approach is practical, as its performance is on par with Koka’s existing state-of-the-art implementation of reference counting for FP, sometimes even outperforming it. We also evaluate the effect of different allocation sizes on application performance and suggest ways of allowing large allocations in non-mission-critical parts of the program via Koka’s effect system.
We believe this potentially opens the door to many new industrial applications of FP, such as its use in real-time embedded software. In fact, the development of a high-level domain-specific language for describing latency-critical quantum physics experiments was one of the original use cases that prompted us to initiate this work.
Notes
1. Note that since object sizes are a constant of the program, field accesses still take constant time, even for split objects.
2. Commit hash: b167030.
3. The only change needed for hybrid reference counting in the CTRC allocator implementation is the addition of a check for the dirty bit stored in object headers.
4. Slightly modified to merge rules related to lambda and constructor for cleaner presentation.
References
Blackburn, S.M., McKinley, K.S.: Ulterior reference counting: fast garbage collection without a long wait. In: Crocker, R., Steele Jr., G.L. (eds.) Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, OOPSLA 2003, 26–30 October 2003, Anaheim, CA, USA, pp. 344–358. ACM (2003). https://doi.org/10.1145/949305.949336
Blelloch, G.E., Wei, Y.: Concurrent fixed-size allocation and free in constant time (2020). https://doi.org/10.48550/ARXIV.2008.04296, https://arxiv.org/abs/2008.04296
Boehm, H.J.: The space cost of lazy reference counting. In: Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, pp. 210–219. Association for Computing Machinery, New York (2004). https://doi.org/10.1145/964001.964019. ISBN 158113729X
Bruno, R., Jovanovic, V., Wimmer, C., Alonso, G.: Compiler-assisted object inlining with value fields. In: Freund, S.N., Yahav, E. (eds.) PLDI 2021: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, 20–25 June 2021, pp. 128–141. ACM (2021). https://doi.org/10.1145/3453483.3454034
Collins, G.E.: A method for overlapping and erasure of lists. Commun. ACM 3(12), 655–657 (1960). https://doi.org/10.1145/367487.367501. ISSN 0001-0782
Comer, D.: Ubiquitous B-tree. ACM Comput. Surv. (CSUR) 11(2), 121–137 (1979)
Czaplicki, E., Chong, S.: Asynchronous functional reactive programming for GUIs. In: Boehm, H., Flanagan, C. (eds.) ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, Seattle, WA, USA, 16–19 June 2013, pp. 411–422. ACM (2013). https://doi.org/10.1145/2491956.2462161
Dolby, J.: Automatic inline allocation of objects. In: Chen, M.C., Cytron, R.K., Berman, A.M. (eds.) Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI), Las Vegas, Nevada, USA, 15–18 June 1997, pp. 7–17. ACM (1997). https://doi.org/10.1145/258915.258918
Doligez, D., Leroy, X.: A concurrent, generational garbage collector for a multithreaded implementation of ML. In: Deusen, M.S.V., Lang, B. (eds.) Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, South Carolina, USA, January 1993, pp. 113–123. ACM Press (1993). https://doi.org/10.1145/158511.158611
Elliott, C., Hudak, P.: Functional reactive animation. In: International Conference on Functional Programming (1997). http://conal.net/papers/icfp97/
Joisha, P.G.: Compiler optimizations for nondeferred reference: counting garbage collection. In: Proceedings of the 5th International Symposium on Memory Management, ISMM 2006, pp. 150–161. Association for Computing Machinery, New York (2006). https://doi.org/10.1145/1133956.1133976. ISBN 1595932216
Joisha, P.G.: Overlooking roots: a framework for making nondeferred reference-counting garbage collection fast. In: Proceedings of the 6th International Symposium on Memory Management, ISMM 2007, pp. 141–158. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1296907.1296926. ISBN 9781595938930
Leijen, D., Zorn, B., de Moura, L.: Mimalloc: free list sharding in action. In: Lin, A.W. (ed.) APLAS 2019. LNCS, vol. 11893, pp. 244–265. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34175-6_13
McBeth, J.H.: Letters to the editor: on the reference counter method. Commun. ACM 6(9), 575 (1963). https://doi.org/10.1145/367593.367649. ISSN 0001-0782
Nilsson, H., Courtney, A., Peterson, J.: Functional reactive programming, continued. In: Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell, Haskell 2002, pp. 51–64. Association for Computing Machinery, New York (2002). https://doi.org/10.1145/581690.581695. ISBN 1581136056
Partain, W.: The nofib benchmark suite of Haskell programs. In: Launchbury, J., Sansom, P.M. (eds.) Functional Programming, Glasgow 1992. Workshops in Computing, pp. 195–202. Springer, London (1992). https://doi.org/10.1007/978-1-4471-3215-8_17
Powers, B., Tench, D., Berger, E.D., McGregor, A.: Mesh: compacting memory management for C/C++ applications. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pp. 333–346. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3314221.3314582. ISBN 9781450367127
Puaut, I.: Real-time performance of dynamic memory allocation algorithms. In: 14th Euromicro Conference on Real-Time Systems (ECRTS 2002), 19-21 June 2002, Vienna, Austria, Proceedings, pp. 41–49. IEEE Computer Society (2002). https://doi.org/10.1109/EMRTS.2002.1019184
Reinking, A., Xie, N., de Moura, L., Leijen, D.: Perceus: garbage free reference counting with reuse. In: Freund, S.N., Yahav, E. (eds.) PLDI 2021: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, 20–25 June 2021, pp. 96–111. ACM (2021). https://doi.org/10.1145/3453483.3454032
Shahriyar, R., Blackburn, S.M., Frampton, D.: Down for the count? Getting reference counting back in the ring. In: Proceedings of the 2012 International Symposium on Memory Management, ISMM 2012, pp. 73–84. Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2258996.2259008. ISBN 9781450313506
Ullrich, S., de Moura, L.: Counting immutable beans: reference counting optimized for purely functional programming. In: Proceedings of the 31st Symposium on Implementation and Application of Functional Languages, IFL 2019. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3412932.3412935. ISBN 9781450375627
Wan, Z., Hudak, P.: Functional reactive programming from first principles. In: Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI 2000, pp. 242–252. Association for Computing Machinery, New York (2000). https://doi.org/10.1145/349299.349331. ISBN 1581131992
Wan, Z., Taha, W., Hudak, P.: Real-time FRP. In: Pierce, B.C. (ed.) Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming (ICFP 2001), Firenze (Florence), Italy, 3–5 September 2001, pp. 146–156. ACM (2001). https://doi.org/10.1145/507635.507654
Weizenbaum, J.: Symmetric list processor. Commun. ACM 6(9), 524–536 (1963). https://doi.org/10.1145/367593.367617. ISSN 0001-0782
Appendices
A Formalization
In this section, we present the formal operational semantics of CTRC, prove its soundness, and show that it is garbage-free when the free list is empty (and the free list is always used for new allocations when non-empty).
A.1 Syntax
Figure 9 shows the syntax of \(\lambda ^1\), which is the same as that presented by Reinking et al. [19]. It is an untyped lambda calculus extended with explicit binding, pattern matching, and duplicate and drop instructions. Note that the duplicate and drop instructions are inserted by the compiler into the compiled program and are not written by the user. Constructors with fields \(x_1, x_2, \ldots , x_n\) are denoted as \(C~\overline{x_i}^n\). Functions with parameter x, body e, and free variables \(y_1, y_2, \ldots , y_n\) are denoted as \(\lambda ^{\overline{y_i}^n}x.~e\).
We also define a few syntactic shorthands to simplify the presentation. The sequence \(e_1; e_2\) is defined as binding \(e_1\) to an unused variable x; functions are written as \(\lambda x.~e\) when their free variables are not important; and \(\textrm{dropf}~ v\) is defined for functions and constructors. \(\textrm{dropf}\) drops the fields of a constructor or the free variables of a function. It is a syntactic shorthand because it can be expanded into a fixed number of drops.
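The grammar can be summarized informally as follows; this is only a paraphrase of the description above, and the authoritative definition is the one of Fig. 9 (and of Reinking et al. [19]).
\[
\begin{array}{rcl}
e & ::= & v \;\mid\; e~e \;\mid\; \textrm{val}~x = e;~e \;\mid\; \textrm{match}~x \;\{\overline{p_i\rightarrow e_i}^n\} \;\mid\; \textrm{dup}~x;~e \;\mid\; \textrm{drop}~x;~e \\
v & ::= & x \;\mid\; \lambda ^{\overline{y_i}^n}x.~e \;\mid\; C~\overline{x_i}^n \\
p & ::= & C~\overline{x_i}^n
\end{array}
\]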
There are three different evaluation judgments, corresponding to different operational semantics.
-
Baseline Semantics The baseline semantics is the typical operational semantics that does not model memory management. Note that the syntax for the match expression is modified to \(\textrm{match}~ e \;\{\overline{p_i\rightarrow e_i}^n\}\), as the variable being matched is replaced with a value. The evaluation rules for the baseline semantics are shown in Fig. 10, where the app rule is function application, the bind rule is variable binding, and the match rule is pattern matching (a sketch of these rules is given after this list). \(\textrm{drop}~\) and \(\textrm{dup}~\) instructions are ignored, as the baseline semantics does not model memory management and these two instructions exist only for reference counting. In this paper, the baseline semantics serves as the reference for program behavior, which the other two operational semantics should simulate. The simulation relation is stated in terms of the simplified program trace defined below.
-
Reference Koka Semantics The reference Koka semantics, which we shall later refer to as the eager semantics, models memory management with reference counting. The heap H is a mapping from variable to reference count and value. The evaluation judgment \(H \bigm |e \longrightarrow _{\textrm{k}}H' \bigm |e'\) reads as follows: given a heap H, the expression e is evaluated to \(e'\) with the heap updated to \(H'\). This semantics is discussed in detail below.
-
New CTRC Semantics The new CTRC semantics, which can also be called the lazy semantics, models memory management with lazy reference counting. Instead of just a heap, the semantics also includes a free list F which stores reusable allocations. The evaluation judgment \(H; F \bigm |e \longrightarrow _{\textrm{c}}H'; F' \bigm |e'\) reads as follows: given a heap H and free list F, the expression e is evaluated to \(e'\), with the heap updated to \(H'\) and the free list updated to \(F'\).
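As a reading aid, the three baseline rules can be sketched informally as follows, eliding evaluation contexts; the authoritative rules are those of Fig. 10.
\[
\begin{array}{ll}
(\textit{app}) & (\lambda ^{\overline{y_i}^n}x.~e)~v \;\longrightarrow \; e[x{:=}v] \\
(\textit{bind}) & \textrm{val}~x = v;~e \;\longrightarrow \; e[x{:=}v] \\
(\textit{match}) & \textrm{match}~(C~\overline{v_i}^n)\;\{\ldots ,\; C~\overline{x_i}^n \rightarrow e,\; \ldots \} \;\longrightarrow \; e[\overline{x_i{:=}v_i}^n]
\end{array}
\]
In the baseline semantics, \(\textrm{dup}~x;~e\) and \(\textrm{drop}~x;~e\) simply step to e.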
A.2 Reference Koka Semantics
Figure 11 shows the original (see Note 4) Koka reference-counted heap semantics. The evaluation context E uniquely determines where to apply an evaluation step. The boxed instructions are tools we introduced to facilitate the proofs below. When the boxed instructions are removed, the semantics becomes the baseline semantics without reference counting, so the boxed instructions can be viewed as internal routines of the memory management scheme. Note that \(\textrm{drop}~\) and \(\textrm{dup}~\) instructions may be added by the compiler, i.e. they exist statically in the program, and are then not boxed. They can also be internal routines of the memory management scheme, introduced on the right-hand side of evaluation rules, in which case they are boxed. We use a box to denote instructions that can be either normal or boxed.
Values are allocated in the heap with rule \(\textit{new}_{\textrm{k}}\) and evaluate to the variable pointing to the allocation. The freshly allocated variable has a reference count of 1. Function application with rule \(\textit{app}_{\textrm{k}}\) duplicates the captured values of the function, drops the function allocation itself, and then performs the actual application via substitution. Similarly, for pattern matching, rule \(\textit{match}_{\textrm{k}}\) duplicates the fields of the constructor, drops the constructor object itself, and substitutes the fields into the pattern of the matched case. The \(\textit{drop}_{\textrm{k}}\) and \(\textit{dup}_{\textrm{k}}\) rules update the reference count of the target variable. When the reference count reaches 1, the drop instruction is instead evaluated according to \(\textit{free}_{\textrm{k}}\), which drops the fields of the value and deallocates the allocation.
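To make the counting behavior concrete, the \(\textit{dup}_{\textrm{k}}\), \(\textit{drop}_{\textrm{k}}\), and \(\textit{free}_{\textrm{k}}\) rules can be sketched as follows, writing heap entries as \(x\mapsto (n, v)\) for a reference count n and value v. This is only an informal paraphrase, eliding evaluation contexts and the boxing of instructions; the authoritative rules are those of Fig. 11.
\[
\begin{array}{ll}
(\textit{dup}_{\textrm{k}}) & H, x\mapsto (n, v) \bigm |\textrm{dup}~x;~e \;\longrightarrow _{\textrm{k}}\; H, x\mapsto (n{+}1, v) \bigm |e \\
(\textit{drop}_{\textrm{k}}) & H, x\mapsto (n, v) \bigm |\textrm{drop}~x;~e \;\longrightarrow _{\textrm{k}}\; H, x\mapsto (n{-}1, v) \bigm |e \qquad (n > 1) \\
(\textit{free}_{\textrm{k}}) & H, x\mapsto (1, v) \bigm |\textrm{drop}~x;~e \;\longrightarrow _{\textrm{k}}\; H \bigm |\textrm{dropf}~v;~e
\end{array}
\]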
We define the simplified program trace as the sequence of program states obtained when executing according to some operational semantics, excluding the heap, the free list, and all steps that involve boxed instructions. The simplified program trace corresponds to the execution trace of the baseline semantics and should be the same for both the reference Koka semantics and the CTRC semantics.
For example, the full program trace of \(\textrm{val}~x=C_1;\; \textrm{dup}~ x;\; \textrm{val}~y=\lambda ^{x} z.\ x;\; \textrm{drop}~ x;\; \textrm{drop}~ y;\; \lambda x.\ x\) according to the reference Koka semantics is as follows.
Each row above shows the rule used to arrive at the current state, current heap and the resulting expression. The simplified program trace contains states 1–7, 9–10. State 8 is excluded from the simplified trace because it contains boxed instructions.
A.3 New CTRC Semantics
We define the operational semantics for the constant-time reference-counted heap, i.e. the lazy semantics, in Fig. 12. The reference count in the heap can now be zero, indicating that the value is no longer reachable and has been added to the free list. The free list, denoted by F, contains a list of memory locations that the program can reuse.
The major differences between the reference Koka semantics and the CTRC semantics are the allocation and deallocation rules. When the free list is empty, allocation requests are met by requesting more memory from the system according to rule \(\textit{new}_{\textrm{c}}\), which is the same as the rule \(\textit{new}_{\textrm{k}}\) in the reference Koka semantics. When the free list is non-empty, however, the first entry in the free list is used to meet the request, and the fields in the original value of the entry are dropped according to the rule \(\textit{newr}_{\textrm{c}}\), where the r suffix stands for reuse.
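For concreteness, the deallocation and allocation rules can be sketched informally as follows, eliding evaluation contexts and exactly where the \(\textrm{dropf}~\) instructions are inserted; the authoritative rules are those of Fig. 12.
\[
\begin{array}{ll}
(\textit{free}_{\textrm{c}}) & H, x\mapsto (1, w);\; F \bigm |\textrm{drop}~x;~e \;\longrightarrow _{\textrm{c}}\; H, x\mapsto (0, w);\; F{,}\,x \bigm |e \\
(\textit{new}_{\textrm{c}}) & H;\; \varnothing \bigm |v \;\longrightarrow _{\textrm{c}}\; H, x\mapsto (1, v);\; \varnothing \bigm |x \qquad (x~\textrm{fresh}) \\
(\textit{newr}_{\textrm{c}}) & H, y\mapsto (0, w);\; y{,}\,F \bigm |v \;\longrightarrow _{\textrm{c}}\; H, y\mapsto (1, v);\; F \bigm |\textrm{dropf}~w;~y
\end{array}
\]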
For the previous example, the program trace is
Each row above shows the rule used to arrive at the current state, the current heap, the current free list, and the expression being evaluated. The simplified program trace is the above trace excluding step 9. Note that the full traces for the eager semantics and the lazy semantics are very similar, except in the last few steps, where they treat freeing and allocation differently. In the eager semantics, step 8 recursively drops the field of w, while the lazy semantics just puts w into the free list. The field of w is dropped when there is a new allocation request, which happens in step 9 above. When the states involving boxed instructions are removed, the simplified traces for both semantics are the same and correspond to the baseline semantics.
A.4 Metatheory
In this section, we prove the correctness of the CTRC semantics. We show that the simplified program traces for the reference Koka semantics and the CTRC semantics are equal. From this, we derive that the CTRC semantics never reuses memory before the reference Koka semantics drops it. By the soundness of the reference Koka semantics, the CTRC semantics is also sound, because it cannot cause memory corruption. We then prove that the system is garbage-free when the free list is empty. As CTRC does not request memory from the system when the free list is non-empty, it does not allocate more memory than needed. This property is also one that enables the eager-deallocating-allocation effect extension to work (see Sect. 2.5). Lastly, we show that each memory instruction of CTRC performs only a statically bounded number of steps, which provides the constant-time guarantee as promised.
Lemma 1
The eager semantics and lazy semantics simulate the baseline semantics.
Proof
Boxed instructions do not add any non-boxed instructions when evaluated, so they can be safely removed from the rules without affecting the simplified program trace. The resulting rules are the same for both semantics, so their simplified program traces are the same.
With the simulation relation, we can define time in program execution by the position in the simplified trace, i.e. according to the baseline semantics. We denote the reference count of variable x at a certain time when executed according to the eager semantics and the lazy semantics by \(x_{\textrm{k}}\) and \(x_{\textrm{c}}\) respectively.
Lemma 2
At any point in the program execution, we have \(x_{\textrm{k}}\le x_{\textrm{c}}- x_f\), where \(x_f\) is the number of times x occurs as a field of variables that are freed in the eager semantics but not reused in the lazy semantics.
Proof
First, notice that if the proposition holds, the lazy execution never reuses memory before the eager execution deallocates the variable. This is because, in order for the lazy execution to reuse memory, it has to execute the \(\textit{newr}_{\textrm{c}}\) rule, whereas the corresponding \(\textit{new}_{\textrm{k}}\) rule of the eager semantics does not deallocate anything. Let \(x_{\textrm{k}}'\) and \(x_{\textrm{c}}'\) be the reference counts after this step. We know that \(x_{\textrm{c}}' = 0\) because we deallocate in this step, and that \(x_{\textrm{k}}= x_{\textrm{k}}'\) as the eager semantics does not deallocate in this step. Hence \(x_{\textrm{k}}= x_{\textrm{k}}' \le x_{\textrm{c}}' = 0\), so x is already deallocated in the eager execution.
Now we prove the proposition by induction on the evaluation rules.
-
Case \(H = \varnothing \) . Initially, the free list and heap are empty, so the proposition holds trivially.
-
Case new . For allocation expressions, the \(\textit{new}_{\textrm{k}}\) and \(\textit{new}_{\textrm{c}}/\textit{newr}_{\textrm{c}}\) rules are executed. In both semantics, the newly allocated value has \(x_{\textrm{k}}= x_{\textrm{c}}= 1\) and is not freed in the eager semantics, so the proposition holds for the newly allocated value.
We now prove that the proposition still holds for all the original fields of the value being reused. Notice that the eager semantics cannot perform deallocation when evaluating an allocation, so \(f_{\textrm{k}}\) is not changed. For \(\textit{newr}_{\textrm{c}}\), the fields of the old deallocated value old are dropped, so the reference count \(f_{\textrm{c}}\) of a field f is decremented n times, where n is the number of occurrences of the variable among the fields of the deallocated object. \(f_f\) is also decremented by n, because old is now reused in the lazy semantics and its fields no longer contribute to \(f_f\), so the inequality still holds for f.
For other objects y, as the lazy semantics does not perform recursive drops, their reference counts are not changed. Also, as the eager semantics does not update any reference count during allocation except that of the newly allocated value, \(y_f\) does not change, and the inequality still holds.
-
Case free For the \(\textit{free}_{\textrm{k}}\) and \(\textit{free}_{\textrm{c}}\) rules, it is easy to see that \(\textit{free}_{\textrm{k}}\) decrements the reference count \(x_{\textrm{k}}\) of every field x of the deallocated value by n, the number of occurrences of x among those fields, while \(\textit{free}_{\textrm{c}}\) causes \(x_f\) to increase by n and leaves \(x_{\textrm{c}}\) unchanged. So the proposition still holds for all fields of the deallocated value (a small worked instance is given after this case analysis).
-
Other cases For other rules, both semantics have the same behavior so they do not affect the invariant.
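As a concrete instance of the free case: suppose the deallocated value has the fields \([x, x]\), so \(n = 2\), and suppose that before this step \(x_{\textrm{k}}= 3\), \(x_{\textrm{c}}= 3\), and \(x_f = 0\). After the eager \(\textit{free}_{\textrm{k}}\) step we have \(x_{\textrm{k}}= 1\); after the lazy \(\textit{free}_{\textrm{c}}\) step we have \(x_{\textrm{c}}= 3\) and \(x_f = 2\); so \(x_{\textrm{k}}= 1 \le x_{\textrm{c}}- x_f = 1\) still holds. (The numbers here are purely illustrative.)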
Corollary 1
The lazy semantics is sound, i.e. it only reuses garbage that would have been deallocated in the eager semantics.
We now prove the garbage-free property for this lazy semantics, and the proof also shows that one can perform garbage collection and get to the same state as in the eager semantics.
Lemma 3
When the points-to graph is acyclic and the free list is empty, \(v_{\textrm{k}}= v_{\textrm{c}}\).
Proof
Note that we only have to count the number of drop calls, because dups are treated the same in both semantics, and drops are commutative, so their order does not matter.
By induction on the longest distance from the root set in the points-to graph. If the longest distance is zero, the claim holds because the reference count is determined solely by the dup and drop calls, as there are no references to the variable. For the induction case, note that every pointer pointing to the current object comes from an object with a strictly smaller longest distance, so the induction hypothesis holds for it. If the pointer is from some garbage, then by the induction hypothesis the reference count of the garbage is the same as in the eager semantics. Because the eager semantics is garbage-free, the garbage has a reference count of 0, so it has already been dropped and added to the free list. As the free list is empty, that memory has already been reused by the \(\textit{newr}_{\textrm{c}}\) rule and its fields have been dropped. Hence, the reference count of the current object is equal to the number of live objects pointing to it, which is the same as in the eager semantics.
Note that the proof requires an acyclic heap, which is also a property required for reference counting to work. For functional programming languages without mutation, with a suitable compilation strategy, programs can be guaranteed to have no reference cycles.
Corollary 2
Acyclic heaps are garbage-free when the free list is empty.
The relationship between eager reference counting and lazy reference counting is shown in Fig. 5. The heap is originally garbage-free as there is no allocation. When users perform deallocation, eager deallocation removes all garbage associated with the object, while lazy deallocation turns the heap into the CTRC heap. When the user empties the free list of the CTRC heap, the heap becomes garbage-free again.
Theorem 1
(Constant-time memory management). Each memory management instruction takes constant time with the CTRC semantics.
Proof
There are three cases to consider:
-
Case \(\textrm{dup}~\) This instruction is evaluated according to \(\textit{dup}_{\textrm{c}}\) in 1 step.
-
Case \(\textrm{drop}~\) This instruction can be evaluated according to \(\textit{drop}_{\textrm{c}}\) or \(\textit{free}_{\textrm{c}}\), either of which is evaluated in 1 step. \(\textit{free}_{\textrm{c}}\) requires appending a variable to the free list, which can be implemented in constant time with a linked list.
-
Case Allocation There are two cases for allocation, depending on whether the free list is empty.
-
Subcase Empty free list Allocation is evaluated according to rule \(\textit{new}_{\textrm{c}}\), which requests memory from the system in 1 step.
-
Subcase Non-empty free list Allocation reuses an allocation from the free list and drops all its fields according to rule \(\textit{newr}_{\textrm{c}}\). As we assume the number of fields is statically bounded, and each \(\textrm{drop}~\) instruction takes a statically bounded number of CPU operations, the whole operation takes a statically bounded number of CPU operations.
Note that the formalization differs from the actual implementation: we do not distinguish between objects and segments. The compiler is responsible for splitting objects into segments, satisfying the constant-size requirement. We do not model this compiler transformation because there can be many different implementations, and our operational semantics does not depend on such details. As the size is bounded, the number of fields of each object is also bounded.
B CTRC Allocator Source Code
In this appendix, we present our implementation of basic CTRC (without the locality optimization).
The defer_drop function is used for deallocating objects, and the get_block function is used for allocating new objects.
Header initialization and reference-count updates are handled in the Koka runtime.
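The listing itself is not reproduced here. The following is a minimal C sketch of how such an allocator could look, assuming uniform block sizes and an intrusive singly linked free list. Only the names defer_drop and get_block come from the text above; block_t, BLOCK_FIELDS, free_list, request_block_from_system, drop_field, and the embedded reference count are illustrative assumptions of this sketch, not the actual Koka runtime interface.

#include <stddef.h>
#include <stdlib.h>

#define BLOCK_FIELDS 4                 /* uniform number of field slots per block (assumption) */

typedef struct block {
    size_t rc;                          /* sketch only: the real runtime keeps the
                                           reference count in the Koka object header */
    struct block *next_free;            /* sketch only: link used while on the free list */
    struct block *fields[BLOCK_FIELDS]; /* child pointers of the stored value */
} block_t;

static block_t *free_list = NULL;       /* intrusive singly linked free list */

/* Stand-in for obtaining a fresh block from the underlying page allocator. */
static block_t *request_block_from_system(void) {
    return (block_t *)calloc(1, sizeof(block_t));
}

/* defer_drop: deallocation is a constant-time push onto the free list.
 * The dead block's fields are NOT visited here -- that is the lazy part. */
void defer_drop(block_t *b) {
    b->next_free = free_list;
    free_list = b;
}

/* Decrement a child's reference count; when it reaches zero, defer its drop. */
static void drop_field(block_t *child) {
    if (child->rc > 1) {
        child->rc -= 1;
    } else {
        child->rc = 0;
        defer_drop(child);
    }
}

/* get_block: allocation pops one block from the free list and drops the old
 * value's fields (a statically bounded amount of work), or falls back to the
 * system when the free list is empty. Header initialization, including the new
 * reference count, is left to the caller, as in the Koka runtime. */
block_t *get_block(void) {
    if (free_list == NULL)
        return request_block_from_system();
    block_t *b = free_list;
    free_list = b->next_free;           /* unlink: constant time */
    for (size_t i = 0; i < BLOCK_FIELDS; i++) {
        if (b->fields[i] != NULL) {     /* lazily drop the previous value's fields */
            drop_field(b->fields[i]);
            b->fields[i] = NULL;
        }
    }
    return b;
}

The point of the sketch is that defer_drop is a single list push, and get_block performs at most one pop plus BLOCK_FIELDS drops, i.e. a statically bounded amount of work, which is what the constant-time guarantee relies on.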