Embedding arbitrary Boolean circuits into fungal automata

Fungal automata are a variation of the two-dimensional sandpile automaton of Bak, Tang, and Wiesenfeld (Phys. Rev. Lett. 1987). In each step toppling cells emit grains only to some of their neighbors chosen according to a specific update sequence. We show how to embed any Boolean circuit into the initial configuration of a fungal automaton with update sequence $HV$. In particular we give a constructor that, given the description $B$ of a circuit, computes the states of all cells in the finite support of the embedding configuration in $O(\log |B|)$ space. As a consequence the prediction problem for fungal automata with update sequence $HV$ is $\mathsf{P}$-complete. This solves an open problem of Goles et al. (Phys. Lett. A, 2020).


Introduction
The two-dimensional sandpile automaton by Bak, Tang, and Wiesenfeld [1] has been investigated from different points of view. Because of the simple local rule, it is easily generalized to the d-dimensional case for any integer d ≥ 1.
Several prediction problems for these cellular automata (CA) have been considered in the literature. Their difficulty varies with the dimensionality. The recent survey by Formenti and Perrot [3] gives a good overview. For one-dimensional sandpile CA the problems are known to be easy (see, e.g., [7]). For d-dimensional sandpile CA where d ≥ 3, they are known to be P-complete [8]. In the two-dimensional case the situation is unclear; analogous results are not known.
Fungal automata (FA) as introduced by Goles et al. [6] are a variation of the two-dimensional sandpile automaton where a toppling cell (i.e., a cell with state ≥ 4) emits 2 excess grains of sand either to its two horizontal ("H") or to its two vertical neighbors ("V"). These two modes of operation may alternate depending on an update sequence specifying in which steps grains are moved horizontally and in which steps vertically.
The construction in [6] shows that some natural prediction problem is P-complete for two-dimensional fungal automata with update sequence $H^4V^4$ (i.e., grains are first transferred horizontally for 4 steps and then vertically for 4 steps, alternatingly). The paper leaves open whether the same holds for shorter update sequences. The shortest non-trivial sequence is HV (and its complement VH); at the same time this appears to be the most difficult to use. By a reduction from the well-known circuit value problem (CVP), which is P-complete, we will show:
Theorem 1. The following prediction problem is P-complete for FA with update sequence HV: Given as inputs initial states for a finite rectangle R of cells, a cell index y (encoded in binary), and an upper bound T (encoded in unary) on the number of steps of the FA, decide whether cell y is in a state ≠ 0 or not at some time t ≤ T when the FA is started with R surrounded by cells all in state 0.
We assume readers are familiar with cellular automata (see Section 2 for the definition). We also assume knowledge of basic facts about Boolean circuits and complexity theory, some of which we recall next.

Boolean circuits and the CVP
A Boolean circuit is a directed acyclic graph of gates: not gates (with one input), and and or gates with two inputs, n ≥ 1 input gates and one output gate. The output of a gate may be used by an arbitrary number of other gates. Since a circuit is a dag and each gate obtains its inputs from gates in previous layers, ultimately the output of each gate can be computed from a subset of the input gates in a straightforward way.
It is straightforward to realize not, and, and or gates in terms of nand gates with two inputs (with only a constant overhead in the number of gates). To simplify the construction later on, we assume that circuits consist exclusively of nand gates.
Each gate of a circuit is described by a 4-tuple $(g, t, g_1, g_2)$ where g is the number of the gate, t describes the type of the gate, and $g_1$ and $g_2$ are the numbers of the gates (called sources of g) which produce the inputs for gate g; all numbers are represented in binary. If gate g has only one input, then $g_2 = g_1$ by convention. Without loss of generality the input gates have numbers 1 to n and since their predecessors $g_1$ and $g_2$ will never be used, assume they are set to 0. All other gates have subsequent numbers starting at n + 1 such that the inputs for gate g are coming from gates with strictly smaller numbers. Following Ruzzo [9] the description B of a complete circuit is the concatenation of the descriptions of all of its gates, sorted by increasing gate numbers.
Problem instances of the circuit value problem (CVP) consist of the description B of a Boolean circuit C with n inputs and a list x of n input bits. The task is to decide whether C(x) = 1 holds or not. It is well known that the CVP is P-complete.
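The reduction to nand gates mentioned above can be illustrated with a short sketch. The identities used here are the standard textbook ones; the function names are hypothetical and chosen only for this illustration.

```python
def nand(a: bool, b: bool) -> bool:
    """Two-input nand gate."""
    return not (a and b)


def not_gate(a: bool) -> bool:
    # One nand gate with both inputs tied together.
    return nand(a, a)


def and_gate(a: bool, b: bool) -> bool:
    # A nand gate followed by a negation (two nand gates in total).
    t = nand(a, b)
    return nand(t, t)


def or_gate(a: bool, b: bool) -> bool:
    # De Morgan: a or b = nand(not a, not b) (three nand gates in total).
    return nand(nand(a, a), nand(b, b))
```

Each original gate is replaced by at most three nand gates, which is the constant overhead referred to in the text.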
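To make the description format concrete, the following is a minimal sketch of a CVP evaluator for nand-only circuits. The type tags "input" and "nand" and the choice of the highest-numbered gate as the output gate are assumptions made for this illustration; the description format itself only fixes the 4-tuple layout and the ordering by gate number.

```python
from typing import Dict, List, Tuple

# (g, t, g1, g2): gate number, type tag, and the numbers of the two sources.
Gate = Tuple[int, str, int, int]


def evaluate(description: List[Gate], x: List[bool]) -> bool:
    """Evaluate a nand-only circuit on input bits x.

    Assumes the gates are sorted by increasing number, input gates are
    numbered 1..len(x), and the last gate is the output gate.
    """
    value: Dict[int, bool] = {}
    for g, t, g1, g2 in description:
        if t == "input":
            value[g] = x[g - 1]          # sources of input gates are ignored
        elif t == "nand":
            value[g] = not (value[g1] and value[g2])
        else:
            raise ValueError(f"unknown gate type: {t}")
    return value[description[-1][0]]


# Example: gates 3 and 4 together compute the conjunction of inputs 1 and 2.
B = [(1, "input", 0, 0), (2, "input", 0, 0), (3, "nand", 1, 2), (4, "nand", 3, 3)]
```

Because every gate only refers to strictly smaller gate numbers, a single left-to-right pass over the description suffices.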

Challenges
A standard strategy for showing P-completeness of a problem Π in some computational model M (and also the one employed by Goles et al. in [6]) is by a reduction from the CVP to Π, which entails describing how to "embed" circuits in M.
In our setting of fungal automata with update sequence HV, while realizing wires and signals as in [6] is possible, there is no obvious implementation for negation nor for a reliable wire crossing. Hence, it seems one can only directly construct circuits that are both planar and monotone. Although it is known that the CVP is P-complete for either planar or monotone circuits [5], it is unlikely that one can achieve the same under both constraints. This is because the CVP for circuits that are both monotone and planar lies in $\mathsf{NC}^2$ (and is thus certainly not P-complete unless $\mathsf{P} \subseteq \mathsf{NC}^2$) [2].
We are able to overcome this barrier by exploiting features that are present in fungal automata but not in general circuits: time and space. Namely, we deliberately retard signals in the circuits we implement by extending the length of the wires that carry them. We show how this allows us to realize a primitive form of transistor. From this, in turn, we are able to construct a nand gate, thus allowing both wire crossings and negations to be implemented.
Our construction is not subject to the limitations that apply to the two-dimensional case that were previously shown by Gajardo and Goles in [4] since the FA starting configuration is not a fixed point. The resulting construction is also significantly more complex than that of [6].

Overview of the construction
In the rest of the paper we describe how to embed any Boolean circuit with description B and an assignment of values to the inputs into a configuration c of a fungal automaton in such a way that the following holds:
• "Running" the FA for a sufficient number of steps results in the "evaluation" of all simulated gates. In particular, after reaching a stable configuration, a specific cell of the FA is in state 1 or 0 if and only if the output of the circuit is 1 or 0, respectively.
• The initial configuration F of the FA is simple in the sense that, given the description of a circuit and an input to it, we can produce its embedding F using O(log n + log |B|) space. Thus we have a log-space reduction from the CVP to the prediction problem for FA.
The construction consists of several layers:
Layer 0: The underlying model of fungal automata.
Layer 1: As a first abstraction we subdivide the space into "blocks" of 2 × 2 cells and always think of update "cycles" consisting of 4 steps of the CA, using the update sequence $(HV)^2$.
Layer 2: On top of that we will implement "polarized circuits" processing "polarized signals" that run along "wires".
Layer 3: Polarized circuitry is then used to implement "Boolean circuits with delay": "bits" are processed by "gates" connected by "cables".
Layer 4: Finally a given Boolean circuit (without delay) can be embedded in a fungal automaton (as a circuit with delay) in a systematic fashion that needs only logarithmic space to construct.
The rest of this paper has a simple organization: Each layer i will be described separately in section i + 2.
Layer 0: The Fungal Automaton
Let $\mathbb{N}_+$ denote the set of positive integers and $\mathbb{Z}$ that of all integers. For $d \in \mathbb{N}_+$, a "d-dimensional CA" is a tuple (S, N, δ) where:
• S is a finite set of states
• N is a finite subset of $\mathbb{Z}^d$, called the "neighborhood"
• $\delta : S^N \to S$ is the "local transition function"
In the context of CA, the elements of $\mathbb{Z}^d$ are referred to as cells. The function δ induces a "global transition function" $\Delta : S^{\mathbb{Z}^d} \to S^{\mathbb{Z}^d}$ by applying δ to each cell simultaneously. In the following, we will be interested in the case d = 2 and the so-called von Neumann neighborhood of radius 1, that is, $N = \{(a, b) \in \mathbb{Z}^2 : |a| + |b| \le 1\}$.
Except for the updating of cells the fungal automaton is just a two-dimensional CA with the von Neumann neighborhood of radius 1 and S = {0, 1, ..., 7} as the set of states. A "configuration" is thus a mapping $c : \mathbb{Z}^2 \to S$.
Depending on their states cells will be depicted as follows in diagrams:
- state 0 as an empty cell
- state 1 as •
- state i ∈ S \ {0, 1} as i
We will use colored background for cells in states 2, 3, and 4 since their presence determines the behavior of the polarized circuit. The state 1 is only a "side effect" of an empty cell receiving a grain of sand from some neighbors; hence it is represented as a dot. Cells which are not included in a figure are always assumed to be in state 0.
For a logical predicate P denote by [P] the value 1 if P is true and the value 0 if P is false. For $i \in \mathbb{Z}^2$ denote by h(i) the two horizontal neighbors of cell i and by v(i) its two vertical neighbors. Cells are updated according to two functions H and V mapping from $S^{\mathbb{Z}^2}$ to $S^{\mathbb{Z}^2}$ where for each $i \in \mathbb{Z}^2$ the following holds:
$$H(c)(i) = c(i) - 2 \cdot [c(i) \ge 4] + \sum_{j \in h(i)} [c(j) \ge 4]$$
$$V(c)(i) = c(i) - 2 \cdot [c(i) \ge 4] + \sum_{j \in v(i)} [c(j) \ge 4]$$
The updates are similar to the sandpile model by Bak, Tang, and Wiesenfeld [1], but toppling cells only emit grains of sand either to their horizontal or their vertical neighbors. Since a toppling cell holds at least 4 grains and emits only 2 of them, a cell that is non-zero stays non-zero forever. The composition of these functions applying first H and then V is denoted HV. For the transitions of a fungal automaton with update sequence HV these functions are applied alternatingly, resulting in a computation c, H(c), V(H(c)), H(V(H(c))), V(H(V(H(c)))), and so on. In examples we will often skip three intermediate configurations and only show c, HVHV(c), etc. Figure 1 shows a simple first example.
Figure 1: Five transitions according to HVHVH
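The update rule (a cell in state ≥ 4 loses two grains and gives one to each of its two horizontal or two vertical neighbors) and the prediction problem of Theorem 1 can be sketched as follows. The sparse dictionary representation and the names `step` and `predict` are choices made for this illustration only, and rows are assumed to grow downward.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]      # (row, column)
Config = Dict[Cell, int]    # sparse configuration; missing cells are in state 0


def step(c: Config, horizontal: bool) -> Config:
    """Apply H (horizontal=True) or V (horizontal=False) once."""
    new: Config = dict(c)
    for (r, k), state in c.items():
        if state >= 4:  # toppling cell: loses 2 grains, one per neighbor
            new[(r, k)] -= 2
            if horizontal:
                nbrs = [(r, k - 1), (r, k + 1)]
            else:
                nbrs = [(r - 1, k), (r + 1, k)]
            for j in nbrs:
                new[j] = new.get(j, 0) + 1
    return {cell: s for cell, s in new.items() if s != 0}


def predict(c: Config, y: Cell, T: int) -> bool:
    """Is cell y in a non-zero state at some time t <= T (update sequence HV)?"""
    for t in range(T + 1):
        if c.get(y, 0) != 0:
            return True
        c = step(c, horizontal=(t % 2 == 0))  # H at even times, V at odd times
    return False
```

This brute-force simulation runs in time polynomial in T and the size of the support, which matches T being given in unary; it says nothing about the hardness direction of the theorem.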

Layer 1: Coarse Graining Space and Time
As a first abstraction from now on one should always think of the space as subdivided into "blocks" of 2 × 2 cells. Furthermore we will look at update "cycles" consisting of 4 steps of the CA, thus using the update sequence HVHV which we will abbreviate to Z. As an example Figure 2 shows the same cycle as Figure 1 and the following cycle in a compact way. Block boundaries are indicated by thicker lines.
Cells outside the depicted area of a figure are assumed to be 0 initially and they will never become critical and topple during the shown computation.
We turn to the second lowest level of abstraction. Here we work with two types of signals, which we refer to as positive and negative. Both types will have several representations as a block in the FA.
• All representations of a positive signal have in common that the upper left corner of the block is a 4 and the other cells are 2 or 3.
• All representations of a negative signal have in common that the lower left corner of the block is a 4 and the other cells are 2 or 3.
Not all representations will be appropriate in all situations as will be discussed in the next subsection.
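The block structure and the two families of signal representations can be sketched as a small classifier. This is only an illustration under explicit assumptions: blocks are aligned at even coordinates, rows grow downward, and the representations with a 4 in the upper left corner are taken to be the positive ones (and those with a 4 in the lower left corner the negative ones). The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Config = Dict[Cell, int]


def block_cells(R: int, C: int) -> List[Cell]:
    """Cells of block (R, C): upper left, upper right, lower left, lower right."""
    return [(2 * R, 2 * C), (2 * R, 2 * C + 1),
            (2 * R + 1, 2 * C), (2 * R + 1, 2 * C + 1)]


def classify_block(c: Config, R: int, C: int) -> Optional[str]:
    """Return 'positive', 'negative', 'wire', or None for block (R, C)."""
    ul, ur, ll, lr = (c.get(cell, 0) for cell in block_cells(R, C))
    if ul == 4 and all(s in (2, 3) for s in (ur, ll, lr)):
        return "positive"   # assumed polarity for a 4 in the upper left corner
    if ll == 4 and all(s in (2, 3) for s in (ul, ur, lr)):
        return "negative"   # assumed polarity for a 4 in the lower left corner
    if ul == ur == ll == lr == 3:
        return "wire"       # unused wire block: all four cells in state 3
    return None
```

A classifier like this is only meaningful at cycle boundaries, since intermediate steps within a cycle need not show any of these patterns.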
The rules of fungal automata allow us to perform a few basic operations on these polarized signals (e.g., duplicating, merging, or crossing them under certain assumptions). The highlight here is that we can implement a (delay-sensitive) form of transistor that works with polarized signals, which we refer to as a switch.
As a convention, in the figures in this section, we write x and y for the inputs of a component and z, $z_1$, and $z_2$ for the outputs.

Polarized Signals and Wires
Representations of positive and negative signals are shown in Figure 3. We will refer to a block initially containing a positive or negative signal as a positive or negative source, respectively. (This will be used, for instance, to set the inputs to the embedded CVP instance.) A comparison of Figure 2 and Figure 3a shows that in the former a signal is "moving from left to right". In general we will use wires to propagate signals. Wires extending horizontally or vertically can be constructed by juxtaposing wire blocks consisting of 2 × 2 blocks of cells in state 3. While one can use the same wire blocks for both types of signals, each block is destroyed upon use and thus can only be used once. In particular, this means a wire will either be used by a positive or a negative signal. We refer to the respective wires as positive and negative wires, accordingly.
Every representation of a signal is restricted with respect to the possible directions it can move in along a wire. In our construction each signal will start at the left end of a horizontal wire. Figure 4 shows how a signal first "turns left" once and then moves along a wire that "turns right" two times, changing its representation while meandering around. (The case of a signal of the opposite polarity is similar and is not shown.) Figure 5 can be seen as the continuation of Figure 4. The signal moves further down, "turns left" twice, and then reaches the end of the wire. The composition of both parts can be seen in Figure 11 and will be used as the basic building unit for "retarders".

Diodes
Note that positive and negative signals do not encode any form of direction in them (regarding their propagation along a wire). In fact, a signal propagates in any direction a wire is placed in. In order for our components to operate correctly, it will be necessary to ensure a signal is propagated in a single direction. To realize this, we use diodes.
A diode is an element on a horizontal wire that only allows a signal to flow from left to right. A signal coming from right to left is not allowed through. As the other components, the diode is intended to be used only once. For the implementation, refer to Figure 6. (Recall that x denotes the component's input and z its output.)
For all the remaining elements described in this section, we implicitly add diodes to their inputs and outputs. This ensures that the signals can only flow from left to right (as intended). This is probably not necessary for all elements, but doing so makes the construction simpler while the overhead is only a constant factor blowup in the size of the elements.

Duplicating, Merging, and Crossing Wires
Wires of the same polarity can be duplicated or merged. By duplicating a wire we mean we create two wires $z_1$ and $z_2$ from a single wire x in such a way that, if any signal arrives from x, then this signal is duplicated and propagated on both $z_1$ and $z_2$. (Equivalently, one might imagine that $x = z_1$ and $z_2$ is a wire copy of x.) In turn, a wire merge realizes in some sense the reverse operation: We have two wires x and y of the same polarity and create a wire z such that, if a signal arrives from x or y (or both), then a signal of the same polarity will emerge at z. (Hence one could say the wire merge realizes a polarized or gate.) See Figure 8 for the implementations.
As discussed in the introduction, there is no straightforward realization of a wire crossing in fungal automata in the traditional sense. Nevertheless, it turns out we can cross wires under the following constraints:
1. The two wires being crossed are a positive and a negative wire.
2. The crossing is used only once and by a single input wire; that is, once a signal from either wire passes through the crossing, it is destroyed. (If two signals arrive from both wires at the same time, then the crossing is destroyed without allowing any signal to pass through.)
To emphasize these limitations, we refer to such crossings as semicrossings.
We actually need two types of semicrossings, one for each choice of polarities for the two input wires. The semicrossings are named according to the polarity of the top input wire: A positive semicrossing has a positive wire as its top input (and a negative wire as its bottom one) whereas a negative semicrossing has a negative wire at the top (and a positive wire at the bottom). For the implementations, see Figure 9.

Switches
A switch is a rudimentary form of transistor. It has two inputs and one output. Adopting the terminology of field-effect transistors (FETs), we will refer to the two inputs as the source and the gate and to the output as the drain. Once a signal has arrived at the gate, a subsequent signal arriving from the source will then be propagated on to the drain. This means that switches are delay-sensitive: A signal arriving at the source only continues on to the drain if the gate signal has arrived beforehand (or simultaneously to the source).
Similar to semicrossings, our switches come in two flavors. In both cases the top input is a positive wire and the bottom one a negative wire. The difference is that, in a positive switch, the source (and thus also the drain) is the positive input and the gate is the negative input. Conversely, in a negative switch the source and drain are negative wires and the gate is a positive wire. Refer to Figure 10 for the implementation of the two types of switches.
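The delay-sensitive behavior of a switch can be captured by a tiny timing model. This is an abstraction and not the cell-level implementation: arrival times are integers, `None` stands for "no signal ever arrives", and the switch's own delay is ignored. The function name is hypothetical.

```python
from typing import Optional


def switch(source_time: Optional[int], gate_time: Optional[int]) -> Optional[int]:
    """Arrival time of the signal at the drain, or None if nothing gets through.

    The source signal continues to the drain only if the gate signal arrived
    beforehand or simultaneously.
    """
    if source_time is None or gate_time is None:
        return None
    if gate_time <= source_time:
        return source_time   # the switch's own delay is not modeled here
    return None              # source arrived too early and is lost
```

The last case is the reason retarders are needed later on: a source signal that arrives before its gate signal is lost for good, since every component can be used only once.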

Delays and Retarders
As mentioned in the introduction, the circuits we construct are sensitive to the time it takes for a signal to flow from one point to the other. To render this notion precise, we define for every component a delay which results from the time taken for a signal to pass through the component. This is defined as follows:
• The delay of a source is zero.
• The delay of a wire (including bends) at some block B is the delay of the wire's source S plus the length (in blocks) of the shortest contiguous path along the wire that leads from S to B according to the von Neumann neighborhood. We will refer to this length as the wire distance between S and B. For example, the wire distance between the inputs and outputs in all of Figures 6 and 8 to 10 is 4; similarly, the distance between x and z in Figure 11 (see below) is 15.
• The delay of a gate (i.e., a diode, wire duplication, wire merge, or semicrossing) is the maximum over the delays of its inputs plus the gate width (in blocks).
Figure 11: Implementation of a basic retarder (for both positive and negative signals) that ensures a delay of ≥ 12 at z (relative to x). Retarders for greater delays can be realized by increasing (i) the height of the meanders and (ii) the number of up-down meanders, and by adjusting (iii) the positions of the input and output.
Notice our definition of wire distance may only roughly estimate the actual number of steps a signal requires to propagate from S to B. This is fine for our purposes since we only need to reason about upper bounds later in Section 6.3.
Finally we will also need a retarder element, which is responsible for adding a variable amount of delay to a wire. Refer to Figure 11 for their realization. Retarders can have different dimensions. Evidently, one can ensure a delay of t with a retarder that is $O(\sqrt{t}) \times O(\sqrt{t})$ large. We are going to use retarders of delay at most D, where D depends on the CVP instance and is set later in Section 6.3. Hence, it is safe to assume all retarders in the same configuration are of the same size horizontally and vertically, but realize different delays. This allows one to use retarders of a single size for any fixed circuit, which simplifies the layout significantly (see also Sections 6.3 and 6.4).
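The wire distance defined above is a shortest-path length over wire blocks, so it can be computed with a breadth-first search. The sketch below works on block coordinates and counts the number of moves between adjacent blocks; whether the endpoints themselves are counted is a convention not fixed here and is an assumption of this illustration.

```python
from collections import deque
from typing import Optional, Set, Tuple

Block = Tuple[int, int]  # (block row, block column)


def wire_distance(wire: Set[Block], S: Block, B: Block) -> Optional[int]:
    """Length of a shortest path from S to B that stays on wire blocks,
    moving only between von Neumann neighbors; None if B is unreachable."""
    if S not in wire or B not in wire:
        return None
    dist = {S: 0}
    queue = deque([S])
    while queue:
        cur = queue.popleft()
        if cur == B:
            return dist[cur]
        r, k = cur
        for nxt in ((r - 1, k), (r + 1, k), (r, k - 1), (r, k + 1)):
            if nxt in wire and nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


# A U-shaped meander: the endpoints are 2 blocks apart, but 6 moves along the wire.
meander = {(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)}
```

The meander example shows the principle behind retarders: bending a wire back and forth increases the wire distance without moving the endpoints apart.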

Layer 3: Working With Bits
We will now use the elements from Section 4 (represented as in Figure 12) to construct planar delay-sensitive Boolean circuits. Our circuits will use nand gates as their basis. We discuss how to overcome the planarity restriction in Section 5.4.

Representation of Bits
Bits are carried by cables, each consisting of a positive and a negative wire. We extend the notion of wire distance (see Section 4.5) to cables simply by setting it to the maximum of the respective wire distances. A signal on a cable's positive wire represents a binary 1, and a signal on the negative wire represents a binary 0. By convention we will always draw the positive wire "above" the negative wire of the same cable. See Figure 13 for an example.
When referring to a gate's inputs and outputs, we indicate the positive and negative components of a cable with subscripts. For instance, for an input cable x, we write $x_+$ for its positive and $x_-$ for its negative component.

Bit Duplication
To duplicate a cable, we use the Boolean branch depicted in Figure 14.The circuit consists of two wire duplications (one of each polarity) and a crossing.

Nand Gates
As a matter of fact the nand gate is inspired by the implementation of such a gate in CMOS technology. Refer to Figure 15 for the implementation. Notice the usage of switches means these gates are delay-sensitive; that is, the gate only operates correctly (i.e., computing the nand function) if the retarders have strictly greater delays than the inputs x and y. In fact, for our construction we will need to instantiate this same construction using varying values for the retarders' delays (but not their size as mentioned in Section 4.5). This seems necessary in order to chain nand gates in succession (since each gate in a chain incurs a certain delay which must be compensated for in the next gate down the chain).
In addition, notice that in principle nand gates have variable size as their dimension depends on that of the three retarders. As is the case for retarders, in the same embedding we insist on having all nand gates be of the same size. We defer setting their dimensions to Section 6.3; for now, it suffices to keep in mind that nand gates (and retarder elements) in the same embedding only vary in their delay (and not their size).
Claim 1. Assuming the retarders have larger delay than the input cables x and y, the circuit in Figure 15 realizes a nand gate.
Proof. Consider first the case where both $x_+$ and $y_+$ are set. Since $x_-$ is not set, $X_1$ is consumed by $x_+$, turning $S_4$ on. In addition, since $y_+$ is set, $S_2$ is also turned on. Hence, using the assumption on the delay of the inputs, the negative source flows through $S_2$, $S_4$, and $X_2$ on to $z_-$. Since both the switches $S_1$ and $S_3$ remain open, the $z_+$ output is never set. Notice the crossings $X_1$ and $X_2$ are each used exactly once.
Let now $x_-$ or $y_-$ (or both) be set. Then $S_2$ or $S_4$ remains open, which means $z_-$ is never set. As a result, $X_2$ is used at most once (namely in case $y_-$ is set). If $x_-$ is set, then $S_1$ is turned on, thus allowing the positive source to flow on to M. The same holds if $y_-$ is set, in which case M receives the positive source arriving from $S_3$. Hence, at least one positive signal will flow to the M gate, causing $z_+$ to be set eventually.
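Ignoring all timing, the case analysis in the proof boils down to a dual-rail formulation of nand: the negative output is set exactly when both positive inputs are set, and the positive output is set when at least one negative input is set. The following sketch checks this abstraction; the pair encoding and the function names are assumptions of this illustration and do not model switches, semicrossings, or retarders.

```python
from typing import Tuple

# A cable as a pair: (signal on the positive wire, signal on the negative wire).
Cable = Tuple[bool, bool]


def encode(bit: bool) -> Cable:
    """Exactly one of the two wires of a cable carries a signal."""
    return (bit, not bit)


def nand_cable(x: Cable, y: Cable) -> Cable:
    x_plus, x_minus = x
    y_plus, y_minus = y
    z_minus = x_plus and y_plus    # negative source passes both negative switches
    z_plus = x_minus or y_minus    # wire merge of the two positive switch outputs
    return (z_plus, z_minus)
```

For valid inputs the result is again a valid cable, that is, exactly one of the two output wires is set.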

Cable Crossings
There is a more or less well-known idea to cross two bits using three xor gates, which can for example be found in the paper by Goldschlager [5]. Figure 16 shows the idea. This construction can be used in FA. Because of the delays, there is not a single crossing gate but a whole family of them. Depending on the position in the whole circuit layout, each crossing needs nand gates with specific built-in delays (which will be set in Section 6.3).
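The logic behind the three-xor crossing can be checked directly. The sketch below builds xor from four nand gates (a standard construction, assumed here and not necessarily the exact one used in the figures) and ignores all delays.

```python
from typing import Tuple


def nand(a: bool, b: bool) -> bool:
    return not (a and b)


def xor(a: bool, b: bool) -> bool:
    """xor built from four nand gates."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))


def cross(a: bool, b: bool) -> Tuple[bool, bool]:
    """Cross two bits with three xor gates: the outputs are (b, a)."""
    s = xor(a, b)                  # shared middle gate
    return xor(s, a), xor(s, b)    # (a xor b) xor a = b, (a xor b) xor b = a
```

The middle gate feeds both outer gates, so in a planar layout the two bits leave on opposite sides without any wire passing over another.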

Layer 4: Layout of a Whole Circuit
Finally we describe one possibility to construct a finite rectangle of cells F of an FA containing the realization of a complete circuit, given its description B. The important point here is that, in order to produce F from B, the constructor only needs logarithmic space. (Therefore the simplicity of the layout has precedence over any form of "optimization".)

Arranging the Circuit in Tiles
Let C be the circuit that is to be embedded as an FA configuration F. Letting n be the length of inputs to C and m its number of gates, notice we have an upper bound of m on the circuit depth of C. Without restriction, we may assume m ≥ n, which also implies an upper bound of m + n = O(m) on the number of cables of C (since C has bounded fan-in). The logical gates of C are denoted by $G_1, \dots, G_m$ and we assume that $G_i$ has number n + i in description B of C (recall Section 1.1).
In the configuration F we have cables $x_1, \dots, x_n$ originating from the input gates as well as cables $g_1, \dots, g_m$ coming from (the embedding of) the gates of C. The $x_i$ and $g_i$ flow in and out of equal-sized tiles $T_1, \dots, T_m$, where in the i-th tile $T_i$ we implement the i-th gate $G_i$ of C. The inputs to $T_i$ are $I_i = \{x_1, \dots, x_n, g_1, \dots, g_{i-1}\}$ and its outputs are $I_i \cup \{g_i\}$.
Recall that, unlike standard circuits, the behavior of our layer 3 circuits is subject to spatial considerations, that is, to both gate placement and wire length. For the sake of simplicity, each tile is shaped as a square and all tiles are of the same size. In addition, the tiles are placed in ascending order from left to right and with no space in-between. The only objects in F that lie outside the tiles are the inputs and output of C itself. The inputs are placed immediately next to corresponding cables that go into $T_1$ whereas the output is placed next to its corresponding wire $g_m$ at the outgoing end of $T_m$.
Figure 17: Overview of the tile $T_i$. The upper part of the tile has green background, the lower part has blue background.

Layout for Tile i
As depicted in Figure 17, each tile is subdivided into two areas. The upper part contains the wires that pass through it, while the lower part implements the gate $G_i$ proper.
We give a broad overview of the process for constructing $T_i$. First determine the numbers $y_1$ and $y_2$ of the inputs to $G_i$. Then duplicate the bits on cables $y_1$ and $y_2$ (as in Section 5.2) and cross the copies over to the lower part of the tile. These crossings require setting adequate delays, which will be addressed in the next section. (In case $y_1 = y_2$, duplicate the cable twice and proceed as otherwise described.) Next instantiate $G_i$ with a proper amount of delay (again, see the next section) and plug in $y_1$ and $y_2$ as inputs into $G_i$. Finally connect all inputs in $I_i$ as well as the output wire $g_i$ of $G_i$ to their respective outputs. Notice the tile contains O(m) crossings and thus also O(m) nand gates in total.
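The bookkeeping behind the tiles (which cables enter tile i, which two of them feed its gate, and which cables leave it) can be sketched as follows. Only cable names are computed; geometry and delays are left out. The dictionary layout and the function names are hypothetical, and the gate list is assumed to contain only the m logical gates in the 4-tuple format of the circuit description.

```python
from typing import Dict, List, Tuple

Gate = Tuple[int, str, int, int]  # (g, t, g1, g2)


def cable_name(number: int, n: int) -> str:
    """Gate numbers 1..n are inputs x1..xn; gate number n + i is cable gi."""
    return f"x{number}" if number <= n else f"g{number - n}"


def tile_plan(n: int, logic_gates: List[Gate]) -> List[Dict[str, object]]:
    plans = []
    for i, (_g, _t, g1, g2) in enumerate(logic_gates, start=1):
        inputs = [f"x{k}" for k in range(1, n + 1)] + [f"g{k}" for k in range(1, i)]
        plans.append({
            "tile": i,
            "inputs": inputs,                                      # cables entering tile i
            "sources": (cable_name(g1, n), cable_name(g2, n)),     # cables feeding gate i
            "outputs": inputs + [f"g{i}"],                         # pass-through cables plus gi
        })
    return plans
```

Each tile only needs the current gate index and two source numbers, which is in line with the requirement that the whole layout be producible in logarithmic space.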

Choosing Suitable Delays for All Gates
The two details that remain are setting the dimensions and the delays for the retarders in all nand gates. This requires certain care since we may otherwise end up running into a chicken-and-egg problem: The retarders' dimensions are determined by the required delays (in order to have enough space to realize them); in turn, the delays depend on the aforementioned dimensions (since the input wires in the nand gates must be laid so as to "go around" the retarders).
The solution is to assume we already have an upper bound D on the maximum delay in F. This allows us to fix the size of the components as follows:
• The retarders and nand gates have side length $O(\sqrt{D})$.
• Each tile has side length $O(m\sqrt{D})$.
• The support of F fits into a square with side length $O(m^2\sqrt{D})$.
determined in Section 6.3, which clearly are all computable in logspace (since the maximum delay D is polynomial in m).
Finally R also needs to produce y and T as in the statement of Theorem 1. Let $c_i$ be the cable of $T_m$ that corresponds to the output of the embedded circuit C. Then we let y be the index of the cell next to the positive wire of $c_i$ at the output of $T_m$. (Hence y assumes a non-zero state if and only if $c_i$ contains a 1, that is, C(x) = 1.) As for T, certainly setting it to the number of cells in F suffices (since a signal needs to visit every cell in F at most once).
Figure 2: compact representation of two cycles

Figure 3: Representations of positive and negative signals

Figure 4: A signal moving along a wire with two right turns.

Figure 5: A signal moving along a wire with two left turns (continuation of Figure 4)

Figure 6: Diode implementations
signal comes from the right

Figure 12: Representations of the elements from abstraction layer 2 as used in layer 3. The polarities indicate whether the positive or negative version of the component is used.

Figure 13: Binary representations travelling from left to right along a cable
Implementing xor with nand gates