Using compression to find interesting one-dimensional cellular automata

Ahmed, Nadim; Teahan, William J.

doi:10.1007/s40747-019-00121-7

Using compression to find interesting one-dimensional cellular automata

Original Article
Open access
Published: 20 September 2019

Volume 6, pages 123–146, (2020)
Cite this article

Download PDF

You have full access to this open access article

Complex & Intelligent Systems Aims and scope Submit manuscript

Using compression to find interesting one-dimensional cellular automata

Download PDF

1703 Accesses
1 Citation
Explore all metrics

Abstract

This paper proposes a novel method for finding interesting behaviour in complex systems based on compression. A new clustering algorithm has been designed and applied specifically for clustering 1D elementary cellular automata behaviour using the prediction by partial matching (PPM) compression scheme, with the results gathered to find interesting behaviours. This new algorithm is then compared with other clustering algorithms in Weka and the new algorithm is found to be more effective at grouping behaviour that is visually similar in output. Using PPM compression, the rate of change of the cross-entropy with respect to time is calculated. These values are used in combination with a clustering algorithm, such as k-means, to create a new set of clusters for cellular automata. An analysis of the data in each cluster is then used to determine if a cluster can be classed as interesting. The clustering algorithm itself was able to find unusual behaviours, such as rules 167 and 181 which have output that is slightly different from all the other Sierpiński Triangle-like patterns, because their apexes are off-centre by one cell. When comparing the new algorithm with other established ones, it was discovered that the new algorithm was more effective in its ability to group interesting and unusual cellular automata behaviours together.

A Cellular Automata-Based Clustering Technique for High-Dimensional Data

Spatial Complexity Measure for Characterising Cellular Automata Generated 2D Patterns

Measuring Clustering Model Complexity

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

The analysis of the behaviour of complex systems is a difficult problem, not least because of the difficulties that arise due to emergence and self-organisation. Behaviour in this context means the movements and interactions of the agents and how this also affects the environment. Often, a system can be defined by simple rules, but those rules can lead to many different behaviours depending on the initial conditions, and often, the observer is never certain that they may have missed some interesting behaviour even after running many simulations or making many observations in the case of a real system. Clearly, it would be extremely useful to be able to automatically classify interesting behaviours within a complex system. However, to find such behaviours, it is important first to define what we might mean when we describe a behaviour as being interesting. Within the data mining and knowledge discovery community, there are several definitions of the term “interestingness”. As there is no previous analysis of interestingness within complex systems, this paper explores whether it is possible to automatically classify when interesting behaviour has occurred within a complex system. As a case study, it will specifically use one-dimensional (1D) elementary cellular automata as an example of a complex system to find interesting behaviour within such systems.

Cellular automata were invented in the 1940s. There has been substantial research, since they are still just as relevant today in fields ranging from its use as a pseudo-random sequence generator [1] to designing levels in mazes [2]. This paper will examine 1D cellular automata which produce different patterns as a result of their behaviour depending upon the initial conditions and the rules used. In a 1D elementary cellular automata, the cell can be inactive (white) or active (black) [3] with 256 basic rules available. The activation state of the cell and the two on either side of it affects the cell on the next iteration. This paper will use cellular automata with a single activated cell in the centre of the initial row (with subsequent rows being generated each time step when visualising the behaviour of the system). There are many permutations of rules and initial conditions that generate many different behaviours that are difficult to explore manually.

Amongst the potential patterns available, there could be undiscovered interesting patterns concealed amongst the myriad of different initial conditions and rule permutations. Within those patterns produced, many may be referred to as “uninteresting” patterns. These uninteresting patterns include the empty spaces and diagonal lines moving from the centre to a corner, or repeated patterns of horizontal stripes. For an initial condition of a single cell, there are interesting patterns that are produced by some of the rules, such as those that look similar to a Sierpiński Triangle. Currently, there is no list of cellular automata rules that are classified by their interesting output. This paper will propose such a list, and also a new way of automatically classifying interesting cellular automata patterns based on compression.

This paper is organised as follows. It first examines the idea of “interestingness” in terms of complex system behaviour, which is the main focus of this paper, based on how it has been defined in other fields. The next section then examines the current ways that cellular automata have been classified. As this paper uses compression to classify 1D elementary cellular automata behaviour, a previous example of using compression with classifying cellular automata is discussed. The new clustering algorithm which makes use of compression is then introduced, and then used to find interesting behaviours in 1D elementary cellular automata. Finally, the new clustering algorithm is compared with other clustering algorithms available on Weka.

Interestingness

Most of the available literature on interestingness concentrates on its use in data mining and knowledge discovery. Surprisingly, as the concept of interestingness is well accepted in these fields, there is no literature for its use within complex systems. Interestingness may be defined using subjective (human controlled) or objective (statistical data) measurements [4]. Subjective measures rely on comparing previously observed behaviour with the current behaviour and noticing any changes. It can also be subjected to human emotion at the time of deciding whether something is classed as interesting. Objective measures rely on statistical analysis of the data describing the behaviour, and are independent from humans and are, therefore, generic.

A behaviour is deemed interesting if it strays from what has been previously observed [4], is often classed as something novel or surprising [5] or that interesting discoveries are surprising [6]. However, in data mining and knowledge discovery, there is an additional requirement; it has to be useful. McGarry states that a pattern may be unexpected and, therefore, could be deemed as noise or an outlier, and, thus, may not be useful [7]. Data mining also requires patterns to be relevant and useful, so their interestingness measures are designed to reduce the number of patterns that need to be checked [8]. This means that applying the same interestingness measures used within the data mining community may not be what is needed for analysis of complex systems.

As mentioned above, one of the definitions of interestingness uses a notion of surprise. A large surprise is one where an action that was predicted with high confidence did not take place, but a different action was performed instead [9].

Another definition uses Shannon’s entropy to measure interestingness. Hall and Morton state that entropy estimators may be used to construct measures of interestingness [10]. Blanchard mentions that Shannon conditional entropy is one of the most commonly used measures to calculate rule interestingness [11]. Schmidhuber states that associating Shannon’s entropy directly with interestingness would be incorrect, due to random noise such as white noise, resulting in a high entropy value, which would not be interesting [12]. Although if in a visual system, the output from the inputs was mainly black, and then suddenly, there was white noise, it could be classed as interesting from a surprising point of view if it was unexpected.

Schmidhuber has a different idea of what constitutes interestingness [12]. He states that when something is beautiful and newly observed, then it is interesting; however, over time, a beautiful item that has been observed many times loses its interestingness value [12]. An example is the photos of the Horsehead Nebula in the constellation of Orion. For someone who has never seen it before, it looks beautiful and interesting, but after seeing dozens of photos, the interestingness factor reduces.

Whether a behaviour of a system is interesting or not depends upon the context. As Schmidhuber states, beauty is an important factor; however, there will be some exceptions to this idea. For example, he mentioned that when a visual sensor that stays in the dark experiences white noise, then there is a sudden increase in entropy. This is due to the previous input of completely zero values suddenly changing into random values of 0s and 1s, and therefore, it becomes uncompressible. Schmidhuber states that in both of these cases, complete darkness and random white noise are boring, and, therefore, not interesting [12]. This is a good case for defining what is interesting in comparison to beauty; however, from a different point of view, the sudden appearance of white noise in what was a dark environment should be classed as interesting. It may not be beautiful, but it could be an indicator that something in the environment has changed, and is, therefore, interesting.

Keeping humans in the loop when trying to determine if something is interesting is another way for determining whether something is interesting [13]. One system that relies on humans in the loop is the “Conceptual Knowledge Discovery in Databases” system which creates plots as output and relies on experts to test out hypotheses by inspecting the plot structures [7, 14]. Another demonstration of the possible importance of having a “human in the loop” aspect to help with finding interesting behaviour is Schmidhuber’s claim that beauty is important in defining interestingness. In the case of cellular automata when relying on visual observations, certain patterns may not be classed as interesting, mainly because there is no aspect of beauty within them.

Hudson [15] looks at how ease of compression determines whether a pattern is interesting or boring. His conclusion is that if compression is trivial (producing a very small output file), or is almost impossible (producing an output almost equal in size to the input file), then these situations can be considered as boring. All other situations where compression is challenging are deemed to be interesting [15]. The ease in compression indicates that very little change is occurring when compared to previously, whereas when the compression is almost impossible, then it is as though random noise like white noise is being compressed. In these two situations, they would not be classed as interesting in a general sense.

Current classifications of 1D elementary cellular automata behaviour

Cellular automata behaviour has been classified using different criteria by a number of researchers. One classification, by Wolfram [3], describes four classes:

Class 1
The pattern produced by the cellular automata eventually culminates with all cells having the same value; for example, all values become 0.
Class 2
The pattern produced by the cellular automata consists of solid simple structures or repeated patterns. Examples are vertical or diagonal lines or a ladder type of structure, otherwise repeating patterns alternating between 0 and 1.
Class 3
The pattern produced by the cellular automata is chaotic in nature; for example, patterns that grow to fill out the width of the screen, such as Sierpiński Triangle-like triangles [16].
Class 4
The pattern produced by the cellular automata is constructions of high complexity that do not disintegrate until the distant future [16]. This includes narrow structures that do not fill the whole screen width. They do not grow in width, but rather in height, although they may stop after a while.

Wolfram summarises the classes in his book, “A New Kind of Science” [17] where he states that Classes 1 and 2 will “rapidly settle down” until there is to all intents and purposes no further activity. Class 3 has cells that continually change at each step, such that they “maintain a high level of activity forever”. Class 4 systems are in between Class 2 and Class 3 as the pattern produced does not die out as quickly as Class 2, but it does not have the complexity of Class 3. Wolfram also says that Class 4 systems “waver between Class 2 and Class 3 behaviour”. Finally, there are also some borderline cases which can be defined as one or the other class [17].

Further classifications aim to refine Wolfram’s classes [18, 19]—Li has one variation having six classes [20, 21], as shown in Table 1.

Table 1 The six classes defined by Li and Packard with their descriptions

Using compression to find interesting one-dimensional cellular automata

Abstract

Similar content being viewed by others

A Cellular Automata-Based Clustering Technique for High-Dimensional Data

Spatial Complexity Measure for Characterising Cellular Automata Generated 2D Patterns

Measuring Clustering Model Complexity

Explore related subjects

Introduction

Interestingness

Current classifications of 1D elementary cellular automata behaviour

Compression and classification of 1D elementary cellular automata behaviour

Previous use of compression with cellular automata

A new approach to clustering cellular automata behaviour using compression

Prediction by partial matching

Compressing 1D cellular automata output

Clustering 1D cellular automata output

Finding an optimal number of clusters

Comparing clustering algorithms

Hierarchical agglomerative clustering

Experimental results using PPM clustering

Clustering interesting elementary cellular automata

Criterion A: high variance

Criterion B: frequent jaggedness

Criterion C: strong gradients

Criterion D: small cluster size

Comparing the PPM clustering algorithm with other algorithms implemented by Weka

Clustering using expectation maximisation

XMeans

Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation