BINGO: brain-inspired learning memory

Storage and retrieval of data in a computer memory play a major role in system performance. Traditionally, computer memory organization is 'static', i.e. it does not change based on the application-specific characteristics of memory access behaviour during system operation. Specifically, in the case of a content-operated memory (COM), the association of a data block with a search pattern (or cues) and the granularity (level of detail) of stored data do not evolve. Such a static nature of computer memory, we observe, not only limits the amount of data we can store in a given physical storage, but also misses opportunities for performance improvement in various applications. On the contrary, human memory is characterized by seemingly infinite plasticity in storing and retrieving data, as well as by dynamically creating/updating the associations between data and the corresponding cues. In this paper, we introduce BINGO, a brain-inspired learning memory paradigm that organizes the memory as a flexible neural memory network. In BINGO, the network structure, strength of associations, and granularity of the data adjust continuously during system operation, providing unprecedented plasticity and performance benefits. We present the associated storage/retrieval/retention algorithms in BINGO, which integrate a formalized learning process. Using an operational model, we demonstrate that BINGO achieves an order of magnitude improvement in memory access times and effective storage capacity on the CIFAR-10 dataset and a wildlife surveillance dataset when compared to traditional content-operated memory.


Introduction
Digital memory is an integral part of a computer system and plays a major role in determining system performance. Memory access behaviour largely depends on the nature of the incoming data and the specific information-processing steps applied to the data. Emergent applications ranging from wildlife surveillance [2] to infrastructure damage monitoring [3,4] that collect, store and analyse data often exhibit distinct memory access behaviour (e.g. storage and retrieval of specific data blocks). Even within the same application, such behaviour often changes with time.
Hence, these systems, with their variable and constantly evolving memory access patterns, can benefit from a memory organization that can dynamically tailor itself to meet the requirements.
Additionally, many computing systems, specifically the emergent internet of things (IoT) edge devices, must deal with a huge influx of data of varying importance while being constrained in terms of memory storage capacity, energy and communication bandwidth [5][6][7]. Hence, for these applications, it is important for the memory framework to be efficient in terms of energy, space and transmission bandwidth utilization. Edge devices often also deal with multi-modal data, and the memory system must be flexible enough to handle inter-modality relations.
Based on these observations, we believe an ideal data storage framework for these emergent applications should have the following properties:
• Dynamic in nature, to accommodate constantly evolving application requirements and scenarios.
• Able to exhibit virtually infinite capacity to deal with a huge influx of sensor data, a common feature of many IoT applications.
• Capable of trading off data granularity against transmission and energy efficiency.
• Able to efficiently handle multi-modal data in the context of application-specific requirements.
Traditional memories [1] (both address-operated and content-operated) are not ideal for meeting these requirements due to the lack of flexibility in their memory organization and operations. In an address-operated memory, each address is associated with a data unit. In a content-operated memory, each data search pattern (cue/tag) is associated with a single data unit. Hence, in both cases, the mapping is one-to-one and does not evolve without direct user interference. Moreover, data in a traditional memory are stored at a fixed quality/granularity. When a memory runs out of space, it can either stop accepting new data or remove old data based on a specific data replacement policy. All these traits of a traditional memory are tied to its 'static' nature, which makes it inefficient for many modern applications that have evolving access requirements, as established earlier. For example, in a wildlife image-based surveillance system geared towards detecting wolves, any image frame with at least one wolf can be considered to be of importance. A traditional memory, due to its lack of dynamism, will statically store all incoming image frames at the same quality and with the same level of accessibility. This leads to the storage of irrelevant data (non-wolf images) and uniform access time for both important (wolf images) and unimportant (non-wolf images) data units.
To design an ideal memory framework for these emergent applications, we draw inspiration from biological memory and model some of its most useful properties: (1) Virtually infinite capacity: the ability of the biological brain to deal with a huge influx of data of varying importance; (2) Impreciseness: the tendency of biological memory to store and retrieve imprecise but approximately correct data; (3) Plasticity: the ability of the organic brain to undergo internal change based on external stimuli; (4) Intelligence: the capability of the human brain to learn from historical memory access patterns and improve storage/access efficiency.
Performing a complex task, such as efficient data storage for a target application/scenario, requires the incorporation of real-time knowledge [8]. Different artificial intelligence (AI) and machine learning (ML) methods have been widely used for solving diverse problems via learning the domain knowledge. Hence, we hypothesize that incorporating AI in a digital memory will allow us to model useful human brain traits in a digital memory framework.
With this vision in mind, we propose a new paradigm of content-operated memory framework, BINGO (Brain-Inspired LearNinG MemOry), which mimics the intelligence of the human brain for efficient storage and access of multi-modal data. In BINGO, the memory storage is a network of cues (search patterns) and data, which we term the Neural Memory Network (NoK). Based on the feedback generated from each memory operation, we use reinforcement learning to (1) optimize the NoK organization and (2) adjust the granularity (feature quality) of specific data units. BINGO is designed to have the same interface as any traditional COM, which allows BINGO to efficiently replace traditional COMs in any application, as shown in Fig. 1. Applications that are resilient to imprecise data storage/retrieval and deal with storing data of varying importance will benefit the most from using BINGO.
To quantitatively analyse the effectiveness of BINGO as a memory system in a computing device, we implement a BINGO memory simulator with an array of hyperparameters. We evaluate the framework using two vision datasets [9,10] and observe that the BINGO framework utilizes orders of magnitude less space and exhibits higher retrieval efficiency, while incurring minimal impact on application performance. In summary, we make the following contributions:
1. We present a new paradigm of learning computer memory, called BINGO, that tracks data access patterns to dynamically organize itself, providing high efficiency in terms of data storage and retrieval performance.
2. We formalize the learning process of BINGO and prove interesting properties of the NoK.
3. To quantitatively analyse the capabilities of BINGO, we have designed a memory performance simulator with an array of tuneable hyperparameters.
4. We present a formal process to select and customize BINGO for a target application. To demonstrate BINGO's merit over traditional content-operated memory, we provide a comprehensive performance analysis of BINGO using the CIFAR-10 dataset and a wildlife surveillance dataset [9,10].
The rest of the paper is organized as follows: Sect. 2 discusses different state-of-the-art digital memory frameworks and provides motivations for the proposed intelligent digital memory design. Section 3 describes the proposed memory framework in detail. Section 4 quantitatively analyses the effectiveness of the BINGO framework using different datasets. Section 5 concludes the paper.

Background and motivation
In this section, we shall first discuss the major differences between our proposed memory framework (BINGO) and existing similar technologies. Next, we will provide the motivations that led to the development of BINGO.

Computer memory: a brief review
Computer memory is one of the key components of a computer system [11]. Digital memories are broadly divided into two categories based on how data are stored and retrieved: (1) address operated and (2) content operated [1].
In an address operated memory (for example, a Random Access Memory or RAM [11,12]), access during read/write is done based on a memory address/location. During data retrieval/load, the memory system takes in an address as input and returns the associated data. Different variants of RAM, such as SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory), are widely used [11]. On the contrary, in a content operated memory (COM), memory access during read/write operations is performed based on a search pattern (i.e. content).

Content operated memory
A COM [1,13] does not assign any specific data to a specific address during the store operation. During data retrieval/load, the user provides the memory system with a search pattern/tag, and the COM searches the entire memory and returns the address in the memory system where the required data are stored. This renders the search process extremely slow if performed sequentially. To speed up this process of content-based searching, parallelization is employed, which generally requires additional hardware. Adding more hardware makes the COM a rather expensive solution, limiting its large-scale usability.
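To make the store/retrieve contract concrete, the following is a minimal sketch of a sequential COM; the class and method names are illustrative, not part of any framework discussed here:

```python
# A minimal content-operated memory (COM) sketch: data are stored without
# user-chosen addresses, and retrieval scans every slot for a tag match.

class ContentOperatedMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []  # list of (tag, data) pairs; the index doubles as the address

    def store(self, tag, data):
        """Store a (tag, data) pair; fail when the memory is full."""
        if len(self.slots) >= self.capacity:
            return None  # a real COM would invoke a replacement policy here
        self.slots.append((tag, data))
        return len(self.slots) - 1  # address where the data landed

    def retrieve(self, search_tag):
        """Sequentially scan every slot for an exact tag match (O(n))."""
        for address, (tag, _) in enumerate(self.slots):
            if tag == search_tag:
                return address
        return None  # miss


mem = ContentOperatedMemory(capacity=4)
mem.store("wolf", "frame_001")
mem.store("deer", "frame_002")
print(mem.retrieve("deer"))  # → 1 (address of the matching entry)
```

The O(n) scan in `retrieve` is exactly what hardware CAMs parallelize at the cost of extra circuitry.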
A COM can be implemented in several ways, as shown in Fig. 2, each with its own set of advantages and disadvantages. CAM (Content Addressable Memory, also known as associative memory) is the most popular variant of COM and has been used for decades in the computing domain, but the high-level architecture of a CAM has not evolved much. When a CAM becomes full, it must replace old data units with new incoming data units based on a predefined replacement policy.
Traditional CAMs are designed to be precise [1]. No data degradation happens over time and, in most cases, a perfect match with the search pattern/tag is required to qualify for a successful retrieval. This feature is essential for certain applications, such as destination MAC address lookup for finding the forwarding port in a network device. However, several applications in implantable devices, multimedia, the Internet of Things (IoT) and data mining can tolerate imprecise storage and retrieval.
BINGO and CAM are both content operated memory frameworks. However, there are several differences between a traditional CAM and BINGO, as shown in Table 1. For both Binary Content Addressable Memory (BCAM) and Ternary Content Addressable Memory (TCAM), (1) there is no learning component, (2) data resolution remains fixed unless directly manipulated by the user, (3) associations between search pattern (tag/cue) and data remain static unless directly modified, and (4) only a one-to-one mapping exists between search patterns/cues and data units. Consequently, space and data fetch efficiency are generally low, and we provide supporting results for this claim in Sect. 4.

Instance retrieval frameworks
Apart from standard computer memory organizations, researchers have also investigated different software-level memory organizations for efficient data storage and retrieval. An instance retrieval (IR) framework is one such software wrapper on top of traditional memory systems that is used for feature-based data storage and retrieval tasks [14]. In an IR framework, during the training phase (code-book generation), visual words are identified/learned based on features of an image dataset. These visual words are, in most cases, cluster centroids of the feature distribution. Insertion of data in the system follows, and the data are generally organized in a tree-like data structure. The location of each data unit is determined based on the (previously learned) visual words that exist in the input image. During the retrieval phase, a search image (or a search feature set) is provided and, in an attempt to find similar data in the framework, the tree is traversed based on the visual words in the search image. If a good match exists with a stored image, then that specific stored image is retrieved. These systems are primarily used for storing and retrieving images. The learning component of an IR framework is limited to the code-book generation phase, which takes place during initialization. Furthermore, once a data unit is inserted in the framework, no further change of location or accessibility is possible. No associations exist between data units, and the granularity of data units does not change. On the contrary, BINGO represents a low-level memory organization where data granularity and associations between data and search patterns evolve dynamically (see Table 1).

Intelligent caching and prefetching
Some prior works have focused on intelligent cache replacement policies and data prefetching [15,16] in processor-based systems. Zang et al. propose an LSTM-based cache data block replacement approach to increase the overall hit rate [15]. Additional modules (a training trigger, a training module, and a prediction module) are incorporated in the standard setup, and the overall caching system performance was evaluated using an OpenAI-Gym-based simulation environment [17]. Training an LSTM network can be computationally expensive; hence, an asynchronous training approach is adopted using the training trigger module. This approach works best for a static database and does not scale well when old data are removed and new data are added. This limitation arises because the input length of the LSTM model and the representation of an individual input bit cannot be changed after deployment. Hashemi et al. proposed a memory data block prefetching approach using an LSTM-based model [16]. This offline learning approach was able to outperform table-based prefetching techniques. However, the efficiency of the static machine learning model is significantly reduced when the distribution of accesses and cache misses changes. It is also unclear whether such a bulky recurrent neural network model (such as an LSTM) can achieve acceptable performance when realized in hardware.
In comparison with [15,16], BINGO organizes the whole memory as a graph and dynamically (1) modifies the graph structure and weights for increased accessibility and (2) adjusts the details/granularity of each data unit/neuron stored in the memory based on the memory access pattern. The proposed lightweight learning process, although inspired by online reinforcement learning, does not use traditional machine learning algorithms and is designed for easy hardware implementation. Additionally, BINGO has no limitations in terms of old data removal and new data storage. Hence, it significantly deviates from these intelligent memory frameworks (see Table 1).

Other approaches for intelligent data storage
Another software-level memory organization, proposed by Niederee et al., outlines the benefit of forgetfulness in a digital memory [18]. However, due to the lack of quantitative analysis and implementation details, it is unclear how effective this framework might be. Human brain-inspired spatio-temporal hierarchical models, such as Hierarchical Temporal Memory (HTM), have been proposed for pattern recognition and time-series analysis [19]. However, HTM is not designed to be used as a data storage system that can replace a traditional CAM. Hence, it differs from BINGO both in functionality and in methodology. Additionally, efforts have been made towards developing AI-guided compression techniques and data retrieval algorithms [20][21][22], but to the best of our knowledge there is no intelligent memory framework with the same level of dynamism and plasticity as BINGO.

Motivation: taking inspiration from human memory
Computer and human memory are both designed to perform data storage, retention and retrieval. Although the functioning of human memory is far from being completely formalized and understood, it is clear that human memory handles data in a vastly different way. Several properties of the human brain have been identified that allow it to far outperform traditional computer memory in certain aspects. In the following subsections, we will look into some of the most interesting properties of the human brain and envision their potential digital counterparts.

Virtually infinite capacity
The capacity of the human brain is difficult to estimate. John von Neumann, in his book "The Computer and the Brain" [23], estimated that the human brain has a capacity of 10^20 bits. Researchers now even believe that our working memory (short-term memory) can be increased through "plasticity" under certain circumstances. According to Lövdén et al., "... increase in working-memory capacity constitutes a manifestation of plasticity ..." [24]. On top of that, due to the intelligent pruning of unnecessary information, a human brain is able to retain only the key aspects of huge chunks of data for a long period of time.
If a digital memory can be designed with this human brain feature, then the computer system, through intelligent dynamic memory re-organization (learning-guided plasticity) and via pruning of unnecessary data features (learned from statistical feedback), can attain a state of virtually infinite capacity.For example, in a wildlife image-based surveillance system that is geared towards detecting wolves, the irrelevant data (non-wolf frames) can be subject to feature-loss to save space without hampering the effectiveness of the application.

Imprecise/imperfect storage and access
The idea of pruning unnecessary data, as mentioned in the previous section, is possible because the human brain operates in an imprecise domain [25], contrary to traditional digital memory. Certain tasks may not require precise memory storage/recall, and only specific high-level features of the data may be sufficient.
Hence, supporting the imprecise memory paradigm in a digital memory is crucial for attaining virtually infinite capacity and faster data access. For example, a wildlife image-based surveillance system can operate in the imprecise domain because some degree of compression/feature-reduction of images will not completely destroy the high-level features necessary for its automatic detection tasks. This can lead to higher storage and transmission efficiency.

Dynamic organization
We have mentioned that plasticity can lead to increased memory capacity, but it also provides several other benefits in the human brain. According to Lindenberger et al., "Plasticity can be defined as the brain's capacity to respond to experienced demands with structural changes that alter the behavioural repertoire." [26]. It has also been hypothesized that the human brain post-processes and re-organizes old memories during downtime (i.e. sleep) [27]. Hence, we believe plasticity leads to better accessibility of important and task-relevant data in the human brain, and the ease of access of particular memory units is adjusted over time as per the individual's requirements.
If we can design a digital memory that can re-organize itself based on data access patterns and statistical feedback, then there will be great benefits in terms of reducing the overall memory access effort.For example, a wildlife image-based surveillance system designed to detect wolves will have to deal with retrieval requests mostly related to frames containing wolves.Dynamically adjusting the memory organization can enable faster access to data that are requested more frequently.

Learning guided memory framework
Ultimately, the human brain can boast so many desirable qualities due to its ability to learn and adapt. It is safe to say the storage policies of the human brain also vary from person to person and from time to time [25]. Depending on need and requirement, certain data are prioritized over others. The processes of organizing memories, feature reduction, storage and retrieval change over time based on statistical feedback. This makes each human brain unique and tuned to excel at a particular task at a particular time.
Hence, the first step towards mimicking the properties of the human brain is to incorporate a learning component in the digital memory system. We envision that, using this learning component, the digital memory will re-organize itself over time and alter the granularity of the data to become increasingly efficient (in terms of storage, retention and retrieval) at a particular task. For example, a wildlife image-based surveillance system will greatly benefit from a memory framework that can learn to continuously re-organize itself to enable faster access to application-relevant data and continuously control the granularity of the stored data depending on the evolving usage scenario.

BINGO organization and operations
To incorporate dynamism and embody the desirable qualities of a human brain in a digital memory, we have designed BINGO. It is an intelligent, self-organizing, virtually infinite content addressable memory framework capable of dynamically modulating data granularity. We propose a novel memory architecture, geared for learning, along with algorithms for implementing the standard operations: store, retrieve and retention. A preliminary analysis of such a memory is presented in our archived paper [28].

Memory organization
The BINGO memory organization can be visualized as a network, which we refer to as the "Neural Memory Network" (NoK). The NoK, as shown in Fig. 3, consists of multiple hives, each of which is used to store data of a specific modality (data type). For example, if an application requires storing image and audio data, then the BINGO framework will instantiate a separate memory hive for each data modality. This allows the search to be more directed based on the query data type.
The fundamental units of the NoK are (1) data neurons and (2) cue neurons. A data neuron stores an actual data unit, and a cue neuron stores a cue (data search pattern or tag). Each data neuron is associated with a 'memory strength', which governs its size and the quality of the data inside it. A cue is a vector, of variable dimension, representing a certain concept. Assume that a BINGO memory framework is configured to support n different types of cues, in terms of their dimensions. Then, the cues with the largest dimension/size among all the cues are referred to as level-1 cues, and the cues with the ith largest dimension are referred to as level-i cues. It follows that the cues with the smallest dimension are referred to as level-n cues. The level-n cues are used as entry points into the NoK while (i) searching for a specific data neuron and (ii) finding a suitable place to insert a new data neuron. For example, in a wildlife surveillance system, the level-n cue neurons may contain vectors corresponding to high-level concepts such as "Wolf", "Deer", etc. The level-1 cue neurons can contain more detailed image features of the stored data. The data neurons will be image frames containing wolves, deer, jungle background, etc.
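The hive/neuron structure above can be sketched with simple containers; all class and field names here are illustrative stand-ins, not BINGO's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class CueNeuron:
    level: int    # level-n cues (smallest vectors) are the NoK entry points
    vector: list  # the cue/search pattern; dimension shrinks as the level grows

@dataclass
class DataNeuron:
    payload: bytes
    strength: float  # memory strength: governs stored size and data quality

@dataclass
class MemoryHive:
    modality: str  # e.g. "image", "audio" -- one hive per data type
    neurons: list = field(default_factory=list)

# One hive per modality, as in the text's image + audio example
hives = {m: MemoryHive(m) for m in ("image", "audio")}
hives["image"].neurons.append(CueNeuron(level=2, vector=[0.9]))  # e.g. a "Wolf" concept cue
hives["image"].neurons.append(DataNeuron(payload=b"...", strength=1.0))
```

Keeping one container per modality is what lets a query be routed only to the hive matching its data type.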
The cue neuron and data neuron associations in the NoK (⟨cue neuron, cue neuron⟩, ⟨cue neuron, data neuron⟩ and ⟨data neuron, data neuron⟩) change with time, based on the memory access pattern and hyperparameters. The data neuron memory strengths are also modulated during memory operations to increase storage efficiency.
To introduce the effect of ageing, all association weights and data neuron strengths decay based on a user-defined periodicity.The effect of ageing is carried out during the retention procedure.
Additionally, to facilitate multi-modal data search, connections between data neurons across memory hives are allowed.For example, when searched with the cue ''deer'' (the visual feature of a deer), if the system is expected to fetch both images and sound data related to the concept of ''deer'', then this above-mentioned flexibility will save search effort.

BINGO parameters
We propose several parameters for BINGO that modulate its behaviour. These parameters are of two types: (1) learnable parameters, which change throughout the system lifetime guided by online reinforcement learning and ageing; (2) hyperparameters, which are set during system initialization and changed infrequently.

Learnable parameters
We consider the following parameters as learnable parameters for BINGO:
1. Data neuron and cue neuron weighted graph: The weighted graph (NoK) directly impacts the data search efficiency (time and energy). Hence, the elements of the graph adjacency matrix are considered learnable parameters.
2. Memory strength vector: The quality and size of a data neuron depend on its memory strength. Hence, the memory strengths of all the data neurons are also considered learnable parameters. They jointly dictate the space utilization, transmission efficiency and retrieved data quality.
The number of learnable parameters can also change after each operation. These parameters constantly evolve via an online reinforcement learning process and an ageing process, as described in Sects. 3.3 and 3.4, respectively.
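The two learnable components map directly onto an adjacency matrix and a strength vector; a minimal sketch (variable names assumed, not from the paper):

```python
import numpy as np

n = 5  # total neurons (cue + data) currently in the NoK

# Learnable parameters theta = (A, M):
A = np.zeros((n, n))  # weighted adjacency matrix of the NoK graph
M = np.zeros(n)       # per-neuron memory strength (meaningful for data neurons)

# e.g. associate neuron 0 (a cue neuron) with neuron 3 (a data neuron),
# symmetrically, and give the data neuron an initial strength
A[0, 3] = A[3, 0] = 0.1
M[3] = 1.0

# The parameter count is not fixed: storing new data can add neurons,
# growing both A and M (see the parameter-expansion step in Sect. 3.3.2).
```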

Hyperparameters
We have defined a set of hyperparameters that influence the memory organization and operations of a BINGO framework. For each memory hive, we propose the following hyperparameters:
1. Memory strength modulation factor (δ₁): determines the step size for increasing data neuron memory strength inside the NoK in response to a specific memory access pattern. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
2. Memory decay rate (δ₂): controls the rate at which data neuron memory strength and features are lost due to ageing. Used during the retention operation (Algorithm 7).
3. Maximum memory strength (δ₃): the maximum value the memory strength of a data neuron can attain. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
4. Association strengthening step size (η₁): step size for increasing association weights inside the NoK in response to a specific access pattern. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
5. Association weight decay rate (η₂): step size for decreasing association weights inside the NoK due to ageing. Used during the retention operation (Algorithm 7).
6. Association pull-up hastiness (η₃): determines the haste with which the accessibility of a given neuron is increased in response to a specific access pattern. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
7. Cue neuron matching metric (Κ): similarity thresholds for cues; a list of threshold values where each entry corresponds to a specific cue level, Κ = {κ₁, κ₂, ..., κₙ} for a system with n different cue levels. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
8. Degree of allowed impreciseness (φ): limits the amount of data features that may be lost due to memory strength decay during ageing; φ = 0 implies data can be completely removed if the need arises. Used during the retention operation (Algorithm 7).
9. Initial association weight (ε₁): determines the association weight of a newly formed association. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
10. Minimum association weight (ε₂): limits the decay of association weights beyond a certain point; setting ε₂ = 0 allows associations to be deleted. Used during the retention operation (Algorithm 7).
11. Store effort limit (π₁): limits the operation effort during the store operation (Algorithm 1).
12. Retrieve effort limit (π₂): limits the search effort during the retrieve operation (Algorithm 5); −1 indicates an infinite limit.
13. Locality crossover (χ): a flag that enables localizing similar data neurons. Used during the store (Algorithm 1) and retrieve (Algorithm 5) operations.
14. Frequency of retention procedure (ξ): the retention procedure of BINGO brings in the effect of ageing; this hyperparameter is a positive integer denoting the number of normal operations to be performed before the retention operation is called once. A lower value increases dynamism.
15. Compression technique: for each memory hive, an algorithm for data compression must be specified. For example, we can use JPEG compression [29] for an image hive.
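The per-hive hyperparameters above can be gathered in a single configuration object; the field names mirror the list, while all default values are placeholders, not values from the paper:

```python
from dataclasses import dataclass

# Illustrative per-hive hyperparameter container; defaults are arbitrary examples.
@dataclass
class HiveHyperparams:
    delta1: float = 0.1        # memory strength modulation factor
    delta2: float = 0.05       # memory decay rate (ageing)
    delta3: float = 1.0        # maximum memory strength
    eta1: float = 0.1          # association strengthening step size
    eta2: float = 0.02         # association weight decay rate
    eta3: float = 0.5          # association pull-up hastiness
    kappa: tuple = (0.8, 0.6)  # cue matching thresholds, one per cue level
    phi: float = 0.0           # allowed impreciseness (0 = data fully removable)
    eps1: float = 0.1          # initial association weight
    eps2: float = 0.0          # minimum association weight (0 = deletable)
    pi1: int = 100             # store effort limit
    pi2: int = -1              # retrieve effort limit (-1 = infinite)
    chi: bool = True           # locality crossover flag
    xi: int = 10               # operations between retention calls
    compressor: str = "jpeg"   # per-hive compression technique
```

A more dynamic surveillance deployment might, for instance, lower `xi` so ageing runs more often: `HiveHyperparams(xi=1)`.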

Learning process
The learnable parameters, governing the behaviour of BINGO's NoK, are updated based on the feedback from different memory operations. The proposed learning process draws inspiration from online reinforcement learning, although it has a very different goal and execution strategy [30]. The goals/objectives of the learning in BINGO are to:
1. Increase memory search speed by learning the right NoK organization based on the memory access pattern.
2. Reduce the space requirement while maintaining data retrieval quality and application performance. This is achieved by learning the granularity (details) at which each data neuron should be stored, given the data access pattern and system requirements.
The learnable parameters Θ have two components: (1) A, the adjacency matrix of the entire NoK graph; (2) M, a vector where each element is the corresponding neuron's memory strength. For a NoK with n neurons (data neurons and cue neurons), A has dimension (n, n) and M has dimension n. In Fig. 4, we present the flowcharts of all the BINGO operations. The learning process is embedded in the store and retrieve operations. The first step, for both store and retrieve, is to search for a suitable cue neuron in the NoK to serve as a reference point for either (i) inserting new data or (ii) retrieving a data neuron. This search is carried out with the help of the associated/search cues C. For the search, we have proposed a limited and weighted breadth-first search algorithm (Algorithm 2), which is explained in Sect. 3.5.1. Let us assume that the outcome of this search sub-operation is a traversal order Y_t = {N_t1, N_t2, N_t3, ..., N_tf}, where each N_ti ∈ Y_t is the index of a neuron in the NoK visited during the search process. The neuron N_tf is the neuron accessed at the end of the search (a.k.a. the reference point). Assume that the path from N_t1 to N_tf is Y_p. Then, both Y_t and Y_p constitute the feedback on the basis of which the NoK learnable parameters are adjusted. In the best case, length(Y_t) can be as small as 2 for a search. Hence, based on the search outcome Y_t, Θ is modified such that probability(length(Y_t) = 2 | C, Θ) is maximized. This is achieved as described in Sect. 3.3.1. During a store operation, extra neurons may also be added to the NoK, leading to the addition of more parameters, as explained in Sect. 3.3.2.

Learning from the operation feedback
The goal here is to compute the updated parameters Θ′ = (A′, M′) from the operation feedback. Here δ₁ is the memory strength modulation factor (hyperparameter) and δ₃ is the maximum memory strength allowed (hyperparameter). The function f₁ is application dependent; for example, f₁ can be f₁(x) = x. With ΔM defined, we compute the updated parameters: M′ = M + ΔM. Note that no parameter change is made in the case of a failed retrieval attempt.
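A minimal sketch of this strength update, under the assumption that ΔM adds f₁(δ₁) to the accessed data neuron's strength while clipping at δ₃ (the exact ΔM definition is abbreviated in the text, so treat this form as illustrative):

```python
import numpy as np

def strengthen(M, accessed, delta1=0.1, delta3=1.0, f1=lambda x: x):
    """Sketch of M' = M + dM: the accessed data neuron's strength rises
    by f1(delta1), but is never allowed to exceed delta3 (assumed form)."""
    dM = np.zeros_like(M)
    dM[accessed] = min(f1(delta1), delta3 - M[accessed])  # cap at delta3
    return M + dM

M = np.array([0.5, 0.95])
M = strengthen(M, accessed=0)  # normal bump: 0.5 -> 0.6
M = strengthen(M, accessed=1)  # capped bump: 0.95 -> 1.0, not 1.05
print(M)
```

On a failed retrieval, the function would simply not be called, matching the no-change rule above.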

Parameter expansion due to neuron addition
As a continuation of Sect. 3.3.1, in the case of a store operation, if new cue neurons and data neurons are added to the NoK, then the previously computed H' is expanded to H'' = (A'', M'') to accommodate the parameters of the new neurons. Let N' = {N'_1, N'_2, ..., N'_o} be the indices of the new neurons added to the NoK. Then, we compute H'' as follows: 1. Additional o rows and columns are added to A' to generate A''. Each of the new entries (learnable parameters) in the matrix is zero, with the following exceptions: ∀N'_i ∈ N', if N'_i is to be associated with neurons AN^i = {AN^i_1, AN^i_2, ..., AN^i_q}, then ∀AN^i_j ∈ AN^i, A''[N'_i][AN^i_j] = e_1 and A''[AN^i_j][N'_i] = e_1. Here e_1 is a hyperparameter defined earlier.

2. The dimension of vector M' is increased by o to generate M'', where M''[i] = d_3 if i corresponds to a data neuron and M''[i] = 0 if i corresponds to a cue neuron. Here d_3 is a hyperparameter defined earlier.
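The two expansion steps can be sketched as follows; this is an illustrative simplification (the function name, the new_assocs/is_data arguments, and the use of NumPy are assumptions, not the paper's code):

```python
import numpy as np

def expand_params(A, M, new_assocs, is_data, e1, d3):
    """Add one row/column per new neuron to A and one entry to M.
    new_assocs[i] lists existing-neuron indices associated with new
    neuron i; new associations start at weight e1, new data neurons
    start at strength d3, and new cue neurons at 0."""
    n, o = len(M), len(new_assocs)
    A2 = np.zeros((n + o, n + o))
    A2[:n, :n] = A                      # keep the old adjacency block
    M2 = np.concatenate([M, np.zeros(o)])
    for i, neighbours in enumerate(new_assocs):
        idx = n + i
        for j in neighbours:            # symmetric association entries
            A2[idx, j] = A2[j, idx] = e1
        M2[idx] = d3 if is_data[i] else 0.0
    return A2, M2

# Add one cue neuron (linked to neuron 0) and one data neuron (linked to it).
A2, M2 = expand_params(np.zeros((2, 2)), np.zeros(2),
                       new_assocs=[[0], [2]], is_data=[False, True],
                       e1=0.5, d3=100.0)
```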

Ageing process
During the retention operation (Fig. 4), we introduce the effect of ageing by reducing the memory strength of all data neurons and decreasing the weight of all associations. As a data neuron's strength decreases, BINGO starts compressing the data gradually. This compression leads to a loss of data granularity (details), but it also frees up space for more relevant data. This creates a tug-of-war between ageing and the positive reinforcements from the store and retrieve operations. The method of ageing is explained in the next sub-sections. Assume that, before retention, the BINGO parameters were H = (A, M). First, we compute ΔA of dimension (n, n) such that ΔA[i][j] = min(g_2, A[i][j] − e_2) for i, j ∈ [1, n]. Then, we compute ΔM of dimension n such that ΔM[i] = min(f_2(d_2), M[i] − u) for i ∈ [1, n]. The function f_2 can be f_2(x) = x or something more complex, depending on the application requirements. The updated parameters, H' = (A', M'), are computed as follows: (i) A' = A − ΔA and (ii) M' = M − ΔM. Here e_2, u, and d_2 are hyperparameters defined earlier.
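A sketch of one retention step under these rules; the masking of entries already at or below their floors reflects the freezing behaviour of Sect. 3.4.2, and all names are illustrative:

```python
import numpy as np

def age(A, M, g2, e2, d2, u, f2=lambda x: x):
    """One retention step: each live association weakens by at most g2
    but never below the floor e2; each live memory strength weakens by
    at most f2(d2) but never below the floor u. Entries already at or
    below their floor (including frozen/dead ones) are left untouched."""
    A2, M2 = A.copy(), M.copy()
    live = A2 > e2
    A2[live] -= np.minimum(g2, A2[live] - e2)
    liveM = M2 > u
    M2[liveM] -= np.minimum(f2(d2), M2[liveM] - u)
    return A2, M2

# One association decays by g2; a nearly exhausted neuron stops at u.
A_aged, M_aged = age(np.array([[0.0, 1.0], [1.0, 0.0]]),
                     np.array([100.0, 0.15]),
                     g2=0.3, e2=0.1, d2=10.0, u=0.1)
```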

Parameter freezing
As a continuation of Sect. 3.4.1, after a retention operation, some of the parameters in H' may become frozen if they reach a certain value. Specifically, if the entries A'[i][j] and A'[j][i] reach zero, then the association is considered dead and remains frozen until it is revived by the feedback from a future memory operation. If an entry M'[i], where i corresponds to a data neuron, reaches zero, then the data neuron is considered dead and the parameter M'[i] freezes.

Memory operation algorithms
We implement the store, retrieve, and retention operations based on the learning process and the ageing process described in Sects. 3.3 and 3.4. For the implementation, we use Algorithms 1, 2, 3, 4, 5, 6 and 7. Up to two levels of cues are supported in this embodiment of BINGO. However, the BINGO framework can be adapted to support additional cue levels of varying dimensions with minor adjustments to the algorithms.

Store
In Algorithm 1, we highlight the high-level steps required for storing a data block in the BINGO NoK. MEM is the NoK where the data are to be stored, D is the data, and C are the associated cues for the given data. We provide the cues (C) as input to keep the interface similar to a traditional CAM; however, BINGO can easily be extended to perform in-memory feature/cue extraction. HP is the set of hyperparameters for MEM. Before the new data can be stored, we must ensure that there is enough space in MEM (lines 2 and 3). If there is a shortage of space, then the framework uses the retention procedure (Algorithm 7, described in Sect. 3.5.3) to compress less accessed data neurons.

The proposed search algorithm is described in Algorithm 2. MEM is the memory hive where the operation is taking place, C is the set of cues provided by the user for the operation, HP is the set of hyperparameters for MEM, ep is the level-n cue neuron (n = 2) selected as the starting/entry point of the search, and limit is the search effort limit for the operation. The search is a limited, weighted version of breadth-first search (BFS), where neuronQueue is the BFS queue, bestCandidate is the level-1 cue neuron deemed to be the best search candidate at a given point in time, and bestCandidate_Sim is the similarity between bestCandidate and C → level_1_cue. The visited list keeps track of the neurons already visited, and the traversalMotion list keeps track of the order in which each neuron is visited, along with its parent neuron. While the neuronQueue is not empty (line 4) and the number of neurons traversed so far is less than limit, the search continues. If the hyperparameter HP → x = 1, then paths blocked by level-n cue neurons are ignored, to restrict the search within a specific locality (sub-graph) of the NoK graph (line 9). During each step of the search, a new neuron is encountered, and if it is a level-1 cue neuron, it is compared with C → level_1_cue. If the similarity of this comparison (sim) is greater than HP → k_1 (hyperparameter), then we have found a good match. In this case, the path traversed to access this level-1 cue neuron is extracted from traversalMotion (line 15) and returned from the procedure along with a flag value of 1, indicating that a good match was found (line 16). After visiting each neuron, all its adjacent neurons are enqueued in descending order of their corresponding association weights (lines 20-25). If a level-1 cue neuron with sim > HP → k_1 is not found by the end of the search, then the best candidate level-1 cue neuron encountered so far is used: the path traversed to access bestCandidate is extracted from traversalMotion (line 26) and returned along with a flag value of 0, indicating that a good match was not found (line 27).
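The limited, weighted BFS can be sketched as below. This is an illustrative simplification: the NoK is assumed to be a weight dictionary, cue similarity a callable, and the locality restriction (HP → x) is omitted:

```python
def search(adj, start, similarity, k1, limit):
    """Limited weighted BFS: visit at most `limit` neurons, expanding
    neighbours in descending association-weight order. Returns
    (path_to_match, 1) on a good match (sim > k1), otherwise
    (path_to_best_candidate, 0)."""
    queue, visited, parent = [start], {start}, {start: None}
    best, best_sim = start, -1.0
    steps = 0
    while queue and steps < limit:
        neuron = queue.pop(0)
        steps += 1
        sim = similarity(neuron)
        if sim > k1:                      # good match found
            return path_to(parent, neuron), 1
        if sim > best_sim:                # track best candidate so far
            best, best_sim = neuron, sim
        # enqueue unvisited neighbours, strongest association first
        for nb, _w in sorted(adj.get(neuron, {}).items(),
                             key=lambda kv: -kv[1]):
            if nb not in visited:
                visited.add(nb)
                parent[nb] = neuron
                queue.append(nb)
    return path_to(parent, best), 0

def path_to(parent, node):
    """Reconstruct the traversal path from the parent pointers."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

adj = {'c2': {'a': 0.9, 'b': 0.2}, 'a': {'c2': 0.9, 'target': 0.5}}
sims = {'c2': 0.0, 'a': 0.1, 'b': 0.05, 'target': 0.99}
path, flag = search(adj, 'c2', sims.get, k1=0.8, limit=10)
```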
Looking back at Algorithm 1, the LEARN_STORE procedure (Algorithm 3) is invoked in line 9. One scenario arises when the selected level-1 cue neuron (access_Path[1]) is adjacent to the level-n cue neuron (access_Path[0]), which was also the starting point of the search. In this scenario, we simply increase the association weight between access_Path[0] and access_Path[1] (by HP → g_1) to give it a higher search priority (as we perform a weighted BFS during the SEARCH procedure defined in Algorithm 2).

Retrieve
In Algorithm 5, we highlight the high-level steps required for retrieving a data block from the BINGO NoK. MEM is the BINGO NoK from which we attempt to retrieve the data. C is the set of search cues on the basis of which the data are to be retrieved. The cues (C) are provided as input to ensure a traditional CAM-like interface. A retrieval attempt fails when no matching level-1 cue is located. This essentially implies that the queried data do not exist in the memory or could not be located within the search effort limit (HP → p_2).
The descriptions of the SEARCH and INC_ACCESSIBILITY sub-operations are provided in Sect. 3.5.1.

Retention
In a traditional CAM, data retention involves maintaining the memory in a fixed state. BINGO, on the other hand, allows the NoK to modify itself to show the effect of ageing, as shown in Algorithm 7. The strength of all the associations in the NoK is decreased, as seen in line 5 (by g_2), and the memory strength of all the data neurons is weakened, as seen in line 9 (based on Eqn. 1). This is an exponential decay function, but it is also possible to use a linear function instead. Here d_2 and u are hyperparameters introduced earlier. Weakening a data neuron leads to compression and a loss of data features. If all the associated data neurons of a level-1 cue neuron die, then that cue neuron is also marked as dead and is bypassed during future searches. The hyperparameter HP → e_2 restricts the decay of each association's strength.

Properties of the NoK and the learning process
The dynamic memory organization and constant learning create a constantly changing, complex BINGO NoK graph, but several interesting bounds exist and observations can be made regarding its structure and behaviour.

Upper bound of the NoK depth
Assume that there are m data neurons in the NoK, D = {d_1, d_2, ..., d_m}. Let the minimum number of hops from each of these data neurons to any level-n cue neuron be H = {h_1, h_2, ..., h_m}. Then, the depth of the NoK is defined as max(H). We define the depth in this manner because any search within the NoK graph starts from a level-n cue neuron, and the target of every search is the level-1 cue neuron associated with a particular data neuron.
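Under this definition, the depth can be computed directly from per-data-neuron BFS hop counts; a minimal sketch, assuming an unweighted adjacency dictionary (all names are illustrative):

```python
from collections import deque

def hops(adj, src, targets):
    """Minimum hop count from src to any node in `targets` (plain BFS)."""
    dist, q = {src: 0}, deque([src])
    while q:
        node = q.popleft()
        if node in targets:
            return dist[node]
        for nb in adj.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                q.append(nb)
    return float('inf')  # unreachable

def nok_depth(adj, data_neurons, level_n_cues):
    """depth = max over data neurons of min hops to any level-n cue."""
    return max(hops(adj, d, set(level_n_cues)) for d in data_neurons)

# Two data neurons, each 2 hops from the level-2 cue 'c2'.
adj = {'d1': ['c1'], 'c1': ['d1', 'c2'], 'c2': ['c1'],
       'd2': ['c1b'], 'c1b': ['d2', 'c2']}
depth = nok_depth(adj, ['d1', 'd2'], ['c2'])
```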
Theorem 1 If u > 0, e_2 > 0, and the previously proposed algorithms are used for implementing the operations, then the depth of the NoK at any state is ≤ (p_1 + 1).
Proof Assume that the premise is true. In a NoK with 0 data neurons, depth = 0 ≤ (p_1 + 1) for p_1 ≥ 2. In a NoK with 1 data neuron (d_1), it must be the first data neuron that was inserted, because no data neurons or associations are deleted according to the premise (u > 0 and e_2 > 0). According to Algorithm 1, the first data neuron is inserted as shown in Fig. 5. Hence, for a BINGO system with two levels of cues (an assumption made by the proposed algorithms), the depth of the NoK will be 2, which is ≤ (p_1 + 1) for p_1 ≥ 2. Thus, the consequence of the theorem statement holds for the base cases where the NoK has 0 or 1 data neuron(s). Now let us assume a NoK with m data neurons, D = {d_1, d_2, ..., d_m}, where m > 1, and k level-2 cue neurons C^2 = {c^2_1, c^2_2, ..., c^2_k}. Based on the definition of the depth of a NoK, the consequence of the theorem statement can be rewritten as: ∀d_i ∈ D, ∃c^2_j ∈ C^2 such that hops(d_i, c^2_j) ≤ (p_1 + 1). When a data neuron d_i was inserted, the insertion could have been done in two different ways: around a level-1 cue neuron found within the search effort limit p_1, or around a newly created cue-neuron chain (as in Fig. 5) rooted at a new level-2 cue neuron CN^2_new. In either case, hops(d_i, c^2_j) ≤ (p_1 + 1) at insertion time, and this continues to hold because no data neurons or associations are deleted according to the premise (u > 0 and e_2 > 0). Hence, for any d_i ∈ D inserted in this way, ∃c^2_j ∈ C^2 such that hops(d_i, c^2_j) ≤ (p_1 + 1); in the second case, c^2_j = CN^2_new. Hence, we have proved that if the premise is true, then the depth of the NoK at any state is ≤ (p_1 + 1). □

Number of mistakes to attain the ideal accessibility state
To define 'mistakes' for our framework, we draw inspiration from the definition of 'mistakes' in online learning [31]. In BINGO, the effort of searching for a specific level-1 cue neuron and the associated data neuron is proportional to the number of neurons visited during the search sub-operation (Algorithm 2). During a retrieve operation, if the search sub-operation selects a level-1 cue neuron c^1 as the target cue neuron (access_path[−1]) and Flag_C1 = 1, then the neuron c^1 is considered to have been accessed. The ''Pull up and strengthen'' strategy (Algorithm 4) is used to increase the accessibility of neurons that are accessed more often. Hence, if accessed continuously, the search effort for a specific level-1 cue neuron and the associated data neuron will continue to decrease. After a certain number of accesses, the level-1 cue neuron c^1 will be reachable by visiting only two neurons from a given level-2 cue neuron (c^2) as the starting point of the search. At this state, the search for c^1 from c^2 can be termed ''ideal'' because no further accessibility improvement can be made, and c^1 is said to be in the ideal accessibility state with respect to c^2. From a given NoK state, the number of subsequent accesses to c^1 starting from c^2 required to attain the ideal accessibility state is defined as the number of mistakes (MIST) with regard to c^1 and c^2. Assume further that (vi) in the current NoK state, access_path is the list of neurons (of length l) that needs to be traversed to reach the target level-1 cue neuron access_path[−1] starting from a level-n cue neuron access_path[0], (vii) S is the weight of the strongest association attached to the neuron access_path[0], (viii) the previously proposed algorithms are used for implementing the operations, (ix) no more retention and store operations are performed until access_path[−1] reaches the ideal accessibility state with respect to access_path[0], and (x) MIST is the number of mistakes before access_path[−1] can reach the ideal accessibility state with respect to access_path[0].

Proof Assume that the premise is true; then the current state of the NoK can be one of three types: 1. If access_path[−1] is adjacent to access_path[0] (i.e. l = 2) and S < e_1, then the first neuron visited from access_path[0] will be access_path[−1], based on the weighted BFS carried out in Algorithm 2. Hence, in this case, access_path[−1] is already in the ideal accessibility state with respect to access_path[0]; that is, MIST = 0. 2. If access_path[−1] is not adjacent to access_path[0] and S < e_1, then once the cue neuron access_path[−1] becomes adjacent to access_path[0], the search will become ideal. After each access to access_path[−1] via access_path[0], the subsequent access path length reduces based on g_3 (as described in Algorithm 4). It follows that access_path[−1] will be adjacent to access_path[0] after ⌈(l−2)/g_3⌉ accesses; hence MIST = ⌈(l−2)/g_3⌉. 3. If access_path[−1] is not adjacent to access_path[0] and S ≥ e_1, then once the cue neuron access_path[−1] becomes adjacent to access_path[0], the search will still not be ideal. Only when the association weight between access_path[−1] and access_path[0] becomes > S will the ideal accessibility state be achieved. As before, access_path[−1] will be adjacent to access_path[0] after ⌈(l−2)/g_3⌉ accesses. After this, each access to access_path[−1] from access_path[0] increases the association weight between them, so after ⌊(S−e_1)/g_1 + 1⌋ further accesses the association weight will exceed S. So, MIST = ⌈(l−2)/g_3⌉ + ⌊(S−e_1)/g_1 + 1⌋. We have thus demonstrated that if the premise is true, then MIST is bounded as derived above. □
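The case analysis above combines into a simple bound; a sketch with illustrative parameter names, using the ceiling/floor expressions as reconstructed in the proof:

```python
import math

def mist_bound(l, S, e1, g1, g3):
    """Number of 'mistakes' before ideal accessibility (Theorem 2):
    case 1: already adjacent with a weak competitor  -> 0;
    case 2: pull-up phase only                       -> ceil((l-2)/g3);
    case 3: pull-up plus strengthening past S        -> add floor((S-e1)/g1 + 1)."""
    if l == 2 and S < e1:
        return 0
    pull_up = math.ceil((l - 2) / g3)
    if S < e1:
        return pull_up
    return pull_up + math.floor((S - e1) / g1 + 1)

# Path of length 5, strong competitor S = 0.9: 3 pull-ups + 3 strengthenings.
m = mist_bound(l=5, S=0.9, e1=0.5, g1=0.2, g3=1)
```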

Data neuron strength depends on access frequency
In order to prove the next property of BINGO, we start with a set of assumptions about the access frequency of a data neuron and the interleaving of retention operations. After (n − x) accesses and m retention operations, the resulting memory strength S' follows from the reinforcement and ageing rules defined above. Based on our previous assumptions, the premise implies the consequence: a data neuron's strength is determined by how frequently it is accessed. Hence, when the premise is true, the consequence is also true. □

Similar data neurons form localities
If the hyperparameter x = 1, then the search sub-operation (for both store and retrieve) is not allowed to pass through any level-n cue neuron (as described in Algorithm 2). As a result, any data neuron that is inserted using a specific level-n cue neuron c^n_i can also be retrieved by starting the search from c^n_i without crossing over any other level-n cue neuron. We define a locality as the NoK sub-graph formed by the neurons bounded by level-n cue neurons other than c^n_i. Each locality in the NoK stores a specific type of data relating to the central level-n cue neuron and the adjacent level-n cue neurons.

Resistance of BINGO to sporadic noise
With regard to the recent historical access pattern, we define noise as an uncharacteristic retrieval access to a level-1 cue neuron and its associated data neuron. For example, assume that C^1 is the set of level-1 cue neurons accessed during the last N retrieve operations. Now, if another retrieve operation accesses a cue c ∉ C^1, then we consider this a potential noise. Note that this may not actually be noise, simply because N may be too small to capture the full scope of the access pattern, or there may be a legitimate change in the access behaviour of the application using the BINGO framework. However, if this is indeed a one-off noisy access, then it will not have much impact on the BINGO NoK organization because, according to Theorem 2, the accessibility of a neuron takes MIST accesses before the ideal accessibility state is reached. One noisy access may increase the accessibility of c slightly and may reduce the accessibility of some cue neurons in C^1 slightly, but such changes will soon be overshadowed by subsequent accesses. Hence, we believe that the BINGO NoK is robust against sporadic noisy retrieve operations.

Dynamic behaviour of BINGO
In Fig. 6a, we illustrate the dynamic nature of BINGO by displaying how the NoK changes during a sequence of operations. The accessibility of different data neurons changes, and the memory strength of each data neuron increases or decreases based on the feedback-driven reinforcement learning and ageing. In this figure, a thicker line represents a stronger association, and a bigger data neuron signifies a higher memory strength. In contrast, as seen in Fig. 6b, the traditional CAM does not show any sign of intelligence or dynamism to facilitate data storage/retrieval. In Sect. 4.3, a more detailed, simulation-accurate depiction of BINGO's dynamism is provided.

BINGO and traditional CAM simulators
In order to quantitatively analyse the effectiveness of BINGO in a computer system, we have implemented a BINGO simulator with the following features:
• It can simulate all memory operations and provide relative benefits with respect to traditional CAM in terms of operation effort.
• The framework is configurable with the array of hyperparameters described in Sect. 3.2.2.
• The BINGO simulator can be mapped to any application designed to use a CAM or a similar framework.
• The simulator implements the learning paradigm as described in Algorithms 1, 2, 3, 4, 5, 6 and 7.
• The BINGO simulator is highly scalable and can simulate a memory of arbitrarily large size.
For all the analyses provided in Sect. 4, we define operation effort as shown in Eqn. 2, i.e. as the weighted sum 4 × Comp_L1 + Comp_L2, where Comp_L1 is the number of level-1 cues compared and Comp_L2 is the number of level-2 cues compared during the operation. A weight of 4 is applied to Comp_L1 because, in our experiments, the dimension of a level-1 cue (4096) is 4 times larger than the dimension of a level-2 cue (1024).
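Assuming Eqns. 2 and 3 are the weighted comparison counts described in the text, the two effort metrics reduce to:

```python
def bingo_effort(comp_l1, comp_l2):
    """Operation effort (Eqn. 2): level-1 cue comparisons are weighted
    4x because a level-1 cue (dim 4096) is 4x the size of a level-2
    cue (dim 1024)."""
    return 4 * comp_l1 + comp_l2

def cam_effort(comp):
    """Traditional-CAM operation effort (Eqn. 3): every tag is a
    4096-dimensional cue, so each comparison carries weight 4."""
    return 4 * comp
```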
We have also created a traditional CAM simulator to compare against BINGO. The traditional CAM behaviour is modelled on a standard CAM organization [32]. A single 4096-dimensional tag/cue is associated with each data unit. A first-in-first-out (FIFO) or least-recently-used (LRU) replacement policy is used if the CAM runs out of space. During a data retrieve, the query cue is compared with the existing cues in the CAM; if a matching cue is found, then the associated data are returned. There is no effect of ageing or dynamic memory re-organization in the traditional CAM. For the traditional CAM, we define operation effort as shown in Eqn. 3, where Comp is the number of tags/cues compared during the operation.
Because the cue dimension is 4096, a weight of 4 is applied to make the comparison with BINGO fair.

Desirable application characteristics
Certain applications will benefit more than others from using BINGO and we formalize the properties of such applications next.

Imprecise store and retrieval
It is recommended to use the BINGO framework in the imprecise mode for storage and search efficiency. Assume D = the set of data neurons in the memory/NoK (MEM) at a given instance. For a given data neuron X ∈ D, in order for the application to operate in the imprecise domain, a reduction of size(X) by a small quantity j_1 must reduce Quality(X) by no more than a small quantity j_2, where size(X) is the size of the data neuron X and Quality(X) is the quality of the data in the data neuron X, in light of the specific application. For example, in a wildlife surveillance system, if an image containing a wolf is compressed slightly, it will still look like an image with the same wolf.

Notion of object(s)-of-interest
For a specific application, assume that D is a new incoming data block which must be stored in the memory, and OL = the set of objects in data D. Then, for the application to benefit from BINGO, there must be a well-defined notion of the importance of each object O_i ∈ OL for the specific application. For example, in wildlife surveillance designed to detect wolves, frames containing at least one wolf are considered to be of higher importance. This importance is learned from the data access pattern during operation. If the importance shifts after a while, then the memory adjusts itself accordingly.

Effectiveness of BINGO: a quantitative and visual analysis
To model BINGO and traditional CAM, we use the simulators described in Sect. 3.8. We first evaluate the effectiveness of BINGO using the standard CIFAR-10 dataset [9]. After that, we emulate a wildlife surveillance system using multiple video recordings gathered from a camera deployed in the wild, and observe the effectiveness of BINGO in such an application. For all the experiments, the VGG16 [33] last-layer outputs before softmax (dim = 1024) are used as level-n (n = 2) cues, and the second-to-last-layer outputs (dim = 4096) are used as level-1 cues. The last-layer outputs of VGG16 are indicative of the class of the data, while the second-to-last-layer outputs contain richer features. We use the cosine similarity metric to estimate the similarity between two cues. JPEG [29] is used for compressing data neurons upon memory strength loss. The choice of hyperparameters will depend on the system using the BINGO framework; just as with other AI frameworks, different hyperparameter search techniques can be used to determine the ideal BINGO hyperparameters.
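The cue comparison is plain cosine similarity; a minimal, dependency-free sketch:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two cue vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Identical cues score ~1.0; orthogonal cues score 0.0.
sim = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
```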

Evaluation with different classes prioritized
We run a total of 30 experiments, prioritizing each CIFAR-10 class one by one, for BINGO, CAM with FIFO, and CAM with LRU. The above-mentioned hyperparameters were used, and the memory size was limited to 250,000 bytes for all 3 memory variants. The simulation results are shown in Table 2. The detection accuracy is based on the GluonCV [34] Model Zoo ResNet-110 v1 for CIFAR-10 [35]. In a CAM, all data units are accessed in parallel during the retrieve operation, and we show this effort in the row 'Avg. Par. Retrieve Effort'. However, as we only report sequential efforts for the BINGO operations, we have also reported the sequential efforts for CAM ('Avg. Seq. Retrieve Effort'). Operation effort is the average of store effort and retrieve effort for a specific access type (sequential or parallel).
BINGO is superior to CAM-FIFO in every aspect. On average, BINGO (i) stores ≈2× more data units than CAM-FIFO, (ii) has an average operation effort 11.15× lower than CAM-FIFO's average sequential operation effort, (iii) has an average operation effort 18.8× lower than CAM-FIFO's average parallel operation effort, and (iv) has a detection accuracy about 11% higher than CAM-FIFO. On average, BINGO (i) stores ≈2× more data units than CAM-LRU, (ii) has an average operation effort 7.25× lower than CAM-LRU's average sequential operation effort, (iii) has an average operation effort 18.8× lower than CAM-LRU's average parallel operation effort, and (iv) has a detection accuracy about 4% lower than CAM-LRU. For IoT applications operating under strict space, time, and energy constraints, the benefits of BINGO can far outweigh this 4% accuracy loss.

Evaluation under progressive memory constraint
To gain a more in-depth understanding of the behaviours of BINGO and CAM under strict memory constraints, we perform a series of experiments with varying memory limits. To obtain a single metric for measuring the performance of a memory system, we have designed a metric called the 'memory quality factor', as shown in Eqn. 4. Here, QF stands for 'Memory Quality Factor', DA is the 'Detection Accuracy', DU is the number of 'Data Units Stored', Total_Samples is the total number of data samples that were presented to the memory system for storage, FIFO_Seq_OE is the 'Average Sequential Operation Effort' for CAM-FIFO under the same settings (treated as a baseline), and OE is the 'Average Sequential Operation Effort' of the memory for which we are computing QF.
Also for these experiments, we use the hyperparameters as mentioned above except for u, which is set to 0.
We plot the memory quality factors for BINGO, CAM-FIFO, and CAM-LRU in Fig. 7. In Fig. 7a, we show the results when the deer class is prioritized, and in Fig. 7b, the results when the automobile class is prioritized. BINGO appears superior to both CAM-LRU and CAM-FIFO. The memory quality factor of BINGO appears to grow polynomially, while that of CAM-FIFO appears to grow linearly and that of CAM-LRU appears to stay stable. Interestingly, CAM-FIFO and CAM-LRU converge to the same quality factor as the memory limit constraint is relaxed.
4.2 A real application case study: wildlife surveillance

Image sensors are widely deployed in the wilderness for rare-species tracking and poacher detection. The wilderness can be vast, and edge devices operating in these regions often deal with low storage space, limited transmission bandwidth, and energy shortages. This demands efficient data storage, transmission, and power management. Interestingly, this specific application is resistant to imprecise data storage and retrieval, because compression does not easily destroy high-level data features in images. Also, in the context of this application, certain objects, such as a rare animal of a specific species, can be considered more important than an image with only trees or common animals. Hence, this application has the desirable characteristics for using BINGO and will certainly benefit from BINGO's learning-guided data preciseness modulation and plasticity schemes.
To emulate a wildlife surveillance application, we construct an image dataset from wildlife camera footage containing 40 different animal sightings. The details of this dataset are located in the publicly available dataset repository that we have released [10]. We sample the video at a rate of 1 frame for every 20 frames to avoid too much repetition and obtain 644 frames. During each experiment, all 644 images are stored in the memory using 644 store operations, which are interleaved with 644 retrieve operations of previously stored, randomly selected images of wolves or foxes. Here, wolf and fox images are considered 'prioritized'. We use the following BINGO hyperparameters for our experiments unless otherwise mentioned: [d_1: (d_3 − current strength), d_2: 0.1, ...]. On average, BINGO requires less operation effort and stores more data units with minimal impact on the retrieved data quality in terms of AI-based detection ('Avg.' stands for average, 'Seq' for sequential, and 'Par' for parallel; operation effort is an average of store and retrieve efforts). We run the experiments with the memory size limited to 50,000,000 bytes for all 3 memory variants. The simulation results are shown in Table 3. The detection accuracy is based on the pretrained ImageNet VGG16 net [33]. For these experiments, a detection is considered correct if any of the top-25 predictions belongs to the following classes: {red_fox, red_wolf, timber_wolf, white_wolf, grey_fox, kit_fox, Arctic_fox}.
We observe a similar trend as before: BINGO is superior to CAM-FIFO in every aspect. On average, BINGO (i) stores ≈2.5× more data units than CAM-FIFO, (ii) has an average operation effort 12.3× lower than CAM-FIFO's average sequential operation effort, (iii) has an average operation effort 41.02× lower than CAM-FIFO's average parallel operation effort, and (iv) has a detection accuracy about 4.2% higher than CAM-FIFO. On average, BINGO (i) stores ≈2.5× more data units than CAM-LRU, (ii) has an average operation effort 7.8× lower than CAM-LRU's average sequential operation effort, (iii) has an average operation effort 40.81× lower than CAM-LRU's average parallel operation effort, and (iv) has a detection accuracy about 1% lower. For an IoT device deployed in the wilderness, forced to operate under strict space, time, and energy constraints, the benefits of BINGO can far outweigh this 1% accuracy loss.
To gain a deeper understanding of the behaviours of BINGO and CAM under strict memory constraints, we perform a series of experiments with varying memory limits. To obtain a single metric for measuring the performance of a memory system, we use the memory quality factor defined in Eqn. 4. We plot the memory quality factors for BINGO, CAM-FIFO, and CAM-LRU in Fig. 8. The fox/wolf class is prioritized in these experiments, and BINGO appears superior to both CAM-LRU and CAM-FIFO. For these experiments, we again use the hyperparameters mentioned above, except for u, which is set to 0.
In Fig. 10, we see the variation in memory strengths of all the data neurons. Data neurons that are accessed more frequently, such as DN:1, DN:2, and DN:4, maintain a relatively high memory strength. Figures 9 and 10 together illustrate the dynamism and plasticity of BINGO in terms of information storage and retrieval. We believe that the proposed organization can incorporate other advanced characteristics of human memory, based on prior and emerging research, to further enhance storage and retrieval efficiency. BINGO can potentially implement data retrieval algorithms based on the parallel spreading activation theory [36,37] or the minimal cue addition concept of the elementary perceiver and memorizer [38,39]. The two-stage retrieval process proposed by Atkinson et al. can also be incorporated in the BINGO framework to increase retrieved data quality [25]. Domain-specific compression techniques such as MAGIC can also be used instead of JPEG for better storage efficiency in edge devices [21].

Conclusion
We have presented BINGO, a learning-guided memory paradigm that can provide a dramatic improvement in memory access speed and effective storage capacity across diverse applications. It draws inspiration from the human brain to systematically incorporate learning into a memory organization that dynamically adapts to the data access behaviour to improve storage and access efficiency. We have presented in detail the store, retrieve, and retention processes that integrate and employ data-driven knowledge. We have developed a complete simulator for BINGO and compared its data storage and retrieval behaviour with traditional content-based memory. Quantitative evaluation of BINGO on the CIFAR-10 dataset and the wildlife surveillance dataset shows that it vastly surpasses the storage and retrieval efficiency of traditional CAM. By dynamically adapting data granularity and adjusting the associations between data and search patterns, BINGO demonstrates a high level of plasticity that is not manifested by any existing computer memory organization. While we have worked with high-level memory organizational parameters here, our future work will focus on physical implementations of BINGO with various memory technologies. We believe the proposed paradigm can open up avenues for promising physical realizations to further advance the effectiveness of learning, and can benefit greatly from the data storage behaviour of emergent non-silicon nanoscale memory devices (such as resistive or phase-change memory devices).

Declarations
Funding Not applicable.

Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data Availability All relevant experimental data are provided in the manuscript.

Fig. 2 Taxonomy of digital memory used in computer systems. The proposed memory organization falls under the content-addressable memory category and is suitable for diverse application domains, as shown

Fig. 3 Memory organization of BINGO with two memory hives. Data neurons (DN) store data and cue neurons (CN) store cues/tags. Each memory hive stores a specific data type (e.g. image, sound). Similar data neurons accumulate to form localities. We refer to this memory graph as the Neural Memory Network (NoK)

Fig. 4 Flowchart showing the major steps of the BINGO operations. The learning and ageing aspects of BINGO are directly embedded in the operations

Algorithm 1 STORE
 1: procedure STORE(MEM, D, C, HP)
 2:   while size(D) > remaining_space(MEM) do
 3:     RETENTION(MEM, HP)
 4:   if neuron_Count(MEM) == 0 then
 5:     Initial_Insertion(MEM, D, C, HP)
 6:   else
 7:     [Loc_Cn, Flag_Cn] = find_Entry_Point(MEM, C, HP)
 8:     [access_Path, Flag_C1] = SEARCH(MEM, C, HP, Loc_Cn, HP→π1)
 9:     LEARN_STORE(MEM, C, HP, D, Flag_Cn, access_Path, Flag_C1)

Algorithm 2 SEARCH (excerpt)
 1: procedure SEARCH(MEM, C, HP, ep, limit)
 2:   neuronQueue = [(ep, −1)], bestCandidate_Sim = −1, bestCandidate = φ, visited = []
 3:   traversalMotion = []

The RETENTION procedure (Sect. 5.3) is invoked (lines 2 and 3) to compress less accessed data neurons until the incoming data fit. If the NoK is empty, then for the first insertion (lines 4 and 5) we generate the NoK graph with the seed organization shown in Fig. 5. A new cue neuron is generated for each cue in C, and they are connected depending on their level. A data neuron for D is generated and appended at the end. If the NoK is not empty, the system must search for a suitable location to insert the incoming data. The first step is to select a level-n cue neuron in the NoK graph as the starting location for the search (line 7). Loc_Cn is the selected level-n cue neuron from which the search begins, and Flag_Cn indicates whether or not the similarity between C→level-n cue and Loc_Cn exceeds κn (a hyperparameter). Next, the search of the NoK is carried out, starting from Loc_Cn, to find a suitable location for inserting D (line 8). Here HP→π1 is the hyperparameter that limits the search effort. The suitable location for inserting the data neuron is represented as a level-1 cue neuron around which the data are to be inserted. access_Path is the list of neurons on the path from Loc_Cn to the selected level-1 cue neuron (access_Path[−1], i.e. the last element of the list access_Path). Flag_C1 indicates whether or not the similarity between C→level-1 cue and the cue neuron access_Path[−1] exceeds κ1 (a hyperparameter). Finally, the data are inserted (or merged with an existing data neuron) around access_Path[−1].
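The STORE control flow above can be illustrated with a minimal Python sketch. The ToyNoK class and every helper below are drastically simplified stand-ins (a flat dict of level-1 cues, eviction instead of compression), not the authors' implementation; only the control flow mirrors Algorithm 1.

```python
# Minimal, illustrative sketch of the STORE control flow (Algorithm 1).
# All data structures and helpers are toy stand-ins for the paper's
# procedures; "delta1" is an assumed name for the strengthening step.

class ToyNoK:
    """A drastically simplified NoK: a dict of level-1 cues -> data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.level1 = {}          # cue -> (data, strength)

    def remaining_space(self):
        return self.capacity - len(self.level1)

    def neuron_count(self):
        return len(self.level1)

def retention(mem, hp):
    # Stand-in for RETENTION: here we simply evict the weakest neuron
    # instead of compressing it.
    if mem.level1:
        weakest = min(mem.level1, key=lambda c: mem.level1[c][1])
        del mem.level1[weakest]

def store(mem, data, cue, hp):
    # Free space by ageing less-accessed data neurons (lines 2-3).
    while mem.remaining_space() < 1:
        retention(mem, hp)
    if cue in mem.level1:
        # A matching level-1 cue exists: merge by strengthening
        # the existing data neuron (scenario 1 of LEARN_STORE).
        old_data, strength = mem.level1[cue]
        mem.level1[cue] = (old_data, strength + hp["delta1"])
    else:
        # No match: instantiate a new cue/data neuron pair.
        mem.level1[cue] = (data, 100)

mem = ToyNoK(capacity=2)
hp = {"delta1": 10}
store(mem, "img560", "deer", hp)
store(mem, "img615", "deer", hp)    # merges with the existing neuron
store(mem, "img11488", "auto", hp)
```

Note how a repeated cue strengthens an existing data neuron rather than consuming new space; this is the merging behaviour the text describes.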

Fig. 5 NoK structure after the first data store. For this example, the BINGO system is operating with two levels of cue neurons (n = 2)

For the LEARN_STORE procedure (Algorithm 3), access_Path and Flag_C1 from the SEARCH procedure (Algorithm 2) and Flag_Cn from the find_Entry_Point procedure are part of the inputs. Inside LEARN_STORE, the NoK is modified based on the insertion-location search results. Four different scenarios may arise based on the values of Flag_Cn and Flag_C1:

1. If Flag_Cn = 1 and Flag_C1 = 1, then it can be inferred that the graph-search entry-point level-n neuron has a good match with C→level-n cue, and the level-1 cue neuron selected at the end of the SEARCH procedure (access_Path[−1]) also has a good match with C→level-1 cue. This indicates that the data D, or very similar data, already exist in the NoK and are connected to access_Path[−1]. In this scenario, we simply strengthen the data neuron connected to access_Path[−1] (based on HP→δ1 and HP→δ3) and increase the accessibility of the level-1 cue neuron access_Path[−1] in the BINGO NoK using the procedure INC_ACCESSIBILITY (Algorithm 4).
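The four-way branching on (Flag_Cn, Flag_C1) can be summarized as a small dispatch table. The action strings below paraphrase the scenarios described in the text; the (0, 0) entry is assumed by analogy with the others and is not quoted from the paper.

```python
# Illustrative dispatch over the four (Flag_Cn, Flag_C1) outcomes of
# the insertion-location search in LEARN_STORE. Action summaries are
# paraphrases; the (0, 0) case is an assumption, not quoted text.

def learn_store_action(flag_cn, flag_c1):
    actions = {
        # Both cues matched: D (or similar data) already stored; merge.
        (1, 1): "strengthen existing data neuron, raise cue accessibility",
        # Only level-1 matched: merge, plus add the missing level-n cue.
        (0, 1): "strengthen existing data neuron, add new level-n cue",
        # Only level-n matched: add a new level-1 cue and data neuron.
        (1, 0): "create new level-1 cue neuron and new data neuron",
        # Neither matched (assumed case): instantiate cues and data afresh.
        (0, 0): "create new cue neurons and a new data neuron",
    }
    return actions[(flag_cn, flag_c1)]
```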

Fig. 6 a Visualization of the BINGO memory structure during a sequence of operations. We observe how neurons in the NoK are accessed and modified during each operation. b Visualization of a traditional CAM for an identical sequence of operations. We observe no dynamism in this case

Fig. 7 a Quality factor growth for BINGO and CAM under a memory size constraint when deer images are prioritized for CIFAR-10. BINGO appears to perform better across the different memory size limits. b Quality factor growth for BINGO and CAM under a memory size constraint when automobile images are prioritized for CIFAR-10. BINGO appears to be superior in this scenario as well

1. DN:1 is the first data neuron to be inserted into the NoK, and it holds the image '560'. The strength of the data neuron is given in brackets, which is 100 at this point. Two cue neurons (CN:1 and CN:2) are also instantiated and connected based on the operation algorithms.
2. By the end of the 3rd operation, two more data neurons have been inserted. Some of the associations are strengthened due to access, and some are weakened due to the lack of it. DN:1 and DN:2 suffered minor memory-strength loss due to the retention-induced ageing process. DN:2 stores the image '11488' and DN:3 stores the image '615'.
3. By the end of the 8th operation, one extra data neuron (DN:4) has been inserted. The NoK takes a more complex shape due to learning-guided reorganization. Some of the associations have high strength (e.g. CN:1 to CN:2), while others have very low strength (e.g. CN:4 to CN:5). DN:4 stores the image '6885'.
4. By the end of the 10th operation, the NoK contains six data neurons organized in a complex NoK graph. Some associations, such as CN:1 to CN:3 and CN:4 to CN:5, have very low strength due to lack of access. The data neurons have also been inserted and organized based on the type of data they hold. For example, DN:3, DN:5, and DN:6 are all images with a forest background, and they form a cluster in the graph. DN:5 stores the image '6699' and DN:6 stores the image '8981'.
5. By operation 13, new associations have formed and the NoK has been reorganized. We observe that DN:3, DN:5, and DN:6 have low memory strengths in comparison with the other data neurons, due to the memory access pattern.
6. When all 24 operations finish, we see that DN:3, DN:5, and DN:6 have very low memory strengths, while DN:1, DN:2, and DN:4 are still retained at high memory strengths. Some cue neurons and their associated data are almost inaccessible (CN:8, CN:9, DN:5, and DN:6), whereas some data neurons remain highly accessible due to strong association strengths.

Fig. 10 Variation in data neuron memory strengths as operations are performed. Less accessed data neurons lose memory strength rapidly and get compressed. Data neurons that are accessed more have higher memory strengths and are kept at higher quality. These results are linked with Fig. 9
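The strength dynamics in Fig. 10 can be illustrated with the linear model the paper's proof sketch assumes: each access adds δ1 and each retention pass subtracts δ2. The function name and the concrete numbers below are illustrative only.

```python
# Toy simulation of memory-strength dynamics: each access adds delta1,
# each retention operation subtracts delta2 (the linear decay assumed
# in the proof sketch). Initial strength and step sizes are arbitrary.

def simulate(accesses, retentions, s0=100.0, delta1=5.0, delta2=2.0):
    return s0 + accesses * delta1 - retentions * delta2

hot  = simulate(accesses=8, retentions=10)   # frequently accessed neuron
cold = simulate(accesses=1, retentions=10)   # rarely accessed neuron
# The frequently accessed neuron retains a higher strength, so the
# rarely accessed one becomes the first candidate for compression.
```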

Table Comparison between BINGO and other popular memory frameworks

With ΔA defined, we compute the updated parameters as follows: A′ = A + ΔA. Here ε1, γ3, and γ1 are hyperparameters defined earlier. Next, we compute ΔM based on Ψp, where D = {d1, d2, d3, …, dl} are the indices of the data neurons associated with the cue neuron Np.

This is a form of data neuron merging because no new data neurons are inserted; only the strength and accessibility of an existing data neuron are enhanced.

2. If Flag_Cn = 0 and Flag_C1 = 1, then it can be inferred that the graph-search entry-point level-n neuron does not have a good match with C→level-n cue, but the level-1 cue neuron selected at the end of the SEARCH procedure (access_Path[−1]) has a good match with C→level-1 cue. This indicates that the data D, or very similar data, already exist in the NoK and are connected to access_Path[−1], but no cue neuron for C→level-n cue exists in the NoK. In this scenario, we strengthen the data neuron connected to access_Path[−1] (based on HP→δ1 and HP→δ3), make a new level-n cue neuron (newCN) for C→level-n cue, connect newCN with access_Path[−1], and increase the accessibility of the level-1 cue neuron access_Path[−1] in the NoK using the procedure INC_ACCESSIBILITY (Algorithm 4). This is a form of data neuron merging as well.

3. If Flag_Cn = 1 and Flag_C1 = 0, then it can be inferred that the graph-search entry-point level-n neuron has a good match with C→level-n cue, but the level-1 cue neuron selected at the end of the SEARCH procedure (access_Path[−1]) does not have a good match with C→level-1 cue. In this scenario, we make a new level-1 cue neuron for C→level-1 cue, make a new data neuron for D, connect the newly created level-1 cue neuron with access_Path[−1], and connect the new data neuron to it.

However, the feature/cue extraction can potentially be done inside the BINGO framework itself. HP is the set of hyperparameters for MEM. If the number of neurons in MEM is 0, then NULL is returned, indicating a failed retrieval. Otherwise, we (1) find an entry point in the NoK (a level-n cue neuron Loc_Cn best situated as a starting point for searching for the desired data), (2) search the NoK starting from Loc_Cn using the SEARCH procedure, and (3) based on the search results, retrieve the desired data if they exist and modify the NoK in light of this access using the function LEARN_RETRIEVE. Flag_Cn indicates whether or not the similarity between C→level-n cue and Loc_Cn exceeds κn (a hyperparameter). HP→π2 is the hyperparameter that limits the search effort. Flag_C1 indicates whether or not the similarity between C→level-1 cue and the cue neuron access_Path[−1] exceeds κ1 (a hyperparameter). If Flag_C1 = 1, then a matching level-1 cue neuron (attached to the desired data) is found in the NoK. access_Path is the list of neurons on the path from Loc_Cn to this matching level-1 cue neuron (access_Path[−1], i.e. the last element of the list access_Path). The retrieved data D (if they exist) are returned at the end of the procedure (line 9).

Algorithm 5 RETRIEVE (excerpt)
 6: [Loc_Cn, Flag_Cn] = find_Entry_Point(MEM, C, HP)
 7: [access_Path, Flag_C1] = SEARCH(MEM, C, HP, Loc_Cn, HP→π2)

Algorithm RETENTION (excerpt)
 1: procedure RETENTION(MEM, HP)

For the LEARN_RETRIEVE procedure (Algorithm 6), access_Path and Flag_C1 from the SEARCH procedure (Algorithm 2) are part of the inputs. Inside the sub-operation LEARN_RETRIEVE, the NoK is modified based on the search results. Two different scenarios may arise based on the value of Flag_C1:

1. If Flag_C1 = 1, then it can be inferred that at the end of the SEARCH procedure a good match between C→level-1 cue and access_Path[−1] was found. This indicates that the desired data are associated with the level-1 cue neuron access_Path[−1]. Hence, we (1) enhance the memory strength of this desired data neuron DN (based on HP→δ1 and HP→δ3), (2) increase the accessibility of the level-1 cue neuron access_Path[−1], and (3) return DN.
2. If Flag_C1 = 0, then we return NULL because no matching level-1 cue neuron was found.

1. If during insertion it was the case (in Algorithm 3) that Flag_Cn == 1 and Flag_C1 == 0, then a new level-1 cue neuron (CNnew) and the data neuron di would first have been instantiated. CNnew is then connected with the level-1 cue neuron access_Path[−1], and di is connected to CNnew. If the level-1 cue neuron access_Path[−1] is x hops from the level-2 cue neuron access_path[0], then hops(di, access_path[0]) = x + 2; that is, hops(di, access_path[0]) − 1 = x + 1. We know x ≤ π1 − 1 because π1 is the search-effort limit in Algorithm 2. Hence hops(di, access_path[0]) ≤ π1 + 1. This will continue to hold true because no data neurons and associations are deleted according to the premises (υ > 0, ε2 > 0, and the previously proposed algorithms are used for the memory operations). Hence, for any di ∈ D inserted in this way, there exists a c2j ∈ C2 such that hops(di, c2j) ≤ (π1 + 1). In the other insertion case, di is connected to the newly created level-1 cue neuron CN1new; hence the same bound follows.

1. Assume that there are d data neurons D = {DN1, DN2, …, DNd} and c cue neurons C = {CN1, CN2, …, CNc} in the NoK.
2. Assume the memory strengths of DNi and DNk to be Si and Sk, respectively, such that Si = Sk.
3. Assume that a total of n retrieval operations and m retention operations are performed next, in any order. Out of the n retrieval operations, x retrieve DNi and (n − x) retrieve DNk. Also, x > (n − x).
4. Assume that a linear decay function is used, such that during a retention operation the memory strength of each data neuron decreases by δ2. Also, Si > mδ2 and Sk > mδ2.
5. Assume that S′i and S′k are the memory strengths of DNi and DNk, respectively, after all n retrieval operations and m retention operations have finished.
6. Each access to a data neuron increases its memory strength by δ1.

Proof. Assume that the premise is true. Hence, after x accesses and m retention operations, S′i = Si + xδ1 − mδ2, and after (n − x) accesses and m retention operations, S′k = Sk + (n − x)δ1 − mδ2. Since Si = Sk and x > (n − x), it follows that S′i > S′k.
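The inequality between the final strengths can be checked numerically. The concrete values below are arbitrary and chosen only to satisfy the stated premises (x > n − x, equal initial strengths, strengths exceeding mδ2).

```python
# Numeric check of the retrieval/retention argument: with equal initial
# strengths, x accesses to DN_i and (n - x) to DN_k where x > n - x,
# plus m linear decays of delta2 each, DN_i ends strictly stronger.
# All values are illustrative; they merely satisfy the premises.

delta1, delta2 = 4.0, 1.5
S_i = S_k = 200.0
n, m, x = 10, 20, 7
assert x > n - x and S_i > m * delta2 and S_k > m * delta2  # premises

S_i_final = S_i + x * delta1 - m * delta2
S_k_final = S_k + (n - x) * delta1 - m * delta2
assert S_i_final > S_k_final   # DN_i is retained at higher strength
```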

Table 3 Results highlighting BINGO's effectiveness when storing images from the wildlife surveillance dataset and retrieving images of wolf/fox

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.