Approximating the Manifold Structure of Attributed Incentive Salience from Large-scale Behavioural Data

Incentive salience attribution can be understood as a psychobiological mechanism ascribing relevance to potentially rewarding objects and actions. Despite being an important component of the motivational process guiding our everyday behaviour, its study in naturalistic contexts is not straightforward. Here we propose a methodology based on artificial neural networks (ANNs) for approximating latent states produced by this process in situations where large volumes of behavioural data are available but no experimental control is possible. Leveraging knowledge derived from theoretical and computational accounts of incentive salience attribution, we designed an ANN for estimating the duration and intensity of future interactions between individuals and a series of video games in a large-scale (N > 3 × 10⁶) longitudinal dataset. We found video games to be an ideal context for developing such a methodology due to their reliance on reward mechanics and their ability to provide ecologically robust behavioural measures at scale. When compared with competing approaches, our methodology produces representations that are better suited for predicting the intensity of future behaviour and for approximating some functional properties of attributed incentive salience. We discuss our findings with reference to the adopted theoretical and computational frameworks and suggest how our methodology could be an initial step towards estimating attributed incentive salience in large-scale behavioural studies.


Introduction
Individuals' bodies and minds are subject to continuous changes driven by physiological, cognitive and affective processes. These changes are the constituent parts of so-called "internal states", which can be thought of as dynamical latent constructs able to modulate observable behaviour (Eyjolfsdottir et al., 2016; Song et al., 2017; Merel et al., 2019; Calhoun et al., 2019). Among these latent states, those related to reward and motivation processes are of pivotal importance (Berridge and Kringelbach, 2008). However, despite their relevance, the study of these entities in naturalistic contexts is not straightforward: their inference is often the solution to an "inverse problem" (Bishop, 2006), where observable and easy-to-acquire measures (e.g. patterns of behaviour) are used to estimate the internal factors that generated them (e.g. latent states related to motivation and reward processing) (Song et al., 2017; Wang et al., 2018b). This idea is not new (Spearman, 1961), but it has regained traction in recent years because of the increased availability of large volumes of data collected both inside and outside controlled experimental settings. Because they enable the study of phenomena in naturalistic settings, data collected with ecologically valid approaches are particularly interesting but come with their own set of challenges (Hashem et al., 2015): the lack of strict control over the data-gathering process (which can degrade the signal-to-noise ratio), the complexity of the data or construct under scrutiny, and the need for algorithms able to scale to very large datasets. Recent approaches based on latent variable models (e.g. Hidden Markov Models (HMMs)) have shown promise in taming some of these problems (Calhoun et al., 2019) but might struggle to overcome the issues related to complexity (Eyjolfsdottir et al., 2016; Schuster-Böckler and Bateman, 2007) and scalability (e.g. challenges in fitting large state spaces) (Touloupou et al., 2020).
These models rely on the assumption that the latent states, despite being embedded in a high-dimensional space (e.g. patterns of behaviour or brain activity), can be effectively described using far fewer degrees of freedom (Seung and Lee, 2000; Pang et al., 2016; Luxem et al., 2020). Motivation, for instance, could be reduced at any given time to a 2D plane representing the intensity and the target of the motivated behaviour (Simpson and Balsam, 2016). In this regard, a promising line of research is the approximation of latent states through the representations generated by Artificial Neural Networks (ANNs) (Eyjolfsdottir et al., 2016; Song et al., 2017; Merel et al., 2019; Luxem et al., 2020; Pereira et al., 2020; McCullough and Goodhill, 2021; Shi et al., 2021). ANNs are designed for applications with large amounts of data (Oh and Jung, 2004), provide noise resiliency and are able to capture complex interactions in the data (Bengio et al., 2017). These desirable properties, however, come at the cost of interpretability, and ANNs are often declared inaccessible black boxes only capable of efficient input-output mapping. In line with a growing tendency in the literature (Barak, 2017; Kietzmann et al., 2018; Luxem et al., 2020; Pereira et al., 2020; McCullough and Goodhill, 2021; Shi et al., 2021), we argue that this is only partially true and that, given full access to the computations performed by an ANN, a certain degree of interpretability can be achieved. Through the use of prior theoretical knowledge, it is possible to constrain the input, the objective and the architecture of an ANN in order to generate so-called "latent representations" (which can be thought of as unobserved variables able to explain observable phenomena). Through reverse engineering, it is possible to extract the manifold structure embedded in these high-dimensional representations and test it against theory-driven hypotheses (Barak, 2017; Kietzmann et al., 2018). In line with this principle, we propose theoretical
and methodological foundations for approximating the manifold structure of motivation- and reward-related latent states using behavioural data and ANNs. Although our work aims to generalize to various areas of application, our experimental efforts exclusively make use of telemetry data coming from video games. Since video games rely heavily on reward and motivational processes for producing playing behaviour (Chumbley and Griffiths, 2006; Wang and Sun, 2011; Phillips et al., 2013; Ašeriškis and Damaševičius, 2017; Agarwal et al., 2017; Steyvers and Benjamin, 2019) and allow researchers to record large volumes of behavioural data in a naturalistic fashion (Drachen, 2015), we considered them a suitable initial test bed for our work. We trained an ANN model (designed according to a priori theoretical knowledge) to estimate the duration and intensity of future interactions between individuals and a diverse range of video games. This model was then used to generate latent representations that we compared, at the functional level, with the construct of attributed incentive salience, a particular type of latent state guiding motivated behaviour. We found attributed incentive salience to be a suitable framework for designing our model and interpreting the generated representations due to its strong connections with established psychobiological theories of reward, learning and motivation. Through a series of three experiments we show that, on the predictive task, our approach outperforms competing ones while also producing representations that mimic the functionalities of attributed incentive salience.
The paper starts by illustrating theoretical and computational accounts of incentive salience attribution and shows how they inspired the implementation of our approach. We then describe how our model can be applied to behavioural data coming from video games, highlighting the value of this type of large-scale longitudinal data for the study of motivation-related latent states. A series of theory-driven hypotheses are then presented with the aim of evaluating the presence of specific desirable properties in the latent representations generated by our approach. These are investigated through a set of three different analyses, the results of which are then discussed in light of the adopted theoretical framework and the potential applications of our approach. Finally, we outline the limitations of our work and the steps that could be adopted to mitigate them.

Theoretical Framework
In this section we discuss the concept of motivation and how it can be used to describe the interactions that individuals have with particular objects or activities. We first provide a short historical review of theories of reward-driven motivation, focusing in the last subsection on how these led to the incentive salience hypothesis formulated by Berridge and Robinson (Berridge and Robinson, 1998). While acknowledging the contribution of other theories to the definition of reward-driven motivation, the present work will mostly focus on the area of behavioural neuroscience and make use of the framework provided by Berridge and Robinson (Berridge and Robinson, 1998). The reason behind this choice is that we found incentive salience to be more specific to motivation and, most importantly, to have a more robust and clear connection to behaviour. The section closes with an overview of the idea that latent states, like those generated by incentive salience attribution, might be represented as a manifold embedded in patterns of brain activity and behaviour.

Motivation
Motivation is fundamental to everyday life: it is a process directing and helping individuals to achieve goals efficiently in environments that present multiple courses of action at any moment (Ikemoto and Panksepp, 1996; McClure et al., 2003). A formal definition of motivation should encompass both the behavioural and the underlying psychobiological level. We will focus first on the behavioural aspects of the process and outline in the following section a theoretical account of motivation which also covers the psychobiological level. Motivation describes why individuals react in particular ways when encountering stimuli regarded as highly relevant and why they approach those stimuli at particular times (Berridge, 2004). These types of stimuli are said to possess "rewarding properties", which can be defined as the positive value ascribed to an object, a behavioural act or an internal physical state as the result of an active process of the mind and the brain (Schultz et al., 1997; Berridge and Kringelbach, 2008). Importantly, motivation is not purely driven by the fulfilment of fundamental needs like nutrition or reproduction (so-called "primary reward objects" (Schultz et al., 2000)) or the avoidance of negative consequences like physical pain. It also extends to those volitional objects and activities which do not appear to be necessary for the survival of the individual (i.e. "secondary reward objects" (Berridge and Kringelbach, 2008; Sescousse et al., 2013)). For those activities the expectation of the amount of reward received is learned over time and may vary significantly between individuals (Berridge and Kringelbach, 2008; Simpson and Balsam, 2016). In spite of this, it would be inefficient to have dedicated and specialized motivational systems for every combination of individuals and objects (e.g.
an individual's motivation for playing sport or eating food). Instead, we can think of motivation as a single overarching entity that controls the interaction between individuals and objects in an agnostic manner (Simpson and Balsam, 2016). An analogy may be drawn with the geometric concept of a vector. Looking at Figure 1, we can imagine the focus on a specific object being represented by the angle of the vector, while its length is the intensity of the motivational process (or the amount of motivated behaviour) (Simpson and Balsam, 2016). This can be thought of as a dynamic quantity defined by the state of the individual and the rewarding properties of the object (Toates, 1994; Berridge, 2004; Zhang et al., 2009). If motivation acts as a single overarching process, we expect it to predict and explain goal-directed behaviours seamlessly across a heterogeneous range of situations and individuals. Motivational theories based on the concepts of reward and incentive are promising candidates for this because, relying on consistent and plausible psychobiological bases, they tend to operate in abstraction from the nature of the individuals and the objects (Ikemoto and Panksepp, 1999; Berridge and Robinson, 1998; Salamone and Correa, 2002; Berridge, 2004; Armony and Vuilleumier, 2013; Corbit and Balleine, 2015).
One of the early formulations of reward-based motivation, proposed by Bolles (Bolles, 1972), suggested that individuals are motivated by the "expectation of incentive outcomes". In short, individuals learn (and eventually anticipate) associations between their actions and the potential pleasurable outcomes associated with them (Bolles, 1972; Berridge, 2004). Expanding on this idea, Bindra suggested that the learning process does not just generate pleasure expectations in response to specific behaviours but also allows individuals to perceive the behaviours themselves as a source of hedonic reward (Bindra, 1978; Berridge, 2004). Further work by Toates (Toates, 1994) asserted that the magnitude of the perceived incentives introduced by both Bolles and Bindra is modulated by the internal states of the individual (Toates, 1994; Berridge, 2004). In other words, the incentive expectations (and consequently the associated motivated behaviours) learned by an individual can change over time depending on the individual's internal state.

Incentive Salience Hypothesis of Motivation
The approaches proposed by Bolles, Bindra and Toates provide an account of reward-based motivation, but they assume that there is no distinction between the affective dimension of an incentive (i.e. how pleasurable it is) and the purely motivational aspect of it (i.e. how much goal-directed behaviour it can produce) (Bindra, 1978; Toates, 1994). Expanding on this, Berridge and Robinson proposed that the motivational process controlling the interaction between individuals and objects might not be a unitary mechanism but rather a composite process having specific and dissociable components which rely on specialized neurobiological mechanisms, namely: liking, wanting and learning (Berridge and Robinson, 1998; Berridge et al., 2009; Smith et al., 2011).
Liking The liking component describes the pleasure expected by an individual when interacting with an object (Berridge et al., 2009). It is responsible for the hedonic quality of an experience and acts as a signal indicating that interacting again with that object might be beneficial. Although liking plays an important role in the incentive salience hypothesis of motivation, it is difficult to measure outside controlled laboratory environments (Berridge and Robinson, 1998) and it will not form a central theme of this research. Instead, we will focus on the "wanting" and "learning" components.
Wanting The wanting component, or "incentive salience", has the function of generating and holding latent representations of objects and behavioural acts and of attributing value to them through learning mechanisms. These "valued representations" can then be used by action selection systems in order to make certain behaviours more likely (Ikemoto and Panksepp, 1996; Berridge and Robinson, 1998; McClure et al., 2003; Berridge, 2004). As a consequence, when an object is attributed with incentive salience it will more likely draw the subject's attention and become the focus of goal-directed behaviours (Berridge, 2004). Interestingly, wanting seems to be more than a simple form of value caching; rather, it is a dynamic process in constant change (Robinson and Berridge, 1993; Zhang et al., 2009; Tindell et al., 2009; Berridge, 2012). This is because the saliency of an object depends not only on its attributed value but also on the state of the individual interacting with it. A change in the individual's internal state can dampen, magnify or even revert the amount of attributed salience (Robinson and Berridge, 1993; Zhang et al., 2009; Tindell et al., 2009; Berridge, 2012). It is important to note that wanting is not the hedonic expectation associated with an object (which is designated by liking) but rather the process promoting the approach towards an object and the interaction with it (Berridge et al., 2009; Robinson et al., 2015). Despite the fact that liking and wanting are often correlated (i.e. I want what I like and vice versa), they can occasionally be triggered separately: addictive behaviours, for instance, are a notable example of wanting without liking (Robinson and Berridge, 1993). The functional dissociation between these two components is linked to differences in the underlying neurobiological substrate (Berridge et al., 2009; Smith et al., 2011). Neurotransmitters and brain areas responsible for wanting appear to be more numerous, diverse and easily activated than those for
liking (Berridge et al., 2009; Robinson et al., 2015). As a consequence, increased incentive salience can be obtained by raising dopamine levels in many portions of the striatum without the need for synchronized activity in other areas (Berridge et al., 2009; Smith et al., 2011; Meyer et al., 2015). This implies that the wanting component tends to produce more robust behavioural indicators, in the form of an increased amount and frequency of interactions between an individual and an object (Berridge and Robinson, 1998), which makes it a promising candidate for behavioural studies in conditions where strict experimental control is not possible.
Learning The last component in the formulation proposed by Berridge and Robinson (Berridge and Robinson, 1998; Berridge, 2004) consists of mechanisms that provide an individual with the capability to predict, based on past experiences, the occurrence of future pleasurable outcomes (i.e. liking reactions) when interacting with specific objects. These are similar to the learning processes postulated by Bindra (Bindra, 1978) and have a twofold function: they allow the attribution and adjustment of incentive salience attached to previously liked objects (e.g. primary reward objects), but they also enable subjects to learn the hedonic value of initially neutral stimuli (e.g. secondary reward objects). The learning mechanism is based on classical conditioning: through repeated interactions with an object, an individual will learn its hedonic properties and consequently attribute incentive salience to it (Berridge, 2004; Berridge et al., 2009). This process is driven by mechanisms similar to those of reward-prediction error: learning is driven by spikes in dopaminergic activity generated by a mismatch between expected and experienced rewards (Schultz et al., 1997; Schultz, 2000; Flagel et al., 2011).
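For readers unfamiliar with mismatch-driven value learning, the idea can be sketched with a minimal Rescorla-Wagner-style update rule. This is an illustrative simplification of reward-prediction-error learning, not the model developed in this paper; the function name and constants are our own:

```python
# Minimal reward-prediction-error learning sketch (Rescorla-Wagner style).
# V is the salience/value attributed to an object; alpha is a learning rate.
def rpe_update(V: float, reward: float, alpha: float = 0.1) -> float:
    delta = reward - V          # mismatch between experienced and expected reward
    return V + alpha * delta    # value moves toward the experienced reward

V = 0.0
for _ in range(50):             # repeated interactions with a rewarding object
    V = rpe_update(V, reward=1.0)
print(V)                        # V approaches the true reward value of 1.0
```

When the experienced reward matches the expectation, delta is zero and no learning occurs, mirroring the dopaminergic account above.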

Manifold Representation of Incentive Salience
As anticipated in the introduction, we can think of the level of attributed incentive salience (i.e. the amount of wanting attached to a particular object) as a latent state influencing motivated behaviour. The representation of these latent states is usually carried out by the activity of multiple brain regions responsible for generating patterns of observable behaviour. As mentioned before, these multidimensional patterns are believed to reside on a manifold (Seung and Lee, 2000; Pang et al., 2016): a connected low-dimensional region embedded within a high-dimensional space (Bengio et al., 2017). An intuitive example of this is how the brain generates and stores mental maps of the environment, which are then used for navigation tasks (Derdikman and Moser, 2011; Nieh et al., 2021). The dimensionality of the encoding signal is much larger than the intrinsic dimensionality of the spatial information encoded within it: the activity of large neuronal populations is involved in generating a mapping that needs to be only three-dimensional. When applied to incentive salience, an intuitive representation sees the manifold as a two-dimensional space, similar to that presented in Figure 1, generated by the activity of all those brain areas involved in the attribution of value and the subsequent modulation of future motivated behaviour. This two-dimensional space would represent, at any given time, the motivational saliency that an organism attributes to a potentially rewarding object and therefore also the intensity of the related behaviour (Berridge and Robinson, 1998; Simpson and Balsam, 2016). The idea of a neural manifold has found experimental support in different areas (e.g. motor control (Gallego et al., 2017), mnemonic processes (Derdikman and Moser, 2011; Nieh et al., 2021), reward processing (Bromberg-Martin et al., 2010), visual (Seung and Lee, 2000; Ganmor et al., 2015) and olfactory (Stopfer et al., 2003) perception) and relies on the fact that neural activity is highly redundant
and reducible to just a few correlation patterns (Gallego et al., 2017). In light of this, dimensionality reduction techniques able to represent the manifold structure of high-dimensional data have proven valuable in making abstract entities like latent states more easily accessible and interpretable, while also facilitating their mapping onto brain (Gao et al., 2021; Rué-Queralt et al., 2021) and behavioural data (Luxem et al., 2020; Pereira et al., 2020; McCullough and Goodhill, 2021; Shi et al., 2021).
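A toy example can make this redundancy concrete (our construction, not an analysis from the paper): observations generated from a two-dimensional latent variable but embedded in 50 dimensions remain almost entirely recoverable with a linear method such as PCA, because nearly all variance concentrates in the first two components:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))           # 2-D "true" latent states
mixing = rng.normal(size=(2, 50))            # embed them in 50 dimensions
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))  # redundant, noisy data

# PCA via SVD: the variance explained by each component reveals the
# low intrinsic dimensionality of the high-dimensional observations.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()
print(explained[:2].sum())                   # close to 1.0
```

The paper's approach relies on non-linear ANN representations rather than PCA, but the principle of recovering a low-dimensional manifold from redundant measurements is the same.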
We have highlighted how motivation can be described as a mechanism that guides the interaction between individuals and objects. It controls and selects behaviours which are expected to lead to pleasurable outcomes for the individual (i.e. incentives or rewards). These expectancies are the product of a learning process that can be modulated by the internal state of the individual. Therefore, from a behavioural point of view, an object O can acquire salience for an individual I conditioned on its capacity to elicit a rewarding experience r (Berridge and Robinson, 1998; McClure et al., 2003). The amount of attributed salience is a valued representation of O generated by I and controls how likely and how intense future interactions between the two will be (Berridge and Robinson, 1998; McClure et al., 2003). Let B represent the strength of an interaction between I and O, r a measure of how rewarding the interaction with O is perceived to be, and V the attributed incentive salience that is generated. Following Figure 2, at time t + 1 an individual will produce an interaction with an object of strength B according to the previous V_t. As noted in section 2.2, this process relies heavily on learning mechanisms, making V dynamic and mutable by nature. It should be noted that B can be represented as a multidimensional variable defined by the instrumental behaviours conventionally used for assessing the wanting component in animal studies (e.g. frequency, amount and duration of feeding behaviours like bites, nibbles and sniffs) (Berridge and Robinson, 1998). During and after the interaction, I will experience a variable degree of reward r_{t+1} that, weighted by their internal state, will then be used for updating V_t. It is worth noting that the individual's internal state is not the only factor involved in the modulation of r; the context in which I and O interact (Env_{t+1} in Figure 2) also seems to contribute (Palminteri et al., 2015). Following the idea presented in section 2.3, the
latent state defined by V could be represented as a manifold shaped by the activity of those regions responsible for the attribution of incentive salience. Moreover, given the strong coupling between attributed incentive salience and behaviour (Berridge and Robinson, 1998), we would also expect the structure of this manifold to be a suitable descriptor of the behavioural aspects of attributed incentive salience.
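The interaction loop of Figure 2 can be made concrete with a minimal discrete-time simulation. The functional forms, constants and the linear salience update below are illustrative assumptions of ours, not the model developed later in the paper:

```python
# Hypothetical simulation of the interaction loop sketched in Figure 2.
# B_{t+1} grows with V_t, and V_t is updated by the (state-weighted)
# reward experienced during the interaction. All forms are illustrative.
def simulate(rewards, alpha=0.2, V0=0.0):
    V, history = V0, []
    for r_weighted in rewards:            # r already weighted by internal state
        B = max(0.0, V)                   # interaction strength driven by V_t
        history.append((B, V))
        V = V + alpha * (r_weighted - V)  # salience update after interaction
    return history

# A consistently rewarding object: interaction strength ramps up over time.
trace = simulate([1.0] * 10)
print(trace[0][0], trace[-1][0])          # B starts at 0 and increases
```

With a consistently disappointing object (e.g. `simulate([0.0] * 10, V0=1.0)`), the same loop shows the attributed salience, and with it the interaction strength, decaying over repeated interactions.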

Computational Framework
In this section, we propose a way to approximate the manifold structure of attributed incentive salience in scenarios where only large volumes of behavioural data are available and no strict experimental control is possible. First, we briefly illustrate how previous work has framed the modelling of this construct as a reinforcement learning problem and solved it using Temporal Difference Learning (TD Learning) (Sutton, 1988). This will provide us with a psychobiologically plausible computational model of attributed incentive salience and constitute the starting point for our approach.
Then, we will highlight how video games are promising candidates for studying the behavioural aspects of incentive salience attribution in naturalistic settings. Finally, combining these two ideas, we will show how estimating the manifold structure of attributed incentive salience can be cast as the solution to a supervised learning problem and why Artificial Neural Networks (ANNs), thanks to their representation learning capabilities, are well suited for the task.

Temporal Difference Learning
The first attempt to model incentive salience attribution was carried out by McClure et al. using TD Learning (McClure et al., 2003). The use of TD Learning in simulation studies involving reward learning is often motivated by its good approximation of the reward-prediction error signal generated by dopaminergic neurons (Schultz et al., 1997; Flagel et al., 2011). Algorithms in the TD Learning family attempt to learn a value function V by iteratively refining an estimate of V over time (Sutton and Barto, 2018). In the most basic form, called TD(0), this is done by simply observing the reward r associated with a particular state s at time t + 1 and using it to adjust the estimate of V produced at time t (Sutton and Barto, 2018). Here t is an arbitrary unit of time: it can be specific (i.e. seconds) or generic (i.e. a point in an ordered series of events) depending on the type of application. If we let S = {s_t : t ∈ T} be a sequence of states, then the value V at s_t is given by the sum of all future discounted rewards expected when transitioning from s_t to s_{t+1}, with γ ∈ [0, 1] being a discounting factor for r:

V(s_t) = E[ Σ_{k=0} γ^k r_{t+k+1} ]   (1)

The iterative refining of V carried out by TD Learning is achieved by first computing an error signal δ at time t, which quantifies the difference between the current V and what is expected when transitioning to s_{t+1}:

δ_t = r_{t+1} + γV(s_{t+1}) − V(s_t)   (2)

Once the error signal is computed, V(s_t) is updated as:

V(s_t) ← V(s_t) + αδ_t

where α ∈ [0, 1] is a constant controlling the amount of updating, or the "learning rate". This process, called the TD update, is illustrated by the diagram presented in Figure 3. Conventionally, the transition from s_t to s_{t+1} is the result of an action selection process guided by V, because in optimal control settings the role of reinforcement learning is to select the course of action that maximizes future rewards (Schultz et al., 1997; McClure et al., 2003; Sutton and Barto, 2018). McClure et al.
proposed that incentive salience is represented by V as defined in equation 1, while the error signal expressed by equation 2 represents the activity of dopaminergic neurons, with the dual function of driving the attribution of incentive salience (through reward-prediction error coding, as specified in section 2.2) and guiding the previously mentioned action selection process (Schultz et al., 1997; McClure et al., 2003; O'Doherty et al., 2003; Toates, 1994; Berridge, 2004; Zhang et al., 2009; Tindell et al., 2009; Berridge, 2012). The mismatch between the predicted amount of reward and the actual reward received at time t + 1 generates an error signal that allows I to learn the "correct" magnitude of V(s_t) (Schultz, 2017). As an example, an individual may anticipate that eating their favourite meal will be a rewarding experience, but instead (for some reason) it turns out to be underwhelming; they therefore reduce the salience previously attributed to it. Importantly, V(s_t) does not just encompass the previous history of interactions between I and O but also the current state of I: the individual has learned from long experience that eating is a pleasurable activity, but since they are currently sated they do not expect much reward from doing it again in the near future.
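The TD(0) update loop described above can be sketched in a few lines. This is a generic textbook implementation; the state names, transition structure and parameters are arbitrary choices for illustration:

```python
# TD(0) update: refine the value estimate V(s_t) using the observed reward
# r_{t+1} and the bootstrapped estimate of the successor state s_{t+1}.
def td0_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    delta = r + gamma * V[s_next] - V[s]  # TD error (reward-prediction error)
    V[s] = V[s] + alpha * delta
    return delta

V = {"hungry": 0.0, "eating": 0.0}
# Repeatedly experience reward while transitioning hungry -> eating.
for _ in range(100):
    td0_update(V, "hungry", "eating", r=0.0)
    td0_update(V, "eating", "eating", r=1.0)
print(V)  # states closer to reward accumulate more value
```

Note how the value of "hungry" rises purely through bootstrapping from "eating", even though no reward is delivered in that state; this is the mechanism by which initially neutral cues acquire value.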
From TD to Supervised Learning The approaches discussed above frame the estimation of attributed incentive salience as a reinforcement learning task. This requires the simulation of a sequence of interactions between I and O and the concomitant delivery of r (Schultz et al., 1997; McClure et al., 2003; Zhang et al., 2009). However, it is not always straightforward to replicate these interactions in real-world scenarios, especially when dealing with human participants. The control over the internal state of I and the amount of r delivered that McClure and Zhang assume rests on strict assumptions and can be achieved only in controlled experimental settings (McClure et al., 2003; Zhang et al., 2009). As an alternative solution for inferring V outside the laboratory, we propose to learn its manifold structure through supervised learning.
In contrast to what is reported in the literature (Calhoun et al., 2019; McCullough and Goodhill, 2021; Luxem et al., 2020; Pereira et al., 2020; Shi et al., 2021), we argue that in this case the use of supervised rather than unsupervised techniques is preferable. Indeed, since we are dealing exclusively with behavioural data and trying to solve an inverse problem, we would like to learn a manifold structure which is not just a generic indicator of behavioural phenotype (Luxem et al., 2020) but also obeys specific functional constraints.
In this approach, an experimenter gathers data on a set of interactions between I and O and lets a learning algorithm estimate two functions:

V(s_t) = f_1(r̄_t, V(s_{t−1}); θ_1)
r̄_{t+1} = f_2(V(s_t); θ_2)   (5)

Here f_1 and f_2 are arbitrarily complex functions, while θ_1 and θ_2 are parameters that the learning algorithm has to infer. The future reward that an individual expects after an interaction with an object is produced by the current level of attributed salience, which is itself a function of the current internal state of the individual (expressed through the amount of reward just experienced) and the incentive salience previously attributed to the object. It is important to note that the two functions above need to be recursive over all s ∈ S (see equations 1 and 4) in order to provide V(s_t) with the dual purpose of caching all the past V and being a suitable predictor of all the r. This formulation, however, still requires a measure of the r experienced by I (or, more precisely, its weighted version r̄) after interacting with O, which is not easily accessible. However, Thorndike's law of effect (Thorndike, 1927) and Skinner's operant conditioning principles (Skinner, 1965) suggest that r, which like V is a non-observable latent variable, manifests itself through the intensity of interactions between I and O (i.e. B in Figure 2 and section 2.1): the frequency and amount of object-directed behaviours increase or decrease as a function of the rewards an individual expects to receive (Berridge, 2004; Schultz, 2017). Since V(s_t) predicts how much r an individual I expects to receive from interacting with O, we should also expect the strength of their future interactions to be a function of V(s_t). This can be represented, rearranging the equations in 5, in a more compact form as a chain of functions:

B_{t+1} = f_2(f_1(B_t, V(s_{t−1}); θ_1); θ_2)

To approximate the above expression, a learning algorithm would require records of behaviours generated by individuals while interacting with a diverse set of potentially rewarding objects. Here, we argue that video games are one way to obtain this
type of data at scale while also achieving some level of ecological validity.
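Structurally, the chain of functions f_2(f_1(·)) can be sketched as a minimal recurrent computation. The NumPy code below uses random, untrained weights; the architecture, dimensions and non-linearity are illustrative assumptions on our part, not the model used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
d_b, d_v = 4, 8                                  # dims of behaviour B and latent state V
W_in = rng.normal(scale=0.1, size=(d_b, d_v))    # part of f1: behaviour -> latent
W_rec = rng.normal(scale=0.1, size=(d_v, d_v))   # recursive part of f1: V_{t-1} -> V_t
W_out = rng.normal(scale=0.1, size=(d_v, d_b))   # f2: latent -> predicted behaviour

def step(B_t, V_prev):
    """One recursive step: V_t = f1(B_t, V_{t-1}); predicted B_{t+1} = f2(V_t)."""
    V_t = np.tanh(B_t @ W_in + V_prev @ W_rec)
    B_next_hat = V_t @ W_out
    return V_t, B_next_hat

V = np.zeros(d_v)
sessions = rng.normal(size=(10, d_b))            # ten observed behaviour vectors
for B_t in sessions:
    V, B_hat = step(B_t, V)
print(V.shape, B_hat.shape)                      # latent state and next-behaviour estimate
```

Because V_t is carried across steps, it caches the history of interactions while serving as the predictor of future behaviour, which is exactly the dual role required of V(s_t) above; in practice θ_1 and θ_2 would be fitted by supervised training on observed future behaviour.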

Video Games and Telemetry
Interacting with video games is a volitional activity driven largely by the capacity of the games to provide pleasurable experiences (Boyle et al., 2012). Behaviour within the game is best understood as the result of a value attribution process similar to that of secondary reward objects (see section 2.2).
Indeed, it appears that playing behaviour is often produced and maintained by the structural characteristics of the game (e.g. game mechanics) (King et al., 2010b) which, working like conventional reinforcement mechanisms (Chumbley and Griffiths, 2006; Wang and Sun, 2011; Phillips et al., 2013; Ašeriškis and Damaševičius, 2017), produce effects similar to operant conditioning (Skinner, 1965). Although caution should be applied when complex activities are investigated using neuroimaging techniques, evidence suggests that the maintenance of video game playing behaviour engages the same cortico-striatal structures (Hoeft et al., 2008; Mathiak et al., 2011; Cole et al., 2012; Klasen et al., 2012; Lorenz et al., 2015; Gleich et al., 2017) and neurotransmitters (Koepp et al., 1998) involved in reward processing. This also seems supported at the behavioural level, where the amount of experienced in-game reward appears to play a role in controlling how likely an individual is to keep engaging in playing behaviour (Agarwal et al., 2017; Steyvers and Benjamin, 2019). This, in conjunction with a growing literature highlighting similarities between certain video game mechanics and activities driven by secondary rewards (e.g. gambling) (King et al., 2010a; Drummond and Sauer, 2018; Zendle and Cairns, 2018), corroborates the idea that video games are able to elicit behavioural responses through incentive mechanics.
In this view, video games with different structural characteristics can be seen as objects possessing rewarding properties that heavily depend on the individuals interacting with them (e.g. an individual's preference for a specific game mechanic). Hence, similarly to the process specified in section 2, we can expect that, through repeated interactions, an individual will experience varying degrees of reward determined by their internal state and the characteristics of the game. These interactions will produce continuous adjustments in the level of salience attributed to playing that specific game, which in turn will influence the frequency and amount of future interactions with that same game. Other than offering a context for observing the process of incentive salience attribution, video games allow us to obtain large volumes of behavioural data (similar to those mentioned in section 3.1) in a naturalistic fashion. This is made possible by the widespread practice of obtaining high-frequency records (i.e. telemetry) of players' behaviour during the game (Drachen, 2015). This approach, despite offering less control and rigour than conventional experimental procedures, allows us to obtain a more faithful representation of natural behaviour (similarly to field studies) while avoiding some of the limitations connected with laboratory-based studies (e.g. sampling and observer biases).
In order to use this type of behavioural data to model attributed incentive salience, a learning algorithm should possess the following properties. First, it should be scalable and noise resilient, to leverage large volumes of naturalistic data in an efficient and effective manner. Second, it should be able to approximate arbitrarily complex functions, given that the shape of the functions specified in section 3.1 is not known a priori. And finally, it should be able to produce an approximation of V(s_t) that can be inspected in order to evaluate whether its functional properties can be compared with those of attributed incentive salience. Artificial Neural Networks (ANNs) appear to satisfy these requirements.

Artificial Neural Networks
In their conventional form, ANNs can be seen as chains of nested functions (the layers of the network). These layers are vector valued (there are multiple units or neurons in each layer) and organized as directed acyclic computational graphs (information only flows forward). When the number of layers is greater than two, the prefix "deep" is usually applied (Bengio et al., 2017). The goal of this ensemble of functions is to create a mapping between an input x and a target y. Following the example illustrated in Figure 4, given the set of parameters Θ = {θ1, θ2}, an ANN would first infer a function h = f1(x; θ1), mapping the input to a new representation h. The same representation h would then become the input of a second function ŷ = f2(h; θ2), which produces an estimate of the target (Bengio et al., 2017). In this sense, we can think of each layer as a collection of many non-linear vector-to-scalar functions taking the previous layer as input and generating the units for the layer that follows (Bengio et al., 2017). By increasing the number of layers and units, ANNs can approximate an extremely large class of functions (Rumelhart et al., 1986). An ANN finds the optimal values for Θ by taking forward and backward passes through the computational graph. In the forward pass, information flows from the input x to the estimate ŷ according to the operations specified in Figure 4. During the backward pass, the error between the estimate ŷ and the target y is first computed.
Fig. 4: Feedforward ANN with a 2-unit hidden layer. The figure represents how a feedforward ANN could be used for estimating V(s_t) given a sequence of observed behaviours (B) produced while interacting with an object (O). Here x and h are vectors indicating the model's input and the inferred representation. y indicates the target, ŷ the estimate produced by the model, and b is a bias term. The collection of all the red lines indicates the Θ that the ANN has to estimate, while each line represents a single parameter w. The circles are computational units (i.e. artificial neurons) whose outputs are given by Act(∑_{i=1}^N w_i x_i + b), with x_i the outputs of the previous layer. Here, Act is a non-linear function conventionally called activation while N is the dimensionality of the previous layer.
Here L is a generic convex and differentiable function measuring the distance between ŷ and y. Then, the gradient of the error with respect to all the parameters is found and an update is performed taking steps of size α ∈ [0, 1] in the direction opposite to the gradient: w_ji ← w_ji − α ∂L/∂w_ji. What we illustrated here is the application of the delta rule for updating the i-th parameter of the j-th layer through gradient descent (Widrow and Hoff, 1960). Deep feedforward ANNs rely on a generalization of this rule (i.e. backpropagation (Rumelhart et al., 1986)) for efficiently computing the gradient for all the parameters in the network.
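The forward pass, loss, and delta-rule update described above can be sketched in a few lines of NumPy. This is a minimal illustration on toy data, not the architecture used in our experiments; the shapes, learning rate and ReLU activation are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: 64 samples, 3 input features, scalar target.
X = rng.normal(size=(64, 3))
y = X.sum(axis=1, keepdims=True) ** 2          # a non-linear target

# Theta = {theta1, theta2}: one hidden layer with 8 units.
W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)

alpha = 0.01                                    # step size in [0, 1]
losses = []
for _ in range(200):
    # Forward pass: h = f1(x; theta1), y_hat = f2(h; theta2).
    h = np.maximum(0.0, X @ W1 + b1)            # ReLU activation
    y_hat = h @ W2 + b2
    err = y_hat - y
    losses.append(float(np.mean(err ** 2)))     # L: convex, differentiable

    # Backward pass (backpropagation), then the delta-rule update
    # w <- w - alpha * dL/dw for every parameter.
    g_out = 2.0 * err / len(X)
    gW2, gb2 = h.T @ g_out, g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (h > 0)
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    W2 -= alpha * gW2; b2 -= alpha * gb2
    W1 -= alpha * gW1; b1 -= alpha * gb1
```

After repeated forward and backward passes the loss decreases, which is all the delta rule guarantees here: each step moves the parameters against the local gradient of L.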
Returning to the supervised learning problem specified in section 3.1, a feedforward ANN approximates V(s_t) by mapping the inputs of equation 6 to a candidate V(s_t), which is then used to generate an estimate of B_{t+1}. Then, during the backward pass, V(s_t) is adjusted based on the degree of mismatch between the estimate it produced and the real value of B_{t+1}. It is of interest to note that there is a certain degree of overlap between how ANNs adjust their weights and the TD update illustrated in section 3.1. Indeed, in single-step scenarios (i.e. predicting s_{t+1} based on s_t for each s ∈ S) the parameter changes produced by the two methods are the same (Sutton, 1988). The major difference lies in the delivery of the update: TD learning performs it at every step, while backpropagation-based algorithms must wait until the end of the sequence in order to collate all the observed errors into a single term (Sutton, 1988).
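The difference in update delivery can be illustrated with a small tabular example. The sketch below (toy rewards, arbitrary α and γ) applies the TD error at every step and, alternatively, collates the same errors and applies them at the end of the sequence; starting from identical values, the two schedules produce the same change in this single-pass case, mirroring the equivalence noted by Sutton (1988).

```python
import numpy as np

rewards = np.array([1.0, 0.5, 2.0, 0.0])   # toy per-step rewards r_t
T = len(rewards)
alpha, gamma = 0.1, 0.9                    # step size and discount

# TD learning: the update is delivered immediately at every step.
V_td = np.zeros(T + 1)
for t in range(T):
    td_error = rewards[t] + gamma * V_td[t + 1] - V_td[t]
    V_td[t] += alpha * td_error

# Sequence-level delivery: the same per-step errors are collected over
# the whole sequence and applied in one collated update at the end.
V_seq = np.zeros(T + 1)
errors = [rewards[t] + gamma * V_seq[t + 1] - V_seq[t] for t in range(T)]
for t in range(T):
    V_seq[t] += alpha * errors[t]
```

After one pass the two value vectors coincide; over repeated passes, the interleaving of updates is what differentiates the two schedules.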
Recurrent Neural Networks

Despite their universal function approximation properties (Hornik et al., 1989), feedforward ANNs are not suitable for the type of recursive operations expressed in section 3.1 (Bengio et al., 2017). As we can see from Figure 5A, given a sequence of inputs and targets, a conventional feedforward ANN would be limited to learning a temporally local function, mapping each input to an estimate in isolation. Even when Θ are shared across time, the estimated V(s_t) cannot incorporate information from past V(s) nor guarantee predictive power for the future B. A solution to this problem is offered by ANNs with feedback connections, like Recurrent Neural Networks (RNNs). These are a class of ANNs that are able to efficiently process long sequences of data while also relaxing the requirement of conventional feedforward ANNs for fixed-length inputs (Bengio et al., 2017). Looking at Figure 5B, we see that for each t ∈ T an RNN would compute V(s_t) using both the input OB_t and the previously estimated representation V(s_{t−1}). This, in combination with the recursive application of Θ, allows the network to learn a function over the entire temporal sequence and to provide V(s_t) with the desirable properties mentioned in section 3.1. The structure of Θ is more complex in RNNs than in feedforward ANNs, and a detailed derivation of the underlying optimization process is outside the scope of the present work. Nevertheless, it is worth singling out how the recurrent nature of the computations underlying the generation of V(s_t) makes RNNs suitable for approximating the function specified in section 3.1.
Following Figure 5B, let V(s_t) be the representation inferred by the model at time t and Θ its associated parameters. Optimal parameter values are found through the same update rule used in feedforward ANNs; however, since E can now only be observed at the end of a temporal sequence, computing the gradient of E with respect to Θ requires us to take into account all the intermediate steps from t to T. This is achieved by applying the chain rule and propagating the error gradient backward in time (Bengio et al., 2017; Lillicrap and Santoro, 2019). This implies that, similarly to the TD update, the error flow forces V(s_t) to retain information from OB_t and V(s_{t−1}) in order to perform estimation of B_{t+1} while still being useful for generating V(s_{t+1}), as we can see from Figure 5B. This process is made more efficient by an RNN variant called Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997), which, as well as improving the propagation of the error gradient, has specialized mechanisms for inferring, at each point in time, which portion of information should be kept or discarded in order to minimize E (Hochreiter and Schmidhuber, 1997; Bengio et al., 2017).
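The recursive computation of V(s_t) from OB_t and V(s_{t−1}) can be sketched as a plain recurrent cell unrolled over a sequence. This is a simplified stand-in for the LSTM variant discussed above; the dimensions and the tanh non-linearity are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_in, n_h = 5, 4, 8                 # sessions, input metrics, hidden units
OB = rng.normal(size=(T, n_in))        # one individual's sequence of inputs

# A single set of parameters Theta, applied recursively at every step.
W_in = rng.normal(scale=0.3, size=(n_in, n_h))
W_rec = rng.normal(scale=0.3, size=(n_h, n_h))
b = np.zeros(n_h)

# V(s_t) is computed from the current input OB_t AND the previous
# representation V(s_{t-1}), so the whole history can influence it.
V = np.zeros((T, n_h))
v_prev = np.zeros(n_h)
for t in range(T):
    v_prev = np.tanh(OB[t] @ W_in + v_prev @ W_rec + b)
    V[t] = v_prev
```

Because v_prev feeds back into the next step, information from OB at earlier sessions can propagate into every later V(s_t), which is exactly the property a feedforward network unrolled over time cannot provide.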

Representation and Manifold Learning
As mentioned in the previous sections, ANNs are tasked with creating latent representations (e.g. V(s_t)) which are not explicitly defined by their input or target but are nevertheless functional for connecting the two (Rumelhart et al., 1986; Bengio et al., 2017; Lillicrap et al., 2020). This is based on the hypothesis that the relationship between the input and the target can be expressed in terms of variations in coordinates on a manifold (Bengio et al., 2017). In the lower-dimensional space of this manifold, the input is re-organized to improve estimation, and elements which are similar to each other tend to appear close together (Bengio et al., 2017). In this view, during optimization, each layer of an ANN attempts to place its input on a manifold that is useful for the layer that follows. This process continues until the last layer, where the inputs are organized in such a way that it becomes easier for the network to produce good predictions of the target (Bengio et al., 2017). Moving along this final manifold allows one to reach inputs with different characteristics, leading to variations in the predictions produced by the model. We hypothesize that the amount of attributed incentive salience (i.e. V(s_t)) can be modeled as a manifold on which the history of individual-object interactions is placed in order to best predict the intensity of all future interactions. This relates to the concept of motivation as a vector presented in sections 2.1 and 2.3: the representation V(s_t) estimated by an ANN can be thought of as a vector in an h-dimensional space, where h is the number of units of the layer producing the representation, indicating the amount of attributed incentive salience after observing t interactions. Differently from completely unsupervised approaches, this approach forces the learned manifold to obey specific representational and predictive functionalities that are shared with the construct of attributed incentive salience. Given the potentially large number of layers in an ANN, locating this representation and, most importantly, ensuring that it is a suitable approximation of V(s_t) are potential issues. A possible solution is to impose a form of architectural constraint on the optimization process through multi-task learning. Multi-task learning closely resembles multivariate analysis: it works on the assumption that a common latent factor underlying a set of targets exists and can be constrained into a single representation used by the ANN for producing multiple predictions (Bengio et al., 2017). An example of this process is shown in Figure 6. As mentioned in section 2.2, the amount of attributed incentive salience V(s_t) that an individual I assigns to an object O should be a latent factor that indicates how intense future interactions with that object will be. Therefore, if a layer in an ANN is forced to produce a single representation which is then used to estimate multiple behavioural indicators of the intensity of these interactions, this should provide a sensible approximation of the amount of attributed incentive salience.
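The multi-task constraint described above can be sketched as a single shared representation feeding several task-specific heads. This is a forward-pass illustration only; the target names are hypothetical placeholders, not the exact targets used by our models.

```python
import numpy as np

rng = np.random.default_rng(2)

batch, n_in, n_h = 32, 5, 16
x = rng.normal(size=(batch, n_in))     # stand-in for the model's inputs

# Shared layer: ONE representation (the candidate V(s_t)) for all tasks.
W_shared = rng.normal(scale=0.3, size=(n_in, n_h))
h = np.maximum(0.0, x @ W_shared)

# Task-specific heads: every behavioural target is predicted from the
# SAME shared representation h (head names are illustrative).
heads = {name: rng.normal(scale=0.3, size=(n_h, 1))
         for name in ("session_time", "activity", "n_sessions")}
predictions = {name: h @ W for name, W in heads.items()}
```

Because every head reads the same h, the optimization pressure from all targets is funneled into one representation, which is what lets us treat that layer as the candidate approximation of attributed incentive salience.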

Methodology
We hypothesize that the amount of attributed incentive salience can be approximated by the representation produced by a model taking as input the intensity of all the interactions that an individual had with an object while estimating the intensity of the one that follows. We expect ANNs, and their recurrent variants in particular, to be well suited to this task and to provide better fits to the data. As mentioned in sections 2.1 and 2.2, in order for this representation to effectively describe the amount of attributed incentive salience, it must distinguish between different objects while also maintaining the ability to differentiate individuals based on the expected intensity of their future interactions. It should also maintain this property over time and capture potential variations occurring in the history of interactions between the individual and the object.
Below we detail the methodological approach used to test these hypotheses, highlighting the connection between the theoretical and computational frameworks presented (see sections 2 and 3), the data, the models, and the experimental pipeline employed. We fit a series of models to behavioural data coming from video games. These models aim to predict the amount, length and frequency of future playing behaviour that users have with a specific game based on the intensity of their previous interactions. We then analyzed the representation generated by one of these models, an RNN of our design, in order to evaluate its functional similarity with the construct of attributed incentive salience.

Data
To validate our approach and hypotheses we needed to acquire records of interactions between individuals and potentially rewarding objects in naturalistic contexts. As mentioned in section 3.2, video games are particularly suited for this purpose given their learning-dependent reinforcing properties and the large amount of longitudinal data streams that they can generate. We used gameplay data from six video games published by our partner company, Square Enix Ltd. The games were Hitman Go (hmg), Hitman Sniper (hms), Just Cause 3 (jc3), Just Cause 4 (jc4), Life is Strange (lis), and Life is Strange: Before the Storm (lisbf). Due to the diversity in their in-game mechanics, each of these games was considered as an "object" with different reinforcing properties (see section 3.2). This allowed us to mimic a situation where a single model had access to data coming from a heterogeneous set of potentially rewarding entities (similarly to what we described in section 2.1). The resulting dataset contained entries from 3,209,336 individuals, evenly distributed across the six games, and randomly sampled from all users who played the games between their respective release dates and January 2020. All data were obtained and processed in compliance with the European Union's General Data Protection Regulation. In order to represent state transition dynamics (i.e. sequences of interactions between I and O) for each individual, we retrieved a set of six different types of behavioural telemetry over variable-length sequences of game sessions. A game session was defined from the moment an individual started the game software until it was closed. We retrieved all sessions produced by an individual from the moment the data they generated first appeared in the game's servers. Since our modelling approach requires predicting, in a supervised manner, the intensity of future playing behaviour given the history of previous interactions, we only considered users with two or more observed game sessions. The reason for this is twofold: sequences of length one do not entail any temporal structure, nor do they allow generating a supervised target. The telemetry (see Table 1) were selected to be generalizable and comparable with metrics employed in other behavioural studies of incentive salience attribution (Berridge and Robinson, 1998; McClure et al., 2003; Zhang et al., 2009). We note that the high dispersion values (Inter-Quartile Range or IQR) reported for some of the telemetry are due to the extreme skewness in the distribution of the data. This is caused both by the nature of the phenomenon they describe (e.g. Absence is a classic case of a time-to-event measure) and by their typical behaviour in the context of video games (Bauckhage et al., 2012). The final dataset was composed of 6 columns and 28,155,199 rows. A table of descriptive statistics can be found in Table 2.

Models
When defining the models used for evaluating our hypotheses, we first established two reasonable single-parameter baselines. The first is a Lag 1 model producing predictions according to the rule that the next observation equals the current one. Here t represents a single game session in a sequence of T observed interactions, while B are the behavioural metrics described in section 4.1, except for N° Sessions, for which the model provides a constant prediction of 1. Indeed, the lag-1 version of N° Sessions is not a realistic prediction, as it linearly increases with t. The second is a Median model computing the expectancy of each of the targets as an exponentially weighted average of all the B_t up to t + 1 observed when fitting the model. This is computed separately for every individual in the dataset, and the median value of each of the 5 targets is used as a constant prediction. These apparently naive models provide a surprisingly robust prediction baseline for time series that are not white noise (Hyndman and Athanasopoulos, 2018), other than having a nice interpretation in terms of behavioural momentum (Nevin and Grace, 2000): in conditions of high experienced reward, the behaviour of an individual tends to be consistent over time (i.e. resistant to change). An ElasticNet (E-Net) (Zou and Hastie, 2005), a form of additive model combining both l1 and l2 regularization, was used to evaluate the performance of simple linear functions. To test the effect of non-linearity, a Multilayer Perceptron (MLP), the most common type of deep feedforward ANN, was used. Finally, to evaluate the effect of recurrency in combination with non-linearity, we designed a hybrid approach (RNN) integrating recurrent and feedforward operations in a single ANN. Despite being markedly different, this last architecture shares similarities with the technique proposed by Calhoun et al. (Calhoun et al., 2019), where latent states were generated dynamically and employed by Generalized Linear Models for producing predictions of observed behaviours. A representation of the computational graphs constituting the parametric models can be seen in Figure 7. It should be noted that E-Net is architecturally equivalent to MLP, with the only difference being the replacement of stacked feedforward layers with a single layer with linear activation. As illustrated in Figure 5, the inputs to each model were sequences of the same behavioural metrics reported in Table 1 plus the associated game object, while the targets were simply the lead-1 version of the same sequences. We want to highlight that each model was fit to sequences coming from the same game object and that the predictions only pertained to the behavioural metrics. Indeed, the aim of each model was to solve jointly for each O present in the data. All the models adopted a multi-task approach (see section 3.4) and were designed to perform estimation in a sequence-to-sequence fashion: given an input sequence of length T, its lead-1 version was predicted. The MLP and RNN models both used a ReLU activation function, which has shape ReLU(x) = max(0, x), with x being the value computed by a single hidden unit. All the models, except for Median, were implemented using Tensorflow's high-level API "Keras" (Abadi et al., 2015; Chollet et al., 2015). The Median model was implemented using the scientific computing libraries Pandas and Numpy (pandas development team, 2020; Harris et al., 2020).
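As a rough sketch, the two baselines reduce to the following computations on a single individual's series. The smoothing factor α is an arbitrary choice here; the text does not specify the exact weighting used.

```python
import numpy as np

# One individual's values of a single behavioural metric over 5 sessions.
B = np.array([30.0, 45.0, 40.0, 50.0, 42.0])

# Lag 1: the next session is predicted to look like the current one
# (behavioural momentum).
lag1_pred = B[:-1]                      # predictions for sessions 2..T

# Exponentially weighted average of the observed history; the Median
# baseline would then use a per-individual median of such summaries as
# a constant prediction (alpha is an assumed smoothing factor).
alpha = 0.5
ewa = np.zeros_like(B)
ewa[0] = B[0]
for t in range(1, len(B)):
    ewa[t] = alpha * B[t] + (1 - alpha) * ewa[t - 1]
```

Both baselines have essentially one free parameter each, which is what makes them useful as floors for the parametric models.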

Experimental Procedure
Our experimental pipeline is shown in Figure 8. Full Data Preparation When querying the data from the game servers, we excluded from the random sampling procedure individuals having at least one of the considered behavioural metrics over the game population's 99th percentile. This allowed us to eliminate potentially faulty data, which are often present when dealing with telemetry. At this point, data were re-arranged in a format suitable for time-series modelling (i.e. arrays of shape (batch × T × B), with T being the number of available game sessions and B the number of considered behavioural metrics) and randomly split into a tuning (i.e. 10%) and validation set (i.e. 90%). For the sake of clarity, we report an example of how the data from a game session are generated and how they are parsed by the models.
"A user decides, 36 hours after the release of game X, to enter the game world for the first time. This is when a session starts and actual playing behaviour can be observed. During this session they engage in various activities, leading to 20 non-unique and user-initiated actions (e.g. being attacked by a non-playable character is not counted as a valid action).
After roughly 60 minutes spent playing, the user exits the game world and the session ends. Of this session, 80% of the time has been spent actively playing; the remaining 20% has seen the game on pause or the user away from the console (i.e. idle time). After 48 hours the user logs into the game world again and a new session starts." What we described here would correspond to a single time step t1 in the sequence of T total interactions (i.e. sessions) between the user and the specific game context X. The models will parse this session as a vector of length 4 with values 36, 20, 60 and 80, along with another vector of length 1 containing the numerical index for the game. When all the sessions are observed, the models will receive as inputs sequences of length T of the same vectors. This implies that each model is fitted on 4 continuous metrics and 1 discrete metric (i.e. the game context). The Median and Lag-1 models do not explicitly make use of the categorical variable as they are "fit" on separate partitions of the data (i.e. one for each game context). The target of each model is then constituted by 5 variables: the lead-one version of the aforementioned 4 continuous variables plus the total number of future sessions observed at time t.
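The re-arrangement of sessions into model inputs and lead-1 targets can be sketched as follows; random numbers stand in for real telemetry, and the sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sizes: 3 individuals, 4 sessions each, 4 continuous metrics.
batch, T, B = 3, 4, 4
X_cont = rng.normal(size=(batch, T, B))    # stand-in for real telemetry
game_idx = np.array([0, 2, 5])             # one discrete game index per user

# Sequence-to-sequence supervision: the target is the lead-1 version of
# the input sequence (session t is used to predict session t + 1).
inputs = X_cont[:, :-1, :]
targets = X_cont[:, 1:, :]
```

Note how the last session of each individual only ever appears as a target, which is why sequences of length one cannot provide any supervision.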
Performance Analysis The first step in our performance analysis aimed to control for the contribution of hyperparameters to the performance of the parametric models (especially for MLP and RNN). The choice of factors such as the number of layers and hidden units can influence the number of free parameters and the expressive power of an ANN. Manually picking their optimal values is often a challenging combinatorial problem that can lead to unexpected outcomes if left to the subjective choice of the experimenter. Therefore, to find the best hyperparameters we adopted a more impartial and efficient approach relying on algorithmic search. This was done using the Keras Tuner implementation (O'Malley et al., 2019) of the Hyperband algorithm (Li et al., 2017). Hyperband is an optimized version of random search that achieves faster convergence through adaptive resource allocation and early termination of training. It can lead to results better than or equivalent to other optimization algorithms, but in a fraction of the time (Li et al., 2017). When initializing the tuning step, we allowed each model to grow as much as the others (except for E-Net which, being a linear model, is naturally constrained to a fixed number of parameters) so that any observed difference in number of parameters was related to characteristics of the model architecture. The tuning step was conducted running one full iteration of Hyperband with a budget of 40 epochs on the tuning set. To trigger early stopping for a specific configuration of hyperparameters, we monitored the decrease in loss over a 20% random sample of the tuning set (i.e. the validation tuning set) and terminated training when the loss reduced by less than δ = 1e−4 for 10 consecutive epochs. Once the best set of hyperparameters was found, we proceeded to fit all the models specified in section 4.2 on the validation set using a 10-fold cross-validation strategy. This divided the validation set into 10 equally sized folds and iteratively used 9 of them for training and 1 for testing. The continuous inputs in the training data were min-max scaled according to the formula x′ = (x − min(x)) / (max(x) − min(x)), where x is the input vector to be scaled, while the categorical input (i.e. game object) was encoded ordinally. In order to take into account the contributions of time, game and target, the performance of each model was given by computing the Symmetric Mean Absolute Percentage Error (SMAPE) (Zhu and Laptev, 2017) for each combination of the aforementioned dimensions (e.g. SMAPE of Session Time at t1 for the game object hmg). Each model was trained for a maximum of 200 epochs and interrupted using the same early stopping strategy mentioned above (i.e. absence of a δ reduction in loss on a 20% hold-out for 10 consecutive epochs). To maintain the ability to fit each model on temporal series of varying length, we adopted a data generator approach, feeding data to the models in random batches of 256 time series with constant length within a batch. The models were trained with stochastic gradient descent using the Adaptive Moment Estimation (Adam) (Kingma and Ba, 2014) algorithm to find the set of parameters minimizing the SMAPE between the targets and the predictions generated by the model.
Here, for a target y and an estimate ŷ over a batch of size N, the SMAPE can be written as SMAPE = (100/N) ∑_{i=1}^{N} |ŷ_i − y_i| / (|y_i| + |ŷ_i|). The SMAPE is bounded between 0 and 100 and can be interpreted as the percentage deviation from the target, with lower values indicating better model fit. The choice of SMAPE was dictated by the fact that the targets were expressed on largely different scales (i.e. coming from different games and expressed in different units of measure, see Table 1) and therefore required a loss function measuring relative distance from the target. To evaluate the overall performance, we first summed the SMAPE relative to each target into a single global performance indicator: this is the loss function that each model attempts to minimize during training. We then divided the total by 5 (i.e. the total number of targets) in order to express the metric on its original scale (i.e. 0 to 100). This was then regressed using a Linear Mixed-effects Model (LMM) with fold number, game object and time as random effects and model type as fixed effect (treatment coded with RNN as reference). Subsequently, for a more thorough investigation of model performance, we conducted the same regression analysis separately on each target. Both regression analyses were followed by post-hoc comparisons (i.e. t-tests with Bonferroni correction) for testing the following pairwise hypotheses on the estimated coefficients: Lag 1 < Median < ENet < MLP. All statistical analyses were conducted using the python library statsmodels (Seabold and Perktold, 2010).
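A direct implementation of this SMAPE variant (the |error| over sum-of-magnitudes form, which matches the stated 0–100 bound) might look like:

```python
import numpy as np

def smape(y, y_hat):
    """SMAPE in the |error| / (|y| + |y_hat|) form, bounded in [0, 100]."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    denom = np.abs(y) + np.abs(y_hat)
    # Define the per-element ratio as 0 where target and estimate are both 0.
    ratio = np.divide(np.abs(y_hat - y), denom,
                      out=np.zeros_like(denom), where=denom > 0)
    return 100.0 * float(ratio.mean())
```

Because the error is normalized by the magnitudes of target and estimate, a 30% SMAPE means the same thing whether the target is measured in minutes of Session Time or in a count of sessions, which is what makes the per-target losses summable.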
Representation Analysis After comparing the performance of the RNN model with that of alternative approaches, we proceeded to analyze the representations inferred by the two ANNs, with particular attention to the one generated by the RNN. First, we re-fitted both models on a random sample (i.e. 90%) of the validation set following the same procedure specified in paragraph 4.3. Then, we created an encoder composed of all the transformations and relative parameters leading to the shared-representation layer (red highlight in Figure 7). As illustrated in paragraph 3.4, this is the portion of the model that we expected to approximate the manifold structure of attributed incentive salience. Subsequently, we passed the remaining portion of the validation set (i.e. 10%) as an input to the encoders, producing arrays of shape (batch × T × h), with h being the number of units in the shared layer and T the number of sessions observed for batch individuals.
In order to visualize and inspect this multidimensional representation, we used the Uniform Manifold Approximation and Projection (UMAP) algorithm (McInnes et al., 2018), a dimensionality reduction technique based on manifold learning. Given a high-dimensional dataset, UMAP first infers its topological structure and then, using stochastic gradient descent, attempts to structurally reproduce it in a lower-dimensional space (two or three dimensions for visualization purposes) (McInnes et al., 2018). Compared to other similar dimensionality reduction approaches (for example, t-distributed Stochastic Neighbor Embedding), UMAP tends to better preserve both the global and local structure of the original data, meaning that distances in the underlying dataset should be more faithfully reproduced. Moreover, when given a sequence of datasets with entries related to each other, UMAP is able to maintain these relationships during the optimization process. In our case, these sequential datasets were the T representations generated by the RNN model after observing T game sessions for a group of individuals. Being able to take into account these temporal relationships allowed us to gather information not just on the characteristics of the representations produced by the RNN model but also on their evolution over time.
To clarify, the encoder provided by the ANN was tasked with generating a multidimensional representation where distance represents similarity between individuals with respect to the intensity of their future interactions with a game (see the manifold hypothesis of attributed incentive salience in sections 2.3 and 3.4). The UMAP algorithm made this multidimensional representation interpretable to the human eye by approximating its manifold structure on a 2-dimensional plane, allowing us to evaluate the presence of the desirable properties mentioned at the beginning of section 4. Since we did not know the intrinsic dimensionality of the manifold we were trying to approximate, we conducted a Principal Component Analysis (PCA) of the representation generated by the RNN. Despite PCA and UMAP working under radically different assumptions and mechanisms, we thought this could provide us with a lower bound on how much variance we would be able to capture considering only two dimensions. The topological structure of the representation produced by the RNN was inferred by computing the cosine distance in a local neighborhood of 1000 points with a minimum distance of 0.8, while the dimensionality reduction was achieved by running the optimization part of the algorithm for 2000 iterations. The choice of a large neighborhood and minimum distance was made to better capture the global structure of the representation space.
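The PCA-based variance check can be sketched with a plain SVD; the random matrix below merely stands in for the (batch × h) representation extracted by the encoder at one time step.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the (batch, h) representation; a random linear mixture
# gives it correlated dimensions, as a learned representation would have.
Z = rng.normal(size=(500, 32)) @ rng.normal(size=(32, 32))

# PCA via SVD on the centred data: the share of variance carried by the
# first two principal components gives a rough lower bound on what a
# 2-D embedding of this representation can preserve linearly.
Zc = Z - Z.mean(axis=0)
s = np.linalg.svd(Zc, compute_uv=False)
explained = s ** 2 / np.sum(s ** 2)
var_2d = float(explained[:2].sum())
```

If var_2d is already substantial, a 2-D UMAP projection of the same representation is unlikely to be discarding most of the structure, which is the reassurance the PCA step is meant to provide.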
To understand the functions underlying the inferred representation, we conducted an exploratory investigation of the relationship between the hidden units' activations in the recurrent layer and the predictions produced by the model. To quantify the strength of the observed relationships we employed the Maximal Information Coefficient (MIC) (Reshef et al., 2011), a measure of mutual information that can quantify both linear and non-linear associations between variables. The MIC can assume values between 0 and 1, with 1 corresponding to a perfect association. We adopted the implementation of UMAP provided by McInnes et al. (McInnes et al., 2018), while the MIC was computed using the python library minepy (Albanese et al., 2013). Visualizations were produced using the python libraries matplotlib (Hunter, 2007) and seaborn (Waskom, 2021).
Partition Analysis

We conducted a partition analysis to identify behavioural profiles associated with the representation generated by our model. As specified in section 3.4, the representation extracted by the encoder at time t can be interpreted as a set of coordinates on the manifold generated by the RNN model after observing t game sessions. Partitioning this representation allows us to identify areas of the manifold that hold information about the history of interactions between an individual and a video game object. These areas may represent variations in the levels of attributed incentive salience and therefore be associated with distinct patterns of behaviour. To partition the data, we used an unsupervised approach, applying Mini-Batch K-Means (Sculley, 2010), a variation of K-Means, to the representation extracted by the encoder. Given a dataset, the algorithm attempts to divide it by iteratively moving k centroids so as to reduce the variance within each partition. The choice of Mini-Batch K-Means was dictated by the fact that it is one of the few distance-based algorithms that scales to very large datasets. To select the optimal k value, we first fitted the algorithm with a varying number of centroids (i.e. 2 to 10) and computed the associated "inertia" (here, a measure of within-cluster variance). Since inertia tends to zero as k approaches the number of points in the dataset, we defined the optimal number of partitions as the value of k at which the inertia reached its "elbow", or point of maximum curvature (Satopaa et al., 2011). This identifies the number of partitions beyond which there are diminishing returns in terms of within-cluster variance reduction. Every instance of Mini-Batch K-Means was initialized 3000 times at random and ran for a maximum of 3000 epochs. The input data were re-scaled to have zero mean and unit variance and passed to the algorithm in random batches of size (512 × h). The associated behavioural profiles were found by applying this methodology separately to each game object and retrieving, for each partition, the mean of all the behavioural metrics over time. The Mini-Batch K-Means implementation used for this analysis was provided by the Python library scikit-learn (Pedregosa et al., 2011).
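The elbow criterion can be sketched as a chord-distance heuristic in the spirit of Satopaa et al. (2011): pick the k whose inertia point lies farthest below the straight line joining the first and last points of the curve. The values below are synthetic, not the fitted inertias:

```python
import numpy as np

def elbow_k(ks, inertias):
    """Pick the elbow of an inertia curve: the k whose point lies
    farthest below the chord joining the curve's endpoints (a common
    'maximum curvature' heuristic, cf. Satopaa et al., 2011)."""
    ks = np.asarray(list(ks), dtype=float)
    inertias = np.asarray(inertias, dtype=float)
    # normalise both axes so distances are comparable
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (inertias - inertias.min()) / (inertias.max() - inertias.min())
    # vertical gap between the chord through the endpoints and the curve
    chord = y[0] + (y[-1] - y[0]) * x
    return int(ks[np.argmax(chord - y)])

# synthetic example: inertia drops steeply, then flattens after k = 4
best_k = elbow_k(range(2, 11), [100, 40, 20, 15, 12, 10, 9, 8.5, 8])
```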
All the analyses were conducted using the Python programming language, version 3.6.2 (Van Rossum and Drake, 2009).

Performance Analysis
At the level of global performance the RNN model markedly outperformed all competing approaches, as clearly shown in Figure 9. This can also be seen in the results of the regression (see Table 3) and post hoc analyses (see Table 4). From the post hoc analysis we can also observe that all the pairwise hypotheses presented in paragraph 4.3 are confirmed. Here, model performance is given by the sum of the losses produced by the five targets and therefore provides a general indicator of model fit, where lower values indicate better overall performance. The superiority of the RNN model can still be observed when comparing the models on each target separately. However, the size of the effect varies depending on the target (see Table 5). The same trend is also present in the post hoc analysis (see Table 6), where we observe only a partial confirmation of the pairwise hypotheses. The ENet model is outperformed by the Median baseline for three targets, namely Future Active Time, Session Time and Session Activity. All the coefficients in the regression analyses and the differences in the post hoc analyses are non-standardized and can be interpreted as absolute changes in percentage error (i.e. SMAPE). To make these values more easily interpretable, we can use the information in Table 2. For example, knowing that the median Session Time for the jc3 object is 162 minutes, we can derive that when the RNN model achieves a SMAPE of 30% in predicting Future Session Time, this equates on average to an absolute error of 1.62 × 30 ≈ 48 minutes. All the p-values in the post hoc analyses are Bonferroni corrected for multiple comparisons. The results of the statistical analyses suggest positive additive effects of non-linearity and recurrency on model performance, both at the level of global and target-specific performance. This effect is more pronounced for certain targets (e.g. Future Session Time, Future N° Sessions) than for others (e.g. Future Absence, Future Active Time). Moreover, looking at Figure 10, it appears that the RNN improved on the MLP (i.e. the second-best model) using roughly half the parameters and per-epoch computation time. This could indicate that recurrency both improves model fit and allows for a more efficient use of the available parameters.
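The conversion from SMAPE to absolute error can be sketched directly. The SMAPE form below assumes the common symmetric variant with the averaged denominator; the paper's exact definition is given in its methods:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.
    Assumed form: 100 * mean(|y - yhat| / ((|y| + |yhat|) / 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(
        np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2)
    )

# Back-of-envelope conversion used in the text: with a median Session
# Time of 162 minutes, a SMAPE of 30% corresponds on average to an
# absolute error of roughly 1.62 * 30 ≈ 48 minutes.
approx_abs_error = 162 / 100 * 30  # ≈ 48.6 minutes
```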

Representation Analysis
From Figure 11A we can observe consistent patterns of cross-correlation in the activity of the artificial neurons constituting the RNN representation. This is supported by the fact that, with only two dimensions, PCA was able to explain from 30 to 60% of the variance in the representation generated by the RNN, with explanatory power saturating around 6 to 8 principal components. Inspecting the representation generated by the RNN model at t1 (see Figure 12A), we observe that the model was able to effectively distinguish between different game objects while simultaneously encoding variations in the expected intensity of future interactions. This is illustrated by the fact that each game object occupies a different and distinct region of the representation space while showing a within-object gradient-like organization that places individuals (i.e. single dots) on a continuum based on the estimated magnitude of their future behaviour. This organization is preserved for each of the targets, showing how the representation inferred by the model is a suitable meta-descriptor for different behavioural indicators. As expected, some targets show a very similar but not identical organization (e.g. Future Session Time and Future Session Activity) while others appear to be independent (e.g. Future Session Time and Future Absence). We note that the absolute location of each game aggregate (i.e. all the points belonging to a specific game object) on the 2D plane is arbitrary. As we can see in Figures 12 and 13, it changes at every run of the algorithm due to the stochastic nature of UMAP. Panels 12B and 12C provide more insight into the activation profiles of the individual hidden units constituting the generated representation. Panel 12B shows the relationship between the activity of 10 randomly chosen units and the predictions generated for the five targets. These are essentially transducer functions illustrating how the estimate for a particular target varies (on average) as the output of a unit increases or decreases. Each unit seems to encode multiple non-monotonic functions, one for each of the considered targets. Similarities in the shape of these functions reflect similarities between their associated targets. For example, the functions associated with two highly related targets like Future Session Time and Future Session Activity (see panel 12A) appear very similar in shape (see panels 12B and 12C). Interestingly, although most units appear to encode unique functions, some of them (e.g. 41 and 44) show an almost identical behaviour. This suggests the presence of redundancy in the functions underlying the representation generated by the RNN model. These observations are clarified in panel 12C, where the functions associated with a single unit (20, indicated by a dark box in 12B) are presented. Here we observe a strong, non-linear relationship between the unit's activity and the estimated targets (see the high MIC values and the lines of best fit). In addition, the between-target variation in MIC values suggests that the chosen unit is not equally informative for all targets.
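The redundancy check above (cross-correlated units, variance concentrated in a few components) can be reproduced with a plain SVD-based PCA. The sketch below uses random stand-in data in place of the actual hidden-unit activations:

```python
import numpy as np

def explained_variance_ratio(activations):
    """Fraction of total variance captured by each principal component
    of an (n_samples, n_units) matrix of hidden-unit activations,
    computed via SVD of the centred data."""
    centred = activations - activations.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()
```

When the units are highly cross-correlated (here, ten units driven by two latent factors plus small noise), the first few ratios account for almost all the variance, mirroring the saturation seen in the RNN representation.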
The analyses in Figure 12 were performed at a single time point t1. When performed at subsequent time points, the results appear qualitatively similar. For example, focusing on Future Session Time (see Appendix A for results connected to other targets), we see in Figure 13A that the model's ability to segregate different game objects while providing an overarching representation of the intensity of future interactions is preserved over time. This supports the hypothesis that the representation inferred by our model is dynamic in nature, which is further corroborated by panel 13D. There we can see how the RNN model was able to identify a "space" with temporally consistent "hot" and "cold" regions between which individuals moved over time depending on the expected intensity of their future interactions. This means that, given the history of interaction of a particular individual with a specific game object, our model would determine their "position" (i.e. their "internal state") in the "attributed incentive salience space". This aligns with the manifold hypothesis mentioned in sections 2.3 and 3.4: changes in the propensity to interact with a specific game object (i.e. variations in the amount of attributed incentive salience) can be expressed as movements on a manifold embedded within an h-dimensional space, with h being the dimensionality of the representation generated by our RNN model. The hidden units constituting this representation appear to be consistent over time in the type of functions they encode (see Figure 13B and C). As expected, we can again observe a strong non-linear association between units' activation and targets' predictions (see MIC values and lines of best fit). The decrease in MIC value observed in Figure 13C for artificial neuron 72 might indicate that certain units lose their informative power over time. As we mentioned in section 3, both ANNs try to predict the intensity of future behaviour given the history of interactions. They do so relying on the same type of metrics, leveraging similar computational mechanisms (i.e. multi-task learning and non-linearity) and producing representations according to the same underlying principle (i.e. the manifold hypothesis). Nevertheless, the fact that the MLP provides a poorer fit to the data already suggests that whatever representation it has inferred is likely a sub-optimal approximation of the manifold structure of incentive salience. Looking at Figure 14A, and knowing that UMAP represents differences and similarities between points through distance, we can see that the representation generated by the MLP differentiates less clearly between game objects. In the same figure, we can notice how the gradient representation for the metric Future N° Sessions is largely disrupted. This effect is, however, consistently less pronounced for other metrics (see our GitHub repository for additional visualizations), in accordance with the differences we observed in predictive performance (see Figure 10). Recalling what was mentioned in section 3, the latent state produced by the level of attributed incentive salience should retain, at any point in time, some predictive power over the intensity of all future interactions (i.e. not just the one that follows). Figure 14B shows the representation generated by the RNN and the MLP at t1, but colour coded with the discounted sum of the predictions made from t4 onward. We can see that, even if degraded, the RNN still preserves some of the desired gradient-like organization, which is instead much more disrupted for the MLP. This is in accordance with what is shown in Figure 13D: the RNN appears to define regions of high and low expected behavioural intensity which are consistent over time rather than constrained to the region around t + 1.
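The discounted sum used to colour-code Figure 14B can be sketched directly, assuming the summation form described in the figure captions (equation 1) with γ = 0.1:

```python
import numpy as np

def discounted_future_sum(future_predictions, gamma=0.1):
    """Discounted sum of a model's future predictions for one target,
    sum_i gamma**i * B_{t+i}, with gamma = 0.1 as in the paper; used
    to colour-code representations by the expected intensity of all
    remaining interactions, not just the next one."""
    b = np.asarray(future_predictions, dtype=float)
    return float(np.sum(gamma ** np.arange(len(b)) * b))
```

With γ = 0.1 the weighting decays quickly, so the sum is dominated by the nearest future predictions while still retaining a small contribution from later ones.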

Partition Analysis
We will focus on the representation generated at t4 for the game object hmg, as highlighted in the rightmost panel of Figure 13A (results related to other game objects can be found in Appendix B). As we can see from Figure 15A, following the methodology outlined in section 4.3, among all the Mini-Batch K-Means runs the one with k = 4 was identified as optimal. All the partitions were associated with a distinct behavioural profile, each one with a distinct offset and temporal evolution. At a global level, the four partitions seem to belong to two general groups: a group with a high propensity to produce future interactions (i.e. partitions 1 and 2) and a group with a low propensity (partitions 3 and 4). Noticeably, when looking in detail at each specific partition, each appears as a variation on the macro group it belongs to. Interestingly, the percentage of Session Time spent actively interacting with the game object (i.e. Active Time) seems to be a relevant component of this more granular characterization.
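The profile-extraction step described in section 4.3 (z-scoring each metric against the game population, then averaging within each partition) can be sketched as follows, with toy arrays standing in for the real behavioural metrics and fitted cluster labels:

```python
import numpy as np

def partition_profiles(metric, labels):
    """Mean z-scored trajectory of one behavioural metric per partition.
    `metric` is an (n_individuals, n_sessions) array and `labels` holds
    the Mini-Batch K-Means partition id of each row; rows are z-scored
    against the whole game population before averaging, mirroring the
    y axis of the profile plots."""
    z = (metric - metric.mean(axis=0)) / metric.std(axis=0)
    return {int(k): z[labels == k].mean(axis=0) for k in np.unique(labels)}
```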
Partition 1 represents individuals producing very high-intensity interactions (see Session Time and Activity) at a high frequency (see Absence). The high amount of Active Time highlights how these individuals were actively interacting with the game object. The individuals in this partition are projected to produce a number of future interactions that is slightly above average while maintaining a high-intensity profile. It can be speculated that the history of high-intensity interactions reflected a positive propensity towards the game. This might have prompted individuals in this partition to consume most of the available content in the game, leading to a reduced amount of expected future interactions.
Partition 2 describes individuals that have a history of very frequent (see Absence) but brief interactions, with little activity and active time. These individuals are expected to maintain this trend in the future, although producing a number of interactions that is largely above the average. A hypothetical explanation might see individuals in this partition as constituting a variant of those in Partition 1. The high frequency of interactions could suggest an eagerness to interact with the game object. This, combined with the low amount of consumed content (see Session Time, Active Time and Session Activity), could explain the projected high amount of future interactions.

Fig. 12: The colours in the Game Context panel indicate the game object from which the representation is coming. Colours in the small panels represent the discounted sum of all future predictions for a particular target (for example, estimated Future Session Time), B_{t2:T} = ∑_{i=0}^{T−t2} γ^i B_{t2+i} with γ = 0.1, as illustrated in equation 1. Each unit encodes the intensity of future interactions through multiple non-monotonic functions. Panels B and C show the relationship between the activation of randomly selected hidden units in the LSTM layer of the RNN and the model's predictions at t1. Panel B shows the relationship between the discretized activation of 10 randomly selected units (artificial neurons), plotted along the y axis, and the predictions made by the model at t1 (colour coded from blue to red as in the small panels in A) for the game object hmg. Panel C shows in more detail the relationship between discretized activation and RNN predictions for a single unit, highlighted by a black box in Panel B. Here the x axis indicates the discretized activation while the y axis indicates the mean discretized discounted sum of all future predictions produced by the model. Vertical lines are standard errors of the mean. The red curve is the line of best fit provided by a generalized additive model (Servén and Brummitt, 2018), while the box reports the MIC and the correlation coefficient (Spearman's ρ) between the artificial neuron's activation and the model's predictions.
Partition 3 includes individuals whose interactions have been average in terms of length, amount of activity and frequency until session 3. From there, a reduction in both length and activity can be observed, concomitant with a large temporal hiatus before the following interaction. This reduction is expected to continue, and these individuals are estimated to produce a number of future interactions markedly below average while also maintaining a low intensity of interactions. These individuals might have started with a normal propensity towards the game which suddenly degraded around session 4, leading to a marked reduction in the expected number of future interactions.
Partition 4 contains individuals producing the least intense and frequent interactions. However, these individuals have the highest proportion of active time. Similarly to partition 3, a long temporal hiatus seems to follow a slight dip in session time and activity. Also similarly to partition 3, these individuals are estimated to produce a number of future interactions below average while maintaining their original low-intensity profile. Differently from partition 3, these individuals started with and maintained a low-intensity profile, suggesting a negative propensity toward the game. However, the short idle time (see Active Time) characterizing their game sessions might have worked as an attenuating factor, leading to a higher amount of expected future interactions.
Looking at the relationships between behavioural metrics, we observe that Session Time and Session Activity are usually highly correlated. Low Absence seems to be a good indicator of the propensity to produce more interactions in the future. Similarly, high Absence seems to be associated with a general history of low-intensity interactions. It is also worth noting that variations in this metric seem to follow, and be proportional to, increases and decreases in interaction intensity (e.g. see partitions 1, 3 and 4).

Discussion
In this paper we presented a modelling approach based on ANNs for approximating the manifold structure of motivation-related latent states in scenarios where only large amounts of behavioural data are available. Our approach produces predictions of the intensity of future interactions between individuals and a diverse range of video games while also generating representations that approximate well some of the properties of attributed incentive salience. We also show how integrating theoretical and computational insights related to the concept of attributed incentive salience into the design of our modelling approach improved its efficiency and effectiveness.
Theoretical Implications

The advantage provided by the combination of non-linearity and recurrency in the estimation task is in line with the dynamical nature of motivation and incentive salience attribution (Toates, 1994; Robinson and Berridge, 1993; Zhang et al., 2009; Tindell et al., 2009; Berridge, 2012). This is also consistent with a body of research showing that the attribution of value to potentially rewarding objects or actions is often carried out by non-linear recurrent operations (Song et al., 2017; Wang et al., 2018b) and that artificial neural networks with recurrent connections are well suited for approximating these operations (Kietzmann et al., 2018). These findings are corroborated not just by the superior performance of the RNN model in the prediction task (see section 5.1) but also by its capacity to produce more stable representations (see Figure 14). As mentioned in section 2.2, incentive salience attribution produces latent representations of objects which, when imbued with value, make future interactions with those objects more likely and intense (Berridge and Robinson, 1998; Berridge, 2004). The representation generated by our model showed similar functional properties in its global-local organization. At the global level, different game objects were organized in distinct and coherent regions (see Figure 12A), showing how the model attempted to operate on a meta-level by partitioning a global representation into several object-specific ones. This finding aligns with what has been highlighted in various works on neural manifolds, where the responses related to qualitatively different stimuli tend to show a cluster-like organization when reduced to a lower-dimensional space (Stopfer et al., 2003; Gallego et al., 2017; Ganmor et al., 2015). At the local level, each object-specific representation showed an internal gradient-like organization distinguishing individuals based on the estimated intensity of their future interactions with that specific object. This was true
for each of the considered behavioural targets (see Figure 12A), showing how the model attempted to provide a holistic description of the intensity of future interactions. This type of gradient-like organization also emerged in a work by Nieh et al. (Nieh et al., 2021) analyzing neural responses during an evidence-accumulation task in virtual reality. When reducing the neural activity to a 3-dimensional space, the resulting manifold presented a clear gradient able to code simultaneously for position and level of accumulated evidence (Nieh et al., 2021). A similar finding is present in the work by Stopfer et al. (Stopfer et al., 2003), where the manifold structure extracted from the activity of olfactory neurons was able to represent qualitative and quantitative differences between odours through a global-local organization similar to that shown in section 5.2. The dynamic nature of the representation generated by our approach also fits nicely with that of attributed incentive salience (Toates, 1994; Robinson and Berridge, 1993; Zhang et al., 2009; Tindell et al., 2009; Berridge, 2012). In particular, the fact that the aforementioned global-local organization is maintained over time (see Figure 13A) corroborates the hypothesis that our model approximated state changes originating from a dynamic process. In support of this, we also observed that the representation generated by our model was spatially coherent over time: it produced distinct regions of low and high expected intensity between which individuals moved over time (see Figure 13D). These results appear to match the definition of motivation and incentive salience attribution specified in section 2.1: a single overarching process able to dynamically predict the likelihood and intensity by which individuals will interact with a varied set of objects (Simpson and Balsam, 2016; Toates, 1994; Berridge, 2004; Zhang et al., 2009). Many other cognitive and affective functions might rely on a latent representation
that is functionally similar to the one described in our work (e.g. credit assignment and optimal control (Wang et al., 2018b; Barto and Dietterich, 2004), cognitive control, learning (Skinner, 1965) or various forms of reward processing (Schultz et al., 1997, 2000)). Similarly to attributed incentive salience, these functions are all involved in generating motivated behaviour and heavily rely on reward signals; however, none of them is concerned with attributing and describing the motivational saliency that an object possesses. This is made evident in the works by McClure et al. (McClure et al., 2003) and Zhang et al. (Zhang et al., 2009), where the system involved in salience attribution is functionally separate from the one assigning credit and executing actions: the former provides a representation that informs and biases the decisions taken by the latter, serving an almost exclusively qualifying role (see the role of attributed incentive salience in addiction-like conditions; Robinson and Berridge, 1993). Similarly, the representation generated by our model does not provide any insight into the decision-making process underlying the observed playing behaviour but simply provides an approximate description of the "motivational pull" that a particular game object exerts on a particular individual at a certain point in time. The functions encoded by the hidden units constituting the representation appeared to have a series of distinctive properties, namely: redundancy, non-linearity, multiplicity (single units code for multiple functions) and consistency over time. These may have played a role in providing the representation generated by our model with its distinctive characteristics. For example, as we mentioned in section 2.3, redundancy and inter-correlation are characteristics of the signals from which the manifold representation of internal states arises (Seung and Lee, 2000; Gallego et al., 2017). Multiplicity, on the other hand, might be the factor underlying the
ability of our model to produce a single unitary representation which holds predictive power over different behavioural targets. Finally, consistency over time could be the mechanism supporting the type of temporal coherence observed in panel 13D. We want to stress that these findings are to be considered exploratory in nature, since they do not rely on a-priori hypotheses. A comparison between these computational properties and those underlying the attribution of incentive salience is required and would constitute a potential avenue for future investigations. This supports the idea that our approach, by giving full access to its constituent parts, provides a certain degree of interpretability and offers the possibility of generating testable hypotheses. The partition analysis revealed a set of diverse profiles that largely reflect the expected behavioural correlates of different levels of attributed incentive salience (i.e. high- vs low-intensity profiles) (Berridge, 2004). The different offsets that each partition showed might suggest different levels of predisposition towards the individual game objects. The dynamic nature of these profiles provided a more granular characterization, allowing us to observe variations in the entire history of interactions and not just in the expected intensity of future ones. For example, it was possible to see how a higher likelihood of future interactions was supported both by a history of low-intensity but high-frequency interactions as well as by a series of high-frequency and high-intensity interactions (see partitions 1 and 2 in Figure 15B). In this sense, these behavioural profiles can be seen as useful devices for investigating the existence of inter-individual differences in schedules of interaction with potentially rewarding objects.
Applicative Implications

The present work outlined a method for embedding theory-driven knowledge in data-driven approaches, making it easier to interpret and test hypotheses on the representations they produce. In comparison to other works focusing on the identification of latent states (or their manifold representation) from behavioural data (Calhoun et al., 2019; Luxem et al., 2020; Pereira et al., 2020; Shi et al., 2021; McCullough and Goodhill, 2021), the present methodology offers a series of advantages. It does not require the Markov assumption, it generates a continuous rather than discrete state space (hence the number of hidden states does not need to be specified) and it relies on a more easily scalable class of algorithms. Moreover, in contrast with a general tendency to utilise completely unsupervised techniques for capturing the manifold structure underlying behavioural data (Calhoun et al., 2019; Luxem et al., 2020; Pereira et al., 2020; Shi et al., 2021; McCullough and Goodhill, 2021), our methodology attempts to extract representations which obey specific functional constraints (see section 3.4) and can therefore be more easily interpreted within a specific theoretical framework. Our approach offers a convenient framework for dealing with a diverse series of tasks. It allows one to produce predictions of the amount and intensity of future interactions that an individual will have with a specific object. It generates a representation that can be analyzed (similarly to what has been done in section 4.3) or provided as input to other algorithms. Indeed, the encoder mentioned in section 4.3 can be thought of as an automatic feature extractor. This can be used to reduce complex time-series data of varying length to fixed-size vectors able to describe the propensity of an individual to interact with an object. For example, the analysis presented in section 4.3 showed how this process could be applied to time-series partitioning of large datasets. The present work leveraged
data coming from video games, but the adopted approach could easily be applied to other contexts. The only key requirement is access to behavioural quantifiers describing the amount and intensity of the interactions that an individual has with a particular object, service or task. This means that natural areas of application for our approach are those relying on the remote acquisition of behavioural data (e.g. web services or online experiments) but also situations in which large volumes of experimental data are available (e.g. large multi-center studies).
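As a schematic of the encoder-as-feature-extractor idea, a toy recurrence shows how series of different lengths map to fixed-size vectors. This is only a stand-in for the trained LSTM encoder: the weights below are hypothetical placeholders, not a fitted model:

```python
import numpy as np

def encode(session_series, w_in, w_rec):
    """Reduce a variable-length series of per-session behavioural
    vectors to a single fixed-size state vector via a plain Elman
    recurrence. Stand-in for the paper's trained LSTM encoder."""
    h = np.zeros(w_rec.shape[0])
    for x in session_series:   # one update per observed game session
        h = np.tanh(w_in @ x + w_rec @ h)
    return h                   # same dimensionality for any series length
```

The resulting fixed-size vectors can then be fed to downstream algorithms (e.g. the Mini-Batch K-Means partitioning of section 4.3) regardless of how many sessions each individual produced.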

Limitations and Future Directions
The work we just presented is not exempt from limitations. First, since our approach attempts to solve an inverse problem, the issue of uniqueness arises. Many different latent states might have produced the behavioural patterns that our model observed, and there is no guarantee of a strict one-to-one mapping between the representation generated by our model and attributed incentive salience. Our approach is formally different from that of TD Learning and does not model the process of incentive salience attribution but rather attempts to approximate the product of this process (i.e. changes in attributed incentive salience). For this reason a direct comparison with the work of McClure et al. (McClure et al., 2003) and Zhang et al. (Zhang et al., 2009) is difficult. Moreover, unlike TD learning (Schultz et al., 1997), our model is not guaranteed to converge on a quantification of V that is directly comparable to its biological counterpart or that has arisen from the same type of computations. This is also reinforced by the differences in mechanistic functioning between biological and artificial neural networks (Lillicrap and Santoro, 2019; Lillicrap et al., 2020). These issues are partially attenuated by the constraints provided by our theoretical framework but, in line with similar reports in the literature (Calhoun et al., 2019; Wang et al., 2018b), a verification based on controlled experiments is desirable. This could be achieved by applying our approach to behavioural data acquired in laboratory settings or by investigating differences and similarities between the computations achieved by our approach and those produced in simulation experiments. Differently from the works of Calhoun et al. (Calhoun et al., 2019), McClure et al. (McClure et al., 2003) and Zhang et al.
(Zhang et al., 2009), our methodology relies on a supervised learning approach to perform both prediction of future behaviour and latent state estimation, making these two tasks infeasible before any data are observed. This limitation could be attenuated by initializing our model using a representation generated in an unsupervised manner. As we mentioned in section 3.2, the reward dynamics generated by the interaction between the individual and the game incentive mechanics play an important role in determining the intensity of future playing behaviour (Agarwal et al., 2017; Ašeriškis and Damaševičius, 2017; Wang et al., 2018a). In addition to this, we know that these dynamics are modulated by the internal state of the individual (Zhang et al., 2009) and by the context in which they are generated (Palminteri et al., 2015). These factors were only partially captured by our approach, as they require a higher temporal resolution (i.e. within rather than between sessions) as well as more granular indices (i.e. in-game and environmental information) than those we employed. As a consequence, our approach, despite outperforming competing ones, still achieves a relatively high error rate in predicting some behavioural targets (e.g. Future Absence). Moreover, it is not possible to determine whether the differences in the behavioural profiles observed in Figure 15 should be ascribed to internal factors of the individuals or to changes in their environment or inside the game. A possible solution would be to incorporate information about the context in which the observed behaviour occurred (e.g. time, location or in-game events) and adapt the architecture of our model accordingly. As well as improving the performance of the model, this should also increase the quality of the generated representation and, consequently, that of the derived behavioural profiles. The behavioural profiles identified by the partition analysis generally reflect those predicted by theories
of reward-driven motivation (Thorndike, 1927; Skinner, 1965; Berridge, 2004), but they also show some unexpected and potentially contradictory results (see the differences between partitions 1 and 2 and between partitions 3 and 4 in Figure 15B). Given the observational setting and the unsupervised learning analysis we adopted, the explanations provided in section 4.3 should be taken with caution and be seen mostly as a starting point for future investigations. Clarifying the nature of these discrepancies may require experimental work in more controlled settings. Lastly, despite the fact that our approach appeared to deal gracefully with objects having different structural characteristics, these were limited to the domain of video games. In order to verify the generalizability of our approach, future work should include data generated in a variety of contexts (e.g. web services, online and laboratory-based experiments).

Declarations
Ethical Approval All data were obtained and processed in compliance with the European Union's General Data Protection Regulation (EU).

Fig. 15: Panel A shows how the optimal number of partitions was identified using the "elbow method". Here the x axis indicates the number of partitions tested while the y axis shows the associated inertia. The point of maximum curvature (i.e. the optimal number of partitions) was found by identifying the number of partitions maximizing the distance (i.e. the vertical line) from the overall gradient of the inertia (i.e. the oblique line). Panel B shows the identified partitions and associated behavioural profiles for the game object hmg at t4. The big panel reports the same UMAP reduction presented in the last column of Figure 13. Each dot is the representation associated with a particular individual and is colour coded based on the partition to which it belongs. The small panels represent the temporal evolution of four of the considered behavioural metrics for each identified partition. The panel relative to N° Sessions only reports the prediction produced by the model, as the number of preceding sessions is constant across partitions. The x axis reports the game sessions while the y axis reports the value assumed by the considered metric at a specific point in time. The y axis is expressed in number of standard deviations from the game population mean (i.e. z-scores). Each line indicates the mean z-score while the shaded area around it is its 95% confidence interval. The solid part of each line indicates the portion of the temporal series observed by the model (i.e. the input) while the dotted part shows the predictions produced at that point in time.

Fig. 1 :
Fig. 1: Motivation as a vector. Blue and red dots represent two objects with different characteristics, while the two arrows illustrate the hypothetical motivational propensity of an individual (or two individuals) towards them. The black segments delineate the space created by the combination of the objects' characteristics and the motivational propensity of the individuals. Here the red object has the potential to generate more behaviour than the blue object, possibly as a result of its characteristics and those of the individual interacting with it.

Fig. 3 :
Fig. 3: Graphical representation of TD Learning. Red arrows indicate the flow of the computations for deriving δ and updating V expressed by equations 2 and 3. Black arrows instead indicate the changes of V moving from s_t to s_{t+1}. Solid circles indicate states which have already been observed, while dashed ones represent future, not-yet-observed states.
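The computations summarized in the figure can be sketched as a minimal tabular TD(0) update. This assumes δ and the update of V (equations 2 and 3 in the paper) take the standard form δ = r + γ·V(s_{t+1}) − V(s_t) followed by V(s_t) ← V(s_t) + α·δ; the state names and the values of α and γ below are illustrative, not the paper's.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Compute the TD error delta and update the value estimate V[s] in place.

    V is a dict mapping states to value estimates; unseen states default to 0.
    """
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

# One observed transition s0 -> s1 with reward 1.0:
V = {"s0": 0.0, "s1": 0.0}
delta = td0_update(V, "s0", r=1.0, s_next="s1")
# delta = 1.0 + 0.9*0 - 0 = 1.0, so V["s0"] moves from 0.0 to 0.1
```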

Fig. 5 :
Fig. 5: Differences in single-step prediction between feed-forward (A) and recurrent (B) neural networks. Adapted from Bengio et al. (2017) and Lillicrap and Santoro (2019), the figure represents how feed-forward and recurrent ANNs could be used for estimating V(s_t). Here OB = {OB_t : t ∈ T} indicates the series of inputs of length T that the network receives, while the target is the lead-1 version of the B portion of the same series. The series V = {V(s_t) : t ∈ T} corresponds to the representations generated by combining the input with the parameters Θ learned by the network in order to approximate the target. Circles indicate computational blocks similar to those present in figure 4. Black and red arrows are respectively the direction of the computations and the flow of the error gradient.

Fig. 6 :
Fig. 6: Multi-task learning in an ANN. Adapted from Bengio et al. (2017). The figure represents how multi-task learning could be used in an ANN to force the latent representation h to be a sensible approximation of V(s_t). Here V(s_t) indicates the representation generated by a recurrent layer at time t, while B_{t+1} = {B^n_{t+1} : n ∈ N} are N targets quantifying the strength of the next interaction (in terms of frequency and amount of behaviour) between I and O. Black and red arrows are respectively the direction of the computations and the flow of the error gradient. Circles indicate computational blocks similar to those in figures 4 and 5.
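The multi-task objective in the figure can be sketched as follows: a shared representation h feeds N target-specific heads, and the training loss is accumulated over all targets. The linear heads, the squared-error loss and the dimensions below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

h = rng.normal(size=(8,))        # shared latent representation, approximating V(s_t)
W = rng.normal(size=(5, 8))      # one linear head per target (N = 5 here)
targets = rng.normal(size=(5,))  # B_{t+1}: the N next-interaction targets

preds = W @ h                                    # one prediction per target
multitask_loss = np.sum((preds - targets) ** 2)  # sum of per-target losses
```

Because the gradient of this summed loss flows back through every head into h, the shared representation is pushed to carry information relevant to all N targets at once.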

Fig. 8 :
Fig. 8: Experimental Pipeline. Arrows indicate the flow of the pipeline. Big coloured blocks are major pipeline steps; white rectangles indicate sub-tasks within each step.

Fig. 9 :
Fig. 9: Aggregated comparison of model performance. Overall, our approach (RNN) outperforms all the competing approaches. Box-plots show the 10-fold cross-validation performance expressed as the total percentage of error (i.e. SMAPE) of each model over the five targets.
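The error reported in the box-plots is SMAPE. A common formulation is sketched below; the exact variant used in the paper (e.g. how zero denominators are handled) is an assumption.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0-200 range)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    # Define the term as 0 when both true and predicted values are 0.
    safe_denom = np.where(denom == 0, 1.0, denom)
    ratio = np.where(denom == 0, 0.0, np.abs(y_true - y_pred) / safe_denom)
    return 100.0 * ratio.mean()

smape([100, 200], [110, 180])  # ~10.03% error
```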

Fig. 10 :
Fig. 10: Dis-aggregated comparison of models' performance. Our approach (RNN) outperformed all competing ones on each target. It consistently used fewer parameters and had shorter computation time than the second best performing model. Box-plots show the 10-fold cross-validation performance expressed as percentage of error (i.e. SMAPE) of each model for the five targets. The bar-plot on the top row indicates the number of free parameters for each model while the bar-plot on the bottom row shows the average time for each training epoch. Both bar-plots are log10 scaled.

Fig. 11 :
Fig. 11: The activity of the RNN's artificial neurons is markedly redundant. Panel A shows the cross-correlation between the activity of the RNN's artificial neurons in the game object hms going from t1 to t4. The y and x axes are symmetrical and identify the RNN artificial neurons, while the coloured cells report the Spearman's Rho correlation coefficient for the activation of each pair of neurons. White cells represent combinations for which the correlation coefficient was lower than 0.05. Two principal components can explain a large portion of variance in the representation generated by the RNN. Panel B shows the percentage of explained variance when considering 2 to 20 principal components for each game object going from t1 to t4. The y axis indicates the percentage of explained variance while the x axis the number of principal components considered.
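The explained-variance analysis of Panel B can be sketched with PCA via an SVD of the centred activation matrix. The synthetic activations below (many units driven by two latent factors) stand in for the real RNN activations, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, highly redundant activations: 32 units driven by 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 32))
activations = latent @ mixing + 0.01 * rng.normal(size=(500, 32))

# PCA explained-variance ratios via SVD of the centred data matrix.
X = activations - activations.mean(axis=0)
s = np.linalg.svd(X, compute_uv=False)
explained = s**2 / (s**2).sum()

# With two dominant latent factors, the first two components
# account for almost all of the variance.
explained[:2].sum()
```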

Fig. 12 :
Fig. 12: The representation generated by the RNN model distinguishes between different game objects while maintaining an overarching organization able to capture variations in the expected intensity of future interactions that individuals will have with a specific game object. Panel A shows the two-dimensional projection, produced by UMAP, of the multi-dimensional representation inferred by the RNN at t1. We can read the values of the x and y axes as a coordinate system where proximity represents similarity between points in the original high-dimensional space. Each point indicates the representation inferred by the RNN model after observing one game session from a single user. The colours in the Game Context panel indicate the game object from which the representation is coming. Colours in the small panels represent the discounted sum of all future predictions for a particular target (for example, estimated Future Session Time), B_{t2:T} = Σ_{i=0}^{T} γ^i B_i with γ = 0.1, as illustrated in equation 1. Each unit encodes the intensity of future interactions through multiple non-monotonic functions. Panels B and C show the relationship between the activation of randomly selected hidden units in the LSTM layer of the RNN and the model's predictions at t1. Panel B shows the relationship between the discretized activation of 10 randomly selected units (artificial neurons), plotted along the y axis, and the predictions made by the model at t1 (colour coded from blue to red as in the small panels in A) for the game object hmg. Panel C shows in more detail the relationship between discretized activation and RNN predictions for a single unit, highlighted by a black box in Panel B. Here the x axis indicates the discretized activation while the y axis the mean discretized discounted sum of all future predictions produced by the model. Vertical lines are standard errors of the mean. The red curve is the line of best fit provided by a generalized additive model (Servén and Brummitt, 2018) while the box reports the MIC and the correlation coefficient (Spearman's ρ) between the artificial neuron activation and the model's predictions.
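The discounted sum used to colour the panels can be sketched in a few lines, following the caption's formula with γ = 0.1; the example values are illustrative.

```python
def discounted_sum(predictions, gamma=0.1):
    """Sum future predictions B_i weighted by gamma**i (geometric discounting)."""
    return sum((gamma ** i) * b for i, b in enumerate(predictions))

# Three future predictions: 2.0 + 0.1*1.0 + 0.01*4.0 = 2.14
discounted_sum([2.0, 1.0, 4.0])
```

With γ = 0.1 the weight decays by a factor of ten per step, so the colouring is dominated by the nearest future predictions.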
Consent to Participate Not Applicable. Consent to Publish Not Applicable. Funding This work was supported by the EPSRC Centre for Doctoral Training in Intelligent Games & Games Intelligence (IGGI) [EP/L015846/1].

Fig. 13 :
Fig. 13: The representation generated by the RNN model maintains its discriminant properties over time. Panel A shows a two-dimensional projection of the multi-dimensional representation inferred by the RNN at t2, t3 and t4. The inferred representation maintains its gradient-like organization over time with an increased ability to differentiate between game objects. As in Figure 12, x and y axes are dimensions individuated by the UMAP algorithm and can be interpreted as a coordinate system where proximity represents similarity between points. Colours in the first row indicate which game object the representation is coming from while those in the second row indicate the discounted sum of future predictions for a single target (i.e. "Future Session Time"). The units constituting the generated representation encode functions that are consistent over time. Panels B and C show the relationship between units' activation and the model's predictions over time for the game object hmg. Different units appear to encode the same target with different non-monotonic functions which are relatively consistent over time. Panel B illustrates the relationship between the same 10 randomly selected units specified in Figure 12 and the predictions made by the model for Future Session Time at t2, t3 and t4. Panel C shows in more detail the relationship of the three artificial neurons, highlighted by black boxes in B, across time. Each row is a different unit while each column corresponds to a different t. The x axis indicates the discretized activation while the y axis the mean discretized discounted sum of all future predictions. Vertical lines are standard errors of the mean. The red curve is the line of best fit provided by a generalized additive model (Servén and Brummitt, 2018) while the box reports the MIC and the correlation coefficient (Spearman's ρ) between the artificial neuron activation and the model's predictions. The generated representation produces areas of low and high expected intensity among which individuals move over time. Panel D shows trajectories through time produced by a version of UMAP that incorporates temporal information. Data are drawn from random subsets of individuals having low, medium and high variability in their expected amount of future behaviour. The representation inferred by the RNN model produces "hot" (i.e. the left side) and "cold" (i.e. the right side) regions, representing high and low expected Future Session Time, that are spatially consistent over time. Individuals appear to either stay in the same region or move between regions over time. Here each line represents variations in the representation generated by the RNN model for a single user over four temporal steps. Continuity is generated by means of cubic spline interpolation for the lines and by linear interpolation for the colours. The x and y axes are the dimensions individuated by the UMAP algorithm while the z axis indicates the associated point in time. Colours indicate the discounted sum of future predictions produced by the model at a specific point in time.

Fig. 14 :
Fig. 14: The representation generated by the MLP model is less effective at distinguishing between different game objects and different levels of expected future behaviour intensity. Panel A shows a two-dimensional projection of the multi-dimensional representation inferred by the MLP at t1, t2, t3 and t4. Differently from the RNN, the representation shows a disruption in the gradient-like organization and a reduced ability to differentiate between game objects, which remain constant over time. The x and y axes are dimensions individuated by the UMAP algorithm and can be interpreted as a coordinate system where proximity represents similarity between points. Colours in the first row indicate which game object the representation is coming from while those in the second row indicate the discounted sum of future predictions for a single target (i.e. "Future N° of Sessions"). The representation generated by the MLP model is also less effective at distinguishing different levels of expected behaviour intensity for states that are further away in the future. Panel B shows a two-dimensional projection of the multi-dimensional representation inferred by the RNN (left) and MLP (right) at t1 but colour coded with the discounted sum of future predictions from t4 onward. The representation generated by the RNN is able to maintain a gradient-like organization even for states that are further away in the future while this capacity is almost entirely lost for the MLP. The colours in the Game Context panel indicate the game object from which the representation is coming. Colours in the small panels represent the discounted sum of all future predictions for a particular target computed from t4 onward instead of from t1. The x and y axes are the dimensions individuated by the UMAP algorithm.

Fig. 15 :
Fig. 15: Partitioning the representation generated by the RNN model produces behavioural profiles with distinct characteristics. Panel A shows how the optimal number of partitions was individuated using the "elbow method". Here the x axis indicates the number of partitions tested while the y axis shows the associated inertia. The point of maximum curvature (i.e. the optimal number of partitions) was found by identifying the number of partitions maximizing the distance (i.e. the vertical line) from the overall gradient of the inertia (i.e. the oblique line). Panel B shows the individuated partitions and associated behavioural profiles for the game object hmg at t4. The big panel reports the same UMAP reduction presented in the last column of Figure 13. Each dot is the representation associated with a particular individual and is colour coded based on the partition to which it belongs. Small panels represent the temporal evolution of four of the considered behavioural metrics for each individuated partition. The panel relative to N° Sessions only reports the predictions produced by the model, as the number of preceding sessions is constant for all the partitions. The x axis reports the game sessions while the y axis the value assumed by the considered metric at a specific point in time. The y axis is expressed in terms of number of standard deviations from the game population mean (i.e. z-scores). Each line indicates the mean z-score while the shaded area around the line its 95% confidence interval. The solid part of each line indicates the portion of the temporal series observed by the model (i.e. the input) while the dotted part the predictions produced at that point in time.
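The maximum-distance criterion described in Panel A can be sketched as follows: take the straight line joining the first and last inertia values and pick the candidate whose point lies farthest from it. The function name and example values are illustrative; this is a sketch of the criterion, not the paper's implementation.

```python
import numpy as np

def elbow_point(ks, inertias):
    """Return the number of partitions whose (k, inertia) point lies farthest
    from the chord joining the first and last points of the inertia curve."""
    ks = np.asarray(ks, dtype=float)
    ys = np.asarray(inertias, dtype=float)
    p1 = np.array([ks[0], ys[0]])
    p2 = np.array([ks[-1], ys[-1]])
    d = p2 - p1
    d = d / np.linalg.norm(d)  # unit vector along the chord
    vecs = np.stack([ks, ys], axis=1) - p1
    proj = np.outer(vecs @ d, d)              # projection of each point onto the chord
    dists = np.linalg.norm(vecs - proj, axis=1)  # perpendicular distances
    return int(ks[np.argmax(dists)])

# A typical inertia curve with a clear elbow at k = 3:
elbow_point([1, 2, 3, 4, 5, 6], [100, 40, 20, 15, 12, 10])
```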

Fig. 16 :
Fig. 16: UMAP reduction and artificial neuron activation profiles of the RNN representation at t2, t3 and t4 for the target Future N° Sessions.

Fig. 18 :
Fig. 18: UMAP reduction and artificial neuron activation profiles of the RNN representation at t2, t3 and t4 for the target Future Active Time.

Fig. 19 :
Fig. 19: UMAP reduction and artificial neuron activation profiles of the RNN representation at t2, t3 and t4 for the target Future Session Activity.

Fig. 21 :
Fig. 21: Partitions and associated behavioural profiles for the game object jc3.

Fig. 22 :
Fig. 22: Partitions and associated behavioural profiles for the game object jc4.

Fig. 23 :
Fig. 23: Partitions and associated behavioural profiles for the game object lis.

Fig. 24 :
Fig. 24: Partitions and associated behavioural profiles for the game object lisbf.

Table 1 :
Description of Selected Telemetries

Table 2 :
Descriptive Statistics of Considered Metrics and Games

A feedforward (i.e. ENet, MLP) and B recurrent (i.e. RNN) architectures (Chollet et al., 2015). Blue, orange and green shapes represent respectively feedforward, embedding and LSTM layers. Embedding layers are a type of feedforward layer specifically designed for dealing with categorical inputs (Chollet et al., 2015). Gray shapes indicate operations with no learnable parameters, such as array instantiation and concatenation. Stacked, transparent colouring indicates arrays with a sequential structure. Straight and curved arrows refer to the presence of feed-forward or recurrent information flow. The red highlight shows the portion of the model we hypothesize is inferring an approximation of attributed incentive salience.

Table 3 :
Results of LMM on Collapsed Targets (Sum)

Table 5 :
Results of LMM on Non-Collapsed Targets

Table 6 :
LMM Post-Hoc on Non-Collapsed Targets