1 Introduction

The convergence of a heightened focus on quality of life, well-being, and societal climate resilience, together with the transformative impact of the global COVID-19 pandemic, has created an opportunity to reconsider how we plan and design spaces. Moreover, architectural design features such as shape or building orientation have a significant influence on energy loads and their trade-offs [26]. This creates new challenges in re-imagining the way these common areas are designed, operated, and used, thus bringing attention to analysing the behaviour of the people who interact and live in public spaces. Tuan [41] states that “place is space infused with human meaning” and argues for two important concepts: that humans are rooted in place, and that they possess and cultivate a sense of place. The impact of environmental problems on humans is significant, affecting all human activities, including health and socio-economic development. Thus, there is a need to rethink how space is used. A well-designed public space matches the multiple needs of everyday and one-time users, and it should respect the European guidelines that advocate sustainability in planning. Providing planners with Information and Communication Technologies (ICT) tools that can facilitate the definition of guidelines or protocols for their investigation is fundamental to determining the significance of such areas for society [32].

1.1 Challenges

Given such a premise, and considering the recent literature in the field, the design of public space starts from what occurs in that environment. Understanding the behaviours, actions, and attitudes of the people who use a space is more informative than the standard design rules that are usually applied. Human Behaviour Analysis (HBA) benefits from the prediction, generation, and simulation of human behaviour [19]. Human trajectory analysis is a research area within HBA; when humans move in a given environment, they intuitively follow unwritten social rules [34]. Their behaviour also strongly depends on the type of environment in which they operate (e.g., malls, parks, or sidewalks). Predicting human behaviour through trajectory prediction is a burdensome task in several respects. Recent research in computer vision has addressed these challenges, among which the most important are the following [12]:

  • Socially acceptable movements. Some paths are physically possible but are usually not taken, in order to comply with implicit social rules, such as respecting minimum interpersonal distance.

  • Human-space interaction. The surrounding environment affects human actions. Obstacles and objects all have, in one way or another, an effect on human behaviour. It is, therefore, important to model these interactions and try to describe them.

  • Human-human interaction. Human trajectories depend heavily on the behaviour of the surrounding people. A human being can predict the behaviour of other people and, consequently, adjust their movements to avoid them.

  • Multimodality. Since human behaviour is unpredictable, several plausible behaviours, and hence several correct solutions, exist for any observed past.

  • Generalisability. A method should be evaluated for its ability to predict the entire distribution of possible human trajectories.

Solving this problem enables many practical applications, ranging from data visualisation to simulation. It makes it possible to deduce whether a specific configuration of the environment shapes human behaviour and how it influences Key Performance Indicators (KPIs). Given an environment with physical, social, or semantic limits and constraints, it becomes possible to correctly predict or simulate the flow of human trajectories for a specific period and in a generalisable manner. State-of-the-art approaches are primarily based on Generative Adversarial Networks (GANs) or Long Short-Term Memory (LSTM) networks [4], [12] in crowded environments [30], and most of them do not model user behaviour in the surrounding environment but merely generate acceptable and realistic trajectories. In [34], we filled this gap by proposing and defining new methods and metrics to help understand trajectories. In particular, new deep learning models based on LSTM and GAN architectures are used in both unimodal and multimodal contexts.

However, frameworks based on Inverse Reinforcement Learning (IRL) closely approximate trajectories produced by humans [17], [46], and Generative Adversarial Imitation Learning (GAIL) has proven to be a powerful and practical approach for learning sequential decision-making policies [13]. GAIL allows us to find a correlation between objects present in the scene and proximity to the search target. An analogy can be drawn between the search for a certain category of products inside the retail space and the correlation of these products with others close by.

1.2 Nature and scope

In this regard, this paper presents GREEN PATH (GREEN space Planning by prediction and generAtion of Trajectories of Humans), a system for creating opportunities to develop resilient and regenerative approaches to public space design and utilisation. The goal of GREEN PATH is not merely the design of a space, but also the creation of a new model that is more sustainable, more agile, and smarter, and that can generate human trajectories in an environment with complex constraints. GREEN PATH uses human trajectories and deep learning methods to analyse and understand human behaviour, offering insights to layout designers.

More specifically, the paper proposes a predictive and generative model that can handle an environment with complex constraints. In particular, it proposes a framework based on the work of [46], hybridised with classical reinforcement learning methods, such as continuous penalties, which allow for modelling the shape of the trajectories and inserting into the training a bias necessary for the generation. The structure of the framework and the formalisation of the problem allow the results to be evaluated along two axes: generation and forecasting. Generation refers to the creation of trajectories from scratch, given determined points of origin. These trajectories are completely new, and the evaluation concerns quality and efficiency: efficiency measures how closely the agent's search matches human efficiency, while quality indicates the ability to create realistic trajectories and is evaluated by comparing the generated trajectories with those of the test set. Forecasting refers to the prediction of future paths for real trajectories that have already started; here, the geometric proximity of the generated points to the real ones is verified.

The approach has been applied to real scenarios, and the experiments were assessed on four datasets derived from different stores over two years. The behaviour of 10.4 million visitors was analysed, as described in [8, 29].

1.3 Contributions

GREEN PATH will make extensive use of AI to automate the following:

  • Human behaviour understanding and forecasting, through the creation of a widely generalisable system that allows human trajectories to be generated from scratch.

  • Space interpretation and virtualisation, through a representation of the state that can easily be extended to different contexts. Largely inspired by the Dynamic Context Beliefs (DCB) of Yang et al. [46], and taking inspiration from the video-game world, a dynamic representation system has been developed.

  • Content creation and human-space interaction, with the verification and resolution of the problem concerning the form of the reward in the work of Yang et al. [46], in particular regarding the formulation of the reward function.

  • Design and arrangement. While a manually tagged dataset is used for training, this does not exclude other sources for the states: since the state is based on exploration, its creation can be generalised during the deployment phase and carried out through other methods, such as visual input from a robot.

The paper is organised as follows. Section 2 provides an overview of state-of-the-art approaches for trajectory prediction and generation. Section 3 presents the proposed approach, which is based on GAIL. Section 4 compares our approach with several state-of-the-art algorithms and provides a detailed analysis of our framework. Limitations are presented in Section 5, conclusions and discussion in Section 6, and future directions for this research in Section 7.

2 Related works

Human trajectories are information-rich features that can help in understanding the environment, giving an idea of the interactions between objects and ongoing events [28]. Modelling human behaviour has enormous potential, especially from an economic and strategic point of view. When people walk in a space, they adhere to a huge number of unwritten rules and observe social practices [20]. For instance, as they move through a space, they keep to their own paths and yield to nearby people who have the right of way. The ability to model these unwritten rules and apply them to predict, understand, and generate users' movements in an environment is extremely worthwhile for the design of intelligent tracking systems in smart environments. The problem is challenging, since several issues arise in predicting and generating human actions while taking such common-sense behaviour into account [1].

Tracking people to understand human behaviour has a long tradition in computer vision literature [31], [21], [45], [39]. However, recently, predictive models have gained increased interest [44], [43]. Trajectory prediction is achieved by modelling and learning human-space [3], [14] or human-human interactions [33], [24].

Predictive models of pedestrian dynamics have been developed by encoding the coupled nature of multi-pedestrian interactions using game theory and deep-learning-based visual analysis to estimate person-specific behaviour parameters [24], [22]. In particular, the authors used concepts from game theory to model the intertwined decision-making processes of multiple pedestrians. Moreover, they used visual classifiers to learn a mapping from pedestrian appearance to behaviour parameters.

Social acceptability has been investigated using data-driven techniques based on Recurrent Neural Networks (RNNs). Alahi et al. [1] proposed a model called Social LSTM, which can learn general human movement and predict future trajectories. The proposed model can simultaneously predict the paths of all the people in a scene, considering the common-sense rules and social conventions that humans generally adopt as they operate in public environments. In particular, the authors introduced a “social” pooling layer that allows LSTMs of spatially proximal sequences to share their hidden states.

Bartoli et al. [4] extended the work of Alahi et al. [1] by defining “context-aware” pooling that allows the model to deal with static objects in the region around a person. In particular, their approach is based on the LSTM network that can learn and predict human movement in crowded environments.

To address the limitations of the aforementioned works, Gupta et al. [12] exploited GANs to generate multiple socially acceptable trajectories given an observed past, i.e. socially accepted motion trajectories in crowded spaces. Their model is called “Social GAN”, and it addresses the multimodality of trajectories.

Kothari et al. [20] define trajectory prediction as “given the past trajectories of all humans in a scene, forecast the future trajectories which conform to the social norms”. To focus on learning the social interactions that affect human motion, the authors assume that there are no physical constraints in the scenes. They also focus on short-term human trajectory forecasting (the next 5 s). The authors of [47] present early experimental results obtained by including social information in their convolutional model using occupancy grids and maps. These experiments empirically showed that occupancy methods are ineffective in representing social information and did not improve their results.

These works are milestones for human-human interactions. However, their purpose is to predict micro-trajectories, i.e. the precise generation of the points following the current one, whereas the interest of this work lies mainly in macro-trajectories. As stated in the Introduction, this paper also focuses on multimodality and human-space interaction.

In this regard, Kim et al. [17] proposed a framework for socially adaptive path planning in dynamic environments. In particular, they used an IRL module that adopted a set of trajectories generated by an expert for learning expert behaviour with several state features.

In [46], the authors proposed the first IRL model for learning the internal reward function and policy used by humans during a visual search. The purpose of this work was to reproduce the trajectory of the human gaze as it searches for a given object within the image. The theoretical basis of this work is the association that the human mind makes between objects that are necessary for or related to the achievement of a given goal.

A bottleneck of reinforcement learning is that it concerns the optimisation of a predefined reward function [38], and designing a suitable reward function can be arduous in complex environments. Imitation learning approaches have been shown to close this gap by learning how to perform tasks directly from expert demonstrations [15]. Among these, GAIL is a highly efficient model-free imitation learning method [13].

In this context, Li et al. [23] proposed an algorithm that can deduce the latent structure of expert demonstrations in an unsupervised manner. Their method, based on GAIL, can not only emulate complex behaviours but also learn interpretable and essential representations of behavioural data such as visual demonstrations. The application domain was autonomous driving, where the goal was to mimic human driving behaviour. The results obtained were fair, despite the difficulty of the task. The most interesting part of this work was the improvement of GAIL performance using a modified version of the Wasserstein loss [2], often used in GANs since it eliminates some problems, such as vanishing gradients or getting stuck in local minima. They also used “reward augmentation” [6], which consists of adding an a priori reward, modelling a bias to be reflected in the model training, to the reward provided by the discriminator.

The work of Ferracuti et al. [7] concerns the retail environment and uses Real-Time Locating System (RTLS) tags to collect human trajectory data. The tags were used to infer visitors’ preferred paths and their segmentation. In the same context, Paolanti et al. [28] presented a smart mechatronic system (sCREEN, Consumer REtail ExperieNce) for indoor navigation assistance. The system is based on a new Hidden Markov Model (HMM) to represent shoppers’ shelf/category attraction and usual retail scenarios (shelf-out-of-stock and modification of store layout).

In [5], the authors proposed a unified deep learning framework for the generation and analysis of driving scenario trajectories and validated its effectiveness in a principled way. To model and generate scenarios of trajectories with different lengths, they have developed two approaches. Firstly, they adapted a Recurrent Conditional Generative Adversarial Network (RC-GAN) by conditioning the length of the trajectories. Then, they designed an architecture based on a Recurrent Autoencoder with GANs to obviate the variable length issue, wherein they trained a GAN to learn/generate the latent representations of original trajectories.

Based on the idea proposed by [46], this paper attempts to solve the problem of generating trajectories inside a store. Assuming that the elements of the scene in the case of gaze prediction are analogous to the categories near the customer in the case of movements inside a store, it is also possible to generalise the forecasting of trajectories starting from paths already partially formed. In this way, it is possible to understand, for example, the path taken by a customer starting from any point to reach each of the categories in the store. It also allows different cases and trajectories to be managed in such a way as to foresee any possible behaviour and movement of the customer, relying on statistical data to estimate the probability of purchase or interest in each category. Marketing strategies can then be developed for specific customers, and can also be applied in real time.

3 Materials and methods

GREEN PATH exploits an advanced intelligence system made of RTLS tags to collect human trajectory data; the system analyses, monitors, and understands everything that happens inside the target area, and provides alternatives in the design process. Following the idea presented in [46], we propose a GAIL framework to predict human trajectories in real environments. The framework aims to model such behaviour through a state representation that considers the influence of the environment on the short-term decision-making process of the user. A sparse matrix is adopted, which comprises C channels and has a fixed dimension of \( 47 \times 47 \). Every channel contains a representation of the position of a certain category in the target store. Our dataset consists of data collected from four stores, encoded in this way. For each location, we have several points related to the tags. These tags are placed on a shopping cart or a basket; hence, their points need to be split into trajectories. The framework is comprehensively evaluated on the “Shopper trajectories dataset”, a publicly available dataset. The overall framework of GREEN PATH is depicted in Fig. 1.
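
For concreteness, the following minimal sketch (in Python, with NumPy) shows how such a sparse, channel-per-category encoding could be built; the `category_cells` input format is hypothetical and not taken from the paper.

```python
import numpy as np

GRID = 47  # fixed discretisation of the store floor plan

def encode_store(category_cells, n_categories):
    """Build the sparse C x 47 x 47 encoding: one channel per category,
    with ones in the grid cells that the category occupies.

    category_cells: dict mapping category id -> list of (row, col) cells
    (a hypothetical input format, for illustration only).
    """
    state = np.zeros((n_categories, GRID, GRID), dtype=np.float32)
    for cat, cells in category_cells.items():
        for r, c in cells:
            state[cat, r, c] = 1.0
    return state
```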

Fig. 1
figure 1

GREEN PATH workflow

3.1 Shopper trajectories dataset

The dataset used in this work was acquired from four different stores in Germany and Indonesia, measuring the behaviour of 10.4 million shoppers over two years, as described in [8]. The data were collected with a tracking system based on Ultra Wideband (UWB) technology, with tags embedded in shopping carts. UWB is suitable for applications where positioning accuracy is a critical issue [7]. The technology uses UWB antennas suitably placed in a fixed area and battery-powered tags that can move freely within the area [28]. Figure 2 represents the layout of the four stores for which the “Shopper trajectories dataset” has been collected.

Fig. 2
figure 2

Layout of the four stores for which the “Shopper trajectories dataset” has been collected

Table 1 Datasets and number of data points

Table 1 reports the number of data points for each dataset. The number of trajectories in a dataset is approximately proportional to the number of data points.

3.2 GAIL framework

The GAIL framework [13] is an imitation learning approach similar to inverse reinforcement learning but formally different, since it does not explicitly attempt to recover the reward function. In this case, the reward function created is different from the implicit, hidden function of the expert. The intuition is to create a “judge” (the discriminator) that indicates to the agent what it should and should not do, based on the data obtained from an expert. The reward increases the closer the agent gets to what the judge deems correct.

Fig. 3
figure 3

Our GAIL framework

Our framework (see Fig. 3) consisted of three networks: the discriminator, the agent (the generator), and the critic. The critic and agent networks shared one feature-extraction layer, and the discriminator and the agent had an identical structure, unlike the Scanpath Prediction case, where there were some small differences in padding and kernel size. There were four layers of 128, 64, 32 and 1 filters, respectively, all with a \(3\times 3\) kernel and zero-padding of 1, except the first layer, which used reflection padding. This choice was dictated experimentally by the presence of higher probabilities (for the generator) and rewards (for the discriminator) along all the edges, which implies that zero-padding of the states is not the most suitable choice.

In this work, some convolution layers were added, with the same dimensions as the previous layer to be downsampled but with a stride of 2, to reduce the size of the output maps. This made downsampling more effective, since the network learns additional parameters that increase generalisation, instead of eliminating information with a predefined operation such as the maximum function. This approach is often recommended when using GANs, even though it has a greater number of parameters to train. The three networks used a dropout of 0.2 on most layers, and the activation function was changed from the Rectified Linear Unit (ReLU) to Leaky ReLU to avoid the dying-ReLU problem. The generator and critic networks were initialised with Xavier (also called “Glorot”) initialisation [9], while the standard PyTorch initialisation was kept for the discriminator, since it was still effective. Root Mean Squared Propagation (RMSProp) was chosen as the optimiser for training the discriminator, a choice motivated by its greater effectiveness with losses that use gradient penalties [11, 25]. Finally, regarding the loss, tests were carried out both with the Wasserstein Generative Adversarial Network (WGAN) version with gradient penalty and with the standard GAN version, the latter also with a gradient penalty centred at 0 and applied only to real data, since the literature guarantees better convergence and generalisation with this method [16, 25, 35].

We concatenated the chosen task with the output of every layer, as already done in [46]. In this way, we obtained a correlation between the chosen task and the action taken by the agent. In Fig. 3, it can also be seen that \(C = 30\), whereas there are only seven tasks: not all categories were considered as possible tasks, as many of them did not have enough customers with a considerable Cumulative Stopping Time (CST). C also includes the Fog of War Map (FoWM), detailed later, and a further map that contains the previous positions of the agent. The actor and critic networks were trained using proximal policy optimisation [37], with a learning rate of 0.00001 and a discount factor of 0.9; advantages were estimated using Generalised Advantage Estimation (GAE) [36]. Other hyperparameters were set to the values suggested in the original paper. The discriminator was trained using the standard GAN loss [10], with a learning rate of 0.00005.
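
As an illustration, the following PyTorch sketch shows a backbone consistent with the description above (four \(3\times 3\) convolutions with 128, 64, 32, and 1 filters, reflection padding on the first layer, Leaky ReLU, dropout of 0.2, and the one-hot task vector concatenated with the input of every layer). Details not stated in the paper, such as the exact way the task vector is broadcast to the spatial grid, are assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Minimal sketch of the agent/discriminator backbone described above.
    Exact per-layer wiring is an assumption, not taken from the paper."""

    def __init__(self, c_in=30, n_tasks=7):
        super().__init__()
        widths = [128, 64, 32, 1]
        layers, prev = [], c_in
        for i, w in enumerate(widths):
            # reflection padding on the first layer, zero-padding elsewhere
            pad = nn.ReflectionPad2d(1) if i == 0 else nn.ZeroPad2d(1)
            layers.append(nn.Sequential(pad, nn.Conv2d(prev + n_tasks, w, 3)))
            prev = w
        self.convs = nn.ModuleList(layers)
        self.act = nn.LeakyReLU(0.2)
        self.drop = nn.Dropout(0.2)

    def forward(self, state, task_onehot):
        # broadcast the one-hot task to a (B, n_tasks, H, W) map and
        # concatenate it with the input of every layer
        b, _, h, w = state.shape
        task = task_onehot[:, :, None, None].expand(b, -1, h, w)
        x = state
        for i, conv in enumerate(self.convs):
            x = conv(torch.cat([x, task], dim=1))
            if i < len(self.convs) - 1:
                x = self.drop(self.act(x))
        return x.flatten(1)  # per-cell action logits over the 47x47 grid
```

A policy would then apply a softmax over the flattened logits to select the next grid cell.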

3.2.1 Preprocessing

To obtain good results, it is mandatory to choose a proper splitting strategy that correctly models our requirements. The goal was a generalised framework that, given a certain map and a target category, generates a user trajectory towards that category. Therefore, the chosen splitting strategy should be related to an inferred task of the customers. To infer such tasks, we evaluated the CST, i.e. the time during which the points of a tag were stationary near a certain category. We split the trajectory when the CST exceeded a certain threshold, choosing the last stopping point as the last point of the trajectory, and then initialised another trajectory using the ending point of the previous one as its starting position. In this way, we also obtained a good generalisation for the generated trajectories, as they did not depend on the initialisation. We also split the trajectories when they reached a selected entrance or exit area. Lastly, after this separation, we filtered and discretised the points into a \( 47 \times 47 \) grid. To decrease the number of points, we also applied sampling, keeping only points whose Euclidean distance (calculated on the grid) from the previous point of the trajectory was at least 3 but not greater than 5.
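
A minimal sketch of the distance-based sampling step is given below; how points that jump farther than the upper bound are handled is an assumption, as the paper does not specify it.

```python
import math

def resample(points, d_min=3.0, d_max=5.0):
    """Keep only grid points whose Euclidean distance from the previously
    kept point lies in [d_min, d_max]. Points outside the band are simply
    skipped here (an assumption; the paper does not detail this case).
    """
    kept = [points[0]]
    for p in points[1:]:
        if d_min <= math.dist(kept[-1], p) <= d_max:
            kept.append(p)
    return kept
```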

We formalised the environment as a Markov decision process; therefore, we had to define our trajectories as sets of state-action pairs. In our case, the state is the portion of the store that the customer has seen so far, initially expressed as a map with zeroes in all positions. We were inspired by video games and by the military concept of the “fog of war”. Hence, we dynamically update the current state following the exploration of the agent: if the agent decides to move to a certain location, the current state moves to a new state that has the actual map values within a radial area around the new position. At the k-th step, we have a cumulative map with the real categories' values on the explored area and zeroes elsewhere (the “fog of war”). To further augment the information available to the agent, we added a FoWM as a new channel in the current state matrix. This map has ones at the points that have not yet been explored by the customer.
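
The following sketch illustrates the fog-of-war update under these assumptions; the reveal radius is a placeholder, as the paper does not report its value.

```python
import numpy as np

GRID = 47

def reveal(explored, pos, radius=5):
    """Update the boolean exploration mask after the agent moves to `pos`,
    revealing the map within a radial area (the radius is an assumption).
    """
    rr, cc = np.ogrid[:GRID, :GRID]
    explored |= (rr - pos[0]) ** 2 + (cc - pos[1]) ** 2 <= radius ** 2
    return explored

def observe(category_maps, explored):
    """State at step k: real category values on explored cells, zeroes under
    the fog; the FoWM channel marks the unexplored cells with ones."""
    visible = category_maps * explored          # broadcast over channels
    fow = (~explored).astype(np.float32)[None]  # 1 x 47 x 47 channel
    return np.concatenate([visible, fow], axis=0)
```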

3.2.2 Reward augmentation

The action space is also a \( 47 \times 47 \) matrix; hence, the agent can theoretically move anywhere on the map at every step. We experimentally observed that using a hard constraint on the movement of the agent leads to poor convergence of the overall framework. Therefore, we chose to adopt a “reward augmentation” method, as in [23], that models a soft constraint: the agent is penalised if it chooses to move to a point farther than a preset radius \(\phi \), with a penalty formulated as (1):

$$\begin{aligned} P_0 = \max (0,\lambda _{P_0} (\sqrt{(x_{i-1} - x_i)^2+(y_{i-1} - y_i)^2} - \phi )) \end{aligned}$$
(1)

This penalty is always non-negative and is subtracted from the final reward. The parameter \(\lambda _{P_0}\) is a constant that controls the influence of the penalty; we set it to a value of 0.1, and \(\phi \) was set to 5. We can also apply similar penalties for movements that are too close to the current position (see (2)):

$$\begin{aligned} P_1 = \max (0,\lambda _{P_1} (\phi _{Near} - \sqrt{(x_{i-1} - x_i)^2+(y_{i-1} - y_i)^2})) \end{aligned}$$
(2)

In our configuration, \(\lambda _{P_1}\) was 0.1 and \(\phi _{Near}\) was 1. It should be noted that these two penalties do not add a real bias to the training, as our preprocessing already sampled points at a distance between 3 and 5 from the previous points. Hence, they are closer to the “reward shaping” proposed by Ng et al. [27] than to the “reward augmentation” of Li et al. [23]. However, we noticed that the dataset contained a lot of noise, with points often located on shelves or walls. Thus, we added a biasing penalty that discourages the agent from moving towards these “un-walkable” points. This penalty cannot be directly formalised as a non-constant convex function, but we can use the distance between the current position and the “un-walkable” point, multiplied by a parameter \(\lambda _{P_2}\) that we set to 0.1. Another biasing penalty, added to improve the linearity of the generated trajectory (and to make the framework more resilient to noise), is applied to the maximum angle of movement. Consider three points on the trajectory: \(p_1(x_1, y_1)\), \(p_2(x_2, y_2)\) and \(p_3(x_3, y_3)\), with \(p_3\) being the move that the agent intends to make. We calculate the angle \(\theta _0\) as formulated in (3):

$$\begin{aligned} \theta _0 = \operatorname {arctan2}(\Delta x_{\theta _0}, \Delta y_{\theta _0}) \end{aligned}$$
(3)

\(\Delta x_{\theta _0}\) denotes the difference between \(x_{1}\) and \(x_{2}\), \(\Delta y_{\theta _0}\) is the difference between \(y_{1}\) and \(y_{2}\). We calculate \(\theta _1\) as follows in (4):

$$\begin{aligned} \theta _1 = \operatorname {arctan2}(\Delta x_{\theta _1}, \Delta y_{\theta _1}) \end{aligned}$$
(4)

As before, \(\Delta x_{\theta _1}\) is the difference between \(x_{2}\) and \(x_{3}\), and \(\Delta y_{\theta _1}\) is the difference between \(y_{2}\) and \(y_{3}\). Taking the absolute value of the difference, as in (5):

$$\begin{aligned} \Delta \theta = | \theta _1 - \theta _0 | \end{aligned}$$
(5)

we obtain the angle between the two lines, i.e. the angular variation of the last movement. We convert it to degrees and, if the value is greater than or equal to 180, we use the explementary angle, since we need the inner angle. The final penalty is expressed as (6):

$$\begin{aligned} P_3 = \max (0, \lambda _{P_3} ( \Delta \theta - \theta _{Max} )) \end{aligned}$$
(6)

where \(\lambda _{P_3}\) was set to 0.01, because this penalty is more biasing than the others. All these penalties hybridise our method with pure reinforcement learning. The final reward is the difference between the discriminator's output and the penalties.
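
Putting the pieces together, a sketch of the combined penalty computation might look as follows; \(\theta _{Max}\) is not reported in the paper, so the value below is a placeholder.

```python
import math

# hyperparameters taken from the values reported above,
# except THETA_MAX, which is an assumption
LAMBDA_P0, PHI = 0.1, 5.0           # far-movement penalty, Eq. (1)
LAMBDA_P1, PHI_NEAR = 0.1, 1.0      # near-movement penalty, Eq. (2)
LAMBDA_P3, THETA_MAX = 0.01, 90.0   # angle penalty, Eq. (6)

def penalties(p1, p2, p3):
    """Soft-constraint penalties for a step p2 -> p3, with p1 the point
    preceding p2. Returns the total value subtracted from the reward.
    """
    step = math.dist(p2, p3)
    p_far = max(0.0, LAMBDA_P0 * (step - PHI))        # Eq. (1)
    p_near = max(0.0, LAMBDA_P1 * (PHI_NEAR - step))  # Eq. (2)
    # angle variation between segments, Eqs. (3)-(5)
    t0 = math.atan2(p1[0] - p2[0], p1[1] - p2[1])
    t1 = math.atan2(p2[0] - p3[0], p2[1] - p3[1])
    dt = math.degrees(abs(t1 - t0))
    if dt >= 180.0:
        dt = 360.0 - dt  # explementary angle: keep the inner angle
    p_angle = max(0.0, LAMBDA_P3 * (dt - THETA_MAX))  # Eq. (6)
    return p_far + p_near + p_angle
```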

4 Results and discussions

In the following section, two different aspects of the obtained results are examined. The first goal was to achieve a search efficiency similar to human efficiency: our task is to imitate human behaviour, not to reach the task zone in the fewest number of steps, and the reward formulation must adhere to this requirement. Yang et al. [46] used the logarithm of the sigmoid function applied to the output of the discriminator. This function produces only non-positive values; hence, it is trivial to infer that the agent performing the actions is encouraged to complete the task as soon as possible, to minimise the total cost of the trajectory. For the generation, we use the Target Fixation Probability AUC in the same way as Yang et al. [46]. For this metric, we only need to rename the curve, since in our domain the term “Fixation” loses its meaning; we simply refer to it as the Cumulative Distribution Function AUC, or CDF-AUC. We compare the results with those obtained by splitting the test set into two parts and using the first part as if it contained generated trajectories. This provides a ground truth that we aim to approximate or, in the case of quality metrics, even surpass. We refer to the results obtained with this method as “Human”. Surpassing the “Human” results on quality metrics does not mean moving away from good imitation: quality metrics measure the similarity between the generated trajectories and the test-set trajectories, indicating a measure of generalisation, and the “Human” value is only a reference point that, depending on the extracted trajectories, may not be optimal. As the lower limit, we use trajectories generated with the untrained framework, resulting in completely random points; the values obtained in this way, which we call “Random”, represent the values we aim to distance ourselves from. Finally, we use a network trained through Behavioural Cloning (BC) as a last reference, to evaluate the difference achieved with this method. The trajectories generated with BC all have real initialisation. We call our method Trajectory Prediction (TP).
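
As an illustration, a plausible computation of the CDF-AUC from the per-trajectory steps-to-target is sketched below; the exact normalisation used in the paper is an assumption.

```python
import numpy as np

def cdf_auc(steps_to_target, max_steps):
    """CDF-AUC sketch: for each step k, the fraction of trajectories that
    have already reached the target, averaged over all steps (a plausible
    reading of the metric; the normalisation is an assumption).

    steps_to_target: array with the step at which each trajectory reached
    the target (np.inf if it never did).
    """
    ks = np.arange(1, max_steps + 1)
    cdf = np.array([(steps_to_target <= k).mean() for k in ks])
    return cdf.mean()  # AUC of the cumulative curve, normalised to [0, 1]
```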

Figure 4 represents this behaviour using the cumulative distribution function curve: for each step, the cumulative probability of having reached the target can be read off. We compared our method with the efficiency of the human trajectories in the test set and with an untrained generator that creates random trajectories. To have another imitation learning method for comparison, we also trained our generator with the behavioural cloning approach. In Fig. 4 and Table 2, super-human search efficiency can be seen for our method using a logsigmoid activation. Three different initialisation methods, namely Preset, Real, and Random, were used; these specify which point is taken as the first point of the trajectory. “Preset” considers a predefined point, such as a point near the checkout, “Real” chooses the first points from real trajectories of the dataset, and “Random” picks a random point in the store. All the experiments were performed using an Nvidia GeForce RTX 2080 Ti GPU (11 GB of memory) on a 48-CPU Linux machine with an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz and 220 GB of RAM. The codebase was developed in Python 3, using the PyTorch library for deep learning. Details about Python requirements are given in the codebase.

Fig. 4
figure 4

Cumulative distribution function using logsigmoid activation

Table 2 CDF-AUC results with logsigmoid reward activation

To overcome the limitations of a non-positive reward, we proposed to use a linear activation. In this variant, we directly used the output of the discriminator before the sigmoid function, obtaining a reward function with both positive and negative values. The discriminator was trained using the loss proposed by Goodfellow et al. [10]; therefore, its output before the last activation is small and centred on zero. Results of this method are shown in Fig. 5 and Table 3. Our method showed curves that match the human curve more accurately, particularly with the Real initialisation.

Fig. 5
figure 5

Cumulative Distribution Function using a linear activation

Table 3 CDF-AUC results with linear activation

The second aspect analysed was the quality of the generated trajectories. It is not trivial to evaluate the quality of a trajectory generated completely from scratch, without a ground truth. In this work, we propose to use metrics like Dynamic Time Warping (DTW) and Longest Common SubSequence (LCSS) to measure how similar a generated trajectory is to the test-set trajectories that share the same task [40].

The DTW distance between two trajectories A and B is calculated according to (7):

$$\begin{aligned} DTW(i, j) = d(A_i, B_j) + \min \big (&DTW(i-1, j), \\ &DTW(i, j-1), \\ &DTW(i-1, j-1)\big ) \end{aligned}$$
(7)

where \(DTW(i, j)\) is the DTW distance between the prefixes of A and B up to positions i and j, \(A_i\) and \(B_j\) are the elements at positions i and j, and \(d(A_i, B_j)\) is the local distance between them; the distance between the full trajectories is \(DTW(|A|, |B|)\). The LCSS (Longest Common SubSequence) similarity between two sequences A and B is calculated according to (8) [42]:

$$\begin{aligned} LCSS(A, B) = \frac{|LCS(A, B)|}{\min (|A|, |B|)} \end{aligned}$$
(8)

where \(LCS(A, B)\) is the Longest Common Subsequence of A and B, and \(|A|\) and \(|B|\) are the lengths of sequences A and B.

To calculate an Average DTW distance (ADTW) and an Average LCSS similarity (ALCSS), we take a generated trajectory and a trajectory from the test set and extract from each a sub-trajectory whose starting point is near the other's (if such sub-trajectories exist and have more than four points). We repeat this for every trajectory of the test set and every generated trajectory. As the distance function, we used the Euclidean distance normalised by twice the diagonal of the store image. Our implementation of LCSS has relaxed constraints: two points are considered common if their Euclidean distance in the discretised grid is less than or equal to \(\sqrt{2}\). The length of the longest common sub-trajectory is then normalised by the length of the generated trajectory. We calculated the Similar Trajectories Count (STC) as the number of generated trajectories that match (with distance less than \(\sqrt{2}\)) at least 50% of the points of a trajectory in the test set. Table 4 shows that our method achieved better results than the BC algorithm, and the results of the Real and Preset initialisations were very close to the human results. Human values were obtained by splitting the test set into two parts and comparing one half with the other. It should be noted that human results are only a reference value, not the best result obtainable: as these measures evaluate similarity between trajectories, a value higher than the human results does not indicate bad generalisation.
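
For reference, a textbook implementation of the DTW recurrence in (7) is sketched below; the normalisation by twice the store-image diagonal described above would be applied on top of it.

```python
import math

def dtw(a, b):
    """DTW distance (Eq. 7) between two trajectories of (x, y) points,
    using the Euclidean local distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```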

Although our work mainly aims to generate trajectories from scratch, we can also use it to forecast existing trajectories. To evaluate this task, we split every test trajectory into two parts and used the first half to predict the second. We compared the predicted trajectories with the real ones using Average Displacement Error (ADE) and Final Displacement Error (FDE), two metrics widely used for trajectory forecasting [18].

ADE is a common metric used to evaluate the accuracy of trajectory predictions in the field of computer vision and robotics. It measures the average Euclidean distance between the predicted positions and the ground truth positions of objects over a sequence of time steps. ADE is calculated as shown in (9):

$$\begin{aligned} ADE = \frac{1}{N}\sum _{i=1}^{N} \sqrt{(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2} \end{aligned}$$
(9)

where N is the number of time steps, \((x_i, y_i)\) are the ground truth positions, and \((\hat{x}_i, \hat{y}_i)\) are the predicted positions at time step i. A lower value means that the predicted positions are closer to the ground truth positions on average over the entire trajectory. FDE is another important metric used to evaluate trajectory predictions, particularly at the final time step. It measures the Euclidean distance between the predicted final position and the ground truth final position of an object. FDE is calculated as shown in (10):

$$\begin{aligned} FDE = \sqrt{(x_N - \hat{x}_N)^2 + (y_N - \hat{y}_N)^2} \end{aligned}$$
(10)

where N is the final time step, \((x_N, y_N)\) is the ground truth final position, and \((\hat{x}_N, \hat{y}_N)\) is the predicted final position. A lower FDE means that the predicted final position is closer to the ground truth final position, which is particularly important when evaluating the accuracy of predictions at the end of a trajectory; a lower FDE thus indicates better accuracy in predicting the final destination of an object.
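
Both metrics are straightforward to compute; a minimal sketch for equal-length predicted and ground-truth trajectories:

```python
import math

def ade_fde(pred, truth):
    """ADE (Eq. 9) and FDE (Eq. 10) between a predicted and a ground-truth
    trajectory, given as equal-length lists of (x, y) points."""
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # mean displacement over all steps
    fde = dists[-1]                # displacement at the final step
    return ade, fde
```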

Table 4 Results of similarity measures
Table 5 Forecasting results

Table 5 shows that our framework obtained better results than the behavioural cloning method, and differed considerably from the results of random prediction. However, metrics like ADE and FDE do not consider multimodality, as they compare a forecast with only one of the possible real trajectories; hence, these metrics should be used only for a qualitative comparison. Qualitative results on prediction and forecasting are available in the appendix.

5 Limitations

The limitations of the proposed approach primarily concern its generalisation capabilities for stores with significant differences in layout and size. The need for a predefined, fixed number of cells for the map discretisation can have varying impacts on the categorisation of items, depending on the layout of each specific store. Another limitation is that the collected dataset only considers shoppers with a shopping cart or a basket, who can exhibit considerably different behaviour from shoppers without one.

6 Conclusions

In this paper, we proposed GREEN PATH, an intelligent expert system for space planning that employs a GAIL framework for modelling human trajectories in an environment. The work allows both generating trajectories from scratch and predicting the future path of a person from existing trajectories. The system is a predictive and generative model that can handle an environment with complex constraints, such as those in retail. In particular, a GAIL-based framework has been hybridised with classical reinforcement learning methods, such as continuous penalties, which allow for modelling the shape of the trajectories and inserting a bias into the training. The system is also very general, and the data can be constructed in multiple ways. Depending on the chosen reward, we can either enhance or emulate the behaviour of a human being; this paper focused on the latter, as it was more interesting for our purposes. Finally, the experimental results clearly show the feasibility of the proposed method as well as its generalisability: since the state is based on exploration, its creation can be generalised during the deployment phase and carried out through other methods, such as visual input from a robot. Therefore, with such a framework, it is possible to develop a store simulator in which customer behaviour can be predicted under different layouts and shelf positions.

7 Future works

Future work will be devoted to improving the results in light of the limitations highlighted in Section 5. A higher number of stores in the dataset will surely be mandatory for this task. However, if the analysis focuses on a single store, the framework could provide better results if trained on fewer stores with many trajectories; this topic should be subject to further analysis. On the framework side, the implementation of different state and action spaces that could provide better results should be taken into consideration. A new paradigm of space design could thus be achieved: more public spaces will re-arrange their layouts following the data collected by GREEN PATH, managers will increase their knowledge of space utilisation, and data will be shared on a worldwide scale to define a shared protocol among designers. These technologies could also be integrated into a user-intuitive framework, designed around the challenges previously described, ultimately enabling a system that can be integrated seamlessly into existing spaces, even in other domains, without the need to fully re-engineer the existing environments for visitors.