Efficiently exploring for human robot interaction: partially observable Poisson processes

Consider a mobile robot exploring an office building with the aim of observing as much human activity as possible over several days. It must learn where and when people are to be found, count the observed activities, and revisit popular places at the right time. In this paper we present a series of Bayesian estimators for the levels of human activity that improve on simple counting. We then show how these estimators can be used to drive efficient exploration for human activities. The estimators arise from modelling the human activity counts as a partially observable Poisson process (POPP). This paper presents novel extensions to POPP for the following cases: (i) the robot’s sensors are correlated, (ii) the robot’s sensor model, itself built from data, is also unreliable, (iii) both are combined. It also combines the resulting Bayesian estimators with a simple but effective solution to the exploration-exploitation trade-off faced by the robot in a real deployment. A series of 15-day robot deployments shows how our approach boosts the number of human activities observed by 70% relative to a baseline and produces more accurate estimates of the level of human activity in each place and time.


Introduction
Autonomous mobile robots are being developed to operate in human-populated spaces, such as homes and offices (Hawes et al., 2016). The biggest benefits of mobile robots in these spaces over other intelligent systems, such as smart houses and assisted living systems, are mobility and ease of deployment, since there is no need to modify existing environment structures. Since these robots operate in human-centered environments, it is useful for them to predict patterns of human activity, so as to learn about those activities (Coppola et al., 2016; Hogg & Cohn, 2016) or to plan interactions with humans (Street et al., 2020). This paper is concerned with how a robot can correctly estimate how many human activities it has encountered, predict how many will occur in a particular location at a particular time, and then use these to optimize the exploration-exploitation trade-off that occurs during active learning, so as to observe as many human activities as possible during a deployment.
Consider a mobile service robot that works in a large office building. Let us suppose that one of the system designers' aims is for the robot to observe and thereby learn about the various activities performed by humans, and for this learning to occur over several days or weeks. To achieve this goal, the robot must learn where and when people are to be found, count the observed activities (perhaps grouped by their categories), and revisit places at times when those activities can be observed in sufficient number. For example, to learn about eating activities the robot would benefit from visiting the canteen at lunchtime, rather than at the start of the day. If such a robot is to be deployed to a variety of buildings, without re-programming of rules by hand, it should autonomously learn the spatio-temporal distribution of these activities and exploit that learning to observe a useful variety and number of activities.
This involves solving two problems. First, the robot must estimate where it can observe the greatest number of human activities at each time of the day. Second, as the robot is learning it should trade off exploring new time-place combinations, where it might discover a high level of human activity, against revisiting those time-place combinations where it already knows that a wealth of human activity is to be found. This second problem involves solving an exploration-exploitation trade-off.
Fig. 1 Our mobile robot observes people at an event
This paper presents a Bayesian method to solve both these problems in the case where we treat human activities as count data. The Bayesian framework models not only the frequency of human activities and the variation in this, but also the robot's uncertainty about the mean rate at which activities occur. Thus, the Bayesian estimator captures both inherent process uncertainty (aleatoric uncertainty) and the robot's additional uncertainty in what it knows about the process (epistemic uncertainty). It can also correct for inherent biases (a tendency to false positives or false negatives in the sensory system). Because of this it has two advantages over a baseline frequentist estimator. First, it will produce more accurate estimates and predictions of human activity levels than a method that does not model classification errors. Second, because it captures epistemic uncertainty it can be used to perform active learning. This active learning problem is fundamentally an exploration-exploitation trade-off. Should the robot visit a place at a time such that it can exploit what it already knows about the likely activity level, or should it explore a place-time combination about which it knows less, but about which it might learn and so lead to a higher rate of activity observation in the long run? This active learning problem is intractable in the strict formulation, since it involves reasoning over a tree of possible knowledge states. Despite this, there are effective, heuristic active-learning rules that are quick to evaluate (Fig. 1).
We develop a series of Bayesian estimators. Then we present a method to use these to drive exploration. This uses both a Fourier transform to capture the periodicity of human activities and the epistemic uncertainty in the activity rate as captured by the posterior. Using this estimation, prediction and exploration technique, we then present the results of several long-run deployments of a real robot in a public building. These long-run deployments (15 days per treatment) are used to test whether the different Bayesian estimators, together with the solution to the exploration-exploitation trade-off, result in the robot observing greater numbers of human activities than a baseline frequentist method.
This paper builds on our earlier work, which showed how to count reliably from a single unreliable detector or from multiple, unreliable, uncorrelated detectors (Jovan et al., 2018). That work formulated the problem as Bayesian inference for a partially observable Poisson process (POPP) and showed an improvement on a baseline model assuming sensor reliability, termed the fully observable Poisson process (FOPP).
This paper makes the following technical contributions. First, we extend the POPP model to create the correlated POPP (C-POPP) model. This supports inference when the robot has multiple detectors with correlated outputs. Second, the observation model used to correct counts in the POPP model is itself constructed from data and so has both epistemic and aleatoric uncertainties. The POPP and C-POPP models only take account of the aleatoric uncertainty in the observation model. We extend the POPP model to include the epistemic uncertainty, resulting in the POPP-Beta model. The third contribution is to combine the benefits of C-POPP and POPP-Beta. This results in the POPP-Dirichlet model, which works for correlated sensors and epistemic uncertainty in the observation model. We demonstrate the inferential properties of POPP and these three extensions in numerical simulations. The fourth contribution is that we show how these models can be used to solve the exploration-exploitation problem by combining a Fourier transform, which allows us to exploit the periodicity of human activities, with an upper bound estimate derived from the posterior. Finally, the fifth contribution is an extensive real-world evaluation on a long-run robot. We compare the exploration and estimation performance of the FOPP, POPP and POPP-Beta models in a series of three 15-day deployments. Analysis shows that the POPP and POPP-Beta models are able to explore more efficiently, encountering more people than the baseline FOPP model, and that they produce superior estimates of the rate of human activities.
other than those of interest. False negative counts, also called the undercount, occur when some of the events of interest are missed. Work on the undercounting problem is common. Whittemore and Gong estimated cervical cancer rates by taking into account false negative data (Whittemore & Gong, 1991). Winkelmann and Zimmermann introduced a combination of a Poisson regression model with a logit model for under-counting, yielding the Poisson-Logistic (Pogit) model (Winkelmann & Zimmermann, 1993). They applied this to model the number of days employees were absent from a workplace. Dvorzak and Wagner adapted the Pogit model to use a small set of validation data, to provide information about the true counts (Dvorzak & Wagner, 2016). They performed a Bayesian analysis of the Poisson-Logistic model and incorporated Bayesian variable selection to identify regressors with a non-zero effect and also to restrict parameters of the Poisson-Logistic model.
There is less prior work on the Poisson model for the case where the data may either be undercounted or overcounted (Sposto et al., 1992; Bratcher & Stamey, 2002; Stamey et al., 2004; Stamey & Young, 2005). Sposto et al. followed a frequentist approach to estimate both cancer and non-cancer death rates, assuming false negatives are possible on both sides of these counts (Sposto et al., 1992). In (Bratcher & Stamey, 2002), Bratcher and Stamey used a Bayesian method to estimate Poisson rates in the presence of both undercounts and overcounts, borrowing the double sampling technique introduced in (Tenenbein, 1970). They extended their work to a fully Bayesian method for interval prediction of the unobservable actual count in future samples, given a current double sample (Stamey et al., 2004). Stamey and Young (Stamey & Young, 2005) present closed-form expressions for maximum likelihood estimators of the false negative rate, the false positive rate, and the Poisson rate for the model proposed in (Bratcher & Stamey, 2002). The estimators are straightforward to calculate and to interpret in terms of evaluating the effectiveness of using unreliable counts.
What we propose is similar to the approach of Bratcher and Stamey (Bratcher & Stamey, 2002). Both aim to accurately estimate the arrival rate parameter of a single Poisson process. Bratcher and Stamey utilise double sampling to obtain the true count together with false positive and false negative counts. They estimate the rate via MCMC since no closed form is found for λ, and the calculation of the full posterior is expensive. Double sampling assumes access to two counters, one of which is always a perfect counter. Our work goes beyond this since we consider multiple, potentially correlated, but always unreliable counters. We extend our earlier work (Jovan et al., 2018) by presenting three extensions of our original model.
In Sect. 8 we validate our work by demonstrating how it can be used to improve mobile robot exploration missions to observe humans. Existing work in this field is typically driven by the entropy in the model of a process, to maximise the outcome of interest from an exploration (e.g. the size of the area explored, the number of observed humans). However, many existing works rely on the assumption that the sensors are fully reliable and data is fully observable; the collected data is assumed to be relatively free from inherent biases. Molina et al. (2022) is a prime example of such work, with state-of-the-art results in robot exploration. Molina et al. (2022) focus on robot exploration for learning human motion patterns by (1) exploiting the temporal aspects of human motion through spectral analysis to decide when to visit particular cells and (2) incorporating entropy calculation in the probabilistic maps to decide which places/cells are worth visiting. Kaplow et al. (2010) employed a variable resolution map in combination with a POMDP formulation to scale navigation and exploration for a robotic wheelchair. Task-level robot control with a decision-theoretic framework was first tackled by Pineau et al. (Pineau et al., 2003), who used a POMDP planner to derive a high-level controller for a mobile robot with a dialogue system, exploiting hierarchy to reduce the state space.
There are a couple of works that do reason about sensing reliability in an attempt to correct biases in the sensors, with application to robot exploration. Martinez-Cantin et al. (2009) give a POMDP formulation of active visual mapping, use direct policy search to find a solution, and use Monte Carlo simulation to generate imaginary observations and action outcomes during optimization. The main challenge of decision-theoretic planning in partially observable environments is intractability. Velez et al. (2012) planned trajectories in a continuous space to maximize the reliability of object detection using a learned observation model. Their key contribution is the use of a model of the correlations in sensor behaviour at nearby locations, thus driving the robot to gather more informative views. Like Velez et al. (2012), we use sensor behaviour to drive the robot to gather more information, but our work goes further by utilizing an exploration-exploitation mechanism provided by Bayesian optimization to maximize human observations in the areas of interest. In contrast to the work of Molina et al. (2022), we demonstrate how robot exploration can be improved by correcting any systematic bias produced by the robot's sensory system. Our main contribution can complement any robot exploration technique: the 'Exploration Planner' module in Fig. 2 can be replaced with any other exploration technique (e.g. entropy-based exploration).

Fully observable Poisson process
A fully observable Poisson process (FOPP) is a counting process $N(t)$ where a counter tells, without error, the number of events that occurred during a specified interval [0, t). $N(t) = c_i$ states that in the i-th observation of interval [0, t), there are $c_i$ events. The number of events $N(t)$ in a finite interval of length t obeys the Poisson distribution
$$P(N(t) = c_i \mid \lambda) = \frac{e^{-\lambda} \lambda^{c_i}}{c_i!} \tag{1}$$
where λ represents the arrival rate in a fixed interval [0, t). Bayesian estimation for fully observable Poisson processes is straightforward. Given a Gamma density as a prior distribution over the parameter λ,
$$\mathrm{Gam}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \tag{2}$$
where α, β are the shape and the rate parameters, the posterior over λ for a FOPP can be calculated via Bayesian inference with
$$P(\lambda \mid c_1, \ldots, c_n) = \mathrm{Gam}\left(\lambda \mid \alpha + \sum_{i=1}^{n} c_i, \; \beta + n\right) \tag{3}$$
This adds the sample counts $\sum_{i=1}^{n} c_i$ to the hyperparameter α of the gamma prior, and adds the number of observations n to the hyperparameter β of the gamma prior.
The FOPP model requires a single reliable sensor. With an unreliable sensor, FOPP inferences will be incorrect.
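The conjugate update described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper; the function names and the example prior and counts are assumptions chosen for the example.

```python
def fopp_update(alpha, beta, counts):
    """Gamma(alpha, beta) prior + Poisson counts -> Gamma posterior (shape, rate)."""
    # Add the total count to the shape and the number of observations to the rate.
    return alpha + sum(counts), beta + len(counts)


def gamma_mean(alpha, beta):
    """Posterior expectation of the arrival rate."""
    return alpha / beta


def gamma_mode(alpha, beta):
    """MAP estimate of the arrival rate (an interior mode exists for alpha >= 1)."""
    return (alpha - 1) / beta if alpha >= 1 else 0.0


# Illustrative values only: a weak Gamma(1, 1) prior and four observed intervals.
posterior = fopp_update(1.0, 1.0, [3, 2, 4, 3])
print(posterior, gamma_mean(*posterior), gamma_mode(*posterior))
```

Because the posterior is again a Gamma density, only the two hyperparameters need to be stored between observations.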

The partially observable Poisson process
The partially observable Poisson process (POPP) is a counting process $N(t)$ with arrival rate λ where the number of events appearing over the time interval [0, t) is observed by one or more unreliable counters. The concept was first introduced in Jovan et al. (2018). The definition distinguishes between the true count (or simply count), which refers to the number of events that actually occurred, and the sensed count, which refers to the count obtained by a counter (or sensor). Let $c_i$ represent the true count over the interval [0, t) during the i-th observation. With m counters unreliably observing $c_i$, we use $s_{j,i}$ to represent the sensed count given by sensor j in the i-th observation within the interval [0, t), with 1 ≤ j ≤ m. Let $\mathbf{s}_i = (s_{1,i}, \ldots, s_{m,i})$ represent the vector of sensed counts from the m sensors for the i-th observation of the process.
Figure 3 presents the graphical model derived from the definition of the POPP. This shows that the true count $c_i$ has become a latent variable which can only be inferred from the sensed count. The posterior of λ is then inferred from the posterior of $c_i$ after n observations, i = 1, ..., n.
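The rate parameter λ of the POPP model can be inferred by marginalising over all possible true count values $c_i$, weighting each by the distribution of true counts given sensed counts $P(c_i \mid \mathbf{s}_i)$. Given n observations of the underlying process, let all true counts be $\mathbf{c} = (c_1, \ldots, c_n)$ and all sensed counts be $\mathbf{s} = (\mathbf{s}_1, \ldots, \mathbf{s}_n)$. Then
$$P(\lambda \mid \mathbf{s}) = \sum_{c_1=0}^{\infty} \cdots \sum_{c_n=0}^{\infty} P(\lambda \mid \mathbf{c}) \, P(\mathbf{c} \mid \mathbf{s}) \tag{4}$$
where true count probabilities, $P(\lambda \mid \mathbf{c})$, can be drawn from the original FOPP definition:
$$P(\lambda \mid \mathbf{c}) = \mathrm{Gam}\left(\lambda \mid \alpha + \sum_{i=1}^{n} c_i, \; \beta + n\right) \tag{5}$$
If we assume that the sensor counts for observation period i are conditionally independent (i.e. uncorrelated) given the true count $c_i$, then the probability of a collection of observations given the true count is defined as follows:
$$P(\mathbf{s}_i \mid c_i) = \prod_{j=1}^{m} P(s_{j,i} \mid c_i) \tag{6}$$
Using this, the probability of a particular sequence of n counts, given a sequence of n observations each from m sensors, $P(\mathbf{c} \mid \mathbf{s})$, can be defined as:
$$P(\mathbf{c} \mid \mathbf{s}) \propto \prod_{i=1}^{n} P(c_i \mid c_{-1}) \prod_{j=1}^{m} P(s_{j,i} \mid c_i) \tag{7}$$
where $c_{-1} = c_{i-1}, \ldots, c_1$ and $P(c_i \mid c_{-1})$ can be calculated as:
$$P(c_i \mid c_{-1}) = \int_0^{\infty} P(c_i \mid \lambda) \, P(\lambda \mid c_{-1}) \, d\lambda \tag{8}$$
(Note that $c_{-1}$ does not exist whenever i = 1, in which case $P(\lambda \mid c_{-1})$ is the prior over λ.) To complete Eq. 7 we must also define $P(s_{j,i} \mid c_i)$. The Poisson limit theorem states that the Poisson distribution may be used as an approximation to the binomial distribution (Papoulis & Pillai, 2002). Using this theorem as the foundation, an arbitrarily close approximation to the probability $P(s_{j,i} \mid c_i)$ is defined by assuming there exists a small enough finite subinterval of length δ for which the probability of more than one event occurring is less than some small value, and that δ is small enough that this value is negligible. With this assumption, the interval [0, t) is split into l smaller subintervals $I_1, \ldots, I_l$ of equal size, with the condition that l > λ. Consequently, the whole interval $[0, t) = I_1, \ldots, I_l$ becomes a series of Bernoulli trials, where the k-th trial corresponds to whether (1) an event $e_k$ happens with probability λ/l and (2) a sensor j captures the event $e_k$ as the detection $d_k$ at the subinterval $I_k$.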
Following this, $P(s_{j,i} \mid c_i)$ can be defined using the count of true positives among the $c_i$ subintervals that contain an event, and the count of false positives among the remaining $l - c_i$ subintervals. Let the probability of a true positive detection (TP) for sensor j in a single subinterval be $\tau_j = P_j(d \mid e=1)$, and the probability of a false positive detection (FP) be $\xi_j = P_j(d \mid e=0)$. Thus $P(s_{j,i} \mid c_i)$ is defined as a sum, over all possible numbers r of true positive detections, of the product of two binomial distributions $B(r \mid n, \pi)$:
$$P(s_{j,i} \mid c_i) = \sum_{r=0}^{s_{j,i}} B(r \mid c_i, \tau_j) \, B(s_{j,i} - r \mid l - c_i, \xi_j) \tag{9}$$
where the first binomial provides the probability of getting some proportion of the count from TP detections and the second binomial provides the probability of getting the remainder from FP detections. Equation 4 shows the difficulty of estimation in the POPP model. Since no conjugate density provides an analytical solution for the posterior over λ, every sensed count $\mathbf{s}_i$ must be retained to calculate the posterior of λ. The number of elements needed to represent each possible value of $c_i$ at each observation is therefore unbounded. Even with an upper bound l on the maximum value of $c_i$, the number of elements to retain grows exponentially with the number of observation periods.
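The sensed-count likelihood for a single sensor (Eq. 9) can be sketched as follows. This is a minimal illustration under the subinterval approximation described above; the function names and the example reliabilities are assumptions and are not taken from the paper.

```python
from math import comb


def binom_pmf(r, n, p):
    """Binomial probability B(r | n, p); zero outside the support."""
    if r < 0 or r > n:
        return 0.0
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def sensed_given_true(s, c, l, tau, xi):
    """P(s | c): r true positives among the c event subintervals and
    s - r false positives among the remaining l - c subintervals."""
    return sum(
        binom_pmf(r, c, tau) * binom_pmf(s - r, l - c, xi)
        for r in range(min(s, c) + 1)
    )


# Illustrative values only: l = 10 subintervals, true count 4,
# true positive rate 0.8 and false positive rate 0.1.
likelihood = [sensed_given_true(s, 4, 10, 0.8, 0.1) for s in range(11)]
print(likelihood)
```

For a perfect sensor (true positive rate 1 and false positive rate 0) the likelihood puts all of its mass on s = c, which is the FOPP case.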

Estimators
To address this difficulty, in (Jovan et al., 2018) we proposed three estimators, each of which offers an approximation to the true posterior $P(\lambda \mid \mathbf{s})$. The estimators are:
(1) a gamma filter, which approximates Eq. 4 with a single gamma distribution, minimising the KL-divergence $KL(P(\lambda \mid \mathbf{s}) \,\|\, \mathrm{Gam}(\lambda \mid \alpha, \beta))$ by gradient descent. The accuracy of this filter deteriorates as sensor reliability degrades. However, computation time is constant on each observation and Eq. 8 has a closed form, using the negative binomial distribution with the hyperparameters α, β;
(2) a histogram filter, which approximates Eq. 4 with a discrete distribution $Q(\lambda \mid \mathbf{s})$ by quantising λ. Compared with the gamma filter, it can track the posterior to arbitrary fidelity via a finer quantisation, at the cost of increased computation time;
(3) a switching filter, which approximates Eq. 4 by either a gamma filter or a histogram filter, depending on whether $P(\lambda \mid \mathbf{s})$ resembles a gamma distribution closely enough, as measured by the KL-divergence, to be approximated by the gamma filter.
In general (and in our experimental work from Sect. 6 onwards) we use the switching filter as the estimator of the true posterior $P(\lambda \mid \mathbf{s})$ because it combines the best of both the gamma filter (fast calculation) and the histogram filter (accurate approximation) with minimum loss in similarity to the true posterior. Figure 4 shows the KL-divergence between the gamma and switching filters and the true posterior over different sensor reliabilities using simulated data. Note that the histogram filter was not included because it perfectly tracked $P(\lambda \mid \mathbf{s})$, i.e., its KL-divergence from the true posterior was effectively zero. A more detailed presentation of these estimators is given in (Jovan et al., 2018).
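To make the histogram filter concrete, the following is a minimal sketch for a single sensor. It is not the implementation of (Jovan et al., 2018): the grid, the default Gamma(1, 1) prior, the truncation of the true count at l, and all function names are assumptions made for illustration. Each sensed count must satisfy s ≤ l.

```python
from math import comb, exp, lgamma, log


def binom_pmf(r, n, p):
    if r < 0 or r > n:
        return 0.0
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def sensed_given_true(s, c, l, tau, xi):
    """P(s | c) as a sum over r true positives of two binomials."""
    return sum(
        binom_pmf(r, c, tau) * binom_pmf(s - r, l - c, xi)
        for r in range(min(s, c) + 1)
    )


def poisson_pmf(c, lam):
    return exp(c * log(lam) - lam - lgamma(c + 1))


def histogram_filter(sensed, l, tau, xi, grid, alpha=1.0, beta=1.0):
    """Discrete posterior over the quantised rate values in `grid` (all > 0)."""
    # Gamma(alpha, beta) prior evaluated on the grid, then normalised.
    weights = [lam ** (alpha - 1) * exp(-beta * lam) for lam in grid]
    total = sum(weights)
    weights = [w / total for w in weights]
    for s in sensed:
        for k, lam in enumerate(grid):
            # Marginalise the latent true count c (truncated at l, an assumption).
            weights[k] *= sum(
                sensed_given_true(s, c, l, tau, xi) * poisson_pmf(c, lam)
                for c in range(l + 1)
            )
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Illustrative values only.
grid = [0.1 * k for k in range(1, 101)]
weights = histogram_filter([3, 4, 2, 3], l=10, tau=0.8, xi=0.05, grid=grid)
posterior_mean = sum(lam * w for lam, w in zip(grid, weights))
print(posterior_mean)
```

The cost per observation grows with the number of grid cells, which is the trade-off against the constant-time gamma filter noted above.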

The POPP extensions
In (Jovan et al., 2018) we demonstrated that the POPP model is able to efficiently correct miscounts made by multiple unreliable counting devices observing a single Poisson process. However, the POPP model is limited by two assumptions: (1) the sensors are conditionally independent given the true count, and (2) the degree of unreliability of a sensor (i.e. τ and ξ) is precisely known.
In this paper, we propose three extensions to the POPP model to tackle these assumptions. The first extension (POPP-Beta) extends the POPP model with an observation model which captures uncertainty about the sensor reliability. The second extension (C-POPP) modifies the POPP model to accommodate correlations between sensors. The third extension (POPP-Dirichlet) combines these ideas to jointly address both assumptions.

POPP-Beta
The POPP model requires the true positive and false positive rates to be specified for sensor j, i.e. $\tau_j = P_j(d \mid e=1)$ and $\xi_j = P_j(d \mid e=0)$. The POPP model requires these rates to be accurate in order to generate correct posteriors over λ.
To accurately determine the rates in practice, one needs to have a large data set of both sensed counts and the ground truth. Given that the ground truth is typically manually created, this places a large burden on experts who need to label the data.
Here, we extend the original POPP model to take into account uncertainty in the true and false positive rates due to limited training data. To model this uncertainty we use Bayesian estimation to determine the true positive rate (τ) and false positive rate (ξ). We use Beta distributions as priors for τ and ξ because the Beta distribution is conjugate to the binomial distribution, providing a family of prior probability distributions for the parameter of a binomial distribution. The Beta-binomial conjugacy leads to an analytically tractable compound distribution called the Beta-binomial distribution. We write the densities over the two rates as $\mathrm{Beta}(\tau \mid \zeta_\tau, \eta_\tau)$ and $\mathrm{Beta}(\xi \mid \zeta_\xi, \eta_\xi)$, where $\zeta_\tau$ and $\zeta_\xi$ are the numbers of true positive and false positive detections in the ground truth data respectively, and $\eta_\tau$ and $\eta_\xi$ are the numbers of false negative and true negative detections in the ground truth data respectively. Given these parameters, we form the POPP-Beta model from POPP by replacing Eq. 9 with:
$$P(s_{j,i} \mid c_i) = \sum_{r=0}^{s_{j,i}} BB(r \mid c_i, \zeta_\tau, \eta_\tau) \, BB(\delta_{sr} \mid l - c_i, \zeta_\xi, \eta_\xi) \tag{11}$$
with $\delta_{sr} = (s_{j,i} - r)$, and
$$BB(r \mid n, \zeta, \eta) = \binom{n}{r} \frac{B(r + \zeta, \; n - r + \eta)}{B(\zeta, \eta)}$$
where $B(\cdot, \cdot)$ is the Beta function. With a sensor model which follows beta densities and is fully integrated, as a distribution, into the sensed count likelihood $P(s_{j,i} \mid c_i)$ as shown in Eq. 11, we obtain a graphical model with the structure shown in Fig. 5. Note that the difference between the POPP and POPP-Beta models lies only in the change from Eq. 9 to Eq. 11. However, given little training data for the observation model, the POPP-Beta model is expected to be more conservative in estimating the posterior $P(\lambda \mid \mathbf{s})$ over λ than the POPP model.
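The POPP-Beta likelihood replaces each binomial in the sensed-count likelihood with a Beta-binomial. A minimal sketch is given below, assuming the standard Beta-binomial probability mass function; the function names and the example hyperparameters are assumptions for illustration only.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(r, n, a, b):
    """Beta-binomial probability of r successes in n trials; zero outside the support."""
    if r < 0 or r > n:
        return 0.0
    return comb(n, r) * exp(log_beta(r + a, n - r + b) - log_beta(a, b))


def sensed_given_true_beta(s, c, l, zeta_tau, eta_tau, zeta_xi, eta_xi):
    """P(s | c) with Beta-distributed true positive and false positive rates."""
    return sum(
        beta_binom_pmf(r, c, zeta_tau, eta_tau)
        * beta_binom_pmf(s - r, l - c, zeta_xi, eta_xi)
        for r in range(min(s, c) + 1)
    )


# Illustrative values only: a sensor model built from very little ground truth.
likelihood = [
    sensed_given_true_beta(s, 4, 10, zeta_tau=8, eta_tau=2, zeta_xi=1, eta_xi=9)
    for s in range(11)
]
print(likelihood)
```

With small hyperparameters the resulting likelihood is broader than its binomial counterpart, which is what makes the POPP-Beta posterior over λ more conservative when training data is scarce.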

Correlated POPP
Recall that Eq. 6 is defined under the assumption that each sensor count is conditionally independent from all the others given the true count.This assumption ignores the correlations between sensors.To introduce correlations between sensors we must alter Eq. 6 and Eq. 9 from the POPP model.
Recall that the probability of a particular sensed count given the true count, $P(\mathbf{s}_i \mid c_i)$, was defined from the Poisson limit theorem as a sequence of Bernoulli trials over l subintervals. With correlated sensors, the observation of an event $e_k$ in the k-th trial no longer follows the Bernoulli distribution. Instead it follows the categorical distribution, where the k-th trial corresponds to whether a particular combination of binary detections $d_{1,k}, \ldots, d_{m,k}$ happens in subinterval $I_k$. Therefore, we move our notation from using $s_{j,i}$, representing the sensed count of a particular sensor j independently at time interval i, to a matrix representing the m sensor detections together at time interval i. Formally, we replace Eq. 6 and Eq. 9 with the probability of a series of detection outcomes given the true count $c_i$ at interval i, as follows.
We first define, for some interval i with l subintervals and m sensors, a binary matrix of detections $D_i$. $D \in \mathcal{D}_{m,l}$, the set of binary matrices of dimension m × l. We denote each column k of D by $D_{:k} = \mathbf{d} \in \{0, 1\}^m$, with k = 1, ..., l; $D_{:k}$ is the vector of detections from the m different sensors at subinterval k.
We further define $e_k \in \{0, 1\}$ as the variable indicating whether or not an event is hypothesized to have occurred in subinterval k; $e_k = 1$ means that an event occurred. We define $P^+$ as the categorical distribution of $\mathbf{d}$, conditioned on e = 1, i.e.
$$P^+(\mathbf{d}) = P(D_{:k} = \mathbf{d} \mid e_k = 1) \tag{12}$$
and, by analogy,
$$P^-(\mathbf{d}) = P(D_{:k} = \mathbf{d} \mid e_k = 0) \tag{13}$$
Both $P^+$ and $P^-$ have $2^m$ elements. These two probabilities represent the true positive rates and true negative rates, playing the role of τ and ξ in the POPP model. Similar to τ and ξ, $P^+$ and $P^-$ are estimated from both the detections of each sensor and the corresponding actual (non-)event as ground truth. However, unlike τ and ξ, which are sensor specific, $P^+$ and $P^-$ consider all combinations of binary detections from the sensors given the true event. This means the number of elements in $P^+$ and $P^-$ grows by a factor of two for each sensor added. Due to the size of $P^+$ and $P^-$, more than a few hundred detections, together with their corresponding events, may be needed to estimate them. We can partition the subintervals 1, ..., l into two sets: $e^+$ is the set of subintervals k where $e_k = 1$, and $e^-$ is the set of subintervals k where $e_k = 0$. We can define a partition of the subintervals by a pair $(e^+, e^-)$. The set of possible partitions such that $e^+$ has a fixed size c, i.e. $|e^+| = c$, is denoted $\mathcal{E}_c$, so that $(e^+, e^-) \in \mathcal{E}_c$.
We write $D_{e^+}$ and $D_{e^-}$ for the columns of D indexed by $e^+$ and $e^-$ respectively. As there may be duplicate columns in either or both of $D_{e^+}$ and $D_{e^-}$, we define a count vector for each, $g^+$ and $g^-$, whose entries count how often each detection vector appears, so that the entries of $g^+$ and $g^-$ together sum to l. Each of $g^+$, $g^-$ is of length $2^m$, having one element for every possible detection vector $\mathbf{d} \in \{0, 1\}^m$. In order to define the joint probability of a particular count being yielded by a particular sequence of detection outcomes, we must consider all possible combinations of true positives and false positives that could be generated by that sequence, by exploring all partitions $(e^+, e^-)$ with $|e^+| = c$. We do this in the definition of $P(D \mid c)$ (Eq. 14), which gives the probability of a given sequence of detection groups yielding count c using the multinomial distribution.
Equation 14 can be understood by analogy to Eq. 9. In both equations all possible pairs of true and false positive counts which sum to c are considered. In the conditionally independent case the binomial distribution is used to determine the probability of each count from the available trials given the true and false positive rates. However, in the conditionally independent case, Eq. 9 is calculated independently for each sensor, and the joint probability of those sensors results in Eq. 6. In the correlated case the multinomial distribution is used to determine the probability of each count from a possible sequence of joint observations and their probability of yielding a count. As a result, Eq. 14 removes the need for Eq. 6 in the C-POPP model. A graphical representation of C-POPP can be seen in Fig. 6.
One should note that the benefit of C-POPP is that it exploits correlations among multiple sensors contributing to detection counts.If there is only one sensor counting events, then C-POPP collapses to POPP.
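As an illustration of how the joint observation model might be built from labelled data, the sketch below estimates the two categorical distributions by counting joint detection vectors over subintervals with and without a ground-truth event. The function name, the data layout and the toy data are assumptions for this example and are not taken from the paper.

```python
from itertools import product


def estimate_joint_rates(detections, events, m):
    """Estimate the categorical distributions of joint detections.

    detections: one tuple of m binary detections per subinterval.
    events: one 0/1 ground-truth label per subinterval.
    Returns (p_pos, p_neg), each a dict with 2**m entries.
    """
    combos = list(product((0, 1), repeat=m))
    pos_counts = {d: 0 for d in combos}
    neg_counts = {d: 0 for d in combos}
    for d, e in zip(detections, events):
        (pos_counts if e == 1 else neg_counts)[tuple(d)] += 1
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event subinterval")
    p_pos = {d: n / n_pos for d, n in pos_counts.items()}
    p_neg = {d: n / n_neg for d, n in neg_counts.items()}
    return p_pos, p_neg


# Toy ground truth for two sensors over six subintervals (illustrative only).
detections = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1)]
events = [1, 1, 1, 0, 0, 0]
p_pos, p_neg = estimate_joint_rates(detections, events, m=2)
print(p_pos, p_neg)
```

Each dictionary has one entry per joint detection vector, so its size doubles with every sensor added, which is why the joint model needs considerably more ground truth than the per-sensor rates.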

The POPP-Dirichlet
The C-POPP model requires the true positive rate $P^+$ and true negative rate $P^-$ to be specified in advance when estimating the parameter λ of a Poisson process. These are an extension of τ and ξ, where the rates provide a probability for a particular combination of binary detections coming from each sensor given the true event, as shown in Eq. 12 and Eq. 13.
To construct an observation model of $P^+$ and $P^-$, one needs to have both detections and the corresponding actual (non-)events as ground truth. Pre-processing involving expert intervention is typically required before the detections and their corresponding ground truth can be used. Similarly to the POPP model, the C-POPP model requires the observation model to be accurate to avoid the posterior over λ drifting away from the true posterior. If attaining an accurate observation model for the POPP model is a problem, then this becomes more challenging for the C-POPP model. This is because the training data needed to construct an observation model grows by a factor of two for each sensor involved.
Analogously to the extension from the POPP model to the POPP-Beta model, we can expand the C-POPP observation model.In this case the observation models (P + and P − ) will follow Dirichlet distributions.The Dirichlet distribution is an appropriate distribution since P + and P − are the probabilities of categorical distributions which set the probabilities of multinomial distributions in Eq.

Evaluation on synthetic data
In this section we evaluate POPP and its extensions on synthetic data to demonstrate the properties of these models when estimating the arrival rate λ of a Poisson process.With synthetic data, sensor reliability can be controlled, and the true λ and the true counts c i can be known for each sample.
In our experiments we initially generate a training set of n = 12 (true) counts from a Poisson process P(c | λ = 3) with a time interval of t = 10 time units. Along with the training set counts $c_1, \ldots, c_{12}$, for each count $c_i$ we also generate the corresponding event occurrence $e_k \in \{0, 1\}$ on each subinterval k ∈ {1, ..., t}, a sensed count $\mathbf{s}_i$, and a detection matrix $D_i$ from two unreliable sensors with 10 subintervals for each sensed count (i.e. m = 2, l = 10, $D \in \mathcal{D}_{2,10}$ in our evaluation). To capture a range of possible sensor correlations and performance characteristics, the sensed counts for the training set are produced from 12 different sensor configurations (see Table 1). The true and sensed counts are then used to build (joint where appropriate) sensor models for the POPP extensions described above. For the POPP-Beta and the POPP-Dirichlet models, we set the hyperparameters of the Beta prior and the Dirichlet prior to follow a uniform distribution.
We then generate a new set of n = 144 true counts and the corresponding sensed counts for each of the 12 sensor configurations. These sensed counts are used as input to a filtering process to estimate the posterior of λ according to each of the four models defined above (POPP, POPP-Beta, C-POPP, and POPP-Dirichlet), plus FOPP. We chose the training set size n = 12 such that there is insufficient data to build an accurate sensor model. This allows the POPP-Dirichlet and the POPP-Beta models to compensate with loose Dirichlet and beta densities.
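A data-generation step of this kind might be sketched as follows. This is an illustration only, not the generator used for the experiments: it assumes that events are drawn as one Bernoulli trial with probability λ/l per subinterval (the subinterval approximation rather than an exact Poisson draw), and the joint detection probabilities shown are made-up values rather than a configuration from Table 1.

```python
import random


def simulate_interval(lam, l, p_pos, p_neg, rng):
    """Simulate one observation interval split into l subintervals.

    p_pos / p_neg map each joint detection vector to its probability given
    an event / no event. Returns (true count, per-sensor sensed counts,
    list of detection columns).
    """
    combos = list(p_pos)
    # One Bernoulli trial per subinterval with probability lam / l (assumption).
    events = [1 if rng.random() < lam / l else 0 for _ in range(l)]
    columns = [
        rng.choices(combos, weights=[(p_pos if e else p_neg)[d] for d in combos])[0]
        for e in events
    ]
    m = len(combos[0])
    sensed = [sum(col[j] for col in columns) for j in range(m)]
    return sum(events), sensed, columns


# Illustrative joint model for two positively correlated sensors.
p_pos = {(0, 0): 0.05, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.85}
p_neg = {(0, 0): 0.85, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.05}
rng = random.Random(0)
training = [simulate_interval(3.0, 10, p_pos, p_neg, rng) for _ in range(12)]
print(training[0])
```

Keeping the event indicators and detection columns, rather than only the summed counts, is what allows both the per-sensor and the joint sensor models to be built from the same training set.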
The 12 sensor configurations mentioned previously represent 12 different experimental conditions under which we can test our proposed models. In six of the configurations we vary the true joint positive rates (true $P^+$) of the two sensors whilst fixing their true joint negative rates (true $P^-$).
In the other six we fix the true joint positive rates (TJPRs) whilst varying the true joint negative rates (TJNRs). Both cases cover variations where the sensors are uncorrelated, positively correlated and negatively correlated, and in each case where the overall true (positive or negative) rates are either high (0.9) or low (0.1). The detailed configurations are presented in Table 1.
The performance of all POPP models was assessed by measuring how accurately each model estimates the true λ. The true λ is estimated by applying the FOPP model to the true counts. Two measures of accuracy were used: (1) the RMSE of the expectation (mean) and of the MAP hypothesis (mode) of each model's posterior distribution over λ with respect to the true λ; and (2) the Jensen-Shannon distance between the posterior distribution $P(\lambda \mid \mathbf{s}_i)$ and the distribution of the true λ.
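Both accuracy measures are simple to compute once the posteriors are discretised on a common grid. The sketch below is an assumption-laden illustration: the function names are hypothetical, the distributions are assumed to be given as normalised lists over the same grid, and the base-2 logarithm (which bounds the distance by 1) is a choice made here, not one stated in the text.

```python
from math import log, sqrt


def rmse(estimates, truth):
    """Root mean squared error of point estimates against a reference value."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def kl_divergence(p, q):
    """KL(p || q) in bits for discrete distributions on the same support."""
    return sum(pi * log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the Jensen-Shannon divergence."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = (kl_divergence(p, mix) + kl_divergence(q, mix)) / 2
    return sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Illustrative values only.
print(rmse([2.8, 3.1, 3.4], 3.0))
print(js_distance([0.1, 0.6, 0.3], [0.2, 0.5, 0.3]))
```

Unlike the KL-divergence, the Jensen-Shannon distance is symmetric and remains finite when one distribution assigns zero mass to a grid cell, which makes it convenient for comparing discretised posteriors.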
Table 1 The sensor configurations for the evaluation on synthetic data. "+ corr" and "-corr" mean a positive correlation and a negative correlation between the two sensors, respectively

In this paper, we omit the per-sample computation time analysis of POPP and its extensions because the computation time depends heavily on the filter chosen. The time to calculate the distribution of the sensed count given the actual count for POPP, POPP-Beta, C-POPP and POPP-Dirichlet on each sample can be considered constant and is therefore negligible relative to the total computation time. Our prior work provides a detailed comparison of the computational efficiency of different filters (Jovan et al., 2018).

Evaluation on aggregate human occupancy behaviour dataset
We now investigate the performance of the POPP model and its extensions on a real world dataset⁶. The dataset was gathered from an office building in which a mobile robot (Hawes et al., 2016) counted the number of people in different regions whilst patrolling (see Fig. 12 for the map of the building). The dataset contains time series counts from three different automated person detectors (Dondrup et al., 2015). These use laser, depth camera and RGB information. We refer to these detectors respectively as the leg detector (LD), upper body detector (UBD), and change (or scenery) detector (CD). Each of these detectors acts as one sensor. Each returns a sensed count of the number of people it detected in each 10 minute interval during the day. To unify the different detection frequencies of the sensors, we used the lowest detection frequency, that of the change detector, and limited each sensor to at most one detection per minute. These detectors are unreliable, as can be seen from Fig. 13, which shows examples of correct and incorrect detections. By comparing the ground truth with the detections made by the sensors, we compute a sensor model for each region. An average of the sensor models across all regions can be seen in Table 2. Although the robot operated 24 hours a day, the sensor models were built using only the data collected from 10am to 8pm, since there were few detections outside these times.

From a 69 day trial of the mobile robot, we obtained 48 days of usable observations. We specified a time interval for each Poisson distribution of 10 minutes, and recorded both the true counts and the detections made by each sensor in each interval. We assumed the underlying process in each region to be a periodic Poisson process with a one-day periodicity, i.e. λ(t) = λ(t + Δ) with Δ = 24 * 60 (minutes). This means that the expected number of people at a particular time of day is the same across the 48 days of observations. We estimated the true parameter λ(t) of the Poisson distribution at t by running a FOPP model on the true counts within each interval. We use this estimate of λ(t) from the true counts as the target which the POPP models must estimate from the sensed counts.
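Under the one-day periodicity assumption, counts observed at the same time of day on different days update the same rate. A minimal sketch of such a periodic estimator for fully observed counts is given below. It assumes the standard conjugate Gamma prior over λ with rate parameterisation; the class name and the prior values `alpha0` and `beta0` are illustrative and not taken from the paper.

```python
class PeriodicGammaPoisson:
    """One Gamma(alpha, beta) belief over lambda per time-of-day bin."""

    def __init__(self, period_minutes=24 * 60, bin_minutes=10,
                 alpha0=1.0, beta0=1.0):
        self.bin_minutes = bin_minutes
        self.n_bins = period_minutes // bin_minutes
        # alpha0 and beta0 are illustrative prior values (assumption).
        self.alpha = [alpha0] * self.n_bins
        self.beta = [beta0] * self.n_bins

    def bin_index(self, minute):
        """Map minutes since the start of the trial to a time-of-day bin."""
        return (minute // self.bin_minutes) % self.n_bins

    def update(self, minute, count):
        """Conjugate update for one fully observed count in one interval."""
        i = self.bin_index(minute)
        self.alpha[i] += count
        self.beta[i] += 1.0

    def mean(self, minute):
        """Posterior expectation of lambda for this time of day."""
        i = self.bin_index(minute)
        return self.alpha[i] / self.beta[i]

    def mode(self, minute):
        """MAP estimate of lambda (zero when alpha < 1)."""
        i = self.bin_index(minute)
        return max(self.alpha[i] - 1.0, 0.0) / self.beta[i]
```

For example, counts of 4 and 2 observed at 10:00 on two consecutive days both update bin 60, giving a posterior of Gamma(7, 3) under the illustrative prior.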
The different POPP approaches rely on sensor models that must be calculated from a confusion matrix relating the true counts to the sensed counts from the different sensors. To separate the training and testing data we performed four-fold cross-validation with the data split on whole days, i.e., we used 12 days of data as a training set for a sensor model and then used the remaining 36 days of data as a test set on which to test the inferences made by each model from the sensed counts.
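The split can be sketched as below. Note that, unlike conventional cross-validation, the single fold is used for training and the other three for testing. The assignment of contiguous days to folds is an assumption; the text only states that splits fall on whole days.

```python
def day_folds(n_days=48, n_folds=4):
    """Yield (train_days, test_days); one fold of whole days trains the sensor model."""
    fold_size = n_days // n_folds
    days = list(range(n_days))
    for f in range(n_folds):
        start, stop = f * fold_size, (f + 1) * fold_size
        train = days[start:stop]            # 12 days for the sensor model
        test = days[:start] + days[stop:]   # remaining 36 days for inference
        yield train, test
```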
For the 36 days of test data, the different models each made predictions of the λ(t) parameter of the Poisson distribution. Given this, we recorded (1) the RMSE between the MAP hypothesis of each model's posterior distribution over λ(t) and the true λ(t), and (2) the Jensen-Shannon distance between the posterior distribution $P(\lambda(t) \mid s_i)$ and the distribution of the true λ(t). Using these metrics, we compared the performance of all POPP models (estimated using the switching filter described in Sect. 4) to the Bayes' filter arising from the FOPP model. The FOPP model is a single sensor model and was estimated from the change detector counts since this was the most reliable of the three available detectors (as shown in Table 2).
Figures 14 and 15 show the accuracy comparison between all POPP models and the standard FOPP model over time. It can be seen that all models become more accurate as the days pass. All POPP models are more accurate than the standard FOPP model. The λ(t) estimate produced by the POPP-Dirichlet model is more accurate than those produced by the standard POPP model and the POPP-Beta model. However, the POPP-Dirichlet estimate is not always more accurate than the one produced by the C-POPP model.
As the POPP-Dirichlet model is more conservative in estimating the parameter λ(t) than the C-POPP model, its estimate moves more slowly towards the true λ(t). This is seen in Fig. 15. By the third day, the POPP-Dirichlet model outperformed the POPP, POPP-Beta, and C-POPP models in terms of accuracy. However, the accuracy gap between the C-POPP model and the POPP-Dirichlet model becomes smaller over time. By the 36th day the C-POPP model outperforms the POPP-Dirichlet model by a small margin. It should be noted that Figs. 14 and 15 show the RMSE and the Jensen-Shannon distance averaged over 10 different regions. The more regions with a high volume of data available, the more accurate the joint sensor model will be, especially for C-POPP, and, in turn, the more accurate the C-POPP filter becomes in estimating the parameter λ(t).
Figures 16 and 17 show the RMSE and Jensen-Shannon comparison between all POPP models and the FOPP model across regions. The POPP-Dirichlet model has an advantage in regions with a low volume of data, such as regions 4, 5, 6 and 7. As some of these data were used to construct the joint sensor model for both C-POPP and POPP-Dirichlet, a small amount of data creates an inaccurate point-estimate joint sensor model, which is used by the C-POPP filter. This problem is handled appropriately by the POPP-Dirichlet model, which represents its joint sensor model as a distribution with the help of a Dirichlet prior, as explained in Sect. 5.3.
One interesting finding here is that there is little to no difference in performance between the POPP and the POPP-Beta filters in regions 4, 5, 6, and 7. One might have expected the performance of these two filters to follow that of the C-POPP and the POPP-Dirichlet filters. We argue that the volume of data used to create the sensor models for both POPP and POPP-Beta was enough for an accurate estimate of the point-estimate sensor model (POPP) and the distribution sensor model (POPP-Beta). However, due to high correlations among the sensors, which were not captured by either the POPP or the POPP-Beta sensor model, their accuracy in estimating the parameter λ(t) is worse than that of C-POPP and POPP-Dirichlet. It is worse still for the POPP-Beta filter since POPP-Beta is more conservative in estimating the parameter λ(t) than the POPP model. For example, region 4 contains high tables and tall chairs, which the leg detector tended to falsely detect as people. Unless the upper body detector also detects a person, the leg detector's detection may be ignored. On the other hand, region 7 is a hallway with a water dispenser around the corner. This water dispenser is often falsely detected as a person by the upper body detector, and the leg detector's detections help to reduce this mistake.

Exploring for human activities
So far, the paper has focused on Bayesian methods for inferring a belief state about the spatio-temporal patterns of human occupancy from unreliable sensors. Given such a belief state a robot may plan how to actively explore to acquire new information so as to complete a task (Hanheide et al., 2017; Sridharan et al., 2019). Here, the robot uses predicted counts from the belief state to explore so as to detect human activities with increasing efficiency.
Specifically, the robot's choice is whether to explore new region-time combinations or to exploit region-time combinations that are known to yield a high number of activities. This is an instance of an exploration-exploitation problem. Exploration-exploitation problems arise whenever an agent lacks an adequate model of the process it must control. At each moment, the agent chooses either to explore so as to improve the model or to exploit the existing model so as to maximise immediate performance.
While exploration-exploitation problems in reinforcement learning are typically intractable, there are well known, fast to compute approximations (Wyatt, 1998; Alba & Dorronsoro, 2005; Audibert et al., 2009). One such approach is to use the upper bound of a probability distribution over the quantity being maximised. This causes the decision-making agent to exploit high-scoring, certain estimates, and to explore highly uncertain estimates. In our robot exploration, for example, the robot may visit a place either because the place is known to have a high number of people (exploitation) or because it potentially has a high number of people (exploration). In our case we use an upper bound on the arrival rate λ of a Poisson process, $\lambda_{UB}$, to choose the region for the robot to visit next. The upper bound of the probability interval of the arrival rate is obtained from $CDF^{-1}$, the inverse of the cumulative distribution function of a Gamma distribution, with $\lambda_{UB}(t_i, t_j)$ denoting the upper bound of λ between times $t_i$ and $t_j$, $i, j \in \{1, \ldots, \Delta\}$, where Δ is the length of the daily period. Given the upper bounds $\lambda^r_{UB}(t_i, t_j)$ for each region r from the set of all regions R, the region to be visited between time $t_i$ and $t_j$ is chosen by:

$$\arg\max_{r \in R} \lambda^r_{UB}(t_i, t_j) \qquad (18)$$

Figure 18 depicts a comparison between the MAP hypothesis estimate and the upper bound estimate of a Poisson process.
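The upper-bound selection rule can be illustrated with the sketch below. It is a simplified stand-in rather than the paper's implementation: it assumes a Gamma(α, β) posterior with integer shape α and rate β (so the CDF has a closed-form sum), inverts the CDF by bisection, and uses a 0.95 probability level. The level, the function names and the example posteriors are all assumptions.

```python
import math


def gamma_cdf_int_shape(x, alpha, beta):
    """CDF of Gamma(alpha, beta) for integer shape alpha and rate beta."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0  # k = 0 term of sum_{k < alpha} (beta * x)^k / k!
    for k in range(1, alpha):
        term *= beta * x / k
        total += term
    return 1.0 - math.exp(-beta * x) * total


def lambda_upper_bound(alpha, beta, level=0.95, tol=1e-9):
    """Invert the CDF by bisection to obtain the `level` quantile of lambda."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int_shape(hi, alpha, beta) < level:
        hi *= 2.0  # grow the bracket until it contains the quantile
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gamma_cdf_int_shape(mid, alpha, beta) < level:
            lo = mid
        else:
            hi = mid
    return hi


def choose_region(posteriors, level=0.95):
    """posteriors maps region id -> (alpha, beta); return the arg max of the upper bound."""
    return max(
        posteriors,
        key=lambda r: lambda_upper_bound(*posteriors[r], level=level),
    )


# Hypothetical beliefs: a well-observed region and a rarely visited one.
beliefs = {"visited_often": (31, 10), "rarely_visited": (3, 1)}
next_region = choose_region(beliefs)
```

In this example the well-observed region has the higher MAP estimate (3.0 against 2.0), but the rarely visited region has the larger upper bound because its posterior is much wider, so it is selected. This is the exploratory behaviour described above.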
To tie the estimate of a particular Poisson process over a time interval to data collected previously, as in Sect. 7 we assume that human presence in each region follows a periodic Poisson process with daily periodicity. This allows us to regularise, and fill missing data, across the point estimates of upper bounds using methods based on the Fourier transform. This exploits assumptions and algorithms introduced in our prior work. In particular, the series of upper bounds $\lambda_{UB}(t_i, t_j)$ are encoded and extracted via spectral analysis with the l-AAM technique described in Jovan et al. (2016). The plot in Fig. 18 shows what a spectral Poisson process looks like, i.e., the effects of the spectral processing on a periodic Poisson process. Algorithm 2 depicts the process of computing the upper bound of a Poisson process and applying spectral analysis to it. We use this approach with upper bounds produced by our previously presented estimators: FOPP, POPP, and POPP-Beta. The C-POPP and POPP-Dirichlet estimators are excluded from our experiments due to the need to limit experimental time to 45 days so as to keep building use conditions broadly the same⁷.

Algorithm 2 (excerpt)
// Create a cosine signal from f
• $x'_1, \ldots, x'_n \leftarrow s * \cos(2\pi * f + p)$
// Subtract the cosine from the current $x_1, \ldots, x_n$
• $x_1, \ldots, x_n \leftarrow x_1, \ldots, x_n - x'_1, \ldots, x'_n$
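The idea of repeatedly extracting a cosine and subtracting it from the signal can be sketched as follows. This is a simplified greedy extraction over discrete Fourier bins, not the l-AAM technique of Jovan et al. (2016), and it does not handle missing data. The function names and the use of a naive DFT are assumptions made to keep the example self-contained.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform for bins 0..n//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def spectral_model(x, n_components):
    """Greedily extract the strongest cosines and return their sum."""
    n = len(x)
    residual = list(x)
    model = [0.0] * n
    for _ in range(n_components):
        spectrum = dft(residual)
        k = max(range(len(spectrum)), key=lambda i: abs(spectrum[i]))
        # Bins 0 and n/2 have no mirrored partner, so they are not doubled.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        s = scale * abs(spectrum[k]) / n  # amplitude of the cosine
        p = cmath.phase(spectrum[k])      # phase of the cosine
        cosine = [s * math.cos(2 * math.pi * k * t / n + p) for t in range(n)]
        # Subtract the cosine from the current signal and add it to the model.
        residual = [r - c for r, c in zip(residual, cosine)]
        model = [m + c for m, c in zip(model, cosine)]
    return model
```

Keeping only a few components yields a smooth periodic curve through the noisy per-interval upper bounds, which is the regularising effect described above.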

Exploration evaluation
The dataset used in the previous section was collected by a mobile robot over 69 days of a real world trial. This robot was controlled by the exploration models described above. Due to hardware failures, sensor malfunctions and other external issues, only 48 days of the dataset were usable.
Three different exploration models were applied separately during three phases of the 69 days of the trial. All of these models used Eq. 18 to create their exploration policies. For the first 27 day phase of the trial, the robot followed an exploration policy based on the FOPP model. This resulted in 18 days of data. From day 28 to day 47, the robot followed an exploration policy according to the POPP model. This resulted in 15 days of data. Finally, from day 48 onwards, the robot followed an exploration policy according to the POPP-Beta model. This also resulted in 15 days of data. So that all three models can be compared equally, in the following we also constrain the data available for the FOPP model to the first 15 of its 18 days. We can compare the different exploration policies on the observations the robot made during the phase in which each policy was active. Due to the absence of information regarding occupancy in the places that the robot did not visit, only a comparison of the positive observations can be made.
Figure 19 shows the percentage of visits to each region which yielded a non-zero true count. As can be seen, the exploration policy produced by POPP-Beta has the highest proportion of such visits in many of the regions, followed by the exploration policy according to the POPP model. Recall that some regions, such as 4, 5, 6, and 7, are not densely populated with humans across time compared to other regions (such as 1, 2, 3, and 10). The POPP and POPP-Beta models, however, still managed to improve the percentage of positive observations. This shows that the models correctly predicted that people would be present in particular locations at particular times. One should note that region 6 contains vending machines which are often detected as a person by the upper body detector. This leads to the FOPP model planning to visit this particular location when no activity is taking place. The POPP and the POPP-Beta models were able to correct the miscounts occurring in region 6, providing a better estimate of the posterior over the arrival rate λ. This leads to models that better capture the true underlying process and thus support more accurate exploration-exploitation trade-offs.

Fig. 19 This graph shows the percentage of time that the robot observed activities when it was present in a region. It is a measure of how successful the robot's visit policy (choice of visit time and visit location) was in finding people. It presents results for the FOPP, POPP and POPP-Beta algorithms
During the first few days of each 15 day phase the robot primarily explores since each model initially has a highly uncertain estimate of λ. Every three days, the sensor model for both POPP and POPP-Beta is updated to represent more accurate true positive and false positive rates. Note that for the first 3 days of each exploration with POPP and POPP-Beta, the sensor model is set to a perfect sensor model (i.e. the true positive rate is set to 1.0 and the false positive rate to 0.0), with the hyperparameters for POPP-Beta set to follow a uniform distribution. As more days of data are experienced, the estimates increase in confidence and the robot starts to exploit this increased confidence by visiting locations which are likely to provide higher counts⁸.

Figure 20 shows the number of actual humans (performing some activities) observed under each exploration policy. Looking at the raw numbers provided by the figure, it seems that the exploration policy following the POPP model performed very well in finding people. However, this metric is unfair because the policies were run at different times (e.g. the exploration policy following the POPP-Beta model was on some occasions run on Saturdays and Sundays). To produce a metric for a fair comparison across the three models (FOPP, POPP, and POPP-Beta), which were deployed at different times and thus experienced different population dynamics, we look at the ratio between the observations made by our exploration policy and the expected observations made by a baseline policy in the same period. To create the baseline total for each model we take the true counts experienced in its first three days and multiply these by five to give an expected total over 15 days (the number of days of data available to every model). Three days are chosen to align with the 3-day periodic update of the sensor model for POPP and POPP-Beta, and to create a uniform baseline across the different explorations, since both the POPP and the POPP-Beta explorations should act similarly to the FOPP exploration in the first 3 days. This is the denominator in Eq. 19, where s(n) is the (true) number of people observed on day n. It is used to divide the cumulative number of observations up to the current day:

$$\hat{s}(n) = \frac{\sum_{i=1}^{n} s(i)}{5 \sum_{i=1}^{3} s(i)} \qquad (19)$$

Given this, a ŝ score of 1.0 on day 15 shows that people have been observed at the rate of the baseline, i.e. the underlying model has failed to exploit additional data correctly. A result over 1.0 shows that the model has exploited the available data to observe people at a greater rate than in the first 3 days. Figure 21 presents the cumulative normalised true counts of people observed by the robot across the three phases. This shows that exploration driven by the POPP and the POPP-Beta models improves the number of people observed during these phases. By the end of each of these two phases, the ratio is around 1.7. On the other hand, the FOPP policy showed a stable ratio around the baseline (1.0 at day 15), which means that the FOPP policy was not able to improve the number of people observed over time.
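The normalisation described above, cumulative true counts divided by five times the total of the first three days, can be computed as in the following sketch. The function name and the example counts are illustrative only.

```python
def normalised_scores(daily_counts, baseline_days=3, total_days=15):
    """Cumulative true counts divided by the extrapolated baseline total."""
    scale = total_days / baseline_days  # 5 for a 3-day baseline over 15 days
    baseline_total = scale * sum(daily_counts[:baseline_days])
    scores, running = [], 0
    for count in daily_counts[:total_days]:
        running += count
        scores.append(running / baseline_total)
    return scores


# A policy that keeps finding people at its initial rate ends at 1.0,
# while one that doubles its daily count after day 3 ends at 1.8.
flat = normalised_scores([10] * 15)
improving = normalised_scores([10] * 3 + [20] * 12)
```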

Exploration limitations
There are some limitations in our robot exploration experiment as a result of carrying it out on a real robot in an uncontrolled setting. A major limitation is how far we stretched our assumption of (approximately) constant population dynamics throughout the 69-day exploration. It is clear that the population in a university building fluctuates within a single academic semester (e.g. more students tend to be around in the middle of the semester than during the last week of the semester; see Fig. 20). We mitigate this by normalising the raw counts following Eq. 19. The normalisation is effective when the differences in population dynamics are within a reasonable range.
Due to the impact of occasional hardware and sensor malfunctions, plus an external time limit for running the experiments (winter break was approaching), we further stretched our assumption by assuming the population dynamics during weekends were similar to those on weekdays. However, upon inspection, there were far fewer students for the robot to observe on weekends than on weekdays (see Fig. 20). The mismatch between our assumption and the experimental setting mostly affected exploration following the POPP-Beta policy, which was tested last. The inclusion of weekends creates a large deviation in the population dynamics that renders the normalisation ineffective. Although the POPP-Beta policy was affected by a large variation in population dynamics, the policy was still able to improve the number of people observed over time. This can be shown by removing the weekends from the calculations in Eq. 19, resulting in a ratio of 1.48 at day 15 for the POPP-Beta policy. An ideal experiment would run multiple identical robots employing different exploration policies at the same time.

Conclusion
This article has presented Bayesian estimators for (1) estimating human activities, as count data, at each time of the day and (2) helping an autonomous robot balance exploring new time-place combinations where it might discover a high level of human activity against re-visiting learnt time-place combinations where a wealth of human activity is to be found. Our work was motivated by the application of counting people from an autonomous mobile robot using noisy sensors and perception algorithms. The work extends our prior work (Jovan et al., 2018) with two main contributions. First, we presented variations of our previous POPP formulation: POPP-Beta extends POPP by accounting for the unreliability of the observation model; C-POPP extends POPP by modelling the case when sensors are correlated; and POPP-Dirichlet combines POPP-Beta and C-POPP to provide the benefits of each correction. Evaluations on synthetic data and observations taken by a robot show that each extension provides progressively more accurate estimates than the POPP filter. Second, posteriors from FOPP, POPP and POPP-Beta were used to drive exploration by a mobile robot in a series of three exploration experiments. An upper bound interval exploration method in combination with the Fourier transform was used to solve the exploration-exploitation problem. This resulted in a labelled dataset of human presence counts. Our initial evaluation demonstrated that POPP and POPP-Beta were able to drive the robot to observe more people over time than the FOPP-based method.
There are many directions for further work, including using C-POPP and POPP-Dirichlet to drive the robot's observations over an extended time period, investigating other filtering strategies for faster and more accurate posterior estimates, and removing the reliance on the convenient closed forms of conjugate priors in the sensor models.

Fig. 2 A cycle process from count data collection, Poisson process estimation, through exploration plan generation for one day in an office-like environment. Count data are collected through perception algorithms or sensors while the robot patrols. The raw count data from multiple sensors are correctly filtered and merged via Bayesian inference taking

Fig. 4 Average KL-divergence from the gamma and switching filters to P(λ | s). The horizontal axis shows the true positive rate (top) and true negative rate (bottom) of one simulated sensor. The figure is taken from Jovan et al. (2018)

binomial distribution $BB(d \mid c, \zeta, \eta)$, where the p parameter in the binomial distribution $B(d \mid c, p)$ is drawn from a Beta distribution $Be(p \mid \zeta, \eta)$. Our sensor rates, $\tau_j$ and $\xi_j$, are now estimated from two Beta distributions: $Be(\tau \mid \zeta_\tau, \eta_\tau)$ and $Be(\xi \mid \zeta_\xi, \eta_\xi)$. $\zeta_\tau$ and ...

Fig. 5 Graphical representation of the POPP-Beta. Instead of having fixed point estimates for the sensor rates τ and ξ as in the POPP model, they are represented by Beta distributions in the POPP-Beta

further define $D_{e^+}$ as an m × c detection matrix formed from all the columns $D_{:k}$ where $k \in e^+$, and $D_{e^-}$ as the corresponding m × (l − c) detection matrix formed from all the columns $D_{:k}$ where $k \in e^-$.

Fig. 6 Graphical representation of C-POPP. Unlike in the POPP model, the detection matrix D represents a joint detection at a particular time interval and is affected by the value of the true count c and the sensor rates (joint true positive rate $P^+$ and joint true negative rate $P^-$)

Fig. 8 The RMSE of posterior estimates of λ for POPP and its variants, with 12 data samples used to build the (joint) sensor model and variation in $P^+$. All models are compared to the FOPP model. Each trial consisted of a stream of samples $s_1, \ldots, s_{144}$ used to update $P(\lambda \mid s_i)$. Accuracies of MAP estimates are shown in the top panel, accuracies of the expectation of the posterior in the bottom panel. Each data point is an average of 30 trials. Standard errors are shown

Fig. 12 The office building in which the robot gathered data. Areas are bounded by imaginary lines. The figure is taken from Jovan et al. (2018)

Fig. 14 The RMSE evolution of periodic Poisson processes with POPP, POPP-Beta, C-POPP, POPP-Dirichlet and FOPP filters from day 3 to day 36, averaged across all regions. Standard error is shown

Fig. 17 The Jensen-Shannon distance of the FOPP, POPP, POPP-Beta, C-POPP, and POPP-Dirichlet filters across regions. The Jensen-Shannon values are taken at the 36th day. Standard error is shown

Fig. 18 A spectral Poisson process of region 9 (see Fig. 12) represented by its MAP hypothesis (blue line) and its upper bound of the probability interval (red line) (Color figure online)

Fig. 20 The number of actual activities observed by the robot for each exploration policy

Fig. 21 The improvement ratio of activity observations during each phase of the trial. The dashed line indicates baseline performance, i.e., no improvement in exploration over time

Table 2 Averaged sensor models across all areas trained from 48 days of data