We begin in Sect. 3.1 by describing our approach to learning changepoints from time series data of counts using an infinite-state HMM, and then couple this with learning latent group structure in Sect. 3.2.
3.1 Modeling communication rates
Let \(N_{t}\) represent the total number of emails the user sends on day t. The set of variables \(\{N_{t} : 1 \leq t \leq T\}\) defines a stochastic process. We assume that \(N_{t} \sim\text{Poisson}(\lambda_{t})\), where \(\lambda_{t}\) is the rate at which the user sends emails on day t. Because \(\lambda_{t}\) is allowed to change across days, this type of process is usually referred to as a non-homogeneous Poisson process.
Our model assumes that the user communicates with K separate groups of people. Each email the user sends is sent to one of the K groups. We assume that emails are sent to the groups according to independent Poisson processes, i.e., a change in the rate at which emails are sent to one group does not affect the rate at which emails are sent to other groups. This assumption is clearly an approximation of what happens in practice: for example, there may be exogenous (external) events, such as the user going on vacation, that affect most or all groups simultaneously. Nonetheless, we believe this independence model is a useful (and computationally efficient) place to start, allowing us to capture “first-order” group behavior. Models allowing dependence between groups and/or shared dependence on exogenous events would be of interest as extensions of the simpler model we propose here.
Let \(N_{k,t}\) represent the (unobserved) number of emails the user sends to group k on day t. We model \(N_{k,t} \sim\text{Poisson}(\lambda_{k,t})\), where \(\lambda_{k,t}\) is the rate at which the user sends emails to group k on day t. Because of our independence assumptions, \(N_{t}\) is the superposition of independent Poisson processes: \(N_{t} = \sum_{k=1}^{K}N_{k,t} \sim\text{Poisson}(\lambda_{t})\), where \(\lambda_{t} = \sum_{k=1}^{K}\lambda_{k,t}\).
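The superposition property can be checked numerically. The following Python sketch convolves the distributions of two independent Poisson counts and compares the result with a single Poisson distribution at the summed rate. The rates are illustrative values, not quantities from the data, and the function names are hypothetical helpers introduced only for this example.

```python
import math


def poisson_pmf(n, rate):
    """P(N = n) for N ~ Poisson(rate)."""
    return math.exp(-rate) * rate ** n / math.factorial(n)


def pmf_of_sum(n, rate_a, rate_b):
    """P(A + B = n) for independent A ~ Poisson(rate_a), B ~ Poisson(rate_b)."""
    return sum(poisson_pmf(i, rate_a) * poisson_pmf(n - i, rate_b)
               for i in range(n + 1))


# Hypothetical daily rates for K = 2 groups (illustrative values only).
group_rates = [1.5, 4.0]
total_rate = sum(group_rates)  # lambda_t = sum_k lambda_{k,t}

for n in range(6):
    direct = poisson_pmf(n, total_rate)        # Poisson(lambda_t)
    convolved = pmf_of_sum(n, *group_rates)    # distribution of N_{1,t} + N_{2,t}
    print(n, round(direct, 6), round(convolved, 6))
```

The two printed columns should agree up to floating-point error, which is the statement \(N_{t} \sim\text{Poisson}(\sum_{k}\lambda_{k,t})\) for K = 2.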
For the remainder of this subsection, we describe the model for time-varying communication rates for a single group, deferring discussion of how we learn the groups themselves to Sect. 3.2. We model a user’s email rate to group k, \(\{\lambda_{k,t} : 1 \leq t \leq T\}\), using an HMM. Under the HMM, the value of \(\lambda_{k,t}\) depends on a latent state \(s_{k,t}\), and the value of \(s_{k,t}\) depends on \(s_{k,t-1}\), the state on the previous day. Unique states represent different modes of activity between the user and recipient groups.
We define a changepoint to be a time t where the HMM transitions between different states (\(s_{k,t} \neq s_{k,t+1}\)). Changepoints will typically correspond to unobserved events throughout the user’s history that change their communication rate with the group (such as vacations, research deadlines, changing schools, etc.). We define the single, contiguous interval of time between two adjacent changepoints to be a segment. Each segment represents a period of constant mean activity for the user with respect to a particular group.
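To make the definitions of changepoint and segment concrete, the following Python sketch extracts both from a given state sequence for one group. It assumes a non-empty list whose first entry is the state on day 1; the function names and the example sequence are hypothetical.

```python
def find_changepoints(states):
    """Return the 1-indexed days t for which s_t != s_{t+1}."""
    # states[t - 1] holds s_t, so states[t] holds s_{t+1}.
    return [t for t in range(1, len(states))
            if states[t - 1] != states[t]]


def find_segments(states):
    """Return (first_day, last_day, state) for each interval between changepoints."""
    segments = []
    start = 1
    for t in find_changepoints(states):
        segments.append((start, t, states[t - 1]))
        start = t + 1
    # The final segment runs from the last changepoint to day T.
    segments.append((start, len(states), states[-1]))
    return segments


# Hypothetical state sequence for T = 9 days.
states = [1, 1, 1, 2, 2, 3, 3, 3, 3]
print(find_changepoints(states))  # [3, 5]
print(find_segments(states))      # [(1, 3, 1), (4, 5, 2), (6, 9, 3)]
```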
Traditional HMMs have a finite number of states, limiting the modes of activity a user can have. Here we allow the HMM to have a countably infinite number of states, where only a finite subset of those states is ever seen given the observed data (similar to Beal et al. 2002). We enforce the restriction that the HMM cannot transition to previously seen states (known as a left-to-right HMM), ensuring that each unique state spans a single interval of time (see Footnote 1). We model such an HMM by placing separate symmetric Dirichlet priors over each row of the transition matrix. As the number of latent states tends to infinity, these priors converge in distribution to Dirichlet processes (Neal 2000). A property of Dirichlet processes is that, after integrating out the parameters for the HMM transition matrix, the transition probabilities between states become:
$$ P(s_{k,t} | s_{k,-t}, \gamma, \kappa) = \begin{cases} \frac{V_t + \gamma}{V_t + \gamma+ \kappa} & \text{if $s_{k,t} = s_{k,t-1}$,}\\ \frac{\kappa}{V_t + \gamma+ \kappa} & \text{if $s_{k,t}$ is a new state,}\\ 0 & \text{otherwise},\\ \end{cases} $$
(1)
where γ and κ are adjustable parameters, \(s_{k,-t} = \{s_{k,t'} : t' \neq t\}\) is the set of all other states (not just the previous state, since the integration of the transition matrix introduces dependencies between all latent states), and \(V_{t} = \sum_{t'=2}^{t-1}\delta(s_{k,t'} = s_{k,t-1})\delta(s_{k,t'-1} = s_{k,t-1})\) counts the self-transitions made so far in state \(s_{k,t-1}\), i.e., how long the HMM has been in that state up to time t. Appendix A contains additional discussion of the sensitivity of segment lengths to γ and κ.
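As an illustration of (1), the following Python sketch computes \(V_{t}\) from the states on days 1 to t−1 and returns the probabilities of staying in the current state and of moving to a new one. It then forward-samples a left-to-right state sequence. This is a minimal sketch under stated assumptions rather than the inference procedure: it uses only the earlier days (which is all \(V_{t}\) depends on), it assumes day 1 starts in state 1, and the function names and parameter values are hypothetical.

```python
import random


def self_transition_count(history):
    """V_t: number of self-transitions made so far in the most recent state."""
    last = history[-1]
    return sum(1 for i in range(1, len(history))
               if history[i] == last and history[i - 1] == last)


def transition_probs(history, gamma, kappa):
    """Return (P(stay in current state), P(move to a new state)) from Eq. (1)."""
    v = self_transition_count(history)
    denom = v + gamma + kappa
    return (v + gamma) / denom, kappa / denom


def sample_states(num_days, gamma, kappa, seed=0):
    """Forward-sample a left-to-right state sequence s_1, ..., s_T."""
    rng = random.Random(seed)
    history = [1]  # assumption: day 1 starts in state 1
    for _ in range(num_days - 1):
        p_stay, _ = transition_probs(history, gamma, kappa)
        if rng.random() < p_stay:
            history.append(history[-1])
        else:
            history.append(history[-1] + 1)  # a new state; earlier states are never revisited
    return history


# History s_1..s_5 = 1,1,1,2,2 gives V_6 = 1 (one self-transition in state 2).
print(transition_probs([1, 1, 1, 2, 2], gamma=2.0, kappa=1.0))  # (0.75, 0.25)
print(sample_states(num_days=20, gamma=5.0, kappa=0.5))
```

Because \(V_{t}\) grows with the time spent in a state, the probability of staying increases the longer the current segment lasts; larger γ or smaller κ favors longer segments.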
The other dependence to model in the HMM is how group k’s rate at time t depends on its latent state \(s_{k,t}\), namely \(\lambda_{k,t} \mid s_{k,t}\). We use Poisson regression to model the log of these rates, i.e., \({\log\lambda_{k,t} = X_{k,t}^{T}\theta}\), where \(X_{k,t}^{T}\) is a set of features for day t and θ is a vector of regression parameters. We construct \(X_{k,t}^{T}\) and θ such that \({\log \lambda_{k,t} = \beta_{k, s_{k,t}}}\), where \(\beta_{k,m}\) is the log of the rate at which the user sends emails to group k while in time segment m (corresponding to state m of the HMM). \(X_{k,t}\) is a binary vector indicating the latent state of the HMM on day t, and \(\theta= [\beta_{k,1}, \beta_{k,2}, \ldots, \beta_{k,M_{k}}]^{T}\), where \(M_{k}\) is the number of unique states. Because we are modeling \(\lambda_{k,t}\) with Poisson regression, we can also include other features (which may or may not depend on group k). For example, in the results in this paper we include day-of-week effects:
$$ \log\lambda_{k,t} = \beta_{k, s_{k,t}} + \alpha_{d(t)}, $$
(2)
where d(t)∈W maps day t to a day-of-week category. We can, for example, use W={0,1} to represent weekdays and weekends. This configuration allows the overall number of emails sent by the user to vary between weekdays and weekends, while the relative emailing rates for each group remain unchanged. As an example, consider a user who only sends emails on weekdays. The \(\alpha_{\text{weekend}}\) regression term would have a large negative value, forcing \(\lambda_{k,t} \approx 0\) for every group on weekends. In the results in this paper we use W={0,…,6}, allowing overall activity to be modulated for each day of the week individually.
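The effect of the shared day-of-week term in (2) can be seen in a small Python sketch. It evaluates \(\lambda_{k,t} = \exp(\beta_{k,s_{k,t}} + \alpha_{d(t)})\) for two groups on a weekday and on a weekend day. All parameter values, group labels, and function names are hypothetical and chosen only for illustration.

```python
import math


def group_rate(beta_k, alpha, state, day_type):
    """lambda_{k,t} = exp(beta_{k, s_{k,t}} + alpha_{d(t)}), as in Eq. (2)."""
    return math.exp(beta_k[state] + alpha[day_type])


# Hypothetical parameters (not estimates from the paper).
beta = {
    "work": {1: math.log(6.0)},    # state 1: baseline of 6 emails per day
    "family": {1: math.log(2.0)},  # state 1: baseline of 2 emails per day
}
alpha = {0: 0.0, 1: -1.5}  # W = {0, 1}: 0 = weekday, 1 = weekend

for day_type in (0, 1):
    rates = {k: group_rate(beta[k], alpha, 1, day_type) for k in beta}
    total = sum(rates.values())
    shares = {k: round(r / total, 3) for k, r in rates.items()}
    print(day_type, round(total, 3), shares)
```

The total rate drops on the weekend, while each group's share of the total is the same on both day types, because \(\alpha_{d(t)}\) multiplies every group's rate by the same factor.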
Figure 2 shows the overall structure of the HMM in the form of a graphical model, with a different \(\lambda_{k,t}\) for each group k. Since the λ’s are deterministic functions of α and β (using (2)), they could be omitted from the depiction of the graphical model, but are included for clarity.
3.2 Modeling recipient groups
We discuss next how to model the K different groups that the user interacts with, where a group is defined as a distribution over the possible sets of recipients for an email. Intuitively the different groups should account for different relationships that are reflected in the individual’s email time series, such as familial and friend relationships, organizational relationships, recreational activities, and so forth.
We assume each email is sent to one of the K latent groups. Let \(z_{t,n}\) represent the latent group that email n on day t was sent to, and \(y_{t,n} \in\{0,1\}^{R}\) a binary vector indicating which of the R possible recipients are in the email. Given the latent group, the recipients of the email are selected independently with probability \(\phi_{k,r}\), i.e., a conditionally-independent Bernoulli model. The generative model is
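$$ \phi_{k,r} \sim\text{Beta}\bigl(\alpha^{(z)}, \beta^{(z)}\bigr), \qquad y_{t,n,r} \mid z_{t,n} = k \sim\text{Bernoulli}(\phi_{k,r}), $$
(3)
where \(y_{t,n,r}\) is the r-th entry of \(y_{t,n}\), and \(\alpha^{(z)}\) and \(\beta^{(z)}\) are the parameters of an independent set of Beta priors over the individual recipient probabilities. Under this model the expected number of recipients for an email sent to group k is \(\sum_{r}\phi_{k,r}\).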
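The conditionally independent Bernoulli model with Beta priors described above can be simulated in a few lines of Python. The sketch draws recipient probabilities for one group from a Beta prior, samples the recipient vector of a single email, and computes the expected number of recipients. The prior parameters, the number of recipients, and the function names are hypothetical choices for illustration.

```python
import random


def sample_recipient_probs(num_recipients, a, b, rng):
    """Draw phi_{k,r} ~ Beta(a, b) independently for each recipient r."""
    return [rng.betavariate(a, b) for _ in range(num_recipients)]


def sample_recipients(phi_k, rng):
    """Draw a binary vector y with y_r ~ Bernoulli(phi_{k,r}) independently."""
    return [1 if rng.random() < p else 0 for p in phi_k]


def expected_num_recipients(phi_k):
    """Expected number of recipients of an email sent to group k: sum_r phi_{k,r}."""
    return sum(phi_k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
phi_k = sample_recipient_probs(num_recipients=8, a=0.5, b=2.0, rng=rng)
y = sample_recipients(phi_k, rng)
print(y, round(expected_num_recipients(phi_k), 3))
```

Note that this model allows an email with no recipients when every Bernoulli draw is 0; the sketch makes no attempt to exclude that case.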
This modeling of the latent group indicator variables \(z_{t,n}\) is a key aspect of the model; the distribution of \(z_{t,n}\) introduces dependencies between the K separate HMMs from Sect. 3.1 and the generative model of email recipients in (3). The discrete probabilities over the latent variables are a function of the daily rates \(\lambda_{k,t}\), the rate at which the user is sending emails to group k on day t. Because emails to each group arrive according to a Poisson process with rates \(\{\lambda_{k,t} : 1 \leq t \leq T\}\), and the time between emails for a particular group therefore follows an exponential distribution, it is straightforward to show (using standard properties of exponential random variables; e.g., Ross 2007) that the probability that an email on day t is sent to group k can be written as
$$ P\bigl(z_{t,n} = k \bigm{|} {\bigl\{\lambda_{k',t} : 1 \leq k' \leq K\bigr\}}\bigr) = \frac{\lambda_{k,t}}{\sum_{k'=1}^K\lambda_{k',t}}. $$
(4)
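Equation (4) is a normalization of the daily group rates, which the following Python sketch makes explicit. It computes the group probabilities for one day and samples group assignments for a handful of emails. The rates and function names are hypothetical, and groups are 0-indexed in the code.

```python
import random


def group_probs(rates):
    """P(z_{t,n} = k) = lambda_{k,t} / sum_{k'} lambda_{k',t}, as in Eq. (4)."""
    total = sum(rates)
    return [rate / total for rate in rates]


def sample_group(rates, rng):
    """Draw a 0-indexed group for one email sent on day t."""
    # random.choices normalizes the weights internally, matching Eq. (4).
    return rng.choices(range(len(rates)), weights=rates, k=1)[0]


rates_today = [6.0, 2.0, 0.5]  # hypothetical lambda_{k,t} for K = 3 groups
print([round(p, 3) for p in group_probs(rates_today)])

rng = random.Random(1)  # fixed seed so the sketch is repeatable
print([sample_group(rates_today, rng) for _ in range(10)])
```

Because the probabilities depend on \(\lambda_{k,t}\), a changepoint in any one group's HMM changes the assignment probabilities of every group on the following days.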
Figure 3 shows the graphical model for representing the group aspect of the model, as described in this section. In the interest of interpretability, all the HMM variables described in Fig. 2 (except for λ and N) are combined into a single supernode.