Abstract
In this paper, we consider the problem of computing a median of marked point process data under an edit distance. We formulate this problem as a binary linear program and propose to solve it to optimality with off-the-shelf optimization software. We report numerical experiments that demonstrate the effectiveness of the proposed method and its application to earthquake prediction.
1 Introduction
A time series of discrete events observed in continuous time is referred to as a point process. In particular, if each event is associated with some values called marks, then it is referred to as a marked point process (MPP). An example of an MPP is earthquake data, since earthquakes occur discretely in continuous time and their magnitudes and epicenters, for example, naturally define their marks; see, e.g., [6, 15, 20, 24, 29] for other examples.
In analyzing MPPs, it is common to use time windows to generate a large number of pieces, i.e., local patterns, of given data. This processing enables us to discuss properties of the given data in terms of relationships among the generated pieces. In this setting, an edit distance, which originated with Victor and Purpura [35], is often employed as a metric; see [31] for its general definition, and also, e.g., [8, 11, 28] for data analysis based on this distance.
In practice, distances are queried frequently, and hence their computation often forms a bottleneck of the whole analysis, affecting the quality of the results. For this reason, algorithms for computing the Victor–Purpura-type edit distance (the VP distance) have been studied in the literature; see [1, 12, 14, 35]. See also, e.g., [9, 18, 19] for the Earth mover’s distance and, e.g., [5, 34, 36] for the Wasserstein distance, which are related concepts.
However, algorithms for analyzing MPPs under the VP distance are limited in the existing literature [7, 16, 22, 27, 30]. In particular, a median (or prototype) is useful when one wants to represent a set of MPPs by a single MPP that is typical and interpretable. Its applications include neuronal spike behavior [7], earthquake aftershocks [30], wildfire activity [27], and plant communities [22]. Although algorithms for computing medians of MPPs have been devised in [7, 30, 33], they are heuristic. Against this background, in this paper, we devise a first exact method. It is based on integer programming, which enables us to apply state-of-the-art, powerful software. Although optimization techniques have already been used in the literature owing to the nature of the VP distance, less attention has been paid to integer programming in this context.
1.1 Our Contribution
In this paper, we study the median computation for MPPs under the VP distance. Our contributions include:
-
We first point out the existence of a well-structured median. We then use it to give a binary linear programming formulation, offering a first exact method for the median computation of MPPs.
-
We conduct numerical experiments on random data, which show that the method has the potential to solve instances with hundreds of MPPs, with thousands of events in total, in a few minutes using the software Gurobi.
-
We present an application of medians of MPPs in predicting earthquake behavior and show its validity using a real-world data set.
We note that the key ingredient for our formulation, the notion of candidates in Lemma 1, has already been pointed out in [32] (Theorem 4a) to develop a heuristic method to compute medians. We also note that although generalized metrics and barycenters are introduced in [10, 25], how to extend our current work to computing the related barycenters for MPPs is an open question.
1.2 Organization
The remainder of this paper is organized as follows. We first introduce definitions in Sect. 2. We then analyze the median computation in Sect. 3, and formulate it as a binary linear program in Sect. 4. Section 5 is devoted to numerical experiments. Concluding remarks are given in Sect. 6.
2 Preliminaries
2.1 Marked Point Process Data
Let \(\mathbb {R}\) be the set of reals. An event is a point p in \(\mathbb {R}^d\), where d is some positive integer. The first component is called the time of occurrence of the event, and the remaining \(d-1\) components form the mark of the event. For example, if an event represents an earthquake, we would typically have \(d=5\), with the components of the 4-dimensional mark representing the magnitude, the depth, and the latitude and longitude of the epicenter. That is,
$$\begin{aligned} p = (p_1, p_2, p_3, p_4, p_5) = (\hbox {time}, \hbox {magnitude}, \hbox {depth}, \hbox {latitude}, \hbox {longitude}). \end{aligned}$$
When \(d=1\), we are concerned only with the occurrence times of some particular phenomenon (e.g., neuron activity), and the events have no marks. We write \(p_j\) for the jth component of event p.
A marked point process (MPP) P is defined as a finite multiset of events. Thus, \(p=q\) can hold for two events p, q in a general MPP. Hereafter, we write |P| for the number of events in P, and \(\mathcal {U}\) for the set of all possible MPPs for some fixed d. The analysis of MPPs is important [21] because they naturally arise in many phenomena, such as earthquakes [13, 30], foreign exchange markets [11, 31] and their interactions [26], and floods [2]. In such analyses, the edit distance plays a central role. We next define the edit distance between two MPPs.
2.2 Edit Distance for Marked Point Process Data
To define the Victor–Purpura-type edit distance (VP distance), we first need to introduce transformations of an MPP P into an MPP Q, by means of the following three elementary operations on events: For \(X \in \mathcal {U}\),
-
the \(\texttt {shift}\) operation on event x of X is to replace it by an arbitrary event y in \(\mathbb {R}^d\), which we call the shift destination of x,
-
the delete operation on an event of X is to remove it from X, and
-
the insert operation on an event of \(\mathbb {R}^d\) is to add it to X.
Note that |X| is unchanged by the shift operation, while delete and insert decrease and increase it by one, respectively.
We can transform any given \(P \in \mathcal {U}\) to any \(Q \in \mathcal {U}\) by a finite sequence of the above three operations. Indeed, a trivial transformation is to delete all events of P and then insert all events of Q. We now define the costs of the elementary operations. The costs of delete and insert are 1, i.e., a constant, while the cost of the shift of event x to y is given by
$$\begin{aligned} \texttt {cost}(x, y)=\sum _{j=1}^{d} \lambda _j\, |x_j - y_j|, \end{aligned}$$
where \(\lambda _j\) is a positive parameter for \(j = 1,\ldots , d\). The cost of a transformation from P into Q, i.e., a finite sequence of shift, delete, and insert operations whose application to P yields Q, is defined as the sum of the costs of its elementary operations. We say that a transformation from P into Q is optimal if its cost attains the minimum among all such transformations.
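As a concrete illustration, the elementary operation costs can be sketched in a few lines of Python. This is a minimal sketch under the assumption that the shift cost is the \(\lambda \)-weighted L1 sum of componentwise differences described above; the function and constant names are ours:

```python
# Elementary operation costs of the VP distance (sketch; assumes the
# shift cost is the lambda-weighted L1 distance between events).

def shift_cost(x, y, lam):
    """cost(x, y) = sum_j lam[j] * |x_j - y_j| for events x, y in R^d."""
    return sum(l * abs(a - b) for l, a, b in zip(lam, x, y))

# delete and insert each have constant cost 1
DELETE_COST = INSERT_COST = 1.0
```

For instance, with \(d=2\) and \(\lambda _1=\lambda _2=10\), shifting the event (0, 1) to (0.5, 3) costs \(10 \cdot 0.5 + 10 \cdot 2 = 25\).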
For convenience, we write \(\texttt {cost}(T)\) to represent the cost of a transformation T. Also, we say that a transformation from P into Q is simple if
-
(C1)
for each event of P, either delete or shift is applied exactly once,
-
(C2)
for each event of Q, either insert is applied exactly once, or it is the shift destination of exactly one event of P,
and no other operations are involved in the transformation. Figure 1a shows, for \(d=2\), two MPPs P and Q, where events are indicated by vertical bars. The values below the bars show the times of occurrence, and those above the bars give the value of the (1-dimensional) mark. Figure 1b shows a simple transformation from P into Q that involves one shift, one insert, and two delete operations.
Proposition 1
For any two MPPs P and Q, there always exists an optimal simple transformation from P into Q.
Proof
It suffices to show that we need only consider transformations in which shift is applied at most once to each event of P. To this end, suppose that shift is applied twice to some event, first from p to q, then from q to r. The cost associated with these two shift operations is
$$\begin{aligned} \texttt {cost}(p, q)+\texttt {cost}(q, r) = \sum _{j=1}^{d} \lambda _j \bigl (|p_j - q_j| + |q_j - r_j|\bigr ) \ge \sum _{j=1}^{d} \lambda _j\, |p_j - r_j| = \texttt {cost}(p, r); \end{aligned}$$
thus unifying the two shift operations into one operation from p to r would incur no increase in cost. This completes the proof of the proposition since any shift on an inserted event and any delete on a shifted event can be replaced by one insert and one delete, respectively, without increasing the cost. \(\square \)
We can now define the edit distance.
Definition 1
[31, 35] The VP distance d(P, Q) of two MPPs P, Q is the minimum cost required for transforming P into Q using the three operations shift, delete, and insert of events.
The value of d(P, Q) is well defined since it is always attained by the cost of a simple transformation from P into Q from Proposition 1, and there are only finitely many simple transformations for fixed P, Q. The following proposition follows immediately from the definition of edit distance.
Proposition 2
The edit distance is a metric.
Proof
For any three MPPs P, Q, R, one has
-
(i)
\(d(P,Q)=0\) if and only if \(P=Q\) by the positivity of the parameters,
-
(ii)
\(d(P,Q)=d(Q,P)\) by the symmetry of the costs, and
-
(iii)
\(d(P,R)\le d(P,Q)+d(Q,R)\) since concatenating any transformation from P into Q and one from Q into R yields a transformation from P into R whose cost is exactly the sum of those of the two.
\(\square \)
2.3 Bipartite-Graph Model for Computing Edit Distance
We now explain how to compute the value of d(P, Q). This is accomplished by solving an assignment problem on a bipartite graph [14]. This bipartite graph is defined as follows:
-
Let \(V^i_P=\{ v^i_p \mid p \in P\}\) and \(V^i_Q=\{ v^i_q \mid q \in Q\}\) for \(i=0,1\). Then, set
$$\begin{aligned} V_P=V^0_P \cup V^1_Q\quad \hbox {and}\quad V_Q=V^1_P \cup V^0_Q. \end{aligned}$$Thus, \(|V_P|=|V_Q|=|P|+|Q|\). We say that each vertex \(v_x^1\) is dummy.
-
Let \(E=\{ \{u, v\} \mid u \in V_P, \, v \in V_Q \}\). Then, \((V_P\cup V_Q, E)\) is an undirected complete bipartite graph, which we denote by \(G_{PQ}\). Each edge \(\{v^i_x, v^j_y\}\) has a cost, denoted by \(\texttt {cost}(\{v^i_x, v^j_y\})\), defined as
$$\begin{aligned} \left\{ \begin{array}{ll} \texttt {cost}(x, y) &{}\quad \hbox {if}\quad i=0, j=0 \,(\texttt {shift}),\\ 1 &{}\quad \hbox {if}\quad i=1, j=0, \hbox {i.e.},\, v^i_x~\hbox {is dummy}\, (\texttt {insert}),\\ 1 &{}\quad \hbox {if}\quad i=0, j=1, \hbox {i.e.},\, v^j_y~\hbox {is dummy}\, (\texttt {delete}),\\ 0&{}\quad \hbox {if}\quad i=1, j=1, \hbox {i.e.},\, v^i_x\quad \hbox {and}\quad v^j_y~\hbox {are both dummy.}\\ \end{array}\right. \end{aligned}$$For \(F \subseteq E\), we denote by \(\texttt {cost}(F)\) the sum of costs of the edges in F.
Figure 2a shows the graph \(G_{PQ}\) for two MPPs P, Q depicted in Fig. 1a. Black (white) circles correspond to non-dummy (dummy) vertices. We show costs corresponding to the shift edges next to them assuming \(\lambda _1 = \lambda _2 = 10\). The delete and insert edges are indicated by dotted lines, and edges with 0 cost are omitted.
For a perfect matching M in \(G_{PQ}\), let \(T_M\) be a simple transformation from P into Q in which
-
for each \(p \in P\), delete is applied if \(\{v^0_p,u \} \in M\) for some \(u \in V^1_Q\) and shift from p to q is applied if \(\{v^0_p,v^0_q \} \in M\) for some \(q \in Q\), and
-
for each \(q \in Q\), insert is applied if \(\{u,v^0_q\} \in M\) for some \(u \in V^1_P\) and shift from p to q is applied if \(\{v^0_p,v^0_q\} \in M\) for some \(p \in P\).
Then, \(\texttt {cost}(M)=\texttt {cost}(T_M)\) holds by the definitions of the edge costs. Conversely, we observe that for any simple transformation T from P into Q, there is a perfect matching M such that \(T=T_M\), which implies the following.
Proposition 4
[14] For any two MPPs \(P,Q \in \mathcal {U}\), let \(M^*\) be a minimum-cost perfect matching in \(G_{PQ}\). Then, \(\texttt {cost}(M^*)=d(P,Q)\).
The perfect matching shown in Fig. 2b corresponds to the simple transformation shown in Fig. 1b.
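The reduction in Proposition 4 can be sketched as follows, using SciPy's assignment solver as a stand-in for a minimum-cost perfect-matching routine. This is our own sketch (function and variable names are ours), and it assumes the weighted-L1 shift cost:

```python
# Sketch: d(P, Q) as a minimum-cost perfect matching in the bipartite
# graph G_PQ of Sect. 2.3, solved as an assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

def vp_distance(P, Q, lam):
    """P, Q: lists of events (tuples in R^d); lam: weights lambda_1..lambda_d."""
    n, m = len(P), len(Q)
    size = n + m                       # |V_P| = |V_Q| = |P| + |Q|
    C = np.zeros((size, size))
    for i, p in enumerate(P):
        for j, q in enumerate(Q):      # shift edges: weighted L1 cost
            C[i, j] = sum(l * abs(a - b) for l, a, b in zip(lam, p, q))
        C[i, m:] = 1.0                 # delete: real event matched to a dummy
    C[n:, :m] = 1.0                    # insert: dummy matched to a real event
    # dummy-dummy entries stay 0
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum()
```

Note that the delete and insert edges of cost 1 each make a shift of cost more than 2 never worthwhile, exactly as in the original Victor–Purpura metric.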
3 Medians of Marked Point Process Data Under Edit Distance
3.1 Problem Description and Toy Examples
The median computation is the problem of finding, for a given set \(\{P_1,\ldots ,P_k\}\) of MPPs with \(P_h \in \mathcal {U}\) for \(h=1,\ldots ,k\), a single MPP X in \(\mathcal {U}\) that minimizes the sum of VP distances to \(P_1,\ldots ,P_k\), i.e.,
$$\begin{aligned} f(X)=\sum _{h=1}^{k} d(X, P_h), \end{aligned}$$
over \(\mathcal {U}\). Such an X is called a median. Hereafter, the given set of MPPs is referred to as an instance.
The median computation has a simple structure when \(k=2\). To see this, let \(\{P_1,P_2\}\) be any instance in this case. Then, both \(P_1\) and \(P_2\) are medians of \(\{P_1, P_2\}\) because for any X in \(\mathcal {U}\), we have
$$\begin{aligned} f(X)=d(X,P_1)+d(X,P_2) \ge d(P_1,P_2) \end{aligned}$$
and \(f(P_1)=f(P_2)=d(P_1,P_2)\) since d is a metric. In general, any minimizer of f among all MPPs in an instance is referred to as a medoid. Hence, in this example, \(P_1\) and \(P_2\) are medoids as well as medians.
An instance where the sets of medoids and medians are disjoint is depicted in Fig. 3. Assuming that \(\lambda _1\) and \(\lambda _2\) are sufficiently small, any median consists of exactly two events, and neither insert nor delete is used since their costs are relatively high. This means that the X shown at the bottom is the unique median. Observe, for example, that event x of X is such that \(x_1\) is the one-dimensional median of \(\{p^h_1\}_{h=1,2,3}\), i.e., of \(\{0.1,0.2,0.3\}\), while \(x_2\) is that of \(\{p^h_2\}_{h=1,2,3}\), i.e., of \(\{1,2,3\}\), so that the total shift cost for x is minimized.
3.2 Well-Structured Medians
Motivated by the above observation, we here point out that there always exists a median with a simple structure. Let \(\mathcal {P}\) denote the set of events in a given instance \(\{P_1,\ldots ,P_k\}\), i.e., \(\mathcal {P}=P_1 \cup \cdots \cup P_k\). Then, we call c in \(\mathbb {R}^d\) a candidate if for each \(j=1,\ldots ,d\), there is p in \(\mathcal {P}\) with \(c_j=p_j\), and denote by \(\mathcal {C}\) the set of candidates. In other words, letting \(\mathcal {A}_j\) be the set of values for the jth component of the events in the instance, i.e., \(\{p_j \mid p \in \mathcal {P}\}\), \(\mathcal {C}\) can be written as \(\mathcal {C}= \mathcal {A}_1 \times \cdots \times \mathcal {A}_d\). Note that \(\mathcal {P}\subseteq \mathcal {C}\) and \(\mathcal {C}\) is finite since each \(P_h\) is finite for \(h=1,\ldots ,k\). For the instance shown in Fig. 3, we have \(|\mathcal {P}|=6\) while \(|\mathcal {C}| = 18\) because \(\mathcal {A}_1 = \{0.1,0.2,0.3,0.7,0.8,0.9\}\) and \(\mathcal {A}_2 = \{1,2,3\}\). Note that the two events of X belong to \(\mathcal {C}\) but not to \(\mathcal {P}\). The following lemma claims that one can construct a median for any instance by selecting events from \(\mathcal {C}\) (allowing duplication).
Lemma 1
Let \(\{P_1,\ldots ,P_k\}\) be any instance. Then, there exists a median such that all of its events belong to \(\mathcal {C}\). We call such a median well-structured.
Proof
Take any median X and fix a simple transformation from X into \(P_h\) with cost \(d(X,P_h)\) for each \(h=1,\ldots ,k\). Let x be any event of X and \(H_{x}\) be the set of its shift destinations across these transformations. Note that \(H_{x}\) is nonempty; otherwise, removing x from X would decrease the value of f, a contradiction. The set X remains a minimizer of f even if x is replaced by y, as long as y minimizes
$$\begin{aligned} g(y)=\sum _{p \in H_{x}} \texttt {cost}(y, p)=\sum _{j=1}^{d} \left[ \lambda _j \sum _{p \in H_{x}} |y_j - p_j|\right] \end{aligned}$$
over \(\mathbb {R}^d\). We see that y minimizes g if each \(y_j\) minimizes the term in the square bracket, which can be accomplished by setting \(y_j\) to \(p_j\) for some \(p \in H_{x}\), i.e., a one-dimensional median. This implies that there exists a minimizer of g that belongs to \(\mathcal {C}\), and hence completes the proof. \(\square \)
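The candidate set \(\mathcal {C}= \mathcal {A}_1 \times \cdots \times \mathcal {A}_d\) is straightforward to enumerate. A minimal sketch (function and variable names are ours):

```python
# Sketch of the candidate set C = A_1 x ... x A_d of Sect. 3.2.
from itertools import product

def candidates(instance):
    """All points c with each component c_j taken from some event of the input."""
    events = [p for P in instance for p in P]
    d = len(events[0])
    # A_j: the set of values of the jth component over all input events
    axes = [sorted({p[j] for p in events}) for j in range(d)]
    return [tuple(c) for c in product(*axes)]
```

With an illustrative instance whose component sets match those reported for Fig. 3 (\(\mathcal {A}_1\) of size 6 and \(\mathcal {A}_2\) of size 3), this yields \(|\mathcal {C}| = 18\) while \(|\mathcal {P}|=6\).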
4 A Binary Linear Programming Model
The binary linear programming model for the median computation proposed in this paper is a direct consequence of Lemma 1. We create, for each \(c \in \mathcal {C}\), an integer variable \(m_c\) to represent the number of events in a solution that are identical to c, i.e., the multiplicity of c. We also create, for each \(c \in \mathcal {C}\) and \(p \in \mathcal {P}\), a binary variable \(s_{cp}\) which takes one if p is the shift destination of c. Finally, we create, for each \(p \in \mathcal {P}\), a binary variable \(\iota _p\) (\(\sigma _p\)) which takes one if p is inserted (p is the shift destination of some event). Then, the median computation can be formulated as
$$\begin{aligned} \hbox {(P)}\quad \begin{array}{llll} \min &{} \displaystyle \sum _{p \in \mathcal {P}} \iota _p + \sum _{h=1}^{k} \Bigl (\sum _{c \in \mathcal {C}} m_c - \sum _{p \in P_h} \sigma _p \Bigr ) + \sum _{c \in \mathcal {C}} \sum _{p \in \mathcal {P}} c_{cp}\, s_{cp} &{}&{}\\ \hbox {s.t.} &{} \iota _p + \sigma _p = 1 &{} (p \in \mathcal {P}), &{} (2)\\ &{} \displaystyle \sum _{c \in \mathcal {C}} s_{cp} = \sigma _p &{} (p \in \mathcal {P}), &{} (3)\\ &{} \displaystyle \sum _{p \in P_h} s_{cp} \le m_c &{} (c \in \mathcal {C},\ h=1,\ldots ,k), &{} (4)\\ &{} m_c \in \mathbb {N},\quad s_{cp},\, \iota _p,\, \sigma _p \in \{0,1\} &{} (c \in \mathcal {C},\ p \in \mathcal {P}), &{} \end{array} \end{aligned}$$
where \(c_{cp}\) represents \(\texttt {cost}(c, p)\) for each \(c \in \mathcal {C}\) and \(p \in \mathcal {P}\), and \(\mathbb {N}\) denotes the set of nonnegative integers.
In order to obtain the median X corresponding to a solution of \(\hbox {(P)}\), pick each candidate \(c \in \mathcal {C}\) with multiplicity \(m_c\). The transformation from X into each \(P_h\) is obtained as follows: For each \(p \in P_h\),
-
(i)
if \(\iota _{p}=0\) and \(\sigma _{p}=1\), then take any event x of X that is identical to a candidate c with \(s_{cp}=1\) and shift it to p (where the unique existence of such c is guaranteed from (3) and the existence of such x from (4)),
-
(ii)
if \(\iota _{p}=1\) and \(\sigma _{p}=0\), then p is inserted (in this case, there is no c with \(s_{cp}=1\) from (3)),
and the events in X that have no shift destination after the above procedure are deleted. Thus, the resulting transformation is simple.
The number of events that are deleted in the transformation from X into \(P_h\) constructed above is given by
$$\begin{aligned} \sum _{c \in \mathcal {C}} m_c - \sum _{p \in P_h} \sigma _p; \end{aligned}$$
thus the second term in the objective function coincides with the total cost concerning delete. The first and third terms in the objective function coincide with the total costs concerning insert and shift, respectively. Therefore, the value of the objective function (when minimized) coincides with f(X) for any X that is represented by the m variables.
Theorem 1
The binary linear programming problem \(\hbox {(P)}\) correctly formulates the median computation.
Proof
Straightforward from the discussion so far. \(\square \)
Remark 1
One can replace the constraint of \(m_c \in \mathbb {N}\) by \(m_c \in \mathbb {R}\) because \(m_c\) is minimized in the objective function and the value of \(m_c\) is bounded from below by an integer from Eq. (4).
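As a sanity check of the formulation on tiny instances, one can also enumerate all small multisets of candidates directly. The following naive sketch is our own code, not the proposed method; it assumes the weighted-L1 shift cost and combines the matching-based distance of Sect. 2.3 with Lemma 1:

```python
# Naive exhaustive median search for tiny instances only: try every
# multiset of at most max_events candidates and keep the one minimizing f.
from itertools import combinations_with_replacement, product
import numpy as np
from scipy.optimize import linear_sum_assignment

def vp_distance(P, Q, lam):
    """VP distance via a minimum-cost assignment (see Sect. 2.3)."""
    n, m = len(P), len(Q)
    C = np.zeros((n + m, n + m))
    for i, p in enumerate(P):
        for j, q in enumerate(Q):
            C[i, j] = sum(l * abs(a - b) for l, a, b in zip(lam, p, q))
        C[i, m:] = 1.0                 # delete
    C[n:, :m] = 1.0                    # insert
    r, c = linear_sum_assignment(C)
    return C[r, c].sum()

def brute_force_median(instance, lam, max_events):
    events = [p for P in instance for p in P]
    axes = [sorted({p[j] for p in events}) for j in range(len(events[0]))]
    cands = list(product(*axes))       # candidate set C (Lemma 1)
    best, best_f = None, float("inf")
    for M in range(max_events + 1):
        for X in combinations_with_replacement(cands, M):
            f = sum(vp_distance(list(X), P, lam) for P in instance)
            if f < best_f:
                best, best_f = list(X), f
    return best, best_f
```

For the toy instance \(\{\{(0)\},\{(0)\},\{(10)\}\}\) with \(\lambda _1 = 1\), the search returns the single-event median \(\{(0)\}\) with \(f = 2\), since shifting 0 to 10 (cost 10) is dominated by a delete plus an insert (cost 2).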
5 Numerical Experiments
In this section, we report numerical results. We implemented the formulation in Python and ran it on an Intel Core i7-1165G7 @ 2.80 GHz with 16.0 GB of RAM, using the optimization software Gurobi ver. 10.0.0.
We consider two settings. The first evaluates the computation time required for the formulation to solve randomly generated instances to optimality. The second demonstrates the effective use of medians in predicting the behavior of earthquakes, with medoids used as a benchmark.
5.1 Computation Time for Randomly Generated Data
Table 1 shows the results. Here, we consider 18 cases by changing the values for the dimension of the events, d, and the number of MPPs in an instance, k, which correspond to the rows. For \(d=1\), events have no marks while for \(d=3\), each event has a two-dimensional vector as its mark. For each case, ten instances are generated in the following way:
-
The first component, i.e., time of occurrence, of every event is selected from \(\{0,1,\ldots ,49\}\) uniformly at random,
-
If \(d\ge 2\), then the jth component for \(j\ge 2\), i.e., a mark, of every event is selected from \(\{1,\ldots ,5\}\) uniformly at random, meaning that there are 5 (25) possibilities for the mark vectors if \(d=2\) (\(d=3\)), and
-
The number of events in each MPP is selected from \(\{1,\ldots ,20\}\) uniformly at random, meaning that it is 10.5 on average and hence the total number of events in the input, i.e., \(|\mathcal {P}|\), is 10.5k on average and at most 20k.
Each component of the events is normalized to be contained in [0, 1], and the parameters \(\lambda _j\) are set to one. The minimum, average, and maximum of the computation times over the ten instances are shown for each case.
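The instance generation described by the bullets above can be sketched as follows (our own code mirroring the bullets; the normalization step is omitted):

```python
# Sketch of the random instance generation of Sect. 5.1:
# times in {0,...,49}, marks in {1,...,5}, 1 to 20 events per MPP.
import random

def random_instance(k, d, seed=None):
    rng = random.Random(seed)
    instance = []
    for _ in range(k):
        n = rng.randint(1, 20)                                # events per MPP
        P = [tuple([rng.randrange(50)]                        # occurrence time
                   + [rng.randint(1, 5) for _ in range(d - 1)])  # mark vector
             for _ in range(n)]
        instance.append(P)
    return instance
```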
We see from Table 1 that for each fixed d, the computation time grows as \(|\mathcal {P}|\) increases. However, for \(d=1,2\), instances with more than a thousand events are solved within 10 min on average. On the other hand, for \(d=3\), we often failed to solve an instance within an hour even for \(k=40\), and hence the results for larger instances are omitted. One reason is that the number of the s variables is \(|\mathcal {C}|\cdot |\mathcal {P}|\) and hence can be \(50 \cdot 5^{d-1} \cdot 20k\) for the instances we tested, which is 1,000,000 for \(d=3\) and \(k=40\), as shown in the rightmost column. Note, in particular, that even when each component of the events is binary, we have \(|\mathcal {C}| = O(2^d)\).
Nevertheless, we confirmed that the gap between the upper and lower bounds in each computation tends to become small at an early stage, which suggests that the proposed formulation can also be used in heuristic approaches. For example, for an instance with \(d=3\) and \(k=40\), a solution with a guaranteed gap of 1.50% is found within a minute. The average number of events in a median was approximately 10.5.
Remark 2
It is possible to impose bounds on the number of events if one is interested in a median with as few events as possible, or to use \(\mathcal {P}\) instead of \(\mathcal {C}\), which substantially reduces the number of the s variables, if one is interested in approximating a median with events from the given input (although in both cases, the results are no longer guaranteed to be medians in the exact sense).
Finally, we look at a brute-force algorithm mentioned in [32]; see Sect. 3.2.2 of that paper for a heuristic method. This algorithm investigates all the \(\left( {\begin{array}{c}|\mathcal {C}|\\ M\end{array}}\right) \) possibilities for a fixed M, and hence its computation time is in general much longer than ours, although it is an exact method if run for several promising values of M. Note that even for an instance with \(|\mathcal {C}|=43\), there are approximately \(1.9 \times 10^9\) possibilities when \(M=10\), for example.
A naive approach for overcoming this drawback would be to sample, say, n solutions from all the possibilities and choose one that minimizes f among them. The comparison results for the ten instances with \(d=2\) and \(k=10\) are summarized in Table 2, where we set M to 10, the expected number of events in an MPP. The objective values are the values of f; relative errors are shown in brackets. When \(n=10\), the naive approach runs slightly faster than ours, but the relative error can be as large as 7.4% in the worst case; when \(n=100\), the naive approach is much slower than ours, with no significant improvement in either the worst-case or the average relative error.
5.2 Application in Earthquake Prediction
We apply our method to analyze MPPs with one-dimensional marks adapted from real-world earthquake data. The task is to predict the days and magnitudes of earthquake occurrences of magnitude 5.0 or more. The datasets generated and/or analyzed during the current study are available from the Japan Meteorological Agency.
We focused on the earthquakes occurring in the period from January 1, 2000 to December 31, 2017, within the region of longitude \(140^{\circ }\)E–\(149^{\circ }\)E and latitude \(36^{\circ }\)N–\(42^{\circ }\)N, the area in the vicinity of the Tohoku Oki Earthquake of 2011, covering the points where the Pacific Plate and the North American Plate overlap. The time unit of events was set to 1 day. For each day, we have an event for each earthquake whose magnitude is 5.0 or greater in our preprocessed dataset. For each such event, we associate its magnitude as the one-dimensional mark. As in the previous experiment, each component of the events is normalized to be contained in [0, 1].
From this data, we created 216 MPPs by splitting the 18-year time period into time windows of 1 month each. Thus, each MPP corresponds to 1 month, and the number of events in each MPP is at most 31. We also allowed processes with zero events, which stand for months in which no earthquakes occurred.
5.2.1 Problem Setting
First, we divided the set of MPPs obtained above into two parts. The data of 2000/1/1–2011/12/31 was treated as training data, and that of 2012/1/1–2017/12/31 as test data. For each MPP in the test data, we predict the pattern of earthquakes occurring in the following month by a procedure described below where k is a fixed parameter:
-
Procedure(k):
-
Step 1.
For a given MPP, choose the k closest MPPs (with respect to the VP distance) from the training data.
-
Step 2.
Extract the set of MPPs corresponding to the following month of the k MPPs chosen in Step 1, and compute their median. This median serves as the prediction for the following month.
-
Step 3.
Compute VP distance between the MPP of the following month in the test data and its prediction computed in Step 2.
For example, suppose that we are to use the MPP of January 2012 to predict that of February 2012 with \(k = 3\). In Step 1, we compute the VP distance between the MPP of January 2012 and every MPP in the training data, and choose the 3 MPPs having the smallest distances. Suppose that they are those of June 2003, November 2008, and May 2001. Then, in Step 2, we extract the MPPs of July 2003, December 2008, and June 2001, and compute their median. Finally, in Step 3, we compute the VP distance between this median and the MPP of February 2012.
5.2.2 Results
We evaluate the prediction accuracy of Procedure(k) in terms of the total sum of the VP distances computed in Step 3, i.e., the gap between the actual MPPs in the test data and those predicted by the procedure. The benchmark is Procedure(1) because it corresponds to the method proposed in [13] (see also [11] for a similar method used for exchange forecasts). For this reason, we look at, for \(k \ge 2\), its (relative) prediction accuracy defined as
$$\begin{aligned} 1 - \frac{\hbox {total sum of VP distances of Procedure}(k)}{\hbox {total sum of VP distances of Procedure}(1)}. \end{aligned}$$
For example, if this value is 0.5, then it means that the proposed procedure brings 50% improvement over the alternative one proposed in [13].
The results are summarized in Table 3, where we applied the procedure to all MPPs from January 2012 to November 2017. December 2017, which has no data for the following month, was excluded from the analysis; thus the number of data points used in prediction was 71. The computation time shown in the table is the total time required for dealing with all 71 instances. We varied the value of k as \(k = 1, 2, \ldots , 10\) and that of the parameters for the cost of shift as \(\lambda _1=\lambda _2=0.5, 1.0, 2.0\).
We see from Table 3 that large values of k do not necessarily lead to good predictions. This is because if k is unnecessarily large, then the set considered in Step 2 can include MPPs with large VP distances, which can drag down the precision of the computed median. The best prediction accuracy for the cases with \(\lambda =0.5, 1.0, 2.0\) is attained at \(k=5,5,8\), respectively, as indicated by underlines. We also observe that the computation time is not very sensitive to changes in k or \(\lambda \). We note that the sizes of the instances solved here are at most those of the instances tested for \(d=2\) and \(k=10\) in the previous section, and hence the computation here was very quick.
6 Concluding Remarks
In this paper, we considered the problem of computing a median of marked point process data under the edit distance that originated with Victor and Purpura in 1997. Our main result is a binary linear programming formulation for this problem. Numerical results showed that, through our formulation, medians of instances with thousands of events in total can be computed in reasonable time by making use of the software Gurobi. This study may shed light on the value of optimization approaches in analyzing marked point process data; see, e.g., [3, 4, 17, 23] for such attempts in other data analyses. In contrast to the current study, developing efficient heuristic algorithms is also an important challenge.
References
Aronov, D.: Fast algorithm for the metric-space analysis of simultaneous responses of multiple single neurons. J. Neurosci. Methods 124(2), 175–179 (2003)
Banerjee, A., Goswami, B., Hirata, Y., Eroglu, D., Merz, B., Kruths, J., Marwan, R.: Recurrence analysis of extreme event-like data. Nonlinear Process. Geophys. 28(1), 213–229 (2021)
Bertsimas, D., Dunn, J.: Optimal classification trees. Mach. Learn. 106(7), 1039–1082 (2017)
Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)
Chizat, L., Roussillon, P., Léger, F., Vialard, F.-X., Peyré, G.: Faster Wasserstein distance estimation with the Sinkhorn divergence. Adv. Neural Inf. Process. Syst. 33, 2257–2269 (2020)
Descombes, X., Zerubia, J.: Marked point process in image analysis. IEEE Signal Process. Mag. 19(5), 77–84 (2002)
Diez, D.M., Schoenberg, F.P., Woody, C.D.: Algorithms for computing spike time distance and point process prototypes with application to feline neuronal responses to acoustic stimuli. J. Neurosci. Methods 203(1), 186–192 (2012)
Eroglu, D., McRobie, F.H., Ozken, I., Stemler, T., Wyrwoll, K.-H., Breitenbach, S.F., Marwan, N., Kurths, J.: See-saw relationship of the Holocene East Asian-Australian summer monsoon. Nat. Commun. 7(1), 1–7 (2016)
Gottschlich, C., Schuhmacher, D.: The shortlist method for fast computation of the earth mover’s distance and finding optimal solutions to transportation problems. PLOS ONE 9(10), e110214 (2014)
Heinemann, F., Klatt, M., Munk, A.: Kantorovich–Rubinstein distance and barycenter for finitely supported measures: foundations and algorithms. Appl. Math. Optim. 87(1), 4 (2023)
Hirata, Y., Aihara, K.: Timing matters in foreign exchange markets. Physica A 391(3), 760–766 (2012)
Hirata, Y., Aihara, K.: Edit distance for marked point processes revisited: an implementation by binary integer programming. Chaos Interdiscip J Nonlinear Sci 25(12), 123117 (2015)
Hirata, Y., Iwayama, K., Aihara, K.: Possibility of short-term probabilistic forecasts for large earthquakes making good use of the limitations of existing catalogs. Phys. Rev. E 94(4), 042217 (2016)
Hirata, Y., Sukegawa, N.: Two efficient calculations of edit distance between marked point processes. Chaos Interdiscip. J. Nonlinear Sci. 29(10), 101107 (2019)
Holden, L., Sannan, S., Bungum, H.: A stochastic marked point process model for earthquakes. Nat. Hazards Earth Syst. Sci. 3(1/2), 95–101 (2003)
Junqueira Saldanha, M.H., Hirata, Y.: Solar activity facilitates daily forecasts of large earthquakes. Chaos 32(6), 061107 (2022)
Lee, I.G., Yoon, S.W., Won, D.: A mixed integer linear programming support vector machine for cost-effective group feature selection: branch-cut-and-price approach. Eur. J. Oper. Res. 299(3), 1055–1068 (2022)
Li, W., Ryu, E.K., Osher, S., Yin, W., Gangbo, W.: A parallel method for earth mover’s distance. J. Sci. Comput. 75(1), 182–197 (2018)
Ling, H., Okada, K.: An efficient earth mover’s distance algorithm for robust histogram comparison. IEEE Trans. Pattern Anal. Mach. Intell. 29(5), 840–853 (2007)
Mallet, C., Lafarge, F., Roux, M., Soergel, U., Bretar, F., Heipke, C.: A marked point process for modeling lidar waveforms. IEEE Trans. Image Process. 19(12), 3204–3221 (2010)
Marwan, R.: Challenges and perspectives in recurrence analyses of event time series. Front. Appl. Math. Stat. 32(4), 420–429 (2023)
Mateu, J., Schoenberg, F.P., Diez, D.M., González, J.A., Lu, W.: On measures of dissimilarity between point patterns: classification based on prototypes and multidimensional scaling. Biom. J. 57(2), 340–358 (2015)
Miyashiro, R., Takano, Y.: Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur. J. Oper. Res. 247(3), 721–731 (2015)
Mohler, G.: Marked point process hotspot maps for homicide and gun crime prediction in Chicago. Int. J. Forecast. 30(3), 491–497 (2014)
Müller, R., Schuhmacher, D., Mateu, J.: Metrics and barycenters for point pattern data. Stat. Comput. 30(4), 953–972 (2020)
Nakano, S., Hirata, Y., Iwayama, K., Aihara, K.: Intra-day response of foreign exchange markets after the Tohoku-Oki earthquake. Physica A 419(1), 203–214 (2015)
Nichols, K., Schoenberg, F.P., Keeley, J.E., Bray, A., Diez, D.: The application of prototype point processes for the summary and description of California wildfires. J. Time Ser. Anal. 32(4), 420–429 (2011)
Ozken, I., Eroglu, D., Stemler, T., Marwan, N., Bagci, G.B., Kurths, J.: Transformation-cost time-series method for analyzing irregularly sampled data. Phys. Rev. E 91(6), 062911 (2015)
Prigent, J.-L.: Option pricing with a general marked point process. Math. Oper. Res. 26(1), 50–66 (2001)
Schoenberg, F.P., Tranbarger, K.E.: Description of earthquake aftershock sequences using prototype point patterns. Environmetrics 19(3), 271–286 (2008)
Suzuki, S., Hirata, Y., Aihara, K.: Definition of distance for marked point process data and its application to recurrence plot-based analysis of exchange tick data of foreign currencies. Int. J. Bifurc. Chaos 20(11), 3699–3708 (2010)
Tranbarger, K.E.: Point process prototypes, and other applications of point pattern distance metrics. PhD Thesis, University of California, Los Angeles (2005)
Tranbarger Freier, K.E., Schoenberg, F.P.: On the computation and application of prototype point patterns. Open Appl. Inform. J. 4, 1 (2010)
Vallender, S.: Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)
Victor, J.D., Purpura, K.P.: Metric-space analysis of spike trains: theory, algorithms and application. Netw. Comput. Neural Syst. 8(2), 127–164 (1997)
Xie, Y., Wang, X., Wang, R., Zha, H.: A fast proximal point method for computing exact Wasserstein distance. In: Uncertainty in Artificial Intelligence, pp. 433–453. PMLR (2020)
Acknowledgements
The authors are very grateful to the two anonymous reviewers for their instructive comments and suggestions. The research of NS is partially supported by Grant-in-Aids for Scientific Research (no. JP20K19747). We thank the Japan Meteorological Agency for providing the earthquake dataset used in this study.
Communicated by Anita Schöbel.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Sukegawa, N., Suzuki, S., Ikebe, Y. et al. On Computing Medians of Marked Point Process Data Under Edit Distance. J Optim Theory Appl 200, 178–193 (2024). https://doi.org/10.1007/s10957-023-02352-8