We now show how to construct the different interval relations using our JoinByS operator and iterators. We start with the expressions from Sect. 4 that do not include map operators, followed by those that do.
Expressions without map operators
Start Preceding and End Following Joins
These two join predicates are the easiest to implement, as they can be mapped directly to the JoinByS operator. For the start preceding join we have to keep track of the active \(\varvec{r}\) tuples, and trigger the output by the start of an \(\varvec{s}\) tuple. If two tuples start at the same time, we have to handle the \(\varvec{r}\) tuple first. Therefore, we call the JoinByS function, passing to it only the starting \(\varvec{s}\) endpoints. This is achieved by using a Filtering Iterator (Sect. 5.2). We also have to pass the ‘\(\leqslant \)’ predicate as the comparison function. A start preceding join then boils down to a single call of JoinByS (see Algorithm 2).
The algorithm StartPrecedingJoin receives iterators to the Endpoint Indexes. When using this algorithm with Endpoint Indexes, we simply wrap each index in an Index Iterator, an operation that, as noted before, we consider implicit.
We define the algorithm for the end following join similarly, but filter the ending endpoints of \(\varvec{s}\) and pass ‘<’ as the comparison function. The pseudocode of EndFollowingJoin is presented in Algorithm 3.
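The two calls can be illustrated with a small in-memory Python sketch. It is not the paper's implementation: the names `endpoint_index` and `join_by_s`, the endpoint layout `(timestamp, type, tuple_id)`, and the use of a Python set for the active tuples are assumptions made for illustration. The sketch also assumes that an ending endpoint sorts before a starting endpoint with the same time stamp.

```python
import operator

END, START = 0, 1  # assumed encoding: END sorts before START at equal time stamps

def endpoint_index(relation):
    """Sorted list of (timestamp, type, tuple_id) built from {tuple_id: (Ts, Te)}."""
    events = []
    for tid, (ts, te) in relation.items():
        events += [(ts, START, tid), (te, END, tid)]
    return sorted(events)

def join_by_s(r_events, s_events, comp, consumer):
    """Interleaved scan: r endpoints maintain the active set, s endpoints trigger output."""
    r_events = list(r_events)
    active, i = set(), 0
    for s_time, s_type, s_id in s_events:
        # Handle every r endpoint that comp orders before the current s endpoint.
        while i < len(r_events) and comp(r_events[i][:2], (s_time, s_type)):
            _, r_type, r_id = r_events[i]
            if r_type == START:
                active.add(r_id)
            else:
                active.discard(r_id)
            i += 1
        for r_id in sorted(active):
            consumer(r_id, s_id)

def start_preceding_join(r, s, consumer):
    # Filtering step: only the starting endpoints of s are passed on.
    s_starts = (e for e in endpoint_index(s) if e[1] == START)
    join_by_s(endpoint_index(r), s_starts, operator.le, consumer)

def end_following_join(r, s, consumer):
    # Filtering step: only the ending endpoints of s are passed on.
    s_ends = (e for e in endpoint_index(s) if e[1] == END)
    join_by_s(endpoint_index(r), s_ends, operator.lt, consumer)
```

Passing a consumer such as `lambda r_id, s_id: result.append((r_id, s_id))` collects the output pairs. The two joins differ only in which endpoints of s are kept and in the comparison function.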
Overlap Joins
The left overlap join can be implemented using the StartPrecedingJoin algorithm with an additional constraint \(r.T_e \leqslant s.T_e\). The pseudocode is shown in Algorithm 4. The right overlap join is implemented along similar lines using the EndFollowingJoin algorithm and the selection predicate \(s.T_s \leqslant r.T_s\).
For the Allen versions of the overlap joins, we use strict versions of Algorithms 2 and 3, StartPrecedingStrictJoin and EndFollowingStrictJoin, which do not allow a tuple r to start with a tuple s or a tuple s to end with a tuple r, respectively. They are just simple variations: StartPrecedingStrictJoin merely replaces the ‘\(\leqslant \)’ in Algorithm 2 with ‘<’ and EndFollowingStrictJoin replaces the ‘<’ in Algorithm 3 with ‘\(\leqslant \)’. Additionally, we change the ‘\(\leqslant \)’ in the selection predicates in the filteringConsumer functions to ‘<’.
During Joins
Implementing the during join is similar to Algorithm 4: we just have to swap the arguments for \(\varvec{r}\) and \(\varvec{s}\) (alternatively, we could also use the end following variant). For the Allen version of during joins, we replace the StartPrecedingJoin, EndFollowingJoin, and selection predicates with their strict counterparts.
If we simply call an algorithm with swapped arguments, the elements of the result pairs appear in a different order, i.e., \(\langle s, r\rangle \) instead of the expected \(\langle r, s\rangle \). If this is an issue, we can swap them back using a lambda function as the consumer. Putting everything together, we get Algorithm 5.
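The composition of a join with a filtering consumer, and the swap-back lambda, can be sketched in Python as follows. This is an illustration only: `Interval`, `filtering_consumer`, and the nested-loop `start_preceding_join` are hypothetical stand-ins (the nested loop replaces the sweep of Algorithm 2 but yields the same pairs), and the containment test \(r.T_e \leqslant s.T_e\) used for the during join is an assumption about the predicate defined in Sect. 4.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "id ts te")

def start_preceding_join(r, s, consumer):
    """Nested-loop stand-in for Algorithm 2 (same output predicate, no sweep)."""
    for rt in r:
        for st in s:
            if rt.ts <= st.ts < rt.te:
                consumer(rt, st)

def filtering_consumer(predicate, consumer):
    """Wrap a consumer so that it only sees pairs satisfying the predicate."""
    def consume(a, b):
        if predicate(a, b):
            consumer(a, b)
    return consume

def left_overlap_join(r, s, consumer):
    # Start preceding join plus the extra constraint r.Te <= s.Te.
    keep = filtering_consumer(lambda rt, st: rt.te <= st.te, consumer)
    start_preceding_join(r, s, keep)

def during_join(r, s, consumer):
    # Arguments are swapped, so pairs arrive as (s, r); the lambda swaps them back.
    swap_back = lambda st, rt: consumer(rt, st)
    keep = filtering_consumer(lambda st, rt: rt.te <= st.te, swap_back)  # assumed predicate
    start_preceding_join(s, r, keep)
```

Because the swap happens inside the consumer, callers of `during_join` always receive pairs in the order (r, s), regardless of how the underlying join was invoked.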
Expressions with map operators
In order to avoid physically changing tuple values or even the Endpoint Index, we apply the changes made by the map operators virtually with an iterator. While performing an interleaved scan of two Endpoint Indexes, instead of simply comparing the two endpoints \(r^e\) and \(s^e\) (as in \(r^e < s^e\)), we shift the time stamp of one of them when comparing: \(r^e + \delta < s^e\). In this way the algorithm performs an interleaved scan of the indexes as if we had shifted all \(\varvec{r}\) tuples in time by \(+\delta \).
During an interleaved scan, instead of forcing the iterators of the two Endpoint Indexes (for the relations \(\varvec{r}\) and \(\varvec{s}\)) to move synchronously as in all the operators so far, now one of the iterators lags behind by a constant offset. This behavior can be easily incorporated into our framework by using a special Endpoint Iterator that shifts the time stamp of every endpoint it returns on-the-fly.
There is a second issue: the new starting endpoint often is actually a shifted ending endpoint or vice versa. Consequently, we have to change the endpoint type as well. With the help of our Shifting Iterator, we can shift time stamps and also change endpoint types. As input parameters a shifting iterator receives a source Endpoint Iterator, the shifting distance, and an endpoint type (start or end).
The final issue is separately shifting the starting and ending endpoints by different amounts. We solve this by having independent iterators for both starting and ending endpoints and merging them on the fly in an interleaved fashion. The input parameters of the Merging Iterator are two other iterators, the events of which it merges. See “Appendix A” for more details.
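The three iterator types described above can be sketched as Python generators. This is a minimal illustration under assumptions: the endpoint layout `(timestamp, type, tuple_id)`, the function names, and the use of `heapq.merge` for the interleaved merge are choices made here and are not taken from Appendix A.

```python
import heapq

END, START = 0, 1  # assumed encoding: END sorts before START at equal time stamps

def index_iterator(index):
    """One independent pointer into an endpoint index (here a sorted list)."""
    yield from index

def filtering_iterator(source, kind):
    """Pass through only endpoints of the requested type."""
    return (e for e in source if e[1] == kind)

def shifting_iterator(source, delta, kind):
    """Shift every time stamp by delta and relabel the endpoint as `kind`."""
    for time, _, tid in source:
        yield (time + delta, kind, tid)

def merging_iterator(left, right):
    """Interleave two sorted endpoint streams into one sorted stream."""
    return heapq.merge(left, right)

r_index = [(1, START, "r1"), (3, START, "r2"), (4, END, "r1"), (9, END, "r2")]

# Shift starting endpoints by +2 and ending endpoints by +5, each with its own
# pointer into the same index, then merge the two streams on the fly.
shifted_starts = shifting_iterator(
    filtering_iterator(index_iterator(r_index), START), 2, START)
shifted_ends = shifting_iterator(
    filtering_iterator(index_iterator(r_index), END), 5, END)
virtual_r = list(merging_iterator(shifted_starts, shifted_ends))
```

Nothing is materialized except the final `list` call, which is only there to inspect the result. In a join, the merged stream would be consumed lazily by the scan.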
Before and Meets Joins
We are now ready to create a GeneralBeforeJoin (see Algorithm 6 and Fig. 4 for a schematic representation); we already handle the parameterized version here as well. This algorithm performs a virtual three-way sort-merge join of the two Endpoint Indexes. One pointer will traverse the Endpoint Index for relation \(\varvec{s}\), and two pointers will traverse the Endpoint Index for relation \(\varvec{r}\), all three pointers moving synchronously, but at different positions. This is why we had to (implicitly) create two Index Iterators for the same index (lines 4 and 6)—each of them represents a physical pointer to the same Endpoint Index; therefore we need two of them.
We express Allen’s before join by substituting 1 and \(+\infty \) for \(\beta \) and \(\delta \), respectively; Allen’s meets join by substituting 0 and 0, respectively; and the ISEQL before join by substituting 0 for \(\beta \) and only using \(\delta \) for the parameterized version. The parameter \(\beta \) distinguishes between the strict (Allen) and non-strict (ISEQL) versions of the operator.
Equals and Starts Joins
For the equals join we keep the original starting endpoints of \(\varvec{r}\), use the starting endpoints shifted by one as ending endpoints, and then execute a StartPrecedingJoin. This matches tuples from \(\varvec{r}\) and \(\varvec{s}\) with the same starting endpoints. We check for matching ending endpoints in the filteringConsumer function, which receives the actual tuples as input and thus has access to the time stamp attributes of the original tuples (see Algorithm 7 for the pseudocode).
For a starts join we just have to change the predicate in the filteringConsumer function from ‘\(=\)’ to ‘<’.
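A Python sketch of the equals and starts joins under stated assumptions: time stamps are integers, endpoints are `(timestamp, type, tuple_id)` triples in which an ending endpoint sorts before a starting endpoint at the same time stamp, and `join_by_s` is a hypothetical in-memory stand-in for the JoinByS operator. The virtual ending endpoints are produced by shifting the starting endpoints by one, and the filtering consumer compares the original ending time stamps.

```python
import heapq
import operator

END, START = 0, 1  # assumed encoding: END sorts before START at equal time stamps

def join_by_s(r_events, s_events, comp, consumer):
    """Stand-in for JoinByS: r endpoints update the active set, s endpoints emit."""
    r_events = list(r_events)
    active, i = set(), 0
    for s_time, s_type, s_id in s_events:
        while i < len(r_events) and comp(r_events[i][:2], (s_time, s_type)):
            _, r_type, r_id = r_events[i]
            if r_type == START:
                active.add(r_id)
            else:
                active.discard(r_id)
            i += 1
        for r_id in sorted(active):
            consumer(r_id, s_id)

def equals_join(r, s, consumer, end_test=operator.eq):
    """r and s map tuple ids to (Ts, Te); end_test compares r.Te with s.Te."""
    r_starts = sorted((ts, START, tid) for tid, (ts, _) in r.items())
    # Virtual ending endpoints: the starting endpoints shifted by one.
    r_virtual_ends = [(t + 1, END, tid) for t, _, tid in r_starts]
    r_events = heapq.merge(r_starts, r_virtual_ends)
    s_starts = sorted((ts, START, tid) for tid, (ts, _) in s.items())

    def filtering_consumer(r_id, s_id):
        # The consumer sees the original tuples, so it can compare the real Te.
        if end_test(r[r_id][1], s[s_id][1]):
            consumer(r_id, s_id)

    join_by_s(r_events, s_starts, operator.le, filtering_consumer)

def starts_join(r, s, consumer):
    # Same construction; only the predicate in the consumer changes to '<'.
    equals_join(r, s, consumer, end_test=operator.lt)
```

Each virtual r tuple is active for exactly one time stamp, so only s tuples that start at the same time stamp reach the consumer.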
Finishes Join
For the tuples in \(\varvec{r}\) we turn the ending events into starting events and shift the ending events by one before joining them to the tuples in \(\varvec{s}\) via an EndFollowingJoin (Algorithm 3). Finally, we check that the tuple from \(\varvec{s}\) started before the one from \(\varvec{r}\). For the pseudocode of the finishes join, see Algorithm 8.
Parameterized Start Preceding Join
We now turn to the parameterized variant of the start preceding join, which has the parameter \(\delta \) constraining the maximum distance between tuple starting endpoints. The basic idea is to take the starting endpoints of relation \(\varvec{r}\), shift them by \(\delta + 1\), change their type to ending endpoints, and add these virtual endpoints to the original endpoints of \(\varvec{r}\). This way each r tuple will be represented by three endpoints: the original starting and ending endpoints and the virtual ending endpoint. Then the parameterless StartPrecedingJoin algorithm (Algorithm 2) is applied to both streams of \(\varvec{r}\) and \(\varvec{s}\) endpoints. When the second ending endpoint of a tuple is encountered in the merged iterator, it can simply be ignored, because its corresponding tuple is no longer in the active tuple set (see “Appendix A”). Algorithm 9 depicts the pseudocode.
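The three-endpoint construction can be sketched in Python as follows. The sketch assumes integer time stamps and the endpoint layout `(timestamp, type, tuple_id)`; `endpoint_index`, `join_by_s`, and `p_start_preceding_join` are hypothetical names for an in-memory illustration, not the paper's Algorithm 9. Removing a tuple that is no longer in the active set is a no-op here, which is how the second ending endpoint gets ignored.

```python
import heapq
import operator

END, START = 0, 1  # assumed encoding: END sorts before START at equal time stamps

def endpoint_index(relation):
    """Sorted list of (timestamp, type, tuple_id) built from {tuple_id: (Ts, Te)}."""
    events = []
    for tid, (ts, te) in relation.items():
        events += [(ts, START, tid), (te, END, tid)]
    return sorted(events)

def join_by_s(r_events, s_events, comp, consumer):
    """Stand-in for JoinByS: r endpoints update the active set, s endpoints emit."""
    r_events = list(r_events)
    active, i = set(), 0
    for s_time, s_type, s_id in s_events:
        while i < len(r_events) and comp(r_events[i][:2], (s_time, s_type)):
            _, r_type, r_id = r_events[i]
            if r_type == START:
                active.add(r_id)
            else:
                active.discard(r_id)  # a second ending endpoint is silently ignored
            i += 1
        for r_id in sorted(active):
            consumer(r_id, s_id)

def p_start_preceding_join(r, s, delta, consumer):
    r_index = endpoint_index(r)
    # Virtual ending endpoints: starting endpoints shifted by delta + 1.
    virtual_ends = [(t + delta + 1, END, tid)
                    for t, kind, tid in r_index if kind == START]
    r_events = heapq.merge(r_index, virtual_ends)  # three endpoints per r tuple
    s_starts = [e for e in endpoint_index(s) if e[1] == START]
    join_by_s(r_events, s_starts, operator.le, consumer)
```

Whichever ending endpoint comes first, the original or the virtual one, deactivates the tuple, so an s tuple joins only if it starts within the r tuple and at most \(\delta \) after its start.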
Parameterized End Following Join
The analogous parameterized end following join is more complicated. The problem here is that each r tuple will have to be represented by two starting endpoints. The algorithm must consider a tuple activated only if both starting endpoints (and no ending endpoint) have been encountered.
We achieve this by introducing an iterator, called Second Start Iterator, that stores the tuple identifiers of events for which we have only encountered one starting endpoint in a hash set (see “Appendix A”). Only the second starting endpoint of this tuple will return the starting event. The pseudocode for the parameterized end following join is shown in Algorithm 10.
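The behavior of such an iterator can be sketched as a Python generator. This is an assumption-laden illustration rather than the version in Appendix A: the endpoint layout `(timestamp, type, tuple_id)` and the decision to let an ending endpoint cancel a tuple for which only one start has been seen are choices made here.

```python
END, START = 0, 1  # assumed endpoint type encoding

def second_start_iterator(source):
    """Suppress the first starting endpoint of each tuple and emit the second."""
    pending = set()  # tuple ids for which exactly one start has been seen
    for time, kind, tid in source:
        if kind == START and tid not in pending:
            pending.add(tid)        # first start: remember it, emit nothing
        elif kind == START:
            pending.remove(tid)     # second start: the tuple becomes active
            yield (time, START, tid)
        else:
            pending.discard(tid)    # assumed: an end cancels a half-seen tuple
            yield (time, END, tid)

stream = [
    (0, START, "r1"), (2, START, "r2"), (4, START, "r1"),
    (6, END, "r1"), (7, START, "r2"), (9, END, "r2"),
]
activated = list(second_start_iterator(stream))
```

In this example the starts of r1 at 0 and of r2 at 2 are swallowed, so downstream code sees r1 become active at 4 and r2 at 7.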
Parameterized Overlap Join
Now that we have an algorithm for the parameterized StartPrecedingJoin, we can define the parameterized left overlap join by combining PStartPrecedingJoin with a filteringConsumer function, similarly to what we have done for the non-parameterized overlap join. Algorithm 11 shows the pseudocode. Alternatively, we can use a PEndFollowingJoin and then check the predicate for the starting endpoint of the s tuple in the filteringConsumer function.
The right overlap join uses a PEndFollowingJoin with the corresponding predicate in the filteringConsumer function.
Parameterized During Join
The parameterized during join looks similar to Algorithm 11; we apply changes along the lines of those shown in the paragraph for the non-parameterized during join. (There is also an alternative version using a PEndFollowingJoin.)
Correctness of algorithms
Showing the correctness of our algorithms boils down to illustrating that we handle the map operators correctly and demonstrating the correctness of the StartPreceding and EndFollowing joins, as our algorithms are either StartPreceding or EndFollowing joins or are built on top of them.
Iterators and Map Operators
Here we show how to implement map operators with the help of iterators. Instead of materializing the result (e.g., on disk), we make the corresponding changes in a tuple as it passes through an iterator. If we still need a copy of the old event later on, we feed this event through another iterator and merge the two tuple streams using a Merging Iterator.
StartPreceding Join
We have to show that all tuples created by Algorithm 2 satisfy the predicate \(r.T_s \leqslant s.T_s < r.T_e\). A Filtering Iterator removes all the ending events from \(\varvec{s}\), so we only have to deal with starting events from \(\varvec{s}\) and with both types of events from \(\varvec{r}\). As the comparison operator we use ‘\(\leqslant \)’, which determines the order in which events are processed.
First, let us look at the case that both upcoming events in itR and itS are starting events. If \(r.T_s \leqslant s.T_s\), then r will be inserted into the active tuple set before s is processed, meaning that the (later) arrival of s will trigger the join with r. If \(r.T_s > s.T_s\), then s will be processed first, not encountering r in the active tuple set, meaning that the two will not join.
Second, if the next event in \(\varvec{r}\) is an ending event and the next event in \(\varvec{s}\) a starting event, then the two events can never be equal. Even if they have the same time stamp, the ending endpoint of r will always be considered less than the starting endpoint of s. Therefore, if \(r.T_e \leqslant s.T_s\), r will be removed first, so r and s will not join, and if \(r.T_e > s.T_s\), s will still join with r.
So, in summary, all the tuples generated by Algorithm 2 satisfy the predicate \(r.T_s \leqslant s.T_s < r.T_e\).
For a StrictStartPreceding join we run Algorithm 2 with the comparison operator ‘<’, yielding output tuples that satisfy the predicate \(r.T_s< s.T_s < r.T_e\). If both upcoming events in itR and itS are starting events, we get the correct behavior: \(r.T_s < s.T_s\) will lead to a join, \(r.T_s \geqslant s.T_s\) will not. If the r event is an ending event and the s event is a starting one, we also get the correct behavior: \(r.T_e \leqslant s.T_s\) will not join the r and s tuples, \(r.T_e > s.T_s\) will (the ending event of r is always less than the starting event of s).
EndFollowing Join
We show that all tuples created by Algorithm 3 satisfy the predicate \(r.T_s < s.T_e \leqslant r.T_e\). This time a Filtering Iterator removes all the starting events from \(\varvec{s}\), so we only have to deal with ending events from \(\varvec{s}\) and with both types of events from \(\varvec{r}\). The comparison operator used for the non-strict version is ‘<’.
First, assume that the next event in itR is a starting event and the next event in itS is an ending event. As an ending event takes precedence over a starting event, if \(r.T_s = s.T_e\), the s event will come first. In turn this means that if \(r.T_s < s.T_e\), r is added to the active set first, resulting in a join, and if \(r.T_s \geqslant s.T_e\), s is processed first, meaning there is no join.
Second, we now look at the case that both events are ending events. Due to the comparison operator ‘<’, the events are handled in the right way: if \(r.T_e < s.T_e\), we remove r first, so there is no join, and if \(r.T_e \geqslant s.T_e\) we handle s first, resulting in a join.
For a StrictEndFollowing join we run Algorithm 3 with ‘\(\leqslant \)’ as comparison operator to obtain tuples that satisfy the predicate \(r.T_s< s.T_e < r.T_e\). Let us first look at a starting event for \(\varvec{r}\) and an ending event for \(\varvec{s}\). As ending events are processed before starting events with the same time stamp, we get: if \(r.T_s < s.T_e\), then r is added first, resulting in a join, and if \(r.T_s \geqslant s.T_e\), then s is removed first, meaning there is no join. Finally, we investigate the case that both events are ending events: if \(r.T_e \leqslant s.T_e\), then r is removed first, i.e., no join, and if \(r.T_e > s.T_e\), then s is processed first, joining r and s.
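The four cases established in this section can be collected in one overview, which lists the comparison operator passed to JoinByS and the predicate satisfied by every output pair:

```latex
\[
\begin{array}{lll}
\text{Join} & \text{Comparison} & \text{Output predicate} \\
\hline
\text{StartPreceding}       & \leqslant & r.T_s \leqslant s.T_s < r.T_e \\
\text{StrictStartPreceding} & <         & r.T_s < s.T_s < r.T_e \\
\text{EndFollowing}         & <         & r.T_s < s.T_e \leqslant r.T_e \\
\text{StrictEndFollowing}   & \leqslant & r.T_s < s.T_e < r.T_e
\end{array}
\]
```

In all four cases the bound that involves endpoints of different types is strict, because an ending endpoint always precedes a starting endpoint with the same time stamp. The comparison operator only decides whether the bound between endpoints of the same type is strict.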