STB: space time boxes

  • Dakshi Agrawal
  • Raghu Ganti
  • Jeff Jonas
  • Mudhakar Srivatsa
Regular Paper


With the advent of the mobile era in the last decade and the evolution of the app economy on smartphones and other smart devices, location data has become abundant. Traditional spatial analysis techniques are locked away in databases (such as DB2 Spatial, ESRI ArcGIS server, Oracle Spatial and Graph) that enable only basic analytics and do not scale well to societal-scale data. Moreover, these approaches tend to deal only with static objects, and time is not treated as a first-class citizen. This paper introduces the idea of discretizing space-time as a first-order primitive that significantly alters downstream algorithms, ranging from simple spatial indexing to complex deep learning, that operate on such space-time data. We coin the term space time box (STB) and propose it as a fundamental primitive for thinking about trajectories of moving objects. We substantiate and validate the concept of STB through various pieces of our past work. Finally, we show that 3D STBs can be used to efficiently track very fast moving objects (asteroids), which has never been done before.


Space time box Geospatial Geohash Asteroid 

1 Introduction

Mobile devices such as smartphones and in-car navigation systems have become quite popular in the last decade. As these devices collect location data over time, an abundance of space and time data has become available. Examples of applications that collect such space-time data are mapping apps (Google Maps, Bing Maps), car apps (Progressive insurance), weather applications (AccuWeather, Weather), and many more. This space-time data has been used for urban planning, forecasting traffic conditions, and predicting the spread of viruses (Kitamura et al. 2000; Eubank et al. 2004; Gonzalez et al. 2008). In addition to mobile devices owned by regular consumers, other moving objects such as airplanes, ships, and even astronomical objects are monitored, and location data is collected from them. Enterprise focused applications such as safety and security and optimization of air-traffic flows are enabled through the collection and analysis of the space-time data in such environments. A key challenge that arises in such location data analysis is that of being able to store, query, and build machine learning models on societal and enterprise scale volumes of space-time data (Ganti et al. 2016; Srivatsa et al. 2017).

In order to address the problem of being able to store, query, and learn patterns from these space-time data, we introduce the notion of encoding location at a given time as a discrete entity, thereby converting a continuous trajectory into a discrete trajectory. We coin the term space time box (STB), a snapshot of the moving object. An STB can be thought of as a region in space-time where the object dwells during a time period and within the limits of a spatial region. The concept of STB is fairly generic and is analogous to the digitization of a continuous signal, as performed by an analog-to-digital (AD) converter. This digitization of space and time fundamentally changes the way one thinks about trajectories, paving the way for new algorithms that enable fast querying of space-time data on distributed platforms. Further, the choice of the right encoding/digitization allows for hardware acceleration, with many of the space-time operations implemented using digital logic, which provides several orders of magnitude speedup (on all the queries).

We have used the concept of STBs in tackling various problems that range from encoding of spatial operations in hardware using FPGAs, to indexing and querying space-time data on distributed noSQL stores, to using deep convolutional neural networks for predicting housing prices (Lee et al. 2014, 2016; Li et al. 2015, 2017; Moussalli et al. 2015; Ganti et al. 2016; Srivatsa et al. 2017; Bency et al. 2017). In this paper, we introduce in a more formal manner the concept of STB (space time box) and show that all of our work derives from this fundamental and novel concept. Thus far, our work had focused only on motion in two-dimensional space. In this paper, we show that STBs in three dimensions are extremely useful in solving astronomical problems. We use 3D STBs to solve the problem of predicting asteroid-asteroid fly-bys, which is a fundamentally challenging problem in the astronomy community. We also note that 3D STBs are useful in encoding and analyzing objects that move in 3D space, such as planes and drones.

The main contribution of this paper is to show how the fundamental concept of STB is applicable to a wide array of algorithms on various moving objects (whether on the Earth, flying in the air, or moving in cosmic space). We expand on STBs and their key algorithmic properties in the next section. We then summarize the body of work that used STBs in various forms in Sects. 3 and 4, and go into depth about asteroid-asteroid fly-by detection/prediction in Sect. 5. The goal of this paper is to show the broad applicability of STBs, across application domains with different types of moving objects, across algorithms for querying and machine learning, and in hardware based acceleration of these algorithms. As such, the goal of this paper is not to cover any single algorithm, which is done in a piecemeal fashion in our previous work, but to apprise the research community of the benefits of the concept of STB (and of different ways of implementing/realizing it). STBs provide a fundamentally new way of thinking about space-time related problems: the discretization of space/time while preserving key properties of the underlying trajectory is a powerful construct and has the potential to dramatically change the way the research community works with such space-time data.

2 STB: space time box

The concept of a SpaceTime box is easiest to understand when we are working in a two dimensional space, such as the surface of the Earth. In what follows, we explain the concepts using the two dimensional space. Section 5, on asteroids, covers three dimensional STBs in greater detail. An STB can be thought of as a region in space and a time range, i.e., a three dimensional cube (as shown in Fig. 1), with the axes being X and Y in two dimensional space and T for the time range. In the case of the Earth’s coordinate system, X and Y map to the longitude and latitude dimensions.
Fig. 1

Illustration of STB on the Earth

STB is a generic technique that discretizes space and time, thus allowing trajectories that are continuous in space and time to be digitized and made far more amenable to algorithms for computer systems (we will show later how a specific realization of STB has resulted in significant speedup in spatial indexing and advances in deep learning algorithms). A key question that remains is how to discretize time and space so as to achieve the best performance. We examined several approaches to discretizing space and time in our past work (Li et al. 2015, 2017) and we believe that geohashes (Niemeyer 2008) are computationally the most efficient. We note that geohashes as described in the existing literature apply to space only; we can extend them to include time by also interleaving the bits of a time range. For ease of explanation, we will focus only on the encoding of space, with the extension to multiple dimensions being trivial.

Geohash is an encoding of geographic latitude and longitude in the form of binary strings (a popular variant provides base32 encoding, however, we will only use the binary form). The core of geohash encoding (Niemeyer 2008) is that of interleaving the bits of the multiple dimensions and thus preserving the spatial locality while performing this interleaving. One can now quickly see how multiple dimensions can be encoded through this interleaving. We summarize the properties of this encoding/sampling technique.
  • Deterministic hashing, i.e., an object’s coordinates should always map deterministically to the same set of keys. In general, each hash value h covers a region, such that all points within that region are mapped to the same value h. Deterministic hashing allows for the generation of keys that can be used directly in distributed, scalable key-value stores, and thus enables fast queries on spatiotemporal data. Further, deterministic hashing allows each hash value to be mapped to a state, which can be used to model trajectory movements easily.

  • Extensible/telescopic hashing, i.e., an object’s coordinates should be mapped to an extensible key such that mapping at different spatial resolutions results in consistent key assignment. An example of extensible hashing over two dimensional coordinates with gradual precision loss (based on key length) is shown below. Given two hashes h and \(h'\) that cover regions r and \(r'\) respectively, in extensible hashing, if h is a prefix of \(h'\) then region \(r'\) is fully contained in region r. Extensible hashing enables variable sampling rates; for example, a key of length 64 would result in no sampling at all, whereas a key of length 25 might result in a sampling ratio of 20. An important point to note here is that the sampling ratio achieved with such a spatial encoding mechanism is data dependent. Further, extensibility yields keys that can not only be used in key-value stores, but also directly support multi-resolution spatial analysis of the data (e.g., zooming into the vicinity of a particular location). We show below the base-32 encoding of the geohashes for different latitude/longitude values and illustrate the telescopy property in Fig. 2.
    $$\begin{aligned} \mathrm{hash}(40.00105, -78.30105) & = \mathtt{dr07d1yzj21} \\ \mathrm{hash}(40.001, -78.301) & = \mathtt{dr07d1yy} \\ \mathrm{hash}(40.01, -78.2) & = \mathtt{dr07se} \\ \mathrm{hash}(40, -78) & = \mathtt{dr0e} \end{aligned}$$
  • Uniform density hashing, i.e., the technique must support a choice of keys such that, given any set of points, the number of points mapped to a given key is nearly equal for all the keys. This allows for a bound on the number of keys that are used at a given granularity and thus enables variable resolution encoding.

  • Bit arithmetic for manipulating the keys and performing various operations on them, such as truncation, computing the distance between keys, and identifying neighboring keys in the 2D space. Such operations are extremely fast on modern processors, GPUs and FPGAs, and yield significant speedups on various spatial operations, which we will show in the later sections.
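
The properties above can be illustrated with a minimal sketch of the standard geohash construction (repeated bisection of longitude and latitude, with the resulting bits interleaved, longitude first). The function names `geohash_bits` and `geohash_base32` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bits(lat, lon, nbits):
    """Interleave longitude and latitude bisection bits (longitude first)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    for i in range(nbits):
        if i % 2 == 0:  # even positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:  # odd positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


def geohash_base32(lat, lon, length):
    """Base-32 rendering: every 5 bits become one character."""
    bits = geohash_bits(lat, lon, 5 * length)
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


coarse = geohash_bits(40.001, -78.301, 20)
fine = geohash_bits(40.001, -78.301, 40)
print(geohash_base32(40, -78, 4))  # dr0e, matching hash(40, -78) in the list above
print(fine.startswith(coarse))     # True: the finer box lies inside the coarser one
```

Because each additional bit only halves the current box, truncating a key always yields the key of an enclosing box, which is the telescopic property in its simplest form.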

Fig. 2

Illustration of geohash and its telescopy property

We summarize the various encoding/indexing mechanisms and the properties that they satisfy in Table 1. We observe that the geohash based encoding mechanism is the most appropriate one for discretization and computation, as it provides deterministic hashing, allows for variable rate sampling through extensibility/telescopy, and stays within a factor of 2 of a uniform density mapping. As an example, consider the problem of spatial sampling, i.e., sampling of data that preserves spatial locality and reduces the data significantly while not affecting the results. The spatial sampling process can be mapped to the above geohash based discretization approach as follows: first, each location is mapped to a hash value h given a chosen parameter d, which represents the length of the geohash (and thus corresponds to the size of the region on the Earth’s surface). All the points that map to the value h are then represented by a single sample (i.e., the center of this box). It is evident that the sampling ratio achieved will be data dependent and controlled by the parameter d.
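
The sampling procedure just described can be sketched as follows. For brevity, this sketch replaces the geohash string with an equivalent uniform grid obtained from d bisections per axis; `cell_of`, `cell_center`, and `spatial_sample` are hypothetical helper names, and the coordinates are made-up illustration points.

```python
from collections import Counter


def cell_of(lat, lon, d):
    """Grid cell reached after d bisections per axis (illustrative stand-in for h)."""
    n = 1 << d
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return row, col


def cell_center(cell, d):
    """Latitude/longitude of the center of a grid cell."""
    n = 1 << d
    row, col = cell
    return (-90.0 + (row + 0.5) * 180.0 / n, -180.0 + (col + 0.5) * 360.0 / n)


def spatial_sample(points, d):
    """Represent every occupied box by its center, keeping the point count."""
    counts = Counter(cell_of(lat, lon, d) for lat, lon in points)
    return {cell_center(cell, d): n for cell, n in counts.items()}


points = [(40.7580, -73.9855), (40.7581, -73.9857),
          (40.7579, -73.9853), (40.6413, -73.7781)]
sample = spatial_sample(points, d=12)
print(len(points) / len(sample))  # sampling ratio for this data and this d
```

Lowering d merges more points into each box and raises the sampling ratio, which is why the ratio depends on both the data and the chosen granularity.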
Table 1

Summary of differences in properties across various spatial sampling techniques (\(^*\) - d is the dimension of the KD-tree)

Index name

3 Algorithms on STB

In the previous section, we described the concept of a SpaceTime Box (STB) and realized an instance of it using geohashes. We will now describe various algorithms that utilize this particular version of the STB. The first is a technique to sample the data while preserving the spatial locality and certain key properties of the trajectory data (space-time). Such data sampling techniques can reduce the volumes of data collected as well as the compute times needed for building models. The second is a suite of techniques to index and query spatiotemporal data using STBs. These techniques are targeted toward distributed key-value stores that provide high throughput querying capabilities, but lack spatiotemporal support (some support has been added in the last few years, but even now many of the noSQL stores have no native spatiotemporal support). Finally, the STB concept can be applied to deep learning on satellite imagery, specifically to predict housing prices. These algorithms are described in detail in the authors’ past work and are summarized briefly in the rest of this section. Another interesting application of STBs, in 3-dimensional space and time, is that of tracking asteroids, which will be covered in detail in Sect. 5. Such wide applicability of the concept of STB (and its realization using geohashes) indicates that the choice of this particular realization is quite fruitful.

3.1 Sampling of data

As location data is being collected at societal scale (e.g., cars with GPS sensors, various apps on phones), a key question is whether all of this data is needed for modeling and machine learning purposes. Our previous work (Srivatsa et al. 2017) addresses this question by subsampling location traces while preserving the amount of information present in such datasets. We developed a novel subsampling technique that uses the STB geohash to enable efficient spatial cluster sampling. The technique itself is detailed in our previous work (Srivatsa et al. 2017); the key idea is to select only a small sample of the data from a given space-time box at a given granularity. The granularity is defined by the size of the space-time box. Spatiotemporal information loss is measured by modeling the trajectory as a Markov chain and computing different theoretical measures on these chains, specifically (1) state-space error metric, (2) KL-divergence, (3) mixing time, and (4) perplexity measure. Subsampling can reduce the data by 75% while not significantly reducing the amount of information as characterized by the above four metrics. The spatial locality based hashing and the fast computation of these hashes enable efficient subsampling, thus making pattern learning and downstream machine learning quite efficient.
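
To make the Markov chain view concrete, the following toy sketch estimates a first-order transition model from discretized trajectories (each state standing for one STB) and compares the model built from all trajectories with one built from a subsample. The row-averaged KL divergence used here is an illustrative stand-in and not necessarily the exact metric of Srivatsa et al. (2017); the data, the smoothing constant `alpha`, and all function names are assumptions.

```python
import math
from collections import Counter


def transition_probs(sequences, states, alpha=1.0):
    """Row-stochastic transition estimates with add-alpha smoothing."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # consecutive (from, to) state pairs
    probs = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states) + alpha * len(states)
        probs[s] = {t: (counts[(s, t)] + alpha) / total for t in states}
    return probs


def mean_row_kl(p, q, states):
    """Average over source states of KL(p[s] || q[s]), in bits."""
    kl = 0.0
    for s in states:
        kl += sum(p[s][t] * math.log2(p[s][t] / q[s][t]) for t in states)
    return kl / len(states)


states = ["a", "b", "c"]  # each state stands for one space-time box
full = [list("abcabcabca"), list("abcabcabcb"),
        list("abcabcacab"), list("abcabcabca")]
subsample = full[::2]  # toy subsample: keep every other trajectory

p_full = transition_probs(full, states)
p_sub = transition_probs(subsample, states)
print(mean_row_kl(p_full, p_sub, states))  # small value: little information lost
```

A divergence close to zero suggests that the subsample reproduces the movement statistics of the full data at the chosen granularity.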

3.2 Indexing of spatiotemporal data

The increase in data collected and the dawn of the big data era have resulted in the development and popularity of distributed filesystems such as HDFS, as well as distributed data stores and query engines that store and answer queries on such data. These storage and compute systems run on a cluster of machines that shard the data and scale to very large volumes of queries. Support for spatiotemporal data storage and retrieval in such systems is a more recent phenomenon (Nishimura et al. 2011). Our previous work has shown how STBs can be applied to scale and use existing distributed compute systems for very fast spatiotemporal queries (Lee et al. 2014; Li et al. 2015, 2017). We will briefly describe how the geohash based encoding is used to scale to noSQL stores. We assume that a trajectory (note that a single point or geometry can be thought of as a trajectory of length one) is converted to STBs through the encoding mechanism described in the previous section. The encoded STB is mapped to a specific partition in the distributed store, with each partition preserving the spatial locality. A query is first encoded to an STB, which is then used to select the partitions to retrieve from the distributed store (equivalent to a pruning mechanism); retrieval happens in a parallel and scalable manner, as the underlying distributed store has these properties. We note that the key step here is the conversion to the encoded STBs, without which the scale of the underlying distributed store cannot be easily leveraged. The interested reader can find more details in Li et al. (2015). An in-memory implementation of this indexing technique, based on a trie, was also developed in our previous work (Ganti et al. 2016). This particular implementation is quite interesting as it can be implemented in hardware (we will cover the hardware based extensions in the next section). We show that the in-memory trie implementation can give a \(1.2\times\) to \(4\times\) gain over existing in-memory spatial indexes. Finally, the telescopic nature of STBs allows for natural hotspot based load balancing in these distributed storage systems, which is extremely useful as the data density changes depending on the time of the day. As an example of this data density change, we illustrate hotspots using NYC taxicab data in Fig. 3. The first two figures (a and b) are heatmaps of the taxi pickups and dropoffs for the time period of 8pm to midnight on New Year’s eve of 2011 and 2013. One can observe the similarity between these heatmaps, indicating that little changes across a period of two years. However, figures (c) and (d) are for the time period of January 1st 6am–10am for the years 2011 and 2013. These heatmaps are drastically different from (a) and (b), indicating that within a time window of a few hours, the data density has changed significantly. Such hotspots can be addressed using the variable resolution of STBs. These various approaches to indexing of spatiotemporal data are achieved using STBs and extensions of their core implementation (using geohashes) to various distributed storage and computational systems.
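
The query path described above (encode, select partitions by key prefix, scan) can be sketched with a sorted in-memory list standing in for a distributed key-value store that supports range scans. The class `PrefixIndex`, the function `partition_of`, and the sample keys are hypothetical; real stores expose their own scan APIs.

```python
import bisect


class PrefixIndex:
    """Sorted in-memory stand-in for a key-value store with range scans."""

    def __init__(self, records):
        # records: iterable of (key, value); keys are geohash-style bit strings
        self._items = sorted(records, key=lambda kv: kv[0])
        self._keys = [k for k, _ in self._items]

    def scan_prefix(self, prefix):
        """Return values whose key starts with prefix, using one range scan."""
        lo = bisect.bisect_left(self._keys, prefix)
        hi = bisect.bisect_left(self._keys, prefix + "2")  # "2" sorts after "0" and "1"
        return [v for _, v in self._items[lo:hi]]


def partition_of(key, partition_bits=2):
    """Route a key by its leading bits so that nearby boxes share a partition."""
    return key[:partition_bits]


index = PrefixIndex([("0110", "taxi-1"), ("0111", "taxi-2"),
                     ("0100", "taxi-3"), ("1100", "taxi-4")])
print(index.scan_prefix("011"))  # ['taxi-1', 'taxi-2']
print(partition_of("0110"))      # 01
```

Because keys that share a prefix are contiguous in sorted order, a region query becomes a single range scan, and splitting a hot partition only requires routing on one more leading bit.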
Fig. 3

Hotspots in NYC

3.3 Deep learning

The concept of STBs can be applied in the context of deep learning on images. One such example from our previous work (Bency et al. 2017) is that of using satellite images to predict home prices. Deep Convolutional Neural Networks (DCNNs) have become quite popular for image classification, recognition, and feature extraction from images. Popular applications of DCNNs to street view imagery include estimating high-level human perception of safety and wealth, finding directions to ubiquitous landmarks, and predicting housing prices. We examined housing price prediction, which takes into account different features at different granularities of space-time boxes. For example, consider the homes depicted in Fig. 4. This figure illustrates two types of houses: the top row images are expensive homes, whereas the bottom row images are cheaper ones. We observe from the contrast between these two sets of images that the expensive homes tend to exist in neighborhoods with larger backyards, green space, and water bodies such as ponds and swimming pools, whereas cheaper houses tend to be located in compact neighborhoods where the houses are adjacent to each other, with concrete and roads occupying more space. These differences can be easily spotted at different zoom levels, which correspond to different granularities of the STB.
Fig. 4

Satellite images with varying home prices

The key idea of our work in Bency et al. (2017) was to let the DCNN learn these various features using STBs of different granularity; each DCNN at a given zoom level learns a set of features (e.g., zoom level 17 may learn the presence of trees in the neighborhood, zoom level 18 may learn the presence of a swimming pool). These extracted features are then combined with the description of the house (e.g., number of rooms, square footage) and a model is trained. We showed that such a model significantly outperforms traditional approaches such as SAR (Spatial Auto Regression) and Random Forest (up to 34.5% reduction in RMSE). This is a novel application of the multi-level/telescopy feature of the STB.

4 Hardware acceleration

A key feature of the STB encoding that we alluded to earlier in Sect. 2 is that of bit arithmetic making it amenable to hardware acceleration. Hardware acceleration allows for extremely fast computations on spatiotemporal data. Speedups of up to 1000× over traditional software approaches can be achieved using hardware implementations. We proposed and implemented two different hardware acceleration approaches, one using FPGAs (Moussalli et al. 2015; Lee et al. 2016) and the second using Ternary Content Addressable Memories (TCAMs) (Ganti et al. 2016).

4.1 FPGAs

The STB encoding is a bit-level encoding in which simple bit operations map to spatial predicates; for example, a prefix match indicates a containment relationship (within a certain degree of granularity). One way to exploit this property is through the use of FPGAs, which allow for encoding the various spatial predicates (e.g., topological containment, touch). From an FPGA implementation standpoint, we can break the spatial predicate evaluation pipeline into four stages: (1) preprocessing, which involves encoding the STBs and sorting them by length; (2) breakpoint identification, which finds the first bit-level difference between two geohashes (through a simple XOR operation); (3) mask generation, which produces different types of masks for blocking unnecessary bits of the encoding; and (4) the contains operation, which is equivalent to a prefix matching operation on the encodings. The details of these steps and their implications for various spatial predicates can be found in Moussalli et al. (2015) and Lee et al. (2016). We show that all the spatial operations can be achieved using the above four steps. These steps are realized using FPGAs, which provide a 20–90× speedup over the software implementations of these operations.
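
A software analogue of the four stages may help fix ideas. This is only a sketch under the assumption that geohashes are left-aligned in a fixed-width word (16 bits here, chosen arbitrarily); the function names are hypothetical, and the actual FPGA designs are described in the cited papers.

```python
def left_align(bits, width=16):
    """Pack a geohash bit string into the top bits of a fixed-width word."""
    return int(bits, 2) << (width - len(bits))


def breakpoint_index(a, b, width=16):
    """Index (from the most significant bit) of the first differing bit.

    Returns width when the two words are identical.
    """
    return width - (a ^ b).bit_length()


def prefix_mask(length, width=16):
    """Mask that keeps the top `length` bits and blocks the rest."""
    return ((1 << length) - 1) << (width - length)


def contains(outer_bits, inner_bits, width=16):
    """True if the box outer_bits contains the box inner_bits (prefix match)."""
    if len(outer_bits) > len(inner_bits):
        return False
    diff = left_align(outer_bits, width) ^ left_align(inner_bits, width)
    return (diff & prefix_mask(len(outer_bits), width)) == 0


boxes = sorted(["011010", "0110", "0111"], key=len)  # stage 1: sort by length
print(boxes)                       # ['0110', '0111', '011010']
print(contains("0110", "011010"))  # True: 0110 is a prefix of 011010
print(contains("0111", "011010"))  # False: the keys differ at bit index 3
```

Each helper is a constant number of XOR, shift, and AND operations, which is what makes the pipeline a natural fit for digital logic.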

4.2 TCAMs

A Ternary Content Addressable Memory (TCAM) is a hardware device that is routinely used in network routers for IP address matching and routing. TCAMs perform billions of matching operations per second, including prefix based matching. In order for indexing to work with a TCAM based approach, we need a software-only approach that implements spatial queries as simple bit level arithmetic (e.g., boolean AND, OR). Once the spatiotemporal data is encoded through the use of STBs (realized through geohashes), we show that all the spatial (and temporal) queries can be achieved using simple bit level operations (details can be found in our previous work, Ganti et al. (2016)). These queries are realized through a hierarchical representation of the encoding using a PATRICIA-trie (Morrison 1968). The bit-level query operations are primarily prefix matching, which is very common in TCAMs. We implemented these operations on an off-the-shelf TCAM (Cisco ASR 1000 ESP), which provided a 1000x speedup in all the spatial operations.
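
The ternary matching that a TCAM performs can be simulated in a few lines. In this sketch each entry is a (value, care mask) pair in which the masked-out bits are "don't care"; a real TCAM compares all entries in parallel, whereas the loop below is sequential. The 8-bit width, the labels, and the function names are assumptions made for illustration.

```python
def tcam_entry(prefix_bits, width=8):
    """Build (value, care_mask): prefix bits must match, the rest are don't-care."""
    n = len(prefix_bits)
    value = int(prefix_bits, 2) << (width - n) if n else 0
    care = ((1 << n) - 1) << (width - n)
    return value, care


def tcam_lookup(table, key):
    """Return the label of the first matching entry, or None.

    The table is ordered longest prefix first, so the first hit is the
    finest box that contains the key.
    """
    for label, (value, care) in table:
        if (key & care) == value:
            return label
    return None


table = [
    ("cell-0110", tcam_entry("0110")),
    ("cell-01", tcam_entry("01")),
]
print(tcam_lookup(table, 0b01101111))  # cell-0110
print(tcam_lookup(table, 0b01001111))  # cell-01
print(tcam_lookup(table, 0b10000000))  # None
```

This is the same longest-prefix match that routers apply to IP addresses, applied here to STB keys instead.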

5 Asteroid collision detection

In this section, we show how a three dimensional extension of the STB implementation can be used to solve an astronomical problem: computing whether two asteroids will intersect each other in a given time frame (which may lie in the future). Before we describe the problem in further detail, we provide a brief background and note that this portion of the work has not been previously published.

5.1 Background

There are about 610k known asteroids, and new asteroids are being discovered every day. These asteroids’ trajectories are computed using an N-body simulation technique (the software used is OORB) (Granvik et al. 2009). As an asteroid comes closer to other objects (asteroids or larger objects), its trajectory can potentially be altered. Typical N-body simulators consider only the gravitational influence of the eight planets and the sun in their models and do not consider potential asteroid-asteroid interactions. The primary reason for this is that as the number of objects in the model increases, the simulation time increases significantly (\(O(N^2)\) in the worst case and \(O(N \log N)\) in the amortized average case under an ideal, hierarchically partitionable, distribution of bodies in space). Further, in the recent past, telescopes have observed that asteroids can indeed collide with each other, as illustrated in Fig. 5.
Fig. 5

A Hubble Space Telescope picture of a comet-like object called P/2010 A2 shows a bizarre X-pattern of filamentary structures near the point-like nucleus of the object and trailing streamers of dust. Scientists think the object is the remnant of an asteroid collision. Credit: NASA, ESA, and D. Jewitt (UCLA)

5.2 Detecting asteroid flybys

The challenge we face from a computational standpoint is to be able to compute the asteroids that are near each other in space at a given time and use this information to account for the changes in the trajectories of the asteroids that fly-by each other.

Our solution approach is based on the observation that a given asteroid’s trajectory at a given time is affected by the major planetary objects and the sun, as well as by those objects that are near it at that time (such as other asteroids). We observe through simple experiments that, on average, at a given point of time about 30–40 asteroids are within a distance of 0.01 AU of a randomly selected asteroid, as illustrated in Fig. 6. The Astronomical Unit (AU) is defined as the mean distance between the Earth and the sun (approximately 150 million km). Considering that small objects beyond a specific distance do not affect the trajectory of a particular asteroid significantly, one can reduce the computational requirements significantly by considering only those objects within a certain distance. However, the question of adequately and efficiently pruning objects beyond a certain distance still remains.
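
As a rough, back-of-the-envelope illustration of the potential saving (assuming \(N \approx 6.1\times 10^{5}\) asteroids from Sect. 5.1 and taking 40 neighbors per asteroid as an upper estimate, both approximate), the number of candidate pairs compares as follows:

$$\begin{aligned} \text{all pairs:}\quad & \frac{N(N-1)}{2} \approx \frac{(6.1\times 10^{5})^{2}}{2} \approx 1.9\times 10^{11} \\ \text{nearby pairs:}\quad & \frac{40\,N}{2} \approx 1.2\times 10^{7} \end{aligned}$$

Under these assumptions, restricting attention to nearby asteroids reduces the number of candidate pairs by roughly four orders of magnitude.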
Fig. 6

CDF of the number of asteroids within a distance of 0.01 AU of each other

From a computational standpoint, the problem of pruning objects beyond a certain distance is analogous to indexing moving objects in 3D space. As new asteroids are discovered routinely and N-body simulations are compute intensive, we seek a solution that can be deployed and scaled on emerging distributed platforms such as Hadoop, Spark, and Streams. Our approach is to rely on fast and efficient distributed key-value stores such as HBase and Redis, which are capable of performing range queries (get all records whose key is within a range (a, b)) very efficiently.

A basic challenge for this approach to work is to be able to index the 3D location of an asteroid and update the location of the asteroid as well as perform fast queries. Current key-value stores support efficient queries only when the key is single dimensional and do not provide support for multi-dimensional keys.

This brings us to an extension of the STB realizations discussed thus far, which were in 2D space. We note that the concept of STB itself is not restricted in dimensionality and can be N-dimensional (\(N \ge 2\)). We extend the current implementation to a 3D spacetime box. Since an asteroid’s location is represented in the heliocentric coordinate system, which happens to be a 3D Cartesian coordinate system, we can discretize the location of an asteroid to a 3D STB. Similar to the geohash (described earlier), a 3D STB converts the heliocentric coordinates and the time to a single dimensional encoding by interleaving the bit representations of each of the dimensions. The discretization depends on the granularity of the STB: the coarser the granularity, the larger the STB. Two asteroids that map to the same STB will be within the specified distance at a given time. We illustrate this encoding process in Fig. 7. Hence, the problem of identifying whether two asteroids are within a certain distance reduces to checking whether they map to the same 3D STB.
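
A minimal sketch of such an encoding is given below. The bounding cube of ±8 AU, the 4096-day horizon, the 12 bits per dimension, and the names `quantize`, `interleave`, and `stb_key` are all assumptions made for illustration; they are not parameters reported in this paper.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi) to an integer cell index with `bits` bits."""
    n = 1 << bits
    idx = int((value - lo) / (hi - lo) * n)
    return max(0, min(n - 1, idx))


def interleave(indices, bits):
    """Interleave the bits of several cell indices, most significant bit first."""
    key = 0
    for b in range(bits - 1, -1, -1):
        for idx in indices:
            key = (key << 1) | ((idx >> b) & 1)
    return key


def stb_key(x, y, z, t_days, bits=12, extent_au=8.0, horizon_days=4096.0):
    """Single integer key for heliocentric (x, y, z) in AU plus time in days."""
    cells = [quantize(c, -extent_au, extent_au, bits) for c in (x, y, z)]
    cells.append(quantize(t_days, 0.0, horizon_days, bits))
    return interleave(cells, bits)


a = stb_key(2.5001, -1.2001, 0.3001, 100.2)
b = stb_key(2.5002, -1.2002, 0.3002, 100.7)
c = stb_key(2.6000, -1.2001, 0.3001, 100.2)
print(a == b)  # True: same space box and same one-day time bucket
print(a == c)  # False: different space box
```

With four interleaved dimensions, dropping the last four bits of a key yields the key of the enclosing box at the next coarser level, so the telescopic property carries over from the 2D case.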
Fig. 7

Illustration of end-to-end algorithm

However, a problem arises when two asteroids are in adjacent space boxes, but still close to each other (within the specified distance). In this situation, one needs to do an additional check, i.e., whether the two asteroids’ STB mappings are neighbors. This process is illustrated in Fig. 7. Note that neighborhood checks on two keys can be accomplished using simple bit arithmetic on the key values. In the worst case the number of neighbors of one space box in 3D is \(3^3-1 = 26\). However, in practice, one needs to explore neighboring space boxes only if the object is close to the boundary of its space box, and even then only a small subset of these 26 neighbors is relevant for spatial analysis. Given a point (x, y, z), depending on whether the point is close to the space box boundary in one, two or all three dimensions, we have one, three or seven neighbors respectively.
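
The neighbor counts of one, three, and seven can be reproduced with a short sketch. It assumes axis-aligned cubic boxes of side `box_size` and a distance threshold smaller than half the box side, so that a point can be close to at most one face per axis; `relevant_neighbors` is a hypothetical helper name.

```python
from itertools import product


def relevant_neighbors(point, box_size, threshold):
    """Offsets of adjacent boxes that may hold a point within `threshold`."""
    per_axis = []
    for coord in point:
        offset = coord % box_size  # position inside the box, in [0, box_size)
        choices = [0]
        if offset < threshold:
            choices.append(-1)  # close to the lower face on this axis
        if box_size - offset < threshold:
            choices.append(+1)  # close to the upper face on this axis
        per_axis.append(choices)
    # Drop the all-zero offset, which is the point's own box.
    return [d for d in product(*per_axis) if any(d)]


box, eps = 0.01, 0.001  # box side and distance threshold in AU (illustrative)
print(len(relevant_neighbors((0.005, 0.005, 0.005), box, eps)))     # 0
print(len(relevant_neighbors((0.0005, 0.005, 0.005), box, eps)))    # 1
print(len(relevant_neighbors((0.0005, 0.0095, 0.005), box, eps)))   # 3
print(len(relevant_neighbors((0.0005, 0.0095, 0.0005), box, eps)))  # 7
```

A point that is near the boundary in k dimensions has \(2^k - 1\) relevant neighbors, which gives the one, three, and seven cases.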

We remark that, for a randomly chosen point, the likelihood of the point being close to the space box boundary in one, two or three dimensions decreases exponentially; thus the likelihood of the number of neighbors being 0, 1, 3 and 7 also decreases exponentially. Indeed, the expected number of neighbors for a randomly chosen point with the distance threshold set to half the space box size (in each dimension) is 3.125. We will utilize this space box approach to prune the search space and significantly reduce the number of objects considered when we perform the N-body simulations to determine the new orbital elements for a given asteroid.

We will combine the above concepts to develop an end-to-end algorithm for identifying pairs of asteroids that fly by each other within a certain distance (for a given time). As these objects are constantly moving, our approach is to consider all the possible locations of an asteroid during a given time interval. We use the notion of a 3D STB, which comprises a 3D space box and a time range. Hence, an asteroid’s current location at a given time is mapped to a 3D space box (with interleaved coordinates as specified above) and a time range (e.g., given a time range parameter of one hour, an asteroid observed at Earth time 19:30 on July 30, 2015 is mapped to the time range July 30, 2015, 19:00 to 20:00). Once the asteroids are mapped to their STBs at a chosen granularity, neighbors that are farther than a certain distance during a certain time range can be pruned. The asteroids that are within a certain distance during a specific time range (and the major objects such as the 8 planets and the sun) are provided as input to the time-step N-body simulator. The simulator then determines the exact locations of the asteroids at the end of the time window. We then use interpolation techniques over the specified time window to check whether a pair of asteroids’ paths intersect or come within a certain distance parameter.

We observe that the choice of granularity for the space box and the time range affects the performance of the algorithm. If a large space-time box is chosen, then a larger number of asteroids need to be considered for the exact simulations, which affects the performance. If a very small space-time box is chosen, then the accuracy of the simulations will be affected. The first step in our algorithm is to choose a time range [t1, t2] (in this case, we chose one day based on empirical observations) and then forecast the location of each of the 600k asteroids at time t2 (given the location at time t1). This first step uses only the effects of the 8 planets and the Sun in the simulator. Once the locations of the asteroids at time t2 are determined, we choose a space box size (in our case, 0.01 AU) and identify those asteroids that are within a distance of 0.01 AU of each other. Again, the distance is chosen based on empirical observations. We then re-run the simulations over a smaller time range (we chose one hour) for only those asteroids that are within the specified distance. The reason for re-running the simulations is that it is highly space inefficient to keep the locations of every asteroid at every simulator time step during the one-day time range. To give the reader a rough idea of the extent of disk space required, one year’s worth of data requires 250 GB of free disk space for each asteroid even when only time steps at one-day intervals are stored. Further, we note that the number of asteroids involved in the smaller time range simulation is significantly smaller (on the order of a few tens). Finally, we use a smaller space box (of size 0.001 AU) to identify which asteroids are near each other and then interpolate to check whether their paths intersect. The interpolation method that we use is a simple linear interpolation. This hierarchical descent approach can be applied further to improve the accuracy of the reports by choosing an even smaller time range and space box. We also plan on exploring different interpolation techniques such as elliptic arc interpolation.
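The final linear interpolation step can be illustrated with a short sketch. It assumes that each asteroid moves in a straight line at constant velocity between its simulated positions at the start and end of the small time window, and it returns the minimum separation reached inside that window. The function name `closest_approach` and its argument layout are hypothetical and chosen only for this example.

```python
import math


def closest_approach(a_start, a_end, b_start, b_end):
    """Minimum separation of two linearly interpolated paths.

    Each argument is an (x, y, z) position; *_start and *_end are the
    positions at the start and end of the same time window.
    Returns (s, distance), where s in [0, 1] is the fraction of the
    window at which the minimum separation occurs.
    """
    # Relative position of b with respect to a at both ends of the window.
    r0 = [b - a for a, b in zip(a_start, b_start)]
    r1 = [b - a for a, b in zip(a_end, b_end)]
    d = [e - s for s, e in zip(r0, r1)]  # change in relative position

    dd = sum(c * c for c in d)
    if dd == 0.0:
        s = 0.0  # constant separation: any instant is a minimum
    else:
        # Unconstrained minimizer of |r0 + s*d|, clamped to the window.
        s = max(0.0, min(1.0, -sum(r * c for r, c in zip(r0, d)) / dd))

    closest = [r + s * c for r, c in zip(r0, d)]
    return s, math.sqrt(sum(c * c for c in closest))
```

A pair would then be reported when the returned distance falls below the chosen threshold (for example, the 0.001 AU box size mentioned above). The same function can be reused at each level of the hierarchical descent, since only the window length and the threshold change.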

We ran the simulations for the next 25 years. Tables 2 and 3 show the top three asteroid-asteroid encounters (among those that come within 0.01 AU) by proximity and by size, respectively.
Table 2

Top three encounters by proximity

| Date (MJD) | Distance | A1 | S1 (H) | A2 | S2 (H) |
|---|---|---|---|---|---|
| May 1, 2032 (63353.9318) | 299 km (0.000002 AU) | | 2–4 km (15.8) | | 4–9 km (13.9) |
| Nov 24, 2016 (57716.07911) | 449 km (0.000003 AU) | | 1–2 km (17.4) | | 2–5 km (15.5) |
| Jan 11, 2018 (58129.29692) | 449 km (0.000003 AU) | | 530–1200 m (18.3) | | 2–4 km (15.8) |


Table 3

Top three encounters by size and within 0.01 AU of each other

| Date (MJD) | Distance | A1 | S1 (H) | A2 | S2 (H) |
|---|---|---|---|---|---|
| Feb 18, 2028 (61819.1561) | 70K km (0.000469 AU) | | 110–240 km (7.13) | | 2–5 km (15.5) |
| Feb 28, 2031 (62925.12725) | 54K km (0.000359 AU) | | 35–75 km (9.4) | | 2–4 km (16.1) |
| Oct 25, 2036 (64991.01073) | 43K km (0.000289 AU) | | 65–150 km (8.02) | | 3–7 km (14.3) |


In Tables 2 and 3, A1 and A2 refer to asteroid 1 and asteroid 2, and S1 and S2 refer to their respective sizes. The encounter dates are given in calendar time as well as in astronomical time, expressed as MJD (Modified Julian Day, a continuous day count used in astronomy that avoids calendar ambiguities). The encounter distance is given in kilometers and in astronomical units (AU); distances under a few thousand kilometers are typically considered very close fly-bys. Asteroids are identified by the universal codes assigned to them when they are discovered. The size of each asteroid is given in kilometers together with H, the absolute magnitude, which is the visual magnitude an observer would record if the asteroid were placed 1 AU from the observer and 1 AU from the Sun at zero phase angle. One of our predictions was observed by our partners at the Institute of Astronomy, Hawaii, using the largest land telescope (Lilly et al. 2015).
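The unit conventions used in the tables can be checked with a few lines of code. The sketch below converts MJD to a calendar date, AU to kilometers, and absolute magnitude H to an approximate diameter using the standard relation D = 1329 km / sqrt(p) × 10^(−H/5), where p is the geometric albedo. Two assumptions are made here that the paper does not state: MJD values are treated as UTC, and the size ranges correspond to albedos between roughly 0.05 and 0.25. The function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

AU_KM = 149_597_870.7  # one astronomical unit in kilometers (IAU 2012 definition)
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # MJD 0


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Day to a datetime (assumed UTC, no leap seconds)."""
    return MJD_EPOCH + timedelta(days=mjd)


def au_to_km(au):
    """Convert a distance in astronomical units to kilometers."""
    return au * AU_KM


def diameter_km(h, albedo):
    """Approximate diameter in km from absolute magnitude H and geometric albedo."""
    return 1329.0 / math.sqrt(albedo) * 10 ** (-h / 5.0)


if __name__ == "__main__":
    # Second row of Table 2: MJD 57716.07911, 0.000003 AU, H = 17.4.
    print(mjd_to_datetime(57716.07911).date())
    print(round(au_to_km(0.000003)), "km")
    # Assumed albedo bounds: a bright surface gives the lower size bound.
    print(diameter_km(17.4, 0.25), diameter_km(17.4, 0.05))
```

Under these assumptions the conversions agree with the calendar dates and kilometer distances listed in the tables, and the diameter estimates fall close to the listed size ranges.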

6 Related work

We provide a brief overview of the work related to this paper; detailed related work for each of the algorithms can be found in the corresponding papers. Typical approaches to trajectory analysis model a trajectory as a continuous movement pattern and develop techniques that operate on these continuous movement lines (Han et al. 2001). Our approach, in contrast, analyzes trajectories by discretizing them into space time boxes (STBs). The discretization step fundamentally changes the way we analyze movement patterns, allowing us to use distributed computing as well as hardware acceleration.

We now briefly recap the work related to the algorithms summarized in this paper. The data sampling work focuses on preserving spatial locality while still allowing mobility patterns to be characterized. Previous work (Gonzalez et al. 2008) has shown that human mobility is highly periodic, which offers insight into why this spatial sampling approach might work.

Spatial and spatiotemporal indexing has received a lot of attention in the past, with basic approaches such as R-trees (Guttman 1984) and grid indexing. The various approaches can be classified as either tree-based or grid-based. A comprehensive summary of these indexing schemes is provided in (Šidlauskas et al. 2009), which also shows that grid indexes and tree-based approaches can be tuned to achieve comparable performance. Some of these approaches have been extended to work with distributed platforms such as Hadoop and HBase (Nishimura et al. 2011). Our work on indexing shows that the STB concept lets us perform spatial and spatiotemporal indexing on existing big data systems without any modifications. Our approach is also shown to be scalable and more performant than the other systems. Other approaches to improving indexing performance include leveraging moving object patterns to maximize the bandwidth and minimize unnecessary load on the index structure, and lazy batch queries; these can be used to improve any indexing technique’s performance, including that of STBs. In fact, we believe that the STB approach will be superior because, once the encoding is performed, all further operations act on bit-level representations using simple bit operations, which are faster than traditional geometry-based approaches.
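To illustrate what "simple bit operations" can mean in practice, the sketch below encodes a latitude/longitude pair as a geohash-style integer (alternating longitude and latitude bisection bits, longitude first) and tests whether one cell contains another with a single shift and comparison. The functions `encode` and `contains` are hypothetical helpers written for this example; they are not the API of any of the systems cited above.

```python
def encode(lat, lon, depth):
    """Geohash-style integer code with `depth` interleaved bits (longitude first)."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    code = 0
    for i in range(depth):
        # Even bit positions bisect longitude, odd ones bisect latitude.
        rng, value = (lon_rng, lon) if i % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if value >= mid else 0
        rng[1 - bit] = mid  # keep the half that contains the value
        code = (code << 1) | bit
    return code


def contains(outer, outer_depth, inner, inner_depth):
    """True if the cell `outer` spatially contains the cell `inner`.

    Containment reduces to a prefix test: drop the extra low-order bits
    of the finer code and compare the result with the coarser code.
    """
    if outer_depth > inner_depth:
        return False
    return (inner >> (inner_depth - outer_depth)) == outer


if __name__ == "__main__":
    coarse = encode(40.7, -74.0, 10)
    fine = encode(40.7, -74.0, 30)
    elsewhere = encode(51.5, -0.1, 30)
    print(contains(coarse, 10, fine, 30))       # same location, finer cell
    print(contains(coarse, 10, elsewhere, 30))  # a location in a different cell
```

No floating-point geometry is involved once the codes exist, which is the property the argument above relies on.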

Finally, the deep learning application is a completely novel approach compared to approaches that use spatial auto-regression (SAR) (Dubin et al. 1999), as it is capable of capturing multi-level complex features using STBs on satellite imagery.

Overall, we have shown that the STB is a powerful construct compared to existing techniques in domains ranging from sampling to deep learning. We envision many future algorithms being implemented using this fundamental construct.

7 Conclusions and future work

This paper introduced a fundamental and powerful construct called the space time box (STB), a term that we coin in this paper. This construct allows the reader to think of movement data in a discretized manner. We use geohashes as a mechanism to implement the STB construct in 2 dimensions on the Earth’s surface and extend it to 3D space by applying it to a heliocentric coordinate system. We summarize the body of work that we conducted (published at various venues) on realizing the STB concept in various algorithms. These algorithms cover sampling of data to reduce data volumes while preserving data quality, indexing of spatiotemporal data in large scale distributed systems, and deep learning on satellite images to predict house prices. We also show that the STB encodings and operations on STBs are realizable in hardware (TCAMs and FPGAs), which provides orders of magnitude improvement in performance. Finally, this paper introduced for the first time the problem of asteroid-asteroid fly-by detection and applied a 3D STB to develop an efficient mechanism for solving it. We also predict the fly-bys for the next 25 years and, with the help of our partner institute, validate one of our predictions. To conclude, we firmly believe that the concept of an STB, its realization through geohashes, and its applications to various space-time problems are only the tip of the iceberg, and that the future of machine learning and more complex trajectory algorithms lies in the use of these STBs. We plan to pursue the use of STBs for trajectory mining as well as further space-time based machine learning in the future.



The authors would like to acknowledge all the co-authors of the previous body of work that utilized STBs and the derivative concepts: Sameh Asaad, Archit Bency, B. S. Manjunath, Larry Denneau, Dajung Lee, Kisung Lee, Eva Lilly-Schunova, Shen Li, Ling Liu, Roger Moussalli, Jorge Ortiz, Swati Rallapalli, and Petros Zerfos.


  1. Bency, A.J., Rallapalli, S., Ganti, R.K., Srivatsa, M., Manjunath, B.S.: Beyond spatial auto-regressive models: predicting housing prices with satellite imagery. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, 24–31 March 2017. IEEE (2017)
  2. Dubin, R., Pace, K., Thibodeau, T.: Spatial autoregression techniques for real estate data. J. Real Estate Lit. 7(1), 79–95 (1999)
  3. Eubank, S., Guclu, H., Kumar, V., Marathe, M., Srinivasan, A., Toroczkai, Z., Wang, N.: Modelling disease outbreaks in realistic urban social networks. Nature 429(6988), 180–184 (2004)
  4. Ganti, R., Srivatsa, M., Agrawal, D., Zerfos, P., Ortiz, J.: Mp-trie: fast spatial queries on moving objects. In: Proceedings of the Industrial Track of the 17th International Middleware Conference, Trento, 12–16 Dec 2016
  5. Gonzalez, M., Hidalgo, C., Barabási, A.-L.: Understanding individual human mobility patterns. Nature 453, 779–782 (2008)
  6. Granvik, M., Virtanen, J., Oszkiewicz, D., Muinonen, K.: Openorb: Open-source asteroid orbit computation software including statistical ranging. Meteorit. Planet. Sci. 44(12), 1853–1861 (2009)
  7. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM management of data (SIGMOD), Boston, 18–21 June 1984
  8. Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining: a survey. In: Miller, H.J., Han, J. (eds.) Geographic Data Mining and Knowledge Discovery, Research Monographs in GIS. Taylor and Francis (2001)
  9. Kitamura, R., Chen, C., Pendyala, R., Narayanan, R.: Micro-simulation of daily activity-travel patterns for travel demand forecasting. Transportation 27(1), 25–51 (2000)
  10. Lee, K., Ganti, R.K., Srivatsa, M., Liu, L.: Efficient spatial query processing for big data. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, 4–7 Nov 2014
  11. Lee, D., Moussalli, R., Asaad, S., Srivatsa, M.: Spatial predicates evaluation in the geohash domain using reconfigurable hardware. In: IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Washington, 1–3 May 2016. IEEE (2016)
  12. Li, S., Hu, S., Ganti, R., Srivatsa, M., Abdelzaher, T.: Pyro: a spatial-temporal big-data storage system. In: USENIX Annual Technical Conference, Santa Clara, 8–10 July 2015
  13. Li, S., Amin, M.T., Ganti, R., Srivatsa, M., Hu, S., Zhao, Y., Abdelzaher, T.: Stark: Optimizing in-memory computing for dynamic dataset collections. In: IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, 5–8 June 2017. IEEE (2017)
  14. Lilly, E., Jonas, J., Srivatsa, M., Ganti, R., Agrawal, D., Denneau, L., Kratky, M., Wainscoat, R.J.: Predicting close encounters between asteroids with the STB software. In: AAS/Division for Planetary Sciences Meeting Abstracts #47, vol. 47. American Astronomical Society (2015)
  15. Morrison, D.: Patricia—practical algorithm to retrieve information coded in alphanumeric. J. ACM 15(4), 514–534 (1968)
  16. Moussalli, R., Srivatsa, M., Asaad, S.: Fast and flexible conversion of geohash codes to and from latitude/longitude coordinates. In: IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, Vancouver, 2–6 May 2015. IEEE (2015)
  17. Niemeyer, G.: Geohash. (2008)
  18. Nishimura, S., Das, S., Agrawal, D., Abbadi, A.E.: Md-hbase: A scalable multi-dimensional data infrastructure for location aware services. In: IEEE 12th International Conference on Mobile Data Management, Lulea, 6–9 June 2011. IEEE (2011)
  19. Šidlauskas, D., Šaltenis, S., Christiansen, C.W., Johansen, J.M., Šaulys, D.: Trees or grids?: indexing moving objects in main memory. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, 4–6 Nov 2009, pp. 236–245
  20. Srivatsa, M., Ganti, R., Mohapatra, P.: On the limits of subsampling of location traces. In: IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, 5–8 June 2017. IEEE (2017)

Copyright information

© China Computer Federation (CCF) 2019

Authors and Affiliations

  • Dakshi Agrawal (1)
  • Raghu Ganti (1)
  • Jeff Jonas (2)
  • Mudhakar Srivatsa (1)
  1. IBM T J Watson Research Center, New York, USA
  2. Senzing Inc., Venice Beach, USA
