1 Introduction

With the widespread use of sensors and GPS, a massive number of trajectory data are accumulated every day, such as taxi trajectory data, bus trajectory data, and other moving objects’ trajectory data. In some important applications, such as anti-terrorism, infectious disease prevention, suspect tracking, etc., tracks queries often require high timeliness and accuracy. However, the given query point set is small (several points to thousands), but the trajectory database is massive (millions of trajectory points to billion-level), so it is hard to obtain precise tracks in a short time. Therefore, how to efficiently query the track from the trajectory big data is an important issue in the trajectory retrieval field.

At present, the main methods to solve the efficiency problem of trajectory query are divided into two major aspects: tree-based spatial index and signature-based methods. The tree-based spatial index method uses the spatial index mechanism to improve trajectory retrieval efficiency. This mechanism divides the geo-space regularly and then uses the tree structure to organize the trajectory data hierarchically. However, the unstructured essence and the spatiotemporal heterogeneity of the trajectory cause the tree-based structure to have a high overlap of search region, lowerings query performance. The signature-based method uses the signature and storage structure of the trajectory to simplify and filter the trajectory. Because all the trajectory signatures are scanned and filtered before accessed, a large number of disk accesses are reduced. This type of method has certain advantages in trajectory organization and storage, and can better achieve the fuzzy matching and similarity measurement of trajectories, however, its problems such as difficulty in structure embedding and high complexity of signature calculation, would cause the inefficient trajectory search. These two methods both have improved the query performance to a certain extent, but they are far from the high efficiency, and their memory occupation and query time increase shapely when facing the trajectory big data. These two methods always have a bottleneck in efficiency while querying the trajectory based on the single point, and there is no research to search the trajectory based on the independent tracks (the subsequence of the trajectory).

The general trajectory retrieval takes the point as the minimum search object, instead of the point sequence (trajectory-subsequence, track). In fact, the essence of the track is a sequence of position records of moving objects in space and time [1], and the long time series trajectory consists of many tracks. A track is composed of a series of points, which could be regarded as a single independent object for the input condition of the query process. In this method, we attempt to deconstruct the trajectory into the independent tracks for the query.

The trajectory points are high-dimensional and disordered in time and space. However, when they are regarded as a track, they can be regarded as point stream data, which are very similar to text stream data in structure and semantics. Both stream data are composed of a series of points/characters, and the text string and the track as a whole both have special meanings. Therefore, we try to convert the irregular and disordered trajectory subsequence into a track string, and the gridded space and geocoding provide convenience for the mapping of tracks to strings. The geo-space is divided into regular cells by grid and each grid is given a unique independent geocode. The points located in the same cell would be regarded and simplified at the same point, thereby eliminating the random distribution of trajectory points in each grid. The combination of grids makes irregular trajectory point sequence turn into regular encoded track strings and maintains the uniqueness after each track encoding process. After gridded and encoded, the track can be mapped into the bit-array of the Bloom Filter.

The Bloom Filter [2] has great advantages in string retrieval, and it represents a specific string through the combination of several bits, allowing the memory to be utilized efficiently. The query of the Bloom Filter is implemented with some independent hash functions, and the time complexity is closed to O(1)and this query is not affected by the length of the string. This advantage is particularly obvious when dealing with big data. The characteristics of the Bloom Filter provide the convenience of processing the massive encoded tracks. Therefore, it is possible to use the Bloom Filter to query trajectory big data, quickly and precisely.

In this paper, we try to use the Bloom Filter to construct a track query method, combining geographic grid, trajectory encoding, hash mapping, etc. to achieve a fast and precise query of massive trajectory data. This track query method mainly involves the construction of the geographical grid, the tracks encoding, the creation of the Bloom Filter, and the query of tracks, as shown in Fig. 1.

Fig. 1
figure 1

The process of query method based on the Bloom Filter

The contributions of this paper are shown as follows:

We have proposed a new fast retrieval method for trajectory data, called the BF-based track query method, which can quickly retrieve target track from massive trajectory big data in time complexity O(1).

We use the track as the smallest unit of the query, rather than a single trajectory point, allowing spatial trajectory query to be converted into a string query.

We conduct extensive experiments study based on different vehicle trajectory datasets, to test the effectiveness, efficiency, stability, and robustness of our method. The time and efficiency of our method under different data volumes are compared. The experiment results demonstrate the effectiveness and efficiency of our method.

2 Related work

To the best of our knowledge, previous studies of the tree-based index for improving trajectory query efficiency, mainly focus on the geo-space partition, minimizing coverage of retrieval region, and reducing search region. Most of these index structures are based on the development and evolution of the R-tree and its variants, and the core of their index is the tree-based structure.

Geo-space partition

Geo-space partition uses the grid to divide the whole space region regularly, including SETI [3], SEB-Tree [4], etc. The SETI (Scalable and Efficient Trajectory Index) structure [3] proposed by V. Prasad Chakka et al. defines that geo-space is partitioned into non-overlapping regions, and each region builds a specific index: the R-Tree [5]. But in each sub-region, R-Tree will bring retrieval region overlap. SEB-Tree index structure tries to represent trajectory segments by their start and ends timestamp within a spatial grid. Although geo-space partition has reduced query time and has a good performance for regular spatial data, it is hard to process the irregular and random massive trajectory data.

Minimizing coverage of retrieval space

The main access methods for minimizing coverage of retrieval space are R-Tree and its variants [5,6,7]. The R-tree index structure [5], proposed by Guttman, uses quadratics and linear algorithms to split spatial objects, but the tree-based index structures have a heave MBR (minimum bounding rectangle) overlap. Greene’s [8] split method, proposed by Greene, has much fewer pages overlap than Guttman’s strategy. R*Tree [6], proposed by Beckmann, uses the topological split algorithm to reduce page overlap, and the reinsertions further optimize the performance of the tree for the index of spatial points. Bulk loaded R*Tree using Sort-Tile-Recursive (STR) [7]., the leaf pages do not overlap at all, but it is hard to process the point sequence (the track) and the trajectory data required to be known beforehand. R-Tree and its variants do not use index trajectory directly, but most index structures make them a core and important part.

Reducing search region

The trajectory is randomly distributed in the whole Euclidean space in principle, but most vehicles or moving objects usually move on geo-networks(road), instead of on the whole geo-space. Researchers have proposed many index structures based on fixed geo-networks. For objects constrained to move on fixed networks, Frentzos [9] proposed FNR-Tree (Fixed Network R-Tree), which is to construct a 1-dimensional R-trees on top of a 2-dimensional R-Tree to index trajectory segments. Li Guohui and Zhong Xiya [10] proposed an index structure called IMTFN (Indexing Moving objects Trajectories on Fixed Networks), which is based on the FNR-Tree and improves it. Victor Teixeira De Almeida and Ralf Hartmut Güting [11] proposed MON-Tree(Moving Objects in Networks Tree) index structure, which uses the top-level two-dimensional R-tree and a hash table structure to index the road sections in the traffic network. The index based on the fixed network can reduce the overlap of index space and redundancy to some extent. However, the trajectory index based on the fixed-network still has the problem of index coverage, because most index structures have to maintain an R-Tree as their core part.

Signature-based methods

On the other hand, many researchers try to solve the bottleneck of track query performance from the aspects of trajectory signature and storage structure. Chang J et al. [12] proposed a signature-based trajectory indexing scheme for moving the object’s trajectory, which has better performance than the TB-tree and FNR-tree. Shim et al. [13] introduced a new signature-based indexing scheme that supports similar sub-trajectory retrieval. Scanning and filtering all signatures before accessing the trajectory data reduces a large number of disk accesses thus achieving better performance. PIST [14], introduced by Botea et al., tries to partition points set into a variable-sized grid and then uses the quad-tree to query the data. But it does not adapt to new data being added dynamically, and it focuses on indexing points rather than trajectories. The TrajStore [15], introduced by Philippe et al., proposes a new adaptive storage system that indexes the trajectory data based on the quad-tree index and clustering methods. Same as PIST, it partitions points rather than trajectories, and it hard to deal with the long-sequence trajectory and trajectory big data. The SharkDB [16] attempts to use in-memory storage to achieve the purpose of fast track retrieval and no I/O cost, but the memory would overload when the SharkDB faces the big data of trajectory. Although researchers have done a lot of research on the storage structure and storage modes of the trajectory, no method is found to meet the rapid and precise query of the fast-growing massive trajectory data.

The Bloom filter

The typical Bloom Filter was put forward by Burton Howard Bloom [2] in 1970, but it does not support some operations, such as the delete, insert, and other operations. To solve these problems, researchers introduced a series of Bloom Filter variants based on Burton’s work. For delete and insert operations, the Counting Bloom filters [17] proposed by Fan et al., provide an operation that can delete and insert elements without having to re-create a new filter. Almeida et al. proposed Scalable Bloom filters [18], which dynamically insert elements while ensuring a false positive rate (FPR). The Bloom Filter can also be used for the spatial query. The Spatial Bloom filters, proposed by Palmieri et al. [19] aim to store location information, especially when the location is confidential. For performance improvement, the Parallel Partitioned Bloom Filters [20] can perform parallel hash calculations for insertion and query; the Distributed Bloom filters [21] support parallel processing of elements. The data structure proposed by Kiss et al. [22] can exclude not only false negatives but also false positives, which is extremely important for those FPR intolerable scene applications. At present, the Bloom Filter has perfected the complete system of queries or indexes for the specific goals and scenarios. And the retrieval method based on the Bloom Filter, which is different from the index of the tree-based structure, is a Boolean decision, not a logical judgment. It will help to improve retrieval efficiency.

In short, the tree-based index structure design satisfies the spatial relationship among spatial geometry objects. It has certain advantages for the regular distribution of spatial objects and spatial relationship queries, but it is not suitable for Spatio-temporal trajectory data, especially the long trajectory. Also, the research on the storage and organization of trajectory data has improved the query efficiency of trajectories to a certain extent, but when massive trajectory big data are processed, there are still excessive memory usage and frequent I/O. Therefore, it is particularly important to develop a BF-based trajectory query method.

3 Method

3.1 Problem definition

Figure 2 gives an illustration of a GPS trajectory. The trajectory, which consists of plenty of points, is the path of a body as the path travels through space. The track is the record left by something that has passed along, such as the track of a ship or the track of a car. That is to say, a trajectory is made of a track or points sequence. For the original vehicle trajectory data, we can extract a series of independent tracks, each of which can be represented as a single track, as depicted in the top part of Fig. 2.

Fig. 2
figure 2

GPS trajectory and corresponding subsequence

Let pi denote the start of the trajectory, and pn denote the end time of the trajectory. The track of a GPS trajectory is a sequence of points between the point pm and the pointpm + k. Put formally,

$$ l=\left\{{p}_m,\dots, {p}_{m+k}\right\} $$
(1)

The given GPS trajectory data are defined as T, where the track is represented as the sequencel = {pm, …, pm + k} of the trajectory point: pirepresents a single trajectory point, represented by the tuple pi = 〈id, timestamp, latitude, longitude〉, where id represents car’s ID, timestamp represents the time of the trajectory point pi records, latitude represents the latitude coordinate of pi coordinate, and longitude represents the longitude coordinate of pi, as depicted in the bottom part of Fig. 2. The track Q to be queried is represented as Q = {q1, q2, …, qn}. The geographical grid is defined as Ggeo and the range is represented as Tmsr, where minX, minY, maxX maxY is the minimum spatial range of the coordinates of all trajectory points in the trajectory set T, and the cell size in the geographic grid is represented as Ggeoin meters. The grid is divided into longitude direction and latitude direction from the beginning of the upper right corner of the grid, which is represented as \( Code=\left\{\begin{array}{l}{x}_1,{x}_2,\dots, {x}_n\\ {}{y}_1,{y}_2,\dots, {y}_n\end{array}\right. \). For example, based on the trajectory path represented by the geographical grid, Fig. 3 shows how to map the trajectory sequence to the Bloom Filter.Si represents a track, each of which contains several trajectory points. For example, the trajectorySi would be encoded as\( {Code}_{\left({S}_i\right)}= id-112232n3 nn \) (Table 1).

Fig. 3
figure 3

Schematic diagram of geographical grid representation of the trajectory

Table 1 Table of notations

Definition 1

(track). The track is a subsequence of trajectory, which is a subset of massive vehicle trajectory dataset, represented by l, continuous in time and space, and has a certain linguistic significance. In this paper, the track represents the spatial path of the vehicle in a period. Given a trajectory data set T, we can construct a series of tracks according to the original irregular trajectory points.

$$ l=\left\{{p}_1,{p}_2,\dots, {p}_n\right\},\kern1em l\subseteq T,\kern1em T=\sum \limits_{i=0}^nl $$

Definition 2 (geo-grid)

The geo-grid, represented by Ggeo, is the geographic grid, which is mainly used for regularizing geographic space and track coding. Given a trajectory dataset T, the geo-grid will be constructed based on T‘s minimum bounding extent Tmsrand the single grid size gsize, and every single grid has an independent geocode, as shown in Fig. 3 left part.

$$ {G}_{geo}=\left\{{T}_{msr},{g}_{size}\right\} $$

Definition 3

(geocoding). Geocoding is the process of mapping the tracks to the encoded tracks. The intersection of the trajectory data and the geo-grid can get the encoded trajectory set \( \sum \limits_{i=0}^n{S}_i \), as shown in Fig. 3 middle part.

$$ \sum \limits_{i=0}^n{S}_i={G}_{geo}\cup \sum \limits_{i=0}^nl $$

Definition 4 (the Bloom Filter retrieval)

The retrieval based on the Bloom Filter, represented byITBF, is an indexing structure constructed on hash functions and bit-array of the Bloom Filter. The Bloom Filter generates m hash values based on each encoded track and maps hash values one by one into the bit-array, as shown in Fig. 3 right part.

$$ {I}_{TBF}=\left\{\sum \limits_{i=0}^n{S}_i,\sum \limits_{j=0}^m hf,{B}_{arr}\right\} $$

3.2 Bloom filter

Since the Bloom Filter was put forward by Burton Howard Bloom [2] in 1970, it has been widely used in big data queries to improve query efficiency and reduce memory overload. The query time of Bloom Filter is under a constant range and the cost of storage space is small, so it has good practical value. It is mainly used in the data dictionary, data judgment, collection, and intersection, such as dictionary query [23], Spam Filtering system [24], and so on.

The essence of Bloom Filter is to use several bits to represent the elements in the bit-array. The essence of this paper is to map the string elements in the collection into a bit array through k independent hash functions. As depicted in the right part in Fig. 4, Si represents the string set. The string obtains multiple addresses in the bit-array through multiple hash operations and sets the corresponding value on the address to 1.

Fig. 4
figure 4

An example of the Bloom filter

3.3 Framework

In this section, we will present our framework of this method based on the Bloom Filter. Our framework consists of geo-grid construction, tracks encoding, and querying track. Algorithm 1 outlines the construction process. In the first phase, we obtain the minimum spatial range (Tmsr) based on the trajectory data, and then use Tmsr and gsize to create a geographic grid and assign a specific code. Finally, we get an encoded geo-grid Ggeo. The second phase aims to perform trajectory encoding. Through geo-grid, each independent track is encoded to Si, and a set \( \sum \limits_{i=0}^n{S}_i \) is formed. In the third phase, we construct the Bloom Filter ITBF, whoes elements are the encoded tracks. We create the Bloom Filter based on the size of \( \sum \limits_{i=0}^n{S}_i \) and FPR. Then according to the results obtained in the first two phases, the ITBF will be constructed.

When we have a track Q to be queried, the Q needs to be encoded first. Then the Q will be input to ITBF. After a series of processing ITBF can judge whether the track Q exists in the trajectory dataset. The specific details will be discussed further in the following two sections.

3.4 The geographic grid and geocoding

The geographic grid (geo-grid, Ggeo) is a unified and simple geospatial partitioning and positioning reference system. According to unified rules, the area is continuously divided according to distance or latitude and longitude to form regular or irregular polygons. Each polygon is called a grid or cell (a unit grid/cell) and is given a unique code. The geo-grid used in this paper is regular and decided by distance. Other forms of the grid (irregular grid, Voronoi, etc.) can also be used, depending on specific geospatial applications. The grid size of the geo-grid and the study areas are decided by the real problem and user selection. When the geo-grid is created, each grid is given a unique code, and each track will get a unique code. Each of these tracks includes the vehicle ID and the points, ensuring that each track is unique. The trajectory geocoding process is shown in Fig. 4.

3.5 Track insert and query process

Unlike the traditional index structure, the index table of the Bloom Filter is represented by a bit array. The Bloom Filter initial bit array is set to 0. The number of a hash function is decided by the data size of Trajectory instead of the user set. An element (track) calculated by hash functions only occupies several bits of the bit-array. And a bit can also be shared by multiple elements, as shown in Fig. 4. The creation of the Bloom Filter is mainly to construct bit array and hash functions, and its elements from the string set of trajectory point sequence are formed by geo-coding. After the k hash operations of each track string, which will be mapped to the bit array randomly, the Bloom Filter of the track will be constructed.

The query process of the track—to judge whether the input track exists in the set of the current trajectory dataset—is based on Bloom Filter Because the input is a sequence of coordinate points, it is necessary to use the geo-grid to encode these points, the same as the construction process. The results are mapped to the bit array one by one, and it is judged that if the result of the hash operation has one bit different from the bit array, it can be concluded that the input track does not exist in the trajectory set, and if it is all the same as the bit array, it can be judged that the path is most likely to exist in the trajectory set. The whole method construction and query process are shown in Fig. 1.

4 Experiments and results

4.1 Experiment data description

The data used in this study come from the T-drive taxi trajectory data set provided by Microsoft Research Institute [25, 26] and Shenzhen City bus trajectory data, as shown in Fig. 5. Because each taxi and bus trajectory is a long period of space-time point sequence, to facilitate experiments and performance testing, the data will be segmented to simulate the independent tracks of the vehicle. According to the topological relationship, time threshold, or other segmentation methods, the tracks are formed by several points of one trajectory. The segmented points sequence will constitute the vehicle track set.

Fig. 5
figure 5

Trajectory data overview of the study area (Beijing City and its Downtown area; Shenzhen City and its FuTian District)

The experimental platform is Windows 10, the processor is Intel (R) Core (TM) i7-9750H CPU 2.60 GHz, running memory 16GB, and the compiler version is MS VC++ 12.0. The third-party library of the Bloom Filter is ArashPartow/bloom [27]. The R*Tree index is Boost.Geometry.Index [28]. The experimental data are the 2008 Beijing taxi trajectory data [25, 26] and the 2014 Shenzhen city bus trajectory data, Guangdong Province, China.

4.2 Results

In this paper, we conducted experiments on taxi trajectory data and bus trajectory data, the conclusions drawn are the same. As shown in Fig. 6, the first line shows the result of the taxi trajectory data, and the second line is the buses’.

Fig. 6
figure 6

Query and construction time of the Bloom Filter and R*Tree index

Compared with the R*Tree index, the query efficiency of this method is verified. For the precise query, the query time based on Bloom Filter is much shorter than that based on the R*Tree index. Figure 6a shows the comparison of the query time of R*Tree and the Bloom Filter with different numbers of data. The horizontal axis represents the number of points and the vertical axis represents 2000 query times. The number of points on each track is more than 5. Although the data sources are different, we can see that the query time based on R*Tree is much longer than that based on the Bloom Filter, more than 1000 times according to the experiments. For the R*Tree index, the query time complexity is O(log n) [5], and the query mode of Bloom Filter is based on the hash function, and its time complexity is O(1), which also directly illustrates that the index based on Bloom Filter is negligible in terms of query time. Similarly, there is a large disparity in the time consumed by the construction of these two methods. The time consumed by the R*tree-based method increases linearly with the increase of data. However, the time changes of the Bloom Filter is negligible. Also, the time consumption of the two methods is no longer on the same level. As shown in Fig. 6b, the horizontal axis represents the number of total points, and the vertical axis represents the cost time of constructing the method.

In the case of the same trajectory data, we tested the impact of a different number of points included in each track during the query and construction. The test results are as follows: The number of points included in each track does not affect the Bloom Filter. With the increase of the number of points in the track, the R*Tree index has an overall upward trend, although it fluctuates, as shown in Fig. 6c. The horizontal axis represents the number of points in each track, and the vertical axis represents 2000 query times. Figure 6d shows the time consumed when the indexes were constructed, in which the two kinds of indexes have the same total number of track points, and the number of track points in each track is different. Also, although the two lines are slowly approaching, they still differ by several orders of magnitude in time consumption.

To further confirm the performance advantage of Bloom Filter for massive trajectories in other aspects, the following tests were performed in this paper:

The experimental result shows that the Bloom Filter has great advantages in memory consumption by using its bit-array, as shown in Fig. 7 and Table 2. Figure 7 shows the memory occupation of the Bloom Filter, including the memory occupied by the bit array, the encoded track set, and the original trajectory data. The transverse axis represents the number of points, and the longitudinal axis represents the amount of memory occupied, in KB. From the table and graph, we can conclude that compared with the original data, the memory consumption of the bit array is much smaller than the original data. With the increase of the number of the original data, the memory consumption of the bit-array increases linearly and slowly.

Fig. 7
figure 7

The memory occupied Comparison of different amount of trajectory data

Table 2 Comparison of memory occupied by different trajectory data size

For the query efficiency of this method, the experimental results are as follows: in Fig. 8a, the transverse axis represents the number of points contained in each track, and the longitudinal axis represents the time consumed for 10,000 times of repeated query of the same tracks in seconds. The number of points in every single track has little effect on the query time. Moreover, the number of points in the different original trajectory set (4,073,886, 8,415,326, 12,871,223) has little effect on the retrieval time, which also satisfies the conclusion explained above, that is, the query time of Bloom Filter is determined by hash functions, and the time complexity is about (1) . Also, the influence of grid size of geo-grid on query time is tested in this paper, such as shown in Fig. 8b: transverse axis represents the number of query points each time, longitudinal axis represents the time consumed for 10,000 query times in seconds, and the five curves represent the time consumed to retrieval track based on different grid sizes (100*100, 500*500,1000*1 000,3000*3000,8000*8000, in meters). Through experiments, we can conclude that the grid size has little effect on the query efficiency of tracks with different lengths.

Fig. 8
figure 8

Comparison of the influence of querying different track points on retrieval efficiency

5 Discussion

5.1 FPR influence

For the Bloom Filter, the only weakness is that it has a certain false-positive rate, which means the Bloom Filter may misjudge elements that do not exist in the dataset, but those elements that exist in the set will not be misjudged as not belonging to the set, so we call it false positive instead of a mistake. Formula \( p={\left(1-{\mathcal{e}}^{-\frac{nk}{m}}\right)}^k \) [2] represents the calculating method of the false positive rate. n denotes the number of input elements, k is the number of hash functions, m is the size of bit-array, andp is the false positive rate. In the experiment, the false positive rate is set to a certain value, which means that the overall false-positive rate needs to be less than this value. After many tests on the two data-sets, the false positive rate is far less than the set value. The tracks to be queried in the experiment are randomly generated, and the results are shown in Tables 3 and 4, and Fig. 9. Figure 9 shows the comparison between the experimental value and the set value of the FPR, where FPR is the set value, exp_Taxi is the experimental FPR of the taxi trajectory data, and exp_Bus is the bus trajectory. The horizontal axis of Fig. 9 represents the number of different trajectory points in the two trajectory data sets, and the vertical axis represents FPR. Tables 3 and 4 give specific experimental values. In this paper, we tested FPR under different data volumes, but for the different scenarios, FPR could be different, which needs further study.

Table 3 FPR of Beijing Taxi Trajectory (NTaxi means the number of taxi trajectory points)
Table 4 FPR of Shenzhen Taxi Trajectory (NBus means the number of bus trajectory points)
Fig. 9
figure 9

False positive rate of set valves and experience values

5.2 Drawbacks and future work

At present, there are many shortcomings in this method, and a lot of work is needed to complete this method. This work only distinguishes the tracks in the way of precise matching and does not further verify the fuzzy query. The tracks are simulated by segmented taxi and bus trajectory, and there is no further processing of the merging, deletion, and simplification of the points in the real tracks. For more effective geocoding, the geographical grid also needs to be further studied according to the spatial distribution of the points.

Although this work is only the first step to verify the use of the Bloom Filter to query tracks, it is an important step. If we want to complete the construction of the Bloom Filter-based trajectory query method, an in-depth research is needed in the insertion, update, deletion operation, and other aspects. For later research, we plan to use the Counting Bloom Filters [17] to improve update and delete operations and use the structure proposed by Kiss et al. [22] to study the efficiency and performance of trajectory query without FPR.

6 Conclusion

This method verifies that the Bloom Filter has a great advantage of querying trajectory point sequence, in terms of both time efficiency and memory occupancy. Moreover, the point sequence, instead of querying every single point, is regarded as the minimum retrieval granularity in this method, which can convert the track query into a string query. Also, this method has robust stability, and the retrieval efficiency is hardly affected by the length of the input tracks, the size of the geo-grid, and data size. In the process of querying, to judge the existence of a certain track only needs to compare the hash results one by one, and the comparison is the IO operations on bit-array instead of the original data. The most important is that this method can effectively avoid a low recognition rate of traditional tree-oriented index structure and index overlapping.

Big data contain a great deal of information and are widely used. A suitable query method should be constructed according to the different applications of big data. In the application of trajectory big data, our method provides a new model for the fast retrieval of tracks and also provides a new idea for querying spatial trajectory data. Therefore, our method could be applied to criminal suspect trajectory retrieval, epidemic prevention and control, and so on. When the suspects escape, they will inevitably leave their trajectory, even if it is discrete track points. Based on the method in this paper, the points are combined into tracks and quickly retrieved in trajectory big data to obtain tracks that match the suspects’ trajectory in the same period. Even if the trajectory is not of the suspects’, it may be the witnesses’, which can quickly provide clues for the investigation of the case. For those who come in contact with the 2019-nCoV or other infectious diseases patients, their trajectories can be collected as input tracks. Based on these tracks of infected persons, searching for the same or related tracks in the trajectory big data can find close contacts with the infected persons in a short time.