
Scalable processing of snapshot and continuous nearest-neighbor queries over one-dimensional uncertain data

  • Special Issue Paper
  • The VLDB Journal

Abstract

In several emerging and important applications, such as location-based services, sensor monitoring and biological databases, the values of the data items are inherently imprecise. A useful query class for such data is the Probabilistic Nearest-Neighbor Query (PNN), which returns the IDs of the objects that may be the nearest neighbor of a query point, together with the objects' probability values. Previous studies showed that this query takes a long time to evaluate. To address this problem, we propose the Constrained Nearest-Neighbor Query (C-PNN), which returns the IDs of objects whose probabilities are higher than some threshold, with a given error bound in the answers. We show that the C-PNN can be answered efficiently with verifiers. These are methods that derive lower and upper bounds on the answer probabilities, so that it can be quickly decided whether an object should be included in the answer. We design five verifiers, which can be used on uncertain data with arbitrary probability density functions. We further develop a partial evaluation technique, so that a user can obtain some answers quickly, without waiting for the whole query evaluation process to complete (which may incur a high response time). In addition, we examine the maintenance of a long-standing, or continuous, C-PNN query. This query requires any update to be applied to the result immediately, in order to reflect changes to the database values (e.g., a change in the location of a moving object). We design an incremental update method based on previous query answers, in order to reduce the I/O and CPU cost of maintaining the correctness of the answers to such a query. Performance evaluation on realistic datasets shows that our methods are capable of yielding timely and accurate results.
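The role of the probability bounds can be illustrated with a minimal sketch. The abstract does not state the exact decision rule, so the rule below is an assumption: an object is rejected once its upper bound falls below the threshold, and accepted once its lower bound reaches the threshold or its bounds are within the error tolerance while the upper bound still reaches the threshold. The names `Bounds`, `classify` and `run_verifiers` are hypothetical, and the verifiers are modeled as plain callables rather than the five verifiers designed in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Bounds:
    lower: float  # lower bound on the probability of being the nearest neighbor
    upper: float  # upper bound on the same probability


def classify(b: Bounds, threshold: float, tolerance: float) -> str:
    """Assumed decision rule (not taken from the abstract)."""
    if b.upper < threshold:
        return "reject"   # the probability cannot reach the threshold
    if b.lower >= threshold or b.upper - b.lower <= tolerance:
        return "accept"   # certainly above threshold, or known tightly enough
    return "unknown"      # bounds are still too loose to decide


def run_verifiers(
    candidates: Dict[str, Bounds],
    verifiers: Iterable[Callable[[str, Bounds], Bounds]],
    threshold: float,
    tolerance: float,
) -> Dict[str, str]:
    """Apply verifiers in order, stopping for an object once it is decided."""
    verifiers = list(verifiers)
    labels = {}
    for obj_id, b in candidates.items():
        label = classify(b, threshold, tolerance)
        for verify in verifiers:
            if label != "unknown":
                break
            t = verify(obj_id, b)
            # Only ever tighten the bounds; a verifier never loosens them.
            b = Bounds(max(b.lower, t.lower), min(b.upper, t.upper))
            label = classify(b, threshold, tolerance)
        labels[obj_id] = label
    return labels


# Toy data: the bounds and the single verifier are made up for illustration.
candidates = {
    "a": Bounds(0.0, 1.0),
    "b": Bounds(0.0, 0.2),
    "c": Bounds(0.35, 0.5),
}
toy_verifier = lambda obj_id, b: Bounds(0.4, 0.9) if obj_id == "a" else b
labels = run_verifiers(candidates, [toy_verifier], threshold=0.3, tolerance=0.1)
```

Objects still labeled `"unknown"` after all verifiers would need a more expensive exact probability computation, which is the cost that cheap bounds are meant to avoid.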

Corresponding author

Correspondence to Reynold Cheng.

About this article

Cite this article

Chen, J., Cheng, R., Mokbel, M. et al. Scalable processing of snapshot and continuous nearest-neighbor queries over one-dimensional uncertain data. The VLDB Journal 18, 1219–1240 (2009). https://doi.org/10.1007/s00778-009-0152-3

  • DOI: https://doi.org/10.1007/s00778-009-0152-3
