The VLDB Journal, Volume 28, Issue 6, pp 961–985

Skyline queries over incomplete data streams

  • Weilong Ren
  • Xiang Lian
  • Kambiz Ghazinour
Regular Paper


Efficient and effective processing of massive stream data has attracted much attention from the database community, as it is useful in many real applications such as sensor data monitoring and network intrusion detection. In practice, due to malfunctioning sensing devices or imperfect data collection techniques, real-world stream data often contain missing or incomplete attributes. In this paper, we formalize and tackle a novel and important problem, named skyline query over incomplete data stream (Sky-iDS), which retrieves skyline objects (in the presence of missing attributes) with high confidence from an incomplete data stream. To tackle the Sky-iDS problem, we design efficient approaches to impute the missing attributes of objects from the incomplete data stream via differential dependency (DD) rules. We propose effective pruning strategies to reduce the search space of the Sky-iDS problem, devise cost-model-based index structures to facilitate data imputation and skyline computation at the same time, and integrate the proposed techniques into an efficient Sky-iDS query answering algorithm. Extensive experiments confirm the efficiency and effectiveness of our Sky-iDS processing approach over both real and synthetic data sets.
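
The abstract does not spell out formal definitions, so the following is only a minimal illustrative sketch of what "skyline objects with high confidence" could mean once missing attributes have been imputed. It assumes that smaller attribute values are preferred, that each incomplete object has already been imputed into a small set of candidate instances with probabilities (for example by some DD-based step that is not shown here), and that objects are independent. The function names, the example data, and the threshold `alpha` are hypothetical, and the brute-force enumeration of possible worlds stands in for, and is not, the pruning and index-based algorithm proposed in the paper.

```python
from itertools import product


def dominates(a, b):
    """Return True if point a dominates point b (smaller is better, an assumption)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline_probabilities(objects):
    """Compute, per object, the probability of being a skyline point.

    objects maps a name to a list of (instance, probability) pairs, where the
    probabilities of one object's candidate instances sum to 1. Every
    combination of instances (a "possible world") is enumerated, so this is
    exponential and only suitable for tiny examples.
    """
    names = list(objects)
    probs = {name: 0.0 for name in names}
    for world in product(*(objects[name] for name in names)):
        world_prob = 1.0
        for _, p in world:
            world_prob *= p
        points = [instance for instance, _ in world]
        for i, name in enumerate(names):
            dominated = any(
                dominates(points[j], points[i])
                for j in range(len(points))
                if j != i
            )
            if not dominated:
                probs[name] += world_prob
    return probs


def confident_skyline(objects, alpha):
    """Return the names whose skyline probability exceeds the threshold alpha."""
    probs = skyline_probabilities(objects)
    return sorted(name for name, p in probs.items() if p > alpha)


# Hypothetical window of three objects; "b" had a missing second attribute
# that was imputed with two equally likely candidate values.
window = {
    "a": [((1, 4), 1.0)],
    "b": [((2, 2), 0.5), ((2, 5), 0.5)],
    "c": [((3, 3), 1.0)],
}
result = confident_skyline(window, alpha=0.6)
```

In this toy window, object "a" is never dominated, while "b" and "c" are each in the skyline in only one of the two possible worlds, so only "a" passes a threshold of 0.6.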


Skyline query · Incomplete data streams · Sky-iDS



Xiang Lian is supported by NSF OAC No. 1739491 and Lian Startup No. 220981, Kent State University. We thank the anonymous reviewers for their useful suggestions.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Kent State University, Kent, USA
  2. Center for Criminal Justice, Intelligence and Cybersecurity, State University of New York, Canton, USA
