Assessing the Quality of Spatio-Textual Datasets in the Absence of Ground Truth

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 767)

Abstract

The increasing availability of enriched geospatial data has opened up a new domain and enabled the development of more sophisticated location-based services and applications. However, this development has also given rise to various data quality problems, as it is very hard to verify the data for all real-world entities contained in a dataset. In this paper, we propose ARCI, a relative quality indicator that exploits the wide availability of spatio-textual datasets to indicate how confident a user can be in the correctness of a given dataset. ARCI operates in the absence of ground truth and aims to compute the relative quality of an input dataset by cross-referencing its entries across several similar datasets. We also present an algorithm for computing ARCI and evaluate its performance in a preliminary experimental study using real-world datasets.
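The abstract does not define how ARCI is computed. As a rough illustration of the general idea of cross-referencing entries across similar datasets without ground truth, the sketch below scores a target dataset by the average fraction of reference datasets that contain a matching entry for each of its entries. Everything concrete here is an assumption made for illustration and is not the ARCI definition from the paper: the `Entry` type, the function names (`relative_support`, `matches`, etc.), the matching rule (haversine distance of at most `max_dist_m` metres AND normalised Levenshtein name similarity of at least `min_sim`), and the default thresholds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical spatio-textual entry: a name plus a (lat, lon) in degrees."""
    name: str
    lat: float
    lon: float


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def haversine_m(p: Entry, q: Entry) -> float:
    """Great-circle distance in metres between two entries (haversine formula)."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(q.lon - p.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(h)))


def matches(e: Entry, f: Entry, max_dist_m: float, min_sim: float) -> bool:
    """Hypothetical matching rule: close in space AND similar in name."""
    return haversine_m(e, f) <= max_dist_m and name_similarity(e.name, f.name) >= min_sim


def relative_support(target, references, max_dist_m=100.0, min_sim=0.8) -> float:
    """Average, over target entries, of the fraction of reference datasets that
    contain at least one matching entry. Illustrative only; not ARCI itself."""
    if not target or not references:
        return 0.0
    total = 0.0
    for e in target:
        hits = sum(any(matches(e, f, max_dist_m, min_sim) for f in ref) for ref in references)
        total += hits / len(references)
    return total / len(target)


# Example usage with made-up entries and coordinates.
target = [Entry("Cafe Milano", 46.4983, 11.3548), Entry("Hotel Ghost", 46.5000, 11.3600)]
ref_a = [Entry("Caffe Milano", 46.4984, 11.3549)]  # ~13 m away, one-letter name difference
ref_b = [Entry("Cafe Milano", 46.4983, 11.3548)]   # exact duplicate
print(relative_support(target, [ref_a, ref_b]))    # 0.5
```

The first target entry is confirmed by both reference datasets and the second by neither, so the score is 0.5. Normalising by the number of reference datasets keeps the score in [0, 1] however many similar datasets are available. This nested scan is quadratic in dataset size, so a practical implementation would instead rely on an indexed spatio-textual similarity join.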

Keywords

Spatio-textual data; Data quality; Relative quality

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Masaryk University, Brno, Czech Republic
  2. Free University of Bozen-Bolzano, South Tyrol, Italy