Abstract
Tracking moving entities at predefined locations plays an essential role in many surveillance-related applications. Occasionally, the IDs of those entities are recorded incorrectly, for example because of recognition errors. Such errors need to be repaired on the fly, because the IDs often feed time-sensitive query processing or data analysis tasks. In this paper, we address a specific case where the errors result in singleton IDs, i.e., IDs that appear only once during a specific period of time and can thus be safely presumed erroneous. The repair of the IDs is based on constraints posed by the data itself (e.g., constraints posed by the road network). We present a tracking tree structure that indexes the candidate repairs for each singleton ID and enables the IDs to be repaired on the fly. We implement a distributed repair system on the Apache Storm platform. Experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our singleton detection and repair approach.
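The notion of a singleton ID can be illustrated with a minimal sketch. It is not the paper's implementation: the `Observation` record, the `find_singletons` function, and the half-open time window are assumptions made for illustration only, and the tracking tree and the Storm topology are not modeled here.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Observation(NamedTuple):
    """One sighting of an entity (hypothetical record layout)."""
    entity_id: str    # recognized ID, e.g. a license plate string
    location: str     # predefined monitoring location
    timestamp: float  # seconds since some epoch


def find_singletons(
    observations: Iterable[Observation],
    window_start: float,
    window_end: float,
) -> List[Observation]:
    """Return observations whose ID occurs exactly once in [window_start, window_end)."""
    in_window = [
        o for o in observations
        if window_start <= o.timestamp < window_end
    ]
    # Count how often each ID was recorded inside the window.
    counts = Counter(o.entity_id for o in in_window)
    # An ID seen only once in the window is treated as a repair candidate.
    return [o for o in in_window if counts[o.entity_id] == 1]


if __name__ == "__main__":
    sample = [
        Observation("A1", "gate-1", 0.0),
        Observation("A1", "gate-2", 10.0),
        Observation("A7", "gate-2", 12.0),   # likely a misread ID
        Observation("B2", "gate-3", 100.0),  # outside the window below
    ]
    for obs in find_singletons(sample, 0.0, 60.0):
        print(obs)
```

A batch filter like this only shows the definition; repairing on the fly would additionally need the candidate repairs to be indexed incrementally as observations arrive, which is the role the abstract assigns to the tracking tree.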
Acknowledgement
This work was supported in part by the National Basic Research 973 Program of China under Grant No. 2015CB352502, the National Natural Science Foundation of China under Grant Nos. 61272092 and 61572289, the Natural Science Foundation of Shandong Province of China under Grant Nos. ZR2012FZ004 and ZR2015FM002, the Science and Technology Development Program of Shandong Province of China under Grant No. 2014GGE27178, and the NSERC Discovery Grants.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Cui, X., Yu, X., Guo, D. (2016). Repair Singleton IDs on the Fly. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_17
DOI: https://doi.org/10.1007/978-3-319-45817-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)