Abstract
Inconsistent data contains conflicting information, which can be formalized as violations of given semantic constraints. To improve data quality, a repair makes the data consistent by modifying the original data. Using user feedback to direct the repair operations is a popular solution. In the big data setting, however, it is unrealistic to ask users for feedback on the whole data set. In this paper, the repair position selection problem (RPS for short) is formally defined and studied. Intuitively, the RPS problem seeks an optimal set of repair positions, subject to a limit on the repair cost, such that as much of the data as possible becomes consistent. First, the RPS problem is formalized. Then, for three different repair strategies, the complexity and approximability of the corresponding RPS problems are studied.
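To make the intuition behind RPS concrete, the following is a toy sketch only, not the paper's formalization or any of its three repair strategies. It assumes, purely for illustration, that the semantic constraint is a single functional dependency, that each tuple is one candidate repair position with unit cost, and that repairing a tuple resolves every conflict it takes part in. The names `fd_conflicts`, `consistent_count`, and `select_positions` and the sample rows are hypothetical.

```python
from itertools import combinations


def fd_conflicts(rows, lhs, rhs):
    """Return index pairs (i, j) that violate the functional dependency lhs -> rhs."""
    return [
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if rows[i][lhs] == rows[j][lhs] and rows[i][rhs] != rows[j][rhs]
    ]


def consistent_count(n, conflicts, repaired):
    """Count tuples that touch no unresolved conflict.

    Assumption: a conflict is resolved as soon as one of its two tuples
    has been chosen as a repair position.
    """
    dirty = set()
    for i, j in conflicts:
        if i not in repaired and j not in repaired:
            dirty.update((i, j))
    return n - len(dirty)


def select_positions(rows, lhs, rhs, budget):
    """Exhaustively pick at most `budget` tuples to repair (unit cost each).

    Exponential in the budget; meant only to illustrate the objective.
    """
    conflicts = fd_conflicts(rows, lhs, rhs)
    n = len(rows)
    best = (consistent_count(n, conflicts, set()), ())
    for k in range(1, budget + 1):
        for subset in combinations(range(n), k):
            score = consistent_count(n, conflicts, set(subset))
            if score > best[0]:
                best = (score, subset)
    return best


# Hypothetical data: the dependency zip -> city is violated by tuple 1.
rows = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "Boston"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "02101", "city": "Boston"},
]
best_score, best_positions = select_positions(rows, "zip", "city", budget=1)
```

Under these assumptions, a budget of one is best spent on the tuple that participates in both conflicts. The exhaustive search is a deliberate simplification: it shows the objective (maximize consistent data under a cost limit) without making any claim about how the problem is solved or approximated in the paper.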
This work was supported in part by the General Program of the National Natural Science Foundation of China under grants 61502121, 61402130, 61772157, U1509216, the China Postdoctoral Science Foundation under grant 2016M590284, the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201649), and Heilongjiang Postdoctoral Foundation (Grant No. LBH-Z15094).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, X., Li, Y., Li, J. (2017). Repair Position Selection for Inconsistent Data. In: Gao, X., Du, H., Han, M. (eds) Combinatorial Optimization and Applications. COCOA 2017. Lecture Notes in Computer Science, vol 10627. Springer, Cham. https://doi.org/10.1007/978-3-319-71150-8_35
DOI: https://doi.org/10.1007/978-3-319-71150-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71149-2
Online ISBN: 978-3-319-71150-8
eBook Packages: Computer Science, Computer Science (R0)