
DART: A Data Acquisition and Repairing Tool

  • Bettina Fazzinga
  • Sergio Flesca
  • Filippo Furfaro
  • Francesco Parisi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4254)

Abstract

An architecture is proposed that provides robust data acquisition from input documents containing tabular data. The architecture is based on a data-repairing framework that exploits integrity constraints defined on the input data to detect and repair inconsistencies arising from errors in the acquisition phase. In particular, a specific but expressive form of integrity constraint (steady aggregate constraints) is defined, which allows the computation of a repair to be expressed as a mixed integer linear programming problem.
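
As a rough illustration of how a repair computation can take the form of a mixed integer linear program, consider the sketch below. The notation is assumed here for illustration and is not taken from the paper: $v_i$ are the acquired values, $z_i$ the repaired values, $\delta_i$ binary flags marking the values that are changed, $M$ a sufficiently large constant, $D_i$ the domain of the $i$-th value (integers or reals), and $A z \le b$ a linear encoding of the aggregate constraints. The objective of changing as few values as possible is likewise an assumption about what a preferred repair might be.

```latex
\begin{aligned}
\min_{z,\,\delta}\quad & \sum_{i=1}^{n} \delta_i \\
\text{s.t.}\quad & A z \le b, \\
& -M\,\delta_i \;\le\; z_i - v_i \;\le\; M\,\delta_i, && i = 1,\dots,n, \\
& z_i \in D_i, \quad \delta_i \in \{0,1\}, && i = 1,\dots,n.
\end{aligned}
```

In this sketch, $\delta_i = 0$ forces $z_i = v_i$, so the objective counts the modified values. An equality such as "the detail items of a table sum to its total" would be written as a pair of opposite inequalities in $A z \le b$.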

Keywords

Lexical Item; Integrity Constraint; Database Scheme; Tabular Data; Database Instance
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Bettina Fazzinga (1)
  • Sergio Flesca (1)
  • Filippo Furfaro (1)
  • Francesco Parisi (1)
  1. DEIS, Università della Calabria, Rende (CS), Italy
