Cleaning data with Llunatic

Data cleaning (or data repairing) is considered a crucial problem in many database-related tasks. It consists of making a database consistent with respect to a given set of constraints. In recent years, repairing methods have been proposed for several classes of constraints. These methods, however, tend to hard-code the strategy for repairing conflicting values and are specialized to specific classes of constraints. In this paper, we develop a general chase-based repairing framework, referred to as Llunatic, in which repairs can be obtained for a large class of constraints and by using different strategies to select preferred values. The framework is based on an elegant formalization in terms of labeled instances and partially ordered preference labels. In this context, we revisit concepts such as upgrades, repairs and the chase. In Llunatic, various repairing strategies can be slotted in without changing the underlying implementation. Furthermore, Llunatic is the first data repairing system that is DBMS-based. We report experimental results that confirm its good scalability and show that various instantiations of the framework result in repairs of good quality.

  1.

    Typically, to make this step deterministic, an ordering on null values is assumed and the smaller null value is replaced by the larger one. We assume that \(\bot_0 < \bot_1 < \bot_2 < \bot_3 < \cdots\).

  2.

    Of course, here the universe of discourse of the first-order structure is \({\textsc {consts}}\cup {\textsc {nulls}}\cup \textsc {lluns}\) (and \(\textsc {Tids}\) for the \({\textsf {Tid}}\)-attributes). Like constants and nulls, lluns are treated as constants.

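The deterministic null-equating step described in the first note can be illustrated with a minimal Python sketch. It is not taken from the Llunatic implementation: the string encoding `N<i>` for the null \(\bot_i\), the representation of an instance as a list of tuples, and the function names `null_index` and `equate_nulls` are all assumptions made for illustration only.

```python
import re


def null_index(label):
    """Return i for a labeled null encoded as 'N<i>' (hypothetical encoding of ⊥_i)."""
    match = re.fullmatch(r"N(\d+)", label)
    if match is None:
        raise ValueError(f"not a labeled null: {label!r}")
    return int(match.group(1))


def equate_nulls(instance, a, b):
    """Equate nulls a and b by replacing the smaller one with the larger one.

    Uses the ordering N0 < N1 < N2 < ..., so the result does not depend
    on the order in which the two nulls are passed.
    """
    if a == b:
        return [tuple(row) for row in instance]
    smaller, larger = sorted((a, b), key=null_index)
    # Rewrite every occurrence of the smaller null across the whole instance.
    return [
        tuple(larger if value == smaller else value for value in row)
        for row in instance
    ]


# Toy instance: two tuples sharing the null N0.
instance = [("N0", "x"), ("N2", "N0")]
repaired = equate_nulls(instance, "N2", "N0")
```

Because the choice of which null survives depends only on the fixed ordering, calling `equate_nulls(instance, "N0", "N2")` gives the same instance as the call above.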


  • Data quality
  • Data cleaning
  • Data repairing
  • Chase
  • Data repairing system
  • Constraints
  • Rules
  • Repair algorithm
  • Cleaning rules
  • Dependencies
  • Error detection