Census Data Repair: A Challenging Application of Disjunctive Logic Programming
Census data provide valuable insights into the economic, social and demographic conditions and trends of a country. The data are collected by means of millions of questionnaires, each one recording the details of the persons living together in the same house. Before the data from the questionnaires are sent to the statisticians for analysis, a cleaning phase (called “imputation”) is performed in order to eliminate inconsistencies, missing answers, and errors. It is important that the imputation step be carried out without altering the statistical validity of the collected data. The contribution of this paper is twofold. On the one hand, it provides a clear and well-founded declarative semantics for questionnaires and for the imputation problem. On the other hand, it presents a correct modular encoding of the problem in the disjunctive logic programming language DLP(suw), supported by the DLV system. It turns out that DLP(suw) is very well suited to this goal. Census data repair appears to be a challenging application area for disjunctive logic programming.