Census Data Repair: A Challenging Application of Disjunctive Logic Programming

  • Enrico Franconi
  • Antonio Laureti Palma
  • Nicola Leone
  • Simona Perri
  • Francesco Scarcello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2250)

Abstract

Census data provide valuable insights into the economic, social and demographic conditions and trends in a country. The data are collected by means of millions of questionnaires, each recording the details of the persons living together in the same house. Before the questionnaire data are sent to statisticians for analysis, a cleaning phase (called “imputation”) is performed to eliminate inconsistencies, missing answers, and errors. It is important that the imputation step does not alter the statistical validity of the collected data. The contribution of this paper is twofold. On the one hand, it provides a clear and well-founded declarative semantics for questionnaires and for the imputation problem. On the other hand, it presents a correct modular encoding of the problem in the disjunctive logic programming language DLP^w, which is supported by the DLV system. DLP^w turns out to be very well suited to this goal, and census data repair appears to be a challenging application area for disjunctive logic programming.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Enrico Franconi (1)
  • Antonio Laureti Palma (2)
  • Nicola Leone (3)
  • Simona Perri (3)
  • Francesco Scarcello (4)

  1. Dept. of Computer Science, Univ. of Manchester, UK
  2. ISTAT, National Statistical Institute, Roma, Italy
  3. Dept. of Mathematics, Univ. of Calabria, Rende, Italy
  4. DEIS, Univ. of Calabria, Rende, Italy
