Data Currency Assessment Through Data Mining

  • Sergio Pio Alvarez
  • Adriana Marotta
  • Libertad Tansini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9382)

Abstract

The application of data mining (DM) techniques to data quality (DQ), often called Data Quality Mining (DQM), offers a wide range of possibilities for DQ assessment. The goal of this work is to propose a mechanism for assessing data currency using statistics and DM techniques. The proposed approach consists of estimating the validity period for the entities from a training set and then evaluating the probability that the last known data value for each entity is still current. The proposed scheme helps keep a database up to date in two ways: it can warn when a data value is becoming obsolete, and it can inform the data manager of the best frequency for updating data.
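
As a rough illustration of the two steps named in the abstract, the sketch below derives validity periods from a small training set of change dates and then estimates the probability that a value of a given age is still current. It is not the authors' implementation: the function names `validity_periods` and `currency_probability`, the toy `training` data, and the choice of a simple empirical survival estimate pooled over all entities are assumptions made here for illustration only.

```python
from bisect import bisect_right
from datetime import date


def validity_periods(change_dates):
    """Return the number of days between consecutive changes of one entity."""
    ordered = sorted(change_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def currency_probability(age_days, periods):
    """Empirical estimate of P(validity period > age_days).

    Assumption: the observed validity periods are representative of how long
    a value stays valid, so the share of periods longer than the value's age
    approximates the probability that the value is still current.
    """
    if not periods:
        raise ValueError("need at least one observed validity period")
    ordered = sorted(periods)
    # bisect_right counts the periods that are <= age_days; the remaining
    # periods lasted longer than the current age of the value.
    still_valid = len(ordered) - bisect_right(ordered, age_days)
    return still_valid / len(ordered)


# Hypothetical training set: dates on which each entity's value changed.
training = {
    "entity_a": [date(2014, 1, 1), date(2014, 1, 31), date(2014, 3, 2)],
    "entity_b": [date(2014, 1, 1), date(2014, 3, 2), date(2014, 5, 31)],
}

# Pool the periods of all entities (a simplification made for this sketch).
periods = [p for dates in training.values() for p in validity_periods(dates)]

# Probability that a value last updated 45 days ago is still current.
probability = currency_probability(45, periods)
```

A low probability for the last known value of an entity would correspond to the warning case described in the abstract, and the distribution of observed periods gives a hint about how often the data should be refreshed.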

Keywords

Data quality, Currency, Data mining, Clustering

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sergio Pio Alvarez (1)
  • Adriana Marotta (1)
  • Libertad Tansini (1)

  1. Universidad de la República, Montevideo, Uruguay
