A Taxonomy of Dirty Time-Oriented Data

  • Theresia Gschwandtner
  • Johannes Gärtner
  • Wolfgang Aigner
  • Silvia Miksch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7465)

Abstract

Data quality is vital for business analytics: accurate insight and correct decisions in many data-intensive industries depend on it. Although systematic approaches to categorizing, detecting, and avoiding data quality problems exist, they hardly consider the special characteristics of time-oriented data. Yet time is an important data dimension with distinct characteristics that warrant special consideration in the context of dirty data. Building upon existing taxonomies of general data quality problems, we address ‘dirty’ time-oriented data, i.e., time-oriented data with potential quality problems. In particular, we investigate empirically derived problems that emerge with different types of time-oriented data (e.g., time points, time intervals) and provide various examples of quality problems in time-oriented data. By relating this categorized information to existing taxonomies, we establish a basis for further research on dirty time-oriented data and for formulating essential quality checks when preprocessing time-oriented data.
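To make the idea of a quality check on time intervals concrete, the following is a minimal Python sketch. It is an illustration only and is not taken from the paper: the `Interval` type, the `find_problems` function, the three checks (end before start, duplicate, overlap), and the sample records are all hypothetical, and the overlap check compares only neighbouring intervals after sorting.

```python
from datetime import datetime
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical record type: one time interval with a start and an end."""
    start: datetime
    end: datetime


def find_problems(intervals: List[Interval]) -> List[str]:
    """Return descriptions of suspicious intervals (illustrative checks only)."""
    problems = []
    # Check each interval on its own: the end must not precede the start.
    for i, iv in enumerate(intervals):
        if iv.end < iv.start:
            problems.append(f"interval {i}: end before start")
    # Sort by start time (keeping original indices) and compare neighbours.
    ordered = sorted(enumerate(intervals), key=lambda pair: pair[1].start)
    for (i, a), (j, b) in zip(ordered, ordered[1:]):
        if a == b:
            problems.append(f"intervals {i} and {j}: duplicate")
        elif b.start < a.end:
            problems.append(f"intervals {i} and {j}: overlap")
    return problems


# Made-up sample data, chosen so that each check fires once.
records = [
    Interval(datetime(2012, 3, 1, 8, 0), datetime(2012, 3, 1, 12, 0)),
    Interval(datetime(2012, 3, 1, 11, 0), datetime(2012, 3, 1, 15, 0)),
    Interval(datetime(2012, 3, 2, 9, 0), datetime(2012, 3, 2, 7, 0)),
    Interval(datetime(2012, 3, 1, 8, 0), datetime(2012, 3, 1, 12, 0)),
]

if __name__ == "__main__":
    for problem in find_problems(records):
        print(problem)
```

Whether an overlap or a duplicate is actually an error depends on the data source; a check like this can only flag candidates for inspection.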

Keywords

dirty data, time-oriented data, data cleansing, data quality, taxonomy

Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Theresia Gschwandtner (1)
  • Johannes Gärtner (2)
  • Wolfgang Aigner (1)
  • Silvia Miksch (1)
  1. Institute of Software Technology and Interactive Systems (ISIS), Vienna University of Technology, Vienna, Austria
  2. XIMES GmbH, Vienna, Austria
