East European Conference on Advances in Databases and Information Systems

ADBIS 2015: Advances in Databases and Information Systems, pp 261-274

ForCE: Is Estimation of Data Completeness Through Time Series Forecasts Feasible?

  • Gregor Endler
  • Philipp Baumgärtel
  • Andreas M. Wahl
  • Richard Lenz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9282)

Abstract

Measuring the completeness of a data population usually requires either expert knowledge or reference data. If neither is available, measuring population completeness becomes nontrivial. We present the ForCE approach (Forecasting for Completeness Estimation), a method that estimates the completeness of timestamped data using time series forecasting. We evaluate the method's feasibility on a real-world dataset from the medical domain, which we provide for download. We compare the method to three baselines, and ForCE outperforms all three.
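The general idea of estimating completeness from a forecast can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that completeness for a period is the ratio of the observed record count to a forecast count, and it swaps in a simple seasonal-mean forecast in place of whatever forecasting model ForCE uses. The function names, the weekly counts, and the season length are all hypothetical.

```python
from statistics import mean


def seasonal_mean_forecast(history, season_length, horizon):
    """Forecast each future period as the mean of all past values
    observed at the same position within the season.

    This is a stand-in forecaster chosen for brevity, not the model
    used in the paper.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    forecasts = []
    for step in range(horizon):
        # Position of the forecast period within its season.
        position = (len(history) + step) % season_length
        forecasts.append(mean(history[position::season_length]))
    return forecasts


def estimate_completeness(observed, expected):
    """Assumed completeness measure: observed count divided by the
    forecast count, capped at 1.0."""
    if expected <= 0:
        return 1.0  # nothing was expected, so nothing can be missing
    return min(observed / expected, 1.0)


# Hypothetical record counts per period: two full seasons of four periods.
history = [100, 120, 80, 100, 110, 130, 90, 110]
expected = seasonal_mean_forecast(history, season_length=4, horizon=2)

# Hypothetical counts actually found in the database for the next two periods.
observed = [84, 125]
completeness = [estimate_completeness(o, e) for o, e in zip(observed, expected)]

print(expected)
print(completeness)
```

In this sketch, a period with fewer records than the forecast yields an estimate below 1.0, while a period that meets or exceeds the forecast is treated as complete. Capping at 1.0 is a design choice of the sketch, made so that an unusually busy period is not reported as more than complete.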

Keywords

  • Data quality
  • Population completeness
  • Time series
  • Forecasting

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gregor Endler (1)
  • Philipp Baumgärtel (1)
  • Andreas M. Wahl (1)
  • Richard Lenz (1)
  1. Computer Science 6 (Data Management), Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
