ForCE: Is Estimation of Data Completeness Through Time Series Forecasts Feasible?

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

Measuring the completeness of a data population often requires either expert knowledge or the presence of reference data. If neither is available, measuring population completeness becomes nontrivial. We present the ForCE approach (Forecasting for Completeness Estimation), a method to estimate the completeness of timestamped data using time series forecasting. We evaluate the method’s feasibility using a real-world dataset from the medical domain, which we provide for download. We compare the method to three baselines, and ForCE surpasses all three.
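
The idea of estimating completeness from a forecast can be illustrated with a minimal sketch. It is not the authors' implementation: the seasonal-mean forecast, the function names `seasonal_naive_forecast` and `estimate_completeness`, and the example counts are all assumptions chosen for illustration. The sketch forecasts how many records a period should contain and reports the observed count as a fraction of that expectation.

```python
from statistics import mean


def seasonal_naive_forecast(history, season_length, n_seasons=3):
    """Forecast the next value as the mean of the values seen at the same
    position in up to `n_seasons` previous seasons (an assumed, simple model)."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    # Walk backwards in steps of one season, starting one season before the
    # period to be forecast, so every picked value shares its seasonal phase.
    same_phase = history[-season_length::-season_length][:n_seasons]
    return mean(same_phase)


def estimate_completeness(history, observed, season_length, n_seasons=3):
    """Return observed / expected record count, capped at 1.0."""
    expected = seasonal_naive_forecast(history, season_length, n_seasons)
    if expected <= 0:
        # Nothing was expected, so nothing can be missing.
        return 1.0
    return min(1.0, observed / expected)


# Hypothetical daily record counts for two weeks (Monday to Sunday).
history = [40, 42, 38, 41, 39, 10, 5] * 2

# Only 20 records arrived on the following Monday, 40 were expected.
print(estimate_completeness(history, observed=20, season_length=7))
```

A value well below 1.0 suggests that records for the period may be missing. Any forecasting model could stand in for the seasonal mean used here.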

The original version of this chapter was revised: The authors corrected errors in the figures appearing in Sect. 3.2 and the Appendix and adjusted the text referring to the figures. An erratum to this chapter can be found at DOI: 10.1007/978-3-319-23135-8_32

Notes

  1. Enterprise resource planning system.

  2. A “benefit” is any creditable treatment, counseling, or similar action a practitioner performs.

  3. Snippet of cleaned real-world data from a medical center. To exemplify our proposition, an artificial error has been introduced at data point 25.

  4. Available for download at www6.cs.fau.de/files/completeness_data.zip.

  5. See r-project.org.

  6. If the classifier always guesses positive, all actual positives are caught.

Acknowledgements

Parts of this work are supported by the German Federal Ministry of Education and Research (BMBF), grant No. 13EX1013D.

Author information

Corresponding author

Correspondence to Gregor Endler.

Appendix

Fig. 6. Overview of all performance measures

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Endler, G., Baumgärtel, P., Wahl, A.M., Lenz, R. (2015). ForCE: Is Estimation of Data Completeness Through Time Series Forecasts Feasible? In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_18

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science (R0)
