Data Quality Issues Concerning Statistical Data Gathering Supported by Big Data Technology

  • Conference paper
Beyond Databases, Architectures, and Structures (BDAS 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 424)

Abstract

The aim of this paper is to present the data quality issues that arise when statistical data gathering is supported by Big Data technology. A case study of gathering statistical data on job offers is used to compare data quality issues under two methods of data gathering: traditional statistical surveys and Big Data technology. The case study shows that Big Data technology raises many data quality barriers, which are identified and described in the paper. An important part of the article is the list of issues that must be tackled to improve data quality in repositories populated using Big Data technology. The proposed solution can be integrated with an organization's existing systems, such as a data warehouse.

Author information

Corresponding author

Correspondence to Jacek Maślankowski.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Maślankowski, J. (2014). Data Quality Issues Concerning Statistical Data Gathering Supported by Big Data Technology. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures, and Structures. BDAS 2014. Communications in Computer and Information Science, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-06932-6_10

  • DOI: https://doi.org/10.1007/978-3-319-06932-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06931-9

  • Online ISBN: 978-3-319-06932-6

  • eBook Packages: Computer Science (R0)
