Data Quality Issues Concerning Statistical Data Gathering Supported by Big Data Technology

  • Jacek Maślankowski
Part of the Communications in Computer and Information Science book series (CCIS, volume 424)

Abstract

This paper examines the data quality issues that arise when statistical data gathering is supported by Big Data technology. A case study of gathering statistical data on job offers is used to compare data quality issues in two methods of data gathering: traditional statistical surveys and Big Data technology. The case study shows that Big Data technology faces many barriers related to data quality. These barriers are identified and described in the paper. An important part of the article is the list of issues that must be tackled to improve data quality in repositories that come from Big Data technology. The proposed solution can be integrated with an organization's existing systems, such as the data warehouse.

Keywords

Big Data, unstructured data, data quality

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Business Informatics, University of Gdańsk, Gdańsk, Poland