Abstract
The growing volume of digital data and the falling cost of storage have led companies to collect all the data they can, regardless of its adequacy and usability. As a result, data is increasingly diverse in terms of structure, quality, availability and source of origin. Dark data is one type of data whose volume grows significantly as the overall volume of data expands. The scientific literature does not define the term “dark data” precisely, and its interpretation among researchers is ambiguous. The aim of this article is to define the dark data occurring in an enterprise by identifying its essential features. The article presents an overview of definitions of the term, a proposed interpretation, and a classification of company data with regard to usability, availability and quality. The concept of dark data was analysed through a review of international journals and of articles published on the Internet by data science practitioners. The research identified four universal features of dark datasets: unavailability, unawareness, uselessness, and costliness. Based on data availability and quality, four groups of enterprise data were also distinguished. The resulting data classification made it possible to systematize the term “dark data”.
Notes
1. The Digital 2020 report is a global overview of Internet users, mobile devices, social networks and e-commerce, prepared by Hootsuite and We Are Social. The statistics, published quarterly, cover global and national data (https://wearesocial.com/digital-2020. Accessed 20 Aug 2020).
2. IDC report (https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. Accessed 10 Sept 2020).
3. These principles apply to the national statistical authorities as well as to the EU statistical authority (Eurostat) and constitute a set of features characterizing data quality in official statistics (https://ec.europa.eu/eurostat/web/products-catalogues/-/KS-02-18-142. Accessed 14 Sept 2020).
4. A more extensive explanation of good-quality data can be found in the European Statistics Code of Practice (2017).
References
Abiteboul S (1997) Querying semi-structured data. Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 1186, pp 1–18. https://doi.org/10.1007/3-540-62222-5_33
Banafa A (2015) Understanding dark data. https://www.bbvaopenmind.com/en/technology/digital-world/understanding-dark-data/. Accessed 18 Oct 2020
Chamberlin D, Boyce R (1974) SEQUEL: a structured English query language. In: Proceedings of the 1974 ACM SIGFIDET workshop on data description, access and control, pp 249–264. https://doi.org/10.1145/800296.811515
Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 26(1):64–69. https://doi.org/10.1145/357980.358007
DeepDive. http://deepdive.stanford.edu/. Accessed 20 Feb 2020
Eberendu AC (2016) Unstructured data: an overview of the data of big data. Int J Comput Trends Technol 38(1):46–50. https://doi.org/10.14445/22312803/ijctt-v38p109
European Statistics Code of Practice (2017) https://ec.europa.eu/eurostat/web/products-catalogues/-/KS-02-18-142. Accessed 14 Sept 2020
Grim DJ (2019) The dark data quandary. Am Univ Law Rev 68(76):761–822
Hand DJ (2020) Dark data: why what you don’t know matters. Princeton University Press, Princeton
Heidorn BP (2008) Shedding light on the dark data in the long tail of science. Libr Trends 57(5):280–299
IDC report (2018) https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf. Accessed 10 Sept 2020
Kordos J (1988) Jakość danych statystycznych. PWE, Warszawa
Lugmayr A et al (2017) Cognitive big data: survey and review on big data research and its implications. What is really “new” in big data? J Knowl Manage 21(1):197–212. https://doi.org/10.1108/jkm-07-2016-0307
Maślankowski J (2015) Analiza jakości danych pozyskiwanych ze stron internetowych z wykorzystaniem rozwiązań Big Data. Roczniki Kolegium Analiz Ekonomicznych 38:167–177
Migdał-Najman K, Najman K (2018) Dirty data—profiling, cleansing and prevention. Prace Naukowe Uniwersytetu Ekonomicznego We Wrocławiu. https://doi.org/10.15611/pn.2018.508.15
Taleb I et al (2016) Big data quality: a quality dimensions evaluation. Intl IEEE. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.145
Trajanov D et al (2018) Dark data in internet of things (IoT): challenges and opportunities. In: Proceedings of the 7th small systems simulation symposium, February, pp 1–8
Trivedi B (2017) Research on dark data analysis to reduce data complexity in big data. Int Educ Res J 3(5):361–362
Wang RY (1996) Beyond accuracy: what data quality means to data consumers. J Manage Inf Syst 12(4):5–34. https://doi.org/10.1080/07421222.1996.11518099
We Are Social, Hootsuite (2020) Global digital report 2020. https://wearesocial.com/digital-2020. Accessed 3 Jan 2020
Zhang C, Shin J, Ré C, Cafarella M, Niu F (2016) Extracting databases from dark data with DeepDive. In: SIGMOD ’16 Proceedings of the 2016 International Conference on Management of Data, pp 847–859. https://doi.org/10.1145/2882903.2904442
Zhu Y, Cai L (2015) The challenges of data quality and data quality assessment in the big data era. Data Sci J 14(2):1–10. https://doi.org/10.5334/dsj-2015-002
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Raca, K. (2021). Enterprise Dark Data. In: Jajuga, K., Najman, K., Walesiak, M. (eds) Data Analysis and Classification. SKAD 2020. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-75190-6_8
DOI: https://doi.org/10.1007/978-3-030-75190-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75189-0
Online ISBN: 978-3-030-75190-6
eBook Packages: Mathematics and Statistics (R0)