Data Quality Evaluation in Document Oriented Data Stores

  • Emilio Cristalli (Email author)
  • Flavia Serra
  • Adriana Marotta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11158)


Data quality management in document oriented data stores has not yet been deeply explored, and it presents many challenges that arise from the lack of a rigid schema associated with the data. Data quality is a critical aspect of this kind of data store, since controlling it is not possible, and is not a priority, at the data storage stage. In addition, evaluating and improving data quality are very difficult tasks because the data are schema-less. This paper presents a first step towards data quality management in document oriented data stores. To address the problem, it proposes a strategy for defining data granularities for data quality evaluation and analyses some data quality dimensions relevant to document stores.
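To make the idea of evaluating one quality dimension at several granularities concrete, the following minimal sketch measures completeness over a small schema-less collection. It is an illustration only and not the strategy proposed in the paper: the choice of granularities (field, document, collection), the completeness formula, the sample documents, and all function names are assumptions introduced here.

```python
# Illustrative sketch only: granularities, formula and names are assumptions,
# not the method defined in the paper.

# Hypothetical sample collection; documents need not share the same fields.
collection = [
    {"name": "Ana", "email": "ana@example.org", "age": 31},
    {"name": "Luis", "email": None},
    {"name": "Eva", "age": 27, "city": "Montevideo"},
]


def field_completeness(docs, field):
    """Fraction of documents in which `field` is present and non-null."""
    if not docs:
        return 0.0
    present = sum(1 for d in docs if d.get(field) is not None)
    return present / len(docs)


def document_completeness(doc, expected_fields):
    """Fraction of the expected fields that are present and non-null in one document."""
    if not expected_fields:
        return 1.0
    filled = sum(1 for f in expected_fields if doc.get(f) is not None)
    return filled / len(expected_fields)


def collection_completeness(docs, expected_fields):
    """Mean of the document-level completeness over the whole collection."""
    if not docs:
        return 0.0
    total = sum(document_completeness(d, expected_fields) for d in docs)
    return total / len(docs)


# The expected fields are supplied by the evaluator, since the store enforces no schema.
expected = ["name", "email", "age"]

print(field_completeness(collection, "email"))
print(document_completeness(collection[1], expected))
print(collection_completeness(collection, expected))
```

Because the store enforces no schema, the list of expected fields has to come from the evaluator rather than from the data store; this is one reason why the granularity at which a measurement is taken needs to be stated explicitly.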


Keywords: Document store · Data quality · Schema-less · Data quality dimensions · Data granularities



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Emilio Cristalli (1), Email author
  • Flavia Serra (1)
  • Adriana Marotta (1)
  1. Universidad de la República, Montevideo, Uruguay
