A model-driven framework for data quality management in the Internet of Things

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The Internet of Things (IoT) is a data stream environment in which large-scale deployments of smart things continuously report readings. Pervasive applications, i.e. data consumers, then consume these data streams to offer ubiquitous services. Data quality (DQ) is a key criterion for IoT data consumers, especially given the inherent uncertainty of sensor-enabled data. However, DQ is a highly subjective concept, and there is no standard agreement on what constitutes “good” data. Moreover, the combinations of measured attributes and associated DQ information are as diverse as the needs of data consumers. This imposes costly overhead on developers tasked with building DQ-aware IoT software systems that manage their own DQ information. To handle these varied perceptions of DQ, we propose an approach based on Model-Driven Architecture (MDA) that lets each developer express the data consumer’s vision of DQ and its requirements through models and other provided resources, using a graphical model editor. The resulting DQ specifications are then automatically transformed to generate a complete DQ management infrastructure that fits the data consumer’s requirements. We demonstrate the flexibility and efficiency of our approach by generating two DQ management infrastructures built on top of different platforms and testing them in a real-life data stream scenario.
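As a rough illustration of the idea of transforming a DQ specification into infrastructure, the sketch below models a data consumer's requirements as plain Python objects and emits a table definition from them. It is not the authors' tooling or metamodel: the class names (`DQDimension`, `MeasuredAttribute`, `DQSpecification`), the `generate_ddl` function, the column naming scheme, and the example stream are all hypothetical, chosen only to show what a model-to-text step might look like.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DQDimension:
    """A DQ dimension attached to a measured attribute (hypothetical model element)."""
    name: str              # e.g. "accuracy" or "completeness"
    sql_type: str = "FLOAT"


@dataclass
class MeasuredAttribute:
    """A sensor reading plus the DQ dimensions the data consumer wants tracked."""
    name: str
    sql_type: str = "FLOAT"
    dimensions: List[DQDimension] = field(default_factory=list)


@dataclass
class DQSpecification:
    """The data consumer's DQ requirements, playing the role of the input model."""
    stream_name: str
    attributes: List[MeasuredAttribute] = field(default_factory=list)


def generate_ddl(spec: DQSpecification) -> str:
    """Model-to-text step: one column per attribute and one per DQ dimension."""
    columns = ["ts TIMESTAMP NOT NULL"]
    for attr in spec.attributes:
        columns.append(f"{attr.name} {attr.sql_type}")
        for dim in attr.dimensions:
            # Assumed naming convention: <attribute>_<dimension>
            columns.append(f"{attr.name}_{dim.name} {dim.sql_type}")
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {spec.stream_name} (\n  {body}\n);"


# Hypothetical specification: two measured attributes with different DQ needs.
spec = DQSpecification(
    stream_name="lab_readings",
    attributes=[
        MeasuredAttribute(
            "temperature",
            dimensions=[DQDimension("accuracy"), DQDimension("completeness")],
        ),
        MeasuredAttribute("humidity", dimensions=[DQDimension("accuracy")]),
    ],
)
ddl = generate_ddl(spec)
print(ddl)
```

Changing the specification (for example, adding a dimension to `humidity`) changes the generated schema without touching the generator, which is the kind of separation between requirements and infrastructure that the abstract describes.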

Notes

  1. http://www.omg.org/mda/.

  2. http://www.uml.org/.

  3. http://www.omg.org/mof/.

  4. http://www.omg.org/spec/CWM/.

  5. http://www.oracle.com/technetwork/java/javase/overview/index.html.

  6. http://www.oracle.com/technetwork/database/index.html.

  7. https://docs.oracle.com/cd/E28280_01/doc.1111/e14476/overview.htm#CEPGS106.

  8. http://www.eclipse.org/ide/.

  9. https://eclipse.org/modeling/emf/.

  10. https://eclipse.org/atl/.

  11. https://eclipse.org/acceleo/.

  12. https://docs.oracle.com/cd/E16764_01/doc.1111/e12048/intro.htm.

  13. http://www.liquibase.org/.

  14. http://www.oracle.com/technetwork/developer-tools/jdev/overview/index.html.

  15. https://docs.oracle.com/middleware/12211/osa/using-streamanalytics/toc.htm.


Acknowledgements

The work of A. Karkouch leading to these results has received funding from the Moroccan National Center for Scientific and Technical Research under Grant No. 18UCA2015.

Author information

Correspondence to Aimad Karkouch.

About this article

Cite this article

Karkouch, A., Mousannif, H., Al Moatassime, H. et al. A model-driven framework for data quality management in the Internet of Things. J Ambient Intell Human Comput 9, 977–998 (2018). https://doi.org/10.1007/s12652-017-0498-0

  • DOI: https://doi.org/10.1007/s12652-017-0498-0
