Adoption of human metabolic processes as Data Quality Based Models

The Journal of Supercomputing

Abstract

The accumulation of huge volumes of data within business intelligence is essential because such data encompasses the complete conceptual and technological stack in addition to raw and processed data, data management, and analytics. Evaluating the Data Quality Model Based-In-Use has gained ground because business value can only be estimated in the context in which the data is used. Despite the numerous data quality models used for regular data quality assessment, none has been adapted to big data. For this reason, we propose four efficiencies and four metabolism processes as data quality indicators usable in big data research. The model appropriately captured the quality-in-use levels of the input data for big data analytics, and the adequacy of these Data Quality Model Based-In-Use levels can be read as an indicator of the dependability and adequacy of a big data investigation. In addition, we demonstrate practical examples along with a proposed method, the stacked recurrent neural network, for data quality assessment. Because the model is independent of any preconditions or technologies, it can be integrated into various big data research projects.


Acknowledgements

This paper was partially funded by the National Key R&D Program of China under Grant No. 2018YFB1004700 and by NSFC Grant Nos. U1866602, 61602129, and 61772157.

Author information

Corresponding author

Correspondence to Hongzhi Wang.

About this article

Cite this article

Ngueilbaye, A., Wang, H., Khan, M. et al. Adoption of human metabolic processes as Data Quality Based Models. J Supercomput 77, 1779–1817 (2021). https://doi.org/10.1007/s11227-020-03300-3

  • DOI: https://doi.org/10.1007/s11227-020-03300-3
