Abstract
Managing data related to natural sciences poses new and challenging problems: reality cannot be represented on a one-to-one scale, so imprecision has to be taken into account both in data storage and in data processing. Machine learning has been a key enabler of information extraction from natural sciences data. However, data-driven results are strongly affected by the volume, the sparsity, and the different types of imprecision of the available sources. It therefore becomes pivotal to associate quality information with both data and data-driven services, so that results can be interpreted effectively. Different levels of granularity and multiple data modalities captured from the same processes may coexist, due to technological constraints or other intrinsic limiting factors. In addition, different levels of granularity may also result from application requirements, so that outcomes at multiple levels of precision need to be provided. The paper discusses affinities among quality issues in domains such as chemistry, biology, and geoinformatics.
Keywords
- Data quality
- Quality of service
- Machine learning
Acknowledgements
This work was funded by the European Commission H2020 project Crowd4SDG “Citizen Science for Monitoring Climate Impacts and Achieving Climate Resilience” under project No. 872944. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Pernici, B., Ratti, F., Scalia, G. (2021). About the Quality of Data and Services in Natural Sciences. In: Aiello, M., Bouguettaya, A., Tamburri, D.A., van den Heuvel, WJ. (eds) Next-Gen Digital Services. A Retrospective and Roadmap for Service Computing of the Future. Lecture Notes in Computer Science, vol 12521. Springer, Cham. https://doi.org/10.1007/978-3-030-73203-5_18
DOI: https://doi.org/10.1007/978-3-030-73203-5_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73202-8
Online ISBN: 978-3-030-73203-5
eBook Packages: Computer Science, Computer Science (R0)