
About the Quality of Data and Services in Natural Sciences

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12521)

Abstract

Managing data in the natural sciences poses new and challenging problems: reality cannot be represented on a one-to-one scale, so imprecision has to be taken into account both in data storage and in data processing. Machine learning has been a key enabler for information extraction from natural sciences data. However, data-driven results are strongly affected by the volume, the sparsity, and the different types of imprecision of the available sources. It therefore becomes pivotal to associate quality information with both data and data-driven services, so that results can be interpreted effectively. Different levels of granularity and multiple data modalities captured from the same processes can coexist, due to technological constraints or other intrinsic limiting factors. In addition, different levels of granularity may also result from application requirements, and outcomes at multiple levels of precision need to be provided. The paper discusses affinities among quality issues in domains such as chemistry, biology, and geoinformatics.

Keywords

  • Data quality
  • Quality of service
  • Machine learning

Notes

  1. http://www.crowd4sdg.eu/

Acknowledgements

This work was funded by the European Commission H2020 project Crowd4SDG “Citizen Science for Monitoring Climate Impacts and Achieving Climate Resilience” under project No. 872944. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work.

Author information

Corresponding author

Correspondence to Barbara Pernici.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Pernici, B., Ratti, F., Scalia, G. (2021). About the Quality of Data and Services in Natural Sciences. In: Aiello, M., Bouguettaya, A., Tamburri, D.A., van den Heuvel, WJ. (eds) Next-Gen Digital Services. A Retrospective and Roadmap for Service Computing of the Future. Lecture Notes in Computer Science, vol 12521. Springer, Cham. https://doi.org/10.1007/978-3-030-73203-5_18

  • DOI: https://doi.org/10.1007/978-3-030-73203-5_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73202-8

  • Online ISBN: 978-3-030-73203-5

  • eBook Packages: Computer Science, Computer Science (R0)