Theta Architecture: Preserving the Quality of Analytics in Data-Driven Systems

  • Vasileios Theodorou
  • Ilias Gerostathopoulos
  • Sasan Amini
  • Riccardo Scandariato
  • Christian Prehofer
  • Miroslaw Staron
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 767)

Abstract

Recent advances in Big Data storage and processing have opened real potential for data-driven software systems, i.e., systems that analyze large amounts of data to inform their runtime decisions. For these decisions to be trustworthy and dependable, however, such systems must cope with well-known challenges in the data analysis domain: data scarcity, low quality of the data available for analysis, low veracity of the data and of the resulting analysis, and data privacy constraints that hinder the analysis. A promising solution is to introduce flexibility into the data analytics part of the system, so that algorithms and data streams can be optimized at runtime based on the combination of veracity, privacy, and scarcity, thereby preserving the target level of quality of the data-driven decisions. In this paper, we investigate this solution by providing an adaptive reference architecture and illustrating its applicability with an example from the traffic management domain.
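
The runtime optimization described in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the `Candidate` type, the scoring rule in `expected_quality`, the numeric scores, and the traffic-flavored stream names are all hypothetical, chosen only to show how veracity, privacy, and scarcity might jointly drive the choice of a data stream against a target quality level.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One hypothetical pairing of an analytics algorithm with a data stream."""
    name: str
    veracity: float   # assumed score in [0, 1]; higher means more trustworthy data
    coverage: float   # assumed score in [0, 1]; low coverage models data scarcity
    privacy_ok: bool  # whether the stream satisfies the privacy constraints


def expected_quality(c: Candidate) -> float:
    # Illustrative scoring rule only: quality is limited by the weaker
    # of veracity and coverage.
    return min(c.veracity, c.coverage)


def select(candidates: Sequence[Candidate], target: float) -> Optional[Candidate]:
    """Return the best privacy-compliant candidate meeting the target, else None."""
    allowed = [c for c in candidates if c.privacy_ok]
    meeting = [c for c in allowed if expected_quality(c) >= target]
    return max(meeting, key=expected_quality, default=None)


# Hypothetical candidates for a traffic management setting.
candidates = [
    Candidate("loop-detectors", veracity=0.9, coverage=0.4, privacy_ok=True),
    Candidate("floating-car-data", veracity=0.7, coverage=0.8, privacy_ok=True),
    Candidate("raw-gps-traces", veracity=0.95, coverage=0.9, privacy_ok=False),
]
chosen = select(candidates, target=0.6)
```

Under these assumed scores, the privacy constraint excludes the highest-scoring stream, and scarcity rules out the most trustworthy remaining one, so the selection falls to the stream that balances both. When no candidate meets the target, `select` returns `None`, which a real system would have to treat as a signal that the target quality cannot currently be preserved.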

Keywords

Big Data, Reference architecture, Data veracity

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Vasileios Theodorou (1)
  • Ilias Gerostathopoulos (2)
  • Sasan Amini (2)
  • Riccardo Scandariato (3)
  • Christian Prehofer (4)
  • Miroslaw Staron (3)

  1. Intracom SA Telecom Solutions, Athens, Greece
  2. Technische Universität München, Munich, Germany
  3. University of Gothenburg, Gothenburg, Sweden
  4. Fortiss GmbH, Munich, Germany
