Big Data and software engineering: prospects for mutual enrichment


Software engineering (SE) has evolved over the last 50 years, initially as a response to the so-called software crisis of the 1960s and 1970s (the problems that organizations had producing quality software systems on time and on budget). SE has been defined as “the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software”. SE has developed a number of approaches to areas such as software requirements, software design, software testing, and software maintenance. Software development processes such as the waterfall model, incremental development, and the spiral model have been successfully applied to produce high-quality software on time and under budget. More recently, agile software development has gained popularity as an alternative to the more traditional methods for developing complex systems. Within the last decade or so, advances in technologies such as mobile computing, social networks, cloud computing, and the Internet of Things have given rise to massive datasets known as Big Data (BD). Big Data has been defined as data with the 3Vs: high volume, velocity, and variety. BD datasets are so large that even low-probability events are captured in them. These events can be discovered using analytics methods and turned into actionable intelligence that businesses can use to gain a competitive advantage. Unfortunately, the very scale of BD often renders inadequate the SQL-based relational database systems that have formed the backbone of data-intensive systems for the last 30 years, so new NoSQL technologies are required to handle it effectively. In this paper, we explore how well-established SE technology can be adapted to support successful development of BD projects, as well as how BD techniques can be used to increase the utility of SE processes and techniques. Thus, BD and SE may mutually support and enrich each other.



Author information

Corresponding author

Correspondence to Timothy Arndt.

About this article

Cite this article

Arndt, T. Big Data and software engineering: prospects for mutual enrichment. Iran J Comput Sci 1, 3–10 (2018).

Keywords

  • Big Data
  • Software engineering
  • Software analytics
  • Data mining