An Empirical Study on the Use and Misuse of Java 8 Streams

Open Access
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12076)

Abstract

Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether to run these operations sequentially or in parallel, and on bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. We employed various automated methodologies, including interprocedural static analysis, as well as manual ones. The results indicate that streams are pervasive, parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams and benefits practitioners, programming language and API designers, tool developers, and educators alike.
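
The sequential-versus-parallel choice and the non-execution bug class mentioned above can be illustrated with a minimal sketch. It is not taken from the studied projects; the class name StreamSketch and its method names are hypothetical, and only standard java.util.stream calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration class; not code from the study's subjects.
class StreamSketch {

    // Sequential pipeline: the map and the reduction run on the calling thread.
    public static int sumOfSquaresSequential(List<Integer> xs) {
        return xs.stream().mapToInt(x -> x * x).sum();
    }

    // Same pipeline, explicitly opted in to parallel execution.
    // The result is identical because integer addition is associative
    // and the lambda has no side effects.
    public static int sumOfSquaresParallel(List<Integer> xs) {
        return xs.parallelStream().mapToInt(x -> x * x).sum();
    }

    // Non-execution: intermediate operations are lazy, so without a
    // terminal operation the lambda below never runs.
    public static int sideEffectsWithoutTerminalOp(List<Integer> xs) {
        List<Integer> seen = new ArrayList<>();
        xs.stream().map(x -> { seen.add(x); return x; }); // no terminal operation
        return seen.size(); // stays 0
    }

    // Collecting through collect(...) avoids mutating shared state from
    // multiple threads, a common source of non-determinism in parallel streams.
    public static List<Integer> squaresParallel(List<Integer> xs) {
        return xs.parallelStream().map(x -> x * x).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs =
            IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(sumOfSquaresSequential(xs));       // 338350
        System.out.println(sumOfSquaresParallel(xs));         // 338350
        System.out.println(sideEffectsWithoutTerminalOp(xs)); // 0
        System.out.println(squaresParallel(xs).size());       // 100
    }
}
```

Whether the parallel variant is actually faster depends on data size, the cost of the per-element work, and contention for the shared thread pool, which is consistent with the abstract's observation that performance is a crosscutting concern.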

Keywords

empirical studies, functional programming, Java 8, streams, multi-paradigm programming, static analysis

Copyright information

© The Author(s) 2020

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. CUNY Hunter College, New York, USA
  2. CUNY Graduate Center, New York, USA
  3. Oakland University, Rochester, USA
  4. Columbia University, New York, USA
