Java Lambdas and the Stream API

Sequential vs Parallel Streams

This segment introduces the concept of Parallel Streams and how they differ from Sequential Streams. We also discuss the factors involved in when to use, and when not to use Parallel Streams.

Keywords

  • Sequential
  • Parallel
  • Streams
  • Threading
  • Hardware
  • Cores
  • forEach
  • Reduce
  • N*Q Model

About this video

Author(s)
Jim McLaughlin
First online
16 November 2019
DOI
https://doi.org/10.1007/978-1-4842-5594-0_9
Online ISBN
978-1-4842-5594-0
Publisher
Apress
Copyright information
© Jim McLaughlin 2019

Video Transcript

In this video, we’ll look at parallel streams and how they differ from the sequential streams we’ve used so far. Conceptually, you can think of a sequential stream as a pipeline in which the items in the stream are sent one at a time to the intermediate and terminal operations. Now with parallel streams, Java can use the multicore hardware architecture of the host machine to create a stream with multiple pipelines running in parallel. So what does that look like?

So far in this course, we’ve discussed how a stream acquires items from a data source and passes them into the intermediate and terminal operations. It does this one element at a time. Now with parallel streams, items may be processed in parallel. It does this by using multiple threads, which can each take care of moving the items through the operations.

Remember when we discussed the forEach terminal operation and how it’s explicitly non-deterministic? Well, as you can see, parallel streams make the order in which items are processed unpredictable, meaning the order in which they reach the forEach method can change from execution to execution.
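The ordering point can be tried out with a short sketch. The class and method names here are made up for illustration and are not from the course. The sketch also uses forEachOrdered, a Stream method the video does not mention, purely as a contrast: it keeps the encounter order even on a parallel stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the course material.
class ForEachOrderSketch {

    // Records the order in which forEach hands over elements of a parallel stream.
    // A synchronized list is used because several threads may call add at once.
    static List<Integer> visitWithForEach(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return seen;
    }

    // Same idea, but forEachOrdered respects the encounter order of the source.
    static List<Integer> visitWithForEachOrdered(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 20)
                .boxed()
                .collect(Collectors.toList());
        // The first line may differ between runs; the second stays 1..20.
        System.out.println("forEach:        " + visitWithForEach(numbers));
        System.out.println("forEachOrdered: " + visitWithForEachOrdered(numbers));
    }
}
```

Running main a few times should show the forEach line changing while the forEachOrdered line stays the same, although on a single-core machine both may happen to look ordered.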

This is also a good time to revisit another terminal operation we have discussed, and that is the reduce method. Remember how our cricket example used the reduce method to accumulate the cricket team’s win-loss record? The reduce method we used took three parameters: the identity, the accumulator, and the combiner.

With a parallel stream, the purpose of the combiner now makes more sense. If we have multiple threads pushing items through the reduce method, each thread starts from its own identity object and builds up its own partial result using the accumulator lambda.

When all items have gone through the reduce method across all threads, the combiner lambda takes over. It knows how to reduce the individual results from the multiple threads down to a single object by combining them two at a time, until we have one object left. So how do we create parallel streams?
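As a rough illustration of how the identity, accumulator, and combiner fit together, here is a minimal sketch. The Tally class and the "W"/"L" result strings are assumptions made for this example; they are not the classes used in the course's cricket example. The tally is kept immutable so that each accumulator or combiner call returns a new object instead of changing a shared one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the course material.
class ReduceCombinerSketch {

    // Hypothetical immutable win-loss tally.
    static final class Tally {
        final int wins;
        final int losses;

        Tally(int wins, int losses) {
            this.wins = wins;
            this.losses = losses;
        }

        // Accumulator step: fold one match result into a partial tally.
        Tally add(String result) {
            return "W".equals(result)
                    ? new Tally(wins + 1, losses)
                    : new Tally(wins, losses + 1);
        }

        // Combiner step: merge two partial tallies built by different threads.
        Tally merge(Tally other) {
            return new Tally(wins + other.wins, losses + other.losses);
        }
    }

    static Tally tally(List<String> results) {
        return results.parallelStream()
                .reduce(new Tally(0, 0), // identity: an empty record
                        Tally::add,      // accumulator: (partial tally, result) -> tally
                        Tally::merge);   // combiner: (tally, tally) -> tally
    }

    public static void main(String[] args) {
        Tally t = tally(Arrays.asList("W", "L", "W", "W", "L", "W"));
        System.out.println(t.wins + "-" + t.losses); // prints 4-2
    }
}
```

Because the combiner only adds two partial tallies together, the final record is the same no matter how the work was divided among threads.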

Well, let’s start with a little review. With sequential streams, we learned that we can create a stream from a collection by using the Collection API’s stream method. We also learned that there are some static helper methods in the Arrays and Stream classes which we can use to create streams.

By comparison, to create a parallel stream from a collection, the Collection API has a special method named parallelStream to build the multi-threaded version. But interestingly, the Arrays and Stream classes don’t have equivalent static helper methods that directly create a parallel stream. Instead, there is an intermediate operation called parallel that converts an existing sequential stream into a multi-threaded parallel stream.

Because it’s an intermediate operation defined in the stream interface, we can turn any stream into a parallel stream by including it in our method chain. Here, I create a stream using the Arrays.stream static helper method and then immediately convert it into a multi-threaded parallel stream before performing the rest of my operations.
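The two ways of creating a parallel stream described above can be sketched as follows. The class name, method names, and sample data are invented for illustration. The on-screen code from the video is not reproduced in this transcript, so this is not necessarily what it showed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the course material.
class CreateParallelStreamSketch {

    // From a collection: parallelStream() builds the parallel version directly.
    static long countLongNames(List<String> names) {
        return names.parallelStream()
                .filter(name -> name.length() > 4)
                .count();
    }

    // From an array: create a sequential stream with Arrays.stream, then
    // switch it to parallel with the parallel() intermediate operation.
    static int sumOfSquares(int[] values) {
        return Arrays.stream(values)
                .parallel()
                .map(v -> v * v)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(countLongNames(Arrays.asList("Ann", "Robert", "Priya", "Li"))); // prints 2
        System.out.println(sumOfSquares(new int[] {1, 2, 3, 4}));                          // prints 30
        // isParallel() reports which mode a stream is in.
        System.out.println(Arrays.stream(new int[] {1, 2, 3}).parallel().isParallel());   // prints true
    }
}
```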

So this brings us to the obvious question, when should we use parallel streams over sequential streams? Well, that’s actually a complicated question, and the answer is, it depends. First, it’s important to recognize that running in parallel is no magic solution. It doesn’t always improve performance.

Because parallel streams use Java’s ForkJoinPool threading framework, we have to take into consideration all the overhead of managing the parallel operations using threads. This leads us to one of the bigger factors in considering parallel streams, and that is the number of items to be processed.

Parallel streams gain an advantage when the number of items being streamed is large and the CPU cost for processing each item is also significant. This is what Brian Goetz, the lead Java language architect on the Lambda Project at Oracle, calls the N*Q model, where N is the number of items to be processed and Q is the cost of processing each item.
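One informal way to get a feel for the N*Q idea is to run the same CPU-heavy pipeline in both modes and compare. The sketch below is an assumption-laden illustration, not a rigorous benchmark: the prime-counting workload and the class name are made up, and the timings depend on JIT warm-up, the machine, and whatever else is running. A proper measurement would use a benchmarking harness.

```java
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the course material.
class NqModelSketch {

    // Trial division keeps the per-item cost (Q) noticeably above zero.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts primes in 2..upTo, sequentially or in parallel.
    static long countPrimes(int upTo, boolean parallel) {
        IntStream candidates = IntStream.rangeClosed(2, upTo);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(NqModelSketch::isPrime).count();
    }

    public static void main(String[] args) {
        int n = 5_000_000; // N: try much smaller values to watch the advantage shrink
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long primes = countPrimes(n, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "parallel:   " : "sequential: ")
                    + primes + " primes in " + millis + " ms");
        }
    }
}
```

Both modes must report the same count; only the elapsed time should differ. With a small N or a trivial per-item operation, the parallel run may well be slower.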

Another important factor to consider is the data source itself. Because a parallel stream wants to split the work of processing a data source, sources which can easily be split up, like an ArrayList or an array, are good candidates. On the other hand, a LinkedList or an IO-based source, like a file, is harder to split, so the parallel stream incurs more overhead dividing up the work.

And finally, knowing the target hardware capabilities may help us determine if parallel streams are worth it. The more cores that are available, the more efficient the ForkJoinPool framework will be. My advice is to use parallel streams cautiously. Start with sequential streams. And if you run into performance issues, do your research and see if parallel streams will improve things, knowing that there may be several factors involved.
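As a starting point for that research, the JVM can report how many cores it sees and how many worker threads the common ForkJoinPool is configured for. Parallel streams use that common pool by default. The class and method names in this sketch are invented for illustration.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical demo class, not part of the course material.
class HardwareCheckSketch {

    // Number of processors the JVM believes are available to it.
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available cores:         " + availableCores());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

If the reported core count is very low, for example inside a small container, a parallel stream has little room to help.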

Well, I hope you’ve enjoyed this course on Java lambdas and the stream API. And I wish you the best of luck in all your Java projects.