Speed Recommendation for Keeping and Manipulating Objects

It is often the case at Wanari that we need to convert objects of one kind to become objects of another kind. This happens because we use different models in the different levels of an application. The question obviously came up: how should we do this? Does it matter what kind of a collection we use? Is the Java 8 stream more productive? Is parallelisation worth the effort?

 

So what are we going to do exactly?

We will have a bunch of standalone numbers in our short examples. The order of the items doesn’t matter. We will eventually make these strings and we will measure how long it all takes for us.

How and with which methods is it fastest to read?

Java offers a variety of options for storing our data. Every one is a bit differently good: one reads faster than the others, one writes faster, one takes up less memory, and so on. There are quite a few comparisons online, but most of them are pretty superficial. These usually don’t go into detail about calling them, they usually just mention a few operations.

For now, I only want to read, so I’ve only collected the easier collections. Here is their reading ability:

get next
ArrayList O(1) O(1)
LinkedList O(n) O(1)
HashSet O(1)
LinkedHashSet O(1)

None of them is thread-safe, however, since we only want to read it’s not going to be a problem at the moment. There is no point in examining the direct access get operation with LinkedList; it would be for sure bad because of the scale difference.

 

We will be reading from these lists in 3+1 ways to figure out which is best with which method:

 

[FOREACH]

Traditional for-each that we can nicely write in Java 8 with a method reference.

Old:

Java 8:

[STREAM-FOREACH]

We first make it a stream here then we foreach right through it. (However this is similar, it is not the same forEach we used above!!!)

[PARALLEL-FOREACH]

Here we multi-thread the stream and then use forEach. (Data might not be read in the same order than they were originally there, but element-order doesn’t concern us!)

+1 [FOR]

To make it interesting, we shall look at ArrayList with direct access as well:

 

I did the measurements quite a few times, then I took the lowest of the values. All that because I wanted to lessen the ratio of the degraded results coming in because of external forces. Still I encountered some similar (see below) outcomes. I then grouped the test data by their size and created graphs. And you can click on each graph to see it enlarged.

 

< 10 000

manipulating objects with java 8: reading a few items

It doesn’t really matter what we use with only a few items, and outside forces hugely influence the outcomes. Differences start to appear around 4-5000 items.

With even more items, you can actually see the differences between our groupings:

Lists with 10 000 – 100 000 items:

manipulating objects with java 8: reading more items

Lists with 100 000 – 1 000 000 items:

manipulating objects with java 8: reading even more items

Lists with 1 000 000 – 10 000 000 items:

manipulating objects with java 8: reading even more items 2

1 000 000 – 10 000 000  TOP

manipulating objects with java 8: reading a lot of items

(At 7 000 000 items there was an anomaly that influenced a few tests. Strangely, it improved the results… perhaps we should repeat it!)

After all these measurements, here is what we should conclude:

With a large number of items, LinkedList is better with parallel streams than with anything else.

Except for ArrayList, traditional forEaches are a lot slower.

To me the most eye-catching part is that even though the first two were running on one thread, there is no notable difference in performance between forEach and the paralleled stream with ArrayList.

When reading, paralellisation is only clearly better with a large (million and above) number of items. Moreover, with a smaller sized ArrayList, a simple forEach is more productive.

 

How can we write fastest and into which list?

With writing we have to make sure that in a multithread environment, only one thread can use the object. Most of the collections are not threadsafe and we can only make it so with a wrapper. No need to worry about it with a vector though, as they are threadsafe by design.

add thread-safe
ArrayList O(1) no
LinkedList O(1) no
HashSet O(1) no
LinkedHashSet O(1) no
Vector O(1) yes

In each case ArrayList will be the source.

[SIMPLE]

Simple writing, on one thread. (This is what is usually measured and shown elsewhere.)

[PARALLEL]

We will be writing the collection (we made it threadsafe beforehand) in parallel.

< 10 000

manipulating objects with java 8: writing items

With a small number of items, outside forces are much more apparent here as well, though it also seems to be easier to choose what is beneficial to use.

Lists with 10 000 – 100 000 items:

manipulating objects with java 8: writing a few items

Lists with 100 000 – 1 000 000 items:

manipulating objects with java 8: writing more items

Lists with 1 000 000 – 10 000 000 items:

manipulating objects with java 8: writing even more items

(At 10 000 000 items, testing LinkedList on one thread, some kind of noise appears again.)

It is obvious that being threadsafe really slows collections down. They should only be used when necessary.

Single threaded ArrayLists and LinkedLists are the fastest, there is barely a difference between them.

 

The transformation

After the measurements we now know that we would like to read and write from and to ArrayList and/or LinkedList. The question now is the method.

 

[SIMPLE]

The simplest, most traditional approach with nothing out of the ordinary:

[COLLECTOR]

The paralellisation we checked at the writing tests we won’t try now as we know it’d be very slow. But there is a delicate solution in which we can read on multithreads, but the collection won’t have to care for it being multithreaded. This method creates a separate collection for every thread (used only by that thread) and then it collects the results at the end.

We need to create the collectors:

First we need to define how a collection should be added to a thread and then how two separate ones can be brought together.

ArrayList collector LinkedList collector

< 10 000

manipulating objects with java 8: transforming items

Outside forces still have a huge effect when testing on a few items, but it seems as though the collectors are doing slightly better than other solutions.

Steady slowing is apparent with larger amounts of data, except for the LinkedList to LinkedList collector.

Lists with 10 000 – 100 000 items:

manipulating objects with java 8: transforming more items

Lists with 100 000 – 1 000 000 items:

manipulating objects with java 8: transforming even more items

(While testing the linkedlist-arraylist-collector, there again was some kind of an external force at around 800 000 items and the same must have happened with the linkedlist-linkedlist-collector at around 900 000 items.)

100 000 – 1 000 000 TOP

manipulating objects with java 8: transforming a lot of items

At large amounts of data, it is obvious that using an ArrayList to ArrayList collector is best.

 

All is all:

If you have a few 1 000 items, it really doesn’t seem to matter what you use with regards to speed. However in these cases you should take other matters into account such as the readability of your code.

With a larger number of items, you might want to use ArrayList with parallel streams and collectors to gather the data. The most prevalent Java implementations’ Collectors.toList() collector uses ArrayList too. So at the end, our solution could look like this:

 

Features of the machine I used for the measurements:

ASUS Z170-P

Intel Core i5-6600 CPU @ 3.30GHz

2x 8 GB DDR4 SDRAM Kingston KHX2400C15D4/8G

Microsoft Windows 10 Professional

Oracle Java version 1.8.0_101

 

Kristof Horvath

Kristof Horvath

Sleep is the coffee of the weak.

Latest posts by Kristof Horvath (see all)