Solutions for a filterable sortable pageable list in Spring

(GraphQL, REST, …)

Hello again everybody, it’s Süxy here! (smile) Today we are going to take a look at several custom implementations for the good old “filterable + sortable + pageable list” problem. In this post we are going to compare the oldschool REST solution with GraphQL and our custom solution (both at the source code level and performance), and we have some hybrid solutions as well (like GraphQL combined with our custom solution, etc.)

If you have followed my posts, you know that I published some about the same topic, so i’m going to jump over the basics. If you are interested in creating a solution with Spring Specification or you are using Couchbase and are interested in how you can easily implement your lists with it, you should take a look at my previous blogpost.

graphql vs rest better

The environment

Okay so let’s assume that we have a CMS system, with Users, Roles, Privileges, and Printers (by now you should know that I love printers, especially when they are to be detonated (who should be allowed to detonate a printer(big grin)). The relation between them is simple. One user can have multiple roles (and one role can belong to multiple users), one role can have multiple privileges (and one privilege can belong to multiple roles), and one user can have multiple printers (but one printer can only have one owner).

graphql vs rest vs custom solution tutorial

Let’s say, you have a frontend application (doesn’t really matter if it’s a native mobile app or a web app), that can show the users in 3 different ways:

  1. There is a list view, where you list your users, but don’t want to put out every detail.
  2. There is a detail view, where you want to show the users’ roles and privileges.
  3. And finally, there is an editor view in which you can assign roles to users and/or users to roles. In this last view you only need the user’s id and name. This means that your frontend application wants to query the user data in at least 3 different ways. Now how would you implement the backend for it?

The oldschool implementation

Let’s take a look at the oldschool implementation of the solution.

Nothing special, we find our specific user (or all users) convert it to a DTO object, and return it. Now let’s take a look at our DTO:

Pretty simple, it contains the user’s data. Please note that with this implementation we always return all the data to all the views. But what if you don’t want to do that? Let’s say you want to return only the data needed for each view. With this implementation you either make 3 separate DTOs or fill only the fields that the view needs, and annotate your DTO with @JsonInclude(Include.NON_NULL) annotation. I think neither is a good solution to the problem. This is where GraphQL comes in handy.

GraphQL

GraphQL is a query language with which the backend can define what the client asks for, and the client decides which information is needed on which view. With this approach, the client will only receive the data that is necessary for that specific UI. You can find a lot of tutorials on the internet, like Baeldung’s tutorial (which is missing the @Component annotations on the Resolvers) or Pusher’s tutorial (which uses Spring with Kotlin) so I’m not going to show you how to use it in your application (you can take a look at the full source code if you want (smile) ), I’m just going to focus on its pros and cons. I already told you about some of the pros so what are the cons of the ‘default’ usage that these tutorials suggest? We have the following schema defined with the proper resolvers.

It nearly does the same thing as the Controller defined above. Let’s take a look at the QueryResolver!


We have only one filter now, so the code is nearly okay, but imagine you want n filters. Your if statement could have 2^n branches. That’s just horrible in my opinion. So how could we solve this problem? Unfortunately Spring Data JPA can not build queries from parameters dynamically.

You are going to face the same problem, if you use the oldschool methodology without GraphQL. But is there any solution for this problem? OF COURSE (big grin) Let’s do some java-black-magic (smile)

The magic library

If you use our magic library, you can easily convert REST API params or GraphQL params into query filters. I think the inner mechanism is not that important this time, it nearly does the same thing as my Couchbase Query Executoryou can tell the builder how to turn your API or GraphQL parameters into a query by CriteriaBuilder. You only have to implement GenericFilterRepository’s methods like em() to provide the EntityManager, joinTables(…) to tell the library how to join your tables (if you want, you can decide if you want or not according to the measurement results below (smile) ), getOrderByAttribute(…) to tell the library how you want your entities to be sorted and getWereConditions(…) to tell the library how to filter your entities. Let’s see how this is done from the API to the lowest levels of code snippets.

The implementation

So you can either use REST or GraphQL – the top level of implementation is the only difference.

REST implementation

GraphQL implementatoion


They pretty much look the same, you build up your parameters and filters from the incoming input, validate your parameters, and run the userService::find function. With the ‘joinTables’ method you can ask the library to fetch or to not fetch your entities. See the difference below (smile) but first let’s take a look at the service layer. I’ll only describe the REST version of it, since that is a little more complicated cuz GraphQL does the mapping for you (that’s why we have that little trick with the GraphqlUserService).

Here you can have Wrapper expressions. For example: if you send ‘fields=listView’ in the queryString, then the backend can add all the fields needed for that view. If you want the client to decide which fields are needed, you can totally leave that Wrapper stuff out. This is one thing that GraphQL can’t do. With GraphQL, the client has to define the properties it needs. With our custom-rest-solution, you can define your own wrappers if you want (smile). So the service is as simple as that. What happens in the repository?

If you are using interfaces, your Repository’s interface should implement our Generic filter repository, and then partially implement the functionality. If you are using implementations only, then you don’t need the interface.

I think it’s pretty straightforward. getWhereConditions(…) only adds a criterion if the filter is given in the map. getOrderByAttribute(…) returns either the specified field or a default ordering. joinTables(…) joins the tables according to the fields defined.

So as you see, it’s way easier on the developer side to implement new conditions/properties for an entity with our library. Also, if you modify the Printer entity for example and add a new property to it and then modify the mapper, then if you query the users with their printers, you can automatically ask for the new property from the client side! Awsum isn’t it? (smile)

Okay so we now have a nice library with which a developer can easily make a filterable-sortable-pageable list, and can easily add new properties/filters to the entities. Additionally the client only gets the data needed. But what about the performance? Isn’t this solution too slow with a huge amount of data? Let’s see my colleague’s work on the measurements (smile)

Performance testing

Hey there, it’s Laki here from Wanari. Testing was an interesting and very time-consuming task. The results and conclusions are in the results section at the bottom, but if you are interested in the details or if you want to run the tests yourself, stay with me for the next bit.

Load testing our application

The first step is to fill our database with the right amount of data. Manually it wouldn’t be very effective so I made a simple service for it. We can make datasets of different sizes with it.

To do that, we have to call our service through a POST request. It inserts a load of data in the database, based on the given parameters from the request body. 

We can set the size of the tables individually, except for the printer tables. 
There is a fix number of roles and privileges in all test occasions, because it is a lifelike situation that type-like data does not change or grow over time but the number of users does.
The other things we can set are the cardinality of the relations and that is where the size of the printer table came from. We set the range of the printers by user  1 to 10 in smaller scale and 1 to 5 in the larger two. That gives us printer tables with the size of approximately five user tables in the first, and two and a half user tables in the other case. 
The same rules are true for the user-role and role-privilege relations too.

We have our application up and running, now if we call one of the db seed requests from Postman (or create a new one) we also have some data. 
For the sake of precision and time effectiveness, after I called a db seed request I made a db dump of it. I did it for two reasons: for one, the dump is a faster way to fill a schema with data, so I only had to wait it out once per size, secondly if I have to remeasure something, then I have the same data in my database unlike if I had generated it again.

Now we can start the actual tests! Yey 😀

The test is almost entirely made using the Apache Jmeter application. It’s a very useful and versatile open source software. Its purpose is obviously performance testing but it can also be used in many other ways.

All the different solutions discussed above can be found in the postman collection json and also in the Jmeter files, so you can use them to start testing right after you are done with the tasks below.

First, you obviously have to set up a Jmeter on your computer, it has nothing interesting in it, if you have problems with it, you can find help here
The next step is to open the attached .jmx file and modify the Server Name or IP field in the HTTP Request Defaults config element (it’s in the left sidebar) to match your application’s location and save it.

 

Now you can press start and the result will be there on the UI under the listeners. Nonetheless it is not recommended to run your test in GUI mode, because it is less resource efficient than the non GUI mode.
To run your test in non GUI mode you have to add the jmeter bin location to your path. 
In windows it’s in System properties → Environment variables → Path, there you can click edit and add the bin location. Now apply the changes and you can use the jmeter tag in cmd.

Now you have to open a command window preferably in the jmx file location and type the following command:

jmeter -n -t {{jmx location}} -l {{desired result csv location}} -e -o {{desired result htm location}}

After that, the test will start and you will get the result. The csv contains the result in a table, but you have the possibility to generate an html result also – which has a lot of useful charts and comparisons. The html result can be generated afterwards also from the csv.

With all the above information, you will be able to run the five tests on the five different sized dbs, and compare the data. However I’d like to give you a full picture about Jmeter, so I’ll show you how I made the tests.

In Jmeter we can create thread groups, which are the base of a heavy load web application testing. Thread groups are very simple, they basically simulate a group of users/requests. 
It has three parameters that are important to us right now:

  1. the number of threads (users) which sets the number of parallel clients,
  2. the ramp-up period which is the time it takes to reach the desired number of client-server connections and
  3. last a loop count which is to define the number of requests that one user sent. This parameter can be set to infinite also.

There are a couple of other things also such as a scheduler and an error handler, but we won’t need them now.

graph ql vs rest solution results 1

For example, if we set these parameters to 10, 5 and 20, that gives us 10 user in 5 seconds all of which make 20 requests. That gives us a total of 200 requests.

 

We have our users but we didn’t tell them what to do, this is where the samplers come into the picture, they are the actual requests. There are many types of samplers such as HTTP, java, ftp and so on. We’ll need an http request, which has a couple of parameters too, as you can see on the picture below. We can set the web server credentials and also add headers and body elements. In our case, we have a bunch of different requests on the same server, the only differences are in the path and in the body elements, so we set up an HTTP Request Default. With it we can set the server location and all the common parameters, so we have to add them only once. It has the same layout as the http request below.

graphql v rest v magic solution results

We have almost thirty different requests that we want to try out, so we should try them out first before adding them to Jmeter. For that I found Postman is the perfect tool. For those who are not familiar with it, Postman is a free API development tool.

You can give every needed parameter very similarly to Jmeter. You can set the url, headers, body, cookies and everything else, you can see below.
Unfortunately Postman can’t give you as much information about the response, but it’s easier to use and you can test the requests individually, so to try out different requests, it is perfect.
The Postman file with all the requests including the db seed requests are in the repository with the default local url: localhost:8080.

These were only the basics, there is much more to learn about these tools, and a lot to experiment with them, but this post is not about them, so let’s have a look at the results instead.

 

THE RESULTS

Before I share  the actual results with you, I’d like to talk about our preliminary expectations.
We were pretty sure, that our custom solution will be the fastest one by far and we expected GraphQL right after it. We also thought that the oldschool solution will be washed away.
We were right….on a few guesses (big grin)

We tested our solutions on the previously mentioned five occasions. These were the 100, 1 000, 10 000, 100 000 and 1 000 000 user based datasets/occasions. From now we’ll call them #1, #2 … and so on.
We ran five requests with all six solutions. We requested all of the users, one user by id, all the printers, one printer by id and one printer by owner(user) id.

Surprisingly enough, there is no one true winner, just one big looser.
As you can see, four of our six solutions performed in the same area, especially if we look only at the select by id cases. The two eccentric cases were GraphQL’s implementations.
These were the slowest in all but two test cases, where the plain oldschool solution performed a little bit under GraphQL.

The fastest one in most cases was the joined solution, which isn’t that surprising. If you know a thing or two about sql, you can write queries that perform very well. So yes, if you need only a couple of db requests, or you have a lot of developer man days to burn,

that’s your solution for sure.

Right after it, the second and third fastest ones were the custom filtered joined and the custom filtered solutions.
We’re in luck, because similar to GraphQL, these solutions are versatile and fast to implement and fortunately we don’t have to pay the price for speed.

You probably noticed that the results from #4 and #5 are incomplete, insomuch the results from #5 are missing entirely.
The explanation is the following:
In #4 the plain solution joined and the GraphQL requests mostly returned with db timeout error. So the data from them was unusable. This is a good indicator that in scales like these, the usage of these solutions is not recommended.
In #5 the same problems came up. Where the db timeout error didn’t show up, the slower solution’s processing time was still so bad that the full measurement would take almost a day. So I decided that I’ll let go of the idea of that measurement.

In the end, as I mentioned before, I can’t really line them up from worst to best. Every solution has its own advantage over the others.
Some of them are relatively slow, but it’s very easy and/or fast to implement them. Some of them are very fast, but their implementation is not so easy, nor fast.

Summing up:

It is safe to say that if you are in need of a well-performing, but still kinda easy and fast-to-implement solution, our custom solution will do the magic.

And if you more curious about this, we shared all the available data here: https://download.wanari.net/leaks/graphqlJmeter/results.zip


We did this research at Wanari. We work for clients big and small, digitalizing established enterprises as well as building MVPs for startups from a variety of sectors. Custom software development is our passion and creating the optimal solution is the challenge we love taking.

We build software we would use. And we are picky.

Get in touch with us, share your thoughts on Facebook, follow us on LinkedIn, check us out on our Insta feed! We hope to see you around.

József Lakatos

József Lakatos

Software Developer at Wanari

Latest posts by József Lakatos (see all)
Alex Sükein

Alex Sükein

- Süxy, can you tell me about robust software development Nasa uses?
- Yes. If we have a final exam tomorrow.