Adventures of the Junior on the Memory Lake (Leak)

Introduction

First of all, I’d like to apologize for the terrible pun in the title. It’s 8 o’clock in the morning and I’m just trying to get through the first hour with my coffee and my sense of humor is definitely not at the top of its game at this time.

memory leak first coffee

This is me, Zoli and in my first ever Wanari Leaks blogpost I am going to tell you about finding a memory leak in one of our projects and about my adventures discovering its source and (trying to) fixing it.

About the project in a nutshell

We have this project called Jarvis, which is currently an internal anonymous feedback system, that being said, we plan on extending its functionality. We started the project with a modular point of view in mind – meaning that we wanted to make an extensive use of the sbt multiproject setup. This was the first time we used this structure, so it was kind of an experiment, which is actually fine, because this is an internal project, as I’ve mentioned. (smile) Also, I wouldn’t say that we used this setup properly.. But hey, this is how people learn! ^^ We made an extensive use or akka-http, slick, mongodb, and for testing scalatest and mockito.

Overall the project structure looks like this:

  • Jarvis (main project)
    • common
    • feedback
    • mailer
    • security
    • usermanagement

Each subproject has its own unit and integration tests and additionally to these, we use scapegoat as a static code analyzer and neo-sbt-scalafmt as a code formatter, in order to ensure code quality. Like every programmer, I am also the lazy type.. I hated that I always had to type the following, arguably pretty long command before creating a pull request. So, I have created a simple command alias.

I also like to run these sbt commands from GIT bash (everyone has their preferences.. (smile)).

As long as the project was small, all of these worked very well together, but then my adventures began…

The first encounter

It was a sunny Thursday morning and I was sooo happy to finish working on one of the issues that had been assigned to me. I was just about to submit the pull request, but of course as a very responsible programmer, I ran my sweet command: sbt check. However, instead of my favorite success message I’ve got the following:

My honest first reaction was:

memory leak first reaction

This was my first time ever meeting with an exception in connection with the JVM, so I was really clueless. While my seniors have met with similar exceptions, its cause can vary, so it was time to ask Google. I have quickly found that I am not alone with this issue on Stackoverflow Is java.lang.OutOfMemoryError: Metaspace normal in sbt REPL?

Identifying the problem + Tools

Reading that this could potentially be a memory leak immediately triggered me and I started looking into the issue. The first step was to learn how to actually diagnose a memory leak.. (big grin) It’s like a sea monster… You usually don’t realize it until it’s too late.. Luckily, the Java JDK (since JDK 7u40) comes with a really useful multi-functional weapon to fight these monsters. Actually, it’s more like a radar. This awesome tool is called Java Mission Control, it lets you continuously collect low level and detailed runtime information about any JVMs, enabling after-the-fact incident analysis. You can find JMC in the Java source folder (for me this was: C:\Program Files\Java\jdk1.8.0_131\bin). When you start the application, you will find the JVM Browser on the left side and by right-clicking on one of the JVMs, you can start the JMX Console for that machine. This is what you’re going to see:

memory leak jmx

As you can see in the picture, I have connected the JMC to the one JVM that runs the sbt, and I can now see statistics about memory and CPU usage, check Threads and several other functions. By default, in the Memory section, you will only see the information about the heap, but we need the Metaspace. So, by clicking on the + in the top right corner of the section we can add the statistics about the Metaspace to the graph so that we can keep an eye on it as well. If you want to learn more about JMC, I’d recommend checking out its official website.

It’s time now to proove our hypothesis about the problem being a memory leak. The expected behaviour would look like the below pic if the metaspace would reach its maximum value, then the Garbage Collector would start working and free some space for new meta-data information. In the JMC, this should look like this:

memory leak jmc

From time to time, there are valleys, which are right after garbage collection. However, what I saw instead in our application’s tests was the following:

memory leak garbage collection

As you can see, there are no alleys.. At all.. Only on the heap, but we’ve got issues with the Metaspace anyway. What you can see here is that while our tests were running, the metaspace was using more and more memory, but there was no garbage collection. This is not good. We forgot to stop some services at the end of the tests. If we go to the Threads tab in JMC, we can see the following:

memory leak threads exploration

Okay.. Something is really off. I realized the following while looking at the above:

  1. I have no idea how I’ll deal with this.. Like even my seniors had no idea.
  2. Okay let’s try this…
  3. ScalaTest-dispatcher? jarvis-common-test-scheduler-1? Yes something keeps our tests running.
  4. AsyncAppender-Worker-ASYNC? Come on, our logger is also stuck.memory leak step five

I find it important to mention here, that you shouldn’t be afraid of these challenges even if you are “just” a junior.. (smile) This is the only way you will learn new things. (smile) Try and fail… Try again and succeed. (smile)

Finding the source(s) of the leak

Finding the source of memory leak is sometimes obvious… Sometimes it isn’t really.. I suspected that we actually have several memory leaks, not just one. The most obvious was the AsyncAppender, which is definitely a Logback leak. Finding the others was not trivial, so I started Googling again for java.lang.OutOfMemoryError: Metaspace and after going through some links, I have seen several suggestions like Increase the size of Metaspace by -XX:MaxMetaspaceSize=512m . This is cool and stuff, but I was thinking: just increasing the size of Metaspace will not really solve the problem.. The exception will just arise a little bit later with more memory.

This was not the correct way of Googling.. I was already sure that this is a memory leak, so let’s focus on finding out its sources. You’ve got to learn how to Google as well.. (big grin) So this time I searched for how to find source of memory leak scala. I have found the mentioning of Eclipse MAT. This tool is a Java heap analyzer that helps us find memory leaks! Great! Also while reading through links, I have seen several mentions of Class loader leak in regards to Metaspace out of memory errors. So, in order to user Eclipse MAT we are going to need a Heap Dump right after an out of memory exception occurance. You could ask: “But don’t we need the Metaspace dump?” Well.. Technically yes, but according to the answer from apangin on Stackoverflow The heap dump will also help, because each class in the Metaspace has a corresponding java.lang.Class instance in the heap. So I was like: let’s try it this way.. (smile)

I found out from the JMC that the PID of my JVM is 24524, but you could also easily open up a command line on Windows and use the jps command to find out the PID of your Java process.

You have two options to import your Java heap dump into Eclipse MAT:

  1. Open up Eclipse MAT and use the File/Acquire Heap Dump… option and select your JVM’s PID, then click Finish. OR
  2. Use your command line and type the following: jmap -dump:file=[file-name].bin [PID]. Then, in Eclipse MAT select File/Open Heap Dump… and select the file you’ve just created.

The next step after both cases is to select Leak Suspects Report and click on Finish.

Here, I followed Mattias Jiderhamn’s blogpost about Classloader leaks I – How to find classloader leaks with Eclipse Memory Analyser (MAT). This gave me the hint, that it’s actually one of our Threads that WE created somewhere, which causes the leak. But I couldn’t dig deeper in the Heap to find out the exact source. So, what I did was: I opened another project of ours and ran our tests there. Guess what!? We had the same error while running the tests! Here, I was actually really happy that we had subprojects, because I could run the tests on each project seperately and what I found out was that only those tests are leaky, which had akka-http as a library dependency. This was also the only common library in the other project and Jarvis.

But there was one subproject, which didn’t have akka-http… But still, there was a leak somewhere. Aaaarghh…

And then don’t forget about the obvious AsyncAppender… (big grin)

So, I have started experimenting by starting a new project and just including one library at a time. In the end, I could narrow down the sources of leaks to Akka HTTP, Mockito and Logback. Here, I have to add that most to these are not necessarily bugs in the library. Sometimes you should just read the documentation and use these libraries according to it. Anyway, let’s see how I overcame these issues.

Logback

For me the first and most obvious issue was our logger. At that time we haven’t had a specific logback.xml configuration specified for tests, so the tests picked up the configuration from the main resources folder, which looked like this (only the relevant parts of configuration):

I guess you already see what might be the cause of issue here… If you don’t: don’t worry, I also didn’t see it at first.. (big grin) If we check the documentation of logback’s AsyncAppender you can immediately see the following:

APPLICATION STOP/REDEPLOY Upon application shutdown or redeploy, AsyncAppender must be stopped in order to stop and reclaim the worker thread and to flush the logging events from the queue.

Well, our tests did not do that. I had two options here:

  1. Stop the LoggerContext at the end of the tests or
  2. just define a logging configuration, which doesn’t use an AsyncAppender in tests.

I went with the 2nd option, because I wanted to spare time (this was the more trivial solution) and the log messages are not that useful during tests anyway. If the logger hangs the running of tests, then there is probably a bigger issue anyway, so you would have to look into it and in this case the AsyncAppender wouldn’t make a difference.

Conclusion: read the documentation! If you don’t… Well you’ll have to do the walk of shame if something gets messed up because of you (big grin) But hey, these things happen, learn from your mistakes! (smile) Or Cersei’s mistakes (big grin)

memory leak shame

Akka HTTP

Actually, the issue was not specifically Akka HTTP, but the underlying Akka ActorSystem. In our application, there were two places where we had to use and ActorSystem for testing. In the API tests and in our Service tests. What I realized however, was that the memory leak was only present in our service tests, but not in the API tests.. I was like… OK what might be the difference here?? Let’s see…

There are actually two differences… LoginJsonSupport.. Well, that is unlikely.. That doesn’t use any ActorSystem, so what prevented the LoginApiSpec from leaking must have been the ScalatestRouteTesttrait… Which extends the RouteTest trait… Which looks like this:

I was like… Ohh… Ohh… So… This creates an ActorSystem.. And then it has some cleanUp() method. Wait.. cleanUp()? Where does this get called? I quickly found out the following:

So, the API tests actually implemented the Scalatests’s BeforeAndAfterAll’s afterAll() method with the cleanUp() method. Meanwhile, our Service implementation classes used a global ActorSystem’s execution context (for running Futures), which was imported into them instead of having an implicit ExecutionContext input parameter. So basically, we used the same ActorSystem for all service tests, but we never ever stopped them. This was wrong on sooo many levels. First of all, unit tests should not depend on each other. Secondly, we didn’t shut down the ActorSystem.. Ever.. Of course we had memory leaks.. (big grin)

This also immediately reminded me of the akka-testkit. I’ve used it in the past, but I didn’t think the lack of this here would cause any problems, because we didn’t explicitly use actors in this project. (Well, of course we did but in an indirect way (smile)). So guess what’s the first example in the akka-testkit documentation(big grin)

memory leak akka note

Oh well…  ¯\_(ツ)_/¯

I’ve spent some time fixing our services and tests according to the documentation and in my test project everything worked well. Both API and Service tests. Wohooo!!! But then I’ve tried it in the Jarvis project. Something was still not right. (sad) Metaspace OutOfMemory… Back to finding the differences between our old projects and Jarvis.. There was only one difference remaining: Mockito

Mockito

When I looked at our library dependencies, I’ve realized that this project was using mockito-all 1.10.19. I’ve Googled again a little bit.. I found similar questions on Stackoverflow again. But the thing is, each of my test should have created a new instance of mock, so calling Mockito.reset() would not be appropriate.. And in any case.. If you have to call reset on a mock, something is not right with your test. I have tried this out in my small test project, and indeed, mockito-all was leaking somehow (I didn’t dig deep into this issue)… Additionally, mockito-all has been discontinued by Mockito 2. So I, for sure, wanted to upgrade it anyway.

Thus, I’ve updated our library dependencies to mockito-core 2.11.0 (the most recent version). This resolved the issues in my testing environment. Everything worked. But now something unusual happened to our Jarvis tests. They just stopped in the middle of execution. Now this is something totally different from what I’ve experienced. I again used JMC to try to find out what was happenning here, but this time I suspected that I’d get Deadlocked somewhere. So I went on the Threads tab and selected Deadlock Detection and started running our tests. Here are the results:

memory leak mockito

As you can see, ScalaTest-running-FeedbackUserServiceSpec and two other threads got deadlocked.. This is not good.

After a quick Google search, I found out that there’s an ongoing discussion about this on the Github of Mockito. So this is something, which was not going to be easy to resolve, but the original poster mentioned that this only came up after he upgraded to the given version of Mockito. So, what I did was, I tried downloading different versions of Mockito and it turned out (as you can also see my comment on Github), that using mockito 2.6.6 in my tests didn’t result in a deadlock, while using 2.6.7+ did result in deadlock. So, I decided to use Mockito 2.6.6 for now, until we figure out a fix for the deadlock issue. (smile)

At this stage, if I ran the tests one by one on each project, everything was going fine! The Garbage Collector was soo happy he/she could do his work properly :3. But when I used aggregation to run all subprojects’ tests alltogether, I again ran into the same issue…

No Deadlock, no Memory Leak, still not working

At this time, I felt hopeless… In my despair, I was forced to use the first-last resort. Increasing the memory of JVM… I didn’t give it high hopes, but it was worth a try, so I ran sbt -mem 2048 this is twice as much as the default setting. But to my surprise, increasing the memory actually helped! There was a huge spike in the beggining, but the GC could kick in and do his/her job and I could run the tests without any exceptions several times after each other in the same session. I had the theory, that maybe when all subprojects get aggregated, they start running in parallel and we indeed simply don’t have enough memory by default for all of them. Reading sbt’s Multi Project documentation confirmed this: "Note: aggregation will run the aggregated tasks in parallel and with no defined ordering between them."

By default, if you just write sbt [taskKey] in the console, the task specified will run on the same JVM as the sbt. I’ve started looking into how to increase the size of this JVM. While researching, I came across Forking in sbt and I realized that if I wanted to change the memory settings in the main build.sbt I would have to look into Forking. I wanted to avoid my colleagues having to enter sbt -mem 2048 check every time they wanted to run the tests. (Have I mentioned that I’m lazy? (big grin))

Here’s what I found, again, in sbt’s documentation: http://www.scala-sbt.org/1.x/docs/Running-Project-Code.html

Well, we definitely create threads, which are not GUI threads.. Due to the usage of AsyncAppender we also can’t necessarily ensure that when the program ends all other treads do the same. So actually running our tests and the main code as forked is a good idea as it turned out.. (smile) A little bit of Google led me to these settings based on a Stackoverflow question:

We enable forking for the whole project (doesn’t matter if it’s test or run task) and provide increased memory settings to these JVMs. One drawback is that according to -again- the sbt documentation’s Testing page if you fork your tests, then By default, tests executed in a forked JVM are executed sequentially. Well, I’d like to run them in parallel, but luckily thanks to the setting in the last line, they can run in parallel now. (smile)

memory leak conclusion

 

Conclusion

Overall, this adventure on the memory lake seemed pretty daunting. I’m happy that I found a solution to our issues! Most of these could have been avoided if we read the documentations carefully. Even though they are pretty boring sometimes, it’s better to spend a few extra minutes going through them thoroughly, than having your colleague resolve a memory leak for a few hours! (smile) I hope you enjoyed this little story and if you get into a similar situation, it will help you overcome it. If you have any questions about the lake or about my Googling tips, give it a go in the comment section!

See you soon again!

Zoli

PS. If you haven’t yet, check out our Fb and  Twitter pages!

Zoltan Tudlik

Zoltan Tudlik

#TeamWanari #ESN #eitDigital

Latest posts by Zoltan Tudlik (see all)