Working with Microservices in an Ideal World

In theory, microservices are great: they are cleanly divided, they scale, and they are easy to maintain. In practice, however, you will most probably run into some challenges when working with them. This post will help you get a grip on the world of microservice development, its challenges, and the available supporting tools.

Microservices and their challenges

In a microservice architecture, every service does its own small, well-defined task. Of course, services need other services to do their daily work, so they communicate over REST. But when you want to build a new service that relies on other services, you need to know how to communicate with them, what data structures they consume, and so on. So before you build your new service, you need to do your homework and figure out the syntax you can use to tell the others what you want from them. That’s your first difficulty.

When you are done with your newly built service, you want to test it. But it depends on another service, so if you want to test it properly, you need to mock the services it depends on. If you want to see the actual communication too, you need to set up a playground for the test, complete with the mocked services and your fully functioning service. Creating mocked versions of the original services adds overhead, and building the playground is a challenge of its own. This is your second difficulty.

So we have services, and we know some of them are slower, so we start more instances of them (we can properly scale our application because we are awesome enough to build microservices instead of a big bad monolithic application). Our third difficulty is load balancing and endpoint resolution: every caller has to find a running instance, and the load has to be spread across all of them.

How will we deploy our application? We need a ton of virtual machines, we need to be able to start or stop service instances fast, and we need to deploy new versions whenever we improve our service. And what if our new version is not compatible with our old version? Then we need to update and redeploy the services that depend on it, too. This is an interesting problem and our fourth difficulty.

The fifth will be monitoring. How will we measure anything? Which services need more instances, and which can do with fewer? Which services are slow? What happens if some of our services or service instances start to fail? What if our internal connections start to suffer from a huge network delay? We need answers to these questions if we want to host a reliable application.

REST API tools

Our first problem is that we want to use services that were written by others. Conversely, we want others to rely on our new features and functions. Unfortunately, we don’t know how to use theirs and they don’t know how to use ours. And we really don’t want to duplicate code in every service when they could all call one specific service instead.

There are useful tools for this family of problems. We can write down how our services react to requests, and then others can use our stuff. If we choose the right tool stack, we can generate some code from the API documentation, too. This can be really helpful.

For example, if we choose Swagger, we can generate our server and client code for some popular languages and frameworks. Or we can add some extensions to API Blueprint, and it will generate some code for us. This is good: we don’t need to write boilerplate code, we just drop some files into a generator and paste the generated resources into our directories. Of course, we need to write those docs, and more importantly, we need to update them regularly!
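To make the idea of "code from the API description" a bit more concrete, here is a minimal sketch in Python. It is not Swagger's generator and does not use any Swagger tooling; it only assumes a heavily trimmed description in the shape of an OpenAPI document. The service name, the paths, the operation IDs, and the base URL are all made up for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical, heavily trimmed API description in the shape of an OpenAPI document.
SPEC = {
    "openapi": "3.0.0",
    "info": {"title": "User service", "version": "1.0.0"},
    "paths": {
        "/users/{userId}": {"get": {"operationId": "getUser"}},
        "/users": {"post": {"operationId": "createUser"}},
    },
}


def build_client(spec: dict, base_url: str) -> Dict[str, Callable[..., Tuple[str, str]]]:
    """Map every operationId to a function returning (HTTP method, full URL)."""
    client = {}
    for path, operations in spec["paths"].items():
        for method, operation in operations.items():
            # Default arguments bind the current loop values to this function.
            def call(_path=path, _method=method, **params):
                # Fill path parameters such as {userId} from keyword arguments.
                return _method.upper(), base_url.rstrip("/") + _path.format(**params)

            client[operation["operationId"]] = call
    return client


# The base URL is an assumption; a real client would also send the request.
client = build_client(SPEC, "http://users.example.internal")
print(client["getUser"](userId=42))
```

The point is only that the description, not hand-written code, decides which calls exist. A real generator additionally produces request and response models, validation, and the HTTP plumbing.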

If you have an undocumented service, you can try to document it with tools like Stoplight. However, I think docs generated from the code are better. Some frameworks (for example Spring Boot or Play! Framework) have built-in functionality or a library to create API documentation from your code. These can help you update, manage or (if you are super lazy) completely create your documentation.

Testing and mocking

So we have written our new service; now we need to integration-test it!

We have two paths. We can write a mock for every service we depend on and keep it up to date by hand, or we can generate the mocks with a mock-generation tool. The first path is expensive in dev time, and when releases come, the “keep it up to date” part will be painful. Generation is faster (dev time is expensive, CPU time is not :) ), provided the service APIs are documented.

If we use Swagger, we can start mock servers from the documentation. There are a lot of tools: some have complex functionality, some are just starter kits. If you are interested, you can google “mock servers” and get good results like Prism, Swagger server stub generator, Drakov or api-mock.
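If you have never seen one, a mock server is nothing magical. The following is a hand-rolled sketch using only the Python standard library; it is not how Prism or Drakov work internally, and the routes, the canned payloads, and the port are assumptions made up for this example. The tools above derive such a table from your API documentation instead of having you type it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical canned responses, keyed by (method, path).
CANNED = {
    ("GET", "/users/42"): (200, {"id": 42, "name": "Test User"}),
    ("GET", "/health"): (200, {"status": "ok"}),
}


def mock_response(method: str, path: str):
    """Return (status, body) for a request, or a 404 for routes that are not mocked."""
    return CANNED.get((method, path), (404, {"error": "not mocked"}))


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Look up the canned answer and send it back as JSON.
        status, body = mock_response("GET", self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8089) -> None:
    """Block and serve the mock on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MockHandler).serve_forever()
```

Calling `serve()` and pointing the service under test at `http://127.0.0.1:8089` gives you one piece of the playground. Keeping the lookup in a plain function also lets you unit-test the canned answers without opening a socket.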

Running this on a CI server is a bit more difficult, and if you work with some IT-oriented guys, they will love this challenge. When you deploy a new version of your service, you need to run the tests of your “clients” too, because even minor API changes can break their functionality.

Load balancing

This may already be solved if you are just adding a new service to an infrastructure that is already working. But if you are building your full application from scratch, I think this is the biggest challenge.

You need to achieve some hard goals at once:

  • Replicate or kill service instances on the fly as the load changes significantly
  • Let the load balancer know instantly where and how many instances are up
  • Build all of the services on the same name-resolution mechanism, or use possibly costly load balancers per service
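The first two goals boil down to a registry that instances join and leave, plus a rule for picking the next instance. Here is a minimal in-memory sketch of that idea with round-robin selection. It is an illustration only, not the API of Eureka or of any real load balancer; the class, the service name, and the addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ServiceRegistry:
    """In-memory registry: instances register and deregister, callers resolve the next one."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = defaultdict(list)
        self._next: Dict[str, int] = defaultdict(int)

    def register(self, service: str, address: str) -> None:
        """Called when a new instance comes up."""
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service: str, address: str) -> None:
        """Called when an instance is killed or fails."""
        if address in self._instances[service]:
            self._instances[service].remove(address)

    def resolve(self, service: str) -> str:
        """Round-robin over the instances that are currently registered."""
        instances = self._instances[service]
        if not instances:
            raise LookupError(f"no instance of {service!r} is up")
        index = self._next[service] % len(instances)
        self._next[service] += 1
        return instances[index]


registry = ServiceRegistry()
registry.register("billing", "10.0.0.1:8080")
registry.register("billing", "10.0.0.2:8080")
print([registry.resolve("billing") for _ in range(3)])
# → ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080']
```

A real setup also needs health checks, so that dead instances are removed even when they cannot deregister themselves, and a registry that is itself replicated.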

There are too many tools and too many points of view to just choose the best one, use it, and walk away happily.

You can set up some self-hosted load balancers (like NGINX), use something that your hosting provider gives you (like Azure or AWS) or, in extreme situations, build your own server farm with supporting tools built for these kinds of problems (like OpenShift). (Read more about your options in Barni’s post…) All of these decisions have a high impact on our services’ code and costs, and I think the right choice really depends on the problem. If you really want to host your application yourself, you need to learn from the big players in this market: read their blog posts and don’t repeat their early mistakes. My favorite blog in this field is from Netflix, but Uber’s is great too. Riot Games’ blog is an interesting one as well, but its posts now focus on dev challenges and processes rather than hosting.

Deployment and version management

Deployment is a bit similar to the load-balancing topic. It really depends on your hosting model, and if you are building your application from the ground up, you need to learn from the big fish. Azure and AWS have some built-in methods for switching your servers or containers to a newer version. If you are running your own farm, OpenShift has similar orchestration functionality, too.

New functionality and new API versions should be dev challenges and not IT challenges (by IT, I mean the guys fueling servers with coal ;) ). If you want to add some new properties to a response, you should add a new API call rather than change an existing one, so that existing clients keep working. Some name-resolver services (like Eureka) can handle multiple service versions for parallel redeploys, but if you want to play it safe, it’s much better to change only one service at a time. So the best approach is versioning your API endpoints. This article collects the solutions used most: path versioning, content negotiation, or a custom version header.
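Path versioning, the first of those solutions, can be sketched in a few lines. This is a framework-free illustration under made-up names: the `/v1/users/<id>` path layout, the handlers, and the extra `email` property are assumptions for this example, not part of any real API.

```python
from typing import Callable, Dict, Tuple


def get_user_v1(user_id: int) -> dict:
    """Original response shape; existing clients depend on it."""
    return {"id": user_id, "name": "Test User"}


def get_user_v2(user_id: int) -> dict:
    """v2 adds a property; v1 stays untouched so existing clients keep working."""
    return {**get_user_v1(user_id), "email": "test.user@example.com"}


# One entry per (version, resource); both versions are served side by side.
ROUTES: Dict[Tuple[str, str], Callable[[int], dict]] = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}


def dispatch(path: str) -> dict:
    """Route paths of the form /<version>/<resource>/<id> to the matching handler."""
    version, resource, raw_id = path.strip("/").split("/")
    handler = ROUTES.get((version, resource))
    if handler is None:
        raise LookupError(f"no handler for {path!r}")
    return handler(int(raw_id))


print(dispatch("/v1/users/7"))
print(dispatch("/v2/users/7"))
```

Because both versions stay available, clients can move to `/v2` one at a time, and `/v1` can be retired once nobody calls it anymore.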

Monitoring and failsafe

Monitoring is always important. But monitoring gets hard when we start 200 servers for a service at once. We need to aggregate logs, read the load values and so on. There are interesting tools that can track external API calls through the whole system, telling us why our performance is good (or bad), which internal components respond slower than others, and so on. Some examples: Dynatrace, TraceNeoSense, or Zipkin.
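The "which components respond slower than others" question is, at its core, an aggregation over timing records. The sketch below shows only that aggregation step, assuming you already collect `(service, duration in ms)` pairs from somewhere; the record format and the service names are hypothetical, and tracing tools like Zipkin do far more (for example, they follow a single request across services).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def slowest_services(records: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Group (service, duration_ms) records and sort services by mean duration, slowest first."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for service, duration_ms in records:
        durations[service].append(duration_ms)
    averages = [(service, mean(values)) for service, values in durations.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Hypothetical timing records, e.g. parsed from aggregated logs.
records = [("auth", 12.0), ("billing", 80.0), ("auth", 18.0), ("billing", 120.0)]
print(slowest_services(records))
# → [('billing', 100.0), ('auth', 15.0)]
```

In practice, percentiles are usually more telling than the mean, because a few very slow calls can hide behind a healthy-looking average.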

Proper monitoring can help you handle deployment or programming errors, but if you want super fail-safe operation, you need to stress-test your system from time to time. Netflix created the Simian Army for this. They developed the Chaos Monkey to shut down some services randomly (it’s open source, and it has a Wikipedia article, too). Later they made some more monkeys to help them create and resolve issues. They also wrote an in-depth post on their failure testing.
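The core idea of randomly shutting things down is simple enough to sketch. To be clear, this is not Netflix's Chaos Monkey or its API, just a toy illustration; the instance names are made up, and removing an entry from a list stands in for actually stopping a VM or container.

```python
import random
from typing import List, Optional


def kill_random_instance(instances: List[str], rng: Optional[random.Random] = None) -> str:
    """Remove one randomly chosen instance from the list and return it."""
    if not instances:
        raise ValueError("nothing to kill")
    rng = rng or random.Random()
    victim = rng.choice(instances)
    instances.remove(victim)  # Stand-in for actually stopping a VM or container.
    return victim


# Hypothetical pool of running instances.
pool = ["billing-1", "billing-2", "billing-3"]
victim = kill_random_instance(pool)
print(f"killed {victim}, still running: {pool}")
```

The interesting part is what happens afterwards: does the load balancer stop routing to the victim, and does monitoring notice that an instance is gone?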

Summary

When I first met the microservice architecture, I had a lot of questions about the supporting tools and how they handle the boilerplate code and the new challenges that come from the architecture design. If you want to build a microservice application from scratch, you will need to dig deeper. Some companies have started to build tools for all of these matters, and most of them are open source and well documented. I think every future-oriented programmer wants to build something new and innovative. So if you are one of us, you should think of these points as challenges rather than problems. Then you can build better tools to help others in this new area (or era?).

Gergő Törcsvári

Software Developer at Wanari
I would love to change the world, but they won’t give me the source code (yet).