OFBizian: Fighting Service Latency in Microservices with Kubernetes

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

CPU and network speeds have increased significantly in the last decade, as have memory and disk sizes. Still, one possible side effect of moving from a monolithic architecture to microservices is an increase in service latency. Here are a few quick ideas on how to fight it using Kubernetes.

It is not the network

In recent years, networks have transitioned to more efficient protocols and moved from 1 Gbit to 10 Gbit and even 25 Gbit links. Applications send much smaller payloads in less verbose data formats. With all that in mind, chances are that the bottleneck in a distributed application is not in the network interactions but somewhere else, such as the database. If that is the case, we can safely ignore the rest of this article and go back to tuning the storage system :)

Kubernetes scheduler and service affinity

If two services (deployed as Pods in the Kubernetes world) are going to interact a lot, the first approach to reducing network latency is to politely ask the scheduler to place the Pods as close together as possible, using the node affinity and inter-pod affinity features. How close depends on our high availability requirements (covered by anti-affinity), but it can mean co-locating the Pods in the same region, availability zone, rack, or even on the same host.

Run services in the same Pod

(Figure: containers/services co-located in the same Pod)

The deployment unit in Kubernetes (the Pod) is what allows a service to be independently updated, deployed and scaled. But if performance is a higher priority, we could put two services in the same Pod, as long as that is a deliberate decision. Both services would still be independently developed, tested and released as containers, but they would share the same runtime lifecycle in the same deployment unit. That would allow the services to talk to each other over localhost rather than through the service layer, or to use the file system, shared memory, or some other high-performance IPC mechanism on the shared host.
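To make the localhost option concrete, here is a minimal sketch in Java. It assumes two hypothetical services that would normally run as separate containers in the same Pod; to keep the example runnable on its own, both sides run in one process here, and the class and method names (LocalhostCallDemo, serveOnce, call, roundTrip) are made up for illustration. The point is only that the caller addresses the loopback interface instead of a Service name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LocalhostCallDemo {

    // Hypothetical "service B": accepts one connection and echoes one line back.
    public static void serveOnce(ServerSocket server) {
        Thread worker = new Thread(() -> {
            try (Socket peer = server.accept();
                 BufferedReader in = reader(peer);
                 PrintWriter out = writer(peer)) {
                out.println("echo:" + in.readLine());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Hypothetical "service A": calls service B over the loopback interface.
    public static String call(int port, String message) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wires both sides together for the demo. Port 0 asks the OS for any free
    // port; containers in a real Pod would agree on a fixed port instead.
    public static String roundTrip(String message) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            serveOnce(server);
            return call(server.getLocalPort(), message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush=true so println() sends the line immediately.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping")); // prints "echo:ping"
    }
}
```

Binding the listener to the loopback address also means the port is not reachable from outside the Pod, which may or may not be what you want for a given service.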

Run services in the same process

If co-locating two services on the same host is not good enough, we could build a hybrid between microservices and a monolith by running multiple services in the same process. That takes us back towards a monolith, but we could still apply some of the principles of microservices, keep development-time independence, and compromise in favour of performance on rare occasions.
We could develop and release two services independently by two different teams, but place them in the same container and share the runtime.
For example, in the Java world that would mean placing two .jar files in the same Tomcat, WildFly or Karaf server. At runtime, the services can find each other and interact through a public static field that is accessible from any application in the same JVM (provided the class holding the field is loaded by a classloader shared between the applications). The same approach is used by the Apache Camel direct-vm component, which allows synchronous in-memory interaction between Camel routes from different .jar files that share the same JVM.
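As a rough illustration of the static-field idea, here is a minimal Java sketch. All names in it (InJvmRegistryDemo, ServiceRegistry, the "greeting" key) are hypothetical, and it runs both "services" from one class for brevity. In a real application server, the registry class would have to live in a library visible to both deployments, otherwise each application would get its own copy of the static field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class InJvmRegistryDemo {

    // Hypothetical shared registry: a public static field that any code
    // loaded by the same classloader can reach.
    public static final class ServiceRegistry {
        public static final Map<String, Function<String, String>> SERVICES =
                new ConcurrentHashMap<>();

        private ServiceRegistry() {
        }
    }

    // "Service B" publishes itself under a well-known name at startup.
    public static void registerGreetingService() {
        ServiceRegistry.SERVICES.put("greeting", name -> "Hello, " + name);
    }

    // "Service A" looks the service up and invokes it as a plain method call,
    // with no serialization and no network hop.
    public static String callGreetingService(String name) {
        Function<String, String> service = ServiceRegistry.SERVICES.get("greeting");
        if (service == null) {
            throw new IllegalStateException("greeting service is not registered");
        }
        return service.apply(name);
    }

    public static void main(String[] args) {
        registerGreetingService();
        System.out.println(callGreetingService("Kubernetes")); // prints "Hello, Kubernetes"
    }
}
```

The trade-off is visible in the code: the call is as cheap as it gets, but the two services now share a failure domain and a startup order, since the caller fails if the other side has not registered yet.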

Other areas to explore

If none of the above approaches seems like a good idea, maybe you are exploring in the wrong direction. It might be better to check whether alternative approaches such as using a cache, data compression, HTTP/2, or something else would help overall application performance. Service mesh tools such as Envoy, Linkerd and Traefik can also help by providing latency-aware load balancing and routing. That is a completely new area to explore.

Follow me @bibryam for future blog posts on related topics.