Which Camel DSL to Choose and Why?

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

Apache Camel is a powerful integration library that provides mainly three things: lots of integration connectors, implementations of multiple integration patterns, and a higher-level Domain Specific Language (DSL) abstraction to glue it all together nicely. While the connector and pattern choices are driven by use case and features and are easy to make, choosing which Camel DSL to use can be a little harder to reason about. I hope this article will guide you on your first Camel journey.

I work for Red Hat Consulting as an Integration architect, and one of my primary goals is to help customers get the design and the architecture of their future systems as right as possible, and consequently get the best value out of Apache Camel. One of the common questions I get at the start of every new Camel-based project is: “Which Camel DSL should we use? What are the pros and cons of each?”

I have good news and more good news for you. First, by choosing Apache Camel you have already made the right choice: Camel will turn out to be a very useful toolkit in your arsenal of libraries for lots of future use cases and projects. Second, the DSL is just a technicality that will not impact the success of your project; you can always change your mind later and even mix and match.

If you are part of a large company with multiple independent two-pizza-size teams that use Java here and there, chances are that some teams are already using Camel. Even in small companies, teams use Camel without being aware of each other, as it is useful for all kinds of tasks and is a small enough library to add to your pom.xml and use without permission from the technical design board. If that is the case, just talk to your colleagues and learn first-hand from their experience with their DSL of choice.

If you need a more comprehensive comparison and a reason to choose one of the DSLs, below is a brain dump from multiple engineers developing Apache Camel and consultants using Camel on customer projects across the globe. Pick the arguments that are valid in your context and make your choice.

Comparing Apache Camel's XML and Java DSLs
If this table doesn’t give you the straight answer you were looking for, the answer is probably: it doesn’t matter. Camel has multiple DSLs, and there are good reasons why the Java and XML based DSLs are equally popular. The more important takeaway is for developers to get used to thinking in terms of Pipes and Filters, and to learn the Enterprise Integration Patterns and their notations. Using one of the Camel DSLs to express these patterns is then a technicality without a technical consequence. Usually it is a choice based on team preference and culture, such as “We are a hard-core Java shop and we hate XML” or “Can we do it all through drag and drop?”.

All that said, the only advice I can offer is to strive for consistency. Avoid using different DSLs in the same service, or even across different services in the same project. And if you can convince everybody in the company to use the same DSL, even better.

Check out my Camel Design Patterns book for more Camel related topics and follow me @bibryam for future blog posts.

Rapid SEMAT Application Development with Apache Isis

TL;DR This post talks about a SEMAT pet project I created using Apache Isis and deployed to OpenShift Online here

Apache Isis

As a Java developer who works primarily on backend systems, I do not enjoy creating user interfaces and dealing with JavaScript. Luckily, there are Java projects such as JSF (grrr), Apache Wicket, and Vaadin that can help avoid JavaScript altogether and still create functional user interfaces. But even with these projects, the developer has to think about and actively create the user interface from Java code. That is similar to writing your own SQL statements in the age of ORMs such as Hibernate: an activity we do only when the OOTB ORM is not good enough for the use case. And that is exactly where Apache Isis fits in: given a domain model and mapping annotations, it generates the complete user interface at runtime. In a sense, Apache Isis is an OUIM (Object/User Interface Mapping) framework for Java.
There is much more to Apache Isis than creating user interfaces: it is a full stack rapid application development framework focused on domain driven design. But rather than talking about it, let's see a complete application created with Apache Isis.

SEMAT Essence Kernel

To learn Apache Isis, I decided to implement the SEMAT model and deploy it to OpenShift Online as a Docker container. Simply put, the SEMAT (Software Engineering Method and Theory) Essence Kernel is an OMG standard that defines, among other things, a framework for describing the state of software projects from multiple perspectives (called alphas).

SEMAT Alpha States
The idea is that every project can be described in a generic way using the following seven alphas: Stakeholders, Opportunity, Requirements, Software System, Work, Team, and Way-of-Working. Each alpha can be in one of multiple states; for example, the Stakeholders alpha can be Recognized, Represented, Involved, In Agreement, Satisfied, etc. In addition, each state has certain items that must be satisfied before an alpha can transition to that state.

Stakeholders Alpha's States
As you can see, this is a pretty simple domain model with state machine logic behind it.

The Showcase Application

Enough said. To see how much Java I had to write for this application, check the dom module of the project on GitHub. All of the other skeleton code is generated through a Maven plugin, and no user interface code is required. Here is a screenshot of the Project domain entity screen rendering:

Project view as Apache Wicket screen
In addition to generating a user interface, Apache Isis will also generate a REST API from the same domain model. How cool is that?
SEMAT REST API generated from domain model
The beauty of all this is that generating the UI allows you to iterate quickly over the domain model, show it to the business owners to get feedback, and continue evolving the model.

Some of the SEMAT Application Features implemented/enabled

  • Multi tenancy
  • Manage multiple projects per tenant
  • Manage project Alpha states
  • Custom Essence Alpha state list per tenancy
  • Custom Essence Checklist items per tenancy
  • Alpha state spider/radar diagram
  • Automatic Apache Wicket based UI generation from domain model
  • Automatic REST API generation from the same domain model
  • Self Signup/Registration
  • Auditing user actions
  • Session logging
  • Internationalization
  • Breadcrumb trail
  • Bookmarks

Build and Run

Check the readme for full details, but you can build and run the application locally or on OpenShift to try it out.
mvn clean install
cd webapp
mvn jetty:run


mvn clean install
docker build --rm -t bibryam/semat .
docker run -p 8080:8080 bibryam/semat
Then go to http://localhost:8080/ and login: user/user

Deploy to OpenShift

Once you have OpenShift running, either locally or online, and have an oc client installed, you can deploy the already built semat Docker image with the following commands:
oc new-project semat
oc new-app bibryam/semat:latest -e CATALINA_OPTS="-Xmx300m"
oc expose service semat
If you do not trust Docker images built by others (you should not!), you can build your own Docker image as shown above with options 2 and 3, push it to your own Docker registry, and run the application from it:
oc new-app your_name/semat:latest -e CATALINA_OPTS="-Xmx300m"
Alternatively, you could avoid installing and running Docker altogether and have both the source code and the Docker image built on OpenShift. This is called the OpenShift Source-to-Image approach. You can do this from the OpenShift UI by using, for example, the "Red Hat JBoss Web Server 3.1 Tomcat 8 1.0" template and pointing it to the SEMAT GitHub repo. Or use the template provided in the project itself:
oc create -f semat-openshift-template.json
oc process semat
Using the Source-to-Image approach allows you to set up GitHub webhooks, use a Red Hat base image, have Jolokia added, have the Java memory configuration done, and so on.

Live demo on OpenShift

To try out the application, check the live demo running on OpenShift Online.

In summary, if you have a domain model that changes often, and agility in changing the domain logic is more important than how the user interface looks, check out Apache Isis. It is an incredibly productive and fast business application development framework.
Follow me @bibryam for future blog posts on related topics.

Hexagonal Architecture as a Natural fit for Apache Camel

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

There are architectures and patterns that look cool on paper, and there are ones that are good in practice. Implementing the hexagonal architecture with Camel is both: cool to talk about, and a natural implementation outcome. I love going hexagonal with Camel because it is one of those combinations where the architecture and the tool come together naturally, and many end up doing it without realising it. Let’s see why that is the case.

Why go Hexagonal?

Hexagonal architecture was originally described by Alistair Cockburn as an approach for dividing an application into inside and outside parts. Its intent is to move the focus from multiple conceptual layers of an application to a distinction between the inside and the outside of the application. The inside part represents the domain layer or the business logic, and the outside part consists of all the possible incoming or outgoing interaction points of the application. The same architecture is also known as Ports and Adapters, as the connection between the inside and the outside of the application is realized through ports and adapters. The word “port” is inspired by operating system ports, where any application that conforms to the protocol of a port can send signals to or receive signals from an application. In a sense, a port represents a purposeful conversation. The adapters represent the technology-specific implementations of a port. Depending on the business benefits offered through the port, there might be multiple adapters that expose the port using different technologies.
Hexagonal architecture visualized with Enterprise Integration Patterns
Notice that all ports and adapters are fundamentally similar at the architectural level, but Alistair acknowledges that ports and adapters come in two flavours: primary and secondary, or driving and driven. For example, if there is a simple REST based service that reads from and writes to a database, the REST side of the service would be the primary port and adapter, as it initiates and drives the interactions. The port and adapter on the database side would be the secondary, driven one, as it does not initiate any calls (assuming we are not using any change data capture listeners, in which case this adapter would also be a primary one).
Briefly said, hexagonal architecture helps us avoid multi-layered architectures that are prone to end up being baklava architecture (anti-pattern). Instead it pushes us towards simplified separation of concerns, and onion-architecture, clean architecture, and similar.

Why is Camel Hexagonal in Nature?

Let’s look at the two extremes. A layered architecture manages the complexity of a large application by decomposing it into groups of subtasks, where each group sits at a particular level of abstraction called a layer. Each layer has a specific role and responsibility within the application, and changes made in one layer of the architecture usually don’t impact the components of other layers. In practice, this architecture splits an application into horizontal layers, and it is a very common approach for large monolithic web or ESB applications of the JEE world.
Layered architecture compared to Pipes and Filters pattern
At the other extreme is Camel with its expressive DSL and route/flow abstractions. Based on the Pipes and Filters pattern, Camel divides a large processing task into a sequence of smaller independent processing steps (Filters) connected by channels (Pipes). There is no notion of layers that depend on each other, and in fact, because of its powerful DSL, a simple integration can be done in a few lines and in a single layer. In practice, Camel routes split your application by use case and business flow into vertical flows rather than horizontal layers. A typical Camel application is composed of multiple independently working Camel routes that collaborate to achieve common business goals.
As mentioned previously, services created with Camel tend to end up as a single layer. Whereas this is fine for most of the simpler cases, applying the hexagonal architecture principles helps create better applications on large-scale projects. What I mean by that is: split your Camel routes into two layers that represent the inside and the outside of the application. The inside of the application is represented by Camel routes that implement the business logic of your integration and are intended to be reused by multiple other routes and protocols. The outside of the application is implemented by Camel routes that act as the adapters in the hexagonal architecture, i.e. routes that provide technology-specific logic such as handling a specific protocol, endpoint-specific error handling, and endpoint-specific transactionality and recovery actions.

How to Map Hexagonal Architecture to Camel?

Identify the inside of your application

Even the simplest services created with Camel have some kind of business logic. Usually that is a combination of transforming data, content-based routing, filtering, splitting, aggregating, etc. Very often, none of the out of the box enterprise integration patterns will be applicable and you will have to use your own custom Java bean as part of a Camel route. The awesome part is that Camel is completely non-intrusive: you can develop, test, and use Java beans in Camel routes with absolutely no dependency on the Camel APIs. Camel's bean component makes sure that the bean method parameters are populated with the correct values, and it takes the return value and puts it back into the Camel route. If you have identified routes containing the elements mentioned above, these typically represent the inside of your application. These kinds of routes should not contain logic that is technology and protocol specific. For example, avoid using data that is directly populated by components, such as HTTP headers and JMS headers, and avoid error handling retry logic common for the HTTP protocol, compensating action logic, etc. Instead, keep these inside Camel routes focused on the business logic only and isolated from the outside Camel routes.
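To make the "no dependency on the Camel APIs" point concrete, here is a minimal sketch of such a bean. The class name, the method, and the 20% rate are made up for illustration; what matters is that nothing from org.apache.camel is imported, so the class can be unit tested on its own and later called from a route through the bean component.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical business bean: plain Java, no Camel imports.
// In a real project this would typically be a public class in its own file.
class VatCalculator {

    // Assumed rate, for illustration only.
    private static final BigDecimal VAT_RATE = new BigDecimal("0.20");

    // Takes the net amount (e.g. the message body) and returns the gross amount,
    // rounded to two decimals. The return value would become the new message body.
    public BigDecimal addVat(BigDecimal netAmount) {
        if (netAmount == null || netAmount.signum() < 0) {
            throw new IllegalArgumentException("netAmount must be a non-negative number");
        }
        return netAmount.multiply(BigDecimal.ONE.add(VAT_RATE))
                .setScale(2, RoundingMode.HALF_UP);
    }
}
```

Because the bean knows nothing about exchanges, headers, or endpoints, the same method can sit behind any number of inside routes without change.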

Isolate the inside from the outside

In the hexagonal architecture, the inside of the application is reached through ports that abstract conversations. Camel's direct component is the perfect implementation of a port. It provides synchronous invocation, the same as a method call in Java. It is not technology and protocol specific, it has no specific data format or schema validation requirement, and it can be used to pass any kind of data in and out. Typically the preferred data format to pass is a POJO, as it is the easiest and most flexible structure to manipulate in a Camel route. But if the primary data format in your domain is XML, JSON, or anything else, you can keep to that format as well. There are no strong rules to follow here; use whatever works for you. The only thing that is fixed with the direct component is that it is a synchronous interaction model, and I think that is the correct default. If asynchronicity is required, rather than using the SEDA component for a port, it is better to implement the asynchronous logic either as part of the outside route (if asynchronicity is required by an adapter) or in the inside route (if it is part of the business logic). But don’t limit the port to an asynchronous model only. A port represents a meaningful conversation in the context of a service. In Camel, that is represented by the direct component, which is identified uniquely by a String value in the context of a JVM. One would think that the direct component was implemented as a response to Alistair's port definition.
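The port idea described above can be sketched in plain Java. This is a toy model, not how Camel implements the direct component: the PortRegistry class and its methods are invented here only to mirror the three properties mentioned in the text, namely a unique String name, a synchronous call, and no constraint on the data format.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy model of ports (hypothetical, not a Camel API): each port is a named,
// synchronous entry point into the inside of the application.
class PortRegistry {

    private final Map<String, Function<Object, Object>> ports = new ConcurrentHashMap<>();

    // Registers inside logic under a name that must be unique within this registry.
    void register(String name, Function<Object, Object> insideLogic) {
        if (ports.putIfAbsent(name, insideLogic) != null) {
            throw new IllegalStateException("Port already registered: " + name);
        }
    }

    // Calls the port on the caller's thread and returns its result,
    // just like a regular Java method call.
    Object call(String name, Object body) {
        Function<Object, Object> port = ports.get(name);
        if (port == null) {
            throw new IllegalArgumentException("No such port: " + name);
        }
        return port.apply(body);
    }
}
```

The body is typed as Object on purpose: as with the direct component, the port itself does not care whether a POJO, XML, or JSON passes through it.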

Keep outside out

So we have the business logic of our application implemented as Camel routes accessible only through direct component endpoints as ports. Such a setup allows testing, reusing, and exposing the business logic over multiple protocols to the outside world using other routes. The outside of the application contains any logic that is dependent on the endpoints. Nowadays the most common of these are messaging or file based endpoints for asynchronous interaction, and HTTP based endpoints for synchronous interaction. But it can also be any of the other 200-plus connectors that are present in Camel. Keep in mind that the components you use for the outside not only dictate the interaction model, but usually also define the data format, the transaction semantics, the error handling logic, and potentially other aspects of the application. For example, a SOAP endpoint will perform schema validation, but consuming a JMS message will require an additional validation step. A transactional endpoint will perform a rollback in the case of a failure, but a non-transactional endpoint will require a recovery action; an idempotent endpoint will allow a retry, and a non-idempotent endpoint will not. I have described these kinds of considerations and other related Camel use cases in more detail in the Camel Design Patterns book.
Putting it all together, a Camel based service that exposes some business functionality over SOAP and JMS is visualized below. The same business functionality is accessible through the direct component for both the JMS and the SOAP based routes. Also, on the right hand side, the same route uses an email notifications port and adapter for sending emails.

Ports and Adapters based Camel service
Notice that outside routes are not only on the consumer side; they are also on the producer side, i.e. routes that send messages to other systems (remember driving and driven ports/adapters). The intent of the outside routes is to represent the various adapters that handle everything that is outside specific: the protocol, the data format, and any additional logic specific to the endpoint. In addition, an outside route should prepare the data in the format expected by the port by populating the expected headers and the message body. This allows the same port to be reused by multiple adapter routes. This also includes test fixtures for unit testing Camel routes, and even the error handling code. The error handling construct in Camel (not doTry, doCatch, doFinally, but the onException construct) actually represents a port that is automatically called by the framework on different types of exceptional conditions. Such a concept doesn’t exist in the Java language, but in Camel it is a very commonly used execution path for unhappy scenarios. Treating the error handling flow as just another port in your application (even though it is called by the framework on certain occasions rather than by you) will help you reuse it for common error handling across multiple Camel routes.
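The division of labour between adapters and a port can be sketched in plain Java as follows. Everything in this example is hypothetical (the Order type, the query parameter names, the "id;quantity" text format); it only illustrates that each adapter deals with its own input format and then hands the same prepared data to one shared port.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: two outside adapters reusing one inside port.
class OrderAdapters {

    // The data format expected by the port.
    static final class Order {
        final String id;
        final int quantity;

        Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }
    }

    // The port: business logic that is unaware of any protocol.
    static final Function<Order, String> PLACE_ORDER =
            order -> "order " + order.id + " accepted for " + order.quantity + " items";

    // Adapter for an HTTP-like caller: input arrives as query parameters.
    static String httpAdapter(Map<String, String> queryParams) {
        Order order = new Order(queryParams.get("id"), Integer.parseInt(queryParams.get("qty")));
        return PLACE_ORDER.apply(order);
    }

    // Adapter for a messaging-like caller: input arrives as "id;quantity" text.
    static String messagingAdapter(String textMessage) {
        String[] parts = textMessage.split(";");
        Order order = new Order(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        return PLACE_ORDER.apply(order);
    }
}
```

A unit test can play the role of a third adapter by building an Order directly and calling PLACE_ORDER, which is the same reason test fixtures count as adapters in the text above.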

In Summary

There are no clear rules or guidelines on how to compose an application with Camel routes. Defining those at design time usually limits the creativity of developers at implementation time, while not having guidelines can be a recipe for a spaghetti architecture. In this line of thought, I think hexagonal architecture is sufficiently lightweight and doesn’t kill creativity and imagination during implementation by forcing a specific structure. At the same time, it provides just enough guidance for structuring routes. And the best part is, it naturally fits the Camel programming model.
My suggestion would be to start with the VETRO pattern (Validate, Enrich, Transform, Route, Operate), then apply the hexagonal architecture style (the Edge Component Pattern as described in the Camel Design Patterns book). This is a good starting point for structuring Camel routes for the happy paths. Then pay special attention to achieving data consistency with the various error handling and recovery patterns. And don’t forget, there are no best practices, only good practices in a context. Focus on your context and Camel will be on your side.
Follow me @bibryam for future blog posts on related topics.

Short Retry vs Long Retry in Apache Camel

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

The Camel Design Patterns book describes 20 patterns and numerous tips and best practices for designing Apache Camel based integration solutions. Each pattern is based on a real world use case and provides Camel specific implementation details and best practices. To get a feel for the book, below is an extract from the Retry Pattern describing how to do short and long retries in Apache Camel.

Context and Problem

By their very nature, integration applications have to interact with other systems over the network. With dynamic cloud-based environments becoming the norm, and the microservices architectural style partitioning applications into more granular services, successful service communication has become a fundamental prerequisite for many distributed applications. Services that communicate with other services must be able to handle transient failures in downstream systems transparently and continue operating without any disruption. Transient failures include infrastructure-level faults, loss of network connectivity, timeouts, and throttling applied by busy services. These conditions occur infrequently and are typically self-correcting, so retrying an operation usually succeeds.

Forces and Solution       

Reproducing and explaining transient failures can be a difficult task as these might be caused by a combination of factors happening irregularly and related to external systems. Tools such as Chaos Monkey can be used to simulate unpredictable system outages and let you test the application resiliency if needed. A good strategy for dealing with transient failures is to retry the operation and hope that it will succeed (if the error is truly transient, it will succeed; just keep calm and keep retrying).
To implement retry logic, there are a few areas to consider:

Which failures to retry?

Certain service operations, such as HTTP calls and relational database interactions, are potential candidates for retry logic, but further analysis is needed before implementing it. A relational database may reject a connection attempt because it is throttling against excessive resource usage, or reject an SQL insert operation because of concurrent modification. Retrying in these situations could be successful. But if a relational database rejects a connection because of wrong credentials, or an SQL insert operation has failed because of foreign key constraints, retrying the operation will not help. Similarly with HTTP calls, retrying a connection timeout or response timeout may help, but retrying a SOAP Fault caused by a business error does not make any sense. So choose your retries carefully.
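One way to encode such a decision is a small classifier over exception types. The sketch below uses only standard JDK exception classes; the RetryClassifier name is made up, and the assumption that a given JDBC driver actually throws these specific subclasses (rather than a generic SQLException) has to be verified for your driver.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.sql.SQLIntegrityConstraintViolationException;
import java.sql.SQLInvalidAuthorizationSpecException;
import java.sql.SQLTransientException;

// Hypothetical helper that decides whether a failure is worth retrying.
class RetryClassifier {

    static boolean isRetryable(Throwable failure) {
        // Wrong credentials or a violated constraint will fail again on every retry.
        if (failure instanceof SQLInvalidAuthorizationSpecException
                || failure instanceof SQLIntegrityConstraintViolationException) {
            return false;
        }
        // Transient database conditions and network timeouts may succeed next time.
        return failure instanceof SQLTransientException
                || failure instanceof SocketTimeoutException
                || failure instanceof ConnectException;
    }
}
```

Anything not explicitly recognised is treated as non-retryable here, which is a deliberately conservative default.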

How often to retry?

Once the need for a retry has been identified, the specific retry policy should be tuned to satisfy the nature of both applications: the service consumer with the retry logic and the service provider with the transient failure. For example, if a real time integration service fails to process a request, it might be allowed to do only a few retry attempts with short delays before returning a response, whereas a batch-based asynchronous service may be able to afford more retries with longer delays and exponential backoff. The retry strategy should also consider other factors such as the service consumption contracts and the SLAs of the service provider. For example, a very aggressive retry strategy may cause further throttling and even blacklisting of a service consumer, or it can fully overload and degrade a busy service and prevent it from recovering at all. Some APIs may give you an indication of the remaining request count for a time period and blacklisting information in the response, but some may not. So a retry strategy defines how often to retry and for how long before you should accept the fact that it is a non-transient failure and give up.
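As a rough illustration of such a policy, here is a generic, blocking retry loop with exponential backoff in plain Java. It is not Camel code, and the class, method, and parameter names are invented for this sketch; it only shows the moving parts a policy has to define: how many retries, the initial delay, the growth factor, and a cap.

```java
import java.util.concurrent.Callable;

// Hypothetical blocking retry helper with exponential backoff.
class BackoffRetry {

    // Runs the operation and retries it up to maxRetries times. The delay is
    // multiplied after each failed attempt and capped at maxDelayMillis.
    static <T> T callWithRetry(Callable<T> operation, int maxRetries,
                               long initialDelayMillis, double multiplier,
                               long maxDelayMillis) throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: accept that the failure is not transient
                }
                Thread.sleep(delay); // blocks the calling thread while waiting
                delay = Math.min(maxDelayMillis, Math.round(delay * multiplier));
            }
        }
    }
}
```

A real time service might call this with two or three retries and delays in the tens of milliseconds, while a batch job could afford many more retries and a cap of minutes.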


When retrying an operation, consider the possible side effects of that operation. A service operation that will be consumed with retry logic should be designed and implemented as idempotent: retrying the same operation with the same input data should not have any additional side effects. Imagine a request that has been processed successfully, but whose response never reached the consumer. The service consumer may assume that the request has failed and retry the same operation, which may have unexpected side effects.
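A common way to make an operation safe to retry is to deduplicate on a request identifier. The sketch below is a simplified in-memory version of that idea with invented names; a real service would need a persistent or shared store, and Camel users would normally reach for its Idempotent Consumer EIP instead of hand-rolling this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical idempotent service: a repeated request id returns the stored
// result instead of executing the side effect again.
class IdempotentService {

    private final Map<String, String> resultsByRequestId = new ConcurrentHashMap<>();
    private final AtomicInteger executions = new AtomicInteger();

    String handle(String requestId, String payload) {
        // computeIfAbsent runs the processing at most once per request id.
        return resultsByRequestId.computeIfAbsent(requestId, id -> {
            executions.incrementAndGet(); // stands in for the real side effect
            return "processed:" + payload;
        });
    }

    // Number of times the side effect was actually executed.
    int executionCount() {
        return executions.get();
    }
}
```

With this in place, a consumer that lost the first response can safely send the same request again and will receive the original result.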


Tracking and reporting retries is important too. If certain operations are constantly retried before succeeding or they are retried too many times before failing, these have to be identified and fixed. Since retries in a service are supposed to be transparent to the service consumer, without proper monitoring in place, they may remain undetected and affect the stability and the performance of the whole system in a negative way.

Timeouts and SLAs

When transient failures happen in the downstream systems and the retry logic kicks in, the overall processing time of the retrying service will increase significantly. Rather than thinking about the retry parameters from the perspective of the number of retries and delays, it is important to drive these values from the perspective of service SLAs and service consumer timeouts. So take the maximum amount of time allowed to handle the request, and determine the maximum number of retries and delays (including the processing time) that can be squeezed into that time frame.
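The calculation described above can be sketched as a small helper that starts from the time budget and works out how many retries fit into it. The names are hypothetical, and the model is simplified: every attempt is assumed to take the same processing time, and the delay grows by a fixed multiplier.

```java
// Hypothetical helper that derives the retry count from an SLA budget.
class RetryBudget {

    // Returns the largest number of retries for which the first attempt plus
    // all retry delays and retry attempts still fit within slaMillis.
    static int maxRetriesWithin(long slaMillis, long attemptMillis,
                                long initialDelayMillis, double multiplier) {
        if (attemptMillis <= 0 || initialDelayMillis < 0 || multiplier < 1.0) {
            throw new IllegalArgumentException("invalid retry parameters");
        }
        long elapsed = attemptMillis; // the first attempt always runs
        long delay = initialDelayMillis;
        int retries = 0;
        while (elapsed + delay + attemptMillis <= slaMillis) {
            elapsed += delay + attemptMillis;
            retries++;
            delay = Math.round(delay * multiplier);
        }
        return retries;
    }
}
```

For example, with a 5000 ms budget, 500 ms per attempt, a 200 ms initial delay, and a multiplier of 2, the running totals are 500, 1200, 2100, and 3400 ms; a fourth retry would end at 5500 ms, so maxRetriesWithin(5000, 500, 200, 2.0) returns 3.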


There are a few different ways of performing retries with Camel and ActiveMQ.

Camel RedeliveryPolicy (Short Retry)

This is the most popular and generic way of doing retries in Camel. A redelivery policy defines the retry rules (such as the number of retries and delays, whether to use collision avoidance and an exponential backoff multiplier, and logging), which can then be applied to multiple errorHandler and onException blocks of the processing flow. Whenever an exception is thrown, the rules in the redelivery policy will be applied.
Camel RedeliveryPolicy example
The key differentiator of this retry mechanism is that the Camel error handling logic will not retry the whole route; it will retry only the failed endpoint in the processing flow. This is achieved thanks to the channels that connect the endpoints in the Camel route. Whenever an exception is thrown by a processing node, it is propagated back and caught by the channel, which can then apply various error handling policies.
Another important difference is that Camel-based error handling and redelivery logic is in-memory, and it blocks a thread during retries, which has consequences. You may run out of threads if all threads are blocked and waiting to do retries. The owner of the threads may be the consumer, or some parallel processing construct with a thread pool from the route (such as a parallel splitter, recipient list, or Threads DSL). If, for example, we have an HTTP consumer with ten request processing threads, a database that is busy and rejects connections, and a RedeliveryPolicy with exponential backoff, then after ten requests all the threads will end up waiting to do retries and no thread will be available to handle new requests.
A solution to this thread blocking problem is opting for asyncDelayedRedelivery, where Camel uses a thread pool and schedules the redelivery asynchronously. But the thread pool stores the redelivery requests in an internal queue, so this option can consume all of the heap very quickly. Also keep in mind that there is one thread pool for all error handlers and redeliveries in a CamelContext, so unless you configure a specific thread pool for long-lasting redelivery, the pool can be exhausted in one route and block threads in another. Another implication is that, because of the in-memory nature of the retry logic, restarting the application will lose the retry state, and there is no way of distributing or persisting this state.
Overall, this Camel retry mechanism is good for short-lived local retries, and for overcoming network glitches or short locks on resources. For longer-lasting delays, it is a better option to redesign the application with persistent redeliveries that are clustered and non-thread-blocking (such a solution is described below).
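The trade-off between blocking a thread and scheduling the retry can be illustrated with the JDK's own scheduler. This is a plain Java sketch with invented names, not Camel's implementation of asynchronous delayed redelivery; it shows only the general technique of handing the next attempt to a ScheduledExecutorService instead of sleeping.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical non-blocking retry: failed attempts are rescheduled rather than
// retried after a sleep, so no thread waits between attempts.
class ScheduledRetry {

    static <T> CompletableFuture<T> callAsync(Callable<T> operation, int retriesLeft,
                                              long delayMillis,
                                              ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retriesLeft, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(Callable<T> operation, int retriesLeft, long delayMillis,
                                    ScheduledExecutorService scheduler,
                                    CompletableFuture<T> result) {
        try {
            result.complete(operation.call());
        } catch (Exception e) {
            if (retriesLeft <= 0) {
                result.completeExceptionally(e);
            } else {
                // The pending retry lives in the scheduler's in-memory queue:
                // it is lost on restart and, under load, is where heap usage grows.
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis, TimeUnit.MILLISECONDS);
            }
        }
    }
}
```

The sketch frees the caller's thread, but it inherits the same limits the text describes: the retry state is held only in memory and cannot be persisted or shared across instances.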

ActiveMQ Broker Redelivery (Long Retry)

This retry mechanism has different characteristics to the previous two since it is managed by the broker itself (rather than the message consumer or the Camel routing engine). ActiveMQ has the ability to deliver messages with delays thanks to its scheduler. This functionality is the base for the broker redelivery plug-in. The redelivery plug-in can intercept dead letter processing and reschedule the failing messages for redelivery. Rather than being delivered to a DLQ, a failing message is scheduled to go to the tail of the original queue and redelivered to a message consumer. This is useful when the total message order is not important and when throughput and load distribution among consumers is.
ActiveMQ redelivery example
The difference from the previous approaches is that the message is persisted in the broker message store and survives a broker or Camel route restart without affecting the redelivery timings. Another advantage is that there is no thread blocked for each retried message. Since the message is returned to the broker, the Competing Consumers Pattern can be used to deliver the message to a different consumer. The side effect is that the message order is lost, as the message is put at the tail of the message queue. Also, running the broker with a scheduler has some performance impact. This retry mechanism is useful for long-delayed retries where you cannot afford to have a blocked thread for every failing message. It is also useful when you want the message to be persisted and clustered for the redelivery.
Notice that it is easy to implement the broker redelivery logic manually rather than by using the broker redelivery plug-in. All you have to do is catch the exception and send the message with an AMQ_SCHEDULED_DELAY header to an intermediary queue. Once the delay has passed, the message will be consumed and the same operation will be retried. You can reschedule and process the same message multiple times until giving up and putting the message in a backoff or dead letter queue.
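The decision logic of such a manual redelivery can be sketched without a broker. In the simulation below, queues are plain in-memory collections and a message is reduced to its header map; AMQ_SCHEDULED_DELAY is the header named above, while the retryAttempt header, the class name, and the linear delay growth are assumptions made for this example. With a real ActiveMQ broker, the delay header is honoured only when the broker's scheduler support is enabled.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory simulation of manual broker redelivery.
class ManualRedelivery {

    static final String DELAY_HEADER = "AMQ_SCHEDULED_DELAY"; // delay in milliseconds
    static final String ATTEMPT_HEADER = "retryAttempt";      // invented header name

    final Queue<Map<String, Object>> retryQueue = new ArrayDeque<>();
    final Queue<Map<String, Object>> deadLetterQueue = new ArrayDeque<>();

    // Called when processing fails: reschedule the message or give up.
    void onFailure(Map<String, Object> headers, int maxAttempts, long baseDelayMillis) {
        Integer previous = (Integer) headers.getOrDefault(ATTEMPT_HEADER, 0);
        int attempt = previous + 1;

        Map<String, Object> copy = new HashMap<>(headers);
        copy.put(ATTEMPT_HEADER, attempt);

        if (attempt > maxAttempts) {
            copy.remove(DELAY_HEADER);
            deadLetterQueue.add(copy); // give up
        } else {
            // The delay grows linearly with the attempt number in this sketch.
            copy.put(DELAY_HEADER, baseDelayMillis * attempt);
            retryQueue.add(copy); // stands in for sending to the intermediary queue
        }
    }
}
```

Each time the delayed message is consumed and fails again, the handler is called with the headers it carried, so the attempt counter travels with the message rather than living in the application's memory.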

Side note - I know, shameless plug, but I'm pretty excited about my book on this topic. You can check it out here at a 40% discount until end of June! And hope you like it.

Bet on a Cloud Native Ecosystem, not a Platform

This is a small extract from a longer post I published at The New Stack. Check the original post here.
Recently I wrote about “The New Distributed Primitives for Developers” provided by cloud-native platforms such as Kubernetes and how these primitives blend with the programming primitives used for application development. For example, have a look below to see how many Kubernetes concepts a developer has to understand and use in order to run a single containerized application effectively:
Kubernetes concepts for Developers
Chances are, developers will have to write as much YAML as application code in the container. More importantly, the application itself will rely on the platform more than it ever did before. The cloud native application expects the platform to handle health checks, deployment, placement, service discovery, periodic tasks (cron jobs), atomic units of work (jobs), autoscaling, configuration management, etc. As a result, your application has abdicated and delegated all these responsibilities to the platform and expects them to be handled reliably. And the fact is, your application and the teams involved now depend on the platform on many different levels: code, design, architecture, development practices, deployment and delivery pipelines, support procedures, recovery scenarios, you name it.

Bet on an Ecosystem, not a Platform

The platform is just the tip of the iceberg, and to be successful in the cloud-native world, you will need to become part of a fully integrated ecosystem of tools and companies. So the bet is never about a single platform, or a project or a cool library, or one company. It is about the whole ecosystem of projects that work together in sync, and the whole ecosystem of companies (vendors and customers) that collaborate and are committed to the cause for the next decade or so.  

You can read the full article published on The New Stack here. Follow me @bibryam for future blog posts on related topics.

Fighting Service Latency in Microservices with Kubernetes

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

CPU and network speeds have increased significantly in the last decade, as have memory and disk sizes. Still, one of the possible side effects of moving from a monolithic architecture to Microservices is an increase in service latency. Here are a few quick ideas on how to fight it using Kubernetes.

It is not the network

In recent years, networks have transitioned to more efficient protocols and moved from 1 Gbit to 10 Gbit and even 25 Gbit limits. Applications send much smaller payloads with less verbose data formats. With all that in mind, chances are the bottleneck in a distributed application is not in the network interactions, but somewhere else, such as the database. In that case, we can safely ignore the rest of this article and go back to tuning the storage system :)

Kubernetes scheduler and service affinity

If two services (deployed as Pods in the Kubernetes world) are going to interact a lot, the first approach to reduce the network latency would be to politely ask the scheduler to place the Pods as close together as possible using the node affinity feature (or inter-pod affinity). How close depends on our high availability requirements (covered by anti-affinity), but it can mean co-locating them in the same region, availability zone, rack, or even on the same host.

Run services in the same Pod

Containers/Service co-located in the same Pod

The deployment unit in Kubernetes (the Pod) is what allows a service to be independently updated, deployed and scaled. But if performance is a higher priority, we could put two services in the same Pod, as long as that is a deliberate decision. Both services would still be independently developed, tested, and released as containers, but they would share the same runtime lifecycle in the same deployment unit. That would allow the services to talk to each other over localhost rather than through the service layer, or to use the file system, shared memory, or some other high-performance IPC mechanism on the shared host.

Run services in the same process

If co-locating two services on the same host is not good enough, we could have a hybrid between microservices and a monolith by sharing the same process for multiple services. That means we are back to a monolith, but we could still use some of the principles of Microservices, keep development time independence, and make a compromise in favour of performance on rare occasions.
We could have two different teams develop and release two services independently, but place them in the same container and share the runtime.
For example, in the Java world that would mean placing two .jar files in the same Tomcat, WildFly or Karaf server. At runtime, the services can find each other and interact using a public static field that is accessible from any application in the same JVM. The same approach is used by the Apache Camel direct component (direct-vm when the routes live in different Camel contexts), which allows synchronous in-memory interaction of Camel routes from different .jar files by sharing the same JVM.
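A rough sketch of the static field idea in plain Java could look like the following. All class names, the "pricing" key and the fake price are hypothetical, and the sketch assumes the registry class is loaded by a class loader that both applications share (for example from the server's shared library folder); if each application bundles its own copy, each gets its own static field and they will not see each other.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Shared registry: one static map per class loader that loads this class. */
class InJvmRegistry {
    // The public static field through which co-located services find each other.
    public static final Map<String, Function<String, String>> SERVICES = new ConcurrentHashMap<>();
}

/** Hypothetical service A (shipped in its own .jar): registers itself at start-up. */
class PricingService {
    static void start() {
        InJvmRegistry.SERVICES.put("pricing", sku -> sku + ":9.99");
    }
}

/** Hypothetical service B (shipped in another .jar): calls A in-memory, no network hop. */
class OrderService {
    static String priceFor(String sku) {
        Function<String, String> pricing = InJvmRegistry.SERVICES.get("pricing");
        if (pricing == null) {
            throw new IllegalStateException("pricing service is not started");
        }
        return pricing.apply(sku);
    }
}
```

The call from OrderService to PricingService is a plain synchronous method invocation, which is where the latency gain comes from, and also why the two services now share the same failure domain.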

Other areas to explore

If none of the above approaches seem like a good idea, maybe you are exploring in the wrong direction. It might be better to explore whether alternative approaches such as using a cache, data compression, HTTP/2, or something else might help the overall application performance. Service mesh tools such as Envoy, Linkerd, and Traefik can also help by providing latency-aware load balancing and routing. A completely new area to explore.

Follow me @bibryam for future blog posts on related topics.

It takes more than a Circuit Breaker to create a resilient application

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

Topics such as application resiliency, self-healing, and antifragility are areas of interest of mine. I've been trying to distinguish, define, and visualize these concepts, and create solutions with these characteristics.

Software characteristics
However, I notice over and over again that there are various conference talks about resiliency, self-healing, and antifragility, and often they lazily conclude that Netflix OSS Hystrix is the answer to all of that. It is important to remember that conference speakers are overly optimistic, wishful thinkers, and it takes more than a Circuit Breaker to create a resilient and self-healing application.

Conference level Resiliency

So what does a typical resiliency pitch look like? Use timeouts, isolate in bulkheads, and of course apply the circuit breaker pattern. Having implemented the circuit breaker pattern twice in Apache Camel (first a homegrown version, then using Hystrix), I have to admit that the circuit breaker is perfect conference material, with nice visualization options and state transitions. (I will spare you an explanation of how a circuit breaker works here; I'm sure you will not mind.) And typically, such a pitch concludes that the answer to all of the above concerns is Hystrix. Hurrah!

Get out of the Process

I agree with all the suggestions above such as timeout, bulkhead and circuit breaker. But that is a very narrow-sighted view. It is not possible to make an application resilient and self-healing (not to mention antifragile) only from within. For a truly resilient and self-healing architecture you also need isolation, external monitoring, and autonomous decision making. What do I mean by that?

If you read the Release It! book carefully, you will realize that the bulkhead pattern is not about thread pools. In my Camel Design Patterns book, I've explained that there are multiple levels at which to isolate and apply the bulkhead pattern. Thread pools with Hystrix are only the first level.

Tools for bulkhead pattern
Hystrix uses thread pools to ensure that the CPU time dedicated to your application process is better distributed among the different threads of the application. This prevents a CPU-intensive failure from spreading beyond a thread pool, so other parts of the service still get some CPU time.
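To illustrate this first level of isolation, here is a minimal thread pool bulkhead written with plain java.util.concurrent. It is not the Hystrix API, only a sketch of the same idea; the class name, pool sizes, timeout and fallback handling are my own assumptions. Each dependency would get its own small, bounded pool, so a slow or stuck dependency can exhaust only its own threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** One bounded thread pool per dependency: a failure can use up only this pool. */
class ThreadPoolBulkhead {
    private final ExecutorService pool;

    ThreadPoolBulkhead(int threads, int queueSize) {
        // Fixed number of threads, bounded queue, reject when both are full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Runs the task inside the bulkhead; returns the fallback on rejection, timeout or error. */
    <T> T call(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(task);
        } catch (RejectedExecutionException saturated) {
            return fallback; // bulkhead is full, fail fast instead of piling up work
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return fallback;
        } catch (ExecutionException failed) {
            return fallback;
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Note what this does not cover: the pools still live in one process, so a memory leak or a crash takes all of them down together.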
But what about any other kind of failure that can happen in an application that is not contained in a thread pool? What if there is a memory leak in the application, some sort of infinite loop, or a fork bomb? For these kinds of failures, you need to isolate the different instances of your service through process-level resource isolation, something that is provided by modern container technologies and used as the standard deployment unit nowadays. In practical terms, this means isolating processes on the same host using containers by setting memory and CPU limits.

Once you have isolated the different service instances and ensured failure containment among the different service processes through containers, the next step is to protect against VM/Node/Host failures. In a cloud environment, VMs can come and go even more often, and with that, all process instances on the VM also vanish. That requires distributing the different instances of your service across different VMs and preventing VM failures from bringing down the whole application.

All VMs run on some kind of hardware, and it is important to isolate hardware failures too. If an application is spread across multiple VMs but all of them depend on a shared hardware unit, a failure in the hardware can still affect the whole application.
A container orchestrator such as Kubernetes can spread the service instances over multiple nodes using the anti-affinity feature. Even further, anti-affinity can spread the instances of a service across hardware racks, availability zones, or any other logical grouping of hardware to reduce correlated failures.

Self-Healing from What?

The circuit breaker pattern has characteristics for auto-recovery and self-healing. An open or half-open circuit breaker will periodically let certain requests reach the target endpoint and if these succeed, the circuit breaker will transition to its healthy state.
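For readers who prefer code to state diagrams, the transitions just described can be sketched in a few lines of plain Java. This is a deliberately naive, homegrown version and not how Hystrix or Camel implement it; the class name, the threshold, the wait period and the injected clock are assumptions for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Naive circuit breaker: CLOSED -> OPEN after N failures, OPEN -> HALF_OPEN after a wait. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long retryAfterMillis;  // how long to stay open before a trial call
    private final LongSupplier clock;     // injected so the timing is easy to test
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clock = clock;
    }

    // synchronized keeps the sketch simple, at the price of serializing all calls.
    synchronized <T> T call(Supplier<T> target, T fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < retryAfterMillis) {
                return fallback;         // still open: fail fast, do not touch the target
            }
            state = State.HALF_OPEN;     // wait is over: let one trial request through
        }
        try {
            T result = target.get();
            consecutiveFailures = 0;
            state = State.CLOSED;        // success: back to the healthy state
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;      // trial failed or threshold reached: (re)open
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

In production code the clock would simply be System::currentTimeMillis. Everything this class knows about is the outcome of calls to the target, which is exactly the limitation discussed next.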
But a circuit breaker can protect and recover only from failures related to service interactions. To recover from the other kinds of failures that we mentioned previously, such as memory leaks, infinite loops, fork bombs or anything else that may prevent a service from functioning as intended, we need some other means of failure detection, containment, and self-healing. This is where container health checks come into the picture.
Health checks such as Kubernetes liveness and readiness probes will monitor and detect failures in the services and restart them if required. That is a pretty powerful feature, as it allows polyglot services to be monitored and recovered in a unified way.
Restarting a service will help only to recover from failures. But what about coping with other kinds of behavior such as high load? Kubernetes can scale up and down the services horizontally or even the underlying infrastructure as demonstrated here.

AWS outage handled by Kubernetes
Health checks and container restarts can help with individual service failures, but what happens if the whole node or rack fails? This is where the Kubernetes scheduler kicks in and places the services on other hosts that have enough capacity to run them.
As you can see here, in order to have a system that can self-heal from different kinds of failures, there is a need for way more resiliency primitives than a circuit breaker. The integrated toolset in Kubernetes, in the form of container resource isolation, health checks, graceful termination and start up, container placement, autoscaling, etc., does help achieve application resiliency and self-healing, and even blends into antifragility.

Let the Platform Handle it

There are many examples of developer and application responsibilities that have shifted from the application into the platform. With Kubernetes some examples are:
  • Application health checks and restarts are handled by the platform.
  • Application placements are automated and performed by the scheduler.
  • The act of updating a service with a newer version is covered by Deployments.
  • Service discovery, which was an application level concern, has moved into the platform (through Services).
  • Managing Cron jobs has shifted from being an application responsibility to the platform (through Kubernetes CronJobs).
In a similar fashion, the act of performing timeouts, retries, and circuit breaking is shifting from the application into the platform. There is a new category of tools referred to as Service Mesh, with the more popular members at this moment being:
These tools provide features such as:
  • Retry
  • Circuit-breaking
  • Latency and other metrics
  • Failure- and latency-aware load balancing
  • Distributed tracing
  • Protocol upgrade
  • Version aware routing
  • Cluster failover, etc
That means, very soon, we won't need an implementation of the circuit breaker as part of every microservice. Instead, we will be using one as a sidecar or host proxy. In either case, these new tools will shift all of the network-related concerns to where they belong: from L7 to L4/5.
Image from Christian Posta
When we talk about Microservices at scale, that is the only possible way to manage complexity: automation and delegation to the platform. My colleague and friend @christianposta has blogged about Service Mesh in depth here.

A Resiliency Toolkit

Without scaring you to death, below is a collection of practices and patterns for achieving a resilient architecture by Uwe Friedrichsen.

Resiliency patterns by Uwe Friedrichsen
Do not try to use all of them, and do not try to use Hystrix all the time. Consider which of these patterns apply to your application and use them cautiously, only when a pattern's benefit outweighs its cost.
At the next conference, when somebody tries to sell you a circuit breaker talk, tell them that this is only the starter and ask for the main course.
Follow me @bibryam for future blog posts on related topics.

Some IT Wisdom Quotes from Twitter

I believe the way we interact with Twitter reflects our mood and mindset in general. Here I have collected some of the tweets I've liked and enjoyed reading recently. Let me know if you have others.

The price for free software is your time.

Kelsey Hightower @kelseyhightower

If you don’t end up regretting your early technology decisions, you probably overengineered.

Randy Shoup @randyshoup

Optimize to be Wrong, not Right.

Barry O'Reilly @BarryOReilly

Most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you're probably being slow.

Jeff Bezos, Amazon CEO @JeffBezos

You can't understand the problem up front. The act of writing the software is what gives us insight into it. Embrace not knowing.

Sarah Mei @sarahmei

I love deadlines. I like the whooshing sound they make as they fly by.

Douglas Adams

It is the cloud, it is not heaven.

Everything is a tradeoff... just make them intentionally.

Matt Ranney, Chief Architect Uber @mranney

Microservices simplifies code. It trades code complexity for operational complexity.

Do not strive for reusability, and instead aim for replaceability.

Fred Brooks, @ufried

Signing up for Microservices is signing up for evolutionary architecture. There’s no point where you’re just done.

Josh Evans from Netflix

Inverse bus factor: how many people must be hit by a bus for the project to make progress.

Erich Eichinger @oakinger

If you think good architecture is expensive, try bad architecture.

Brian Foote & Joseph Yoder

API Design is easy ... Good API Design is HARD.

David Carver

If we don’t create the thing that kills Facebook, someone else will.

Facebook’s Little Red Book

The Job of the deployment pipeline is to prove that the release candidate is unreleasable.

Jez Humble @jezhumble

Wait... Isn't forking what #opensource is all about? Nope. The power isn't the fork; it's the merge.

It is not necessary to change. Survival is not mandatory.

W. Edwards Deming

You can sell your time, but you can never buy it back. So the price of everything in life is the amount of time you spend on it.

Hope reading this post was worth the time you spent on it :) Follow me @bibryam for future blog posts on related topics.

New Distributed Primitives for Developers

(This post was originally published on Red Hat Developers, the community to learn, code, and share faster. To read the original post, click here.)

Object-Oriented Primitives (in-process primitives)

As a Java developer, I'm very familiar with object-oriented concepts such as class, object, inheritance, encapsulation, polymorphism, etc. In addition to the object-oriented concepts, I'm also very familiar with the Java runtime: what features it provides, how I can tune it, how it manages my applications, what the lifecycle of my objects and the application as a whole would be, etc.

And for over a decade, all of that has been the primary set of tools, primitives and building blocks I've used as a developer to create applications. In my mental model, I would use classes as components, which would give birth to objects that are managed by the JVM. But that model has started to change recently.

Kubernetes Primitives (distributed primitives)

In the last year, I began to run my Java applications on Kubernetes, and that introduced new concepts and tools for me to use. With Kubernetes I don't rely only on the object-oriented concepts and the JVM primitives to implement the whole application behavior. I still need to use the object-oriented building blocks to create the components of the application, but I can also use Kubernetes primitives for some of the application behavior.

For example, I now strive to organize the units of application behavior into independent container images, which become the main building blocks. That allows me to use a new, richer set of constructs provided by Kubernetes to implement the application behavior. For example, I no longer rely only on an implementation of ExecutorService to run some service periodically; I can also use the Kubernetes CronJob primitive to run my container periodically. The Kubernetes CronJob provides similar temporal behavior, but uses higher level constructs and relies on the scheduler to do dynamic placement, perform health checks, and shut down the container when the Job is done. All that ends up in more resilient execution, with better resource utilization as a bonus. If I want to perform some application initialization logic, I could use the object constructor, but I could also use an init container in Kubernetes to carry out the initialization at a higher level.
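As a reminder of what the in-process primitive looks like, here is a small sketch using ScheduledExecutorService (the scheduling flavour of ExecutorService). The class and method names are hypothetical, and the run limit exists only so that the example terminates; a real service would keep the scheduler alive for the lifetime of the JVM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-process periodic execution: the schedule lives and dies with this JVM. */
class InProcessPeriodicJob {

    /** Runs the job every periodMillis until it has run at least 'runs' times. */
    static int runPeriodically(Runnable job, long periodMillis, int runs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(runs);
        AtomicInteger executed = new AtomicInteger();

        scheduler.scheduleAtFixedRate(() -> {
            job.run();
            executed.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(); // block until the requested number of runs has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow(); // nobody restarts this for us if the JVM goes away
        }
        return executed.get();
    }
}
```

Everything around the schedule (where the code runs, what happens when the process dies, cleaning up afterwards) is the application's own problem here, which is the part a CronJob moves into the platform.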

The Distributed Mental Model

Having in-process primitives in the form of object-oriented concepts and the JVM features, combined with distributed out-of-process primitives provided by Kubernetes, gives developers a richer set of tools to create better applications. When building a distributed application, my mental model is no longer limited to a JVM, but spreads across a couple of nodes with multiple JVMs running in coordination.

The in-process primitives and the distributed primitives have commonalities, but they are not directly comparable and replaceable. They operate at different abstraction levels and have different preconditions and guarantees. Some primitives are supposed to be used together; for example, we still have to use classes to create objects and put them into container images. But some other primitives, such as CronJob in Kubernetes, can replace the ExecutorService behavior in Java completely. Here are a few concepts for which I find commonalities between the JVM and Kubernetes, but don't take the comparison any further.

With time, new primitives give birth to new ways of solving problems, and some of these repetitive solutions become patterns. Check out my in-progress Kubernetes Patterns book for this line of thinking.

CloudNativeCon + KubeCon Europe 2017 Impressions

I was lucky to get my Cloud Native Patterns (video, slides) lightning talk accepted and attend CloudNativeCon + KubeCon Europe 2017 in Berlin. The following is a quick braindump / cameradump while the adrenaline and the excitement of the conference are still in my veins.

The conference had 1200 attendees, which is 3x bigger than last year's conference in London.

A few quick stats about the Kubernetes community (video) by Chen Goldberg

What is Cloud Native and Why Should I Care (video)? by Alexis Richardson

The software is eating the world.
Open source is eating the software.
Cloud (is that Cloud Native?) is eating open source.

All sessions were really well attended and packed, and in some sessions people were not let in. Below is a shot from Autoscaling in Kubernetes (video) by Marcin Wielgus.

It was also interesting to see that Philips Hue (smart lights) started evaluating Kubernetes after last year's KubeCon, and today they run the whole smart light backend in production.

A common theme across a few sessions was the fact that Kubernetes makes the life of Ops easy, but the life of developers harder. The entry level for Kubernetes is quite high, which prevents faster adoption.

Michelle Noorali from Deis gave an excellent talk getting this point across, and so did Joe Beda.
Coming from a Java background, this is a topic that is close to my heart as well. I've been trying to educate the Java community why containerized Cloud Native and Kubernetes matter. And it is great to see that it is a widely recognized theme and a priority for the cloud native community.

Lots of companies presented at the conference, from big players such as Google, Red Hat, IBM and Microsoft (which also offer Kubernetes as a service), to Mesosphere, and many other smaller companies and new startups, where everybody does something around Cloud Native. (It would have been nice if Cloud Foundry had also shown up, as the pioneers in Cloud Native.)

The containerised USB sticks and the Kubernetes-based OpenShift books have all gone.

If you are looking to get involved in the cloud native world, check out the Job Board below for ideas, and the Red Hat jobs site as well.

Final thoughts:

  • At these events, you can see and feel how CNCF is building a great community of users accompanied by a collaborative ecosystem of companies.
  • At least half of the keynote sessions were given by women. That is at least 10x higher than other Open Source conferences.
  • Kubernetes (and other CNCF projects) have to become more user/developer friendly. Expect that to happen next!
  • All recordings from the conference are on YouTube already. Check them out, feel the vibe and become part of it.
  • Don't miss CloudNativeCon + KubeCon December 6-8 2017 in Texas.

About Me