Architecting messaging solutions with Apache ActiveMQ Artemis

Wednesday, July 01, 2020 No comments

As an architect in the Red Hat Consulting team, I’ve helped countless customers with their integration challenges over the last six years. Recently, I had a few consulting gigs around Red Hat AMQ 7 Broker (the enterprise version of Apache ActiveMQ Artemis), where the requirements and outcomes were similar. That similarity made me think that the whole requirement identification process and can be more structured and repeatable.

This guide is intended for sharing what I learned from these few gigs in an attempt to make the AMQ Broker architecting process, the resulting deployment topologies, and the expected effort more predictable—at least for the common use cases. As such, what follows will be useful for messaging and integration consultants and architects tasked with creating a messaging architecture for Apache Artemis, and other messaging solutions in general. This article focuses on Apache Artemis. It doesn’t cover Apache Kafka, Strimzi, Apache Qpid, EnMasse, or the EAP messaging system, which are all components of our Red Hat AMQ 7 product offering.

Typical customer requirements

In my experience, a typical middleware use case has fairly basic messaging requirements and constraints that fall under a few general categories. Based on the findings in these areas, there are a few permutations of the possible solutions with pros and cons, and the final resulting architectures are fairly common. Designing, documenting, communicating the constraints and implementing these common architectures, should be well understood by messaging SMEs. Anything different from these standard architectures should be expected to require additional effort and lead to a bespoke architecture with unique non-functional and operational characteristics.

This article covers the following hypothetical but common messaging scenario. Here is a customer describing the typical messaging requirements:

We have around 100 microservices with Spring Boot and Apache Camel that use messaging extensively.
All of our services are scalable and high availability (HA), and we expect the messaging layer to have similar characteristics.
We use mostly point-to-point but we also have a few publish-subscribe interactions.
Most of our messages are small in the KB range, but there are those that are fairly large in the single-digit MBs range.
We don’t know our current message throughput, it changes as we add new services using messaging.
We don’t use any exotic features, but we have use cases with message selectors, scheduled delivery, and TTL.
We need to preserve the message ordering with and without message grouping.
We primarily use JMS from our Java-based services and AMQP from the few .NET services.
We don’t like XA, but in a handful of services, we use distributed transactions involving the message broker.
We can replay messages if necessary, but we cannot lose any message, and we use only persistent messages.
We put all failed messages in DLQs and discard later.
We want to know all of the best practices and naming conventions.
All broker-to-broker communication and client-to-broker communication must be secured.
We want to control who can create queues, read and write messages, and browse.

If you hear these above requirements, you are in familiar territory, and this article should be useful to you. If not, and there are specific hardware, throughput, topological, or other requirements, clone the Apache Artemis repo and go deeper. And don’t forget to share what you learn with others later.

The constraints identification approach

In addition to the obvious customer requirements and wishes, other hard and soft constraints will shape the resulting architecture. The customer might or might not be aware of these constraints and dependencies, and it is your job to dig deep and discover them all.

The approach I follow is to start from the fundamental and hard to change requirements, such as infrastructure and storage, as shown in Figure 1. Explore what options there are for each and document the constraints with pros and cons. Then, do the same for the orchestration layer, if present.

The fundamental constraints will then dictate what is possible in the upper layers, such as options for high availability and scalability. Further up in the layers the flexibility increases, and one can choose swap load balancers and different client implementations without impacting the layers below.

Figure 1: Breaking down higher-level requirements into specific constraints.

Finding the answers to these points and identifying what is most important and where the customer is willing to make a compromise will help you identify a workable architecture. Next, let’s go deeper, and see what the specific constraints are for an Apache Artemis-based solution.

Infrastructure

This an area where the customer will have the least amount of flexibility, and your goal is to identify how the message broker fits within the available infrastructure in a reliable configuration. It is unlikely that a customer will change their infrastructure provider for their messaging needs, so try to identify a fit-for-purpose solution.

Typically, common messaging infrastructures are based on on-premise infrastructure with virtualizers, NFS storage, and F5 load balancers. This infrastructure all can be within a single data center or spread across two data centers (rather than three, unfortunately). In an alternative scenario, the customer might be using AWS (or equivalent), such as EC2, EBS, EFS, RDS, or ELBs. Typically, all of these options spread across three AZs in a single region. That is the most common AWS setup for a small-to-midsize integration use case.

Apart from computing, storage, and load balancers, at this stage, we want to identify the data center's topology, network latency, and throughput. Is the client using a single datacenter, two data centers, or any other odd number? Is it an active-active, or active-passive data center topology?

Last but not least, what are the operating system, JDK, and client stacks? This information is easily verifiable from the AMQ 7-supported configuration page, including what is tested, what is supported, for how long, and whatnot.

While on-premise and cloud-based infrastructures offer similar resources, the difference typically is in the number of data centers and the operation, failover, and disaster recovery models. Influencing these fundamental models is a slow process, which is why we want to identify these constraints first.

Storage

Once we have identified the broader infrastructure level details, the next step is to focus on storage. Storage is a part of the infrastructure layer but it requires separate considerations here. When HA is a requirement (which is always the case), storage is the most critical and limiting factor for the messaging architecture. Pay special attention to what options the customer's infrastructure offers, as the answer will significantly limit the possible deployment topologies.

Storage capacity

Capacity is hardly a real issue, as typically there are many unknowns when estimating the exact storage capacity required. Most customers:

Use the message broker as their temporary staging area, where messages are consumed as fast as the consumers can handle. Typically there are no consumer service RPOs defined, and it is not clear for how long messages can keep accumulating.
Put messages into DLQs, but will not have a clear idea of what to do with these messages later. Replaying failed messages is dependent on the actual business requirements and is not always desirable.
Expect that if a message is 1MB in size, it will consume 1MB on the disk. As you all know from experience by now, that is not the case. The same message could end up consuming multiple times more storage, depending on the type of messaging interaction style, caching, and other configurations.

All of these and other scenarios can lead to the accumulation of messages in the broker and consume hundreds of gigabytes of storage. If the customer has no answers to these points, the only proven approach for estimating the required storage size is "finger in the air." Luckily, Artemis—like its predecessors—has flow control, which can protect the broker from running out of storage. This question typically comes down to whether to throw an exception or block the producers to protect the broker.

Storage type

Storage type is much more important and dictates what high-availability options will be required later. For example, if the broker is on Kubernetes, there is no master/slave, and therefore, there is no need for a shared file system with a distributed lock such as NFSv4, GFS2, or GlusterFS. But, the file system should ensure that the journal has high availability.

When the broker is on VMs (not on Kubernetes), the simpler option to implement and operate is to use a supported shared file system. Notice that AWS EFS service is not a full NFSv4 spec, but it is still supported as a shared storage option for Artemis. If a shared file system is not present, alternatively, you can use a relational database as storage with a potential performance hit. Check which relational databases are supported (currently, that is Oracle, DB2, MSSQL). Note that using AWS RDS is a viable option here too.

If no shared file system or relational database is possible, you can consider replication. Replication requires additional considerations. One big advantage of replication is that the messaging and middleware team will not depend on any storage team to provision the infrastructure. Also, there won’t be a cost for shared filesystems or relational databases, and the broker performs its data replication. There are customers who like this aspect, but all good things come at a cost, such as the fact that a reliable replication requires a minimum of three master and three slave brokers to avoid split-brain situations.

There is also the option of using the network pinger, which is risky and not recommended in practice. The network pinger avoids the need for three of everything, but you should only use the network pinger if you are unable to use three or more live backup groups. If you are using the replication high availability policy, and if you have only a single live backup pair, configuring network pinging reduces (but does not remove) the chances of encountering network isolation.

Another cost is that split-brain could happen, not only for network partitions and server crashes, but also as a consequence of overload, CPU starvation, long I/O waits, long garbage collection pauses, and other reasons. Also, replication can happen only within a single datacenter and LAN and requires a reliable, low-latency network. AWS AZs in the same region are considered different data centers, as Amazon does not commit to networking latency SLAs either. Finally, replication also has a performance hit compared to a shared store option.

Ultimately, the critical point about storage is that while we can make the broker process and the client process HA, the datastore itself also has to be HA and durable, and this is possible only through data replication. As part of AMQ architecture, it is important to identify who is replicating the data (the file system, the database, or the message broker itself through replication) to ensure that the data is highly available.

Orchestration

Here, the question boils down to checking whether the customer will run the message broker on container orchestrators such as Kubernetes and Red Hat OpenShift, on bare VMs through homegrown bash scripts, or Red Hat Ansible playbooks. If the customer is not targeting OpenShift, the questions in this section can be skipped. If the messaging infrastructure will run on containers and be orchestrated by Kubernetes, there are a few constraints and architectural implications to consider.

For example, there is no master/slave failover (so no hot backup broker present). Instead, there is a single pod per broker instance that is health monitored and restarted by Kubernetes, which ensures broker HA. The single pod failover process with Kubernetes is different from master/slave with replication failover on-premise. Because there is no master/slave failover, there is no need for message replication between master/slave either. There is also no need for distributed file locking, which means that there is no need for a shared file system with distributed locking capabilities and that one can still use these file systems to mount the same storage to different Kubernetes nodes and pods, but the locking capability of the file system is not a prerequisite any longer.

For example, in the case of a node failure, Kubernetes would start a broker pod on a different node and make the same PV and data available. Because there is no master/slave, there is no need for ReadWriteMany, but only for ReadWriteOnce volume types. That said, you might still need a shared file system that can be mounted to different nodes in the case of node failure (such as AWS EBS, which can be mounted to different EC2 instances in the same region).

So, are there messaging clients located outside of the cluster? Connecting to the broker from within an OpenShift cluster is straightforward through Kubernetes services, but there are restrictions for connecting to the broker from outside of the Kubernetes cluster.

Next, can external clients use a protocol that supports SNI? The easiest option is typically to use SSL and access the broker from the router. If using TLS for clients is not possible, consider using NodePort binding which requires cluster-admin permissions.

Finally, there is a scaledown controller to drain and migrate messages when scaling down broker pods in a cluster.

There might be a few other differences, but failover, discovery, and scaledown is automated, and the broker fundamentals do not change on Kubernetes and Openshift.

High availability

When a customer talks about "high availability," what they mean is a full-stack, highly-available messaging layer. That means HA storage, HA brokers, HA clients, HA load balancers, and HA anything else that might be in between. To cover this scenario, you have to consider the availability of every component in the stack, as shown in Figure 2:

Storage

The only way to ensure HA for data is by replicating the data. You have to identify who replicates the data and where the data is replicated: locally, across VMs, across DCs, and so on. Most customers will want to survive a single data center outage without a message loss, which requires a cross data center replication mechanism. The easiest option is to delegate the journal replication to the file system. This option has implications on cost and dependency on infrastructure teams. For example, if you replicate data using a database, consider the cost and performance hit. If you replicate data using Artemis journal replication but consider the customer's maturity to operate a broker cluster, consider split-brain scenarios, data center latencies, and performance hits.

Broker

On Kubernetes, broker HA is achieved through health checks and container restarts. On-premise, the broker HA is achieved through master/slave (shared store or replication). When replication is used, the slave will already hold the queues in memory, and therefore is pretty much ready to go in case of failover. With shared storage, when the slave gets hold of the lock, then the queues need to be read from the journals ahead of the slave takeover. The time for a shared storage slave to take over will be dependent on the number and size of messages in the journal.

When we talk about broker HA, it comes down to an active-passive failover mechanism (with Kubernetes being an exception). But Artemis also has an active-active clustering mechanism used primarily for scalability rather than HA. In active-active clustering, every message belongs to only one broker, and losing an active broker will make its messages also unaccessible—but a positive side effect of that issue is that the broker infrastructure is still up and functioning. Clients can use active instances and exchange messages with the drawback of temporarily not accessing the messages that are in the failed broker. To sum up, active-active clustering is primarily for scalability, but it also partially improves the availability with temporary message unavailability.

Load balancer

If there is a load balancer, prefer one that is already HA in the organization, such as F5s. If Qpid is used, you will need two or more active instances for high availability.

Clients

This is probably the easiest part, as most customers will already run the client services in redundantly HA fashion, which means two or more instances of consumers and producers most of the time. A side effect of running multiple consumers is that message ordering is not guaranteed. This is where message groups and exclusive consumers can be used.

Scalability

Scalability is relatively easier to achieve with Artemis. Primarily, there are two approaches to scaling the message broker.

Active-active clustering

Create a single logical broker cluster that is scaled transparently from the clients. This can be three masters and three slaves (replication or shared storage doesn’t matter) to start with, which means that clients can use any of the masters to produce and consume the messages. The broker will perform load balancing and message distributions. Such a messaging infrastructure is scalable and supports many queues and topics with different messaging patterns. Artemis can handle large and small messages effectively, so there is no need for using separate broker clusters depending on the message size either.

A few of the consequences of active-active clustering are:

Message ordering is not preserved.
Message grouping needs to be clustered.
Scaling down requires message draining.
Browsing the brokers and the queues is not centralized.

Client-side partitioning

Create separate, smaller master/slave clusters for different purposes. You can have a separate master/slave cluster for real-time, batch, small messages, large messages, per business domain, criticality, team, etc. When a broker pair reaches its capacity limit, create a separate broker pair and reconfigure clients to use it.

This technique works as long as the clients can choose which cluster to connect to (hence the name client-side partitioning). There is also a use case here for Apache Qpid where you can add new brokers and assign addresses to them without the clients needing to know anything about the location of these brokers, and therefore, simplifying the clients and making the messaging network dynamic.

Load balancer

While a load balancer is not a mandatory component of the messaging stack, it is an architectural decision whether you are going to use client-side load balancing or an external load balancer. With an external load balancer, you have the fact that customers like using their existing load balancers such as F5 for messaging, too. Plus, load balancers:

Already exist in many organizations, they are already HA, and they support many protocols—so it makes sense to use them for messaging too.
Allow a single IP for all clients.
Can do health checks and failover to the master broker (that is, a probe attempting to connect to the relevant acceptor).

There are clients, such as the .NET client with the AMQP protocol, do not support the failover protocol OOTB (here is also an example showing how to mitigate this limitation). Using a load balancer helps with these clients.

Apache Qpid

Apache Qpid can act as an intelligent load balancer for the AMQP protocol only. It supports closest-, lowest latency-, and multicasting-type distributions. You will need to run multiple instances of Qpid to make it HA. That means the clients have to be configured to use multiple IPs. Qpid can also support many topologies, and allow having connections from more secure to less secure directions rather than the other way.

Qpid comes into its own in a geographically-spread meshing message where clients do not know the location of each other and any of the brokers they might be sending messages to, bi-directional messaging beyond the firewall, and building redundant messaging network routes. It's also easy to scale the number of brokers without changing the clients, and the topology can change without a change to the clients as well (dynamic messaging infrastructure).

You can also use Qpid to create multiple brokers sharding an address without the need to use broker clustering, but this feature is only useful if message ordering is not that important, or to act as a client connection concatenator (especially useful for IoT scenarios).

Client-side load balancing

Using a load balancer is not required in reality. You can configure messaging clients to connect to a broker cluster directly. A client can connect to a single broker and discover all other brokers, changing topologies, etc. Consider what happens if the single broker the client is trying to connect is down. For the answer, a list of broker IPs can be passed, and custom load balancing strategies implemented.

Client-side load balancing has advantages: The client can publish to multiple brokers and perform load balancing. This feature can be disabled if publishing to a single broker is required. The downside here is that the client-side load balancing is a client-specific implementation, and the options mentioned here vary across clients.

Clients

With Artemis, there are multiple clients, protocols, and possible combinations. In terms of protocols, here are a few high-level pointers. Another thing to consider here is which protocol can be converted to which protocols when consumers and producers use different wire protocols and clients. The protocol and client choices are unlikely to impact the broker architecture, but they will impact the client service development efforts, and this issue can easily turn into a mess.

AMQP 1.0

AMQP 1.0 should be the default starting option when possible. This option is one of the most tested and used. It is also cross-language, and the only supported option for .NET clients. Keep in mind that Interconnect (the enterprise version of Apache Qpid) supports AMQP 1.0 only, and if Interconnect is in the architecture, the clients have to use AMQP to interact with it.

A limitation of AMQP is that it does not offer XA transaction support

Core

The Core protocol is one of the most advanced, feature-rich, and tested protocols for Artemis. It is the only supported protocol when using EAP with embedded Artemis, and it is the recommended protocol when XA is required.

OpenWire

This protocol is here for legacy compatibility reasons with AMQ 6 (Apache ActiveMQ broker) clients. It is useful in situations when the client code cannot change, so you are stuck with OpenWire. An attractive point about this protocol is that it supports XA.

Reference architectures

Having identified requirements, dependencies, and specific constraints, the next step is coming up with possible deployment architectures. I’m a firm believer in the mantra, "There is no reference architecture for the real world." Consequently, there is no simple process to follow and map the findings to a target architecture. It is the combination of all requirements, constraints, and possible compromises that lead to identifying the most suitable architecture for a customer.

For demonstration purposes, the following are common Artemis deployment topologies for AWS, on bare VMS instead of Kubernetes. The same topologies also apply for on-premise deployments where similar alternative infrastructure services are present. The considerations that apply to all of the deployments below are:

Client-side load balancing or a load balancer can be used for all of these deployments.
Load balancers can be co-located with the broker, client, in a dedicated layer, or a combination of these.
Slave brokers can be kept in separate hosts as demonstrated below, or co-located with a master broker.

Non-clustered Apache Artemis with shared storage

The simplest HA architecture for Artemis is a single master/slave cluster with shared storage. The example that follows is a scalable version of that set up with two separate master/slave clusters. Notice that there is no clustering (server-side message distribution or load balancing) between the masters. As a result, the clients need to decide which master/slave cluster to use.

Pros

The pros of this approach are:

It is a simple but highly available Artemis configuration and operational model.
It is the same topology as in Apache ActiveMQ with master/slave.
There is no possibility for split-brain, no stuck messages, and message order guaranteed.

Cons

The cons are that this approach requires a shared file system or database, which has an additional cost. Typically, database-based storage is expected to perform worse than file-based storage.

Other notes

Additional notes include the fact that journal high availability is achieved through file system or database (AWS EFS or RDS ) data replication. Optionally, masters can be clustered for message distribution, load balancing, and scalability. The number of master/slave pairs can vary (there are two in Figure 3), and scaling is achieved by adding more master/slave pairs and using client-side partitioning.

Also, in the case of VM or DC failure, it ensures HA.

Figure 3: Apache Artemis with a shared file and database store.

Clustered Apache Artemis with shared storage

In this topology, we have three master/slave pairs, ensuring HA. In addition, all of the masters are clustered and provide server-side load balancing and message distribution. In this setup, the clients can connect to any member of the cluster and exchange messages. Such a cluster can also scale and change topology without affecting client configuration.

Pros

The pros of this option are that it offers:

The same topology as in Apache ActiveMQ with master/slave and Network-of-Brokers.
Server-side message distribution and load balancing.
No possibility for split-brain scenarios.

Cons

The cons are that it requires a shared file system or database, which has an additional cost. Typically, database-based storage is expected to perform worse than file-based storage.

Other notes

With this approach, journal high availability is achieved through file system or database (AWS EFS/RDS ) data replication. Optionally, masters can be non-clustered to prevent server-side load balancing. The number of master/slave pairs can vary (there are three in Figure 4), and scaling is achieved by adding more master/slave pairs transparently to the clients.

Finally, this approach ensures HA in the case of VM or DC failure.

Figure 4: Apache Artemis with a shared file and database store.

Clustered Apache Artemis with replication

This architecture is a variation of the previous one, where we replace shared storage between Master and slave with replication. As such, this architecture has all of the benefits of server-side load-balancing and transparency for the clients. An added benefit of this architecture is that it does not require a highly-available shares storage layer. Instead, the brokers replicate the data.

Pros

The pros of this approach are that:

Data replication is performed by the broker, not by the infrastructure services.
There is no extra cost or dependency on the infrastructure for journal replication.
It offers scalable and highly available messaging infrastructure.

Cons

The cons are that:

Replication is sensitive to network latency, opening the possibility of split-brain scenarios. Notice that the replication in Figure 4 is within the same DC.
Compared to other options, this one has complex configuration and operational models.
It requires a minimum of three master and three slave brokers (as in the diagram below).

Other notes

With this approach:

The number of master/slave pairs can be different (odd number required).
Optionally, server-side message distribution and load balancing can be disabled.
It ensures HA in the case of VM failure, but not in the case of DC failure.
It requires a quorum and a certain number of brokers to be alive.

Capacity planning

The numbers and ranges shown in Figure 5 are provided only as a guide and starting point. Depending on the use case, you might have to scale up or down your individual architectural components.

Summary

Over the years, I have hardly seen two messaging architectures that are absolutely the same. Every organization has something unique in the way they manage their infrastructure and organize their teams, and that inevitably ends up reflected in the resulting architectures. Your job as a consultant or architect is to find the most suitable architecture within the current constraints, and educate and guide the customer towards the best possible outcome. There is no right or wrong architecture, but deliberate trade-off commitments in a context.

In this article, I tried to cover as many areas of Artemis as possible from an architecturally significant point of view. But by doing so, I had to be opinionated, ignore other areas, and emphasize what I think is significant based on my experience. I hope you find it useful and learned something from it. If that is the case, say something on Twitter and spread the word. This post was originally published on Red Hat Developers. To read the original post, check here.

Multi-Runtime Microservices Architecture

Thursday, May 28, 2020 No comments

Creating good distributed applications is not an easy task: such systems often follow the 12-factor app and microservices principles. They have to be stateless, scalable, configurable, independently released, containerized, automatable, and sometimes event-driven and serverless. Once created, they should be easy to upgrade and affordable to maintain in the long term. Finding a good balance among these competing requirements with today’s technology is still a difficult endeavor.

In this article, I will explore how distributed platforms are evolving to enable such a balance, and more importantly, what else needs to happen in the evolution of distributed systems to ease the creation of maintainable distributed architectures. If you prefer to see my talk on this very same topic, checkout my QConLondon recording at InfoQ.

Distributed application needs

For this discussion, I will group the needs of modern distributed applications into four categories — lifecycle, networking, state, binding — and analyze briefly how they are evolving in recent years.

Distributed application needs

Lifecycle

Let’s start with the foundation. When we write a piece of functionality, the programming language dictates the available libraries in the ecosystem, the packaging format, and the runtime. For example, Java uses the .jar format, all the Maven dependencies as an ecosystem, and the JVM as the runtime. Nowadays, with faster release cycles, what’s more important with lifecycle is the ability to deploy, recover from errors, and scale services in an automated way. This group of capabilities represents broadly our application lifecycle needs.

Networking

Almost every application today is a distributed application in some sense and therefore needs networking. But modern distributed systems need to master networking from a wider perspective. Starting with service discovery and error recovery, to enabling modern software release techniques and all kinds of tracing and telemetry too. For our purpose, we will even include in this category the different message exchange patterns, point-to-point and pub/sub methods, and smart routing mechanisms.

State

When we talk about state, typically it is about the service state and why it is preferable to be stateless. But the platform itself that manages our services needs state. That is required for doing reliable service orchestration and workflows, distributed singleton, temporal scheduling (cron jobs), idempotency, stateful error recovery, caching, etc. All of the capabilities listed here rely on having state under the hood. While the actual state management is not the scope of this post, the distributed primitives and their abstractions that depend on state are of interest.

Binding

The components of distributed systems not only have to talk to each other but also integrate with modern or legacy external systems. That requires connectors that can convert various protocols, support different message exchange patterns, such as polling, event-driven, request/reply, transform message formats, and even be able to perform custom error recovery procedures and security mechanisms.

Without going into one-off use cases, the above represent a good collection of common primitives required for creating good distributed systems. Today, many platforms offer such features, but what we are looking for in this article is how the way we used these features changed in the last decade and how it will look in the next one. For comparison, let’s look at the past decade and see how Java-based middleware addressed these needs.

Traditional middleware limitations

One of the well-known traditional solutions satisfying an older generation of the above-listed needs is the Enterprise Service Bus (ESB) and its variants, such as Message Oriented Middleware, lighter integration frameworks, and others. An ESB is a middleware that enables interoperability among heterogeneous environments using a service-oriented architecture (i.e. classical SOA).

While an ESB would offer you a good feature set, the main challenge with ESBs was the monolithic architecture and tight technological coupling between business logic and platform, which led to technological and organizational centralization. When a service was developed and deployed into such a system, it was deeply coupled with the distributed system framework, which in turn limited the evolution of the service. This often only became apparent later in the life of the software.

Here are a few of the issues and limitations of each category of needs that makes ESBs not useful in the modern era.

Lifecycle

In traditional middleware, there is usually a single supported language runtime, (such as Java), which dictates how the software is packaged, what libraries are available, how often they have to be patched, etc. The business service has to use these libraries that tightly couple it with the platform which is written in the same language. In practice, that leads to coordinated services and platform upgrades which prevents independent and regular service and platform releases.

Networking

While a traditional piece of middleware has an advanced feature set focused around interaction with other internal and external services, it has a few major drawbacks. The networking capabilities are centered around one primary language and its related technologies. For Java language, that is JMS, JDBC, JTA, etc. More importantly, the networking concerns and semantics are deeply engraved into the business service as well. There are libraries with abstractions to cope with the networking concerns (such as the once-popular Hystrix project), but the library’s abstractions "leak" into the service its programming model, exchange patterns and error handling semantics, and the library itself. While it is handy to code and read the whole business logic mixed with networking aspect in a single location, this tightly couples both concerns into a single implementation and, ultimately, a joint evolutionary path.

State

To do reliable service orchestration, business process management, and implement patterns, such as the Saga Pattern and other slow-running processes, platforms require persistent state behind the scenes. Similarly, temporal actions, such as firing timers and cron jobs, are built on top of state and require a database to be clustered and resilient in a distributed environment. The main constraint here is the fact that the libraries and interfaces interacting with state are not completely abstracted and decoupled from the service runtime. Typically these libraries have to be configured with database details, and they live within the service leaking the semantics and dependency concerns into the application domain.

Binding

One of the main drivers for using integration middleware is the ability to connect to various other systems using different protocols, data formats, and message exchange patterns. And yet, the fact that these connectors have to live together with the application, means the dependencies have to be updated and patched together with the business logic. It means the data type and data format have to be converted back and forth within the service. It means the code has to be structured and the flow designed according to the message exchange patterns. These are a few examples of how even the abstracted endpoints influence the service implementation in the traditional middleware.

Cloud-native tendencies

Traditional middleware is powerful. It has all the necessary technical features, but it lacks the ability to change and scale rapidly, which is demanded by modern digital business needs. This is what the microservices architecture and its guiding principles for designing modern distributed applications are addressing.

The ideas behind the microservices and their technical requirements contributed to the popularization and widespread use of containers and Kubernetes. That started a new way of innovation that is going to influence the way we approach distributed applications for years to come. Let’s see how Kubernetes and the related technologies affect each group of requirements.

Lifecycle

Containers and Kubernetes evolved the way we package, distribute, and deploy applications into a language-independent format. There is a lot written about the Kubernetes Patterns and the Kubernetes Effect on developers and I will keep it short here. Notice though, for Kubernetes, the smallest primitive to manage is the container and it is focused on delivering distributed primitives at the container level and the process model. That means it does a great job on managing the lifecycle aspects of the applications, health-check, recovery, deployment, and scaling, but it doesn’t do such a good job improving on the other aspects of distributed applications which live inside the container, such as flexible networking, state management, and bindings.

You may point out that Kubernetes has stateful workloads, service discovery, cron jobs, and other capabilities. That is true, but all of these primitives are at the container level, and inside the container, a developer still has to use a language-specific library to access the more granular capabilities we listed at the beginning of this article. That is what drives projects like Envoy, Linkerd, Consul, Knative, Dapr, Camel-K, and others.

Networking

It turns out, the basic networking functionality around service discovery provided by Kubernetes is a good foundation, but not enough for modern applications. With the increasing number of microservices and the faster pace of deployments, the needs for more advanced release strategies, managing security, metrics, tracing, recovery from errors, simulating errors, etc. without touching the service, have become increasingly more appealing and created a new category of software on its own, called service mesh.

What is more exciting here is the tendency of moving the networking-related concerns from the service containing the business logic, outside and into a separate runtime, whether that is sidecar or a node level-agent. Today, service meshes can do advanced routing, help to test, handle certain aspects of security, and even speak application-specific protocols (for example Envoy supports Kafka, MongoDB, Redis, MySQL, etc.). While service mesh, as a solution, might not have a wide adoption yet, it touched a real pain point in distributed systems, and I’m convinced it will find its shape and form of existence.

In addition to the typical service mech, there are also other projects, such as Skupper, that confirm the tendency of putting networking capabilities into an external runtime agent. Skupper solves multi-cluster communication challenges through a layer 7 virtual network and offers advanced routing and connectivity capabilities. But rather than embedding Skupper into the business service runtime, it runs an instance per Kubernetes namespace which acts as a shared sidecar.

To sum up, container and Kubernetes made a major step forward in the lifecycle management of the applications. Service mesh and related technologies hit a real pain point and set the foundation for moving more responsibilities outside of the application into proxies. Let’s see what’s next.

State

We listed earlier the main integration primitives that rely on state. Managing state is hard and should be delegated to specialized storage software and managed services. That is not the topic here, but using state, in language-neutral abstractions to aid integration use cases is. Today, many efforts try to offer stateful primitives behind language-neutral abstractions. Stateful workflow management is a mandatory capability in cloud-based services, with examples, such as AWS Step Functions, Azure Durable Functions, etc. In the container-based deployments, CloudState and Dapr, both rely on the sidecar model to offer better decoupling of the stateful abstractions in distributed applications.

What I look forward to is also abstracting away all of the stateful features listed above into a separate runtime. That would mean workflow management, singletons, idempotency, transaction management, cron job triggers, and stateful error handling all happening reliably in a sidecar, (or a host-level agent), rather than living within the service. The business logic doesn’t need to include such dependencies and semantics in the application, and it can declaratively request such behavior from the binding environment. For example, a sidecar can act as a cron job trigger, idempotent consumer, and workflow manager, and the custom business logic can be invoked as a callback or plugged in on certain stages of the workflow, error handling, temporal invocations, or unique idempotent requests, etc.

Another stateful use case is caching. Whether that is request caching performed by the service mesh layer, or data caching with something like Infinispan, Redis, Hazelcast, etc., there are examples of pushing the caching capabilities out of the application’s runtime.

Binding

While we are on the topic of decoupling all distributed needs from the application runtime, the tendency continues with bindings too. Connectors, protocol conversions, message transformations, error handling, and security mediation could all move out of the service runtime. We are not there yet, but there are attempts in this direction with projects such as Knative and Dapr. Moving all of these responsibilities out of the application runtime will lead to a much smaller, business-logic-focused code. Such a code would live in a runtime independent from distributed system needs that can be consumed as prepackaged capabilities.

Another interesting approach is taken by the Apache Camel-K project. Rather than using an agent runtime to accompany the main application, this project relies on an intelligent Kubernetes Operator that builds application runtimes with additional platform capabilities from Kubernetes and Knative. Here, the single agent is the operator that is responsible for including the distributed system primitives required by the application. The difference is that some of the distributed primitives are added to the application runtime and some enabled in the platform (which could include a sidecar as well).

Future architecture trends

Looking broadly, we can conclude that the commoditization of distributed applications, by moving features to the platform level, reaches new frontiers. In addition to the lifecycle, now we can observe networking, state abstraction, declarative eventing, and endpoint bindings also available off-the-shelf, and EIPs are next on this list. Interestingly enough, the commoditization is using the out-of-process model (sidecars) for feature extension rather than runtime libraries or pure platform features (such as new Kubernetes features).

We are now approaching full circle by moving all of the traditional middleware features (a.k.a ESBs) into other runtimes, and soon, all we have to do in our service will be to write the business logic.

Traditional middleware and cloud-native platforms overview

Compared to the traditional ESB era, this architecture decouples the business logic from the platform better, but not yet fully. Many distributed primitives, such as the classic enterprise integration patterns (EIPs): splitter, aggregator, filter, content-based router; and streaming processing patterns: map, filter, fold, join, merge, sliding windows; still have to be included in the business logic runtime, and many others depend on multiple distinct and overlapping platform add-ons.

If we stack up the various cloud-native projects innovating at the different domains, we end up with a picture such as the following:

Multi-runtime microservices

The diagram here is for illustration purposes only, it purposefully picks representative projects and maps them to a category of distributed primitives. In practice, you will not use all of these projects at the same time as some of them are overlapping and not compatible workload models. How to interpret this diagram?

Kubernetes and containers made a huge leap in the lifecycle management of polyglot applications and set the foundation for future innovations.
Service mesh technologies improved on Kubernetes with advanced networking capabilities and started tapping into the application concerns.
While Knative is primarily focused on serverless workloads through rapid scaling, it also addresses service orchestration and event-driven binding needs.
Dapr builds on the ideas of Kubernetes, Knative, and Service Mesh and dives into the application runtimes to tackle stateful workloads, binding, and integration needs, acting as a modern distributed middleware.

This diagram is to help you visualize that, most likely in the future, we will end up using multiple runtimes to implement the distributed systems. Multiple runtimes, not because of multiple microservices, but because every microservice will be composed of multiple runtimes, most likely two — the custom business logic runtime and the distributed primitives runtime.

Introducing multi-runtime microservices

Here is a brief description of the multi-runtime microservices architecture that is beginning to form.

Do you remember the movie Avatar and the Amplified Mobility Platform (AMP) "mech suits" developed by scientists to go out into the wilderness to explore Pandora? This multi-runtime architecture is similar to these Mecha-suits that give superpowers to their humanoid drivers. In the movie you have humans putting on suits to gain strength and access destructive weapons. In this software architecture, you have your business logic (referred to as micrologic) forming the core of the application and the sidecar mecha component that offers powerful out-of-the-box distributed primitives. The micrologic combined with the mecha capabilities form a multi-runtime microservice which is using out-of-process features for its distributed system needs. And the best part is, Avatar 2 is coming out soon to help promote this architecture. We can finally replace vintage sidecar motorcycles with awesome mecha pictures at all software conferences ;-). Let’s look at the details of this software architecture next.

This is a two-component model similar to a client-server architecture, where every component is separate runtime. It differs from a pure client-server architecture in that, here, both components are located on the same host with reliable networking among them that is not a concern. Both components are equal in importance, and they can initiate actions in either direction and act as the client or the server. One of the components is called Micrologic, and it holds the very minimal business logic stripped out of almost all of the distributed system concerns. The other accompanying component is the Mecha, and it provides all of the distributed system features we have been talking about through the article (except lifecycle which is a platform feature).

Multi-runtime (out-of-process) microservices architecture

There might be a one-to-one deployment of the Micrologic and the Mecha (known as the sidecar model), or it can be one shared Mecha with a few Micrologic runtimes. The first model is most appropriate on environments, such as Kubernetes, and the latter on the edge deployments.

Micrologic runtime characteristics

Let’s briefly explore some of the characteristics of the Micrologic runtime:

The Micrologic component is not a microservice on its own. It contains the business logic that a microservice would have, but that logic can only work in combination with the Mecha component. On the other hand, microservices are self-contained and do not have pieces of the overall functionality or part of the processing flow spread into other runtimes. The combination of a Micrologic and its Mecha counterpart form a Microservice.
This is not a function or serverless architecture either. Serverless is mostly known for its managed rapid scaling up and scale-to-zero capabilities. In the serverless architecture, a function implements a single operation as that is the unit of scalability. In that regard, a function is different from a Micrologic which implements multiple operations, but the implementation is not end-to-end. Most importantly, the implementation of the operations is spread over the Mecha and the Micrologic runtimes.
This is a specialized form of client-server architecture, optimized for the consumption of well-known distributed primitives without coding. Also, if we assume that the Mecha plays the server role, then each instance has to be specifically configured to work with the individual client(s). It is not a generic server instance aiming to support multiple clients at the same time as a typical client-server architecture.
The user code in the Micrologic does not interact directly with other systems and does not implement any distributed system primitives. It interacts with the Mecha over de facto standards, such as HTTP/gRPC, CloudEvents spec, and the Mecha communicates with other systems using enriched capabilities and guided by the configured steps and mechanisms.
While the Micrologic is responsible only for implementing the business logic stripped out of distributed system concerns, it still has to implement a few APIs at a minimum. It has to allow the Mecha and the platform to interact with it over predefined APIs and protocols (for example, by following the cloud-native design principles for Kubernetes deployments).

Mecha runtime characteristics

Here are some of the Mecha runtime characteristics:

The Mecha is a generic, highly configurable, reusable component offering distributed primitives as off-the-shelf capabilities.
Each instance of the Mecha has to be configured to work with one Micrologic component (the sidecar model) or configured to be shared with a few components.
The Mecha does not make any assumption about the Micrologic runtime. It works with polyglot microservices or even monolithic systems using open protocols and formats, such as HTTP/gRPC, JSON, Protobuf, CloudEvents.
The Mecha is configured declaratively with simple text formats, such as YAML, JSON, which dictates what features to be enabled and how to bind them to the Micrologic endpoints. For specialized API interactions, the Mechan can be additionally supplied with specs, such as OpenAPI, AsyncAPI, ANSI-SQL, etc. For stateful workflows, composed of multiple processing steps, a spec, such as Amazon State Language, can be used. For stateless integrations, Enterprise Integration Patterns (EIPs) can be used with an approach similar to the Camel-K YAML DSL. The key point here is that all of these are simple, text-based, declarative, polyglot definitions that the Mecha can fulfill without coding. Notice that these are futuristic predictions, currently, there are no Mechas for stateful orchestration or EIPs, but I expect existing Mechas (Envoy, Dapr, Cloudstate, etc) to start adding such capabilities soon. The Mecha is an application-level distributed primitives abstraction layer.
Rather than depending on multiple agents for different purposes, such as network proxy, cache proxy, binding proxy, there might be a single Mecha providing all of these capabilities. The implementation of some capabilities, such as storage, message persistence, caching, etc., would be plugged in and backed by other cloud or on-premise services.
Some distributed system concerns around lifecycle management make sense to be provided by the managing platform, such as Kubernetes or other cloud services, rather than the Mecha runtime using generic open specifications such as the Open App Model.

What are the main benefits of this architecture?

The benefits are loose coupling between the business logic and the increasing list of distributed systems concerns. These two elements of software systems have completely different dynamics. The business logic is always unique, custom code, written in-house. It changes frequently, depending on your organizational priorities and ability to execute. On the other hand, the distributed primitives are the ones addressing the concerns listed in this post, and they are well known. These are developed by software vendors and consumed as libraries, containers or services. This code changes depending on vendor priorities, release cycles, security patches, open-source governing rules, etc. Both groups have little visibility and control over each other.

Business logic and distributed system concerns coupling in application architectures

Microservices principles help decouple the different business domains by bounded contexts where every microservice can evolve independently. But microservices architecture does not address the difficulties coming from coupling the business logic with middleware concerns. For certain microservices that are light on integration use cases, this might not be a big factor. But if your domain involves complex integrations (which is increasingly becoming the case for everybody), following the microservices principles will not help you protect from coupling with the middleware. Even if the middleware is represented as libraries you include in your microservices, the moment you start migrating and changing these libraries, the coupling will become apparent. And the more distributed primitives you need, the more coupled into the integration platform you become. Consuming middleware as a separate runtime/process over a predefined API rather than a library helps loose coupling and enables the independent evolution of each component.

This is also a better way to distribute and maintain complex middleware software for vendors. As long as the interactions with the middleware are over inter-process communication involving open APIs and standards, the software vendors are free to release patches and upgrades at their pace. And the consumers are free to use their preferred language, libraries, runtimes, deployments methods, and processes.

What are the main drawbacks of this architecture?

Inter-process communication. The fact that the business logic and the middleware mechanics (you see where the name comes from) of the distributed systems are in different runtimes and that requires an HTTP or gRPC call rather than an in-process method call. Notice though, this is not a network call that is supposed to go to a different machine or datacenter. The Micrologic runtime and the Mecha are supposed to be colocated on the same host with low latency and minimal likelihood of network issues.

Complexity. The next question is, whether it is worth the complexity of development, and maintaining such systems for the gained benefits. I think the answer will be increasingly inclining towards yes. The requirements of distributed systems and the pace of release cycles are increasing. And this architecture optimizes for that. I wrote some time ago that the developers of the future will have to be with hybrid development skills. This architecture confirms and enforces further this trend. Part of the application will be written in a higher-level programming language, and part of the functionality will be provided by off-the-shelf components that have to be configured declaratively. Both parts are inter-connected not at compile-time, or through in-process dependency injection at startup time, but at deployment time, through inter-process communications. This model enables a higher rate of software reuse combined with a faster pace of change.

What comes after microservices are not functions

Microservices architecture has a clear goal. It optimizes for change. By splitting applications into business domains, this architecture offers the optimal service boundary for software evolution and maintainability through services that are decoupled, managed by independent teams, and released at an independent pace.

If we look at the programming model of the serverless architecture, it is primarily based on functions. Functions are optimized for scalability. With functions, we split every operation into an independent component so that it can scale rapidly, independently, and on-demand. In this model, the deployment granularity is a function. And the function is chosen because it is the code construct that has an input whose rate correlates directly to the scaling behavior. This is an architecture that is optimized for extreme scalability, rather than long term maintainability of complex systems.

What about the other aspect of Serverless, which comes from the popularity of AWS Lambda and its fully managed operational nature? In this regard, "AWS Serverless" optimizes for speed of provisioning for the expense of lack of control and lock-in. But the fully managed aspect is not application architecture, it is a software consumption model. It is an orthogonal functionally, similar to consuming a SaaS-based platform which in an ideal world should be available for any kind of architecture whether that is monolithic, microservices, mecha or functions. In many ways, AWS Lambda resembles a fully managed Mecha architecture with one big difference: Mecha does not enforce the function model, instead it allows a more cohesive code constructs around the business domain, split from all middleware concerns.

Application architecture optimizations

Mecha architecture, on the other hand, optimizes microservices for middleware independence. While microservices are independent of each other, they are heavily dependent on embedded distributed primitives. The Mecha architecture splits these two concerns into separate runtimes allowing their independent release by independent teams. This decoupling improves day-2 operations (such as patching and upgrades) and the long term maintainability of the cohesive units of business logic. In this regard, Mecha architecture is a natural progression of the microservices architecture by splitting software based on the boundaries that cause most friction. That optimization provides more benefits in the form of software reuse and evolution than the function model, which optimizes for extremely scalability at the expense of over-distribution of code.

Conclusion

Distributed applications have many requirements. Creating effective distributed systems requires multiple technologies and a good approach to integration. While traditional monolithic middleware provided all of the necessary technical features required by distributed systems, it lacked the ability to change, adapt, and scale rapidly, which was required by the business. This is why the ideas behind microservices-based architectures contributed to the rapid popularization of containers and Kubernetes; with the latest developments in the cloud-native space, we are now coming full circle by moving all of the traditional middleware features into the platform and off-the-shelf auxiliary runtimes.

This commoditization of application features is primarily using the out-of-process model for feature extension, rather than runtime libraries or pure platform features. That means that in the future it is highly likely that we will use multiple runtimes to implement distributed systems. Multiple runtimes, not because of multiple microservices, but because every microservice will be composed of multiple runtimes; a runtime for the custom micro business logic, and an off-the-shelf, configurable runtime for distributed primitives.

This article was originally published on InfoQ here.

Top 10 must-know Kubernetes design patterns

Monday, May 25, 2020 No comments

Here are the must-know top 10 design patterns for beginners synthesized from the Kubernetes Patterns book. Getting familiar with these patterns will help you understand foundational Kubernetes concepts, which in turn will help you in discussions and when designing Kubernetes-based applications. There are many important concepts in Kubernetes, but these are the most important ones to start with:

Top 10 must-know Kubernetes design patterns

Top 10 most common Kubernetes Patterns

To help you understand, the patterns are organized into a few categories below, inspired by the Gang of Four’s design patterns.

Foundational patterns

These patterns represent the principles and best practices that containerized applications must comply with in order to become good cloud-native citizens. Regardless of the application’s nature, you should aim to follow these guidelines. Adhering to these principles will help ensure that your applications are suitable for automation on Kubernetes.

Health Probe pattern

Health Probe dictates that every container should implement specific APIs to help the platform observe and manage the application in the healthiest way possible. To be fully automatable, a cloud-native application must be highly observable by allowing its state to be inferred so that Kubernetes can detect whether the application is up and ready to serve requests. These observations influence the life-cycle management of Pods and the way traffic is routed to the application.

Predictable Demands pattern

Predictable Demands explains why every container should declare its resource profile and stay confined to the indicated resource requirements. The foundation of successful application deployment, management, and coexistence on a shared cloud environment is dependent on identifying and declaring the application’s resource requirements and runtime dependencies. This pattern describes how you should declare application requirements, whether they are hard runtime dependencies or resource requirements. Declaring your requirements is essential for Kubernetes to find the right place for your application within the cluster.

Automated Placement patterns

Automated Placement explains how to influence workload distribution in a multi-node cluster. Placement is the core function of the Kubernetes scheduler for assigning new Pods to nodes satisfying container resource requests and honoring scheduling policies. This pattern describes the principles of Kubernetes’ scheduling algorithm and the way to influence the placement decisions from the outside.

Structural patterns

Having good cloud-native containers is the first step, but not enough. Reusing containers and combining them into Pods to achieve the desired outcome is the next step. The patterns in this category are focused on structuring and organizing containers in a Pod to satisfy different use cases. The forces that affect containers in Pods result in these patterns.

Init Container pattern

Init Container introduces a separate life cycle for initialization-related tasks and the main application containers. Init Containers enable separation of concerns by providing a separate life cycle for initialization-related tasks distinct from the main application containers. This pattern introduces a fundamental Kubernetes concept that is used in many other patterns when initialization logic is required.

Sidecar patterns

Sidecar describes how to extend and enhance the functionality of a pre-existing container without changing it. This pattern is one of the fundamental container patterns that allows single-purpose containers to cooperate closely together.

Behavioral patterns

These patterns describe the life-cycle guarantees of the Pods ensured by the managing platform. Depending on the type of workload, a Pod might run until completion as a batch job or be scheduled to run periodically. It might run as a daemon service or singleton. Picking the right life-cycle management primitive will help you run a Pod with the desired guarantees.

Batch Job patterns

Batch Job describes how to run an isolated, atomic unit of work until completion. This pattern is suited for managing isolated atomic units of work in a distributed environment.

Stateful Service patterns

Stateful Service describes how to create and manage distributed stateful applications with Kubernetes. Such applications require features such as persistent identity, networking, storage, and ordinality. The StatefulSet primitive provides these building blocks with strong guarantees ideal for the management of stateful applications.

Service Discovery pattern

Service Discovery explains how clients can access and discover the instances that are providing application services. For this purpose, Kubernetes provides multiple mechanisms, depending on whether the service consumers and producers are located on or off the cluster.

Higher-level patterns

The patterns in this category are more complex and represent higher-level application management patterns. Some of the patterns here (such as Controller) are timeless, and Kubernetes itself is built on top of them.

Controller pattern

Controller is a pattern that actively monitors and maintains a set of Kubernetes resources in a desired state. The heart of Kubernetes itself consists of a fleet of controllers that regularly watch and reconcile the current state of applications with the declared target state. This pattern describes how to leverage this core concept for extending the platform for our own applications.

Operator pattern

An Operator is a Controller that uses a CustomResourceDefinitions to encapsulate operational knowledge for a specific application in an algorithmic and automated form. The Operator pattern allows us to extend the Controller pattern for more flexibility and greater expressiveness. There are an increasing number of Operators for Kubernetes, and this pattern is turning into the major form of operating complex distributed systems.

Summary

Today, Kubernetes is the most popular container orchestration platform. It is jointly developed and supported by all major software companies and offered as a service by all of the major cloud providers. Kubernetes supports both Linux and Windows systems, plus all major programming languages. This platform can also orchestrate and automate stateless and stateful applications, batch jobs, periodic tasks, and serverless workloads. The patterns described here are the most commonly used ones from a broader set of patterns that come with Kubernetes as shown below.

Kubernetes patters categorized

Kubernetes is the new application portability layer and the common denominator among everybody on the cloud. If you are a software developer or architect, the odds are that Kubernetes will become part of your life in one form or another. Learning about the Kubernetes patterns described here will change the way you think about this platform. I believe that Kubernetes and the concepts originating from it will become as fundamental as object-oriented programming concepts. The patterns here are an attempt to create the Gang of Four design patterns, but for container orchestration. Reading this article must not be the end, but the beginning of your Kubernetes journey. Happy kubectl-ing!

This post was originally published on Red Hat Developers. To read the original post, check here.

Camel Rebirth with Subsecond Experiences

Thursday, May 23, 2019 No comments

This post was originally published here. Check out my new Kubernetes Patterns book and follow me @bibryam for future blog posts.

A look at the past decade

The integration space is in constant change. There are many open source projects and closed source technologies that have not passed the test of time and disappeared from the middleware stacks for good. After a decade, Apache Camel is still here and becoming even stronger for the next decade of integration.

Gang of Four for integration

Apache Camel started life as an implementation of the Enterprise Integration Patterns (EIP) book. Today, these patterns are the equivalent of the object-oriented Gang of Four Design Patterns but for messaging and integration domain. They are agnostic of programing language, platform, architecture, and provide a universal language, notation and description of the forces around fundamental messaging primitives.

But Camel community did not stop with these patterns, and kept evolving and adding newer patterns from SOA, Microservices, Cloud Native and Serverless paradigms. As a result, Camel turned into a generic pattern based integration framework suitable for multiple architectures.

Universal library consumption model

While the patterns gave the initial spark to Camel, its endpoints quickly became popular and turned into a universal protocol for using Java based integration libraries as connectors. Today, there are hundreds of Java libraries that can be used as Camel connectors using the Camel endpoint notation. It takes a while to realize that Camel can also be used without the EIPs and the routing engine. It can act as a connector framework where all libraries are consumed as universal URIs without a need for understanding the library specific factories and configurations that vary widely across Java libraries.

The right level of abstraction

When you talk to developers who have not used Camel in anger before, they will tell you that it is possible to do integration without Camel. And they are right about the 80% of the easy use cases, but not for the remaining 20% of the cases that can turn a project into a multi-year frustrating experience. What they do not realize yet is that without Camel, there are multiple manual ways of doing the same thing, but none are validated by the experience of hundreds of open source developers. And if you fast-forward a few years, 10s of different systems to integrate with, and 10s of developers coming and going, 100s of microservices, an integration project can quickly turn into a bespoke home-grown framework that nobody wants to work on. Doing integration is easy, but doing good integration that will evolve and grow for many years, by many teams, is the hardest part. Camel addresses this challenge with universal patterns and connectors, combined with integration focused DSLs, that have passed the test of time. Next time, if you think you don't need Camel, your are either thinking for short term gains, or you are not realizing yet how complex integration can become.

Embracing change

It takes only a couple of painful experiences in large integration projects to start appreciating Camel. But Camel is not great only because it was started by and built on the works of great minds, it is great because it evolves thanks to the world's knowledge, shared through the open source model and its networking effects. Camel started as the routing layer in ESBs during SOA period with a lot of focus on XML, WS, JBI, OSGI etc, but then it quickly adapted for REST, Swagger, Circuit breakers, SAGAs, and Spring Boot in the Microservices era. And the world has not stopped there, with Docker and Kubernetes, and now Serverless architecture, Camel keeps embracing the change. That is because Camel is written for integrating changing environments, and Camel itself grows and shines on change. Camel is a change enabling library for integration.

Behind the scene engine

One of the Camel secret sauces is that it is a non-intrusive, unopinionated, small (5MB and getting smaller) integration library without any affinity where and how you consume it. If you notice, this is the opposite of an ESB as commonly Camel is confused with because of its extensive capabilities. Over the years, Camel has been used as the internal engine powering projects such as:

Apache ServiceMix ESB
Apache ActiveMQ
Talend ESB
JBoss Switchyard
JBoss Fuse Service Works
Red Hat Fuse
Fuse Online/Syndesis
And many other frameworks mentioned here.

You can use Camel standalone, embed it with Apache Tomcat, with Spring Boot starters, JBoss WildFly, Apache Karaf, Vert.x, Quarkus, you name it. Camel doesn't care and it will bring superpowers to your project every time.

Looking to the future

Nobody can tell how the ideal integration stack will look like in a decade, nor can I. But I will tell you about two novelties coming into Apache Camel now (and to Red Hat Fuse later), and why they will have a noticeable positive effect for the developers and the business. I call these changes as subsecond deployment and subsecond startup of Camel applications.

Subsecond deployments to Kubernetes

There was a time when cloud-native meant different technologies. Today, after a few years of natural selection and consolidation in the industry, cloud-native means applications created specifically for Kubernetes and its ecosystem of projects around CNCF. Even with this definition, there are many shades of cloud-native, from running a monolithic non-scalable application in a container, to triggering a function that is fully embracing the cloud-native development and management practices. The Camel community has realized that Kubernetes is the next generation application runtime, and it is steadily working on making Camel a Kubernetes native integration engine. The same way Camel is a first-class citizen in OSGI containers, JEE application servers, other fat-jar runtimes, Camel is becoming a first-class citizen on Kubernetes, integrating deeply and benefiting from the resiliency and scalability offered by the platform.

Here are a few of the many enhancement efforts going on in this direction:

Deeper Kubernetes integration - Kubernetes API connector, full health-check API implementation for Camel subsystems, service discovery through a new ServiceCall EIP, configuration management using ConfigMaps. Then a set of application patterns with special handling on Kubernetes such as: clustered singleton routes, scalable XA transactions (because sometimes, you have to), SAGA pattern implementation, etc.
Cloud-native integrations - support for other cloud-native projects such as exposing Camel metrics for Prometheus, tracing Camel routes through Jaeger, JSON formatted logging for log aggregation, etc.
Immutable runtimes - whether you use the minimalist immutable Karaf packaging, or Spring Boot, Camel is a first class citizen ready to put in a container image. There are also Spring Boot starter implementations for all Camel connectors, integration with routes, properties, converters, and whatnot.
Camel 3 - is a fact and actively progressing. A big theme for Camel 3 is to make it more modular, smaller, with faster startup time, reactive, non-blocking and triple awesome. This is the groundwork needed to restructure Camel for the future cloud workloads.
Knative integration - Knative is an effort started by Google in order to bring some order and standardization in the serverless world dominated by Amazon Lambda. And Camel is among the projects that integrate with Knative primitives from early days and enhances the Knative ecosystem with hundreds of connectors acting as generic event sources.
And here is a real game-changer initiative: Camel-K (a.k.a deep Kubernetes integration for Camel) - we have seen that Camel is typically embedded into the latest modern runtime where it acts as the developer-friendly integration engine behind the scene. The same way Camel used to benefit from JEE services in the past for hot-deployment, configuration management, transaction management, etc, today Camel-K allows Camel runtime to benefit from Kubernetes features for high-availability, resiliency, self-healing, auto-scaling, and basically distributed application management in general. The way Camel-K achieves this is through a CLI and an Operator where the latter is able to understand the Camel applications, its build time dependencies, runtime needs, and make intelligent choices from the underlying Kubernetes platform and its additional capabilities (from Knative, Istio, Openshift and others in the future). It can automate everything on the cluster such as picking the best suited container image, runtime management model and update them when needed. The CLI can automate the tasks that are on the developer machine such as observing the code changes and streaming those to the Kubernetes cluster, and printing the logs from the running Pods.

Camel route auto-deployment to Kubernetes with Camel-K

Camel-K operator understands two domains: Kubernetes and Camel. By combining knowledge of both areas, it can automate tasks that usually require a human operator.
The really powerful part is that, with Camel-K, a Camel route can be built and deployed from source code to a running Camel route on Kubernetes in less than a second!

Time to deploy and run a Camel integration(in seconds)

Forget about making a coffee, or even having a sip while building and deploying a Camel route with Camel K. As soon as you make changes to your source code and open a browser, the Camel route will be running in Kubernetes. This has a noticeable impact on the way the developers write Camel code, compile, drink coffee, deploy and test. Apart from changing the development practices and habits, this toolset will significantly reduce the development cycles which would be noticed by the business stakeholders too. For live demonstration, check out the following awesome video from Fuse engineers working on Camel-K project.

Subsecond startups of Camel applications

A typical enterprise integration landscape is composed of stateless services, stateful services, clustered applications, batch jobs, file transfers, messaging, real time integrations, and may be even blockchain based business processes. To that mix, today, we also have to add serverless workloads as well, which is best suited for event driven workloads. Historically, the heavy and slow Java runtime had significant drawbacks compared Go, Javascript and other light runtimes in the serverless space. That is one of the main motivations for Oracle to create GraalVM/Substrate VM. Substrate VM is a framework that enables ahead-of-time (AOT) compilation of Java applications into native executables that are light and fast. Then a recent effort by Red Hat led to the creation of Quarkus project which improves further the resource consumptions, startup and response times of Java applications mind-blowingly (a term not-used lightly here).

Supersonic Subatomic Java with Quarkus

As you can see from the metrics above, Quarkus combined with SubstrateVM is not a gradual evolution. It is a mutation, and a revolutionary jump that suddenly changes the perspectives on the Java’s footprint and speed in the cloud native workloads. It makes Java friendly for serverless architecture. Considering the huge Java ecosystem composed of developers and libraries, it even turns Java into the best suited language for Serverless applications. And Camel combined with Quarkus, the best placed integration library in this space.

Summary

With the explosion of Microservices architecture, the number of services has increased tenfold which gave birth to Kubernetes-enabled architectures. These architectures are highly dynamic in nature, and most powerful with light and fast runtimes that enable instant scale up and higher deployment density.

Camel is the framework to fill the space between disparate systems and services. It offers data consistency guarantees, reliable communication, failover, failure detection and recovery, and so on, in a way that makes developers productive. Now, imagine the same powerful Apache Camel based integration in 2020 that: deploys to Kubernetes in 20ms; starts up in 20ms; requires 20MB memory, consumes 20MB on the disk... That is regardless whether it runs as a stateless application in a container, or as a function on KNative. That is 100x faster deployments to Kubernetes, 100x faster startup time, 10x less resource consumption allowing real-time scale-up, scale-down, and scale to zero. That is a change that the developers will notice during development, users will notice when using the system, and the business will notice on the infrastructure cost and overall delivery velocity. That is the real cloud-native era we have been waiting for.

Getting started with blockchain for Java developers

Monday, May 13, 2019 No comments

Follow me on twitter for other posts in this space. This post was originally published on Opensource.com under CC BY-SA 4.0. If you prefer, read the same post on Hacker Noon.

Top technology prognosticators have listed blockchain among the top 10 emerging technologies with the potential to revolutionize our world in the next decade, which makes it well worth investing your time now to learn. If you are a developer with a Java background who wants to get up to speed on blockchain technology, this article will give you the basic information you need to get started.
Blockchain is a huge space and at first it can be overwhelming to navigate. Blockchain is different from other software technologies as it has a parallel non-technical universe with a focus on speculations, scams, price volatility, trading, ICOs, cryptocurrencies, Bitcoin maximalism, game theory, human greed, etc. Here we will ignore that side of blockchain completely and look at the technical aspects only.

The theoretical minimum for blockchain

Regardless of the programing language, implementation details, there is a theoretical minimum about blockchain that you should be familiar with. Without this understanding, it is impossible to grasp the foundations, and build on. Based on my experience, the very minimum two technologies that must be understood are Bitcoin and Ethereum. It happens that both projects introduced something new in this space, both currently have the highest market cap, and highest developer community, etc. Most other blockchain projects, whether they are public or private, permissionless or permissioned, are forks of Bitcoin or Ethereum, or build and improve their shortcomings in some ways by making certain trade-offs. Understanding these two projects is like taking networking, database theory, messaging, data structures and two programing language classes in the university. Understanding how these two blockchain technologies will open your mind for the blockchain universe.

Tech books to start with blockchain

The two books I recommend for this purpose happen to be from the same author - Andreas M. Antonopoulos:

Mastering Bitcoin is the most in depth, technical but still understandable and easy to read book I could find about Bitcoin. The tens of other books I checked on this topic were either mostly philosophical and non-technical.
On the Ethereum side, there are many more technical books, but I liked the level of detail in Mastering Ethereum most.
Building Ethereum Dapps is another book I found very thorough and covering the Ethereum development very well.

Most popular Java based blockchain projects

If you are coming from a technical background, it makes sense to build on that knowledge and see what blockchain brings to the table. In the end, blockchain is a fully new technology, but a new combination of existing technologies with human behavior fueled by network effects.

It is worth stating that the popular technologies such as Java, .Net, relational databases are not common in the blockchain space. This space is primarily dominated by C, Go, Rust on the server side, and JavaScript on the client side. But if you know Java, there are a few projects and components written in Java that can be used as a leveraged entry point to the blockchain space.

Assuming you read the above two books, and want to get your hands dirty, here are a few open source blockchain projects written in Java:

Popular Java-based blockchain projects

Corda - this is probably the most natural starting point for a Java developer. Corda is JVM based project that builds on top of popular widely used Java projects such as Apache Artemis, Hibernate, Apache Shiro, Jackson, and relational databases. It is inspired by Bitcoin, but has elements of business processes, messaging, and other familiar concepts. Check out my first impressions from it as a Java developer here.
Pantheon - is a full implementation of an Ethereum node in Java. It is specifically created to attract developers from the Java ecosystem into the blockchain world. Here is an intro and a getting started video by its creators.
BitcoinJ - is the most popular Java implementation of the Bitcoin protocol. If you prefer to start with Bitcoin directly, this is the Java project to explore.
Web3J - while Corda, Pantheon are examples of a full blockchain node implemented in Java, Web3J is client library written in Java. It is very well documented and active project that makes talking to Ethereum compatible nodes straight forward. I created a Apache Camel connector for it and wrote about it here.
Hyperledger Fabric Java SDK - one of the most popular enterprise blockchain projects is Hyperledger Fabric and it has a full-featured Java SDK to play with.

FundRequest - I also want to point you to full end user applications written in Java. While the above projects are examples of clients or nodes, FundRequest is an open source funding platform implemented on top of Ethereum network and fully written in Java. It gives a good idea how to implement a complete blockchains project interacting with the Ethereum network.
Eventum - this is a Java project that can help you monitor the Ethereum network and store Events on Kafka. It addresses a few of the common challenges when integrating with blockchain networks which are decentralized.

If you are still not sure where to start, I suggest you read Mastering Bitcoin, that will give you the solid foundation. If you like touching technology before reading, go to Github and play with one of the projects listed above. The rest will follow. The future is open and decentralized.

OFBizian

Blogroll

Architecting messaging solutions with Apache ActiveMQ Artemis

Typical customer requirements

The constraints identification approach

Infrastructure

Storage

Storage capacity

Storage type

Orchestration

High availability

Storage

Broker

Load balancer

Clients

Scalability

Active-active clustering

Client-side partitioning

Load balancer

Apache Qpid

Client-side load balancing

Clients

AMQP 1.0

Core

OpenWire

Reference architectures

Non-clustered Apache Artemis with shared storage

Pros

Cons

Other notes

Clustered Apache Artemis with shared storage

Pros

Cons

Other notes

Clustered Apache Artemis with replication

Pros

Cons

Other notes

Capacity planning

Summary

Multi-Runtime Microservices Architecture

Distributed application needs

Lifecycle

Networking

State

Binding

Traditional middleware limitations

Lifecycle

Networking

State

Binding

Cloud-native tendencies

Lifecycle

Networking

State

Binding

Future architecture trends

Introducing multi-runtime microservices

Micrologic runtime characteristics

Mecha runtime characteristics

What are the main benefits of this architecture?

What are the main drawbacks of this architecture?

What comes after microservices are not functions

Conclusion

Top 10 must-know Kubernetes design patterns

Foundational patterns

Health Probe pattern

Predictable Demands pattern

Automated Placement patterns

Structural patterns

Init Container pattern

Sidecar patterns

Behavioral patterns

Batch Job patterns

Stateful Service patterns

Service Discovery pattern

Higher-level patterns

Controller pattern

Operator pattern

Summary