
First Steps with Dapr

I recently left Red Hat to join Diagrid and work on the Dapr project. I spoke about Dapr when it was initially announced by Microsoft, but hadn’t looked into it since it joined CNCF. Two years later, during my onboarding into the new role, I spent some time looking into it, and here are the steps I took on that journey and my impressions so far.

What is Dapr?

TL;DR: Dapr is a distributed systems toolkit in a box. It addresses the peripheral integration concerns of applications and lets developers focus on the business logic. If you are familiar with Apache Camel, Spring Framework in the Java world, or other distributed systems frameworks, you will find a lot of similarities with Dapr. Here are a few parallels with other frameworks:

  • Similar to Camel, Dapr has connectors (called bindings) that let you connect to various external systems.
  • Similar to HashiCorp Consul, Dapr offers services discovery which can be backed by Consul.
  • Similar to Spring Integration, Spring Cloud, (remember Netflix Hystrix?) and many other frameworks, Dapr has error handling capabilities with retries, timeouts, circuit breakers which are called resiliency policies.
  • Similar to Spring Data KeyValue, Dapr offers Key/Value-based state abstractions.
  • Similar to Kafka, Dapr offers pub/sub-based service interactions.
  • Similar to ActiveMQ clients, Dapr offers dead-letter queues (DLQs), but these are not specific to a messaging technology, which means they can be used even with brokers such as AWS SQS or Redis.
  • Similar to Spring Cloud Config, Dapr offers configuration and secret management.
  • Similar to Zookeeper or Redis clients, Dapr offers distributed locks.
  • Similar to a Service Mesh, Dapr offers mTLS and additional security between your application and the sidecar.
  • Similar to Envoy, Dapr offers enhanced observability through automatic metrics, tracing and log collection.
The primary difference between all of these frameworks and Dapr is that the latter offers its capabilities not as a library within your application, but as a sidecar running next to your application. These capabilities are exposed behind well-defined HTTP and gRPC APIs (very creatively called building blocks) where the implementations (called components) can be swapped w/o affecting your application code.


High-level Dapr architecture

You could say Dapr is a collection of stable APIs exposed through a sidecar, with swappable implementations running somewhere else. It is the cloudnative incarnation of integration technologies, one that makes integration capabilities previously available only in a few languages available to everybody and portable everywhere: Kubernetes, on-premise, or literally the International Space Station (I mean the edge).

Getting started

The project is surprisingly easy to get up and running regardless of your developer background and language of choice. I was able to follow the getting started guides and run various quickstarts in no time on my macOS machine. Here are roughly the steps I followed.

Install Dapr CLI

Dapr CLI is the main tool for performing Dapr-related tasks such as running an application with Dapr, seeing the logs, running Dapr dashboard, or deploying all to Kubernetes.

brew install dapr/tap/dapr-cli

With the CLI installed, we have a few different options for installing and running Dapr. I’ll start with the least demanding (and least flexible) option and progress from there.

Option 1: Install Dapr without Docker

This is the lightest but not the most useful way to run Dapr.

dapr init --slim

In this slim mode, only the daprd and placement binaries are installed on the machine, which is sufficient for running Dapr sidecars locally.


Run a Dapr sidecar

The following command will start a Dapr sidecar called no-app listening on HTTP port 3500 and a random gRPC port.


dapr run --app-id no-app --dapr-http-port 3500

Congratulations, you have your first Dapr sidecar running. You can see the sidecar instance through this command:

dapr list

or query its health status:

curl -i http://localhost:3500/v1.0/healthz

Dapr sidecars are supposed to run next to an application and not on their own. Let’s stop this instance and run it with an application.

dapr stop --app-id no-app

Run a simple app with a Dapr sidecar

For this demonstration we will use a simple NodeJS application:

git clone https://github.com/dapr/samples.git

cd samples/hello-dapr-slim

npm install

This is a Hello World the Dapr way and here is the gist of it:

app.post('/neworder', bodyParser.json(), (req, res) => {
    const data = req.body.data;
    const orderId = data.orderId;
    res.status(200).send("Got a new order! Order ID: " + orderId);
});


The application has one /neworder endpoint listening on port 3000. We can run this application and the sidecar with the following command:

dapr run --app-id nodeapp --app-port 3000 --dapr-http-port 3500 node app.js

The command starts the NodeJS application on port 3000 and the Dapr HTTP endpoint on 3500. Once you see in the logs that the app has started successfully, we can poke it. But rather than hitting the /neworder endpoint directly on port 3000, we will instead interact with the application through the sidecar. We do that using the Dapr CLI like this:

dapr invoke --verb POST --app-id nodeapp --method neworder --data '{"data": { "orderId": "41" } }'

You should see the response from the app. Notice that the CLI only needs the app-id (instead of a host and port) to locate where the service is running. The CLI is just a handy way to interact with your service. If that seems like too much magic, we can use a bare-bones curl command too:

curl -XPOST -d @sample.json -H "Content-Type:application/json" http://localhost:3500/v1.0/invoke/nodeapp/method/neworder

This command uses Dapr’s service invocation API to synchronously interact with the application (sample.json presumably contains the same JSON payload we passed to dapr invoke above). Here is a visual representation of what just happened:

Invoking an endpoint through Dapr sidecar

Now, with Dapr on the request path, we get the benefits of Daprized service invocation: resiliency policies such as retries, timeouts, circuit breakers, and concurrency control; observability enhancements such as metrics, tracing, and logs; and security enhancements such as mTLS and allow lists. At this point, you can try out the metadata and metrics endpoints, play with the configuration options, or see your single microservice in the Dapr dashboard.

dapr dashboard
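If you want to poke at the sidecar a bit more, the metadata endpoint is an easy one to try. A minimal sketch, assuming the sidecar is still listening on HTTP port 3500 as above:

curl -i http://localhost:3500/v1.0/metadata

It returns information about the sidecar, such as the app id and the components it has loaded.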

The slim mode we are running in is good for the Hello World scenario, but it is not the best setup for local development purposes, as it lacks a state store, pub/sub, metrics server, etc. Let’s stop the nodeapp using the command from earlier (or Ctrl+C) and remove the slim Dapr binaries:

dapr uninstall

One thing to keep in mind is that this command will not remove the default configuration and component specification files, usually located in the ~/.dapr folder. We didn’t create any files in the steps so far, but if you follow other tutorials and change those files, they will remain and get applied with every dapr run command in the future (unless overridden). This caused me some confusion, so keep it in mind.

Option 2: Install Dapr with Docker

This is the preferred way for running Dapr locally for development purposes but it requires Docker. Let’s set it up:

dapr init

The command will download and run three containers:

  • A Dapr placement container used with actors (I wish this were an optional feature).
  • A Zipkin container for collecting tracing information from our sidecars.
  • A single-node Redis container used for the state store, pub/sub, and distributed-lock implementations.
You can verify that these containers are running, and then you are ready to go:
docker ps
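For reference, dapr init also drops default component definitions into the ~/.dapr/components folder. The Redis-backed state store definition looks roughly like this (a sketch of the generated file; field names may vary slightly between Dapr versions):

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.redis
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPassword
    value: ""

Swapping Redis for another state store later is a matter of replacing this component definition, not changing application code.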

Run the Quickstarts

My next step from here was to try out the quickstarts that demonstrate the building blocks for service invocation, pub/sub, state store, bindings, etc. The awesome thing about these quickstarts is that they demonstrate the same example in multiple ways:
  • With the Dapr SDK and w/o any dependency on the Dapr SDK, i.e. using plain HTTP only.
  • In multiple languages: Java, Javascript, .Net, Go, Python, etc.
You can mix and match different languages and interaction methods (SDK or native) for the same example, which demonstrates Dapr’s polyglot nature.
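As a taste of the SDK-free path, the state store building block can be exercised with plain HTTP against the sidecar. A minimal sketch, assuming a sidecar on port 3500 and the default component named statestore as above:

# save a key/value pair through the sidecar
curl -X POST http://localhost:3500/v1.0/state/statestore \
  -H "Content-Type: application/json" \
  -d '[{ "key": "order_41", "value": { "orderId": "41" } }]'

# read it back
curl http://localhost:3500/v1.0/state/statestore/order_41

The same calls work unchanged if the state store component is later backed by something other than Redis.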

Option 3: Install Dapr on Kubernetes

If you have come this far, you should have a good high-level understanding of what Dapr can do for you. The next step would be to deploy Dapr on Kubernetes where most of the Dapr functionalities are available and closest to a production deployment. For this purpose, I used minikube locally with default settings and no custom tuning.
dapr init --kubernetes --wait

If successful, this command will start the following pods in dapr-system namespace:
  • dapr-operator: manages all components for state store, pub/sub, configuration, etc.
  • dapr-sidecar-injector: injects Dapr sidecars into annotated deployment Pods.
  • dapr-placement: required with actors only.
  • dapr-sentry: manages mTLS between services and acts as a certificate authority.
  • dapr-dashboard: a simple webapp to explore what is running within a Dapr cluster.
These Pods collectively represent the Dapr control plane.

Injecting a sidecar

From here on, adding a Dapr sidecar to an application (this would be the Dapr data plane) is as easy as adding the following annotations to your Kubernetes Deployments:

annotations:
  dapr.io/enabled: "true"
  dapr.io/app-id: "nodeapp"
  dapr.io/app-port: "3000"


The dapr-sidecar-injector service watches for new Pods with the dapr.io/enabled annotation and injects a container with the daprd process within the pod. It also adds DAPR_HTTP_PORT and DAPR_GRPC_PORT environment variables to your container so that it can easily communicate with Dapr without hard-coding Dapr port values.
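For context, in a Deployment these annotations belong on the Pod template, not on the Deployment metadata. A minimal sketch, where the image name and labels are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodeapp
  template:
    metadata:
      labels:
        app: nodeapp
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "nodeapp"
        dapr.io/app-port: "3000"
    spec:
      containers:
      - name: node
        image: example/nodeapp:latest   # placeholder image
        ports:
        - containerPort: 3000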

To deploy a complete application on Kubernetes, I suggest this step-by-step example. It has provider and consumer services, and it worked the first time for me.

Transparent vs explicit proxy

Notice that Dapr sidecar injection is less intrusive than a typical service mesh with a transparent sidecar such as Istio’s Envoy. To inject a transparent proxy, typically the Pods also get injected with an init-container that runs at the start of the Pod and re-configures the Pod’s networking rules so that all ingress and egress traffic of your application container goes through the sidecar. With Dapr, that is not the case. There is a sidecar injected, but your application is in control of when and how to interact with Dapr over its well-defined explicit (non-transparent) APIs. Transparent service mesh proxies operate at lower network layers typically used by operations teams, whereas Dapr provides application-layer primitives needed by developers. If you are interested in this topic, here is a good explanation of the differences and overlaps of Dapr with service meshes.

Summary

And finally, here are some closing thoughts on what I liked more and what I liked less about Dapr so far.

Liked more

  • I love the fact that Dapr is one of the few CNCF projects targeting developers creating applications, and not only the operations teams who run these applications. We need more cloudnative tools for implementing applications.
  • I love the non-intrusive nature of Dapr where capabilities are exposed over clear APIs and not through some black magic. I prefer transparent actions for instrumentation, observability, and general application insight, but not for altering application behavior.
  • I loved the polyglot nature of Dapr offering its capabilities to all programming languages and runtimes. This is what attracted me to Kubernetes and cloudnative in the first place.
  • I loved how easy it is to get started with Dapr and the many permutations of each quickstart. There is something for everyone regardless of where you are coming from into Dapr.
  • I’m excited about WASM modules and remote components features coming into Dapr. These will open new surface areas for more contributions and integrations.

Liked less

  • I haven’t used actors before and it feels odd to have a specific programming model included in a generic distributed systems toolkit. Luckily you don’t have to use it if you don’t want to.
  • The documentation is organized, but spread too thinly across many short pages. Learning a topic requires navigating a lot of pages multiple times, and it is still hard to find what you are looking for.
Follow me at @bibryam to join my journey of learning and using Dapr and shout out with any thoughts and comments.

Multi-Runtime Microservices Architecture

Creating good distributed applications is not an easy task: such systems often follow the 12-factor app and microservices principles. They have to be stateless, scalable, configurable, independently released, containerized, automatable, and sometimes event-driven and serverless. Once created, they should be easy to upgrade and affordable to maintain in the long term. Finding a good balance among these competing requirements with today’s technology is still a difficult endeavor.
 
In this article, I will explore how distributed platforms are evolving to enable such a balance, and more importantly, what else needs to happen in the evolution of distributed systems to ease the creation of maintainable distributed architectures. If you prefer to see my talk on this very same topic, check out my QCon London recording at InfoQ.

Distributed application needs

For this discussion, I will group the needs of modern distributed applications into four categories — lifecycle, networking, state, binding — and analyze briefly how they are evolving in recent years.

Distributed application needs

Lifecycle

Let’s start with the foundation. When we write a piece of functionality, the  programming language dictates the available libraries in the ecosystem, the packaging format, and the runtime. For example, Java uses the .jar format, all the Maven dependencies as an ecosystem, and the JVM as the runtime. Nowadays, with faster release cycles, what’s more important with lifecycle is the ability to deploy, recover from errors, and scale services in an automated way. This group of capabilities represents broadly our application lifecycle needs.

Networking

Almost every application today is a distributed application in some sense and therefore needs networking. But modern distributed systems need to master networking from a wider perspective. It starts with service discovery and error recovery and extends to modern software release techniques and all kinds of tracing and telemetry. For our purpose, we will even include in this category the different message exchange patterns, point-to-point and pub/sub methods, and smart routing mechanisms.

State

When we talk about state, typically it is about the service state and why it is preferable to be stateless. But the platform itself that manages our services needs state. That is required for doing reliable service orchestration and workflows, distributed singleton, temporal scheduling (cron jobs), idempotency, stateful error recovery, caching, etc. All of the capabilities listed here rely on having state under the hood. While the actual state management is not the scope of this post, the distributed primitives and their abstractions that depend on state are of interest.

Binding

The components of distributed systems not only have to talk to each other but also integrate with modern or legacy external systems. That requires connectors that can convert between various protocols, support different message exchange patterns (such as polling, event-driven, and request/reply), transform message formats, and even perform custom error recovery procedures and security mechanisms.

Without going into one-off use cases, the above represent a good collection of common primitives required for creating good distributed systems. Today, many platforms offer such features, but what we are looking for in this article is how the way we use these features has changed in the last decade and how it will look in the next one. For comparison, let’s look at the past decade and see how Java-based middleware addressed these needs.

Traditional middleware limitations

One of the well-known traditional solutions satisfying an older generation of the above-listed needs is the Enterprise Service Bus (ESB) and its variants, such as Message Oriented Middleware, lighter integration frameworks, and others. An ESB is a middleware that enables interoperability among heterogeneous environments using a service-oriented architecture (i.e. classical SOA).

While an ESB would offer you a good feature set, the main challenge with ESBs was the monolithic architecture and tight technological coupling between business logic and platform, which led to technological and organizational centralization. When a service was  developed and deployed into such a system, it was deeply coupled with the distributed system framework, which in turn limited the evolution of the service. This often only became apparent later in the life of the software.

Here are a few of the issues and limitations in each category of needs that make ESBs less useful in the modern era.

Lifecycle

In traditional middleware, there is usually a single supported language runtime, (such as Java), which dictates how the software is packaged, what libraries are available, how often they have to be patched, etc. The business service has to use these libraries that tightly couple it with the platform which is written in the same language. In practice, that leads to coordinated services and platform upgrades which prevents independent and regular service and platform releases.

Networking

While a traditional piece of middleware has an advanced feature set focused around interaction with other internal and external services, it has a few major drawbacks. The networking capabilities are centered around one primary language and its related technologies. For the Java language, that is JMS, JDBC, JTA, etc. More importantly, the networking concerns and semantics are deeply engraved into the business service as well. There are libraries with abstractions to cope with the networking concerns (such as the once-popular Hystrix project), but the library's abstractions "leak" into the service: its programming model, exchange patterns, error handling semantics, and the library dependency itself. While it is handy to code and read the whole business logic mixed with the networking aspects in a single location, this tightly couples both concerns into a single implementation and, ultimately, a joint evolutionary path.

State

To do reliable service orchestration, business process management, and implement patterns, such as the Saga Pattern and other slow-running processes, platforms require persistent state behind the scenes. Similarly, temporal actions, such as firing timers and cron jobs, are built on top of state and require a database to be clustered and resilient in a distributed environment. The main constraint here is the fact that the libraries and interfaces interacting with state are not completely abstracted and decoupled from the service runtime. Typically these libraries have to be configured with database details, and they live within the service leaking the semantics and dependency concerns into the application domain.

Binding

One of the main drivers for using integration middleware is the ability to connect to various other systems using different protocols, data formats, and message exchange patterns. And yet, the fact that these connectors have to live together with the application means the dependencies have to be updated and patched together with the business logic. It means the data types and data formats have to be converted back and forth within the service. It means the code has to be structured and the flow designed according to the message exchange patterns. These are a few examples of how even abstracted endpoints influence the service implementation in traditional middleware.

Cloud-native tendencies

Traditional middleware is powerful. It has all the necessary technical features, but it lacks the ability to change and scale rapidly, which is demanded by modern digital business needs. This is what the microservices architecture and its guiding principles for designing modern distributed applications are addressing.

The ideas behind microservices and their technical requirements contributed to the popularization and widespread use of containers and Kubernetes. That started a new wave of innovation that is going to influence the way we approach distributed applications for years to come. Let’s see how Kubernetes and the related technologies affect each group of requirements.

Lifecycle

Containers and Kubernetes evolved the way we package, distribute, and deploy applications into a language-independent format. There is a lot written about the Kubernetes Patterns and the Kubernetes Effect on developers, and I will keep it short here. Notice, though, that for Kubernetes the smallest primitive to manage is the container, and it is focused on delivering distributed primitives at the container level and the process model. That means it does a great job of managing the lifecycle aspects of applications, such as health checks, recovery, deployment, and scaling, but it doesn’t do such a good job of improving the other aspects of distributed applications that live inside the container, such as flexible networking, state management, and bindings.

You may point out that Kubernetes has stateful workloads, service discovery, cron jobs, and other capabilities. That is true, but all of these primitives are at the container level, and inside the container, a developer still has to use a language-specific library to access the more granular capabilities we listed at the beginning of this article. That is what drives projects like Envoy, Linkerd, Consul, Knative, Dapr, Camel-K, and others.

Networking

It turns out the basic networking functionality around service discovery provided by Kubernetes is a good foundation, but not enough for modern applications. With the increasing number of microservices and the faster pace of deployments, the need for more advanced release strategies, managing security, metrics, tracing, recovery from errors, simulating errors, etc. without touching the service has become increasingly appealing and created a new category of software of its own, called service mesh.

What is more exciting here is the tendency of moving the networking-related concerns out of the service containing the business logic and into a separate runtime, whether that is a sidecar or a node-level agent. Today, service meshes can do advanced routing, help with testing, handle certain aspects of security, and even speak application-specific protocols (for example, Envoy supports Kafka, MongoDB, Redis, MySQL, etc.). While service mesh, as a solution, might not have wide adoption yet, it touched a real pain point in distributed systems, and I’m convinced it will find its shape and form of existence.

In addition to the typical service mesh, there are also other projects, such as Skupper, that confirm the tendency of putting networking capabilities into an external runtime agent. Skupper solves multi-cluster communication challenges through a layer-7 virtual network and offers advanced routing and connectivity capabilities. But rather than being embedded into the business service runtime, Skupper runs an instance per Kubernetes namespace, which acts as a shared sidecar.

To sum up, containers and Kubernetes made a major step forward in the lifecycle management of applications. Service mesh and related technologies hit a real pain point and set the foundation for moving more responsibilities out of the application and into proxies. Let’s see what’s next.

State

We listed earlier the main integration primitives that rely on state. Managing state is hard and should be delegated to specialized storage software and managed services. That is not the topic here, but using state through language-neutral abstractions to aid integration use cases is. Today, many efforts try to offer stateful primitives behind language-neutral abstractions. Stateful workflow management is a mandatory capability in cloud-based services, with examples such as AWS Step Functions, Azure Durable Functions, etc. In container-based deployments, CloudState and Dapr both rely on the sidecar model to offer better decoupling of the stateful abstractions in distributed applications.

What I look forward to is abstracting away all of the stateful features listed above into a separate runtime as well. That would mean workflow management, singletons, idempotency, transaction management, cron job triggers, and stateful error handling all happening reliably in a sidecar (or a host-level agent), rather than living within the service. The business logic doesn’t need to include such dependencies and semantics in the application, and it can declaratively request such behavior from the binding environment. For example, a sidecar can act as a cron job trigger, idempotent consumer, and workflow manager, and the custom business logic can be invoked as a callback or plugged in at certain stages of the workflow, error handling, temporal invocations, or unique idempotent requests.

Another stateful use case is caching. Whether that is request caching performed by the service mesh layer, or data caching with something like Infinispan, Redis, Hazelcast, etc., there are examples of pushing the caching capabilities out of the application’s runtime.

Binding

While we are on the topic of decoupling all distributed needs from the application runtime, the tendency continues with bindings too. Connectors, protocol conversions, message transformations, error handling, and security mediation could all move out of the service runtime. We are not there yet, but there are attempts in this direction with projects such as Knative and Dapr. Moving all of these responsibilities out of the application runtime will lead to much smaller, business-logic-focused code. Such code would live in a runtime independent of the distributed system needs, which can be consumed as prepackaged capabilities.

Another interesting approach is taken by the Apache Camel-K project. Rather than using an agent runtime to accompany the main application, this project relies on an intelligent Kubernetes Operator that builds application runtimes with additional platform capabilities from Kubernetes and Knative. Here, the single agent is the operator that is responsible for including the distributed system primitives required by the application. The difference is that some of the distributed primitives are added to the application runtime and some enabled in the platform (which could include a sidecar as well).

Future architecture trends

Looking broadly, we can conclude that the commoditization of distributed applications, by moving features to the platform level, reaches new frontiers. In addition to the lifecycle, now we can observe networking, state abstraction, declarative eventing, and endpoint bindings also available off-the-shelf, and EIPs are next on this list. Interestingly enough, the commoditization is using the out-of-process model (sidecars) for feature extension rather than runtime libraries or pure platform features (such as new Kubernetes features).

We are now approaching full circle by moving all of the traditional middleware features (a.k.a ESBs) into other runtimes, and soon, all we have to do in our service will be to write the business logic.

Traditional middleware and cloud-native platforms overview


Compared to the traditional ESB era, this architecture decouples the business logic from the platform better, but not yet fully. Many distributed primitives, such as the classic enterprise integration patterns (EIPs): splitter, aggregator, filter, content-based router; and streaming processing patterns: map, filter, fold, join, merge, sliding windows; still have to be included in the business logic runtime, and many others depend on multiple distinct and overlapping platform add-ons.

If we stack up the various cloud-native projects innovating at the different domains, we end up with a picture such as the following:

Multi-runtime microservices


The diagram here is for illustration purposes only; it purposefully picks representative projects and maps them to a category of distributed primitives. In practice, you will not use all of these projects at the same time, as some of them overlap and have incompatible workload models. How should we interpret this diagram?

  • Kubernetes and containers made a huge leap in the lifecycle management of polyglot applications and set the foundation for future innovations.
  • Service mesh technologies improved on Kubernetes with advanced networking capabilities and started tapping into the application concerns.
  • While Knative is primarily focused on serverless workloads through rapid scaling, it also addresses service orchestration and event-driven binding needs.
  • Dapr builds on the ideas of Kubernetes, Knative, and Service Mesh and dives into the application runtimes to tackle stateful workloads, binding, and integration needs, acting as a modern distributed middleware.

This diagram is to help you visualize that, most likely in the future, we will end up using multiple runtimes to implement the distributed systems. Multiple runtimes, not because of multiple microservices, but because every microservice will be composed of multiple runtimes, most likely two — the custom business logic runtime and the distributed primitives runtime.

Introducing multi-runtime microservices

Here is a brief description of the multi-runtime microservices architecture that is beginning to form.

Do you remember the movie Avatar and the Amplified Mobility Platform (AMP) "mech suits" developed by scientists to go out into the wilderness to explore Pandora? This multi-runtime architecture is similar to these Mecha-suits that give superpowers to their humanoid drivers. In the movie you have humans putting on suits to gain strength and access destructive weapons. In this software architecture, you have your business logic (referred to as micrologic) forming the core of the application and the sidecar mecha component that offers powerful out-of-the-box distributed primitives. The micrologic combined with the mecha capabilities form a multi-runtime microservice which is using out-of-process features for its distributed system needs. And the best part is, Avatar 2 is coming out soon to help promote this architecture. We can finally replace vintage sidecar motorcycles with awesome mecha pictures at all software conferences ;-). Let’s look at the details of this software architecture next.

This is a two-component model similar to a client-server architecture, where every component is a separate runtime. It differs from a pure client-server architecture in that, here, both components are located on the same host, with reliable networking between them that is not a concern. Both components are equal in importance, and they can initiate actions in either direction and act as the client or the server. One of the components is called Micrologic, and it holds the very minimal business logic stripped out of almost all of the distributed system concerns. The other accompanying component is the Mecha, and it provides all of the distributed system features we have been talking about throughout the article (except lifecycle, which is a platform feature).

Multi-runtime (out-of-process) microservices architecture


There might be a one-to-one deployment of the Micrologic and the Mecha (known as the sidecar model), or there can be one shared Mecha with a few Micrologic runtimes. The former model is most appropriate in environments such as Kubernetes, and the latter in edge deployments.

Micrologic runtime characteristics

Let’s briefly explore some of the characteristics of the Micrologic runtime:

  • The Micrologic component is not a microservice on its own. It contains the business logic that a microservice would have, but that logic can only work in combination with the Mecha component. On the other hand, microservices are self-contained and do not have pieces of the overall functionality or part of the processing flow spread into other runtimes. The combination of a Micrologic and its Mecha counterpart form a Microservice.
  • This is not a function or serverless architecture either. Serverless is mostly known for its managed rapid scaling up and scale-to-zero capabilities. In the serverless architecture, a function implements a single operation as that is the unit of scalability. In that regard, a function is different from a Micrologic which implements multiple operations, but the implementation is not end-to-end. Most importantly, the implementation of the operations is spread over the Mecha and the Micrologic runtimes.
  • This is a specialized form of client-server architecture, optimized for the consumption of well-known distributed primitives without coding. Also, if we assume that the Mecha plays the server role, then each instance has to be specifically configured to work with the individual client(s). It is not a generic server instance aiming to support multiple clients at the same time as a typical client-server architecture.
  • The user code in the Micrologic does not interact directly with other systems and does not implement any distributed system primitives. It interacts with the Mecha over de facto standards, such as HTTP/gRPC and the CloudEvents spec (see the example envelope after this list), and the Mecha communicates with other systems using enriched capabilities, guided by the configured steps and mechanisms.
  • While the Micrologic is responsible only for implementing the business logic stripped out of distributed system concerns, it still has to implement a few APIs at a minimum. It has to allow the Mecha and the platform to interact with it over predefined APIs and protocols (for example, by following the cloud-native design principles for Kubernetes deployments).
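To make the de facto standards point concrete, here is what a minimal CloudEvents 1.0 envelope looks like; the type, source, and payload values are illustrative only:

{
  "specversion": "1.0",
  "type": "com.example.order.created",
  "source": "/nodeapp/orders",
  "id": "41",
  "datacontenttype": "application/json",
  "data": { "orderId": "41" }
}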

Mecha runtime characteristics

Here are some of the Mecha runtime characteristics:

  • The Mecha is a generic, highly configurable, reusable component offering distributed primitives as off-the-shelf capabilities.
  • Each instance of the Mecha has to be configured to work with one Micrologic component (the sidecar model) or configured to be shared with a few components.
  • The Mecha does not make any assumption about the Micrologic runtime. It works with polyglot microservices or even monolithic systems using open protocols and formats, such as HTTP/gRPC, JSON, Protobuf, CloudEvents.
  • The Mecha is configured declaratively with simple text formats, such as YAML or JSON, which dictate what features to enable and how to bind them to the Micrologic endpoints (see the hypothetical sketch after this list). For specialized API interactions, the Mecha can additionally be supplied with specs, such as OpenAPI, AsyncAPI, ANSI-SQL, etc. For stateful workflows composed of multiple processing steps, a spec such as the Amazon States Language can be used. For stateless integrations, Enterprise Integration Patterns (EIPs) can be used with an approach similar to the Camel-K YAML DSL. The key point here is that all of these are simple, text-based, declarative, polyglot definitions that the Mecha can fulfill without coding. Notice that these are futuristic predictions; currently, there are no Mechas for stateful orchestration or EIPs, but I expect existing Mechas (Envoy, Dapr, Cloudstate, etc.) to start adding such capabilities soon. The Mecha is an application-level distributed primitives abstraction layer.
  • Rather than depending on multiple agents for different purposes, such as network proxy, cache proxy, binding proxy, there might be a single Mecha providing all of these capabilities. The implementation of some capabilities, such as storage, message persistence, caching, etc., would be plugged in and backed by other cloud or on-premise services.
  • Some distributed system concerns around lifecycle management are better provided by the managing platform, such as Kubernetes or other cloud services, rather than by the Mecha runtime, using generic open specifications such as the Open App Model.
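Purely to make the declarative idea above concrete, here is a hypothetical Mecha definition; it is not the format of any existing project, just an illustration of the kind of text-based contract a Mecha could fulfill:

# hypothetical Mecha configuration - illustrative only
mecha:
  app: order-service            # the Micrologic this Mecha is bound to
  bindings:
    - name: new-orders
      type: queue               # backed by a pluggable implementation
      route: POST /neworder     # Micrologic endpoint to invoke
  resiliency:
    retries: 3
    timeout: 5s
  state:
    store: key-value            # backed by a pluggable store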

What are the main benefits of this architecture?

The main benefit is loose coupling between the business logic and the ever-increasing list of distributed system concerns. These two elements of software systems have completely different dynamics. The business logic is always unique, custom code, written in-house. It changes frequently, depending on your organizational priorities and ability to execute. On the other hand, the distributed primitives are the ones addressing the concerns listed in this post, and they are well known. They are developed by software vendors and consumed as libraries, containers, or services. This code changes depending on vendor priorities, release cycles, security patches, open-source governing rules, etc. Both groups have little visibility into and control over each other.

Business logic and distributed system concerns coupling in application architectures


Microservices principles help decouple the different business domains through bounded contexts where every microservice can evolve independently. But the microservices architecture does not address the difficulties coming from coupling the business logic with middleware concerns. For certain microservices that are light on integration use cases, this might not be a big factor. But if your domain involves complex integrations (which is increasingly becoming the case for everybody), following the microservices principles will not protect you from coupling with the middleware. Even if the middleware is represented as libraries you include in your microservices, the moment you start migrating and changing these libraries, the coupling will become apparent. And the more distributed primitives you need, the more coupled to the integration platform you become. Consuming middleware as a separate runtime/process over a predefined API rather than as a library helps loosen the coupling and enables the independent evolution of each component.

This is also a better way for vendors to distribute and maintain complex middleware software. As long as the interactions with the middleware are over inter-process communication involving open APIs and standards, the software vendors are free to release patches and upgrades at their own pace. And the consumers are free to use their preferred languages, libraries, runtimes, deployment methods, and processes.

What are the main drawbacks of this architecture?

Inter-process communication. The fact that the business logic and the middleware mechanics (you see where the name comes from) of the distributed system are in different runtimes means that every interaction requires an HTTP or gRPC call rather than an in-process method call. Note, though, that this is not a network call that is supposed to go to a different machine or datacenter. The Micrologic runtime and the Mecha are supposed to be colocated on the same host with low latency and minimal likelihood of network issues.

Complexity. The next question is whether the gained benefits are worth the complexity of developing and maintaining such systems. I think the answer will increasingly incline towards yes. The requirements of distributed systems and the pace of release cycles are increasing, and this architecture optimizes for that. I wrote some time ago that the developers of the future will need hybrid development skills, and this architecture confirms and further reinforces that trend. Part of the application will be written in a higher-level programming language, and part of the functionality will be provided by off-the-shelf components that have to be configured declaratively. Both parts are interconnected not at compile time, nor through in-process dependency injection at startup time, but at deployment time, through inter-process communication. This model enables a higher rate of software reuse combined with a faster pace of change.

What comes after microservices are not functions

Microservices architecture has a clear goal. It optimizes for change. By splitting applications into business domains, this architecture offers the optimal service boundary for software evolution and maintainability through services that are decoupled, managed by independent teams, and released at an independent pace.

If we look at the programming model of the serverless architecture, it is primarily based on functions. Functions are optimized for scalability. With functions, we split every operation into an independent component so that it can scale rapidly, independently, and on-demand. In this model, the deployment granularity is a function. And the function is chosen because it is the code construct that has an input whose rate correlates directly to the scaling behavior. This is an architecture that is optimized for extreme scalability, rather than long term maintainability of complex systems.

What about the other aspect of serverless, which comes from the popularity of AWS Lambda and its fully managed operational nature? In this regard, "AWS Serverless" optimizes for speed of provisioning at the expense of lack of control and lock-in. But the fully managed aspect is not an application architecture; it is a software consumption model. It is orthogonal functionality, similar to consuming a SaaS-based platform, which in an ideal world should be available for any kind of architecture, whether that is monolithic, microservices, mecha, or functions. In many ways, AWS Lambda resembles a fully managed Mecha architecture with one big difference: the Mecha does not enforce the function model; instead it allows more cohesive code constructs around the business domain, split from all middleware concerns.

Application architecture optimizations

Mecha architecture, on the other hand, optimizes microservices for middleware independence. While microservices are independent of each other, they are heavily dependent on embedded distributed primitives. The Mecha architecture splits these two concerns into separate runtimes, allowing their independent release by independent teams. This decoupling improves day-2 operations (such as patching and upgrades) and the long-term maintainability of the cohesive units of business logic. In this regard, the Mecha architecture is a natural progression of the microservices architecture, splitting software based on the boundaries that cause the most friction. That optimization provides more benefits in the form of software reuse and evolution than the function model, which optimizes for extreme scalability at the expense of over-distribution of code.

Conclusion

Distributed applications have many requirements. Creating effective distributed systems requires multiple technologies and a good approach to integration. While traditional monolithic middleware provided all of the necessary technical features required by distributed systems, it lacked the ability to change, adapt, and scale rapidly, which was required by the business. This is why the ideas behind microservices-based architectures contributed to the rapid popularization of containers and Kubernetes; with the latest developments in the cloud-native space, we are now coming full circle by moving all of the traditional middleware features into the platform and off-the-shelf auxiliary runtimes.

This commoditization of application features is primarily using the out-of-process model for feature extension, rather than runtime libraries or pure platform features. That means that in the future it is highly likely that we will use multiple runtimes to implement distributed systems. Multiple runtimes, not because of multiple microservices, but because every microservice will be composed of multiple runtimes; a runtime for the custom micro business logic, and an off-the-shelf, configurable runtime for distributed primitives.

This article was originally published on InfoQ here.

Camel Rebirth with Subsecond Experiences

This post was originally published here. Check out my new Kubernetes Patterns book and  follow me @bibryam for future blog posts.

A look at the past decade

The integration space is in constant change. There are many open source projects and closed source technologies that have not passed the test of time and disappeared from the middleware stacks for good. After a decade, Apache Camel is still here and becoming even stronger for the next decade of integration.

Gang of Four for integration

Apache Camel started life as an implementation of the Enterprise Integration Patterns (EIP) book. Today, these patterns are the equivalent of the object-oriented Gang of Four Design Patterns, but for the messaging and integration domain. They are agnostic of programming language, platform, and architecture, and provide a universal language, notation, and description of the forces around fundamental messaging primitives.

But the Camel community did not stop with these patterns; it kept evolving and adding newer patterns from the SOA, Microservices, Cloud Native, and Serverless paradigms. As a result, Camel turned into a generic pattern-based integration framework suitable for multiple architectures.

Universal library consumption model

While the patterns gave the initial spark to Camel, its endpoints quickly became popular and turned into a universal protocol for using Java-based integration libraries as connectors. Today, there are hundreds of Java libraries that can be used as Camel connectors using the Camel endpoint notation. It takes a while to realize that Camel can also be used without the EIPs and the routing engine. It can act as a connector framework where all libraries are consumed through universal URIs, without a need for understanding the library-specific factories and configurations that vary widely across Java libraries.
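To make the endpoint notation concrete, here is a minimal sketch of a route in the Camel Java DSL; the directory, topic, and broker address are illustrative, and the exact endpoint options depend on the component versions you use:

import org.apache.camel.builder.RouteBuilder;

public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        // consume files from a directory and publish them to a Kafka topic,
        // with both systems addressed through plain endpoint URIs
        from("file:orders/inbox?noop=true")
            .to("kafka:orders?brokers=localhost:9092");
    }
}

The file and Kafka libraries are configured entirely through the URI options, without touching their native factories or configuration classes.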

The right level of abstraction

When you talk to developers who have not used Camel in anger before, they will tell you that it is possible to do integration without Camel. And they are right about the 80% of easy use cases, but not about the remaining 20% of cases that can turn a project into a multi-year frustrating experience. What they do not realize yet is that without Camel, there are multiple manual ways of doing the same thing, but none are validated by the experience of hundreds of open source developers. And if you fast-forward a few years, with tens of different systems to integrate with, tens of developers coming and going, and hundreds of microservices, an integration project can quickly turn into a bespoke home-grown framework that nobody wants to work on. Doing integration is easy, but doing good integration that will evolve and grow for many years, by many teams, is the hardest part. Camel addresses this challenge with universal patterns and connectors, combined with integration-focused DSLs, that have passed the test of time. Next time you think you don't need Camel, you are either thinking of short-term gains, or you do not yet realize how complex integration can become.

Embracing change

It takes only a couple of painful experiences in large integration projects to start appreciating Camel. But Camel is not great only because it was started by and built on the works of great minds; it is great because it evolves thanks to the world's knowledge, shared through the open source model and its networking effects. Camel started as the routing layer in ESBs during the SOA period, with a lot of focus on XML, WS, JBI, OSGi, etc., but then it quickly adapted for REST, Swagger, circuit breakers, Sagas, and Spring Boot in the Microservices era. And the world has not stopped there: with Docker and Kubernetes, and now the Serverless architecture, Camel keeps embracing the change. That is because Camel is written for integrating changing environments, and Camel itself grows and shines on change. Camel is a change-enabling library for integration.

Behind the scene engine

One of Camel's secret sauces is that it is a non-intrusive, unopinionated, small (5MB and getting smaller) integration library without any affinity for where and how you consume it. If you notice, this is the opposite of an ESB, which Camel is commonly confused with because of its extensive capabilities. Over the years, Camel has been used as the internal engine powering projects such as:
  • Apache ServiceMix ESB
  • Apache ActiveMQ
  • Talend ESB
  • JBoss Switchyard
  • JBoss Fuse Service Works
  • Red Hat Fuse
  • Fuse Online/Syndesis 
  • And many other frameworks mentioned here.
You can use Camel standalone, embed it with Apache Tomcat, with Spring Boot starters, JBoss WildFly, Apache Karaf, Vert.x, Quarkus, you name it. Camel doesn't care and it will bring superpowers to your project every time.

Looking to the future

Nobody can tell what the ideal integration stack will look like in a decade, nor can I. But I will tell you about two novelties coming into Apache Camel now (and into Red Hat Fuse later), and why they will have a noticeable positive effect for developers and the business. I call these changes subsecond deployment and subsecond startup of Camel applications.

Subsecond deployments to Kubernetes

There was a time when cloud-native meant different technologies. Today, after a few years of natural selection and consolidation in the industry, cloud-native means applications created specifically for Kubernetes and its ecosystem of projects around CNCF. Even with this definition, there are many shades of cloud-native, from running a monolithic non-scalable application in a container, to triggering a function that is fully embracing the cloud-native development and management practices. The Camel community has realized that Kubernetes is the next generation application runtime, and it is steadily working on making Camel a Kubernetes native integration engine. The same way Camel is a first-class citizen in OSGI containers, JEE application servers, other fat-jar runtimes, Camel is becoming a first-class citizen on Kubernetes, integrating deeply and benefiting from the resiliency and scalability offered by the platform.

Here are a few of the many enhancement efforts going on in this direction:
  • Deeper Kubernetes integration - Kubernetes API connector, full health-check API implementation for Camel subsystems, service discovery through a new ServiceCall EIP, configuration management using ConfigMaps. Then a set of application patterns with special handling on Kubernetes such as: clustered singleton routes, scalable XA transactions (because sometimes, you have to), SAGA pattern implementation, etc.
  • Cloud-native integrations - support for other cloud-native projects such as exposing Camel metrics for Prometheus, tracing Camel routes through Jaeger, JSON formatted logging for log aggregation, etc.
  • Immutable runtimes - whether you use the minimalist immutable Karaf packaging, or Spring Boot, Camel is a first class citizen ready to put in a container image. There are also Spring Boot starter implementations for all Camel connectors, integration with routes, properties, converters, and whatnot.
  • Camel 3 - is a fact and actively progressing. A big theme for Camel 3 is to make it more modular, smaller, with faster startup time, reactive, non-blocking and triple awesome. This is the groundwork needed to restructure Camel for the future cloud workloads.
  • Knative integration - Knative is an effort started by Google in order to bring some order and standardization to the serverless world dominated by AWS Lambda. Camel is among the projects that have integrated with Knative primitives from the early days, and it enhances the Knative ecosystem with hundreds of connectors acting as generic event sources.
  • And here is a real game-changer initiative: Camel-K (a.k.a deep Kubernetes integration for Camel) - we have seen that Camel is typically embedded into the latest modern runtime where it acts as the developer-friendly integration engine behind the scene. The same way Camel used to benefit from JEE services in the past for hot-deployment, configuration management, transaction management, etc, today Camel-K allows Camel runtime to benefit from Kubernetes features for high-availability, resiliency, self-healing, auto-scaling, and basically distributed application management in general. The way Camel-K achieves this is through a CLI and an Operator where the latter is able to understand the Camel applications, its build time dependencies, runtime needs, and make intelligent choices from the underlying Kubernetes platform and its additional capabilities (from Knative, Istio, Openshift and others in the future). It can automate everything on the cluster such as picking the best suited container image, runtime management model and update them when needed. The CLI can automate the tasks that are on the developer machine such as observing the code changes and streaming those to the Kubernetes cluster, and printing the logs from the running Pods. 
Camel route auto-deployment to Kubernetes with Camel-K
Camel-K operator understands two domains: Kubernetes and Camel. By combining knowledge of both areas, it can automate tasks that usually require a human operator.
The really powerful part is that, with Camel-K, a Camel route can be built and deployed from source code to a running Camel route on Kubernetes in less than a second!
Time to deploy and run a Camel integration (in seconds)
 
Forget about making a coffee, or even having a sip, while building and deploying a Camel route with Camel K. As soon as you make changes to your source code and open a browser, the Camel route will be running in Kubernetes. This has a noticeable impact on the way developers write Camel code, compile, drink coffee, deploy, and test. Apart from changing development practices and habits, this toolset will significantly reduce the development cycles, which will be noticed by the business stakeholders too. For a live demonstration, check out the following awesome video from the Fuse engineers working on the Camel-K project.
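To give a feel for the workflow, here is a minimal sketch, assuming the kamel CLI is installed and pointed at a cluster where the Camel-K operator is running; the file name and route are illustrative:

// Hello.java - a self-contained Camel K integration file
import org.apache.camel.builder.RouteBuilder;

public class Hello extends RouteBuilder {
    @Override
    public void configure() {
        // fire a message every 3 seconds and log it
        from("timer:tick?period=3000")
            .setBody().constant("Hello from Camel K")
            .to("log:info");
    }
}

Running it in development mode is a single command:

kamel run Hello.java --dev

The --dev mode keeps watching the source file, redeploys on change, and streams the Pod logs back to the terminal, which is exactly the feedback loop described above.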

Subsecond startups of Camel applications

A typical enterprise integration landscape is composed of stateless services, stateful services, clustered applications, batch jobs, file transfers, messaging, real-time integrations, and maybe even blockchain-based business processes. To that mix, today, we also have to add serverless workloads, which are best suited for event-driven use cases. Historically, the heavy and slow Java runtime had significant drawbacks compared to Go, JavaScript, and other light runtimes in the serverless space. That is one of the main motivations for Oracle to create GraalVM/Substrate VM. Substrate VM is a framework that enables ahead-of-time (AOT) compilation of Java applications into native executables that are light and fast. Then a recent effort by Red Hat led to the creation of the Quarkus project, which further improves the resource consumption, startup, and response times of Java applications mind-blowingly (a term not used lightly here).

Supersonic Subatomic Java with Quarkus
 
As you can see from the metrics above, Quarkus combined with SubstrateVM is not a gradual evolution. It is a mutation, a revolutionary jump that suddenly changes the perspective on Java's footprint and speed in cloud-native workloads. It makes Java friendly for the serverless architecture. Considering the huge Java ecosystem of developers and libraries, it even turns Java into the best-suited language for serverless applications. And Camel, combined with Quarkus, becomes the best-placed integration library in this space.
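If you want to try this yourself, a native build of a Quarkus application is a short exercise. A minimal sketch, assuming a project generated from the Quarkus quickstarts with GraalVM (or a container-based native build) available:

# build a native executable using the native Maven profile
./mvnw package -Pnative

# run the resulting binary directly
./target/*-runner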

Summary

With the explosion of Microservices architecture, the number of services has increased tenfold which gave birth to Kubernetes-enabled architectures. These architectures are highly dynamic in nature, and most powerful with light and fast runtimes that enable instant scale up and higher deployment density.

Camel is the framework to fill the space between disparate systems and services. It offers data consistency guarantees, reliable communication, failover, failure detection and recovery, and so on, in a way that makes developers productive. Now, imagine the same powerful Apache Camel based integration in 2020 that: deploys to Kubernetes in 20ms; starts up in 20ms; requires 20MB of memory; consumes 20MB on the disk... That is regardless of whether it runs as a stateless application in a container or as a function on Knative. That is 100x faster deployments to Kubernetes, 100x faster startup time, and 10x less resource consumption, allowing real-time scale-up, scale-down, and scale to zero. That is a change that developers will notice during development, users will notice when using the system, and the business will notice in the infrastructure cost and overall delivery velocity. That is the real cloud-native era we have been waiting for.

The next integration evolution - blockchain

Below is the conclusion from an article I wrote at TechCrunch. Check out the full article here.

Enterprise integration has multiple nuances. Integration challenges within an organization, where all systems are controlled by one entity and participants have some degree of trust in each other, are mostly addressed by modern ESBs, BPM systems and microservices architectures. But when it comes to multi-party B2B integration, there are additional challenges. These systems are controlled by multiple organizations, have no visibility into each other's business processes and do not trust each other. In these scenarios, we see organizations experimenting with a new breed of blockchain-based technology that relies not only on sharing of protocols and contracts but also on sharing of end-to-end business processes and state.
Integration evolution stages

And this trend is aligned with the general direction in which integration has been evolving over the years: from sharing the very minimum of protocols, to sharing and exposing more and more in the form of contracts, APIs and now business processes. This shared integration infrastructure enables new transparent integration models where the previously private business processes are now jointly owned, agreed, built, maintained and standardized using the open-source collaboration model. This can motivate organizations to share business processes and form networks to benefit further from joint innovation, standardization and deeper integration in general.

A Java developer’s first impressions from Corda

Follow me on Twitter for other posts in this space. If you prefer, read the same post on Medium.

Recently I had a chance to play a little bit with the open source, permissioned, JVM-based blockchain platform Corda. I was surprised to discover how it blends blockchain ideas with commodity middleware technology and creates a new breed of decentralized enterprise integration. Below are my first impressions of it, along with an Apache Camel connector contribution.

What is Corda? 

Corda is a decentralized database and business process platform designed and built from the ground up for the implementation of legal agreements among identifiable parties. It is a DLT implementation heavily influenced by Bitcoin's UTXO model and driven by the "enterprisy" requirements of the financial industry. Corda is written in Kotlin, runs on the JVM and uses many proven middleware technologies. As such, compared to other blockchain platforms, Corda offers a low entry barrier for Java developers experienced with integration, messaging and business process management.

Design Principles

  • Permissioned (instead of a permissionless network such as Bitcoin or Ethereum) - this is no surprise, as enterprise blockchain use cases are primarily focused on automating business integration challenges among identifiable parties.
  • Point-to-point (instead of global transaction broadcasts as in Bitcoin and others) - this enables data to be shared only among the nodes that need to know it, which also leads to improved privacy and scalability.
  • UTXO model similar to Bitcoin (instead of the account model of Ethereum) - this is the part that makes Corda a DLT/blockchain rather than a distributed business process management platform.
  • Re-use (instead of building everything from scratch) - this is my favorite part of Corda. Reuse of the Java ecosystem, reuse of relational databases, reuse of messaging systems, etc.
The combination of these design principles makes Corda a unique DLT platform among its competitors. It has elements of Bitcoin's UTXO model, Ethereum's smart contract capabilities and Fabric's private channels, and most importantly, it reuses and builds on top of existing battle-tested middleware technologies whenever possible.

Main Concepts

  • A permissioned network made up of point-to-point communicating nodes.
  • A ledger where each node maintains its own database, rather than a single shared store.
  • Notary nodes that prevent double spends and validate transactions.
  • Oracle services that only sign transactions if the included facts are true (slightly different from typical oracles).
  • State objects are immutable and represent on-ledger facts. State is modified through transactions and stored only on the nodes that own it.
  • Contracts are deterministic, JVM-based functions that validate transactions.
  • Transactions are candidate updates to the ledger and must be contractually valid and signed to be committed.
  • Flows encapsulate business processes and abstract away all the networking, I/O, storage and concurrency. All smart contract activity occurs within the scope of flows, which can be started through RPC calls or other flow calls. Flows do not run within a sandbox as contracts do, but are executed as regular Java code.
A Corda Flow that interacts with Node A, Node B, and the Notary Pool

In Ethereum, the concept of a smart contract encapsulates both the business logic and the state in one. In Corda, state and contract objects are separate concepts: the state is persisted, and contracts are deterministic functions (meaning the transaction validations performed by the contracts on different nodes should produce the same result).
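As a rough illustration of that separation, here is a simplified sketch in Java (the IOU state and its validation rule are hypothetical; a real CorDapp would add commands, signer checks and the contract reference):

import java.util.Arrays;
import java.util.List;
import net.corda.core.contracts.Contract;
import net.corda.core.contracts.ContractState;
import net.corda.core.identity.AbstractParty;
import net.corda.core.transactions.LedgerTransaction;

// Hypothetical immutable on-ledger fact, persisted only on the nodes that own it.
public class IouState implements ContractState {
    private final int amount;
    private final AbstractParty lender;
    private final AbstractParty borrower;

    public IouState(int amount, AbstractParty lender, AbstractParty borrower) {
        this.amount = amount;
        this.lender = lender;
        this.borrower = borrower;
    }

    public int getAmount() { return amount; }

    @Override
    public List<AbstractParty> getParticipants() {
        return Arrays.asList(lender, borrower);
    }
}

// Hypothetical deterministic contract: every node that runs verify() against
// the same transaction must reach the same conclusion.
class IouContract implements Contract {
    @Override
    public void verify(LedgerTransaction tx) {
        IouState out = tx.outputsOfType(IouState.class).get(0);
        if (out.getAmount() <= 0) {
            throw new IllegalArgumentException("The IOU amount must be positive.");
        }
    }
}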
In addition, Corda introduces the concept of a flow (a kind of distributed orchestration engine), which in the Ethereum world would be similar to contracts calling each other (a kind of choreography). But Corda flows are not deployed to all nodes and are not part of the shared state; rather, they are standard JVM code specific to individual nodes.
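For a feel of what a flow looks like, here is a minimal skeleton in Java (the class and its logic are hypothetical; a real flow would use the ServiceHub to build, verify, sign and finalize a transaction):

import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.InitiatingFlow;
import net.corda.core.flows.StartableByRPC;
import net.corda.core.identity.Party;

// Hypothetical flow skeleton: it encapsulates a business process that can be
// started over RPC, while Corda handles networking, checkpointing and storage.
@InitiatingFlow
@StartableByRPC
public class PingFlow extends FlowLogic<String> {

    private final Party counterparty;

    public PingFlow(Party counterparty) {
        this.counterparty = counterparty;
    }

    @Suspendable
    @Override
    public String call() throws FlowException {
        // A real flow would build a transaction here, verify it against the
        // contract, collect signatures from the counterparty and finalize it.
        return "pong, talked to " + counterparty.getName();
    }
}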

Technology Stack

Driven by the "re-use" principle, Corda reuses existing storage, messaging and Java solutions. While blockchain platforms such as Quorum take a permissionless PoW framework such as Ethereum and make it "enterprisy" by replacing the consensus mechanism, removing gas payments, introducing private transactions, etc., Corda takes the opposite approach. Corda takes existing middleware technologies and applies the Bitcoin concept of UTXO, creating a new class of software that can be described as "a distributed business process and state management system". Corda achieves that through the use of commodity technologies such as relational databases for storage, and messaging for state replication and distributed business process coordination.
Main components of Corda
High-level technology stack:
  • It builds with Gradle, requires Oracle JDK 8, and runs on Docker (and Linux in production).
  • A Corda node is a flat-classpath JVM application (no Spring Boot, app server, or OSGi container required).
  • Storage: relational database - H2, PostgreSQL, SQL Server, OracleDB.
  • Object-Relational Mapping: JPA - JBoss Hibernate.
  • Messaging: AMQP-based - Apache ActiveMQ Artemis.
  • Metrics: Jolokia.
  • Other: Quasar, Kryo, Shiro, Jackson, etc.

Apache Camel Integration with Corda

Driven by my background in enterprise integration and my interest in blockchain, I recently created and wrote about an Apache Camel connector for Ethereum and Quorum. In the same spirit of exploring enterprise blockchains, I created an Apache Camel connector for Corda. The connector uses the Corda RPC library and provides Camel producer/consumer endpoints to interact with a Corda node. The component offers a consumer for subscribing to and receiving events from a Corda node, and a producer for sending commands to a node.

Apache Camel connector for Corda
Here is the full set of supported operations:

Consumer: vaultTrack, vaultTrackBy, vaultTrackByCriteria, vaultTrackByWithPagingSpec, vaultTrackByWithSorting, stateMachinesFeed, networkMapFeed, stateMachineRecordedTransactionMappingFeed, startTrackedFlowDynamic.

Producer: currentNodeTime, getProtocolVersion, networkMapSnapshot, stateMachinesSnapshot, stateMachineRecordedTransactionMappingSnapshot, registeredFlows, clearNetworkMapCache, isFlowsDrainingModeEnabled, setFlowsDrainingModeEnabled, notaryIdentities, nodeInfo, addVaultTransactionNote, getVaultTransactionNotes, uploadAttachment, attachmentExists, openAttachment, queryAttachments, nodeInfoFromParty, notaryPartyFromX500Name, partiesFromName, partyFromKey, wellKnownPartyFromX500Name, wellKnownPartyFromAnonymous, startFlowDynamic, vaultQuery, vaultQueryBy, vaultQueryByCriteria, vaultQueryByWithPagingSpec, vaultQueryByWithSorting.
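As a rough sketch of how these operations surface in a route (the host, credentials and exact option names are illustrative assumptions; check the component documentation for the precise URI syntax):

import org.apache.camel.builder.RouteBuilder;

// Illustrative sketch: one consumer subscribing to a feed over Corda RPC and
// one producer requesting a snapshot, both through the camel-corda endpoints.
public class CordaRoutes extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // Consumer: receive network map changes pushed by the node
        from("corda://localhost:10006?username=user1&password=test&operation=NETWORK_MAP_FEED")
            .log("Network map changed: ${body}");

        // Producer: periodically ask the node for a snapshot of the network map
        from("timer:networkMap?period=60000")
            .to("corda://localhost:10006?username=user1&password=test&operation=NETWORK_MAP_SNAPSHOT")
            .log("Known nodes: ${body}");
    }
}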

To find out more about Camel and how it can complement Corda solutions, read the Camel Ethereum connector article linked above.

Conclusion

Public permissionless blockchains are facing serious technical challenges in the form of scaling, governance, energy waste and non-technical challenges with speculation, regulation, general usefulness and applicability. They have the noble idea of decentralizing everything but are yet to prove that the technology and the economic models are capable of delivering that vision.

On the other hand, private permissioned blockchains such as Corda, Fabric and Quorum are immune to these technical challenges as they target use cases with a smaller number of identifiable parties in regulated markets. Their goal is to improve and automate the existing business models of the enterprise rather than trying to discover brand new economic models. In a sense, permissioned blockchains represent the next generation of cross-organization business process and data integration systems.

In this space, Corda is not revolutionary, but rather an evolutionary platform built on top of well-established storage and middleware technology ecosystems. The blockchain technology still has to prove itself, and building on top of proven technology is the first step.

My next stop on exploring enterprise blockchains will be Hyperledger Fabric. Take care.

Enterprise Integration for Ethereum

If you prefer, read the same post on Medium.
The most popular open source Java integration library, Apache Camel, now supports Ethereum's JSON-RPC API.

The Ethereum Ecosystem

Ethereum is an open source, public blockchain platform for running smart contracts. It provides a decentralized Turing-complete virtual machine that can execute scripts, and a cryptocurrency used to compensate participant mining nodes for computations performed or to mitigate spam. Today, Ethereum is one of the most established and mature blockchain platforms, with interest from small and large companies, nonprofit organizations and governments. There is a lot that can be said about the Ethereum ecosystem and the pace at which it moves. But the facts speak for themselves; Ethereum has the momentum and all the indications of a technology with potential:
  • Ethereum has an order of magnitude more active developers than any other blockchain platform and, as dictated by Metcalfe's law, this gap widens day by day. The Ethereum coding school CryptoZombies has over 200K users, and the Truffle development framework has over half a million downloads.
  • The cloud platforms Amazon Web Services and Microsoft Azure offer services for one-click Ethereum infrastructure deployment and management.
  • The Ethereum technology has the interest of enterprise software companies. Customized Ethereum-based applications are being developed and experimented with by financial institutions such as JPMorgan Chase, Deloitte, R3, Innovate UK, Barclays, UBS, Credit Suisse and many others. One of the best known in this area is Quorum, the permissioned variant of Ethereum developed by J.P. Morgan Chase.
  • In 2017, the Enterprise Ethereum Alliance (EEA) was set up by various blockchain start-ups, Fortune 500 companies, research groups and others with the aim of helping the adoption of Ethereum-based technology. It provides standards and resources for businesses to learn about Ethereum and leverage this groundbreaking technology to address specific industry use cases.
Ethereum has passed the moment when it was a hipster technology or a scientific experiment, and now it is a fundamental open source decentralization technology that enterprise companies are looking into. Talking about open source and the enterprise, I thought I would also make my tiny contribution to the Ethereum ecosystem and help its adoption. Let's see what it is.

Open Source Enterprise Integration

Ethereum is distributed and decentralized, but it is mostly a closed system with its embedded ledger, currency and executing nodes. In order to be useful for the enterprise, Ethereum has to be well integrated with existing legacy and new systems. Luckily, Ethereum offers a robust and lightweight JSON-RPC API with good support for the JavaScript language. But in enterprise companies, JavaScript is not the primary choice for integration; it is rather Java, followed by .NET. Java is not necessarily lightweight or fast evolving, but it has a huge developer community and a mature library ecosystem, making it the top choice for the majority of enterprise companies. The main factor contributing to the productivity of the Java language is the reuse of existing libraries and avoiding reinventing the wheel. One of the most popular libraries enabling that kind of reuse for integration is Apache Camel. Luckily, Camel happens to be my passion and a project I have been contributing to for many years, so connecting the two was the most natural thing for me to do.
Building blocks of Apache Camel
For those who are coming from a blockchain background and are not familiar with Camel, here is a very brief intro. Apache Camel is a lightweight open source integration library that is composed conceptually of three parts:
  • Implementations of the widely used Enterprise Integration Patterns (EIPs). (Note these are not Ethereum Improvement Proposals, which share the same acronym.) EIPs provide a common notation, language and definition of the concepts in the enterprise integration space (think of publish-subscribe, dead letter channel, content-based router, filter, splitter, aggregator, throttler, retry, circuit breaker, etc.). Some of these patterns have been around for over a decade and some are new, but they are well known by anyone doing messaging and distributed system integration for a living.
  • The second major part of Apache Camel is the huge connector library. Basically, as long as there is a Java library for a protocol, system endpoint or SaaS API, most likely there is a Camel connector for it (think of HTTP, JMS, SOAP, REST, AWS SQS, Dropbox, Twitter, and now Ethereum). Connectors abstract away the complexity of configuring the different libraries and provide a unified URI-based approach for connecting to all kinds of systems.
  • And the last piece of Apache Camel is the Domain Specific Language (DSL) that wires connectors and EIPs together in a higher-level, integration-focused language. The DSL, combined with connectors and patterns, makes developers highly productive in connecting systems and creates solutions that are industry standard and easier to maintain for long periods. All these are characteristics that are important for enterprise companies looking to create modern solutions based on mature technology.
Companies are more integrated than ever, and the systems within companies are more integrated than ever. If you are a Java shop, most likely there is already some Apache Camel based integration in use somewhere in the organization. Now you can use Camel, and all the capabilities it provides, to talk to Ethereum as well.
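To make the three building blocks tangible, here is a small sketch in the Camel Java DSL that combines connectors with a couple of EIPs (the directories and queue names are made up for illustration, and a configured JMS component is assumed):

import org.apache.camel.builder.RouteBuilder;

// Illustrative only: a file connector, a message filter and a content-based
// router, all wired together by the Camel DSL.
public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("file:orders/inbox")                          // connector: poll a directory for order files
            .filter(xpath("/order[not(@test)]"))           // EIP: drop test orders
            .choice()                                      // EIP: content-based router
                .when(xpath("/order/@priority = 'high'"))
                    .to("jms:queue:highPriorityOrders")    // connector: JMS queue
                .otherwise()
                    .to("jms:queue:normalOrders")
            .end();
    }
}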

Apache Camel Connector for Ethereum

The natural intersection of the two technologies is a Camel connector for Ethereum. Such a connector would allow integrating Ethereum with any other system, interaction style, and protocol. For that purpose, I evaluated the existing Java libraries for Ethereum and came to the conclusion that web3j is the right fit for this use case. Web3j is an actively developed, feature-rich Java library for interacting with Ethereum-compatible nodes over JSON-RPC. The Camel-web3j connector (the technical name for the Camel Ethereum connector) is a thin wrapper that gives an easy way to use the capabilities offered by web3j from the Apache Camel DSL. Currently, the connector exposes a broad set of the operations offered by web3j.
The full power of this integration comes not from the connector features, but when the connector is used together with the other connectors, patterns and all other Camel capabilities to provide a complete integration framework around Ethereum.

Ethereum-compatible JSON-RPC APIs
Next, I'm going to focus on adding support for Parity's and Geth's personal client APIs, Ethereum wallet support, and more. The aim is to keep the component up-to-date with the web3j capabilities that are useful in system-to-system integration scenarios with Apache Camel. The connector is pushed to Apache Camel 2.22 and ready for early adopters to give it a try and provide feedback. To get started, have a look at the unit tests to discover how each operation is configured, and the integration tests to see how to connect to Ganache or the Ethereum mainnet. Enjoy.

Use Cases for Apache Camel

Below is the Enterprise Ethereum Architecture Stack (EEAS), which represents a conceptual framework of the common layers and components of an Enterprise Ethereum (EE) application according to the client specification v1.0.

Enterprise Ethereum Architecture Stack
If you wonder where exactly Camel fits here, Camel-web3j is part of the tooling layer as an integration library with a focus on system-to-system integration. It uses the public Ethereum JSON-RPC API, which any Enterprise Ethereum compatible implementation must support and keep backward compatible with.
Then, Camel would primarily be used to interact with services that are external to Ethereum but trusted by the smart contracts (so-called Oracles). In a similar manner, Camel can be used to interact with Enterprise Management Systems to send alerts and metrics, report faults, change configurations, etc.

The main use cases I can think of for this connector are:
  • Listen for new blocks and events happening in the Ethereum network; filter, transform, enrich and publish them into other systems. For example, listen for new blocks, retrieve their transactions, filter out uninteresting ones, enrich others, and process them. That can be done using Ethereum node filter capabilities, or purely with Camel, using polling consumers to query a node periodically and idempotent filters to prevent processing previously processed blocks, etc.
  • The other use case would be to listen for events and commands coming from an enterprise system (maybe a step in a business process) and then tell the Ethereum network about it. For example, a KYC is approved or a payment is received in one system, which causes Camel to talk to a second system, retrieve a user's ERC20 address and perform an Ethereum transaction.
Real-world uses of Camel would involve a more complex mixture of the above scenarios, ensuring high availability, resilience, replay, auditing, etc., which Camel is really good at.
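As a rough sketch of the first use case (the node URL, option names and queue are illustrative assumptions, and a configured JMS component is assumed; the exact camel-web3j URI options should be checked against the component documentation):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.idempotent.MemoryIdempotentRepository;

// Illustrative sketch: periodically ask a local node for the latest block
// number, skip numbers already seen, and hand new ones to another system.
public class BlockWatchRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("timer:poll?period=15000")                                        // poll every 15 seconds
            .to("web3j://http://localhost:8545?operation=ETH_BLOCK_NUMBER")    // query the Ethereum node over JSON-RPC
            .idempotentConsumer(body(),                                        // EIP: skip already processed blocks
                MemoryIdempotentRepository.memoryIdempotentRepository())
            .log("New block number: ${body}")
            .to("jms:queue:newBlocks");                                        // publish to an enterprise system
    }
}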

Ethereum Oracle Implemented in Apache Camel

    "Talk is cheap. Show me the code." - Linus Torvalds

On many occasions, smart contracts need information from the real world to operate. An oracle is, simply put, a smart contract that is able to interact with the outside world. To demonstrate the usage of Camel-web3j, I created a Camel route that represents an oracle. The route listens for CallbackGetBTCCap events on a specific topic, and when such an event is received, the Camel route generates a random value and passes it to the same contract by calling the setBTCCap method. That is basically a "Hello world!" the Ethereum way.

To trigger the event, you can call the updateBTCCap method on the smart contract using the following unit test:
mvn test -Dtest=CamelOracleRouteTest#updateBTCCap
To check the current price in the contract, you can call the getBTCCap method on the smart contract using the following unit test:
mvn test -Dtest=CamelOracleRouteTest#getBTCCap
Check out the full instructions, the smart contract and the Camel routes on GitHub and try it for yourself. If you use the component and have questions or feedback, or if you like this and are interested in implementing a Camel connector for other blockchain projects, reach out. Take care.
