Building Distributed Workflow Applications on Amazon with Camel

Pipeline with SNS-SQS
A workflow consist of independent tasks performed in particular sequence determined by dynamic conditions. Very often a workflow represents a business process, for example the order processing steps in a ecommerce store.
Amazon Web Services offer various tools for building distributed and scalable workflow applications. One approach for building such an application is to use topics and queues for connecting the distinct steps in the workflow process. Then we can use publish/subscribe,  competing consumers and other mechanisms to scale our application and soon even the simplest application takes a shape similar to this:
Each step of the pipeline is connected to the next one with a queue and each step performs some actions and takes decision what is the next step. In addition using SNS/SQS involves some other low level tasks:
- Serialize/deserialize the data
- Ensure consistency (FIFO order) for SQSmessages
- Make sure message size is not exceeded
- Invent some kind of auditing support
- Subscriber queues to topics, assign permissions
- Manage DLQs
At the end it works, but overcoming these technical challenges takes as much time as writing the actual code that delivers the business value.
Simple Workflow Service
SWF on the other hand offers a higher level API for writing distributed, asynchronous workflow applications. It automatically serializes/deserializes data, manages application state, offers auditability, guarantees strong consistency, supports multiple versions. Most importantly, it ensures that the workflow orchestration and business  logic execution are separated. Any typical SWF application has the following building blocks:
In SWF terms, a workflow is the actual template that describes the distinct steps a process should follow. And a workflow execution is one run of this template.
Starter - the process that can start, stop and interact with a workflow execution.
Decider - the process that orchestrates and decides what is the next step of a workflow exection.
Worker - a process that executes a tasks from a specific type.
SWF Console - provides full visibility and control of the execution.
An example workflow execution can go through the following steps: a starter starts a workflow execution, SWF receives it, asks the decider what is the next step, then based on the decision passes the task to an appropriate activity worker. Once the result from the activity worker is received SWF asks the decider again for the next step, and depending on the response may execute another worker or not. This flow continues till the decider replies that the workflow is completed. You can see how the decider orchestrate each of the steps of the workflow and the activity workers perform the individual tasks. All that is managed by SWF and auditable at any stage.
Why use Camel?
The amazon provided Java clients work by using annotations to generate proxy classes to access SWF services. The whole process of generating and using proxy classes combined with the dependency from the starter to the decider, and from the decider to the activity workers is not very joyful. And what can be better than using a Camel route for orchestration and another route for the actual activity worker? The result is a Camel SWF component that is in Camel master now. Camel-swf component has two types of endpoints: workflow and activity.
A workflow producer allows us to start, terminate, cancel, signal, get state or retrieve the whole execution history of a workflow execution. In our diagram it represents the starter. Here is an example of how to start a workflow execution:
A workflow consumer is the decider. It receives decision tasks from SWF service and either schedules activity tasks for execution or indicates that the workflow execution has completed. It is a stateless deterministic route that only job is to orchestrate tasks:
The activity endpoints allow us to interact with the activity tasks. An activity producer is used to schedule activity tasks, and it can be used only from a decider route (actually decider thread). It is because only a decider can schedule activity tasks. The last box in our diagram that we have to provide implementation is the activity worker, which can be created using an activity consumer. This endpoint will receive activity tasks from SWF, execute them and return the results back to SWF. This is the bit that actually performs the business logic:
So any SWF application consist of a starter(workflow producer) that starts the execution, a decider (worfklow consumer) that receives decision tasks and schedules activity tasks (using activity producer) and the activity workers (activity consumer) that performs the tasks. And the communication between these endpoints is asynchronous, consistent and managed by SWF service.
It is not the easiest component to use, but it pays off with a simple and scalable architecture.
PS: Thanks to my ex-manager S. Wheeler for letting me contribute this component back to the Camel community.

More Apache Camel Books

It is hard to write impartial book review when you are the author of the first one, technical reviewer of the second one and really like the third book, but I'll try my best with this post.

Recently I blogged about my Instant Apache Camel Message Routing book. It is a short book intented for new comers to Apache Camel, showing quickly, how to create messaging applications in Camel using Enterprise Integration Patters. In this book I tried to distile and put the most important bits and peices about Camel and most commonly used integration patters. So there is no fluff, no repetion, only 50 pages of Camel integration crush course with examples and diagrams. If you want to get a high level view of Camel and EIPs, without diving into the details, that's the book to read.

The other new Apache Camel book that is expected by the end of this year is called Apache Camel Developer's Cookbook by Scott Cranton and Jakub Korab. It is a lookup guide full of recipes for everything you might want to do with Camel. It provides around 500 pages of best practice tips for using Apache Camel and lots of examples. I enjoyed reviewing it and even learned some new useful tips. If you prefer learing with examples, this cookbook is a must have.

And there is of course the famous Camel in Action by Claus Ibsen and Jonathan Anstey. It is a book which I started learning Camel couple of years ago and I still read some of the chapters from time to time. If you want to learn the philisofy behind Camel and find out how Camel internals work this is the book. A must read before putting #Camel on your CV.

I cannot stop myself mentioning the Bible of Integrations, the Enterprise Integration Patterns book by Gregor Hohpe. This is not a Camel book, but it lays down the foundation of Enterprise Integration Patterns which Camel implements. The book's website is a great pattern reference with use cases and diagrams. It is a huge book (with more than 700 pages), but a good to have as a reference for any team. You might be surpised to find out that there is pattern for everything.

In short: if you want to discover what is Apache Camel and how to use EIPs without going into the  internals go for Camel Message Routing book. If you want to see lots of example recipes and tips, go for Camel Cookbook. And if you want to deep dive into Camel go for Camel in Action.

PS: When I created this post there were three Camel books, but by the time I decided to publish it there were four already. A really productive year for the Camel community. The fourt book in my list is called Instant Apache Camel Messaging System by Evgeniy Sharapov. It is another short book, intented for beginners that emphsasys on test driven approach for writing Camel applications. I haven't read the book, but Claus Ibsen did and posted a review here.
Choose a book, read it, and hack something.

How to do FIFO messaging with Amazon SQS

If you have used Amazon Web Services, you probably know Simple Queue Service(SQS) - it is a reliable, highly scalable hosted queue for storing messages. One of the main drawbacks of SQS is that it does not guarantee first-in, first-out (FIFO) access to messages and that's clearly stated in the Amazon documentation:

"Amazon SQS does not guarantee FIFO access to messages in Amazon SQS queues, mainly because of the distributed nature of the Amazon SQS. If you require specific message ordering, you should design your application to handle it."

Here is a quick example how Camel Resequencer pattern can help you overcome this drawback in 2-3 lines of code. To setup our example scenario, let's first create a route that will populate our queue with 100 messages each containing the message number: Then create a consumer route, that will read from the queue and log each message:
To prove that Amazon doesn't guarantee FIFO ordering we will write a test: Don't be misled by the short size of the test, it starts both routes and verifies that all 100 messages are received in the right order.

The above test fails in most of the runs and proves that SQS doesn't support FIFO order(when the messages are sent too quickly). The idea in this example is that our producer will index the messages or provide some kind of sequencing information, so that the message consumer can interpret it and order the messages. To do that we simply send a number as message body, but in a real world application that can be a field in a JSON or XML message. Then to make the test pass and ensure that the messages are received in the same order as they were sent, all we have to do is to add the Resequencer pattern in our consumer route: The streaming based Resequencer will let the messages go without any delay as long as they are in the right order. If the messages are not in the right sequence, it can hold up to 100 messages for 1 second while waiting for the missing message. Depending on your message load, you should adjust these numbers to hold enough messages while waiting for the missing one, but not hold for too long and reduce the throughput. Another option would be to try the non-streaming batch based Resequencer, which always collects a number of messages before sorting and releasing them.
If you are new to Apache Camel and Enterprise Integration Patterns (like the Resequencer), have a look at my recently published "Instant Apache Camel Message Routing" book where similar patterns and how to use them in Camel are explained in a short and focused manner.
Or if you want to deep dive into the integration world, I recommend you start from "Enterprise Integration Patterns" and "Camel in Action" books.

An old pet project based on Sencha Ext JS and Apache OFBiz

Play with the DEMO using username: scrum and password: scrum

Couple of years ago while working on software project we used Pivotal Tracker as our project management tool. It was a great free SAAS when it suddenly changed its terms and conditions and became a paid one. As a developer with great enthusiasm I said to myself "I know a great UI library (with not so great license) - Sencha Ext JS and great backend project with not so great UI - Apache OFBiz, why not combine them and create something better and still free". After couple of months I created LazyPlanner and realized that it is full with free project management tools out there, so it never went live. I wish I had read Eric Ries's The Lean Startup book earlier and had not started my idea by coding it first.
Any way, now I found this old project on my computer and put it on github. It is a standard component for OFBiz which works just by putting it in hot-deploy folder. For this demo installation I put some data and created a project with couple of task lists (called sprints) and few tasks. It supports multiple projects, with multiple tasks lists and tasks... It will be also running on the cloud for couple of days, so play with it and if you find it appealing get the code and use it on your own risk.

Instant Apache Camel and Enterprise Integration Patterns Book

I'm excited to announce that my book "Instant Apache Camel Message Routing" is published and ready for reading!
With new APIs and technologies emerging every day, the need for integrating applications is greater than ever before. With the right tools, integrating applications is not hard. Apache Camel is the leading open source integration and message orchestration framework with a variety of connectors and numerous well-known integration pattern implementations.
Instant Apache Camel Message Routing will help you to get started with Camel and Enterprise Integration Patterns in matter of hours. The book, is a short, focused and practical guide to Apache Camel that provides a high level overview of the Camel architecture and message routing principles. It introduces a number of integration patterns, complete with diagrams, common use cases, and examples about how to use them. It also explains how to test and monitor Camel applications and cope with failure scenarios.
The book contains the following chapters:
Creating a Camel project (Simple)
Routing messages to different destinations (Simple)
Using components (Simple)
Connecting routes (Simple)
Removing unwanted messages (Simple)
Transforming messages (Intermediate)
Splitting a message into many (Intermediate)
Aggregating multiple messages into one (Intermediate)
Reorganizing messages (Intermediate)
Multicasting messages (Intermediate)
Error handling and monitoring (Advanced)
Testing the messaging applications (Advanced)

In summary: Instant Apache Camel Message Routing is an easy to read and focused book that contains only the essence of Apache Camel and Enterprise Integration Patterns. It is ideal for developers who want to get started with Camel and message routing quickly.

Transactional caching for Camel with Infinispan

Some time ago I created a Redis connector for Camel. Redis is awesome key-value store (and a lot more) but then I needed a cache running in the same JVM as Camel and noticed Infinispan which has just switched to ASL v2. There are already other connectors in Camel for caching on the JVM, like Hazelcast and EHCache, but if you are already using Camel as part of other Red Hat products or want to see how LIRS eviction overperforms LRU, Infinispan is worth trying.
Briefly, Infinispan is transactional in-memory key-value store and data grid. When used in embedded mode, Infinispan resides in the same JVM as Camel and allows Camel consumer to receive cache change notifications: In the example above, when a cache entry is created, Infinispan will fire two events - one before and one after the cache entry has been created. It is also possible to receive the events synchronously, meaning in the same thread that processes the cache action, or asynchronously in a separate thread without blocking the cache action.

Using Infinispan as a local cache is simple, it exposes a ConcurrentMap interface and has the usual expiration, eviction, passivation, persistent store, querying, etc features. What makes Infinispan also a data grid is the ability of the nodes to discover other nodes and replicate or distribute data among themselves. Replication allows sharing data across a cluster whereas distribution uses consistent hashing algorithm to achieve better scalability.
In client-server mode, Infinispan is running as standalone application, and Camel producer can send messages using Infinispan's Hot Rod client. Hot Rod is a binary, language neutral, intelligent protocol, allowing interaction with Infinisnap servers in topology and hash-distribution-aware fashion.
The Infinispan producer in Camel currently offers GET, PUT, REMOVE and CLEAR operations. Here is an example of the producer putting data to orders cache:

Let's create a more interesting example. Infinispan is also JTA compliant and can participate in transactions. We will create a REST API for registering persons, which will first persist the person in a relational database using Camel sql component, and then will put the firstName into Infinispan cache in the same transaction. We will do that using a transacted Camel route, so if an error occurs during routing, at any stage, Camel will make sure the transaction(for the cache and the database) is rolled back, so that the database and the cache are always in a consistent state.
As you can see there is no magic or extra configuration in the route, it is a standard route. We have small bit of code that throws an exception when the person lastName is damn to simulate errors in the middle of the route.
The application runs in a standalone mode with atomikos JTA transaction manager. First we create a JtaTransactionManager to be used by the transacted route: Then wrap our DataSource with it: and using a TransactionManagerLookup tell Infinispan to participate in the same transaction: After all this boilerplate code, we have our datasource, cache and Camel route participating in the same transaction. To see the full REST example with two phase commit and rollback, get the source code from github and play with it.
BTW Camel-infinispan component is still not part of Camel trunk, to run the example you will need that too.

Publish/Subscribe Pattern with Apache Camel

Publish/Subscribe is a simple messaging pattern where a publisher sends messages to a channel without the knowledge of who is going to receive them. Then it is the responsibility of the channel to deliver a copy of the messages to each subscriber. This messaging model enables creation of loosely coupled and scalable systems.
It is a very common messaging pattern and there are so many ways to create a kind of pub-sub in Apache Camel. But bear in mind that they are all different and have different characteristics. From the simplest to more complex, here is a list:
  • Multicast - works only with a static list of subscribers, can deliver the message to subscriber in parallel, stops or continues on exception if one of the subscribers fails.
  • Recipient List - it is similar to multicast, but allows the subscribers to be defined at run time, for example in the message header.
  • SEDA - this component provides asynchronous SEDA behaviour using BlockingQueue. When multipleConsumers option is set, it can be used for asynchronous pub-sub messaging. It also has possibilities to block when full, set queue size or time out publishing if the message is not consumed on time.
  • VM - same as SEDA, but works cross multiple CamelContexts, as long as they are in the same JMV. It is a nice mechanism for sending messages between webapps in a web-container or bundles in OSGI container.
  • Spring-redis - Redis has pubsub feature which allows publishing messages to multiple receivers. It is possible to subscribe to a channel by name or using pattern-matching. When pattern-matching is used, the subscriber will receive messages from all the channels matching the pattern. Keep in mind that in this case it is possible to receive a message more than once, if the multiple patterns matches the same channel where the message was sent.
  • JMS (ActiveMQ) - that's probably the best know way for doing pub-sub including durable subscriptions. For a complete list of features check ActiveMQ website.
  • Amazon SNS/SQS - if you need a really scalable and reliable solution, SNS is the way to go. Subscribing a SQS queue to the topic, turns it into a durable subscriber and allows polling the messages later. The important point to remember in this case is that it is not very fast and most importantly, Amazon doesn't guarantee FIFO order for your messages.
There are also less popular Camel components which offer publish-subscribe messaging model:
  • websocket - it uses Eclipse Jetty Server and can sends message to all clients which are currently connected.
  • hazelcast - SEDA implements a work-queue in order to support asynchronous SEDA architectures.
  • guava-eventbus - integration bridge between Camel and Google Guava EventBus infrastructure.
  • spring-event - provides access to the Spring ApplicationEvent objects.
  • eventadmin - on OSGi environment to receive OSGI events.
  • xmpp - implements XMPP (Jabber) transport. Posting a message in chat room is also pub-sub;)
  • mqtt - for communicating with MQTT compliant message brokers.
  • amqp - supports the AMQP protocol using the Client API of the Qpid project.
  • javaspace - a transport for working with any JavaSpace compliant implementation.
Can you name any other way for doing publish-subscribe?

Accessing AWS without key and secret

If you are using Amazon Web Services(AWS), you are probably aware how to access and use resources like SNS, SQS, S3 using key and secret. With the aws-java-sdk that is straight forward:
AmazonSNSClient snsClient = new AmazonSNSClient(
new BasicAWSCredentials("your key", "your secret"))
One of the difficulties with this approach is storing the key/secret securely especially when there are different set of these for different environments. Using java property files, combined with maven or spring profiles might help a little bit to externalize the key/secret out of your source code, but still doesn't solve the issue of securely accessing these resources.
Amazon has another service to help you in this occasion. No, no, this is not one more service to pay for in order to use the previous services. It is a free service, actually it is a feature of the amazon account. AWS Identity and Access Management (IAM) lets you securely control access to AWS services and resources for your users, you can manage users and groups and define permissions for AWS resources.
One interesting functionality of IAM is the ability to assign roles to EC2 instances. The idea is you create roles with sets of permissions and you launch an EC2 instance by assigning the role to the instance. And when you deploy an application on that instance, the application doesn't need to have access key and secret in order to access other amazon resource. The application will use the role credentials to sign the requests. This has a number of benefits like a centralized place to control all the instances credentials, reduced risk with auto refreshing credentials and so on. Here is a short video demonstrating how to assign roles to an EC2 instance:

Once you have role based security enabled for an instance, to access other resources from that instances you have to create and AwsClient using the chained credential provider:
AmazonSNSClient snsClient = new AmazonSNSClient(
new DefaultAWSCredentialsProviderChain())
The provider will search your system properties, environment properties and finally call instance metadata API to retrieve the role credentials in chain of responsibility fashion. It will also refresh the credentials in the background periodically depending on its expiration period.
And finally, if you want to use role based security from Camel applications running on Amazon, all you have to do is create an instance of the client with configured chained credentials object and don't specify any key or secret:

Apache Camel meets Redis

The Lamborghini of Key-Value stores
Camel is the best of bread Integration framework and in this post I'm going to show you how to make it even more powerful by leveraging another great project - Redis. Camel 2.11 is on its way to be released soon with lots of new features, bug fixes and components. Couple of these new components are authored by me, redis-component being my favourite one. Redis - a ligth key/value store is an amazing piece of Italian software designed for speed (same as Lamborghini - a two-seater Italian car designed for speed). Written in C and having an in-memory closer to the metal nature, Redis performs extremely well (Lamborgini's motto is "Closer to the Road"). Redis is often referred to as a data structure server since keys can contain strings, hashes, lists and sorted sets. A fast and light data structure server is like a super sportscars for software engineers - it just flies. If you want to find out more about Redis' and Lamborghini's unique performance characteristics google around and you will see for yourself.
Getting started with Redis is easy: download, make, and start a redis-server. After these steps, you ready to use it from your Camel application. The component uses internally Spring Data which in turn uses Jedis driver, but with possibility to switch to other Redis drivers. Here are few use cases where the camel-redis component is a good fit:

Idempotent Repository
The term idempotent is used in mathematics to describe a function that produces the same result if it is applied to itself. In Messaging this concepts translates into the a message that has the same effect whether it is received once or multiple times. In Camel this pattern is implemented using the IdempotentConsumer class which uses an Expression to calculate a unique message ID string for a given message exchange; this ID can then be looked up in the IdempotentRepository to see if it has been seen before; if it has the message is consumed; if its not then the message is processed and the ID is added to the repository. RedisIdempotentRepository is using a set structure to store and check for existing Ids.

One of the main uses of Redis is as LRU cache. It can store data inmemory as Memcached or can be tuned to be durable flushing data to a log file that can be replayed if the node restarts.The various policies when maxmemory is reached allows creating caches for specific needs:
  • volatile-lru remove a key among the ones with an expire set, trying to remove keys not recently used.
  • volatile-ttl remove a key among the ones with an expire set, trying to remove keys with short remaining time to live.
  • volatile-random remove a random key among the ones with an expire set.
  • allkeys-lru like volatile-lru, but will remove every kind of key, both normal keys or keys with an expire set.
  • allkeys-random like volatile-random, but will remove every kind of keys, both normal keys and keys with an expire set.
Once your Redis server is configured with the right policies and running, the operation you need to do are SET and GET:

Interap pub/sub with Redis
Camel has various components for interacting between routes:
direct: provides direct, synchronous invocation in the same camel context.
seda: asynchronous behavior, where messages are exchanged on a BlockingQueue, again in the same camel context.
vm: asynchronous behavior like seda, but also supports communication across CamelContext as long as they are in the same JVM.
Complex applications usually consist of more than one standalone Camel instances running on separate machines. For this kind of scenarios, Camel provides jms, activemq, combination of AWS SNS with SQS, for messaging between instances.
Redis has a simpler solution for the Publish/Subscribe messaging paradigm. Subscribers subscribes to one or more channels, by specifying the channel names or using pattern matching for receiving messages from multiple channels. Then the publisher publishes the messages to a channel, and Redis makes sure it reaches all the matching subscribers.

Other usages
Guaranteed Delivery: Camel supports this EIP using JMS, File, JPA and few other components. Here Redis can be used as lightweight key-value persistent store with its transaction support.
The Claim Check from the EIP patterns allows you to replace message content with a claim check (a unique key), which can be used to retrieve the message content at a later time. The message content can be stored temporarily in Redis.
Redis is also very popular for implementing counters, leaderboards, tagging systems and many more functionalities. Now, with two swiss army knives under your belt, the integrations to make are limited only by your imagination.

About Me