Building Distributed Workflow Applications on Amazon with Camel

Pipeline with SNS-SQS
A workflow consists of independent tasks performed in a particular sequence determined by dynamic conditions. Very often a workflow represents a business process, for example the order processing steps in an e-commerce store.
Amazon Web Services offers various tools for building distributed and scalable workflow applications. One approach is to use topics and queues to connect the distinct steps of the workflow process. Then we can use publish/subscribe, competing consumers and other mechanisms to scale our application, and soon even the simplest application takes the shape of a pipeline:
Each step of the pipeline is connected to the next one with a queue, and each step performs some actions and decides what the next step is. In addition, using SNS/SQS involves some other low-level tasks:
- Serialize/deserialize the data
- Ensure consistency (FIFO order) for SQS messages
- Make sure the message size limit is not exceeded
- Invent some kind of auditing support
- Subscribe queues to topics, assign permissions
- Manage dead-letter queues (DLQs)
In the end it works, but overcoming these technical challenges takes as much time as writing the actual code that delivers the business value.
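To make the pipeline shape concrete, here is a minimal in-memory sketch in plain Java. It does not use the AWS SDK: `LinkedBlockingQueue` stands in for SQS, and the queue names, the `Order` type and the routing thresholds are all made up for illustration. What it shows is that each step owns its routing decision and that things like the audit trail and the dead-letter queue have to be built by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for a queue-connected pipeline (no AWS calls involved).
class QueuePipelineSketch {

    // Hypothetical message; a real one would be serialized into the queue message body.
    static final class Order {
        final String id;
        final double total;
        final List<String> audit = new ArrayList<>(); // hand-rolled audit trail
        Order(String id, double total) { this.id = id; this.total = total; }
    }

    // A step does its work and returns the name of the next queue, or null when done.
    interface Step { String handle(Order order); }

    static List<String> run(Order order) {
        Map<String, BlockingQueue<Order>> queues = new HashMap<>();
        for (String name : List.of("validate", "review", "ship", "dlq")) {
            queues.put(name, new LinkedBlockingQueue<>());
        }

        // The routing logic is scattered across the steps themselves.
        Map<String, Step> steps = new HashMap<>();
        steps.put("validate", o -> o.total <= 0 ? "dlq" : (o.total > 1000 ? "review" : "ship"));
        steps.put("review", o -> "ship");
        steps.put("ship", o -> null);
        steps.put("dlq", o -> null);

        String current = "validate";
        queues.get(current).add(order);
        while (current != null) {
            Order msg = queues.get(current).poll();   // consume from this step's queue
            msg.audit.add(current);                   // record that the step ran
            String target = steps.get(current).handle(msg);
            if (target != null) {
                queues.get(target).add(msg);          // forward to the next step's queue
            }
            current = target;
        }
        return order.audit;
    }

    public static void main(String[] args) {
        System.out.println(run(new Order("A-1", 50.0)));
        System.out.println(run(new Order("A-2", 5000.0)));
        System.out.println(run(new Order("A-3", 0.0)));
    }
}
```

Even in this toy version nobody has a view of the whole process: to find out which path an order took, you have to read the audit list that every step had to remember to update.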
Simple Workflow Service
SWF, on the other hand, offers a higher-level API for writing distributed, asynchronous workflow applications. It automatically serializes/deserializes data, manages application state, offers auditability, guarantees strong consistency and supports multiple versions. Most importantly, it ensures that the workflow orchestration and the business logic execution are separated. Any typical SWF application has the following building blocks:
In SWF terms, a workflow is the actual template that describes the distinct steps a process should follow. And a workflow execution is one run of this template.
Starter - the process that can start, stop and interact with a workflow execution.
Decider - the process that orchestrates and decides what the next step of a workflow execution is.
Worker - a process that executes tasks of a specific type.
SWF Console - provides full visibility and control of the execution.
An example workflow execution can go through the following steps: a starter starts a workflow execution, SWF receives it, asks the decider what the next step is, then based on the decision passes the task to an appropriate activity worker. Once the result from the activity worker is received, SWF asks the decider again for the next step, and depending on the response may execute another worker or not. This flow continues until the decider replies that the workflow is completed. You can see how the decider orchestrates each of the steps of the workflow and the activity workers perform the individual tasks. All that is managed by SWF and auditable at any stage.
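The loop described above can be modelled in a few lines of plain Java. This is only a toy sketch of the interaction, not the SWF API: the `Decider` interface, the `Execution` class, the history labels and the two example activities (`trim`, `upper`) are all invented for illustration, and everything runs synchronously in one thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of the flow: the "service" keeps the history, asks the decider
// for the next step and hands the activity task to the matching worker.
class WorkflowLoopSketch {

    // The decider sees only what has completed so far and names the next
    // activity; returning null means the workflow is complete.
    interface Decider { String nextActivity(List<String> completed); }

    static final class Execution {
        final List<String> history = new ArrayList<>(); // auditable trail
        String result;
    }

    static Execution run(String input, Decider decider,
                         Map<String, UnaryOperator<String>> workers) {
        Execution exec = new Execution();
        exec.result = input;
        exec.history.add("started");
        List<String> completed = new ArrayList<>();
        while (true) {
            String activity = decider.nextActivity(completed); // ask the decider
            if (activity == null) {
                exec.history.add("finished");
                return exec;
            }
            exec.history.add("scheduled:" + activity);
            exec.result = workers.get(activity).apply(exec.result); // worker does the work
            completed.add(activity);
            exec.history.add("completed:" + activity);
        }
    }

    // Example wiring: two hypothetical activities run one after the other.
    static Execution demo(String input) {
        Map<String, UnaryOperator<String>> workers = new HashMap<>();
        workers.put("trim", String::trim);
        workers.put("upper", String::toUpperCase);
        Decider decider = completed -> {
            if (completed.isEmpty()) return "trim";
            if (completed.size() == 1) return "upper";
            return null;
        };
        return run(input, decider, workers);
    }

    public static void main(String[] args) {
        Execution exec = demo("  hello ");
        System.out.println(exec.result);
        System.out.println(exec.history);
    }
}
```

Note that the decider contains no business logic and the workers know nothing about ordering, which is the separation of orchestration and execution mentioned earlier.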
Why use Camel?
The Amazon-provided Java clients work by using annotations to generate proxy classes for accessing SWF services. The whole process of generating and using proxy classes, combined with the dependency of the starter on the decider and of the decider on the activity workers, is not much fun. And what could be better than using a Camel route for orchestration and another route for the actual activity worker? The result is a Camel SWF component that is now in Camel master. The camel-swf component has two types of endpoints: workflow and activity.
A workflow producer allows us to start, terminate, cancel, signal, get state or retrieve the whole execution history of a workflow execution. In our diagram it represents the starter. Here is an example of how to start a workflow execution:
A workflow consumer is the decider. It receives decision tasks from the SWF service and either schedules activity tasks for execution or indicates that the workflow execution has completed. It is a stateless, deterministic route whose only job is to orchestrate tasks:
The activity endpoints allow us to interact with the activity tasks. An activity producer is used to schedule activity tasks, and it can be used only from a decider route (actually the decider thread). This is because only a decider can schedule activity tasks. The last box in our diagram that we have to implement is the activity worker, which can be created using an activity consumer. This endpoint will receive activity tasks from SWF, execute them and return the results back to SWF. This is the bit that actually performs the business logic:
So any SWF application consists of a starter (workflow producer) that starts the execution, a decider (workflow consumer) that receives decision tasks and schedules activity tasks (using an activity producer), and the activity workers (activity consumers) that perform the tasks. The communication between these endpoints is asynchronous, consistent and managed by the SWF service.
It is not the easiest component to use, but it pays off with a simple and scalable architecture.
PS: Thanks to my ex-manager S. Wheeler for letting me contribute this component back to the Camel community.

More Apache Camel Books

It is hard to write an impartial book review when you are the author of the first book, the technical reviewer of the second one and really like the third, but I'll try my best with this post.

Recently I blogged about my Instant Apache Camel Message Routing book. It is a short book intended for newcomers to Apache Camel, showing quickly how to create messaging applications in Camel using Enterprise Integration Patterns. In this book I tried to distill the most important bits and pieces about Camel and the most commonly used integration patterns. So there is no fluff, no repetition, only 50 pages of a Camel integration crash course with examples and diagrams. If you want to get a high-level view of Camel and EIPs without diving into the details, that's the book to read.

The other new Apache Camel book, expected by the end of this year, is called Apache Camel Developer's Cookbook by Scott Cranton and Jakub Korab. It is a lookup guide full of recipes for everything you might want to do with Camel. It provides around 500 pages of best-practice tips for using Apache Camel and lots of examples. I enjoyed reviewing it and even learned some useful new tips. If you prefer learning by example, this cookbook is a must-have.

And there is of course the famous Camel in Action by Claus Ibsen and Jonathan Anstey. It is the book with which I started learning Camel a couple of years ago, and I still read some of the chapters from time to time. If you want to learn the philosophy behind Camel and find out how Camel internals work, this is the book. A must-read before putting #Camel on your CV.

I cannot stop myself from mentioning the Bible of Integrations, the Enterprise Integration Patterns book by Gregor Hohpe and Bobby Woolf. This is not a Camel book, but it lays down the foundation of the Enterprise Integration Patterns that Camel implements. The book's website is a great pattern reference with use cases and diagrams. It is a huge book (with more than 700 pages), but good to have as a reference for any team. You might be surprised to find out that there is a pattern for everything.

In short: if you want to discover what Apache Camel is and how to use EIPs without going into the internals, go for the Camel Message Routing book. If you want to see lots of example recipes and tips, go for the Camel Cookbook. And if you want to deep dive into Camel, go for Camel in Action.

PS: When I created this post there were three Camel books, but by the time I decided to publish it there were already four. A really productive year for the Camel community. The fourth book in my list is called Instant Apache Camel Messaging System by Evgeniy Sharapov. It is another short book intended for beginners, and it emphasizes a test-driven approach to writing Camel applications. I haven't read the book, but Claus Ibsen did and posted a review here.
Choose a book, read it, and hack something.