Building Distributed Workflow Applications on Amazon with Camel

Pipeline with SNS-SQS
A workflow consists of independent tasks performed in a particular sequence determined by dynamic conditions. Very often a workflow represents a business process, for example the order processing steps in an e-commerce store.
Amazon Web Services offers various tools for building distributed and scalable workflow applications. One approach for building such an application is to use topics and queues for connecting the distinct steps in the workflow process. Then we can use publish/subscribe, competing consumers and other mechanisms to scale our application, and soon even the simplest application takes a shape similar to this:
Each step of the pipeline is connected to the next one with a queue, and each step performs some actions and decides what the next step is. In addition, using SNS/SQS involves some other low-level tasks:
- Serialize/deserialize the data
- Ensure consistency (FIFO order) for SQS messages
- Make sure message size is not exceeded
- Invent some kind of auditing support
- Subscribe queues to topics, assign permissions
- Manage dead-letter queues (DLQs)
In the end it works, but overcoming these technical challenges takes as much time as writing the actual code that delivers the business value.
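To make that overhead concrete, here is a minimal in-memory sketch of a queue-connected pipeline. It makes no AWS calls: the queues are plain `ArrayDeque` instances, and every name in it (`QueuePipelineSketch`, `Outcome`, `orderPipeline`, the queue names) is hypothetical. The 256 KB cap is an assumed value for illustration, not a statement of the current SQS quota. Notice how much of the class is plumbing (size checks, audit trail, dead letters) rather than business logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

/** In-memory stand-in for a queue-connected pipeline; it makes no AWS calls. */
class QueuePipelineSketch {

    /** Assumed size cap for illustration only; check the current SQS quota. */
    static final int MAX_MESSAGE_BYTES = 256 * 1024;

    /** Result of one step: the queue that receives the body next, or null to finish. */
    static final class Outcome {
        final String nextQueue;
        final String body;

        Outcome(String nextQueue, String body) {
            this.nextQueue = nextQueue;
            this.body = body;
        }
    }

    final Map<String, Queue<String>> queues = new LinkedHashMap<>();
    final Map<String, Function<String, Outcome>> steps = new LinkedHashMap<>();
    final List<String> audit = new ArrayList<>();       // hand-rolled audit trail
    final List<String> deadLetters = new ArrayList<>(); // messages whose step failed
    final List<String> completed = new ArrayList<>();   // bodies that left the pipeline

    void addStep(String queueName, Function<String, Outcome> step) {
        queues.put(queueName, new ArrayDeque<>());
        steps.put(queueName, step);
    }

    void send(String queueName, String body) {
        if (body.getBytes(StandardCharsets.UTF_8).length > MAX_MESSAGE_BYTES) {
            throw new IllegalArgumentException("message exceeds size limit");
        }
        Queue<String> queue = queues.get(queueName);
        if (queue == null) {
            throw new IllegalArgumentException("unknown queue: " + queueName);
        }
        queue.add(body);
    }

    /** Polls every queue in turn until all of them are empty. */
    void run() {
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Map.Entry<String, Queue<String>> entry : queues.entrySet()) {
                String body = entry.getValue().poll();
                if (body == null) {
                    continue;
                }
                progressed = true;
                audit.add(entry.getKey() + " <- " + body);
                try {
                    // Each step does its work and decides where the message goes next.
                    Outcome out = steps.get(entry.getKey()).apply(body);
                    if (out.nextQueue == null) {
                        completed.add(out.body);
                    } else {
                        send(out.nextQueue, out.body);
                    }
                } catch (RuntimeException e) {
                    deadLetters.add(body); // park the failed message
                }
            }
        }
    }

    /** Hypothetical three-step order pipeline: validate, charge, ship. */
    static QueuePipelineSketch orderPipeline() {
        QueuePipelineSketch p = new QueuePipelineSketch();
        p.addStep("validate", body -> {
            if (body.isEmpty()) {
                throw new IllegalArgumentException("empty order");
            }
            return new Outcome("charge", body);
        });
        p.addStep("charge", body -> new Outcome("ship", body + ":charged"));
        p.addStep("ship", body -> new Outcome(null, body + ":shipped"));
        return p;
    }

    public static void main(String[] args) {
        QueuePipelineSketch p = orderPipeline();
        p.send("validate", "order-1");
        p.send("validate", ""); // fails validation and is dead-lettered
        p.run();
        System.out.println("completed=" + p.completed + " deadLetters=" + p.deadLetters);
    }
}
```

In a real SNS/SQS deployment each of these concerns becomes remote configuration and error handling, which is where the time goes.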
Simple Workflow Service
SWF, on the other hand, offers a higher-level API for writing distributed, asynchronous workflow applications. It automatically serializes/deserializes data, manages application state, offers auditability, guarantees strong consistency, and supports multiple versions. Most importantly, it ensures that workflow orchestration and business logic execution are separated. Any typical SWF application has the following building blocks:
In SWF terms, a workflow is the actual template that describes the distinct steps a process should follow, and a workflow execution is one run of this template.
Starter - the process that can start, stop and interact with a workflow execution.
Decider - the process that orchestrates and decides what is the next step of a workflow execution.
Worker - a process that executes tasks of a specific type.
SWF Console - provides full visibility and control of the execution.
An example workflow execution can go through the following steps: a starter starts a workflow execution, SWF receives it and asks the decider what the next step is, then based on the decision passes the task to an appropriate activity worker. Once the result from the activity worker is received, SWF asks the decider again for the next step and, depending on the response, may execute another worker or not. This flow continues until the decider replies that the workflow is completed. You can see how the decider orchestrates each of the steps of the workflow while the activity workers perform the individual tasks. All of that is managed by SWF and auditable at any stage.
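The loop described above can be sketched in a few lines of plain Java. This is a toy model, not the SWF API: `DecisionLoopSketch` plays the role of the service, the `Decider` interface and the event names (`WorkflowStarted`, `ActivityScheduled`, and so on) are simplified inventions for illustration, and everything runs synchronously in one thread. What it does show is the separation of concerns: the decider only reads the history and names the next activity, and the workers only do the work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of the decision loop; the event names are simplified, not SWF's own. */
class DecisionLoopSketch {

    /** A decider reads the recorded history and names the next activity, or null when done. */
    interface Decider {
        String nextActivity(List<String> history);
    }

    private final Map<String, Function<String, String>> workers = new LinkedHashMap<>();

    void registerWorker(String activity, Function<String, String> worker) {
        workers.put(activity, worker);
    }

    /** Plays the role of the service: ask the decider, dispatch, record, repeat. */
    List<String> execute(Decider decider, String input) {
        List<String> history = new ArrayList<>();
        history.add("WorkflowStarted:" + input);
        String data = input;
        String activity;
        while ((activity = decider.nextActivity(Collections.unmodifiableList(history))) != null) {
            Function<String, String> worker = workers.get(activity);
            if (worker == null) {
                throw new IllegalStateException("no worker for activity " + activity);
            }
            history.add("ActivityScheduled:" + activity);
            data = worker.apply(data); // the worker performs the business logic
            history.add("ActivityCompleted:" + activity + ":" + data);
        }
        history.add("WorkflowCompleted:" + data);
        return history;
    }

    static boolean completed(List<String> history, String activity) {
        for (String event : history) {
            if (event.startsWith("ActivityCompleted:" + activity + ":")) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical decider: charge first, then ship, then finish. It keeps no state of its own. */
    static final Decider ORDER_DECIDER = history -> {
        if (!completed(history, "charge")) {
            return "charge";
        }
        if (!completed(history, "ship")) {
            return "ship";
        }
        return null;
    };

    static DecisionLoopSketch orderExample() {
        DecisionLoopSketch service = new DecisionLoopSketch();
        service.registerWorker("charge", order -> order + ":charged");
        service.registerWorker("ship", order -> order + ":shipped");
        return service;
    }

    public static void main(String[] args) {
        for (String event : orderExample().execute(ORDER_DECIDER, "order-1")) {
            System.out.println(event);
        }
    }
}
```

Because the decider derives its answer from the history alone, asking it the same question twice yields the same decision, and the recorded history doubles as the audit trail.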
Why use Camel?
The Amazon-provided Java clients work by using annotations to generate proxy classes to access SWF services. The whole process of generating and using proxy classes, combined with the dependency of the starter on the decider and of the decider on the activity workers, is not very joyful. And what could be better than using a Camel route for orchestration and another route for the actual activity worker? The result is a Camel SWF component that is in Camel master now. The camel-swf component has two types of endpoints: workflow and activity.
A workflow producer allows us to start, terminate, cancel, signal, get state or retrieve the whole execution history of a workflow execution. In our diagram it represents the starter. Here is an example of how to start a workflow execution:
A workflow consumer is the decider. It receives decision tasks from the SWF service and either schedules activity tasks for execution or indicates that the workflow execution has completed. It is a stateless, deterministic route whose only job is to orchestrate tasks:
The activity endpoints allow us to interact with the activity tasks. An activity producer is used to schedule activity tasks, and it can be used only from a decider route (strictly speaking, from a decider thread), because only a decider can schedule activity tasks. The last box in our diagram for which we have to provide an implementation is the activity worker, which can be created using an activity consumer. This endpoint will receive activity tasks from SWF, execute them and return the results back to SWF. This is the bit that actually performs the business logic:
So any SWF application consists of a starter (workflow producer) that starts the execution, a decider (workflow consumer) that receives decision tasks and schedules activity tasks (using an activity producer), and the activity workers (activity consumers) that perform the tasks. The communication between these endpoints is asynchronous, consistent and managed by the SWF service.
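One practical consequence of this asynchronous hand-off is that activity workers can be scaled out by running several of them against the same task list. The sketch below models that idea with plain Java threads and a `LinkedBlockingQueue` standing in for the task list. It does not use Camel or the AWS SDK, and the class and method names are hypothetical. It counts how many times each task was handled, to show that competing workers split the work rather than duplicate it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Several workers competing for tasks on one shared, in-memory task list. */
class CompetingWorkersSketch {

    /** Returns, for every task, how many times some worker handled it. */
    static Map<String, Integer> process(List<String> tasks, int workerCount) {
        BlockingQueue<String> taskList = new LinkedBlockingQueue<>(tasks);
        Map<String, Integer> handled = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            pool.execute(() -> {
                String task;
                // poll() hands each task to exactly one of the competing workers
                while ((task = taskList.poll()) != null) {
                    handled.merge(task, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("charge-1", "charge-2", "ship-1", "ship-2");
        System.out.println(process(tasks, 3));
    }
}
```

Adding capacity then means starting more workers; neither the starter nor the decider needs to change.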
It is not the easiest component to use, but it pays off with a simple and scalable architecture.
PS: Thanks to my ex-manager S. Wheeler for letting me contribute this component back to the Camel community.