Turning Microservices Inside-Out

There is a fantastic talk by Martin Kleppmann called “Turning the database inside-out”. Once watched, it can change your perspective on databases and event logs irreversibly. While I agree with the outlined limitations of databases and the benefits of event logs, I’m not convinced of the practicality of replacing databases with event logs. I believe the same design principles used for turning databases inside-out should instead be applied at a higher level, the service design level, to ensure microservices stream changes from the inside out. With that twist, within the services we can keep using traditional databases for what they are best at, efficiently working with mutable state, and also use event logs to reliably propagate changes among services. With the help of frameworks such as Debezium, which can act as connective tissue between databases and event logs, we can benefit from time-tested, familiar database technology and modern event logs such as Red Hat’s managed Apache Kafka service at the same time.

This inside-out mindset requires a deliberate focus on offering outbound APIs in microservices that stream all relevant state-change and domain events from within the service to the outside world. This merging of the microservices movement with emerging event-driven trends is what I call turning the microservices data inside out.

Microservices API types

To build up this idea, I will look at microservices from the point of view of the different API types they provide and consume. A common way to describe microservices is as independently deployable components, built around a business domain, that own their data and are exposed over APIs. That is very similar to how databases are described in the talk mentioned above: a black box with a single API that goes in and out.


Data flowing from microservices’ inbound to outbound APIs

I believe a better way to think about microservices is one where every microservice is composed of inbound and outbound APIs through which the data flows, and a meta API that describes them. While inbound APIs are well known today, outbound APIs are used far less, and the responsibilities of the meta API are spread across various tools and proliferating microservices technologies. To make the inside-out approach work, we need to make outbound and meta APIs first-class microservices constructs and improve the tooling and practices around them.

Inbound APIs

Inbound APIs are what every microservice has today in the form of service endpoints. These APIs are outside-in, and they allow outside systems to interact with the service directly through commands and queries or indirectly through events.

Inbound APIs are the norm in microservices today

In terms of implementation, these are typically REST-based APIs that offer mutating or read-only operations for synchronous interactions, fronted by a load-balancing gateway. They can also be implemented as queues for asynchronous command-based interactions, or as topics for event-based interactions. The responsibilities and governance of these APIs are well understood, and they form the majority of the microservices API landscape today.
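
To make the synchronous flavor concrete, here is a minimal sketch of an inbound, REST-style read endpoint using only the JDK’s built-in HTTP server. The service name, port, path, and payload are illustrative assumptions, not a recommendation for production use.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    // A sketch of a synchronous inbound API: a single read-only endpoint.
    public class OrderServiceSketch {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            // GET /orders would normally query the service's own database;
            // here it returns a canned record to keep the sketch self-contained.
            server.createContext("/orders", exchange -> {
                byte[] body = "{\"id\":1001,\"status\":\"PENDING\"}"
                        .getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
        }
    }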

Outbound APIs

What I refer to as outbound APIs here are the interactions that originate from within the service and go out to other services and systems. The majority of these are queries and commands initiated by the service and targeted at dependent services owned by somebody else. What I also put under this category are the outbound events that originate from within the service. Outbound events differ from queries and commands targeted at a particular endpoint because an outbound event is defined by the service without concrete knowledge of its existing and possible future recipients. Despite this indirection, there is still the expectation that these events are generated predictably and reliably for any significant change that happens within the service (typically caused by inbound interactions). Today, outbound events are often an afterthought. They are either created for the needs of a specific consumer that depends on them, or they are added later in the service lifecycle, not by the service owners but by other teams responsible for data replication. In both cases, the possible use cases of outbound events remain limited, which diminishes their potential.

The challenging part of outbound events is implementing a uniform and reliable notification mechanism for any change that happens within the service. To apply this approach uniformly in every microservice and for any kind of database, the tools have to be non-intrusive and developer-friendly. The lack of good frameworks that support this pattern, and of proven practices and standards, is an impediment preventing the adoption of outbound events as a common top-level microservices construct.

Outbound events implemented through change data capture

To implement outbound events, you can include the logic for updating a database and publishing an event to a messaging system in your application code, but that leads to the well-known dual-write problem. Or you could try to replace the traditional database with an event log, or use specialized event sourcing platforms. But if you consider that the most valuable resources in a project are the people and their proven tools and practices, replacing a fundamental component such as the database with something different will have a significant impact. A better approach is to keep using the relational databases and all the surrounding tools and practices that have served us well for decades, and complement the database with connective tissue such as Debezium (disclaimer: I’m the product manager for Debezium at Red Hat and I’m biased about it).

I believe the best implementation approach for outbound events is the outbox pattern, which uses a single transaction to both perform the normal database update dictated by the service logic and insert a message into a dedicated outbox table within the same database. Once the transaction is written to the database’s transaction log, Debezium picks up the outbox message from the log and sends it to Apache Kafka. This gives us nice properties such as “read your own writes” semantics, where a subsequent query to the service returns the newly persisted record, and at the same time reliable, asynchronous propagation of changes via Apache Kafka.

Debezium can selectively capture changes from the database transaction logs and transform and publish them into Kafka in a uniform way, acting as the outbound eventing interface of the services. Debezium can be embedded into Java application runtimes as a library, or decoupled as a sidecar. It is a plug-and-play component you can add to your service regardless of whether it is a legacy service or created from scratch. It is the missing configuration-based outbound eventing API for any service.
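
To make this concrete, here is a minimal sketch of the outbox pattern using plain JDBC against a hypothetical PostgreSQL orders database. The outbox table’s column names follow the defaults expected by Debezium’s outbox event router; the connection details, business table, and payload are placeholder assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.UUID;

    // A sketch of the outbox pattern: one local transaction covers both
    // the state change and the event that announces it.
    public class OutboxSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/orders", "app", "secret")) {
                conn.setAutoCommit(false);
                try {
                    // 1. The normal update dictated by the service logic.
                    try (PreparedStatement update = conn.prepareStatement(
                            "UPDATE purchase_orders SET status = ? WHERE id = ?")) {
                        update.setString(1, "SHIPPED");
                        update.setLong(2, 1001L);
                        update.executeUpdate();
                    }
                    // 2. The outbox message, inserted in the SAME transaction.
                    //    Column names follow the defaults of Debezium's outbox
                    //    event router; the application only ever writes here,
                    //    Debezium reads it from the transaction log.
                    try (PreparedStatement outbox = conn.prepareStatement(
                            "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
                                    + "VALUES (?, ?, ?, ?, ?::jsonb)")) {
                        outbox.setObject(1, UUID.randomUUID());
                        outbox.setString(2, "order");
                        outbox.setString(3, "1001");
                        outbox.setString(4, "OrderShipped");
                        outbox.setString(5, "{\"orderId\":1001,\"status\":\"SHIPPED\"}");
                        outbox.executeUpdate();
                    }
                    conn.commit(); // Both writes become visible atomically.
                } catch (Exception e) {
                    conn.rollback();
                    throw e;
                }
            }
        }
    }

Because the business update and the outbox insert share one transaction, no event can be published without its corresponding state change and no state change can silently skip its event, which is exactly what the dual-write approach cannot guarantee.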

Meta APIs

Today, meta APIs describe the inbound and outbound APIs and enable their governance, discovery, and consumption, but they are implemented in siloed tools around specific technologies. In my definition, an OpenAPI definition for a REST endpoint published to an API portal is an example of a meta API. An AsyncAPI definition for a messaging topic published to a schema registry is an example of a meta API too. The schema change topic to which Debezium publishes database schema change events (which are different from the data change events) is another example of a meta API. Various capabilities in other tools that describe data structures and the APIs serving them can all be classified as meta APIs. So in my definition, meta APIs are all the artifacts that allow different stakeholders to work with the service and enable other systems to use its inbound and outbound APIs.

The evolving responsibilities of Meta APIs

One of the fundamental design principles of microservices is to make them independently updatable and deployable. But today, a significant amount of coordination is still required among service owners for upgrades that involve API changes. Service owners need better meta API tools to subscribe to updates from dependent services and prepare for changes in a timely manner. Meta API tools need to be integrated more deeply into development and operational activities to increase agility. Today they are siloed, passive, and disparate across the technology stack. Instead, meta tools need to reflect the shift of service interactions towards an event-driven approach and play a more proactive role in automating some of the routine tasks of development and operations teams.

Emerging trends

The rise of outbound events

Outbound events are already present as the preferred integration method in most modern platforms. Most cloud services emit events. Many data sources (such as CockroachDB changefeeds and MongoDB change streams) and even file systems (for example, Ceph notifications) can emit state change events. Custom-built microservices are no exception here. Emitting state change or domain events is the most natural way for modern microservices to fit uniformly among the event-driven systems they are connected to, and to benefit from the same tooling and practices.

Outbound events are bound to become a top-level microservices design construct for many reasons. Designing services with outbound events can help replicate data during an application modernization process. Outbound events are also the enabler for implementing elegant inter-service interactions through the Outbox Pattern, and complex business transactions that span multiple services using a non-blocking Saga implementation. Outbound events fit nicely into the Distributed Data Mesh architecture, where a service is designed with its data consumers in mind. Data mesh claims that for data to fuel innovation, its ownership must be federated among domain data owners who are accountable for providing their data as products… In short, rather than having a centralized data engineering team replicate data from every microservice through an ETL process, it is better if microservices are owned jointly by developers and data engineers who design the services to make the data available in the first place. And what better way to do that than with outbound events streaming data in real time through Debezium, Apache Kafka, and a schema registry?

To sum up, outbound events align microservices with the Unix philosophy, where “the output of every program becomes the input of a yet unknown program”. To future-proof your services, you have to design them so that data flows from the inbound to the outbound APIs. This allows all services to be developed and operated uniformly using modern event-oriented tools and patterns, and unlocks yet unknown future uses of the data exposed through events.

Convergence of meta API tools

With the increasing adoption of event-driven architectures and the faster pace of service evolution, the responsibilities and importance of meta APIs are growing too. The scope of meta API tools is no longer limited to synchronous APIs but includes asynchronous APIs as well. Meta APIs are expanding towards enabling faster development cycles by ensuring safe schema evolution through compatibility checks, notifications for updates, code generation for bindings, test simulations, and so forth. As a consumer of a service, I want to discover the existing endpoints and data formats, and the API compatibility rules, limits, and SLAs the service complies with, all in one place. At the same time, I want to get notifications for any changes that are coming: deprecations, updates to the APIs, or any new APIs the service is going to offer that might be of interest to me. Not only that, developers are challenged to ship code faster and faster, and modern API tools can automate the process of schema and event structure discovery. Once a schema is discovered and added to the registry, a developer can quickly generate code bindings for their language and start developing in an IDE. Then other tools can use the meta API definitions to generate tests and mocks, and simulate load by emitting dummy events with something like Microcks or even Postman. At runtime, the contextual information available in the meta APIs can enable the platforms I run the application on to inject the connection credentials, register it with monitoring tools, and so on.
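
As an illustration of one such capability, here is a sketch of a pre-deployment compatibility check made against a Confluent-compatible schema registry REST API; the registry URL, subject name, and schema are placeholder assumptions.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // A sketch of checking a proposed schema against the latest registered
    // version before shipping a new service version.
    public class CompatibilityCheckSketch {
        public static void main(String[] args) throws Exception {
            String proposedSchema = "{\"schema\": \"{\\\"type\\\":\\\"string\\\"}\"}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8081"
                            + "/compatibility/subjects/orders-value/versions/latest"))
                    .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                    .POST(HttpRequest.BodyPublishers.ofString(proposedSchema))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            // Expected shape of the answer: {"is_compatible": true}
            System.out.println(response.body());
        }
    }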

Overall, the role of the meta API is evolving towards playing a more active part in the asynchronous interaction ecosystem: automating some of the coordination activities among service owners, increasing developer productivity, and automating operations teams’ tasks. And for that to become a reality, the different tools for API metadata, code generation, test simulation, and environment management must converge, standardize, and integrate better.

Standardization of the event-driven space

While event-driven architecture (EDA) has a long history, recent drivers such as cloud adoption, microservices architecture, and a faster pace of change have amplified its relevance and adoption. Similar to the consolidation and standardization that happened with Kubernetes and its ecosystem in the platform space, a consolidation and community-driven standardization is happening in the event-driven space around Apache Kafka. Let’s look at a few concrete examples.

Apache Kafka has reached the point of becoming the de facto standard platform for event streaming, the same way AWS S3 is for object storage and Kubernetes is for container orchestration. Kafka has a huge community behind it, a large open source ecosystem of tools and services, and possibly the largest adoption as eventing infrastructure among modern digital organizations. There are all kinds of self-hosted Kafka offerings, and managed services by boutique companies, by cloud providers, and recently by Red Hat too (Red Hat OpenShift Streams for Apache Kafka is a managed Kafka service I’m involved with, and I’d love to hear your feedback). Kafka as an API for log-based messaging is so widespread that even non-Kafka projects such as Pulsar, Redpanda, and Azure Event Hubs offer compatibility with it. Kafka today is more than a third-party architectural dependency: it influences how services are designed and implemented, it dictates how systems are scaled and made highly available, and it drives how users consume data in real time. But Kafka alone is like a bare Kubernetes platform without any pods. Let’s see what else in the Kafka ecosystem is a must-have complement and is becoming a de facto standard too.

A schema registry is as important for asynchronous APIs as an API manager is for synchronous APIs. In many streaming scenarios, the event payload contains structured data that both the producer and consumer need to understand and validate. A schema registry provides a central repository and a common governance framework for schema documents and enables applications to adhere to these contracts. Today, there are registries such as Apicurio by Red Hat and Karapace by Aiven, as well as registries by Cloudera, Lenses, Confluent, Azure, AWS, and more. While schema registries are increasing in popularity and consolidating around common capabilities and practices for schema management, they vary in licensing restrictions. Not only that, schema registries tend to leak into client applications in the form of Kafka serializers/deserializers (SerDes), converters, and other client dependencies. So the need for an open and vendor-neutral standard, where implementations can be swapped, has been apparent for a while. The good news is that a Schema Registry API standard proposal exists in CNCF, and a few registries such as Apicurio and Azure Schema Registry have already started to follow it. Complementing the open source Kafka API with an open source registry API and common governance practices feels right, and I expect the adoption and consolidation in this space to grow, making the whole meta API concept a cornerstone of event-driven architectures.
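
As a sketch of how a registry participates in the data path, the following producer uses a registry-aware Avro serializer (the Confluent SerDes here, purely as one example; Apicurio and others ship equivalents) so that every record sent to Kafka is checked against a registered schema. The broker and registry addresses are placeholders, and the code assumes the corresponding serializer dependency is on the classpath.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    // A sketch of a producer whose records are governed by a schema registry.
    public class SchemaAwareProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            // A registry-aware serializer registers/looks up the schema on send.
            props.put("value.serializer",
                    "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081"); // placeholder

            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
                            + "{\"name\":\"id\",\"type\":\"long\"},"
                            + "{\"name\":\"status\",\"type\":\"string\"}]}");
            GenericRecord order = new GenericData.Record(schema);
            order.put("id", 1001L);
            order.put("status", "SHIPPED");

            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "1001", order));
            }
        }
    }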

Similar to EDA, the concept of change data capture (CDC) is not new. But the recent drivers around event-driven systems and the increasing demand for access to real-time data are building momentum for transaction-log-driven event streaming tools. Today, there are many closed source, point-and-click tools (such as Striim, HVR, and Qlik) that rely on the same transaction log concept to replicate data point-to-point. There are cloud services such as AWS DMS, Oracle GoldenGate Cloud Service, and Google Datastream that will stream your data into their services (but never in the opposite direction). Many databases and key-value stores stream changes too. The need for an open source, vendor-neutral CDC standard that different vendors can follow and that downstream change-event consumers can rely on is growing. To succeed, such a standard has to be managed by a vendor-neutral foundation and be part of a larger, related ecosystem. The closest thing that exists today is CNCF, which is already home to the AsyncAPI, CloudEvents, Schema Registry, and Serverless Workflow specifications.

Today, by far the leading open source project in the CDC space is Debezium. Debezium is used by major companies, embedded into cloud services from Google, Heroku, Confluent, Aiven, and Red Hat, embedded into multiple open source projects, and used by many proprietary solutions that we will never know about. If you are looking for a standard in this domain, the closest de facto standard is Debezium. To clarify, by a CDC standard I don’t mean an API for data sources to emit changes. I mean standard conventions for data sources and connective tissue such as Debezium to follow when converting database transaction logs into events. That includes data mapping (from database field types into JSON/Avro types), data structures (for example, Debezium’s before/after message structure), snapshotting, the partitioning of tables into topics and of primary keys into topic partitions, transaction demarcation indicators, and so forth. If you are going heavy on CDC, using Debezium will ensure consistent semantics for mapping database transaction log entries into Apache Kafka events, uniform across data sources.
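
To make these conventions tangible, here is a simplified sketch of what a Debezium change event for a row update looks like. The exact field set of the source block varies by connector and configuration, and the values here are invented.

    {
      "before": { "id": 1001, "status": "PENDING" },
      "after":  { "id": 1001, "status": "SHIPPED" },
      "source": { "connector": "postgresql", "db": "orders", "table": "purchase_orders" },
      "op": "u",
      "ts_ms": 1620000000000
    }

Conventions like this, consistent before/after blocks, operation codes, and source metadata, are what make change events from different databases consumable in a uniform way.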

Specifications and implementation around the Apache Kafka ecosystem

There are already a few existing specifications from the event-driven space at CNCF that are gaining traction.

  • AsyncAPI is OpenAPI’s equivalent for event-driven applications and recently joined CNCF. It offers a specification to document your event-driven systems in order to maintain consistency and governance across different teams and tools.
  • CloudEvents (also part of CNCF) aims to eliminate the metadata challenge by specifying mandatory metadata in what could be called a standard envelope. It also offers libraries for multiple programming languages and protocols, which streamlines interoperability (see the sketch after this list).
  • OpenTelemetry (another CNCF sandbox project) standardizes the creation and management of trace information that reveals the end-to-end path of events through multiple applications.
  • CNCF Serverless Workflow is a vendor-neutral spec for coordinating asynchronous stateless and stateful interactions.
  • The Schema Registry API proposal in CNCF we discussed above...
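
As a small taste of these specifications, here is a minimal sketch that wraps a domain event in a CloudEvents envelope using the CloudEvents Java SDK (the cloudevents-core artifact); the event type, source, and payload are illustrative assumptions.

    import io.cloudevents.CloudEvent;
    import io.cloudevents.core.builder.CloudEventBuilder;

    import java.net.URI;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    // A sketch of wrapping a domain event in the standard CloudEvents envelope.
    public class CloudEventSketch {
        public static void main(String[] args) {
            // id, source, and type are the mandatory CloudEvents attributes.
            CloudEvent event = CloudEventBuilder.v1()
                    .withId(UUID.randomUUID().toString())
                    .withSource(URI.create("/services/order-service")) // illustrative
                    .withType("com.example.order.shipped")             // illustrative
                    .withDataContentType("application/json")
                    .withData("{\"orderId\":1001}".getBytes(StandardCharsets.UTF_8))
                    .build();
            System.out.println(event);
        }
    }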

Whether we call it standardization, community adoption, or something else, we cannot deny the consolidation process around event-driven constructs and the rise of some open source projects as de facto standards.

Summary

Microservices are focused on encapsulating the data that belongs to a business domain and exposing it over as minimal an API as possible. But that is changing. Data going out of a service is as important as data going into it. Exposing data in microservices can no longer be an afterthought. Siloed and inaccessible data wrapped in a highly decoupled microservice is of limited value. There are new users of data, and possibly yet unknown users, that will demand access to discoverable, understandable, real-time data. To satisfy the needs of these users, microservices have to turn their data inside out and be designed with outbound APIs that can emit data, and meta APIs that make the consumption of data a self-service activity. Projects such as Apache Kafka, Debezium, and schema registries are natural enablers of this architecture and, with the help of the various open source asynchronous specifications, are turning into the de facto choice for implementing future-proof event-driven microservices.

This article was originally published on InfoQ.
