
Kafka Optimization Theorem

One of the foundational theorems in distributed systems is the CAP theorem. It states that a distributed data store can provide only two (CP or AP, since P is a given) of the three guarantees: consistency (C), availability (A), and partition tolerance (P). While this dilemma explains the main tradeoffs distributed systems face during network partitions, it is not useful for the day-to-day optimization of systems for specific performance goals in happy-path scenarios. In addition to consistency and availability, developers have to optimize applications for latency and throughput too. In this post, we will uncover how the main distributed systems forces relate to each other and which primitives influence these forces in the context of Apache Kafka. We will visualize and formulate these findings as a theorem that can serve as a guide when optimizing Apache Kafka for specific goals.

Kafka Primitives

One of the lesser-known distributed systems theorems is PACELC, which extends CAP by explaining what else (E) happens when a distributed system is running normally in the absence of partitioning. In that case, one can optimize for latency (L) or consistency (C), hence the full acronym PAC-E-LC. PACELC explains how systems such as Amazon's Dynamo favor availability and lower latency over consistency, and how ACID-compliant systems favor consistency over availability or lower latency.

 

Figure: PACELC theorem for distributed data stores

While PACELC explains the main tradeoffs in distributed data stores, it doesn't apply as-is to Kafka. Kafka is an event streaming platform used primarily for moving data from one place to another. A better way to look at a Kafka-based system is as a collection of producers, consumers, and topics forming distinct data flows. The consequence is that the client application configuration influences every goal significantly, and streaming applications have first-class optimization goals of their own, such as end-to-end latency and throughput. Whether you are aware of it or not, every time you configure a Kafka cluster, create a topic, or configure a producer or a consumer, you are making a choice between latency and throughput and between availability and durability. To explain why this is the case, we have to look into the main dimensions of Kafka topics and the role client applications play in an event flow.

 

Figure: Apache Kafka optimization goals

Partitions

The topic partitions (on the X-axis in our diagram) represent the unit of parallelism in Kafka. On both the broker and the client side, reads and writes to different partitions can happen fully in parallel. While many factors can limit throughput, generally a higher number of partitions for a topic enables higher throughput, and a smaller number of partitions leads to lower throughput.


At the same time, a very large number of partitions creates more metadata that needs to be passed and processed across all brokers and clients. This can impact end-to-end latency negatively unless more resources are added to the brokers. This is a deliberately simplified view of partitions, but it demonstrates the main tradeoff the number of partitions introduces in our context.

Replicas

The replication factor (on the Y-axis in our diagram) determines the number of copies (including the leader) each topic partition in a cluster must have. By ensuring that all replicas of a partition live on different brokers, replicas determine data durability. A higher number of replicas ensures the data is copied to more brokers, which means better durability in the case of a broker failure.

On the other hand, a lower number of replicas reduces data durability, but in certain circumstances it can increase availability for producers or consumers by tolerating more broker failures. That said, availability for a consumer is determined by the availability of in-sync replicas, whereas for a producer it is determined by the minimum number of in-sync replicas (minISR). With a low number of replicas, the availability of the overall data flow depends on which brokers fail (whether they hold the partition leader of interest) and whether the other brokers are in sync. For our purposes, we assume fewer replicas lead to higher application availability and lower data durability.

Partitions and replicas are not the only primitives influencing the tradeoffs between throughput and latency and between durability and availability, but they represent the broker side of the picture. The other participants shaping the main Kafka optimization tradeoffs are the client applications.
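To make the broker-side knobs concrete, here is a minimal sketch using the Java AdminClient that creates two hypothetical topics at opposite corners of the diagram. The topic names, partition counts, and bootstrap address are made up for illustration; only the direction of the values matters.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class TopicShapeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // X-axis: more partitions enable more parallelism and higher throughput.
            // Y-axis: more replicas mean more copies and better durability.
            NewTopic throughputAndDurability = new NewTopic("clickstream", 12, (short) 3);

            // Fewer partitions keep metadata small (latency-leaning); a single replica
            // drops durability and, in the simplified model above, leans towards availability.
            NewTopic latencyLeaning = new NewTopic("health-pings", 1, (short) 1);

            admin.createTopics(List.of(throughputAndDurability, latencyLeaning)).all().get();
        }
    }
}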

Producers and Consumers

Topics are used by producers that send messages and consumers that read those messages. Producers and consumers also state their preference between throughput and latency and between durability and availability through various configuration options. It is the combination of topic and client application configurations (along with other cluster-level settings such as leader election) that defines what your application is optimized for.

Consider a flow of events consisting of a producer, a consumer, and a topic. Optimizing such an event flow for low latency on the client side requires tuning the producer and consumer to exchange smaller batches of messages. The same flow can be tuned for higher throughput by configuring the producer and consumer for larger batches of messages. Just as the number of topic partitions influences the throughput vs latency tradeoff, so do the producer and consumer message batch sizes.
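As a rough sketch of what that means in client configuration, the snippet below tunes the same hypothetical flow in the two opposite directions using standard batching settings (linger.ms, batch.size, fetch.min.bytes). The concrete values are placeholders to illustrate the direction of change, not recommendations.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class BatchTuningSketch {

    // Latency-leaning flow: send records as soon as they arrive and fetch even single bytes.
    public static void tuneForLatency(Properties producer, Properties consumer) {
        producer.put(ProducerConfig.LINGER_MS_CONFIG, "0");
        producer.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
        consumer.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
    }

    // Throughput-leaning flow: wait a little so that larger batches accumulate on both sides.
    public static void tuneForThroughput(Properties producer, Properties consumer) {
        producer.put(ProducerConfig.LINGER_MS_CONFIG, "20");
        producer.put(ProducerConfig.BATCH_SIZE_CONFIG, "131072");
        consumer.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "65536");
    }
}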

Producers and consumers involved in a message flow also state a preference for durability or availability. A producer that favors durability over availability can demand a higher number of acknowledgements by specifying acks=all. A lower number of acknowledgments (acks=0 or acks=1, i.e. lower than minISR) can lead to higher availability from the producer's point of view by tolerating more broker failures, but it reduces data durability in the case of catastrophic events on the brokers.
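On the producer side this boils down to a single setting. A minimal sketch, with the less durable alternatives left as comments and the bootstrap address assumed for illustration:

import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class AcksSketch {
    public static Properties durableProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Durability-leaning: the write succeeds only after all in-sync replicas have it.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Availability/latency-leaning alternatives, at the cost of durability:
        // props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader acknowledgment only
        // props.put(ProducerConfig.ACKS_CONFIG, "0"); // fire and forget
        return props;
    }
}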

The consumer configurations influencing this dimension (the Y-axis) are not as straightforward as the producer ones and depend on the consumer application logic. A consumer can favor consistency by committing message consumption offsets more often, or even individually. Alternatively, the consumer can favor availability by increasing various timeouts and tolerating broker failures for longer.
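A minimal consumer sketch of the consistency-leaning option: auto-commit is disabled and offsets are committed synchronously after each processed batch. The topic name, group id, and bootstrap address are made up for the example.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsistentConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Favor consistency: no auto-commit, offsets are committed only after processing.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // processing of each record would go here
                }
                // Synchronous commit per poll trades some throughput for stronger consistency.
                consumer.commitSync();
            }
        }
    }
}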

Kafka Optimization Theorem

We defined the main actors involved in an event flow as a producer, a consumer, and a topic. We also defined the opposing goals we have to optimize for: throughput vs latency and durability vs availability. Given that, the Kafka Optimization Theorem states that any Kafka data flow makes tradeoffs between throughput vs latency and durability vs availability.

 

Figure: Kafka Optimization Theorem with primary configuration options

For simplicity, we put producer and consumer preferences on the same axes as topic replicas and partitions. Optimizing for a particular goal is easier when these primitives are aligned. For example, optimizing an end-to-end data flow for low latency is best achieved with smaller producer and consumer message batches combined with a small number of partitions.

A data flow optimized for higher throughput would have larger producer and consumer message batches and a higher number of partitions for parallel processing. A data flow optimized for durability would have a higher number of replicas and require a higher number of producer acknowledgments and granular consumer commits. If the data flow is optimized for availability, a smaller number of replicas, fewer producer acknowledgments, and larger timeouts would be preferred.

In practice, nothing forces the partition, replica, producer, and consumer configurations to be aligned. It is possible to have a large number of replicas and a high minISR for a topic, but a producer that requires only acks=0 or acks=1. Or to have a high number of partitions because the application logic requires it, combined with small producer and consumer message batches. These are valid ways to use Kafka, and one of the strengths of Apache Kafka is being a highly configurable and flexible eventing system that satisfies many use cases. The framework proposed here serves as a mental model for the main Kafka primitives and how they relate to the optimization dimensions. Knowing these foundational forces will enable you to tune your application for its specific needs and understand the effects.

The Golden Ratio 

In the proposed Kafka Optimization Theorem, there are deliberately no numbers, only the relations between the main primitives and the direction of change. It is not intended to serve as a concrete optimization configuration but as a guide that shows how reducing or increasing a primitive influences the direction of the optimization. Yet sharing a few of the most common configuration options accepted as industry best practice can be useful as a starting point and as a demonstration of the theorem in practice.

Most Kafka clusters today, whether on-prem with something like Strimzi or a fully managed service offering such as OpenShift Streams for Apache Kafka, are almost always deployed within a single region. A production-grade Kafka cluster is typically spread across 3 availability zones (AZs) with a replication factor of 3 (RF=3) and minimum in-sync replicas of 2 (minISR=2). This ensures a good level of data durability during happy times and good availability for client applications during temporary disruptions. It represents a good middle ground: minISR=3 would prevent producers from producing when even a single broker is affected, and minISR=1 would affect both producers and consumers when a partition leader is down. Typically this replication configuration is accompanied by acks=all on the producer side and the default offset commit configuration for consumers.
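Put together, a sketch of this middle-ground setup could look as follows, assuming a hypothetical payments topic and a local bootstrap address: the topic carries RF=3 with min.insync.replicas=2, and the producer asks for acks=all.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class MiddleGroundSketch {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(adminProps)) {
            // RF=3 spread over three AZs, minISR=2: one broker can be lost
            // without blocking producers that use acks=all.
            NewTopic payments = new NewTopic("payments", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(payments)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // acks=all pairs with minISR=2: a write succeeds once two replicas have it.
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
    }
}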

While there are commonalities in the consistency and availability tradeoffs across applications, throughput and latency requirements vary. The number of partitions for a topic is influenced by the shape of the data, the data processing logic, and its ordering requirements. At the same time, the number of partitions dictates the maximum parallelism and message throughput you can achieve. As a consequence, there is no good default number or range for the partition count.

By default, Kafka clients are optimized for low latency. We can observe that from the default producer values (batch.size=16384, linger.ms=0, compression.type=none) and the default consumer values (fetch.min.bytes=1, fetch.max.wait.ms=500). Prior to Kafka 3, the producer default was acks=1; it has since changed to acks=all, with a preference for durability rather than availability or low latency. Optimizing the client applications for throughput requires increasing wait times and batch sizes 5-10 fold and examining the results. Knowing the default values and what they are optimized for is a good starting point for your own optimization goal.
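As a starting point for such an experiment, the sketch below bumps the producer batching settings roughly tenfold from the defaults listed above and enables compression; the exact numbers are placeholders to iterate on and measure, not recommendations.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

import java.util.Properties;

public class ThroughputStartingPoint {
    public static Properties producerForThroughput() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, "163840");     // ~10x the 16384 default
        p.put(ProducerConfig.LINGER_MS_CONFIG, "20");          // up from the 0 ms default
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // none by default
        return p;
    }

    public static Properties consumerForThroughput() {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "65536"); // up from the 1 byte default
        return c;
    }
}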

Summary

CAP is a great theorem for failure scenarios. But while failures are a given and partitioning will always happen, we have to optimize applications for the happy paths too. This post introduces a simplified model explaining how Kafka primitives can be used for performance optimization. It deliberately focuses on the main primitives and a selective set of configuration options to demonstrate the principles. In real life, more factors influence your application's performance metrics. Encryption, compression, and memory allocation will impact latency and throughput. Transactions, exactly-once semantics, retries, message flushes, and leader election will impact the consistency and availability tradeoffs. You may end up with broker primitives (replicas and partitions) and client primitives (batch sizes and acknowledgments) optimized for competing tradeoffs. Finding the ideal Kafka configuration for your application will require experimentation, and the Kafka Optimization Theorem will guide you on that journey.

Follow me at @bibryam to join my journey of learning Apache Kafka. This post was originally published on Red Hat Developers. To read the original post, check here.
