Todo: https://medium.com/@yuvarajl/why-nutanix-beam-went-ahead-with-apache-pulsar-instead-of-apache-kafka-1415f592dbbb

Also read https://pk.org/417/notes/kafka.html

Goal: Create a distributed messaging system to handle large-scale streams of messages.

How can a cluster of computers handle the influx of never-ending streams of data, coming from multiple sources? This data may come from industrial sensors, IoT devices scattered around the world, or log files from tens of thousands of systems in a data center.

It’s easy enough to say that we can divide the work among multiple computers, but how exactly would we do that?

image

Overview

image

Ref: https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-overview.html

https://stackoverflow.com/questions/41744506/difference-between-stream-processing-and-message-processing

Event broker vs Message queue

image

MESSAGE QUEUE

Messages are put onto a queue, and a consumer takes each message off the queue and processes it. Once a message is acknowledged as consumed, it is deleted. Messages are split between consumers, so each message reaches only one of them, which makes it hard to communicate events across the whole system.

An example is Amazon SQS: producers publish messages to the queue, consumers listen for them and process them, and the messages are then removed from the queue.
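A minimal in-memory sketch of the queue behaviour described above: each message goes to exactly one consumer and is deleted once acknowledged. `SimpleQueue` and its method names are made up for illustration and are not the SQS API.

```python
from collections import deque


class SimpleQueue:
    """Toy message queue (hypothetical names; not the SQS API)."""

    def __init__(self):
        self._pending = deque()   # messages waiting for a consumer
        self._in_flight = {}      # receipt id -> message awaiting ack
        self._next_receipt = 0

    def publish(self, message):
        self._pending.append(message)

    def receive(self):
        """Hand the next message to one consumer, or return None if empty."""
        if not self._pending:
            return None
        receipt = self._next_receipt
        self._next_receipt += 1
        message = self._pending.popleft()
        self._in_flight[receipt] = message
        return receipt, message

    def ack(self, receipt):
        """Acknowledge processing; the message is deleted for good."""
        del self._in_flight[receipt]

    def size(self):
        return len(self._pending) + len(self._in_flight)


queue = SimpleQueue()
for order in ["order-1", "order-2", "order-3"]:
    queue.publish(order)

# Two consumers take turns pulling; the messages are split between them.
consumers = ["consumer-a", "consumer-b"]
seen = {name: [] for name in consumers}
turn = 0
while True:
    item = queue.receive()
    if item is None:
        break
    receipt, message = item
    seen[consumers[turn % 2]].append(message)
    queue.ack(receipt)
    turn += 1
```

After the loop, no consumer has seen every message and the queue is empty, which is why a plain queue is a poor fit for broadcasting events to several systems.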

EVENT BROKER

Event brokers are a push system: they push events downstream to consumers. An example is Amazon EventBridge.

Ref: https://serverlessland.com/event-driven-architecture/visuals/message-queue-vs-event-broker
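For contrast with the queue, a rough sketch of the push model: the broker calls every interested consumer as soon as an event is published, so each subscriber gets its own copy. `SimpleEventBroker` is a hypothetical in-memory stand-in, not the EventBridge API.

```python
class SimpleEventBroker:
    """Toy push-based event broker (hypothetical; not the EventBridge API)."""

    def __init__(self):
        self._handlers = {}  # event type -> list of consumer callbacks

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Push the event to every subscribed consumer; return how many got it."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


broker = SimpleEventBroker()
billing_inbox, shipping_inbox = [], []

# Both consumers register interest in the same event type.
broker.subscribe("order.placed", billing_inbox.append)
broker.subscribe("order.placed", shipping_inbox.append)

delivered = broker.publish("order.placed", {"order_id": 42})
```

Here both inboxes receive the same event, whereas with a queue only one of them would have.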

Message Queue (MQ)

Stream

image

A message queue here would introduce unnecessary latency and discard historical events that analytics needs to stay accurate.

Rule of thumb:

That’s why the right answer here is: Streams for real-time analytics.
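A small sketch of why a stream keeps history while a queue does not, assuming the stream is modelled as an append-only log where each consumer tracks its own offset. `SimpleStream` and the event names are illustrative only.

```python
class SimpleStream:
    """Toy append-only log; events are never deleted on read."""

    def __init__(self):
        self._log = []

    def append(self, event):
        """Append an event and return its offset in the log."""
        self._log.append(event)
        return len(self._log) - 1

    def read(self, offset, max_events=10):
        """Return up to max_events starting at offset; the log is unchanged."""
        return self._log[offset:offset + max_events]


stream = SimpleStream()
for event in ["view:shoes", "add_to_cart:shoes", "purchase:shoes"]:
    stream.append(event)

# A live dashboard consumer reads what is there and remembers its position.
dashboard_offset = 0
batch = stream.read(dashboard_offset)
dashboard_offset += len(batch)

# A new event arrives; the dashboard only needs the part it has not seen.
stream.append("view:hat")
new_for_dashboard = stream.read(dashboard_offset)

# A reporting job added later can still replay everything from offset 0.
report_events = stream.read(0)
```

Reading does not remove anything, so the late-starting reporting job still sees the full history.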

Why Streams fit Aaron’s case (real-time analytics for e-commerce):

image

image

How Google PubSub achieves similar functionality

By separating Topic and Subscription

image
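The topic/subscription split can be sketched roughly like this: publishers only know the topic, every subscription gets its own backlog with a copy of each message, and workers sharing one subscription split that backlog. This is an in-memory model with made-up names, not the Google Pub/Sub client API, and it leaves out acknowledgements.

```python
from collections import deque


class Topic:
    """Toy topic with named subscriptions (hypothetical model)."""

    def __init__(self):
        self._subscriptions = {}  # subscription name -> backlog of messages

    def create_subscription(self, name):
        self._subscriptions[name] = deque()

    def publish(self, message):
        # Every subscription receives its own copy of the message.
        for backlog in self._subscriptions.values():
            backlog.append(message)

    def pull(self, subscription):
        """Return the next message for this subscription, or None if empty."""
        backlog = self._subscriptions[subscription]
        return backlog.popleft() if backlog else None


topic = Topic()
topic.create_subscription("analytics")
topic.create_subscription("billing")
topic.publish("order-1")
topic.publish("order-2")

# Two billing workers share one subscription, so they split its messages.
billing_worker_1 = topic.pull("billing")
billing_worker_2 = topic.pull("billing")

# The analytics subscription still sees every message.
analytics_messages = [topic.pull("analytics"), topic.pull("analytics")]
```

Within a subscription the behaviour is queue-like; across subscriptions it is broadcast-like.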

AMQP Protocol

image

RabbitMQ supports different types of exchanges. To achieve pub/sub-like functionality, a fanout exchange bound to multiple queues, one queue per subscription, works well.

image
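To make the exchange idea concrete, here is a toy routing model covering two exchange types: fanout copies a message to every bound queue and ignores the routing key, while direct delivers only to queues whose binding key matches. The `Exchange` class is an assumption for illustration and is not a RabbitMQ client library.

```python
class Exchange:
    """Toy AMQP-style exchange supporting 'fanout' and 'direct' routing."""

    def __init__(self, kind):
        if kind not in ("fanout", "direct"):
            raise ValueError(f"unsupported exchange type: {kind}")
        self.kind = kind
        self._bindings = []  # list of (queue, binding_key) pairs

    def bind(self, queue, binding_key=""):
        self._bindings.append((queue, binding_key))

    def publish(self, message, routing_key=""):
        for queue, binding_key in self._bindings:
            # Fanout ignores keys; direct requires an exact key match.
            if self.kind == "fanout" or binding_key == routing_key:
                queue.append(message)


# Fanout: every bound queue acts like a separate subscription.
fanout = Exchange("fanout")
email_queue, audit_queue = [], []
fanout.bind(email_queue)
fanout.bind(audit_queue)
fanout.publish("user-signed-up", routing_key="ignored")

# Direct: only the queue bound with the matching key gets the message.
direct = Exchange("direct")
error_queue, info_queue = [], []
direct.bind(error_queue, "error")
direct.bind(info_queue, "info")
direct.publish("disk full", routing_key="error")
```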

Similarly, SNS + SQS can be combined to achieve the same functionality.

image

Kafka

image

Read at https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html


Topics and Partitions

image

Ref: Kafka white paper

image

image

image

image

Consumers

image

image

image

Ref: https://stackoverflow.com/questions/36203764/how-can-i-scale-kafka-consumers (on scaling consumers)

Write scalability

image

Read scalability

image

Ref: https://www.instaclustr.com/blog/the-power-of-kafka-partitions-how-to-get-the-most-out-of-your-kafka-cluster/

Zookeeper

image

TODO:

Zuul architecture, https://www.youtube.com/watch?v=6w6E_B55p0E