Apache Kafka is a distributed publish-subscribe messaging system that acts as a middle layer, enabling back-end systems to share real-time data feeds with one another through Kafka topics. In a standard Kafka set-up, any user or application can write messages to any topic and read data from any topic. However, Kafka security needs to be implemented when an organisation moves to a shared-resources model, where multiple teams and applications use the same Kafka cluster, or when a Kafka cluster starts onboarding critical and confidential information.
The following security measures are supported by Kafka:
Authentication of connection requests made to brokers by clients (producers and consumers), other brokers and tools
Authentication of connection requests made by brokers to ZooKeeper
Encryption of data transferred between brokers, between brokers and clients, or between brokers and tools using SSL
Authorisation of the read/write operations made by clients
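As an illustration, an SSL-secured broker is typically configured through its server.properties file. The fragment below is a minimal sketch, assuming SSL for both client and inter-broker traffic; the hostnames, file paths and passwords are placeholders, not values from the original text.

```properties
# Illustrative server.properties fragment (paths and passwords are placeholders)
listeners=SSL://broker1.example.com:9093
security.inter.broker.protocol=SSL
ssl.keystore.location=/var/private/ssl/kafka.broker.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/var/private/ssl/kafka.broker.truststore.jks
ssl.truststore.password=changeit
# Require clients to present certificates (mutual TLS authentication)
ssl.client.auth=required
# Enable ACL-based authorisation of client read/write operations
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
```

With `ssl.client.auth=required`, client connections are authenticated via certificates, and the `AclAuthorizer` then checks each read/write operation against the configured ACLs.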
In Apache Kafka, exactly-once semantics guarantee that each message is written to a topic exactly once, with no data loss and no duplication. Even if a producer retries a send (for example, after a timed-out acknowledgement), the message is persisted to the Kafka topic only once, so downstream consumers do not process it multiple times.
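The mechanism behind this can be sketched in plain Python. This is a simplified model of the idea (not the real broker code): an idempotent producer tags each message with a producer ID and a per-partition sequence number, and the broker drops any write whose sequence number it has already accepted. The class and field names here are hypothetical.

```python
# Simplified model of idempotent-producer deduplication: the broker tracks the
# highest sequence number accepted per producer and drops retried duplicates.

class PartitionLog:
    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, value):
        # A retry re-sends an already-acknowledged sequence number;
        # the broker recognises it and does not append the message again.
        if self.last_seq.get(producer_id, -1) >= seq:
            return False  # duplicate: dropped
        self.last_seq[producer_id] = seq
        self.messages.append(value)
        return True

log = PartitionLog()
log.append(producer_id=42, seq=0, value="order-created")
log.append(producer_id=42, seq=1, value="order-paid")
log.append(producer_id=42, seq=1, value="order-paid")  # retry of seq 1
assert log.messages == ["order-created", "order-paid"]  # no duplicate stored
```

In real Kafka, this behaviour is switched on with the producer setting `enable.idempotence=true`; full exactly-once pipelines additionally use transactions.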
No, having more consumers in a group than there are partitions is not recommended, because it leads to wasted resources. Two consumers belonging to the same group cannot read from the same partition, so some of the consumers will not be assigned any partition and will sit idle. The number of consumers in a group should always be less than or equal to the number of partitions.
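The idle-consumer effect can be seen in a small sketch. This is a hypothetical round-robin-style assignment, not Kafka's actual assignor implementation; the consumer names are made up.

```python
# Sketch: assigning 3 partitions across 5 consumers of one group leaves
# the surplus consumers with nothing to read.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        # Each partition goes to exactly one consumer of the group.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign(partitions=[0, 1, 2], consumers=["c1", "c2", "c3", "c4", "c5"])
assert a["c4"] == [] and a["c5"] == []  # c4 and c5 remain idle
```

With five consumers and only three partitions, two consumers can never receive work, which is exactly the wastage the answer above warns about.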
In Kafka, data is stored across different brokers. A particular broker may stop working at some point, which would lead to the loss of the data stored on that broker. To ensure that data is not lost when a broker fails, topics are replicated and stored across different brokers. So, if one broker stops working, the data stored on that broker can still be fetched from another broker.
Kafka uses topic replication to ensure fault tolerance. In order to create a topic with multiple copies, the replication factor needs to be specified for the topic at the time of its creation. The minimum value of the replication factor is one. Its maximum value depends on the number of brokers present in the Kafka cluster. The replication factor cannot be greater than the number of brokers in the cluster.
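The bounds described above can be captured in a one-line check. This is an illustrative sketch, not part of Kafka itself; the function name is made up.

```python
# Sketch: a replication factor is valid only if it is at least 1 and
# no greater than the number of brokers in the cluster.

def valid_replication_factor(rf, num_brokers):
    return 1 <= rf <= num_brokers

assert valid_replication_factor(3, 5)        # 3 copies across 5 brokers: fine
assert not valid_replication_factor(4, 3)    # more copies than brokers: invalid
```

In practice, the replication factor is specified at creation time, for example with `kafka-topics.sh --create --topic my-topic --partitions 6 --replication-factor 3 --bootstrap-server localhost:9092` (topic name and counts here are placeholders).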
Each consumer belonging to a consumer group is assigned certain partitions of a topic in order to consume messages from the topic. Rebalancing refers to the redistribution of the partitions of a topic across the consumers of a group. This typically occurs when a new consumer joins a group or when a consumer from a group stops working. In such scenarios, partitions are rebalanced across all the available consumers.
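A rebalance can be modelled as simply recomputing the assignment over the current membership. The sketch below is a hypothetical round-robin redistribution, not Kafka's actual protocol; consumer names are made up.

```python
# Sketch: whenever group membership changes, the partitions of the topic are
# redistributed across the currently available consumers.

def rebalance(partitions, consumers):
    consumers = sorted(consumers)
    return {
        c: [p for i, p in enumerate(partitions) if consumers[i % len(consumers)] == c]
        for c in consumers
    }

before = rebalance([0, 1, 2, 3], ["c1", "c2"])  # {'c1': [0, 2], 'c2': [1, 3]}
after = rebalance([0, 1, 2, 3], ["c1"])         # c2 left: c1 takes all four
assert after == {"c1": [0, 1, 2, 3]}
```

The same recomputation happens in reverse when a new consumer joins: partitions are taken away from existing consumers and handed to the newcomer.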
Kafka follows the pub-sub messaging model. In this model, consumers can subscribe to the topics of their choice and consume messages from those topics. Kafka uses the concept of consumer groups, in which consumers that belong to the same application are grouped together under a common group ID. When two consumers belong to different groups, they can read the same messages. Hence, by allowing consumers to form groups, Kafka enables multiple groups to read the same message.
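This works because each group tracks its own position in the log. The sketch below models that idea in plain Python; the group names and the `poll` method are hypothetical, not a real client API.

```python
# Sketch: a partition stores messages once, but each consumer group keeps its
# own committed offset, so different groups read the same messages independently.

class Partition:
    def __init__(self, messages):
        self.messages = messages
        self.committed = {}  # group_id -> next offset to read

    def poll(self, group_id):
        offset = self.committed.get(group_id, 0)
        if offset >= len(self.messages):
            return None  # this group has caught up
        self.committed[group_id] = offset + 1
        return self.messages[offset]

p = Partition(["m0", "m1"])
assert p.poll("analytics") == "m0"  # group "analytics" reads m0
assert p.poll("billing") == "m0"    # group "billing" also reads m0
assert p.poll("analytics") == "m1"  # each group advances its own offset
```

Within a single group, by contrast, each message of a partition is delivered to only one consumer.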
When a message gets pushed to a partition, it is assigned an offset ID. An offset ID is an incremental ID: the message that is pushed first has a lower offset value, and the one that is pushed later has a higher offset value. Within a partition, consumers read messages in the order in which they were pushed: the message that is pushed first is consumed first, and the message that is pushed last is the last one to be consumed.
In Kafka, topics are an organised collection of data. The data sent by producers is stored in a topic. The data sent first is written first to the topic. Each Kafka topic has a unique name. A topic can be further divided into partitions. Messages sent by producers contain a key and a value. A key can take null values, which means that a message can be sent without specifying a key; in this scenario, messages are distributed round-robin across all the partitions of the topic. If two messages contain the same key, they are written to the same partition of the topic. Whenever a message gets pushed to a partition, it is assigned an offset ID. An offset ID is an incremental ID: the message that is pushed first has a lower offset value, and the one that is pushed later has a higher offset value.
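The partitioning rule described above can be sketched as follows. This is an illustrative stand-in, not Kafka's real partitioner: Python's built-in `hash` substitutes for Kafka's murmur2 hash, and the function names are made up.

```python
# Sketch of key-based partitioning: messages with the same key always land in
# the same partition; messages with a null (None) key are spread round-robin.

import itertools

def make_partitioner(num_partitions):
    rr = itertools.cycle(range(num_partitions))
    def partition_for(key):
        if key is None:
            return next(rr)                    # round-robin for keyless messages
        return hash(key) % num_partitions      # same key -> same partition
    return partition_for

pick = make_partitioner(3)
assert pick("user-17") == pick("user-17")  # deterministic per key
keyless = [pick(None) for _ in range(3)]
assert keyless == [0, 1, 2]  # round-robin across the three partitions
```

Keeping all messages with the same key in one partition is what preserves per-key ordering, since offsets guarantee order only within a single partition.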