Kafka Interview Questions


Discuss the security features in Kafka.

Apache Kafka is a distributed publish-subscribe message delivery system that acts as a middle layer, enabling back-end systems to share real-time data feeds with one another through Kafka topics. With a standard Kafka set-up, any user or application can write messages to any topic and read data from any topic. However, Kafka security needs to be implemented when an organisation moves to a shared-resources model, where multiple teams and applications use the same Kafka cluster, or when a Kafka cluster starts onboarding critical and confidential information.

The following security measures are supported by Kafka:

  • Authentication of connection requests made by clients (producers and consumers), other brokers and tools to brokers
  • Authentication of connection requests made by brokers to ZooKeeper
  • Encryption of data transferred between brokers, between brokers and clients, or between brokers and tools using SSL
  • Authorisation of the read/write operations made by clients
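As a sketch of how these measures are switched on, the broker-side configuration fragment below enables a SASL_SSL listener (TLS encryption plus SASL/SCRAM authentication) and an ACL authorizer. The property names are standard Kafka broker settings, but the port, keystore paths and passwords are placeholders only:

```properties
# server.properties (fragment) -- illustrative values only
listeners=SASL_SSL://:9093
security.inter.broker.protocol=SASL_SSL

# Encryption: TLS key material (placeholder paths and passwords)
ssl.keystore.location=/var/private/ssl/kafka.broker.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/var/private/ssl/kafka.broker.truststore.jks
ssl.truststore.password=changeit

# Authentication: SASL/SCRAM for clients and inter-broker traffic
sasl.enabled.mechanisms=SCRAM-SHA-512
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512

# Authorisation: ACL-based authorizer (Kafka 2.4+)
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
```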

Explain the exactly-once semantics in Kafka.

In Apache Kafka, exactly-once semantics guarantee that each message is written to a topic exactly once, even if the producer retries after a failure, so the message is neither lost nor duplicated. Kafka implements this through idempotent producers, which prevent duplicate writes on retry, and transactions, which make writes across multiple partitions atomic.
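A much-simplified simulation of the idempotent-producer side of this: the broker remembers the last sequence number it has written per producer and silently drops retries it has already persisted. (Illustrative only; real Kafka tracks sequences per partition and also supports cross-partition transactions.)

```python
# Toy model of duplicate suppression by an idempotent producer/broker pair.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> last sequence number written

    def append(self, producer_id, seq, message):
        if self.last_seq.get(producer_id, -1) >= seq:
            return  # duplicate retry: already written, ignore it
        self.log.append(message)
        self.last_seq[producer_id] = seq

broker = Broker()
broker.append("p1", 0, "order-created")
broker.append("p1", 0, "order-created")  # retry after a lost acknowledgement
broker.append("p1", 1, "order-shipped")
print(broker.log)  # ['order-created', 'order-shipped'] -- each message written once
```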

What is Kafka Connect? What are the different scenarios in which Kafka Connect can be used?

Kafka Connect is a framework for streaming data between Kafka and other systems at scale: it is typically used to ingest data from other systems into Kafka and to export data from Kafka to other systems, in a scalable, reliable and secure manner.
It can be used in the following scenarios:

  • When integrating Kafka with well-known systems such as Twitter and databases
  • When the data volume and velocity are high
  • When you need to have a secure system
  • When you need to quickly define the connectors
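As an illustration of how quickly a connector can be defined, a connector is described declaratively as a small JSON configuration and submitted to the Connect REST API (`POST /connectors`). The sketch below uses the FileStreamSource connector that ships with Kafka; the connector name, file path and topic name are placeholders:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "connect-demo"
  }
}
```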

Suppose we have a Kafka topic with five partitions. A consumer group that has seven instances of consumers is attached to the topic. Is this configuration recommended? If not, why?

No, this configuration is not recommended because it leads to a wastage of resources. Within a consumer group, each partition can be read by only one consumer, so with five partitions and seven consumers, two of the consumers will not be assigned any partition and will remain idle. The number of consumers in a group should always be less than or equal to the number of partitions.
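The effect can be seen with a simplified round-robin assignment of the five partitions to the seven consumers: each partition goes to exactly one consumer, so two consumers end up with nothing to do. (Kafka's real assignors, such as range, round-robin and sticky, differ in detail but share this constraint.)

```python
# Toy partition assignment: partition i goes to consumer i mod N.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(5))
consumers = [f"consumer-{i}" for i in range(7)]
assignment = assign(partitions, consumers)
idle = [c for c, parts in assignment.items() if not parts]
print(idle)  # ['consumer-5', 'consumer-6'] -- two consumers sit idle
```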

What is the importance of topic replication? How can you create topics using replication? What are the possible values that the replication factor can take?

In Kafka, data is stored across different brokers. A particular broker may stop working at some point in time, which would otherwise lead to the loss of the data stored on that broker. To ensure that data is not lost whenever a broker stops working, topics are replicated and stored across different brokers. So, if one broker stops working, the data stored in that broker can still be fetched from some other broker.

Kafka uses topic replication to ensure fault tolerance. In order to create a topic with multiple copies, the replication factor needs to be specified for the topic at the time of its creation. The minimum value of the replication factor is one. Its maximum value depends on the number of brokers present in the Kafka cluster. The replication factor cannot be greater than the number of brokers in the cluster.
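The replication factor is specified at creation time, for example with `kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 3`. The bounds described above can be sketched as a small validation function (illustrative only, not Kafka's actual code):

```python
# Each replica of a partition must live on a different broker, so the
# replication factor is bounded by the broker count.
def validate_replication_factor(replication_factor, broker_count):
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > broker_count:
        raise ValueError(
            f"replication factor {replication_factor} cannot exceed "
            f"the {broker_count} brokers in the cluster"
        )
    return replication_factor

validate_replication_factor(3, broker_count=3)  # fine: one replica per broker
# validate_replication_factor(4, broker_count=3) would raise ValueError
```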

What is Kafka rebalancing? When does rebalancing occur in Kafka?

Each consumer belonging to a consumer group is assigned certain partitions of a topic in order to consume messages from the topic. Rebalancing refers to the redistribution of the partitions of a topic across the consumers of a group. This typically occurs when a new consumer joins a group or when a consumer from a group stops working. In such scenarios, partitions are rebalanced across all the available consumers.
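A simplified way to picture rebalancing is to recompute the assignment from scratch whenever the set of live consumers changes (real Kafka assignors are more sophisticated, e.g. sticky assignment minimises movement, but the idea is the same):

```python
# Toy rebalance: redistribute partitions across the currently live consumers.
def rebalance(partitions, consumers):
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
print(rebalance(partitions, ["c1", "c2"]))        # {'c1': [0, 2], 'c2': [1, 3]}
print(rebalance(partitions, ["c1", "c2", "c3"]))  # c3 joins the group -> rebalance
print(rebalance(partitions, ["c1"]))              # c2 leaves -> c1 takes everything
```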

How can the same message be consumed by different/multiple consumers in Kafka?

Kafka follows the pub-sub messaging model. In this model, consumers can subscribe to the topics of their choice and can consume messages from these topics. Kafka uses the concept of consumer groups, in which consumers that belong to the same application are grouped together and have a common group ID. So, when two consumers belong to different groups, they can read the same messages. Hence, by allowing consumers to form groups, Kafka enables different/multiple groups to read the same message.

Explain the concept of offsets in Kafka. What is the order in which messages are read by a consumer from a partition?

When a message gets pushed to a partition, it is assigned an offset ID. An offset ID is an incremental ID. The message that is pushed first will have a lower offset value, and the one that is pushed later will have a higher offset value. These messages are read by consumers in the order in which they are pushed to the partitions. The message that is pushed first will be consumed first, and the message that is pushed last will be the last one to be consumed.
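The incremental-offset behaviour can be sketched in a few lines (a toy model of a single partition, not Kafka's storage format):

```python
# Each appended message receives the next incremental offset ID,
# and a consumer reads the messages back in that same order.
partition = []

def produce(msg):
    offset = len(partition)  # next incremental offset ID
    partition.append((offset, msg))
    return offset

for m in ["first", "second", "third"]:
    produce(m)

consumed = [msg for offset, msg in sorted(partition)]
print(consumed)  # ['first', 'second', 'third'] -- FIFO within a partition
```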

What is a Kafka topic? How is the ordering of the messages guaranteed in Kafka?

In Kafka, topics are an organised collection of data. The data sent by producers is stored in a topic, and the data sent first is written first. Each Kafka topic has a unique name, and a topic can be further divided into partitions.

Messages sent by producers contain a key and a value. The key can be null, which means that a message can be sent without specifying a key; in this case, messages are distributed across the partitions of a topic in a round-robin fashion. If two messages contain the same key, they are written to the same partition of the topic. Whenever a message gets pushed to a partition, it is assigned an incremental offset ID: the message that is pushed first has a lower offset value, and the one that is pushed later has a higher offset value. Ordering is therefore guaranteed only within a single partition, not across the partitions of a topic.
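The partitioning rule can be sketched as follows. Kafka's real default partitioner uses murmur2 hashing on the key (and, in recent versions, sticky batching rather than strict round-robin for null keys); a plain byte-sum hash is used here purely to illustrate that the same key always lands on the same partition.

```python
# Toy partitioner: keyed messages hash to a fixed partition,
# keyless messages are spread round-robin.
NUM_PARTITIONS = 3
_rr = 0  # round-robin counter for messages without a key

def choose_partition(key):
    global _rr
    if key is None:
        _rr += 1
        return (_rr - 1) % NUM_PARTITIONS
    return sum(key.encode()) % NUM_PARTITIONS  # deterministic toy hash

assert choose_partition("user-42") == choose_partition("user-42")  # same key -> same partition
print([choose_partition(None) for _ in range(4)])  # [0, 1, 2, 0] -- round-robin
```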