10 Kafka Interview Questions That Were Recently Asked

Kafka Interview Questions

Here are ten questions that were asked in recent Kafka interviews. They are useful for refreshing your knowledge.


1. What is Kafka?

Kafka is a distributed publish-subscribe messaging system. Producers write messages to it, and subscribers (consumers) read them. Kafka stores all producer messages in topics, which are split into partitions, and each partition is kept as an append-only log.


2. What is a Consumer group?

Each consumer belongs to a consumer group. Consumers in the same group read from the same topic and share its partitions, so adding consumers to a group spreads the load. Within a group, each partition is read by only one consumer, so the number of consumers in a consumer group (CG) should not exceed the number of partitions in the topic; any extra consumers stay idle.
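To see how partitions are shared inside a consumer group, here is a minimal sketch in plain Python. It is a toy round-robin assignment, not Kafka's actual assignor; the function name assign_partitions and the consumer names are made up for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Toy round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 4 partitions, 2 consumers: each consumer reads 2 partitions.
print(assign_partitions(4, ["c1", "c2"]))
# 2 partitions, 3 consumers: the third consumer gets nothing and sits idle.
print(assign_partitions(2, ["c1", "c2", "c3"]))
```

The second call shows why adding more consumers than partitions does not add throughput: the extra consumer has no partition to read.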


3. What is Fault-Tolerance?

Each partition is replicated on multiple servers (brokers). When the broker holding one copy of a partition fails, a replica on another broker takes over and continues serving the data. This is called fault tolerance.


4. Can we decrease the partitions that we created?

No, you can't decrease the number of partitions once a topic is created. You can only increase it.
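The partition count matters because a keyed message is placed by hashing its key modulo the number of partitions. The sketch below illustrates the idea in plain Python. It assumes zlib.crc32 as a stand-in hash (Kafka's default partitioner uses a murmur2 hash), and partition_for and the sample keys are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from the key's hash (crc32 is a stand-in hash here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["order-1", "order-2", "order-3", "order-4"]
before = {k: partition_for(k, 3) for k in keys}
after = {k: partition_for(k, 6) for k in keys}

# Keys whose partition changes after growing the topic from 3 to 6 partitions.
moved = [k for k in keys if before[k] != after[k]]
print(before, after, moved)
```

Any key listed in moved would land in a different partition after the increase, so per-key ordering across old and new messages is no longer guaranteed. This is worth keeping in mind before increasing partitions on a topic with keyed data.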


5. What is the architecture of Kafka?

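The architecture is a combination of producers, brokers, consumers (subscribers), and ZooKeeper. A cluster can handle messages from multiple producers and can have multiple brokers (a broker is sometimes called a Kafka broker or Kafka server). ZooKeeper oversees the Kafka cluster and keeps cluster metadata, such as which brokers are alive; in older versions it also stored consumer offsets. Newer Kafka versions can run without ZooKeeper by using the built-in KRaft mode.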


6. How to start Kafka Broker?

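In Linux environments, you can start a broker with the kafka-server-start.sh script:

$ bin/kafka-server-start.sh config/server-1.properties

$ bin/kafka-server-start.sh config/server-2.properties

Each broker is started with its own properties file (here server-1.properties, server-2.properties, and so on). When the brokers run on the same machine, each file needs a unique broker.id, its own listener port, and its own log directory.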


7. What is Leader Balancing in Kafka?

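Each partition has one replica that acts as the leader; the other replicas of that partition are followers. Producers and consumers normally work with the leader, while followers copy its data. If the leader's broker fails, one of the in-sync followers is elected as the new leader and continues to serve consumers. Leader balancing means keeping partition leadership spread evenly across the brokers so that no single broker carries most of the load.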
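The failover step can be sketched as a toy model in plain Python. This is only an illustration of the idea, assuming the new leader is the first in-sync replica that is still alive; the real election is handled by the cluster controller and involves more state. The function elect_leader and the broker ids are hypothetical.

```python
def elect_leader(replicas, in_sync, failed_broker):
    """Toy election: first in-sync replica that is not on the failed broker."""
    for broker in replicas:
        if broker != failed_broker and broker in in_sync:
            return broker
    return None  # no eligible replica: the partition is offline


# Partition replicated on brokers 1, 2, 3; broker 1 is the current leader.
replicas = [1, 2, 3]
in_sync = {1, 2, 3}
print(elect_leader(replicas, in_sync, failed_broker=1))  # prints 2
```

If broker 2 had fallen out of the in-sync set, the same call would pick broker 3 instead, and with no in-sync replica left the partition would have no leader.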


8. What is the real use of Broker?

The broker's main job is to receive messages from producers, store them in topic partitions, and serve them to consumers.


9. What are the two main functions of Zookeeper?

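  • Oversee the Kafka cluster (all the nodes): it tracks which brokers are alive and takes part in electing the controller.
  • Store metadata. In older Kafka versions, ZooKeeper also stored each consumer's committed offset, so that after a failure the consumer could resume from the next offset once it recovered. Since Kafka 0.9, the newer consumer commits offsets to an internal Kafka topic (__consumer_offsets) instead.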
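How a committed offset lets a consumer resume after a failure can be shown with a small plain-Python sketch. It is a toy model: ToyConsumer, the committed_offsets dictionary (a stand-in for wherever offsets are stored), and the group name are all made up for illustration.

```python
class ToyConsumer:
    """Toy consumer that resumes from the last committed offset."""

    def __init__(self, log, committed=0):
        self.log = log              # list of messages in one partition
        self.position = committed   # next offset to read

    def poll(self):
        message = self.log[self.position]
        self.position += 1
        return message


log = ["m0", "m1", "m2", "m3"]
committed_offsets = {}  # stand-in for the offset store

first = ToyConsumer(log)
first.poll()
first.poll()
committed_offsets["group-a"] = first.position  # commit offset 2, then "crash"

# A restarted consumer in the same group picks up at the committed offset.
second = ToyConsumer(log, committed_offsets["group-a"])
print(second.poll())  # prints m2
```

The restarted consumer skips m0 and m1 because the group already committed past them.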

10. What is the Retention period?

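The amount of time that Kafka keeps messages in a topic is called the retention period. There are two types of retention: time-based and size-based (storage-based). Messages older than the time limit, or beyond the size limit, become eligible for deletion, whether or not they have been consumed.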
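The two retention types can be sketched in plain Python. This is a toy model, assuming a partition's log is a list of (timestamp, size) segments with the oldest first; apply_retention and its parameters are hypothetical. In a real cluster the limits come from settings such as log.retention.hours and log.retention.bytes on the broker, or retention.ms and retention.bytes on a topic.

```python
def apply_retention(segments, now, max_age=None, max_bytes=None):
    """Toy retention: segments are (timestamp, size_bytes), oldest first."""
    kept = list(segments)
    if max_age is not None:
        # Time-based: drop segments older than max_age.
        kept = [s for s in kept if now - s[0] <= max_age]
    if max_bytes is not None:
        # Size-based: drop oldest segments until the total fits.
        while kept and sum(size for _, size in kept) > max_bytes:
            kept.pop(0)
    return kept


segments = [(0, 100), (10, 100), (20, 100)]
print(apply_retention(segments, now=25, max_age=20))     # drops the segment at t=0
print(apply_retention(segments, now=25, max_bytes=200))  # drops the oldest segment
```

In both calls the oldest segment is the one removed, which matches how retention trims a log from its oldest end.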

