8 Ways to Optimize AWS Glue Jobs in a Nutshell

Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices.

1. Optimize Job Scripts

Partitioning: Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, which allows parallel processing and reduces the amount of data scanned.

Filtering: Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream.

Compression: Use compressed file formats (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance.

Optimize Transformations: Minimize the number of transformations and actions in your script. Combine transformations where possible, and use the DataFrame APIs, which are optimized for performance.

2. Use Appropriate Data Formats

Parquet and ORC: These columnar formats are efficient for storage and querying, signif…
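
To make the partitioning and filtering advice concrete, here is a toy, pure-Python model of partition pruning. It assumes Hive-style partition paths such as year=2024/month=01; the names read_with_pushdown, read_then_filter, SAMPLE_TABLE and only_2024 are hypothetical, and nothing here calls AWS Glue or Spark. In a real Glue script, the comparable setting is the push_down_predicate argument of glueContext.create_dynamic_frame.from_catalog, which filters catalog partitions before their files are read.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "table": each key is a Hive-style partition path and
# each value holds the rows stored under that path. Not a Glue or Spark API.
Table = Dict[str, List[dict]]
Predicate = Callable[[Dict[str, str]], bool]


def parse_partition(path: str) -> Dict[str, str]:
    """Turn 'year=2024/month=01' into {'year': '2024', 'month': '01'}."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def read_with_pushdown(table: Table, predicate: Predicate) -> Tuple[List[dict], int]:
    """Evaluate the predicate on partition values first and read only the
    partitions that pass. Returns (rows, partitions_read)."""
    rows: List[dict] = []
    partitions_read = 0
    for path, partition_rows in table.items():
        if not predicate(parse_partition(path)):
            continue  # pruned: this partition's rows are never touched
        partitions_read += 1
        rows.extend(partition_rows)
    return rows, partitions_read


def read_then_filter(table: Table, predicate: Predicate) -> Tuple[List[dict], int]:
    """Naive baseline: read every partition, then discard rows afterwards."""
    rows: List[dict] = []
    partitions_read = 0
    for path, partition_rows in table.items():
        partitions_read += 1  # every partition is read, matching or not
        keep = predicate(parse_partition(path))
        rows.extend(row for row in partition_rows if keep)
    return rows, partitions_read


# Hypothetical sample data: three monthly partitions spanning two years.
SAMPLE_TABLE: Table = {
    "year=2023/month=12": [{"id": 1}, {"id": 2}],
    "year=2024/month=01": [{"id": 3}],
    "year=2024/month=02": [{"id": 4}, {"id": 5}],
}


def only_2024(partition: Dict[str, str]) -> bool:
    """Predicate on partition columns only, like a partition pushdown filter."""
    return partition["year"] == "2024"


if __name__ == "__main__":
    print(read_with_pushdown(SAMPLE_TABLE, only_2024))
    print(read_then_filter(SAMPLE_TABLE, only_2024))
```

Both functions return the same rows, but the pushdown version reads only the two 2024 partitions instead of all three. With data on S3, every pruned partition is a set of files that never has to be listed or scanned, which is where the savings come from.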

Kafka Flowchart Useful for Dummies

How Kafka Works

Here are the main points on Kafka stream processing. On a mainframe, data is received and processed in two ways: batch and online. Kafka receives data and delivers it to consumers. Here are the details on the architecture, logs, and applications that use Kafka.

Streaming data is different (think of YouTube Live): as data arrives in the queue, it is processed. In batch processing, you must wait until the batch completes; in stream processing, data is handled on the fly.

1. Architecture

2. Process

Kafka is a publish/subscribe system, but it would be more precise to say that Kafka acts as a message broker. A broker is an intermediary that brings together two parties that don't necessarily know each other for a mutually beneficial exchange or deal. Kafka stores messages in topics and retrieves messages from topics. There's no direct connection between the producers and the consumers of the messages. Additionally, Kafka doesn't keep any st…
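
The broker idea can be sketched as a toy in-memory model, assuming a topic is simply an append-only list of messages and each consumer remembers its own read position (offset). ToyBroker and ToyConsumer are hypothetical names, not the Kafka client API. Real Kafka adds partitions, replication, and consumer groups whose committed offsets are stored by the cluster; none of that is modeled here.

```python
from collections import defaultdict
from typing import Dict, List


class ToyBroker:
    """Deliberately simplified, in-memory stand-in for a message broker.
    Each topic is an append-only list (a 'log'). This is NOT Kafka's API."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[str]] = defaultdict(list)

    def publish(self, topic: str, message: str) -> int:
        """Producer side: append a message to a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def fetch(self, topic: str, offset: int, max_messages: int = 10) -> List[str]:
        """Consumer side: return up to max_messages starting at offset.
        Messages are not deleted when read, and this toy broker keeps no
        per-consumer state."""
        return self._topics[topic][offset:offset + max_messages]


class ToyConsumer:
    """Reads one topic and tracks its own offset, independent of others."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self) -> List[str]:
        """Fetch whatever has arrived since the last poll."""
        batch = self.broker.fetch(self.topic, self.offset)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("orders", "order-1")
    broker.publish("orders", "order-2")
    billing = ToyConsumer(broker, "orders")
    shipping = ToyConsumer(broker, "orders")
    print(billing.poll())
    broker.publish("orders", "order-3")
    print(billing.poll())
    print(shipping.poll())
```

The producer only ever talks to the broker, never to billing or shipping. Because reading does not remove messages from the log, the second consumer still sees all three orders, while the first consumer, polling again after a new publish, picks up only the new message: the on-the-fly behavior that sets stream processing apart from batch.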