Posts

Showing posts with the label set comprehension

Featured Post

8 Ways to Optimize AWS Glue Jobs in a Nutshell

Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices.

1. Optimize Job Scripts

Partitioning: Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, allowing parallel processing and reducing the amount of data scanned.

Filtering: Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream.

Compression: Use compressed file formats (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance.

Optimize Transformations: Minimize the number of transformations and actions in your script. Combine transformations where possible and use DataFrame APIs, which are optimized for performance.

2. Use Appropriate Data Formats

Parquet and ORC: These columnar formats are efficient for storage and querying, signif

Python Set comprehension - How to Use it

In Python, a set does not allow duplicates. A comprehension cannot modify an existing set, but a set comprehension lets you create a new one.

Set Comprehension

The comprehension must result in a valid set. Like the keys of a dictionary, a set cannot hold two entries with the same value. If you try to add a value that is already in the set, the set is left unchanged.

Explained syntax

Set comprehensions using the {} syntax are available in Python 2.7 and in Python 3. In earlier versions, you have to use the set() function to create and work with sets. You might guess, therefore, that one of the best uses of a set is to eliminate duplicates. In fact, this is one of the most basic uses of a set comprehension. Given a list, we can copy it as a list with a simple list comprehension like this:

Details of logic

If we change the list comprehension to a set comprehension, we get the same elements, but as a set, that is, without duplicates.

list_copy = [x for x in o
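As a minimal sketch of the idea described in this excerpt, the snippet below contrasts a list comprehension with a set comprehension over the same data. The names `original`, `unique`, `unique_via_set`, and `squares` and the sample values are hypothetical, chosen only for illustration; they are not taken from the post.

```python
# Hypothetical sample data containing duplicates.
original = [3, 1, 3, 2, 1]

# A list comprehension copies every element, duplicates included.
list_copy = [x for x in original]

# The same expression in braces builds a new set, so duplicates collapse.
unique = {x for x in original}

# Equivalent form that passes a generator expression to set().
unique_via_set = set(x for x in original)

# Duplicates can also arise from the expression itself: -2 and 2 both map to 4.
squares = {x * x for x in [-2, -1, 0, 1, 2]}

print(list_copy)        # [3, 1, 3, 2, 1]
print(sorted(unique))   # [1, 2, 3]
print(sorted(squares))  # [0, 1, 4]
```

Sets are unordered, so the sketch sorts them before printing to get a stable display.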