Relative Vs. Absolute Path in Linux: Top Differences

 This article explains the difference between relative and absolute paths in Linux. Programmers often need to switch between the two, so here are some simple ways to tell them apart.

Absolute Path

$ cd /usr/lib
$ pwd
/usr/lib

Here pwd prints /usr/lib, the full path starting from the root directory (/). A path written this way is called an absolute path, or full path. It points to the same location no matter which directory you are in when you use it.

Think of the absolute pathname as being the complete mailing address for a package that the postal service will deliver to your next-door neighbor.
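
Since an absolute path always starts at the root, one simple way to recognize it is to check whether it begins with a slash. The following is a minimal sketch of that check; the function name is_absolute is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# is_absolute is a hypothetical helper, not a standard command.
# It succeeds (exit status 0) when its argument starts with "/".
is_absolute() {
    case $1 in
        /*) return 0 ;;   # starts at the root directory: absolute
        *)  return 1 ;;   # anything else is resolved from the current directory
    esac
}

is_absolute /usr/lib && echo "/usr/lib is absolute"
is_absolute usr/lib  || echo "usr/lib is relative"
is_absolute ../lib   || echo "../lib is relative"
```

The check looks only at the text of the path, so it does not tell you whether the directory actually exists.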

Relative Path

$ cd /
$ cd usr
$ cd lib
$ pwd
/usr/lib ==> Reached one step at a time, starting from the root directory.

$ cd ../..
$ pwd
/ ==> Each .. moves up one level, so ../.. goes from /usr/lib back to the root directory.

In the example above, cd usr and cd lib use relative paths. A relative path is resolved from your current directory rather than from the root, so it never begins with a slash (/).

Here, usr is the parent directory of lib. In simple terms, a relative path is a step-by-step route from where you are now to your target directory. As you may know, .. (double dots) refers to the parent of the current directory.

Think of the relative directory name as giving the postal carrier directions from your house to the one next door so that the carrier can deliver the package.
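
To see both styles reach the same place, here is a small sketch that can be run safely. It assumes a throwaway directory created with mktemp stands in for the real root, so the variable names (base, abs_result, rel_result, back_result) are illustrative and not part of any standard.

```shell
#!/bin/sh
# Build a throwaway tree so the example does not touch the real /usr/lib.
base=$(mktemp -d)                 # hypothetical sandbox directory
mkdir -p "$base/usr/lib"
base=$(cd "$base" && pwd -P)      # resolve symlinks so later comparisons match

# Absolute path: works from any starting directory.
cd /
cd "$base/usr/lib"
abs_result=$(pwd -P)

# Relative path: each step is resolved from the current directory.
cd "$base"
cd usr
cd lib
rel_result=$(pwd -P)

# .. is the parent directory; two levels up returns to the sandbox root.
cd ../..
back_result=$(pwd -P)

echo "absolute:    $abs_result"
echo "relative:    $rel_result"
echo "after ../..: $back_result"
```

The first two lines of output show the same directory, and the third shows the sandbox root, which mirrors going from /usr/lib back to / in the example above.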

Command to go to the Home directory

$ pwd
/usr/lib

You are currently in the lib directory.

$ cd
$ pwd ==> Prints your home directory, for example /home/yourname. The exact path depends on your account.

Note: If you run cd without any arguments, it takes you to your home directory, not to the root directory.
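
The behaviour of cd without arguments comes from the HOME environment variable. The sketch below illustrates this under one assumption: HOME is temporarily pointed at a throwaway directory (demo_home, a hypothetical name) so the script gives the same result on any machine.

```shell
#!/bin/sh
# Point HOME at a throwaway directory so the demo is safe to run anywhere.
demo_home=$(mktemp -d)                   # hypothetical stand-in for a real home
demo_home=$(cd "$demo_home" && pwd -P)   # resolve symlinks for a clean comparison
HOME=$demo_home
export HOME

cd /             # start somewhere else
cd               # no argument: the shell changes to $HOME
home_result=$(pwd -P)

cd /
cd ~             # ~ expands to the value of HOME as well
tilde_result=$(pwd -P)

echo "cd   -> $home_result"
echo "cd ~ -> $tilde_result"
```

Both lines print the same directory. In a normal login session HOME is already set to your real home directory, so cd on its own takes you there.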


