Featured Post

8 Ways to Optimize AWS Glue Jobs in a Nutshell

Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices.

1. Optimize Job Scripts

Partitioning: Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, allowing parallel processing and reducing the amount of data scanned.

Filtering: Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream.

Compression: Use file formats with built-in compression (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance.

Optimize Transformations: Minimize the number of transformations and actions in your script. Combine transformations where possible, and use DataFrame APIs, which are optimized for performance.

2. Use Appropriate Data Formats

Parquet and ORC: These columnar formats are efficient for storage and querying, signif…
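To make the partitioning and pushdown-filtering points concrete, here is a minimal pure-Python sketch of partition pruning. It assumes a Hive-style `key=value` folder layout; the `sales/` prefix, the file names, and the `partition_values` and `prune` helpers are hypothetical and exist only for illustration. The sketch does not call Glue itself: in a real Glue job, the comparable control is the `push_down_predicate` argument of `glueContext.create_dynamic_frame.from_catalog`, which takes a Spark SQL expression over the partition columns (for example `"year == '2024' and month == '01'"`).

```python
# Hypothetical object keys in a Hive-style layout (key=value folders). A real
# job would get these from S3 or from the Glue Data Catalog's partition list.
OBJECT_KEYS = [
    "sales/year=2023/month=12/part-0000.snappy.parquet",
    "sales/year=2024/month=01/part-0000.snappy.parquet",
    "sales/year=2024/month=01/part-0001.snappy.parquet",
    "sales/year=2024/month=02/part-0000.snappy.parquet",
]


def partition_values(key):
    """Parse 'column=value' path segments, e.g. {'year': '2024', 'month': '01'}."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def prune(keys, predicate):
    """Keep only files whose partition values satisfy the predicate.

    The predicate looks only at the path, so skipped files are never opened;
    that skipped I/O is the saving partitioning plus pushdown aims for.
    """
    return [key for key in keys if predicate(partition_values(key))]


selected = prune(OBJECT_KEYS, lambda p: p["year"] == "2024" and p["month"] == "01")
print(len(selected), "of", len(OBJECT_KEYS), "files read")  # 2 of 4 files read
for key in selected:
    print(key)
```

A predicate on a non-partition column cannot prune files by path alone, which is why it helps to choose partition keys that match the filters your jobs apply most often.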

Top Five SQL Query Performance Tuning Tips

A SQL query runs faster when you write it in a specific way; this is what we call tuning. There are five tuning tips.

List of Performance Tuning Tips: use indexed columns, use GROUP BY, avoid duplicate columns in SELECT and WHERE, use LEFT JOINs, and use a correlated subquery.

SQL Performance Tuning Tip: 01

Use indexed columns in the WHERE clause of your SQL. Let me elaborate: make sure the columns you use in the WHERE clause are already part of an index on that table. Also write the date as a quoted literal; left unquoted, 2017-08-01 is evaluated as the arithmetic expression 2017 - 8 - 1 = 2008. An example SQL query:

SELECT * FROM emp_sal_nonppi WHERE dob <= DATE '2017-08-01';

SQL Performance Tuning Tip: 02

Use GROUP BY. Some people use a DISTINCT clause to eliminate duplicates; you can achieve the same result with GROUP BY. An example SQL query:

SELECT E.empno, E.lastname FROM emp E, emp_projact EP WHERE E.empno = EP.empno GROUP BY E.empno, E.lastname;

SQL Performance Tuning Tip: 03

Avoid using duplicates in the query. Some people use the same col…
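The first two tips can be checked on a toy database. The sketch below uses Python's built-in `sqlite3` module, so SQLite stands in for whatever engine the post targets, and query plans will look different elsewhere. The schema and rows are hypothetical, loosely modeled on the `emp` and `emp_projact` tables named in the tips, and the index name `idx_emp_dob` is made up. It shows that SQLite's `EXPLAIN QUERY PLAN` reports the index being used for the WHERE filter, that an unquoted date is plain arithmetic, and that GROUP BY and DISTINCT return the same de-duplicated rows.

```python
import sqlite3

# Hypothetical schema loosely modeled on the emp / emp_projact tables in the
# tips; the columns, rows, and index name are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, lastname TEXT, dob TEXT);
    CREATE TABLE emp_projact (empno INTEGER, projno TEXT);
    CREATE INDEX idx_emp_dob ON emp (dob);  -- index on the WHERE-clause column (Tip 01)
""")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Smith", "1990-05-01"), (2, "Jones", "2018-01-15"), (3, "Lee", "2001-11-30")])
cur.executemany("INSERT INTO emp_projact VALUES (?, ?)",
                [(1, "P1"), (1, "P2"), (3, "P1")])


def query_plan(sql, params=()):
    """Return the human-readable detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Tip 01: filter on the indexed dob column. The date is bound as a string;
# written unquoted, 2017-08-01 is just integer arithmetic.
dob_sql = "SELECT * FROM emp WHERE dob <= ?"
dob_plan = query_plan(dob_sql, ("2017-08-01",))
born_before = sorted(row[0] for row in cur.execute(dob_sql, ("2017-08-01",)))
unquoted = cur.execute("SELECT 2017-08-01").fetchone()[0]

# Tip 02: the join yields employee 1 twice (two projects); GROUP BY and
# DISTINCT both collapse that duplicate.
join = "FROM emp E, emp_projact EP WHERE E.empno = EP.empno"
grouped = cur.execute(
    f"SELECT E.empno, E.lastname {join} GROUP BY E.empno, E.lastname ORDER BY E.empno").fetchall()
distinct = cur.execute(
    f"SELECT DISTINCT E.empno, E.lastname {join} ORDER BY E.empno").fetchall()

print(dob_plan)      # wording varies by SQLite version; look for idx_emp_dob
print(born_before)   # [1, 3]
print(unquoted)      # 2008
print(grouped == distinct, grouped)  # True [(1, 'Smith'), (3, 'Lee')]
```

Whether GROUP BY is actually faster than DISTINCT depends on the engine's optimizer, so comparing the two plans on your own database is a reasonable way to check the advice before adopting it.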