The 2 Top Skills You Need to Become an Analyst

Pools of master data held in repositories play a big role in data analytics. Much of this data, such as product data and customer data, is already stored in data warehouses.

Big data work needs a mix of skills: technical skills and some soft skills.

1# Technical Skills

You should be able to work with:
  • software frameworks like Hadoop,
  • NoSQL databases like Cassandra or HBase,
  • analytics programming languages and tools like R or Pig.

2# Soft Skills

The ability to think broadly across the organization, to understand the bottom-line needs of the business, to know which analytics questions to pose to get to those bottom lines, and to measure and communicate the results.

Additional technical skills: SAS, Cognos.


