Posts

Featured Post

8 Ways to Optimize AWS Glue Jobs in a Nutshell

Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices.

1. Optimize Job Scripts

Partitioning: Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, allowing parallel processing and reducing the amount of data scanned.
Filtering: Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream.
Compression: Use compressed file formats (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance.
Optimize Transformations: Minimize the number of transformations and actions in your script. Combine transformations where possible and use DataFrame APIs, which are optimized for performance.

2. Use Appropriate Data Formats

Parquet and ORC: These columnar formats are efficient for storage and querying, signif
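As a rough illustration of why partitioning and early filtering help, the sketch below mimics partition pruning in plain Python. It is not AWS Glue code: the `partitions` mapping, the `read_partitions` helper, and the sample rows are all hypothetical stand-ins for partitioned files and a pushdown predicate.

```python
# Hypothetical stand-in for a dataset partitioned by (year, month).
# In a real job each key would correspond to a folder of files.
partitions = {
    (2023, 11): [{"order_id": 1, "amount": 40}],
    (2023, 12): [{"order_id": 2, "amount": 25}, {"order_id": 3, "amount": 10}],
    (2024, 1): [{"order_id": 4, "amount": 70}],
}


def read_partitions(partitions, predicate):
    """Return rows only from partitions whose key satisfies the predicate.

    Partitions that fail the predicate are skipped entirely, so their
    rows are never scanned.
    """
    rows = []
    scanned = 0
    for key, partition_rows in partitions.items():
        if predicate(key):
            scanned += 1
            rows.extend(partition_rows)
    return rows, scanned


# Filter on the partition key before touching any rows.
rows, scanned = read_partitions(partitions, lambda key: key[0] == 2023)
print(f"scanned {scanned} of {len(partitions)} partitions, got {len(rows)} rows")
```

The predicate is applied to the partition key only, so the work saved grows with the number of partitions that can be skipped.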

How to Delete an Item from a Set in Python: Best Example

A set is a built-in data type in Python: an unordered collection with no duplicate items. There are two methods for deleting an item from a set.

Methods to delete an item from a set

discard
remove

Discard vs. Remove

discard() does not raise an error if the item to remove does not exist. remove() raises a KeyError if the item does not exist.

Explanation of the discard and remove methods

Python program:

    # Prints all the set items
    food = {"pasta", "burger", "hot dog", "pizza"}
    print(food)

    # Prints the set items without pasta
    food.discard("pasta")
    print(food)

    # Prints the set items without burger and pasta
    food.remove("burger")
    print(food)

    # The next two lines try to remove an item that isn't in the set!
    food.discard("pasta")  # this will not report an error
    food.remove("pasta")   # this will report an error

The output:

    {'pasta', 'burger', 'pizza', '
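A minimal sketch of the difference, assuming you want to handle the missing-item case yourself rather than let the program stop. The `safe_remove` helper is a hypothetical name used only for illustration.

```python
food = {"pasta", "burger", "hot dog", "pizza"}

# discard() silently does nothing when the item is absent.
food.discard("sushi")


def safe_remove(items, item):
    """Try remove(); report whether the item was actually present."""
    try:
        items.remove(item)  # raises KeyError if the item is missing
        return True
    except KeyError:
        return False


removed_first = safe_remove(food, "pasta")   # item present
removed_second = safe_remove(food, "pasta")  # item already gone
print(removed_first, removed_second)
print(sorted(food))
```

Use discard() when a missing item is not a problem, and remove() when a missing item signals a bug you want to hear about.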

How to Access Dictionary Key-Value Data in Python

Use a for loop to read dictionary data in Python. Here's an example that is handy in real projects.

Python program to read dictionary data

    yearly_revenue = {
        2017: 1000000,
        2018: 1200000,
        2019: 1250000,
        2020: 1100000,
        2021: 1300000,
    }

    total_income = 0
    for year_id in yearly_revenue.keys():
        total_income += yearly_revenue[year_id]
        print(year_id, yearly_revenue[year_id])

    print(total_income)
    print(total_income / len(yearly_revenue))

Output

    2017 1000000
    2018 1200000
    2019 1250000
    2020 1100000
    2021 1300000
    5850000
    1170000.0

Explanation

The input is a dictionary that maps each year to its revenue. The loop adds each year's revenue to a running total, and the last line divides the total by the number of years to get the average. The key point is the keys() method, which supplies the years to loop over.
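An alternative sketch of the same calculation, using items() to get each key and value together and sum() over values() for the total. It reuses the revenue figures from the example; whether it reads better than the keys() version is a matter of taste.

```python
yearly_revenue = {
    2017: 1000000,
    2018: 1200000,
    2019: 1250000,
    2020: 1100000,
    2021: 1300000,
}

# items() yields (key, value) pairs, so no second lookup is needed.
for year_id, revenue in yearly_revenue.items():
    print(year_id, revenue)

total_income = sum(yearly_revenue.values())
average_income = total_income / len(yearly_revenue)
print(total_income)
print(average_income)
```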

How to Decode Python Exception Messages Like a Pro

While developing Python programs, you will see exception messages from the interpreter. Here is an explanation of each part of the message, with tips on how to read it.

Older tutorials describe two kinds of exceptions, StandardError and StopIteration. That split comes from Python 2; Python 3 removed StandardError, and most built-in exceptions now derive from Exception.

[Chart: Python exceptions class]

Reading an error message produced by Python is not very difficult. A message has three parts: the error type, the error description, and the traceback.

Understand the Python exception message

The Error Type

There are many built-in exception types in Python. Here is a command, run in the interactive interpreter, that lists the ones whose names contain 'Error':

    [x for x in dir(__builtins__) if 'Error' in x]

The Error Description

The text right after the error type describes what the problem was. These descriptions are sometimes very accurate, sometimes not.

Sample error

    Traceback (most recent call last):
      Fil
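To see the three parts separately, here is a minimal sketch that catches an exception and pulls out its type, its description, and its traceback using the standard traceback module. The `divide` function is a hypothetical example chosen only to trigger an error.

```python
import traceback


def divide(a, b):
    # Hypothetical function used only to trigger an exception.
    return a / b


try:
    divide(1, 0)
except ZeroDivisionError as exc:
    error_type = type(exc).__name__           # the error type
    error_description = str(exc)              # the error description
    error_traceback = traceback.format_exc()  # the full traceback text

print("Type:", error_type)
print("Description:", error_description)
print(error_traceback)
```

The traceback text lists the calls from the outermost frame down to the line that raised the error, so the last entries are usually the most useful place to start reading.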

Python Tuples: An Overview with Code Examples

A tuple is one of Python's built-in collection types; others include the list and the dictionary. The operations you can perform on it are shown here for reference. Writing a tuple is easy: its values are separated by commas and enclosed in parentheses '()'. A tuple is immutable, which means you cannot replace its values with new ones.

#1. How to create a tuple

Code:

    # Tuple example
    my_tuple = (1, 2, 3, 4, 5)
    print(my_tuple)

Output:

    (1, 2, 3, 4, 5)

#2. How to read tuple values

Code:

    print(my_tuple[0])

Output:

    1

#3. How to add two tuples

Code:

    a = (1, 6, 7, 8)
    c = (3, 4, 5, 6, 7, 8)
    d = a + c
    print(d)

Output:

    (1, 6, 7, 8, 3, 4, 5, 6, 7, 8)

#4. How to count tuple values

Here, count() does not count the number of values; it counts how many times a given value is repeated.

Code:

    sample=(1, 6, 7, 8, 3, 4, 5, 6, 7, 8
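A short sketch of two points made here: count() reports how many times a given value occurs, and tuple items cannot be reassigned. The `readings` tuple is made-up sample data, not the truncated example.

```python
readings = (1, 6, 7, 8, 6, 3, 6)

# count() returns the number of occurrences of one value,
# not the number of items in the tuple.
sixes = readings.count(6)
length = len(readings)
print(sixes, length)

# Tuples are immutable: item assignment raises TypeError.
try:
    readings[0] = 99
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)
```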

Relational Operators in Python: A Quick Guide On How to Use Them

Relational operators in Python are helpful when you need to compare numeric values. Here we explore eight relational operators and show how each one works, so you can use this guide as a quick refresher.

Python Relational Operators

Here is a list of frequently used relational operators for comparing values, which is helpful for data analysis. Note that is and is not compare object identity rather than value.

<
<=
>
>=
==
!=
is
is not

Python program: How to use relational operators

Assign 23 to a and 11 to b. Then apply all the comparison operators. The output is self-explanatory. Bookmark this article to refresh your memory when you are in doubt.

Example

    a = 23
    b = 11
    print("Is a greater than b?", a > b)  # greater than
    print("Is a less than b?", a < b)  # less than
    print("Is a greater or equal to b?", a >= b)  # greater or equal
    print("Is a less or equal to b?", a <= b)  # less or equal
    pr
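One distinction in the operator list deserves a sketch: == and != compare values, while is and is not check whether two names refer to the same object. The variable names below are illustrative.

```python
first = [23, 11]
second = [23, 11]   # equal contents, but a separate list object
alias = first       # another name for the same object

values_equal = first == second   # compares contents
same_object = first is second    # compares identity
alias_same = alias is first

print(values_equal, same_object, alias_same)
print(first is not second)
```

For comparing numbers, prefer == and != and keep is for identity checks such as `x is None`.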

Python Program: JSON to CSV Conversion

JavaScript Object Notation (JSON) data can be written to a CSV file. Here's sample Python logic for ready reference. You can write a simple Python program by importing the json and csv modules; this is the first step. The json module provides the methods needed to parse JSON in your Python logic. In the next step, I'll show you how to write the Python program. You'll also find each term explained.

What is a JSON file

A JSON file stores data as key-value pairs. It is popularly used to transmit data between heterogeneous applications. Python supports JSON through its built-in json module.

What is a CSV file

A CSV file holds comma-separated values. It is popularly used to send and receive data.

How to write JSON file data to a CSV file

Here the JSON data is written to a CSV file. It's a simple method that you can use for CSV conversion.

    import csv, json
    json_string = '[{"value1": 1, "value2": 2,"value3
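A minimal sketch of the conversion, assuming the JSON is a list of flat objects that share the same keys. The sample data and the `json_to_csv` helper are hypothetical; the helper writes to an in-memory string so it runs without touching the file system, and you would pass an open file object to csv.DictWriter instead to produce a real CSV file.

```python
import csv
import io
import json


def json_to_csv(json_string):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_string)
    buffer = io.StringIO()
    # Use the keys of the first record as the CSV header.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data for illustration.
sample = '[{"name": "pen", "qty": 2}, {"name": "ink", "qty": 5}]'
csv_text = json_to_csv(sample)
print(csv_text)
```

Nested JSON objects would need to be flattened first, since each CSV cell holds a single value.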

Numpy Array Vs. List: What's the Difference

Here are the differences between a Python list and a NumPy array. Both store data, but technically they are not the same. You'll find here where they differ from each other.

Python Lists

Here is all about Python lists:

Lists can hold data of different data types. For instance, data = [3, 3.2, 4.6, 6, 6.8, 9, "hello", 'a']
Operations such as subtraction, multiplication, and division have to be done through loops
More storage space is required, as each element is a separate Python object
Execution time is high for large datasets
Lists are a built-in data type

NumPy Arrays

Here is all about NumPy arrays:

NumPy arrays are containers for storing only homogeneous data types. For example: data = [3.2, 4.6, 6.8]; data = [3, 6, 9]; data = ['hello', 'a']
NumPy is designed to apply mathematical operations to whole arrays at once (vectorized), which is also simpler to write than the equivalent Python loops
NumPy needs much less storage space than a list because all elements share one data type
Execution time is