How to Create UDF in Python Example

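In Python, a user-defined function (UDF) lets you write a piece of logic once and reuse it instead of repeating the same code. Python UDFs differ from functions in C, C++, or Java: you do not declare parameter or return types, and the body is marked by indentation rather than braces. This post shows how to create a UDF in Python.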




Python Syntax for a User-Defined Function (UDF)

Below is the general syntax of a Python UDF.
def function_name(list of parameters):
    "docstring"
    statement(s)
    return expression

Explanation of each part

  1. The keyword def marks the start of the function header.
  2. The function name uniquely identifies the function. Function names follow the same rules as any other Python identifier.
  3. The list of parameters (also called arguments) is how values are passed to the function. The list of parameters is optional.
  4. A colon (:) marks the end of the function header.
  5. An optional documentation string (docstring) describes the purpose of the function. It serves a similar role to a comment, but Python stores it with the function, so tools such as help() can display it.
  6. The Python statements perform the task the function was written for. Every statement in the function body must be indented consistently.
  7. An optional return statement sends a value (the result) back to the caller. The statement can include an expression whose value is returned. If the return statement has no expression, or the function has no return statement at all, the function returns the None object.
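The parts listed above can be seen together in a short sketch. The function names rectangle_area and log_message are hypothetical and chosen only for illustration. The first function has a docstring and returns a computed value. The second has no return statement, so calling it yields None.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle."""
    area = width * height  # statement that does the work
    return area            # send the result back to the caller


def log_message(text):
    """Print a message; this function has no return statement."""
    print("LOG:", text)


result = rectangle_area(3, 4)
nothing = log_message("done")

print(result)                  # value computed by the function
print(nothing)                 # None, because nothing was returned
print(rectangle_area.__doc__)  # the docstring is stored on the function
```

Because the docstring is stored on the function object, calling help(rectangle_area) in an interactive session would display it as well.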

Python Vs Other Languages

Python is one of the most popular languages in data analytics, but it is not the only language that supports UDFs. Many other languages offer them, and most SQL databases also let you create user-defined functions.

Advantages of Python User-Defined Functions

  • User-defined functions help decompose a large program into small segments, which makes the program easier to understand, maintain, and debug.
  • If the same code is repeated in a program, it can be placed in a function and executed when needed by calling that function.
  • Programmers working on a large project can divide the workload by writing different functions.
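As a minimal sketch of the first two advantages, the example below splits a small task into three functions and reuses them for two data sets. All names here (clean_scores, average, report_average) and the sample scores are hypothetical.

```python
def clean_scores(raw_scores):
    """Drop missing entries (None) from a list of scores."""
    return [score for score in raw_scores if score is not None]


def average(values):
    """Return the arithmetic mean, or 0.0 for an empty list."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def report_average(label, raw_scores):
    """Combine the two helpers into one readable step."""
    return f"{label}: {average(clean_scores(raw_scores)):.1f}"


# The same functions are reused for different inputs instead of
# copying the cleaning and averaging code twice.
math_report = report_average("math", [80, None, 90])
art_report = report_average("art", [70, 75, None, 95])
print(math_report)
print(art_report)
```

Each function does one job, so a bug in the averaging logic, for example, only needs to be fixed in one place.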

One practical tip

It is always a good idea to name user-defined functions according to the task they perform.
