How to Write Regular Expressions Quickly in Python, with Examples

The purpose of a regular expression is to find a matching pattern inside another string. In the simplest case you get a yes-or-no answer: the pattern either matches or it does not. I am not going to explain the theory of how to play tennis here. My intention is that if you just follow these ideas, you can play tennis today.

Python supports regular expressions through a built-in library. I have shared the best examples for your quick reference.
 

Python Regular Expressions

  1. What is a regular expression
  2. How does Python support it
  3. Best examples


1. What is a regular expression


>>> haystack = 'My phone number is 213-867-5309.' 
>>> '213-867-5309' in haystack
True


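This is only the most basic kind of string matching. The in operator checks for one exact substring, so it works only when you already know the number. The real use of a regular expression is different: to find out whether the main string contains any valid phone number, whatever its digits are.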
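As a minimal sketch of that idea, the pattern below looks for any number shaped like the one in the example. The format (three digits, three digits, four digits, separated by hyphens) and the name phone_pattern are assumptions made for illustration; real phone numbers come in many more formats.

```python
import re

haystack = 'My phone number is 213-867-5309.'

# Assumed format: 3 digits, 3 digits, 4 digits, separated by hyphens.
# \b keeps the pattern from matching inside a longer run of digits.
phone_pattern = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')

match = phone_pattern.search(haystack)
if match:
    print('Found a phone number:', match.group())
else:
    print('No phone number found.')
```

Unlike the in check, this does not need to know the number in advance, only its shape.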


Regular expressions are also called regexes.

2. Why do we need regex

  1. Data mining - to extract the required data from a string, if it is present.
  2. Data validation - to check whether a received string is valid or not.
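The two uses above can be sketched as follows. The helper names extract_phone_numbers and is_valid_phone_number, the sample text, and the hyphenated phone format are all hypothetical, chosen only to show the difference between pulling data out of a larger string and checking one whole string.

```python
import re

# Assumed phone format for illustration: 213-867-5309
PHONE = r'\d{3}-\d{3}-\d{4}'

def extract_phone_numbers(text):
    """Data mining: return every phone-like substring found in text."""
    return re.findall(PHONE, text)

def is_valid_phone_number(candidate):
    """Data validation: True only if the whole string is a phone number."""
    return re.fullmatch(PHONE, candidate) is not None

found = extract_phone_numbers('Call 213-867-5309 or 415-555-0132 today.')
print(found)
print(is_valid_phone_number('213-867-5309'))
print(is_valid_phone_number('213-867-530'))
```

For validation, re.fullmatch is used rather than re.search, because a valid number with extra characters around it should not pass.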

Python support


Python has its own regular expression library, called re. It is part of the standard library, so all you need to do is import it.

>>> import re


When data matches and when it does not

  1. If a match is found, re.search returns a match object that holds the matching string.
  2. If there is no match, it returns None.
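In Python terms, re.search gives back a match object on success and None on failure, so it is common to test the result before using it. A short sketch, with the sample sentence and the variable names hit and miss chosen for illustration:

```python
import re

text = 'The quick brown fox jumped...'

hit = re.search(r'fox', text)   # pattern is present
miss = re.search(r'cat', text)  # pattern is absent

# A match object is truthy and None is falsy, so a plain if works.
for pattern, result in ((r'fox', hit), (r'cat', miss)):
    if result:
        print(pattern, '->', result.group())
    else:
        print(pattern, '-> no match')
```

Calling .group() directly on a failed search raises an AttributeError, which is why the check comes first.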


Example for regex


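>>> import re
>>> re.search(r'fox', 'The quick brown fox jumped...')
<re.Match object; span=(16, 19), match='fox'>

Notes: The return value is a match object. The matched text is 'fox', found at positions 16 to 19. Older Python versions print the object as _sre.SRE_Match instead of re.Match.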


Matching string


>>> match = re.search(r'fox', 'The quick brown fox jumped...')
>>> match.group()
'fox'

Notes: Calling group() on the match object returns the matching string, 'fox'.



Multiple matches

>>> import re
>>> re.findall(r'o', 'The quick brown fox jumped...')
['o', 'o']

Notes: findall returns a list with every matching string, one entry per match.
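If you also need to know where each match sits in the string, re.finditer yields one match object per hit instead of a plain list. A small sketch using the same sentence (the variable name positions is only illustrative):

```python
import re

text = 'The quick brown fox jumped...'

# Each item is (start index, matched text) for one occurrence of 'o'.
positions = [(m.start(), m.group()) for m in re.finditer(r'o', text)]
print(positions)  # [(12, 'o'), (17, 'o')]
```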
