Showing posts with the label PySpark

Featured Post

Python Regex: The 5 Exclusive Examples

 Regular expressions (regex) are powerful tools for pattern matching and text manipulation in Python. Here are five Python regex examples with explanations: 01 Matching a Simple Pattern import re text = "Hello, World!" pattern = r"Hello" result =, text) if result:     print("Pattern found:", Output: Output: Pattern found: Hello This example searches for the pattern "Hello" in the text and prints it when found. 02 Matching Multiple Patterns import re text = "The quick brown fox jumps over the lazy dog." patterns = [r"fox", r"dog"] for pattern in patterns:     if, text):         print(f"Pattern '{pattern}' found.") Output: Pattern 'fox' found. Pattern 'dog' found. It searches for both "fox" and "dog" patterns in the text and prints when they are found. 03 Matching Any Digit   import re text = "The price of the

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can employ SQL queries by importing your CSV file data to a DataFrame. However, you might face problems when dealing with spaces in column names of the DataFrame. Fortunately, there is a solution available to resolve this issue. Reading CSV file to Dataframe Here is the PySpark code for reading CSV files and writing to a DataFrame. #initiate session spark = SparkSession.builder \ .appName("PySpark Tutorial") \ .getOrCreate() #Read CSV file to df dataframe data_path = '/content/Test1.csv' df =, header=True, inferSchema=True) #Create a Temporary view for the DataFrame df2.createOrReplaceTempView("temp_table") #Read data from the temporary view spark.sql("select * from temp_table").show() Output --------+-----+---------------+---+ |Student| Year|Semester1|Semester2| | ID | | Marks | Marks | +----------+-----+---------------+ | si1 |year1|62.08| 62.4| | si1 |year2|75.94| 76.75| | si