Featured Post

Python Regex: The 5 Exclusive Examples

Image
 Regular expressions (regex) are powerful tools for pattern matching and text manipulation in Python. Here are five Python regex examples with explanations: 01 Matching a Simple Pattern import re text = "Hello, World!" pattern = r"Hello" result = re.search(pattern, text) if result:     print("Pattern found:", result.group()) Output: Output: Pattern found: Hello This example searches for the pattern "Hello" in the text and prints it when found. 02 Matching Multiple Patterns import re text = "The quick brown fox jumps over the lazy dog." patterns = [r"fox", r"dog"] for pattern in patterns:     if re.search(pattern, text):         print(f"Pattern '{pattern}' found.") Output: Pattern 'fox' found. Pattern 'dog' found. It searches for both "fox" and "dog" patterns in the text and prints when they are found. 03 Matching Any Digit   import re text = "The price of the

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can employ SQL queries by importing your CSV file data to a DataFrame. However, you might face problems when dealing with spaces in column names of the DataFrame. Fortunately, there is a solution available to resolve this issue.


SQL Space in Column Names


Reading CSV file to Dataframe

Here is the PySpark code for reading CSV files and writing to a DataFrame.

#initiate session
spark = SparkSession.builder \
.appName("PySpark Tutorial") \
.getOrCreate()


#Read CSV file to df dataframe
data_path = '/content/Test1.csv'
df = spark.read.csv(data_path, header=True, inferSchema=True)

#Create a Temporary view for the DataFrame
df2.createOrReplaceTempView("temp_table")

#Read data from the temporary view
spark.sql("select * from temp_table").show()


Output
--------+-----+---------------+---+
|Student| Year|Semester1|Semester2|
| ID | | Marks | Marks |
+----------+-----+---------------+ | si1 |year1|62.08| 62.4| | si1 |year2|75.94| 76.75| | si2 |year1|68.26| 72.95| | si2 |year2|85.49| 75.8| | si3 |year1|75.08| 79.84| | si3 |year2|54.98| 87.72| | si4 |year1|50.03| 66.85| | si4 |year2|71.26| 69.77| | si5 |year1|52.74| 76.27| | si5 |year2|50.39| 68.58| | si6 |year1|74.86| 60.8| | si6 |year2|58.29| 62.38| | si7 |year1|63.95| 74.51| | si7 |year2|66.69| 56.92| +----------+-----+-------------+

Fix for space in the column name


Suppose the column name "Student ID" contains a space. To prevent errors, you must modify your SQL query.

spark.sql("select `Student ID` as sid from temp_table").show()

Output:

+---+ |sid| +---+ |si1| |si1| |si2| |si2| |si3| |si3| |si4| |si4| |si5| |si5| |si6| |si6| |si7| |si7| +---+


Related

Comments

Popular posts from this blog

Explained Ideal Structure of Python Class

6 Python file Methods Real Usage

How to Decode TLV Quickly