Featured Post

Python: Built-in Functions vs. For & If Loops – 5 Programs Explained

Image
Python’s built-in functions make coding fast and efficient. But understanding how they work under the hood is crucial to mastering Python. This post shows five Python tasks, each implemented in two ways: Using built-in functions Using for loops and if statements ✅ 1. Sum of a List ✅ Using Built-in Function: numbers = [ 10 , 20 , 30 , 40 ] total = sum (numbers) print ( "Sum:" , total) 🔁 Using For Loop: numbers = [ 10 , 20 , 30 , 40 ] total = 0 for num in numbers: total += num print ( "Sum:" , total) ✅ 2. Find Maximum Value ✅ Using Built-in Function: values = [ 3 , 18 , 7 , 24 , 11 ] maximum = max (values) print ( "Max:" , maximum) 🔁 Using For and If: values = [ 3 , 18 , 7 , 24 , 11 ] maximum = values[ 0 ] for val in values: if val > maximum: maximum = val print ( "Max:" , maximum) ✅ 3. Count Vowels in a String ✅ Using Built-ins: text = "hello world" vowel_count = sum ( 1 for ch in text if ch i...

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can employ SQL queries by importing your CSV file data to a DataFrame. However, you might face problems when dealing with spaces in column names of the DataFrame. Fortunately, there is a solution available to resolve this issue.


SQL Space in Column Names


Reading CSV file to Dataframe

Here is the PySpark code for reading CSV files and writing to a DataFrame.

#initiate session
spark = SparkSession.builder \
.appName("PySpark Tutorial") \
.getOrCreate()


#Read CSV file to df dataframe
data_path = '/content/Test1.csv'
df = spark.read.csv(data_path, header=True, inferSchema=True)

#Create a Temporary view for the DataFrame
df2.createOrReplaceTempView("temp_table")

#Read data from the temporary view
spark.sql("select * from temp_table").show()


Output
--------+-----+---------------+---+
|Student| Year|Semester1|Semester2|
| ID | | Marks | Marks |
+----------+-----+---------------+ | si1 |year1|62.08| 62.4| | si1 |year2|75.94| 76.75| | si2 |year1|68.26| 72.95| | si2 |year2|85.49| 75.8| | si3 |year1|75.08| 79.84| | si3 |year2|54.98| 87.72| | si4 |year1|50.03| 66.85| | si4 |year2|71.26| 69.77| | si5 |year1|52.74| 76.27| | si5 |year2|50.39| 68.58| | si6 |year1|74.86| 60.8| | si6 |year2|58.29| 62.38| | si7 |year1|63.95| 74.51| | si7 |year2|66.69| 56.92| +----------+-----+-------------+

Fix for space in the column name


Suppose the column name "Student ID" contains a space. To prevent errors, you must modify your SQL query.

spark.sql("select `Student ID` as sid from temp_table").show()

Output:

+---+ |sid| +---+ |si1| |si1| |si2| |si2| |si3| |si3| |si4| |si4| |si5| |si5| |si6| |si6| |si7| |si7| +---+


Related

Comments

Popular posts from this blog

SQL Query: 3 Methods for Calculating Cumulative SUM

5 SQL Queries That Popularly Used in Data Analysis

Big Data: Top Cloud Computing Interview Questions (1 of 4)