Featured Post

Claude Code for Beginners: Step-by-Step AI Coding Tutorial

Image
 Artificial Intelligence is changing how developers write software. From generating code to fixing bugs and explaining complex logic, AI tools are becoming everyday companions for programmers. One such powerful tool is Claude Code , powered by Anthropic’s Claude AI model. If you’re a beginner or  an experienced developer looking to improve productivity, this guide will help you understand  what Claude Code is, how it works, and how to use it step-by-step . Let’s get started. What is Claude Code? Claude Code is an AI-powered coding assistant built on top of Anthropic’s Claude models. It helps developers by: Writing code from natural language prompts Explaining existing code Debugging errors Refactoring code for better readability Generating tests and documentation In simple words, you describe what you want in plain English, and Claude Code helps turn that into working code. It supports multiple programming languages, such as: Python JavaScri...

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can employ SQL queries by importing your CSV file data to a DataFrame. However, you might face problems when dealing with spaces in column names of the DataFrame. Fortunately, there is a solution available to resolve this issue.


SQL Space in Column Names


Reading CSV file to Dataframe

Here is the PySpark code for reading CSV files and writing to a DataFrame.

#initiate session
spark = SparkSession.builder \
.appName("PySpark Tutorial") \
.getOrCreate()


#Read CSV file to df dataframe
data_path = '/content/Test1.csv'
df = spark.read.csv(data_path, header=True, inferSchema=True)

#Create a Temporary view for the DataFrame
df2.createOrReplaceTempView("temp_table")

#Read data from the temporary view
spark.sql("select * from temp_table").show()


Output
--------+-----+---------------+---+
|Student| Year|Semester1|Semester2|
| ID | | Marks | Marks |
+----------+-----+---------------+ | si1 |year1|62.08| 62.4| | si1 |year2|75.94| 76.75| | si2 |year1|68.26| 72.95| | si2 |year2|85.49| 75.8| | si3 |year1|75.08| 79.84| | si3 |year2|54.98| 87.72| | si4 |year1|50.03| 66.85| | si4 |year2|71.26| 69.77| | si5 |year1|52.74| 76.27| | si5 |year2|50.39| 68.58| | si6 |year1|74.86| 60.8| | si6 |year2|58.29| 62.38| | si7 |year1|63.95| 74.51| | si7 |year2|66.69| 56.92| +----------+-----+-------------+

Fix for space in the column name


Suppose the column name "Student ID" contains a space. To prevent errors, you must modify your SQL query.

spark.sql("select `Student ID` as sid from temp_table").show()

Output:

+---+ |sid| +---+ |si1| |si1| |si2| |si2| |si3| |si3| |si4| |si4| |si5| |si5| |si6| |si6| |si7| |si7| +---+


Related

Comments

Popular posts from this blog

SQL Query: 3 Methods for Calculating Cumulative SUM

Step-by-Step Guide to Reading Different Files in Python

5 SQL Queries That Popularly Used in Data Analysis