
How to Handle Spaces in PySpark DataFrame Columns

In PySpark, you can run SQL queries against your CSV data by loading it into a DataFrame. However, column names that contain spaces can cause errors in those queries. Fortunately, there is a simple fix for this issue.


Spaces in SQL Column Names


Reading a CSV File into a DataFrame

Here is the PySpark code for reading a CSV file into a DataFrame and querying it through a temporary view.

#Import SparkSession and initiate a session
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PySpark Tutorial") \
    .getOrCreate()


#Read CSV file into the df DataFrame
data_path = '/content/Test1.csv'
df = spark.read.csv(data_path, header=True, inferSchema=True)

#Create a temporary view for the DataFrame
df.createOrReplaceTempView("temp_table")

#Read data from the temporary view
spark.sql("select * from temp_table").show()


Output
+----------+-----+---------------+---------------+
|Student ID| Year|Semester1 Marks|Semester2 Marks|
+----------+-----+---------------+---------------+
|       si1|year1|          62.08|           62.4|
|       si1|year2|          75.94|          76.75|
|       si2|year1|          68.26|          72.95|
|       si2|year2|          85.49|           75.8|
|       si3|year1|          75.08|          79.84|
|       si3|year2|          54.98|          87.72|
|       si4|year1|          50.03|          66.85|
|       si4|year2|          71.26|          69.77|
|       si5|year1|          52.74|          76.27|
|       si5|year2|          50.39|          68.58|
|       si6|year1|          74.86|           60.8|
|       si6|year2|          58.29|          62.38|
|       si7|year1|          63.95|          74.51|
|       si7|year2|          66.69|          56.92|
+----------+-----+---------------+---------------+
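
Before adjusting the query, it can help to confirm the exact column names Spark read from the CSV header. This optional check is not part of the original walkthrough; it simply lists the column names, including any embedded spaces.

#Optional check: list the column names read from the CSV header
print(df.columns)
# Assumed result based on the table above:
# ['Student ID', 'Year', 'Semester1 Marks', 'Semester2 Marks']
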

Fix for a space in the column name


The column name "Student ID" contains a space. To prevent errors, wrap the column name in backticks in your SQL query.

spark.sql("select `Student ID` as sid from temp_table").show()

Output:

+---+
|sid|
+---+
|si1|
|si1|
|si2|
|si2|
|si3|
|si3|
|si4|
|si4|
|si5|
|si5|
|si6|
|si6|
|si7|
|si7|
+---+
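
If you prefer to avoid backtick quoting entirely, another option is to rename the column so it no longer contains a space. The sketch below is not part of the original steps; it assumes the same df DataFrame and "Student ID" column from the example above.

#Alternative sketch: rename the column to remove the space
#Assumes the df DataFrame and 'Student ID' column from the example above
df_renamed = df.withColumnRenamed("Student ID", "student_id")
df_renamed.createOrReplaceTempView("temp_table2")

#No backticks needed once the name has no space
spark.sql("select student_id from temp_table2").show()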

