Posts

Featured Post

Top Questions People Ask About Pandas, NumPy, Matplotlib & Scikit-learn — Answered!

Whether you're a beginner or brushing up on your skills, these are the real-world questions Python learners ask most about key data science libraries. Let’s dive in! 🐍
🐼 Pandas: Data Manipulation Made Easy
1. How do I handle missing data in a DataFrame?
    df.fillna(0)       # Replace NaNs with 0
    df.dropna()        # Remove rows with NaNs
    df.isna().sum()    # Count missing values per column
2. How can I merge or join two DataFrames?
    pd.merge(df1, df2, on='id', how='inner')  # how: inner, left, right, outer
3. What is the difference between loc[] and iloc[]?
loc[] uses labels (e.g., index labels and column names).
iloc[] uses integer positions.
    df.loc[0, 'name']  # label-based
    df.iloc[0, 1]      # position-based
4. How do I group data and perform aggregation?
    df.groupby('category')['sales'].sum()
5. How can I convert a column to datetime format?
    df['date'] = pd.to_datetime(df['date'])
...
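The difference between loc[] and iloc[] is easiest to see when the index labels do not match the row positions. The following is a minimal sketch with made-up data (the names and sales figures are illustrative only):

```python
import pandas as pd

# Hypothetical data: the index labels are deliberately not 0, 1, 2 in order
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "sales": [10, 20, 30]},
    index=[2, 0, 1],
)

by_label = df.loc[0, "name"]   # row whose index label is 0
by_position = df.iloc[0, 0]    # first row, first column

print(by_label)     # Bob
print(by_position)  # Ann
```

With the default 0..n-1 index the two calls would return the same row, which is why the distinction is easy to miss.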

How to Fill Nulls in Pandas: bfill and ffill

In Pandas, ffill and bfill are two methods for filling missing values in a DataFrame or Series: ffill (forward fill) propagates the previous valid value, and bfill (backward fill) propagates the next valid value. They are particularly useful for time series or other ordered data, where a gap can reasonably be filled from an adjacent value.
ffill (forward fill): When you call ffill on a DataFrame or Series, it fills each missing value with the previous non-null value in the same column, carrying the last known value forward. This makes it a good choice for time series data when you can assume the value doesn't change abruptly.
Example:
    import pandas as pd
    data = {'A': [1, 2, None, 4, None, 6],
            'B': [None, 'X', 'Y', None, 'Z', 'W']}
    df = pd.DataFrame(data)
    print(df)
    # Output:
    #      A     B...
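As a compact illustration of the two fill directions, here is a small sketch on a Series with made-up values. Note that a gap with no neighbor on the fill side stays missing:

```python
import pandas as pd

# Illustrative Series with gaps in the middle and at the end
s = pd.Series([1.0, None, None, 4.0, None])

forward = s.ffill()   # each gap takes the previous valid value
backward = s.bfill()  # each gap takes the next valid value

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]
```

The trailing NaN in the backward fill remains because there is no later value to pull from.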

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can run SQL queries against CSV data by loading the file into a DataFrame. However, spaces in the DataFrame's column names can cause problems. Fortunately, there is a way to resolve this.
Reading CSV file to Dataframe
Here is the PySpark code for reading a CSV file into a DataFrame and querying it.
    from pyspark.sql import SparkSession
    # Initiate session
    spark = SparkSession.builder \
        .appName("PySpark Tutorial") \
        .getOrCreate()
    # Read CSV file into the df DataFrame
    data_path = '/content/Test1.csv'
    df = spark.read.csv(data_path, header=True, inferSchema=True)
    # Create a temporary view for the DataFrame
    df.createOrReplaceTempView("temp_table")
    # Read data from the temporary view
    spark.sql("select * from temp_table").show()
Output
    +----------+-----+---------------+---------------+
    |Student ID| Year|Semester1 Marks|Semester2 Marks|
    +----------+-----+---------------+---------------+
    |       si1|year1|          62.08|           62.4|
    |       si1|year2|          75.94|          76.75|
    | si...

How to Convert Dictionary to Dataframe: Pandas from_dict

Pandas is a Python library for data analysis. This example shows you how to convert a dictionary to a DataFrame. The point to note here is that a DataFrame holds only 2D data, so you need to supply 2D data.
Pandas Dictionary to Dataframe
    import pandas as pd
    import numpy as np
    data_dict = {'item1': np.random.randn(4), 'item2': np.random.randn(4)}
    df3 = pd.DataFrame.from_dict(data_dict, orient='index')
    print(df3)
Output
                  0         1         2         3
    item1 -0.109300 -0.483624  0.375838  1.248651
    item2 -0.274944 -0.857318 -1.203718 -0.061941
Explanation
The dictionary is created with random values from the NumPy package. It has two items, item1 and item2. data_dict is the input to the DataFrame. The from_dict call here takes two arguments: data_dict and orient='index', which turns each dictionary key into a row label. For quick reference, the call has the form pd.DataFrame.from_dict(data, orient='index').
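To show what the orient argument changes, the sketch below builds the same dictionary both ways. Fixed values replace the random ones so the result is reproducible; the variable names as_rows and as_columns are illustrative.

```python
import pandas as pd

# Fixed values instead of random ones, so the shapes are easy to check
data_dict = {"item1": [1, 2, 3, 4], "item2": [5, 6, 7, 8]}

# orient="index": each key becomes a row label
as_rows = pd.DataFrame.from_dict(data_dict, orient="index")

# orient="columns" is the default: each key becomes a column name
as_columns = pd.DataFrame.from_dict(data_dict)

print(as_rows.shape)     # (2, 4)
print(as_columns.shape)  # (4, 2)
```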

The Easy Way to Split String Python Partition Method

Here's a way to split a string (or extract a substring) without the split function: Python's partition method. You'll find here how to use this method with examples.
How to Split the string using Partition method
Example-1
Returns the part to the left of the first separator.
    my_string = 'ABCDEFGH||10||123456.25|'
    my_partition = my_string.partition('|')[0]
    print(my_partition)
Output
    ABCDEFGH
Example-2
Returns everything after the first separator, through the end of the string.
    my_string = 'ABCDEFGH||10||123456.25|'
    my_partition = my_string.partition('|')[-1]
    print(my_partition)
Output
    |10||123456.25|
The use of Rpartition to split a string in Python
Example-1
Returns everything before the last separator.
    my_string = 'ABCDEFGH||10||123456.25|'
    my_partition = my_string.rpartition('|')[0]
    print(my_partition)
Output
    ABCDEFGH||10||123456.25
...
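Both partition and rpartition return a 3-tuple of (before, separator, after), so the indexing in the examples above simply picks one element of that tuple. A short sketch that unpacks all three parts, using the same sample string:

```python
my_string = 'ABCDEFGH||10||123456.25|'

# partition splits at the FIRST occurrence of the separator
head, sep, tail = my_string.partition('|')
print(head)  # ABCDEFGH
print(tail)  # |10||123456.25|

# rpartition splits at the LAST occurrence of the separator
rhead, rsep, rtail = my_string.rpartition('|')
print(rhead)        # ABCDEFGH||10||123456.25
print(repr(rtail))  # ''

# If the separator is absent, partition returns the whole string first
missing = my_string.partition('#')
print(missing)  # ('ABCDEFGH||10||123456.25|', '', '')
```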

5 Python Pandas Tricky Examples for Data Analysis

Here are five tricky Python Pandas examples. They show in detail how to work with Pandas in Python.
#1 Dealing with datetime data (parse_dates pandas example)
    import pandas as pd
    # Convert a column to datetime format
    data['date_column'] = pd.to_datetime(data['date_column'])
    # Extract components from datetime (e.g., year, month, day)
    data['year'] = data['date_column'].dt.year
    data['month'] = data['date_column'].dt.month
    # Calculate the time difference between two datetime columns
    data['time_diff'] = data['end_time'] - data['start_time']
#2 Working with text data
    # Convert text to lowercase
    data['text_column'] = data['text_column'].str.lower()
    # Count the occurrences of specific words in a text column
    data['word_count'] = data['text_column'].str.count('word')
    # Extract information using regular expressions
    data['extracted_info'] = data['text_column']....
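The snippets above assume an existing DataFrame named data. The following self-contained sketch runs the same datetime and text operations on a small made-up frame; the rows are illustrative, and the column names follow the excerpt.

```python
import pandas as pd

# Hypothetical sample rows; column names mirror the snippets above
data = pd.DataFrame({
    "start_time": ["2024-01-01 08:00", "2024-02-15 09:30"],
    "end_time": ["2024-01-01 10:30", "2024-02-16 09:30"],
    "text_column": ["A word and another WORD", "no match here"],
})

# Parse the string columns into datetimes
data["start_time"] = pd.to_datetime(data["start_time"])
data["end_time"] = pd.to_datetime(data["end_time"])

# Extract a component and compute a duration
data["month"] = data["start_time"].dt.month
data["time_diff"] = data["end_time"] - data["start_time"]

# Lowercase first so the count is case-insensitive
data["word_count"] = data["text_column"].str.lower().str.count("word")

print(data[["month", "time_diff", "word_count"]])
```

Subtracting two datetime columns yields a timedelta column, so time_diff can be compared against pd.Timedelta values directly.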

2 User Input Python Sample Programs

Here are Python programs that take user input and respond to the user. These are also called interactive programs.
Python enables you to read user input from the command line via the input() function (raw_input() in Python 2). Typically, you assign the input to a variable, which then holds all the characters the user entered from the keyboard. Input ends when the user presses the <return> key; the trailing newline is not included in the returned string.
#1 User input sample program
The following program takes input and reports whether the given value is a number or a string.
    my_input = input("Enter something: ")
    try:
        # eval() executes whatever is typed, so use it only with trusted input
        x = 0 + eval(my_input)
        print('You entered the number:', my_input)
    except Exception:
        print(my_input, 'is a string')
Output
    Enter something:  100
    You entered the number: 100
#2 User input sample program
The fo...
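The number-or-string check can also be done without eval(). The sketch below swaps in float() for the conversion, which is a different technique from the one in the post; the helper name classify is hypothetical, and the samples are hard-coded instead of read with input() so the snippet runs unattended.

```python
def classify(text):
    """Return 'number' if text parses as a float, otherwise 'string'."""
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"


# Hard-coded samples stand in for values that input() would return
for sample in ["100", "3.5", "hello"]:
    print(sample, "->", classify(sample))
```

Unlike eval(), float() does not evaluate expressions, so an entry such as 1+2 is classified as a string here. It does accept special values such as nan and inf.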

The Quick and Easy Way to Analyze Numpy Arrays

The quickest and easiest way to analyze NumPy arrays is to create them with numpy.array() and then apply NumPy's built-in aggregate functions. These functions find the sum, mean, standard deviation, max, min, and other useful statistics of the values in an array.
Sum
You can find the sum of NumPy arrays using the np.sum() function. For example:
    import numpy as np
    a = np.array([1,2,3,4,5])
    b = np.array([6,7,8,9,10])
    result = np.sum([a,b])
    print(result)
    # Output will be 55
Mean
You can find the mean of a NumPy array using the np.mean() function. This function takes an array as an argument and returns the mean of all the values in the array. For example, the mean of [1,2,3,4,5] is:
    result = np.mean([1,2,3,4,5])
    print(result)
    # Output: 3.0
Standard Deviation
To find the standard deviation of a Numpy array, you can use the NumPy st...
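As a consolidated sketch of the statistics named above, the following applies several aggregate functions to the same arrays. The axis and ddof arguments are not mentioned in the excerpt and are shown here as optional extras.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

total = np.sum(a)                   # 15
mean = np.mean(a)                   # 3.0
largest, smallest = np.max(a), np.min(a)  # 5 and 1

# np.std defaults to the population standard deviation (ddof=0)
pop_std = np.std(a)
# ddof=1 gives the sample standard deviation instead
sample_std = np.std(a, ddof=1)

# axis=0 sums the stacked arrays element by element
pairwise = np.sum([a, b], axis=0)   # [ 7  9 11 13 15]

print(total, mean, largest, smallest)
print(pop_std, sample_std)
print(pairwise)
```

Without the axis argument, np.sum([a, b]) collapses both arrays into the single total 55, as in the Sum example above.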