Posts

Featured Post

SQL Query: 3 Methods for Calculating Cumulative SUM

SQL provides several constructs for calculating cumulative sums. In this article, we explore three SQL queries that compute a cumulative sum, each using a different construct.

Using Window Functions (e.g., PostgreSQL, SQL Server, Oracle)

    SELECT id, value,
           SUM(value) OVER (ORDER BY id) AS cumulative_sum
    FROM your_table;

This query uses the SUM() window function with the OVER clause to calculate the cumulative sum of the value column ordered by the id column.

Using a Self-Join (e.g., MySQL, SQLite):

    SELECT t1.id, t1.value, SUM(t2.value) AS cumulative_sum
    FROM your_table t1
    JOIN your_table t2 ON t1.id >= t2.id
    GROUP BY t1.id, t1.value
    ORDER BY t1.id;

This query uses a self-join to calculate the cumulative sum. It joins the table with itself, matching rows where the id in the first table is greater than or
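To see the self-join query in action, here is a minimal sketch that runs it against an in-memory SQLite database from Python and compares the result with a running total computed by itertools.accumulate. The sample rows are hypothetical; only the table and column names come from the excerpt.

```python
import sqlite3
from itertools import accumulate

# Hypothetical sample data: (id, value) rows for an in-memory table.
rows = [(1, 10), (2, 20), (3, 5), (4, 15)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO your_table (id, value) VALUES (?, ?)", rows)

# The self-join query from the excerpt: each row is paired with every
# row whose id is less than or equal to its own, then summed per row.
query = """
    SELECT t1.id, t1.value, SUM(t2.value) AS cumulative_sum
    FROM your_table t1
    JOIN your_table t2 ON t1.id >= t2.id
    GROUP BY t1.id, t1.value
    ORDER BY t1.id
"""
sql_result = [row[2] for row in conn.execute(query)]
conn.close()

# Independent running total in plain Python, used as a cross-check.
expected = list(accumulate(value for _, value in rows))

print(sql_result)              # [10, 30, 35, 50]
print(sql_result == expected)  # True
```

The self-join compares every row with every earlier row, so it tends to get slow on large tables; where window functions are available they are usually the better choice.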

Python Regex: The 5 Exclusive Examples

Regular expressions (regex) are powerful tools for pattern matching and text manipulation in Python. Here are five Python regex examples with explanations:

01 Matching a Simple Pattern

    import re

    text = "Hello, World!"
    pattern = r"Hello"
    result = re.search(pattern, text)
    if result:
        print("Pattern found:", result.group())

Output:

    Pattern found: Hello

This example searches for the pattern "Hello" in the text and prints it when found.

02 Matching Multiple Patterns

    import re

    text = "The quick brown fox jumps over the lazy dog."
    patterns = [r"fox", r"dog"]
    for pattern in patterns:
        if re.search(pattern, text):
            print(f"Pattern '{pattern}' found.")

Output:

    Pattern 'fox' found.
    Pattern 'dog' found.

This example searches for both the "fox" and "dog" patterns in the text and prints a message for each one found.

03 Matching Any Digit

    import re

    text = "The price of the
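As an illustration of digit matching, the sketch below uses re.findall with \d and \d+. The sample sentence is hypothetical and is not the text used in the original post.

```python
import re

# Hypothetical sample sentence, not taken from the original post.
text = "Order 66 shipped 3 items for 42 dollars."

# \d matches a single digit; \d+ matches a run of one or more digits.
single_digits = re.findall(r"\d", text)
numbers = re.findall(r"\d+", text)

print(single_digits)  # ['6', '6', '3', '4', '2']
print(numbers)        # ['66', '3', '42']
```

Using \d+ rather than \d is usually what you want when the goal is to pull whole numbers out of a sentence.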

Best Practices for Handling Duplicate Elements in Python Lists

Here are three ways to remove duplicates from a list. They come up often in data analytics work.

01. Using a Set

Convert the list into a set, which removes duplicates because a set holds only unique elements, and then convert the set back to a list. Note that a set does not preserve the original order of the elements.

Solution:

    original_list = [2, 4, 6, 2, 8, 6, 10]
    unique_list = list(set(original_list))

02. Using a Loop

Iterate through the original list and append elements to a new list only if they haven't been added before.

Solution:

    original_list = [2, 4, 6, 2, 8, 6, 10]
    unique_list = []
    for item in original_list:
        if item not in unique_list:
            unique_list.append(item)

03. Using List Comprehension

Create a new list using a list comprehension that includes only the elements not already present in the new list.

Solution:

    original_list = [2, 4, 6, 2, 8, 6, 10]
    unique_list = []
    [unique_list.append(item) for item in original_list if item not in unique_list]

All three methods will result in uni
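As a possible fourth option, not taken from the original post, the sketch below removes duplicates while keeping the first-seen order. It relies on dict keys being unique and insertion-ordered (Python 3.7+); the dedupe helper is a hypothetical name showing the same idea with an explicit set for fast membership checks.

```python
original_list = [2, 4, 6, 2, 8, 6, 10]

# dict keys are unique and keep insertion order (Python 3.7+).
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # [2, 4, 6, 8, 10]


def dedupe(items):
    """Hypothetical helper: order-preserving dedupe using a 'seen' set."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # set lookup is O(1) on average
            seen.add(item)
            result.append(item)
    return result


print(dedupe(original_list))  # [2, 4, 6, 8, 10]
```

Both variants need hashable elements. The loop and list comprehension methods work on unhashable items too, but they check membership in a list, which gets slower as the list grows.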

10 Exclusive Python Projects for Interviews

Here are ten Python projects along with code and possible solutions for your practice.

01. Palindrome Checker:

Description: Write a function that checks if a given string is a palindrome (reads the same backward as forward).

    def is_palindrome(s):
        s = s.lower().replace(" ", "")
        return s == s[::-1]

    # Test the function
    print(is_palindrome("radar"))  # Output: True
    print(is_palindrome("hello"))  # Output: False

02. Word Frequency Counter:

Description: Create a program that takes a text file as input and counts the frequency of each word in the file.

    def word_frequency(file_path):
        with open(file_path, 'r') as file:
            text = file.read().lower()
            words = text.split()
            word_count = {}
            for word in words:
                word_count[word] = word_count.get(word, 0) + 1
        return word_count

    # Test the function
    file_path = 'sample.txt'
    word_count = word_frequency(file_path)
    print(word_count)

03. Guess the Nu
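One possible variation on the word frequency counter, sketched here as an assumption rather than the post's own solution, uses collections.Counter and strips punctuation so that "dog." and "dog" count as the same word. The function name word_frequency_from_text and the sample sentence are hypothetical, and it takes a string instead of a file path to stay self-contained.

```python
import re
from collections import Counter


def word_frequency_from_text(text):
    """Count words in a string, ignoring case and punctuation."""
    # Keep runs of letters and apostrophes; everything else separates words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "The cat saw the dog. The dog ran!"
counts = word_frequency_from_text(sample)

print(counts["the"])          # 3
print(counts.most_common(2))  # [('the', 3), ('dog', 2)]
```

Counter also provides most_common, which saves sorting the dictionary by hand when an interviewer asks for the top N words.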

How to Fill Nulls in Pandas: bfill and ffill

In Pandas, ffill and bfill are two methods for filling missing values in a DataFrame or Series by propagating the previous (forward fill) or next (backward fill) valid value, respectively. They are particularly useful for time series or other ordered data, where missing values can reasonably be filled from adjacent values.

ffill (forward fill):

When you use the ffill method on a DataFrame or Series, it fills each missing value with the previous non-null value in the same column, propagating the last known value forward. It is a good choice for time series data when you can assume the value doesn't change abruptly.

Example:

    import pandas as pd

    data = {'A': [1, 2, None, 4, None, 6],
            'B': [None, 'X', 'Y', None, 'Z', 'W']}
    df = pd.DataFrame(data)
    print(df)
    # Output:
    #      A     B
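To make the fill direction concrete, here is a plain-Python sketch of what forward and backward fill do to a single column. It is an illustration of the semantics only, not how pandas implements them; forward_fill and backward_fill are hypothetical helper names, and None stands in for a missing value. The two lists mirror columns A and B from the example data.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it."""
    filled = []
    last_seen = None  # stays None until the first valid value appears
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def backward_fill(values):
    """Replace each None with the next non-None value after it."""
    # A backward fill is a forward fill over the reversed sequence.
    return forward_fill(values[::-1])[::-1]


column_a = [1, 2, None, 4, None, 6]
column_b = [None, 'X', 'Y', None, 'Z', 'W']

print(forward_fill(column_a))   # [1, 2, 2, 4, 4, 6]
print(forward_fill(column_b))   # [None, 'X', 'Y', 'Y', 'Z', 'W']
print(backward_fill(column_a))  # [1, 2, 4, 4, 6, 6]
print(backward_fill(column_b))  # ['X', 'X', 'Y', 'Z', 'Z', 'W']
```

Note that a leading missing value survives a forward fill, because there is nothing before it to carry forward. In pandas the corresponding calls are df.ffill() and df.bfill().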

How to Handle Spaces in PySpark Dataframe Column

In PySpark, you can run SQL queries on CSV data by loading the file into a DataFrame. However, you might run into problems when the DataFrame's column names contain spaces. Fortunately, there is a solution for this issue.

Reading CSV file to Dataframe

Here is the PySpark code for reading a CSV file into a DataFrame.

    from pyspark.sql import SparkSession

    # Initiate session
    spark = SparkSession.builder \
        .appName("PySpark Tutorial") \
        .getOrCreate()

    # Read CSV file to df dataframe
    data_path = '/content/Test1.csv'
    df = spark.read.csv(data_path, header=True, inferSchema=True)

    # Create a temporary view for the DataFrame
    df.createOrReplaceTempView("temp_table")

    # Read data from the temporary view
    spark.sql("select * from temp_table").show()

Output

    +----------+-----+---------------+---------------+
    |Student ID| Year|Semester1 Marks|Semester2 Marks|
    +----------+-----+---------------+---------------+
    |       si1|year1|          62.08|           62.4|
    |       si1|year2|          75.94|          76.75|
    | si
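The excerpt stops before the fix is shown, so the following is only one possible approach and not necessarily the author's: replace the spaces in the column names with underscores. sanitize_column_names is a hypothetical helper, and the column list is assumed from the output table above. The helper is plain Python; the comments show how it could be applied to a DataFrame.

```python
def sanitize_column_names(names):
    """Hypothetical helper: join the words of each name with underscores."""
    # str.split() with no argument also collapses repeated whitespace.
    return ["_".join(name.split()) for name in names]


# Column names assumed from the output shown in the excerpt.
columns = ["Student ID", "Year", "Semester1 Marks", "Semester2 Marks"]
clean = sanitize_column_names(columns)
print(clean)  # ['Student_ID', 'Year', 'Semester1_Marks', 'Semester2_Marks']

# With a PySpark DataFrame, the cleaned names could be applied like this:
#   df = df.toDF(*sanitize_column_names(df.columns))
# Alternatively, Spark SQL accepts a name with spaces quoted in backticks:
#   spark.sql("select `Student ID` from temp_table").show()
```

Renaming once up front keeps later queries free of backticks, while backtick quoting leaves the original headers untouched.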

How to Convert Dictionary to Dataframe: Pandas from_dict

Pandas is a Python library for data analysis. This example shows you how to convert a dictionary to a DataFrame. Note that a DataFrame holds 2D data, so you need to supply 2D data.

Pandas Dictionary to Dataframe

    import pandas as pd
    import numpy as np

    data_dict = {'item1': np.random.randn(4), 'item2': np.random.randn(4)}
    df3 = pd.DataFrame.from_dict(data_dict, orient='index')
    print(df3)

Output

                  0         1         2         3
    item1 -0.109300 -0.483624  0.375838  1.248651
    item2 -0.274944 -0.857318 -1.203718 -0.061941

Explanation

Using the NumPy package, we create a dictionary with random values. There are two items: item1 and item2. The dictionary data_dict is the input to the DataFrame. Here from_dict is called with two arguments: data_dict and orient='index', which turns each dictionary key into a row.

The Easy Way to Split String Python Partition Method

Here's a way to split a string (or extract a substring) without the split function. In Python, the method is partition. You'll find here how to use this method with examples.

How to Split the string using Partition method

Example-1

Returns the part to the left of the first separator.

    my_string='ABCDEFGH||10||123456.25|'
    my_partition=my_string.partition('|')[0]
    print(my_partition)

Output

    ABCDEFGH

Example-2

Returns everything after the first separator, up to the end of the string.

    my_string='ABCDEFGH||10||123456.25|'
    my_partition=my_string.partition('|')[-1]
    print(my_partition)

Output

    |10||123456.25|

The use of Rpartition to split a string in Python

Example-1

Returns everything before the last separator.

    my_string='ABCDEFGH||10||123456.25|'
    my_partition=my_string.rpartition('|')[0]
    print(my_partition)

Output

    ABCDEFGH||10||123456.25
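Both methods return a 3-tuple of (before, separator, after), which the indexed examples above pick apart one piece at a time. The sketch below unpacks all three parts at once using the same string; the 'no-separator' string is a hypothetical input added to show what happens when the separator is missing.

```python
my_string = 'ABCDEFGH||10||123456.25|'

# partition splits at the FIRST occurrence of the separator.
head, sep, tail = my_string.partition('|')
print(head)  # ABCDEFGH
print(sep)   # |
print(tail)  # |10||123456.25|

# rpartition splits at the LAST occurrence of the separator.
r_head, r_sep, r_tail = my_string.rpartition('|')
print(r_head)        # ABCDEFGH||10||123456.25
print(repr(r_tail))  # '' (nothing follows the final separator)

# When the separator is absent, the whole string lands on one side.
print('no-separator'.partition('|'))   # ('no-separator', '', '')
print('no-separator'.rpartition('|'))  # ('', '', 'no-separator')
```

Unlike split, partition always returns exactly three items, so tuple unpacking never fails even when the separator is missing.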