
A Beginner's Guide to a Pandas Project for Immediate Practice

Pandas is a powerful data manipulation and analysis library in Python that provides a wide range of functions and tools to work with structured data. Whether you are a data scientist, analyst, or just a curious learner, Pandas can help you efficiently handle and analyze data. 


Simple project for practice


In this blog post, we will walk through a step-by-step guide on how to start a Pandas project from scratch. By following these steps, you will be able to import data, explore and manipulate it, perform calculations and transformations, and save the results for further analysis. So let's dive into the world of Pandas and get started with your own project!


Simple Pandas project

Import the necessary libraries:


import pandas as pd

import numpy as np


Read data from a file into a Pandas DataFrame:


df = pd.read_csv('/path/to/file.csv')

Explore and manipulate the data:


View the first few rows of the DataFrame:


print(df.head())


Access specific columns or rows in the DataFrame:


print(df['column_name'])

print(df.iloc[row_index])  # row_index is an integer position, e.g. 0 for the first row


Iterate through the DataFrame rows:


for index, row in df.iterrows():

    print(index, row)  # index is the row label; row is a Series of that row's values


Sort the DataFrame by one or more columns:


df_sorted = df.sort_values(['column1', 'column2'], ascending=[True, False])  # column1 ascending, column2 descending


Perform calculations and transformations on the data:


df['new_column'] = df['column1'] + df['column2']


Save the manipulated data to a new file:

df.to_csv('/path/to/new_file.csv', index=False)  # index=False omits the row index column

Remember to adjust the file paths and column names based on your project requirements. These steps provide a basic starting point for a Pandas project and can be expanded upon depending on the specific task or analysis you're working on.
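
To see how these steps fit together, here is a minimal end-to-end sketch. The file names (employees.csv, employees_processed.csv) and column names (department, salary, bonus) are hypothetical placeholders, so swap in your own before running it.


import pandas as pd

# Hypothetical input file and columns -- replace with your own.
df = pd.read_csv('employees.csv')

# Quick look at the first few rows.
print(df.head())

# Sort by department (A-Z), then by salary (highest first).
df_sorted = df.sort_values(['department', 'salary'], ascending=[True, False])

# Derive a new column from existing ones.
df_sorted['total_pay'] = df_sorted['salary'] + df_sorted['bonus']

# Save the result without the row index.
df_sorted.to_csv('employees_processed.csv', index=False)


Run the script from the folder that contains employees.csv, and the processed file will be written alongside it.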


