
How to Read a CSV File from Amazon S3 Using Python (With Headers and Rows Displayed)

 

Introduction

If you’re working with cloud data, especially on AWS, chances are you’ll encounter data stored in CSV files inside an Amazon S3 bucket. Whether you're building a data pipeline or a quick analysis tool, reading data directly from S3 in Python is a fast, reliable, and scalable way to get started.

In this blog post, we’ll walk through:

  • Setting up access to S3

  • Reading a CSV file using Python and Boto3

  • Displaying headers and rows

  • Tips to handle larger datasets

Let’s jump in!



What You’ll Need

  • An AWS account

  • An S3 bucket with a CSV file uploaded

  • AWS credentials (access key and secret key)

  • Python 3.x installed

  • boto3 and pandas libraries installed (you can install them via pip)

pip install boto3 pandas

Step-by-Step: Read CSV from S3

Let’s say your S3 bucket is named my-data-bucket, and the CSV file’s object key is sample-data/employees.csv.

✅ Step 1: Import Required Libraries


import boto3
import pandas as pd
from io import StringIO

  • boto3 is the AWS SDK for Python.
  • pandas helps load and process the CSV.
  • StringIO wraps the CSV text in a file-like object that pandas can read.

✅ Step 2: Connect to S3

We’ll use your AWS credentials. You can configure them as environment variables, or pass them directly in the code for quick testing (not recommended in production).


s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)

You can also omit the keys above if your environment is already configured using:

aws configure
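
In that case, creating the client needs no explicit keys; boto3 resolves credentials automatically from the shared config file, environment variables, or an attached IAM role:

s3 = boto3.client('s3')  # credentials picked up from the environment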

✅ Step 3: Read the CSV File from S3

bucket_name = 'my-data-bucket'
file_key = 'sample-data/employees.csv'

response = s3.get_object(Bucket=bucket_name, Key=file_key)
csv_data = response['Body'].read().decode('utf-8')

  • get_object() fetches the file from the bucket.
  • We decode the binary response body into a UTF-8 string.

✅ Step 4: Load CSV into Pandas


df = pd.read_csv(StringIO(csv_data))

At this point, your CSV is loaded into a pandas DataFrame.
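
As a quick sanity check, you can confirm the load worked before moving on:

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by pandas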


✅ Step 5: Display Headers and Rows


# Print column headers
print("Column Headers:")
print(df.columns.tolist())

# Print first 5 rows
print("\nFirst 5 Rows:")
print(df.head())

For our sample employees.csv, this will output:


Column Headers:
['employee_id', 'name', 'department', 'salary']

First 5 Rows:
   employee_id     name department  salary
0            1    Alice         HR   60000
1            2      Bob      Sales   72000
2            3  Charlie    Finance   85000
3            4    Diana      Sales   69000
4            5   Edward         HR   62000

✅ Complete Code in One Block


import boto3
import pandas as pd
from io import StringIO

# S3 config
bucket_name = 'my-data-bucket'
file_key = 'sample-data/employees.csv'

# Connect to S3
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)

# Read file from S3
response = s3.get_object(Bucket=bucket_name, Key=file_key)
csv_data = response['Body'].read().decode('utf-8')

# Convert to DataFrame
df = pd.read_csv(StringIO(csv_data))

# Display headers and rows
print("Column Headers:")
print(df.columns.tolist())
print("\nFirst 5 Rows:")
print(df.head())

📌 Common Errors & Fixes

  • botocore.exceptions.NoCredentialsError: make sure your credentials are set using aws configure or passed into boto3.client().

  • UnicodeDecodeError: try changing .decode('utf-8') to .decode('ISO-8859-1') or another encoding that matches your file.

  • File not found (NoSuchKey): double-check the file_key path in your S3 bucket.
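
If you want the script to fail gracefully instead of crashing, you can catch these cases explicitly. Here is a minimal sketch using botocore's standard exceptions:

import boto3
from botocore.exceptions import ClientError, NoCredentialsError

s3 = boto3.client('s3')

try:
    response = s3.get_object(Bucket='my-data-bucket', Key='sample-data/employees.csv')
    csv_data = response['Body'].read().decode('utf-8')
except NoCredentialsError:
    print("No AWS credentials found; run 'aws configure' or set environment variables.")
except ClientError as e:
    # 'NoSuchKey' means the object path is wrong; 'NoSuchBucket' means the bucket name is wrong
    if e.response['Error']['Code'] in ('NoSuchKey', 'NoSuchBucket'):
        print(f"Check the bucket and key: {e}")
    else:
        raise
except UnicodeDecodeError:
    print("The file is not UTF-8; try another encoding such as 'ISO-8859-1'.")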


🔄 Reading Large CSV Files Efficiently

If your CSV file is too large to parse all at once, you can use pandas.read_csv() with the chunksize parameter. Note that the snippet below still downloads the entire file into csv_data first; chunking only limits how much pandas parses at a time (see the streaming variation after it).



chunksize = 1000  # rows per chunk
for chunk in pd.read_csv(StringIO(csv_data), chunksize=chunksize):
    print(chunk.head())  # process or print each chunk

This is useful for keeping memory usage bounded and for processing the data incrementally.
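
Because the example above still calls .read() on the response body, the whole file is downloaded before chunking begins. boto3's StreamingBody is itself file-like, so you can usually pass it straight to pandas and avoid materializing the full string. A sketch, assuming the same bucket_name and file_key as before:

# Stream the S3 object body directly into pandas, chunk by chunk
response = s3.get_object(Bucket=bucket_name, Key=file_key)

for chunk in pd.read_csv(response['Body'], chunksize=1000):
    print(chunk.head())  # each chunk is a regular DataFrame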


🔐 Bonus: Secure Your Access with IAM Roles

If you're running this code inside an EC2, Lambda, or Glue environment, it’s best to avoid hardcoding credentials. Use IAM roles with permissions to access S3.

Example policy:


{ "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::my-data-bucket/sample-data/*" }

✅ Summary

Here’s what we’ve done in this guide:

  • Connected Python to AWS S3 using boto3

  • Retrieved and read a CSV file

  • Displayed headers and data using pandas

  • Covered tips for large files and secure access

Working with CSV files from S3 is a great way to build flexible, cloud-powered data pipelines. Whether you're a data engineer, analyst, or Python enthusiast, this pattern is essential in cloud-native projects.
