Step-by-Step Guide to Reading a CSV File from Amazon S3 in Python

If you’re working with cloud data, especially on AWS, chances are you’ll encounter data stored in CSV files inside an Amazon S3 bucket. Whether you're building a data pipeline or a quick analysis tool, reading data directly from S3 in Python is a fast, reliable, and scalable way to get started.
In this blog post, we’ll walk through:

- Setting up access to S3
- Reading a CSV file using Python and Boto3
- Displaying headers and rows
- Tips to handle larger datasets

Let’s jump in!
Before you begin, make sure you have:

- An AWS account
- An S3 bucket with a CSV file uploaded
- AWS credentials (access key and secret key)
- Python 3.x installed
- The `boto3` and `pandas` libraries installed (you can install them via pip):

```bash
pip install boto3 pandas
```
Let’s say your S3 bucket is named `my-data-bucket`, and your CSV file is `sample-data/employees.csv`.
```python
import boto3
import pandas as pd
from io import StringIO
```

- `boto3` is the AWS SDK for Python.
- `pandas` helps load and process the CSV.
- `StringIO` is used to handle the in-memory string as a file-like object.
Next, authenticate with your AWS credentials. You can configure them as environment variables, or pass them directly in the code for testing (not recommended in production).
```python
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)
```
You can also omit the keys above if your environment is already configured using:

```bash
aws configure
```
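As a minimal sketch (not part of the original walkthrough), if your credentials are already set up via `aws configure` or environment variables, you can let boto3 resolve them from its default credential chain instead of hardcoding keys; the profile name below is just an illustration:

```python
import boto3

# Uses the default credential chain: env vars, ~/.aws/credentials, IAM role, etc.
s3 = boto3.client('s3')

# Or pick a specific named profile from ~/.aws/credentials (profile name is an example)
session = boto3.Session(profile_name='my-dev-profile')
s3 = session.client('s3')
```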
```python
bucket_name = 'my-data-bucket'
file_key = 'sample-data/employees.csv'

response = s3.get_object(Bucket=bucket_name, Key=file_key)
csv_data = response['Body'].read().decode('utf-8')
```

- `get_object()` fetches the file.
- We decode the binary response to a UTF-8 string.
```python
df = pd.read_csv(StringIO(csv_data))
```
At this point, your CSV is loaded into a pandas DataFrame.
```python
# Print column headers
print("Column Headers:")
print(df.columns.tolist())

# Print first 5 rows
print("\nFirst 5 Rows:")
print(df.head())
```
This will output:

```
Column Headers:
['employee_id', 'name', 'department', 'salary']

First 5 Rows:
   employee_id     name department  salary
0            1    Alice         HR   60000
1            2      Bob      Sales   72000
2            3  Charlie    Finance   85000
3            4    Diana      Sales   69000
4            5   Edward         HR   62000
```
Here’s the complete script:

```python
import boto3
import pandas as pd
from io import StringIO

# S3 config
bucket_name = 'my-data-bucket'
file_key = 'sample-data/employees.csv'

# Connect to S3
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY'
)

# Read file from S3
response = s3.get_object(Bucket=bucket_name, Key=file_key)
csv_data = response['Body'].read().decode('utf-8')

# Convert to DataFrame
df = pd.read_csv(StringIO(csv_data))

# Display headers and rows
print("Column Headers:")
print(df.columns.tolist())
print("\nFirst 5 Rows:")
print(df.head())
```
Common errors and how to fix them:

| Error | Fix |
|---|---|
| `botocore.exceptions.NoCredentialsError` | Make sure your credentials are set using `aws configure` or passed into `boto3.client()` |
| `UnicodeDecodeError` | Try changing `.decode('utf-8')` to `.decode('ISO-8859-1')` or another appropriate encoding |
| File not found | Double-check the `file_key` path in your S3 bucket |
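Here’s a minimal sketch of catching these errors in code, assuming the same bucket and key names as above:

```python
import boto3
from botocore.exceptions import ClientError, NoCredentialsError

s3 = boto3.client('s3')

try:
    response = s3.get_object(Bucket='my-data-bucket', Key='sample-data/employees.csv')
    raw_bytes = response['Body'].read()
    try:
        csv_data = raw_bytes.decode('utf-8')
    except UnicodeDecodeError:
        # Fall back to another encoding if the file isn't UTF-8
        csv_data = raw_bytes.decode('ISO-8859-1')
except NoCredentialsError:
    print("No AWS credentials found - run 'aws configure' or set environment variables.")
except ClientError as e:
    # 'NoSuchKey' usually means the file_key path is wrong; 'NoSuchBucket' means the bucket name is wrong
    print(f"S3 request failed: {e.response['Error']['Code']}")
```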
If your CSV file is too large to load all at once, you can use `pandas.read_csv()` with the `chunksize` parameter.
```python
chunksize = 1000  # rows per chunk

for chunk in pd.read_csv(StringIO(csv_data), chunksize=chunksize):
    print(chunk.head())  # process or print each chunk
```
This is useful for keeping memory usage low and processing the data incrementally.
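Note that in the snippet above the whole CSV string has already been downloaded into `csv_data`. As an alternative sketch (an assumption beyond the original post, but relying only on the fact that the response body exposes a `read()` method), you can pass the S3 response body to pandas directly and skip the intermediate string:

```python
import boto3
import pandas as pd

s3 = boto3.client('s3')
response = s3.get_object(Bucket='my-data-bucket', Key='sample-data/employees.csv')

# response['Body'] is a file-like StreamingBody, so pandas can read from it
# in chunks without building the full string in memory first.
for chunk in pd.read_csv(response['Body'], chunksize=1000):
    print(chunk.head())
```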
If you're running this code inside an EC2 instance, Lambda function, or Glue job, it’s best to avoid hardcoding credentials. Use an IAM role with permission to access S3 instead.
Example policy statement:

```json
{
  "Effect": "Allow",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::my-data-bucket/sample-data/*"
}
```
Here’s what we’ve done in this guide:

- Connected Python to AWS S3 using boto3
- Retrieved and read a CSV file
- Displayed headers and data using pandas
- Covered tips for large files and secure access
Working with CSV files from S3 is a great way to build flexible, cloud-powered data pipelines. Whether you're a data engineer, analyst, or Python enthusiast, this pattern is essential in cloud-native projects.