Step-by-Step Guide to Reading Different Files in Python

- August 10, 2025

In the world of data science, automation, and general programming, working with files is unavoidable. Whether you’re dealing with CSV reports, JSON APIs, Excel sheets, or text logs, Python provides rich and easy-to-use libraries for reading different file formats. In this guide, we’ll explore how to read different files in Python, with code examples and best practices.


1. Reading Text Files (`.txt`)

Text files are the simplest form of files. Python’s built-in open() function handles them effortlessly.

Example:


# Open and read a text file
with open("sample.txt", "r") as file:
    content = file.read()

print(content)

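Explanation:

"r" opens the file for reading in text mode; it is also the default mode.
The with statement closes the file automatically when the block ends, even if an exception is raised.

Best Practice: Always use with to handle files so that file handles are released promptly and not leaked.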
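
For larger text files, iterating over the file object reads one line at a time instead of loading everything with read(). The sketch below is only an illustration: the function name count_nonblank_lines and the throwaway sample file are hypothetical, and encoding="utf-8" is an assumption about how the file was written.

```python
import tempfile
from pathlib import Path


def count_nonblank_lines(path, encoding="utf-8"):
    """Count non-empty lines by iterating over the file one line at a time."""
    count = 0
    with open(path, "r", encoding=encoding) as file:
        for line in file:  # the file object yields one line per iteration
            if line.strip():
                count += 1
    return count


# Demo with a temporary file so the sketch runs without sample.txt on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text("first line\n\nsecond line\n", encoding="utf-8")
    nonblank = count_nonblank_lines(sample_path)

print(nonblank)
```

Only one line is held in memory at a time, so the same loop should work for files much larger than available RAM.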

2. Reading CSV Files (`.csv`)

CSV files are widely used for storing tabular data. Python ships with a built-in csv module, and the third-party pandas library adds more powerful tools for analysis.

Using csv module:


import csv

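# newline="" lets the csv module handle line endings itself, as its documentation recommends
with open("data.csv", "r", newline="") as file: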
    reader = csv.reader(file)
    for row in reader:
        print(row)

Using Pandas (recommended for data analysis):


import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())

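Best Practice: Use pandas for large datasets and data analysis tasks. For files too large to fit in memory, read_csv accepts a chunksize argument that returns the data in pieces.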
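
When a CSV file has a header row, the csv module's DictReader can return each row as a dictionary keyed by column name, which is often easier to read than positional indexes. The sketch below uses an in-memory string in place of a real file; the column names name and score are hypothetical.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be open("data.csv", newline="").
sample_csv = io.StringIO("name,score\nAda,91\nLinus,78\n")

reader = csv.DictReader(sample_csv)  # uses the first row as field names
rows = list(reader)

for row in rows:
    # Every value is read as a string, so convert types explicitly.
    print(row["name"], int(row["score"]))
```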

3. Reading JSON Files (`.json`)

JSON (JavaScript Object Notation) is common for APIs and configuration files.

Example:


import json

with open("data.json", "r") as file:
    data = json.load(file)

print(data)

Best Practice: json.load reads the whole document into memory, so for large JSON files consider the third-party ijson library for streaming reads.
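
After loading, JSON objects become Python dictionaries and arrays become lists, and malformed input raises json.JSONDecodeError. A minimal sketch of both behaviours follows; the helper name parse_json_or_none and the sample documents are hypothetical, and json.loads is used on strings so the example runs without a file.

```python
import json


def parse_json_or_none(text):
    """Return the parsed value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as error:
        print(f"Invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}")
        return None


# Hypothetical configuration document used only for illustration.
config = parse_json_or_none('{"name": "demo", "retries": 3, "tags": ["a", "b"]}')
broken = parse_json_or_none('{"name": "demo",}')  # trailing comma is not valid JSON

print(config["retries"], config["tags"][0])
```

Note that this helper cannot tell invalid input apart from a valid JSON null, since both come back as None; re-raising the exception may be the better choice in real code.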

4. Reading Excel Files (`.xlsx` and `.xls`)

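Excel files are popular in business and reporting. pandas relies on a separate engine package to read them: openpyxl for .xlsx files and xlrd for legacy .xls files.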

Using Pandas:


import pandas as pd

df = pd.read_excel("data.xlsx")
print(df.head())

Using openpyxl (for .xlsx only):


from openpyxl import load_workbook

wb = load_workbook("data.xlsx")
sheet = wb.active

for row in sheet.iter_rows(values_only=True):
    print(row)

Best Practice: Use pandas for analysis, openpyxl for Excel-specific operations like formatting.

5. Reading XML Files (`.xml`)

XML is still used in configuration files, legacy systems, and web services.

Example with xml.etree.ElementTree:


import xml.etree.ElementTree as ET

tree = ET.parse("data.xml")
root = tree.getroot()

for child in root:
    print(child.tag, child.attrib)
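
Beyond looping over direct children, ElementTree can look up elements by tag name and read their text. The following sketch assumes a hypothetical library/book document and parses it from a string with ET.fromstring; ET.parse("data.xml").getroot() would yield the same kind of root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document used only for illustration.
xml_text = """<library>
    <book id="1"><title>Dune</title><year>1965</year></book>
    <book id="2"><title>Neuromancer</title><year>1984</year></book>
</library>"""

root = ET.fromstring(xml_text)

books = []
for book in root.findall("book"):  # direct children named <book>
    books.append({
        "id": book.get("id"),             # attribute value, returned as a string
        "title": book.findtext("title"),  # text of the <title> child
        "year": int(book.findtext("year")),
    })

print(books)
```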

6. Reading PDF Files (`.pdf`)

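PDFs are common for reports and documents. The third-party PyPDF2 and pdfplumber libraries can extract text from them. PyPDF2 is no longer actively developed; its maintainers continue the project under the name pypdf, which provides the same PdfReader interface. Text extraction only works for PDFs that contain a text layer, so scanned pages may come back empty.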

Example with PyPDF2:


import PyPDF2

with open("sample.pdf", "rb") as file:
    reader = PyPDF2.PdfReader(file)
    for page in reader.pages:
        print(page.extract_text())

7. Reading Images (`.jpg`, `.png`, etc.)

Images are not textual data, but reading them is a common first step in machine learning and computer vision work.

Example with Pillow (imported as PIL):


from PIL import Image

img = Image.open("image.jpg")
img.show()

8. Reading ZIP Files

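Files often arrive compressed in ZIP archives. The built-in zipfile module can extract them; inspect archives from untrusted sources before extracting.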


import zipfile

with zipfile.ZipFile("files.zip", "r") as zip_ref:
    zip_ref.extractall("extracted_files")
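
It is not always necessary to extract an archive to disk. As a sketch, the zipfile module can list an archive's members and read one of them directly; the archive below is built in memory with hypothetical member names so the example runs without files.zip.

```python
import io
import zipfile

# Build a small archive in memory so the sketch runs without files.zip on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes/readme.txt", "hello from the archive\n")
    archive.writestr("data.csv", "id,value\n1,10\n")

buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zip_ref:
    names = zip_ref.namelist()  # member paths, in archive order
    with zip_ref.open("notes/readme.txt") as member:  # binary file-like object
        text = member.read().decode("utf-8")

print(names)
print(text)
```

Listing members first is also a simple way to inspect an archive before deciding whether to call extractall.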

Tips for Efficient File Reading

- Use context managers (with open) to handle file closing automatically.
- Avoid reading large files all at once; read in chunks if memory is limited.
- Choose the right library: pandas for data analysis, built-in modules for small tasks.
- Check file encoding: for text files, specify encoding="utf-8" if needed.
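
Reading in chunks can be sketched with a small generator that yields fixed-size blocks until the file is exhausted. The function name read_in_chunks, the 64 KiB default, and the temporary sample file are all illustrative assumptions.

```python
import tempfile
from pathlib import Path


def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield successive blocks of at most chunk_size bytes from a file."""
    with open(path, "rb") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk


# Demo with a temporary 10,000-byte file read in 4,096-byte blocks.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "large.bin"
    sample_path.write_bytes(b"x" * 10_000)
    chunk_sizes = [len(chunk) for chunk in read_in_chunks(sample_path, chunk_size=4096)]

print(chunk_sizes)
```

Memory use stays bounded by chunk_size regardless of the file's total size, which is the point of the tip above.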

Conclusion

Python makes working with different file formats straightforward. With just a few lines of code, you can handle text, CSV, JSON, Excel, XML, PDF, images, and ZIP archives. The choice of library depends on the file type and your end goal. Solid file handling is the foundation for automation scripts, data pipelines, and machine learning workflows.
