Claude Code for Beginners: Step-by-Step AI Coding Tutorial
In the world of data science, automation, and general programming, working with files is unavoidable. Whether you’re dealing with CSV reports, JSON APIs, Excel sheets, or text logs, Python provides rich and easy-to-use libraries for reading different file formats. In this guide, we’ll explore how to read different files in Python, with code examples and best practices.
.txt)Text files are the simplest form of files. Python’s built-in open() function handles them effortlessly.
Example:
# Open and read a text file
with open("sample.txt", "r") as file:
content = file.read()
print(content)
Explanation:
"r" mode means read.
with open() automatically closes the file when done.
Best Practice: Always use with to handle files to avoid memory leaks.
.csv)CSV files are widely used for storing tabular data. Python has a built-in csv module and a powerful pandas library.
Using csv module:
import csv
with open("data.csv", "r") as file:
reader = csv.reader(file)
for row in reader:
print(row)
Using Pandas (recommended for data analysis):
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
Best Practice: Use Pandas for large datasets and data analysis tasks.
.json)JSON (JavaScript Object Notation) is common for APIs and configuration files.
Example:
import json
with open("data.json", "r") as file:
data = json.load(file)
print(data)
Best Practice: For large JSON files, consider ijson for streaming reads.
.xlsx and .xls)Excel files are popular in business and reporting.
Using Pandas:
import pandas as pd
df = pd.read_excel("data.xlsx")
print(df.head())
Using openpyxl (for .xlsx only):
from openpyxl import load_workbook
wb = load_workbook("data.xlsx")
sheet = wb.active
for row in sheet.iter_rows(values_only=True):
print(row)
Best Practice: Use pandas for analysis, openpyxl for Excel-specific operations like formatting.
.xml)XML is still used in configuration files, legacy systems, and web services.
Example with xml.etree.ElementTree:
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
for child in root:
print(child.tag, child.attrib)
.pdf)PDFs are common for reports and documents. Python’s PyPDF2 or pdfplumber can extract text.
Example with PyPDF2:
import PyPDF2
with open("sample.pdf", "rb") as file:
reader = PyPDF2.PdfReader(file)
for page in reader.pages:
print(page.extract_text())
.jpg, .png, etc.)While not textual data, reading image files is essential in machine learning.
Example with PIL:
from PIL import Image
img = Image.open("image.jpg")
img.show()
Sometimes files come compressed.
import zipfile
with zipfile.ZipFile("files.zip", "r") as zip_ref:
zip_ref.extractall("extracted_files")
Use context managers (with open) to handle file closing automatically.
Avoid reading large files all at once; read in chunks if memory is limited.
Choose the right library — Pandas for data analysis, built-in modules for small tasks.
Check file encoding — for text files, specify encoding="utf-8" if needed.
Python makes working with different file formats straightforward. With just a few lines of code, you can handle text, CSV, JSON, Excel, XML, PDF, images, and ZIP archives. The choice of library depends on the file type and your end goal. Once you master file handling, you’ll be ready to build powerful automation scripts, data pipelines, and AI models.
Comments
Post a Comment
Thanks for your message. We will get back you.