


Step-by-Step Guide to Creating an AWS RDS Database Instance

Amazon Relational Database Service (AWS RDS) makes it easy to set up, operate, and scale a relational database in the cloud. Instead of managing servers, patching the OS, and handling backups manually, you let AWS RDS take care of the heavy lifting so you can focus on building applications and data pipelines. In this blog post, we'll walk through how to create an AWS RDS instance, the key configuration choices, and the best practices you should follow in real-world projects.

What is AWS RDS?

AWS RDS is a managed database service that supports popular relational engines such as:

- Amazon Aurora (MySQL/PostgreSQL compatible)
- MySQL
- PostgreSQL
- MariaDB
- Oracle
- SQL Server

With RDS, AWS manages:

- Database provisioning
- Automated backups
- Software patching
- High availability (Multi-AZ)
- Monitoring and scaling

Prerequisites

Before creating an RDS instance, make sure you have:

- An active AWS account
- Proper IAM permissions (RDS, EC2, VPC)
- A basic understanding of: ...
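To make the managed features above concrete, here is a minimal Python sketch (not from the original post) that collects settings for the RDS CreateDBInstance call and submits them with boto3's `create_db_instance`. It assumes boto3 is installed (recent enough to accept `ManageMasterUserPassword`) and that AWS credentials and a default region are configured. The helper names `build_rds_params` and `create_instance`, and the default engine, instance class, storage size, and username, are hypothetical placeholders to adapt.

```python
from typing import Any, Dict


def build_rds_params(identifier: str, engine: str = "mysql",
                     instance_class: str = "db.t3.micro",
                     storage_gb: int = 20, username: str = "admin",
                     multi_az: bool = False,
                     backup_days: int = 7) -> Dict[str, Any]:
    """Collect keyword arguments for RDS CreateDBInstance (placeholder defaults)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,                    # e.g. "mysql", "postgres", "mariadb"
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,      # size in GiB
        "MasterUsername": username,
        # Let RDS generate the master password and store it in Secrets Manager,
        # so no password is hard-coded here.
        "ManageMasterUserPassword": True,
        # Automated backups: number of days to retain (0 disables them).
        "BackupRetentionPeriod": backup_days,
        # Multi-AZ keeps a standby copy in another Availability Zone.
        "MultiAZ": multi_az,
        # Do not expose the instance to the public internet.
        "PubliclyAccessible": False,
    }


def create_instance(params: Dict[str, Any]) -> Dict[str, Any]:
    """Submit CreateDBInstance and return the DBInstance description."""
    import boto3  # imported lazily so build_rds_params works without boto3

    client = boto3.client("rds")
    response = client.create_db_instance(**params)
    return response["DBInstance"]
```

For example, `create_instance(build_rds_params("demo-db"))` requests a single-AZ MySQL instance with seven days of automated backups. The call returns before the database is ready; boto3's `db_instance_available` waiter can be used to poll until it is.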

Hadoop: How to find which file is healthy

Hadoop provides a file system health check utility called "fsck". It checks the health of all the files under a given path; run against '/' (the root), it checks every file in the file system.

bin/hadoop fsck / - checks the health of all the files
bin/hadoop fsck /test/ - checks the health of the files under /test/

fsck only reports problems: it does not act on under-replicated or over-replicated blocks. Hadoop (the NameNode) heals those blocks itself.

How to find which file is healthy

fsck prints a dot for each healthy file. For each file that is not healthy it prints a message, and it also reports under-replicated, over-replicated, mis-replicated, and corrupted blocks.

How to delete corrupted blocks

bin/hadoop fsck <path> -delete - deletes the files under the path that have corrupted blocks (this removes data, so use it with care)

bin/hadoop fsck -m...
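The health check above can be scripted. The following Python sketch is illustrative and not from the original post: it assumes the `hadoop` launcher is on the PATH (pass `hadoop_bin="hdfs"` or a full `bin/hadoop` path otherwise), and it relies on fsck ending its report with a summary line that ends in "is HEALTHY" or "is CORRUPT", as stock Hadoop releases do. The helper names `build_fsck_command`, `classify_fsck_output`, and `check_path` are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_fsck_command(path: str = "/", delete: bool = False,
                       hadoop_bin: str = "hadoop") -> List[str]:
    """Return the argv for `hadoop fsck <path>`, optionally with -delete."""
    cmd = [hadoop_bin, "fsck", path]
    if delete:
        # -delete removes the corrupted files under `path`; it is destructive.
        cmd.append("-delete")
    return cmd


def classify_fsck_output(output: str) -> Optional[str]:
    """Return "HEALTHY" or "CORRUPT" from fsck's final summary line, else None."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return None
    last = lines[-1]
    if last.endswith("is HEALTHY"):
        return "HEALTHY"
    if last.endswith("is CORRUPT"):
        return "CORRUPT"
    return None


def check_path(path: str = "/", hadoop_bin: str = "hadoop") -> Optional[str]:
    """Run a read-only fsck against `path` and classify the result."""
    # fsck exits with a non-zero status when the path is corrupt,
    # so do not raise on the exit code; inspect the report instead.
    result = subprocess.run(build_fsck_command(path, hadoop_bin=hadoop_bin),
                            capture_output=True, text=True, check=False)
    return classify_fsck_output(result.stdout)
```

For example, `check_path("/test/")` returns "HEALTHY" or "CORRUPT" for that directory, or None if no summary line was found (for instance, because the command failed). The sketch never deletes anything by itself: `build_fsck_command(path, delete=True)` only builds the destructive command so it can be reviewed before it is run.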