Hadoop: How to find which file is healthy

July 22, 2015

Hadoop provides a file system health check utility called fsck. It checks the health of all the files under a given path; run it on '/' (the root) to check every file in HDFS.

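bin/hadoop fsck / - checks the health of all files in the file system
bin/hadoop fsck /test/ - checks the health of the files under /test/
On Hadoop 2 and later, hadoop fsck is deprecated in favor of hdfs fsck, which takes the same arguments.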
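To script the check above, here is a minimal Python sketch. It assumes the hdfs command is on the PATH and uses the hdfs fsck form of the command. It also assumes the report ends with a summary line such as "The filesystem under path '/test' is HEALTHY" (or "is CORRUPT"). That is the wording recent Hadoop releases print, but verify it on your version. The helper names run_fsck and overall_status are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Assumed summary wording (check your Hadoop version):
#   The filesystem under path '/test' is HEALTHY   (or: is CORRUPT)
SUMMARY_RE = re.compile(r"The filesystem under path '(.*)' is (HEALTHY|CORRUPT)")


def run_fsck(path: str) -> str:
    """Run 'hdfs fsck <path>' and return the report printed on standard output."""
    # No check=True: fsck may exit non-zero when it finds corruption,
    # and we still want to read the report in that case.
    result = subprocess.run(["hdfs", "fsck", path], capture_output=True, text=True)
    return result.stdout


def overall_status(report: str) -> Optional[str]:
    """Return 'HEALTHY' or 'CORRUPT' from an fsck report, or None if no summary line is found."""
    match = SUMMARY_RE.search(report)
    return match.group(2) if match else None
```

For example, overall_status(run_fsck("/test")) should return "HEALTHY" when nothing under /test is corrupt. It returns None if the summary line is worded differently on your cluster.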

fsck only reports problems. It does not repair under-replicated or over-replicated blocks. The NameNode handles those on its own by re-replicating under-replicated blocks and removing excess replicas.

How to find which file is healthy

fsck prints a dot (.) for each healthy file.
For each file that is not healthy, it prints a message instead. It also reports under-replicated, over-replicated, mis-replicated, and corrupt blocks.
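The dot-or-message layout described above can be split apart with a short Python sketch. It assumes two kinds of lines in the report: lines made only of dots (one per healthy file), and problem messages that start with the file path followed by ": ". The sample message in the comment is illustrative. The exact output format varies between Hadoop versions, so treat this as a rough heuristic. The name summarize_fsck is hypothetical.

```python
from typing import Set, Tuple


def summarize_fsck(report: str) -> Tuple[int, Set[str]]:
    """Count healthy-file dots and collect the paths named in problem messages.

    Assumed layout: lines made only of '.' (one dot per healthy file) and
    problem lines that start with the file path followed by ': ',
    e.g. '/test/b.txt: MISSING 1 blocks of total size 12 B.'
    """
    healthy = 0
    problem_paths: Set[str] = set()
    for line in report.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"."}:
            # Progress line: each dot stands for one healthy file.
            healthy += len(stripped)
        elif stripped.startswith("/") and ": " in stripped:
            # Problem line: keep the path before the first ': '.
            problem_paths.add(stripped.split(": ", 1)[0])
    return healthy, problem_paths
```

Header and summary lines (for example "Status: ...") do not start with "/", so this sketch skips them.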

How to delete or move corrupted files

bin/hadoop fsck <path> -delete
It deletes the corrupted files under <path>.
bin/hadoop fsck <path> -move
It moves the corrupted files under <path> to /lost+found.
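Because -delete cannot be undone, it can help to build the command first and run it only on purpose. The Python sketch below does this. It assumes the hdfs fsck form of the command. The helper names fsck_repair_cmd and repair_corrupt_files, and the dry_run default, are hypothetical.

```python
import subprocess
from typing import List


def fsck_repair_cmd(path: str, action: str) -> List[str]:
    """Build 'hdfs fsck <path> -move' or 'hdfs fsck <path> -delete'."""
    if action not in ("move", "delete"):
        raise ValueError("action must be 'move' or 'delete'")
    return ["hdfs", "fsck", path, "-" + action]


def repair_corrupt_files(path: str, action: str, dry_run: bool = True) -> None:
    """Print the command by default; only run it when dry_run is False."""
    cmd = fsck_repair_cmd(path, action)
    if dry_run:
        print("Would run:", " ".join(cmd))
    else:
        # -delete is irreversible; -move keeps the files under /lost+found instead.
        subprocess.run(cmd, check=False)
```

Calling repair_corrupt_files("/test", "move") only prints "Would run: hdfs fsck /test -move". Pass dry_run=False to execute it.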
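Other options you can use with fsck:

-files - print the files being checked
-blocks - print the block report (use together with -files)
-locations - print the locations of every block (use together with -blocks)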
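The fsck usage string nests these report options as -files [-blocks [-locations]]. The Python sketch below builds such a command in that order and rejects a flag whose parent is missing. That rejection is a choice made here; the real CLI may simply ignore such a flag rather than fail. It assumes the hdfs fsck form of the command. The name fsck_report_cmd is hypothetical.

```python
from typing import List


def fsck_report_cmd(path: str, files: bool = False, blocks: bool = False,
                    locations: bool = False) -> List[str]:
    """Build an 'hdfs fsck' command with the -files/-blocks/-locations report options.

    The fsck usage string nests these as -files [-blocks [-locations]],
    so this sketch rejects a flag whose parent flag is missing.
    """
    if blocks and not files:
        raise ValueError("-blocks is used together with -files")
    if locations and not blocks:
        raise ValueError("-locations is used together with -blocks")
    cmd = ["hdfs", "fsck", path]
    if files:
        cmd.append("-files")
    if blocks:
        cmd.append("-blocks")
    if locations:
        cmd.append("-locations")
    return cmd
```

For example, fsck_report_cmd("/test", files=True, blocks=True, locations=True) gives ["hdfs", "fsck", "/test", "-files", "-blocks", "-locations"]. That command lists every file under /test with its blocks and where each block is stored.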


ApplyBigAnalytics
