Hadoop: How to find which file is healthy
Hadoop provides a file system health check utility called "fsck". It checks the health of all files under a given path; running it against '/' (the root) checks every file in the file system.
- bin/hadoop fsck / - checks the health of all files in the file system
- bin/hadoop fsck /test/ - checks the health of the files under the given path
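On recent Hadoop releases the same utility is invoked through the hdfs command (bin/hadoop fsck still works, but prints a deprecation warning). The directory /user/hadoop/data below is only an illustration:

    bin/hdfs fsck /
    bin/hdfs fsck /user/hadoop/data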
How to find which file is healthy
- It prints a dot for each healthy file.
- For each file that is not healthy it prints a message, and it also reports under-replicated, over-replicated, mis-replicated, and corrupted blocks.
- By default the fsck utility only reports under-replicated and over-replicated blocks; it does not repair them. HDFS heals them itself: the NameNode schedules re-replication of under-replicated blocks and removes excess replicas. You can also adjust replication by hand, as shown below.
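Although fsck does not fix replication, a file's replication factor can be changed manually with the setrep command. A minimal sketch, assuming a file at /user/hadoop/data/file.txt (a hypothetical path) and a target factor of 3:

    # set the replication factor to 3 and wait (-w) until re-replication finishes
    bin/hadoop fs -setrep -w 3 /user/hadoop/data/file.txt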
How to delete corrupted blocks
- bin/hadoop fsck /path -delete
- It deletes the files with corrupted blocks under the given path.
- bin/hadoop fsck /path -move
- It moves the files with corrupted blocks to the /lost+found directory. To see which files are affected before acting, see the example below.
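Before deleting anything, it is safer to first list the corrupt blocks and the files they belong to; fsck has an option for that:

    bin/hadoop fsck / -list-corruptfileblocks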
- Other options we can use with fsck (a combined example follows this list):
- -files: prints each file being checked
- -blocks: prints the block report for each file
- -locations: prints the DataNode locations of every block
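These options are hierarchical: -blocks only takes effect together with -files, and -locations together with -blocks. A typical verbose check (the path is illustrative) looks like this:

    bin/hadoop fsck /test/ -files -blocks -locations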