Hadoop: How to find which file is healthy

Hadoop provides a file system health check utility called fsck. It checks the health of all the files under a given path, so running it against '/' (the root) checks every file in the file system. (On Hadoop 2 and later the preferred form is hdfs fsck; hadoop fsck still works but prints a deprecation warning.)
  • bin/hadoop fsck / - Checks the health of all the files in the file system
  • bin/hadoop fsck /test/ - Checks the health of the files under /test/
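fsck only reports problems. By default it does not fix under-replicated or over-replicated blocks; HDFS handles those itself, with the NameNode scheduling extra copies of under-replicated blocks and removing the excess replicas of over-replicated ones.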
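A small wrapper can turn the check into something a script can branch on. This is a sketch, assuming the report ends with a summary line such as "The filesystem under path '/test/' is HEALTHY". The function name fsck_is_healthy and the HADOOP_CMD variable (defaulting to bin/hadoop) are hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if fsck reports the path as healthy,
# non-zero otherwise. HADOOP_CMD is an assumed override (default: bin/hadoop).
fsck_is_healthy() {
  local path="${1:-/}"
  local report
  # Run the check and keep only the report text (stderr is discarded).
  report=$("${HADOOP_CMD:-bin/hadoop}" fsck "$path" 2>/dev/null)
  # Assumption: the last line of the report ends with "is HEALTHY" for a clean path.
  printf '%s\n' "$report" | tail -n 1 | grep -q 'is HEALTHY$'
}

# Usage:
# fsck_is_healthy / && echo "/ is healthy"
# fsck_is_healthy /test/ || echo "/test/ needs attention"
```

Reading the summary line rather than scanning the whole report keeps the check independent of how many per-file messages appear above it.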
 How to find which file is healthy

  • It prints a dot (.) for each healthy file.
  • It prints a message for each file that is not healthy, covering under-replicated, over-replicated, mis-replicated, and corrupted blocks.
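To collect only the files that need attention, the report can be filtered for those per-file messages. A minimal sketch, assuming each message line starts with the file path followed by a colon and contains one of the keywords CORRUPT, MISSING, or Under replicated. The exact wording varies between Hadoop versions, so treat the keyword list and the function name fsck_unhealthy_files as hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads an fsck report on stdin and prints each
# unhealthy file path once. The keywords are assumptions about the report
# format; adjust them for your Hadoop version.
fsck_unhealthy_files() {
  # 1. Keep "<path>: <problem>" lines that mention a known problem keyword.
  # 2. Cut the path (everything before the first colon); assumes paths contain no ':'.
  # 3. Sort and de-duplicate, since one file can have several problem lines.
  grep -E '^/[^:]*:.*(CORRUPT|MISSING|Under replicated)' | cut -d: -f1 | sort -u
}

# Usage:
# bin/hadoop fsck /test/ | fsck_unhealthy_files
```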

How to delete or move corrupted files

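  • bin/hadoop fsck <path> -delete - Deletes the corrupted files under the path
  • bin/hadoop fsck <path> -move - Moves the corrupted files under the path to /lost+found
  • Other options we can use with fsck:
    • -files - Prints each file as it is checked
    • -blocks - Prints the block report (used together with -files)
    • -locations - Prints the location of every block (used together with -files -blocks)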
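Because -move and -delete change the file system, it can help to review the detailed report before repairing. The sketch below combines the options listed above; the function name fsck_repair and the HADOOP_CMD override (defaulting to bin/hadoop) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the detailed report for a path, then either
# move its corrupted files to /lost+found or delete them.
# HADOOP_CMD is an assumed override (default: bin/hadoop).
fsck_repair() {
  local mode="$1" path="$2"
  case "$mode" in
    move|delete) ;;
    *) echo "usage: fsck_repair move|delete <path>" >&2; return 2 ;;
  esac
  if [ -z "$path" ]; then
    echo "usage: fsck_repair move|delete <path>" >&2
    return 2
  fi
  local hadoop="${HADOOP_CMD:-bin/hadoop}"
  # Review step: list checked files, their blocks, and each block's locations.
  "$hadoop" fsck "$path" -files -blocks -locations
  # Repair step: -move or -delete acts on the corrupted files under the path.
  "$hadoop" fsck "$path" "-$mode"
}

# Usage:
# fsck_repair move /test/
```

Of the two modes, move keeps what remains of the corrupted files under /lost+found, while delete removes them outright.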
