Hadoop fs (File System) Commands List

This post lists the Hadoop HDFS file system commands. They are useful for your projects and interviews.





HDFS File System Commands.


hadoop fs -cmd <args>

cmd is the specific file command and args is a variable number of arguments.
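
As a quick illustration of this pattern, here is a minimal sketch of a short session. The paths and the file notes.txt are hypothetical, and the run helper is an assumption of this example (it is not part of Hadoop): it prints each command and executes it only when a hadoop client is found on PATH.

```shell
# Helper for this sketch only (not part of Hadoop): print the command,
# and execute it only when a hadoop client is available on PATH.
run() { printf '+ %s\n' "$*"; if command -v hadoop >/dev/null 2>&1; then "$@"; fi; }

run hadoop fs -mkdir /user/alice/demo            # cmd = mkdir, one argument
run hadoop fs -put notes.txt /user/alice/demo    # cmd = put, two arguments
run hadoop fs -ls /user/alice/demo               # cmd = ls
run hadoop fs -cat /user/alice/demo/notes.txt    # cmd = cat
```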


The List of Commands


cat 

hadoop fs -cat FILE [FILE …]

Displays the content of files. To read compressed files, use the text command instead.


chgrp 

hadoop fs -chgrp [-R] GROUP PATH [PATH …]

Changes the group association for files and directories. The -R option applies the change recursively. The user must be the files' owner or a superuser.


chmod 

hadoop fs -chmod [-R] MODE[,MODE …] PATH [PATH …]

Changes the permissions of files and directories. Like its Unix equivalent, MODE can be a 3-digit octal mode or a symbolic mode of the form {augo}+/-{rwxX}. The -R option applies the change recursively. The user must be the files' owner or a superuser.
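
To make the two MODE forms concrete, here is a small sketch that applies one octal mode and one symbolic mode. The path /user/alice/reports is hypothetical, and the run helper is an assumption of this example rather than a Hadoop feature: it prints each command and runs it only if a hadoop client is installed.

```shell
# Helper for this sketch only: print the command, run it if hadoop is installed.
run() { printf '+ %s\n' "$*"; if command -v hadoop >/dev/null 2>&1; then "$@"; fi; }

# Octal form: rwx for the owner, r-x for the group, nothing for others.
run hadoop fs -chmod 750 /user/alice/reports

# Symbolic form, applied recursively: add read for the group, remove write for others.
run hadoop fs -chmod -R g+r,o-w /user/alice/reports
```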


chown 


hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH [PATH …]

Changes the ownership of files and directories. The -R option applies the change recursively. The user must be a superuser.


copyFromLocal 

hadoop fs -copyFromLocal LOCALSRC [LOCALSRC …] DST

Identical to put (copy files from the local file system). 


copyToLocal 

hadoop fs -copyToLocal [-ignorecrc] [-crc] SRC [SRC …] LOCALDST

Identical to get (copy files to the local file system). 


count 

hadoop fs -count [-q] PATH [PATH …]

Displays the number of subdirectories, number of files, number of bytes used, and name for all files/directories identified by PATH. The -q option displays quota information.
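
Because the output is column based, it is easy to pick out a single value in a script. The sketch below assumes the column order given in the description above (subdirectories, files, bytes, name); the function name hdfs_file_count and the path are hypothetical.

```shell
# Print how many files live under an HDFS path.
# Assumes -count prints: subdirectories, files, bytes, name (in that order).
hdfs_file_count() {
  hadoop fs -count "$1" | awk '{ print $2 }'
}

# Example call, skipped when no hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hdfs_file_count /user/alice
fi
```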


cp 

hadoop fs -cp SRC [SRC …] DST

Copies files from source to destination. If multiple source files are specified, the destination must be a directory.


du 

hadoop fs -du PATH [PATH …]

Displays file sizes. If PATH is a directory, the size of each file in the directory is reported. Filenames are stated with the full URI protocol prefix. Note that although du stands for disk usage, it should not be taken literally, as disk usage depends on block size and replication factor.
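
As a rough worked example of why the reported size is not the real disk usage, the sketch below multiplies a file's logical size by its replication factor. This is a simplification made for illustration (it ignores metadata and any other overhead), and raw_bytes is a hypothetical helper name.

```shell
# Rough raw footprint of a file: logical size (as reported by du)
# multiplied by the replication factor.
raw_bytes() { echo $(( $1 * $2 )); }

raw_bytes 1073741824 3   # a 1 GiB file stored with replication factor 3
```

Under that assumption the call prints 3221225472, so a file that du reports as 1 GiB takes roughly 3 GiB across the cluster.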

dus 

hadoop fs -dus PATH [PATH …]

Like du, but for a directory, dus reports the sum of file sizes in aggregate rather than individually. In Hadoop 2 and later, dus is deprecated in favor of du -s.


expunge 

hadoop fs -expunge

Empties the trash. If the trash feature is enabled, a deleted file is first moved into the temporary .Trash/ folder. The file is permanently deleted from the .Trash/ folder only after a user-configurable delay. The expunge command forcefully deletes all files from the .Trash/ folder. Note that as long as a file is in the .Trash/ folder, it can be restored by moving it back to its original location.


get 

hadoop fs -get [-ignorecrc] [-crc] SRC [SRC …] LOCALDST

Copies files to the local filesystem. If multiple source files are specified, the local destination must be a directory. If LOCALDST is -, the files are copied to stdout.


HDFS computes a checksum for each block of each file. The checksums for a file are stored separately in a hidden file. When a file is read from HDFS, the checksums in that hidden file are used to verify the file's integrity. For the get command, the -crc option will copy that hidden checksum file. The -ignorecrc option will skip the checksum checking when copying.


getmerge 

hadoop fs -getmerge SRC [SRC …] LOCALDST [addnl]

Retrieves all files identified by SRC, merges them, and writes the single merged file to LOCALDST in the local filesystem. The option addnl will add a newline character to the end of each file. 
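
A common use is collecting the many part files of a job's output directory into one local file. Here is a minimal sketch; the function name merge_output and both paths are hypothetical.

```shell
# Merge every file under an HDFS directory into a single local file.
merge_output() {
  hadoop fs -getmerge "$1" "$2"
}

# Example call, skipped when no hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  merge_output /user/alice/wordcount/output wordcount.txt
fi
```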


help 

hadoop fs -help [CMD]

Displays usage information for the command CMD. If CMD is omitted, it displays usage information for all commands.


ls 

hadoop fs -ls PATH [PATH …]

Lists files and directories. Each entry shows the name, permissions, owner, group, size, and modification date. File entries also show their replication factor. 


lsr 

hadoop fs -lsr PATH [PATH …]

The recursive version of ls. In Hadoop 2 and later, lsr is deprecated in favor of ls -R.


mkdir 

hadoop fs -mkdir PATH [PATH …]

Creates directories. Any missing parent directories are also created (like Unix mkdir -p). In Hadoop 2 and later, missing parents are created only when the -p option is given.


moveFromLocal 

hadoop fs -moveFromLocal LOCALSRC [LOCALSRC …] DST

Similar to put, except the local source is deleted after it's been successfully copied to HDFS. 



moveToLocal 

hadoop fs -moveToLocal [-crc] SRC [SRC …] LOCALDST

Displays a "not implemented yet" message. 


mv 

hadoop fs -mv SRC [SRC …] DST

Moves files from source(s) to destination. If multiple source files are specified, the destination has to be a directory. Moving across filesystems is not permitted.


put 


hadoop fs -put LOCALSRC [LOCALSRC …] DST

Copies files or directories from the local filesystem to the destination filesystem. If LOCALSRC is set to -, input is read from stdin and DST must be a file.
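
Reading from stdin is handy when the data comes from another command rather than a local file. A minimal sketch, with a hypothetical function name (hdfs_write_text) and a hypothetical destination path:

```shell
# Write whatever arrives on stdin to a new HDFS file.
# The "-" argument tells put to read from stdin instead of a local file.
hdfs_write_text() {
  hadoop fs -put - "$1"
}

# Example call, skipped when no hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  printf 'first line\nsecond line\n' | hdfs_write_text /user/alice/demo/lines.txt
fi
```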


rm 

hadoop fs -rm PATH [PATH …]

Deletes files and empty directories. 


rmr 

hadoop fs -rmr PATH [PATH …]

The recursive version of rm. In Hadoop 2 and later, rmr is deprecated in favor of rm -r.

setrep 

hadoop fs -setrep [-R] [-w] REP PATH [PATH …]

Sets the target replication factor to REP for given files. The -R option will recursively apply the target replication factor to files in directories identified by PATH. The replication factor will take some time to get to the target. The -w option will wait for the replication factor to match the target. 
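
The two options can be sketched as follows. The paths are hypothetical, and the run helper is an assumption of this example (not part of Hadoop): it prints each command and runs it only if a hadoop client is installed.

```shell
# Helper for this sketch only: print the command, run it if hadoop is installed.
run() { printf '+ %s\n' "$*"; if command -v hadoop >/dev/null 2>&1; then "$@"; fi; }

# Set one file to 2 replicas and wait until the target is reached.
run hadoop fs -setrep -w 2 /user/alice/data.csv

# Set every file under a directory to 3 replicas without waiting.
run hadoop fs -setrep -R 3 /user/alice/archive
```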


stat 

hadoop fs -stat [FORMAT] PATH [PATH …]

Displays "statistical" information on files. The FORMAT string is printed exactly, but with the following format specifiers replaced:

%b  Size of file in blocks (current Hadoop documentation describes this as the size in bytes)
%F  The string "directory" or "regular file", depending on file type
%n  Filename
%o  Block size
%r  Replication
%y  UTC date in yyyy-MM-dd HH:mm:ss format
%Y  Milliseconds since January 1, 1970 UTC
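
For example, several specifiers can be combined in one quoted FORMAT string. In this sketch the function name hdfs_describe and the path are hypothetical; the specifiers are the ones listed above.

```shell
# Print name, type, replication, and modification time for one path.
# The FORMAT string is quoted so the shell passes it as a single argument.
hdfs_describe() {
  hadoop fs -stat "%n %F %r %y" "$1"
}

# Example call, skipped when no hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hdfs_describe /user/alice/data.csv
fi
```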


tail 

hadoop fs -tail [-f] FILE

Displays the last one kilobyte of FILE. 


test 

hadoop fs -test -[ezd] PATH

Performs one of the following type checks on PATH:

-e  PATH existence. Returns 0 if PATH exists.
-z  Empty file. Returns 0 if file length is 0.
-d  Returns 0 if PATH is a directory.
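
Since the result comes back as an exit status, the command fits naturally into a shell if statement. A minimal sketch, with a hypothetical function name (hdfs_exists) and path:

```shell
# Report whether an HDFS path exists, based on the exit status of -test -e
# (0 means the path exists, anything else is treated as missing).
hdfs_exists() {
  if hadoop fs -test -e "$1"; then
    echo "exists: $1"
  else
    echo "missing: $1"
  fi
}

# Example call, skipped when no hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hdfs_exists /user/alice/demo
fi
```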


text 

hadoop fs -text FILE [FILE …]

Displays the textual content of files. Identical to cat if the files are text files. Files in known compressed formats (gzip and Hadoop's binary sequence file format) are uncompressed first.


touchz 

hadoop fs -touchz FILE [FILE …]

Creates files of length 0. Fails if files already exist and have nonzero length.

