
Top 20 ETL real time interview questions to prepare now

Top 20 ETL questions asked in many interviews on Unix
1). How to print/display the first line of a file?
There are many ways to do this. However, the easiest way to display the first line of a file is to use the [head] command.
$> head -1 file.txt
No prizes for guessing that if you specify [head -2], it prints the first 2 lines of the file.
Another way is to use the [sed] command. [sed] is a very powerful stream editor that can be used for many text manipulation tasks like this one.
$> sed '2,$ d' file.txt
2). How does the above command work?
The 'd' command tells [sed] to delete lines 2 through the last line of the file from its output (the last line is represented by the $ symbol). Of course, it does not actually delete those lines from the file; it just does not print them to standard output. So you only see the remaining line, which is the 1st line.
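To see that [sed '2,$ d'] only changes what is printed and leaves the file alone, here is a small self-contained sketch. It builds a throwaway three-line file in a temporary directory (the name sample.txt is just an assumption for the demo), compares the output of [head] and [sed], and then counts the lines still on disk.

```shell
#!/bin/sh
# Throwaway sample file for the demo (sample.txt is an arbitrary name)
tmpdir=$(mktemp -d)
printf 'line1\nline2\nline3\n' > "$tmpdir/sample.txt"

# Both commands write only the first line to standard output
first_head=$(head -n 1 "$tmpdir/sample.txt")
first_sed=$(sed '2,$ d' "$tmpdir/sample.txt")
echo "head: $first_head"   # head: line1
echo "sed:  $first_sed"    # sed:  line1

# The file on disk still has all three lines (tr strips the padding some wc versions add)
wc -l < "$tmpdir/sample.txt" | tr -d ' '   # 3
```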
3). How to print/display the last line of a file?
The easiest way is to use the [tail] command.
$> tail -1 file.txt
If you want to do it using the [sed] command, here is what you should write:
$> sed -n '$ p' file.txt
From our previous answer, we already know that '$' stands for the last line of the file. So '$ p' prints (p for print) the last line to standard output. The '-n' switch puts [sed] in silent mode, so that [sed] prints only the lines you explicitly ask for with 'p'.
4). How to display the n-th line of a file?
The easiest way is probably [sed]. Based on what we already know about [sed] from the previous examples, we can quickly deduce this command:
$> sed -n '<n> p' file.txt
You need to replace <n> with the actual line number. So if you want to print the 4th line, the command will be:
$> sed -n '4 p' file.txt
Of course, you can do it with the [head] and [tail] commands as well, like below:
$> head -<n> file.txt | tail -1
Again, replace <n> with the actual line number. So if you want to print the 4th line, the command will be:
$> head -4 file.txt | tail -1
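Both approaches can be wrapped in small shell functions so the line number becomes a parameter. This is a minimal sketch; the function names print_nth_line and print_nth_line_ht are hypothetical, not standard commands.

```shell
#!/bin/sh
# print_nth_line N FILE  (hypothetical helper name)
# -n turns off sed's automatic printing; "Np" prints only line N
print_nth_line() {
    sed -n "${1}p" "$2"
}

# print_nth_line_ht N FILE  (hypothetical helper name)
# head keeps the first N lines, tail keeps the last of those
print_nth_line_ht() {
    head -n "$1" "$2" | tail -n 1
}

tmpfile=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmpfile"
print_nth_line 4 "$tmpfile"      # four
print_nth_line_ht 4 "$tmpfile"   # four
```

The two are not identical when N is larger than the number of lines: the [sed] version prints nothing, while [head | tail] prints the last line of the file. If N may be out of range, the [sed] form is the safer choice.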
5). How to remove the first line / header from a file?
We already know how [sed] can be used to delete a certain line from the output: by using the 'd' command. So if we want to delete the first line, the command should be:
$> sed '1 d' file.txt
But the issue with the above command is that it just prints all the lines except the first line of the file to standard output. It does not change the file in place. So if you want to delete the first line from the file itself, you have two options.
Either you can redirect the output to another file and then rename it back to the original file, like below:
$> sed '1 d' file.txt > new_file.txt
$> mv new_file.txt file.txt
Or you can use the [sed] switch '-i', which changes the file in place (this is the GNU sed form; BSD/macOS sed needs an explicit empty suffix: sed -i '' '1 d' file.txt). See below:
$> sed -i '1 d' file.txt
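Because '-i' is spelled differently across [sed] implementations, a script that must run everywhere can use the redirect-and-rename option instead. Below is a minimal sketch of that option as a function; the name remove_header is hypothetical, and [mktemp] is used so the temporary file name does not clash with existing files.

```shell
#!/bin/sh
# remove_header FILE  (hypothetical helper name)
# Deletes line 1 of FILE without sed -i, whose syntax differs between
# GNU sed (-i) and BSD/macOS sed (-i '').
remove_header() {
    tmp=$(mktemp) || return 1
    # Write everything except line 1 to the temp file; replace FILE only if sed succeeded
    if sed '1 d' "$1" > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

datafile=$(mktemp)
printf 'id,name\n1,alice\n2,bob\n' > "$datafile"
remove_header "$datafile"
cat "$datafile"   # 1,alice then 2,bob
```

Note that [mv] replaces the original file, so the result takes the temporary file's permissions ([mktemp] creates files readable only by their owner). If the original permissions matter, copy the content back with cat "$tmp" > "$1" instead of [mv].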
6). How to remove the last line / trailer from a file in a Unix script?
Always remember that in [sed] the address '$' refers to the last line. Using this knowledge, we can deduce the command below:
$> sed -i '$ d' file.txt
7). How to remove certain lines from a file in Unix?
If you want to remove line <m> to line <n> from a given file, you can do it the same way as shown above. Here is an example:
$> sed -i '5,7 d' file.txt
The above command deletes lines 5 to 7 from the file file.txt.
8). How to remove the last n lines from a file?
This is a bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. If you know how many lines are in the file, you can simply use the method shown above and remove lines 96 to 100, like below:
$> sed -i '96,100 d' file.txt   # leaves the same lines that [head -95 file.txt] would print
But you will not always know the number of lines in the file (the file may be generated dynamically, etc.). In that case there are many different ways to solve the problem, some of them quite complex and fancy. But let's first do it in a way that is easy to understand and easy to remember. Here is how it goes:
$> tt=`wc -l < file.txt`; sed -i "`expr $tt - 4`,$tt d" file.txt
As you can see, there are two commands. The first one (before the semicolon) calculates the total number of lines in the file and stores it in a variable called "tt"; reading the file through '<' makes [wc] print only the number, without the file name. The second command (after the semicolon) uses the variable and works exactly as shown in the previous example.
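The one-liner above breaks when the file has fewer than 5 lines, because `expr $tt - 4` then yields 0 or a negative number, which is not a valid [sed] address. Here is a sketch of a more defensive version that uses only POSIX tools; the function name drop_last_lines is hypothetical. It prints the result to standard output, so combine it with the redirect-and-rename approach from question 5 to change the file itself.

```shell
#!/bin/sh
# drop_last_lines N FILE  (hypothetical helper name)
# Prints FILE without its last N lines, using only POSIX tools.
drop_last_lines() {
    total=$(wc -l < "$2" | tr -d ' ')   # line count only; tr removes any padding
    keep=$((total - $1))                # how many lines survive
    if [ "$keep" -gt 0 ]; then
        head -n "$keep" "$2"
    fi                                  # N >= total: print nothing
}

f=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$f"
drop_last_lines 5 "$f"    # prints 1 to 5, one per line
```

On GNU systems, head -n -5 file.txt does the same job in one command, but the negative count is a GNU coreutils extension and is not available in every [head].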
9). How to check the length of any line in a file?
We already know how to print one line from a file:
$> sed -n '<n> p' file.txt
where <n> is replaced by the actual line number that you want to print. Once you know that, it is easy to print the length of this line by piping it to the [wc] command with the '-c' switch.
$> sed -n '35 p' file.txt | wc -c
The above command prints the byte count of the 35th line of file.txt including its trailing newline, so for plain ASCII text the line itself is one character shorter than the number shown.
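The off-by-one from the newline is easy to see on a small file. As an alternative, [awk] can report the length directly: NR is the current line number and length($0) is the length of the current line without its newline. The sample file below is a throwaway assumption for the demo.

```shell
#!/bin/sh
f=$(mktemp)
printf 'abc\nhello world\n' > "$f"

# wc -c also counts the newline that sed prints after the line: 12 for "hello world"
sed -n '2 p' "$f" | wc -c

# awk's length() measures the record itself; the newline is not part of $0: 11
awk 'NR == 2 { print length($0) }' "$f"
```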
10). How to get the n-th word of a line in Unix?
Assuming the words in the line are separated by spaces, we can use the [cut] command. [cut] is a very powerful and useful command, and it's really easy. All you have to do to get the n-th word from the line is issue the following command:
cut -f<n> -d' '
The '-f' switch tells [cut] which field to print, and the '-d' switch tells [cut] what the delimiter (or separator) is, which is a space ' ' in this case. If the separator was a comma, we could have written -d',' instead. So, suppose I want to find the 4th word in the string "a quick brown fox jumped over the lazy cat". We will do something like this:
$> echo "a quick brown fox jumped over the lazy cat" | cut -f4 -d' '
And it will print "fox".
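One thing to watch with [cut]: it treats every single space as a separator, so two spaces in a row produce an empty field and shift the numbering. The sketch below shows the difference with [awk], which by default splits on runs of blanks; the sample strings are made up for the demo.

```shell
#!/bin/sh
line="a quick brown fox jumped over the lazy cat"

# One space between words: field 4 is "fox"
echo "$line" | cut -f4 -d' '

# Two spaces after "a": cut sees an empty field between them, so field 4 is "brown"
echo "a  quick brown fox" | cut -f4 -d' '

# awk splits on runs of blanks by default, so field 4 is still "fox"
echo "a  quick brown fox" | awk '{ print $4 }'
```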
11). How to reverse a string in Unix?
Pretty easy. Use the [rev] command.
$> echo "unix" | rev
xinu
12). How to get the last word from a line in a Unix file?
We will make use of two commands that we learnt above to solve this: [rev] and [cut]. Here we go.
Let's imagine the line is "c for cat". We need "cat". First we reverse the line and get "tac rof c". Then we cut the first word and get "tac". And then we reverse it again.
$> echo "c for cat" | rev | cut -f1 -d' ' | rev
cat
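As an alternative sketch, [awk] can pick the last word directly: NF holds the number of fields on the line, so $NF is the last field. It also copes with trailing blanks, where the [rev | cut | rev] pipeline would return an empty first field.

```shell
#!/bin/sh
# NF holds the number of fields on the line, so $NF is the last field
echo "c for cat" | awk '{ print $NF }'     # cat

# Trailing blanks are ignored by awk's default field splitting
echo "c for cat  " | awk '{ print $NF }'   # cat
```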
13). How to get the n-th field from a Unix command output?
We know we can do it with [cut]. For example, the command below extracts the first field from the output of the [wc -c] command:
$> wc -c file.txt | cut -d' ' -f1
109
But I want to introduce one more way to do this here: the [awk] command. [awk] is a very powerful command for text pattern scanning and processing. Here we will see how we can use [awk] to extract the first field (or first column) from the output of another command. As above, suppose I want to print the first column of the [wc -c] output. Here is how it goes:
$> wc -c file.txt | awk ' {print $1}'
109
The basic syntax of [awk] is like this:
awk 'pattern {action}'
The pattern can be left blank, as above, or omitted altogether, like below:
$> wc -c file.txt | awk '{print $1}'
109
In the action, we have asked [awk] to print the first column ($1). More on [awk] later.
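To show a non-empty pattern next to an action, here is a small sketch using made-up comma-separated data: the pattern $3 > 100 selects lines whose third field is greater than 100, and the action prints their first field. The -F switch sets the field separator, much like -d does for [cut].

```shell
#!/bin/sh
# Pattern: $3 > 100 (third field greater than 100); action: print the first field.
# -F',' makes the comma the field separator instead of blanks.
printf 'apple,fruit,120\nkale,veg,80\nplum,fruit,150\n' |
    awk -F',' '$3 > 100 { print $1 }'
# apple
# plum
```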
14). How to replace the n-th line in a file with a new line in Unix?
This can be done in two steps. The first step is to remove the n-th line, and the second step is to insert a new line at the n-th line position. Here we go.
Step 1: remove the n-th line
$> sed -i '10 d' file.txt       # d stands for delete
Step 2: insert a new line at the n-th line position
$> sed -i '10 i this is the new line' file.txt     # i stands for insert (this one-line form is GNU sed)
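The two-step approach has one gap: if line 10 is the last line, step 1 leaves only 9 lines, so step 2 finds no line 10 and inserts nothing. A single-pass sketch with [awk] avoids this. The function name replace_line is hypothetical; it prints the result, so redirect it to a temporary file and rename it (as in question 5) to change the file itself.

```shell
#!/bin/sh
# replace_line N TEXT FILE  (hypothetical helper name)
# Prints FILE with line N swapped for TEXT in a single pass.
replace_line() {
    # On line N print the new text and skip to the next line; print every other line as-is
    awk -v n="$1" -v text="$2" 'NR == n { print text; next } { print }' "$3"
}

f=$(mktemp)
printf 'one\ntwo\nthree\n' > "$f"
replace_line 3 "this is the new line" "$f"
# one
# two
# this is the new line
```

Be aware that awk's -v option interprets backslash escapes in the value, so a replacement text containing \t or \n is not copied literally.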
15). How to show the non-printable characters in a file?
Open the file in the vi editor. Press [Escape] to make sure you are in command mode, then type [:] followed by [set list] and press Enter. This shows the non-printable characters in the file, e.g. Ctrl-M characters (^M).
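16). How to zip a file in Linux?
Use the [zip] command in Linux, giving the archive name first and then the file(s) to add:
$> zip file.zip file.txt
17). How to unzip a file in Linux?
Use the [unzip] command in Linux. The '-j' switch ("junk paths") extracts all files into the current directory without recreating the folder structure stored in the archive; leave it out to keep the folders.
$> unzip -j file.zip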
18). How to test if a zip file is corrupted in Linux?
Use the '-t' switch with the [unzip] command, which tests the archive without extracting it:
$> unzip -t file.zip
19). How to check if a file is zipped in Unix?
To find out the file type of a particular file, use the [file] command like below:
$> file file.txt
file.txt: ASCII text
If you want to know the technical MIME type of the file, use the '-i' switch.
$> file -i file.txt
file.txt: text/plain; charset=us-ascii
If the file is zipped, the result will look like the following (the exact MIME string depends on the version of [file]; newer versions print application/zip):
$> file -i file.zip
file.zip: application/x-zip
