How to Write an R Script in a Simple Way

A script is a good way to keep track of what you're doing. If you have a long analysis, and you want to be able to recreate it later, a good idea is to type it into a script. If you're working in the Windows R GUI (also in the Mac R GUI), there is even a built-in script editor.

To get to it, pull down the File menu and choose New Script (New Document on a Mac). A window will open in which you can type your script. An R script is a series of commands that you can execute all at once, which can save you a lot of time. The script is just a plain text file with R commands in it.

How to create an R Script

  1. You can prepare a script in any text editor, such as vim, TextWrangler, or Notepad.
  2. You can also prepare a script in a word processor, like Word, Writer, TextEdit, or WordPad, PROVIDED you save the script in plain text (ASCII) format. This should (!) append a ".txt" file extension to the file.
  3. Drop the .txt file into your working directory.
  4. Now that you've got it in your working directory one way or another, read it into R using the source() function, like this.
> source(file = "sample_script.txt") # Don't forget those quotes!
A note: This may not have worked, most likely because your script is not actually named "sample_script.txt". If you make sure the file has the correct name, R will read it. If the file is in your working directory, type dir() at the command prompt, and R will show you the full file name.
Also, older versions of R do not like spaces in script names, so it is safest to avoid them. (In newer versions of R, this is no longer an issue.)
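
The checks from the note above can also be run from the Console before calling source(). This is only a minimal sketch: the file name is the one used in the example, and the variable script_name is my own placeholder, not something R requires.

```r
# Hypothetical file name, taken from the example above; change it to match yours.
script_name <- "sample_script.txt"

getwd()   # shows the current working directory
dir()     # lists the files in it, with their full names and extensions

# Only source the script if R can actually see a file with that exact name.
if (file.exists(script_name)) {
  source(file = script_name)
} else {
  message("No file called '", script_name, "' in ", getwd())
}
```

If the message appears, compare the name you typed with what dir() shows, including the extension.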

What the script you have written actually does

Example:
# A comment: this is a sample script.
y <- c(12, 15, 28, 17, 18)
x <- c(22, 39, 50, 25, 18)
mean(y)
mean(x)
plot(x, y)
What happened to the mean of "y" and the mean of "x"? They were calculated, but not shown.
The script has created the variables "x" and "y" in your workspace (and has erased any old objects you had by those names).
You can see them with the ls() function.
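
To see the overwriting for yourself, here is a small sketch. It writes the sample script to a temporary file (a stand-in of mine for the file in your working directory), leaves out the plot() line so no graphics window opens, and puts an old "x" in the workspace first.

```r
# An old object that happens to share a name with one in the script.
x <- "old value"

# Temporary stand-in for sample_script.txt (plot() left out on purpose).
script_path <- tempfile(fileext = ".txt")
writeLines(c(
  "# A comment: this is a sample script.",
  "y <- c(12, 15, 28, 17, 18)",
  "x <- c(22, 39, 50, 25, 18)",
  "mean(y)",
  "mean(x)"
), script_path)

source(file = script_path)

ls()   # "x" and "y" are now listed among the workspace objects
x      # the old character value has been replaced by the numbers
```

Nothing is shown for the two mean() lines, yet "x" and "y" are sitting in the workspace afterwards.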

Executing a script with source() does everything that typing those commands in the Console would do, EXCEPT print the results to the Console. To check that the work was done anyway, do this.

> x
[1] 22 39 50 25 18
> mean(x)
[1] 30.8
See? It's there. But if you want to be sure a script will print it to the Console, you should use the print() function.
> print(x)
[1] 22 39 50 25 18
> print(mean(x))
[1] 30.8
When you're working in the Console, the print() is understood (implicit) when you type a command or data object name. This is not necessarily so in a script.
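
Here is a rough way to watch the difference between implicit and explicit printing. The temporary file and the names script_path, shown and echoed are my own choices for the illustration. The last call uses the echo argument of source(), which is not covered in this post but is another way to get everything shown.

```r
# A tiny script with one bare expression and one explicit print().
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(22, 39, 50, 25, 18)",
  "mean(x)",          # computed, but not shown when sourced
  "print(mean(x))"    # shown, because print() is called explicitly
), script_path)

# capture.output() collects whatever the script writes to the Console.
shown <- capture.output(source(file = script_path))
shown   # a single line, "[1] 30.8", which comes from the print() call

# With echo = TRUE, source() shows each command and its result as well.
echoed <- capture.output(source(file = script_path, echo = TRUE))
length(echoed) > length(shown)
```

Only the line wrapped in print() produces output under a plain source() call.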
Now try a second, longer script in the script editor. (The steps that follow assume it runs an analysis of variance on the built-in PlantGrowth data and stores the result in an object called "aov.out", with no print() calls yet.)
  • Hit the Enter key after the last line. Now, in the editor window, pull down the Edit menu and choose Run All. (On a Mac, highlight all the lines of the script and choose Execute.) The script should execute in your R Console.
  • Pull down the File menu and choose Save As... Give the file a nice name, like "script2.txt". R will NOT save it by default with a file extension, so be sure you give it one. (Note: On my Mac, the script editor in R will not let me save the script with a .txt extension. It insists that I use .R. Fine!) Close the editor window. Now, in the R Console, do this:
> source(file = "script2.txt") # or source(file = "script2.R") if that's how you saved it
The "aov.out" object was created in your workspace. However, nothing was echoed to your Console because you didn't tell it to print().
Go to File and choose New Script (New Document on a Mac). In the script editor, pull down File and choose Open Script... (Open Document... on a Mac). In the Open Script dialog that appears, change Files Of Type to all files (not necessary on a Mac). Then choose to open "script2.txt" (or "script2.R", whatever!). Edit it to look like this.
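# Mean plant weight in each group
print(with(PlantGrowth, tapply(weight, group, mean)))
# Fit a one-way ANOVA and keep the result in the workspace
with(PlantGrowth, aov(weight ~ group)) -> aov.out
# The ANOVA table
print(summary.aov(aov.out))
# The same model summarized as a linear model
print(summary.lm(aov.out))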
Pull down File and choose Save. Close the script editor window(s). And FINALLY...
> source(file = "script2.txt") # or source(file = "script2.R") if necessary
This time the results are printed to your Console, because the script tells R to print() them. Writing scripts really is that simple.
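
If you would rather stay in the Console, the same round trip can be sketched without the script editor. This is only an illustration: the temporary file stands in for "script2.txt", and script_path is a name I made up.

```r
# Write a shortened version of the edited script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "print(with(PlantGrowth, tapply(weight, group, mean)))",
  "with(PlantGrowth, aov(weight ~ group)) -> aov.out",
  "print(summary.aov(aov.out))"
), script_path)

# Run it: the group means and the ANOVA table are printed.
source(file = script_path)

# The fitted model is left behind in the workspace for later use.
class(aov.out)
```

PlantGrowth ships with R in the datasets package, so nothing needs to be installed to try this.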
