Featured post

The Ultimate Cheat Sheet On Hadoop

Top 20 frequently asked questions to test your Hadoop knowledge given in the below Hadoop cheat sheet. Try finding your own answers and match the answers given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: e




Question #2 

Where is Hive metastore stored by default ?


A. In HDFS
B. In client machine in the form of a flat file.
C. In client machine in a derby database
D. In lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: c




Question…

Why Amazon Web services AWS Cloud computing is so popular


Amazon its Cloud computing services started in three stages:
  1. S3 (Simple storage service)
  2. SQS (Simple Que service)
  3. EC2 (Elastic compute cloud)

Amazon Web Services was officially revealed to the world on March 13, 2006. On that day, AWS offered the Simple Storage Service, its first service. (As you may imagine, Simple Storage Services was soon shortened to S3.) The idea behind S3 was simple:


It could offer the concept of object storage over the web, a setup where anyone could put an object — essentially, any bunch of bytes — into S3.


Those bytes may comprise a digital photo or a file backup or a software package or a video or audio recording or a spreadsheet file or — well, you get the idea.


S3 was relatively limited when it first started out. Though objects could, admittedly, be written or read from anywhere, they could be stored in only one region: the United States. Moreover, objects could be no larger than 5 gigabytes — not tiny by any means, but certainly smaller than many files that people may want to store in S3.


The actions available for objects were also quite limited: You could write, read, and delete them, and that was it.


In its first six years, S3 has grown in all dimensions. The service is now offered throughout the world in a number of different regions. Objects can now be as large as 5 terabytes. S3 can also offer many more capabilities regarding objects. An object can now have a termination date, for example: You can set a date and time after which an object is no longer available for access.


(This capability may be useful if you want to make a video available for viewing for only a certain period, such as the next two weeks.) S3 can now also be used to host websites — in other words, individual pages can be stored as objects, and your domain name (say, www.example.com) can point to S3, which serves up the pages.


S3 did not remain the lone AWS example for long. Just a few months after it was launched, Amazon began offering Simple Queue Service (SQS), which provides a way to pass messages between different programs. SQS can accept or deliver messages within the AWS environment or outside the environment to other programs (your web browser, for example) and can be used to build highly scalable distributed applications.

  • Later in 2006 came Elastic Compute Cloud (known affectionately as EC2). As the AWS computing service, EC2 offers computing capacity on demand, with immediate availability and no set commitment to length of use.
  • The overall pattern of AWS has been to add additional services steadily, and then quickly improve each service over time.
  • AWS is now composed of more than 25 different services, many offered with different capabilities via different configurations or formats.
  • This rich set of services can be mixed and matched to create interesting and unique applications, limited only by your imagination or needs.

Related:

Comments

Popular posts from this blog

AWS Vs Azure Load Balancers Top Insights

Hadoop File System Basic Commands

4 Important Skills You Need for Data Scientists

Hyperledger Fabric: 20 Real Interview Questions