Featured post

The Ultimate Cheat Sheet On Hadoop

This Hadoop cheat sheet collects the top 20 frequently asked questions to test your Hadoop knowledge. Try answering each question yourself first, then compare your answer with the one given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. The job will create a significant amount of intermediate data that must be transferred between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: E
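
To see why a combiner helps, here is a minimal Python sketch, not Hadoop code, that simulates one mapper's output with and without local aggregation. The function names and the skewed sample input are hypothetical and chosen only for illustration. In Hadoop itself, a combiner is typically a Reducer implementation registered on the job with setCombinerClass.

```python
from collections import Counter

def map_phase(records):
    """Emit one (key, 1) pair per input record, as a counting mapper would."""
    return [(key, 1) for key in records]

def combine(pairs):
    """Sum the values per key on the mapper side, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

# Hypothetical skewed input seen by a single mapper: one key dominates.
records = ["a"] * 8 + ["b"] * 2

raw_pairs = map_phase(records)        # what is shuffled without a combiner
combined_pairs = combine(raw_pairs)   # what is shuffled with a combiner

print(len(raw_pairs), "pairs without a combiner")
print(len(combined_pairs), "pairs with a combiner")
```

The more skewed the keys, the more pairs collapse on the mapper side, which is why a combiner suits the non-uniform data described in the question. This only works when the aggregation is associative and commutative, as a sum is.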




Question #2 

Where is the Hive metastore stored by default?


A. In HDFS
B. On the client machine, as a flat file
C. On the client machine, in a Derby database
D. In the lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified
Ans: C




Question…

What is Elastic Nature in Cloud Computing

Natural clouds are elastic, expanding and contracting with the force of the winds carrying them. A computing cloud is similarly elastic, expanding and shrinking with resource usage and the demands of its tenants. The physical resources (computing, storage, networking, etc.) deployed within a data center or across data centers and bundled as a single cloud usually do not change that fast.
This elastic nature is therefore built into the cloud at the software stack level, not in the hardware.
A good cloud computing example: the classic promise of the cloud is to make compute resources available on demand, which means that, in theory, a cloud should be able to scale up as a business grows and shrink as demand diminishes. Consider Amazon.com during Black Friday. There is a spike in inbound traffic, which translates into more memory consumption, heavier network load, and increased compute utilization. If Amazon.com had, say, 5 servers and each server could handle up to 100 users at a time, the whole deployment would have a peak service capacity of 500 users. During the holiday season there is an influx of 1,000 users, which is double what the current deployment can handle.

If Amazon were smart, it would have set up 5 (or maybe 10) additional servers in its data center in anticipation of the holiday spike. That means physically provisioning 5 or 10 machines, setting them up, and connecting them to the current deployment of 5 servers. Once the season is over and traffic is back to normal, Amazon no longer needs those extra 5 to 10 servers. They either stay in the data center, sitting idle and incurring additional cost, or they can be rented to someone else.
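
The arithmetic behind this example can be checked with a short sketch. The numbers come from the scenario described here. The helper name servers_needed is hypothetical.

```python
import math

def servers_needed(peak_users, users_per_server):
    """Smallest whole number of servers that can carry the peak load."""
    return math.ceil(peak_users / users_per_server)

baseline_servers = 5      # servers in the normal deployment
users_per_server = 100    # users each server can handle at a time
peak_users = 1000         # holiday-season influx

baseline_capacity = baseline_servers * users_per_server
required_servers = servers_needed(peak_users, users_per_server)
additional_servers = required_servers - baseline_servers

print("baseline capacity:", baseline_capacity, "users")
print("servers required at peak:", required_servers)
print("additional servers to provision:", additional_servers)
```

Under these assumptions, 5 extra servers cover the spike exactly, and provisioning 10 would leave headroom that sits idle once traffic returns to normal.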

What we just described is what a typical deployment looked like before the cloud. It involved unnecessary physical interaction and manual provisioning of physical resources. This is inefficient and does not scale. Imagine doing it with millions of users and hundreds or even thousands of servers. Needless to say, it would be a mess. Manual provisioning is not only inefficient, it is also financially infeasible for startups, because it requires significant capital to set up or co-locate in a data center, plus dedicated personnel to handle the provisioning by hand.

This is what the cloud has replaced. It lets small, medium, and large teams and enterprises provision and then decommission compute, network, and memory resources in an automated way, even though all of those resources are physical. You can scale up just in time to serve a traffic spike and then wind down the extra resources afterwards, paying only for the time your application ran with increased capacity. This automated allocation and deallocation of resources is what makes a cloud elastic.
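
As a rough illustration of that pay-for-what-you-use effect, the sketch below compares an elastic deployment with one that is fixed at peak size. The hourly demand figures, the minimum fleet size, and the function name are assumptions made for this example. It does not model any particular cloud provider's autoscaling or billing.

```python
import math

USERS_PER_SERVER = 100  # per-server capacity from the Amazon.com example
MIN_SERVERS = 5         # assumed baseline fleet that is always running

def servers_for(users):
    """Servers an elastic deployment would run for the given user count."""
    return max(MIN_SERVERS, math.ceil(users / USERS_PER_SERVER))

# Hypothetical users per hour across a short traffic spike.
hourly_users = [400, 500, 1000, 1000, 600, 300]

elastic_plan = [servers_for(users) for users in hourly_users]
elastic_server_hours = sum(elastic_plan)

# Pre-cloud alternative: keep peak capacity running for every hour.
fixed_server_hours = max(elastic_plan) * len(hourly_users)

print("elastic plan per hour:", elastic_plan)
print("elastic server-hours:", elastic_server_hours)
print("fixed server-hours:", fixed_server_hours)
```

The gap between the two totals is the idle capacity that elasticity avoids paying for. The longer the quiet periods between spikes, the larger that gap becomes.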
