Featured post

The Ultimate Cheat Sheet On Hadoop

The Hadoop cheat sheet below collects the top 20 frequently asked questions to test your Hadoop knowledge. Try working out your own answers first, then compare them with the answers given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: e
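
To see why a combiner is the answer, here is a minimal sketch in plain Python (not the Hadoop API) that simulates a count-per-key job. The function names and the two input splits are made up for illustration. It compares how many key-value pairs would have to be shuffled with and without a local combine step on each mapper.

```python
from collections import Counter

def map_records(records):
    """Map step: emit one (key, 1) pair per input record."""
    return [(key, 1) for key in records]

def combine(pairs):
    """Combiner: sum the values per key locally, on the mapper side."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_pairs(pairs):
    """Reduce step: sum the values per key across all mappers."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits with a skewed key distribution.
splits = [
    ["a", "a", "a", "b", "a", "a"],
    ["a", "c", "a", "a", "b", "a"],
]

# Without a combiner, every mapped pair crosses the network.
shuffled_plain = [pair for split in splits for pair in map_records(split)]

# With a combiner, each mapper sends at most one pair per distinct key.
shuffled_combined = [
    pair for split in splits for pair in combine(map_records(split))
]

result_plain = reduce_pairs(shuffled_plain)
result_combined = reduce_pairs(shuffled_combined)
```

Both runs produce the same totals, but the combined run shuffles far fewer pairs. This only works because summing is associative and commutative, so partial sums can be merged in any order.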




Question #2 

Where is the Hive metastore stored by default?


A. In HDFS
B. On the client machine, in the form of a flat file
C. On the client machine, in a Derby database
D. In the lib directory of HADOOP_HOME, which requires HADOOP_CLASSPATH to be modified
Ans: c
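
One way to check this on a given setup is to look at the metastore connection URL in hive-site.xml. The sketch below, in Python, parses an illustrative fragment. The property name `javax.jdo.option.ConnectionURL` and the Derby URL are the stock Hive defaults as far as I know, while the helper function names are my own and the fragment is not taken from any real cluster.

```python
import xml.etree.ElementTree as ET

# Illustrative hive-site.xml fragment showing the embedded Derby default.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
</configuration>"""

def metastore_url(xml_text):
    """Return the metastore JDBC URL from a hive-site.xml string, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "javax.jdo.option.ConnectionURL":
            return prop.findtext("value")
    return None

def uses_embedded_derby(url):
    """True for an embedded Derby URL (no //host part after the scheme)."""
    if url is None or not url.startswith("jdbc:derby:"):
        return False
    return not url.startswith("jdbc:derby://")

url = metastore_url(HIVE_SITE)
```

A URL that starts with something like `jdbc:mysql://` would mean the metastore has been moved to an external database instead.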




Question…

Cloud Computing Vs Virtualization Real Differences

Cloud computing is the delivery of computing resources as a service: the provider charges users for each service they use. Virtualization, on the other hand, means creating multiple virtual machines on the same physical hardware. See all the real differences.

Public Vs Private Vs Hybrid Cloud

  1. Public Cloud: A cloud that is available to the general public.
  2. Private Cloud: A cloud that is available only to a particular business or company.
  3. Hybrid Cloud: A combination of public and private clouds.

Cloud Services

  • Cloud services (remote data and computation) are exposed as simple, user-friendly web services. For example, Microsoft's ADO.NET Data Services (originally code-named Astoria) provides the tools to expose any data object from a collection, whether stored in a database or in some other form, through a URI, encoded in a standard representation such as JSON or Atom.
  • Google's App Engine provides a way to deploy a remote Python script that becomes a web service and can access data in Google's Bigtable database system.
  • Because virtualization technology is now mature, Virtual Machines (VMs) are used as the standard unit of deployment in the cloud, which makes it possible to deliver highly available and flexible services (i.e., computation as a service).
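
The idea of exposing a data object through a URI as JSON can be sketched with nothing but the Python standard library. This is not the ADO.NET Data Services or App Engine API, only a stand-in to show the pattern. The `CUSTOMERS` collection, the `/customers/<id>` path layout, and the function names are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory collection standing in for a database table.
CUSTOMERS = {
    "1": {"id": 1, "name": "Alice"},
    "2": {"id": 2, "name": "Bob"},
}

def handle_get(path):
    """Resolve a path like /customers/1 to (status, content type, JSON body)."""
    parts = path.strip("/").split("/")
    obj = None
    if len(parts) == 2 and parts[0] == "customers":
        obj = CUSTOMERS.get(parts[1])
    if obj is None:
        body = json.dumps({"error": "not found"}).encode("utf-8")
        return 404, "application/json", body
    return 200, "application/json", json.dumps(obj).encode("utf-8")

class DataHandler(BaseHTTPRequestHandler):
    """Serves each data object at its own URI."""

    def do_GET(self):
        status, content_type, body = handle_get(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Run the service locally until interrupted (the port is arbitrary)."""
    HTTPServer(("127.0.0.1", port), DataHandler).serve_forever()
```

Calling `serve()` and then opening `http://127.0.0.1:8000/customers/1` in a browser should return that customer's record as JSON. A real cloud service would add authentication, an Atom representation, and a proper data store on top of this.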

Virtual Machines

  • VMs decouple the computing infrastructure from the physical infrastructure. They also allow the platform to be customized to suit the needs of the end user.
  • For example, in the Amazon Elastic Compute Cloud (EC2), the customer selects their preferred VM image (virtual appliance) from a list of various versions of Linux and Windows servers.
