Featured post

The Ultimate Cheat Sheet On Hadoop

Top 20 frequently asked questions to test your Hadoop knowledge given in the below Hadoop cheat sheet. Try finding your own answers and match the answers given here.

Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?

A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: e

Question #2 

Where is Hive metastore stored by default ?

B. In client machine in the form of a flat file.
C. In client machine in a derby database
D. In lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: c


IBM - Open Cloud Architecture

Cloud computing is changing the way we think about technology, and it’s no passing fad. Consumers are using the cloud to store music. Startups are turning to cloud to get up and running without huge investments. Big businesses and governments are relying on clouds to make more data more accessible. 

Cloud computing.

is changing how business and society run, and it's opening up huge avenues of innovation. We are looking at how developers are now combining systems of record with systems of engagement, and we see a new style of cloud-based application emerging.

These are systems of interaction. For these applications to be sustainable, cloud computing needs to be built on open source and standards.

Wide adoption of open source software and open standards should be everyone's goal. It means customers won’t have to fear vendor lock-in, and organizations can participate in a growing market that welcomes a wide variety of cloud technology and service providers.

We've learned through our experience that open source and standards allow developers to share information more quickly and easily, and at lower costs. This leads to greater innovation. We are at an inflection point. We're focusing the industry on important standards for interoperability, and their open source reference implementations will:
  • Ensure that end users have a strong voice in establishing and adopting cloud computing paradigms
  • Reduce barriers of entry into cloud computing, such as development skills and freedom of choice
  • Increase the long-term viability of today’s cloud investments
  • Prevent unnecessary architectural complexity and fragmentation.
  • Openstack is a software to control your cloud

Open-stack software 

It controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API. OpenStack works with popular enterprise and open source technologies making it ideal for heterogeneous infrastructure.

Hundreds of the world’s largest brands rely on OpenStack to run their businesses every day, reducing costs and helping them move faster. OpenStack has a strong ecosystem, and users seeking commercial support can choose from different OpenStack-powered products and services.


Popular posts from this blog

AWS Vs Azure Load Balancers Top Insights

Hadoop File System Basic Commands

4 Important Skills You Need for Data Scientists

Hyperledger Fabric: 20 Real Interview Questions