The Ultimate Cheat Sheet On Hadoop

The cheat sheet below lists the top 20 frequently asked questions to test your Hadoop knowledge. Try to work out your own answers first, then compare them with the answers given here.

Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?

A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: E
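
To see why a combiner helps here, the following is a minimal pure-Python simulation, not Hadoop code. The function names (`map_phase`, `combine`, `shuffle_and_reduce`) and the skewed sample input are hypothetical and chosen only for illustration. It counts how many key-value pairs would cross the network with and without mapper-side aggregation.

```python
from collections import Counter, defaultdict

def map_phase(split):
    """Emit one (key, 1) pair per record, as a word-count style mapper would."""
    return [(record, 1) for record in split]

def combine(pairs):
    """Sum values per key on the mapper side, before anything is shuffled."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def shuffle_and_reduce(all_pairs):
    """Group pairs by key and sum them, standing in for shuffle plus reducer."""
    grouped = defaultdict(int)
    for key, value in all_pairs:
        grouped[key] += value
    return dict(grouped)

# Hypothetical skewed input: two mapper splits dominated by one hot key.
splits = [
    ["a"] * 6 + ["b"] * 2,
    ["a"] * 5 + ["c"] * 1,
]

without_combiner = [pair for split in splits for pair in map_phase(split)]
with_combiner = [pair for split in splits for pair in combine(map_phase(split))]

print(len(without_combiner))  # pairs shuffled without a combiner
print(len(with_combiner))     # pairs shuffled with a combiner
print(shuffle_and_reduce(with_combiner) == shuffle_and_reduce(without_combiner))
```

The final result is the same either way because summation is associative and commutative. In Hadoop itself the combiner is an optional optimization that the framework may run zero or more times, so it should only be used for operations with that property.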

Question #2 

Where is the Hive metastore stored by default?

B. On the client machine, in the form of a flat file.
C. On the client machine, in a Derby database.
D. In the lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: C


Internet Of Things Awesome Basics You Need to Read Now: Part-3

This part looks at how things are connected in the IoT world. IT jobs in this area are growing across the world. Earlier, we discussed how the IoT refers to the interconnection of distinguishable smart/intelligent objects, or "things", and their virtual manifestation within the Internet or another IP structure.

The Basics in IoT

The increased capability of processors (8-, 16- or 32-bit microcontrollers), memory (several tens of kilobytes), and storage makes it possible nowadays to connect almost anything at an affordable cost.

IoT Devices

  • The devices are not always small, battery-operated objects. They can also be large and run on electrical power.
  • The smart objects are connected to a grid, and the control system is connected to a pool of software controllers. The generated data is a huge asset, which is then passed to the data analysis engine.

IP Scalability

  1. The scalability and flexibility of IP suit the diverse range and large potential number of IoT applications. Its well-established architecture, with existing applications for email, Voice over IP (VoIP), video streaming, and so on, is what makes the protocol robust and adaptable.
  2. The IP stack has been regarded as large and perhaps cumbersome, requiring high amounts of processing power and memory.

New Generation Protocols

  • However, several lightweight revisions have emerged to accommodate smaller devices and lower energy footprints. Furthermore, the lightweight nature of the new generation IP stack allows it to be used in conjunction with other protocols, such as low-power Wi-Fi, ZigBee, and Bluetooth Low Energy.
  • Since IP can run over almost anything, adapting the stack to run over ZigBee or Bluetooth Low Energy, for example, is a relatively straightforward task, as we highlight in Figure 6.3. As can be seen in the figure, a 6LoWPAN adaptation layer allows both ZigBee and Bluetooth Low Energy to coexist with the upper layers of the IP stack, namely IPv6 and User Datagram Protocol (UDP)/Transmission Control Protocol (TCP).
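
A rough calculation shows why a plain IP stack feels cumbersome on such links and why an adaptation layer is useful. The sketch below uses IEEE 802.15.4, the radio layer that ZigBee builds on, as the example link. The frame, IPv6 and UDP sizes are standard figures, while `MAC_OVERHEAD` and the helper `payload_room` are assumptions made for illustration only.

```python
# Sizes in bytes. The first three are standard figures; MAC_OVERHEAD is an
# assumed value for illustration and varies with addressing and security options.
FRAME_MAX = 127     # maximum IEEE 802.15.4 physical-layer frame
IPV6_HEADER = 40    # fixed IPv6 header
UDP_HEADER = 8      # UDP header
MAC_OVERHEAD = 25   # assumed link-layer header and trailer

def payload_room(frame_max, mac_overhead, *headers):
    """Return the bytes left for application data after all headers."""
    return frame_max - mac_overhead - sum(headers)

uncompressed = payload_room(FRAME_MAX, MAC_OVERHEAD, IPV6_HEADER, UDP_HEADER)
header_share = (IPV6_HEADER + UDP_HEADER) / (FRAME_MAX - MAC_OVERHEAD)

print(uncompressed)            # bytes left for application data
print(round(header_share, 2))  # fraction of the MAC payload used by IPv6 + UDP headers
```

Under these assumptions, nearly half of each frame's usable space goes to uncompressed IPv6 and UDP headers. The 6LoWPAN adaptation layer addresses this by compressing headers and fragmenting packets that do not fit in a single frame.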


