The Ultimate Cheat Sheet On Hadoop

The Hadoop cheat sheet below collects the top 20 frequently asked questions to test your Hadoop knowledge. Try to work out your own answers first, then compare them with the answers given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: E
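
To see why a combiner cuts shuffle traffic, here is a minimal pure-Python simulation. It is not Hadoop code: the two input splits, the keys, and the function names are made up for illustration. Each "mapper" emits one pair per record, and the combiner sums the values per key locally before anything is sent to the reducer.

```python
from collections import Counter

def map_records(records):
    """Mapper: emit one (key, 1) pair per input record."""
    return [(key, 1) for key in records]

def combine(pairs):
    """Combiner: sum values per key locally, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_pairs(pairs):
    """Reducer: sum values per key across the output of all mappers."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits with skewed keys (key "a" dominates).
splits = [
    ["a", "a", "a", "a", "b"],
    ["a", "a", "c", "a", "a"],
]

# Pairs that would cross the network without and with a combiner.
without_combiner = [pair for split in splits for pair in map_records(split)]
with_combiner = [pair for split in splits for pair in combine(map_records(split))]

shuffled_without = len(without_combiner)
shuffled_with = len(with_combiner)
result_without = reduce_pairs(without_combiner)
result_with = reduce_pairs(with_combiner)

print("pairs shuffled without combiner:", shuffled_without)
print("pairs shuffled with combiner:", shuffled_with)
print("same final result:", result_with == result_without)
```

The trick only works because summing is associative and commutative, so partial sums can be merged in any order. In a real Hadoop job the combiner is a Reducer class registered with `job.setCombinerClass(...)`, and the framework may run it zero, one, or several times, so the job must not depend on it for correctness.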




Question #2 

Where is the Hive metastore stored by default?


A. In HDFS
B. On the client machine, in the form of a flat file.
C. On the client machine, in a Derby database.
D. In the lib directory of HADOOP_HOME, which requires HADOOP_CLASSPATH to be modified.
Ans: C




Question…

SAN: Real Architecture Explained

A SAN sits behind the servers and gives them block-level access to shared data storage. Block-level access means addressing specific blocks of data on a storage device, as opposed to file-level access, where data is requested by file name. A single file usually spans several blocks.
The simplified SAN architecture shows how data from the servers is stored on the storage units.
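
To make the block-level versus file-level distinction concrete, here is a toy sketch in Python. Everything in it is hypothetical: the 4-byte block size (real devices commonly use 512 or 4096 bytes), the class names, and the file name. The `BlockDevice` only knows numbered blocks, which is roughly what a SAN presents to a server, while the `SimpleFileSystem` on top of it keeps the mapping from a file name to its blocks.

```python
BLOCK_SIZE = 4  # bytes per block, tiny on purpose so the split is visible

class BlockDevice:
    """Block-level storage: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, number, data):
        # Pad short data so every block stays exactly BLOCK_SIZE bytes.
        self.blocks[number] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, number):
        return self.blocks[number]

class SimpleFileSystem:
    """File-level view: maps a file name to an ordered list of blocks."""

    def __init__(self, device):
        self.device = device
        self.table = {}      # file name -> (block numbers, length in bytes)
        self.next_free = 0   # next unused block number

    def write_file(self, name, data):
        numbers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            self.device.write_block(self.next_free, chunk)
            numbers.append(self.next_free)
            self.next_free += 1
        self.table[name] = (numbers, len(data))

    def read_file(self, name):
        numbers, length = self.table[name]
        raw = b"".join(self.device.read_block(n) for n in numbers)
        return raw[:length]  # drop the padding of the last block

device = BlockDevice(num_blocks=8)
fs = SimpleFileSystem(device)
fs.write_file("report.txt", b"hello SAN!")

file_blocks = fs.table["report.txt"][0]
print("blocks used by report.txt:", file_blocks)
print("block 1 on its own:", device.read_block(1))
print("whole file:", fs.read_file("report.txt"))
```

The 10-byte file ends up spread over three blocks. Reading a single block returns raw bytes with no notion of which file they belong to; that knowledge lives in the file system running on the server.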





Storage Area Networks (SANs)
  • SANs provide high availability and robust business continuity for critical data environments. SANs are typically switched fabric architectures using Fibre Channel (FC) for connectivity.
  • The term switched fabric means that each storage unit is connected to each server through multiple SAN switches (high-end, high-port-count models are called SAN directors), which provide redundant paths to the storage units. These additional communication paths mean that no single central switch is a single point of failure.
  • Ethernet shares many of Fibre Channel's advantages for supporting SANs, including high speed, support for a switched fabric topology, widespread interoperability, and a large set of management tools.
  • In a storage network application, the switch is the key element. Given the significant number of Gigabit and 10 Gigabit Ethernet ports already shipped, leveraging IP and Ethernet for storage is a natural progression for some environments.
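
The redundancy argument for a switched fabric can be checked with a small graph model. This is only a sketch under simple assumptions: the node names are invented, every server and storage unit is cabled to every switch, and a failed switch simply disappears from the graph.

```python
from collections import deque

def build_fabric(servers, switches, storage_units):
    """Connect every server and every storage unit to every switch."""
    links = {node: set() for node in servers + switches + storage_units}
    for switch in switches:
        for endpoint in servers + storage_units:
            links[switch].add(endpoint)
            links[endpoint].add(switch)
    return links

def reachable(links, source, target, failed=frozenset()):
    """Breadth-first search that skips failed nodes."""
    if source in failed or target in failed:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in links[node]:
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def survives_any_single_switch_failure(links, servers, switches, storage_units):
    """True if every server still reaches every storage unit with any one switch down."""
    return all(
        reachable(links, server, unit, failed={switch})
        for switch in switches
        for server in servers
        for unit in storage_units
    )

servers = ["server1", "server2"]
storage_units = ["array1", "array2"]

dual = build_fabric(servers, ["switchA", "switchB"], storage_units)
single = build_fabric(servers, ["switchA"], storage_units)

dual_ok = survives_any_single_switch_failure(
    dual, servers, ["switchA", "switchB"], storage_units)
single_ok = survives_any_single_switch_failure(
    single, servers, ["switchA"], storage_units)

print("two-switch fabric survives one switch failure:", dual_ok)
print("one-switch design survives one switch failure:", single_ok)
```

With two switches every server keeps a path to every array when either switch fails. With one central switch, losing it cuts off all storage, which is exactly the single point of failure the fabric design avoids.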
SAN vs. IP
  1. IP was developed as an open standard with complete interoperability of components. Two storage network technologies that build on Ethernet and IP are Fibre Channel over Ethernet (FCoE), which carries Fibre Channel frames directly in Ethernet frames, and iSCSI, which carries SCSI commands over TCP/IP. Sending storage traffic across a standard IP network, via Fibre Channel tunneling (FCIP) or other storage tunneling, makes it possible to use storage in locations beyond the directly attached limit of nearly 10 km when using fiber as the transport medium.
  2. Internal to the data center, legacy Fibre Channel can also be run over coaxial cable or twisted pair cabling, but at significantly shorter distances.
  3. The incorporation of the IP standard into these storage systems offers performance benefits through speed, greater availability, fault tolerance, and scalability. These solutions, properly implemented, can come very close to 100% availability of data. The IP-based management protocols also give network managers a set of tools, warnings, and triggers whose equivalents were proprietary in previous generations of storage technology. Security and encryption solutions are also greatly enhanced. With 10 Gigabit Ethernet gaining popularity and faster WAN links becoming available, these solutions can offer true storage on demand.
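
Why redundancy gets close to, but never reaches, 100% availability can be shown with a little arithmetic. The sketch below assumes that paths fail independently and that each path is available 99% of the time; both are illustrative assumptions, not measurements of any product. With n redundant paths, data is unreachable only when all n are down at once, so availability is 1 - (1 - a)^n.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def parallel_availability(path_availability, num_paths):
    """Availability of num_paths independent redundant paths (any one suffices)."""
    return 1 - (1 - path_availability) ** num_paths

def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR

# Hypothetical per-path availability of 99%.
for paths in (1, 2, 3):
    a = parallel_availability(0.99, paths)
    print(f"{paths} path(s): availability {a:.6f}, "
          f"about {downtime_minutes_per_year(a):.1f} minutes of downtime per year")
```

Each extra independent path multiplies the remaining downtime by the failure probability of one path, so the figure shrinks quickly but stays above zero. In practice failures are rarely fully independent (shared power, shared software bugs), so real numbers will be less favorable than this model suggests.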
