Showing posts with the label interview-questions-on-hadoop

Featured Post

Mastering flat_map in Python with List Comprehension

Introduction. In Python, when working with nested lists or other iterables, a common challenge is flattening them into a single list while applying a transformation. Many programming languages provide a built-in flatMap function, but Python has no built-in flat_map. However, Python’s list comprehensions offer an elegant way to achieve the same result. This article shows how to implement flat_map using list comprehensions and other methods. What is flat_map? In functional programming, flatMap combines map and flatten: it transforms each element of a collection and flattens the resulting nested structure into a single sequence. For example, given a list of lists, flat_map applies a function to each sublist and returns a single flattened list. Example in a functional programming language (Scala): List(List(1, 2), List(3, 4)).flatMap(x => x.map(_ * 2)) // Output: List(2, 4, 6, 8) Implementing flat_map in Python Using List Comprehension Python’...
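A minimal sketch of the same idea in Python, assuming a helper named `flat_map` (hypothetical; it is not part of the standard library). The nested list comprehension applies the function to each sublist and flattens one level, reproducing the Scala example's output; `itertools.chain.from_iterable` is shown as one of the "other methods".

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def flat_map(func: Callable[[T], Iterable[U]], items: Iterable[T]) -> List[U]:
    """Hypothetical helper: apply func to each item and flatten one level."""
    # Outer loop walks the items; inner loop walks each mapped result.
    return [result for item in items for result in func(item)]

nested = [[1, 2], [3, 4]]

# Mirrors List(List(1, 2), List(3, 4)).flatMap(x => x.map(_ * 2))
doubled = flat_map(lambda sub: [x * 2 for x in sub], nested)
print(doubled)  # [2, 4, 6, 8]

# Alternative: chain.from_iterable flattens a generator of mapped sublists.
doubled_chain = list(chain.from_iterable([x * 2 for x in sub] for sub in nested))
print(doubled_chain)  # [2, 4, 6, 8]
```

The comprehension form keeps everything in one expression; `chain.from_iterable` is handy when the mapped results are already produced lazily.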

Top 100 Hadoop Complex Interview Questions (Part 4 of 4)

The Hadoop framework is widely used in data analytics and other data-related projects. Here is my 4th set of questions for you to read through quickly. 1). What is MapReduce? Ans). It is a framework and programming model for processing large data sets in parallel across clusters of computers using distributed programming. 2). What are ‘maps’ and ‘reduces’? Ans). ‘Map’ and ‘Reduce’ are the two phases of processing a job in Hadoop MapReduce. The ‘Map’ phase reads data from the input location and, based on the input type, generates key-value pairs, that is, an intermediate output stored on the local machine. The ‘Reducer’ processes the intermediate output received from the mappers and generates the final output. 3). What are the four basic parameters of a mapper? Ans). A mapper has four type parameters: input key, input value, output key, and output value. In the common word-count example they are LongWritable, Text, Text, and IntWritable; the first two represent the input parameters and the last two represent the intermediate output parameters. 4). What are the four basic parame...
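To make the map and reduce phases concrete, here is a plain-Python simulation of a word-count job. It does not use the Hadoop API; the function names `mapper`, `reducer`, and `run_job` are hypothetical, and the (offset, line) input and (word, 1) output only mirror the LongWritable, Text, Text, IntWritable parameters.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    # Input (offset, line) plays the role of (LongWritable, Text);
    # each emitted (word, 1) plays the role of (Text, IntWritable).
    for word in line.split():
        yield word, 1

def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    # Combine all intermediate values that share a key into the final output.
    return word, sum(counts)

def run_job(lines: List[str]) -> Dict[str, int]:
    intermediate: Dict[str, List[int]] = defaultdict(list)
    offset = 0
    for line in lines:
        # Map phase: the key is the line's starting character offset.
        for key, value in mapper(offset, line):
            intermediate[key].append(value)  # shuffle: group values by key
        offset += len(line) + 1  # +1 for the newline separating lines
    # Reduce phase: keys are processed in sorted order, as Hadoop sorts keys
    # before they reach a reducer.
    return dict(reducer(k, v) for k, v in sorted(intermediate.items()))

print(run_job(["big data big", "data hadoop"]))
# {'big': 2, 'data': 2, 'hadoop': 1}
```

In real Hadoop the intermediate pairs are partitioned across reducers running on different machines; this sketch keeps everything in one process to show only the data flow.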

Top 100 Hadoop Complex Interview Questions (Part 3 of 4)

These are complex Hadoop interview questions. This is my 3rd set of questions (3 of 4) to help you prepare for interviews. 1). What are the features of Standalone (local) mode? Ans). In standalone mode there are no daemons; everything runs in a single JVM. It has no DFS and uses the local file system. Standalone mode is suitable only for running MapReduce programs during development, and it is one of the least used environments. 2). What are the features of Pseudo mode? Ans). Pseudo-distributed mode is used for both development and QA. In this mode, all the daemons run on the same machine, each in its own Java process. 3). Can VMs be called pseudo mode? Ans). No. A VM is a general virtualization concept, whereas pseudo-distributed mode is a configuration specific to Hadoop. 4). What are the features of Fully Distributed mode? Ans). Fully distributed mode is used in production, where ‘n’ machines form a Hadoop cluster. Hadoop daemons run on a cluster of mac...
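One way to see how the three modes differ is through the `fs.defaultFS` property, whose default value is the local file system (`file:///`). The sketch below is a hypothetical heuristic, not a Hadoop tool: `detect_mode` guesses the mode from that single property, assuming pseudo-distributed setups point it at `localhost` (for example `hdfs://localhost:9000`) and fully distributed ones point it at a separate NameNode host.

```python
from typing import Dict
from urllib.parse import urlparse

def detect_mode(conf: Dict[str, str]) -> str:
    """Hypothetical heuristic: guess the Hadoop run mode from fs.defaultFS only."""
    # Hadoop's default file system URI is the local file system.
    uri = urlparse(conf.get("fs.defaultFS", "file:///"))
    if uri.scheme == "file":
        return "standalone"          # no DFS: data lives on the local file system
    if uri.hostname in ("localhost", "127.0.0.1"):
        return "pseudo-distributed"  # every daemon points at this one machine
    return "fully-distributed"       # the NameNode is some other cluster host

print(detect_mode({}))
print(detect_mode({"fs.defaultFS": "hdfs://localhost:9000"}))
print(detect_mode({"fs.defaultFS": "hdfs://namenode.example.com:8020"}))
```

A real cluster is identified by far more than one property (daemon placement, YARN settings, replication); the point here is only that standalone mode has no DFS at all, while the other two differ in where the daemons live.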

Top 100 Hadoop Complex Interview Questions (Part 2 of 4)

This is my 2nd set in a series of Hadoop interview questions. You will benefit most by reading them from start to end. 1). If a DataNode is full, how is it identified? Ans). When data is stored on a DataNode, the metadata for that data is stored in the NameNode, and each DataNode reports its storage capacity and usage to the NameNode in periodic heartbeats. So the NameNode identifies when a DataNode is full. 2). If the number of DataNodes increases, do we need to upgrade the NameNode? Ans). When installing a Hadoop system, the NameNode machine is sized based on the expected size of the cluster. Most of the time we do not need to upgrade the NameNode, because it does not store the actual data, only the metadata, so such a requirement rarely arises. 3). Do the JobTracker and TaskTrackers run on separate machines? Ans). Yes, the JobTracker and TaskTrackers run on different machines. The reason is that the JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted. 4). When we send a da...
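The first answer can be sketched as a small model: DataNodes report capacity and usage in heartbeats, and the NameNode keeps the latest report for each node. The classes `Heartbeat` and `NameNodeView` and the 95% "full" threshold are hypothetical illustrations, not Hadoop's internal classes or defaults.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Heartbeat:
    datanode_id: str
    capacity_bytes: int
    used_bytes: int

class NameNodeView:
    """Hypothetical model: the NameNode keeps the latest storage report per DataNode."""

    def __init__(self, full_threshold: float = 0.95) -> None:
        self.full_threshold = full_threshold  # assumed cutoff for "full"
        self.reports: Dict[str, Heartbeat] = {}

    def receive(self, hb: Heartbeat) -> None:
        # The newest heartbeat from a DataNode replaces its previous report.
        self.reports[hb.datanode_id] = hb

    def full_datanodes(self) -> List[str]:
        # A DataNode counts as full once its used fraction reaches the threshold.
        return sorted(
            dn for dn, hb in self.reports.items()
            if hb.used_bytes / hb.capacity_bytes >= self.full_threshold
        )

nn = NameNodeView()
nn.receive(Heartbeat("dn1", capacity_bytes=100, used_bytes=99))
nn.receive(Heartbeat("dn2", capacity_bytes=100, used_bytes=40))
print(nn.full_datanodes())  # ['dn1']
```

Because the NameNode relies on the latest report, a DataNode that frees space is no longer reported as full after its next heartbeat.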

Top 100 Hadoop Complex Interview Questions (Part 1 of 4)

The list below contains complex interview questions as part of the Hadoop tutorial (part 1 of 4); you can go through them quickly. 1. What is Big Data? Ans). Big Data is a collection of data sets so large and complex that it becomes very difficult to capture, store, process, retrieve, and analyze them with on-hand database management tools or traditional data processing techniques. 2. Can you give some examples of Big Data? Ans). There are many real-life examples of Big Data: Facebook generates 500+ terabytes of data per day, the NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airliner collects 10 terabytes of sensor data for every 30 minutes of flying time. All of these are everyday examples of Big Data. 3. Can you give a detailed overview of the Big Data being generated by Facebook? Ans). As of December 31, 2012, there were 1.06 billion monthly active users on Facebook and 680 million mobile users. On an avera...
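To put the example volumes on a common scale, the short calculation below converts each one to an average rate. It assumes decimal units (1 TB = 10^12 bytes) and a constant rate over the period, which real traffic does not have; `rate_gb_per_s` is a hypothetical helper.

```python
TB = 10**12  # decimal terabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60

def rate_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Average rate in decimal gigabytes per second."""
    return total_bytes / seconds / 10**9

facebook = rate_gb_per_s(500 * TB, SECONDS_PER_DAY)  # 500+ TB per day
nyse = rate_gb_per_s(1 * TB, SECONDS_PER_DAY)        # about 1 TB per day
jet = rate_gb_per_s(10 * TB, 30 * 60)                # 10 TB per 30 minutes of flight

for name, rate in [("Facebook", facebook), ("NYSE", nyse), ("Jet sensors", jet)]:
    print(f"{name}: {rate:.3f} GB/s")
# Facebook: 5.787 GB/s
# NYSE: 0.012 GB/s
# Jet sensors: 5.556 GB/s
```

Sustained rates of several gigabytes per second are far beyond what a single traditional database server is typically built to ingest, which is the point the definition above is making.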

Big data: Quiz-1 Hadoop Top Interview Questions

In this post, I have given a quiz on Big Data with answers. This is part 1 of the questions, for your quick reference. Photo credit: Srini. Q.1) How does Hadoop achieve scaling in terms of storage? A. By increasing the hard disk capacity of the machine B. By increasing the RAM capacity of the machine C. By increasing both the hard disk and RAM capacity of the machine D. By increasing the hard disk capacity of the machine and by adding more machines Q.2) How is fault tolerance with respect to data achieved in Hadoop? A. By breaking the data into smaller blocks and distributing these blocks across several machines B. By adding extra nodes C. By breaking the data into smaller blocks, copying each block several times, and distributing these replicas across several machines. This ensures that even if a machine fails, a replica is still present on some other machine D. None of these Q.3) In which parameters does Hadoop scale up? A. Storage only B. Performan...
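Option C of Q.2 can be checked with a toy simulation: split data into blocks, place several replicas of each block on distinct machines, then fail a machine and confirm every block is still readable. The round-robin placement and the tiny 8-byte block size are illustrative assumptions only; HDFS uses much larger blocks and a rack-aware placement policy.

```python
from typing import Dict, List, Set

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, machines: List[str], replication: int) -> Dict[int, List[str]]:
    # Simple round-robin placement on distinct machines (NOT HDFS's rack-aware policy).
    if replication > len(machines):
        raise ValueError("replication factor exceeds number of machines")
    return {
        b: [machines[(b + r) % len(machines)] for r in range(replication)]
        for b in range(num_blocks)
    }

def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    # Every block must still have at least one replica on a live machine.
    return all(any(m not in failed for m in hosts) for hosts in placement.values())

data = b"hadoop stores data as replicated blocks"
blocks = split_into_blocks(data, block_size=8)
placement = place_replicas(len(blocks), ["m1", "m2", "m3", "m4"], replication=3)
print(len(blocks), readable_after_failure(placement, {"m2"}))  # 5 True
```

With three replicas on four machines, any single failure leaves at least two copies of each block, whereas option A (blocks without copies) would lose data as soon as one machine fails.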