
Here is Hadoop MapReduce DataFlow Tutorial

Here are the six stages of MapReduce. MapReduce is central to data processing in Hadoop. Traditionally, a whole file had to be read at once and then divided manually, which is not convenient. Hadoop instead reads files of any size line by line, treating each line as a key-value pair whose key is the line's offset.

The dataflow in Hadoop MapReduce, explained

MapReduce dataflow Quick Tutorial


1. Dataflow Diagram



In this post, you will learn how a MapReduce process in Hadoop divides its input and processes it.


2. MapReduce Stages


MapReduce receives input and processes it in six stages, described here. Knowing them is helpful for your interviews and projects.


MapReduce Stage-1


Take the file as input for processing. Any file consists of a group of lines. These lines hold the data that the later stages turn into key-value pairs. The whole file can be read this way.

MapReduce Stage-2


In the next step, the file is in "splitting" mode. This mode divides the file into key-value pairs of data. Here the key is the offset of a line, and the value is the content of that line, which is the part the program works on. Each line is read individually, so there is no need to split the data manually.
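The splitting stage can be sketched in plain Python. This is only an illustration, not Hadoop code: the function name `split_into_records` and the sample text are made up for this example, and the key is assumed to be the byte offset at which each line starts.

```python
def split_into_records(data: bytes):
    """Yield (byte_offset, line_text) pairs, one pair per input line."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Key: position where the line starts. Value: the line without its newline.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


# Hypothetical sample input, used only for this illustration.
sample = b"deer bear river\ncar car river\ndeer car bear\n"
records = list(split_into_records(sample))
for key, value in records:
    print(key, value)
```

Each record carries its own offset, so later stages never need to know how large the whole file is.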

MapReduce Stage-3


The next step processes the value of each line and attaches a counting number to it. Each individual word, separated by a space, is paired with a number (1 in a word count), and that number is written alongside the word, which becomes the new key. This is the logic of "mapping" that programmers need to write.
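A minimal sketch of the mapping logic for a word count, written in plain Python rather than against the Hadoop API. The name `map_line` is hypothetical, and emitting the number 1 for every word is the usual word-count convention assumed here.

```python
def map_line(offset, line):
    """Emit a (word, 1) pair for every space-separated word in the line."""
    # The offset key from the splitting stage is not needed for counting words.
    for word in line.split():
        yield word, 1


pairs = list(map_line(0, "deer bear river"))
print(pairs)
```

The mapper looks at one line at a time and keeps no state between lines, which is what lets many mappers run in parallel on different parts of a file.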

MapReduce Stage-4


After that, shuffling is performed. With this, each key becomes associated with the group of numbers that were emitted for it in the mapping stage. The key is now a string, and the value is a list of numbers. This goes as input to the reducer.
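Shuffling can be pictured as grouping the mapper output by key. The sketch below does this in memory with a dictionary; the name `shuffle` and the sample pairs are assumptions for illustration, and a real cluster performs this step across machines.

```python
from collections import defaultdict


def shuffle(mapped_pairs):
    """Group values by key, returning keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    # Sort by key so each reducer sees its keys in a predictable order.
    return dict(sorted(grouped.items()))


# Hypothetical mapper output for two short lines.
mapped = [("deer", 1), ("bear", 1), ("river", 1), ("car", 1), ("car", 1), ("river", 1)]
grouped = shuffle(mapped)
print(grouped)
```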

MapReduce Stage-5


In the reducer phase, the numbers in each list are added up. Each key is then associated with a final count, which is the sum of all its numbers, and this leads to the final result.
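The reducer logic for a word count is short enough to show in full. This is a plain Python sketch, not the Hadoop `Reducer` class, and the name `reduce_counts` is made up for this example.

```python
def reduce_counts(key, values):
    """Collapse the list of numbers for one key into a single total."""
    return key, sum(values)


# Hypothetical shuffled input: each word with the list of 1s emitted for it.
shuffled = {"bear": [1, 1], "car": [1, 1, 1], "deer": [1, 1], "river": [1, 1]}
totals = dict(reduce_counts(word, ones) for word, ones in shuffled.items())
print(totals)
```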

MapReduce Stage-6


The output of the reducer phase is the final result. This final result holds the count of each individual word. The process works the same way regardless of the size of the file used for processing.
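As a rough picture of what the final output can look like, the sketch below writes one word and its count per line, separated by a tab. The tab-separated layout is an assumption chosen to resemble a plain text output file, and `format_output` and the sample counts are hypothetical.

```python
def format_output(counts):
    """Render word counts as one 'word<TAB>count' line per word, sorted by word."""
    return "\n".join(f"{word}\t{count}" for word, count in sorted(counts.items()))


# Hypothetical reducer results.
final_counts = {"deer": 2, "bear": 2, "car": 3, "river": 2}
report = format_output(final_counts)
print(report)
```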


Keep Reading
  1. Big Data and Hadoop: Learn by Example
