Featured Post

Top Questions People Ask About Pandas, NumPy, Matplotlib & Scikit-learn — Answered!

Image
 Whether you're a beginner or brushing up on your skills, these are the real-world questions Python learners ask most about key libraries in data science. Let’s dive in! 🐍 🐼 Pandas: Data Manipulation Made Easy 1. How do I handle missing data in a DataFrame? df.fillna( 0 ) # Replace NaNs with 0 df.dropna() # Remove rows with NaNs df.isna(). sum () # Count missing values per column 2. How can I merge or join two DataFrames? pd.merge(df1, df2, on= 'id' , how= 'inner' ) # inner, left, right, outer 3. What is the difference between loc[] and iloc[] ? loc[] uses labels (e.g., column names) iloc[] uses integer positions df.loc[ 0 , 'name' ] # label-based df.iloc[ 0 , 1 ] # index-based 4. How do I group data and perform aggregation? df.groupby( 'category' )[ 'sales' ]. sum () 5. How can I convert a column to datetime format? df[ 'date' ] = pd.to_datetime(df[ 'date' ]) ...

Big Data: Top Hadoop Interview Questions (3 of 5)

1) What are daemons in Hadoop?

Big Data: Top Hadoop Interview Questions
#Big Data: Top Hadoop Interview Questions:
In reality running Hadoop means, running daemons of resident programs in multiple servers of your network. This kind of architecture is called fully configured cluster.

2) How daemons run in Hadoop architecture?

Some daemons run in only one server, and others run in more than one server

3) What are the 5 daemons of Hadoop?

-Name node
-Secondary name node
-Data Node
-Job tracker
-Task tracker

4) How many levels do we classify Hadoop broadly?

Broadly we can classify as, it is combination of distributed storage and distributed computation.
Also, as Master/Slave architecture

5) Who is the master of HDFS?

Name node is the master of HDFS

6) What are the functions of Name node?

-Master of HDFS
-Directs slave node i.e., Data nodes
-Book keeping for HDFS
-Monitor overall health of HDFS

7) What is data node?

Each slave machine will have Data node daemon.It performs grunt work of distributed file system

8) What are the functions of Data node?

-Main functionality is read or write HDFS file blocks to local system
-Data node communicates to name node about data blocks. Name node in turn communcates about data block and Data nodes to client.
-Data nodes can communicate each other
-Every change of data in Data node will communicate to Name node

9) How many replicas of data blocks stored in different Data nodes?

3

10) What is Secondary Data node(SNN)?
  • SNN is an assistant to Name node. It also monitors the state of HDFS cluster
  • Like Name node each cluster has one SNN, and it typically resides on its own machine
  • Data nodes and Task trackers run on multiple servers.
It does not record any changes, but time to time it suggests Name node to take SNAP shots of HDFS metadata

11) What will happen if Name node fails?

Then , human interventions is required. That time SNN acts as Name node.

12) What is the role of Job tracker?

This is mediator between client and Task tracker
- Prepares execution plan
-Assign works to task trackers
-Assign nodes to different tasks
-Monitors all tasks are running fine or not

13) What is the role of Task tracker?

Manages execution of individual tasks on each slave node
Single task tracker for each slave node
A task tracker can spread multiple JVMs in a single slave node, to process parallel

Comments

Popular posts from this blog

SQL Query: 3 Methods for Calculating Cumulative SUM

5 SQL Queries That Popularly Used in Data Analysis

Big Data: Top Cloud Computing Interview Questions (1 of 4)