
Showing posts matching the search for Hadoop

Featured Post

SQL Interview Success: Unlocking the Top 5 Frequently Asked Queries

Here are the five SQL queries most commonly asked in interviews. You can expect them in Data Analyst or Data Engineer interviews.

Top SQL Queries for Interviews

01. Joins

A common question gives you two tables and asks how many rows each join type returns, and what the result is.

Table1    Table2
------    ------
id        id
1         1
1         3
2         1
3         NULL

Output (inner join): 5 rows are returned. Each of the two 1s in Table1 matches both 1s in Table2 (2 x 2 = 4 rows), 3 matches 3 (1 row), and 2 and NULL match nothing. The result will be:

Table1.id   Table2.id
1           1
1           1
1           1
1           1
3           3

02. Substring and Concat

Here, we need to write an SQL query that converts the first letter of each name to upper case and the remaining letters to lower case.

Table1
------
ename
raJu
venKat
kRIshna

Solution:

SELECT CONCAT(UPPER(SUBSTRING(ename, 1, 1)), LOWER(SUBSTRING(ename, 2))) AS capitalized_name
FROM Table1;

03. Case statement

SQL Query:

SELECT Code1, Code2,
       CASE
           WHEN Code1 = 'A' AND Code2 = 'AA' THEN "A" | "A
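To check the join count and the capitalization query yourself, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in database. SQLite is an assumption on my part (the post does not name a database), and the function names are hypothetical. It loads the two sample tables, confirms the inner join returns 5 rows, and shows that a LEFT JOIN returns 6 because the unmatched Table1 id 2 is kept. For question 02 it uses the `||` concatenation operator, which SQLite supports, in place of CONCAT.

```python
import sqlite3

def load_tables():
    """Create Table1 and Table2 from question 01 in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER)")
    conn.execute("CREATE TABLE Table2 (id INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (1,), (2,), (3,)])
    # Python None is stored as SQL NULL; NULL = anything is never true, so it never joins.
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(1,), (3,), (1,), (None,)])
    return conn

def inner_join_rows(conn):
    # Every Table1 row pairs with every Table2 row that has the same id:
    # two 1s x two 1s = 4 rows, one 3 x one 3 = 1 row, and 2 matches nothing.
    return conn.execute(
        "SELECT t1.id, t2.id FROM Table1 t1 "
        "INNER JOIN Table2 t2 ON t1.id = t2.id ORDER BY t1.id"
    ).fetchall()

def left_join_count(conn):
    # The same 5 matches plus one row for the unmatched Table1 id 2 (with t2.id NULL).
    return conn.execute(
        "SELECT COUNT(*) FROM Table1 t1 LEFT JOIN Table2 t2 ON t1.id = t2.id"
    ).fetchone()[0]

def capitalize_names(names):
    """Upper-case the first letter and lower-case the rest, as in question 02."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ename TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(n,) for n in names])
    # || concatenates strings; SUBSTR(x, 2) returns everything from position 2 on.
    rows = conn.execute(
        "SELECT UPPER(SUBSTR(ename, 1, 1)) || LOWER(SUBSTR(ename, 2)) "
        "FROM Table1 ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = load_tables()
    print(inner_join_rows(conn))  # [(1, 1), (1, 1), (1, 1), (1, 1), (3, 3)]
    print(left_join_count(conn))  # 6
    print(capitalize_names(["raJu", "venKat", "kRIshna"]))  # ['Raju', 'Venkat', 'Krishna']
    conn.close()
```

Counting rows with a real engine is a quick way to double-check your reasoning about duplicates and NULLs before an interview.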

Top 100 Hadoop Complex Interview Questions (Part 3 of 4)

These are complex Hadoop interview questions. This is my third set of questions for your interviews (3 of 4).

1). What are the features of Standalone (local) mode?
Ans). In standalone mode there are no daemons; everything runs in a single JVM. It has no DFS and uses the local file system. Standalone mode is suitable only for running MapReduce programs during development, and it is one of the least used environments.

2). What are the features of Pseudo (pseudo-distributed) mode?
Ans). Pseudo mode is used both for development and in the QA environment. In pseudo mode, all the daemons run on the same machine.

3). Can we call VMs pseudos?
Ans). No. A VM is a general virtualization concept, whereas pseudo mode is specific to Hadoop.

4). What are the features of Fully Distributed mode?
Ans). Fully distributed mode is used in the production environment, where 'n' machines form a Hadoop cluster. Hadoop daemons run on a cluster of machines. There i
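One concrete way to see how the three modes differ is the configuration each one uses. The sketch below renders minimal `*-site.xml` files per mode. It assumes Hadoop 2+/3 property names (`fs.defaultFS`, `dfs.replication`, `mapreduce.framework.name`); the JobTracker-era Hadoop 1 used different names such as `fs.default.name` and `mapred.job.tracker`. The NameNode host `namenode.example.com`, the port, and the helper names are hypothetical placeholders, and in a real standalone install you would usually just leave these files at their defaults.

```python
import xml.etree.ElementTree as ET

# Property values per mode (Hadoop 2+/3 names). The NameNode host is a hypothetical placeholder.
MODE_SETTINGS = {
    "standalone": {
        "fs.defaultFS": "file:///",            # local file system, no HDFS daemons
        "mapreduce.framework.name": "local",   # jobs run in a single local JVM
    },
    "pseudo-distributed": {
        "fs.defaultFS": "hdfs://localhost:9000",  # all daemons on one machine
        "dfs.replication": "1",                   # only one DataNode to replicate to
        "mapreduce.framework.name": "yarn",
    },
    "fully-distributed": {
        "fs.defaultFS": "hdfs://namenode.example.com:9000",  # hypothetical master host
        "dfs.replication": "3",
        "mapreduce.framework.name": "yarn",
    },
}

# Which *-site.xml file each property belongs in.
FILE_FOR_PROPERTY = {
    "fs.defaultFS": "core-site.xml",
    "dfs.replication": "hdfs-site.xml",
    "mapreduce.framework.name": "mapred-site.xml",
}

def render_configuration(properties):
    """Render a {name: value} dict as a Hadoop <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

def config_files_for(mode):
    """Group a mode's settings by target file and render each file's XML."""
    grouped = {}
    for name, value in MODE_SETTINGS[mode].items():
        grouped.setdefault(FILE_FOR_PROPERTY[name], {})[name] = value
    return {fname: render_configuration(props) for fname, props in grouped.items()}

if __name__ == "__main__":
    for fname, xml in config_files_for("pseudo-distributed").items():
        print(fname)
        print(xml)
```

The only differences between modes here are where the file system lives and how many replicas it keeps, which mirrors the answers above: no DFS in standalone, everything on one machine in pseudo mode, and a real cluster in fully distributed mode.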

Hadoop fs (File System) Commands List

The Hadoop HDFS file system commands are given in this post. They are useful for your projects and interviews.

HDFS File System Commands

hadoop fs -cmd <args>

Here, cmd is a specific command and <args> are its arguments.

The List of Commands

cat
hadoop fs -cat FILE [FILE …]
Displays the files' content. To read compressed files, use -text instead.

chgrp
hadoop fs -chgrp [-R] GROUP PATH [PATH …]
Changes the group association for files and directories. The -R option applies the change recursively. The user must be the files' owner or a superuser.

chmod
hadoop fs -chmod [-R] MODE[,MODE …] PATH [PATH …]
Changes the permissions of files and directories. Like its Unix equivalent, MODE can be a 3-digit octal mode or {augo}+/-{rwxX}. The -R option applies the change recursively. The user must be the files' owner or a superuser.

chown
hadoop fs -chown [-R] [OWNER][:[GROUP]] PATH [PATH…]
Changes the ownership of files and directories. The -R option applies the change recursiv
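Two details above trip people up: what a 3-digit octal MODE actually means, and where `-R` goes in the command line. The sketch below decodes an octal mode into the familiar rwx string and builds (without running) a `hadoop fs` argument list. The helper names and the path `/user/hduser/data` are hypothetical; to execute the command you would pass the list to `subprocess.run`, which assumes the `hadoop` client is on your PATH.

```python
import shlex

def octal_to_symbolic(mode):
    """Translate a 3-digit octal MODE such as '755' into 'rwxr-xr-x'."""
    if len(mode) != 3 or any(c not in "01234567" for c in mode):
        raise ValueError(f"expected 3 octal digits, got {mode!r}")
    out = []
    for digit in mode:  # owner, group, others
        bits = int(digit)
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)

def fs_command(cmd, *args, recursive=False):
    """Build (but do not run) the argv for `hadoop fs -<cmd> [-R] <args>`."""
    argv = ["hadoop", "fs", f"-{cmd}"]
    if recursive:  # only meaningful for commands that accept -R, e.g. chgrp/chmod/chown
        argv.append("-R")
    argv.extend(args)
    return argv

if __name__ == "__main__":
    print(octal_to_symbolic("755"))  # rwxr-xr-x
    print(shlex.join(fs_command("chmod", "755", "/user/hduser/data", recursive=True)))
    # hadoop fs -chmod -R 755 /user/hduser/data
```

Each octal digit is the sum of read (4), write (2), and execute (1) for owner, group, and others, in that order, so 640 means owner read/write, group read, others nothing.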

How to Set Up a Hadoop Cluster: Top Ideas

This post explains Hadoop cluster setup on a Linux operating system such as CentOS, which you can install either on your laptop or in a virtual machine. Note that the commands below use apt-get and apt-add-repository, which are Ubuntu/Debian tools; on CentOS the package manager is yum.

Hadoop Cluster Setup Process

A 9-Step Process to Set Up a Hadoop Cluster

Step 1: Install Sun Java on Linux. Commands to execute:
$ sudo apt-add-repository ppa:flexiondotorg/java
$ sudo apt-get update
$ sudo apt-get install sun-java6-jre sun-java6-plugin
$ sudo update-java-alternatives -s java-6-sun

Step 2: Create a Hadoop user. Commands to execute:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Step 3: Install an SSH server if one is not already present, and set up passwordless SSH for hduser. Commands:
$ sudo apt-get install openssh-server
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Step 4: Install Hadoop. Commands:
$ wget http://www.eng.lsu.edu/mirrors/apache/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz
$ cd /home/hduser
$ tar xzf hadoop-0.
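The `cat ... >> authorized_keys` line in Step 3 appends the key every time you run it, so repeating the setup leaves duplicate entries. Here is a minimal Python sketch of the same step made idempotent, assuming the default key file name `id_rsa.pub` from `ssh-keygen -t rsa`. The function name is hypothetical, and the demo uses a temporary directory standing in for `/home/hduser` with made-up key text.

```python
import tempfile
from pathlib import Path

def authorize_key(home):
    """Append ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    ssh_dir = Path(home) / ".ssh"
    public_key = (ssh_dir / "id_rsa.pub").read_text().strip()
    auth_file = ssh_dir / "authorized_keys"
    existing = auth_file.read_text() if auth_file.exists() else ""
    if public_key in (line.strip() for line in existing.splitlines()):
        return False  # running the step twice should not duplicate the key
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with auth_file.open("a") as fh:
        fh.write(prefix + public_key + "\n")
    auth_file.chmod(0o600)  # sshd may ignore an authorized_keys file others can write to
    return True

if __name__ == "__main__":
    # Demo in a throwaway directory standing in for /home/hduser (hypothetical key text).
    with tempfile.TemporaryDirectory() as home:
        ssh_dir = Path(home) / ".ssh"
        ssh_dir.mkdir()
        (ssh_dir / "id_rsa.pub").write_text("ssh-rsa AAAAexample hduser@master\n")
        print(authorize_key(home))  # True
        print(authorize_key(home))  # False
```

After this step, `ssh localhost` as hduser should log in without a password, which the Hadoop start scripts rely on.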

30 High-Paying Tech Jobs, $110,000 Plus Salary

There is a growing demand for software developers across the globe, and these 30 high-paying IT jobs are well worth a look.

PaaS, or "Platform as a Service", is a type of cloud computing technology. It hosts everything a developer needs to write an app, and once written, the apps live on the PaaS cloud.

Cassandra is a free and open-source NoSQL database. It can store and handle data of different types and sizes, and it is increasingly the go-to database for mobile and cloud applications. Several IT companies, including Apple and Netflix, use Cassandra.

MapReduce has been called "the heart of Hadoop." It is the programming model that lets Hadoop process all kinds of data in parallel across many low-cost servers (the data itself is stored in HDFS). To extract meaningful information from data in Hadoop, a programmer writes MapReduce programs, often in the popular Java language.

30 High Paying IT Jobs

Cloudera is a company that ma
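To make the MapReduce description concrete, here is the classic word-count example as a single-process Python simulation of the map, shuffle, and reduce phases. This is only an illustration of the programming model, not Hadoop's Java API; in a real job each input split would be mapped on a different node and the framework would perform the shuffle. The function names are my own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)

def word_count(splits: List[str]) -> Dict[str, int]:
    # In Hadoop each split would be mapped on a different node; here it is one loop.
    mapped = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
    # {'big': 2, 'data': 2, 'hadoop': 1}
```

Because mappers only see their own split and reducers only see one key's values, both phases can run on many cheap machines at once, which is what makes the model scale.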

Big Data: Top Hadoop Interview Questions (2 of 5)

Frequently asked Hadoop interview questions.

1. What is Hadoop?
Hadoop is a framework that gives users the power of distributed computing.

2. What is the difference between SQL and Hadoop?
SQL works with structured data and is most suitable for legacy technologies. Hadoop is suitable for unstructured data and is well suited to modern technologies.

3. What is the Hadoop framework?
It is a distributed network of commodity servers. A Hadoop cluster is made up of multiple nodes, each typically a commodity server.

4. What are the 4 properties of Hadoop?
Accessible: Hadoop runs on large clusters of commodity machines.
Robust: Hadoop assumes that low-cost commodity machines will fail frequently, and it handles these failures gracefully.
Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
Simple: Hadoop allows users to quickly write efficient parallel code.

5. What kind of data does Hadoop need?
Traditional RDBMS having relational
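The "Scalable" property can be shown with simple arithmetic. The sketch below estimates usable HDFS capacity as raw disk divided by the replication factor, assuming the HDFS default replication of 3 and ignoring space reserved for the OS, logs, and intermediate job output. The disk sizes and function names are hypothetical.

```python
import math

def usable_capacity_tb(nodes, disk_tb_per_node, replication=3):
    """Rough HDFS capacity: raw disk divided by the replication factor.

    Ignores space reserved for the OS, logs, and intermediate job output.
    """
    if nodes < 1 or replication < 1:
        raise ValueError("nodes and replication must be at least 1")
    return nodes * disk_tb_per_node / replication

def nodes_needed(data_tb, disk_tb_per_node, replication=3):
    """Smallest number of nodes whose usable capacity holds data_tb."""
    return math.ceil(data_tb * replication / disk_tb_per_node)

if __name__ == "__main__":
    print(usable_capacity_tb(10, 12))  # 40.0
    print(usable_capacity_tb(20, 12))  # 80.0 -> doubling the nodes doubles capacity
    print(nodes_needed(100, 12))       # 25
```

Capacity grows in direct proportion to the node count, which is what "scales linearly" means here; the replication factor is the price paid for the "Robust" property.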

Here's a Quick Guide to Hadoop Security

This post covers security and security tools in Hadoop: things everyone needs to take care of when working with a Hadoop cluster.

Hadoop Security

We live in a very insecure world. Everything, from your home's front door to the all-important virtual keys (your passwords), needs to be secured. Big Data systems process, transform, and store humongous amounts of data, so that data needs security too.

Imagine your company spent a couple of million dollars installing a Hadoop cluster to gather and analyze your customers' spending habits for a product category using a Big Data solution. A lack of data security here leads to customer apprehension.

Security Concerns

Because that solution was not secure, your competitor got access to the data, and your sales for that product category dropped 20%. How did the system allow unauthorized access to data? Wasn't there any authentication mechanism in place? Why were there no alerts? Th

Top 100 Hadoop Complex Interview Questions (Part 1 of 4)

The list below contains complex interview questions from the Hadoop tutorial (part 1 of 4); you can go through them quickly.

1. What is BIG DATA?
Ans). Big Data is a collection of data so huge and complex that it becomes very tedious to capture, store, process, retrieve, and analyze it with on-hand database management tools or traditional data processing techniques.

2. Can you give some examples of Big Data?
Ans). There are many real-life examples of Big Data! Facebook generates 500+ terabytes of data per day, the NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airliner collects 10 terabytes of sensor data for every 30 minutes of flying time. All of these are everyday examples of Big Data!

3. Can you give a detailed overview of the Big Data being generated by Facebook?
Ans). As of December 31, 2012, there were 1.06 billion monthly active users on Facebook and 680 million mobile users. On an average,

Top 100 Hadoop Complex Interview Questions (Part 2 of 4)

I am giving a series of Hadoop interview questions, and this is my 2nd set. You can benefit quickly by reading these questions from start to end.

1). If a DataNode is full, how is it identified?
Ans). When data is stored on a DataNode, the metadata for that data is stored in the NameNode. DataNodes also report their storage capacity and usage to the NameNode in periodic heartbeats, so the NameNode identifies when a DataNode is full.

2). If the number of DataNodes increases, do we need to upgrade the NameNode?
Ans). When the Hadoop system is installed, the NameNode is sized based on the size of the cluster. Most of the time we do not need to upgrade the NameNode, because it stores only the metadata, not the actual data, so such a requirement rarely arises.

3). Are the JobTracker and TaskTrackers on separate machines?
Ans). Yes, the JobTracker and TaskTrackers run on different machines. The reason is that the JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted.

4). When we send a da
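The first answer can be pictured as the NameNode keeping the latest storage report from each DataNode's heartbeat and skipping nodes that have no room left. The sketch below is a hypothetical, heavily simplified model (the class names and the "cannot fit one more block" rule are my assumptions; real HDFS also accounts for reserved space, racks, and more).

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class StorageReport:
    """What a DataNode reports about its disks (simplified)."""
    capacity_bytes: int
    used_bytes: int

    @property
    def remaining_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes

@dataclass
class NameNodeView:
    """Hypothetical, simplified model of the NameNode's view of DataNode space."""
    reports: Dict[str, StorageReport] = field(default_factory=dict)

    def heartbeat(self, node: str, capacity_bytes: int, used_bytes: int) -> None:
        # Each heartbeat replaces the previous report for that node.
        self.reports[node] = StorageReport(capacity_bytes, used_bytes)

    def full_nodes(self, block_size: int) -> List[str]:
        # A node counts as "full" once it cannot hold one more block.
        return sorted(n for n, r in self.reports.items() if r.remaining_bytes < block_size)

    def candidates(self, block_size: int) -> List[str]:
        # Nodes that can still receive a new block replica.
        return sorted(n for n, r in self.reports.items() if r.remaining_bytes >= block_size)

if __name__ == "__main__":
    view = NameNodeView()
    view.heartbeat("dn1", capacity_bytes=1000, used_bytes=990)
    view.heartbeat("dn2", capacity_bytes=1000, used_bytes=100)
    print(view.full_nodes(block_size=128))  # ['dn1']
    print(view.candidates(block_size=128))  # ['dn2']
```

Since only small reports and metadata flow to the NameNode, adding DataNodes grows its memory needs slowly, which is why the second answer says an upgrade is rarely required.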

Big data: Quiz-1 Hadoop Top Interview Questions

In this post, I have given a quiz on Big Data with answers. This is the first set of questions for your quick reference.

Photo credit: Srini

Q.1) How does Hadoop achieve scaling in terms of storage?
A. By increasing the hard disk capacity of the machine
B. By increasing the RAM capacity of the machine
C. By increasing both the hard disk and RAM capacity of the machine
D. By increasing the hard disk capacity of the machine and by adding more machines

Q.2) How is fault tolerance with respect to data achieved in Hadoop?
A. By breaking the data into smaller blocks and distributing these blocks across several machines
B. By adding extra nodes
C. By breaking the data into smaller blocks, copying each block several times, and distributing these replicas across several machines. This way, Hadoop makes sure that even if a machine fails, a replica is present on some other machine
D. None of these

Q.3) In which parameters does Hadoop scale up?
A. Storage only
B. Performan
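Option C of Q.2 describes block replication. The sketch below splits a file into blocks and places each block's replicas on distinct machines, then checks whether every block survives a set of failures. It assumes the Hadoop 2+ default block size of 128 MB (Hadoop 1 used 64 MB) and the default replication factor of 3. The placement is a plain round-robin of my own choosing, not HDFS's actual rack-aware policy, and the machine names are hypothetical.

```python
import math
from typing import Dict, List, Set

def split_into_blocks(file_size_mb: float, block_size_mb: int = 128) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

def place_replicas(num_blocks: int, machines: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Round-robin placement: each block's replicas go to `replication` distinct machines."""
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    return {
        block: [machines[(block + r) % len(machines)] for r in range(replication)]
        for block in range(num_blocks)
    }

def survives_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live machine."""
    return all(any(m not in failed for m in replicas) for replicas in placement.values())

if __name__ == "__main__":
    blocks = split_into_blocks(300)  # 300 MB file -> 3 blocks of up to 128 MB
    placement = place_replicas(blocks, ["m1", "m2", "m3", "m4"])
    print(placement)
    print(survives_failure(placement, {"m1"}))              # True
    print(survives_failure(placement, {"m2", "m3", "m4"}))  # False: block 1 is lost
```

With 3 replicas on distinct machines, any one or two machine failures still leave a copy of every block, which is the guarantee option C describes.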

Hadoop Skills Free Video Training

Are you interested in the world of Big Data technologies, but find it a little cryptic and see the whole thing as a big puzzle? This free Hadoop video training is a useful way to learn quickly. Are you looking to understand how Big Data impacts large and small businesses and people like you and me? Do you feel that many people talk about Big Data and Hadoop without knowing basics such as the history of Hadoop and its major players and vendors? Then this is the course for you!

This course builds an essential, fundamental understanding of Big Data problems and of Hadoop as a solution. It takes you through:

Understanding Big Data problems with easy-to-understand examples.
The history and advent of Hadoop, from before Hadoop was even named Hadoop.
The "Hadoop magic" that makes it so unique and powerful.
The difference between data science and data engineering, which is one of the big confusions when choosing a career or understanding a job role.
And mos