Posts

Featured Post

SQL Interview Success: Unlocking the Top 5 Frequently Asked Queries

Here are the top five SQL queries commonly asked in interviews. You can expect these in Data Analyst or Data Engineer interviews.

Top SQL Queries for Interviews

01. Joins
A commonly asked question gives you two tables and asks how many rows each join type will return, and what the result set looks like.

Table1 (id): 1, 1, 2, 3
Table2 (id): 1, 3, 1, NULL

Inner join: 5 rows will return. The result will be:
(1, 1), (1, 1), (1, 1), (1, 1), (3, 3)

02. Substring and Concat
Here, we need to write an SQL query that upper-cases the first letter of each name and lower-cases the remaining letters.

Table1 (ename): raJu, venKat, kRIshna

Solution:
SELECT CONCAT(UPPER(SUBSTRING(ename, 1, 1)), LOWER(SUBSTRING(ename, 2))) AS capitalized_name
FROM Table1;

03. Case statement
SQL Query:
SELECT Code1, Code2,
    CASE
        WHEN Code1 = 'A' AND Code2 = 'AA' THEN "A" | "A
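To sanity-check the inner-join row count above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the example; the use of Python and SQLite is purely illustrative and not part of the original question.

import sqlite3

# In-memory database with the two example tables from the join question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER)")
conn.execute("CREATE TABLE Table2 (id INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (1,), (2,), (3,)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(1,), (3,), (1,), (None,)])

# Each 1 in Table1 (two rows) matches each 1 in Table2 (two rows) -> 4 rows,
# plus the single 3-3 match -> 5 rows. NULL never matches on equality.
rows = conn.execute(
    "SELECT t1.id, t2.id FROM Table1 t1 INNER JOIN Table2 t2 ON t1.id = t2.id"
).fetchall()
print(len(rows))  # 5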

Top 100 Hadoop Complex Interview Questions (Part 1 of 4)

The list below contains complex interview questions as part of the Hadoop tutorial (part 1 of 4); you can go through these questions quickly.

1. What is BIG DATA?
Ans). Big Data is an assortment of data so huge and complex that it becomes very tedious to capture, store, process, retrieve, and analyze it with on-hand database management tools or traditional data processing techniques.

2. Can you give some examples of Big Data?
Ans). There are many real-life examples of Big Data! Facebook generates 500+ terabytes of data per day, the NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airline collects 10 terabytes of sensor data for every 30 minutes of flying time. All of these are day-to-day examples of Big Data!

3. Can you give a detailed overview of the Big Data being generated by Facebook?
Ans). As of December 31, 2012, there are 1.06 billion monthly active users on Facebook and 680 million mobile users. On an average,

How to Set Up a Hadoop Cluster: Top Ideas

This post explains Hadoop cluster setup on the CentOS operating system, so you can install CentOS either on your laptop or in a virtual machine.

Hadoop Cluster Setup Process
9-Step Process to Set Up a Hadoop Cluster

Step 1: Install Sun Java on Linux. Commands to execute:
sudo apt-add-repository ppa:flexiondotorg/java
sudo apt-get update
sudo apt-get install sun-java6-jre sun-java6-plugin
sudo update-java-alternatives -s java-6-sun

Step 2: Create a Hadoop user. Commands to execute:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Step 3: Install the SSH server if not already present. Commands:
$ sudo apt-get install openssh-server
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Step 4: Install Hadoop. Commands:
$ wget http://www.eng.lsu.edu/mirrors/apache/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz
$ cd /home/hduser
$ tar xzf hadoop-0.

Top Features of HPCC - High-Performance Computing Cluster

HPCC (High-Performance Computing Cluster) was developed and implemented by LexisNexis Risk Solutions. Development of this data processing platform began in 1999, and applications were in production by late 2000. The HPCC approach also uses commodity clusters of hardware running the Linux operating system. Custom system software and middleware components were developed and layered on the base Linux operating system to provide the execution environment and distributed filesystem support needed for data-intensive computing. LexisNexis also implemented a new high-level language for data-intensive computing. ECL is a high-level, declarative, data-centric, implicitly parallel programming language that allows the programmer to define what the data processing result should be, along with the dataflows and transformations needed to achieve that result

What is Tibco Spotfire - Visualization tool

When you start Spotfire for the first time, your first task is to load some data. This data can come from a file, a database, or even the clipboard. Data is at the heart of all analysis, and it's important that you know not only how to load data into Spotfire, but also how data works. If you handle a lot of data in spreadsheet form, you will no doubt understand its content and meaning very well. You might even have developed advanced and insightful representations of your data. However, there is so much more you can do with Spotfire to improve the handling of this subject matter. Importing data into Spotfire is just the beginning. To progress into its rich analytic world, you will have to become familiar with the relational database model. You will have to learn some formal data concepts. We will therefore spend some time taking a look at some basic database principles to set you on your way to advance quickly beyond the limited world of the spreadsheet

Hadoop Big Data: A Quick Story for Dummies

Mike Olson is one of the fundamental brains behind the Hadoop movement, yet even he looks to the new breed of "Big Data" software used inside Google. Olson runs a company that specializes in the world's hottest software: he's the CEO of Cloudera, a Silicon Valley startup that deals in Hadoop, an open source software platform built on the tech that turned Google into the most dominant force on the web. Hadoop is expected to fuel an $813 million software market by the year 2016. But even Olson says it's already old news. Hadoop sprang from two research papers Google published in late 2003 and 2004. One described the Google File System, a way of storing huge amounts of data across many thousands of very cheap servers, and the other detailed MapReduce, which pooled the processing power inside each of those servers and crunched all that
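To make the MapReduce idea concrete, here is a minimal, illustrative word-count sketch in Python. It is not Google's or Hadoop's actual implementation, and the sample input lines are made up; it only shows the pattern the papers describe: a map step emits (word, 1) pairs, and a reduce step sums the counts per word.

from collections import defaultdict

# Toy MapReduce word count: "map" emits (word, 1) pairs, "reduce" sums them.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

lines = ["hadoop stores data", "mapreduce crunches data"]
print(reduce_phase(map_phase(lines)))
# {'hadoop': 1, 'stores': 1, 'data': 2, 'mapreduce': 1, 'crunches': 1}

In a real cluster the map and reduce steps run in parallel across many machines, which is exactly the pooling of processing power the excerpt refers to.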

Top sub-modules in Cloud Computing Technology Architecture

This post covers the main architectural characteristics of a cloud computing environment. One fundamental architectural aspect of a cloud is heterogeneity: a cloud must support the aggregation of heterogeneous hardware and software resources, as happens with scientific experiments. The concept of virtualization is also a key aspect for clouds. Through virtualization, many users may benefit from the same infrastructure using independent instances. Virtualization enables the first security level in clouds, since it allows the isolation of environments; in clouds, each user has unique access to its individual virtualized environment.

Cloud Architecture sub-modules:
Virtualization
Heterogeneity
Security
Resource sharing
Scalability
Monitoring

Resource sharing is provided by clouds, since each resource is represented as a single artifact, giving the impression of a single dedicated resource. Scalability is mainly defined by increasing

Top features in the design of data modelling (1 of 2)

The analogy with architecture is particularly appropriate because architects are designers, and data modeling is also a design activity. In design, we do not expect to find a single correct answer, although we will certainly be able to identify many that are patently incorrect. Two data modelers (or architects) given the same set of requirements may produce quite different solutions. Data modeling is not just a simple process of "documenting requirements," though it is sometimes portrayed as such. Several factors contribute to the possibility of there being more than one workable model for most practical situations. First, we have a choice of what symbols or codes we use to represent real-world facts in the database. A person's age could be represented by Birth Date, Age at Date of Policy Issue, or even by a code corresponding to a range ("H" could mean "born between 1961 and 1970"). Second, there is usually more
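As a small illustration of the point about alternative representations, here is a sketch in Python. The variable names and dates are invented for this example; only the "H" range code comes from the text above.

from datetime import date

# Three alternative ways to represent the same real-world fact (a person's age):
# a birth date, an age at policy issue, or a coarse range code.
birth_date = date(1965, 4, 12)            # hypothetical person
policy_issue_date = date(1990, 1, 1)      # hypothetical policy

# Age at date of policy issue (approximate: year difference only).
age_at_policy_issue = policy_issue_date.year - birth_date.year

def birth_decade_code(d):
    # Illustrative range code: "H" = born between 1961 and 1970, as in the example.
    return "H" if 1961 <= d.year <= 1970 else "?"

print(birth_date, age_at_policy_issue, birth_decade_code(birth_date))
# 1965-04-12 25 H

Each representation is workable, but each supports different queries and ages differently over time, which is why two modelers can produce quite different, yet valid, designs.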