Posts

Showing posts from July, 2015

Featured Post

SQL Interview Success: Unlocking the Top 5 Frequently Asked Queries

Here are the five most commonly asked SQL queries in interviews. You can expect them in Data Analyst or Data Engineer interviews.
Top SQL Queries for Interviews
01. Joins
A commonly asked question gives you two tables and asks how many rows each join type returns, and what the result is.
Table1 (id): 1, 1, 2, 3
Table2 (id): 1, 3, 1, NULL
Inner join: 5 rows are returned. The result is:
1  1
1  1
1  1
1  1
3  3
02. Substring and Concat
Here, we need to write an SQL query that converts the first letter to upper case and the remaining letters to lower case.
Table1 (ename): raJu, venKat, kRIshna
Solution:
SELECT CONCAT(UPPER(SUBSTRING(ename, 1, 1)), LOWER(SUBSTRING(ename, 2))) AS capitalized_name FROM Table1;
03. Case statement
SQL Query:
SELECT Code1, Code2, CASE WHEN Code1 = 'A' AND Code2 = 'AA' THEN "A" | "A
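The join count and the capitalization query in the excerpt above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in database, which is an assumption and not what the post itself uses. Because SQLite's dialect differs, the sketch swaps CONCAT/SUBSTRING for the `||` operator and `SUBSTR`. The left-join count is an extra illustration that the excerpt does not state.

```python
import sqlite3

# In-memory database holding the two tables from the join question.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Table1 (id INTEGER)")
cur.execute("CREATE TABLE Table2 (id INTEGER)")
cur.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (1,), (2,), (3,)])
cur.executemany("INSERT INTO Table2 VALUES (?)", [(1,), (3,), (1,), (None,)])

# Inner join: each 1 in Table1 matches both 1s in Table2 (2 x 2 = 4 rows),
# 3 matches once, and NULL never equals anything.
inner_rows = cur.execute(
    "SELECT t1.id, t2.id FROM Table1 t1 "
    "JOIN Table2 t2 ON t1.id = t2.id ORDER BY t1.id"
).fetchall()

# Left join: the inner-join rows plus one row for the unmatched id 2.
left_rows = cur.execute(
    "SELECT t1.id, t2.id FROM Table1 t1 "
    "LEFT JOIN Table2 t2 ON t1.id = t2.id ORDER BY t1.id"
).fetchall()

# Capitalization query, written in SQLite's dialect.
cur.execute("CREATE TABLE Names (ename TEXT)")
cur.executemany("INSERT INTO Names VALUES (?)", [("raJu",), ("venKat",), ("kRIshna",)])
capitalized = [
    row[0]
    for row in cur.execute(
        "SELECT UPPER(SUBSTR(ename, 1, 1)) || LOWER(SUBSTR(ename, 2)) "
        "FROM Names ORDER BY rowid"
    )
]

print(len(inner_rows), len(left_rows), capitalized)
```

The table name `Names` is hypothetical; it only keeps the second example separate from the join tables.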

Networking in IoT age for big opportunities (1 of 3)

Networking is common in the age of IoT. At its most basic, networking means connecting objects together. Networking is possible with or without cables; networking without cables is called wireless. How computers are connected: Computers are connected using cables such as fiber. Each computer is connected by cable to a central switch, which connects to the rest of the network. The advantage of wireless networking is that no cables are required. In a wireless network, most cables and switches are unnecessary, because radio transmitters and receivers take the place of cables. Networking software must be installed; it is what makes the network function. Benefits of a network: sharing resources, sharing information, and sharing applications.

Differences: Data Center Vs. Telecom Networking

Data Center Networking: Data Center (DC)-based services are emerging as a relevant source of network capacity demand for service providers and telecom operators. Cloud computing services, Content Distribution Networks (CDNs), and networked applications in general have a huge impact on the telecom operator infrastructure. New trends: The cloud computing paradigm provides a new model for service delivery where computing resources can be provided on demand across the network. This elasticity permits the sharing of resources among users, thus reducing costs and maximizing utilization, while posing the challenge of building an efficient cloud-aware network. The computing resources can be provided on demand depending on user requests. Such resources can be allocated on distinct servers within a data center, or across data centers distributed in the network. Under this new model, the users access their assigned resources, as well as the applications and services using them, through telecom o

Cloud Storage as a Service Basics (2 of 3)

Cloud storage is one of the most appealing cloud services. Yes, you are storing data in the cloud, but there are a few good things you need to understand about it. What is cloud storage? Cloud storage involves exactly what the name suggests: storing your data with a cloud service provider rather than on a local system. As with other cloud services, you access the data stored on the cloud via an Internet link. Even though data is stored and accessed remotely, you can maintain data both locally and on the cloud as a measure of safety and redundancy. Cloud storage has a number of advantages over traditional data storage. The benefits: If you store your data on a cloud, you can get at it from any location that has Internet access. This makes it especially appealing to road warriors. Workers don’t need to use the same computer to access data, nor do they have to carry around physical storage devices. Also, if your organization has branch offices, they can all access the data from the cloud p

How to achieve Virtualization in cloud computing real ideas

In order to run applications on a Cloud, one needs a flexible middleware that eases the development and the deployment process. Middleware Approach to Deploy Application on Cloud: GridGain provides a middleware that aims to develop and run applications on both public and private Clouds without any changes in the application code. It is also possible to write dedicated applications based on the map/reduce programming model. Although GridGain provides a mechanism to seamlessly deploy applications on a grid or a Cloud, it does not support the deployment of the infrastructure itself. It does, however, provide protocols to discover running GridGain nodes and organize them into topologies (Local Grid, Global Grid, etc.) to run applications on only a subset of all nodes. Elastic Grid infrastructure provides dynamic allocation, deployment, and management of Java applications through the Cloud. It also offers a Cloud virtualization layer that abstracts specific Cloud computing provide

Big data: Quiz-1 Hadoop Top Interview Questions

In this post, I have given a quiz on Big Data with answers. This is the part-1 set of questions for your quick reference.
Q.1) How does Hadoop achieve scaling in terms of storage?
A. By increasing the hard disk capacity of the machine
B. By increasing the RAM capacity of the machine
C. By increasing both the hard disk and RAM capacity of the machine
D. By increasing the hard disk capacity of the machine and by adding more machines
Q.2) How is fault tolerance with respect to data achieved in Hadoop?
A. By breaking the data into smaller blocks and distributing these smaller blocks across several machines
B. By adding extra nodes
C. By breaking the data into smaller blocks, copying each block several times, and distributing these replicas across several machines. By doing this, Hadoop makes sure that even if a machine fails, a replica is present on some other machine
D. None of these
Q.3) On which parameters does Hadoop scale up?
A. Storage only
B. Performan

4 Modern databases Every Developer Should Know

Below is a list of NoSQL databases currently available in the market. NoSQL Databases 1. Sorted Order Column Oriented Stores: Google's Bigtable espouses a model where data is stored in a column-oriented way. This contrasts with the row-oriented format in RDBMS. The column-oriented storage allows data to be stored efficiently. It avoids consuming space when storing nulls by simply not storing a column when a value doesn't exist for that column. Each unit of data can be thought of as a set of key/value pairs, where the unit itself is identified with the help of a primary identifier, often referred to as the primary key. Bigtable and its clones tend to call this primary key the row-key. Example: The name column-family bucket stores the following values: For row-key: 1 first_name: John last_name: Doe For row-key: 2 first_name: Jane The location column-family stores the following: For row-key: 1 zip_code: 10001 For row-key: 2 zip_code:
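The column-family example above can be modelled with nested dictionaries. This is a minimal sketch for illustration only, not how Bigtable is implemented; the `put` and `get` helpers are hypothetical names. It shows how a missing value (row-key 2 has no last_name) simply takes up no space.

```python
from collections import defaultdict

# store[column_family][row_key] -> {column_name: value}
store = defaultdict(dict)

def put(family, row_key, column, value):
    """Store one column value under a column family and row-key."""
    store[family].setdefault(row_key, {})[column] = value

def get(family, row_key, column):
    """Return the value, or None when the column was never stored."""
    return store[family].get(row_key, {}).get(column)

# Values taken from the example in the excerpt.
put("name", 1, "first_name", "John")
put("name", 1, "last_name", "Doe")
put("name", 2, "first_name", "Jane")
put("location", 1, "zip_code", "10001")

print(get("name", 1, "last_name"))  # a stored column
print(get("name", 2, "last_name"))  # never stored, so no null placeholder exists
```

Row-key 2 in the `name` family holds only one entry, which mirrors the point about not storing a column when no value exists for it.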

Internet of Things Basics (Part-7)

Connecting devices, getting raw data from multiple sources, and sending this data for analysis is a major concept in IoT. Devices can be connected through protocols. What is a protocol? I want to share some information on advanced IP-based protocols. Read my previous post on IoT. The role of IPv6: It is an advanced member of the range of Internet protocols. One of its main functions is support for mobility. This is not easily achievable with IPv4 for a number of reasons; however, MIPv6 described in RFC 3775, "Mobility Support in IPv6" (June 2004), among others, facilitates this task. RFC 3775 is known as the "MIPv6 base specification." RFCs are specifications and related materials published by the Internet Engineering Task Force (IETF). IPv6 mobility, specifically MIPv6, relies on IPv6 capabilities. RFC 3775 notes that without specific support for mobility in IPv6, packets destined to an MN would not be able to reach it while the MN is away from its home ne

Hadoop: How to find which file is healthy

Hadoop provides a file system health check utility called "fsck". It checks the health of all the files under a given path, including all the files under the root ('/'). bin/hadoop fsck / checks the health of all the files. bin/hadoop fsck /test/ checks the health of the files under that path. By default, the fsck utility takes no action on under-replicated and over-replicated blocks; Hadoop heals those blocks itself. How to find which file is healthy: fsck prints a dot for each healthy file. It prints a message for each file that is not healthy, including files with under-replicated, over-replicated, mis-replicated, and corrupted blocks. How to delete corrupted blocks: bin/hadoop fsck / -delete deletes the corrupted files under the path. bin/hadoop fsck -m

Top SAP HANA Iot must read Interview Questions(3 of 3)

Below is my third set of interview questions. In this lot I have given ten interview questions for your quick reference. 1) What is SAP HANA? SAP deployed SAP HANA as an integrated solution that combines software and hardware, which is frequently referred to as the SAP HANA appliance. As with SAP NetWeaver Business Warehouse Accelerator (SAP NetWeaver BW Accelerator), SAP partners with several hardware vendors to provide the infrastructure that is needed to run the SAP HANA software. Lenovo partnered with SAP to provide an integrated solution. 2) What is the memory-to-core ratio in SAP HANA? For in-memory computing appliances, such as SAP HANA, the amount of main memory is important. In-memory computing brings data that is kept on disk into main memory. This action allows for much faster processing of the data because the CPU cores do not have to wait until the data is loaded from disk to memory, which means each CPU is better used. SQLDBC: An SAP native database SDK that ca

SCALA in Web Development Read Now

What is Scala? Scala's design has been influenced by many programming languages and ideas in programming language research. In fact, only a few features of Scala are genuinely new; most have been already applied in some form in other languages. Scala's innovations come primarily from how its constructs are put together. At the surface level, Scala adopts a large part of the syntax of Java and C#, which in turn borrowed most of their syntactic conventions from C and C++. Expressions, statements, and blocks are mostly as in Java, as is the syntax of classes, packages, and imports. Besides syntax, Scala adopts other elements of Java, such as its basic types, its class libraries, and its execution model. Scala also owes much to other languages. Its uniform object model was pioneered by Smalltalk and taken up subsequently by Ruby. Its idea of universal nesting (almost every construct in Scala can be nested inside any o

Understand Data power why quality everyone wants

Information and data quality is a new line of service work for data-intensive companies. I have seen a dedicated Data Quality team not only in Analytics projects but also in Mainframe projects. How incorrect data impacts us: Information quality problems and their impact are all around us. A customer does not receive an order because of incorrect shipping information. Products are sold below cost because of wrong discount rates. A manufacturing line is stopped because parts were not ordered, the result of inaccurate inventory information. A well-known U.S. senator is stopped at an airport (twice) because his name is on a government "Do not fly" list. Many communities cannot run an election with results that people trust. Financial reform has created new legislation such as Sarbanes-Oxley. Incorrect data leads to many problems. The role of Data Science is to use quality data for effective decisions. What is information? Information is not simply data, strings of numbers, lis

IoT Architecture for very new developers: part 1 of 6

What is the architecture of the Internet of Things? The three-layer DCM classification is more about the IoT value chain than its system architecture at run time. I hope you enjoyed my previous post-5 on IoT. For system architecture, some have divided the IoT system into as many as nine layers, from bottom to top: devices, connectivity, data collection, communication, device management, data rules, administration, applications, and integration. Large companies such as IBM, Oracle, Microsoft, and others have comprehensive solutions, products, and services that cover almost the entire value chain. Broadly, IoT architecture can be classified into three layers: Device Layer, Communication Layer, and Management Layer. Device Layer: Devices or assets can be categorized into two groups: those that have inherent intelligence such as electric meters or heating, ventilation, and air-conditioning (HVAC) controllers, and those tha

Top Hive interview Questions for quick read (1 of 2)

Selected interview questions on Hive. Hive is a technology used in the Hadoop ecosystem. 1) What are the major activities in the Hadoop ecosystem? Within the Hadoop ecosystem, HDFS can load and store massive quantities of data in an efficient and reliable manner. It can also serve that same data back up to client applications, such as MapReduce jobs, for processing and data analysis. 2) What is the role of Hive in the Hadoop ecosystem? Hive, often considered the Hadoop data warehouse platform, got its start at Facebook as its analysts struggled to deal with the massive quantities of data produced by the social network. Requiring analysts to learn and write MapReduce jobs was neither productive nor practical. 3) What is Hive in Hadoop? Facebook developed a data warehouse-like layer of abstraction that would be based on tables. The tables function merely as metadata, and the table schema is projected onto the data, instead of actually moving potentially massive set

SAP HANA: Top Data Processing Interview Questions

1. How is parallel processing achieved in SAP HANA? The phrase "divide and conquer" (derived from the Latin saying divide et impera) typically is used when a large problem is divided into a number of smaller, easier-to-solve problems. Regarding performance, processing huge amounts of data is a problem that can be solved by splitting the data into smaller chunks of data, which can be processed in parallel. 2. How is data partitioned in SAP HANA? Although servers that are available today can hold terabytes of data in memory and provide up to eight processors per server with up to 10 cores per processor, the amount of data that is stored in an in-memory database or the computing power that is needed to process such quantities of data might exceed the capacity of a single server. To accommodate the memory and computing power requirements that go beyond the limits of a single server, data can be divided into subsets and placed across a cluster of servers, which forms a d
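The divide-and-conquer idea described above can be sketched in a few lines: split the data into chunks, process each chunk independently, then combine the partial results. This is a generic illustration and not SAP HANA code; the helper names `split` and `parallel_sum` are hypothetical. A thread pool is used only to keep the sketch short, so it shows the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Divide data into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, n_chunks=4):
    """Sum each chunk independently, then combine the partial sums."""
    chunks = split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)

total = parallel_sum(list(range(1, 101)))
print(total)
```

The same shape applies when the chunks live on different servers: each server computes its partial result and a coordinator combines them.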

Big Data: Top Hadoop Interview Questions (4 of 5)

1) What is a MapReduce program?
- You need to give the actual steps in this program
- You have to write scripts and code
2) What is MapReduce?
- MapReduce is a data processing model
- It is a combination of two parts: Mappers and Reducers
3) What happens in the mapping phase? It takes the input data and feeds each data element into the mapper.
4) What is the function of the Reducer? The reducer processes all outputs from the mappers and arrives at a final result.
5) What kind of input is required for MapReduce? It should be structured in the form of (Key, Value) pairs.
6) What is HDFS? HDFS is a file system designed for large-scale data processing under frameworks such as MapReduce.
7) Is HDFS like UNIX? No, but commands in HDFS work similarly to UNIX commands.
8) What is a simple file command? hadoop fs -ls
9) How do you copy data into the HDFS file system? Copy a file into HDFS from the local system.
10) What is the default working directory in HDFS? /user/$USER $USER ==> Your log
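The mapper and reducer roles described in the questions above can be illustrated with a word count in plain Python. This is a single-process sketch of the programming model, not Hadoop's API; the function names and sample lines are made up for illustration.

```python
from collections import defaultdict

def mapper(line):
    """Emit a (key, value) pair for every word in one input element."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group all values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Combine all values for one key into a final result."""
    return key, sum(values)

lines = ["big data big", "data hadoop"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the mapper calls run on many machines in parallel and the grouping step moves data across the network, but the (Key, Value) contract is the same.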

Big Data: IBM InfoSphere BigInsights Basics

Here I explain why you need IBM InfoSphere. You probably already know what the file system in Hadoop is. Hadoop is a distributed file system and data processing engine that is designed to handle extremely high volumes of data in any structure. In simpler terms, just imagine that you've got dozens, or even hundreds (or thousands!) of individual computers racked and networked together. Each computer (often referred to as a node in Hadoop-speak) has its own processors and a dozen or so 2TB or 3TB hard disk drives. All of these nodes are running software that unifies them into a single cluster, where, instead of seeing the individual computers, you see an extremely large volume where you can store your data. The beauty of this Hadoop system is that you can store anything in this space: millions of digital image scans of mortgage contracts, days and weeks of security camera footage, trillions of sensor-generated log records, or all of the operator transcription notes from a call center

Big Data: Top Cloud Computing Interview Questions (1 of 4)

Below are frequently asked interview questions on Cloud computing:
1) What is the difference between Cloud and Grid?
Grid:
- Information service
- Security service
- Data management
- Execution management
Cloud:
- Maintains up-to-date information on resources
- Creates VMs according to user requirements
- Application deployment
- User management
2) What are the different cloud standards?
- Interoperability standards
- Security standards
- Portability standards
- Governance and risk standards
3) What are the two different sub-systems in Cloud computing?
- Management sub-system
- Resource sub-system
4) What is Cloud computing? The promise of cloud computing is ubiquitous access to a broad set of applications and services, which are delivered over the network to multiple customers.
5) Why do we need a specialized network for Cloud services? The public Internet is the simplest choice for delivering cloud-based services. In this model, the cloud provider simply purchases Inter

Internet of Thing Awesome Basics You Need to Read Now: Part 5

The Internet of Things can be applied both vertically and horizontally: Applications of the Internet of Things (IoT) have spread across an enormously large number of industry sectors. The development of the vertical applications in these sectors is unbalanced. It is very important to sort out those vertical applications and identify common underpinning technologies that can be used across the board, so that interconnecting, interrelating, and synergized grand integration and new creative, disruptive applications can be achieved. One of the common characteristics of the Internet of Things is that objects in an IoT world have to be instrumented. IoT is needed because of a fundamental change in the way information is generated, from mostly manual input to massively machine-generated without human intervention. To achieve such 5A (anything, anywhere, anytime, anyway, anyhow) and 3I (instrumented, interconnected, and intelligent) capabilities, some common, horizontal,

SAP HANA In-memory Real Usage

Below is a list of questions on SAP HANA in-memory computing that explain its real usage. 1. What is in-memory computing? A1) In-memory computing is a technology that allows the processing of massive quantities of data in main memory to provide immediate results from analysis and transactions. The data that is processed is ideally real-time data (that is, data that is available for processing or analysis immediately after it is created). 2. How does in-memory computing work? A2) Keep data in main memory to speed up data access. Minimize data movement by using the columnar storage concept, compression, and performing calculations at the database level. Divide and conquer. Use the multi-core architecture of modern processors and multi-processor servers (or even scale out into a distributed landscape) to grow beyond what can be supplied by a single server. 3. What is the benefit of keeping data in memory? A3) Accessing data from main memory is much faster than accessing data from disk. 4.

Big Data: Top NoSQL Interview Questions (2 of 5)

1) What is the most important characteristic of NoSQL? High availability.
2) What are the different types of NoSQL databases? Key-value stores, column stores, graph stores, and document stores.
3) What is Oracle NoSQL Database? Oracle NoSQL Database is a distributed key-value database designed to provide highly reliable, scalable, and available data storage across a configurable set of systems.
4) What is the DB engine used in Oracle NoSQL Database? Oracle NoSQL Database uses Oracle Berkeley DB Java Edition as the underlying data storage engine.
5) What is Oracle NoSQL Database? Oracle NoSQL Database is a shared-nothing system designed to run and scale on commodity hardware. Key-value pairs are hash partitioned across server groups known as shards. At any point in time, a single key-value pair is always associated with a unique shard in the system.
6) What are the unique features of Oracle NoSQL? Oracle NoSQL Database leverages the high availability features in Berkeley DB in order to provide res
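Hash partitioning of key-value pairs across shards, as mentioned in question 5 above, can be sketched as follows. This is a generic illustration under stated assumptions: the MD5-based hash, the `shard_for` helper, and the sample keys are all hypothetical and are not Oracle NoSQL Database's actual partitioning scheme.

```python
import hashlib

def shard_for(key, n_shards):
    """Map a key to exactly one shard using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

# Every key lands on one shard, and the same key always lands on the same shard.
n_shards = 3
placement = {key: shard_for(key, n_shards) for key in ["user:1", "user:2", "order:9"]}
print(placement)
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) is what keeps a key associated with a single shard across restarts.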

10 Top NoSQL Database Recently Asked Interview Questions

1) Who was involved in developing NoSQL? Amazon and Google, through their published papers.
2) What is NoSQL? NoSQL is used with non-relational databases. With NoSQL you can query data from non-relational databases, such as columnar databases.
3) What are the unique features of NoSQL databases?
- No relationships are needed between records
- Unstructured data
- They store data whose individual records do not have a relationship with each other
4) How are NoSQL databases faster than a traditional RDBMS?
- They store the database on multiple servers, rather than storing the whole database on a single server
- By adding replicas on other servers, we can retrieve data faster, even if one of the servers crashes
5) What are the UNIQUE features of NoSQL?
- Open source
- ACID compliant
6) What are the characteristics of a good NoSQL product?
- High availability: Fault tolerance when a single server goes down
- Disaster recovery: For when a data center goes down, or more likely, someone digs up a network cable just outside the data center
- Support: Someone to st