Posts

Showing posts from December, 2016

Featured Post

SQL Interview Success: Unlocking the Top 5 Frequently Asked Queries

Here are the five SQL queries most commonly asked in interviews. You can expect them in Data Analyst or Data Engineer interviews.

Top SQL Queries for Interviews

01. Joins

A common question gives you two tables and asks how many rows each join type returns, and what the result is.

Table1
--------
id
----
1
1
2
3

Table2
--------
id
----
1
3
1
NULL

Output
-------
Inner join
---------------
5 rows will return. The result will be:
===============
1    1
1    1
1    1
1    1
3    3

(Each of the two 1s in Table1 matches both 1s in Table2, giving 4 rows. The 3 matches once. The 2 and the NULL match nothing.)

02. Substring and Concat

Here, we need to write an SQL query that converts the first letter to upper case and the remaining letters to lower case.

Table1
------
ename
=====
raJu
venKat
kRIshna

Solution:
==========
SELECT CONCAT(UPPER(SUBSTRING(ename, 1, 1)), LOWER(SUBSTRING(ename, 2))) AS capitalized_name
FROM Table1;

03. Case statement

SQL Query
=========
SELECT Code1, Code2,
     CASE
        WHEN Code1 = 'A' AND Code2 = 'AA' THEN "A" | "A
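To check the row counts above yourself, here is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory database. Assumptions: SQLite stands in for whatever database the interview targets. SQLite writes string concatenation as || and uses substr() (older versions have no CONCAT), so the capitalization query is a SQLite translation of the solution, not the exact text above. The helper names join_rows and capitalize_names are hypothetical.

```python
import sqlite3

TABLE1_IDS = [1, 1, 2, 3]
TABLE2_IDS = [1, 3, 1, None]  # None is stored as SQL NULL


def join_rows(join_type):
    """Run Table1 <join_type> JOIN Table2 ON id and return the result rows."""
    if join_type not in ("INNER", "LEFT"):
        raise ValueError("only INNER and LEFT are supported in this sketch")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER)")
    conn.execute("CREATE TABLE Table2 (id INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(i,) for i in TABLE1_IDS])
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(i,) for i in TABLE2_IDS])
    rows = conn.execute(
        f"SELECT t1.id, t2.id FROM Table1 t1 {join_type} JOIN Table2 t2 "
        "ON t1.id = t2.id ORDER BY t1.id"
    ).fetchall()
    conn.close()
    return rows


def capitalize_names(names):
    """Upper-case the first letter and lower-case the rest, done in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ename TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT upper(substr(ename, 1, 1)) || lower(substr(ename, 2)) "
        "AS capitalized_name FROM Table1 ORDER BY rowid"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


inner = join_rows("INNER")
print(len(inner), inner)  # 5 [(1, 1), (1, 1), (1, 1), (1, 1), (3, 3)]
print(len(join_rows("LEFT")))  # 6: the unmatched 2 comes back as (2, None)
print(capitalize_names(["raJu", "venKat", "kRIshna"]))  # ['Raju', 'Venkat', 'Krishna']
```

NULL never compares equal to anything, not even another NULL, which is why the NULL row in Table2 never shows up in the inner join.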

Storage area network: Quick Definition

SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear to the operating system as locally attached devices. A SAN typically has its own network of storage devices that are generally not accessible through the local area network (LAN) by other devices. The cost and complexity of SANs dropped in the early 2000s to levels allowing wider adoption across both enterprise and small to medium-sized business environments.

Best Uses of Storage Area Networks

A SAN does not provide file abstraction, only block-level operations. However, file systems built on top of SANs do provide file-level access; these are known as shared-disk file systems.

More to read:
Best SAN Storage area networks acronyms
Top 20 benefits of SAN Storage area networks

8 Useful Books to Change Your Mindset

Here are eight useful books that can change your mindset.

1. 10 Percent Entrepreneur

Everyone knows that building a startup means hard work and long hours, with payment in stock that may turn out to be worthless. Indeed, that’s part of the glamour. But it also keeps some people with good ideas from getting started. Patrick J. McGinnis, a Wall Street venture capitalist, says don’t worry: You can “live your startup dream without leaving your day job.” Devote 10 percent of your time and capital to pursuing your dream, McGinnis says, and you can keep your job and the security that goes with it. McGinnis, who identifies himself as a 10-percenter, provides a detailed plan to identify a promising first project. He shows how to invest resources in a savvy way, and how to develop something you love doing into a business. Best of all, until you reach your dream of an independent business, McGinnis promises you will perform better at your day job with a step-by-step pl

SAN: Real Architecture Explained

A SAN is connected behind the servers. SANs provide block-level access to shared data storage. Block-level access means reading and writing specific blocks of data on a storage device, as opposed to file-level access. One file will contain several blocks.

Storage Area Networks (SANs)

SANs provide high availability and robust business continuity for critical data environments. SANs are typically switched fabric architectures using Fibre Channel (FC) for connectivity. The term switched fabric refers to each storage unit being connected to each server via multiple SAN switches (the larger, highly redundant ones are called SAN directors), which provide redundancy within the paths to the storage units. This provides additional communication paths and eliminates a single central switch as a single point of failure.

Ethernet has many advantages similar to Fibre Channel for supporting SANs. Some of these include high speed, support of a switched fabric topology, widespread interoperability, and a large set of management tools. In a st
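Block-level access can be sketched in a few lines of Python: a block is addressed purely by its number (byte offset = block number × block size), with no notion of file names, and a file simply spans however many blocks its size requires. This is an illustration only: a temporary file stands in for a raw SAN volume, and the 4096-byte block size and the helper names are assumptions, not values from any particular SAN.

```python
import math
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes; real volumes vary


def blocks_for_file(size_bytes, block_size=BLOCK_SIZE):
    """How many fixed-size blocks a file of size_bytes occupies (the last may be partial)."""
    return math.ceil(size_bytes / block_size)


def read_block(volume_path, block_number, block_size=BLOCK_SIZE):
    """Read one block by number: seek to number * size, knowing nothing about files."""
    with open(volume_path, "rb") as volume:
        volume.seek(block_number * block_size)
        return volume.read(block_size)


# A temporary file plays the role of a block device: three blocks filled with A, B, C.
with tempfile.NamedTemporaryFile(delete=False) as fake_volume:
    for marker in (b"A", b"B", b"C"):
        fake_volume.write(marker * BLOCK_SIZE)
    volume_path = fake_volume.name

print(blocks_for_file(10_000))         # 3
print(read_block(volume_path, 1)[:4])  # b'BBBB'
os.remove(volume_path)
```

A file system layered on top (such as the shared-disk file systems mentioned earlier) is what maps a file name to the list of block numbers; the SAN itself only serves the blocks.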

Storage area network (SAN): Networks Vs Configurations

These are the most popular terms used in the storage area network field. Every developer should know them clearly, and they are highly useful to explain in interviews. Frequently used SAN terminology is given below for your quick reference.

SAN Vs NAS Benefits, Differences

SANs are particularly helpful in backup and disaster recovery.  Within a SAN, data can be transferred from one storage device to another without interacting with a server. 

19 Top Unix File Scenario Commands

An ETL developer's main task is to browse various flat files before starting testing. File browsing in UNIX is tricky; if you know the right command, you can save a lot of time. These 19 top UNIX file commands are useful in your project.

In UNIX, a file normally has a header, detail records, and a trailer. There are scenarios where you need only the details without the header and trailer, need only the most recent record, or need to skip some records from the input file. For all these file-based scenarios, I have given useful UNIX commands.

1). How to print/display the first line of a file?

There are many ways to do this. However, the easiest way to display the first line of a file is the [head] command.

$> head -1 file.txt

If you specify [head -2], it prints the first 2 records of the file.

Another way is the [sed] command. [sed] is a very powerful stream editor that can be used for various text manipulation purposes like this.

$> sed '2,$ d
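The header/detail/trailer scenarios described above can also be expressed as plain list slicing, which is handy for checking what the UNIX commands should print. This is a minimal Python sketch, not the post's commands: the function names and the sample file layout (HDR/D/TRL records) are hypothetical, and the comments name the shell command each function roughly corresponds to.

```python
import os
import tempfile


def read_records(path):
    """Read a flat file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def first_record(lines):
    """Header line only, like `head -1 file.txt`."""
    return lines[:1]


def detail_records(lines):
    """Drop the first (header) and last (trailer) lines, like `sed '1d;$d' file.txt`."""
    return lines[1:-1]


def last_record(lines):
    """Last line only, like `tail -n 1 file.txt`."""
    return lines[-1:]


def skip_records(lines, n):
    """Drop the first n lines (for n = 2, like `tail -n +3 file.txt`)."""
    return lines[n:]


# Hypothetical flat file: one header, two detail records, one trailer.
sample = "HDR|20161201\nD|1001|raju\nD|1002|venkat\nTRL|2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write(sample)
    path = f.name

lines = read_records(path)
print(first_record(lines))     # ['HDR|20161201']
print(detail_records(lines))   # ['D|1001|raju', 'D|1002|venkat']
print(last_record(lines))      # ['TRL|2']
print(skip_records(lines, 2))  # ['D|1002|venkat', 'TRL|2']
os.remove(path)
```

Reading the whole file into memory keeps the sketch short; for very large flat files the shell commands, which stream line by line, are the better tool.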

4 Top Data Mining Tools

Many data mining tools are available; the ones listed here are the top free tools useful for development.

4 Top Data Mining Tools

1. RapidMiner (formerly YALE)

This is very popular since it is ready-made, open-source, no-coding-required software that provides advanced analytics. Written in Java, it incorporates multifaceted data mining functions such as data preprocessing, visualization, and predictive analysis, and it can be easily integrated with WEKA and R to build models directly from scripts written in those two.

2. WEKA

This is a Java-based, customizable tool that is free to use. It includes visualization, predictive analysis and modeling techniques, clustering, association, regression, and classification.

3. R-Programming Tool

This is written in C and FORTRAN and allows data miners to write scripts just as in a programming language/platform. Hence, it is used to build statistical and analytical software for data mining. It supports graphical analysis, both linear and nonlinea

New fresh best Daily Python tips to your Inbox

Thanks for reading www.biaganalytics.me. Just add your details to get daily Python tips in your inbox.

Below are the 6 top benefits of Python. Alternatively, look for the best Python training.

1. Presence of Third Party Modules
2. Extensive Support Libraries
3. Open Source and Community Development
4. Learning Ease and Support Available
5. User-friendly Data Structures
6. Productivity and Speed

Subscribe to daily Python tips

Data mining Real life Examples

Data mining is the process of analyzing otherwise unused data to get insights from it. A quick tutorial and examples help you get good at this process. A good example is the business use case of mining backup data for useful information. Backup data is simply wasted unless a restore is required; it should be leveraged for other, more important things, and mining it is an application of the data mining technique.

For example, can you tell me how many instances of any single file are being stored across your organization? Probably not. But if it’s being backed up to a single-instance repository, the repository stores a single copy of that file object, and the repository's index holds the links and metadata about where the file came from and how many redundant copies exist. By simply providing a search function into the repository, you would instantly be able to find out how many duplicate copies exist for every file you are backing up, and where they are coming from. Know
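The single-instance repository idea can be sketched in Python: hash each backed-up file's content, store one copy per distinct hash, and keep an index of every path that produced that hash, so "how many copies exist, and where from?" becomes a simple lookup. Assumptions: identical content is detected by identical SHA-256 digests (content-based deduplication), the file list is hypothetical, and real backup products use their own chunking and index formats.

```python
import hashlib
from collections import defaultdict


def build_single_instance_index(backups):
    """backups: iterable of (source_path, content_bytes) pairs.

    Keeps one stored copy per distinct content (keyed by its SHA-256 digest)
    and records every source path that content was backed up from.
    """
    store = {}                   # digest -> the single stored copy
    sources = defaultdict(list)  # digest -> paths the content came from
    for path, content in backups:
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)
        sources[digest].append(path)
    return store, sources


def duplicate_report(sources):
    """Search the index: for content backed up more than once, return (copy count, paths)."""
    return {d: (len(paths), paths) for d, paths in sources.items() if len(paths) > 1}


# Hypothetical backup set: the same policy document saved in three places.
backups = [
    ("hr/policy.pdf", b"travel policy v1"),
    ("sales/policy.pdf", b"travel policy v1"),
    ("it/old/policy_copy.pdf", b"travel policy v1"),
    ("sales/q4_forecast.xlsx", b"q4 numbers"),
]
store, sources = build_single_instance_index(backups)
print(len(store))  # 2 distinct objects stored for 4 backed-up files
for count, paths in duplicate_report(sources).values():
    print(count, paths)  # 3 ['hr/policy.pdf', 'sales/policy.pdf', 'it/old/policy_copy.pdf']
```

Note that the duplicates are found by content, not by name: the copy saved as policy_copy.pdf is still counted.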

SAN Storage: All about its 4 Real Usages

To understand storage area network fundamentals, you must understand the applications they serve. These may be horizontal applications (e.g., backup, archiving, data replication, disaster protection, and data warehousing) or vertical applications (e.g., online transaction processing (OLTP), enterprise resource planning (ERP) business applications, electronic commerce, broadcasting, prepress, medical, and geophysics). SAN is also well suited to making performance and high availability more scalable and more affordable in applications such as clustering and data sharing. This article discusses two major horizontal applications, backup and data sharing, and how they interact with SAN.

The other important point: if you are a job seeker, the list below is helpful. It works as a one-time SAN interview refresher, so you can do well in interviews.

1. Realtime (or window-less) backup

The importance of window-less backup (also called hot backup) becomes obvious when it a

20 Top Benefits of SAN (storage area network)

In my previous post I covered the fundamentals of SAN (storage area networks). Below is the list of the top 20 benefits of storage area networks.

What are the major benefits of SAN?

Greater performance: Current Fibre Channel SANs allow connection to disks at hundreds of megabytes per second; the near future will see speeds in multiple gigabytes to terabytes per second.
Increased disk utilization: SANs enable more than one server to access the same physical disk, which lets you allocate the free space on those disks more effectively.
Higher availability to storage by use of multiple access paths: A SAN allows for multiple physical connections to disks from a single or multiple servers.
Deferred disk procurement: That’s business-speak for not having to buy disks as often as you used to before getting a SAN. Because you can use disk space more effectively, less space goes to waste.
Reduced data center rack/floor space: Because you don’t need to buy big servers with room for lot
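The disk-utilization and deferred-procurement points come down to simple arithmetic: with direct-attached disks, free space is stranded on whichever server owns it, while a SAN lets servers draw on pooled free space. The Python sketch below uses made-up capacities for three hypothetical servers and ignores RAID overhead, LUN sizing, and growth reserves, so treat it as an illustration, not a sizing method.

```python
# Hypothetical numbers: three servers, each with 1000 GB of disk.
capacity_gb = {"app": 1000, "db": 1000, "web": 1000}
used_gb = {"app": 900, "db": 300, "web": 500}


def must_buy_disk_direct_attached(server, request_gb):
    """Without a SAN, a server can only use the free space on its own disks."""
    return capacity_gb[server] - used_gb[server] < request_gb


def must_buy_disk_pooled(request_gb):
    """With a SAN, any server can be allocated space from the shared pool."""
    return sum(capacity_gb.values()) - sum(used_gb.values()) < request_gb


def utilization():
    """Fraction of total purchased capacity actually in use."""
    return sum(used_gb.values()) / sum(capacity_gb.values())


# The app server needs 200 GB more.
print(must_buy_disk_direct_attached("app", 200))  # True: only 100 GB free locally
print(must_buy_disk_pooled(200))                  # False: 1300 GB free in the pool
print(f"{utilization():.0%}")                     # 57%
```

In the direct-attached case a new disk is bought even though 1300 GB sits idle on the other two servers; pooling is what lets that purchase be deferred.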

SAN real configuration ideas to speed the devices

In today’s terms, the technical description of a SAN (Storage Area Network) is a collection of computers and storage devices, connected over a high-speed optical network and dedicated to the task of storing and protecting data. In a nutshell, you use a SAN to store and protect data. A SAN uses the SCSI (Small Computer System Interface) and FC (Fibre Channel) protocols to move data over a network and store it directly to disk drives in block format.

Photo credit: Srini

SAN Configuration

Today, that high-speed network usually consists of fiber-optic cables and switches that use light waves to transmit data with a connection protocol known as Fibre Channel. (A protocol is a set of rules used by computer devices to define a common communication language.) More and more, regular Internet Protocol (IP)-based corporate networks, and even the Internet, are being used as the network part of a SAN. Nowadays the Internet is part of the SAN.

IP Networks

IP networks that

R Language: Data types and structures

To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on them. They are very important to understand because these are the things you will manipulate on a day-to-day basis in R. Everything in R is an object.

The basic data types
logical (e.g., TRUE, FALSE)
integer (e.g., 2L, as.integer(3))
numeric (real or decimal) (e.g., 2, 2.0, pi)
complex (e.g., 1 + 0i, 1 + 4i)
character (e.g., "a", "swc")

The basic data structures in R
vector
list
matrix
data frame
factors
tables

Vector in R
A vector is the most common and basic data structure in R and is pretty much the workhorse of R. Vectors can be of two types:
atomic vectors
lists

R language five useful real functions

In data science, the R language plays a crucial role. This post explains five useful R functions.

1. Storing Values

Stores a value in a variable. The value can be of the same or mixed data types. Comments in R scripts start with #. Character, double (numeric), logical, and integer are the most frequently used data types.

2. Reading data from files

Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible. There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors or Perl, to fit in with the requirements of R. Generally this is very simple. If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire data frame can be read directly w

How to write R Script in simple way

A script is a good way to keep track of what you're doing. If you have a long analysis and want to be able to recreate it later, a good idea is to type it into a script. If you're working in the Windows R GUI (also in the Mac R GUI), there is even a built-in script editor.

Photo credit: Srini

To get to it, pull down the File menu and choose New Script (New Document on a Mac). A window will open in which you can type your script. An R script is a series of commands that you can execute at one time, which saves a lot of time. The script is just a plain text file with R commands in it.

How to create an R Script

You can prepare a script in any text editor, such as vim, TextWrangler, or Notepad. You can also prepare a script in a word processor, like Word, Writer, TextEdit, or WordPad, PROVIDED you save the script in plain text (ASCII) format. This should (!) append a ".txt" file extension to the file. Drop the script into your working directory, and

Here's to Know Data lake Vs Database

In a data lake, data is stored internally in a repository. You can call it a blob. The data in the lake has no format, but a database needs a schema.

Database

In a database, you need the schema definition before you store data in it. It should follow Codd's rules. Here the data is completely formatted. The data is stored in tables, so you need SQL to read the records. It scales poorly compared with a data lake.

Data lake

It doesn't have any format; it's just a dump. You can send this dump to the Hadoop repository for data analysis. This repository can be incremental, and you can build a database from it. Because the data lake is a dump of unformatted data, it needs pre-formatting before it is sent for analytics. Data security and encryption: you need these before you send data to Hadoop. In real time, you need to pre-process the data and send it to the data warehouse to get insights.
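The schema difference can be sketched in Python with the standard-library json and sqlite3 modules: the "lake" keeps raw records as-is and applies structure only when they are read (schema-on-read), while the database declares its table first and rejects rows that don't fit (schema-on-write). The records and the payments table are hypothetical, and a Python list stands in for Hadoop or other lake storage.

```python
import json
import sqlite3

# Hypothetical raw events with inconsistent shapes.
raw_events = [
    '{"customer": "raju", "amount": 120}',
    '{"customer": "venkat", "amount": "75", "coupon": "X1"}',
    '{"customer": "krishna"}',
]

# Data lake: dump the records unchanged; no schema is needed to store them.
lake = list(raw_events)


def total_from_lake(records):
    """Schema-on-read: parse and interpret each record only at analysis time."""
    total = 0
    for line in records:
        record = json.loads(line)
        total += int(record.get("amount", 0))  # the reader decides how to handle gaps
    return total


def load_into_database(records):
    """Schema-on-write: the table is defined first; rows that violate it are rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments (customer TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    rejected = 0
    for line in records:
        record = json.loads(line)
        try:
            # Fields not in the schema (e.g. "coupon") are simply not stored.
            conn.execute(
                "INSERT INTO payments VALUES (?, ?)",
                (record.get("customer"), record.get("amount")),
            )
        except sqlite3.IntegrityError:  # NOT NULL constraint failed: missing amount
            rejected += 1
    loaded, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone()
    conn.close()
    return loaded, rejected, total


print(total_from_lake(lake))           # 195
print(load_into_database(raw_events))  # (2, 1, 195)
```

SQLite quietly converts the string "75" to an integer because of the column's INTEGER type affinity; a stricter database might reject that row as well, which only sharpens the schema-on-write contrast.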