27 April 2015

What Is Tableau Software

All about Tableau
What is Tableau:
Tableau Software has its roots in the Stanford University Computer Science department, in a Department of Defense–sponsored research project aimed at increasing people's ability to rapidly analyze data. Chris Stolte, a Ph.D. candidate, was researching visualization techniques for exploring relational databases and data cubes.

Stolte's Ph.D. advisor, Professor Pat Hanrahan, a founding member of Pixar and chief architect of Pixar's RenderMan, was a world-renowned expert in the science of computer graphics. Chris, Pat, and a team of Stanford Ph.D.s realized that computer graphics could deliver huge gains in people's ability to understand databases. Their invention, VizQL™, brought together these two computer science disciplines for the first time.

Also Read | Tableau 9 for Data Science Engineers

VizQL lets people analyze data just by building drag-and-drop pictures of what they want to see. With Christian Chabot on board as CEO, the company was spun out of Stanford in 2003.


While Tableau 8 improves on the previous seven major releases of the software, the core approach to visual design remains the same: connect to a desired data source, and drag various data fields to desired parts of the Tableau screen. The result is a basic visualization that can then be enhanced and modified by dragging additional data fields to different destinations in the workspace.


Beyond this basic visualization approach, Tableau's Show Me feature allows quick choices of predefined visualizations by just selecting relevant data fields and clicking a thumbnail. For more advanced requirements, Tableau features a complete formula language, as well as more robust data connection options.

Also Read | The best Tableau Useful Commands

When you first start Tableau, you are presented with the Start Page. The largest portion of the Start Page is reserved for thumbnails of recent workbooks you have used. Simply click any one of these to open the workbook (like Microsoft Excel, Tableau stores its work on your disk drive in a workbook, which uses a .TWB or .TWBX file extension). You may also open sample workbooks included with Tableau 8 by clicking the desired thumbnail at the bottom of the Start Page.


If you want to create a new workbook, you must first connect to a data source (types of data sources Tableau works with include industry standard databases such as Oracle or Microsoft SQL Server, Microsoft Excel spreadsheets, text files, and so forth). Unlike spreadsheet or word processing programs, Tableau must connect to some existing data before you can create a visualization.


Certain data sources, known as saved data sources, will appear on the left side of the Start Page. These "pointers" to an existing data source can be selected by simply clicking them. If you want to connect to a different data source, click the Connect to Data tab (the tab with the "barrel" icon) in the upper right, or click Connect to Data in the upper left under the Data section. Once you've connected to a data source, a new workspace will appear where you can drag and drop desired data fields.

Also Read | All about Tableau Self-service tool for Data Visualization

19 April 2015

1000 SQL Queries For Practice (Part-1)

Welcome to the SQL tutorial. As I mentioned in my previous posts, learning SQL takes a lot of practice rather than just reading.

In this series of posts, I am sharing the best SQL examples for practice, so that you can become an SQL professional with these tutorials.

Always look for SQL/PL-SQL jobs; there is great demand ahead for people who have hands-on experience.

Download Free 1000 SQL Queries 
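
As a warm-up before you work through the downloaded set, here is a minimal sketch of running a couple of practice queries from Java over JDBC. The connection URL, credentials, and the classic EMP practice table used below are placeholders and assumptions; point them at whatever practice database you have.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqlPractice {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details -- replace with your own practice database.
            String url = "jdbc:oracle:thin:@localhost:1521:xe";
            try (Connection conn = DriverManager.getConnection(url, "scott", "tiger");
                 Statement stmt = conn.createStatement()) {

                // Practice query 1: employees earning above the average salary.
                ResultSet rs = stmt.executeQuery(
                    "SELECT ename, sal FROM emp WHERE sal > (SELECT AVG(sal) FROM emp)");
                while (rs.next()) {
                    System.out.println(rs.getString("ename") + " " + rs.getDouble("sal"));
                }

                // Practice query 2: number of employees per department.
                rs = stmt.executeQuery(
                    "SELECT deptno, COUNT(*) AS cnt FROM emp GROUP BY deptno");
                while (rs.next()) {
                    System.out.println(rs.getInt("deptno") + " -> " + rs.getInt("cnt"));
                }
            }
        }
    }

Running this also requires the JDBC driver for your database on the classpath; the queries themselves are the part worth practising.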




15 April 2015

Overview Of Cloud Standards

Cloud computing is slowly becoming a reality, so it has to address many concerns, such as security, interoperability, portability and governance, at the earliest opportunity. This can be accelerated by compliance with guidelines and standards defined in consensus by cloud providers. Without addressing these concerns, users would be wary of treading this path in spite of its powerful economic model for business computing.

    Also read: More cloud computing topics

Interoperability/integration - interoperability enables products/software components to work with or integrate with each other seamlessly, in order to achieve a desired result. Thus, it provides the flexibility and choice to use multiple products to meet our needs. This is enabled either by integrating through standard interfaces or by means of a broker that converts one product interface to another.
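
As a rough illustration of the broker approach, the sketch below (all interface and class names here are hypothetical, not taken from any particular product) adapts one component's interface to another's so the two can interoperate without either one being modified.

    // Hypothetical interfaces for two products that know nothing about each other.
    interface LegacyReportService {        // what the consuming application expects
        String fetchReport(String reportId);
    }

    interface CloudAnalyticsApi {          // what the cloud provider actually offers
        byte[] downloadReport(String id, String format);
    }

    // The broker converts calls on one product interface into calls on the other.
    class ReportBroker implements LegacyReportService {
        private final CloudAnalyticsApi provider;

        ReportBroker(CloudAnalyticsApi provider) {
            this.provider = provider;
        }

        @Override
        public String fetchReport(String reportId) {
            // Translate both the call and the result format between the two interfaces.
            byte[] raw = provider.downloadReport(reportId, "csv");
            return new String(raw, java.nio.charset.StandardCharsets.UTF_8);
        }
    }

The same idea scales up to standards-based integration: as long as both sides honour an agreed interface, products can be mixed and swapped.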

Security - security involves the protection of information assets through various policies, procedures and technologies, which need to adhere to standards and best practices in order to achieve the desired level of security.

    For example, Payment Card Industry (PCI) data security standards from PCI SSC define ways to secure credit card data to avoid fraud. This is applicable to all organisations that hold, process or pass credit cardholder information.

Portability - Software is said to be portable when the cost of porting it from the platform for which it was originally developed to a new platform is less than the cost of rewriting it for the new platform. Software with good portability thus avoids vendor lock-in.

      This is typically achieved by adhering to standard interfaces defined between the software component and vendor platforms. For example, Java programs are said to be portable across operating systems (OS) that adhere to the standard interfaces defined between the Java runtime environment and the OS.
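
As a small, hedged illustration of relying on standard interfaces rather than platform specifics, the snippet below builds a file path through the standard java.nio.file API instead of hard-coding an OS-specific separator, so the same program runs unchanged on Windows or Linux (the directory names are made up for the example).

    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class PortablePath {
        public static void main(String[] args) {
            // The standard API supplies the correct separator ("\" or "/") for the host OS.
            Path config = Paths.get("data", "reports", "2015", "summary.csv");
            System.out.println("Resolved path: " + config.toAbsolutePath());
            System.out.println("Running on: " + System.getProperty("os.name"));
        }
    }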

Governance, Risk management and Compliance (GRC) - governance focuses on ensuring that the enterprise adheres to defined policies and processes. Risk management puts in place controls to manage and mitigate risks as defined by the enterprise.

      Compliance ensures that the enterprise adheres to various legal/legislative as well as internal policies. Standards have been defined for IT systems to comply with industry as well as legal requirements such as Sarbanes-Oxley (SOX) and the Health Insurance Portability and Accountability Act (HIPAA).

12 April 2015

Oozie - Concepts And Architecture

Oozie is a workflow/coordination system that you can use to manage Apache Hadoop jobs. One of the main components of Oozie is the Oozie server, a web application that runs in a Java servlet container (the standard Oozie distribution uses Tomcat). The server supports reading and executing Workflows, Coordinators, Bundles, and SLA definitions. It implements a set of remote Web Services APIs that can be invoked from Oozie client components and third-party applications.
The execution of the server leverages a customizable database.

This database contains Workflow, Coordinator, Bundle, and SLA definitions, as well as execution states and process variables. The list of currently supported databases includes MySQL, Oracle, and Apache Derby.

The Oozie shared library component is located in the Oozie HOME directory and contains code used during Oozie execution.

Oozie provides a command-line interface (CLI) that is based on a client component, which is a thin Java wrapper around Oozie Web Services APIs. These APIs can also be used from third-party applications that have sufficient permissions.
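
As a minimal sketch of that Java client in action (assuming an Oozie server at http://localhost:11000/oozie and a workflow application already deployed to HDFS; the paths and host names below are placeholders), submitting and checking a Workflow job looks roughly like this:

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    public class SubmitWorkflow {
        public static void main(String[] args) throws Exception {
            // Thin Java wrapper around the Oozie Web Services APIs.
            OozieClient client = new OozieClient("http://localhost:11000/oozie");

            // Job configuration; the HDFS paths and hosts are placeholders.
            Properties conf = client.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/my-wf");
            conf.setProperty("nameNode", "hdfs://namenode:8020");
            conf.setProperty("jobTracker", "jobtracker:8021");

            // Submit and start the Workflow, then poll its status.
            String jobId = client.run(conf);
            System.out.println("Submitted workflow: " + jobId);

            WorkflowJob job = client.getJobInfo(jobId);
            System.out.println("Current status: " + job.getStatus());
        }
    }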

A single Oozie server implements all four functional Oozie components:
  • Oozie Workflow
  • Oozie Coordinator
  • Oozie Bundle
  • Oozie SLA

The Oozie server is described in this post, starting with what Oozie Workflow is and how you can use it.

Oozie Architecture:

05 April 2015

5 Top Data Storage Patterns to Handle a Variety of Data

Data patterns

Data now comes in a variety of patterns. It is more than just plain text; it can exist in various persistence/storage mechanisms, with the Hadoop Distributed File System (HDFS) being one of them.

The way data is ingested, and the sources from which it is ingested, affect the way data is stored. Equally, how the data is pushed to downstream systems or accessed by the data access layer decides how the data should be stored.

Role of RDBMS

The need to store huge volumes of data has forced databases to follow new rules of data relationships and integrity that are different from those of relational database management systems (RDBMS). RDBMS follow the ACID rules of atomicity, consistency, isolation and durability. These rules make the database reliable to any user of the database. However, searching huge volumes of big data and retrieving data from them would take large amounts of time if all the ACID rules were enforced.
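
For context, the atomicity and durability that make an RDBMS reliable are exercised through transactions. The hedged JDBC sketch below (the bank database, accounts table and connection details are invented for illustration) commits two related updates together or rolls both back, which is exactly the guarantee that becomes expensive at big data scale.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class AcidTransfer {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for any JDBC-compliant RDBMS.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/bank", "user", "password")) {
                conn.setAutoCommit(false);       // start an explicit transaction
                try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                     PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                    debit.setDouble(1, 100.0);
                    debit.setInt(2, 1);
                    credit.setDouble(1, 100.0);
                    credit.setInt(2, 2);
                    debit.executeUpdate();
                    credit.executeUpdate();
                    conn.commit();               // both updates become durable together
                } catch (Exception e) {
                    conn.rollback();             // atomicity: neither update is applied
                    throw e;
                }
            }
        }
    }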

A typical scenario is when we search for a certain topic using Google. The search returns innumerable pages of data; however, only one page is visible or basically available (BA). The rest of the data is in a soft state (S) and is still being assembled by Google, though the user is not aware of it. By the time the user looks at the data on the first page, the rest of the data becomes eventually consistent (E). This phenomenon—basically available, soft state, eventually consistent—is the rule followed by big data databases, which are generally NoSQL databases following BASE properties.

Database theory suggests that any distributed NoSQL big database can satisfy only two properties predominantly and will have to relax standards on the third. The three properties are consistency, availability, and partition tolerance (CAP). This is the CAP theorem.
The aforementioned paradigms of ACID, BASE, and CAP give rise to new big data storage patterns such as the following:
  • Polyglot pattern: Multiple types of storage mechanisms—like RDBMS, file storage, CMS, OODBMS, NoSQL and HDFS—co-exist in an enterprise to solve the big data problem.
  • Façade pattern: HDFS serves as an intermediate Façade for the traditional DW systems.
  • Lean pattern: HBase is indexed using only one column family, only one column, and a unique row key (see the sketch after this list).
  • NoSQL pattern: Traditional RDBMS systems are replaced by NoSQL alternatives to facilitate faster access and querying of big data.
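
As a rough sketch of the lean pattern mentioned above (the events table, the single column family "d", the single column "v" and the composite row key are all hypothetical choices for illustration), a write with the HBase Java client might look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LeanPatternWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("events"))) {

                // Lean pattern: one column family ("d"), one column ("v"), and a
                // unique row key that carries most of the lookup information.
                String rowKey = "customer42#2015-04-05T10:15:00";
                Put put = new Put(Bytes.toBytes(rowKey));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"),
                              Bytes.toBytes("{\"amount\":199.99}"));
                table.put(put);
            }
        }
    }

Keeping the schema this lean pushes query selectivity into the row key design, which is what makes scans and point lookups fast.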

04 April 2015

Data warehouse 2.0 in Big Data World

The new data warehouse, often called “Data Warehouse 2.0,” is the fast-growing trend of doing away with the old idea of huge, off-site mega-warehouses stuffed with hardware and connected to the world through huge trunk lines and big satellite dishes. The replacement moves away from that highly controlled, centralized, and inefficient ideal towards a more cloud-based, decentralized mix of varied hardware and widespread connectivity.
In today’s world of instant, varied access by many different users and consumers, data is no longer nicely tucked away in big warehouses.  Instead, it is often stored in multiple locations (often with redundancy) and overlapping small storage spaces that are often nothing more than large closets in an office building.  The trend is towards always-on, always-accessible, and very open storage that is fast and friendly for consumers yet complex and deep enough to appease the most intense data junkie.

02 April 2015

Internet Of Things Basics (Part-1)

IoT Protocols
IBM is investing $3 billion in the Internet of Things (IoT). What is IoT? IBM estimates that 90 per cent of all data generated by devices like smartphones, tablets, connected vehicles and appliances is never analysed or acted on.

In simple terms, it means machine-to-machine connectivity. The emergence of the Internet of Things (IoT) destroys every precedent and preconceived notion of network architecture. To date, networks have been invented by engineers skilled in protocols and routing theory.


But the architecture of the Internet of Things will rely much more upon lessons derived from nature than traditional (and ossified, in my opinion) networking schemes.

This post will consider the reasons why the Internet of Things must incorporate a fundamentally different architecture from the traditional Internet, explore the technical and economic foundations of this new architecture, and finally begin to outline a solution to the problem.

Why the Internet of Things requires a new solution: The architecture of the original Internet was created long before communicating with billions of very simple devices such as sensors and appliances was ever envisioned. The coming explosion of these much simpler devices creates tremendous challenges for the current networking paradigm in terms of the number of devices, unprecedented demands for low-cost connectivity, and the impossibility of managing far-flung and diverse equipment. Although these challenges are becoming evident now, they will pose a greater, more severe problem as this revolution accelerates.

Related:  15 Hot IT Jobs

New generation devices: The vast majority of devices to be connected in the coming IoT are very different. They will be moisture sensors, valve controls, "smart dust," parking meters, home appliances, and so on. These types of end devices almost never contain the processors, memory, hard drives, and other features needed to run a protocol stack.

These components are not necessary for the end devices' prime function, and the costs of provisioning them with these features would be prohibitive, or at least high enough to exclude wide use of many applications that could otherwise be well served. So these simpler devices are very much "on their own" at the frontier of the network.

How things are connected: Billions of  devices worldwide will form a network unprecedented in history. Devices as varied as soil moisture sensors, street lights, diesel generators, video surveillance systems—even the legendary Internet-enabled toasters—will all be connected in one fashion or another.
