5 Top Data warehousing Skills in the age of Big data

A data warehouse is a home for "secondhand" data that originates in either other corporate applications, such as the one your company uses to fill customer orders for its products, or some data source external to your company, such as a public database that contains sales information gathered from all your competitors.

What is Data warehousing

If your company's data warehouse were advertised as a used car, for example, it may be described this way: "Contains late-model, previously owned data, all of which has undergone a 25-point quality check and is offered to you with a brand-new warranty to guarantee hassle-free ownership."

Most organizations build a data warehouse in a relatively straightforward manner:
  • The data warehousing team selects a focus area, such as tracking and reporting the company's product sales activity against that of its competitors.
  • The team in charge of building the data warehouse assigns a group of business users and other key individuals within the company to play the role of Subject-Matter Experts. Together, these people compile a list of different types of data that enable them to use the data warehouse to help track sales activity (or whatever the focus is for the project).
  • The group then goes through the list of data, item by item, and figures out where it can obtain that particular piece of information. In most cases, the group can get it from at least one internal (within the company) database or file, such as the one the application uses to process orders by mail or the master database of all customers and their current addresses. In other cases, a piece of information is not available from within the company's computer applications but could be obtained by purchasing it from some other company. Although the credit ratings and total outstanding debt for all of a bank's customers, for example, aren't known internally, that information can be purchased from a credit bureau.
  • After working out where each piece of data comes from, the data warehousing team (usually computer analysts and programmers) creates extraction programs. These programs collect data from various internal databases and files, copy certain data to a staging area (a work area outside the data warehouse), check that the data has no errors, and then copy it all into the data warehouse. Extraction programs are created either by hand (custom-coded) or by using specialized data warehousing products.
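A hand-coded extraction program of the kind described in the last step can be sketched roughly as follows. This is only an illustration: the source rows, the `sales` table, and the validation rules are assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for an internal order-processing file.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": "Acme", "amount": "120.50"},
    {"order_id": 2, "customer": "", "amount": "75.00"},      # missing customer
    {"order_id": 3, "customer": "Globex", "amount": "abc"},  # bad amount
    {"order_id": 4, "customer": "Initech", "amount": "310.00"},
]


def is_valid(row):
    """Basic error check applied in the staging area (assumed rules)."""
    if not row["customer"]:
        return False
    try:
        float(row["amount"])
    except ValueError:
        return False
    return True


def run_extraction(source_rows, warehouse):
    """Copy source rows to staging, check them, then load the clean ones."""
    staging = [dict(row) for row in source_rows]           # staging area copy
    clean = [row for row in staging if is_valid(row)]      # rows that pass
    rejected = [row for row in staging if not is_valid(row)]
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(r["order_id"], r["customer"], float(r["amount"])) for r in clean],
    )
    warehouse.commit()
    return len(clean), len(rejected)


warehouse = sqlite3.connect(":memory:")
loaded, rejected = run_extraction(SOURCE_ORDERS, warehouse)
print(loaded, rejected)
```

In a real project the rejected rows would normally be logged or sent back to the source owners rather than silently dropped.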
Different roles in Data warehousing projects:

Data modeling: Design and implementation of data models are required for both the integration and presentation repositories. Relational data models are distinctly different from dimensional data models, and each has unique properties. Moreover, relational data modelers may not have dimensional modeling expertise and vice versa.
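To make the dimensional side concrete, here is a small sketch of a star schema: one fact table surrounded by dimension tables. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions chosen for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who, what, when" of each sale.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The fact table holds the measures and points at the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 10, 100.0),
    (2, 20240101, 5, 75.0),
    (1, 20240201, 4, 40.0);
""")

# A typical dimensional query: total revenue per month.
rows = conn.execute("""
    SELECT d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    GROUP BY d.month
    ORDER BY d.month
""").fetchall()
print(rows)
```

A normalized relational model would instead split the data to avoid redundancy; the star layout trades some redundancy for simpler, faster analytical queries.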

ETL development: ETL refers to the extraction of data from source systems into staging, the transformations necessary to recast source data for analysis, and the loading of transformed data into the presentation repository. ETL includes the selection criteria to extract data from source systems, performing any necessary data transformations or derivations needed, data quality audits, and cleansing.

Data cleansing: Source data is typically not perfect. Furthermore, merging data from multiple sources can introduce new data quality issues. Data hygiene is an important aspect of data warehousing that requires specific skills and techniques.
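As a rough illustration of how merging sources creates quality problems, the sketch below combines two hypothetical customer lists that spell the same person differently. The field names, the normalization rules, and the choice of email as the matching key are all assumptions for the example.

```python
def normalize(record):
    """Bring a customer record into one consistent format."""
    return {
        "name": " ".join(record["name"].split()).title(),  # collapse spaces, fix case
        "email": record["email"].strip().lower(),
    }


def merge_sources(*sources):
    """Merge several sources, keeping the first record seen per email."""
    seen = {}
    for source in sources:
        for record in source:
            clean = normalize(record)
            seen.setdefault(clean["email"], clean)
    return list(seen.values())


# Hypothetical inputs: the same customer appears in both systems.
crm = [{"name": "ada  lovelace", "email": "Ada@Example.com "}]
orders = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "alan turing", "email": "alan@example.com"},
]

customers = merge_sources(crm, orders)
print(customers)
```

Without the normalization step, the two spellings of the same customer would be loaded as two different people.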

OLAP design: Typically data warehouses support some variety of online analytical processing (HOLAP, MOLAP, or ROLAP). Each OLAP technique is different but requires special design skills to balance the reporting requirements against performance constraints.
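The core operation behind any OLAP variant is aggregating a measure along chosen dimensions. The sketch below shows a simple roll-up in plain Python; the fact rows and dimension names are made up for illustration, and a real OLAP engine would do this over far larger data, either precomputed or on demand.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions and one measure.
FACTS = [
    {"region": "East", "product": "Widget", "quarter": "Q1", "revenue": 100.0},
    {"region": "East", "product": "Gadget", "quarter": "Q1", "revenue": 50.0},
    {"region": "West", "product": "Widget", "quarter": "Q1", "revenue": 70.0},
    {"region": "West", "product": "Widget", "quarter": "Q2", "revenue": 30.0},
]


def roll_up(facts, dimensions, measure="revenue"):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
print(by_region)
print(by_region_quarter)
```

The design trade-off mentioned above shows up here: storing results such as `by_region` ahead of time makes reports fast but costs storage and refresh time, while computing them per query keeps data fresh at the cost of slower reports.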

Application development: Users commonly require an application interface into the data warehouse that provides an easy-to-use front end combined with comprehensive analytical capabilities, and one that is tailored to the way the users work. This often requires some degree of custom programming or commercial application customization.

Production automation: Data warehouses are generally designed for periodic automated updates, in which new and modified data is loaded into the warehouse so that users can view the most recent data available. These automated update processes must have built-in fail-over strategies and must ensure data consistency and correctness.
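One simple way to keep an automated update consistent is to apply each batch inside a transaction, so a failure part-way through leaves the warehouse unchanged. The sketch below assumes a hypothetical `sales` table and a made-up rule that negative amounts are invalid; it shows only this one failure-handling idea, not a full fail-over design.

```python
import sqlite3


def apply_batch(conn, batch):
    """Apply a batch atomically: either every row lands or none does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for row_id, amount in batch:
                if amount < 0:
                    raise ValueError(f"negative amount for row {row_id}")
                # New rows are inserted, modified rows overwrite the old version.
                conn.execute(
                    "INSERT OR REPLACE INTO sales (id, amount) VALUES (?, ?)",
                    (row_id, amount),
                )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.commit()

first_ok = apply_batch(conn, [(1, 10.0), (2, 20.0)])
second_ok = apply_batch(conn, [(1, 99.0), (3, -5.0)])  # bad row aborts the batch
print(first_ok, second_ok)
```

After the failed second batch, row 1 still holds its original amount, because the partial update was rolled back along with the bad row.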

General systems and database administration: Data warehouse developers must have many of the same skills held by the typical network administrator and database administrator. They must understand the implications of efficiently moving possibly large volumes of data across the network, and the issues of effectively storing changing data.
