Showing posts from June, 2015

Cloud Storage as a Service Basics (1 of 3)

Cloud storage is a model of networked enterprise storage where data is stored in virtualized pools of storage which are generally hosted by third parties. Hosting companies operate large data centers, and customers that require their data to be hosted buy or lease storage capacity from these hosting companies.

The data center operators virtualize the resources according to customer requirements and expose them as storage pools, which the customers can use to store data. Physically, the resource may span multiple servers and multiple locations. The safety of the data depends upon the hosting companies and on the applications that leverage the cloud storage.

Cloud storage is based on highly virtualized infrastructure and has the same characteristics as cloud computing in terms of agility, scalability, elasticity, and multi-tenancy. It is available both off-premises and on-premises. 

While it is difficult to declare a canonical definition of cloud storage architecture, object storage is…

Big Data: Top Hadoop Interview Questions (2 of 5)

Frequently asked Hadoop interview questions.
1. What is Hadoop? Hadoop is a framework that gives users the power of distributed computing.
2. What is the difference between SQL and Hadoop? SQL works with structured data and is most often associated with legacy technologies. Hadoop is suitable for unstructured data and is well suited to modern technologies.
3. What is the Hadoop framework? It is a distributed network of commodity servers (the servers are grouped into clusters, and a cluster can have multiple nodes).
4. What are 4 properties of Hadoop?
Accessible: Hadoop runs on large clusters of commodity machines.
Robust: Hadoop assumes that low-cost commodity machines fail often, and it handles these failures gracefully.
Scalable: Hadoop scales linearly to handle larger data by adding more nodes to the cluster.
Simple: Hadoop allows users to quickly write efficient parallel code.
5. What kind of data does Hadoop need? A traditional RDBMS has a relational structure, with data residing in tables. In H…

Big Data: Top Hadoop Interview Questions (1 of 5)

Looking out for Hadoop Interview Questions that are frequently asked by employers? Here is the first list of Hadoop Interview Questions which  covers HDFS…
What is BIG DATA?
Big Data is a collection of data so large and complex that it becomes very tedious to capture, store, process, retrieve and analyze it with on-hand database management tools or traditional data processing techniques.
Can you give some examples of Big Data?
There are many real-life examples of Big Data! Facebook is generating 500+ terabytes of data per day, NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day, and a jet airliner collects 10 terabytes of sensor data for every 30 minutes of flying time. All these are day-to-day examples of Big Data!

Can you give a detailed overview about the Big Data being generated by Facebook?
As of December 31, 2012, there were 1.06 billion monthly active users on Facebook and 680 million mobile users. On an average, 3.2 billion …

Skill set for Software Automation Developer Jobs

According to KPMG: Process automation provides a means to integrate people in a software development organization with the development process and the tools supporting that development. By automating processes, you can boost your efficiency and help ensure standardized handling of repetitive workflow steps. The benefit for the organization is that translation projects can be completed in less time and for less money.
The following skill set is needed:
Programming Languages (C++/Java/Scala), OOP Concepts - MUST
Unix/Linux - MUST
Automation Development/Scripting Experience - MUST
XML/XPath - Optional
Perl/Python/Shell Scripting - Optional
SQL/Sybase/MongoDB - Optional
Web Services - SOAP or REST API - Optional
Additional Skills:
Puppet/Chef/CFEngine experience 
Experience with system packaging tools; e.g. RPM 
SQL database programming experience

New Directions for Digital Products (1 of 2)

We have already passed through the agricultural, industrial, and information ages. Now we are in the digitization age. Many companies are investing heavily in digitization:
Mphasis is betting on the digitization of financial institutions
Tech Mahindra has started research on health care digitization
Infosys is focusing on automation and artificial intelligence
TCS is focusing on machine learning
Wipro is focusing on big data and Hadoop
What is digitization?
What do we mean by digital? Digital data is distinguished from analog data in that the datum is represented in discrete, discontinuous values, rather than the continuous, wavelike values of analog. Thus, the digitization of data refers to the conversion of information into binary code, allowing for more efficient transmission and storage of data.
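
To make the analog-to-digital idea concrete, here is a minimal Python sketch. It samples a continuous sine wave and maps each sample to one of a small number of discrete levels, written out as binary code. The function name digitize, the 3-bit resolution, and the value range are illustrative assumptions, not part of any standard.

```python
import math

def digitize(value, bits=3, lo=-1.0, hi=1.0):
    """Map a continuous value in [lo, hi] to a fixed-width binary code.

    Hypothetical helper for illustration: 'bits' sets how many discrete
    levels are available (2 ** bits).
    """
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)          # width of one discrete level
    index = round((value - lo) / step)       # nearest level to the input
    index = max(0, min(levels - 1, index))   # clamp out-of-range inputs
    return format(index, f"0{bits}b")        # level number as binary text

# Eight samples of one period of a continuous (analog-style) sine wave.
samples = [math.sin(2 * math.pi * t / 8) for t in range(8)]

# The digitized form: one short binary code per sample.
codes = [digitize(s) for s in samples]
```

Each entry in codes is a 3-character string of 0s and 1s. Raising bits gives a finer approximation of the original wave at the cost of more storage per sample.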
A key differentiator of our current age from prior human history is that, as of the last decade, we not only convert data to a digital format, but we also create data in a digital format. Thus, we now have the digital product, a…

IBM Parallel Machine Learning Toolbox Basics

The goal of the IBM Parallel Machine Learning Toolbox (PML) is similar to that of Google's MapReduce programming model (Dean and Ghemawat, 2004) and the open source Hadoop system, which is to provide Application Programming Interfaces (APIs) that enable programmers who have no prior experience in parallel and distributed systems to nevertheless implement parallel algorithms with relative ease. Like MapReduce and Hadoop, PML supports associative-commutative computations as its primary parallelization mechanism. Unlike MapReduce and Hadoop, PML fundamentally assumes that learning algorithms can be iterative in nature, requiring multiple passes over data.
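
As a rough illustration of what an associative-commutative computation looks like, the plain Python sketch below computes a mean from per-worker partial results. It does not use the PML API; the function names partial_stats, combine, and parallel_mean are assumptions made for this example, and the "workers" are simulated with an ordinary loop.

```python
from functools import reduce
import random

def partial_stats(partition):
    """Work one worker could do alone: summarize its slice of the data."""
    return (sum(partition), len(partition))

def combine(a, b):
    """Merge two partial results. Addition makes this both associative
    and commutative, so grouping and order of merging do not matter."""
    return (a[0] + b[0], a[1] + b[1])

def parallel_mean(partitions):
    # In a real system each call would run on a different worker node.
    partials = [partial_stats(p) for p in partitions]
    # Shuffle to mimic results arriving in an unpredictable order.
    random.shuffle(partials)
    total, count = reduce(combine, partials)
    return total / count

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
mean = parallel_mean(partitions)
```

Because combine gives the same answer regardless of the order in which partial results arrive, the workers never need to coordinate with each other, which is what makes this kind of computation easy to parallelize. An iterative learning algorithm would repeat such a pass many times over the same partitions.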

It also extends the associative-commutative computational model in various aspects, the most important of which are:

The ability to maintain the state of each worker node between iterations, making it possible, for example, to partition and distribute data structures across workers
Efficient distribution of data, including the ability for each worker …

Resource and Scheduling Management in Cloud (1 of 2)

A key challenge faced by providers when building a cloud infrastructure is managing physical and virtual resources according to user-resources' demands, with respect to the access time to the associated servers, the capacity needed for storage and the heterogeneity aspects traversing different networks in a holistic fashion. The organisation or, namely, the orchestration of resources must be performed in a way that rapidly and dynamically provides resources to applications.

Some challenges:

One can easily argue whether a company or an organisational body would be offering better services if these services can be easily migrated to the cloud. It is undoubtedly true that the cloud services present a simplistic view of IT in the case of IaaS or a simplistic view of programming notations in the case of PaaS or even a simplistic view of resources manipulation and utilisation in the case of SaaS. However, the underlying communicating mechanisms comprising of heterogeneous systems are form…

Stanford Machine Learning Free Video Course

According to Coursera: Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome.
1. Introduction: The Motivation and Applications of Machine Learning
2. An Application of Supervised Learning - Autonomous Driving
3. The Concept of Underfitting and Overfitting
4. Newton's Method
5. Discriminative Algorithms
6. Multinomial Event Model
7. Optimal Margin Classifier
8. Kernels
9. Bias/Variance Tradeoff
10. Uniform Convergence - The Case of Infinite H
11. Bayesian Statistics and Regularization
12. The Concept of Unsupervised Learning
13. Mixture of Gaussians
14. The Factor Analysis Model
15. Latent Semantic Indexing (LSI)

Traditional RDBMS Vs NOSQL databases

Traditional RDBMS
For the last 20 or 30 years, classic data warehousing has been based on the same regimented approach. However, the future is changing this. Traditionally, processes such as identifying data lineage, documenting metadata, and being able to reconcile data across different reports coming from different data tables in different data marts have been critical to ensure that numbers are correct. This standard approach to data warehousing has been important, in order to have confidence in your data and meet regulatory and compliance requirements. However, over time businesses have become more complex and are doing things at an ever-accelerating pace, and as a result, data storage needs for analytics are changing. For example, there was a time when the core business of a retail establishment was to understand and sell a single line of products through a bricks-and-mortar presence.

NoSQL database
However, within the last 5 to 10 years, that has evolved into retail establishments …

Machine Learning Basics (Part-2)

Machine learning is a branch of artificial intelligence. Using computing, we design systems that can learn from data by being trained. These systems learn and improve with experience and, over time, refine a model that can be used to predict the outcomes of questions based on the previous learning.

Life cycle of machine learning:
Acquisition: Collect the data
Prepare: Data cleaning and quality
Process: Run machine learning tools
Report: Present the results

You can acquire data from many sources; it might be data that's held by your organization or open data from the Internet. There might be one data set, or there could be ten or more.

You must come to accept that data will need to be cleaned and checked for quality before any processing can take place. These processes occur during the prepare phase.

The processing phase is where the work gets done. The machine learning routines that you have created perform this phase.

Finally, the results are presented. Reporting can happen in a…
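
The four phases described above can be sketched as a tiny Python pipeline. This is only an illustration under simple assumptions: the raw records are made up, and the "model" in the process phase is a plain average standing in for a real machine learning routine. The function names acquire, prepare, process, and report simply mirror the phase names.

```python
from statistics import mean

def acquire():
    """Acquisition: collect raw data. These records are hypothetical; in
    practice they might come from files, a database, or open data."""
    return ["12.5", " 14.0", "n/a", "13.5", ""]

def prepare(raw):
    """Prepare: clean the data and drop records that fail a quality check."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item.strip()))
        except ValueError:
            continue  # skip values that are not valid numbers
    return cleaned

def process(values):
    """Process: run the learning routine. A mean stands in for a model."""
    return {"prediction": mean(values), "samples": len(values)}

def report(result):
    """Report: present the results in a readable form."""
    return f"Predicted value {result['prediction']:.2f} from {result['samples']} samples"

summary = report(process(prepare(acquire())))
```

Keeping each phase in its own function makes it easy to swap in a different data source or a real learning algorithm without touching the other phases.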

Machine Learning Basics (Part-1)

The following is a list of languages we can use in machine learning:

With most languages, there is a lot of crossover in functionality. With the languages that access the Java Virtual Machine (JVM) there's a good chance that you'll be accessing Java-based libraries. There's no such thing as one language being "better" than another. It's a case of picking the right tool for the job.

The Python language has increased in usage, because it's easy to learn and easy to read. It also has some good machine learning libraries, such as scikit-learn, PyML, and pybrain. Jython was developed as a Python interpreter for the JVM, which may be worth investigating.

R is an open source statistical programming language. The syntax is not the easiest to learn, but I do encourage you to have a look at it. It also has a large number of machine learning packages and visualization tools. The rJava project allows Java programmers to access R functions from Java code.


Interesting things in Windows Azure

Ref: Microsoft

Interestingly, Windows Azure is an open platform that will support both Microsoft and non-Microsoft languages and environments. To build applications and services on Windows Azure, developers can use their existing Microsoft® Visual Studio® 2008 expertise. Windows Azure is not grid computing, packaged software, or a standard hosting service. It is an integrated development, service hosting and management environment maintained at Microsoft datacenters. The environment includes a robust and efficient core of compute and simple storage capabilities and support for a rich variety of development tools and protocols.

Jon Brodkin of Network World quotes Tim O'Brien, senior director of Microsoft's Platform Strategy Group, as saying that Microsoft's Windows Azure and Amazon's Elastic Compute Cloud tackle two very different cloud computing technology problems today, but are destined to emulate each other over time.

Many existing applications were built on the LAMP p…

The Story behind Mainframe to Cloud Journey

Mainframe to Cloud: Mainframe computing took off in the 1950s and gained much prominence throughout the 1960s. Corporations such as IBM (International Business Machines), Univac, DEC (Digital Equipment Corporation), and Control Data Corporation started developing powerful mainframe systems.

These mainframe systems mainly carried out number-crunching for scientists and engineers. The main programming language used was Fortran. Then in the 1960s, the notion of database systems was conceived and corporations developed database systems based on the network and hierarchical data models. The database applications at that time were written mainly in COBOL.

Cloud Vs Mainframe

In the 1970s, corporations such as DEC created the notion of mini-computers. An example is DEC's VAX machine. These machines were much smaller than the mainframe systems. Around that time, terminals were developed. This way, programmers did not have to go to computing centers and use punch cards for their computat…

30 High Paying Tech Jobs, $110,000 Plus Salary

There is a growing demand for software developers across the globe. These 30 high-paying IT jobs are really worth a look.

PaaS or "Platform as a Service" is a type of cloud computing technology. It hosts everything that a developer needs to write an app. These apps, once written, live on the PaaS cloud.

Cassandra is a free and open source NoSQL database. It's a kind of database that can handle and store data of different types and sizes, and it's increasingly the go-to database for mobile and cloud applications. Several IT companies including Apple and Netflix use Cassandra.

MapReduce has been called "the heart of Hadoop."

MapReduce is the method that allows Hadoop to process all kinds of data stored across many low-cost computer servers. To get meaningful data out of Hadoop, a programmer writes software programs (often in the popular language, Java) for MapReduce.
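
To give a feel for the MapReduce style, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle, and reduce steps on a single machine and does not use Hadoop's actual API (real Hadoop jobs are often written in Java, as noted above). The function names map_phase, shuffle, reduce_phase, and word_count are assumptions made for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return (key, sum(values))

def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data big cluster", "data node"])
```

Here counts maps each word to the number of times it appears across both documents. The programmer only writes the map and reduce logic; in a real framework, distributing the work and grouping the keys is handled by the system.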


Cloudera is a company that makes a commercial…

MemSQL in Advanced Data Analytics

Why use a battery of "complicated" and "immature" tools like Kafka, Zookeeper, and NoSQL databases to support low-latency big data applications when you can use a durable, consistent, SQL-compliant in-memory database?

This is the question NewSQL in-memory database vendors MemSQL and VoltDB are posing to big-data developers who are trying to build real-time applications. MemSQL this week announced a two-way, high-performance MemSQL Spark Connector designed to complement the fast-growing Apache Spark in-memory analytics platform.  

"There's a lot of excitement about Spark, but many data scientists struggle with complexity and the high degree of expertise to work with related data pipelines," said Eric Frenkiel, CEO and cofounder of MemSQL, in a phone interview with InformationWeek. "As a database, MemSQL offers durability and transaction support, so it can simplify those real-time data pipelines, providing the ability to ingest data and query the sys…

Story IoT devices human intelligence basic concepts (3 of 3)

Artificial intelligence is now changing the world. It is often treated as a synonym for automation. The new idea is that we can apply AI in the software development life cycle.
How can we develop software applications with improved quality?
Software Engineering is concerned with the planning, design, development, maintenance and documentation of software systems. It is well known that developing high quality software for real-world applications is complex. Such complexity manifests itself in the fact that software has a large number of parts that have many interactions, and in the involvement of many stakeholders with different and sometimes conflicting objectives. Furthermore, Software Engineering is knowledge intensive and often deals with imprecise, incomplete and ambiguous requirements on which analysis, design and implementations are based.

Artificial intelligence (AI) techniques such as knowledge based systems, neural networks, fuzzy logic and data mining have been advocated by many rese…