
How to Identify Data Relevant for Data Science Analytics

Data in data science
Data is everywhere

Data is everywhere: your government, your web server, your business partners, even your body. While we aren’t drowning in a sea of data, we’re finding that almost everything can be (or has been) instrumented. We frequently combine publishing industry data from Nielsen BookScan with our own sales data, publicly available Amazon data, and even job data to see what’s happening in the publishing industry.

Sites like Infochimps and Factual provide access to many large datasets, including climate data, MySpace activity streams, and game logs from sporting events. Factual enlists users to update and improve its datasets, which cover topics ranging from endocrinologists to hiking trails.

How the data is growing

Much of the data we currently work with is the direct consequence of Web 2.0, and of Moore’s Law applied to data. The Web has people spending more time online and leaving a trail of data wherever they go. Mobile applications leave an even richer data trail, since many of them are annotated with geolocation, or involve video or audio, all of which can be mined.

Point-of-sale devices and frequent-shopper cards make it possible to capture all of your retail transactions, not just the ones you make online. All of this data would be useless if we couldn’t store it, and that’s where Moore’s Law comes in. Since the early ’80s, processor speed has increased from 10 MHz to 3.6 GHz, a 360-fold increase (not counting increases in word length and number of cores).
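The clock-speed comparison is simple arithmetic, and a minimal sketch makes it easy to check. The function name `growth_factor` is hypothetical and used only for illustration; the two clock speeds are the ones quoted in the text.

```python
# Clock speeds quoted in the text, expressed in hertz.
EARLY_80S_HZ = 10e6   # 10 MHz
MODERN_HZ = 3.6e9     # 3.6 GHz


def growth_factor(old: float, new: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


clock_speedup = growth_factor(EARLY_80S_HZ, MODERN_HZ)
print(f"Clock speed grew by a factor of about {clock_speedup:.0f}")
```

This counts raw clock rate only. As the text notes, wider words and multiple cores add to real-world throughput beyond this ratio.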

The need for storage capacity

But we’ve seen much bigger increases in storage capacity, on every level. RAM has moved from $1,000/MB to roughly $25/GB, a price reduction by a factor of about 40,000, to say nothing of the reduction in size and increase in speed. Hitachi made the first gigabyte disk drives in 1982, weighing in at roughly 250 pounds; now terabyte drives are consumer equipment, and a 32 GB microSD card weighs about half a gram. Whether you look at bits per gram, bits per dollar, or raw capacity, storage has more than kept pace with the increase of CPU speed.
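The storage figures can be checked the same way. The sketch below rests on assumptions not stated in the text: RAM is counted in binary units (1 GB = 1024 MB), drive capacity in decimal gigabytes, and the weights are taken at face value as 250 pounds and 0.5 grams. All variable names are illustrative.

```python
# RAM price per gigabyte, then and now (figures quoted in the text).
MB_PER_GB = 1024                          # assumption: binary units for RAM
old_ram_usd_per_gb = 1_000 * MB_PER_GB    # $1,000/MB expressed per GB
new_ram_usd_per_gb = 25                   # roughly $25/GB

ram_price_drop = old_ram_usd_per_gb / new_ram_usd_per_gb

# Storage density in bits per gram for the two devices mentioned.
GRAMS_PER_POUND = 453.59237
BITS_PER_GB = 8 * 10**9                   # assumption: decimal GB for drives

old_drive_bits_per_gram = 1 * BITS_PER_GB / (250 * GRAMS_PER_POUND)
microsd_bits_per_gram = 32 * BITS_PER_GB / 0.5
density_gain = microsd_bits_per_gram / old_drive_bits_per_gram

print(f"RAM price per GB fell by a factor of about {ram_price_drop:,.0f}")
print(f"Bits per gram rose by a factor of about {density_gain:,.0f}")
```

Under these assumptions the RAM ratio comes out close to the 40,000 quoted above, while the bits-per-gram gain runs into the millions, far more than the 360-fold gain in clock speed.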
