Featured post

The Ultimate Cheat Sheet On Hadoop

Top 20 frequently asked questions to test your Hadoop knowledge given in the below Hadoop cheat sheet. Try finding your own answers and match the answers given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: e




Question #2 

Where is Hive metastore stored by default ?


A. In HDFS
B. In client machine in the form of a flat file.
C. In client machine in a derby database
D. In lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: c




Question…

What is "Learning Algorithm" in Machine Learning

#What is machine learning algorithm
#What is machine learning algorithm
Just a basics on Machine Learning

Alice has just begun taking a course on machine learning. She knows that at the end of the course, she will be expected to have “learned”all about this topic. A common way of gauging whether or not shehas learned is for her teacher, Bob, to give her a exam. She has done well at learning if she does well on the exam.

But what makes a reasonable exam? If Bob spends the entire semester talking about machine learning, and then gives Alice an exam on History of Pottery, then Alice’s performance on this exam will not be representative of her learning. On the other hand, if the (The general supervised approach to machine learning: a learning algorithm reads in training data and computes a learned function f . This function can then automatically label future text examples.)

exam only asks questions that Bob has answered exactly during lectures, then this is also a bad test of Alice’s learning, especially if it’s an “open notes” exam. What is desired is that Alice observes specific examples from the course, and then has to answer new, but related questions on the exam. This tests whether Alice has the ability to  generalize. Generalization is perhaps the most central concept in machine learning. This Machine learning with data science hands-on really useful.

Examples for Machine Learning

As a running concrete example, we will use that of a course recommendation system for undergraduate computer science students. We have a collection of students and a collection of courses. Each student has taken, and evaluated, a subset of the courses. The evaluation is simply a score from −2 (terrible) to +2 (awesome).

The job of the recommender system is to predict how much a particular student (say, Alice) will like a particular course (say, Algorithms).

Given historical data from course ratings (i.e., the past) we are trying to predict unseen ratings (i.e., the future). Now, we could be unfair to this system as well. We could ask it whether Alice is likely to enjoy the History of Pottery course. This is unfair because the system has no idea what History of Pottery even is, and has no prior experience with this course. On the other hand, we could ask it how much Alice will like Artificial Intelligence, which she took last year and rated as +2 (awesome).

We would expect the system to predict that she would really like it, but this isn’t demonstrating that the system has learned: it’s simply recalling its past experience. In the former case, we’re expecting the system to generalize beyond its experience, which is unfair. In the latter case, we’re not expecting it to generalize at all.

This general set up of predicting the future based on the past is at the core of most machine learning. The objects that our algorithm will make predictions about are examples. In the recommender system setting, an example would be some particular Student/Course pair (such as Alice/Algorithms). The desired prediction would be the rating that Alice would give to Algorithms.

Comments

Popular posts from this blog

AWS Vs Azure Load Balancers Top Insights

Hadoop File System Basic Commands

4 Important Skills You Need for Data Scientists

Hyperledger Fabric: 20 Real Interview Questions