Featured Post

8 Ways to Optimize AWS Glue Jobs in a Nutshell

Image
  Improving the performance of AWS Glue jobs involves several strategies that target different aspects of the ETL (Extract, Transform, Load) process. Here are some key practices. 1. Optimize Job Scripts Partitioning : Ensure your data is properly partitioned. Partitioning divides your data into manageable chunks, allowing parallel processing and reducing the amount of data scanned. Filtering : Apply pushdown predicates to filter data early in the ETL process, reducing the amount of data processed downstream. Compression : Use compressed file formats (e.g., Parquet, ORC) for your data sources and sinks. These formats not only reduce storage costs but also improve I/O performance. Optimize Transformations : Minimize the number of transformations and actions in your script. Combine transformations where possible and use DataFrame APIs which are optimized for performance. 2. Use Appropriate Data Formats Parquet and ORC : These columnar formats are efficient for storage and querying, signif

What is "Learning Algorithm" in Machine Learning

#What is machine learning algorithm
#What is machine learning algorithm
Just a basics on Machine Learning

Alice has just begun taking a course on machine learning. She knows that at the end of the course, she will be expected to have “learned”all about this topic. A common way of gauging whether or not shehas learned is for her teacher, Bob, to give her a exam. She has done well at learning if she does well on the exam.

But what makes a reasonable exam? If Bob spends the entire semester talking about machine learning, and then gives Alice an exam on History of Pottery, then Alice’s performance on this exam will not be representative of her learning. On the other hand, if the (The general supervised approach to machine learning: a learning algorithm reads in training data and computes a learned function f . This function can then automatically label future text examples.)

exam only asks questions that Bob has answered exactly during lectures, then this is also a bad test of Alice’s learning, especially if it’s an “open notes” exam. What is desired is that Alice observes specific examples from the course, and then has to answer new, but related questions on the exam. This tests whether Alice has the ability to  generalize. Generalization is perhaps the most central concept in machine learning. This Machine learning with data science hands-on really useful.

Examples for Machine Learning

As a running concrete example, we will use that of a course recommendation system for undergraduate computer science students. We have a collection of students and a collection of courses. Each student has taken, and evaluated, a subset of the courses. The evaluation is simply a score from −2 (terrible) to +2 (awesome).

The job of the recommender system is to predict how much a particular student (say, Alice) will like a particular course (say, Algorithms).

Given historical data from course ratings (i.e., the past) we are trying to predict unseen ratings (i.e., the future). Now, we could be unfair to this system as well. We could ask it whether Alice is likely to enjoy the History of Pottery course. This is unfair because the system has no idea what History of Pottery even is, and has no prior experience with this course. On the other hand, we could ask it how much Alice will like Artificial Intelligence, which she took last year and rated as +2 (awesome).

We would expect the system to predict that she would really like it, but this isn’t demonstrating that the system has learned: it’s simply recalling its past experience. In the former case, we’re expecting the system to generalize beyond its experience, which is unfair. In the latter case, we’re not expecting it to generalize at all.

This general set up of predicting the future based on the past is at the core of most machine learning. The objects that our algorithm will make predictions about are examples. In the recommender system setting, an example would be some particular Student/Course pair (such as Alice/Algorithms). The desired prediction would be the rating that Alice would give to Algorithms.

Comments

Popular posts from this blog

How to Fix datetime Import Error in Python Quickly

How to Check Kafka Available Brokers

SQL Query: 3 Methods for Calculating Cumulative SUM