Data Science: A Simple Project to Practice

I want to share how you can use Python for your data science or analytics projects. Many programmers struggle to learn data science because they do not know where to start. A mini-project is a good way to get hands-on experience.

I used the Ubuntu operating system for this project. To become a data scientist, you need two skills: learning the concepts and applying that knowledge.

After an engineering degree, you can go on to an M.Tech. degree.

But you become a real engineer only when you apply engineering principles. Data science works the same way.

My simple project is data visualization in Python.

Importance of Data

Data is a precious resource for solving machine learning and data science problems.

First, define your problem. Then:
  1. Collect the data.
  2. Wrangle and clean the data.
  3. Visualize the patterns.

In the old days, you might have studied a subject called Statistical Analysis.

In that subject, you studied the actual problem and recorded the data in a notebook.

Before computers existed, people used paper notebooks to collect and analyze data.

They then used pencil and graph paper to draw charts from the selected data.

It was a time-consuming and laborious process. Finally, based on those charts, people made corrections to the process they were studying.

You can see the same approach in today's data science projects.
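The collect, clean, and visualize steps listed earlier can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in this project: the readings in RAW_READINGS are made-up sample values, and the "visualization" is a text histogram so the sketch needs only the standard library (the project itself uses Matplotlib).

```python
# Hypothetical raw readings, as they might arrive from a form or a sensor log.
RAW_READINGS = ["12.5", "13.1", "", "n/a", "12.9", "14.2", None, "13.4"]


def collect():
    """Step 1: collect the data (here, a copy of a hard-coded sample list)."""
    return list(RAW_READINGS)


def wrangle(raw):
    """Step 2: clean the data by keeping only values that parse as numbers."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # skip None, empty strings, "n/a", and similar junk
    return cleaned


def visualize(values, bin_width=0.5):
    """Step 3: show the pattern as a text histogram and return the bin counts."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width  # lower edge of the value's bin
        counts[start] = counts.get(start, 0) + 1
    for start in sorted(counts):
        print(f"{start:5.1f}-{start + bin_width:5.1f} | {'#' * counts[start]}")
    return counts


if __name__ == "__main__":
    visualize(wrangle(collect()))
```

Keeping each step in its own function mirrors the paper-and-pencil workflow: you can swap in a real data source in collect() or a Matplotlib chart in visualize() without touching the other steps.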

How to Write a Script in Python

Make Sure These Steps Are Completed

  1. Install Ubuntu on a virtual machine.
  2. Install Python 3.7.x.
  3. Install the Anaconda Python distribution, which bundles the packages you need for data science projects, such as NumPy and Matplotlib.
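Before writing the project script, you can confirm the setup with a short check. This is a sketch; the package list (numpy and matplotlib) is assumed from the imports this project relies on, and the helper name missing_packages is made up for illustration.

```python
import importlib.util
import sys

# Packages this project imports (assumed: NumPy and Matplotlib).
REQUIRED = ("numpy", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

importlib.util.find_spec only locates a package without importing it, so the check stays fast even for large libraries.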
Use the ls command to list your scripts. Its output shows the script I created for this project.

How to View the Python Script with the less Command

Use the less command to check the code, for example: less mypython.py

5 Key Points to Remember

  1. Use the import statement to import packages.
  2. matplotlib is the package you need to draw plots.
  3. NumPy handles mathematical and scientific calculations, so I imported NumPy.
  4. The as keyword creates an alias (for example, import numpy as np), which saves a lot of typing.
  5. I have drawn two plots: one of a uniform distribution and one of a normal distribution.
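The original script is shown only as a screenshot, so here is a sketch of what a script following these five points might look like. The sample size, seed, bin count, and function names are assumptions for illustration, not the values from mypython.py.

```python
import matplotlib.pyplot as plt  # "as" gives Matplotlib's pyplot a short alias
import numpy as np               # NumPy does the numeric work

SAMPLE_SIZE = 1000  # assumed; the post does not state the sample size


def normal_samples(size=SAMPLE_SIZE, seed=0):
    """Draw samples from a standard normal distribution (mean 0, std 1)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=size)


def uniform_samples(size=SAMPLE_SIZE, seed=0):
    """Draw samples spread evenly over the interval [0, 1)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low=0.0, high=1.0, size=size)


def plot_histogram(samples, title):
    """Open a new figure and draw a histogram of the samples."""
    plt.figure()
    plt.hist(samples, bins=30)
    plt.title(title)
    plt.xlabel("Value")
    plt.ylabel("Count")


def main():
    plot_histogram(normal_samples(), "Normal distribution")
    plot_histogram(uniform_samples(), "Uniform distribution")
    plt.show()  # display both figures


if __name__ == "__main__":
    main()
```

Passing a seed to np.random.default_rng makes the plots repeatable from run to run; drop it if you want fresh random samples each time.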
Why I Used if __name__ == "__main__":

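Python sets the special variable __name__ to "__main__" when a file is run directly. The code under this check therefore runs only when you execute the file itself, not when another script imports it.

This means you can run the module mypython.py as a standalone script, or import it from another script with the import statement without triggering the code inside the check.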
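The difference can be demonstrated without creating a second file. This sketch uses types.ModuleType and exec purely to simulate the two ways a module gets loaded; the module source and the ran_main flag are hypothetical. In a real project you would simply run the file or import it.

```python
import types

# Hypothetical module source that records whether main() ran.
MODULE_SOURCE = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''


def load(source, name):
    """Execute source inside a fresh module whose __name__ is `name`."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


as_script = load(MODULE_SOURCE, "__main__")  # like: python mypython.py
as_import = load(MODULE_SOURCE, "mypython")  # like: import mypython
print(as_script.ran_main)  # True
print(as_import.ran_main)  # False
```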

In Python, a module is simply a file with the .py extension.

Some people call these files scripts, and others call them modules.

The official Python documentation and standard textbooks use the term module, so you can use it too.

How to Run mypython.py from the Terminal

 $ python mypython.py

The Two Plots I Have Drawn: Normal and Uniform Distributions

1. Normal Distribution

[Figure: plot of the normal distribution]

2. Uniform Distribution

[Figure: plot of the uniform distribution]
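As a rough numeric check of what the two plots should show, the sketch below uses only the standard library: about 68% of normal samples fall within one standard deviation of the mean (the bell shape), while uniform samples land in roughly equal shares across equal-width bins (the flat shape). The sample count, seed, and helper names are assumptions for illustration.

```python
import random


def fraction_within_one_sd(samples, mean, sd):
    """Share of samples lying within one standard deviation of the mean."""
    inside = sum(1 for x in samples if abs(x - mean) <= sd)
    return inside / len(samples)


def bin_shares(samples, bins=10):
    """Share of samples in each of `bins` equal-width bins over [0, 1)."""
    counts = [0] * bins
    for x in samples:
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / len(samples) for c in counts]


rng = random.Random(42)  # fixed seed so the numbers are repeatable
normal = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
uniform = [rng.random() for _ in range(100_000)]

print(f"normal: {fraction_within_one_sd(normal, 0.0, 1.0):.3f} within 1 sd")
print("uniform bin shares:", [round(s, 3) for s in bin_shares(uniform)])
```

Expect the first number to be close to 0.683 and every uniform bin share to be close to 0.1; the exact digits depend on the seed.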


Try it today; this is a simple project. In real-world work, a data analyst's job is to deal with data and charts, so this project is a good way to begin your data analyst career.

