The Real Use of Git in DevOps Environment

Why do you need Git? This post explains it, along with the differences among Git, GitFlow, and GitHub.

Git is a tool





Git was created by Linus Torvalds, the creator of the Linux kernel, to help Linux developers coordinate their work among many contributors around the world. It helps to resolve conflicts, track modifications, and revert to a version that was working when a newer version breaks.


To install Git, you can access the following link: https://git-scm.com/download/win


The installation follows the usual pattern for Windows applications: click Next, Next, and Finish.

After the installation, you will find a new program called Git Bash, which allows you to create your local repos and use the Git commands to create versions of your application.
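Before the first commit, the project folder has to become a Git repository. The following is a minimal sketch of that step, runnable in Git Bash. It assumes a scratch folder named Chapter09 created under a temporary directory, a placeholder user name and email, and illustrative contents for code.txt; none of these values come from a real project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch location; the folder name Chapter09 mirrors the article.
workdir="$(mktemp -d)"
repo="$workdir/Chapter09"
mkdir -p "$repo"
cd "$repo"

# Turn the folder into a Git repository.
git init -q
# Pin the first branch to "master" so the sketch matches the article,
# whatever default branch name is configured on this machine.
git symbolic-ref HEAD refs/heads/master

# Placeholder identity (assumption); stored only in this repository.
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Create the file to be versioned; the contents are illustrative.
printf 'line 1\nline 2\nline 3\n' > code.txt

# code.txt is listed as untracked (??) until it is staged with "git add".
git status --short
```

At this point the repository exists but has no commits yet, which is why the file still appears as untracked.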


GitFlow


Now, when we start working on a project, the code files are already in production, and we cannot work directly in the main branch, because the CI/CD pipelines typically build and deploy from it. Instead, we develop on separate branches and generate a new version of our software once the code is complete.

Let's commit the first version of the code, which will be shared among all the developers working on the same project:

PS C:\Users\1511 MXTI\Chapter09> git add --all
PS C:\Users\1511 MXTI\Chapter09> git commit -m "uploading the scaffold of the project"
[master (root-commit) f6284bf] uploading the scaffold of the project
 1 file changed, 3 insertions(+)
 create mode 100644 code.txt


If we run git status again, nothing is untracked or pending commit:

PS C:\Users\1511 MXTI\Chapter09> git status
On branch master
nothing to commit, working tree clean


To check the branches you have, run the following command:

PS C:\Users\1511 MXTI\Chapter09> git branch
* master
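With a first commit on master, the GitFlow idea of keeping work off the production branch can be sketched as follows. This is only an illustration under assumptions: the branch names develop and feature/new-function follow the common GitFlow naming convention, and the feature name, identity, and file contents are hypothetical. The script rebuilds the article's starting point in a temporary folder so it can run on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rebuild the article's starting point in a scratch folder (hypothetical path).
repo="$(mktemp -d)/Chapter09"
mkdir -p "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
printf 'line 1\nline 2\nline 3\n' > code.txt
git add --all
git commit -q -m "uploading the scaffold of the project"

# GitFlow convention: a long-lived develop branch is cut from master...
git checkout -q -b develop
# ...and each piece of work gets its own feature branch cut from develop.
git checkout -q -b feature/new-function   # hypothetical feature name

# Work and commit on the feature branch; master stays untouched.
echo 'line 4' >> code.txt
git commit -q -am "add a fourth line"

# When the feature is done, merge it back into develop.
git checkout -q develop
git merge -q --no-ff -m "merge feature/new-function into develop" feature/new-function

# List the branches; the asterisk marks the current one.
git branch
```

The code in master only changes when develop is later merged into it for a release, which is what keeps the production pipeline stable while features are in progress.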

GitHub


These commands ran on the local machine, in a local environment. They are only the basic steps, but I am pretty sure that with them you will be able to do 80% of your work every day. If you cannot, there are many tools to help you with the Git commands, like GitKraken, or you can even use plugins for Visual Studio Code.

Keep in mind that the bigger your team is, the more conflicts and simultaneous changes you will have.

To share our code, we can use many tools, like GitLab, GitHub, or Bitbucket; there are dozens of such services. Let's use GitHub because it is the most popular. For some of my personal projects, though, I use Bitbucket, because it offers unlimited private repos.


You can create your own GitHub account for free on their website:

https://github.com/
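Once a hosted repository exists, sharing the local commits comes down to registering a remote and pushing to it. The sketch below shows that flow under a stated assumption: a bare repository on local disk stands in for the GitHub repository so the script can run without an account or network access. With a real account, the remote URL would be the one GitHub shows for your repository. The identity and file contents are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Stand-in for the hosted repository (assumption): a bare repo on local disk.
# On GitHub the URL would look like https://github.com/<user>/<repo>.git
git init -q --bare "$workdir/remote.git"

# Local project with one commit, as in the article.
mkdir "$workdir/Chapter09"
cd "$workdir/Chapter09"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"        # placeholder identity
git config user.email "dev@example.com"
printf 'line 1\nline 2\nline 3\n' > code.txt
git add --all
git commit -q -m "uploading the scaffold of the project"

# Register the remote under the conventional name "origin" and publish master.
# -u makes origin/master the upstream, so later pushes need no arguments.
git remote add origin "$workdir/remote.git"
git push -q -u origin master

# A teammate would then get the code by cloning the shared repository.
git clone -q --branch master "$workdir/remote.git" "$workdir/teammate"
ls "$workdir/teammate"
```

After the first push, the usual daily loop is to commit locally, run git pull to fetch teammates' changes, and run git push to publish your own.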
