
Cloud Storage as a Service Basics (2 of 3)

Cloud storage is one of the most useful cloud services. Yes, it means storing your data in the cloud, but there are a few things worth understanding about how it works.
 
What is cloud storage...
Cloud storage involves exactly what the name suggests—storing your data with a cloud service provider rather than on a local system. As with other cloud services, you access the data stored on the cloud via an Internet link.

Even though data is stored and accessed remotely, you can maintain data both locally and on the cloud as a measure of safety and redundancy. Cloud storage has a number of advantages over traditional data storage:

The benefits...
  • If you store your data on a cloud, you can get at it from any location that has Internet access, which makes it especially appealing to road warriors.
  • Workers don’t need to use the same computer to access data, nor do they have to carry around physical storage devices.
  • If your organization has branch offices, they can all access the same data from the cloud provider.
The Basics: There are hundreds of different cloud storage systems, and some are very specific in what they do. Some are niche-oriented and store just email or digital pictures, while others store any type of data. Some providers are small, while others are huge and fill an entire warehouse.

One of Google’s datacenters in Oregon is the size of a football field and houses thousands of servers.

At the most rudimentary level, a cloud storage system needs just one data server connected to the Internet. A subscriber copies files over the Internet to the server, which then records the data. When a client wants to retrieve the data, he or she accesses the data server through a web-based interface, and the server either sends the files back to the client or allows the client to access and manipulate the data on the server itself.
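To make the copy-and-retrieve idea concrete, here is a minimal sketch in Python. It is only an illustration: the class `SingleStorageServer` and its `upload` and `download` methods are hypothetical names, and an in-memory dictionary stands in for a real server reached over the Internet.

```python
class SingleStorageServer:
    """Toy stand-in for one data server (hypothetical, in-memory only)."""

    def __init__(self):
        # Maps a file name to the bytes recorded for it.
        self._files = {}

    def upload(self, name: str, data: bytes) -> None:
        # The subscriber copies a file to the server, which records it.
        self._files[name] = bytes(data)

    def download(self, name: str) -> bytes:
        # The client asks for the file back; an unknown name raises KeyError.
        return self._files[name]


server = SingleStorageServer()
server.upload("report.txt", b"quarterly numbers")
restored = server.download("report.txt")
```

A real provider would put a web interface and authentication in front of this, but the two operations stay the same: record the data, then hand it back on request.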

How cloud storage works...

In practice, cloud storage systems use dozens or hundreds of data servers. Because servers need maintenance or repair from time to time, the saved data has to be stored on multiple machines, which provides redundancy. Without that redundancy, cloud storage systems couldn’t assure clients that they could access their information at any given time. Most systems store the same data on servers that use different power supplies. That way, clients can still access their data even if one power supply fails.
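The redundancy idea can be sketched in a few lines of Python. This is an assumption-laden toy, not how any particular provider does it: `DataServer`, `ReplicatedStore`, and the power supply labels `feed-A` and `feed-B` are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical data server tied to one power supply."""
    name: str
    power_supply: str
    online: bool = True
    files: dict = field(default_factory=dict)


class ReplicatedStore:
    """Writes every file to all online servers and reads from any of them."""

    def __init__(self, servers):
        self.servers = list(servers)

    def put(self, name, data):
        # Store the same data on every server that is currently online.
        for server in self.servers:
            if server.online:
                server.files[name] = data

    def get(self, name):
        # Return the first copy found on a server that is still online.
        for server in self.servers:
            if server.online and name in server.files:
                return server.files[name]
        raise LookupError(f"no online server holds {name!r}")

    def fail_power_supply(self, power_supply):
        # Simulate an outage: every server on that supply goes offline.
        for server in self.servers:
            if server.power_supply == power_supply:
                server.online = False


store = ReplicatedStore([
    DataServer("server-1", power_supply="feed-A"),
    DataServer("server-2", power_supply="feed-B"),
])
store.put("photo.jpg", b"image bytes")
store.fail_power_supply("feed-A")
recovered = store.get("photo.jpg")
```

Because the copy on `server-2` uses a different power supply, the read still succeeds after `feed-A` goes down. If both supplies failed, `get` would raise `LookupError`.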

Summary...
Many clients use cloud storage not because they’ve run out of room locally, but for safety. If something happens to their building, then they haven’t lost all their data.
