
SAN: Real Architecture Explained

A SAN sits behind the servers and gives them block-level access to shared data storage. With block-level access, a server addresses individual blocks of data on a storage device directly, as opposed to file-level access, where it requests whole files by name. A single file is usually made up of several blocks.
Figure: A simplified SAN architecture, showing how data from the servers is stored on the storage devices.
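
To make the block-level idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a temporary image file stands in for a SAN volume, the 512-byte block size is assumed, and `read_block` and `blocks_needed` are hypothetical helper names, not part of any SAN API.

```python
import os
import tempfile

BLOCK_SIZE = 512  # bytes per block; an assumed, commonly used size


def read_block(device_path: str, block_number: int) -> bytes:
    """Block-level access: fetch one fixed-size block by its number."""
    with open(device_path, "rb") as dev:
        dev.seek(block_number * BLOCK_SIZE)
        return dev.read(BLOCK_SIZE)


def blocks_needed(file_size: int) -> int:
    """Number of whole blocks a file of file_size bytes occupies."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE


# Stand-in for a SAN volume: a 4-block image, each block filled with one marker byte.
with tempfile.NamedTemporaryFile(delete=False) as image:
    for marker in (b"A", b"B", b"C", b"D"):
        image.write(marker * BLOCK_SIZE)
    image_path = image.name

third_block = read_block(image_path, 2)  # blocks are numbered from 0
os.remove(image_path)

print(third_block[:4])      # first bytes of block 2
print(blocks_needed(1300))  # blocks occupied by a 1300-byte file
```

The caller asks for "block 2", not for a file name. Mapping file names to block numbers is the job of the file system on the server, which is why one file ends up spread over several blocks.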





Storage Area Networks (SANs)
  • SANs provide high availability and robust business continuity for critical data environments. SANs are typically switched fabric architectures using Fibre Channel (FC) for connectivity.
  • The term switched fabric means that each storage unit is connected to each server through multiple SAN switches (large, highly available models are often called SAN directors). This gives redundant paths to the storage units, so no single central switch is a single point of failure.
  • Ethernet shares many of Fibre Channel's advantages for supporting SANs, including high speed, support for a switched fabric topology, widespread interoperability, and a large set of management tools.
  • In a storage network application, the switch is the key element. With the significant number of Gigabit and 10 Gigabit Ethernet ports shipped, leveraging IP and Ethernet for storage is a natural progression for some environments. 
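
The redundancy argument in the switched fabric bullet can be checked with a small graph model. This is a sketch, not a real fabric tool: the topology, the node names, and the helper functions `reachable` and `single_points_of_failure` are all hypothetical.

```python
from collections import deque

# Hypothetical dual-switch fabric: each server and the array link to both switches.
LINKS = [
    ("server1", "switchA"), ("server1", "switchB"),
    ("server2", "switchA"), ("server2", "switchB"),
    ("switchA", "array1"), ("switchB", "array1"),
]


def reachable(links, start, goal, failed=frozenset()):
    """Breadth-first search that ignores every node listed in `failed`."""
    neighbours = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, server, storage):
    """Switches whose loss alone cuts the server off from the storage."""
    switches = {n for link in links for n in link if n.startswith("switch")}
    return sorted(s for s in switches if not reachable(links, server, storage, {s}))


redundant = single_points_of_failure(LINKS, "server1", "array1")
single_switch = [("server1", "switchA"), ("switchA", "array1")]
fragile = single_points_of_failure(single_switch, "server1", "array1")

print(redundant)  # switches that are single points of failure in the dual fabric
print(fragile)    # same check for a design with only one switch
```

In the dual-switch layout the list comes back empty, because losing either switch still leaves a path. In the one-switch layout that switch is reported as a single point of failure.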
SAN vs. IP
  1. IP was developed as an open standard with complete interoperability of components. Two newer storage networking technologies that build on this ecosystem are Fibre Channel over Ethernet (FCoE), which carries Fibre Channel frames directly over Ethernet, and iSCSI, which carries SCSI commands over TCP/IP. Tunneling Fibre Channel or other storage traffic across a standard IP network also makes it possible to use storage in locations beyond the roughly 10 km limit of a direct Fibre Channel link over fiber.
  2. Internal to the data center, legacy Fibre Channel can also be run over coaxial cable or twisted pair cabling, but at significantly shorter distances.
  3. Incorporating the IP standard into these storage systems offers benefits in speed, availability, fault tolerance, and scalability. Properly implemented, these solutions can come very close to 100% availability of data. The IP-based management protocols also give network managers a set of tools, warnings, and triggers that were proprietary in previous generations of storage technology. Security and encryption options are also greatly improved. With 10 Gigabit Ethernet gaining popularity and faster WAN links becoming available, these solutions can offer true storage on demand.
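
To see why redundancy can push availability close to, but never quite to, 100%, here is a rough back-of-the-envelope calculation in Python. The 99% per-path figure is an assumed example value, not a measurement, and the formula assumes that path failures are independent, which real deployments only approximate.

```python
def parallel_availability(path_availability: float, paths: int) -> float:
    """Availability of `paths` independent redundant paths when any one suffices."""
    return 1.0 - (1.0 - path_availability) ** paths


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours over a 365-day year."""
    return (1.0 - availability) * 365 * 24


ASSUMED_PATH_AVAILABILITY = 0.99  # illustrative value only

for n in (1, 2, 3):
    a = parallel_availability(ASSUMED_PATH_AVAILABILITY, n)
    print(f"{n} path(s): availability {a:.6f}, "
          f"about {downtime_hours_per_year(a):.2f} hours of downtime per year")
```

Under these assumptions, a single path at 99% means roughly 87.6 hours of downtime a year, while two independent paths bring that to under an hour. Each extra path removes most of the remaining downtime, but the result only approaches 100%.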
