R Language basics for Beginners to Apply in Analytics

In the early days, a key feature of R was that its syntax was very similar to that of S, which made it easy for S-PLUS users to switch over. Although R's syntax is nearly identical to S's, its semantics, while superficially similar, are quite different.


Steps to learn R Language


In fact, R is technically much closer to the Scheme language than it is to the original S language when it comes to how R works under the hood. Today R runs on almost any standard computing platform and operating system. Its open-source nature means that anyone is free to adapt the software to whatever platform they choose.


Indeed, R has been reported to be running on modern tablets, phones, PDAs, and game consoles. One nice feature that R shares with many popular open-source projects is frequent releases. These days there is a major annual release, typically in the spring (April in recent years; earlier releases came out in October), where major new features are incorporated and released to the public. Throughout the year, smaller-scale bugfix releases are made as needed.


The frequent releases and regular release cycle indicate active development of the software and ensure that bugs are addressed in a timely manner.

Of course, while the core developers control the primary source tree for R, many people around the world make contributions in the form of new features, bug fixes, or both. Another key advantage that R has over many other statistical packages (even today) is its sophisticated graphics capabilities. 


R’s ability to create “publication quality” graphics has existed since the very beginning and has generally been better than competing packages.

Today, with many more visualization packages available than before, that trend continues. R’s base graphics system allows for very fine control over essentially every aspect of a plot or graph.


Other newer graphics systems, like lattice and ggplot2, allow for complex and sophisticated visualizations of high-dimensional data. R has maintained the original S philosophy, which is that it provides a language that is useful for interactive work and also contains a powerful programming language for developing new tools.

This allows the user, who takes existing tools and applies them to data, to slowly but surely become a developer who is creating new tools.


Finally, one of the joys of using R has nothing to do with the language itself, but rather with the active and vibrant user community. 


In many ways, a language is successful inasmuch as it creates a platform with which many people can create new things. R is that platform, and thousands of people around the world have come together to make contributions to R, to develop packages, and to help each other use R for all kinds of applications.


The R-help and R-devel mailing lists have been highly active for over a decade now and there is considerable activity on websites like Stack Overflow.
