Top Features of Apache Avro in the Hadoop Ecosystem

Avro defines a data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

The Hadoop ecosystem includes Avro, a binary data serialization system.

Avro provides:
·         Rich data structures.
·         A compact, fast, binary data format.
·         A container file, to store persistent data.
·         Remote procedure call (RPC).
·         Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

Its functionality is similar to that of other marshaling systems, such as Thrift and Protocol Buffers.

The main differentiators of Avro include the following:

Dynamic typing: Avro always keeps data and its corresponding schema together (a container file, for example, stores the schema in its header). As a result, marshaling and unmarshaling require neither code generation nor static data types, which also allows generic data processing.
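
To make the idea of generic processing concrete, here is a minimal sketch in plain Python that decodes a record purely by walking its schema, with no generated classes. The `User` schema, the helper names, and the sample bytes are assumptions made for illustration, and only the `int`, `long`, and `string` types are handled; a real application would use an Avro library instead.

```python
import io
import json

# Hypothetical schema for illustration. In an Avro container file the
# writer's schema is stored as JSON in the file header, next to the data.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
"""


def read_long(stream):
    """Read one zigzag-encoded, variable-length Avro int/long."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)[0]
        accum |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this number
            break
        shift += 7
    # Undo zigzag: even values are non-negative, odd values are negative.
    return (accum >> 1) ^ -(accum & 1)


def read_string(stream):
    """Read a length-prefixed UTF-8 string."""
    length = read_long(stream)
    return stream.read(length).decode("utf-8")


READERS = {"int": read_long, "long": read_long, "string": read_string}


def read_record(schema, stream):
    """Decode a flat record by following the schema's field order."""
    return {f["name"]: READERS[f["type"]](stream) for f in schema["fields"]}


schema = json.loads(SCHEMA_JSON)
payload = b"\x36\x06foo"  # encoded form of id=27, name="foo"
user = read_record(schema, io.BytesIO(payload))
print(user)
```

The same `read_record` function works for any flat record schema built from these types, which is the sense in which schema-driven processing is generic.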

Untagged data: Because the schema is available whenever data is read, Avro marshaling/unmarshaling does not require per-field type tags or manually assigned field IDs to be encoded in the data. As a result, Avro serialization produces a smaller output.
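
The size benefit of untagged data can be sketched with a small hand-written encoder. It follows the Avro binary encoding for `int`/`long` (zigzag, then variable-length bytes) and `string` (length prefix plus UTF-8 bytes), and writes record fields back to back in schema order. The schema, the datum, and the function names are assumptions for illustration, not part of any Avro library.

```python
import json


def encode_long(n):
    """Encode an integer the way Avro does: zigzag, then 7 bits per byte."""
    n = (n << 1) ^ (n >> 63)  # zigzag mapping for 64-bit values
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)  # set high bit: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_string(s):
    """Encode a string as a length prefix followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


WRITERS = {"int": encode_long, "long": encode_long, "string": encode_string}


def encode_record(schema, datum):
    """Concatenate field values in schema order: no names, tags, or IDs."""
    return b"".join(
        WRITERS[f["type"]](datum[f["name"]]) for f in schema["fields"]
    )


# Hypothetical schema and record, for illustration only.
schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}
datum = {"id": 27, "name": "foo"}

binary = encode_record(schema, datum)
as_json = json.dumps(datum).encode("utf-8")
print(len(binary), "bytes as Avro-style binary")
print(len(as_json), "bytes as JSON text")
```

The binary form carries only the values; the field names and types live in the schema, which is stored or exchanged once rather than repeated in every record.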

Enhanced versioning support: When a schema changes, both the schema used to write the data and the schema expected by the reader are available at read time, which enables Avro to resolve differences symbolically, based on field names.
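
Name-based resolution between two schema versions can be sketched as follows. This is a simplified illustration, not Avro's full resolution algorithm: it matches fields by name, ignores fields the reader does not know, and fills in reader defaults, but it skips aliases and type promotion. The `resolve` function and both schemas are hypothetical.

```python
def resolve(datum, writer_schema, reader_schema):
    """Project a record written with one schema onto a reader's schema.

    Fields are matched by name. Writer-only fields are dropped, and
    reader-only fields take their declared default.
    """
    written = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            result[name] = datum[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result


# Hypothetical version 1 of a schema (used by the writer).
writer_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "legacy_flag", "type": "boolean"},
    ],
}

# Hypothetical version 2 (used by the reader): one field removed, one added.
reader_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

old_record = {"id": 27, "name": "foo", "legacy_flag": True}
resolved = resolve(old_record, writer_schema, reader_schema)
print(resolved)
```

Because matching is done on field names rather than positions or numeric IDs, fields can be added (with defaults) or removed without breaking readers of older data.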

Because of its high performance, small codebase, and compact output, Avro has been widely adopted, not only in the Hadoop community but also by many NoSQL implementations (including Cassandra).

At the heart of Avro is a data serialization system. Avro can either use reflection to dynamically generate schemas for existing Java objects, or use an explicit Avro schema, which is a JavaScript Object Notation (JSON) document describing the data format. Avro schemas can contain both simple and complex types.
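
As an illustration of an explicit schema, the sketch below holds a small record schema as a JSON document and inspects it with Python's standard `json` module. The record name, namespace, and fields are invented for this example.

```python
import json

# A hypothetical Avro record schema, written as a JSON document.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "doc": "A hypothetical user record, for illustration only.",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "score", "type": "double"}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)

# The full name of a named type is its namespace joined to its name.
full_name = f'{schema["namespace"]}.{schema["name"]}'

# Map each field name to its declared type.
field_types = {f["name"]: f["type"] for f in schema["fields"]}

print(full_name)
print(field_types)
```

Since the schema is ordinary JSON, any language with a JSON parser can read it, which is part of what makes Avro easy to use from dynamic languages.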

Simple data types supported by Avro include null, boolean, int, long, float, double, bytes, and string. Here, null is a special type corresponding to no data; a field can hold null only if its type allows it, typically through a union such as ["null", "string"].

Complex types supported by Avro include the following:
Record: This is roughly equivalent to a C structure. A record has a name and an optional namespace, doc string, and aliases. It contains a list of named fields that can be of any Avro type.
Enum: This is an enumeration of values. An enum has a name and an optional namespace, doc string, and aliases, and contains a list of symbols (valid JSON strings).
Array: This is a collection of items of the same type.
Map: This is a collection of key/value pairs in which keys are strings and all values are of one specified type.
Union: This represents a value that may be any one of several listed types. A common use for unions is to specify nullable values, for example ["null", "string"].
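
The complex types above can be combined in a single schema. The following sketch defines a hypothetical `Order` record that uses an enum, an array, a map, and a nullable union, together with a small `matches` function that checks a Python value against a schema. The checker is an assumption-laden simplification: it does not handle named type references or numeric ranges, and it is not a substitute for validation in a real Avro library.

```python
import json

# Hypothetical schema that uses each complex type described above.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "status",
     "type": {"type": "enum", "name": "Status", "symbols": ["NEW", "SHIPPED"]}},
    {"name": "items", "type": {"type": "array", "items": "string"}},
    {"name": "attributes", "type": {"type": "map", "values": "long"}},
    {"name": "note", "type": ["null", "string"], "default": null}
  ]
}
""")


def is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": is_integer,
    "long": is_integer,
    "float": lambda v: isinstance(v, float),
    "double": lambda v: isinstance(v, float),
    "bytes": lambda v: isinstance(v, bytes),
    "string": lambda v: isinstance(v, str),
}


def matches(schema, value):
    """Return True if value fits the schema (simplified check)."""
    if isinstance(schema, str):   # primitive type name
        return PRIMITIVES[schema](value)
    if isinstance(schema, list):  # union: any branch may match
        return any(matches(branch, value) for branch in schema)
    kind = schema["type"]
    if kind == "record":
        return isinstance(value, dict) and all(
            f["name"] in value and matches(f["type"], value[f["name"]])
            for f in schema["fields"]
        )
    if kind == "enum":
        return value in schema["symbols"]
    if kind == "array":
        return isinstance(value, list) and all(
            matches(schema["items"], v) for v in value
        )
    if kind == "map":
        return isinstance(value, dict) and all(
            isinstance(k, str) and matches(schema["values"], v)
            for k, v in value.items()
        )
    return matches(kind, value)   # e.g. {"type": "string"}


good = {"status": "NEW", "items": ["pen"], "attributes": {"qty": 2}, "note": None}
bad = dict(good, status="LOST")  # "LOST" is not one of the enum symbols

print(matches(SCHEMA, good))
print(matches(SCHEMA, bad))
```

The `note` field shows the common nullable pattern: a union of `null` and `string`, so the value may be either absent (`None`) or a string.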
