Posts

Showing posts from September, 2015

Featured Post

SQL Interview Success: Unlocking the Top 5 Frequently Asked Queries

Here are the top five commonly asked SQL queries in interviews. You can expect these in Data Analyst or Data Engineer interviews.

Top SQL Queries for Interviews

01. Joins
The commonly asked question provides two tables and asks how many rows various join types will return, and what the result set looks like.

Table1 (id): 1, 1, 2, 3
Table2 (id): 1, 3, 1, NULL

Inner join: 5 rows will return. The result will be:
1  1
1  1
1  1
1  1
3  3

02. Substring and Concat
Here, we need to write an SQL query that makes the first letter upper case and the remaining letters lower case.

Table1 (ename): raJu, venKat, kRIshna

Solution:
SELECT CONCAT(UPPER(SUBSTRING(ename, 1, 1)), LOWER(SUBSTRING(ename, 2))) AS capitalized_name
FROM Table1;

03. Case statement
SQL Query:
SELECT Code1, Code2,
    CASE
        WHEN Code1 = 'A' AND Code2 = 'AA' THEN "A" | "A…
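The CASE branch above is cut off mid-expression. As a rough sketch of how a complete CASE expression of this shape reads, the branch results ('Match', 'No match') and the alias are placeholders, not the original post's values:

SELECT Code1, Code2,
       CASE
           WHEN Code1 = 'A' AND Code2 = 'AA' THEN 'Match'   -- both codes agree
           WHEN Code1 = 'B' AND Code2 = 'BB' THEN 'Match'
           ELSE 'No match'                                  -- any other combination
       END AS match_flag
FROM Table1;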

6 Advantages of Columnar Databases over Traditional RDBMS

In a traditional RDBMS, when a data source is accessed by multiple users at the same time, the database can end up in a deadlock state. One of the advantages of the columnar model is that if two or more users want to use a different subset of columns, they do not have to lock each other out.

This design is made easier by a disk storage method known as RAID (redundant array of independent disks, originally redundant array of inexpensive disks), which combines multiple disk drives into a logical unit. Data is stored in several patterns called levels that have different amounts of redundancy. The idea of the redundancy is that when one drive fails, the other drives can take over. When a replacement disk drive is put in the array, the data is replicated from the other disks in the array and the system is restored. The following are the various levels of RAID: RAID 0 (block-level striping without parity or mirroring) has no (or zero) redundancy…

Top features of Apache Avro in Hadoop eco-System

Avro defines a data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages. The Hadoop ecosystem includes a new binary data serialization system: Avro. Avro provides:

· Rich data structures.
· A compact, fast, binary data format.
· A container file, to store persistent data.
· Remote procedure call (RPC).
· Simple integration with dynamic languages.

Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages. Avro's functionality is similar to that of other marshaling systems such as Thrift, Protocol Buffers, and so on. The main differentiators of Avro include the following:

Dynamic typing: The Avro implementation always keeps data and its corresponding schema together. As a result…

RDBMS Vs Key-value: Top Four Differences

This post tells you the differences between an RDBMS and distributed key-value storage. An RDBMS is quite different from key-value storage.

RDBMS (Relational Database)
You have already used a relational database management system, a storage product commonly referred to as an RDBMS. It handles structured data. RDBMS systems are fantastically useful for handling moderate volumes of data. The BIG challenge is in scaling beyond a single server. You can't maintain redundant data in an RDBMS: all the data lives on a single server, and the entire database runs on that server. So when the server is down, the database may not be available for normal business operations. Outages and server failures are common risks in this model of database.

Key-Value Database
Key-value storage systems often make use of redundancy within hardware resources to prevent outages. This concept is important when you're running thousands of servers, because they're bound to suffer hardware breakdowns…

Amazon Web Services - Object Storage

Object Storage: Object storage provides the ability to store, well, objects, which are essentially collections of digital bits. Those bits may represent a digital photo, an MRI scan, a structured document such as an XML file, or the video of your cousin's embarrassing attempt to ride a skateboard down the steps at the public library (the one you premiered at his wedding).

Object storage offers reliable (and highly scalable) storage of collections of bits, but imposes no structure on the bits. The structure is chosen by the user, who needs to know, for example, whether an object is a photo (which can be edited) or an MRI scan (which requires a special application for viewing it). The user has to know both the format and the manipulation methods of the object. The object storage service simply provides reliable storage of the bits.

Difference between Object storage and File storage
Object storage differs from file storage, which you may be more familiar with from using…

SOAP Vs REST top differences you need to know

What is SOAP?
SOAP is based on a document encoding standard known as Extensible Markup Language (XML, for short), and the SOAP service is defined in such a way that users can then leverage XML no matter what the underlying communication network is. For this system to work, though, the data transferred by SOAP (commonly referred to as the payload) also needs to be in XML format. Notice a pattern here? The push to be comprehensive and flexible (or, to be all things to all people), plus the XML payload requirement, meant that SOAP ended up being quite complex, making it a lot of work to use properly. As you might guess, many IT people found SOAP daunting and, consequently, resisted using it.

About a decade ago, a doctoral student defined another web services approach as part of his thesis: REST (Representational State Transfer). REST, which is far less comprehensive than SOAP, aspires to solve fewer problems. It doesn't address some aspects of SOAP that seemed important but that…

SQL Vs NOSQL real differences to read today

SQL and NoSQL are two different technologies used with different kinds of databases. NoSQL is most popular for big data analytics, whereas SQL is popular with relational databases.

SQL Vs NOSQL Top Differences

SQL
SQL is the structured query language.
It was the first commercial language used in RDBMS.
The SQL language is divided into multiple sub-elements, as the sketch below illustrates.

NoSQL
Data is not on one machine or even one network.
Data can be of any type: public data and private data.
The volume of data is so huge that you cannot put it in one place.
It is uncoordinated in time as well as space.
It is not always the nice, structured data that SQL was meant to handle.

Also Read: RDBMS Vs NoSQL Databases top differences
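To make "sub-elements" concrete: SQL statements fall into groups such as DDL (data definition), DML (data manipulation), and DCL (data control). A minimal sketch follows; the table and column names are made up for the example.

-- DDL: define a table
CREATE TABLE employees (id INT, ename VARCHAR(50));

-- DML: add and query rows
INSERT INTO employees (id, ename) VALUES (1, 'Raju');
SELECT ename FROM employees WHERE id = 1;

-- DCL: control access
GRANT SELECT ON employees TO analyst_role;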

What is CompTIA Cloud+ Certification

What is CompTIA Cloud+ Certification: The CompTIA Cloud+ certification is an internationally recognized validation of the knowledge required of IT practitioners working in cloud computing environments. This exam certifies that the successful candidate has the knowledge and skills required to understand standard cloud terminology and methodologies; to implement, maintain, and deliver cloud technologies and infrastructures (e.g., server, network, storage, and virtualization technologies); and to understand aspects of IT security and the use of industry best practices related to cloud implementations and the application of virtualization.

Related: Cloud Computing Jobs

Cloud+ certified professionals ensure that proper security measures are maintained for cloud systems, storage, and platforms to mitigate risks and threats while ensuring usability. The exam is geared toward IT professionals with 24 to 36 months of experience in IT networking, network storage, or data center administration…

Oracle 12C 'Bitmap Index' benefits over B-tree Index

Oracle 12c 'Bitmap Index' benefits over B-tree Index: A bitmap index has a significantly different structure from a B-tree index in the leaf node of the index. It stores one string of bits for each possible value (the cardinality) of the column being indexed.

Note: "one string of bits" means that for each possible value of the indexed column, every row is assigned a '1' bit (if it holds that value) or a '0' bit (if it does not), and those bits together form a string.

The length of the string of bits is the same as the number of rows in the table being indexed. In addition to saving a tremendous amount of space compared to traditional indexes, a bitmap index can provide dramatic improvements in response time, because Oracle can quickly eliminate potential rows from a query containing multiple WHERE clauses long before the table itself needs to be accessed. Multiple bitmaps can be combined with logical AND and OR operations to determine which rows to access from the table, as the sketch below shows. Although you can…
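A minimal sketch of the idea in Oracle SQL; the employees table and its low-cardinality columns are assumptions for the example, not from the original post.

-- One bitmap index per low-cardinality column
CREATE BITMAP INDEX emp_gender_idx ON employees (gender);
CREATE BITMAP INDEX emp_region_idx ON employees (region);

-- Oracle can AND the two bitmaps together to find matching rows
-- before ever touching the table blocks
SELECT COUNT(*)
FROM employees
WHERE gender = 'F'
  AND region = 'WEST';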

What is Elastic Nature in Cloud Computing

Natural clouds are indeed elastic, expanding and contracting based on the force of the winds carrying them. The cloud is similarly elastic, expanding and shrinking based on resource usage and cloud tenant resource demands. The physical resources (computing, storage, networking, etc.) deployed within the data center or across data centers and bundled as a single cloud usually do not change that fast. This elastic nature, therefore, is something that is built into the cloud at the software stack level, not the hardware.

Best cloud computing example: The classic promise of the cloud is to make compute resources available on demand, which means that, theoretically, a cloud should be able to scale as a business grows and shrink as the demand diminishes. Consider, for example, Amazon.com during Black Friday. There's a spike in inbound traffic, which translates into more memory consumption, increased network density, and increased compute resource utilization. If Amazon.com had…

Essential features of Hadoop Data joins (1 of 2)

Limitation of map-side joining: A record being processed by a mapper may be joined with a record not easily accessible (or even located) by that mapper. This is the main limitation.

Who facilitates a map-side join: Hadoop's org.apache.hadoop.mapred.join package contains helper classes to facilitate this map-side join.

What is joining data in Hadoop: You will come across scenarios where you need to analyze data from multiple sources; in such scenarios Hadoop uses data joining. In the database world, combining two or more tables is called a join (the SQL sketch below shows the equivalent operation). In Hadoop, joining data involves different approaches:

Reduce-side join
Replicated join using a distributed cache
Semi-join: reduce-side join with map-side filtering

What is the functionality of a MapReduce job: The traditional MapReduce job reads a set of input data, performs some transformations in the map phase, sorts the results, performs another transformation in the reduce phase, and writes a set of output data. The…
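For reference, this is the database-world join that the Hadoop approaches above reproduce over files; the customers and orders tables are illustrative, not from the original post.

SELECT c.id, c.name, o.order_id
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- A reduce-side join achieves the same result by tagging records from
-- both inputs with their join key (the customer id) and letting the
-- shuffle bring matching records together at one reducer.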

How to Quickly Verify SSH Is Installed on a Hadoop Cluster

The commands below help you check whether SSH is installed on your Hadoop cluster.

[hadoop-user@master]$ which ssh
/usr/bin/ssh
[hadoop-user@master]$ which sshd
/usr/sbin/sshd
[hadoop-user@master]$ which ssh-keygen
/usr/bin/ssh-keygen

If you do not get a proper response like the one above, SSH is not installed on your cluster.

Resolution: If you receive an error message such as

/usr/bin/which: no ssh in (/usr/bin:/usr/sbin...)

you need to install OpenSSH (www.openssh.com) via your Linux package manager, or by downloading the source directly.

Note: This is usually done by the System Admin.

How to Use Help Command in HDFS

Sometimes, as a Hadoop developer, it is difficult to remember all the Hadoop commands. The help command is useful for recalling the correct syntax.

How to List All HDFS Commands

hadoop fs

Entering this will list all Hadoop fs commands.

Help Command in HDFS
Hadoop commands are a flavor of UNIX. If you want to see a description of each command, you can use the Hadoop help command:

hadoop fs -help ls

Deleting a File in Hadoop HDFS
The command below deletes a file from the Hadoop cluster:

hadoop fs -rm example.txt

Why Amazon Web Services (AWS) Cloud Computing Is So Popular

Amazon started its cloud computing services in three stages:

S3 (Simple Storage Service)
SQS (Simple Queue Service)
EC2 (Elastic Compute Cloud)

Amazon Web Services was officially revealed to the world on March 13, 2006. On that day, AWS offered the Simple Storage Service, its first service. (As you may imagine, Simple Storage Service was soon shortened to S3.) The idea behind S3 was simple: it could offer the concept of object storage over the web, a setup where anyone could put an object, essentially any bunch of bytes, into S3. Those bytes may comprise a digital photo or a file backup or a software package or a video or audio recording or a spreadsheet file or, well, you get the idea.

S3 was relatively limited when it first started out. Though objects could, admittedly, be written or read from anywhere, they could be stored in only one region: the United States. Moreover, objects could be no larger than 5 gigabytes, not tiny by any means, but certainly smaller than many…

Top 100 Hadoop Complex Interview Questions (Part 4 of 4)

The Hadoop framework is most popular in data analytics and data-related projects. Here is my 4th set of questions for you to read quickly.

1). What is MapReduce?
Ans). It is a framework, or a programming model, that is used for processing large data sets over clusters of computers using distributed programming.

2). What are 'maps' and 'reduces'?
Ans). 'Maps' and 'reduces' are two phases of solving a query in HDFS. 'Map' is responsible for reading data from the input location and, based on the input type, generating a key-value pair, that is, an intermediate output, on the local machine. 'Reducer' is responsible for processing the intermediate output received from the mapper and generating the final output.

3). What are the four basic parameters of a mapper?
Ans). The four basic parameters of a mapper are LongWritable, Text, Text, and IntWritable. The first two represent the input parameters and the second two represent the intermediate output parameters.

4). What are the four basic parame…

Cloudera Impala top features useful for developers

Cloudera Impala is a query engine that runs on Apache Hadoop. The project was announced in October 2012 with a public beta release, and its popular use is in data analytics. Here are the key features, useful for interviews.

Impala
The Apache-licensed Impala project brings scalable parallel database technology to Hadoop, allowing users to issue low-latency SQL queries against data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is integrated with Hadoop to use the same file and data formats, metadata, security, and resource management frameworks used by MapReduce, Apache Hive, Apache Pig, and other Hadoop software.

Impala Applications
Impala is designed for analysts and data scientists to perform analytics on data stored in Hadoop via SQL or business intelligence tools. The result is that large-scale data processing (via MapReduce) and interactive queries…
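A minimal sketch of the kind of low-latency SQL you would issue through Impala; the sales table and its columns are invented for this example.

-- Interactive aggregation over data already stored in HDFS
SELECT region,
       COUNT(*) AS orders,
       SUM(amount) AS revenue
FROM sales
WHERE sale_year = 2015
GROUP BY region
ORDER BY revenue DESC;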

Top 100 Hadoop Complex Interview Questions (Part 3 of 4)

These are complex Hadoop interview questions. This is my 3rd set of questions, useful for your interviews (3 of 4).

1). What are the features of Standalone (local) mode?
Ans). In stand-alone mode there are no daemons; everything runs on a single JVM. It has no DFS and utilizes the local file system. Stand-alone mode is suitable only for running MapReduce programs during development. It is one of the least used environments.

2). What are the features of Pseudo mode?
Ans). The pseudo mode is used both for development and in the QA environment. In pseudo mode, all the daemons run on the same machine.

3). Can we call VMs pseudos?
Ans). No, VMs are not pseudos, because a VM is something different and pseudo mode is very specific to Hadoop.

4). What are the features of Fully Distributed mode?
Ans). The fully distributed mode is used in the production environment, where we have 'n' number of machines forming a Hadoop cluster. Hadoop daemons run on a cluster of machines. There i…

Tutorial: SAP HANA Basics for Beginners

What is SAP HANA?
HANA stands for High-Performance Analytic Appliance. SAP HANA is a combination of hardware and software, and is therefore an appliance. SAP HANA supports both column-based and row-based storage, as the sketch below shows. We can store and perform analytics on a huge amount of real-time, non-aggregated transactional data. Hence, HANA acts as both a database and a warehousing tool, which helps in making decisions at the right time.

Challenges in a Traditional RDBMS
There are a few challenges in traditional databases, such as latency, the cost involved, and complexity in accessing databases.

Related: SAP HANA jobs and career options

What is the architecture of a traditional RDBMS?
Presentation Layer: This is the top-most layer; it allows users to manipulate data and input it for querying. This data input from users is passed on to the database layer through the application layer, and the results are passed back to the application layer to implement business logic. The presentation layer…
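To make the column- versus row-storage point concrete, SAP HANA lets you choose the store per table in SQL. A minimal sketch; the table and column names are invented for the example.

-- Column store: suited to scanning and aggregating many rows
CREATE COLUMN TABLE sales_facts (
    id     INT,
    region NVARCHAR(20),
    amount DECIMAL(10, 2)
);

-- Row store: suited to frequent single-row transactional access
CREATE ROW TABLE session_state (
    session_id INT,
    last_seen  TIMESTAMP
);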

Top 100 Hadoop Complex Interview Questions (Part 2 of 4)

I am giving a series of Hadoop interview questions. This is my 2nd set of questions. You can get quick benefits by reading these questions from start to end.

1). If a DataNode is full, how is that identified?
Ans). When data is stored in a DataNode, the metadata of that data is stored in the NameNode. So the NameNode will identify whether the DataNode is full.

2). If DataNodes increase, then do we need to upgrade the NameNode?
Ans). While installing the Hadoop system, the NameNode is sized based on the size of the cluster. Most of the time, we do not need to upgrade the NameNode, because it does not store the actual data, just the metadata, so such a requirement rarely arises.

3). Are the JobTracker and TaskTrackers present on separate machines?
Ans). Yes, the JobTracker and TaskTrackers are present on different machines. The reason is that the JobTracker is a single point of failure for the Hadoop MapReduce service. If it goes down, all running jobs are halted.

4). When we send a data…

Top 100 Hadoop Complex Interview Questions (Part 1 of 4)

The list below contains complex interview questions as part of the Hadoop tutorial (part 1 of 4); you can go through these questions quickly.

1. What is BIG DATA?
Ans). Big Data is an assortment of such huge and complex data that it becomes very tedious to capture, store, process, retrieve, and analyze it with the help of on-hand database management tools or traditional data processing techniques.

2. Can you give some examples of Big Data?
Ans). There are many real-life examples of Big Data! Facebook is generating 500+ terabytes of data per day; NYSE (New York Stock Exchange) generates about 1 terabyte of new trade data per day; a jet airline collects 10 terabytes of sensor data for every 30 minutes of flying time. All these are day-to-day examples of Big Data!

3. Can you give a detailed overview of the Big Data being generated by Facebook?
Ans). As of December 31, 2012, there are 1.06 billion monthly active users on Facebook and 680 million mobile users. On average,…

How to Set Up a Hadoop Cluster: Top Ideas

This post explains setting up a Hadoop cluster on the CentOS operating system, so you can install CentOS either on your laptop or in a virtual machine.

Hadoop Cluster Setup Process

9-Step Process to Set Up a Hadoop Cluster

Step 1: Installing Sun Java on Linux. Commands to execute:
sudo apt-add-repository ppa:flexiondotorg/java
sudo apt-get update
sudo apt-get install sun-java6-jre sun-java6-plugin
sudo update-java-alternatives -s java-6-sun

Step 2: Create a Hadoop user. Commands to execute:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Step 3: Install the SSH server if it is not already present. Commands:
$ sudo apt-get install openssh-server
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Step 4: Installing Hadoop. Commands:
$ wget http://www.eng.lsu.edu/mirrors/apache/hadoop/core/hadoop-0.22.0/hadoop-0.22.0.tar.gz
$ cd /home/hduser
$ tar xzf hadoop-0.…