The Ultimate Cheat Sheet On Hadoop

The cheat sheet below gives the top 20 frequently asked questions to test your Hadoop knowledge. Try to answer each one yourself, then compare your answers with the ones given here.




Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?



A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
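Ans: E. A combiner aggregates each mapper's output locally, so fewer records cross the network during the shuffle.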
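
To see why a combiner helps in Question #1, here is a minimal sketch in plain Python, not Hadoop code. It simulates a word count over two made-up input splits and counts how many records would be shuffled with and without local aggregation. The function names and the sample data are assumptions for illustration only. In a real Hadoop job the combiner is registered with `job.setCombinerClass(...)`, and it is only safe when the reduce operation is commutative and associative, as summing is.

```python
from collections import Counter, defaultdict

def map_words(split):
    """Mapper: emit one (word, 1) pair per token in the input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Combiner: sum counts per key on the mapper's side, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_counts(shuffled_pairs):
    """Reducer: sum every value that arrives for each key."""
    totals = defaultdict(int)
    for key, value in shuffled_pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits with a skewed key distribution.
splits = [
    ["error error error warn", "error info error"],
    ["error error warn", "error"],
]

map_outputs = [map_words(split) for split in splits]

# Records that would cross the network in each case.
without_combiner = [pair for out in map_outputs for pair in out]
with_combiner = [pair for out in map_outputs for pair in combine(out)]

records_without = len(without_combiner)
records_with = len(with_combiner)

# The final result is the same either way; only the shuffle volume changes.
result_without = reduce_counts(without_combiner)
result_with = reduce_counts(with_combiner)
```

The more skewed the keys are, the more a combiner saves, because many pairs with the same key collapse into one record per mapper.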




Question #2 

Where is the Hive metastore stored by default?


A. In HDFS
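B. On the client machine, in the form of a flat file.
C. On the client machine, in a Derby database.
D. In the lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: C. By default Hive uses an embedded Derby database, created in the directory from which Hive is launched.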




Question…

Cloud integration: the two top methods people follow

IT as a service (ITaaS) is the most recent entrant to the IT landscape, and it is an efficient delivery method. With the rapid rise of service-orientation principles, every IT element is now viewed as a service, which sets the tone for the service era.

Two top methods for Cloud integration

1 - Integration as a service (IaaS)

  1. Integration as a service (abbreviated IaaS here, not to be confused with infrastructure as a service) is an emerging and distinctive cloud capability that helps fulfil internal as well as external business integration requirements. Increasingly, business applications are deployed in clouds to reap the business and technical benefits they offer.
  2. On the other hand, many mission-critical applications and data sources still remain on-premise, primarily because of security concerns about hosting them in clouds. The question is how to create a seamless data flow between hosted and on-premise applications so that they work together.
  3. IaaS overcomes these challenges by using time-tested B2B integration technology as the bridge between SaaS solutions and in-house business applications. B2B systems are capable of driving this new on-demand integration model because they are traditionally used to automate business processes between manufacturers and their trading partners.
  4. This means they provide application-to-application connectivity along with the functionality that is crucial for linking internal and external software securely.
  5. Unlike conventional enterprise application integration (EAI) solutions, which are designed only for internal data sharing, B2B platforms can encrypt files for safe passage across the public network, manage large data volumes, transfer batch files, convert disparate file formats, and guarantee data delivery across multiple enterprises.
  6. IaaS simply imitates this established communication and collaboration model to create a reliable and durable link that ensures smooth data passage between traditional and cloud systems over the web infrastructure.
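
The list above mentions guaranteed data delivery without saying how B2B platforms achieve it. One common approach, shown here purely as an assumption, is to resend a file until the receiver acknowledges it with a matching checksum. The `FlakyChannel` class and `deliver` function are hypothetical names for this sketch, and the channel stands in for the public network.

```python
import hashlib

class FlakyChannel:
    """Hypothetical stand-in for the public network: drops the first `failures` sends."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, payload):
        if self.failures > 0:
            self.failures -= 1
            return None  # the payload was lost, so no acknowledgement comes back
        self.delivered.append(payload)
        # The receiver acknowledges with a digest of what it actually got.
        return hashlib.sha256(payload).hexdigest()

def deliver(channel, payload, max_attempts=5):
    """Resend until the receiver's digest matches ours; return the attempt count."""
    expected = hashlib.sha256(payload).hexdigest()
    for attempt in range(1, max_attempts + 1):
        if channel.send(payload) == expected:
            return attempt
    raise RuntimeError("delivery not confirmed")

channel = FlakyChannel(failures=2)
attempts = deliver(channel, b"order_id,amount\n1001,25.50\n")
```

A production platform would also encrypt the payload and persist unacknowledged files across restarts. Both are left out here to keep the sketch short.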

2 - Hub-and-spoke architecture

  • The hub-and-spoke architecture further simplifies the implementation and avoids placing an excessive processing burden on the customer side. The hub is installed at the SaaS provider's cloud center to do the heavy lifting, such as reformatting files.
  • A spoke unit at each user site typically acts as a basic data transfer utility. With these pieces in place, SaaS providers can offer integration services under the same subscription or usage-based pricing model as their core offerings.
  • As IT resources become more distributed and decentralized every day, linking and leveraging them for multiple purposes needs a multifaceted infrastructure.
  • Clouds, being web-based infrastructures, are the best fit for hosting scores of unified, utility-like platforms that take care of all sorts of brokering needs among connected ICT systems.
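
The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hub` and `Spoke` classes, the customer IDs, and the CSV-to-JSON conversion are all hypothetical, and a direct method call stands in for the network transfer. The point is that each spoke only forwards data, while the hub does the reformatting.

```python
import csv
import io
import json

class Hub:
    """Runs at the SaaS provider's side and does the heavy lifting (reformatting)."""

    def __init__(self):
        self.store = {}  # customer_id -> list of parsed records

    def receive(self, customer_id, csv_text):
        # Convert the customer's CSV into the provider's format (JSON here).
        rows = list(csv.DictReader(io.StringIO(csv_text)))
        self.store.setdefault(customer_id, []).extend(rows)
        return json.dumps(rows)

class Spoke:
    """Runs at a customer site: a basic transfer utility with no transformation logic."""

    def __init__(self, customer_id, hub):
        self.customer_id = customer_id
        self.hub = hub

    def send(self, csv_text):
        # In a real deployment this would be a network call to the hub.
        return self.hub.receive(self.customer_id, csv_text)

hub = Hub()
spoke_a = Spoke("customer-a", hub)
spoke_b = Spoke("customer-b", hub)

reply = spoke_a.send("order_id,amount\n1001,25.50\n1002,10.00\n")
spoke_b.send("order_id,amount\n2001,7.25\n")
```

Because all conversion logic lives in the hub, supporting a new file format means changing one component at the provider rather than updating every customer site.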
