
The Ultimate Cheat Sheet On Hadoop

This Hadoop cheat sheet collects the top 20 frequently asked questions to test your Hadoop knowledge. Try to work out your own answers first, then compare them with the answers given here.

Question #1 

You have written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which of the following interfaces is most likely to reduce the amount of intermediate data transferred across the network?

A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
Ans: E
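
To see why a combiner cuts shuffle traffic, here is a minimal sketch in plain Python. It is a simulation, not Hadoop API code: the function names (`map_words`, `combine`, `reduce_counts`) and the two sample splits are made up for illustration. In a real Hadoop job the combiner is a `Reducer` class registered with `job.setCombinerClass(...)`, and it is only safe when the operation (here, summing) is associative and commutative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def map_words(split: Iterable[str]) -> List[KV]:
    """Map step: emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs: Iterable[KV]) -> List[KV]:
    """Combiner: pre-aggregate one mapper's output locally, before the shuffle."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_counts(pairs: Iterable[KV]) -> Dict[str, int]:
    """Reduce step: sum all values that arrive for each key."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Two hypothetical input splits with a skewed key distribution ("a" dominates).
splits = [
    ["a a a b", "a a c"],
    ["a b b a", "a a a"],
]

raw = [map_words(split) for split in splits]          # one list per mapper
combined = [combine(mapper_out) for mapper_out in raw]

# Number of key-value pairs each variant would send across the network.
shuffled_without = sum(len(mapper_out) for mapper_out in raw)
shuffled_with = sum(len(mapper_out) for mapper_out in combined)

print(shuffled_without, shuffled_with)  # 14 5
```

Both paths produce the same final counts, but the combined path ships 5 pairs instead of 14. The more skewed the keys are, the more a combiner saves.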

Question #2 

Where is the Hive metastore stored by default?

B. On the client machine, in the form of a flat file.
C. On the client machine, in a Derby database.
D. In the lib directory of HADOOP_HOME, and requires HADOOP_CLASSPATH to be modified.
Ans: C


Case Study On Cloud Computing

The concept of cloud computing has been in use for many years, but only in recent years has it come into the spotlight. Since the 1990s, cloud computing has been developed by major IT providers such as Sun, Microsoft, Google, and Amazon.

Different products came into use for different levels of users. The most popular services for end users include web-based email systems (SaaS), e.g. AOL, Gmail, Hotmail, and Yahoo! Mail, and office applications such as Google Docs, Microsoft Office Online, Cloud-canvas.com, and Write.fm.

Developers can run their programs on cloud platforms (PaaS) such as Google AppEngine, Windows Azure, and Force.com. Companies and organizations store or back up their large data sets on remote servers (IaaS), for example, Rackspace, Microsoft Azure, Animoto, Jungle Disk, and Amazon's EC2 or S3 servers.

In 2011, the Primary Research Group (PRG) published a report of its recently conducted survey on library use of cloud computing (Primary Research Group, 2011). Participants included 70 libraries worldwide with the majority from the United States. The survey report reveals that 61.97 percent of libraries in the sample used free SaaS while 22.54 percent of libraries sampled used paid subscription SaaS; less than 3 percent of libraries surveyed used PaaS, and 4.23 percent used IaaS.

Most libraries using PaaS or IaaS had annual budgets over $5,000,000. Smaller libraries usually used the servers of their parent organizations, while libraries with multi-million dollar budgets tended to use their own servers.

  • A few case studies also show the exploration and adoption of other models of cloud computing in libraries. For example, California State University libraries have migrated their key library systems to vendors' cloud-based servers (i.e. a public cloud) and to campus IT's internally virtualized environment (i.e. a private cloud) (Wang, 2012).
  • The Burritt Library at Central Connecticut State University used Amazon's S3 to back up their high-resolution digital objects (Iglesias, 2011). Murray State University library experimented with Dropbox for library services (Bagley, 2011). University of Arizona Libraries have migrated their ILS, Digital Libraries website, Interlibrary Loan system and repository software to cloud-based services (Han, 2010). 
  • The Z. Smith Reynolds Library in Winston-Salem, NC, started to use Amazon's EC2 for hosting its website, discovery services, and digital library services in 2009 (Mitchell, 2010b).

In this service model, server administration and maintenance responsibilities move from local personnel to the hosting vendor, while the application itself is still managed in the traditional way: librarians can still access the backend of the system for local customizations as if they were managing the system locally.

