
Big Data: IBM InfoSphere BigInsights Basics

Here I explain why you need IBM InfoSphere BigInsights. You probably already know what the Hadoop file system is; here is a brief recap.
Hadoop is a distributed file system and data processing engine designed to handle extremely large volumes of data of any structure.
In simpler terms, just imagine that you've got dozens, or even hundreds (or thousands!) of individual computers racked and networked together. Each computer (often referred to as a node in Hadoop-speak) has its own processors and a dozen or so 2TB or 3TB hard disk drives.

All of these nodes are running software that unifies them into a single cluster, where, instead of seeing the individual computers, you see an extremely large volume where you can store your data.
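To make the "single large volume" idea concrete, here is a toy Python sketch of what HDFS does behind the scenes. A file is cut into fixed-size blocks, and each block is copied to several nodes, yet the client still sees one file. The node names, the tiny block size, the replication factor of 3 (HDFS's usual default) and the simple round-robin placement are assumptions for illustration only; real HDFS placement is rack-aware and managed by the NameNode.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes using round-robin.

    A simplified stand-in for HDFS's rack-aware replica placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + r) % len(nodes)] for r in range(replication)]
    return placement


# Assumed: a hypothetical 5-node cluster and a 256-byte block size
# (real HDFS blocks are typically 64 MB or larger).
nodes = [f"node{i}" for i in range(1, 6)]
data = b"x" * 1000  # a 1,000-byte "file"
blocks = split_into_blocks(data, block_size=256)
placement = place_blocks(len(blocks), nodes)

for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)

# Reading the file back: the client simply concatenates the blocks in order.
assert b"".join(blocks) == data
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves at least two copies of each block, which is the main reason HDFS replicates by default.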

The beauty of this Hadoop system is that you can store anything in this space: millions of digital image scans of mortgage contracts, days and weeks of security camera footage, trillions of sensor-generated log records, or all of the operator transcription notes from a call center. 

Ingesting data this way, without first worrying about a data model, is a key tenet of the NoSQL movement.
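The practice of storing data first and deciding on its structure later is often called schema-on-read. The Python sketch below is illustrative only (it is not Hadoop or BigInsights code): raw log lines are kept exactly as they arrive, and a structure is applied only when a specific question is asked. The log format, field names and the `read_with_schema` helper are hypothetical.

```python
from __future__ import annotations

# Raw records are stored exactly as they arrive; no table definition is needed up front.
raw_store = [
    "2013-05-01T10:00:00 sensor-7 temp=21.5",
    "2013-05-01T10:00:05 sensor-9 temp=19.0",
    "not a sensor record at all",
    "2013-05-01T10:00:10 sensor-7 temp=22.1",
]


def read_with_schema(lines: list[str]) -> list[dict]:
    """Apply a hypothetical schema at read time, skipping lines that do not fit it."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].startswith("temp="):
            continue  # this record does not match the schema used by this query
        timestamp, sensor, reading = parts
        rows.append({
            "ts": timestamp,
            "sensor": sensor,
            "temp": float(reading.split("=", 1)[1]),
        })
    return rows


rows = read_with_schema(raw_store)
sensor7 = [r["temp"] for r in rows if r["sensor"] == "sensor-7"]
print(sensor7)
```

The unmatched line is not rejected at load time; it stays in `raw_store` and a different query could interpret it with a different schema.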

IBM InfoSphere BigInsights


BigInsights features Apache Hadoop and its related open source projects as a core component. This is informally known as the IBM Distribution for Hadoop. IBM remains committed to the integrity of these open source projects and will ensure 100 percent compatibility with them.
BigInsights: IBM's distribution of open source Hadoop
This fidelity to open source provides a number of benefits. If you have developed code against another 100 percent open source–compatible distribution, your applications will also run on BigInsights, and vice versa. This open source compatibility has enabled IBM to amass over 100 partners, including dozens of software vendors, for BigInsights.

Simply put, if a software vendor's products use the libraries and interfaces of open source Hadoop, they'll work with BigInsights as well.
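As an illustration of code written only against open source Hadoop interfaces, here is a word-count mapper and reducer in the style of Hadoop Streaming, which exchanges tab-separated key/value lines over stdin and stdout. This is a sketch under assumptions, not BigInsights documentation: the `run_local` helper is hypothetical and merely imitates the sort-and-group (shuffle) step Hadoop performs between the map and reduce phases, so the logic can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word, as a Streaming mapper would print to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must be sorted by key, as Hadoop guarantees."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        # int() tolerates a trailing newline, as found in real Streaming input
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Hypothetical helper: map, sort (standing in for Hadoop's shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


for line in run_local(["the cat sat", "the dog sat"]):
    print(line)
```

Run locally, this prints `cat 1`, `dog 1`, `sat 2` and `the 2` (tab-separated, one per line). On a real cluster, `mapper` and `reducer` would each be wrapped in a small script that reads `sys.stdin` and prints each yielded line, and those scripts would be passed to the Streaming jar's `-mapper` and `-reducer` options; because only standard Streaming conventions are used, such a job should not depend on a particular Hadoop distribution.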

Components in IBM InfoSphere BigInsights

Hadoop (common utilities, HDFS, and the MapReduce framework): 1.0.3
Avro (data serialization): 1.6.3
Chukwa (monitoring large clustered systems): 0.5.0
Flume (data collection and aggregation): 0.9.4
HBase (real-time read and write database): 0.94.0
HCatalog (table and storage management): 0.4.0
Hive (data summarization and querying): 0.9.0
Lucene (text search): 3.3.0
Oozie (workflow and job orchestration): 3.2.0
Pig (programming and query language): 0.10.1
Sqoop (data transfer between Hadoop and databases): 1.4.1
ZooKeeper (process coordination)
