Hyderabad-Based Startup Built Largest-Ever Big Data Electoral Repository

I went through an email from a friend saying that they are building a Hadoop project to analyze voter data. In my view, this project is both academic and research-oriented.
The real challenge was extracting voter information from 2.5 crore (25 million) PDF pages and translating it into English so that it could be fused with other sources. The technology was a big hurdle.

Hadoop Project

The infrastructure, built especially for the project, included a 64-node Hadoop cluster, PostgreSQL, and servers that process a master file containing over 8 terabytes of data.

Testing and validation was another big task. "First-of-a-kind" heuristic (machine learning) algorithms were developed to classify people by name, geography, and other attributes, which helps in identifying religion, caste, and even ethnicity.

Data from Sources

“Data from multiple sources, such as the census and economic and social surveys, was mapped to polling booths. Simultaneously, external and proprietary data sources had to be fused with individual voters’ data,” said Joshi.
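
The fusion step described above can be pictured as a join on a shared polling booth key. The following is only a minimal sketch in plain Python, not the project's actual pipeline: the field names (`booth_id`, `population`, `median_income`), the sample records, and the two helper functions are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical booth-level records from two sources (census and a survey).
census_by_booth = {
    "B001": {"population": 1200},
    "B002": {"population": 950},
}
survey_by_booth = {
    "B001": {"median_income": 18000},
}

# Hypothetical voter records; each one carries the booth it is registered at.
voters = [
    {"voter_id": "V1", "name": "Voter One", "booth_id": "B001"},
    {"voter_id": "V2", "name": "Voter Two", "booth_id": "B002"},
    {"voter_id": "V3", "name": "Voter Three", "booth_id": "B003"},
]


def build_booth_profiles(*sources):
    """Merge several booth-keyed sources into one profile per booth."""
    profiles = defaultdict(dict)
    for source in sources:
        for booth_id, fields in source.items():
            profiles[booth_id].update(fields)
    return dict(profiles)


def fuse_voters(voter_records, booth_profiles):
    """Attach booth-level fields to each voter (a left join on booth_id)."""
    fused = []
    for voter in voter_records:
        # Voters whose booth has no profile keep only their own fields.
        profile = booth_profiles.get(voter["booth_id"], {})
        fused.append({**voter, **profile})
    return fused


booth_profiles = build_booth_profiles(census_by_booth, survey_by_booth)
fused_voters = fuse_voters(voters, booth_profiles)
```

At the scale mentioned in the article, a join like this would presumably run as a distributed job on the Hadoop cluster rather than in memory, but the keying idea stays the same.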
