
Step-by-Step Guide to Reading Different Files in Python

In the world of data science, automation, and general programming, working with files is unavoidable. Whether you’re dealing with CSV reports, JSON data from APIs, Excel sheets, or text logs, Python provides rich, easy-to-use libraries for reading different file formats. In this guide, we’ll explore how to read different files in Python, with code examples and best practices.

1. Reading Text Files (.txt)

Text files are the simplest kind of file. Python’s built-in open() function handles them easily.

Example:

    # Open and read a text file
    with open("sample.txt", "r") as file:
        content = file.read()
    print(content)

Explanation: "r" opens the file in read mode, and the with statement automatically closes the file when the block ends.

Best Practice: Always use with when working with files, so the file is closed even if an error occurs and open file handles are not leaked.

2. Reading CSV Files (.csv)

CSV files are widely used for storing tabular data. Python has a built-in csv module, and the powerful third-party pandas library can also read them.

Using cs...
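The guide mentions reading text files with open() and CSV files with Python’s built-in csv module. As a minimal sketch, assuming UTF-8 encoded files and a CSV file whose first row is a header, both approaches can be wrapped in small helper functions. The helper names read_lines and read_csv_rows and the sample file contents are hypothetical, chosen only for illustration; csv.DictReader itself is part of the standard library.

```python
import csv
from pathlib import Path


def read_lines(path: str) -> list[str]:
    """Return the lines of a text file without trailing newlines (hypothetical helper)."""
    # Iterating over the file object yields one line at a time;
    # here the lines are collected into a list with "\n" stripped.
    with open(path, "r", encoding="utf-8") as file:
        return [line.rstrip("\n") for line in file]


def read_csv_rows(path: str) -> list[dict[str, str]]:
    """Return each CSV row as a dict keyed by the header row (hypothetical helper)."""
    # newline="" is what the csv module documentation recommends for CSV files.
    with open(path, "r", newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


if __name__ == "__main__":
    # Create small sample files so the example runs on its own.
    Path("sample.txt").write_text("first line\nsecond line\n", encoding="utf-8")
    Path("sample.csv").write_text("name,score\nAsha,90\nRavi,85\n", encoding="utf-8")

    print(read_lines("sample.txt"))
    print(read_csv_rows("sample.csv"))
```

Note that csv.DictReader returns every value as a string, so a column such as score has to be converted with int() or float() before doing arithmetic on it.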

Hyderabad-Based Startup Built the Largest-Ever Big Data Electoral Repository

I went through an email from a friend saying that they are building a Hadoop project to analyze voter data. In my view, this project is both academic and research-oriented.
The real challenge was extracting voter information from 2.5 crore (25 million) PDF pages and translating it into English so that it could be fused with other sources. The technology itself was a big hurdle.

Hadoop Project

The infrastructure, built especially for the project, included a 64-node Hadoop cluster, PostgreSQL, and servers that process a master file containing over 8 terabytes of data.

Testing and validation were another big task. ‘First of a Kind’ heuristic (machine learning) algorithms were developed to classify people based on name, geography, etc., which help identify religion, caste, and even ethnicity.

Data from Sources

“Data from multiple sources like census, economic and social surveys were mapped to polling booths. Simultaneously, external and proprietary data sources had to be fused with individual voters’ data,” said Joshi.
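The post does not describe how the mapping to polling booths was implemented. As a rough sketch only, the core idea of attaching booth-level data to individual voter records can be shown as a left join keyed on a booth identifier. Every name here (Voter, BoothStats, map_to_booths, booth_id, and the population and literacy_rate fields) is hypothetical, and the records are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoothStats:
    # Hypothetical booth-level aggregates, e.g. derived from census or survey data.
    population: int
    literacy_rate: float


@dataclass(frozen=True)
class Voter:
    # Hypothetical minimal voter record; real electoral rolls carry many more fields.
    voter_id: str
    booth_id: str


def map_to_booths(
    voters: list[Voter], booth_stats: dict[str, BoothStats]
) -> list[tuple[Voter, Optional[BoothStats]]]:
    """Left-join voters to booth-level stats on booth_id.

    Voters whose booth has no entry in booth_stats are kept, paired with None.
    """
    return [(voter, booth_stats.get(voter.booth_id)) for voter in voters]


if __name__ == "__main__":
    stats = {"B001": BoothStats(population=1200, literacy_rate=0.74)}
    voters = [Voter("V1", "B001"), Voter("V2", "B002")]
    for voter, booth in map_to_booths(voters, stats):
        print(voter.voter_id, booth)
```

At the scale described (a 64-node Hadoop cluster and an 8-terabyte master file), this step would more likely run as a distributed join than as an in-memory dictionary lookup; the dictionary version only illustrates keying every source on the polling booth.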


