Hyderabad-Based Startup Built Largest-Ever Big Data Electoral Repository

I went through an email from a friend saying that they are building a Hadoop project to analyze voter data. In my view, this project is both academic and research oriented.
The real challenge was extracting voter information from 2.5 crore (25 million) PDF pages and translating it into English so that it could be fused with other sources. The technology was a big hurdle.

Hadoop Project

The infrastructure, built especially for the project, included a 64-node Hadoop cluster, PostgreSQL, and servers that process a master file containing over 8 terabytes of data.
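
To give a rough feel for the kind of job a Hadoop cluster runs over such a master file, here is a minimal single-machine sketch of the map, shuffle, and reduce phases in plain Python. It is not the project's code: the `booth_id` field, the function names, and the sample records are all hypothetical, and a real job would run distributed across the nodes rather than in one process.

```python
from collections import defaultdict
from itertools import chain


def map_record(record):
    """Map phase: emit a (booth_id, 1) pair for each voter record."""
    # 'booth_id' is a hypothetical field name used only for illustration.
    yield (record["booth_id"], 1)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def count_voters_per_booth(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = chain.from_iterable(map_record(r) for r in records)
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample records standing in for rows of the master file.
sample = [{"booth_id": "B001"}, {"booth_id": "B002"}, {"booth_id": "B001"}]
counts = count_voters_per_booth(sample)
```

On a cluster, the map and reduce functions would run on many nodes in parallel, and the framework would handle the shuffle between them.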

Besides, testing and validation was another big task. First-of-a-kind heuristic (machine learning) algorithms were developed to classify people based on name, geography, and similar attributes, which helps in identifying religion, caste, and even ethnicity.

Data from Sources

“Data from multiple sources such as the census and economic and social surveys were mapped to polling booths. Simultaneously, external and proprietary data sources had to be fused with individual voters’ data,” said Joshi.
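
The fusion step described in the quote can be pictured as a join on a polling-booth identifier. The sketch below is only an illustration under that assumption, not the project's actual pipeline: the field names (`voter_id`, `booth_id`, `literacy_rate`) and the sample rows are hypothetical.

```python
def fuse_by_booth(voters, booth_stats):
    """Attach booth-level survey attributes to each voter record."""
    # Index the booth-level rows by booth_id for constant-time lookup.
    stats_by_booth = {row["booth_id"]: row for row in booth_stats}
    fused = []
    for voter in voters:
        # Voters whose booth has no survey row keep only their own fields.
        extra = stats_by_booth.get(voter["booth_id"], {})
        # Voter fields take precedence if a key appears in both sources.
        fused.append({**extra, **voter})
    return fused


# Hypothetical sample data for illustration only.
voters = [
    {"voter_id": "V1", "booth_id": "B001"},
    {"voter_id": "V2", "booth_id": "B002"},
    {"voter_id": "V3", "booth_id": "B999"},
]
booth_stats = [
    {"booth_id": "B001", "literacy_rate": 0.71},
    {"booth_id": "B002", "literacy_rate": 0.64},
]
fused = fuse_by_booth(voters, booth_stats)
```

This is a left join: every voter record is kept, and booth attributes are added only where a matching booth row exists.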


