Posts

Showing posts with the label Cleaning data

Featured Post

Step-by-Step Guide to Creating an AWS RDS Database Instance

Amazon Relational Database Service (AWS RDS) makes it easy to set up, operate, and scale a relational database in the cloud. Instead of managing servers, patching the OS, and handling backups manually, AWS RDS takes care of the heavy lifting so you can focus on building applications and data pipelines. In this blog, we’ll walk through how to create an AWS RDS instance, key configuration choices, and best practices you should follow in real-world projects.

What is AWS RDS?

AWS RDS is a managed database service that supports popular relational engines such as Amazon Aurora (MySQL / PostgreSQL compatible), MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. With RDS, AWS manages database provisioning, automated backups, software patching, high availability (Multi-AZ), and monitoring and scaling.

Prerequisites

Before creating an RDS instance, make sure you have an active AWS account, proper IAM permissions (RDS, EC2, VPC), and a basic understanding of: ...

10 Exclusive Steps You Need for Web Scraping

Here are ten Python techniques for cleaning scraped data. Scraped text contains unwanted hidden data, so as part of cleaning it, apply these ten steps to your data.

10 Steps for Web Scraping

Data is the prime input for text analytics projects. After cleaning, you can feed it to machine learning or deep learning systems. The ten steps are: removing HTML tags; tokenization; removing unnecessary tokens and stop-words; handling contractions; correcting spelling errors; stemming; lemmatization; tagging; chunking; and parsing.

10 Techniques to Clean Text in Python

1. Removing HTML tags

Unstructured text gathered by web or screen scraping (from web pages, blogs, and online repositories) contains a lot of noise. HTML tags, JavaScript, and iframe tags typically don't add much value to understanding and analyzing text, so the purpose of this step is to remove HTML tags and other noise.

2. Tokenization

Tokens are independent, minimal textual components that have a definite syntax and semantics. A paragraph of text or a text documen...
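The first two steps described above, removing HTML tags and tokenization, can be sketched with nothing but the Python standard library. This is only an illustration, not the post's own code: the helper names strip_html and tokenize are hypothetical, and the tokenizer is a simple regular expression rather than a full NLP tokenizer.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        # Join the fragments and collapse runs of whitespace.
        return re.sub(r"\s+", " ", " ".join(self._parts)).strip()


def strip_html(raw_html):
    """Hypothetical helper: return the text of raw_html without tags."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.text()


def tokenize(text):
    """Hypothetical helper: split text into word tokens, keeping
    simple contractions such as "don't" together."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


if __name__ == "__main__":
    page = "<p>Hello <b>world</b>!</p><script>var x = 1;</script>"
    clean = strip_html(page)
    print(clean)
    print(tokenize(clean))
```

Fragments are joined with a space so that words from adjacent block elements do not run together; the cost is an occasional stray space before punctuation, which the tokenizer ignores.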