Featured Post

Scraping Website: How to Write a Script in Python

Image
Here's a python script that you can use as a model to scrape a website. Python script The below logic uses BeautifulSoup Package for web scraping. import requests from bs4 import BeautifulSoup url = 'https://www.example.com' response = requests.get(url) soup = BeautifulSoup(response.text, 'html.parser') # Print the title of the webpage print(soup.title.text) # Print all the links in the webpage for link in soup.find_all('a'):     print(link.get('href')) In this script, we first import the Requests and Beautiful Soup libraries. We then define the URL we want to scrape and use the Requests library to send a GET request to that URL. We then pass the response text to Beautiful Soup to parse the HTML contents of the webpage. We then use Beautiful Soup to extract the title of the webpage and print it to the console. We also use a for loop to find all the links in the webpage and print their href attributes to the console. This is just a basic example, but

What is Cluster- In the age of Big data and Analytics

A cluster is local in that all of its component subsystems are supervised within a single administrative domain, usually residing in a single room and managed as a single computer system.

The constituent computer nodes are commercial-off-the-shelf (COTS), are capable of full independent operation as is, and are of a type ordinarily employed individually for standalone mainstream workloads and applications.
Hadoop + Cluster + Nodes-Jobs
(Cluster in Hadoop- Career options)

The nodes may incorporate a single microprocessor or multiple microprocessors in a symmetric multiprocessor (SMP) configuration.

The interconnection network employs COTS local area network (LAN) or systems area network (SAN) technology that may be a hierarchy of or multiple separate network structures. A cluster network is dedicated to the integration of the cluster compute nodes and is separate from the cluster's external (worldly) environment.

A cluster may be employed in many modes including but no limited to: high capability or sustained performance on a single problem, high capacity or throughput on a job or process workload, high availability through redundancy of nodes, or high bandwidth through multiplicity of disks and disk access or I/O channels.

Comments

Popular posts from this blog

7 AWS Interview Questions asked in Infosys, TCS

How to Decode TLV Quickly