Posts



Scraping Website: How to Write a Script in Python

Here's a Python script that you can use as a model to scrape a website.

Python script

The script below uses the Beautiful Soup package for web scraping.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'

# Send a GET request; time out instead of hanging, and stop on HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML contents of the page
soup = BeautifulSoup(response.text, 'html.parser')

# Print the title of the webpage (some pages have no <title> tag)
if soup.title is not None:
    print(soup.title.get_text(strip=True))

# Print all the links in the webpage (skipping <a> tags with no href)
for link in soup.find_all('a', href=True):
    print(link.get('href'))
```

In this script, we first import the Requests and Beautiful Soup libraries. We then define the URL we want to scrape and use Requests to send a GET request to that URL, raising an exception if the server answers with an HTTP error status. Next, we pass the response text to Beautiful Soup, which parses the HTML contents of the webpage. We then use Beautiful Soup to extract the title of the webpage (if it has one) and print it to the console. Finally, a for loop finds every link in the webpage and prints its href attribute. This is just a basic example, but…
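Because the script above fetches and parses in one go, it is hard to try out without network access. Below is a minimal sketch of just the parsing half, assuming you already have the page's HTML as a string. It swaps Beautiful Soup for the standard library's html.parser module (so it needs no third-party packages) and resolves relative links such as "/about" into absolute URLs with urllib.parse.urljoin. The names TitleAndLinkParser and extract_title_and_links are hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleAndLinkParser(HTMLParser):
    """Collects the <title> text and the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with a missing or empty href
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_title_and_links(html, base_url):
    """Return (title, absolute_links) for an HTML document given as a string."""
    parser = TitleAndLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    sample = """<html><head><title> Example Domain </title></head>
    <body><a href="/about">About</a> <a name="top">no href</a>
    <a href="https://www.iana.org/domains/example">More</a></body></html>"""
    title, links = extract_title_and_links(sample, "https://www.example.com/")
    print(title)
    for link in links:
        print(link)
```

Keeping parsing separate from fetching means you can test the extraction logic on saved HTML and then plug in response.text from the Requests call once it behaves as expected.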

Oozie - Concepts And Architecture

Oozie is a workflow/coordination system that you can use to manage Apache Hadoop jobs. One of its main components is the Oozie server, a web application that runs in a Java servlet container (the standard Oozie distribution uses Tomcat). Workflow management itself runs on this server.

Role of Oozie in Workflow Management in Hadoop Jobs

The Oozie server reads and executes Workflows, Coordinators, Bundles, and SLA definitions. It implements a set of remote Web Services APIs that can be invoked from Oozie client components and third-party applications. Note that the server relies on a configurable database to do its work. This database stores Workflow, Coordinator, Bundle, and SLA definitions, as well as execution states and process variables. The currently supported databases include MySQL, Oracle, and Apache Derby. The Oozie shared library component is located in the Oozie HOME directory and contains code u…
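The paragraph above notes that the Oozie server exposes Web Services APIs that client components and third-party applications can call. As a minimal sketch, assuming an Oozie server at http://localhost:11000/oozie (11000 is the usual default port) and a workflow application already deployed to HDFS, the Python code below prepares a job-submission request for Oozie's v1 REST API: a POST to /v1/jobs?action=start whose body is a Hadoop-style XML configuration of job properties. The helper names, user name, and HDFS path are hypothetical placeholders, and the request is only built, not sent; check the Web Services API documentation for your Oozie version before relying on the details.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_job_config(properties):
    """Render job properties as the <configuration> XML used for job submission."""
    parts = ["<configuration>"]
    for name, value in properties.items():
        # Escape &, < and > so property values cannot break the XML
        parts.append(
            "<property><name>%s</name><value>%s</value></property>"
            % (escape(name), escape(value))
        )
    parts.append("</configuration>")
    return "\n".join(parts)


def build_submit_request(oozie_url, properties):
    """Prepare (but do not send) a POST that submits and starts a job."""
    body = build_job_config(properties).encode("utf-8")
    return urllib.request.Request(
        url=oozie_url.rstrip("/") + "/v1/jobs?action=start",
        data=body,
        headers={"Content-Type": "application/xml;charset=UTF-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_submit_request(
        "http://localhost:11000/oozie",  # assumed server address
        {
            "user.name": "hadoop",  # hypothetical user
            # hypothetical HDFS directory holding workflow.xml
            "oozie.wf.application.path": "hdfs://namenode:8020/user/hadoop/my-wf",
        },
    )
    print(req.get_method(), req.full_url)
    # Against a running server you would send it like this; the JSON
    # response is expected to contain the id of the new job:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable anywhere; in practice the Oozie command-line client wraps these same Web Services calls.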