Web Scraping using Python

What is Web Scraping?

Web scraping is the technique of collecting structured web data in an automated manner.

Web Scraping
  1. Find the URL that you want to scrape or the data you want to collect
  2. Inspect the page (right-click on the webpage and view the source code)
  3. Find the info, or the tag and class, you would like to extract
  4. Write the code to extract the data from the website
  5. Store the info in the required format (CSV, JSON)
  6. Run the code and extract the info

We are using the Python language along with its libraries requests, pandas, and BeautifulSoup for web scraping.

Requests- It is a Python HTTP library that makes HTTP requests simpler. We just need to pass the URL as an argument, and get() retrieves all the information from it.
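Since forecast pages like the one used below are addressed by query parameters, it can help to see how requests assembles a URL before sending anything. A minimal sketch (the host and parameter names here are placeholders, not the real endpoint):

```python
import requests

# Build (without sending) a GET request to inspect the final URL.
# Host and parameters are placeholders for illustration only.
req = requests.Request("GET", "https://example.com/forecast",
                       params={"lat": "34.05", "lon": "-118.24"})
prepared = req.prepare()
print(prepared.url)  # https://example.com/forecast?lat=34.05&lon=-118.24
```

In the actual scrape below we simply call requests.get(url) with the query string already in place.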

Pandas- Pandas is a library for manipulating and analysing data. Here it is used to store the extracted data in the desired format.
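As a quick illustration of the role pandas plays here, a few toy rows (hypothetical values, not real forecast data) can be collected into a DataFrame:

```python
import pandas as pd

# Toy values standing in for scraped forecast data.
data = {"Period": ["Tonight", "Friday"],
        "Temperature": ["Low: 55 F", "High: 72 F"]}
df = pd.DataFrame(data)
print(df.shape)  # (2, 2)
```

Later in the walkthrough, the same pattern (dict of lists → DataFrame → to_csv) is used to export the scraped data.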

BeautifulSoup- Another powerful Python library for pulling data out of HTML/XML files. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping.
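A small self-contained example of what BeautifulSoup does, using an inline HTML snippet in place of a downloaded page:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for a fetched page.
html = '<div class="tombstone-container"><p class="period-name">Tonight</p></div>'
soup = BeautifulSoup(html, "html.parser")

# find() locates the first tag matching the given class; get_text()
# strips the markup and returns just the text content.
print(soup.find(class_="period-name").get_text())  # Tonight
```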

In this blog, I shall illustrate how to scrape weather forecast data from the National Weather Service website.

  1. Find the URL that you want to scrape:

For this example, we are going to scrape the National Weather Service website to extract the forecast for Los Angeles from this page.

Weather report of Los Angeles

2. Inspecting the Page:

The data is usually nested in tags, so we inspect the page to determine which tag the info we would like to scrape is nested under. To inspect the page, right-click on the element and select “Inspect”.

3. Find the info you would like to extract:

Extract the period name, short description, and temperature, each of which is nested within a “div” tag (the tombstone-container elements of the forecast).

4. Write the code:

First, let’s create a Python file. To do this, you can use Google Colab or a Jupyter notebook. I am using Google Colab for this.

Import libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Create empty lists to store the details. In this case, we create three empty lists to store the period, short description, and temperature.

Period = []
Short_description = []
Temperature = []

Open the URL and fetch the page.

url = "https://forecast.weather.gov/MapClick.php?lat=34.0536&lon=-118.2454"
r = requests.get(url)

Using the find and find_all methods in BeautifulSoup, we extract the data and store it in variables.

soup = BeautifulSoup(r.content, "html.parser")
week = soup.find(id="seven-day-forecast-body")
items = week.find_all("div", class_="tombstone-container")
period_name = [item.find(class_="period-name").get_text() for item in items]
short_desc = [item.find(class_="short-desc").get_text() for item in items]
temp = [item.find(class_="temp").get_text() for item in items]
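One caveat: if a tombstone is missing one of these classes (the page layout does change occasionally), item.find(...) returns None and .get_text() raises an AttributeError. A defensive helper, sketched here against an inline snippet rather than the live page:

```python
from bs4 import BeautifulSoup

def safe_text(item, cls):
    """Return the text of the first tag with class `cls`, or "" if absent."""
    tag = item.find(class_=cls)
    return tag.get_text() if tag else ""

# Inline snippet with a deliberately missing "temp" element.
html = '<div class="tombstone-container"><p class="period-name">Tonight</p></div>'
item = BeautifulSoup(html, "html.parser").find("div")
print(safe_text(item, "period-name"))  # Tonight
print(repr(safe_text(item, "temp")))   # ''
```

The list comprehensions above could then use safe_text(item, "temp") in place of the direct chained call.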

Using extend, we add the scraped values to the lists we created before (append would insert each whole list as a single nested element).

Period.extend(period_name)
Short_description.extend(short_desc)
Temperature.extend(temp)

5. Store the info in a sheet. We store the info in comma-separated values (CSV) format.

a = {"Period": period_name, "Short Description": short_desc, "Temperature": temp}
df = pd.DataFrame.from_dict(a)
df.to_csv('18IT003.csv', index=False, encoding='utf-8')
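To sanity-check the export, the CSV can be read back with pandas. A standalone sketch with toy data and a hypothetical filename (so it runs without scraping anything):

```python
import pandas as pd

# Toy data and a hypothetical filename, used only for this sketch.
a = {"Period": ["Tonight"], "Short Description": ["Clear"],
     "Temperature": ["Low: 55 F"]}
pd.DataFrame.from_dict(a).to_csv("forecast_demo.csv", index=False, encoding="utf-8")

df_back = pd.read_csv("forecast_demo.csv")
print(df_back.shape)          # (1, 3)
print(list(df_back.columns))  # ['Period', 'Short Description', 'Temperature']
```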

6. Now run the whole code.

All the data is stored in the 18IT003.csv file.

18IT003.csv file

For the whole code, here is the GitHub.

I hope you enjoyed this blog on “Web Scraping using Python”. Visit my profile for more blogs. If you liked this blog, don’t forget to give it a clap 😄.