Web Scraping using Python

What is Web Scraping?

Web scraping is the technique of collecting structured web data in an automated manner.

Steps in Web Scraping
  1. Find the URL that you want to scrape or that contains the data you want to collect
  2. Inspect the page (right-click on the webpage and view the source code)
  3. Find the data, or the tag and class, you would like to extract
  4. Write the code to extract the data from the website
  5. Store the data in the required format (CSV, JSON)
  6. Run the code and extract the data

We are using the Python language along with its libraries requests, pandas, and BeautifulSoup for web scraping.

Requests- It is a Python HTTP library that makes sending HTTP requests simpler. We just need to pass the URL as an argument, and get() fetches all the information from it.
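As a quick illustration of how requests builds a URL from parameters, here is a minimal sketch. The live request line is commented out so the snippet runs offline; the URL is the same National Weather Service endpoint used later in this blog.

```python
import requests

# A typical call (requires network access), shown for reference:
# r = requests.get("https://forecast.weather.gov/MapClick.php",
#                  params={"lat": 34.0536, "lon": -118.2454})
# r.status_code is 200 on success; r.text holds the page HTML

# Offline, we can still see how requests encodes params into the final URL:
req = requests.Request(
    "GET",
    "https://forecast.weather.gov/MapClick.php",
    params={"lat": 34.0536, "lon": -118.2454},
).prepare()
print(req.url)
```

Passing `params` as a dict keeps the code readable and lets requests handle the URL encoding for you.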

Pandas- Pandas is a library used to manipulate and analyse data. Here it is used to store the extracted data in the desired format.
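A small sketch of how pandas turns scraped values into a table and saves it. The rows here are hypothetical stand-ins for real scraped data:

```python
import pandas as pd

# Hypothetical forecast values, used only to show saving data in CSV format
data = {"Period": ["Tonight", "Friday"],
        "Temperature": ["Low: 55", "High: 75"]}
df = pd.DataFrame(data)        # build a table from the dict
df.to_csv("example.csv", index=False)  # write it out as CSV
print(df.shape)  # (2, 2)
```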

BeautifulSoup- It is another powerful Python library for pulling data out of HTML/XML files. It creates a parse tree for parsed pages that can be used to extract data from HTML, which makes it useful for web scraping.
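To see the parse tree in action, here is a tiny example on an inline HTML snippet (a made-up stand-in for a real page):

```python
from bs4 import BeautifulSoup

# A tiny HTML snippet standing in for a real downloaded page
html = '<div class="tombstone-container"><p class="period-name">Tonight</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Navigate the parse tree by CSS class and pull out the text
print(soup.find(class_="period-name").get_text())  # Tonight
```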

In this blog, I shall illustrate how to scrape weather forecast data from the National Weather Service website.

1. Find the URL that you want to scrape:

For this example, we are going to scrape the National Weather Service website to extract the weather forecast for Los Angeles from this page.

Weather report of Los Angeles

2. Inspect the Page:

The data is usually nested in tags, so we inspect the page to see under which tag the data we want to scrape is nested. To inspect the page, right-click on the element and click "Inspect".

3. Find the data you would like to extract:

Extract the period name, short description, and temperature, each of which is nested inside a "div" tag.

4. Write the code:

First, let's create a Python file. To do this, you can use Google Colab or Jupyter Notebook. I am using Google Colab for this.

Import libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Create empty lists to store the details. In this case, we create three empty lists to store the period, short description, and temperature.

Period = []
Short_description = []
Temperature = []

Open the URL and fetch the page.

url = "https://forecast.weather.gov/MapClick.php?lat=34.0536&lon=-118.2454"
r = requests.get(url)

Using the find and find_all methods in BeautifulSoup, we extract the data and store it in variables.

soup = BeautifulSoup(r.content, "html.parser")
week = soup.find(id="seven-day-forecast-body")
items = week.find_all("div", class_="tombstone-container")
period_name = [item.find(class_="period-name").get_text() for item in items]
short_desc = [item.find(class_="short-desc").get_text() for item in items]
temp = [item.find(class_="temp").get_text() for item in items]
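The difference between find and find_all used above can be seen in a small offline example; the snippet below mimics the page structure with two made-up forecast "cards":

```python
from bs4 import BeautifulSoup

# Two forecast "cards" in a made-up snippet that mimics the page structure
html = """
<div class="tombstone-container"><p class="temp">Low: 55</p></div>
<div class="tombstone-container"><p class="temp">High: 75</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag; find_all() returns every match
first = soup.find("div", class_="tombstone-container")
items = soup.find_all("div", class_="tombstone-container")
temps = [item.find(class_="temp").get_text() for item in items]
print(first.get_text())  # Low: 55
print(temps)             # ['Low: 55', 'High: 75']
```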

Using extend, we add these details to the lists we created earlier (extend adds the items one by one, whereas append would insert the whole list as a single nested element).

Period.extend(period_name)
Short_description.extend(short_desc)
Temperature.extend(temp)

5. Store the data in a sheet. We store the data in Comma-Separated Values (CSV) format.

a = {"Period": period_name, "Short Description": short_desc, "Temperature": temp}
df = pd.DataFrame.from_dict(a)
df.to_csv('18IT003.csv', index=False, encoding='utf-8')

6. Now run the whole code.

All the data is stored in the 18IT003.csv file.
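To confirm the export worked, you can read the CSV back with pandas. This sketch uses hypothetical forecast rows so it runs standalone:

```python
import pandas as pd

# Hypothetical rows mirroring the scraped columns, so the snippet runs standalone
a = {"Period": ["Tonight"], "Short Description": ["Clear"],
     "Temperature": ["Low: 55"]}
pd.DataFrame.from_dict(a).to_csv("18IT003.csv", index=False, encoding="utf-8")

# Read the file back to verify the columns survived the round trip
df = pd.read_csv("18IT003.csv")
print(df.columns.tolist())  # ['Period', 'Short Description', 'Temperature']
```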


For the whole code, here is the GitHub link.

I hope you enjoyed this blog on "Web Scraping using Python". Visit my profile for more blogs. If you liked this blog, do not forget to give a clap 😄.

Janvi Ajudiya

2x AWS Certified