WHAT IS WEB SCRAPING?
Web scraping is the process of gathering information from the Internet. Even copy-pasting the lyrics of your favorite song is a form of web scraping! However, the term “web scraping” usually refers to an automated process. Some websites don’t like it when automatic scrapers gather their data, while others don’t mind.
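One common way for a site to signal what automated clients may fetch is a robots.txt file. The sketch below checks two URLs against such a file using Python's standard urllib.robotparser module. The robots.txt content, the "my-scraper" user agent, and the example.com URLs are all made up for illustration; a real check would load the file from the site you intend to scrape.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is a placeholder user-agent name.
allowed_jobs = parser.can_fetch("my-scraper", "https://example.com/jobs/")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")

print("jobs page allowed:", allowed_jobs)
print("private page allowed:", allowed_private)
```

For a live site, calling set_url() with the address of its robots.txt and then read() would replace the inlined text. A robots.txt file is only one signal; a site's terms of use may add further restrictions.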
SCRAPING THE WORKINDIA JOB SITE
In this tutorial, you’ll build a web scraper that fetches job listings for Ahmedabad from the WorkIndia job site (www.workindia.in). Your web scraper will parse the HTML to pick out the relevant pieces of information, which you can then filter for specific words.
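As a small illustration of picking elements out of HTML and filtering them for a word, here is a minimal sketch that runs on an inlined snippet instead of a live page. The markup, the "title" itemprop value, and the helper name titles_containing are assumptions made for this example and are not taken from any real job site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on a job-listing page; real pages differ.
SAMPLE_HTML = """
<div class="job"><h2 itemprop="title">Python Developer</h2>
  <span itemprop="jobLocation">Ahmedabad</span></div>
<div class="job"><h2 itemprop="title">Delivery Driver</h2>
  <span itemprop="jobLocation">Ahmedabad</span></div>
<div class="job"><h2 itemprop="title">Senior Java Developer</h2>
  <span itemprop="jobLocation">Surat</span></div>
"""


def titles_containing(html: str, keyword: str) -> list[str]:
    """Return the job titles whose text contains `keyword` (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    # Select every element that carries itemprop="title" and take its text.
    titles = [
        tag.get_text(strip=True)
        for tag in soup.find_all(attrs={"itemprop": "title"})
    ]
    return [title for title in titles if keyword.lower() in title.lower()]


developer_titles = titles_containing(SAMPLE_HTML, "developer")
print(developer_titles)
```

Filtering after extraction keeps the two steps separate, so the same extracted list can be searched for several words without parsing the page again.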
You can scrape any site on the Internet that you can look at, but the difficulty of doing so depends on the site. This tutorial offers an introduction to web scraping to help you understand the overall process. You can then apply the same process to any website you want to scrape.
PYTHON CODE USING BeautifulSoup
import pandas as pd
import requests
from bs4 import BeautifulSoup

# One list per output column.
JobTitle = []
IndustryName = []
HiringOrganization = []
SalaryRange = []
RequiredExperience = []
JobLocation = []
Requirement = []
Required_Language = []
PostedDate = []

# Walk result pages 1 to 24 and collect the text of every matching element.
for i in range(1, 25):
    url = 'https://www.workindia.in/jobs-in-ahmedabad/?pg=' + str(i)
    page = requests.get(url, timeout=10)
    soup = BeautifulSoup(page.content, 'html.parser')

    # Most fields are located by their itemprop attribute; titles by CSS class.
    inds = soup.find_all(attrs={'itemprop': 'industry'})
    tit = soup.find_all(attrs={'class': 'text-brand f-bold f16 text-ellipsis'})
    hir = soup.find_all(attrs={'itemprop': 'hiringOrganization'})
    sal = soup.find_all(attrs={'itemprop': 'baseSalary'})
    exp = soup.find_all(attrs={'itemprop': 'experienceRequirements'})
    loc = soup.find_all(attrs={'itemprop': 'jobLocation'})
    edu = soup.find_all(attrs={'itemprop': 'educationRequirements'})
    lan = soup.find_all(attrs={'itemprop': 'qualifications'})
    dt = soup.find_all(attrs={'itemprop': 'datePosted'})

    for item in tit:
        JobTitle.append(item.text)
    for item in inds:
        IndustryName.append(item.text)
    for item in hir:
        HiringOrganization.append(item.text)
    for item in sal:
        SalaryRange.append(item.text)
    for item in exp:
        RequiredExperience.append(item.text)
    for item in loc:
        JobLocation.append(item.text)
    for item in edu:
        Requirement.append(item.text)
    for item in lan:
        Required_Language.append(item.text)
    for item in dt:
        PostedDate.append(item.text)

# Named `data` so the built-in `dict` is not shadowed.
# pd.DataFrame requires all of these lists to have the same length.
data = {'Job_Title': JobTitle, 'Industry_Name': IndustryName, 'Hiring_Organization': HiringOrganization,
        'Salary_Range': SalaryRange, 'Required_Experience': RequiredExperience, 'Job_Location': JobLocation,
        'Requirement': Requirement, 'Languages': Required_Language, 'Posted_Date': PostedDate}
df = pd.DataFrame(data)
print(df)
df.to_csv('WorkinidaDataScrapped.csv')
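One thing to watch for in the script above: it fills a separate list for each field, and pd.DataFrame raises a ValueError when those lists have different lengths. That can happen if a listing omits a field, and even when the lengths happen to match, values from different listings may end up in the same row. A possible alternative, sketched below, is to build one dictionary per listing and record None for missing fields. The "job-card" container class and the sample markup are hypothetical; the real page structure is not known here and would need to be inspected first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each listing sits in its own <div class="job-card">.
# The second listing deliberately has no baseSalary element.
SAMPLE_HTML = """
<div class="job-card">
  <span itemprop="hiringOrganization">Acme Traders</span>
  <span itemprop="baseSalary">15000 - 20000</span>
  <span itemprop="jobLocation">Ahmedabad</span>
</div>
<div class="job-card">
  <span itemprop="hiringOrganization">Bright Logistics</span>
  <span itemprop="jobLocation">Ahmedabad</span>
</div>
"""

FIELDS = ["hiringOrganization", "baseSalary", "jobLocation"]


def extract_rows(html: str) -> list[dict]:
    """Return one dict per listing, with None for any field that is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="job-card"):
        row = {}
        for field in FIELDS:
            # Search only inside this listing so fields stay aligned.
            tag = card.find(attrs={"itemprop": field})
            row[field] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed straight to pd.DataFrame(rows), which produces one row per listing and leaves missing values empty.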
LEARN MORE ABOUT WEB SCRAPING USING BeautifulSoup:
We at DNG INC always strive to apply futuristic analytical techniques to solve today’s
problems and Derive Next Gen (www.dngsoftwares.com) data solutions.