WHAT IS WEB SCRAPING?

Web scraping is the process of gathering information from the Internet. Even copy-pasting the lyrics of your favorite song is a form of web scraping! However, the term “web scraping” usually refers to a process that involves automation. Some websites don’t like it when automated scrapers gather their data, while others don’t mind.
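
One common way to find out whether a site minds automated access is to read its robots.txt file. The sketch below uses Python's standard urllib.robotparser module for this. The robots.txt content, the "my-scraper" user agent, and the example.com URLs are hypothetical and only serve to make the example run offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /jobs/
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-scraper" is a made-up user agent name for this illustration.
jobs_allowed = parser.can_fetch("my-scraper", "https://example.com/jobs/page-1")
private_allowed = parser.can_fetch("my-scraper", "https://example.com/private/data")

print(jobs_allowed, private_allowed)
```

For a live site, you would call parser.set_url() with the address of the site's robots.txt and then parser.read() instead of parsing an inline string. A permissive robots.txt is not the same as permission under the site's terms of use, so it is worth reading those too.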

SCRAPING THE WORKINDIA JOB SITE

In this tutorial, you’ll build a web scraper that fetches job listings for Ahmedabad from the WorkIndia job site. Your web scraper will parse the HTML to pick out the relevant pieces of information, which you can then filter for specific words.
You can scrape any site on the Internet that you can view, but the difficulty of doing so depends on the site. This tutorial introduces web scraping to help you understand the overall process. You can then apply the same process to any website you want to scrape.
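
As a rough illustration of the two steps mentioned above, parsing the HTML and filtering it for specific words, here is a minimal sketch. The markup, the class names ("job", "title", "city"), and the titles_matching helper are invented for this example and do not reflect the structure of any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Python Developer</h2><span class="city">Ahmedabad</span></div>
<div class="job"><h2 class="title">Delivery Executive</h2><span class="city">Ahmedabad</span></div>
<div class="job"><h2 class="title">Senior Software Developer</h2><span class="city">Surat</span></div>
"""


def titles_matching(html, keyword):
    """Return the job titles in html that contain keyword (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: parse the page and pick out the pieces we care about.
    titles = [tag.get_text(strip=True) for tag in soup.find_all("h2", class_="title")]
    # Step 2: keep only the titles that mention the keyword.
    return [title for title in titles if keyword.lower() in title.lower()]


developer_titles = titles_matching(SAMPLE_HTML, "developer")
print(developer_titles)
```

On a real page, the tag names and classes passed to find_all would come from inspecting the site's HTML in your browser's developer tools.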

PYTHON CODE USING BeautifulSoup

# Note: check the indentation after copying this code; Python is indentation-sensitive.
import pandas as pd
import requests
from bs4 import BeautifulSoup


JobTitle = []
IndustryName = []
HiringOrganization = []
SalaryRange = []
RequiredExperience = []
JobLocation = []
Requirement = []
Required_Language = []
PostedDate = []


# Walk through result pages 1 to 24 of the Ahmedabad listings
for i in range(1, 25):

    url = 'https://www.workindia.in/jobs-in-ahmedabad/?pg=' + str(i)
    page = requests.get(url, timeout=30)
    page.raise_for_status()  # fail on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(page.content, 'html.parser')

    # Locate each field by its itemprop attribute (the job title by its CSS classes)
    inds = soup.find_all(attrs={'itemprop': 'industry'})
    tit = soup.find_all(attrs={'class': 'text-brand f-bold f16 text-ellipsis'})
    hir = soup.find_all(attrs={'itemprop': 'hiringOrganization'})
    sal = soup.find_all(attrs={'itemprop': 'baseSalary'})
    exp = soup.find_all(attrs={'itemprop': 'experienceRequirements'})
    loc = soup.find_all(attrs={'itemprop': 'jobLocation'})
    edu = soup.find_all(attrs={'itemprop': 'educationRequirements'})
    lan = soup.find_all(attrs={'itemprop': 'qualifications'})
    dt = soup.find_all(attrs={'itemprop': 'datePosted'})


    # Append the text of every matched element to its column list
    JobTitle.extend(item.text for item in tit)
    IndustryName.extend(item.text for item in inds)
    HiringOrganization.extend(item.text for item in hir)
    SalaryRange.extend(item.text for item in sal)
    RequiredExperience.extend(item.text for item in exp)
    JobLocation.extend(item.text for item in loc)
    Requirement.extend(item.text for item in edu)
    Required_Language.extend(item.text for item in lan)
    PostedDate.extend(item.text for item in dt)



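# Every list must have the same length, otherwise pd.DataFrame raises a ValueError.
# The name "data" avoids shadowing Python's built-in dict type.
data = {'Job_Title': JobTitle, 'Industry_Name': IndustryName, 'Hiring_Organization': HiringOrganization,
        'Salary_Range': SalaryRange, 'Required_Experience': RequiredExperience, 'Job_Location': JobLocation,
        'Requirement': Requirement, 'Languages': Required_Language, 'Posted_Date': PostedDate}

df = pd.DataFrame(data)
print(df)

# Save the results to a CSV file (the DataFrame index is written as the first column)
df.to_csv('WorkinidaDataScrapped.csv')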
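
The script above collects each field into its own list, so a listing that lacks one field (a salary, for example) leaves the lists with different lengths and the values no longer line up row by row. A possible alternative, sketched below, is to walk the page one listing at a time and record a missing field as None. The "card" container class, the itemprop="title" attribute, and the sample markup are assumptions made for this illustration; the real page structure would have to be checked in the browser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second listing has no salary.
SAMPLE_HTML = """
<div class="card">
  <h2 itemprop="title">Office Assistant</h2>
  <span itemprop="industry">Retail</span>
  <span itemprop="baseSalary">10000 - 15000</span>
</div>
<div class="card">
  <h2 itemprop="title">Telecaller</h2>
  <span itemprop="industry">BPO</span>
</div>
"""

# Output column name -> itemprop value to look up inside each listing.
FIELDS = {
    "Job_Title": "title",
    "Industry_Name": "industry",
    "Salary_Range": "baseSalary",
}


def extract_rows(html):
    """Return one dict per listing, using None for fields that are absent."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        row = {}
        for column, itemprop in FIELDS.items():
            tag = card.find(attrs={"itemprop": itemprop})
            row[column] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed straight to pd.DataFrame(rows), which produces one row per listing and leaves the missing values empty.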

LEARN MORE ABOUT WEB SCRAPING USING BeautifulSoup:

At DNG INC, we always strive to work on futuristic analytical techniques to solve today’s problems and Derive Next Gen (www.dngsoftwares.com) data solutions.