Python BeautifulSoup web scraping issue

Question (posted by manpreet):

import requests
from bs4 import BeautifulSoup

page = requests.get("http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019")
c = page.content
soup = BeautifulSoup(c, "html.parser")
data = soup.find_all("tr")
dict = {}  # missing from the snippet as posted; one dict shared by every row
for r in data:
    td = r.find_all("td", {"style": "text-align: center;"})
    for d in td:
        link = d.find_all("a")
        for li in link:
            span = li.find_all("span", {"style": "color: #008000;"})
            for s in span:
                strong = s.find_all("strong")
                for st in strong:
                    dict['title'] = st.text
        for l in link:
            dict["link"] = l['href']
    print(dict)

It prints:

{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}
{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}
{'title': 'Syllabus', 'link': 'http://www.upsc.gov.in/'}

I am expecting:

{'title': 'Apply Online', 'link': 'https://upsconline.nic.in/mainmenu2.php'}
{'title': 'Notification', 'link': 'http://www.freejobalert.com/wp-content/uploads/2018/09/Notification-UPSC-Engg-Services-Prelims-Exam-2019.pdf'}
{'title': 'Official Website ', 'link': 'http://www.upsc.gov.in/'}

For each table I want every entry under "Important Links" ("Apply Online", "Notification", "Official Website") together with its link, but the title always comes out as "Syllabus" and the same link is repeated.

Please take a look.
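One likely cause of the repeated output (an assumption, since the page's markup isn't reproduced here): the loop writes into a single dictionary, so every assignment replaces the previous "title" and "link", only the last values survive, and rows that match nothing print the stale values again. The pure-Python sketch below uses hypothetical placeholder pairs instead of the real page to contrast overwriting one dict with building a new dict per link; the function names are illustrative only.

```python
# Hypothetical (title, href) pairs standing in for the links found in one
# table row; the URLs are placeholders, not the real page's links.
ROW_LINKS = [
    ("Apply Online", "https://example.com/apply"),
    ("Notification", "https://example.com/notification.pdf"),
    ("Syllabus", "https://example.com/syllabus"),
]


def overwrite_one_dict(links):
    """Mimics the overwrite pattern: one dict whose keys are reassigned each pass."""
    result = {}
    for title, href in links:
        result["title"] = title  # replaces the previous title
        result["link"] = href    # replaces the previous link
    return result  # only the last pair is left


def new_dict_per_link(links):
    """Builds a separate dict for every link and keeps all of them in a list."""
    return [{"title": title, "link": href} for title, href in links]


if __name__ == "__main__":
    print(overwrite_one_dict(ROW_LINKS))
    for entry in new_dict_per_link(ROW_LINKS):
        print(entry)
```

In the scraper this translates to creating the dict per link (or appending each one to a list) instead of creating it once outside the loops.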

Answer (posted by manpreet):

This may help; check the code below.

import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.freejobalert.com/'
                    'upsc-recruitment/16960/#Engg-Services2019')
c = page.content
soup = BeautifulSoup(c, "html.parser")
row = soup.find_all('tr')
for i in row:
    item = {}  # fresh dict per row so a title never leaks into the next row
    # the text of the green <span> is used as the title
    for title in i.find_all('span', attrs={'style': 'color: #008000;'}):
        item['Title'] = title.text
    # print one entry per <a> tag that has an href
    for link in i.find_all('a', href=True):
        item['Link'] = link['href']
        print(item)
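If the "Important Links" table puts each label in one cell and the link in the next cell of the same row, pairing the two cells is another option. This is a sketch under assumptions: the markup below is a guess at the structure (the real page isn't reproduced here), the URLs are placeholders, and extract_links is a hypothetical helper name. Check the live page's HTML before relying on the cell positions.

```python
from bs4 import BeautifulSoup

# Assumed markup for an "Important Links" table: label in the first cell,
# link in the second. Placeholder URLs, not the real page's links.
SAMPLE_HTML = """
<table>
  <tr><td>Apply Online</td><td style="text-align: center;"><a href="https://example.com/apply">Click Here</a></td></tr>
  <tr><td>Notification</td><td style="text-align: center;"><a href="https://example.com/notification.pdf">Click Here</a></td></tr>
  <tr><td>Official Website</td><td style="text-align: center;"><a href="https://example.com/">Click Here</a></td></tr>
</table>
"""


def extract_links(html):
    """Return one {'title', 'link'} dict per row that has a label cell and a link cell."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # skip header rows or rows without a link cell
        anchor = cells[1].find("a", href=True)
        if anchor is None:
            continue  # second cell has no usable link
        results.append({
            "title": cells[0].get_text(strip=True),  # label text, whitespace trimmed
            "link": anchor["href"],
        })
    return results


if __name__ == "__main__":
    for entry in extract_links(SAMPLE_HTML):
        print(entry)
```

Passing the downloaded page's HTML (for example requests.get(url).content) instead of SAMPLE_HTML would run the same logic on the live page, provided its rows really follow this two-cell layout.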
