Best technology(s) for web crawling [closed]


Posted on 16 Aug 2022.


Answers (1)

manpreet, Best Answer, 2 years ago


2ND EDIT: I guess the suggested library must be the most efficient and best in the world, with no room for improvement, since nobody even attempted to answer the questions about the technology; people only downvoted, as if I did not already have a library for this.

SO is very clear in its downvoting rules:

Instead of voting down: If the post is spammy or offensive, flag it. If the question is duplicate or off-topic, flag it for moderator attention. If something is wrong, please leave a comment or edit the post to correct it.

EDIT: Not sure why this was downvoted; however, I got one of the answers I wanted.

What would be the best technology, language, etc. for creating a web crawler (in terms of finding the actual URIs/URLs inside the HTML)?

Things I have considered and tried:
- C# Substring methods (string manipulation)
- Regex
- XSLT transformation / XPath
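For comparison with the approaches above, here is a minimal sketch of a fourth option: letting an HTML parser walk the tags and reading URL-bearing attributes from them. It is written in Python with only the standard library (`html.parser` and `urllib.parse`) purely for illustration; the names `LinkCollector`, `extract_links`, and `URL_ATTRS`, as well as the choice of attributes, are assumptions made for this example and not part of any existing crawler library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry a URL. This set is an illustrative
# assumption, not an exhaustive list.
URL_ATTRS = {"href", "src", "action"}


class LinkCollector(HTMLParser):
    """Collects URL-bearing attribute values from start tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs found in the given HTML, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about">About</a><img src="logo.png">'
    for link in extract_links(page, "http://example.com/dir/page.html"):
        print(link)
```

Compared with substring or regex matching, a parser-based approach of this kind tends to cope better with attribute order, quoting styles, and mixed-case markup, at the cost of depending on how tolerant the parser is of malformed HTML.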

Is there some sort of standard for this? Are there already libraries for this?

I would also like to be able to include IP addresses.
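One possible reading of this requirement is that URLs whose host is a literal IP address (rather than a domain name) should be recognized as well. Under that assumption, a small Python sketch using the standard `ipaddress` and `urllib.parse` modules could classify an extracted URL; the function name `host_is_ip` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url):
    """Return True if the URL's host is a literal IPv4 or IPv6 address."""
    host = urlsplit(url).hostname  # None for relative references
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a valid IP literal, so presumably a domain name.
        return False
    return True
```

A crawler could apply such a check after extraction to decide whether IP-hosted links are followed, logged separately, or skipped.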
