Creating a bot/crawler

Web Technologies · Web Development · asked 2 years ago

I would like to make a small bot to automatically and periodically surf a few partner websites. This would save several hours for a lot of employees here. The bot must be able to:

- connect to each website,
- on some of them, log in as a user,
- access and parse a particular piece of information on the website.

The bot must be integrated into our website and take its settings (user to use, …) from our website's data. Eventually it must sum up the parsed information. Preferably this should be done on the client side, not on the server.

I tried Dart last month and loved it, so I would like to do this in Dart. But I am a bit lost:

- Can I use a Document class object for each website I want to parse? Could this be headless, or should I use the Chrome/Dartium API to control the web browser (I'd like to avoid this)?
- I've been reading this thread: https://groups.google.com/a/dartlang.org/forum/?fromgroups=#!searchin/misc/crawler/misc/TkUYKZXjoEg/Lj5uoH3vPgIJ
- Is using https://github.com/dart-lang/html5lib a good idea for my case?


Answers (1)

manpreet · Best Answer · 2 years ago
There are two parts to this:

1. Get the page from the remote site.
2. Read the page into a class that you can parse.

For the first part, if you are planning on running this client-side, you are likely to run into cross-site issues, in that your page, served from server X, cannot request pages from server Y unless the correct headers are set. See "CORS with Dart, how do I get it to work?" and "Dart application and cross domain policy": the site in question needs to be returning the correct CORS headers.

Assuming that you can actually get the pages from the remote site client-side, you can use HttpRequest to retrieve the actual content:

```dart
// snippet of code...
new HttpRequest.get("http://www.example.com", (req) {
  // process the req.responseText
});
```

You can also use HttpRequest.getWithCredentials. If the site has some custom login, then you will probably have problems, as you will likely have to HTTP POST the username and password from your site to their server.

This is where the second part comes in. You can process your HTML using the DocumentFragment.html(...) constructor, which gives you a nodes collection that you can iterate and recurse through. The example below shows this for a static block of HTML, but you could use the data returned from the HttpRequest above.

```dart
import 'dart:html';

void main() {
  var d = new DocumentFragment.html("""<div>Foo</div>""");

  // print the content of the top-level nodes
  d.nodes.forEach((node) => print(node.text)); // prints "Foo"

  // real world - use recursion to go down the hierarchy.
}
```

I'm guessing (not having written a spider before) that you'd want to pull out specific tags at specific locations/depths to sum up as your results, and also add the URLs in hyperlinks to a queue that your bot will navigate into.
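To make that last point concrete, here is a minimal sketch of the recursion, assuming the values you want live in table cells and that new pages are reached through plain href links; the collect helper, the TD test and the queue are illustrative choices, not part of the answer above:

```dart
import 'dart:html';

// Walk the parsed nodes, print the cells we care about and queue any links.
// Hypothetical helper for illustration; adjust the tag test to the real site.
void collect(Node node, List<String> queue) {
  if (node is Element && node.tagName == 'TD') {
    print(node.text); // a value to sum up later
  }
  if (node is AnchorElement) {
    var href = node.getAttribute('href');
    if (href != null) queue.add(href); // a page for the bot to visit next
  }
  node.nodes.forEach((child) => collect(child, queue));
}

void main() {
  // NodeTreeSanitizer.trusted skips the default sanitizer, which would
  // otherwise strip attributes and links it does not recognise.
  var fragment = new DocumentFragment.html("""
      <table><tr><td>42</td><td>58</td></tr></table>
      <a href="/reports/next">next page</a>
  """, treeSanitizer: NodeTreeSanitizer.trusted);

  var queue = <String>[];
  fragment.nodes.forEach((node) => collect(node, queue));
  print(queue); // URLs still to crawl
}
```

In a real run you would feed the responseText from the HttpRequest into DocumentFragment.html instead of a static string, and keep popping URLs off the queue to fetch next.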

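On the fetching side, if your SDK no longer has the HttpRequest.get and getWithCredentials constructors used in the answer, the static helpers HttpRequest.postFormData and HttpRequest.getString cover the same ground. Below is a minimal log-in-then-fetch sketch, assuming the partner site accepts a form-style login and returns permissive CORS headers; the URLs and field names are made up:

```dart
import 'dart:async';
import 'dart:html';

// Hypothetical partner-site endpoints and form fields, for illustration only.
// withCredentials: true asks the browser to send/store the session cookie,
// which only works if the remote site returns matching CORS headers.
Future<String> logInAndFetch(String user, String password) async {
  await HttpRequest.postFormData(
    'https://partner.example.com/login',
    {'username': user, 'password': password},
    withCredentials: true,
  );

  return HttpRequest.getString(
    'https://partner.example.com/report',
    withCredentials: true,
  );
}

void main() {
  logInAndFetch('bot-user', 'secret').then((html) {
    // hand the html to DocumentFragment.html(...) for parsing, as above
    print(html.length);
  });
}
```

Note that withCredentials only helps if the remote site both allows your origin and allows credentials in its CORS response; otherwise the browser will still block the request.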