How does the wadseeker work?

General help and assistance forum.
doomista
Forum Regular
Posts: 147
Joined: Sat Mar 07, 2015 6:58 pm
Location: I've been to hell. Twice

How does the wadseeker work?

#1

Post by doomista » Wed Feb 22, 2017 11:41 am

Hi,

I am currently hosting my own server for the first time, and I have already tackled a couple of tricky things related to this. At the moment I am trying to understand how Wadseeker is supposed to work, since I don't want to rely on external repos and I also want to test my mods.

I managed to get Wadseeker to download the WADs from my server automatically, but I don't understand why it works or whether this could be done more conveniently. I have Apache running and all hosted WADs are in /var/www/html. Somehow, this alone does not make Wadseeker download the WADs. After I made a simple index.html containing <a href> links to the files, it automagically started to recognize the downloads.
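A static index.html like the one described above can be regenerated with a short script whenever WADs are added. The following is a minimal Python sketch, not part of Wadseeker; the function names `build_index` and `write_index` and the list of extensions are assumptions for illustration.

```python
import html
import os
from urllib.parse import quote

# Extensions to list; which ones your server actually needs is an assumption.
WAD_EXTENSIONS = (".wad", ".pk3", ".zip", ".7z")


def build_index(filenames):
    """Return a minimal HTML page with one <a href> link per WAD-like file."""
    links = []
    for name in sorted(filenames):
        if name.lower().endswith(WAD_EXTENSIONS):
            # quote() makes the href URL-safe; html.escape() makes it HTML-safe.
            href = html.escape(quote(name), quote=True)
            links.append(f'<li><a href="{href}">{html.escape(name)}</a></li>')
    return (
        "<!DOCTYPE html>\n<html><head><title>WADs</title></head><body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )


def write_index(directory):
    """Write index.html into `directory`, listing the files already in it."""
    page = build_index(os.listdir(directory))
    with open(os.path.join(directory, "index.html"), "w", encoding="utf-8") as f:
        f.write(page)


if __name__ == "__main__":
    print(build_index(["mymod.wad", "maps.pk3", "notes.txt"]))
```

For example, `write_index("/var/www/html")` would list the WADs in the directory mentioned in this post. Enabling Apache's mod_autoindex produces a similar listing without a generated file.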

Is this really the intended behaviour, or is it cache-related in the sense that Doomseeker has to cache the server settings and then tell Wadseeker?

Thanks for the explanation

Sean
IRC Operator
Posts: 952
Joined: Thu Jan 16, 2014 9:09 pm
Location: United Kingdom

Re: How does the wadseeker work?

#2

Post by Sean » Wed Feb 22, 2017 3:44 pm

Wadseeker needs a list of files from the server so it can actually find them. If you're using Crapache, you'll want to look into mod_autoindex.

doomista
Forum Regular
Posts: 147
Joined: Sat Mar 07, 2015 6:58 pm
Location: I've been to hell. Twice

Re: How does the wadseeker work?

#3

Post by doomista » Wed Feb 22, 2017 4:07 pm

Wow, in that case I'm quite surprised it works for me at all. But I don't quite understand why it is coded this way. There's no point in Wadseeker caching those lists, and to me it seems way quicker to just request server.name/wadname.wad instead of parsing .html files with regexes. Anyway, thanks for explaining.

Zalewa
Developer
Posts: 329
Joined: Wed May 30, 2012 3:28 pm

Re: How does the wadseeker work?

#4

Post by Zalewa » Wed Feb 22, 2017 5:44 pm

doomista wrote:to me it seems way quicker to just request server.name/wadname.wad instead of parsing .html files with regexes. Anyway, thanks for explaining.
Such a heuristic might not be a bad idea, although I'd imagine it would miss more often than it hits.
doomista wrote:But I don't quite understand why it is coded this way. There's no point in Wadseeker caching those lists
Wadseeker mirrors GetWAD's behavior: it has a built-in list of sites that host WADs, it knows how to talk to the idgames archive and wad-archive, and it also receives the custom URL of your server. Custom URLs are prioritized, so if you want to ensure that Wadseeker downloads the WADs you're hosting, take care to provide a correct URL in your server's configuration. The archives (idgames and wad-archive) provide their own APIs, so Wadseeker asks them for specific files; the archives respond with whether they have the files, some detailed information such as file size or checksum, and a download URL.
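The prioritization described above can be illustrated with a small Python sketch. It is not Wadseeker's code: the `Source` type, the `ordered_sources` function and the kind labels are hypothetical, and only the rule that custom URLs come first is taken from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    kind: str  # "custom", "archive" or "site" (illustrative labels only)
    location: str


def ordered_sources(custom_urls, archives, builtin_sites):
    """Return sources to query, with server-provided custom URLs first.

    Only "custom URLs first" comes from the description above; placing
    archives before built-in sites here is an assumption.
    """
    sources = [Source("custom", url) for url in custom_urls]
    sources += [Source("archive", name) for name in archives]
    # Skip built-in sites that duplicate a custom URL so nothing is queried twice.
    seen = set(custom_urls)
    sources += [Source("site", url) for url in builtin_sites if url not in seen]
    return sources
```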

There's also support for pages that allow querying directly for a given file. The URL can contain a %WADNAME% placeholder, which is replaced with the name of the sought file, and the site is expected to serve the file when that URL is requested.
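A minimal Python sketch of the %WADNAME% substitution, assuming the file name is URL-encoded before it is inserted (the post does not say whether Wadseeker does this). `expand_query_url` is a hypothetical helper.

```python
from urllib.parse import quote

PLACEHOLDER = "%WADNAME%"


def expand_query_url(template, wad_name):
    """Substitute the sought file name into a site's URL template.

    URL-encoding the name is an assumption, so that names with spaces
    or other reserved characters still form a valid URL.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f"template has no {PLACEHOLDER} placeholder: {template}")
    return template.replace(PLACEHOLDER, quote(wad_name))
```

A template such as `http://server.name/wads/%WADNAME%` would amount to the direct-request heuristic doomista suggested: the site only has to serve the file at that path.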

Generic pages are downloaded and parsed as HTML. Wadseeker looks for <a href> links that may lead to <wadname>.<extension>, <wadname>.zip and <wadname>.7z files. It also remembers which sites it has visited to avoid going into an infinite loop. Wadseeker may crawl through further pages if the text of an <a href> link to another page contains <wadname>, and it will also recognize HTTP attachments and download them as WADs. If the server hosts a ".wad" file, Wadseeker extracts the WAD from a downloaded archive; if the server hosts the archive itself, Wadseeker installs the archive directly.
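The link matching and loop avoidance described above can be sketched in Python with the standard library's `html.parser`. This is an illustration, not Wadseeker's implementation: the names are hypothetical, the page fetcher is passed in as a function so the sketch runs offline, matching is case-insensitive by assumption, and HTTP attachment handling is left out.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def seek(start_url, wad_name, fetch, max_pages=50):
    """Breadth-first crawl; `fetch(url)` returns page HTML or None."""
    stem = posixpath.splitext(wad_name)[0].lower()
    # <wadname>.<extension>, <wadname>.zip and <wadname>.7z
    targets = {wad_name.lower(), stem + ".zip", stem + ".7z"}
    found, visited, queue = [], set(), [start_url]
    while queue and len(visited) < max_pages:
        url = queue.pop(0)
        if url in visited:
            continue  # already seen: this is what prevents infinite loops
        visited.add(url)
        page = fetch(url)
        if page is None:
            continue
        parser = LinkCollector()
        parser.feed(page)
        parser.close()
        for href, text in parser.links:
            absolute = urljoin(url, href)
            filename = posixpath.basename(unquote(urlparse(absolute).path)).lower()
            if filename in targets:
                found.append(absolute)
            elif stem in text.lower():
                queue.append(absolute)  # follow pages whose link text mentions the WAD
    return found
```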

In the most recent beta version, Wadseeker also searches for WAD files in subdirectories, so if you download an archive that contains a directory with the WAD inside it, Wadseeker will extract the WAD properly.
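Searching an archive at any depth can be sketched with Python's `zipfile` module. This is an illustration under assumptions: it handles only .zip (reading .7z would need a third-party library), compares file names case-insensitively, and returns the first match; `find_member` and `extract_wad` are hypothetical names.

```python
import io
import posixpath
import zipfile


def find_member(archive, wad_name):
    """Return the archive entry whose file name matches wad_name, at any depth."""
    target = wad_name.lower()
    for info in archive.infolist():
        if info.is_dir():
            continue
        # Compare only the final path component, so "pack/maps/MYMOD.WAD" matches too.
        if posixpath.basename(info.filename).lower() == target:
            return info.filename
    return None


def extract_wad(archive_bytes, wad_name):
    """Return the WAD's bytes from a zip archive, or None if it is not there."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        member = find_member(archive, wad_name)
        return archive.read(member) if member is not None else None
```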

Much of this is heuristic-based, and Wadseeker will occasionally install the wrong WAD. It has no way of knowing which version of the WAD a given server is hosting, where the WAD can be downloaded with 100% certainty, or whether the WAD it downloads is actually the hosted WAD or another WAD with the same name.
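Where a source does report metadata, as the archives do, a client could at least compare the download against it; a plain HTML listing reports nothing, which is part of why the result remains a guess. A minimal Python sketch, assuming the reported digest is a hex string and defaulting to MD5 purely for illustration (which algorithm a given archive uses is not stated here); `matches_expected` is a hypothetical helper.

```python
import hashlib


def matches_expected(data, expected_size=None, expected_digest=None, algorithm="md5"):
    """Check downloaded bytes against whatever metadata the source reported.

    Returns None when nothing was reported, since then there is no way
    to tell whether the file is the right one; otherwise True or False.
    """
    if expected_size is None and expected_digest is None:
        return None
    if expected_size is not None and len(data) != expected_size:
        return False
    if expected_digest is not None:
        # The default algorithm is an assumption; pass the one the source uses.
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual.lower() != expected_digest.lower():
            return False
    return True
```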

Empyre
Zandrone
Posts: 1316
Joined: Sun Jul 08, 2012 6:41 am
Location: Garland, TX, USA

Re: How does the wadseeker work?

#5

Post by Empyre » Wed Feb 22, 2017 11:10 pm

The latest stable version of WadSeeker can't find wads on http://static.allfearthesentinel.net/wads/ but the latest beta can. Should I open a ticket for this?

doomista
Forum Regular
Posts: 147
Joined: Sat Mar 07, 2015 6:58 pm
Location: I've been to hell. Twice

Re: How does the wadseeker work?

#6

Post by doomista » Wed Feb 22, 2017 11:46 pm

Wow, thumbs up for this amount of work. If I had to code this, I would go the easy way (i.e. requiring some sort of REST API or requesting server/wad.name). By the way, is there any kind of documentation on how to make at least a simple version of the idgames/wad-archive API? Verifying the size of the file would be nice (I've already run into some broken uploads).

Zalewa
Developer
Posts: 329
Joined: Wed May 30, 2012 3:28 pm

Re: How does the wadseeker work?

#7

Post by Zalewa » Thu Feb 23, 2017 12:12 am

doomista wrote:Wow, thumbs up for this amount of work. If I had to code this, I would go the easy way (i.e. requiring some sort of REST API or requesting server/wad.name). By the way, is there any kind of documentation on how to make at least a simple version of the idgames/wad-archive API? Verifying the size of the file would be nice (I've already run into some broken uploads).
Wadseeker won't talk to your site even if you make an API that is compatible with one of the archives. The archives are pretty much hardcoded into the library and it's expected that there will be one instance of each archive under a known address.
Empyre wrote:The latest stable version of WadSeeker can't find wads on http://static.allfearthesentinel.net/wads/ but the latest beta can. Should I open a ticket for this?
No. If it's fixed in the "beta" (quotation intended) then it's fixed. The official release of 1.1 is long overdue.
