Voice of EED

Tuesday 22 April 2003

Grub - Distributed web search or con? [lurks]

I spied this interesting story on Slashdot about a distributed search engine called Grub. This is a nifty idea. The hardest thing about running a decent search engine is rescanning the pages that change regularly. What the Grub project does is provide a nifty little client which runs in the system tray or as a screensaver, fetches lists of URLs from the server, reads them to see what's changed and sends tokenised, compressed results back to the server. It's not so much distributed processing as distributed bandwidth, and we all know how cool that is from BitTorrent.
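To make that fetch-compare-send loop concrete, here's a minimal sketch in Python. It is not Grub's actual client or protocol: the names `crawl_batch`, `fetch` and `known_digests` are made up for illustration, and I'm assuming change detection by content hash plus zlib compression as stand-ins for whatever Grub really does.

```python
import hashlib
import zlib


def digest(body: bytes) -> str:
    """Fingerprint a page body so changes can be spotted cheaply."""
    return hashlib.sha256(body).hexdigest()


def crawl_batch(urls, fetch, known_digests):
    """Fetch each URL and return compressed bodies for pages that changed.

    urls          -- list of URLs handed out by the (hypothetical) server
    fetch         -- callable taking a URL and returning the body as bytes
    known_digests -- dict of url -> digest from the last crawl, updated in place
    """
    changed = {}
    for url in urls:
        body = fetch(url)
        current = digest(body)
        # Only pages that are new or different get sent back to the server.
        if known_digests.get(url) != current:
            known_digests[url] = current
            changed[url] = zlib.compress(body)
    return changed
```

Passing `fetch` in as a parameter keeps the sketch testable without touching the network; a real client would plug in an HTTP fetcher and upload the `changed` dict to the server.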
The other promising thing was, taking a peek at the Grub.org website shows that they have an XML interface for getting results. So, on the surface of it this would appear to be an extremely cool thing. In return for running a client which helps out on the global spidering of the web (and you can become an authoritative source on your own web sites etc), you can query their XML interface and get your own search results in a raw format to use how you like. Nifty eh?
Well, it would be, except it's not actually what's happening. You see, the 'API - use the results' section of their web site only contains useless queries concerning URL and client statistics. There's no facility to get your own search results and no plans to add one. Instead, Grub has teamed up with commercial search engine operators and agreed deals to supply their backend. So hang on a minute: they want to sponge all this free CPU and bandwidth off us, and then we don't even get to use the results, we're just bolstering some commercial search engine?
Yep, that's about the size of it. How lame is that? Lame enough to ban Grub clients in your robots.txt, in my view. This does of course leave the field open for a proper distributed and open implementation of the same idea. It's definitely the way forward. I'm sick of parsing Google's HTML to get search results, and despite their 10,000-odd servers, they're still way behind the curve on re-indexing web sites regularly.
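If you do want to lock it out, here's a rough sketch of what the ban looks like, checked with Python's standard `urllib.robotparser`. I'm assuming the crawler identifies itself with the token `grub-client`, so check your own access logs before relying on that; `is_allowed` is just a helper name I've made up.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent token: "grub-client". Everyone else stays allowed.
ROBOTS_TXT = """\
User-agent: grub-client
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("grub-client", "Googlebot"):
        print(agent, is_allowed(agent, "http://example.com/index.html"))
```

Bear in mind robots.txt is only a polite request, so this keeps out crawlers that choose to honour it.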

3 comments:

  1. That's a fucking outrage frankly, sucking up your personal CPU cycles for their company and giving nothing back. Set your robots to kill on sight.
  2. We heard about this at work a while back, as we're currently building a Google-esque application to facilitate global corporate document indexing/sharing.
    Grub is owned by LookSmart. You probably haven't heard of them - but they're another bunch of morons who buy up domains left right and centre, and redirect them to a lame 'web directory' page - I'm sure you've all seen this sort of stuff in action.
    Grub is a cheeky pile of wank. Steer clear.
    Incidentally, I thought Google did XML results? In fact, wouldn't this be of use to you, Mat?
    http://www.google.com/apis/
  3. You need an account to access it and you're only allowed x-many queries unless you pay. We use a surprising number of Google queries in a day.