Spiderz

Scarily easy spidering

What is it?

Spiderz is a very simple Ruby gem for spidering websites. Here is a short example that prints a simple HTML sitemap, one link per page:

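      # Start a spider rooted at the site to crawl
      spider = Spiderz.new "http://mysite.com"

      # Called for every page that loads; doc is the parsed HTML document
      spider.success do |url, doc|
        title = (doc / "title").text.strip
        puts "<a href='#{url}'>#{title}</a>"
      end

      # Begin crawling from the site root
      spider.crawl "/"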

Here is another example, which reports pages that fail to load (such as 404s):

      spider = Spiderz.new "http://mysite.com"

      # Called for every URL that could not be loaded
      spider.failure do |url|
        puts "Failed to load: #{url}"
      end

      spider.crawl "/"

CALLBACKS

The idea behind Spiderz is to provide very simple callbacks, such as success and failure, that you customize by passing blocks.
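To illustrate the callback idea, here is a minimal, self-contained sketch of the pattern: each callback method stores the block it is given, and crawl calls the stored block later for each URL. `TinySpider` is a hypothetical class, not Spiderz's implementation, and it fakes fetching with a hash of URL to page body so it runs without network access.

```ruby
# Hypothetical sketch of a block-based callback API; NOT Spiderz's source.
class TinySpider
  def initialize
    # Default callbacks do nothing until the user supplies a block.
    @on_success = ->(_url, _doc) {}
    @on_failure = ->(_url) {}
  end

  # Store the block; it is called later for each page that loads.
  def success(&block)
    @on_success = block
  end

  # Store the block; it is called later for each page that fails to load.
  def failure(&block)
    @on_failure = block
  end

  # Stand-in for real fetching: `pages` maps url => body, and a missing
  # entry is treated as a failed load.
  def crawl(urls, pages)
    urls.each do |url|
      body = pages[url]
      if body
        @on_success.call(url, body)
      else
        @on_failure.call(url)
      end
    end
  end
end

spider = TinySpider.new
spider.success { |url, _doc| puts "OK #{url}" }
spider.failure { |url| puts "Failed to load: #{url}" }
spider.crawl(["/", "/missing"], { "/" => "<html></html>" })
# prints "OK /" then "Failed to load: /missing"
```

Storing blocks and calling them later keeps the crawler loop generic: the spider decides when a page succeeds or fails, and the user's block decides what to do about it.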

You can also override the default skip behaviour. By default, Spiderz follows every internal link except mailto: links and bookmarks (in-page # anchors).
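The default skip rule described above can be approximated with Ruby's standard `uri` library. The helper below, `skip_link?`, is hypothetical and not part of Spiderz's API; it assumes "internal" means the same host as the start URL and "bookmark" means an href that begins with `#`.

```ruby
require "uri"

# Hypothetical approximation of the default skip rule; NOT Spiderz's code.
# Returns true when a link should NOT be followed.
def skip_link?(href, base_url)
  href = href.to_s.strip
  return true if href.empty?
  return true if href.downcase.start_with?("mailto:") # e-mail links
  return true if href.start_with?("#")                # in-page bookmarks

  base = URI.parse(base_url)
  target = URI.join(base_url, href)
  # Skip anything that resolves to a different host (an external link).
  target.host != base.host
rescue URI::Error
  true # unparseable links are skipped rather than crawled
end

puts skip_link?("/about", "http://mysite.com")               # false
puts skip_link?("mailto:me@mysite.com", "http://mysite.com") # true
```

Resolving each href against the start URL before comparing hosts means relative links like `/about` count as internal, while absolute links to other sites are skipped.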

For more information, read the source.

REQUIREMENTS

INSTALL

      sudo gem install spiderz

Spiderz is open source and available on GitHub.