Tuesday, September 21, 2010

Scraping

I have been doing some scraping jobs lately, mostly for fun, since none of them has brought me any money yet. Now that Scrapy 0.10 is out, I'm planning to integrate Django with Scrapy. Up until now that wasn't easy, but 0.10 introduces a Scrapy daemon, persistent queues, and other features that make it easier to schedule scraping jobs remotely.

This project will also give me the opportunity to learn jQuery, because I want to use Ajax for the user input and for displaying the scraping results. I imagine the site will look somewhat like a search engine, except that the crawling will be done in real time.

When I finish it, I'll put it on Bitbucket for anyone interested.