Thursday, December 30, 2010

Master's Thesis: Design and Implementation of a Scalable High-Speed Parallel Web Crawler

I have been planning to share this for quite some time, and today I finally found the time to do so. My Master's thesis covers a fundamental component of search engines: Web crawlers. The research focus of my work is crawler efficiency, which depends on the scalability and speed of a Web crawler.

The proposed architecture extends the DRUM technique introduced in "IRLbot: Scaling to 6 Billion Pages and Beyond", which won the best paper award at WWW 2008. In that paper, the technique is used for a single-machine Web crawler. In the thesis, I extend it to a parallel crawler.

Below is the presentation from my Master's thesis defense, which I passed on 16 December 2010.

The full text of the thesis will be available soon; until then, it can be requested via email. Interested students and researchers may contact me with any questions, comments, or feedback. Researchers working in the domain of Web crawling are also welcome to contact me with suggestions.


  1. Sure, please give me your email ID.

    1. Please send me your thesis.

  2. Thank you for sharing your thesis. Well, from the looks of it, you certainly did a great job constructing the paper. Implementing a scalable high-speed parallel web crawler is a really hard topic to tackle, so it’s a plus that you defended it successfully.