Thursday, December 30, 2010

Master's Thesis: Design and Implementation of a Scalable High-Speed Parallel Web Crawler

I have been planning to share this for quite some time, and today I finally found the time to do so. My Master's thesis covers a fundamental component of search engines: Web crawlers. The research focuses on crawler efficiency, which is closely tied to the scalability and speed of a Web crawler.

The proposed architecture extends the DRUM (Disk Repository with Update Management) technique introduced in "IRLbot: Scaling to 6 Billion Pages and Beyond", the best paper of WWW 2008. In that paper, DRUM is used in a single-machine Web crawler; in my thesis, I extend it to a parallel crawler.
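
For readers unfamiliar with DRUM, here is a simplified, in-memory sketch of its core idea as described in the IRLbot paper: URL "have I seen this?" checks are buffered into batches, and each full batch is sorted and merged against a sorted repository of previously seen keys in a single sequential pass, so duplicate detection needs sequential rather than random disk access. This is not the implementation from my thesis. The names (BatchedUniqueChecker, url_key), the 64-bit key taken from SHA-1, the use of a Python list in place of the on-disk repository, and the single bucket (IRLbot spreads keys over several disk buckets) are all assumptions made for illustration.

```python
import hashlib
from typing import List, Optional, Tuple


def url_key(url: str) -> int:
    """Hypothetical 64-bit key for a URL: the first 8 bytes of its SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:8], "big")


class BatchedUniqueChecker:
    """Simplified sketch of DRUM-style batched check+update.

    Assumption: `repository` is a sorted in-memory list standing in for the
    sorted on-disk file that a real DRUM implementation would stream through.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.repository: List[int] = []           # sorted keys seen so far
        self.pending: List[Tuple[int, str]] = []  # buffered (key, url) pairs

    def submit(self, url: str) -> List[str]:
        """Buffer a URL; when the batch is full, flush it and return unseen URLs."""
        self.pending.append((url_key(url), url))
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return []

    def flush(self) -> List[str]:
        """Sort the batch, merge it with the repository in one pass, return new URLs."""
        # Stable sort by key keeps arrival order among equal keys.
        batch = sorted(enumerate(self.pending), key=lambda item: item[1][0])
        self.pending = []
        repo = self.repository
        merged: List[int] = []
        new_items: List[Tuple[int, str]] = []  # (arrival index, url)
        i = 0
        last_key: Optional[int] = None
        for idx, (key, url) in batch:
            # Copy repository keys smaller than the current key (sequential scan).
            while i < len(repo) and repo[i] < key:
                merged.append(repo[i])
                i += 1
            seen = (i < len(repo) and repo[i] == key) or key == last_key
            if not seen:
                new_items.append((idx, url))
                merged.append(key)
            last_key = key
        merged.extend(repo[i:])
        self.repository = merged
        # Report unique URLs in the order they were submitted.
        new_items.sort()
        return [url for _, url in new_items]
```

With a batch size of 3, nothing is reported until the third submission; the flush then returns each previously unseen URL exactly once, in arrival order. The design choice DRUM relies on is trading latency (answers arrive only when a batch is flushed) for throughput (one sorted merge instead of one random lookup per URL).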

Below is the presentation from my Master's thesis defense, which I successfully passed on 16 December 2010.

The full text of the thesis will be available soon; in the meantime, it can be requested via email. Interested students and researchers may contact me with any questions, comments or feedback, and anyone working in the domain of Web crawling is welcome to send suggestions.

4 comments:

  1. Sure, please give me your email ID.

    1. 212551703@qq.com, please send me your thesis.

  2. Thank you for sharing your thesis. From the looks of it, you certainly did a great job constructing the paper. The implementation of a scalable high-speed parallel Web crawler is a hard topic to tackle, so it's a plus that you defended it successfully.
