The proposed architecture extends DRUM, a technique introduced in "IRLbot: Scaling to 6 Billion Pages and Beyond", which won the best paper award at WWW 2008. There, the technique is used in a single-machine Web crawler; in the thesis, I extend it to a parallel crawler.
The following is the presentation from my Master's thesis defense, which I passed on 16 December 2010.
The full text of the thesis will be available soon and can also be requested via email. Students and researchers interested in Web crawling are welcome to contact me with questions, comments, feedback, or suggestions.