Nomad - A Tiny Search Engine.

Crawl Statistics

I ran the crawler at various times of day and observed nearly the same performance across the different periods. The reason may lie in the fact that this crawler does not wait long for a response from heavily loaded web servers. Still, I have divided the tests into two groups: Day/Peak Time, when bandwidth consumption is assumed to be high, and Night/Non-Peak Time, when it is assumed to be low. All tests were run on a 64 kbps shared leased line.
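The idea of not waiting long for a slow server can be illustrated with a bounded-timeout fetch. The Python sketch below is not Nomad's own code: the function name `fetch_page` and the 5-second default are assumptions, since the text does not say how Nomad fetches pages or how long it waits. Any request that fails or times out returns None, which a crawler would count as a broken URL.

```python
import http.client
import urllib.request
from typing import Optional

# Hypothetical value: the report does not state how long Nomad waits.
DEFAULT_TIMEOUT_SECONDS = 5.0


def fetch_page(url: str, timeout: float = DEFAULT_TIMEOUT_SECONDS) -> Optional[bytes]:
    """Fetch a URL, giving up quickly on slow or unreachable servers.

    Returns the response body, or None if the URL is malformed, the server
    is unreachable, it answers with an HTTP error, or it does not respond
    within `timeout` seconds.
    """
    try:
        # The timeout bounds each blocking socket operation (connect, each read),
        # so a heavily loaded server cannot stall the crawler indefinitely.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, HTTPError, TimeoutError and refused connections;
        # ValueError covers strings urllib cannot parse as a URL.
        return None
```

Because the timeout applies to each socket operation rather than to the whole download, a server that keeps trickling bytes can still hold a request longer than `timeout`; a crawler wanting a hard cap would have to track elapsed time itself.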

Day/Peak Time

URLs Tried : 2009
URLs Indexed : 1838
URLs Uncrawled : 8291
URLs Broken    : 171 (8.5%)
Words Parsed : 18697
Words X Urls : 18697
Min Doc Size(b): 1106
Avg Doc Size(b): 24545.5822
Max Doc Size(b): 32767
Missing Title : 5
Avg. Links/Page: 4.51
CPU Time Taken : 951.22 sec approx.
Total Time : 183 mins approx.
Efficiency : 10 urls/min approx.
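The Broken and Efficiency rows appear to be derived from the other counters: the broken percentage matches URLs Broken divided by URLs Tried, and the efficiency matches URLs Indexed divided by Total Time. The sketch below reproduces the Day/Peak figures under that reading; the class and field names are hypothetical, not Nomad's own. Dividing URLs Tried by Total Time instead would give about 11 urls/min, so the reported efficiency seems to count indexed URLs only.

```python
from dataclasses import dataclass


@dataclass
class CrawlRun:
    """Raw counters from one crawl run, as listed in the tables."""
    urls_tried: int
    urls_indexed: int
    urls_broken: int
    total_minutes: float

    def broken_percent(self) -> float:
        # Broken URLs as a share of all URLs the crawler attempted.
        return 100.0 * self.urls_broken / self.urls_tried

    def efficiency(self) -> float:
        # Indexed URLs per minute of wall-clock time.
        return self.urls_indexed / self.total_minutes


day = CrawlRun(urls_tried=2009, urls_indexed=1838, urls_broken=171, total_minutes=183)
print(f"Broken: {day.broken_percent():.1f}%")           # Broken: 8.5%
print(f"Efficiency: {day.efficiency():.1f} urls/min")   # Efficiency: 10.0 urls/min
```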

Night/Non-Peak Time

URLs Tried : 4005
URLs Indexed : 3608
URLs Uncrawled : 37507
URLs Broken    : 401 (10%)
Words Parsed : 57104
Words X Urls : 57104
Min Doc Size(b): 1383
Avg Doc Size(b): 24427.8395
Max Doc Size(b): 32767
Missing Title : 22
Avg. Links/Page: 10
CPU Time Taken : 1577.86 sec approx.
Total Time : 342 mins approx.
Efficiency : 10.5 urls/min approx.
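Since both CPU Time Taken and Total Time are reported, one can estimate how much of each run was spent computing rather than waiting on the network. The sketch below does that arithmetic for both runs; the helper names are hypothetical. In both runs the CPU was busy for under 10% of the wall-clock time, which is consistent with (though does not prove) the crawl being limited mainly by the 64 kbps link rather than by processing.

```python
def cpu_share(cpu_seconds: float, wall_minutes: float) -> float:
    """Percentage of wall-clock time the crawler spent on the CPU."""
    return 100.0 * cpu_seconds / (wall_minutes * 60.0)


def cpu_per_url(cpu_seconds: float, urls_tried: int) -> float:
    """Average CPU seconds spent per attempted URL."""
    return cpu_seconds / urls_tried


# (CPU seconds, total minutes, URLs tried) taken from the two tables.
runs = {
    "Day/Peak": (951.22, 183, 2009),
    "Night/Non-Peak": (1577.86, 342, 4005),
}

for name, (cpu_s, wall_min, tried) in runs.items():
    print(f"{name}: CPU busy {cpu_share(cpu_s, wall_min):.1f}% of wall time, "
          f"{cpu_per_url(cpu_s, tried):.2f} s CPU per URL")
# Day/Peak: CPU busy 8.7% of wall time, 0.47 s CPU per URL
# Night/Non-Peak: CPU busy 7.7% of wall time, 0.39 s CPU per URL
```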