What is Nomad?
Nomad is a tiny but efficient search engine. Written entirely in Python, it is divided into two distinct parts:
- Nomad Crawler
- Nomad Searcher
Nomad Crawler
This is the core of the application. The crawler takes its list of URLs either from a static file or from the database.
Nomad Crawler itself is split into several parts: downloader, indexer, parser and storage.
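To make the split into parts more concrete, here is a minimal sketch of how a downloader, parser, indexer and storage could fit together. It is not Nomad's actual code: the names `TextExtractor`, `parse` and `crawl`, the in-memory `index` and `archive` dictionaries, and the use of `zlib` for the archive are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical parser step: collects the text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse(html):
    """Return the lowercased, whitespace-separated words found in the page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks).lower().split()


def crawl(urls, download, index, archive):
    """Run each URL through download -> parse -> index -> store.

    `download` is any callable that maps a URL to an HTML string, so the
    network layer can be swapped out (or faked in tests).
    """
    for url in urls:
        html = download(url)
        for word in set(parse(html)):
            index[word].add(url)  # indexer: word -> set of URLs
        archive[url] = zlib.compress(html.encode("utf-8"))  # storage


# Usage with a fake downloader backed by a dictionary (no network needed).
pages = {"http://a.example/": "<html><body><p>Hello Nomad</p></body></html>"}
index = defaultdict(set)
archive = {}
crawl(pages, pages.get, index, archive)
```

Passing the downloader in as a callable keeps the sketch independent of any particular HTTP library.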
A short feature list is given below.
- Crawls at good speed.
- Non-Blocking IO.
- Easy configuration of settings such as TIMEOUTs, HTTP WAIT and MINIMUM PAGE SIZE to download.
- Compressed Archives (compression ratio 1:6)
- Caching of indexed URLs.
- Batch Processing for Downloading, Storing and Indexing.
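As a rough illustration of the configuration and batch-processing features, the sketch below groups the settings named above into one object and splits a URL list into fixed-size batches. The class `CrawlerConfig`, the helpers `batches` and `worth_keeping`, the `batch_size` setting and every default value are hypothetical; Nomad's real option names and defaults may differ.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass
class CrawlerConfig:
    """Hypothetical settings object; all defaults are illustrative."""
    timeout: float = 10.0      # seconds before a request is abandoned
    http_wait: float = 1.0     # pause between requests, in seconds
    min_page_size: int = 512   # pages smaller than this (in bytes) are skipped
    batch_size: int = 50       # items handled per batch (assumed setting)


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def worth_keeping(body, config):
    """Apply the minimum page size rule to a downloaded page body."""
    return len(body) >= config.min_page_size


# Usage: process a URL list one batch at a time.
config = CrawlerConfig(batch_size=2)
urls = ["http://a.example/", "http://b.example/", "http://c.example/"]
grouped = list(batches(urls, config.batch_size))
```

Each batch can then be handed to the download, store and index steps in turn, which is one simple way to read "batch processing" here.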
Nomad Searcher
A simple web-based front end to see what Nomad Crawler has done so far. I have not done much work on the front end as such. It is just a minimal facility that lets users type in a keyword and dig into the indexed data store.
Nomad Searcher returns the list of results (if any matched :-). The presentation style is more or less like Google, with the option to see cached contents.
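The lookup behind such a front end might work roughly as follows. This is only a sketch under stated assumptions: it supposes the index is a dictionary mapping each word to a set of URLs and that cached pages are kept as zlib-compressed bytes keyed by URL. The function name `search` and both data layouts are hypothetical, not Nomad's actual storage format.

```python
import zlib


def search(keyword, index, archive):
    """Return (url, cached_html) pairs for pages containing the keyword.

    `index` maps a lowercased word to a set of URLs; `archive` maps a URL
    to its zlib-compressed page. Both layouts are assumptions.
    """
    results = []
    for url in sorted(index.get(keyword.lower(), ())):
        cached = zlib.decompress(archive[url]).decode("utf-8")
        results.append((url, cached))
    return results


# Usage with a tiny hand-built index and archive.
page = "<p>Hello Nomad</p>"
index = {"hello": {"http://a.example/"}, "nomad": {"http://a.example/"}}
archive = {"http://a.example/": zlib.compress(page.encode("utf-8"))}
hits = search("Nomad", index, archive)
```

An unmatched keyword simply yields an empty list, which the front end can present as "no results".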