The goal of this project is to explore how hashtables work and compare the performance to linear search. Along the way, the application will display search results in a browser window and being able to navigate to documents.
A search engine accepts one or more terms and searches a corpus for files matching all of those terms. A corpus is just a directory and possibly subdirectories full of text files. Here is a fragment of a sample search results page as displayed in Chrome (activated from Python); clicking on a link brings up the actual file.
HTML output | File Content |
---|---|
We will need the 7z compression utility to uncompress those data.
Assuming you have placed the slate directory under a data directory in your home directory.
$ python search.py linear ~/data/slate
$ python search.py index ~/data/slate
$ python search.py myhtable ~/data/slate
Here is what the program looks like in action:
$ python search.py linear ~/data/slate
4530 files
Search terms: Reagan Iran
After you enter the search terms and hit return, the Python program pops up your default browser on the HTML file you have just generated as a result of the search.