In this project, you need to implement a simple Information Retrieval system that can index a collection of documents, run queries over it, and return the results as a ranked list of documents. As part of this project, you will also conduct a first evaluation of the system to assess its performance.
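Indexing usually means building an inverted index that maps each term to the documents containing it. The sketch below is a minimal illustration, not a required design. It assumes the documents are already loaded into a dict of `doc_id -> text`, and it uses a simple lowercase tokenizer with no stemming or stopword removal. The helper names `tokenize`, `build_index` and `candidate_docs` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and keep runs of letters/digits (no stemming, no stopwords).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index from a dict mapping doc_id -> raw text.

    Returns (postings, doc_len): postings maps each term to {doc_id: term frequency},
    and doc_len maps each doc_id to its length in tokens.
    """
    postings = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return dict(postings), doc_len


def candidate_docs(postings, query):
    # Documents containing at least one query term; only these need to be scored.
    return {doc_id for term in tokenize(query) for doc_id in postings.get(term, {})}
```

Storing term frequencies and document lengths at indexing time means the ranking step can compute tf-idf, BM25 or language-model scores without re-reading the raw text.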
More specifically, the project will include the implementation of the following components:
Search and ranking. You need to implement three different retrieval models: Vector Space Model, BM25, and one Language Model of your choice.
Evaluation. You will build a pipeline for evaluating your system and use it to compare the three models you have implemented. To generate the results of your experimental evaluation, use the TREC evaluation script (trec_eval-9.0.7.tar.gz), which you can find here. A description of the metrics computed by this script can be found here. Please follow the instructions in the file to compile and run the script.
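The three retrieval models differ only in how they score a (query, document) pair, so they can share the same collection statistics. The sketch below is one possible layout, assuming documents are given as a dict of `doc_id -> text` and using a simple lowercase tokenizer. The class and method names are hypothetical. For the language model it picks query likelihood with Dirichlet smoothing as an example, since the choice is left open. The parameter defaults (`k1=1.2`, `b=0.75`, `mu=2000`) are commonly used values, not prescribed ones.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, keep alphanumeric runs; no stemming or stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())


class RetrievalModels:
    """Hypothetical helper holding the collection statistics shared by the three models."""

    def __init__(self, docs):
        # docs: dict mapping doc_id -> raw text
        self.doc_tf = {d: Counter(tokenize(text)) for d, text in docs.items()}
        self.doc_len = {d: sum(tf.values()) for d, tf in self.doc_tf.items()}
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_len.values()) / self.n_docs
        self.df = Counter()       # number of documents containing each term
        self.coll_tf = Counter()  # total occurrences of each term in the collection
        for tf in self.doc_tf.values():
            self.df.update(tf.keys())
            self.coll_tf.update(tf)
        self.coll_len = sum(self.coll_tf.values())

    def idf(self, term):
        df = self.df[term]
        return math.log(self.n_docs / df) if df else 0.0

    def vsm(self, query, doc_id):
        # Vector Space Model: cosine similarity between tf-idf weighted vectors.
        q_vec = {t: c * self.idf(t) for t, c in Counter(tokenize(query)).items()}
        d_vec = {t: c * self.idf(t) for t, c in self.doc_tf[doc_id].items()}
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        norm = math.sqrt(sum(w * w for w in q_vec.values())) * math.sqrt(
            sum(w * w for w in d_vec.values()))
        return dot / norm if norm else 0.0

    def bm25(self, query, doc_id, k1=1.2, b=0.75):
        # Okapi BM25 with the non-negative idf variant log(1 + (N - df + 0.5) / (df + 0.5)).
        tf_d, dl = self.doc_tf[doc_id], self.doc_len[doc_id]
        score = 0.0
        for t in tokenize(query):
            tf = tf_d[t]
            if tf == 0:
                continue
            df = self.df[t]
            idf = math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / self.avgdl))
        return score

    def lm_dirichlet(self, query, doc_id, mu=2000.0):
        # Query likelihood with Dirichlet smoothing: sum of log P(t | d) over query terms.
        tf_d, dl = self.doc_tf[doc_id], self.doc_len[doc_id]
        score = 0.0
        for t in tokenize(query):
            p_coll = self.coll_tf[t] / self.coll_len
            if p_coll == 0:
                continue  # term unseen in the whole collection: skipped for every document
            score += math.log((tf_d[t] + mu * p_coll) / (dl + mu))
        return score

    def rank(self, query, model):
        # model: one of the bound methods above, e.g. self.bm25; returns (doc_id, score) pairs.
        scores = {d: model(query, d) for d in self.doc_tf}
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

For example, `m.rank("boundary layer", m.bm25)` returns every document with its BM25 score, best first. This brute-force loop scores all documents, which is acceptable for a collection of 1,400 abstracts. A larger collection would score only the documents found in the postings of the query terms.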
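trec_eval compares a run file against a file of relevance judgements (qrels). A run file has one line per retrieved document in the whitespace-separated form `query_id Q0 doc_id rank score run_tag`. The sketch below formats ranked results in that layout. It assumes results are held in a dict mapping each query id to a list of `(doc_id, score)` pairs. The function names, the default `run_tag`, and the cut-off of 1,000 documents per query (a common TREC convention) are illustrative choices.

```python
def format_trec_run(results, run_tag="my_run", max_docs=1000):
    """Turn {query_id: [(doc_id, score), ...]} into TREC run-file lines."""
    lines = []
    for qid, scored_docs in results.items():
        # Highest score first; rank numbering starts at 1.
        ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:max_docs]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score:.6f} {run_tag}")
    return lines


def write_trec_run(results, path, run_tag="my_run", max_docs=1000):
    # One run file per retrieval model makes it easy to compare them with trec_eval.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(format_trec_run(results, run_tag, max_docs)) + "\n")
```

Writing one file per model (for instance with run tags `vsm`, `bm25` and `lm`) allows the compiled binary to be invoked as `trec_eval qrels_file run_file` for each of them. The collection's relevance judgements may first need converting to the qrels layout `query_id iteration doc_id relevance`.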
In this project, you will work with the Cranfield collection, a small collection of 1,400 abstracts and 226 queries in the aviation domain.
The file contains the collection with the following files: