In this project, you need to implement a simple
Information Retrieval system able to index a collection of documents, perform
queries over it, and generate an output in the form of a ranked list of
documents. As part of this project, you will also conduct a first evaluation of
this system in order to assess its performance.
More specifically, the project will include the
implementation of the following components:
Indexing.
Search and ranking. You need to implement three different retrieval models: Vector Space Model, BM25, and one Language Model of your choice.
Evaluation. You will build a pipeline for evaluating your system and comparing the three models that you have implemented. To generate the results of your experimental evaluation, you should use the TREC evaluation script (trec_eval-9.0.7.tar.gz), which you can find here. A description of the metrics computed by this script can be found here. Please follow the instructions in the accompanying file to compile and run the script.
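The three components above can be illustrated with a minimal Python sketch. It is not a reference solution: the function names (`build_index`, `bm25_rank`, `to_trec_run`), the whitespace tokenizer, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are all assumptions made for illustration, and the IDF shown is one common BM25 variant (with 1 added inside the logarithm so that it stays positive). The last function writes each result in the six-column run format that trec_eval reads: query id, the literal `Q0`, document id, rank, score, and a run tag.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Build an inverted index {term: {doc_id: tf}} plus document lengths."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        doc_len[doc_id] = sum(counts.values())
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_rank(query, index, doc_len, k1=1.2, b=0.75):
    """Score every document sharing a term with the query; best first."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalisation: longer documents are penalised via b.
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def to_trec_run(query_id, ranking, tag="bm25"):
    """Format a ranking as trec_eval run lines: qid Q0 docno rank score tag."""
    return [
        f"{query_id} Q0 {doc_id} {rank} {score:.4f} {tag}"
        for rank, (doc_id, score) in enumerate(ranking, start=1)
    ]


# Toy data, invented for this example only.
docs = {
    "d1": "boundary layer flow over a flat plate",
    "d2": "heat transfer in supersonic flow",
    "d3": "structural fatigue of aircraft wings",
}
index, doc_len = build_index(docs)
ranking = bm25_rank("boundary layer flow", index, doc_len)
for line in to_trec_run("1", ranking):
    print(line)
```

Once the run lines for all queries are saved to a file, trec_eval is invoked with the relevance judgments file first and the run file second. Swapping `bm25_rank` for a Vector Space Model or Language Model scorer leaves the indexing and output steps unchanged, which keeps the comparison between the three models straightforward.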
In this project, you will be working with the Cranfield
collection, a small collection of 1,400 abstracts and 226 queries in the
aviation domain.
The file contains
the collection with the following files: