In this assignment, you will run an experiment to study the effects of relevance
feedback on the recall, precision and Mean Average Precision (MAP) values of an
IR system. The IR system will use a vector space model with cosine similarity
(tf-idf weighting). You will run the study on the TIME dataset, provided along
with the assignment.
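As a rough illustration of the retrieval model described above, the sketch below builds tf-idf vectors and compares them with cosine similarity. The weighting variant (raw term frequency times log(N/df)) and the function names tfidf_vectors and cosine are assumptions made for this example; the assignment may expect a different tf-idf variant.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns (list of sparse vectors, idf table).

    Assumed weighting: raw term frequency * log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # An all-zero vector has no direction; treat its similarity as 0.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A query can be weighted with the same idf table and then scored against every document vector with cosine.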
Part A: Cosine Similarity and Rocchio’s algorithm [40pts]
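The Rocchio update named in Part A can be sketched as follows. This is a minimal version under stated assumptions: vectors are sparse dicts of term weights, the default values of alpha, beta and gamma are common textbook choices rather than values given in this assignment, and negative weights are dropped from the new query.

```python
from collections import defaultdict

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector.

    q_new = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    query: dict term -> weight; relevant / nonrelevant: lists of such dicts.
    The alpha, beta, gamma defaults are assumptions for this sketch.
    """
    new_q = defaultdict(float)
    for term, weight in query.items():
        new_q[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue  # no feedback of this kind, so no centroid to add
        for doc in docs:
            for term, weight in doc.items():
                new_q[term] += coeff * weight / len(docs)
    # Assumed convention: terms with non-positive weight are removed.
    return {t: w for t, w in new_q.items() if w > 0}
```

The returned dict gives the terms of the new query and their weights for one iteration.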
Part B: Experimental study [35pts]
o Query text and ID (provided in the testbed)
o Precision, recall and MAP values of the query
o IDs of the documents provided as positive and negative feedback for each
query during each iteration of the Rocchio algorithm
o For each iteration of the Rocchio algorithm, the terms of the new query and
their weights
TIME.REL file). This can be a problem when calculating the performance values,
if k is kept constant during the retrieval. For the experimental study, assume
that the number of relevant documents is provided to the system along with the
query. In other words, the value of k will change with the query.
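The effect of letting k vary with the query can be sketched as below. The function names are hypothetical, and computing average precision over the whole ranking that is passed in is an assumption of this example. Note that when k equals the number of relevant documents, precision and recall at k are the same number, since both divide the same hit count by the same value.

```python
def precision_recall_at_k(ranking, relevant, k):
    """ranking: list of doc IDs, best first; relevant: set of relevant doc IDs."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a given query, k would be set to len(relevant) before calling precision_recall_at_k.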
Part C: Pseudo Relevance Feedback [25pts]
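In pseudo relevance feedback, the top-ranked documents of the initial retrieval are treated as if they were relevant, with no user judgments. The sketch below shows one possible form under assumptions: the cutoff top_n, the alpha and beta defaults, and the use of positive feedback only are choices made for this example, not requirements stated in the assignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pseudo_feedback(query, doc_vectors, top_n=5, alpha=1.0, beta=0.75):
    """doc_vectors: dict doc ID -> sparse vector. Returns (new query, pseudo-relevant IDs)."""
    # Initial retrieval: rank all documents by similarity to the query.
    ranked = sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
    top = ranked[:top_n]  # assumed relevant without any user judgment
    new_q = {t: alpha * w for t, w in query.items()}
    for doc_id in top:
        for term, weight in doc_vectors[doc_id].items():
            # Add beta times the centroid of the pseudo-relevant documents.
            new_q[term] = new_q.get(term, 0.0) + beta * weight / len(top)
    return new_q, top
```

The expanded query can then be used for a second retrieval pass and evaluated in the same way as the original query.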