As mentioned in class (1st Oct 2007), evaluation is necessary to assess how effectively different systems satisfy particular user needs. Evaluating systems involves three major steps:
* Test Corpus Creation
* Query Set Compilation
* Document Judgements
Administrative Info:
As part of the course project, we would like to create a reusable test collection with topics and judgements. Each participating student will be required to submit a run, retrieving documents from Wikipedia for a given set of queries. In addition, students will be required to take part in topic development and to judge the documents retrieved for the topics they composed. The marks breakdown would be
* topic development
Introduction

NIST, in the context of TREC, provides a framework to experimentally compare the effectiveness and efficiency of various retrieval methods.
A topic has the following structure:

<title> ... </title>
<desc> ... </desc>
<Imp Points>
1. ...
2. ...
</Imp Points>
As shown above, each topic contains title, desc, and "Imp Points" fields. Each participating student is required to submit 2 or 3 topics, from which we will select two for the compiled topic file. The topic file (hopefully consisting of around 50 topics) will contain only the title and desc fields, whereas the "Imp Points" field is used during the evaluation phase.
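As a minimal sketch of this compilation step (the function name and the exact topic markup handling are assumptions, not part of the actual course tooling), the "Imp Points" field could be stripped from each submitted topic like so:

```python
import re

def strip_imp_points(topic_text):
    """Remove the <Imp Points> ... </Imp Points> field so that the
    distributed topic file contains only the title and desc fields."""
    return re.sub(r"<Imp Points>.*?</Imp Points>\s*", "",
                  topic_text, flags=re.DOTALL)

# Hypothetical submitted topic, purely for illustration.
submitted = """<title> example title </title>
<desc> example description </desc>
<Imp Points>
1. Does the page answer the first question?
</Imp Points>
"""

print(strip_imp_points(submitted))  # prints only the title and desc lines
```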
Evaluation
As mentioned previously, topic creators (students) will be involved in the evaluation phase by judging the documents retrieved for their topics (at most 2). We ask you to judge one document at a time, independent of previously shown documents. For each document, the judge chooses one of the following:
* not relevant (0): Page is not relevant to the topic.
* relevant (1): Page provides some information about the topic.
* highly relevant (2): Page is a perfect candidate to be used as a reference for the topic.
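If these judgements are recorded in the standard TREC qrels layout (an assumption; the course may use its own format, and the topic and document ids below are purely illustrative), the resulting file would contain one line per judged document:

```
1 0 Some_Wikipedia_Page 2
1 0 Another_Page 0
2 0 Third_Page 1
```

where the columns are topic id, an unused field, document id, and the graded judgement (0, 1, or 2).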
In addition, each relevant (or highly relevant) document is judged against the questions in its topic's "Imp Points" field:
* Does it answer question 1?
* Does it answer question 2?
...
* Does it answer question n?
Our aim in carrying out such an evaluation is to measure the novelty of the information provided by a given document as compared to previously seen document(s).

!!! TO DO: JUDGING HELP (SNAPSHOTS) and MEASURES !!!
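As a minimal sketch of how such a novelty measure could be computed from the per-question judgements above (the function and the scoring rule are assumptions; the actual measures are still marked TO DO), one can count, for each document in a ranking, how many questions it answers for the first time:

```python
def novelty_scores(answered_per_doc):
    """For each document in ranked order, return how many questions it
    answers that no earlier document in the ranking answered.
    Each document is represented by the set of "Imp Points" question
    numbers it was judged to answer."""
    seen = set()
    scores = []
    for answered in answered_per_doc:
        new = answered - seen        # questions answered for the first time
        scores.append(len(new))
        seen |= answered
    return scores

# Hypothetical ranking: doc 1 answers questions 1 and 2, doc 2 answers
# questions 2 and 3, and so on.
ranking = [{1, 2}, {2, 3}, {1}, {3, 4}]
print(novelty_scores(ranking))  # [2, 1, 0, 1]
```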
Group/single participants

Number of documents