Title: Wikipedia Search

As mentioned in class (1st Oct 2007), evaluation is necessary to assess how effectively different systems satisfy particular user needs. The three major steps in evaluating systems are:

* Test Corpus Creation
* Query Set Compilation
* Document Judgements

Administrative Info:

As part of the course project, we would like to create a reusable test collection with topics and judgements. Each participating student is required to submit a run, in which they retrieve documents from Wikipedia for a given query. In addition, students are required to take part in topic development and to evaluate the documents retrieved for the topics they composed. The marks breakdown is:

* topic development

Introduction

NIST, in the context of TREC, provides a framework to experimentally compare the effectiveness and efficiency of various methods. ...

Number: 503
Vikings in Scotland?
<Description> What hard evidence proves that the Vikings visited or lived in Scotland?
<Narrative> A document that merely states that the Vikings visited or lived in Scotland is not relevant. A relevant document must mention the source of the information, such as relics, sagas, runes or other records from those times.

[ blah blah ]

Our framework of evaluation is as follows:

* Corpus: We decided to use the latest snapshot dump of Wikipedia as our test corpus. It can be downloaded from here.
* Topic Creation and Evaluation: Our topic creation and evaluation simulate a specific user task, in which the user is looking for comprehensive information about a certain topic.

Guidelines for topic creation and evaluation are as follows:

Topic Development

We wish to simulate the following (hypothetical) situation.

Scenario: A user is working on an assignment to write a report about a certain topic (e.g. "Drug Usage in American Sports"). In order to impress her instructor, she needs to cover various events, facts and aspects of the topic and should back her points with proper references. Going by the rule "if there is something out there, it ought to be in Wikipedia", she decides to search for pages related to her topic that she could use as references. Under the assumption that she has some knowledge of the topic, she (almost) knows the relevance of a particular event description in an article.

Translating her information need, we compose a sample topic as shown below:

<topic>
<title> Drug usage in American sports </title>
<desc> User wishes to know about steroid usage in American sports. Effects of usage, players implicated and actions taken, drug restrictions, and actions taken by organizations to prevent their usage in competitive sports are all relevant. </desc>
<Imp Points>

  1. Does the document contain names of steroids and their influence?
  2. Does the document mention steroid usage in baseball?
  3. Does the document provide names of players (American players) implicated and actions taken?
  4. Does the document mention the actions taken by establishments?
  5. Does the document provide ?

</Imp Points>
</topic>

As shown in the above example, each topic contains title, desc, and Imp Points fields. Each participating student is required to submit 2 or 3 topics, from which we select two for the topic file. The topic file (hopefully consisting of around 50 topics) will contain only the title and desc fields, whereas the "Imp Points" field is used during the evaluation phase.
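The split between what goes into the topic file and what is held back for evaluation can be sketched as follows. This is only an illustration: the `Topic` class, the `to_topic_file_entry` helper and the exact output layout are assumptions, since the page does not fix a file format.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One submitted topic (hypothetical in-memory representation)."""
    title: str
    desc: str
    # Imp Points questions; kept by the organisers for the evaluation phase.
    imp_points: list[str] = field(default_factory=list)


def to_topic_file_entry(topic: Topic) -> str:
    """Render only the fields released in the shared topic file."""
    return (
        "<topic>\n"
        f"<title> {topic.title} </title>\n"
        f"<desc> {topic.desc} </desc>\n"
        "</topic>"
    )


sample = Topic(
    title="Drug usage in American sports",
    desc="User wishes to know about steroid usage in American sports.",
    imp_points=["Does the document contain names of steroids and their influence?"],
)

entry = to_topic_file_entry(sample)
print(entry)
```

Keeping the Imp Points out of the released entry means participating systems see only the title and desc, while the questions stay available to the judge.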

Evaluation

As mentioned previously, topic creators (students) will be involved in the evaluation phase by judging the documents retrieved for their own topics (max 2). We ask you to judge one document at a time, independently of previously shown documents. For each document, the judge chooses one of the following:

* not relevant (0): Page is not relevant to the topic.
* relevant (1): Page provides some information about the topic.
* highly relevant (2): Page is a perfect candidate to be used as a reference for the topic.

In addition, each relevant (or highly relevant) document is judged as follows (for the above query):

* Does it answer question 1?
* Does it answer question 2?
* ...
* Does it answer question n?
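A single judgement under this scheme could be recorded roughly as below. The `Judgement` class and its field names are hypothetical, and the check that question answers are only recorded for relevant documents is an assumption drawn from the wording above.

```python
from dataclasses import dataclass, field

# Grade values and labels as listed in the guidelines.
GRADES = {0: "not relevant", 1: "relevant", 2: "highly relevant"}


@dataclass
class Judgement:
    topic_id: str  # hypothetical identifier for the topic
    doc_id: str    # hypothetical identifier, e.g. a Wikipedia page title
    grade: int     # 0, 1 or 2
    # 1-based numbers of the topic's questions that this document answers.
    answered: set[int] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.grade not in GRADES:
            raise ValueError(f"grade must be one of {sorted(GRADES)}")
        # Assumption: per-question checks apply only to relevant documents.
        if self.grade == 0 and self.answered:
            raise ValueError("a non-relevant document cannot answer questions")

    @property
    def label(self) -> str:
        return GRADES[self.grade]


j = Judgement("drug-usage", "Anabolic steroid", grade=2, answered={1, 3})
print(j.label, sorted(j.answered))
```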

Our aim in carrying out such an evaluation is to measure the novelty of the information provided by a document compared to previously seen document(s).

!!! TO DO: JUDGING HELP (SNAPSHOTS) and MEASURES !!!
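The measures themselves are still to be defined, so the following is only one possible reading of "novelty": for each document in a ranked list, count the questions it answers that no earlier document in the list has already answered. The function name and the input shape (one set of answered question numbers per document, in rank order) are assumptions made for this sketch.

```python
def novelty_per_document(ranked_answers):
    """Return, for each document in rank order, the set of question
    numbers it answers that no earlier document answered."""
    seen = set()
    novel = []
    for answered in ranked_answers:
        new = set(answered) - seen
        novel.append(new)
        seen |= new
    return novel


# Question numbers answered by four retrieved documents, in rank order.
ranking = [{1, 2}, {2}, {2, 3, 4}, set()]
counts = [len(n) for n in novelty_per_document(ranking)]
print(counts)  # prints [2, 0, 2, 0]
```

Because each document is judged independently of the others, novelty in this sketch is derived afterwards from the per-question answers rather than asked of the judge directly.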

Group/single participants Number of documents.

Topic revision: r2 - 2007-10-10 - IanMacKinnon
 