Title: Wikipedia Search

As mentioned in class (1st Oct 2007), evaluation is necessary to assess the effectiveness of various systems in satisfying particular user needs. Evaluating systems involves three major steps:
* Test corpus creation
* Query set compilation
* Document judgements

Administrative Info: As part of the course project, we would like to create a reusable test collection with topics and judgements. Each participating student is required to submit a run, in which they retrieve documents from Wikipedia for a given query. In addition, students are required to take part in topic development and to evaluate the documents retrieved for the topics they composed. The marks breakdown would be:
* topic development

Introduction

NIST, in the context of TREC, provides a framework to experimentally compare the effectiveness and efficiency of various methods. A TREC topic looks like the following:

<num> Number: 503
<Title> Vikings in Scotland?
<Description> What hard evidence proves that the Vikings visited or lived in Scotland?
<Narrative> A document that merely states that the Vikings visited or lived in Scotland is not relevant. A relevant document must mention the source of the information, such as relics, sagas, runes or other records from those times.

Our framework of evaluation is as follows:
* Corpus: We decided to use the latest snapshot dump of Wikipedia as our test corpus. It can be downloaded from here.
* Topic creation and evaluation: Topic creation and evaluation are simulated based on a specific user task, in which the user is looking for comprehensive information about a certain topic. Guidelines for topic creation and evaluation are given below.

Topic Development

We wish to simulate the following (hypothetical) situation.

Scenario: A user is working on an assignment to write a report about a certain topic (e.g. "Drug Usage in American Sports"). In order to impress her instructor, she needs to cover various events/facts/aspects of the topic and should back her points with proper references. Going by the rule "if there is something out there, it ought to be in Wikipedia", she decides to search for pages related to her topic that she could use as references. Under the assumption that she has some knowledge of the topic, she (almost) knows the relevance of a particular event description in an article. Translating her information need, we compose a sample topic as shown below:

<topic>
<title> Drug usage in American sports </title>
<desc> The user wishes to know about steroid usage in American sports. Effects of usage, players implicated and actions taken, drug restrictions, and actions taken by organizations to prevent their usage in competitive sports are all relevant. </desc>
<Imp Points>
1. Does the document contain names of steroids and their influence?
2. Does the document mention steroid usage in baseball?
3. Does the document provide names of (American) players implicated and the actions taken?
4. Does the document mention the actions taken by establishments?
5. Does the document provide ?
</Imp Points>
</topic>

As shown in the above example, each topic contains title, desc, and Imp Points fields. Each participating student is required to submit 2 or 3 topics, from which we select two topics for the topic file compilation. The topic file (hopefully consisting of around 50 topics) will contain only the title and desc fields, whereas the "Imp Points" field is used during the evaluation phase.
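To make the submission pipeline concrete, here is a minimal sketch of how a participant might read the topic file and write out a run. The topic tags (<topic>, <title>, <desc>) follow the example above; the six-column TREC-style run format, the per-topic numbering, and the search_fn retrieval function are assumptions for illustration only, since the exact submission format has not been fixed yet.

<verbatim>
import re

def read_topics(path):
    """Parse <topic> entries with <title> and <desc> fields from the topic file.
    Topics are numbered in file order (an assumption; the real file may carry ids)."""
    text = open(path, encoding="utf-8").read()
    topics = []
    for i, block in enumerate(re.findall(r"<topic>(.*?)</topic>", text, re.S), start=1):
        title = re.search(r"<title>(.*?)</title>", block, re.S).group(1).strip()
        desc = re.search(r"<desc>(.*?)</desc>", block, re.S).group(1).strip()
        topics.append((i, title, desc))
    return topics

def write_run(topics, search_fn, out_path, run_tag="myrun", top_k=100):
    """search_fn(query) -> ranked list of (page_title, score); it stands in
    for whatever retrieval system a participant uses over the Wikipedia dump."""
    with open(out_path, "w", encoding="utf-8") as out:
        for topic_id, title, desc in topics:
            for rank, (page, score) in enumerate(search_fn(title + " " + desc)[:top_k], start=1):
                # One TREC-style line per retrieved page: topic, Q0, doc, rank, score, tag.
                out.write(f"{topic_id} Q0 {page} {rank} {score:.4f} {run_tag}\n")
</verbatim>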
Evaluation

As mentioned previously, topic creators (students) will be involved in the evaluation phase by judging the documents retrieved for their topics (max 2). We ask you to judge one document at a time, independently of previously shown documents. For each document, the judge can choose one of the following:
* not relevant (0): the page is not relevant to the topic.
* relevant (1): the page provides some information about the topic.
* highly relevant (2): the page is a perfect candidate to be used as a reference for the topic.

In addition, each relevant (or highly relevant) document is judged as follows (for the above query):
* Does it answer question 1?
* Does it answer question 2?
* ...
* Does it answer question n?

Our aim in carrying out such an evaluation is to measure the novelty of the information provided by a document compared to previously seen document(s).

!!! TO DO: JUDGING HELP (SNAPSHOTS) and MEASURES !!! Group/single participants. Number of documents.
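The measures are still to be decided (see the TO DO above), but one way the per-question judgements could feed into a novelty-aware score is sketched below: a document earns credit only for questions that no higher-ranked document has already answered. This is an illustration under that assumption, not the final measure.

<verbatim>
def novelty_gains(ranked_question_sets):
    """Given, for each ranked document, the set of Imp Points questions it was
    judged to answer (e.g. {1, 3}), credit each document only for questions
    not already answered by an earlier document in the ranking."""
    seen = set()
    gains = []
    for answered in ranked_question_sets:
        new = answered - seen
        gains.append(len(new))
        seen |= answered
    return gains

# Example: three documents answering questions {1,2}, {2}, {3,4}
# receive novelty gains [2, 0, 2].
print(novelty_gains([{1, 2}, {2}, {3, 4}]))  # -> [2, 0, 2]
</verbatim>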