Conceptually, these are the steps involved in answering questions using the Web:
Aranea is composed of independent modules whose input and
output are organized around the Aranea XML object, which is parsed and
supported by
Aranea::XML::Aranea
in Perl. At the top level,
the XML object contains five components:
<query>
contains the question being
answered, along with other meta-information.
<request_list>
contains "requests"
derived from the natural language query. Each request translates
into one or more search engine queries with additional
postprocessing.
<page_list>
stores all the pages
retrieved from the Web: page summary, URL, etc.
<pipeline>
keeps a timestamped record of the modules
that have modified the Aranea XML object; in effect,
this provides an accounting method.
<candidate_list>
stores all candidate
answers to the query, along with their scores and the URLs of the documents
that support each answer.
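To make the module idea concrete, here is a minimal sketch in Python using only the standard library. It is not the Aranea::XML::Aranea API (Aranea itself is written in Perl); the helper names `new_aranea_object`, `run_module`, and `add_literal_request` are hypothetical, and so is the `<module>`/`<name>`/`<timestamp>` layout inside `<pipeline>`, since the text only says that pipeline entries are timestamped.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The five top-level components named in the text above.
TOP_LEVEL = ["query", "request_list", "page_list", "pipeline", "candidate_list"]


def new_aranea_object(question):
    """Build an empty Aranea-style XML object holding only the question."""
    root = ET.Element("aranea")
    for tag in TOP_LEVEL:
        ET.SubElement(root, tag)
    ET.SubElement(root.find("query"), "text").text = question
    return root


def run_module(root, name, transform):
    """Apply one module to the object, then record it in <pipeline>.

    ASSUMPTION: the <module>/<name>/<timestamp> layout and the ISO 8601
    timestamp format are hypothetical, chosen only for illustration.
    """
    transform(root)
    entry = ET.SubElement(root.find("pipeline"), "module")
    ET.SubElement(entry, "name").text = name
    ET.SubElement(entry, "timestamp").text = datetime.now(timezone.utc).isoformat()
    return root


def add_literal_request(root):
    """Toy module: add the question verbatim as a low-scoring request."""
    entry = ET.SubElement(root.find("request_list"), "entry")
    ET.SubElement(entry, "query").text = root.findtext("query/text")
    ET.SubElement(entry, "score").text = "1"


doc = new_aranea_object("where is Belize located?")
run_module(doc, "add_literal_request", add_literal_request)
print(ET.tostring(doc, encoding="unicode"))
```

Each module in this sketch takes the whole object, changes one part of it, and leaves a trace in `<pipeline>`, which mirrors the division of labor described above.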
Once the user query has been converted into requests, the Aranea XML object starts out looking something like this:
<aranea>
  <query>
    <text>where is Belize located?</text>
    <id>stdin</id>
  </query>
  <request_list>
    <entry>
      <query>Belize is located in ?x</query>
      <score>10</score>
      <constraints>
        <max_length>100</max_length>
        <pages_to_search>10</pages_to_search>
        <max_words>5</max_words>
      </constraints>
    </entry>
    <entry>
      <query>where is Belize located?</query>
      <score>1</score>
      <constraints>
        <use_backoff>true</use_backoff>
        <pages_to_search>10</pages_to_search>
      </constraints>
    </entry>
  </request_list>
</aranea>
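As a rough illustration of how a downstream module might consume this object, the following Python sketch parses the example above and lists the requests in descending score order. The function name `read_requests` is hypothetical, and constraint values are left as strings because their types are not specified here; this is not how Aranea's Perl code necessarily does it.

```python
import xml.etree.ElementTree as ET

# The initial Aranea XML object from the example above.
INITIAL = """
<aranea>
  <query>
    <text>where is Belize located?</text>
    <id>stdin</id>
  </query>
  <request_list>
    <entry>
      <query>Belize is located in ?x</query>
      <score>10</score>
      <constraints>
        <max_length>100</max_length>
        <pages_to_search>10</pages_to_search>
        <max_words>5</max_words>
      </constraints>
    </entry>
    <entry>
      <query>where is Belize located?</query>
      <score>1</score>
      <constraints>
        <use_backoff>true</use_backoff>
        <pages_to_search>10</pages_to_search>
      </constraints>
    </entry>
  </request_list>
</aranea>
"""


def read_requests(root):
    """Return each request as a dict, highest score first."""
    requests = []
    for entry in root.findall("request_list/entry"):
        constraints = entry.find("constraints")
        requests.append({
            "query": entry.findtext("query"),
            "score": int(entry.findtext("score")),
            # Constraint values stay as strings; their types are an open question.
            "constraints": (
                {c.tag: c.text for c in constraints} if constraints is not None else {}
            ),
        })
    return sorted(requests, key=lambda r: r["score"], reverse=True)


requests = read_requests(ET.fromstring(INITIAL.strip()))
for r in requests:
    print(r["score"], r["query"], r["constraints"])
```

The pattern request ("Belize is located in ?x") carries the higher score and tighter constraints, while the verbatim question acts as a lower-scoring fallback.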
The requests are then executed and pages are retrieved (note that each entry is tagged with the request that the page came from):
<aranea>
  ...
  <page_list>
    <entry>
      <id>0</id>
      <page_category> ... </page_category>
      <page_summary> ... </page_summary>
      <page_url> ... </page_url>
      <page_title> ... </page_title>
      <page_cache_url> ... </page_cache_url>
    </entry>
    ...
    <entry>
      <id>1</id>
      <page_category> ... </page_category>
      <page_summary> ... </page_summary>
      <page_url> ... </page_url>
      <page_title> ... </page_title>
      <page_cache_url> ... </page_cache_url>
    </entry>
  </page_list>
  ...
</aranea>
After additional processing by various modules,
<candidate_list>
becomes populated with candidate
answers:
<aranea>
  ...
  <candidate_list>
    ...
    <entry>
      <support>
        <doc>http://commonwealth.ednet.ns.ca/americas/Belize/belize.htm</doc>
        <doc>http://realestate.escapeartist.com/Properties/Belize/</doc>
        ...
      </support>
      <score>60</score>
      <candidate>Central America</candidate>
    </entry>
  </candidate_list>
</aranea>
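A final consumer of the object would typically read `<candidate_list>` and pick the highest-scoring answer. The Python sketch below shows one way to do that under stated assumptions: the elided parts ("...") of the example are simply omitted rather than filled in, `ranked_candidates` is a hypothetical helper name, and treating a higher score as better is an assumption based on the example.

```python
import xml.etree.ElementTree as ET

# The candidate example from above, with the elided ("...") parts left out.
FINAL = """
<aranea>
  <candidate_list>
    <entry>
      <support>
        <doc>http://commonwealth.ednet.ns.ca/americas/Belize/belize.htm</doc>
        <doc>http://realestate.escapeartist.com/Properties/Belize/</doc>
      </support>
      <score>60</score>
      <candidate>Central America</candidate>
    </entry>
  </candidate_list>
</aranea>
"""


def ranked_candidates(root):
    """Return (answer, score, supporting URLs) tuples, best score first.

    ASSUMPTION: a higher <score> means a better candidate.
    """
    results = []
    for entry in root.findall("candidate_list/entry"):
        results.append((
            entry.findtext("candidate"),
            float(entry.findtext("score")),
            [doc.text for doc in entry.findall("support/doc")],
        ))
    return sorted(results, key=lambda r: r[1], reverse=True)


candidates = ranked_candidates(ET.fromstring(FINAL.strip()))
best_answer, best_score, best_support = candidates[0]
print(best_answer, best_score, len(best_support))
```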
The Aranea modules tutorial provides a HOWTO guide for writing modules.
$Header: /fs/clip-qa/.cvsroot/Aranea.support/docs/general-architecture.html,v 1.3 2005/06/10 20:44:17 jimmylin Exp $