Aranea::XML::CandidateList

NAME

Aranea::XML::CandidateList - object representing the <candidate_list> of an Aranea XML object


SYNOPSIS


  use IO::Handle;
  use Aranea::XML::Aranea;
  use Aranea::XML::CandidateList;

  # wrap STDIN in an IO::Handle for the parser
  my $stdin  = IO::Handle->new_from_fd(fileno(STDIN), "r");
  my $Aranea = Aranea::XML::Aranea->new($stdin);

  # arrayref of the entries in the candidate list
  my $candidate_list = $Aranea->candidate_list();
  my $entries        = $candidate_list->entries();


DESCRIPTION

The CandidateList object represents the <candidate_list> section of an Aranea XML object. Here is an example:


  <candidate_list>
    <entry>
      <support>
        <doc> http://www.foo.com/p1.html </doc>
        <doc> http://www.bar.com/p1.html </doc>
        <doc> http://www.baz.com/p2.html </doc>
      </support>
      <score>234.287927</score>
      <candidate>foo bar baz</candidate>
    </entry>
    ...
  </candidate_list>

The CandidateList object serves as a container for a collection of CandidateEntry objects. Under normal circumstances, the Aranea::XML::Aranea object creates the CandidateList automatically.


METHODS

new($data)
Constructs a CandidateList object given parsed XML data. Note that this method is rarely invoked manually; under typical usage patterns, the constructor of Aranea::XML::Aranea will call this method automatically.

entries([$listref])
Accesses or mutates the collection of CandidateEntry objects. When called with no arguments, this method returns a listref of CandidateEntry objects. When called with a single listref, the collection is set to that listref.

The typical usage pattern is to iterate over each entry and perform some action, e.g.,


  my $candidate_list = $Aranea->candidate_list();
  my $entries = $candidate_list->entries();
 
  for my $entry ( @$entries ) {
      print $entry->candidate() . "\n";
 
      # do more stuff...
  }
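
The combined accessor and mutator behaviour of entries() can be illustrated with a small stand-in class. This is only a sketch of the calling convention described above: My::StubList is a hypothetical name, plain strings stand in for CandidateEntry objects, and the real CandidateList may store its entries differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in container, defined only so the sketch runs on
# its own; it is not the Aranea implementation.
package My::StubList;

sub new {
    my $class = shift;
    return bless { entries => [] }, $class;
}

# Called with no argument: return the current listref.
# Called with a listref: replace the collection, then return it.
sub entries {
    my ( $self, $listref ) = @_;
    $self->{entries} = $listref if defined $listref;
    return $self->{entries};
}

package main;

my $list = My::StubList->new();

# Mutator form: set the collection (plain strings stand in for entries).
$list->entries( [ 'foo bar baz', 'quux' ] );

# Accessor form: read the collection back and iterate over it.
my $entries = $list->entries();
for my $entry (@$entries) {
    print "$entry\n";
}
```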

entries_count()
Returns the index of the last CandidateEntry in this CandidateList, not the true count. The return value follows the same convention as $#array: -1 for no entries, 0 for one entry, and so on.
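
Because of this convention, the return value can be used directly as the last valid index, and the actual number of entries is one more than it. The sketch below shows the arithmetic on a plain array reference standing in for the entries collection; it does not call the Aranea modules, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Stand-in for the listref returned by entries(); plain strings are
# used here instead of real CandidateEntry objects.
my $entries = [ 'foo', 'bar', 'baz' ];

# $#{...} gives the last index, the convention that entries_count()
# is documented to follow.
my $last_index = $#{$entries};       # last index for three entries
my $how_many   = $last_index + 1;    # the actual number of entries

# An empty collection yields -1, so adding one gives zero.
my $empty      = [];
my $empty_last = $#{$empty};

print "last index: $last_index, count: $how_many\n";
print "empty last index: $empty_last\n";
```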

add_entry($entry)
Adds a CandidateEntry to this CandidateList.

map_candidates($proc)
Maps a procedure over the answer candidates. This method takes as its single argument a procedure that takes a candidate string as its argument; each answer candidate is replaced by the result of calling the procedure on it.


  # expand 'Jan' to 'January'; \b keeps 'January' from becoming 'Januaryuary'
  sub expand_month { my $s = shift; $s =~ s/\bJan\b/January/g; $s }
  $Aranea->candidate_list()->map_candidates(\&expand_month);
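
To see the effect of such a procedure without an Aranea object, the mapping can be imitated on a plain list of strings. This is a minimal sketch: the list and the name expand_month are hypothetical, and Perl's built-in map stands in for map_candidates. It also shows why the pattern is anchored with \b, since an unanchored s/Jan/January/g would turn 'January' into 'Januaryuary'.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the candidate strings held by the entries.
my @candidates = ( 'Jan 5 1999', 'January 5 1999', 'Janet' );

# The procedure receives one candidate string and returns its replacement.
sub expand_month {
    my $s = shift;
    $s =~ s/\bJan\b/January/g;    # only the standalone word 'Jan'
    return $s;
}

# Replace each candidate with the result of the procedure, which is the
# effect map_candidates is described as having.
@candidates = map { expand_month($_) } @candidates;

print "$_\n" for @candidates;
```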

filter_candidates($proc)
Filters the answer candidates. This method takes as its single argument a procedure that takes a candidate string as its argument and evaluates to true if and only if the candidate is to be kept; all other candidates are discarded.


  # throw away candidates that do not have 'foo' in them
  sub has_foo { my $s = shift; $s =~ /foo/ }
  $Aranea->candidate_list()->filter_candidates(\&has_foo);

filter_entries($proc)
Filters the answer entries. This method takes as its single argument a procedure that takes a CandidateEntry as its argument and evaluates to true if and only if the entry is to be kept; all other entries are discarded. The major difference between this method and filter_candidates is that the criteria procedure in filter_candidates is passed only the candidate string, whereas filter_entries gives the criteria procedure access to the entire CandidateEntry, e.g., the score and supporting documents.
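
A criteria procedure for filter_entries might combine the candidate string with the score. The sketch below is self-contained and makes several assumptions: My::StubEntry is a hypothetical stand-in for CandidateEntry, the score() accessor is assumed (only candidate() appears in this document), the threshold of 100 is arbitrary, and Perl's built-in grep stands in for filter_entries.

```perl
use strict;
use warnings;

# Minimal stand-in for a CandidateEntry, defined only so the sketch runs
# on its own. candidate() appears in this document; score() is assumed.
package My::StubEntry;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}
sub candidate { my $self = shift; return $self->{candidate} }
sub score     { my $self = shift; return $self->{score} }

package main;

my @entries = (
    My::StubEntry->new( candidate => 'foo bar baz', score => 234.287927 ),
    My::StubEntry->new( candidate => 'foo',         score => 12.5 ),
    My::StubEntry->new( candidate => 'quux',        score => 310.0 ),
);

# Criteria procedure: receives a whole entry, so it can test both the
# candidate string and the score.
sub keep_entry {
    my $entry = shift;
    return $entry->candidate() =~ /foo/ && $entry->score() > 100;
}

# With the real class this would be passed as
# $Aranea->candidate_list()->filter_entries(\&keep_entry);
my @kept = grep { keep_entry($_) } @entries;

for my $entry (@kept) {
    print $entry->candidate() . "\n";
}
```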

write_xml($writer)
Writes the XML representation of this CandidateList; takes an XML::Writer as its only argument. Note that this method is rarely invoked manually; under typical usage patterns, the write_xml method of Aranea::XML::Aranea will call this method automatically.