| Class | Description |
|---|---|
| BitextClassifierUtils | Train and test a bitext classifier. |
| BruteForcePwsim | A class to extract the similarity list of each sample document, either by taking the dot product of the document vectors or by computing the Hamming distance between signatures. |
| BruteForcePwsim.MyMapperSignature | For every document (signature) in the sample, find all other documents that are closer than a given Hamming distance. |
| BruteForcePwsim.MyMapperTermDocVectors | For every document (term doc vector) in the sample, find all other documents whose cosine similarity is higher than a given threshold. |
| BruteForcePwsim.MyMapperWeightedIntDocVectors | For every document (weighted int doc vector) in the sample, find all other documents whose cosine similarity is higher than a given threshold. |
| BruteForcePwsim.MyReducer | Limits the number of pairs per sample document to a given number (Ivory.NumResults). |
| ConvertMapToPairs | Convert the format of the PCP algorithm's output. |
| ConvertMapToPairs.MyMapper | Input is keyed by German docno, and the value is a map from similar English docnos to similarity weights. |
| Docnos2Titles | |
| ExtractWikipedia | A class to extract interwiki language links from a Wikipedia collection. |
| FilterResults | |
| FilterResults.MyMapper | |
| FilterResults.MyMapperTopN | Filter out results that are not from the sample and/or whose distance exceeds the value given in the option Ivory.MaxHammingDistance. |
| FilterResults.MyReducerTopN | |
| OutputResultsAsText | Read input in sequence file format and write it out as text. |
| SampleIntDocVectors | A program that samples from a collection of (key, value) pairs according to a given frequency. |
| SampleIntDocVectors.MyReducer | |
| SampleSignatures | |
| SampleSignatures.MyMapper | Filter out signatures that are not from the sample. |
| SampleSignatures.MyReducer | |
| SampleTermDocVectors | A program that samples from a collection of (key, value) pairs according to a given frequency. |
| SampleTermDocVectors.MyReducer |
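
The BruteForcePwsim mapper descriptions above amount to comparing each sample document against every other document, using either Hamming distance on signatures or cosine similarity on vectors. The following is a minimal single-machine sketch of that idea, not the package's actual code: the class `BruteForceSketch`, its method names, the 64-bit `long` signatures, the dense `double[]` vectors, and the inclusive threshold comparisons are all assumptions made for illustration, and no Hadoop mapper or reducer plumbing is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the package.
public class BruteForceSketch {

    /** Number of differing bits between two 64-bit signatures. */
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    /** Cosine similarity of two dense vectors of equal length. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // treat an all-zero vector as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Indices of all signatures, other than the query, within maxDistance bits of it. */
    public static List<Integer> withinHamming(long[] signatures, int query, int maxDistance) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < signatures.length; i++) {
            if (i != query && hammingDistance(signatures[query], signatures[i]) <= maxDistance) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Indices of all vectors, other than the query, with cosine similarity of at least threshold. */
    public static List<Integer> aboveCosine(double[][] vectors, int query, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            if (i != query && cosine(vectors[query], vectors[i]) >= threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] sigs = {0b1011L, 0b0010L, 0b1010L, 0b0100L};
        // Distances from sigs[0] are 2, 1 and 4 bits, so only index 2 qualifies.
        System.out.println(withinHamming(sigs, 0, 1)); // prints [2]
    }
}
```

Comparing every sample document against the whole collection is quadratic in the worst case, which is presumably why the package pairs it with the sampling classes listed above.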