ILINK [15, 7] is a widely used genetic linkage analysis
program that locates specific disease genes on chromosomes.
The input to ILINK consists of several family trees. The program
traverses the family trees and visits each nuclear family.
The main data structure in ILINK is a pool of genarrays.
A genarray contains the probability of each genotype for an individual.
Since the genarray is sparse, each genarray has an associated index array of pointers to its nonzero values.
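To make this layout concrete, the following is a minimal C sketch of a genarray with its index array; the type name, field names, and genotype bound are our own illustrative assumptions, not the actual ILINK declarations.

/* Sketch of a sparse genarray; all names and sizes are assumed. */
#define MAX_GENOTYPES 4096                 /* assumed genotype bound */

typedef struct {
    double  probs[MAX_GENOTYPES];          /* probability of each genotype */
    double *nonzero[MAX_GENOTYPES];        /* index array: pointers into probs */
    int     num_nonzero;                   /* count of nonzero entries */
} genarray_t;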
A bank of genarrays large enough to accommodate the largest nuclear
family is allocated at the beginning of the program, and the same bank is
reused for each nuclear family.
When the computation moves to a new nuclear family, the genarrays in the
pool are reinitialized, one for each person in the new family.
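A sketch of the bank allocation and per-family reinitialization, building on the genarray_t sketch above, might look as follows; allocate_bank and start_family are hypothetical helper names.

#include <stdlib.h>
#include <string.h>

static genarray_t *bank;        /* one genarray per family member */

void allocate_bank(int max_family_size)   /* called once at startup */
{
    bank = malloc(max_family_size * sizeof(genarray_t));
}

void start_family(int family_size)        /* called per nuclear family */
{
    for (int i = 0; i < family_size; i++) {
        memset(bank[i].probs, 0, sizeof(bank[i].probs));
        bank[i].num_nonzero = 0;
    }
}

In the TreadMarks version described below, the bank would presumably be allocated in shared memory (e.g., with Tmk_malloc) so that all processors see the same genarrays.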
The computation either updates a parent's genarray
conditioned on the spouse and all children, or updates one child
conditioned on both parents and all the other siblings.
We use the parallel algorithm described in Dwarkadas et al. [8]. Updates to each individual's genarray are parallelized. A master processor assigns the nonzero elements in the parent's genarray to all processors in a round-robin fashion. After each processor has worked on its share of the nonzero values and updated the genarray accordingly, the master processor sums up the contributions of the processors.
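The round-robin division can be sketched as follows; myid is this processor's rank, nprocs the processor count, and conditional_update() is a hypothetical stand-in for the actual likelihood computation on one genotype probability.

double conditional_update(double p);   /* hypothetical per-element update */

/* Each processor handles every nprocs-th nonzero element of the
 * genarray being updated, starting at its own rank. */
void update_share(genarray_t *ga, int myid, int nprocs)
{
    for (int i = myid; i < ga->num_nonzero; i += nprocs)
        *ga->nonzero[i] = conditional_update(*ga->nonzero[i]);
}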
In the TreadMarks version, the bank of genarrays is shared among the processors, and barriers are used for synchronization. In the PVM version, each processor has a local copy of each genarray, and messages are passed explicitly between the master and the slaves at the beginning and end of each nuclear family update. Since the genarray is sparse, only the nonzero elements are sent. The diffing mechanism in TreadMarks achieves the same effect automatically: since only the nonzero elements are modified during each nuclear family update, the diffs transmitted to the master contain only the nonzero elements.
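The two styles can be contrasted in a rough sketch. The TreadMarks fragment uses the Tmk_barrier call and Tmk_proc_id variable from the TreadMarks API (the header name is assumed); the PVM fragment uses the standard PVM packing calls; sum_contributions, the tag value, and the end-of-family structure are hypothetical simplifications, and genarray_t is the sketch above.

#include <pvm3.h>    /* PVM message-passing API */
#include "Tmk.h"     /* TreadMarks API (header name assumed) */

void sum_contributions(genarray_t *bank, int n);   /* hypothetical */

/* TreadMarks: the bank is shared, so barriers suffice; the diff
 * mechanism ships only the modified (nonzero) words to the master. */
void tmk_end_of_family(genarray_t *bank, int family_size)
{
    Tmk_barrier(0);                           /* slaves' updates complete */
    if (Tmk_proc_id == 0)
        sum_contributions(bank, family_size); /* master sums the shares */
    Tmk_barrier(1);                           /* results visible to all */
}

/* PVM: a slave explicitly packs index/value pairs for the nonzero
 * elements and sends them to the master in a single message. */
void pvm_send_nonzeros(genarray_t *ga, int master_tid, int tag)
{
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&ga->num_nonzero, 1, 1);
    for (int i = 0; i < ga->num_nonzero; i++) {
        int idx = (int)(ga->nonzero[i] - ga->probs);  /* position in probs */
        pvm_pkint(&idx, 1, 1);
        pvm_pkdouble(ga->nonzero[i], 1, 1);
    }
    pvm_send(master_tid, tag);
}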
We used the CLP data set [12], with an allele product of 2 × 4 × 4 × 4. The results are shown in Figure 12. The sequential program runs for 1473 seconds. At 8 processors, TreadMarks achieves a speedup of 5.57, which is 93% of the 5.99 obtained by PVM. The high computation-to-communication ratio leads to good speedups and also explains why PVM and TreadMarks are close in performance. Nonetheless, we were able to identify three reasons for the lower performance of TreadMarks. First, while both versions send only the nonzero elements, PVM performs this transmission in a single message, whereas TreadMarks sends a diff request and a response for each page in the genarray. For the CLP data set, a genarray spans about 16 pages. Second, false sharing occurs in TreadMarks because the nonzero values in the parents' genarrays are assigned to processors in a round-robin fashion. In PVM, when the parents' genarrays are distributed, each processor receives only its part of the genarray, but in TreadMarks, a processor fetches all the nonzero elements in a page, including those belonging to other processors. The third and final reason for the difference in performance is diff accumulation. The bank of genarrays is reinitialized at the beginning of the computation for each nuclear family. Although the processors need only the newly initialized data, TreadMarks also sends the diffs created during previous computations.