Next: 3.12 Summary Up: 3 Results Previous: 3.10 3-D FFT

3.11 ILINK

ILINK [15][7] is a widely used genetic linkage analysis program that locates specific disease genes on chromosomes. The input to ILINK consists of several family trees. The program traverses the family trees and visits each nuclear family. The main data structure in ILINK is a pool of genarrays. A genarray contains the probability of each genotype for an individual. Since the genarray is sparse, an index array of pointers to nonzero values in the genarray is associated with each one of them. A bank of genarrays large enough to accommodate the biggest nuclear family is allocated at the beginning of program, and the same bank is reused for each nuclear family. When the computation moves to a new nuclear family, the pool of genarrays is reinitialized for each person in the current family. The computation either updates a parent's genarray conditioned on the spouse and all children, or updates one child conditioned on both parents and all the other siblings.

We use the parallel algorithm described in Dwarkadas et al. [8]. Updates to each individual's genarray are parallelized. A master processor assigns the nonzero elements in the parent's genarray to all processors in a round robin fashion. After each processor has worked on its share of nonzero values and updated the genarray accordingly, the master processor sums up the contributions of each of the processors.

In the TreadMarks version, the bank of genarrays is shared among the processors, and barriers are used for synchronization. In the PVM version, each processor has a local copy of each genarray, and messages are passed explicitly between the master and the slaves at the beginning and the end of each nuclear family update. Since the genarray is sparse, only the nonzero elements are sent. The diffing mechanism in TreadMarks automatically achieves the same effect. Since only the nonzero elements are modified during each nuclear family update, the diffs transmitted to the master only contain the nonzero elements.

We used the CLP data set [12], with an allele product 2 x 4 x 4 x 4. The results are shown in Figure 12. The sequential program runs for 1473 seconds. At 8 processors, TreadMarks achieves a speedup of 5.57, which is 93%of the 5.99 obtained by PVM. A high computation-to-communication ratio leads to good speedups and also explains the fact that PVM and TreadMarks are close in performance. However, we were able to identify three reasons for the lower performance of TreadMarks. First, while both versions send only the nonzero elements, PVM performs this transmission in a single message. TreadMarks sends out a diff request and a response for each page in the genarray. For the CLP data set, the size of the genarray is about 16 pages. Second, false sharing occurs in TreadMarks because the nonzero values in the parents' genarrays are assigned to processors in a round robin fashion. In PVM, when the parents' genarrays are distributed, each processor gets only its part of the genarray, but in TreadMarks, a processor gets all the nonzero elements in the page, including those belonging to other processors. The third and final reason for the difference in performance is diff accumulation. The bank of genarrays is re-initialized at the beginning of the computation for each nuclear family. Although the processors need only the newly initialized data, TreadMarks also sends diffs created during previous computations.



Next: 3.12 Summary Up: 3 Results Previous: 3.10 3-D FFT


logoRice Systems Group