3.12 Summary

From our experience with PVM and TreadMarks, we conclude that it is easier to program using TreadMarks than using PVM. Although there is little difference in programmability for simple programs, programs with complicated communication patterns, such as ILINK and 3-D FFT, require considerable effort in PVM to work out what data to send and which process to send it to.

Our results show that, because of the use of release consistency and the multiple-writer protocol, TreadMarks performs comparably with PVM on a variety of problems in the experimental environment examined. These results are corroborated by those in [4], which reports a similar experiment comparing the Munin DSM system against message passing on the V System [6]. For five of the twelve experiments, TreadMarks performed within 10% of PVM. Of the remaining experiments, Barnes-Hut and, to a lesser extent, IS-Large exhibit poor performance on both PVM and TreadMarks. With the data sets used, these applications have too low a computation-to-communication ratio for a network of workstations. For the remaining five experiments, the performance differences are between 10% and 30%.

The separation of synchronization and data transfer and the request-response nature of data communication in TreadMarks are responsible for lower performance for all the TreadMarks programs. In PVM, data communication and synchronization are integrated: the send and receive operations not only exchange data, but also regulate the progress of the processors. In TreadMarks, synchronization is performed through locks and barriers, which do not communicate data. Moreover, data movement is triggered by expensive page faults, and a diff request must be sent out to obtain the modifications. In addition, PVM benefits from the ability to aggregate scattered data in a single message, an access pattern that would result in several miss messages in the invalidate-based TreadMarks protocol.
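The aggregation effect can be illustrated with a rough message-count model. The sketch below is not taken from either system: the function names are hypothetical, and it assumes that each page touched by the consumer has exactly one writer, so that every page miss costs one diff request and one reply, while the message-passing version packs all the scattered data into a single send.

```python
def pvm_messages(num_pages: int) -> int:
    """Message passing: scattered data from any number of pages is
    packed by the sender into one message (toy assumption)."""
    return 1 if num_pages > 0 else 0


def dsm_miss_messages(num_pages: int) -> int:
    """Invalidate-based DSM: each page touched faults separately.
    Assumes one writer per page, so one request plus one reply per page."""
    return 2 * num_pages


if __name__ == "__main__":
    # Compare the two models as the data is spread over more pages.
    for pages in (1, 4, 16):
        print(pages, pvm_messages(pages), dsm_miss_messages(pages))
```

Under these assumptions the gap grows linearly with the number of pages over which the data is scattered, and the model ignores the additional cost of the page faults themselves.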

Although the multiple-writer protocol addresses the problem of simultaneous writes to the same page, false sharing still affects the performance of TreadMarks. While multiple processors may write to disjoint parts of the same page without interfering with each other, if a processor reads the data written by one of the writers after a synchronization point, diff requests are sent to all of the writers, causing extra messages and data to be sent.
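The cost of false sharing under the multiple-writer protocol can be sketched with a toy page model. This is an illustration only, not the TreadMarks implementation: the page size, the names `make_diff` and `fetch_page`, and the cost of two messages per writer are all assumptions. Each writer's diff is computed against an unmodified copy of the page (a twin), and a reader that wants only part of one writer's data still fetches a diff from every writer.

```python
PAGE_WORDS = 8  # deliberately tiny page, in words (hypothetical size)


def make_diff(twin, page):
    """Record (offset, new_value) for every word that differs from the twin."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def fetch_page(base, writer_diffs, wanted):
    """Reader faults on the page after a synchronization point and
    requests a diff from every writer, whether or not it needs that data."""
    page = list(base)
    messages = words_moved = words_wanted = 0
    for diff in writer_diffs:
        messages += 2  # assumed cost: one diff request plus one reply
        for offset, value in diff:
            page[offset] = value
            words_moved += 1
            if offset in wanted:
                words_wanted += 1
    return page, messages, words_moved, words_wanted


def demo():
    base = [0] * PAGE_WORDS
    page_a = list(base)
    page_a[0:4] = [1] * 4  # writer A updates the first half of the page
    page_b = list(base)
    page_b[4:8] = [2] * 4  # writer B updates the disjoint second half
    diffs = [make_diff(base, page_a), make_diff(base, page_b)]
    # The reader only uses two of the words written by A.
    return fetch_page(base, diffs, wanted={0, 1})


if __name__ == "__main__":
    print(demo())
```

In this example the two writers never interfere with each other, yet the reader exchanges four messages and receives eight words in order to use two of them.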

In the current implementation of TreadMarks, diff accumulation occurs as a result of several processors modifying the same data, a common pattern with migratory data. Diff accumulation is not a serious problem when the diff sizes are small, because several diffs can be sent in one message.
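A back-of-the-envelope model shows why small diffs keep accumulation cheap. The sketch below rests on simplifying assumptions that are not measured properties of TreadMarks: each of `num_writers` processors in a migratory pattern produces a diff of `diff_bytes` covering the same data, all accumulated diffs are shipped to the next processor, and `max_payload` is a hypothetical per-message capacity.

```python
def accumulated_bytes(num_writers: int, diff_bytes: int) -> int:
    """Bytes shipped when every earlier diff of the same data is sent,
    although only the most recent values are actually needed."""
    return num_writers * diff_bytes


def messages_needed(num_diffs: int, diff_bytes: int, max_payload: int) -> int:
    """Number of messages if several diffs may share one message.
    A diff larger than max_payload is counted as one message
    (fragmentation is ignored in this toy model)."""
    diffs_per_message = max(1, max_payload // diff_bytes)
    return -(-num_diffs // diffs_per_message)  # ceiling division


if __name__ == "__main__":
    # Four processors modified the same data in turn.
    for size in (64, 4096):
        print(size, accumulated_bytes(4, size), messages_needed(4, size, 1024))
```

With these illustrative numbers, four 64-byte diffs fit in a single message, whereas four 4096-byte diffs need four messages and transfer four times the data that a single up-to-date copy would require.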

Rice Systems Group