Next: 3.8 Water Up: 3 Results Previous: 3.6 TSP

3.7 QSORT

In QSORT, the quicksort algorithm is used to partition an unsorted list into sublists. When the sublist is sufficiently small, the integers are sorted using bubblesort. QSORT is parallelized using a work queue that contains descriptions of unsorted sublists, from which worker threads continuously remove the lists.

In the TreadMarks version of QSORT, the list and the work queue are shared, and accesses to the work queue are protected by a lock. Unlike TSP, in QSORT, the processor releases the task queue without subdividing the subarray it removes from the queue. If the subarray is further divided, the processor reacquires control of the task queue, and places the newly generated subarrays back on the task queue. The PVM version uses a master-slave arrangement similar to TSP, with n slaves and 1 master, where n is the number of processors. The master maintains the work queue and the slaves perform the partitioning and the sorting.

In our experiments, the array size was 512K and the bubblesort threshold was 1024. The speedups are shown in Figure 7. The sequential program runs for 53 seconds. The 8-processor speedups using TreadMarks and PVM are 4.46 and 6.17, respectively. Several factors contribute to the 28%difference in performance. The most important reason is the diff requests in TreadMarks. At 8 processors, TreadMarks sends 10 times more messages than PVM. Among the 33742 messages sent in TreadMarks, about 32000 messages are sent due to diff requests and diff transmission. Since the bubblesort threshold is 1024, all of the intermediate subarrays are larger than one page, resulting in multiple diff requests for each subarray, in addition to some false sharing. Diff accumulation also occurs in this case because the intermediate subarrays and the work queue migrate among processes.


logoRice Systems Group