Next: 3.7 QSORT Up: 3 Results Previous: 3.5 Integer Sort

3.6 TSP

TSP solves the traveling salesman problem using a branch and bound algorithm. The major data structures are a pool of partially evaluated tours, a priority queue containing pointers to tours in the pool, a stack of pointers to unused tour elements in the pool, and the current shortest path. The evaluation of a partial tour is composed mainly of two procedures, get_tour and recursive_solve. The subroutine get_tour removes the most promising path from the priority queue. If the path contains more than a threshold number of cities, get_tour returns this path. Otherwise, it extends the path by one node, puts the promising paths generated by the extension back on the priority queue, and calls itself recursively. The subroutine get_tour returns either when the most promising path is longer than a threshold, or when the priority queue becomes empty. The procedure recursive_solve takes the path returned by get_tour, and tries all permutations of the remaining nodes recursively. It updates the shortest tour if a complete tour is found that is shorter than the current best tour.

In the TreadMarks version, all the major data structures are shared. The subroutine get_tour is guarded by a lock to guarantee exclusive access to the tour pool, the priority queue, and the tour stack. Updates to the shortest path are also protected by a lock. The PVM version uses a master-slave arrangement. With n processors, there are n slave processes and 1 master process. In other words, one processor runs both the master and one slave process, while the remaining processors run only a slave process. The master keeps all the major data structures in its private memory. It executes get_tour and keeps track of the optimal solution. The slaves execute recursive_solve, and send messages to the master either to request solvable tours, or to update the shortest path.

We solved a 19 city problem, with a recursive_solve threshold of 12. The speedups are shown in Figure 6. The sequential program runs for 95 seconds. At 8 processors, TreadMarks obtains a speedup of 6.21, which is 78%of the speedup of 7.97 obtained by PVM. At 8 processors, TreadMarks sends 11 times more messages and 150 times more data than PVM.

The performance gap comes from the difference in programming styles. In the PVM version of TSP, only the tours directly solvable by recursive_solve and the minimum tour are exchanged between the slaves and the master. These message exchanges take only 2 messages. In contrast, in TreadMarks, all the major data structures migrate among the processors. In get_tour, it takes at least 3 page faults to obtain the tour pool, priority queue, and tour stack. As for the amount of data, because of diff accumulation, on average, a processor gets (n-1) diffs on each page fault, where n is the number of processors in the system. Furthermore, there is some contention for the lock protecting get_tour. On average, at 8 processors, each process spends 1.1 out of 15 seconds waiting at lock acquires.



Next: 3.7 QSORT Up: 3 Results Previous: 3.5 Integer Sort


logoRice Systems Group